CPU Overheating during Renders, crashing on Save

OK, this is driving me crazy. I have a cover due to my publisher by Monday. Two figures, HDRI background. The computer has developed a thing where it just overheats and dies during the final render. It's a big render, 1400 by 2400. I had it set to save to file, originally to Dropbox, then changed it to Desktop. It's a Windows 7 Professional machine running the latest version of Daz, with two big graphics cards with fans. So I had it render to screen. When I judged it was far enough along to be acceptable, I canceled the render and hit save. The computer instantly died as if I had pulled the plug. When I restarted it a moment later, it gave me a CPU overheated warning. I thought it might be the fan/heat sink, but all the fans are running -- I took the cover off and looked. This computer has been in this configuration for years, and ran fine. I've had a friend who's a tech person work on the computer, and he couldn't find out what was wrong. It only does it when I'm in Daz Studio, but I don't know if that's significant because I don't game.
Any ideas?
Comments
When you render in Studio, you have 100% usage of the CPU at all time during the render. In gaming, you just have bursts of higher usage. Hence, the render causes much more heat.
I am wondering why you can't use the cards. They are not NVIDIA, I guess?
In any case, have you cleaned the air intakes? There might be dust covering them.
I use the cards -- they're NVIDIA cards. I don't use the CPU to render at all. I'll try cleaning the air intake.
AK
That's strange, then. Unless the scene doesn't fit on the cards, it should be the cards creating the heat, as they would be doing the rendering. It's possible that your CPU is kaput, but I'm not sure how you can test that.
You might find these articles useful:
I would suggest taking a close look at the heat exchanger for the CPU. Typically it is a piece of metal between the CPU and the CPU fan. Some have fins which can become clogged with dirt and dust. This clogging prevents efficient airflow through the heat exchanger and through the CPU fan. If the fins are clogged, a can of compressed air (Walmart, Target, etc. in the electronics section) can be used to blow them out. In the name of safety, always unplug the computer from the wall power before going inside. Be careful not to shake the can, as this can result in moisture being blown into the electronics (bad!). If you don't feel comfortable with this, seek professional help.
You may need to remove the GPU fan shrouds to get all the dust out. Even though we cleaned it periodically, one of my son's graphics cards kept overheating. When we removed it from the computer to get a better look at it, it appeared clean, but when we took off the fan shroud we could see that the big GPU heat sink was completely clogged with dust in the areas that you could not see with the shroud in place. Once the dust was removed, it was fine.
Great advice if the posters have identified the problem.
Find out what is causing it, then fix it.
Use something like HWiNFO64 and check the various temperature readings during the render.
Once you've identified the issues (the CPU could be the cause, or it could be a symptom of the problem), fix them. :)
Do you get the CPU over-heated message each time?
Is anything else over-heating?
If you see the CPU getting hot, stop the render straight away, but make sure other temperatures are OK too; there can be knock-on effects to an extent.
Canned air is not very effective: too little pressure (and it's not air but gas, which may be flammable or hazardous to your health).
https://www.thoughtco.com/whats-in-canned-air-3975941
I use a pool pump; it's extremely powerful and effective. There are many to choose from, and some may be more powerful than others. Get a 110/220 V AC model without a power supply; they're more reliable (no electronics that can break down). Be careful when pointing it at fans: block the fan from spinning if you want to blow it clean, or the airflow may spin it fast enough to damage it.
https://www.amazon.com/s/ref=sr_pg_1?rh=i%3Aaps%2Ck%3Ainflatable+pool+pump&keywords=inflatable+pool+pump&ie=UTF8&qid=1527179249
Also, you might want to replace the CPU thermal paste, it often dries out over time which makes it less effective.
I'm curious, how high are the temps getting? And what is a safe temp while rendering, if anyone knows? What is normal, and when should you be worried that it's overheating?
I got a program that lists the heat, but I'm not sure what it SHOULD be.
Thanks for the help. I'm going to try those things.
32C - 70C
or
89F - 158F
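Since this thread mixes Celsius and Fahrenheit readings, here is a small conversion sketch for checking the numbers people quote. The function names are my own and it only does the arithmetic; it says nothing about which temperatures are safe for a particular chip.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# The two ends of the range quoted above.
for c in (32, 70):
    print(f"{c} C = {c_to_f(c):.1f} F")
```

The same helper works in the other direction for monitoring tools that report in Fahrenheit.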
This is pretty common with older thermal compounds. You can remove the heat sink, clean off the contact surfaces of both the CPU and the heat sink, reapply paste (I like GELID Solutions GC-Extreme, or Arctic Silver), and reassemble. I decided a while back that if the CPU cooler in question is one of the Intel stock coolers, I replace it with a higher-capacity unit while it is off. The Intel stock coolers are not so good for 100% CPU utilization; they just don't remove enough heat.
...indeed, 3rd party CPU coolers tend to be more efficient. The heat pipes on the one I have look like a V8 engine exhaust manifold, and the heat sink block behind the exhaust fan is big.
What are your two video cards?
Press Ctrl+Alt+Delete and click on Task Manager. When you run DAZ Studio, you shouldn't be using more than about 20% of your CPU. If it goes to 50% or higher then it's not using your video cards, and you'll have to look at the log file in DAZ Studio (it's in the Help menu) to see what's going on.
If it still overheats, get CPUID HWMonitor (or hardware monitor). You can monitor your temperatures there of all your devices. In fact, you should do so at the same time as running DAZ Studio above. Check your fan speeds. Make sure they ramp up with CPU usage. If they don't ramp up, you may need to go into your BIOS and adjust those settings. Or if you have an app for your motherboard, that is usually much easier. Failing all of that, if it still overheats, then as others said, perhaps the thermal compound needs replacing. Perhaps there is dust in your heatsinks and/or fans. Perhaps you lack proper ventilation in your case.
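The Task Manager rule of thumb above can be written down as a tiny check. This is only a sketch: the 20% and 50% cut-offs come from this post, the readings are ones you would note by hand during a render, and the function name and wording of the verdicts are made up for illustration.

```python
def classify_cpu_load(samples_percent):
    """Label a series of CPU-usage readings taken during a render.

    Thresholds follow the rule of thumb in the post above: around 20%
    is expected with GPU rendering, 50% or more suggests the render
    has fallen back to the CPU.
    """
    if not samples_percent:
        raise ValueError("need at least one reading")
    average = sum(samples_percent) / len(samples_percent)
    if average >= 50:
        return "likely CPU rendering: check the Daz Studio log"
    if average <= 20:
        return "consistent with GPU rendering"
    return "inconclusive: keep watching temps and the log"


# Hypothetical readings noted from Task Manager during a render.
print(classify_cpu_load([100, 99, 100]))
```

A low average is not proof that the GPU is rendering, so the log file is still worth checking either way.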
Having built and repaired my fair share of machines, your problem doesn't seem like overheating to me. If you hadn't mentioned overheating, my first guess would have been that your power supply is failing or isn't powerful enough. When your video cards are rendering, they draw a LOT more power than when they're idling. If the power supply can't deliver enough power to your devices when they need it, the motherboard will shut down to protect your computer. Power supplies age over time and deliver less and less power to your components. And you said you had two video cards. Try it with only one video card and see if the same thing happens. But also make sure your power supply is rated high enough to power both video cards and your other components. You also mentioned that it shut down when you tried to save. A failing HDD can also shut your machine down.
First thing to check is always the power supply. If you have an app that came with your power supply that can monitor voltage, amps, etc. run that as well.
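A rough power budget along those lines might look like this. Everything numeric here is a placeholder: the wattages are hypothetical, and the 0.8 derating factor for an aged supply is an assumption of mine, not a measured figure. Look up the real numbers for your own parts.

```python
def psu_headroom_watts(psu_rating_w, component_draw_w, derate=0.8):
    """Return spare watts after summing component draw.

    derate is an assumed fraction of the label rating that an aged
    supply can still deliver; 0.8 is a placeholder, not a measurement.
    """
    usable = psu_rating_w * derate
    return usable - sum(component_draw_w.values())


# Hypothetical numbers for illustration only.
draw = {"gpu_1": 250, "gpu_2": 250, "cpu": 90, "drives_fans_board": 60}
spare = psu_headroom_watts(750, draw)
print(f"spare capacity: {spare:.0f} W")
```

A negative result would support the suggestion of retrying the render with one card pulled.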
My 12 core Xeon CPU on my Mac Pro goes up to 190 degrees F before the fans kick in. But it never crashes. I don't think running my CPUs at 180 to 190 degrees F for hours is a good idea, so I got myself an Nvidia Titan card.
I just had this problem: it hits 90 and the system shuts down. It happens when rendering animations in Iray. I just cleaned out the dust from all the fan vents and that seems to have improved it a lot. Mind you, I went off and bought a new i9 system. Iray is a real killer!
My two Zotac 980TI AMP Extremes go no higher than 80C, and that is within Nvidia's specifications. In fact, only one goes that high. The second one barely goes over 65C. They have a triple fan setup on them and a huge heat sink.
As for my CPU, it's an i7 6700K @ 4.2GHz. It never goes over 65C thanks to my Noctua NH-D15 heatsink. That heatsink will knock the heat back down to room temp within a minute or two.
Ok, so there is a way to save your render if you have a crash like this as well, because that image still exists somewhere on your computer or else there would be nothing to save. You have to hunt down the temp file for it. Of course, finding the issue and fixing it should be your main concern, but if you have a deadline and NEED that picture, you can still recover it and submit it if fixing the problem takes too long. I'll have to hunt down the location where it saves this temp file, but I'll see if I can find it.
Ok, so here is where my PC saves this file till I hit the actual save button. I'm on windows 10, but I would imagine it should be about the same place on your machine. Look and see if it's still here, and if not do your render as you did before, and if it crashes again when you go to save check this location and with any luck you should be able to recover your work.
C:\Users\Brimstone Omega\AppData\Roaming\DAZ 3D\Studio4\temp\render
*Note: if you copy and paste that address into File Explorer it should take you straight there; just change my name (Brimstone Omega) to whatever your account name is.
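If you would rather grab the files with a script, here is a minimal sketch. It assumes the temp folder sits where it does on my machine, under the roaming AppData folder that Windows exposes as the APPDATA environment variable. The function names and the "rescued_renders" destination folder are made up for illustration.

```python
import os
import shutil
from pathlib import Path


def daz_render_temp_dir(appdata):
    """Build the temp render folder from the roaming AppData path."""
    return Path(appdata) / "DAZ 3D" / "Studio4" / "temp" / "render"


def rescue_renders(src, dst):
    """Copy every file in src to dst; return the copied file names."""
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        return []  # folder missing: nothing to rescue
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            shutil.copy2(item, dst / item.name)
            copied.append(item.name)
    return copied


if __name__ == "__main__":
    appdata = os.environ.get("APPDATA")  # only set on Windows
    if appdata:
        found = rescue_renders(daz_render_temp_dir(appdata),
                               Path.home() / "Desktop" / "rescued_renders")
        print(found or "nothing found in the temp render folder")
```

It copies rather than moves, so the originals stay put in case Studio still needs them.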
I tend to agree with nicstt on this. Rule #1 of fixing stuff is "find out what the problem is first before you fix it"
And certainly before you do drastic stuff like replacing CPU thermal paste...
Your computer can suddenly die for MANY different and seemingly unrelated reasons, such as:
That's why I strongly recommend that people who use their PC's for heavy duty stuff like rendering make it a standard practice to fire up a monitor application when you're rendering so you can at least monitor usage and temperatures. I use free apps like GPU-Z by TechPowerUp, and CPUID HWMonitor, and watch CPU and GPU usages and temperatures. If the computer crashes when temps are low (significantly less than say 100C), then you might look elsewhere. They're a quick and easy download, and can help immensely in determining how your system is performing.
And whatever you do, make sure you don't have a problem with drivers, especially with your GPU. Cuz apparently the most difficult task in the universe of software development is making reliable GPU drivers. They cause a large percentage of unknown issues and crashes.
Not long ago I had random crashes due to a failing hard drive. Drove me nuts. But since I was monitoring my CPU and GPU, etc., I knew they weren't the cause.
Yes, make sure your cooling system is working, and there are no dust bunnies. But more importantly, monitor your system regularly to help you determine where the issues are. Cuz you can blow your dust filters all day with a leaf blower if you want, but that won't solve crashes from a bad set of drivers.
Off the top, from what the OP mentioned, it seems like the CPU isn't being used, and therefore thermal overload of the CPU MIGHT not be the cause. But again, it's easy to assume, but without a monitor it's just guessing. Maybe some other software/bug might be unexpectedly overcranking the CPU for some reason. Not long ago there was a bug in Malwarebytes that caused my 16 CPU cores to run at 100%.
And keep in mind, what seems like it might be the obvious cause might just be coincidental. Using Studio at the time might be a coincidence. Maybe Studio is using the GPU drivers in a way that causes a crash due to a driver issue.
By the way, one other easy tool for helping to figure out what went wrong, if you're using Windows 10, is the "Reliability History". Under Control Panel/Security and Maintenance/Maintenance, go to "View reliability history" and it gives you a daily summary of what happened system-wise, including failed updates, crashes, etc.
Of course you're still dealing with the insanely cryptic Windows error messages, but this section can be extremely helpful to diagnose what really happened.
Thank you to Alien Renders for his advice. I was having the same problem with the CPU getting real hot during renders. I couldn't figure out why this would be, since I have an Nvidia card and it is supposed to be using it to render. Checked my render settings....yep, Preferred Device set to video card. So I go to the Daz log file like Alien Renders suggested, and sure enough, it is using the CPU to render. So I open the Task Manager window with the Performance tab showing and run a render, and my CPU usage instantly goes to 100% and stays there until I quit rendering. So I go back to render settings and, while I have a render going in the viewport and the Task Manager window up, I checkmark the CPU in the render settings. CPU usage instantly drops to 2%! So it turns out that I have to have both CPU AND graphics card selected as preferred device. If I only have the graphics card selected it ONLY uses the CPU. Very strange. But anyway, at least on my system, having both selected causes DAZ to use the graphics card, and now as I look while having a render going, my CPU temp is 28 degrees C instead of 80 like it was when using the CPU to render. Thank you thank you thank you Alien Renders!
Something that I do with my 1080 Ti in my SFF build is undervolt the card. Check to see if your graphics card has a utility that can control voltage.
If so, try dialing the voltage back a bit. Dialing it back too much can result in the card crashing, but I'm able to undervolt mine by about 25% or so, which drops the heat from the high 80s to the mid 70s. This actually doesn't reduce the speed of the card by a whole lot, only by about 5-10%. My graphics card widget includes a 'temperature target' setting, which I set to a lower temp; it adjusts the voltage accordingly and also throttles the card when it hits that temp.
Graphics card manufacturers lately have taken to pushing up the voltages to eke out a few extra percentage points of performance while still hitting that 'fine line' of extra power versus longevity. Overclockers will often push the voltages up even further, but this can shorten the life of your card, or kill it if you push it too hard.
I have a shorter card, so it only has two fans cooling the 1080 Ti, and a shorter heatsink, which is why it tends to run a bit hotter than other 1080 Tis. The case also isn't 'ideal' for cooling as it's pretty small, with the GPU in its own section of the case and no place for an intake fan, being a SFF build, hence what others might consider higher temps. This is my interim Daz build while I keep kicking the can on my dedicated render station build. My latest excuse is that I want the desktop version of Ryzen Renoir, which won't be released for a bit. I've been using it for over a year now in this config without any significant issues. It'll eventually be retired to HTPC duty, which was always my intention.
Anyways, yeah check to see if your graphics card software allows you to reduce the card voltage, or to set lower temperature targets for your GPU. This might help you with your temps, and get you across the finish line with your current render.
Also, check to see if you can adjust the ramp-up of your fan speed via software. On my card, the default doesn't max out the fan speed until much higher temps are reached, but I was able to adjust the fan curve so that the fan speed hits maximum much earlier, say at 60C. The card will still heat up, but a bit more slowly, as you are shedding more heat early on. This can help stave off GPU throttling by a couple of minutes, allowing your card to run at higher speeds a bit longer.
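To picture what moving the fan curve does, here is a small sketch that treats a curve as straight lines between (temperature, fan percent) points. Both curves below are made-up illustrations, not the defaults of any real card, and actual fan utilities may interpolate differently.

```python
def fan_percent(temp_c, curve):
    """Piecewise-linear fan speed (percent) for a temperature in C.

    curve is a list of (temp_c, percent) points sorted by temperature.
    """
    if temp_c <= curve[0][0]:
        return curve[0][1]
    for (t0, p0), (t1, p1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            # Interpolate between the two surrounding points.
            return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)
    return curve[-1][1]  # past the last point: hold the final speed


# Hypothetical curves for illustration only.
stock_like = [(30, 30), (60, 50), (85, 100)]
aggressive = [(30, 30), (60, 100)]

for temp in (45, 60, 75):
    print(temp, fan_percent(temp, stock_like), fan_percent(temp, aggressive))
```

With the second curve the fan is already flat out at 60C, which is the adjustment described above.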
As for the CPU, depending on which CPU and motherboard you have, you might also be able to adjust the fan curves and such with utilities, if cleaning out your pc, improving airflow, and such isn't solving your problem. Also, maybe let the computer sit a few minutes after completing the render before attempting to save the render, to see if letting the CPU sit at idle for a few minutes will help your situation. Make sure your Daz viewport is set to a non-Iray mode as well before starting your render, as the Iray viewport is more demanding than the other modes.
Just noticed that the OP is from a while ago, but this advice is probably still useful for others.
@chuckitaway_6a8e23584d
Make sure you have at least the minimum driver required for the version of Studio that you use. If on Windows 10, make sure an update hasn't overwritten the Nvidia driver w/ a crappy MS one. Avoid using the "latest" Nvidia drivers until you really need to. Finally Iray will drop to CPU if the scene can't fit everything into VRAM. Even w/ 11GB VRAM you won't be able to fit multiple characters w/ 4K textures and a detailed high poly environment.
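For a feel of how quickly textures eat VRAM, here is a back-of-the-envelope sketch. It assumes a "4K" texture is 4096 x 4096 pixels stored uncompressed at 4 channels of 1 byte each, and it ignores geometry, the frame buffer, and any texture compression the renderer may apply, so treat the result as a rough guide only. The function names are hypothetical.

```python
def texture_mib(width, height, channels=4, bytes_per_channel=1):
    """Uncompressed size of one texture in MiB."""
    return width * height * channels * bytes_per_channel / 2**20


def fits_in_vram(texture_count, vram_gib, width=4096, height=4096):
    """True if the uncompressed textures alone fit in the given VRAM."""
    needed_mib = texture_count * texture_mib(width, height)
    return needed_mib <= vram_gib * 1024


print(texture_mib(4096, 4096))   # size of one 4K RGBA texture in MiB
print(fits_in_vram(100, 11))     # hypothetical scene with 100 such maps
print(fits_in_vram(200, 11))     # hypothetical scene with 200 such maps
```

Characters often carry many maps each (diffuse, bump, normal, and so on), so the count climbs faster than the number of figures suggests.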
Well forget everything I said before. Well not everything, but it turns out I was wrong. When I saw my CPU usage go from 100 to 2% by ticking the boxes in render settings, I THOUGHT that it had stopped using the CPU and was only using the GPU. What actually had happened was it just stopped rendering altogether. I discovered that by going back to the log file and seeing that the render had been aborted. And then I saw all these lines:
IRAY rend error: CUDA device 0: cannot get memory statistics (unspecified launch failure)
IRAY rend error: CUDA device 0 (GeForce GTX 970): unspecified launch failure (while allocating host mapped memory buffer)
IRAY rend error: CUDA device 0 (GeForce GTX 970): device cannot be used as it does not support mapped memory (canMapHostMemory)
1.0 IRAY rend error: CUDA device 0 (GeForce GTX 970): Device initialization failed, will not be used
IRAY rend warn : There is no CUDA-capable GPU available to the iray photoreal renderer.
and so on..... so then I thought maybe I should check for an NVidia driver update for my card (GTX 970) and there was one released just a week ago. I did a clean install of the driver because I remembered I had messed with the settings in the NVidia control panel a few weeks ago after I had gotten a new monitor. Well it seems that fixed everything. Now with only GPU selected in the render settings, it is now utilizing the GPU. I think I probably screwed something up a few weeks ago when I was messing with the settings in NVidia control panel.
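For anyone who would rather not read the whole log by eye, here is a minimal sketch that scans log lines for the kind of errors quoted above. The patterns are taken only from the lines in this thread, so other versions of Studio may word things differently, and the function name is made up.

```python
import re

# Patterns taken from the error lines quoted in this thread.
ERROR_RE = re.compile(r"IRAY\s+rend error\s*:\s*(.*)")
NO_GPU_RE = re.compile(r"no CUDA-capable GPU available", re.IGNORECASE)


def scan_iray_log(lines):
    """Return (error_messages, gpu_unavailable) for an iterable of log lines."""
    errors = []
    gpu_unavailable = False
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            errors.append(match.group(1).strip())
        if NO_GPU_RE.search(line):
            gpu_unavailable = True
    return errors, gpu_unavailable


sample = [
    "IRAY rend error: CUDA device 0 (GeForce GTX 970): Device initialization failed, will not be used",
    "IRAY rend warn : There is no CUDA-capable GPU available to the iray photoreal renderer.",
    "IRAY rend info : Rendering...",
]
errors, gpu_unavailable = scan_iray_log(sample)
print(len(errors), gpu_unavailable)
```

To run it on a real log, pass an open file object for the log that the Help menu opens in place of the sample list.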
2020-04-04 04:29:16.421 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Rendering...
2020-04-04 04:29:16.475 Iray [INFO] - IRAY:RENDER :: 1.14 IRAY rend info : CUDA device 0 (GeForce GTX 970): Allocated 236.672 MiB of work space (280k active samples in 0.001s)
2020-04-04 04:29:18.876 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00002 iterations after 13.001s.
2020-04-04 04:29:21.263 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00003 iterations after 15.388s.
2020-04-04 04:29:23.633 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00004 iterations after 17.758s.
2020-04-04 04:29:26.011 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00005 iterations after 20.135s.
2020-04-04 04:29:28.384 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00006 iterations after 22.508s.
2020-04-04 04:29:30.759 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00007 iterations after 24.884s.
2020-04-04 04:29:33.138 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00008 iterations after 27.263s.
2020-04-04 04:29:35.518 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00009 iterations after 29.642s.
2020-04-04 04:29:37.892 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00010 iterations after 32.017s.
2020-04-04 04:29:40.264 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00011 iterations after 34.388s.
2020-04-04 04:29:42.631 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00012 iterations after 36.755s.
2020-04-04 04:29:45.002 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00013 iterations after 39.127s.
2020-04-04 04:29:47.373 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00014 iterations after 41.498s.
2020-04-04 04:29:49.741 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00015 iterations after 43.866s.
2020-04-04 04:29:52.121 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00016 iterations after 46.246s.
2020-04-04 04:29:54.499 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00017 iterations after 48.623s.
2020-04-04 04:29:59.224 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00019 iterations after 53.349s.
2020-04-04 04:30:03.964 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00021 iterations after 58.088s.
2020-04-04 04:30:08.697 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00023 iterations after 62.821s.
2020-04-04 04:30:13.419 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00025 iterations after 67.544s.
2020-04-04 04:30:18.126 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00027 iterations after 72.250s.
2020-04-04 04:30:25.288 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00030 iterations after 79.412s.
2020-04-04 04:30:32.586 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00033 iterations after 86.711s.
2020-04-04 04:30:39.715 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00036 iterations after 93.839s.
2020-04-04 04:30:46.780 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00039 iterations after 100.905s.
2020-04-04 04:30:56.189 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00043 iterations after 110.313s.
2020-04-04 04:31:05.607 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00047 iterations after 119.732s.
2020-04-04 04:31:17.362 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00052 iterations after 131.487s.
2020-04-04 04:31:29.239 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00057 iterations after 143.364s.
2020-04-04 04:31:43.457 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:
2020-04-04 04:31:43.457 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce GTX 970): 57 iterations, 6.544s init, 150.405s render
2020-04-04 04:31:43.490 Compiling OpenGL Shader...
2020-04-04 04:31:43.491 Vertex Shader:
Vertex Shader compiled successfully.
Fragment Shader:
Fragment Shader compiled successfully.
Linking Shader:
Shader Program successfully linked.
@ fastbike1.... am I right? everything looks good in the log file?
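One way to sanity-check a log like the one above is to pull out the progress and device statistics lines and work out the iteration rate. This is a sketch built only from the line formats visible in this post; the function names are my own, and other Studio or Iray versions may format these lines differently.

```python
import re

UPDATE_RE = re.compile(r"Received update to (\d+) iterations after ([\d.]+)s")
STATS_RE = re.compile(
    r"CUDA device (\d+) \((.+?)\): (\d+) iterations, ([\d.]+)s init, ([\d.]+)s render"
)


def last_update(lines):
    """Return (iterations, seconds) from the last progress line, or None."""
    result = None
    for line in lines:
        m = UPDATE_RE.search(line)
        if m:
            result = (int(m.group(1)), float(m.group(2)))
    return result


def device_stats(lines):
    """Return per-device dicts parsed from the device statistics lines."""
    stats = []
    for line in lines:
        m = STATS_RE.search(line)
        if m:
            iterations, render_s = int(m.group(3)), float(m.group(5))
            stats.append({
                "device": int(m.group(1)),
                "name": m.group(2),
                "iterations": iterations,
                "render_s": render_s,
                "iterations_per_s": iterations / render_s,
            })
    return stats


log = [
    "2020-04-04 04:31:29.239 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Received update to 00057 iterations after 143.364s.",
    "2020-04-04 04:31:43.457 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (GeForce GTX 970): 57 iterations, 6.544s init, 150.405s render",
]
stats = device_stats(log)
print(last_update(log), stats[0]["name"], round(stats[0]["iterations_per_s"], 3))
```

A statistics line that names the GTX 970 with a non-zero iteration count suggests the card did the work, though that reading of the log is my interpretation rather than something the log states outright.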