Trying to assess temperature, fan and throttling levels

Hi there,

I am a new member here but not new to Linux. I am London-based and have been using various versions of Linux (Ubuntu, Mint, Manjaro, Arch) since 2013.

I am not a programmer, coder or hacker so I struggle to get guidance on the Arch Linux forum.

I have a Lenovo Thinkpad P52 laptop and I am concerned about high temperatures, possible throttling and excessive fan noise.

I am hoping to get some help assessing the situation in terms of reading/analysing the various programs I’ve installed (Thinkfan UI, Gnome System Monitor, auto-cpufreq, tuned-gui, tlp-ui) to try to figure out if there is a problem or if improvements can be made.

I have tended to use Google to find solutions but have hit a dead end now.

If anyone could give instructions for how to provide my system’s information, I will follow that closely and hope not to waste people’s time/energy.

Thanks for reading.

Hi Thinkpad, welcome to the forums!

So I can see three possible scenarios based on what you’ve said so far:

  1. A hardware fault on your machine, which could be a faulty fan, or something as simple as dust in the fan / lack of lubricant
  2. A Linux vs Hardware compatibility issue with regards to fan control
  3. Something running on the Laptop that’s eating way more CPU than expected

Just to identify where we are, could you have a look at installing a package called “psensor”. If you run this up it should be able to read the CPU temperature. (If not, then you might also need to install the “lm-sensors” package, which Arch calls “lm_sensors”.)

If you could run this for 5 mins so it graphs CPU usage vs temp, and make a note of how loud the fan is vs how loud you expect, then post a screenshot of the psensor output, it should give us an idea of what you’re experiencing and which of the three scenarios are least / most likely.

To give you an example of what you’re looking for, this is from my machine which is running Debian Gnome Desktop on a Raspberry Pi 5.

You may also see fan speed on yours which could be useful if you could graph that too … unfortunately my box is fan-less so there’s nothing to show :wink:
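
If a screenshot is awkward, a plain-text log does the same job. The following is only a rough sketch, assuming the kernel exposes a temperature at /sys/class/thermal/thermal_zone0/temp (that path is an assumption; zone 0 is not the CPU on every machine, so check it against what psensor shows). The function names are made up for illustration.

```shell
#!/bin/sh
# Convert a sysfs millidegree reading (e.g. 55000) to whole degrees C.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Print the temperature stored in a sysfs file, in degrees C.
# The default path is an assumption; thermal_zone0 is not always the CPU.
read_temp_c() {
    file=${1:-/sys/class/thermal/thermal_zone0/temp}
    millideg_to_c "$(cat "$file")"
}

# Log one timestamped reading every 5 seconds for 5 minutes (60 samples).
log_temps() {
    n=0
    while [ "$n" -lt 60 ]; do
        printf '%s %s C\n' "$(date +%T)" "$(read_temp_c "$1")"
        sleep 5
        n=$((n + 1))
    done
}
```

Running `log_temps | tee temps.log` while using the laptop normally should give a file that can be pasted straight into a post.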

Dear madpenguin, thanks a lot for replying. What you say makes sense.

I am definitely keen to distinguish which scenario might be the issue, and rule out others.
This is a second hand laptop so it could be scenario 1 and a lack of lubricant. I have opened it to install new storage and used an air duster.

I am suspicious of there being a fan control issue (scenario 2). This seems to be an area that requires more expertise.

I have wondered about scenario 3 also in terms of my drive layout and how it might negatively impact power usage or temperature because I also receive the ‘A start job is running for dev-disk-by… (1min 30s)’ error, although this doesn’t seem to be serious or impair functioning.

I installed psensor but I have about 30 read-out options compared to your 4.
I imagine I would need to remove some of them to give you more insight but I’m not sure which to remove or what most of them are.
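
One way to make sense of a long list of read-outs is to print each sensor alongside the chip that reports it. This is a minimal sketch, assuming the sensors appear under /sys/class/hwmon as they do on most Linux systems; `list_temps` is a hypothetical helper name. Chip names such as "coretemp" (Intel CPU cores), "nvme" (storage) or "thinkpad" are typical, but what appears depends on the drivers loaded.

```shell
#!/bin/sh
# List every temperature sensor under a hwmon root, grouped by chip name.
# The root defaults to /sys/class/hwmon, the usual Linux location.
list_temps() {
    root=${1:-/sys/class/hwmon}
    for chip in "$root"/hwmon*; do
        [ -r "$chip/name" ] || continue
        name=$(cat "$chip/name")
        for input in "$chip"/temp*_input; do
            [ -r "$input" ] || continue
            # Use the matching label file when the driver provides one.
            label_file=${input%_input}_label
            if [ -r "$label_file" ]; then
                label=$(cat "$label_file")
            else
                label=$(basename "${input%_input}")
            fi
            # Some sensors refuse to be read; skip those quietly.
            value=$(cat "$input" 2>/dev/null) || continue
            echo "$name / $label: $(( value / 1000 )) C"
        done
    done
}
```

Calling `list_temps` with no argument prints one line per sensor, which should make it easier to decide which psensor entries are worth keeping on the graph.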

Ok, so … first off, I don’t think there’s anything necessarily wrong with the fan. Laptop (and desktop) fans can be loud when they’re working flat out, the trick is to run the machine so that it doesn’t need the fans flat out to keep it cool.

Unfortunately (!) it would seem the P52 has a bit of a reputation for running “hot”. Looking at your numbers, 100C is really “too” hot. Although different types of chip can take higher temperatures, running over 90C for any chip probably isn’t the best idea for its lifespan.

I’ve seen two suggestions online that could likely make a difference;

  1. The fan will be attached to the CPU via some sort of cradle or screwed down mechanism, but beyond that there will be thermal paste between the CPU and the fan heat sink. Apparently Lenovo paste isn’t so great over time, so replacing that would be a good start. Cleaning off the old stuff is relatively easy, you can buy cleaning fluid or I’ve seen people use lemon juice. (use sparingly in each instance) Then you can buy new paste on Amazon relatively cheaply. I’m pretty sure you’ll find lots of video demos on YouTube of how to reseat fans with new paste, maybe even a video for the P52.
  2. It would seem that the CPU may be a little “overpowered” for the case. If you can boot into the BIOS settings, there should be a section covering the CPU and CPU settings. Somewhere there should be an option for controlling the voltage applied to the CPU, kind of like an over-clocking or under-clocking function. I’ve seen a few posts where people have tried setting the cache power offset to -125 mV, which has resulted in the idle temp going down from 90C to 40C, which sounds like the sort of thing you need. That might also be worth a shot, but I’d try the paste first: if the heat isn’t getting to the fan heatsink, the fan effect will be limited, which “seems” to be the issue you’re having.

Re; the settings, I was looking at the CPU usage (min,max) and the CPU temp. As I say, running up towards 100C with CPU at 40% or less, feels like your cooling just isn’t working, yet if you can hear the fan … (!)
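
To put a number on "CPU at 40% or less" without a graph, the busy percentage can be worked out from two samples of the aggregate "cpu" line in /proc/stat. This is a sketch of that standard calculation rather than anything psensor itself does; `busy_pct` and `sample_busy_pct` are hypothetical names.

```shell
#!/bin/sh
# Compute the CPU busy percentage between two "cpu ..." lines from /proc/stat.
# Fields after the "cpu" tag: user nice system idle iowait irq softirq steal
busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          for (i = 2; i <= 9; i++) total += $i
          idle = $5 + $6            # idle + iowait
          if (NR == 1) { t0 = total; i0 = idle }
          else         { t1 = total; i1 = idle } }
        END { dt = t1 - t0
              if (dt <= 0) { print 0; exit }
              printf "%d\n", ((dt - (i1 - i0)) * 100) / dt }'
}

# Sample the aggregate "cpu" line twice, a given number of seconds apart.
sample_busy_pct() {
    a=$(grep '^cpu ' /proc/stat)
    sleep "${1:-5}"
    b=$(grep '^cpu ' /proc/stat)
    busy_pct "$a" "$b"
}
```

`sample_busy_pct 10` prints the average load over ten seconds; a high temperature alongside a low figure here would point at cooling rather than a runaway process.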

(the start job message shouldn’t be relevant to the heat)

Note: SSDs “can” generate some heat and have a “MUCH” lower heat tolerance, although in this instance it seems to be registering an OK temp. Typical SSDs and NVMe modules seem to be listed with max temps in the 60-70C range, so putting them in “hot” cases can be an issue if the whole case heats up. (there’s a lot to be said for external storage with independent cooling :wink: )

While I think, I recall back in the 90’s taking the lid off a failed Pentium (4?) machine which had for some reason stopped working. I was shocked to discover that the CPU apparently had no cooler on top so was unable to explain how it had worked for a year then suddenly stopped. On closer inspection, the CPU seemed to have a plastic casing around it. Turns out it really did have a cooler on top, it’s just that it got so hot it literally turned the entire fan to liquid plastic, which subsequently set around the CPU when the machine was powered off (!)

The point; don’t let anyone tell you running “hot” is “fine” … :wink:

Now my brain is working, back in 2012 I rescued a failed Dell server (a really big beefy one that someone had thought would run fine in a cupboard with no cooling). After a little testing it appeared that the graphics chip was causing a problem, and sure enough there was no heatsink on top. The heatsink was found loose in the bottom of the case together with the spring-loaded wire fixer, which was previously soldered to the motherboard. The heatsink got so hot the solder softened sufficiently for the spring to overpower it (!) Unbelievably, after re-soldering the heatsink and running the machine “outside” of the cupboard, it continued to work with no problems …

Thanks a lot for this, once again, it is really helpful for making sense of all this info.
I definitely heed your advice about not accepting hot running.

I went to try out the BIOS voltage tweak but there is no option there. I will search for other means of doing this. Might a ‘ThrottleStop’ alternative work for these purposes?

As I monitor it more closely, it isn’t reaching 100C much. I will be vigilant about what might have caused that.
Generally CPU usage at 10% seems to result in CPU temp (and GPU temp) at around 55-60C.

I have been trying ThinkFan UI which has a feature to turn the fan on ‘full throttle’ and this brings those temps down to around 40C, then with full throttle switched off, temps go up to 50C after ten minutes and seem to stay for up to an hour at low CPU activity.

I would be keen to replace the thermal paste to see if it has an impact. I would rather not risk doing it myself, so are there any places you would recommend to try or to avoid? This job could also be a chance to check/optimise the RAM configuration for being ‘dual channel’ as I read that can help performance.

I am struggling to distinguish the NVMe modules for storage in the list of temperature. Preferences does provide some extra info but doesn’t ID each drive and there are 3 NVMe readings (for one drive?) at around 50C (fans at full throttle bring it down to 40C where it then seems to stay). I think one of them might not have a thermal pad attached - do you think this is worth investing in? Or a heatsink (not sure one would fit but interested in your opinion)?

Hi, Ok, so … :slight_smile:

BIOS: I have seen “throttlestop” mentioned with regards to changing the voltage setting. I sort of assumed this might be a label inside the BIOS setup, but if you’ve seen “throttlestop” somewhere you probably know more than me. “Somewhere” there is apparently the facility to change the voltage offset and -125 mV is said to help lower the temp. Unfortunately since IBM sold the brand it’s not a range of hardware I’ve really had much physical contact with.

The fan noise “should” be linked to the CPU temperature. The motherboard should provide a fan header (probably PWM on a PC) which powers the CPU fan, so the fan speed should be linked to the temperature. I think on some machines it may be a linear increment, on others it’s a pre-programmed increment. On RPi5s for example by default I think the fan comes on at ~60C and is max at 75C, so under “normal loads” at sub 60C there is no fan noise.
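
To picture what a "linear increment" means, here is a toy version of such a fan curve. It is only an illustration, not how any particular firmware works: the 60C and 75C defaults are the approximate RPi5 figures quoted above, and `fan_percent` is a made-up name.

```shell
#!/bin/sh
# Linear fan curve: 0% at or below ON_C, 100% at or above MAX_C,
# and a straight-line ramp in between (integer arithmetic, rounds down).
# The 60/75 defaults are the approximate figures quoted above, not a spec.
fan_percent() {
    temp=$1
    on_c=${2:-60}
    max_c=${3:-75}
    if [ "$temp" -le "$on_c" ]; then
        echo 0
    elif [ "$temp" -ge "$max_c" ]; then
        echo 100
    else
        echo $(( (temp - on_c) * 100 / (max_c - on_c) ))
    fi
}
```

With those defaults, `fan_percent 66` gives 40, so a CPU sitting in the mid 60s would have the fan at well under half speed. A stepped curve would jump between a few fixed speeds instead of ramping smoothly.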

The full-throttle to bring the temp down feels a little like using a fire extinguisher on a volcano … it might temporarily address the symptoms but it’s not really going to solve your problem :wink:

Thermal paste … yeah, sounds a bit scary and there is a risk when you open up computers. You can probably get it done at a PC World / Currys, or maybe anyone who offers a gaming PC build service … they’ll probably charge you, and unless they’re a Lenovo laptop shop, I’m not sure they’re going to bring any specialist knowledge to the table that you couldn’t acquire via YouTube.

In general, don’t wear anything you know generates a static charge (like a jumper), touch a radiator before you start, limit your screwdriver contact to screws, and avoid touching chips etc as much as possible. A little cleaning fluid and cotton wool will polish up the surface (of the CPU and fan heat sink) so it’s completely smooth. Make sure it’s dry, then use a spreader (light plastic or at a push some glossy cardboard) to get a thin film of paste all over the CPU (not the heatsink), then stick the heatsink back on and screw it down. As I say, there should be lots of video tutorials on YouTube which should help.

Memory … well dual channel should give you better performance. However, I’m a great believer in “if it ain’t broke”, so first I’d ask the question “is performance an issue?”. If it only has single channel memory in it, then there will be a reason for this. The reason might be as innocuous as “single channel was cheaper” or “only single channel was available”, but it might also be “dual channel makes it too hot” or “dual channel makes it unstable”. So consider whether you “need” more speed vs the risk it’s going to cause you “another” problem.
(maybe less so for a desktop, but laptops tend to be highly optimised to get too much performance into a box that’s too small and difficult to cool …)

For desktops, you can buy memory with heatsinks attached … I’ve one machine here with Corsair memory where each stick comes with an aluminium jacket. Not seen this with NVMe tho’. My perception is that the issue with NVMe is the risk to the memory from system heat, rather than the actual heat they generate. I know they get hot, just maybe not as hot as CPU / graphics / RAM. (I guess maybe it depends on how hard and consistently you hit your storage) So I guess the question is, is the 50C a result of the NVMe working, or the environment resulting from the CPU heat. I would assume the latter, but if you can address the CPU temp issue first it would then answer the question as to whether you need to explicitly cool the NVMe. (My guess would be “no”)

I had to replace the fan unit in my Lenovo Thinkpad T500 because it was making awful noises although it still cooled the CPU. As MP says; there are plenty of YouTube videos to show in minute detail how to dismantle the laptop and how to clean everything and replace the heat-conductive paste. If you have a spare computer, it’s a good idea to watch/pause the video as you are doing it. I drew a diagram of the screw positions and laid out the screws on the diagram as I removed them, as replacing a long screw in the wrong position can cause damage.

One thing I don’t quite agree with is spreading the paste over the surface before mating the CPU with the fan. This is likely to trap air here and there, resulting in uneven cooling and unpredictable results. Unless the paste is very stiff I recommend putting a reasonably-sized blob in the centre of the CPU surface and then clamping everything together. This will force the paste outwards pushing out any air in the process. Remove any excess paste that gets squeezed out. The paste instructions will be your best guide.

Hope that helps a bit. My T500? Works a treat now.

Thanks for this, Keith.

I’m glad and encouraged to hear your job worked out well.

I will certainly use a second computer to carry this out as carefully as possible, if I do. For now it is a bit too daunting but if I can’t find a professional option locally, I think I will have to give it a try and will follow your generous advice.

Thanks again for this.

“throttlestop” seems to be software that Windows users undervolt with when it isn’t possible in BIOS. So I was searching for a Linux alternative. I’m not sure if intel-undervolt might enable the -125 mV voltage offset. Have you any experience of it? I’m tempted to give it a try, hoping it won’t be damaging…

I feel like fan noise and speed are linked directly to the CPU temperature, so the fan control is working at least. That’s what the full-throttle testing was for, but I agree with you, it’s definitely not a good solution. My fan doesn’t seem to come on until around 70C.

I’ve quite a lot of experience of opening up computers, nearly all positive. But what you and Keith describe does feel particularly invasive and yes, scary! I think I will try to track down a Lenovo laptop shop and failing that, will have to take the plunge… And if I do, I will certainly follow your advice step by step, so thank you for that.

I take your points about dual channel memory and deprioritising the NVMes, and will leave it be.

Hi, I’ve not seen “intel-undervolt” before, but then the only Intel chip I have is in my laptop. (I’ve been using AMD only for a very long time) Looking at the docs I’m not sure this is going to be a great option. Either way, I’d try the heat sink paste first.

If the fan really doesn’t come on until 70C, then it might be worth looking at the BIOS fan control (could it be that it’s coming on, but it’s really quiet at 70C?). Usually there is some control over when the fan comes on in the BIOS; the one I’m used to in my BIOS is “quiet mode”. Turning this on decreases the voltage applied to the fan so it runs slower.
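
One way to check whether the fan is spinning quietly below 70C is to read the speed the firmware reports. This is a rough sketch, assuming the thinkpad_acpi kernel module is loaded, in which case ThinkPads normally expose /proc/acpi/ibm/fan with "status", "speed" and "level" lines. `fan_field` is a hypothetical helper, and the second argument exists only so it can be pointed at a different file.

```shell
#!/bin/sh
# Print one field (status, speed or level) from the ThinkPad fan file.
# Assumes the thinkpad_acpi layout, e.g. a line such as "speed:   2563".
fan_field() {
    file=${2:-/proc/acpi/ibm/fan}
    awk -v key="$1:" '$1 == key { print $2 }' "$file"
}
```

If `fan_field speed` prints a non-zero RPM while the laptop is sitting at 50-60C, the fan is running and is simply quiet; a steady 0 until 70C would back up the idea that it starts late.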

Given changing the voltage may be an issue, the other approach might be to limit the max CPU frequency. Usually on modern chips the idle frequency should be a fraction of the maximum frequency, then when you load the system the clock speed will increase dynamically up to its max, which these days seems to be in the 3-4GHz range. If you limit the top-end, then you’ll limit the performance and the heat it can generate.

Maybe try;

$ sudo apt install cpufrequtils

To get the tools.
Then for example on my box;

$ cpufreq-info -c 0
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 4294.55 ms.
  hardware limits: 1.55 GHz - 3.40 GHz
  available frequency steps: 3.40 GHz, 2.80 GHz, 1.55 GHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance, schedutil
  current policy: frequency should be within 1.55 GHz and 3.40 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.39 GHz.
  cpufreq stats: 3.40 GHz:2.53%, 2.80 GHz:1.33%, 1.55 GHz:96.14%  (7148926)

Then based on this information you can use;

# bash shell (run as root, e.g. via sudo)
for i in 0 1 2 3 4 ....
do
    cpufreq-set -c "$i" -u nGHz
done

(replacing … with the rest of the numbers relating to the CPU cores you have)
(replacing nGHz with the maximum frequency you want to use)
i.e. use the OS to set a soft limit on the maximum frequency it can use

Alternatively (or in addition) you could try changing the governor with -g, I think “-g conservative” will scale up the CPU frequency more slowly. (or if you just want to prove the use-case, try “powersave” which should run on minimum frequency)
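
If cpufrequtils isn’t available, the same soft limit can usually be set through sysfs directly. This is a minimal sketch, assuming the kernel exposes the standard cpufreq files under /sys/devices/system/cpu, where scaling_max_freq is written in kHz. `ghz_to_khz` and `cap_max_freq` are hypothetical names; the change needs root and does not survive a reboot.

```shell
#!/bin/sh
# Convert a frequency in GHz (e.g. 2.8) to the kHz value sysfs expects.
ghz_to_khz() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000000 }'
}

# Write a maximum frequency to every CPU's cpufreq policy. Needs root.
# The optional second argument only exists so this can be tried on a copy.
cap_max_freq() {
    khz=$(ghz_to_khz "$1")
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        [ -w "$f" ] || continue
        echo "$khz" > "$f"
    done
}
```

For example, `cap_max_freq 2.8` run from a root shell would cap every core at 2.8 GHz without having to list the core numbers by hand. Which governors are on offer can be seen in scaling_available_governors in the same directory.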

Thanks a lot for this, once again. I have made a note and will return to this advice once the thermal re-pasting is done and tested.
