Frozen laptop with constant SSD access [SOLVED]

Keith · 25 September 2023 17:45

Laptop: Dell Latitude E5570
O/S: Ubuntu 22.04 installed about two months ago.

Whilst running Firefox, Thunderbird and another application(?), the laptop virtually froze with the disk access light flashing manically.
I managed to start the system monitor (very slowly!) and it showed:

All four CPUs running at 80-97%
Memory: 99.7% of 8GB
Swap: 100% of 2.1GB
Network: unfortunately I didn’t make a note of that but I switched off the router, which made no difference.

After 10mins I hard rebooted (unwise?) and system monitor showed all back to normal.
Thoughts of malware spring to mind.
Comments/advice would be very welcome.

madpenguin · 25 September 2023 18:36

Mmm, you likely had an application with a severe memory leak. In this instance you’re looking at top to tell you which one it is. Shift+M (i.e. a capital M) sorts by memory usage, and e will then display the numbers in a more readable format. It’s the RES column you’re interested in. k will kill (by default) the top listed process (which, if you’ve done the above, is likely the culprit).
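
If you'd rather get the same ranking without the interactive keys, here is a minimal sketch that reads the resident memory of each process straight from /proc. It assumes a Linux /proc filesystem; the function names are made up for illustration, and the VmRSS figure it reads is the same kind of number top shows under RES.

```python
import os
import re


def parse_status(text):
    """Return (name, rss_kib) from the text of a /proc/<pid>/status file."""
    name = re.search(r"^Name:[ \t]*(.*)$", text, re.MULTILINE)
    rss = re.search(r"^VmRSS:\s*(\d+)\s*kB", text, re.MULTILINE)
    # Kernel threads have no VmRSS line, so they are reported as 0.
    return (name.group(1) if name else "?", int(rss.group(1)) if rss else 0)


def top_by_rss(limit=5, proc="/proc"):
    """List the `limit` largest processes as (rss_kib, pid, name), biggest first."""
    rows = []
    if not os.path.isdir(proc):
        return rows
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # only the numeric directories are processes
        try:
            with open(os.path.join(proc, entry, "status"), errors="replace") as fh:
                name, rss = parse_status(fh.read())
        except OSError:
            continue  # the process exited (or is unreadable) while we were looking
        rows.append((rss, int(entry), name))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for rss, pid, name in top_by_rss():
        print(f"{pid:>8} {rss / 1024:10.1f} MiB  {name}")
```

Browsers run as many processes, so expect several entries with the same name rather than one big one.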

When you say malware, this as described is more likely buggy software. Firefox and Thunderbird would be my first suspects. Thunderbird in particular.

Example top;

There are some websites which have a nasty habit of triggering this in browsers, and certain browsers handle the situation better than others. Chrome can get hit in the same way but tends to recover 30 seconds after your machine runs out of memory. I won’t mention any websites in particular, I especially won’t mention LinkedIn, but if it is the browser, you might find it useful to look back in your history to see which tabs were open when it crashed.

For what it’s worth, I don’t consider Firefox sufficiently stable for me to run it for anything other than testing. In recent times I find it crashes after a while, nearly every time I use it. If you don’t want Google’s ads etc, then Chromium is a good bet.

In Ubuntu;

snap install chromium

(although there seem to be quite a few others now that I’ve not tried)

Keith · 26 September 2023 08:12

Many thanks for your very helpful reply.

Top looks like a very useful command and today FF and Thunderbird are top of the list (and behaving themselves). When the crash occurred I did have quite a few tabs open, several being rather graphics-heavy. So just being more careful about my tab usage might help in the future.

As for Firefox being buggy: well perhaps it is but you’ve helped me deal with that now. You recommend using Chromium to avoid Google ads, but Wikipedia says:

Chromium is a free and open-source web browser project, mainly developed and maintained by Google. This codebase provides the vast majority of code for the Google Chrome browser, which is proprietary software and has some additional features.

…and you know how I try to avoid Google products! The Brave web browser is also based on Chromium, apparently, so I’ll stick with FF using DuckDuckGo and Privacy Badger - the latter two being very good at maintaining my privacy. At least I believe so.

Keith

madpenguin · 26 September 2023 10:53

Mmm, the Chromium project has ~ 3000 project members and whereas Google and its employees may contribute to the project, they don’t own it, nor is it a Google product. (whereas google-chrome “is”)

If you take a look at the license, it seems to be a BSD-style license, much like the MIT license. As I read it, you can do pretty much anything with it so long as you don’t use Google’s name. This means any other browsers based on the code are also not Google or Google-related products. (it looks like the project has been forked ~ 6000 times)

If we’re going to treat projects to which Google contributes as projects to avoid, then take a look at the 2022 Linux kernel contributions table (!)

(image: 2022 Linux kernel contributions table)

And even worse, I’m expecting M$ to be up there in a fairly prominent position in the 2023 table. (from what I hear of the amount of code they’ve been putting forward)

The mitigation from the kernel perspective is that it has to get past many non-Google eyes, and then it has to get past Linus. From a browser perspective, the source code is all visible and many project members have nothing to do with Google. Many of them will be browsing every change, so anything untoward will get picked up by people who are very hot on privacy.

From memory, Chrome started off using a web engine called WebKit, which was also used in other browsers. Since then they’ve moved over to an engine called Blink. It’s this engine which is common to Chromium and Chromium-derived projects. More specifically, I think the macOS browser was based on WebKit (not sure if they’ve moved) but M$'s new Edge browser is based on Blink.

As you can see, most browsers / vendors seem to be moving in the direction of Blink.

The really ironic thing about Mozilla is where the money comes from. It would appear that Google pay them somewhere between $400M and $450M per year, out of a turnover of around $500M, so the Firefox project would seem to be a project almost exclusively funded by Google. I was going to look at how many developers are involved in Firefox vs. Chromium’s 3000, but it would appear the Firefox source code isn’t actually on GitLab or GitHub, it’s on Mozilla’s own Mercurial server, so it’s not immediately obvious to me how many people are involved or who they are. (i.e. FF isn’t terribly transparent in this respect)

Some recent comments by Mozilla employees would seem to indicate that they think that FF is dead. However, because someone (!) is still paying the bills, the lights are still on.

DuckDuckGo

I too have been using DuckDuckGo. However have you noticed they’ve just added “Most Visited Sites” … this caused me to go look at what is being loaded when you hit the search page. If you right-click on the page, then “Inspect”, then in the resulting window select “Network”. Now reload the search page and see what it loads in. It would seem to me that a lot of that doesn’t need to be there with regards to their privacy claims. Not least the html snippet called ntp-tracker-stats.html which loads in 10 times every time you hit the search page.

I tend to take privacy claims with a pinch of salt, whoever is making them. If you look at how DDG make money, they claim it’s derived from advertising. (insisting it’s all above-board and they don’t track you or sell personal data) Fundamentally however, they seem to be selling ad space, just like Google.

Once upon a time, in a former reality, Google’s published mantra was; “Don’t be Evil” …I sometimes reflect on how this turned out …

Gaz511 · 26 September 2023 12:38

“After 10mins I hard rebooted (unwise?)”

Before a hard shutdown, it’s best to first press and hold Alt+SysRq, then one by one press and release the following keys in the order ‘R’ ‘E’ ‘I’ ‘S’ ‘U’ ‘B’. The computer should then reboot.
R - takes control of the keyboard
E - terminates processes
I - kills all processes
S - flush data to disk
U - remount all filesystems read only
B - reboot
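
Whether each of those keys actually does anything depends on the kernel.sysrq setting, which is a bitmask. A minimal sketch for decoding it follows, using the bit values from the kernel's sysrq documentation; the mapping of letters to bits is my reading of that documentation, and the function name is made up for illustration. I believe Ubuntu ships with a value of 176, which would allow only S, U and B, but check your own system rather than trusting that.

```python
# Bit needed for each key in the sequence (per the kernel's sysrq documentation):
#   4 = keyboard control (unraw), 64 = signalling processes (term, kill),
#   16 = sync, 32 = remount read-only, 128 = reboot/poweroff.
REISUB_NEEDS = {"R": 4, "E": 64, "I": 64, "S": 16, "U": 32, "B": 128}


def allowed_keys(value):
    """Return the REISUB letters permitted by a kernel.sysrq value."""
    if value == 0:
        return ""            # SysRq disabled completely
    if value == 1:
        return "REISUB"      # exactly 1 means every function is enabled
    return "".join(key for key, bit in REISUB_NEEDS.items() if value & bit)


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/sysrq") as fh:
            current = int(fh.read())
        print(f"kernel.sysrq = {current}, allows: {allowed_keys(current) or 'nothing'}")
    except OSError:
        print("No /proc/sys/kernel/sysrq here (not Linux?)")
```

If the result is missing letters, the full sequence can be enabled with sudo sysctl kernel.sysrq=1, at the cost of letting anyone at the keyboard use it.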

I think whoever came up with this had three hands!

With regards to browsers and privacy, Brave or Vivaldi seem to be regarded as the better ones (although not perfect) of the Chromium-based browsers, and with Firefox getting some bad publicity recently I have used the LibreWolf fork for the last year or so.

madpenguin · 26 September 2023 12:55

Incidentally, swap sitting at 100% will have been why it was killing your SSD. If this is a recurring issue, what you could do is disable swap (so long as you have enough memory to run under normal circumstances) by commenting out the swap line in /etc/fstab and rebooting.

(or you could do swapoff -a to try it out)
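
To show what "commenting the swap line" means in practice, here is a minimal sketch that takes the text of an fstab and returns it with any active swap entry commented out. It only prints the result and never writes to /etc/fstab; the function name is made up for illustration, and the sample entry in the comment is what I'd expect a default Ubuntu install to contain, which is an assumption about your system.

```python
def comment_out_swap(fstab_text):
    """Return fstab text with every active swap entry commented out."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # An fstab entry is: device, mount point, type, options, dump, pass.
        # e.g. "/swapfile none swap sw 0 0"
        is_active = bool(fields) and not fields[0].startswith("#")
        if is_active and len(fields) >= 3 and fields[2] == "swap":
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    try:
        with open("/etc/fstab") as fh:
            print(comment_out_swap(fh.read()), end="")
    except OSError:
        print("Could not read /etc/fstab")
```

Keep a copy of the original file before editing by hand, since a broken fstab can stop the machine booting cleanly.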

With no swap, the errant application should cause itself (or a.n.other application) to generate a fatal OOM issue as soon as the memory runs out, rather than when the swap runs out. (i.e. much more quickly)

I run with no swap and on occasion I do get issues like this; it seems to take 20-30 seconds to right itself. If you need swap, i.e. it’s used all the time, it will drag your performance down and wear your SSD. In this instance memory is generally fairly cheap (£1.50/GB?), so maybe worth considering an upgrade …

Keith · 26 September 2023 15:18

@Gaz.
Many thanks for this. I remember being told this several years ago but didn’t make a note of it. And would you believe my computer doesn’t have a SysRq key? I tried Ctrl+Alt+Delete, of course, but that seems to be old hat now.

@MP
Well, I’m sort of reassured and I might try Chromium. I had no idea that so many powerful players contribute to community s/w, and it’s no wonder that novices like me are in the dark when it comes to trying to make their browsing safe.
8GB is not that big these days and I may well be pushing the swap a bit. I’ll have a go at disabling it. I am particularly keen not to wear out the SSD, as I believe they can be a bit short-lived if exercised too much. And I’ll look into increasing the RAM if possible for this Dell.

Thanks both.
++++++++++++++++++++++++++++++++++++++++++++++++
[EDIT] From the swapoff man pages:

OPTIONS
-a, --all
All devices marked as “swap” in /etc/fstab are made available, except for those with the “noauto” option.
Devices that are already being used as swap are silently skipped.

Does this mean that if there is any swap occurring at the time of the command, that device (ie RAM) will be left to continue swapping?

madpenguin · 26 September 2023 15:48

I think the SSD lifespan issue comes down to how good the manufacturer’s wear levelling software is (and it’s almost impossible to tell). Personally I always try to keep a good bit of free space on my SSDs as, regardless of the algorithm used, keeping your drive mostly full is always going to be problematic.

For “dumb” wear levelling, it’s going to recycle free blocks and if your disk is relatively full, it will continually recycle a small number of blocks and wear them far more quickly than the rest of the drive.

More intelligent wear levelling can preemptively move static blocks around, but then it’s having to do more IO’s just to try to level rarely written blocks. So, the more free space you keep on the SSD, the less work it has to do / the more evenly writes will be spread.
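
The effect of free space on "dumb" levelling can be shown with a toy model. This is only a sketch under big assumptions: blocks holding existing data are never moved, every write lands on the next free block in rotation, and real controllers (with over-provisioning, garbage collection and so on) are far more involved. The function name and numbers are made up for illustration.

```python
def dumb_wear(total_blocks, used_blocks, writes):
    """Erase counts per block when writes only ever rotate through the free blocks."""
    free = total_blocks - used_blocks
    if free <= 0:
        raise ValueError("need at least one free block")
    erases = [0] * total_blocks
    for i in range(writes):
        # Blocks 0..used_blocks-1 hold static data and are never touched;
        # each write goes to the next free block in round-robin order.
        erases[used_blocks + i % free] += 1
    return erases


if __name__ == "__main__":
    for used in (50, 90, 99):
        worst = max(dumb_wear(100, used, 10_000))
        print(f"{used}% full: busiest block erased {worst} times")
```

With 100 blocks and 10,000 writes, a half-full drive spreads the load so each free block is erased 200 times, while a 90% full drive puts 1,000 erases on each of its ten free blocks.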

I guess the answer is, always keep backups …

Just as an example of not being able to tell; going back to when SSDs were still quite new, I deployed a number of servers to a Data Centre with 64GB SSDs as boot disks. All the same manufacturer (big name brand), probably all from the same batch. All went well, until one day, all the servers from that deployment threw the same error at the same time. And it was weird. All system disks had gone into read-only mode at the same time, so the majority of the software (which was typically working in memory) was kind of limping along, but all logging was dead. I initially thought I’d worn through them in 3 months.

Turns out (and this isn’t even the fault of the SSD manufacturer) the controller chip (I think it was the SATA controller) inside the SSD, which controlled access to the data, had a, well … let’s call it a feature. The feature was, if the date/time > a specific point in time, put the drive into read-only mode. I kid you not, this was embedded in the SSD controller chip.

Fortunately (!) they put out a firmware patch that could be loaded through the BIOS, but booting servers over the phone from external USB drives containing bootable hardware patches is not my idea of fun. But unless you’re going to go through stuff at that level, there’s no real way to tell what it’s doing or how good it is.

madpenguin · 26 September 2023 15:50

Ok, so, swapoff -a can be interesting. If the amount of memory currently committed is > your physical memory, the command will fail. If however your physical memory > used + swap used, swapoff will shuttle swap space back into real memory, then when swap usage reaches zero, it will turn the swap off.

Try running top in one terminal, then run sudo swapoff -a in another, and you can watch the swap usage decrease.
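
Before trying it, a rough check of whether the swap in use would fit back into RAM can be made from /proc/meminfo. This is only a sketch: MemAvailable is the kernel's own estimate, the comparison is my simplification of the condition described above rather than what swapoff itself does, and the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Turn the text of /proc/meminfo into {field: value}, values mostly in KiB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swapoff_should_fit(info):
    """True if the swap currently in use looks small enough to move back into RAM."""
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return swap_used <= info["MemAvailable"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            mem = parse_meminfo(fh.read())
        used_mib = (mem["SwapTotal"] - mem["SwapFree"]) / 1024
        print(f"swap in use: {used_mib:.0f} MiB, "
              f"available RAM: {mem['MemAvailable'] / 1024:.0f} MiB, "
              f"looks safe: {swapoff_should_fit(mem)}")
    except (OSError, KeyError):
        print("Could not read the expected fields from /proc/meminfo")
```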

Keith · 26 September 2023 17:13

I didn’t realise that it was the manufacturer that provided the levelling s/w, and it would be good to know how manufacturers compare in that regard, though probably difficult to tell, as you say.
Just now, system monitor tells me that my 8GB memory is 68% used and no swap, with just a few applications running, so it looks like an upgrade would be advisable?

Thanks for the explanation of the swapoff command. Complicated, ain’t it?

madpenguin · 26 September 2023 17:34

Mm, as I understand it we have layers within layers of suppliers. Memory manufacturers typically make the memory, but then someone else will make a PCI controller that goes on a PC motherboard, then someone else will make the chip that interfaces the PC’s PCI bus with the physical memory. (which typically goes inside the SSD device, and it’s this that, as I understand it, does the wear levelling)

So the wear levelling will be down to the controller chip manufacturer, not the brand you think you’re buying. Notably, brands do tend to switch controller chip from time to time and don’t always mention doing so. (if they mention which controller they use in the first place)

Problem seems to be that there aren’t many controller manufacturers, they tend to be based in the East, and they don’t seem to be all that good at sharing technical details; at least my Google search didn’t really turn up much re; levelling code and chips.

Going back to the “dumb” vs “intelligent” wear levelling … many suppliers claim to have intelligent wear levelling which moves infrequently used blocks “anyway”. But on the other hand, they don’t quote any stats re; the performance hit of doing so, and they still seem to recommend using the TRIM command to “manually” tell it when it can try to optimise itself. (which doesn’t seem all that ‘intelligent’)

Historically I’ve found TRIM to be a great way to lock up a drive for minutes at a time, so I’m a little hesitant about using it these days …

Heh, yeah, wait until someone asks what the VIRT column is in top, then you’ll get complicated

Keith · 26 September 2023 22:42

Funny you should mention that as I was going to ask. I guessed it meant something VIRTual. As always, FF and TB are top of the list.

madpenguin · 27 September 2023 01:47

Ok, well don’t say I didn’t warn you

VIRT is indeed a reference to virtual and refers to the address space allocated to the process, rather than the actual memory allocated to the process. A process can request address space that far exceeds the amount of system memory or indeed swap space available. For example the MMAP system call can map such an address space onto a file in such a way that you can read / write the file, simply by modifying locations within the virtual address space.

Why is this useful?

Well consider a situation where you want to write a complex storage system (or have written one) that works in memory. Making this persistent would probably involve a different design and quite a bit of conversion work to make it read / write to one or more files.

Instead, what you could do is simply map the existing memory space onto a MMAP’d file with a few lines of code, and you’re done. The code will think it’s running in memory, but in the background the Linux virtual memory system will do all the required reads / writes to maintain the memory blocks on a disk file. (so you’re letting the Operating System do all of your IO, which is both efficient and robust)
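
As a small illustration of the idea, here is a sketch using Python's standard mmap module: the file is changed purely by assigning into mapped memory, with no write() call on the file, and then read back the ordinary way. The file name and function name are made up for illustration; a real storage engine would obviously do far more than store five bytes.

```python
import mmap
import os
import tempfile


def mmap_roundtrip(path, size=4096):
    """Modify a file by assigning into mapped memory, then read it back normally."""
    with open(path, "wb") as fh:
        fh.truncate(size)            # the file must be at least as large as the mapping
    with open(path, "r+b") as fh:
        with mmap.mmap(fh.fileno(), size) as mem:
            mem[0:5] = b"hello"      # an ordinary slice assignment, no write() call
            mem.flush()              # ask the kernel to write the dirty page back now
    with open(path, "rb") as fh:
        return fh.read(5)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(mmap_roundtrip(os.path.join(folder, "store.bin")))
```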

So, if you map a 100G address space to a file, do you need 100G of disk space for the file?

On filesystems that implement sparse storage (like ext4), then no. It will only consume as much disk space as it needs to store the data you write to the virtual address space.
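
This is easy to see for yourself. The sketch below creates a file with a large apparent size without writing any data, then compares that size with what is actually allocated on disk. It assumes a filesystem with sparse file support; on one without, the allocated figure will simply be close to the full size. The function name is made up for illustration.

```python
import os
import tempfile


def make_sparse(path, size):
    """Create a file of `size` bytes without writing data; return (apparent, allocated)."""
    with open(path, "wb") as fh:
        fh.truncate(size)            # sets the length, leaving a hole instead of data
    st = os.stat(path)
    # st_blocks counts the 512-byte units actually allocated on disk.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        apparent, allocated = make_sparse(os.path.join(folder, "big.bin"), 100 * 1024 ** 2)
        print(f"apparent size: {apparent} bytes, allocated on disk: {allocated} bytes")
```

The same comparison can be made from a terminal with ls -l (apparent size) against du (allocated size).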

So what does 100G in the VIRT column actually mean in terms of memory consumption?

The only thing it really means is that your process is mapping a large virtual address space; it’s not a reflection on how much memory the process is actually using, or even could use. However, it does alert you to potentially odd behaviour with regards to the RES column (the resident set size, or RSS). Some processes may start with a relatively low RSS (physical memory usage) and this may build with time, potentially to many G’s. In this instance you might be inclined to worry about a memory leak. If however you have a large number in the VIRT column, then fear not, the RES figure includes mapped virtual memory pages, which are transient and are effectively just cached pages and not memory your application is directly using.
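
The gap between the two numbers can be demonstrated with a short sketch. It reads VmSize and VmRSS from /proc/self/status, which as far as I know are the figures top reports as VIRT and RES, then reserves a large block of address space and touches only one page of it. Linux only; the function names and the 256 MiB size are arbitrary choices for illustration.

```python
import mmap
import os
import re


def vm_sizes(status_text):
    """Return (VmSize, VmRSS) in KiB from the text of a /proc/<pid>/status file."""
    def field(name):
        match = re.search(rf"^{name}:\s*(\d+)\s*kB", status_text, re.MULTILINE)
        return int(match.group(1)) if match else 0
    return field("VmSize"), field("VmRSS")


def read_self():
    with open("/proc/self/status") as fh:
        return vm_sizes(fh.read())


def demo():
    before = read_self()
    region = mmap.mmap(-1, 256 * 1024 * 1024)   # reserve 256 MiB of address space
    mapped = read_self()
    region[:4096] = b"x" * 4096                  # touch a single page of it
    touched = read_self()
    region.close()
    for label, (virt, res) in (("before", before), ("mapped", mapped), ("touched", touched)):
        print(f"{label:>8}: VIRT {virt / 1024:8.1f} MiB   RES {res / 1024:8.1f} MiB")


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    try:
        demo()
    except OSError as exc:
        print(f"demo skipped: {exc}")
```

The VIRT figure should rise by roughly the size of the mapping as soon as it is created, while RES should barely move, since only the pages actually touched need physical memory.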

To clear the system page cache

To get a clearer view, you can attempt to clear the page cache with;

sudo sysctl vm.drop_caches=1

If you run top at the same time, at the very least you should see the buff/cache number drop when you do this. Not sure if FF uses MMAP, but Chrome allocated itself 32G of virtual space, so if your Chrome is looking a little bloated, this can make a difference.
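
For a before-and-after figure without watching top, the same number can be worked out from /proc/meminfo. As I understand it, current versions of top and free report buff/cache as Buffers + Cached + SReclaimable; treat that formula as an assumption about your version. The function name is made up for illustration.

```python
def buff_cache_kib(meminfo_text):
    """Approximate top's buff/cache figure (in KiB) from the text of /proc/meminfo."""
    wanted = {"Buffers": 0, "Cached": 0, "SReclaimable": 0}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Exact key match, so "SwapCached" is not mistaken for "Cached".
        if key in wanted and parts and parts[0].isdigit():
            wanted[key] = int(parts[0])
    return sum(wanted.values())


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            print(f"buff/cache: {buff_cache_kib(fh.read()) / 1024:.0f} MiB")
    except OSError:
        print("Could not read /proc/meminfo")
```

Run it once before and once after the sysctl command to see how much cache was released.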

An example of a low-level database that utilises MMAP in this way is LMDB. In turn many databases give the option of using LMDB as a storage engine, including MySQL. (and orbit-database)

Wait, there’s more …

MMAP can also be used to share virtual address space between processes, making it quite useful for multi-threaded and multi-process applications as it’s not subject to many of the limitations of traditional POSIX IPC. I could go on … … the Linux memory management system is and always has been a work of art and something that sets it apart from both older Unixes and other OS’s like Windows.

Keith · 27 September 2023 17:42

Phew!! Thank you for trying to explain that. I get the idea of mapping (I think) but I really can’t get my head round the concepts you’ve outlined.
The sudo sysctl vm.drop_caches=1 command did indeed reduce the buffer/cache number but I’m still not sure of its significance. Another day, perhaps!