Review: Raspberry Pi meets NVMe

So in my drive to eradicate all power-hungry Intel and AMD hardware, I’ve just installed my first Pi with an internal NVMe drive to replace my current storage solution.

Storage

In this instance I chose a Crucial P3 1TB unit, partly due to the (relatively) low price and partly down to it being on various “good” compatibility lists.


Case

Rather than going with an M.2 “hat” I opted for the Argon NEO 5 NVMe case, which is essentially a NEO case (which I’ve used previously; it has an integral heat sink and fan) with an extension that takes the NVMe module. Notably the two compartments are partitioned, so hopefully the drive shouldn’t get too warm.

There is a panel on the bottom covering the NVMe module, so to access or replace the module it’s just a matter of unscrewing that panel.

I’d list all the other components I needed, but apart from a tenner for the power supply, it’s a Raspberry Pi, so that’s it! :slight_smile:

But how fast is it?

Well, propaganda and benchmarks aside, it would appear that the raw read speed (after I’ve tweaked the PCIe bus up to Gen 3) is more than twice that of the USB alternatives I have been using.

$ dd if=/dev/nvme0n1p2 of=/dev/null bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 5.72161 s, 916 MB/s
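For reference, the Gen 3 tweak mentioned above is a one-line firmware setting (the path assumes a current Raspberry Pi OS layout; the Pi 5 link defaults to Gen 2, and Gen 3 isn’t officially certified, so treat this as an at-your-own-risk option). Reboot for it to take effect:

```
# /boot/firmware/config.txt — request a PCIe Gen 3 link (default is Gen 2)
dtparam=pciex1_gen=3
```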

And how warm does it get?

Well, I suppose that remains to be seen. Currently my workstation is showing 55°C, running in a “flirc” case with no fan. At the same time the new box is positively arctic in comparison (but that’s the combined effect of a passive heat sink, a fan, and an aluminium case).

$ vcgencmd measure_temp
temp=38.4'C
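If you want to keep an eye on that over time, here’s a minimal sketch (parse_temp is a hypothetical helper of my own, not part of any tool; vcgencmd itself only exists on the Pi):

```shell
# Minimal sketch: strip the temp=...'C wrapper from vcgencmd's output
# so the reading can be logged or compared numerically.
parse_temp() {
  printf '%s\n' "$1" | sed -e "s/^temp=//" -e "s/'C\$//"
}

# On a Pi you might log a reading every minute:
#   while true; do parse_temp "$(vcgencmd measure_temp)"; sleep 60; done
parse_temp "temp=38.4'C"   # prints 38.4
```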

Pros and Cons

On the one hand I’d like to give it 5/5, but on the other hand, it won’t be everyone’s cup of tea.

Pros

  • All in one unit, no separate “hat” required, just a Pi board and NVMe module
  • Comes in a box only slightly larger than a standard NEO’s (70x90x35mm)
  • Feels heavy and looks ‘quality’
  • Seems to perform very well on cursory benchmarks

Cons

  • No access to SD card once case is assembled
  • System doesn’t seem to want to boot from USB, which means you need to install / make the NVMe bootable before final assembly (technically this is an RPi shortcoming, but it doesn’t affect the USB alternatives)
  • Installing the ribbon is “tricky”
  • Installing and routing the fan cable is “tricky”
  • Trying to get the screw into the NVMe holder is “tricky”
  • They know it’s “tricky” because they supply a spare ribbon cable (!), but I suspect this may still be optimistic in some cases.
  • If you’re lacking in dexterity, familiarity with building kit, any kind of arthritis in your fingers etc, this one probably isn’t for you.
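On the boot point above: the boot order is an EEPROM setting you can change before final assembly. A sketch, assuming a Pi 5 with a reasonably recent bootloader (the exact BOOT_ORDER value here is my assumption; check it against the official bootloader docs):

```
# Edit via: sudo rpi-eeprom-config --edit
# Digits are read right to left: 6 = NVMe, 1 = SD, 4 = USB, f = restart on failure
BOOT_ORDER=0xf416
```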

Summary

If you’re an enthusiast, hobbyist, or you have someone like that to put it together for you, then I’d give it a 5/5 as an overall solution. If you’re a beginner looking to build your first RPi, maybe more like a 2/5. A standard case with an external USB drive literally takes a couple of minutes to get running - this one probably took me an hour to get right.


dd if=/dev/nvme0n1p2 of=/dev/null bs=1M count=5000 gives you the buffered read result. You should use dd if=/dev/nvme0n1p2 of=/dev/null bs=1M count=5000 iflag=direct to get the raw transfer rate.

Hi bigboj,

OK, so I should clarify: when I do a “dd” I prefix it with a cache drop. Given the system is tested at idle, a cache drop followed by a “dd” should give the device’s real-world performance, which you might expect to match the raw performance.

I would agree that technically iflag would be a more correct approach and clearly gives the speed when doing raw IO. However (!) generally applications don’t do raw IO; typically everything is buffered. I’ve recently switched out my workstation for one of these, but with a 500GB model rather than a 1TB. Here’s what I get on an idle system:

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/nvme0n1p2 of=/dev/null bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 6.10673 s, 859 MB/s

.vs.

# dd if=/dev/nvme0n1p2 of=/dev/null bs=1M count=5000 iflag=direct
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 6.17481 s, 849 MB/s

i.e. the raw performance is consistently fractionally slower than buffered - even when the buffers start out empty. My assumption has always been that, for some reason internal to the Linux kernel (or “dd”), raw IO is actually fractionally slower than buffered IO, which is why I tend to go with the cache-drop approach.
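(For what it’s worth, dd’s MB/s figure is just bytes divided by elapsed seconds, so recomputing both runs above shows they really are within about 1% of each other:)

```shell
# dd's rate is bytes / seconds; recomputing the buffered and direct runs above:
awk 'BEGIN { printf "%.0f %.0f\n", 5242880000/6.10673/1e6, 5242880000/6.17481/1e6 }'
# prints: 859 849
```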

If you know where the discrepancy comes from, I’d be interested to hear; it’s something I’ve wondered about for a long time.