Unraid vs. Ubuntu Bare Metal SMB Performance

In my last test I compared Unraid SMB performance against an Ubuntu VM running on Unraid, and Ubuntu outperformed Unraid. I wondered whether the VM disk image artificially inflated performance, perhaps through IO caching, so this time I tested Ubuntu on the same hardware that runs Unraid.

I configured the system to boot from either the Unraid USB stick or an Ubuntu Server USB stick. In both cases the hardware was exactly the same, and the SMB share was on the same 4 x 1TB Samsung 860 Pro SSD BTRFS volume. I mounted the BTRFS volume using the same mount options that Unraid uses. The Ubuntu Samba server used default options; the only change I made was to register the share.

Samba Config:
[cstshare]
comment = Samba on Ubuntu
path = /mnt/cache/CacheSpeedTest
read only = no
browsable = yes

BTRFS Mount:
mount -t btrfs -o noatime,nodiratime -U 89d1ad3a-83f3-4086-9006-5f0931370d36 /mnt/cache
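
For readers who want to reproduce the Ubuntu side, the setup amounts to roughly the following. This is a minimal sketch, assuming the stock Ubuntu samba package and the [cstshare] definition above; the UUID is the same one used in the mount command, and the fstab entry is just an optional way to make the mount persistent.

# install Samba and create the mount point
sudo apt install samba
sudo mkdir -p /mnt/cache

# optionally persist the mount across reboots instead of mounting manually
echo 'UUID=89d1ad3a-83f3-4086-9006-5f0931370d36 /mnt/cache btrfs noatime,nodiratime 0 0' | sudo tee -a /etc/fstab
sudo mount /mnt/cache

# after adding the [cstshare] section to /etc/samba/smb.conf, reload Samba
sudo systemctl restart smbd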

I ran the same tests as before, and the results again showed that Unraid SMB ReadWrite and Write performance is much worse than Ubuntu's. It was interesting to note that the Ubuntu ReadWrite performance exceeded the theoretical 1Gbps limit at the 1MB and 2MB block sizes. I re-tested twice and got the same results; my assumption is that the DiskSpd options to disable local and remote caching were not effective.
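
One way to check that assumption, sketched below but not something I have verified, is to repeat the Ubuntu run with DiskSpd's -Su (disable software caching) and -Sw (write-through) flags, which open the target unbuffered on the client; if the above-line-rate numbers disappear, client-side caching was inflating the results. The \\ubuntu\cstshare path is a placeholder for the Ubuntu server's share.

>diskspd -w50 -b1M -F2 -r -o8 -W60 -d120 -Suw -Rtext \\ubuntu\cstshare\testfile64g.dat > d:\diskspd_ubuntu_nocache.txt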

I have now tested Unraid against a W2K19 VM, an Ubuntu VM, and now Ubuntu bare metal, and the Unraid ReadWrite and Write performance is always abysmal.

I have again reported my findings in the Unraid SMB performance issue thread, and we continue to wait for a fix.

Unraid vs. Ubuntu SMB Performance

In my last round of testing I found that Unraid v6.8 SMB still underperforms compared to Windows Server 2019, but I wondered whether it is a general Linux Samba problem or an Unraid problem.

I installed an Ubuntu Server 18.04.3 LTS VM on Unraid, with a bridged network, 16GB RAM, and a 128GB raw disk located on the BTRFS cache volume, which consists of 4 x Samsung 860 Pro SSD drives. This is exactly the same configuration I use for the W2K19 test VM. I installed Samba on Ubuntu using default options.

I created an SMB share that is backed by the VM disk image, and a second share that is mapped directly to an Unraid share located on the cache volume. For both shares the Ubuntu VM and its Samba server handle the SMB network traffic, but the first share writes to the Ubuntu EXT4 volume backed by the VM disk image, while the second writes through to the underlying Unraid BTRFS cache volume using VirtFS.
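
For reference, the VirtFS mapping boils down to a libvirt filesystem passthrough plus a 9p mount in the guest. This is a minimal sketch with made-up paths and a made-up mount tag; the Unraid VM settings can generate the equivalent XML when you map an Unraid share into a VM.

<!-- host side, in the VM definition: pass an Unraid path through to the guest (sketch) -->
<filesystem type='mount' accessmode='passthrough'>
  <source dir='/mnt/cache/CacheSpeedTest'/>
  <target dir='cachetest'/>
</filesystem>

# guest side: mount the tag over the 9p/virtio transport, then point the Samba share at it
sudo mount -t 9p -o trans=virtio,version=9p2000.L cachetest /mnt/cachetest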

I ran a series of tests using my DiskSpeedTest utility, and the results are below.

[Chart: Unraid, Ubuntu, W2K19]

[Chart: Ubuntu, Mapped]

Note that the VirtFS mapped share exhibited some problems that appear to be caching related. For example, the file iteration test would create 14000 files, but iterating over the just-created files would only read 3080 of them.

My conclusion is that Linux Samba SMB performance is on par with that of Windows Server 2019, and that the performance problems are attributable to Unraid's file write performance. The Windows test used NTFS and Ubuntu used EXT4, so it could be BTRFS or XFS related, but it is more likely something Unraid does. Maybe the next step is to test bare metal Ubuntu SMB on XFS and BTRFS.

Unraid SMB Performance: v6.7.2 vs. v6.8.1

I previously wrote about the poor SMB performance I experienced in Unraid v6.7.2. Unraid v6.8 supposedly addressed SMB performance issues for concurrent read and write operations, and after waiting for the first bugfix release of v6.8, I re-tested using v6.8.1.

In my last test I used a combination of batch files and copy and paste; this time I wrote a tool to make repeated testing easy. I am not going to describe its usage here, see the instructions at the GitHub repository. There are new reports in v6.8 of poor SMB performance when a folder contains a large number of files, so I added a test that tries to simulate that behavior by creating a large number of files, then reading each file, then deleting each file.
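
To give an idea of what the iteration test does, the batch sketch below approximates it: create a large number of small files, enumerate and read each one back, then delete them. This is only an approximation and not the actual DiskSpeedTest code; the share path is a placeholder, and the single-% variables are for an interactive prompt (use %%i and %%f in a batch file).

>mkdir \\storage\testcache\iter
>for /L %i in (1,1,14000) do @echo data > \\storage\testcache\iter\file%i.txt
>for %f in (\\storage\testcache\iter\*.txt) do @type "%f" > nul
>del /q \\storage\testcache\iter\*.txt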

I configured my Unraid server with two test SMB shares, one pointing to the cache and one pointing to a single spinning disk. The cache consists of 4 x 1TB Samsung 860 Pro drives in a BTRFS volume, and the spinning disk is a Seagate IronWolf 12TB disk formatted with XFS and protected by a single similar-model parity disk. A third test share is backed by a Windows Server 2019 VM running on the cache disk.

I upgraded the server from v6.7.2 to v6.8.1, verified operation, and then restored it back to v6.7.2. I ran the first set of tests with v6.7.2, upgraded to v6.8.1, and re-ran the same set of tests. Both tests used exactly the same hardware configuration and environment, and were run back to back.

Here are the results in graph form:

[Chart: Sum of MBPS]

[Chart: Sum of MBPS and BlockSize]

[Chart: MBPS at 1MB BlockSize]

[Chart: MBPS at 1MB BlockSize, excluding W2K19]

[Chart: Iteration Time]

What did we learn?

  • Windows Server 2019 SMB performance is still far superior compared to Unraid.
    • I don’t know if the Linux SMB implementation is just that much slower compared to Windows, or if the performance degradation is attributed to Unraid.
    • TODO: Test SMB performance between a Linux VM and Windows VM.
  • The cache performance in v6.8.1 is worse compared to v6.7.2.
  • No noticeable SMB performance improvement in v6.8.1.

Unraid in production, a bit rough around the edges, and terrible SMB performance

In my last two posts I described how I migrated from W2K16 and hardware RAID6 to Unraid. Now that I’ve had two Unraid servers in production for a while, I’ll describe some of the good and not so good I experienced.

Running Docker on Unraid is orders of magnitude easier than getting Docker to work on Windows. Docker allowed me to move all but one of my workloads from VMs to containers, simplifying updates, reducing the memory footprint, and improving performance.

For my IP security camera NVR software I switched from Milestone XProtect Express running on a W2K16 VM to DW Spectrum running on an Ubuntu Server VM. DW Spectrum is the US brand name under which the Nx Witness product is sold. I chose to switch from XProtect to Nx Witness, now DW Spectrum, because Nx Witness is lighter in resource consumption, easier to deploy, easier to update, has perpetual licenses, includes native remote viewing, and an official Docker release is forthcoming.

I have been a long time user of CrashPlan, and I switched to CrashPlan Pro when they stopped offering a consumer product. I tested CrashPlan Pro and Duplicati containers on Unraid, with Duplicati backing up to Backblaze B2. Duplicati was the clear winner: backups were very fast and completed in about 3 days, whereas after 5 days I stopped CrashPlan when it estimated another 18 days to complete the same backup and showed the familiar out of memory error. My B2 storage cost will be a few dollars higher than a single seat CrashPlan Pro license, but the Duplicati plus B2 functionality and speed is superior.

When the Unraid 6.7.0 release went public, I immediately updated, but soon realized my mistake when several plugins stopped working. It took several weeks before plugin updates were released that restored full functionality. It is worth mentioning again that I find it strange how much of Unraid's usability depends on community provided plugins, yet that functionality remains in the plugins rather than in Unraid itself. Next time I will wait a few weeks for the dust to settle in the plugin community before updating.

Storage and disk management is reasonably easy, and much more flexible than hardware RAID management. But adding and removing disks is still mostly a manual process, and doing it without invalidating parity is very cumbersome and time consuming. Several times I gave up on the convoluted steps required to add or remove disks without invalidating parity, and just reconfigured the array and rebuilt parity, hoping nothing would go wrong during the parity rebuild. This is in my opinion a serious shortcoming, maybe not in the technology, but in the lack of an easy to use and reliable workflow to help retain redundant protection at all times.

In order to temporarily free up enough storage space in my secondary server, I removed all the SSD cache drives and replaced them with 12TB Seagate IronWolf drives. I moved all the data that used to be on the cache to regular array storage, including the docker appdata folder. This should not have been a big deal, but I immediately started getting SQLite DB corruption errors in apps like Plex that store data in SQLite on the appdata share. After some troubleshooting I found many people complaining about this issue, which seems to have been exacerbated by the recent Unraid 6.7.0 update. Apparently this is a known problem with the Fuse filesystem used by Unraid. Fuse dynamically spans shares and folders across disks, but apparently breaks the file and file-region locking required by SQLite. The recommended workaround is to put all files that require locking on the cache, or on a single disk, effectively bypassing Fuse. If it is Fuse that breaks file locking behavior, I find it troubling that this is not considered a critical bug.
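
As an illustration of the workaround, the idea is to map a container's appdata to /mnt/cache (or a specific /mnt/diskX) instead of the Fuse-backed /mnt/user path. The docker run below is just a sketch with an example Plex image and paths; on Unraid you would make the equivalent change in the container template's path mapping.

# instead of the Fuse-backed user share path:
#   -v /mnt/user/appdata/plex:/config
# point the container config directly at the cache pool, bypassing Fuse:
docker run -d --name plex -v /mnt/cache/appdata/plex:/config plexinc/pms-docker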

I am quite familiar with VM snapshot management using Hyper-V and VMware; it is a staple of VM management. In Unraid I am using a Docker based Virt-Manager, which seems far less flexible, but more importantly, fails to take snapshots of UEFI based VMs. Apparently this is a known shortcoming. I have not looked very hard for alternatives, but this seems to be a serious functional gap compared to Hyper-V or VMware's snapshot capabilities.
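
One possible alternative I have not tried is libvirt's external, disk-only snapshots from the command line, which do not rely on internal QCOW2 snapshots and so may not hit the UEFI limitation; the VM name below is a placeholder and I have not verified this on Unraid.

# hypothetical: take an external, disk-only snapshot of a VM named "w2k19"
virsh snapshot-create-as w2k19 pre-update --disk-only --atomic
virsh snapshot-list w2k19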

As I started using the SMB file shares, now hosted on Unraid, in my regular day to day activities, I noticed that under some conditions the write speed became extremely slow, often dropping to around 2MB/s. This seems to happen when there are other file read operations in progress, and even a few KB/s of reads can drastically reduce the array SMB write performance. Interestingly the issue does not appear to affect rsync between my Unraid servers, only SMB. I did find at least one other recent report of similar slowdowns where only SMB is affected.
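
A simple way to reproduce this kind of read/write contention on demand, sketched below with placeholder paths, is to keep a large read running from the share in one window and time a write to the same share in a second window, then compare against the write speed with no concurrent read.

Window 1 - keep a large read going from the share:
>type \\storage\testshare\bigfile.bin > nul

Window 2 - while the read is running, time a large write to the same share:
>robocopy d:\temp\out \\storage\testshare\in /mir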

Since the problem appeared to be specific to Unraid SMB, and not general network performance, I compared the Unraid SMB performance with Windows SMB in a W2K19 VM running on the same Unraid system. By running W2K19 as a VM on the same Unraid system, the difference in performance is mostly attributable to the SMB stack, not the hardware or network.

On Unraid I created a share backed by the SSD cache array; that same cache array holds the W2K19 VM disk image, so the storage subsystems are comparable. I also ran a similar test against an Unraid share backed by a spinning disk instead of the cache.

I found a few references (1, 2) to SMB benchmarking using DiskSpd, and I used them as the basis for my test options. Start by creating a 64GB test file on each test share; the file is reused between runs, which saves a lot of time compared to recreating it every time. Note that we get a warning when creating the file on Unraid, because SetFileValidData() is not supported by Unraid's SMB implementation, but that should not be an issue.

>diskspd.exe -c64G \\storage\testcache\testfile64g.dat
WARNING: Could not set valid file size (error code: 50); trying a slower method of filling the file (this does not affect performance, just makes the test preparation longer)

>diskspd.exe -c64G \\storage\testmnt\testfile64g.dat
WARNING: Could not set valid file size (error code: 50); trying a slower method of filling the file (this does not affect performance, just makes the test preparation longer)

>diskspd.exe -c64G \\WIN-EKJ8HU9E5QC\TestW2K19\testfile64g.dat

I ran several tests similar to the following commandlines:

>diskspd -w50 -b512K -F2 -r -o8 -W60 -d120 -Srw -Rtext \\storage\testcache\testfile64g.dat > d:\diskspd_unraid_cache.txt
>diskspd -w50 -b512K -F2 -r -o8 -W60 -d120 -Srw -Rtext \\storage\testmnt\testfile64g.dat > d:\diskspd_unraid_mnt.txt
>diskspd -w50 -b512K -F2 -r -o8 -W60 -d120 -Srw -Rtext \\WIN-EKJ8HU9E5QC\TestW2K19\testfile64g.dat > d:\diskspd_w2k19.txt

For a full explanation of the commandline arguments see here. The tests do 50% reads and 50% writes, with block sizes varying from 4KB to 2048KB, 2 threads, 8 outstanding IO operations, random aligned IO, a 60s warm up, a 120s run, write-through enabled, and local caching disabled for remote filesystems.
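
Since only the 512KB command lines are shown above, this is roughly how the block size sweep can be scripted from an interactive prompt (use %%b in a batch file); same targets and options as above, just varying -b, with hypothetical per-size output file names.

>for %b in (4K 8K 16K 32K 64K 128K 256K 512K 1M 2M) do diskspd -w50 -b%b -F2 -r -o8 -W60 -d120 -Srw -Rtext \\storage\testcache\testfile64g.dat > d:\diskspd_unraid_cache_%b.txt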

[Charts: DiskSpd results for the W2K19, Cache, and Mount shares]

From the results we can see that the Unraid SMB performance in this test is pretty poor. I redid the tests, this time running independent read and write tests, and instead of varying block sizes I only used a 512KB block size (I got lazy).
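
For reference, an independent read or write run with DiskSpd comes down to the -w flag (the write percentage): -w0 gives a pure read test and -w100 a pure write test. A sketch against the cache share, with hypothetical output file names:

>diskspd -w0 -b512K -F2 -r -o8 -W60 -d120 -Srw -Rtext \\storage\testcache\testfile64g.dat > d:\diskspd_unraid_cache_read.txt
>diskspd -w100 -b512K -F2 -r -o8 -W60 -d120 -Srw -Rtext \\storage\testcache\testfile64g.dat > d:\diskspd_unraid_cache_write.txt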

[Charts: independent read and write results at 512KB]

No matter how we look at it, the Unraid SMB write performance is still really bad.

I wanted to validate the synthetic test results with a real world test, so I collected a folder containing around 65.2GB of fairly large files on SSD, and copied the files up and down using robocopy from my Win10 system. I chose the total size of the files to be about double the memory in the Unraid system, so that the impact of caching is minimized. I made sure to use a RAW VM disk to eliminate any performance impact from growing a QCOW2 image file.

>robocopy d:\temp\out \\storage\testmnt\in /mir /fft > d:\robo_pc_mnt.txt
>robocopy d:\temp\out \\storage\testcache\in /mir /fft > d:\robo_pc_cache.txt
>robocopy d:\temp\out \\WIN-EKJ8HU9E5QC\TestW2K19\in /mir > d:\robo_pc_w2k19.txt

>robocopy \\storage\testmnt\in d:\temp\in /mir /fft > d:\robo_mnt_pc.txt
>robocopy \\storage\testcache\in d:\temp\in /mir /fft > d:\robo_cache_pc.txt
>robocopy \\WIN-EKJ8HU9E5QC\TestW2K19\in d:\temp\in /mir > d:\robo_w2k19_pc.txt

During the robocopy to Unraid I noticed that sporadically the Unraid web UI, and web browsing in general, became very slow. This never happened while copying to W2K19. I can't explain it: I see no errors in my Win10 client eventlog or resource monitor, no unusual errors on the network switches, and no errors in Unraid. I suspect whatever is impacting SMB performance is affecting network performance in general, but without data I am really just speculating.

The robocopy read results are pretty even, but again show inferior Unraid SMB write performance. Do note that the W2K19 VM is still not as fast as my previous W2K16 RAID6 setup, where I could consistently saturate the 1Gbps link for reads and writes, on the same hardware and using the same disks.

[Charts: robocopy results]

It is very disappointing to discover this poor SMB performance. I reported my findings to the Unraid support forum, and I hope they can do something to improve performance, or maybe invalidate my findings.