## Unraid SMB Performance: v6.7.2 vs. v6.8.1

I previously wrote about the poor SMB performance I experienced in Unraid v6.7.2. Unraid v6.8 supposedly addressed SMB performance issues for concurrent read and write operations, and after waiting for the first bugfix release of v6.8, I re-tested using v6.8.1.

In my last test I used a combination of batch files and copy and paste; this time I wrote a tool to make repeat testing easy. I am not going to describe the usage here, see the instructions at the GitHub repository. There are new reports in v6.8 of poor SMB performance when a folder contains a large number of files, so I added a test that tries to simulate that behavior by creating a large number of files, then reading each file, then deleting each file.
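To illustrate the create, read, delete pattern described above, here is a minimal Python sketch. It is not the tool from the GitHub repository; the function name `small_file_test`, the file naming, and the default count and size are all assumptions made for illustration.

```python
import os
import tempfile
import time


def small_file_test(folder, count=1000, size=4096):
    """Create, read, then delete `count` files in `folder`.

    Returns the elapsed seconds for each phase. All names and defaults
    here are hypothetical, not taken from the actual test tool.
    """
    payload = os.urandom(size)
    paths = [os.path.join(folder, f"file{i:06d}.bin") for i in range(count)]
    timings = {}

    # Phase 1: create every file.
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as handle:
            handle.write(payload)
    timings["create"] = time.perf_counter() - start

    # Phase 2: read every file back.
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as handle:
            handle.read()
    timings["read"] = time.perf_counter() - start

    # Phase 3: delete every file.
    start = time.perf_counter()
    for path in paths:
        os.remove(path)
    timings["delete"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # A local temp folder stands in for the share in this demo.
    with tempfile.TemporaryDirectory() as folder:
        for phase, seconds in small_file_test(folder, count=200).items():
            print(f"{phase}: {seconds:.3f}s")
```

To exercise an SMB share instead of local disk, the folder argument would be a path on the share. Client-side caching can flatter the read phase, so treat the numbers as relative rather than absolute.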

I configured my Unraid server with two test SMB shares, one pointing to the cache, and one pointing to a single spinning disk. The cache consists of 4 x 1TB Samsung Pro 860 drives in a BTRFS volume, and the spinning disk is a Seagate IronWolf 12TB disk formatted XFS and protected by a single parity disk of a similar model. A third share, used for comparison, is backed by a Windows Server 2019 VM running on the cache disk.

I upgraded the server from v6.7.2 to v6.8.1, verified operation, and then restored it back to v6.7.2. I ran the first set of tests with v6.7.2, upgraded to v6.8.1, and re-ran the same set of tests. Both tests used exactly the same hardware configuration and environment, and were run back to back.

Here are results in graph form:

What did we learn?

• Windows Server 2019 SMB performance is still far superior compared to Unraid.
• I don’t know if the Linux SMB implementation is just that much slower compared to Windows, or if the performance degradation is attributable to Unraid.
• TODO: Test SMB performance between a Linux VM and Windows VM.
• The cache performance in v6.8.1 is worse compared to v6.7.2.
• No noticeable SMB performance improvement in v6.8.1.

## Unraid repeat parity errors on reboot

This post started with a quick experiment, but after hardware incompatibilities forced me to swap SSD drives, and subsequently losing a data volume, it turned into a much bigger effort.

My two Unraid servers have been running nonstop without any issues for many months; last I looked, the uptime on v6.7.2 was around 240 days. We recently experienced an extended power failure, and after the servers were restarted I noticed 5 parity errors on both servers.

Jan 1 06:09:23 Server-1 kernel: md: recovery thread: PQ corrected, sector=1962934168
Jan 1 06:09:23 Server-1 kernel: md: recovery thread: PQ corrected, sector=1962934176
Jan 1 06:09:23 Server-1 kernel: md: recovery thread: PQ corrected, sector=1962934184
Jan 1 06:09:23 Server-1 kernel: md: recovery thread: PQ corrected, sector=1962934192
Jan 1 06:09:23 Server-1 kernel: md: recovery thread: PQ corrected, sector=1962934200

Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934168
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934176
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934184
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934192
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934200

I initially suspected that a dirty shutdown caused the corruption, but my entire rack is on a large UPS, and the servers are configured, and tested, to shut down cleanly in case of a low battery condition. Unfortunately Unraid does not persist logs across reboots, so it was impossible to verify the shutdown behavior via logs. Unraid logs to memory and not to the USB flash drive to prevent flash wear, but I think this needs to be at least configurable, as no logs means troubleshooting after an unexpected reboot is near impossible. Yes, I know I can enable the Unraid syslog server, and I can redirect syslog to write to the flash drive, but syslog is not as reliable or complete as native logging, especially during a shutdown scenario. More importantly, syslog was not enabled, so there were no shutdown logs.

I could not entirely rule out a dirty shutdown, but I could test a clean reboot scenario. I restarted from within Unraid and ran a parity check: the exact same 5 parity errors were back. I ran a parity check again, and it was clean. It takes more than a day to run a single parity check, so this is a cumbersome and time consuming exercise. It is very suspicious that it is exactly the same 5 sectors, every time.

Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934168
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934176
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934184
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934192
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934200
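The claim that the same sectors recur can be checked mechanically rather than by eye. The sketch below pulls the sector numbers out of the two Server-2 log excerpts shown above and compares them. The regular expression and the helper name `corrected_sectors` are my own assumptions about the log format, based only on the lines in this post.

```python
import re

# Matches lines such as "md: recovery thread: P corrected, sector=1962934168".
# The pattern is an assumption derived from the log excerpts in this post.
SECTOR_RE = re.compile(r"md: recovery thread: (?:P|PQ) corrected, sector=(\d+)")


def corrected_sectors(log_text):
    """Return the set of sector numbers reported as corrected."""
    return {int(match.group(1)) for match in SECTOR_RE.finditer(log_text)}


run_jan1 = """\
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934168
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934176
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934184
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934192
Jan 1 04:42:39 Server-2 kernel: md: recovery thread: P corrected, sector=1962934200
"""

run_jan3 = """\
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934168
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934176
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934184
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934192
Jan  3 10:03:07 Server-2 kernel: md: recovery thread: P corrected, sector=1962934200
"""

first = corrected_sectors(run_jan1)
second = corrected_sectors(run_jan3)

# Sectors seen in both runs, and the gaps between neighbouring sectors.
ordered = sorted(first & second)
spacing = {b - a for a, b in zip(ordered, ordered[1:])}

print("identical sets:", first == second)  # identical sets: True
print("sector spacing:", spacing)          # sector spacing: {8}
```

Both runs report the same five sectors, evenly spaced 8 sectors apart, which points at one small contiguous region rather than random corruption.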

I searched the Unraid forums and found other reports of similar repeat parity errors, in some instances attributed to a Marvell chipset, a Supermicro AOC-SASLP-MV8 controller, or the SASLP2 driver. My systems use Adaptec RAID cards, 7805Q SAS2 and 81605ZQ SAS3, in HBA mode, so no Marvell chipset and no SASLP2 driver, but the same symptoms.

An all too common forum reply to storage problems is to switch to an LSI HBA, and I got the same reply when I reported the parity problem with my Adaptec hardware.

I was sceptical; this looked like correlation, not causation. As an example, take the SQLite corruption bug introduced in v6.7: for the longest time it was blamed on hardware or 3rd party apps, but it eventually turned out to be an Unraid bug.

Arguing my case on a community support forum is not productive, and I just want the parity problem resolved, so I decided to switch to LSI HBA cards. I really do have a love hate relationship with community support, especially when I pay for a product, like Unraid or Plex Pass, but have no avenue to dedicated support.

I am no stranger to LSI cards, and the problems flashing from IR to IT mode firmware, so I got my LSI cards preflashed with the latest IT mode firmware at the Art of Server eBay store. My systems are wired with miniSAS HD SFF-8643 cables, and the only cards offered with miniSAS HD ports were LSI SAS9340-8i ServeRAID M1215 cards. I know the RAID hardware is overkill when using IT mode, and maybe I should have gone for vanilla LSI SAS 9300-8i cards, especially when the Unraid community was quick to comment that a 9340 is not a “true” HBA.

I replaced the 7805Q with the SAS9340 in Server-2, and noticed that none of my SSD drives showed up in the LSI BIOS utility, only the spinning disks showed up. I put the 7805Q card back, and all the drives, including the SSD drives, showed up in the Adaptec BIOS utility. I replaced the 81605ZQ with the SAS9340 in Server-1, and this time some of the SSD’s showed up. None of my Samsung EVO 840 SSD’s showed up, but the Samsung Pro 850 and Pro 860 SSD’s did show up. I again replaced the 7805Q in Server-2 with the SAS9340, but this time I added a Samsung Pro 850, and it did show up.

The problem seemed to be limited to my Samsung EVO drives. I reached out to Art of Server for help, and although he was very responsive, he had not seen or heard of this problem. I looked at the LSI hardware compatibility list, and the EVO drives were listed. Some more searching, and I found an LSI KB article mentioning that TRIM is not supported on Samsung Pro 850 drives. It seems that the LSI HBA’s need TRIM to support DRAT (Deterministic Read After TRIM) / (Data Set Management TRIM supported (limit 8 blocks)), and RZAT (Deterministic read ZEROs after TRIM). The Wikipedia article on TRIM lists specific drives with faulty TRIM implementations, including the Samsung 840 and 850 (without specifying Pro or EVO), and the Linux kernel has special handling for Samsung 840 and 850 drives.

	/* devices that don't properly handle queued TRIM commands */
	{ "Micron_M500IT_*",		"MU01",	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },
	{ "Micron_M500_*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },
	{ "Crucial_CT*M500*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },
	{ "Micron_M5[15]0_*",		"MU01",	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },
	{ "Crucial_CT*M550*",		"MU01",	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },
	{ "Crucial_CT*MX100*",		"MU01",	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },
	{ "Samsung SSD 840*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },
	{ "Samsung SSD 850*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },
	{ "FCCT*M500*",			NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
		ATA_HORKAGE_ZERO_AFTER_TRIM, },

This is all still circumstantial, as it does not explain why the LSI controller would not mount the 840 EVO drives, but will mount the 850 Pro drive, when both are listed as problematic, and both are included on the LSI hardware compatibility list. I do not have EVO 850’s to test with, so I cannot confirm if the problem is limited to EVO 840’s.

I still had the original parity problem to deal with, and to verify that an LSI HBA would resolve it, I needed a working Unraid system with an LSI HBA. Server-1 had two EVO 840’s, an 850 Pro, and an 860 Pro for the BTRFS cache volume. I pulled a Pro 850 and a Pro 860 drive from another system, and proceeded to replace the two EVO 840’s. Per the Unraid FAQ, I should be able to replace the drives one at a time, waiting for the BTRFS volume to rebuild. I replaced the first disk, and it took about a day to rebuild. I replaced the second disk using the same procedure, but something went wrong: my cache volume would not mount, and reported being corrupt.

Jan  6 07:25:41 Server-1 kernel: BTRFS info (device sdf1): allowing degraded mounts
Jan  6 07:25:41 Server-1 kernel: BTRFS info (device sdf1): disk space caching is enabled
Jan  6 07:25:41 Server-1 kernel: BTRFS info (device sdf1): has skinny extents
Jan  6 07:25:41 Server-1 kernel: BTRFS warning (device sdf1): devid 4 uuid 94867179-94ed-4580-ace4-f026694623f6 is missing
Jan  6 07:25:41 Server-1 kernel: BTRFS error (device sdf1): failed to verify dev extents against chunks: -5
Jan  6 07:25:41 Server-1 root: mount: /mnt/cache: wrong fs type, bad option, bad superblock on /dev/sdr1, missing codepage or helper program, or other error.
Jan  6 07:25:41 Server-1 emhttpd: shcmd (7033): exit status: 32
Jan  6 07:25:41 Server-1 emhttpd: /mnt/cache mount error: No file system
Jan  6 07:25:41 Server-1 emhttpd: shcmd (7034): umount /mnt/cache
Jan  6 07:25:41 Server-1 kernel: BTRFS error (device sdf1): open_ctree failed
Jan  6 07:25:41 Server-1 root: umount: /mnt/cache: not mounted.
Jan  6 07:25:41 Server-1 emhttpd: shcmd (7034): exit status: 32
Jan  6 07:25:41 Server-1 emhttpd: shcmd (7035): rmdir /mnt/cache

In retrospect I should have known something was wrong when Unraid reported the array being stopped, but I still saw lots of disk activity on the SSD drive bay lights. I suspect the BTRFS rebuild was still ongoing, or mounted, even if Unraid reported the array being stopped. No problem, I thought, I make daily data backups to Backblaze B2 using Duplicacy, and weekly Unraid (appdata and docker) backups, that are then backed up to B2. I recreated the cache volume, and got the server started again, but my Unraid data backups were missing.

It was an oversight and configuration mistake: I configured my backup share to be cached, I ran daily backups of the backup share to B2 at 2am, and weekly Unraid backups to the backup share on Mondays at 3am. The last B2 backup was Monday morning at 2am, the last Unraid backup was Monday morning at 3am. When the cache died all data on the cache was lost, including the last Unraid backup, that never made it to B2. My last recoverable Unraid backup on B2 was a week old.

So a few key learnings: do not use the cache for backup storage, schedule offsite backups to run after onsite backups, and if the lights are still blinking don’t pull the disk.

Once I had all the drives installed, I tested for TRIM support.

Samsung Pro 860, supports DRAT and RZAT:

root@Server-1:/mnt# hdparm -I /dev/sdf | grep TRIM
* Data Set Management TRIM supported (limit 8 blocks)
* Deterministic read ZEROs after TRIM

Samsung Pro 850, supports DRAT:

root@Server-2:~# hdparm -I /dev/sdf | grep TRIM
* Data Set Management TRIM supported (limit 8 blocks)

Samsung EVO 840, supports DRAT, but does not work with the LSI HBA:

root@Server-2:~# hdparm -I /dev/sdc | grep TRIM
* Data Set Management TRIM supported (limit 8 blocks)

The BTRFS volume consisting of 4 x Pro 860 drives reported trimming what looks like all disks, 3.2 TiB:

root@Server-1:~# fstrim -v /mnt/cache
/mnt/cache: 3.2 TiB (3489240088576 bytes) trimmed

The BTRFS volume consisting of 2 x Pro 860 + 2 x Pro 850 drives reported trimming what looks like only 2 disks, 1.8 TiB:

root@Server-2:~# fstrim -v /mnt/cache
/mnt/cache: 1.8 TiB (1946586398720 bytes) trimmed
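As a rough sanity check on the "all disks" versus "only 2 disks" reading, the byte counts reported by fstrim can be converted to TiB and to nominal drive counts. This sketch assumes each drive is a nominal 1TB (10^12 bytes), which the post only states for the 4 x Pro 860 cache. As far as I understand, fstrim reports discarded free space, so the result also depends on how full the volume is and on the BTRFS profile.

```python
TIB = 2 ** 40          # bytes in one tebibyte
NOMINAL_TB = 10 ** 12  # assumption: a "1TB" drive holds about 10^12 bytes


def to_tib(byte_count):
    """Convert a byte count to tebibytes."""
    return byte_count / TIB


def drive_equivalents(byte_count, drive_bytes=NOMINAL_TB):
    """Express a byte count as a number of nominal drives."""
    return byte_count / drive_bytes


# Byte counts copied from the fstrim output above.
trimmed = {"Server-1": 3489240088576, "Server-2": 1946586398720}

for server, byte_count in trimmed.items():
    print(f"{server}: {to_tib(byte_count):.1f} TiB, "
          f"about {drive_equivalents(byte_count):.1f} nominal 1TB drives")
# Server-1: 3.2 TiB, about 3.5 nominal 1TB drives
# Server-2: 1.8 TiB, about 1.9 nominal 1TB drives
```

Under that assumption, Server-2 trimmed roughly two drives' worth of space, which is consistent with only the two Pro 860 drives being trimmed.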

In summary, Samsung EVO 840 no good, Samsung Pro 850 avoid, Samsung Pro 860 is ok.

Server-2 uses SFF-8643 to SATA breakout cables with sideband SGPIO connectors, controlling the drive bay lights. With the Adaptec controller the drive bay lights worked fine, but with the LSI the lights do not appear to work. I am really tempted to replace the chassis with a SAS expander, alleviating the need for the breakout cables, but that is a project for another day.

After I recreated the cache volume, I reinstalled the Duplicacy web container and tried to restore my now week old backup file. I could not get the web UI to restore the 240GB backup file; either the session timed out or the network connection was dropped. I reverted to using the CLI, and with a few retries, eventually restored the file. It was disappointing to learn that the web UI must remain open during the restore, and that the CLI does not automatically retry on network failures. Fortunately Duplicacy will do block-based restores and can resume restoring large files.

2020/01/06 07:59:01 Created restore session 1o8nqw
2020/01/06 07:59:01 Running /home/duplicacy/.duplicacy-web/bin/duplicacy_linux_x64_2.3.0 [-log restore -r 101 -storage B2-Backup -overwrite -stats --]
2020/01/06 07:59:01 Set current working directory to /cache/localhost/restore
2020/01/06 09:37:35 Deleted listing session jnji7l
2020/01/06 09:37:41 Invalid session
2020/01/06 12:07:57 Stopping the restore operation in session 1o8nqw
2020/01/06 12:07:57 Failed to restore files for backup B2-Backup-Backup revision 101 in the storage B2-Backup: Duplicacy was aborted
2020/01/06 12:07:57 closing log file restore-20200106-075901.log
2020/01/06 12:08:17 Deleted restore session 1o8nqw
Downloaded chunk 34683 size 13140565, 15.05MB/s 01:20:44 70.1%
1ff9d2c082d06226b0d81019338d048bf5a4428827a3fc0d3f6f337d66fd7fa9: read tcp 192.168.1.113:49858->206.190.215.16:443: wsarecv: An existing connection was forcibly closed by the remote host.
...
Files: 1 total, 243982.58M bytes
Total running time: 01:30:23

I did lose my DW-Spectrum IPVMS running on an Ubuntu Server VM. I’ve known that I don’t have a VM backup solution, but the video footage is on the storage server not in the VM, video backups go to B2, and it is reasonably easy to recreate the VM. I am still working on a DW-Spectrum docker solution for Unraid, but as of today the VMS does not recognize Unraid mapped storage volumes.

After all this trouble, I could finally test a parity check after reboot with the LSI HBA. With the system up I ran a parity check, all clear, rebooted, ran the parity check again, and … no errors. I performed this operation on both servers, no problems.

I was really sceptical that the LSI would work where the Adaptec failed, and this does not rule out Unraid as the cause, but it does show that Unraid with the LSI HBA does not have the dirty parity on reboot problem.

## Unraid in production, a bit rough around the edges, and terrible SMB performance

In my last two posts I described how I migrated from W2K16 and hardware RAID6 to Unraid. Now that I’ve had two Unraid servers in production for a while, I’ll describe some of the good and not so good I experienced.

Running Docker on Unraid is orders of magnitude easier compared to getting Docker to work on Windows. Docker allowed me to move all but one of my workloads from VM’s to containers, simplifying updates, reducing the memory footprint, and improving performance.

For my IP security camera NVR software I did switch from Milestone XProtect Express running on a W2K16 VM, to DW Spectrum running on an Ubuntu Server VM. DW Spectrum is the US brand name for the Nx Witness product, and the DW Spectrum branded product is sold in the US. I chose to switch from XProtect to Nx Witness, now DW Spectrum, because Nx Witness is lighter in resource consumption, easier to deploy, easier to update, has perpetual licenses, includes native remote viewing, and an official Docker release is forthcoming.

I have been a long time user of CrashPlan, and I switched to CrashPlan Pro when they stopped offering a consumer product. I tested CrashPlan Pro and Duplicati containers on Unraid, with Duplicati backing up to Backblaze B2. Duplicati is the clear winner: backups were very fast and completed in about 3 days, whereas I stopped CrashPlan after 5 days, when it estimated another 18 days to complete the same backup operation and showed the familiar out of memory error. My B2 storage cost will be a little higher compared to a single seat license for CrashPlan Pro, but the Duplicati plus B2 functionality and speed is superior.

When the Unraid 6.7.0 release went public, I immediately updated, but soon realized my mistake when several plugins stopped working. It took several weeks before plugin updates were released that restored full functionality. It is worth mentioning, again, that I find it strange that Unraid without community provided plugins is really not that usable, yet the functionality still remains in community provided plugins, not in Unraid. Next time I will wait a few weeks for the dust to settle in the plugin community before updating.

Storage and disk management is reasonably easy, and much more flexible compared to hardware RAID management. But adding and removing disks is still mostly a manual process, and doing it without invalidating parity is very cumbersome and time consuming. Several times I gave up on the convoluted steps required to add or remove disks without invalidating parity, and just reconfigured the array and then rebuilt parity, hoping nothing would go wrong during the parity rebuild. This is in my opinion a serious shortcoming, maybe not in technology, but in the lack of an easy to use and reliable workflow to help retain redundant protection at all times.

In order to temporarily make enough storage space in my secondary server, I removed all the SSD cache drives and replaced them with 12TB Seagate IronWolf drives.
I did move all the data that used to be on the cache to regular storage, including the docker appdata folder. This should not be a big deal, but I immediately started getting SQLite DB corruption errors in apps like Plex, which store data in SQLite on the appdata share. After some troubleshooting I found many people complaining about this issue, which seems to have been exacerbated by the recent Unraid 6.7.0 update. Apparently this is a known problem with the FUSE filesystem used by Unraid. FUSE dynamically spans shares and folders across disks, but apparently breaks the file and file-region locking required by SQLite. The recommended workaround is to put all files that require locking to work on the cache, or on a single disk, effectively bypassing FUSE. If it is FUSE that breaks file locking behavior, I find it troubling that this is not considered a critical bug.

I am quite familiar with VM snapshot management using Hyper-V and VMware; it is a staple of VM management. In Unraid I am using a Docker based Virt-Manager, which seems far less flexible, but more importantly, fails to take snapshots of UEFI based VM’s. Apparently this is a known shortcoming. I have not looked very hard for alternatives, but this seems to be a serious functional gap compared to Hyper-V or VMware’s snapshot capabilities.

As I started using the SMB file shares, now hosted on Unraid, in my regular day to day activities, I noticed that under some conditions the write speed becomes extremely slow, often dropping to around 2MB/s. This seems to happen when there are other file read operations in progress, and even a few KB/s of reads can drastically reduce the array SMB write performance. Interestingly the issue does not appear to affect my use of rsync between Unraid servers, only SMB. I did find at least one other recent report of similar slowdowns, where only SMB is affected.
Since the problem appeared to be specific to Unraid SMB, and not general network performance, I compared the Unraid SMB performance with Windows SMB in a W2K19 VM running on the same Unraid system. By running W2K19 as a VM on the same Unraid system, the difference in performance will be mostly the SMB stack, not hardware or network. On Unraid I created a share that is backed by the SSD cache array; that same SSD cache array holds the W2K19 VM disk image, so the storage subsystems are similar. I ran a similar test against an Unraid share backed by disk instead of cache.

I found a few references (1, 2) to SMB benchmarking using DiskSpd, and I used them as a basis for the test options I used.

Start by creating a 64GB test file on all test shares; we reuse the file, and it saves a lot of time to not recreate it every time. Note, we get a warning when creating the file on Unraid, due to SetFileValidData() not being supported by Unraid’s SMB implementation, but that should not be an issue.

>diskspd.exe -c64G \\storage\testcache\testfile64g.dat
WARNING: Could not set valid file size (error code: 50); trying a slower method of filling the file (this does not affect performance, just makes the test preparation longer)
>diskspd.exe -c64G \\storage\testmnt\testfile64g.dat
WARNING: Could not set valid file size (error code: 50); trying a slower method of filling the file (this does not affect performance, just makes the test preparation longer)
>diskspd.exe -c64G \\WIN-EKJ8HU9E5QC\TestW2K19\testfile64g.dat

I ran several tests similar to the following commandlines:

>diskspd -w50 -b512K -F2 -r -o8 -W60 -d120 -Srw -Rtext \\storage\testcache\testfile64g.dat > d:\diskspd_unraid_cache.txt
>diskspd -w50 -b512K -F2 -r -o8 -W60 -d120 -Srw -Rtext \\storage\testmnt\testfile64g.dat > d:\diskspd_unraid_mnt.txt
>diskspd -w50 -b512K -F2 -r -o8 -W60 -d120 -Srw -Rtext \\WIN-EKJ8HU9E5QC\TestW2K19\testfile64g.dat > d:\diskspd_w2k19.txt

For a full explanation of the commandline arguments see here.
The tests do 50% read and 50% write, with block sizes varied from 4KB to 2048KB, 2 threads, 8 outstanding IO operations, random aligned IO, a 60s warm up, a 120s run, and local caching disabled for remote filesystems.

From the results we can see that the Unraid SMB performance for this test is pretty poor. I redid the tests, this time doing independent read and write tests, and instead of various block sizes, I just did a 512KB block size test (I got lazy). No matter how we look at it, the Unraid SMB write performance is still really bad.

I wanted to validate the synthetic test results with a real world test, so I collected a folder containing around 65.2GB of fairly large files, on SSD, and copied the files up and down using robocopy from my Win10 system. I chose the total size of the files to be about double the size of the memory on the Unraid system, so that the impact of caching is minimized. I made sure to use a RAW VM disk to eliminate any performance impact of growing a QCOW2 image file.

>robocopy d:\temp\out \\storage\testmnt\in /mir /fft > d:\robo_pc_mnt.txt
>robocopy d:\temp\out \\storage\testcache\in /mir /fft > d:\robo_pc_cache.txt
>robocopy d:\temp\out \\WIN-EKJ8HU9E5QC\TestW2K19\in /mir > d:\robo_pc_w2k19.txt
>robocopy \\storage\testmnt\in d:\temp\in /mir /fft > d:\robo_mnt_pc.txt
>robocopy \\storage\testcache\in d:\temp\in /mir /fft > d:\robo_cache_pc.txt
>robocopy \\WIN-EKJ8HU9E5QC\TestW2K19\in d:\temp\in /mir > d:\robo_w2k19_pc.txt

During the robocopy to Unraid I noticed that sporadically the Unraid web UI, and web browsing in general, becomes very slow. This never happens while copying to W2K19. I can’t explain this: I see no errors reported in my Win10 client eventlog or resource monitor, I see no unusual errors on the network switches, and no errors in Unraid. I suspect whatever is impacting SMB performance is affecting network performance in general, but without data I am really just speculating.
The robocopy read results are pretty even, but again show inferior Unraid SMB write performance. Do note that the W2K19 VM is still not as fast as my previous W2K16 RAID6 setup, where I could consistently saturate the 1Gbps link for reads and writes, on the same hardware and using the same disks.

It is very disappointing to discover the poor SMB performance. I reported my findings to the Unraid support forum, and I hope they can do something to improve performance, or maybe invalidate my findings.

## Unraid and Robocopy Problems

In my last post I described how I converted one of my W2K16 servers to Unraid, and how I am preparing for conversion of the second server. As I’ve been copying all my data from W2K16 to Unraid, I discovered some interesting discrepancies between W2K16 SMB and Unraid SMB.

I use robocopy to mirror files from one server to the other, and once the first run completes, any subsequent runs should complete without needing to copy any files again (unless they were modified). First, you have to use the “/fft” option, as in “robocopy.exe [source] [dest] /mir /fft”, for FAT File Times, allowing for 2 seconds of drift in file timestamps.

I found a large number of files that would copy over and over with no changes to the source files. I also found a particular folder that would “magically” show up on Unraid, and could not be deleted from the Unraid share by robocopy. After some troubleshooting, I discovered that files with old timestamps, and folder names that end in a dot, do not copy correctly to Unraid.

I looked at the files that would not copy, and I discovered that the file modified timestamps were all set to “1 Jan 1970 00:00”. I experimented by changing the modified timestamp to today’s date, and the files copied correctly. It seems that if the modified timestamp on the source file is older than 1 Jan 1980, the modified timestamp on Unraid for the same newly created file will always be set to 1 Jan 1980.
When robocopy then runs again, the source files will always be reported as older, and the files are copied again.

Below is an example of a folder of test files with a created date of 1 Jan 1970 UTC. I copy the files using robocopy, and copy them again. The second run of robocopy again copies all the files, instead of reporting them as similar. One can see that the destination timestamp is set to 1 Jan 1980, not 1 Jan 1970 as expected.

The second set of problem files occurs in folders with names ending in a dot. Unraid ignores the dots at the end of the folder names, and when another folder exists without dots, the copy operation uses the wrong folder. Below is an example of a folder that contains two directories, one named “LocalState”, and one named “LocalState..”. I robocopy the folder contents, and when running robocopy again, it reports an extra folder. That extra folder gets “magically” created in the destination directory, but the “LocalState..” folder is missing. The same robocopy operations to the W2K16 server over SMB work as expected.

From what I researched, the timestamp range for NTFS is 1 January 1601 to 14 September 30828, FAT is 1 January 1980 to 31 December 2107, and EXT4 is 1 January 1970 to 19 January 2106 (2038 + 408). I could not create files with a date earlier than 1 Jan 1980, but I could set file modified timestamps to dates greater than 2106, so I do not know what the Unraid timestamp range is.

Creating and accessing directories with trailing dots requires special care on Windows using the NT style notation, e.g. “CreateDirectoryW(L"\\\\?\\C:\\Users\\piete\\Unraid.Badfiles\\TestDot..", NULL)”, but robocopy does handle that correctly on W2K16 SMB.

I don’t know if the observed behavior is specific to Unraid SMB, or if it would apply to Samba on Linux in general. But it posed a problem, as I wanted to make sure I do indeed have all files correctly backed up. I decided to write a quick little app to find problem files and folders.
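To make the two problem cases concrete, here is a minimal Python sketch of such a scan. It is not the app itself. The function name `find_problem_entries` and the 1 Jan 1980 UTC cutoff are assumptions based on the behavior described above, and it only reports, it does not fix anything.

```python
import os
import tempfile
from datetime import datetime, timezone

# Assumed cutoff: modified times before 1 Jan 1980 UTC were not preserved.
FAT_EPOCH = datetime(1980, 1, 1, tzinfo=timezone.utc).timestamp()


def find_problem_entries(root):
    """Return (files modified before 1980, folders whose name ends in a dot)."""
    old_files, dotted_dirs = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            if name.endswith("."):
                dotted_dirs.append(os.path.join(dirpath, name))
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime < FAT_EPOCH:
                old_files.append(path)
    return old_files, dotted_dirs


if __name__ == "__main__":
    # Demo on a throwaway folder: one file stamped 1 Jan 1970, one dotted folder.
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "old.txt")
        open(old, "w").close()
        os.utime(old, (0, 0))  # set access and modified time to the Unix epoch
        os.mkdir(os.path.join(root, "LocalState.."))
        files, dirs = find_problem_entries(root)
        print("old files:", files)
        print("dotted folders:", dirs)
```

On Windows, folders with trailing dots generally need the `\\?\` path prefix mentioned above before they can be created or walked, so the demo is easiest to run on Linux.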
The app iterates through all files and folders, fixes timestamps that are out of range, and reports any files or folders that end in a dot. I ran it through my files, it fixed the timestamps for me, and I deleted the folders ending in a dot by hand. Multiple robocopy runs now complete as expected.

## Moving from W2K16 to Unraid

I have been happy with my server rack running my UniFi network equipment and two Windows Server 2016 (W2K16) instances. I use the servers for archiving my media collection and running Hyper-V for all sorts of home projects and work related experiments. But time moves on, one can never have enough storage, and technology changes. So I set out on a path that led me to replace my W2K16 servers with Unraid.

I currently use Adaptec 7805Q and 81605ZQ RAID cards, with a mixture of SSD for caching, SSD RAID1 for boot and VM images, and HDD RAID6 for the large media storage array. The setup has been solid, and although I’ve had both SSD and HDD failures, the hot spares kicked in, and I replaced the failed drives with new hot spares, no data lost.

For my large RAID6 media array I used lots of HGST 4TB Ultrastar (enterprise) and Deskstar (consumer) drives, but I am out of open slots in my 24-bay 4U case, so adding more storage has become a problem. I can replace the 4TB drives with larger drives, but in order to expand the RAID6 volume without losing data, I need to replace all disks in the array, one-by-one, rebuilding parity in between every drive upgrade, and then expand the volume. This will be very expensive, take a very long time, and risk the data during every drive rebuild.

I have been looking for more flexible provisioning solutions, including Unraid, FreeNAS, OpenMediaVault, Storage Spaces, and Storage Spaces Direct.
I am not just looking for dynamic storage: I also want a system that can run VM’s and Docker containers, I want it to work with consumer and/or small business hardware, and I do not want to spend all my time messing around in a CLI.

I have tried Storage Spaces with limited success, but that was a long time ago. Storage Spaces Direct offers significant improvements, but with more stringent enterprise hardware requirements that would make it too costly and complicated for my home use.

FreeNAS offers the best storage capabilities, but I found the VM and Docker ecosystem to be an afterthought and still lacking.

OpenMediaVault (OMV) is a relative newcomer with growing support for VM’s and Docker, and its web front-end is modern; think of OMV as Facebook and FreeNAS and Unraid as MySpace. Compared to FreeNAS and Unraid the OMV community is still very small, and I was reluctant to entrust my data to it.

Unraid offered a good balance between storage, VM, and Docker, with a large support community. Unlike FreeNAS and OMV, Unraid is not free, but the price is low enough.

An ideal solution would have been the storage flexibility offered by FreeNAS, the docker and VM app ecosystem offered by Unraid, and the UI of OMV. Since that does not exist, I opted to go with Unraid.

Picking a replacement OS was one problem, but moving the existing systems to run on it, without losing data or workloads, quite another. I decided to convert the two servers one at a time, so I moved all the Hyper-V workloads from Server-2 with the 8-bay chassis, to Server-1 with the 24-bay chassis. This left Server-2 unused, and I could go about converting it to Unraid. I not only had to install Unraid, I also had to provision enough storage in the 8-bay chassis to hold all the data from the 24-bay chassis, so that I could then move the data on Server-1 to Server-2, convert Server-1 to Unraid, and move the data back to Server-1. And I had to do this without risking the data, and without an extended outage.
To get all the data from Server-1 to fit on Server-2, I pruned the nearly 60TB data set down to around 40TB. You know how it works, no matter how much storage you have it will always be filled. I purchased 4 x 12TB Seagate IronWolf ST12000VN0007 drives, and combined with 2 x 4TB HGST drives, that gave me around 44TB of usable storage space, enough to copy all the important data from Server-1 to Server-2.

While I was at it, I decided to upgrade the IPMI firmware, motherboard BIOS, and RAID controller firmware. I knew it was possible to upgrade the SuperMicro BIOS through IPMI, but you have to buy a per-motherboard locked Out-of-Band feature key from SuperMicro to enable this, something I had never bothered doing. While looking for a way to buy a key online, I found an interesting post describing a method of creating my own activation keys, and it worked.

IPMI updated, motherboard BIOS updated, RAID firmware updated, I set about converting the Adaptec RAID controller from RAID to HBA mode. Unlike the LSI controllers that need to be re-flashed with IR or IT firmware to change modes, the Adaptec controller allows this configuration via the controller BIOS. In order to change modes, all drives have to be uninitialized, but there were two drives that I could not uninitialize. After some troubleshooting I discovered that it is not possible to delete MaxCache arrays from the BIOS. I had to boot using the Adaptec bootUSB Utility, which is a bootable Linux image that runs the MaxView storage controller GUI. MaxCache volumes deleted, I could convert to HBA mode.

With the controller in HBA mode, I set about installing Unraid. Well, it is not really installing in the classic sense, Unraid runs from a USB drive, and all drives in the system are used for storage. There is lots of info online on installing and configuring Unraid, but I found very good info on the Spaceinvader One YouTube channel.
I have seen some reports of issues with USB drives, but I had no problems using a SanDisk Cruzer Fit drive.

It took a couple of iterations before I was happy with the setup, and here are a few important things I learned:

• Unraid does not support SSD drives as data drives, see the install docs: “Do not assign an SSD as a data/parity device. While unRAID won’t stop you from doing this, SSDs are only supported for use as cache devices due to TRIM/discard and how it impacts parity protection. Using SSDs as data/parity devices is unsupported and may result in data loss at this time.” This is one area where FreeNAS and OMV, using e.g. ZFS, offer much better redundancy solutions than Unraid’s parity solution, as do the many commercial solutions that have been using SSDs in drive arrays for years.
• Unraid’s caching solution using SSD drives and BTRFS works just OK. Unlike e.g. Adaptec MaxCache that seamlessly caches block storage regardless of the file system, the Unraid cache works at the file level. While this does create flexibility in deciding which files from which shares should be using the cache, it greatly complicates matters when running out of space on the cache. When a file is created on the cache, and the file is then enlarged to the point it no longer fits in the available space, the file operation will permanently fail. E.g. when copying a file to a cached share, and the file is larger than the available space, the copy will proceed until the cache runs out of space and then fail, and repeating the copy gives the same result. To avoid this, one has to set the minimum free space setting to a value larger than the largest file that would ever be created on the cache, and for large files this is very wasteful. Imagine a thin provisioned VM image, it can grow until there is no space left, and then fail, until it is manually moved to a different drive.
• The cache re-balancing and file moving algorithm is very rudimentary, the operation is scheduled per time period, and will move files from the cache to regular storage. There is no support for flushing the cache in real-time as it runs out of space, there are no high water or low water mechanisms, and no LRU or MRU file access logic. I installed the Mover Tuning plugin that allows balancing the cache based on consumed space, better, but still not good enough.
• Exhausting the cache space while copying files to Unraid is painfully slow. I used robocopy to copy files from W2K16 to a share on Unraid that had caching set to “preferred”, meaning use the cache if it has space, and as soon as the cache ran out of space, the copy operation slowed down to a crawl. At that point new files were supposed to be written to HDD, but my experience showed that something was not working, and I had to disable the cache and then copy the files. The whole SSD and caching thing is a big disappointment.
• Building parity while copying files is very slow. Copying files using robocopy while the parity was building resulted in about 200Mbps throughput. I cancelled the parity operation, disabled the parity drive, and copied with no parity protection in place, and got near the expected 1Gbps throughput. I will re-enable parity building after all data is copied across.
• Performing typical disk based operations, like adding, removing, or replacing a drive, is very cumbersome. The wiki tries to explain, but it is still very confusing. I really expected much easier ways of doing typical disk based operations, especially when almost all operations result in the parity becoming invalid, leaving the system exposed to failure.
• It is really easy to use Docker, with containers directly from Docker Hub, or from the Community Applications plugin that acts like an app store.
• It is reasonably easy to create VMs. One has to manually install the VirtIO KVM/QEMU drivers in Windows guests, but it is made easy with the automatic mounting of the VirtIO driver ISO.
• I could not get any Ubuntu Desktop VMs working, they would all hang during install. I had no problems with Ubuntu Server installs. I am sure there is a solution, I just have not looked for one yet, as I only needed Ubuntu Server.
• VM runtime management is lacking, there is no support for snapshots or backups. One can install the Virt-Manager container to help, but it is still rather rudimentary compared to offerings from VMware, Hyper-V, and VirtualBox.
• In order to get things working I had to install several community plugins, I would have expected this functionality to be included in the base installation. Given how active the plugin authors are in the community, I wonder if not including said functionality by default may be intentional?
• Drive power saving works very well, and drives are spun down when not in use. I will have to revisit the file and folder to drive distribution, as common access patterns to common files should be constrained to the same physical drive.
• The community forum is very active and very helpful.

I still have a few days of file copying left, and I will keep my W2K16 server operational until I am confident in the integrity and performance of Unraid. When I’m ready, I’ll convert the second server to Unraid, and then re-balance the storage, VM, and Docker workloads between the two servers.

## Monoprice MP Voxel 3D Printer Setup

[Update] After using the printer for about two weeks, I returned it to Monoprice for a refund. I would suggest you stay away and look elsewhere, or wait until Monoprice addresses the serious issues: Polar Cloud disconnects while printing, an IO timeout error that breaks the camera function, and, the deal breaker, hangs during printing with the touch screen unresponsive and the extruder and print bed heaters still on.

I’ve been looking for a new 3D printer to use at home, and I just installed and configured my new Monoprice Voxel 3D Printer.

I bought my first 3D printer, a Makerbot Replicator 2, for our office new-tech lab in 2012. It was a shared printer, hard to maintain, and eventually printed to death. A few months ago I found the HobbyKing Turnigy Mini Fabrikator V2 on sale at a great price. It sat unopened until a few weeks ago, when I wanted to use it to print Christmas ornaments for the kids to decorate. What a disappointment: difficult to configure, flimsy construction, and during the first print the filament guide adapter broke away from the print head. It was impossible to fix without ordering replacement parts, so I just trashed it.

In search of a replacement, I decided to look for something the kids could use with minimal (wishful thinking at this age) help from me. That meant an enclosed printer and iPad capable software. A semi-helpful Reddit post made it clear to me that the best printers, and best value for money printers, are not really kid friendly, but it did point me in the right direction: to Polar Cloud, to FlashForge, and eventually to the Monoprice Voxel, which is a cheaper Monoprice branded FlashForge Adventurer 3.

Unboxing and setting up was easy, and in a few minutes I printed the sample cube that is included on the 8GB internal memory. The unit is quiet while printing, but the continuous high pitched fan noise is annoying.

Next I configured WiFi and connected to Polar Cloud, and this is where things started going wrong. The manual that is included in the box is not complete, and I found an updated manual and instructional video on the Monoprice site. The steps to connect to WiFi are easy, but I found that using the touch screen was very difficult. The WiFi password is hidden with asterisks, and the touch screen has a tendency to not respond, or to pick multiple characters, or the wrong character.
Since I could not see the password, it took several frustrating tries to get the right password entered. This experience could easily be improved by simply not hiding the password, or by allowing configuration and control via a mobile app.

The Monoprice site lists the FlashPrint software as version 3.23.2, while the version on the FlashForge site is 3.25.1, so naturally I installed the latest software from FlashForge. Mistake, I could not select the Voxel as printer, and it turns out that “FlashPrint-MP” is for Monoprice printers, and “FlashPrint” is for FlashForge printers. So back to installing FlashPrint-MP 3.23.2. The software is typical modeling and slicing software, and allowed me to connect to the printer over the network.

I followed the instructions to connect the Voxel to Polar Cloud, but I kept getting an error about my printer MAC address already belonging to a different user. I found a KB article that instructed me to hard reset the printer, and I dreaded having to reenter my WiFi password. The article mentioned that this problem will be addressed in a future firmware update, so I tried that first. I could not find any downloadable firmware, but I found that updating from the printer pulled new firmware over the internet. After a reboot, no more error, and I was connected to Polar Cloud.

I encountered some IO timeout errors being displayed on the printer. I thought they were related to Polar Cloud, but I later encountered them during normal operation while navigating the camera configuration menus. I suspect they are related to the camera, since I lost the camera view on Polar Cloud as soon as I got this error. I hope this gets fixed in a firmware update.

My first cloud print after the firmware update did not go so well, the head scratched the print plate. I did find other users (Amazon reviews) complaining of the same scratching problem, and I suspect the firmware update may have reset the calibration.
After re-leveling, cloud prints worked fine again, but this should never have happened.

Printing from Polar Cloud is super simple: select a community model or upload your own model, and print. The job is sliced per the selected options and sent to the printer, and it is a nice touch to have the printer camera images displayed on the website while printing.

I did notice that the time estimate displayed on the printer keeps changing, e.g. Polar Cloud listed the time remaining as 2 hours 26 minutes, while the Voxel showed 2 minutes … 11 minutes … 45 minutes. I suspect the job is spooled such that the printer only knows what is left in the buffer vs. the complete job.

In closing, so far, I think it is a great printer for the money, and I hope future firmware updates improve the experience.

Bad:

• Instructions are incomplete and all over the place.
• Poor touchscreen experience, difficult to press, double press, wrong press.
• Difficult WiFi setup, password is hidden, and combined with wrong key presses results in frustration.
• Scratched build plate, re-leveling seems to be required after a firmware update.
• Out of the box Polar Cloud would not connect due to duplicate MAC address, requires firmware update to fix.
• Sporadic IO timeout errors, suspect related to camera.
• High pitched fan noise (I may try to mod this by replacing the fan or adding sound dampening material).
• Hangs during printing, with the extruder and build plate heaters still on, and touchscreen unresponsive.

Good:

• Low price
• Enclosed
• Easy to use
• Cloud print

For others setting up this printer, I would recommend the following steps:

1. Ignore the manual in the box, download the manual from Monoprice.
2. Connect to WiFi, I suggest using the eraser part of a pencil to touch the screen while entering the password, this helps prevent typing mistakes.
3. Update the firmware from the tools menu.
4. Re-level the build plate.
5. Connect to Polar Cloud.
6. Print, but do not print until after re-leveling.

My next step is to find suitable software for the kids to use for modeling on their iPads.

## PurpleAir Sensor Installation

I’ve had an Ambient Weather WS-1400-IP weather station installed for some time, reporting to Weather Underground. During the fires of previous years, I considered getting an air quality monitor, but I could never find anything worth the installation effort. During this year’s fire season I saw several ads for PurpleAir, advertising that they collaborate with Weather Underground, so I decided to purchase and install a PA-II outdoor sensor.

The installation instructions are sparse, and the device is not really what I would call rugged or weather proof. I would not put money on it surviving outdoors for longer than a year, specifically because of the use of a vanilla Micro-USB power plug that offers no corrosion protection. The unit I received came with a Nest outdoor camera power cable, but unlike the Nest camera that uses a watertight plug, the sensor uses an open USB plug. The instructions do say to point the open USB port downwards, instead I opted to seal it using clear silicone sealer.

The ideal installation location would have been near my Rachio water sprinkler controller, where I have a waterproof enclosure with power, but that location is also near the HVAC, instant hot water heater, and dryer vents, so not ideal due to local air pollutants. I installed the sensor next to my UniFi AC Mesh outdoor AP, the cables do look a bit messy, and if the sensor survives long enough, I may install an enclosure to clean up the cables.

Configuring the device is reasonably simple, but a mobile app would have been easier. Power up the device, connect to its WiFi access point, access a web page hosted by the device, configure the local WiFi SSID and password, connect to local WiFi, then register the device with PurpleAir. After all was done, I received a welcome email, and I could see the device on the PurpleAir map.
Next up, I have to figure out how to view combined weather and air quality data on Weather Underground, how to get a direct link to my sensor’s data (the map link shows an area only), and how to use the data API (I archive all my data).

## eNom Dynamic DNS Update Problems

Update: On 27 July 2018 eNom support notified me by email that the issue is resolved. I tested it, and all is back to normal with DNS-O-Matic.

Sometime between 12 May 2018 and 24 May 2018 the eNom dynamic DNS update mechanism stopped working.

I use the very convenient DNS-O-Matic dynamic DNS update service to update my OpenDNS account, and several host records at eNom, pointing them to my home IP address. I was first alerted to the problem by a DNS-O-Matic status failure email, but as I was about to get on a plane for a business trip, I ignored the issue, hoping it was temporary.

    eNom response for 'foo.bar.net':
    --------------------
    ;URL Interface
    ;Machine is SJL0VWAPI03
    ;Encoding Type is utf-8
    Command=SETDNSHOST
    APIType=API.NET
    Language=eng
    ErrCount=1
    Err1=Domain name not found
    ResponseCount=1
    ResponseNumber1=316153
    ResponseString1=Validation error; not found; domain name(s)
    MinPeriod=1
    MaxPeriod=10
    Server=sjl0vwapi03
    Site=eNom
    IsLockable=
    IsRealTimeTLD=
    TimeDifference=+0.00
    ExecTime=0.053
    Done=true
    TrackingKey=5d09a343-b2d6-44e2-8d70-0ad9bcabcb8d
    RequestDateTime=6/21/2018 6:11:11 PM
    --------------------

Here is the update history from DNS-O-Matic:

    47.44.1.123, Jun 29, 2018 4:58 pm, ERROR
    47.44.1.123, Jun 29, 2018 4:53 pm, ERROR
    47.44.1.123, Jun 21, 2018 6:11 pm, ERROR
    47.44.1.123, May 24, 2018 6:10 pm, ERROR
    47.44.1.124, May 12, 2018 8:56 am, OK
    47.44.1.124, May 4, 2018 2:48 pm, OK
    47.44.1.124, May 3, 2018 1:42 pm, OK
    47.44.1.124, Apr 1, 2018 12:39 pm, OK
    47.44.1.124, Apr 1, 2018 9:58 am, OK
    47.44.1.124, Mar 24, 2018 5:06 pm, OK

As of yesterday, I could not find any other reports of similar issues on Google, and the eNom status page showed no problems.
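The eNom response shown above is a plain list of `Key=Value` lines with `;` comment lines, so a script can check `ErrCount` and `Err1` directly instead of waiting for a status email. Here is a minimal PowerShell sketch, assuming the response always follows the layout of that one sample; `ConvertFrom-EnomResponse` is a made-up helper name for illustration, not part of any eNom or DNS-O-Matic tooling.

```powershell
# Hypothetical helper, not part of any eNom tooling: parses a response made of
# "Key=Value" lines, assuming the layout matches the sample response above.
function ConvertFrom-EnomResponse {
    param([Parameter(Mandatory = $true)][string]$Text)

    $fields = @{}
    foreach ($line in ($Text -split "`r?`n")) {
        $line = $line.Trim()
        # Skip blank lines, ";" comment lines, and "-----" separator lines.
        if ($line -eq '' -or $line.StartsWith(';') -or $line.StartsWith('-')) { continue }
        # Split on the first "=" only, so values that contain "=" stay intact.
        $key, $value = $line -split '=', 2
        # Lines without "=" (like the "eNom response for ..." header) are ignored.
        if ($null -ne $value) { $fields[$key] = $value }
    }
    return $fields
}

# Trimmed copy of the failing response from above.
$sample = @'
;URL Interface
Command=SETDNSHOST
ErrCount=1
Err1=Domain name not found
Done=true
'@

$response = ConvertFrom-EnomResponse -Text $sample
if ([int]$response['ErrCount'] -gt 0) {
    Write-Host "eNom update failed: $($response['Err1'])"
}
```

For the trimmed sample this reports the same `Domain name not found` error that DNS-O-Matic logged. Keeping the fields in a hashtable means the other values, like `TrackingKey`, are available when talking to support.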
I use a Ubiquiti UniFi Security Gateway Pro as home router, and I have the dynamic DNS service in the UniFi controller configured to point to DNS-O-Matic, but it offered no additional hints as to the cause of the problem.

I contacted eNom support over chat, and they informed me they know there is an issue, and they said I should use the following format for the update:

    http://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=%1&PW=%2&Zone=%3&DomainPassword=%4
    %1 = Is username in Enom
    %2 = Is password
    %3 = Is my host and domain
    %4 = Is my domain access password

This was interesting, I had looked at several eNom update scripts, even the eNom sample code, and they all used a different command format. I looked up the SetDNSHost documentation, and sure enough, it looks like eNom changed the API.

Old format:

    https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&HostName=[host]&Zone=[domain]&DomainPassword=[password]&Address=[IP]

New format:

    https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=[LoginName]&PW=[LoginPassword]&Zone=[FQDN]&DomainPassword=[Password]&Address=[IP]

eNom changed the meaning of the “Zone” parameter to be the fully qualified domain name, and they required the addition of the account username and password.

I tried the old format in my browser, and I got the same “Domain name not found” error. As I tried the URL, I noticed that HTTPS failed with a certificate mismatch. The certificate for https://dynamic.name-services.com points to reseller.enom.com.

Broken SSL, and including my account username and password, was not an acceptable option. Additionally, I use 2FA on my account, so I had doubts that my password would even work. I tried the command as described in the documentation, but I omitted my account password, and it worked.
https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=[LoginName]&Zone=[FQDN]&DomainPassword=[Password]&Address=[IP]

I still find it very weird that this has been broken for so long, and that I could not find other reports of the problem on Google. Are people not using eNom or eNom resellers with dynamic DNS? I also find it disappointing that the status page is not reflecting this problem, and that the SSL certificate does not match the domain, one would expect more from a domain company.

Until eNom fixes the problem, or until DNS-O-Matic updates support for the new API format, I created a PowerShell script to update my domains, maybe it is useful for others with the same problem.

    $UserName = 'eNom account username'
    $HostNames = @('www', 'name1', 'name2', 'etc')
    $DomainName = 'yourdomain.com'
    $Password = 'Domain change password'

    # Look up the current public IP address
    $url = 'http://myip.dnsomatic.com'
    $webclient = New-Object System.Net.WebClient
    $result = $webclient.DownloadString($url)
    Write-Host $result
    $IPAddress = $result.ToString()
    $webclient.Dispose()

    # Ignore SSL error caused by dynamic.name-services.com SSL certificate pointing to a different domain
    [System.Net.ServicePointManager]::ServerCertificateValidationCallback = {$true}
    $webclient = New-Object System.Net.WebClient
    # Update each host record using the new API format, without the account password
    foreach ($hostname in $HostNames)
    {
        $url = "https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=$UserName&Zone=$hostname.$DomainName&DomainPassword=$Password&Address=$IPAddress"
        Write-Host $url
        $result = $webclient.DownloadString($url);
        Write-Host $result
    }
    $webclient.Dispose()
    # Restore default certificate validation
    [System.Net.ServicePointManager]::ServerCertificateValidationCallback = $null

## CrashPlan throws in the towel … for home users

Today CrashPlan, my current online backup provider, announced on Facebook of all places, that they threw in the towel, and will no longer provide service to home users. The backlash was heated, and I found the CEO’s video message on the blog post rather condescending.

I’ve been a long time user of online backup providers, and many have thrown in the towel, especially when free file sync from Google and Microsoft offers ever expanding capabilities and more and more free storage. Eventually even the cheapest backup storage implementation becomes expensive, when compared to a cloud provider, and not profitable as a primary business.

I’ve been using CrashPlan’s unlimited home plan for quite some time now, they were one of a few, today none, that were reasonably priced, allowed unlimited storage, and supported server class OSes. But, I could sense the writing was on the wall: they split the home and business Facebook accounts, they split the website, the home support site has not seen activity in ages, they made major improvements to the enterprise backup agent, switching to a much leaner and faster C++ agent, while the home agent remained the old Java app with its many shortcomings, and there were some vague rumors on the street of a home business selloff attempt.

The transition offered a free switch to the small business plan, for the remaining duration of the home subscription, plus 3 months, and then a 75% discount on next year’s plan. For my account, this means free CrashPlan Pro until 12 June 2018, then $2.50 per month until 12 June 2019, and then $10.00 per month.

I’ve switched to the Pro plan. As promised, the agent updated itself, going from the old Java agent to the new C++ agent, the already backed up data was retained without needing to back up again, and all seems well, for now…

## Razer BSOD When Driver Verifier is Enabled

I am done with Razer, exciting promises for technology on paper, great looking hardware, terrible support, terrible software.

Not too long ago I complained about Razer’s poor UX and Support, this time it is a BSOD in one of their drivers, and forever crashing Razer Stargazer camera software.

I’ve been looking for a Windows Hello capable webcam, and the Razer Stargazer, based on Intel RealSense technology, looked promising. The device is all metal and tactical looking, but the software experience is so buggy, install this, install that, then crash after crash after crash. I ended up returning it for a refund, and got a Logitech BRIO instead, the BRIO is cheaper, and works great.

A couple days ago I was greeted with a BSOD on one of my test machines, a crash in the RZUDD.SYS “Razer Rzudd Engine” driver, part of the Razer Synapse software. What makes this interesting, is that the issue seems to be triggered by having Driver Verifier enabled.

One may be tempted to say do not enable Driver Verifier, but the point of Driver Verifier is to help detect bugs in drivers, and it is a basic requirement for driver certification. Per the WinDbg analysis, this appears to be a memory corruption bug. After some searching, I found that the Driver Verifier BSOD has been reported by other users, with no acknowledgement, and no fix forthcoming. I contacted Razer support, and not surprisingly, they suggested an uninstall and reinstall. I tried the community forums, and I was just pointed back to support.

    FAULTING_IP:
    rzudd+28c80
    ...
    DEFAULT_BUCKET_ID:  CODE_CORRUPTION
    ...
    PROCESS_NAME:  RzSynapse.exe
    ...
    STACK_TEXT:
    nt!KeBugCheckEx
    nt!MiSystemFault+0x12e69c
    nt!MmAccessFault+0xae6
    nt!KiPageFault+0x132
    rzudd+0x28c80
    rzudd+0x218d4
    rzudd+0x7a9f
    Wdf01000!FxIoQueue::DispatchRequestToDriver+0x1bf [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 3325]
    Wdf01000!FxIoQueue::DispatchEvents+0x3bf [minkernel\wdf\framework\shared\irphandlers\io\fxioqueue.cpp @ 3125]
    Wdf01000!FxPkgIo::DispatchStep1+0x53e [minkernel\wdf\framework\shared\irphandlers\io\fxpkgio.cpp @ 324]
    Wdf01000!FxDevice::DispatchWithLock+0x5a5 [minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1430]
    nt!IovCallDriver+0x245
    ...
    FAILURE_BUCKET_ID:  MEMORY_CORRUPTION_LARGE


I am done with Razer, exciting promises for technology on paper, great looking hardware, terrible support, terrible software.