Recovering the Firmware on a Supermicro BPN-SAS3-846EL1 Backplane

In a previous adventure I replaced an Adaptec HBA with an LSI SAS3 HBA, and the chassis drive bay LEDs stopped working. I suspect the LSI card does not play nicely with the SGPIO sideband controller, so I decided to replace the chassis with one similar to my SC846 chassis, where the LSI card and drive bay LEDs do work fine.

What should have been a simple replacement turned into quite a recovery operation.

Since I now had SAS3 HBAs in both my servers, I really wanted to get SAS3 backplanes, but I did not want to pay SAS3 chassis prices. I found a refurbished SuperChassis 846E16-R1200B chassis and two refurbished Supermicro BPN-SAS3-846EL1 backplanes on eBay. One SAS3 backplane would replace the SAS2 backplane in my existing SC846, and the second SAS3 backplane would replace the SAS2 backplane in the newly purchased SC846. The combination of a 24 bay 4U chassis with a SAS2 backplane and a replacement SAS3 backplane is much cheaper than any native SAS3 chassis I could find.

I do understand that using SATA3 drives on a SAS3 backplane will not perform like SAS3 drives, but with multipath the aggregate throughput can still outperform SAS2.

I received my chassis and my backplanes. The chassis was clean but a bit dinged in one corner, and the expanders were clean, but the metal frames were showing a little rust. I had no idea how old the firmware on the backplanes was, so I contacted Supermicro support to ask for the current firmware. After asking for my serial numbers, Supermicro sent me the latest firmware for my hardware. The firmware update instructions were included in the “ReleaseNote.txt” file that came with the firmware.

I removed the motherboard from the old chassis and installed it in the new SC846. I removed the SAS2 backplane, and installed the SAS3 backplane in its place. The power cable layout on the SAS3 backplane is a bit different, and I had to use a few molex power splitters to extend the power cables to reach the power plugs on the SAS3 backplane. The standard rails are too long to fit in my rack, and the short rails are too short for the chassis, so I again used short outer rails and standard inner rails.

I powered the machine up through remote IPMI KVM, all looked good, and I booted into my Ubuntu Server USB stick so I could SSH into the box, and update the firmware.

The instructions from “ReleaseNote.txt” say:

How to Flash Firmware
-------------------------
Under Linux/Windows Environment to use CLIXTL (ver.6.00)
1. use "CLIXTL -l" to show SAS addresses
2. use "CLIXTL -f all  -d " to update firmware
3. use "CLIXTL -f 3  -d  -r" to update MFG and reset expander

example:
CLIXTL -l
CLIXTL -f all -t 500304800000007F -d SAS3-EXPFW_66.16.11.00.fw
CLIXTL -f 3 -t 500304800000007F -d BPN-SAS3-846EL-PRI_16_11.bin -r

The existing firmware on the expanders was v66.06.01.02 and MFG v06.02, while the new firmware was v66.16.11.00 and MFG v16.11.

Firmware Update History
-------------------------
01. migrate expander firmware to phase 16
02. enhance TMP, VOL, FAN, and PWS status in SES pages
03. present version information of current running firmware
04. Dynamic SES page element presentation
05. move BMC IP to SCSI network inquiry
06. support I2C R/W as slave to let BMC to identify platforms
07. redundant side has sensor information
08. firmware rewrite and optimization

I did the following:

$ sudo ./CLIXTL -i -t 5003048001B24DBF
================================================================================
COMMAND-LINE INTERFACE XTOOL
version 6.10.C
Supermicro Computer ,Inc.
================================================================================
UNIT SPECIFIC INFORMATION:
SAS ADDRESS - 5003048001B24DBF
ENCLOSURE ID - 5003048001B24DBF
ENCLOSURE INFORMATION:
PLATFORM NAME - SMC846ELSAS3P
SERIAL NUMBER -
VENDOR ID - LSI
PRODUCT ID - SAS3x40
VERSION INFORMATION:
FLASH REGION 0 - 66.06.01.02
FLASH REGION 1 - 66.06.01.02
FLASH REGION 2 - 66.06.01.02
FLASH REGION 3 - 06.02
DEVICE INFORMATION:
DEVICE NAME - /dev/sg0
BMC IP - NULL

$ sudo ./CLIXTL -f all -t 5003048001B24DBF -d SAS3-EXPFW_66.16.11.00.fw
================================================================================
COMMAND-LINE INTERFACE XTOOL
version 6.10.C
Supermicro Computer ,Inc.
================================================================================
Firmware Region 0 - Finished
Firmware Region 1 - Finished
Firmware Region 2 - Finished

pieter@ubuntuusb:~/SAS3$ sudo ./CLIXTL -f 3 -t 5003048001B24DBF -d BPN-SAS3-846EL-PRI_16_11.bin -r
================================================================================
COMMAND-LINE INTERFACE XTOOL
version 6.10.C
Supermicro Computer ,Inc.
================================================================================
Error, incompatible file type or directory

And the expander dropped and never came back up.

$ sudo ./CLIXTL -l
================================================================================
COMMAND-LINE INTERFACE XTOOL
version 6.10.C
Supermicro Computer ,Inc.
================================================================================
Error, no enclosure has been found

I can hear the comments now: why risk updating the firmware if it is not broken? True, but I didn’t know if it would work, and I’d rather start fresh. Do also note that I did the update on my secondary server; my primary server is still running unmodified, so there was no interruption of home service or work experiments.

I still had the second SAS3 backplane, so I replaced the “bricked” one, leaving the firmware as is, and brought my Unraid server back up. All appeared fine. At least I had two working servers, giving me time to try and recover the backplane.

I sent Supermicro support an email asking for help, but since it was the weekend I had to wait, so I did my own research. I found a forum post by a user who recovered a “bricked” SAS2 expander via the factory serial port, and I decided to give it a try.

I used my Arduino programming FTDI USB RS232 adapter, and the pin connections for PRI-SDB / 8 are:

PRI-SDB : 1 : TX  -> RS232 : RX
PRI-SDB : 2 : GND -> RS232 : GND
PRI-SDB : 3 : RX  -> RS232 : TX

The current XTools v6.10.C CLI does not include COM support, at least none that I could find in the documentation or CLI help, so I used the older v1.4 version.

>g3Xflash.exe -s com4 get avail
********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
Initializing Interface.....
Expander: Unknown (SAS_3X_40)
1) Unknown (SAS_3X_40) (00000000:00000000)

Good sign, the COM port worked, and the expander hardware was detected, but did not have an address.

I flashed the firmware and the MFG data:

>g3Xflash.exe -s com4 down fw SAS3-EXPFW_66.16.11.00.fw 0
********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
Initializing Interface..
Expander: Unknown (SAS_3X_40)
Expander Validation: Passed
Checksum: Passed
Target Firmware Region: 00
Current Version: 255.255.255.255
Replacement Version: 66.16.11.00
Image Validation: Passed
Pre-Validation of image is successful.
Post-validating........................................................Post-Validation of image is successful.

>g3Xflash.exe -s com4 down mfg BPN-SAS3-846EL-PRI_16_11.bin 3
********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
Initializing Interface.....
Expander: Unknown (SAS_3X_40)
Image Validation: Passed
Checksum: Passed
Reading MFG version from flash...Unable to retrieve version.
Replacement Version: 10.0b
Pre-Validation of image is successful.
Post-validating.........Post-Validation of image is successful.
Download Successful.

I reset the expander, the LEDs now did a test pattern that they did not do before, and things looked good:

>g3Xflash.exe -s com4 reset exp
********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
Initializing Interface....................
Expander: SC846-P (SAS_3X_40)
Are you sure you want to reset Expander?(y/n):y
Expander reset successful.

>g3Xflash.exe -s com4 get avail
********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
Initializing Interface..
INFO: Bootstrap is not present on board.
..................
Expander: SC846-P (SAS_3X_40)
1) SC846-P (SAS_3X_40) (50030480:0000007F)

>g3Xflash.exe -s com4 get exp

********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
Initializing Interface....................
Expander: SC846-P (SAS_3X_40)
Expander: SC846-P (SAS_3X_40) C1
Enclosure Logical Id: 50030480:0000007F
Component Identifier: 0x0232
Component Revision: 0x03

>g3Xflash.exe -s com4 get ver 0
********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
Initializing Interface....................
Expander: SC846-P (SAS_3X_40)
Firmware Region Version: 66.16.11.00

Everything looked good, except the SAS address defaulted to 50030480:0000007F.

The firmware “ReleaseNote.txt” file states that the v6.00 CLIXTL tool can change the SAS address, but the only version on the Supermicro site is v6.10.C, which does not support changing the SAS address.

How to modify SAS address
-------------------------
Under Linux/Windows Environment to use CLIXTL (ver.6.00)
1. use "CLIXTL -l" to show SAS addresses
2. use "CLIXTL -s  -t  -r" to change SAS address and reset expander

example:
CLIXTL -l
CLIXTL -s 500304801234567F -t 500304800000007F -r

The v1.4 GUI does support changing the SAS address. It appears that the GUI dynamically creates a MFG image (I could see a BIN file get created in the directory), but after it changed the address, the backplane was back to a borked state, and I had to repeat the recovery process.

By the next week I heard back from Supermicro, and they confirmed the instructions from the “ReleaseNote.txt” file were wrong, and I should use the instructions from the “Command-line Xtool 6.10.C.pdf” file.

Wrong, will bork the MFG data:
CLIXTL -f 3 -t 5003048001B24DBF -d BPN-SAS3-846EL-PRI_16_11.bin -r

Right:
CLIXTL -c -t 5003048001B24DBF -d BPN-SAS3-846EL-PRI_16_11.bin

Better, update firmware and MFG and retain settings:
CLIXTL -a usc -t 5003048001B24DBF -d ~/

I used the all in one update method on the server that was running the original firmware backplane, and it updated without issue:

# ./CLIXTL -a usc -t 500304800914683F -d ~/CLIXTL6.10.C_Linux/
================================================================================
COMMAND-LINE INTERFACE XTOOL
version 6.10.C
Supermicro Computer ,Inc.
================================================================================
Firmware Region 0 - Finished
Firmware Region 1 - Finished
Firmware Region 2 - Finished
MFG page Region 3 - Finished

[Reboot]

# ./CLIXTL -i -t 500304800914683F
================================================================================
COMMAND-LINE INTERFACE XTOOL
version 6.10.C
Supermicro Computer ,Inc.
================================================================================
UNIT SPECIFIC INFORMATION:
ENCLOSURE ID - 500304800914683F
ENCLOSURE INFORMATION:
PLATFORM NAME - SMC846ELSAS3P
SERIAL NUMBER -
VENDOR ID - LSI
PRODUCT ID - SAS3x40
VERSION INFORMATION:
FLASH REGION 0 - 66.16.11.00
FLASH REGION 1 - 66.16.11.00
FLASH REGION 2 - 66.16.11.00
FLASH REGION 3 - 16.11
DEVICE INFORMATION:
DEVICE NAME - /dev/sg11
BMC IP - NULL
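
A quick way to double check an update like this is to parse the "NAME - VALUE" lines of the `CLIXTL -i` output and confirm that flash regions 0 to 2 all report the same version. The sketch below is only an illustration: the function names are hypothetical, and it assumes the output layout matches the transcript above.

```python
def parse_clixtl_info(text):
    """Collect 'NAME - VALUE' lines from CLIXTL -i output into a dict."""
    info = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" - ")
        if sep:
            info[name.strip()] = value.strip()
        elif line.rstrip().endswith(" -"):
            # Field is present but empty, e.g. "SERIAL NUMBER -"
            info[line.rstrip()[:-2].strip()] = ""
    return info


def firmware_regions_match(info, expected):
    """True when flash regions 0-2 all report the expected firmware version."""
    return all(info.get(f"FLASH REGION {n}") == expected for n in range(3))


# Sample lines copied from the transcript above.
sample = """\
ENCLOSURE ID - 500304800914683F
SERIAL NUMBER -
FLASH REGION 0 - 66.16.11.00
FLASH REGION 1 - 66.16.11.00
FLASH REGION 2 - 66.16.11.00
FLASH REGION 3 - 16.11
DEVICE NAME - /dev/sg11
"""
info = parse_clixtl_info(sample)
print(firmware_regions_match(info, "66.16.11.00"))
```

Section headers such as "VERSION INFORMATION:" and the banner lines contain no " - " separator, so they are simply skipped.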

I now had one perfectly updated backplane preserving all the original MFG data, and one backplane with default MFG data. I wanted to apply the MFG data from the good backplane to the default values backplane.

I downloaded the firmware and manufacturing data from the good backplane using the v1.4 tools (not supported in current CLIXTL):

./g3Xflash -i get avail
./g3Xflash -y -i 500304800914683F up fw up_fw_loader_0.fw 0
./g3Xflash -y -i 500304800914683F up fw up_fw_loader_1.fw 1
./g3Xflash -y -i 500304800914683F up fw up_fw_loader_2.fw 2
./g3Xflash -y -i 500304800914683F up mfg up_mfg_loader.bin 3

The downloaded files are larger and are padded with 0xFF or 0x00. I trimmed the MFG file to the right size, and modified the SAS address in two places.

I tried to upload the modified MFG data:

>g3Xflash.exe -y -s com4 down mfg BPN-SAS3-846EL-PRI_16_11.bin 3
********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
Initializing Interface.....
Expander: Unknown (SAS_3X_40)
Image Validation: Passed
Checksum: Failed

But the tool complains that the checksum failed. From the file diff we can see that more than just the SAS address changes between images, so I assume some sort of checksum is calculated over the data.
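
For anyone wanting to repeat the comparison, here is a minimal sketch of how two MFG images could be diffed byte by byte, and how the occurrences of a SAS address could be located. It assumes the address is stored as 8 raw bytes in the order it is printed, which I have not verified against the real image format, and the data below is synthetic. It does not attempt to compute the checksum, since that algorithm is unknown.

```python
def diff_ranges(a: bytes, b: bytes):
    """Return (start, end) offsets, end exclusive, where two images differ."""
    ranges = []
    start = None
    length = max(len(a), len(b))
    for i in range(length):
        byte_a = a[i] if i < len(a) else None
        byte_b = b[i] if i < len(b) else None
        if byte_a != byte_b:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, length))
    return ranges


def find_all(data: bytes, needle: bytes):
    """Return every offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


# Synthetic stand-in for an MFG image; NOT the real file layout.
old_addr = bytes.fromhex("500304800000007F")
new_addr = bytes.fromhex("500304801234567F")
original = b"\x00" * 4 + old_addr + b"\xAA" * 4 + old_addr + b"\x11\x22"
modified = original.replace(old_addr, new_addr)

print(find_all(original, old_addr))
print(diff_ranges(original, modified))
```

On real images, any differing range that does not line up with an address offset is a candidate for the checksum or other derived data.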

The v1.4 g3xFlash CLI help does reference XML options for converting between binary and XML MFG formats, but no instructions on how to use it. Like the serial recovery procedure these tools are probably for internal use only, and I could find no public references.

>g3Xflash.exe -h
********************************************************************************
g3Xflash
LSI SAS Expander Flash Utility
Version: 2.0.0.0
********************************************************************************
SYNTAX:
g3Xflash OPTIONS INTERFACE COMMAND
OPTIONS:
...
COMMAND:
...
up mfg
...
In case of mfg upload using XML file, command syntax changes
as below. XML file needs to be specified at two places.
e.g.
"g3Xflash -x   up mfg  3"

Supermicro support confirmed the current tools cannot change the SAS address, they would not supply the older version of the tools, and recommended I send the backplane in for service, or allow them to remotely SSH to the machine and they will change it for me. A bit disappointing that something so simple is made so complicated, and really way too much trouble.

Since I only had one expander in the chassis, there would be no issues using a default SAS address, and I decided to leave it as is. I replaced the other SAS2 backplane with the recovered SAS3 backplane, and the expander and all drives were back online.

If anybody knows how to update the SAS address, or has a copy of the v6.00 CLIXTL tools that supposedly can change the address, please do let me know.

Unraid and Robocopy Problems

In my last post I described how I converted one of my W2K16 servers to Unraid, and how I am preparing for conversion of the second server.

As I’ve been copying all my data from W2K16 to Unraid, I discovered some interesting discrepancies between W2K16 SMB and Unraid SMB. I use robocopy to mirror files from one server to the other, and once the first run completes, any subsequent runs should complete without needing to copy any files again (unless they were modified).

First, you have to use the /fft option, for FAT File Times, e.g. “robocopy.exe [source] [dest] /mir /fft”, allowing for 2 seconds of drift in file timestamps.

I found a large number of files that would copy over and over with no changes to the source files. I also found a particular folder that would “magically” show up on Unraid, and cannot be deleted from the Unraid share by robocopy.

After some troubleshooting, I discovered that files with old timestamps, and folder names that end in a dot, do not copy correctly to Unraid.

I looked at the files that would not copy, and I discovered that the file modified timestamps were all set to “1 Jan 1970 00:00”. I experimented by changing the modified timestamp to today’s date, and the files copied correctly. It seems that if the modified timestamp on the source file is older than 1 Jan 1980, the modified timestamp on Unraid for the same newly created file will always be set to 1 Jan 1980. On the next robocopy run, the source files will always be reported as older, and the files are copied again.
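
The effect can be modelled in a few lines. This is only a sketch of the behavior described above, not of robocopy's or Samba's actual logic: it assumes the destination clamps old timestamps to 1 Jan 1980, and that files count as unchanged only when their timestamps are within the 2 second /fft tolerance. The function names are made up for the illustration.

```python
from datetime import datetime, timezone

# Earliest timestamp observed on the destination, per the description above.
FAT_EPOCH = datetime(1980, 1, 1, tzinfo=timezone.utc)


def stored_mtime(source_mtime, floor=FAT_EPOCH):
    """Model the destination: times before the floor are stored as the floor."""
    return max(source_mtime, floor)


def needs_copy(source_mtime, dest_mtime, tolerance_seconds=2):
    """Files differ when their mtimes are further apart than the tolerance."""
    return abs((source_mtime - dest_mtime).total_seconds()) > tolerance_seconds


unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
recent = datetime(2018, 7, 1, 12, 0, tzinfo=timezone.utc)

# A 1970 file never matches its clamped copy, so it is copied on every run.
print(needs_copy(unix_epoch, stored_mtime(unix_epoch)))
# A recent file is stored unchanged and is skipped on the next run.
print(needs_copy(recent, stored_mtime(recent)))
```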

Below is an example of a folder of test files with a created date of 1 Jan 1970 UTC, I copy the files using robocopy, and copy them again. The second run of robocopy again copies all the files, instead of reporting them as similar. One can see that the destination timestamp is set to 1 Jan 1980, not 1 Jan 1970 as expected.

The second set of problem files occur in folder names ending in a dot. Unraid ignores the dots on the end of the folder names, and when another folder exists without dots, the copy operation uses the wrong folder.

Below is an example of a folder that contains two directories, one named “LocalState”, and one named “LocalState..”. I robocopy the folder contents, and when running robocopy again, it reports an extra folder. That extra folder gets “magically” created in the destination directory, but the “LocalState..” folder is missing.

The same robocopy operations to the W2K16 server over SMB work as expected.

From what I researched, the timestamp ranges for NTFS is 1 January 1601 to 14 September 30828, FAT is 1 January 1980 to 31 December 2107, and EXT4 is 1 January 1970 to 19 January 2106 (2038 + 408). I could not create files with a date earlier than 1 Jan 1980, but I could set file modified timestamps to dates greater than 2106, so I do not know what the Unraid timestamp range is.

Creating and accessing directories with trailing dots requires special care on Windows using the NT style notation, e.g. CreateDirectoryW(L"\\\\?\\C:\\Users\\piete\\Unraid.Badfiles\\TestDot..", NULL), but robocopy does handle that correctly on W2K16 SMB.

I don’t know if the observed behavior is specific to Unraid SMB, or if it would apply to Samba on Linux in general. But, it posed a problem as I wanted to make sure I do indeed have all files correctly backed up.

I decided to write a quick little app to find problem files and folders. The app iterates through all files and folders, it will fix timestamps that are out of range, and report on finding files or folders that end in a dot. I ran it through my files, it fixed the timestamps for me, and I deleted the folders ending in dot by hand. Multiple robocopy runs now complete as expected.
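
The app itself is not reproduced here, but the idea can be sketched in Python. This is not the original app, just a minimal illustration under a few assumptions: 1 Jan 1980 UTC is used as the lower bound, out of range timestamps are raised to that bound, and names ending in a dot are only reported, not changed. The function name and the `fix` parameter are hypothetical.

```python
import os

MIN_MTIME = 315532800  # 1 Jan 1980 00:00 UTC as Unix seconds (assumed lower bound)


def scan(root, fix=False):
    """Walk root and return (old_files, dot_names).

    old_files: files with a modified time before MIN_MTIME.
    dot_names: files or folders whose name ends in a dot.
    With fix=True, old modified times are raised to MIN_MTIME.
    """
    old_files, dot_names = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.endswith("."):
                dot_names.append(os.path.join(dirpath, name))
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            if stat.st_mtime < MIN_MTIME:
                old_files.append(path)
                if fix:
                    # Keep the access time, raise only the modified time.
                    os.utime(path, (stat.st_atime, MIN_MTIME))
    return old_files, dot_names
```

Calling `scan(path)` only reports, while `scan(path, fix=True)` also rewrites the timestamps. On Windows, paths with trailing dots would likely need the `\\?\` prefix mentioned above before they can be opened at all, which this sketch does not handle.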

eNom Dynamic DNS Update Problems

Update: On 27 July 2018 eNom support notified me by email that the issue is resolved. I tested it, and all is back to normal with DNS-O-Matic.

Sometime between 12 May 2018 and 24 May 2018 the eNom dynamic DNS update mechanism stopped working.

I use the very convenient DNS-O-Matic dynamic DNS update service to update my OpenDNS account, and several host records at eNom, pointing them to my home IP address.

I was first alerted to the problem by a DNS-O-Matic status failure email, but as I was about to get on a plane for a business trip, I ignored the issue, hoping it was temporary.

eNom response for 'foo.bar.net':
--------------------
;URL Interface
;Machine is SJL0VWAPI03
;Encoding Type is utf-8
Command=SETDNSHOST
APIType=API.NET
Language=eng
ErrCount=1
ResponseCount=1
ResponseNumber1=316153
MinPeriod=1
MaxPeriod=10
Server=sjl0vwapi03
Site=eNom
IsLockable=
IsRealTimeTLD=
TimeDifference=+0.00
ExecTime=0.053
Done=true
RequestDateTime=6/21/2018 6:11:11 PM
--------------------
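
The response is a simple list of Key=Value lines, so it is easy to check from a script. The sketch below assumes that lines starting with a semicolon are comments and that a non-zero ErrCount indicates a failed update; both are inferred from the response above, not from eNom documentation.

```python
def parse_enom_response(text):
    """Parse 'Key=Value' lines; lines starting with ';' are treated as comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


def update_failed(fields):
    """Assumption: a non-zero ErrCount marks a failed update."""
    return int(fields.get("ErrCount", "0")) > 0


# A few lines copied from the response above.
response = """\
;URL Interface
Command=SETDNSHOST
ErrCount=1
ResponseNumber1=316153
Done=true
"""
fields = parse_enom_response(response)
print(update_failed(fields), fields["ResponseNumber1"])
```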

Here is the update history from DNS-O-Matic:

47.44.1.123, Jun 29, 2018 4:58 pm, ERROR
47.44.1.123, Jun 29, 2018 4:53 pm, ERROR
47.44.1.123, Jun 21, 2018 6:11 pm, ERROR
47.44.1.123, May 24, 2018 6:10 pm, ERROR
47.44.1.124, May 12, 2018 8:56 am, OK
47.44.1.124, May 4, 2018 2:48 pm, OK
47.44.1.124, May 3, 2018 1:42 pm, OK
47.44.1.124, Apr 1, 2018 12:39 pm, OK
47.44.1.124, Apr 1, 2018 9:58 am, OK
47.44.1.124, Mar 24, 2018 5:06 pm, OK

As of yesterday, I could not find any other reports of similar issues on Google, and the eNom status page showed no problems.

I use a Ubiquiti UniFi Security Gateway Pro as home router, and I have the dynamic DNS service in the UniFi controller configured to point to DNS-O-Matic, but it offered no additional hints as to the cause of the problem.

I contacted eNom support over chat, and they informed me they know there is an issue, and they said I should use the following format for the update:

http://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=%1&PW=%2&Zone=%3&DomainPassword=%4

%1 = Is username in Enom
%3 = Is my host and domain
%4 = Is my domain access password

This was interesting, I had looked at several eNom update scripts, even the eNom sample code, and they all used a different command format. I looked up the SetDNSHost documentation, and sure enough, it looks like eNom changed the API.

Old format:

https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&HostName=[host]&Zone=[domain]&DomainPassword=[password]&Address=[IP]

New format:

https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=[LoginName]&PW=[LoginPassword]&Zone=[FQDN]&DomainPassword=[Password]&Address=[IP]

eNom changed the meaning of the “Zone” parameter to be the fully qualified domain name, and they required the addition of the account username and password.

I tried the old format in my browser, and I got the same “Domain name not found” error. As I tried the URL, I noticed that HTTPS failed with a certificate mismatch. The certificate for https://dynamic.name-services.com points to reseller.enom.com.

Broken SSL and including my account username and password were not acceptable options. Additionally, I use 2FA on my account, so I had doubts that my password would even work. I tried the command as described in the documentation, but I omitted my account password, and it worked.

https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=[LoginName]&Zone=[FQDN]&DomainPassword=[Password]&Address=[IP]
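
As an illustration, the working URL can be assembled with proper query string encoding, so that special characters in the domain password do not break the request. This is a sketch only: the parameter names are the ones shown above, the function name is hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dynamic.name-services.com/interface.asp"


def build_update_url(login_name, host, domain, domain_password, ip_address):
    """Build the update URL in the format that worked, without the PW parameter."""
    params = {
        "Command": "SetDNSHost",
        "UID": login_name,
        "Zone": f"{host}.{domain}",  # Zone is now the fully qualified name
        "DomainPassword": domain_password,
        "Address": ip_address,
    }
    return f"{ENDPOINT}?{urlencode(params)}"


# Placeholder values, not real credentials.
print(build_update_url("user", "www", "example.com", "p&ss word", "192.0.2.1"))
```

`urlencode` percent-encodes the `&` and encodes the space in the placeholder password, which plain string concatenation would not do.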

I still find it very weird that this has been broken for so long, and that I could not find other reports of the problem on Google. Are people not using eNom or eNom resellers with dynamic DNS?

I also find it disappointing that the status page is not reflecting this problem, and that the SSL domain does not match, one would expect more from a domain company.

Until eNom fixes the problem, or until DNS-O-Matic updates support for the new API format, I created a PowerShell script to update my domains, maybe it is useful for others with the same problem.

$UserName = 'eNom account username'
$HostNames = @('www', 'name1', 'name2', 'etc')
$DomainName = 'yourdomain.com'
$Password = 'Domain change password'

# Get the current public IP address
$url = 'http://myip.dnsomatic.com'
$webclient = New-Object System.Net.WebClient
$result = $webclient.DownloadString($url)
Write-Host $result
$IPAddress = $result.ToString()
$webclient.Dispose()

# Ignore SSL error caused by dynamic.name-services.com SSL certificate pointing to a different domain
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = {$true}
$webclient = New-Object System.Net.WebClient
foreach ($hostname in $HostNames)
{
    # Old: https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&HostName=[host]&Zone=[domain]&DomainPassword=[password]&Address=[IP]
    # New: https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=[LoginName]&Zone=[FQDN]&DomainPassword=[Password]&Address=[IP]
    $url = "https://dynamic.name-services.com/interface.asp?Command=SetDNSHost&UID=$UserName&Zone=$hostname.$DomainName&DomainPassword=$Password&Address=$IPAddress"
    Write-Host $url
    $result = $webclient.DownloadString($url)
    Write-Host $result
}
$webclient.Dispose()

# Restore default certificate validation
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = $null

What started as a simple Mini PCI Express WiFi card swap on a ThinkPad T61 notebook, turned into deploying a custom BIOS in order to get the card to work.

I love ThinkPad notebooks, they are workhorses that keep on going and going. I always keep my older models around for testing, and one of my old T61’s had an Intel 4965AGN card, that worked fine with Windows 10, until the release of the Anniversary / Redstone 1 update. After the RS1 update, WiFi would either fail to connect, or randomly drop out. The 4965AGN card is not supported by Intel on Win10, and the internet is full of problem reports of Win10 and 4965AGN cards.

Ok, no problem, I’ll just get a cheap, reasonably new, with support for Win10, Mini PCIe WiFi card, and swap the card. I got an Intel 3160 dual band 802.11AC card and mounting bracket for about $20. The 3160 is a circa 2013 card with Win10 support. I installed the card, booted, and got a BIOS error 1802: Unauthorized network card is plugged in.

This led me to the discovery of ThinkPad hardware whitelisting, where the BIOS only allows specific cards to be used, which led me to Middleton’s BIOS, a custom T61 BIOS that removes the hardware whitelisting and enables SATA-2 support. I found working download links to the v2.29-1.08 Middleton BIOS here.

The BIOS update is packaged as a Win7 x86 executable or DOS bootable ISO image. As I’m running Win10 x64, and I could not find any CD-R discs around, I used Rufus to create a bootable DOS USB key, and I extracted the ISO contents using 7-Zip to a directory on the USB key. The ISO is created using a bootable 1.44MB DOS floppy image, and AUTOEXEC.BAT launches “FLASH2.EXE /U”, I created a batch file that does the same.

I removed the WiFi card, booted from USB, ran the flash, and got an error 1, complaining that flashing over the LAN is disabled. Ok, I enabled flashing the BIOS over the LAN in the BIOS, and rebooted.

I ran the update again, and this time I got error 99, complaining that BitLocker is enabled, and to temporarily disable BitLocker. I did not have BitLocker enabled, so I removed the hard drive and tried again, same error. Must be something in the BIOS, I disabled the security chip in the BIOS, tried again, and the update starts, but a minute or so later the screen goes crazy with INVALID OPCODE messages.

Hmm, maybe the updater does not like the FreeDOS boot image used by Rufus. Ok, let me create an MS-DOS USB key, uhh, on Win10, that turned out to be near impossible. Win10 does not include MS-DOS files, Rufus does not support custom locations for MS-DOS files, nor does it support getting them from floppy or CD images (readily available for download), the HP USB Disk utility complains my USB drive is locked, and writing raw images to USB results in a FAT12 disk structure that is too small to use. I say near impossible because I gave up, and instead went looking for an existing MS-DOS USB key I had made a long time ago. I am sure with a bit more persistence I could have found a way to create MS-DOS bootable USB keys on Win10, but that is an exercise for another day.

Trying again with a MS-DOS USB key, and voilà, BIOS flashed, and WiFi working.

I am annoyed that I had to go to this much trouble to get the new WiFi card working, but the best part of the exercise turns out to be the SATA-2 speed increase. This machine had an SSD that I always found to be slow, but with the SATA-2 speed bump in Middleton’s BIOS, the machine is noticeably snappier.

A couple hours later, my curiosity got the better of me, and I made my own version of Rufus that will allow formatting of MS-DOS USB drives on Win10. In the process I engaged in an interesting discussion with the author of Rufus. I say interesting, but it was rather frustrating, Microsoft removed the MS-DOS files from Win10, and Rufus refuses to add support for sourcing of MS-DOS files from a user specified location, citing legal reasons, and my reluctance to first report the issue to FreeDOS. Anyway, can code, have compiler, if have time, will solve problem.

Self Signed BitFury Drivers

Almost two years ago I pre-ordered some bitcoin mining hardware from Butterfly Labs, what a waste. After countless delays, more than a year late, they finally shipped the hardware, and given the low probability of ever recovering the money through mining, I immediately sold the hardware on eBay, for a little profit.

In the meantime USB stick miners became available, outperforming GPU mining, and easy to set up and run. I’ve had a couple of ASICMiner Block Erupters running under my desk for some time. In the early days I saw some fractions of coins coming in, but in recent months they are so underpowered against the current hash rates that they do little more than blink lights.

There is a resurgence in USB stick mining hardware, specifically the Bitfury type devices, many based on the NanoFury open source project that provided software, design, and PCB schematics.

I got myself a Red Fury, a Nano Fury II, and a Hex Fury. Compared to the 300MH/s of my little Block Erupters, these run at 2GH/s, 4GH/s, and 11GH/s respectively. There is still no way to ever make a profit in mining (at this scale), but I was really interested in seeing how these newer generation devices worked, especially since the publication of the NanoFury open source project, where in theory I could build my own.

So what does this have to do with self signed drivers? Well, my mining tool of choice is CGMiner, but CGMiner currently only runs Nano Fury IIs at half speed, requiring the use of BFGMiner to go full speed. But unlike CGMiner, which accesses all USB devices via Zadig installed WinUSB drivers, BFGMiner requires native Windows drivers, and neither the Red Fury nor the Hex Fury drivers are signed, so no installation on Windows 8 x64 (without disabling driver signing on every boot).

Looking at the INF files, all these devices do is register the USB hardware id as a generic null modem USB to COM bridge device, so no binaries required, just a signed CAT file.

After a bit of searching I found that I was not alone in my frustration, and I found a self-signed Red Fury driver. But, the Hex Fury used a different hardware id, and most people used CGMiner, so no need for a signed native driver as Zadig took care of that for us. So, I created my own signing script and signed my own drivers, install ok, BFGMiner happy.

If you just want signed drivers, get a copy of self-signed “Bitfury BF1” and “bi•fury” drivers here.

I tested on Windows 8.1 Update 1 x64:

Install the Windows 8.1 SDK and WDK.
Get the original “Bitfury BF1” and “bi•fury” INF files.
The bf1.inf file is saved in *NIX format (LF), convert it to Windows format (CRLF).
Create a self signed certificate:


makecert.exe -r -pe -ss PrivateCertStore -sr localMachine -n "CN=BitFury Test Signing Certificate" "C:\BitFury\BitFuryTest.cer"


Prepare the INF files, and create CAT files:


stampinf.exe -n -f "C:\BitFury\bf1.inf" -d * -v * -c "bf1.cat"
stampinf.exe -n -f "C:\BitFury\bifury_c4C.inf" -d * -v * -c "bifury_c4C.cat"
inf2cat.exe /v /driver:C:\BitFury\ /os:7_x86,7_x64,8_x86,8_x64


Sign the CAT files:


signtool.exe sign /v /s PrivateCertStore /n "BitFury Test Signing Certificate" /t http://timestamp.verisign.com/scripts/timestamp.dll "C:\BitFury\bf1.cat"
signtool.exe sign /v /s PrivateCertStore /n "BitFury Test Signing Certificate" /t http://timestamp.verisign.com/scripts/timestamp.dll "C:\BitFury\bifury_c4C.cat"


To use the drivers, you have to import the signing certificate into the local certificate store. As this is basically a self-signed CAT file, there are no trusted root certificates in the system that signed the signing certificate, and we need to add the signing certificate to the root and the trusted certificate stores.


certmgr.exe -add -c "C:\BitFury\BitFuryTest.cer" -s -r localMachine root
certmgr.exe -add -c "C:\BitFury\BitFuryTest.cer" -s -r localMachine trustedpublisher


Alternatively you can run “certlm.msc”, and import the certificate file into the “Trusted Root Certification Authorities” and the “Trusted Publishers” stores.

Last thing left to do is to use your newly signed drivers when selecting the custom driver from device manager.

Here is a package with signed drivers, and scripts to help you sign your own INF files, and import the certificate. It should work on Windows 7 and Windows 8 x86 and x64. Your mileage may vary, use at your own risk 🙂
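The line ending conversion of bf1.inf mentioned in the steps above can be done in any decent text editor. As a minimal sketch, assuming the file uses plain LF line endings, the same conversion in Python looks like this (the function name to_crlf and the file path in the usage note are my own, for illustration only):

```python
def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending converted to Windows CRLF."""
    # Normalize to LF first so existing CRLF pairs are not doubled up.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    sample = b"[Version]\nSignature=\"$Windows NT$\"\n"
    print(to_crlf(sample))
```

To convert a real file, read it in binary mode, pass the bytes through to_crlf, and write the result back in binary mode, for example to a hypothetical C:\BitFury\bf1.inf.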

I finally figured out why I kept on getting VIDEO_TDR_FAILURE BSOD’s when installing Windows 8 on my SuperMicro workstations. It turns out that the problem goes away when I use a PCIe slot associated with CPU #1, instead of a slot associated with CPU #2.

Some history on my adventures with Windows 8 and SuperMicro SuperWorkstations:
I got ACPI_BIOS_ERROR BSOD’s while installing Windows 8, SuperMicro provided a Beta BIOS that resolved the problem.
The Windows 8 install hangs if installing to a SSD drive on a LSI 2308 SAS controller, that issue is still unresolved, but can be worked around by connecting the SSD to the Intel SATA controller.
I got VIDEO_TDR_FAILURE BSOD’s while installing Windows 8 with a NVidia Quadro 5000 graphic card, same with an ATI FirePro V7900 or a NVidia GeForce GTX 680 or an ATI HD 7970. And this post is about resolving that problem.

SuperMicro released v1.0a BIOS updates for the X9DAi and X9DA7 motherboards used in the 7047A-T and 7047A-73 SuperWorkstations. I was hoping this would resolve the VIDEO_TDR_FAILURE BSOD’s, but no.

The X9DA7 BIOS updated without issue, but the X9DAi update reported an error at the end of the update process; “Error when sending Enable Message to ME”.

I contacted SuperMicro support, and they asked me to make sure that there is no jumper on JPME1. There is no mention of JPME1 in the motherboard manual, but it is located next to JIPMB1, next to PCIe slot #1. The header had a jumper on pins 2 and 3, where the same header on the X9DA7 motherboard had a jumper between 1 and 2. I removed the jumper, and the BIOS update succeeded.

Unlike the ACPI_BIOS_ERROR BSOD that happens during the WinPE phase of the install, the VIDEO_TDR_FAILURE BSOD happens on the first boot after the install, during the hardware detection and driver install phase. This means that the technique I used to kernel debug the initial boot phase will not work, as the second boot is using the BCD already deployed to the target hard drive. I had to modify the BCD of the already installed image, prior to the install continuing after the reboot.

I tested many permutations of graphic cards and configurations, and it quickly became very annoying to have to type my Win8 product key every single time I boot and install. To avoid this I created configuration files in the sources directory on the install media, and this bypassed the key question. You can read more about the meaning of the file contents here:

EI.cfg:

[EditionID]
Professional
[Channel]
Retail
[VL]
0

PID.txt:

[PID]
Value=XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

To modify the BCD of the installed image, and be able to easily repeat the second phase of install testing, I installed a second hard drive, and deployed WinPE to the second drive. By using F11 during boot to choose the boot drive, I could select booting from the second drive at any time.

I have a variety of WinPE v3 (Win7) based utility images, and I updated them to use WinPE v4 (Win8). In the process I lost the boot menu, and the first image in the menu automatically started booting. After some trial and error, I found the bootmenupolicy BCD option, and when set to legacy mode, the old style menu is back:

bcdedit /set {default} bootmenupolicy legacy

I installed Win8 on the primary drive, and during the reboot, instead of booting to the installed Win8 drive, I used F11 and booted to my secondary WinPE drive. From WinPE I modified the boot BCD to enable kernel debugging over the network:

bcdedit -store c:\boot\bcd /set {default} nocrashautoreboot yes
bcdedit -store c:\boot\bcd /set {default} debugtype net
bcdedit -store c:\boot\bcd /set {default} hostip 3232235876
bcdedit -store c:\boot\bcd /set {default} port 50000
bcdedit -store c:\boot\bcd /set {default} key my.secret.debug.key
bcdedit -store c:\boot\bcd /debug {default} yes

This is equivalent to:

bcdedit /dbgsettings net hostip:192.168.1.100 port:50000 key:my.secret.debug.key

But unlike the dbgsettings command, this allows me to specify a BCD store. Also note that the IP address is stored as a single numeric value instead of the dotted IP format.
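The numeric hostip value is simply the four octets of the IPv4 address packed into one 32-bit integer. A minimal Python sketch of the conversion in both directions follows. The helper names ip_to_bcd_hostip and bcd_hostip_to_ip are my own and not part of any Microsoft tool:

```python
import ipaddress


def ip_to_bcd_hostip(dotted: str) -> int:
    """Pack a dotted IPv4 address into the single integer stored in the BCD."""
    return int(ipaddress.IPv4Address(dotted))


def bcd_hostip_to_ip(value: int) -> str:
    """Unpack the integer form back into dotted notation."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    print(ip_to_bcd_hostip("192.168.1.100"))  # 3232235876
    print(bcd_hostip_to_ip(3232235876))       # 192.168.1.100
```

This is handy for working out the value to pass to the hostip option for a different debugger machine, or for checking an address already stored in a BCD.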

While still in WinPE, I captured the state of the primary Win8 drive by making a drive image using Symantec Ghost, the real Ghost, currently sold as Symantec Ghost Solution Suite, not the same named but volume snapshot based Norton Ghost or Symantec System Recovery. By saving a drive image, I can easily change hardware or configurations, test the install starting at the second phase, reboot to the secondary WinPE drive using F11, restore the entire drive image, and try again, while leaving the kernel debug options intact.

I tested with the following hardware configurations in various permutations:

With the kernel debugger attached, I captured the following crash details in WinDbg for NVidia based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa80211cd010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800782d0d8, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000002, Optional internal context dependent data.

Debugging Details:
------------------
FAULTING_IP:
nvlddmkm+1ae0d8
fffff8800782d0d8 4055 push rbp
DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT
BUGCHECK_STR: 0x116
PROCESS_NAME: System
CURRENT_IRQL: 0
STACK_TEXT:
fffff88012c76078 fffff80166fef0ea : 0000000000000000 0000000000000116 fffff88012c761e0 fffff80166f734b8 : nt!DbgBreakPointWithStatus
fffff88012c76080 fffff80166fee742 : 0000000000000003 fffff88012c761e0 fffff80166f73e90 0000000000000116 : nt!KiBugCheckDebugBreak+0x12
fffff88012c760e0 fffff80166ef4144 : fffffa802094b100 fffff880021ee9c0 fffffa801f54e400 0000000000000000 : nt!KeBugCheck2+0x79f
fffff88012c76800 fffff88004b33dcb : 0000000000000116 fffffa80211cd010 fffff8800782d0d8 0000000000000000 : nt!KeBugCheckEx+0x104
fffff88012c76840 fffff88004b32518 : fffff8800782d0d8 fffffa80211cd010 fffff88012c76949 00000000000000c7 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff88012c76880 fffff88004a1e608 : fffffa80211cd010 fffff88012c76949 0000000000000000 0000000000000002 : dxgkrnl!TdrIsRecoveryRequired+0x168
fffff88012c768b0 fffff88004a4d539 : 0000000000000000 fffff78000000320 0000000000000000 fffffa801f54e400 : dxgmms1!VidSchiReportHwHang+0x438
fffff88012c769b0 fffff88004a4ba49 : fffffa8000000002 fffffa801f54e400 fffffa801f54e840 fffffa801f54e840 : dxgmms1!VidSchiCheckHwProgress+0xe5
fffff88012c76a00 fffff88004a16fe5 : ffffffffff676980 0000000000000001 fffff88012c76b69 fffffa801f54e400 : dxgmms1!VidSchiWaitForSchedulerEvents+0x20d
fffff88012c76aa0 fffff88004a4b646 : 0000000000000000 000000000000000f fffffa801f54e400 fffffa801f54e400 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff88012c76bd0 fffff80166e9b521 : fffffa801f5abb00 fffffa801f54e400 fffff88003b01140 0000000006a21e1e : dxgmms1!VidSchiWorkerThread+0xca
fffff88012c76c10 fffff80166ed9dd6 : fffff88003af5180 fffffa801f5abb00 fffff88003b01140 fffffa8019aac040 : nt!PspSystemThreadStartup+0x59
fffff88012c76c60 0000000000000000 : fffff88012c77000 fffff88012c71000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16
STACK_COMMAND: .bugcheck ; kb
FOLLOWUP_IP:
nvlddmkm+1ae0d8
fffff8800782d0d8 4055 push rbp
SYMBOL_NAME: nvlddmkm+1ae0d8
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nvlddmkm
IMAGE_NAME: nvlddmkm.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf93d7
FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys
BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

With the kernel debugger attached, I captured the following crash details in WinDbg for ATI based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa801ed114d0, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800725cefc, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 000000000000000d, Optional internal context dependent data.

Debugging Details:
------------------
FAULTING_IP:
atikmpag+8efc
fffff8800725cefc 4055 push rbp
DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT
BUGCHECK_STR: 0x116
PROCESS_NAME: System
CURRENT_IRQL: 0
STACK_TEXT:
fffff88006fa9ee8 fffff803e6ff20ea : 0000000000000000 0000000000000116 fffff88006faa050 fffff803e6f764b8 : nt!DbgBreakPointWithStatus
fffff88006fa9ef0 fffff803e6ff1742 : 0000000000000003 fffff88006faa050 fffff803e6f76e90 0000000000000116 : nt!KiBugCheckDebugBreak+0x12
fffff88006fa9f50 fffff803e6ef7144 : fffffa801e2df4e0 fffff880020b99c0 fffffa801d31f010 0000000000000000 : nt!KeBugCheck2+0x79f
fffff88006faa670 fffff88004d31dcb : 0000000000000116 fffffa801ed114d0 fffff8800725cefc 0000000000000000 : nt!KeBugCheckEx+0x104
fffff88006faa6b0 fffff88004d30548 : fffff8800725cefc fffffa801ed114d0 fffff88006faa7b9 0000000000000180 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff88006faa6f0 fffff88004c11608 : fffffa801ed114d0 fffff88006faa7b9 000000000000000f fffffa801d31f8f8 : dxgkrnl!TdrIsRecoveryRequired+0x198
fffff88006faa720 fffff88004c459f9 : 0000000000000001 fffff88006faa8a0 fffff88006faa920 0000000000000000 : dxgmms1!VidSchiReportHwHang+0x438
fffff88006faa820 fffff88004c3ff72 : fffffa801d31f010 fffff78000000320 fffffa801d31f770 fffffa801d31f010 : dxgmms1!VidSchWaitForCompletionEvent+0x411
fffff88006faa8e0 fffff88004c4206c : fffffa801d31f010 fffffa801d31f450 fffffa801d31f450 0000000000000000 : dxgmms1!VidSchiWaitForEmptyHwQueue+0x9a
fffff88006faa9d0 fffff88004c3ea85 : 0000000000000000 fffffa801d31f010 fffffa801d31f450 0000000000000000 : dxgmms1!VidSchiSuspend+0x74
fffff88006faaa00 fffff88004c09fe5 : ffffffffff676980 0000000000000001 fffff88006faab69 fffffa801d31f010 : dxgmms1!VidSchiWaitForSchedulerEvents+0x249
fffff88006faaaa0 fffff88004c3e646 : 0000000000000000 fffffa801d585660 fffffa801d44d7f0 fffffa801d31f010 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff88006faabd0 fffff803e6e9e521 : fffffa801d6b9b00 fffffa801d31f010 fffff88003932140 0000000004d91ecb : dxgmms1!VidSchiWorkerThread+0xca
fffff88006faac10 fffff803e6edcdd6 : fffff88003926180 fffffa801d6b9b00 fffff88003932140 fffffa8019ac7500 : nt!PspSystemThreadStartup+0x59
fffff88006faac60 0000000000000000 : fffff88006fab000 fffff88006fa5000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16
STACK_COMMAND: .bugcheck ; kb
FOLLOWUP_IP:
atikmpag+8efc
fffff8800725cefc 4055 push rbp
SYMBOL_NAME: atikmpag+8efc
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: atikmpag
IMAGE_NAME: atikmpag.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf9279
FAILURE_BUCKET_ID: 0x116_IMAGE_atikmpag.sys
BUCKET_ID: 0x116_IMAGE_atikmpag.sys

This was not really helping me much, and I decided to repeat the tests but use the checked build of Windows 8 to help troubleshoot.

With the kernel debugger attached, I captured the following ASSERT during the boot:

Windows 8 Kernel Version 9200 MP (1 procs) Checked x64
Built by: 9200.16384.amd64chk.win8_rtm.120725-1247
Machine Name:
Kernel base = 0xfffff8020e01d000 PsLoadedModuleList = 0xfffff8020e760ac0
System Uptime: 0 days 0:00:06.228 (checked kernels begin at 49 days)
Assertion: The BIOS has reported inconsistent resources (_CRS). Please upgrade your BIOS.
ACPI!PnpBiosGetDeviceResourceList+0x15e:
fffff880012c3c2a cd2c int 2Ch
...
Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------
PROCESS_NAME: System
FAULTING_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880012c3c2a cd2c int 2Ch
ERROR_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.
EXCEPTION_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
BUGCHECK_STR: 0x0
CURRENT_IRQL: 0
LOCK_ADDRESS: fffff8020e7c5d60 -- (!locks fffff8020e7c5d60)
Resource @ nt!PiEngineLock (0xfffff8020e7c5d60) Exclusively owned
Threads: fffffa8019a36040-01<*>
1 total locks, 1 locks currently held
PNP_TRIAGE:
Lock address : 0xfffff8020e7c5d60
Thread Count : 1
Thread address: 0xfffffa8019a36040
Thread wait : 0x105eccd4
LAST_CONTROL_TRANSFER: from fffff880012b736f to fffff880012c3c2a
STACK_TEXT:
fffff880009b4b30 fffff880012b736f : fffffa8023a9e900 fffff880012a7e01 fffff880009b4c08 fffff880012a7e70 : ACPI!PnpBiosGetDeviceResourceList+0x15e
fffff880009b4bd0 fffff8800125acba : fffffa8023a9e900 fffffa8019ac54c0 fffff880012a7e70 fffffa801f477010 : ACPI!ACPIBusIrpQueryResourceRequirements+0x8b
fffff880009b4c50 fffff8020e91b6a4 : fffffa8023a9e900 fffffa8019ac54c0 fffff880009b4db0 fffffa8023a9e900 : ACPI!ACPIDispatchIrp+0x2a6
fffff880009b4cf0 fffff8020e91cd1b : fffffa8023a9e900 fffff880009b4db0 00000001c00000bb 0000000000000000 : nt!IopSynchronousCall+0x10c
fffff880009b4d80 fffff8020e915bdb : fffffa8023a9e900 fffff880009b4e50 fffffa8023a4f850 000000000000001e : nt!PpIrpQueryResourceRequirements+0x5f
fffff880009b4e10 fffff8020e91748d : fffffa8023a9b8e0 0000000000000000 ffffffff80000218 fffffa8023a9b8e0 : nt!PiQueryResourceRequirements+0x47
fffff880009b4ea0 fffff8020e91a1f2 : fffffa8023a9b8e0 fffffa8023a9b8e0 0000000000000001 0000000000000000 : nt!PiProcessNewDeviceNode+0x159d
fffff880009b5070 fffff8020e08feb5 : fffffa8019adcd20 0000000000000000 fffff880009b5358 0000000000000000 : nt!PipProcessDevNodeTree+0x1fe
fffff880009b5310 fffff8020e08fb59 : 0000000000000000 0000000000000000 0000000000000000 fffffa8037e19cc0 : nt!PnpDeviceActionWorker+0x345
fffff880009b53d0 fffff8020ed4010d : 0000000000000000 fffff8a000000007 fffff8a000f08c00 0000000000000000 : nt!PnpRequestDeviceAction+0x2ed
fffff880009b5420 fffff8020ed3b39d : fffff8020d536800 fffff8020e7c83c0 0000000000000006 fffff8020d536800 : nt!IopInitializeBootDrivers+0x905
fffff880009b5650 fffff8020ed2deb5 : fffff8020d536800 0000000000000000 fffff8020d536800 fffff8020d51ebf0 : nt!IoInitSystem+0xb5d
fffff880009b59b0 fffff8020e82d013 : fffff8020d536800 fffffa8019a36040 0000000000000000 fffffa8019ab3040 : nt!Phase1InitializationDiscard+0x1899
fffff880009b5bc0 fffff8020e1b289e : fffff8020d536800 fffff8020d536800 0000000000000000 0000000000000000 : nt!Phase1Initialization+0x13
fffff880009b5bf0 fffff8020e24ef96 : fffff8020e82d000 fffff8020d536800 fffff8020e6c6180 00000000f8ffffff : nt!PspSystemThreadStartup+0x1a2
fffff880009b5c60 0000000000000000 : fffff880009b6000 fffff880009b0000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16
STACK_COMMAND: kb
FOLLOWUP_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880012c3c2a cd2c int 2Ch
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: ACPI!PnpBiosGetDeviceResourceList+15e
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: ACPI
IMAGE_NAME: ACPI.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 50109dd0
BUCKET_ID_FUNC_OFFSET: 15e
FAILURE_BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList
BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

This is interesting, the kernel ASSERT’s on a problem reported by the BIOS.

I contacted SuperMicro support, they said they will investigate the BIOS failure, and they suggested I try to use PCIe slot #3 instead of slot #5. The motherboard manual mentions that slots #1, #2, and #3 are to be used if CPU #1 is installed, and slots #4, #5, and #6 to be used only if CPU #2 is installed.

I have both processors installed, so not using the more conveniently located slot #5 never came to mind. I moved the graphic card to CPU #1 slot #3, and voila, install succeeded and Windows 8 was up and running!

I repeated the checked build test with the graphic card in slot #3, and the same BIOS ASSERT error was reported, so the BIOS ASSERT seems to be unrelated to the VIDEO_TDR_FAILURE error.

This was a very frustrating problem, and I still don’t understand the root cause, but I am happy to be able to finally switch both workstations to Windows 8.

Part of the research I did before migrating from Blogger to WordPress.com, was to make sure that current Blogger permalinks will resolve correctly once the old posts were imported into WordPress.com. At the time all seemed fine, but soon after migrating, I received alerts from Google Webmaster Tools that there is an increase in site errors, specifically 404 errors.

Some background: Permalinks are the URL’s that point directly to specific posts on the blog. These URL’s are known by search engines, are shared on forums, and are basically the static address of posts. Blogger and WordPress.com use different styles of permalinks. WordPress.com allows some customization of permalinks, but unlike WordPress.org, there is no support for custom plugins to handle rewrites for permalinks, 302’s or 404’s.

Although not documented anywhere, WordPress.com does support Blogger style permalinks, and will correctly redirect the Blogger style link to the WordPress.com style page. As an example, see the links below, one for Blogger and one for WordPress.com:

http://blogdotinsanegenius.blogspot.com/2012/06/looks-can-be-deceiving.html
https://blogdotinsanegenius.wordpress.com/2012/06/looks-can-be-deceiving

Search engines will know the link using the old blogger style URL, and both styles of links will correctly resolve to the current page:

https://blog.insanegenius.com/2012/06/19/looks-can-be-deceiving
https://blog.insanegenius.com/2012/06/looks-can-be-deceiving.html

So why is it that Google Webmaster Tools reported a sudden spike in 404’s?

By reviewing the links that report 404, I noticed that the permalink format of certain posts on WordPress.com was slightly different to the Blogger permalinks.

http://blogdotinsanegenius.blogspot.com/2009/10/hitachi-a7k2000-and-seagate-barracude.html
http://blogdotinsanegenius.blogspot.com/2010/05/zotac-xboxhd-id11-mkv-h264-video.html
http://blogdotinsanegenius.blogspot.com/2008/03/printing-from-network.html

https://blogdotinsanegenius.wordpress.com/2009/10/11/hitachi-ultrastar-and-seagate-barracude-lp-2tb-drives/
https://blogdotinsanegenius.wordpress.com/2010/05/28/zotac-xboxhd-id11-mkv-h-264-video-playback-performance/
https://blogdotinsanegenius.wordpress.com/2008/03/30/printing-from-the-network/

Notice the difference? Blogger appears to keep links short, and remove words like “the” and “and”.

I contacted WordPress.com support, and they provided a manual solution. They suggested that I modify the “slug” of each 404 post to match the Blogger style permalink.

This resolved the problem with the top 404’s, but I would have expected the Blogger import plugin to take care of this for me.

But, I soon received another alert email from Google Webmaster Tools, and this time the 404 posts looked a bit different.

Notice that all the links contain parameters in the URL (I think these are old style Google Analytics parameters), and without the parameter the redirect works, but with any parameters the redirect fails.

https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html
https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html?m=1
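When working through a list of reported 404 URLs, it helps to strip the parameters and compare the result against permalinks that are known to redirect correctly. A minimal Python sketch, where the function name strip_query is my own:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    # Keep scheme, host and path, drop everything after the '?' or '#'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    failing = "https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html?m=1"
    print(strip_query(failing))
```

If the stripped URL redirects and the original does not, the parameters are the cause of the 404 rather than the slug.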

I again contacted WordPress.com support, and I am still awaiting a resolution.

[Update: 9 August 2012]
Just got an email from WordPress.com support, the problem with parameters is fixed, thank you.

SuperMicro Beta BIOS supports Windows 8 and Server 2012

In a previous post I reported that my SuperMicro SuperWorkstation 7047A-T failed to install Windows 8 or Windows Server 2012 due to an ACPI_BIOS_ERROR. I contacted SuperMicro support, and I was informed that new BIOS releases are on their way that will support Windows 8 and Server 2012.

This morning I received an email from SuperMicro, with a new Beta BIOS for the X9DAi motherboard used in the 7047A-T. The new BIOS allowed me to install Windows 8 and Server 2012.

I used a DOS bootable USB key, and installed the new BIOS.

The 7047A-T has USB ports on the back and on the front of the case. The ports on the front are all USB3, and it is not possible to boot from these ports, at least I have not yet found a configuration that allows it. I tried USB2 keys and my newest, super fast Kingston DataTraveler HyperX 3.0 USB3 keys, and the BIOS did not list any boot devices for the USB3 ports. To boot from USB you have to plug the USB key into one of the rear USB2 ports.

The new BIOS version is “1.0 beta”, compilation date “7/23/2012”. The BIOS screen looks like the more modern AMI EFI BIOS’s I’ve seen in other devices, i.e. the thin font instead of the classic console font.

I performed a “Restore Optimized Defaults”, and then went through the options to see what has changed and what is new.

The [Advanced] [Chipset Configuration] [North Bridge] [IOH Configuration] now sets all PCIe busses to GEN3, the old BIOS defaulted to GEN2.

The [Advanced] [SATA Configuration] now enables hot plug on all ports, the old BIOS defaulted to hot plug disabled.

The [Advanced] [Boot Feature] adds a new power configuration item called “EuP”. This seems to be related to EU Directive 2005/32/EC:

EU Directive 2005/32/EC enacted by the European Union member countries dictates that after January 1, 2010, no computer or other energy using product (EuP) sold in the member countries may dissipate more than 1 Watt in the standby (S5) state.

I measured the power utilization, and the machine uses 2W when powered off, 140W at idle in Windows 8 desktop, and 7W while sleeping.

I updated my Windows 8 USB key to the latest build (I have access to), booted from the USB key, and installed Windows 8 without any major issues.

I had swapped the NVidia Quadro 4000 for a faster ATI FirePro V7900. The v1.0 BIOS worked fine with the Quadro 4000, but after installing the V7900, the screen powered on and Windows 7 started booting before I had a chance to see the BIOS screen. After installing the new Beta BIOS, the V7900 works as expected and I can see the BIOS screen during POST.

This is a note for ATI; please make sure your VGA driver install UI fits on a 640×480 display. When I swapped the Quadro 4000 for the V7900, and rebooted into Windows 7, I booted into a 640×480 16 color screen. Imagine my frustration trying to guess which button has focus when you can only see the top half of the ATI driver installer.

Windows 8 automatically installed drivers for the V7900.

The only driver Windows 8 did not automatically install is the C600 chipset SAS driver. I installed the Intel Rapid Storage Technology Enterprise (RSTe) drivers, and that solved that problem.

While running Windows 7 on this machine, and running the Windows Experience Index Assessment, the test would always crash. The same test in Windows 8 completed successfully.

I found the 2D and 3D results to be disappointing, and I tried to replace the “ATI FirePro V (FireGL V) Graphics Adapter (Microsoft Corporation – WDDM v1.20)” driver with the ATI Windows 8 Consumer Preview driver. Although the release notes indicate that the V7900 is supported, the driver installation failed with an unsupported hardware error. I’ll have to wait for newer Windows 8 drivers from ATI to see if the test scores improve.

I’m quite happy that I can use my new machines with Windows 8.

I just wish SuperMicro had solved the BIOS incompatibility problems long ago, after all, it has been almost two years since the Windows 8 pre-release program started, and almost a year since the release of the public developer preview.

Debugging Windows 8 Install BSOD

In my last post I described how to prevent Windows from automatically restarting when encountering a BSOD during the OS install process. This allowed me to see the ACPI_BIOS_ERROR fault code while installing Windows 8 on my new SuperMicro workstation. The new Windows 8 BSOD page looks friendly, but no longer displays any error parameters other than the main fault code.

In order to get additional details of the crash, I had to hook up a kernel debugger to the machine. Windows 8 adds USB3 and TCPIP kernel debug support, and I will describe how I used the TCPIP network option to capture details of the crash.

First thing to do is prepare our tools, download the Windows 8 Debugging Tools for Windows package, and the Windows 8 Symbols.

Unfortunately the debugging tools are no longer available as a standalone download, and you need to install the SDK or WDK on a Windows 8 system in order to get them, but you can choose to only install the debugging tools. Once you installed the debugging tools on one machine, you can copy the MSI installers or the directory to any other machines, including Windows 7 systems. You will find the tools in the “C:\Program Files (x86)\Windows Kits\8.0\Debuggers” folder.

Microsoft is pretty good at publishing symbols for most released versions of their products to their public symbol server, but I prefer to extract the symbols to a working directory on my machine, or to upload the symbols to our internal symbol server. You can install the downloaded symbols MSI package directly, or use the following command to extract the symbols from the MSI file to a location on disk. Run an elevated (right click run as administrator) command prompt, and type:

msiexec /a [symbol msi file name] /qb targetdir="[output directory]"

Next we need to enable kernel network debugging in the BCD options. This needs to be done on a Windows 8 machine as the network debugging command is not supported in older versions of BCDEdit. I should also call out that network debugging support is required for hardware logo certification, but not all current adapters support it. Insert the bootable Windows 8 USB key, run an elevated command prompt, and type:

bcdedit -store [usb key drive]:\boot\bcd /dbgsettings net hostip:[IP of WinDbg machine] port:50000

BCDEdit will output the connection security key that is required by WinDbg.

Start WinDbg, and enable network kernel debugging, entering the port number and security key.

Boot the target machine, you will see the target machine connecting to WinDbg:

Microsoft (R) Windows Debugger Version 6.2.8400.0 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Using NET for debugging
Opened WinSock 2.0
Waiting to reconnect...
Connected to target 192.168.1.106 on port 50000 on local IP 192.168.1.100.
Connected to Windows 8 8400 x64 target at (Fri Jul 20 11:07:21.583 2012 (UTC - 7:00)), ptr64 TRUE
Kernel Debugger connection established.

And then the ACPI_BIOS_ERROR crash:

25: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

ACPI_BIOS_ERROR (a5)
The ACPI Bios in the system is not fully compliant with the ACPI specification.
The first value indicates where the incompatibility lies:
This bug check covers a great variety of ACPI problems.  If a kernel debugger
is attached, use "!analyze -v".  This command will analyze the precise problem,
and display whatever information is most useful for debugging the specific
error.
Arguments:
Arg1: 0000000000000003, ACPI_FAILED_MUST_SUCCEED_METHOD
    ACPI tried to run a control method while creating device extensions
    to represent the ACPI namespace, but this control method failed.
Arg2: fffffa8019f2f288, The ACPI Object that was being run
Arg3: ffffffffc0000034, return value from the interpreter
Arg4: 00000000494e495f, Name of the control method (in ULONG format)

Debugging Details:
------------------
ACPI_OBJECT:  fffffa8019f2f288
DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT
BUGCHECK_STR:  0xA5
PROCESS_NAME:  System
CURRENT_IRQL:  0
LAST_CONTROL_TRANSFER:  from fffff803ca1e617a to fffff803ca0e5870
STACK_TEXT:
fffff880053eb418 fffff803ca1e617a : 0000000000000000 00000000000000a5 fffff880053eb580 fffff803ca16b930 : nt!DbgBreakPointWithStatus
fffff880053eb420 fffff803ca1e57d2 : 0000000000000003 00000000494e495f fffff803ca168810 00000000000000a5 : nt!KiBugCheckDebugBreak+0x12
fffff880053eb480 fffff803ca0eb044 : 00000000c0000034 fffff88001038255 fffffa801a50fe78 00000000c0000034 : nt!KeBugCheck2+0x79f
fffff880053ebba0 fffff88001043949 : 00000000000000a5 0000000000000003 fffffa8019f2f288 ffffffffc0000034 : nt!KeBugCheckEx+0x104
fffff880053ebbe0 fffff8800103bded : 0000000000000000 0000000000000000 0000000000008004 00000000c0000034 : ACPI!ACPIBuildCompleteMustSucceed+0x39
fffff880053ebc20 fffff880010346bd : fffffa801a500000 0000000000008000 0000000000000000 fffffa8037e80000 : ACPI!AsyncCallBack+0x7f
fffff880053ebc50 fffff88001034f56 : fffffa801a500000 fffff88001072be0 0000000000000000 0000000000000002 : ACPI!RunContext+0x141
fffff880053ebc90 fffff880010386e3 : fffffa8019b1c3a0 0000000000000000 0000000000000000 fffffa8019a35258 : ACPI!InsertReadyQueue+0xd6
fffff880053ebcc0 fffff8800103862a : fffff803ca2eb490 fffff88001072be0 0000000000000000 00000000546c6d41 : ACPI!RestartCtxtPassive+0x2f
fffff880053ebcf0 fffff803ca0cb181 : fffffa8019e06b00 0000000000000080 fffff88004ac6540 0000000000000000 : ACPI!ACPIWorkerThread+0xea
fffff880053ebd50 fffff803ca0dae26 : fffff88004aba180 fffffa8019e06b00 fffff88004ac6540 fffffa8019a8f940 : nt!PspSystemThreadStartup+0x59
fffff880053ebda0 0000000000000000 : fffff880053ec000 fffff880053e6000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16
STACK_COMMAND:  kb
FOLLOWUP_IP:
ACPI!ACPIBuildCompleteMustSucceed+39
fffff88001043949 cc              int     3
SYMBOL_STACK_INDEX:  4
SYMBOL_NAME:  ACPI!ACPIBuildCompleteMustSucceed+39
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: ACPI
IMAGE_NAME:  ACPI.sys
DEBUG_FLR_IMAGE_TIMESTAMP:  4fe6a2b1
BUCKET_ID_FUNC_OFFSET:  39
FAILURE_BUCKET_ID:  0xA5_ACPI!ACPIBuildCompleteMustSucceed
BUCKET_ID:  0xA5_ACPI!ACPIBuildCompleteMustSucceed

Followup: MachineOwner
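One detail in the dump can be decoded by hand: Arg4 is described as the name of the control method "in ULONG format". Assuming this means the four ASCII characters of the ACPI name packed into a 32-bit value in little-endian byte order, a short Python sketch recovers the name (the function name control_method_name is my own):

```python
def control_method_name(arg4: int) -> str:
    """Decode a bugcheck 0xA5 Arg4 value into a four character ACPI name.

    Assumption: the low 32 bits hold four ASCII characters,
    stored least significant byte first.
    """
    return (arg4 & 0xFFFFFFFF).to_bytes(4, "little").decode("ascii")


if __name__ == "__main__":
    print(control_method_name(0x00000000494E495F))  # _INI
```

Under that assumption the failing control method is _INI, the ACPI device initialization method. Arg3, c0000034, is the NTSTATUS code STATUS_OBJECT_NAME_NOT_FOUND.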

Even with all the crash details, it still doesn’t really help me make progress, as it has been two days since I logged the support request with SuperMicro, and no response yet.

CrashPlan Memory Utilization

I’ve been using CrashPlan as an online backup solution for quite some time, and it works really well.

I like the fact that I can subscribe to the consumer plan, with almost 3.5TB of data backed up, and that the backup client installs on a server OS. Many of the other “unlimited” backup providers I tested have restrictions in place that make such a setup impossible.

CrashPlan sends email notifications about backup status, and I noticed that something was wrong with the backup:

I logged onto the machine, opened the main UI, and after a few seconds the UI just closed. I opened it again, same thing, after about 15s the UI closed.

My initial thoughts were that it is a crash, but on attaching a debugger, the exit call stack showed that the process was cleanly terminated after receiving a signal.

On looking at the NT eventlog I could see that the service was restarting about every 15s:

The CrashPlan Backup Service service entered the stopped state.
The CrashPlan Backup Service service entered the running state.
The CrashPlan Backup Service service entered the stopped state.
The CrashPlan Backup Service service entered the running state.
The CrashPlan Backup Service service entered the stopped state.
The CrashPlan Backup Service service entered the running state.

The service wasn’t crashing, it was externally being stopped and restarted. I looked in the CrashPlan directory, and I found several log files with names like restart_1342296082496.log. The contents of these files looked like this:

Sat 07/14/2012 13:01:22.53 : "C:\Program Files\CrashPlan\bin\restart.bat"
ECHO is off.
Sat 07/14/2012 13:01:22.53 : APP_BASE_NAME=CrashPlan
Sat 07/14/2012 13:01:22.53 : APP_DIR=C:\Program Files\CrashPlan
ECHO is off.
Sat 07/14/2012 13:01:22.53 : Stopping CrashPlanService
The CrashPlan Backup Service service is stopping.
The CrashPlan Backup Service service was stopped successfully.

Sat 07/14/2012 13:01:25.05 : Sleeing 15 seconds...

Pinging 127.0.0.1 with 32 bytes of data:
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128

Ping statistics for 127.0.0.1:
    Packets: Sent = 15, Received = 15, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms
Sat 07/14/2012 13:01:39.08 : Starting CrashPlanService

The CrashPlan Backup Service service was started successfully.

ECHO is off.
Sat 07/14/2012 13:01:39.13 : Exiting...
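As an aside, the number in a file name like restart_1342296082496.log looks like a Unix timestamp in milliseconds. That is my assumption, not something CrashPlan documents here, but it decodes to 20:01:22 UTC on 14 July 2012, which lines up with the 13:01:22 local time in the log above if the machine runs seven hours behind UTC. A minimal Python sketch, with a hypothetical helper name, that turns the file names into restart times:

```python
import re
from datetime import datetime, timezone

# Assumption: the digits in "restart_<digits>.log" are milliseconds since the Unix epoch.
_RESTART_NAME = re.compile(r"^restart_(\d+)\.log$")


def restart_time_utc(filename):
    """Return the UTC time encoded in a restart log file name, or None if it does not match."""
    match = _RESTART_NAME.match(filename)
    if match is None:
        return None
    seconds, millis = divmod(int(match.group(1)), 1000)
    # Build from whole seconds, then add the milliseconds back as microseconds.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)


print(restart_time_utc("restart_1342296082496.log"))
```

Running this over every restart_*.log name in the directory and sorting the results would give a quick timeline of how often the service restarted itself.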

I looked for a newer version, but 3.2.1 was the latest version. I logged a support ticket with CrashPlan, but I continued my investigation. I found a log file service.log.0, several MB in size, and inside it I found this:

[07.14.12 12:32:39.480 ERROR   QPub-BackupMgr       backup42.service.backup.BackupController] OutOfMemoryError occurred...RESTARTING! message=OutOfMemoryError in BackupQueue!

So it seems that the service is running out of memory. I now had a few good keywords to search on, and I found this post from a user with the same problem. At about the same time I received a reply from CrashPlan support with the same solution, which is not bad for weekend service.
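With a log that is several MB in size, it helps to count the errors rather than scroll for them. The sketch below is a hypothetical helper, not a CrashPlan tool. It assumes every relevant line starts with a bracketed date stamp like the one quoted above, and it files anything that does not match under "unknown":

```python
import re
from collections import Counter

# Pattern modeled on the single log line quoted above; other lines may be formatted differently.
_DATE_STAMP = re.compile(r"^\[(\d{2}\.\d{2}\.\d{2}) ")


def count_oom_by_day(lines):
    """Count lines mentioning OutOfMemoryError, grouped by the date stamp at the start of the line."""
    counts = Counter()
    for line in lines:
        if "OutOfMemoryError" not in line:
            continue
        match = _DATE_STAMP.match(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


sample = [
    "[07.14.12 12:32:39.480 ERROR   QPub-BackupMgr       "
    "backup42.service.backup.BackupController] OutOfMemoryError occurred...RESTARTING! "
    "message=OutOfMemoryError in BackupQueue!",
]
print(count_oom_by_day(sample))
```

To run it against the real file, pass an open file object instead of the sample list, for example open("service.log.0", errors="replace"), since the function only needs something that yields lines.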

The CrashPlan backup service and desktop applications are Java apps, and as such the maximum amount of memory they can use is capped by configuration. I have had similar problems with other memory-hungry Java apps, like Jaikoz, that simply fail unless you increase the memory limit.

To fix the problem, shut down the service, open the CrashPlanService.ini file in the program directory, increase the maximum memory parameter from the default of 512MB to 2GB, and restart the service:

Virtual Machine Parameters=-Xrs -Xms15M -Xmx2048M
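If you would rather script the edit than make it by hand, something like the following Python sketch could work. The function name is hypothetical, and the "before" line assumes the default file uses the same parameters shown above with -Xmx512M. The only part taken from the actual file is the "Virtual Machine Parameters" key. The pattern also accepts a typographic dash in front of Xmx, since a line pasted from a web page can pick one up and the JVM will not recognize it as an option:

```python
import re


def set_max_heap(ini_text, megabytes):
    """Return ini_text with the -Xmx value on the 'Virtual Machine Parameters' line replaced."""
    out = []
    for line in ini_text.splitlines(keepends=True):
        if line.startswith("Virtual Machine Parameters="):
            # Match a plain hyphen or an en dash (U+2013) before Xmx, and always write a hyphen.
            line = re.sub(r"[-\u2013]Xmx\d+[KkMmGg]?", f"-Xmx{megabytes}M", line)
        out.append(line)
    return "".join(out)


# Assumed default line; only the 512MB default itself comes from the text above.
before = "Virtual Machine Parameters=-Xrs -Xms15M -Xmx512M\n"
print(set_max_heap(before, 2048), end="")
```

Stop the service before writing the result back to CrashPlanService.ini, and keep a copy of the original file in case the new value needs to be reverted.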

After upping the memory all seemed well, and the service has been running for more than a day. But I wanted to know just how much memory CrashPlan is using, and it turns out to be insane.

Here are the current stats for the amount of data I back up, as well as the resource utilization by the backup service and desktop app:

As you can see, the desktop app’s peak private bytes exceed 250MB, and the service’s exceed 1.3GB. That’s right, 1.3GB of memory!

Those numbers are simply outrageous.