Rack that server

It’s been a year and a half since we moved into the new house, and I finally have the servers racked in the garage. Looks pretty nice compared to my old setups.

My old setup was as follows:
Two DELL OptiPlex 990 small form factor machines with Windows Server 2008 R2 as Hyper-V servers. One server ran the important 24/7 VM’s, the other was used for testing and test VM’s. The 24/7 VM’s included a W2K8R2 domain controller and a W2K12 file server.
For storage I used a Synology DS2411+ NAS, with 12 x 3TB Hitachi Ultrastar drives, configured in RAID6, and served via iSCSI. The the iSCSI drive was mounted in the Hyper-V host, and configured as a 30TB passthrough disk for the file server VM, that served files over SMB and NFS.
These servers stood on a wooden storage rack in the garage, and at the new house they were temporarily housed under the desk in my office.

One of my primary objectives was to move the server equipment to the garage in an enclosed server rack, with enough space for expansion and away from dust. A garage is not really dust free and does get hot in the summer, not an ideal location for a server rack, but better than finding precious space inside the house. To keep dust to a minimum I epoxy coated the floor and installed foam air filters in the wall and door air inlet vents. To keep things cool, especially after parking two hot cars, I installed an extractor fan. I had planned on connecting it to a thermostat, but opted to use a Panasonic WhisperGreen extractor fan rated for 24/7 operation, and I just leave it on all the time. We have ongoing construction next door, and the biggest source of dust are the gaps around the garage door. I’ve considered applying sticky foam strips next to the garage door edges, but have not done so yet.

In retrospect, preparing the garage concrete surface by hand, and applying the Epoxy Coat kit by myself, is not something I would recommend for a novice. If you can, pay a pro to do it for you, or at least get a friend to help, and rent a diamond floor abrasion machine.

I did half the garage at a time, moving everything to one side, preparing the surface by hand, letting it dry, applying the epoxy and flakes, letting it dry, and then repeating the process for the other side. I decided the 7″ roller that came with the kit was too small, and I bought a 12″ roller, big mistake, as soon as I started rolling the epoxy there was lint everywhere. From the time you start applying the epoxy you have 20 minutes working time, no time to go buy the proper type of lint free roller. I did not make the same mistake twice, and used the kit roller for the second half, no lint. With the experience gained from the first half it was much easier the second time round, and the color flake application was also much more even compared to the first half.

To conserve space in the garage I used a Middle Atlantic WR-24-32 WR Series Roll Out Rotating Rack. The roll out and rotate design allowed me to mount the rack right against the wall and against other equipment, as it does not require rear or side panel access. I also used a low noise MW-4QFT-FC thermostatically controlled integrated extractor fan top to keep things cool, and a WRPFD-24 plexiglass front door to make it look nice.

The entire interior cage rolls out on heavy duty castors, and the bottom assembly rotates on ball bearings. The bottom of the enclosure is open in the center with steel plate tracks for the castors, and must be mounted down on a sturdy and level surface. My garage floor is not level and slopes towards the door, and consequently a fully loaded rack wants to roll out the door, and all the servers keep sliding out of the rails.

I had to level the enclosure by placing spacers under the front section, and then bolting it down on the concrete floor. This leaves the enclosure and the rails inside the enclosure level, but as soon as I pull the rack out on the floor, the chassis slide out and the entire rack wants to roll out the door. I had to build a removable wood platform with spacers to provide a level runway surface in front of the rack, that way I can pull the rack out on a level surface, and store the runway when not in use.

The WR-24-32 is 24U high, and accommodates equipment up to 26″ in length, quite a bit shorter than most standard racks. The interior rack assembly pillar bars are about 23″ apart, with equipment extending past the pillar ends. This turned out to be more of a challenge than the 26″ equipment length constraint. When the rack is in its outside rotated position, the 23″ pillars just clears the enclosure, but the 26″ equipment sticking out past the pillars do not, and prevents the rack from rotating. This requires brute force to lift the castors, and a very heavy loaded rack, over the rail edge and pull the enclosure out all the way before the rack would rotate freely.

Another problem with the 23″ pillar spacing is the minimum adjustable distance for the 4U Supermicro chassis rails is about 25″, and they would not fit between the pillars. I had to order a shorter set of adjustable rails, and use the chassis side of the original rails to match the chassis mounting holes, and the rack side of the rails to clear the pillars, fortunately they fit perfectly into each other, but not on the rack. The WR-24-32 has tapped 10-32 screw holes in all locations, i.e. no square holes anywhere, which meant I had to use my Dremel to cut the quick mount tabs from the rails in order to screw them on instead of hanging them on.

Rather than using another NAS based storage solution I opted for direct attached storage, so I was looking for a 24-bay chassis, less than 26″ in length, with low noise fans. I opted for a Supermicro 4U 24-bay SuperChassis 846BE16-R920B for the main file server, and a 4U 8-bay SuperChassis 745BTQ-R1K28B-SQ for the utility server. It was the SC846’s included rails that were too long to fit between the posts, and I replaced them with a MCP-290-00058-0N short rail set.

I used Supermicro X10SLM+-F Xeon boards with Intel Xeon E3-1270 v3 processors for both systems. Low power and low heat was a higher priority than performance, and the E3 v3 processors were a good balance. I’ve had good experiences with the X9 series SM boards, but I have mixed feelings about the X10 boards. Kingston dropped support for these boards due to memory chip incompatibilities, and SM certified memory for this board is very expensive, and I had endless troubles getting the boards to work with an Adaptec 7805Q controller. The 7805Q controller would simply fail to start, and after being bounced around between SM and Adaptec support, SM eventually provided me with a special BIOS build, that is yet to be publicly updated, that resolved the problem. I had no such problems with the newer 81605ZQ controller I used in the 24-bay chassis.

For the 24-bay system storage, I used 2 x Samsung 840 Pro 512GB SSD drives in RAID1 for booting the OS and for MaxCache, 4 x Samsung 840 EVO 1TB SSD drives in RAID5 to host VM’s, 16 x Hitachi 4TB Coolspin drives plus 2 x hot spares in RAID6 for main storage. The 56TB RAID6 volume is mounted as a passthrough disk to the file server VM. To save power and reduce heat I host all the VM’s on the SSD array, and opted to use the consumer grade Hitachi Coolspin drives over the more expensive but reliable Ultrastar drives. The 8-bay system has a similar configuration, less the large RAID6 data array.

The SM boards are very easy to manage using the integrated IPMI KVM functionality. Other than configuring the BIOS and IPMI IP settings on the first boot, I rarely have to use the rack mounted KVM console. Each server runs W2K12R2 with the Hyper-V role. I am no longer running a domain controller, the complexity outweighed the benefit, especially with the introduction of Microsoft online accounts used in Windows 8. The main VM is a W2K12R2 storage file server VM, with the RAID6 disk in passthrough, serving data over SMB and NFS. My other VM’s include a system running Milestone XProtect IP security camera network video recorder, a MSSQL and MySQL DB VM, a Spiceworks VM, a Splunk VM, a UniFi Controller VM, and several work related VM’s.

I had Verizon switch my internet connection from Coax to Ethernet, and I now run a Ubiquity EdgeRouter Pro. I did run a MiktroTik Routerboard CCR1009-8G-1S-1S+ for a while, and it is a very nice box, but as I also switched out my EnGenius EAP600 access points to Ubiquity UniFi AC units, and I replaced the problematic TRENDNet TPE-1020WS POE+ switches with Ubiquity ToughSwitch TS-8-Pro POE units, I preferred to stick to one brand in the hopes of better interoperability. Be weary of the ToughSwitch units though, seems that under certain conditions mixing 100Mbps and 1Gbps ports have serious performance problems. I am still on the fence about the UniFi AC units, they are really easy to manage via the UniFi controller, but some devices, like my Nest thermostats, are having problems staying connected. Not sure if it is a problem with access points or the Nest’s, as there are many people blaming this problem on a Nest firmware update.

I used an APC Smart-UPS X 1500VA Rack/Tower LCD 120V with Network Card for clean and reliable power, and an ITWatchDogs SuperGoose II Climate Monitor for environmental monitoring and alerting.

After building and configuring everything, I copied all 30TB of data from the DS2411+ to the new server using robocopy with the multithreaded option, took about 5 days to copy. I continued using the old systems for two weeks while I let the new systems settle in, in case anything breaks. I then re-synced the data using robocopy, moved the VM’s over, and pointed clients to the new systems.

VM’s are noticeably more response, presumable due to being backed by SSD. I can now have multiple XBMC systems simultaneously watch movies while I copy data to storage without any playback stuttering, something that used to be an issue on the old iSCSI system.

The best part is really the way the storage cabinet looks 🙂

This is the temporary server home under my office desk:
Before

Finished product:
After

The “runway” I constructed to create a level surface:
Runway

Pulled out all the way, notice the cage is clear, but the equipment won’t clear:
Out

To clear the equipment the castors have to be pulled over the edge:
Cleared

Rotated view:
Rotated

The rarely used KVM drawer:
KVM

Extractor fans:
Fans

Night mode:
Night

Advertisements

Storage Spaces Leaves Me Empty

I was very intrigued when I found out about Storage Spaces and ReFS being introduced in Windows Server 2012 and Windows 8. But now that I’ve spent some time with it, I’m left disappointed, and I will not be trusting my precious data with either of these features, just yet.

 

Microsoft publicly announced Storage Spaces and ReFS in early Windows 8 blog posts. Storage Spaces was of special interest to the Windows Home Server community in light of Microsoft first dropping support for Drive Extender in Windows Home Server 2011, and then completely dropping Windows Home Server, and replacing it with Windows Server 2012 Essentials. My personal interest was more geared towards expanding my home storage capacity in a cost effective and energy efficient way, without tying myself to proprietary hardware solutions.

 

I archive all my CD’s, DVD’s, and BD discs, and store the media files on a Synology DS2411+ with 12 x 3TB drives in a RAID6 volume, giving me approximately 27TB of usable storage. Seems like a lot of space, but I’ve run out of space, and I have a backlog of BD discs that need to be archived. In general I have been very happy with Synology (except for an ongoing problem with “Local UPS was plugged out” errors), and they do offer devices capable of more storage, specifically the RS2212+ with the RX1211 expansion unit offering up to 22 combined drive bays. But, at $2300 plus $1700, this is expensive, capped at 22 drives, and further ties me in with Synology. Compare that with $1400 for a Norco DS24-E or $1700 for a SansDigital ES424X6+BS 24 bay 4U storage unit, an inexpensive LSI OEM branded SAS HBA from eBay, or a LSI SAS 9207-8e if you like the real thing, connected to Windows Server 2012, running Storage Spaces and ReFS, and things look promising.

Arguable I am swapping one proprietary technology for another, but with native Windows support, I have many more choices for expansion. One could make the same argument for the use of ZFS on Linux, and if I was a Linux expert, that may have been my choice, but I’m not.

 

I tested using a SuperMicro SuperWorkstation 7047A-73, with dual Xeon E5-2660 processors and 32GB RAM. The 7047A-73 uses a X9DA7 motherboard, that includes a LSI SAS2308 6Gb/s SAS2 HBA, connected to 8 hot-swap drive bays.

For comparison with a hardware RAID solution I also tested using a LSI MegaRAID SAS 9286CV-8e 6Gb/s SAS2 RAID adapter, with the CacheCade 2.0 option, and a Norco DS12-E 12 bay SAS2 2U expander.

For drives I used Hitachi Deskstar 7K4000 4TB SATA3 desktop drives and Intel 520 series 480GB SATA3 SSD drives. I did not test with enterprise class drives, 4TB models are still excessively expensive, and defeats the purpose of cost effective home use storage.

 

I previously reported that the Windows Server 2012 and Windows 8 install will hang when trying to install on a SSD connected to the SAS2308. As such I installed Server 2012 Datacenter on an Intel 480GB SSD connected to the onboard SATA3 controller.

Windows automatically installed the drivers for the LSI SAS2308 controller.

I had to manually install the drivers for the C600 chipset RSTe controller, and as reported before, the driver works, but suffers from dyslexia.

The SAS2308 controller firmware was updated to the latest released SuperMicro v13.0.57.0.

 

Since LSI already released v14.0.0.0 firmware for their own SAS2308 based boards like the SAS 9207-8e, I asked SuperMicro support for their v14 version, and they provided me with an as yet unreleased v14.0.0.0 firmware version for test purposes. Doing a binary compare between the LSI version and the SuperMicro version, the differences appear to be limited to descriptive model numbers, and a few one byte differences that are probably configuration or default parameters. It is possible to cross-flash between some LSI and OEM adapters, but since I had a SuperMicro version of the firmware, this was not necessary.

SuperMicro publishes a v2.0.58.0 LSI driver that lists Windows 8 support, but LSI has not yet released Windows 8 or Server 2012 drivers for their own SAS2308 based products. I contacted LSI support, and their Windows 8 and Server 2012 drivers are scheduled for release in the P15 November 2012 update.

I tested the SuperMicro v14.0.0.0 firmware with the SuperMicro v2.0.58.0 driver, the SuperMicro v14.0.0.0 firmware with the Windows v2.0.55.84 driver, and the SuperMicro v2.0.58.0 driver with the SuperMicro v13.0.57.0 firmware. Any combination that included the SuperMicro v2.0.58.0 driver or the SuperMicro v14.0.0.0 firmware resulted in problems with the drives or controller not responding. The in-box Windows v2.0.55.84 driver and the released SuperMicro v13.0.57.0 firmware was the only stable combination.

Below are some screenshots of the driver versions and errors:

LSI.2.0.55.84LSI.2.0.58.0

Eventlog.Controller.ErrorEventlog.IO.RetriedEventlog.Reset.DeviceFormat.Failed

 

One of the reasons I am not yet prepared to use Storage Spaces or ReFS is because of the complete lack of decent documentation, best practice guides, or deployment recommendations. As an example, the only documentation on SSD journal drive configuration is in TechNet forum post from a Microsoft employee, requiring the use of PowerShell, and even then there is no mention of scaling or size ratio requirements. Yes, the actual PowerShell commandlet parameters are documented on MSDN, but not the use or the meaning.

PowerShell is very powerful and Server 2012 is completely manageable using PowerShell, but an appeal of Windows has always been the management user interface, especially important for adoption by SMB’s that do not have a dedicated IT staff. With Windows Home Server being replaced by Windows Server 2012 Essentials, the lack of storage management via the UI will require regular users to become PowerShell experts, or maybe Microsoft anticipates that configuration UI’s will be developed by hardware OEM’s deploying Windows Storage Server 2012 or Windows Server 2012 Essentials based systems.

My feeling is that Storage Spaces will be one of those technologies that matures and becomes generally usable after one or two releases or service packs post the initial release.

 

I tested disk performance using ATTO Disk Benchmark 2.47, and CrystalDiskMark 3.01c.

I ran each test twice, back to back, and report the average. I realize two runs are not statistically significant, but with just two runs it took several days to complete the testing in between regular work activities. I opted to only publish the CrystalDiskMark data as the ATTO Disk Benchmark results varied greatly between runs, while the CrystalDiskMark results were consistent.

Consider the values useful for relative comparison under my test conditions, but not useful for absolute comparison with other systems.

 

Before we get to the results, a word on the tests.

The JBOD tests were performed using the C600 SATA3 controller.
The Simple, Mirror, Triple, and RAID0 tests were performed using the SAS 2308 SAS2 controller.
The Parity, RAID5, RAID6, and CacheCade tests were performed using the SAS 9286CV-8e controller.

The Simple test created a simple storage pool.
The Mirror test created a 2-way mirrored storage pool.
The Triple test created a 3-way mirrored storage pool.
The Parity test created a parity storage pool.
The Journal test created a parity storage pool, with SSD drives used for the journal disks.
The CacheCade test created RAID sets, with SSD drives used for caching.

 

As I mentioned earlier, there is next to no documentation on how to use Storage Spaces. In order to use SSD drives as journal drives, I followed information provided in a TechNet forum post.

Create the parity storage pool using PowerShell or the GUI. Then associate the SSD drives as journal drives with the pool.

Windows PowerShell
Copyright (C) 2012 Microsoft Corporation. All rights reserved.

PS C:\Users\Administrator> Get-PhysicalDisk -CanPool $True

FriendlyName CanPool OperationalStatus HealthStatus Usage Size
------------ ------- ----------------- ------------ ----- ----
PhysicalDisk4 True OK Healthy Auto-Select 447.13 GB
PhysicalDisk5 True OK Healthy Auto-Select 447.13 GB

PS C:\Users\Administrator> $PDToAdd = Get-PhysicalDisk -CanPool $True
PS C:\Users\Administrator>
PS C:\Users\Administrator> Add-PhysicalDisk -StoragePoolFriendlyName "Pool" -PhysicalDisks $PDToAdd -Usage Journal
PS C:\Users\Administrator>
PS C:\Users\Administrator>
PS C:\Users\Administrator> Get-VirtualDisk

FriendlyName ResiliencySettingNa OperationalStatus HealthStatus IsManualAttach Size
me
------------ ------------------- ----------------- ------------ -------------- ----
Pool Parity OK Healthy False 18.18 TB

PS C:\Users\Administrator> Get-PhysicalDisk

FriendlyName CanPool OperationalStatus HealthStatus Usage Size
------------ ------- ----------------- ------------ ----- ----
PhysicalDisk0 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk1 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk2 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk3 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk4 False OK Healthy Journal 446.5 GB
PhysicalDisk5 False OK Healthy Journal 446.5 GB
PhysicalDisk6 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk7 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk8 False OK Healthy Auto-Select 447.13 GB
PhysicalDisk10 False OK Healthy Auto-Select 14.9 GB

PS C:\Users\Administrator>

I initially added the journal drives after the virtual drive was already created, but that would not use the journal drives. I had to delete the virtual drive, recreate it, and then the journal drives kicked in. There must be some way to manage this after virtual drives already exist, but again, no documentation.

 

In order to test Storage Spaces using the SAS 9286CV-8e RAID controller I had to switch it to JBOD mode using the commandline MegaCli utility.


D:\Install>MegaCli64.exe AdpSetProp EnableJBOD 1 a0

Adapter 0: Set JBOD to Enable success.

Exit Code: 0x00

D:\Install>MegaCli64.exe AdpSetProp EnableJBOD 0 a0

Adapter 0: Set JBOD to Disable success.

Exit Code: 0x00

D:\Install>

 

The RAID and CacheCade disk sets were created using the LSI MegaRAID Storage Manager GUI utility.

 

Below is a summary of the throughput results:

ReadWriteKBPS

ReadWriteIOPS

 

Not surprisingly the SSD drives had very good scores all around for JBOD, Simple, and RAID0. I only had two drives to test with, but I expect more drives to further improve performance.

The Simple, Mirror, and Triple test results speak for themselves, performance halving, and halving again.

The Parity test shows good read performance, and bad write performance. The write performance approaches that of a single disk.

The Parity with SSD Journal disks shows about the same read performance as without journal disks, and the write performance double that of a single disk.

The RAID0 and Simple throughput results are close, but the RAID0 write IOPS doubling that of the Simple volume.

The RAID5 and RAID6 read performance is close to Parity, but the write performance almost ten fold that of Parity. It appears that the SLI card writes to all drives in parallel, while Storage Spaces parity writes to one drive only.

The CacheCade read and write performance is less than without CacheCade, but the IOPS ten fold higher.

The ReFS performance is about 30% less than the equivalent NTFS performance.

 

 

Until Storage Spaces gets thoroughly documented and improves performance, I’m sticking with hardware RAID solutions.

Dyslexic Intel RSTe Driver

I encounter one problem after another running Windows 8 and Server 2012 on the dual Xeon E5 Intel C600 chipset based SuperMicro 7047A-T and 7047A-73 SuperWorkstation machines. I will say that this is really not representative of my Windows 8 experience in general, as all other machines I installed on worked fine with the in-box drivers.

The C602 includes the Intel Storage Controller Unit (SCU) SATA / SAS controller. Windows 8 and Server 2012 do not include in-box drivers for the SCU. The SCU drivers are part of the Intel Rapid Storage Technology Enterprise (RSTe) driver set. Note that the RSTe and RST drivers are different and not compatible with one another. When you install the full RSTe package, it includes SCU drivers for the SCU RAID controller, AHCI drivers for the SATA controller, and the Windows RST management application.

A clean install of Windows 8 will use the in-box drivers for the SATA controller. In the image below you can see the Intel 520 Series 480GB SSD drive show up with the correct model number:

Device.Manager.Win8

After installing RSTe (3.2.0.1132, 3.2.0.1134), the 4TB Hitachi drives attached to the SCU show up, but the model numbers of the drives, including the SSD drive attached to the SATA port, are now messed up:

Device.Manager.RSTe

The drive hardware identifiers are correct, but the friendly name is not:

Intel.SSD.Hardware

Intel.SSD.Friendly

It appears that the text BYTE’s are WORD swapped, i.e. ABCD becomes BADC.

The driver is also not functional, attempting to create a storage spaces pool using the Hitachi drives hangs forever, with no drive activity, requiring a hard power cycle:

Storage.Pool

And lastly, the Intel SSD Toolbox 3.0.3 is not compatible with Windows 8:

SSD.Toolbox

The clock is ticking for Windows Server 2012 (4 September, 1 day left) and Windows 8 (26 October, 7 weeks left) general availability, I can only hope compatible drivers, firmware, and utilities are forthcoming.

 

[Update: 4 September 2012]
SuperMicro posted updated RSTe drivers (package v3.5.0.1101, driver v3.5.0.1096). This driver set resolves the hang during storage space creation, but the drive names are still messed up.

Windows 8 VIDEO_TDR_FAILURE Madness

I finally figured out why I kept on getting VIDEO_TDR_FAILURE BSOD’s when installing Windows 8 on my SuperMicro workstations. It turns out that the problem goes away when I use a PCIe slot associated with CPU #1, instead of a slot associated with CPU #2.

Some history on my adventures with Windows 8 and SuperMicro SuperWorkstations:
I got ACPI_BIOS_ERROR BSOD’s while installing Windows 8, SuperMicro provided a Beta BIOS that resolved the problem.
The Windows 8 install hangs if installing to a SSD drive on a LSI 2308 SAS controller, that issue is still unresolved, but can be worked around by connecting the SSD to the Intel SATA controller.
I got VIDEO_TDR_ERROR BSOD’s while installing Windows 8 with a NVidia Quadro 5000 graphic card, same with an ATI FirePro V7900 or a NVidia GeForce GTX 680 or an ATI HD 7970. And this post is about resolving that problem.

 

SuperMicro released v1.0a BIOS updates for the X9DAi and X9DA7 motherboards used in the 7470A-T and 7470A-73 SuperWorkstations. I was hoping this will resolve the VIDEO_TDR_FAILURE BSOD’s, but no.

The X9DA7 BIOS updated without issue, but the X9DAi update reported an error at the end of the update process; “Error when sending Enable Message to ME”.

I contacted SuperMicro support, and they asked me to make sure that there is no jumper on JPME1. There is no mention of JPME1 in the motherboard manual, but it is located next to JIPMB1, next to PCIe slot #1. The header had a jumper on pins 2 and 3, where the same header on the X9DA7 motherboard had a jumper between 1 and 2. I removed the jumper, and the BIOS update succeeded.

JPME1

 

Unlike the ACPI_BIOS_ERROR BSOD that happens during the WinPE phase of the install, the VIDEO_TDR_FAILURE BSOD happens on the first boot after the install, during the hardware detection and driver install phase. This means that the technique I used to kernel debug the initial boot phase will not work, as the second boot is using the BCD already deployed to the target hard drive. I had to modify the BCD of the already installed image, prior to the install continuing after the reboot.

 

I tested many permutations of graphic cards and configurations, and it quickly became very annoying to have to type my Win8 product key every single time I boot and install. To avoid this I created configuration files in the sources directory on the install media, and this bypassed the key question. You can read more about the meaning of the file contents here:

EI.cfg:

[EditionID]
Professional
[Channel]
Retail
[VL]
0

PID.txt:

[PID]
Value=XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

 

To modify the BCD of the installed image, and be able to easily repeat the second phase of install testing, I installed a second hard drive, and deployed WinPE to the second drive. By using F11 during boot to choose the boot drive, I could select booting from the second drive at any time.

 

I have a variety WinPE v3 (Win7) based utility images, and I updated them to use WinPE v4 (Win8). In the process I lost the boot menu, and the first image in the menu automatically started booting. After some trial and error, I found the bootmenupolicy BCD option, and when set to legacy mode, the old style menu is back:

bcdedit /set {default} bootmenupolicy legacy

 

I installed Win8 on the primary drive, and during the reboot, instead of booting to the installed Win8 drive, I used F11 and booted to my secondary WinPE drive. From WinPE I modified the boot BCD to enable kernel debugging over the network:

bcdedit -store c:\boot\bcd /set {default} nocrashautoreboot yes
bcdedit -store c:\boot\bcd /set {default} debugtype net
bcdedit -store c:\boot\bcd /set {default} hostip 3232235876
bcdedit -store c:\boot\bcd /set {default} port 50000
bcdedit -store c:\boot\bcd /set {default} key my.secret.debug.key
bcdedit -store c:\boot\bcd /debug {default} yes

This is equivalent to:

bcdedit /dbgsettings net host:192.168.1.100 port:50000 key:my.secret.debug.key

But unlike the dbgsettings command, this allows me to specify a BCD store. Also note that the IP address is stored as a single numeric value instead of the dotted IP format.

 

While still in WinPE, I captured the state of the primary Win8 drive by making a drive image using Symantec Ghost, the real Ghost, currently sold as Symantec Ghost Solution Suite, not the same named but volume snapshot based Norton Ghost or Symantec System Recovery. By saving a drive image, I can easily change hardware or configurations, test the install starting at the second phase, reboot to the secondary WinPE drive using F11, restore the entire drive image, and try again, while leaving the kernel debug options intact.

 

I tested with following hardware configurations in various permutations:

 

With the kernel debugger attached, I captured the following crash details in WinDbg for NVidia based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa80211cd010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800782d0d8, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000002, Optional internal context dependent data.

Debugging Details:
------------------

FAULTING_IP:
nvlddmkm+1ae0d8
fffff880`0782d0d8 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`12c76078 fffff801`66fef0ea : 00000000`00000000 00000000`00000116 fffff880`12c761e0 fffff801`66f734b8 : nt!DbgBreakPointWithStatus
fffff880`12c76080 fffff801`66fee742 : 00000000`00000003 fffff880`12c761e0 fffff801`66f73e90 00000000`00000116 : nt!KiBugCheckDebugBreak+0x12
fffff880`12c760e0 fffff801`66ef4144 : fffffa80`2094b100 fffff880`021ee9c0 fffffa80`1f54e400 00000000`00000000 : nt!KeBugCheck2+0x79f
fffff880`12c76800 fffff880`04b33dcb : 00000000`00000116 fffffa80`211cd010 fffff880`0782d0d8 00000000`00000000 : nt!KeBugCheckEx+0x104
fffff880`12c76840 fffff880`04b32518 : fffff880`0782d0d8 fffffa80`211cd010 fffff880`12c76949 00000000`000000c7 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff880`12c76880 fffff880`04a1e608 : fffffa80`211cd010 fffff880`12c76949 00000000`00000000 00000000`00000002 : dxgkrnl!TdrIsRecoveryRequired+0x168
fffff880`12c768b0 fffff880`04a4d539 : 00000000`00000000 fffff780`00000320 00000000`00000000 fffffa80`1f54e400 : dxgmms1!VidSchiReportHwHang+0x438
fffff880`12c769b0 fffff880`04a4ba49 : fffffa80`00000002 fffffa80`1f54e400 fffffa80`1f54e840 fffffa80`1f54e840 : dxgmms1!VidSchiCheckHwProgress+0xe5
fffff880`12c76a00 fffff880`04a16fe5 : ffffffff`ff676980 00000000`00000001 fffff880`12c76b69 fffffa80`1f54e400 : dxgmms1!VidSchiWaitForSchedulerEvents+0x20d
fffff880`12c76aa0 fffff880`04a4b646 : 00000000`00000000 00000000`0000000f fffffa80`1f54e400 fffffa80`1f54e400 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff880`12c76bd0 fffff801`66e9b521 : fffffa80`1f5abb00 fffffa80`1f54e400 fffff880`03b01140 00000000`06a21e1e : dxgmms1!VidSchiWorkerThread+0xca
fffff880`12c76c10 fffff801`66ed9dd6 : fffff880`03af5180 fffffa80`1f5abb00 fffff880`03b01140 fffffa80`19aac040 : nt!PspSystemThreadStartup+0x59
fffff880`12c76c60 00000000`00000000 : fffff880`12c77000 fffff880`12c71000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
nvlddmkm+1ae0d8
fffff880`0782d0d8 4055 push rbp

SYMBOL_NAME: nvlddmkm+1ae0d8

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf93d7

FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

 

With the kernel debugger attached, I captured the following crash details in WinDbg for ATI based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa801ed114d0, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800725cefc, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 000000000000000d, Optional internal context dependent data.

Debugging Details:
------------------

FAULTING_IP:
atikmpag+8efc
fffff880`0725cefc 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`06fa9ee8 fffff803`e6ff20ea : 00000000`00000000 00000000`00000116 fffff880`06faa050 fffff803`e6f764b8 : nt!DbgBreakPointWithStatus
fffff880`06fa9ef0 fffff803`e6ff1742 : 00000000`00000003 fffff880`06faa050 fffff803`e6f76e90 00000000`00000116 : nt!KiBugCheckDebugBreak+0x12
fffff880`06fa9f50 fffff803`e6ef7144 : fffffa80`1e2df4e0 fffff880`020b99c0 fffffa80`1d31f010 00000000`00000000 : nt!KeBugCheck2+0x79f
fffff880`06faa670 fffff880`04d31dcb : 00000000`00000116 fffffa80`1ed114d0 fffff880`0725cefc 00000000`00000000 : nt!KeBugCheckEx+0x104
fffff880`06faa6b0 fffff880`04d30548 : fffff880`0725cefc fffffa80`1ed114d0 fffff880`06faa7b9 00000000`00000180 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff880`06faa6f0 fffff880`04c11608 : fffffa80`1ed114d0 fffff880`06faa7b9 00000000`0000000f fffffa80`1d31f8f8 : dxgkrnl!TdrIsRecoveryRequired+0x198
fffff880`06faa720 fffff880`04c459f9 : 00000000`00000001 fffff880`06faa8a0 fffff880`06faa920 00000000`00000000 : dxgmms1!VidSchiReportHwHang+0x438
fffff880`06faa820 fffff880`04c3ff72 : fffffa80`1d31f010 fffff780`00000320 fffffa80`1d31f770 fffffa80`1d31f010 : dxgmms1!VidSchWaitForCompletionEvent+0x411
fffff880`06faa8e0 fffff880`04c4206c : fffffa80`1d31f010 fffffa80`1d31f450 fffffa80`1d31f450 00000000`00000000 : dxgmms1!VidSchiWaitForEmptyHwQueue+0x9a
fffff880`06faa9d0 fffff880`04c3ea85 : 00000000`00000000 fffffa80`1d31f010 fffffa80`1d31f450 00000000`00000000 : dxgmms1!VidSchiSuspend+0x74
fffff880`06faaa00 fffff880`04c09fe5 : ffffffff`ff676980 00000000`00000001 fffff880`06faab69 fffffa80`1d31f010 : dxgmms1!VidSchiWaitForSchedulerEvents+0x249
fffff880`06faaaa0 fffff880`04c3e646 : 00000000`00000000 fffffa80`1d585660 fffffa80`1d44d7f0 fffffa80`1d31f010 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff880`06faabd0 fffff803`e6e9e521 : fffffa80`1d6b9b00 fffffa80`1d31f010 fffff880`03932140 00000000`04d91ecb : dxgmms1!VidSchiWorkerThread+0xca
fffff880`06faac10 fffff803`e6edcdd6 : fffff880`03926180 fffffa80`1d6b9b00 fffff880`03932140 fffffa80`19ac7500 : nt!PspSystemThreadStartup+0x59
fffff880`06faac60 00000000`00000000 : fffff880`06fab000 fffff880`06fa5000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
atikmpag+8efc
fffff880`0725cefc 4055 push rbp

SYMBOL_NAME: atikmpag+8efc

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: atikmpag

IMAGE_NAME: atikmpag.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf9279

FAILURE_BUCKET_ID: 0x116_IMAGE_atikmpag.sys

BUCKET_ID: 0x116_IMAGE_atikmpag.sys

 

This was not really helping me much, and I decided to repeat the tests but use the checked build of Windows 8 to help troubleshoot.

With the kernel debugger attached, I captured the following ASSERT during the boot:

Windows 8 Kernel Version 9200 MP (1 procs) Checked x64
Built by: 9200.16384.amd64chk.win8_rtm.120725-1247
Machine Name:
Kernel base = 0xfffff802`0e01d000 PsLoadedModuleList = 0xfffff802`0e760ac0
System Uptime: 0 days 0:00:06.228 (checked kernels begin at 49 days)
Assertion: The BIOS has reported inconsistent resources (_CRS). Please upgrade your BIOS.
ACPI!PnpBiosGetDeviceResourceList+0x15e:
fffff880`012c3c2a cd2c int 2Ch
...
Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

PROCESS_NAME: System

FAULTING_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880`012c3c2a cd2c int 2Ch

ERROR_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

EXCEPTION_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: 0x0

CURRENT_IRQL: 0

LOCK_ADDRESS: fffff8020e7c5d60 -- (!locks fffff8020e7c5d60)

Resource @ nt!PiEngineLock (0xfffff8020e7c5d60) Exclusively owned
Threads: fffffa8019a36040-01<*>
1 total locks, 1 locks currently held

PNP_TRIAGE:
Lock address : 0xfffff8020e7c5d60
Thread Count : 1
Thread address: 0xfffffa8019a36040
Thread wait : 0x105eccd4

LAST_CONTROL_TRANSFER: from fffff880012b736f to fffff880012c3c2a

STACK_TEXT:
fffff880`009b4b30 fffff880`012b736f : fffffa80`23a9e900 fffff880`012a7e01 fffff880`009b4c08 fffff880`012a7e70 : ACPI!PnpBiosGetDeviceResourceList+0x15e
fffff880`009b4bd0 fffff880`0125acba : fffffa80`23a9e900 fffffa80`19ac54c0 fffff880`012a7e70 fffffa80`1f477010 : ACPI!ACPIBusIrpQueryResourceRequirements+0x8b
fffff880`009b4c50 fffff802`0e91b6a4 : fffffa80`23a9e900 fffffa80`19ac54c0 fffff880`009b4db0 fffffa80`23a9e900 : ACPI!ACPIDispatchIrp+0x2a6
fffff880`009b4cf0 fffff802`0e91cd1b : fffffa80`23a9e900 fffff880`009b4db0 00000001`c00000bb 00000000`00000000 : nt!IopSynchronousCall+0x10c
fffff880`009b4d80 fffff802`0e915bdb : fffffa80`23a9e900 fffff880`009b4e50 fffffa80`23a4f850 00000000`0000001e : nt!PpIrpQueryResourceRequirements+0x5f
fffff880`009b4e10 fffff802`0e91748d : fffffa80`23a9b8e0 00000000`00000000 ffffffff`80000218 fffffa80`23a9b8e0 : nt!PiQueryResourceRequirements+0x47
fffff880`009b4ea0 fffff802`0e91a1f2 : fffffa80`23a9b8e0 fffffa80`23a9b8e0 00000000`00000001 00000000`00000000 : nt!PiProcessNewDeviceNode+0x159d
fffff880`009b5070 fffff802`0e08feb5 : fffffa80`19adcd20 00000000`00000000 fffff880`009b5358 00000000`00000000 : nt!PipProcessDevNodeTree+0x1fe
fffff880`009b5310 fffff802`0e08fb59 : 00000000`00000000 00000000`00000000 00000000`00000000 fffffa80`37e19cc0 : nt!PnpDeviceActionWorker+0x345
fffff880`009b53d0 fffff802`0ed4010d : 00000000`00000000 fffff8a0`00000007 fffff8a0`00f08c00 00000000`00000000 : nt!PnpRequestDeviceAction+0x2ed
fffff880`009b5420 fffff802`0ed3b39d : fffff802`0d536800 fffff802`0e7c83c0 00000000`00000006 fffff802`0d536800 : nt!IopInitializeBootDrivers+0x905
fffff880`009b5650 fffff802`0ed2deb5 : fffff802`0d536800 00000000`00000000 fffff802`0d536800 fffff802`0d51ebf0 : nt!IoInitSystem+0xb5d
fffff880`009b59b0 fffff802`0e82d013 : fffff802`0d536800 fffffa80`19a36040 00000000`00000000 fffffa80`19ab3040 : nt!Phase1InitializationDiscard+0x1899
fffff880`009b5bc0 fffff802`0e1b289e : fffff802`0d536800 fffff802`0d536800 00000000`00000000 00000000`00000000 : nt!Phase1Initialization+0x13
fffff880`009b5bf0 fffff802`0e24ef96 : fffff802`0e82d000 fffff802`0d536800 fffff802`0e6c6180 00000000`f8ffffff : nt!PspSystemThreadStartup+0x1a2
fffff880`009b5c60 00000000`00000000 : fffff880`009b6000 fffff880`009b0000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880`012c3c2a cd2c int 2Ch

SYMBOL_STACK_INDEX: 0

SYMBOL_NAME: ACPI!PnpBiosGetDeviceResourceList+15e

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: ACPI

IMAGE_NAME: ACPI.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 50109dd0

BUCKET_ID_FUNC_OFFSET: 15e

FAILURE_BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

 

This is interesting, the kernel ASSERT’s on a problem reported by the BIOS.

I contacted SuperMicro support, they said they will investigate the BIOS failure, and they suggested I try to use PCIe slot #3 instead of slot #5. The motherboard manual mentions that slots #1, #2, and #3 are to be used if CPU #1 is installed, and slots #4, #5, and #6 to be used only if CPU #2 is installed.

PCIe

I have both processors installed, so not using the more conveniently located slot #5 never came to mind. I moved the graphic card to CPU #1 slot #3, and voila, install succeeded and Windows 8 was up and running!

 

I repeated the checked build test with the graphic card in slot #3, and the same BIOS ASSERT error was reported, so the BIOS ASSERT seems to be unrelated to the ACPI_TDR_FAILURE error.

 

This was a very frustrating problem, and I still don’t understand the root cause, but I am happy to be able to finally switch both workstations to Windows 8.

Windows 8 Install Crash With NVidia Quadro 5000

I got Windows 8 RTM installed on my two SuperMicro SuperWorkstation machines, with a bit of trouble along the way, but nothing I could not work around. But, I ran into a problem with NVidia Quadro 5000 cards causing a VIDEO_TDR_FAILURE BSOD during the Windows 8 install process.

 

I was running my two workstations with ATI FirePro V7900 graphic cards, but I decided I wanted a bit more rendering horsepower. I wanted a card that had a good balance between modern architecture, great 2D performance, good 3D performance, OpenCL or CUDA support, and reasonable power consumption. I found the Tom’s Hardware Workstation Graphics 2012 benchmark site to be a very informative, and I decided that the NVidia Quadro 5000 was a very good choice.

I replaced my FirePro V7900 with the Quadro 5000, and started the Windows 8 x64 RTM install. All went well, until the first reboot during the install, and the machine would blue screen crash with a VIDEO_TDR_FAILURE. During the install process the hardware is identified, the appropriate drivers extracted, and on the reboot those drivers are started. It appears that soon after the NVidia driver loads, that it crashes.

 

The Timeout Detection and Recovery (TDR) feature was added to Windows Vista, and was a way for the OS to recover from a renderer failure without the need to restart the machine. Typically the user will see a notification that the graphic subsystem was restarted, but in cases where the restart fails, a VIDEO_TDR_FAILURE blue screen crash is generated.

The web is full of reports of NVidia VIDEO_TDR_FAILURE crashes, and solutions typically involve replacing the hardware or updating drivers. In my case I had two new machines, and two new graphic cards, and a brand new operating system, and both cards on both machines crashed.

I contacted SuperMicro support, and responsive as they always are, said they would investigate.

I also contacted PNY support, as PNY is the manufacturer of the NVidia Quadro 5000, here is their reply.

Again, I am sorry, but we do not list Windows 8 (yet) as being compatible with the Quadro 5000, or any other Quadro or Geforce card we manufacture. Until it is publically and commercially available, we cannot provide support for Windows 8. Windows 8 is not available to the end user yet, and it is in testing, as is the Nvidia driver. If you find issues, you must report them to Microsoft in order to improve compatibility in the final release. There is obviously a compatibility problem with Windows 8 and the Quadro 5000 right now (according to your testing of TWO cards), and unfortunately there is nothing we can do to fix it while in is not available to the public. My best advice is to try it again when it is officially released sometime in 2013.

Not very helpful at all, and their concept of Windows 8 release timing, and their responsibility, is way out there.

 

The real problem here is that it is the in-box NVidia drivers that are crashing, not drivers I install later. And as it is the in-box graphic drivers that crash, there is no (easy) way to update the drivers used by the Windows 8 install media.

 

I had previously used a Quadro 4000 card on the same machines, and they installed without incident, so it appears to be something unique the Quadro 5000 cards.

At this time I am waiting for SuperMicro to get back to me with suggestions, as I have little hope of hearing anything useful from PNY.

Windows 8 Install Hangs Booting From LSI 2308 SAS Controller

I’ve previously posted about problems installing Windows 8 on SuperMicro machines, and that SuperMicro released a Beta BIOS that solved the install problems. I’ve since run into two more problems; the install hanging when booting of the LSI 2038 SAS controller, and a BSOD when using a Quadro 5000 video card (more on that in a later post).

 

I have two SuperWorkstation machines, a 7047A-T using a X9DAi motherboard, and a 7047A-73 using a X9DA7 motherboard.

The X9DAi and X9DA7 both use the Intel C602 chipset. The X9DAi and X9DA7 both have 2 x SATA3 ports, 4 x SATA2 ports, and 4 x SAS / SATA2 ports. The X9DA7 has an additional LSI 2308 controller with 8 x SAS2 / SATA3 ports.

On the 7047A-T / X9DAi machine, the 8 x hot-swap drive trays are connected to the 2 x SATA3, 2 x SATA2, and 4 x SAS / SATA2 ports.

On the 7047A-73 / X9DA7 machine, the 8 x hot-swap drive trays are connected to the 8 x LSI 2308 ports.

SuperMicro support provided me with Beta BIOS’s for the X9DAi and X9DA7 motherboards, this resolved the ACPI_BIOS_ERROR, and allowed me to install Windows 8 RTM on these machines, or at least get past the BSOD while booting the install media.

 

I configured both machines with:

 

In the 7047A-T / X9DAi machine, I installed the SSD drive in slot-0 of the hot-swap trays, connected to SATA3 port-0. I installed Windows 8 x64 RTM without issue.

 

In the 7047A-73 / X9DA7 machine I installed the SSD drive in slot-0 of the hot-swap trays, connected to LSI2308 port-0. I installed Windows 8 x64 RTM, and the install hanged at 0% while copying files.

While in this state, I suspected the problem to be IO related, so I pressed Shift-F10 to open a console window, I ran diskpart, and diskpart hanged.

I downloaded the latest LSI 2308 drivers from the Supermicro FTP site. I ran the install again, this time I manually loaded the drivers instead of using the in-box drivers, same problem, hang at 0%.

LSI does not make drivers directly available for HBA chips, but the LSI SAS 9205-8e uses the LSI 2308, and I downloaded the drivers from LSI. They were the same version as the drivers available on the SuperMicro FTP site.

 

I contacted SuperMicro support, they suggested I install using the SATA3 port while they research the problem. Connecting the SSD drive to SATA3 port-0 installed fine.

 

I tested the same setup using Windows 7, and although Windows 7 did not include in-box drivers for the LSI 2308, after loading the drivers, Windows 7 installed fine with the SSD connected to LSI2308 port-0.

This probably indicates a Windows 8 compatibility problem with the LSI 2308 driver, or HBA firmware.

 

LSI HBA’s can be configured to run in Initiator Target (IT) or Integrated RAID (IR) mode. This can be changed by flashing with the appropriate IT or IR firmware. IT firmware is typically preferred where there is no need for hardware RAID and all disks will be in JBOD mode, e.g. for use with ZFS or Storage Spaces.

When you flash between IT and IR mode, you need to erase the firmware before re-flashing, i.e. you cannot simply flash one mode on top of another mode. On the SuperMicro motherboards, you also need to perform the flash operation from within the EFI shell, flashing from other environments will fail. You can follow these KB’s to help with the process; LSI 16266, SuperMicro 14368, and SuperMicro 14151. I would not recommend using SuperMicro 14368 method, as it wipes the entire firmware memory, and you will need to manually re-enter the SAS address. It is basically the difference between using “sas2flash -o -e 6” and “sas2flash -o -e 7”, see the SAS2flash reference guide for details.

SAS2Flash

 

The X9DA7 motherboard came with firmware version 13.0.0.56 for the LSI 2308, configured in IR mode. I updated the firmware using the firmware from the SuperMicro FTP site to 13.0.0.57 in IT mode.

The update process I followed was to boot into the EFI shell while having a USB drive attached containing the firmware update, the drive must contain the firmware, the boot BIOS, and the SAS2Flash.efi tool.

In the EFI shell run the “map” command to list the hardware and see which drive is the USB drive, mount that drive using “mount fs[drive number]:”, e.g. “mount fs1:”, then change to the directory to the USB drive using “fs1:”:

map
mount fs1:
fs1:

Then wipe the flash “sas2flash -o -e 6”, then program the new firmware and boot code “sas2flash -o -f [firmware file] -b [bootcode file]”, e.g. “sas2flsh -f 2308IT13.5FW -b mptsas2.rom”, then restart.:

sas2flash -o -e 6
sas2flsh -f 2308IT13.5FW -b mptsas2.rom
reset

Same problem, hang at 0%.

 

I again referred to the LSI site for updated firmware for the LSI 2308, and the LSI SAS 9205-8e and LSI SAS 9207-8e includes firmware P14 version 14.0.0.0, a major revision upgrade from version 13.0.0.57 from the SuperMicro site.

The P14 firmware packages does not include the EFI version of SAS2Flash, but a bit of search engine exploration showed it is still included in the P13 packages.

 

I am not quite brave enough to flash to this version yet, as a failed flash will require a hardware swap. I’ll continue running this machine with the SSD connected to the SATA3 port.

 

At this point I am waiting for SuperMicro support to get back to me with a solution, or confirming that I can flash the P14 firmware and see if that resolves the issue.

SuperMicro Beta BIOS supports Windows 8 and Server 2012

In a previous post I reported that my SuperMicro SuperWorkstation 7047A-T failed to install Windows 8 or Windows Server 2012 due to a ACPI_BIOS_ERROR. I contacted SuperMicro support, and I was informed that new BIOS releases are on their way that will support Windows 8 and Server 2012.

This morning I received an email from SuperMicro, with a new Beta BIOS for the X9DAi motherboard used in the 7047A-T. The new BIOS allowed me to install Windows 8 and Server 2012.

I used a DOS bootable USB key, and installed the new BIOS.

The 7047A-T has USB ports on the back and on the front of the case. The ports on the front are all USB3, and it is not possible to boot from these ports, at least I have not yet found a configuration that allows booting from USB3 ports. I tried using USB2 keys and, my newest Kingston DataTraveler HyperX 3.0 super fast USB3 keys, the BIOS does not list any boot devices in these USB3 ports. To boot from USB you have to plug the USB key in one of the rear USB2 ports.

The new BIOS version is “1.0 beta”, compilation date “7/23/2012”. The BIOS screen looks like the more modern AMI EFI BIOS’s I’ve seen in other devices, i.e. the thin font instead of the classic console font.

BIOS.Beta

I performed a “Restore Optimized Defaults”, and then went through the options to see what has changed and what is new.

The [Advanced] [Chipset Configuration] [North Bridge] [IOH Configuration] now sets all PCIe busses to GEN3, the old BIOS defaulted to GEN2.

The [Advanced] [SATA Configuration] now enabled hot plug on all ports, the old BIOS defaulted to hot plug disabled.

The [Advanced] [Boot Feature] ads a new power configuration item called “EuP”. This seems to be related to EU Directive 2005/32/EC:

EU Directive 2005/32/EC enacted by the European Union member countries dictates that after January 1, 2010, no computer or other energy using product (EuP) sold in the member countries may dissipate more than 1 Watt in the standby (S5) state.

I measured the power utilization, and the machine uses 2W when powered off, 140W at idle in Windows 8 desktop, and 7W while sleeping.

I updated my Windows 8 USB key to the latest build (I have access to), booted from the USB key, and installed Windows 8 without any major issues.

I had swapped the NVidia Quadro 4000 for a faster ATI FirePro V7900. The v1.0 BIOS worked fine with the Quadro 4000, but after installing the V7900, the screen powered on and Windows 7 started booting before I had a chance to see the BIOS screen. After installing the new Beta BIOS, the V7900 works as expected and I can see the BIOS screen during POST.

This is a note for ATI; please make sure your VGA driver install UI fits on a 640×480 display. When I swapped the Quadro 4000 for the V7900, and rebooted into Windows 7, I booted into a 640×480 16 color screen. Imagine my frustration trying to guess which button has focus when you can only see the top half of the ATI driver installer.

Windows 8 automatically installed drivers for the V7900.

The only driver Windows 8 did not automatically install is the C600 chipset SAS driver. I installed the Intel Rapid Storage Technology Enterprise (RSTe) drivers, and that solved that problem.

While running Windows 7 on this machine, and running the Windows Experience Index Assessment, the test would always crash. The same test in Windows 8 completed successfully.

Win8.EI

I found the 2D and 3D results to be disappointing, and I tried to replace the “ATI FirePro V (FireGL V) Graphics Adapter (Microsoft Corporation – WDDM v1.20)” driver with the ATI Windows 8 Consumer Preview driver. Although the release notes indicate that the V7900 is supported, the driver installation failed with an unsupported hardware error. I’ll have to wait for newer Windows 8 drivers from ATI to see if the test scores improve.

I’m quite happy that I can use my new machines with Windows 8.

I just wish SuperMicro solved the BIOS incompatibility problems long ago, after all, it has been almost two years since the Windows 8 pre-release program started, and almost a year since the release of the public developer preview.

Debugging Windows 8 Install BSOD

In my last post I described how to prevent Windows from automatically restarting when encountering a BSOD during the OS install process. This allowed me to see the  ACPI_BIOS_ERROR fault code while installing Windows 8 on my new SuperMicro workstation. The new Windows 8 BSOD page looks friendly, but no longer displays any error parameters other than the main fault code.

In order to get additional details of the crash, I had to hook up a kernel debugger to the machine. Windows 8 adds USB3 and TCPIP kernel debug support, and I will describe how I used the TCPIP network option to capture details of the crash.

 

First thing to do is prepare our tools, download the Windows 8 Debugging Tools for Windows package, and the Windows 8 Symbols.

Unfortunately the debugging tools are no longer available as a standalone download, and you need to install the SDK or WDK on a Windows 8 system in order to get them, but you can choose to only install the debugging tools. Once you installed the debugging tools on one machine, you can copy the MSI installers or the directory to any other machines, including Windows 7 systems. You will find the tools in the “C:\Program Files (x86)\Windows Kits\8.0\Debuggers” folder.

Microsoft is pretty good at publishing symbols for most released versions of their products to their public symbol server, but I prefer to extract the symbols to a working directory on my machine, or to upload the symbols to our internal symbol server. You can install the downloaded symbols MSI package directly, or use the following command to extract the symbols from the MSI file to a location on disk. Run an elevated (right click run as administrator) command prompt, and type:

msiexec /a [symbol msi file name] /qb targetdir="[output directory]"

 

Next we need to enable kernel network debugging in the BCD options. This needs to be done on a Windows 8 machine as the network debugging command is not supported in older versions of BCDEdit. I should also call out that network debugging support is required for hardware logo certification, but not all current adapters support it. Insert the bootable Windows 8 USB key, run an elevated command prompt, and type:

bcdedit –store [usb key drive]:\boot\bcd /dbgsettings net hostip:[IP of WinDbg machine] port:50000

BCDEdit will output the connection security key that is required by WinDbg.

 

Start WinDbg, and enable network kernel debugging, entering the port number and security key.

WinDbg.Network

 

Boot the target machine, you will see the target machine connecting to WinDbg:

Microsoft (R) Windows Debugger Version 6.2.8400.0 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Using NET for debugging
Opened WinSock 2.0
Waiting to reconnect...
Connected to target 192.168.1.106 on port 50000 on local IP 192.168.1.100.
Connected to Windows 8 8400 x64 target at (Fri Jul 20 11:07:21.583 2012 (UTC - 7:00)), ptr64 TRUE
Kernel Debugger connection established.

And then the ACPI_BIOS_ERROR crash:

25: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

ACPI_BIOS_ERROR (a5)
The ACPI Bios in the system is not fully compliant with the ACPI specification.
The first value indicates where the incompatibility lies:
This bug check covers a great variety of ACPI problems.  If a kernel debugger
is attached, use "!analyze -v".  This command will analyze the precise problem,
and display whatever information is most useful for debugging the specific
error.
Arguments:
Arg1: 0000000000000003, ACPI_FAILED_MUST_SUCCEED_METHOD
    ACPI tried to run a control method while creating device extensions
    to represent the ACPI namespace, but this control method failed.
Arg2: fffffa8019f2f288, The ACPI Object that was being run
Arg3: ffffffffc0000034, return value from the interpreter
Arg4: 00000000494e495f, Name of the control method (in ULONG format)

Debugging Details:
------------------

ACPI_OBJECT:  fffffa8019f2f288

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  0xA5

PROCESS_NAME:  System

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from fffff803ca1e617a to fffff803ca0e5870

STACK_TEXT: 
fffff880`053eb418 fffff803`ca1e617a : 00000000`00000000 00000000`000000a5 fffff880`053eb580 fffff803`ca16b930 : nt!DbgBreakPointWithStatus
fffff880`053eb420 fffff803`ca1e57d2 : 00000000`00000003 00000000`494e495f fffff803`ca168810 00000000`000000a5 : nt!KiBugCheckDebugBreak+0x12
fffff880`053eb480 fffff803`ca0eb044 : 00000000`c0000034 fffff880`01038255 fffffa80`1a50fe78 00000000`c0000034 : nt!KeBugCheck2+0x79f
fffff880`053ebba0 fffff880`01043949 : 00000000`000000a5 00000000`00000003 fffffa80`19f2f288 ffffffff`c0000034 : nt!KeBugCheckEx+0x104
fffff880`053ebbe0 fffff880`0103bded : 00000000`00000000 00000000`00000000 00000000`00008004 00000000`c0000034 : ACPI!ACPIBuildCompleteMustSucceed+0x39
fffff880`053ebc20 fffff880`010346bd : fffffa80`1a500000 00000000`00008000 00000000`00000000 fffffa80`37e80000 : ACPI!AsyncCallBack+0x7f
fffff880`053ebc50 fffff880`01034f56 : fffffa80`1a500000 fffff880`01072be0 00000000`00000000 00000000`00000002 : ACPI!RunContext+0x141
fffff880`053ebc90 fffff880`010386e3 : fffffa80`19b1c3a0 00000000`00000000 00000000`00000000 fffffa80`19a35258 : ACPI!InsertReadyQueue+0xd6
fffff880`053ebcc0 fffff880`0103862a : fffff803`ca2eb490 fffff880`01072be0 00000000`00000000 00000000`546c6d41 : ACPI!RestartCtxtPassive+0x2f
fffff880`053ebcf0 fffff803`ca0cb181 : fffffa80`19e06b00 00000000`00000080 fffff880`04ac6540 00000000`00000000 : ACPI!ACPIWorkerThread+0xea
fffff880`053ebd50 fffff803`ca0dae26 : fffff880`04aba180 fffffa80`19e06b00 fffff880`04ac6540 fffffa80`19a8f940 : nt!PspSystemThreadStartup+0x59
fffff880`053ebda0 00000000`00000000 : fffff880`053ec000 fffff880`053e6000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND:  kb

FOLLOWUP_IP:
ACPI!ACPIBuildCompleteMustSucceed+39
fffff880`01043949 cc              int     3

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  ACPI!ACPIBuildCompleteMustSucceed+39

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: ACPI

IMAGE_NAME:  ACPI.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4fe6a2b1

BUCKET_ID_FUNC_OFFSET:  39

FAILURE_BUCKET_ID:  0xA5_ACPI!ACPIBuildCompleteMustSucceed

BUCKET_ID:  0xA5_ACPI!ACPIBuildCompleteMustSucceed

Followup: MachineOwner

 

Even with all the crash details, it still doesn’t really help me make progress, as it has been two days since I logged the support request with SuperMicro, and no response yet.

Windows 8 and Server 2012 on SuperMicro results in ACPI_BIOS_ERROR BSOD

I ran out of disk space on my development workstation, all those VM images add up. The machine has four drive bays, and all four have 3TB drives. I can replace the 3TB drives with 4TB drives, but migrating the RAID5 array will be time consuming and risky. I can add an external SAS storage enclosure, but they do not power down when the machine goes to sleep. So I looked at buying a new machine with more drive bays.

I’ve been using DELL Precision Workstations for my development machines for many years, they are fast and very reliable. My current workstation is a T5500, and I specifically chose the T5500 over the T7600 because of its features to physical size ratio. The T7600 does offer five drive bays over the T5500’s four, but if I’m going to change machines, adding only one more drive is not really worth the cost and effort.

Rather than buying a pre-configured and tested machine, I opted for the more exciting, sometimes rewarding, often frustrating, option of building my own. In order not to spend too much time on the project, I opted to use a chassis and motherboard combo, and just add peripherals. I chose the SuperMicro SuperWorkstation 7047A-T, containing the X9DAi motherboard. I specifically picked this model because it has eight hot-swap drive bays, is low noise, has a high efficiency PSU, and supports dual Intel Xeon E5-2600 processors.

I used 32GB Kingston KVR1600D3D4R11SK4/32GI memory, two Xeon E5-2660 processors, and an NVidia Quadro 4000 graphic card.

I prepared a USB key with Windows 8 x64 Release Preview. Microsoft does provide a tool to convert ISO images to USB keys, but I’ve been doing this by hand since long before the tool existed, and it is really easy and ultimately quicker to update.

Mount the ISO install image as a virtual drive using Virtual CloneDrive. Launch an elevated (right click run as administrator) command prompt, and run:

diskpart

list disk
select disk [number]
clean
create partition primary
select partition 1
active
format fs=fat32 quick
assign
exit

robocopy [virtual cd drive]:\ [usb key drive]:\ /mir

Once the USB key has been properly formatted, you only have to repeat the robocopy steps for any new builds or bits you want to copy.

I booted from the USB key, black screen with spinning circle animation, blue screen of sad face death, and an immediate reboot.

The machine rebooted so quickly I didn’t get a chance to see what the error was.

I tried Windows Server 2012 RC, same problem. I tried later builds of Windows 8 and Server 2012 (we are part of the Windows 8 Pre-Release Program, I hope I can say that now, at some point I was not even allowed to say that, like the Fight Club rules).

I logged a support case with SuperMicro, and I posted on the Microsoft Windows Server support forum. No reply yet from SuperMicro, no useful reply yet from the forum.

I think it is really silly that the default configuration of Windows is set to automatically reboot after a BSOD, even more so for an install situation. BSOD’s are serious, users and administrators need to know something terrible happened, even if they don’t immediately know what the error codes mean or what to do about it. I do know how to change the reboot option from inside windows, but I don’t know how to change it in the installer.

I was looking for a BCD option to disable auto-reboot, and after quite a bit of searching, I found a BcdOSLoaderBoolean_DisableCrashAutoReboot WMI BCD option on MSDN. After some more searching I found a NOCRASHAUTOREBOOT BCDEdit option.

That was really unusually difficult to find. Try it yourself, search for “nocrashautoreboot” and restrict the results to microsoft.com, there was only one hit on a Microsoft site, in a Word DOC file. Try the search on the rest of the web, and you get more hits.

Now that I knew what option to set, the rest was pretty easy. Insert the bootable USB key back in a working machine, open an elevated command prompt, and set the BCD option:

attrib -r [usb key drive]:\boot\bcd
bcdedit -store [usb key drive]:\boot\bcd -set {default} nocrashautoreboot yes

Start the install again, wait for the crash, and this time we can see the error is ACPI_BIOS_ERROR:

ACPI_BIOS_ERROR

There are many reports on the web about ACPI_BIOS_ERROR and Windows 8, most resolved by updating the BIOS, but also several reports of this error with SuperMicro motherboards, and unfortunately it seems without a positive resolution.

To make sure the problem was not peripheral or hardware related, I also installed Windows 7 and Windows Server 2008 R2, both installed and ran ok.

I use a KVM switch, and as I switched back to the machine while it was applying Windows Updates, there was some screen corruption that went away after the reboot. I updated the NVidia driver and the problem has not resurfaced, this may be a driver issue, or it may be a hardware issue:

NVIDIA

I am very disappointed that my brand new machine can only run Windows 7 and not Windows 8. I have yet to hear from SuperMicro support, but I hope they can resolve the problem with a BIOS update before Windows 8 and Windows Server 2012 is released in August.