Windows 8 VIDEO_TDR_FAILURE Madness

I finally figured out why I kept on getting VIDEO_TDR_FAILURE BSOD’s when installing Windows 8 on my SuperMicro workstations. It turns out that the problem goes away when I use a PCIe slot associated with CPU #1, instead of a slot associated with CPU #2.

Some history on my adventures with Windows 8 and SuperMicro SuperWorkstations:
I got ACPI_BIOS_ERROR BSOD’s while installing Windows 8, SuperMicro provided a Beta BIOS that resolved the problem.
The Windows 8 install hangs if installing to a SSD drive on a LSI 2308 SAS controller, that issue is still unresolved, but can be worked around by connecting the SSD to the Intel SATA controller.
I got VIDEO_TDR_ERROR BSOD’s while installing Windows 8 with a NVidia Quadro 5000 graphic card, same with an ATI FirePro V7900 or a NVidia GeForce GTX 680 or an ATI HD 7970. And this post is about resolving that problem.

 

SuperMicro released v1.0a BIOS updates for the X9DAi and X9DA7 motherboards used in the 7470A-T and 7470A-73 SuperWorkstations. I was hoping this will resolve the VIDEO_TDR_FAILURE BSOD’s, but no.

The X9DA7 BIOS updated without issue, but the X9DAi update reported an error at the end of the update process; “Error when sending Enable Message to ME”.

I contacted SuperMicro support, and they asked me to make sure that there is no jumper on JPME1. There is no mention of JPME1 in the motherboard manual, but it is located next to JIPMB1, next to PCIe slot #1. The header had a jumper on pins 2 and 3, where the same header on the X9DA7 motherboard had a jumper between 1 and 2. I removed the jumper, and the BIOS update succeeded.

JPME1

 

Unlike the ACPI_BIOS_ERROR BSOD that happens during the WinPE phase of the install, the VIDEO_TDR_FAILURE BSOD happens on the first boot after the install, during the hardware detection and driver install phase. This means that the technique I used to kernel debug the initial boot phase will not work, as the second boot is using the BCD already deployed to the target hard drive. I had to modify the BCD of the already installed image, prior to the install continuing after the reboot.

 

I tested many permutations of graphic cards and configurations, and it quickly became very annoying to have to type my Win8 product key every single time I boot and install. To avoid this I created configuration files in the sources directory on the install media, and this bypassed the key question. You can read more about the meaning of the file contents here:

EI.cfg:

[EditionID]
Professional
[Channel]
Retail
[VL]
0

PID.txt:

[PID]
Value=XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

 

To modify the BCD of the installed image, and be able to easily repeat the second phase of install testing, I installed a second hard drive, and deployed WinPE to the second drive. By using F11 during boot to choose the boot drive, I could select booting from the second drive at any time.

 

I have a variety WinPE v3 (Win7) based utility images, and I updated them to use WinPE v4 (Win8). In the process I lost the boot menu, and the first image in the menu automatically started booting. After some trial and error, I found the bootmenupolicy BCD option, and when set to legacy mode, the old style menu is back:

bcdedit /set {default} bootmenupolicy legacy

 

I installed Win8 on the primary drive, and during the reboot, instead of booting to the installed Win8 drive, I used F11 and booted to my secondary WinPE drive. From WinPE I modified the boot BCD to enable kernel debugging over the network:

bcdedit -store c:\boot\bcd /set {default} nocrashautoreboot yes
bcdedit -store c:\boot\bcd /set {default} debugtype net
bcdedit -store c:\boot\bcd /set {default} hostip 3232235876
bcdedit -store c:\boot\bcd /set {default} port 50000
bcdedit -store c:\boot\bcd /set {default} key my.secret.debug.key
bcdedit -store c:\boot\bcd /debug {default} yes

This is equivalent to:

bcdedit /dbgsettings net host:192.168.1.100 port:50000 key:my.secret.debug.key

But unlike the dbgsettings command, this allows me to specify a BCD store. Also note that the IP address is stored as a single numeric value instead of the dotted IP format.

 

While still in WinPE, I captured the state of the primary Win8 drive by making a drive image using Symantec Ghost, the real Ghost, currently sold as Symantec Ghost Solution Suite, not the same named but volume snapshot based Norton Ghost or Symantec System Recovery. By saving a drive image, I can easily change hardware or configurations, test the install starting at the second phase, reboot to the secondary WinPE drive using F11, restore the entire drive image, and try again, while leaving the kernel debug options intact.

 

I tested with following hardware configurations in various permutations:

 

With the kernel debugger attached, I captured the following crash details in WinDbg for NVidia based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa80211cd010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800782d0d8, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000002, Optional internal context dependent data.

Debugging Details:
------------------

FAULTING_IP:
nvlddmkm+1ae0d8
fffff880`0782d0d8 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`12c76078 fffff801`66fef0ea : 00000000`00000000 00000000`00000116 fffff880`12c761e0 fffff801`66f734b8 : nt!DbgBreakPointWithStatus
fffff880`12c76080 fffff801`66fee742 : 00000000`00000003 fffff880`12c761e0 fffff801`66f73e90 00000000`00000116 : nt!KiBugCheckDebugBreak+0x12
fffff880`12c760e0 fffff801`66ef4144 : fffffa80`2094b100 fffff880`021ee9c0 fffffa80`1f54e400 00000000`00000000 : nt!KeBugCheck2+0x79f
fffff880`12c76800 fffff880`04b33dcb : 00000000`00000116 fffffa80`211cd010 fffff880`0782d0d8 00000000`00000000 : nt!KeBugCheckEx+0x104
fffff880`12c76840 fffff880`04b32518 : fffff880`0782d0d8 fffffa80`211cd010 fffff880`12c76949 00000000`000000c7 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff880`12c76880 fffff880`04a1e608 : fffffa80`211cd010 fffff880`12c76949 00000000`00000000 00000000`00000002 : dxgkrnl!TdrIsRecoveryRequired+0x168
fffff880`12c768b0 fffff880`04a4d539 : 00000000`00000000 fffff780`00000320 00000000`00000000 fffffa80`1f54e400 : dxgmms1!VidSchiReportHwHang+0x438
fffff880`12c769b0 fffff880`04a4ba49 : fffffa80`00000002 fffffa80`1f54e400 fffffa80`1f54e840 fffffa80`1f54e840 : dxgmms1!VidSchiCheckHwProgress+0xe5
fffff880`12c76a00 fffff880`04a16fe5 : ffffffff`ff676980 00000000`00000001 fffff880`12c76b69 fffffa80`1f54e400 : dxgmms1!VidSchiWaitForSchedulerEvents+0x20d
fffff880`12c76aa0 fffff880`04a4b646 : 00000000`00000000 00000000`0000000f fffffa80`1f54e400 fffffa80`1f54e400 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff880`12c76bd0 fffff801`66e9b521 : fffffa80`1f5abb00 fffffa80`1f54e400 fffff880`03b01140 00000000`06a21e1e : dxgmms1!VidSchiWorkerThread+0xca
fffff880`12c76c10 fffff801`66ed9dd6 : fffff880`03af5180 fffffa80`1f5abb00 fffff880`03b01140 fffffa80`19aac040 : nt!PspSystemThreadStartup+0x59
fffff880`12c76c60 00000000`00000000 : fffff880`12c77000 fffff880`12c71000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
nvlddmkm+1ae0d8
fffff880`0782d0d8 4055 push rbp

SYMBOL_NAME: nvlddmkm+1ae0d8

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf93d7

FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

 

With the kernel debugger attached, I captured the following crash details in WinDbg for ATI based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa801ed114d0, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800725cefc, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 000000000000000d, Optional internal context dependent data.

Debugging Details:
------------------

FAULTING_IP:
atikmpag+8efc
fffff880`0725cefc 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`06fa9ee8 fffff803`e6ff20ea : 00000000`00000000 00000000`00000116 fffff880`06faa050 fffff803`e6f764b8 : nt!DbgBreakPointWithStatus
fffff880`06fa9ef0 fffff803`e6ff1742 : 00000000`00000003 fffff880`06faa050 fffff803`e6f76e90 00000000`00000116 : nt!KiBugCheckDebugBreak+0x12
fffff880`06fa9f50 fffff803`e6ef7144 : fffffa80`1e2df4e0 fffff880`020b99c0 fffffa80`1d31f010 00000000`00000000 : nt!KeBugCheck2+0x79f
fffff880`06faa670 fffff880`04d31dcb : 00000000`00000116 fffffa80`1ed114d0 fffff880`0725cefc 00000000`00000000 : nt!KeBugCheckEx+0x104
fffff880`06faa6b0 fffff880`04d30548 : fffff880`0725cefc fffffa80`1ed114d0 fffff880`06faa7b9 00000000`00000180 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff880`06faa6f0 fffff880`04c11608 : fffffa80`1ed114d0 fffff880`06faa7b9 00000000`0000000f fffffa80`1d31f8f8 : dxgkrnl!TdrIsRecoveryRequired+0x198
fffff880`06faa720 fffff880`04c459f9 : 00000000`00000001 fffff880`06faa8a0 fffff880`06faa920 00000000`00000000 : dxgmms1!VidSchiReportHwHang+0x438
fffff880`06faa820 fffff880`04c3ff72 : fffffa80`1d31f010 fffff780`00000320 fffffa80`1d31f770 fffffa80`1d31f010 : dxgmms1!VidSchWaitForCompletionEvent+0x411
fffff880`06faa8e0 fffff880`04c4206c : fffffa80`1d31f010 fffffa80`1d31f450 fffffa80`1d31f450 00000000`00000000 : dxgmms1!VidSchiWaitForEmptyHwQueue+0x9a
fffff880`06faa9d0 fffff880`04c3ea85 : 00000000`00000000 fffffa80`1d31f010 fffffa80`1d31f450 00000000`00000000 : dxgmms1!VidSchiSuspend+0x74
fffff880`06faaa00 fffff880`04c09fe5 : ffffffff`ff676980 00000000`00000001 fffff880`06faab69 fffffa80`1d31f010 : dxgmms1!VidSchiWaitForSchedulerEvents+0x249
fffff880`06faaaa0 fffff880`04c3e646 : 00000000`00000000 fffffa80`1d585660 fffffa80`1d44d7f0 fffffa80`1d31f010 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff880`06faabd0 fffff803`e6e9e521 : fffffa80`1d6b9b00 fffffa80`1d31f010 fffff880`03932140 00000000`04d91ecb : dxgmms1!VidSchiWorkerThread+0xca
fffff880`06faac10 fffff803`e6edcdd6 : fffff880`03926180 fffffa80`1d6b9b00 fffff880`03932140 fffffa80`19ac7500 : nt!PspSystemThreadStartup+0x59
fffff880`06faac60 00000000`00000000 : fffff880`06fab000 fffff880`06fa5000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
atikmpag+8efc
fffff880`0725cefc 4055 push rbp

SYMBOL_NAME: atikmpag+8efc

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: atikmpag

IMAGE_NAME: atikmpag.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf9279

FAILURE_BUCKET_ID: 0x116_IMAGE_atikmpag.sys

BUCKET_ID: 0x116_IMAGE_atikmpag.sys

 

This was not really helping me much, and I decided to repeat the tests but use the checked build of Windows 8 to help troubleshoot.

With the kernel debugger attached, I captured the following ASSERT during the boot:

Windows 8 Kernel Version 9200 MP (1 procs) Checked x64
Built by: 9200.16384.amd64chk.win8_rtm.120725-1247
Machine Name:
Kernel base = 0xfffff802`0e01d000 PsLoadedModuleList = 0xfffff802`0e760ac0
System Uptime: 0 days 0:00:06.228 (checked kernels begin at 49 days)
Assertion: The BIOS has reported inconsistent resources (_CRS). Please upgrade your BIOS.
ACPI!PnpBiosGetDeviceResourceList+0x15e:
fffff880`012c3c2a cd2c int 2Ch
...
Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

PROCESS_NAME: System

FAULTING_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880`012c3c2a cd2c int 2Ch

ERROR_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

EXCEPTION_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: 0x0

CURRENT_IRQL: 0

LOCK_ADDRESS: fffff8020e7c5d60 -- (!locks fffff8020e7c5d60)

Resource @ nt!PiEngineLock (0xfffff8020e7c5d60) Exclusively owned
Threads: fffffa8019a36040-01<*>
1 total locks, 1 locks currently held

PNP_TRIAGE:
Lock address : 0xfffff8020e7c5d60
Thread Count : 1
Thread address: 0xfffffa8019a36040
Thread wait : 0x105eccd4

LAST_CONTROL_TRANSFER: from fffff880012b736f to fffff880012c3c2a

STACK_TEXT:
fffff880`009b4b30 fffff880`012b736f : fffffa80`23a9e900 fffff880`012a7e01 fffff880`009b4c08 fffff880`012a7e70 : ACPI!PnpBiosGetDeviceResourceList+0x15e
fffff880`009b4bd0 fffff880`0125acba : fffffa80`23a9e900 fffffa80`19ac54c0 fffff880`012a7e70 fffffa80`1f477010 : ACPI!ACPIBusIrpQueryResourceRequirements+0x8b
fffff880`009b4c50 fffff802`0e91b6a4 : fffffa80`23a9e900 fffffa80`19ac54c0 fffff880`009b4db0 fffffa80`23a9e900 : ACPI!ACPIDispatchIrp+0x2a6
fffff880`009b4cf0 fffff802`0e91cd1b : fffffa80`23a9e900 fffff880`009b4db0 00000001`c00000bb 00000000`00000000 : nt!IopSynchronousCall+0x10c
fffff880`009b4d80 fffff802`0e915bdb : fffffa80`23a9e900 fffff880`009b4e50 fffffa80`23a4f850 00000000`0000001e : nt!PpIrpQueryResourceRequirements+0x5f
fffff880`009b4e10 fffff802`0e91748d : fffffa80`23a9b8e0 00000000`00000000 ffffffff`80000218 fffffa80`23a9b8e0 : nt!PiQueryResourceRequirements+0x47
fffff880`009b4ea0 fffff802`0e91a1f2 : fffffa80`23a9b8e0 fffffa80`23a9b8e0 00000000`00000001 00000000`00000000 : nt!PiProcessNewDeviceNode+0x159d
fffff880`009b5070 fffff802`0e08feb5 : fffffa80`19adcd20 00000000`00000000 fffff880`009b5358 00000000`00000000 : nt!PipProcessDevNodeTree+0x1fe
fffff880`009b5310 fffff802`0e08fb59 : 00000000`00000000 00000000`00000000 00000000`00000000 fffffa80`37e19cc0 : nt!PnpDeviceActionWorker+0x345
fffff880`009b53d0 fffff802`0ed4010d : 00000000`00000000 fffff8a0`00000007 fffff8a0`00f08c00 00000000`00000000 : nt!PnpRequestDeviceAction+0x2ed
fffff880`009b5420 fffff802`0ed3b39d : fffff802`0d536800 fffff802`0e7c83c0 00000000`00000006 fffff802`0d536800 : nt!IopInitializeBootDrivers+0x905
fffff880`009b5650 fffff802`0ed2deb5 : fffff802`0d536800 00000000`00000000 fffff802`0d536800 fffff802`0d51ebf0 : nt!IoInitSystem+0xb5d
fffff880`009b59b0 fffff802`0e82d013 : fffff802`0d536800 fffffa80`19a36040 00000000`00000000 fffffa80`19ab3040 : nt!Phase1InitializationDiscard+0x1899
fffff880`009b5bc0 fffff802`0e1b289e : fffff802`0d536800 fffff802`0d536800 00000000`00000000 00000000`00000000 : nt!Phase1Initialization+0x13
fffff880`009b5bf0 fffff802`0e24ef96 : fffff802`0e82d000 fffff802`0d536800 fffff802`0e6c6180 00000000`f8ffffff : nt!PspSystemThreadStartup+0x1a2
fffff880`009b5c60 00000000`00000000 : fffff880`009b6000 fffff880`009b0000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880`012c3c2a cd2c int 2Ch

SYMBOL_STACK_INDEX: 0

SYMBOL_NAME: ACPI!PnpBiosGetDeviceResourceList+15e

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: ACPI

IMAGE_NAME: ACPI.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 50109dd0

BUCKET_ID_FUNC_OFFSET: 15e

FAILURE_BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

 

This is interesting, the kernel ASSERT’s on a problem reported by the BIOS.

I contacted SuperMicro support, they said they will investigate the BIOS failure, and they suggested I try to use PCIe slot #3 instead of slot #5. The motherboard manual mentions that slots #1, #2, and #3 are to be used if CPU #1 is installed, and slots #4, #5, and #6 to be used only if CPU #2 is installed.

PCIe

I have both processors installed, so not using the more conveniently located slot #5 never came to mind. I moved the graphic card to CPU #1 slot #3, and voila, install succeeded and Windows 8 was up and running!

 

I repeated the checked build test with the graphic card in slot #3, and the same BIOS ASSERT error was reported, so the BIOS ASSERT seems to be unrelated to the ACPI_TDR_FAILURE error.

 

This was a very frustrating problem, and I still don’t understand the root cause, but I am happy to be able to finally switch both workstations to Windows 8.

WordPress.com 404 With Blogger Permalinks

Part of the research I did before migrating from Blogger to WordPress.com, was to make sure that current Blogger permalinks will resolve correctly once the old posts were imported into WordPress.com. At the time all seemed fine, but soon after migrating, I received alerts from Google Webmaster Tools that there is an increase in site errors, specifically 404 errors.

Some background: Permalinks are the URL’s that point directly to specific posts on the blog. These URL’s are known by search engines, are shared on forums, and are basically the static address of posts. Blogger and WordPress.com use different styles of permalinks. WordPress.com allows some customization of permalinks, but unlike WordPress.org, there is no support for custom plugins to handle rewrites for permalinks, 302’s or 404’s.

Although not documented anywhere, WordPress.com does support Blogger style permalinks, and will correctly redirect the Blogger style link to the WordPress.com style page. As an example, see the links below, one for Blogger and one for WordPress.com:

http://blogdotinsanegenius.blogspot.com/2012/06/looks-can-be-deceiving.html
https://blogdotinsanegenius.wordpress.com/2012/06/looks-can-be-deceiving

Search engines will know the link using the old blogger style URL, and both styles of links will correctly resolve to the current page:

https://blog.insanegenius.com/2012/06/19/looks-can-be-deceiving
https://blog.insanegenius.com/2012/06/looks-can-be-deceiving.html

So why is it that Google Webmaster Tools reported a suddenly spike in 404’s?

Google.404.1

By reviewing the links that report 404, I noticed that the permalink format of certain posts on WordPress.com was slightly different to the Blogger permalinks.

http://blogdotinsanegenius.blogspot.com/2009/10/hitachi-a7k2000-and-seagate-barracude.html
http://blogdotinsanegenius.blogspot.com/2010/05/zotac-xboxhd-id11-mkv-h264-video.html
http://blogdotinsanegenius.blogspot.com/2008/03/printing-from-network.html

https://blogdotinsanegenius.wordpress.com/2009/10/11/hitachi-ultrastar-and-seagate-barracude-lp-2tb-drives/
https://blogdotinsanegenius.wordpress.com/2010/05/28/zotac-xboxhd-id11-mkv-h-264-video-playback-performance/
https://blogdotinsanegenius.wordpress.com/2008/03/30/printing-from-the-network/

Notice the difference? Blogger appears to keep links short, and remove words like “the” and “and”.

I contacted WordPress.com support, and they provided a manual solution. They suggested that I modify the “slug” of each 404 post to match the Blogger style permalink.

Slug

This resolved the problem with the top 404’s, but I would have expected the Blogger import plugin to take care of this for me.

But, I soon received another alert email from Google Webmaster Tools, and this time the 404 posts looked a bit different.

Google.404.2

Notice that all the links contain parameters in the URL (I think these are old style Google Analytics parameters), and without the parameter the redirect works, but with any parameters the redirect fails.

https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html
https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html?m=1

I again contacted WordPress.com support, and I am still awaiting a resolution.

[Update: 9 August 2012]
Just got an email from WordPress.com support, the problem with parameters is fixed, thank you.

Windows 8 Install Crash With NVidia Quadro 5000

I got Windows 8 RTM installed on my two SuperMicro SuperWorkstation machines, with a bit of trouble along the way, but nothing I could not work around. But, I ran into a problem with NVidia Quadro 5000 cards causing a VIDEO_TDR_FAILURE BSOD during the Windows 8 install process.

 

I was running my two workstations with ATI FirePro V7900 graphic cards, but I decided I wanted a bit more rendering horsepower. I wanted a card that had a good balance between modern architecture, great 2D performance, good 3D performance, OpenCL or CUDA support, and reasonable power consumption. I found the Tom’s Hardware Workstation Graphics 2012 benchmark site to be a very informative, and I decided that the NVidia Quadro 5000 was a very good choice.

I replaced my FirePro V7900 with the Quadro 5000, and started the Windows 8 x64 RTM install. All went well, until the first reboot during the install, and the machine would blue screen crash with a VIDEO_TDR_FAILURE. During the install process the hardware is identified, the appropriate drivers extracted, and on the reboot those drivers are started. It appears that soon after the NVidia driver loads, that it crashes.

 

The Timeout Detection and Recovery (TDR) feature was added to Windows Vista, and was a way for the OS to recover from a renderer failure without the need to restart the machine. Typically the user will see a notification that the graphic subsystem was restarted, but in cases where the restart fails, a VIDEO_TDR_FAILURE blue screen crash is generated.

The web is full of reports of NVidia VIDEO_TDR_FAILURE crashes, and solutions typically involve replacing the hardware or updating drivers. In my case I had two new machines, and two new graphic cards, and a brand new operating system, and both cards on both machines crashed.

I contacted SuperMicro support, and responsive as they always are, said they would investigate.

I also contacted PNY support, as PNY is the manufacturer of the NVidia Quadro 5000, here is their reply.

Again, I am sorry, but we do not list Windows 8 (yet) as being compatible with the Quadro 5000, or any other Quadro or Geforce card we manufacture. Until it is publically and commercially available, we cannot provide support for Windows 8. Windows 8 is not available to the end user yet, and it is in testing, as is the Nvidia driver. If you find issues, you must report them to Microsoft in order to improve compatibility in the final release. There is obviously a compatibility problem with Windows 8 and the Quadro 5000 right now (according to your testing of TWO cards), and unfortunately there is nothing we can do to fix it while in is not available to the public. My best advice is to try it again when it is officially released sometime in 2013.

Not very helpful at all, and their concept of Windows 8 release timing, and their responsibility, is way out there.

 

The real problem here is that it is the in-box NVidia drivers that are crashing, not drivers I install later. And as it is the in-box graphic drivers that crash, there is no (easy) way to update the drivers used by the Windows 8 install media.

 

I had previously used a Quadro 4000 card on the same machines, and they installed without incident, so it appears to be something unique the Quadro 5000 cards.

At this time I am waiting for SuperMicro to get back to me with suggestions, as I have little hope of hearing anything useful from PNY.

Windows 8 Install Hangs Booting From LSI 2308 SAS Controller

I’ve previously posted about problems installing Windows 8 on SuperMicro machines, and that SuperMicro released a Beta BIOS that solved the install problems. I’ve since run into two more problems; the install hanging when booting of the LSI 2038 SAS controller, and a BSOD when using a Quadro 5000 video card (more on that in a later post).

 

I have two SuperWorkstation machines, a 7047A-T using a X9DAi motherboard, and a 7047A-73 using a X9DA7 motherboard.

The X9DAi and X9DA7 both use the Intel C602 chipset. The X9DAi and X9DA7 both have 2 x SATA3 ports, 4 x SATA2 ports, and 4 x SAS / SATA2 ports. The X9DA7 has an additional LSI 2308 controller with 8 x SAS2 / SATA3 ports.

On the 7047A-T / X9DAi machine, the 8 x hot-swap drive trays are connected to the 2 x SATA3, 2 x SATA2, and 4 x SAS / SATA2 ports.

On the 7047A-73 / X9DA7 machine, the 8 x hot-swap drive trays are connected to the 8 x LSI 2308 ports.

SuperMicro support provided me with Beta BIOS’s for the X9DAi and X9DA7 motherboards, this resolved the ACPI_BIOS_ERROR, and allowed me to install Windows 8 RTM on these machines, or at least get past the BSOD while booting the install media.

 

I configured both machines with:

 

In the 7047A-T / X9DAi machine, I installed the SSD drive in slot-0 of the hot-swap trays, connected to SATA3 port-0. I installed Windows 8 x64 RTM without issue.

 

In the 7047A-73 / X9DA7 machine I installed the SSD drive in slot-0 of the hot-swap trays, connected to LSI2308 port-0. I installed Windows 8 x64 RTM, and the install hanged at 0% while copying files.

While in this state, I suspected the problem to be IO related, so I pressed Shift-F10 to open a console window, I ran diskpart, and diskpart hanged.

I downloaded the latest LSI 2308 drivers from the Supermicro FTP site. I ran the install again, this time I manually loaded the drivers instead of using the in-box drivers, same problem, hang at 0%.

LSI does not make drivers directly available for HBA chips, but the LSI SAS 9205-8e uses the LSI 2308, and I downloaded the drivers from LSI. They were the same version as the drivers available on the SuperMicro FTP site.

 

I contacted SuperMicro support, they suggested I install using the SATA3 port while they research the problem. Connecting the SSD drive to SATA3 port-0 installed fine.

 

I tested the same setup using Windows 7, and although Windows 7 did not include in-box drivers for the LSI 2308, after loading the drivers, Windows 7 installed fine with the SSD connected to LSI2308 port-0.

This probably indicates a Windows 8 compatibility problem with the LSI 2308 driver, or HBA firmware.

 

LSI HBA’s can be configured to run in Initiator Target (IT) or Integrated RAID (IR) mode. This can be changed by flashing with the appropriate IT or IR firmware. IT firmware is typically preferred where there is no need for hardware RAID and all disks will be in JBOD mode, e.g. for use with ZFS or Storage Spaces.

When you flash between IT and IR mode, you need to erase the firmware before re-flashing, i.e. you cannot simply flash one mode on top of another mode. On the SuperMicro motherboards, you also need to perform the flash operation from within the EFI shell, flashing from other environments will fail. You can follow these KB’s to help with the process; LSI 16266, SuperMicro 14368, and SuperMicro 14151. I would not recommend using SuperMicro 14368 method, as it wipes the entire firmware memory, and you will need to manually re-enter the SAS address. It is basically the difference between using “sas2flash -o -e 6” and “sas2flash -o -e 7”, see the SAS2flash reference guide for details.

SAS2Flash

 

The X9DA7 motherboard came with firmware version 13.0.0.56 for the LSI 2308, configured in IR mode. I updated the firmware using the firmware from the SuperMicro FTP site to 13.0.0.57 in IT mode.

The update process I followed was to boot into the EFI shell while having a USB drive attached containing the firmware update, the drive must contain the firmware, the boot BIOS, and the SAS2Flash.efi tool.

In the EFI shell run the “map” command to list the hardware and see which drive is the USB drive, mount that drive using “mount fs[drive number]:”, e.g. “mount fs1:”, then change to the directory to the USB drive using “fs1:”:

map
mount fs1:
fs1:

Then wipe the flash “sas2flash -o -e 6”, then program the new firmware and boot code “sas2flash -o -f [firmware file] -b [bootcode file]”, e.g. “sas2flsh -f 2308IT13.5FW -b mptsas2.rom”, then restart.:

sas2flash -o -e 6
sas2flsh -f 2308IT13.5FW -b mptsas2.rom
reset

Same problem, hang at 0%.

 

I again referred to the LSI site for updated firmware for the LSI 2308, and the LSI SAS 9205-8e and LSI SAS 9207-8e includes firmware P14 version 14.0.0.0, a major revision upgrade from version 13.0.0.57 from the SuperMicro site.

The P14 firmware packages does not include the EFI version of SAS2Flash, but a bit of search engine exploration showed it is still included in the P13 packages.

 

I am not quite brave enough to flash to this version yet, as a failed flash will require a hardware swap. I’ll continue running this machine with the SSD connected to the SATA3 port.

 

At this point I am waiting for SuperMicro support to get back to me with a solution, or confirming that I can flash the P14 firmware and see if that resolves the issue.