Windows 8 VIDEO_TDR_FAILURE Madness

I finally figured out why I kept getting VIDEO_TDR_FAILURE BSODs when installing Windows 8 on my SuperMicro workstations. The problem goes away when I use a PCIe slot associated with CPU #1 instead of a slot associated with CPU #2.

Some history on my adventures with Windows 8 and SuperMicro SuperWorkstations:
I got ACPI_BIOS_ERROR BSODs while installing Windows 8; SuperMicro provided a Beta BIOS that resolved the problem.
The Windows 8 install hangs when installing to an SSD on an LSI 2308 SAS controller. That issue is still unresolved, but can be worked around by connecting the SSD to the Intel SATA controller.
I got VIDEO_TDR_FAILURE BSODs while installing Windows 8 with an NVidia Quadro 5000 graphic card, and the same with an ATI FirePro V7900, an NVidia GeForce GTX 680, or an ATI HD 7970. This post is about resolving that problem.

SuperMicro released v1.0a BIOS updates for the X9DAi and X9DA7 motherboards used in the 7047A-T and 7047A-73 SuperWorkstations. I was hoping this would resolve the VIDEO_TDR_FAILURE BSODs, but it did not.

The X9DA7 BIOS updated without issue, but the X9DAi update reported an error at the end of the update process: “Error when sending Enable Message to ME”.

I contacted SuperMicro support, and they asked me to make sure that there is no jumper on JPME1. There is no mention of JPME1 in the motherboard manual, but it is located next to JIPMB1, next to PCIe slot #1. The header had a jumper on pins 2 and 3, where the same header on the X9DA7 motherboard had a jumper between 1 and 2. I removed the jumper, and the BIOS update succeeded.

Unlike the ACPI_BIOS_ERROR BSOD that happens during the WinPE phase of the install, the VIDEO_TDR_FAILURE BSOD happens on the first boot after the install, during the hardware detection and driver install phase. This means that the technique I used to kernel debug the initial boot phase will not work, as the second boot is using the BCD already deployed to the target hard drive. I had to modify the BCD of the already installed image, prior to the install continuing after the reboot.

I tested many permutations of graphic cards and configurations, and it quickly became very annoying to have to type my Win8 product key every time I booted and installed. To avoid this I created two configuration files in the sources directory on the install media, which bypassed the product key prompt:

EI.cfg:

[EditionID]
Professional
[Channel]
Retail
[VL]
0

PID.txt:

[PID]
Value=XXXXX-XXXXX-XXXXX-XXXXX-XXXXX
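
As a minimal sketch, the two files above could also be generated by a short script instead of being typed by hand each time the install media is rebuilt. The function name, the use of Python, and the directory handling are assumptions for illustration only; the file names and contents are the ones shown above.

```python
from pathlib import Path
import tempfile

# File contents exactly as listed in the post.
EI_CFG = "[EditionID]\nProfessional\n[Channel]\nRetail\n[VL]\n0\n"
PID_TXT_TEMPLATE = "[PID]\nValue={key}\n"


def write_setup_answer_files(sources_dir, product_key):
    """Write EI.cfg and PID.txt into sources_dir and return both paths.

    Hypothetical helper: sources_dir is assumed to be the "sources"
    directory of a writable copy of the install media.
    """
    sources = Path(sources_dir)
    sources.mkdir(parents=True, exist_ok=True)

    ei_path = sources / "EI.cfg"
    pid_path = sources / "PID.txt"

    ei_path.write_text(EI_CFG)
    pid_path.write_text(PID_TXT_TEMPLATE.format(key=product_key))
    return ei_path, pid_path


if __name__ == "__main__":
    # Demonstrate against a temporary directory rather than real media.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_setup_answer_files(tmp, "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"):
            print(path.name)
            print(path.read_text())
```

The placeholder key is left as-is; substitute a real key only on media you control.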

To modify the BCD of the installed image, and be able to easily repeat the second phase of install testing, I installed a second hard drive, and deployed WinPE to the second drive. By using F11 during boot to choose the boot drive, I could select booting from the second drive at any time.

I have a variety of WinPE v3 (Win7) based utility images, and I updated them to use WinPE v4 (Win8). In the process I lost the boot menu, and the first image in the menu automatically started booting. After some trial and error I found the bootmenupolicy BCD option; when it is set to legacy, the old style menu is back:

bcdedit /set {default} bootmenupolicy legacy

I installed Win8 on the primary drive, and during the reboot, instead of booting to the installed Win8 drive, I used F11 and booted to my secondary WinPE drive. From WinPE I modified the boot BCD to enable kernel debugging over the network:

bcdedit -store c:\boot\bcd /set {default} nocrashautoreboot yes
bcdedit -store c:\boot\bcd /set {default} debugtype net
bcdedit -store c:\boot\bcd /set {default} hostip 3232235876
bcdedit -store c:\boot\bcd /set {default} port 50000
bcdedit -store c:\boot\bcd /set {default} key my.secret.debug.key
bcdedit -store c:\boot\bcd /debug {default} yes

This is equivalent to:

bcdedit /dbgsettings net hostip:192.168.1.100 port:50000 key:my.secret.debug.key

But unlike the dbgsettings command, this allows me to specify a BCD store. Also note that the IP address is stored as a single numeric value instead of the dotted IP format.
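
The numeric hostip value is simply the four octets of the IPv4 address packed into one 32-bit integer. A small sketch of the conversion in both directions, using Python's standard ipaddress module (the function names are my own, chosen for illustration):

```python
import ipaddress


def bcd_hostip(dotted):
    """Convert a dotted IPv4 address to the single integer form."""
    return int(ipaddress.IPv4Address(dotted))


def dotted_from_hostip(value):
    """Convert the integer form back to dotted notation."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    # 192*2**24 + 168*2**16 + 1*2**8 + 100
    print(bcd_hostip("192.168.1.100"))     # 3232235876
    print(dotted_from_hostip(3232235876))  # 192.168.1.100
```

This makes it easy to double check the number before writing it into the store.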

While still in WinPE, I captured the state of the primary Win8 drive by making a drive image using Symantec Ghost. This is the real Ghost, currently sold as Symantec Ghost Solution Suite, not the same-named but volume-snapshot-based Norton Ghost or Symantec System Recovery. With a saved drive image, I can easily change hardware or configurations, test the install starting at the second phase, reboot to the secondary WinPE drive using F11, restore the entire drive image, and try again, all while leaving the kernel debug options intact.

I tested various permutations of hardware configurations with the graphic cards listed above.

With the kernel debugger attached, I captured the following crash details in WinDbg for NVidia based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa80211cd010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800782d0d8, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000002, Optional internal context dependent data.

Debugging Details:
------------------

FAULTING_IP:
nvlddmkm+1ae0d8
fffff880`0782d0d8 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`12c76078 fffff801`66fef0ea : 00000000`00000000 00000000`00000116 fffff880`12c761e0 fffff801`66f734b8 : nt!DbgBreakPointWithStatus
fffff880`12c76080 fffff801`66fee742 : 00000000`00000003 fffff880`12c761e0 fffff801`66f73e90 00000000`00000116 : nt!KiBugCheckDebugBreak+0x12
fffff880`12c760e0 fffff801`66ef4144 : fffffa80`2094b100 fffff880`021ee9c0 fffffa80`1f54e400 00000000`00000000 : nt!KeBugCheck2+0x79f
fffff880`12c76800 fffff880`04b33dcb : 00000000`00000116 fffffa80`211cd010 fffff880`0782d0d8 00000000`00000000 : nt!KeBugCheckEx+0x104
fffff880`12c76840 fffff880`04b32518 : fffff880`0782d0d8 fffffa80`211cd010 fffff880`12c76949 00000000`000000c7 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff880`12c76880 fffff880`04a1e608 : fffffa80`211cd010 fffff880`12c76949 00000000`00000000 00000000`00000002 : dxgkrnl!TdrIsRecoveryRequired+0x168
fffff880`12c768b0 fffff880`04a4d539 : 00000000`00000000 fffff780`00000320 00000000`00000000 fffffa80`1f54e400 : dxgmms1!VidSchiReportHwHang+0x438
fffff880`12c769b0 fffff880`04a4ba49 : fffffa80`00000002 fffffa80`1f54e400 fffffa80`1f54e840 fffffa80`1f54e840 : dxgmms1!VidSchiCheckHwProgress+0xe5
fffff880`12c76a00 fffff880`04a16fe5 : ffffffff`ff676980 00000000`00000001 fffff880`12c76b69 fffffa80`1f54e400 : dxgmms1!VidSchiWaitForSchedulerEvents+0x20d
fffff880`12c76aa0 fffff880`04a4b646 : 00000000`00000000 00000000`0000000f fffffa80`1f54e400 fffffa80`1f54e400 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff880`12c76bd0 fffff801`66e9b521 : fffffa80`1f5abb00 fffffa80`1f54e400 fffff880`03b01140 00000000`06a21e1e : dxgmms1!VidSchiWorkerThread+0xca
fffff880`12c76c10 fffff801`66ed9dd6 : fffff880`03af5180 fffffa80`1f5abb00 fffff880`03b01140 fffffa80`19aac040 : nt!PspSystemThreadStartup+0x59
fffff880`12c76c60 00000000`00000000 : fffff880`12c77000 fffff880`12c71000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
nvlddmkm+1ae0d8
fffff880`0782d0d8 4055 push rbp

SYMBOL_NAME: nvlddmkm+1ae0d8

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf93d7

FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys
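
Without driver symbols, WinDbg reports the faulting location as module+offset. As a small sketch, the module load address can be recovered by subtracting the offset from the faulting address; a page-aligned result is a quick sanity check that the two values belong together. The helper names are hypothetical, and the inputs are copied from the FAULTING_IP lines in this post.

```python
def parse_windbg_address(text):
    """Parse a 64-bit address printed by WinDbg, e.g. fffff880`0782d0d8."""
    return int(text.replace("`", ""), 16)


def module_base(faulting_address, symbol):
    """Return (module name, load address) for a 'module+offset' symbol."""
    module, offset = symbol.split("+")
    return module, parse_windbg_address(faulting_address) - int(offset, 16)


if __name__ == "__main__":
    samples = (
        ("fffff880`0782d0d8", "nvlddmkm+1ae0d8"),
        ("fffff880`0725cefc", "atikmpag+8efc"),
    )
    for address, symbol in samples:
        name, base = module_base(address, symbol)
        # A module load address should be aligned to a 4 KB page.
        print(name, format(base, "016x"), "page aligned:", base % 0x1000 == 0)
```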

With the kernel debugger attached, I captured the following crash details in WinDbg for ATI based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa801ed114d0, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800725cefc, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 000000000000000d, Optional internal context dependent data.

Debugging Details:
------------------

FAULTING_IP:
atikmpag+8efc
fffff880`0725cefc 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`06fa9ee8 fffff803`e6ff20ea : 00000000`00000000 00000000`00000116 fffff880`06faa050 fffff803`e6f764b8 : nt!DbgBreakPointWithStatus
fffff880`06fa9ef0 fffff803`e6ff1742 : 00000000`00000003 fffff880`06faa050 fffff803`e6f76e90 00000000`00000116 : nt!KiBugCheckDebugBreak+0x12
fffff880`06fa9f50 fffff803`e6ef7144 : fffffa80`1e2df4e0 fffff880`020b99c0 fffffa80`1d31f010 00000000`00000000 : nt!KeBugCheck2+0x79f
fffff880`06faa670 fffff880`04d31dcb : 00000000`00000116 fffffa80`1ed114d0 fffff880`0725cefc 00000000`00000000 : nt!KeBugCheckEx+0x104
fffff880`06faa6b0 fffff880`04d30548 : fffff880`0725cefc fffffa80`1ed114d0 fffff880`06faa7b9 00000000`00000180 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff880`06faa6f0 fffff880`04c11608 : fffffa80`1ed114d0 fffff880`06faa7b9 00000000`0000000f fffffa80`1d31f8f8 : dxgkrnl!TdrIsRecoveryRequired+0x198
fffff880`06faa720 fffff880`04c459f9 : 00000000`00000001 fffff880`06faa8a0 fffff880`06faa920 00000000`00000000 : dxgmms1!VidSchiReportHwHang+0x438
fffff880`06faa820 fffff880`04c3ff72 : fffffa80`1d31f010 fffff780`00000320 fffffa80`1d31f770 fffffa80`1d31f010 : dxgmms1!VidSchWaitForCompletionEvent+0x411
fffff880`06faa8e0 fffff880`04c4206c : fffffa80`1d31f010 fffffa80`1d31f450 fffffa80`1d31f450 00000000`00000000 : dxgmms1!VidSchiWaitForEmptyHwQueue+0x9a
fffff880`06faa9d0 fffff880`04c3ea85 : 00000000`00000000 fffffa80`1d31f010 fffffa80`1d31f450 00000000`00000000 : dxgmms1!VidSchiSuspend+0x74
fffff880`06faaa00 fffff880`04c09fe5 : ffffffff`ff676980 00000000`00000001 fffff880`06faab69 fffffa80`1d31f010 : dxgmms1!VidSchiWaitForSchedulerEvents+0x249
fffff880`06faaaa0 fffff880`04c3e646 : 00000000`00000000 fffffa80`1d585660 fffffa80`1d44d7f0 fffffa80`1d31f010 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff880`06faabd0 fffff803`e6e9e521 : fffffa80`1d6b9b00 fffffa80`1d31f010 fffff880`03932140 00000000`04d91ecb : dxgmms1!VidSchiWorkerThread+0xca
fffff880`06faac10 fffff803`e6edcdd6 : fffff880`03926180 fffffa80`1d6b9b00 fffff880`03932140 fffffa80`19ac7500 : nt!PspSystemThreadStartup+0x59
fffff880`06faac60 00000000`00000000 : fffff880`06fab000 fffff880`06fa5000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
atikmpag+8efc
fffff880`0725cefc 4055 push rbp

SYMBOL_NAME: atikmpag+8efc

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: atikmpag

IMAGE_NAME: atikmpag.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf9279

FAILURE_BUCKET_ID: 0x116_IMAGE_atikmpag.sys

BUCKET_ID: 0x116_IMAGE_atikmpag.sys
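
The DEBUG_FLR_IMAGE_TIMESTAMP values are the build timestamps from the driver image headers, stored as seconds since 1 January 1970 UTC in hex. Decoding them shows how old the installed drivers are. A minimal sketch, with a function name of my own choosing:

```python
from datetime import datetime, timezone


def image_timestamp(hex_value):
    """Decode a hex image timestamp into a UTC datetime."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


if __name__ == "__main__":
    # Values copied from the two crash reports in this post.
    for driver, stamp in (("nvlddmkm.sys", "4fdf93d7"), ("atikmpag.sys", "4fdf9279")):
        print(driver, image_timestamp(stamp).isoformat())
```

Both values decode to 18 June 2012 (UTC).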

This was not really helping me much, and I decided to repeat the tests but use the checked build of Windows 8 to help troubleshoot.

With the kernel debugger attached, I captured the following ASSERT during the boot:

Windows 8 Kernel Version 9200 MP (1 procs) Checked x64
Built by: 9200.16384.amd64chk.win8_rtm.120725-1247
Machine Name:
Kernel base = 0xfffff802`0e01d000 PsLoadedModuleList = 0xfffff802`0e760ac0
System Uptime: 0 days 0:00:06.228 (checked kernels begin at 49 days)
Assertion: The BIOS has reported inconsistent resources (_CRS). Please upgrade your BIOS.
ACPI!PnpBiosGetDeviceResourceList+0x15e:
fffff880`012c3c2a cd2c int 2Ch
...
Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

PROCESS_NAME: System

FAULTING_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880`012c3c2a cd2c int 2Ch

ERROR_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

EXCEPTION_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: 0x0

CURRENT_IRQL: 0

LOCK_ADDRESS: fffff8020e7c5d60 -- (!locks fffff8020e7c5d60)

Resource @ nt!PiEngineLock (0xfffff8020e7c5d60) Exclusively owned
Threads: fffffa8019a36040-01<*>
1 total locks, 1 locks currently held

PNP_TRIAGE:
Lock address : 0xfffff8020e7c5d60
Thread Count : 1
Thread address: 0xfffffa8019a36040
Thread wait : 0x105eccd4

LAST_CONTROL_TRANSFER: from fffff880012b736f to fffff880012c3c2a

STACK_TEXT:
fffff880`009b4b30 fffff880`012b736f : fffffa80`23a9e900 fffff880`012a7e01 fffff880`009b4c08 fffff880`012a7e70 : ACPI!PnpBiosGetDeviceResourceList+0x15e
fffff880`009b4bd0 fffff880`0125acba : fffffa80`23a9e900 fffffa80`19ac54c0 fffff880`012a7e70 fffffa80`1f477010 : ACPI!ACPIBusIrpQueryResourceRequirements+0x8b
fffff880`009b4c50 fffff802`0e91b6a4 : fffffa80`23a9e900 fffffa80`19ac54c0 fffff880`009b4db0 fffffa80`23a9e900 : ACPI!ACPIDispatchIrp+0x2a6
fffff880`009b4cf0 fffff802`0e91cd1b : fffffa80`23a9e900 fffff880`009b4db0 00000001`c00000bb 00000000`00000000 : nt!IopSynchronousCall+0x10c
fffff880`009b4d80 fffff802`0e915bdb : fffffa80`23a9e900 fffff880`009b4e50 fffffa80`23a4f850 00000000`0000001e : nt!PpIrpQueryResourceRequirements+0x5f
fffff880`009b4e10 fffff802`0e91748d : fffffa80`23a9b8e0 00000000`00000000 ffffffff`80000218 fffffa80`23a9b8e0 : nt!PiQueryResourceRequirements+0x47
fffff880`009b4ea0 fffff802`0e91a1f2 : fffffa80`23a9b8e0 fffffa80`23a9b8e0 00000000`00000001 00000000`00000000 : nt!PiProcessNewDeviceNode+0x159d
fffff880`009b5070 fffff802`0e08feb5 : fffffa80`19adcd20 00000000`00000000 fffff880`009b5358 00000000`00000000 : nt!PipProcessDevNodeTree+0x1fe
fffff880`009b5310 fffff802`0e08fb59 : 00000000`00000000 00000000`00000000 00000000`00000000 fffffa80`37e19cc0 : nt!PnpDeviceActionWorker+0x345
fffff880`009b53d0 fffff802`0ed4010d : 00000000`00000000 fffff8a0`00000007 fffff8a0`00f08c00 00000000`00000000 : nt!PnpRequestDeviceAction+0x2ed
fffff880`009b5420 fffff802`0ed3b39d : fffff802`0d536800 fffff802`0e7c83c0 00000000`00000006 fffff802`0d536800 : nt!IopInitializeBootDrivers+0x905
fffff880`009b5650 fffff802`0ed2deb5 : fffff802`0d536800 00000000`00000000 fffff802`0d536800 fffff802`0d51ebf0 : nt!IoInitSystem+0xb5d
fffff880`009b59b0 fffff802`0e82d013 : fffff802`0d536800 fffffa80`19a36040 00000000`00000000 fffffa80`19ab3040 : nt!Phase1InitializationDiscard+0x1899
fffff880`009b5bc0 fffff802`0e1b289e : fffff802`0d536800 fffff802`0d536800 00000000`00000000 00000000`00000000 : nt!Phase1Initialization+0x13
fffff880`009b5bf0 fffff802`0e24ef96 : fffff802`0e82d000 fffff802`0d536800 fffff802`0e6c6180 00000000`f8ffffff : nt!PspSystemThreadStartup+0x1a2
fffff880`009b5c60 00000000`00000000 : fffff880`009b6000 fffff880`009b0000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880`012c3c2a cd2c int 2Ch

SYMBOL_STACK_INDEX: 0

SYMBOL_NAME: ACPI!PnpBiosGetDeviceResourceList+15e

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: ACPI

IMAGE_NAME: ACPI.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 50109dd0

BUCKET_ID_FUNC_OFFSET: 15e

FAILURE_BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

This is interesting: the kernel asserts on a problem reported by the BIOS.

I contacted SuperMicro support; they said they would investigate the BIOS failure, and they suggested I try PCIe slot #3 instead of slot #5. The motherboard manual mentions that slots #1, #2, and #3 are to be used if CPU #1 is installed, and slots #4, #5, and #6 are to be used only if CPU #2 is installed.

I have both processors installed, so not using the more conveniently located slot #5 never came to mind. I moved the graphic card to CPU #1 slot #3, and voila, install succeeded and Windows 8 was up and running!

I repeated the checked build test with the graphic card in slot #3, and the same BIOS ASSERT error was reported, so the BIOS ASSERT seems to be unrelated to the VIDEO_TDR_FAILURE error.

This was a very frustrating problem, and I still don’t understand the root cause, but I am happy to be able to finally switch both workstations to Windows 8.

SuperMicro Beta BIOS supports Windows 8 and Server 2012

In a previous post I reported that my SuperMicro SuperWorkstation 7047A-T failed to install Windows 8 or Windows Server 2012 due to an ACPI_BIOS_ERROR. I contacted SuperMicro support, and I was informed that new BIOS releases are on their way that will support Windows 8 and Server 2012.

This morning I received an email from SuperMicro, with a new Beta BIOS for the X9DAi motherboard used in the 7047A-T. The new BIOS allowed me to install Windows 8 and Server 2012.

I used a DOS bootable USB key, and installed the new BIOS.

The 7047A-T has USB ports on the back and on the front of the case. The ports on the front are all USB3, and it is not possible to boot from these ports; at least, I have not yet found a configuration that allows booting from USB3 ports. I tried USB2 keys and my newest, super fast Kingston DataTraveler HyperX 3.0 USB3 keys, but the BIOS does not list any boot devices in these USB3 ports. To boot from USB you have to plug the USB key into one of the rear USB2 ports.

The new BIOS version is “1.0 beta”, compilation date “7/23/2012”. The BIOS screen looks like the more modern AMI EFI BIOS’s I’ve seen in other devices, i.e. the thin font instead of the classic console font.

I performed a “Restore Optimized Defaults”, and then went through the options to see what has changed and what is new.

The [Advanced] [Chipset Configuration] [North Bridge] [IOH Configuration] menu now sets all PCIe buses to GEN3; the old BIOS defaulted to GEN2.

The [Advanced] [SATA Configuration] menu now enables hot plug on all ports; the old BIOS defaulted to hot plug disabled.

The [Advanced] [Boot Feature] menu adds a new power configuration item called “EuP”. This seems to be related to EU Directive 2005/32/EC:

EU Directive 2005/32/EC enacted by the European Union member countries dictates that after January 1, 2010, no computer or other energy using product (EuP) sold in the member countries may dissipate more than 1 Watt in the standby (S5) state.

I measured the power utilization, and the machine uses 2W when powered off, 140W at idle in Windows 8 desktop, and 7W while sleeping.

I updated my Windows 8 USB key to the latest build I have access to, booted from the USB key, and installed Windows 8 without any major issues.

I had swapped the NVidia Quadro 4000 for a faster ATI FirePro V7900. The v1.0 BIOS worked fine with the Quadro 4000, but after installing the V7900, the screen powered on and Windows 7 started booting before I had a chance to see the BIOS screen. After installing the new Beta BIOS, the V7900 works as expected and I can see the BIOS screen during POST.

This is a note for ATI; please make sure your VGA driver install UI fits on a 640×480 display. When I swapped the Quadro 4000 for the V7900, and rebooted into Windows 7, I booted into a 640×480 16 color screen. Imagine my frustration trying to guess which button has focus when you can only see the top half of the ATI driver installer.

Windows 8 automatically installed drivers for the V7900.

The only driver Windows 8 did not automatically install is the C600 chipset SAS driver. I installed the Intel Rapid Storage Technology Enterprise (RSTe) drivers, and that solved that problem.

While running Windows 7 on this machine, and running the Windows Experience Index Assessment, the test would always crash. The same test in Windows 8 completed successfully.

I found the 2D and 3D results to be disappointing, and I tried to replace the “ATI FirePro V (FireGL V) Graphics Adapter (Microsoft Corporation – WDDM v1.20)” driver with the ATI Windows 8 Consumer Preview driver. Although the release notes indicate that the V7900 is supported, the driver installation failed with an unsupported hardware error. I’ll have to wait for newer Windows 8 drivers from ATI to see if the test scores improve.

I’m quite happy that I can use my new machines with Windows 8.

I just wish SuperMicro had solved the BIOS incompatibility problems long ago; after all, it has been almost two years since the Windows 8 pre-release program started, and almost a year since the release of the public developer preview.

Intel DP45SG and Lian-Li PC-C33B HTPC

I recently built a new home theater PC using an Intel DP45SG motherboard and a Lian-Li PC-C33B case.
I am replacing my existing HTPC that appears to be not quite compatible with Windows 7.
The existing machine uses a Lian-Li PC-C31B case, Intel DG33TL motherboard, and ATI Sapphire HD 2600XT video card.
I started documenting the new machine installation, but the new case was on backorder, and I had all components except the new case, so I tried the HD 5750 card in the old machine.
The results were not so good, skip ahead and read about the display driver that stops responding, or read on.
Some background on the old machine…
The HD 2600XT GPU fan was very loud, too loud for an HTPC.
I replaced the stock HD 2600XT fan with a Zalman VF900-Cu fan, and this made it much quieter.

The stock Lian-Li case fans reported erratic rotational speeds with the DG33TL fan sensors.
I replaced the stock case fans with Antec Tri-Cool fans, and the rotational speeds were reported correctly.

Since upgrading from Vista to Windows 7, the machine does not stay asleep; it will go to sleep, then within a few seconds wake up again.
The DG33TL board is also missing some Windows 7 drivers, specifically the SMBus driver from Vista has to be installed in compatibility mode.
I am a great fan of Lian-Li cases, they are light, extremely well made, and very stylish.
I have owned several Lian-Li cases, including a PC-V2100B Plus II, PC-C31B, PC-A06B, PC-60FWB, PC-B71B, and the latest, the PC-C33B.
I chose to replace my PC-C31B case with the Lian-Li PC-C33B case because I wanted to use an ATX size motherboard, and the PC-C31B case only accommodates Micro-ATX boards.
The PC-C31 was succeeded by the PC-C32, and the PC-C32 was succeeded by the PC-C33, so the cases are very similar.
On the outside the PC-C31B and the PC-C33B look nearly identical, on the inside the PC-C33B layout is more spacious, and better laid out.
The one thing I wish the PC-C33B had retained was the hidden CD-ROM covers, it makes for a neater appearance.

I am not particular to any one brand of motherboard, but I normally use either Asus, Gigabyte, or Intel.
I chose the Intel DP45SG because it has already undergone several revisions to iron out the kinks, and it provided the basic functionality I needed without any additional bells and whistles I don’t need.
With the release of the i5 and i7 processors, and P55 chipsets, I chose to stay with the P45 chipset because the Core 2 processors and dual-channel DDR3 memory are reasonably cheap.
I went with an Intel Core 2 Quad Q9650 3GHz processor, and Kingston KHX1333C7D3K2 memory.
I haven’t used NVidia graphic cards in a long time; compared to the ATI HD series cards, the NVidia equivalents are just too expensive.
I chose the ATI Sapphire HD 5750 because it has an HDMI connector, thus no need for a DVI to HDMI adapter, and it is quiet.
I could have gone with the 5770, but the 5750 is sufficient for my needs, primarily watching movies, and is quieter and uses less power.
The DP45SG board has three fan connectors, Front, Rear, and AUX.
The Lian-Li PC-C33B case has three fans, two rear 80mm 1200rpm fans, and a 140mm 1200rpm HDD cage fan.
I connected the front and rear fan connectors to the two rear 80mm fans, and the AUX connector to the 140mm HDD cage fan.
The DP45SG BIOS supports temperature feedback fan control.
But with this option enabled, the two stock Lian-Li 80mm rear fans would not run at all.
If I disabled fan control, meaning the fans are on all the time, the fans worked fine.
I replaced the two stock fans with Scythe S-Flex 80mm 1500rpm fans, and they worked perfectly, and silently, at low RPM.
When I ordered the 80mm fans, I also ordered a Scythe Kaze Maru 140mm 1200rpm fan to replace the stock Lian-Li 140mm fan.
When I tried to install it, I realized that this was really a 120mm fan, or at least the mounting holes were for a 120mm fan.
There is a little piece of text on the Scythe site that I missed:
“*Only Compatible to 120mm fan Slots!!*”
I left the stock Lian-Li 140mm fan and it works fine, maybe a little loud, but I don’t have a suitable replacement.
While searching for information on the fans not running, I came across the following on the Intel Desktop Control Center site:
“The Intel Desktop Board DP45SG was updated to revision AA# E27733-405 to add an alternate hardware monitoring and fan control ASIC.”
I have the 405 revision board, but without access to an older board, I really don’t know what changed.
I used Lavalys EVEREST to monitor the fan speeds from within Windows, at idle the 80mm fans run around 410rpm, and the 140mm fan at around 1100rpm.
EVEREST does however report the wrong fan labels; System should be AUX, Chassis should be Front, and Power Supply should be Rear.
I posted the mismatch on the EVEREST forum, I hope they fix it at some point.
The DP45SG board requires three power connectors, the normal 2×12 pin, a 2×2 pin, and a 4×1 pin.
I’ve seen other Intel boards requiring the additional 2×2, but this is the first board I’ve seen that requires the 2×2 and a 4×1.
I previously had a problem with an Intel S5000PSL board that required the extra 2×2, but the Corsair HX 850W PSU did not include the 2×2 pin connector, I had to buy a 4×1 to 2×2 converter for this board.
Fortunately the Thermaltake Toughpower 650W PSU I used for this build had all the required connectors.
On running the system I noticed one abnormality reported in the eventlog:
“The platform firmware has corrupted memory across the previous system power transition. Please check for updated firmware for your system.”

Searching I found several people reporting this event on a variety of hardware.
I did find this document from Microsoft on the topic, and they have this to say:
“During Windows development, we observed some systems that corrupt the lowest 1 MB of physical memory during a sleep transition. We traced the memory corruption to code defects in platform firmware. Because of the pervasiveness of the problem in the industry and the desire for reliable sleep transitions, Windows no longer stores operating system code and data in the lowest 1 MB of physical memory.”
Everything is now up and running with Windows 7 Ultimate x64.
The new HTPC replaced the old one in our living room; all that is left to do are the final tweaks for power profiles, remote control only login, codecs, etc.
I normally use Media-Portal for a media frontend, but I’ve been playing with XBMC, and I think I’ll give that a try instead.
The new machine is not as quiet as the old one, I suspect it is because of the 140mm fan, and the additional ventilation holes on the side of the case.
I’ll keep on looking for a quieter 140mm fan, and maybe add some sound insulation, but for now it is good enough.

Problems with the old HTPC and the new HD 5750 card…

The new Lian-Li PC-C33B case was on backorder, so while I had all components except the case, I tried the HD 5750 in my current machine.
The HD 2600XT worked out of the box with the drivers included with Windows 7 Ultimate x64.
I replaced the HD 2600XT with the HD 5750, on reboot the display reverted to standard VGA, and 640×480 resolution.
I had to download and install the ATI Catalyst 9.11 drivers.
One would think that a VGA driver installer would be designed to fit on a screen that does not have VGA drivers installed, i.e. fit on a 640×480 resolution screen?
But no, with the standard VGA resolution the ATI driver installer window does not fit on the screen.
In order to install the drivers I had to move the window using the keyboard to see what the keyboard accelerator shortcut for a UI element was, or where the tab focus was.
Really ATI, this seems like such a basic thing.

After the driver was installed I noticed the screen underscanned, meaning there is a black border or unused space around the screen.
I know from past experience that there is an overscan option in the Catalyst Control Center, but when I looked where I remember the setting to be, I could not find it.
I also noticed that the control panel menu layout has completely changed, and not for the better.
After some searching I found that you have to go to your displays panel, then click on the little arrow on the small monitor window, not the big monitor window.

Then adjust the overscan.
It was interesting to note that the default value, in Windows 7 at least, is to underscan.
When I first hooked up my HTPC running Vista to my plasma TV, fixing the overscan was the first thing I tried to do so that I could see the entire desktop.
Usability wise it makes sense to have a default that will let you see the entire desktop, vs. a default that cuts off parts of the screen.
My preference is to just let the TV overscan and not let the graphic card scale the output to compensate for overscan.
The 1:1 ratio, i.e. no scaling, results in better graphic quality, especially noticeable with fonts, at the expense of the desktop edge not being visible.
As I was searching for information on the new CCC options, I found many people complaining about CCC, and recommending using ATI Tray Tools instead.
I’ve never used it myself, but it is good to know there are alternatives.

Now that the driver was installed, another problem presented itself.
Every minute or so the screen would freeze, then a few seconds later it would start responding again, and Windows would report:
“Display driver stopped responding and has recovered”.
When this happens the screen would freeze, the mouse cursor would still work, sometimes there would be squiggly lines on the screen, and other times it would go gray.
On two occasions the screen did not recover and I had to do a hard reset.

Analyzing the dump file with WinDbg showed that the problem is related to TDR: VIDEO_TDR_TIMEOUT_DETECTED, GRAPHICS_DRIVER_TDR_TIMEOUT.
MSDN has the following to say:
“This indicates that the display driver failed to respond in a timely fashion.”
Searching, I found many people complaining about this problem with Windows 7 x64 and the 57xx cards.
A common response was to wait for the new Catalyst 9.12 drivers.
I was still using the 9.11 drivers, so I waited, and when 9.12 was released I installed it, but the problem remained.
The ATI forum reported the same; the 9.12 driver and the 9.12 driver hotfix do not address this problem.
The 5750 is still not working with the DG33TL board, but, fortunately it does work in the DP45SG board.
I replaced the HD 5750 with the old HD 2600XT, and the old machine is working fine again.

Getting Vista to go to sleep

I noted my troubles with the Intel GMA drivers, the Intel DG33TL motherboard, and Vista SP1 blue screen crashing in my earlier post.

I was running version 15.8 of the Intel GMA drivers, and Microsoft KB948343 indicates that, based on the driver version numbers, these newer drivers should not be affected by SP1. Yet the crash details were clearly the same, and no new driver was forthcoming to correct the blue screen crash, so I decided to take the GMA drivers out of the picture.

I am currently using an ATI HD 2600 XT card in my HTPC, and this is a great card. I looked for the same model; the one I was using is from VisionTek, but I found a Sapphire brand card for significantly less. I am actually happier with the Sapphire than with the VisionTek. The VisionTek fan was really loud, and since I was using it in my HTPC, I ended up buying a Zalman VF900-Cu replacement fan for the VisionTek card. The Sapphire card has no problem with a noisy fan.

I installed the ATI card, installed the drivers, and put the machine to sleep. This is where the GMA drivers would normally crash. This time there was no crash, but the machine also immediately woke up again, I could not get it to stay in sleep mode.

At this point I had had enough of the DG33TL board; it had given me more trouble than I was willing to put up with and I wanted a replacement board. Since I already had the machine open, while replacing the VGA card, I wanted a new board now, which meant instead of ordering online and waiting a few days I had to take a trip to my local Fry’s.

I knew my in-store choices would be limited, so I did some research and selected a few models from Asus, Gigabyte, and Intel, with the primary requirement being ICH9 support so that I would not lose the RAID-0 configuration of my drives, and the motherboard swap would not require an OS reinstall. My first choice would have been a Gigabyte GA-G33-DS3R. Unfortunately, as I suspected, of all the options I was hoping for, the only board that came close was an Intel DQ35JO.

Of the three boards on the shelf, all of them had been returns and were resealed, so this was even more of a risk, but they were marked down a few dollars so that did make me feel better, and I could always return the board.

The DQ35JO is very similar to the DG33TL. The DQ35JO is from the Executive series, and the DG33TL is from the Media series. The DQ35JO has no multichannel audio, but does have TPM and AMT. The component layouts are almost identical.

I replaced the board, powered on, the POST screen came up and then nothing. On reading the Intel support documents they recommended a BIOS reset. I removed the battery, waited a few minutes, replaced the battery and rebooted. This time the POST completed, and I could boot. I assume that since the board had been used, and I just replaced the memory and CPU, that this may have caused the initial boot failure. Before booting into Vista I first booted to my DOS bootable USB key and updated the BIOS to the latest version, then reset the BIOS configuration to defaults, and again made all the required changes, most importantly to restore the RAID drive configuration.

I booted into Vista Ultimate x64, waited a few minutes for the new drivers to load, and eventually the keyboard started working and I could log in. The ATI control center application complained that there was no ATI driver installed, so I reinstalled the ATI driver, rebooted, and this time everything seemed fine. Not quite: Windows told me the hardware had changed and I had to reactivate. Activating over the internet failed, and I had to activate over the phone, which worked. I also noticed that Windows Update wasn’t working, and the KB article for the error code told me to check the PC time. Since I had reset the BIOS without resetting the time, the time was off by years; after correcting the time, WU started working again.

Now for the ultimate test, can the machine go to sleep? I press the sleep button and the machine sleeps, I touch the keyboard and the machine wakes up. I leave the machine idle for an hour, it goes to sleep, I touch the keyboard and the machine wakes up. Success!

There is one thing that is still not 100%, and this seems to be a problem on both the DG33TL and the DQ35JO: the case power light is not always on. E.g. after removing mains power and powering on, the case power light will be on and stay on until the first sleep; then the power light will turn off, and even resuming from sleep or rebooting will not turn the light back on.

Maybe I should have been more patient and ordered the Gigabyte GA-G33-DS3R instead, but for now I am happy.