Amazon Associate’s Account Closed

Amazon just notified me by email that my Associate’s account was closed for not being in compliance with their operating agreement:

“You are not in compliance with Participation Requirement Number 29 because purchases resulting from Special Links on your site have been used for resale or commercial use.”

I have no idea how or why this happened.

 

A couple of years ago I moved my blog from the free Blogger platform to a paid WordPress.com hosted site. Around the same time I signed up for an Amazon Associate’s account, which pays a commission on sales made through my Amazon links, hoping that the proceeds would cover the costs of WordPress hosting and domain registration.

A quick calculation shows Amazon payouts of $669.57 between 2 August 2012 and 30 August 2016, or about $167.39 per year. Subtracting $99.00 for WordPress hosting, $36.00 for Akismet blog spam filtering, and $19.00 for domain registration leaves a profit of $13.39 per year. Subtracting a further $99.00 for bulk domain registration fees, although it is not really fair to charge this fee to one domain, leaves a loss of $85.61 per year.
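As a sanity check on the figures above, here is a minimal Python sketch of the same arithmetic. It assumes the payout period is rounded to four years, which is what the per-year figure implies; the variable names are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures taken from the post; the 4-year period is an assumption
# (2 August 2012 to 30 August 2016, rounded down to whole years).
total_payouts = Decimal("669.57")
years = 4

per_year = (total_payouts / years).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Yearly costs charged directly to this blog.
direct_costs = {
    "WordPress hosting": Decimal("99.00"),
    "Akismet spam filtering": Decimal("36.00"),
    "Domain registration": Decimal("19.00"),
}
profit = per_year - sum(direct_costs.values())

# Bulk domain registration fee, shared across domains but charged here in full.
bulk_registration = Decimal("99.00")
after_bulk = profit - bulk_registration

print(f"Payout per year:       ${per_year}")
print(f"After direct costs:    ${profit}")
print(f"After bulk domain fee: ${after_bulk}")
```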

 

I do not know why I was suddenly out of compliance. I made no changes to either my Amazon Associates or WordPress accounts, and I’ve not posted any new content in a number of months. The WordPress stats show typical traffic (ignore the last two days), but the Amazon Associates report does show a marked increase in traffic:

 

I sent an email to Amazon support to clarify the violation, and to request my account be reinstated, but based on similar reports from other low traffic users, I do not expect a resolution.

Instead, I opted in to WordPress’s own WordAds advertising platform. I still need to adjust the blog theme and settings so the ads do not interfere with reading, and I have no idea what the monetization will be, but at least I no longer have to bother with making special Amazon links.

Please comment and let me know if you find the ads to be intrusive, and I’ll consider funding the site without advertising assistance.

 

[Update: 1 September 2016]

A day after sending Amazon a request asking for an explanation, I received the following in email:

“This message is to advise you that your pievilsblo-20 account and your August 2016 Advertising Fees have been reinstated. Please accept our apologies for the closure.”

Looks like my account has been reinstated, with no explanation of what happened.

Nest Protect False Alarms

2AM, beep, smoke alarm low battery warning. When one beeps, all the interconnected ones beep, so it is impossible to find which one has the low battery. As for how smoke alarms look, I’ve always wondered who made those terrible aesthetic design choices. Maybe it is some kind of industry insider competition to see who can design the ugliest unit with the most obnoxious markings, and still get them sold.

I was thrilled when Nest announced the Nest Protect combination smoke and CO alarm, finally usability and technology catching up with smoke alarms, and an attractive looking unit. I’ve been a long time fan and user of the Nest thermostats, first one v1 unit, and later two v2 units, and I hoped the Nest Protect would do for smoke alarms what Nest did for thermostats.

I pre-ordered ten alarms from Amazon in October 2013, delivered in December 2013. Installation was easy, but I do wish there was a way to get more spoken locations, e.g. “smoke in kids bedroom”, which kid’s bedroom, wait, let me get my phone to see, not.

A week or two after installation we are having friends over for a barbecue. I show the alarm units, I show the mobile app, I explain how great the wave to silence alarm feature is, and how it will warn you before the alarm sounds, and everybody is very impressed. Until a few hours later when one of the units goes off, “smoke in the guest bedroom”, what smoke? I wave at it, nothing. I press the button, “this alarm cannot be silenced”. Keep in mind they are all interconnected with a mesh wireless network, so all ten units are screaming. After the kids stopped crying and we moved the party outside, I get a ladder and remove the unit, still screaming. I take it to fresh air, still screaming. I get a screwdriver, open it up, and remove the batteries: silence, but the rest of the units are still screaming, and pressing the button on those units still says “this alarm cannot be silenced”. About 5 minutes after removing the batteries from the failed alarm the other alarms stop. Egg on my face.

Nest support exchanged the unit and sent out a replacement.

As I was browsing the Nest support forums I noticed many other users reporting false alarms, some reporting that replacement units resolved the problem, some reporting repeat problems. Things got worse for Nest when they issued a recall, offering refunds, disabling the wave feature with a firmware update, and halting sales until stock was swapped for units with the newer firmware, before re-releasing at a reduced price.

October 2014 early AM the alarm goes off, false alarm again, at least this time the alarm silenced itself after a minute. After some back and forth, and an escalation, Nest support agreed to replace all units. The new units have September 2014 manufacturing dates, so I hope these new units are less buggy.

January 2015 early AM the alarm goes off, false alarm again, this time the alarm stopped after only a few seconds. I’ve had enough, my kids are scared, my wife is mad, Nest, you’re out.

Nest support agreed to issue a refund for all ten units, we’ll see how long it takes to receive the refund. And now I’m in the market for combination smoke and CO alarms again, and there are not many choices, if you want something that is functional and good looking.

I was tempted to wait for the First Alert Wi-Fi enabled combination smoke and CO alarm, available for pre-order on Amazon, and although this unit is from a well established manufacturer, hopefully no false alarms, I’m not making the same mistake I made with Nest. Regardless of the pre-order option, it still leaves me unprotected, and I need something now. I could simply not find a decent looking, combination smoke and CO, interconnectable, and hardwired unit, big problem being decent looking.

In the end I opted for the First Alert PC910V units. They are low profile, voice enabled, combination smoke and CO units with a built-in 10 year battery, sold at Lowe’s or Amazon. Not interconnected, not hardwired, but at least they look half decent.

Installing these units turned out to be a bit more tricky than I anticipated. The install base is so small that the round ceiling junction boxes are barely hidden, and the instructions specifically call out that they are not to be installed on junction boxes due to air flow concerns.

Smoke

Below are some pictures showing the size differences between the Nest base (left, bottom), First Alert base (center, middle), and a round cover plate (right, top):

Base Size Comparison

Base Size Comparison

To account for the junction box ventilation warning I sealed between the junction boxes and the ceiling drywall, and between the cover plate and the ceiling. The alarm bases were mounted on the cover plates, see pics below.

Junction box

Sealed around junction box

Sealed around cover

Base on cover

Installed

Due to the small footprint of the alarm, the cover plate and imperfections around the hole in the ceiling can be seen when looking up at an angle. (Sorry for the crappy pictures, iPhone in low light not so great)

Drywall marks

Let’s hope I never hear them peep, at least not for ten years if we can trust the battery life, and at least not without a real emergency.

Electrical Power Quality

Earlier this year we moved a couple miles from Redondo Beach to Manhattan Beach, bigger house, better school district.
As far as the house and area are concerned, it is definitely an upgrade, but not so for the utilities.

Monthly utilities are a lot more expensive, not so much the per unit fees, but the base service fees, and not just by a couple of dollars, but three or four times what we paid in Redondo Beach. Now, if it came with better offerings, or better service, or higher quality, ok, but it is the opposite.
Water quality is worse, specifically hardness: MB supplies its own water, while RB gets water from LADWP. And then there is that unsightly water tower that no longer serves any practical purpose, with efforts to demolish it always being thwarted.
As a new resident I pay almost $30 extra per month for an extra trash can, while grandfathered-in residents keep extras for free. Now, I know it is unfair to judge a service by its employees’ actions, or is it, but the trash collection guy is a jerk. If a little dust and having to get out of the truck is going to get you agitated, you are in the wrong business, especially when compared with the pack of trash collection men in RB that were always friendly and willing to give a hand.
But, I really digress, I want to discuss electrical power quality problems.

In the six plus years we lived in RB, I think we had one scheduled power outage, and maybe two short unplanned outages. Since moving to MB earlier this year, we’ve had two scheduled outages, one lasting an entire day, and several unscheduled outages.
The power is unreliable, SCE knows it, the city knows it, and there are some plans addressing it, see here, here, here, and here.

My concern is not really power being on or off, it is power being on but of poor quality; an electronic equipment killer.

When we moved in, the first signs of electrical problems were flickering lights. At first I thought it was a problem with the Vantage light control system, but even lights directly on utility power flickered. As soon as I hooked up UPS’s to my servers and the signal distribution system, the UPS’s started complaining about power quality. Occasionally during the day I would get a notification from the UPS’s that they detected a distorted input, and every night the UPS’s would complain about low input voltage.
It may be coincidental, but I’ve also had two astronomical clock light timers fail at the same time, the casings were scorched in what appears to be signs of electrical damage.

UPS Event Log:
APC Event Log

In order to quantify the problem, I used a Fluke VR1710 Voltage Quality Recorder. The device plugs into a mains outlet, and records events, and a USB port is used to configure the device, and download recorded data.

As I am not a power quality expert, I referred to Wikipedia and to Power Quality In Electrical Systems for information and reference material. To further simplify the analysis, I opted to compare my office power with my home power. This allowed me to easily visualize the quality differences, granted, I am assuming my office power is good.

I configured the VR1710 to take measurements every 10s, and to record exceptional events, about 10 days worth of data. I set the dip threshold to 106V, the swell threshold to 127V, and the transient sensitivity to 5V.
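To make the threshold settings concrete, here is a minimal Python sketch of how readings could be bucketed against the dip and swell thresholds above. This only illustrates the idea and is not how the VR1710 firmware or the Fluke software works internally; the function name, the strict comparisons, and the sample readings are assumptions made up for the example.

```python
# Thresholds as configured on the recorder (from the post).
DIP_THRESHOLD_V = 106.0
SWELL_THRESHOLD_V = 127.0
NOMINAL_V = 120.0  # theoretical nominal voltage referenced in the post


def classify_reading(voltage_rms: float) -> str:
    """Bucket one RMS voltage reading as 'dip', 'swell', or 'normal'.

    Assumption: a reading exactly on a threshold counts as normal.
    """
    if voltage_rms < DIP_THRESHOLD_V:
        return "dip"
    if voltage_rms > SWELL_THRESHOLD_V:
        return "swell"
    return "normal"


if __name__ == "__main__":
    # Hypothetical readings, not data from the recorder.
    for volts in (102.0, 106.1, 119.0, 128.5):
        percent = 100.0 * volts / NOMINAL_V
        print(f"{volts:6.1f} V ({percent:5.1f}% of nominal): {classify_reading(volts)}")
```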

VR1710 Settings:
VR1710 Settings

Below are reports detailing the recorded events, click graphs to view full resolution:

Home Voltage:
Home - Voltage

There is a clear pattern of voltage drops below 102V every evening, these drops are also observed in the UPS logs showing low voltage warnings around 7:30PM every evening.

Office Voltage:
Office - Voltage

The office voltage is very stable.

Home Flicker:
Home - Flicker

According to Wikipedia and PQW short term flicker (Pst) is noticeable at values exceeding 1.0, and long term flicker (Plt) is noticeable at values exceeding 0.65. These results would explain why we observe lights flickering.

Office Flicker:
Office - Flicker

Office flicker values are well within acceptable ranges.

Home Statistics:
Home - Statistics

From this distribution we can see the wide spread in voltages, well below the 120V theoretical norm. This chart does not show it, but the 95% distribution is 115.5V, and the 5% distribution is 106.1V.

Office Statistics:
Office - Statistics

The office voltage distribution is nicely clustered around 119V, with the 95% distribution at 119.6V, and the 5% distribution at 117.4V.

Home Dips And Swells:
Home - Dips Swells

ITIC and CBEMA are standards for acceptable power quality, see here for a detailed description.
To describe the graph, I quote from the Fluke Power Log software manual:
Dips and swells are shown on a CBEMA (Computer Business Equipment Manufacturers Association) and ITIC (Information Technology Industry Council) plot classification table according to EN50160. On the CBEMA (blue) and ITIC (red), curve markers are plotted for each dip and swell. The height on the vertical axis shows the severity of the dip or swell relative to the nominal voltage. The horizontal position shows the duration of the dip or swell. These curves show an ac input voltage envelope which typically can be tolerated (no interruption in function) by most Information Technology Equipment (ITE).

Based on the graph we can see a large number of events exceeding the acceptable ranges. Since there were no dips at the office, there is no graph for the office.

Home Transients:
Home - Transients

I only show the transients graph for home, as the wave forms all look different, and the only difference between home and office is 87 events were recorded at home while 10 events were recorded at the office for the same approximate time duration. See PQW for an explanation of transients.

We can clearly see that the power quality at my house is significantly worse compared to the power at my office.

I am speculating, but I wonder if the old transformer across the road can supply sufficient power, given that it used to supply power to three small very old houses on four lots, demolished to make room for four new larger houses?

I just opened a support ticket with SCE, let’s hope they can do something about the problem.

LSI turns their back on Green

I previously blogged here and here on my research into finding power saving RAID controllers.

I have been using LSI MegaRAID SAS 9280-4i4e controllers in my Windows 7 workstations and LSI MegaRAID SAS 9280-8e controllers in my Windows Server 2008 R2 servers. These controllers work great, my workstations go to sleep and wake up, and in workstations and servers drives spin down when not in use.

I am testing a new set of workstation and server systems running Windows 8 and Server 2012, and using the “2nd generation” PCIe 3.0 based LSI RAID controllers. I’m using LSI MegaRAID SAS 9271-8i with CacheVault and LSI MegaRAID SAS 9286CV-8eCC controllers.

I am unable to get any of the configured drives to spin down on either of the controllers, in either Windows 8 or Windows Server 2012.

LSI has not yet published any Windows 8 or Server 2012 drivers on their support site. In September 2012, after the public release of Windows Server 2012, LSI support told me drivers would ship in November, and now they tell me drivers will ship in December. All is not lost as the 9271 and 9286 cards are detected by the default in-box drivers, and appear to be functional.

I had hoped the no spin-down problem was a driver issue, and that it would be corrected by updated drivers, but that appears to be wishful thinking.

I contacted LSI support about the drive spin-down issue, and was referred to this August 2011 KB 16563, pointing to KB 16385 stating:

newer versions of firmware no longer support DS3; the newest version of firmware to support DS3 was 12.12.0-0045_SAS_2108_FW_Image_APP-2.120.33-1197

When I objected to the removal, support replied with this canned quote:

In some cases, when Dimmer Switch with DS3 spins down the volume, the volume cannot spin up in time when I/O access is requested by the operating system. This can cause the volume to go offline, requiring a reboot to access the volume again.

LSI basically turned their back on green by disabling drive spin-down on all new controllers and new firmware versions.

I have not had any issues with this functionality on my systems, and spinning down unused drives to save power and reduce heat is a basic operational requirement. Maybe there are issues with some systems, but at least give me the choice of enabling it in my environment.

A little bit of searching shows I am not alone in my complaint, see here and here.

And a November 2012 Intel KB 033877 states that they have disabled drive power save on all their RAID controllers, which is maybe not that surprising given that Intel uses rebranded LSI controllers.

After a series of overheating batteries and S3 failures, I have long ago given up on Adaptec RAID controllers, but this situation with LSI is making me take another look at them.

Adaptec is advertising Intelligent Power Management as a feature of their controllers, I ordered a 7805Q controller, and will report my findings in a future post.

RIP Boxee Box

After nearly six months of no software updates for the Boxee Box, Boxee announced the Boxee TV, and, as far as I’m concerned, the death of the Boxee Box.

Boxee is releasing an updated hardware platform, but they are abandoning all local media playback and cataloging capabilities, and instead focusing on a, US only, cloud storage DVR device.

I have no need for such a device, and based on the Boxee community forum posts, the blog comments, and even comments from their XBMC roots, I am not alone in expressing my disappointment.

I suspected this may happen, but I had always hoped that Boxee would eventually make good on their empty promises and fix the issues. If not fix it, then release an updated hardware platform that corrects the problems that plagued the first version, and I’d still be willing to pay for it.

 

I am one of the many users that is plagued by the HD audio playback dropout issues introduced in a firmware update almost two years ago. A problem Boxee blamed on the Intel CE4100 SDK, and promised to fix in March, but then backtracked saying that fixing it would incur too much testing overhead. Yes, break a feature that worked, then claim it is Intel’s fault, but refuse to correct it because it is too much trouble to test.

The Boxee Box will get a last update to fix an issue with Flash playback, but the HD audio issue will not be fixed.

 

I have already transitioned one of my Boxee Boxes to XBMC based OpenELEC 2.0 running on a Zotac ZBOX Nano XS ID11 Plus. It still has a few rough edges, but XBMC is actively being developed for a variety of exciting platforms.

The one thing about Boxee I will miss the most is the standalone D-Link DSM-22 Boxee remote, best remote for XBMC ever. If I had known they would be discontinued, and impossible to buy, I would have bought a couple of spares. If you know where to buy DSM-22’s, please let me know.

 

Rest In Peace Boxee Box.

Dyslexic Intel RSTe Driver

I encounter one problem after another running Windows 8 and Server 2012 on the dual Xeon E5 Intel C600 chipset based SuperMicro 7047A-T and 7047A-73 SuperWorkstation machines. I will say that this is really not representative of my Windows 8 experience in general, as all other machines I installed on worked fine with the in-box drivers.

The C602 includes the Intel Storage Controller Unit (SCU) SATA / SAS controller. Windows 8 and Server 2012 do not include in-box drivers for the SCU. The SCU drivers are part of the Intel Rapid Storage Technology Enterprise (RSTe) driver set. Note that the RSTe and RST drivers are different and not compatible with one another. When you install the full RSTe package, it includes SCU drivers for the SCU RAID controller, AHCI drivers for the SATA controller, and the Windows RST management application.

A clean install of Windows 8 will use the in-box drivers for the SATA controller. In the image below you can see the Intel 520 Series 480GB SSD drive show up with the correct model number:

Device.Manager.Win8

After installing RSTe (3.2.0.1132, 3.2.0.1134), the 4TB Hitachi drives attached to the SCU show up, but the model numbers of the drives, including the SSD drive attached to the SATA port, are now messed up:

Device.Manager.RSTe

The drive hardware identifiers are correct, but the friendly name is not:

Intel.SSD.Hardware

Intel.SSD.Friendly

It appears that the bytes of the text are swapped within each 16-bit WORD, i.e. ABCD becomes BADC.
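The swap pattern is easy to reproduce. Here is a minimal Python sketch that swaps the two bytes in every 16-bit word of an ASCII string; the function name is made up, and padding odd-length strings with a space is an assumption for the example, not something taken from the driver.

```python
def swap_bytes_in_words(text: str) -> str:
    """Swap the two bytes inside each 16-bit word of an ASCII string."""
    data = text.encode("ascii")
    if len(data) % 2:
        # Assumption: pad odd-length strings with a space so every word is complete.
        data += b" "
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each word moves first
    swapped[1::2] = data[0::2]  # first byte of each word moves second
    return swapped.decode("ascii")


if __name__ == "__main__":
    print(swap_bytes_in_words("ABCD"))
    # Applying the swap twice restores the original text.
    print(swap_bytes_in_words(swap_bytes_in_words("ABCD")))
```

Because the swap is its own inverse, running a garbled friendly name through the same function would recover the readable model string, assuming this is indeed the only transformation applied.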

The driver is also not functional, attempting to create a storage spaces pool using the Hitachi drives hangs forever, with no drive activity, requiring a hard power cycle:

Storage.Pool

And lastly, the Intel SSD Toolbox 3.0.3 is not compatible with Windows 8:

SSD.Toolbox

The clock is ticking for Windows Server 2012 (4 September, 1 day left) and Windows 8 (26 October, 7 weeks left) general availability, I can only hope compatible drivers, firmware, and utilities are forthcoming.

 

[Update: 4 September 2012]
SuperMicro posted updated RSTe drivers (package v3.5.0.1101, driver v3.5.0.1096). This driver set resolves the hang during storage space creation, but the drive names are still messed up.

Windows 8 VIDEO_TDR_FAILURE Madness

I finally figured out why I kept on getting VIDEO_TDR_FAILURE BSOD’s when installing Windows 8 on my SuperMicro workstations. It turns out that the problem goes away when I use a PCIe slot associated with CPU #1, instead of a slot associated with CPU #2.

Some history on my adventures with Windows 8 and SuperMicro SuperWorkstations:
I got ACPI_BIOS_ERROR BSOD’s while installing Windows 8, SuperMicro provided a Beta BIOS that resolved the problem.
The Windows 8 install hangs if installing to a SSD drive on a LSI 2308 SAS controller, that issue is still unresolved, but can be worked around by connecting the SSD to the Intel SATA controller.
I got VIDEO_TDR_FAILURE BSOD’s while installing Windows 8 with a NVidia Quadro 5000 graphics card, same with an ATI FirePro V7900, a NVidia GeForce GTX 680, or an ATI HD 7970. This post is about resolving that problem.

 

SuperMicro released v1.0a BIOS updates for the X9DAi and X9DA7 motherboards used in the 7047A-T and 7047A-73 SuperWorkstations. I was hoping this would resolve the VIDEO_TDR_FAILURE BSOD’s, but no.

The X9DA7 BIOS updated without issue, but the X9DAi update reported an error at the end of the update process; “Error when sending Enable Message to ME”.

I contacted SuperMicro support, and they asked me to make sure that there is no jumper on JPME1. There is no mention of JPME1 in the motherboard manual, but it is located next to JIPMB1, next to PCIe slot #1. The header had a jumper on pins 2 and 3, where the same header on the X9DA7 motherboard had a jumper between 1 and 2. I removed the jumper, and the BIOS update succeeded.

JPME1

 

Unlike the ACPI_BIOS_ERROR BSOD that happens during the WinPE phase of the install, the VIDEO_TDR_FAILURE BSOD happens on the first boot after the install, during the hardware detection and driver install phase. This means that the technique I used to kernel debug the initial boot phase will not work, as the second boot is using the BCD already deployed to the target hard drive. I had to modify the BCD of the already installed image, prior to the install continuing after the reboot.

 

I tested many permutations of graphic cards and configurations, and it quickly became very annoying to have to type my Win8 product key every single time I boot and install. To avoid this I created configuration files in the sources directory on the install media, and this bypassed the key question. You can read more about the meaning of the file contents here:

EI.cfg:

[EditionID]
Professional
[Channel]
Retail
[VL]
0

PID.txt:

[PID]
Value=XXXXX-XXXXX-XXXXX-XXXXX-XXXXX
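When rebuilding install media repeatedly, the two files can be generated by a script. Here is a minimal Python sketch that writes them with the contents shown above. The function name and the directory argument are made up for illustration, and the product key stays a placeholder.

```python
import tempfile
from pathlib import Path

# File contents copied from the post.
EI_CFG = "[EditionID]\nProfessional\n[Channel]\nRetail\n[VL]\n0\n"
PID_TXT = "[PID]\nValue={key}\n"


def write_setup_configs(sources_dir, product_key):
    """Write EI.cfg and PID.txt into the given sources directory."""
    sources = Path(sources_dir)
    sources.mkdir(parents=True, exist_ok=True)
    ei_path = sources / "EI.cfg"
    pid_path = sources / "PID.txt"
    ei_path.write_text(EI_CFG)
    pid_path.write_text(PID_TXT.format(key=product_key))
    return ei_path, pid_path


if __name__ == "__main__":
    # Demo against a temporary directory; point this at the real
    # sources directory of the install media instead.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_setup_configs(Path(tmp) / "sources", "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX"):
            print(path.name)
            print(path.read_text())
```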

 

To modify the BCD of the installed image, and be able to easily repeat the second phase of install testing, I installed a second hard drive, and deployed WinPE to the second drive. By using F11 during boot to choose the boot drive, I could select booting from the second drive at any time.

 

I have a variety of WinPE v3 (Win7) based utility images, and I updated them to use WinPE v4 (Win8). In the process I lost the boot menu, and the first image in the menu automatically started booting. After some trial and error, I found the bootmenupolicy BCD option, and when set to legacy mode, the old style menu is back:

bcdedit /set {default} bootmenupolicy legacy

 

I installed Win8 on the primary drive, and during the reboot, instead of booting to the installed Win8 drive, I used F11 and booted to my secondary WinPE drive. From WinPE I modified the boot BCD to enable kernel debugging over the network:

bcdedit -store c:\boot\bcd /set {default} nocrashautoreboot yes
bcdedit -store c:\boot\bcd /set {default} debugtype net
bcdedit -store c:\boot\bcd /set {default} hostip 3232235876
bcdedit -store c:\boot\bcd /set {default} port 50000
bcdedit -store c:\boot\bcd /set {default} key my.secret.debug.key
bcdedit -store c:\boot\bcd /debug {default} yes

This is equivalent to:

bcdedit /dbgsettings net hostip:192.168.1.100 port:50000 key:my.secret.debug.key

But unlike the dbgsettings command, this allows me to specify a BCD store. Also note that the IP address is stored as a single numeric value instead of the dotted IP format.
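The numeric hostip value is simply the four address octets packed into one 32-bit integer. Here is a minimal Python sketch of the conversion in both directions, using the standard ipaddress module; the function names are made up for illustration.

```python
import ipaddress


def hostip_to_bcd_value(dotted: str) -> int:
    """Convert a dotted IPv4 address to the single numeric value stored in the BCD."""
    return int(ipaddress.IPv4Address(dotted))


def bcd_value_to_hostip(value: int) -> str:
    """Convert the numeric BCD value back to dotted IPv4 notation."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    print(hostip_to_bcd_value("192.168.1.100"))
    print(bcd_value_to_hostip(3232235876))
```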

 

While still in WinPE, I captured the state of the primary Win8 drive by making a drive image using Symantec Ghost, the real Ghost, currently sold as Symantec Ghost Solution Suite, not the same named but volume snapshot based Norton Ghost or Symantec System Recovery. By saving a drive image, I can easily change hardware or configurations, test the install starting at the second phase, reboot to the secondary WinPE drive using F11, restore the entire drive image, and try again, while leaving the kernel debug options intact.

 

I tested with the following hardware configurations in various permutations:

 

With the kernel debugger attached, I captured the following crash details in WinDbg for NVidia based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa80211cd010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800782d0d8, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000002, Optional internal context dependent data.

Debugging Details:
------------------

FAULTING_IP:
nvlddmkm+1ae0d8
fffff880`0782d0d8 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`12c76078 fffff801`66fef0ea : 00000000`00000000 00000000`00000116 fffff880`12c761e0 fffff801`66f734b8 : nt!DbgBreakPointWithStatus
fffff880`12c76080 fffff801`66fee742 : 00000000`00000003 fffff880`12c761e0 fffff801`66f73e90 00000000`00000116 : nt!KiBugCheckDebugBreak+0x12
fffff880`12c760e0 fffff801`66ef4144 : fffffa80`2094b100 fffff880`021ee9c0 fffffa80`1f54e400 00000000`00000000 : nt!KeBugCheck2+0x79f
fffff880`12c76800 fffff880`04b33dcb : 00000000`00000116 fffffa80`211cd010 fffff880`0782d0d8 00000000`00000000 : nt!KeBugCheckEx+0x104
fffff880`12c76840 fffff880`04b32518 : fffff880`0782d0d8 fffffa80`211cd010 fffff880`12c76949 00000000`000000c7 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff880`12c76880 fffff880`04a1e608 : fffffa80`211cd010 fffff880`12c76949 00000000`00000000 00000000`00000002 : dxgkrnl!TdrIsRecoveryRequired+0x168
fffff880`12c768b0 fffff880`04a4d539 : 00000000`00000000 fffff780`00000320 00000000`00000000 fffffa80`1f54e400 : dxgmms1!VidSchiReportHwHang+0x438
fffff880`12c769b0 fffff880`04a4ba49 : fffffa80`00000002 fffffa80`1f54e400 fffffa80`1f54e840 fffffa80`1f54e840 : dxgmms1!VidSchiCheckHwProgress+0xe5
fffff880`12c76a00 fffff880`04a16fe5 : ffffffff`ff676980 00000000`00000001 fffff880`12c76b69 fffffa80`1f54e400 : dxgmms1!VidSchiWaitForSchedulerEvents+0x20d
fffff880`12c76aa0 fffff880`04a4b646 : 00000000`00000000 00000000`0000000f fffffa80`1f54e400 fffffa80`1f54e400 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff880`12c76bd0 fffff801`66e9b521 : fffffa80`1f5abb00 fffffa80`1f54e400 fffff880`03b01140 00000000`06a21e1e : dxgmms1!VidSchiWorkerThread+0xca
fffff880`12c76c10 fffff801`66ed9dd6 : fffff880`03af5180 fffffa80`1f5abb00 fffff880`03b01140 fffffa80`19aac040 : nt!PspSystemThreadStartup+0x59
fffff880`12c76c60 00000000`00000000 : fffff880`12c77000 fffff880`12c71000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
nvlddmkm+1ae0d8
fffff880`0782d0d8 4055 push rbp

SYMBOL_NAME: nvlddmkm+1ae0d8

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf93d7

FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

 

With the kernel debugger attached, I captured the following crash details in WinDbg for ATI based cards:

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa801ed114d0, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800725cefc, The pointer into responsible device driver module (e.g. owner tag).
Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 000000000000000d, Optional internal context dependent data.

Debugging Details:
------------------

FAULTING_IP:
atikmpag+8efc
fffff880`0725cefc 4055 push rbp

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR: 0x116

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`06fa9ee8 fffff803`e6ff20ea : 00000000`00000000 00000000`00000116 fffff880`06faa050 fffff803`e6f764b8 : nt!DbgBreakPointWithStatus
fffff880`06fa9ef0 fffff803`e6ff1742 : 00000000`00000003 fffff880`06faa050 fffff803`e6f76e90 00000000`00000116 : nt!KiBugCheckDebugBreak+0x12
fffff880`06fa9f50 fffff803`e6ef7144 : fffffa80`1e2df4e0 fffff880`020b99c0 fffffa80`1d31f010 00000000`00000000 : nt!KeBugCheck2+0x79f
fffff880`06faa670 fffff880`04d31dcb : 00000000`00000116 fffffa80`1ed114d0 fffff880`0725cefc 00000000`00000000 : nt!KeBugCheckEx+0x104
fffff880`06faa6b0 fffff880`04d30548 : fffff880`0725cefc fffffa80`1ed114d0 fffff880`06faa7b9 00000000`00000180 : dxgkrnl!TdrBugcheckOnTimeout+0xef
fffff880`06faa6f0 fffff880`04c11608 : fffffa80`1ed114d0 fffff880`06faa7b9 00000000`0000000f fffffa80`1d31f8f8 : dxgkrnl!TdrIsRecoveryRequired+0x198
fffff880`06faa720 fffff880`04c459f9 : 00000000`00000001 fffff880`06faa8a0 fffff880`06faa920 00000000`00000000 : dxgmms1!VidSchiReportHwHang+0x438
fffff880`06faa820 fffff880`04c3ff72 : fffffa80`1d31f010 fffff780`00000320 fffffa80`1d31f770 fffffa80`1d31f010 : dxgmms1!VidSchWaitForCompletionEvent+0x411
fffff880`06faa8e0 fffff880`04c4206c : fffffa80`1d31f010 fffffa80`1d31f450 fffffa80`1d31f450 00000000`00000000 : dxgmms1!VidSchiWaitForEmptyHwQueue+0x9a
fffff880`06faa9d0 fffff880`04c3ea85 : 00000000`00000000 fffffa80`1d31f010 fffffa80`1d31f450 00000000`00000000 : dxgmms1!VidSchiSuspend+0x74
fffff880`06faaa00 fffff880`04c09fe5 : ffffffff`ff676980 00000000`00000001 fffff880`06faab69 fffffa80`1d31f010 : dxgmms1!VidSchiWaitForSchedulerEvents+0x249
fffff880`06faaaa0 fffff880`04c3e646 : 00000000`00000000 fffffa80`1d585660 fffffa80`1d44d7f0 fffffa80`1d31f010 : dxgmms1!VidSchiScheduleCommandToRun+0x289
fffff880`06faabd0 fffff803`e6e9e521 : fffffa80`1d6b9b00 fffffa80`1d31f010 fffff880`03932140 00000000`04d91ecb : dxgmms1!VidSchiWorkerThread+0xca
fffff880`06faac10 fffff803`e6edcdd6 : fffff880`03926180 fffffa80`1d6b9b00 fffff880`03932140 fffffa80`19ac7500 : nt!PspSystemThreadStartup+0x59
fffff880`06faac60 00000000`00000000 : fffff880`06fab000 fffff880`06fa5000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
atikmpag+8efc
fffff880`0725cefc 4055 push rbp

SYMBOL_NAME: atikmpag+8efc

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: atikmpag

IMAGE_NAME: atikmpag.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fdf9279

FAILURE_BUCKET_ID: 0x116_IMAGE_atikmpag.sys

BUCKET_ID: 0x116_IMAGE_atikmpag.sys
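A couple of values in these dumps can be decoded by hand. DEBUG_FLR_IMAGE_TIMESTAMP is the driver image's build timestamp in seconds since the Unix epoch, written in hex, and subtracting the offset in FAULTING_IP from the faulting address gives the address the driver module was loaded at. Here is a minimal Python sketch using the values from the two dumps above; the function names are made up for illustration.

```python
from datetime import datetime, timezone


def image_timestamp_to_utc(hex_value: str) -> datetime:
    """Decode a hex image timestamp (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


def module_base(faulting_address: int, offset: int) -> int:
    """Compute the module load address from 'module+offset' and the absolute address."""
    return faulting_address - offset


if __name__ == "__main__":
    # (driver, timestamp, faulting address, offset) copied from the dumps.
    dumps = [
        ("nvlddmkm.sys", "4fdf93d7", 0xFFFFF8800782D0D8, 0x1AE0D8),
        ("atikmpag.sys", "4fdf9279", 0xFFFFF8800725CEFC, 0x8EFC),
    ]
    for name, stamp, address, offset in dumps:
        built = image_timestamp_to_utc(stamp)
        base = module_base(address, offset)
        print(f"{name}: built {built:%Y-%m-%d %H:%M:%S} UTC, loaded at {base:#018x}")
```

Comparing the build dates with the driver release notes is a quick way to confirm which driver version was actually loaded when the crash happened.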

 

This was not really helping me much, and I decided to repeat the tests but use the checked build of Windows 8 to help troubleshoot.

With the kernel debugger attached, I captured the following ASSERT during the boot:

Windows 8 Kernel Version 9200 MP (1 procs) Checked x64
Built by: 9200.16384.amd64chk.win8_rtm.120725-1247
Machine Name:
Kernel base = 0xfffff802`0e01d000 PsLoadedModuleList = 0xfffff802`0e760ac0
System Uptime: 0 days 0:00:06.228 (checked kernels begin at 49 days)
Assertion: The BIOS has reported inconsistent resources (_CRS). Please upgrade your BIOS.
ACPI!PnpBiosGetDeviceResourceList+0x15e:
fffff880`012c3c2a cd2c int 2Ch
...
Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

PROCESS_NAME: System

FAULTING_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880`012c3c2a cd2c int 2Ch

ERROR_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

EXCEPTION_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: 0x0

CURRENT_IRQL: 0

LOCK_ADDRESS: fffff8020e7c5d60 -- (!locks fffff8020e7c5d60)

Resource @ nt!PiEngineLock (0xfffff8020e7c5d60) Exclusively owned
Threads: fffffa8019a36040-01<*>
1 total locks, 1 locks currently held

PNP_TRIAGE:
Lock address : 0xfffff8020e7c5d60
Thread Count : 1
Thread address: 0xfffffa8019a36040
Thread wait : 0x105eccd4

LAST_CONTROL_TRANSFER: from fffff880012b736f to fffff880012c3c2a

STACK_TEXT:
fffff880`009b4b30 fffff880`012b736f : fffffa80`23a9e900 fffff880`012a7e01 fffff880`009b4c08 fffff880`012a7e70 : ACPI!PnpBiosGetDeviceResourceList+0x15e
fffff880`009b4bd0 fffff880`0125acba : fffffa80`23a9e900 fffffa80`19ac54c0 fffff880`012a7e70 fffffa80`1f477010 : ACPI!ACPIBusIrpQueryResourceRequirements+0x8b
fffff880`009b4c50 fffff802`0e91b6a4 : fffffa80`23a9e900 fffffa80`19ac54c0 fffff880`009b4db0 fffffa80`23a9e900 : ACPI!ACPIDispatchIrp+0x2a6
fffff880`009b4cf0 fffff802`0e91cd1b : fffffa80`23a9e900 fffff880`009b4db0 00000001`c00000bb 00000000`00000000 : nt!IopSynchronousCall+0x10c
fffff880`009b4d80 fffff802`0e915bdb : fffffa80`23a9e900 fffff880`009b4e50 fffffa80`23a4f850 00000000`0000001e : nt!PpIrpQueryResourceRequirements+0x5f
fffff880`009b4e10 fffff802`0e91748d : fffffa80`23a9b8e0 00000000`00000000 ffffffff`80000218 fffffa80`23a9b8e0 : nt!PiQueryResourceRequirements+0x47
fffff880`009b4ea0 fffff802`0e91a1f2 : fffffa80`23a9b8e0 fffffa80`23a9b8e0 00000000`00000001 00000000`00000000 : nt!PiProcessNewDeviceNode+0x159d
fffff880`009b5070 fffff802`0e08feb5 : fffffa80`19adcd20 00000000`00000000 fffff880`009b5358 00000000`00000000 : nt!PipProcessDevNodeTree+0x1fe
fffff880`009b5310 fffff802`0e08fb59 : 00000000`00000000 00000000`00000000 00000000`00000000 fffffa80`37e19cc0 : nt!PnpDeviceActionWorker+0x345
fffff880`009b53d0 fffff802`0ed4010d : 00000000`00000000 fffff8a0`00000007 fffff8a0`00f08c00 00000000`00000000 : nt!PnpRequestDeviceAction+0x2ed
fffff880`009b5420 fffff802`0ed3b39d : fffff802`0d536800 fffff802`0e7c83c0 00000000`00000006 fffff802`0d536800 : nt!IopInitializeBootDrivers+0x905
fffff880`009b5650 fffff802`0ed2deb5 : fffff802`0d536800 00000000`00000000 fffff802`0d536800 fffff802`0d51ebf0 : nt!IoInitSystem+0xb5d
fffff880`009b59b0 fffff802`0e82d013 : fffff802`0d536800 fffffa80`19a36040 00000000`00000000 fffffa80`19ab3040 : nt!Phase1InitializationDiscard+0x1899
fffff880`009b5bc0 fffff802`0e1b289e : fffff802`0d536800 fffff802`0d536800 00000000`00000000 00000000`00000000 : nt!Phase1Initialization+0x13
fffff880`009b5bf0 fffff802`0e24ef96 : fffff802`0e82d000 fffff802`0d536800 fffff802`0e6c6180 00000000`f8ffffff : nt!PspSystemThreadStartup+0x1a2
fffff880`009b5c60 00000000`00000000 : fffff880`009b6000 fffff880`009b0000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
ACPI!PnpBiosGetDeviceResourceList+15e
fffff880`012c3c2a cd2c int 2Ch

SYMBOL_STACK_INDEX: 0

SYMBOL_NAME: ACPI!PnpBiosGetDeviceResourceList+15e

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: ACPI

IMAGE_NAME: ACPI.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 50109dd0

BUCKET_ID_FUNC_OFFSET: 15e

FAILURE_BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList

 

This is interesting: the kernel asserts on a problem reported by the BIOS.
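Only a handful of fields in the analysis output matter for triage. As a rough sketch, and not part of the debugger tooling, the Python below pulls the `KEY: value` lines out of text saved from the debugger and decodes the image timestamp, which is a hexadecimal count of seconds since 1970-01-01 UTC. The helper names `triage_fields` and `image_time` are made up for illustration.

```python
import re
from datetime import datetime, timezone

# A few lines copied from the analysis output above.
SAMPLE = """\
FOLLOWUP_IP:
ACPI!PnpBiosGetDeviceResourceList+15e

MODULE_NAME: ACPI

IMAGE_NAME: ACPI.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 50109dd0

FAILURE_BUCKET_ID: 0x0_ACPI!PnpBiosGetDeviceResourceList
"""

# Upper-case key, a colon, then a value on the same line.
FIELD_RE = re.compile(r"^([A-Z][A-Z0-9_]*):\s*(\S.*)$")


def triage_fields(text):
    """Collect KEY: value pairs; keys with no value on the same line are skipped."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def image_time(hex_stamp):
    """Decode the hex image timestamp (seconds since 1970-01-01 UTC)."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


fields = triage_fields(SAMPLE)
print(fields["MODULE_NAME"], fields["FAILURE_BUCKET_ID"])
print(image_time(fields["DEBUG_FLR_IMAGE_TIMESTAMP"]).date())
```

Running the same helper over both dumps makes it easy to see that the two failures point at different modules, atikmpag in the first and ACPI in the second.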

I contacted SuperMicro support, they said they will investigate the BIOS failure, and they suggested I try to use PCIe slot #3 instead of slot #5. The motherboard manual mentions that slots #1, #2, and #3 are to be used if CPU #1 is installed, and slots #4, #5, and #6 to be used only if CPU #2 is installed.

PCIe

I have both processors installed, so it never occurred to me to avoid the more conveniently located slot #5. I moved the graphic card to CPU #1 slot #3, and voila, the install succeeded and Windows 8 was up and running!

 

I repeated the checked build test with the graphic card in slot #3, and the same BIOS ASSERT was reported, so the BIOS ASSERT seems to be unrelated to the VIDEO_TDR_FAILURE error.

 

This was a very frustrating problem, and I still don’t understand the root cause, but I am happy to be able to finally switch both workstations to Windows 8.

WordPress.com 404 With Blogger Permalinks

Part of the research I did before migrating from Blogger to WordPress.com was to make sure that existing Blogger permalinks would resolve correctly once the old posts were imported into WordPress.com. At the time all seemed fine, but soon after migrating, I received alerts from Google Webmaster Tools about an increase in site errors, specifically 404 errors.

Some background: Permalinks are the URLs that point directly to specific posts on the blog. These URLs are known by search engines, are shared on forums, and are basically the static addresses of posts. Blogger and WordPress.com use different styles of permalinks. WordPress.com allows some customization of permalinks, but unlike WordPress.org, there is no support for custom plugins to handle rewrites for permalinks, 302’s or 404’s.

Although not documented anywhere, WordPress.com does support Blogger style permalinks, and will correctly redirect the Blogger style link to the WordPress.com style page. As an example, see the links below, one for Blogger and one for WordPress.com:

http://blogdotinsanegenius.blogspot.com/2012/06/looks-can-be-deceiving.html
https://blogdotinsanegenius.wordpress.com/2012/06/looks-can-be-deceiving

Search engines will know the link by the old Blogger style URL, and both styles of link correctly resolve to the current page:

https://blog.insanegenius.com/2012/06/19/looks-can-be-deceiving
https://blog.insanegenius.com/2012/06/looks-can-be-deceiving.html

So why is it that Google Webmaster Tools reported a sudden spike in 404’s?

Google.404.1

By reviewing the links that report 404, I noticed that the permalink format of certain posts on WordPress.com was slightly different to the Blogger permalinks.

http://blogdotinsanegenius.blogspot.com/2009/10/hitachi-a7k2000-and-seagate-barracude.html
http://blogdotinsanegenius.blogspot.com/2010/05/zotac-xboxhd-id11-mkv-h264-video.html
http://blogdotinsanegenius.blogspot.com/2008/03/printing-from-network.html

https://blogdotinsanegenius.wordpress.com/2009/10/11/hitachi-ultrastar-and-seagate-barracude-lp-2tb-drives/
https://blogdotinsanegenius.wordpress.com/2010/05/28/zotac-xboxhd-id11-mkv-h-264-video-playback-performance/
https://blogdotinsanegenius.wordpress.com/2008/03/30/printing-from-the-network/

Notice the difference? Blogger appears to keep links short, and remove words like “the” and “and”.

I contacted WordPress.com support, and they provided a manual solution. They suggested that I modify the “slug” of each 404 post to match the Blogger style permalink.

Slug

This resolved the problem with the top 404’s, but I would have expected the Blogger import plugin to take care of this for me.
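Finding which posts need a new slug can be scripted rather than done by eye. The following is a minimal sketch, assuming the Blogger form is `/YYYY/MM/slug.html` and the WordPress.com form ends in the slug, as in the examples above. The function names are hypothetical, and the pairing of old and new URLs is assumed to be known already.

```python
import posixpath
from urllib.parse import urlparse

# (Blogger permalink, WordPress.com permalink) pairs taken from this post.
PAIRS = [
    ("http://blogdotinsanegenius.blogspot.com/2012/06/looks-can-be-deceiving.html",
     "https://blogdotinsanegenius.wordpress.com/2012/06/looks-can-be-deceiving"),
    ("http://blogdotinsanegenius.blogspot.com/2009/10/hitachi-a7k2000-and-seagate-barracude.html",
     "https://blogdotinsanegenius.wordpress.com/2009/10/11/hitachi-ultrastar-and-seagate-barracude-lp-2tb-drives/"),
    ("http://blogdotinsanegenius.blogspot.com/2010/05/zotac-xboxhd-id11-mkv-h264-video.html",
     "https://blogdotinsanegenius.wordpress.com/2010/05/28/zotac-xboxhd-id11-mkv-h-264-video-playback-performance/"),
    ("http://blogdotinsanegenius.blogspot.com/2008/03/printing-from-network.html",
     "https://blogdotinsanegenius.wordpress.com/2008/03/30/printing-from-the-network/"),
]


def blogger_slug(url):
    """Slug from a Blogger permalink: the file name without '.html'."""
    name = posixpath.basename(urlparse(url).path)
    return name[:-len(".html")] if name.endswith(".html") else name


def wordpress_slug(url):
    """Slug from a WordPress.com permalink: the last path segment."""
    return posixpath.basename(urlparse(url).path.rstrip("/"))


def slugs_to_fix(pairs):
    """Return (current WordPress slug, Blogger slug) for every mismatch."""
    return [(wordpress_slug(wp), blogger_slug(old))
            for old, wp in pairs
            if wordpress_slug(wp) != blogger_slug(old)]


for current, wanted in slugs_to_fix(PAIRS):
    print(f"{current} -> {wanted}")
```

Each printed line names the slug a post has now and the slug it would need for the old Blogger link to resolve. The first pair already matches, so it is not listed.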

But, I soon received another alert email from Google Webmaster Tools, and this time the 404 posts looked a bit different.

Google.404.2

Notice that all the links contain parameters in the URL (I think these are old style Google Analytics parameters). Without the parameters the redirect works, but with any parameters the redirect fails.

https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html
https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html?m=1
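To separate parameter-related 404’s from slug-related ones in a report, it helps to strip the query string and test the bare URL. A small sketch using Python's standard `urllib.parse` follows. The function names are illustrative only, and actually fetching the URLs is left out.

```python
from urllib.parse import urlsplit, urlunsplit


def has_query(url):
    """True if the URL carries any query parameters."""
    return bool(urlsplit(url).query)


def without_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


REPORTED = [
    "https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html",
    "https://blog.insanegenius.com/2009/09/western-digital-re4-gp-2tb-drive.html?m=1",
]

for url in REPORTED:
    if has_query(url):
        print(url, "->", without_query(url))
```

If the stripped URL resolves while the original does not, the parameters are the cause rather than the slug.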

I again contacted WordPress.com support, and I am still awaiting a resolution.

[Update: 9 August 2012]
Just got an email from WordPress.com support, the problem with parameters is fixed, thank you.

Windows 8 Install Crash With NVidia Quadro 5000

I got Windows 8 RTM installed on my two SuperMicro SuperWorkstation machines, with a bit of trouble along the way, but nothing I could not work around. But, I ran into a problem with NVidia Quadro 5000 cards causing a VIDEO_TDR_FAILURE BSOD during the Windows 8 install process.

 

I was running my two workstations with ATI FirePro V7900 graphic cards, but I decided I wanted a bit more rendering horsepower. I wanted a card that had a good balance between modern architecture, great 2D performance, good 3D performance, OpenCL or CUDA support, and reasonable power consumption. I found the Tom’s Hardware Workstation Graphics 2012 benchmark site to be very informative, and I decided that the NVidia Quadro 5000 was a very good choice.

I replaced my FirePro V7900 with the Quadro 5000, and started the Windows 8 x64 RTM install. All went well until the first reboot during the install, when the machine would blue screen crash with a VIDEO_TDR_FAILURE. During the install process the hardware is identified, the appropriate drivers are extracted, and on the reboot those drivers are started. It appears that the NVidia driver crashes soon after it loads.

 

The Timeout Detection and Recovery (TDR) feature was added to Windows Vista, and was a way for the OS to recover from a renderer failure without the need to restart the machine. Typically the user will see a notification that the graphic subsystem was restarted, but in cases where the restart fails, a VIDEO_TDR_FAILURE blue screen crash is generated.

The web is full of reports of NVidia VIDEO_TDR_FAILURE crashes, and solutions typically involve replacing the hardware or updating drivers. In my case I had two new machines, and two new graphic cards, and a brand new operating system, and both cards on both machines crashed.

I contacted SuperMicro support, and, responsive as they always are, they said they would investigate.

I also contacted PNY support, as PNY is the manufacturer of the NVidia Quadro 5000. Here is their reply:

Again, I am sorry, but we do not list Windows 8 (yet) as being compatible with the Quadro 5000, or any other Quadro or Geforce card we manufacture. Until it is publically and commercially available, we cannot provide support for Windows 8. Windows 8 is not available to the end user yet, and it is in testing, as is the Nvidia driver. If you find issues, you must report them to Microsoft in order to improve compatibility in the final release. There is obviously a compatibility problem with Windows 8 and the Quadro 5000 right now (according to your testing of TWO cards), and unfortunately there is nothing we can do to fix it while in is not available to the public. My best advice is to try it again when it is officially released sometime in 2013.

Not very helpful at all, and their concept of Windows 8 release timing, and their responsibility, is way out there.

 

The real problem here is that it is the in-box NVidia drivers that are crashing, not drivers I install later. And as it is the in-box graphic drivers that crash, there is no (easy) way to update the drivers used by the Windows 8 install media.

 

I had previously used a Quadro 4000 card on the same machines, and they installed without incident, so it appears to be something unique to the Quadro 5000 cards.

At this time I am waiting for SuperMicro to get back to me with suggestions, as I have little hope of hearing anything useful from PNY.

Windows 8 Install Hangs Booting From LSI 2308 SAS Controller

I’ve previously posted about problems installing Windows 8 on SuperMicro machines, and that SuperMicro released a Beta BIOS that solved the install problems. I’ve since run into two more problems: the install hanging when booting off the LSI 2308 SAS controller, and a BSOD when using a Quadro 5000 video card (more on that in a later post).

 

I have two SuperWorkstation machines, a 7047A-T using a X9DAi motherboard, and a 7047A-73 using a X9DA7 motherboard.

The X9DAi and X9DA7 both use the Intel C602 chipset. The X9DAi and X9DA7 both have 2 x SATA3 ports, 4 x SATA2 ports, and 4 x SAS / SATA2 ports. The X9DA7 has an additional LSI 2308 controller with 8 x SAS2 / SATA3 ports.

On the 7047A-T / X9DAi machine, the 8 x hot-swap drive trays are connected to the 2 x SATA3, 2 x SATA2, and 4 x SAS / SATA2 ports.

On the 7047A-73 / X9DA7 machine, the 8 x hot-swap drive trays are connected to the 8 x LSI 2308 ports.

SuperMicro support provided me with Beta BIOS’s for the X9DAi and X9DA7 motherboards. These resolved the ACPI_BIOS_ERROR, and allowed me to install Windows 8 RTM on these machines, or at least get past the BSOD while booting the install media.

 

I configured both machines with:

 

In the 7047A-T / X9DAi machine, I installed the SSD drive in slot-0 of the hot-swap trays, connected to SATA3 port-0. I installed Windows 8 x64 RTM without issue.

 

In the 7047A-73 / X9DA7 machine I installed the SSD drive in slot-0 of the hot-swap trays, connected to LSI2308 port-0. I started the Windows 8 x64 RTM install, and the install hung at 0% while copying files.

While in this state, I suspected the problem to be IO related, so I pressed Shift-F10 to open a console window and ran diskpart, and diskpart hung.

I downloaded the latest LSI 2308 drivers from the Supermicro FTP site. I ran the install again, this time I manually loaded the drivers instead of using the in-box drivers, same problem, hang at 0%.

LSI does not make drivers directly available for HBA chips, but the LSI SAS 9205-8e uses the LSI 2308, and I downloaded the drivers from LSI. They were the same version as the drivers available on the SuperMicro FTP site.

 

I contacted SuperMicro support, they suggested I install using the SATA3 port while they research the problem. Connecting the SSD drive to SATA3 port-0 installed fine.

 

I tested the same setup using Windows 7, and although Windows 7 did not include in-box drivers for the LSI 2308, after loading the drivers, Windows 7 installed fine with the SSD connected to LSI2308 port-0.

This probably indicates a Windows 8 compatibility problem with the LSI 2308 driver, or HBA firmware.

 

LSI HBA’s can be configured to run in Initiator Target (IT) or Integrated RAID (IR) mode. This can be changed by flashing with the appropriate IT or IR firmware. IT firmware is typically preferred where there is no need for hardware RAID and all disks will be in JBOD mode, e.g. for use with ZFS or Storage Spaces.

When you flash between IT and IR mode, you need to erase the firmware before re-flashing, i.e. you cannot simply flash one mode on top of another mode. On the SuperMicro motherboards, you also need to perform the flash operation from within the EFI shell; flashing from other environments will fail. You can follow these KB’s to help with the process: LSI 16266, SuperMicro 14368, and SuperMicro 14151. I would not recommend using the SuperMicro 14368 method, as it wipes the entire firmware memory, and you will need to manually re-enter the SAS address. It is basically the difference between using “sas2flash -o -e 6” and “sas2flash -o -e 7”, see the SAS2flash reference guide for details.

SAS2Flash

 

The X9DA7 motherboard came with firmware version 13.0.0.56 for the LSI 2308, configured in IR mode. I updated the firmware using the firmware from the SuperMicro FTP site to 13.0.0.57 in IT mode.

The update process I followed was to boot into the EFI shell with a USB drive attached. The drive must contain the firmware, the boot BIOS, and the SAS2Flash.efi tool.

In the EFI shell run the “map” command to list the hardware and see which drive is the USB drive, mount that drive using “mount fs[drive number]:”, e.g. “mount fs1:”, then switch to the USB drive using “fs1:”:

map
mount fs1:
fs1:

Then wipe the flash with “sas2flash -o -e 6”, program the new firmware and boot code with “sas2flash -o -f [firmware file] -b [bootcode file]”, e.g. “sas2flash -o -f 2308IT13.5FW -b mptsas2.rom”, and restart:

sas2flash -o -e 6
sas2flash -o -f 2308IT13.5FW -b mptsas2.rom
reset

Same problem, hang at 0%.

 

I again referred to the LSI site for updated firmware for the LSI 2308, and the LSI SAS 9205-8e and LSI SAS 9207-8e include firmware P14 version 14.0.0.0, a major revision upgrade from version 13.0.0.57 from the SuperMicro site.
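When comparing firmware versions like these in a script, the dotted strings are best compared as tuples of integers, not as text. A minimal sketch follows, with `parse_version` being an illustrative helper name.

```python
def parse_version(text):
    """Split a dotted version such as '13.0.0.57' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


shipped = parse_version("13.0.0.56")     # firmware the X9DA7 came with
supermicro = parse_version("13.0.0.57")  # firmware from the SuperMicro FTP site
lsi_p14 = parse_version("14.0.0.0")      # P14 firmware from the LSI site

print(shipped < supermicro < lsi_p14)
```

Tuples compare element by element, so the major revision decides first. A plain string comparison would get cases such as 13.0.0.9 versus 13.0.0.57 wrong.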

The P14 firmware packages do not include the EFI version of SAS2Flash, but a bit of search engine exploration showed it is still included in the P13 packages.

 

I am not quite brave enough to flash to this version yet, as a failed flash will require a hardware swap. I’ll continue running this machine with the SSD connected to the SATA3 port.

 

At this point I am waiting for SuperMicro support to get back to me with a solution, or to confirm that I can flash the P14 firmware and see if that resolves the issue.