Power Saving RAID Controller (Continued)

This post continues from my last post on power saving RAID controllers.
It turns out the Adaptec 5 series controllers are not that workstation friendly.
I was testing with Western Digital drives: 1TB Caviar Black WD1001FALS, 2TB Caviar Green WD20EADS, and 1TB RE3 WD1002FBYS.
I also wanted to test with the new 2TB RE4-GP WD2002FYPS drives, but they are on backorder.
I found that the Caviar Black WD1001FALS and Caviar Green WD20EADS drives were just dropping out of the array for no apparent reason, yet they were still listed in ASM as if nothing was wrong.
I also noticed that over time ASM listed medium errors and aborted command errors for these drives.
In comparison the RE3 WD1002FBYS drives worked perfectly.
A little searching pointed me to a feature of WD drives called Time Limited Error Recovery (TLER).
You can read more about TLER here, or here, or here.
Basically, the enterprise class drives have TLER enabled and the consumer drives do not. TLER caps the time a drive spends trying to recover from a read or write error, so without it a drive can stay busy with recovery long enough that the RAID controller, not getting a response to its command in a reasonable amount of time, drops the drive out of the array.
The same drives worked perfectly in single drive, RAID-0, and RAID-1 configurations with an Intel ICH10R RAID controller. Granted, the Intel chipset controller is not in the same performance league.
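To make the TLER interaction concrete, here is a minimal sketch in Python. It is a toy model, not how any real controller firmware works: the `Drive` class, the `controller_keeps_drive` function, the 7 second recovery limit, and the 8 second controller timeout are all illustrative assumptions of mine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Drive:
    name: str
    # Maximum time the drive spends on error recovery before giving up
    # and reporting the error. None means no limit (TLER disabled).
    recovery_limit_s: Optional[float]

    def response_time(self, recovery_needed_s: float) -> float:
        """Seconds until the drive answers a command that hit a bad sector."""
        if self.recovery_limit_s is None:
            return recovery_needed_s
        return min(recovery_needed_s, self.recovery_limit_s)


def controller_keeps_drive(drive: Drive, recovery_needed_s: float,
                           controller_timeout_s: float = 8.0) -> bool:
    """True if the drive answers before the (assumed) controller timeout."""
    return drive.response_time(recovery_needed_s) <= controller_timeout_s


# Assumed values: a 7 s limit on the enterprise drive, none on the consumer drive.
enterprise = Drive("RE3 (TLER enabled)", recovery_limit_s=7.0)
consumer = Drive("Caviar Black (TLER disabled)", recovery_limit_s=None)

for drive in (enterprise, consumer):
    kept = controller_keeps_drive(drive, recovery_needed_s=30.0)
    print(drive.name, "stays in the array" if kept else "is dropped")
```

In this model the enterprise drive gives up on recovery early and reports the error, leaving the controller to repair the data from redundancy, while the consumer drive keeps retrying past the controller's timeout.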
The Adaptec 5805 and 5445 controllers I tested did let the drives spin down, but the controllers are not S3 sleep friendly.
Every time my system resumes from S3 sleep, ASM complains “The battery-backup cache device needs a new battery: controller 1.”, yet when I look in ASM it tells me the battery is fine.
Whenever the system enters S3 sleep, the controller does not spin down any of the drives. This means that all the drives in external enclosures, or on external power, keep spinning while the machine is sleeping.
This defeats the purpose of power saving and sleep.
The embedded Intel ICH10R RAID controller did correctly spin down all drives before entering sleep.
Since installing the ASM utility, my system takes noticeably longer to shut down.
Vista provides a convenient, although not always accurate, way to see what is impacting system performance in terms of event timing, and ASM was identified as adding 16s to every shutdown.
Under [Computer Management][Event Viewer][Applications and Services Logs][Microsoft][Windows][Diagnostics-Performance][Operational], I see this for every shutdown event:
This service caused a delay in the system shutdown process:
File Name : AdaptecStorageManagerAgent
Friendly Name :
Version :
Total Time : 20002ms
Degradation Time : 16002ms
Incident Time (UTC) : 6/11/2009 3:15:57 AM
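If you want to compare these records across several shutdowns, the fields are easy to pull apart. The following is a rough Python sketch that parses the text exactly as shown above; the `parse_record` and `to_ms` helpers are my own illustrative names, not part of any Windows API, and the sketch assumes the record has been copied out of Event Viewer as plain text.

```python
import re

# The record as it appears in the Diagnostics-Performance log entry above.
RECORD = """\
File Name : AdaptecStorageManagerAgent
Friendly Name :
Version :
Total Time : 20002ms
Degradation Time : 16002ms
Incident Time (UTC) : 6/11/2009 3:15:57 AM
"""


def parse_record(text: str) -> dict:
    """Split 'Key : Value' lines on the first colon; empty values are kept."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_ms(value: str) -> int:
    """Convert a value such as '20002ms' to an integer number of milliseconds."""
    match = re.fullmatch(r"(\d+)ms", value)
    if match is None:
        raise ValueError(f"not a millisecond value: {value!r}")
    return int(match.group(1))


fields = parse_record(RECORD)
total_ms = to_ms(fields["Total Time"])
degradation_ms = to_ms(fields["Degradation Time"])
print(fields["File Name"], "total:", total_ms, "ms,",
      "degradation:", degradation_ms, "ms,",
      "remainder:", total_ms - degradation_ms, "ms")
```

The remainder, total time minus degradation time, is presumably the portion of the service's shutdown time that Windows does not count as a delay.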
It really seems that Adaptec did not design or test the 5 series controllers for use in workstations. This is unfortunate, because performance-wise the 5 series cards really are great.
[Update: 22 August 2009]
I received several WD RE4-GP / WD2002FYPS drives.
I tested with W2K8R2 booted from a WD RE3 / WD1002FBYS drive connected to an Intel ICH10R controller on an Intel S5000PSL server board.
I tested 8 drives in RAID6 connected to an LSI 8888ELP controller, and they worked perfectly.
I connected the same 8 drives to an Adaptec 51245 controller, and at boot only 2 out of 8 drives were recognized.
After booting, ASM showed all 8 drives, but they were continuously dropping out and back in.
I received confirmation of similar failures with the RE4 drives and Adaptec 5 series cards from a blog reader.
Adaptec support told him to temporarily run the drives at 1.5Gb/s. Apparently this does work, although I did not test it myself. Clearly this is neither a long term solution nor acceptable.
I am still waiting to hear back from Adaptec and WD support.
[Update: 30 August 2009]
I received a reply from Adaptec support, and the news is not good: there is a hardware compatibility problem between the WD RE4-GP / WD2002FYPS drives and the 51245 controller.
“I am afraid currently these drives are not supported with this model of controller. This is due to a compatibility issue with the onboard expander on the 51245 card. We are working on a hardware solution to this problem, but I am currently not able to say in what timeframe this will come.”
[Update: 31 August 2009]
I asked support if a firmware update will fix the issue, or if a hardware change will be required.
“Correct, a hardware solution, this would mean the card would need to be swapped, not a firmeware update. I can’t tell you for sure when the solution would come as its difficult to predict the amount of time required to certify the solution but my estimate would be around the end of September.”
[Update: 6 September 2009]
I experienced similar timeouts testing an Areca ARC-1680 controller.
Areca support was very forthcoming with the problem and the solution.
“this issue had been found few weeks ago and problem had been reported to WD and Intel which are vendors for hard drive and processor on controller. because the problem is physical layer issue which Areca have no ability to fix it.
but both Intel and WD have no fix available for this issue, the only solution is recommend customer change to SATA150 mode.
and they had closed this issue by this solution.
so i do not think a fix for SATA300 mode may available, sorry for the inconvenience.”
That explains why the problem happens with the Areca and Adaptec controllers but not the LSI: both the Areca and the Adaptec use the Intel IOP348 processor.

7 Comments

  1. Rodger says:

    I just purchased an Adaptec 51645 RAID controller and 8 WD2002FYPS 2TB drives. I am seeing the same behavior you mention. I contacted Adaptec Support and they confirmed the issues and said that tomorrow (09-Sep-2009) they would be releasing quarterly updates. This would include firmware, drivers, and compatibility guide. I was getting ready to RMA all my drives but I am going to wait a couple days. Thanks for this blog entry. It helped me a lot.

  2. Kevin says:

    So Rodger- did it work?! I'm dying to know, since I was about to hit 'purchase' on NewEgg for 24 of these drives and would like someway, somehow, to find a controller that would actually work with them…

  3. Karl says:

    I have been trying to get the WD2002FYPS drives to work on the Areca 5020 since July. I am the case Areca has been trying to troubleshoot you mentioned above. I have a RAID1 with a hot spare and it crashes consistently (because of 2TB limit on XP 32-bit, with hopes of Windows 7 64-bit install soon). It will degrade or show read errors constantly. I have to shut the RAID down and rescan my hardware to get it to show up again. I think it is also because of read/write errors because the controller can't handle the 64MB cache in the WD2002FYPS's. The other problem is the delayed writes, degrades, etc. are affecting my internal RAID1 on my Precision 380 (2 10K Velociraptors). I have had to replace 3 of them. I have also replaced the Areca 5020 once, 2 WD2002FYPS drives and the 3 Velociraptor drives and it still doesn't work. I have tried every configuration and I refuse to have to revert the drives to 1.5gbps, turn off LPM or use USB instead of SATA just to get them to work. Seems like it defeats the purpose buying these to begin with! If anyone has any ideas, it would be gratefully appreciated!

  4. godskins says:

    Scuse my english.. I speak FR usualy… I have the same prob…
    Adaptec 51245 firmware 5.2.0 Build 17380
    HDD = 12X 1TB WD1001FALS
    Asus P6T6 Re workstation
    Windows 2008 x64 R1 SP2 Français
    Enterprise Edition 6.0 Service Pack 2 (Build #6002)
    I'm don't use the "power saver" utility… i'm realy disappointed, aggry about a product of this cost…

  5. _ says:

    im using the 5805 with 4x WD20EADS. my problem is that although i enabled low rpm and disk shutdown mode, they continue to run. disks are factory set at 13ish seconds to spin down, (OS separate disk of course) latest firmware/drivers. i'm using a W meter to monitor this and it does not change. tried this on another PC with just one of those disks as a simple volume and i can still feel the disk vibrating a bit way past the configured powerdown timer….

  6. jim says:

    I want to say that Adaptec's support totally blows. After getting a lame response to several issues – basically they passed the buck to MB and FW of drives – I asked them to explain exactly what was meant by the literature on the back of the box. In particular, it states "Compatibility and Support You Can Count On" … "deliver unsurpassed compatibility – now and in the future." Adaptec didn't have to ball to respond. They said they knew of no issues with their board. Yet, Pieter's Blog clearly states they know there is a problem that requires a replacement board. As for them passing the buck, the drives were on the compatibility list with the appropriate FW. The drives drop and take 10 to 50 hours to verify or rebuild (6 arrays of RAID 1). What I do with hardware that doesn't get the support it deserves from the manufacturer, or is simply crappy hardware is: spike it to the data center wall (if it has lights and blinks, then wire it up) post the response from the vendor, and any other supporting documents. Massively good for communicating to other professionals in the field that customers should not be ignored. Adaptec: Your reputation is going to be nailed to the wall soon for many to see.
