Western Digital RE4-GP 2TB Drive Problems

In my previous two posts I described my research into the power saving features of various enterprise class RAID controllers.
In this post I detail the results of my testing of the Western Digital RE4-GP enterprise class “green” drives when used with hardware RAID controllers from Adaptec, Areca, and LSI.
To summarize, the RE4-GP drive fails with a variety of problems. Adaptec, Areca, and LSI acknowledge the problem and lay the blame on WD, yet WD insists there are no known problems with the RE4-GP drives.
Test hardware:
Intel S5000PSL motherboard, dual Xeon E5450, 32GB RAM, firmware BIOS-98 BMC-65 FRUSDR-48
Adaptec 51245 RAID controller, firmware 17517, driver 5.2.0.17517
Areca ARC1680ix-12 RAID controller, firmware 1.47, driver 6.20.00.16_80819
LSI 8888ELP RAID controller, firmware 11.0.1-0017 (APP-1.40.62-0665), driver 4.16.0.64
Chenbro CK12803 28-port SAS expander, firmware AA11
Drive setup:
– Boot drive, 1 x 1TB WD Caviar Black WD1001FALS, firmware 05.00K05
Simple volume, connected to onboard Intel ICH10R controller running in RAID mode
– Data drives, 10 x 2TB WD RE4-GP WD2002FYPS drives, firmware 04.05G04
1 x hot spare, 3 x drive RAID5 4TB, 6 x drive RAID6 8TB, configured as GPT partitions, dynamic disks, and simple volumes
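As a quick sanity check of the array sizes listed above, here is a minimal Python sketch of the usable-capacity arithmetic. The function name `usable_tb` is my own for illustration, and the calculation ignores formatting overhead and the decimal vs. binary terabyte difference.

```python
def usable_tb(drives: int, drive_tb: float, parity_drives: int) -> float:
    """Usable capacity of a parity RAID set, ignoring formatting overhead."""
    if drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drives - parity_drives) * drive_tb


# RAID5 gives up one drive's worth of capacity to parity, RAID6 gives up two.
raid5_tb = usable_tb(drives=3, drive_tb=2, parity_drives=1)
raid6_tb = usable_tb(drives=6, drive_tb=2, parity_drives=2)

# 3 RAID5 members + 6 RAID6 members + 1 hot spare
total_drives = 3 + 6 + 1

print(raid5_tb, raid6_tb, total_drives)  # prints: 4 8 10
```

This matches the 4TB and 8TB volumes and accounts for all 10 data drives.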
I started testing the drives as shipped, with no jumpers, running at SATA-II / 3Gb/s speeds.
Adaptec 51245, SATA-II / 3Gb/s:
The Adaptec card has 3 x internal SFF-8087 ports and 1 x external SFF-8088 port, supporting 12 internal drives.
The Adaptec card had immediate problems with the RE4-GP drives; in the ASM utility the drives would randomly drop out and back in.
I could not complete testing.
Areca ARC1680ix-12, SATA-II / 3Gb/s:
The Areca card has 3 x internal SFF-8087 ports and 1 x external SFF-8088 port, supporting 12 internal drives.
Unlike the LSI and Adaptec cards that require locally installed management software, the Areca card is completely managed through a web interface from an embedded Ethernet port.
The Areca card allowed the RAID volumes to be created, but during initialization at around 7% the web interface stopped responding, requiring a cold reset.
I could not complete testing.
LSI 8888ELP and Chenbro CK12803, SATA-II / 3Gb/s:
The LSI card has 2 x internal SFF-8087 ports and 2 x external SFF-8088 ports, supporting 8 internal drives.
Since I needed to host 10 drives, I used the Chenbro 28 port SAS expander.
The 8888ELP support page only lists the v3 series drivers, while W2K8R2 ships with the v4 series drivers, so I used the latest v4 drivers from the new 6Gb/s LSI cards.
The LSI and Chenbro allowed the volumes to be created, but during initialization 4 drives dropped out, and initialization failed.
I could not complete testing.
I contacted WD, Areca, Adaptec, and LSI support with my findings.
WD support said there is nothing wrong with the RE4-GP, and that they are not aware of any problems with any RAID controllers.
When I insisted that there must be something wrong, they suggested I try to force the drives to SATA-I / 1.5Gb/s speed and see if that helps.
I tested at SATA-I / 1.5Gb/s speed, and achieved some success, but I still insisted that WD acknowledge the problem.
The case was escalated to WD engineering, and I am still waiting for an update.
Adaptec support acknowledged a problem with RE4-GP drives when used with high port count controllers, and said that a card hardware fix is being worked on.
I asked if the fix will be firmware or hardware, and was told hardware; the card will have to be swapped, but the timeframe is unknown.
Areca support acknowledged a problem between the Intel IOP348 controller and RE4-GP drives, and said that Intel and WD are aware of the problem, and that running the drives at SATA-I / 1.5Gb/s speed resolves it.
I asked if a fix to run at SATA-II / 3Gb/s speeds will be made available, and was told this will not be possible without hardware changes, and that no fix is planned.
LSI support acknowledged a problem with RE4-GP drives, and said that they have multiple cases open with WD, and that my best option is to use a different drive or to contact WD support.
I asked if a fix will become available, and they said that it is unlikely that a firmware update would be able to resolve the problem, and that WD would need to provide a fix.
This is rather disappointing. WD advertises the RE4-GP as an enterprise class drive, yet all three of the enterprise class RAID controllers I tested failed with the RE4-GP, and all three vendors blame WD, while WD insists there is nothing wrong with the RE4-GP.
I continued testing, this time with the SATA-I / 1.5Gb/s jumper set.
Adaptec 51245, SATA-I / 1.5Gb/s:
This time the Adaptec card had no problems seeing the arrays, although some of the drives continued to report link errors.
A much bigger problem was that the controller and battery were overheating, with the controller running at 103C / 217F.
In order to continue my testing I had to install an extra chassis fan to provide additional ventilation over the card.
The Adaptec and LSI have passive cooling, whereas the Areca has active cooling and only ran at around 51C / 124F.
The Areca and LSI batteries are off-board, and although a bit inconvenient to mount, they did not overheat like the Adaptec.
Initialization completed in 22 hours, compared to 52 hours for Areca and 8 hours for LSI.
The controller supports power management, and drives are spun down when not in use.
3 x Drive RAID5 4TB performance:

6 x Drive RAID6 8TB Performance:

Areca ARC1680ix-12, SATA-I / 1.5Gb/s:
This time the Areca card had no problems initializing the arrays.
Initialization completed in 52 hours, much longer compared to 22 hours for Adaptec and 8 hours for LSI.
Areca support said initialization time depends on the drive speed and controller load, and that the RE4-GP drives are known to be slow.
The controller supports power management, and drives are spun down when not in use.

3 x Drive RAID5 4TB performance:

6 x Drive RAID6 8TB Performance:

LSI 8888ELP and Chenbro CK12803, SATA-I / 1.5Gb/s:
This time only 2 drives dropped out, one out of each array, and initialization completed after I forced the drives back online.
Initialization completed in 8 hours, much quicker compared to 22 hours for Adaptec and 52 hours for Areca.

The controller only supports power management on unassigned drives; there is no support for spinning down drives that are configured but not in use.

3 x Drive RAID5 4TB performance:

6 x Drive RAID6 8TB Performance:

Although all three cards produced results when the RE4-GP drives were forced to SATA-I / 1.5Gb/s speeds, the results still show that the drives are unreliable.
The RE4-GP drive fails with a variety of problems. Adaptec, Areca, and LSI acknowledge the problem and lay the blame on WD, yet WD insists there are no known problems with the RE4-GP drives.
There are alternative low power drives available from Seagate and Hitachi.
I still haven’t forgiven Seagate for the endless troubles they caused with ES.2 drives and Intel IOP348 based controllers, and for, like WD, denying any problems with the drives, yet eventually releasing two firmware updates for the ES.2 drives.
I’ve always had good service from Hitachi drives, so maybe I’ll give the new Hitachi A7K2000 drives a run.
One thing is for sure: I will definitely be returning the RE4-GP drives.
[Update: 11 October 2009]
I tested the Seagate Barracuda LP and Hitachi Ultrastar 2TB drives.
[Update: 24 October 2009]
WD support still has not responded to my request for the firmware.

29 thoughts on “Western Digital RE4-GP 2TB Drive Problems”

  1. I have built 4 linux servers with WD2002FYPS drives over the past 3 months, all using Areca 1680ix controllers, and they have not been having drives drop out of the arrays. I did not install any jumpers on the HDD's; they all are set to the default SATA300.

     Each server has 16 HDD's configured as RAID 6 with a hot spare. I have a Chenbro backplane (RM31616) connected to the Areca.

     By the way, the Areca 1680ix-16 has four (not three) internal SFF8087 connectors, and one external SFF8088. It takes just under 10 hours to initialize the RAID 6. I found that it is highly beneficial to force the individual HDD write caches to "Enabled" with the Areca system menu (then reboot), otherwise the initialization would take days. With a battery backup, when the HDD write cache is set to auto, that corresponds to disabling the write cache, which is normally what you want, but during initialization of the array it is better to have it enabled temporarily.

     The only problem I had with these servers is with the v1.46 firmware that shipped on the Areca's. There was high latency the first time a file is read or written, about 2 or 3 seconds. But when I upgraded the firmware to v1.47, this problem went away. Now all servers are happily serving more than 20 TB of data each, with only 1 HDD failure out of 64 HDD's (infant mortality?).

     Can you describe in more detail your Areca 1680ix setup and what exactly you observed with the WD2002FYPS drives?

  2. The firmware is out. Release notes: http://photos.imageevent.com/jayspics/docs/2579-701378-A00.pdf

     Description of Change:
     Firmware Improvements: Addressing firmware compatibility associated with some systems with specific power management and warm boot execution paths. These changes do not affect the form or fit of the drive but do positively affect the function of the drive.

     Issues Identified and Addressed:
     1) Command Completion Time Outs:
        a) The RE4 GP drives intermittently would spin down while exiting from the lowest idle power mode (heads park in the ramp automatically) due to a false RPM speed fault.
        b) Improved error handling of a very specific type of servo defect to eliminate multiple missed revolutions during the operation of write commands.
     2) Warm Boot Drop-Offs: The drive can drop off line intermittently after a warm boot (RAID initiated reboot following a RAID build).
     3) Performance: Optimized sequential write performance to eliminate pauses during data transfers.

  3. John:
     The port count discrepancy is because I used an ARC1680ix-12 controller, not an ARC1680ix-16 as posted; I corrected the post.
     The setup and failures are as I described in the post: 3 drive RAID5, 6 drive RAID6, web frontend stops responding at 7% initialization.
     I did not test the Areca with a port expander, but I am glad to hear that your setup is working.

     Jay:
     Thanks for the FW info.

  4. The systems I built did not have a SAS expander. Rather, the 4 SFF-8087 ports on the 1680 are connected to 4 SFF-8087 ports on a Chenbro 16 drive SAS backplane.

     Did you use SFF-8087 to 4 SATA breakout cables to connect the SATA drives?

     Perhaps something in the SAS backplane circuitry helps to connect the SATA drives to the Areca without the severe issues you were seeing?

     Also, I should add that one of the 4 servers recently had 2 drives drop out of the array in 1 week, but the drives seem fine. It seems that certain access patterns may still have some incompatibilities (even though we had copied over 70 TB of data onto the arrays with no problems).

     I have since jumpered the RE4s to 1.5 Gb and turned off NCQ on the Areca. I've seen other reports on 2cpu that this eliminates dropouts.

     However, I think the next servers I build will use the Areca 1261ML controller with the RE4s, since why introduce the complication of SAS when the drives are SATA?

  5. I do have the firmware and I have tested it on 24 drives in Raid-6. So far I have done a block copy of the entire array (40TB) and didn't have any drives drop out of the array. I do have a similar setup using a Dual Sas Expander Chenbro backplane (36-port) connected to a LSI 9280DE-8e. You can go ahead and contact LSI for the firmware. The Nov. 15 date just means that future hard drives will have that release.

  6. Just to tell you that when I did the firmware upgrade on my existing Raid-6 array, I lost all the data. I'm not sure if it was something I did wrong. 5 out of the 24 drives failed, so I had to recreate the Raid-6. It didn't matter to me because I'm just testing this stuff right now.

  7. Call up LSI and I think Leonard can help you get the firmware. Ask for him. It's not on the website. I did do an upgrade of 24 more of these drives and didn't lose any data, so this Raid-6 stayed intact.

  8. Hello gents, we came across this post as we are just about to purchase these drives. Can you please confirm that the firmware update has solved the problem? Also, I am not seeing anyone using the 3ware controller. Any thoughts about that?

  9. Let me quickly report my experience flashing this drive with the 05 firmware: I had to use the HP tool to create a bootable DOS USB key and use the IDE mode (and not AHCI, not RAID) in the PC BIOS MCP option somewhere.

     I don't use RAID so I don't know the read error issues posted above. I just wanted to have the latest firmware in a RE4-GP.

     I use the disk as external SATA (e-SATA is that?) for manually backing up files on a PackardBell imax X9992 running Vista.

     First I had to produce that bootable DOS USB key. I used the "HP USB Disk Storage Format Tool". Others I tried and was not successful with are bootdisk.com, vfd/mkbt of http://blogs.sun.com/dragonfly/entry/dos_bootable_usb_flash_drive and more.

     Once I created the DOS USB key I copied the 0405G05.exe and .bin on it.

     Then I had to switch the SATA MCP setting (in the PC BIOS) from "AHCI" to "IDE". Otherwise, the WD firmware flashing tool 0405G05.exe would not recognize the RE4-GP disk and wouldn't flash.

     (FWIW, previously I also installed the latest nVidia nForce SATA drivers, because the Vista SATA would not format to 100% but only to 54%, as reported in "formating issues with a WD 2tb drive can't format past 1tb(931gb)" at microsoft site.)

     Once the BIOS was IDE, I disconnected the PC internal hdd, for safety. Then I booted the DOS USB key and the 0405G05.exe tool recognized the RE4-GP and flashed it ok. It took like 30s or less, or so.

     Then, for formatting the RE4-GP in Vista, I had to continue using that "IDE" option in BIOS; it took like 5 to 6 hours. Not AHCI, because when I tried to format it while "AHCI" in BIOS it lasted 14 hours and only did 15%, at which point I got bored.

     This is strange, because I had previously formatted another RE4-GP, which had the 04 firmware, while using the AHCI option and it didn't take ages, just 5 hours or so. It seems to me the 05 firmware makes it mandatory to format while "IDE" option, and 04 would format while "AHCI" option (if you want fast 5 hours formatting).

     Now, I can see the drive reporting version 05 (and not 04) in some Vista software tools but only if using IDE in BIOS. If using AHCI it doesn't show version 05 but 04.

     Finally, I switched back the option from IDE to AHCI, and did some disk rw testing and it reports faster (like 15-20%) than IDE. So I use it with AHCI.

     Thanks for posting the links to the firmware. I hope WD posts these links on their website.

     Alex

  10. In my case, I had problems even after the firmware updates for the drives and the mobo. This server was bought from a vendor that sells SuperMicro, and the controller on the board is LSI SAS1068E. The drives would timeout and cause a controller reset, causing forced raid rebuilds, and one time, knocking all 7 raid member disks completely offline, requiring a reboot. It would happen seemingly randomly, until I realized I could sometimes induce a controller reset by sending smartctl -a to all /dev/sd* drive device files.

      Anyway, the server vendor confirmed with Western Digital that my 7 RE4-GP's had serial numbers for which WD acknowledged a manufacturing defect. WD advance shipped replacement drives which arrived today. I'm still transferring data to the new drives and have yet to test for the controller reset issue. That said, I don't know if I'm out of the weeds just yet, but I take the fact that WD shipped $1800 worth of replacement drives as a good sign.

      This thread seems dead but I thought this experience may help some people.

  11. Regarding Mike's comment…

      The LSI 1068 controllers have a firmware/hardware bug which affects ATA passthrough commands (such as smartctl).
      https://bugzilla.kernel.org/show_bug.cgi?id=14831

      … just to make things more miserable, we've also seen other 2TB WD drives reset spontaneously from time to time when querying SMART using different controllers (Seagate and Hitachi drives were fine on the same setup).

      It's also worth noting that Dell frequently release drive firmware updates, and their changelogs are pretty good – check the SATA firmware update packs for recent Poweredge systems. I'm pretty sure I saw some RE4 updates in a recent pack.

  12. I wasn't intending to buy any WD product, because I have two 750G "metal bricks" marked WD in my basement. I use them as the base of my speaker stand. But my friend bought it for me and I forgot to tell him not to buy this brand. My nightmare started that day.

      At first, I thought that since it's an "Enterprise grade" HD I would give it another chance. I kept it in a cool place, running 2-3 times a week as a backup drive. Within 3 months it started to "click" while idle; in less than 6 months I noticed the transfer speed becoming unstable and the clicking sound happening more often; in less than 1 year it decided to quit. I cannot read the NTFS index table and it slows down my whole IBM workstation; Windows 7 took 5 minutes to locate the disk but won't show any size info or partition.

      I checked the S.M.A.R.T. table and found the problem: the head relocate counter is 180k, and the design limit is 191k. Google it and you will find out it's their faulty firmware. WD wasted my time and money again! Be AWARE!

      I have been using hard drives since the 8088 and 286 era. I loved IBM and Quantum, but they both made big mistakes and quit this market. Some IBM hard drive from 10 years ago is still running in my ThinkPad limited edition (S30). Seagate dies sometimes too, but WD is the worst. Maybe that's the way they make money.

      I went to their website trying to claim the warranty. I bought it in June, 2010, and my warranty ended in July, 2010!!! They told me someone had used my serial number to get a "customer loyalty upgrading", therefore my hard drive has been out of warranty since the day that guy stole my serial. But they cannot tell me anything about "that guy". Well, I've learned a lesson now.
