Delete, Are You Sure, Yes, Oh Oh

I just had one of those moments where the blood drains from your head as you contemplate what you did.

I was in Disk Management Console, messing around with new drives I was testing, and I wanted to delete the volumes. Right Click, Delete, Are you sure, your data will be lost, and without thinking twice, I clicked yes.
Then the blood drained from my head as I realized what I had done; instead of deleting the test volume, I had deleted my main 5TB data volume!

This was a 5TB GPT partition backed by RAID5 on 4 x 2TB drives. The RAID makes it even worse; what is the point of RAID if I manually delete the partition and destroy the data myself?

Luck was on my side though; just yesterday I ran a backup, so I could recover most of the data, but all my VM images would be gone, they are too big to back up.

I know partition recovery is possible, I’ve done it in DOS a few times, but that was DOS and a long time ago. I also know it is possible using direct disk editing, I know an admin who had to recover a volume on a SAN he accidentally deleted, but that was done with the help of EMC support. I was looking for an easier solution.

A Google search for partition recovery software showed a variety of options, some free, some paid. I found only one product that could recover GPT partitions, Active@ Partition Recovery.
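Recovery is possible in the first place because deleting a volume only removes its partition-table entry; the filesystem data itself is untouched, so a scan can find and restore it. For the curious, here is a minimal sketch of parsing the GPT header (field layout per the UEFI specification), the structure these tools work with. The sector below is synthetic for illustration; a real header would be read from LBA 1 of the disk.

```python
import struct

def parse_gpt_header(sector: bytes):
    """Parse the fixed fields of a GPT header sector (layout per the UEFI spec)."""
    if sector[0:8] != b"EFI PART":
        raise ValueError("not a GPT header")
    current_lba, backup_lba = struct.unpack_from("<QQ", sector, 24)
    first_usable, last_usable = struct.unpack_from("<QQ", sector, 40)
    entries_lba, num_entries, entry_size = struct.unpack_from("<QLL", sector, 72)
    return {
        "current_lba": current_lba,
        "backup_lba": backup_lba,        # GPT keeps a backup header at the end of the disk
        "first_usable_lba": first_usable,
        "last_usable_lba": last_usable,
        "entries_lba": entries_lba,
        "num_entries": num_entries,
        "entry_size": entry_size,
    }

# Build a synthetic header sector to demonstrate; a real one would be read
# from the disk, e.g. open(r"\\.\PhysicalDrive1", "rb") on Windows.
sector = bytearray(512)
sector[0:8] = b"EFI PART"
struct.pack_into("<QQ", sector, 24, 1, 3907029167)   # current LBA 1, backup at last LBA of a 2TB drive
struct.pack_into("<QQ", sector, 40, 34, 3907029134)  # first/last usable LBA
struct.pack_into("<QLL", sector, 72, 2, 128, 128)    # entry array at LBA 2, 128 entries of 128 bytes

hdr = parse_gpt_header(bytes(sector))
print(hdr["backup_lba"])  # 3907029167
```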

I ran the trial software, and it found no drives. I found it strange that it did not UAC elevate, so I ran it again, this time as admin. It showed all my drives, I selected the drive with the missing volume, clicked quick scan, and it found a recoverable partition. Clicking on recover prompted me to enter a serial number, so I purchased a copy, entered my serial, clicked recover again, and waited. The UI reported that it saved a copy of the partition table, but it showed no progress of actually recovering the volume. I waited a bit more, opened explorer, and there was my drive back, fully recovered.

So Active@ Partition Recovery is not perfect, it needs some usability enhancements, but it works great. If you have experience with other partition recovery software supporting GPT, let me know in the comments.

Data Robotics DroboPro vs. QNAP TS-859 Pro

I previously wrote about my impressions of the DroboPro, and in case I was not clear, I was not impressed.

I recently read the announcement of the new QNAP TS-859 Pro, and from the literature it seemed like a great device, high performance, feature rich, and power saving.
The TS-859 Pro is now available, and I compared it with the DroboPro, and against a regular W2K8R2 file server.

The TS-859 is taller than the DroboPro, the DroboPro is deeper than the TS-859, and the width is about the same.

Before I get to the TS-859, let’s look at the DroboPro information and configuration screens.
The OS is Windows Server 2008 R2 Enterprise, but the steps should be about the same for Vista and Windows 7.
All but the DroboCopy context menu screens are listed below.
DroboPro information and configuration:

The dashboard believes there are no volumes, but Windows sees an unknown 2TB volume:

Creating a new volume:

As the dashboard software starts creating the volume, Windows will detect a new RAW volume being mounted and ask if it should be formatted.
Just leave that dialog open and let the dashboard finish.
The dashboard will complete saying all is well, when in reality it is not:

The dashboard failed to correctly mount and format the volume.

Right click on the disk, bring it online, format the partition as a GPT simple volume.

The dashboard will pick up the change and show the correct state.

Email notifications are configured from the context menu.
The email notifications are generated by the user session application, so with no user logged in, there are no email notifications.

DroboPro does not provide any diagnostics, even the diagnostic file is encrypted.

Unlike the DroboPro that comes with rudimentary documentation, the TS-859 has getting started instructions printed right on the top of the box, and includes a detailed configuration instruction pamphlet.
The DroboPro also has configuration instructions in the box, printed on the bottom of a piece of cardboard that looks like packaging material, and I only discovered these instructions as I was throwing out the packaging.
I loaded the TS-859 with 8 x Hitachi A7K2000 UltraStar 2TB drives.
On powering on the TS-859, the LCD showed that the device was booting, then asked whether I wanted to initialize all the drives as RAID6.
You can opt-out of this procedure, or change the RAID configuration, by using the select and enter buttons on the LCD.
I used the default values and the RAID6 initialization started.
The LCD shows the progress, and the process completed in about 15 minutes.
Unlike the DroboPro that requires a USB connection and client side software, the TS-859 is completely web managed.
The LCD will show the LAN IP address, obtained via DHCP; log in using a browser at http://[IP]:8080.
The default username and password is “admin”, “admin”.

Although the initial RAID6 initialization took only about 15 minutes, it took around 24 hours for the RAID6 synchronization to complete.
During this time the volume is accessible for storage, the device is just busy and not as responsive.
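A rough back-of-the-envelope estimate suggests why the background sync takes so long while leaving the volume usable: syncing 2TB per drive over 24 hours works out to roughly 23 MB/s per drive, well below what these drives can sustain, so the sync is presumably throttled to keep the device responsive. The numbers are the observed figures from above.

```python
drive_bytes = 2 * 10**12        # 2 TB per drive, decimal as marketed
sync_seconds = 24 * 3600        # observed ~24 hour background RAID6 sync

# Effective per-drive sync rate
rate_mb_s = drive_bytes / sync_seconds / 10**6
print(f"{rate_mb_s:.1f} MB/s per drive")  # 23.1 MB/s per drive
```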

Unlike the DroboPro that shows no diagnostics, and generates an encrypted diagnostic file, the TS-859 has detailed diagnostics.

Unlike the DroboPro, email alerts are generated from the device and do not require any client software.

SMB / CIFS shares are enabled by default.

iSCSI target creation is very simple using a wizard.

While configuring the TS-859, I ran into a few small problems.
I quickly found the help and information I needed on the QNAP forum.
Unlike the DroboPro forum, the QNAP forum does not require a device serial number and is open to anybody.
The TS-859's outbound network communication (SMTP, NTP, etc.) defaults to LAN1.
I had LAN1 directly connected for iSCSI and LAN2 connected to the routable network.
NTP time syncs were failing; after swapping LAN1 and LAN2, the device could access the internet, and NTP and the front page RSS feed started working.
Make sure to connect LAN1 to a network that can access the internet.
When I first initialized the RAID6 array, drive 8 was accessible and initializing, but didn’t report any SMART information.
I received instructions from the forum on how to use SSH to diagnose the drive, and after replacing the drive, SMART worked fine.
What I really wanted to do was compare performance, and to keep things fair I setup a configuration that had all machines connected at the same time.
This way I could run the tests one by one on the various devices, without needing to change configurations.

The client test machine is a Windows Server 2008 R2, DELL OptiPlex 960, Intel Quad Core Q9650 3GHz, 16GB RAM, Intel 160GB SSD, Hitachi A7K2000 2TB SATA, Intel Pro 1000 ET Dual Port.
The file server is a Windows Server 2008 R2, Intel S5000PSL, Dual Quad Core Xeon E5500, 32GB RAM, Super Talent 250GB SSD, Areca ARC-1680 RAID controller, 10 x Hitachi A7K2000 2TB SATA, RAID6, Intel Pro 1000 ET Dual Port.
The DroboPro has 8 x Hitachi A7K2000 2TB SATA, dual drive redundancy BeyondRAID, firmware 1.1.4.
The TS-859 Pro has 8 x Hitachi A7K2000 2TB SATA, RAID6, firmware 3.2.2b0128.

The client’s built in gigabit network card is connected to the switch.
The server’s built in gigabit network card is connected to the switch.
The TS-859 Pro LAN1 is connected to the switch.
The TS-859 Pro LAN2 is directly connected to the client on one of the Pro 1000 ET ports.
The DroboPro LAN1 is directly connected to the client on one of the Pro 1000 ET ports.

The DroboPro is configured as an iSCSI target hosting a 16TB volume.
The TS-859 Pro is configured as an iSCSI target hosting a 10TB volume.
The difference in size is unintentional; both units support thin provisioning, but the DroboPro maximum defaults to the size of all drives combined, while the TS-859 maximum defaults to the effective RAID size.
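The arithmetic behind the two defaults, assuming decimal-TB drive sizes as marketed, looks like this; the RAID6 figure of 12 decimal TB displays as roughly 10.9 in binary units, which matches the 10TB default.

```python
def raid_capacity_tb(drives: int, drive_tb: float, parity_drives: int) -> float:
    """Usable capacity in decimal TB for a traditional parity RAID."""
    return (drives - parity_drives) * drive_tb

all_drives = 8 * 2.0                               # DroboPro default: sum of all drives
raid6 = raid_capacity_tb(8, 2.0, parity_drives=2)  # TS-859 default: effective RAID6 size

print(all_drives)               # 16.0 (decimal TB)
print(raid6)                    # 12.0 (decimal TB)
print(raid6 * 10**12 / 2**40)   # ~10.9 in binary (TiB) units
```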

The client maps the DroboPro iSCSI target as a GPT simple volume.

The client maps the TS-859 Pro iSCSI target as a GPT simple volume.

The first set of tests were done using ATTO Disk Benchmark 2.46.
Intel 160GB SSD:

Hitachi A7K2000 2TB SATA:

DroboPro iSCSI:

TS-859 Pro (1500 MTU) iSCSI:

TS-859 Pro Jumbo Frame (9000 MTU) iSCSI:

Read performance:
Device Speed (MB/s)
Intel SSD SATA 274
Hitachi SATA 141
TS-859 Pro Jumbo iSCSI 116
TS-859 Pro iSCSI 113
DroboPro iSCSI 62

Write performance:
Device Speed (MB/s)
Hitachi SATA 141
Intel SSD SATA 91
TS-859 Pro Jumbo iSCSI 90
TS-859 Pro iSCSI 83
DroboPro iSCSI 65


Summary:



The next set of tests used robocopy to copy a fileset from the local Hitachi SATA drive to the target drive backed by iSCSI.

The fileset consists of a single 24GB Ghost file, 3087 JPG files totaling 17GB, and 25928 files from the Windows XP SP3 Windows folder totaling 5GB.

DroboPro iSCSI:
Fileset Run 1 (B/s) Run 2 (B/s) Run 3 (B/s) Average (B/s)
Ghost 67998715 66449606 61345194 65264505
JPG 47376106 34469965 28865504 36903858
XP 33644442 21231487 18780348 24552092
Total 149019263 122151058 108991046 126,720,456
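The per-fileset averages in these tables are the arithmetic mean of the three runs, and the Total row is the mean of the per-run totals. A quick sketch using the DroboPro numbers above:

```python
# Per-fileset throughput for the three DroboPro iSCSI runs, in B/s (from the table above)
runs = {
    "Ghost": [67998715, 66449606, 61345194],
    "JPG":   [47376106, 34469965, 28865504],
    "XP":    [33644442, 21231487, 18780348],
}

def mean(vals):
    return round(sum(vals) / len(vals))

for name, vals in runs.items():
    print(name, mean(vals))        # Ghost 65264505, JPG 36903858, XP 24552092

# The Total row is the mean of the per-run column totals
totals = [sum(col) for col in zip(*runs.values())]
print("Total", mean(totals))       # Total 126720456
```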


System load during Ghost file copy to DroboPro:


TS-859 Pro iSCSI:
Fileset Run 1 (B/s) Run 2 (B/s) Run 3 (B/s) Average (B/s)
Ghost 94824771 103356597 102596286 100259218
JPG 50591459 51817921 55830439 52746606
XP 39133922 38128876 37972580 38411793
Total 184550152 193303394 196399305 191,417,617


TS-859 Pro Jumbo iSCSI:
Fileset Run 1 (B/s) Run 2 (B/s) Run 3 (B/s) Average (B/s)
Ghost 91427745 113113714 112684967 105742142
JPG 49525622 51203544 51477482 50735549
XP 31910014 37429864 37699130 35679669
Total 172863381 201747122 201861579 192,157,361


System load during Ghost file copy to TS-859 Pro Jumbo:

This test uses the same fileset, but copies the files over SMB / CIFS.
Server SMB:
Fileset Run 1 (B/s) Run 2 (B/s) Run 3 (B/s) Average (B/s)
Ghost 108161169 116949441 115138722 113416444
JPG 53969349 56842239 55586620 55466069
XP 15829769 17550875 19336648 17572430
Total 177960287 191342555 190061990 186,454,944


TS-859 Pro SMB:
Fileset Run 1 (B/s) Run 2 (B/s) Run 3 (B/s) Average (B/s)
Ghost 64295886 65486617 63494735 64425746
JPG 52988736 52633239 53177864 52933279
XP 14345937 15703244 15506456 15185212
Total 131630559 133823100 132179055 132,544,238



Summary:

In terms of absolute performance, the TS-859 Pro with Jumbo Frames is the fastest overall.
For iSCSI, the TS-859 Pro with Jumbo Frames is the fastest.
For SMB, the W2K8R2 server is the fastest.
If we look at the system load graphs we can see that the DroboPro network throughput is frequently stalling, while the TS-859 is consistently smooth.
This phenomenon has been a topic of discussion on the DroboPro forum for some time, and the speculation is that the hardware cannot keep up with the network load.
Further speculation is that because the BeyondRAID technology is filesystem aware, it requires more processing power compared to a traditional block level RAID that is filesystem agnostic.

So let’s summarize:
The TS-859 Pro and the DroboPro are about the same price, around $1500.
The TS-859 Pro is a little louder than the DroboPro (with the DroboPro cover on).
The TS-859 Pro is not as pretty as the DroboPro, arguably.
The TS-859 Pro has ample diagnostics and remote management capabilities, the DroboPro has none.
The TS-859 Pro has loads of features, the DroboPro provides only basic storage.
The TS-859 Pro is easy to setup, the DroboPro requires a USB connection and still fails to correctly configure, requiring manual intervention.
The TS-859 Pro outperforms the DroboPro by 52%.
The TS-859 Pro will stay in my lab, the DroboPro will go 🙂
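The 52% figure can be checked directly from the robocopy totals, comparing the TS-859 Pro jumbo-frame average against the DroboPro average:

```python
drobopro_total = 126_720_456   # average robocopy throughput (B/s), DroboPro iSCSI table
ts859_total    = 192_157_361   # average robocopy throughput (B/s), TS-859 Pro Jumbo iSCSI table

speedup = (ts859_total / drobopro_total - 1) * 100
print(f"TS-859 Pro is {speedup:.0f}% faster")  # TS-859 Pro is 52% faster
```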

DroboPro Impressions

In this post I am describing, and partly reviewing, my experience using a DroboPro over iSCSI.

I have been aware of the Drobo storage devices for some time now, but never used one or knew anybody that owned one.

Recently a coworker’s large home RAID system had a controller failure, and after recovering the data, he migrated to a DroboPro using iSCSI.
After he told me how quiet the device is, and how little power it uses, I wanted to try one out myself.
As I always do before purchasing hardware or software, I wanted to visit the community forums to see what owners have to say about their products.
But, to gain access to the Drobo Forums you have to register, and to register you need a valid Drobo device serial number, so there really was no way to know what was being discussed before purchasing a device.
This does seem rather weird and makes me wonder if they want to hide something; searching online I found several other people with similar feelings about Drobo's forum policy, some saying so more politely than others.
Searching for DroboPro reviews online, I found mixed results, I found questions being asked about DroboPro and iSCSI, and several very negative Drobo comments, specifically unhappy Drobo Share owners.
One particular item of interest was the Drobo Users Community Forum, where the site owner closed the site in response to his dissatisfaction with the device and Data Robotics.
Even with the uncertainty of the device capabilities and stability, I decided to try one out anyway.
When I got the device I was surprised by just how small it is, about the size of a small form factor computer.
I unpacked the device, it comes with USB, FireWire, Ethernet, power cables, a CD and a user guide.
What I found missing was a getting started guide, so I went to the DroboPro support KB site in search of getting started documentation; I found none.
Admittedly, after I already had the device working, and as I was throwing away the packaging, I found the getting started steps printed on a piece of packaging.
I think a simple brochure would have been much more helpful compared to printing it under a part of the pretty packaging, that I discarded as I opened the box.
In order to configure the device you must use a USB connection and the Drobo Dashboard software.
I installed the dashboard, I plugged in the Ethernet cable and USB cables, and powered on.
Nothing, the dashboard software would not see the DroboPro.
Long story short, it turns out that you may have only one cable connected at a time, and since I had Ethernet and USB, the USB did not connect.
Admittedly, the getting started steps on the packaging did say Ethernet OR USB OR FireWire, but I did not take this literally to mean USB ONLY.
I now have the DroboPro running, and the dashboard sees the device.
There are no drives in the DroboPro, and the status light in the first drive slot is red, this means add a drive.
Strangely, even without a drive in the DroboPro a drive did appear in disk manager, the drive size was reported as a very big negative number, and a 32MB partition of unknown type, weird.
I inserted the first drive (Hitachi UltraStar A7K2000 2TB), the red light flashed for a bit, then turned green, and the second slot turned red.
I inserted the second drive, it turned green, and I continued inserting the remainder of the 8 drives.
While I was inserting the remainder of the drives, the second slot had some problem, I could hear the drive spinning up and down a few times, and then the slot turned red.
I replaced that drive with another, and the slot went back to green.
I went back to disk manager, and the previous 32MB disk was now gone, and instead there was a 2TB RAW drive, again a drive I did not create.
I opened the Drobo Dashboard volume manager, deleted the 2TB volume that was automatically created, and created a new 16TB NTFS volume.
The Drobo Dashboard automatically partitions and formats the volume for you, the supported file systems, on Windows, are FAT32 and NTFS.
While in the settings I changed the device settings to dual disk redundancy.
After applying this change, the device was busy for a few minutes flashing all drive lights, I assume while it was rearranging bits on the disks.
When you create a volume you must specify the partition file system format type.
My understanding is that the BeyondRAID technology used by Drobo requires understanding of the file system format, this is how they can dynamically move files around, and dynamically adjust the volume size, something that is not possible with traditional block level RAID.

Although the logical volume is reported as 16TB in size, the actual available storage using 8 x 2TB drives is about 11TB.

The logical volume size reported by the DroboPro to the OS is unrelated to the physical available storage size.
The Drobo documentation says one should create a volume as large as the maximum size you may ever need, and then simply add drives to back that storage as you need the space.
I tested this by creating 2 additional 16TB volumes, three times the physical storage capacity, and the drives showed up fine.
The one caveat is that if you ever format the partition, you must use quick format, regular format will fail.
While on the topic of sizes, the Drobo mixes SI and IEC prefixes, they say TB and GB, but they really mean TiB and GiB.
I even found a post about this on their forum, and the moderator response was that “most people don’t know the difference”, with this type of indifference the confusion will never be properly addressed.
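For reference, the difference between the two prefix systems is significant at this scale; a marketed "2 TB" (decimal) drive is only about 1.82 TiB in binary units:

```python
def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40

print(f"{tb_to_tib(2):.2f}")    # 1.82  -> what a "2 TB" drive shows as in binary units
print(f"{tb_to_tib(16):.2f}")   # 14.55 -> what a "16 TB" volume shows as in binary units
```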
I wanted to delete the 2 test volumes, and before I did this I wanted to USB Safely Remove the volume.
The safely remove failed, telling me that the device is in use, and that the DDService.exe was holding open the handles.
DDService.exe is the Drobo Dashboard Service.
Now was my opportunity to register with Drobo Forum.
After posting my question, a moderator almost immediately responded saying that I should use the dashboard to power down the device, and that the dashboard will unmount the volume.
I did not want to power down, I just wanted to unmount the volume.
I even found a Drobo support KB saying to use either the dashboard or the normal safely remove procedure.
Several users replied saying they have similar problems with the dashboard service preventing safe removal.
I deleted the two test volumes using the dashboard, it did appear to unmount them, and then reboot.
Still, one would expect the Drobo service to correctly respond to device removal notifications.
I wanted to know why the original drive in bay 2 had failed.
The dashboard does not display any diagnostic information, no drive power state, no SMART state, nothing.
When you right click on the dashboard tray icon there is an option to create a diagnostic report.
At first it seemed like the diagnostic report dialog hanged, then I noticed that DDService.exe crashed.
I restarted the dashboard and the service, and this time the report file was created on the desktop; to my surprise, the file was encrypted.
Not allowing me access to any diagnostic information is highly unusual.
I found an old forum post on the now closed Drobo Users Community Forum, describing that the data file is a simple XOR.
But since the forum is closed, the post was no longer available; fortunately the Google cache still had the information.
Unfortunately, it turns out that the encryption on newer models has changed.
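For what it's worth, a single-byte XOR is trivially reversible, since XOR is its own inverse; applying the same key twice returns the original bytes. The key and log line below are made up for illustration, and as noted above this no longer applies to the newer format.

```python
def xor_transform(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

# Hypothetical example: key 0x5A and log line are invented for demonstration.
plaintext = b"Drive 2: command timeout, drive marked failed"
obfuscated = xor_transform(plaintext, 0x5A)

print(obfuscated[:8].hex())                      # unreadable bytes
print(xor_transform(obfuscated, 0x5A).decode())  # decoding is the same operation
```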
I opened a support ticket attaching my diagnostic file, and requested the reason for the drive failure, I also asked why the file is encrypted.
I received a reply stating that the drive experienced 2 timeouts, and that is why it was kicked out.
The reason for the encryption is that apparently the log contains details of the BeyondRAID file movements, and that this is proprietary information.
Ok, I can understand not wanting to give away the secret sauce, but not making any diagnostic information available, and requiring tech support interaction for any questions will become a problem.

I was now ready to switch to iSCSI.
The PDF included on the CD was not much help, but the KB articles on the Drobo support site were helpful.
The steps call for: power up with USB only, configure using the dashboard, power down using the dashboard, disconnect USB, connect Ethernet, power up; the dashboard will reconnect after a few minutes.

The steps say that for DroboPro connected to a switch you cannot use automatic IP configuration, and you must use a static IP.

I could not see why not, so I ignored the steps and used automatic configuration; for whatever reason, it did not work.
I went back to USB, selected a static IP, rebooted, and this time after a few minutes the dashboard connected to the DroboPro, and the drive I had previously created re-appeared.
I assumed that the dashboard was configuring iSCSI targets for me.
I opened the Windows iSCSI Initiator, and as expected the target and device were already configured.
To test the device I started a robocopy of a large set of backup data from my PC to the DroboPro.
At this point I am not testing performance, I will do that later when I have the DroboPro connected to a dedicated Ethernet port.
The copy started, and I left the machine idle while it copied.
On returning later my machine had gone to sleep, and the DroboPro had gone to sleep, just the orange power light was on.
I woke up my machine, and noticed that the robocopy was still in progress, half way through a large file, but not resuming.
I waited, but the DroboPro did not wake up.
Every window I opened and every application I started on my PC would just hang.
In the end I had to reset my machine.
Back to the forum, and as before a moderator responded very quickly.
After a few back and forth questions, the moderator confirmed that it is a known problem with dashboard version 1.6.6 that I was using.
The suggested fix was to simply restart the dashboard and the dashboard service on wake from sleep.
This was not a reasonable solution, as disappearing and hanging volumes can lead to data corruption.

I opened a support ticket, and was asked to revert to the older dashboard version 1.5.1.
On uninstalling the 1.6.6 dashboard, I received an error that I must be an administrator, but I am an administrator.
Support told me to disable UAC, and then uninstall.
This is rather surprising, Windows 7 has already shipped and the Drobo software is still not Vista / UAC ready.
On installing dashboard 1.5.1 I found that it is even more Vista unfriendly, the dashboard requires elevation, is added to the startup group, but applications requiring elevation are not allowed to auto start.
Even with the UAC quirks, so far with dashboard 1.5.1 I have not had any hanging problems on resume from sleep.
The dashboard includes an email alert feature.
But after I set it up, and pulled a drive, I did not receive an email alert.
Back to the forum, and a confirmation that the email alert is generated by the dashboard user session process.
This means that no user logged in, no email alert.
I find it rather weird that Drobo implemented iSCSI, uses phrases like “enterprise ready” and “enterprise level”, and achieved “VMware Ready” certification, yet there is not a single enterprise level feature in the product.
Not that I expect enterprise level reliability or performance from a consumer device, but basic functionality found in almost all comparable devices is missing:
  • iSCSI and IP connectivity, but no web management interface.
  • USB for setup requires proximity to a physical machine, no remote management, and no virtual machine provisioning.
  • No DHCP support when connected to a LAN.
  • No raw volume management, must be a supported file system, must be managed by dashboard app.
  • I have to trust DroboPro with my data, but there is no diagnostic or health status.
  • I have to trust Data Robotics, but the forum is closed and diagnostic logs are encrypted.
  • Email alerts require a user to be logged in, and if I were logged in I would not need an email alert.
  • Software that is not fully Vista compatible, even after Windows 7 already shipped.
  • Software that shipped with known problems that could cause data corruption.
The DroboElite is more than double the price of a DroboPro.
The main differences between DroboPro and DroboElite are dual Ethernet ports, multi host access to volumes, and more volumes.
Although I do not have one to test, from what I can gather in documentation and the forum, none of the items above are any different.
As a direct attached USB or FireWire storage device some of the above mentioned items would be irrelevant, but iSCSI, I really expected more.

Next up, I’ll move the DroboPro from my workstation to my W2K8R2 server on a dedicated Ethernet port.

This will give me the ability to do some performance and benchmarking comparison between RAID6 DAS and the DroboPro BeyondRAID iSCSI.
[Update: 30 January 2010]

Hitachi Ultrastar and Seagate Barracuda LP 2TB drives

In my previous post I talked about Western Digital RE4-GP 2TB drive problems.

In this post I present my test results for 2TB drives from Seagate and Hitachi.
The test setup is the same as for the RE4-GP testing, except that I only tested 4 drives from each manufacturer.
Unlike the enterprise class WD RE4-GP and Hitachi Ultrastar A7K2000 drives, the Seagate Barracuda LP drive is a desktop drive.
The equivalent should have been a Seagate Constellation ES drive, but as far as I know the 2TB drives are not yet available.
To summarize:
The Hitachi A7K2000 drives performed without issue on all three controllers; the Seagate Barracuda LP drives failed to work with the Adaptec controller.
The Hitachi Ultrastar A7K2000 outperformed the Seagate Barracuda LP drive, but this was not really a surprise given the drive specs.
The Areca ARC1680 controller produced the best and most reliable results, the Adaptec was close, but given the overheating problem, it is not reliable unless additional cooling is added.
Test hardware:
Intel S5000PSL motherboard, dual Xeon E5450, 32GB RAM, firmware BIOS-98 BMC-65 FRUSDR-48
Adaptec 51245 RAID controller, firmware 17517, driver 5.2.0.17517
Areca ARC1680ix-12 RAID controller, firmware 1.47, driver 6.20.00.16_80819
LSI 8888ELP RAID controller, firmware 11.0.1-0017 (APP-1.40.62-0665), driver 4.16.0.64
Chenbro CK12803 28-port SAS expander, firmware AA11
Drive setup:
– Boot drive, 1 x 1TB WD Caviar Black WD1001FALS, firmware 05.00K05
Simple volume, connected to onboard Intel ICH10R controller running in RAID mode
– Data drives, 4 x 2TB Hitachi Ultrastar A7K2000 HUA722020ALA330 drives, firmware JKAOA20N
1 x hot spare, 3 x drive RAID5 4TB, configured as GPT partitions, dynamic disks, and simple volumes
– Data drives, 4 x 2TB Seagate Barracuda LP ST32000542AS drives, firmware CC32
1 x hot spare, 3 x drive RAID5 4TB, configured as GPT partitions, dynamic disks, and simple volumes

I tested the drives as shipped, with no jumpers, running at SATA-II / 3Gb/s speeds.
Adaptec 51245, SATA-II / 3Gb/s:
As in my previous test I had to use an extra fan to keep the Adaptec card from overheating.
The Hitachi drives had no problems.
The Hitachi drives completed initialization in 16 hours.
The Seagate drives would not show up on the system, I tried different ports, resets, cable swaps, no go.
Adaptec, RAID5, Hitachi:

Adaptec, RAID5, WD:

Areca ARC1680ix-12, SATA-II / 3Gb/s:
The Areca had no problems with the Hitachi or Seagate drives.
The Hitachi drives completed initialization in 40 hours.
The Seagate drives completed initialization in 49 hours.
The array initialization time of the Areca is significantly longer compared to Adaptec or LSI.
Areca, RAID5, Hitachi:

Areca, RAID5, Seagate:

Areca, RAID5, WD:

LSI 8888ELP and Chenbro CK12803, SATA-II / 3Gb/s:
The Hitachi drives reported a few “Invalid field in CDB” errors, but it did not appear to affect the operation of the array.
The Hitachi drives completed initialization in 4 hours.
The Seagate drives reported lots of “Invalid field in CDB” and “Power on, reset, or bus device reset occurred” errors, but it did not appear to affect the operation of the array.
The Seagate drives made clicking sounds when they powered on, and occasionally during normal operation.
The Seagate drives completed initialization in 4 hours.

LSI, RAID5, Hitachi:

LSI, RAID5, Seagate:

LSI, RAID5, WD:


I will be scaling my test up from 4 to 12 Hitachi drives, using the Areca controller, and I will expand the Areca cache from 512MB to 2GB.

Western Digital RE4-GP 2TB Drive Problems

In my previous two posts I described my research into the power saving features of various enterprise class RAID controllers.
In this post I detail the results of my testing of the Western Digital RE4-GP enterprise class “green” drives when used with hardware RAID controllers from Adaptec, Areca, and LSI.
To summarize: the RE4-GP drive fails with a variety of problems; Adaptec, Areca, and LSI acknowledge the problem and lay the blame on WD, yet WD insists there are no known problems with the RE4-GP drives.
Test hardware:
Intel S5000PSL motherboard, dual Xeon E5450, 32GB RAM, firmware BIOS-98 BMC-65 FRUSDR-48
Adaptec 51245 RAID controller, firmware 17517, driver 5.2.0.17517
Areca ARC1680ix-12 RAID controller, firmware 1.47, driver 6.20.00.16_80819
LSI 8888ELP RAID controller, firmware 11.0.1-0017 (APP-1.40.62-0665), driver 4.16.0.64
Chenbro CK12803 28-port SAS expander, firmware AA11
Drive setup:
– Boot drive, 1 x 1TB WD Caviar Black WD1001FALS, firmware 05.00K05
Simple volume, connected to onboard Intel ICH10R controller running in RAID mode
– Data drives, 10 x 2TB WD RE4-GP WD2002FYPS drives, firmware 04.05G04
1 x hot spare, 3 x drive RAID5 4TB, 6 x drive RAID6 8TB, configured as GPT partitions, dynamic disks, and simple volumes
I started testing the drives as shipped, with no jumpers, running at SATA-II / 3Gb/s speeds.
Adaptec 51245, SATA-II / 3Gb/s:
The Adaptec card has 3 x internal SFF-8087 ports and 1 x external SFF-8088 port, supporting 12 internal drives.
The Adaptec card had immediate problems with the RE4-GP drives, in the ASM utility the drives would randomly drop out and in.
I could not complete testing.
Areca ARC1680ix-12, SATA-II / 3Gb/s:
The Areca card has 3 x internal SFF-8087 ports and 1 x external SFF-8088 port, supporting 12 internal drives.
Unlike the LSI and Adaptec cards that require locally installed management software, the Areca card is completely managed through a web interface from an embedded Ethernet port.
The Areca card allowed the RAID volumes to be created, but during initialization at around 7% the web interface stopped responding, requiring a cold reset.
I could not complete testing.
LSI 8888ELP and Chenbro CK12803, SATA-II / 3Gb/s:
The LSI card has 2 x internal SFF-8087 ports and 2 x external SFF-8088 ports, supporting 8 internal drives.
Since I needed to host 10 drives, I used the Chenbro 28 port SAS expander.
The 8888ELP support page only lists the v3 series drivers, while W2K8R2 ships with the v4 series drivers, so I used the latest v4 drivers from the new 6Gb/s LSI cards.
The LSI and Chenbro allowed the volumes to be created, but during initialization 4 drives dropped out, and initialization failed.
I could not complete testing.
I contacted WD, Areca, Adaptec, and LSI support with my findings.
WD support said there is nothing wrong with the RE4-GP, and that they are not aware of any problems with any RAID controllers.
When I insisted that there must be something wrong, they suggested I try to force the drives to SATA-I / 1.5Gb/s speed and see if that helps.
I tested at SATA-I / 1.5Gb/s speed, and achieved some success, but I still insisted that WD acknowledge the problem.
The case was escalated to WD engineering, and I am still waiting for an update.
Adaptec support acknowledged a problem with RE4-GP drives when used with high port count controllers, and that a card hardware fix is being worked on.
I asked if the fix will be firmware or hardware, and was told hardware, and that the card will have to be swapped, but the timeframe is unknown.
Areca support acknowledged a problem between the Intel IOP348 controller and RE4-GP drives, and that Intel and WD are aware of the problem, and that running the drives at SATA-I / 1.5Gb/s speed resolves the problem.
I asked if a fix to run at SATA-II / 3Gb/s speeds will be made available, I was told this will not be possible without hardware changes, and no fix is planned.
LSI support acknowledged a problem with RE4-GP drives, and that they have multiple cases open with WD, and that my best option is to use a different drive, or to contact WD support.
I asked if a fix will become available, they said that it is unlikely that a firmware update would be able to resolve the problem, and that WD would need to provide a fix.
This is rather disappointing: WD advertises the RE4-GP as an enterprise class drive, yet all three of the enterprise class RAID controllers I tested failed with the RE4-GP, all three vendors blame WD, and WD insists there is nothing wrong with the drive.
I continued testing, this time with the SATA-I / 1.5Gb/s jumper set.
Adaptec 51245, SATA-I / 1.5Gb/s:
This time the Adaptec card had no problems seeing the arrays, although some of the drives continued to report link errors.
A much bigger problem was that the controller and battery were overheating, with the controller running at 103C / 217F.
In order to continue my testing I had to install an extra chassis fan to provide additional ventilation over the card.
The Adaptec and LSI have passive cooling, whereas the Areca has active cooling and ran at only around 51C / 124F.
The Areca and LSI batteries are off-board, and although a bit inconvenient to mount, they did not overheat like the Adaptec's.
Initialization completed in 22 hours, compared to 52 hours for Areca and 8 hours for LSI.
The controller supports power management, and drives are spun down when not in use.
3 x Drive RAID5 4TB performance:

6 x Drive RAID6 8TB Performance:

Areca ARC1680ix-16, SATA-I / 1.5Gb/s:
This time the Areca card had no problems initializing the arrays.
Initialization completed in 52 hours, much longer compared to 22 hours for Adaptec and 8 hours for LSI.
Areca support said initialization time depends on the drive speed and controller load, and that the RE4-GP drives are known to be slow.
The controller supports power management, and drives are spun down when not in use.

3 x Drive RAID5 4TB performance:

6 x Drive RAID6 8TB Performance:

LSI 8888ELP and Chenbro CK12803, SATA-I / 1.5Gb/s:
This time only 2 drives dropped out, one out of each array, and initialization completed after I forced the drives back online.
Initialization completed in 8 hours, much quicker compared to 22 hours for Adaptec and 52 hours for Areca.

The controller only supports power management on unassigned drives; there is no support for spinning down drives that are configured but not in use.

3 x Drive RAID5 4TB performance:

6 x Drive RAID6 8TB Performance:

Although all three cards produced results when the RE4-GP drives were forced to SATA-I / 1.5Gb/s speeds, the results still show that the drives are unreliable.
The RE4-GP drive fails with a variety of problems; Adaptec, Areca, and LSI acknowledge the problem and lay the blame on WD, yet WD insists there are no known problems with the RE4-GP drives.
There are alternative low power drives available from Seagate and Hitachi.
I still haven’t forgiven Seagate for the endless troubles they caused with ES.2 drives and Intel IOP348 based controllers; like WD, they denied any problems with the drives, yet eventually released two firmware updates for the ES.2.
I’ve always had good service from Hitachi drives, so maybe I’ll give the new Hitachi A7K2000 drives a run.
One thing is for sure, I will definitely be returning the RE4-GP drives.
[Update: 11 October 2009]
I tested the Seagate Barracuda LP and Hitachi Ultrastar 2TB drives.
[Update: 24 October 2009]
WD support still has not responded to my request for the firmware.

Power Saving RAID Controller (Continued)

This post continues from my last post on power saving RAID controllers.
It turns out the Adaptec 5 series controllers are not that workstation friendly.
I was testing with Western Digital drives; 1TB Caviar Black WD1001FALS, 2TB Caviar Green WD20EADS, and 1TB RE3 WD1002FBYS.
I also wanted to test with the new 2TB RE4-GP WD2002FYPS drives, but they are on backorder.
I found that the Caviar Black WD1001FALS and Caviar Green WD20EADS drives were just dropping out of the array for no apparent reason, yet they were still listed in ASM as if nothing was wrong.
I also noticed that over time ASM listed medium errors and aborted command errors for these drives.
In comparison the RE3 WD1002FBYS drives worked perfectly.
A little searching pointed me to a feature of WD drives called Time Limited Error Recovery (TLER).
You can read more about TLER here, or here, or here.
Basically, enterprise class drives ship with TLER enabled and consumer drives do not, so when the RAID controller issues a command and the drive does not respond within a reasonable amount of time, the controller drops the drive out of the array.
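That interaction can be sketched as a simple timeout comparison; the 7-second TLER cap and 8-second controller timeout below are illustrative assumptions, not published vendor numbers:

```python
def drive_survives(recovery_seconds: float, tler_enabled: bool,
                   controller_timeout: float = 8.0,
                   tler_cap: float = 7.0) -> bool:
    """Return True if the drive answers before the controller gives up.

    A TLER drive aborts its internal error recovery after tler_cap
    seconds and reports the error instead, so the controller never
    times out; a consumer drive may retry far longer than the
    controller is willing to wait.
    """
    effective = min(recovery_seconds, tler_cap) if tler_enabled else recovery_seconds
    return effective < controller_timeout


print(drive_survives(30.0, tler_enabled=True))   # enterprise drive stays in the array
print(drive_survives(30.0, tler_enabled=False))  # consumer drive gets dropped
```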
The same drives worked perfectly in single drive, RAID-0, and RAID-1 configurations with an Intel ICH10R RAID controller, granted, the Intel chipset controller is not in the same performance league.
The Adaptec 5805 and 5445 controllers I tested did let the drives spin down, but the controller is not S3 sleep friendly.
Every time my system resumes from S3 sleep, ASM complains “The battery-backup cache device needs a new battery: controller 1.”, yet when I look in ASM it tells me the battery is fine.
Whenever the system enters S3 sleep the controller does not spin down any of the drives; this means that all the drives in external enclosures, or on external power, keep spinning while the machine is sleeping.
This defeats the purpose of power saving and sleep.
The embedded Intel ICH10R RAID controller did correctly spin down all drives before entering sleep.
Since installing the ASM utility my system has been taking a noticeably longer time to shut down.
Vista provides a convenient, although not always accurate, way to see what is impacting system performance in terms of event timing, and ASM was identified as adding 16s to every shutdown.
Under [Computer Management][Event Viewer][Applications and Services Logs][Microsoft][Windows][Diagnostics-Performance][Operational], I see this for every shutdown event:
This service caused a delay in the system shutdown process:
File Name : AdaptecStorageManagerAgent
Friendly Name :
Version :
Total Time : 20002ms
Degradation Time : 16002ms
Incident Time (UTC) : 6/11/2009 3:15:57 AM
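The delay figures in these events can be pulled out of the message text with a small script; `delay_ms` is a hypothetical helper, and the sample string is just the event shown above:

```python
import re

# The event message shown above, as logged by Diagnostics-Performance
event_text = """This service caused a delay in the system shutdown process:
File Name : AdaptecStorageManagerAgent
Total Time : 20002ms
Degradation Time : 16002ms"""


def delay_ms(text: str, field: str) -> int:
    # Pull "<field> : <n>ms" out of the event message text
    match = re.search(rf"{re.escape(field)}\s*:\s*(\d+)ms", text)
    return int(match.group(1)) if match else 0


print(delay_ms(event_text, "Degradation Time") / 1000)  # -> 16.002 seconds
```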
It really seems that Adaptec did not design or test the 5 series controllers for use in workstations. This is unfortunate, because performance-wise the 5 series cards really are great.
[Update: 22 August 2009]
I received several WD RE4-GP / WD2002FYPS drives.
I tested with W2K8R2 booted from a WD RE3 / WD1002FBYS drive connected to an Intel ICH10R controller on an Intel S5000PSL server board.
I tested 8 drives in RAID6 connected to a LSI 8888ELP controller, worked perfectly.
I connected the same 8 drives to an Adaptec 51245 controller, at boot only 2 out of 8 drives were recognized.
After booting, ASM showed all 8 drives, but they were continuously dropping out and back in.
I received confirmation of similar failures with the RE4 drives and Adaptec 5 series cards from a blog reader.
Adaptec support told him to temporarily run the drives at 1.5Gb/s; apparently this does work, though I did not test it myself. Clearly this is neither a long-term solution nor acceptable.
I am still waiting to hear back from Adaptec and WD support.
[Update: 30 August 2009]
I received a reply from Adaptec support, and the news is not good: there is a hardware compatibility problem between the WD RE4-GP / WD2002FYPS drives and the onboard expander on the controller.
“I am afraid currently these drives are not supported with this model of controller. This is due to a compatibility issue with the onboard expander on the 51245 card. We are working on a hardware solution to this problem, but I am currently not able to say in what timeframe this will come.”
[Update: 31 August 2009]
I asked support if a firmware update will fix the issue, or if a hardware change will be required.
“Correct, a hardware solution, this would mean the card would need to be swapped, not a firmeware update. I can’t tell you for sure when the solution would come as its difficult to predict the amount of time required to certify the solution but my estimate would be around the end of September.”
[Update: 6 September 2009]
I experienced similar timeouts testing an Areca ARC-1680 controller.
Areca support was very forthcoming with the problem and the solution.
“this issue had been found few weeks ago and problem had been reported to WD and Intel which are vendors for hard drive and processor on controller. because the problem is physical layer issue which Areca have no ability to fix it.
but both Intel and WD have no fix available for this issue, the only solution is recommend customer change to SATA150 mode.
and they had closed this issue by this solution.
so i do not think a fix for SATA300 mode may available, sorry for the inconvenience.”
That explains why the problem happens with the Areca and Adaptec controllers but not the LSI: both the Areca and the Adaptec use the Intel IOP348 processor.

Power Saving SATA RAID Controller

I’ve been a longtime user of Adaptec SATA RAID cards (3805, 5805, 51245), but over the years I’ve become more energy saving conscious, and the Adaptec controllers did not support Windows power management.
My workstations normally run in the “Balanced” power mode so that they will go to sleep after an hour, but sometimes I need to run computationally intensive tasks that leave the machines running 24/7.
During these periods the disks don’t need to be on and I want the disks to spin down, like they would had they been directly connected and not in a RAID configuration.
I was building a new system with 4 drives in RAID10, and I decided to try a 3Ware / AMCC SATA 9690SA-4I RAID controller. Their sales support confirmed that the card supports native Windows power management.
I also ordered a battery backup unit with the card, and my first impression of installing it was less than positive. The BBU comes with 4 plastic screws with pillars, but the 9690SA card only has one mounting hole. After inserting the BBU in the IDC header I had to pull it back out and adjust it so that it would align properly.
After running the card for a few hours I started getting battery overheating warnings. The BBU comes with an extension cable, and I had to use the extension cable and mount the battery away from the controller board. After making this adjustment the BBU seemed to operate at normal temperature.
Getting back to installation, the 3Ware BIOS utility is very rudimentary (compared to Adaptec), I later found out that the 3Ware Disk Manager 2 (3DM2) utility is not much better. The BIOS only allowed you to create one boot volume, and the rest of the disk space was automatically allocated. The BIOS also only supports INT13 booting from the boot volume.
I installed Vista Ultimate x64 on the boot volume, and used the other volume for data. I also installed the 3DM2 management utility and the client tray alerting application. The client utility does not work on Vista because it requires elevation, and elevation is not allowed for auto-start items. The 3DM2 utility is a web server, and you connect to it using your web browser.
At first the lack of management functionality did not bother me, I did not need it, and the drives seemed to perform fine. After a month or so I noticed that I was getting more and more controller reset messages in the event log. I contacted 3Ware support, and they told me they see CRC errors and that the fanout cable was probably bad. I replaced the cable, but the problems persisted.
The CRC errors reminded me of problems I had with Seagate ES.2 drives on other systems, so I updated the firmware in the four 500GB Seagate drives I was using. No change, same problem.
I needed more disk space anyway, so I decided to upgrade the 500GB Seagate drives to 1TB WD Caviar Black drives. The normal procedure would be to remove the drives one by one, insert the new drive, wait for the array to rebuild, and when all drives have been replaced, to expand the volume.
A 3Ware KB article confirmed this operation, but there was no support for volume expansion. What?
In order to expand the volume I would need to boot from DOS (Windows is not supported), run a utility to collect data, send the data to 3Ware, and they would create a custom expansion script for me that I would then need to run against the volume to rewrite the metadata. They highly recommend backing up the data before proceeding.
I know the Adaptec Storage Manager (ASM) utility does support volume expansion, I’ve used it, it’s easy, it’s a right click in the GUI.
I never got to the point of actually trying the expansion procedure. After swapping the last drive I ran a verify, and one of the mirror units would not go past 22%. Support told me to try various things, disable scheduling, enable scheduling, stop the verify, restart the verify. When they eventually told me it seems there are some timeouts, and that the cause was Native Command Queuing (NCQ) and a bad BBU, I decided I had enough.
The new Adaptec 5 series cards do support power management, but unlike the 9690SA they do not support native Windows power management; power saving must be enabled through the ASM utility.
I ordered an Adaptec 5445 card, booted my system with the 9690SA still in place from WinPE, made an image backups using Symantec Ghost Solution Suite (SGSS), installed the 5445 card, created new RAID10 volumes, booted from WinPE, restored the images using Ghost, and Vista booted just fine.
From past experience I knew that when changing RAID controllers I had to make sure that the Adaptec driver would be ready after swapping the hardware, else the boot will fail. So before I swapped the cards and made the Ghost backup, I used regedit and changed the start type of the “arcsas” driver from disabled to boot. I know that SGSS does have support for driver injection used for bare metal restore, but since the Adaptec driver comes standard with Vista, I just had to enable it.
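For illustration, the same change can be expressed as a .reg file; the service path is the standard location for driver start types, and 0 corresponds to SERVICE_BOOT_START (this is a sketch, not taken from the original post):

```reg
Windows Registry Editor Version 5.00

; arcsas is the in-box Vista driver for Adaptec SAS controllers
; Start values: 0 = SERVICE_BOOT_START, 4 = SERVICE_DISABLED
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\arcsas]
"Start"=dword:00000000
```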
It has only been a few days, but the system is running stable with no errors. Based purely on boot times, I do think the WD WD1001FALS Caviar Black drives are faster than the Seagate ST3500320AS Barracuda drives I used before.
Let’s hope things stay this way.
[Update: 17 July 2009]
The Adaptec was not that power friendly after all.
Read the continued post.