Rack that server

It’s been a year and a half since we moved into the new house, and I finally have the servers racked in the garage. Looks pretty nice compared to my old setups.

My old setup was as follows:
Two Dell OptiPlex 990 small form factor machines running Windows Server 2008 R2 as Hyper-V servers. One server ran the important 24/7 VMs, the other was used for testing and test VMs. The 24/7 VMs included a W2K8R2 domain controller and a W2K12 file server.
For storage I used a Synology DS2411+ NAS with 12 x 3TB Hitachi Ultrastar drives, configured in RAID6 and served via iSCSI. The iSCSI drive was mounted on the Hyper-V host and configured as a 30TB passthrough disk for the file server VM, which served files over SMB and NFS.
These servers stood on a wooden storage rack in the garage, and at the new house they were temporarily housed under the desk in my office.

One of my primary objectives was to move the server equipment to the garage in an enclosed server rack, with enough space for expansion and away from dust. A garage is not really dust free and does get hot in the summer, so it is not an ideal location for a server rack, but it is better than sacrificing precious space inside the house. To keep dust to a minimum I epoxy coated the floor and installed foam air filters in the wall and door air inlet vents. To keep things cool, especially after parking two hot cars, I installed an extractor fan. I had planned on connecting it to a thermostat, but opted for a Panasonic WhisperGreen extractor fan rated for 24/7 operation, and I just leave it on all the time. We have ongoing construction next door, and the biggest source of dust is the gaps around the garage door. I’ve considered applying sticky foam strips along the garage door edges, but have not done so yet.

In retrospect, preparing the garage concrete surface by hand, and applying the Epoxy Coat kit by myself, is not something I would recommend for a novice. If you can, pay a pro to do it for you, or at least get a friend to help, and rent a diamond floor abrasion machine.

I did half the garage at a time: moving everything to one side, preparing the surface by hand, letting it dry, applying the epoxy and flakes, letting it dry, and then repeating the process for the other side. I decided the 7″ roller that came with the kit was too small and bought a 12″ roller instead. Big mistake: as soon as I started rolling the epoxy there was lint everywhere, and with only 20 minutes of working time from the moment you start applying the epoxy, there is no time to go buy a proper lint-free roller. I did not make the same mistake twice, and used the kit roller for the second half, with no lint. With the experience gained from the first half it was much easier the second time round, and the color flake application was also much more even compared to the first half.

To conserve space in the garage I used a Middle Atlantic WR-24-32 WR Series Roll Out Rotating Rack. The roll out and rotate design allowed me to mount the rack right against the wall and against other equipment, as it does not require rear or side panel access. I also used a low noise MW-4QFT-FC thermostatically controlled integrated extractor fan top to keep things cool, and a WRPFD-24 plexiglass front door to make it look nice.

The entire interior cage rolls out on heavy duty castors, and the bottom assembly rotates on ball bearings. The bottom of the enclosure is open in the center, with steel plate tracks for the castors, and must be mounted on a sturdy and level surface. My garage floor is not level and slopes towards the door; consequently a fully loaded rack wants to roll out the door, and the servers keep sliding out on their rails.

I had to level the enclosure by placing spacers under the front section, and then bolting it down to the concrete floor. This leaves the enclosure and the rails inside it level, but as soon as I pull the rack out onto the sloped floor, the chassis slide out on their rails and the entire rack wants to roll out the door. I ended up building a removable wood platform with spacers to provide a level runway surface in front of the rack; that way I can pull the rack out onto a level surface, and store the runway when not in use.

The WR-24-32 is 24U high and accommodates equipment up to 26″ in length, quite a bit shorter than most standard racks. The interior rack assembly pillar bars are about 23″ apart, with equipment extending past the ends of the pillars. This turned out to be more of a challenge than the 26″ equipment length constraint. When the rack is in its rotated-out position, the 23″ pillars just clear the enclosure, but the 26″ equipment sticking out past the pillars does not, and prevents the rack from rotating. It takes brute force to lift the castors of a very heavily loaded rack over the track edge and pull the assembly out all the way before it will rotate freely.

Another problem with the 23″ pillar spacing is that the minimum adjustable length of the 4U Supermicro chassis rails is about 25″, so they would not fit between the pillars. I had to order a shorter set of adjustable rails, and combine the chassis side of the original rails, which matches the chassis mounting holes, with the rack side of the shorter rails, which clears the pillars. Fortunately the two fit perfectly into each other, but not onto the rack: the WR-24-32 has tapped 10-32 screw holes in all locations, i.e. no square holes anywhere, which meant I had to use my Dremel to cut the quick mount tabs off the rails in order to screw them on instead of hanging them on.

Rather than using another NAS based storage solution I opted for direct attached storage, so I was looking for a 24-bay chassis, less than 26″ in length, with low noise fans. I chose a Supermicro 4U 24-bay SuperChassis 846BE16-R920B for the main file server, and a 4U 8-bay SuperChassis 745BTQ-R1K28B-SQ for the utility server. It was the SC846’s included rails that were too long to fit between the posts, and I replaced them with an MCP-290-00058-0N short rail set.

I used Supermicro X10SLM+-F Xeon boards with Intel Xeon E3-1270 v3 processors for both systems. Low power and low heat were a higher priority than performance, and the E3 v3 processors were a good balance. I’ve had good experiences with the X9 series SM boards, but I have mixed feelings about the X10 boards. Kingston dropped support for these boards due to memory chip incompatibilities, SM certified memory for this board is very expensive, and I had endless trouble getting the boards to work with an Adaptec 7805Q controller. The 7805Q would simply fail to start, and after being bounced around between SM and Adaptec support, SM eventually provided me with a special BIOS build, not yet publicly released, that resolved the problem. I had no such problems with the newer 81605ZQ controller I used in the 24-bay chassis.

For the 24-bay system storage I used 2 x Samsung 840 Pro 512GB SSDs in RAID1 for booting the OS and for MaxCache, 4 x Samsung 840 EVO 1TB SSDs in RAID5 to host VMs, and 16 x Hitachi 4TB Coolspin drives plus 2 hot spares in RAID6 for main storage. The 56TB RAID6 volume is mounted as a passthrough disk to the file server VM. To save power and reduce heat I host all the VMs on the SSD array, and opted for the consumer grade Hitachi Coolspin drives over the more expensive but more reliable Ultrastar drives. The 8-bay system has a similar configuration, less the large RAID6 data array.

The SM boards are very easy to manage using the integrated IPMI KVM functionality. Other than configuring the BIOS and IPMI IP settings on the first boot, I rarely have to use the rack mounted KVM console. Each server runs W2K12R2 with the Hyper-V role. I am no longer running a domain controller; the complexity outweighed the benefit, especially with the introduction of the Microsoft online accounts used in Windows 8. The main VM is a W2K12R2 file server VM, with the RAID6 disk attached as a passthrough disk, serving data over SMB and NFS. My other VMs include a Milestone XProtect IP security camera network video recorder, an MSSQL and MySQL DB VM, a Spiceworks VM, a Splunk VM, a UniFi Controller VM, and several work related VMs.
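For anyone curious how a passthrough disk is wired up on Hyper-V: the physical disk is taken offline on the host and then attached to the VM by disk number. Below is a rough sketch that shells out to the Hyper-V PowerShell cmdlets from Python; the VM name and disk number are hypothetical, not my actual configuration.

```python
import subprocess

def ps(command: str) -> str:
    """Run a PowerShell command on the Hyper-V host and return its output."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Hypothetical name and number, for illustration only.
vm_name = "FileServer"   # the W2K12R2 file server VM
disk_number = 2          # host disk number of the RAID6 array

# A disk must be offline on the host before it can be passed through.
ps(f"Set-Disk -Number {disk_number} -IsOffline $true")

# Attach the physical disk directly to the VM (passthrough).
ps(f"Add-VMHardDiskDrive -VMName {vm_name} -ControllerType SCSI -DiskNumber {disk_number}")
```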

I had Verizon switch my internet connection from coax to Ethernet, and I now run a Ubiquiti EdgeRouter Pro. I did run a MikroTik RouterBoard CCR1009-8G-1S-1S+ for a while, and it is a very nice box, but since I also switched my EnGenius EAP600 access points to Ubiquiti UniFi AC units, and replaced the problematic TRENDnet TPE-1020WS POE+ switches with Ubiquiti ToughSwitch TS-8-Pro POE units, I preferred to stick to one brand in the hopes of better interoperability. Be wary of the ToughSwitch units though; it seems that under certain conditions mixing 100Mbps and 1Gbps ports causes serious performance problems. I am still on the fence about the UniFi AC units: they are really easy to manage via the UniFi controller, but some devices, like my Nest thermostats, are having problems staying connected. I am not sure if it is a problem with the access points or with the Nests, as there are many people blaming this problem on a Nest firmware update.

I used an APC Smart-UPS X 1500VA Rack/Tower LCD 120V with Network Card for clean and reliable power, and an ITWatchDogs SuperGoose II Climate Monitor for environmental monitoring and alerting.

After building and configuring everything, I copied all 30TB of data from the DS2411+ to the new server using robocopy with the multithreaded option, which took about 5 days. I continued using the old systems for two weeks while I let the new systems settle in, in case anything broke. I then re-synced the data using robocopy, moved the VMs over, and pointed the clients to the new systems.
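For illustration, the bulk copy and the later re-sync look roughly like the sketch below; the share paths and thread count are placeholders, not the exact ones I used.

```python
import subprocess

# Hypothetical source and destination paths, for illustration only.
SRC = r"\\ds2411\share"      # old Synology NAS share
DST = r"D:\data"             # new RAID6 volume on the file server

def robocopy(*extra_args: str) -> None:
    """Run robocopy with multithreading and sane retry settings."""
    subprocess.run(
        ["robocopy", SRC, DST,
         "/E",        # copy subdirectories, including empty ones
         "/MT:32",    # multithreaded copy, 32 threads
         "/R:2",      # retry failed files twice
         "/W:5",      # wait 5 seconds between retries
         *extra_args],
        check=False,  # robocopy uses non-zero exit codes even on success
    )

robocopy()         # initial bulk copy
robocopy("/MIR")   # later re-sync: mirror source to destination
```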

The VMs are noticeably more responsive, presumably due to being backed by SSD. I can now have multiple XBMC systems simultaneously watching movies while I copy data to storage, without any playback stuttering, something that used to be an issue on the old iSCSI setup.

The best part is really the way the storage cabinet looks 🙂

This is the temporary server home under my office desk:
Before

Finished product:
After

The “runway” I constructed to create a level surface:
Runway

Pulled out all the way, notice the cage is clear, but the equipment won’t clear:
Out

To clear the equipment the castors have to be pulled over the edge:
Cleared

Rotated view:
Rotated

The rarely used KVM drawer:
KVM

Extractor fans:
Fans

Night mode:
Night


LSI turns their back on Green

I previously blogged here and here on my research into finding a power saving RAID controller.

I have been using LSI MegaRAID SAS 9280-4i4e controllers in my Windows 7 workstations and LSI MegaRAID SAS 9280-8e controllers in my Windows Server 2008 R2 servers. These controllers work great: my workstations go to sleep and wake up, and on both workstations and servers the drives spin down when not in use.

I am testing a new set of workstation and server systems running Windows 8 and Server 2012, and using the “2nd generation” PCIe 3.0 based LSI RAID controllers. I’m using LSI MegaRAID SAS 9271-8i with CacheVault and LSI MegaRAID SAS 9286CV-8eCC controllers.

I am unable to get any of the configured drives to spin down on either controller, under either Windows 8 or Windows Server 2012.

LSI has not yet published any Windows 8 or Server 2012 drivers on their support site. In September 2012, after the public release of Windows Server 2012, LSI support told me drivers would ship in November; now they tell me drivers will ship in December. All is not lost, as the 9271 and 9286 cards are detected by the default in-box drivers and appear to be functional.

I had hoped the no spin-down problem was a driver issue, and that it would be corrected by updated drivers, but that appears to be wishful thinking.

I contacted LSI support about the drive spin-down issue, and was referred to this August 2011 KB 16563, pointing to KB 16385 stating:

newer versions of firmware no longer support DS3; the newest version of firmware to support DS3 was 12.12.0-0045_SAS_2108_FW_Image_APP-2.120.33-1197

When I objected to the removal, support replied with this canned response:

In some cases, when Dimmer Switch with DS3 spins down the volume, the volume cannot spin up in time when I/O access is requested by the operating system.  This can cause the volume to go offline, requiring a reboot to access the volume again.

LSI basically turned their back on green by disabling drive spin-down on all new controllers and new firmware versions.

I have not had any issues with this functionality on my systems, and spinning down unused drives to save power and reduce heat is a basic operational requirement. Maybe there are issues with some systems, but at least give me the choice of enabling it in my environment.

A little bit of searching shows I am not alone in my complaint, see here and here.

And Intel published a November 2012 KB 033877 stating that they have disabled drive power save on all their RAID controllers; maybe not that surprising, given that Intel uses rebranded LSI controllers.

After a series of overheating batteries and S3 failures I gave up on Adaptec RAID controllers long ago, but this situation with LSI is making me take another look at them.

Adaptec advertises Intelligent Power Management as a feature of their controllers, so I ordered a 7805Q controller and will report my findings in a future post.

Power Saving RAID Controller (Continued)

This post continues from my last post on power saving RAID controllers.
It turns out the Adaptec 5 series controllers are not that workstation friendly.
I was testing with Western Digital drives: 1TB Caviar Black WD1001FALS, 2TB Caviar Green WD20EADS, and 1TB RE3 WD1002FBYS.
I also wanted to test with the new 2TB RE4-GP WD2002FYPS drives, but they are on backorder.
I found that the Caviar Black WD1001FALS and Caviar Green WD20EADS drives were just dropping out of the array for no apparent reason, yet they were still listed in ASM as if nothing was wrong.
I also noticed that over time ASM listed medium errors and aborted command errors for these drives.
In comparison the RE3 WD1002FBYS drives worked perfectly.
A little searching pointed me to a feature of WD drives called Time Limited Error Recovery (TLER).
You can read more about TLER here, or here, or here.
Basically the enterprise class drives have TLER enabled and the consumer drives do not, so when the RAID controller issues a command and the drive does not respond within a reasonable amount of time, the controller drops the drive from the array.
The same drives worked perfectly in single drive, RAID-0, and RAID-1 configurations with an Intel ICH10R RAID controller, granted, the Intel chipset controller is not in the same performance league.
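As an aside, on drives that expose SCT Error Recovery Control, the TLER-style timeout can be inspected and changed with smartmontools’ smartctl (not something covered in the original post); a rough sketch, shelling out from Python, with the device path as a placeholder:

```python
import subprocess

# Placeholder device path; adjust for your system (smartctl also runs on Windows).
DEVICE = "/dev/sda"

# Query the current SCT Error Recovery Control (TLER-style) timeouts.
subprocess.run(["smartctl", "-l", "scterc", DEVICE], check=False)

# Set the read and write error recovery timeouts to 7.0 seconds (the value is
# in tenths of a second). Many consumer drives do not support SCT ERC and will
# simply report the command as unsupported.
subprocess.run(["smartctl", "-l", "scterc,70,70", DEVICE], check=False)
```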
The Adaptec 5805 and 5445 controllers I tested did let the drives spin down, but the controllers are not S3 sleep friendly.
Every time my system resumes from S3 sleep, ASM complains “The battery-backup cache device needs a new battery: controller 1.”, yet when I look in ASM it tells me the battery is fine.
Whenever the system enters S3 sleep the controller does not spin down any of the drives; this means that all the drives in external enclosures, or on external power, keep spinning while the machine is sleeping.
This defeats the purpose of power saving and sleep.
The embedded Intel ICH10R RAID controller did correctly spin down all drives before entering sleep.
Since installing the ASM utility my system has been taking a noticeably longer time to shut down.
Vista provides a convenient, although not always accurate, way to see what is impacting system performance in terms of event timing, and ASM was identified as adding 16 seconds to every shutdown.
Under [Computer Management][Event Viewer][Applications and Services Logs][Microsoft][Windows][Diagnostics-Performance][Operational], I see this for every shutdown event:
This service caused a delay in the system shutdown process:
File Name : AdaptecStorageManagerAgent
Friendly Name :
Version :
Total Time : 20002ms
Degradation Time : 16002ms
Incident Time (UTC) : 6/11/2009 3:15:57 AM
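If you would rather not click through Event Viewer, the same Diagnostics-Performance events can be pulled from the command line with wevtutil; a rough sketch from Python, with an arbitrary event count, that filters for shutdown delays:

```python
import subprocess

# The Event Viewer path [Microsoft][Windows][Diagnostics-Performance][Operational]
# corresponds to this channel name.
CHANNEL = "Microsoft-Windows-Diagnostics-Performance/Operational"

# Query the 20 most recent events as readable text (/rd:true = newest first).
result = subprocess.run(
    ["wevtutil", "qe", CHANNEL, "/c:20", "/rd:true", "/f:text"],
    capture_output=True, text=True, check=True,
)

# Print only the events that mention a shutdown delay.
for event in result.stdout.split("Event["):
    if "shutdown" in event.lower():
        print("Event[" + event)
```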
It really seems that Adaptec did not design or test the 5 series controllers for use in workstations. This is unfortunate, because performance wise the 5 series cards really are great.
[Update: 22 August 2009]
I received several WD RE4-GP / WD2002FYPS drives.
I tested with W2K8R2 booted from a WD RE3 / WD1002FBYS drive connected to an Intel ICH10R controller on an Intel S5000PSL server board.
I tested 8 drives in RAID6 connected to an LSI 8888ELP controller, and they worked perfectly.
I connected the same 8 drives to an Adaptec 51245 controller, at boot only 2 out of 8 drives were recognized.
After booting, ASM showed all 8 drives, but they were continuously dropping out and back in.
I received confirmation of similar failures with the RE4 drives and Adaptec 5 series cards from a blog reader.
Adaptec support told him to temporarily run the drives at 1.5Gb/s; apparently this works, though I did not test it myself, and it is clearly neither a long term solution nor an acceptable one.
I am still waiting to hear back from Adaptec and WD support.
[Update: 30 August 2009]
I received a reply from Adaptec support, and the news is not good: there is a hardware compatibility problem between the WD RE4-GP / WD2002FYPS drives and the onboard expander on the controller.
“I am afraid currently these drives are not supported with this model of controller. This is due to a compatibility issue with the onboard expander on the 51245 card. We are working on a hardware solution to this problem, but I am currently not able to say in what timeframe this will come.”
[Update: 31 August 2009]
I asked support if a firmware update will fix the issue, or if a hardware change will be required.
“Correct, a hardware solution, this would mean the card would need to be swapped, not a firmeware update. I can’t tell you for sure when the solution would come as its difficult to predict the amount of time required to certify the solution but my estimate would be around the end of September.”
[Update: 6 September 2009]
I experienced similar timeouts testing an Areca ARC-1680 controller.
Areca support was very forthcoming with the problem and the solution.
“this issue had been found few weeks ago and problem had been reported to WD and Intel which are vendors for hard drive and processor on controller. because the problem is physical layer issue which Areca have no ability to fix it.
but both Intel and WD have no fix available for this issue, the only solution is recommend customer change to SATA150 mode.
and they had closed this issue by this solution.
so i do not think a fix for SATA300 mode may available, sorry for the inconvenience.”
That explains why the problem happens with the Areca and Adaptec controllers but not the LSI: both the Areca and Adaptec cards use the Intel IOP348 processor.

Power Saving SATA RAID Controller

I’ve been a longtime user of Adaptec SATA RAID cards (3805, 5805, 51245), but over the years I’ve become more energy saving conscious, and the Adaptec controllers did not support Windows power management.
My workstations normally run in the “Balanced” power mode so that they go to sleep after an hour, but sometimes I need to run computationally intensive tasks that leave the machines running 24/7.
During these periods the disks don’t need to be on, and I want them to spin down, as they would if they were directly connected and not in a RAID configuration.
I was building a new system with 4 drives in RAID10, and I decided to try a 3Ware / AMCC SATA 9690SA-4I RAID controller. Their sales support confirmed that the card supports native Windows power management.
I also ordered a battery backup unit with the card, and my first impression of installing it was less than impressive. The BBU comes with 4 plastic standoff screws, but the 9690SA card only has one mounting hole. After inserting the BBU in the IDC header I had to pull it back out and adjust it so that it would align properly.
After running the card for a few hours I started getting battery overheating warnings. The BBU comes with an extension cable, and I had to use the extension cable and mount the battery away from the controller board. After making this adjustment the BBU seemed to operate at normal temperature.
Getting back to the installation, the 3Ware BIOS utility is very rudimentary compared to Adaptec’s, and I later found out that the 3Ware Disk Manager 2 (3DM2) utility is not much better. The BIOS only allows you to create one boot volume, with the rest of the disk space automatically allocated, and it only supports INT13 booting from the boot volume.
I installed Vista Ultimate x64 on the boot volume, and used the other volume for data. I also installed the 3DM2 management utility and the client tray alerting application. The client utility does not work on Vista because it requires elevation, and elevation is not allowed for auto start items. The 3DM2 utility is a web server and you connect to it using your web browser.
At first the lack of management functionality did not bother me; I did not need it, and the drives seemed to perform fine. After a month or so I noticed that I was getting more and more controller reset messages in the event log. I contacted 3Ware support, and they told me they saw CRC errors and that the fanout cable was probably bad. I replaced the cable, but the problems persisted.
The CRC errors reminded me of problems I had with Seagate ES2 drives on other systems, so I updated the firmware in the four 500GB Seagate drives I was using. No change, same problem.
I needed more disk space anyway, so I decided to upgrade the 500GB Seagate drives to 1TB WD Caviar Black drives. The normal procedure would be to remove the drives one by one, insert the new drive, wait for the array to rebuild, and when all drives have been replaced, to expand the volume.
A 3Ware KB article confirmed this procedure, but there was no support for volume expansion. What?
In order to expand the volume I would need to boot into DOS (Windows is not supported), run a utility to collect data, send the data to 3Ware, and they would create a custom expansion script that I then had to run against the volume to rewrite the META data. They highly recommended that I back up the data before proceeding.
I know the Adaptec Storage Manager (ASM) utility does support volume expansion, I’ve used it, it’s easy, it’s a right click in the GUI.
I never got to the point of actually trying the expansion procedure. After swapping the last drive I ran a verify, and one of the mirror units would not go past 22%. Support told me to try various things: disable scheduling, enable scheduling, stop the verify, restart the verify. When they eventually told me there seemed to be some timeouts, and that the cause was Native Command Queuing (NCQ) and a bad BBU, I decided I had had enough.
The new Adaptec 5-series cards do support power management, but unlike the 9690SA they do not support native Windows power management, and require power saving to be enabled through the ASM utility.
I ordered an Adaptec 5445 card, booted my system from WinPE with the 9690SA still in place, made image backups using Symantec Ghost Solution Suite (SGSS), installed the 5445 card, created new RAID10 volumes, booted from WinPE, restored the images using Ghost, and Vista booted just fine.
From past experience I knew that when changing RAID controllers I had to make sure the Adaptec driver would be ready after swapping the hardware, or else the boot would fail. So before I swapped the cards and made the Ghost backup, I used regedit to change the start type of the “arcsas” driver from disabled to boot. I know SGSS does support driver injection for bare metal restores, but since the Adaptec driver ships with Vista, I just had to enable it.
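The same registry change can be scripted instead of done by hand in regedit; a minimal sketch of the edit I made, assuming the standard Windows driver start type values:

```python
import winreg

# Standard Windows driver start types: 0 = boot, 1 = system,
# 2 = automatic, 3 = manual (on demand), 4 = disabled.
BOOT_START = 0

key_path = r"SYSTEM\CurrentControlSet\Services\arcsas"

# Change the Adaptec "arcsas" driver from disabled to boot start, so the
# driver is available the first time the system boots from the volume on
# the new controller. Requires administrator rights.
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, BOOT_START)
```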
It has only been a few days, but the system is running stable with no errors. Based purely on boot times, I do think the WD WD1001FALS Caviar Black drives are faster than the Seagate ST3500320AS Barracuda drives I used before.
Let’s hope things stay this way.
[Updated: 17 July 2009]
The Adaptec was not that power friendly after all.
Read the continued post.