Storage Spaces Leaves Me Empty

I was very intrigued when I found out about Storage Spaces and ReFS being introduced in Windows Server 2012 and Windows 8. But now that I’ve spent some time with it, I’m left disappointed, and I will not be trusting my precious data with either of these features, just yet.

 

Microsoft publicly announced Storage Spaces and ReFS in early Windows 8 blog posts. Storage Spaces was of special interest to the Windows Home Server community in light of Microsoft first dropping support for Drive Extender in Windows Home Server 2011, and then completely dropping Windows Home Server, and replacing it with Windows Server 2012 Essentials. My personal interest was more geared towards expanding my home storage capacity in a cost effective and energy efficient way, without tying myself to proprietary hardware solutions.

 

I archive all my CD’s, DVD’s, and BD discs, and store the media files on a Synology DS2411+ with 12 x 3TB drives in a RAID6 volume, giving me approximately 27TB of usable storage. Seems like a lot of space, but I’ve run out of space, and I have a backlog of BD discs that need to be archived. In general I have been very happy with Synology (except for an ongoing problem with “Local UPS was plugged out” errors), and they do offer devices capable of more storage, specifically the RS2212+ with the RX1211 expansion unit offering up to 22 combined drive bays. But, at $2300 plus $1700, this is expensive, capped at 22 drives, and further ties me in with Synology. Compare that with $1400 for a Norco DS24-E or $1700 for a SansDigital ES424X6+BS 24 bay 4U storage unit, an inexpensive LSI OEM branded SAS HBA from eBay, or a LSI SAS 9207-8e if you like the real thing, connected to Windows Server 2012, running Storage Spaces and ReFS, and things look promising.

Arguable I am swapping one proprietary technology for another, but with native Windows support, I have many more choices for expansion. One could make the same argument for the use of ZFS on Linux, and if I was a Linux expert, that may have been my choice, but I’m not.

 

I tested using a SuperMicro SuperWorkstation 7047A-73, with dual Xeon E5-2660 processors and 32GB RAM. The 7047A-73 uses a X9DA7 motherboard, that includes a LSI SAS2308 6Gb/s SAS2 HBA, connected to 8 hot-swap drive bays.

For comparison with a hardware RAID solution I also tested using a LSI MegaRAID SAS 9286CV-8e 6Gb/s SAS2 RAID adapter, with the CacheCade 2.0 option, and a Norco DS12-E 12 bay SAS2 2U expander.

For drives I used Hitachi Deskstar 7K4000 4TB SATA3 desktop drives and Intel 520 series 480GB SATA3 SSD drives. I did not test with enterprise class drives, 4TB models are still excessively expensive, and defeats the purpose of cost effective home use storage.

 

I previously reported that the Windows Server 2012 and Windows 8 install will hang when trying to install on a SSD connected to the SAS2308. As such I installed Server 2012 Datacenter on an Intel 480GB SSD connected to the onboard SATA3 controller.

Windows automatically installed the drivers for the LSI SAS2308 controller.

I had to manually install the drivers for the C600 chipset RSTe controller, and as reported before, the driver works, but suffers from dyslexia.

The SAS2308 controller firmware was updated to the latest released SuperMicro v13.0.57.0.

 

Since LSI already released v14.0.0.0 firmware for their own SAS2308 based boards like the SAS 9207-8e, I asked SuperMicro support for their v14 version, and they provided me with an as yet unreleased v14.0.0.0 firmware version for test purposes. Doing a binary compare between the LSI version and the SuperMicro version, the differences appear to be limited to descriptive model numbers, and a few one byte differences that are probably configuration or default parameters. It is possible to cross-flash between some LSI and OEM adapters, but since I had a SuperMicro version of the firmware, this was not necessary.

SuperMicro publishes a v2.0.58.0 LSI driver that lists Windows 8 support, but LSI has not yet released Windows 8 or Server 2012 drivers for their own SAS2308 based products. I contacted LSI support, and their Windows 8 and Server 2012 drivers are scheduled for release in the P15 November 2012 update.

I tested the SuperMicro v14.0.0.0 firmware with the SuperMicro v2.0.58.0 driver, the SuperMicro v14.0.0.0 firmware with the Windows v2.0.55.84 driver, and the SuperMicro v2.0.58.0 driver with the SuperMicro v13.0.57.0 firmware. Any combination that included the SuperMicro v2.0.58.0 driver or the SuperMicro v14.0.0.0 firmware resulted in problems with the drives or controller not responding. The in-box Windows v2.0.55.84 driver and the released SuperMicro v13.0.57.0 firmware was the only stable combination.

Below are some screenshots of the driver versions and errors:

LSI.2.0.55.84LSI.2.0.58.0

Eventlog.Controller.ErrorEventlog.IO.RetriedEventlog.Reset.DeviceFormat.Failed

 

One of the reasons I am not yet prepared to use Storage Spaces or ReFS is because of the complete lack of decent documentation, best practice guides, or deployment recommendations. As an example, the only documentation on SSD journal drive configuration is in TechNet forum post from a Microsoft employee, requiring the use of PowerShell, and even then there is no mention of scaling or size ratio requirements. Yes, the actual PowerShell commandlet parameters are documented on MSDN, but not the use or the meaning.

PowerShell is very powerful and Server 2012 is completely manageable using PowerShell, but an appeal of Windows has always been the management user interface, especially important for adoption by SMB’s that do not have a dedicated IT staff. With Windows Home Server being replaced by Windows Server 2012 Essentials, the lack of storage management via the UI will require regular users to become PowerShell experts, or maybe Microsoft anticipates that configuration UI’s will be developed by hardware OEM’s deploying Windows Storage Server 2012 or Windows Server 2012 Essentials based systems.

My feeling is that Storage Spaces will be one of those technologies that matures and becomes generally usable after one or two releases or service packs post the initial release.

 

I tested disk performance using ATTO Disk Benchmark 2.47, and CrystalDiskMark 3.01c.

I ran each test twice, back to back, and report the average. I realize two runs are not statistically significant, but with just two runs it took several days to complete the testing in between regular work activities. I opted to only publish the CrystalDiskMark data as the ATTO Disk Benchmark results varied greatly between runs, while the CrystalDiskMark results were consistent.

Consider the values useful for relative comparison under my test conditions, but not useful for absolute comparison with other systems.

 

Before we get to the results, a word on the tests.

The JBOD tests were performed using the C600 SATA3 controller.
The Simple, Mirror, Triple, and RAID0 tests were performed using the SAS 2308 SAS2 controller.
The Parity, RAID5, RAID6, and CacheCade tests were performed using the SAS 9286CV-8e controller.

The Simple test created a simple storage pool.
The Mirror test created a 2-way mirrored storage pool.
The Triple test created a 3-way mirrored storage pool.
The Parity test created a parity storage pool.
The Journal test created a parity storage pool, with SSD drives used for the journal disks.
The CacheCade test created RAID sets, with SSD drives used for caching.

 

As I mentioned earlier, there is next to no documentation on how to use Storage Spaces. In order to use SSD drives as journal drives, I followed information provided in a TechNet forum post.

Create the parity storage pool using PowerShell or the GUI. Then associate the SSD drives as journal drives with the pool.

Windows PowerShell
Copyright (C) 2012 Microsoft Corporation. All rights reserved.

PS C:\Users\Administrator> Get-PhysicalDisk -CanPool $True

FriendlyName CanPool OperationalStatus HealthStatus Usage Size
------------ ------- ----------------- ------------ ----- ----
PhysicalDisk4 True OK Healthy Auto-Select 447.13 GB
PhysicalDisk5 True OK Healthy Auto-Select 447.13 GB

PS C:\Users\Administrator> $PDToAdd = Get-PhysicalDisk -CanPool $True
PS C:\Users\Administrator>
PS C:\Users\Administrator> Add-PhysicalDisk -StoragePoolFriendlyName "Pool" -PhysicalDisks $PDToAdd -Usage Journal
PS C:\Users\Administrator>
PS C:\Users\Administrator>
PS C:\Users\Administrator> Get-VirtualDisk

FriendlyName ResiliencySettingNa OperationalStatus HealthStatus IsManualAttach Size
me
------------ ------------------- ----------------- ------------ -------------- ----
Pool Parity OK Healthy False 18.18 TB

PS C:\Users\Administrator> Get-PhysicalDisk

FriendlyName CanPool OperationalStatus HealthStatus Usage Size
------------ ------- ----------------- ------------ ----- ----
PhysicalDisk0 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk1 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk2 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk3 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk4 False OK Healthy Journal 446.5 GB
PhysicalDisk5 False OK Healthy Journal 446.5 GB
PhysicalDisk6 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk7 False OK Healthy Auto-Select 3.64 TB
PhysicalDisk8 False OK Healthy Auto-Select 447.13 GB
PhysicalDisk10 False OK Healthy Auto-Select 14.9 GB

PS C:\Users\Administrator>

I initially added the journal drives after the virtual drive was already created, but that would not use the journal drives. I had to delete the virtual drive, recreate it, and then the journal drives kicked in. There must be some way to manage this after virtual drives already exist, but again, no documentation.

 

In order to test Storage Spaces using the SAS 9286CV-8e RAID controller I had to switch it to JBOD mode using the commandline MegaCli utility.


D:\Install>MegaCli64.exe AdpSetProp EnableJBOD 1 a0

Adapter 0: Set JBOD to Enable success.

Exit Code: 0x00

D:\Install>MegaCli64.exe AdpSetProp EnableJBOD 0 a0

Adapter 0: Set JBOD to Disable success.

Exit Code: 0x00

D:\Install>

 

The RAID and CacheCade disk sets were created using the LSI MegaRAID Storage Manager GUI utility.

 

Below is a summary of the throughput results:

ReadWriteKBPS

ReadWriteIOPS

 

Not surprisingly the SSD drives had very good scores all around for JBOD, Simple, and RAID0. I only had two drives to test with, but I expect more drives to further improve performance.

The Simple, Mirror, and Triple test results speak for themselves, performance halving, and halving again.

The Parity test shows good read performance, and bad write performance. The write performance approaches that of a single disk.

The Parity with SSD Journal disks shows about the same read performance as without journal disks, and the write performance double that of a single disk.

The RAID0 and Simple throughput results are close, but the RAID0 write IOPS doubling that of the Simple volume.

The RAID5 and RAID6 read performance is close to Parity, but the write performance almost ten fold that of Parity. It appears that the SLI card writes to all drives in parallel, while Storage Spaces parity writes to one drive only.

The CacheCade read and write performance is less than without CacheCade, but the IOPS ten fold higher.

The ReFS performance is about 30% less than the equivalent NTFS performance.

 

 

Until Storage Spaces gets thoroughly documented and improves performance, I’m sticking with hardware RAID solutions.

Advertisements

Synology DS2411+ Performance Review

In my last post I compared the performance of  Synology DS1511+ against the QNAP TS-859 Pro. As I finished writing that post, Synology announced the new Synology DS2411+.
Instead of using a DS1511+ and DX510 extender for 10 disks, the DS2411+ offers 12 disks in a single device. The price difference is also marginal, DS1511+ is $836, the DX510 is $500, and the DS2411+ is $1700. That is a difference of only $364, and well worth it for the extra storage space, and the reliability and stability of all drives in one enclosure. I ended up returning my DX510 and DS1511+, and got a DS2411+ instead.

To test the DS2411+, I ran the same performance tests, using the same MPIO setup as I described in my previous post. The only slight difference was in the way I configured the iSCSI LUN; the DS1511+ was configured as SHR2, while the DS2411+ was configured as RAID6. Theoretically both are the same when all the disks are the same size, and SHR2 ends up using RAID6 internally.
iSCSI LUN configuration:
DS2411.iSCSI.LUN

At idle the DS2411+ used 42W power, and under load it used 138W power. The idle power usage is close to the advertised 39W idle power usage, but quite a bit more than the advertised 105W power usage under load.

I use Remote Desktop Manager to manage all my devices in one convenient application. RDM supports web portals, Remote Desktop, Hyper-V, and many more remote configuration options, all in a single tabbed UI. What I found was that the Synology DSM has some problems when running in a tabbed IE browser. When I open the log history, I get a script error, and whenever I focus away and back on the browser window, the DSM desktop windows shift all the way to the left. I assume this is a DSM problem related to absolute and relative referencing. I logged a support case, and I hope they can fix it.
Script error:
DS2411.DSM.Script.Error

Test results:

Device
ATTO Read
ATTO Write
CDM Read
CDM Write
PM810 267.153 260.839 256.674 251.850
DS2411+ 244.032 165.564 149.802 156.673
DS1511+ 244.032 126.030 141.213 115.032
TS-859 Pro 136.178 95.152 116.015 91.097

Chart
DS2411+:
Atto.Synology.MPIOCDM.Synology.MPIO
DS1511+
Atto.Synology.MPIOCDM.Synology.MPIO

The DS2411+ published performance numbers are slightly better than the DS1511+ numbers, and my testing confirms that. so far I am really impressed with the DS2411+.

Synology DS1511+ vs. QNap TS-859 Pro, iSCSI MPIO Performance

Untitled Page I have been very happy with my QNap TS-859 Pro (Amazon), but I’ve run out of space while archiving my media collection, and I needed to expand the storage capacity. You can read about my experience with the TS-859 Pro here, and my experience archiving my media collection here.
My primary objective with this project is storage capacity expansion, and my secondary objective is improved performance.

My choices for storage capacity expansion included:

  • Replace the 8 x 2TB drives with 8 x 3TB drives, to give me 6TB of extra storage. The volume expansion would be very time consuming, but my network setup can remain unchanged during the expansion.
  • Get a second TS-859 Pro with 8 x 3TB drives, to give me 18TB of extra storage. I would need to add the new device to my network, and somehow rebalance the storage allocation across the two devices, without changing the file sharing paths, probably by using directory mount points.
  • Get a Synology DS1511+ (Amazon) and a DX510 (Amazon) expansion unit with 10 x 3TB drives to replace the QNap, to give me 12TB of extra storage, expandable to 15 x 3TB drives for 36TB of total storage. I will need to copy all data to the new device, then mount the new device in place of the old device.

I opted for the DS1511+ with one DX510 expansion unit, I can always add a second DX510 and expand the volume later if needed.
As far as hard drives go, I’ve been very happy with the Hitachi Ultrastar A7K2000 2TB drives I use in my workstations and the QNap, so I stayed with the larger Hitachi Ultrastar 7k3000 3TB drives for the Synology expansion.

For improving performance I had a few ideas:

  • The TS-859 Pro is a bit older than the DS1511+, and there are newer and more powerful QNap models available, like the TS-859 Pro+ (Amazon) with a faster processor, or the TS-659 Pro II (Amazon) with a faster processor and SATA3 support, so it not totally fair to compare the TS-859 Pro performance against the newer DS1511+. But, the newer QNap models do not support my capacity needs.
  • I use Hyper-V clients and dynamic VHD files located on an iSCSI volume mounted in the host server. I elected this setup because it allowed me great flexibility in creating logical volumes for the VM’s, without actually requiring the space to be allocated. In retrospect this may have been convenient, but it was not performing well in large file transfers between the iSCSI target and the file server Hyper-V client.
    For my new setup I was going to mount the iSCSI volume as a raw disk in the file server Hyper-V client. This still allowed me to easily move the iSCSI volume between hosts, but the performance will be better than fixed size VHD files, and much better than dynamic VHD files.
    Here is a blog post describing some options for using iSCSI and Hyper-V.
  • I used iSCSI thin provisioning, meaning that the logical target has a fixed size, but the physical storage only gets allocated as needed. This is very convenient, but turned out to be slower than instant allocation. The QNap iSCSI implementation is also a file-level iSCSI LUN, meaning that the iSCSI volume is backed by a file on an EXT4 volume.
    For my new setup I was going to use the Synology block-level iSCSI LUN, meaning that the iSCSI volume is directly mapped to a physical storage volume.
  • I use a single LAN port to connect to the iSCSI target, meaning the IO throughput is limited by network bandwidth to 1Gb/s or 125MB/s.
    For my new setup I wanted to use 802.3ad link aggregation or Multi Path IO (MPIO) to extend the network speed to a theoretical 2Gb/s or 250MB/s. My understanding of link aggregation turned out to be totally wrong, and I ended up using MPIO instead.

To create a 2Gb/s network link between the server and storage, I teamed two LAN ports on the Intel server adapter, I created a bond of the two LAN ports on the Synology, and I created two trunks for those connections on the switch. This gave me a theoretical 2Gb/s pipe between the server and the iSCSI target. But my testing showed no improvement in performance over a single 1Gb/s link. After some research I found that the logical link is 2Gb/s, but that the physical network stream going from one MAC address to another MAC address is still limited by the physical transport speed, i.e. 1Gb/s. This means that the link aggregation setup is very well suited to e.g. connect a server to a switch using a trunk, and allow multiple clients access to the server over the switch, each at full speed, but it has no performance benefit when there is a single source and destination, as is the case with iSCSI. Since link aggregation did not improve the iSCSI performance, I used MPIO instead.

I set up a test environment where I could compare the performance of different network and device configurations using readily available hardware and test tools. Although my testing produced reasonably accurate relative results, due to the differences in environments, it can’t really be used for absolute performance comparisons.

Disk performance test tools:

Server setup:

Network setup:

  • HP ProCurve V1810 switch, Jumbo Frames enabled, Flow Control enabled.
  • Jumbo Frames enabled on all adapters.
  • CAT6 cables.
  • All network adapters connected to the switch.

QNap setup:

Synology setup:

To test the performance using the disk test tools I mounted the iSCSI targets as drives in the server. I am not going to cover details on how to configure iSCSI, you can read the Synology and QNap iSCSI documentation, and more specifically the MPIO documentation for Windows, Synology and QNap.
A few notes on setting up iSCSI:

  • The QNap MPIO documentation shows that LAN-1 and LAN-2 are in a trunked configuration. As far as I could tell the best practices documentation from Microsoft, DELL, Synology, and other SAN vendors, say that trunking and MPIO should not be mixed. As such I did not trunk the LAN ports on the QNap.
  • I connected all LAN cables to the switch. I could have done direct connections to eliminate the impact of the switch, but this is not how how I will install the setup, and the switch should be sufficiently capable of handling the load and not add any performance degradation.
  • Before trying to enable MPIO on Windows Server, first connect one iSCSI target and map the device, then add the MPIO feature. If you do not have a mapped device, the MPIO iSCSI option will be greyed out.
  • The server’s iSCSI target configuration explicitly bound the source and destination devices based on the adapters IP address, i.e. server LAN-1 would bind to NAS LAN-1, etc. This ensured that traffic would only be routed to and from the specified adapters.
  • I found that the best MPIO load balance policy was the Least Queue Depth Option.

During my testing I encountered a few problems:

  • The DX510 expansion unit would sometimes not power on when the DS1511+ is powered on, or would sometimes fail to initialize the RAID volume, or would sometimes go offline while powered on. I RMA’d the device, and the replacement unit works fine.
  • During testing of the DS1511+, the write performance would sometimes degrade by 50% and never recover. The only solution was to reboot the device. Upgrading the the latest 3.1-1748 DSM firmware solved this problem.
  • During testing of the DS1511+, when one of the MPIO network links would go down, e.g. I unplug a cable, ghost iSCSI connections would remain open, and the iSCSI processes would consume 50% of the NAS CPU time. The only solution was to reboot the device. Upgrading the the latest 3.1-1748 DSM firmware solved this problem.
  • I could not get MPIO to work with the DS1511+, yet no errors were reported. It turns out that LAN-1 and LAN-2 must be on different subnets for MPIO to work.
  • Both the QNap and Synology exhibits weird LAN traffic behavior when both LAN-1 and LAN-2 is connected, and the server generates traffic directed to LAN-1 only. The NAS resource monitor would show high traffic volumes on LAN-1 and and LAN-2, even with no traffic directed at LAN-2. I am uncertain why this happens, maybe a reporting issue, maybe a switching issue, but to avoid it influencing the tests, I disconnected LAN-2 while not testing MPIO.

My test methodology was as follows:

  • Mount either the QNap or Synology iSCSI device, power of the other device while not being tested.
  • Connect the iSCSI target using LAN-1 only and unplug LAN-2, or connect using MPIO with LAN-1 and LAN-2 active.
  • Run all CDM tests with iterations set at 9, and a 4GB file-set size.
  • Run ATTO with the queue depth set to 8, and a 2GB file-set size.
  • As a baseline, I also tested the Samsung PM810 SSD drive using ATTO and CDW.

Test result summary:

Device
ATTO Read
ATTO Write
CDM Read
CDM Write
Total (MB/s)
PM810 267.153 260.839 256.674 251.850 1,036.516
DS1511+ MPIO 244.032 126.030 141.213 115.032 626.307
TS-859 Pro MPIO 136.178 95.152 116.015 91.097 438.442
DS1511+ 122.294 120.172 89.258 105.618 437.342
TS-859 Pro 119.370 99.864 76.529 89.752 385.515

image

Detailed results:
PM810:
Atto.P810 CDM.P810
DS1511+ MPIO:
Atto.Synology.MPIO CDM.Synology.MPIO
TS-859 Pro MPIO:
Atto.Qnap.MPIO CDM.Qnap.MPIO
DS1511+:
Atto.Synology CDM.Synology
TS-859 Pro:
Atto.Qnap CDM.Qnap

Initially, I was a little concerned about the DX510 being in a separate case connected with an eSATA cable to the main DS1511+. Especially after I had to RMA my first DX510 because of what appeared to be connectivity issues. I was also concerned that there would be a performance difference between the 5 drives in the DS1511+ and the 5 drives in the DX510. Testing showed no performance difference between a 5 drive volume and a 10 drive volume, and the only physically noticeable difference was that the drives in the DX510 ran a few degrees hotter compared to the drives in the DS1511+.

As you can see from the results, the DS1511+ with MPIO performs really very well. Especially the 244MB/s ATTO read performance that gets close to the theoretical maximum of 250MB/s over a 2Gb/s link.

But technology moves quickly, and as I was compiling my test data for this post, Synology released two new NAS units, the DS3611xs and the DS2411+. The DS2411+ is very appealing, it is equivalent in performance to the DS1511+, but supports 12 drives in the main enclosure.
I may just have to exchange my DS1511+ and DX510 for a DS2411+…

[Update: 25 July 2011]
I returned the DS1511+ and DX510 in exchange for a DS2411+.
Read my performance review here.

Archiving my CD, DVD and BD collection

I am about two thirds done archiving my entire CD, DVD, and BD collection to network storage. I have been ripping on a part time basis for about 5 months, and so far I’ve ripped over 700 discs.

I have considered archiving my media collection for some time, but just never got around to it. Recently our toddler discovered how to open discs and use them as toys, so storing the discs safely quickly became a priority. I’d like to give you some insight into what I’ve learned and what process I follow.

 

After ripping, I store the discs in aluminum storage cases that hold 600 discs in hanging sleeves. There are similar cases with a larger capacity, but the dimensions of the 600-disc case allows for easy manipulation and storage in my garage. I download or scan the cover images as part of the ripping process, so I had no need to keep them, and I, reluctantly, threw them away. If I could I would have kept the covers, but I found no convenient way to store them.

Below is a picture of the storage case:

StorageCase

 

All the ripped content is saved on my home server, and the files are accessible over wired Gigabit Ethernet and 802.11n Wireless. My server setup is probably excessive, but it serves a purpose. I run a Windows 2008 R2 Hyper-V Server. In the Hyper-V host I run two W2K8R2 guests, one being a Domain Controller, DHCP server, and DNS Server, and the other being a File Server. The file server storage is provided by 2 x QNAP TS-859 Pro iSCSI targets, each with 8 x 2TB drives in RAID6. This gives the file server about 24TB of usable disk space.

24TB may sound like a lot of storage, but considering that I store my documents, my pictures of which most are RAW, my home movies of which most are HD, and all my ripped media in uncompressed format, I really need that much storage.

 

I am currently using Boxee Boxes for media playback. The Boxee Box does not have all the features of XBMC, and I sometimes have to hard boot it to become operational again, but it plays most file types, it runs the Netflix app, and is reasonably maintenance free.

Although Boxee is derived from XBMC, I really miss some of the XBMC features, specifically the ability to set the type of content in a directory, and to sort by media meta-data. Like XBMC, Boxee expects directories and video files to be named a specific way, and the naming is used to lookup the content details. Unlike XBMC, Boxee treats all media sources the same way, so when I add a folder with TV episodes and another folder with movies Boxee often incorrectly classifies the content, and I have to spend time correcting the meta-data. What makes it worse is that I have to apply the same corrections on each individual Boxee Box, it would have been much more convenient if my Boxee account allowed my different Boxee Boxes to share configurations.

 

Ripping and storing the discs is part of the intake process, but I also need a searchable catalog of the disc information, where the ripped files are stored, and where the physical disc is stored. I use Music Collector and Movie Collector to catalog and record the disc information. Unlike other tools I’ve tested, the Music Collector Connect and Movie Collector Connect online services allow access my catalog content anywhere using a web browser. The Connect service does allow you to add content online, theoretically negating the need for the desktop products, but I found the desktop products to be much more effective to use for intake, and then export the content online.

To catalog a CD I take the following steps: I start the automatic add feature, that computes the disc fingerprint and uses the fingerprint to lookup the disc details online. In most cases the disc is correctly identified, including album, artist, track names, etc. In many cases the front disc cover image is available, but it is rare that both the front and back covers are available. If either cover is not available, I scan my own covers, and add them to the record. I found that many of the barcode numbers (UPC) do not match the barcode of my version of the discs, if they do not match, I scan my barcode and update the record. If I made any corrections, or added missing covers, I submit the updated data, so that other users can benefit from my corrections and additions.

To catalog a DVD or BD I take the following steps: I start the automatic add feature, I use a barcode scanner and I scan in the barcode, the barcode is used to lookup the disc details online. In most cases the disc is correctly identified, including name, release year, etc. In some cases my discs do not have barcodes, this is especially true for box sets where the box may have a barcode but the individual movies in the box does not, or where I threw away the part of the box that had the barcode.

Since I buy most of my movies from Amazon, I can use my order history to find the Amazon ASIN number of the item I purchased. I then use IMDB to lookup the UPC code associated with the ASIN number. To do this search for the movie by name in IMDB, then click on the “dvd details” dropdown in the “quick links” section, then search the page for the ASIN number, and copy and paste the associated UPC code. Alternatively you can just use Google and search for the “[ASIN number] UPC”, this is sometimes successful. I don’t know why Amazon, who owns IMDB, does not display UPC codes on the product details page?

If I still do not have a UPC code, I search for the movie by name, look at the results, and pick the movie with the cover matching my disc. In most cases the disc front and back cover is available. If either cover is not available, I scan my own covers, and add them to the record. If I made any corrections, or added missing covers, I submit the updated data, so that other users can benefit from my corrections and additions.

Below are screenshots of Music Collector and Music Collector Online:

MusicCollector    MusicCollector.Online

Below are screenshots of Movie Collector and Movie Collector Online:

MovieCollector    MovieCollector.Online

 

In terms of the ripping process, ripping CD’s is really the most problematic and time consuming. Unlike BD’s that are very resilient, CD’s scratch easily resulting in read errors. Sometimes I had to re-rip the same disc multiple times, between multiple drives, before all tracks ripped accurately. I want accurate and complete meta-data for the ripped files. Sometimes automatic meta detection did not work and I had to manually find and enter the artist, album, song title, etc. This is especially problematic when there are multiple variants, such as pressings and regional track content or track order, of the same logical disc, and I have to match the online meta-data against my particular version of the disc. BD’s and DVD’s typically have only one movie per disc, where each CD has multiple tracks, and the correct metadata has to be set for the album and each track. So although a CD may physically rip much faster compared to a BD, it takes a lot more time and manual effort to accurately rip, tag, and catalog a CD.

I use dBpoweramp for ripping CD’s, it has two advantages over other tools I’ve tested; AccurateRip and PerfectMeta.

Unlike data CD’s, audio CD track data cannot be read 100% accurately using a data CD drive. If the CD drive reads a data track and encounters a read failure, it reports the failure to the reading software. If the CD drive reads an audio track and encounters a read failure, it may ignore the error, it may interpolate the data, or it may replace the data with silence, all without telling the reading software that there was an error. As a result the saved file may contain pops, inaccurate data, or silence. In order to rip a CD track accurately, the ripping software needs to read the the same track several times, and compare the results, and keep on re-reading the track until the same result has been obtained a number of times. This makes ripping CD’s accurately a very time consuming process. Even if you do get the same results with every read, you are still not guaranteed the what you read is accurate, you may just have read the same bad data multiple times. You can read more about the technicalities of ripping audio CD’s accurately here.

AccurateRip solves this problem by creating an online database of disc and track fingerprints. A track is read at full speed, the track’s fingerprint is computed, and compared against the online database of similar tracks, if the fingerprint matches, the track is known to be good, and there is no need to re-read the track. This allows CD’s to be ripped very fast and very accurately.

I use the Free Lossless Audio Codec (FLAC) format for archiving my CD’s. FLAC reduces the file size, but retains the original audio quality. FLAC also supports meta-data allowing the track artist, album, title, and image of the CD cover, etc. to be stored in the file. Unlike the very common MP3 format, FLAC playback is by default not supported by Windows Media Player (WMP). To make WMP, and Windows, play FLAC, you need to install the Xiph FLAC DirectShow filters. Or use Media Player Classic Home Cinema (MPC-HC). A typical audio CD rips to about 400MB in FLAC files.

Just like a CD track can be identified using a fingerprint, an entire CD can also be identified using a fingerprint. When the same CD is manufactured in different batches, or different factories, it results in different track fingerprints for the same logical CD. The same logical CD may also contain different tracks or track orders when released in different regions, also resulting in a different CD fingerprints. CDDB ID is the classic fingerprint, but with uniqueness problems, the more modern Disc ID algorithm does not suffer from such problems, and allows very unique fingerprints to be created by just looking at the track layout, i.e. no need to read the track data.

CD meta-data providers match CD fingerprints against logical album details. Some of this information is freely available, such as freeDB, Discogs, and MusicBrainz, and some information is commercially available, such as Gracenote, GD3, and AMG. Free providers are typically community driven, while commercial providers may have more accurate data.

PerfectMeta makes the tagging process easy, fast, detailed, and accurate. By integrating with a variety of different meta-data providers, including commercial GD3 and AMG, the track meta-data will automatically be selected based on the most reliable provider, or the most consistent data.

Below are screenshots of dBpoweramp ripping a CD, and reviewing the meta-data:

dBpoweramp.Rip     dBpoweramp.MetaData

 

I use MakeMKV for ripping DVD’s and BD’s, it is fast and easy to use, and supports extracting multiple audio, subtitle, and video tracks to a single output file.

MakeMKV creates Matroska Media Container (MKV) format output files. MKV supports multiple media streams and meta-data in the same file. MKV is not a compression format, it is just a container file, inside the container can be any type of media stream such as an AVC video stream, a DTS-HD audio stream, a PGS subtitle stream, chapter markers, etc. MKV playback is by default not supported by WMP or Windows Media Center (WMC). One solution is to install codec packs such as the K-Lite Codec Pack, but I prefer to use standalone players such as Boxee, XBMC, or MPC-HC.

MakeMKV does not perform any recompression of the streams found on the DVD or BD, it simply reads them from the source and writes them to the MKV file. This means that the playback quality is unaltered and equivalent to that of the source material. This also means that the MKV file is normally the same size as the original DVD or BD disc, typically 7GB for a DVD and 35GB for a BD.

I hate starting a BD or DVD, and I have to sit there watching one trailer after the next, especially when the disc prohibits skipping the clip and the kids are getting impatient. I paid good money for the disc, why am I forced to watch advertising on a disc I own? MakeMKV solves this problem by allowing me to rip only the main movie, and when I start playing the MKV file, I immediately see the main movie start. The downside to ripping only the main movie is that disc extras are not available, and the downside to ripping in general is that BD+ interaction is also not available. Some people prefer to rip a disc to an ISO, and then play the ISO with a software player that still allows menu navigation, I have no such need, and ripping only the main movie satisfies my requirements.

When I make my stream selection I pick the main movie, the main English audio track, the English subtitles, and the English forced subtitles. If a movie contains an HD audio track, such as DTS-HD, TrueHD, or LPCM, I also select the non-HD audio track. I do this in case the playback hardware device does not support HD audio, or the player software cannot down-convert the HD audio to a format supported by the playback hardware. On some discs an HD audio and a non-HD audio track is included, but if not, MakeMKV can automatically extract DTS from DTS-HD and can extract AC3 from TrueHD.

On some discs where there are many subtitle streams of the same language, selection gets very complicated, this is especially true when the disc contains forced subtitles. Forced subtitles are the subtitles that are displayed when there is dialog in a language other than the main audio langue, such as when aliens are talking to each other, but when people talk there are no subtitles. On DVD’s the forced subtitles are normally in a separate subtitle stream, on BD’s the subtitle stream includes a forced-bit for specific sentences. MakeMKV can automatically extract forced subtitles as a separate stream from a subtitle stream that contains normal and forced subtitles. When I encounter a disc where I cannot make out which video, audio, or subtitle streams to extract, I use EAC3TO to extract the individual tracks, view, listen, or read them, and then decide which tracks to select in MakeMKV.

Ripping television series on DVD or BD has its own challenges. In order for players like Boxee and XBMC to correctly identify the shows, the files and folders must be properly organized and named. A disc typically contains a few episodes of the series, and some discs contain extras. When you make the track selections you need to include the episodes but exclude the extras. MakeMKV creates a folder for every disc, and names each file according to its track number on that disc. This results in multiple folders, one per disc, with duplicate file names in each folder. In order to re-assemble the series in one folder, you need to rename the episodes from each folder according to the correct season and episode number, such as S01E01.mkv, then move all the files to one folder. What makes this very complicated is when the episode order on disc is different to the aired episode order. The TV scrapers use community television series websites, such as TheTVDB and TVRage, to retrieve show information. The season number and show number must match the aired episode number, not the disc order number. It is a real pain to manually match the disc to aired episode numbers, and I don’t know why discs would use a different show order compared to the aired order? Once you have your episodes named, such as S01E01.mkv, it is very easy to correctly name the file and folder by using an application called TVRename. Point TVRename to your ripped television show folder, it will try to automatically match show names to the TheTVDB show names, you can manually search and correct mappings, it will then automatically rename the show, season, and filenames, according to your preference, and in a format that Boxee and XBMC recognizes.

Below are screenshots of MakeMKV with the stream selection screen for a DVD and a BD:

MakeMKV.WrathOfKhan    MakeMKV.IronMan2

 

When I started ripping my collection I had no idea it would take this long. If I were to dedicate my time to ripping and ripping only, I would have been done a long time ago, but I typically rip only a few discs per week, in between regular work activities; get to the office, insert disc, start working, swap disc, continue working, swap disc, go to meeting, rip a few discs while having lunch at my desk, rip a few discs during the weekend, repeat. The time it takes to rip a disc is important when you stare at the screen, but less so when you have other things to do.

Over the months I’ve used a variety of BD readers, some worked well for BD’s, but were really bad for CD’s, some were fast and some were slow. To illustrate the performance, I selected a BD, a DVD, and a CD, and I ripped them all using the same settings, on the same machine, but using a variety of drive models.

Some drive models incorporate a feature called riplocking, that limits the read speed when reading video discs in order to reduce drive noise. A riplocked drive will read a video BD or DVD much slower than a data BD or DVD, and this results in slow rip times. I used an application called Media Code Speed Edit (MCSE) to remove the riplock restriction on some of the drives.

All drives include Regional Playback Control (RPC) that restricts the media than can be played in that drive by region. There are different regions for DVD and BD discs. RPC-1 drives allow software to enforce the region protection, RPC-2 drives perform the region protection in drive hardware. Most new drives are RPC-2 drives. Drive region protection is not an issue for MakeMKV, and it can rip any region disc on any region drive. RPC-1 versions of firmware is available for many drives at the RPC-1 Database.

I tested the following drives:

Drive

Firmware

Notes

LG BH12LS35 1.00  
LG BH12LS35 1.00 Riplock removed
LG UH10LS20 1.00  
LG UH10LS20 1.00 Riplock removed
LG UH10LS20 1.00 RPC-1
Plextor PX-B940SA 1.08 Rebranded Pioneer BDR-205
Sony BD-5300S 1.04 Rebranded Lite-On iHBS112
Lite-On iHBS212 5L09  
Pioneer BDR-206 1.05  

I measured the rip speed in Mbps, as computed by dividing the output file size by the rip time in seconds. The file size is the size of the MKV file for DVD’s and BD’s, and the size of all files in the album folder for CD’s. The rip time is computed by subtracting the file create time from the file modified time. The test methodology is not a standard test, and the results should not be used in absolute comparisons, but are very valid in relative comparisons. For more standard testing and reviews visit CDFreaks.

Test results:

Chart
From the results we can see that the Sony BD-5300S (a rebranded Lite-On iHBS112) and the Lite-On iHBS212 drives are the fastest overall ripping drives, the fastest BD ripping drives, the fastest DVD dripping drives, but second slowest CD ripping drives. It is further interesting to note that the stock Lite-On drives were still faster than the riplock removed LG drives. The Lite-On drives also have the smallest AccurateRip drive correction offsets of all the drives.

 

I still have quite a way to go before all my discs are ripped, but at least I have the process down; rip, swap, repeat.