RAID is a method that takes independent drives and lets a system group them together for redundancy, speed, increased storage space, or all three. One of the long-time stalwarts of the RAID world is RAID 5. In RAID 5 you need at least 3 identically sized disks. They are combined so that the usable storage space is N-1 drives (i.e., in a 3-drive system, total space is 2x the drive size); the remaining drive's worth of space holds parity data, which RAID 5 spreads across all of the drives. With that parity, you can lose any one of the drives and still have access to your data. If you lose two or more of the drives though, you'd better have a good backup.
How does this work? Through the magic of XOR. The following statements are all true:
A XOR B = PAR
PAR XOR B = A
A XOR PAR = B
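If you want to see it in action, here's a minimal sketch (plain Object Pascal, with made-up byte values) that computes the parity of two data bytes and then uses it to rebuild one of them:

program XorParityDemo;
{$APPTYPE CONSOLE}
uses SysUtils;
var
  A, B, Par, Rebuilt: Byte;
begin
  A := $5A;             //data byte from "drive" A (hypothetical value)
  B := $C3;             //data byte from "drive" B (hypothetical value)
  Par := A xor B;       //parity byte stored on the third drive
  Rebuilt := Par xor B; //pretend drive A died; XOR the survivors to get it back
  Assert(Rebuilt = A, 'recovered byte should equal the original');
  WriteLn(Format('A=%2.2x B=%2.2x PAR=%2.2x recovered A=%2.2x', [A, B, Par, Rebuilt]));
end.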
That's how parity lets you lose one disk and still recover your data. The same rules also apply in larger sets; note that RAID 5 rotates the position of the parity data from stripe to stripe. So a 6-drive system looks like:
A XOR B XOR C XOR D XOR E = PAR
PAR XOR B XOR C XOR D XOR E = A
A XOR PAR XOR C XOR D XOR E = B
A XOR B XOR PAR XOR D XOR E = C
A XOR B XOR C XOR PAR XOR E = D
A XOR B XOR C XOR D XOR PAR = E
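The same trick scales to any stripe width: XOR together everything that survived (the remaining data bytes plus the parity byte) and the result is whatever went missing. Here's a small sketch of that idea as a hypothetical helper function:

//Rebuilds one missing byte of a stripe from the surviving bytes.
//survivors should hold every data byte except the lost one, plus the parity byte.
function RecoverMissingByte(const survivors: array of Byte): Byte;
var
  ix: Integer;
begin
  Result := 0;
  for ix := Low(survivors) to High(survivors) do
    Result := Result xor survivors[ix]; //XOR accumulates; order doesn't matter
end;

So for the 6-drive stripe above, losing C means RecoverMissingByte([A, B, D, E, PAR]) hands C back.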
The other interesting thing to note is that the amount of data stored on each drive is 1/(n-1) of the total file size. Thus a 100-byte file in the 6-drive RAID 5 system above stores only 20 bytes on each drive. This is where the storage-space increase comes from.
Now, the question is, why do we care? Other than that it's nice to know how something works, this technique can be applied to cloud storage. If you've followed the online storage market, you may have seen a number of providers come and go. The problem is, if they disappear or lose your data, what happens then?
If you are using them as a convenient off-site data storage pool for large amounts of infrequently used data, you could implement a RAID 5-style data split. Keep in mind there's a bit of overhead in doing so, but the point of this exercise is to reduce your dependence on any one provider. Coincidentally, it will likely cost you the same or less than using a single cloud storage provider.
Take for instance Amazon's S3 service and Rackspace's Mosso Cloud Files. Both services charge per gigabyte per month. For argument's sake, let's assume we have a 10GB RAR file (BIGFILE.RAR) that we want to back up. If we split it into a RAID 5, 3-drive format, that leaves us with BIGFILE.RAR.0, BIGFILE.RAR.1 and BIGFILE.RAR.RAID. Each file will be 5GB (10GB / (3-1)) in size. I can upload one file to Amazon, one file to Rackspace and retain one file on my hard drive. I've now backed up the file in an online form that can be recovered even if I lose my hard drive, or Amazon or Rackspace has an outage at the very moment I need access to my data. So long as I have access to any two of the three parts, I can recreate the original file.
Obviously, Amazon and Rackspace are large enough that it is unlikely they'd actually lose the data over the long term. The same can't be said of some of the smaller players in the market. Companies like Streamload managed to wipe out about half of their customers' data during a reorganization before finally closing their doors. Anyone caught unaware lost all of their data.
I should also note that you could use a PAR or PAR2 system instead. The easiest-to-use implementation is probably QuickPAR. It uses a system similar to what I've shown here, but the source is not very approachable for Delphi developers. From what I can tell, it was originally developed to push binary files around Usenet, but it would work equally well for cloud storage. If you're just looking for a good off-the-shelf tool, QuickPAR is probably the way to go. If you're interested in a parity solution that you can embed in your own code, the source is included below.
procedure File2RaidFiles(fileName:string; raidLength:integer);
var FS:TFileStream;
outputFS:array of TFileStream;
byteArray:array of byte;
ix:integer;
begin
//test to make sure we were called correctly
if not FileExists(fileName) then
raise Exception.Create(Format('File %s doesn''t exist', [fileName]));
if raidLength<2 then
raise Exception.Create(Format('Raid Length must be greater than 1. Given value was %d',[raidLength]));
FS:=TFileStream.Create(fileName,fmOpenRead);
try
setLength(outputFS, raidLength+1); //+1 for the parity byte
setLength(byteArray, raidLength+1);
for ix := 0 to raidLength do //create an output file for each stripe (named .<stripe number>) and the parity file (named .raid)
begin
if ix=raidLength then
outputFS[ix]:=TFileStream.Create(fileName+'.raid',fmCreate)
else
outputFS[ix]:=TFileStream.Create(fileName+'.'+IntToStr(ix),fmCreate);
end;
try
while FS.Position<FS.Size do //while we haven't hit the end of the file
begin
FillChar(byteArray[0],raidLength,0); //zero-fill first so a short final read pads the stripe with zeros
FS.Read(byteArray[0],raidLength); //read in the bytes to the byteArray
byteArray[raidLength]:=0; //this calcs the parity byte by XORing all of the data bytes together
for ix := 0 to raidLength-1 do
byteArray[raidLength]:=byteArray[raidLength] xor byteArray[ix];
for ix := 0 to raidLength do //write out the bytes to the respective stripes (stripes are < raidlength) and the parity file (outputFS[raidlength])
outputFS[ix].Write(byteArray[ix],1);
end;
finally
for ix := 0 to raidLength do //clean up the output streams
outputFS[ix].Free;
end;
finally
FS.Free; //clean up the input stream
end;
end;
procedure RaidFiles2File(fileName, outputName:string; raidLength:integer);
var FS:TFileStream;
inputFS:array of TFileStream;
testFS:TFileStream;
byteArray:array of byte;
ix, damage:integer;
countOfMissing:integer;
checkByte:byte;
checkFails:boolean;
begin
//test to make sure we were called correctly
if FileExists(outputName) then
raise Exception.Create(Format('File %s already exists', [outputName]));
if raidLength<2 then
raise Exception.Create(Format('Raid Length must be greater than 1. Given value was %d',[raidLength]));
//init the basics
checkFails:=false;
testFS:=nil;
countOfMissing:=0;
damage:=-1; //-1 means no data stripe is missing
setLength(inputFS, raidLength+1); //+1 is the parity byte
setLength(byteArray, raidLength+1);
for ix := 0 to raidLength do
inputFS[ix]:=nil;
//setup the output file stream
FS:=TFileStream.Create(outputName,fmCreate);
try
//create the input file streams. make sure we pick up the count of missing streams and a pointer to an input stream for use later on (any stream will do, we're just testing for eof)
for ix := 0 to raidLength do
begin
if ix=raidLength then //this is the parity stream
begin
if FileExists(fileName+'.raid') then
inputFS[ix]:=TFileStream.Create(fileName+'.raid',fmOpenRead)
end
else //this is a stripe stream
begin
if FileExists(fileName+'.'+IntToStr(ix)) then
begin
inputFS[ix]:=TFileStream.Create(fileName+'.'+IntToStr(ix),fmOpenRead);
testFS:=inputFS[ix]; //this is just to test for eof. all files are the same size
end;
end;
if inputFS[ix]=nil then inc(countOfMissing); //if we didn't get an input stream, add to the missing count
end;
//you are only allowed to have 1 missing input file
if countOfMissing>1 then
raise Exception.Create('Unable to recover file! At most one of the part files may be missing');
assert(testFS<>nil, 'testFS=nil? This should never happen');
while testFS.Position<testFS.Size do //while we are not at the end of the input file
begin
if countOfMissing=0 then //if there are no missing streams, we can just remerge the data together
begin
for ix := 0 to raidLength do
inputFS[ix].Read(byteArray[ix],1);
checkByte:=0;
//calc a checkByte to make sure it all still agrees
for ix := 0 to raidLength-2 do //0 based, stop 1 short of the end
if ix=0 then
checkByte:=byteArray[ix] xor byteArray[ix+1] //seed check byte by XORing the 1st two bytes together
else
checkByte:=checkbyte xor byteArray[ix+1]; //XOR the bytes together
if checkByte<>byteArray[raidLength] then //the new checkByte doesn't match the old parity byte. That means we have a problem, the file doesn't match
checkFails:=true;
end
else //if there are missing streams, we have to calculate the missing data
begin //you have to reverse the XOR with the parity bit to determine the result
for ix := 0 to raidLength-1 do //not counting the end of the array, that's where we'll store the damaged byte
if inputFS[ix]<>nil then //if this is a valid stream, read its data
inputFS[ix].Read(byteArray[ix],1)
else
begin //this isn't a valid stream, so remember this is damaged and read from the parity stream for this position
damage:=ix;
inputFS[raidLength].Read(byteArray[ix],1);
end;
//this calcs the missing byte into the end of the array (raidLength) by XORing together everything we did read (the parity byte sits in the damaged slot)
byteArray[raidLength]:=0;
for ix := 0 to raidLength-1 do
byteArray[raidLength]:=byteArray[raidLength] xor byteArray[ix];
if damage>=0 then //only a missing data stripe needs restoring; a missing parity file needs no repair
byteArray[damage]:=byteArray[raidLength]; //replace the damaged byte with the restored byte
end;
FS.Write(byteArray[0],raidLength); //write out the merged (and psbly restored) data
end;
finally
for ix := 0 to raidLength do //clean up the memory from the input streams
if inputFS[ix]<>nil then inputFS[ix].Free;
FS.Free; //clean up the memory for the output stream
end;
if checkFails then
raise Exception.Create(Format('This file (%s) does not match its parity check. Damage may be present', [outputName]));
end;
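To tie it back to the BIGFILE.RAR example, a hypothetical calling sequence might look like this (the paths are made up, and the delete just simulates losing the Rackspace copy; recovery only needs the two remaining parts):

//split BIGFILE.RAR into two 5GB stripes plus a parity file
File2RaidFiles('C:\Backup\BIGFILE.RAR', 2);
//...upload BIGFILE.RAR.0 to Amazon and BIGFILE.RAR.1 to Rackspace, keep BIGFILE.RAR.raid locally...

//later: pretend the Rackspace copy is gone and rebuild from the other two parts
DeleteFile('C:\Backup\BIGFILE.RAR.1');
RaidFiles2File('C:\Backup\BIGFILE.RAR', 'C:\Backup\BIGFILE_RESTORED.RAR', 2);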
Comments
RAID 5 incurs a significant write penalty (due to parity calculation and writing), while read performance is good.
@Thomas: You are correct. Sorry, I made an inline edit and didn't catch the number change. Thanks!
1) for ix := 0 to (rl-2) do
ba[rl] := ba[ix] xor ba[ix+1];
Each pass of loop (1) overwrites ba[rl], so it reduces to:
2) ba[rl] := ba[rl-2] xor ba[rl-1];
So I think your parity calculation is incorrect. You're not accounting for the result of the previous loop iteration.