Delphi Case with Strings

Zarko Gajic recently posted about this topic on his Delphi Tips, showing how to use a case statement with strings. His solution basically passes an array on the stack and then iterates through it, returning the index of the matching string. I don't often want to do this, but the idea comes up occasionally enough that I thought I'd play with it a little.

The first thing that struck me is that passing things on the stack is bound to be slow. Any time you can avoid memory allocation in your routines, do it. As written, Zarko's StringToCaseSelect routine creates a COPY of the array on the stack. In my testing, just changing the parameter from CaseList: array of string to const CaseList: array of string improved the performance of his example by almost 30%. Mind you, I'm not using hyper-precise counters; however, it definitely makes a difference.
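To make the const change concrete, here is a minimal sketch of an index-lookup routine in the same spirit as StringToCaseSelect. It is not Zarko's code: CaseIndexOf is a hypothetical name, and it assumes a case-insensitive CompareText match. Declaring the open array as const lets the compiler pass it by reference instead of building a copy (and adjusting each string's reference count) on every call.

```pascal
program CaseIndexDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: returns the position of Selector in CaseList,
// or -1 when it is not found. "const" avoids copying the open array.
function CaseIndexOf(const Selector: string;
  const CaseList: array of string): Integer;
var
  I: Integer;
begin
  for I := Low(CaseList) to High(CaseList) do
    if CompareText(Selector, CaseList[I]) = 0 then  // ignores ASCII case
    begin
      Result := I;
      Exit;
    end;
  Result := -1;
end;

begin
  // The returned index becomes an ordinal selector for a normal case
  case CaseIndexOf('delphi', ['About', 'Borland', 'Delphi']) of
    0: WriteLn('About');
    1: WriteLn('Borland');
    2: WriteLn('Delphi');
  else
    WriteLn('no match');
  end;
end.
```

The case labels are positions in the array, so reordering the array silently changes which branch runs; keeping the list and the labels next to each other helps.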

Secondly, I was curious how the performance changed if I used the old standby: If Then Else. Using the same test routine (and imprecise timers), If Then Else is 368% faster than the original, non-const version and 283% faster than the const version. That's a pretty hefty speed penalty to pay just for using a case with strings.


procedure TForm1.standardIF;
const strAb = 'About';
      strBl = 'Borland';
      strDl = 'Delphi';
var s: string; // never assigned, so it stays '' and every comparison runs
begin
     if s = strAb then begin end
else if s = strBl then begin end
else if s = strDl then begin end
end;

Finally, I started to wonder whether there was a faster way to do this. The iteration and memory operations are likely what consume so much time in his example. Trying to reduce them led me back to everyone's favorite string solution: the hash. CodeGear has a very simple hash routine that it uses in the IniFiles unit (TStringHash.HashOf). Taking that code, I tried using it in a simple case statement:


case HashOf('Delphi') of
HashOf('About')   : begin end;
HashOf('Borland') : begin end;
HashOf('Delphi')  : begin end;
end;

but forgot that case labels must be constants. Since Delphi doesn't have a precompiler, that leaves me with the ugly task of generating the hash value for each string by hand and then using those constants in my case. That gives me the following code:


procedure TForm1.delphiHash;
const cnAb = 24272; //HashOf('About');
      cnBl = 389836; //HashOf('Borland');
      cnDl = 92361; //HashOf('Delphi');
begin
// HashOf: standalone copy of TStringHash.HashOf from IniFiles
case HashOf('Delphi') of
cnAb : begin end;
cnBl : begin end;
cnDl : begin end;
end;
end;
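Generating those constants is the tedious part, and a small console program can do it. The HashOf below is a standalone rotate-left-by-2, xor-with-character loop; it reproduces the three constants used in delphiHash (24272, 389836 and 92361), but treat it as a sketch modeled on the IniFiles routine rather than a verbatim copy of CodeGear's source. The constant names are taken from delphiHash.

```pascal
program HashConstants;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Rotate the 32-bit accumulator left by 2 bits, then xor in each character.
// The outer Cardinal() cast keeps only the low 32 bits.
function HashOf(const Key: string): Cardinal;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(Key) do
    Result := Cardinal(((Result shl 2) or (Result shr 30)) xor Ord(Key[I]));
end;

const
  Names: array[0..2] of string = ('About', 'Borland', 'Delphi');
  ConstNames: array[0..2] of string = ('cnAb', 'cnBl', 'cnDl');
var
  I: Integer;
begin
  // Print ready-to-paste constant declarations for the case labels
  for I := Low(Names) to High(Names) do
    WriteLn(ConstNames[I], ' = ', HashOf(Names[I]),
      '; // HashOf(''', Names[I], ''')');
end.
```

Running it prints one declaration per string, for example cnAb = 24272; // HashOf('About'), so the constants can be regenerated whenever the label list changes instead of being computed by hand.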

That's quite a bit of work just to use a case with a string, but the performance might actually make it worthwhile. In my tests, the case hash style ran 204% faster than the If Then Else, 577% faster than the const version of StringToCaseSelect and 750% faster than the original StringToCaseSelect.

Even so, I have to admit that on a modern CPU the performance difference wasn't visible until I ran 10,000,000 iterations or more. If you have a rarely used routine, use whichever method suits you. If you have one that gets hammered a lot, the If Then Else style makes the most sense. I can't really see a point where the payback justifies doing the precompiler's work by hand for the case hash style.
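For anyone who wants to repeat this kind of measurement, a minimal timing loop could look like the sketch below. It is not the test routine used for the numbers above: IfChain is a hypothetical stand-in for standardIF that returns which branch matched, and the timer is Now with DateUtils.MilliSecondsBetween, which, like the timers in the post, is not a high-precision counter.

```pascal
program TimingSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, DateUtils;

const
  Iterations = 10000000;  // the scale at which differences became visible

// Hypothetical If Then Else dispatcher; -1 means no branch matched
function IfChain(const S: string): Integer;
begin
  if S = 'About' then Result := 0
  else if S = 'Borland' then Result := 1
  else if S = 'Delphi' then Result := 2
  else Result := -1;
end;

var
  I, Hits: Integer;
  Start: TDateTime;
begin
  Hits := 0;
  Start := Now;
  for I := 1 to Iterations do
    if IfChain('Delphi') = 2 then  // count matches so the loop does observable work
      Inc(Hits);
  WriteLn('If Then Else: ', MilliSecondsBetween(Now, Start), ' ms, ',
    Hits, ' hits');
end.
```

To compare approaches, swap IfChain for the routine under test and keep the selector string and the iteration count identical between runs.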

Of course, in my own development I rarely need a case on strings, so perhaps you have a need that does make it worthwhile. If so, keep in mind that a hash does not guarantee uniqueness. It's unlikely you'll hit a duplicate, but not impossible. The less random the data, the less likely duplicates become.
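To make the duplicate risk concrete: with the same rotate-and-xor hash (a standalone sketch that matches the constants in delphiHash), the two-character strings 'Aa' and 'Bm' both hash to 357, because (65 shl 2) xor 97 and (66 shl 2) xor 109 are both 357.

```pascal
program HashCollision;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Same rotate-left-by-2, xor-with-character hash as used for the case labels
function HashOf(const Key: string): Cardinal;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(Key) do
    Result := Cardinal(((Result shl 2) or (Result shr 30)) xor Ord(Key[I]));
end;

begin
  // 'A' = 65, 'a' = 97:  (65 shl 2) xor 97  = 260 xor 97  = 357
  // 'B' = 66, 'm' = 109: (66 shl 2) xor 109 = 264 xor 109 = 357
  WriteLn('HashOf(''Aa'') = ', HashOf('Aa'));
  WriteLn('HashOf(''Bm'') = ', HashOf('Bm'));
  if HashOf('Aa') = HashOf('Bm') then
    WriteLn('Collision: a case on HashOf cannot tell these apart');
end.
```

If 'Aa' were one of the case labels, an input of 'Bm' would land in its branch.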

Comments

Anonymous said…

Ah, thanks! That was bugging me too after I read that post.

March 31, 2008 at 3:00 PM

Anonymous said…

And the first thing that struck me was that Zarko had obviously never heard of AnsiIndexText and AnsiIndexStr...

April 1, 2008 at 1:05 AM

Marshall Fryman said…

Actually, I never use those routines either, but it was an interesting point, so I added AnsiIndexText and AnsiIndexStr to my test routine and discovered that Zarko's routine is MUCH faster than either of them. Using Zarko's routine (without the const) as a baseline, AnsiIndexText clocks in at 1213% slower and AnsiIndexStr at 1203% slower. The speed difference arises because both Ansi routines call the Windows API to do the comparison using the local code page. In contrast, Zarko's just uses CompareText, which is optimized assembler with no API calls.

Of course, I'm not sure how CompareText would work with Unicode text, but that wasn't my primary interest. I have yet to see a reason someone should use a case on strings instead of an if then else structure.
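For reference, the StrUtils routines plug into a case statement the same way: AnsiIndexText and AnsiIndexStr take the string and an open array and return the matching index, or -1 when nothing matches. AnsiIndexText ignores case and AnsiIndexStr does not. The Handle procedure is only an illustration.

```pascal
program AnsiIndexDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  StrUtils;

// Illustrative dispatcher built on AnsiIndexText (case-insensitive)
procedure Handle(const S: string);
begin
  case AnsiIndexText(S, ['About', 'Borland', 'Delphi']) of
    0: WriteLn('About');
    1: WriteLn('Borland');
    2: WriteLn('Delphi');
  else
    WriteLn('no match for ', S);  // AnsiIndexText returned -1
  end;
end;

begin
  Handle('delphi');  // matches 'Delphi' because case is ignored
  Handle('Kylix');   // not in the list
end.
```

Switching to AnsiIndexStr makes the lookup case-sensitive, so Handle('delphi') would then fall through to the else branch.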

April 1, 2008 at 9:37 AM

امیر said…

But what you have suggested may not behave like the "if then" version, because of collisions: two strings may have the same hash code. You need to know exactly what the hash function does, or do a string compare as well...

April 1, 2008 at 9:52 AM

Marshall Fryman said…

I completely agree that the hash sample has problems. That's why I've said that I cannot see any reason to use any structure OTHER than the "if then else". I did put a little caveat at the end of the post saying you had to watch out for duplicate hash numbers. It's definitely a known problem with hashes and reduces the provability of the code. Unfortunately, you can't even use a stronger hash (like SHA-512), because a case selector must be an ordinal type.
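One way to keep the hash-based case while restoring the exactness of If Then Else, as the previous comment suggests, is to confirm the string inside each branch. This is a sketch: HashOf is the same standalone rotate-and-xor loop that reproduces the post's constants, Dispatch is a hypothetical name, and the comparison with = is case-sensitive, like the hash itself.

```pascal
program GuardedHashCase;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

const
  cnAb = 24272;  // HashOf('About')
  cnBl = 389836; // HashOf('Borland')
  cnDl = 92361;  // HashOf('Delphi')

// Rotate-left-by-2, xor-with-character hash
function HashOf(const Key: string): Cardinal;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(Key) do
    Result := Cardinal(((Result shl 2) or (Result shr 30)) xor Ord(Key[I]));
end;

// The hash picks the candidate branch; the string comparison rejects
// any other input that merely shares that hash value.
function Dispatch(const S: string): string;
begin
  Result := 'no match';
  case HashOf(S) of
    cnAb: if S = 'About' then Result := 'About';
    cnBl: if S = 'Borland' then Result := 'Borland';
    cnDl: if S = 'Delphi' then Result := 'Delphi';
  end;
end;

begin
  WriteLn(Dispatch('Delphi'));
  WriteLn(Dispatch('delphi'));
  WriteLn(Dispatch('Kylix'));
end.
```

A colliding input now falls through to 'no match' instead of being mistaken for a label, at the cost of one string comparison per call.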

April 1, 2008 at 9:56 AM

Mike Gibbard said…

And, a year on, I too have been curious about the performance of case statements using strings compared to standard if-then-else and came to pretty much the same conclusions. Until I found a simple function on the SwissDelphiCenter site (http://www.swissdelphicenter.ch/torry/showcode.php?id=2330).
So, I decided to benchmark the contenders and got the following results (based on 10 million iterations per function):

Zarko - StringToCaseSelect: 1.956 seconds
Zarko - StringToCaseSelect + const: 1.747 seconds
RaverJK - CaseOfString: 1.249 seconds
RaverJK - CaseOfString + const: 1.171 seconds
RaverJK - CaseOfString + const + SameText: 0.813 seconds
If statement: 1.512 seconds

As you can see, the small modifications I made to RaverJK's original function made a significant speed difference to an already fast piece of code.

The modified CaseOfString function is:

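function CaseOfString3(s: string; const a: array of string): Integer;
begin
  Result := 0;
  // short-circuit evaluation stops before a[Length(a)] is read
  while (Result < Length(a)) and (not SameText(a[Result], s)) do
    Inc(Result);
  // reaching the end means no match; a[Result] would be out of bounds here
  if Result = Length(a) then Result := -1;
end;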

April 9, 2009 at 10:18 AM

Marshall Fryman said…

Mike:

Can you identify which version of Delphi you ran those tests on? I'm using D2007 and am not able to match up your numbers.

In my tests the CaseOfString3 routine takes 191% of the time of the If Then Else routine. I did note that the more case elements there are, the slower the CaseOfString3 routine performs. For instance, when run with only 5 elements, the difference drops to 165%. My original test had 25 elements at 191%.

If you are using Delphi 2009, I suspect you would see less of a difference, since it might use the MS Windows comparator, which takes the code page into account. As I mentioned previously, Delphi 2007 uses the Windows comparator in the AnsiIndexXXX routines, and this slows them down considerably. My suspicion is that Delphi 2009 ONLY uses the code page comparator, making string operations much slower. Note that I don't use D2009, so I cannot confirm that; it is only my opinion.

April 9, 2009 at 1:02 PM

Mike Gibbard said…

Marshall,

I used Delphi 7 running on a dual core 2.33GHz machine with 2GB RAM and Vista Ultimate

April 30, 2009 at 1:03 AM

Marshall Fryman said…

@Mike: Interesting. I'm running D2007 on a quad-core with Vista x64 Business and 8GB of RAM. Would you send me your code and I'll run it on my system? I've been looking for my test code but didn't apparently save the original so I'm left with just an EXE. E-mail is marshall.fryman at gmail.

April 30, 2009 at 2:43 PM

Alex vd vliet said…

Funny, I created the following piece of code to avoid a big if-then structure.

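procedure TForm1.Button1Click(Sender: TObject);
type
  TSubActions = (test1, test2, test3, test4);
var
  FSub : TSubActions;
  FAction : String;
  Idx : Integer;
begin
  // GetEnumValue (unit TypInfo) maps an identifier name to its ordinal value
  FAction := 'test2'; // example value; normally it comes from elsewhere
  Idx := GetEnumValue(TypeInfo(TSubActions), FAction);
  if Idx < 0 then Exit; // -1: FAction is not a TSubActions name
  FSub := TSubActions(Idx);
  case FSub of
    test1 : begin end;
    test2 : begin end;
    test3 : begin end;
    test4 : begin end;
  end;
end;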

November 10, 2011 at 5:47 AM

Ruminated Rumblings
