On (my) Objective-C code that's slower then expected

I was looking into building a spell checker for the geographical names in BucharestApp based on a Peter Norvig's article on building a spell checker so I've started writing a translation of Peter's python script in Objective-C.

The unpleasant surprise came when I timed my Objective-C code against the python script.
Running the spell checker on one of my word lists in python took 1.11 seconds; the Objective-C code was 4.69 times slower.

After factoring out training time because this can be very easily optimized by something as simple as serializing the training dictionary instead of computing it at runtime, I watched how my Objective-C code takes a whopping 5.3 seconds for spellchecking 11 words.

generataed generate -1.459070 s
acount count -0.003310 s
guidlines guideline -1.173020 s
wheere where -0.002277 s
myne mine -0.001197 s
graet great -0.002184 s
silenc silent -0.001916 s
aggresive aggressive -0.003051 s
appreceiated appreciated -0.004764 s
aquantance acquaintance -1.483899 s
beggining beginning -1.185744 s
Total runtime -5.329394



The data set shows a cluster of words performing at .003 seconds and another cluster performing at 1.4 seconds.
What the miss-spellings "generataed", "guidlines", "aquantance", "beggining" have in common is edit distance. Unlike the other miss-spellings that are all at edit distance 1, they are all at edit distance 2. The problem with edit distance 2 is that it produces many variations of the miss-spelled word (around 200000) - Peter explains all this in his article.

My first instinct was to blame NSSet operations. After all, I was intersecting a 200000 set (the words at edit distance 2) with a 40000 set (the dictionary).
This must be slow, right? Wrong. After messing with the initial capacity of my NSSets and poking around Google for some explanation of what I was seeing, I started looking at the data Shark shows.

As expected, Shark shows the bottleneck in knownEdits2, the method which computes variations at edit distance 2.
But it's not NSSet operations that take the most CPU, but rather unexpectedly it's [NSString stringWithFormat].





-[CSSpell knownEdits2] does most of its work in -[CSSpell edits1StringsInObjCUnoptimized]:


- (NSSet *) edits1StringsInObjCUnoptimized:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0;
char c;
//deletions
for(i=0;i<[word length];i++)
[s addObject:[NSString stringWithFormat:@"%@%@",
[word substringToIndex:i],
[word substringFromIndex:i+1]]];


//replaces
for(i=0;i<[word length];i++)
for(c='a';c<='z';c++)
[s addObject:[NSString stringWithFormat:@"%@%c%@",
[word substringToIndex:i],
c,
[word substringFromIndex:i+1]]];

// inserts
for(i=0;i<[word length];i++)
for(c='a';c<='z';c++)
[s addObject:[NSString stringWithFormat:@"%@%c%@",
[word substringToIndex:i],
c,
[word substringFromIndex:i]]];

//transposes
for(i=0;i<[word length]-1;i++)
[s addObject:[NSString stringWithFormat:@"%@%@%@%@",
[word substringToIndex:i],
[word substringWithRange:NSMakeRange(i+1, 1)],
[word substringWithRange:NSMakeRange(i, 1)],
[word substringFromIndex:i+2]]];

return s;
}




-[CSSpell edits1StringsInObjCUnoptimized] is shown by Shark to call -[NSString stringWithFormat] lots of times.
Which makes sense, since edits are formed using loops. But when I looked at how otx shows the assembler code generated from this method, I had another surprise (many parts cut for brevity):


-(id)[CSSpell edits1StringsInObjCUnoptimized:]
[..]
+36 0000281b e8ef270000 calll 0x0000500f +[NSMutableSet setWithCapacity:]
[..]
+101 0000285c e8ae270000 calll 0x0000500f -[(%esp,1) substringToIndex:] ///--
[..]
+134 0000287d e88d270000 calll 0x0000500f +[NSString stringWithFormat:]
[..]
+158 00002895 e875270000 calll 0x0000500f -[(%esp,1) addObject:]
[..]
+268 00002903 e807270000 calll 0x0000500f -[(%esp,1) substringToIndex:] ///--
[..]
+306 00002929 e8e1260000 calll 0x0000500f +[NSString stringWithFormat:]
[..]


The surprise is that substringToIndex is called twice.
Since it's called on an immutable string and thus it yields the same result within the method's scope, I would have expected GCC to optimize the call.

So I rewrote the method, optimizing calls to substring methods:


- (NSSet *) edits1StringsInObjC:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0;
char c;

for(i=0;i<[word length];i++)
{
NSString * toIndex = [word substringToIndex:i];
NSString * fromIndex = [word substringFromIndex:i+1];
NSString * fromIndex2 = [word substringFromIndex:i];


if(i<[word length]-1)
[s addObject:[NSString stringWithFormat:@"%@%c%c%@",
toIndex,
[word characterAtIndex:i+1],
[word characterAtIndex:i],
[word substringFromIndex:i+2]]]; // transposes


[s addObject:[NSString stringWithFormat:@"%@%@", toIndex, fromIndex]]; //deletion

for(c='a';c<='z';c++)
{
[s addObject:[NSString stringWithFormat:@"%@%c%@",
toIndex,
c,
fromIndex]]; //replace
[s addObject:[NSString stringWithFormat:@"%@%c%@",
toIndex,
c,
fromIndex2]]; //insert
}
}

return s;
}



Timing the code after the optimization shows:

generataed generate -1.142463 s
acount count -0.001646 s
guidlines guideline -0.931142 s
wheere where -0.002310 s
myne mine -0.001241 s
graet great -0.001417 s
silenc silent -0.001597 s
aggresive aggressive -0.002322 s
appreceiated appreciated -0.003206 s
aquantance acquaintance -1.157321 s
beggining beginning -0.938545 s
Total runtime -4.192067



This optimization has shaved off ~0.35 seconds when checking miss-spells at edit distance 2.
In the total runtime, we've decreased to 78% out of the initial runtime.

If we check this new version using Shark, the bottleneck is still shown in -[NSString stringWithFormat], in fact the stringWithFormat calls have increased their presence in CPU time to 34% out of the total run-time. Considering the Shark data takes into account training time too, stringWithFormat is in fact burning probably around 50% of our CPU time.






Googling around yet again, I've arrived to the conclusion that NSString might be indeed slow because of the complete UTF-8 support it offers. For the purpose of the spell checker, UTF-8 is not needed, but unfortunately you just can't tell NSString to not care about UTF-8.
How about replacing stringWithFormat and forming a NSString out of pure C strings?



- (NSSet *) edits1StringsInC:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0;
char c;

#define MAX_LEN_C 500
char toIndex[MAX_LEN_C];
char fromIndex[MAX_LEN_C];
char fromIndex2[MAX_LEN_C];
char str[MAX_LEN_C];
char temp[MAX_LEN_C];

if([word length] > MAX_LEN_C)
return [self edits1StringsInObjC:word];

strcpy(str, [word cStringUsingEncoding:NSASCIIStringEncoding]);

for(i=0;i<[word length];i++)
{
strncpy(toIndex, str, i);
toIndex[i] = '\0';
strcpy(fromIndex, str+i+1);
strcpy(fromIndex2, str+i);

sprintf(temp, "%s%c%c%s", toIndex, str[i+1], str[i], str+i+2);
NSString * transposed = [NSString stringWithCString:temp length:strlen(temp)];
if(i<[word length]-1)
[s addObject:transposed];

sprintf(temp, "%s%s", toIndex, fromIndex);
NSString * deletion = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:deletion];

for(c='a';c<='z';c++)
{
sprintf(temp, "%s%c%s", toIndex, c, fromIndex);
NSString * replaced = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:replaced];
sprintf(temp, "%s%c%s", toIndex, c, fromIndex2);
NSString * inserted = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:inserted];
}
}

return s;
}



Timing the code after changing string forming to C shows a total runtime of 2.04 seconds.
Our optimization has shaved off 1 second when checking miss-spells at edit distance 2 compared to the original implementation.
In the total runtime, we're down to 38% from the initial implementation runtime. Mind you, it's still twice as slow when compared to the python equivalent.

In this latest version, the CPU time is split between edits1StringsInC (3X), intersectSet (2X) and unionSet (1X).
edits1StringsInC still takes most of the CPU time, with the most intensive operations being sprintf at 41% and stringWithCString at 30%.

The next thing I tried is replacing sprintf with strcpy.
Yet another half a second in speed gain, but the code becomes less and less readable:


- (NSSet *) edits1StringsInCv2:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0, len;
char c;

#define MAX_LEN_C 500
char str[MAX_LEN_C];
char temp[MAX_LEN_C];

if([word length] > MAX_LEN_C-1)
return [self edits1StringsInObjC:word];

strcpy(str, [word cStringUsingEncoding:NSASCIIStringEncoding]);
for(i=0;i<[word length];i++)
{
strcpy(temp, str); temp[i] = str[i+1]; temp[i+1] = str[i];
len = strlen(temp);

if(i<[word length]-1)
{
NSString * transposed = [NSString stringWithCString:temp length:len];
[s addObject:transposed];
}

strcpy(temp + i, str+i+1);
NSString * deletion = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:deletion];

for(c='a';c<='z';c++)
{
temp[i] = c; strcpy(temp + i + 1, str+i+1);
NSString * replaced = [NSString stringWithCString:temp length:len];
[s addObject:replaced];

strcpy(temp + i + 1, str+i);
NSString * inserted = [NSString stringWithCString:temp length:len+1];
[s addObject:inserted];
}
}

return s;
}




Using strcpy and hard to follow pointer arithmetic instead of sprintf, we get at 1.5 seconds:

generataed generate -0.402929 s
acount count -0.001699 s
guidlines guideline -0.329685 s
wheere where -0.000743 s
myne mine -0.000473 s
graet great -0.001423 s
silenc silent -0.000769 s
aggresive aggressive -0.001243 s
appreceiated appreciated -0.001449 s
aquantance acquaintance -0.421072 s
beggining beginning -0.340965 s
Total runtime -1.509337



In knownEdits2 Shark shows a tie between edit1StringsInCv2 and -[NSMutableSet intersectSet].




At this point, further optimizations in knownEdits probably don't make sense unless NSMutableSet is replaced too. And working with sets in plain C is not fun, so for now I'll settle with a 1.50 seconds, 35% slower run-time when compared to the python equivalent.

Conclusions:

  • Python string handling is fast

  • Python set operations are fast

  • Objective-C string operations are slow