Annoying Scrabble words like azo are even more annoying when you use a blank

An interesting problem I had recently was to check to see if a string was valid word or not, comparing against a word list with over 170,000 entries.

Checking the string is easy, after loading the words into an NSArray, you can just call containsObject:

NSString *wordToFind = @“the”;
BOOL wordIsValid = [wordArray containsObject:wordToFind];

The code above takes 0.0310 seconds on my iPhone 5s.

It’s even faster with an NSSet:

NSString *wordToFind = @“the”;
BOOL wordIsValid = [wordSet containsObject:wordToFind];

Using an NSSet is over 300x faster than NSArray in this instance: 0.00001s!

Blankety blank

What if you wanted to search to see if the word was valid using blanks, like in Scrabble? NSPredicate makes this very easy:

NSString *wordToFind = @“th?”;
NSPredicate = [NSPredicate predicateWithFormat:@“SELF LIKE %@“,wordToFind];
NSArray *filteredArray = [wordArray filteredArrayUsingPredicate:predicate];

But then I looked at how long it took: 0.44 seconds. Ouch! It’s not acceptable to block the main thread for that long. There’s a similar method for NSSet:

NSString *wordToFind = @“th?”;
NSPredicate = [NSPredicate predicateWithFormat:@“SELF LIKE %@“,wordToFind];
NSSet *filteredSet = [wordSet filteredSetUsingPredicate:predicate];

which took 0.47 seconds – even worse!

Benchmarking

At this point I decided to benchmark the results by running the test on 500 randomly selected words, where 5% of the letters were turned into blanks. Here’s a sample from my test word list of 500 words:

unclu?tered
succumbs
piggeries
pseu?opregnant
combat?ve

(I have no idea what ‘pseudopregnant’ means…).

Running the test with 500 different words and measuring the time it took to do each test enabled me to record the mean, minimum, maximum and standard deviation of each method. Unfortunately I had to run these tests on the iOS simulator – turns out there’s a subtle bug when repeatedly running an NSPredicate which causes a huge number of NSComparisonPredicate objects to be malloc’ed and not freed, causing the tests above to blow up due to memory pressure on my 5S.

Using the simulator, here’s the results of the test with filteredArrayUsingPredicate: and filteredSetUsingPredicate:

Method	Average	Min	Max	Standard Deviation
filteredArrayUsingPredicate:	0.15	0.12	0.34	0.03
filteredSetUsingPredicate:	0.16	0.14	0.54	0.03

Other in-memory methods

I then tried a litany of various methods on NSSet and NSArray to see if I could come up with something faster.

Using blocks: indexOfObjectPassingTest

    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF LIKE %@",wordToCheck];
    NSUInteger index = [wordArray indexOfObjectPassingTest:^BOOL(id obj, NSUInteger idx, BOOL *stop) {
        return [predicate evaluateWithObject:obj];
    }];
    if(index != NSNotFound)
    {
//can return the found word using index
}

Prefiltering NSArray

This method pre-filters the word list by narrowing it down to only words which have the same first letter (unless we have a blank as the first letter) as the word we’re trying to match.

    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF LIKE %@",wordToCheck];
    
    NSString *firstLetter = [wordToCheck substringToIndex:1];
    NSArray *smallerArray;
    //prefiltering on first letter will only work if first letter is not blank
    if([firstLetter isEqualToString:@"?"])
    {
        smallerArray = wordArray;
    }
    else
    {
        NSPredicate *firstLetterPredicate = [NSPredicate predicateWithFormat:@"SELF BEGINSWITH %@",firstLetter];
        smallerArray = [wordArray filteredArrayUsingPredicate:firstLetterPredicate];
    }
    
    NSArray *filteredArray = [smallerArray filteredArrayUsingPredicate:predicate];
    if([filteredArray count] > 0)
    {
//return matched word here if needed
}

Prefiltering NSSet

    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF LIKE %@",wordToCheck];
    
    NSString *firstLetter = [wordToCheck substringToIndex:1];
    //prefiltering on first letter will only work if first letter is not blank
    NSSet *smallerSet;
    if([firstLetter isEqualToString:@"?"])
    {
        smallerSet = wordSet;
    }
    else
    {
        NSPredicate *firstLetterPredicate = [NSPredicate predicateWithFormat:@"SELF BEGINSWITH %@",firstLetter];
        smallerSet = [wordSet filteredSetUsingPredicate:firstLetterPredicate];
    }
    
    NSSet *filteredSet = [smallerSet filteredSetUsingPredicate:predicate];
    if([filteredSet count] > 0)
    {
//return matched word
}

Using a compound predicate on NSArray

What if we combined the above two methods into a compound predicate – where we check to see if the first letter is the same AND the word is like the word we’re searching for. This might result in a faster match as we can avoid using the expensive LIKE predicate if the first letters are not matching:

    NSPredicate *predicate;
    //if first letter is blank we have to fall back to using normal predicate
    if([firstLetter isEqualToString:@"?"])
    {
        predicate = [NSPredicate predicateWithFormat:@"SELF LIKE %@",wordToCheck];
    }
    else
    {
        predicate = [NSPredicate predicateWithFormat:@"SELF BEGINSWITH %@ AND SELF LIKE %@",firstLetter,wordToCheck];
    }
    
    
    NSArray *filteredArray = [wordArray filteredArrayUsingPredicate:predicate];
    if([filteredArray count] > 0)
    {
//retrieve matched word here
}

Using a compound predicate on NSSet

    NSString *firstLetter = [wordToCheck substringToIndex:1];
    NSPredicate *predicate;
    //if first letter is blank we have to fall back to using normal predicate
    if([firstLetter isEqualToString:@"?"])
    {
        predicate = [NSPredicate predicateWithFormat:@"SELF LIKE %@",wordToCheck];
    }
    else
    {
        predicate = [NSPredicate predicateWithFormat:@"SELF BEGINSWITH %@ AND SELF LIKE %@",firstLetter,wordToCheck];
    }
    
    NSSet *filteredSet = [wordSet filteredSetUsingPredicate:predicate];
    if([filteredSet count] > 0)
    {
//match word here
}

NSArray concurrent enumeration

NSArray also has a method:

- (void)enumerateObjectsWithOptions:(NSEnumerationOptions)opts 
                         usingBlock:(void (^)(ObjectType obj, NSUInteger idx, BOOL *stop))block;

which allows concurrent enumeration when you pass the NSEnumerationConcurrent option:

__block NSString *stringToReturn;
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF LIKE %@",wordToCheck];
    [wordArray enumerateObjectsWithOptions:NSEnumerationConcurrent
                                usingBlock:^(id obj, NSUInteger idx, BOOL *stop)  {
                                    if([predicate evaluateWithObject:obj])
                                    {
                                        stringToReturn = obj;
                                        *stop = YES;
                                    }
                                }];
    if(stringToReturn)
    {
}

NSSet concurrent enumeration

NSSet offers an almost identical method, except there is no index parameter as NSSets are unordered:

    __block NSString *stringToReturn;
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF LIKE %@",wordToCheck];
    [wordSet enumerateObjectsWithOptions:NSEnumerationConcurrent
                                usingBlock:^(id obj, BOOL *stop)  {
                                    if([predicate evaluateWithObject:obj])
                                    {
                                        stringToReturn = obj;
                                        *stop = YES;
                                    }
                                }];
    if(stringToReturn)
    {
}

Results

Method	Average	Min	Max	Standard Deviation
indexOfObjectPassingTest	0.07	0.0001	0.31	0.04
Prefiltering NSArray	0.07	0.04	0.24	0.03
Prefiltering NSSet	0.08	0.05	0.41	0.03
NSArray compound predicate	0.09	0.06	0.33	0.03
NSSet compound predicate	0.10	0.08	0.26	0.03
NSArray concurrent enumeration	0.10	0.004	0.36	0.06
NSSet concurrent enumeration	0.10	0.01	0.31	0.06

We have managed to do a bit better with some of these methods, but we’re still nowhere near acceptable performance. Maybe Core Data might have some under-the-hood optimisations that might help us:

Core Data

I set up a simple Core Data stack, with one entity “Word” which had one attribute “word” (naming things is hard…). Here’s the method to retrieve a matching word in my data controller:

-(Word *)wordMatchingString:(NSString *)stringToMatch
{
    Word *wordToReturn;
    NSFetchRequest *request = [[NSFetchRequest alloc]init];
    //we want to retrieve all things
    NSEntityDescription *e = [[[persistenceController mom]entitiesByName]objectForKey:@"Word"];
    //set the entity description to the fetch request
    [request setEntity:e];
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"word LIKE %@",stringToMatch];
    [request setPredicate:predicate];
    //limit to one result
    [request setFetchLimit:1];
    
    NSError *error;
    //execute the fetch request and store it in an array
    NSArray *result = [[persistenceController mainContext] executeFetchRequest:request error:&error];
    //if not successful, throw an exception with the error
    if(!result)
    {
        [NSException raise:@"Fetch failed" format:@"Reason: %@",[error localizedDescription]];
    }
    
    //return the only object in the array
    wordToReturn = [result lastObject];
    return wordToReturn;
}

And here’s the method to match the word:

    Word *foundWord = [[WSDataController sharedController]wordMatchingString:wordToCheck];
    if(foundWord)
    {
}

The results for this were all over the place:

Method	Average	Min	Max	Standard Deviation
Core Data	0.10	0.001	0.76	0.06

I had high hopes for Core Data but it’s just not fast or consistent enough for this application.

Descent into SQLite

I was about to give up at this point. I asked Dave Sims if he had any advice and he suggested implementing a Directed Acyclic Word Graph (DAWG) which would be very efficient for finding matches – despite the opportunity to make “yo dawg” jokes I reckoned it might be a bit above my pay grade for some weekend pottering.

I’d read about some iOS developers preferring to deal directly with SQLite for their persistent storage, rather than using Core Data (which uses SQLite as one of its storage options). Relishing the opportunity to type in all caps, I followed a tutorial to get a basic SQLite setup going (the tutorial has a small error in, which XCode will catch with a warning, the solution is to replace the offending line with if (sqlite3_step(compiledStatement) == SQLITE_DONE) { ). Here’s my SQLite query:

    //SQLite uses underscores as the single character wildcard
    NSString *underscoreString = [wordToCheck stringByReplacingOccurrencesOfString:@"?" withString:@"_"];
    NSString *query = [NSString stringWithFormat:@"SELECT word FROM words WHERE word LIKE '%@'",underscoreString];
    NSArray *resultsArray = [[NSArray alloc] initWithArray:[self.dbManager loadDataFromDB:query]];
    if([resultsArray count]>0)
    {
}

This gave the following results:

Method	Average	Min	Max	Standard Deviation
SQLite	0.017	0.0014	0.056	0.004

Success!

Using SQLite not only resulted in faster matching, it was also much more reliable than any of the other methods, with a very small standard deviation. Obviously matching a string in an array of 170,000 strings is an edge case, and for most cases any of the other methods would have sufficed.

It’s also worth noting that I was able to run the Core Data and SQLite tests on my own iPhone 5S as they do not suffer the same NSPredicate memory leak as the other methods. Here are the results on the phone:

Method	Average	Min	Max	Standard Deviation
Core Data (iPhone 5S)	0.35	0.002	0.78	0.19
SQLite (iPhone 5S)	0.037	0.036	0.042	0.0006

Full results

Method	Average	Min	Max	Standard Deviation
filteredArrayUsingPredicate:	0.15	0.12	0.34	0.03
filteredSetUsingPredicate:	0.16	0.14	0.54	0.03
indexOfObjectPassingTest	0.07	0.0001	0.31	0.04
Prefiltering NSArray	0.07	0.04	0.24	0.03
Prefiltering NSSet	0.08	0.05	0.41	0.03
NSArray compound predicate	0.09	0.06	0.33	0.03
NSSet compound predicate	0.10	0.08	0.26	0.03
NSArray concurrent enumeration	0.10	0.004	0.36	0.06
NSSet concurrent enumeration	0.10	0.01	0.31	0.06
Core Data	0.10	0.001	0.76	0.06
SQLite	0.017	0.0014	0.056	0.004
Core Data (iPhone 5S)	0.35	0.002	0.78	0.19
SQLite (iPhone 5S)	0.037	0.036	0.042	0.0006