I recently needed to spot pluralised words that may not be real words but follow English language conventions, and so read as plurals to any normal reader. So ComplexBusinesObjects is a valid plural of ComplexBusinesObject, but series is its own plural. The idea was to write code that would recognise a plural in the same way a human might, even if the grammar was not perfect. I needed to spot a plural given strings for both the singular and the plural, rather than generate one from the other, which is different from many of the more popular scenarios.
There are many ways to tackle this. You could use the code inside System.Data.Entity.Design.PluralizationServices.PluralizationService, Castle’s Inflector class, or code from the Humanizer library. However, I did not want to pull in all of Entity Framework, and I found our requirement was “looser” than some of the stricter cases in those libraries. I am not generating plurals, just testing for them. Given perfect generation code, identifying a pair would be trivial, but generating plurals from words that may not be real is challenging. I also preferred a single-class solution to an extra dependency, so I gave up hunting for a pre-rolled solution.
Identifying plurals for dummies
Version one was a very dumb but practically effective solution - if the first four characters of the two strings match and the candidate ends in an s, then it is a plural. An ugly solution, but it “worked” perfectly in production for many months and took about two minutes to write.
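As a rough illustration, the version-one check might look something like the sketch below. It is written in Python purely for brevity and is not the original code. The function name is_plural_naive and the case-insensitive comparison are my own assumptions.

```python
def is_plural_naive(singular: str, plural: str) -> bool:
    """Naive check: the first four characters match and the candidate ends in 's'.

    Hypothetical helper for illustration only. Lower-casing both strings
    is an assumption, not something the original rule specifies.
    """
    s, p = singular.lower(), plural.lower()
    return s[:4] == p[:4] and p.endswith("s")


print(is_plural_naive("ComplexBusinesObject", "ComplexBusinesObjects"))
print(is_plural_naive("person", "people"))
print(is_plural_naive("computer", "compass"))
```

The last two calls show where the shortcut breaks down: it misses irregular plurals such as people, and it accepts any unrelated pair that happens to share a four-letter prefix and end in an s.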
Getting more sophisticated
However, as we processed more data, failing cases inevitably popped up and I decided a rewrite was in order. Using rules and common exceptions curated from the web, such as the irregular plurals list on Wikipedia, I concocted a strategy of rules and edge cases that spots plurals well enough to satisfy my requirements.
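To give a flavour of the rules-plus-exceptions approach, here is a minimal Python sketch. It is not the code from the Gist. The word lists are a tiny illustrative sample rather than the curated list, and every name in it (IRREGULAR_SUFFIXES, UNCHANGED_SUFFIXES, ES_ENDINGS, is_plural_of) is hypothetical. It matches on word endings so that compound names such as SalesPerson still work.

```python
# Illustrative sample data only; a real list would be much longer.
IRREGULAR_SUFFIXES = {
    "child": "children",
    "person": "people",
    "mouse": "mice",
    "foot": "feet",
    "tooth": "teeth",
}
UNCHANGED_SUFFIXES = ("series", "species", "sheep", "deer")
ES_ENDINGS = ("s", "x", "z", "ch", "sh", "o")


def is_plural_of(singular: str, plural: str) -> bool:
    """Loosely test whether `plural` reads as a plural of `singular`."""
    s, p = singular.lower(), plural.lower()
    if not s or not p:
        return False
    # Identical strings only count when the word is its own plural.
    if s == p:
        return s.endswith(UNCHANGED_SUFFIXES)
    # Irregular endings, matched as suffixes to cover compound names.
    for single_end, plural_end in IRREGULAR_SUFFIXES.items():
        if s.endswith(single_end) and p == s[: -len(single_end)] + plural_end:
            return True
    # Regular rules: collect every plausible plural and test membership.
    candidates = {s + "s"}
    if s.endswith(ES_ENDINGS):
        candidates.add(s + "es")            # box -> boxes
    if len(s) > 1 and s.endswith("y") and s[-2] not in "aeiou":
        candidates.add(s[:-1] + "ies")      # category -> categories
    if s.endswith("fe"):
        candidates.add(s[:-2] + "ves")      # knife -> knives
    elif s.endswith("f"):
        candidates.add(s[:-1] + "ves")      # shelf -> shelves
    if s.endswith("is"):
        candidates.add(s[:-2] + "es")       # analysis -> analyses
    return p in candidates


print(is_plural_of("ComplexBusinesObject", "ComplexBusinesObjects"))
print(is_plural_of("SalesPerson", "SalesPeople"))
print(is_plural_of("series", "series"))
```

Building a set of candidate plurals keeps the check deliberately loose: a made-up word only has to match one plausible rule, which is closer to how a human reader would judge it than a strict dictionary lookup.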
The code is far from perfect, is probably slow, and will no doubt miss some edge cases, but it works well for me. Hopefully it will for you too. It is hosted in a Gist and includes some tests that demonstrate its effectiveness.
Go forth and fork it.