Counting words in a string in C#

It’s been a while since I’ve done a programming post, but something came up the other day that I thought might be useful.

In one of my current projects, I needed to determine the number of words in a string. Simple enough, but before I wrote the code, I did a Google search to see if there was some obviously better way. Surprisingly, I didn’t find one. Worse, all the examples I found were either really inefficient or wrong, or both!

In the inefficient and wrong category, a lot of people were doing a split on various white-space characters (or just a space), creating an array just for the purpose of counting. Not only do you end up with a potentially massive data structure you don’t need, but if there are multiple spaces or odd white-space combinations, you end up with bogus empty entries that inflate the count. Yikes.
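
To make the problem concrete, here is a minimal sketch of the split-and-count approach described above. The class and method names (SplitCountDemo, NaiveCount) are made up for illustration and are not taken from any of the examples I found:

```csharp
using System;

// Hypothetical demo class showing why split-and-count goes wrong.
public static class SplitCountDemo
{
    // Naive approach: split on a single space and count the pieces.
    public static int NaiveCount(string text)
    {
        return text.Split(' ').Length;
    }

    public static void Main()
    {
        // Two spaces between the words produce an empty entry in the middle.
        Console.WriteLine(NaiveCount("hello  world")); // prints 3, not 2

        // An empty string still yields one (empty) entry.
        Console.WriteLine(NaiveCount(""));             // prints 1, not 0
    }
}
```

On top of the wrong answers, every call allocates an array plus one string per piece, just to read a length.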

Anyway, it seems that the usual test for accuracy is comparing against MS Word’s word count. My algorithm is pretty simple, but it is fairly efficient and matches the MS Word results for every test I’ve tried. This doesn’t mean that it is right, merely that, if it is wrong, it is wrong in the same way as MS Word!

I wrote it as an extension method, so it needs to be put into a static class. White space and comments removed for brevity:

public static int CountWords(this string value)
{
  int count = 0;
  bool inWord = false;
  for (int i = 0; i < value.Length; i++)
  {
    bool isWhiteSpace = Char.IsWhiteSpace(value[i]);
    if (!inWord && !isWhiteSpace)
      count++;
    inWord = !isWhiteSpace;
  }
  return count;
}
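
For anyone who wants to try it, here is a rough, self-contained sketch of the method inside a static class, with a few sample calls. The class names (StringExtensions, Program) and the sample strings are my own placeholders, and the method assumes a non-null string:

```csharp
using System;

// Hypothetical container class; extension methods must live in a static class.
public static class StringExtensions
{
    public static int CountWords(this string value)
    {
        int count = 0;
        bool inWord = false;
        for (int i = 0; i < value.Length; i++)
        {
            bool isWhiteSpace = Char.IsWhiteSpace(value[i]);
            // A word starts at a non-white-space character that is not already inside a word.
            if (!inWord && !isWhiteSpace)
                count++;
            inWord = !isWhiteSpace;
        }
        return count;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("hello  world".CountWords());             // prints 2
        Console.WriteLine("  leading and trailing  ".CountWords()); // prints 3
        Console.WriteLine("tabs\tand\nnewlines".CountWords());      // prints 3
        Console.WriteLine("".CountWords());                         // prints 0
    }
}
```

Because the loop only tracks whether it is currently inside a word, runs of spaces, tabs, and newlines never create extra entries, and nothing is allocated along the way.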

Not much to it. Hopefully it will be useful to someone…

2 Comments on “Counting words in a string in C#”

    • The Internet failed me–I thought it was always truthful :-(.

      I just found an interpreter for Whitespace written in Python. I think I will use it to create a Perl compiler…
