Counting Word Occurrences


 

I put this solution together for a reader named Rakesh. Rakesh asked for a solution that counts the number of times certain words occur in Outlook messages received within a given date range. The list of words he wants to count are in column B of a spreadsheet, one word per row. Rakesh wants the counts to appear next to the words in column C. Here’s how Rakesh will use the solution.

  1. Select the Outlook folder containing the messages he wants to process
  2. Run the macro
  3. Enter the path and file name of the Excel workbook containing the words he wants to count for
  4. Enter a date range.

The macro scans every message in the selected folder that arrived within the date range he entered. The code scans each message for every word in the workbook and write the counts to the spreadsheet.

Instructions.

Follow these instructions to add the code to Outlook.

  1. Start Outlook
  2. Press ALT + F11 to open the Visual Basic Editor
  3. If not already expanded, expand Microsoft Office Outlook Objects
  4. If not already expanded, expand Modules
  5. Select an existing module (e.g. Module1) by double-clicking on it or create a new module by right-clicking Modules and selecting Insert → Module.
  6. Copy the code from the code snippet box and paste it into the right-hand pane of Outlook’s VB Editor window
  7. Click the diskette icon on the toolbar to save the changes
  8. Close the VB Editor
Const MACRO_NAME = "Count Word Occurrences"

Sub CountWords()
    Dim olkLst As Object, _
        olkMsg As Object, _
        olkAtt As Object, _
        excApp As Object, _
        excWkb As Object, _
        excWks As Object, _
        lngRow As Integer, _
        strFilename As String, _
        strDateRange As String, _
        arrTemp As Variant, _
        datStart As Date, _
        datEnd As Date
    strFilename = InputBox("Enter a filename (including path) to save the exported messages to.", MACRO_NAME)
    If strFilename <> "" Then
        strDateRange = InputBox("Enter a date range for the messages you want to process in the form ""mm/dd/yyyy to mm/dd/yyyy""", MACRO_NAME, Date & " to " & Date)
        arrTemp = Split(strDateRange, "to")
        datStart = IIf(IsDate(arrTemp(0)), arrTemp(0), Date)
        datEnd = IIf(IsDate(arrTemp(1)), arrTemp(1), Date)
        Set excApp = CreateObject("Excel.Application")
        Set excWkb = excApp.Workbooks.Open(strFilename)
        Set excWks = excWkb.Worksheets(1)
        Set olkLst = Application.ActiveExplorer.CurrentFolder.Items.Restrict("[ReceivedTime] >= '" & Format(datStart, "ddddd h:nn AMPM") & "'" & " AND [ReceivedTime] <= '" & Format(datEnd, "ddddd h:nn AMPM") & "'")
        lngRow = 5
        Do While excWks.Cells(lngRow, 2).Value <> ""
            excWks.Cells(lngRow, 3).Value = 0
            'Write Counts to spreadsheet
            For Each olkMsg In olkLst
                'Only export messages, not receipts or appointment requests, etc.
                If olkMsg.Class = olMail Then
                    If FindString(olkMsg.Body, excWks.Cells(lngRow, 2).Value) Then
                        excWks.Cells(lngRow, 3).Value = excWks.Cells(lngRow, 3).Value + 1
                    End If
                End If
            Next
            lngRow = lngRow + 1
        Loop
        excWkb.Save
        excWkb.Close
    End If
    Set olkMsg = Nothing
    Set olkLst = Nothing
    Set excWks = Nothing
    Set excWkb = Nothing
    Set excApp = Nothing
    MsgBox "Process complete.", vbInformation + vbOKOnly, MACRO_NAME
End Sub

Function FindString(strText As String, strFind As String) As Boolean
    Dim objRegEx As Object, colMatches As Object, objMatch As Object
    Set objRegEx = CreateObject("VBscript.RegExp")
    With objRegEx
        .IgnoreCase = False
        .Global = True
        .Pattern = strFind
        Set colMatches = .Execute(strText)
    End With
    If colMatches.Count > 0 Then
        FindString = True
    End If
    Set objRegEx = Nothing
    Set colMatches = Nothing
    Set objMatch = Nothing
End Function
Advertisements

23 comments on “Counting Word Occurrences

  1. Mr. Lee

    What if I need it to read two words? Like Option 1, Option 2. Your code works brilliant and I have column B with Option, Option 1, Option 2 etc all the way to 8 but obviously, it’s only reading “Option” what must I change to it can ready Option%?
    Thank you kindly

    • Hi, nesbie.

      I’m not sure I understand what you want. The solution already searches for every word you enter into the spreadsheet. There’s no limit. For example, if column B of the spreadsheet contained

      • Option 1
      • Option 2
      • Option 3

      Then the code would search every message for those three words.

      Or are you saying that you want the search to be a logical OR type search where the solution searches for all those messages containing Option 1, or Option 2, or Option 3?

    • At first I thought it would only count single words like in column B and give the output on column C for that given word – my issue is that the emails were coming in with a background color and for some odd reason, it wasn’t reading the content within the emails with a slight gray color so now I switched the incoming emails to come in as white and it’s reading it correctly. This made me think it was just reading single words given my issue was with the way my emails were coming and not on your code. Thank you for this code, works brilliant! A+ on your skills sir!

  2. David,

    Thank you for the post and code – it has been a huge help for me! I am trying to actually combine the original code (from your post) with the code you wrote for Amit. Essentially, I want to create a list of words in Excel, and whenever a new outlook e-mail comes in, it should automatically calculate the density (percent) of the use of each word in excel. So, if I have it look for the word “I” and it finds 2.4% of the total words in an incoming e-mail are “I”, then it should write that to an excel file (or just pop up an alert with the % of each word).

    Is this something that is possible? I have been trying to do it entirely within outlook, however I’m not sure that is doable. Thank you so much!

    Shawn

    • Hi, Shawn.

      It is possible, but I’m not sure it’s practical. Depending on how many words the solution has to search for in each message and the number of messages you receive each day this could slow Outlook down quite a bit. Pop-ups would be worse, especially in this instances where you received several messages at once. Are you sure you want to do this?

    • David,

      Thank you for your response – since I originally posted I rethought the use of the program, and I plan on pulling outlook out of it all together and focus on running the code through excel and word. Here is my current barrier that I am trying to figure out:

      I am looking for a combination of the code you posted but I can’t seem to get it right. I have a Macro in Excel that allows you to select a word document, and then [ideally in the background], I want it to run a word occurrence count on that selected word document, and then populate the percent into specific cells within excel.

      The issue is that I want it to run 5 or 6 categories of words, and each category has 20 words in it. So I would have a sheet in Excel, where A1 is “Personal Pronouns” and then I would have 20 personal pronouns listed below that starting in A3, and then in A2, after counting the % occurrence in the word document, it would sum all the occurrence percentages of personal pronouns and tell me the total % of personal pronoun use in the document. In B1 I might have “Violence Words” and then a list of words associated with violence below that.

      The background of this project is that I am trying to identify differences in writing and writing style for criminals or folks that commit acts of violence, so we can then apply this tool to the writing of individuals who raise red flags but haven’t done anything bad and determine whether or not they are a threat, and what the best course of action to helping them would be.

      Thank you for the help with this!

    • Hi, Shawn.

      I’d take a different approach to this. I’d write this in VBscript and have it process text files and write the results to an Excel spreadsheet. Text files are the most common file type and eliminate the need for working around formatting issues. Word documents can be exported to a text files as can Adobe Acrobat and many other file types. This would give you the ability to process more types of documents. Storing the word lists in the script would also make the solution more portable. All you’d need is the script. Would this solution work for you?

    • David,

      Yes – that solutions definitely would work better for me. Unfortunately I don’t know much VBScript, but now is as good as a time as ever to start learning!

      Also – I agree about the text files, that is not a problem at all. I definitely don’t want it having to look through any formatting or other code that is irrelevant to the person actually writing the document.

    • Hi, Shawn.

      Here is a script that counts the number of times different lists of words occur in a text file. I’ve defined three sets of words, but you can modify it to have as many or as few as you need. The script writes the results to a text file. Each list of words must be enclosed in parenthesis and the words must be separated by the | character.

      Follow these instructions to set up the script on your computer.

      1. Open Notepad
      2. Copy the code below and paste it into Notepad
      3. Edit the code as desired
      4. Save the file as Count-Words.vbs
      5. Close Notepad
      6. Move the saved file to your SendTo folder

      To use the script right-click any .txt file, select Send to, then select this script. The script will count the words and write the results to a text file in the same folder as the source file. The output file name will be Source-File-Name-Word-Count.txt (where Source-File-Name is the name of the input file).

      'Define some constants
      	Const WORD_COUNT = "\S+"
      	Const SCRIPT_NAME = "Count Words (v1.0)"
      
      'Define some variables
      	Dim strLst1, strLst2, strLst3, strFil, strFol, objFSO, objInp, objOut, strBuf, varCnt
      
      'Initialize some variables
      	strLst1 = "(customer|customers)"
      	strLst2 = "(manage|coordinate)"
      	strLst3 = "(communication)"
      
      'Main routine
      	strFil = WScript.Arguments(0)
      	Set objFSO = CreateObject("Scripting.FileSystemObject")
      	If objFSO.FileExists(strFil) Then
      		If objFSO.GetExtensionName(strFil) = "txt" Then
      			strFol = objFSO.GetParentFolderName(strFil)
      			Set objInp = objFSO.OpenTextFile(strFil)
      			strBuf = objInp.ReadAll
      			objInp.Close
      			varCnt = CountOccurrences(strbuf, WORD_COUNT)
      			varLst1 = CountOccurrences(strBuf,strLst1)
      			varLst2 = CountOccurrences(strBuf,strLst2)
      			varLst3 = CountOccurrences(strBuf,strLst3)
      			Set objOut = objFSO.CreateTextFile(strFol & "\" & objFSO.GetBaseName(strFil) & "-Word-Count.txt", True)
      			objOut.WriteLine "The document contains " & varCnt & " words."
      			objOut.WriteLine "The document contains " & varLst1 & " occurrences of the words in list #1.  Percent of total = " & FormatNumber(((varLst1/varCnt) * 100),2)
      			objOut.WriteLine "The document contains " & varLst2 & " occurrences of the words in list #2.  Percent of total = " & FormatNumber(((varLst2/varCnt) * 100),2)
      			objOut.WriteLine "The document contains " & varLst3 & " occurrences of the words in list #3.  Percent of total = " & FormatNumber(((varLst3/varCnt) * 100),2)
      			objOut.Close
      		End If
      	End If
      	Set objInp = Nothing
      	Set objOut = Nothing
      	Set objFSO = Nothing
      
      'Terminate the script
      	MsgBox "Counting complete.", vbInformation+vbOKOnly, SCRIPT_NAME
      	WScript.Quit
      
      Function CountOccurrences(strText, strFind)
          Dim objRegEx, colMatches
          Set objRegEx = CreateObject("VBscript.RegExp")
          With objRegEx
              .IgnoreCase = True
              .Pattern = strFind
              .Global = True
          End With
          Set colMatches = objRegEx.Execute(strText)
          CountOccurrences = colMatches.count
          Set objRegEx = Nothing
          Set colMatches = Nothing
      End Function
      
    • David, I am out of the country without access to a computer that can run this code until mid-May. I will try it as soon as I get back then – I am eager to move forward with this! I just wanted to stop in and say thank you for putting this together.
      Cheers!

  3. Hi David,

    I am looking for a Word 2007 macro that does the following:

    – Calculated the word density of a list of words found in a given text. The list of words can be in an Excel sheet. The macro finds out whether or not any of these words occur in the give text, counts the total number of occurrences, and calculates the density.

    It sounds similar to the code above, except that the text is in a standard Word doc.

    Any help more welcome.

    Thanks,

    Amit

    • Hi, Amit.

      Would the solution always check the entire document or would it ever need to search a subset of the document (e.g. Just the selected text)? Do you want to calculate the density for each word or only for the total?

    • Thanks for the reply, David. The program will need to check the entire doc. Let me explain what I really want. The macro will refer to a user-defined list (could be in Excel, or hard coded in the macro itself). It will then count how many times these words occur in the doc text, take an overall total, then do a simple density calculation.

      For eg, the list contains ‘red’ and ‘blue’. The document text (100 words) contains ‘red’. The macro output will give me 1%.

    • Amit,

      This should do it. I tested this with a very small document (i.e. less than 100 words) and it worked correctly. I don’t know how well it will work with larger documents. It may not work at all for very large documents. The code goes in Word. I’m assuming that you are familiar with adding macro code to Word.

      The code expects the spreadsheet to contain the list of words. It expects one word per line with the words in column A. Column B will contain the count for each word. Column C will contain the density in percentage format.

      Sub CountWordOccurrence()
          'On the next line, edit the path to the spreadsheet containing the list of words to count occurrences for.
          Const EXCEL_FILE_NAME = "c:\users\david\documents\testarea\21cstories.xlsx"
          'On the next line, edit the starting row within the spreadsheet.  The default of 2 assumes that the sheet has a column header.
          Const EXCEL_START_ROW = 2
          Const MACRO_NAME = "Count Word Occurrences"
          Dim excApp As Object, _
              excWkb As Object, _
              excWks As Object, _
              lngRow As Long, _
              lngTot As Long, _
              strBod As String, _
              strWrd As String
          lngTot = ActiveDocument.Range.ComputeStatistics(wdStatisticWords)
          strBod = ActiveDocument.Content.Text
          Set excApp = CreateObject("Excel.Application")
          Set excWkb = excApp.Workbooks.Open(EXCEL_FILE_NAME)
          Set excWks = excWkb.Worksheets(1)
          For lngRow = EXCEL_START_ROW To excWks.UsedRange.Rows.Count
              strWrd = excWks.Cells(lngRow, 1).Text
              excWks.Cells(lngRow, 2).Value = WordCount(strBod, "\b" & strWrd & "\b")
              If excWks.Cells(lngRow, 2).Value > 0 Then
                  excWks.Cells(lngRow, 3) = (excWks.Cells(lngRow, 2).Value / lngTot)
              Else
                  excWks.Cells(lngRow, 3) = 0
              End If
              excWks.Cells(lngRow, 3).NumberFormat = "0.00%"
          Next
          excWks.Columns("A:C").AutoFit
          excWkb.Save
          excWkb.Close
          Set excWks = Nothing
          Set excWkb = Nothing
          Set excApp = Nothing
          MsgBox "Process complete.", vbInformation + vbOKOnly, MACRO_NAME
      End Sub
      
      Function WordCount(strText As String, strFind As String) As Long
          Dim objRegEx As Object, colMatches As Object, objMatch As Object
          Set objRegEx = CreateObject("VBscript.RegExp")
          With objRegEx
              .IgnoreCase = True
              .Global = True
              .Pattern = strFind
              Set colMatches = .Execute(strText)
          End With
          WordCount = colMatches.Count
          Set objRegEx = Nothing
          Set colMatches = Nothing
          Set objMatch = Nothing
      End Function
      
    • Thank you so much, David. I tested and the macro is working smoothly.

      But need a small tweak. Sorry I didn’t explicitly make this clear – the % needed is for the total count of listed words found, and not individual words found. So if two listed words are found in a 100, it will be 2%. Secondly, instead of the process complete message, will it be possible to display this total % count directly to the user?

      David, let me explain the background and why I needed this code. I want to offer a small free utility on my website – a jargon density checker. You know how industry jargon is widespread and used recklessly. My utility will allow writers/content creators to check the jargon density in their content. The list will comprise popular jargon words.

      And I want to offer this tool as a web interface on my website. It will have minimal design – a text box for users to past text, and a few buttons. The function will be the same as that of this macro.

      If you could help me create the tool (in php, perl or python or any other suitable script) that will be great. I am willing to compensate for the efforts, so do let me know how much it will cost.

      Thank you so much again.

      Amit

    • Hi, Amit.

      In light of your explanation, I’ve rewritten the macro to eliminate Excel altogether, to calculate the overall density rather than calculating it for each word, and to display the density in a dialog-box. This version accepts a comma-separated list of words to search for rather than reading the list from a spreadsheet.

      Sub CountWordOccurrences()
          Const MACRO_NAME = "Count Word Occurrences"
          Dim lngTot As Long, _
              lngCnt As Long, _
              strBod As String, _
              strWrd As String, _
              arrWrd As Variant, _
              varWrd As Variant, _
              dblDen As Double
          strWrd = InputBox("Enter a comma separated list of words to search for.", "Enter Search Words")
          If strWrd <> "" Then
              lngTot = ActiveDocument.Range.ComputeStatistics(wdStatisticWords)
              strBod = ActiveDocument.Content.Text
              arrWrd = Split(strWrd, ",")
              For Each varWrd In arrWrd
                  lngCnt = lngCnt + WordCount(strBod, CStr("\b" & varWrd & "\b"))
              Next
          End If
          If lngCnt > 0 Then
              dblDen = (lngCnt / lngTot) * 100
          Else
              dblDen = 0
          End If
          MsgBox "The word denisty is %" & Format(dblDen, "###.##") & ".", vbInformation + vbOKOnly, MACRO_NAME
      End Sub
      
      Function WordCount(strText As String, strFind As String) As Long
          Dim objRegEx As Object, colMatches As Object, objMatch As Object
          Set objRegEx = CreateObject("VBscript.RegExp")
          With objRegEx
              .IgnoreCase = True
              .Global = True
              .Pattern = strFind
              Set colMatches = .Execute(strText)
          End With
          WordCount = colMatches.Count
          Set objRegEx = Nothing
          Set colMatches = Nothing
          Set objMatch = Nothing
      End Function
      

      I appreciate the offer, Amit, but I have to decline. I’m not taking on any paid projects. I might give this a shot at some point purely for the challenge, but I can’t promise anything.

      I deleted your other post (i.e. the one with your email address). I already have your email address and didn’t want to expose it to the wider internet.

    • Thank you for the new code, David. This is awesome, the code runs well.

      Thank you so much for your time and effort. I am going to use the macro in my work, and will also try and offer it to my site visitors for free, and will give you a credit line (in the code or message box).

      About the website utility, it would be great if you could try it out whenever you have time. No worries if it doesn’t work out.

      Regards,

      Amit

  4. Sorry for late reply David. I was suffering from sickness for whole week. actually the problem is the all counts against the keywords are vary from actual. for example macro counts 8 mails in my inbox against the keyword “OK” but the actual count is 12. Some keywords count is correct. i would like to tell you that all keywords are unique.

    When i tried this macro in other users system (macro enabled, same configuration-windows 7; office 2010) incorrect counts coming against the keywords. I think there is some problem in date restriction. when we entered date range it should be restricted the counts between the date range. Please assist.

    • Rakesh,

      No worries. I’m sorry you were sick. I hope it wasn’t anything serious and that you’re feeling better now.

      Are you saying that the solution isn’t working correctly on any computer or is it just not working correctly on one computer? When you say that the count is wrong, are you considering that the count is case sensitive? For example, if one of the words you’re having it search for is “OK”, then “OK” will be counted but “Ok”, “oK”, and “ok” would not be counted. Why do you think that there’s some problem with the date restriction? Are you entering the date range correctly (i.e. mm/dd/yyyy) or are you entering using a different date format?

      I tested the code before posting it and both the date restriction and the counts were correct. I tested the counts by using Outlook’s built-in search. For each word in the spreadsheet I searched the folder containing the test messages I used. Outlook tells you how many messages contain the word you’ve searched for, and that count always matched the count returned by the script. To test the date range, I put a debug statement in the code that displayed the date of each message processed. Those dates always fell within the date range I’d entered.

  5. Thanks a ton!!!!
    I have tested this macro and added following code to pick folder before to search or count emails by date. but there is one problem the code is not working on other user’s system. am i missing something? Please advise.

    Dim objnSpace As Object, objFolder As MAPIFolder
         
        Set objOutlook = CreateObject("Outlook.Application")
        Set objnSpace = objOutlook.GetNamespace("MAPI")
    
            On Error Resume Next
            Debug.Print strSheet
      'Select export folder
        Set objnSpace = Application.GetNamespace("MAPI")
        Set objFolder = objnSpace.PickFolder
      'Handle potential errors with Select Folder dialog box.
    If objFolder Is Nothing Then
        MsgBox "There are no mail messages to export", vbOKOnly, _
        "Error"
    Exit Sub
    ElseIf objFolder.DefaultItemType  olMailItem Then
        MsgBox "There are no mail messages to export", vbOKOnly, _
        "Error"
    Exit Sub
    ElseIf objFolder.Items.count = 0 Then
        MsgBox "There are no mail messages to export", vbOKOnly, _
        "Error"
    Exit Sub
    End If
    
    • You’re welcome, Rakesh.

      What’s not working on the other user’s system, the code you added or my code? Tell me about the other user’s system. What version of Outlook and Excel are they using? What happens when you try running the code on their computer? Are there any error messages, and, if so, what do they say? Are macros enabled on the other user’s system?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s