Do you read all of your e-mail? Chances are good that you don’t, since most people receive an overwhelming amount of e-mail each day. Reading e-mail has developed into dreaded time-consuming drudgery, a symptom of our data-driven society. Wouldn’t it be great if your computer could read all of it for you, save all the important stuff, and delete everything that doesn’t interest you?
This scenario isn’t terribly far-fetched. Text mining, which is like data mining but works on words instead of numbers, is making it possible for computers to process the huge amounts of text that time-crunched people neglect. Text mining applications are already helping to cure disease and assure customer satisfaction, and may one day even detect lies.
Text mining applications mimic the human brain, “reading” through documents far faster than any person could. They spot patterns, recognize important words, ignore unimportant information, and piece together conclusions based on what they find. They can even write responses to e-mail messages.
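To make "recognize important words, ignore unimportant information" concrete, here is a minimal sketch in Python. It is not how SAS's Text Miner or any other product works. It assumes a common textbook weighting (TF-IDF), a tiny made-up stopword list, and hypothetical function names. Words that appear in every message score zero, while words concentrated in one message rise to the top.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "my", "i",
             "it", "for", "was", "on", "in", "your", "me"}

def tokenize(text):
    """Lowercase the text and keep only tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def top_terms(documents, k=3):
    """Return the k highest-scoring terms of each document under TF-IDF."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n_docs / df[t])
                  for t, c in tf.items()}
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results

emails = [
    "Please refund my order, the refund is late",
    "Meeting moved to Friday, please confirm",
    "Your order shipped, please track the order",
]
print(top_terms(emails, k=2))
```

In this toy data, "please" appears in all three messages, so it carries no weight, which is the "ignore unimportant information" step in miniature.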
The undisputed leader in data mining software is SAS, the world’s largest privately held software company. SAS’s Text Miner, due for release in mid-2002, makes it possible to understand large quantities of information without having to read a word of it.
In the past, data mining dealt strictly with numbers. However, a neglected sea of data is stored in text documents. Words have much subtler relationships than numbers, and finding relationships and patterns within textual documents has historically been an impossible task. In the meantime, unread resumes, customer e-mails, surveys, research and other documents have continued to pile up.
SAS’s Text Miner overcomes the obstacles of textual documents. It not only recognizes patterns in English-language documents but also offers custom dictionaries in French and German. More languages are, no doubt, on the way.
Already many companies are using text mining to assist with various aspects of their businesses. CRM companies are finding it useful for reading large quantities of customer e-mail, satisfaction surveys, and complaints. It’s proven useful in identifying the best customers and the best ways to retain them, as well as accurately identifying market segments and forecasting demand.
If you’ve ever applied for a job over the Internet and received no response, you’ve probably wondered if anyone even read your application. Chances are good that it was ignored in a flood of similar applications, and it’s quite possible that the company hired someone less qualified than you. Especially in large companies, huge numbers of resumes are regularly received and simply filed away. Text mining now makes it possible for these companies to process every resume and every application. You may actually get more personal attention from a computer through text mining.
Employees are seeing the benefits, too. Many companies send out employee satisfaction surveys, but few have the resources to investigate them thoroughly. One company recently used text-mining software to review 15,000 such surveys collected over a five-year period. By finding particular words in the answers to open-ended questions, the software helped the company identify what it could do to retain its top employees. Now that the company has identified what keeps its best employees happy, hopes are high for reduced turnover rates.
Medicine is another area receiving much-needed assistance from text mining. In a recent clinical study, one pharmaceutical company processed 500 text responses from participants. Patterns of words used by respondents revealed that women over 40 could not tolerate high dosages of the drug. Text mining identified the trend, making the drug safer for use when it was put on the market.
Text mining can also help businesses save money by reducing fraud. From fraudulent bids on eBay to false insurance claims, text mining can find trends in the data. Data mining in general excels at finding patterns that are unusual – a single company with multiple addresses, for example, or a single individual who submits an unusually large number of accident claims – and reporting them.
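The second example above, one individual filing an unusually large number of claims, can be sketched in a few lines of Python. The rule used here (flag anyone whose claim count exceeds three times the median count) is an assumption chosen for simplicity, not a method attributed to any fraud-detection product, and the names and data are invented.

```python
from collections import Counter
from statistics import median

def flag_frequent_claimants(claims, factor=3.0):
    """Return claimants whose claim count exceeds factor times the median count.

    `claims` is a list of (claimant_name, claim_id) pairs. The threshold
    rule is an illustrative assumption, not an industry standard.
    """
    counts = Counter(name for name, _ in claims)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(name for name, n in counts.items() if n > factor * typical)

# Invented sample data: most people file once or twice, one person files seven times.
claims = [("alice", 1), ("bob", 2), ("carol", 3), ("carol", 4)]
claims += [("dave", 100 + i) for i in range(7)]
print(flag_frequent_claimants(claims))
```

A real system would report such cases for human review rather than treat them as proof of fraud, since an unusual pattern is only a lead.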
Credit card companies, which are especially susceptible to online fraud, are able to identify groups most likely to commit fraudulent acts. Additionally, they can track usage trends and watch for unusual charges. If you never gamble, yet one day several charges to an online casino are recorded on your card, your credit card company can spot this anomaly and check with you to make sure the charges are legitimate.
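The casino scenario boils down to comparing new charges against a cardholder's history. The Python sketch below is a deliberately simplified illustration: it assumes each charge carries a merchant category label and flags any new charge in a category the cardholder has never used. The category names and the function are hypothetical, and real card issuers use far richer models.

```python
def unusual_charges(history, new_charges):
    """Return new charges whose merchant category never appears in the history.

    Both arguments are lists of (category, amount) pairs. Treating any
    never-seen category as suspicious is an assumption made for this sketch.
    """
    seen = {category for category, _ in history}
    return [charge for charge in new_charges if charge[0] not in seen]

# Invented sample data for a cardholder who has never gambled.
history = [("groceries", 54.10), ("fuel", 30.00), ("groceries", 22.75)]
new_charges = [("groceries", 41.20), ("online_casino", 200.00),
               ("online_casino", 350.00)]
print(unusual_charges(history, new_charges))
```

Flagged charges are candidates for a confirmation call to the cardholder, matching the "check with you" step described above.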
Some insidious uses for text mining are possible as well. For example, some SAS developers believe text-mining software may one day detect lies. Software is already available that can analyze an individual’s writing patterns and detect variations in them. Employers could potentially use such software to analyze employee e-mail and to figure out when an employee has lied. Text mining hasn’t reached this point yet, however, and it’s doubtful that it will ever accurately detect lies in speech.
Text mining has certainly progressed far from its early days of crawling the Web, searching for keywords. It’s clear that this technology will save business and government both time and money. Less clear are the ethics behind the mining process.
While having your computer read your e-mail for you and write you a report about it may seem appealing, consider the darker implications. What are text miners doing with your information patterns? Be prepared for more controversy on this topic as text mining becomes more widely used by industry.
Jackie Rosenberger is an editor with Murdok, Inc.