Opinion Mining in eDiscovery

Opinion Mining Blog on August 19th, 2010

The world of eDiscovery changed with the amendments to the United States Federal Rules of Civil Procedure (FRCP) that took effect on December 1, 2006. The amendments were written in anticipation of legal arguments and tactics related to the production of Electronically Stored Information (ESI). Traditional arguments, such as the cost and difficulty of producing ESI or claims that ESI was missing, deleted, or inaccessible, are no longer considered an acceptable defense. Failure to produce ESI can result in stiff penalties. The FRCP amendments require an organization to hold all electronic records from the point at which it reasonably anticipates litigation until the legal matter is formally settled.

The scope of, and need for, eDiscovery has exploded with the need for retrieval, identification and isolation of electronic records. Emails alone account for over 80% of eDiscovery requests. Email volume within large and medium-sized companies grows at the rate of terabytes per year. The use of instant messaging and handheld mobile devices has further increased the scope of enterprise data retention. The sheer volume of data generated poses a stiff challenge to traditional Information Retrieval techniques. These techniques, which rely largely on keyword matching, may soon fall short of expectations as the demand for relevant data outstrips the solutions at hand.

For the purpose of this paper, we use the term keyword filtering to include metadata tagging, synonym identification and any other technique that uses word-based filters.

eDiscovery for Emails

Currently available eDiscovery products and solutions that are used on email data sets employ de-duping, filtering and tagging techniques. These products and solutions are effective for low-level identification, storage and eventual extraction. The techniques are fast and effective for accepting or rejecting data at the keyword level. The result is a filtered data set that is considered directly relevant based on a set of simple acceptance or rejection criteria. This enables extraction of derived data sets that can be used for manual review. Manual review is the most expensive step in eDiscovery, and eDiscovery itself accounts for 30-40% of overall litigation costs.
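To make the de-duping and keyword filtering steps concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any particular eDiscovery product works: the email records, the `dedupe` and `keyword_filter` helpers, and the keyword list are all hypothetical, and duplicates are detected by hashing the normalized body text.

```python
import hashlib
import re

def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text.strip().lower())

def dedupe(emails):
    """Keep the first copy of each email body, judged by a hash of the normalized text."""
    seen = set()
    unique = []
    for email in emails:
        digest = hashlib.sha256(normalize(email["body"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(email)
    return unique

def keyword_filter(emails, keywords):
    """Accept an email if any keyword appears as a whole word in its body."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [e for e in emails if pattern.search(e["body"])]

# Hypothetical sample data, invented for illustration only.
emails = [
    {"id": 1, "body": "The merger contract is ready for signature."},
    {"id": 2, "body": "the merger   contract is ready for signature."},
    {"id": 3, "body": "Lunch on Friday?"},
    {"id": 4, "body": "I have doubts about the contract terms."},
]

# The derived data set that would be handed to manual review.
review_set = keyword_filter(dedupe(emails), ["contract", "merger"])
print([e["id"] for e in review_set])
```

Email 2 is dropped as a duplicate of email 1, and email 3 is rejected because it matches no keyword. Note that the filter says nothing about what the author of email 4 thinks of the contract; it only establishes that the word is present.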

As the overall volume of data increases, it is reasonable to assume that the volume of the extracted, derived data set increases correspondingly. While the resulting increase in the cost of manual review can be partially offset by using the offshore model, offshoring does not address concerns such as privacy, security and management overhead. Clearly, there is a need for a more sophisticated approach that can be used in conjunction with keyword filtering.

Opinion Mining

Opinion Mining, or Sentiment Analysis, is an area of specialization within Natural Language Processing, Computational Linguistics and Text Mining.

Opinion Mining is the use of statistical models and software for the identification and classification of the opinion (attitude) of an opinion holder on a given subject. The classification categorizes the attitude of an opinion holder as supporting, opposing or neutral to a given subject.

Opinion Mining is a proven technology and has been used in recent times in a wide variety of Defense, Government and Marketing applications as a listening, analysis and engagement tool.

Opinion Mining is a valuable tool for increasing the efficiency of eDiscovery.

Opinion Mining and eDiscovery

The goal of eDiscovery is the prudent application of a set of processes and tools for the eventual identification of relevant emails from a large data set. The application of process and technology in eDiscovery seeks to constantly reduce the relevant data sets to manageable proportions. Once the data set is reduced to a reasonable size, human reviewers (subject matter experts) are brought in for the final review.

Manual review is highly effective in the case of small data sets. As the data set gets larger, the vagaries of human nature come into play, resulting in reduced efficiency and accuracy. These negative effects grow rapidly with the size of the review team and depend heavily on how the process is managed, including training, quality assurance and the overall competence of the team.

Manual review is either performed by lawyers or paralegals who are intimately familiar with the case, or outsourced to a third party. Third-party providers are given a set of instructions (identification queries) for analysis. The queries are used during the manual review process to accept or reject emails within the data set.

Opinion Mining can be used to translate the queries into an opinion. Opinion Mining software is then used to extract, isolate and classify, with a high level of accuracy, the opinions expressed by the opinion holder(s) in the emails within the data set. The result is a reduced set of emails classified as supporting, opposing or neutral with respect to the opinion holder's sentiment about an identified subject. These sets are a classified version of the original email data set that can then be manually reviewed.

Opinion Mining software analyzes text at the sentence level and returns as output the identified sentence, along with its source document, organized under the identified opinion. The effect is similar to a human reviewer accepting or rejecting a given email based on a query.
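The sentence-level output described above can be sketched in a few lines of Python. This is a toy stand-in, not the statistical models that Opinion Mining software uses: it swaps in a small hand-written word lexicon, assumes the subject is a single word, and every name in it (`SUPPORT_WORDS`, `OPPOSE_WORDS`, `classify_emails`, the sample emails) is hypothetical.

```python
import re

# Hypothetical, hand-picked cue words; a real system would use a trained model.
SUPPORT_WORDS = {"agree", "approve", "support", "great", "confident"}
OPPOSE_WORDS = {"disagree", "oppose", "concerned", "risky", "doubt"}

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def classify_sentence(sentence, subject):
    """Return 'supporting', 'opposing' or 'neutral', or None if the subject is absent."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    if subject.lower() not in words:
        return None
    support = len(words & SUPPORT_WORDS)
    oppose = len(words & OPPOSE_WORDS)
    if support > oppose:
        return "supporting"
    if oppose > support:
        return "opposing"
    return "neutral"

def classify_emails(emails, subject):
    """Group (email id, sentence) pairs under the opinion each sentence expresses."""
    results = {"supporting": [], "opposing": [], "neutral": []}
    for email in emails:
        for sentence in split_sentences(email["body"]):
            label = classify_sentence(sentence, subject)
            if label is not None:
                results[label].append((email["id"], sentence))
    return results

# Hypothetical sample data, invented for illustration only.
emails = [
    {"id": 1, "body": "I fully support the merger. See you Monday."},
    {"id": 2, "body": "I doubt the merger will close on time."},
    {"id": 3, "body": "The merger documents are attached."},
    {"id": 4, "body": "Lunch on Friday?"},
]

results = classify_emails(emails, "merger")
for label, hits in results.items():
    print(label, hits)
```

Each hit keeps both the matching sentence and the id of the email it came from, so a reviewer can start from the "opposing" group, say, and jump straight to the source documents. Sentences that never mention the subject, such as those in email 4, are left out of every group.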

Opinion Mining offers another approach to eDiscovery. Used in conjunction with keyword filtering on large data sets, it can be very effective at reducing the cost of manual review while increasing the overall quality of the output.

Opinion Mining software can be used by entities that provide or consume eDiscovery services and deal with large data sets. These include law firms, corporations, and government and defense organizations.
