I. Reviewing documents
The way documents are reviewed is broken. Having been through law school and worked as a doc reviewer in a law firm, I thought I knew what document analysis entailed. It was not until I expanded my perception of what document review could be that I realized how wrong that thought was.
With advances in computing technology, a document can be broken down into many discrete pieces. Those pieces can then be scored against each other to deliver insights about a document, or a series of documents, that would otherwise be difficult to reach. The above spreadsheet is actually one document, broken down into about 100 rows and 10 columns. Document review is now more than just reading words on a page.
The machine learning algorithms incorporated into the RiskGenius platform have opened my eyes to the ability of technologically advanced tools to supplement document review, make the process of reviewing documents easier, and empower lawyers, underwriters, and other professionals working with documents. Over the rest of this post, I will walk through an exercise that demonstrates the value of a modern review process that actually turns legal text into data points that can be scored against one another.
II. Documents --> Data
To show you one of the cool things that our cyber liability index (the above spreadsheet) actually produced, I decided to build a picture of the average cyber liability policy. After looking at 1,994 Definitions, 1,204 Exclusions, 915 Conditions, 378 Insuring Agreement sections, 95 Limits of Liability, and 71 Opening Statements*, we were able to determine that the average cyber liability policy has 1.5 Opening Statements, 8 Insuring Agreements, 2 Limits of Liability, 42 Definitions, 19 Conditions, and 26 Exclusions. Here is how that looks as a pie chart:
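The per-policy averages and the pie-chart slices both come down to simple division, which can be sketched in a few lines of Python. One caveat: the post does not say how many policies were reviewed. The value 47 below is an assumption, back-solved from the totals and averages quoted above, and the function names are illustrative only.

```python
# Clause totals quoted in the post, keyed by clause type.
CLAUSE_TOTALS = {
    "Opening Statements": 71,
    "Insuring Agreements": 378,
    "Limits of Liability": 95,
    "Definitions": 1994,
    "Conditions": 915,
    "Exclusions": 1204,
}

# ASSUMPTION: the number of policies is not stated in the post.
# 47 is inferred because it reproduces the quoted averages.
ASSUMED_POLICY_COUNT = 47


def average_composition(totals, policy_count):
    """Average number of clauses of each type per policy."""
    return {kind: count / policy_count for kind, count in totals.items()}


def pie_shares(totals):
    """Fraction of all clauses that each clause type represents."""
    grand_total = sum(totals.values())
    return {kind: count / grand_total for kind, count in totals.items()}


averages = average_composition(CLAUSE_TOTALS, ASSUMED_POLICY_COUNT)
shares = pie_shares(CLAUSE_TOTALS)

for kind, avg in averages.items():
    print(f"{kind:<20} {avg:5.1f} per policy  {shares[kind]:6.1%} of all clauses")

print(f"Average clauses per policy: {sum(averages.values()):.0f}")
```

Under that assumed policy count, the per-type averages round to the figures quoted above, and their sum rounds to 99 clauses per policy.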
More than that, though, we are able to look at a bunch of policies across a single discipline and understand how they compare to the baselines that we established. For example, the average cyber liability insurance policy has 99 clauses in it (the horizontal red line in the graph below), with an average similarity score of 89.5 (the vertical red line in the graph below). The average similarity score represents how similar all the clauses in one policy are to all the clauses in all the other policies. The higher the score, the more similar that policy is to other policies in the industry. The policy closest to those averages, indicated by a red circle, is Zurich's Security and Privacy Protection Policy. Congrats, Zurich!
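To make the idea of a policy-level similarity score concrete, here is a rough sketch. It is not how RiskGenius computes its scores, which this post does not describe. As stand-ins, it uses the character-level ratio from Python's standard `difflib` module, and it assumes a policy's score is the average of each clause's best match among the clauses of all other policies. The three toy policies are made up for illustration.

```python
from difflib import SequenceMatcher
from statistics import mean


def clause_similarity(a, b):
    """Stand-in similarity on a 0-100 scale (difflib character ratio)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def policy_similarity(name, policies):
    """Average, over one policy's clauses, of the best match found
    among the clauses of every other policy (an assumed definition)."""
    others = [
        clause
        for other_name, clauses in policies.items()
        if other_name != name
        for clause in clauses
    ]
    return mean(
        max(clause_similarity(clause, other) for other in others)
        for clause in policies[name]
    )


# Made-up toy policies, for illustration only.
policies = {
    "Policy A": [
        "Claim means a written demand for damages.",
        "We will not cover losses caused by war.",
    ],
    "Policy B": [
        "Claim means a written demand for damages.",
        "We will not cover loss caused by war or terrorism.",
    ],
    "Policy C": [
        "The insurer may cancel this policy with thirty days notice.",
    ],
}

scores = {name: policy_similarity(name, policies) for name in policies}
for name, score in scores.items():
    print(f"{name}: {len(policies[name])} clauses, similarity {score:.1f}")
```

In this toy set, Policy A shares most of its language with Policy B and scores high, while Policy C, whose only clause has no close counterpart, scores lower. That is the same "how typical is this policy" reading described above.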
We can also deliver unique insights about features that might help indicate where language is breaking down. As another example, our data shows that a larger number of clauses does not automatically mean a policy will have a higher similarity score. The reverse does seem to hold at the low end, however: the smallest policies by clause count (those under 50 clauses) tend to have lower similarity scores.
It’s important to keep in mind that this could be for any number of reasons. For example, the shorter policies we have reviewed may be missing addenda or endorsements that would otherwise improve the similarity score. On its own, this type of tool does not solve many review problems. It does, however, spot high-level issues more efficiently.
III. The Future of Document Analysis
As technology continues to improve, the question has shifted. It is no longer whether improvement is possible (technology has shown that it always is) but whether people are willing to redesign the workflow of the insurance industry to take advantage of things like Artificial Intelligence and Machine Learning.
By adding a simple user interface to the review process, one in which analysts classify the different components of an insurance policy into structured sets of data, our team is able to gain insights about what sort of standards exist in areas with rapidly evolving types of risk. When you boil it down, this is no different from what attorneys do on a daily basis. The benefit is that, with that knowledge stored in a structured format, we can immediately call upon it to make comparisons with other insurance policies. For instance, you could use the average clause composition (above) and one of the graphs from last week that shows the most popular clauses within the industry to determine which pieces might be missing from smaller policies.
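As a rough illustration of that last idea, the sketch below compares one policy's clause counts against the average composition quoted earlier and flags the clause types where it falls well short. The 50% cutoff, the `find_gaps` helper, and the small policy's counts are all hypothetical, chosen only to show the shape of the comparison.

```python
# Average clause composition quoted earlier in the post.
BASELINE = {
    "Opening Statements": 1.5,
    "Insuring Agreements": 8,
    "Limits of Liability": 2,
    "Definitions": 42,
    "Conditions": 19,
    "Exclusions": 26,
}


def find_gaps(policy_counts, baseline=BASELINE, threshold=0.5):
    """Return clause types where a policy has fewer than `threshold`
    times the baseline count. The 0.5 default is an arbitrary,
    hypothetical cutoff."""
    gaps = {}
    for kind, expected in baseline.items():
        actual = policy_counts.get(kind, 0)
        if actual < threshold * expected:
            gaps[kind] = (actual, expected)
    return gaps


# Made-up counts for a short (43-clause) policy, for illustration only.
small_policy = {
    "Opening Statements": 1,
    "Insuring Agreements": 5,
    "Limits of Liability": 1,
    "Definitions": 15,
    "Conditions": 12,
    "Exclusions": 9,
}

gaps = find_gaps(small_policy)
for kind, (actual, expected) in gaps.items():
    print(f"{kind}: {actual} found vs. about {expected} in the average policy")
```

A flag like this says nothing about whether the shorter policy is worse. It only points a reviewer to the sections, here Definitions and Exclusions, where a closer read (or a hunt for missing endorsements) is most likely to pay off.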
This is how Machine Learning and AI will transform the industry and help everyone: not by replacing people, but by increasing the efficiency with which people can review documents and extract meaning from text. Specifically, these advances can be used to 1) reduce busy work, 2) enable insurance professionals to practice at the top of their ability, and 3) provide a mechanism for quickly understanding what portions of an insurance policy are most divergent from an industry standard. While there are a good number of average things that have gone into the creation of this blog, probably including my writing, the things that can be accomplished with the help of new technologies are anything but average.
*Note: Our definition of what counts as an opening statement includes declarations pages, tables of contents, and opening statements as separate clauses.