Legal data is incredibly valuable. In the insurance field, the average similarity score of the clauses in a policy - a measure of how closely one clause tracks a set of other clauses meant to accomplish the same function - can be used to instantly show how one policy stacks up against another. Data about the composition of cyber policies can be used to reveal which clauses or information a given policy might be missing.
The following series of posts surveys how the often-overlooked features of an insurance policy - its metadata - can be leveraged to create artificial intelligence that improves the quality and efficiency of drafting these policies. By using even the most rudimentary AI tools, underwriters and legal professionals can begin to speed up their work by more effectively spotting areas that deviate from industry standards.
I. Reviewing documents
The way documents are reviewed is broken. Having been through law school and worked as a doc reviewer in a law firm, I thought I knew what document analysis entailed. It was not until I expanded my perception of what document review could be that I realized how wrong that thought was.
With advances in computing technology, documents can be broken down into discrete pieces. Those pieces can then be scored against each other to deliver deeper insights about a document or series of documents that would otherwise be difficult to obtain. The above spreadsheet is actually one document, broken down into about 100 rows and 10 columns. Document review, now, is more than just reading words on a page.
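As a concrete (and entirely hypothetical) illustration of what "breaking a document into pieces" can look like, the sketch below splits an invented fragment of policy text into section/clause rows - the same shape as the spreadsheet described above. The sample text, section headings, and numbering pattern are all assumptions; real policies vary widely by carrier.

```python
import re

# Hypothetical policy fragment; formatting conventions differ by carrier.
policy_text = """SECTION I. INSURING AGREEMENTS
1. Network Security Liability. We will pay on your behalf...
2. Privacy Liability. We will pay on your behalf...
SECTION II. DEFINITIONS
1. "Computer System" means...
2. "Claim" means..."""

rows = []
section = None
for line in policy_text.splitlines():
    if line.startswith("SECTION"):
        # Track which section the following clauses belong to.
        section = line.split(".", 1)[1].strip()
    else:
        # Numbered lines become one row each: (section, number, text).
        m = re.match(r"(\d+)\.\s+(.*)", line)
        if m and section:
            rows.append({"section": section,
                         "number": int(m.group(1)),
                         "text": m.group(2)})

for r in rows:
    print(r["section"], r["number"], r["text"][:40])
```

Each row can then carry further columns (clause type, similarity score, carrier, and so on), which is what makes the scoring described below possible.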
The machine learning algorithms incorporated into the RiskGenius platform have opened my eyes to the ability of technologically advanced tools to supplement document review, make the process of reviewing documents easier, and empower lawyers, underwriters, and other professionals working with documents. Over the rest of this post, I will walk through an exercise that demonstrates the value of a modern review process that turns legal text into data points that can be scored against one another.
II. Documents --> Data
To show you one of the cool things that our cyber liability index (the above spreadsheet) actually produced, I decided to start creating a picture of the average cyber liability policy. After looking at 1,994 Definitions, 1,204 Exclusions, 915 Conditions, 378 Insuring Agreement sections, 95 Limits of Liability, and 71 Opening Statements*, we were able to determine that the average cyber liability policy has 1.5 Opening Statements, 8 Insuring Agreements, 2 Limits of Liability, 42 Definitions, 19 Conditions, and 26 Exclusions. Here is how that looks as a pie chart:
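The per-policy averages above are simple arithmetic over the clause totals. The sketch below divides each total by the 45 policies mentioned later in this post; the simple means land close to, though not exactly on, the quoted figures (e.g. ~44 Definitions versus the quoted 42), so the actual denominator or method presumably differed slightly.

```python
# Clause totals taken from the post; per-policy averages are total / policies.
totals = {
    "Opening Statements": 71,
    "Insuring Agreements": 378,
    "Limits of Liability": 95,
    "Definitions": 1994,
    "Conditions": 915,
    "Exclusions": 1204,
}
n_policies = 45  # assumption: the "45+ policies" figure from the index

for section, count in totals.items():
    print(f"{section}: {count / n_policies:.1f} per policy")
```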
In March, I started working for RiskGenius. As a company, we store files for different groups within the insurance community and provide layers of analytics on top of that storage to improve the operations of these groups. The types of analytics we provide range from a redline feature, which allows people to compare language from two policies against each other, to a clause score, which leverages multiple types of machine learning to gain an understanding of how similar one clause is to other clauses within a specific line of business.
Since I started, I’ve spent a considerable amount of time thinking about how to better understand the cyber liability insurance marketplace, with specific attention to how language is used in cyber policies. Along with a team of policy analysts, I have looked at the language of 4600+ cyber liability clauses, from 45+ cyber liability insurance policies, from 25+ carriers, from 5+ countries around the world. We reduced the agreements into forms and clauses, broke the clauses down by type, and quantitatively scored them against one another to see what the market for cyber liability actually looks like. What follows in this blog is an introductory analysis of the language within our cyber liability index.
I. Cyber risk evolves in parallel to the internet
As we enter further into the era of big data, the importance of effectively managing cyber risk continues to grow. Unfortunately, however, the policies that are meant to insure that specific type of risk are strikingly inconsistent with one another.
This sort of inconsistency is problematic on a number of levels. For consumers, inconsistency is problematic because it is unclear what types of risk are covered under different insurance policies. For the drafters of these policies, inconsistency is problematic because the task is to write coverage for threats that are still unknown. To better understand those future threats, it helps to understand what a cyber liability policy is and how it differs from other types of policies.
In general, cyber coverage is some combination of four components: errors and omissions, media liability, network security, and privacy. These four components are distinct. Each protects a different subset of cyber liability risk, and strong protection of one subset does not offset weak protection of another. In Zurich v. Sony, for example, a lower court held that Zurich did not have to pay out under the privacy coverage of its commercial general liability policy after hackers exposed the personal details of 77 million users in a 2011 hack of the PlayStation Network.
And while the market for cyber liability products has improved in the past few years, this uncertainty remains expensive and inefficient. Case in point: Equifax's cyber policy may not fully cover the liabilities created by its 2017 breach.
As we go further down the rabbit hole, the threats keep evolving and our identities become increasingly digital. To protect against new types of cyber risk as technology moves forward, there needs to be a better way to rapidly digest and understand the changing cyber liability insurance marketplace.
II. Using data to understand cyber policy language
Throughout this process of collecting and processing cyber liability policies into data, we have been able to unearth some valuable insights about the composition of the cyber liability marketplace. For instance, the following graph shows the average similarity of each different type of clause from our index of cyber liability data. You are looking at 1,100 different types of clauses. The vertical axis indicates the frequency of the different clauses we identified — how often that clause type appears in cyber policies. The horizontal axis indicates the average similarity of that clause, on a scale from 0 to 100.
For example, the data point for the definition of "Service Provider" is circled in red. This clause category appears only 9 times in the index, with an average similarity score of 82%. This means that although the clause is relatively infrequent, its definition shows an average level of variance compared to other clauses in the industry.
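RiskGenius's actual clause score is proprietary and more sophisticated, but the idea of an "average similarity score" can be sketched with a simple stand-in: bag-of-words cosine similarity, averaged across all other clauses in the same category. The three "Service Provider" definitions below are invented for illustration, not drawn from the index.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def avg_similarity(clauses: list[str]) -> list[float]:
    """For each clause, the mean similarity to every other clause
    in the same category, scaled 0-100 as in the chart above."""
    vecs = [Counter(c.lower().split()) for c in clauses]
    scores = []
    for i, v in enumerate(vecs):
        others = [cosine(v, w) for j, w in enumerate(vecs) if j != i]
        scores.append(100 * sum(others) / len(others))
    return scores

# Three hypothetical "Service Provider" definitions from different carriers.
defs = [
    "service provider means any third party that hosts or processes data for you",
    "service provider means a third party that hosts or processes your data",
    "service provider means an entity providing outsourced computing services",
]
print([round(s) for s in avg_similarity(defs)])
```

The first two definitions track each other closely and earn high scores, while the outlier definition scores low - exactly the kind of divergence the scatter plot surfaces at market scale.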
I've been quiet the last month or so. There has been too much to learn.
We are learning from our customers.
We are learning from insurance policies.
And, we are learning from the machine learning algorithms.
Since so much has happened in the last six months, I thought I would highlight some of my favorite RiskGenius learnings.