Emerging themes 2019 - Page 50

Before we consider the potential tools in detail, first
a salutary warning to ensure that you have all the
potentially relevant data from all sources. This means
conducting quality control on the data recovered but, more
fundamentally, retrieving data from all relevant sources. This is
particularly important – and challenging – given the
massive spike in the use of WhatsApp and other social
media apps such as Snapchat (which can automatically
delete messages). However, if we obtain a legal hold and
can forensically image the device, it should still be
possible to retrieve these messages – but these issues
must be considered upfront in the investigation.
Turning now to the start of the document review process
– what tools can you use to interrogate the different
unstructured data types?
Clustering documents: Combined with other tools
We are all aware of keywords. An important but
somewhat blunt instrument, they will filter your
population to only those documents responsive to
certain keywords. But how about clustering or grouping
documents so that you can see similar themes and
concepts? This can allow you to strategically prioritise
your review. It has also proven to be very effective
– especially combined with data mapping, email
threading and near-duplication techniques – to identify
similarities and “like-versions”. Image A shows what the
results might look like. It allows us to visualise the major
themes of the review population, and identify themes we
would expect to be relevant. As such, we can choose to
review documents relevant to particular topics, or
remove documents related to clearly irrelevant topics.
Image A
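The grouping idea described above can be sketched in miniature. The snippet below is an illustrative, standard-library-only sketch, not any specific eDiscovery product: it builds term-frequency vectors for each document and greedily attaches a document to the first cluster whose seed it resembles closely enough. The document texts and the similarity threshold are invented for the example.

```python
from collections import Counter
import math

def tf_vector(text):
    # Simple term-frequency vector: word -> count.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

# Hypothetical mini review population (texts are illustrative only).
docs = {
    "email_1": "invoice payment supplier contract",
    "email_2": "payment invoice contract renewal",
    "email_3": "team lunch friday restaurant",
    "email_4": "friday lunch booking restaurant",
}

# Greedy single-pass clustering: join the first cluster whose seed
# document is sufficiently similar, otherwise start a new cluster.
THRESHOLD = 0.3  # assumed value, chosen for this toy corpus
clusters = []    # list of (seed_vector, [doc_ids])
for doc_id, text in docs.items():
    vec = tf_vector(text)
    for seed, members in clusters:
        if cosine(seed, vec) >= THRESHOLD:
            members.append(doc_id)
            break
    else:
        clusters.append((vec, [doc_id]))

for _, members in clusters:
    print(members)
```

Real platforms use far richer representations and algorithms, but the effect is the same: documents about "invoices and contracts" end up in one group and "office lunches" in another, so whole clusters can be prioritised or set aside at once.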
Email network analysis
It is also possible to analyse email data to see who
is communicating with whom, and how frequently (see
Image B). Using this email network analysis allows us to
ascertain whether there are anomalous communication
flows requiring further investigation. This can also be
combined with mobile and social media data to provide
a fuller understanding of the relationships between
individuals. It is this holistic, rather than linear, view of
data that has the potential to accelerate document
review exercises and thereby reduce costs.
Image B (network diagram; nodes include Individual A and Individual B)
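In essence, the analysis counts communication flows between pairs of custodians and flags the unusually heavy ones. The stdlib-only sketch below works from a hypothetical log of (sender, recipient) pairs; the names and the "above average" flagging rule are assumptions for illustration, not a vendor's actual method.

```python
from collections import Counter

# Hypothetical message log of (sender, recipient) pairs extracted
# from email headers; all names are illustrative only.
messages = [
    ("individual_a", "individual_b"),
    ("individual_a", "individual_b"),
    ("individual_b", "individual_a"),
    ("individual_a", "individual_c"),
    ("individual_c", "external_x"),
    ("individual_c", "external_x"),
    ("individual_c", "external_x"),
]

# Treat each pair as an undirected edge and count its frequency.
edges = Counter(frozenset(pair) for pair in messages)

# Flag flows with above-average volume as candidates for
# further investigation (a deliberately crude rule).
avg = sum(edges.values()) / len(edges)
anomalous = {tuple(sorted(e)): n for e, n in edges.items() if n > avg}
print(anomalous)
```

Here the heavy traffic between Individual A and Individual B, and between Individual C and an external address, would surface for review, while the one-off exchange would not. Mobile and social media contacts can be merged into the same edge counts to build the fuller relationship picture described above.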
Simple active learning
How do you now go about identifying relevant
documents in what might still be a review population
numbering tens of thousands (if not more)? What about
that much-used, and very commonly misunderstood,
term “artificial intelligence”? A more accurate term would
be “machine learning”. Using tools of this nature – such
as forms of predictive coding – can certainly help,
including with increasing the accuracy of the review.
It has been demonstrated that machines are more
effective than humans when it comes to identifying
relevant documents. Having lawyers review small
batches of documents to train the machine to recognise
relevant documents, i.e. “simple active learning”, has
proven to be a cheaper, more efficient and ultimately
more effective way of conducting document review in
appropriate circumstances. The key is to ensure that the
review population is of sufficient size and, as confirmed
by reviewing a sample set of documents, contains enough
relevant documents to make predictive coding viable.
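The train-then-prioritise loop can be illustrated with a toy uncertainty-sampling sketch. Everything below is an assumption for the example: the documents, the seed labels, and the deliberately naive word-vote scorer (a Laplace-smoothed relevance score, not real predictive coding). The point it shows is the core mechanic: the lawyer labels a small seed batch, the model scores the rest, and the document the model is least sure about is surfaced for review next.

```python
from collections import Counter

# Hypothetical mini corpus; 1 = relevant, 0 = not relevant.
corpus = {
    "d1": "payment offshore account transfer",
    "d2": "offshore transfer payment urgent",
    "d3": "office party cake friday",
    "d4": "cake friday office",
    "d5": "payment friday office transfer",  # mixed vocabulary
}
seed_labels = {"d1": 1, "d3": 0}  # small batch labelled by the lawyer

def score(text, labels):
    # Naive word vote: share of words seen more often in relevant
    # documents, with Laplace smoothing for unseen words.
    rel = Counter(w for d, y in labels.items() if y == 1
                  for w in corpus[d].split())
    irr = Counter(w for d, y in labels.items() if y == 0
                  for w in corpus[d].split())
    votes = [(rel[w] + 1) / (rel[w] + irr[w] + 2) for w in text.split()]
    return sum(votes) / len(votes)

unlabeled = [d for d in corpus if d not in seed_labels]
scores = {d: score(corpus[d], seed_labels) for d in unlabeled}

# Uncertainty sampling: review next the document whose score is
# closest to 0.5, i.e. the one the model is least sure about.
next_doc = min(unlabeled, key=lambda d: abs(scores[d] - 0.5))
print(next_doc)
```

The document mixing both vocabularies scores near 0.5 and is queued for human review, while clearly relevant and clearly irrelevant documents are left to the model: each labelling round sharpens the model, which is what makes simple active learning efficient on large populations.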

