How does voice surveillance differ from other forms of communications surveillance?

As regulators encourage firms to effectively supervise business communications to deter misconduct and maintain compliance, what are the main differences to watch for when monitoring voice communications versus other written digital communications?

17 September 2024 6 mins read

By Kathryn Fallah

Written by a human

In brief:

Within recordkeeping failure enforcements, regulators have highlighted that the supervision of all forms of communications remains a chief regulatory priority
While the objective of monitoring all communication forms is to flag risk and deter compliance breaches, it’s important to understand the difference in surveillance approaches
Certain factors set voice communications apart from written digital communication forms, which firms should bear in mind when utilizing surveillance solutions

Voice surveillance isn’t a new concept, though there has been a long-held (and questionable) belief that the tools for accurate monitoring of voice channels are lacking. Recently, however, more sophisticated technologies have improved transcription reliability, which is invaluable given the sharpening regulatory focus on communications compliance.

This is especially true as recordkeeping regulations and resulting enforcements increase. With the Securities and Exchange Commission (SEC) detailing that fined firms failed to “reasonably supervise their personnel with a view to preventing and detecting…violations” in a recent enforcement action, monitoring of personnel and their business communications has become a clear requirement.

Though a large number of enforcements have identified text or instant messaging (IM) channels as the method by which noncompliance occurred, the potential for voice communications to contain suspicious content that poses a compliance risk is a substantial concern. From MiFID II in the EU to the Dodd-Frank Act in the U.S., voice has been highlighted as a critical part of the compliance picture.

Regulators have clarified their expectations around voice capture, noting that all conversations involving “business as such” will be subject to scrutiny. To prepare, firms must ensure their surveillance policies are in order, and understand how surveillance strategies differ.

Voice variances and surveillance strategies

Considering the range of communication platforms steadily adopted into business operations, it’s important to note that there’s often a variance in how people converse depending on the channel and typical tone associated with a communication format.

When emailing a colleague, for example, it may feel more natural to write a formal message starting with a salutation and ending with a “Best” or “Sincerely.” When sending an IM or text message, though, the default may be to opt for abbreviations like “lol” or “btw.” In the same sense, people communicate differently when participating in a real-time voice call.

The most notable difference that sets voice apart from other digital communications is conversational style. A voice call may include more in-depth discussions and a distinct set of expressions; therefore, lexicons need to be tailored to account for this difference. When using a comprehensive voice surveillance tool, firms can assign policies to specific data sources to identify and classify information easily.
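
As a rough illustration of tailoring lexicons by channel, the sketch below maps each data source to its own set of phrases and checks text against the matching set. The channel names, the phrases, and the `flag_message` helper are all hypothetical and are not drawn from any particular surveillance product.

```python
import re

# Hypothetical lexicons: each data source gets phrases suited to how
# people tend to communicate on that channel.
LEXICONS = {
    "email": ["off the record", "delete this email"],
    "im": ["call me instead", "switch to whatsapp"],
    "voice": ["keep this between us", "don't put this in writing"],
}


def flag_message(source: str, text: str) -> list[str]:
    """Return the lexicon phrases for `source` that appear in `text`."""
    hits = []
    for phrase in LEXICONS.get(source, []):
        # Case-insensitive match on the whole phrase, not on fragments of words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


transcript = "Look, keep this between us, and don't put this in writing."
print(flag_message("voice", transcript))  # both voice phrases are flagged
print(flag_message("email", transcript))  # the email lexicon finds nothing
```

The same sentence is flagged under the voice policy but passes the email policy, which is why a lexicon written for one channel may miss risk on another.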

Since real-time voice conversations are more off-the-cuff and colloquial, misconduct may well occur during these exchanges. This makes it all the more critical to have procedures to actively monitor key conversations.

A related challenge is deciphering different languages, accents, slang, and pitches. In addition, surveillance tools need to tune out static or distracting background noise that could interfere with conversations. Written communications are more straightforward, so the scope for misinterpretation is slim, whereas the margin for error increases when handling voice conversations. Thus, it is critical to have a surveillance tool specifically built to mitigate these issues.

These variances considered, surveillance processes are handled similarly following the transcription of a voice conversation. In the same way that firms can apply lexicons to digital communications to ensure compliance, lexicons can be applied to voice transcriptions to examine and locate high-risk portions.

Inside(r) voice

The use of large language models (LLMs) in compliance processes like communications surveillance has enabled the advancement of voice surveillance solutions. As such, firms can have confidence in the precision of call transcription.

Previous voice surveillance methods would rely on manually sorting through entire recordings. One of the most notable methods, random sampling, entails parsing through arbitrary portions of recorded calls in an attempt to identify suspicious activity. Consequently, only a small percentage of a call will be surveilled.
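
To put the coverage gap from random sampling in numbers, here is a minimal sketch. The figures (a 30-minute call, three one-minute windows reviewed) and the function names are illustrative assumptions, not data from any firm or regulator.

```python
import random


def sampled_minutes(call_minutes: int, sample_size: int, seed: int) -> set[int]:
    """Pick which one-minute windows of a call a reviewer listens to."""
    rng = random.Random(seed)
    return set(rng.sample(range(call_minutes), sample_size))


def coverage(call_minutes: int, sample_size: int) -> float:
    """Fraction of the call that is actually reviewed."""
    return sample_size / call_minutes


def miss_rate(call_minutes: int, sample_size: int, risky_minute: int,
              trials: int = 10_000) -> float:
    """Estimate how often one risky minute escapes the random sample."""
    misses = 0
    for seed in range(trials):
        if risky_minute not in sampled_minutes(call_minutes, sample_size, seed):
            misses += 1
    return misses / trials


# Assumed scenario: a 30-minute call with 3 one-minute windows reviewed.
print(coverage(30, 3))                    # share of the call reviewed
print(miss_rate(30, 3, risky_minute=17))  # close to 1 - 3/30 = 0.9
```

Under these assumptions only a tenth of the call is heard, and a single problematic minute goes unreviewed roughly nine times out of ten.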

In contrast, more modern solutions use AI to analyze an entire recording. To retrieve the most relevant part of a transcription, surveillance solutions highlight areas of interest according to the provided lexicons. With a specific timestamp in hand, compliance teams can check whether the conversation snippet is innocuous instead of arduously searching through entire recordings, saving time for more important compliance tasks.
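
The timestamp-driven review described above can be sketched as follows, assuming the transcription step yields segments that each carry a start offset. The `Segment` structure, `find_hits` function, and sample phrases are hypothetical, and simple substring matching stands in for whatever matching a real solution performs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float  # offset of the segment within the recording
    text: str             # transcribed speech for that segment


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def find_hits(segments: list[Segment], lexicon: list[str]) -> list[tuple[str, str, str]]:
    """Scan every segment; return (timestamp, phrase, text) for each match."""
    hits = []
    for segment in segments:
        lowered = segment.text.lower()
        for phrase in lexicon:
            if phrase.lower() in lowered:
                hits.append(
                    (format_timestamp(segment.start_seconds), phrase, segment.text)
                )
    return hits


call = [
    Segment(0.0, "Morning, how was the weekend?"),
    Segment(75.5, "Let's take this offline before compliance sees it."),
    Segment(140.0, "Sure, talk later."),
]

for timestamp, phrase, text in find_hits(call, ["take this offline"]):
    print(timestamp, phrase, text)
```

Every segment is scanned, but a reviewer only needs to jump to the flagged offset (01:15 in this example) rather than listen to the whole call.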

LLMs also introduce myriad parameters that make it possible to achieve what smaller language models (LMs) could not, such as deciphering a breadth of dialects, accents, languages, and slang, or recognizing different phrases that have the same meaning. AI-enabled solutions also assist with call analysis and help decode the unclear, muddled recordings that are often presented to compliance teams.

To ensure accuracy when analyzing these factors, new voice solutions can take advantage of scalable and adaptable LLMs. In addition, multiple LLMs can be combined to enhance call analysis. By doing so, firms gain the widest possible set of parameters and variables to support accurate results.

Even after identifying a certain set of phrases or keywords to run against voice transcriptions, firms can update their policies when needed to build out their lexicon list. Whenever lexicon policies are updated, it’s beneficial to employ a solution that audits these changes within the system. Should a regulatory investigation occur, this process displays well-defined documentation of upheld policies.
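
One way to picture an audited lexicon is a structure that records who changed what, and when, alongside the phrases themselves. The sketch below is an assumption about how such a trail could look; the `AuditedLexicon` class, its fields, and the user names are hypothetical.

```python
from datetime import datetime, timezone


class AuditedLexicon:
    """A phrase list that logs every change for later review."""

    def __init__(self, phrases):
        self.phrases = set(phrases)
        self.audit_log = []  # append-only list of change records

    def _record(self, action: str, phrase: str, user: str) -> None:
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "phrase": phrase,
            "user": user,
        })

    def add(self, phrase: str, user: str) -> None:
        # Only log changes that actually alter the lexicon.
        if phrase not in self.phrases:
            self.phrases.add(phrase)
            self._record("add", phrase, user)

    def remove(self, phrase: str, user: str) -> None:
        if phrase in self.phrases:
            self.phrases.remove(phrase)
            self._record("remove", phrase, user)


lexicon = AuditedLexicon(["guaranteed return"])
lexicon.add("take this offline", user="analyst_a")
lexicon.remove("guaranteed return", user="analyst_b")

for entry in lexicon.audit_log:
    print(entry["timestamp"], entry["user"], entry["action"], entry["phrase"])
```

Because the log is only ever appended to, it preserves the history of policy changes even after a phrase has been removed from the active lexicon.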

Find your voice (and record it too)

In view of these considerations, a comprehensive voice surveillance solution should offer reliable accuracy and a wide range of parameters that can interpret various factors associated with voice recordings. Consider a voice solution that can organize and classify all communications in one archive, which provides the added benefit of easily accessible and searchable data.

As the concentration on compliant communications grows, so does the need to employ mechanisms that have the capacity to perform accurate, rapid, and consistent voice surveillance. Regulators have made clear that commitment to the identification and deterrence of misconduct is one of their top priorities, and it should be one of firms’ top priorities, too.

Avoid the risk of unchecked misconduct and surveillance gaps with Global Relay’s newly released AI-enabled voice surveillance solution, which offers impressively accurate, comprehensive supervision abilities to identify high-risk communication areas.
