Automatic Speech Recognition (ASR) and how it is shaping the industry

We continue our tech series with a look at Automatic Speech Recognition (ASR), and how this technology is shaping the language industry.

ASR technology uses computer-based algorithms to identify, process and convert the spoken word into written text. Related technology can also verify a person’s identity from their voice, as an authentication tool (known as Automatic Voice Recognition, AVR).

The technology has a long history dating back to the 1950s. More recently, however, it has benefited hugely from advances in deep learning and big data.

In our modern Smart Homes, AI assistants are helping us perform day-to-day tasks, from doing the washing to adjusting the temperature, and our cars are taking instructions from Siri. Elsewhere, in industries such as Healthcare, Security and Banking, the integration of ASR technology is making great strides.

What does this mean for the language service industry?

AI-driven technologies are having a major impact on the language service industry, and software may eventually take over much of the work of the traditional human transcriber.

Developments in natural language processing mean that AI technology has become better at meeting challenges such as accents and background noise.

Ongoing data collection means that the machines are constantly learning and improving their ability to ‘hear’ and understand a wider variety of words, languages, and accents.

As humans, we spend years developing and refining our natural ability to listen to and understand words. We must train computers in much the same way that we were taught by our parents and teachers. This training involves a lot of innovative thinking, manpower and research.

Perfecting these speech recognition systems will take a lot more time and a lot more field data; there are thousands of languages, accents and dialects to take into account.

That’s not to say progress is not being made, but there is still a long way to go before these systems prove their reliability. Think of all the regional accents, background noise, speech impediments and multiple speakers they have to handle. Remember, too, that the technology must distinguish between homophones (did the speaker mean ‘son’ or ‘sun’?), learn the difference between proper names and separate words (‘Tim Cook’ is a person, not a request to find a cook named Tim), and more.
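To make the homophone problem concrete, here is a minimal sketch in Python of how surrounding words can help choose between ‘son’ and ‘sun’. It is an illustration only: the word-pair counts are invented for the example, the names BIGRAM_COUNTS, HOMOPHONES and pick_homophones are hypothetical, and real ASR systems rely on far larger statistical or neural language models.

```python
from collections import Counter

# Toy word-pair counts, invented purely for illustration (not real data).
BIGRAM_COUNTS = Counter({
    ("the", "sun"): 50,
    ("sun", "rose"): 20,
    ("my", "son"): 40,
    ("son", "is"): 30,
    ("the", "son"): 5,
})

# Each word maps to its sound-alike candidates; the word itself comes first
# so that it is kept when the context gives no evidence either way.
HOMOPHONES = {
    "sun": ["sun", "son"],
    "son": ["son", "sun"],
}


def context_score(prev_word, candidate, next_word):
    """Score a candidate by how often it appears next to its neighbours."""
    return (BIGRAM_COUNTS[(prev_word, candidate)]
            + BIGRAM_COUNTS[(candidate, next_word)])


def pick_homophones(words):
    """Replace each word with its best-scoring homophone, left to right."""
    result = []
    for i, word in enumerate(words):
        candidates = HOMOPHONES.get(word, [word])
        prev_word = result[i - 1] if i > 0 else "<s>"
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        best = max(candidates,
                   key=lambda c: context_score(prev_word, c, next_word))
        result.append(best)
    return result


print(" ".join(pick_homophones("the son rose".split())))
print(" ".join(pick_homophones("my sun is tall".split())))
```

With these toy counts, the first phrase is corrected to ‘sun’ and the second to ‘son’, because each choice fits its neighbouring words better.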

Global Lingo’s Approach

We have embedded ASR technology into our workflows to offer customers a more flexible service when speed and cost-effectiveness are the overriding priorities.

We offer a service known as ASR + Post-editing. Our integrated ASR software outputs a “raw” transcript from an audio or video file almost instantaneously. This transcript will contain errors: even with the best-quality audio, computer software cannot guarantee the same accuracy as a human transcriber. The transcript therefore passes through a Post-editing stage, carried out by a human editor. This produces a high-quality transcript that is correctly attributed, thoroughly researched and presented in a template style and format to suit the client’s needs.

By using ASR in combination with Post-editing, we are able to provide transcripts with an accuracy level of around 95% in approximately a third of the time it would take us to transcribe from scratch. This also drastically reduces our workload, meaning we can offer you a significant cost saving.
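For readers curious about how a transcript accuracy figure can be calculated, the sketch below shows one widely used measure, word error rate (WER), and the word accuracy derived from it. This is an assumption for illustration: the article does not state how the figure above is measured, and the function names and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy expressed as 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# Hypothetical example: a human-checked reference and a raw ASR output.
reference = "the sun rose over the hills this morning"
raw_asr = "the son rose over the hills this morning"
print(f"Word accuracy: {word_accuracy(reference, raw_asr):.1%}")
```

In this example one word in eight is wrong, so the word accuracy is 87.5%. Comparing the score of a raw transcript with that of a post-edited one is one simple way to quantify what the Post-editing stage adds.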

What Sets Global Lingo Apart?

Compared with other ASR providers, Global Lingo’s approach ensures:

  • Dedicated customer service
  • Security of all data
  • Speed of turnaround
  • Cost effectiveness
  • Quality guaranteed

For clients who need large volumes of audio transcribed very quickly, ASR + Post-editing helps us to produce transcripts with turnaround times that would not be possible with human-only transcription. This speed is key to giving you a competitive advantage when sharing and publishing materials.

If you are interested in finding out more about our ASR-assisted transcription and other services or would like to discuss a specific project in more detail, please don’t hesitate to contact us.


Read our other articles in this Tech series here:

Translation Memory