Big data in the translation industry is a new phenomenon, made possible by the introduction of cloud-based translation platforms. What does this mean?

Unlike their server-based predecessors, cloud tools centralize the data of many organizations and individual translators, making it uniform and easy to compare. All translation tools have the potential to collect data on productivity, number of words translated, projects managed, and costs. The disadvantage of desktop and server-based tools is that they have to be installed on a company’s own hardware. These servers and workstations are not connected to any external companies, so all their data is effectively siloed. Furthermore, metrics tend to be customized to the needs of individual translation departments, so even if a company using a server-based tool wanted to compare its data with that of other companies using the same tool, it would be very difficult. Other companies may not keep track of their data, or may analyse it in a completely different way, making accurate comparisons hard to obtain.

Why Cloud Tools are Different

Cloud-based tools, on the other hand, collect the data from all their users under one roof, meaning it can be compared in a standardized way. In the early days of cloud tools, sample sizes would have been too small to be useful. However, usage has grown rapidly over the last few years, and many of these tools now host tens of thousands of users. In a nutshell, big data in translation simply means that because thousands of companies’ data is now centralized, conclusions can be drawn and benchmarks set, for the first time, that are representative of the whole industry.

Cloud platforms such as Memsource, Smartling, SmartCAT and MateCAT may now be in a position to allow their users to take advantage of this data.

Why is Big Data so Interesting?

Big data has the potential to dramatically shake up the translation industry.

For the first time, it will be possible to benchmark things like the most productive translators in the industry, and find out what a good translation speed is. The top 10% of translators can expect to receive far more work offers, therefore increasing competition. Before the advent of big data, it would have been very hard to prove who the best translators were, or even what a good translation speed might have been.
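To make the idea of a productivity benchmark concrete, here is a minimal sketch in Python. It assumes a hypothetical pooled dataset of (translator, words translated, hours worked) records. The record layout, the function names and the sample numbers are all illustrative assumptions and are not taken from any real platform.

```python
from statistics import quantiles

# Hypothetical pooled records: (translator_id, words_translated, hours_worked).
# The values are invented purely for illustration.
records = [
    ("t01", 3000, 10.0),
    ("t02", 3500, 10.0),
    ("t03", 4000, 10.0),
    ("t04", 4500, 10.0),
    ("t05", 5000, 10.0),
    ("t06", 5500, 10.0),
    ("t07", 6000, 10.0),
    ("t08", 6500, 10.0),
    ("t09", 7000, 10.0),
    ("t10", 3750, 5.0),  # t10 has two jobs, which are summed below
    ("t10", 3750, 5.0),
]


def words_per_hour(records):
    """Return {translator_id: words per hour}, summing each translator's jobs."""
    totals = {}
    for translator_id, words, hours in records:
        total_words, total_hours = totals.get(translator_id, (0, 0.0))
        totals[translator_id] = (total_words + words, total_hours + hours)
    # Skip translators with no recorded hours to avoid dividing by zero.
    return {tid: w / h for tid, (w, h) in totals.items() if h > 0}


def top_decile(speeds):
    """Return the translators at or above the 90th-percentile speed."""
    # quantiles(..., n=10) returns nine cut points; the last is the 90th percentile.
    cutoff = quantiles(list(speeds.values()), n=10)[-1]
    return {tid: speed for tid, speed in speeds.items() if speed >= cutoff}


speeds = words_per_hour(records)
print(top_decile(speeds))
```

With only ten translators, the top decile is a single person. The point of pooling data from thousands of users is that such a cutoff becomes statistically meaningful.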

Other potential insights might include which languages are rising and falling in demand (and therefore what they are worth). In the future, it may also mean that companies no longer need to run benchmark tests when hiring a new translator; they can simply choose the top translators from a list of thousands.

Current Findings and Potential Benefits

Memsource published its first ‘big data’ article on June 28th, 2016. Thanks to the centralized data of thousands of users, it was possible for the first time to measure the disparity between machine and human translation accurately. In this particular study, it was found that the two are identical in 5 to 20% of cases. Another finding (published on July 19th) was the extent to which translation memory can save a buyer money. Using big data, Memsource was able to show that translation memory increases productivity by 36% on average, and that in extreme cases it can improve it by over 90%. Information like this is valuable to LSPs and buyers alike, and could have a strong impact on ROI evaluations of translation memory.

Potential Benefits for Users:

Increased productivity
Definitive benchmarks set across the industry
Translation memory can be benchmarked
Future trends can be predicted
Budget can be adjusted according to trends
Previous mistakes made clear (and therefore easily avoided in future)

The above examples are just the tip of the iceberg. As this concept catches on, better findings will almost certainly come out of the woodwork that will alter the way we work, who we work with, and what we use. It is no exaggeration to say that, for better or worse, big data in translation has the potential to transform the language industry.

by James Austin

James Austin is a Prague-based content developer for Memsource, musician and fitness enthusiast.