This week the lawyer Jose Mira posted on a social network a legal document that spoke of the “Hindu principle bio pro reo” instead of the Latin expression ‘in dubio pro reo’, the legal rule that if there is room for doubt when proving an accusation, the court must favor the accused. The most charitable explanation (‘in dubio pro scriptore’ in this case) is that a speech recognition or predictive writing program was behind the blunder.
So-called Natural Language Processing (NLP) is starting to be everywhere. It is about getting machines to understand texts and extract relevant information from them. It is the main task of computational linguistics within the development of Artificial Intelligence (AI), and it relies mainly on semantic analysis of what we hear and read. Machines look for patterns and learn like humans, by trial and error, and they do it using millions of texts and speeches that they find on the Internet. Sooner or later, the program that transcribes a ‘Hindu principle’ will discover that it is an unusual expression, since the cases citing the correct Latin expression far outnumber the wrong one, and will turn it into a fringe episode. And although unusual is not the same as incorrect, in statistical (and practical) terms the blunder will be corrected, even if true knowledge continues to suffer. The big problem is still the context in which things are said, although there are important advances in that regard too.
Investment funds such as Blackstone or Nomura have been investing in NLP-based analysis techniques for years. This week, John Authers of Bloomberg cited a study by Joseph Mezrich, who for years has headed the quantitative strategy unit at Nomura and other funds. The study indicates that companies that use complex and convoluted language when explaining their financial performance are less profitable than those that use plain and direct language. The firms that speak clearly show a profitability of 15.4%, while those that do not remain at 9.4%. Authers joked: “If the CEO talks like Kant, think twice before investing.”
Just over a month ago, Mezrich presented another study in which he developed a technique to identify, from the explanations given by listed companies, which of them meet ESG (environmental, social and governance) criteria. Narrative analysis also makes it possible to predict GDP growth, something that has been done for years. One pioneering exercise in Spain was the analysis of the reaction to the speeches of the president of the European Central Bank, carried out by the economist Manuel Illueca (today president of the Valencian Institute of Finance) in June 2014. He discovered that Mario Draghi’s speech announcing greater ECB activism was better received by the press in peripheral countries than by the press in central and northern Europe, as published in Valencia Plaza. More common are the word clouds developed by digital media, or the study that professors Ricardo Queralt and Juan Manuel López Zafra made of the speeches in the motion of censure of Pedro Sánchez.
Another field that is investing in and adapting these technologies is the law, and the result is not always a blunder like that of the ‘Hindu principle’. The judicial system produces millions of documents each year that are being uploaded to NLP systems. From there arises a crucial data set for so-called ‘Predictive Justice’: AI can identify which judges hand down which rulings, in which cases and on what arguments. For a law firm, this information represents a comparative advantage, since it can indicate the strategy to follow: from which types of reasoning are most persuasive before a specific judge to whether the firm should force a change of magistrate to obtain a favorable ruling.
Work is also being done to automate the semantic analysis of the legislative output of parliaments and governments. The idea is to make the rules increasingly understandable and accessible to citizens, a purpose for which the political will of legislators has not been enough. Perhaps the machines will be more sensitive to the matter.
The technology is also proving useful in the fight against the pandemic. The US government and a coalition of research organizations led by the Allen Institute launched CORD-19, an open access search engine for scientists that encompasses open research data on Covid-19 and uses the most recent advances in language processing and AI techniques. CORD-19 has access to hundreds of thousands of academic articles on Covid-19, SARS-CoV-2 and related coronaviruses. Its sophisticated systems have made it easier for researchers to manage growing amounts of knowledge that would otherwise take months or years to link. This has led to an unprecedented acceleration in the pace of research and is one of the factors behind the rapid development of vaccines.