Wikipedia is the place on the internet where bots roam freely. These computer programs, which act like human users on the web, handle tasks that would be tedious for people, such as detecting and reverting vandalism, adding links, correcting spelling and fixing grammatical agreement in sentences. Although they sometimes come into conflict by editing one another's work (up to 4.7 million article edits are bots correcting each other's changes), they have also proven useful. One proof of this is their ability to create manuals by reviewing, compiling and classifying the most relevant articles on any topic.
The so-called Wikibooks have been around for a long time. Until now, however, it was humans who selected their content, a tedious and complex task given the immense number of pages on Wikipedia. With this in mind, researcher Shahar Admati and his team at Ben-Gurion University of the Negev have developed a way to generate these manuals with machine learning. "The novelty of our technique is that it is designed to create a complete Wikibook without human intervention," the team says.
To achieve this, they first selected existing human-made Wikibooks that met all the necessary criteria and used the 6,700 manuals they found to train the algorithms. The difficulty was that each part of building a manual required a different machine-learning technique. First, the algorithm analyzes automatically selected Wikipedia pages related to a given subject and decides whether including each one in the Wikibook would make it more similar to the books created by people.
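The selection step can be pictured as a greedy loop that keeps a candidate article only if it makes the book look more like the human-made ones. This is a minimal, hypothetical sketch, not the team's method: it assumes a book is summarized by a single feature, the link density among its articles, and that a target value (`target_density`) has been estimated from the human-made books. The names `link_density`, `select_articles` and `example_graph`, and the toy data, are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical link graph: article title -> titles it links to.
Graph = Dict[str, Set[str]]


def link_density(book: Set[str], graph: Graph) -> float:
    """Fraction of possible directed links among the book's articles that exist."""
    n = len(book)
    if n < 2:
        return 0.0
    links = sum(
        1 for a in book for b in graph.get(a, set()) if b in book and b != a
    )
    return links / (n * (n - 1))


def select_articles(
    seed: Set[str], candidates: List[str], graph: Graph, target_density: float
) -> Set[str]:
    """Greedily keep a candidate only if it moves the book closer to the target.

    target_density is an assumed value, e.g. the average link density of the
    human-made books used for training (a single stand-in feature).
    """
    book = set(seed)
    for cand in candidates:
        current_gap = abs(link_density(book, graph) - target_density)
        new_gap = abs(link_density(book | {cand}, graph) - target_density)
        if new_gap < current_gap:
            book.add(cand)
    return book


# Toy example (made-up data): "Football" is unrelated to the Python seed.
example_graph: Graph = {
    "Python": {"Guido van Rossum", "CPython"},
    "Guido van Rossum": {"Python"},
    "CPython": {"Python", "Guido van Rossum"},
    "Football": {"FIFA"},
    "FIFA": {"Football"},
}

chosen = select_articles(
    {"Python"}, ["Guido van Rossum", "Football", "CPython"], example_graph, 0.8
)
print(sorted(chosen))  # ['CPython', 'Guido van Rossum', 'Python']
```

A real system would combine many features in a trained classifier; one feature keeps the idea visible: the unrelated article lowers the density too far from the target and is rejected.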
Each thematic manual is divided into chapters, so the next step was to organize the content into them. The algorithm had to "explore the network formed by the whole set of articles and discover how to divide it into coherent groups," explains MIT Technology Review. The last step was to determine the order in which the articles should appear in each chapter. To do this, the team grouped the articles in pairs and used a network model to decide which of the two should appear first. By repeating this for every possible pair of articles, the algorithm was able to establish a preferred order for the articles and, therefore, for the chapters.
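The chapter and ordering steps can be illustrated in the same spirit. In this hypothetical sketch, connected components of the link graph stand in for the more sophisticated grouping of the real system, and `link_precedes` is an assumed pairwise rule (an article comes first if the other one links to it but not back) standing in for the team's network model. Every pair in a chapter is compared once, and articles are ranked by how many comparisons they win. All function names and data are illustrative.

```python
from collections import deque
from itertools import combinations
from typing import Callable, Dict, List, Set

# Hypothetical link graph: article title -> titles it links to.
Graph = Dict[str, Set[str]]


def split_into_chapters(articles: Set[str], graph: Graph) -> List[Set[str]]:
    """Group articles into connected components of the undirected link graph.

    A deliberately simple stand-in for real community detection.
    """
    adj: Dict[str, Set[str]] = {a: set() for a in articles}
    for a in articles:
        for b in graph.get(a, set()):
            if b in articles and b != a:
                adj[a].add(b)
                adj[b].add(a)
    seen: Set[str] = set()
    chapters: List[Set[str]] = []
    for start in sorted(articles):  # sorted for a deterministic chapter order
        if start in seen:
            continue
        seen.add(start)
        component: Set[str] = set()
        queue = deque([start])
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        chapters.append(component)
    return chapters


def link_precedes(a: str, b: str, graph: Graph) -> bool:
    """Hypothetical pairwise rule: a goes first if b links to a but not back."""
    return a in graph.get(b, set()) and b not in graph.get(a, set())


def order_articles(
    chapter: Set[str],
    graph: Graph,
    precedes: Callable[[str, str, Graph], bool] = link_precedes,
) -> List[str]:
    """Compare every pair once and rank articles by how many pairs they win."""
    wins = {a: 0 for a in chapter}
    for a, b in combinations(sorted(chapter), 2):
        if precedes(a, b, graph):
            wins[a] += 1
        elif precedes(b, a, graph):
            wins[b] += 1
    # Most wins first; ties broken alphabetically so the result is deterministic.
    return sorted(chapter, key=lambda a: (-wins[a], a))


# Toy example (made-up data).
example_graph: Graph = {
    "Programming language": set(),
    "Python": {"Programming language"},
    "CPython": {"Python", "Programming language"},
    "Compiler": set(),
    "Linker": {"Compiler"},
}

for chapter in split_into_chapters(set(example_graph), example_graph):
    print(order_articles(chapter, example_graph))
# ['Programming language', 'Python', 'CPython']
# ['Compiler', 'Linker']
```

Counting wins is one simple way to turn pairwise decisions, which may be inconsistent (A before B, B before C, C before A), into a single ranking; ordering the chapters themselves is not modelled here.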