Today we are flooded by an avalanche of data. This is what is popularly known as Big Data (large-scale data). It is the world of data, and its importance keeps growing.
The intensive use of data has proved valuable for urban planning (through the fusion of high-fidelity geographic data), intelligent transportation (through the analysis and visualization of rich, detailed data from road networks), environmental monitoring (through sensor networks that collect data ubiquitously), energy conservation (through the discovery of usage patterns), financial risk prediction (through the integrated analysis of a network of contracts to find dependencies among financial institutions), national security (through the analysis of social networks and of possible terrorist financial transactions), information security (through the analysis of recorded information), and so on. Data-intensive storage and processing can reduce the cost of health care and improve its quality, making care more preventive and personalized and basing it on continuous, extensive monitoring of people's activities and symptoms, bringing us far closer to the maxim that prevention is better than cure.
However, problems appear immediately during data acquisition, when the data tsunami forces us to decide which data to store and which to discard, and how to store them reliably. Today's data are of very different types: tweets and blogs are weakly structured text fragments, while images and videos are prepared primarily for storage and display, not so much for search and analysis. Turning that content into a form suitable for further analysis is a major challenge. The value of data increases considerably when they can be linked to other data, so data integration is another important challenge. Since most data today are generated directly in digital form, we have the opportunity both to influence how data are created so as to facilitate their subsequent linking, and to link automatically data created previously.
Methods for querying and extracting knowledge from today's data are fundamentally different from traditional statistical analysis of small samples. In the world of Big Data, data are distributed, noisy (some values, we do not know which, are wrong), dynamic, heterogeneous, interrelated, and often unreliable. Even so, noisy data can be more valuable than a small sample, because the patterns obtained tend to dominate individual fluctuations and often reveal more reliable patterns and hidden knowledge. In addition, by interconnecting large networks of heterogeneous information, redundancy can be exploited to compensate for missing data, check conflicting cases, validate relationships, and discover new hidden relationships and models.
The world of data needs a new professional: the data engineer. This professional is responsible for developing, building, testing, and maintaining large-scale architectures, databases, and processing systems. Data engineers will have to implement new ways of improving the reliability, efficiency, and quality of data. An important additional aspect to take into account is data security and confidentiality (even more so after the entry into force last May of the European regulation on the protection of personal data). All the aspects mentioned are already present in existing applications.
The data engineer is thus profiled as a computer engineer with advanced knowledge of software engineering and information systems, who understands the characteristics of the data, the kinds of frequent queries that are of interest to the stakeholders, and the aspects the organization wishes to improve through intensive data management. He or she must know the available hardware and software and their possibilities; the techniques of efficient storage and of data processing on advanced and distributed architectures, together with software engineering techniques; and the European and national laws and regulations concerning the security and confidentiality of data. He or she must also have adequate communication skills to interact with the different profiles of data users. In addition, in the Big Data world the data engineer's work will be complemented by that of the data scientist, who devises new algorithms or applies existing ones to extract patterns from data.
Miguel Toro, Arantza Illarramendi, and Francisco Ruiz are professors at the universities of Sevilla, the Basque Country, and Castilla-La Mancha.
Chronicles of the Intangible is a popular-science series on computer science, coordinated by the academic society SISTEDES (Software Engineering and Software Development Technologies Society). The intangible is the non-material part of computer systems (i.e., software), and here its history and its future are recounted. The authors are professors at Spanish universities, coordinated by Ricardo Peña Mari (Professor at the Universidad Complutense de Madrid) and Macario Polo Usaola (Professor at the University of Castilla-La Mancha).