Platform for Web data extraction and retrieval in the context of Big Data
Abstract
Data of interest to businesses and organizations is dispersed across many Web domains and in different formats, so the ability to obtain it is increasingly necessary. This requires ways to extract the data and to ensure its reliability so that it can be stored correctly. Data extraction techniques, in particular Web scraping (search robots), allow such data to be captured. This project studies data extraction techniques for the Web domain and materializes that study in the development of a platform that extracts this information through parameterized search robots, giving users the autonomy to create the robots themselves.
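As an illustration of what a parameterized search robot could look like, the following minimal sketch uses only the Python standard library. It is not the platform's implementation: the names `RobotConfig`, `ParameterizedRobot`, and `extract`, and the choice of "tag plus class" as the user-supplied parameters, are assumptions made for this example.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class RobotConfig:
    """Hypothetical user-supplied parameters for one search robot."""
    tag: str        # HTML tag whose text should be captured, e.g. "li"
    css_class: str  # class value that marks the target elements


class ParameterizedRobot(HTMLParser):
    """Collects the text of every element matching the configured tag and class."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.results = []
        self._depth = 0    # > 0 while inside a matching element
        self._buffer = []  # text fragments of the element being captured

    def handle_starttag(self, tag, attrs):
        if self._depth > 0:
            # Track nested elements with the same tag name so the
            # capture ends at the correct closing tag.
            if tag == self.config.tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.config.tag and self.config.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._depth > 0 and tag == self.config.tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())


def extract(html, config):
    """Run one robot over an HTML string and return the captured texts."""
    robot = ParameterizedRobot(config)
    robot.feed(html)
    robot.close()
    return robot.results


if __name__ == "__main__":
    page = '<ul><li class="price">R$ 10</li><li class="name">Book</li></ul>'
    print(extract(page, RobotConfig(tag="li", css_class="price")))
```

In this sketch the user never writes parsing code; creating a new robot amounts to supplying a different `RobotConfig`. Fetching pages, scheduling, validation, and storage are left out.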