This paper studies and applies data analysis techniques for information extraction. Its contribution targets the identification of relevant literature within a collection of crawled documents. Novel functions, called social network features, are described and evaluated on documents crawled from arXiv to examine the relevance of these features. The results illustrate the data analysis process and the classification performance of the data mining algorithms used.