Identifying Spammers on Twitter

Speaker: Demet Dagdelen, Freelance Data Analyst

Abstract:

Twitter’s popularity attracts many spammers, who pose a threat to the service's legitimate users. In this presentation I will discuss user-based and content-based features that can be used for spam detection, and compare five classification algorithms on them: Naive Bayes, Random Forest, Support Vector Machines, Logistic Regression and k-nearest neighbors. The results show that spam detection based on the suggested features can achieve 99% precision, measured with 10-fold cross-validation. The presentation will also give a very brief overview of the software used for this project: Weka (Waikato Environment for Knowledge Analysis).
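
To make the evaluation setup concrete, here is a minimal sketch of how precision under 10-fold cross-validation can be computed for one of the five algorithms, k-nearest neighbors. It is plain Python rather than Weka, and it is not the code behind the reported results. The two features (fraction of tweets containing a URL, followers-to-following ratio), the synthetic data, and all function names are assumptions made for illustration only.

```python
import random


def knn_predict(train, query, k=3):
    """Majority vote among the k training rows closest to `query`.

    `train` is a list of (features, label) pairs; label 1 = spam, 0 = legitimate.
    """
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    spam_votes = sum(label for _, label in nearest)
    return 1 if spam_votes * 2 > len(nearest) else 0


def cross_validated_precision(data, folds=10, k=3, seed=0):
    """Pool predictions over all held-out folds and return spam-class precision."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    true_pos = false_pos = 0
    for i in range(folds):
        # Every `folds`-th row (offset i) is held out; the rest is training data.
        test = rows[i::folds]
        train = [row for j, row in enumerate(rows) if j % folds != i]
        for features, label in test:
            if knn_predict(train, features, k) == 1:
                if label == 1:
                    true_pos += 1
                else:
                    false_pos += 1
    flagged = true_pos + false_pos
    # Precision = correctly flagged spammers / all accounts flagged as spam.
    return true_pos / flagged if flagged else 0.0


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic accounts (hypothetical features, NOT real Twitter data).

    Features: (fraction of tweets containing a URL, followers/following ratio).
    """
    rng = random.Random(seed)
    legit = [((rng.gauss(0.1, 0.05), rng.gauss(1.0, 0.2)), 0)
             for _ in range(n_per_class)]
    spam = [((rng.gauss(0.9, 0.05), rng.gauss(0.1, 0.05)), 1)
            for _ in range(n_per_class)]
    return legit + spam


if __name__ == "__main__":
    precision = cross_validated_precision(make_toy_data(), folds=10, k=3)
    print(f"10-fold cross-validated precision: {precision:.3f}")
```

Precision is the right lens when wrongly flagging a legitimate user is the costly error, since it measures how many flagged accounts really are spammers. It says nothing about how many spammers were missed, so in practice it is usually reported alongside recall.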