Influential People Accounts Monitoring Using Machine Learning in Social Networks
Introduction
In the 21st century, many breakthrough technologies have led to the development of various communication channels and social networking services over the internet, e.g. Twitter, Facebook and MySpace. At the same time, the dependence of large communities, societies and people's social lives on digital services has brought several challenges that need to be addressed, such as diversified social engineering attacks, spam, online scams and financial fraud. The information age began with the invention of computers and the internet back in the 1960s, but today it has become essential to be aware of the traps one can fall into while online and using digital services, whether for personal use or office work. It is necessary to be careful about the spread of misinformation, even when it looks legitimate and appears to come from a known person. In this paper I present background information on well-known cyber-attacks on a social media platform used by a vast number of people, including influential personalities, celebrities, famous people and political entities, and discuss the possible root causes of these attacks. I then give an overview of what could be reliable solutions, from the technological and legal aspects of cyber security, for both sides (the people connected to the online social network, Twitter, and the company that offers these communication services, Twitter itself), so that they avoid becoming the next target of malicious actors who are capable of destroying reputations, gaining financial advantage or influencing people's political views. This paper provides guidelines, in my opinion, on the methods and strategies that readers and social media companies can use at their end to keep themselves secure and safe.
This paper focuses mainly on the security of Twitter and its audience, since Twitter, as a celebrity-oriented platform, has frequently been a main target for hackers.
Background Overview
With the rise of social media usage, attackers have become more interested and involved in online fraud and scams, using social engineering and offensive tools for their own benefit. Since the beginning of the internet, hundreds of thousands of cyber-crimes have been committed by bad actors in different regions of the world, and the battle is still going on; indeed, it is intensifying as people depend more on digital technologies. Digital technologies have their own advantages, such as letting people work easily and quickly. But on the other side of the coin, they carry certain dangerous elements that must be kept in mind from a security perspective. A massive 98 percent of respondents agreed with the statement that humans are the weakest link in cyber security, and over two-thirds agreed strongly [1][2]. This paper addresses some of the early cyber-attacks on Twitter and continues up to the latest major cyber-attack, which happened in July 2020. Corporations seeking to protect digital assets must face an uncomfortable truth: the biggest threat to cybersecurity lies within the company, because of its links to different systems [2]. Later, some solutions and techniques are presented that could help reduce the risk of cyber-attacks on companies such as Twitter or similar platforms. I have drawn on solutions from different research areas in the domain to narrow down the scope of this paper. Further details and the suggested solution are presented based on my knowledge and understanding of different scenarios.
Analysis and Discussion
It has been observed that the well-known cyber-attacks in Twitter's history usually targeted famous personalities with a strong appeal to the audience. In January 2009, Twitter suffered a security breach in which hackers were able, with relative ease, to gain access to user accounts, including one used by President Barack Obama [6]. In 2010, anonymous hackers attacked some 250,000 user accounts, accessing passwords and email addresses as well as other information [12]. Again, in 2011, the prime minister of Thailand became a victim of identity theft [13]. This trend has continued to grow year on year, as can be seen clearly in the table below.
The pattern of popular attacks can also be seen in the graphical representation in Figure 1. The pattern appears broadly similar across years, except for the 2016 attack, which was the largest one but did not involve any financial damage to users. As the table above also shows, the targeted audience on Twitter has mostly been celebrities with large numbers of followers. In terms of financial damage to the audience, the biggest cyber-attack in Twitter's history has been the bitcoin scam launched on 15 July 2020 by three young hackers aged between 17 and 22, two living in the USA and one in the UK [23] [4]. They managed to take control of Twitter's internal admin panel for user accounts by tricking Twitter employees through social engineering.
Screenshots of tweets from the verified accounts of a few celebrities and influential personalities are given below; they seem legitimate, but in reality they were nothing more than a trap planned by the hackers [4]. The young hackers behind this attack managed to collect more than $100,000 and the login credentials of user accounts in less than an hour after launching the attack [23], but after some days of investigation they were caught by the FBI and prosecuted.
The foremost problem here is to understand why Twitter is so vulnerable for its users; this may also be one reason why it has a smaller user base than other social networking services around the world, since many people do not regard it as a safe and secure platform. I would like to address the specific issue of influential people on Twitter and why they are caught again and again by bad actors in cyberspace. Later sections discuss some tips that can help people reduce the risk of being hacked.
Methodology to Solve the Problem
In this paper, a widely published technique called predictive analytics can be applied to the accounts of celebrities and influential people on the Twitter platform to recognize and monitor abnormal account activity, in order to reduce the frequency of scam attacks on the platform. The use of social platforms for communicating, sharing, storing and managing significant information attracts cybercriminals, who misuse the Web to exploit vulnerabilities for their illicit benefit [7]. Impersonators, phishers, scammers and spammers crop up all the time in Online Social Networks (OSNs), social apps and services to obtain personal information about people, and they are very challenging to identify [7]. Predictive analytics mainly consists of two steps [10]:
Step 1: Archived (historical) data is fed into the algorithm to create patterns.
Step 2: Current data is mapped onto the same algorithm and patterns to make predictions.
One well-known supervised machine learning algorithm, "Random Forest", which is suitable for both classification and regression, can be used to predict account behavior in a consolidated form: it builds decision trees on samples of the data (for example, people's social connections), obtains a prediction from each tree, and combines them by majority vote (or by averaging, for regression) into the final result [8].
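The two steps can be illustrated with a deliberately simple statistical baseline before turning to Random Forest. The sketch below is an illustration under stated assumptions, not the method of [10]: it assumes per-account daily tweet counts are the only feature, learns each account's mean and standard deviation from archived data (Step 1), and then flags current counts whose z-score exceeds a hypothetical threshold of 3 (Step 2). The account names and counts are invented.

```python
from statistics import mean, pstdev

def learn_baseline(history):
    """Step 1: learn each account's pattern (mean, std dev) from archived daily post counts."""
    return {account: (mean(counts), pstdev(counts)) for account, counts in history.items()}

def flag_anomalies(baseline, current, threshold=3.0):
    """Step 2: map current counts onto the learned pattern and flag unusually high activity."""
    flagged = []
    for account, count in current.items():
        if account not in baseline:
            continue  # no archived data for this account, so no pattern to compare against
        mu, sigma = baseline[account]
        if sigma > 0:
            z = (count - mu) / sigma
        else:
            z = float("inf") if count > mu else 0.0  # constant history: any increase is unusual
        if z > threshold:
            flagged.append(account)
    return flagged

# Hypothetical archived daily tweet counts for two verified accounts.
archived = {
    "celebrity_a": [3, 4, 2, 5, 3, 4],
    "celebrity_b": [10, 12, 9, 11, 10, 12],
}
today = {"celebrity_a": 25, "celebrity_b": 11}

baseline = learn_baseline(archived)
print(flag_anomalies(baseline, today))  # ['celebrity_a']
```

A per-account baseline matters here because "normal" differs widely between accounts: 11 tweets a day is routine for the second account but would be a large jump for the first.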
Furthermore, the "Random Forest" algorithm itself involves four basic steps to perform this task [8] [9] [10].
Step 1: Select random samples (drawn with replacement, known as bootstrap samples) from a dataset related to the domain.
Step 2: Construct a decision tree for every sample, and obtain a prediction result from every decision tree.
Step 3: Perform a vote over the predicted results of all the decision trees.
Step 4: Select the most-voted result as the final prediction.
Keep in mind that the performance of a machine learning model depends heavily on the quality of the data and features used to train it; otherwise, it gives inaccurate results. To avoid this, the predictive analytics process can be carried out in the following detailed phases [10]:
1. Define Project: Every detail, such as outcomes, scope, deliverables, objectives and the data sets to be used, is defined in this phase.
2. Data Collection: Data mining, with complete customer interaction and consent for data access.
3. Data Analysis: Vital information is extracted through inspection, cleaning and modeling of the data.
4. Inspection: Validation and testing.
5. Modeling: Preparing accurate predictive models.
6. Deployment: Models are deployed into every decision-making process to deliver the desired results.
7. Model Monitoring: Monitoring the status of the model.
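The four Random Forest steps described earlier can be sketched from scratch in plain Python. This is a minimal illustration under stated assumptions, not a production implementation or the exact algorithm of [8]. The account features (tweets per hour, share of tweets containing links, login from a new device) and the labels are hypothetical. The trees are grown with Gini impurity to a small fixed depth, and each split considers one randomly chosen feature. A real deployment would more likely use a tested library such as scikit-learn's RandomForestClassifier.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) among `features` with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth, n_features, rng):
    """Step 2: grow one small tree; each split looks at a random subset of features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority  # leaf node: just a class label
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          depth - 1, n_features, rng)

    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def random_forest(rows, labels, n_trees=25, depth=3, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: draw a bootstrap sample (random rows, with replacement).
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 depth, n_features, rng))
    return forest

def predict_forest(forest, row):
    # Step 3: every tree votes; Step 4: the most-voted label is the final prediction.
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical features: [tweets per hour, share of tweets with links, new-device login (0/1)]
rows = [
    [2, 0.1, 0], [3, 0.2, 0], [1, 0.0, 0], [4, 0.1, 0], [2, 0.3, 0], [3, 0.1, 1],
    [20, 0.9, 1], [15, 0.8, 1], [25, 1.0, 1], [18, 0.7, 0], [30, 0.9, 1], [22, 0.95, 1],
]
labels = ["normal"] * 6 + ["suspicious"] * 6

forest = random_forest(rows, labels)
print(predict_forest(forest, [24, 0.9, 1]), predict_forest(forest, [2, 0.1, 0]))
```

Increasing `n_trees` generally makes the vote more stable, at the cost of the extra computation that Random Forest is known for.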
Advantages
The following are the advantages of using the "Random Forest" algorithm [8].
1. Random forests work well across a wider range of data than a single decision tree.
2. Random forests are robust and achieve high accuracy.
3. Data scaling is not necessary in the Random Forest algorithm; it maintains good accuracy even when the data is provided without scaling.
4. Random forests can maintain good accuracy even when a large proportion of the data is missing.
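Advantage 3 follows from how trees split data: a node only compares a feature value against a threshold, so any order-preserving rescaling of a feature moves the threshold but not the decisions. The sketch below shows this with a decision stump (a depth-1 tree, the simplest building block of a random forest); the feature, its values and the labels are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a one-feature decision stump: pick the threshold with the fewest training errors.

    The stump predicts 1 (suspicious) when x > threshold, otherwise 0 (normal).
    """
    best_t, best_errors = None, None
    for t in sorted(set(xs)):
        errors = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def predict_stump(threshold, xs):
    return [int(x > threshold) for x in xs]

# Hypothetical feature: followers gained per hour; labels 0 = normal, 1 = suspicious.
raw = [5, 8, 12, 300, 450, 520]
labels = [0, 0, 0, 1, 1, 1]
scaled = [x / 1000 for x in raw]  # the same feature expressed in thousands

t_raw = fit_stump(raw, labels)
t_scaled = fit_stump(scaled, labels)

new_raw = [10, 400]
new_scaled = [x / 1000 for x in new_raw]
print(t_raw, t_scaled)                                                      # 12 0.012
print(predict_stump(t_raw, new_raw), predict_stump(t_scaled, new_scaled))  # [0, 1] [0, 1]
```

The learned threshold changes from 12 to 0.012, but the predictions are identical, which is why no scaling step is needed before training.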
Disadvantages
The following are the disadvantages of the "Random Forest" algorithm [8].
- “Complexity is the main disadvantage of Random forest algorithms” [8].
- “Construction of Random forests are much harder and time-consuming than decision trees” [8].
- “More computational resources are required to implement Random Forest algorithm” [8].
- “The prediction process using random forests is very time-consuming in comparison with other algorithms” [8].
Conclusion
Protecting online identity on the internet is a big challenge nowadays, even though the legal side of cyber security is helping to prevent cyber-attacks in cyberspace by introducing and developing new laws for prosecuting criminals as new cases of cyber-attacks emerge. Many methods, such as Artificial Intelligence and Deep Learning, have been developed and used by various researchers to detect scammers in different social networks [7]; the method described above is just one technique. In my opinion, Twitter is vulnerable because there is no special mechanism or security layer for monitoring the accounts of influential people, whose security is treated the same as that of ordinary accounts. In addition, Twitter's internal admin panel has weak safeguards around sensitive actions on people's accounts, as indicated by the information Twitter obtained from people who posted snapshots of the July 2020 scam to Facebook [22]. The predictive analytics technique presented above is based on a model through which Twitter can monitor the status of accounts and predict abnormal activities, such as several verified accounts posting similar content at the same time. Finally, tips for people to surf securely are to avoid public Wi-Fi, to be careful about clicking on links posted by others, and to use strong passwords and a VPN, all of which can reduce the chance of account compromise [3] [14].
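The abnormal activity mentioned above, several verified accounts posting similar content at the same time, can be flagged even without a trained model. The following is a minimal rule-based sketch, not Twitter's actual mechanism. It assumes each post is a tuple of (account, verified flag, timestamp, text). The 30-minute window, the minimum of three accounts and the 0.8 similarity threshold (computed with Python's `difflib.SequenceMatcher`) are hypothetical values that would need tuning, and the posts are invented.

```python
from datetime import datetime, timedelta
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase and collapse whitespace so trivial edits do not hide a copy."""
    return " ".join(text.lower().split())

def coordinated_accounts(posts, window=timedelta(minutes=30), min_accounts=3, min_similarity=0.8):
    """Return verified accounts that posted near-identical text within `window` of each other.

    posts: list of (account, verified, timestamp, text) tuples (a hypothetical input format).
    """
    verified = sorted((p for p in posts if p[1]), key=lambda p: p[2])
    flagged = set()
    for i, (account, _, ts, text) in enumerate(verified):
        similar = {account}
        for other_account, _, other_ts, other_text in verified[i + 1:]:
            if other_ts - ts > window:
                break  # sorted by time, so every later post is outside the window too
            ratio = SequenceMatcher(None, normalize(text), normalize(other_text)).ratio()
            if ratio >= min_similarity:
                similar.add(other_account)
        if len(similar) >= min_accounts:
            flagged |= similar
    return flagged

t0 = datetime(2020, 7, 15, 20, 0)
scam = "Sending back double! Send $1,000 and I will send back $2,000."
posts = [
    ("account_a", True, t0, scam),
    ("account_b", True, t0 + timedelta(minutes=2), scam.upper().replace(" ", "  ")),
    ("account_c", True, t0 + timedelta(minutes=5), scam.rstrip(".")),
    ("account_d", True, t0 + timedelta(minutes=10), "Enjoying the weather today."),
    ("fan_account", False, t0 + timedelta(minutes=3), scam),  # unverified, so ignored
]
print(sorted(coordinated_accounts(posts)))  # ['account_a', 'account_b', 'account_c']
```

Flags produced this way could also be used as an input feature for the Random Forest model rather than as a final verdict.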
References
[3] https://edtimes.in/heres-how-celebrity-social-media-accounts-are-hacked/
[4] https://www.wired.com/story/inside-twitter-hack-election-plan/.
[5] Cao, J., Adams-Cohen, N. and Alvarez, R.M., 2020. Reliable and Efficient Long-Term Twitter Monitoring. arXiv preprint arXiv:2005.02442.
[6] https://www.wired.com/2011/03/twitter-feds-lax-security/
[7] Verma, M. and Sofat, S., 2014. Techniques to detect spammers in twitter a survey. International Journal of Computer Applications, 85(10).
[8] https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_classification_algorithms_random_forest.htm
[9] https://www.logianalytics.com/predictive-analytics/predictive-algorithms-and-models/
[10] https://books.google.fi/books?hl=en&lr=&id=6nHcDwAAQBAJ&oi=fnd&pg=PA307&dq=+pattern+detection+algorithm+machine+learning+in+social+network&ots=4JnK2I988U&sig=PnjpZpXRH4P8aohrl_jOSZEjeUQ&redir_esc=y#v=onepage&q&f=false
[11] https://www.wired.com/2011/03/twitter-feds-lax-security/
[12] https://www.dw.com/en/twitter-hit-by-anonymous-hackers-in-cyberattack/a-16571151
[14] https://www.theguardian.com/technology/2012/nov/08/twitter-warns-hack-password-reset
[18] https://fortune.com/2016/06/09/twitter-hack-malware/
[19] https://techcrunch.com/2017/03/15/twitter-counter-hacked/
[20] https://www.theverge.com/2018/11/13/18091236/twitter-target-elon-musk-scam-bitcoin
[21] https://www.opindia.com/2019/06/adnan-samis-twitter-account-hacked-by-pro-pakistan-turkish-hackers/
[22] https://medium.com/@lucky225/the-twitter-hack-what-exactly-happened-d8740d33c1c
[24] https://towardsdatascience.com/machine-learning-basics-random-forest-regression-be3e1e3bb91a