How to find swarms of fake Twitter followers
Using patterns in follow order, account creation dates, and other attributes to detect inauthentic follower growth
How does one go about detecting large-scale inauthentic follower growth on Twitter? Generally, organized fake follower activity has two traits that set it apart from extremely rapid organic growth:
Large fake follower networks follow their targets en masse (hundreds or thousands of fake followers from the same network arriving within a short span of time).
Fake follower networks tend to have anomalous creation date patterns, such as large batches of accounts created on the same day or within the same date range.
Other patterns, such as zero likes or zero tweets, unusual follower/following ratios, large numbers of GAN-generated profile images, or repeated biographies, often provide additional evidence that a group of followers is inauthentic.
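Once the followers are in a DataFrame, the creation-date batches and repeated biographies mentioned above are easy to tabulate. Here’s a minimal sketch, assuming the CSV columns produced by the download script later in this post (`createTime`, `bio`); the helper names `creation_day_batches` and `repeated_bios` and their default thresholds are illustrative choices, not established cutoffs.

```python
import pandas as pd

def creation_day_batches(followers: pd.DataFrame, min_accounts: int = 50) -> pd.Series:
    """Count followers created on each calendar day and return the days with
    at least `min_accounts` new accounts, largest batch first."""
    # truncate each creation timestamp to midnight so accounts group by day
    days = pd.to_datetime(followers["createTime"]).dt.floor("D")
    counts = days.value_counts()
    return counts[counts >= min_accounts].sort_values(ascending=False)

def repeated_bios(followers: pd.DataFrame, min_accounts: int = 5) -> pd.Series:
    """Return non-empty bios shared verbatim by at least `min_accounts` followers."""
    bios = followers["bio"].fillna("").str.strip()
    counts = bios[bios != ""].value_counts()
    return counts[counts >= min_accounts]
```

Neither count proves anything on its own (some real accounts share a creation date or a stock bio), but large outliers are good hints about where to zoom in on the plots.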
Let’s take a look at an example account that has a high likelihood of having a large number of fake followers: @markenetspanel1, the “official” Twitter account of “social media management” site Markenet Panel (which sells Twitter followers, among other services). To analyze the followers, we first need to download them, which we can do with the Python library tweepy. To do this yourself, you’ll need to sign up for a Twitter developer account and get API keys if you haven’t done so already.
# simple python script to retrieve a Twitter user's followers
# and save some useful metadata for each in CSV format
#
# this is intended as a simple example, and various optimizations
# for efficiency and robustness are possible
import pandas as pd
import tweepy
import time
def encode (s):
    # escape backslashes, newlines, carriage returns and tabs so each
    # account stays on one line of the CSV; bios can be null, so treat
    # None as an empty string
    if s is None:
        return ""
    return (s.replace ("\\", "\\\\").replace ("\n", "\\n")
        .replace ("\r", "\\r").replace ("\t", "\\t"))
def get_followers (handle):
    # fetch all follower IDs (up to 5,000 per page), pausing about a
    # minute between pages to stay within the endpoint's rate limit
    ids = []
    for page in tweepy.Cursor (api.get_follower_ids,
            screen_name=handle).pages ():
        ids.extend (page)
        time.sleep (61)
        print ("ids: " + str (len (ids)))
    # look up full user objects in batches of 100 (the maximum per request)
    start = 0
    count = len (ids)
    rows = []
    while start < count:
        end = min (start + 100, count)
        for user in api.lookup_users (user_id=ids[start:end]):
            rows.append ({
                "id" : user.id_str,
                "handle" : user.screen_name,
                "displayName" : encode (user.name),
                "bio" : encode (user.description),
                "protected" : user.protected,
                "followers" : user.followers_count,
                "following" : user.friends_count,
                "tweets" : user.statuses_count,
                "likes" : user.favourites_count,
                "createTime" : str (user.created_at)[:19],
                "verified" : user.verified,
                "defaultProfileImage" : user.default_profile_image,
                "profileImageURL" : user.profile_image_url_https})
        start = end
        time.sleep (3)
        print ("accounts: " + str (len (rows)))
    df = pd.DataFrame (rows)
    return df
consumer_key = "*** consumer key goes here ***"
consumer_secret = "*** consumer secret goes here ***"
# app-only authentication only needs the consumer key and secret
auth = tweepy.AppAuthHandler (consumer_key, consumer_secret)
api = tweepy.API (auth)
handle = "markenetspanel1"
df = get_followers (handle)
df.to_csv (handle + "_followers.csv", index=False, encoding="utf-8")
The above Python code downloads all followers in most-recent-follower-first order, extracts some useful attributes such as handle (the @-name), display name, and various statistics, and writes the results to a CSV file. (Per Twitter’s documentation, the order in which followers are returned is subject to change, but it has consistently been most-recent-first since at least 2017.) There’s plenty of room for improvement in this code; if you’re planning to download all the followers of extremely large accounts, for example, you’ll want to add error handling and a retry mechanism for failed API calls.
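One way to add that retry mechanism is a small wrapper with exponential backoff. This is a sketch: `call_with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary. (tweepy can also wait out rate limits on its own if the API object is created with `tweepy.API (auth, wait_on_rate_limit=True)`.)

```python
import time

def call_with_retries(func, *args, attempts=5, base_delay=5.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying failed calls with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts and re-raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * (2 ** attempt))
```

Inside `get_followers`, the lookup call would then become `call_with_retries (api.lookup_users, user_id=ids[start:end])`. Narrowing `retry_on` to `(tweepy.TweepyException,)` avoids pointlessly retrying bugs in our own code.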
Once we have the CSV file containing all the followers in most-recent-follower-first order, we can generate a follow order by creation date plot.
# FOLLOWER SCATTER PLOT - note that this code depends on the CSV
# being in reverse follow order, which is how the Twitter API currently
# returns followers. It will not work correctly if the rows in the CSV
# are rearranged.
import pandas as pd
import bokeh.plotting as bk
def follower_scatter_plot (followers, handle, opacity_norm=5000,
        bubble_size=4, color=(0,90,180), cat_column=None,
        cat_colors=None, start=None, end=None,
        min_date=None, max_date=None, max_sample_size=200000):
    followers["createTime"] = pd.to_datetime (followers["createTime"])
    # the CSV is most-recent-first, so reverse the row index to get
    # follow order (0 = oldest follower)
    followers["order"] = followers.index
    followers["order"] = followers["order"].max () - followers["order"]
    # discard accounts with implausible creation dates
    df = followers[followers["createTime"] > pd.to_datetime ("2005")]
    # optional zoom: restrict to a range of follow order and/or creation date
    zoomed = ""
    if start is not None:
        df = df[df["order"] >= start]
        zoomed = " (zoomed)"
    if end is not None:
        df = df[df["order"] < end]
        zoomed = " (zoomed)"
    if min_date is not None:
        min_date = pd.to_datetime (min_date)
        df = df[df["createTime"] >= min_date]
        zoomed = " (zoomed)"
    if max_date is not None:
        max_date = pd.to_datetime (max_date)
        df = df[df["createTime"] < max_date]
        zoomed = " (zoomed)"
    title = "@" + handle + \
        " followers - follow order by creation date" + zoomed
    p = bk.figure (title=title, width=800, height=800,
        y_axis_type="datetime", x_axis_label="follow order",
        y_axis_label="creation date")
    if cat_colors is None or cat_column is None:
        # single-color plot; sample very large follower lists and scale
        # opacity with point count (Bokeh alpha must not exceed 1)
        if len (df.index) > max_sample_size:
            df = df.sample (max_sample_size)
        alpha = min (1.0, opacity_norm / len (df.index))
        p.scatter (df["order"], df["createTime"], size=bubble_size,
            color=color, alpha=alpha)
    else:
        for label in cat_colors:
            df0 = df[df[cat_column] == label]
            # draw one opaque point per category so its legend swatch is
            # fully visible, then hide that point under a white one
            df1 = df.sample (1)
            p.scatter (df1["order"], df1["createTime"], size=bubble_size,
                color=cat_colors[label], legend_label=label + \
                " (" + str (len (df0.index)) + " accounts)")
            p.scatter (df1["order"], df1["createTime"], size=bubble_size,
                color=(255,255,255))
        if len (df.index) > max_sample_size:
            df = df.sample (max_sample_size)
        alpha = min (1.0, opacity_norm / len (df.index))
        # copy the filtered frame before adding a column to it
        df = df.copy ()
        df["color"] = df[cat_column].apply (lambda x: cat_colors[x])
        p.scatter (df["order"], df["createTime"], size=bubble_size,
            color=df["color"], alpha=alpha)
        p.legend.location = "bottom_center"
    p.xaxis.axis_label_text_font_size = "15pt"
    p.yaxis.axis_label_text_font_size = "15pt"
    p.yaxis.major_label_text_font_size = "12pt"
    p.xaxis.major_label_text_font_size = "12pt"
    p.yaxis[0].formatter.hours = "%H:%M"
    p.yaxis[0].formatter.days = "%Y-%m-%d"
    p.title.text_font_size = "14pt"
    p.title.align = "center"
    p.xaxis[0].formatter.use_scientific = False
    return p
handle = "markenetspanel1"
df = pd.read_csv ( handle + "_followers.csv", encoding="utf-8")
colors = {
"has liked one or more tweets" : "#80a080",
"has never liked a tweet" : "#f06000",
}
df["cat"] = df["likes"].apply (lambda x: "has never liked a tweet" \
if x == 0 else "has liked one or more tweets")
p = follower_scatter_plot (df, handle, cat_column="cat", cat_colors=colors)
bk.show (p)
As we can see from the follow order by creation date plots, virtually all of @markenetspanel1’s followers are accounts with zero likes created in either October or November 2022 (mostly November). Given that the @markenetspanel1 account is advertising a follower sales website, this result isn’t exactly surprising, as the followers are most likely examples of the website’s merchandise. (The fake follower network associated with @markenetspanel1, which at one point consisted of over a million fake followers sold via multiple websites, is described in more detail in this Twitter thread.)
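The same two observations can also be put into numbers rather than read off the plot. A minimal sketch, assuming the CSV layout produced by the download script; `summarize_followers` is a hypothetical helper, not part of any library.

```python
import pandas as pd

def summarize_followers(followers: pd.DataFrame) -> dict:
    """Return the share of followers with zero likes and the share of
    followers created in each calendar month (most common months first)."""
    created = pd.to_datetime(followers["createTime"])
    zero_like_share = float((followers["likes"] == 0).mean())
    # bucket creation timestamps by month, then normalize counts to shares
    month_share = created.dt.to_period("M").value_counts(normalize=True)
    return {"zero_like_share": zero_like_share, "month_share": month_share}
```

Calling it on `pd.read_csv ("markenetspanel1_followers.csv", encoding="utf-8")` gives the zero-like share plus a month-by-month breakdown that can be compared against what the plot suggests.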
Obviously, many accounts that have large swarms of fake followers have some real followers as well. Let’s take a look at a few examples.
In early 2022, multiple candidates for the US House of Representatives picked up thousands of followers each from a large fake follower network. The candidates followed by the network included members of both major political parties: Republicans Blake Harbin of Georgia (@BlakeHarbinGA) and David Giglio of California (@DavidGiglioCA), and Democrat Raji Rab (@RajiRab2020). The fake followers show up as horizontal streaks (pink) on the follow order by creation date plots, since they were all created within a narrow date range and all followed the candidates at roughly the same time. By contrast, the organic followers (gray) show a wide range of creation dates going all the way back to 2008-2009.
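Streaks like these can also be located programmatically. The sketch below walks the followers in follow order in fixed-size blocks and flags blocks where most accounts were created within a few days of the block’s median creation date. It’s an illustrative heuristic, not the method used to produce these plots; `find_streaks`, the 500-account block size, the 7-day span, and the 80% threshold are all assumptions to tune per account, and a streak that straddles a block boundary may need overlapping blocks to show up clearly.

```python
import pandas as pd

def find_streaks(followers: pd.DataFrame, window: int = 500,
                 span_days: int = 7, threshold: float = 0.8) -> pd.DataFrame:
    """Report blocks of `window` consecutive followers (in follow order)
    where at least `threshold` of the accounts were created within
    `span_days` days of the block's median creation date."""
    created = pd.to_datetime(followers["createTime"])
    # the CSV is most-recent-first; reverse it so position 0 is the oldest follower
    created = created.iloc[::-1].reset_index(drop=True)
    rows = []
    for start in range(0, len(created), window):
        block = created.iloc[start:start + window]
        median = block.median()
        near = (block - median).abs() <= pd.Timedelta(days=span_days)
        share = float(near.mean())
        if share >= threshold:
            rows.append({"start_order": start,
                         "end_order": start + len(block),
                         "median_created": median,
                         "share_near_median": share})
    return pd.DataFrame(rows, columns=["start_order", "end_order",
                                       "median_created", "share_near_median"])
```

The `start_order` and `end_order` values use the same numbering as the plot’s x-axis, so they can be passed straight to the `start` and `end` arguments of `follower_scatter_plot` to zoom in on a flagged block.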
Right-wing Twitter influencer @AppSame once accused Democratic US Representative Alexandria Ocasio-Cortez of having a large quantity of fake Twitter followers. In a rather ironic twist, a significant share of @AppSame’s own Twitter audience appears to be inauthentic. The follow order by creation date plot for @AppSame’s followers contains several stretches where nearly all followers follow at least 50 times as many accounts as they have followers of their own (shown in red). These areas show up as rectangular blocks rather than streaks since the followers were created over larger time ranges (months or years rather than days), but the distribution of creation dates still differs from that seen in periods of organic follower growth.
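A red/gray split like the one in the @AppSame plot can be drawn by swapping a different category into `follower_scatter_plot`. A sketch under a few assumptions: `ratio_category`, `HIGH_RATIO`, `OTHER`, and `RATIO_COLORS` are hypothetical names, the colors are arbitrary, and accounts that follow nobody are deliberately kept out of the high-ratio group (a literal reading of the 50x rule would otherwise include every account with zero followers, even empty ones).

```python
import pandas as pd

# hypothetical legend labels and colors for follower_scatter_plot
HIGH_RATIO = "high following/follower ratio"
OTHER = "other"
RATIO_COLORS = {HIGH_RATIO: "#d02020", OTHER: "#a0a0a0"}

def ratio_category(followers: pd.DataFrame, ratio: float = 50.0) -> pd.Series:
    """Label followers whose following count is at least `ratio` times their
    own follower count; accounts following nobody are labeled OTHER."""
    following = followers["following"]
    high = (following >= ratio * followers["followers"]) & (following > 0)
    return high.map({True: HIGH_RATIO, False: OTHER})
```

With a downloaded follower CSV in `df`, setting `df["cat"] = ratio_category (df)` and calling `follower_scatter_plot (df, "AppSame", cat_column="cat", cat_colors=RATIO_COLORS)` produces the two-color version of the plot.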
Finally, no discussion of fake Twitter followers would be complete without a peek at Moscow automation aficionado @ARTEM_KLYUSHIN’s account. His follow order by creation date plot contains various anomalies indicative of inauthentic follower growth: both streaks and rectangular regions consisting of thousands of followers with zero likes. Klyushin’s account has been followed and/or amplified by a variety of inauthentic networks over the years.
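Both shapes, streaks and rectangles, are stretches of follow order dominated by zero-like accounts, which a rolling average can pick out regardless of how spread out the creation dates are. A minimal sketch; `zero_like_runs`, the 1,000-account window, and the 90% threshold are illustrative assumptions. Because the window smooths the signal, the reported ranges can extend somewhat past the true edges of a run.

```python
import pandas as pd

def zero_like_runs(followers: pd.DataFrame, window: int = 1000,
                   threshold: float = 0.9) -> list:
    """Return (start, end) follow-order ranges, end exclusive, where the
    rolling share of zero-like followers over `window` consecutive
    followers is at least `threshold`."""
    # reverse the most-recent-first CSV so position 0 is the oldest follower
    zero = (followers["likes"].iloc[::-1] == 0).astype(float).reset_index(drop=True)
    # the first window - 1 positions are NaN, which never compare >= threshold
    flagged = zero.rolling(window).mean() >= threshold
    runs, start = [], None
    for pos, is_flagged in enumerate(flagged):
        if is_flagged and start is None:
            start = pos - window + 1  # first follower in the flagged window
        elif not is_flagged and start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(flagged)))
    return runs
```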
Some of the content of this article originally appeared in briefer form in this Twitter thread on follow order by creation date plots: