Multiple users of the new social media platform Bluesky have recently reported being followed by strange accounts with bizarre names and no real content. These accounts have display names consisting of random word pairs such as “Compassionate Yogurt” and “Unaccountable Reindeer”, handles consisting of two other random words (different from those in the display name), and biographies consisting of at least three random words separated by a variety of characters. Thankfully, unlike certain large social media platforms, Bluesky (or, more accurately, the underlying AT Protocol) provides an API for downloading public data programmatically, which is useful when researching large spam networks.
```python
# updated 2023-09-13 to use atproto 0.0.27
from atproto import Client
import pandas as pd
import random
import time

# characters that may separate the random words in a spam account's bio
SEPARATORS = [",", "#", "|", "*", "\n"]
# the only characters allowed in the display names and bios of this network
VALID_CHARS = "abcdefghijklmnopqrstuvwxyz-,#|*\n "

# call an API method, retrying with exponential backoff on failure;
# returns None if every attempt fails
def retry (method, params):
    retries = 5
    delay = 1
    while retries > 0:
        try:
            return method (params)
        except Exception as e:
            print (" error (" + str (e) + "), sleeping " + str (delay) + "s")
            time.sleep (delay)
            delay = delay * 2
            retries = retries - 1
    return None

# page through the accounts followed by handle, up to limit accounts
def get_following (handle, client, batch=100, limit=1000):
    print ("retrieving accounts followed by " + handle + "...")
    r = retry (client.app.bsky.graph.get_follows,
        {"actor" : handle, "limit" : batch})
    if r is None:
        return []
    cursor = r.cursor
    rows = []
    rows.extend (r.follows)
    while cursor is not None and len (rows) < limit:
        time.sleep (1)
        r = retry (client.app.bsky.graph.get_follows,
            {"actor" : handle, "limit" : batch, "cursor" : cursor})
        if r is None:
            break
        cursor = r.cursor
        rows.extend (r.follows)
    return rows

# page through the accounts that follow handle, up to limit accounts
def get_followers (handle, client, batch=100, limit=1000):
    print ("retrieving accounts that follow " + handle + "...")
    r = retry (client.app.bsky.graph.get_followers,
        {"actor" : handle, "limit" : batch})
    if r is None:
        return []
    cursor = r.cursor
    rows = []
    rows.extend (r.followers)
    while cursor is not None and len (rows) < limit:
        r = retry (client.app.bsky.graph.get_followers,
            {"actor" : handle, "limit" : batch, "cursor" : cursor})
        if r is None:
            break
        cursor = r.cursor
        rows.extend (r.followers)
    return rows

# fetch full profiles (including post and follower counts) in batches
# of 25, the maximum number of actors getProfiles accepts per call
def get_profiles (handles, client):
    profiles = []
    for i in range (0, len (handles), 25):
        r = retry (client.app.bsky.actor.get_profiles,
            {"actors" : handles[i:i + 25]})
        if r is not None:
            profiles.extend (r.profiles)
    return profiles

# return s if it contains only characters from VALID_CHARS, else None
def validate_text (s):
    if s is None:
        return None
    for c in s.lower ():
        if c not in VALID_CHARS:
            return None
    return s

# this function will change depending on the identifying
# characteristics of the spam accounts being studied
def test_account (account, client):
    if not account["handle"].endswith (".bsky.social"):
        return False
    # counts can be missing from a profile, so treat None as 0
    if (account["posts_count"] or 0) > 40:
        return False
    if (account["followers_count"] or 0) > 40:
        return False
    # display name: two title-case words of 3+ letters each,
    # neither of which appears in the handle
    display = validate_text (account["display_name"])
    if display is None:
        return False
    display = display.replace ("-", "")
    if display != display.title ():
        return False
    display = display.split ()
    if len (display) != 2:
        return False
    for word in display:
        if word.lower () in account["handle"].lower () \
                or len (word) < 3:
            return False
    # bio: 3+ words split by exactly one of the separator characters
    bio = validate_text (account["description"])
    if bio is None:
        return False
    valid_bio = False
    for s in SEPARATORS:
        count = bio.count (s)
        if count >= 2:
            count2 = len (bio.replace (" ", "").split (s))
            count3 = len (bio.replace (s, " ").split ())
            if count2 >= 3 and abs (count - count2) < 2 \
                    and count3 <= count2:
                if valid_bio == True:
                    # more than one separator matches, so reject
                    valid_bio = False
                    break
                else:
                    valid_bio = True
    return valid_bio

# starting from seed handles, check the followers of every account the
# seeds follow, then repeat with newly found suspects as the next seeds
def explore (handles, client, iterations=4, max_batch=14):
    checked_following = set ()
    checked_followers = set ()
    checked = set ()
    suspects = []
    check_next = []
    for i in range (iterations):
        for handle in handles:
            if handle not in checked_following:
                checked_following.add (handle)
                accounts = get_following (handle, client)
                time.sleep (0.2)
                for account2 in accounts:
                    handle2 = account2["handle"]
                    if handle2 not in checked_followers:
                        checked_followers.add (handle2)
                        followers = get_followers (handle2, client)
                        names = [f["handle"] for f in followers]
                        for account3 in get_profiles (names, client):
                            handle3 = account3["handle"]
                            if handle3 not in checked:
                                checked.add (handle3)
                                if test_account (account3, client):
                                    check_next.append (handle3)
                                    suspects.append (account3)
                                    print ("potential spam accounts so far: "
                                        + str (len (suspects)))
                        time.sleep (0.2)
        # explore at most max_batch randomly chosen new suspects next round
        if len (check_next) < max_batch:
            handles = check_next
            check_next = []
        else:
            random.shuffle (check_next)
            handles = check_next[-max_batch:]
            check_next = check_next[:-max_batch]
        print ("suspected spam accounts after iteration " +
            str (i + 1) + ": " + str (len (suspects)))
    return suspects

client = Client ()
client.login ("*****", "*****")
# seed with an initial group of suspected
# spam accounts belonging to the same network
handles = [
    "middlearchaic.bsky.social",
    "rudeindoor.bsky.social",
    "heavyretirement.bsky.social",
]
suspects = explore (handles, client)
pd.DataFrame (suspects).to_csv ("bsky_spam_suspects.csv", index=False)
```
The code above uses the atproto Python module, an implementation of the AT Protocol, to retrieve public data about Bluesky accounts and explore the Bluesky social graph. Since this particular spam network engages in bulk following, it can be mapped by starting from a set of known spam accounts as seeds, downloading the lists of accounts they follow, and then downloading the followers of each of those accounts and checking them for the characteristics shared by accounts in the network (two-word display names in title case, handles that do not contain the words in the display names, etc.). This process generally needs to be repeated several times to find most or all of the accounts in a given network, and the detection criteria (the test_account function in the Python code above) will differ from network to network.
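To make the snowball expansion easier to follow without calling the API, here is a minimal sketch of the same idea over an in-memory follow graph. The function name `snowball`, the dictionaries `following` and `followers`, and the predicate `is_suspect` are hypothetical stand-ins for the get_following/get_followers calls and test_account; unlike the script, it skips the random `max_batch` sampling and simply explores every new suspect.

```python
from typing import Callable, Dict, List, Set


def snowball(
    seeds: List[str],
    following: Dict[str, List[str]],
    followers: Dict[str, List[str]],
    is_suspect: Callable[[str], bool],
    iterations: int = 4,
) -> Set[str]:
    """Expand seed spam accounts over a hypothetical in-memory follow graph.

    following[a] lists the accounts a follows; followers[b] lists the
    accounts that follow b.
    """
    suspects: Set[str] = set()
    checked: Set[str] = set()   # candidates already tested with is_suspect
    explored: Set[str] = set()  # suspects whose follow lists were expanded
    frontier = list(seeds)
    for _ in range(iterations):
        next_frontier: List[str] = []
        for spammer in frontier:
            if spammer in explored:
                continue
            explored.add(spammer)
            # accounts the suspected spammer follows (usually legitimate)
            for target in following.get(spammer, []):
                # other followers of the same target may belong to the network
                for candidate in followers.get(target, []):
                    if candidate in checked:
                        continue
                    checked.add(candidate)
                    if is_suspect(candidate):
                        suspects.add(candidate)
                        next_frontier.append(candidate)
        if not next_frontier:
            break  # no new suspects, so further iterations would find nothing
        frontier = next_frontier
    return suspects


if __name__ == "__main__":
    following = {
        "seed.bsky.social": ["journalist.bsky.social"],
        "spam2.bsky.social": ["author.bsky.social"],
    }
    followers = {
        "journalist.bsky.social": ["seed.bsky.social", "spam2.bsky.social", "realfan.bsky.social"],
        "author.bsky.social": ["spam2.bsky.social", "spam3.bsky.social"],
    }
    found = snowball(["seed.bsky.social"], following, followers,
                     lambda h: h.startswith(("seed", "spam")))
    print(sorted(found))
    # ['seed.bsky.social', 'spam2.bsky.social', 'spam3.bsky.social']
```

Note how spam3 is only reachable in the second round, via an account discovered in the first; this is why several iterations are usually needed.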
This method yielded 406 accounts, 5 of which manual review identified as false positives, leaving a total of 401 accounts that belong to the network. All 401 were created in August 2023 (or, more accurately, added to the Bluesky App View in August 2023). The spam accounts do not follow each other; instead, each one follows a small number of legitimate accounts, generally ones with at least a few hundred followers and a decent amount of content.
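The claim that the accounts never follow each other can be checked once their follow lists are collected. The following is a minimal sketch, assuming the follows have been flattened into hypothetical `(follower, followee)` handle pairs (for instance by calling get_following for every confirmed account); `follow_summary` is an illustrative name, not part of the script above.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def follow_summary(edges: Iterable[Tuple[str, str]], suspects: Set[str]) -> Dict[str, object]:
    """Summarize who the suspected spam accounts follow.

    edges holds hypothetical (follower, followee) handle pairs.
    """
    internal = 0             # follows from one suspect to another suspect
    per_account = Counter()  # number of follows made by each suspect
    targets = set()          # distinct accounts outside the network being followed
    for follower, followee in edges:
        if follower not in suspects:
            continue  # only count follows made by suspected accounts
        per_account[follower] += 1
        if followee in suspects:
            internal += 1
        else:
            targets.add(followee)
    return {
        "internal_follows": internal,
        "follows_per_account": per_account,
        "external_targets": targets,
    }
```

For a network like the one described here, `internal_follows` would be expected to come out as zero, with each suspect showing a small count in `follows_per_account`.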
The 401 accounts in this spam network had posted 7043 times as of August 26th, 2023. Most of these posts are reposts (5012 of 7043, or 71.2%); the remainder contain links to news articles, some accompanied by embedded images and others as plain text posts. Thus far, the accounts in the network have posted no original content: the text of each news post is simply the first sentence or two of the linked article. The network’s posting behavior has a definite rhythm, alternating between reposts, news posts, and silence in a regular pattern, albeit with occasional changes in posting frequency and longer gaps here and there.
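Figures like the repost share, and the hour-by-hour activity that reveals the posting rhythm, can be computed from the collected posts. This is a minimal sketch, assuming the posts have already been reduced to hypothetical `(timestamp, is_repost)` pairs with ISO 8601 timestamps; `posting_profile` is an illustrative name, not part of the script above.

```python
from collections import Counter
from datetime import datetime
from typing import Counter as CounterType, Iterable, Tuple


def posting_profile(posts: Iterable[Tuple[str, bool]]) -> Tuple[float, CounterType[str]]:
    """Return (repost share, posts per hour) for (timestamp, is_repost) records."""
    total = 0
    reposts = 0
    per_hour: CounterType[str] = Counter()
    for timestamp, is_repost in posts:
        total += 1
        if is_repost:
            reposts += 1
        # Python versions before 3.11 reject a trailing "Z" and only accept
        # 3- or 6-digit fractional seconds, so normalize the UTC suffix first
        when = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        per_hour[when.strftime("%Y-%m-%d %H:00")] += 1
    share = reposts / total if total else 0.0
    return share, per_hour


# the article's figures: 5012 reposts out of 7043 posts
print(f"{5012 / 7043:.1%}")  # 71.2%
```

Plotting `per_hour` in chronological order is one simple way to see the alternation between reposts, news posts, and silence.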
This spam network links to a variety of media sites of widely varying credibility, with the Daily Mail at the top of the list. The linked sites are a mix of local and national news outlets based in a variety of countries, including the UK, the USA, Canada, the Philippines, Australia, India, and South Korea. The news content shared by the network has no apparent theme and appears to simply be a random selection of articles on a wide variety of topics from various sites.
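A ranking of linked sites like the one described above can be produced by counting hostnames. This sketch assumes the link URLs have already been extracted from the posts (how that is done is not shown here), and `top_domains` is an illustrative name. It only strips a leading "www.", so subdomains such as "edition.cnn.com" are counted separately from "cnn.com".

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def top_domains(urls: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count linked sites by hostname, ignoring a leading "www."."""
    counts: Counter = Counter()
    for url in urls:
        host = urlparse(url).hostname  # lowercased, port removed; None if absent
        if not host:
            continue  # skip strings that have no network location
        if host.startswith("www."):
            host = host[len("www."):]
        counts[host] += 1
    return counts.most_common(n)
```

For example, `top_domains(["https://www.dailymail.co.uk/news/a.html", "https://www.cbc.ca/news/b"])` returns one entry per site, most frequent first.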
This spam network reposts content from hundreds of different Bluesky accounts, including journalists, authors, various aggregator accounts, and the official Bluesky account itself (bsky.app); the only discernible theme is that the amplified accounts are relatively popular. Presently it’s hard to tell what the ultimate purpose of this network is: perhaps it will eventually spam the platform in support of some political agenda, perhaps it’s the beginning of some kind of astroturf-for-hire operation, or perhaps it’s something else entirely. It will be interesting to observe how this particular spam network evolves over time (assuming that Bluesky doesn’t ban the accounts first).