Spamming where the skies are blue

New platform, same old shenanigans

Aug 27, 2023

spam cans floating in front of a blue sky with white clouds — this image is not to be taken literally

Multiple users of new social media platform Bluesky have recently reported being followed by strange accounts with bizarre names and no real content. These accounts have display names consisting of random word pairs such as “Compassionate Yogurt” and “Unaccountable Reindeer”, handles consisting of two entirely different random words, and biographies consisting of at least three random words (separated by a variety of characters). Thankfully, unlike certain large social media platforms, Bluesky (or, more accurately, the underlying AT Protocol) provides an API that can be used to download public data programmatically, which is useful when researching large spam networks.

collage of 16 bluesky accounts with random adjective + noun names — mashing adjectives and nouns together at random doesn’t exactly inspire confidence in the legitimacy of a group of social media accounts

# updated 2023-09-13 to use atproto 0.0.27

from atproto import Client
import pandas as pd
import random
import time

SEPARATORS = [",", "#", "|", "*", "\n"]
VALID_CHARS = "abcdefghijklmnopqrstuvwxyz-,#|*\n "

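def retry (method, params):
    # call an API method, retrying with exponential backoff on failure;
    # returns None if every attempt fails
    retries = 5
    delay = 1
    while retries > 0:
        try:
            r = method (params)
            return r
        except Exception as e:
            # catch Exception rather than everything, so that Ctrl-C
            # still interrupts the script
            print ("    error (" + str (e) + "), sleeping " + str (delay) + "s")
            time.sleep (delay)
            delay = delay * 2
            retries = retries - 1
    return None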
           
def get_following (handle, client, batch=100, limit=1000):
    print ("retrieving accounts followed by " + handle + "...")
    rows = []
    cursor = None
    while len (rows) < limit:
        params = {"actor" : handle, "limit" : batch}
        if cursor is not None:
            params["cursor"] = cursor
        r = retry (client.app.bsky.graph.get_follows, params)
        if r is None:
            # every retry failed; return what has been collected so far
            break
        rows.extend (r.follows)
        cursor = r["cursor"]
        if cursor is None:
            break
        time.sleep (1)
    return rows

def get_followers (handle, client, batch=100, limit=1000):
    print ("retrieving accounts that follow " + handle + "...")
    rows = []
    cursor = None
    while len (rows) < limit:
        params = {"actor" : handle, "limit" : batch}
        if cursor is not None:
            params["cursor"] = cursor
        r = retry (client.app.bsky.graph.get_followers, params)
        if r is None:
            # every retry failed; return what has been collected so far
            break
        rows.extend (r.followers)
        cursor = r["cursor"]
        if cursor is None:
            break
    return rows

def get_profiles (handles, client):
    profiles = []
    while len (handles) > 0:
        if len (handles) > 25:
            batch = handles[:25]
            handles = handles[25:]
        else:
            batch = handles
            handles = []
        r = retry (client.app.bsky.actor.get_profiles,
                {"actors" : batch})
        # skip the batch if every retry failed
        if r is not None:
            profiles.extend (r.profiles)
    return profiles
        
def validate_text (s):
    if s is None:
        return None
    for c in s.lower ():
        if c not in VALID_CHARS:
            return None
    return s

# this function will change depending on the identifying
# characteristics of the spam accounts being studied
def test_account (account, client):
    if not account["handle"].endswith (".bsky.social"):
        return False
    if account["posts_count"] > 40:
        return False
    if account["followers_count"] > 40:
        return False
    display = validate_text (account["display_name"])
    if display is None:
        return False
    display = display.replace ("-", "")
    if display != display.title ():
        return False
    display = display.split ()
    if len (display) != 2:
        return False
    for word in display:
        if word.lower () in account["handle"].lower () \
                or len (word) < 3:
            return False
    bio = validate_text (account["description"])
    if bio is None:
        return False
    valid_bio = False
    for s in SEPARATORS:
        count = bio.count (s)
        if count >= 2:
            count2 = len (bio.replace (" ", "").split (s))
            count3 = len (bio.replace (s, " ").split ())
            if count2 >= 3 and abs (count - count2) < 2 \
                    and count3 <= count2:
                if valid_bio:
                    # a second separator also matches, so the bio is
                    # ambiguous; reject it
                    valid_bio = False
                    break
                else:
                    valid_bio = True
    return valid_bio

def explore (handles, client, iterations=4, max_batch=14):
    checked_following = set ()
    checked_followers = set ()
    checked = set ()
    suspects = []
    check_next = []
    for i in range (iterations):
        for handle in handles:
            if handle not in checked_following:
                checked_following.add (handle)
                accounts = get_following (handle, client)
                time.sleep (0.2)
                for account2 in accounts:
                    handle2 = account2["handle"]
                    if handle2 not in checked_followers:
                        checked_followers.add (handle2)
                        followers = get_followers (handle2, client)
                        names = [f["handle"] for f in followers]
                        for account3 in get_profiles (names, client):
                            handle3 = account3["handle"]
                            if handle3 not in checked:
                                checked.add (handle3)
                                if test_account (account3, client):
                                    check_next.append (handle3)
                                    suspects.append (account3)
                        print ("potential spam accounts so far: " \
                                + str (len (suspects)))
                        time.sleep (0.2)
        if len (check_next) < max_batch:
            handles = check_next
            check_next = []
        else:
            random.shuffle (check_next)
            handles = check_next[-max_batch:]
            check_next = check_next[:-max_batch]
        print ("suspected spam accounts after iteration " + \
               str (i + 1) + ": " + str (len (suspects)))
    return suspects

client = Client ()
client.login ("*****", "*****")

# seed with an initial group of suspected 
# spam accounts belonging to the same network
handles = [
    "middlearchaic.bsky.social",
    "rudeindoor.bsky.social",
    "heavyretirement.bsky.social",
]
suspects = explore (handles, client)
pd.DataFrame (suspects).to_csv ("bsky_spam_suspects.csv", index=False)

The code above uses the atproto Python module, an implementation of the AT Protocol, to retrieve public data about Bluesky accounts and explore the Bluesky social graph. Since this particular spam network engages in bulk following, it can be mapped by starting with a set of known spam accounts as seeds, downloading the lists of accounts they follow, and then downloading the followers of those accounts and checking them against the characteristics of the accounts in the network (two-word display names in title case, handles that do not contain the words in the display names, etc.). This process generally needs to be repeated multiple times in order to find most or all of the accounts in a given network. The detection criteria, implemented in the test_account function in the above Python code, will differ from network to network.
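To make the crawl order easier to follow, here is a minimal sketch of the same seed, following, followers, test loop run against a tiny in-memory graph instead of the live API. The graph, the account names, and the looks_like_spam stand-in for test_account are all made up for illustration, and unlike the listing above this version counts the seeds themselves as suspects from the start.

```python
# Toy follow graph: account -> set of accounts it follows (all names hypothetical).
FOLLOWS = {
    "spam1": {"popular_a", "popular_b"},
    "spam2": {"popular_b", "popular_c"},
    "spam3": {"popular_c"},
    "human1": {"popular_a"},
    "human2": {"popular_c", "popular_a"},
}

def followers_of(target):
    """Invert the follow graph: which accounts follow `target`?"""
    return {src for src, followed in FOLLOWS.items() if target in followed}

def looks_like_spam(handle):
    """Stand-in for test_account; here the label is simply baked into the name."""
    return handle.startswith("spam")

def snowball(seeds, iterations=4):
    suspects = set(seeds)      # seeds are treated as known spam
    frontier = set(seeds)      # accounts whose follow lists are fetched next
    seen_targets = set()       # followed accounts whose followers were already checked
    for _ in range(iterations):
        next_frontier = set()
        for handle in frontier:
            for target in FOLLOWS.get(handle, set()):
                if target in seen_targets:
                    continue
                seen_targets.add(target)
                for follower in followers_of(target):
                    if follower not in suspects and looks_like_spam(follower):
                        suspects.add(follower)
                        next_frontier.add(follower)
        if not next_frontier:
            break              # nothing new was found, so further passes are pointless
        frontier = next_frontier
    return suspects

found = snowball(["spam1"])
print(sorted(found))
```

Starting from a single seed, the second pass is what reaches spam3, which shares no followed account with the seed. That is why a single pass is usually not enough.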

histogram of account creation dates for the 401 accounts in the network, all created in August 2023 — welcome to Astroturf Account August

This method yielded 406 accounts, 5 of which were determined to be false positives by manual review, leaving a total of 401 accounts that belong to the network. All 401 were created in August 2023 (or, more accurately, added to the Bluesky App View in August 2023). The spam accounts do not follow each other; rather, each one follows a small number of legitimate accounts, generally accounts with at least a few hundred followers and a decent amount of content.
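A creation-date histogram like the one described here can be tallied with the standard library alone. The sketch below assumes each profile carries an ISO 8601 timestamp (for example an indexed-at style field, which is an assumption about the data rather than something shown in the listing above); the timestamps themselves are invented sample values.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps in the ISO 8601 form used by AT Protocol records.
timestamps = [
    "2023-08-03T14:02:11.000Z",
    "2023-08-03T14:05:47.000Z",
    "2023-08-04T09:30:00.000Z",
    "2023-08-17T22:15:09.000Z",
]

def creation_day(ts):
    # older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).date()

# count accounts per calendar day and print a crude text histogram
per_day = Counter(creation_day(ts) for ts in timestamps)
for day in sorted(per_day):
    print(day.isoformat(), "#" * per_day[day])
```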

hourly post volume by post type bar chart — there’s a definite rhythm to the spam, although it changes occasionally

The 401 accounts in this spam network have posted 7043 times as of August 26th, 2023. These posts are mostly reposts (5012 of 7043, 71.2%); the remaining 2031 contain links to news articles, some with embedded images and some as plain text. Thus far, the accounts in the network have posted no original content: the text of the news posts is simply the first sentence or two of the linked article. The network’s posting behavior has a definite rhythm, alternating between reposts, news posts, and silence in a regular pattern, albeit with occasional changes in posting frequency and longer gaps here and there.
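The percentages above, and the hourly breakdown by post type, come down to simple counting. The following sketch reproduces the repost share from the two totals given in the text and then buckets a few invented post records by hour and type; the record layout (a "type" and a "created_at" key) is a hypothetical simplification, not the actual shape of the data returned by the API.

```python
from collections import Counter
from datetime import datetime

# Totals taken from the text above.
reposts, total = 5012, 7043
repost_share = round(100 * reposts / total, 1)   # percentage of posts that are reposts
news_posts = total - reposts                     # everything that is not a repost
print(repost_share, news_posts)

# Hypothetical, simplified post records.
posts = [
    {"type": "repost", "created_at": "2023-08-25T13:05:00+00:00"},
    {"type": "repost", "created_at": "2023-08-25T13:40:00+00:00"},
    {"type": "news",   "created_at": "2023-08-25T13:55:00+00:00"},
    {"type": "news",   "created_at": "2023-08-25T14:10:00+00:00"},
]

# Count posts per (hour, type) pair, the raw material for an hourly bar chart.
hourly = Counter()
for post in posts:
    hour = datetime.fromisoformat(post["created_at"]).strftime("%Y-%m-%d %H:00")
    hourly[(hour, post["type"])] += 1

for (hour, kind), count in sorted(hourly.items()):
    print(hour, kind, count)
```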

collage of posts from the network containing links to media sites — this botnet will bring you the latest news about dinosaur races

table of websites most frequently linked by the network — some things just make sense, and the Daily Mail being at the top of a list of websites linked by a group of spam accounts is one of those things

This spam network links to a variety of media sites with widely varying degrees of credibility, with the Daily Mail at the top of the list. The linked sites are a mix of local and national news sites based in a variety of countries, including the UK, the USA, Canada, the Philippines, Australia, India, and South Korea. The news content shared by the network has no apparent theme and appears to simply be a random selection of articles on a wide variety of topics from various sites.
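A table of most frequently linked websites can be built by reducing each link to its hostname and counting. This is a rough sketch using placeholder URLs; it treats "www.example.com" and "example.com" as the same site, which is a simplifying assumption and not necessarily how the table in this post was produced.

```python
from collections import Counter
from urllib.parse import urlparse

# Placeholder links standing in for the URLs found in the network's posts.
links = [
    "https://www.example.com/news/article-1",
    "https://example.com/sport/article-2",
    "https://news.example.org/story/3",
]

def domain(url):
    # hostname is lowercased by urlparse; strip a leading "www." if present
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

# most_common() sorts sites from most to least frequently linked
top_sites = Counter(domain(u) for u in links).most_common()
for site, count in top_sites:
    print(site, count)
```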

table of accounts most frequently reposted by the network — if you’re on this list, it probably means your Bluesky account is popular enough to get noticed by spambot networks

This spam network reposts content from hundreds of different Bluesky accounts, including journalists, authors, various aggregator accounts, and the official Bluesky account itself (bsky.app), with the only discernible theme being that the accounts amplified are relatively popular. Presently it’s hard to tell what the ultimate purpose of this network is: perhaps it will eventually spam the platform in support of some political agenda, perhaps it’s the beginning of some kind of astroturf-for-hire operation, or perhaps it’s something else entirely. It will be interesting to observe how this particular spam network evolves over time (assuming that Bluesky doesn’t ban the accounts first).
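For readers who want to build a similar most-reposted table: a repost points at the original post through an AT URI of the form at://&lt;account&gt;/&lt;collection&gt;/&lt;record key&gt;, so the amplified account can be read off the first segment. The sketch below uses invented DIDs and record keys, and it stops at the account identifier; turning a DID into a handle would take an additional profile lookup that is not shown.

```python
from collections import Counter

def repo_from_at_uri(uri):
    """Return the account segment of an AT URI (at://<account>/<collection>/<rkey>)."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError("not an AT URI: " + uri)
    return uri[len(prefix):].split("/", 1)[0]

# Hypothetical repost targets; the DIDs and record keys are made up.
repost_subjects = [
    "at://did:plc:aaaa/app.bsky.feed.post/3k1",
    "at://did:plc:bbbb/app.bsky.feed.post/3k2",
    "at://did:plc:aaaa/app.bsky.feed.post/3k3",
]

# Count how often each account was reposted, most reposted first.
most_reposted = Counter(repo_from_at_uri(u) for u in repost_subjects).most_common()
for account, count in most_reposted:
    print(account, count)
```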

Derek Plaslaiko

Aug 28, 2023. Liked by Conspirador Norteño

You should, as I am attempting to right now, try listening to this article, rather than reading it. The python code was particularly stimulating. Lol


Dem’s Atomic Discoball

Aug 29, 2023. Liked by Conspirador Norteño

Thanks for posting, following this type of stuff. I’m sure there are vast amounts of fuckery on tap 2024, not to mention the war, and then there is still the rest of the world being varying degrees of chaotic.

Conspirador Norteño
