This "product review" website does not exist, at least not in any meaningful sense
A mix of synthetically generated faces and reviews lifted from Amazon results in an entirely unnecessary website
The Internet is full of sites offering customer reviews of a variety of products, but not all of these review sites are legitimate. One interesting example is Overeview.io (not to be confused with Overeview.com), which claims to offer “legitimate product reviews” for over 2 million products ranging from laptop computers to kitchen supplies. According to a blurb on the site, Overeview.io uses unspecified “advanced artificial intelligence tools” to rate products. In reality, all of Overeview.io’s reviews appear to be plagiarized from Amazon, and the only demonstrable use of artificial intelligence is the StyleGAN-generated faces that are used to represent the reviewers.
The product reviews on Overeview.io contain links to Amazon.com listings where the products in question can be purchased. A quick scroll through the customer ratings on Amazon reveals that the “reviews” on Overeview.io are simply reworded versions of reviews written by Amazon customers. For example, here’s the text of an Amazon user’s review of a paint product:
“This is perfect for my preschool class. The rolls are very thick so my little ones can handle them without squeezing them. I love the bright colors that doesn’t chip off. No more collecting paper towels rolls.”
And here’s the Overeview.io version:
“It is perfect for the preschool class I teach. My little ones can handle the rolls without having to squeeze them since they are very thick. This is one of my favorite types of paint because the bright colors don’t fade away. Getting rid of paper towels rolls has never been easier.”
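For those curious how close the two passages actually are, a rough gauge can be obtained with Python’s built-in difflib module. The snippet below is purely illustrative (it is not part of the scraping or analysis code used for this post); it simply compares the two review texts quoted above:

import difflib

# the two review texts quoted above: the original Amazon review and the
# reworded Overeview.io version
amazon_review = ("This is perfect for my preschool class. The rolls are very thick "
                 "so my little ones can handle them without squeezing them. I love "
                 "the bright colors that doesn't chip off. No more collecting paper "
                 "towels rolls.")
overeview_review = ("It is perfect for the preschool class I teach. My little ones "
                    "can handle the rolls without having to squeeze them since they "
                    "are very thick. This is one of my favorite types of paint "
                    "because the bright colors don't fade away. Getting rid of paper "
                    "towels rolls has never been easier.")

# similarity ratio between 0.0 (nothing in common) and 1.0 (identical);
# reworded pairs like this one generally score higher than unrelated review texts
ratio = difflib.SequenceMatcher (None, amazon_review.lower (), overeview_review.lower ()).ratio ()
print (round (ratio, 2))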
Given the sheer volume of reviews, the rephrasing was presumably automated, although it is unclear whether AI language models were involved or whether more conventional article spinning techniques were used. The scores (1-5 stars) are copied from the Amazon reviews, and the aggregate score for each product appears to be simply the average of the individual scores, rather than anything calculated by “advanced artificial intelligence tools”.
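As an aside, verifying that an aggregate rating is nothing more than a plain average requires no machine learning at all. The toy example below uses made-up star ratings rather than actual data from the site:

# hypothetical star ratings for a single product (not real data from the site)
individual_ratings = [5, 4, 5, 3, 5]

# if the site's aggregate score is just a plain average, it should match this value
aggregate = sum (individual_ratings) / len (individual_ratings)
print (round (aggregate, 1))   # prints 4.4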
Overeview.io’s plagiarized reviews do not include the name or profile photo of the original Amazon user who wrote the review; instead, the reviews are attributed to fictional people with random first and last names and StyleGAN-generated faces. (A few “reviewers” have a gray silhouette as an avatar instead of a synthetically generated face, similar to the default profile images used by many social media sites. None of the “reviewers” use any other type of image.) The site uses at least 8091 unique GAN-generated faces to represent “reviewers”, and like most spammy projects that use large numbers of artificially generated faces, some of these images contain obvious glitches. These glitches include vestigial heads next to the primary face (also known as “side demons”), surreal hats with nonsensical logos or indecipherable text, hands with extra fingers, and faces that melt into the background or other objects.
One of the most obvious anomalies present in unmodified StyleGAN-generated faces is the extremely consistent positioning of the primary facial features: regardless of which direction the face appears to be looking, the eyes, nose, and mouth appear in the same positions in every image. This trait becomes evident when many such faces are blended together, as demonstrated in the animation below.
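For what it’s worth, a blend like this can be produced by simply averaging the pixel values of the downloaded face images. Below is a minimal sketch (not necessarily the code used to generate the animation) using Pillow and NumPy; it assumes the profile photos have already been downloaded to the overeview_images/ directory used by the scraping code later in this post, and that all of the images can be resized to a common resolution.

import glob
import numpy as np
from PIL import Image

# blend the downloaded face images by averaging their pixel values; because
# unmodified StyleGAN faces place the eyes, nose, and mouth in nearly identical
# positions, the average still resembles a face rather than a featureless blur
size = (512, 512)
files = glob.glob ("overeview_images/*")
stack = np.zeros ((size[1], size[0], 3), dtype=np.float64)
for fname in files:
    img = Image.open (fname).convert ("RGB").resize (size)
    stack += np.array (img, dtype=np.float64)
blend = (stack / len (files)).astype (np.uint8)
Image.fromarray (blend).save ("blended_faces.png")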
How does one gather the data to analyze a site like this? Below is the Python code used to crawl the Overeview.io website and extract the text of the reviews along with the names of the alleged “reviewers” and the GAN-generated faces used as profile images. One thing you’ll notice about this code snippet is that it is objectively bad in certain ways, the most obvious being that two entirely unrelated web scraping techniques are used to scrape the site (Selenium, which works by launching and automating a web browser, and direct HTTP requests via Python’s requests module). The reason for this is simple: the goal was to get something working to scrape the site with as little effort on my part as possible, so I cobbled together existing code snippets from past projects and made no effort to make the resulting code clean or elegant. This runs counter to established best practices for software development, but social media/spam research isn’t about writing production software, and often the best approach to writing code for research purposes is to spend comparatively little time on coding in order to focus on the resulting data and the overall project at hand.
import bs4
import pandas as pd
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium import webdriver
import requests
import time

# launch a Firefox browser via Selenium
options = Options ()
firefox_capabilities = webdriver.DesiredCapabilities.FIREFOX
driver = webdriver.Firefox (capabilities=firefox_capabilities,
                            options=options)

# load the Overeview.io homepage and grab the product category links
# from the page footer
root_url = "https://overeview.io"
driver.get (root_url)
elements = driver.find_elements (By.CLASS_NAME,
                                 "footer-sub_category-item")

# visit each category page and collect the individual product page URLs,
# clicking the "Load more products" button until no more products appear
urls = set ()
for url in [e.find_elements (
        By.TAG_NAME, "a")[0].get_attribute ("href") for e in elements]:
    going = True
    driver.get (url)
    while going:
        elements = driver.find_elements (By.CLASS_NAME, "col-md-4")
        for e in elements:
            try:
                u = e.find_elements (By.TAG_NAME,
                                     "a")[0].get_attribute ("href")
                if u.startswith (url):
                    urls.add (u)
            except:
                pass
        buttons = driver.find_elements (By.TAG_NAME, "button")
        going = False
        print (str (len (urls)) + " " + url)
        for button in buttons:
            if button.text == "Load more products":
                button.click ()
                time.sleep (3)
                going = True
        time.sleep (1)

# retrieve each product page via the requests module and extract the text,
# star rating, date, reviewer name, and profile image URL of each review
image_path = "overeview_images/"  # assumes this directory already exists
rows = []
image_urls = set ()
pos = 0
for url in urls:
    r = requests.get (url)
    soup = bs4.BeautifulSoup (r.text, "html.parser")
    elements = soup.find_all ("div", {"class" : "pb-3"})
    for e in elements:
        e1 = e.find_all ("p", {"class" : "text-break"})
        if len (e1) > 0:
            e1 = e1[0]
            text = e1.text.strip ()
            e1 = e.find_all ("div", {"class" : "text-right"})[0]
            rating = e1.text.strip ()
            e = e.find_all ("div", {"class" : "align-items-center"})[0]
            name = e.find_all ("div",
                               {"class" : "font-weight-bold"})[0].text.strip ()
            date = e.find_all ("div",
                               {"class" : "text-muted"})[0].text[2:].strip ()
            image_url = root_url + e.find_all ("img")[0]["data-src"]
            rows.append ({
                "name" : name,
                "text" : text,
                "rating" : rating,
                "date" : date,
                "image_url" : image_url
            })
            image_urls.add (image_url)
    # print a progress update every 10 product pages
    pos = pos + 1
    if pos % 10 == 0:
        print (str (len (rows)) + "\t" + str (len (image_urls)))

# save the scraped reviews as a CSV file
df = pd.DataFrame (rows)
df.to_csv ("overeview_all.csv", index=False)

# download the reviewer profile images
for url in image_urls:
    fname = url.split ("/")[-1]
    with requests.get (url, stream=True) as r:
        r.raise_for_status ()
        with open (image_path + fname, "wb") as f:
            for chunk in r.iter_content (chunk_size=8192):
                f.write (chunk)