Mastodon privacy: you can't really opt out of search engine indexing
There are a lot of reasons people might not want their posts on a social network to be indexed by search engines. Too bad Mastodon's "opt out" doesn't actually opt you out.
There are a lot of reasons people might not want their posts on a social network to be indexed by search engines. One of the most important is personal safety. Harassers often use search engines to find out information about the people they're targeting – or to find new people to target. So Mastodon gives you the option (on the Preferences/Other settings page) to opt out of search engine indexing.
Unfortunately, as we'll discuss below, selecting this option doesn't actually fully opt you out of search engine indexing.
If you're familiar with Mastodon's cavalier approach to privacy and security, this probably doesn't come as a surprise to you. As hachyderm.io admin Kris Nova says in Operating Mastodon, Privacy, and Content:
My immediate advice is to treat everything on Mastodon as if it is public data!
However, a lot of people think Mastodon's more private and secure than it really is. For one thing, Mastodon has long positioned itself as reducing harassment by learning from Twitter’s mistakes. And making it harder for harassers to search for past messages is often used as an example of this. As EFF's Bill Budington says in Is Mastodon Private and Secure?
"This cuts down on harassment, because abusive accounts will have a harder time discovering posts and accounts using key words typically used by the population they’re targeting (a technique frequently used by trolls and harassers)"
So having this option that doesn't actually work gives people a false sense of security – but leaves them exposed if they're at risk of harassment.
Why doesn't opting out really opt you out?
Understanding why this option doesn't work as expected requires a bit of digging into how the "noindex" rule works in search engines and how Mastodon treats the opt-out setting.
As Google's Block Search Indexing with 'noindex' describes, when an HTML page has a <meta name="robots" content="noindex"> tag, Google and other search engines that support the noindex rule won't index the page. Of course, badly-behaved search engines can ignore the tag, so this won't stop people who write their own crawlers. Still, writing a crawler and storing all your own data is a pretty significant investment, so this is a useful level of protection.
Mastodon version 4.0 always puts a noindex tag on most pages, but whether or not it's on your profile depends on the "opt-out of search engine indexing" option. The option is also used to determine whether there's a noindex tag on web pages for your posts. My indieweb.social account has that option turned on, so if you look at the HTML for this post, you'll see the noindex rule. So far so good.
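If you want to check a page yourself rather than reading through the HTML by eye, here's a minimal sketch in Python that looks for the noindex rule in a page's robots meta tag. The names RobotsMetaParser and has_noindex are made up for this example, and it only checks the meta tag described above (not HTTP headers or crawler-specific tags), so treat it as a rough check rather than a full audit.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            # content may hold several comma-separated directives
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html_text):
    """Return True if the page's robots meta tag includes the noindex rule."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives
```

To use it, save the HTML of a profile or post page (for example with your browser's "view source") and pass the text to has_noindex. A result of True only tells you the tag is on that one page; it says nothing about other pages where the same post might appear.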
But if I do a search for "@email@example.com", I'll find pages with my posts in them. In fact, I can even do a search for specific text in one of my posts – here's one for "jdp23 gh*st" that brings up a thread Anil Dash started that I replied to. In this particular case I don't care, but imagine a situation where instead of gh*st I had used a term that attracts harassers.
That'd be very bad.
And as Darius Kazemi verified, this applies to unlisted posts as well as public posts.
From a software perspective, the bug here is that Mastodon is only checking the "opt out of search engines" setting for the original author. Anil, like many others, doesn't mind if his posts are indexed by search engines. When I reply to him, that means that my post will be indexed by search engines as well – even though I've opted out.
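To make the logic of the bug concrete, here's a small sketch of the behavior described above. This is not Mastodon's actual code (Mastodon isn't written in Python); Post, page_noindex_root_only and page_noindex_any_author are hypothetical names, and the second function is just one conceivable alternative rule, included for contrast.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    text: str


def page_noindex_root_only(thread, opted_out):
    """Behavior as described in the article: only the original author's setting counts."""
    return thread[0].author in opted_out


def page_noindex_any_author(thread, opted_out):
    """Hypothetical alternative: mark the page noindex if anyone on it has opted out."""
    return any(post.author in opted_out for post in thread)


# A thread started by someone who allows indexing, with a reply from someone who opted out
thread = [Post("anil", "original post"), Post("jdp23", "a reply")]
opted_out = {"jdp23"}

root_only = page_noindex_root_only(thread, opted_out)
any_author = page_noindex_any_author(thread, opted_out)
```

With the root-only rule, the thread page is left indexable even though one of the people on it opted out, which is the situation described above. The any-author rule would cover the reply, but it's only an illustration of the gap, not a claim about how it should or will be fixed.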
I filed a bug report on this and it'll be interesting to see what the response is.
But wait, there's more
This isn't the only way your public and unlisted Mastodon posts can wind up in a search engine even if you've opted out. If somebody from another instance is following you, there's no guarantee that the software they're running will pay attention to the "opt-out from search engines" setting. As long as the other instances are running Mastodon software, this isn't an issue unless admins have intentionally disabled this functionality. However, other software that's compatible with Mastodon may not know about this setting.
In fact, because of the way Mastodon implements federation, even posts that have been deleted have copies on other instances that can still be found by search engines. Yikes!
It's worth noting that "local-only posts," supported by Mastodon forks (variants) like Glitch and Hometown, provide significant protection here. Local-only posts aren't included in externally accessible pages, so search engines never see them. As Hometown maintainer Kazemi points out, if you're on an instance where you trust the admins and the other members, local-only posts give you the ability to ensure that your stuff only goes to actors you personally trust. Unfortunately, Mastodon's BDFL (benevolent dictator for life) has rejected this valuable anti-harassment technology from the main line of code, so most instances don't have this functionality.
Of course, Mastodon's not the only social network site where you don't have any privacy. Twitter allows you to delete your tweets and direct messages, but doesn't actually commit to deleting them from their internal databases or backups. And since there's currently more organized harassment on Twitter than Mastodon, and their investors (including Larry Ellison of Oracle, Prince Alwaleed bin Talal bin Abdulaziz of Saudi Arabia, and the Qatar Investment Authority) get special rights to your personal data, the risks are likely higher there.
Still, don't kid yourself: Mastodon's security and privacy story is not good. Lenin Alevski recently found a system misconfiguration vulnerability that left content and videos from supposedly-private direct messages open to the world. Alevski reports that it affected several other high-profile sites in addition to infosec.exchange and its 33,000 users. The lack of end-to-end encryption means that admins can read supposedly-private direct messages – and if you're DM'ing with somebody on another instance, their admins can read them as well.
The What to do? section of Dan Goodin's How secure a Twitter replacement is Mastodon? Let us count the ways has a useful list of some of the things you can do to cut down the risks, but they only go so far. At the end of the day, I agree with Kevin Beaumont, a security professional and admin for the cyberplace.social instance, who Goodin quotes as saying:
“My take is the same as Twitter. Don’t write anything on social media you wouldn’t write in public.”