Twitter scraping guide
[[File:Twitter-logo.jpg|alt=twitter logo collection|thumb|God I miss the 2010s]]
== Twitter (X) ==
Twitter (also known as X if you're a Republican) is an abhorrent social media site where users share videos of minorities having altercations with the police, share political "opinions" and view Elon Musk tweets because [https://x.com/wongmjane/status/1641884551189512192 their algorithm is literally designed that way now]:<blockquote>"Twitter’s algorithm specifically labels whether the Tweet author is Elon Musk “author_is_elon” besides the Democrat, Republican and “Power User” labels"</blockquote>Despite Elon Musk's acquisition, the site remains relevant and people insist on using this piece of shit, so let's talk about how we can get data from it.
== (Sane) Scraping methods ==
=== Syndication endpoint ===
Twitter embeds make a request to the following endpoint:
https://cdn.syndication.twimg.com/tweet-result?id=1845711253295264243&token=5
Where <code>id</code> is the tweet ID and <code>token</code> is the access token. An ex-Twitter employee reportedly leaked this online, but I can't for the life of me find the original source for this claim. The token can be either <code>5</code> or <code>a</code>, for some reason. The only caveat is that this endpoint doesn't let us get NSFW tweets. When looking up an NSFW post on the [https://platform.twitter.com/embed/Tweet.html?id=1843206748091691010 original embedding page], the page itself returns a "Not found" error, but the actual API returns a tombstone response that looks like this:<syntaxhighlight lang="json">
{
  "__typename": "TweetTombstone",
  "tombstone": {
    "text": {
      "text": "Age-restricted adult content. This content might not be appropriate for people under 18 years old. To view this media, you’ll need to log in to X. Learn more",
      "entities": [
        {
          "from_index": 134,
          "to_index": 140,
          "ref": {
            "__typename": "TimelineUrl",
            "url": "https://twitter.com",
            "url_type": "ExternalUrl"
          }
        },
        {
          "from_index": 147,
          "to_index": 157,
          "ref": {
            "__typename": "TimelineUrl",
            "url": "https://help.twitter.com/rules-and-policies/notices-on-twitter",
            "url_type": "ExternalUrl"
          }
        }
      ],
      "rtl": false
    }
  }
}
</syntaxhighlight>This endpoint lets us fetch video content, images and profile pictures: pretty much everything you'd want except the replies.
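A minimal Python sketch of calling this endpoint and spotting the tombstone response shown above. The helper names (<code>build_syndication_url</code>, <code>is_tombstone</code>, <code>tombstone_text</code>, <code>fetch_tweet</code>) and the browser-like <code>User-Agent</code> header are assumptions made for illustration. The endpoint is undocumented, so treat everything beyond the <code>id</code> and <code>token</code> parameters and the <code>__typename</code> field as a guess:

```python
import json
import urllib.parse
import urllib.request

SYNDICATION_URL = "https://cdn.syndication.twimg.com/tweet-result"


def build_syndication_url(tweet_id: str, token: str = "5") -> str:
    """Build the tweet-result URL with the id and token query parameters."""
    query = urllib.parse.urlencode({"id": tweet_id, "token": token})
    return f"{SYNDICATION_URL}?{query}"


def is_tombstone(payload: dict) -> bool:
    """Return True when the response is a TweetTombstone (e.g. an NSFW tweet)."""
    return payload.get("__typename") == "TweetTombstone"


def tombstone_text(payload: dict) -> str:
    """Pull the human-readable message out of a tombstone, or '' if absent."""
    return payload.get("tombstone", {}).get("text", {}).get("text", "")


def fetch_tweet(tweet_id: str, token: str = "5", timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON for one tweet (performs a network request)."""
    # Assumption: a browser-like User-Agent; the endpoint's requirements are undocumented.
    request = urllib.request.Request(
        build_syndication_url(tweet_id, token),
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    payload = fetch_tweet("1845711253295264243")
    if is_tombstone(payload):
        print("Tombstone:", tombstone_text(payload))
    else:
        # Print only the start of the payload; the real schema is not documented here.
        print(json.dumps(payload, indent=2)[:500])
```

Checking <code>__typename</code> first lets a scraper decide early whether it needs one of the NSFW workarounds instead of failing later on missing fields.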
=== Fetching NSFW tweets ===
If you supply a login token, you can use a tool like [https://github.com/mikf/gallery-dl gallery-dl] or even [https://github.com/yt-dlp/yt-dlp/ yt-dlp] to download video content. If you wish to fetch NSFW tweets without having to log in, you can fall back on the [https://github.com/FxEmbed/FxEmbed FXTwitter API]:
Example: https://api.fxtwitter.com/KXII/status/1890605540813808015
This API works by logging into thousands of accounts, collecting their bearer tokens and storing them in a database. These tokens take a really long time to expire, which is why they're stored that way. Please use this API sparingly so as not to get FxEmbed locked out of their accounts. Alternatively, you can run this API yourself, but you will need to supply your own Twitter accounts.
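A rough Python sketch of hitting the FXTwitter API "sparingly", that is, with a pause between requests. The URL pattern comes from the example above. The function names, the 5-second default delay and the <code>User-Agent</code> string are made up for illustration, and the response is returned as parsed JSON without assuming anything about its schema:

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, List

FX_API_BASE = "https://api.fxtwitter.com"


def fx_status_url(screen_name: str, tweet_id: str) -> str:
    """Build a status URL following the pattern of the example above."""
    return f"{FX_API_BASE}/{screen_name}/status/{tweet_id}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch a URL and decode its JSON body (performs a network request)."""
    # Hypothetical User-Agent string; pick one that identifies your own tool.
    request = urllib.request.Request(url, headers={"User-Agent": "tweet-archiver-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_sparingly(
    urls: Iterable[str],
    delay_seconds: float = 5.0,
    fetch: Callable[[str], dict] = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Fetch each URL in order, sleeping between requests (not before the first)."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    urls = [fx_status_url("KXII", "1890605540813808015")]
    for payload in fetch_sparingly(urls):
        print(json.dumps(payload, indent=2)[:500])
```

The <code>fetch</code> and <code>sleep</code> parameters are injectable so the pacing logic can be exercised without touching the network.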
== Using the API ==
[[File:Twitter dashboard.png|alt=Twitter API dashboard|thumb|What a joke]]
First, you'll need to sign up for Twitter (no phone number requirement in Canada as of March 30th 2025) and [https://developer.twitter.com/en/portal/petition/essential/basic-info head over here] to enable developer features. Create an app, and you will see something similar to the picture linked here ->
I don't know why social media sites like to cripple their APIs so much. Why even make them in the first place? 100 fucking requests per MONTH? Not to mention [https://docs.x.com/x-api/introduction their pricing] is absolutely fucking nuts:
* For hobbyists or prototypes
* 10,000/month Posts read-limit rate cap
* Cost: $200 per month
Read that again. For hobbyists.
Latest revision as of 23:04, 3 April 2025
