From lolcat's wiki
Old Instagram logo from when everything didn't suck ass

Instagram

Instagram is the worst image-sharing social media website the world has ever seen. Anyone who has ever posted on there should seek help.

Why is it such garbage?

  • Images are limited to 1080x1080 (1350px in height for portrait images)
  • Facebook, need I say more
  • The API fucking sucks
  • It's one of the hardest websites to get data from.

The API

Just as a preface: don't even bother. The API is severely limited. Most platforms, like Twitter, have API endpoints that let you search through posts, hashtags, or a user's profile. Here, it's an entirely different story, and the API restrictions they put on us developers are sincerely kind of funny. Not to mention their documentation is an absolute nightmare to navigate. It's literally easier to figure out how to scrape the platform than to understand how the API works, but here's a breakdown anyway:

First, you need to get a developer account on the Facebook developer portal. To do that, you have to:

  1. Create a Facebook account
  2. Verify your phone number (can't be VoIP) OR verify using a payment method (the payment method can't be a prepaid card)
  3. Sell your soul to Faceberg overlords

From there, you can refer to this page, which barely explains anything. The only thing that's worth noting here is that the Basic Display API is deprecated. Before its death, that API let us preview the last 6 posts of a user and view posts. So with that shit gone, what's left for us to use?

The torture begins

How can I search for posts? Well, the short answer is that you can't. The only API available is the hashtag search endpoint, which only lets you look up "30 unique hashtags on behalf of an Instagram Business or Creator Account within a rolling, 7 day period."
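If you do end up using that endpoint, you'll probably want to keep count on your side so you don't burn the quota by accident. Here's a minimal sketch of a rolling 7-day tracker. The class name and the way it counts are my own assumptions (a hashtag counts once, from its first query, and repeating it doesn't cost extra); Facebook enforces the real limit however it wants.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)   # rolling window from the quote above
MAX_UNIQUE = 30              # unique hashtags allowed per window


class HashtagBudget:
    """Hypothetical client-side tracker; the real limit is enforced server-side."""

    def __init__(self):
        # hashtag -> time of its first query inside the current window
        self.first_seen = {}

    def can_query(self, hashtag, now):
        # Forget hashtags whose first query has fallen out of the window.
        self.first_seen = {
            tag: seen for tag, seen in self.first_seen.items()
            if now - seen < WINDOW
        }
        tag = hashtag.lower().lstrip("#")
        # Assumption: re-querying an already counted hashtag costs nothing.
        return tag in self.first_seen or len(self.first_seen) < MAX_UNIQUE

    def record(self, hashtag, now):
        # Only the first query of a hashtag starts its 7-day timer.
        tag = hashtag.lower().lstrip("#")
        self.first_seen.setdefault(tag, now)
```

Call can_query() before each lookup and record() after a successful one, passing datetime.now() (or whatever clock you trust) as now.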

So to reiterate, you need a business or creator account to hit up this endpoint. Getting one of those accounts is even more of a pain, because you first need to make a valid API request through curl (good luck doing that when all endpoints require a fucking business account). You then need to verify that you're a business and send tax information through some form, and honestly that's as far as I'm willing to look to get access to the world's shittiest API. Even once you have this fucking business account, you need an app token, which requires you to send Facebook an app prototype that uses their API. A human will then review your app and ask for adjustments indefinitely. This is barely even scratching the surface. The requirements are fucking INSANE. What kind of crack do they sell down Hacker Way?

>Oh but hurr I just want to embed post data in my program

If you want to get a user's posts, again, you can't really do that. The only way to get any sort of data is to have the owner of an Instagram account authorize your app, which then lets you fetch that user's posts. Alternatively, there's the oEmbed API, but again, that requires an app review, which I won't bother with.

So.. How can I scrape shit?

There are methods available, but they all have caveats. First, you need to understand that Instagram blocks all non-residential IPs, and even then, it only lets you view a very small number of posts/profiles before it blocks you. When scraping the main endpoints (like instagram.com/profile), it's advisable to be logged in.

Instagram frontends

Bibliogram used to be a frontend for Instagram, but even when it was maintained it was always regarded as a piece of shit. It would only let you see a limited number of posts, and every user had a really low ratelimit. It was an all-around shitty experience. Personally, I'm glad it died off.

On the other hand, you've got frontends like https://imginn.com/ and https://www.pixnoy.com. They work, and they support searching profiles. They're all behind Cloudflare though, so it might be harder to scrape shit from those places. Besides, you shouldn't really scrape anything from these shitholes as they can be quite unreliable themselves. If you search up "instagram view profile no login" or "instagram story viewer", you'll be able to find a plethora of other sites that let you hit up the site, although most of them won't support searching for profiles.

gallery-dl

gallery-dl lets you scrape IG posts. To bypass the initial page view threshold, you can supply cookies to your gallery-dl process.

To do this, log into a (burner) Instagram account in a private window. From there, use a Firefox extension like ExportCookies or any other extension that lets you export cookies as a text file. Then, run the following command:

gallery-dl <url> --cookies <cookie file>

It should then create a folder hierarchy of all images it can find.

Now, it's important to note that gallery-dl will be hitting /graphql endpoints (which are notorious for having many bot checks), so make sure to add delays between requests to avoid detection.
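If you're driving gallery-dl from a script, here's a rough sketch of how the command above could be built with a delay baked in. It uses gallery-dl's --sleep-request option, which waits between HTTP requests during extraction. The function name, the cookies.txt filename and the 3.0 second default are placeholders I made up; tune the delay to whatever keeps you from getting blocked.

```python
import shlex


def gallery_dl_command(url, cookie_file, request_delay="3.0"):
    """Build the gallery-dl argv with cookies and a delay between HTTP requests."""
    return [
        "gallery-dl",
        "--cookies", cookie_file,           # cookie text file exported from the browser
        "--sleep-request", request_delay,   # seconds to wait between HTTP requests
        url,
    ]


if __name__ == "__main__":
    cmd = gallery_dl_command("https://www.instagram.com/kamicakes_1/", "cookies.txt")
    # Print the command so it can be checked before anything gets hit.
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```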

The fun way

I have found some endpoints which:

  1. don't require login
  2. don't require residential proxies
  3. may or may not have ratelimits? Who knows..

They all share one thing in common: they're all used to embed content on other sites through <iframe> elements.

Getting profile data

Visit https://www.instagram.com/kamicakes_1/embed/. In the HTML of the page, you have access to the following information:

  • Username
  • Truncated follower count (e.g. 1800 or 1.2M)
  • Last 6 posts in the highest resolution available
  • Total number of posts

I've checked far and wide, but there is no known way of doing pagination on that profile view or getting additional data like the profile bio.
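To grab that profile embed page from a script, something like the sketch below should do. Both function names are made up, and the User-Agent value is an arbitrary placeholder (I'm assuming the endpoint wants something browser-looking). I'm not including parsing code because the layout of the embed HTML isn't documented anywhere and can change whenever Instagram feels like it.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def profile_embed_url(username):
    """Return the embed URL for a profile, e.g. kamicakes_1."""
    clean = username.strip("@/")
    return f"https://www.instagram.com/{quote(clean, safe='')}/embed/"


def fetch_html(url, timeout=10):
    """Download a page as text. The User-Agent is a placeholder assumption."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would be fetch_html(profile_embed_url("kamicakes_1")), after which you dig through the returned HTML for the fields listed above.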

Getting post data

Visit https://www.instagram.com/p/DBy_xXopllw/embed/captioned/. Again, in the HTML of the page, you will find the following:

  • Username
  • Truncated follower count
  • All images from the post in the highest resolution available
  • Post caption (thanks to the /captioned/ part of the URL)
  • Number of comments (again, thanks to the /captioned/ part)

You can also replace the post ID with a reel's ID to get the thumbnail of a "reel" video, although it won't let you access the video itself.
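If you're starting from regular post or reel links, here's a small sketch that turns them into the captioned embed URL. The function name is made up, and the regex assumes links look like /p/<id>/, /reel/<id>/ or /reels/<id>/ with IDs made of letters, digits, "-" and "_". That matches the links I've seen but isn't guaranteed.

```python
import re

# Assumption: post and reel links look like /p/<id>/, /reel/<id>/ or /reels/<id>/.
POST_ID = re.compile(r"/(?:p|reels?)/([A-Za-z0-9_-]+)")


def captioned_embed_url(post_url_or_id):
    """Turn a post/reel link (or a bare ID) into its captioned embed URL."""
    match = POST_ID.search(post_url_or_id)
    # No /p/ or /reel/ segment found: treat the input as a bare ID.
    post_id = match.group(1) if match else post_url_or_id.strip("/")
    return f"https://www.instagram.com/p/{post_id}/embed/captioned/"
```

Reel IDs go through the same /p/ path, which is what the substitution described above amounts to.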

Resolve a numerical ID to a post ID

If you wanted to scrape the entirety of Instagram, this is the endpoint you'd use to discover posts. The idea here is to cycle through from 0 to a bajillion to get all known post IDs.

https://lookaside.instagram.com/seo/google_widget/crawler/?media_id=3460616374939174801

This endpoint will serve a Location header pointing to https://i.instagram.com/p/DAGlEIEJkeR, which will in turn return a 404 error. However, if you rewrite this URL to omit the i. prefix, it resolves to an actual post.
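Here's a rough sketch of that lookup in Python. The function names are mine, and I'm assuming a plain GET with no special headers is enough to get the redirect. I haven't checked how the endpoint behaves for IDs that don't exist, so the sketch just returns None when no Location header comes back.

```python
import http.client
from urllib.parse import urlencode, urlsplit, urlunsplit

LOOKASIDE_HOST = "lookaside.instagram.com"
LOOKASIDE_PATH = "/seo/google_widget/crawler/"


def drop_i_prefix(url):
    """Rewrite https://i.instagram.com/p/... to https://instagram.com/p/..."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("i."):
        host = host[len("i."):]
    return urlunsplit(parts._replace(netloc=host))


def resolve_media_id(media_id, timeout=10):
    """Return the post URL for a numerical ID, or None if no redirect came back."""
    conn = http.client.HTTPSConnection(LOOKASIDE_HOST, timeout=timeout)
    try:
        # http.client never follows redirects, so the Location header stays visible.
        query = urlencode({"media_id": media_id})
        conn.request("GET", f"{LOOKASIDE_PATH}?{query}")
        location = conn.getresponse().getheader("Location")
    finally:
        conn.close()
    return drop_i_prefix(location) if location else None
```

If you loop resolve_media_id() over a range of IDs, put a delay between calls, since nobody knows what the ratelimits on this thing are.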