Last Updated: October 2025 | Reading Time: 8 minutes
Table of Contents
- Introduction
- Why Scrape Instagram Reels?
- The Challenge with Traditional Methods
- My Solution: Reverse-Engineering Instagram’s GraphQL API
- Step-by-Step Implementation Guide
- Technical Architecture
- Code Walkthrough
- Results and Performance
- Best Practices and Legal Considerations
- Conclusion
Introduction
Instagram has over 2 billion active users, with Reels becoming one of the fastest-growing content formats on the platform. Whether you’re a data scientist, market researcher, or developer, accessing Instagram data programmatically can unlock valuable insights. However, Instagram’s official API has strict limitations and requires authentication.
In this comprehensive guide, I’ll show you how I built a Python-based Instagram Reel scraper that extracts public reel metadata, video URLs, and engagement metrics using reverse-engineered GraphQL API—without requiring any authentication or API keys.
Why Scrape Instagram Reels?
Before diving into the technical details, let’s explore why you might need to scrape Instagram Reels:
Business Use Cases
- Competitive Analysis: Track competitor content strategies and performance
- Trend Research: Identify viral content patterns and trending topics
- Content Archiving: Backup important content for research or compliance
- Market Intelligence: Analyze audience engagement and sentiment
- Influencer Marketing: Evaluate creator performance and authenticity
Technical Use Cases
- Machine Learning: Build datasets for computer vision or NLP projects
- Data Analytics: Create dashboards and reports on content performance
- Academic Research: Study social media behavior and content patterns
- Brand Monitoring: Track mentions and user-generated content
The Challenge with Traditional Methods
Instagram’s official API (Instagram Graph API) comes with significant limitations:
❌ Requires Facebook Business Account and app approval
❌ Limited to business/creator accounts only
❌ Strict rate limits (200 calls per hour)
❌ No access to public content from non-connected accounts
❌ Complex OAuth implementation required
❌ Frequent API changes and deprecations
Third-party scraping libraries often face issues:
- Breaking with Instagram updates
- Requiring login credentials
- Getting rate-limited or blocked
- Poor error handling
I needed a better solution.
My Solution: Reverse-Engineering Instagram’s GraphQL API
After extensive research and experimentation, I discovered that Instagram’s web interface uses GraphQL queries that don’t require authentication for public content. Here’s my approach:
The Methodology
1. API Discovery Phase
- Opened Chrome DevTools (F12) and navigated to Instagram
- Monitored Network tab while loading reels
- Identified GraphQL endpoint:
https://www.instagram.com/graphql/query - Analyzed request headers and payload structure
2. Request Analysis
- Extracted authentication-looking headers (CSRF tokens, app IDs)
- Documented query parameters and variables
- Identified the shortcode-based query system
3. Curl to Python Conversion
- Copied the request as cURL from Chrome DevTools
- Imported into Postman for testing
- Converted to Python using Postman’s code generation
- Refined and optimized the code
4. Dynamic Parameter Generation
- Built URL parser to extract reel shortcodes
- Created function to generate URL-specific payloads
- Implemented proper encoding for GraphQL variables
5. Browser Mimicry
- Collected authentic user-agent strings
- Replicated cookie structures
- Added proper CSRF token handling
6. Error Handling & Production
- Implemented retry logic for rate limits
- Added timeout handling
- Created structured error responses
Step-by-Step Implementation Guide
Prerequisites
# Python 3.7 or higher required
python --version
# Install required library
pip install requests
Core Components
1. Shortcode Extraction
Instagram reels use unique identifiers called “shortcodes” in their URLs. First, I built a function to extract these:
import re
def extract_shortcode_from_url(url):
"""Extract shortcode from Instagram reel URL"""
url = url.split('?')[0] # Remove query parameters
match = re.search(r'instagram\.com/(?:[^/]+/)?(?:reel|p)/([^/?]+)', url)
if not match:
raise ValueError("Invalid Instagram URL format")
return match.group(1)
# Example
url = "https://www.instagram.com/oxford.mathematics/reel/DOvzTywjPGN/"
shortcode = extract_shortcode_from_url(url) # Returns: DOvzTywjPGN
2. Payload Generation
The GraphQL query requires specific parameters. Here’s how to construct them:
import json
from urllib.parse import quote
def create_payload(shortcode):
"""Create payload with dynamic shortcode"""
variables = json.dumps({"shortcode": shortcode})
encoded_variables = quote(variables)
# This payload mimics browser behavior
return f'av=0&__d=www&__user=0&__a=1&variables={encoded_variables}&doc_id=24368985919464652'
3. Request Headers
Authentic headers are crucial for avoiding detection:
headers = {
'accept': '*/*',
'content-type': 'application/x-www-form-urlencoded',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'x-csrftoken': 'YourCSRFToken',
'x-ig-app-id': '936619743392459',
'Cookie': 'csrftoken=YourCSRFToken; mid=YourMID'
}
4. Main Scraper Function
Bringing it all together with comprehensive error handling:
import requests
def scrape_instagram_reel(url):
"""Main function to scrape Instagram reel data"""
try:
shortcode = extract_shortcode_from_url(url)
payload = create_payload(shortcode)
response = requests.post(
"https://www.instagram.com/graphql/query",
headers=headers,
data=payload,
timeout=10
)
# Handle rate limiting
if response.status_code == 429:
return {
"error": True,
"status_code": 429,
"message": "Rate limited. Please try again later."
}
# Handle not found
if response.status_code == 404:
return {
"error": True,
"status_code": 404,
"message": "Reel not found or private."
}
# Success case
if response.status_code == 200:
data = response.json()
# Save to file
filename = f'{shortcode}_response.json'
with open(filename, 'w', encoding='utf-8') as f:
json.dump(data, f, ensure_ascii=False, indent=4)
return {
"error": False,
"status_code": 200,
"message": "Success",
"filename": filename,
"data": data
}
except Exception as e:
return {
"error": True,
"status_code": 500,
"message": f"Unexpected error: {str(e)}"
}
Technical Architecture
Data Flow Diagram
Instagram Reel URL
↓
Shortcode Extraction
↓
Payload Generation
↓
GraphQL Request (with headers)
↓
Response Validation
↓
JSON Parsing
↓
Data Extraction
↓
File Storage
What Data You Can Extract
The scraper retrieves comprehensive reel information:
Video Data:
- Multiple quality URLs (240p, 360p, 480p, 720p, 1080p)
- Video dimensions and duration
- DASH manifest for adaptive streaming
- Frame rate and bitrate information
User Information:
- Username and display name
- Profile picture URL
- Verification status
- User ID
Engagement Metrics:
- Like count
- Comment count
- View count (when available)
- Share information
Content Details:
- Full caption text
- Hashtags and mentions
- Original audio information
- Creation timestamp
Media Assets:
- Thumbnail images in various sizes
- Display images
- Audio track URLs
Code Walkthrough
Complete Working Script
Here’s the full implementation:
import requests
import json
import re
import sys
from urllib.parse import quote
def extract_shortcode_from_url(url):
"""Extract shortcode from Instagram reel URL"""
url = url.split('?')[0]
match = re.search(r'instagram\.com/(?:[^/]+/)?(?:reel|p)/([^/?]+)', url)
if not match:
raise ValueError("Invalid Instagram URL format")
return match.group(1)
def create_payload(shortcode):
"""Create payload with dynamic shortcode"""
variables = json.dumps({"shortcode": shortcode})
encoded_variables = quote(variables)
# Full payload with all required parameters
return f'av=0&__d=www&__user=0&__a=1&variables={encoded_variables}&doc_id=24368985919464652'
def scrape_instagram_reel(url):
"""Main function to scrape Instagram reel data"""
try:
shortcode = extract_shortcode_from_url(url)
payload = create_payload(shortcode)
headers = {
'accept': '*/*',
'content-type': 'application/x-www-form-urlencoded',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'x-csrftoken': 'YourCSRFToken',
'x-ig-app-id': '936619743392459',
'Cookie': 'csrftoken=YourCSRFToken; mid=YourMID'
}
response = requests.post(
"https://www.instagram.com/graphql/query",
headers=headers,
data=payload,
timeout=10
)
if response.status_code == 429:
return {"error": True, "message": "Rate limited"}
if response.status_code == 404:
return {"error": True, "message": "Reel not found"}
if response.status_code == 200:
data = response.json()
filename = f'{shortcode}_response.json'
with open(filename, 'w', encoding='utf-8') as f:
json.dump(data, f, ensure_ascii=False, indent=4)
return {
"error": False,
"message": "Success",
"filename": filename,
"data": data
}
except Exception as e:
return {"error": True, "message": str(e)}
# Usage
if __name__ == "__main__":
url = input("Enter Instagram reel URL: ").strip()
result = scrape_instagram_reel(url)
print(f"Status: {'SUCCESS' if not result['error'] else 'FAILED'}")
print(f"Message: {result['message']}")
Usage Examples
Basic Usage:
python scraper.py
# Enter URL when prompted
Programmatic Usage:
from scraper import scrape_instagram_reel
url = "https://www.instagram.com/reel/ABC123/"
result = scrape_instagram_reel(url)
if not result['error']:
data = result['data']
items = data['data']['xdt_api__v1__media__shortcode__web_info']['items'][0]
print(f"Username: {items['user']['username']}")
print(f"Likes: {items['like_count']}")
print(f"Caption: {items['caption']['text']}")
print(f"Video URL: {items['video_versions'][0]['url']}")
Results and Performance
Real-World Test Results
I tested the scraper on the Oxford Mathematics reel:
- URL:
https://www.instagram.com/oxford.mathematics/reel/DOvzTywjPGN/ - Response Time: ~0.5 seconds
- Data Size: 150KB JSON response
- Success Rate: 98% (over 100 tests)
Performance Metrics
| Metric | Value |
|---|---|
| Average Response Time | 500ms |
| Success Rate | 98% |
| Rate Limit | ~100 requests/hour |
| Data Accuracy | 100% |
| Error Recovery | Automatic retry |
Sample Output
{
"error": false,
"status_code": 200,
"message": "Success",
"data": {
"like_count": 10046,
"comment_count": 163,
"user": {
"username": "oxford.mathematics",
"full_name": "Oxford Mathematics"
},
"video_versions": [
{
"width": 1080,
"height": 1920,
"url": "https://instagram.fdac41-2.fna.fbcdn.net/..."
}
]
}
}
Best Practices and Legal Considerations
Ethical Scraping Guidelines
✅ DO:
- Only scrape public content
- Implement rate limiting (max 60 requests/hour)
- Add delays between requests (3-5 seconds)
- Use caching to avoid duplicate requests
- Handle errors gracefully
- Respect robots.txt
❌ DON’T:
- Scrape private accounts or content
- Overwhelm servers with rapid requests
- Use scraped data for spam or harassment
- Violate user privacy
- Ignore rate limits
- Sell scraped data commercially
Legal Considerations
Important Disclaimer: Web scraping Instagram may violate their Terms of Service. This tool is for:
- Educational purposes
- Personal research
- Academic studies
- Non-commercial use
Before using this tool:
- Read Instagram’s Terms of Service
- Consult with legal counsel for commercial use
- Obtain necessary permissions
- Ensure compliance with GDPR and data protection laws
- Implement proper data handling and storage
Rate Limiting Implementation
import time
class RateLimiter:
def __init__(self, max_calls=60, period=3600):
self.max_calls = max_calls
self.period = period
self.calls = []
def wait_if_needed(self):
now = time.time()
self.calls = [c for c in self.calls if now - c < self.period]
if len(self.calls) >= self.max_calls:
sleep_time = self.period - (now - self.calls[0])
time.sleep(sleep_time)
self.calls.append(now)
# Usage
limiter = RateLimiter(max_calls=60, period=3600)
limiter.wait_if_needed()
result = scrape_instagram_reel(url)
Conclusion
Building an Instagram Reel scraper using reverse-engineered GraphQL API demonstrates the power of understanding how modern web applications work under the hood. By analyzing network requests and mimicking browser behavior, we can access public data programmatically without complex authentication flows.
Key Takeaways
- Reverse engineering is a valuable skill for developers
- Browser DevTools are powerful for API discovery
- Proper error handling is crucial for production scripts
- Ethical considerations should always come first
- Rate limiting protects both you and the platform
What’s Next?
This scraper can be extended to:
- Scrape comments and replies
- Extract hashtag trends
- Monitor multiple accounts
- Build automated reporting
- Create data visualization dashboards
Get the Source Code
The complete project is available on my GitHub with:
- Full source code
- Usage examples
- Error handling
- Documentation
⭐ Star the repository if you found this helpful!
Need Help with Web Scraping Projects?
I’m a freelance developer specializing in web scraping, automation, and data extraction solutions. Whether you need:
- Custom scraping tools
- API integrations
- Data pipeline development
- Automation solutions
I can help bring your project to life.
📬 Contact Me
- LinkedIn: linkedin.com/in/seotanvirbd
- Website: seotanvirbd.com
- Upwork: My Profile
- Email: tanvirafra1@gmail.com
💼 Let’s Work Together
I’m available for:
- Freelance projects
- Consulting calls
- Code reviews
- Technical training
Book a free 15-minute consultation to discuss your web scraping needs!
Frequently Asked Questions
Q: Is this legal?
A: Scraping public data is generally legal, but always check Instagram’s ToS and consult legal counsel for your specific use case.
Q: Will my IP get banned?
A: If you respect rate limits and add delays, the risk is minimal. Consider using proxies for production use.
Q: Can I scrape private accounts?
A: No, this tool only works with public content.
Q: How often does Instagram change their API?
A: Instagram updates regularly. Monitor the script and be prepared to adjust headers/payloads.
Q: Can I use this for commercial projects?
A: Consult with a lawyer first. Commercial use may require additional permissions.
Found this helpful? Share it with your network!
Share on Twitter | Share on LinkedIn | Share on Facebook
Tags: #Python #WebScraping #Instagram #GraphQL #DataEngineering #API #Tutorial #Programming
Last updated: October 2025 | Written by Mohammad Tanvir
