Scrape Profiles
The scraping pipeline discovers TikTok influencers by searching hashtags and user queries, enriches profiles with follower counts and bios, labels them by category, and stores everything in the database. Production-tested with 10,695+ influencers crawled.
Quick Start
```bash
# Crawl Top tab for #Cat videos (5 scroll passes)
python3 -m marketing_system.bots.tiktok.scraper "#Cat" \
    --tab top --date-filter "Past 24 hours" --passes 5

# Crawl Users tab for pet influencers
python3 -m marketing_system.bots.tiktok.scraper "#Cat" \
    --tab users --passes 3
```

REST API
```bash
curl -X POST http://localhost:5055/api/bot/start \
  -H "Content-Type: application/json" \
  -d '{
    "type": "crawl",
    "device": "L9AIB7603188953",
    "params": {
      "query": "#Cat",
      "tab": "top",
      "passes": 5,
      "date_filter": "Past 24 hours"
    }
  }'
```

Python
```python
from marketing_system.skills.tiktok import load
from marketing_system.bots.common.adb import Device

dev = Device()
skill = load()

# Crawl via the skill workflow
wf = skill.get_workflow("crawl_hashtag", dev, query="#Cat", passes=5, date_filter="Past 24 hours")
result = wf.run()
```

Two Crawl Modes
Top Tab Crawling
Searches a hashtag, applies date/sort filters, scrolls through video tiles, and extracts creator handles from each video.
Flow: Open search -> type hashtag -> submit -> Top tab -> apply filters -> scroll -> extract creators -> enrich profiles
Parameters:
| Param | Default | Description |
|---|---|---|
| `query` | required | Hashtag to search (e.g., `#Cat`) |
| `passes` | 3 | Number of scroll passes |
| `date_filter` | None | Past 24 hours, This week, This month, Last 3 months, Last 6 months |
| `sort` | Relevance | Relevance, Like count, Date posted |
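
The allowed values in the table lend themselves to a quick check before a crawl is started. The following is a minimal sketch, not the scraper's own validation: `validate_top_params` is a hypothetical helper, and the rule that `passes` must be at least 1 is an assumption.

```python
from typing import Optional

# Allowed values copied from the Top tab parameter table.
DATE_FILTERS = {"Past 24 hours", "This week", "This month",
                "Last 3 months", "Last 6 months"}
SORT_OPTIONS = {"Relevance", "Like count", "Date posted"}


def validate_top_params(query: str, passes: int = 3,
                        date_filter: Optional[str] = None,
                        sort: str = "Relevance") -> dict:
    """Hypothetical helper: reject unknown values before starting a crawl."""
    if not query:
        raise ValueError("query is required")
    if passes < 1:  # assumption: at least one scroll pass is needed
        raise ValueError("passes must be at least 1")
    if date_filter is not None and date_filter not in DATE_FILTERS:
        raise ValueError(f"unknown date_filter: {date_filter!r}")
    if sort not in SORT_OPTIONS:
        raise ValueError(f"unknown sort: {sort!r}")
    return {"query": query, "tab": "top", "passes": passes,
            "date_filter": date_filter, "sort": sort}


params = validate_top_params("#Cat", passes=5, date_filter="Past 24 hours")
```

Catching a misspelled filter label up front is cheaper than discovering it after the device has already opened the search screen.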
Users Tab Crawling
Searches a query, switches to the Users tab, and extracts profiles directly from the user list.
Flow: Open search -> type query -> submit -> Users tab -> scroll -> extract handles + follower counts
Parameters:
| Param | Default | Description |
|---|---|---|
| `query` | required | Search query |
| `passes` | 3 | Number of scroll passes |
| `min_followers` | None | Minimum follower threshold |
| `max_followers` | None | Maximum follower threshold |
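
To illustrate how the two follower thresholds might combine, here is a small sketch. `within_follower_range` is a hypothetical function, the sample handles are made up, and treating both bounds as inclusive is an assumption; the scraper's actual comparison may differ.

```python
from typing import Optional


def within_follower_range(followers: int,
                          min_followers: Optional[int] = None,
                          max_followers: Optional[int] = None) -> bool:
    """Hypothetical check; a bound of None is simply not applied."""
    if min_followers is not None and followers < min_followers:
        return False
    if max_followers is not None and followers > max_followers:
        return False
    return True


# Made-up rows, shaped like (handle, follower count) pairs from the Users tab.
found = [("@tiny_cat", 850), ("@mid_cat", 42_000), ("@mega_cat", 3_100_000)]
kept = [handle for handle, count in found
        if within_follower_range(count, 1_000, 1_000_000)]
print(kept)  # ['@mid_cat']
```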
Profile Enrichment
After discovering a handle, the scraper optionally visits the profile to collect:
- Followers count
- Following count
- Total likes received
- Bio text
- Profile screenshot (saved to `data/profile_screenshots/`)
Enrichment runs inline during crawling — tap user -> scrape stats -> back to results.
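
Profile stats are often displayed in abbreviated form (for example `10.7K` or `1.2M`), so scraped text usually has to be normalized before it can be stored or compared. The sketch below shows one way to do that; `parse_count` is a hypothetical helper, and the exact strings the scraper reads from the screen are an assumption.

```python
import re

# Multipliers for the abbreviation suffixes assumed to appear on profiles.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert strings such as '10.7K', '1.2M' or '1,234' to an integer."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([KMB]?)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized count: {text!r}")
    number = float(match.group(1).replace(",", ""))
    # round() absorbs floating-point noise from the multiplication.
    return round(number * _SUFFIXES.get(match.group(2), 1))


stats = {"followers": parse_count("10.7K"),
         "following": parse_count("532"),
         "likes": parse_count("1.2M")}
```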
Deduplication
Influencers are deduplicated at two levels:
- In-memory set: prevents re-visiting the same handle within a single crawl session
- Database insert: `INSERT OR IGNORE` on the `handle` column (unique constraint), so a handle that already exists is skipped rather than updated
This means you can run the same crawl multiple times safely. Only new influencers get added.
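
The two levels can be sketched with the standard-library `sqlite3` module. This is an illustration only: the `influencers` table layout and the `record` helper are assumptions, not the project's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the unique constraint on handle is what matters.
conn.execute(
    "CREATE TABLE influencers (id INTEGER PRIMARY KEY, handle TEXT UNIQUE)"
)

seen = set()  # level 1: handles already visited in this crawl session


def record(handle: str) -> bool:
    """Return True only if the handle was newly inserted into the database."""
    if handle in seen:
        return False  # skipped without touching the database
    seen.add(handle)
    cur = conn.execute(
        "INSERT OR IGNORE INTO influencers (handle) VALUES (?)", (handle,)
    )
    return cur.rowcount == 1  # 0 when the unique constraint ignored the row


first = [record(h) for h in ("@a", "@b", "@a")]
seen.clear()  # simulate starting a fresh crawl session
second = [record(h) for h in ("@a", "@c")]
total = conn.execute("SELECT COUNT(*) FROM influencers").fetchone()[0]
print(first, second, total)  # [True, True, False] [False, True] 3
```

In the second session the in-memory set is empty, so `@a` reaches the database, where the unique constraint silently drops it.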
Labels
Influencers are automatically labeled based on the crawl query:
- Crawling `#Cat` labels influencers as `cat`
- Crawling `#Dog` labels influencers as `dog`
- Labels are stored in the `influencer_labels` table
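
The mapping from query to label in the examples above is consistent with stripping the leading `#` and lowercasing the rest. A minimal sketch under that assumption follows; `label_from_query` is a hypothetical name, and the real labeling rule may handle multi-word queries differently.

```python
def label_from_query(query: str) -> str:
    """Hypothetical rule: drop the leading '#' and lowercase the remainder."""
    return query.strip().lstrip("#").lower()


labels = [label_from_query(q) for q in ("#Cat", "#Dog")]
print(labels)  # ['cat', 'dog']
```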
You can add or remove labels via the API:
```bash
curl -X POST http://localhost:5055/api/influencer/42/labels \
  -H "Content-Type: application/json" \
  -d '{"add": ["pet", "macro"], "remove": ["spam"]}'
```

Continuous Crawling
For long-running crawl sessions, use the crawl runner, which rotates through hashtags automatically:
```bash
# Rotates through all hashtags in the DB, picks least-recently-crawled
python3 -m marketing_system.bots.tiktok.crawl_runner
```

Or use the continuous scrape wrapper with auto-restart:
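```python
from marketing_system.skills.tiktok import load
from marketing_system.bots.common.adb import Device

dev = Device()
skill = load()
wf = skill.get_workflow("continuous_scrape", dev, duration=3600)  # 1 hour
wf.run()
```

Scheduling Crawls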
Set up recurring crawl jobs via the scheduler:
```bash
curl -X POST http://localhost:5055/api/schedules \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily cat crawl",
    "job_type": "crawl",
    "device": "L9AIB7603188953",
    "interval_minutes": 1440,
    "params": {"query": "#Cat", "tab": "top", "passes": 5},
    "max_duration_minutes": 30,
    "priority": 5
  }'
```

The scheduler runs on a 30-second tick. Only one job runs per phone at a time.
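
One way to picture a single scheduler tick is a function that selects at most one due job per idle phone. The sketch below is illustrative, not the scheduler's implementation: the `Job` class and `pick_jobs` function are hypothetical, and it assumes that a larger `priority` value wins, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    device: str
    interval_minutes: int
    priority: int
    last_run: Optional[float] = None  # epoch seconds; None means never run


def pick_jobs(jobs: list[Job], busy_devices: set[str], now: float) -> list[Job]:
    """Choose at most one due job per idle device for this tick."""
    chosen: dict[str, Job] = {}
    for job in jobs:
        if job.device in busy_devices:
            continue  # only one job runs per phone at a time
        due = (job.last_run is None
               or now - job.last_run >= job.interval_minutes * 60)
        if not due:
            continue
        best = chosen.get(job.device)
        # Assumption: the higher priority number is preferred.
        if best is None or job.priority > best.priority:
            chosen[job.device] = job
    return list(chosen.values())


now = 100_000.0
jobs = [
    Job("Daily cat crawl", "phone-1", 1440, 5),
    Job("Hourly dog crawl", "phone-1", 60, 8),
    Job("Busy phone crawl", "phone-2", 60, 9),
    Job("Ran recently", "phone-3", 60, 9, last_run=now - 600),
]
picked = pick_jobs(jobs, busy_devices={"phone-2"}, now=now)
print([job.name for job in picked])  # ['Hourly dog crawl']
```

A real scheduler would call something like this every 30 seconds and mark each device busy until its job finishes or exceeds `max_duration_minutes`.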
Dashboard
The Influencers tab shows all crawled profiles in a filterable grid:
- Search by handle or bio text
- Filter by label (cat, dog, pet, etc.)
- Filter by follower range
- Filter by outreach status (not contacted, DM sent, replied, etc.)
- View profile screenshots
- Click to see full outreach history
Available Tab Options
TikTok search supports these tabs, but only `top` and `users` are currently implemented:
| Tab | Status |
|---|---|
| `top` | Implemented |
| `users` | Implemented |
| `videos` | Not yet |
| `live` | Not yet |
| `photos` | Not yet |
| `sounds` | Not yet |
| `hashtags` | Not yet |
| `shop` | Not yet |
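
A caller can guard against unimplemented tabs before submitting a crawl. This is a minimal sketch based on the table above; `check_tab` is a hypothetical helper, and how the scraper itself reports an unsupported tab is not documented here.

```python
IMPLEMENTED_TABS = {"top", "users"}
KNOWN_TABS = IMPLEMENTED_TABS | {
    "videos", "live", "photos", "sounds", "hashtags", "shop",
}


def check_tab(tab: str) -> str:
    """Hypothetical guard: return the tab name if it can be crawled today."""
    if tab not in KNOWN_TABS:
        raise ValueError(f"unknown tab: {tab!r}")
    if tab not in IMPLEMENTED_TABS:
        raise NotImplementedError(f"tab {tab!r} is not implemented yet")
    return tab


tab = check_tab("users")
```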
Related
- Send DMs: outreach to crawled influencers
- Stealth Mode: avoid detection during long crawl sessions
- Scheduler: automate recurring crawls