
Scrape Profiles

The scraping pipeline discovers TikTok influencers by searching hashtags and user queries, enriches profiles with follower counts and bios, labels them by category, and stores everything in the database. Production-tested with 10,695+ influencers crawled.

From the command line:

# Crawl Top tab for #Cat videos (5 scroll passes)
python3 -m marketing_system.bots.tiktok.scraper "#Cat" \
    --tab top --date-filter "Past 24 hours" --passes 5

# Crawl Users tab for pet influencers
python3 -m marketing_system.bots.tiktok.scraper "#Cat" \
    --tab users --passes 3

Via the HTTP API:

curl -X POST http://localhost:5055/api/bot/start \
  -H "Content-Type: application/json" \
  -d '{
    "type": "crawl",
    "device": "L9AIB7603188953",
    "params": {
      "query": "#Cat",
      "tab": "top",
      "passes": 5,
      "date_filter": "Past 24 hours"
    }
  }'

From Python, via the skill workflow:

from marketing_system.skills.tiktok import load
from marketing_system.bots.common.adb import Device

dev = Device()
skill = load()

# Crawl via the skill workflow
wf = skill.get_workflow("crawl_hashtag", dev,
                        query="#Cat", passes=5, date_filter="Past 24 hours")
result = wf.run()

The hashtag crawl (--tab top) searches a hashtag, applies date and sort filters, scrolls through video tiles, and extracts the creator handle from each video.

Flow: Open search -> type hashtag -> submit -> Top tab -> apply filters -> scroll -> extract creators -> enrich profiles

Parameters:

| Param | Default | Description |
| --- | --- | --- |
| query | required | Hashtag to search (e.g., #Cat) |
| passes | 3 | Number of scroll passes |
| date_filter | None | Past 24 hours, This week, This month, Last 3 months, Last 6 months |
| sort | Relevance | Relevance, Like count, Date posted |
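
As a minimal sketch, the parameters above could be validated before being sent to the /api/bot/start endpoint shown earlier. The helper name build_crawl_payload is hypothetical, and passing sort inside params is an assumption: the request example on this page only shows query, tab, passes, and date_filter.

```python
import json

# Allowed values, copied from the parameter table above.
DATE_FILTERS = {"Past 24 hours", "This week", "This month",
                "Last 3 months", "Last 6 months"}
SORT_OPTIONS = {"Relevance", "Like count", "Date posted"}


def build_crawl_payload(device, query, passes=3, date_filter=None, sort="Relevance"):
    """Build a request body for a hashtag crawl (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    if passes < 1:
        raise ValueError("passes must be at least 1")
    if date_filter is not None and date_filter not in DATE_FILTERS:
        raise ValueError(f"unknown date_filter: {date_filter!r}")
    if sort not in SORT_OPTIONS:
        raise ValueError(f"unknown sort: {sort!r}")

    # Assumption: the API accepts "sort" alongside the documented params.
    params = {"query": query, "tab": "top", "passes": passes, "sort": sort}
    if date_filter is not None:
        params["date_filter"] = date_filter
    return {"type": "crawl", "device": device, "params": params}


body = build_crawl_payload("L9AIB7603188953", "#Cat", passes=5,
                           date_filter="Past 24 hours")
print(json.dumps(body, indent=2))
```

Rejecting an unknown filter string locally is cheaper than discovering it after the device has already opened the search screen.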

The users crawl (--tab users) searches a query, switches to the Users tab, and extracts profiles directly from the user list.

Flow: Open search -> type query -> submit -> Users tab -> scroll -> extract handles + follower counts

Parameters:

| Param | Default | Description |
| --- | --- | --- |
| query | required | Search query |
| passes | 3 | Number of scroll passes |
| min_followers | None | Minimum follower threshold |
| max_followers | None | Maximum follower threshold |
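
The follower thresholds can be read as an optional range check. The sketch below assumes both bounds are inclusive and that None means "no limit"; the function name and the sample handles are illustrative, not taken from the scraper.

```python
def within_follower_range(followers, min_followers=None, max_followers=None):
    """Return True if a follower count passes the optional thresholds."""
    if min_followers is not None and followers < min_followers:
        return False
    if max_followers is not None and followers > max_followers:
        return False
    return True


# Illustrative data only.
profiles = [
    {"handle": "@example_small", "followers": 800},
    {"handle": "@example_mid", "followers": 25_000},
    {"handle": "@example_large", "followers": 3_000_000},
]
kept = [p["handle"] for p in profiles
        if within_follower_range(p["followers"],
                                 min_followers=1_000, max_followers=100_000)]
print(kept)  # ['@example_mid']
```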

After discovering a handle, the scraper optionally visits the profile to collect:

  • Followers count
  • Following count
  • Total likes received
  • Bio text
  • Profile screenshot (saved to data/profile_screenshots/)

Enrichment runs inline during crawling — tap user -> scrape stats -> back to results.

Influencers are deduplicated at two levels:

  1. In-memory set: prevents revisiting the same handle within a single crawl session
  2. Database insert: INSERT OR IGNORE on the handle column (unique constraint), which skips handles that are already stored

This means you can run the same crawl multiple times safely. Only new influencers get added.
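
The two levels can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the scraper's actual code: the table name influencers, its single-column schema, and the function name store_handles are all hypothetical.

```python
import sqlite3


def store_handles(conn, handles, seen):
    """Insert unseen handles; return how many rows were actually added."""
    added = 0
    for handle in handles:
        if handle in seen:          # level 1: already handled this session
            continue
        seen.add(handle)
        cur = conn.execute(
            "INSERT OR IGNORE INTO influencers (handle) VALUES (?)", (handle,)
        )
        added += cur.rowcount       # level 2: 0 when the row already existed
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the unique constraint is what makes IGNORE take effect.
conn.execute("CREATE TABLE influencers (handle TEXT UNIQUE NOT NULL)")

first = store_handles(conn, ["@a", "@b", "@a"], seen=set())
second = store_handles(conn, ["@b", "@c"], seen=set())  # new session, same DB
print(first, second)  # 2 1
```

The in-memory set saves a device round trip for repeats within one run, while the unique constraint covers repeats across runs.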

Influencers are automatically labeled based on the crawl query:

  • Crawling #Cat labels influencers as cat
  • Crawling #Dog labels as dog
  • Labels are stored in the influencer_labels table
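
One plausible reading of the labeling rule is "strip the leading # and lowercase the rest". The sketch below implements only that reading; the function name is hypothetical, and how the real scraper treats multi-word or non-hashtag queries is not documented here.

```python
def label_from_query(query):
    """Derive a label from a crawl query, e.g. '#Cat' -> 'cat' (assumed rule)."""
    return query.strip().lstrip("#").lower()


print(label_from_query("#Cat"))  # cat
print(label_from_query("#Dog"))  # dog
```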

You can add or remove labels via the API:

curl -X POST http://localhost:5055/api/influencer/42/labels \
  -H "Content-Type: application/json" \
  -d '{"add": ["pet", "macro"], "remove": ["spam"]}'
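
The same request can be built from Python with the standard library. This sketch only constructs the request object, so it can be inspected without a running server; the helper name build_label_request is hypothetical, and the response format is not documented on this page.

```python
import json
import urllib.request


def build_label_request(influencer_id, add=(), remove=(),
                        base_url="http://localhost:5055"):
    """Build (but do not send) a POST request that edits an influencer's labels."""
    body = json.dumps({"add": list(add), "remove": list(remove)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/influencer/{influencer_id}/labels",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_label_request(42, add=["pet", "macro"], remove=["spam"])
print(req.get_method(), req.full_url)
# To send it against a running server: urllib.request.urlopen(req)
```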

For long-running crawl sessions, use the crawl runner which rotates through hashtags automatically:

# Rotates through all hashtags in the DB, picks least-recently-crawled
python3 -m marketing_system.bots.tiktok.crawl_runner
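
The "least-recently-crawled" rotation can be sketched as a minimum over last-crawl timestamps. The dictionary layout and the rule that never-crawled hashtags go first are assumptions made for illustration; the crawl runner's real storage and tie-breaking are not described here.

```python
def pick_next_hashtag(last_crawled):
    """Return the hashtag with the oldest crawl time; never-crawled ones first.

    last_crawled maps hashtag -> epoch seconds, or None if never crawled.
    """
    if not last_crawled:
        return None
    # (False, ...) sorts before (True, ...), so None entries win.
    return min(
        last_crawled,
        key=lambda tag: (last_crawled[tag] is not None, last_crawled[tag] or 0.0),
    )


state = {"#Cat": 1_700_000_300.0, "#Dog": 1_700_000_100.0, "#Pet": None}
print(pick_next_hashtag(state))  # #Pet
state["#Pet"] = 1_700_000_400.0
print(pick_next_hashtag(state))  # #Dog
```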

Or use the continuous scrape wrapper with auto-restart:

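from marketing_system.skills.tiktok import load
from marketing_system.bots.common.adb import Device
dev = Device()
skill = load()
wf = skill.get_workflow("continuous_scrape", dev, duration=3600)  # 3600 s = 1 hour
wf.run()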

Set up recurring crawl jobs via the scheduler:

curl -X POST http://localhost:5055/api/schedules \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily cat crawl",
    "job_type": "crawl",
    "device": "L9AIB7603188953",
    "interval_minutes": 1440,
    "params": {"query": "#Cat", "tab": "top", "passes": 5},
    "max_duration_minutes": 30,
    "priority": 5
  }'

The scheduler runs on a 30-second tick. Only one job runs per phone at a time.
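
A single tick under the one-job-per-phone rule might look like the sketch below. It is a simplified model, not the scheduler's implementation: the Job fields mirror the request body above, but the last_run field, the due_jobs function, and the choice that a larger priority value wins are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    device: str
    interval_minutes: int
    priority: int
    last_run: Optional[float] = None   # epoch seconds; None = never run


def due_jobs(jobs, busy_devices, now):
    """Pick at most one due job per idle device for this tick."""
    chosen = {}
    for job in jobs:
        if job.device in busy_devices:      # a job is already running there
            continue
        is_due = (job.last_run is None
                  or now - job.last_run >= job.interval_minutes * 60)
        if not is_due:
            continue
        best = chosen.get(job.device)
        # Assumption: a larger priority value wins on the same device.
        if best is None or job.priority > best.priority:
            chosen[job.device] = job
    return list(chosen.values())


jobs = [
    Job("Daily cat crawl", "phone-1", 1440, priority=5),
    Job("Daily dog crawl", "phone-1", 1440, priority=3),
    Job("Hourly users crawl", "phone-2", 60, priority=1, last_run=1000.0),
]
picked = due_jobs(jobs, busy_devices=set(), now=2000.0)
print([j.name for j in picked])  # ['Daily cat crawl']
```

A real loop would call a function like this every 30 seconds and add each started job's device to busy_devices until the job finishes or reaches max_duration_minutes.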

The Influencers tab shows all crawled profiles in a filterable grid:

  • Search by handle or bio text
  • Filter by label (cat, dog, pet, etc.)
  • Filter by follower range
  • Filter by outreach status (not contacted, DM sent, replied, etc.)
  • View profile screenshots
  • Click to see full outreach history

TikTok search supports these tabs, but only top and users are currently implemented:

| Tab | Status |
| --- | --- |
| top | Implemented |
| users | Implemented |
| videos | Not yet |
| live | Not yet |
| photos | Not yet |
| sounds | Not yet |
| hashtags | Not yet |
| shop | Not yet |

Related pages:

  • Send DMs: outreach to crawled influencers
  • Stealth Mode: avoid detection during long crawl sessions
  • Scheduler: automate recurring crawls