Scrapers
Scrapers are an essential part of Browserhub. They contain specific steps and properties to extract data from the web. On this page, we'll learn how to query scrapers.
The scraper model
The scraper model contains all the steps and properties of a scraper. You can get the scraper name, the schedule that has been set in the app, the proxy being used, and more.
Properties
- Name
id
- Type
- string
- Description
Unique identifier of the scraper.
- Name
name
- Type
- string
- Description
Name of the scraper.
- Name
steps
- Type
- array
- Description
Chronological steps to run the scraper.
- Name
pagination
- Type
- object
- Description
Pagination details of the scraper.
- Name
rows_limit
- Type
- integer
- Description
Maximum number of rows to extract.
- Name
schedule
- Type
- object
- Description
Schedule details of the scraper.
- Name
proxy
- Type
- object
- Description
Proxy details of the scraper.
- Name
webhook_url
- Type
- string
- Description
URL that will be called once the scraper run status is no longer running. The Browserhub API will send the run object to it with the POST method.
- Name
created_at
- Type
- string
- Description
ISO 8601 date and time of scraper creation.
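The properties above can be mirrored in client code as a small typed record. The following is a minimal sketch, not an official Browserhub client: the Scraper class, its from_dict helper, and parse_timestamp are hypothetical names, and treating every field other than id, name, and created_at as optional is an assumption (the reference only shows webhook_url as null).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2023-03-11T04:13:19Z."""
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Scraper:
    """Hypothetical client-side mirror of the scraper model."""

    id: str
    name: str
    steps: list[dict[str, Any]]
    pagination: Optional[dict[str, Any]]
    rows_limit: Optional[int]
    schedule: Optional[dict[str, Any]]
    proxy: Optional[dict[str, Any]]
    webhook_url: Optional[str]
    created_at: datetime

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Scraper":
        """Build a Scraper from one decoded JSON object."""
        return cls(
            id=data["id"],
            name=data["name"],
            steps=data.get("steps", []),
            # Assumption: these fields may be null or absent.
            pagination=data.get("pagination"),
            rows_limit=data.get("rows_limit"),
            schedule=data.get("schedule"),
            proxy=data.get("proxy"),
            webhook_url=data.get("webhook_url"),
            created_at=parse_timestamp(data["created_at"]),
        )
```

Nested objects such as schedule and proxy are kept as plain dictionaries here because their full set of keys is not listed in this reference.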
List all scrapers
This endpoint allows you to retrieve a paginated list of all your scrapers. By default, at most 50 scrapers are returned per page.
Optional attributes
- Name
page
- Type
- integer
- Description
Page number to request. If omitted, the API returns the first page.
Request
curl -G https://api.browserhub.io/v1/scrapers \
-H "Authorization: Bearer {API key}"
Response
{
"next_page": 2,
"data": [
{
"id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
"name": "Get featured products from Hiut Denim",
"steps": [
{
"id": "d92d342d-d9b4-4182-ae2f-3b67695b691f",
"type": "visit",
"url": "https://hiutdenim.co.uk"
},
{
"id": "18ec387f-f101-4dca-afa4-2890068750f4",
"type": "click",
"field_name": "Search Button"
},
{
"id": "de169090-01ba-4dce-a991-363ebe30855f",
// ...
}
],
"pagination": {
"type": "continuous_scroll"
},
"rows_limit": 30,
"schedule": {
"interval": "1 hour",
"next_run": "2023-03-12T00:35:48Z"
},
"proxy": {
"type": "data_center",
"country": "United States"
},
"webhook_url": null,
"created_at": "2023-03-11T04:13:19Z"
},
{
"id": "fdd28450-3886-442c-9be1-b1d69721a785",
// ...
}
]
}
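To collect every scraper rather than a single page, a client can keep requesting pages until the response no longer points to another one. The sketch below uses only the Python standard library. The helper names fetch_page and iter_scrapers are hypothetical, and it assumes that next_page is null or absent on the last page, which this reference does not state explicitly.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator, Optional

API_BASE = "https://api.browserhub.io/v1"


def fetch_page(api_key: str, page: int) -> dict[str, Any]:
    """Request one page of the scraper list and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/scrapers?page={page}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_scrapers(
    fetch: Callable[[int], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield scrapers from every page, following next_page.

    fetch takes a page number and returns the decoded response body.
    Assumption: next_page is null or missing on the final page.
    """
    page: Optional[int] = 1
    while page is not None:
        body = fetch(page)
        yield from body.get("data", [])
        page = body.get("next_page")
```

Passing the fetch function in as a parameter keeps the paging logic separate from the HTTP call. With a real key, the loop could be driven by iter_scrapers(lambda page: fetch_page("YOUR_API_KEY", page)).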
Retrieve a scraper
This endpoint allows you to retrieve a scraper by providing the scraper id.
Request
curl -G https://api.browserhub.io/v1/scrapers/f4fcf5a4-730d-4dab-b0cb-c7433810266a \
-H "Authorization: Bearer {API key}"
Response
{
"id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
"name": "Get featured products from Hiut Denim",
"steps": [
{
"id": "d92d342d-d9b4-4182-ae2f-3b67695b691f",
"type": "visit",
"url": "https://hiutdenim.co.uk"
},
{
"id": "18ec387f-f101-4dca-afa4-2890068750f4",
"type": "click",
"field_name": "Search Button"
},
{
"id": "de169090-01ba-4dce-a991-363ebe30855f",
// ...
}
],
"pagination": {
"type": "continuous_scroll"
},
"rows_limit": 30,
"schedule": {
"interval": "1 hour",
"next_run": "2023-03-12T00:35:48Z"
},
"proxy": {
"type": "data_center",
"country": "United States"
},
"webhook_url": null,
"created_at": "2023-03-11T04:13:19Z"
}
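The same request can be made from Python, and the schedule in the response can be turned into a timezone-aware datetime. This is a minimal sketch using only the standard library. The names scraper_url, get_scraper, and next_run are hypothetical, and it assumes that schedule may be null for scrapers that have no schedule set.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime
from typing import Any, Optional

API_BASE = "https://api.browserhub.io/v1"


def scraper_url(scraper_id: str) -> str:
    """Build the URL for a single scraper, escaping the ID defensively."""
    return f"{API_BASE}/scrapers/{urllib.parse.quote(scraper_id, safe='')}"


def get_scraper(api_key: str, scraper_id: str) -> dict[str, Any]:
    """Retrieve one scraper by ID and return the decoded JSON body."""
    request = urllib.request.Request(
        scraper_url(scraper_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def next_run(scraper: dict[str, Any]) -> Optional[datetime]:
    """Return the scheduled next run as a datetime, or None if unscheduled."""
    # Assumption: schedule can be null or lack next_run.
    schedule = scraper.get("schedule") or {}
    raw = schedule.get("next_run")
    if raw is None:
        return None
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))
```

For the response shown above, next_run would return 00:35:48 UTC on 12 March 2023.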