Scrapers

Scrapers are an essential part of Browserhub. Each scraper defines the steps and properties used to extract data from the web. On this page, we'll learn how to query scrapers through the API.

The scraper model

The scraper model contains all the steps and properties of a scraper. From it you can read the scraper's name, the schedule set in the app, the proxy in use, and more.

Properties

id (string): Unique identifier of the scraper.
name (string): Name of the scraper.
steps (array): Steps the scraper runs, in chronological order.
pagination (object): Pagination details of the scraper.
rows_limit (integer): Maximum number of rows to extract.
schedule (object): Schedule details of the scraper.
proxy (object): Proxy details of the scraper.
webhook_url (string or null): URL that is called once a scraper run is no longer running. The Browserhub API sends the run object to this URL in a POST request.
created_at (string): ISO 8601 date and time at which the scraper was created.
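The webhook_url property describes a push notification. Once a run stops running, Browserhub POSTs the run object to that URL. Below is a minimal receiver sketched with Python's standard-library http.server. It assumes the body is JSON and that any 2xx reply acknowledges delivery. The run object's fields and Browserhub's retry behaviour are not documented on this page, so the handler stores the decoded object untouched. The names WebhookHandler, parse_run_body, and serve are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


def parse_run_body(body: bytes) -> Optional[dict]:
    """Decode the POSTed run object; return None if it is not a JSON object."""
    try:
        run = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None
    return run if isinstance(run, dict) else None


class WebhookHandler(BaseHTTPRequestHandler):
    # Hypothetical in-memory store; a real receiver would persist or queue runs.
    runs = []

    def do_POST(self):
        # Read exactly Content-Length bytes of the request body.
        length = int(self.headers.get("Content-Length", 0))
        run = parse_run_body(self.rfile.read(length))
        if run is None:
            self.send_response(400)
        else:
            WebhookHandler.runs.append(run)
            # Assumption: an empty 2xx reply is enough to acknowledge delivery.
            self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

The URL where this server is publicly reachable would then be set as the scraper's webhook_url. Replying quickly and doing heavy processing elsewhere keeps the callback from timing out.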

GET /v1/scrapers

List all scrapers

This endpoint retrieves a paginated list of all your scrapers. By default, each page contains at most 50 scrapers.

Optional attributes

page (integer): Page number to request. If omitted, the API returns the first page.
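The same list request can be built from Python with only the standard library. This is a sketch, not an official client. It assumes the page attribute travels as a query-string parameter, which is how `curl -G` sends data. The names BASE_URL and build_list_request are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.browserhub.io/v1"


def build_list_request(api_key: str, page: Optional[int] = None) -> Request:
    """Build a GET /v1/scrapers request; omitting page asks for the first page."""
    url = f"{BASE_URL}/scrapers"
    if page is not None:
        # Assumption: page is passed in the query string (e.g. ?page=2).
        url += "?" + urlencode({"page": page})
    # Bearer authentication, matching the curl example's Authorization header.
    return Request(url, headers={"Authorization": f"Bearer {api_key}"}, method="GET")
```

Passing the returned Request to urllib.request.urlopen and decoding the body with json.load should yield a JSON object with next_page and data keys.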

Request

curl -G https://api.browserhub.io/v1/scrapers \
  -H "Authorization: Bearer {API key}"

Response

{
  "next_page": 2,
  "data": [
    {
      "id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
      "name": "Get featured products from Hiut Denim",
      "steps": [
        {
          "id": "d92d342d-d9b4-4182-ae2f-3b67695b691f",
          "type": "visit",
          "url": "https://hiutdenim.co.uk"
        },
        {
          "id": "18ec387f-f101-4dca-afa4-2890068750f4",
          "type": "click",
          "field_name": "Search Button"
        },
        {
          "id": "de169090-01ba-4dce-a991-363ebe30855f",
          // ...
        }
      ],
      "pagination": {
        "type": "continuous_scroll"
      },
      "rows_limit": 30,
      "schedule": {
        "interval": "1 hour",
        "next_run": "2023-03-12T00:35:48Z"
      },
      "proxy": {
        "type": "data_center",
        "country": "United States"
      },
      "webhook_url": null,
      "created_at": "2023-03-11T04:13:19Z"
    },
    {
      "id": "fdd28450-3886-442c-9be1-b1d69721a785",
      // ...
    }
  ]
}
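Because results are paginated, collecting every scraper means following next_page until there is no further page. A minimal sketch follows. It assumes next_page is null or absent on the last page, which the sample response does not show directly. The function name iter_scrapers and the fetch_page callback are hypothetical. fetch_page stands for any function that performs the GET /v1/scrapers request for a given page number and returns the decoded JSON body.

```python
from typing import Callable, Iterator, Optional


def iter_scrapers(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every scraper object, following next_page from page 1 onward."""
    page: Optional[int] = 1
    while page is not None:
        body = fetch_page(page)
        # Each page lists its scrapers under "data".
        yield from body.get("data", [])
        # Assumption: next_page is null or missing once the last page is reached.
        page = body.get("next_page")
```

Keeping the HTTP call behind a callback makes the loop easy to test offline and lets callers add their own retries or rate limiting.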

GET /v1/scrapers/:id

Retrieve a scraper

This endpoint allows you to retrieve a scraper by providing the scraper id.
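The retrieve call can also be made from Python using only the standard library. This is a sketch under stated assumptions. The scraper id is percent-encoded as a single path segment, and the HTTP transport is injectable so the request logic can be checked offline. The names build_retrieve_request, get_scraper, and _urlopen_bytes are hypothetical.

```python
import json
from typing import Callable
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://api.browserhub.io/v1"


def build_retrieve_request(api_key: str, scraper_id: str) -> Request:
    """Build a GET /v1/scrapers/:id request with Bearer authentication."""
    # Percent-encode the id so it always forms exactly one path segment.
    url = f"{BASE_URL}/scrapers/{quote(scraper_id, safe='')}"
    return Request(url, headers={"Authorization": f"Bearer {api_key}"}, method="GET")


def _urlopen_bytes(req: Request) -> bytes:
    """Default transport: perform the HTTP request and return the raw body."""
    with urlopen(req) as resp:
        return resp.read()


def get_scraper(api_key: str, scraper_id: str,
                fetch: Callable[[Request], bytes] = _urlopen_bytes) -> dict:
    """Retrieve one scraper and decode its JSON representation."""
    return json.loads(fetch(build_retrieve_request(api_key, scraper_id)))
```

With the default transport, get_scraper("YOUR_API_KEY", "f4fcf5a4-730d-4dab-b0cb-c7433810266a") would issue the same request as the curl example. HTTP errors surface as urllib.error.HTTPError.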

Request

curl -G https://api.browserhub.io/v1/scrapers/f4fcf5a4-730d-4dab-b0cb-c7433810266a \
  -H "Authorization: Bearer {API key}"

Response

{
  "id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
  "name": "Get featured products from Hiut Denim",
  "steps": [
    {
      "id": "d92d342d-d9b4-4182-ae2f-3b67695b691f",
      "type": "visit",
      "url": "https://hiutdenim.co.uk"
    },
    {
      "id": "18ec387f-f101-4dca-afa4-2890068750f4",
      "type": "click",
      "field_name": "Search Button"
    },
    {
      "id": "de169090-01ba-4dce-a991-363ebe30855f",
      // ...
    }
  ],
  "pagination": {
    "type": "continuous_scroll"
  },
  "rows_limit": 30,
  "schedule": {
    "interval": "1 hour",
    "next_run": "2023-03-12T00:35:48Z"
  },
  "proxy": {
    "type": "data_center",
    "country": "United States"
  },
  "webhook_url": null,
  "created_at": "2023-03-11T04:13:19Z"
}
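A scraper object like the one above can be loaded into a typed model. The following is a minimal sketch built from the properties listed for the scraper model. It assumes pagination, rows_limit, schedule, proxy, and webhook_url may be null or absent. The names Scraper, parse_timestamp, and step_types are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2023-03-11T04:13:19Z."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Scraper:
    id: str
    name: str
    created_at: datetime
    steps: list = field(default_factory=list)
    pagination: Optional[dict] = None
    rows_limit: Optional[int] = None
    schedule: Optional[dict] = None
    proxy: Optional[dict] = None
    webhook_url: Optional[str] = None

    @classmethod
    def from_json(cls, data: dict) -> "Scraper":
        """Build a Scraper from a decoded API response object."""
        return cls(
            id=data["id"],
            name=data["name"],
            created_at=parse_timestamp(data["created_at"]),
            steps=data.get("steps", []),
            pagination=data.get("pagination"),
            rows_limit=data.get("rows_limit"),
            schedule=data.get("schedule"),
            proxy=data.get("proxy"),
            webhook_url=data.get("webhook_url"),
        )

    def step_types(self) -> list:
        """Return the type of each step, in run order."""
        return [step.get("type") for step in self.steps]
```

Steps differ by type (the sample shows visit and click), so they are kept as plain dicts here rather than modelled one class per step type.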