Scrapers

Scrapers are an essential part of Browserhub. They contain specific steps and properties to extract data from the web. On this page, we'll learn how to query scrapers.

The scraper model

The scraper model contains all the steps and properties of a scraper. You can get the scraper name, the schedule that has been set in the app, the proxy being used, and more.

Properties

  • Name
    id
    Type
    string
    Description

    Unique identifier of the scraper.

  • Name
    name
    Type
    string
    Description

    Name of the scraper.

  • Name
    steps
    Type
    array
    Description

    Chronological steps to run the scraper.

  • Name
    pagination
    Type
    object
    Description

    Pagination details of the scraper.

  • Name
    rows_limit
    Type
    integer
    Description

    Maximum number of rows to extract.

  • Name
    schedule
    Type
    object
    Description

    Schedule details of the scraper.

  • Name
    proxy
    Type
    object
    Description

    Proxy details of the scraper.

  • Name
    webhook_url
    Type
    string
    Description

    URL that is called once a scraper run's status is no longer running. The Browserhub API sends the run object to this URL in a POST request.

  • Name
    created_at
    Type
    string
    Description

    ISO 8601 date and time of scraper creation.
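
The properties above can be mirrored in client code as a small typed record. The following is a minimal sketch, not an official client: the `Scraper` class and its `from_dict` helper are hypothetical names, and treating every field other than `id`, `name`, `steps`, and `created_at` as optional is an assumption (the sample responses on this page only show `webhook_url` as null).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class Scraper:
    """Hypothetical client-side mirror of the scraper model."""

    id: str
    name: str
    steps: list[dict[str, Any]]
    pagination: Optional[dict[str, Any]]
    rows_limit: Optional[int]
    schedule: Optional[dict[str, Any]]
    proxy: Optional[dict[str, Any]]
    webhook_url: Optional[str]
    created_at: datetime

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Scraper":
        # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
        created = datetime.fromisoformat(raw["created_at"].replace("Z", "+00:00"))
        return cls(
            id=raw["id"],
            name=raw["name"],
            steps=list(raw.get("steps") or []),
            # Assumption: these fields may be null or missing.
            pagination=raw.get("pagination"),
            rows_limit=raw.get("rows_limit"),
            schedule=raw.get("schedule"),
            proxy=raw.get("proxy"),
            webhook_url=raw.get("webhook_url"),
            created_at=created,
        )
```

Parsing `created_at` into a timezone-aware `datetime` makes it straightforward to sort or filter scrapers by creation time.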


GET /v1/scrapers

List all scrapers

This endpoint allows you to retrieve a paginated list of all your scrapers. By default, a maximum of 50 scrapers are shown per page.

Optional attributes

  • Name
    page
    Type
    integer
    Description

    Page number to request. If omitted, the API returns the first page.

Request

GET
/v1/scrapers
curl -G https://api.browserhub.io/v1/scrapers \
  -H "Authorization: Bearer {API key}"

Response

{
  "next_page": 2,
  "data": [
    {
      "id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
      "name": "Get featured products from Hiut Denim",
      "steps": [
        {
          "id": "d92d342d-d9b4-4182-ae2f-3b67695b691f",
          "type": "visit",
          "url": "https://hiutdenim.co.uk"
        },
        {
          "id": "18ec387f-f101-4dca-afa4-2890068750f4",
          "type": "click",
          "field_name": "Search Button"
        },
        {
          "id": "de169090-01ba-4dce-a991-363ebe30855f",
          // ...
        }
      ],
      "pagination": {
        "type": "continuous_scroll"
      },
      "rows_limit": 30,
      "schedule": {
        "interval": "1 hour",
        "next_run": "2023-03-12T00:35:48Z"
      },
      "proxy": {
        "type": "data_center",
        "country": "United States"
      },
      "webhook_url": null,
      "created_at": "2023-03-11T04:13:19Z"
    },
    {
      "id": "fdd28450-3886-442c-9be1-b1d69721a785",
      // ...
    }
  ]
}
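
To collect every scraper rather than a single page, a client can keep requesting pages until the response no longer points to a next one. The sketch below uses only the Python standard library. The names `fetch_page` and `iter_scrapers` are hypothetical, and it assumes that `next_page` is null or absent on the last page, which this page does not state explicitly.

```python
import json
import urllib.request
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode

API_BASE = "https://api.browserhub.io/v1"


def fetch_page(api_key: str, page: int) -> dict[str, Any]:
    """Request one page of scrapers and decode the JSON body."""
    url = f"{API_BASE}/scrapers?{urlencode({'page': page})}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_scrapers(fetch: Callable[[int], dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield scrapers from every page, following next_page.

    Assumption: next_page is null or missing once the last page is reached.
    """
    page: Optional[int] = 1
    while page is not None:
        body = fetch(page)
        yield from body.get("data", [])
        page = body.get("next_page")
```

Passing the fetch function in as an argument keeps the paging loop independent of the HTTP call. With a real key, it could be used as `iter_scrapers(lambda page: fetch_page("YOUR_API_KEY", page))`.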

GET /v1/scrapers/:id

Retrieve a scraper

This endpoint allows you to retrieve a single scraper by providing its ID.

Request

GET
/v1/scrapers/f4fcf5a4-730d-4dab-b0cb-c7433810266a
curl -G https://api.browserhub.io/v1/scrapers/f4fcf5a4-730d-4dab-b0cb-c7433810266a \
  -H "Authorization: Bearer {API key}"

Response

{
  "id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
  "name": "Get featured products from Hiut Denim",
  "steps": [
    {
      "id": "d92d342d-d9b4-4182-ae2f-3b67695b691f",
      "type": "visit",
      "url": "https://hiutdenim.co.uk"
    },
    {
      "id": "18ec387f-f101-4dca-afa4-2890068750f4",
      "type": "click",
      "field_name": "Search Button"
    },
    {
      "id": "de169090-01ba-4dce-a991-363ebe30855f",
      // ...
    }
  ],
  "pagination": {
    "type": "continuous_scroll"
  },
  "rows_limit": 30,
  "schedule": {
    "interval": "1 hour",
    "next_run": "2023-03-12T00:35:48Z"
  },
  "proxy": {
    "type": "data_center",
    "country": "United States"
  },
  "webhook_url": null,
  "created_at": "2023-03-11T04:13:19Z"
}
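
The same request can be made from Python without third-party packages. This is a minimal sketch under the same assumptions as the curl example above (bearer-token authentication against `https://api.browserhub.io/v1`). The function names `build_scraper_request` and `get_scraper` are hypothetical and are not part of any Browserhub library.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

API_BASE = "https://api.browserhub.io/v1"


def build_scraper_request(api_key: str, scraper_id: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a single scraper."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    url = f"{API_BASE}/scrapers/{quote(scraper_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def get_scraper(api_key: str, scraper_id: str) -> dict[str, Any]:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(build_scraper_request(api_key, scraper_id)) as resp:
        return json.load(resp)
```

Building the request separately from sending it makes the URL and headers easy to inspect or log before any network call is made.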