Runs

Runs are the logs of scraper executions. A run is created immediately before its scraper begins executing. On this page, we'll learn how to query and create runs.

The run model

The run model contains all the events and results of a scraper execution, including the status of the scraper run, the credits used to run the scraper, and more.

Properties

  • Name
    id
    Type
    string
    Description

    Unique identifier of the run.

  • Name
    scraper_id
    Type
    string
    Description

    Unique identifier of the executed scraper.

  • Name
    source
    Type
    string
    Description

    Source from which the run was created: the app, the scheduler, the API, or an integration.

  • Name
    status
    Type
    string
    Description

    Current state of the scraper run. It's either running, successful, or failed.

  • Name
    credits_used
    Type
    integer
    Description

    Total credits used to run the scraper.

  • Name
    steps
    Type
    array
    Description

    Steps that the scraper runs, in chronological order.

  • Name
    rows_limit
    Type
    integer
    Description

    Maximum number of rows to extract.

  • Name
    webhook_url
    Type
    string
    Description

    URL that is called once the scraper run's status is no longer running. The Browserhub API sends the run object to it in a POST request.

  • Name
    result_rows
    Type
    array
    Description

    Extracted rows from the scraper execution.

  • Name
    created_at
    Type
    string
    Description

    ISO 8601 date and time of run creation.
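
As an illustration of the properties above, here is a minimal Python sketch that loads a run object into a typed structure. The `Run` class and its `is_finished` helper are hypothetical names introduced for this example and are not part of the Browserhub API. The sketch assumes that `webhook_url` and `result_rows` may be null, as they are in the sample responses on this page.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Run:
    """Typed view of a run object; field names follow the properties above."""

    id: str
    scraper_id: str
    source: str
    status: str
    credits_used: int
    steps: list
    rows_limit: int
    webhook_url: Optional[str]
    result_rows: Optional[list]
    created_at: datetime

    @classmethod
    def from_dict(cls, data: dict) -> "Run":
        # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
        created = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
        return cls(
            id=data["id"],
            scraper_id=data["scraper_id"],
            source=data["source"],
            status=data["status"],
            credits_used=data["credits_used"],
            steps=data["steps"],
            rows_limit=data["rows_limit"],
            webhook_url=data.get("webhook_url"),
            result_rows=data.get("result_rows"),
            created_at=created,
        )

    @property
    def is_finished(self) -> bool:
        # A run is either running, successful, or failed.
        return self.status != "running"
```

Calling `Run.from_dict(...)` on a decoded response body gives attribute access such as `run.status` and `run.created_at`.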


GET /v1/runs

List all runs

This endpoint allows you to retrieve a paginated list of all your runs. By default, a maximum of 50 runs are shown per page.

Optional attributes

  • Name
    scraper_id
    Type
    string
    Description

    Unique identifier of the executed scraper. Use it to retrieve only the runs that were executed by a particular scraper.

  • Name
    page
    Type
    integer
    Description

    Page number to request. If you omit it, the API assumes that you're requesting the first page.

Request

GET
/v1/runs
curl -G https://api.browserhub.io/v1/runs \
  -H "Authorization: Bearer {API key}" \
  -d scraper_id=f4fcf5a4-730d-4dab-b0cb-c7433810266a

Response

{
  "next_page": 2,
  "data": [
    {
      "id": "3fda0ace-4395-4537-9da7-16ee529fee44",
      "scraper_id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
      "source": "app",
      "status": "successful",
      "credits_used": 2,
      "steps": [
        {
          "type": "visit",
          "url": "https://hiutdenim.co.uk"
        },
        {
          "type": "click",
          "field_name": "Search Button"
        },
        {
          "type": "get_list",
          // ...
        }
      ],
      "rows_limit": 30,
      "webhook_url": null,
      "result_rows": [
        {
          "product_name": "The Anderson - Organic Denim",
          "price": "$230"
        },
        {
          "product_name": "Shorts - Organic Denim",
          // ...
        }
      ],
      "created_at": "2023-03-16T10:21:45Z"
    },
    {
      "id": "b6b8de2c-b56f-4d19-9e68-ce8fc0d99b15",
      "scraper_id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
      // ...
    }
  ]
}
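
The `next_page` field in the response above suggests how to walk through every page. The following Python sketch does so with the standard library only. The function names are hypothetical, and the sketch assumes that `next_page` is null or absent on the last page, which this page does not state.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

API_BASE = "https://api.browserhub.io/v1"


def fetch_runs_page(api_key: str, page: int, scraper_id: Optional[str] = None) -> dict:
    """GET /v1/runs for one page and return the decoded JSON body."""
    params = {"page": page}
    if scraper_id is not None:
        params["scraper_id"] = scraper_id
    request = urllib.request.Request(
        f"{API_BASE}/runs?{urllib.parse.urlencode(params)}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_runs(fetch_page: Callable[[int], dict], max_pages: int = 100) -> Iterator[dict]:
    """Yield runs page by page, following next_page until it is null or missing."""
    page = 1
    for _ in range(max_pages):  # hard cap as a safety net against endless loops
        body = fetch_page(page)
        yield from body.get("data", [])
        next_page = body.get("next_page")
        if not next_page:  # assumption: the last page carries no next_page value
            return
        page = next_page
```

A call such as `iter_runs(lambda page: fetch_runs_page(api_key, page, scraper_id))` would then yield every run of one scraper. Passing the fetch function in as an argument keeps the paging logic testable without network access.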

POST /v1/runs

Create a run

This endpoint allows you to run a scraper by providing the scraper id. You can also modify the run by providing the optional attributes.

Required attributes

  • Name
    scraper_id
    Type
    string
    Description

    Unique identifier of the scraper that you want to run.

Optional attributes

  • Name
    steps
    Type
    array
    Description

    Steps to modify for this run. Only the following step types can be modified:

    • visit: The starting URL that the scraper goes to. Send a step object with id and url attributes to modify this kind of step.
    • write_input: A step that writes text into an input field. Send a step object with id and value attributes to modify this kind of step.
    • select_dropdown: A step that selects a dropdown value. Send a step object with id and value attributes to modify this kind of step.

    If you try to modify a step of any other type, you'll get a 400 Bad Request error.

  • Name
    rows_limit
    Type
    integer
    Description

    Modified maximum number of rows to extract.

  • Name
    webhook_url
    Type
    string
    Description

    URL that is called once the scraper run's status is no longer running. The Browserhub API sends the run object to it in a POST request.
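
To show how the attributes above might fit together, here is a Python sketch that assembles a request payload with step modifications. The helper names and the step ids are hypothetical. Whether the API accepts this payload as a JSON body is also an assumption, since the curl example on this page sends only a form-encoded `scraper_id`.

```python
import json
from typing import Optional


def visit_override(step_id: str, url: str) -> dict:
    """Modification for a visit step: its id plus the new starting url."""
    return {"id": step_id, "url": url}


def value_override(step_id: str, value: str) -> dict:
    """Modification for a write_input or select_dropdown step: id plus new value."""
    return {"id": step_id, "value": value}


def build_run_payload(
    scraper_id: str,
    steps: Optional[list] = None,
    rows_limit: Optional[int] = None,
    webhook_url: Optional[str] = None,
) -> dict:
    """Collect the required scraper_id and any optional attributes that were set."""
    if not scraper_id:
        raise ValueError("scraper_id is required")
    payload = {"scraper_id": scraper_id}
    if steps:
        payload["steps"] = list(steps)
    if rows_limit is not None:
        payload["rows_limit"] = rows_limit
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload


payload = build_run_payload(
    "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
    steps=[
        visit_override("hypothetical-step-id-1", "https://example.com"),
        value_override("hypothetical-step-id-2", "denim"),
    ],
    rows_limit=10,
)
print(json.dumps(payload, indent=2))
```

Attributes that are left unset are omitted from the payload entirely, so the scraper's own settings stay in effect for them.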

Request

POST
/v1/runs
curl https://api.browserhub.io/v1/runs \
  -H "Authorization: Bearer {API key}" \
  -d scraper_id=f4fcf5a4-730d-4dab-b0cb-c7433810266a

Response

{
  "id": "3fda0ace-4395-4537-9da7-16ee529fee44",
  "scraper_id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
  "source": "api",
  "status": "running",
  "credits_used": 0,
  "steps": [
    {
      "type": "visit",
      "url": "https://hiutdenim.co.uk"
    },
    {
      "type": "click",
      "field_name": "Search Button"
    },
    {
      "type": "get_list",
      // ...
    }
  ],
  "rows_limit": 30,
  "webhook_url": null,
  "result_rows": null,
  "created_at": "2023-03-16T10:21:45Z"
}
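
The response above arrives while the run is still running, so `result_rows` is null. If a `webhook_url` is set, the finished run object is sent to it with a POST request. Below is a minimal Python sketch of a receiver built on the standard library. It assumes that the run object arrives as a JSON body and that an empty 204 response is an acceptable acknowledgement, neither of which this page specifies. All names in it are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_run(body: bytes) -> dict:
    """Decode a POSTed run object and keep the fields a receiver usually needs."""
    run = json.loads(body)
    rows = run.get("result_rows") or []  # result_rows may be null
    return {"id": run["id"], "status": run["status"], "row_count": len(rows)}


class RunWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            summary = summarize_run(self.rfile.read(length))
        except (ValueError, KeyError, AttributeError, TypeError):
            # Body was not a JSON object with the expected fields.
            self.send_response(400)
            self.end_headers()
            return
        print(summary)
        self.send_response(204)  # assumption: an empty success reply is enough
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen for webhook calls until interrupted."""
    HTTPServer(("", port), RunWebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. The URL under which that server is publicly reachable is what would be passed as `webhook_url`.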

GET /v1/runs/:id

Retrieve a run

This endpoint allows you to retrieve a run by providing the run id.

Request

GET
/v1/runs/3fda0ace-4395-4537-9da7-16ee529fee44
curl -G https://api.browserhub.io/v1/runs/3fda0ace-4395-4537-9da7-16ee529fee44 \
  -H "Authorization: Bearer {API key}"

Response

{
  "id": "3fda0ace-4395-4537-9da7-16ee529fee44",
  "scraper_id": "f4fcf5a4-730d-4dab-b0cb-c7433810266a",
  "source": "app",
  "status": "successful",
  "credits_used": 2,
  "steps": [
    {
      "type": "visit",
      "url": "https://hiutdenim.co.uk"
    },
    {
      "type": "click",
      "field_name": "Search Button"
    },
    {
      "type": "get_list",
      // ...
    }
  ],
  "rows_limit": 30,
  "webhook_url": null,
  "result_rows": [
    {
      "product_name": "The Anderson - Organic Denim",
      "price": "$230"
    },
    {
      "product_name": "Shorts - Organic Denim",
      // ...
    }
  ],
  "created_at": "2023-03-16T10:21:45Z"
}
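
A common use of this endpoint is to poll a newly created run until its status is no longer running. The Python sketch below does that with the standard library. The function names, the 5-second interval, and the 300-second timeout are arbitrary choices made for this example, not values documented by Browserhub.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.browserhub.io/v1"


def fetch_run(api_key: str, run_id: str) -> dict:
    """GET /v1/runs/:id and return the decoded run object."""
    request = urllib.request.Request(
        f"{API_BASE}/runs/{run_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_run(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() repeatedly until the run is successful or failed."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch()
        if run["status"] != "running":
            return run
        if time.monotonic() + interval > deadline:
            raise TimeoutError("run was still running when the timeout expired")
        sleep(interval)
```

For example, `wait_for_run(lambda: fetch_run(api_key, run_id))` returns the final run object, whose `result_rows` can then be read. When a `webhook_url` is set on the run, polling like this is unnecessary.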