Web scraper

Extract data from websites and analyze it with the Web scraper.

Description

What it does: The Web scraper component collects and extracts information from a web page at a given URL. By combining the Web scraper with an AI model, you can filter for the content that interests you on a site or have specific tasks carried out on the page's content.

The Web scraper component has the identifier scraper-X, where X is the instance number of the component.

When to use it? Imagine you're tracking news about a certain topic or industry. You can use the Web scraper component to automatically fetch the latest articles from news websites related to your topic, so you stay up to date without browsing them manually every day.

Component settings

Parameter Name
Description

URL

This parameter specifies the URL or collection of URLs to source data from.

Content output format

Options:

  • HTML

  • Markdown

  • Plaintext
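To give a feel for how the output formats differ, here is a minimal sketch that reduces an HTML string to plain text using only Python's standard library. It is not Diaflow's implementation; the names `TextExtractor` and `html_to_plaintext` are hypothetical, and the real Plaintext output may handle whitespace and hidden elements differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores tags, scripts, and styles (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_plaintext(html: str) -> str:
    """Hypothetical helper: returns the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The HTML option would keep the markup as-is, while a plain-text style output keeps only the readable text, which is often easier for an AI model to work with.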

Advanced configurations

Options
Description

Enable caching

This option determines whether the results of the component are cached. When it is enabled, Diaflow reuses the previously computed component output on the next run of the Flow, as long as the inputs have not changed.

Caching time

Only applicable if the "Enable caching" option has been enabled. This parameter controls how long Diaflow will wait before automatically clearing the cache.
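The caching behavior described above can be pictured with a small sketch. It is an illustration under stated assumptions, not Diaflow's actual implementation: the class `TTLCache`, the method `get_or_compute`, and the choice of URL plus output format as the cache key are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Input-keyed cache whose entries expire after ttl_seconds (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # Maps (url, output_format) to (time stored, cached output).
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get_or_compute(
        self, url: str, output_format: str, compute: Callable[[str, str], str]
    ) -> str:
        key = (url, output_format)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None:
            stored_at, value = entry
            if now - stored_at < self.ttl_seconds:
                # Same inputs and the entry has not expired: reuse the output.
                return value
        # Inputs changed or the entry expired: recompute and store again.
        value = compute(url, output_format)
        self._entries[key] = (now, value)
        return value
```

In this sketch, a second run with the same URL and format inside the caching time returns the stored output without scraping again, while a changed input or an expired entry triggers a fresh scrape.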

Use case

Here is a simple use case in which the Web scraper component is used to extract, from a web page, the functions that relate to HTML/CSS widgets.
