Web scraper
Extract data from websites and analyze them with the Web scraper.
What it does
The Web scraper component collects and extracts information from a web page at a given URL. Combined with an AI model, it lets you filter for the content that interests you on a site or have specific tasks carried out on the page.
The Web scraper component's identifier is scraper-X, where X is the instance number of the component.
When to use it?
Imagine you're tracking news about a certain topic or industry. You can use the Web scraper component to automatically fetch the latest articles from news websites related to your topic, so you stay up to date without browsing manually every day.
URL
This parameter specifies the URL or collection of URLs to source data from.
Content output format
Options:
HTML
Markdown
Plaintext
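To make the difference between the HTML and Plaintext options concrete, here is a minimal sketch of how raw HTML could be reduced to plain text. It uses only the Python standard library and is not Diaflow's actual conversion, which is not documented here; the names TextExtractor and html_to_plaintext are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_plaintext(html):
    """Hypothetical helper: return the visible text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Widgets</h1><p>HTML and <b>CSS</b> widgets.</p></body></html>"
)
print(html_to_plaintext(sample))
```

The HTML option would correspond to the sample string itself, while a Markdown output would keep lightweight structure such as headings and emphasis instead of discarding all markup.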
Enable caching
This option determines whether the component's results are cached. When it is enabled, Diaflow reuses the previously computed component output on the next run of the Flow, as long as the inputs have not changed.
Caching time
Only applicable if the "Enable caching" option is enabled. This parameter controls how long Diaflow waits before automatically clearing the cache.
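The two caching settings can be pictured as a cache that is keyed on the component's inputs and expires after a set time. The sketch below is only an illustration under that assumption; Diaflow's internal caching mechanism is not described here, and ComponentCache, ttl_seconds, and fake_scrape are hypothetical names.

```python
import hashlib
import json
import time


class ComponentCache:
    """Hypothetical input-keyed cache with an expiry time, for illustration only."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # plays the role of "Caching time"
        self._clock = clock
        self._entries = {}  # key -> (stored_at, output)

    @staticmethod
    def _key(inputs):
        # Identical inputs serialize to the same string, hence the same key.
        payload = json.dumps(inputs, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, inputs, compute):
        key = self._key(inputs)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # inputs unchanged and entry still fresh: reuse it
        output = compute(inputs)  # otherwise recompute and store the result
        self._entries[key] = (now, output)
        return output


calls = []


def fake_scrape(inputs):
    # Stand-in for the real scraping step; records each actual execution.
    calls.append(inputs["url"])
    return f"content of {inputs['url']}"


cache = ComponentCache(ttl_seconds=60)
cache.run({"url": "https://example.com", "format": "Markdown"}, fake_scrape)
cache.run({"url": "https://example.com", "format": "Markdown"}, fake_scrape)  # reused
cache.run({"url": "https://example.com", "format": "HTML"}, fake_scrape)  # inputs changed
print(len(calls))
```

In this sketch, changing any input (for example the content output format) produces a different key, so the scrape runs again; once the expiry time has passed, the stored result is ignored and recomputed.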
Here is a simple use case in which the Web scraper component is used to extract functions from a page about HTML/CSS widgets.