Diaflow's Documentation

Web scraper

Extract data from websites and analyze it with the Web scraper.

Last updated 1 month ago

Description

What it does: The Web scraper component collects and extracts information from a web page, given its URL. By combining the Web scraper with an AI model, you can filter out the parts of a site that interest you, or instruct the model to carry out specific tasks on the page's content.
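
For readers curious about what a scraper does under the hood, here is a minimal sketch of the basic fetch-and-extract idea, using only Python's standard library (urllib.request and html.parser). It is an illustration, not Diaflow's implementation: the names fetch_html, html_to_text and TextExtractor are hypothetical, the output is only a rough analogue of plain-text extraction, and the sketch does not execute JavaScript.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside <script> or <style>.
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page, decoding it with the charset the server declares (UTF-8 otherwise)."""
    request = Request(url, headers={"User-Agent": "scraper-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, html_to_text(fetch_html("https://example.com")) returns the page's visible text, one text node per line.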

The Web scraper component has the identifier scraper-X, where X represents the instance number of the Web scraper component.

When to use it? Imagine you're tracking news about a certain topic or industry. You can use the Web scraper component to automatically fetch the latest articles related to your topic from news websites, keeping you up to date without browsing them manually every day.

Component settings

Parameter Name | Description
URL | The URL, or collection of URLs, to source data from.
Content output format | The format of the extracted content. Options: HTML, Markdown, Plaintext.
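
To make the Markdown option more concrete, the toy converter below turns a small HTML fragment into Markdown. It is a hypothetical sketch (MarkdownSketch and to_markdown are made-up names) that handles only headings, paragraphs, links and list items; Diaflow's actual conversion rules are not described on this page and may differ.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Toy HTML-to-Markdown converter: headings, paragraphs, links and list items only."""

    HEADING_LEVELS = {f"h{i}": i for i in range(1, 7)}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._href = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_LEVELS:
            self.parts.append("\n\n" + "#" * self.HEADING_LEVELS[tag] + " ")
        elif tag in ("p", "ul", "ol"):
            # Start a new block; ordered lists are rendered as bullets for brevity.
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self._href = dict(attrs).get("href") or ""
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag == "a":
            self.parts.append(f"]({self._href})")

    def handle_data(self, data):
        # Collapse whitespace runs (including newlines) into single spaces.
        self.parts.append(re.sub(r"\s+", " ", data))


def to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    # Drop leading blank lines and squeeze repeated blank lines into one.
    out: list[str] = []
    for line in lines:
        if line or (out and out[-1]):
            out.append(line)
    return "\n".join(out).strip()
```

For the fragment '<h2>Pricing</h2><p>Plans start at <a href="/pricing">this page</a>.</p><ul><li>Free</li><li>Pro</li></ul>', to_markdown returns a "## Pricing" heading, the paragraph with its link rewritten as "[this page](/pricing)", and a two-item bullet list.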

Advanced configurations

Options | Description
Enable caching | Determines whether the component's results are cached. When caching is enabled, Diaflow reuses the component's previously computed output on the next run of the flow, as long as the inputs have not changed.
Caching time | Applies only when "Enable caching" is turned on. Controls how long Diaflow waits before automatically clearing the cache.
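
The two caching options can be pictured as a time-limited cache keyed on the component's inputs. The sketch below shows that general technique in Python. It assumes the inputs can be serialized as JSON and treats expired entries as misses, and TTLCache, key_for and get_or_compute are hypothetical names, not part of Diaflow.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Reuses a stored result while the inputs are unchanged and the entry is younger than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def key_for(inputs: dict) -> str:
        # Same inputs (e.g. same URL and output format) give the same key, regardless of key order.
        payload = json.dumps(inputs, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, inputs: dict, compute: Callable[[dict], str]) -> str:
        key = self.key_for(inputs)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh hit: skip re-running the component
        # Miss or expired entry: recompute and restart the timer for this key.
        result = compute(inputs)
        self._entries[key] = (now, result)
        return result
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting in real time.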

Use case

Here is a simple use case in which the Web scraper component extracts the functions described on a page about HTML/CSS widgets.
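
When the Web scraper is combined with an AI model, as in this use case, the scraped text becomes part of the model's input. The sketch below shows one hypothetical way to package it with an instruction. build_extraction_prompt and its 8,000-character default budget are assumptions for illustration, not Diaflow settings, and a character count is only a crude stand-in for a model's real context limit.

```python
def build_extraction_prompt(task: str, page_text: str, max_chars: int = 8000) -> str:
    """Combine an instruction with scraped page text, trimming the text to a character budget."""
    if len(page_text) > max_chars:
        # Keep the start of the page and mark that the rest was cut.
        page_text = page_text[:max_chars] + "\n[...truncated]"
    return (
        f"{task}\n\n"
        "Answer using only the page content between the markers.\n"
        "<<<PAGE\n"
        f"{page_text}\n"
        "PAGE>>>"
    )
```

For the widget page, the task could be "List the functions described on this page.", with page_text set to the Web scraper's output.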
