Links

Analyze and query the content of websites.

The Links component lets you vectorize data retrieved from one or more URLs so that it is suitable as input to an LLM component, allowing you to query and ask questions about a website.

The Links component value has the identifier li-X, where X is the instance number of the Links component.

The Links component has the following parameters that can be specified directly on the UI component.

Parameter Name | Description

URL(s)

This parameter specifies the URL or collection of URLs to source data from.

The Links component does not have any input connections.

Component Outputs

The Links component has the following output connections.

Output Name | Format | Description | Constraints

To VectorDB/Python

This output connection carries the data retrieved from the provided URLs so that it can be vectorized into a format suitable for an LLM.

The output connection must be linked to a Vector Database or Python component.

The Links component has the following configuration options.

Configuration Option Name | Description

Description

This is a user-supplied textual description of the Links component.

Query Method

Available Options:

- From Internal Data: Data from the specified URL will be acquired as soon as the URL is specified.
- Get Latest Data: Data will be acquired in real time from the specified URL and processed by the Links component every time the Links component is run.

Data Retrieval Mode

Available Options:

- HTML Parser (go to VectorDB): Crawls the HTML of the URL (body only). The data must be vectorized before it is fed to the LLM.
- Meta Data (direct to LLM): Crawls the metadata of the URL. The data is a string, so it can be connected directly to the LLM.
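To make the difference between the two retrieval modes concrete, the following is a minimal sketch that separates a page's body text from its meta tags using only the Python standard library. It is not how Diaflow implements the Links component; the `PageExtractor` class, the `extract` function, and the sample HTML are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects body text and <meta> tags separately."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0   # greater than 0 while inside <script> or <style>
        self.body_text = []   # roughly what an "HTML Parser" mode would yield
        self.meta = {}        # roughly what a "Meta Data" mode would yield

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "meta":
            # Standard meta tags use "name"; Open Graph tags use "property".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content"):
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits inside <body> and outside scripts.
        if self.in_body and not self.skip_depth and data.strip():
            self.body_text.append(data.strip())


def extract(html):
    """Return (body_text, meta_dict) for an HTML string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.body_text), parser.meta


SAMPLE = """<html><head><title>Demo</title>
<meta name="description" content="A demo page">
</head><body><h1>Hello</h1><p>Body text.</p>
<script>var x = 1;</script></body></html>"""

body, meta = extract(SAMPLE)
print(body)
print(meta)
```

The body text can be arbitrarily long, which is why it is typically chunked and vectorized first, while the meta values are short strings that can be passed to an LLM as they are.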

Enable Caching

This option determines whether the results of the component are cached. When caching is enabled, on the next run of the Flow, Diaflow will reuse the previously computed component output, as long as the inputs have not changed.

Caching Time

Only applicable if the "Enable Caching" option has been enabled. This parameter controls how long Diaflow will wait before automatically clearing the cache.

Clear Cache

Only applicable if the "Enable Caching" option has been enabled. Clicking this button will clear the cache.
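The three caching options can be pictured as a cache keyed on the component inputs, with an expiry time and a manual clear. The sketch below is an illustration only and makes no claim about Diaflow's internals; `ComponentCache`, `get_or_compute`, and the `clock` parameter are hypothetical names.

```python
import hashlib
import json
import time


class ComponentCache:
    """Hypothetical input-keyed cache with a time-to-live (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds   # plays the role of "Caching Time"
        self.clock = clock               # injectable so expiry can be tested
        self._store = {}                 # key -> (stored_at, result)

    @staticmethod
    def _key(inputs):
        # Identical inputs produce the same key; any change produces a new one.
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get_or_compute(self, inputs, compute):
        key = self._key(inputs)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]              # cache hit: reuse the previous output
        result = compute(inputs)         # cache miss or expired: recompute
        self._store[key] = (now, result)
        return result

    def clear(self):
        # Plays the role of the "Clear Cache" button.
        self._store.clear()
```

With this shape, a second run with unchanged inputs returns the stored output without calling `compute` again, while changed inputs, an expired entry, or a call to `clear()` all force a fresh computation.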

Chunking Method

Specifies the strategy used to divide the dataset into smaller, manageable chunks or partitions.

Available Options:

- Letter
- Word
- Sentence
- Passage
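As a rough illustration of what the four chunking methods might mean in practice, the sketch below splits text into units by letter, word, sentence, or passage. The splitting rules (whitespace for words, end punctuation for sentences, blank lines for passages) are assumptions made for this example, and `split_units` is a hypothetical function, not part of Diaflow.

```python
import re


def split_units(text, method):
    """Split text into units according to an assumed chunking method."""
    if method == "letter":
        return list(text)
    if method == "word":
        return text.split()
    if method == "sentence":
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if method == "passage":
        # Assumption: passages are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    raise ValueError(f"unknown chunking method: {method}")


text = "One two. Three four!\n\nFive."
for method in ("word", "sentence", "passage"):
    print(method, split_units(text, method))
```

Finer units such as letters or words give precise control over chunk length, while sentences and passages keep related text together.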

Chunk Size

Specifies the size of each chunk used when performing a semantic search. A larger chunk carries more context, but the tradeoff is a loss of specificity in the answer. Larger chunks also mean fewer chunks overall, and therefore fewer database vectors to search and score against your comparison input.

Permitted Range: 0 - 4500

Chunk Overlap

Specifies the degree of overlap (redundancy) between consecutive chunks of data stored in the vector database. A chunk here is a segment or partition of the overall dataset.

Permitted Range: 0 - 4500
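The interaction between chunk size and chunk overlap can be sketched as a sliding window. The example below is a generic illustration, not Diaflow's implementation; `chunk_with_overlap` is a hypothetical function, and it assumes the chunk size is positive and the overlap is smaller than the chunk size.

```python
def chunk_with_overlap(units, chunk_size, chunk_overlap):
    """Group units into windows of chunk_size that share chunk_overlap units."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap   # how far each window advances
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + chunk_size])
        if start + chunk_size >= len(units):
            break                       # the last window reached the end
    return chunks


letters = list("abcdefghij")
for chunk in chunk_with_overlap(letters, chunk_size=4, chunk_overlap=2):
    print("".join(chunk))
```

A larger overlap reduces the chance that a relevant sentence is cut in half at a chunk boundary, at the cost of storing more vectors for the same text.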

K

Specifies the number of nearest neighbors to be retrieved or considered in a search query.

When performing similarity searches or nearest neighbor queries in vector databases, the goal is often to find the closest vectors (or data points) to a given query vector. The parameter "K" specifies how many of these nearest neighbors should be returned as part of the query result.

Permitted Range: 1 - 100
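The role of K can be illustrated with a brute-force nearest neighbor search. The sketch below ranks stored vectors by cosine similarity to a query vector and returns the indices of the K closest ones. Cosine similarity is an assumption made for this example (the similarity measure used by the vector database is not stated here), and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0                      # treat zero vectors as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], vectors, k=2))
```

A small K passes only the closest chunks to the LLM, while a larger K supplies more context at the risk of including less relevant chunks.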

Use Cases

The following is a simple use case of the Links Component, where the Links Component is being used to
