OpenAI
Process large amounts of data based on context and user input using the OpenAI provider.
The OpenAI component allows you to integrate OpenAI into your flows. You can customize the parameters the component uses, specify the knowledge context it operates on, and provide the input query.
The OpenAI component's UI changes depending on the selected model, as each model has different available options. You can choose the exact model to run from the "Model" dropdown menu. The available models range from text-to-image, GPT chat, and GPT vision to text-to-speech models and more. See the Parameters table below for the full list of available models.
The OpenAI component has the identifier opa-X, where X represents the instance number of the OpenAI component.
Each of these AI models serves a different purpose, ranging from natural language understanding and generation (GPT) and image generation (DALL-E) to text-to-speech synthesis (TTS) and speech recognition and transcription (Whisper). They vary in capabilities, modalities, and target applications.
Each of the available models is summarized below:
GPT 3.5 Turbo:
This is an enhanced version of the GPT-3 model, optimized for better performance, accuracy, or efficiency compared to the original GPT-3.
GPT 3.5 Turbo 16K:
Similar to GPT 3.5 Turbo, but with a 16,000-token (16K) context window. The larger context window lets the model work with longer prompts and conversations in a single request.
GPT 3.5 Turbo Instruct:
A variant of GPT 3.5 Turbo optimized for instruction-based learning or fine-tuning on specific tasks. It excels in scenarios where the model receives guidance or instructions during the generation process.
GPT 3.5 Turbo 1106:
A variant of GPT 3.5 Turbo optimized for faster performance and lower latency than previous versions, giving it better efficiency, accuracy, and scalability.
GPT-4:
Represents the next iteration of the GPT series after GPT-3, with improvements in model capacity, performance, and capabilities.
GPT-4 Vision:
A version of GPT-4 tailored specifically for understanding visual content, such as images. It extends the capabilities of traditional GPT models to accept vision-based inputs alongside text.
GPT-4o:
"o" for omni is a step towards much more natural human-computer interaction. It accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, which is similar to human response.
DALL-E 2:
A version of OpenAI's DALL-E model, which generates images from textual descriptions. DALL-E 2 includes improvements over the original DALL-E model in terms of image generation quality and efficiency.
DALL-E 3:
Another iteration of the DALL-E model, with further enhancements compared to DALL-E 2.
TTS-1:
Stands for Text-to-Speech 1, a model designed to convert written text into spoken audio. It provides high-quality and natural-sounding speech synthesis.
TTS-1 HD:
A variant of TTS-1 optimized for high-definition audio synthesis, offering even higher fidelity and clarity in the generated speech.
Whisper-1:
A speech-to-text (automatic speech recognition) model that transcribes spoken audio into written text and can translate speech from other languages into English. Whisper-1 specializes in robust, accurate transcription of audio content.
For more information regarding the various OpenAI model versions, please refer to the following subsections.
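Behind the scenes, these model families correspond to different OpenAI API endpoints. The sketch below is a rough illustration of those differences using the official `openai` Python SDK, not Diaflow's internal code; the prompts, file names, and the assumption that an API key is available in the `OPENAI_API_KEY` environment variable are placeholders.

```python
# Rough sketch only: how each model family corresponds to an OpenAI API call.
# Assumes the official `openai` Python SDK (v1.x) and OPENAI_API_KEY set in the
# environment; prompts and file names are placeholders.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# GPT chat models (GPT 3.5 Turbo, GPT-4, GPT-4o): text in, text out.
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the key points of this report."}],
)
print(chat.choices[0].message.content)

# DALL-E models: a text prompt in, a generated image out.
image = client.images.generate(model="dall-e-3", prompt="A watercolor fox", n=1)
print(image.data[0].url)

# TTS models: written text in, spoken audio out.
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input="Hello from the flow."
)
speech.write_to_file("speech.mp3")

# Whisper-1: spoken audio in, a text transcript out.
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcript.text)
```

Within a flow you do not call these endpoints yourself; the component presumably issues the appropriate request based on the selected Model, so the sketch only shows what the different model families do with their inputs and outputs.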
| Parameter Name | Description |
|---|---|
| Credential | Specifies whether to use your own OpenAI credentials or Diaflow's default credentials. |
| Models | Specifies the OpenAI model the component should use. Available values: GPT 3.5 Turbo, GPT 3.5 Turbo 16K, GPT 3.5 Turbo Instruct, GPT 3.5 Turbo 1106, GPT-4, GPT-4 Vision, GPT-4o, DALL-E 2, DALL-E 3, TTS-1, TTS-1 HD, Whisper-1 |
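As a rough illustration of the Credential setting (an assumption about what it amounts to, not Diaflow's actual implementation): using your own credential is analogous to constructing the OpenAI client with your own API key, while the default credential corresponds to a key that Diaflow manages for you.

```python
# Minimal sketch, assuming the Credential choice boils down to which API key the
# underlying OpenAI client is constructed with (not Diaflow's actual code).
from openai import OpenAI

own_client = OpenAI(api_key="sk-...your-own-key...")  # "own credentials": a key you provide
default_client = OpenAI()  # "default credentials": a key managed elsewhere (here, OPENAI_API_KEY)
```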