Chunk Activity

The Chunk activity divides input text into smaller segments, or chunks, for the RAG workflow. Always use this activity together with the Ingest activity, because it prepares large text or files for processing and embedding. It supports several chunking strategies, including Paragraph-Based Chunking and Semantic Chunking, which determine how the text is split. The activity's output, which contains the chunked text and its metadata, passes to the Ingest activity for further processing and storage in a vector database.

General

The General panel contains the following fields.

Property | Module Property? | Description
Name | No | The name displayed as the label for the activity in the process.
RAG Connection | Yes | The RAG Configuration Connection shared resource used by this activity.
Chunk Strategy | No | A dropdown that defines how input text is divided into smaller segments. The supported chunking strategies are Paragraph-Based Chunking and Semantic Chunking.
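The source does not describe how the activity implements either strategy. As a rough illustration of the general idea behind paragraph-based chunking only, the following Python sketch splits text on blank lines. The function name split_into_paragraphs and the blank-line rule are assumptions made for this example, not the plug-in's behavior. Semantic chunking usually relies on an embedding model to find topic boundaries and is not sketched here.

```python
import re


def split_into_paragraphs(text: str) -> list[str]:
    """Split text on blank lines and drop empty paragraphs (illustrative only)."""
    # A paragraph boundary is a newline, optional whitespace, then another newline.
    parts = re.split(r"\n\s*\n", text)
    # Trim surrounding whitespace and discard segments that are empty.
    return [part.strip() for part in parts if part.strip()]


sample = "First paragraph.\nStill first.\n\nSecond paragraph.\n\n\nThird."
paragraphs = split_into_paragraphs(sample)
for paragraph in paragraphs:
    print(repr(paragraph))
```

A single line break stays inside a paragraph, and only blank lines start a new chunk.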

Description

On the Description tab, you can enter a short description of the activity.

Field | Literal Value/Process Property/Module Property? | Description
Description | None | A description of the activity.

Input

The following is the Input for the activity.

Input Item | Data Type | Description
filePath | string | Optional. The path of a single file whose content is to be chunked. At least one of filePath, fileDirectory, or textContent must be specified.
fileDirectory | string | Optional. The path of a directory that contains multiple files to be chunked.
textContent | string | Optional. The raw text content to be chunked directly.
maxSegmentSizeInChars | string | Optional. The maximum number of characters permitted in each chunk.
maxOverlapSizeInChars | string | Optional. The maximum number of characters that can overlap between consecutive chunks.
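To make the interplay of the size and overlap fields concrete, the following Python sketch shows one common way to apply them: a sliding window over the characters of the text. It is an illustration under stated assumptions, not the plug-in's implementation. The function names validate_input and chunk_by_chars are hypothetical, the overlap is treated as a fixed amount even though the activity defines it only as a maximum, and the source does not say what happens when more than one input source is supplied. Both limits are parsed from strings because the input schema declares them as string.

```python
from typing import Optional


def validate_input(file_path: Optional[str] = None,
                   file_directory: Optional[str] = None,
                   text_content: Optional[str] = None) -> None:
    """Raise ValueError unless at least one input source is provided."""
    if not (file_path or file_directory or text_content):
        raise ValueError(
            "One of filePath, fileDirectory, or textContent must be specified")


def chunk_by_chars(text: str, max_segment_size: str, max_overlap_size: str) -> list[str]:
    """Cut text into windows of at most max_segment_size characters.

    Assumption: consecutive windows share exactly max_overlap_size characters.
    """
    size = int(max_segment_size)      # the schema passes the limits as strings
    overlap = int(max_overlap_size)
    if size <= 0:
        raise ValueError("maxSegmentSizeInChars must be positive")
    if not 0 <= overlap < size:
        raise ValueError("maxOverlapSizeInChars must be >= 0 and smaller than the segment size")

    step = size - overlap             # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break                     # the last window reached the end of the text
        start += step
    return chunks


validate_input(text_content="abcdefghij")
print(chunk_by_chars("abcdefghij", "4", "1"))
```

With a segment size of 4 and an overlap of 1, each chunk begins with the last character of the previous chunk.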

Output

The following is the Output for the activity.

Output Item | Data Type | Description
chunk | complex | A repeating element that contains the text and associated metadata for each chunk generated by the activity.
text | string | The actual text of the chunk.
metadata | complex | A repeating element that provides key-value pairs of metadata associated with the chunk.
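The output shape can be pictured as a list of records, each holding a text and a list of key-value pairs. The following Python sketch models that shape for illustration. The class names, the field names key and value, and the sample metadata keys source and index are assumptions made for this example; the source states only that metadata consists of key-value pairs.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """One key-value pair of chunk metadata (field names are assumed)."""
    key: str
    value: str


@dataclass
class Chunk:
    """One element of the repeating chunk output."""
    text: str
    metadata: list[MetadataEntry] = field(default_factory=list)


def metadata_as_dict(chunk: Chunk) -> dict[str, str]:
    """Flatten the repeating metadata entries into a dictionary for lookup."""
    return {entry.key: entry.value for entry in chunk.metadata}


# Hypothetical sample data; real metadata keys depend on the plug-in.
chunks = [
    Chunk(text="First paragraph.",
          metadata=[MetadataEntry("source", "example.txt"),
                    MetadataEntry("index", "0")]),
    Chunk(text="Second paragraph."),
]

for chunk in chunks:
    print(chunk.text, metadata_as_dict(chunk))
```

A downstream step such as the Ingest activity would receive one such record per chunk.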

Fault

The Fault tab lists exceptions that are generated by this activity.

Error Schema Element | Data Type | Description
msg | string | The error message returned by the plug-in.
msgCode | string | The error code returned by the plug-in.

Fault | Generated When...
RAGPluginException | Any exception occurs during activity execution.
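As a small illustration of how the two fault fields might be consumed, the following Python sketch models the fault as an exception that carries msg and msgCode. The class is a stand-in written for this example, not the plug-in's actual class, and the error code EXAMPLE-0001 and the helper describe_fault are hypothetical.

```python
class RAGPluginException(Exception):
    """Stand-in for the activity fault; carries the two documented fields."""

    def __init__(self, msg: str, msg_code: str):
        super().__init__(f"[{msg_code}] {msg}")
        self.msg = msg            # corresponds to the msg element
        self.msg_code = msg_code  # corresponds to the msgCode element


def describe_fault(fault: RAGPluginException) -> str:
    """Build a log-friendly line from the fault fields."""
    return f"Chunk activity failed with code {fault.msg_code}: {fault.msg}"


try:
    # Hypothetical failure; the code value is invented for illustration.
    raise RAGPluginException("File not found", "EXAMPLE-0001")
except RAGPluginException as fault:
    print(describe_fault(fault))
```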