Chunk Activity
The Chunk activity divides input text into smaller segments, or chunks, for the RAG workflow. Use this activity together with the Ingest activity, because it prepares large text or files for processing and embedding. It supports several chunking strategies, including Paragraph-Based Chunking and Semantic Chunking, that determine how the text is divided. The output of the Chunk activity, which contains the chunked text and its metadata, is passed to the Ingest activity for further processing and storage in a vector database.
General
The General panel contains the following fields.
| Property | Module Property? Yes/No | Description |
|---|---|---|
| Name | No | The name to be displayed as the label for the activity in the process. |
| RAG Connection | Yes | The RAG Configuration Connection shared resource used by this activity. |
| Chunk Strategy | No | The dropdown defines how input text is divided into smaller segments. The supported chunking strategies include Paragraph-Based Chunking and Semantic Chunking. |
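As an illustration of what a paragraph-based strategy might do, the following minimal Python sketch splits text on blank lines. It is not the plug-in's implementation; the function name `paragraph_chunks` and the blank-line rule are assumptions made for this example.

```python
import re


def paragraph_chunks(text: str) -> list[str]:
    """Split text into paragraphs on blank lines (assumed rule, not the plug-in's)."""
    # One or more blank lines (possibly containing whitespace) separate paragraphs.
    parts = re.split(r"\n\s*\n", text)
    # Trim surrounding whitespace and drop empty fragments.
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\n\nThird paragraph."
    for chunk in paragraph_chunks(sample):
        print(chunk)
```

A semantic strategy would instead group sentences by meaning, which typically requires an embedding model and is not sketched here.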
Description
On the Description tab, you can enter a short description of the activity.
| Field | Literal Value/Process Property/Module Property? | Description |
|---|---|---|
| Description | None | A description of the activity. |
Input
The following is the Input for the activity.
| Input Item | Data Type | Description |
|---|---|---|
| filePath | string | Optional. The path of a single file whose content is to be chunked. One of filePath, fileDirectory, or textContent must be specified. |
| fileDirectory | string | Optional. The path of a directory that contains multiple files to be chunked. |
| textContent | string | Optional. The raw text content to be chunked directly. |
| maxSegmentSizeInChars | string | Optional. The maximum number of characters permitted in each chunk. |
| maxOverlapSizeInChars | string | Optional. The maximum number of characters that can overlap between consecutive chunks. |
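To show how a maximum segment size and a maximum overlap interact, here is a minimal Python sketch of fixed-size chunking with overlap. It is a simplified stand-in, not the activity's actual algorithm: the function name `chunk_with_overlap` is hypothetical, and the sketch always uses the full overlap, whereas the activity only treats it as an upper bound.

```python
def chunk_with_overlap(text: str, max_segment_size: int, max_overlap_size: int) -> list[str]:
    """Cut text into windows of at most max_segment_size characters.

    Consecutive windows share max_overlap_size characters (assumed behavior).
    """
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("max_overlap_size must be >= 0 and smaller than max_segment_size")

    step = max_segment_size - max_overlap_size  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    # The activity's inputs are typed as strings, so a caller would convert them first.
    print(chunk_with_overlap("abcdefghij", int("4"), int("1")))
```

With a segment size of 4 and an overlap of 1, each chunk repeats the last character of the previous one, which helps preserve context across chunk boundaries.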
Output
The following is the Output for the activity.
| Output Item | Data Type | Description |
|---|---|---|
| chunk | complex | A repeating element that contains the text and associated metadata of each chunk generated by the activity. |
| text | string | The actual text of the chunk. |
| metadata | complex | A repeating element that provides key-value pairs of metadata associated with the chunk. |
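The output structure can be pictured as a list of records, each holding the chunk text and a set of key-value metadata pairs. The Python sketch below models that shape only; the class name `Chunk`, the helper `to_chunks`, and the metadata keys `source` and `index` are hypothetical and are not taken from the plug-in.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One output element: chunk text plus key-value metadata."""
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


def to_chunks(segments: list[str], source: str) -> list[Chunk]:
    """Wrap plain text segments in Chunk records with illustrative metadata."""
    return [
        # "source" and "index" are example keys chosen for this sketch.
        Chunk(text=segment, metadata={"source": source, "index": str(i)})
        for i, segment in enumerate(segments)
    ]


if __name__ == "__main__":
    for chunk in to_chunks(["First part.", "Second part."], "guide.txt"):
        print(chunk.text, chunk.metadata)
```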
Fault
The Fault tab lists exceptions that are generated by this activity.
| Error Schema Element | Data Type | Description |
|---|---|---|
| msg | string | The error message returned by the plug-in. |
| msgCode | string | The error code returned by the plug-in. |
The activity can generate the following fault.
| Fault | Generated When... |
|---|---|
| RAGPluginException | An exception occurs during execution of the activity. |