Data Channels
Data channels send and receive records to and from scoring pipelines.
Details on data channel architecture are in the Data Channels chapter of the architecture guide.
Details on configuring and deploying data channels are in the Working with Data Channels chapter of the user guide.
This chapter describes additional configuration that must be done in a Kubernetes cluster to support some data channels.
File Channels
File data channels require a Kubernetes Persistent Volume Claim to be available in the namespace in which they are deployed. Once a persistent volume claim is configured, the volume name is available when deploying a file data channel.

AKS
Persistent Volume Claims (PVCs) represent storage requests from a user. PVCs allow usage of abstract storage that a cluster administrator might have provisioned. PVCs consume storage resources through Persistent Volumes (PVs). PVs allow abstraction of storage provisioning and consumption.
PVs and PVCs together provide a mechanism that lets File Sources and Sinks read from, and write to, file shares in various kinds of storage (including cloud-provider-specific storage systems). A File Source deployment instance reads data from a folder named <file-source-deployment-name>/<user-name> in the file share (through the specified PVC) and streams it to scoring pipelines deployed by <user-name> that connect to <file-source-deployment-name>. A File Sink deployment writes data sent from scoring pipelines deployed by <user-name> that connect to <file-sink-deployment-name>.
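For illustration, a File Source deployment named file-source-1 used by a user jdoe (both hypothetical names) would read input files from a layout like the following on the file share that backs the PVC; the file names are examples only:
data-file-share/
└── file-source-1/
    └── jdoe/
        ├── input-0001.csv
        └── input-0002.csv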
Example: PV and PVC Setup
There are several ways to create PVs and PVCs for use with File Source and Sink deployments. The following example demonstrates one way of creating them using Azure File Shares. It assumes that:
- an Azure storage account name (<account-name>) and key (<account-key>) are available
- the kubectl command line tool is configured with the appropriate context
Step-1: Creating a Kubernetes Secret to access storage
The Azure storage account name and key can be stored as a Kubernetes Secret for use in the creation of persistent volumes. Create the following configuration file.
create-storage-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: <storage-secret-name>
type: Opaque
data:
  azurestorageaccountname: <base-64-encoded-account-name>
  azurestorageaccountkey: <base-64-encoded-account-key>
To use the create-storage-secret.yaml file, run the following command:
kubectl apply -f create-storage-secret.yaml --namespace datachannels
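The values under data must be base64 encoded. As a minimal sketch, assuming a bash shell, the encoded values can be produced with:
echo -n '<account-name>' | base64
echo -n '<account-key>' | base64
Alternatively, kubectl can create an equivalent Secret directly from the plain-text values, avoiding manual encoding:
kubectl create secret generic <storage-secret-name> --namespace datachannels \
  --from-literal=azurestorageaccountname='<account-name>' \
  --from-literal=azurestorageaccountkey='<account-key>'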
Step-2: Creating a persistent volume that maps to a file share
The next step is to create Kubernetes Persistent Volumes that map to the file shares created in Azure storage. Create the following configuration file (adapted from this example).
create-persistent-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-for-source
  labels:
    usage: pv-for-source
  annotations:
    description: "A persistent volume for file sources"
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: <storage-secret-name>
    shareName: data-file-share
    readOnly: false
To use the create-persistent-volume.yaml file, run the following command:
kubectl apply -f create-persistent-volume.yaml
Step-3: Creating a persistent volume claim that uses a persistent volume
The next step is to create Kubernetes Persistent Volume Claims that use Persistent Volumes. Create the following configuration file (adapted from this example).
create-persistent-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-for-source
  annotations:
    description: "A persistent volume claim for file sources"
    volume.beta.kubernetes.io/storage-class: ""
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  selector:
    matchLabels:
      usage: pv-for-source
To use the create-persistent-volume-claim.yaml file, run the following command:
kubectl apply -f create-persistent-volume-claim.yaml --namespace datachannels
pv-for-source is now available as an option for the Volume Name field on the Data Channel Deployments page.
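Optionally, verify that the claim is bound before using it; the PVC should report a Bound status:
kubectl get pv pv-for-source
kubectl get pvc pv-for-source --namespace datachannels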
EKS
Setting up file channels on EKS consists of the following stages:
1. Create EFS Storage
Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic file system for use with AWS Cloud services and on-premises resources.
To create the EFS storage, follow these steps:
Step 1: Log in to the AWS console and search for the EFS service. Click on “Create file system”

Step 2: Enter the EFS storage name and select the VPC where the EKS cluster is hosted. Click on the create button

Note: The EFS file system and the EKS cluster must be in the same VPC.
Step 3: Select the newly created EFS storage

Step 4: Once the Elastic File System is in the available state, the network tab shows the IP address assigned per Availability Zone. Select the same security group that is assigned to the EKS cluster
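As an alternative to the console steps above, the file system and its mount targets can be created with the AWS CLI. The following is a minimal sketch; the tag value is arbitrary, and the subnet and security group IDs are placeholders that must belong to the EKS cluster's VPC:
aws efs create-file-system --creation-token modelops-datachannels --tags Key=Name,Value=modelops-datachannels
aws efs create-mount-target --file-system-id <file-system-id> --subnet-id <eks-subnet-id> --security-groups <eks-security-group-id>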

2. Create SFTP Server
AWS provides a service called “AWS Transfer Family”. Using the AWS Transfer Family service, we can set up an SFTP server. SFTP (SSH File Transfer Protocol) is a standard protocol for transferring files between a client and a server over a secure connection, using a client–server model.
To configure the SFTP server, follow these steps:
Step 1: Log in to the AWS console and search for the “AWS Transfer Family” service. Click on “Create Server”

Step 2: Select the “SFTP” option and click on “next”

Step 3: Select “Service Managed” as the identity provider and click on “next”

Step 4: Select the “Publicly accessible” endpoint and click on “next”

Step 5: Select “Amazon EFS” as the domain and click on “next”

Step 6: Keep the default additional details and click on “next”

Step 7: Review the details and click on create
Step 8: Now, create a Role in the IAM section with the following two policies:
* AmazonElasticFileSystemFullAccess
* AmazonElasticFileSystemClientFullAccess

Step 9: Select the newly created SFTP server in the AWS Transfer Family service

Step 10: Click on “Add User”

Step 11: Add the user details
* Username: Add a username.
* User ID: Set to "1000".
* Group ID: Set to "1000".
* Secondary Group ID: Set to "1000".
* Role: Select the newly created IAM Role.
* Home directory: Select the EFS storage created in step 2 of Create EFS Storage.
* SSH Public key: Generate an "ssh-rsa" key pair on your machine and add the public key here (see the example command after this list).
1. Windows key generation. (https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_keymanagement#user-key-generation)
2. MacOS/Linux key generation. (https://docs.rightscale.com/faq/How_Do_I_Generate_My_Own_SSH_Key_Pair.html)
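A minimal key generation sketch for MacOS/Linux; the key file name is an arbitrary example:
ssh-keygen -t rsa -b 4096 -f ~/.ssh/modelops-sftp-user
cat ~/.ssh/modelops-sftp-user.pub   # paste this public key into the SSH Public key field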


Step 12: Click on “Add”
Step 13: Click on the SFTP server and note the endpoint and username
Note: The copied endpoint is used as the hostname in the SFTP client tool configuration.

3. SFTP Client Tool Setup
Follow these steps to configure the SFTP client tool:
Step 1. SFTP Configuration
* Download and install one of the SFTP client tools (WinSCP or FileZilla).
* Click on "Site Manager".
* Click on "New Site" and provide a suitable name.
* Select the "SFTP" protocol.
* Copy the endpoint from the SFTP server and paste it under "Host".
* Select the "SSH key file" logon type.
* Provide the username created on the SFTP server.
* Add the SSH private key.
* Click on Connect.

Step 2. Accept the host key for the first time

Step 3. Once the file channel setup is complete, the file-datasource folder is visible as shown below
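As an alternative to a GUI client, the standard sftp command-line tool can be used to verify the connection; the key path, username, and endpoint below are placeholders:
sftp -i ~/.ssh/modelops-sftp-user <sftp-username>@<sftp-server-endpoint>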

4. EFS Configuration in the EKS Cluster
Configure the EFS storage in the EKS cluster.
Follow the steps below:
Step 1: Create a new namespace for hosting the provisioner
Namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: <namespace name>
To apply the above changes, execute the following command:
kubectl apply -f Namespace.yaml
Step 2: Create a service account
The service account is used by the provisioner deployment.
Service-Account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  namespace: <namespace name>
To apply the above changes, execute the following command:
kubectl apply -f Service-Account.yaml
Step 3: Create a cluster role
This step grants cluster-wide permissions so that the provisioner can create and manage the required objects.
Cluster-Role.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
To apply the above changes, execute the following command:
kubectl apply -f Cluster-Role.yaml
Step 4: Bind the cluster role to the service account
This step binds the service account to the cluster role.
Cluster-RoleBinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: <namespace name>
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
To apply the above changes, execute the following command:
kubectl apply -f Cluster-RoleBinding.yaml
Step 5: Create a role
Role.yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  namespace: <namespace name>
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
To apply the above changes, execute the following command:
kubectl apply -f Role.yaml
Step 6: Create a role binding
Role-Binding.yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  namespace: <namespace name>
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with the namespace where the provisioner is deployed
    namespace: <namespace name>
To apply the above changes, execute the following command:
kubectl apply -f Role-Binding.yaml
Step 7: Create an EFS provisioner. Note: In the deployment below, pass the IP address or DNS name of the EFS server. Refer to the screenshot from step 4 of Create EFS Storage.
Provisioner.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-client-provisioner
  namespace: <namespace name>
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: nfs-storage # Need to pass this value in the next step
            - name: NFS_SERVER
              value: 172.31.27.152 # Pass the NFS (EFS server) IP address
            - name: NFS_PATH
              value: /
      volumes:
        - name: nfs-client-root
          nfs:
            server: 172.31.27.152 # Pass the NFS (EFS server) IP address
            path: /
To apply the above changes, execute the following command:
kubectl apply -f Provisioner.yaml
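Before continuing, it can be useful to confirm that the provisioner pod is running and to inspect its logs for NFS mount errors:
kubectl get pods --namespace <namespace name> | grep nfs-client-provisioner
kubectl logs deployment/nfs-client-provisioner --namespace <namespace name>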
Step 8: Create the storage class that will be used for creating PVCs. Pass the provisioner name that was set in Step 7.
Storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: nfs-storage # Created in step 7
parameters:
  archiveOnDelete: "false"
To apply the above changes, execute the following command:
kubectl apply -f Storage-class.yaml
Step 9: Create PVCs for the data source and sink
A PV does not need to be created because it is dynamically provisioned.
PVC.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datasource
  namespace: datachannels
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Mi
To apply the above changes, execute the following command:
kubectl apply -f PVC.yaml
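A PVC for a file sink can be created in the same way by changing metadata.name (for example, to datasink). After applying, the provisioner should dynamically create and bind a PV for each claim, which can be checked with:
kubectl get pvc --namespace datachannels
kubectl get pv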
JDBC Channels
This section provides:
- Information on how to use a JDBC driver with JDBC data channels.
- Some useful links and tips for installing and running a TIBCO® Data Virtualization (TDV) server with TIBCO® ModelOps.
JDBC Driver installation
JDBC data channels automatically download and install JDBC drivers hosted in the Maven Central repository. If a JDBC driver is not hosted in the Maven Central repository, it must be manually installed into the ModelOps Maven artifact repository.
JDBC data channels support configuration of specific JDBC drivers hosted in the Maven Central and ModelOps Maven artifact repositories. The required Maven artifact coordinates must be configured when authoring the JDBC data channels. JDBC data channel configuration options are explained in detail here.
Manually Installing JDBC Drivers
TIBCO® ModelOps server uses the Nexus Repository as its internal maven artifact repository.
Port forwarding
The port-forwarded network address of the TIBCO® ModelOps artifact-repository service can be used to access the Nexus Repository Manager's artifact upload page in a web browser.
Before executing the steps below, confirm that:
- the TIBCO® ModelOps Kubernetes cluster is up and running.
- the kubectl command-line tool is configured to communicate with the running ModelOps cluster.
- if required, the current context is set to the desired cluster to access the intended pods.
Execute the commands in the following steps using the kubectl command-line tool:
1. Get the full pod name of the running artifact-repository service.
kubectl get pods --namespace <namespace-name> | grep "artifact-repository"
namespace-name is the namespace where the artifact-repository service is available. For example, modelops.
2. Get the artifact-repository service port.
kubectl get pod <artifact-repository-service-name> --template="{{(index (index .spec.containers 0).ports 0).containerPort}}" --namespace <namespace-name>
artifact-repository-service-name is the full pod name of the running artifact-repository service from step 1.
3. Forward a local port to the artifact-repository service port.
kubectl port-forward <artifact-repository-service-name> <local-port>:<hosted-port> --namespace <namespace-name>
local-port is any available local port to forward to the artifact-repository service and hosted-port is the original artifact-repository service port on the running pod.
4. Open the Nexus Repository in a web browser.
localhost:<local-port>
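As a concrete sketch, with a hypothetical pod name artifact-repository-0, a hypothetical container port of 8081, and the modelops namespace, the sequence might look like:
kubectl get pods --namespace modelops | grep "artifact-repository"
kubectl get pod artifact-repository-0 --template="{{(index (index .spec.containers 0).ports 0).containerPort}}" --namespace modelops
kubectl port-forward artifact-repository-0 8081:8081 --namespace modelops
Then open http://localhost:8081 in a web browser.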
After opening the Nexus Repository in a web browser, TIBCO® ModelOps users can upload new maven artifacts to the maven-hosted repository from the Nexus Repository Manager page.

Example Maven coordinates for hosting a TDV JDBC driver from the Nexus Repository Manager page are shown in the following screenshot.

To summarize:
- Access the ModelOps Maven repository using port forwarding.
- Note: If login credentials are set for the artifact repository, sign in using the appropriate credentials.
- Upload the Maven artifact.
TIBCO® Data Virtualization (TDV)
TDV JDBC drivers are not available in the Maven central repository, and must be manually installed.
The TDV JDBC driver (csjdbc.jar) from the TIBCO® Data Virtualization (TDV) installation must be uploaded to the TIBCO® ModelOps Maven artifact repository. The csjdbc.jar JAR is available in the following TDV installation directory:
<tdv_installation_directory>/apps/jdbc/lib
Where tdv_installation_directory is a placeholder for the actual TDV installation directory path.
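Besides the Nexus Repository Manager upload page, the driver can also be uploaded with Maven's deploy-file goal. The following is a sketch only; the coordinates, repository id, and hosted repository name are assumptions that must match the actual ModelOps artifact repository configuration (the repository id refers to a server entry in Maven's settings.xml if credentials are required):
mvn deploy:deploy-file \
  -Dfile=csjdbc.jar \
  -DgroupId=com.tibco.tdv \
  -DartifactId=csjdbc \
  -Dversion=<tdv-version> \
  -Dpackaging=jar \
  -DrepositoryId=modelops-artifact-repository \
  -Durl=http://localhost:<local-port>/repository/<hosted-repository-name>/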
The connecting to TDV server through JDBC page provides more information on installing and updating the TDV JDBC drivers for accessing the TDV server from client applications.
TIBCO® Data Virtualization (TDV) on Azure Virtual Machines
TIBCO® ModelOps JDBC data channels can connect to TIBCO® Data Virtualization (TDV) servers that are installed on Azure Virtual Machines running on the same virtual network/subnet as the TIBCO® ModelOps server.
Required Software Components
TDV provides installers for several Data Virtualization software components as discussed here. The TDV Server is the only required component for connecting TIBCO® ModelOps JDBC data channels to TDV data sources.
Installation Guide
The TDV installation instructions for UNIX and Windows platforms are available in the TDV installation documentation.
Web UI
For creating new TDV data sources or views, use the TDV Web UI. The TDV Web UI should be available at the URL shown below after installing and running a TDV server.
http://<server>:<port>/webui/login
When running the TDV Server locally, the TDV Web UI is available at:
http://localhost:9400/webui/login
The TDV Web UI User Guide provides more information on this.
Port Requirements
By default, a TDV server listens on port 9401 for JDBC/ODBC connections, and this port should allow inbound traffic for JDBC connections.
A TDV Server allows inbound traffic on ports 9400-9403. When exposed, this port range supports non-SSL and SSL TDV HTTP and client access.
Port 9407 should be exposed only when using the TDV server in a cluster configuration.
A complete description of the TDV ports is available here.
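Since this section covers TDV installed on Azure Virtual Machines, allowing inbound JDBC traffic typically means adding a rule for port 9401 to the VM's network security group. The following is a minimal Azure CLI sketch, assuming the VM's network interface is protected by a network security group; the resource group, NSG, and rule names are placeholders:
az network nsg rule create \
  --resource-group <resource-group> \
  --nsg-name <tdv-vm-nsg> \
  --name Allow-TDV-JDBC \
  --priority 1001 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 9401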
Connection URL
A TIBCO® Data Virtualization (TDV) connection URL contains a TDV server name, data source name and the JDBC port (default 9401).
After installing a TDV server, the server name can be obtained by running the TDV manager on port 9400.

The data source name can be obtained from the TDV manager Data Sources page.

An example TDV connection URL is shown below:
jdbc:compositesw:dbapi@example.tibco.com:9401?domain=composite&dataSource=/shared/examples/ds_orders
TIBCO® Data Virtualization on Kubernetes
TIBCO Data Virtualization (TDV) is also supported in containerized environments as described here.
The TDV container must be running within the same ModelOps Kubernetes cluster as the JDBC data channels.