Apache Pulsar Schema Registry
Pulsar messages are stored as unstructured byte arrays, so the producer and the consumer must agree on the data structure of the messages. An Apache Pulsar schema is the metadata that defines how to translate the raw message bytes into a formal structure type. The schema is used to serialize data into raw bytes before it is published to a topic and to deserialize the raw bytes before they are delivered to consumers.
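The round trip described above can be illustrated with a minimal sketch. It uses only the Python standard library, with JSON standing in for the wire format; it does not use the Pulsar client API, and the `Order` type and the `serialize` and `deserialize` helpers are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Order:
    # Hypothetical message structure that producer and consumer both agree on.
    order_id: int
    item: str


def serialize(order: Order) -> bytes:
    # Producer side: turn the structured value into raw bytes before publishing.
    return json.dumps(asdict(order)).encode("utf-8")


def deserialize(raw: bytes) -> Order:
    # Consumer side: turn the raw bytes back into the agreed structure.
    return Order(**json.loads(raw.decode("utf-8")))


payload = serialize(Order(order_id=42, item="widget"))
restored = deserialize(payload)
```

If the consumer expected a different structure than the one the producer used, `deserialize` would fail or silently produce the wrong fields, which is the problem a shared schema is meant to prevent.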
Pulsar uses a built-in Pulsar Schema Registry as a central repository to store the registered schema information. Producers and consumers use the schema registry to coordinate the schema of a topic's messages through brokers. A Pulsar schema is applied at the topic level. Because both producers and consumers can upload schemas to brokers, Pulsar schemas work on both the producer side and the consumer side.
The schema registry helps ensure data consistency and improves data quality. It supports schema versioning, so different versions of a schema can be used simultaneously without causing compatibility issues.
A Pulsar schema is defined in a data structure called SchemaInfo. The following table describes the fields of a SchemaInfo defined for a Pulsar schema.
Field | Description |
---|---|
Name | Name of the Schema in string format. |
Type | Specify the Schema Type for serializing and deserializing the schema data. See Pulsar documentation for more information about Schema Types. |
Schema | Specify the schema to be registered in the Schema Registry. Only the Avro schema type is supported. |
Properties | Specify a user-defined property as a string or string map, which can be used by applications to carry any application-specific logic. |
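As a rough illustration of how the four fields in the table fit together, the sketch below builds a SchemaInfo-like dictionary around an Avro record schema. The schema name, namespace, field names, and property values are hypothetical, and storing the Avro definition as a JSON string is an assumption made for this example rather than a statement of how the registry stores it.

```python
import json

# Hypothetical Avro record schema describing an "Order" message.
avro_schema = {
    "type": "record",
    "name": "Order",
    "namespace": "com.example",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "item", "type": "string"},
    ],
}

# A SchemaInfo-like structure with the four fields from the table above.
schema_info = {
    "name": "order-schema",                   # Name: schema name as a string
    "type": "AVRO",                           # Type: only Avro is supported here
    "schema": json.dumps(avro_schema),        # Schema: the Avro definition (assumed JSON text)
    "properties": {"owner": "orders-team"},   # Properties: user-defined string map
}
```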
Configuring the Schema Registry
See Pulsar Destination Configuration Properties for information about configuring the Schema Registry.
Schema Compatibility
When schemas evolve and change, you must add (register) different schema versions. Before adding a version of a schema, the Schema Registry checks whether the new schema is compatible with the previously registered schema for a given topic. See Pulsar documentation for more information about setting schema compatibility.
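To give a feel for what such a check involves, here is a deliberately simplified sketch of a backward-compatibility test for Avro-style record schemas. It is not Pulsar's implementation: it only assumes that a newly added field needs a default value and that a field present in both versions must keep its type. Real Avro schema resolution is more permissive (for example, it allows some type promotions), and the function name and sample schemas are hypothetical.

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Return True if a reader using new_schema could read data written with old_schema.

    Simplified rules (an assumption for this sketch):
    - a field added in new_schema must declare a default;
    - a field present in both schemas must keep the same type.
    """
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        old = old_fields.get(field["name"])
        if old is None:
            # New field: old data has no value for it, so a default is required.
            if "default" not in field:
                return False
        elif old["type"] != field["type"]:
            # Shared field: this sketch treats any type change as incompatible.
            return False
    return True


v1 = {"fields": [{"name": "order_id", "type": "long"},
                 {"name": "item", "type": "string"}]}
# Adds an optional field with a default value.
v2 = {"fields": v1["fields"] + [{"name": "note", "type": "string", "default": ""}]}
# Adds a required field without a default value.
v3 = {"fields": v1["fields"] + [{"name": "price", "type": "double"}]}

v2_ok = is_backward_compatible(v1, v2)
v3_ok = is_backward_compatible(v1, v3)
```

Under these simplified rules, `v2` is accepted because the added field has a default, while `v3` is rejected because old messages carry no value for `price`.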
Schema Registration Scenarios
The schema compatibility check ensures that existing consumers can process newly introduced messages. The following table describes the different schema registration scenarios in the Schema Registry, based on whether a schema already exists for the topic.
Schema Compatibility Scenarios | Result |
---|---|
If no schema exists for the topic | |
If a schema exists and the producer or consumer connects using the same schema that is already stored | |
If a schema exists and the producer or consumer connects using a new schema that is compatible | |
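The scenarios in the table above can be explored with a toy in-memory registry. This is a sketch under stated assumptions, not Pulsar's broker logic: it assumes that the first schema for a topic is stored as version 0, that reconnecting with a stored schema reuses its version, that a compatible new schema is stored as the next version, and that an incompatible one is rejected. The `ToyRegistry` class, its `connect` method, and the compatibility rule passed to it are all hypothetical.

```python
from typing import Callable, Dict, List


class ToyRegistry:
    """Hypothetical in-memory registry keeping a list of schema versions per topic."""

    def __init__(self, is_compatible: Callable[[dict, dict], bool]):
        self._versions: Dict[str, List[dict]] = {}
        self._is_compatible = is_compatible

    def connect(self, topic: str, schema: dict) -> int:
        """Return the version number under which the client's schema is registered."""
        versions = self._versions.setdefault(topic, [])
        if not versions:
            # Assumed behavior: no schema exists yet, so store this one first.
            versions.append(schema)
            return 0
        if schema in versions:
            # Assumed behavior: the same schema is already stored, so reuse it.
            return versions.index(schema)
        if self._is_compatible(versions[-1], schema):
            # Assumed behavior: a compatible new schema becomes the next version.
            versions.append(schema)
            return len(versions) - 1
        raise ValueError(f"schema is incompatible with the latest version for {topic!r}")


# Hypothetical rule for the demo: a new schema must keep every existing field name.
def keeps_all_fields(old: dict, new: dict) -> bool:
    return set(old["fields"]) <= set(new["fields"])


registry = ToyRegistry(keeps_all_fields)
first = registry.connect("orders", {"fields": ["id", "item"]})
same = registry.connect("orders", {"fields": ["id", "item"]})
newer = registry.connect("orders", {"fields": ["id", "item", "note"]})
```

Passing the compatibility rule in as a function keeps the versioning logic separate from whichever compatibility strategy is configured for the topic.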