Apache Pulsar Schema Registry

Pulsar stores messages as unstructured byte arrays, so producers and consumers must agree on the structure of the data. A Pulsar schema is the metadata that defines how to translate raw message bytes into a structured type. The schema is used to serialize data into raw bytes before it is published to a topic and to deserialize the raw bytes before they are delivered to consumers.
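To make the serialize/deserialize round trip concrete, here is a minimal Python sketch. It is not Pulsar's implementation: the `User` record and the `JsonRecordSchema` class are hypothetical, and the standard-library `json` module stands in for the serialization formats (such as Avro) that real Pulsar schemas use. The producer side turns a structured object into bytes, and the consumer side turns those bytes back into the same structure. This only works because both sides share the same schema.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class User:
    # Hypothetical message type used only for illustration.
    name: str
    age: int


class JsonRecordSchema:
    """Hypothetical schema that maps a dataclass type to and from raw bytes.

    It plays the role a Pulsar schema plays on the producer and consumer
    side, but uses JSON from the standard library instead of Pulsar's codecs.
    """

    def __init__(self, record_type):
        self.record_type = record_type
        self.field_names = {f.name for f in fields(record_type)}

    def encode(self, record) -> bytes:
        # Producer side: structured object -> raw bytes published to the topic.
        return json.dumps(asdict(record), sort_keys=True).encode("utf-8")

    def decode(self, data: bytes):
        # Consumer side: raw bytes from the topic -> structured object.
        values = json.loads(data.decode("utf-8"))
        unknown = set(values) - self.field_names
        if unknown:
            raise ValueError(f"unexpected fields: {sorted(unknown)}")
        return self.record_type(**values)


schema = JsonRecordSchema(User)
payload = schema.encode(User(name="alice", age=30))
print(payload)                 # b'{"age": 30, "name": "alice"}'
print(schema.decode(payload))  # User(name='alice', age=30)
```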

Pulsar uses a built-in Pulsar Schema Registry as a central repository for registered schema information. Producers and consumers use the schema registry, through the brokers, to coordinate the schema of a topic's messages. Schemas are applied at the topic level. Because both producers and consumers can upload schemas to the brokers, Pulsar schemas work on both the producer and the consumer side.

The Schema Registry helps ensure data consistency and improve data quality. It supports schema versioning, so different versions of a schema can be in use at the same time without causing compatibility issues, provided the versions satisfy the topic's compatibility rules.

A Pulsar schema is defined in a data structure called SchemaInfo. The following table describes the fields of SchemaInfo.

Field        Description
Name         Name of the schema, as a string.
Type         Schema type used to serialize and deserialize the message data. See the Pulsar documentation for more information about schema types.
Schema       Schema definition to register in the Schema Registry. Only the Avro schema type is supported.
Properties   User-defined properties, specified as a string or a string map, that applications can use to carry application-specific information.
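The SchemaInfo fields can be pictured as a small record. The Python sketch below is a hypothetical stand-in (`SchemaInfoSketch` is not a Pulsar class). It assumes the schema definition is an Avro record schema supplied as a JSON string and that the only accepted type value is `"AVRO"`, as the table states. The `owner` property is an invented example of an application-specific property.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SchemaInfoSketch:
    """Hypothetical stand-in that mirrors the SchemaInfo fields in the table.

    This is not Pulsar's SchemaInfo class.
    """
    name: str
    type: str
    schema: str  # Avro schema definition as a JSON string
    properties: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        # The table states that only the Avro schema type is supported.
        if self.type != "AVRO":
            raise ValueError(f"unsupported schema type: {self.type}")
        # Assumption for this sketch: only Avro record schemas are accepted.
        definition = json.loads(self.schema)
        if definition.get("type") != "record":
            raise ValueError("expected an Avro record schema")


user_schema = json.dumps({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
})

info = SchemaInfoSketch(
    name="user-schema",
    type="AVRO",
    schema=user_schema,
    properties={"owner": "billing-team"},  # hypothetical application property
)
info.validate()  # passes: AVRO type with a record definition
```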

Configuring the Schema Registry

See Pulsar Destination Configuration Properties for information about configuring the Schema Registry.

Schema Compatibility

As schemas evolve, you must register (add) each new schema version. Before adding a new version, the Schema Registry checks whether it is compatible with the schema previously registered for the topic. See the Pulsar documentation for more information about setting schema compatibility.
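As an illustration of what a compatibility check does, the following Python sketch applies a simplified version of Avro's backward-compatibility rules: a newer schema can read data written with an older one if shared fields keep their types and every added field declares a default. This is a deliberate simplification (it ignores type promotions, unions, nested records, and the other compatibility strategies Pulsar offers), and `is_backward_compatible` is a hypothetical helper, not a Pulsar or Avro API.

```python
from typing import Any, Dict


def fields_by_name(record_schema: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    # Index an Avro-style record schema's fields by field name.
    return {f["name"]: f for f in record_schema["fields"]}


def is_backward_compatible(new_schema: Dict[str, Any], old_schema: Dict[str, Any]) -> bool:
    """Simplified check: can readers using new_schema read data written with old_schema?

    Rules applied (a subset of Avro's schema-resolution rules):
    - a field present in both schemas must keep exactly the same type;
    - a field added in new_schema must declare a default;
    - fields removed from new_schema are simply skipped when reading old data.
    """
    old_fields = fields_by_name(old_schema)
    for name, new_field in fields_by_name(new_schema).items():
        old_field = old_fields.get(name)
        if old_field is None:
            if "default" not in new_field:
                return False
        elif old_field["type"] != new_field["type"]:
            return False
    return True


user_v1 = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
]}
user_v2 = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int", "default": 0},
]}
user_v3 = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
]}

print(is_backward_compatible(user_v2, user_v1))  # True: added field has a default
print(is_backward_compatible(user_v3, user_v1))  # False: added field has no default
```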

Schema Registration Scenarios

The schema compatibility check ensures that existing consumers can process newly introduced messages. The following table describes the schema registration scenarios in the Schema Registry, depending on whether a schema already exists for the topic.

Note: You must set the appropriate schema compatibility type before registering a new schema.
Scenario: No schema exists for the topic
Result:
  • The producer is created with the schema.
  • The schema is transmitted to the broker and stored, because no schema exists yet.
  • A consumer created on the same topic can consume messages using the same schema.

Scenario: A schema exists, and the producer or consumer connects using the same schema that is already stored
Result:
  • The schema is transmitted to the broker.
  • The broker determines the compatibility of the schema.
  • The broker attempts to store the schema in BookKeeper. Because the schema is already stored, the existing schema is used to produce and consume messages.

Scenario: A schema exists, and the producer or consumer connects using a new schema that is compatible
Result:
  • The schema is transmitted to the broker.
  • The broker determines the compatibility of the schema and stores the new schema as the current version, with a new version number.
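The registration scenarios can be summarized as a small decision flow per topic. The Python sketch below is hypothetical: `InMemorySchemaRegistry` keeps schema versions in a dictionary instead of BookKeeper, schemas are simplified to maps of field names to type names, version numbers start at 0, and the rule `keeps_existing_fields` is a placeholder for whatever compatibility strategy the topic uses. The final branch, which rejects an incompatible schema, is not listed among the scenarios but follows from the compatibility check the Schema Registry performs.

```python
from typing import Callable, Dict, List

Schema = Dict[str, str]  # hypothetical: field name -> type name
CompatCheck = Callable[[Schema, Schema], bool]


class IncompatibleSchemaError(Exception):
    pass


class InMemorySchemaRegistry:
    """Hypothetical per-topic registry; a real broker persists schemas in BookKeeper."""

    def __init__(self, is_compatible: CompatCheck):
        self.is_compatible = is_compatible
        self.versions: Dict[str, List[Schema]] = {}

    def register(self, topic: str, schema: Schema) -> int:
        history = self.versions.setdefault(topic, [])
        # Scenario 1: no schema exists for the topic, so store it as the first version.
        if not history:
            history.append(schema)
            return 0
        # Scenario 2: the same schema is already stored, so reuse its version.
        for version, stored in enumerate(history):
            if stored == schema:
                return version
        # Scenario 3: a new, compatible schema becomes the current version.
        if self.is_compatible(schema, history[-1]):
            history.append(schema)
            return len(history) - 1
        # Not a listed scenario: an incompatible schema is rejected.
        raise IncompatibleSchemaError(f"schema rejected for topic {topic!r}")


def keeps_existing_fields(new: Schema, old: Schema) -> bool:
    # Hypothetical rule for this demo: every existing field keeps its type.
    return all(new.get(name) == ftype for name, ftype in old.items())


registry = InMemorySchemaRegistry(keeps_existing_fields)
v1 = {"name": "string"}
v2 = {"name": "string", "age": "int"}
print(registry.register("users", v1))  # 0: first schema for the topic
print(registry.register("users", v1))  # 0: same schema, existing version reused
print(registry.register("users", v2))  # 1: compatible schema, new version
```

Passing the compatibility rule in as a function keeps the registration flow independent of the compatibility strategy, which is why the strategy has to be chosen before a new schema is registered.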