Apache Kafka Connect

Introduction to Apache Kafka Connect

Apache Kafka Connect is one of the features introduced in Apache Kafka 0.9. It simplifies the integration between Apache Kafka and other systems. In a plain messaging architecture, a Producer publishes data to a topic and Consumers consume the messages that the Producer published to that topic. Apache Kafka Connect simplifies this: it takes data from external systems and publishes it to topics, and from those topics it publishes the data to other external systems. Connectors move the data smoothly from one external system to another without you writing a single line of Producer or Consumer code. Connectors that consume data from other systems and publish it to topics are called Source Connectors, and connectors that consume data from topics and publish it to other external systems are called Sink Connectors.

The overall data flow of Apache Kafka Connect looks like this:

External system → Source Connector → Kafka topic → Sink Connector → External system

This flow gives a clear picture of Apache Kafka Connect's functionality. Several types of connectors are available:

  • File Source & Sink Connectors
  • JDBC Source & Sink Connectors
  • MQTT Source & Sink Connectors
  • NoSQL Source & Sink Connectors (MongoDB, HBase, Cassandra, Hadoop, etc.)
  • GraphQL Source & Sink Connectors

One can test the concept of connectors right after downloading Kafka and unzipping it into a local folder. A default connector is available, the FileConnector; using the FileConnectors one can check both the source and the sink functionality.
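Note that in recent Kafka releases (3.2 and later) the demo FileStream connectors are no longer on the Connect worker classpath by default. If Connect reports that the connector class cannot be found, add a plugin.path entry to config/connect-standalone.properties pointing at the bundled jar; a sketch, with an illustrative version number:

plugin.path=libs/connect-file-3.6.0.jar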

How to test the default FileConnector functionality

  • Create a file named test-source-file.txt
  • Use the default Kafka topic named connect-test
  • Write something in test-source-file.txt
  • The lines will be published to the connect-test topic by the source connector, as defined in the connect-file-source.properties file in the config folder
  • Create another file named test-sink-file.txt
  • The sink connector will consume the data from the connect-test topic and publish it to test-sink-file.txt, using the connect-file-sink.properties file in the same config folder

File Connectors Example

A simple flow makes it clear how the above functionality works:

test-source-file.txt → FileStreamSource → connect-test topic → FileStreamSink → test-sink-file.txt

One can see here that two connectors are used:

  • FileStreamSource – It reads the data from the test-source-file.txt file and publishes it to the connect-test topic. This FileStreamSource connector is driven by a properties file named connect-file-source.properties in the config folder.
  • FileStreamSink – It consumes the data from the connect-test topic and writes it to the test-sink-file.txt file. This FileStreamSink connector is driven by another properties file, connect-file-sink.properties, in the same config folder.

Let’s examine the connect-file-source.properties:

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test-source-file.txt
topic=connect-test

Let’s go through the keys used in the above source properties file:

  1. name – The name of the connector is local-file-source.
  2. connector.class – FileStreamSource is the class that does the work of reading the data from the file and publishing the records to the topic.
  3. tasks.max – The maximum number of tasks to create. Here it is set to 1.
  4. file – The file from which the FileStreamSource connector class reads the data it publishes to a topic. By default the file name is test.txt, but I have changed it to test-source-file.txt.
  5. topic – The topic to which the same source connector class publishes the data after reading it from the file. By default the topic name is connect-test.
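The properties file above is how connectors are configured in standalone mode. When Kafka Connect runs in distributed mode, the same connector is instead created through the Connect REST interface, which listens on port 8083 by default. A sketch of the equivalent request, assuming a worker on localhost (single-quote syntax for a Unix-style shell such as Git Bash):

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{"name": "local-file-source",
       "config": {"connector.class": "FileStreamSource",
                  "tasks.max": "1",
                  "file": "test-source-file.txt",
                  "topic": "connect-test"}}'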

Now, let’s examine the connect-file-sink.properties:

name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test-sink-file.txt
topics=connect-test

Let’s go through the keys used in the above sink properties file:

  1. name – The name of the connector is local-file-sink.
  2. connector.class – FileStreamSink is the class that does the work of consuming the records from the topic and writing the data to a file.
  3. tasks.max – The maximum number of tasks to create. Here it is set to 1.
  4. file – The file to which the FileStreamSink connector class writes the data it consumes from the topic. By default the file name is test.sink.txt, but I have changed it to test-sink-file.txt.
  5. topics – Note the plural key: the topic(s) from which the same sink connector class consumes the data before writing it to the file. By default the topic name is connect-test. A comma-separated list is also accepted, as the sketch below shows.
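Because topics accepts a list, one sink connector can drain several topics at once. A minimal sketch, assuming a hypothetical second topic named other-topic:

topics=connect-test,other-topic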

Run Kafka File Connect in Windows

  • Download Kafka and extract it on your local Windows system
  • Open a Command Prompt (CMD) and start ZooKeeper

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

  • Open another CMD and start the Apache Kafka Server

.\bin\windows\kafka-server-start.bat .\config\server.properties

  • Open another CMD and start both the Source and Sink Connectors

.\bin\windows\connect-standalone.bat config\connect-standalone.properties config\connect-file-source.properties config\connect-file-sink.properties

After this command, the source connector is ready to read content from the test-source-file.txt file inside the extracted Kafka folder.
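The standalone worker also exposes the Kafka Connect REST interface, on port 8083 by default. Assuming that default port, one can confirm that both connectors were loaded:

curl http://localhost:8083/connectors

This should print something like ["local-file-source","local-file-sink"].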

  • Write some dummy lines in the test-source-file.txt file and check that they are published to the connect-test topic by the File Source Connector, and picked up from there by the File Sink Connector

Hi Test Name
Welcome to Kafka Connect
Hi How are you doing?

  • Open another CMD and monitor the connect-test Kafka topic from the beginning

.\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --from-beginning --topic connect-test

Output of the above command is:

{"schema":{"type":"string","optional":false},"payload":"Hi Test Name"}

{"schema":{"type":"string","optional":false},"payload":"Welcome to Kafka Connect"}

{"schema":{"type":"string","optional":false},"payload":"Hi How are you doing?"}

  • Verify that the same lines are present in test-sink-file.txt after the File Sink Connector consumes the data from the connect-test topic and writes those lines to the file

Hi Test Name
Welcome to Kafka Connect
Hi How are you doing?
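To watch the sink file fill up, one can print it from another CMD window (assuming the current directory is the extracted Kafka folder):

type test-sink-file.txt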

I have provided a child page where you can explore the usage of a Mongo Kafka Connector through a small Spring Boot application.