Q-1). What is Kafka?
A-1). Kafka is an open-source, distributed, fault-tolerant event streaming platform maintained by the Apache Software Foundation and written in Scala and Java.
Q-2). What are the salient features of Apache Kafka?
A-2). The salient features of Apache Kafka are as follows:
- High Throughput – Handles very high message rates (millions of messages per second on a suitably sized cluster)
- Scalability – Scales horizontally: brokers can be added to a cluster without downtime
- Replication – Partitions are replicated across brokers, so messages survive broker failures; consumer groups rebalance partitions among the remaining consumers when a consumer fails
- Durability – Messages are persisted to disk
- Stream Processing – Used with real-time stream processing frameworks such as Apache Spark & Storm
- No Data Loss – With proper configuration, Kafka can ensure zero data loss
Q-3). What are the various components in Kafka?
A-3). The four major components of Kafka are as follows:
- Topic – A stream of messages belonging to the same type
- Producer – Publishes messages to Topics
- Brokers – A set of servers where the published messages are stored
- Consumer – Subscribes to various topics and pulls the data from the brokers
Q-4). What is the role of the offset?
A-4). Each message in a partition is assigned a sequential identification number called the offset. The offset uniquely identifies each message within its partition (offsets are not unique across partitions).
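A minimal sketch of how offsets relate to a partition, assuming a toy in-memory model (the `PartitionLog` class is hypothetical, not a Kafka API). Each append gets the next sequential offset, and a reader can resume from any offset. The model ignores retention and log compaction, which in a real cluster can make the earliest available offset greater than 0 or leave gaps.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Toy model of one partition: an append-only list of messages."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, payload: bytes) -> int:
        """Append a message and return the offset assigned to it."""
        offset = len(self.messages)  # offsets start at 0 and grow by 1 per message
        self.messages.append(payload)
        return offset

    def read(self, from_offset: int, max_count: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_count (offset, message) pairs starting at from_offset."""
        end = min(from_offset + max_count, len(self.messages))
        return [(o, self.messages[o]) for o in range(from_offset, end)]


log = PartitionLog()
for msg in [b"first", b"second", b"third"]:
    print(log.append(msg))      # 0, then 1, then 2
print(log.read(from_offset=1))  # [(1, b'second'), (2, b'third')]
```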
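Q-5). What is a Consumer Group?
A-5). A Consumer Group consists of one or more Consumers that jointly consume a set of subscribed topics. Each partition is consumed by exactly one Consumer in the group, so the Consumers share the load.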
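A minimal sketch of how a group can split partitions among its members, assuming a simplified round-robin strategy. Kafka ships several assignors (range, round-robin, sticky, cooperative sticky). `assign_round_robin` is a hypothetical helper and does not reproduce any of them exactly. It only shows the invariant: every partition has exactly one owner within the group.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over the consumers of one group, round-robin.

    Each partition goes to exactly one consumer. A consumer may own several
    partitions, and consumers beyond the partition count receive none.
    """
    if not consumers:
        return {}
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    ordered = sorted(consumers)  # a deterministic order so every member computes the same result
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


# Six partitions shared by the two consumers of one group.
print(assign_round_robin(list(range(6)), ["consumer-b", "consumer-a"]))
```

With three consumers and only two partitions, one consumer stays idle. This is why adding consumers beyond the partition count does not increase a group's parallelism.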
Q-6). What is a Topic?
A-6). A Kafka Topic is the named stream to which Producers publish messages and from which Consumers read them. Every topic has a unique name, and its data is stored durably in the cluster, much like the rows of a database table.
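Q-7). What is an Apache Kafka Broker?
A-7). A Broker is a Kafka server, running on a physical or virtual machine, that stores topic partitions and serves Producer and Consumer requests. A Kafka cluster consists of one or more Brokers.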
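Q-8). What is the role of ZooKeeper?
A-8). ZooKeeper stores the cluster's metadata, such as which brokers are alive, which broker is the controller, and topic and partition configuration. Older Kafka consumers also stored the offsets of consumed messages in ZooKeeper, per topic, partition, and Consumer Group; newer consumers commit offsets to an internal Kafka topic, __consumer_offsets, instead.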
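Committed offsets are keyed by consumer group, topic, and partition, so each group tracks its own reading position independently. Here is a minimal sketch of that bookkeeping, assuming a plain dict as the backing store (a stand-in for ZooKeeper or the `__consumer_offsets` topic). `OffsetStore` is hypothetical. It follows the Java client's convention that the committed value is the *next* offset to read.

```python
from typing import Dict, Optional, Tuple

# (consumer group, topic, partition) -> next offset the group should read
OffsetKey = Tuple[str, str, int]


class OffsetStore:
    """Toy committed-offset store; a dict stands in for the real backing storage."""

    def __init__(self) -> None:
        self._committed: Dict[OffsetKey, int] = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        """Record where this group should resume reading this partition."""
        self._committed[(group, topic, partition)] = next_offset

    def fetch(self, group: str, topic: str, partition: int) -> Optional[int]:
        """Return the committed offset, or None if the group never committed one."""
        return self._committed.get((group, topic, partition))


store = OffsetStore()
store.commit("billing", "orders", 0, 42)
print(store.fetch("billing", "orders", 0))    # 42
print(store.fetch("analytics", "orders", 0))  # None: each group tracks its own position
```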
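Q-9). Is it possible to use Kafka without ZooKeeper?
A-9). For Kafka versions that depend on ZooKeeper, no: the brokers cannot run without it, and while ZooKeeper is unavailable the cluster cannot perform metadata operations such as electing partition leaders. Since Kafka 2.8, however, Kafka can run without ZooKeeper in KRaft mode, where a quorum of Kafka controllers manages the metadata. KRaft became production-ready in Kafka 3.3, and Kafka 4.0 removed ZooKeeper support entirely.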
Q-10). What are Leaders and Followers?
A-10). Every partition in a Kafka cluster has one leader and zero or more followers, each on a different broker. By default the leader handles all read and write requests for the partition, while the followers replicate the leader's log. If the leader fails, one of the in-sync followers is elected as the new leader.
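Leader failover can be sketched as follows, assuming the simplified rule "pick the first assigned replica that is alive and in sync". `elect_leader` is hypothetical and simplifies what Kafka's controller does. When no in-sync replica is alive, the partition stays offline unless the real broker setting `unclean.leader.election.enable` allows an out-of-sync replica to take over, at the risk of losing data. This sketch does not model that case.

```python
from typing import List, Optional, Set


def elect_leader(assigned_replicas: List[int], isr: Set[int], live_brokers: Set[int]) -> Optional[int]:
    """Return the first assigned replica that is both alive and in sync.

    Returns None when no in-sync replica is alive (the partition goes offline
    in this sketch; unclean leader election is not modelled).
    """
    for broker_id in assigned_replicas:  # assignment order expresses leader preference
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None


# Replicas on brokers 1, 2, 3; broker 1 (the current leader) has failed.
print(elect_leader([1, 2, 3], isr={1, 2, 3}, live_brokers={2, 3}))  # 2
```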
Q-11). What are Replicas and ISR?
A-11). Replicas are the list of brokers that replicate the log of a particular partition, whether or not they are currently the leader. ISR stands for In-Sync Replicas: the subset of those replicas that are fully caught up with the leader.
Q-12). Why is replication critical in Kafka?
A-12). Replication ensures that published messages are not lost and can still be consumed in the event of a software bug, a software upgrade, or a machine failure.
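How much failure replication absorbs depends on the replication factor and the real broker/topic setting `min.insync.replicas`, used together with the producer setting `acks=all`. With `acks=all`, a write is acknowledged only after every in-sync replica has it, and it is rejected if fewer than `min.insync.replicas` replicas are in sync. The back-of-the-envelope arithmetic below is a minimal sketch, assuming unclean leader election is disabled. `replication_budget` is a hypothetical helper, and its validation rule is a choice made for this sketch.

```python
from typing import NamedTuple


class ReplicationBudget(NamedTuple):
    failures_tolerated_for_writes: int    # replicas that can be down while acks=all writes still succeed
    failures_survived_by_acked_data: int  # replicas that can be lost without losing an acknowledged write


def replication_budget(replication_factor: int, min_insync_replicas: int) -> ReplicationBudget:
    """Arithmetic sketch for acks=all producers (unclean leader election disabled)."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return ReplicationBudget(
        # Writes keep succeeding while at least min.insync.replicas replicas stay in sync.
        failures_tolerated_for_writes=replication_factor - min_insync_replicas,
        # An acknowledged write sits on at least min.insync.replicas replicas.
        failures_survived_by_acked_data=min_insync_replicas - 1,
    )


print(replication_budget(3, 2))
# ReplicationBudget(failures_tolerated_for_writes=1, failures_survived_by_acked_data=1)
```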
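Q-13). If a Replica stays out of the ISR for a long time, what does it signify?
A-13). It means that the follower cannot fetch data as fast as the leader receives it, so it keeps falling behind, for example because its broker is overloaded or slow, or because of network problems.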
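Brokers make this decision with the real broker setting `replica.lag.time.max.ms`. A follower that has not caught up to the leader's log end within that window is dropped from the ISR. Here is a minimal sketch of that rule, assuming each replica's "last fully caught-up" time is known. `compute_isr` is a hypothetical helper, and the times below are made-up example values.

```python
from typing import Dict, Set


def compute_isr(last_caught_up_ms: Dict[int, int], now_ms: int, lag_time_max_ms: int) -> Set[int]:
    """Return the replicas considered in sync at time now_ms.

    last_caught_up_ms maps broker id -> the last time that replica had fetched
    everything up to the leader's log end (the leader itself is always current).
    """
    return {
        broker
        for broker, caught_up_at in last_caught_up_ms.items()
        if now_ms - caught_up_at <= lag_time_max_ms
    }


# Broker 1 is the leader; broker 3 last caught up 60 s ago.
replicas = {1: 100_000, 2: 95_000, 3: 40_000}
print(sorted(compute_isr(replicas, now_ms=100_000, lag_time_max_ms=30_000)))  # [1, 2]
```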
Q-14). What is the process to start a Kafka Server?
A-14). First start Apache ZooKeeper, then start the Kafka Server. The commands below are for Windows; on Linux or macOS, use the matching .sh scripts in bin/.
- Command to start Apache ZooKeeper – .\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
- Command to start the Kafka Server – .\bin\windows\kafka-server-start.bat .\config\server.properties
Q-15). How to define a Partitioning Key?
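A-15). The Partitioning Key (the message key) determines which partition a message is written to. The default partitioner hashes the key and maps the hash to one of the topic's partitions, so messages with the same key always go to the same partition as long as the number of partitions does not change. One can also plug in a custom partitioner.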
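Hash-based partitioning can be sketched as below, assuming CRC32 as a stand-in hash. Kafka's default partitioner uses murmur2, so the partition numbers here will not match a real cluster. The property being illustrated is the same: equal keys always map to the same partition. `choose_partition` is a hypothetical helper and handles keyed records only. Real producers treat records without a key differently instead of hashing them.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition: hash(key) mod partition count.

    CRC32 is a stand-in for Kafka's murmur2; zlib.crc32 returns a non-negative
    int in Python 3, so the result always lies in [0, num_partitions).
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


for k in [b"user-1", b"user-2", b"user-1"]:
    print(k, choose_partition(k, 3))  # both b"user-1" lines show the same partition
```

Because the mapping depends on the partition count, adding partitions to a topic changes where existing keys land. This is one reason partition counts for keyed topics are usually chosen up front.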
Q-16). When does QueueFullException occur?
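A-16). QueueFullException occurs when the Producer sends messages faster than the Brokers can accept them, so the Producer's internal send queue fills up. The exception belongs to the legacy Scala asynchronous producer; the current Java producer instead blocks send() for up to max.block.ms when its buffer (buffer.memory) is exhausted, and then throws an exception. In this scenario, one needs to add more Brokers to handle the increased load, or slow the Producer down.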
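The failure mode can be modeled with a bounded in-memory queue, assuming a toy producer whose buffer holds a fixed number of messages. `ToyAsyncProducer` is hypothetical and is not Kafka's implementation. Python's `queue.Full` plays the role of `QueueFullException`: once senders outpace the background drain, the next send fails immediately.

```python
import queue


class ToyAsyncProducer:
    """Hypothetical producer that buffers messages in a bounded queue."""

    def __init__(self, capacity: int) -> None:
        self._queue: "queue.Queue[bytes]" = queue.Queue(maxsize=capacity)

    def send(self, message: bytes) -> None:
        # put_nowait raises queue.Full when the buffer is at capacity: the toy
        # equivalent of QueueFullException (messages arrive faster than they drain).
        self._queue.put_nowait(message)

    def drain_one(self) -> bytes:
        """Simulate the background sender shipping one buffered message to a broker."""
        return self._queue.get_nowait()


producer = ToyAsyncProducer(capacity=2)
producer.send(b"m1")
producer.send(b"m2")
try:
    producer.send(b"m3")
except queue.Full:
    print("send queue full: the broker side is not keeping up")
```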
Q-17). What is the role of KafkaProducer API?
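A-17). In the legacy Scala client, the Producer API wrapped two types of producers, kafka.producer.SyncProducer and kafka.producer.async.AsyncProducer. The goal was to give the client a single API that handles all producer functionality. The current Java KafkaProducer keeps that goal with one class: send() is asynchronous and returns a Future, and calling get() on that Future makes the send synchronous.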
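A single send() can serve both asynchronous and synchronous callers. The sketch below shows the pattern, assuming a toy producer backed by one background thread. `ToyProducer` is hypothetical. Its `send()` returns a `concurrent.futures.Future`, loosely mirroring the Java client's `Future<RecordMetadata>`, and calling `.result()` plays the role of Java's `get()`.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


class ToyProducer:
    """Hypothetical producer whose one send() is asynchronous by default."""

    def __init__(self) -> None:
        # A single worker thread delivers messages in submission order.
        self._sender = ThreadPoolExecutor(max_workers=1)
        self._next_offset = 0

    def _deliver(self, message: bytes) -> int:
        time.sleep(0.01)  # pretend network round trip to the broker
        offset = self._next_offset
        self._next_offset += 1
        return offset

    def send(self, message: bytes) -> "Future[int]":
        """Return immediately with a Future that resolves to the assigned offset."""
        return self._sender.submit(self._deliver, message)

    def close(self) -> None:
        self._sender.shutdown(wait=True)


producer = ToyProducer()
pending = producer.send(b"async")          # asynchronous: the caller keeps working
offset = producer.send(b"sync").result()   # synchronous: block until acknowledged
print(pending.result(), offset)            # 0 1
producer.close()
```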
Q-18). What is the main difference between Kafka and Flume?
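A-18). Kafka is a general-purpose, distributed publish-subscribe system whose messages can be read by many independent consumers, and it scales by adding brokers and partitions. Flume is a service built mainly for collecting log data and pushing it into Hadoop stores such as HDFS. Kafka also replicates messages across brokers, so they survive a broker failure, whereas Flume does not replicate events and can lose events buffered in a memory channel if an agent fails.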
Q-19). What is the difference between a Traditional Messaging System and Apache Kafka?
A-19). The differences are as follows:
| Traditional Messaging System | Apache Kafka |
|---|---|
| Limited scalability | Scalable |
| Transient, in-memory persistence | Messages are also stored in replicated logs |
| Lower throughput | Higher throughput |
Q-20). What are the components in a Kafka Message Structure?
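A-20). A Kafka Message (record) has the following main parts:
- Key (optional), used to choose the partition
- Value, the message payload (binary)
- Timestamp
- Headers (optional key-value metadata, added in Kafka 0.11)
Rather than carrying a separate unique message ID, each message is identified by its offset, which the broker assigns within the partition.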
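The logical fields of a record in current Kafka versions can be sketched as a Python dataclass: an optional key, a value (payload), a timestamp, optional headers, and an offset that the broker assigns on append. The `Record` class is hypothetical and describes what a record carries. It does not reproduce the on-disk batch format, which adds attributes, CRCs, and other bookkeeping.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Sketch of a Kafka record's logical fields (not the on-disk byte layout)."""
    value: Optional[bytes]                       # payload; None acts as a delete marker in compacted topics
    key: Optional[bytes] = None                  # drives partitioning (and compaction)
    timestamp_ms: int = 0                        # create time or log-append time
    headers: Tuple[Tuple[str, bytes], ...] = ()  # optional application metadata
    offset: Optional[int] = None                 # set by the broker on append, never by the producer


# What a producer builds: no offset yet.
outgoing = Record(value=b'{"id": 7}', key=b"order-7", timestamp_ms=1_700_000_000_000,
                  headers=(("source", b"web"),))
# What a consumer later sees: the same record with the broker-assigned offset.
stored = replace(outgoing, offset=12)
print(stored.offset, stored.key)  # 12 b'order-7'
```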