
Kafka Test Environment Outage Investigation (Part 2)


Open Questions from the Previous Post

 

    a) Regarding the code inLock(controllerContext.controllerLock): since controllerLock is a ReentrantLock,
    if the ZooKeeper thread that handles the session expiry manages to grab the lock, it can enter the body of
    handleNewSession and go on to execute the code in onControllerResignation. And precisely because the lock
    is reentrant, onControllerResignation, which locks again on the same thread, can run as well.

    b) Inside it is the call chain deleteTopicManager.shutdown() -> deleteTopicsThread.shutdown(). At this
    point controllerLock has been locked twice: once in handleNewSession and once in onControllerResignation.
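
As a quick illustration of reentrancy, here is a minimal self-contained sketch (ours, not Kafka code) of one thread taking the same ReentrantLock twice, the way handleNewSession and onControllerResignation do:

import java.util.concurrent.locks.ReentrantLock

object ReentrancyDemo extends App {
  val lock = new ReentrantLock()

  lock.lock()                    // first acquisition (think handleNewSession)
  try {
    lock.lock()                  // second acquisition by the SAME thread succeeds immediately
    try {
      println(s"hold count = ${lock.getHoldCount}")   // prints: hold count = 2
    } finally lock.unlock()      // hold count drops back to 1
  } finally lock.unlock()        // lock fully released
}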

 

 

Startup Analysis

 

 a) The three nodes 10.3.63.5/6/7 carry broker ids 1, 2 and 3 respectively; all three were started around 2015-01-20 13:51.
    b) controller.log on node 5:

[2015-01-20 13:51:13,361] INFO [ControllerEpochListener on 1]: Initialized controller epoch to 21 and zk version 20 (kafka.controller.ControllerEpochListener)
[2015-01-20 13:51:13,467] INFO [Controller 1]: Controller starting up (kafka.controller.KafkaController)
[2015-01-20 13:51:13,538] INFO [Controller 1]: Controller startup complete (kafka.controller.KafkaController)

        controller.log on node 6:

[2015-01-20 13:50:45,173] INFO [ControllerEpochListener on 2]: Initialized controller epoch to 21 and zk version 20 (kafka.controller.ControllerEpochListener)
[2015-01-20 13:50:45,222] INFO [Controller 2]: Controller starting up (kafka.controller.KafkaController)
[2015-01-20 13:50:45,291] INFO [Controller 2]: Controller startup complete (kafka.controller.KafkaController)

        controller.log on node 7:

[2015-01-20 13:52:09,970] INFO [ControllerEpochListener on 3]: Initialized controller epoch to 20 and zk version 19 (kafka.controller.ControllerEpochListener)
[2015-01-20 13:52:10,063] INFO [Controller 3]: Controller starting up (kafka.controller.KafkaController)
[2015-01-20 13:52:10,088] INFO [Controller 3]: Broker 3 starting become controller state transition (kafka.controller.KafkaController)
[2015-01-20 13:52:10,098] INFO [Controller 3]: Controller 3 incremented epoch to 21 (kafka.controller.KafkaController)
[2015-01-20 13:52:10,691] INFO [Controller 3]: Partitions undergoing preferred replica election:  (kafka.controller.KafkaController)
[2015-01-20 13:52:10,692] INFO [Controller 3]: Partitions that completed preferred replica election:  (kafka.controller.KafkaController)
[2015-01-20 13:52:10,693] INFO [Controller 3]: Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
[2015-01-20 13:52:10,698] INFO [Controller 3]: Partitions being reassigned: Map() (kafka.controller.KafkaController)
[2015-01-20 13:52:10,699] INFO [Controller 3]: Partitions already reassigned: List() (kafka.controller.KafkaController)
[2015-01-20 13:52:10,700] INFO [Controller 3]: Resuming reassignment of partitions: Map() (kafka.controller.KafkaController)
[2015-01-20 13:52:10,708] INFO [Controller 3]: List of topics to be deleted:  (kafka.controller.KafkaController)
[2015-01-20 13:52:10,709] INFO [Controller 3]: List of topics ineligible for deletion: V1_A,V1_TEST,V1_B (kafka.controller.KafkaController)
[2015-01-20 13:52:10,712] INFO [Controller 3]: Currently active brokers in the cluster: Set() (kafka.controller.KafkaController)
[2015-01-20 13:52:10,713] INFO [Controller 3]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
[2015-01-20 13:52:10,713] INFO [Controller 3]: Current list of topics in the cluster: Set(V1_TEST, V1_B, V1_A) (kafka.controller.KafkaController)
[2015-01-20 13:52:10,730] INFO [Replica state machine on controller 3]: Started replica state machine with initial state -> Map([Topic=V1_TEST,Partition=9,Replica=2] -> ReplicaDeletionIneligible, [Topic=V1_A,Partition=1,Replica=2] -> ReplicaDeletionIneligible, [Topic=V1_A,Partition=4,Repli...
[2015-01-20 13:52:11,035] INFO [Partition state machine on Controller 3]: Started partition state machine with initial state -> Map([V1_TEST,4] -> OfflinePartition, [V1_A,4] -> OfflinePartition, [V1_TEST,3] -> OfflinePartition, [V1_B,9] -> OfflinePartition, [V1_A,3] -> OfflinePartition, [V1_TEST,2] -> OfflinePartition, [V1_TEST,0] -> OfflinePartition, [V1_B,2] -> OfflinePartition, [V1_TEST,1] -> OfflinePartition, [V1_B,4] -> OfflinePartition, [V1_B,8] -> OfflinePartition, [V1_B,1] -> OfflinePartition, [V1_B,6] -> OfflinePartition, [V1_TEST,7] -> OfflinePartition, [V1_A,0] -> OfflinePartition, [V1_TEST,8] -> OfflinePartition, [V1_B,0] -> OfflinePartition, [V1_B,5] -> OfflinePartition, [V1_TEST,9] -> OfflinePartition, [V1_TEST,5] -> OfflinePartition, [V1_A,1] -> OfflinePartition, [V1_A,5] -> OfflinePartition, [V1_B,7] -> OfflinePartition, [V1_B,3] -> OfflinePartition, [V1_A,2] -> OfflinePartition, [V1_TEST,6] -> OfflinePartition, [V1_TEST,10] -> NewPartition, [V1_A,6] -> OfflinePartition) (kafka.controller.PartitionStateMachine)
[2015-01-20 13:52:11,042] INFO [Controller 3]: Broker 3 is ready to serve as the new controller with epoch 21 (kafka.controller.KafkaController)
[2015-01-20 13:52:11,044] INFO [Controller 3]: Starting preferred replica leader election for partitions  (kafka.controller.KafkaController)
[2015-01-20 13:52:11,045] INFO [Partition state machine on Controller 3]: Invoking state change to OnlinePartition for partitions  (kafka.controller.PartitionStateMachine)
[2015-01-20 13:52:11,098] INFO [Controller 3]: Controller startup complete (kafka.controller.KafkaController)

      c) Inspecting the broker/ids znodes in ZooKeeper shows that 3 appeared first, then 2, then 1. And per the
      logs above, only broker 3 actually became the controller. The mechanism: during KafkaController startup an
      election is held, and whichever broker is first to successfully write the /controller znode in ZooKeeper
      becomes the controller in the real sense; for clarity, call it the active controller. Having won the
      election, it then calls KafkaController's onControllerFailover method, via the onBecomingLeader() callback
      inside ZookeeperLeaderElector.elect.
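
The election idea can be sketched with the plain ZooKeeper client API (a simplified illustration of first-writer-wins, not the actual ZookeeperLeaderElector code; the znode data format here is an assumption):

import org.apache.zookeeper.{CreateMode, KeeperException, ZooKeeper}
import org.apache.zookeeper.ZooDefs.Ids

object ControllerElectionSketch {
  // Returns true if this broker won the election by being first to create the
  // ephemeral /controller znode; a loser gets NodeExistsException and returns false.
  def tryBecomeController(zk: ZooKeeper, brokerId: Int): Boolean =
    try {
      zk.create("/controller", brokerId.toString.getBytes("UTF-8"),
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL)
      true    // winner: onBecomingLeader() / onControllerFailover() would fire here
    } catch {
      case _: KeeperException.NodeExistsException => false   // somebody else already won
    }
}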

      d) This means not every broker starts deleteTopicManager.start(); only the active controller does. Only the
      active controller executes onControllerFailover -> initializeControllerContext() -> initializeTopicDeletion()
      -> deleteTopicManager = new TopicDeletionManager(this, topicsQueuedForDeletion, topicsIneligibleForDeletion).

      e) The other brokers never start deleteTopicManager; as the code above shows, on them it is never even
      initialized, i.e. it is still null.

      f) From the above it follows that when the ZooKeeper session expires, callbacks fire: first KafkaController's
      handleNewSession, then KafkaHealthcheck's handleNewSession. The ZooKeeper source shows these callbacks run on
      a separate (ZK event) thread. That is why "ZK expired; shut down all controller components and try to
      re-elect" is logged first, and only then does inLock(controllerContext.controllerLock) execute. Because
      controllerLock is a ReentrantLock, if some other thread grabbed it first, the code inside the lock cannot
      run, which explains why the inner code executed on some brokers but not on others.
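
For reference, Kafka's inLock helper is essentially a lock/finally-unlock wrapper around a by-name block, roughly like this (a paraphrase, not a verbatim copy of the kafka.utils source):

import java.util.concurrent.locks.Lock

object LockUtils {
  // Acquire the lock, run the block, and always release the lock,
  // even if the block throws.
  def inLock[T](lock: Lock)(block: => T): T = {
    lock.lock()
    try block
    finally lock.unlock()
  }
}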

      g) Assume the inner code did run, so execution proceeds all the way to onControllerResignation() ->
      deleteTopicManager.shutdown() -> deleteTopicsThread.shutdown(). The code actually executed is:

  def shutdown(): Unit = {
    info("Shutting down")
    isRunning.set(false)     // ask the worker loop to stop
    if (isInterruptible)
      interrupt()            // wake the thread if it is parked or blocked
    shutdownLatch.await()    // block until run() calls countDown()
    info("Shutdown completed")
  }

     If the node whose session expired is the active controller, this thread was started (start() was called)
     and is running the run() method below. If it is not the active controller, start() was never called, so
     shutdown() would wait forever at shutdownLatch.await().

override def run(): Unit = {
    info("Starting ")
    try {
      while (isRunning.get())
        doWork()              // parks in deleteTopicsCond.await() while idle
    } catch {
      case e: Throwable =>
        if (isRunning.get())
          error("Error due to ", e)
    }
    shutdownLatch.countDown() // only ever reached if the thread was actually started
    info("Stopped ")
  }
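
The hang on a never-started thread is easy to reproduce with a bare CountDownLatch (a minimal sketch, not Kafka code): if run() never executes, nothing ever counts the latch down, so await() blocks indefinitely.

import java.util.concurrent.{CountDownLatch, TimeUnit}

object LatchHangDemo extends App {
  val shutdownLatch = new CountDownLatch(1)

  // No thread was ever started, so nobody will call shutdownLatch.countDown().
  // A plain await() would block forever; the timed variant makes the effect visible.
  val released = shutdownLatch.await(2, TimeUnit.SECONDS)
  println(s"latch released: $released")   // prints: latch released: false
}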

    h) One open question remains: if the broker is not the active controller, deleteTopicManager and
    deleteTopicsThread are never initialized, i.e. they are null, so executing this code should throw a
    NullPointerException. We have not observed one so far, and cannot explain this yet.
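
If the resignation path really can run on a broker that never initialized the manager, a defensive guard would avoid the NPE. The sketch below is hypothetical (names are illustrative; the real onControllerResignation may be structured so this case cannot arise):

// Hypothetical sketch, not the actual Kafka source.
trait Shutdownable { def shutdown(): Unit }

class ControllerLike {
  var deleteTopicManager: Shutdownable = null   // stays null on brokers that never won the election

  def onControllerResignation(): Unit = {
    if (deleteTopicManager != null)             // defensive null check avoids the NPE
      deleteTopicManager.shutdown()
  }
}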

 

I0Itec zkclient source (ZkEventThread):
http://www.boyunjian.com/javasrc/com.github.sgroschupf/zkclient/0.1/_/org/I0Itec/zkclient/util/ZkEventThread.java#

ReentrantLock analysis:
http://blog.csdn.net/aesop_wubo/article/details/7574379

 

 

server.log Analysis

 

 a) server.log on node 5:
[2015-01-20 13:50:01,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,167] INFO Property broker.id is overridden to 1 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,167] INFO Property host.name is overridden to 10.3.63.5 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,167] INFO Property log.cleaner.enable is overridden to true (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,168] INFO Property log.dirs is overridden to /data/kafka-log (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,168] INFO Property log.retention.bytes is overridden to 5368709120 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,169] INFO Property log.retention.check.interval.ms is overridden to 600000 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,169] INFO Property log.retention.hours is overridden to 720 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,169] INFO Property log.segment.bytes is overridden to 536870912 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,170] INFO Property num.io.threads is overridden to 8 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,170] INFO Property num.network.threads is overridden to 500 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,171] INFO Property num.partitions is overridden to 2 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,171] INFO Property port is overridden to 9092 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,171] INFO Property socket.receive.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,172] INFO Property socket.request.max.bytes is overridden to 104857600 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,172] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,173] INFO Property zookeeper.connect is overridden to 10.3.63.8:2181,10.3.63.9:2181 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,173] INFO Property zookeeper.connection.timeout.ms is overridden to 500000 (kafka.utils.VerifiableProperties)
[2015-01-20 13:50:01,532] INFO [Kafka Server 1], starting (kafka.server.KafkaServer)
[2015-01-20 13:50:01,535] INFO [Kafka Server 1], Connecting to zookeeper on 10.3.63.8:2181,10.3.63.9:2181 (kafka.server.KafkaServer)

    b) server.log on node 6:

[2015-01-20 13:49:57,403] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,455] INFO Property broker.id is overridden to 2 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,455] INFO Property host.name is overridden to 10.3.63.6 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,455] INFO Property log.cleaner.enable is overridden to true (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,456] INFO Property log.dirs is overridden to /data/kafka-log (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,456] INFO Property log.retention.bytes is overridden to 5368709120 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,456] INFO Property log.retention.check.interval.ms is overridden to 600000 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,456] INFO Property log.retention.hours is overridden to 720 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,457] INFO Property log.segment.bytes is overridden to 536870912 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,457] INFO Property num.io.threads is overridden to 8 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,457] INFO Property num.network.threads is overridden to 500 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,457] INFO Property num.partitions is overridden to 2 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,457] INFO Property port is overridden to 9092 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,458] INFO Property socket.receive.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,458] INFO Property socket.request.max.bytes is overridden to 104857600 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,458] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,458] INFO Property zookeeper.connect is overridden to 10.3.63.8:2181,10.3.63.9:2181 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,459] INFO Property zookeeper.connection.timeout.ms is overridden to 500000 (kafka.utils.VerifiableProperties)
[2015-01-20 13:49:57,480] INFO [Kafka Server 2], starting (kafka.server.KafkaServer)
[2015-01-20 13:49:57,482] INFO [Kafka Server 2], Connecting to zookeeper on 10.3.63.8:2181,10.3.63.9:2181 (kafka.server.KafkaServer)

    c) server.log on node 7:
[2015-01-20 13:51:36,520] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,680] INFO Property broker.id is overridden to 3 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,680] INFO Property host.name is overridden to 10.3.63.7 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,681] INFO Property log.cleaner.enable is overridden to true (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,681] INFO Property log.dirs is overridden to /data/kafka-log (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,681] INFO Property log.retention.bytes is overridden to 5368709120 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,681] INFO Property log.retention.check.interval.ms is overridden to 600000 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,682] INFO Property log.retention.hours is overridden to 720 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,682] INFO Property log.segment.bytes is overridden to 536870912 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,682] INFO Property num.io.threads is overridden to 8 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,682] INFO Property num.network.threads is overridden to 500 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,683] INFO Property num.partitions is overridden to 2 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,683] INFO Property port is overridden to 9092 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,683] INFO Property socket.receive.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,683] INFO Property socket.request.max.bytes is overridden to 104857600 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,684] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,684] INFO Property zookeeper.connect is overridden to 10.3.63.8:2181,10.3.63.9:2181 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,684] INFO Property zookeeper.connection.timeout.ms is overridden to 500000 (kafka.utils.VerifiableProperties)
[2015-01-20 13:51:36,740] INFO [Kafka Server 3], starting (kafka.server.KafkaServer)
[2015-01-20 13:51:36,743] INFO [Kafka Server 3], Connecting to zookeeper on 10.3.63.8:2181,10.3.63.9:2181 (kafka.server.KafkaServer)

    d) The server.log output looks identical on all three nodes so far.

 

Problems with the Above Analysis

 

 a) According to the analysis above, the active controller should not be able to die this way. Yet in reality
    node 7 did show exactly the symptom seen before. Here is the controller log on node 7:
[2015-01-20 14:51:57,113] INFO [SessionExpirationListener on 3], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
[2015-01-20 14:51:57,115] INFO [delete-topics-thread], Shutting down (kafka.controller.TopicDeletionManager$DeleteTopicsThread)

 

 

Revisiting the Open Questions from the Previous Post

 

 a) From the analysis above we know that only the one broker that successfully writes the /controller znode
    becomes the actual controller. That controller starts a DeleteTopicsThread, which waits for topic-deletion
    requests (from users or elsewhere) and processes them.

    b) The waiting is done via doWork() -> awaitTopicDeletionNotification() -> deleteTopicsCond.await(),
    which suspends the thread. The await call releases the lock so other threads can acquire it.

    c) And deleteTopicsCond = controllerContext.controllerLock.newCondition() is derived from the controller's
    ReentrantLock. As the links below explain, it is precisely because DeleteTopicsThread is parked in
    deleteTopicsCond.await(), which releases the lock, that KafkaController's handleNewSession can acquire it.

    d) handleNewSession then triggers the shutdown of DeleteTopicsThread. If no other thread were holding the
    ReentrantLock at that moment, deleteTopicsCond.await() would be broken out of by the interrupt() call. But
    this shutdown is performed by the ZooKeeper event thread inside handleNewSession, which acquired the
    ReentrantLock before getting there; the woken DeleteTopicsThread must re-acquire that same lock before
    await() can return, so it waits forever. The await code, for reference:

public final void await() throws InterruptedException {
    if (Thread.interrupted())
        throw new InterruptedException();
    Node node = addConditionWaiter(); // wrap the current thread in a node and
                                      // append it to the Condition's own wait queue
    int savedState = fullyRelease(node); // fully release the lock this thread holds
                                         // (the caller must hold the lock before await)

    int interruptMode = 0;
    while (!isOnSyncQueue(node)) { // after releasing, check whether this node has been
        // moved onto the AQS sync queue; if not, it is not yet eligible to compete
        // for the lock, so park again. It is signal() (or an interrupt) that moves
        // the node onto the sync queue.
        LockSupport.park(this);
        if ((interruptMode = checkInterruptWhileWaiting(node)) != 0)
            break;
    }
    // Once woken, compete for the lock again; if it cannot be acquired,
    // the thread blocks until it eventually succeeds.
    if (acquireQueued(node, savedState) && interruptMode != THROW_IE)
        interruptMode = REINTERRUPT;
    if (node.nextWaiter != null)
        unlinkCancelledWaiters();
    if (interruptMode != 0)
        reportInterruptAfterWait(interruptMode);
}
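
The hang can be reproduced end to end with a minimal, self-contained sketch (made-up names, not Kafka code): the "ZK thread" holds the lock while shutting the worker down, the interrupted worker cannot re-acquire the lock inside await(), and both sides block forever.

import java.util.concurrent.CountDownLatch
import java.util.concurrent.locks.ReentrantLock

object ShutdownDeadlockDemo extends App {
  val lock = new ReentrantLock()
  val cond = lock.newCondition()
  val done = new CountDownLatch(1)        // plays the role of shutdownLatch

  // Plays the role of DeleteTopicsThread: takes the lock, then parks in await().
  val worker = new Thread(new Runnable {
    def run(): Unit = {
      lock.lock()
      try {
        try cond.await()                  // releases the lock while parked
        catch { case _: InterruptedException => () }
        // await() must RE-ACQUIRE the lock before it can even throw
        // InterruptedException; while another thread holds the lock,
        // we stay stuck inside await() and never reach this point.
      } finally lock.unlock()
      done.countDown()
    }
  }, "delete-topics-thread")
  worker.start()
  Thread.sleep(200)                       // let the worker reach await()

  // Plays the role of the ZooKeeper event thread in handleNewSession:
  // it already holds the lock when it performs the shutdown.
  lock.lock()
  try {
    worker.interrupt()                    // wakes the worker out of its park()
    done.await()                          // deadlock: the worker needs our lock
  } finally lock.unlock()                 // to finish, and we never release it
                                          // while we are waiting here
}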

The queue that AQS itself maintains is the queue of threads waiting for the resource (the lock). As the resource is released, AQS wakes the queued threads in order from front to back so they resume execution, until the queue is empty.

The Condition maintains its own, separate queue: a queue of threads waiting for a signal. The two queues serve different purposes, and in fact each thread sits in at most one of them at any given time. The flow goes like this:

    1. Thread 1 calls reentrantLock.lock() and is tracked in the AQS queue; it ends up holding the lock.
    2. Thread 1 calls await(); the thread is removed from the AQS queue, which corresponds to releasing the lock.
    3. It is immediately added to the Condition's wait queue, meaning it now needs a signal to proceed.
    4. Thread 2, woken because thread 1 released the lock, determines it can take the lock and acquires it.
    5. Thread 2 calls signal(); the Condition's wait queue holds only thread 1, so that node is taken out and
       moved to the AQS queue. Note that thread 1 is not woken up at this point.
    6. With signal() done, thread 2 calls reentrantLock.unlock() to release the lock. AQS now contains only
       thread 1, so as AQS wakes threads from front to back after the release, thread 1 is woken and resumes.
    7. The whole sequence completes once the lock has finally been released.
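
A small runnable illustration of this handoff (a sketch of ours; the prints just mark the steps):

import java.util.concurrent.locks.ReentrantLock

object SignalDemo extends App {
  val lock = new ReentrantLock()
  val cond = lock.newCondition()

  val t1 = new Thread(new Runnable {
    def run(): Unit = {
      lock.lock()                          // step 1: t1 takes the lock
      try {
        println("t1: awaiting (lock released inside await)")
        cond.await()                       // steps 2-3: release lock, join condition queue
        println("t1: resumed (lock re-acquired)")
      } finally lock.unlock()
    }
  })
  t1.start()
  Thread.sleep(200)                        // let t1 park inside await()

  lock.lock()                              // step 4: t2 takes the lock t1 released
  try {
    cond.signal()                          // step 5: t1's node moves to the AQS queue,
    println("t2: signalled, still holding the lock")   // but t1 is not running yet
  } finally lock.unlock()                  // step 6: unlocking wakes t1
  t1.join()                                // step 7: wait for t1 to finish
}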

 

 

On ReentrantLock:
http://tenyears.iteye.com/blog/48750

 

On ReentrantLock.newCondition():
http://www.importnew.com/9281.html
http://tiangu.iteye.com/blog/1486949
http://www.2cto.com/kf/201401/274069.html
http://blog.csdn.net/chen77716/article/details/6641477
http://agapple.iteye.com/blog/966490
http://whitesock.iteye.com/blog/1337539
http://www.molotang.com/articles/480.html
http://coderbee.net/index.php/concurrent/20131209/614
http://ifeve.com/introduce-abstractqueuedsynchronizer/

 

Related Apache JIRA Issues

 

https://issues.apache.org/jira/browse/KAFKA-1903

https://issues.apache.org/jira/browse/KAFKA-1663
