“rabbitmq 停掉以后,compute会退出”问题的解决

当rabbitmq 停掉以后,过两分钟左右,compute会自动退出,日志中出现:

2012-03-25 21:41:26 INFO nova.rpc.common [-] Reconnecting to AMQP server on 192.168.28.5:5672
2012-03-25 21:41:27 ERROR nova.rpc.common [-] AMQP server on 192.168.28.5:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 7 seconds.
(nova.rpc.common): TRACE: Traceback (most recent call last):
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/nova/rpc/impl_kombu.py”, line 446, in reconnect
(nova.rpc.common): TRACE: self._connect()
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/nova/rpc/impl_kombu.py”, line 423, in _connect
(nova.rpc.common): TRACE: self.connection.connect()
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/kombu/connection.py”, line 118, in connect
(nova.rpc.common): TRACE: return self.connection
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/kombu/connection.py”, line 438, in connection
(nova.rpc.common): TRACE: self._connection = self._establish_connection()
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/kombu/connection.py”, line 404, in _establish_connection
(nova.rpc.common): TRACE: conn = self.transport.establish_connection()
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/kombu/transport/pyamqplib.py”, line 242, in establish_connection
(nova.rpc.common): TRACE: connect_timeout=conninfo.connect_timeout)
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/kombu/transport/pyamqplib.py”, line 51, in __init__
(nova.rpc.common): TRACE: super(Connection, self).__init__(*args, **kwargs)
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/amqplib/client_0_8/connection.py”, line 125, in __init__
(nova.rpc.common): TRACE: self.transport = create_transport(host, connect_timeout, ssl)
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/amqplib/client_0_8/transport.py”, line 220, in create_transport
(nova.rpc.common): TRACE: return TCPTransport(host, connect_timeout)
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/amqplib/client_0_8/transport.py”, line 58, in __init__
(nova.rpc.common): TRACE: self.sock.connect((host, port))
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/eventlet/greenio.py”, line 179, in connect
(nova.rpc.common): TRACE: socket_checkerr(fd)
(nova.rpc.common): TRACE: File “/usr/lib/python2.6/site-packages/eventlet/greenio.py”, line 43, in socket_checkerr
(nova.rpc.common): TRACE: raise socket.error(err, errno.errorcode[err])
(nova.rpc.common): TRACE: error: [Errno 113] EHOSTUNREACH
(nova.rpc.common): TRACE:

 

这个问题,是由于openstack中,对rabbitmq 如果失去连接,会进行尝试,缺省是尝试12次,每次间隔10秒,到时间还不能连接,就抛出错误,退出。

解决办法,在 nova.conf 加入下面的参数:

#防止 rabbitmq重启导致 compute 死掉
rabbit_max_retries=0

 

具体原因,可以参见代码:impl_kombu.py

self.max_retries = FLAGS.rabbit_max_retries

def reconnect(self):
“””Handles reconnecting and re-establishing queues.
Will retry up to self.max_retries number of times.
self.max_retries = 0 means to retry forever.
Sleep between tries, starting at self.interval_start
seconds, backing off self.interval_stepping number of seconds
each attempt.
“””

if self.max_retries and attempt == self.max_retries:
LOG.exception(_(‘Unable to connect to AMQP server on ‘
‘%(hostname)s:%(port)d after %(max_retries)d ‘
‘tries: %(err_str)s’) % log_info)
# NOTE(comstud): Copied from original code. There’s
# really no better recourse because if this was a queue we
# need to consume on, we have no way to consume anymore.
sys.exit(1)

此条目发表在OpenStack分类目录,贴了, 标签。将固定链接加入收藏夹。