OpenStack - Ironic Race Condition during Deployment
With RDO Manager/RH-OSP, I encountered a problem with Ironic & Nova during a deployment of ~30 physical servers.
By default, Nova allows up to 10 instance builds to run concurrently, but Ironic can't keep up with that pace and it causes a race condition…
The problem is that Ironic tries to attach the wrong profile/instance to a node.
Error example: a Compute profile ending up on a Controller server.
2015-08-24 18:18:05.228 1200 ERROR oslo_messaging.rpc.dispatcher [-] Exception during message handling: Node 702d0321-5e9d-49f2-b914-0831bcbdfebd is associated with instance ccc2dc87-81b1-4769-8019-ffa5e3ecb979.
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     return func(*args, **kwargs)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 405, in update_node
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     node_obj.save()
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/objects/base.py", line 143, in wrapper
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     return fn(self, ctxt, *args, **kwargs)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 265, in save
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     self.dbapi.update_node(self.uuid, updates)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 338, in update_node
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     return self._do_update_node(node_id, values)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 364, in _do_update_node
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     instance=ref.instance_uuid)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher --> NodeAssociated: Node 702d0321-5e9d-49f2-b914-0831bcbdfebd is associated with instance ccc2dc87-81b1-4769-8019-ffa5e3ecb979. <--
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher
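When this happens, you can check which instance each node is currently associated with using the ironic CLI (undercloud credentials sourced; the node UUID below is the one from the trace above):

ironic node-list
ironic node-show 702d0321-5e9d-49f2-b914-0831bcbdfebd | grep -i instance_uuid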
The workaround consists of:
- Decreasing the maximum number of concurrent builds from 10 to 2
- Setting the scheduler_max_attempts parameter to the number of nodes you want to deploy
- Reducing the size of the RPC thread pool from 64 to 4
/etc/nova/nova.conf
max_concurrent_builds=2
scheduler_max_attempts=30
/etc/ironic/ironic.conf
rpc_thread_pool_size = 4
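If crudini happens to be installed on the undercloud (an assumption, it is not always there), the same settings can be applied from the shell; this also assumes the options live in the [DEFAULT] section of each file:

# Assumes crudini is installed and the options belong to [DEFAULT]
crudini --set /etc/nova/nova.conf DEFAULT max_concurrent_builds 2
crudini --set /etc/nova/nova.conf DEFAULT scheduler_max_attempts 30
crudini --set /etc/ironic/ironic.conf DEFAULT rpc_thread_pool_size 4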
Don’t forget to restart Ironic & Nova services.
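On a systemd-based undercloud (RHEL/CentOS 7), that typically looks like the following; the unit names are the usual RDO ones, so adjust them to your installation:

systemctl restart openstack-nova-scheduler openstack-nova-conductor openstack-nova-compute
systemctl restart openstack-ironic-api openstack-ironic-conductor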
These are minimum values; you can increase them to find a better fit for your environment.
More information here