Problem description:

We have deployed OpenStack Kilo on Ubuntu 14.04 using Mirantis OpenStack Fuel 7.0. The setup consists of 3 controller/storage nodes and 3 compute nodes, all fronted by HAProxy for high availability.

After some problems with the storage (we're using a multi-backend configuration with Cinder/LVM volumes on the controllers plus a NetApp NFS shared storage), we've managed to get volume creation, expansion, deletion, attachment, detachment, etc. working.
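
For reference, a multi-backend setup like ours is defined in /etc/cinder/cinder.conf roughly as follows. This is only a minimal sketch: the section names, NetApp hostname and the exact driver options below are illustrative, not our literal values.

    [DEFAULT]
    enabled_backends = cinder_iscsi,netapp_nfs

    [cinder_iscsi]
    # LVM-backed iSCSI volumes carved from the local "cinder" VG on each controller
    volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
    volume_group = cinder
    volume_backend_name = cinder_iscsi

    [netapp_nfs]
    # Shared NetApp NFS backend; hostname and shares file are placeholders
    volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver
    netapp_storage_family = ontap_cluster
    netapp_storage_protocol = nfs
    netapp_server_hostname = netapp.example.local
    nfs_shares_config = /etc/cinder/nfs_shares
    volume_backend_name = netapp_nfs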

The problem arises when we try to create a snapshot of one of the volumes stored in the cinder_iscsi backend. We get the following on the controller that's trying to create the snapshot through the web interface:

 root@Nefeles001:~# tail -f /var/log/cinder-all.log | grep -v "cinder-api"

<158>May 11 11:28:17 Nefeles001 cinder-volume 2016-05-11 11:28:17.294 92341 INFO cinder.volume.manager [req-a79a8998-70f7-4b9d-b1d1-68f8a04e5399 2d60663e49a74eca9f0a96dc713154c5 2a1b8d6fd53045dd8acc8b09c292cb9f - - -] snapshot c4d32012-38ca-4ba8-bca2-186d5703620d: creating

<158>May 11 11:28:17 Nefeles001 cinder-volume 2016-05-11 11:28:17.836 92341 INFO cinder.brick.local_dev.lvm [req-a79a8998-70f7-4b9d-b1d1-68f8a04e5399 2d60663e49a74eca9f0a96dc713154c5 2a1b8d6fd53045dd8acc8b09c292cb9f - - -] Logical Volume not found when querying LVM info. (vg_name=cinder, lv_name=volume-3f253a13-7f12-46fb-bec5-df70b80d9d9c

<155>May 11 11:28:17 Nefeles001 cinder-volume 2016-05-11 11:28:17.836 92341 ERROR cinder.brick.local_dev.lvm [req-a79a8998-70f7-4b9d-b1d1-68f8a04e5399 2d60663e49a74eca9f0a96dc713154c5 2a1b8d6fd53045dd8acc8b09c292cb9f - - -] Trying to create snapshot by non-existent LV: volume-3f253a13-7f12-46fb-bec5-df70b80d9d9c

<155>May 11 11:28:17 Nefeles001 cinder-volume 2016-05-11 11:28:17.861 92341 ERROR oslo_messaging.rpc.dispatcher [req-a79a8998-70f7-4b9d-b1d1-68f8a04e5399 2d60663e49a74eca9f0a96dc713154c5 2a1b8d6fd53045dd8acc8b09c292cb9f - - -] Exception during message handling: Volume device not found at volume-3f253a13-7f12-46fb-bec5-df70b80d9d9c.

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher executor_callback))

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher executor_callback)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 662, in create_snapshot

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher snapshot.save(context)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 654, in create_snapshot

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher model_update = self.driver.create_snapshot(snapshot)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 351, in create_snapshot

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher self.configuration.lvm_type)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 785, in _wrapper

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher return r.call(f, *args, **kwargs)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/retrying.py", line 223, in call

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher return attempt.get(self._wrap_exception)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/retrying.py", line 261, in get

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.value[0], self.value[1], self.value[2])

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/retrying.py", line 217, in call

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/brick/local_dev/lvm.py", line 567, in create_lv_snapshot

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher raise exception.VolumeDeviceNotFound(device=source_lv_name)

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher VolumeDeviceNotFound: Volume device not found at volume-3f253a13-7f12-46fb-bec5-df70b80d9d9c.

2016-05-11 11:28:17.861 92341 TRACE oslo_messaging.rpc.dispatcher

Now, doing a little digging, we found out that this volume is not physically stored on that specific node. Each controller node knows about every volume on the platform through the shared database, but when the snapshot request lands on a node that doesn't actually have the LVM logical volume on its disks, it fails.
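
The quickest way to see which cinder-volume host actually owns a volume is the admin-only os-vol-host-attr:host attribute (the volume ID below is the one from our logs; the sample output line is just illustrative):

    # Which backend host owns this volume?
    cinder show 3f253a13-7f12-46fb-bec5-df70b80d9d9c | grep os-vol-host-attr:host
    # | os-vol-host-attr:host | node-1@cinder_iscsi#cinder_iscsi |

    # List every cinder-volume service and the host it runs on
    cinder service-list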

We can confirm this because when we issue the same command on the node that actually holds the volume, using the cinder CLI instead of the Horizon web interface, the snapshot gets created and works as intended. The same goes for delete, etc.
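
For example, on the controller that holds the LV (the snapshot name here is arbitrary; depending on the client/API version the flag may be --display-name instead of --name, and --force is needed for attached volumes):

    # Run on the node whose "cinder" VG contains the LV
    cinder snapshot-create --name test-snap 3f253a13-7f12-46fb-bec5-df70b80d9d9c
    cinder snapshot-list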

As a side note, we have not created availability zones for the storage setup, so all three controllers use the default nova zone. But if that were the issue, I'd expect similar problems with volume create/delete, which never fail regardless of whether the controller receiving the request is the one holding the volume's LVM block.
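
This is easy to double-check from the CLI; with no custom zones configured we see only the default (output shape illustrative):

    cinder availability-zone-list
    # +------+-----------+
    # | Name | Status    |
    # +------+-----------+
    # | nova | available |
    # +------+-----------+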

Thanks.
