All of a sudden, the Hypervisor Summary page in the Horizon dashboard stopped updating the usage stats for vCPUs, RAM and local storage on one of the compute nodes. New VMs were being launched on that compute node, but the stats page still showed all of its vCPUs, RAM and disk as unused (even though the new VMs had consumed all of the available resources on that node). Here’s a snapshot of the error: “nova.compute.manager Stderr: u qemu-img: Could not open”
Below is the snapshot of Hypervisor Summary page displaying the usage statistics of all compute hosts. In my case, the compute host (cloudsecurity4) was not reporting the correct usage stats.
I expected the usage stats to change when new VMs were launched, but that was not the case. The snapshot below shows the number of VMs scheduled on the compute node “cloudsecurity4”.
Are you facing a similar issue in OpenStack Mitaka? Then here’s how I fixed it.
Solution:
Step 1: Look for error messages on the compute host.
# tailf /var/log/nova/nova-compute.log
ERROR nova.compute.manager Stderr: u"qemu-img: Could not open '/var/lib/libvirt/images/test-1.qcow2': Could not open '/var/lib/libvirt/images/test-1.qcow2': Permission denied\n"
INFO nova.compute.resource_tracker [req-5e1d0cdf-216b-4ca8-bdb4-c178825784ba - - - - -] Auditing locally available compute resources for node cloudsecurity4
ERROR nova.compute.manager [req-5e1d0cdf-216b-4ca8-bdb4-c178825784ba - - - - -] Error updating resources for node cloudsecurity4
The above error message says ‘qemu-img‘ was not able to open an image stored in the /var/lib/libvirt/images folder and, surprisingly, the image it was looking for was test-1.qcow2. I’m not clear why Nova was even trying to run qemu-img on the test-1.qcow2 file, because I don’t see any instance running under the name ‘test-1‘, nor do I remember one running before. Even if an instance named ‘test-1‘ had been running before, why was Nova attempting to read that image now? That question still remains unanswered for me.
However, the permission denied error prompted me to check the permissions of the folder ‘/var/lib/libvirt/images‘, which turned out to be owned by user ‘libvirt-qemu‘ and group ‘kvm‘. So what do you think I did? Of course, I changed the ownership of the folder to ‘nova:nova‘, thinking that the nova-compute service should then have no problem reading the image files.
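Before changing anything, it can help to note the folder's current owner and mode so the change is easy to revert. Here is a minimal sketch, assuming GNU coreutils `stat` is available; the helper name `show_owner` is made up for illustration and is not part of Nova or libvirt.

```shell
# Hypothetical helper: print "owner:group mode" for a path.
# Assumes GNU coreutils stat (the -c format option).
show_owner() {
    stat -c '%U:%G %a' "$1"
}

# Example (run on the compute host):
#   show_owner /var/lib/libvirt/images
```

In my case this would have printed libvirt-qemu:kvm followed by the folder's mode, which is the value to restore if the ownership change needs to be undone.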
Step 2: Give nova permission to read images in the /var/lib/libvirt/images folder.
# chown nova:nova /var/lib/libvirt/images
Step 3: Restart the nova-compute service.
# /etc/init.d/nova-compute restart
You know what? The Hypervisor Summary page started showing the correct usage statistics for the compute host (cloudsecurity4).
I went back to the nova-compute log file to see what it said now.
# tailf /var/log/nova/nova-compute.log
WARNING nova.virt.libvirt.driver [req-9305df9b-d716-4c3c-bc3e-b75945f85ed8 - - - - -] Periodic task is updating the host stat, it is trying to get disk test, but disk file was removed by concurrent operations such as resize.
2017-06-01 22:35:59.818 97322 INFO nova.compute.resource_tracker [req-9305df9b-d716-4c3c-bc3e-b75945f85ed8 - - - - -] Total usable vcpus: 16, total allocated vcpus: 13
From the above log output, it was clear that nova.compute.resource_tracker was now reporting the correct usage statistics for the compute host.
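If you would rather not watch the log live, the same check can be done by searching it. This is a rough sketch only: the function name `tracker_status` is hypothetical, and the two search strings are simply taken from the log lines quoted in this post.

```shell
# Hypothetical helper: summarise resource tracker health from a nova-compute log.
# $1 is the log file, e.g. /var/log/nova/nova-compute.log
tracker_status() {
    log="$1"
    # Count failed resource updates (the error seen in Step 1).
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    errors=$(grep -c 'Error updating resources' "$log" || true)
    echo "update errors: $errors"
    # Show the most recent vCPU summary line, if there is one.
    grep 'Total usable vcpus' "$log" | tail -n 1
}

# Example (run on the compute host):
#   tracker_status /var/log/nova/nova-compute.log
```

Note that the error count covers the whole file, including entries written before the restart, so compare timestamps before concluding that the problem is still there.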
There’s also a bug report that talks about this issue.