* "No such device" error when mounting immediately after formatting
From: Prantis, Kelsey @ 2013-09-10 16:39 UTC
To: kvm@vger.kernel.org; +Cc: Murrell, Brian

Hi folks,

We have been experiencing a problem with our test bed for a while now, and were hoping that some of the expertise on this mailing list might help us find a solution.

We have a cluster of 7 KVM VMs on a host. The host OS is Fedora 18, and the guest OS is CentOS 6.4. Installed kvm/qemu/kernel packages are as follows:

qemu-system-x86-1.2.2-11.fc18.x86_64
qemu-common-1.2.2-11.fc18.x86_64
qemu-img-1.2.2-11.fc18.x86_64
libvirt-daemon-driver-qemu-0.10.2.5-1.fc18.x86_64
qemu-kvm-1.2.2-11.fc18.x86_64
ipxe-roms-qemu-20120328-2.gitaac9718.fc18.noarch
kernel-3.9.4-200.fc18.x86_64

To 4 of the VMs we have attached the same 5 LVs to be used as shared storage, with definitions like the one below (disk1-disk5):

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' />
  <source dev='/dev/vg_00/disk1'/>
  <target dev='sda' bus='scsi'/>
  <shareable/>
  <serial>disk1</serial>
  <alias name='scsi0-0-0'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

Throughout the course of our automated test suite, our tests format the device with an ext4 file system and then, as soon as the format completes, mount the file system to write a few files. Most of the time this works great. However, some small percentage of the time the mount command fails with "No such device".

Unable to mount /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1: No such device

We know that the device does in fact exist and was operable, since the mkfs command had just completed successfully and without error, so I am not sure why it suddenly returns "No such device" when trying to mount, and only a small percentage of the time. To prove that the device is in fact there, we've tried putting the mount into a retry loop as a debug measure, and without fail the mount completes successfully in one of the loop iterations. It seems like there could be some sort of race between closing the device after the mkfs and quickly opening it again for the mount?

We've reproduced this both with directly attached devices, as above, and with iSCSI devices.

At this point I am pretty stumped as to how to even continue debugging this issue, so help would be very much appreciated!

Thankful for any help,
Kelsey Prantis
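The retry loop mentioned above, used purely as a debug measure, might look roughly like the sketch below. The function name mount_with_retry, the attempt count, and the delay are illustrative assumptions and not the test suite's actual code.

```shell
#!/bin/sh
# mount_with_retry DEVICE MOUNTPOINT [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical debug helper: retries the mount and reports which attempt
# succeeded, so an intermittent failure can be told apart from a device
# that is really missing.
mount_with_retry() {
    dev=$1
    mnt=$2
    attempts=${3:-5}   # assumed default, not from the original report
    delay=${4:-1}      # assumed default, in seconds
    i=1
    while [ "$i" -le "$attempts" ]; do
        if mount -t ext4 "$dev" "$mnt"; then
            echo "mounted $dev on attempt $i"
            return 0
        fi
        echo "attempt $i failed; retrying in ${delay}s" >&2
        sleep "$delay"
        i=$((i + 1))
    done
    echo "giving up on $dev after $attempts attempts" >&2
    return 1
}
```

A call such as "mount_with_retry /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1 /mnt/disk1 10 1" (with /mnt/disk1 as a made-up mount point) would print the attempt number on success. A loop like this only hides the symptom; it is useful for showing that the failure is transient, not as a fix.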
* Re: "No such device" error when mounting immediately after formatting
From: Stefan Hajnoczi @ 2013-09-12 7:58 UTC
To: Prantis, Kelsey; +Cc: kvm@vger.kernel.org, Murrell, Brian

On Tue, Sep 10, 2013 at 04:39:56PM +0000, Prantis, Kelsey wrote:
> We have a cluster of 7 KVM VMs on a host. The host OS is Fedora 18, and the guest OS is CentOS 6.4. Installed kvm/qemu/kernel packages are as follows:
>
> qemu-system-x86-1.2.2-11.fc18.x86_64
> qemu-common-1.2.2-11.fc18.x86_64
> qemu-img-1.2.2-11.fc18.x86_64
> libvirt-daemon-driver-qemu-0.10.2.5-1.fc18.x86_64
> qemu-kvm-1.2.2-11.fc18.x86_64
> ipxe-roms-qemu-20120328-2.gitaac9718.fc18.noarch
> kernel-3.9.4-200.fc18.x86_64
>
> To 4 of the VMs we have attached the same 5 LVs to be used as shared storage, with definitions like the one below (disk1-disk5):
>
> <disk type='block' device='disk'>
>   <driver name='qemu' type='raw' />
>   <source dev='/dev/vg_00/disk1'/>
>   <target dev='sda' bus='scsi'/>
>   <shareable/>
>   <serial>disk1</serial>
>   <alias name='scsi0-0-0'/>
>   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
> </disk>
>
> Throughout the course of our automated test suite, our tests format the device with an ext4 file system and then, as soon as the format completes, mount the file system to write a few files. Most of the time this works great. However, some small percentage of the time the mount command fails with "No such device".
>
> Unable to mount /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1: No such device
>
>
> We know that the device does in fact exist and was operable, since the mkfs command had just completed successfully and without error, so I am not sure why it suddenly returns "No such device" when trying to mount, and only a small percentage of the time. To prove that the device is in fact there, we've tried putting the mount into a retry loop as a debug measure, and without fail the mount completes successfully in one of the loop iterations. It seems like there could be some sort of race between closing the device after the mkfs and quickly opening it again for the mount?
>
> We've reproduced this both with directly attached devices, as above, and with iSCSI devices.

This is weird because the symlinks in /dev/disk/by-*/ just point back to ../../sd*. The "No such device" error message implies the device node exists on the file system but the kernel thinks a device for that major/minor number is not present.

I wonder if the output of "udevadm monitor" during the mkfs and mount steps shows devices appearing/disappearing? That might explain a race condition.

Can you share your script that runs mkfs and mounts the file system?

At which point in the boot process does your script run?

Stefan
* Re: "No such device" error when mounting immediately after formatting
From: Prantis, Kelsey @ 2013-10-07 19:41 UTC
To: Stefan Hajnoczi; +Cc: kvm@vger.kernel.org, Murrell, Brian

>I wonder if the output of "udevadm monitor" during the mkfs and mount
>steps shows devices appearing/disappearing? That might explain a race
>condition.

So sorry for the long delay in responding, but the results of "udevadm monitor" gave me a new lead that led to solving the problem (discussed below).

>At which point in the boot process does your script run?

Our script does not run as part of the boot process. It simply formats and mounts the devices repeatedly, well after boot, in order to write to them.

The Solution:
-------------

The key bit of information I think we were missing before is that the formatting and mounting were occurring in parallel for multiple devices attached to the node. The "udevadm monitor" output brought to my attention that a module had to be loaded in response to the mount command, and I wondered if there could be a race between two parallel mount commands that ask for the same module to be loaded. It turns out that this was a known kernel bug, which was fixed in kernel 3.7.0 and has nothing to do with kvm:

- Original ticket here: https://bugzilla.redhat.com/show_bug.cgi?id=771285
- Patch submitted here: http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709

I've filed a ticket with Red Hat to request that the fix be backported to the RHEL6 kernel here: https://bugzilla.redhat.com/show_bug.cgi?id=1009704

Until then, I found the simplest workaround was to explicitly load the module (e.g. "modprobe ext4") before beginning the formatting and mounting process.

Sorry to bug you guys with an unrelated issue, but hopefully this explanation can help anyone else who stumbles into the problem.

Regards,
Kelsey
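The workaround described above amounts to loading the filesystem module once, serially, before any parallel formatting or mounting starts. A minimal sketch of that ordering follows, assuming a POSIX shell. The function names, the PROC_FILESYSTEMS override (there only so the check can be pointed at a different file), and the mount-root layout are all hypothetical and are not taken from the test suite discussed in this thread.

```shell
#!/bin/sh
# ensure_fs_module FSTYPE
# Load the filesystem module up front so that later parallel mounts do not
# each trigger an automatic module load at the same time.
ensure_fs_module() {
    fstype=$1
    # /proc/filesystems lists the filesystems the running kernel has
    # registered; if the type is already there, nothing needs loading.
    if grep -qw "$fstype" "${PROC_FILESYSTEMS:-/proc/filesystems}"; then
        return 0
    fi
    modprobe "$fstype"
}

# format_and_mount_all FSTYPE MOUNT_ROOT DEVICE...
# Formats and mounts every device in parallel, but only after the module
# has been loaded once. Add filesystem-specific mkfs options as needed.
format_and_mount_all() {
    fstype=$1
    mount_root=$2
    shift 2
    ensure_fs_module "$fstype" || return 1
    pids=""
    for dev in "$@"; do
        (
            mnt="$mount_root/$(basename "$dev")"
            mkdir -p "$mnt" &&
                mkfs -t "$fstype" "$dev" &&
                mount -t "$fstype" "$dev" "$mnt"
        ) &
        pids="$pids $!"
    done
    # Wait for each background job and report failure if any one failed.
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

For example, "format_and_mount_all ext4 /mnt/test /dev/sda /dev/sdb" (device names and mount root made up for illustration) would load ext4 once and then format and mount both devices concurrently. The explicit "modprobe ext4" is the part that comes from the workaround above; the /proc/filesystems check is only a convenience.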