* "No such device" error when mounting immediately after formatting
From: Prantis, Kelsey @ 2013-09-10 16:39 UTC (permalink / raw)
  To: kvm@vger.kernel.org; +Cc: Murrell, Brian

Hi folks,

We have been experiencing a problem with our test bed for a while now, and were hoping perhaps some of the expertise on this mailing list might be able to help us find a solution.

We have a cluster of 7 KVM VMs on a host. The host OS is Fedora 18, and the guest OS is CentOS 6.4. The installed KVM/QEMU/kernel packages are as follows:

qemu-system-x86-1.2.2-11.fc18.x86_64
qemu-common-1.2.2-11.fc18.x86_64
qemu-img-1.2.2-11.fc18.x86_64
libvirt-daemon-driver-qemu-0.10.2.5-1.fc18.x86_64
qemu-kvm-1.2.2-11.fc18.x86_64
ipxe-roms-qemu-20120328-2.gitaac9718.fc18.noarch
kernel-3.9.4-200.fc18.x86_64

We have attached the same 5 LVs to 4 of the VMs as shared storage, with definitions like the one below (disk1 through disk5):

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' />
      <source dev='/dev/vg_00/disk1'/>
      <target dev='sda' bus='scsi'/>
      <shareable/>
      <serial>disk1</serial>
      <alias name='scsi0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

Throughout our automated test suite, our tests format the device with an ext4 file system and, as soon as the format completes, mount it to write a few files. Most of the time this works great. However, a small percentage of the time the mount command fails with "No such device":

Unable to mount /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1: No such device


We know that the device exists and is working, since the mkfs command had just completed successfully and without error, so I am not sure why mount suddenly returns "No such device", and only a small percentage of the time. To prove that the device is there, we put the mount into a retry loop as a debug measure, and without fail the mount completes successfully on one of the subsequent iterations. Could there be some sort of race between closing the device after mkfs and quickly reopening it for the mount?
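The retry loop used as a debug measure could look roughly like the sketch below. It is not the test suite's actual code: `mount_with_retry`, its `attempts`/`delay` parameters, and the injectable `mount_fn` callable are hypothetical names for illustration. It retries only when the failure carries errno `ENODEV` ("No such device") and re-raises any other error unchanged.

```python
import errno
import time


def mount_with_retry(mount_fn, attempts=5, delay=0.5):
    """Call mount_fn() until it succeeds, retrying only on ENODEV.

    mount_fn is a hypothetical zero-argument callable that performs the
    mount and raises OSError on failure. Returns the 1-based attempt
    number that succeeded, so a debug run can log how many tries it took.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            mount_fn()
            return attempt
        except OSError as exc:
            # Errors other than "No such device", or a failure on the
            # final attempt, propagate to the caller unchanged.
            if exc.errno != errno.ENODEV or attempt == attempts:
                raise
            time.sleep(delay)
```

A real `mount_fn` would have to translate a mount(8) failure into `OSError(errno.ENODEV, ...)` itself, because `subprocess.run(..., check=True)` raises `CalledProcessError`, which carries an exit status rather than an errno. As the thread notes, a loop like this only demonstrates that the device becomes usable; it does not explain why.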

We've reproduced this both with directly attached devices, as above, and with iSCSI devices.


At this point I am pretty stumped as to how to continue debugging this issue, so any help would be very much appreciated!


Thankful for any help,

Kelsey Prantis


* Re: "No such device" error when mounting immediately after formatting
From: Stefan Hajnoczi @ 2013-09-12  7:58 UTC (permalink / raw)
  To: Prantis, Kelsey; +Cc: kvm@vger.kernel.org, Murrell, Brian

On Tue, Sep 10, 2013 at 04:39:56PM +0000, Prantis, Kelsey wrote:
> Throughout the course of our automated test suite, our tests format the device with an ext4 file system and then immediately mount the file system to write a few files after the format completes. Most of the time this works great. However, some small percentage of the time it is failing on the mount command with "No such device".
> 
> Unable to mount /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1: No such device
> 
> 
> We know that the device does in fact exist and was operable, since the mkfs command just had completed successfully and without error, so I am not sure why suddenly it is returning "No such device" when trying to mount, and only a small percentage of the time.  To prove that the device is in fact there, we've tried putting the mount into a retry-loop as a debug measure to show the device is eventually there, and without fail in one of the loop iterations the mount does complete successfully. It seems like there could possibly be some sort of race between closing the device after the mkfs and quickly opening it again for the mount?
> 
> We've reproduced this both with directly attached devices, as above, as well as with iscsi devices.

This is weird because the symlinks in /dev/disk/by-*/ just point back to
../../sd*.  The "No such device" error message implies the device node
exists on the file system but the kernel thinks a device for that
major/minor number is not present.
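That hypothesis can be checked directly: resolve the by-id symlink, read the major/minor numbers of the device node it points to, and see whether the kernel exposes a block device with those numbers under /sys/dev/block/. This is a minimal sketch; `resolve_block_device` and `kernel_has_block_device` are hypothetical helper names, and it assumes sysfs is mounted at /sys.

```python
import os
import stat


def resolve_block_device(path):
    """Follow a /dev/disk/by-* symlink to its device node.

    Returns (real_path, major, minor, is_block_device).
    """
    real_path = os.path.realpath(path)
    st = os.stat(real_path)
    return (
        real_path,
        os.major(st.st_rdev),
        os.minor(st.st_rdev),
        bool(stat.S_ISBLK(st.st_mode)),
    )


def kernel_has_block_device(major, minor, sysfs="/sys"):
    """True if sysfs lists a block device with this major:minor pair."""
    return os.path.exists(os.path.join(sysfs, "dev", "block", f"{major}:{minor}"))
```

For example, calling `resolve_block_device("/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1")` right after a failed mount and passing the returned numbers to `kernel_has_block_device` would show whether the node and the kernel's view disagree at that moment.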

I wonder if the output of "udevadm monitor" during the mkfs and mount
steps shows devices appearing/disappearing?  That might explain a race
condition.
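One way to act on this is to run "udevadm monitor" in a second shell while the test runs, save its output, and filter it afterwards. The sketch below is a hypothetical helper, not part of udev: it assumes the default one-line-per-event layout shown in its comments (verify this against the udevadm version in use) and keeps only `block` and `module` events, in arrival order.

```python
import re
from collections import namedtuple

# Assumed line shape of default `udevadm monitor` output, e.g.
#   KERNEL[1234.567890] add      /module/ext4 (module)
#   UDEV  [1234.600000] remove   /devices/.../block/sdb (block)
EVENT_RE = re.compile(
    r"^(KERNEL|UDEV)\s*\[([0-9.]+)\]\s+(\S+)\s+(\S+)\s+\((\S+)\)\s*$"
)

UdevEvent = namedtuple("UdevEvent", "source timestamp action devpath subsystem")


def parse_udev_event(line):
    """Return a UdevEvent for an event line, or None for banners/blank lines."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    source, ts, action, devpath, subsystem = m.groups()
    return UdevEvent(source, float(ts), action, devpath, subsystem)


def interesting_events(lines, subsystems=("block", "module")):
    """Keep only events from the given subsystems, in arrival order."""
    events = []
    for line in lines:
        ev = parse_udev_event(line)
        if ev is not None and ev.subsystem in subsystems:
            events.append(ev)
    return events
```

Lining the filtered events up against the timestamps of the mkfs and mount calls shows whether a block device was removed or a module was loaded in between.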

Can you share your script that runs mkfs and mounts the file system?

At which point in the boot process does your script run?

Stefan


* Re: "No such device" error when mounting immediately after formatting
From: Prantis, Kelsey @ 2013-10-07 19:41 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kvm@vger.kernel.org, Murrell, Brian

>I wonder if the output of "udevadm monitor" during the mkfs and mount
>steps shows devices appearing/disappearing?  That might explain a race
>condition.

So sorry for the long delay in responding, but the "udevadm monitor"
output gave me a new lead that let me solve the problem (discussed
below).


>At which point in the boot process does your script run?

Our script does not run as part of the boot process. It repeatedly formats
and mounts the devices, and writes to them, well after boot.

The Solution:
------------------------
The key piece of information we were missing before is that the
formatting and mounting were running in parallel for multiple devices
attached to the node. The "udevadm monitor" output showed that a kernel
module was being loaded in response to the mount command, and I wondered
whether two parallel mount commands requesting the same module could race.
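Whether mount still has to load the filesystem module can be checked in /proc/filesystems, which lists the filesystem types currently registered with the kernel ("nodev" marks types that need no block device). mount(2) reports ENODEV when the requested filesystem type is not configured in the kernel, which matches the "No such device" message. A minimal sketch with hypothetical helper names; on kernels with ext4 built in, it is always listed.

```python
def registered_filesystems(text):
    """Parse /proc/filesystems content into a set of filesystem type names.

    Each line is "nodev<TAB>name" for virtual filesystems or
    "<TAB>name" for block-device-backed ones; the name is the last field.
    """
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[-1])
    return names


def filesystem_registered(fstype, path="/proc/filesystems"):
    """True if the kernel currently has fstype registered."""
    with open(path) as f:
        return fstype in registered_filesystems(f.read())
```

If `filesystem_registered("ext4")` is False just before the parallel mounts start, the first mount has to request the module, which is the situation described above.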

It turns out this was a known kernel bug, unrelated to KVM, which was
fixed in kernel 3.7.0:
    - Original ticket: https://bugzilla.redhat.com/show_bug.cgi?id=771285
    - Patch submitted: http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709

I've filed a ticket with Red Hat requesting that the fix be backported to
the RHEL 6 kernel: https://bugzilla.redhat.com/show_bug.cgi?id=1009704

Until then, the simplest workaround I found was to load the module
explicitly (e.g. "modprobe ext4") before starting the formatting and
mounting process.
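The workaround fits naturally into a parallel test harness: load the module once, then format and mount each device concurrently. This is a hypothetical sketch rather than the suite from this thread; the function names, the device-to-mountpoint mapping, and the plain "mkfs -t" / "mount -t" invocations are illustrative assumptions. The command runner is injectable so the ordering can be checked without root privileges.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run(cmd):
    """Run one command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


def format_and_mount_all(devices, fstype="ext4", runner=run, max_workers=4):
    """Preload the filesystem module, then format and mount devices in parallel.

    devices: dict mapping a block device path to its mount point.
    runner:  callable taking an argv list; defaults to running it for real.
    """
    # Workaround from the thread: load the module once, up front, so the
    # parallel mounts below never need to request it themselves.
    runner(["modprobe", fstype])

    def format_then_mount(item):
        dev, mountpoint = item
        runner(["mkfs", "-t", fstype, dev])
        runner(["mount", "-t", fstype, dev, mountpoint])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces iteration so any worker exception is re-raised here.
        list(pool.map(format_then_mount, devices.items()))
```

Serializing only the modprobe keeps formatting and mounting parallel, which is the workload that exposed the race in the first place.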

Sorry to bug you guys with an unrelated issue, but hopefully this
explanation can help anyone else who stumbles into the problem.

Regards,
Kelsey

