* disk corruption after virsh destroy
@ 2013-07-02 14:40 Brian J. Murrell
  2013-07-02 15:26 ` Brian J. Murrell
  2013-07-03  8:47 ` Stefan Hajnoczi
  0 siblings, 2 replies; 5+ messages in thread
From: Brian J. Murrell @ 2013-07-02 14:40 UTC
  To: kvm

I have a cluster of VMs set up with shared virtio-scsi disks.  The 
purpose of sharing a disk is that if a VM goes down, another can pick up 
and mount the (ext4) filesystem on the shared disk and provide service to it.

But just to be super clear, only one VM ever has a filesystem mounted at 
a time even though multiple VMs technically can access the device at the 
same time.  A VM mounting a filesystem ensures absolutely that no other 
node has it mounted before mounting it.

That said, what I am finding is that when a node dies and another 
node tries to mount the (ext4) filesystem, it is found dirty and needs 
an fsck.

My understanding is that with ext{3,4} this should not happen, and 
indeed my experience on real hardware with coherent disk caching 
(i.e. no non-battery-backed caching disk controllers lying to the O/S 
about what has been written to physical disk) bears that out: a node 
failing does not leave an ext{3,4} filesystem dirty such that it needs 
an fsck.

So, clearly, somewhere between the KVM VM and the physical disk, there 
is a cache that leads the guest O/S to believe data has been written 
to physical disk when it has not actually been written there.  To 
that end, I have set "cache=none" on these shared disks, but this does 
not seem to have fixed the problem.

Here is my KVM command line, with each option indented by two spaces 
on its own line:

/usr/bin/qemu-kvm -name wtm-60vm5 -S -M pc-0.14 -enable-kvm -m 8192 \
  -smp 1,sockets=1,cores=1,threads=1 \
  -uuid 5cbc2568-e32d-11e2-9c1f-001e67293bea -no-user-config \
  -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/wtm-60vm5.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc -no-shutdown \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 \
  -drive file=/var/lib/libvirt/images/wtm-60vm5.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=node1-root \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -drive file=/dev/vg_00/disk1,if=none,id=drive-scsi0-0-0-0,format=raw,serial=disk1,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
  -drive file=/dev/vg_00/disk2,if=none,id=drive-scsi0-0-0-1,format=raw,serial=disk2,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 \
  -drive file=/dev/vg_00/disk3,if=none,id=drive-scsi0-0-0-2,format=raw,serial=disk3,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-0-2,id=scsi0-0-0-2 \
  -drive file=/dev/vg_00/disk4,if=none,id=drive-scsi0-0-0-3,format=raw,serial=disk4,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi0-0-0-3,id=scsi0-0-0-3 \
  -drive file=/dev/vg_00/disk5,if=none,id=drive-scsi0-0-0-4,format=raw,serial=disk5,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=4,drive=drive-scsi0-0-0-4,id=scsi0-0-0-4 \
  -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=27 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:60:d9:05,bus=pci.0,addr=0x3 \
  -netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 \
  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:60:a7:05,bus=pci.0,addr=0x8 \
  -chardev pty,id=charserial0 \
  -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:3 \
  -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7

Clearly it's the 5 scsi disks which are yielding corruption when a VM is 
destroyed with "virsh destroy".
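
One way to double-check which drives actually carry an explicit cache 
setting is to pull the -drive arguments out of the command line.  What 
follows is only a rough sketch: the helper name drive_cache_modes is 
made up for illustration, and it assumes the arguments arrive one per 
line on stdin.  A drive reported as "(unset)" simply has no cache= key 
and gets whatever default the installed QEMU applies.

```shell
#!/bin/sh
# Hypothetical helper: list each -drive argument's file and cache mode.
# Expects one command-line argument per line on stdin.
drive_cache_modes() {
    awk '
        prev == "-drive" {
            file = "(no file)"; cache = "(unset)"
            n = split($0, kv, ",")
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^file=/)  file  = substr(kv[i], 6)
                if (kv[i] ~ /^cache=/) cache = substr(kv[i], 7)
            }
            print file, cache
        }
        { prev = $0 }
    '
}

# Demo with two of the drives from the command line above.
printf '%s\n' \
    -drive 'file=/var/lib/libvirt/images/wtm-60vm5.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=node1-root' \
    -drive 'file=/dev/vg_00/disk1,if=none,id=drive-scsi0-0-0-0,format=raw,serial=disk1,cache=none' |
    drive_cache_modes
```

Against a running guest, the same helper could be fed from the process 
itself, e.g. tr '\0' '\n' < /proc/PID/cmdline | drive_cache_modes, 
where PID stands in for the qemu-kvm process ID.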

Any ideas on what I need to do to ensure that writes at the guest O/S 
layer which are to be sent to physical disk actually make it to physical 
disk on the host?

Of course, I am happy to provide any additional information, debugging, 
etc. that may be needed.

Cheers,
b.



* Re: disk corruption after virsh destroy
  2013-07-02 14:40 disk corruption after virsh destroy Brian J. Murrell
@ 2013-07-02 15:26 ` Brian J. Murrell
  2013-07-03  8:47 ` Stefan Hajnoczi
  1 sibling, 0 replies; 5+ messages in thread
From: Brian J. Murrell @ 2013-07-02 15:26 UTC
  To: kvm

I really should have included some version info.  I had intended to, 
after I got all of the details written down but simply forgot before I 
hit Send.  Apologies.

In any case, this was all on Fedora 18.  Here are the versions of all 
packages related to qemu, kvm and libvirt:

$ rpm -qa | grep -e kvm -e qemu -e libvirt | sort
fence-virtd-libvirt-0.3.0-5.fc18.x86_64
ipxe-roms-qemu-20120328-2.gitaac9718.fc18.noarch
libvirt-0.10.2.6-1.fc18.x86_64
libvirt-client-0.10.2.6-1.fc18.x86_64
libvirt-daemon-0.10.2.6-1.fc18.x86_64
libvirt-daemon-config-network-0.10.2.6-1.fc18.x86_64
libvirt-daemon-config-nwfilter-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-interface-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-libxl-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-lxc-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-network-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-nodedev-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-nwfilter-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-qemu-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-secret-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-storage-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-uml-0.10.2.6-1.fc18.x86_64
libvirt-daemon-driver-xen-0.10.2.6-1.fc18.x86_64
libvirt-daemon-kvm-0.10.2.6-1.fc18.x86_64
libvirt-daemon-qemu-0.10.2.6-1.fc18.x86_64
libvirt-python-0.10.2.6-1.fc18.x86_64
qemu-1.2.2-13.fc18.x86_64
qemu-common-1.2.2-13.fc18.x86_64
qemu-img-1.2.2-13.fc18.x86_64
qemu-kvm-1.2.2-13.fc18.x86_64
qemu-system-alpha-1.2.2-13.fc18.x86_64
qemu-system-arm-1.2.2-13.fc18.x86_64
qemu-system-cris-1.2.2-13.fc18.x86_64
qemu-system-lm32-1.2.2-13.fc18.x86_64
qemu-system-m68k-1.2.2-13.fc18.x86_64
qemu-system-microblaze-1.2.2-13.fc18.x86_64
qemu-system-mips-1.2.2-13.fc18.x86_64
qemu-system-or32-1.2.2-13.fc18.x86_64
qemu-system-ppc-1.2.2-13.fc18.x86_64
qemu-system-s390x-1.2.2-13.fc18.x86_64
qemu-system-sh4-1.2.2-13.fc18.x86_64
qemu-system-sparc-1.2.2-13.fc18.x86_64
qemu-system-unicore32-1.2.2-13.fc18.x86_64
qemu-system-x86-1.2.2-13.fc18.x86_64
qemu-system-xtensa-1.2.2-13.fc18.x86_64
qemu-user-1.2.2-13.fc18.x86_64
ruby-libvirt-0.4.0-4.fc18.x86_64

Running kernel is 3.9.2-200.fc18.x86_64

Let me know if there is anything more I can add.

Cheers,
b.




* Re: disk corruption after virsh destroy
  2013-07-02 14:40 disk corruption after virsh destroy Brian J. Murrell
  2013-07-02 15:26 ` Brian J. Murrell
@ 2013-07-03  8:47 ` Stefan Hajnoczi
  2013-07-06 13:03   ` Bernd Schubert
  1 sibling, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2013-07-03  8:47 UTC
  To: Brian J. Murrell; +Cc: kvm

On Tue, Jul 02, 2013 at 10:40:11AM -0400, Brian J. Murrell wrote:
> I have a cluster of VMs set up with shared virtio-scsi disks.  The
> purpose of sharing a disk is that if a VM goes down, another can
> pick up and mount the (ext4) filesystem on the shared disk and
> provide service to it.
> 
> But just to be super clear, only one VM ever has a filesystem
> mounted at a time even though multiple VMs technically can access
> the device at the same time.  A VM mounting a filesystem ensures
> absolutely that no other node has it mounted before mounting it.
> 
> That said, what I am finding is that when a node dies and
> another node tries to mount the (ext4) filesystem, it is found dirty
> and needs an fsck.
> 
> My understanding is that with ext{3,4}, this should not be the case
> and indeed it is my experience, on real hardware with coherent disk
> caching (i.e. no non-battery-backed caching disk controllers lying
> to the O/S about what has been written to physical disk) that this
> is the case. That is, a node failing does not leave an ext{3,4}
> filesystem dirty such that it needs an fsck.
> 
> So, clearly, somewhere between the KVM VM and the physical disk,
> there is a cache that is resulting in the guest O/S believing data
> is being written to physical disk that is not actually being written
> there.  To that end, I have ensured that on these shared disks that
> I set "cache=none", but this does not seem to have fixed the
> problem.

I expect journal replay and possibly fsck when an ext4 file system was
left in a mounted state and with I/O pending (e.g. due to power
failure).
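
To tell the two cases apart, the superblock summary from dumpe2fs -h 
can be inspected before mounting.  The sketch below is only an 
illustration under stated assumptions: the helper name 
ext_recovery_state is hypothetical, and it relies on the 
"Filesystem state:" and "Filesystem features:" lines that dumpe2fs -h 
prints, treating the needs_recovery feature as "journal replay 
pending" and any state other than "clean" as "fsck needed".

```shell
#!/bin/sh
# Hypothetical helper: classify an ext3/ext4 filesystem from the
# output of "dumpe2fs -h", read on stdin.
ext_recovery_state() {
    awk '
        /^Filesystem features:/ { if ($0 ~ /needs_recovery/) journal = 1 }
        /^Filesystem state:/    { sub(/^Filesystem state:[ \t]*/, ""); state = $0 }
        END {
            if (state == "") { print "unknown"; exit }
            if (state != "clean") print "fsck needed (state: " state ")"
            else if (journal)     print "journal replay pending"
            else                  print "clean"
        }
    '
}

# Demo on two lines shaped like dumpe2fs -h output.
printf '%s\n' \
    'Filesystem features:      has_journal ext_attr needs_recovery extent' \
    'Filesystem state:         clean' |
    ext_recovery_state
```

On a real system the input would come from something like 
dumpe2fs -h DEVICE 2>/dev/null | ext_recovery_state, with DEVICE 
standing in for the shared disk as seen by the guest.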

A few questions:

1. Is the guest mounting the file system with barrier=0?  barrier=1 is
   the default.

2. Do the physical disks have a volatile write cache enabled?  If yes,
   the guest should use barrier=1.  If the physical disks have a
   non-volatile write cache, or the write cache is disabled, then
   barrier=0 is okay.

3. Have you tested without the cluster?  Run a single VM and kill it
   while it is busy.  Then start it up again and see whether an fsck
   is needed.

4. Is it possible that your previous cluster setup used tune2fs(8) to
   disable fsck in some cases?  That could explain why you didn't see
   fsck before but do now.
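
For question 1, the barrier setting can be read from the mount options 
inside the guest.  This is a minimal sketch, not a definitive check: 
the helper name barrier_status is invented here, and it assumes the 
usual /proc/mounts layout (device, mount point, type, options).  An 
entry is reported as "default" when no barrier option is listed at 
all.

```shell
#!/bin/sh
# Hypothetical helper: report the barrier option of every ext3/ext4
# entry in a mounts-format file (defaults to /proc/mounts).
barrier_status() {
    mounts_file="${1:-/proc/mounts}"
    awk '$3 == "ext3" || $3 == "ext4" {
        state = "default"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "nobarrier" || opts[i] == "barrier=0")
                state = "off"
            else if (opts[i] == "barrier" || opts[i] == "barrier=1")
                state = "on"
        }
        print $2, state
    }' "$mounts_file"
}

# Run against the live mount table when one is available.
if [ -r /proc/mounts ]; then
    barrier_status /proc/mounts
fi
```

Any mount point printed with "off" is running without barriers and is 
the first thing to rule out.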

Stefan


* Re: disk corruption after virsh destroy
  2013-07-03  8:47 ` Stefan Hajnoczi
@ 2013-07-06 13:03   ` Bernd Schubert
  2013-07-15  1:23     ` Stefan Hajnoczi
  0 siblings, 1 reply; 5+ messages in thread
From: Bernd Schubert @ 2013-07-06 13:03 UTC
  To: Stefan Hajnoczi; +Cc: Brian J. Murrell, kvm

On 07/03/2013 10:47 AM, Stefan Hajnoczi wrote:
> On Tue, Jul 02, 2013 at 10:40:11AM -0400, Brian J. Murrell wrote:
>> I have a cluster of VMs set up with shared virtio-scsi disks.  The
>> purpose of sharing a disk is that if a VM goes down, another can
>> pick up and mount the (ext4) filesystem on the shared disk and
>> provide service to it.
>>
>> But just to be super clear, only one VM ever has a filesystem
>> mounted at a time even though multiple VMs technically can access
>> the device at the same time.  A VM mounting a filesystem ensures
>> absolutely that no other node has it mounted before mounting it.
>>
>> That said, what I am finding is that when a node dies and
>> another node tries to mount the (ext4) filesystem, it is found dirty
>> and needs an fsck.
>>
>> My understanding is that with ext{3,4}, this should not be the case
>> and indeed it is my experience, on real hardware with coherent disk
>> caching (i.e. no non-battery-backed caching disk controllers lying
>> to the O/S about what has been written to physical disk) that this
>> is the case. That is, a node failing does not leave an ext{3,4}
>> filesystem dirty such that it needs an fsck.
>>
>> So, clearly, somewhere between the KVM VM and the physical disk,
>> there is a cache that is resulting in the guest O/S believing data
>> is being written to physical disk that is not actually being written
>> there.  To that end, I have ensured that on these shared disks that
>> I set "cache=none", but this does not seem to have fixed the
>> problem.
> 
> I expect journal replay and possibly fsck when an ext4 file system was
> left in a mounted state and with I/O pending (e.g. due to power
> failure).
> 
> A few questions:
> 
> 1. Is the guest mounting the file system with barrier=0?  barrier=1 is
>    the default.
> 
> 2. Do the physical disks have a volatile write cache enabled (if yes,
>    the guest should use barrier=1)?  If the physical disks have a
>    non-volatile write cache or the write cache is disabled (then
>    barrier=0 is okay).

Er, why? As far as I understood Brian, the physical disks have not
been reset, so their cache should be irrelevant?
If the VM needs barrier=1, then there must be some VM caching involved,
but Brian tried to disable that. At least in the past that worked fine
with the emulated LSI SCSI controller. That way I simulated shared
storage, and as long as I used the raw disk format and cache=none there
had never been any corruption (although qcow2 didn't work and introduced
issues).

Brian, maybe you could figure out the pattern of the corruption? I need
to add a verify mode to ql-fstest, but with some network file systems on
top of ext4 it also should work.

https://bitbucket.org/aakef/ql-fstest


Cheers,
Bernd



* Re: disk corruption after virsh destroy
  2013-07-06 13:03   ` Bernd Schubert
@ 2013-07-15  1:23     ` Stefan Hajnoczi
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2013-07-15  1:23 UTC
  To: Bernd Schubert; +Cc: Brian J. Murrell, kvm

On Sat, Jul 06, 2013 at 03:03:07PM +0200, Bernd Schubert wrote:
> On 07/03/2013 10:47 AM, Stefan Hajnoczi wrote:
> > On Tue, Jul 02, 2013 at 10:40:11AM -0400, Brian J. Murrell wrote:
> >> I have a cluster of VMs set up with shared virtio-scsi disks.  The
> >> purpose of sharing a disk is that if a VM goes down, another can
> >> pick up and mount the (ext4) filesystem on the shared disk and
> >> provide service to it.
> >>
> >> But just to be super clear, only one VM ever has a filesystem
> >> mounted at a time even though multiple VMs technically can access
> >> the device at the same time.  A VM mounting a filesystem ensures
> >> absolutely that no other node has it mounted before mounting it.
> >>
> >> That said, what I am finding is that when a node dies and
> >> another node tries to mount the (ext4) filesystem, it is found dirty
> >> and needs an fsck.
> >>
> >> My understanding is that with ext{3,4}, this should not be the case
> >> and indeed it is my experience, on real hardware with coherent disk
> >> caching (i.e. no non-battery-backed caching disk controllers lying
> >> to the O/S about what has been written to physical disk) that this
> >> is the case. That is, a node failing does not leave an ext{3,4}
> >> filesystem dirty such that it needs an fsck.
> >>
> >> So, clearly, somewhere between the KVM VM and the physical disk,
> >> there is a cache that is resulting in the guest O/S believing data
> >> is being written to physical disk that is not actually being written
> >> there.  To that end, I have ensured that on these shared disks that
> >> I set "cache=none", but this does not seem to have fixed the
> >> problem.
> > 
> > I expect journal replay and possibly fsck when an ext4 file system was
> > left in a mounted state and with I/O pending (e.g. due to power
> > failure).
> > 
> > A few questions:
> > 
> > 1. Is the guest mounting the file system with barrier=0?  barrier=1 is
> >    the default.
> > 
> > 2. Do the physical disks have a volatile write cache enabled (if yes,
> >    the guest should use barrier=1)?  If the physical disks have a
> >    non-volatile write cache or the write cache is disabled (then
> >    barrier=0 is okay).
> 
> Er, why? As far as I understood Brian, the physical disks have not
> been reset, so their cache should be irrelevant?

You are right.  The physical disk write cache should not matter in this
case.

Stefan
