[Qemu-devel] Data corruption in Qemu 2.7.1

* [Qemu-devel] Data corruption in Qemu 2.7.1
@ 2017-01-13 10:44 Peter Lieven
  2017-01-17  6:40 ` Fam Zheng
  2017-01-17  7:33 ` [Qemu-devel] " Alexandre DERUMIER
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Lieven @ 2017-01-13 10:44 UTC (permalink / raw)
  To: qemu-devel@nongnu.org; +Cc: qemu-stable

Hi,

I am currently facing a problem in our testing environment where I see file system corruption with 2.7.1 on iSCSI and local storage (LVM).
I am trying to bisect, but has anyone observed this before?

Thanks,
Peter


* Re: [Qemu-devel] Data corruption in Qemu 2.7.1
  2017-01-13 10:44 [Qemu-devel] Data corruption in Qemu 2.7.1 Peter Lieven
@ 2017-01-17  6:40 ` Fam Zheng
  2017-01-17 10:14   ` [Qemu-devel] [Qemu-stable] " Peter Lieven
  2017-01-17  7:33 ` [Qemu-devel] " Alexandre DERUMIER
  1 sibling, 1 reply; 15+ messages in thread
From: Fam Zheng @ 2017-01-17  6:40 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel@nongnu.org, qemu-stable

On Fri, 01/13 11:44, Peter Lieven wrote:
> Hi,
> 
> i currently facing a problem in our testing environment where I see file
> system corruption with 2.7.1 on iSCSI and Local Storage (LVM).
> Trying to bisect, but has anyone observed this before?

The information here is too scarce to tell but a file corruption is more often a
result of two writers modifying the disk concurrently. Have you ruled that out?
Is the corruption reproducible?

Fam


* Re: [Qemu-devel] [Qemu-stable]  Data corruption in Qemu 2.7.1
  2017-01-17  6:40 ` Fam Zheng
@ 2017-01-17 10:14   ` Peter Lieven
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Lieven @ 2017-01-17 10:14 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel@nongnu.org, qemu-stable

On 17.01.2017 at 07:40, Fam Zheng wrote:
> On Fri, 01/13 11:44, Peter Lieven wrote:
>> Hi,
>>
>> i currently facing a problem in our testing environment where I see file
>> system corruption with 2.7.1 on iSCSI and Local Storage (LVM).
>> Trying to bisect, but has anyone observed this before?
> The information here is too scarce to tell but a file corruption is more often a
> result of two writers modifying the disk concurrently. Have you ruled that out?
> Is the corruption reproducible?

My issue was primarily with iSCSI and I could bisect it. I already sent a patch to the list:

commit 0bd57e907311be6e4f97394cfd9afebe271457e2
Author: Peter Lieven <pl@kamp.de>
Date:   Mon Jan 16 16:10:26 2017 +0100

     block/iscsi: avoid data corruption with cache=writeback

     nb_cls_shrunk in iscsi_allocmap_update can become -1 if the
     request starts and ends within the same cluster. This results
     in passing -1 to bitmap_set and bitmap_clear and they don't
     handle negative values properly. In the end this leads to data
     corruption.

     Fixes: e1123a3b40a1a9a625a29c8ed4debb7e206ea690
     Cc: qemu-stable@nongnu.org
     Signed-off-by: Peter Lieven <pl@kamp.de>
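
To illustrate the arithmetic the commit message describes, here is a minimal Python sketch. It assumes the "shrunk" cluster range is obtained by rounding the start sector up and the end sector down to cluster boundaries. The function name `shrunk_cluster_range` and the cluster size of 8 sectors are hypothetical and not taken from the QEMU source. The final clamp is only one possible guard, not necessarily what the patch does.

```python
def shrunk_cluster_range(sector_num, nb_sectors, cluster_sectors):
    """Return (first_cluster, cluster_count) for clusters fully covered by a request."""
    # Round the start sector up to the next cluster boundary.
    first = -(-sector_num // cluster_sectors)
    # Round the end sector down to a cluster boundary, then subtract the start.
    count = (sector_num + nb_sectors) // cluster_sectors - first
    return first, count


# A request covering sectors 1..2 starts and ends inside cluster 0
# (hypothetical cluster size: 8 sectors).
first, count = shrunk_cluster_range(1, 2, 8)
print(first, count)  # prints: 1 -1

# One possible guard: never hand a negative length to a bitmap helper.
safe_count = max(count, 0)
```

A request that covers whole clusters, such as sectors 0..15 with the same cluster size, yields a non-negative count, so the negative value only appears for requests confined to a single cluster.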

However, one user also reported corruption with LVM. I will check if he uses virtio-scsi.

Thanks,
Peter


* Re: [Qemu-devel] Data corruption in Qemu 2.7.1
  2017-01-13 10:44 [Qemu-devel] Data corruption in Qemu 2.7.1 Peter Lieven
  2017-01-17  6:40 ` Fam Zheng
@ 2017-01-17  7:33 ` Alexandre DERUMIER
  2017-01-17  8:03   ` [Qemu-devel] [Qemu-stable] " Fabian Grünbichler
  1 sibling, 1 reply; 15+ messages in thread
From: Alexandre DERUMIER @ 2017-01-17  7:33 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-stable

Hi,

Proxmox users have recently reported corruption with qemu 2.7 and scsi-block (passing a physical /dev/sdX through to virtio-scsi).

working fine with qemu 2.6.

qemu 2.7 + scsi-hd works fine

https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2



----- Original Message -----
From: "Peter Lieven" <pl@kamp.de>
To: "qemu-devel" <qemu-devel@nongnu.org>
Cc: "qemu-stable" <qemu-stable@nongnu.org>
Sent: Friday, 13 January 2017 11:44:32
Subject: [Qemu-devel] Data corruption in Qemu 2.7.1

Hi, 

i currently facing a problem in our testing environment where I see file system corruption with 2.7.1 on iSCSI and Local Storage (LVM). 
Trying to bisect, but has anyone observed this before? 

Thanks, 
Peter 


* Re: [Qemu-devel] [Qemu-stable]  Data corruption in Qemu 2.7.1
  2017-01-17  7:33 ` [Qemu-devel] " Alexandre DERUMIER
@ 2017-01-17  8:03   ` Fabian Grünbichler
  2017-01-17 10:41     ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2017-01-17  8:03 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: Peter Lieven, qemu-devel, qemu-stable, Paolo Bonzini

On Tue, Jan 17, 2017 at 08:33:46AM +0100, Alexandre DERUMIER wrote:
> Hi,
> 
> proxmox users have reported recently corruption with qemu 2.7 and scsi-block (with passing physical /dev/sdX to virtio-scsi).
> 
> working fine with qemu 2.6.
> 
> qemu 2.7 + scsi-hd works fine
> 
> https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2
> 

I am fairly sure this is a separate issue.

Commit 8fdc7839e40f43a426bc7e858cf1dbfe315a3804 (first included in
2.7.0)[1] changed the behaviour of scsi-block passthrough. Previously
this worked with SATA disks, now it doesn't anymore. A bisect run
confirmed this, scsi-block with a SATA disk passed through via
virtio-scsi-single corrupts on writes since that commit, scsi-hd and
scsi-disk work fine (scsi-generic corrupts as well).

PVE's detection logic for passthrough just differentiated between disks
and tape drives, and unfortunately the SG_IO ioctl says SATA disks are
disks as well.. we probably need to default to scsi-hd or scsi-disk
instead of scsi-block, and only when we explicitly detect a "real" SCSI
disk we are allowed to use scsi-block?

@Paolo: was the old behaviour just an accident and the new behaviour
intentional? documentation is quite sparse, or maybe I am looking in the
wrong places..

1: scsi-block: always use SG_IO

Using pread/pwrite or io_submit has the advantage of eliminating the
bounce buffer, but drops the SCSI status.  This keeps the guest from
seeing unit attention codes, as well as statuses such as RESERVATION
CONFLICT.  Because we know scsi-block operates on an SBC device we can
still use the DMA helpers with SG_IO; just remember to patch the CDBs
if the transfer is split into multiple segments.

This means that scsi-block will always use the thread-pool unfortunately,
instead of respecting aio=native.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-17  8:03   ` [Qemu-devel] [Qemu-stable] " Fabian Grünbichler
@ 2017-01-17 10:41     ` Paolo Bonzini
  2017-01-17 11:22       ` Fabian Grünbichler
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-01-17 10:41 UTC (permalink / raw)
  To: Fabian Grünbichler, Alexandre DERUMIER
  Cc: Peter Lieven, qemu-devel, qemu-stable



On 17/01/2017 09:03, Fabian Grünbichler wrote:
> Commit 8fdc7839e40f43a426bc7e858cf1dbfe315a3804 (first included in
> 2.7.0)[1] changed the behaviour of scsi-block passthrough. Previously
> this worked with SATA disks, now it doesn't anymore. A bisect run
> confirmed this, scsi-block with a SATA disk passed through via
> virtio-scsi-single corrupts on writes since that commit, scsi-hd and
> scsi-disk work fine (scsi-generic corrupts as well).
> 
> PVE's detection logic for passthrough just differentiated between disks
> and tape drives, and unfortunately the SG_IO ioctl says SATA disks are
> disks as well.. we probably need to default to scsi-hd or scsi-disk
> instead of scsi-block, and only when we explicitly detect a "real" SCSI
> disk we are allowed to use scsi-block?
> 
> @Paolo: was the old behaviour just an accident and the new bevaviour
> intentional? documentation is quite sparse, or maybe I am looking in the
> wrong places..

No, it would be a bug (QEMU or kernel).

Do you have an easy reproducer with dd, as suggested at
https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2?

Paolo


* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-17 10:41     ` Paolo Bonzini
@ 2017-01-17 11:22       ` Fabian Grünbichler
  2017-01-17 15:03         ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2017-01-17 11:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alexandre DERUMIER, Peter Lieven, qemu-devel, qemu-stable

On Tue, Jan 17, 2017 at 11:41:44AM +0100, Paolo Bonzini wrote:
> 
> 
> On 17/01/2017 09:03, Fabian Grünbichler wrote:
> > Commit 8fdc7839e40f43a426bc7e858cf1dbfe315a3804 (first included in
> > 2.7.0)[1] changed the behaviour of scsi-block passthrough. Previously
> > this worked with SATA disks, now it doesn't anymore. A bisect run
> > confirmed this, scsi-block with a SATA disk passed through via
> > virtio-scsi-single corrupts on writes since that commit, scsi-hd and
> > scsi-disk work fine (scsi-generic corrupts as well).
> > 
> > PVE's detection logic for passthrough just differentiated between disks
> > and tape drives, and unfortunately the SG_IO ioctl says SATA disks are
> > disks as well.. we probably need to default to scsi-hd or scsi-disk
> > instead of scsi-block, and only when we explicitly detect a "real" SCSI
> > disk we are allowed to use scsi-block?
> > 
> > @Paolo: was the old behaviour just an accident and the new bevaviour
> > intentional? documentation is quite sparse, or maybe I am looking in the
> > wrong places..
> 
> No, it would be a bug (QEMU or kernel).
> 
> Do you have an easy reproducer with dd, as suggested at
> https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2?
> 
> Paolo
> 

setup

1) dd 1G of (u)random data to a file on a file system on a SATA disk on
the host (I tested with ext2/3/4, ZFS and btrfs, btrfs produces
corruption the fastest here, but raw writes to the device should already
trigger the kernel error messages)

2) calculate the md5sum of this file, and store it on the same FS

test (with current qemu master, using PVE's 4.4.35 kernel as host kernel)

1) start Ubuntu 16.04 VM (see full commandline at the end, I tested with
three disks, one for each of the file systems above) - the same behaviour
can also be observed when booting from an Alpine Linux iso

2) mount FS

3) dd random file from above to new file on same FS

4) sync

5) check md5sum of resulting file

6) repeat 3-5 until md5sum does not match, kernel spews error
messages, or you are convinced that everything is OK
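
The copy/sync/checksum loop in steps 3-6 could be scripted roughly as follows. This is only a sketch: the helper names `md5_of` and `copy_and_verify` and the default round count are made up for illustration, and the paths are whatever files on the mounted file system you choose. Note that a read-back right after the copy may be served from the page cache, so a mismatch can take several rounds to show up.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst, rounds=10):
    """Copy src to dst repeatedly.

    Returns the number of the first round whose checksum differs from the
    source, or None if every round matched.
    """
    expected = md5_of(src)
    for i in range(1, rounds + 1):
        shutil.copyfile(src, dst)   # step 3: copy the random file
        os.sync()                   # step 4: flush dirty data to disk
        if md5_of(dst) != expected: # step 5: compare checksums
            return i
    return None
```

Inside the guest this would be called with the random file from the setup as `src` and a new file on the same file system as `dst`, while watching the kernel log for I/O errors.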

sample kernel message (for ext3):
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Sense Key : Illegal Request [current]
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Add. Sense: Invalid field in cdb
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 CDB: Write(10) 2a 00 0f 3a 90 00 00 07 d8 00
Jan 17 11:39:32 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 255496192
Jan 17 11:39:32 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 31937280)
Jan 17 11:39:32 ubuntu kernel: buffer_io_error: 246 callbacks suppressed
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936768
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936769
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936770
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936771
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936772
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936773
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936774
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936775
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936776
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936777
Jan 17 11:39:39 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:39:41 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:39:55 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:39:56 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:40:07 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:40:08 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:40:15 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:40:22 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8

qemu commandline, originally generated by PVE, feel free to adapt or
minimize.. sdb, sdc and sde are the three SATA disks from above,
vm-101-disk-1 is the Ubuntu rootfs on LVM-thin.

/root/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
 -enable-kvm \
 -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' \
 -mon 'chardev=qmp,mode=control' \
 -pidfile /var/run/qemu-server/101.pid \
 -smbios 'type=1,uuid=3e550136-9d7f-4a12-9d98-e25a5d2d5806' \
 -name diskpassthroughtest \
 -smp '8,sockets=1,cores=8,maxcpus=8' \
 -nodefaults \
 -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
 -vga cirrus \
 -vnc unix:/var/run/qemu-server/101.vnc,x509,password \
 -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce \
 -m 4096 \
 -k de \
 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
 -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
 -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' \
 -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
 -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
 -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
 -iscsi 'initiator-name=iqn.1993-08.org.debian:01:c6676e1b1f72' \
 -drive 'file=/mnt/pve/iso/template/iso/ubuntu-16.04.1-server-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' \
 -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
 -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1' \
 -drive 'file=/dev/sdb,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' \
 -device 'scsi-block,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
 -device 'virtio-scsi-pci,id=virtioscsi1,bus=pci.3,addr=0x2' \
 -drive 'file=/dev/sdc,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' \
 -device 'scsi-block,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' \
 -device 'virtio-scsi-pci,id=virtioscsi3,bus=pci.3,addr=0x4' \
 -drive 'file=/dev/sde,if=none,id=drive-scsi3,format=raw,cache=none,aio=native,detect-zeroes=on' \
 -device 'scsi-block,bus=virtioscsi3.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi3,id=scsi3' \
 -drive 'file=/dev/pve/vm-101-disk-1,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' \
 -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=103' \
 -netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
 -device 'virtio-net-pci,mac=46:83:4B:92:EC:88,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'


* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-17 11:22       ` Fabian Grünbichler
@ 2017-01-17 15:03         ` Paolo Bonzini
  2017-01-17 16:24           ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-01-17 15:03 UTC (permalink / raw)
  To: Fabian Grünbichler
  Cc: Alexandre DERUMIER, Peter Lieven, qemu-devel, qemu-stable



On 17/01/2017 12:22, Fabian Grünbichler wrote:
> 6) repeat 3-5 until md5sum does not match, kernel spews error
> messages, or you are convinced that everything is OK
> 
> sample kernel message (for ext3):
> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Sense Key : Illegal Request [current]
> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Add. Sense: Invalid field in cdb
> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 CDB: Write(10) 2a 00 0f 3a 90 00 00 07 d8 00
> Jan 17 11:39:32 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 255496192

Can you reproduce it if QEMU runs under "strace -e ioctl -ff" in the 
host?  Or also using this systemtap script.

The important bit would be the lines with a nonzero status, but the
others can be useful to see what the surroundings look like.

# example output for "sudo stap -v strace.stp -c 'sg_opcodes /dev/sda'"
# | sg_opcodes[3444] 00000000 12 00 00 00 24 00 00 00 be 91
# | sg_opcodes[3444] 08100002 a3 0c 00 00 00 00 00 00 20 00

global cdbs%
global reqs%
global names%

function check_pid() {
    return target() == 0 || pid() == target();
}

probe kernel.function("blk_fill_sghdr_rq") {
    if (!check_pid()) next;

    names[$rq]=sprintf("%s[%d]", execname(), tid())
    cdbs[$rq]=sprintf("%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x",
		    $hdr->cmdp[0],$hdr->cmdp[1],$hdr->cmdp[2],$hdr->cmdp[3],$hdr->cmdp[4],
		    $hdr->cmdp[5],$hdr->cmdp[6],$hdr->cmdp[7],$hdr->cmdp[8],$hdr->cmdp[9])
}

probe kernel.function("scsi_setup_cmnd") {
    if (!($req in cdbs)) next;
    reqs[$req->special] = $req;
}

probe kernel.function("scsi_finish_command") {
    if (!($cmd in reqs)) next;
    rq = reqs[$cmd];
    printf("%s %08x %s\n", names[rq], $cmd->result, cdbs[rq]);
    delete reqs[$cmd]
    delete cdbs[rq]
}
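
To pick out the lines with a nonzero status from the script's output, the hexadecimal result word can be split into its fields. The sketch below assumes the classic Linux layout of the SCSI result word (status in the lowest byte, then message, host and driver bytes). The helper names `parse_line` and `decode_scsi_result` are hypothetical.

```python
def decode_scsi_result(result):
    """Split a Linux SCSI result word into its four byte-wide fields."""
    return {
        "status": result & 0xFF,          # SCSI status (0x02 is CHECK CONDITION)
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": (result >> 16) & 0xFF,    # host (transport) byte
        "driver": (result >> 24) & 0xFF,  # driver byte (0x08 is DRIVER_SENSE)
    }


def parse_line(line):
    """Parse one output line: '<name>[<tid>] <result hex> <cdb bytes...>'."""
    name, result_hex, *cdb_hex = line.split()
    return name, int(result_hex, 16), bytes(int(b, 16) for b in cdb_hex)


name, result, cdb = parse_line(
    "sg_opcodes[3444] 08100002 a3 0c 00 00 00 00 00 00 20 00")
fields = decode_scsi_result(result)
if result != 0:
    print(name, fields, cdb.hex())
```

For the second example line above, this yields a status byte of 0x02 and a driver byte of 0x08.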


* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-17 15:03         ` Paolo Bonzini
@ 2017-01-17 16:24           ` Paolo Bonzini
  2017-01-18 11:50             ` Fabian Grünbichler
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-01-17 16:24 UTC (permalink / raw)
  To: Fabian Grünbichler
  Cc: Peter Lieven, qemu-devel, Alexandre DERUMIER, qemu-stable



On 17/01/2017 16:03, Paolo Bonzini wrote:
> 
> 
> On 17/01/2017 12:22, Fabian Grünbichler wrote:
>> 6) repeat 3-5 until md5sum does not match, kernel spews error
>> messages, or you are convinced that everything is OK
>>
>> sample kernel message (for ext3):
>> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Sense Key : Illegal Request [current]
>> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Add. Sense: Invalid field in cdb
>> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 CDB: Write(10) 2a 00 0f 3a 90 00 00 07 d8 00
>> Jan 17 11:39:32 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 255496192
> 
> Can you reproduce it if QEMU runs under "strace -e ioctl -ff" in the 
> host?  Or also using this systemtap script.
> 
> The important bit would be the lines with a nonzero status, but the
> others can be useful to see what the surroundings look like.
> 
> # example output for "sudo stap -v strace.stp -c 'sg_opcodes /dev/sda'"
> # | sg_opcodes[3444] 00000000 12 00 00 00 24 00 00 00 be 91
> # | sg_opcodes[3444] 08100002 a3 0c 00 00 00 00 00 00 20 00
> 
> global cdbs%
> global reqs%
> global names%
> 
> function check_pid() {
>     return target() == 0 || pid() == target();
> }
> 
> probe kernel.function("blk_fill_sghdr_rq") {
>     if (!check_pid()) next;
> 
>     names[$rq]=sprintf("%s[%d]", execname(), tid())
>     cdbs[$rq]=sprintf("%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x",
> 		    $hdr->cmdp[0],$hdr->cmdp[1],$hdr->cmdp[2],$hdr->cmdp[3],$hdr->cmdp[4],
> 		    $hdr->cmdp[6],$hdr->cmdp[5],$hdr->cmdp[7],$hdr->cmdp[8],$hdr->cmdp[9])
> }
> 
> probe kernel.function("scsi_setup_cmnd") {
>     if (!($req in cdbs)) next;
>     reqs[$req->special] = $req;
> }
> 
> probe kernel.function("scsi_finish_command") {
>     if (!($cmd in reqs)) next;
>     rq = reqs[$cmd];
>     printf("%s %08x %s\n", names[rq], $cmd->result, cdbs[rq]);
>     delete reqs[$cmd]
>     delete cdbs[rq]

Please add a "delete names[rq]" here too!

Paolo

> }
> 
> 


* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-17 16:24           ` Paolo Bonzini
@ 2017-01-18 11:50             ` Fabian Grünbichler
  2017-01-18 16:19               ` Fabian Grünbichler
  0 siblings, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2017-01-18 11:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Alexandre DERUMIER, qemu-stable

On 17/01/2017 16:03, Paolo Bonzini wrote:
> On 17/01/2017 12:22, Fabian Grünbichler wrote:
>> 6) repeat 3-5 until md5sum does not match, kernel spews error
>> messages, or you are convinced that everything is OK
>>
>> sample kernel message (for ext3):
>> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Sense Key : Illegal Request [current]
>> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Add. Sense: Invalid field in cdb
>> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 CDB: Write(10) 2a 00 0f 3a 90 00 00 07 d8 00
>> Jan 17 11:39:32 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 255496192
> 
> Can you reproduce it if QEMU runs under "strace -e ioctl -ff" in the 
> host?  Or also using this systemtap script.
> 
> The important bit would be the lines with a nonzero status, but the
> others can be useful to see what the surroundings look like.
> 

OT: systemtap is not working with your script under Debian Jessie (or
maybe in general under Debian Jessie? not sure).

after some further testing it seems like this change in Qemu exposes
some subtle issue with our specific kernel (it works fine with the
upstream Ubuntu 4.4 one which ours is based on). I am currently
debugging further to narrow down potential causes - if I need further
input from your side or if I suspect Qemu to be at fault I'll resurrect
this thread (and include the strace output).

thanks for your quick reaction anyhow!


* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-18 11:50             ` Fabian Grünbichler
@ 2017-01-18 16:19               ` Fabian Grünbichler
  2017-01-18 16:30                 ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2017-01-18 16:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Alexandre DERUMIER, qemu-stable


On Wed, Jan 18, 2017 at 12:50:50PM +0100, Fabian Grünbichler wrote:
> On 17/01/2017 16:03, Paolo Bonzini wrote:
> > On 17/01/2017 12:22, Fabian Grünbichler wrote:
> >> 6) repeat 3-5 until md5sum does not match, kernel spews error
> >> messages, or you are convinced that everything is OK
> >>
> >> sample kernel message (for ext3):
> >> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> >> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Sense Key : Illegal Request [current]
> >> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Add. Sense: Invalid field in cdb
> >> Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 CDB: Write(10) 2a 00 0f 3a 90 00 00 07 d8 00
> >> Jan 17 11:39:32 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 255496192
> > 
> > Can you reproduce it if QEMU runs under "strace -e ioctl -ff" in the 
> > host?  Or also using this systemtap script.
> > 
> > The important bit would be the lines with a nonzero status, but the
> > others can be useful to see what the surroundings look like.
> > 
> 
> OT: systemtap is not working with your script under Debian Jessie (or
> maybe in general under Debian Jessie? not sure).
> 
> after some further testing it seems like this change in Qemu exposes
> some subtle issue with our specific kernel (it works fine with the
> upstream Ubuntu 4.4 one which ours is based on). I am currently
> debugging further to narrow down potential causes - if I need further
> input from your side or if I suspect Qemu to be at fault I'll resurrect
> this thread (and include the strace output).
> 
> thanks for your quick reaction anyhow!
> 

okay, so this looks like either a bug in Qemu or the upstream kernel.

disabling THP on the hypervisor host with

# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled

allows reproducing the bug very reliably; shutting the VM down, then
enabling THP (with 'always') and trying again makes it go away.
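
When recording test runs, it can help to log which THP mode was active on the host. The sysfs file lists all modes and marks the active one with square brackets (for example `always madvise [never]`). A small sketch for reading it follows; the function names are made up for illustration.

```python
def active_thp_mode(text):
    """Return the bracketed (active) option from the 'enabled' file contents."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError("no active mode found in %r" % text)


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read the host's current THP mode from sysfs."""
    with open(path) as f:
        return active_thp_mode(f.read())
```

Calling `read_thp_mode()` on the hypervisor host before starting the VM gives the value to note next to each test result.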

Qemu was compiled with:
../configure --with-confsuffix=/kvm --target-list=x86_64-softmmu
--disable-xen --enable-gnutls --enable-sdl --enable-uuid
--enable-linux-aio --enable-libiscsi --disable-smartcard
--audio-drv-list=alsa --enable-spice --enable-usb-redir --enable-libusb
--disable-gtk --enable-xfsctl --enable-numa --disable-strip
--enable-jemalloc --disable-libnfs --disable-fdt

attached is an strace with qemu master and mainline 4.9 running on
Debian Jessie - I will try to test it with Fedora or CentOS tomorrow.

journal in the VM says the following:

Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Sense Key : Illegal Request [current]
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Add. Sense: Invalid field in cdb
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 CDB: Write(10) 2a 00 0d d6 51 48 00 08 00 00
Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232149320
Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29018921)
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018409
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018410
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018411
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018412
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018413
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018414
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018415
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018416
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018417
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018418
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Sense Key : Illegal Request [current]
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Add. Sense: Invalid field in cdb
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 CDB: Write(10) 2a 00 0d d6 59 48 00 08 00 00
Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232151368
Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29019177)
Jan 18 17:07:52 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 18 17:07:58 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8


strace (with some random grep-ing):
[pid  1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 51, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=17, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`\235=c\177\0\0\0\0\1\0\0\0\0\0\0`\236=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
[pid  1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 59, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=16, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`-=c\177\0\0\0\0\1\0\0\0\0\0\0`.=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
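
The failing CDBs can be cross-checked against the guest's kernel log by decoding them. The sketch below decodes a 10-byte WRITE(10) CDB (big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The function name `decode_rw10` is hypothetical, and the byte count assumes 512-byte logical blocks.

```python
def decode_rw10(cdb):
    """Decode opcode, LBA and transfer length from a READ(10)/WRITE(10) CDB."""
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    opcode = cdb[0]
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5, big-endian
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8, big-endian
    return opcode, lba, blocks


cdb = bytes.fromhex("2a 00 0d d6 51 48 00 08 00 00")
opcode, lba, blocks = decode_rw10(cdb)
# Assumes 512-byte logical blocks for the byte count.
print(hex(opcode), lba, blocks, blocks * 512)
# prints: 0x2a 232149320 2048 1048576
```

The decoded LBA matches the sector reported by `blk_update_request` in the guest, and the byte count matches `dxfer_len` in the strace output.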

[-- Attachment #2: host-strace.gz --]
[-- Type: application/gzip, Size: 314401 bytes --]


* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-18 16:19               ` Fabian Grünbichler
@ 2017-01-18 16:30                 ` Paolo Bonzini
  2017-01-18 17:17                   ` Fabian Grünbichler
  2017-01-19 11:59                   ` Fabian Grünbichler
  0 siblings, 2 replies; 15+ messages in thread
From: Paolo Bonzini @ 2017-01-18 16:30 UTC (permalink / raw)
  To: Fabian Grünbichler; +Cc: qemu-devel, Alexandre DERUMIER, qemu-stable



On 18/01/2017 17:19, Fabian Grünbichler wrote:
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Sense Key : Illegal Request [current]
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Add. Sense: Invalid field in cdb
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 CDB: Write(10) 2a 00 0d d6 51 48 00 08 00 00
> Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232149320
> Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29018921)
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018409
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018410
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018411
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018412
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018413
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018414
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018415
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018416
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018417
> Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018418
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Sense Key : Illegal Request [current]
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Add. Sense: Invalid field in cdb
> Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 CDB: Write(10) 2a 00 0d d6 59 48 00 08 00 00
> Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232151368
> Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29019177)
> Jan 18 17:07:52 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
> Jan 18 17:07:58 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
> 
> 
> strace (with some random grep-ing):
> [pid  1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 51, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=17, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`\235=c\177\0\0\0\0\1\0\0\0\0\0\0`\236=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
> [pid  1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 59, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=16, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`-=c\177\0\0\0\0\1\0\0\0\0\0\0`.=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
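
For reference, the two failing CDBs in the log and strace output above can be decoded by hand. The following is a minimal sketch, assuming the standard WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and a 512-byte logical block size; the function name is illustrative and not part of QEMU or the kernel.

```python
import struct


def decode_write10(cdb_hex: str, block_size: int = 512):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks, bytes). block_size is an assumption; 512 is
    consistent with the dxfer_len reported by strace.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    # Bytes 2-5: big-endian logical block address.
    (lba,) = struct.unpack(">I", cdb[2:6])
    # Bytes 7-8: big-endian transfer length in logical blocks.
    (blocks,) = struct.unpack(">H", cdb[7:9])
    return lba, blocks, blocks * block_size


for cdb in ("2a 00 0d d6 51 48 00 08 00 00",
            "2a 00 0d d6 59 48 00 08 00 00"):
    lba, blocks, nbytes = decode_write10(cdb)
    print(f"LBA {lba}, {blocks} blocks, {nbytes} bytes")
```

Under these assumptions the decoded LBAs match the sectors in the blk_update_request lines (232149320 and 232151368), and 2048 blocks of 512 bytes equal the dxfer_len of 1048576 seen in strace.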

This is useful, thanks.  I suspect blk_rq_map_user_iov is failing,
meaning that the scatter/gather list has too many segments for the HBA
in the host.  (The limit can be found in /sys/block/sda/queue/max_segments).
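
The limit mentioned above can be read programmatically as well. This is a small sketch that only reads the sysfs file named in the text; the function name and the sysfs_root parameter are hypothetical helpers added so the code can be exercised against a test directory.

```python
from pathlib import Path


def read_max_segments(device: str = "sda", sysfs_root: str = "/sys/block") -> int:
    """Return the block queue's max_segments limit for the given device."""
    path = Path(sysfs_root) / device / "queue" / "max_segments"
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        print("max_segments:", read_max_segments("sda"))
    except (OSError, ValueError) as exc:
        # The device name is an assumption; adjust it for the host in question.
        print("could not read limit:", exc)
```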

This is consistent with your finding here:

> disabling THP on the hypervisor host with
> 
> # echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
> 
> allows reproducing the bug very reliably, shutting the VM down, then
> enabling THP (with 'always') and trying again makes it go away.

because no THP means more memory fragmentation and thus more segments.
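
Since the THP setting matters here, it may help to check which mode is active on the host. The sysfs file lists all modes and marks the active one in square brackets (for example "always madvise [never]"). A minimal sketch of parsing that format, with a hypothetical function name:

```python
import re


def active_thp_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) mode from the THP 'enabled' file contents."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if match is None:
        raise ValueError("no active mode marked in: %r" % sysfs_text)
    return match.group(1)


if __name__ == "__main__":
    path = "/sys/kernel/mm/transparent_hugepage/enabled"
    try:
        with open(path) as f:
            print("THP mode:", active_thp_mode(f.read()))
    except (OSError, ValueError) as exc:
        print("could not determine THP mode:", exc)
```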

I'm not sure how to fix it, unfortunately. :(

Paolo

* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-18 16:30                 ` Paolo Bonzini
@ 2017-01-18 17:17                   ` Fabian Grünbichler
  2017-01-19 11:59                   ` Fabian Grünbichler
  1 sibling, 0 replies; 15+ messages in thread
From: Fabian Grünbichler @ 2017-01-18 17:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Alexandre DERUMIER, qemu-stable


> On 18 January 2017 at 17:30, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> 
> 
> On 18/01/2017 17:19, Fabian Grünbichler wrote:
> 
> This is useful, thanks.  I suspect blk_rq_map_user_iov is failing,
> meaning that the scatter/gather list has too many segments for the HBA
> in the host.  (The limit can be found in /sys/block/sda/queue/max_segments).

I can try to get some more info tomorrow..

> 
> This is consistent with your finding here:
> 
> > disabling THP on the hypervisor host with
> > 
> > # echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
> > 
> > allows reproducing the bug very reliably, shutting the VM down, then
> > enabling THP (with 'always') and trying again makes it go away.
> 
> because no THP means more memory fragmentation and thus more segments.
> 
> I'm not sure how to fix it, unfortunately. :(

Well, at least this means we have a (potentially too conservative) check for deciding when to use scsi-disk instead of scsi-block (maybe this could be detected in QEMU as well?).

This seems especially troublesome since the (hypervisor) admin can change the THP setting at runtime, and there seem to be widespread recommendations to disable THP for e.g. DB use cases.

> 
> Paolo
>

* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-18 16:30                 ` Paolo Bonzini
  2017-01-18 17:17                   ` Fabian Grünbichler
@ 2017-01-19 11:59                   ` Fabian Grünbichler
  2017-01-24  9:35                     ` Paolo Bonzini
  1 sibling, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2017-01-19 11:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Alexandre DERUMIER, qemu-stable

On Wed, Jan 18, 2017 at 05:30:17PM +0100, Paolo Bonzini wrote:
> 
> 
> On 18/01/2017 17:19, Fabian Grünbichler wrote:
> 
> This is useful, thanks.  I suspect blk_rq_map_user_iov is failing,
> meaning that the scatter/gather list has too many segments for the HBA
> in the host.  (The limit can be found in /sys/block/sda/queue/max_segments).

limit is 168 for all the disks I tested with.
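
To put that limit in perspective, here is a rough back-of-the-envelope estimate. It is only a sketch under stated assumptions, not what the kernel computes: buffers are taken to be page-aligned, no two pages are assumed physically contiguous (fully fragmented memory), and each iovec is assumed to need at least one segment. The transfer size and iovec count come from the strace output; the function name is hypothetical.

```python
def fragmented_segment_estimate(total_bytes: int, iovec_count: int, page_size: int) -> int:
    """Estimate segments for a transfer when memory is contiguous only within a page."""
    pages = -(-total_bytes // page_size)  # ceiling division
    # Each iovec is assumed to need at least one segment of its own.
    return max(pages, iovec_count)


MAX_SEGMENTS = 168      # limit reported for the tested disks
DXFER_LEN = 1048576     # from the strace output
IOVEC_COUNT = 17        # from the strace output

for label, page_size in (("4 KiB pages", 4096),
                         ("2 MiB huge pages", 2 * 1024 * 1024)):
    estimate = fragmented_segment_estimate(DXFER_LEN, IOVEC_COUNT, page_size)
    verdict = "exceeds" if estimate > MAX_SEGMENTS else "fits within"
    print(f"{label}: ~{estimate} segments, {verdict} the limit of {MAX_SEGMENTS}")
```

With 4 KiB pages the estimate is 256 segments for a 1 MiB write, above the limit of 168, while with 2 MiB huge pages it stays at the iovec count of 17. That would be consistent with the failure appearing only when the guest memory is not backed by huge pages.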

> 
> This is consistent with your finding here:
> 
> > disabling THP on the hypervisor host with
> > 
> > # echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
> > 
> > allows reproducing the bug very reliably, shutting the VM down, then
> > enabling THP (with 'always') and trying again makes it go away.
> 
> because no THP means more memory fragmentation and thus more segments.

it is also very easily reproducible with both THP 'enabled' and 'defrag'
set to madvise or always, if tested under fragmented or low-memory
conditions.

my test host has 64G of memory, my test VM 4G, and huge pages are 2M
(2048 kB) in size.

if I simulate some memory load by repeatedly reserving 50G memory using
stress-ng:

# stress-ng --vm 50 --vm-bytes=1G --vm-hang 30

and then start the test VM and the dd-ing, I can see the big chunk of
AnonHugePages allocated to the VM system grow:

# grep -E 'AnonHugePages:[[:space:]]+[0-9]{5,} kB' /proc/$(pidof qemu-system-x86_64)/smaps

up to about 3G (of 4G), and hit the issue.

without the additional load and fragmentation using stress-ng, the
AnonHugePages allocated to the qemu process grow to the expected 4G, and
the issue does not occur.
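
The grep above only lists the large mappings; to get a single number for how much of the qemu process is backed by THP, the AnonHugePages lines in smaps can be summed. A minimal sketch, assuming the usual "AnonHugePages: <n> kB" line format; the function name and the sample text in the test are illustrative only.

```python
import re


def anon_huge_kb(smaps_text: str) -> int:
    """Sum all AnonHugePages values (in kB) found in smaps-formatted text."""
    values = re.findall(r"^AnonHugePages:\s+(\d+) kB$", smaps_text, flags=re.MULTILINE)
    return sum(int(v) for v in values)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python thp_sum.py /proc/<qemu pid>/smaps
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            total = anon_huge_kb(f.read())
        print(f"AnonHugePages total: {total} kB ({total / (1024 * 1024):.2f} GiB)")
```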

> 
> I'm not sure how to fix it, unfortunately. :(

so this means either use non-transparent huge pages when using
scsi-block (haven't verified but should work?), or use aggressive THP
settings and/or always leave enough memory reserves? :-/ this is very
unfortunate IMHO (and probably also not a very realistic usage
scenario?)

> 
> Paolo
> 

* Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1
  2017-01-19 11:59                   ` Fabian Grünbichler
@ 2017-01-24  9:35                     ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2017-01-24  9:35 UTC (permalink / raw)
  To: Fabian Grünbichler; +Cc: qemu-devel, Alexandre DERUMIER, qemu-stable



On 19/01/2017 12:59, Fabian Grünbichler wrote:
> > I'm not sure how to fix it, unfortunately. :(
> 
> so this means either use non-transparent huge pages when using
> scsi-block (haven't verified but should work?), or use aggressive THP
> settings and/or always leave enough memory reserves? :-/ this is very
> unfortunate IMHO (and probably also not a very realistic usage
> scenario?)

Yes, I agree.  Unfortunately there is no API in Linux that lets you
verify how many segments an iov is split into.

Paolo

Thread overview: 15+ messages
2017-01-13 10:44 [Qemu-devel] Data corruption in Qemu 2.7.1 Peter Lieven
2017-01-17  6:40 ` Fam Zheng
2017-01-17 10:14   ` [Qemu-devel] [Qemu-stable] " Peter Lieven
2017-01-17  7:33 ` [Qemu-devel] " Alexandre DERUMIER
2017-01-17  8:03   ` [Qemu-devel] [Qemu-stable] " Fabian Grünbichler
2017-01-17 10:41     ` Paolo Bonzini
2017-01-17 11:22       ` Fabian Grünbichler
2017-01-17 15:03         ` Paolo Bonzini
2017-01-17 16:24           ` Paolo Bonzini
2017-01-18 11:50             ` Fabian Grünbichler
2017-01-18 16:19               ` Fabian Grünbichler
2017-01-18 16:30                 ` Paolo Bonzini
2017-01-18 17:17                   ` Fabian Grünbichler
2017-01-19 11:59                   ` Fabian Grünbichler
2017-01-24  9:35                     ` Paolo Bonzini
