From: Kevin Wolf <kwolf@redhat.com>
To: Dietmar Maurer <dietmar@proxmox.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, Sergio Lopez <slp@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Max Reitz <mreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"jsnow@redhat.com" <jsnow@redhat.com>
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Wed, 1 Apr 2020 20:12:56 +0200	[thread overview]
Message-ID: <20200401181256.GB27663@linux.fritz.box>
In-Reply-To: <997901084.0.1585755465486@webmail.proxmox.com>

On 01.04.2020 at 17:37, Dietmar Maurer wrote:
> > > Is really nobody else able to reproduce this (has somebody already tried to reproduce it)?
> > 
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > like a different case to me.
> 
> Please can you post the command line args of your VM? I use something like
> 
> ./x86_64-softmmu/qemu-system-x86_64 -chardev
> 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon
> 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m
> 1024 -object 'iothread,id=iothread-virtioscsi0' -device
> 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive
> 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
> -device
> 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
> -machine "type=pc,accel=kvm"
> 
> Do you also run "stress-ng -d 5" inside the VM?

I'm not using the exact same test case, but something that I thought
would be similar enough. Specifically, I run the script below, which
boots from a RHEL 8 CD; in the rescue shell I run 'dd if=/dev/zero
of=/dev/sda' while the script keeps starting and cancelling backup jobs
in the background.

Anyway, I finally managed to bisect my problem (I did it wrong the
first time) and got this result:

00e30f05de1d19586345ec373970ef4c192c6270 is the first bad commit
commit 00e30f05de1d19586345ec373970ef4c192c6270
Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date:   Tue Oct 1 16:14:09 2019 +0300

    block/backup: use backup-top instead of write notifiers

    Drop write notifiers and use filter node instead.

    = Changes =

    1. Add filter-node-name argument for backup qmp api. We have to do it
    in this commit, as 257 needs to be fixed.

    2. There are no more write notifiers here, so is_write_notifier
    parameter is dropped from block-copy paths.

    3. To sync with in-flight requests at job finish we now have drained
    removing of the filter, we don't need rw-lock.

    4. Block-copy is now using BdrvChildren instead of BlockBackends

    5. As backup-top owns these children, we also move block-copy state
    into backup-top's ownership.

    [...]


That's a pretty big change, and I'm not sure how it's related to
completed requests hanging in the thread pool instead of reentering the
file-posix coroutine. But I also tested it enough that I'm confident
it's really the first bad commit.
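
For what it's worth, something like the following can show where the
requests get stuck (just a sketch; scripts/qemu-gdb.py is the helper
from the QEMU source tree, and the coroutine pointer has to be dug out
of bs->tracked_requests by hand):

    gdb -p "$(pidof qemu-system-x86_64)" \
        -ex 'source scripts/qemu-gdb.py' \
        -ex 'thread apply all bt'
    # then, for a request that never completes, print its coroutine backtrace:
    # (gdb) qemu coroutine <address of the request's coroutine>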

Maybe you want to check whether your problem starts at the same commit?
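A bisect run along these lines should do it (just a sketch; v4.1.0 is
only a guess for a known-good starting point, use whatever you know
works for you):

    git bisect start
    git bisect bad 00e30f05de1d19586345ec373970ef4c192c6270
    git bisect good v4.1.0    # assumed known-good release, adjust as needed
    # build, run the reproducer, then mark the result and repeat:
    git bisect good    # or: git bisect bad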

Kevin


#!/bin/bash

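# Emit QMP commands on stdout: negotiate capabilities, then keep starting
# and cancelling a drive-backup job on drive_image1.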
qmp() {
cat <<EOF
{'execute':'qmp_capabilities'}
EOF

while true; do
cat <<EOF
{ "execute": "drive-backup", "arguments": {
  "job-id":"drive_image1","device": "drive_image1", "sync": "full", "target": "/tmp/backup.raw" } }
EOF
sleep 1
cat <<EOF
{ "execute": "block-job-cancel", "arguments": { "device": "drive_image1"} }
EOF
sleep 2
done
}

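# Create a 4G qcow2 test image and pre-fill the first 2 GB so the backup
# job has something to copy.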
./qemu-img create -f qcow2 /tmp/test.qcow2 4G
for i in $(seq 0 1); do echo "write ${i}G 1G"; done | ./qemu-io /tmp/test.qcow2

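# Start the guest with the disk handled by an iothread and feed it the QMP
# commands generated above via stdio.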
qmp | x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -machine pc \
    -m 1G \
    -object 'iothread,id=iothread-virtioscsi0' \
    -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' \
    -blockdev node-name=my_drive,driver=file,filename=/tmp/test.qcow2 \
    -blockdev driver=qcow2,node-name=drive_image1,file=my_drive \
    -device scsi-hd,drive=drive_image1,id=image1 \
    -cdrom ~/images/iso/RHEL-8.0-20190116.1-x86_64-dvd1.iso \
    -boot d \
    -qmp stdio -monitor vc



Thread overview: 30+ messages
2020-03-31  8:46 bdrv_drained_begin deadlock with io-threads Dietmar Maurer
2020-03-31  9:17 ` Dietmar Maurer
2020-03-31  9:33   ` Dietmar Maurer
2020-03-31 12:58 ` Kevin Wolf
2020-03-31 14:32   ` Dietmar Maurer
2020-03-31 14:53     ` Vladimir Sementsov-Ogievskiy
2020-03-31 15:24       ` Dietmar Maurer
2020-03-31 15:37         ` Kevin Wolf
2020-03-31 16:18           ` Dietmar Maurer
2020-04-01 10:37             ` Kevin Wolf
2020-04-01 15:37               ` Dietmar Maurer
2020-04-01 15:50                 ` Dietmar Maurer
2020-04-01 18:12                 ` Kevin Wolf [this message]
2020-04-01 18:28                   ` Dietmar Maurer
2020-04-01 18:44                     ` Kevin Wolf
2020-04-02  6:48                       ` Dietmar Maurer
2020-04-02  9:10                       ` Dietmar Maurer
2020-04-02 12:14                         ` Kevin Wolf
2020-04-02 14:25                           ` Kevin Wolf
2020-04-02 15:40                             ` Dietmar Maurer
2020-04-02 16:47                               ` Kevin Wolf
2020-04-02 17:10                                 ` Kevin Wolf
2020-04-03  6:48                                   ` Thomas Lamprecht
2020-04-03  8:26                                   ` Dietmar Maurer
2020-04-03  8:47                                     ` Kevin Wolf
2020-04-03 16:31                                       ` Dietmar Maurer
2020-04-06  8:31                                         ` Kevin Wolf
2020-04-02 15:44                             ` Dietmar Maurer
2020-04-01 18:35                   ` Kevin Wolf
2020-04-02  9:21                   ` Dietmar Maurer
