From: Peter Lieven <pl@kamp.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu block <qemu-block@nongnu.org>,
Stefan Hajnoczi <stefanha@gmail.com>,
Alexander Bezzubikov <zuban32s@gmail.com>,
qemu-devel <qemu-devel@nongnu.org>,
Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
Date: Fri, 14 Aug 2015 16:21:15 +0200 [thread overview]
Message-ID: <55CDF95B.2040206@kamp.de> (raw)
In-Reply-To: <20150814140852.GB5856@noname.redhat.com>
On 14.08.2015 at 16:08, Kevin Wolf wrote:
> On 14.08.2015 at 15:43, Peter Lieven wrote:
>> On 22.06.2015 at 23:54, John Snow wrote:
>>> On 06/22/2015 09:09 AM, Peter Lieven wrote:
>>>> On 22.06.2015 at 11:25, Stefan Hajnoczi wrote:
>>>>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>>>>>> On 18.06.2015 at 11:36, Stefan Hajnoczi wrote:
>>>>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>>>>>> On 18.06.2015 at 10:42, Kevin Wolf wrote:
>>>>>>>>> On 18.06.2015 at 10:30, Peter Lieven wrote:
>>>>>>>>>> On 18.06.2015 at 09:45, Kevin Wolf wrote:
>>>>>>>>>>> On 18.06.2015 at 09:12, Peter Lieven wrote:
>>>>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>>>>>> #0 0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>>>>>> No symbol table info available.
>>>>>>>>>>>> #1 0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>>>>>>>>>     timeout=4999424576) at qemu-timer.c:326
>>>>>>>>>>>>         ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>>>>>         tvsec = 4
>>>>>>>>>>>> #2 0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>>>>>>>>>     at aio-posix.c:231
>>>>>>>>>>>>         node = 0x0
>>>>>>>>>>>>         was_dispatching = false
>>>>>>>>>>>>         ret = 1
>>>>>>>>>>>>         progress = false
>>>>>>>>>>>> #3 0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>>>>>>     offset=4292007936, qiov=0x7ffff554f760, is_write=false, flags=0)
>>>>>>>>>>>>     at block.c:2699
>>>>>>>>>>>>         aio_context = 0x5555563528e0
>>>>>>>>>>>>         co = 0x5555563888a0
>>>>>>>>>>>>         rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret = 2147483647,
>>>>>>>>>>>>             flags = 0}
>>>>>>>>>>>> #4 0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>>>>>>     sector_num=8382828, buf=0x7ffff44cc800 "(", nb_sectors=4,
>>>>>>>>>>>>     is_write=false, flags=0) at block.c:2722
>>>>>>>>>>>>         qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
>>>>>>>>>>>>         iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>>>>>> #5 0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>>>>>>     sector_num=8382828, buf=0x7ffff44cc800 "(", nb_sectors=4)
>>>>>>>>>>>>     at block.c:2730
>>>>>>>>>>>> No locals.
>>>>>>>>>>>> #6 0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>>>>>>     sector_num=8382828, buf=0x7ffff44cc800 "(", nb_sectors=4)
>>>>>>>>>>>>     at block/block-backend.c:404
>>>>>>>>>>>> No locals.
>>>>>>>>>>>> #7 0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>>>>>>     lba=2095707, buf=0x7ffff44cc800 "(", sector_size=2048)
>>>>>>>>>>>>     at hw/ide/atapi.c:116
>>>>>>>>>>>>         ret = 32767
>>>>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
>>>>>>>>>>> blk_read()
>>>>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
>>>>>>>>>>> keeps
>>>>>>>>>>> polling for request completion while it holds the BQL until the
>>>>>>>>>>> request
>>>>>>>>>>> is completed.
>>>>>>>>>> I will look at this.
>>>>>>>> I need some further help. My way to "emulate" a hung NFS server is to
>>>>>>>> block it in the firewall. Currently I face the problem that I cannot
>>>>>>>> mount a CD ISO via libnfs (nfs://) without hanging QEMU (I previously
>>>>>>>> tried with a kernel NFS mount). It reads a few sectors and then stalls
>>>>>>>> (maybe another bug):
>>>>>>>>
>>>>>>>> (gdb) thread apply all bt full
>>>>>>>>
>>>>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>>>>>> #0 qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>>>>>> util/qemu-thread-posix.c:120
>>>>>>>> err = <optimized out>
>>>>>>>> __func__ = "qemu_cond_broadcast"
>>>>>>>> #1 0x0000555555911164 in rfifolock_unlock
>>>>>>>> (r=r@entry=0x555556259910) at
>>>>>>>> util/rfifolock.c:75
>>>>>>>> __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>>>>>> #2 0x0000555555875921 in aio_context_release
>>>>>>>> (ctx=ctx@entry=0x5555562598b0)
>>>>>>>> at async.c:329
>>>>>>>> No locals.
>>>>>>>> #3 0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>>>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>>>>> node = <optimized out>
>>>>>>>> was_dispatching = false
>>>>>>>> i = <optimized out>
>>>>>>>> ret = <optimized out>
>>>>>>>> progress = false
>>>>>>>> timeout = 611734526
>>>>>>>> __PRETTY_FUNCTION__ = "aio_poll"
>>>>>>>> #4 0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>>>>>> block/io.c:552
>>>>>>>> aio_context = 0x5555562598b0
>>>>>>>> co = <optimized out>
>>>>>>>> rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
>>>>>>>> (unknown: 0)}
>>>>>>>> #5 0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>>>>> flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>>>>> qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
>>>>>>>> = 2048}
>>>>>>>> iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>>>>>> #6 0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>>>>>> No locals.
>>>>>>>> #7 0x00005555558af75d in blk_read (blk=<optimized out>,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>>>>> ret = <optimized out>
>>>>>>>> #8 0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
>>>>>>>> hw/ide/atapi.c:116
>>>>>>>> ret = <optimized out>
>>>>>>>> #9 ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>>>>> byte_count_limit = <optimized out>
>>>>>>>> size = <optimized out>
>>>>>>>> ret = 2
>>>>>>> This is still the same scenario Kevin explained.
>>>>>>>
>>>>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read(). This
>>>>>>> function holds the QEMU global mutex while waiting for the I/O request
>>>>>>> to complete. This blocks other vcpu threads and the main loop thread.
>>>>>>>
>>>>>>> The solution is to convert the CD-ROM emulation code to use
>>>>>>> blk_aio_readv() instead of blk_read().
>>>>>> I tried a little, but I am stuck with my approach. It reads one sector
>>>>>> and then doesn't continue. Maybe someone with more knowledge
>>>>>> of ATAPI/IDE could help?
>>>>> Converting synchronous code to asynchronous requires an understanding
>>>>> of the device's state transitions. Asynchronous code has to put the
>>>>> device registers into a busy state until the request completes. It
>>>>> also needs to handle hardware register accesses that occur while the
>>>>> request is still pending.
>>>> That was my assumption as well. But I don't know how to proceed...
>>>>
>>>>> I don't know ATAPI/IDE code well enough to suggest a fix.
>>>> Maybe @John can help?
>>>>
>>>> Peter
>>>>
>> I looked into this again and it seems that the remaining problem (at least when the CDROM is
>> mounted via libnfs) is the blk_drain_all() in bmdma_cmd_writeb. At least I end there if I have
>> a proper OS booted and cut off the NFS server. The VM remains responsive until the guest OS
>> issues a DMA cancel.
>>
>> I do not know what the proper solution is. So far I have had the following ideas
>> (not knowing whether any of these approaches is correct).
>>
>> a) Do not clear BM_STATUS_DMAING if we are not able to drain all requests. This works until
>>      the connection is reestablished. The guest OS issues DMA cancel operations again and
>>      again, but when connectivity is back I end up in the following assertion:
>>
>> qemu-system-x86_64: ./hw/ide/pci.h:65: bmdma_active_if: Assertion `bmdma->bus->retry_unit != (uint8_t)-1' failed.
> I would have to check the specs to see if this is allowed.
Maybe there is a better approach after all...
>
>> b) Call the aiocb with -ECANCELED and somehow (?) turn all the callbacks of the outstanding IOs into NOPs.
> This wouldn't be correct for write requests: We would tell the guest
> that the request is cancelled when it's actually still in flight. At
> some point it could still complete, however, and that's not expected by
> the guest.
In the case of a CD-ROM we have a read-only device, so this could work?
>
>> c) Follow the hint in the comment in bmdma_cmd_writeb (however this works out):
>> * In the future we'll be able to safely cancel the I/O if the
>> * whole DMA operation will be submitted to disk with a single
>> * aio operation with preadv/pwritev.
> Not sure how likely it is that cancelling that single AIOCB will
> actually cancel the operation and not end up doing bdrv_drain_all()
> internally instead because there is no good way of cancelling the
> request.
You might be right.
It seems that the whole thing is not trivial. But it seems so wrong that a whole vServer goes
down just because someone forgot to eject a CD-ROM he used once and then the NFS
server has a hiccup.
Peter