From: Peter Lieven <pl@kamp.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu block <qemu-block@nongnu.org>,
Stefan Hajnoczi <stefanha@gmail.com>,
Alexander Bezzubikov <zuban32s@gmail.com>,
qemu-devel <qemu-devel@nongnu.org>,
Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?
Date: Fri, 14 Aug 2015 16:21:15 +0200 [thread overview]
Message-ID: <55CDF95B.2040206@kamp.de> (raw)
In-Reply-To: <20150814140852.GB5856@noname.redhat.com>
On 14.08.2015 at 16:08, Kevin Wolf wrote:
> On 14.08.2015 at 15:43, Peter Lieven wrote:
>> On 22.06.2015 at 23:54, John Snow wrote:
>>> On 06/22/2015 09:09 AM, Peter Lieven wrote:
>>>> On 22.06.2015 at 11:25, Stefan Hajnoczi wrote:
>>>>> On Fri, Jun 19, 2015 at 2:14 PM, Peter Lieven <pl@kamp.de> wrote:
>>>>>> On 18.06.2015 at 11:36, Stefan Hajnoczi wrote:
>>>>>>> On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven <pl@kamp.de> wrote:
>>>>>>>> On 18.06.2015 at 10:42, Kevin Wolf wrote:
>>>>>>>>> On 18.06.2015 at 10:30, Peter Lieven wrote:
>>>>>>>>>> On 18.06.2015 at 09:45, Kevin Wolf wrote:
>>>>>>>>>>> On 18.06.2015 at 09:12, Peter Lieven wrote:
>>>>>>>>>>>> Thread 2 (Thread 0x7ffff5550700 (LWP 2636)):
>>>>>>>>>>>> #0 0x00007ffff5d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
>>>>>>>>>>>> No symbol table info available.
>>>>>>>>>>>> #1 0x0000555555955d91 in qemu_poll_ns (fds=0x5555563889c0, nfds=3,
>>>>>>>>>>>>     timeout=4999424576) at qemu-timer.c:326
>>>>>>>>>>>>         ts = {tv_sec = 4, tv_nsec = 999424576}
>>>>>>>>>>>>         tvsec = 4
>>>>>>>>>>>> #2 0x0000555555956feb in aio_poll (ctx=0x5555563528e0, blocking=true)
>>>>>>>>>>>>     at aio-posix.c:231
>>>>>>>>>>>>         node = 0x0
>>>>>>>>>>>>         was_dispatching = false
>>>>>>>>>>>>         ret = 1
>>>>>>>>>>>>         progress = false
>>>>>>>>>>>> #3 0x000055555594aeed in bdrv_prwv_co (bs=0x55555637eae0,
>>>>>>>>>>>>     offset=4292007936, qiov=0x7ffff554f760, is_write=false, flags=0)
>>>>>>>>>>>>     at block.c:2699
>>>>>>>>>>>>         aio_context = 0x5555563528e0
>>>>>>>>>>>>         co = 0x5555563888a0
>>>>>>>>>>>>         rwco = {bs = 0x55555637eae0, offset = 4292007936,
>>>>>>>>>>>>             qiov = 0x7ffff554f760, is_write = false, ret = 2147483647,
>>>>>>>>>>>>             flags = 0}
>>>>>>>>>>>> #4 0x000055555594afa9 in bdrv_rw_co (bs=0x55555637eae0,
>>>>>>>>>>>>     sector_num=8382828, buf=0x7ffff44cc800 "(", nb_sectors=4,
>>>>>>>>>>>>     is_write=false, flags=0) at block.c:2722
>>>>>>>>>>>>         qiov = {iov = 0x7ffff554f780, niov = 1, nalloc = -1, size = 2048}
>>>>>>>>>>>>         iov = {iov_base = 0x7ffff44cc800, iov_len = 2048}
>>>>>>>>>>>> #5 0x000055555594b008 in bdrv_read (bs=0x55555637eae0,
>>>>>>>>>>>>     sector_num=8382828, buf=0x7ffff44cc800 "(", nb_sectors=4)
>>>>>>>>>>>>     at block.c:2730
>>>>>>>>>>>> No locals.
>>>>>>>>>>>> #6 0x000055555599acef in blk_read (blk=0x555556376820,
>>>>>>>>>>>>     sector_num=8382828, buf=0x7ffff44cc800 "(", nb_sectors=4)
>>>>>>>>>>>>     at block/block-backend.c:404
>>>>>>>>>>>> No locals.
>>>>>>>>>>>> #7 0x0000555555833ed2 in cd_read_sector (s=0x555556408f88,
>>>>>>>>>>>>     lba=2095707, buf=0x7ffff44cc800 "(", sector_size=2048)
>>>>>>>>>>>>     at hw/ide/atapi.c:116
>>>>>>>>>>>>         ret = 32767
>>>>>>>>>>> Here is the problem: The ATAPI emulation uses synchronous
>>>>>>>>>>> blk_read()
>>>>>>>>>>> instead of the AIO or coroutine interfaces. This means that it
>>>>>>>>>>> keeps
>>>>>>>>>>> polling for request completion while it holds the BQL until the
>>>>>>>>>>> request
>>>>>>>>>>> is completed.
>>>>>>>>>> I will look at this.
>>>>>>>> I need some further help. My way to "emulate" a hung NFS server is to
>>>>>>>> block it in the firewall. Currently I face the problem that I cannot
>>>>>>>> mount a CD ISO via libnfs (nfs://) without hanging QEMU (I previously
>>>>>>>> tried with a kernel NFS mount). It reads a few sectors and then stalls
>>>>>>>> (maybe another bug):
>>>>>>>>
>>>>>>>> (gdb) thread apply all bt full
>>>>>>>>
>>>>>>>> Thread 3 (Thread 0x7ffff0c21700 (LWP 29710)):
>>>>>>>> #0 qemu_cond_broadcast (cond=cond@entry=0x555556259940) at
>>>>>>>> util/qemu-thread-posix.c:120
>>>>>>>> err = <optimized out>
>>>>>>>> __func__ = "qemu_cond_broadcast"
>>>>>>>> #1 0x0000555555911164 in rfifolock_unlock
>>>>>>>> (r=r@entry=0x555556259910) at
>>>>>>>> util/rfifolock.c:75
>>>>>>>> __PRETTY_FUNCTION__ = "rfifolock_unlock"
>>>>>>>> #2 0x0000555555875921 in aio_context_release
>>>>>>>> (ctx=ctx@entry=0x5555562598b0)
>>>>>>>> at async.c:329
>>>>>>>> No locals.
>>>>>>>> #3 0x000055555588434c in aio_poll (ctx=ctx@entry=0x5555562598b0,
>>>>>>>> blocking=blocking@entry=true) at aio-posix.c:272
>>>>>>>> node = <optimized out>
>>>>>>>> was_dispatching = false
>>>>>>>> i = <optimized out>
>>>>>>>> ret = <optimized out>
>>>>>>>> progress = false
>>>>>>>> timeout = 611734526
>>>>>>>> __PRETTY_FUNCTION__ = "aio_poll"
>>>>>>>> #4 0x00005555558bc43d in bdrv_prwv_co (bs=bs@entry=0x55555627c0f0,
>>>>>>>> offset=offset@entry=7038976, qiov=qiov@entry=0x7ffff0c208f0,
>>>>>>>> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
>>>>>>>> block/io.c:552
>>>>>>>> aio_context = 0x5555562598b0
>>>>>>>> co = <optimized out>
>>>>>>>> rwco = {bs = 0x55555627c0f0, offset = 7038976, qiov =
>>>>>>>> 0x7ffff0c208f0, is_write = false, ret = 2147483647, flags =
>>>>>>>> (unknown: 0)}
>>>>>>>> #5 0x00005555558bc533 in bdrv_rw_co (bs=0x55555627c0f0,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
>>>>>>>> flags=flags@entry=(unknown: 0)) at block/io.c:575
>>>>>>>> qiov = {iov = 0x7ffff0c208e0, niov = 1, nalloc = -1, size
>>>>>>>> = 2048}
>>>>>>>> iov = {iov_base = 0x555557874800, iov_len = 2048}
>>>>>>>> #6 0x00005555558bc593 in bdrv_read (bs=<optimized out>,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/io.c:583
>>>>>>>> No locals.
>>>>>>>> #7 0x00005555558af75d in blk_read (blk=<optimized out>,
>>>>>>>> sector_num=sector_num@entry=13748, buf=buf@entry=0x555557874800 "(",
>>>>>>>> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
>>>>>>>> ret = <optimized out>
>>>>>>>> #8 0x00005555557abb88 in cd_read_sector (sector_size=<optimized out>,
>>>>>>>> buf=0x555557874800 "(", lba=3437, s=0x55555760db70) at
>>>>>>>> hw/ide/atapi.c:116
>>>>>>>> ret = <optimized out>
>>>>>>>> #9 ide_atapi_cmd_reply_end (s=0x55555760db70) at hw/ide/atapi.c:190
>>>>>>>> byte_count_limit = <optimized out>
>>>>>>>> size = <optimized out>
>>>>>>>> ret = 2
>>>>>>> This is still the same scenario Kevin explained.
>>>>>>>
>>>>>>> The ATAPI CD-ROM emulation code is using synchronous blk_read(). This
>>>>>>> function holds the QEMU global mutex while waiting for the I/O request
>>>>>>> to complete. This blocks other vcpu threads and the main loop thread.
>>>>>>>
>>>>>>> The solution is to convert the CD-ROM emulation code to use
>>>>>>> blk_aio_readv() instead of blk_read().
>>>>>> I tried a little, but I am stuck with my approach. It reads one sector
>>>>>> and then doesn't continue. Maybe someone with more knowledge
>>>>>> of ATAPI/IDE could help?
>>>>> Converting synchronous code to asynchronous requires an understanding
>>>>> of the device's state transitions. Asynchronous code has to put the
>>>>> device registers into a busy state until the request completes. It
>>>>> also needs to handle hardware register accesses that occur while the
>>>>> request is still pending.
>>>> That was my assumption as well. But I don't know how to proceed...
>>>>
>>>>> I don't know ATAPI/IDE code well enough to suggest a fix.
>>>> Maybe @John can help?
>>>>
>>>> Peter
>>>>
>> I looked into this again and it seems that the remaining problem (at least when the CDROM is
>> mounted via libnfs) is the blk_drain_all() in bmdma_cmd_writeb. At least I end there if I have
>> a proper OS booted and cut off the NFS server. The VM remains responsive until the guest OS
>> issues a DMA cancel.
>>
>> I do not know what the proper solution is. So far I have had the following ideas
>> (not knowing whether any of these approaches is correct).
>>
>> a) Do not clear BM_STATUS_DMAING if we are not able to drain all requests. This works until
>>      the connection is reestablished. The guest OS issues DMA cancel operations again and
>>      again, but when connectivity is back I end up in the following assertion:
>>
>> qemu-system-x86_64: ./hw/ide/pci.h:65: bmdma_active_if: Assertion `bmdma->bus->retry_unit != (uint8_t)-1' failed.
> I would have to check the specs to see if this is allowed.
Maybe there is a better approach after all...
>
>> b) Call the aiocb with -ECANCELED and somehow (?) turn all the callbacks of the outstanding IOs into NOPs.
> This wouldn't be correct for write requests: We would tell the guest
> that the request is cancelled when it's actually still in flight. At
> some point it could still complete, however, and that's not expected by
> the guest.
In the case of a CD-ROM we have a read-only device, so this could work?
>
>> c) Follow the hint in the comment in bmdma_cmd_writeb (however this works out):
>> * In the future we'll be able to safely cancel the I/O if the
>> * whole DMA operation will be submitted to disk with a single
>> * aio operation with preadv/pwritev.
> Not sure how likely it is that cancelling that single AIOCB will
> actually cancel the operation and not end up doing bdrv_drain_all()
> internally instead because there is no good way of cancelling the
> request.
You might be right.
It seems that the whole thing is not trivial. But it seems so wrong that a whole vServer goes
down just because someone forgot to eject a CD-ROM he used once and then the NFS
server has a hiccup.
Peter