qemu-devel.nongnu.org archive mirror
From: <lv.mengzhao@zte.com.cn>
To: <stefanha@gmail.com>, <kwolf@redhat.com>, <hreitz@redhat.com>
Cc: <stefanha@redhat.com>, <mst@redhat.com>, <qemu-devel@nongnu.org>,
	<hu.jian@zte.com.cn>, <cv11411@126.com>
Subject: Re: [PATCH] virtio-blk: don't start dataplane during the stop of dataplane
Date: Mon, 23 Oct 2023 10:12:38 +0800 (CST)	[thread overview]
Message-ID: <202310231012386714935@zte.com.cn> (raw)



>> I won't be able to reply until November 2nd. Maybe Kevin or Hanna can
>> discuss this with you in the meantime.
Thanks.
@Kevin, @Hanna, do you have any opinions on this issue and my solution?

Original Mail


Sender: Stefan Hajnoczi
To: lv.mengzhao@zte.com.cn
CC: stefanha@redhat.com; mst@redhat.com; kwolf@redhat.com; hreitz@redhat.com; qemu-devel@nongnu.org; hu.jian@zte.com.cn; cv11411@126.com
Date: 2023/10/19 20:16
Subject: Re: Re: [PATCH] virtio-blk: don't start dataplane during the stop of dataplane

On Thu, 19 Oct 2023 at 00:31, <lv.mengzhao@zte.com.cn> wrote:
>
> On Tue, Oct 17, 2023 at 10:04PM +0800, stefanha@redhat.com wrote:
> > > From: hujian <hu.jian@zte.com.cn>
> > >
> > > During the stop of the dataplane for virtio-blk, virtio_bus_cleanup_host_notifier()
> > > is called at the end to clean up the notifier; if the ioeventfd was polled,
> > > virtio_blk_handle_output() is used to handle the io request. But because
> > > s->dataplane_disabled is false, it returns directly, which drops the io request.
> 
> > > Backtrace:
> > > ->virtio_blk_data_plane_stop
> > >   ->virtio_bus_cleanup_host_notifier
> > >     ->virtio_queue_host_notifier_read
> > >       ->virtio_queue_notify_vq
> > >         ->vq->handle_output
> > >           ->virtio_blk_handle_output
> > >             ->if (s->dataplane && !s->dataplane_started)
> > >               ->if (!s->dataplane_disabled)
> > >                 ->return *
> > >             ->virtio_blk_handle_vq
> 
> > > The above problem can occur when using the "virsh reset" command to reset the
> > > guest while the guest is doing io.
> > > To fix this problem, don't try to start the dataplane if dp->stopping is true;
> > > the io will then be handled by virtio_blk_handle_vq().
> 
> > >
> > > Signed-off-by: hujian <hu.jian@zte.com.cn>
> > > ---
> > >  hw/block/virtio-blk.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> >
> > I have dropped this patch again after Fiona pointed out it does not
> > compile and Kevin noticed that handling requests from the main loop
> > thread while the I/O is still being processed in the IOThread is going
> > to cause thread-safety issues.
> >
> > Can you explain the problem you are seeing in more detail? You run
> > "virsh reset" while the guest is doing I/O. Then what happens?
> >
> > Stefan
> 
>
> 1 Compilation issues
>
> I'm sorry, I was in such a hurry to submit the patch that I forgot to compile it
> locally. Compilable patches are at the bottom.
>
> 2 Troubleshooting
 
I won't be able to reply until November 2nd. Maybe Kevin or Hanna can
discuss this with you in the meantime.
 
Stefan
 
> We have done a lifecycle test for the VM (QEMU version: 4.1.0, host kernel version:
> 4.18), which loops: virsh create -> virsh suspend -> virsh resume -> virsh reset ->
> virsh shutdown, with an io stress test running inside the virtual machine. After the
> loop has executed about 200 times, once "virsh reset" is executed the virtual machine
> goes into emergency mode and fails to start normally. Theoretically, "virsh reset" may
> cause data loss in the virtual machine, but that should not be enough to put it into
> emergency mode.
>
> Coincidentally, I happen to be fixing a different fault with the same symptom, caused
> by one of our private patches. That patch calls virtio_blk_data_plane_[stop|start] to
> operate the dataplane; if QEMU is processing io requests at the same time, it may
> cause the loss of io requests.
>
> Analyzing virtio_blk_data_plane_stop(), virtio_bus_cleanup_host_notifier() is used to
> clean up the notifiers, and my understanding is that it should handle the remaining io
> requests. The stack is as follows; I added a print at the starred line and found that
> virtio_blk_handle_output() returned directly instead of continuing to call
> virtio_blk_handle_vq() to handle the io. So I modified the code here so that it does
> not return during the stop of the dataplane, and our internal private patch works
> normally.
> 
> 
> Backtrace:
> ->virtio_blk_data_plane_stop
>   ->virtio_bus_cleanup_host_notifier
>     ->virtio_queue_host_notifier_read
>       ->virtio_queue_notify_vq
>         ->vq->handle_output
>           ->virtio_blk_handle_output
>             ->if (s->dataplane && !s->dataplane_started)
>               ->if (!s->dataplane_disabled)
>                 ->return *
>             ->virtio_blk_handle_vq
> 
> 
> Back to the problem caused by virsh reset: libvirt sends the "system_reset" QEMU
> monitor command to QEMU, QEMU calls qmp_system_reset(), and that eventually calls
> virtio_blk_data_plane_[stop|start] to reset the devices. I suspect that io loss will
> also occur if QEMU still has io to process during the stop of the dataplane.
>
> 3 Thread-safety issues
>
> virtio_blk_data_plane_stop() calls blk_set_aio_context() to switch the bs back to the
> QEMU main loop only after virtio_bus_cleanup_host_notifier(), so the remaining io
> requests are still handled by the iothread (if one is configured). I'm a little
> confused as to why there are thread-safety issues.
>
> Lastly, please CC cv11411@126.com; this is my private email, so I can stay in contact
> with you in my free time. Thanks.
> 
> 
> 4 Compilable patches
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 39e7f23..d3a719c 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -1165,8 +1165,9 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
>  static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>  {
>      VirtIOBlock *s = (VirtIOBlock *)vdev;
> +    VirtIOBlockDataPlane *dp = s->dataplane;
> 
> -    if (s->dataplane && !s->dataplane_started) {
> +    if (s->dataplane && !s->dataplane_started && !dp->stopping) {
>          /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>           * dataplane here instead of waiting for .set_status().
>           */



Thread overview: 4+ messages
2023-10-23  2:12 lv.mengzhao [this message]
  -- strict thread matches above, loose matches on Subject: below --
2023-10-19  7:28 Re: [PATCH] virtio-blk: don't start dataplane during the stop of dataplane lv.mengzhao
2023-10-19 12:15 ` Stefan Hajnoczi
2023-10-23  9:13 ` Kevin Wolf
