From: <lv.mengzhao@zte.com.cn>
To: <stefanha@gmail.com>, <kwolf@redhat.com>, <hreitz@redhat.com>
Cc: <stefanha@redhat.com>, <mst@redhat.com>, <qemu-devel@nongnu.org>,
<hu.jian@zte.com.cn>, <cv11411@126.com>
Subject: Re:[PATCH] virtio-blk: don't start dataplane during the stop of dataplane
Date: Mon, 23 Oct 2023 10:12:38 +0800 (CST) [thread overview]
Message-ID: <202310231012386714935@zte.com.cn> (raw)
[-- Attachment #1.1.1: Type: text/plain, Size: 6316 bytes --]
>> I won't be able to reply until November 2nd. Maybe Kevin or Hanna can>> discuss this with you in the meantime.
Thanks
@Kevin, @Hanna, Do you have any oponions on this issue and my solution ?
Original Mail
Sender: StefanHajnoczi
To: lv mengzhao10286442;
CC: stefanha@redhat.com;mst@redhat.com;kwolf@redhat.com;hreitz@redhat.com;qemu-devel@nongnu.org;hu jian10269307;cv11411@126.com;
Date: 2023/10/19 20:16
Subject: Re: Re: [PATCH] virtio-blk: don't start dataplane during the stop of dataplane
On Thu, 19 Oct 2023 at 00:31, <lv.mengzhao@zte.com.cn> wrote:
>
> On Tue, Oct 17, 2023 at 10:04PM +0800, stefanha@redhat.com wrote:
>
> > > From: hujian <hu.jian@zte.com.cn>
>
> > >
>
> > > During the stop of dataplane for virtio-blk, virtio_bus_cleanup_host_notifier() is be
>
> > > called to clean up notifier at the end, if polled ioeventfd, virtio_blk_handle_output()
>
> > > is used to handle io request. But due to s->dataplane_disabled is false, it will be
>
> > > returned directly, which drops io request.
>
> > > Backtrace:
>
> > > ->virtio_blk_data_plane_stop
>
> > > ->virtio_bus_cleanup_host_notifier
>
> > > ->virtio_queue_host_notifier_read
>
> > > ->virtio_queue_notify_vq
>
> > > ->vq->handle_output
>
> > > ->virtio_blk_handle_output
>
> > > ->if (s->dataplane && !s->dataplane_stoped)
>
> > > ->if (!s->dataplane_disabled)
>
> > > ->return *
>
> > > ->virtio_blk_handle_output_do
>
> > > The above problem can occur when using "virsh reset" cmdline to reset guest, while
>
> > > guest does io.
>
> > > To fix this problem, don't try to start dataplane if s->stopping is true, and io would
>
> > > be handled by virtio_blk_handle_vq().
>
> > >
>
> > > Signed-off-by: hujian <hu.jian@zte.com.cn>
>
> > > ---
>
> > > hw/block/virtio-blk.c | 2 +-
>
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> >
>
> > I have dropped this patch again after Fiona pointed out it does not
>
> > compile and Kevin noticed that handling requests from the main loop
>
> > thread while the I/O is still being processed in the IOThread is going
>
> > to cause thread-safety issues.
>
> >
>
> > Can you explain the problem you are seeing in more detail? You run
>
> > "virsh reset" while the guest is doing I/O. Then what happens?
>
> >
>
> > Stefan
>
>
> 1 Compilation issues
>
> I'm sorry to be in such a hurry to submit the patch that I forgot to compile it locally.
>
> Compilable patches are at the bottom.
>
>
> 2 Troubleshooting
I won't be able to reply until November 2nd. Maybe Kevin or Hanna can
discuss this with you in the meantime.
Stefan
> We have done a lifecycle test for the VM (QEMU version: 4.1.0, Host kernel version: 4.18),
>
> which is loop execution: virsh create -> virsh suspend -> virsh resume -> virsh reset ->
>
> virsh shutdown, and io stress test inside the virtual machine. After the loop is executed
>
> about 200 times, after "virsh reset" is executed, the virtual machine goes into emergency
>
> mode and fails to start normally. Theoretically, "virsh reset" may causes data loss of
>
> virtual machine, but not enough to put it into emergency mode.
>
>
> Coincidentally, I happen to be fixing another different fault with the same phenomenon,
>
> which is caused by a our private patch, this patch calls virtio_blk_data_plane_ [stop|start]
>
> to operate the dataplane, if QEMU is processing io requests at same time, it may cause the
>
> loss of io requests.
>
>
> Analyzing virtio_blk_data_plane_stop(), virtio_bus_cleanup_host_notifier() is used to
>
> clean up notifiers, and my understanding is that it would handle the remaining IO requests.
>
> The stack is as follows, I add the print at the star line and find that virtio_blk_handle_output()
>
> returned directly instead of continuing to call virtio_blk_handle_vq() to handle io. So I modify
>
> the code here to make it don't return during the stop of dataplane, and our internal private patch
>
> works normally.
>
>
> Backtrace:
>
> ->virtio_blk_data_plane_stop
>
> ->virtio_bus_cleanup_host_notifier
>
> ->virtio_queue_host_notifier_read
>
> ->virtio_queue_notify_vq
>
> ->vq->handle_output
>
> ->virtio_blk_handle_output
>
> ->if (s->dataplane && !s->dataplane_stoped)
>
> ->if (!s->dataplane_disabled)
>
> ->return *
>
> ->virtio_blk_handle_vq
>
>
> Back to the problem caused by the virsh reset, libvirt sends the "system_reset" QEMU
>
> monitor command to QEMU, and QEMU calls qmp_system_reset(), and eventually calls
>
> virtio_blk_data_plane_[stop|start] to reset devices. I suspect that io loss will
>
> also occur if QEMU still has io to process during the stop of dataplane.
>
>
> 3 Thread-safety issues
>
> virtio_blk_data_plane_stop() calls blk_set_aio_context() to switch bs back to the QEMU
>
> main loop after virtio_bus_cleanup_host_notifier(), so the remaining IO requests
>
> are still handling by iothread(if configured). I'm a little confused as to why there
>
> is thread-safety issues.
>
>
> Lastly, please CC to cv11411@126.com, this is my private email, so I can contact with
>
> you in my free time, Thanks.
>
>
> 4 Compilable patches
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>
> index 39e7f23..d3a719c 100644
>
> --- a/hw/block/virtio-blk.c
>
> +++ b/hw/block/virtio-blk.c
>
> @@ -1165,8 +1165,9 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
>
> static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>
> {
>
> VirtIOBlock *s = (VirtIOBlock *)vdev;
>
> + VirtIOBlockDataPlane *dp = s->dataplane;
>
>
> - if (s->dataplane && !s->dataplane_started) {
>
> + if (s->dataplane && !s->dataplane_started && !dp->stopping) {
>
> /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>
> * dataplane here instead of waiting for .set_status().
>
> */
>
>
>
>
>
[-- Attachment #1.1.2: Type: text/html , Size: 15072 bytes --]
next reply other threads:[~2023-10-23 2:13 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-23 2:12 lv.mengzhao [this message]
-- strict thread matches above, loose matches on Subject: below --
2023-10-19 7:28 Re: [PATCH] virtio-blk: don't start dataplane during the stop of dataplane lv.mengzhao
2023-10-19 12:15 ` Stefan Hajnoczi
2023-10-23 9:13 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202310231012386714935@zte.com.cn \
--to=lv.mengzhao@zte.com.cn \
--cc=cv11411@126.com \
--cc=hreitz@redhat.com \
--cc=hu.jian@zte.com.cn \
--cc=kwolf@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.