From: <lv.mengzhao@zte.com.cn>
To: <stefanha@gmail.com>, <kwolf@redhat.com>, <hreitz@redhat.com>
Cc: <stefanha@redhat.com>, <mst@redhat.com>, <qemu-devel@nongnu.org>,
<hu.jian@zte.com.cn>, <cv11411@126.com>
Subject: Re:[PATCH] virtio-blk: don't start dataplane during the stop of dataplane
Date: Mon, 23 Oct 2023 10:12:38 +0800 (CST) [thread overview]
Message-ID: <202310231012386714935@zte.com.cn> (raw)
[-- Attachment #1.1.1: Type: text/plain, Size: 6316 bytes --]
>> I won't be able to reply until November 2nd. Maybe Kevin or Hanna can>> discuss this with you in the meantime.
Thanks
@Kevin, @Hanna, Do you have any oponions on this issue and my solution ?
Original Mail
Sender: StefanHajnoczi
To: lv mengzhao10286442;
CC: stefanha@redhat.com;mst@redhat.com;kwolf@redhat.com;hreitz@redhat.com;qemu-devel@nongnu.org;hu jian10269307;cv11411@126.com;
Date: 2023/10/19 20:16
Subject: Re: Re: [PATCH] virtio-blk: don't start dataplane during the stop of dataplane
On Thu, 19 Oct 2023 at 00:31, <lv.mengzhao@zte.com.cn> wrote:
>
> On Tue, Oct 17, 2023 at 10:04PM +0800, stefanha@redhat.com wrote:
>
> > > From: hujian <hu.jian@zte.com.cn>
>
> > >
>
> > > During the stop of dataplane for virtio-blk, virtio_bus_cleanup_host_notifier() is be
>
> > > called to clean up notifier at the end, if polled ioeventfd, virtio_blk_handle_output()
>
> > > is used to handle io request. But due to s->dataplane_disabled is false, it will be
>
> > > returned directly, which drops io request.
>
> > > Backtrace:
>
> > > ->virtio_blk_data_plane_stop
>
> > > ->virtio_bus_cleanup_host_notifier
>
> > > ->virtio_queue_host_notifier_read
>
> > > ->virtio_queue_notify_vq
>
> > > ->vq->handle_output
>
> > > ->virtio_blk_handle_output
>
> > > ->if (s->dataplane && !s->dataplane_stoped)
>
> > > ->if (!s->dataplane_disabled)
>
> > > ->return *
>
> > > ->virtio_blk_handle_output_do
>
> > > The above problem can occur when using "virsh reset" cmdline to reset guest, while
>
> > > guest does io.
>
> > > To fix this problem, don't try to start dataplane if s->stopping is true, and io would
>
> > > be handled by virtio_blk_handle_vq().
>
> > >
>
> > > Signed-off-by: hujian <hu.jian@zte.com.cn>
>
> > > ---
>
> > > hw/block/virtio-blk.c | 2 +-
>
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> >
>
> > I have dropped this patch again after Fiona pointed out it does not
>
> > compile and Kevin noticed that handling requests from the main loop
>
> > thread while the I/O is still being processed in the IOThread is going
>
> > to cause thread-safety issues.
>
> >
>
> > Can you explain the problem you are seeing in more detail? You run
>
> > "virsh reset" while the guest is doing I/O. Then what happens?
>
> >
>
> > Stefan
>
>
> 1 Compilation issues
>
> I'm sorry to be in such a hurry to submit the patch that I forgot to compile it locally.
>
> Compilable patches are at the bottom.
>
>
> 2 Troubleshooting
I won't be able to reply until November 2nd. Maybe Kevin or Hanna can
discuss this with you in the meantime.
Stefan
> We have done a lifecycle test for the VM (QEMU version: 4.1.0, Host kernel version: 4.18),
>
> which is loop execution: virsh create -> virsh suspend -> virsh resume -> virsh reset ->
>
> virsh shutdown, and io stress test inside the virtual machine. After the loop is executed
>
> about 200 times, after "virsh reset" is executed, the virtual machine goes into emergency
>
> mode and fails to start normally. Theoretically, "virsh reset" may causes data loss of
>
> virtual machine, but not enough to put it into emergency mode.
>
>
> Coincidentally, I happen to be fixing another different fault with the same phenomenon,
>
> which is caused by a our private patch, this patch calls virtio_blk_data_plane_ [stop|start]
>
> to operate the dataplane, if QEMU is processing io requests at same time, it may cause the
>
> loss of io requests.
>
>
> Analyzing virtio_blk_data_plane_stop(), virtio_bus_cleanup_host_notifier() is used to
>
> clean up notifiers, and my understanding is that it would handle the remaining IO requests.
>
> The stack is as follows, I add the print at the star line and find that virtio_blk_handle_output()
>
> returned directly instead of continuing to call virtio_blk_handle_vq() to handle io. So I modify
>
> the code here to make it don't return during the stop of dataplane, and our internal private patch
>
> works normally.
>
>
> Backtrace:
>
> ->virtio_blk_data_plane_stop
>
> ->virtio_bus_cleanup_host_notifier
>
> ->virtio_queue_host_notifier_read
>
> ->virtio_queue_notify_vq
>
> ->vq->handle_output
>
> ->virtio_blk_handle_output
>
> ->if (s->dataplane && !s->dataplane_stoped)
>
> ->if (!s->dataplane_disabled)
>
> ->return *
>
> ->virtio_blk_handle_vq
>
>
> Back to the problem caused by the virsh reset, libvirt sends the "system_reset" QEMU
>
> monitor command to QEMU, and QEMU calls qmp_system_reset(), and eventually calls
>
> virtio_blk_data_plane_[stop|start] to reset devices. I suspect that io loss will
>
> also occur if QEMU still has io to process during the stop of dataplane.
>
>
> 3 Thread-safety issues
>
> virtio_blk_data_plane_stop() calls blk_set_aio_context() to switch bs back to the QEMU
>
> main loop after virtio_bus_cleanup_host_notifier(), so the remaining IO requests
>
> are still handling by iothread(if configured). I'm a little confused as to why there
>
> is thread-safety issues.
>
>
> Lastly, please CC to cv11411@126.com, this is my private email, so I can contact with
>
> you in my free time, Thanks.
>
>
> 4 Compilable patches
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>
> index 39e7f23..d3a719c 100644
>
> --- a/hw/block/virtio-blk.c
>
> +++ b/hw/block/virtio-blk.c
>
> @@ -1165,8 +1165,9 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
>
> static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>
> {
>
> VirtIOBlock *s = (VirtIOBlock *)vdev;
>
> + VirtIOBlockDataPlane *dp = s->dataplane;
>
>
> - if (s->dataplane && !s->dataplane_started) {
>
> + if (s->dataplane && !s->dataplane_started && !dp->stopping) {
>
> /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>
> * dataplane here instead of waiting for .set_status().
>
> */
>
>
>
>
>
[-- Attachment #1.1.2: Type: text/html , Size: 15072 bytes --]
next reply other threads:[~2023-10-23 2:13 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-23 2:12 lv.mengzhao [this message]
-- strict thread matches above, loose matches on Subject: below --
2023-10-19 7:28 Re: [PATCH] virtio-blk: don't start dataplane during the stop of dataplane lv.mengzhao
2023-10-19 12:15 ` Stefan Hajnoczi
2023-10-23 9:13 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202310231012386714935@zte.com.cn \
--to=lv.mengzhao@zte.com.cn \
--cc=cv11411@126.com \
--cc=hreitz@redhat.com \
--cc=hu.jian@zte.com.cn \
--cc=kwolf@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@gmail.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).