* [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() @ 2022-06-09 16:47 Stefan Hajnoczi 2022-06-09 16:47 ` [PATCH 1/2] " Stefan Hajnoczi ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Stefan Hajnoczi @ 2022-06-09 16:47 UTC (permalink / raw) To: qemu-devel Cc: Hanna Reitz, qemu-stable, Kevin Wolf, qemu-block, Stefan Hajnoczi An unlucky I/O pattern can result in stalled Linux AIO requests when the plugged counter becomes unbalanced. See Patch 1 for details. Patch 2 adds a comment to explain why the laio_io_unplug() even checks max batch in the first place. Stefan Hajnoczi (2): linux-aio: fix unbalanced plugged counter in laio_io_unplug() linux-aio: explain why max batch is checked in laio_io_unplug() block/linux-aio.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) -- 2.36.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() 2022-06-09 16:47 [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() Stefan Hajnoczi @ 2022-06-09 16:47 ` Stefan Hajnoczi 2022-06-09 17:45 ` Stefano Garzarella 2022-06-09 16:47 ` [PATCH 2/2] linux-aio: explain why max batch is checked " Stefan Hajnoczi 2022-06-14 16:29 ` [PATCH 0/2] linux-aio: fix unbalanced plugged counter " Stefan Hajnoczi 2 siblings, 1 reply; 7+ messages in thread From: Stefan Hajnoczi @ 2022-06-09 16:47 UTC (permalink / raw) To: qemu-devel Cc: Hanna Reitz, qemu-stable, Kevin Wolf, qemu-block, Stefan Hajnoczi, Nikolay Tenev, Stefano Garzarella Every laio_io_plug() call has a matching laio_io_unplug() call. There is a plugged counter that tracks the number of levels of plugging and allows for nesting. The plugged counter must reflect the balance between laio_io_plug() and laio_io_unplug() calls accurately. Otherwise I/O stalls occur since io_submit(2) calls are skipped while plugged. Reported-by: Nikolay Tenev <nt@storpool.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- block/linux-aio.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/linux-aio.c b/block/linux-aio.c index 4c423fcccf..6078da7e42 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -363,8 +363,10 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s, uint64_t dev_max_batch) { assert(s->io_q.plugged); + s->io_q.plugged--; + if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) || - (--s->io_q.plugged == 0 && + (!s->io_q.plugged && !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) { ioq_submit(s); } -- 2.36.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() 2022-06-09 16:47 ` [PATCH 1/2] " Stefan Hajnoczi @ 2022-06-09 17:45 ` Stefano Garzarella 0 siblings, 0 replies; 7+ messages in thread From: Stefano Garzarella @ 2022-06-09 17:45 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Hanna Reitz, qemu-stable, Kevin Wolf, qemu-block, Nikolay Tenev On Thu, Jun 09, 2022 at 05:47:11PM +0100, Stefan Hajnoczi wrote: >Every laio_io_plug() call has a matching laio_io_unplug() call. There is >a plugged counter that tracks the number of levels of plugging and >allows for nesting. > >The plugged counter must reflect the balance between laio_io_plug() and >laio_io_unplug() calls accurately. Otherwise I/O stalls occur since >io_submit(2) calls are skipped while plugged. > We can add a Fixes tag: Fixes: 68d7946648 ("linux-aio: add `dev_max_batch` parameter to laio_io_unplug()") >Reported-by: Nikolay Tenev <nt@storpool.com> >Cc: Stefano Garzarella <sgarzare@redhat.com> >Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> >--- > block/linux-aio.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > >diff --git a/block/linux-aio.c b/block/linux-aio.c >index 4c423fcccf..6078da7e42 100644 >--- a/block/linux-aio.c >+++ b/block/linux-aio.c >@@ -363,8 +363,10 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s, > uint64_t dev_max_batch) > { > assert(s->io_q.plugged); >+ s->io_q.plugged--; >+ > if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) || >- (--s->io_q.plugged == 0 && >+ (!s->io_q.plugged && > !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) { > ioq_submit(s); > } >-- >2.36.1 > Thanks for fixing this issue! Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] linux-aio: explain why max batch is checked in laio_io_unplug() 2022-06-09 16:47 [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() Stefan Hajnoczi 2022-06-09 16:47 ` [PATCH 1/2] " Stefan Hajnoczi @ 2022-06-09 16:47 ` Stefan Hajnoczi 2022-06-09 17:47 ` Stefano Garzarella 2022-06-14 16:29 ` [PATCH 0/2] linux-aio: fix unbalanced plugged counter " Stefan Hajnoczi 2 siblings, 1 reply; 7+ messages in thread From: Stefan Hajnoczi @ 2022-06-09 16:47 UTC (permalink / raw) To: qemu-devel Cc: Hanna Reitz, qemu-stable, Kevin Wolf, qemu-block, Stefan Hajnoczi, Stefano Garzarella It may not be obvious why laio_io_unplug() checks max batch. I discussed this with Stefano and have added a comment summarizing the reason. Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- block/linux-aio.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/block/linux-aio.c b/block/linux-aio.c index 6078da7e42..9c2393a2f7 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -365,6 +365,12 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s, assert(s->io_q.plugged); s->io_q.plugged--; + /* + * Why max batch checking is performed here: + * Another BDS may have queued requests with a higher dev_max_batch and + * therefore in_queue could now exceed our dev_max_batch. Re-check the max + * batch so we can honor our device's dev_max_batch. + */ if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) || (!s->io_q.plugged && !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) { -- 2.36.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] linux-aio: explain why max batch is checked in laio_io_unplug() 2022-06-09 16:47 ` [PATCH 2/2] linux-aio: explain why max batch is checked " Stefan Hajnoczi @ 2022-06-09 17:47 ` Stefano Garzarella 0 siblings, 0 replies; 7+ messages in thread From: Stefano Garzarella @ 2022-06-09 17:47 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Hanna Reitz, qemu-stable, Kevin Wolf, qemu-block On Thu, Jun 09, 2022 at 05:47:12PM +0100, Stefan Hajnoczi wrote: >It may not be obvious why laio_io_unplug() checks max batch. I discussed >this with Stefano and have added a comment summarizing the reason. > >Cc: Stefano Garzarella <sgarzare@redhat.com> >Cc: Kevin Wolf <kwolf@redhat.com> >Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> >--- > block/linux-aio.c | 6 ++++++ > 1 file changed, 6 insertions(+) > >diff --git a/block/linux-aio.c b/block/linux-aio.c >index 6078da7e42..9c2393a2f7 100644 >--- a/block/linux-aio.c >+++ b/block/linux-aio.c >@@ -365,6 +365,12 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s, > assert(s->io_q.plugged); > s->io_q.plugged--; > >+ /* >+ * Why max batch checking is performed here: >+ * Another BDS may have queued requests with a higher dev_max_batch and >+ * therefore in_queue could now exceed our dev_max_batch. Re-check the max >+ * batch so we can honor our device's dev_max_batch. >+ */ > if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) || > (!s->io_q.plugged && > !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) { >-- >2.36.1 > I should have added that... Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() 2022-06-09 16:47 [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() Stefan Hajnoczi 2022-06-09 16:47 ` [PATCH 1/2] " Stefan Hajnoczi 2022-06-09 16:47 ` [PATCH 2/2] linux-aio: explain why max batch is checked " Stefan Hajnoczi @ 2022-06-14 16:29 ` Stefan Hajnoczi 2022-06-16 21:27 ` Mark Mielke 2 siblings, 1 reply; 7+ messages in thread From: Stefan Hajnoczi @ 2022-06-14 16:29 UTC (permalink / raw) To: qemu-devel; +Cc: Hanna Reitz, qemu-stable, Kevin Wolf, qemu-block [-- Attachment #1: Type: text/plain, Size: 694 bytes --] On Thu, Jun 09, 2022 at 05:47:10PM +0100, Stefan Hajnoczi wrote: > An unlucky I/O pattern can result in stalled Linux AIO requests when the > plugged counter becomes unbalanced. See Patch 1 for details. > > Patch 2 adds a comment to explain why the laio_io_unplug() even checks max > batch in the first place. > > Stefan Hajnoczi (2): > linux-aio: fix unbalanced plugged counter in laio_io_unplug() > linux-aio: explain why max batch is checked in laio_io_unplug() > > block/linux-aio.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > -- > 2.36.1 > Thanks, applied to my block tree: https://gitlab.com/stefanha/qemu/commits/block Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() 2022-06-14 16:29 ` [PATCH 0/2] linux-aio: fix unbalanced plugged counter " Stefan Hajnoczi @ 2022-06-16 21:27 ` Mark Mielke 0 siblings, 0 replies; 7+ messages in thread From: Mark Mielke @ 2022-06-16 21:27 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Hanna Reitz, qemu-stable, Kevin Wolf, qemu-block [-- Attachment #1: Type: text/plain, Size: 1776 bytes --] Thank you for finding this and fixing it. This issue has been giving us grief for months, and this patch appears to resolve the problem. In our case, it seemed to have much greater severity with the RHEL / CentOS 7.x Linux 3.10 kernel when tied to SolidFire iSCSI based storage. This caused it to escape notice in our original soak period, and is likely a contributor to why others didn't encounter the problem. However, I believe this looks like a serious problem that could affect any guest machine that does a large amount of I/O. I believe the SolidFire connection may be that the I/O can queue up more easily than the local NVMe storage we also use, and there could be something related to the SolidFire QoS re-balancing where the iSCSI connection may be re-negotiated from time to time. So, I think this is more like "happens in some environments more than others", and unfortunately it happened a lot in one of our environments. :-( On Tue, Jun 14, 2022 at 12:36 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > On Thu, Jun 09, 2022 at 05:47:10PM +0100, Stefan Hajnoczi wrote: > > An unlucky I/O pattern can result in stalled Linux AIO requests when the > > plugged counter becomes unbalanced. See Patch 1 for details. > > > > Patch 2 adds a comment to explain why the laio_io_unplug() even checks > max > > batch in the first place. > > > > Stefan Hajnoczi (2): > > linux-aio: fix unbalanced plugged counter in laio_io_unplug() > > linux-aio: explain why max batch is checked in laio_io_unplug() > > > > block/linux-aio.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > -- > > 2.36.1 > > > > Thanks, applied to my block tree: > https://gitlab.com/stefanha/qemu/commits/block > > Stefan > -- Mark Mielke <mark.mielke@gmail.com> [-- Attachment #2: Type: text/html, Size: 2474 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-06-16 21:29 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-06-09 16:47 [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug() Stefan Hajnoczi 2022-06-09 16:47 ` [PATCH 1/2] " Stefan Hajnoczi 2022-06-09 17:45 ` Stefano Garzarella 2022-06-09 16:47 ` [PATCH 2/2] linux-aio: explain why max batch is checked " Stefan Hajnoczi 2022-06-09 17:47 ` Stefano Garzarella 2022-06-14 16:29 ` [PATCH 0/2] linux-aio: fix unbalanced plugged counter " Stefan Hajnoczi 2022-06-16 21:27 ` Mark Mielke
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).