From: Stefan Reiter <s.reiter@proxmox.com>
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
qemu-devel@nongnu.org, Markus Armbruster <armbru@redhat.com>,
Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Date: Mon, 22 Mar 2021 16:26:12 +0100
Message-ID: <b5c777bf-d8c7-00df-4373-58ab4f06ae2a@proxmox.com>
In-Reply-To: <20210322110847.cdo477ve2gydab64@wobu-vie.proxmox.com>
On 3/22/21 12:08 PM, Wolfgang Bumiller wrote:
> On Thu, Mar 18, 2021 at 02:35:50PM +0100, Stefan Reiter wrote:
>> If OOB is disabled, events received in monitor_qmp_event will be handled
>> in the main context. Thus, we must not acquire a qmp_queue_lock there,
>> as the dispatcher coroutine holds one over a yield point, where it
>> expects to be rescheduled from the main context. If a CHR_EVENT_CLOSED
>> event is received just then, it can race and block the main thread by
>> waiting on the queue lock.
>>
>> Run monitor_qmp_cleanup_queue_and_resume in a BH on the iohandler
>> thread, so the main thread can always make progress during the
>> reschedule.
>>
>> The delaying of the cleanup is safe, since the dispatcher always moves
>> back to the iothread afterward, and thus the cleanup will happen before
>> it gets to its next iteration.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
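(The idea of the patch above boils down to the following shape; this is only a sketch, not the actual hunk, and the "cleanup_bh" wrapper name is made up for illustration:)

static void cleanup_bh(void *opaque)
{
    MonitorQMP *mon = opaque;

    /* takes mon->qmp_queue_lock, but now it runs on the iohandler context */
    monitor_qmp_cleanup_queue_and_resume(mon);
}

/* in monitor_qmp_event(), for CHR_EVENT_CLOSED: */
aio_bh_schedule_oneshot(iohandler_get_aio_context(), cleanup_bh, mon);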
>
> This is a tough one. It *may* be fine, but I wonder if we can approach
> this differently:
>
> From what I can gather we have the following call stacks & contexts:
>
> Guarded lock (lock & release):
> * monitor_qmp_cleanup_queue_and_resume
> by monitor_qmp_event
> by file handler (from I/O loop)
> ^ iohandler_context (assuming that's where the file handling happens...)
> (after this patch as BH though)
>
> * handle_qmp_command
> a) by the json parser (which is also re-initialized by
> monitor_qmp_event btw., haven't checked if that can also
> "trigger" its methods immediately)
> b) by monitor_qmp_read
> by file handler (from I/O loop)
> ^ iohandler_context
>
> Lock-"returning":
> * monitor_qmp_requests_pop_any_with_lock
> by coroutine_fn monitor_qmp_dispatcher_co
> ^ iohandler_context
>
> Lock-releasing:
> * coroutine_fn monitor_qmp_dispatcher_co
> ^ qemu_aio_context
>
> The only *weird* thing that immediately pops out here is
> `monitor_qmp_requests_pop_any_with_lock()` keeping a lock while
> switching contexts.
You mean monitor_qmp_dispatcher_co? _pop_any_ itself doesn't switch contexts...
But yes, that is weird, as I mentioned in my original mail too.
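To spell out the part in question, abridged from monitor_qmp_dispatcher_co() as I read it (paraphrased, not verbatim):

    /* returns with mon->qmp_queue_lock held */
    req_obj = monitor_qmp_requests_pop_any_with_lock();
    ...
    /* hop from iohandler_ctx over to qemu_aio_context, lock still held */
    aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();
    /*
     * While we sit here waiting for the main loop to reschedule us, anything
     * in the main context that tries to take qmp_queue_lock (e.g.
     * monitor_qmp_event() on CHR_EVENT_CLOSED without OOB) blocks the main
     * thread, so the reschedule never happens.
     */
    ...
    /* only now, in qemu_aio_context: */
    monitor_resume(&mon->common);
    qemu_mutex_unlock(&mon->qmp_queue_lock);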
> This is done in order to allow `AIO_WAIT_WHILE` to work while making
> progress on the events, but do we actually already need to be in this
> context for the OOB `monitor_resume()` call or can we defer the context
> switch to after having done that and released the lock?
> `monitor_resume()` itself seems to simply schedule a BH which should
> work regardless if I'm not mistaken. There's also a
> `readline_show_prompt()` call, but that *looks* harmless?
The BH should indeed be harmless, since we don't schedule it on
qemu_get_current_aio_context(), and we can ignore the
readline_show_prompt() call here since it's guarded by "!monitor_is_qmp(mon)".
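For reference, this is roughly what monitor_resume() does, paraphrased from memory and abridged (not a verbatim copy of the tree):

void monitor_resume(Monitor *mon)
{
    ...
    if (qatomic_dec_fetch(&mon->suspend_cnt) == 0) {
        /* the monitor's own iothread context, or the main loop otherwise */
        AioContext *ctx = mon->use_io_thread
            ? iothread_get_aio_context(mon_iothread)
            : qemu_get_aio_context();

        if (!monitor_is_qmp(mon)) {
            readline_show_prompt(...);   /* HMP only, never reached for QMP */
        }

        /* the actual resume of input handling runs later, in this BH */
        aio_bh_schedule_oneshot(ctx, monitor_accept_input, mon);
    }
}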
> `monitor_resume()` is also called without the lock later on, so even if
> it needs to be in this context at that point for whatever reason, does
> it need the lock?
>
It doesn't access the queue, so I don't see why it'd need the lock. And
as you said, it currently works without it too; in fact, before commit
88daf0996c ("qmp: Resume OOB-enabled monitor before processing the
request") it was always called without the lock.
I'll cobble together a v2 with this in mind.
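Roughly what I have in mind: do the resume and drop the lock while still on the iohandler context, and only then hop over to qemu_aio_context (untested sketch, not the final patch):

    /* still on iohandler_ctx, returns with mon->qmp_queue_lock held */
    req_obj = monitor_qmp_requests_pop_any_with_lock();
    ...
    if (need_resume) {
        /* only schedules a BH, so it doesn't care which context we're in */
        monitor_resume(&mon->common);
    }
    qemu_mutex_unlock(&mon->qmp_queue_lock);

    /*
     * Nothing in the main context can block on qmp_queue_lock anymore
     * while we wait to be rescheduled there.
     */
    aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();

    /* ... dispatch the request from qemu_aio_context as before ... */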
Thread overview: 5+ messages
2021-03-18 13:35 [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB Stefan Reiter
2021-03-22 11:08 ` Wolfgang Bumiller
2021-03-22 15:26 ` Stefan Reiter [this message]
2021-03-26 14:48 ` Markus Armbruster
2021-03-29 9:49 ` Stefan Reiter