From: Markus Armbruster <armbru@redhat.com>
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: "Michael Roth" <mdroth@linux.vnet.ibm.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
qemu-stable@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [PATCH] monitor/qmp: resume monitor when clearing its queue
Date: Wed, 09 Oct 2019 21:18:04 +0200
Message-ID: <8736g1zq1f.fsf@dusky.pond.sub.org>
In-Reply-To: <20191009101032.kxts5buz7sp3cyo5@olga.proxmox.com> (Wolfgang Bumiller's message of "Wed, 9 Oct 2019 12:10:32 +0200")

Wolfgang Bumiller <w.bumiller@proxmox.com> writes:
> On Wed, Oct 09, 2019 at 10:39:44AM +0200, Markus Armbruster wrote:
>> Cc: Marc-André for additional monitor and chardev expertise.
>>
>> Wolfgang Bumiller <w.bumiller@proxmox.com> writes:
>>
>> > When a monitor's queue is filled up in handle_qmp_command()
>> > it gets suspended. It's the dispatcher bh's job currently to
>> > resume the monitor, which it does after processing an event
>> > from the queue. However, it is possible for a
>> > CHR_EVENT_CLOSED event to be processed before the bh
>> > is scheduled, which will clear the queue without resuming
>> > the monitor, thereby preventing the dispatcher from reaching
>> > the resume() call.
>>
>> Because with the request queue cleared, there's nothing for
>> monitor_qmp_requests_pop_any_with_lock() to pop, so
>> monitor_qmp_bh_dispatcher() won't look at this monitor. It stays
>> suspended forever. Correct?
>>
>> Observable effect for the monitor's user?
>
> Yes.

I was too terse, let me try again: what exactly breaks for the monitor's
user?

> More easily triggered now with oob. We ran into this quite a while
> ago, but our only reliable trigger was a customized version of
> -loadstate which loads the state from a separate file instead of the
> vmstate region of a qcow2. Turns out that doing this on slow storage
> (~12s to load the data) caused our status daemon to try to poll the qmp
> socket during the load-state and give up after a 3s timeout. And since
> the BH runs in the main loop, which is not even entered until after the
> loadstate has finished, but the iothread handling the qmp socket does
> fill & clear the queue, the qmp socket always ended up unusable afterwards.
>
> Aside from that we have users reporting the same symptom (hanging qmp)
> appearing randomly on busy systems.
>
>> > Fix this by resuming the monitor when clearing a queue which
>> > was filled up.
>> >
>> > Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
>> > ---
>> > @Michael, we ran into this with qemu 4.0, so if the logic in this patch
>> > is correct it may make sense to include it in the 4.0.1 roundup.
>> > A backport is at [1] as 4.0 was before the monitor/ dir split.
>> >
>> > [1] https://gitlab.com/wbumiller/qemu/commit/9d8bbb5294ed084f282174b0c91e1a614e0a0714
>> >
>> > monitor/qmp.c | 10 ++++++++++
>> > 1 file changed, 10 insertions(+)
>> >
>> > diff --git a/monitor/qmp.c b/monitor/qmp.c
>> > index 9d9e5d8b27..c1db5bf940 100644
>> > --- a/monitor/qmp.c
>> > +++ b/monitor/qmp.c
>> > @@ -70,9 +70,19 @@ static void qmp_request_free(QMPRequest *req)
>> > /* Caller must hold mon->qmp.qmp_queue_lock */
>> > static void monitor_qmp_cleanup_req_queue_locked(MonitorQMP *mon)
>> > {
>> > + bool need_resume = (!qmp_oob_enabled(mon) && mon->qmp_requests->length > 0)
>> > + || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX;
>>
>> Can you explain why this condition is correct?
>
> Sorry, I meant to add a comment pointing to monitor_qmp_bh_dispatcher(),
> which does the following *after* popping 1 element off the queue:
>
>     need_resume = !qmp_oob_enabled(mon) ||
>         mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
>     qemu_mutex_unlock(&mon->qmp_queue_lock);
>
> It's supposed to be the same condition, but evaluated _before_ popping
> an element (hence no `- 1`). The queue also must not be empty, since
> otherwise the `monitor_suspend()` in `handle_qmp_command()` hasn't
> happened, though on second thought we could probably just return early
> in that case.

I see.

Could we monitor_resume() unconditionally here?

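To make the comparison of the two conditions concrete, here is a minimal standalone sketch (not QEMU code) that checks them against each other. The names sketch_resume_after_pop(), sketch_resume_before_flush() and SKETCH_QUEUE_LEN_MAX are made up for illustration, and the queue limit of 8 is an assumed value; only the two boolean expressions are taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for illustration only; stands in for QMP_REQ_QUEUE_LEN_MAX. */
#define SKETCH_QUEUE_LEN_MAX ((size_t)8)

/* Dispatcher's condition, evaluated AFTER one request has been popped. */
static bool sketch_resume_after_pop(bool oob_enabled, size_t len_after_pop)
{
    return !oob_enabled || len_after_pop == SKETCH_QUEUE_LEN_MAX - 1;
}

/* Patch's condition, evaluated BEFORE the queue is flushed. */
static bool sketch_resume_before_flush(bool oob_enabled, size_t len)
{
    return (!oob_enabled && len > 0) || len == SKETCH_QUEUE_LEN_MAX;
}

/* Asserts that both conditions agree for every non-empty queue length. */
static void sketch_check_conditions_agree(void)
{
    for (int oob = 0; oob <= 1; oob++) {
        for (size_t len = 1; len <= SKETCH_QUEUE_LEN_MAX; len++) {
            assert(sketch_resume_before_flush(oob != 0, len) ==
                   sketch_resume_after_pop(oob != 0, len - 1));
        }
        /* An empty queue never asks for a resume in the patch's condition. */
        assert(!sketch_resume_before_flush(oob != 0, 0));
    }
}
```

Calling sketch_check_conditions_agree() from a main() exercises every queue length with and without oob. It only shows that the two expressions match in this simplified model; it says nothing about whether an unconditional monitor_resume() would be safe.
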
>> > while (!g_queue_is_empty(mon->qmp_requests)) {
>> > qmp_request_free(g_queue_pop_head(mon->qmp_requests));
>> > }
>> > + if (need_resume) {
>> > + /*
>> > + * Pairs with the monitor_suspend() in handle_qmp_command() in case the
>> > + * queue gets cleared from a CHR_EVENT_CLOSED event before the dispatch
>> > + * bh got scheduled.
>> > + */
>> > + monitor_resume(&mon->common);
>> > + }
>> > }
>> >
>> > static void monitor_qmp_cleanup_queues(MonitorQMP *mon)
>>
>> Is monitor_qmp_cleanup_req_queue_locked() the correct place?
>>
>> It's called from
>>
>> * monitor_qmp_event() case CHR_EVENT_CLOSED via
>> monitor_qmp_cleanup_queues(), as part of destroying the monitor's
>> session state.
>>
>> This is the case you're trying to fix. Correct?
>>
>> I figure monitor_resume() is safe because we haven't really destroyed
>> anything, yet, we merely flushed the request queue. Correct?
>>
>> * monitor_data_destroy() via monitor_data_destroy_qmp() when destroying
>> the monitor.
>>
>> Can need_resume be true in this case? If yes, is monitor_resume()
>> still safe? We're in the middle of destroying the monitor...
>
> I thought so when first reading through it, but on second thought, we
> should probably avoid this for sanity's sake.
> Maybe with a flag, or an extra parameter.
> Or we could introduce a "bool queue_filled" we set in handle_qmp_command()
> instead of "calculating" `need_resume` in 2 places and unset it in
> `monitor_data_destroy()` before clearing the queue?

Could we simply call monitor_resume() in monitor_qmp_event() right after
monitor_qmp_cleanup_queues()?
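
For readers following along, the hang discussed in this thread can be reproduced in a toy model. This is a simplified sketch under stated assumptions, not the QEMU implementation: ToyMonitor, toy_enqueue(), toy_flush(), toy_dispatch_one() and TOY_QUEUE_LEN_MAX are hypothetical names, the queue is reduced to a counter, and suspension is modeled as a plain counter that must return to zero before the monitor reads input again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for illustration only. */
#define TOY_QUEUE_LEN_MAX ((size_t)8)

typedef struct {
    size_t queued;    /* number of pending requests */
    int suspend_cnt;  /* > 0 means the monitor reads no more input */
    bool oob;         /* out-of-band capability enabled */
} ToyMonitor;

/* Queue one request; suspend on every request without oob, or when full. */
static void toy_enqueue(ToyMonitor *m)
{
    m->queued++;
    if (!m->oob || m->queued == TOY_QUEUE_LEN_MAX) {
        m->suspend_cnt++;
    }
}

/* Dispatcher: pops one request and resumes if that request had suspended. */
static bool toy_dispatch_one(ToyMonitor *m)
{
    if (m->queued == 0) {
        return false;  /* nothing to pop, so the resume is never reached */
    }
    m->queued--;
    if (!m->oob || m->queued == TOY_QUEUE_LEN_MAX - 1) {
        m->suspend_cnt--;
    }
    return true;
}

/* Flush the queue, optionally resuming as the patch in this thread does. */
static void toy_flush(ToyMonitor *m, bool resume_if_needed)
{
    bool need_resume = (!m->oob && m->queued > 0)
        || m->queued == TOY_QUEUE_LEN_MAX;

    m->queued = 0;
    if (resume_if_needed && need_resume) {
        m->suspend_cnt--;
    }
}

/* Fill until suspended, flush before any dispatch, report whether stuck. */
static bool toy_stuck_after_close(bool oob, bool resume_if_needed)
{
    ToyMonitor m = { .queued = 0, .suspend_cnt = 0, .oob = oob };

    while (m.suspend_cnt == 0 && m.queued < TOY_QUEUE_LEN_MAX) {
        toy_enqueue(&m);
    }
    toy_flush(&m, resume_if_needed);  /* the "closed" event wins the race */
    toy_dispatch_one(&m);             /* dispatcher finds an empty queue */
    return m.suspend_cnt > 0;
}

/* Self-check: stuck without the resume, not stuck with it. */
static void toy_demo(void)
{
    assert(toy_stuck_after_close(true, false));
    assert(toy_stuck_after_close(false, false));
    assert(!toy_stuck_after_close(true, true));
    assert(!toy_stuck_after_close(false, true));
}
```

In this model the flush without a resume leaves suspend_cnt at 1 with nothing left for the dispatcher to pop, which matches the "suspended forever" reading above. The model does not cover the monitor_data_destroy() path or the question of where the resume should live.
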