From: Markus Armbruster <armbru@redhat.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Stefan Reiter <s.reiter@proxmox.com>,
qemu-devel@nongnu.org, Wolfgang Bumiller <w.bumiller@proxmox.com>
Subject: Re: [PATCH v2] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Date: Thu, 08 Apr 2021 16:10:31 +0200
Message-ID: <87tuog98a0.fsf@dusky.pond.sub.org>
In-Reply-To: <1f326b87-b568-5aa5-011e-057e046c0717@proxmox.com> (Thomas Lamprecht's message of "Thu, 8 Apr 2021 15:27:51 +0200")
Thomas Lamprecht <t.lamprecht@proxmox.com> writes:
> On 08.04.21 14:49, Markus Armbruster wrote:
>> Kevin Wolf <kwolf@redhat.com> writes:
>>> Am 08.04.2021 um 11:21 hat Markus Armbruster geschrieben:
>>>> Should this go into 6.0?
>>>
>>> This is something that the responsible maintainer needs to decide.
>>
>> Yes, and that's me. I'm soliciting opinions.
>>
>>> If it helps you with the decision, and if I understand correctly, it is
>>> a regression from 5.1, but was already broken in 5.2.
>>
>> It helps.
>>
>> Even more helpful would be a risk assessment: what's the risk of
>> applying this patch now vs. delaying it?
>
> Stefan is on vacation this week, but I can share some information, maybe it
> helps.
>
>>
>> If I understand Stefan correctly, Proxmox observed VM hangs. How
>> frequent are these hangs? Did they result in data corruption?
>
>
> They were not highly frequent, but frequent enough that we got a bit over a
> dozen reports in our forum, which normally means something is off but it's
> limited to certain HW, storage tech, or load patterns.
>
> We initially had a hard time reproducing this, but a user was finally able to
> send us a backtrace of a hanging VM. With that information we could pin it down
> well enough, and Stefan came up with a good reproducer (see v1 of this patch).

Excellent work, props!

> We didn't get any reports of actual data corruption due to this, but the VM
> hangs completely, so a user killing it could theoretically cause some; but only
> for programs running in the guest that were not made power-loss safe
> anyway...
>
>>
>> How confident do we feel about the fix?
>>
>
> Cannot comment from a technical POV, but can share the feedback we got with it.
>
> Some context about reach:
> We have rolled the fix out to all repository stages that already had a build of
> 5.2. That has a reach of about 100k to 300k installations, although we only have
> rough stats about the sites that access the repository daily and cannot really
> tell who actually updated to the new versions. There are some quite update-happy
> people in the community, though, so with that in mind and my experience of the
> feedback loop of rolling out updates, I'd figure a lower bound one can assume
> without going out on a limb is ~25k.
>
> Positive feedback from users:
> We got some positive feedback from people who ran into this at least once per
> week, saying the issue is fixed with that. In total almost a dozen users reported
> improvements, a good chunk of them the ones who reported the problem in the first
> place.
>
> Mixed feedback:
> We had one user who reported still getting QMP timeouts, but their VMs did
> not hang anymore (could be high load or the like). Only one user reported that it
> did not help; we are still investigating there. They have quite high CPU pressure
> stats, and it may actually be another issue; we cannot tell for sure yet.
>
> Negative feedback:
> We had no users reporting new or worse problems in that direction, at least
> from what I'm aware of.
>
> Note, we do not use OOB currently, so the above does not speak for the OOB case
> at all.

Thanks!