From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: "Michael Tokarev" <mjt@tls.msk.ru>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"QEMU Development" <qemu-devel@nongnu.org>,
"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Mark Cave-Ayland" <mark.caveayland@nutanix.com>
Subject: Re: apparent race condition in mttcg memory handling
Date: Mon, 21 Jul 2025 10:52:19 -0700
Message-ID: <1ab1dfd2-9cfc-487c-8a27-f8790ba4f770@linaro.org>
In-Reply-To: <CAFEAcA9zM1+qWLhfErnokzzYWbnMizKLfBe_Be-AqrqG72c7jQ@mail.gmail.com>
On 7/21/25 10:31 AM, Peter Maydell wrote:
> On Mon, 21 Jul 2025 at 18:26, Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> On 7/21/25 10:14 AM, Michael Tokarev wrote:
>>> rr is the first thing I tried. Nope, it's absolutely hopeless. It
>>> tried to boot just the kernel for over 30 minutes, after which I just
>>> gave up.
>>>
>>
>> I had a similar thing to debug recently, and with a simple loop I
>> couldn't expose it easily. The bug I had was triggered with 3%
>> probability, which seems close to yours.
>> As rr record -h is single threaded, I found it useful to write a wrapper
>> script [1] to run one instance, and then run it in parallel using:
>> ./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
>>
>> With that, I could expose the bug in 2 minutes reliably (vs trying for
>> more than one hour before). With your 64 cores, I'm sure it will quickly
>> expose it.
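
(For context, a rough sketch of what such a wrapper can look like; the
actual script linked as [1] may differ, and ./reproduce-bug.sh below is
just a placeholder for whatever command boots the guest and checks for
the failure:

  #!/bin/sh
  # Print one reproduction command per line; piping this into
  # "parallel --bar -j$(nproc)" runs the attempts concurrently, one rr
  # recording per core, and "head -n N" caps the number of runs.
  while true; do
      # rr keeps each trace in its own directory under
      # ~/.local/share/rr, so a failing attempt can be replayed
      # afterwards with "rr replay".  -h enables rr's chaos mode.
      echo "rr record -h ./reproduce-bug.sh"
  done
)
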
>
> I think the problem here is that the whole runtime to get to
> point-of-potential failure is too long, not that it takes too
> many runs to get a failure.
>
> For that kind of thing I have had success in the past with
> making a QEMU snapshot close to the point of failure so that
> the actual runtime that it's necessary to record under rr is
> reduced.
>
That's a good idea indeed. In the bug I had, the failure depended on the
KASLR address chosen, so by using a snapshot I would not have exposed
the random aspect.
In the case of the current bug, it looks like a proper race condition, so
trying more combinations from a preloaded snapshot, saving a few seconds
per run, sounds like a good approach.
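
Concretely, something along these lines (a rough sketch; it assumes the
disk image is qcow2, since QEMU internal snapshots require it, and the
command line below is only a placeholder for the actual reproducer):

  # Boot once, get close to the point of failure, then take an internal
  # snapshot from the QEMU monitor:
  (qemu) savevm just-before-failure

  # Subsequent attempts restore from that snapshot, so only the last few
  # seconds of guest execution need to be recorded under rr:
  rr record -h qemu-system-XXX ... -loadvm just-before-failure
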
> -- PMM