From: "Alex Bennée" <alex.bennee@linaro.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
"Richard Henderson" <rth@twiddle.net>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
qemu-devel <qemu-devel@nongnu.org>,
maciej.borzecki@rndity.com
Subject: Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
Date: Wed, 05 Jul 2017 22:46:52 +0100
Message-ID: <87r2xua75f.fsf@linaro.org>
In-Reply-To: <CAFEAcA_DcOMzo=X1Y9O+5VrfeA32i012vpQmqB++aEXH6J1qeg@mail.gmail.com>
Peter Maydell <peter.maydell@linaro.org> writes:
> On 5 July 2017 at 20:30, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Peter Maydell <peter.maydell@linaro.org> writes:
>>
>>> On 5 July 2017 at 17:01, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>> An interesting bug was reported on #qemu today. It was bisected to
>>>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>>>> with taskset -c 0. Originally the fingers were pointed at mttcg but it
>>>> occurs in both single and multi-threaded modes.
>>>>
>>>> I think the problem is qemu_system_reset_request() is certainly racy
>>>> when resetting a running CPU. AFAICT:
>>>>
>>>> - Guest resets board, writing to some hw address (e.g.
>>>> arm_sysctl_write)
>>>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>>> - We exit iowrite and drop the BQL
>>>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>>> - we start writing new values to CPU env while still in TCG code
>>>> - CHAOS!
>>>>
>>>> The general solution for this is to ensure these sort of tasks are done
>>>> with safe work in the CPUs context when we know nothing else is running.
>>>> It seems this is probably best done by modifying
>>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>>> as safe work - I don't think the vl.c thread should ever be messing
>>>> about with calling cpu_reset directly.
>>>
>>> My first thought is that qemu_system_reset() should absolutely
>>> stop every CPU (or other runnable thing like a DMA agent) in the
>>> system.
>>
>> Are all these reset calls system wide though?
>
> It's called 'system_reset' because it resets the entire system...
>
>> After all, with PSCI you
>> can bring individual cores up and down. I appreciate the vexpress stuff
>> pre-dates those well defined semantics though.
>
> It's individual core reset that's a more ad-hoc afterthought,
> really.
>
>> vm_stop certainly tries to deal with things gracefully as well as send
>> qapi events, drain IO queues and the rest of it. My only concern is it
>> handles two cases - external vm_stops and those from the current CPU.
>>
>> I think it may be cleaner for CPU originated halts to use the
>> async_safe_run_on_cpu() mechanism.
>
> System reset already has an async component to it -- you call
> qemu_system_reset_request(), which just says "schedule a system
> reset as soon as convenient". qemu_system_reset() is the thing
> that runs later and actually does the job (from the io thread,
> not the CPU thread).
>
> Looking more closely at the vl.c code, it looks like it
> calls pause_all_vcpus() before calling qemu_system_reset():
> shouldn't that be pausing all the TCG CPUs?

Looking deeper, it seems cpu_stop_current() is doing the wrong thing.
Because it sets cpu->stopped, the pause_all_vcpus() call in the vl.c
thread doesn't wait.

I suspect it should really be doing a cpu_loop_exit(). I'll see if I can
work up a patch.
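
To make the suspected ordering concrete, here is a toy model in C. It is
not QEMU code: ToyCPU and every toy_* function are hypothetical names made
up for illustration, and the model is single-threaded so the ordering is
explicit. It only assumes what is described above: a CPU that marks itself
stopped before leaving its execution loop satisfies the pauser's wait
condition too early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vCPU. Hypothetical; this is NOT QEMU's CPUState. */
typedef struct ToyCPU {
    bool stop;     /* request: this CPU has been asked to stop */
    bool stopped;  /* ack: this CPU claims it has stopped */
    bool in_exec;  /* this CPU is still inside its execution loop */
} ToyCPU;

/* Suspected behaviour: ack the stop while still inside the loop. */
static void toy_stop_current_early(ToyCPU *cpu)
{
    cpu->stop = false;
    cpu->stopped = true;
}

/* Alternative: only record the request; the ack comes on loop exit. */
static void toy_stop_current_deferred(ToyCPU *cpu)
{
    cpu->stop = true;
}

/* The CPU leaves its execution loop and acks any pending stop request. */
static void toy_exit_exec_loop(ToyCPU *cpu)
{
    cpu->in_exec = false;
    if (cpu->stop) {
        cpu->stop = false;
        cpu->stopped = true;
    }
}

/* The pauser's wait condition: has every CPU acked the stop? */
static bool toy_all_stopped(const ToyCPU *cpus, size_t n)
{
    assert(cpus != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].stopped) {
            return false;
        }
    }
    return true;
}

/*
 * True if the pauser would stop waiting (and so go on to reset) while
 * some CPU is still inside its execution loop.
 */
static bool toy_reset_is_racy(const ToyCPU *cpus, size_t n)
{
    if (!toy_all_stopped(cpus, n)) {
        return false; /* pauser is still waiting, nothing is reset yet */
    }
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].in_exec) {
            return true;
        }
    }
    return false;
}
```

In this model, calling toy_stop_current_early() on a CPU with in_exec set
makes toy_reset_is_racy() return true straight away. With
toy_stop_current_deferred(), toy_all_stopped() stays false until
toy_exit_exec_loop() has run, so the pauser keeps waiting. Whether
cpu_loop_exit() gives the real code the second ordering is the thing the
patch would need to confirm.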
>
> thanks
> -- PMM
--
Alex Bennée