From: "Alex Bennée" <alex.bennee@linaro.org>
To: Richard Henderson <rth@twiddle.net>
Cc: peter.maydell@linaro.org, pbonzini@redhat.com,
edgar.iglesias@xilinx.com, cota@braap.org, qemu-devel@nongnu.org,
Peter Crosthwaite <crosthwaite.peter@gmail.com>,
"open list:ARM" <qemu-arm@nongnu.org>
Subject: Re: [RFC DEBUG PATCH 3/3] translate-a64: fix lookup_tb_ptr hang (DEBUG!)
Date: Sat, 10 Jun 2017 09:51:26 +0100
Message-ID: <87vao4b4z5.fsf@linaro.org>
In-Reply-To: <fc351edb-7c08-c341-d8ee-85f6768e4931@twiddle.net>

Richard Henderson <rth@twiddle.net> writes:

> On 06/09/2017 10:01 AM, Alex Bennée wrote:
>> THIS IS A DEBUG PATCH DO NOT MERGE
>>
>> I include all the comments to show my working. I was trying to
>> isolate which instructions cause the problem. It turns out it is the
>> RET instruction. I don't understand why because AFAICT it is a
>> pretty much a BR instruction.
>
> Yeah, same thing for Alpha.
>
> It has been my guess that not chaining through RET means that we get
> back to the main loop regularly and often, letting interrupts be
> recognized in a timely manner.
>
> I can't figure out why that would be, however, since interrupts
> *ought* to be setting icount_decr, and the TB to which we chain *is*
> checking that to return to the main loop.

Indeed - if that was broken a lot more stuff wouldn't work.

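To make the expectation above concrete, here is a toy model of the check being described: an interrupt source flags the CPU, and every block tests that flag on entry, so a chained jump should still get us back to the main loop. This is only a sketch and not QEMU's code; `ToyCPU`, `exit_request` (standing in for icount_decr) and the `toy_*` functions are all made-up names.

```c
#include <stdint.h>

/* Toy model only: none of these names exist in QEMU. */
typedef struct ToyCPU {
    volatile int32_t exit_request; /* stands in for icount_decr; negative means "leave" */
    int executed;                  /* number of block bodies that actually ran */
} ToyCPU;

/* What an interrupt source would do: flag the CPU so the next block bails out. */
static void toy_request_exit(ToyCPU *cpu)
{
    cpu->exit_request = -1;
}

/*
 * Every block begins with the check, so a block reached by chaining
 * sees the request too. Returns 0 if the block gave up and handed
 * control back to the main loop, 1 if its body ran.
 */
static int toy_run_block(ToyCPU *cpu)
{
    if (cpu->exit_request < 0) {
        return 0;
    }
    cpu->executed++;
    return 1;
}

/*
 * Follow a chain of up to max_blocks blocks without visiting the main
 * loop in between. An interrupt is raised just before block number
 * irq_at is entered (pass a negative value for "never"). Returns how
 * many blocks were entered successfully before control came back.
 */
static int toy_run_chain(ToyCPU *cpu, int max_blocks, int irq_at)
{
    int i;

    for (i = 0; i < max_blocks; i++) {
        if (i == irq_at) {
            toy_request_exit(cpu);
        }
        if (!toy_run_block(cpu)) {
            break;
        }
    }
    return i;
}
```

In this model the chain always stops at the first block entered after the request, however long the chain is, which is why a hang caused purely by chaining through RET is surprising.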
> Since changing the timing affects the outcome (e.g. -d exec), it
> follows that this *must* be some sort of race condition. But since
> this still happens with single-threaded mode, I can't imagine what
> sort of race condition it might be.

Apart from timer expiry I can't think what other interactions the other
threads have on the main TCG thread. I guess there is IO but my test
hangs way before the kernel starts poking the disk. Is there an
interaction between IRQs and QEMU's serial driver?

>
> More data points. I removed the tb_htable_lookup, and that by itself
> is enough to fix Alpha booting. But it doesn't help the aarch64
> kernel+image that I have. Which does still boot with -d nochain
> (which, along with disabling goto_tb chaining, also disables all
> goto_ptr).

I wonder what is different about your aarch64 image and mine then?
Because mine works just with suppressing the chaining for RET.

>
> Not really sure where to go from here.

I would agree with Emilio that we revert but I can't quite shake the
feeling we are missing an underlying problem. Would just skipping the
htable lookup (but keeping the tb_jmp_cache) be an OK fix for now? Have
we just been lucky that whatever mechanism causes the "hang" wasn't due
to?

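For clarity, the shape of the interim option being asked about is roughly the following: consult only the per-CPU jump cache, and on a miss go back to the main loop instead of falling through to the slower global lookup. This is a simplified sketch under assumptions, not the real lookup_tb_ptr helper; the `Toy*` types, the cache size, the index function and the linear-search slow path are all invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: sizes, types and names are invented, not QEMU's. */
#define TOY_CACHE_BITS 4
#define TOY_CACHE_SIZE (1u << TOY_CACHE_BITS)

typedef struct ToyTB {
    uint64_t pc;    /* guest address the block was translated from */
    uint32_t flags; /* CPU state the translation depends on */
} ToyTB;

typedef struct ToyCPU {
    ToyTB *jmp_cache[TOY_CACHE_SIZE]; /* stands in for tb_jmp_cache */
} ToyCPU;

static unsigned toy_cache_index(uint64_t pc)
{
    return (unsigned)(pc >> 2) & (TOY_CACHE_SIZE - 1);
}

/* Stand-in for the global hash table lookup: a plain linear search. */
static ToyTB *toy_slow_lookup(ToyTB *all, size_t n, uint64_t pc, uint32_t flags)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (all[i].pc == pc && all[i].flags == flags) {
            return &all[i];
        }
    }
    return NULL;
}

/*
 * Returns the block to jump to, or NULL meaning "return to the main
 * loop". With use_slow_path == 0 a jump cache miss is never retried
 * against the global table, which is the interim option in question.
 */
static ToyTB *toy_lookup(ToyCPU *cpu, ToyTB *all, size_t n,
                         uint64_t pc, uint32_t flags, int use_slow_path)
{
    unsigned idx = toy_cache_index(pc);
    ToyTB *tb = cpu->jmp_cache[idx];

    /* Fast path: the cached entry must match both pc and flags. */
    if (tb && tb->pc == pc && tb->flags == flags) {
        return tb;
    }
    if (!use_slow_path) {
        return NULL;
    }
    tb = toy_slow_lookup(all, n, pc, flags);
    if (tb) {
        cpu->jmp_cache[idx] = tb; /* remember it for next time */
    }
    return tb;
}
```

The trade-off in this model is that every jump cache miss costs a trip through the main loop, which also means the main loop is visited more often than with the full lookup.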
>
>
> r~
--
Alex Bennée