From: Richard Henderson <richard.henderson@linaro.org>
To: Jamie Iles <quic_jiles@quicinc.com>
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, philmd@linaro.org,
peter.maydell@linaro.org
Subject: Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
Date: Wed, 10 May 2023 16:01:20 +0100
Message-ID: <4e780ff9-b25c-5439-2fe7-80057785042d@linaro.org>
In-Reply-To: <ZFItGQ5o67WuOwYp@JILES.na.qualcomm.com>

On 5/3/23 10:44, Jamie Iles wrote:
> Hi Richard,
>
> On Sat, Apr 29, 2023 at 10:28:03AM +0100, Richard Henderson wrote:
>> On 4/27/23 03:09, Jamie Iles wrote:
>>> From: Jamie Iles <jiles@qti.qualcomm.com>
>>>
>>> The round-robin scheduler will iterate over the CPU list with an
>>> assigned budget until the next timer expiry and may exit early because
>>> of a TB exit. This is fine under normal operation, but with icount
>>> enabled and SMP it is possible for a CPU to be starved of run time,
>>> causing the system to live-lock.
>>>
>>> For example, when booting a riscv64 platform with '-icount
>>> shift=0,align=off,sleep=on -smp 2', we observe a livelock once the kernel
>>> has timers enabled and starts performing TLB shootdowns. In this case
>>> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
>>> 1. As we enter the TCG loop, we assign CPU 0 the full icount budget up
>>> to the next timer interrupt and begin executing. The guest sits in a
>>> busy loop and exhausts the entire budget before we try to execute CPU
>>> 1, the target of the IPI, which is therefore left with no budget to
>>> execute, and the process repeats.
>>>
>>> We add fairness here by splitting the budget evenly across all of the
>>> CPUs on the thread before entering each one. The CPU count is cached,
>>> keyed on the CPU list generation ID, to avoid iterating the list on
>>> every loop iteration. With this change an SMP rv64 guest boots with
>>> icount enabled and no hangs.
>>>
>>> New in v3 (addressing feedback from Richard Henderson):
>>> - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
>>> appropriate
>>> - Move rr_cpu_count() call to be conditional on icount_enabled()
>>> - Initialize cpu_budget to 0
>>>
>>> Jamie Iles (2):
>>> cpu: expose qemu_cpu_list_lock for lock-guard use
>>> accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
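
As an aside for anyone following along, the budget-splitting scheme described
in the cover letter can be sketched roughly as below. This is only an
illustrative, self-contained toy, not the patch itself; the names
rr_cpu_count and cpu_list_generation_id and the exact arithmetic are
assumptions made for the example.

#include <inttypes.h>
#include <stdio.h>

/* Illustrative stand-ins for QEMU state; not the real data structures. */
static unsigned int cpu_list_generation_id;   /* bumped when a CPU is added/removed */
static unsigned int cpus_on_this_thread = 2;  /* pretend two vCPUs share the RR thread */

/*
 * Count the vCPUs, re-walking the list only when its generation changes,
 * so the hot scheduling loop does not pay for the walk on every iteration.
 */
static unsigned int rr_cpu_count(void)
{
    static unsigned int last_generation_id = (unsigned int)-1;
    static unsigned int cached_count = 1;

    if (last_generation_id != cpu_list_generation_id) {
        last_generation_id = cpu_list_generation_id;
        cached_count = cpus_on_this_thread;   /* would walk the CPU list here */
    }
    return cached_count;
}

int main(void)
{
    int64_t budget_to_deadline = 100000;      /* insns until the next timer expiry */

    /* Give each vCPU an equal share instead of letting the first one burn
     * the whole budget and starve the rest (the livelock described above). */
    int64_t cpu_budget = budget_to_deadline / rr_cpu_count();

    printf("per-CPU budget: %" PRId64 "\n", cpu_budget);
    return 0;
}

The point is simply that the deadline budget is divided by the (cached) CPU
count before each vCPU runs, so no single vCPU can consume the whole slice.
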
>>
>> It appears as if one of these two patches causes a failure in replay, e.g.
>>
>> https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
>>
>> Would you have a look, please?
>
> I was out on vacation and it looks like this job got cleaned up, but was
> this a mutex check failing for the replay mutex and/or iothread mutex?
> If so, then the following patch fixes it for me, which I can include in
> a v4.

That was it. I'll squash this fix and re-queue.
r~
>
> Thanks,
>
> Jamie
>
>
> diff --git i/accel/tcg/tcg-accel-ops-icount.c w/accel/tcg/tcg-accel-ops-icount.c
> index e1e8afaf2f99..3d2cfbbc9778 100644
> --- i/accel/tcg/tcg-accel-ops-icount.c
> +++ w/accel/tcg/tcg-accel-ops-icount.c
> @@ -114,13 +114,13 @@ void icount_prepare_for_run(CPUState *cpu, int64_t cpu_budget)
> g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
> g_assert(cpu->icount_extra == 0);
>
> + replay_mutex_lock();
> +
> cpu->icount_budget = MIN(icount_get_limit(), cpu_budget);
> insns_left = MIN(0xffff, cpu->icount_budget);
> cpu_neg(cpu)->icount_decr.u16.low = insns_left;
> cpu->icount_extra = cpu->icount_budget - insns_left;
>
> - replay_mutex_lock();
> -
> if (cpu->icount_budget == 0) {
> /*
> * We're called without the iothread lock, so must take it while
> diff --git i/replay/replay.c w/replay/replay.c
> index c39156c52221..0f7d766efe81 100644
> --- i/replay/replay.c
> +++ w/replay/replay.c
> @@ -74,7 +74,7 @@ uint64_t replay_get_current_icount(void)
> int replay_get_instructions(void)
> {
> int res = 0;
> - replay_mutex_lock();
> + g_assert(replay_mutex_locked());
> if (replay_next_event_is(EVENT_INSTRUCTION)) {
> res = replay_state.instruction_count;
> if (replay_break_icount != -1LL) {
> @@ -85,7 +85,6 @@ int replay_get_instructions(void)
> }
> }
> }
> - replay_mutex_unlock();
> return res;
> }
>
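
For context on why the first hunk moves the lock: as far as I recall,
icount_get_limit() is where replay_get_instructions() ends up being called
during replay, so once the callee only asserts the mutex, the caller has to
take it before the budget is computed. Below is a rough, self-contained
sketch of that caller-locks / callee-asserts convention; all names are
stand-ins for illustration, not the QEMU API.

#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the convention the fix relies on: the caller takes the lock
 * up front and the callee merely asserts it, instead of locking itself. */
static pthread_mutex_t replay_lock = PTHREAD_MUTEX_INITIALIZER;
static bool replay_locked;             /* analogue of replay_mutex_locked() */

static void replay_lock_acquire(void)
{
    pthread_mutex_lock(&replay_lock);
    replay_locked = true;
}

static void replay_lock_release(void)
{
    replay_locked = false;
    pthread_mutex_unlock(&replay_lock);
}

/* Analogue of replay_get_instructions() after the second hunk: it must be
 * entered with the lock already held, so it only asserts. */
static int get_instructions(void)
{
    assert(replay_locked);
    return 42;                         /* placeholder for reading replay state */
}

int main(void)
{
    replay_lock_acquire();             /* analogue of the hoisted replay_mutex_lock() */
    printf("instruction budget: %d\n", get_instructions());
    replay_lock_release();
    return 0;
}
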
Thread overview:
2023-04-27 2:09 [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
2023-04-27 2:09 ` [PATCH v3 1/2] cpu: expose qemu_cpu_list_lock for lock-guard use Jamie Iles
2023-04-27 7:44 ` Richard Henderson
2023-04-28 23:05 ` Philippe Mathieu-Daudé
2023-04-27 2:09 ` [PATCH v3 2/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
2023-04-27 7:46 ` Richard Henderson
2023-04-29 9:28 ` [PATCH v3 0/2] " Richard Henderson
2023-04-29 12:23 ` Philippe Mathieu-Daudé
2023-05-03 9:44 ` Jamie Iles
2023-05-10 15:01 ` Richard Henderson [this message]