qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Philippe Mathieu-Daudé" <philmd@linaro.org>
To: Jamie Iles <quic_jiles@quicinc.com>, qemu-devel@nongnu.org
Cc: richard.henderson@linaro.org, pbonzini@redhat.com,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PATCH] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
Date: Mon, 17 Apr 2023 10:16:24 +0200	[thread overview]
Message-ID: <dd8160f5-5802-8503-2162-69dfe718b7b7@linaro.org> (raw)
In-Reply-To: <20230414105111.30708-1-quic_jiles@quicinc.com>

Hi Jamie,

On 14/4/23 12:51, Jamie Iles wrote:
> The round-robin scheduler will iterate over the CPU list with an
> assigned budget until the next timer expiry and may exit early because
> of a TB exit.  This is fine under normal operation but with icount
> enabled and SMP it is possible for a CPU to be starved of run time and
> the system live-locks.
> 
> For example, booting a riscv64 platform with '-icount
> shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
> has timers enabled and starts performing TLB shootdowns.  In this case
> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
> 1.  As we enter the TCG loop, we assign the icount budget to next timer
> interrupt to CPU 0 and begin executing where the guest is sat in a busy
> loop exhausting all of the budget before we try to execute CPU 1 which
> is the target of the IPI but CPU 1 is left with no budget with which to
> execute and the process repeats.
> 
> We try here to add some fairness by splitting the budget across all of
> the CPUs on the thread fairly before entering each one.  The CPU count
> is cached on CPU list generation ID to avoid iterating the list on each
> loop iteration.  With this change it is possible to boot an SMP rv64
> guest with icount enabled and no hangs.
> 
> Signed-off-by: Jamie Iles <quic_jiles@quicinc.com>
> ---
>   accel/tcg/tcg-accel-ops-icount.c | 16 ++++++++++++++--
>   accel/tcg/tcg-accel-ops-icount.h |  3 ++-
>   accel/tcg/tcg-accel-ops-rr.c     | 26 +++++++++++++++++++++++++-
>   3 files changed, 41 insertions(+), 4 deletions(-)
> 
> diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c
> index 84cc7421be88..a7ffc8a68bad 100644
> --- a/accel/tcg/tcg-accel-ops-icount.c
> +++ b/accel/tcg/tcg-accel-ops-icount.c
> @@ -89,7 +89,19 @@ void icount_handle_deadline(void)
>       }
>   }
>   
> -void icount_prepare_for_run(CPUState *cpu)

    /* Return icount budget shared fairly across all CPUs */

Or rename to something more explicit such
icount_fair_shared_budget_per_cpu()?

> +int64_t icount_cpu_timeslice(int cpu_count)
> +{
> +    int64_t limit = icount_get_limit();
> +    int64_t timeslice = limit / cpu_count;
> +
> +    if (timeslice == 0) {
> +        timeslice = limit;
> +    }
> +
> +    return timeslice;
> +}
> +
> +void icount_prepare_for_run(CPUState *cpu, int64_t timeslice)

Maybe s/timeslice/per_cpu_icount_budget_max/, max_icount_budget_per_cpu
or a more descriptive variable name? Otherwise OK.

>   {
>       int insns_left;
>   
> @@ -101,7 +113,7 @@ void icount_prepare_for_run(CPUState *cpu)
>       g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
>       g_assert(cpu->icount_extra == 0);
>   
> -    cpu->icount_budget = icount_get_limit();
> +    cpu->icount_budget = MIN(icount_get_limit(), timeslice);

Alternatively we could pass timeslice as argument to icount_get_limit().

>       insns_left = MIN(0xffff, cpu->icount_budget);
>       cpu_neg(cpu)->icount_decr.u16.low = insns_left;
>       cpu->icount_extra = cpu->icount_budget - insns_left;
> diff --git a/accel/tcg/tcg-accel-ops-icount.h b/accel/tcg/tcg-accel-ops-icount.h
> index 1b6fd9c60751..e8785a0e196d 100644
> --- a/accel/tcg/tcg-accel-ops-icount.h
> +++ b/accel/tcg/tcg-accel-ops-icount.h
> @@ -11,7 +11,8 @@
>   #define TCG_ACCEL_OPS_ICOUNT_H
>   
>   void icount_handle_deadline(void);
> -void icount_prepare_for_run(CPUState *cpu);
> +void icount_prepare_for_run(CPUState *cpu, int64_t timeslice);
> +int64_t icount_cpu_timeslice(int cpu_count);
>   void icount_process_data(CPUState *cpu);
>   
>   void icount_handle_interrupt(CPUState *cpu, int mask);
> diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
> index 290833a37fb2..bccb3670a656 100644
> --- a/accel/tcg/tcg-accel-ops-rr.c
> +++ b/accel/tcg/tcg-accel-ops-rr.c
> @@ -139,6 +139,25 @@ static void rr_force_rcu(Notifier *notify, void *data)
>       rr_kick_next_cpu();
>   }
>   

Maybe worth adding a comment to remember "The CPU count is cached on CPU
list generation ID to avoid iterating the list on each loop iteration."

> +static int rr_cpu_count(void)
> +{
> +    static unsigned int last_gen_id = ~0;
> +    static int cpu_count;
> +    CPUState *cpu;
> +
> +    cpu_list_lock();
> +    if (cpu_list_generation_id_get() != last_gen_id) {
> +        cpu_count = 0;
> +        CPU_FOREACH(cpu) {
> +            ++cpu_count;
> +        }
> +        last_gen_id = cpu_list_generation_id_get();
> +    }
> +    cpu_list_unlock();
> +
> +    return cpu_count;
> +}
> +
>   /*
>    * In the single-threaded case each vCPU is simulated in turn. If
>    * there is more than a single vCPU we create a simple timer to kick
> @@ -185,6 +204,9 @@ static void *rr_cpu_thread_fn(void *arg)
>       cpu->exit_request = 1;
>   
>       while (1) {
> +        int cpu_count = rr_cpu_count();
> +        int64_t icount_timeslice = INT64_MAX;

s/icount_timeslice/icount_budget/?


Modulo the nitpicking comments on "timeslice" naming which I'm not
familiar with, and documenting a bit the code, LGTM.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



  reply	other threads:[~2023-04-17  8:17 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14 10:51 [PATCH] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Jamie Iles
2023-04-17  8:16 ` Philippe Mathieu-Daudé [this message]
2023-04-21 15:11 ` Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dd8160f5-5802-8503-2162-69dfe718b7b7@linaro.org \
    --to=philmd@linaro.org \
    --cc=alex.bennee@linaro.org \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quic_jiles@quicinc.com \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).