From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 75B81C77B72 for ; Mon, 17 Apr 2023 08:17:18 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1poK2O-0004lJ-32; Mon, 17 Apr 2023 04:16:36 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1poK2J-0004l2-Qx for qemu-devel@nongnu.org; Mon, 17 Apr 2023 04:16:32 -0400 Received: from mail-wr1-x435.google.com ([2a00:1450:4864:20::435]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1poK2H-00089k-SP for qemu-devel@nongnu.org; Mon, 17 Apr 2023 04:16:31 -0400 Received: by mail-wr1-x435.google.com with SMTP id ffacd0b85a97d-2f62b1385e3so756899f8f.2 for ; Mon, 17 Apr 2023 01:16:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1681719386; x=1684311386; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=ToZCG4TNNrZUJ6c9MT/jIJm2rWpywDIrnmINcq+yCvo=; b=HsVx0Q3fG9bJE0/sTlFALIhuPaxp72zvGzrKNy9HN5D3EuNZHYS0nHrDzAVdBo4mP9 XvPKIXL8Y3+rkq3OdNzDpdMa0XNXBBTsdxXWw/Zx7rK/PYS61ei7LbLmWx252jSDrZw5 2ywMy2i3TeuMp4YEyu3fwB8Rh6rH4QItHsvL1o0OnIl4BFg/Akfhx2w0cT/W5DCZ6rDA efrwApsBWWP03I0p619nHthLJLQjbW9hETlpJb2VD6yHVvrXQujoBd6oQqUo5+D8tDpA tw/S4zElzzQtW6Xxubk6eDUDmEES2l0kPgQ1NgWX4Drs99XEWUG6ywrQQ6sXscw/lsnP H9uA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681719386; x=1684311386; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ToZCG4TNNrZUJ6c9MT/jIJm2rWpywDIrnmINcq+yCvo=; b=jr26S1diCM8tfJ1EuY7XcCuQ4PdMTfo5HjKvx94PaRsg+c2v5np4jbgQ4+BP9HsYH0 +PEvg6sS0NOHTQ/8FLD8XJ63npVmXVGNNK6nn6YADGk8LnFcMpEapKQAs4epnrcG52a0 VTFZ5Mdcp532zcaJ9UkUrLTvF6Fv9d2evFHr6vCLBMZsgJt7uXODatIPh+PUJttFVhUP vQIj0xOT3r9zyyYD0PqTtJBw01a61kUNDMIMe4lwaSCKs6vt2s36H/PCe5bAnKUzVpBT iGIp6AmFYoD1uO1wZ87aIoJnHTLaadB6+nnAqtKvGhg7PSX9B7FMz31n7dPBL6DKTfRG EcwA== X-Gm-Message-State: AAQBX9c4+CKkj16A34cNXvskvs45wlQ7etZdUik+f0+JxfeVU2sb33Kb uhMS/R5w0qniEWlvw2g5Mkbn0Q== X-Google-Smtp-Source: AKy350Z6spo/vTHkzrMF2KJql87K6JP4Dgs3mpM7+FtBmEc9DDxyGN3uDHIc3Pyr+5CcEbRP/2fEEw== X-Received: by 2002:a5d:65c7:0:b0:2f4:e0e5:aaf8 with SMTP id e7-20020a5d65c7000000b002f4e0e5aaf8mr4636732wrw.68.1681719386348; Mon, 17 Apr 2023 01:16:26 -0700 (PDT) Received: from [192.168.30.216] ([81.0.6.76]) by smtp.gmail.com with ESMTPSA id s15-20020adfeb0f000000b002c55306f6edsm9910249wrn.54.2023.04.17.01.16.25 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 17 Apr 2023 01:16:25 -0700 (PDT) Message-ID: Date: Mon, 17 Apr 2023 10:16:24 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 Subject: Re: [PATCH] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount Content-Language: en-US To: Jamie Iles , qemu-devel@nongnu.org Cc: richard.henderson@linaro.org, pbonzini@redhat.com, =?UTF-8?Q?Alex_Benn=c3=a9e?= References: <20230414105111.30708-1-quic_jiles@quicinc.com> From: =?UTF-8?Q?Philippe_Mathieu-Daud=c3=a9?= In-Reply-To: <20230414105111.30708-1-quic_jiles@quicinc.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2a00:1450:4864:20::435; envelope-from=philmd@linaro.org; helo=mail-wr1-x435.google.com X-Spam_score_int: -30 X-Spam_score: -3.1 X-Spam_bar: --- X-Spam_report: (-3.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, NICE_REPLY_A=-0.976, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Hi Jamie, On 14/4/23 12:51, Jamie Iles wrote: > The round-robin scheduler will iterate over the CPU list with an > assigned budget until the next timer expiry and may exit early because > of a TB exit. This is fine under normal operation but with icount > enabled and SMP it is possible for a CPU to be starved of run time and > the system live-locks. > > For example, booting a riscv64 platform with '-icount > shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel > has timers enabled and starts performing TLB shootdowns. In this case > we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU > 1. As we enter the TCG loop, we assign the icount budget to next timer > interrupt to CPU 0 and begin executing where the guest is sat in a busy > loop exhausting all of the budget before we try to execute CPU 1 which > is the target of the IPI but CPU 1 is left with no budget with which to > execute and the process repeats. > > We try here to add some fairness by splitting the budget across all of > the CPUs on the thread fairly before entering each one. The CPU count > is cached on CPU list generation ID to avoid iterating the list on each > loop iteration. With this change it is possible to boot an SMP rv64 > guest with icount enabled and no hangs. > > Signed-off-by: Jamie Iles > --- > accel/tcg/tcg-accel-ops-icount.c | 16 ++++++++++++++-- > accel/tcg/tcg-accel-ops-icount.h | 3 ++- > accel/tcg/tcg-accel-ops-rr.c | 26 +++++++++++++++++++++++++- > 3 files changed, 41 insertions(+), 4 deletions(-) > > diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c > index 84cc7421be88..a7ffc8a68bad 100644 > --- a/accel/tcg/tcg-accel-ops-icount.c > +++ b/accel/tcg/tcg-accel-ops-icount.c > @@ -89,7 +89,19 @@ void icount_handle_deadline(void) > } > } > > -void icount_prepare_for_run(CPUState *cpu) /* Return icount budget shared fairly across all CPUs */ Or rename to something more explicit such icount_fair_shared_budget_per_cpu()? > +int64_t icount_cpu_timeslice(int cpu_count) > +{ > + int64_t limit = icount_get_limit(); > + int64_t timeslice = limit / cpu_count; > + > + if (timeslice == 0) { > + timeslice = limit; > + } > + > + return timeslice; > +} > + > +void icount_prepare_for_run(CPUState *cpu, int64_t timeslice) Maybe s/timeslice/per_cpu_icount_budget_max/, max_icount_budget_per_cpu or a more descriptive variable name? Otherwise OK. > { > int insns_left; > > @@ -101,7 +113,7 @@ void icount_prepare_for_run(CPUState *cpu) > g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0); > g_assert(cpu->icount_extra == 0); > > - cpu->icount_budget = icount_get_limit(); > + cpu->icount_budget = MIN(icount_get_limit(), timeslice); Alternatively we could pass timeslice as argument to icount_get_limit(). > insns_left = MIN(0xffff, cpu->icount_budget); > cpu_neg(cpu)->icount_decr.u16.low = insns_left; > cpu->icount_extra = cpu->icount_budget - insns_left; > diff --git a/accel/tcg/tcg-accel-ops-icount.h b/accel/tcg/tcg-accel-ops-icount.h > index 1b6fd9c60751..e8785a0e196d 100644 > --- a/accel/tcg/tcg-accel-ops-icount.h > +++ b/accel/tcg/tcg-accel-ops-icount.h > @@ -11,7 +11,8 @@ > #define TCG_ACCEL_OPS_ICOUNT_H > > void icount_handle_deadline(void); > -void icount_prepare_for_run(CPUState *cpu); > +void icount_prepare_for_run(CPUState *cpu, int64_t timeslice); > +int64_t icount_cpu_timeslice(int cpu_count); > void icount_process_data(CPUState *cpu); > > void icount_handle_interrupt(CPUState *cpu, int mask); > diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c > index 290833a37fb2..bccb3670a656 100644 > --- a/accel/tcg/tcg-accel-ops-rr.c > +++ b/accel/tcg/tcg-accel-ops-rr.c > @@ -139,6 +139,25 @@ static void rr_force_rcu(Notifier *notify, void *data) > rr_kick_next_cpu(); > } > Maybe worth adding a comment to remember "The CPU count is cached on CPU list generation ID to avoid iterating the list on each loop iteration." > +static int rr_cpu_count(void) > +{ > + static unsigned int last_gen_id = ~0; > + static int cpu_count; > + CPUState *cpu; > + > + cpu_list_lock(); > + if (cpu_list_generation_id_get() != last_gen_id) { > + cpu_count = 0; > + CPU_FOREACH(cpu) { > + ++cpu_count; > + } > + last_gen_id = cpu_list_generation_id_get(); > + } > + cpu_list_unlock(); > + > + return cpu_count; > +} > + > /* > * In the single-threaded case each vCPU is simulated in turn. If > * there is more than a single vCPU we create a simple timer to kick > @@ -185,6 +204,9 @@ static void *rr_cpu_thread_fn(void *arg) > cpu->exit_request = 1; > > while (1) { > + int cpu_count = rr_cpu_count(); > + int64_t icount_timeslice = INT64_MAX; s/icount_timeslice/icount_budget/? Modulo the nitpicking comments on "timeslice" naming which I'm not familiar with, and documenting a bit the code, LGTM. Reviewed-by: Philippe Mathieu-Daudé