From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: Ihor Solodrai <ihor.solodrai@pm.me>,
	sched-ext@meta.com, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH sched_ext/for-6.13-fixes] sched_ext: Fix dsq_local_on selftest
Date: Thu, 23 Jan 2025 23:26:37 +0100
Message-ID: <Z5LCHVHZPl2fjPyc@gpd3>
In-Reply-To: <Z5KOLqwLq96HjkwH@gpd3>

On Thu, Jan 23, 2025 at 07:45:08PM +0100, Andrea Righi wrote:
> On Thu, Jan 23, 2025 at 06:57:58AM -1000, Tejun Heo wrote:
> > On Thu, Jan 23, 2025 at 10:40:52AM +0100, Andrea Righi wrote:
> > > On Wed, Jan 22, 2025 at 07:10:00PM +0000, Ihor Solodrai wrote:
> > > > 
> > > > On Tuesday, January 21st, 2025 at 5:40 PM, Tejun Heo <tj@kernel.org> wrote:
> > > > 
> > > > > 
> > > > > 
> > > > > Hello, sorry about the delay.
> > > > > 
> > > > > On Wed, Jan 15, 2025 at 11:50:37PM +0000, Ihor Solodrai wrote:
> > > > > ...
> > > > > 
> > > > > > 2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
> > > > > > 2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]
> > > > > 
> > > > > 
> > > > > That's a head scratcher. It's a single node 2 cpu instance and all unbound
> > > > > kworkers should be allowed on all CPUs. I'll update the test to test the
> > > > > actual cpumask but can you see whether this failure is consistent or flaky?
> > > > 
> > > > I re-ran all the jobs, and all sched_ext jobs have failed (3/3).
> > > > Previous time only 1 of 3 runs failed.
> > > > 
> > > > https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/36016405680
> > > 
> > > Oh I see what happens, SCX_DSQ_LOCAL_ON is (incorrectly) resolved to 0.
> > > 
> > > More exactly, none of the enum values are being resolved correctly, likely
> > > due to the CO-RE enum refactoring. There’s probably something broken in
> > > tools/testing/selftests/sched_ext/Makefile, I’ll take a look.
> > 
> > Yeah, we need to add SCX_ENUM_INIT() to each test. Will do that once the
> > pending pull request is merged. The original report is a separate issue tho.
> > I'm still a bit baffled by it.
> 
> For the enum part: https://lore.kernel.org/all/20250123124606.242115-1-arighi@nvidia.com/
> 
> And yeah, I missed that the original bug report was about the unbound
> kworker not allowed to be dispatched on cpu 1. Weird... I'm wondering if we
> need to do the cpumask_cnt / scx_bpf_dispatch_cancel() game, like we did with
> scx_rustland to handle concurrent affinity changes, but in this case the
> kworker shouldn't have its affinity changed...

Thinking more about this, scx_bpf_task_cpu(p) returns the last known CPU
where the task p was running, but it doesn't necessarily give a CPU where
the task can run at any time. In general it's probably a safer choice to
rely on p->cpus_ptr, maybe doing bpf_cpumask_any_distribute(p->cpus_ptr)
for this test case.
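
To illustrate the difference, here is a rough userspace model, not the
actual selftest change. struct task_model, cpu_allowed() and
pick_allowed_cpu() are hypothetical names made up for this sketch: last_cpu
stands in for what scx_bpf_task_cpu(p) reports, cpus_allowed stands in for
p->cpus_ptr, and the rotating pick only loosely approximates what
bpf_cpumask_any_distribute() would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two task fields that matter here. */
struct task_model {
	int last_cpu;          /* models the value scx_bpf_task_cpu(p) reports */
	uint64_t cpus_allowed; /* models p->cpus_ptr: bit N set = CPU N allowed */
};

/* True if @cpu is in the task's allowed mask. */
static bool cpu_allowed(const struct task_model *p, int cpu)
{
	return cpu >= 0 && cpu < 64 && ((p->cpus_allowed >> cpu) & 1ULL);
}

/*
 * Pick an allowed CPU, starting the search right after *cursor so that
 * repeated calls rotate through the mask. *cursor is assumed to be >= 0.
 */
static int pick_allowed_cpu(const struct task_model *p, int *cursor)
{
	assert(p->cpus_allowed != 0); /* an empty mask has no valid target */

	for (int i = 1; i <= 64; i++) {
		int cpu = (*cursor + i) % 64;

		if (cpu_allowed(p, cpu)) {
			*cursor = cpu;
			return cpu;
		}
	}
	return -1; /* not reached for a non-empty mask */
}
```

In this model a task whose last_cpu is 1 but whose mask only contains CPU 0
would be rejected if we targeted last_cpu, while pick_allowed_cpu() always
returns a CPU from the mask.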

However, I still don't see why the unbound kworker couldn't be dispatched
on cpu 1 in this particular case...

-Andrea
