public inbox for bpf@vger.kernel.org
From: Tejun Heo <tj@kernel.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	David Vernet <void@manifault.com>
Subject: Re: [PATCH sched_ext/for-6.11 1/2] sched_ext: Implement DSQ iterator
Date: Fri, 28 Jun 2024 12:14:10 -1000
Message-ID: <Zn81srqbHfKBC7zZ@slm.duckdns.org>
In-Reply-To: <CAADnVQ+h2W88nWnj_frPa24vYmE+yebHYaT6mronRnDYvC+JLQ@mail.gmail.com>

Hello,

On Thu, Jun 27, 2024 at 06:11:48PM -0700, Alexei Starovoitov wrote:
> > +struct bpf_iter_scx_dsq_kern {
> > +       struct scx_dsq_node             cursor;
> > +       struct scx_dispatch_q           *dsq;
> > +       u32                             dsq_seq;
> > +       u32                             flags;
> > +} __attribute__((aligned(8)));
> > +
> > +struct bpf_iter_scx_dsq {
> > +       u64                             __opaque[12];
> > +} __attribute__((aligned(8)));
> 
> I think this is a bit too much to put on the prog stack.
> Folks are working on increasing this limit and moving
> the stack into "divided stack", so it won't be an issue eventually,
> but let's find a way to reduce it.

Yeah, it is kinda big. Do you have an idea of where the boundary between
okay and too big would fall?

> It seems to me scx_dsq_node has a bunch of fields,
> but if I'm reading the code correctly this patch is
> only using cursor.list part of it ?

Great point. Cursors used to have to go on the rbtrees too but that's no
longer the case, so I should be able to drop the rbnode, which should help
reduce the size substantially. I'll see what I can do.
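
As a rough user-space illustration of the size argument (not the kernel
definitions): mock_list_head and mock_rb_node below only mimic the shape of
struct list_head and struct rb_node, and the field layout of both node
variants is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins that only mimic the shape of the kernel types. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

struct mock_rb_node {
	unsigned long parent_color;
	struct mock_rb_node *right, *left;
};

/* Hypothetical node carrying both list and rbtree linkage. */
struct node_with_rb {
	struct mock_list_head list;
	struct mock_rb_node priq;
	uint32_t flags;
};

/* Hypothetical cursor-only node: the rbtree linkage is dropped. */
struct node_list_only {
	struct mock_list_head list;
	uint32_t flags;
};

/* Compile-time check in the spirit of BUILD_BUG_ON(). */
static_assert(sizeof(struct node_list_only) < sizeof(struct node_with_rb),
	      "dropping the rbtree linkage must shrink the node");

/* Bytes saved per cursor by dropping the rbtree linkage. */
size_t cursor_bytes_saved(void)
{
	return sizeof(struct node_with_rb) - sizeof(struct node_list_only);
}
```

The exact saving depends on the real field layout and on the target's
pointer size, so the number this returns is only indicative.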

> Another alternative is to use bpf_mem_alloc() like we do
> in bpf_iter_css_task and others?

This might be okay but given that this can be used pretty frequently (e.g.
every scheduling event) and it *seems* possible to reduce its size
substantially, I'd like to keep it on the stack if possible.

> > +__bpf_kfunc int bpf_iter_scx_dsq_new(struct bpf_iter_scx_dsq *it, u64 dsq_id,
> > +                                    u64 flags)
> > +{
> > +       struct bpf_iter_scx_dsq_kern *kit = (void *)it;
> > +
> > +       BUILD_BUG_ON(sizeof(struct bpf_iter_scx_dsq_kern) >
> > +                    sizeof(struct bpf_iter_scx_dsq));
> > +       BUILD_BUG_ON(__alignof__(struct bpf_iter_scx_dsq_kern) !=
> > +                    __alignof__(struct bpf_iter_scx_dsq));
> > +
> > +       if (flags & ~__SCX_DSQ_ITER_ALL_FLAGS)
> > +               return -EINVAL;
> > +
> > +       kit->dsq = find_non_local_dsq(dsq_id);
> > +       if (!kit->dsq)
> > +               return -ENOENT;
> > +
> > +       INIT_LIST_HEAD(&kit->cursor.list);
> > +       RB_CLEAR_NODE(&kit->cursor.priq);
> > +       kit->cursor.flags = SCX_TASK_DSQ_CURSOR;
> 
> Are these two assignments really necessary?
> Something inside nldsq_next_task() is using that?
> 
> > +       kit->dsq_seq = READ_ONCE(kit->dsq->seq);
> > +       kit->flags = flags;

I'm a bit confused whether you're referring to the statements above or
below, but AFAICS, they're all used except for kit->cursor.priq.

- SCX_TASK_DSQ_CURSOR assignment is what tells nldsq_next_task() that the
  node is a cursor, not a real task, and thus should be skipped for internal
  iterations.

- kit->dsq_seq is used by bpf_iter_scx_dsq_next() to ignore tasks that are
  queued after the iteration has started. This, among other things,
  guarantees that p->scx.dsq_vtime increases monotonically throughout
  iteration.

- kit->flags carries SCX_DSQ_ITER_REV which tells bpf_iter_scx_dsq_next()
  the direction of the iteration.
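
To make the three points above concrete, here is a minimal user-space
sketch of cursor-based iteration. It is not the kernel code: struct node,
struct queue, struct iter, NODE_IS_CURSOR, ITER_REV and all helper names
are hypothetical stand-ins, and locking is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

#define NODE_IS_CURSOR 0x1u /* stand-in for SCX_TASK_DSQ_CURSOR */
#define ITER_REV       0x1u /* stand-in for SCX_DSQ_ITER_REV */

struct node {
	struct node *next, *prev;
	uint32_t flags;
	uint32_t seq;	/* queue sequence number stamped at enqueue */
	int id;
};

struct queue {
	struct node head;	/* sentinel of a circular list */
	uint32_t seq;
};

struct iter {
	struct node cursor;
	struct queue *q;
	uint32_t q_seq;		/* queue seq when iteration started */
	uint32_t flags;
};

static void insert_after(struct node *n, struct node *pos)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

static void unlink_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

void queue_init(struct queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->head.flags = 0;
	q->seq = 0;
}

/* Enqueue at the tail and stamp the node with a fresh sequence number. */
void queue_push(struct queue *q, struct node *n, int id)
{
	n->flags = 0;
	n->id = id;
	n->seq = ++q->seq;
	insert_after(n, q->head.prev);
}

void iter_new(struct iter *it, struct queue *q, uint32_t flags)
{
	it->cursor.next = it->cursor.prev = &it->cursor;
	it->cursor.flags = NODE_IS_CURSOR;	/* marks "not a real entry" */
	it->q = q;
	it->q_seq = q->seq;			/* snapshot for filtering */
	it->flags = flags;
}

/* Returns the next entry or NULL; callers stop iterating after NULL. */
struct node *iter_next(struct iter *it)
{
	int rev = it->flags & ITER_REV;
	int linked = it->cursor.next != &it->cursor;
	struct node *n = linked ? &it->cursor : &it->q->head;

	for (;;) {
		n = rev ? n->prev : n->next;
		if (n == &it->q->head) {
			n = NULL;
			break;
		}
		if (n->flags & NODE_IS_CURSOR)
			continue;	/* some iterator's cursor, skip it */
		/* wraparound-safe "n was queued after we started" test */
		if ((int32_t)(it->q_seq - n->seq) < 0)
			continue;
		break;
	}

	if (linked)
		unlink_node(&it->cursor);
	if (n)	/* park the cursor just past n in the travel direction */
		insert_after(&it->cursor, rev ? n->prev : n);
	return n;
}
```

With entries 1, 2 and 3 queued before iter_new() and entry 4 pushed after
the first iter_next(), forward iteration in this sketch yields 1, 2, 3 and
then NULL, because entry 4 carries a sequence number newer than the
snapshot.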

Thanks.

-- 
tejun
