From: Michal Hocko <mhocko@suse.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Sebastian Sewior <bigeasy@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
Hou Tao <houtao1@huawei.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Matthew Wilcox <willy@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Jann Horn <jannh@google.com>, Tejun Heo <tj@kernel.org>,
linux-mm <linux-mm@kvack.org>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v3 4/6] memcg: Use trylock to access memcg stock_lock.
Date: Fri, 20 Dec 2024 09:24:55 +0100
Message-ID: <Z2Up17maf6FHkVu5@tiehlicka>
In-Reply-To: <CAADnVQLm=gSAh2u3iF4HoGmLEqa-AV0FAEnDqcoFYDgZ06d+gQ@mail.gmail.com>
On Thu 19-12-24 16:39:43, Alexei Starovoitov wrote:
> On Wed, Dec 18, 2024 at 11:52 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 19-12-24 08:27:06, Michal Hocko wrote:
> > > On Thu 19-12-24 08:08:44, Michal Hocko wrote:
> > > > All that being said, the message I wanted to get through is that atomic
> > > > (NOWAIT) charges could be truly reentrant if the stock local lock uses
> > > > trylock. We do not need a dedicated gfp flag for that now.
> > >
> > > And I want to add: not only can we achieve that, I also think this is
> > > desirable because for !RT this will be no functional change and for RT
> > > it makes more sense to simply do a deterministic (albeit more costly)
> > > page_counter update than to spin over a lock to use the batch (or learn
> > > the batch cannot be used).
> >
> > So effectively this on top of yours
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f168d223375f..29a831f6109c 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1768,7 +1768,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > return ret;
> >
> > if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
> > - if (gfp_mask & __GFP_TRYLOCK)
> > + if (!gfpflags_allow_blocking(gfp_mask))
> > return ret;
> > local_lock_irqsave(&memcg_stock.stock_lock, flags);
>
> I don't quite understand such a strong desire to avoid the new GFP flag
> especially when it's in mm/internal.h. There are lots of bits left.
> It's not like PF_* flags that are limited, but fine
> let's try to avoid GFP_TRYLOCK_BIT.
Because historically this has proven to be a bad idea that usually
backfires. As I've said in another email, I care much less now that
this is mostly internal (one can still use it outside of mm, but would
need to try hard). But still, if we _can_ avoid it and that keeps the code
generally _sensible_, then let's not introduce a new flag.
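
For readers following along, here is a minimal userspace sketch of the trylock fallback in the quoted consume_stock() hunk. It is a model under stated assumptions, not the kernel code: struct stock_model, stock_trylock(), stock_unlock() and the may_spin parameter are hypothetical stand-ins for memcg_stock, local_trylock_irqsave(), the matching unlock and the gfp check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU memcg_stock; not the kernel layout. */
struct stock_model {
	atomic_bool locked;	/* models memcg_stock.stock_lock */
	unsigned int nr_pages;	/* pre-charged pages cached in the stock */
};

/* Models local_trylock_irqsave(): never waits, reports success or failure. */
static bool stock_trylock(struct stock_model *s)
{
	/* exchange returns the previous value; false means we took the lock */
	return !atomic_exchange_explicit(&s->locked, true, memory_order_acquire);
}

static void stock_unlock(struct stock_model *s)
{
	atomic_store_explicit(&s->locked, false, memory_order_release);
}

/*
 * Returns true if nr_pages were served from the stock. When the lock is
 * contended and the caller must not spin, it returns false right away so
 * the caller can charge the page counter directly instead.
 */
static bool consume_stock_model(struct stock_model *s, unsigned int nr_pages,
				bool may_spin)
{
	bool ret = false;

	assert(nr_pages > 0);

	if (!stock_trylock(s)) {
		if (!may_spin)
			return ret;
		while (!stock_trylock(s))
			;	/* models the blocking local_lock_irqsave() */
	}

	if (s->nr_pages >= nr_pages) {
		s->nr_pages -= nr_pages;
		ret = true;
	}

	stock_unlock(s);
	return ret;
}
```

The point of the pattern is that a failed trylock costs the non-spinning caller only the batching optimization, never correctness.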
[...]
> How about the following:
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index ff9060af6295..f06131d5234f 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -39,6 +39,17 @@ static inline bool gfpflags_allow_blocking(const
> gfp_t gfp_flags)
> return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
> }
>
> +static inline bool gfpflags_allow_spinning(const gfp_t gfp_flags)
> +{
> + /*
> + * !__GFP_DIRECT_RECLAIM -> direct reclaim is not allowed.
> + * !__GFP_KSWAPD_RECLAIM -> it's not safe to wake up kswapd.
> + * All GFP_* flags including GFP_NOWAIT use one or both flags.
> + * try_alloc_pages() is the only API that doesn't specify either flag.
I wouldn't be surprised if we had other allocations like that. git grep
is generally not very helpful as many/most allocations take a gfp argument
of some sort. I would slightly reword this to be more explicit:
/*
* This is stronger than GFP_NOWAIT or GFP_ATOMIC because
* those are only guaranteed to never block on a sleeping lock.
* Here we are enforcing that the allocation doesn't ever spin
* on any locks (i.e. only trylocks). There is no high-level
* GFP_$FOO flag for this; use try_alloc_pages() as the
* regular page allocator doesn't fully support this
* allocation mode.
> + */
> + return !!(gfp_flags & __GFP_RECLAIM);
> +}
> +
> #ifdef CONFIG_HIGHMEM
> #define OPT_ZONE_HIGHMEM ZONE_HIGHMEM
> #else
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f168d223375f..545d345c22de 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1768,7 +1768,7 @@ static bool consume_stock(struct mem_cgroup
> *memcg, unsigned int nr_pages,
> return ret;
>
> if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
> - if (gfp_mask & __GFP_TRYLOCK)
> + if (!gfpflags_allow_spinning(gfp_mask))
> return ret;
> local_lock_irqsave(&memcg_stock.stock_lock, flags);
> }
>
> If that's acceptable then such an approach will work for
> my slub.c reentrance changes too.
It certainly is acceptable for me. Do not forget to add another hunk to
avoid charging the full batch in this case.
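
A minimal userspace sketch of what such a hunk could look like, under stated assumptions: the MODEL_* bit values and the batch size of 64 are illustrative, and model_allow_blocking(), model_allow_spinning() and model_charge_batch() are hypothetical names, not kernel functions. The reasoning modelled here is that charging more than nr_pages leaves a surplus that would have to be put back into the stock, which means taking the stock lock again.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real gfp bits are defined by the kernel. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u
#define MODEL_GFP_RECLAIM (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_CHARGE_BATCH 64u	/* assumed batch size */

/* Blocking requires direct reclaim, mirroring gfpflags_allow_blocking(). */
static bool model_allow_blocking(unsigned int gfp)
{
	return !!(gfp & MODEL_GFP_DIRECT_RECLAIM);
}

/* Spinning is allowed as soon as either reclaim bit is set. */
static bool model_allow_spinning(unsigned int gfp)
{
	return !!(gfp & MODEL_GFP_RECLAIM);
}

/* Number of pages to charge once the stock could not satisfy the request. */
static unsigned int model_charge_batch(unsigned int gfp, unsigned int nr_pages)
{
	unsigned int batch;

	assert(nr_pages > 0);
	batch = nr_pages > MODEL_CHARGE_BATCH ? nr_pages : MODEL_CHARGE_BATCH;

	/* No surplus for trylock-only callers, so nothing to refill later. */
	if (!model_allow_spinning(gfp))
		batch = nr_pages;

	return batch;
}
```

With these assumptions a GFP_NOWAIT-like mask (kswapd bit only) may still spin and charges a full batch, while a mask with neither reclaim bit charges exactly what it asked for.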
Thanks!
--
Michal Hocko
SUSE Labs