All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Yang Shi <shy828301@gmail.com>, Roman Gushchin <guro@fb.com>,
	Shakeel Butt <shakeelb@google.com>,
	Dave Chinner <david@fromorbit.com>,
	Michal Hocko <mhocko@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux FS-devel Mailing List <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6/9] mm: vmscan: use per memcg nr_deferred of shrinker
Date: Tue, 15 Dec 2020 17:44:12 +0100	[thread overview]
Message-ID: <20201215164412.GA385334@cmpxchg.org> (raw)
In-Reply-To: <6ffd6aa1-2c55-f4d3-a60a-56786d40531a@virtuozzo.com>

On Thu, Dec 10, 2020 at 06:17:54PM +0300, Kirill Tkhai wrote:
> On 10.12.2020 18:13, Johannes Weiner wrote:
> > On Wed, Dec 09, 2020 at 09:32:37AM -0800, Yang Shi wrote:
> >> On Wed, Dec 9, 2020 at 7:42 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> >>>
> >>> On 08.12.2020 20:13, Yang Shi wrote:
> >>>> On Thu, Dec 3, 2020 at 3:40 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> >>>>>
> >>>>> On 02.12.2020 21:27, Yang Shi wrote:
> >>>>>> Use per memcg's nr_deferred for memcg aware shrinkers.  The shrinker's nr_deferred
> >>>>>> will be used in the following cases:
> >>>>>>     1. Non memcg aware shrinkers
> >>>>>>     2. !CONFIG_MEMCG
> >>>>>>     3. memcg is disabled by boot parameter
> >>>>>>
> >>>>>> Signed-off-by: Yang Shi <shy828301@gmail.com>
> >>>>>> ---
> >>>>>>  mm/vmscan.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >>>>>>  1 file changed, 82 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>>>>> index cba0bc8d4661..d569fdcaba79 100644
> >>>>>> --- a/mm/vmscan.c
> >>>>>> +++ b/mm/vmscan.c
> >>>>>> @@ -203,6 +203,12 @@ static DECLARE_RWSEM(shrinker_rwsem);
> >>>>>>  static DEFINE_IDR(shrinker_idr);
> >>>>>>  static int shrinker_nr_max;
> >>>>>>
> >>>>>> +static inline bool is_deferred_memcg_aware(struct shrinker *shrinker)
> >>>>>> +{
> >>>>>> +     return (shrinker->flags & SHRINKER_MEMCG_AWARE) &&
> >>>>>> +             !mem_cgroup_disabled();
> >>>>>> +}
> >>>>>> +
> >>>>>>  static int prealloc_memcg_shrinker(struct shrinker *shrinker)
> >>>>>>  {
> >>>>>>       int id, ret = -ENOMEM;
> >>>>>> @@ -271,7 +277,58 @@ static bool writeback_throttling_sane(struct scan_control *sc)
> >>>>>>  #endif
> >>>>>>       return false;
> >>>>>>  }
> >>>>>> +
> >>>>>> +static inline long count_nr_deferred(struct shrinker *shrinker,
> >>>>>> +                                  struct shrink_control *sc)
> >>>>>> +{
> >>>>>> +     bool per_memcg_deferred = is_deferred_memcg_aware(shrinker) && sc->memcg;
> >>>>>> +     struct memcg_shrinker_deferred *deferred;
> >>>>>> +     struct mem_cgroup *memcg = sc->memcg;
> >>>>>> +     int nid = sc->nid;
> >>>>>> +     int id = shrinker->id;
> >>>>>> +     long nr;
> >>>>>> +
> >>>>>> +     if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
> >>>>>> +             nid = 0;
> >>>>>> +
> >>>>>> +     if (per_memcg_deferred) {
> >>>>>> +             deferred = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_deferred,
> >>>>>> +                                                  true);
> >>>>>
> >>>>> My comment is about both 5/9 and 6/9 patches.
> >>>>
> >>>> Sorry for the late reply, I don't know why Gmail filtered this out to spam.
> >>>>
> >>>>>
> >>>>> shrink_slab_memcg() races with mem_cgroup_css_online(). A visibility of CSS_ONLINE flag
> >>>>> in shrink_slab_memcg()->mem_cgroup_online() does not guarantee that you will see
> >>>>> memcg->nodeinfo[nid]->shrinker_deferred != NULL in count_nr_deferred(). This may occur
> >>>>> because of processor reordering on !x86 (there is no a common lock or memory barriers).
> >>>>>
> >>>>> Regarding to shrinker_map this is not a problem due to map check in shrink_slab_memcg().
> >>>>> The map can't be NULL there.
> >>>>>
> >>>>> Regarding to shrinker_deferred you should prove either this is not a problem too,
> >>>>> or to add proper synchronization (maybe, based on barriers) or to add some similar check
> >>>>> (maybe, in shrink_slab_memcg() too).
> >>>>
> >>>> It seems shrink_slab_memcg() might see shrinker_deferred as NULL
> >>>> either due to the same reason. I don't think there is a guarantee it
> >>>> won't happen.
> >>>>
> >>>> We just need guarantee CSS_ONLINE is seen after shrinker_maps and
> >>>> shrinker_deferred are allocated, so I'm supposed barriers before
> >>>> "css->flags |= CSS_ONLINE" should work.
> >>>>
> >>>> So the below patch may be ok:
> >>>>
> >>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>>> index df128cab900f..9f7fb0450d69 100644
> >>>> --- a/mm/memcontrol.c
> >>>> +++ b/mm/memcontrol.c
> >>>> @@ -5539,6 +5539,12 @@ static int mem_cgroup_css_online(struct
> >>>> cgroup_subsys_state *css)
> >>>>                 return -ENOMEM;
> >>>>         }
> >>>>
> >>>>
> >>>> +       /*
> >>>> +        * Barrier for CSS_ONLINE, so that shrink_slab_memcg() sees
> >>>> shirnker_maps
> >>>> +        * and shrinker_deferred before CSS_ONLINE.
> >>>> +        */
> >>>> +       smp_mb();
> >>>> +
> >>>>         /* Online state pins memcg ID, memcg ID pins CSS */
> >>>>         refcount_set(&memcg->id.ref, 1);
> >>>>         css_get(css);
> >>>
> >>> smp barriers synchronize data access from different cpus. They should go in a pair.
> >>> In case of you add the smp barrier into mem_cgroup_css_online(), we should also
> >>> add one more smp barrier in another place, which we want to synchonize with this.
> >>> Also, every place should contain a comment referring to its pair: "Pairs with...".
> >>
> >> Thanks, I think you are correct. Looked into it further, it seems the
> >> race pattern looks like:
> >>
> >> CPU A                                                                  CPU B
> >> store shrinker_maps pointer                      load CSS_ONLINE
> >> store CSS_ONLINE                                   load shrinker_maps pointer
> >>
> >> By checking the memory-barriers document, it seems we need write
> >> barrier/read barrier pair as below:
> >>
> >> CPU A                                                                  CPU B
> >> store shrinker_maps pointer                       load CSS_ONLINE
> >> <write barrier>                                             <read barrier>
> >> store CSS_ONLINE                                    load shrinker_maps pointer
> >>
> >>
> >> So, the patch should look like:
> >>
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index df128cab900f..489c0a84f82b 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -5539,6 +5539,13 @@ static int mem_cgroup_css_online(struct
> >> cgroup_subsys_state *css)
> >>                 return -ENOMEM;
> >>         }
> >>
> >> +       /*
> >> +        * Barrier for CSS_ONLINE, so that shrink_slab_memcg() sees
> >> shirnker_maps
> >> +        * and shrinker_deferred before CSS_ONLINE. It pairs with the
> >> read barrier
> >> +        * in shrink_slab_memcg().
> >> +        */
> >> +       smp_wmb();
> > 
> > Is there a reason why the shrinker allocations aren't done in
> > .css_alloc()? That would take care of all necessary ordering:
> 
> The reason is that allocations have to be made in a place, where
> mem-cgroup_iter() can't miss it, since memcg_expand_shrinker_maps()
> shouldn't miss allocated shrinker maps.

I see, because we could have this:

.css_alloc()
  memcg_alloc_shrinker_maps()
    down_read(&shrinker_sem)
    map = alloc(shrinker_nr_max * sizeof(long));
    rcu_assign_pointer(memcg->...->shrinker_map = map);
    up_read(&shrinker_sem);
                                                            register_shrinker()
                                                              down_write(&shrinker_sem)
                                                              shrinker_nr_max = id + 1;
                                                              memcg_expand_shrinker_maps()
                                                                for_each_mem_cgroup()
                                                                  realloc
                                                              up_write(&shrinker_sem)
  list_add_tail_rcu(&css->sibling, &parent->children);

  /* boom: missed new shrinker, map too small */

Thanks for the clarification.

  reply	other threads:[~2020-12-15 16:51 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-02 18:27 [RFC PATCH 0/9] Make shrinker's nr_deferred memcg aware Yang Shi
2020-12-02 18:27 ` [PATCH 1/9] mm: vmscan: simplify nr_deferred update code Yang Shi
2020-12-03  2:56   ` Roman Gushchin
2020-12-02 18:27 ` [PATCH 2/9] mm: vmscan: use nid from shrink_control for tracepoint Yang Shi
2020-12-03  3:13   ` Xiaqing (A)
2020-12-11 19:20     ` Yang Shi
2020-12-02 18:27 ` [PATCH 3/9] mm: memcontrol: rename memcg_shrinker_map_mutex to memcg_shrinker_mutex Yang Shi
2020-12-02 18:27 ` [PATCH 4/9] mm: vmscan: use a new flag to indicate shrinker is registered Yang Shi
2020-12-03  3:01   ` Roman Gushchin
2020-12-03  4:59     ` Yang Shi
2020-12-03 20:08       ` Roman Gushchin
2020-12-03 22:25         ` Yang Shi
2020-12-04 18:52           ` Johannes Weiner
2020-12-04 21:24             ` Yang Shi
2020-12-02 18:27 ` [PATCH 5/9] mm: memcontrol: add per memcg shrinker nr_deferred Yang Shi
2020-12-03  3:06   ` Roman Gushchin
2020-12-03  4:54     ` Yang Shi
2020-12-03 18:03       ` Yang Shi
2020-12-03 20:07         ` Roman Gushchin
2020-12-03 22:49           ` Yang Shi
2020-12-03 23:30             ` Roman Gushchin
2020-12-04  0:22               ` Yang Shi
2020-12-10 15:33   ` Johannes Weiner
2020-12-10 19:12     ` Yang Shi
2020-12-11 17:52       ` Yang Shi
2020-12-10 21:59     ` Yang Shi
2020-12-02 18:27 ` [PATCH 6/9] mm: vmscan: use per memcg nr_deferred of shrinker Yang Shi
2020-12-03  3:08   ` Roman Gushchin
2020-12-03  5:01     ` Yang Shi
2020-12-03 11:40   ` Kirill Tkhai
2020-12-08 17:13     ` Yang Shi
2020-12-09 15:41       ` Kirill Tkhai
2020-12-09 17:32         ` Yang Shi
2020-12-10 15:13           ` Johannes Weiner
2020-12-10 15:17             ` Kirill Tkhai
2020-12-15 16:44               ` Johannes Weiner [this message]
2020-12-02 18:27 ` [PATCH 7/9] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers Yang Shi
2020-12-02 18:27 ` [PATCH 8/9] mm: memcontrol: reparent nr_deferred when memcg offline Yang Shi
2020-12-02 18:27 ` [PATCH 9/9] mm: vmscan: shrink deferred objects proportional to priority Yang Shi
2020-12-03  2:52 ` [RFC PATCH 0/9] Make shrinker's nr_deferred memcg aware Roman Gushchin
2020-12-03 17:52   ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201215164412.GA385334@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=guro@fb.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.