linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Luiz Capitulino <lcapitulino@redhat.com>
To: Glauber Costa <glommer@gmail.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>,
	dchinner@redhat.com, Michal Hocko <mhocko@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, cgroups@vger.kernel.org, devel@openvz.org,
	Glauber Costa <glommer@openvz.org>,
	John Stultz <john.stultz@linaro.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH v14 16/18] vmpressure: in-kernel notifications
Date: Fri, 20 Dec 2013 10:03:32 -0500	[thread overview]
Message-ID: <20131220100332.0c5c1ad5@redhat.com> (raw)
In-Reply-To: <CAA6-i6pDqDemeQ+s4EorOx39qmNAtAfVYfg0Z2wtTEu-S7mY=A@mail.gmail.com>

On Fri, 20 Dec 2013 18:31:02 +0400
Glauber Costa <glommer@gmail.com> wrote:

> > I have the exact problem described above for a project I'm working on
> > and this solution seems to solve it well.
> >
> > However, I had a few issues while trying to use this interface. I'll
> > comment on them below, but please take this more as advice seeking
> > than patch review.
> >
> >> This patch extends that to also support in-kernel users. Events that
> >> should be generated for in-kernel consumption will be marked as such,
> >> and for those, we will call a registered function instead of triggering
> >> an eventfd notification.
> >>
> >> Please note that due to my lack of understanding of each shrinker user,
> >> I will stay away from converting the actual users, you are all welcome
> >> to do so.
> >>
> >> Signed-off-by: Glauber Costa <glommer@openvz.org>
> >> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> >> Acked-by: Anton Vorontsov <anton@enomsg.org>
> >> Acked-by: Pekka Enberg <penberg@kernel.org>
> >> Reviewed-by: Greg Thelen <gthelen@google.com>
> >> Cc: Dave Chinner <dchinner@redhat.com>
> >> Cc: John Stultz <john.stultz@linaro.org>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> Cc: Michal Hocko <mhocko@suse.cz>
> >> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
> >> ---
> >>  include/linux/vmpressure.h |    5 +++++
> >>  mm/vmpressure.c            |   53 +++++++++++++++++++++++++++++++++++++++++---
> >>  2 files changed, 55 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> >> index 3f3788d..9102e53 100644
> >> --- a/include/linux/vmpressure.h
> >> +++ b/include/linux/vmpressure.h
> >> @@ -19,6 +19,9 @@ struct vmpressure {
> >>       /* Have to grab the lock on events traversal or modifications. */
> >>       struct mutex events_lock;
> >>
> >> +     /* False if only kernel users want to be notified, true otherwise. */
> >> +     bool notify_userspace;
> >> +
> >>       struct work_struct work;
> >>  };
> >>
> >> @@ -38,6 +41,8 @@ extern int vmpressure_register_event(struct cgroup_subsys_state *css,
> >>                                    struct cftype *cft,
> >>                                    struct eventfd_ctx *eventfd,
> >>                                    const char *args);
> >> +extern int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
> >> +                                         void (*fn)(void));
> >>  extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
> >>                                       struct cftype *cft,
> >>                                       struct eventfd_ctx *eventfd);
> >> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> >> index e0f6283..730e7c1 100644
> >> --- a/mm/vmpressure.c
> >> +++ b/mm/vmpressure.c
> >> @@ -130,8 +130,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
> >>  }
> >>
> >>  struct vmpressure_event {
> >> -     struct eventfd_ctx *efd;
> >> +     union {
> >> +             struct eventfd_ctx *efd;
> >> +             void (*fn)(void);
> >
> > How does the callback access its private data?
> >
> >> +     };
> >>       enum vmpressure_levels level;
> >> +     bool kernel_event;
> >>       struct list_head node;
> >>  };
> >>
> >> @@ -147,12 +151,15 @@ static bool vmpressure_event(struct vmpressure *vmpr,
> >>       mutex_lock(&vmpr->events_lock);
> >>
> >>       list_for_each_entry(ev, &vmpr->events, node) {
> >> -             if (level >= ev->level) {
> >> +             if (ev->kernel_event) {
> >> +                     ev->fn();
> >
> > I think it would be interesting to pass 'level' to the callback (I'll
> > probably use it myself), but we could wait for a in-tree user before
> > adding it.
> >
> >> +             } else if (vmpr->notify_userspace && level >= ev->level) {
> >>                       eventfd_signal(ev->efd, 1);
> >>                       signalled = true;
> >>               }
> >>       }
> >>
> >> +     vmpr->notify_userspace = false;
> >>       mutex_unlock(&vmpr->events_lock);
> >>
> >>       return signalled;
> >> @@ -222,7 +229,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> >>        * we account it too.
> >>        */
> >>       if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
> >> -             return;
> >> +             goto schedule;
> >>
> >>       /*
> >>        * If we got here with no pages scanned, then that is an indicator
> >> @@ -239,8 +246,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> >>       vmpr->scanned += scanned;
> >>       vmpr->reclaimed += reclaimed;
> >>       scanned = vmpr->scanned;
> >> +     /*
> >> +      * If we didn't reach this point, only kernel events will be triggered.
> >> +      * It is the job of the worker thread to clean this up once the
> >> +      * notifications are all delivered.
> >> +      */
> >> +     vmpr->notify_userspace = true;
> >>       spin_unlock(&vmpr->sr_lock);
> >>
> >> +schedule:
> >>       if (scanned < vmpressure_win)
> >>               return;
> >>       schedule_work(&vmpr->work);
> >> @@ -324,6 +338,39 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
> >>  }
> >>
> >>  /**
> >> + * vmpressure_register_kernel_event() - Register kernel-side notification
> >> + * @css:     css that is interested in vmpressure notifications
> >> + * @fn:              function to be called when pressure happens
> >> + *
> >> + * This function register in-kernel users interested in receiving notifications
> >> + * about pressure conditions. Pressure notifications will be triggered at the
> >> + * same time as userspace notifications (with no particular ordering relative
> >> + * to it).
> >> + *
> >> + * Pressure notifications are a alternative method to shrinkers and will serve
> >> + * well users that are interested in a one-shot notification, with a
> >> + * well-defined cgroup aware interface.
> >> + */
> >> +int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
> >> +                                   void (*fn)(void))
> >> +{
> >> +     struct vmpressure *vmpr = css_to_vmpressure(css);
> >
> > This doesn't allow for css=NULL. What's the recommended way for a today's
> > shrinker (which is not related to cgroups) to register with this API?
> >
> > Also, you don't seem to provide a way to de-register from the event.
> >
> 
> The answer for all of your questions above can be summarized by noting
> that for the lack of other users (at the time), this patch does the bare minimum
> for memcg needs. I agree, for instance, that it would be good to pass the level
> but since memcg won't do anything with thta, I didn't pass it.
> 
> That should be extended if you need to.

That works for me. That is, including this minimal version first and
extending it when we get in-tree users.

> 
> > I hacked a patch to be able to use this, seems to work but it's a ugly
> > hack:
> >
> > ---
> >  include/linux/vmpressure.h |  3 ++-
> >  mm/vmpressure.c            | 13 +++++++++----
> >  2 files changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> > index 9102e53..de416b6 100644
> > --- a/include/linux/vmpressure.h
> > +++ b/include/linux/vmpressure.h
> > @@ -42,7 +42,8 @@ extern int vmpressure_register_event(struct cgroup_subsys_state *css,
> >                                      struct eventfd_ctx *eventfd,
> >                                      const char *args);
> >  extern int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
> > -                                           void (*fn)(void));
> > +                                           void (*fn)(void *data, int level),
> > +                                           void *data);
> >  extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
> >                                         struct cftype *cft,
> >                                         struct eventfd_ctx *eventfd);
> > diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> > index 730e7c1..4ed0e85 100644
> > --- a/mm/vmpressure.c
> > +++ b/mm/vmpressure.c
> > @@ -132,9 +132,10 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
> >  struct vmpressure_event {
> >         union {
> >                 struct eventfd_ctx *efd;
> > -               void (*fn)(void);
> > +               void (*fn)(void *data, int level);
> >         };
> >         enum vmpressure_levels level;
> > +       void *data;
> >         bool kernel_event;
> >         struct list_head node;
> >  };
> > @@ -152,7 +153,7 @@ static bool vmpressure_event(struct vmpressure *vmpr,
> >
> >         list_for_each_entry(ev, &vmpr->events, node) {
> >                 if (ev->kernel_event) {
> > -                       ev->fn();
> > +                       ev->fn(ev->data, level);
> >                 } else if (vmpr->notify_userspace && level >= ev->level) {
> >                         eventfd_signal(ev->efd, 1);
> >                         signalled = true;
> > @@ -352,21 +353,25 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
> >   * well-defined cgroup aware interface.
> >   */
> >  int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
> > -                                     void (*fn)(void))
> > +                                    void (*fn)(void *data, int level), void *data)
> >  {
> > -       struct vmpressure *vmpr = css_to_vmpressure(css);
> > +       struct vmpressure *vmpr;
> >         struct vmpressure_event *ev;
> >
> > +       vmpr = css ? css_to_vmpressure(css) : memcg_to_vmpressure(NULL);
> > +
> >         ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> >         if (!ev)
> >                 return -ENOMEM;
> >
> >         ev->kernel_event = true;
> > +       ev->data = data;
> >         ev->fn = fn;
> >
> >         mutex_lock(&vmpr->events_lock);
> >         list_add(&ev->node, &vmpr->events);
> >         mutex_unlock(&vmpr->events_lock);
> > +
> 
> Your patch makes sense.
> 
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2013-12-20 15:03 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-16 12:16 [PATCH v14 00/18] kmemcg shrinkers Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 01/18] memcg: make cache index determination more robust Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 02/18] memcg: consolidate callers of memcg_cache_id Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 03/18] memcg: move initialization to memcg creation Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 04/18] memcg: make for_each_mem_cgroup macros public Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 05/18] memcg: remove KMEM_ACCOUNTED_ACTIVATED flag Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 06/18] memcg: rework memcg_update_kmem_limit synchronization Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 07/18] list_lru, shrinkers: introduce list_lru_shrink_{count,walk} Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 08/18] fs: consolidate {nr,free}_cached_objects args in shrink_control Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 09/18] vmscan: move call to shrink_slab() to shrink_zones() Vladimir Davydov
2013-12-16 12:16 ` [PATCH v14 10/18] vmscan: remove shrink_control arg from do_try_to_free_pages() Vladimir Davydov
2013-12-16 12:17 ` [PATCH v14 11/18] vmscan: call NUMA-unaware shrinkers irrespective of nodemask Vladimir Davydov
2013-12-16 12:17 ` [PATCH v14 12/18] vmscan: shrink slab on memcg pressure Vladimir Davydov
2013-12-16 12:17 ` [PATCH v14 13/18] vmscan: take at least one pass with shrinkers Vladimir Davydov
2013-12-16 12:17 ` [PATCH v14 14/18] list_lru: add per-memcg lists Vladimir Davydov
2013-12-16 12:17 ` [PATCH v14 15/18] fs: make shrinker memcg aware Vladimir Davydov
2013-12-16 12:17 ` [PATCH v14 16/18] vmpressure: in-kernel notifications Vladimir Davydov
2013-12-20 14:26   ` Luiz Capitulino
2013-12-20 14:31     ` Glauber Costa
2013-12-20 14:32       ` Glauber Costa
2013-12-20 14:36       ` Vladimir Davydov
2013-12-20 15:03       ` Luiz Capitulino [this message]
2013-12-20 16:44         ` Luiz Capitulino
2013-12-20 16:46           ` Glauber Costa
2013-12-20 16:53             ` Luiz Capitulino
2013-12-20 16:58               ` Glauber Costa
2013-12-20 17:00                 ` Luiz Capitulino
2013-12-16 12:17 ` [PATCH v14 17/18] memcg: reap dead memcgs upon global memory pressure Vladimir Davydov
2013-12-16 12:17 ` [PATCH v14 18/18] memcg: flush memcg items upon memcg destruction Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131220100332.0c5c1ad5@redhat.com \
    --to=lcapitulino@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=dchinner@redhat.com \
    --cc=devel@openvz.org \
    --cc=glommer@gmail.com \
    --cc=glommer@openvz.org \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=john.stultz@linaro.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).