* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 8:22 ` [PATCH 1/2] vmpressure: in-kernel notifications Glauber Costa
@ 2013-04-23 17:11 ` Anton Vorontsov
2013-04-23 18:17 ` Glauber Costa
2013-04-23 19:13 ` Pekka Enberg
` (3 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Anton Vorontsov @ 2013-04-23 17:11 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, John Stultz,
Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner
On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
> From: Glauber Costa <glommer@parallels.com>
[...]
> This patch extends that to also support in-kernel users.
Yup, that is the next logical step. ;-) The patches look good to me, just
one question...
> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> * we account it too.
> */
> if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
I wonder if we want to let kernel users to specify the gfp mask here? The
current mask is good for userspace notifications, but in-kernel users
might be interested in including (or excluding) different types of
allocations, e.g. watch only for DMA allocations pressure?
Thanks!
Anton
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 17:11 ` Anton Vorontsov
@ 2013-04-23 18:17 ` Glauber Costa
0 siblings, 0 replies; 19+ messages in thread
From: Glauber Costa @ 2013-04-23 18:17 UTC (permalink / raw)
To: Anton Vorontsov
Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
Johannes Weiner
On 04/23/2013 10:11 AM, Anton Vorontsov wrote:
> On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
>> From: Glauber Costa <glommer@parallels.com>
> [...]
>> This patch extends that to also support in-kernel users.
>
> Yup, that is the next logical step. ;-) The patches look good to me, just
> one question...
>
>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>> * we account it too.
>> */
>> if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>
> I wonder if we want to let kernel users to specify the gfp mask here? The
> current mask is good for userspace notifications, but in-kernel users
> might be interested in including (or excluding) different types of
> allocations, e.g. watch only for DMA allocations pressure?
>
That is outside of the scope of this patch anyway. For this one, if you
believe it is good, could I have your tag? =)
But answering your question regardless of the scope, I believe the
context of the allocation is an implementation detail of the kernel -
regardless of how widely understood it is. The thing I like the most
about your work, is precisely the fact that is hides the implementation
details so well.
So unless there is a strong use case that would benefit from it, I am
inclined to say this is not wanted.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 8:22 ` [PATCH 1/2] vmpressure: in-kernel notifications Glauber Costa
2013-04-23 17:11 ` Anton Vorontsov
@ 2013-04-23 19:13 ` Pekka Enberg
2013-04-23 20:24 ` Anton Vorontsov
` (2 subsequent siblings)
4 siblings, 0 replies; 19+ messages in thread
From: Pekka Enberg @ 2013-04-23 19:13 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-mm@kvack.org, <cgroups@vger.kernel.org>,
Andrew Morton, Dave Chinner, Anton Vorontsov, John Stultz,
Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner
On Tue, Apr 23, 2013 at 11:22 AM, Glauber Costa <glommer@openvz.org> wrote:
> From: Glauber Costa <glommer@parallels.com>
>
> During the past weeks, it became clear to us that the shrinker interface
> we have right now works very well for some particular types of users,
> but not that well for others. The later are usually people interested in
> one-shot notifications, that were forced to adapt themselves to the
> count+scan behavior of shrinkers. To do so, they had no choice than to
> greatly abuse the shrinker interface producing little monsters all over.
>
> During LSF/MM, one of the proposals that popped out during our session
> was to reuse Anton Voronstsov's vmpressure for this. They are designed
> for userspace consumption, but also provide a well-stablished,
> cgroup-aware entry point for notifications.
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
>
> Please note that due to my lack of understanding of each shrinker user,
> I will stay away from converting the actual users, you are all welcome
> to do so.
>
> Signed-off-by: Glauber Costa <glommer@openvz.org>
Looks good to me.
Acked-by: Pekka Enberg <penberg@kernel.org>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 8:22 ` [PATCH 1/2] vmpressure: in-kernel notifications Glauber Costa
2013-04-23 17:11 ` Anton Vorontsov
2013-04-23 19:13 ` Pekka Enberg
@ 2013-04-23 20:24 ` Anton Vorontsov
2013-04-23 21:01 ` Anton Vorontsov
` (2 more replies)
2013-04-24 7:21 ` Greg Thelen
2013-04-24 19:42 ` Greg Thelen
4 siblings, 3 replies; 19+ messages in thread
From: Anton Vorontsov @ 2013-04-23 20:24 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, John Stultz,
Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner
On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
> From: Glauber Costa <glommer@parallels.com>
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
Just a couple more questions... :-)
[...]
> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> * through vmpressure_prio(). But so far, keep calm.
> */
> if (!scanned)
> - return;
> + goto schedule;
>
> mutex_lock(&vmpr->sr_lock);
> vmpr->scanned += scanned;
> vmpr->reclaimed += reclaimed;
> + vmpr->notify_userspace = true;
Setting the variable on every event seems a bit wasteful... does it make
sense to set it in vmpressure_register_event()? We'll have to make it a
counter, but the good thing is that we won't need any additional locks for
the counter.
> /**
> + * vmpressure_register_kernel_event() - Register kernel-side notification
Why don't we need the unregister function? I see that the memcg portion
deals with dangling memcgs, but do they dangle forver?
Oh, and a few cosmetic changes down below...
Other than that, this particular patch looks perfect, feel free to add my:
Acked-by: Anton Vorontsov <anton@enomsg.org>
Thanks!
Anton
diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 1862012..3131e72 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -19,7 +19,7 @@ struct vmpressure {
/* Have to grab the lock on events traversal or modifications. */
struct mutex events_lock;
- /* false if only kernel users want to be notified, true otherwise */
+ /* False if only kernel users want to be notified, true otherwise. */
bool notify_userspace;
struct work_struct work;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 8d77ad0..acd3e66 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -156,9 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
mutex_lock(&vmpr->events_lock);
list_for_each_entry(ev, &vmpr->events, node) {
- if (ev->kernel_event)
+ if (ev->kernel_event) {
ev->fn();
- else if (vmpr->notify_userspace && (level >= ev->level)) {
+ } else if (vmpr->notify_userspace && level >= ev->level) {
eventfd_signal(ev->efd, 1);
signalled = true;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 20:24 ` Anton Vorontsov
@ 2013-04-23 21:01 ` Anton Vorontsov
2013-04-24 6:26 ` Glauber Costa
2013-04-24 11:20 ` Glauber Costa
2 siblings, 0 replies; 19+ messages in thread
From: Anton Vorontsov @ 2013-04-23 21:01 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, John Stultz,
Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner
On Tue, Apr 23, 2013 at 04:24:46PM -0400, Anton Vorontsov wrote:
[...]
> > /**
> > + * vmpressure_register_kernel_event() - Register kernel-side notification
>
> Why don't we need the unregister function? I see that the memcg portion
> deals with dangling memcgs, but do they dangle forver?
Oh, I got it. vmpressure_unregister_event() will unregister all the events
anyway. Cool.
Thanks!
Anton
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 20:24 ` Anton Vorontsov
2013-04-23 21:01 ` Anton Vorontsov
@ 2013-04-24 6:26 ` Glauber Costa
2013-04-24 11:20 ` Glauber Costa
2 siblings, 0 replies; 19+ messages in thread
From: Glauber Costa @ 2013-04-24 6:26 UTC (permalink / raw)
To: Anton Vorontsov
Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
Johannes Weiner
On 04/24/2013 12:24 AM, Anton Vorontsov wrote:
> On Tue, Apr 23, 2013 at 12:22:08PM +0400, Glauber Costa wrote:
>> From: Glauber Costa <glommer@parallels.com>
>>
>> This patch extends that to also support in-kernel users. Events that
>> should be generated for in-kernel consumption will be marked as such,
>> and for those, we will call a registered function instead of triggering
>> an eventfd notification.
>
> Just a couple more questions... :-)
>
> [...]
>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>> * through vmpressure_prio(). But so far, keep calm.
>> */
>> if (!scanned)
>> - return;
>> + goto schedule;
>>
>> mutex_lock(&vmpr->sr_lock);
>> vmpr->scanned += scanned;
>> vmpr->reclaimed += reclaimed;
>> + vmpr->notify_userspace = true;
>
> Setting the variable on every event seems a bit wasteful... does it make
> sense to set it in vmpressure_register_event()? We'll have to make it a
> counter, but the good thing is that we won't need any additional locks for
> the counter.
>
Yes, vmpressure_register_event would be a better place for it. I will
change and keep the acks, since it does not change the spirit of the
patch too much.
I will also apply the cosmetics you attached. Thanks
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 20:24 ` Anton Vorontsov
2013-04-23 21:01 ` Anton Vorontsov
2013-04-24 6:26 ` Glauber Costa
@ 2013-04-24 11:20 ` Glauber Costa
2 siblings, 0 replies; 19+ messages in thread
From: Glauber Costa @ 2013-04-24 11:20 UTC (permalink / raw)
To: Anton Vorontsov
Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
Johannes Weiner
On 04/24/2013 12:24 AM, Anton Vorontsov wrote:
> Setting the variable on every event seems a bit wasteful... does it make
> sense to set it in vmpressure_register_event()? We'll have to make it a
> counter, but the good thing is that we won't need any additional locks for
> the counter.
My bad, I was not looking at the code.
There are two variables here:
One of them is an event variable: kernel_event. It is set to true upon
registration when we are registering a kernel event. We use it to decide
whether we should call a function or signal eventfd for that event.
The other, is a vmpr event and should indeed, be set in all pressure
isntances. It tells us if we should only trigger kernel events (if
false), or userspace events as well (if true).
This is because, for instance, userspace events may not always be
triggered (for instance, due to flags mismatch)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 8:22 ` [PATCH 1/2] vmpressure: in-kernel notifications Glauber Costa
` (2 preceding siblings ...)
2013-04-23 20:24 ` Anton Vorontsov
@ 2013-04-24 7:21 ` Greg Thelen
2013-04-24 8:36 ` Glauber Costa
2013-04-24 19:42 ` Greg Thelen
4 siblings, 1 reply; 19+ messages in thread
From: Greg Thelen @ 2013-04-24 7:21 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, Anton Vorontsov,
John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
Johannes Weiner
On Tue, Apr 23 2013, Glauber Costa wrote:
> From: Glauber Costa <glommer@parallels.com>
>
> During the past weeks, it became clear to us that the shrinker interface
> we have right now works very well for some particular types of users,
> but not that well for others. The later are usually people interested in
> one-shot notifications, that were forced to adapt themselves to the
> count+scan behavior of shrinkers. To do so, they had no choice than to
> greatly abuse the shrinker interface producing little monsters all over.
>
> During LSF/MM, one of the proposals that popped out during our session
> was to reuse Anton Voronstsov's vmpressure for this. They are designed
> for userspace consumption, but also provide a well-stablished,
> cgroup-aware entry point for notifications.
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
>
> Please note that due to my lack of understanding of each shrinker user,
> I will stay away from converting the actual users, you are all welcome
> to do so.
>
> Signed-off-by: Glauber Costa <glommer@openvz.org>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Joonsoo Kim <js1304@gmail.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
> include/linux/vmpressure.h | 6 ++++++
> mm/vmpressure.c | 48 ++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> index 76be077..1862012 100644
> --- a/include/linux/vmpressure.h
> +++ b/include/linux/vmpressure.h
> @@ -19,6 +19,9 @@ struct vmpressure {
> /* Have to grab the lock on events traversal or modifications. */
> struct mutex events_lock;
>
> + /* false if only kernel users want to be notified, true otherwise */
> + bool notify_userspace;
> +
> struct work_struct work;
> };
>
> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
> extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
> struct eventfd_ctx *eventfd,
> const char *args);
> +
> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
> + void (*fn)(void));
> extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
> struct eventfd_ctx *eventfd);
> #else
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 736a601..8d77ad0 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
> }
>
> struct vmpressure_event {
> - struct eventfd_ctx *efd;
> + union {
> + struct eventfd_ctx *efd;
> + void (*fn)(void);
> + };
> enum vmpressure_levels level;
> + bool kernel_event;
> struct list_head node;
> };
>
> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
> mutex_lock(&vmpr->events_lock);
>
> list_for_each_entry(ev, &vmpr->events, node) {
> - if (level >= ev->level) {
> + if (ev->kernel_event)
> + ev->fn();
> + else if (vmpr->notify_userspace && (level >= ev->level)) {
> eventfd_signal(ev->efd, 1);
> signalled = true;
> }
> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> * we account it too.
> */
> if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
> - return;
> + goto schedule;
>
> /*
> * If we got here with no pages scanned, then that is an indicator
> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> * through vmpressure_prio(). But so far, keep calm.
> */
> if (!scanned)
> - return;
> + goto schedule;
If goto schedule is taken here then scanned==0. Then
scanned<vmpressure_win (below), so this function would always simply
return. So this change seems like a no-op. Should the schedule: below
be just before schedule_work(&vmpr->work)? But this wouldn't do much
either because vmpressure_work_fn() would immediately return if
vmpr->scanned==0. Presumable the idea is to avoid notifying user space
or kernel callbacks if lru pages are not scanned - at least until
vmpressure_prio() is called with a priority more desperate than
vmpressure_level_critical_prio at which time this function's scanned!=0.
>
> mutex_lock(&vmpr->sr_lock);
> vmpr->scanned += scanned;
> vmpr->reclaimed += reclaimed;
> + vmpr->notify_userspace = true;
> scanned = vmpr->scanned;
> mutex_unlock(&vmpr->sr_lock);
>
> +schedule:
> if (scanned < vmpressure_win || work_pending(&vmpr->work))
> return;
> schedule_work(&vmpr->work);
> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
> }
>
> /**
> + * vmpressure_register_kernel_event() - Register kernel-side notification
> + * @cg: cgroup that is interested in vmpressure notifications
> + * @fn: function to be called when pressure happens
> + *
> + * This function register in-kernel users interested in receiving notifications
> + * about pressure conditions. Pressure notifications will be triggered at the
> + * same time as userspace notifications (with no particular ordering relative
> + * to it).
> + *
> + * Pressure notifications are a alternative method to shrinkers and will serve
> + * well users that are interested in a one-shot notification, with a
> + * well-defined cgroup aware interface.
> + */
> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
It seems useful to include the "struct cgroup *" as a parameter to fn.
This would allow for fn to shrink objects it's caching in the cgroup.
Also, why not allow level specification for kernel events?
It might be neat if vmpressure_register_event() used
vmpressure_register_kernel_event() with a callback function calls
eventfd_signal(). This would allow for a uniform event notification
type which is agnostic of user vs kernel. However, as proposed there
are different signaling conditions. So I'm not sure it's worth the time
to combine the even types. So feel free to ignore this paragraph.
> +{
> + struct vmpressure *vmpr = cg_to_vmpressure(cg);
> + struct vmpressure_event *ev;
> +
> + ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> + if (!ev)
> + return -ENOMEM;
> +
> + ev->kernel_event = true;
> + ev->fn = fn;
> +
> + mutex_lock(&vmpr->events_lock);
> + list_add(&ev->node, &vmpr->events);
> + mutex_unlock(&vmpr->events_lock);
> + return 0;
> +}
> +
> +/**
> * vmpressure_unregister_event() - Unbind eventfd from vmpressure
> * @cg: cgroup handle
> * @cft: cgroup control files handle
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-24 7:21 ` Greg Thelen
@ 2013-04-24 8:36 ` Glauber Costa
2013-04-24 19:35 ` Greg Thelen
0 siblings, 1 reply; 19+ messages in thread
From: Glauber Costa @ 2013-04-24 8:36 UTC (permalink / raw)
To: Greg Thelen
Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
Kamezawa Hiroyuki, Johannes Weiner
On 04/24/2013 11:21 AM, Greg Thelen wrote:
> On Tue, Apr 23 2013, Glauber Costa wrote:
>
>> From: Glauber Costa <glommer@parallels.com>
>>
>> During the past weeks, it became clear to us that the shrinker interface
>> we have right now works very well for some particular types of users,
>> but not that well for others. The later are usually people interested in
>> one-shot notifications, that were forced to adapt themselves to the
>> count+scan behavior of shrinkers. To do so, they had no choice than to
>> greatly abuse the shrinker interface producing little monsters all over.
>>
>> During LSF/MM, one of the proposals that popped out during our session
>> was to reuse Anton Voronstsov's vmpressure for this. They are designed
>> for userspace consumption, but also provide a well-stablished,
>> cgroup-aware entry point for notifications.
>>
>> This patch extends that to also support in-kernel users. Events that
>> should be generated for in-kernel consumption will be marked as such,
>> and for those, we will call a registered function instead of triggering
>> an eventfd notification.
>>
>> Please note that due to my lack of understanding of each shrinker user,
>> I will stay away from converting the actual users, you are all welcome
>> to do so.
>>
>> Signed-off-by: Glauber Costa <glommer@openvz.org>
>> Cc: Dave Chinner <david@fromorbit.com>
>> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Joonsoo Kim <js1304@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.cz>
>> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>> include/linux/vmpressure.h | 6 ++++++
>> mm/vmpressure.c | 48 ++++++++++++++++++++++++++++++++++++++++++----
>> 2 files changed, 50 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
>> index 76be077..1862012 100644
>> --- a/include/linux/vmpressure.h
>> +++ b/include/linux/vmpressure.h
>> @@ -19,6 +19,9 @@ struct vmpressure {
>> /* Have to grab the lock on events traversal or modifications. */
>> struct mutex events_lock;
>>
>> + /* false if only kernel users want to be notified, true otherwise */
>> + bool notify_userspace;
>> +
>> struct work_struct work;
>> };
>>
>> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>> extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>> struct eventfd_ctx *eventfd,
>> const char *args);
>> +
>> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
>> + void (*fn)(void));
>> extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>> struct eventfd_ctx *eventfd);
>> #else
>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>> index 736a601..8d77ad0 100644
>> --- a/mm/vmpressure.c
>> +++ b/mm/vmpressure.c
>> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>> }
>>
>> struct vmpressure_event {
>> - struct eventfd_ctx *efd;
>> + union {
>> + struct eventfd_ctx *efd;
>> + void (*fn)(void);
>> + };
>> enum vmpressure_levels level;
>> + bool kernel_event;
>> struct list_head node;
>> };
>>
>> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>> mutex_lock(&vmpr->events_lock);
>>
>> list_for_each_entry(ev, &vmpr->events, node) {
>> - if (level >= ev->level) {
>> + if (ev->kernel_event)
>> + ev->fn();
>> + else if (vmpr->notify_userspace && (level >= ev->level)) {
>> eventfd_signal(ev->efd, 1);
>> signalled = true;
>> }
>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>> * we account it too.
>> */
>> if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>> - return;
>> + goto schedule;
>>
>> /*
>> * If we got here with no pages scanned, then that is an indicator
>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>> * through vmpressure_prio(). But so far, keep calm.
>> */
>> if (!scanned)
>> - return;
>> + goto schedule;
>
> If goto schedule is taken here then scanned==0. Then
> scanned<vmpressure_win (below), so this function would always simply
> return. So this change seems like a no-op. Should the schedule: below
> be just before schedule_work(&vmpr->work)? But this wouldn't do much
> either because vmpressure_work_fn() would immediately return if
> vmpr->scanned==0. Presumable the idea is to avoid notifying user space
> or kernel callbacks if lru pages are not scanned - at least until
> vmpressure_prio() is called with a priority more desperate than
> vmpressure_level_critical_prio at which time this function's scanned!=0.
>
Yes, the idea is to avoid calling the callbacks. I can just return at
this point if you prefer. I figured that jumping to the common entry
point would be more consistent, only that. I don't care either way.
>>
>> mutex_lock(&vmpr->sr_lock);
>> vmpr->scanned += scanned;
>> vmpr->reclaimed += reclaimed;
>> + vmpr->notify_userspace = true;
>> scanned = vmpr->scanned;
>> mutex_unlock(&vmpr->sr_lock);
>>
>> +schedule:
>> if (scanned < vmpressure_win || work_pending(&vmpr->work))
>> return;
>> schedule_work(&vmpr->work);
>> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>> }
>>
>> /**
>> + * vmpressure_register_kernel_event() - Register kernel-side notification
>> + * @cg: cgroup that is interested in vmpressure notifications
>> + * @fn: function to be called when pressure happens
>> + *
>> + * This function register in-kernel users interested in receiving notifications
>> + * about pressure conditions. Pressure notifications will be triggered at the
>> + * same time as userspace notifications (with no particular ordering relative
>> + * to it).
>> + *
>> + * Pressure notifications are a alternative method to shrinkers and will serve
>> + * well users that are interested in a one-shot notification, with a
>> + * well-defined cgroup aware interface.
>> + */
>> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
>
> It seems useful to include the "struct cgroup *" as a parameter to fn.
> This would allow for fn to shrink objects it's caching in the cgroup.
>
> Also, why not allow level specification for kernel events?
>
Because I don't want to overdesign. This is a in-kernel API, so we can
change it if we want to. There is only one user, and that is called from
the root cgroup, without level distinction.
The cgroup argument makes sense, but I would rather leave it as is for
now. As for levels, it might make sense as well, but I would much rather
leave the implementation to someone actually using them - specially
since this is not a simple parameter passing.
> It might be neat if vmpressure_register_event() used
> vmpressure_register_kernel_event() with a callback function calls
> eventfd_signal(). This would allow for a uniform event notification
> type which is agnostic of user vs kernel. However, as proposed there
> are different signaling conditions. So I'm not sure it's worth the time
> to combine the even types. So feel free to ignore this paragraph.
>
I don't think it is worth it.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-24 8:36 ` Glauber Costa
@ 2013-04-24 19:35 ` Greg Thelen
0 siblings, 0 replies; 19+ messages in thread
From: Greg Thelen @ 2013-04-24 19:35 UTC (permalink / raw)
To: Glauber Costa
Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
Kamezawa Hiroyuki, Johannes Weiner
On Wed, Apr 24 2013, Glauber Costa wrote:
> On 04/24/2013 11:21 AM, Greg Thelen wrote:
>> On Tue, Apr 23 2013, Glauber Costa wrote:
>>
>>> From: Glauber Costa <glommer@parallels.com>
>>>
>>> During the past weeks, it became clear to us that the shrinker interface
>>> we have right now works very well for some particular types of users,
>>> but not that well for others. The later are usually people interested in
>>> one-shot notifications, that were forced to adapt themselves to the
>>> count+scan behavior of shrinkers. To do so, they had no choice than to
>>> greatly abuse the shrinker interface producing little monsters all over.
>>>
>>> During LSF/MM, one of the proposals that popped out during our session
>>> was to reuse Anton Voronstsov's vmpressure for this. They are designed
>>> for userspace consumption, but also provide a well-stablished,
>>> cgroup-aware entry point for notifications.
>>>
>>> This patch extends that to also support in-kernel users. Events that
>>> should be generated for in-kernel consumption will be marked as such,
>>> and for those, we will call a registered function instead of triggering
>>> an eventfd notification.
>>>
>>> Please note that due to my lack of understanding of each shrinker user,
>>> I will stay away from converting the actual users, you are all welcome
>>> to do so.
>>>
>>> Signed-off-by: Glauber Costa <glommer@openvz.org>
>>> Cc: Dave Chinner <david@fromorbit.com>
>>> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
>>> Cc: John Stultz <john.stultz@linaro.org>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Joonsoo Kim <js1304@gmail.com>
>>> Cc: Michal Hocko <mhocko@suse.cz>
>>> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> ---
>>> include/linux/vmpressure.h | 6 ++++++
>>> mm/vmpressure.c | 48 ++++++++++++++++++++++++++++++++++++++++++----
>>> 2 files changed, 50 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
>>> index 76be077..1862012 100644
>>> --- a/include/linux/vmpressure.h
>>> +++ b/include/linux/vmpressure.h
>>> @@ -19,6 +19,9 @@ struct vmpressure {
>>> /* Have to grab the lock on events traversal or modifications. */
>>> struct mutex events_lock;
>>>
>>> + /* false if only kernel users want to be notified, true otherwise */
>>> + bool notify_userspace;
>>> +
>>> struct work_struct work;
>>> };
>>>
>>> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>>> extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>> struct eventfd_ctx *eventfd,
>>> const char *args);
>>> +
>>> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
>>> + void (*fn)(void));
>>> extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>>> struct eventfd_ctx *eventfd);
>>> #else
>>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>>> index 736a601..8d77ad0 100644
>>> --- a/mm/vmpressure.c
>>> +++ b/mm/vmpressure.c
>>> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>>> }
>>>
>>> struct vmpressure_event {
>>> - struct eventfd_ctx *efd;
>>> + union {
>>> + struct eventfd_ctx *efd;
>>> + void (*fn)(void);
>>> + };
>>> enum vmpressure_levels level;
>>> + bool kernel_event;
>>> struct list_head node;
>>> };
>>>
>>> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>>> mutex_lock(&vmpr->events_lock);
>>>
>>> list_for_each_entry(ev, &vmpr->events, node) {
>>> - if (level >= ev->level) {
>>> + if (ev->kernel_event)
>>> + ev->fn();
>>> + else if (vmpr->notify_userspace && (level >= ev->level)) {
>>> eventfd_signal(ev->efd, 1);
>>> signalled = true;
>>> }
>>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>> * we account it too.
>>> */
>>> if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>>> - return;
>>> + goto schedule;
>>>
>>> /*
>>> * If we got here with no pages scanned, then that is an indicator
>>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>>> * through vmpressure_prio(). But so far, keep calm.
>>> */
>>> if (!scanned)
>>> - return;
>>> + goto schedule;
>>
>> If goto schedule is taken here then scanned==0. Then
>> scanned<vmpressure_win (below), so this function would always simply
>> return. So this change seems like a no-op. Should the schedule: below
>> be just before schedule_work(&vmpr->work)? But this wouldn't do much
>> either because vmpressure_work_fn() would immediately return if
>> vmpr->scanned==0. Presumable the idea is to avoid notifying user space
>> or kernel callbacks if lru pages are not scanned - at least until
>> vmpressure_prio() is called with a priority more desperate than
>> vmpressure_level_critical_prio at which time this function's scanned!=0.
>>
>
> Yes, the idea is to avoid calling the callbacks. I can just return at
> this point if you prefer. I figured that jumping to the common entry
> point would be more consistent, only that. I don't care either way.
Leave it as is. I don't really care either way.
>>> mutex_lock(&vmpr->sr_lock);
>>> vmpr->scanned += scanned;
>>> vmpr->reclaimed += reclaimed;
>>> + vmpr->notify_userspace = true;
>>> scanned = vmpr->scanned;
>>> mutex_unlock(&vmpr->sr_lock);
>>>
>>> +schedule:
>>> if (scanned < vmpressure_win || work_pending(&vmpr->work))
>>> return;
>>> schedule_work(&vmpr->work);
>>> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>>> }
>>>
>>> /**
>>> + * vmpressure_register_kernel_event() - Register kernel-side notification
>>> + * @cg: cgroup that is interested in vmpressure notifications
>>> + * @fn: function to be called when pressure happens
>>> + *
>>> + * This function register in-kernel users interested in receiving notifications
>>> + * about pressure conditions. Pressure notifications will be triggered at the
>>> + * same time as userspace notifications (with no particular ordering relative
>>> + * to it).
>>> + *
>>> + * Pressure notifications are a alternative method to shrinkers and will serve
>>> + * well users that are interested in a one-shot notification, with a
>>> + * well-defined cgroup aware interface.
>>> + */
>>> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
>>
>> It seems useful to include the "struct cgroup *" as a parameter to fn.
>> This would allow for fn to shrink objects it's caching in the cgroup.
>>
>> Also, why not allow level specification for kernel events?
>>
> Because I don't want to overdesign. This is a in-kernel API, so we can
> change it if we want to. There is only one user, and that is called from
> the root cgroup, without level distinction.
>
> The cgroup argument makes sense, but I would rather leave it as is for
> now. As for levels, it might make sense as well, but I would much rather
> leave the implementation to someone actually using them - specially
> since this is not a simple parameter passing.
If there's going to be a single global listener for now, then I agree.
Leave your change as-is.
>> It might be neat if vmpressure_register_event() used
>> vmpressure_register_kernel_event() with a callback function calls
>> eventfd_signal(). This would allow for a uniform event notification
>> type which is agnostic of user vs kernel. However, as proposed there
>> are different signaling conditions. So I'm not sure it's worth the time
>> to combine the even types. So feel free to ignore this paragraph.
>>
>
> I don't think it is worth it.
Fine with me.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-23 8:22 ` [PATCH 1/2] vmpressure: in-kernel notifications Glauber Costa
` (3 preceding siblings ...)
2013-04-24 7:21 ` Greg Thelen
@ 2013-04-24 19:42 ` Greg Thelen
2013-04-24 20:04 ` Glauber Costa
2013-04-25 10:50 ` Glauber Costa
4 siblings, 2 replies; 19+ messages in thread
From: Greg Thelen @ 2013-04-24 19:42 UTC (permalink / raw)
To: Glauber Costa
Cc: linux-mm, cgroups, Andrew Morton, Dave Chinner, Anton Vorontsov,
John Stultz, Joonsoo Kim, Michal Hocko, Kamezawa Hiroyuki,
Johannes Weiner
On Tue, Apr 23 2013, Glauber Costa wrote:
> From: Glauber Costa <glommer@parallels.com>
>
> During the past weeks, it became clear to us that the shrinker interface
> we have right now works very well for some particular types of users,
> but not that well for others. The later are usually people interested in
> one-shot notifications, that were forced to adapt themselves to the
> count+scan behavior of shrinkers. To do so, they had no choice than to
> greatly abuse the shrinker interface producing little monsters all over.
>
> During LSF/MM, one of the proposals that popped out during our session
> was to reuse Anton Voronstsov's vmpressure for this. They are designed
> for userspace consumption, but also provide a well-stablished,
> cgroup-aware entry point for notifications.
>
> This patch extends that to also support in-kernel users. Events that
> should be generated for in-kernel consumption will be marked as such,
> and for those, we will call a registered function instead of triggering
> an eventfd notification.
>
> Please note that due to my lack of understanding of each shrinker user,
> I will stay away from converting the actual users, you are all welcome
> to do so.
>
> Signed-off-by: Glauber Costa <glommer@openvz.org>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Joonsoo Kim <js1304@gmail.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> ---
> include/linux/vmpressure.h | 6 ++++++
> mm/vmpressure.c | 48 ++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
> index 76be077..1862012 100644
> --- a/include/linux/vmpressure.h
> +++ b/include/linux/vmpressure.h
> @@ -19,6 +19,9 @@ struct vmpressure {
> /* Have to grab the lock on events traversal or modifications. */
> struct mutex events_lock;
>
> + /* false if only kernel users want to be notified, true otherwise */
> + bool notify_userspace;
> +
> struct work_struct work;
> };
>
> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
> extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
> struct eventfd_ctx *eventfd,
> const char *args);
> +
> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
> + void (*fn)(void));
> extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
> struct eventfd_ctx *eventfd);
> #else
> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> index 736a601..8d77ad0 100644
> --- a/mm/vmpressure.c
> +++ b/mm/vmpressure.c
> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
> }
>
> struct vmpressure_event {
> - struct eventfd_ctx *efd;
> + union {
> + struct eventfd_ctx *efd;
> + void (*fn)(void);
> + };
> enum vmpressure_levels level;
> + bool kernel_event;
> struct list_head node;
> };
>
> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
> mutex_lock(&vmpr->events_lock);
>
> list_for_each_entry(ev, &vmpr->events, node) {
> - if (level >= ev->level) {
> + if (ev->kernel_event)
> + ev->fn();
> + else if (vmpr->notify_userspace && (level >= ev->level)) {
> eventfd_signal(ev->efd, 1);
> signalled = true;
> }
> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> * we account it too.
> */
> if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
> - return;
> + goto schedule;
>
> /*
> * If we got here with no pages scanned, then that is an indicator
> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
> * through vmpressure_prio(). But so far, keep calm.
> */
> if (!scanned)
> - return;
> + goto schedule;
>
> mutex_lock(&vmpr->sr_lock);
> vmpr->scanned += scanned;
> vmpr->reclaimed += reclaimed;
> + vmpr->notify_userspace = true;
Should notify_userspace get cleared sometime? Seems like we might need
to clear or decrement notify_userspace in vmpressure_event() when
calling eventfd_signal().
> scanned = vmpr->scanned;
> mutex_unlock(&vmpr->sr_lock);
>
> +schedule:
> if (scanned < vmpressure_win || work_pending(&vmpr->work))
> return;
> schedule_work(&vmpr->work);
> @@ -328,6 +336,38 @@ int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
> }
>
> /**
> + * vmpressure_register_kernel_event() - Register kernel-side notification
> + * @cg: cgroup that is interested in vmpressure notifications
> + * @fn: function to be called when pressure happens
> + *
> + * This function register in-kernel users interested in receiving notifications
> + * about pressure conditions. Pressure notifications will be triggered at the
> + * same time as userspace notifications (with no particular ordering relative
> + * to it).
> + *
> + * Pressure notifications are a alternative method to shrinkers and will serve
> + * well users that are interested in a one-shot notification, with a
> + * well-defined cgroup aware interface.
> + */
> +int vmpressure_register_kernel_event(struct cgroup *cg, void (*fn)(void))
> +{
> + struct vmpressure *vmpr = cg_to_vmpressure(cg);
> + struct vmpressure_event *ev;
> +
> + ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> + if (!ev)
> + return -ENOMEM;
> +
> + ev->kernel_event = true;
> + ev->fn = fn;
> +
> + mutex_lock(&vmpr->events_lock);
> + list_add(&ev->node, &vmpr->events);
> + mutex_unlock(&vmpr->events_lock);
> + return 0;
> +}
> +
> +/**
> * vmpressure_unregister_event() - Unbind eventfd from vmpressure
> * @cg: cgroup handle
> * @cft: cgroup control files handle
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-24 19:42 ` Greg Thelen
@ 2013-04-24 20:04 ` Glauber Costa
2013-04-25 10:50 ` Glauber Costa
1 sibling, 0 replies; 19+ messages in thread
From: Glauber Costa @ 2013-04-24 20:04 UTC (permalink / raw)
To: Greg Thelen
Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
Kamezawa Hiroyuki, Johannes Weiner
On 04/24/2013 11:42 PM, Greg Thelen wrote:
> On Tue, Apr 23 2013, Glauber Costa wrote:
>
>> From: Glauber Costa <glommer@parallels.com>
>>
>> During the past weeks, it became clear to us that the shrinker interface
>> we have right now works very well for some particular types of users,
>> but not that well for others. The later are usually people interested in
>> one-shot notifications, that were forced to adapt themselves to the
>> count+scan behavior of shrinkers. To do so, they had no choice than to
>> greatly abuse the shrinker interface producing little monsters all over.
>>
>> During LSF/MM, one of the proposals that popped out during our session
>> was to reuse Anton Voronstsov's vmpressure for this. They are designed
>> for userspace consumption, but also provide a well-stablished,
>> cgroup-aware entry point for notifications.
>>
>> This patch extends that to also support in-kernel users. Events that
>> should be generated for in-kernel consumption will be marked as such,
>> and for those, we will call a registered function instead of triggering
>> an eventfd notification.
>>
>> Please note that due to my lack of understanding of each shrinker user,
>> I will stay away from converting the actual users, you are all welcome
>> to do so.
>>
>> Signed-off-by: Glauber Costa <glommer@openvz.org>
>> Cc: Dave Chinner <david@fromorbit.com>
>> Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Joonsoo Kim <js1304@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.cz>
>> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> ---
>> include/linux/vmpressure.h | 6 ++++++
>> mm/vmpressure.c | 48 ++++++++++++++++++++++++++++++++++++++++++----
>> 2 files changed, 50 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
>> index 76be077..1862012 100644
>> --- a/include/linux/vmpressure.h
>> +++ b/include/linux/vmpressure.h
>> @@ -19,6 +19,9 @@ struct vmpressure {
>> /* Have to grab the lock on events traversal or modifications. */
>> struct mutex events_lock;
>>
>> + /* false if only kernel users want to be notified, true otherwise */
>> + bool notify_userspace;
>> +
>> struct work_struct work;
>> };
>>
>> @@ -36,6 +39,9 @@ extern struct vmpressure *css_to_vmpressure(struct cgroup_subsys_state *css);
>> extern int vmpressure_register_event(struct cgroup *cg, struct cftype *cft,
>> struct eventfd_ctx *eventfd,
>> const char *args);
>> +
>> +extern int vmpressure_register_kernel_event(struct cgroup *cg,
>> + void (*fn)(void));
>> extern void vmpressure_unregister_event(struct cgroup *cg, struct cftype *cft,
>> struct eventfd_ctx *eventfd);
>> #else
>> diff --git a/mm/vmpressure.c b/mm/vmpressure.c
>> index 736a601..8d77ad0 100644
>> --- a/mm/vmpressure.c
>> +++ b/mm/vmpressure.c
>> @@ -135,8 +135,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
>> }
>>
>> struct vmpressure_event {
>> - struct eventfd_ctx *efd;
>> + union {
>> + struct eventfd_ctx *efd;
>> + void (*fn)(void);
>> + };
>> enum vmpressure_levels level;
>> + bool kernel_event;
>> struct list_head node;
>> };
>>
>> @@ -152,7 +156,9 @@ static bool vmpressure_event(struct vmpressure *vmpr,
>> mutex_lock(&vmpr->events_lock);
>>
>> list_for_each_entry(ev, &vmpr->events, node) {
>> - if (level >= ev->level) {
>> + if (ev->kernel_event)
>> + ev->fn();
>> + else if (vmpr->notify_userspace && (level >= ev->level)) {
>> eventfd_signal(ev->efd, 1);
>> signalled = true;
>> }
>> @@ -227,7 +233,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>> * we account it too.
>> */
>> if (!(gfp & (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS)))
>> - return;
>> + goto schedule;
>>
>> /*
>> * If we got here with no pages scanned, then that is an indicator
>> @@ -238,14 +244,16 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
>> * through vmpressure_prio(). But so far, keep calm.
>> */
>> if (!scanned)
>> - return;
>> + goto schedule;
>>
>> mutex_lock(&vmpr->sr_lock);
>> vmpr->scanned += scanned;
>> vmpr->reclaimed += reclaimed;
>> + vmpr->notify_userspace = true;
>
> Should notify_userspace get cleared sometime? Seems like we might need
> to clear or decrement notify_userspace in vmpressure_event() when
> calling eventfd_signal().
>
Hhummm, I was kind of assuming that it would always reach this point
zeroed. But looking at the code again, I am wrong, and you are right: it
should be cleared as soon as the notifications are fired.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-24 19:42 ` Greg Thelen
2013-04-24 20:04 ` Glauber Costa
@ 2013-04-25 10:50 ` Glauber Costa
2013-04-25 18:34 ` Greg Thelen
1 sibling, 1 reply; 19+ messages in thread
From: Glauber Costa @ 2013-04-25 10:50 UTC (permalink / raw)
To: Greg Thelen
Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
Kamezawa Hiroyuki, Johannes Weiner
[-- Attachment #1: Type: text/plain, Size: 399 bytes --]
On 04/24/2013 11:42 PM, Greg Thelen wrote:
>> + vmpr->notify_userspace = true;
> Should notify_userspace get cleared sometime? Seems like we might need
> to clear or decrement notify_userspace in vmpressure_event() when
> calling eventfd_signal().
>
I am folding the attached patch and keeping the acks unless the ackers
oppose.
Greg, any other problem you spot here? Thanks for the review BTW.
[-- Attachment #2: vmpressure.diff --]
[-- Type: text/x-patch, Size: 789 bytes --]
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 1a082a0..e16256e 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -164,6 +164,7 @@ static bool vmpressure_event(struct vmpressure *vmpr,
}
}
+ vmpr->notify_userspace = false;
mutex_unlock(&vmpr->events_lock);
return signalled;
@@ -249,8 +250,13 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg,
mutex_lock(&vmpr->sr_lock);
vmpr->scanned += scanned;
vmpr->reclaimed += reclaimed;
- vmpr->notify_userspace = true;
scanned = vmpr->scanned;
+ /*
+ * If we didn't reach this point, only kernel events will be triggered.
+ * It is the job of the worker thread to clean this up once the
+ * notifications are all delivered.
+ */
+ vmpr->notify_userspace = true;
mutex_unlock(&vmpr->sr_lock);
schedule:
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] vmpressure: in-kernel notifications
2013-04-25 10:50 ` Glauber Costa
@ 2013-04-25 18:34 ` Greg Thelen
0 siblings, 0 replies; 19+ messages in thread
From: Greg Thelen @ 2013-04-25 18:34 UTC (permalink / raw)
To: Glauber Costa
Cc: Glauber Costa, linux-mm, cgroups, Andrew Morton, Dave Chinner,
Anton Vorontsov, John Stultz, Joonsoo Kim, Michal Hocko,
Kamezawa Hiroyuki, Johannes Weiner
On Thu, Apr 25 2013, Glauber Costa wrote:
> On 04/24/2013 11:42 PM, Greg Thelen wrote:
>>> + vmpr->notify_userspace = true;
>> Should notify_userspace get cleared sometime? Seems like we might need
>> to clear or decrement notify_userspace in vmpressure_event() when
>> calling eventfd_signal().
>>
> I am folding the attached patch and keeping the acks unless the ackers
> oppose.
>
> Greg, any other problem you spot here? Thanks for the review BTW.
Looks good to me. Feel free to add my tag to the patch with
vmpressure.diff folded in.
Reviewed-by: Greg Thelen <gthelen@google.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread