* [PATCH -mmotm 1/4] cgroups: Fix race between userspace and kernelspace
@ 2010-02-19 22:28 Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects Kirill A. Shutemov
2010-02-20 4:00 ` [PATCH -mmotm 1/4] cgroups: Fix race between userspace and kernelspace KAMEZAWA Hiroyuki
0 siblings, 2 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2010-02-19 22:28 UTC (permalink / raw)
To: containers, linux-mm
Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura,
Kirill A. Shutemov
Notify userspace about cgroup removing only after rmdir of cgroup
directory to avoid race between userspace and kernelspace.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
kernel/cgroup.c | 32 +++++++++++++++++---------------
1 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ce9008f..46903cb 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -780,28 +780,15 @@ static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb)
static int cgroup_call_pre_destroy(struct cgroup *cgrp)
{
struct cgroup_subsys *ss;
- struct cgroup_event *event, *tmp;
int ret = 0;
for_each_subsys(cgrp->root, ss)
if (ss->pre_destroy) {
ret = ss->pre_destroy(ss, cgrp);
if (ret)
- goto out;
+ break;
}
- /*
- * Unregister events and notify userspace.
- */
- spin_lock(&cgrp->event_list_lock);
- list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
- list_del(&event->list);
- eventfd_signal(event->eventfd, 1);
- schedule_work(&event->remove);
- }
- spin_unlock(&cgrp->event_list_lock);
-
-out:
return ret;
}
@@ -2991,7 +2978,6 @@ static void cgroup_event_remove(struct work_struct *work)
event->cft->unregister_event(cgrp, event->cft, event->eventfd);
eventfd_ctx_put(event->eventfd);
- remove_wait_queue(event->wqh, &event->wait);
kfree(event);
}
@@ -3009,6 +2995,7 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
unsigned long flags = (unsigned long)key;
if (flags & POLLHUP) {
+ remove_wait_queue_locked(event->wqh, &event->wait);
spin_lock(&cgrp->event_list_lock);
list_del(&event->list);
spin_unlock(&cgrp->event_list_lock);
@@ -3457,6 +3444,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
struct dentry *d;
struct cgroup *parent;
DEFINE_WAIT(wait);
+ struct cgroup_event *event, *tmp;
int ret;
/* the vfs holds both inode->i_mutex already */
@@ -3540,6 +3528,20 @@ again:
set_bit(CGRP_RELEASABLE, &parent->flags);
check_for_release(parent);
+ /*
+ * Unregister events and notify userspace.
+ * Notify userspace about cgroup removing only after rmdir of cgroup
+ * directory to avoid race between userspace and kernelspace
+ */
+ spin_lock(&cgrp->event_list_lock);
+ list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
+ list_del(&event->list);
+ remove_wait_queue(event->wqh, &event->wait);
+ eventfd_signal(event->eventfd, 1);
+ schedule_work(&event->remove);
+ }
+ spin_unlock(&cgrp->event_list_lock);
+
mutex_unlock(&cgroup_mutex);
return 0;
}
--
1.6.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects
2010-02-19 22:28 [PATCH -mmotm 1/4] cgroups: Fix race between userspace and kernelspace Kirill A. Shutemov
@ 2010-02-19 22:28 ` Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation Kirill A. Shutemov
2010-02-20 4:22 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects KAMEZAWA Hiroyuki
2010-02-20 4:00 ` [PATCH -mmotm 1/4] cgroups: Fix race between userspace and kernelspace KAMEZAWA Hiroyuki
1 sibling, 2 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2010-02-19 22:28 UTC (permalink / raw)
To: containers, linux-mm
Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura,
Kirill A. Shutemov
Events should be removed after rmdir of cgroup directory, but before
destroying subsystem state objects. Let's take reference to cgroup
directory dentry to do that.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
include/linux/cgroup.h | 3 ---
kernel/cgroup.c | 8 ++++++++
mm/memcontrol.c | 9 ---------
3 files changed, 8 insertions(+), 12 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 64cebfe..1719c75 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -395,9 +395,6 @@ struct cftype {
* closes the eventfd or on cgroup removing.
* This callback must be implemented, if you want provide
* notification functionality.
- *
- * Be careful. It can be called after destroy(), so you have
- * to keep all nesessary data, until all events are removed.
*/
int (*unregister_event)(struct cgroup *cgrp, struct cftype *cft,
struct eventfd_ctx *eventfd);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 46903cb..d142524 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2979,6 +2979,7 @@ static void cgroup_event_remove(struct work_struct *work)
eventfd_ctx_put(event->eventfd);
kfree(event);
+ dput(cgrp->dentry);
}
/*
@@ -3099,6 +3100,13 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft,
goto fail;
}
+ /*
+ * Events should be removed after rmdir of cgroup directory, but before
+ * destroying subsystem state objects. Let's take reference to cgroup
+ * directory dentry to do that.
+ */
+ dget(cgrp->dentry);
+
spin_lock(&cgrp->event_list_lock);
list_add(&event->list, &cgrp->event_list);
spin_unlock(&cgrp->event_list_lock);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a443c30..8fe6e7f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3358,12 +3358,6 @@ static int mem_cgroup_register_event(struct cgroup *cgrp, struct cftype *cft,
}
}
- /*
- * We need to increment refcnt to be sure that all thresholds
- * will be unregistered before calling __mem_cgroup_free()
- */
- mem_cgroup_get(memcg);
-
if (type == _MEM)
rcu_assign_pointer(memcg->thresholds, thresholds_new);
else
@@ -3457,9 +3451,6 @@ assign:
/* To be sure that nobody uses thresholds before freeing it */
synchronize_rcu();
- for (i = 0; i < thresholds->size - size; i++)
- mem_cgroup_put(memcg);
-
kfree(thresholds);
unlock:
mutex_unlock(&memcg->thresholds_lock);
--
1.6.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation
2010-02-19 22:28 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects Kirill A. Shutemov
@ 2010-02-19 22:28 ` Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 4/4] memcg: Update memcg_test.txt to describe memory thresholds Kirill A. Shutemov
2010-02-20 4:24 ` [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation KAMEZAWA Hiroyuki
2010-02-20 4:22 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects KAMEZAWA Hiroyuki
1 sibling, 2 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2010-02-19 22:28 UTC (permalink / raw)
To: containers, linux-mm
Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura,
Kirill A. Shutemov
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
Documentation/cgroups/cgroup_event_listener.c | 102 +++++++++++++++++++++++++
1 files changed, 102 insertions(+), 0 deletions(-)
create mode 100644 Documentation/cgroups/cgroup_event_listener.c
diff --git a/Documentation/cgroups/cgroup_event_listener.c b/Documentation/cgroups/cgroup_event_listener.c
new file mode 100644
index 0000000..a8277b2
--- /dev/null
+++ b/Documentation/cgroups/cgroup_event_listener.c
@@ -0,0 +1,102 @@
+/*
+ * cgroup_event_listener.c - Simple listener of cgroup events
+ *
+ * Copyright (C) Kirill A. Shutemov <kirill@shutemov.name>
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <libgen.h>
+#include <limits.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/eventfd.h>
+
+#define USAGE_STR "Usage: cgroup_event_listener <path-to-control-file> <args>\n"
+
+int main(int argc, char **argv)
+{
+ int efd = -1;
+ int cfd = -1;
+ int event_control = -1;
+ char event_control_path[PATH_MAX];
+ int ret;
+
+ if (argc != 3) {
+ fputs(USAGE_STR, stderr);
+ return 1;
+ }
+
+ cfd = open(argv[1], O_RDONLY);
+ if (cfd == -1) {
+ fprintf(stderr, "Cannot open %s: %s\n", argv[1],
+ strerror(errno));
+ goto out;
+ }
+
+ ret = snprintf(event_control_path, PATH_MAX, "%s/cgroup.event_control",
+ dirname(argv[1]));
+ if (ret > PATH_MAX) {
+ fputs("Path to cgroup.event_control is too long\n", stderr);
+ goto out;
+ }
+
+ event_control = open(event_control_path, O_WRONLY);
+ if (event_control == -1) {
+ fprintf(stderr, "Cannot open %s: %s\n", event_control_path,
+ strerror(errno));
+ goto out;
+ }
+
+ efd = eventfd(0, 0);
+ if (efd == -1) {
+ perror("eventfd() failed");
+ goto out;
+ }
+
+ ret = dprintf(event_control, "%d %d %s", efd, cfd, argv[2]);
+ if (ret == -1) {
+ perror("Cannot write to cgroup.event_control");
+ goto out;
+ }
+
+ while (1) {
+ uint64_t result;
+
+ ret = read(efd, &result, sizeof(result));
+ if (ret == -1) {
+ if (errno == EINTR)
+ continue;
+ perror("Cannot read from eventfd");
+ break;
+ }
+ assert (ret == sizeof(result));
+
+ ret = access(event_control_path, W_OK);
+ if ((ret == -1) && (errno == ENOENT)) {
+ puts("The cgroup seems to have removed.");
+ ret = 0;
+ break;
+ }
+
+ if (ret == -1) {
+ perror("cgroup.event_control is not accessable any more");
+ break;
+ }
+
+ printf("%s %s: crossed\n", argv[1], argv[2]);
+ }
+
+out:
+ if (efd >= 0)
+ close(efd);
+ if (event_control >= 0)
+ close(event_control);
+ if (cfd >= 0)
+ close(cfd);
+
+ return (ret != 0);
+}
--
1.6.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH -mmotm 4/4] memcg: Update memcg_test.txt to describe memory thresholds
2010-02-19 22:28 ` [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation Kirill A. Shutemov
@ 2010-02-19 22:28 ` Kirill A. Shutemov
2010-02-20 4:26 ` KAMEZAWA Hiroyuki
2010-02-20 4:24 ` [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation KAMEZAWA Hiroyuki
1 sibling, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2010-02-19 22:28 UTC (permalink / raw)
To: containers, linux-mm
Cc: Paul Menage, Li Zefan, Andrew Morton, KAMEZAWA Hiroyuki,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura,
Kirill A. Shutemov
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
---
Documentation/cgroups/memcg_test.txt | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt
index e011488..4d32e0e 100644
--- a/Documentation/cgroups/memcg_test.txt
+++ b/Documentation/cgroups/memcg_test.txt
@@ -396,3 +396,24 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
memory.stat of both A and B.
See 8.2 of Documentation/cgroups/memory.txt to see what value should be
written to move_charge_at_immigrate.
+
+ 9.10 Memory thresholds
+ Memory controler implements memory thresholds using cgroups notification
+ API. You can use Documentation/cgroups/cgroup_event_listener.c to test
+ it.
+
+ (Shell-A) Create cgroup and run event listener
+ # mkdir /cgroup/A
+ # ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M
+
+ (Shell-B) Add task to cgroup and try to allocate and free memory
+ # echo $$ >/cgroup/A/tasks
+ # a="$(dd if=/dev/zero bs=1M count=10)"
+ # a=
+
+ You will see message from cgroup_event_listener every time you cross
+ the thresholds.
+
+ Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds.
+
+ It's good idea to test root cgroup as well.
--
1.6.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH -mmotm 1/4] cgroups: Fix race between userspace and kernelspace
2010-02-19 22:28 [PATCH -mmotm 1/4] cgroups: Fix race between userspace and kernelspace Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects Kirill A. Shutemov
@ 2010-02-20 4:00 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-20 4:00 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: containers, linux-mm, Paul Menage, Li Zefan, Andrew Morton,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura
On Sat, 20 Feb 2010 00:28:16 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> Notify userspace about cgroup removing only after rmdir of cgroup
> directory to avoid race between userspace and kernelspace.
>
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Hmm, but could you elaborate what is the race in patch description.
I can imagine by reading your patch but please write it in clear way.
> ---
> kernel/cgroup.c | 32 +++++++++++++++++---------------
> 1 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index ce9008f..46903cb 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -780,28 +780,15 @@ static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb)
> static int cgroup_call_pre_destroy(struct cgroup *cgrp)
> {
> struct cgroup_subsys *ss;
> - struct cgroup_event *event, *tmp;
> int ret = 0;
>
> for_each_subsys(cgrp->root, ss)
> if (ss->pre_destroy) {
> ret = ss->pre_destroy(ss, cgrp);
> if (ret)
> - goto out;
> + break;
> }
>
> - /*
> - * Unregister events and notify userspace.
> - */
> - spin_lock(&cgrp->event_list_lock);
> - list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
> - list_del(&event->list);
> - eventfd_signal(event->eventfd, 1);
> - schedule_work(&event->remove);
> - }
> - spin_unlock(&cgrp->event_list_lock);
> -
> -out:
> return ret;
> }
>
> @@ -2991,7 +2978,6 @@ static void cgroup_event_remove(struct work_struct *work)
> event->cft->unregister_event(cgrp, event->cft, event->eventfd);
>
> eventfd_ctx_put(event->eventfd);
> - remove_wait_queue(event->wqh, &event->wait);
> kfree(event);
> }
>
> @@ -3009,6 +2995,7 @@ static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> unsigned long flags = (unsigned long)key;
>
> if (flags & POLLHUP) {
> + remove_wait_queue_locked(event->wqh, &event->wait);
> spin_lock(&cgrp->event_list_lock);
> list_del(&event->list);
> spin_unlock(&cgrp->event_list_lock);
> @@ -3457,6 +3444,7 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
> struct dentry *d;
> struct cgroup *parent;
> DEFINE_WAIT(wait);
> + struct cgroup_event *event, *tmp;
> int ret;
>
> /* the vfs holds both inode->i_mutex already */
> @@ -3540,6 +3528,20 @@ again:
> set_bit(CGRP_RELEASABLE, &parent->flags);
> check_for_release(parent);
>
> + /*
> + * Unregister events and notify userspace.
> + * Notify userspace about cgroup removing only after rmdir of cgroup
> + * directory to avoid race between userspace and kernelspace
> + */
> + spin_lock(&cgrp->event_list_lock);
> + list_for_each_entry_safe(event, tmp, &cgrp->event_list, list) {
> + list_del(&event->list);
> + remove_wait_queue(event->wqh, &event->wait);
> + eventfd_signal(event->eventfd, 1);
> + schedule_work(&event->remove);
> + }
> + spin_unlock(&cgrp->event_list_lock);
> +
> mutex_unlock(&cgroup_mutex);
> return 0;
> }
> --
> 1.6.6.2
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects
2010-02-19 22:28 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation Kirill A. Shutemov
@ 2010-02-20 4:22 ` KAMEZAWA Hiroyuki
2010-02-20 11:56 ` Kirill A. Shutemov
1 sibling, 1 reply; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-20 4:22 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: containers, linux-mm, Paul Menage, Li Zefan, Andrew Morton,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura
On Sat, 20 Feb 2010 00:28:17 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> Events should be removed after rmdir of cgroup directory, but before
> destroying subsystem state objects. Let's take reference to cgroup
> directory dentry to do that.
>
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Okay, I welcome this.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
Just a quesion...After this change, if cgroup has remaining event,
cgroup is removed by workqueue of event->remove() -> d_put(), finally. Right ?
Do you have a test set for checking this behavior ?
Thanks,
-Kame
> ---
> include/linux/cgroup.h | 3 ---
> kernel/cgroup.c | 8 ++++++++
> mm/memcontrol.c | 9 ---------
> 3 files changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 64cebfe..1719c75 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -395,9 +395,6 @@ struct cftype {
> * closes the eventfd or on cgroup removing.
> * This callback must be implemented, if you want provide
> * notification functionality.
> - *
> - * Be careful. It can be called after destroy(), so you have
> - * to keep all nesessary data, until all events are removed.
> */
> int (*unregister_event)(struct cgroup *cgrp, struct cftype *cft,
> struct eventfd_ctx *eventfd);
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 46903cb..d142524 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2979,6 +2979,7 @@ static void cgroup_event_remove(struct work_struct *work)
>
> eventfd_ctx_put(event->eventfd);
> kfree(event);
> + dput(cgrp->dentry);
> }
>
> /*
> @@ -3099,6 +3100,13 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft,
> goto fail;
> }
>
> + /*
> + * Events should be removed after rmdir of cgroup directory, but before
> + * destroying subsystem state objects. Let's take reference to cgroup
> + * directory dentry to do that.
> + */
> + dget(cgrp->dentry);
> +
> spin_lock(&cgrp->event_list_lock);
> list_add(&event->list, &cgrp->event_list);
> spin_unlock(&cgrp->event_list_lock);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a443c30..8fe6e7f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3358,12 +3358,6 @@ static int mem_cgroup_register_event(struct cgroup *cgrp, struct cftype *cft,
> }
> }
>
> - /*
> - * We need to increment refcnt to be sure that all thresholds
> - * will be unregistered before calling __mem_cgroup_free()
> - */
> - mem_cgroup_get(memcg);
> -
> if (type == _MEM)
> rcu_assign_pointer(memcg->thresholds, thresholds_new);
> else
> @@ -3457,9 +3451,6 @@ assign:
> /* To be sure that nobody uses thresholds before freeing it */
> synchronize_rcu();
>
> - for (i = 0; i < thresholds->size - size; i++)
> - mem_cgroup_put(memcg);
> -
> kfree(thresholds);
> unlock:
> mutex_unlock(&memcg->thresholds_lock);
> --
> 1.6.6.2
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation
2010-02-19 22:28 ` [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 4/4] memcg: Update memcg_test.txt to describe memory thresholds Kirill A. Shutemov
@ 2010-02-20 4:24 ` KAMEZAWA Hiroyuki
1 sibling, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-20 4:24 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: containers, linux-mm, Paul Menage, Li Zefan, Andrew Morton,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura
On Sat, 20 Feb 2010 00:28:18 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Nice. but please add one-line patch description, at least.
(Because it helps we see merge log rather than patch dump.)
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> Documentation/cgroups/cgroup_event_listener.c | 102 +++++++++++++++++++++++++
> 1 files changed, 102 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/cgroups/cgroup_event_listener.c
>
> diff --git a/Documentation/cgroups/cgroup_event_listener.c b/Documentation/cgroups/cgroup_event_listener.c
> new file mode 100644
> index 0000000..a8277b2
> --- /dev/null
> +++ b/Documentation/cgroups/cgroup_event_listener.c
> @@ -0,0 +1,102 @@
> +/*
> + * cgroup_event_listener.c - Simple listener of cgroup events
> + *
> + * Copyright (C) Kirill A. Shutemov <kirill@shutemov.name>
> + */
> +
> +#include <assert.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <libgen.h>
> +#include <limits.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +#include <sys/eventfd.h>
> +
> +#define USAGE_STR "Usage: cgroup_event_listener <path-to-control-file> <args>\n"
> +
> +int main(int argc, char **argv)
> +{
> + int efd = -1;
> + int cfd = -1;
> + int event_control = -1;
> + char event_control_path[PATH_MAX];
> + int ret;
> +
> + if (argc != 3) {
> + fputs(USAGE_STR, stderr);
> + return 1;
> + }
> +
> + cfd = open(argv[1], O_RDONLY);
> + if (cfd == -1) {
> + fprintf(stderr, "Cannot open %s: %s\n", argv[1],
> + strerror(errno));
> + goto out;
> + }
> +
> + ret = snprintf(event_control_path, PATH_MAX, "%s/cgroup.event_control",
> + dirname(argv[1]));
> + if (ret > PATH_MAX) {
> + fputs("Path to cgroup.event_control is too long\n", stderr);
> + goto out;
> + }
> +
> + event_control = open(event_control_path, O_WRONLY);
> + if (event_control == -1) {
> + fprintf(stderr, "Cannot open %s: %s\n", event_control_path,
> + strerror(errno));
> + goto out;
> + }
> +
> + efd = eventfd(0, 0);
> + if (efd == -1) {
> + perror("eventfd() failed");
> + goto out;
> + }
> +
> + ret = dprintf(event_control, "%d %d %s", efd, cfd, argv[2]);
> + if (ret == -1) {
> + perror("Cannot write to cgroup.event_control");
> + goto out;
> + }
> +
> + while (1) {
> + uint64_t result;
> +
> + ret = read(efd, &result, sizeof(result));
> + if (ret == -1) {
> + if (errno == EINTR)
> + continue;
> + perror("Cannot read from eventfd");
> + break;
> + }
> + assert (ret == sizeof(result));
> +
> + ret = access(event_control_path, W_OK);
> + if ((ret == -1) && (errno == ENOENT)) {
> + puts("The cgroup seems to have removed.");
> + ret = 0;
> + break;
> + }
> +
> + if (ret == -1) {
> + perror("cgroup.event_control is not accessable any more");
> + break;
> + }
> +
> + printf("%s %s: crossed\n", argv[1], argv[2]);
> + }
> +
> +out:
> + if (efd >= 0)
> + close(efd);
> + if (event_control >= 0)
> + close(event_control);
> + if (cfd >= 0)
> + close(cfd);
> +
> + return (ret != 0);
> +}
> --
> 1.6.6.2
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mmotm 4/4] memcg: Update memcg_test.txt to describe memory thresholds
2010-02-19 22:28 ` [PATCH -mmotm 4/4] memcg: Update memcg_test.txt to describe memory thresholds Kirill A. Shutemov
@ 2010-02-20 4:26 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-02-20 4:26 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: containers, linux-mm, Paul Menage, Li Zefan, Andrew Morton,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura
On Sat, 20 Feb 2010 00:28:19 +0200
"Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Thank you.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> Documentation/cgroups/memcg_test.txt | 21 +++++++++++++++++++++
> 1 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt
> index e011488..4d32e0e 100644
> --- a/Documentation/cgroups/memcg_test.txt
> +++ b/Documentation/cgroups/memcg_test.txt
> @@ -396,3 +396,24 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
> memory.stat of both A and B.
> See 8.2 of Documentation/cgroups/memory.txt to see what value should be
> written to move_charge_at_immigrate.
> +
> + 9.10 Memory thresholds
> + Memory controler implements memory thresholds using cgroups notification
> + API. You can use Documentation/cgroups/cgroup_event_listener.c to test
> + it.
> +
> + (Shell-A) Create cgroup and run event listener
> + # mkdir /cgroup/A
> + # ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M
> +
> + (Shell-B) Add task to cgroup and try to allocate and free memory
> + # echo $$ >/cgroup/A/tasks
> + # a="$(dd if=/dev/zero bs=1M count=10)"
> + # a=
> +
> + You will see message from cgroup_event_listener every time you cross
> + the thresholds.
> +
> + Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds.
> +
> + It's good idea to test root cgroup as well.
> --
> 1.6.6.2
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects
2010-02-20 4:22 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects KAMEZAWA Hiroyuki
@ 2010-02-20 11:56 ` Kirill A. Shutemov
0 siblings, 0 replies; 9+ messages in thread
From: Kirill A. Shutemov @ 2010-02-20 11:56 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: containers, linux-mm, Paul Menage, Li Zefan, Andrew Morton,
Balbir Singh, Pavel Emelyanov, Dan Malek, Daisuke Nishimura
On Sat, Feb 20, 2010 at 6:22 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Sat, 20 Feb 2010 00:28:17 +0200
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
>
>> Events should be removed after rmdir of cgroup directory, but before
>> destroying subsystem state objects. Let's take reference to cgroup
>> directory dentry to do that.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
>
> Okay, I welcome this.
>
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
>
> Just a quesion...After this change, if cgroup has remaining event,
> cgroup is removed by workqueue of event->remove() -> d_put(), finally. Right ?
Yes
> Do you have a test set for checking this behavior ?
Run ./cgroup_event_listener to listen any event in the cgroup. Remove directory
of the cgroup (cgroup_event_listener will detect it). And check 'num_cgroups'
column in /proc/cgroups.
>
> Thanks,
> -Kame
>
>
>
>> ---
>> include/linux/cgroup.h | 3 ---
>> kernel/cgroup.c | 8 ++++++++
>> mm/memcontrol.c | 9 ---------
>> 3 files changed, 8 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>> index 64cebfe..1719c75 100644
>> --- a/include/linux/cgroup.h
>> +++ b/include/linux/cgroup.h
>> @@ -395,9 +395,6 @@ struct cftype {
>> * closes the eventfd or on cgroup removing.
>> * This callback must be implemented, if you want provide
>> * notification functionality.
>> - *
>> - * Be careful. It can be called after destroy(), so you have
>> - * to keep all nesessary data, until all events are removed.
>> */
>> int (*unregister_event)(struct cgroup *cgrp, struct cftype *cft,
>> struct eventfd_ctx *eventfd);
>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> index 46903cb..d142524 100644
>> --- a/kernel/cgroup.c
>> +++ b/kernel/cgroup.c
>> @@ -2979,6 +2979,7 @@ static void cgroup_event_remove(struct work_struct *work)
>>
>> eventfd_ctx_put(event->eventfd);
>> kfree(event);
>> + dput(cgrp->dentry);
>> }
>>
>> /*
>> @@ -3099,6 +3100,13 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft,
>> goto fail;
>> }
>>
>> + /*
>> + * Events should be removed after rmdir of cgroup directory, but before
>> + * destroying subsystem state objects. Let's take reference to cgroup
>> + * directory dentry to do that.
>> + */
>> + dget(cgrp->dentry);
>> +
>> spin_lock(&cgrp->event_list_lock);
>> list_add(&event->list, &cgrp->event_list);
>> spin_unlock(&cgrp->event_list_lock);
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index a443c30..8fe6e7f 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3358,12 +3358,6 @@ static int mem_cgroup_register_event(struct cgroup *cgrp, struct cftype *cft,
>> }
>> }
>>
>> - /*
>> - * We need to increment refcnt to be sure that all thresholds
>> - * will be unregistered before calling __mem_cgroup_free()
>> - */
>> - mem_cgroup_get(memcg);
>> -
>> if (type == _MEM)
>> rcu_assign_pointer(memcg->thresholds, thresholds_new);
>> else
>> @@ -3457,9 +3451,6 @@ assign:
>> /* To be sure that nobody uses thresholds before freeing it */
>> synchronize_rcu();
>>
>> - for (i = 0; i < thresholds->size - size; i++)
>> - mem_cgroup_put(memcg);
>> -
>> kfree(thresholds);
>> unlock:
>> mutex_unlock(&memcg->thresholds_lock);
>> --
>> 1.6.6.2
>>
>>
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-02-20 11:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-19 22:28 [PATCH -mmotm 1/4] cgroups: Fix race between userspace and kernelspace Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation Kirill A. Shutemov
2010-02-19 22:28 ` [PATCH -mmotm 4/4] memcg: Update memcg_test.txt to describe memory thresholds Kirill A. Shutemov
2010-02-20 4:26 ` KAMEZAWA Hiroyuki
2010-02-20 4:24 ` [PATCH -mmotm 3/4] cgroups: Add simple listener of cgroup events to documentation KAMEZAWA Hiroyuki
2010-02-20 4:22 ` [PATCH -mmotm 2/4] cgroups: remove events before destroying subsystem state objects KAMEZAWA Hiroyuki
2010-02-20 11:56 ` Kirill A. Shutemov
2010-02-20 4:00 ` [PATCH -mmotm 1/4] cgroups: Fix race between userspace and kernelspace KAMEZAWA Hiroyuki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).