* [PATCH 2/2] Memory usage limit notification addition to memcg (v3)
2009-07-14 0:16 ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
@ 2009-07-14 0:16 ` Vladislav Buzov
2009-07-14 0:30 ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) KAMEZAWA Hiroyuki
2009-07-14 0:36 ` Paul Menage
2 siblings, 0 replies; 13+ messages in thread
From: Vladislav Buzov @ 2009-07-14 0:16 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Linux Containers Mailing List, Linux memory management list,
Dan Malek, Andrew Morton, Paul Menage, KAMEZAWA Hiroyuki,
Balbir Singh, Vladislav Buzov
This patch updates the Memory Controller Control Group to add a
configurable memory usage limit notification. The feature was
presented at the April 2009 Embedded Linux Conference.
Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
Signed-off-by: Dan Malek <dan@embeddedalley.com>
---
Documentation/cgroups/mem_notify.txt | 140 ++++++++++++++++++++++++++++++++++
mm/memcontrol.c | 100 ++++++++++++++++++++++++-
2 files changed, 239 insertions(+), 1 deletions(-)
create mode 100644 Documentation/cgroups/mem_notify.txt
diff --git a/Documentation/cgroups/mem_notify.txt b/Documentation/cgroups/mem_notify.txt
new file mode 100644
index 0000000..94be3f3
--- /dev/null
+++ b/Documentation/cgroups/mem_notify.txt
@@ -0,0 +1,140 @@
+
+Memory Limit Notification
+
+Attempts have been made in the past to provide a mechanism that
+notifies processes (a task or an address space) when memory usage
+is approaching a high limit. The intention is to give the
+application an opportunity to release some memory and continue
+operating rather than be OOM killed. The CE Linux Forum requested
+a more contemporary implementation, and this is the result.
+
+The memory limit notification is an extension to the existing Memory
+Resource Controller. Please read memory.txt in this directory to
+understand its operation before continuing here.
+
+1. Operation
+
+When the Memory Controller cgroup file system is mounted, the following
+files will appear:
+
+ memory.notify_threshold_in_bytes
+ memory.notify_threshold_lowait
+
+The notification is based upon reaching a threshold below the Memory
+Resource Controller limit (memory.limit_in_bytes). The threshold
+represents the minimal number of bytes that should be available under
+the limit. When the controller group is created, the threshold is set
+to zero which triggers notification when the Memory Resource Controller
+limit is reached.
+
+The threshold may be set by writing to memory.notify_threshold_in_bytes,
+such as:
+
+ echo 10M > memory.notify_threshold_in_bytes
+
+The current number of available bytes may be computed at any time as the
+difference between memory.limit_in_bytes and memory.usage_in_bytes.
+
+The memory.notify_threshold_lowait file is a blocking-read file. The read
+will block until one of four conditions occurs:
+
+ - The amount of available memory is equal to or less than the threshold
+ defined in memory.notify_threshold_in_bytes
+ - The memory.notify_threshold_lowait file is written with any value (debug)
+ - A thread is moved to another controller group
+ - The cgroup is destroyed or forced empty (memory.force_empty)
+
+
+1.1 Example Usage
+
+An application must be designed to properly take advantage of this
+memory threshold notification feature. It is a powerful management component
+of some operating systems and embedded devices that must provide
+highly available and reliable computing services. The application works
+in conjunction with information provided by the operating system to
+control limited resource usage. Since many programmers still think
+memory is infinite and never check the return value from malloc(), it
+may come as a surprise that such mechanisms have long been in use.
+
+A typical application will be multithreaded, with one thread either
+polling or waiting for the notification event. When the event occurs,
+the thread will take whatever action is appropriate within the application
+design. This could be running a garbage collection algorithm directly
+or simply signaling other processing threads that they must do something
+to reduce their memory usage. The notification thread will then be required
+to poll the actual usage until the low limit of its choosing is met,
+at which time the reclaim of memory can stop and the notification thread
+will wait for the next event.
+
+Internally, the application only needs to
+fopen("memory.usage_in_bytes" ...) or
+fopen("memory.notify_threshold_lowait" ...), then either poll the former
+file or perform a blocking read on the latter using fread() or fscanf().
+Subtracting the value returned by either read from the value obtained by
+reading memory.limit_in_bytes gives the number of available bytes; comparing
+that with the threshold read from memory.notify_threshold_in_bytes indicates
+how far memory usage has gone past the threshold.
+
+2. Configuration
+
+Follow the instructions in memory.txt for the configuration and usage of
+the Memory Resource Controller cgroup. Once this is created and tasks
+assigned, use the memory threshold notification as described here.
+
+The only action that is needed outside of the application waiting or polling
+is to set memory.notify_threshold_in_bytes. For example, to have a
+notification occur when the memory usage of the cgroup comes within 1 MByte
+of the limit:
+
+ echo 1M > memory.notify_threshold_in_bytes
+
+This value may be read or changed at any time. Writing a higher value once
+the Memory Resource Controller is in operation may trigger immediate
+notification if available memory is already at or below the new threshold.
+Writing a value at or above the Memory Controller limit causes an error, while
+setting the limit at or below the threshold resets the threshold to zero.
+
+3. Debug and Testing
+
+The design of cgroups makes it easier to perform some debugging or
+monitoring tasks without modification to the application. For example,
+a write of any value to memory.notify_threshold_lowait will wake up all
+threads waiting for notifications regardless of current memory usage.
+
+Collecting performance data about the cgroup is also simplified, as
+no application modifications are necessary. A separate task can be
+created that will open and monitor any necessary files of the cgroup
+(such as current limits, usage and usage percentages and even when
+notification occurs). This task can also operate outside of the cgroup,
+so its memory usage is not charged to the cgroup.
+
+4. Design
+
+The Memory Resource Controller utilizes the Resource Counter to track and manage
+the memory of the Control Group. The Resource Counter was extended to support
+the resource usage threshold, which is the minimal difference between the
+resource limit and usage that causes the notification. For the Memory Controller
+cgroup this is a number of bytes of memory not in use. Because the threshold is
+relative to the limit, the cgroup parameters may continue to be modified
+dynamically without the need to modify the notification parameters. Otherwise,
+the notification threshold would also have to be recomputed and modified on any
+Memory Resource Controller operating parameter change.
+
+The cgroup file semantics are not well suited for this type of notification
+mechanism. While applications may choose to simply poll the current
+usage at their convenience, it was also desired to have a notification
+event that would trigger when the usage attained the threshold. The
+blocking read() was chosen, as it is currently the only useful method.
+This presented the problem of "out of band" notification, when you want
+to return some exceptional status other than reaching the notification
+threshold. In the cases listed above, the read() on the
+memory.notify_threshold_lowait file will not block and will return "0" for
+the remaining size. When this occurs, the thread must determine if the task
+has moved to a new cgroup or if the cgroup has been destroyed. Due to
+the usage model of this cgroup, neither is likely to happen during normal
+operation of a product.
+
+Dan Malek <dan@embeddedalley.com>
+Vladislav Buzov <vbuzov@embeddedalley.com>
+Embedded Alley Solutions, Inc.
+10 July 2009
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e2fa20d..3b49fd4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6,6 +6,11 @@
* Copyright 2007 OpenVZ SWsoft Inc
* Author: Pavel Emelianov <xemul@openvz.org>
*
+ * Memory Limit Notification update
+ * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
+ * Author: Dan Malek <dan@embeddedalley.com>
+ * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
+ *
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
@@ -180,6 +185,9 @@ struct mem_cgroup {
/* set when res.limit == memsw.limit */
bool memsw_is_minimum;
+ /* tasks waiting for memory usage threshold notification */
+ wait_queue_head_t notify_threshold_wait;
+
/*
* statistics. This must be placed at the end of memcg.
*/
@@ -2052,7 +2060,7 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
}
/*
* The user of this function is...
- * RES_LIMIT.
+ * RES_LIMIT, RES_THRESHOLD
*/
static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
const char *buffer)
@@ -2075,6 +2083,17 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
else
ret = mem_cgroup_resize_memsw_limit(memcg, val);
break;
+ case RES_THRESHOLD:
+ /* This function does all necessary parse...reuse it */
+ ret = res_counter_memparse_write_strategy(buffer, &val);
+ if (ret)
+ break;
+ /* For memsw threshold is not implemented */
+ if (type == _MEM)
+ ret = res_counter_set_threshold(&memcg->res, val);
+ else
+ ret = -EINVAL;
+ break;
default:
ret = -EINVAL; /* should be BUG() ? */
break;
@@ -2308,6 +2327,68 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
return 0;
}
+/*
+ * This is a blocking read operation forcing a reader to sleep unless
+ * a low memory condition occurs, someone intentionally writes to
+ * "memory.notify_threshold_lowait", or the cgroup state changes, e.g.
+ * the cgroup is destroyed or a task is moved to another cgroup.
+ */
+static u64 mem_cgroup_notify_threshold_lowait(struct cgroup *cgrp,
+ struct cftype *cft)
+{
+ struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+ DEFINE_WAIT(notify_lowait);
+
+ /*
+ * A memory resource usage of zero is a special case that
+ * causes us not to sleep. It normally happens when the
+ * cgroup is about to be destroyed, and we don't want someone
+ * trying to sleep on a queue that is about to go away. This
+ * condition can also be forced as part of testing.
+ */
+ if (likely(mem->res.usage != 0)) {
+ prepare_to_wait(&mem->notify_threshold_wait, &notify_lowait,
+ TASK_INTERRUPTIBLE);
+
+ if (res_counter_check_under_threshold(&mem->res))
+ schedule();
+
+ finish_wait(&mem->notify_threshold_wait, &notify_lowait);
+ }
+
+ return res_counter_read_u64(&mem->res, RES_USAGE);
+}
+
+/*
+ * Memory usage threshold notification callback. Called under disabled
+ * interrupts by the memory resource counter when low memory condition
+ * occurs.
+ */
+static void mem_cgroup_res_threshold_notifier(struct res_counter *cnt)
+{
+ struct mem_cgroup *memcg;
+
+ memcg = mem_cgroup_from_res_counter(cnt, res);
+ if (waitqueue_active(&memcg->notify_threshold_wait))
+ wake_up_locked(&memcg->notify_threshold_wait);
+}
+
+/*
+ * This is used to wake up all threads that may be hanging
+ * out waiting for a low memory condition prior to that happening.
+ * Useful for triggering the event to assist with debug of applications.
+ */
+static int mem_cgroup_notify_threshold_wake_em_up(struct cgroup *cgrp,
+ unsigned int event)
+{
+ struct mem_cgroup *memcg;
+
+ memcg = mem_cgroup_from_cont(cgrp);
+ if (waitqueue_active(&memcg->notify_threshold_wait))
+ wake_up(&memcg->notify_threshold_wait);
+ return 0;
+}
+
static struct cftype mem_cgroup_files[] = {
{
@@ -2351,6 +2432,17 @@ static struct cftype mem_cgroup_files[] = {
.read_u64 = mem_cgroup_swappiness_read,
.write_u64 = mem_cgroup_swappiness_write,
},
+ {
+ .name = "notify_threshold_in_bytes",
+ .private = MEMFILE_PRIVATE(_MEM, RES_THRESHOLD),
+ .write_string = mem_cgroup_write,
+ .read_u64 = mem_cgroup_read,
+ },
+ {
+ .name = "notify_threshold_lowait",
+ .trigger = mem_cgroup_notify_threshold_wake_em_up,
+ .read_u64 = mem_cgroup_notify_threshold_lowait,
+ },
};
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -2554,6 +2646,9 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
mem->last_scanned_child = 0;
spin_lock_init(&mem->reclaim_param_lock);
+ init_waitqueue_head(&mem->notify_threshold_wait);
+ mem->res.threshold_notifier = mem_cgroup_res_threshold_notifier;
+
if (parent)
mem->swappiness = get_swappiness(parent);
atomic_set(&mem->refcnt, 1);
@@ -2568,6 +2663,7 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
{
struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
+ mem_cgroup_notify_threshold_wake_em_up(cont, 0);
return mem_cgroup_force_empty(mem, false);
}
@@ -2597,6 +2693,8 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
struct cgroup *old_cont,
struct task_struct *p)
{
+ mem_cgroup_notify_threshold_wake_em_up(old_cont, 0);
+
mutex_lock(&memcg_tasklist);
/*
* FIXME: It's better to move charges of this process from old
--
1.5.6.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
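The usage model described in section 1.1 of mem_notify.txt above can be sketched from user space roughly as follows. This is a minimal sketch and not code from the patch: the helper names (read_u64_file, available_bytes, should_reclaim, wait_for_low_memory) are hypothetical, the cgroup directory is whatever path the memory controller happens to be mounted at, and it assumes each control file yields a single decimal value when read.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one unsigned decimal value from a control file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_u64_file(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "r");
	int matched;

	if (f == NULL)
		return -1;
	matched = fscanf(f, "%" SCNu64, out);
	fclose(f);
	return matched == 1 ? 0 : -1;
}

/* Available bytes = limit - usage, clamped at zero. */
static uint64_t available_bytes(uint64_t limit, uint64_t usage)
{
	return usage >= limit ? 0 : limit - usage;
}

/* True when available memory is at or below the threshold, which is the
 * condition the documentation gives for a notification. */
static bool should_reclaim(uint64_t limit, uint64_t usage, uint64_t threshold)
{
	return available_bytes(limit, usage) <= threshold;
}

/* Hypothetical helper: block in a read of memory.notify_threshold_lowait
 * under cgroup_dir (an assumed mount path) and store the value read. */
static int wait_for_low_memory(const char *cgroup_dir, uint64_t *value)
{
	char path[4096];
	int len = snprintf(path, sizeof(path),
			   "%s/memory.notify_threshold_lowait", cgroup_dir);

	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;
	return read_u64_file(path, value);
}
```

A notification thread would typically call wait_for_low_memory() in a loop and, on each wake-up, re-read memory.usage_in_bytes and memory.limit_in_bytes to decide how much memory to release before waiting again.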
* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
2009-07-14 0:16 ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
2009-07-14 0:16 ` [PATCH 2/2] Memory usage limit notification addition to memcg (v3) Vladislav Buzov
@ 2009-07-14 0:30 ` KAMEZAWA Hiroyuki
2009-07-14 1:29 ` Vladislav D. Buzov
2009-07-14 0:36 ` Paul Menage
2 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-14 0:30 UTC (permalink / raw)
To: Vladislav Buzov
Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
Linux memory management list, Dan Malek, Andrew Morton,
Paul Menage, Balbir Singh
On Mon, 13 Jul 2009 17:16:20 -0700
Vladislav Buzov <vbuzov@embeddedalley.com> wrote:
> This patch updates the Resource Counter to add a configurable resource usage
> threshold notification mechanism.
>
> Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
> Signed-off-by: Dan Malek <dan@embeddedalley.com>
> ---
> Documentation/cgroups/resource_counter.txt | 21 ++++++++-
> include/linux/res_counter.h | 69 ++++++++++++++++++++++++++++
> kernel/res_counter.c | 7 +++
> 3 files changed, 95 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
> index 95b24d7..1369dff 100644
> --- a/Documentation/cgroups/resource_counter.txt
> +++ b/Documentation/cgroups/resource_counter.txt
> @@ -39,7 +39,20 @@ to work with it.
> The failcnt stands for "failures counter". This is the number of
> resource allocation attempts that failed.
>
> - c. spinlock_t lock
> + e. unsigned long long threshold
> +
> + The resource usage threshold to notify the resource controller. This is
> + the minimal difference between the resource limit and current usage
> + to fire a notification.
> +
> + f. void (*threshold_notifier)(struct res_counter *counter)
> +
> + The threshold notification callback installed by the resource
> + controller. Called when the usage reaches or exceeds the threshold.
> + Should be fast and not sleep because called when interrupts are
> + disabled.
> +
This interface isn't very useful and is hard to use. Can't you just return the
result, "exceeds threshold", to the callers?
If I were you, I'd add the following state to res_counter:
	enum res_state {
		RES_BELOW_THRESH,
		RES_OVER_THRESH,
	};
	struct res_counter {
		.....
		enum res_state state;
	};
Then the caller does, for example:
	prev_state = res->state;
	res_counter_charge(res....)
	if (prev_state != res->state)
		do_xxxxx..
A notifier called under a spinlock is not a usual interface. And if this is a
"notifier", something generic, notifier_call_chain should be used rather than
a custom one, IIUC.
So avoiding the "callback" is the way to go, I think.
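The state-comparison idea above could be sketched in plain C roughly as follows. This is a standalone user-space model under stated assumptions, not kernel code: res_counter_model, model_charge and charge_crossed_threshold are hypothetical names, there is no locking or hierarchy, and the threshold is taken (as in the patch) to be the minimal free space below the limit.

```c
#include <stdint.h>

/* Two-state flag following the suggestion above. */
enum res_state {
	RES_BELOW_THRESH,
	RES_OVER_THRESH,
};

/* Simplified stand-in for struct res_counter (hypothetical). */
struct res_counter_model {
	uint64_t usage;
	uint64_t limit;
	uint64_t threshold;	/* minimal free space below the limit */
	enum res_state state;
};

/* Charge val bytes and refresh the state flag.
 * Returns 0 on success, -1 if the charge would exceed the limit.
 * Assumes usage <= limit on entry. */
static int model_charge(struct res_counter_model *c, uint64_t val)
{
	if (val > c->limit - c->usage)
		return -1;
	c->usage += val;
	c->state = (c->limit - c->usage <= c->threshold) ?
			RES_OVER_THRESH : RES_BELOW_THRESH;
	return 0;
}

/* Caller-side pattern: remember the old state, charge, then compare.
 * Returns 1 only when a successful charge changed the state. */
static int charge_crossed_threshold(struct res_counter_model *c, uint64_t val)
{
	enum res_state prev_state = c->state;

	if (model_charge(c, val) != 0)
		return 0;
	return prev_state != c->state;
}
```

In this form the caller decides what to do on a transition after the counter update has returned, so nothing needs to run from inside the counter's critical section.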
Thanks,
-Kame
> + g. spinlock_t lock
>
> Protects changes of the above values.
>
> @@ -140,6 +153,7 @@ counter fields. They are recommended to adhere to the following rules:
> usage usage_in_<unit_of_measurement>
> max_usage max_usage_in_<unit_of_measurement>
> limit limit_in_<unit_of_measurement>
> + threshold notify_threshold_in_<unit_of_measurement>
> failcnt failcnt
> lock no file :)
>
> @@ -153,9 +167,12 @@ counter fields. They are recommended to adhere to the following rules:
> usage prohibited
> max_usage reset to usage
> limit set the limit
> + threshold set the threshold
> failcnt reset to zero
>
> -
> + d. Notification is enabled by installing the threshold notifier callback. It
> + is up to the resource controller to communicate the notification to user
> + space tasks.
>
> 5. Usage example
>
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index 511f42f..5ec98d7 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -9,6 +9,11 @@
> *
> * Author: Pavel Emelianov <xemul@openvz.org>
> *
> + * Resource usage threshold notification update
> + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
> + * Author: Dan Malek <dan@embeddedalley.com>
> + * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
> + *
> * See Documentation/cgroups/resource_counter.txt for more
> * info about what this counter is.
> */
> @@ -35,6 +40,19 @@ struct res_counter {
> */
> unsigned long long limit;
> /*
> + * the resource usage threshold to notify the resource controller. This
> + * is the minimal difference between the resource limit and current
> + * usage to fire a notification.
> + */
> + unsigned long long threshold;
> + /*
> + * the threshold notification callback installed by the resource
> + * controller. Called when the usage reaches or exceeds the threshold.
> + * Should be fast and not sleep because called when interrupts are
> + * disabled.
> + */
> + void (*threshold_notifier)(struct res_counter *counter);
> + /*
> * the number of unsuccessful attempts to consume the resource
> */
> unsigned long long failcnt;
> @@ -87,6 +105,7 @@ enum {
> RES_MAX_USAGE,
> RES_LIMIT,
> RES_FAILCNT,
> + RES_THRESHOLD,
> };
>
> /*
> @@ -132,6 +151,21 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> return false;
> }
>
> +static inline bool res_counter_threshold_check_locked(struct res_counter *cnt)
> +{
> + if (cnt->usage + cnt->threshold < cnt->limit)
> + return true;
> +
> + return false;
> +}
> +
> +static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
> +{
> + if (!res_counter_threshold_check_locked(cnt) &&
> + cnt->threshold_notifier)
> + cnt->threshold_notifier(cnt);
> +}
> +
> /*
> * Helper function to detect if the cgroup is within it's limit or
> * not. It's currently called from cgroup_rss_prepare()
> @@ -147,6 +181,21 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> return ret;
> }
>
> +/*
> + * Helper function to detect if the cgroup usage is under its threshold or
> + * not.
> + */
> +static inline bool res_counter_check_under_threshold(struct res_counter *cnt)
> +{
> + bool ret;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&cnt->lock, flags);
> + ret = res_counter_threshold_check_locked(cnt);
> + spin_unlock_irqrestore(&cnt->lock, flags);
> + return ret;
> +}
> +
> static inline void res_counter_reset_max(struct res_counter *cnt)
> {
> unsigned long flags;
> @@ -174,6 +223,26 @@ static inline int res_counter_set_limit(struct res_counter *cnt,
> spin_lock_irqsave(&cnt->lock, flags);
> if (cnt->usage <= limit) {
> cnt->limit = limit;
> + if (limit <= cnt->threshold)
> + cnt->threshold = 0;
> + else
> + res_counter_threshold_notify_locked(cnt);
> + ret = 0;
> + }
> + spin_unlock_irqrestore(&cnt->lock, flags);
> + return ret;
> +}
> +
> +static inline int res_counter_set_threshold(struct res_counter *cnt,
> + unsigned long long threshold)
> +{
> + unsigned long flags;
> + int ret = -EINVAL;
> +
> + spin_lock_irqsave(&cnt->lock, flags);
> + if (cnt->limit > threshold) {
> + cnt->threshold = threshold;
> + res_counter_threshold_notify_locked(cnt);
> ret = 0;
> }
> spin_unlock_irqrestore(&cnt->lock, flags);
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index e1338f0..9b36748 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -5,6 +5,10 @@
> *
> * Author: Pavel Emelianov <xemul@openvz.org>
> *
> + * Resource usage threshold notification update
> + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
> + * Author: Dan Malek <dan@embeddedalley.com>
> + * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
> */
>
> #include <linux/types.h>
> @@ -32,6 +36,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> counter->usage += val;
> if (counter->usage > counter->max_usage)
> counter->max_usage = counter->usage;
> + res_counter_threshold_notify_locked(counter);
> return 0;
> }
>
> @@ -101,6 +106,8 @@ res_counter_member(struct res_counter *counter, int member)
> return &counter->limit;
> case RES_FAILCNT:
> return &counter->failcnt;
> + case RES_THRESHOLD:
> + return &counter->threshold;
> };
>
> BUG();
> --
> 1.5.6.3
>
>
* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
2009-07-14 0:30 ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) KAMEZAWA Hiroyuki
@ 2009-07-14 1:29 ` Vladislav D. Buzov
2009-07-14 1:45 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 13+ messages in thread
From: Vladislav D. Buzov @ 2009-07-14 1:29 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
Linux memory management list, Dan Malek, Andrew Morton,
Paul Menage, Balbir Singh
KAMEZAWA Hiroyuki wrote:
> On Mon, 13 Jul 2009 17:16:20 -0700
> Vladislav Buzov <vbuzov@embeddedalley.com> wrote:
>
>
>> This patch updates the Resource Counter to add a configurable resource usage
>> threshold notification mechanism.
>>
>> Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
>> Signed-off-by: Dan Malek <dan@embeddedalley.com>
>> ---
>> Documentation/cgroups/resource_counter.txt | 21 ++++++++-
>> include/linux/res_counter.h | 69 ++++++++++++++++++++++++++++
>> kernel/res_counter.c | 7 +++
>> 3 files changed, 95 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
>> index 95b24d7..1369dff 100644
>> --- a/Documentation/cgroups/resource_counter.txt
>> +++ b/Documentation/cgroups/resource_counter.txt
>> @@ -39,7 +39,20 @@ to work with it.
>> The failcnt stands for "failures counter". This is the number of
>> resource allocation attempts that failed.
>>
>> - c. spinlock_t lock
>> + e. unsigned long long threshold
>> +
>> + The resource usage threshold to notify the resource controller. This is
>> + the minimal difference between the resource limit and current usage
>> + to fire a notification.
>> +
>> + f. void (*threshold_notifier)(struct res_counter *counter)
>> +
>> + The threshold notification callback installed by the resource
>> + controller. Called when the usage reaches or exceeds the threshold.
>> + Should be fast and not sleep because called when interrupts are
>> + disabled.
>> +
>>
>
> This interface isn't very useful and is hard to use. Can't you just return the
> result, "exceeds threshold", to the callers?
>
> If I were you, I'd add the following state to res_counter:
>
>	enum res_state {
>		RES_BELOW_THRESH,
>		RES_OVER_THRESH,
>	};
>
>	struct res_counter {
>		.....
>		enum res_state state;
>	};
>
> Then the caller does, for example:
>	prev_state = res->state;
>	res_counter_charge(res....)
>	if (prev_state != res->state)
>		do_xxxxx..
>
> A notifier called under a spinlock is not a usual interface. And if this is a
> "notifier", something generic, notifier_call_chain should be used rather than
> a custom one, IIUC.
>
> So avoiding the "callback" is the way to go, I think.
>
>
The reason for having this callback is to support the hierarchy, which
was the problem you pointed out in the previous implementation.
When a new page is charged, we want to walk up the hierarchy, find all
the ancestors exceeding their thresholds, and notify them. To avoid
walking up the hierarchy twice, I've expanded res_counter with a "notifier
callback" called by res_counter_charge() for each res_counter in the
tree which exceeds its threshold.
In the example above, the hierarchy is not supported. We know only the state
of the res_counter/memcg which the current thread belongs to.
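The single walk up the hierarchy described above might look roughly like the following. This is a simplified user-space model for illustration only: counter_model, charge_hierarchy and count_notifier are hypothetical names, and the sketch leaves out the locking, limit enforcement and rollback on failure that the real res_counter_charge() has to deal with.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct res_counter with a parent pointer. */
struct counter_model {
	uint64_t usage;
	uint64_t limit;
	uint64_t threshold;	/* minimal free space below the limit */
	struct counter_model *parent;
	void (*threshold_notifier)(struct counter_model *counter);
};

/* Same comparison as res_counter_threshold_check_locked() in the patch. */
static int under_threshold(const struct counter_model *c)
{
	return c->usage + c->threshold < c->limit;
}

/* Charge val to the counter and every ancestor in one pass, invoking the
 * notifier of each level that is no longer under its threshold. */
static void charge_hierarchy(struct counter_model *c, uint64_t val)
{
	for (; c != NULL; c = c->parent) {
		c->usage += val;
		if (!under_threshold(c) && c->threshold_notifier)
			c->threshold_notifier(c);
	}
}

/* Illustrative notifier that only counts how often it was called. */
static int notified_count;

static void count_notifier(struct counter_model *counter)
{
	(void)counter;
	notified_count++;
}
```

With a parent of limit 100 and threshold 40 and a child of limit 60 and threshold 10, charging the child 40, then 10, then 10 bytes notifies nobody, then the child only, then both child and parent, all without a second traversal.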
Thanks,
Vlad.
* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
2009-07-14 1:29 ` Vladislav D. Buzov
@ 2009-07-14 1:45 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-14 1:45 UTC (permalink / raw)
To: Vladislav D. Buzov
Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
Linux memory management list, Dan Malek, Andrew Morton,
Paul Menage, Balbir Singh
On Mon, 13 Jul 2009 18:29:01 -0700
"Vladislav D. Buzov" <vbuzov@embeddedalley.com> wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Mon, 13 Jul 2009 17:16:20 -0700
> > Vladislav Buzov <vbuzov@embeddedalley.com> wrote:
> >
> >
> >> This patch updates the Resource Counter to add a configurable resource usage
> >> threshold notification mechanism.
> >>
> >> Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
> >> Signed-off-by: Dan Malek <dan@embeddedalley.com>
> >> ---
> >> Documentation/cgroups/resource_counter.txt | 21 ++++++++-
> >> include/linux/res_counter.h | 69 ++++++++++++++++++++++++++++
> >> kernel/res_counter.c | 7 +++
> >> 3 files changed, 95 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
> >> index 95b24d7..1369dff 100644
> >> --- a/Documentation/cgroups/resource_counter.txt
> >> +++ b/Documentation/cgroups/resource_counter.txt
> >> @@ -39,7 +39,20 @@ to work with it.
> >> The failcnt stands for "failures counter". This is the number of
> >> resource allocation attempts that failed.
> >>
> >> - c. spinlock_t lock
> >> + e. unsigned long long threshold
> >> +
> >> + The resource usage threshold to notify the resouce controller. This is
> >> + the minimal difference between the resource limit and current usage
> >> + to fire a notification.
> >> +
> >> + f. void (*threshold_notifier)(struct res_counter *counter)
> >> +
> >> + The threshold notification callback installed by the resource
> >> + controller. Called when the usage reaches or exceeds the threshold.
> >> + Should be fast and not sleep because called when interrupts are
> >> + disabled.
> >> +
> >>
> >
> > This interface isn't very useful and is hard to use. Can't you just return
> > the result "exceeds threshold" to the callers?
> >
> > If I were you, I'd add the following state to res_counter:
> >
> > enum res_state {
> > RES_BELOW_THRESH,
> > RES_OVER_THRESH,
> > };
> >
> > struct res_counter {
> > .....
> > enum res_state state;
> > };
> >
> > Then the caller does, for example:
> > prev_state = res->state;
> > res_counter_charge(res....)
> > if (prev_state != res->state)
> > do_xxxxx..
> >
> > A notifier called under a spinlock is not a usual interface. And if this is
> > a "notifier", i.e. something generic, notifier_call_chain should be used
> > rather than a home-grown one, IIUC.
> >
> > So avoiding the callback is the way to go, I think.
> >
> >
> The reason for having this callback is to support the hierarchy, which
> was the problem you pointed out in the previous implementation.
>
> When a new page is charged, we want to walk up the hierarchy, find all
> the ancestors exceeding their thresholds, and notify them. To avoid
> walking up the hierarchy twice, I've extended res_counter with a notifier
> callback that res_counter_charge() calls for each res_counter in the
> tree which exceeds its threshold.
>
> In the example above, the hierarchy is not supported. We only know the
> state of the res_counter/memcg that the current thread belongs to.
>
How heavy can res_counter get? ;) Please don't check at every charge; use some
filter.
Please discuss this with Balbir. His soft limit adds something similar, and I
don't think either approach is elegant.
I'll think about it more (of course, I may not find anything better) and rewrite
the whole thing if I get a chance.
Thinking briefly, an interface like the following would not be too bad.
==
/*
 * Check the state of a counter and all of its ancestors. Each counter is
 * passed to check_function() one by one until check_function() returns
 * nonzero or res->parent is NULL.
 */
void res_counter_callback(struct res_counter *res,
			  int (*check_function)(struct res_counter *))
{
	do {
		if ((*check_function)(res))
			break;
		res = res->parent;
	} while (res);
}
==
Calling this once per 1000 charges or once per second would not be too bad, and
we can keep res_counter simple. If you want some trigger, you can add one as
you like.
Thanks,
-Kame
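
For illustration only, the filtered walk described above could look roughly
like the user-space sketch below. The struct is a simplified stand-in for the
kernel's res_counter. The charges and notified fields, CHECK_INTERVAL,
check_threshold() and charge() are hypothetical names that do not appear in
the posted patch, and locking and limit enforcement are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's struct res_counter. */
struct res_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;	/* notify when limit - usage <= threshold */
	unsigned long charges;		/* hypothetical: number of charges so far */
	int notified;			/* hypothetical: set by the check function */
	struct res_counter *parent;
};

#define CHECK_INTERVAL 1000	/* assumed filter: walk once per 1000 charges */

/* Pass res and each ancestor to check_function() until it returns nonzero. */
void res_counter_callback(struct res_counter *res,
			  int (*check_function)(struct res_counter *))
{
	do {
		if (check_function(res))
			break;
		res = res->parent;
	} while (res);
}

/* Example check: flag the counter once usage is within threshold of limit. */
static int check_threshold(struct res_counter *res)
{
	if (res->usage + res->threshold >= res->limit)
		res->notified = 1;
	return 0;	/* zero means: keep walking up to the root */
}

/* Charge every level, but run the ancestor walk only every CHECK_INTERVAL. */
void charge(struct res_counter *res, unsigned long long val)
{
	struct res_counter *c;

	assert(res != NULL);
	for (c = res; c; c = c->parent)
		c->usage += val;
	if (++res->charges % CHECK_INTERVAL == 0)
		res_counter_callback(res, check_threshold);
}
```

The trade-off in this sketch is that a threshold crossing can go unnoticed
for up to CHECK_INTERVAL - 1 charges, in exchange for keeping the common
charge path free of any per-level check.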
* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
2009-07-14 0:16 ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
2009-07-14 0:16 ` [PATCH 2/2] Memory usage limit notification addition to memcg (v3) Vladislav Buzov
2009-07-14 0:30 ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) KAMEZAWA Hiroyuki
@ 2009-07-14 0:36 ` Paul Menage
2009-07-14 0:47 ` KAMEZAWA Hiroyuki
2 siblings, 1 reply; 13+ messages in thread
From: Paul Menage @ 2009-07-14 0:36 UTC (permalink / raw)
To: Vladislav Buzov
Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
Linux memory management list, Dan Malek, Andrew Morton,
KAMEZAWA Hiroyuki, Balbir Singh
As I mentioned in another thread, I think that associating the
threshold with the res_counter rather than with each individual waiter
is a mistake, since it creates global state and makes it hard to have
multiple waiters on the same cgroup.
Paul
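
As a rough illustration of the alternative raised above, a threshold kept per
waiter rather than in the res_counter might be modelled as in the user-space
sketch below. Everything here is an assumption made for the example: struct
threshold_waiter, add_waiter() and the fired flag are hypothetical and are not
part of the posted patch, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-waiter record; not part of the posted patch. */
struct threshold_waiter {
	unsigned long long threshold;	/* fire when limit - usage <= threshold */
	int fired;			/* stand-in for waking the waiting task */
	struct threshold_waiter *next;
};

/* Simplified stand-in for struct res_counter, with no threshold field. */
struct res_counter {
	unsigned long long usage;
	unsigned long long limit;
	struct threshold_waiter *waiters;
};

/* Register a waiter with its own private threshold on this counter. */
void add_waiter(struct res_counter *cnt, struct threshold_waiter *w,
		unsigned long long threshold)
{
	assert(cnt != NULL && w != NULL);
	w->threshold = threshold;
	w->fired = 0;
	w->next = cnt->waiters;
	cnt->waiters = w;
}

/* Charge the counter and mark every waiter whose own threshold is reached. */
int charge(struct res_counter *cnt, unsigned long long val)
{
	struct threshold_waiter *w;

	if (cnt->usage + val > cnt->limit)
		return -1;	/* over the limit: the charge fails */
	cnt->usage += val;
	for (w = cnt->waiters; w; w = w->next)
		if (cnt->usage + w->threshold >= cnt->limit)
			w->fired = 1;
	return 0;
}
```

With a limit of 100, a waiter registered with threshold 50 fires once usage
reaches 50, while a second waiter with threshold 10 on the same counter stays
quiet until usage reaches 90. The counter itself carries no threshold state.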
On Mon, Jul 13, 2009 at 5:16 PM, Vladislav
Buzov<vbuzov@embeddedalley.com> wrote:
> This patch updates the Resource Counter to add a configurable resource usage
> threshold notification mechanism.
>
> Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
> Signed-off-by: Dan Malek <dan@embeddedalley.com>
> ---
> Documentation/cgroups/resource_counter.txt | 21 ++++++++-
> include/linux/res_counter.h | 69 ++++++++++++++++++++++++++++
> kernel/res_counter.c | 7 +++
> 3 files changed, 95 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
> index 95b24d7..1369dff 100644
> --- a/Documentation/cgroups/resource_counter.txt
> +++ b/Documentation/cgroups/resource_counter.txt
> @@ -39,7 +39,20 @@ to work with it.
> The failcnt stands for "failures counter". This is the number of
> resource allocation attempts that failed.
>
> - c. spinlock_t lock
> + e. unsigned long long threshold
> +
> + The resource usage threshold to notify the resouce controller. This is
> + the minimal difference between the resource limit and current usage
> + to fire a notification.
> +
> + f. void (*threshold_notifier)(struct res_counter *counter)
> +
> + The threshold notification callback installed by the resource
> + controller. Called when the usage reaches or exceeds the threshold.
> + Should be fast and not sleep because called when interrupts are
> + disabled.
> +
> + g. spinlock_t lock
>
> Protects changes of the above values.
>
> @@ -140,6 +153,7 @@ counter fields. They are recommended to adhere to the following rules:
> usage usage_in_<unit_of_measurement>
> max_usage max_usage_in_<unit_of_measurement>
> limit limit_in_<unit_of_measurement>
> + threshold notify_threshold_in_<unit_of_measurement>
> failcnt failcnt
> lock no file :)
>
> @@ -153,9 +167,12 @@ counter fields. They are recommended to adhere to the following rules:
> usage prohibited
> max_usage reset to usage
> limit set the limit
> + threshold set the threshold
> failcnt reset to zero
>
> -
> + d. Notification is enabled by installing the threshold notifier callback. It
> + is up to the resouce controller to communicate the notification to user
> + space tasks.
>
> 5. Usage example
>
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index 511f42f..5ec98d7 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -9,6 +9,11 @@
> *
> * Author: Pavel Emelianov <xemul@openvz.org>
> *
> + * Resouce usage threshold notification update
> + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
> + * Author: Dan Malek <dan@embeddedalley.com>
> + * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
> + *
> * See Documentation/cgroups/resource_counter.txt for more
> * info about what this counter is.
> */
> @@ -35,6 +40,19 @@ struct res_counter {
> */
> unsigned long long limit;
> /*
> + * the resource usage threshold to notify the resouce controller. This
> + * is the minimal difference between the resource limit and current
> + * usage to fire a notification.
> + */
> + unsigned long long threshold;
> + /*
> + * the threshold notification callback installed by the resource
> + * controller. Called when the usage reaches or exceeds the threshold.
> + * Should be fast and not sleep because called when interrupts are
> + * disabled.
> + */
> + void (*threshold_notifier)(struct res_counter *counter);
> + /*
> * the number of unsuccessful attempts to consume the resource
> */
> unsigned long long failcnt;
> @@ -87,6 +105,7 @@ enum {
> RES_MAX_USAGE,
> RES_LIMIT,
> RES_FAILCNT,
> + RES_THRESHOLD,
> };
>
> /*
> @@ -132,6 +151,21 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> return false;
> }
>
> +static inline bool res_counter_threshold_check_locked(struct res_counter *cnt)
> +{
> + if (cnt->usage + cnt->threshold < cnt->limit)
> + return true;
> +
> + return false;
> +}
> +
> +static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
> +{
> + if (!res_counter_threshold_check_locked(cnt) &&
> + cnt->threshold_notifier)
> + cnt->threshold_notifier(cnt);
> +}
> +
> /*
> * Helper function to detect if the cgroup is within it's limit or
> * not. It's currently called from cgroup_rss_prepare()
> @@ -147,6 +181,21 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
> return ret;
> }
>
> +/*
> + * Helper function to detect if the cgroup usage is under it's threshold or
> + * not.
> + */
> +static inline bool res_counter_check_under_threshold(struct res_counter *cnt)
> +{
> + bool ret;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&cnt->lock, flags);
> + ret = res_counter_threshold_check_locked(cnt);
> + spin_unlock_irqrestore(&cnt->lock, flags);
> + return ret;
> +}
> +
> static inline void res_counter_reset_max(struct res_counter *cnt)
> {
> unsigned long flags;
> @@ -174,6 +223,26 @@ static inline int res_counter_set_limit(struct res_counter *cnt,
> spin_lock_irqsave(&cnt->lock, flags);
> if (cnt->usage <= limit) {
> cnt->limit = limit;
> + if (limit <= cnt->threshold)
> + cnt->threshold = 0;
> + else
> + res_counter_threshold_notify_locked(cnt);
> + ret = 0;
> + }
> + spin_unlock_irqrestore(&cnt->lock, flags);
> + return ret;
> +}
> +
> +static inline int res_counter_set_threshold(struct res_counter *cnt,
> + unsigned long long threshold)
> +{
> + unsigned long flags;
> + int ret = -EINVAL;
> +
> + spin_lock_irqsave(&cnt->lock, flags);
> + if (cnt->limit > threshold) {
> + cnt->threshold = threshold;
> + res_counter_threshold_notify_locked(cnt);
> ret = 0;
> }
> spin_unlock_irqrestore(&cnt->lock, flags);
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index e1338f0..9b36748 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -5,6 +5,10 @@
> *
> * Author: Pavel Emelianov <xemul@openvz.org>
> *
> + * Resouce usage threshold notification update
> + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
> + * Author: Dan Malek <dan@embeddedalley.com>
> + * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
> */
>
> #include <linux/types.h>
> @@ -32,6 +36,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> counter->usage += val;
> if (counter->usage > counter->max_usage)
> counter->max_usage = counter->usage;
> + res_counter_threshold_notify_locked(counter);
> return 0;
> }
>
> @@ -101,6 +106,8 @@ res_counter_member(struct res_counter *counter, int member)
> return &counter->limit;
> case RES_FAILCNT:
> return &counter->failcnt;
> + case RES_THRESHOLD:
> + return &counter->threshold;
> };
>
> BUG();
> --
> 1.5.6.3
>
>
* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
2009-07-14 0:36 ` Paul Menage
@ 2009-07-14 0:47 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-14 0:47 UTC (permalink / raw)
To: Paul Menage
Cc: Vladislav Buzov, Linux Kernel Mailing List,
Linux Containers Mailing List, Linux memory management list,
Dan Malek, Andrew Morton, Balbir Singh
On Mon, 13 Jul 2009 17:36:40 -0700
Paul Menage <menage@google.com> wrote:
> As I mentioned in another thread, I think that associating the
> threshold with the res_counter rather than with each individual waiter
> is a mistake, since it creates global state and makes it hard to have
> multiple waiters on the same cgroup.
>
Ah, hmm... maybe yes.
But the problem is "hierarchy" (even if this usage notifier doesn't handle it).
When we charge along a res_counter hierarchy like this:
res_counter_A + PAGE_SIZE
res_counter_B + PAGE_SIZE
res_counter_C + PAGE_SIZE
checking "where we exceed" in a smart way is not very easy. Balbir's soft limit
does a similar check, but I don't think it's very smart either.
If there are plural thresholds (notifier, soft limit, etc.), this is worth
trying. Hmm... if not, the size of res_counter exceeds 128 bytes and we'll end
up with a terrible counter.
Any ideas?
Thanks,
-Kame
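
One way to picture the "where we exceed" problem is the user-space sketch
below, which charges each level once and records the levels whose state flips
from below to over the threshold. It borrows the RES_BELOW_THRESH and
RES_OVER_THRESH states suggested earlier in the thread. charge_hierarchy() and
the crossed[] array are hypothetical, and locking and limit enforcement are
omitted, so this is a model of the idea rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>

enum res_state { RES_BELOW_THRESH, RES_OVER_THRESH };

/* Simplified user-space stand-in for the kernel's struct res_counter. */
struct res_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;	/* over when limit - usage <= threshold */
	enum res_state state;
	struct res_counter *parent;
};

/*
 * Charge val to res and every ancestor in a single pass. Counters whose
 * state changes from below to over the threshold are stored in crossed[]
 * (at most max entries). Returns the number of entries stored.
 */
size_t charge_hierarchy(struct res_counter *res, unsigned long long val,
			struct res_counter **crossed, size_t max)
{
	struct res_counter *c;
	size_t n = 0;

	assert(res != NULL);
	for (c = res; c; c = c->parent) {
		enum res_state prev = c->state;

		c->usage += val;
		c->state = (c->usage + c->threshold >= c->limit) ?
			RES_OVER_THRESH : RES_BELOW_THRESH;
		if (prev == RES_BELOW_THRESH &&
		    c->state == RES_OVER_THRESH && n < max)
			crossed[n++] = c;
	}
	return n;
}
```

Because only transitions are reported, a level that is already over its
threshold does not show up again on later charges, so the caller sees each
crossing once instead of on every page.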