linux-mm.kvack.org archive mirror
* Re: [PATCH 1/1] Memory usage limit notification addition to memcg
       [not found]     ` <20090708095616.cdfe8c7c.kamezawa.hiroyu@jp.fujitsu.com>
@ 2009-07-09  1:43       ` Vladislav D. Buzov
  2009-07-13  0:52         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Vladislav D. Buzov @ 2009-07-09  1:43 UTC
  To: KAMEZAWA Hiroyuki
  Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
	Dan Malek, Andrew Morton, Paul Menage, Balbir Singh, linux-mm

KAMEZAWA Hiroyuki wrote:
> I don't think notify_available_in_bytes is necessary.
>   
I agree. This was a replacement for the old percentage calculation that
was harder for the application to resolve. I'll remove it and update the
example to use the other available memory controller information.

> To make this kind of threshold useful, I think some relaxing margin is good.
> For example, once triggered, "notify" will not be triggered again in the next 1ms.
> Do you have an idea?
>   
There isn't any time attribute associated with this model. There is no
"trigger," just that you don't sleep if the threshold is exceeded.

The notification only happens if you are asking for it. One application
implementation could be that you just respond to notifications. If one
occurs, you will free some memory, then wait for another notification.
If you didn't free enough memory, the notification just keeps occurring
as you ask until the situation is resolved.

> I know people like to wait on a file descriptor for notifications these days.
> Can't we have an "event" file descriptor in the cgroup layer and make it reusable for
> other purposes?
That's next on the list to implement, and there were some comments in
previous messages. I just didn't want to complicate providing this
notification feature by having to also implement an "event" descriptor.
I'm certain that will cause much discussion as well. :-)

>
> I hope this application will not block rmdir() ;)
>   
No, because there are no blocking reads (the wait continues to return)
when the cgroup is being destroyed.

> One question is how this works under hierarchical accounting.
>
> Considering following.
>
> /cgroup/A/                     no thresh
>           001/                 thresh=5M
>               John             thresh=1M
>           002/                 no thresh
>               Hiroyuki         no thresh
>
> If Hiroyuki uses too much and hits /cgroup/A's limit, memory will be reclaimed from all of
> A, 001, John, 002 and Hiroyuki, and the OOM killer may kill processes in John.
> But 001/John's notifier will not fire. Right?
>   
Applications in 001/John will not be notified, since everything in
that child cgroup is OK. This is based on the accounting behavior of the
memory cgroup. If you want notification at the parent, you need to
create a thread to catch that condition at the parent level. When that
occurs, there is a mechanism to notify the children by just writing the
notify_threshold_lowait file. Your applications need to be designed to
identify this condition (or simply always free some resources when
notified) for this to work.
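A user-space sketch of that parent-level mechanism, assuming the monitoring thread already knows the paths of its children's memory.notify_threshold_lowait files. The helper name `wake_child_waiters` and the paths are hypothetical. As described in the patch, the written value is ignored: any write wakes all readers blocked on that file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: wake every task blocked on each child's
 * memory.notify_threshold_lowait by writing a value to it.
 * Returns the number of files that could not be written.
 */
static int wake_child_waiters(const char *const lowait_paths[], size_t n)
{
	int failures = 0;

	assert(lowait_paths != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		FILE *f = fopen(lowait_paths[i], "w");
		int bad;

		if (f == NULL) {
			failures++;
			continue;
		}
		/* The content is ignored; the write itself triggers the wakeup. */
		bad = (fputs("1\n", f) == EOF);
		bad |= (fclose(f) != 0);
		failures += bad;
	}
	return failures;
}
```

The children still have to recognize that a wakeup may not reflect their own usage, which is why the message above suggests they either detect this case or simply always free some resources when notified.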

The OOM killer is an orthogonal discussion. You can select from
available killers that may make the choices you desire, or implement
your own requirements and attach it to the cgroup.

> I don't think CONFIG is necessary. Let this always used.
>   
Ok.

> 2 points.
>  - Do we have to check this every time we account?
>   
What are the options? Every N pages? How to select N?

>  - This will not catch the hierarchical accounting threshold because this checks
>    only the local cgroup, not its ancestors.
>   
Right. That was the intention, so I'll need to fix it.

> I don't want to say this, but you need to add a hook to res_counter itself.
>   
I agree, res_counter seems to be the most appropriate place to keep and
track the threshold, as it already does for the usage and limit.
During a resource charge operation, res_counter can check the usage against
the threshold and, if it is exceeded, call the memory controller cgroup
to notify its tasks.
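A single-threaded user-space model of that check, loosely following what the v3 patch later in this thread adds to res_counter_charge_locked(). The struct and function names are hypothetical simplifications, and locking is omitted. The threshold is treated as a margin below the limit: notification fires once no more than `threshold` units remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of a res_counter with a threshold. */
struct model_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;	/* margin below limit that triggers notification */
	void (*notifier)(struct model_counter *cnt);
	unsigned int notifications;	/* bumped by count_notification() */
};

/* True while more than 'threshold' units remain below the limit. */
static bool model_under_threshold(const struct model_counter *cnt)
{
	/* model_charge() keeps usage <= limit, so the subtraction cannot wrap. */
	assert(cnt->usage <= cnt->limit);
	return cnt->limit - cnt->usage > cnt->threshold;
}

/* Charge 'val' units, failing if the limit would be exceeded, then check the threshold. */
static int model_charge(struct model_counter *cnt, unsigned long long val)
{
	if (val > cnt->limit - cnt->usage)
		return -1;
	cnt->usage += val;
	if (!model_under_threshold(cnt) && cnt->notifier != NULL)
		cnt->notifier(cnt);
	return 0;
}

/* Example notifier: a real controller would wake its waiting tasks here. */
static void count_notification(struct model_counter *cnt)
{
	cnt->notifications++;
}
```

The subtraction form is equivalent to the patch's `usage + threshold < limit` whenever usage does not exceed the limit, and it avoids any possibility of overflow when the limit is very large.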

> What does this mean? Can it happen?
>   
It means the cgroup was created but no one has yet set any limits on the
cgroup itself. There is no reason to test any conditions for notification.

> If this is true, "set limit" should be checked to guarantee this.
> plz allow minus this for avoiding mess.
Setting the memory controller cgroup limit and the notification
threshold are two separate operations. There isn't any "mess," just some
validation testing for reporting back to the source of the request. When
changing the memory controller limit, we ensure the threshold limit is
never allowed "negative." At most, the threshold limit will be equal to the
memory controller cgroup limit. Otherwise, the arithmetic and
conditional tests during the operational part of the software become
more complex, which we don't want.
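A sketch of the validation rules described above, modeled in user space on the res_counter_set_limit()/res_counter_set_threshold() pair in the v3 patch later in this thread: a threshold must be strictly below the limit, and lowering the limit to or below the current threshold resets the threshold to zero. The names are hypothetical, and locking and the notifier call are omitted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the limit/threshold pair, without locking or notification. */
struct limits {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;
};

/* A threshold must stay strictly below the limit. */
static int model_set_threshold(struct limits *l, unsigned long long threshold)
{
	if (threshold >= l->limit)
		return -EINVAL;
	l->threshold = threshold;
	return 0;
}

/* Refuse a limit below current usage; never leave the threshold at or above the limit. */
static int model_set_limit(struct limits *l, unsigned long long limit)
{
	if (l->usage > limit)
		return -EBUSY;
	l->limit = limit;
	if (limit <= l->threshold)
		l->threshold = 0;
	/* Invariant: the margin "limit - threshold" can never go negative. */
	assert(l->threshold == 0 || l->threshold < l->limit);
	return 0;
}
```

Keeping the invariant at write time is what lets the charge path use a single comparison without extra range checks.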

> plz call wake_em_up at pre_destroy(), too.
>   
Ok.

Thanks,
Vlad.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH 1/1] Memory usage limit notification addition to memcg
  2009-07-09  1:43       ` [PATCH 1/1] Memory usage limit notification addition to memcg Vladislav D. Buzov
@ 2009-07-13  0:52         ` KAMEZAWA Hiroyuki
  2009-07-13 21:21           ` Vladislav D. Buzov
  0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-13  0:52 UTC
  To: Vladislav D. Buzov
  Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
	Dan Malek, Andrew Morton, Paul Menage, Balbir Singh, linux-mm

On Wed, 08 Jul 2009 18:43:48 -0700
"Vladislav D. Buzov" <vbuzov@embeddedalley.com> wrote:

> KAMEZAWA Hiroyuki wrote:

> > 2 points.
> >  - Do we have to check this every time we account?
> >   
> What are the options? Every N pages? How to select N?
> 
I think you can reuse Balbir's softlimit event counter. (see v9.)


> > If this is true, "set limit" should be checked to guarantee this.
> > plz allow minus this for avoiding mess.
> Setting the memory controller cgroup limit and the notification
> threshold are two separate operations. There isn't any "mess," just some
> validation testing for reporting back to the source of the request. When
> changing the memory controller limit, we ensure the threshold limit is
> never allowed "negative." At most, the threshold limit will be equal to the
> memory controller cgroup limit. Otherwise, the arithmetic and
> conditional tests during the operational part of the software become
> more complex, which we don't want.
> 
Hmm, then please put this interface under "set_limit_mutex".

Thanks,
-Kame



* Re: [PATCH 1/1] Memory usage limit notification addition to memcg
  2009-07-13  0:52         ` KAMEZAWA Hiroyuki
@ 2009-07-13 21:21           ` Vladislav D. Buzov
  0 siblings, 0 replies; 13+ messages in thread
From: Vladislav D. Buzov @ 2009-07-13 21:21 UTC
  To: KAMEZAWA Hiroyuki
  Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
	Dan Malek, Andrew Morton, Paul Menage, Balbir Singh, linux-mm

KAMEZAWA Hiroyuki wrote:
> On Wed, 08 Jul 2009 18:43:48 -0700
> "Vladislav D. Buzov" <vbuzov@embeddedalley.com> wrote:
>
>   
>> KAMEZAWA Hiroyuki wrote:
>>     
>>> 2 points.
>>>  - Do we have to check this every time we account?
>>>   
>>>       
>> What are the options? Every N pages? How to select N?
>>
>>     
> I think you can reuse Balbir's softlimit event counter. (see v9.)
>   
It still does not answer the question of how to select the number of events
before/between sending notifications.

The idea behind the notification feature is to let user applications
know immediately when a low memory condition occurs (the threshold is
exceeded), so that they can take action to free unused memory before the
OS has to handle it (OOM-kill, reclaiming pages).

As far as I understand, the reason you would like to add a delay
between notifications is to give user applications some time to
free memory. This is not required by the design of the notification feature,
because the notification is sent only if someone is listening for it.
A typical application will subscribe for the low-memory notification, receive
it, handle it, and then subscribe again. So, even if low memory conditions
keep occurring in the meantime, the notification will not be fired. If it
happens again after the user application has freed some memory, the
application will be notified immediately.
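A minimal user-space sketch of that subscribe/receive/handle loop, assuming the caller passes the path of the cgroup's memory.notify_threshold_lowait file and supplies its own reclaim handler (`reclaim_cb`, `notification_loop` and `record_usage` are hypothetical names). The file is reopened on each pass so every blocking read starts at offset zero. A zero usage is treated as the out-of-band case the v3 documentation describes for cgroup removal, and the loop returns so the caller can check whether the task was moved or the cgroup destroyed.

```c
#include <assert.h>
#include <stdio.h>

/* Application-supplied handler; a real one would free memory. */
typedef void (*reclaim_cb)(unsigned long long usage, void *ctx);

/* Block until woken (the read is the subscription), then parse the reported usage. */
static int wait_for_notification(const char *lowait_path,
				 unsigned long long *usage)
{
	FILE *f = fopen(lowait_path, "r");
	int ok;

	if (f == NULL)
		return -1;
	/* On the real cgroup file this read sleeps until a wakeup. */
	ok = (fscanf(f, "%llu", usage) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Subscribe, receive, handle, subscribe again.  Runs at most 'max_events'
 * iterations (a negative value means forever).  Returns 1 on a zero-usage
 * wakeup so the caller can check whether the cgroup went away.
 */
static int notification_loop(const char *lowait_path, int max_events,
			     reclaim_cb reclaim, void *ctx)
{
	assert(reclaim != NULL);
	for (int n = 0; max_events < 0 || n < max_events; n++) {
		unsigned long long usage;

		if (wait_for_notification(lowait_path, &usage) != 0)
			return -1;
		if (usage == 0)
			return 1;
		reclaim(usage, ctx);
	}
	return 0;
}

/* Example handler: record the last reported usage. */
static void record_usage(unsigned long long usage, void *ctx)
{
	*(unsigned long long *)ctx = usage;
}
```

Because a wakeup is delivered only to readers that are already blocked, events that occur while the handler runs are not queued; if memory is still short, the next read simply returns again.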

>   
>>> If this is true, "set limit" should be checked to guarantee this.
>>> plz allow minus this for avoiding mess.
>>>       
>> Setting the memory controller cgroup limit and the notification
>> threshold are two separate operations. There isn't any "mess," just some
>> validation testing for reporting back to the source of the request. When
>> changing the memory controller limit, we ensure the threshold limit is
>> never allowed "negative." At most, the threshold limit will be equal to the
>> memory controller cgroup limit. Otherwise, the arithmetic and
>> conditional tests during the operational part of the software become
>> more complex, which we don't want.
>>
>>     
> Hmm, then please put this interface under "set_limit_mutex".
>   
I'm going to send another patch soon in which I have added a threshold feature
to the Resource Counter. It should address all concerns about data
protection.

Thanks,
Vlad.
> Thanks,
> -Kame
>
>   



* [PATCH 0/2] Memory usage limit notification feature (v3)
       [not found] ` <1246998310-16764-1-git-send-email-vbuzov@embeddedalley.com>
       [not found]   ` <1246998310-16764-2-git-send-email-vbuzov@embeddedalley.com>
@ 2009-07-14  0:16   ` Vladislav Buzov
  2009-07-14  0:16     ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
  2009-07-14  0:20     ` [PATCH 0/2] Memory usage limit notification feature (v3) Paul Menage
  1 sibling, 2 replies; 13+ messages in thread
From: Vladislav Buzov @ 2009-07-14  0:16 UTC
  To: Linux Kernel Mailing List
  Cc: Linux Containers Mailing List, Linux memory management list,
	Dan Malek, Andrew Morton, Paul Menage, KAMEZAWA Hiroyuki,
	Balbir Singh


The following sequence of patches introduces memory usage limit notification
capability to the Memory Controller cgroup.

This is v3 of the implementation. The major difference from the previous
version is that it is based on a Resource Counter extension that notifies the
Resource Controller when resource usage reaches or exceeds a configurable
threshold.

TODOs:

1. A more generic notification mechanism supporting different events
   would be preferable to creating a dedicated file in the Memory
   Controller cgroup.


Thanks,
Vlad.



* [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
  2009-07-14  0:16   ` [PATCH 0/2] Memory usage limit notification feature (v3) Vladislav Buzov
@ 2009-07-14  0:16     ` Vladislav Buzov
  2009-07-14  0:16       ` [PATCH 2/2] Memory usage limit notification addition to memcg (v3) Vladislav Buzov
                         ` (2 more replies)
  2009-07-14  0:20     ` [PATCH 0/2] Memory usage limit notification feature (v3) Paul Menage
  1 sibling, 3 replies; 13+ messages in thread
From: Vladislav Buzov @ 2009-07-14  0:16 UTC
  To: Linux Kernel Mailing List
  Cc: Linux Containers Mailing List, Linux memory management list,
	Dan Malek, Andrew Morton, Paul Menage, KAMEZAWA Hiroyuki,
	Balbir Singh, Vladislav Buzov

This patch updates the Resource Counter to add a configurable resource usage
threshold notification mechanism.

Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
Signed-off-by: Dan Malek <dan@embeddedalley.com>
---
 Documentation/cgroups/resource_counter.txt |   21 ++++++++-
 include/linux/res_counter.h                |   69 ++++++++++++++++++++++++++++
 kernel/res_counter.c                       |    7 +++
 3 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
index 95b24d7..1369dff 100644
--- a/Documentation/cgroups/resource_counter.txt
+++ b/Documentation/cgroups/resource_counter.txt
@@ -39,7 +39,20 @@ to work with it.
  	The failcnt stands for "failures counter". This is the number of
 	resource allocation attempts that failed.
 
- c. spinlock_t lock
+ e. unsigned long long threshold
+
+ 	The resource usage threshold to notify the resource controller. This is
+	the minimal difference between the resource limit and current usage
+	to fire a notification.
+
+ f. void (*threshold_notifier)(struct res_counter *counter)
+
+	The threshold notification callback installed by the resource
+	controller. Called when the usage reaches or exceeds the threshold.
+	Should be fast and must not sleep, because it is called with
+	interrupts disabled.
+
+ g. spinlock_t lock
 
  	Protects changes of the above values.
 
@@ -140,6 +153,7 @@ counter fields. They are recommended to adhere to the following rules:
 	usage		usage_in_<unit_of_measurement>
 	max_usage	max_usage_in_<unit_of_measurement>
 	limit		limit_in_<unit_of_measurement>
+	threshold	notify_threshold_in_<unit_of_measurement>
 	failcnt		failcnt
 	lock		no file :)
 
@@ -153,9 +167,12 @@ counter fields. They are recommended to adhere to the following rules:
 	usage		prohibited
 	max_usage	reset to usage
 	limit		set the limit
+	threshold	set the threshold
 	failcnt		reset to zero
 
-
+ d. Notification is enabled by installing the threshold notifier callback. It
+    is up to the resource controller to communicate the notification to user
+    space tasks.
 
 5. Usage example
 
diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index 511f42f..5ec98d7 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -9,6 +9,11 @@
  *
  * Author: Pavel Emelianov <xemul@openvz.org>
  *
+ * Resource usage threshold notification update
+ * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
+ * Author: Dan Malek <dan@embeddedalley.com>
+ * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
+ *
  * See Documentation/cgroups/resource_counter.txt for more
  * info about what this counter is.
  */
@@ -35,6 +40,19 @@ struct res_counter {
 	 */
 	unsigned long long limit;
 	/*
+	 * the resource usage threshold to notify the resource controller. This
+	 * is the minimal difference between the resource limit and current
+	 * usage to fire a notification.
+	 */
+	unsigned long long threshold;
+	/*
+	 * the threshold notification callback installed by the resource
+	 * controller. Called when the usage reaches or exceeds the threshold.
+	 * Should be fast and must not sleep, because it is called with
+	 * interrupts disabled.
+	 */
+	void (*threshold_notifier)(struct res_counter *counter);
+	/*
 	 * the number of unsuccessful attempts to consume the resource
 	 */
 	unsigned long long failcnt;
@@ -87,6 +105,7 @@ enum {
 	RES_MAX_USAGE,
 	RES_LIMIT,
 	RES_FAILCNT,
+	RES_THRESHOLD,
 };
 
 /*
@@ -132,6 +151,21 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
 	return false;
 }
 
+static inline bool res_counter_threshold_check_locked(struct res_counter *cnt)
+{
+	if (cnt->usage + cnt->threshold < cnt->limit)
+		return true;
+
+	return false;
+}
+
+static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
+{
+	if (!res_counter_threshold_check_locked(cnt) &&
+	    cnt->threshold_notifier)
+		cnt->threshold_notifier(cnt);
+}
+
 /*
  * Helper function to detect if the cgroup is within it's limit or
  * not. It's currently called from cgroup_rss_prepare()
@@ -147,6 +181,21 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
 	return ret;
 }
 
+/*
+ * Helper function to detect if the cgroup usage is under its threshold or
+ * not.
+ */
+static inline bool res_counter_check_under_threshold(struct res_counter *cnt)
+{
+	bool ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	ret = res_counter_threshold_check_locked(cnt);
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
+
 static inline void res_counter_reset_max(struct res_counter *cnt)
 {
 	unsigned long flags;
@@ -174,6 +223,26 @@ static inline int res_counter_set_limit(struct res_counter *cnt,
 	spin_lock_irqsave(&cnt->lock, flags);
 	if (cnt->usage <= limit) {
 		cnt->limit = limit;
+		if (limit <= cnt->threshold)
+			cnt->threshold = 0;
+		else
+			res_counter_threshold_notify_locked(cnt);
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}
+
+static inline int res_counter_set_threshold(struct res_counter *cnt,
+		unsigned long long threshold)
+{
+	unsigned long flags;
+	int ret = -EINVAL;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	if (cnt->limit > threshold) {
+		cnt->threshold = threshold;
+		res_counter_threshold_notify_locked(cnt);
 		ret = 0;
 	}
 	spin_unlock_irqrestore(&cnt->lock, flags);
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index e1338f0..9b36748 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -5,6 +5,10 @@
  *
  * Author: Pavel Emelianov <xemul@openvz.org>
  *
+ * Resource usage threshold notification update
+ * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
+ * Author: Dan Malek <dan@embeddedalley.com>
+ * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
  */
 
 #include <linux/types.h>
@@ -32,6 +36,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
 	counter->usage += val;
 	if (counter->usage > counter->max_usage)
 		counter->max_usage = counter->usage;
+	res_counter_threshold_notify_locked(counter);
 	return 0;
 }
 
@@ -101,6 +106,8 @@ res_counter_member(struct res_counter *counter, int member)
 		return &counter->limit;
 	case RES_FAILCNT:
 		return &counter->failcnt;
+	case RES_THRESHOLD:
+		return &counter->threshold;
 	};
 
 	BUG();
-- 
1.5.6.3



* [PATCH 2/2] Memory usage limit notification addition to memcg (v3)
  2009-07-14  0:16     ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
@ 2009-07-14  0:16       ` Vladislav Buzov
  2009-07-14  0:30       ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) KAMEZAWA Hiroyuki
  2009-07-14  0:36       ` Paul Menage
  2 siblings, 0 replies; 13+ messages in thread
From: Vladislav Buzov @ 2009-07-14  0:16 UTC
  To: Linux Kernel Mailing List
  Cc: Linux Containers Mailing List, Linux memory management list,
	Dan Malek, Andrew Morton, Paul Menage, KAMEZAWA Hiroyuki,
	Balbir Singh, Vladislav Buzov

This patch updates the Memory Controller Control Group to add a
configurable memory usage limit notification. The feature was
presented at the April 2009 Embedded Linux Conference.

Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
Signed-off-by: Dan Malek <dan@embeddedalley.com>
---
 Documentation/cgroups/mem_notify.txt |  140 ++++++++++++++++++++++++++++++++++
 mm/memcontrol.c                      |  100 ++++++++++++++++++++++++-
 2 files changed, 239 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/cgroups/mem_notify.txt

diff --git a/Documentation/cgroups/mem_notify.txt b/Documentation/cgroups/mem_notify.txt
new file mode 100644
index 0000000..94be3f3
--- /dev/null
+++ b/Documentation/cgroups/mem_notify.txt
@@ -0,0 +1,140 @@
+
+Memory Limit Notification
+
+Attempts have been made in the past to provide a mechanism for
+notifying processes (tasks, address spaces) when memory
+usage is approaching a high limit.  The intention is that it gives
+the application an opportunity to release some memory and continue
+operation rather than be OOM killed.  The CE Linux Forum requested
+a more contemporary implementation, and this is the result.
+
+The memory limit notification is an extension to the existing Memory
+Resource Controller.  Please read memory.txt in this directory to
+understand its operation before continuing here.
+
+1. Operation
+
+When the Memory Controller cgroup file system is mounted, the following
+files will appear:
+
+	memory.notify_threshold_in_bytes
+	memory.notify_threshold_lowait
+
+The notification is based upon reaching a threshold below the Memory
+Resource Controller limit (memory.limit_in_bytes).  The threshold
+represents the minimal number of bytes that should be available under
+the limit.  When the controller group is created, the threshold is set
+to zero which triggers notification when the Memory Resource Controller
+limit is reached.
+
+The threshold may be set by writing to memory.notify_threshold_in_bytes,
+such as:
+
+	echo 10M > memory.notify_threshold_in_bytes
+
+The current number of available bytes may be computed at any time as a
+difference between the memory.limit_in_bytes and memory.usage_in_bytes.
+
+The memory.notify_threshold_lowait is a blocking read file.  The read will
+block until one of four conditions occurs:
+
+    - The amount of available memory is equal to or less than the threshold
+      defined in memory.notify_threshold_in_bytes
+    - The memory.notify_threshold_lowait file is written with any value (debug)
+    - A thread is moved to another controller group
+    - The cgroup is destroyed or forced empty (memory.force_empty)
+
+
+1.1 Example Usage
+
+An application must be designed to properly take advantage of this
+memory threshold notification feature.  It is a powerful management component
+of some operating systems and embedded devices that must provide
+highly available and reliable computing services.  The application works
+in conjunction with information provided by the operating system to
+control limited resource usage.  Since many programmers still think
+memory is infinite and never check the return value from malloc(), it
+may come as a surprise that such mechanisms were in use long ago.
+
+A typical application will be multithreaded, with one thread either
+polling or waiting for the notification event.  When the event occurs,
+the thread will take whatever action is appropriate within the application
+design.  This could mean actually running a garbage collection algorithm
+or simply signaling other processing threads that they must do something to
+reduce their memory usage.  The notification thread will then be required
+to poll the actual usage until the low limit of its choosing is met,
+at which time the reclaim of memory can stop and the notification thread
+will wait for the next event.
+
+Internally, the application only needs to
+fopen("memory.usage_in_bytes" ...) or
+fopen("memory.notify_threshold_lowait" ...), then either poll the former
+file or block-read the latter file using fread() or fscanf() as desired.
+Both return the current usage.  Subtracting the value returned by either of
+these reads from the value obtained by reading memory.limit_in_bytes gives
+the available memory; comparing it with the threshold obtained by reading
+memory.notify_threshold_in_bytes indicates the amount of memory used over
+the threshold limit.
+
+2. Configuration
+
+Follow the instructions in memory.txt for the configuration and usage of
+the Memory Resource Controller cgroup.  Once this is created and tasks
+assigned, use the memory threshold notification as described here.
+
+The only action that is needed outside of the application waiting or polling
+is to set memory.notify_threshold_in_bytes.  To have a notification occur
+when the memory usage of the cgroup comes within 1 MByte of the limit,
+simply run:
+
+	echo 1M > memory.notify_threshold_in_bytes
+
+This value may be read or changed at any time.  Writing a higher value once
+the Memory Resource Controller is in operation may trigger an immediate
+notification if the usage is already above the new threshold.  Writing a value
+equal to or higher than the Memory Controller limit will cause an error, while
+setting the limit equal to or lower than the threshold resets the threshold
+to zero.
+
+3. Debug and Testing
+
+The design of cgroups makes it easier to perform some debugging or
+monitoring tasks without modification to the application.  For example,
+a write of any value to memory.notify_threshold_lowait will wake up all
+threads waiting for notifications regardless of current memory usage.
+
+Collecting performance data about the cgroup is also simplified, as
+no application modifications are necessary.  A separate task can be
+created that will open and monitor any necessary files of the cgroup
+(such as current limits, usage and usage percentages and even when
+notification occurs).  This task can also operate outside of the cgroup,
+so its memory usage is not charged to the cgroup.
+
+4. Design
+
+The Memory Resource Controller utilizes the Resource Counter to track and manage
+the memory of the Control Group.  The Resource Counter was extended to support
+the resource usage threshold, which is the minimal difference between the
+resource limit and usage causing the notification.  For the Memory Controller
+cgroup it is the number of bytes of memory not in use, so the cgroup
+parameters may continue to be dynamically modified without the need to modify
+the notification parameters.  Otherwise, the notification threshold would have
+to also be computed and modified on any Memory Resource Controller operating
+parameter change.
+
+The cgroup file semantics are not well suited for this type of notification
+mechanism.  While applications may choose to simply poll the current
+usage at their convenience, it was also desired to have a notification
+event that would trigger when the usage attained the threshold.  The
+blocking read() was chosen, as it is the only current useful method.
+This presented the problems of "out of band" notification, when you want
+to return some exceptional status other than reaching the notification
+threshold.  In the cases listed above, the read() on the
+memory.notify_threshold_lowait file will not block and return "0" for
+the remaining size.  When this occurs, the thread must determine if the task
+has moved to a new cgroup or if the cgroup has been destroyed.  Due to
+the usage model of this cgroup, neither is likely to happen during normal
+operation of a product.
+
+Dan Malek <dan@embeddedalley.com>
+Vladislav Buzov <vbuzov@embeddedalley.com>
+Embedded Alley Solutions, Inc.
+10 July 2009
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e2fa20d..3b49fd4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6,6 +6,11 @@
  * Copyright 2007 OpenVZ SWsoft Inc
  * Author: Pavel Emelianov <xemul@openvz.org>
  *
+ * Memory Limit Notification update
+ * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
+ * Author: Dan Malek <dan@embeddedalley.com>
+ * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -180,6 +185,9 @@ struct mem_cgroup {
 	/* set when res.limit == memsw.limit */
 	bool		memsw_is_minimum;
 
+	/* tasks waiting for memory usage threshold notification */
+	wait_queue_head_t notify_threshold_wait;
+
 	/*
 	 * statistics. This must be placed at the end of memcg.
 	 */
@@ -2052,7 +2060,7 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
 }
 /*
  * The user of this function is...
- * RES_LIMIT.
+ * RES_LIMIT, RES_THRESHOLD
  */
 static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 			    const char *buffer)
@@ -2075,6 +2083,17 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 		else
 			ret = mem_cgroup_resize_memsw_limit(memcg, val);
 		break;
+	case RES_THRESHOLD:
+		/* This function does all necessary parse...reuse it */
+		ret = res_counter_memparse_write_strategy(buffer, &val);
+		if (ret)
+			break;
+		/* For memsw threshold is not implemented */
+		if (type == _MEM)
+			ret = res_counter_set_threshold(&memcg->res, val);
+		else
+			ret = -EINVAL;
+		break;
 	default:
 		ret = -EINVAL; /* should be BUG() ? */
 		break;
@@ -2308,6 +2327,68 @@ static int mem_cgroup_swappiness_write(struct cgroup *cgrp, struct cftype *cft,
 	return 0;
 }
 
+/*
+ * This is a blocking read operation forcing a reader to sleep unless
+ * a low memory condition occurs, someone intentionally writes to
+ * "memory.notify_threshold_lowait" or cgroup state is changed. E.g.
+ * the cgroup is destroyed or task is moved to another cgroup.
+ */
+static u64 mem_cgroup_notify_threshold_lowait(struct cgroup *cgrp,
+					      struct cftype *cft)
+{
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp);
+	DEFINE_WAIT(notify_lowait);
+
+	/*
+	 * A memory resource usage of zero is a special case that
+	 * causes us not to sleep.  It normally happens when the
+	 * cgroup is about to be destroyed, and we don't want someone
+	 * trying to sleep on a queue that is about to go away.  This
+	 * condition can also be forced as part of testing.
+	 */
+	if (likely(mem->res.usage != 0)) {
+		prepare_to_wait(&mem->notify_threshold_wait, &notify_lowait,
+							TASK_INTERRUPTIBLE);
+
+		if (res_counter_check_under_threshold(&mem->res))
+			schedule();
+
+		finish_wait(&mem->notify_threshold_wait, &notify_lowait);
+	}
+
+	return res_counter_read_u64(&mem->res, RES_USAGE);
+}
+
+/*
+ * Memory usage threshold notification callback. Called with interrupts
+ * disabled by the memory resource counter when a low memory condition
+ * occurs.
+ */
+static void mem_cgroup_res_threshold_notifier(struct res_counter *cnt)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = mem_cgroup_from_res_counter(cnt, res);
+	if (waitqueue_active(&memcg->notify_threshold_wait))
+		wake_up_locked(&memcg->notify_threshold_wait);
+}
+
+/*
+ * This is used to wake up all threads that may be hanging
+ * out waiting for a low memory condition prior to that happening.
+ * Useful for triggering the event to assist with debug of applications.
+ */
+static int mem_cgroup_notify_threshold_wake_em_up(struct cgroup *cgrp,
+						  unsigned int event)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = mem_cgroup_from_cont(cgrp);
+	if (waitqueue_active(&memcg->notify_threshold_wait))
+		wake_up(&memcg->notify_threshold_wait);
+	return 0;
+}
+
 
 static struct cftype mem_cgroup_files[] = {
 	{
@@ -2351,6 +2432,17 @@ static struct cftype mem_cgroup_files[] = {
 		.read_u64 = mem_cgroup_swappiness_read,
 		.write_u64 = mem_cgroup_swappiness_write,
 	},
+	{
+		.name = "notify_threshold_in_bytes",
+		.private = MEMFILE_PRIVATE(_MEM, RES_THRESHOLD),
+		.write_string = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read,
+	},
+	{
+		.name = "notify_threshold_lowait",
+		.trigger = mem_cgroup_notify_threshold_wake_em_up,
+		.read_u64 = mem_cgroup_notify_threshold_lowait,
+	},
 };
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
@@ -2554,6 +2646,9 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	mem->last_scanned_child = 0;
 	spin_lock_init(&mem->reclaim_param_lock);
 
+	init_waitqueue_head(&mem->notify_threshold_wait);
+	mem->res.threshold_notifier = mem_cgroup_res_threshold_notifier;
+
 	if (parent)
 		mem->swappiness = get_swappiness(parent);
 	atomic_set(&mem->refcnt, 1);
@@ -2568,6 +2663,7 @@ static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
 {
 	struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
 
+	mem_cgroup_notify_threshold_wake_em_up(cont, 0);
 	return mem_cgroup_force_empty(mem, false);
 }
 
@@ -2597,6 +2693,8 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 				struct cgroup *old_cont,
 				struct task_struct *p)
 {
+	mem_cgroup_notify_threshold_wake_em_up(old_cont, 0);
+
 	mutex_lock(&memcg_tasklist);
 	/*
 	 * FIXME: It's better to move charges of this process from old
-- 
1.5.6.3
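The sleep/no-sleep rule in mem_cgroup_notify_threshold_lowait() above reduces to a small predicate. The sketch below models it in plain user-space C. The function name lowait_should_sleep() is hypothetical, and the res_counter locking and the wait queue are deliberately left out, so it illustrates the condition only, not the kernel code path.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the decision made in
 * mem_cgroup_notify_threshold_lowait(): should a reader block?
 * The kernel version also takes res->lock and sleeps on a wait
 * queue; both are omitted here.
 */
static bool lowait_should_sleep(unsigned long long usage,
				unsigned long long limit,
				unsigned long long threshold)
{
	/* invariant kept by the patch's threshold/limit setters */
	assert(threshold == 0 || threshold < limit);

	/* Zero usage is the special case (e.g. cgroup being destroyed). */
	if (usage == 0)
		return false;

	/*
	 * Same comparison as res_counter_threshold_check_locked():
	 * still "under threshold" while usage + threshold < limit.
	 */
	return usage + threshold < limit;
}
```

With limit = 100 and threshold = 10, a reader blocks at usage 89 but returns immediately at usage 90, and always returns at usage 0.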

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Memory usage limit notification feature (v3)
  2009-07-14  0:16   ` [PATCH 0/2] Memory usage limit notification feature (v3) Vladislav Buzov
  2009-07-14  0:16     ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
@ 2009-07-14  0:20     ` Paul Menage
  2009-07-14  0:31       ` KOSAKI Motohiro
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Menage @ 2009-07-14  0:20 UTC (permalink / raw)
  To: Vladislav Buzov
  Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
	Linux memory management list, Dan Malek, Andrew Morton,
	KAMEZAWA Hiroyuki, Balbir Singh

On Mon, Jul 13, 2009 at 5:16 PM, Vladislav
Buzov<vbuzov@embeddedalley.com> wrote:
>
> The following sequence of patches introduces memory usage limit notification
> capability to the Memory Controller cgroup.
>
> This is v3 of the implementation. The major difference from the previous
> version is that it is based on the Resource Counter extension to notify the
> Resource Controller when the resource usage reaches or exceeds a configurable
> threshold.
>
> TODOs:
>
> 1. A more generic notification mechanism supporting different events
>   is preferable to creating a dedicated file in the Memory
>   Controller cgroup.

I think that defining the more generic userspace-API portion of
this TODO should come *prior* to the new feature in this patch, even
if the kernel implementation isn't initially generic.

Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
  2009-07-14  0:16     ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
  2009-07-14  0:16       ` [PATCH 2/2] Memory usage limit notification addition to memcg (v3) Vladislav Buzov
@ 2009-07-14  0:30       ` KAMEZAWA Hiroyuki
  2009-07-14  1:29         ` Vladislav D. Buzov
  2009-07-14  0:36       ` Paul Menage
  2 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-14  0:30 UTC (permalink / raw)
  To: Vladislav Buzov
  Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
	Linux memory management list, Dan Malek, Andrew Morton,
	Paul Menage, Balbir Singh

On Mon, 13 Jul 2009 17:16:20 -0700
Vladislav Buzov <vbuzov@embeddedalley.com> wrote:

> This patch updates the Resource Counter to add a configurable resource usage
> threshold notification mechanism.
> 
> Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
> Signed-off-by: Dan Malek <dan@embeddedalley.com>
> ---
>  Documentation/cgroups/resource_counter.txt |   21 ++++++++-
>  include/linux/res_counter.h                |   69 ++++++++++++++++++++++++++++
>  kernel/res_counter.c                       |    7 +++
>  3 files changed, 95 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
> index 95b24d7..1369dff 100644
> --- a/Documentation/cgroups/resource_counter.txt
> +++ b/Documentation/cgroups/resource_counter.txt
> @@ -39,7 +39,20 @@ to work with it.
>   	The failcnt stands for "failures counter". This is the number of
>  	resource allocation attempts that failed.
>  
> - c. spinlock_t lock
> + e. unsigned long long threshold
> +
> + 	The resource usage threshold at which the resource controller is
> +	notified. A notification fires once the difference between the
> +	resource limit and the current usage drops to this value or below.
> +
> + f. void (*threshold_notifier)(struct res_counter *counter)
> +
> +	The threshold notification callback installed by the resource
> +	controller. Called when the usage reaches or exceeds the threshold.
> +	Must be fast and must not sleep, because it is called with
> +	interrupts disabled.
> +

This interface isn't very useful and is hard to use. Can't you just return
an "exceeds threshold" result to the callers?

If I were you, I'd add the following state to res_counter:

enum res_state {
	RES_BELOW_THRESH,
	RES_OVER_THRESH,
};

struct res_counter {
	.....
	enum	res_state	state;
};

Then the caller does, for example:
	prev_state = res->state;
	res_counter_charge(res....)
	if (prev_state != res->state)
		do_xxxxx..

A notifier called under a spinlock is not a usual interface. And if this is
meant to be a generic "notifier", notifier_call_chain should be used rather
than a custom one, IIUC.

So avoiding the callback is the way to go, I think.

Thanks,
-Kame
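The state-based alternative suggested above can be sketched in user space as follows. This is a minimal model under stated assumptions: struct model_counter, model_charge() and charge_crossed_threshold() are hypothetical names, not the real res_counter API, and locking is omitted. It shows the point of the suggestion: the caller, not a callback running under the spinlock, decides what to do when the state changes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum res_state {
	RES_BELOW_THRESH,
	RES_OVER_THRESH,
};

/* Hypothetical, cut-down counter holding only what this sketch needs. */
struct model_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;
	enum res_state state;
};

/* Recompute the cached state; the kernel would do this under cnt->lock. */
static void model_update_state(struct model_counter *c)
{
	/* invariant assumed to be kept by the threshold/limit setters */
	assert(c->threshold == 0 || c->threshold < c->limit);
	c->state = (c->usage + c->threshold < c->limit) ?
			RES_BELOW_THRESH : RES_OVER_THRESH;
}

/* Charge val, failing with -ENOMEM if it would exceed the limit. */
static int model_charge(struct model_counter *c, unsigned long long val)
{
	if (c->usage + val > c->limit)
		return -ENOMEM;
	c->usage += val;
	model_update_state(c);
	return 0;
}

/* Caller side: compare the state before and after the charge. */
static bool charge_crossed_threshold(struct model_counter *c,
				     unsigned long long val)
{
	enum res_state prev_state = c->state;

	if (model_charge(c, val) != 0)
		return false;
	return prev_state != c->state;
}
```

With limit 100 and threshold 10, charging 50 leaves the state unchanged, while a further 40 moves it to RES_OVER_THRESH and is reported exactly once; later charges while already over the threshold are not reported again. An uncharge path would need the same recomputation to report the transition back below the threshold.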




> + g. spinlock_t lock
>  
>   	Protects changes of the above values.
>  
> @@ -140,6 +153,7 @@ counter fields. They are recommended to adhere to the following rules:
>  	usage		usage_in_<unit_of_measurement>
>  	max_usage	max_usage_in_<unit_of_measurement>
>  	limit		limit_in_<unit_of_measurement>
> +	threshold	notify_threshold_in_<unit_of_measurement>
>  	failcnt		failcnt
>  	lock		no file :)
>  
> @@ -153,9 +167,12 @@ counter fields. They are recommended to adhere to the following rules:
>  	usage		prohibited
>  	max_usage	reset to usage
>  	limit		set the limit
> +	threshold	set the threshold
>  	failcnt		reset to zero
>  
> -
> + d. Notification is enabled by installing the threshold notifier callback. It
> +    is up to the resource controller to communicate the notification to user
> +    space tasks.
>  
>  5. Usage example
>  
> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> index 511f42f..5ec98d7 100644
> --- a/include/linux/res_counter.h
> +++ b/include/linux/res_counter.h
> @@ -9,6 +9,11 @@
>   *
>   * Author: Pavel Emelianov <xemul@openvz.org>
>   *
> + * Resource usage threshold notification update
> + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
> + * Author: Dan Malek <dan@embeddedalley.com>
> + * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
> + *
>   * See Documentation/cgroups/resource_counter.txt for more
>   * info about what this counter is.
>   */
> @@ -35,6 +40,19 @@ struct res_counter {
>  	 */
>  	unsigned long long limit;
>  	/*
> +	 * the resource usage threshold at which the resource controller is
> +	 * notified: a notification fires once limit - usage drops to this
> +	 * value or below.
> +	 */
> +	unsigned long long threshold;
> +	/*
> +	 * the threshold notification callback installed by the resource
> +	 * controller. Called when the usage reaches or exceeds the threshold.
> +	 * Must be fast and must not sleep, because it is called with
> +	 * interrupts disabled.
> +	 */
> +	void (*threshold_notifier)(struct res_counter *counter);
> +	/*
>  	 * the number of unsuccessful attempts to consume the resource
>  	 */
>  	unsigned long long failcnt;
> @@ -87,6 +105,7 @@ enum {
>  	RES_MAX_USAGE,
>  	RES_LIMIT,
>  	RES_FAILCNT,
> +	RES_THRESHOLD,
>  };
>  
>  /*
> @@ -132,6 +151,21 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
>  	return false;
>  }
>  
> +static inline bool res_counter_threshold_check_locked(struct res_counter *cnt)
> +{
> +	if (cnt->usage + cnt->threshold < cnt->limit)
> +		return true;
> +
> +	return false;
> +}
> +
> +static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
> +{
> +	if (!res_counter_threshold_check_locked(cnt) &&
> +	    cnt->threshold_notifier)
> +		cnt->threshold_notifier(cnt);
> +}
> +
>  /*
>   * Helper function to detect if the cgroup is within it's limit or
>   * not. It's currently called from cgroup_rss_prepare()
> @@ -147,6 +181,21 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
>  	return ret;
>  }
>  
> +/*
> + * Helper function to detect if the cgroup usage is under its threshold or
> + * not.
> + */
> +static inline bool res_counter_check_under_threshold(struct res_counter *cnt)
> +{
> +	bool ret;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cnt->lock, flags);
> +	ret = res_counter_threshold_check_locked(cnt);
> +	spin_unlock_irqrestore(&cnt->lock, flags);
> +	return ret;
> +}
> +
>  static inline void res_counter_reset_max(struct res_counter *cnt)
>  {
>  	unsigned long flags;
> @@ -174,6 +223,26 @@ static inline int res_counter_set_limit(struct res_counter *cnt,
>  	spin_lock_irqsave(&cnt->lock, flags);
>  	if (cnt->usage <= limit) {
>  		cnt->limit = limit;
> +		if (limit <= cnt->threshold)
> +			cnt->threshold = 0;
> +		else
> +			res_counter_threshold_notify_locked(cnt);
> +		ret = 0;
> +	}
> +	spin_unlock_irqrestore(&cnt->lock, flags);
> +	return ret;
> +}
> +
> +static inline int res_counter_set_threshold(struct res_counter *cnt,
> +		unsigned long long threshold)
> +{
> +	unsigned long flags;
> +	int ret = -EINVAL;
> +
> +	spin_lock_irqsave(&cnt->lock, flags);
> +	if (cnt->limit > threshold) {
> +		cnt->threshold = threshold;
> +		res_counter_threshold_notify_locked(cnt);
>  		ret = 0;
>  	}
>  	spin_unlock_irqrestore(&cnt->lock, flags);
> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> index e1338f0..9b36748 100644
> --- a/kernel/res_counter.c
> +++ b/kernel/res_counter.c
> @@ -5,6 +5,10 @@
>   *
>   * Author: Pavel Emelianov <xemul@openvz.org>
>   *
> + * Resource usage threshold notification update
> + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
> + * Author: Dan Malek <dan@embeddedalley.com>
> + * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
>   */
>  
>  #include <linux/types.h>
> @@ -32,6 +36,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
>  	counter->usage += val;
>  	if (counter->usage > counter->max_usage)
>  		counter->max_usage = counter->usage;
> +	res_counter_threshold_notify_locked(counter);
>  	return 0;
>  }
>  
> @@ -101,6 +106,8 @@ res_counter_member(struct res_counter *counter, int member)
>  		return &counter->limit;
>  	case RES_FAILCNT:
>  		return &counter->failcnt;
> +	case RES_THRESHOLD:
> +		return &counter->threshold;
>  	};
>  
>  	BUG();
> -- 
> 1.5.6.3
> 
> 
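The threshold semantics described in resource_counter.txt above, together with the rules enforced by res_counter_set_limit() and res_counter_set_threshold(), can be checked with a small user-space model. All names here (struct thr_counter, thr_charge() and the setters) are hypothetical, the notifier is replaced by a count of how often it would have fired, and the -EBUSY returned when the new limit is below current usage is an assumption, since the hunk above does not show that error value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the res_counter fields touched by the patch. */
struct thr_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;
	int notifications;	/* how often the notifier would have fired */
};

/* "Under threshold": written as a subtraction so it cannot wrap. */
static bool thr_under(const struct thr_counter *c)
{
	assert(c->threshold <= c->limit);
	return c->usage < c->limit - c->threshold;
}

static void thr_notify(struct thr_counter *c)
{
	if (!thr_under(c))
		c->notifications++;
}

/* Mirrors res_counter_charge_locked(): fail, or charge then notify. */
static int thr_charge(struct thr_counter *c, unsigned long long val)
{
	if (c->usage + val > c->limit)
		return -ENOMEM;
	c->usage += val;
	thr_notify(c);
	return 0;
}

/* Mirrors res_counter_set_threshold(): threshold must stay below limit. */
static int thr_set_threshold(struct thr_counter *c, unsigned long long t)
{
	if (c->limit <= t)
		return -EINVAL;
	c->threshold = t;
	thr_notify(c);
	return 0;
}

/* Mirrors res_counter_set_limit(); -EBUSY is an assumed error value. */
static int thr_set_limit(struct thr_counter *c, unsigned long long limit)
{
	if (c->usage > limit)
		return -EBUSY;
	c->limit = limit;
	if (limit <= c->threshold)
		c->threshold = 0;	/* threshold no longer fits: disable it */
	else
		thr_notify(c);
	return 0;
}
```

With limit 100 and threshold 10, charging up to 89 stays quiet and the charge that brings usage to 90 fires the notifier. One deliberate difference from the patch: the sketch tests usage < limit - threshold instead of usage + threshold < limit. The two are equivalent while threshold <= limit, which the setters maintain, but the subtraction cannot wrap around when limit is close to ULLONG_MAX and a large threshold is written.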

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Memory usage limit notification feature (v3)
  2009-07-14  0:20     ` [PATCH 0/2] Memory usage limit notification feature (v3) Paul Menage
@ 2009-07-14  0:31       ` KOSAKI Motohiro
  0 siblings, 0 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-07-14  0:31 UTC (permalink / raw)
  To: Paul Menage
  Cc: kosaki.motohiro, Vladislav Buzov, Linux Kernel Mailing List,
	Linux Containers Mailing List, Linux memory management list,
	Dan Malek, Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh

> On Mon, Jul 13, 2009 at 5:16 PM, Vladislav
> Buzov<vbuzov@embeddedalley.com> wrote:
> >
> > The following sequence of patches introduces memory usage limit notification
> > capability to the Memory Controller cgroup.
> >
> > This is v3 of the implementation. The major difference from the previous
> > version is that it is based on the Resource Counter extension to notify the
> > Resource Controller when the resource usage reaches or exceeds a configurable
> > threshold.
> >
> > TODOs:
> >
> > 1. A more generic notification mechanism supporting different events
> >   is preferable to creating a dedicated file in the Memory
> >   Controller cgroup.
> 
> I think that defining the more generic userspace-API portion of
> this TODO should come *prior* to the new feature in this patch, even
> if the kernel implementation isn't initially generic.

I fully agree with this ;-)



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
  2009-07-14  0:16     ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
  2009-07-14  0:16       ` [PATCH 2/2] Memory usage limit notification addition to memcg (v3) Vladislav Buzov
  2009-07-14  0:30       ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) KAMEZAWA Hiroyuki
@ 2009-07-14  0:36       ` Paul Menage
  2009-07-14  0:47         ` KAMEZAWA Hiroyuki
  2 siblings, 1 reply; 13+ messages in thread
From: Paul Menage @ 2009-07-14  0:36 UTC (permalink / raw)
  To: Vladislav Buzov
  Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
	Linux memory management list, Dan Malek, Andrew Morton,
	KAMEZAWA Hiroyuki, Balbir Singh

As I mentioned in another thread, I think that associating the
threshold with the res_counter rather than with each individual waiter
is a mistake, since it creates global state and makes it hard to have
multiple waiters on the same cgroup.

Paul
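Paul's alternative, keeping a threshold per waiter rather than one per res_counter, might look roughly like this. It is only a sketch of the idea under simplifying assumptions: a fixed-size array instead of a list, no locking, no sleeping, and hypothetical names (struct usage_waiter, add_waiter(), check_waiters()) that do not correspond to any posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Hypothetical per-listener registration, each with its own margin. */
struct usage_waiter {
	unsigned long long threshold;	/* fire once limit - usage <= threshold */
	bool fired;
};

struct waited_counter {
	unsigned long long usage;
	unsigned long long limit;
	struct usage_waiter waiters[MAX_WAITERS];
	size_t nr_waiters;
};

/* Register a listener; returns its index, or -1 if full or invalid. */
static int add_waiter(struct waited_counter *c, unsigned long long threshold)
{
	if (c->nr_waiters == MAX_WAITERS || threshold > c->limit)
		return -1;
	c->waiters[c->nr_waiters].threshold = threshold;
	c->waiters[c->nr_waiters].fired = false;
	return (int)c->nr_waiters++;
}

/* Mark every listener whose own threshold has now been reached. */
static void check_waiters(struct waited_counter *c)
{
	for (size_t i = 0; i < c->nr_waiters; i++) {
		struct usage_waiter *w = &c->waiters[i];

		assert(w->threshold <= c->limit);
		if (!w->fired && c->usage >= c->limit - w->threshold)
			w->fired = true;
	}
}

/* Charge val; the kernel would return -ENOMEM instead of -1 on failure. */
static int charge(struct waited_counter *c, unsigned long long val)
{
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	check_waiters(c);
	return 0;
}
```

Two listeners on the same cgroup can then use different margins: with limit 100, a waiter registered with threshold 40 fires once usage reaches 60, while one with threshold 10 fires only at 90. The cost is a scan of the waiters on every charge, which a real implementation would want to keep cheap.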


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
  2009-07-14  0:36       ` Paul Menage
@ 2009-07-14  0:47         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-14  0:47 UTC (permalink / raw)
  To: Paul Menage
  Cc: Vladislav Buzov, Linux Kernel Mailing List,
	Linux Containers Mailing List, Linux memory management list,
	Dan Malek, Andrew Morton, Balbir Singh

On Mon, 13 Jul 2009 17:36:40 -0700
Paul Menage <menage@google.com> wrote:

> As I mentioned in another thread, I think that associating the
> threshold with the res_counter rather than with each individual waiter
> is a mistake, since it creates global state and makes it hard to have
> multiple waiters on the same cgroup.
> 
Ah, Hmm...maybe yes. 

But the problem is "hierarchy" (even if this usage notifier doesn't handle it).

When we charge a res_counter with hierarchy as follows:

	res_counter_A			+ PAGE_SIZE
		res_counter_B			+ PAGE_SIZE
			res_counter_C			+ PAGE_SIZE

Checking "where we exceed" in a smart way is not easy. Balbir's soft limit does
a similar check, but I don't think it's very smart either.

If there are plural thresholds (notifier, soft limit, etc...), this is worth
trying. Hmm... if not, the size of res_counter exceeds 128 bytes and we'll see
a terrible counter.
Any idea?

Thanks,
-Kame
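The hierarchy problem described above can be illustrated with a toy model: a single charge is applied to every level from the child up to the root, and each level may cross its own threshold. In this sketch, struct hcounter and hcharge() are hypothetical, limit enforcement and rollback are omitted, and it reports only the level closest to the charged counter, which is just one possible answer to "where we exceed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hierarchical counter: each level has its own limit and threshold. */
struct hcounter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;
	struct hcounter *parent;	/* NULL at the root */
};

/* Same "reached or exceeded" test as the patch, applied per level. */
static bool over_threshold(const struct hcounter *c)
{
	return c->usage + c->threshold >= c->limit;
}

/*
 * Charge val at every level from c up to the root and return the level
 * nearest to c that is now at or over its threshold, or NULL if none is.
 * Limit checks and rollback on failure are left out of this sketch.
 */
static struct hcounter *hcharge(struct hcounter *c, unsigned long long val)
{
	struct hcounter *hit = NULL;

	assert(c != NULL);
	for (struct hcounter *p = c; p != NULL; p = p->parent) {
		p->usage += val;
		if (hit == NULL && over_threshold(p))
			hit = p;
	}
	return hit;
}
```

A charge to a child that is far from its own threshold can still push a parent over its threshold, so the counter that crossed is not necessarily the one being charged. Reporting every crossing level instead of the nearest one would need extra per-level state, which brings back the res_counter size concern raised above.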


> Paul

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
  2009-07-14  0:30       ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) KAMEZAWA Hiroyuki
@ 2009-07-14  1:29         ` Vladislav D. Buzov
  2009-07-14  1:45           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 13+ messages in thread
From: Vladislav D. Buzov @ 2009-07-14  1:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
	Linux memory management list, Dan Malek, Andrew Morton,
	Paul Menage, Balbir Singh

KAMEZAWA Hiroyuki wrote:
> On Mon, 13 Jul 2009 17:16:20 -0700
> Vladislav Buzov <vbuzov@embeddedalley.com> wrote:
>
>   
>> This patch updates the Resource Counter to add a configurable resource usage
>> threshold notification mechanism.
>>
>> Signed-off-by: Vladislav Buzov <vbuzov@embeddedalley.com>
>> Signed-off-by: Dan Malek <dan@embeddedalley.com>
>> ---
>>  Documentation/cgroups/resource_counter.txt |   21 ++++++++-
>>  include/linux/res_counter.h                |   69 ++++++++++++++++++++++++++++
>>  kernel/res_counter.c                       |    7 +++
>>  3 files changed, 95 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/cgroups/resource_counter.txt b/Documentation/cgroups/resource_counter.txt
>> index 95b24d7..1369dff 100644
>> --- a/Documentation/cgroups/resource_counter.txt
>> +++ b/Documentation/cgroups/resource_counter.txt
>> @@ -39,7 +39,20 @@ to work with it.
>>   	The failcnt stands for "failures counter". This is the number of
>>  	resource allocation attempts that failed.
>>  
>> - c. spinlock_t lock
>> + e. unsigned long long threshold
>> +
>> + 	The resource usage threshold to notify the resouce controller. This is
>> +	the minimal difference between the resource limit and current usage
>> +	to fire a notification.
>> +
>> + f. void (*threshold_notifier)(struct res_counter *counter)
>> +
>> +	The threshold notification callback installed by the resource
>> +	controller. Called when the usage reaches or exceeds the threshold.
>> +	Should be fast and not sleep because called when interrupts are
>> +	disabled.
>> +
>>     
>
> This interface isn't very useful; it's hard to use. Can't you just return
> the result as "exceeds threshold" to the callers?
>
> If I were you, I would add the following state to res_counter:
>
> enum res_state {
> 	RES_BELOW_THRESH,
> 	RES_OVER_THRESH,
> };
>
> struct res_counter {
> 	.....
> 	enum res_state state;
> };
>
> Then the caller does, for example:
> 	prev_state = res->state;
> 	res_counter_charge(res, ...);
> 	if (prev_state != res->state)
> 		do_xxxxx(...);
>
> A notifier called under a spinlock is not a usual interface. And if this
> is a "notifier", something generic like notifier_call_chain() should be
> used rather than an original one, IIUC.
>
> So avoiding the "callback" is the way to go, I think.
>
>   
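A minimal user-space model of the state-based alternative suggested above may help compare the two designs. It assumes the patch's threshold rule (the counter is "over" once usage + threshold >= limit); `struct demo_counter`, `demo_charge()` and `demo_charge_and_check()` are hypothetical names, and the real counter's spinlock is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* State values as suggested in the mail; illustrative only. */
enum res_state {
	RES_BELOW_THRESH,
	RES_OVER_THRESH,
};

/* Hypothetical user-space stand-in for struct res_counter (no lock). */
struct demo_counter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;	/* margin below limit, as in the patch */
	enum res_state state;
};

/* Same rule as res_counter_threshold_check_locked(): "over" once
 * usage + threshold >= limit. */
static void demo_update_state(struct demo_counter *c)
{
	c->state = (c->usage + c->threshold < c->limit) ?
		RES_BELOW_THRESH : RES_OVER_THRESH;
}

/* Charge without any callback; refuse charges that would exceed the limit. */
static int demo_charge(struct demo_counter *c, unsigned long val)
{
	if (c->usage + val > c->limit)
		return -ENOMEM;
	c->usage += val;
	demo_update_state(c);
	return 0;
}

/* Caller-side pattern: returns true only when the state flips. */
static bool demo_charge_and_check(struct demo_counter *c, unsigned long val)
{
	enum res_state prev = c->state;

	if (demo_charge(c, val))
		return false;
	return prev != c->state;
}
```

This sketch only sees the counter it charges; detecting transitions in ancestors would need the same comparison at every level of the hierarchy.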
The reason for having this callback is to support the hierarchy, which
was the problem with the previous implementation that you pointed out.

When a new page is charged, we want to walk up the hierarchy, find all
the ancestors exceeding their thresholds, and notify them. To avoid
walking up the hierarchy twice, I've extended res_counter with a
"notifier callback" that res_counter_charge() calls for each res_counter
in the tree that crosses its threshold.

The example above does not support the hierarchy: we know only the state
of the res_counter/memcg that the current thread belongs to.
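The single-pass idea described above can be sketched as follows. This is a user-space model, not the patch itself: `struct hcounter`, `hcharge()`, `count_notification()` and the `notified` field are hypothetical, locking is omitted, and the rollback on failure is modelled on how the existing res_counter_charge() uncharges the levels it has already charged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical counter with a parent link and a per-counter callback. */
struct hcounter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;		/* margin below limit */
	struct hcounter *parent;		/* NULL at the root */
	void (*threshold_notifier)(struct hcounter *c);
	int notified;				/* demo-only call count */
};

/* Fire the callback once usage + threshold reaches the limit. */
static void notify_if_over(struct hcounter *c)
{
	if (c->usage + c->threshold >= c->limit && c->threshold_notifier)
		c->threshold_notifier(c);
}

/*
 * Charge c and all its ancestors in one pass, notifying each level as it
 * is charged. If some level would exceed its limit, undo the levels below
 * it and fail.
 */
static int hcharge(struct hcounter *c, unsigned long val)
{
	struct hcounter *iter, *u;

	for (iter = c; iter != NULL; iter = iter->parent) {
		if (iter->usage + val > iter->limit) {
			for (u = c; u != iter; u = u->parent)
				u->usage -= val;
			return -ENOMEM;
		}
		iter->usage += val;
		notify_if_over(iter);
	}
	return 0;
}

/* Example notifier: just count calls (a kernel one must not sleep). */
static void count_notification(struct hcounter *c)
{
	c->notified++;
}
```

In this sketch, as with the quoted change to res_counter_charge_locked(), a notification already sent for a lower level is not withdrawn if a higher level then fails the charge.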

Thanks,
Vlad.

> Thanks,
> -Kame
>
>
>
>
>   
>> + g. spinlock_t lock
>>  
>>   	Protects changes of the above values.
>>  
>> @@ -140,6 +153,7 @@ counter fields. They are recommended to adhere to the following rules:
>>  	usage		usage_in_<unit_of_measurement>
>>  	max_usage	max_usage_in_<unit_of_measurement>
>>  	limit		limit_in_<unit_of_measurement>
>> +	threshold	notify_threshold_in_<unit_of_measurement>
>>  	failcnt		failcnt
>>  	lock		no file :)
>>  
>> @@ -153,9 +167,12 @@ counter fields. They are recommended to adhere to the following rules:
>>  	usage		prohibited
>>  	max_usage	reset to usage
>>  	limit		set the limit
>> +	threshold	set the threshold
>>  	failcnt		reset to zero
>>  
>> -
>> + d. Notification is enabled by installing the threshold notifier callback. It
>> +    is up to the resouce controller to communicate the notification to user
>> +    space tasks.
>>  
>>  5. Usage example
>>  
>> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
>> index 511f42f..5ec98d7 100644
>> --- a/include/linux/res_counter.h
>> +++ b/include/linux/res_counter.h
>> @@ -9,6 +9,11 @@
>>   *
>>   * Author: Pavel Emelianov <xemul@openvz.org>
>>   *
>> + * Resouce usage threshold notification update
>> + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
>> + * Author: Dan Malek <dan@embeddedalley.com>
>> + * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
>> + *
>>   * See Documentation/cgroups/resource_counter.txt for more
>>   * info about what this counter is.
>>   */
>> @@ -35,6 +40,19 @@ struct res_counter {
>>  	 */
>>  	unsigned long long limit;
>>  	/*
>> +	 * the resource usage threshold to notify the resouce controller. This
>> +	 * is the minimal difference between the resource limit and current
>> +	 * usage to fire a notification.
>> +	 */
>> +	unsigned long long threshold;
>> +	/*
>> +	 * the threshold notification callback installed by the resource
>> +	 * controller. Called when the usage reaches or exceeds the threshold.
>> +	 * Should be fast and not sleep because called when interrupts are
>> +	 * disabled.
>> +	 */
>> +	void (*threshold_notifier)(struct res_counter *counter);
>> +	/*
>>  	 * the number of unsuccessful attempts to consume the resource
>>  	 */
>>  	unsigned long long failcnt;
>> @@ -87,6 +105,7 @@ enum {
>>  	RES_MAX_USAGE,
>>  	RES_LIMIT,
>>  	RES_FAILCNT,
>> +	RES_THRESHOLD,
>>  };
>>  
>>  /*
>> @@ -132,6 +151,21 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
>>  	return false;
>>  }
>>  
>> +static inline bool res_counter_threshold_check_locked(struct res_counter *cnt)
>> +{
>> +	if (cnt->usage + cnt->threshold < cnt->limit)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +static inline void res_counter_threshold_notify_locked(struct res_counter *cnt)
>> +{
>> +	if (!res_counter_threshold_check_locked(cnt) &&
>> +	    cnt->threshold_notifier)
>> +		cnt->threshold_notifier(cnt);
>> +}
>> +
>>  /*
>>   * Helper function to detect if the cgroup is within it's limit or
>>   * not. It's currently called from cgroup_rss_prepare()
>> @@ -147,6 +181,21 @@ static inline bool res_counter_check_under_limit(struct res_counter *cnt)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Helper function to detect if the cgroup usage is under it's threshold or
>> + * not.
>> + */
>> +static inline bool res_counter_check_under_threshold(struct res_counter *cnt)
>> +{
>> +	bool ret;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&cnt->lock, flags);
>> +	ret = res_counter_threshold_check_locked(cnt);
>> +	spin_unlock_irqrestore(&cnt->lock, flags);
>> +	return ret;
>> +}
>> +
>>  static inline void res_counter_reset_max(struct res_counter *cnt)
>>  {
>>  	unsigned long flags;
>> @@ -174,6 +223,26 @@ static inline int res_counter_set_limit(struct res_counter *cnt,
>>  	spin_lock_irqsave(&cnt->lock, flags);
>>  	if (cnt->usage <= limit) {
>>  		cnt->limit = limit;
>> +		if (limit <= cnt->threshold)
>> +			cnt->threshold = 0;
>> +		else
>> +			res_counter_threshold_notify_locked(cnt);
>> +		ret = 0;
>> +	}
>> +	spin_unlock_irqrestore(&cnt->lock, flags);
>> +	return ret;
>> +}
>> +
>> +static inline int res_counter_set_threshold(struct res_counter *cnt,
>> +		unsigned long long threshold)
>> +{
>> +	unsigned long flags;
>> +	int ret = -EINVAL;
>> +
>> +	spin_lock_irqsave(&cnt->lock, flags);
>> +	if (cnt->limit > threshold) {
>> +		cnt->threshold = threshold;
>> +		res_counter_threshold_notify_locked(cnt);
>>  		ret = 0;
>>  	}
>>  	spin_unlock_irqrestore(&cnt->lock, flags);
>> diff --git a/kernel/res_counter.c b/kernel/res_counter.c
>> index e1338f0..9b36748 100644
>> --- a/kernel/res_counter.c
>> +++ b/kernel/res_counter.c
>> @@ -5,6 +5,10 @@
>>   *
>>   * Author: Pavel Emelianov <xemul@openvz.org>
>>   *
>> + * Resouce usage threshold notification update
>> + * Copyright 2009 CE Linux Forum and Embedded Alley Solutions, Inc.
>> + * Author: Dan Malek <dan@embeddedalley.com>
>> + * Author: Vladislav Buzov <vbuzov@embeddedalley.com>
>>   */
>>  
>>  #include <linux/types.h>
>> @@ -32,6 +36,7 @@ int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
>>  	counter->usage += val;
>>  	if (counter->usage > counter->max_usage)
>>  		counter->max_usage = counter->usage;
>> +	res_counter_threshold_notify_locked(counter);
>>  	return 0;
>>  }
>>  
>> @@ -101,6 +106,8 @@ res_counter_member(struct res_counter *counter, int member)
>>  		return &counter->limit;
>>  	case RES_FAILCNT:
>>  		return &counter->failcnt;
>> +	case RES_THRESHOLD:
>> +		return &counter->threshold;
>>  	};
>>  
>>  	BUG();
>> -- 
>> 1.5.6.3
>>
>>
>>     
>
>   
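Because `threshold` is a margin below `limit` rather than an absolute usage level, the setter rules in the diff above are easy to misread. The following user-space model restates them without locking; `struct tcounter`, `notify()`, `set_limit()`, `set_threshold()` and the `notifications` field are hypothetical, and the -EBUSY return from `set_limit()` is assumed to match the existing res_counter_set_limit().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct res_counter (no lock, no callback). */
struct tcounter {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;	/* margin below limit */
	unsigned int notifications;	/* demo-only stand-in for the notifier */
};

/* Mirrors res_counter_threshold_check_locked(): true while usage is below
 * limit - threshold. Written as an addition so it cannot underflow. */
static bool under_threshold(const struct tcounter *c)
{
	return c->usage + c->threshold < c->limit;
}

/* Mirrors res_counter_threshold_notify_locked(). */
static void notify(struct tcounter *c)
{
	if (!under_threshold(c))
		c->notifications++;
}

/* Mirrors the patched res_counter_set_limit(). */
static int set_limit(struct tcounter *c, unsigned long long limit)
{
	if (c->usage > limit)
		return -EBUSY;		/* assumed error code */
	c->limit = limit;
	if (limit <= c->threshold)
		c->threshold = 0;	/* threshold reset, no notification */
	else
		notify(c);
	return 0;
}

/* Mirrors res_counter_set_threshold(): threshold must stay below limit. */
static int set_threshold(struct tcounter *c, unsigned long long threshold)
{
	if (threshold >= c->limit)
		return -EINVAL;
	c->threshold = threshold;
	notify(c);
	return 0;
}
```

With limit 100 and threshold 20, for instance, the notifier fires once usage reaches 80. One consequence of the diff is that lowering the limit to or below the current threshold resets the threshold to 0 and sends no notification.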

* Re: [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3)
  2009-07-14  1:29         ` Vladislav D. Buzov
@ 2009-07-14  1:45           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-07-14  1:45 UTC (permalink / raw)
  To: Vladislav D. Buzov
  Cc: Linux Kernel Mailing List, Linux Containers Mailing List,
	Linux memory management list, Dan Malek, Andrew Morton,
	Paul Menage, Balbir Singh

On Mon, 13 Jul 2009 18:29:01 -0700
"Vladislav D. Buzov" <vbuzov@embeddedalley.com> wrote:

> The reason for having this callback is to support the hierarchy, which
> was the problem with the previous implementation that you pointed out.
> 
> When a new page is charged, we want to walk up the hierarchy, find all
> the ancestors exceeding their thresholds, and notify them. To avoid
> walking up the hierarchy twice, I've extended res_counter with a
> "notifier callback" that res_counter_charge() calls for each res_counter
> in the tree that crosses its threshold.
> 
> The example above does not support the hierarchy: we know only the state
> of the res_counter/memcg that the current thread belongs to.
> 
How heavy can res_counter get? ;) Please don't check at every charge; use
some filter.

Please discuss this with Balbir. His soft limit patch adds something similar,
and I don't think either one is elegant.

I'll think about it more (of course, I may not find anything better) and
rewrite the whole thing if I get a chance.

Thinking briefly, it would not be bad to have the following interface:

==
/*
 * Check the state of @res and each of its ancestors. The counters are
 * passed to check_function() one by one, following res->parent until it
 * is NULL or check_function() returns nonzero.
 */
void res_counter_callback(struct res_counter *res,
			  int (*check_function)(struct res_counter *))
{
	do {
		if ((*check_function)(res))
			break;
		res = res->parent;
	} while (res);
}
==
Calling this once per 1000 charges or once per second would not be too
costly, and it lets us keep res_counter simple. If you want some trigger,
you can add it on top as you like.
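The filtered walk suggested above might look like the sketch below. It keeps the shape of res_counter_callback() from the mail but gives the callback a prototype; `struct rc`, `charge_sampled()`, `count_over()`, the demo counters and `CHECK_INTERVAL` are hypothetical, limit checks are omitted, and the interval of 1000 charges is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, lock-free stand-in for struct res_counter. */
struct rc {
	unsigned long long usage;
	unsigned long long limit;
	unsigned long long threshold;	/* margin below limit */
	struct rc *parent;		/* NULL at the root */
};

/* The walker from the mail: stop early when the callback returns nonzero. */
static void res_counter_callback(struct rc *res, int (*check_function)(struct rc *))
{
	do {
		if (check_function(res))
			break;
		res = res->parent;
	} while (res);
}

#define CHECK_INTERVAL 1000	/* "once per 1000 charges" */

static unsigned int checks;	/* demo-only: levels inspected */
static unsigned int over_count;	/* demo-only: levels found over threshold */

/* Example check: count counters over their threshold, keep walking. */
static int count_over(struct rc *res)
{
	checks++;
	if (res->usage + res->threshold >= res->limit)
		over_count++;
	return 0;
}

/* Charge every level on each call, but walk for threshold checks only
 * once per CHECK_INTERVAL charges (limit checks omitted for brevity). */
static void charge_sampled(struct rc *res, unsigned long val,
			   unsigned long *nr_charges)
{
	struct rc *iter;

	for (iter = res; iter != NULL; iter = iter->parent)
		iter->usage += val;
	if (++*nr_charges % CHECK_INTERVAL == 0)
		res_counter_callback(res, count_over);
}
```

The charge path still touches every level, but the threshold check walks the hierarchy only once per `CHECK_INTERVAL` charges, so a crossing may be reported up to that many charges late.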

Thanks,
-Kame

Thread overview: 13+ messages
     [not found] <1239660512-25468-1-git-send-email-dan@embeddedalley.com>
     [not found] ` <1246998310-16764-1-git-send-email-vbuzov@embeddedalley.com>
     [not found]   ` <1246998310-16764-2-git-send-email-vbuzov@embeddedalley.com>
     [not found]     ` <20090708095616.cdfe8c7c.kamezawa.hiroyu@jp.fujitsu.com>
2009-07-09  1:43       ` [PATCH 1/1] Memory usage limit notification addition to memcg Vladislav D. Buzov
2009-07-13  0:52         ` KAMEZAWA Hiroyuki
2009-07-13 21:21           ` Vladislav D. Buzov
2009-07-14  0:16   ` [PATCH 0/2] Memory usage limit notification feature (v3) Vladislav Buzov
2009-07-14  0:16     ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) Vladislav Buzov
2009-07-14  0:16       ` [PATCH 2/2] Memory usage limit notification addition to memcg (v3) Vladislav Buzov
2009-07-14  0:30       ` [PATCH 1/2] Resource usage threshold notification addition to res_counter (v3) KAMEZAWA Hiroyuki
2009-07-14  1:29         ` Vladislav D. Buzov
2009-07-14  1:45           ` KAMEZAWA Hiroyuki
2009-07-14  0:36       ` Paul Menage
2009-07-14  0:47         ` KAMEZAWA Hiroyuki
2009-07-14  0:20     ` [PATCH 0/2] Memory usage limit notification feature (v3) Paul Menage
2009-07-14  0:31       ` KOSAKI Motohiro
