All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <righi.andrea@gmail.com>
To: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	agk@sourceware.org, akpm@linux-foundation.org, axboe@kernel.dk,
	baramsori72@gmail.com, Carl Henrik Lunde <chlunde@ping.uio.no>,
	dave@linux.vnet.ibm.com, Divyesh Shah <dpshah@google.com>,
	eric.rannaud@gmail.com, fernando@oss.ntt.co.jp,
	Hirokazu Takahashi <taka@valinux.co.jp>,
	Li Zefan <lizf@cn.fujitsu.com>,
	matt@bluehost.com, dradford@bluehost.com, ngupta@google.com,
	randy.dunlap@oracle.com, roberto@unbit.it,
	Ryo Tsuruta <ryov@valinux.co.jp>,
	Satoshi UCHIDA <s-uchida@ap.jp.nec.com>,
	subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp,
	Nauman Rafique <nauman@google.com>,
	fchecconi@gmail.com, paolo.valente@unimore.it,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/7] res_counter: introduce ratelimiting attributes
Date: Tue, 21 Apr 2009 13:16:34 +0200	[thread overview]
Message-ID: <20090421111633.GC13699@linux> (raw)
In-Reply-To: <20090421101326.GE19637@balbir.in.ibm.com>

On Tue, Apr 21, 2009 at 03:43:26PM +0530, Balbir Singh wrote:
> * Andrea Righi <righi.andrea@gmail.com> [2009-04-18 23:38:27]:
> 
> > Introduce attributes and functions in res_counter to implement throttling-based
> > cgroup subsystems.
> > 
> > The following attributes have been added to struct res_counter:
> >  * @policy:     the limiting policy / algorithm
> >  * @capacity:   the maximum capacity of the resource
> >  * @timestamp:  timestamp of the last accounted resource request
> > 
> 
> Units of each of the above would be desirable, without them it is hard
> to understand what you are trying to add. What is the unit of
> capacity?

Theoretically it can be any unit. At the moment it is used by the
io-throttle controller only for the token bucket strategy (@policy =
RATELIMIT_TOKEN_BUCKET) and it can be either bytes or IO operations.

Maybe I should add a comment like this.

> 
> > Currently the available policies are: token-bucket and leaky-bucket and the
> > attribute @capacity is only used by token-bucket policy (to represent the
> > bucket size).
> > 
> > The following function has been implemented to return the amount of time a
> > cgroup should sleep to remain within the defined resource limits.
> > 
> >   unsigned long long
> >   res_counter_ratelimit_sleep(struct res_counter *res, ssize_t val);
> > 
> > [ Note: only the interfaces needed by the cgroup IO controller are implemented
> > right now ]
> > 
> 
> This is a good RFC, but I would hold off merging till the subsystem
> gets in. Having said that I am not convinced about the subsystem
> sleeping, if the subsystem is not IO intensive, should it still sleep
> because it is over its IO b/w? This might make sense for the CPU
> controller, since not having CPU b/w does imply sleeping.
> 
> Could you please use the word throttle instead of sleep.

OK, will do in the next version.

> 
> 
> > Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
> > ---
> >  include/linux/res_counter.h |   69 +++++++++++++++++++++++++++++++----------
> >  kernel/res_counter.c        |   72 +++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 124 insertions(+), 17 deletions(-)
> > 
> > diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
> > index 4c5bcf6..9bed6af 100644
> > --- a/include/linux/res_counter.h
> > +++ b/include/linux/res_counter.h
> > @@ -14,30 +14,36 @@
> >   */
> > 
> >  #include <linux/cgroup.h>
> > +#include <linux/jiffies.h>
> > 
> > -/*
> > - * The core object. the cgroup that wishes to account for some
> > - * resource may include this counter into its structures and use
> > - * the helpers described beyond
> > - */
> > +/* The various policies that can be used for ratelimiting resources */
> > +#define	RATELIMIT_LEAKY_BUCKET	0
> > +#define	RATELIMIT_TOKEN_BUCKET	1
> > 
> > +/**
> > + * struct res_counter - the core object to account cgroup resources
> > + *
> > + * @usage:	the current resource consumption level
> > + * @max_usage:	the maximal value of the usage from the counter creation
> > + * @limit:	the limit that usage cannot be exceeded
> > + * @failcnt:	the number of unsuccessful attempts to consume the resource
> > + * @policy:	the limiting policy / algorithm
> > + * @capacity:	the maximum capacity of the resource
> > + * @timestamp:	timestamp of the last accounted resource request
> > + * @lock:	the lock to protect all of the above.
> > + *		The routines below consider this to be IRQ-safe
> > + *
> > + * The cgroup that wishes to account for some resource may include this counter
> > + * into its structures and use the helpers described beyond.
> > + */
> >  struct res_counter {
> > -	/*
> > -	 * the current resource consumption level
> > -	 */
> >  	unsigned long long usage;
> > -	/*
> > -	 * the maximal value of the usage from the counter creation
> > -	 */
> >  	unsigned long long max_usage;
> > -	/*
> > -	 * the limit that usage cannot exceed
> > -	 */
> >  	unsigned long long limit;
> > -	/*
> > -	 * the number of unsuccessful attempts to consume the resource
> > -	 */
> 
> Don't understand why res_counter is removed? Am I reading the diff
> correctly?

It is not removed. I've just used the kernel-doc style comment
(Documentation/kernel-doc-nano-HOWTO.txt). I think Randy suggested this
in the past.

> 
> >  	unsigned long long failcnt;
> > +	unsigned long long policy;
> > +	unsigned long long capacity;
> > +	unsigned long long timestamp;
> >  	/*
> >  	 * the lock to protect all of the above.
> >  	 * the routines below consider this to be IRQ-safe
> > @@ -84,6 +90,9 @@ enum {
> >  	RES_USAGE,
> >  	RES_MAX_USAGE,
> >  	RES_LIMIT,
> > +	RES_POLICY,
> > +	RES_TIMESTAMP,
> > +	RES_CAPACITY,
> >  	RES_FAILCNT,
> >  };
> > 
> > @@ -130,6 +139,15 @@ static inline bool res_counter_limit_check_locked(struct res_counter *cnt)
> >  	return false;
> >  }
> > 
> > +static inline unsigned long long
> > +res_counter_ratelimit_delta_t(struct res_counter *res)
> > +{
> > +	return (long long)get_jiffies_64() - (long long)res->timestamp;
> > +}
> > +
> > +unsigned long long
> > +res_counter_ratelimit_sleep(struct res_counter *res, ssize_t val);
> > +
> >  /*
> >   * Helper function to detect if the cgroup is within it's limit or
> >   * not. It's currently called from cgroup_rss_prepare()
> > @@ -163,6 +181,23 @@ static inline void res_counter_reset_failcnt(struct res_counter *cnt)
> >  	spin_unlock_irqrestore(&cnt->lock, flags);
> >  }
> > 
> > +static inline int
> > +res_counter_ratelimit_set_limit(struct res_counter *cnt,
> > +			unsigned long long policy,
> > +			unsigned long long limit, unsigned long long max)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&cnt->lock, flags);
> > +	cnt->limit = limit;
> > +	cnt->capacity = max;
> > +	cnt->policy = policy;
> > +	cnt->timestamp = get_jiffies_64();
> > +	cnt->usage = 0;
> > +	spin_unlock_irqrestore(&cnt->lock, flags);
> > +	return 0;
> > +}
> > +
> >  static inline int res_counter_set_limit(struct res_counter *cnt,
> >  		unsigned long long limit)
> >  {
> > diff --git a/kernel/res_counter.c b/kernel/res_counter.c
> > index bf8e753..b62319c 100644
> > --- a/kernel/res_counter.c
> > +++ b/kernel/res_counter.c
> > @@ -9,6 +9,7 @@
> > 
> >  #include <linux/types.h>
> >  #include <linux/parser.h>
> > +#include <linux/jiffies.h>
> >  #include <linux/fs.h>
> >  #include <linux/slab.h>
> >  #include <linux/res_counter.h>
> > @@ -20,6 +21,8 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent)
> >  	spin_lock_init(&counter->lock);
> >  	counter->limit = (unsigned long long)LLONG_MAX;
> >  	counter->parent = parent;
> > +	counter->capacity = (unsigned long long)LLONG_MAX;
> > +	counter->timestamp = get_jiffies_64();
> >  }
> > 
> >  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> > @@ -99,6 +102,12 @@ res_counter_member(struct res_counter *counter, int member)
> >  		return &counter->max_usage;
> >  	case RES_LIMIT:
> >  		return &counter->limit;
> > +	case RES_POLICY:
> > +		return &counter->policy;
> > +	case RES_TIMESTAMP:
> > +		return &counter->timestamp;
> > +	case RES_CAPACITY:
> > +		return &counter->capacity;
> >  	case RES_FAILCNT:
> >  		return &counter->failcnt;
> >  	};
> > @@ -163,3 +172,66 @@ int res_counter_write(struct res_counter *counter, int member,
> >  	spin_unlock_irqrestore(&counter->lock, flags);
> >  	return 0;
> >  }
> > +
> > +static unsigned long long
> > +ratelimit_leaky_bucket(struct res_counter *res, ssize_t val)
> > +{
> > +	unsigned long long delta, t;
> > +
> > +	res->usage += val;
> 
> Is this called from a protected context (w.r.t. res)?

Yes, it is called with res->lock held (look at
res_counter_ratelimit_sleep()).

I can add a comment anyway.

> 
> > +	delta = res_counter_ratelimit_delta_t(res);
> > +	if (!delta)
> > +		return 0;
> > +	t = res->usage * USEC_PER_SEC;
> > +	t = usecs_to_jiffies(div_u64(t, res->limit));
> > +	if (t > delta)
> > +		return t - delta;
> > +	/* Reset i/o statistics */
> > +	res->usage = 0;
> > +	res->timestamp = get_jiffies_64();
> > +	return 0;
> > +}
> > +
> > +static unsigned long long
> > +ratelimit_token_bucket(struct res_counter *res, ssize_t val)
> > +{
> > +	unsigned long long delta;
> > +	long long tok;
> > +
> > +	res->usage -= val;
> > +	delta = jiffies_to_msecs(res_counter_ratelimit_delta_t(res));
> > +	res->timestamp = get_jiffies_64();
> > +	tok = (long long)res->usage * MSEC_PER_SEC;
> > +	if (delta) {
> > +		long long max = (long long)res->capacity * MSEC_PER_SEC;
> > +
> > +		tok += delta * res->limit;
> > +		if (tok > max)
> > +			tok = max;
> 
> Use max_t() here

ok.

> 
> > +		res->usage = (unsigned long long)div_s64(tok, MSEC_PER_SEC);
> > +	}
> > +	return (tok < 0) ? msecs_to_jiffies(div_u64(-tok, res->limit)) : 0;
> > +}
> 
> I don't like the usage of MSEC and USEC for res->usage based on
> policy.

I used a different granularity only because in the io-throttle tests
token bucket worked better with USEC and leaky bucket with MSEC. But we
can generalize and encode this "granularity" information in a
res_counter->flags attribute.

> 
> > +
> > +unsigned long long
> > +res_counter_ratelimit_sleep(struct res_counter *res, ssize_t val)
> > +{
> > +	unsigned long long sleep = 0;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&res->lock, flags);
> > +	if (res->limit)
> > +		switch (res->policy) {
> > +		case RATELIMIT_LEAKY_BUCKET:
> > +			sleep = ratelimit_leaky_bucket(res, val);
> > +			break;
> > +		case RATELIMIT_TOKEN_BUCKET:
> > +			sleep = ratelimit_token_bucket(res, val);
> > +			break;
> > +		default:
> > +			WARN_ON(1);
> > +			break;
> > +		}
> > +	spin_unlock_irqrestore(&res->lock, flags);
> > +	return sleep;
> > +}
> > -- 
> > 1.5.6.3
> > 
> > 
> 
> -- 
> 	Balbir

Thanks for your comments!
-Andrea

  reply	other threads:[~2009-04-21 11:16 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-18 21:38 [PATCH 0/7] cgroup: io-throttle controller (v14) Andrea Righi
2009-04-18 21:38 ` [PATCH 1/7] io-throttle documentation Andrea Righi
2009-04-18 21:38 ` [PATCH 2/7] res_counter: introduce ratelimiting attributes Andrea Righi
2009-04-21  0:15   ` KAMEZAWA Hiroyuki
     [not found]     ` <20090421091534.971f676f.kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2009-04-21  9:55       ` Andrea Righi
2009-04-21  9:55     ` Andrea Righi
2009-04-21 10:16       ` Balbir Singh
     [not found]         ` <20090421101659.GF19637-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
2009-04-21 14:17           ` Andrea Righi
2009-04-21 14:17         ` Andrea Righi
2009-04-21 10:16       ` Balbir Singh
2009-04-21 10:19       ` KAMEZAWA Hiroyuki
2009-04-21 10:19       ` KAMEZAWA Hiroyuki
     [not found]   ` <1240090712-1058-3-git-send-email-righi.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-21  0:15     ` KAMEZAWA Hiroyuki
2009-04-21 10:13     ` Balbir Singh
2009-04-21 10:13   ` Balbir Singh
2009-04-21 11:16     ` Andrea Righi [this message]
     [not found]     ` <20090421101326.GE19637-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
2009-04-21 11:16       ` Andrea Righi
2009-04-18 21:38 ` [PATCH 3/7] page_cgroup: provide a generic page tracking infrastructure Andrea Righi
2009-04-24  2:11   ` Gui Jianfeng
     [not found]     ` <49F11FBD.3070705-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2009-04-24  8:31       ` Andrea Righi
2009-04-24  8:31     ` Andrea Righi
2009-04-24  9:14       ` Gui Jianfeng
2009-04-24  9:14       ` Gui Jianfeng
     [not found]         ` <49F1830F.8020609-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2009-04-26 17:19           ` Andrea Righi
2009-04-26 17:19             ` Andrea Righi
     [not found]   ` <1240090712-1058-4-git-send-email-righi.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-24  2:11     ` Gui Jianfeng
2009-04-18 21:38 ` [PATCH 4/7] io-throttle controller infrastructure Andrea Righi
     [not found]   ` <1240090712-1058-5-git-send-email-righi.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-20 17:59     ` Paul E. McKenney
2009-04-20 17:59   ` Paul E. McKenney
     [not found]     ` <20090420175904.GD6822-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2009-04-20 21:22       ` Andrea Righi
2009-04-20 21:22     ` Andrea Righi
2009-04-21  4:15       ` Paul E. McKenney
2009-04-21 12:58         ` Andrea Righi
2009-04-21 14:03           ` Paul E. McKenney
2009-04-21 14:03           ` Paul E. McKenney
     [not found]         ` <20090421041524.GB6939-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2009-04-21 12:58           ` Andrea Righi
2009-04-21  4:15       ` Paul E. McKenney
2009-04-18 21:38 ` [PATCH 5/7] kiothrottled: throttle buffered (writeback) IO Andrea Righi
     [not found]   ` <1240090712-1058-6-git-send-email-righi.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-23  7:53     ` Gui Jianfeng
2009-04-23  7:53       ` Gui Jianfeng
     [not found]       ` <49F01E8F.80807-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2009-04-23 10:25         ` Andrea Righi
2009-04-23 10:25       ` Andrea Righi
2009-04-24  6:36         ` Gui Jianfeng
2009-04-24  6:36         ` Gui Jianfeng
     [not found] ` <1240090712-1058-1-git-send-email-righi.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-18 21:38   ` [PATCH 1/7] io-throttle documentation Andrea Righi
2009-04-18 21:38   ` [PATCH 2/7] res_counter: introduce ratelimiting attributes Andrea Righi
2009-04-18 21:38   ` [PATCH 3/7] page_cgroup: provide a generic page tracking infrastructure Andrea Righi
2009-04-18 21:38   ` [PATCH 4/7] io-throttle controller infrastructure Andrea Righi
2009-04-18 21:38   ` [PATCH 5/7] kiothrottled: throttle buffered (writeback) IO Andrea Righi
2009-04-18 21:38   ` [PATCH 6/7] io-throttle instrumentation Andrea Righi
2009-04-18 21:38     ` Andrea Righi
2009-04-18 21:38   ` [PATCH 7/7] export per-task io-throttle statistics to userspace Andrea Righi
2009-04-18 21:38     ` Andrea Righi
  -- strict thread matches above, loose matches on Subject: below --
2009-05-03 11:36 [PATCH 0/7] cgroup: io-throttle controller (v16) Andrea Righi
2009-05-03 11:36 ` [PATCH 2/7] res_counter: introduce ratelimiting attributes Andrea Righi
     [not found] ` <1241350583-9871-1-git-send-email-righi.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-05-03 11:36   ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090421111633.GC13699@linux \
    --to=righi.andrea@gmail.com \
    --cc=agk@sourceware.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=baramsori72@gmail.com \
    --cc=chlunde@ping.uio.no \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=dpshah@google.com \
    --cc=dradford@bluehost.com \
    --cc=eric.rannaud@gmail.com \
    --cc=fchecconi@gmail.com \
    --cc=fernando@oss.ntt.co.jp \
    --cc=guijianfeng@cn.fujitsu.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=matt@bluehost.com \
    --cc=menage@google.com \
    --cc=nauman@google.com \
    --cc=ngupta@google.com \
    --cc=paolo.valente@unimore.it \
    --cc=randy.dunlap@oracle.com \
    --cc=roberto@unbit.it \
    --cc=ryov@valinux.co.jp \
    --cc=s-uchida@ap.jp.nec.com \
    --cc=subrata@linux.vnet.ibm.com \
    --cc=taka@valinux.co.jp \
    --cc=yoshikawa.takuya@oss.ntt.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.