Re: [updated] [Patch 6/8] delay accounting usage of taskstats interface

From: Balbir Singh <balbir@in.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net,
	jlan@engr.sgi.com, akpm@osdl.org
Subject: Re: [updated] [Patch 6/8] delay accounting usage of taskstats interface
Date: Tue, 9 May 2006 22:41:20 +0530
Message-ID: <20060509171120.GA10478@in.ibm.com>
In-Reply-To: <1147175208.7392.40.camel@localhost.localdomain>

On Tue, May 09, 2006 at 11:46:48AM +0000, Thomas Gleixner wrote:
> On Fri, 2006-05-05 at 00:14 +0530, Balbir Singh wrote:
> 
> >  #else
> >  static inline void delayacct_set_flag(int flag)
> >  {}
> 
> Please make this
> 
> static inline void delayacct_set_flag(int flag) { }
> 
> Same in all other files

Will do.

> 
> > @@ -89,6 +99,9 @@ static inline void delayacct_blkio_start
> >  {}
> >  static inline void delayacct_blkio_end(void)
> >  {}
> > +static inline int delayacct_add_tsk(struct taskstats *d,
> > +					struct task_struct *tsk)
> > +{ return 0; }
> 
> {
> 	return 0;
> }
> 
> > +	 *
> > +	 * xxx_count is the number of delay values recorded
> > +	 * xxx_delay_total is the corresponding cumulative delay in nanoseconds
> > +	 *
> > +	 * xxx_delay_total wraps around to zero on overflow
> > +	 * xxx_count incremented regardless of overflow
> > +	 */
> 
> Please use a structure for that.
> 
> struct delay_account {
> 	ktime_t delay;
> 	u64 count;
> };
> 
> Also instead of having tons of fields added, please use something like
> 
> enum {
> 	CPU_ACCT,
> 	BLKIO_ACCT,
> 	....,
> 	MAX_ACCT_TYPES
> };
> 	struct delay_account stats[MAX_ACCT_TYPES];
>

Yes, this would make sense if all the fields were of type delay_account.
Currently, only CPU, BLKIO and SWAPIN are of that type. The new
fields that will be added to this structure (since it is now part of
a statistics interface) could belong to other subsystems.
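
To make the layout under discussion concrete, here is a minimal userspace
sketch, not code from the patch. It assumes plain uint64_t nanoseconds in
place of ktime_t; the names SWAPIN_ACCT, struct task_delay_stats and
delay_account_add() are hypothetical and exist only for illustration. It
keeps the three delay/count pairs in an enum-indexed array, while fields
of other types sit alongside the array:

```c
#include <stdint.h>

/*
 * One delay/count pair, following the structure suggested in the review.
 * Assumption: delay is plain nanoseconds here rather than ktime_t.
 */
struct delay_account {
	uint64_t delay;	/* cumulative delay in nanoseconds */
	uint64_t count;	/* number of delay values recorded */
};

enum {
	CPU_ACCT,
	BLKIO_ACCT,
	SWAPIN_ACCT,	/* hypothetical name, not from the thread */
	MAX_ACCT_TYPES
};

/* Hypothetical container: array for the pairs, plain fields for the rest. */
struct task_delay_stats {
	struct delay_account stats[MAX_ACCT_TYPES];
	uint64_t cpu_run_real_total;	/* cumulative, no matching count */
	uint64_t cpu_run_virtual_total;	/* cumulative, no matching count */
};

/* Record one delay sample of the given type. */
static inline void delay_account_add(struct task_delay_stats *s,
				     int type, uint64_t delay_ns)
{
	s->stats[type].delay += delay_ns;
	s->stats[type].count++;
}
```

Fields that do not come as a delay/count pair, such as the run-time totals,
stay outside the array in this sketch, which is the case the reply above
is concerned with.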
 
> 
> > +	/* Delay waiting for cpu, while runnable
> > +	 * count, delay_total NOT updated atomically
> > +	 */
> > +	__u64	cpu_count;
> > +	__u64	cpu_delay_total;
> > +	/* Following four fields atomically updated using task->delays->lock */
> > +
> > +	/* Delay waiting for synchronous block I/O to complete
> > +	 * does not account for delays in I/O submission
> > +	 */
> > +	__u64	blkio_count;
> > +	__u64	blkio_delay_total;
> > +
> > +	/* Delay waiting for page fault I/O (swap in only) */
> > +	__u64	swapin_count;
> > +	__u64	swapin_delay_total;
> > +
> > +	/* cpu "wall-clock" running time
> > +	 * On some architectures, value will adjust for cpu time stolen
> > +	 * from the kernel in involuntary waits due to virtualization.
> > +	 * Value is cumulative, in nanoseconds, without a corresponding count
> > +	 * and wraps around to zero silently on overflow
> 
> When will this overflow happen ? 2^64 nsec ~= 584 years
>

Detecting a potential overflow is not a bad thing IMHO. With the
existing 64-bit data type, such an overflow is very rare.
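
For reference, the arithmetic behind the "584 years" estimate can be
sketched as below. This is a standalone userspace illustration; the helper
name years_until_wrap() and the 365-day year are assumptions made here,
not anything from the patch:

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define SEC_PER_YEAR	(365ULL * 24 * 60 * 60)	/* assumes 365-day years */

/*
 * Whole years until a nanosecond counter with the given number of
 * value bits reaches its maximum (hypothetical helper).
 */
static uint64_t years_until_wrap(unsigned int bits)
{
	uint64_t max_ns = (bits >= 64) ? UINT64_MAX : ((1ULL << bits) - 1);

	return max_ns / NSEC_PER_SEC / SEC_PER_YEAR;
}
```

With these assumptions, years_until_wrap(64) gives 584 and
years_until_wrap(63), the range of a signed 64-bit value, gives 292.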
 
> 
> > +int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
> > +{
> 
> Please add some comment what this code does.

Will do.

> 
> > +	s64 tmp;
> > +	struct timespec ts;
> > +	unsigned long t1,t2,t3;
> > +
> > +	tmp = (s64)d->cpu_run_real_total;
> > +	tmp += (u64)(tsk->utime + tsk->stime) * TICK_NSEC;
> > +	d->cpu_run_real_total = (tmp < (s64)d->cpu_run_real_total) ? 0 : tmp;
> > +
> > +	/*
> > +	 * No locking available for sched_info (and too expensive to add one)
> > +	 * Mitigate by taking snapshot of values
> > +	 */
> > +	t1 = tsk->sched_info.pcnt;
> > +	t2 = tsk->sched_info.run_delay;
> > +	t3 = tsk->sched_info.cpu_time;
> > +
> > +	d->cpu_count += t1;
> > +
> > +	jiffies_to_timespec(t2, &ts);
> > +	tmp = (s64)d->cpu_delay_total + timespec_to_ns(&ts);
> > +	d->cpu_delay_total = (tmp < (s64)d->cpu_delay_total) ? 0 : tmp;
> > +
> > +	tmp = (s64)d->cpu_run_virtual_total + (s64)jiffies_to_usecs(t3) * 1000;
> > +	d->cpu_run_virtual_total =
> > +		(tmp < (s64)d->cpu_run_virtual_total) ?	0 : tmp;
> > +
> > +	/* zero XXX_total, non-zero XXX_count implies XXX stat overflowed */
> > +
> > +	spin_lock(&tsk->delays->lock);
> > +	tmp = d->blkio_delay_total + tsk->delays->blkio_delay;
> > +	d->blkio_delay_total = (tmp < d->blkio_delay_total) ? 0 : tmp;
> 
> I really do not get whatfor all these comparisons are. How can those
> values get > 2^63 within a reasonable timeframe ?

It's defensive programming, though I grant that it is probably not required.
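
As an illustration of the check being discussed, here is a minimal
userspace sketch of a wrap-to-zero accumulation. It is not the patch's
code: it uses purely unsigned arithmetic rather than the s64 temporary in
the patch, and the helper name add_or_zero() is hypothetical:

```c
#include <stdint.h>

/*
 * Add delta to total. If the unsigned sum wraps around, return 0 so that
 * a zero total paired with a non-zero count signals an overflow.
 */
static uint64_t add_or_zero(uint64_t total, uint64_t delta)
{
	uint64_t sum = total + delta;	/* unsigned wraparound is well defined */

	return (sum < total) ? 0 : sum;
}
```

A caller would use it as total = add_or_zero(total, delay) and increment
the matching count unconditionally, which matches the "xxx_count
incremented regardless of overflow" convention in the patch comment.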

> 
> > +	tmp = d->swapin_delay_total + tsk->delays->swapin_delay;
> > +	d->swapin_delay_total = (tmp < d->swapin_delay_total) ? 0 : tmp;
> > +	d->blkio_count += tsk->delays->blkio_count;
> > +	d->swapin_count += tsk->delays->swapin_count;
> > +	spin_unlock(&tsk->delays->lock);
> > +	return 0;
> > +}
> > diff -puN kernel/taskstats.c~delayacct-taskstats kernel/taskstats.c
> > --- linux-2.6.17-rc3/kernel/taskstats.c~delayacct-taskstats	2006-05-04 09:31:59.000000000 +0530
> > +++ linux-2.6.17-rc3-balbir/kernel/taskstats.c	2006-05-04 11:27:53.000000000 +0530
> > @@ -18,6 +18,7 @@
> >  
> >  #include <linux/kernel.h>
> >  #include <linux/taskstats_kern.h>
> > +#include <linux/delayacct.h>
> >  #include <net/genetlink.h>
> >  #include <asm/atomic.h>
> >  
> > @@ -120,7 +121,10 @@ static int fill_pid(pid_t pid, struct ta
> >  	 *		goto err;
> >  	 */
> >  
> > -err:
> > +	rc = delayacct_add_tsk(stats, tsk);
> > +	stats->version = TASKSTATS_VERSION;
> > +
> > +	/* Define err: label here if needed */
> 
> Please remove that comment. The err label goes to a place where it makes
> sense.

Will fix this.

> 
> >  	put_task_struct(tsk);
> >  	return rc;
> >  
> > @@ -152,8 +156,14 @@ static int fill_tgid(pid_t tgid, struct 
> >  		 *		break;
> >  		 */
> >  
> > +		rc = delayacct_add_tsk(stats, tsk);
> > +		if (rc)
> > +			break;
> > +
> >  	} while_each_thread(first, tsk);
> >  	read_unlock(&tasklist_lock);
> > +	stats->version = TASKSTATS_VERSION;
> > +
> >  
> >  	/*
> >  	 * Accounting subsytems can also add calls here if they don't
> > _


	Thanks for your review,
	Balbir Singh,
	Linux Technology Center,
	IBM Software Labs

Thread overview: 6+ messages
2006-05-02  6:19 [Patch 6/8] delay accounting usage of taskstats interface Balbir Singh
2006-05-04  2:13 ` [Lse-tech] " Jay Lan
2006-05-04  4:23   ` Balbir Singh
2006-05-04 18:44 ` [updated] " Balbir Singh
2006-05-09 11:46   ` Thomas Gleixner
2006-05-09 17:11     ` Balbir Singh [this message]
