From: Andrew Morton <akpm@linux-foundation.org>
To: Gerlof Langeveld <gerlof@ATComputing.nl>
Cc: linux-kernel@vger.kernel.org, Balbir Singh <balbir@in.ibm.com>,
	Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [PATCH 1/3] accounting: task counters for disk/network
Date: Mon, 7 Apr 2008 23:10:07 -0700
Message-ID: <20080407231007.4946410d.akpm@linux-foundation.org>
In-Reply-To: <20080408054837.GA7103@atcmpg.ATComputing.nl>

On Tue, 8 Apr 2008 07:48:37 +0200 Gerlof Langeveld <gerlof@ATComputing.nl> wrote:

> > > --- linux-2.6.24.4-vanilla/block/ll_rw_blk.c	2008-03-24 19:49:18.000000000 +0100
> > > +++ linux-2.6.24.4-modified/block/ll_rw_blk.c	2008-03-25 13:52:14.000000000 +0100
> > > @@ -2739,6 +2739,19 @@ static void drive_stat_acct(struct reque
> > >  		disk_round_stats(rq->rq_disk);
> > >  		rq->rq_disk->in_flight++;
> > >  	}
> > > +
> > > +#ifdef CONFIG_TASK_IO_ACCOUNTING
> > > +	switch (rw) {
> > > +	case READ:
> > > +		current->group_leader->ioac.dsk_rio += new_io;
> > > +		current->group_leader->ioac.dsk_rsz += rq->nr_sectors;
> > > +		break;
> > > +	case WRITE:
> > > +		current->group_leader->ioac.dsk_wio += new_io;
> > > +		current->group_leader->ioac.dsk_wsz += rq->nr_sectors;
> > > +		break;
> > > +	}
> > > +#endif
> > 
> > For many workloads, this will cause almost all writeout to be accounted to
> > pdflush and perhaps kswapd.  This makes the per-task write accounting
> > largely unuseful.
> 
> There are several situations in which writeouts are accounted to the
> user process itself, e.g. when issuing direct writes (open mode O_DIRECT)
> or synchronous writes (open mode O_SYNC, syscall sync/fsync, synchronous
> file attribute, synchronously mounted filesystem).

yup.

> Apart from that, swapping out of process pages by kswapd is currently not
> accounted at all as shown by the following snapshot of 'atop' on a heavily
> swapping system:

Under heavy load, callers into alloc_pages() will themselves perform disk
writeout.  So under the proposed scheme, process A will be accounted for
writeout which was in fact caused by process B.
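
To make the misattribution concrete, here is a minimal user-space model of
it.  This is not kernel code: the structs, the helper names and the
8-sectors-per-page figure are all assumptions made for illustration.  Only
the counter names dsk_wio and dsk_wsz are borrowed from the patch.  The
model charges whoever is "current" when the write is submitted, as the
patch does.

```c
/* User-space model only; every name below is hypothetical. */
#include <stddef.h>

struct task {
	const char *name;
	unsigned long dsk_wio;	/* write requests charged to this task */
	unsigned long dsk_wsz;	/* sectors charged to this task */
};

struct page {
	struct task *dirtied_by;	/* recorded for reference, never used for charging */
	int dirty;
};

#define SECTORS_PER_PAGE 8	/* assumes 4096-byte pages, 512-byte sectors */

/* Mimics the patch: charge the task that submits the request. */
static void submit_write(struct task *current_task, struct page *pg)
{
	current_task->dsk_wio += 1;
	current_task->dsk_wsz += SECTORS_PER_PAGE;
	pg->dirty = 0;
}

/* A task dirties pages in memory; nothing reaches the disk yet. */
static void dirty_pages(struct task *t, struct page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pages[i].dirtied_by = t;
		pages[i].dirty = 1;
	}
}

/* A task allocating under memory pressure cleans dirty pages itself. */
static void alloc_under_pressure(struct task *allocator,
				 struct page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i].dirty)
			submit_write(allocator, &pages[i]);
}
```

If B calls dirty_pages() on four pages and A then calls
alloc_under_pressure() on the same pages, A ends up with four write
requests and 32 sectors on its counters while B's counters stay at zero.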

> So the extra counters can be considered as a useful addition to the I/O 
> counters that are currently maintained.

mmm, maybe.  But if we implement a partial solution like this we really
should have a plan to finish it off.

There have been numerous attempts at this, which tend to involve adding
backpointers to the pageframe structure and such.
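
As a rough sketch of what such a backpointer buys, the user-space model
below records the dirtying task in the page and charges that task at
writeout time, regardless of who submits the I/O.  It does not reproduce
any of the earlier attempts: the struct layout, the function names and the
"first dirtier owns the page" policy are assumptions for illustration
only.  The cost is visible in the model too, since every page carries one
extra pointer.

```c
/* User-space model only; every name below is hypothetical. */
#include <stddef.h>

struct task_model {
	unsigned long dsk_wio;	/* write requests charged to this task */
	unsigned long dsk_wsz;	/* sectors charged to this task */
};

struct page_model {
	struct task_model *owner;	/* backpointer: task that dirtied the page */
	int dirty;
};

#define SECTORS_PER_PAGE 8	/* assumes 4096-byte pages, 512-byte sectors */

/* Assumed policy: the first task to dirty a clean page becomes its owner. */
static void mark_dirty(struct page_model *pg, struct task_model *dirtier)
{
	if (!pg->dirty) {
		pg->owner = dirtier;
		pg->dirty = 1;
	}
}

/*
 * Write the page out and charge its owner.  The submitting context is
 * deliberately not a parameter.  Returns 1 if a write was issued.
 */
static int writeout(struct page_model *pg)
{
	if (!pg->dirty)
		return 0;
	if (pg->owner) {
		pg->owner->dsk_wio += 1;
		pg->owner->dsk_wsz += SECTORS_PER_PAGE;
	}
	pg->dirty = 0;
	pg->owner = NULL;
	return 1;
}
```

With this model, a page dirtied by B is charged to B whether writeout() is
reached from a flusher thread, from kswapd, or from another task's
allocation path.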

This sort of accounting will presumably be needed by a disk bandwidth
cgroup controller.  Perhaps the containers/cgroup people have plans or code
already?



Thread overview: 6+ messages
2008-04-02  7:30 [PATCH 1/3] accounting: task counters for disk/network Gerlof Langeveld
2008-04-03 19:54 ` Andrew Morton
2008-04-08  5:48   ` Gerlof Langeveld
2008-04-08  6:10     ` Andrew Morton [this message]
2008-04-08  6:16       ` Paul Menage
2008-04-10  3:41         ` Hirokazu Takahashi
