From: Dave Chinner <david@fromorbit.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@lst.de>, Theodore Ts'o <tytso@mit.edu>,
	Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mel Gorman <mel@csn.ul.ie>, Rik van Riel <riel@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 00/13] IO-less dirty throttling v2
Date: Thu, 18 Nov 2010 18:27:06 +1100
Message-ID: <20101118072706.GW13830@dastard>
In-Reply-To: <20101117193431.ec1f4547.akpm@linux-foundation.org>

On Wed, Nov 17, 2010 at 07:34:31PM -0800, Andrew Morton wrote:
> On Thu, 18 Nov 2010 14:21:41 +1100 Dave Chinner <david@fromorbit.com> wrote:
> 
> > > But mainly because we're taking the work accounting away from the user
> > > who caused it and crediting it to the kernel thread instead, and that's
> > > an actively *bad* thing to do.
> > 
> > The current foreground writeback is doing work on behalf of the
> > system (i.e. doing background writeback) and therefore crediting it
> > to the user process. That seems wrong to me; it's hiding the
> > overhead of system tasks in user processes.
> > 
> > IMO, time spent doing background writeback should not be credited
> > to user processes - writeback caching is a function of the OS and
> > its overhead should be accounted as such.
> 
> bah, that's bunk.  Using this logic, _no_ time spent in the kernel
> should be accounted to the user process and we may as well do away with
> system-time accounting altogether.

That's a rather extreme interpretation and not what I meant at all.
:/

> If userspace performs some action which causes the kernel to consume
> CPU resources, that consumption should be accounted to that process.

Which is pretty much impossible for work deferred to background
kernel threads. On a vanilla kernel (without this series), the CPU
consumed while a dd writes to ext4 is (output from top):

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
9875 dave      20   0 10708 1604  516 R   74  0.0   0:05.58 dd
9876 root      20   0     0    0    0 D   30  0.0   0:01.76 flush-253:16
 561 root      20   0     0    0    0 R   17  0.0  21:45.06 kswapd0
8820 root      20   0     0    0    0 S   10  0.0  15:58.61 jbd2/vdb-8

The dd is consuming 74% CPU time, all in system, including
foreground writeback.  We've got 30% being consumed by the bdi
flusher doing background writeback.  We've got 17% consumed by
kswapd reclaiming memory. And finally, 10% is consumed by a jbd2
thread.  So, all up, the dd is triggering ~130% CPU usage, but time
only reports:

$ /usr/bin/time dd if=/dev/zero of=/mnt/scratch/test1 bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 17.8536 s, 587 MB/s
0.00user 12.11system 0:17.91elapsed 67%CPU (0avgtext+0avgdata 7296maxresident)k
0inputs+0outputs (11major+506minor)pagefaults 0swaps

67% CPU usage for dd. IOWs, half of the CPU time associated with a
dd write is already accounted to kernel threads in a current kernel.
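
The arithmetic behind those figures can be sketched as below. This is
only a rough illustration: it assumes the single top snapshot above is
representative of the whole run, and the variable names are made up for
the example rather than taken from any tool.

```python
# %CPU column from the top snapshot above, keyed by command name.
top_cpu = {"dd": 74, "flush-253:16": 30, "kswapd0": 17, "jbd2/vdb-8": 10}

total = sum(top_cpu.values())        # all CPU triggered by the dd
kthreads = total - top_cpu["dd"]     # portion burnt in kernel threads

# What /usr/bin/time computes for the dd: (user + system) / elapsed.
user, system, elapsed = 0.00, 12.11, 17.91
time_pct = 100 * (user + system) / elapsed

print(f"total {total}%, kernel threads {kthreads}%")
print(f"time(1) sees {time_pct:.1f}%, i.e. {time_pct / total:.2f} of the total")
```

The last line prints the fraction of the triggered CPU time that is
visible to time(1); the remainder sits in the kernel threads.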

> Yes, writeback can be inaccurate because process A will write back
> process B's stuff, but that should even out on average, and it's more
> accurate than saying "zero".

Sure, but it still doesn't account for the flusher, jbd or kswapd
CPU usage that is still being chewed up. That's still missing from
'time dd'.
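
One way to recover the missing part is to sample the kernel threads'
own CPU counters before and after the run. The sketch below is an
assumption about how that could be done, not something from this
series: it reads utime and stime (fields 14 and 15, in clock ticks)
from /proc/<pid>/stat, and the helper names are hypothetical.

```python
import os

def cpu_seconds(stat_line, ticks_per_sec=None):
    """Return (utime, stime) in seconds from one /proc/<pid>/stat line."""
    if ticks_per_sec is None:
        ticks_per_sec = os.sysconf("SC_CLK_TCK")
    # comm (field 2) is parenthesised and may contain spaces, so split
    # on the last ')' before tokenising the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    utime, stime = int(rest[11]), int(rest[12])
    return utime / ticks_per_sec, stime / ticks_per_sec

def read_cpu_seconds(pid):
    """Read the current counters for a live process or kernel thread."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_seconds(f.read())

# Made-up sample line for illustration only (not captured from the test box).
sample = "9876 (flush-253:16) D 2 0 0 0 -1 2129984 0 0 0 0 0 176 0 0 20 0 1 0"
print(cpu_seconds(sample, ticks_per_sec=100))
```

Calling read_cpu_seconds() on the flusher, jbd2 and kswapd pids before
and after the dd and summing the differences gives the kernel thread
cost, though it cannot tell which process caused the work.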

> > Indeed, nobody has
> > realised (until now) just how inefficient it really is because of
> > the fact that the overhead is mostly hidden in user process system
> > time.
> 
> "hidden"?  You do "time dd" and look at the output!
> 
> _now_ it's hidden.  You do "time dd" and whee, no system time!

What I meant is that the cost of foreground writeback was hidden in
the process system time. Now we have separated the two of them, we
can see exactly how much it was costing us because it is no longer
hidden inside the process system time.

Besides, there's plenty of system time still accounted to the dd.
It's now just the CPU time spent writing data into the page cache,
rather than write + writeback CPU time.

> You
> need to do complex gymnastics with kernel thread accounting to work out
> the real cost of your dd.

Yup, that's what we've been doing for years. ;) e.g. from the high
bandwidth IO paper I presented at OLS 2006, section 5.3 "kswapd and
pdflush":

	"While running single threaded tests, it was clear
	that there was something running in the background
	that was using more CPU time than the writer
	process and pdflush combined. A single threaded
	read from disk consuming a single CPU was
	consuming 10-15% of a CPU on each node running
	memory reclaim via kswapd. For a single threaded
	write, this was closer to 30% of a CPU per node.
	On our twelve node machine, this meant that we
	were using between 1.5 and 3.5 CPUs to reclaim
	memory being allocated by a single CPU."

(http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-presentation.pdf)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

