From: Ethan Solomita <solo@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Christoph Lameter <clameter@sgi.com>
Subject: Re: [PATCH 0/6] cpuset aware writeback
Date: Tue, 11 Sep 2007 18:32:33 -0700
Message-ID: <46E741B1.4030100@google.com>
In-Reply-To: <469D3342.3080405@google.com>

Perform writeback and dirty throttling with awareness of cpuset mem_allowed.

The theory of operation has two primary elements:

1. Add a nodemask per mapping which indicates the nodes
   which have set PageDirty on any page of the mapping.

2. Add a nodemask argument to wakeup_pdflush() which is
   propagated down to sync_sb_inodes().

This leaves sync_sb_inodes() with two nodemasks. One is passed to it and
specifies the nodes the caller is interested in syncing, and will either
be null (i.e. all nodes) or will be cpuset_current_mems_allowed in the
caller's context.

The second nodemask is attached to the inode's mapping and shows who has
modified data in the inode. sync_sb_inodes() will then skip syncing of
inodes if the nodemask argument does not intersect with the mapping
nodemask.
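
The skip decision described above can be sketched in userspace C. This is
only an illustration, not code from the patches: nodemask_model_t,
struct mapping_model, mapping_mark_dirty() and should_sync_inode() are
hypothetical names, and a 64-bit integer stands in for the kernel's
nodemask_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's nodemask_t: bit n means node n. */
typedef uint64_t nodemask_model_t;

/* Hypothetical stand-in for a mapping; only the dirty-node mask is modeled. */
struct mapping_model {
	nodemask_model_t dirty_nodes;
};

/* Record that a page belonging to this mapping was dirtied on `node`. */
static void mapping_mark_dirty(struct mapping_model *m, unsigned int node)
{
	assert(node < 64);
	m->dirty_nodes |= (nodemask_model_t)1 << node;
}

/*
 * Decide whether an inode should be synced. A NULL `wanted` mask means
 * "all nodes", so nothing is skipped. Otherwise the inode is synced only
 * if the caller's mask intersects the mapping's dirty-node mask.
 */
static bool should_sync_inode(const struct mapping_model *m,
			      const nodemask_model_t *wanted)
{
	if (wanted == NULL)
		return true;
	return (m->dirty_nodes & *wanted) != 0;
}
```

In this model, a mapping dirtied only on node 2 is skipped by a caller
whose mask contains only node 5, and is synced by any caller passing NULL.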

cpuset_current_mems_allowed will be passed in to pdflush
background_writeout by try_to_free_pages and balance_dirty_pages.
balance_dirty_pages also passes the nodemask in to writeback_inodes
directly when doing active reclaim.

Other callers do not limit inode writeback, passing in a NULL nodemask
pointer.

A final change is to get_dirty_limits. It takes a nodemask argument, and
when it is null there is no change in behavior. If the nodemask is set,
page statistics are accumulated only for specified nodes, and the
background and throttle dirty ratios will be read from a new per-cpuset
ratio feature.
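
As a rough userspace model of that behavior (not the patched
get_dirty_limits itself), the sketch below sums per-node page counts only
for nodes in the mask and picks the ratio set accordingly. All struct and
function names, the percentage-based ratios, and the idea of a single
"dirtyable pages" counter per node are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node statistic: pages that could be dirtied. */
struct node_stats_model {
	unsigned long dirtyable_pages;
};

/* Hypothetical ratio pair, in percent. */
struct dirty_ratios_model {
	unsigned int background_pct;
	unsigned int throttle_pct;
};

struct dirty_limits_model {
	unsigned long background_pages;
	unsigned long throttle_pages;
};

/*
 * mask == NULL: accumulate over every node and use the global ratios
 * (the unchanged behavior). Otherwise accumulate only over nodes whose
 * bit is set in *mask and use the per-cpuset ratios.
 */
static struct dirty_limits_model
compute_dirty_limits_model(const struct node_stats_model *nodes,
			   size_t nr_nodes, const uint64_t *mask,
			   struct dirty_ratios_model global,
			   struct dirty_ratios_model cpuset)
{
	struct dirty_ratios_model r = mask ? cpuset : global;
	struct dirty_limits_model limits;
	unsigned long total = 0;

	assert(nodes != NULL || nr_nodes == 0);

	for (size_t n = 0; n < nr_nodes; n++) {
		/* The model mask only covers nodes 0..63. */
		if (mask != NULL && (n >= 64 || !((*mask >> n) & 1)))
			continue;
		total += nodes[n].dirtyable_pages;
	}

	limits.background_pages = total * r.background_pct / 100;
	limits.throttle_pages = total * r.throttle_pct / 100;
	return limits;
}
```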

For testing I ran a variety of basic tests, verifying individual
features of the patchset. To verify that it fixes the core problem, I
created a stress test that uses cpusets and mems_allowed to split
memory so that all daemons have memory set aside for them and my
memory stress test has a separate set of memory. The stress test mmaps
7GB of a very large file on disk, then scans the entire 7GB of memory,
reading and modifying each byte. 7GB is more than the amount of
physical memory made available to the stress test.
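
The core of such a stress test might look like the sketch below. It is not
the program used for the results in this mail; dirty_mapped_file() is a
hypothetical helper, and the choice of a shared mapping and of incrementing
each byte are assumptions about what "reading and modifying" involved.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first `len` bytes of an existing file and read-modify-write
 * every byte, so that every page of the mapping becomes dirty.
 * The file must already be at least `len` bytes long; touching pages
 * beyond end-of-file would raise SIGBUS.
 * Returns 0 on success, -1 on failure.
 */
static int dirty_mapped_file(const char *path, size_t len)
{
	unsigned char *p;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	/* MAP_SHARED so the modified pages must be written back to the file. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	if (p == MAP_FAILED)
		return -1;

	for (size_t i = 0; i < len; i++)
		p[i] = (unsigned char)(p[i] + 1);

	return munmap(p, len);
}
```

Run inside a cpuset whose mems_allowed covers less memory than `len`, a
loop like this forces dirty pages to be written back while new pages are
still being read in.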

Using iostat I can see the initial period of reading from disk, followed
by a period of simultaneous reads and writes as dirty bytes are pushed
to make room for new reads.

In a separate log-in, in the other cpuset, I am running:

while true; do date | tee -a date.txt; sleep 5; done

date.txt resides on the same disk as the large file mentioned above. The
above while-loop serves the dual purpose of providing me visual clues of
progress along with the opportunity for the "tee" command to become
throttled writing to the disk.

The effect of this patchset is straightforward. Without it there are
long hangs between appearances of the date. With it the dates are all 5
(or sometimes 6) seconds apart.

I also added printks to the kernel to verify that, without these
patches, the tee was being throttled (along with lots of other things),
and that with the patches only pdflush is being throttled.

These patches are mostly unchanged from Christoph Lameter's original
changelist posted previously to linux-mm.


