From: Andrew Morton <akpm@linux-foundation.org>
To: Ethan Solomita <solo@google.com>
Cc: linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
Christoph Lameter <clameter@sgi.com>
Subject: Re: [PATCH 1/6] cpuset write dirty map
Date: Tue, 18 Sep 2007 19:14:05 -0700
Message-ID: <20070918191405.d9b43470.akpm@linux-foundation.org>
In-Reply-To: <46F072A5.8060008@google.com>

On Tue, 18 Sep 2007 17:51:49 -0700 Ethan Solomita <solo@google.com> wrote:
> >
> >> +void cpuset_update_dirty_nodes(struct address_space *mapping,
> >> +			struct page *page)
> >> +{
> >> +	nodemask_t *nodes = mapping->dirty_nodes;
> >> +	int node = page_to_nid(page);
> >> +
> >> +	if (!nodes) {
> >> +		nodes = kmalloc(sizeof(nodemask_t), GFP_ATOMIC);
> >
> > Does it have to be atomic? atomic is weak and can fail.
> >
> > If some callers can do GFP_KERNEL and some can only do GFP_ATOMIC then we
> > should at least pass the gfp_t into this function so it can do the stronger
> > allocation when possible.
>
> I was going to say that sanity would be improved by just allocing the
> nodemask at inode alloc time. A failure here could be a problem because
> below cpuset_intersects_dirty_nodes() assumes that a NULL nodemask
> pointer means that there are no dirty nodes, thus preventing dirty pages
> from getting written to disk. i.e. This must never fail.
>
> Given that we allocate it always at the beginning, I'm leaning towards
> just allocating it within mapping no matter its size. It will make the
> code much much simpler, and save me writing all the comments we've been
> discussing. 8-)
>
> How disastrous would this be? Is the need to support a 1024 node system
> with 1,000,000 open mostly-read-only files thus needing to spend 120MB
> of extra memory on my nodemasks a real scenario and a showstopper?

None of this is very nice.  Yes, it would be good to save all that memory
and yes, I_DIRTY_PAGES inodes are very much the uncommon case.

But if a failed GFP_ATOMIC allocation results in data loss then that's a
showstopper.

How hard would it be to handle the allocation failure in a more friendly
manner?  Say, if the allocation failed then point mapping->dirty_nodes at
some global all-ones nodemask, and then special-case that nodemask in the
freeing code?

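A minimal userspace sketch of the fallback idea above, not kernel code. It
assumes stand-in definitions for nodemask_t and struct address_space, and the
names all_nodes_dirty, dirty_nodes_init(), try_alloc(), fail_alloc,
update_dirty_nodes(), node_is_dirty() and clear_dirty_nodes() are hypothetical,
chosen only for illustration. calloc() stands in for kmalloc(..., GFP_ATOMIC).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel types; sizes are illustrative only. */
#define MAX_NUMNODES	1024
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define NODE_WORDS	((MAX_NUMNODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

typedef struct { unsigned long bits[NODE_WORDS]; } nodemask_t;
struct address_space { nodemask_t *dirty_nodes; };

/* Hypothetical shared fallback: every node marked dirty. Never freed. */
static nodemask_t all_nodes_dirty;

static void dirty_nodes_init(void)
{
	memset(&all_nodes_dirty, 0xff, sizeof(all_nodes_dirty));
}

/* Stand-in for an atomic allocation; fail_alloc simulates a failure. */
static bool fail_alloc;

static void *try_alloc(size_t size)
{
	return fail_alloc ? NULL : calloc(1, size);
}

static void update_dirty_nodes(struct address_space *mapping, int node)
{
	nodemask_t *nodes = mapping->dirty_nodes;
	unsigned int n;

	assert(node >= 0 && node < MAX_NUMNODES);
	n = (unsigned int)node;

	if (!nodes) {
		nodes = try_alloc(sizeof(*nodes));
		if (!nodes)
			nodes = &all_nodes_dirty; /* over-report, never lose data */
		mapping->dirty_nodes = nodes;
	}
	if (nodes == &all_nodes_dirty)
		return; /* shared mask is read-only and already all ones */
	nodes->bits[n / BITS_PER_WORD] |= 1UL << (n % BITS_PER_WORD);
}

static bool node_is_dirty(const struct address_space *mapping, int node)
{
	const nodemask_t *nodes = mapping->dirty_nodes;
	unsigned int n;

	assert(node >= 0 && node < MAX_NUMNODES);
	n = (unsigned int)node;

	if (!nodes)
		return false; /* NULL still means "no dirty nodes" */
	return (nodes->bits[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL;
}

static void clear_dirty_nodes(struct address_space *mapping)
{
	nodemask_t *nodes = mapping->dirty_nodes;

	mapping->dirty_nodes = NULL;
	if (nodes && nodes != &all_nodes_dirty) /* special-case the fallback */
		free(nodes);
}
```

The tradeoff in this sketch is that a failed allocation makes the mapping look
dirty on every node, so writeback may do extra work for that inode, but a NULL
pointer keeps its "nothing dirty" meaning and no dirty page is skipped.
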
> >
> >
> >> +		if (!nodes)
> >> +			return;
> >> +
> >> +		*nodes = NODE_MASK_NONE;
> >> +		mapping->dirty_nodes = nodes;
> >> +	}
> >> +
> >> +	if (!node_isset(node, *nodes))
> >> +		node_set(node, *nodes);
> >> +}
> >> +
> >> +void cpuset_clear_dirty_nodes(struct address_space *mapping)
> >> +{
> >> +	nodemask_t *nodes = mapping->dirty_nodes;
> >> +
> >> +	if (nodes) {
> >> +		mapping->dirty_nodes = NULL;
> >> +		kfree(nodes);
> >> +	}
> >> +}
> >
> > Can this race with cpuset_update_dirty_nodes()? And with itself? If not,
> > a comment which describes the locking requirements would be good.
>
> I'll add a comment. Such a race should not be possible. It is called
> only from clear_inode() which is used when the inode is being freed
> "with extreme prejudice" (from its comments). I can add a check that
> i_state I_FREEING is set. Would that do?

Sounds sane.
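
A rough userspace sketch of the proposed check, assuming the clear path can
reach the inode state through the mapping. The types demo_inode and
demo_mapping, the flag value DEMO_I_FREEING and the function
demo_clear_dirty_nodes() are hypothetical stand-ins, not the kernel's
definitions. In the kernel the check would more likely be an assertion than an
error return.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder flag value; the real I_FREEING value is not assumed here. */
#define DEMO_I_FREEING	0x10UL

struct demo_inode { unsigned long i_state; };
struct demo_mapping { struct demo_inode *host; void *dirty_nodes; };

/*
 * Free the dirty-node map only while the owning inode is being torn down.
 * Returns 0 on success, -1 if the inode is not marked as freeing, in which
 * case nothing is touched.
 */
static int demo_clear_dirty_nodes(struct demo_mapping *mapping)
{
	assert(mapping && mapping->host);

	if (!(mapping->host->i_state & DEMO_I_FREEING))
		return -1;

	free(mapping->dirty_nodes);
	mapping->dirty_nodes = NULL;
	return 0;
}
```

The check documents the locking assumption in code: if nothing else can reach
the mapping once the inode is being freed, clearing cannot race with an update
or with itself.
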