From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@zip.com.au>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [BK PATCH 2.5] Introduce 64-bit versions of PAGE_{CACHE_,}{MASK,ALIGN}
Date: Tue, 30 Jul 2002 00:18:52 +0200
Message-ID: <20020729221852.GI1201@dualathlon.random>
In-Reply-To: <3D45B79F.D228226@zip.com.au>
On Mon, Jul 29, 2002 at 02:46:07PM -0700, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> >
> > On Mon, Jul 29, 2002 at 02:01:15PM -0700, Andrew Morton wrote:
> > > Andrea Arcangeli wrote:
> > > >
> > > > On Sun, Jul 28, 2002 at 07:05:19PM -0700, Andrew Morton wrote:
> > > > > But yes, all of this is a straight speed/space tradeoff. Probably
> > > > > some of it should be ifdeffed.
> > > >
> > > > I would say so. recalculating page_address in cpu core with no cacheline
> > > > access is one thing, deriving the index is a different thing.
> > > >
> > > > > The cost of the tree walk doesn't worry me much - generally we
> > > > > walk the tree with good locality of reference, so most everything is
> > > > > in cache anyway.
> > > >
> > > > well, the rbtree showedup heavily when it started growing more than a
> > > > few steps, it has less locality of reference though.
> > > >
> > > > > Good luck setting up a testcase which does this ;)
> > > >
> > > > a gigabit will trigger it in a millisecond. of course nobody tested it
> > > > either I guess (I guess not many people tested the 800Gbyte offset
> > > > either in the first place).
> > >
> > > There's still the mempool.
> >
> > that's hiding the problem at the moment, it's global, it doesn't provide
> > any real guarantee.
>
> Sizing the mempool to max_cpus * max tree depth provides a guarantee,
> provided you take care of context switches, which is pretty easy.
I guess I still prefer the GFP_KERNEL fallback because it avoids
wasting/reserving lots of ram, but I only care about correctness, and
the current code isn't correct. Sizing the pool to max_cpus * max tree
depth would satisfy me completely too (saving ram is a lower prio), so
it's up to you, as long as it cannot fail unless it's truly oom (i.e.
you need a GFP_KERNEL somewhere in the allocation path).
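To make the sizing rule concrete, here is a minimal sketch of the
arithmetic only. The helper names are hypothetical, and the 64-bit index
with 6 bits resolved per level used in the comments is an illustrative
assumption, not a figure taken from the radix-tree code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of tree levels needed to cover
 * index_bits of page index when each node resolves bits_per_level
 * bits (e.g. 64 index bits at 6 bits per level gives 11 levels). */
static unsigned int tree_depth(unsigned int index_bits,
                               unsigned int bits_per_level)
{
    assert(bits_per_level > 0);
    /* Round up: a partially used top level still needs a node. */
    return (index_bits + bits_per_level - 1) / bits_per_level;
}

/* Hypothetical helper: worst-case reserve, assuming every CPU is in
 * the middle of an insertion that must allocate one new node at every
 * level of the tree. */
static size_t reserve_nodes(unsigned int max_cpus, unsigned int max_depth)
{
    return (size_t)max_cpus * max_depth;
}
```

The reserve grows linearly with both the CPU count and the tree depth,
which is the ram cost that the GFP_KERNEL fallback would avoid.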
>
> > ...
> >
> > so it's not too bad in terms of stack because there's not going to be
> > more than one walk at time, thanks for doing the math btw. You'd
> > basically need a second radix tree for the dirty pages (using the same
> > radix tree is not an option because it would increase pdflush complexity
> > too much with terabytes of clean pages in the tree).
>
> Not sure. If each ratnode has a 64-bit bitmap which represents
> dirty pages if it's a leaf node, or nodes which have dirty pages
> if it's a higher node then the "find the next 16 dirty pages above index
> N" is a pretty efficient thing.
You will have """only""" 18 layers, but scanning through 2**(6*18)
entries will take far too long even if each entry takes only 1
nanosecond to scan. Of course that's the extreme case, but it should
still be too much in practice. I doubt you can avoid at least some
additional infrastructure that tells you whether any of the underlying
ratnodes has a dirty page. That will at least probably save ram,
because it can be coded as a bitflag in each node, but it will force an
up-walk of the tree every time you mark a page dirty (of course a
second tree would also force you to do some tree work every time you
mark a page dirty/clean). The second tree probably lets you avoid going
into the radix-tree implementation details to provide the "underlying
node has dirty pages" info, and it would be faster if, for example,
only the start of the inode has dirty pages: the dirty page flushing
could then walk only a few levels instead of potentially 18 of them
just to reach the first few pages. But I don't think that's a common
case, so probably the best (but not the simplest) approach is to mark
each ratnode with cumulative dirty information.
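A minimal userspace sketch of the cumulative dirty marking discussed
above, not the kernel's radix-tree code. It assumes a fixed-height tree
with 64 slots per node; the struct layout, the constants and all rat_*
names are hypothetical. Bit i of a leaf's bitmap means page i is dirty,
and bit i of an interior bitmap means child i has a dirty page somewhere
below it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define RAT_BITS   6u                  /* index bits resolved per level */
#define RAT_FANOUT (1u << RAT_BITS)    /* 64 slots, one bitmap bit each */
#define RAT_HEIGHT 3u                  /* fixed height, demo only       */

struct ratnode {
    uint64_t dirty;                    /* cumulative dirty bitmap       */
    struct ratnode *slots[RAT_FANOUT]; /* children; unused in leaves    */
};

static unsigned int rat_slot(uint64_t index, unsigned int level)
{
    return (unsigned int)(index >> (RAT_BITS * level)) & (RAT_FANOUT - 1);
}

/* Mark a page dirty, setting the cumulative bit at every level on the
 * way down and allocating missing interior nodes as needed. */
static void rat_mark_dirty(struct ratnode *root, uint64_t index)
{
    struct ratnode *n = root;
    unsigned int level = RAT_HEIGHT - 1;

    assert(index < (UINT64_C(1) << (RAT_BITS * RAT_HEIGHT)));
    for (;;) {
        unsigned int slot = rat_slot(index, level);

        n->dirty |= UINT64_C(1) << slot;
        if (level == 0)
            break;
        if (!n->slots[slot])
            n->slots[slot] = calloc(1, sizeof(*n));
        assert(n->slots[slot]);        /* sketch: no OOM handling */
        n = n->slots[slot];
        level--;
    }
}

/* Clear a page's dirty bit; a parent's bit is cleared only once the
 * child has no dirty bits left. Returns true if n is still dirty. */
static bool rat_clear_at(struct ratnode *n, unsigned int level, uint64_t index)
{
    unsigned int slot = rat_slot(index, level);
    uint64_t bit = UINT64_C(1) << slot;

    if ((n->dirty & bit) &&
        (level == 0 || !rat_clear_at(n->slots[slot], level - 1, index)))
        n->dirty &= ~bit;
    return n->dirty != 0;
}

static void rat_clear_dirty(struct ratnode *root, uint64_t index)
{
    rat_clear_at(root, RAT_HEIGHT - 1, index);
}

static unsigned int rat_lowest_bit(uint64_t v) /* v must be non-zero */
{
    unsigned int i = 0;

    while (!(v & 1)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Find the lowest dirty index >= start below node n, which covers the
 * index range beginning at base. Clean subtrees are never entered. */
static bool rat_find(const struct ratnode *n, unsigned int level,
                     uint64_t base, uint64_t start, uint64_t *out)
{
    unsigned int shift = RAT_BITS * level;
    uint64_t first = start > base ? (start - base) >> shift : 0;
    uint64_t bits;

    if (first >= RAT_FANOUT)
        return false;                  /* start lies beyond this node */
    bits = n->dirty & (~UINT64_C(0) << first);
    while (bits) {
        unsigned int i = rat_lowest_bit(bits);
        uint64_t child_base = base + ((uint64_t)i << shift);

        if (level == 0) {
            *out = child_base;
            return true;
        }
        if (rat_find(n->slots[i], level - 1, child_base, start, out))
            return true;
        bits &= bits - 1;              /* drop lowest set bit, try next */
    }
    return false;
}

static bool rat_next_dirty(const struct ratnode *root, uint64_t start,
                           uint64_t *out)
{
    return rat_find(root, RAT_HEIGHT - 1, 0, start, out);
}
```

With a zero-initialised root, marking pages 5 and 70000 dirty and then
calling rat_next_dirty(&root, 6, &idx) skips every clean subtree and
visits only the nodes on the path to 70000. In this sketch the
cumulative bits are set on the way down during marking, so the up-walk
cost shows up only when a bit is cleared.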
Andrea