From: Nick Piggin <npiggin@suse.de>
To: Dave Chinner <david@fromorbit.com>
Cc: tytso@mit.edu, Christoph Lameter <cl@linux-foundation.org>,
Andi Kleen <andi@firstfloor.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Alexander Viro <viro@ftp.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
Christoph Lameter <clameter@sgi.com>,
Rik van Riel <riel@redhat.com>,
Pekka Enberg <penberg@cs.helsinki.fi>,
akpm@linux-foundation.org, Nick Piggin <nickpiggin@yahoo.com.au>,
Hugh Dickins <hugh@veritas.com>,
linux-kernel@vger.kernel.org
Subject: Re: inodes: Support generic defragmentation
Date: Thu, 4 Feb 2010 20:33:50 +1100
Message-ID: <20100204093350.GE13318@laptop>
In-Reply-To: <20100204033911.GE5332@discord.disaster>
On Thu, Feb 04, 2010 at 02:39:11PM +1100, Dave Chinner wrote:
> On Wed, Feb 03, 2010 at 10:07:36PM -0500, tytso@mit.edu wrote:
> > On Thu, Feb 04, 2010 at 11:34:10AM +1100, Dave Chinner wrote:
> > > What it comes down to is that the slab has two states for objects -
> > > allocated and free - but what we really need here is 3 states -
> > > allocated, unused and freed. We currently track unused objects
> > > outside the slab in LRU lists and, IMO, that is the source of our
> > > fragmentation problems because it has no knowledge of the spatial
> > > layout of the slabs and the state of other objects in the page.
> > >
> > > What I'm suggesting is that we ditch the external LRUs and track the
> > > "unused" state inside the slab and then use that knowledge to decide
> > > which pages to reclaim.
> >
> > Or maybe we need to have a way to track the LRU of the slab page as
> > a whole? Any time we touch an object on the slab page, we touch the
> > last updatedness of the slab as a whole.
>
> Yes, that's pretty much what I have been trying to describe. ;)
> (And, IIUC, what I think Nick has been trying to describe as well
> when he's been saying we should "turn reclaim upside down".)
Well, what I described is to do the slab pinning from the reclaim path
(rather than from slab calling into the subsystem). All slab locking is
basically "innermost", so you can pretty much poke the slab layer as
much as you like from the subsystem.

After that, LRU on slabs should be fairly easy. Slab could, for example,
provide a private per-slab pointer that is managed by the caller. The
subsystem can then call into slab to find the objects.
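
To make the per-slab pointer idea concrete, here is a rough userspace
sketch. Everything in it is hypothetical: struct slab_model, the
slab_lru_* helpers and slab_model_objects() are names made up for
illustration and are not part of SLUB or any existing kernel API. It
only assumes that slab exposes one caller-managed pointer per slab and
a way to enumerate the objects on a slab.

```c
#include <assert.h>
#include <stddef.h>

#define OBJS_PER_SLAB 4	/* arbitrary size for the model */

/* Hypothetical stand-in for a slab page; not the kernel's structure. */
struct slab_model {
	void *private;			/* caller-managed, opaque to slab */
	void *objs[OBJS_PER_SLAB];	/* NULL means the slot is free */
};

/* LRU node owned by the subsystem and hung off slab->private. */
struct slab_lru_node {
	struct slab_model *slab;
	struct slab_lru_node *prev, *next;
};

/* Circular list with a sentinel: head.next is oldest, head.prev is MRU. */
struct slab_lru {
	struct slab_lru_node head;
};

static void slab_lru_init(struct slab_lru *lru)
{
	lru->head.slab = NULL;
	lru->head.prev = lru->head.next = &lru->head;
}

static void lru_link_tail(struct slab_lru *lru, struct slab_lru_node *n)
{
	n->prev = lru->head.prev;
	n->next = &lru->head;
	lru->head.prev->next = n;
	lru->head.prev = n;
}

static void lru_unlink(struct slab_lru_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Subsystem starts tracking a slab: attach its node via the private pointer. */
static void slab_lru_add(struct slab_lru *lru, struct slab_model *slab,
			 struct slab_lru_node *n)
{
	assert(slab->private == NULL);
	n->slab = slab;
	slab->private = n;
	lru_link_tail(lru, n);
}

/* Touching any object on the slab moves the whole slab to the MRU end. */
static void slab_lru_touch(struct slab_lru *lru, struct slab_model *slab)
{
	struct slab_lru_node *n = slab->private;

	assert(n != NULL);
	lru_unlink(n);
	lru_link_tail(lru, n);
}

/* Reclaim starts from the least recently used slab, or NULL if none. */
static struct slab_model *slab_lru_oldest(struct slab_lru *lru)
{
	return lru->head.next == &lru->head ? NULL : lru->head.next->slab;
}

/* "Call into slab to find the objects": copy out the allocated ones. */
static unsigned int slab_model_objects(const struct slab_model *slab,
				       void **out, unsigned int max)
{
	unsigned int i, n = 0;

	for (i = 0; i < OBJS_PER_SLAB && n < max; i++)
		if (slab->objs[i])
			out[n++] = slab->objs[i];
	return n;
}
```

In this model the reclaim path would take slab_lru_oldest(), pin that
slab, and then use slab_model_objects() to find what it has to free.
Locking is left out entirely; the sketch only shows who owns which
piece of state.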
> It seems to me to be pretty simple to track, too, if we define pages
> for reclaim to only be those that are full of unused objects. i.e.
> the pages have the two states:
>
> - Active: some allocated and referenced object on the page
> => no need for LRU tracking of these
> - Unused: all allocated objects on the page are not used
> => these pages are LRU tracked within the slab
>
> A single referenced object is enough to change the state of the
> page from Unused to Active, and when a page transitions from
> Active to Unused it goes on the MRU end of the LRU queue.
> Reclaim would then start with the oldest pages on the LRU....
>
> > It's actually more complicated than that, though. Even if no one has
> > touched a particular inode, if one of the inodes in the slab page is
> > pinned down because it is in use,
>
> A single active object like this would make the slab page Active, and
> therefore not a candidate for reclaim. Also, we already reclaim
> dentries before inodes because dentries pin inodes, so our
> algorithms for reclaim already deal with these ordering issues for
> us.
>
> ...
>
> > And of course, if the inode is pinned down because it is opened and/or
> > mmaped, then its associated dcache entry can't be freed either, so
> > there's no point trying to trash all of its sibling dentries on the
> > same page as that dcache entry.
>
> Agreed - that's why I think preventing fragmentation caused by LRU
> reclaim is best dealt with internally to slab where both object age
> and locality can be taken into account.
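
The two page states quoted above (Active and Unused) could be modelled
roughly as follows. This is a userspace sketch under assumptions, not
code from any patch in this thread: struct page_model, object_get(),
object_put() and reclaim_candidate() are hypothetical names, and a
simple per-page count of referenced objects stands in for whatever
slab would actually track.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a slab page with the two states from the thread. */
struct page_model {
	unsigned int referenced;	/* objects on this page currently in use */
	int on_lru;			/* nonzero while the page is Unused */
	struct page_model *prev, *next;
};

/* LRU of Unused pages: head.next is oldest, head.prev is the MRU end. */
struct unused_lru {
	struct page_model head;
};

static void unused_lru_init(struct unused_lru *lru)
{
	lru->head.referenced = 0;
	lru->head.on_lru = 0;
	lru->head.prev = lru->head.next = &lru->head;
}

static void page_unlink(struct page_model *pg)
{
	pg->prev->next = pg->next;
	pg->next->prev = pg->prev;
	pg->on_lru = 0;
}

static void page_link_mru(struct unused_lru *lru, struct page_model *pg)
{
	pg->prev = lru->head.prev;
	pg->next = &lru->head;
	lru->head.prev->next = pg;
	lru->head.prev = pg;
	pg->on_lru = 1;
}

/* First reference on a page makes it Active: drop it from the LRU. */
static void object_get(struct page_model *pg)
{
	if (pg->referenced++ == 0 && pg->on_lru)
		page_unlink(pg);
}

/* Last reference gone makes the page Unused: queue it at the MRU end. */
static void object_put(struct unused_lru *lru, struct page_model *pg)
{
	assert(pg->referenced > 0);
	if (--pg->referenced == 0)
		page_link_mru(lru, pg);
}

/* Reclaim starts with the oldest Unused page, or NULL if there is none. */
static struct page_model *reclaim_candidate(struct unused_lru *lru)
{
	return lru->head.next == &lru->head ? NULL : lru->head.next;
}
```

Active pages never appear on the list, so reclaim only ever sees pages
whose objects are all unused, which is the property the quoted scheme
relies on.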