From: tytso@mit.edu
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
Dave Chinner <david@fromorbit.com>,
Miklos Szeredi <miklos@szeredi.hu>,
Alexander Viro <viro@ftp.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
Christoph Lameter <clameter@sgi.com>,
Rik van Riel <riel@redhat.com>,
Pekka Enberg <penberg@cs.helsinki.fi>,
akpm@linux-foundation.org, Nick Piggin <nickpiggin@yahoo.com.au>,
Hugh Dickins <hugh@veritas.com>,
linux-kernel@vger.kernel.org
Subject: Re: inodes: Support generic defragmentation
Date: Sat, 30 Jan 2010 14:26:23 -0500 [thread overview]
Message-ID: <20100130192623.GE788@thunk.org> (raw)
In-Reply-To: <20100129205004.405949705@quilx.com>
On Fri, Jan 29, 2010 at 02:49:42PM -0600, Christoph Lameter wrote:
> This implements the ability to remove inodes in a particular slab
> from inode caches. In order to remove an inode we may have to write out
> the pages of an inode, the inode itself and remove the dentries referring
> to the node.
How often is this going to happen? Removing an inode is an incredibly
expensive operation. We have to eject all of the pages from the page
cache, and if those pages are getting a huge amount of use --- say,
those corresponding to some shared library like libc --- or which have
a huge number of pages that are actively getting used, the thrashing
that is going to result is going to enormous.
There needs to be some kind of cost/benefit analysis done about
whether or not this is worth it. Does it make sense to potentially
force hundreds and hundreds of megaytes of pages to get thrashed in
and out just to recover a single 4k page? In some cases, maybe yes.
But in other cases, the results could be disastrous.
> +/*
> + * Generic callback function slab defrag ->kick methods. Takes the
> + * array with inodes where we obtained refcounts using fs_get_inodes()
> + * or get_inodes() and tries to free them.
> + */
> +void kick_inodes(struct kmem_cache *s, int nr, void **v, void *private)
> +{
> + struct inode *inode;
> + int i;
> + int abort = 0;
> + LIST_HEAD(freeable);
> + int active;
> +
> + for (i = 0; i < nr; i++) {
> + inode = v[i];
> + if (!inode)
> + continue;
In some cases, it's going to be impossible to empty a particular slab
cache page. For example, there may be one inode which has pages
locked into memory, or which we may decide (once we add some
intelligence into this function) is really not worth ejecting. In
that case, there's no point dumping the rest of the inodes on that
particular slab page.
> + if (inode_has_buffers(inode) || inode->i_data.nrpages) {
> + if (remove_inode_buffers(inode))
> + /*
> + * Should we really be doing this? Or
> + * limit the writeback here to only a few pages?
> + *
> + * Possibly an expensive operation but we
> + * cannot reclaim the inode if the pages
> + * are still present.
> + */
> + invalidate_mapping_pages(&inode->i_data,
> + 0, -1);
> + }
I do not thing this function does what you think it does....
"invalidate_mapping_pages() will not block on I/O activity, and it
will refuse to invalidate pages which are dirty, locked, under
writeback, or mapped into page tables."
So you need to force the data to be written *first*, then get the
pages removed from the page table, and only then, call
invalidate_mapping_pages(). Otherwise, this is just going to
pointlessly drop pages from the page cache and trashing the page
cache's effectiveness, without actually making it possible to drop a
particular inode if it is being used at all by any process.
Presumably then the defrag code, since it was unable to free a
particular page, will go on pillaging and raping other inodes in the
inode cache, until it can actually create a hugepage. This is why you
really shouldn't start trying to trash an inode until you're
**really** sure it's possible completely evict a 4k slab page of all
of its inodes.
- Ted
next prev parent reply other threads:[~2010-01-30 23:02 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-29 20:49 Slab Fragmentation Reduction V15 Christoph Lameter
2010-01-29 20:49 ` slub: Add defrag_ratio field and sysfs support Christoph Lameter
2010-01-29 20:49 ` slub: Replace ctor field with ops field in /sys/slab/* Christoph Lameter
2010-01-29 20:49 ` slub: Add get() and kick() methods Christoph Lameter
2010-01-29 20:49 ` slub: Sort slab cache list and establish maximum objects for defrag slabs Christoph Lameter
2010-01-29 20:49 ` slub: Slab defrag core Christoph Lameter
2010-01-29 20:49 ` slub: Add KICKABLE to avoid repeated kick() attempts Christoph Lameter
2010-01-29 20:49 ` slub: Extend slabinfo to support -D and -F options Christoph Lameter
2010-01-29 20:49 ` slub/slabinfo: add defrag statistics Christoph Lameter
2010-01-29 20:49 ` slub: Trigger defragmentation from memory reclaim Christoph Lameter
2010-01-29 20:49 ` buffer heads: Support slab defrag Christoph Lameter
2010-01-30 1:59 ` Dave Chinner
2010-02-01 6:39 ` Nick Piggin
2010-01-29 20:49 ` inodes: Support generic defragmentation Christoph Lameter
2010-01-30 2:43 ` Dave Chinner
2010-02-01 17:50 ` Christoph Lameter
2010-01-30 19:26 ` tytso [this message]
2010-01-31 8:34 ` Andi Kleen
2010-01-31 13:59 ` Dave Chinner
2010-02-03 15:31 ` Christoph Lameter
2010-02-04 0:34 ` Dave Chinner
2010-02-04 3:07 ` tytso
2010-02-04 3:39 ` Dave Chinner
2010-02-04 9:33 ` Nick Piggin
2010-02-04 17:13 ` Christoph Lameter
2010-02-08 7:37 ` Nick Piggin
2010-02-08 17:40 ` Christoph Lameter
2010-02-08 22:13 ` Dave Chinner
2010-02-04 16:59 ` Christoph Lameter
2010-02-06 0:39 ` Dave Chinner
2010-01-31 21:02 ` tytso
2010-02-01 10:17 ` Andi Kleen
2010-02-01 13:47 ` tytso
2010-02-01 13:54 ` Andi Kleen
2010-01-29 20:49 ` Filesystem: Ext2 filesystem defrag Christoph Lameter
2010-01-29 20:49 ` Filesystem: Ext3 " Christoph Lameter
2010-01-29 20:49 ` Filesystem: Ext4 " Christoph Lameter
2010-01-29 20:49 ` Filesystem: XFS slab defragmentation Christoph Lameter
2010-01-29 20:49 ` Filesystems: /proc filesystem support for slab defrag Christoph Lameter
2010-01-29 20:49 ` dentries: dentry defragmentation Christoph Lameter
2010-01-29 22:00 ` Al Viro
2010-02-01 7:08 ` Nick Piggin
2010-02-01 10:10 ` Andi Kleen
2010-02-01 10:16 ` Nick Piggin
2010-02-01 10:22 ` Andi Kleen
2010-02-01 10:35 ` Nick Piggin
2010-02-01 10:45 ` Andi Kleen
2010-02-01 10:56 ` Nick Piggin
2010-02-01 13:25 ` Andi Kleen
2010-02-01 13:36 ` Nick Piggin
2010-01-29 20:49 ` slub defrag: Transition patch upstream -> -next Christoph Lameter
2010-01-30 8:54 ` Slab Fragmentation Reduction V15 Pekka Enberg
2010-01-30 10:48 ` Andi Kleen
2010-01-30 14:53 ` Rik van Riel
2010-02-01 17:53 ` Christoph Lameter
2010-02-01 17:52 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100130192623.GE788@thunk.org \
--to=tytso@mit.edu \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=cl@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=nickpiggin@yahoo.com.au \
--cc=penberg@cs.helsinki.fi \
--cc=riel@redhat.com \
--cc=viro@ftp.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox