From: Nick Piggin <npiggin@suse.de>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
Dave Chinner <david@fromorbit.com>,
Christoph Lameter <clameter@sgi.com>,
Rik van Riel <riel@redhat.com>,
Pekka Enberg <penberg@cs.helsinki.fi>,
akpm@linux-foundation.org, Miklos Szeredi <miklos@szeredi.hu>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Hugh Dickins <hugh@veritas.com>,
linux-kernel@vger.kernel.org
Subject: Re: buffer heads: Support slab defrag
Date: Mon, 1 Feb 2010 17:39:03 +1100 [thread overview]
Message-ID: <20100201063903.GD9085@laptop> (raw)
In-Reply-To: <20100129205003.813495196@quilx.com>
On Fri, Jan 29, 2010 at 02:49:41PM -0600, Christoph Lameter wrote:
> Defragmentation support for buffer heads. We convert the references to
> buffers to struct page references and try to remove the buffers from
> those pages. If the pages are dirty then trigger writeout so that the
> buffer heads can be removed later.
>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
>
> ---
> fs/buffer.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 99 insertions(+)
>
> Index: slab-2.6/fs/buffer.c
> ===================================================================
> --- slab-2.6.orig/fs/buffer.c 2010-01-22 15:09:43.000000000 -0600
> +++ slab-2.6/fs/buffer.c 2010-01-22 16:17:27.000000000 -0600
> @@ -3352,6 +3352,104 @@ int bh_submit_read(struct buffer_head *b
> }
> EXPORT_SYMBOL(bh_submit_read);
>
> +/*
> + * Writeback a page to clean the dirty state
> + */
> +static void trigger_write(struct page *page)
> +{
> + struct address_space *mapping = page_mapping(page);
> + int rc;
> + struct writeback_control wbc = {
> + .sync_mode = WB_SYNC_NONE,
> + .nr_to_write = 1,
> + .range_start = 0,
> + .range_end = LLONG_MAX,
> + .nonblocking = 1,
> + .for_reclaim = 0
> + };
> +
> + if (!mapping->a_ops->writepage)
> + /* No write method for the address space */
> + return;
> +
> + if (!clear_page_dirty_for_io(page))
> + /* Someone else already triggered a write */
> + return;
> +
> + rc = mapping->a_ops->writepage(page, &wbc);
> + if (rc < 0)
> + /* I/O Error writing */
> + return;
> +
> + if (rc == AOP_WRITEPAGE_ACTIVATE)
> + unlock_page(page);
> +}
> +
> +/*
> + * Get references on buffers.
> + *
> + * We obtain references on the page that uses the buffer. v[i] will point to
> + * the corresponding page after get_buffers() is through.
> + *
> + * We are safe from the underlying page being removed simply by doing
> + * a get_page_unless_zero. The buffer head removal may race at will.
> + * try_to_free_buffes will later take appropriate locks to remove the
> + * buffers if they are still there.
> + */
> +static void *get_buffers(struct kmem_cache *s, int nr, void **v)
> +{
> + struct page *page;
> + struct buffer_head *bh;
> + int i, j;
> + int n = 0;
> +
> + for (i = 0; i < nr; i++) {
> + bh = v[i];
> + v[i] = NULL;
> +
> + page = bh->b_page;
> +
> + if (page && PagePrivate(page)) {
> + for (j = 0; j < n; j++)
> + if (page == v[j])
> + continue;
> + }
> +
> + if (get_page_unless_zero(page))
> + v[n++] = page;
This seems wrong to me. The page can have been reused at this
stage.
You technically can't re-check using page->private because that
can be anything and doesn't actually need to be a pointer. You
could re-check bh->b_page, provided that you ensure it is always
cleared before a page is detached, and the correct barriers are
in place.
> + }
> + return NULL;
> +}
> +
> +/*
> + * Despite its name: kick_buffers operates on a list of pointers to
> + * page structs that was set up by get_buffer().
> + */
> +static void kick_buffers(struct kmem_cache *s, int nr, void **v,
> + void *private)
> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < nr; i++) {
> + page = v[i];
> +
> + if (!page || PageWriteback(page))
> + continue;
> +
> + if (trylock_page(page)) {
> + if (PageDirty(page))
> + trigger_write(page);
> + else {
> + if (PagePrivate(page))
> + try_to_free_buffers(page);
> + unlock_page(page);
PagePrivate doesn't necessarily mean it has buffers. try_to_release_page
should be a better idea.
> + }
> + }
> + put_page(page);
> + }
> +}
> +
> static void
> init_buffer_head(void *data)
> {
> @@ -3370,6 +3468,7 @@ void __init buffer_init(void)
> (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
> SLAB_MEM_SPREAD),
> init_buffer_head);
> + kmem_cache_setup_defrag(bh_cachep, get_buffers, kick_buffers);
>
> /*
> * Limit the bh occupancy to 10% of ZONE_NORMAL
Buffer heads and buffer head refcounting really stinks badly. Although
I can see the need for a medium term solution until fsblock / some
actual sane refcounting.
next prev parent reply other threads:[~2010-02-01 8:30 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-29 20:49 Slab Fragmentation Reduction V15 Christoph Lameter
2010-01-29 20:49 ` slub: Add defrag_ratio field and sysfs support Christoph Lameter
2010-01-29 20:49 ` slub: Replace ctor field with ops field in /sys/slab/* Christoph Lameter
2010-01-29 20:49 ` slub: Add get() and kick() methods Christoph Lameter
2010-01-29 20:49 ` slub: Sort slab cache list and establish maximum objects for defrag slabs Christoph Lameter
2010-01-29 20:49 ` slub: Slab defrag core Christoph Lameter
2010-01-29 20:49 ` slub: Add KICKABLE to avoid repeated kick() attempts Christoph Lameter
2010-01-29 20:49 ` slub: Extend slabinfo to support -D and -F options Christoph Lameter
2010-01-29 20:49 ` slub/slabinfo: add defrag statistics Christoph Lameter
2010-01-29 20:49 ` slub: Trigger defragmentation from memory reclaim Christoph Lameter
2010-01-29 20:49 ` buffer heads: Support slab defrag Christoph Lameter
2010-01-30 1:59 ` Dave Chinner
2010-02-01 6:39 ` Nick Piggin [this message]
2010-01-29 20:49 ` inodes: Support generic defragmentation Christoph Lameter
2010-01-30 2:43 ` Dave Chinner
2010-02-01 17:50 ` Christoph Lameter
2010-01-30 19:26 ` tytso
2010-01-31 8:34 ` Andi Kleen
2010-01-31 13:59 ` Dave Chinner
2010-02-03 15:31 ` Christoph Lameter
2010-02-04 0:34 ` Dave Chinner
2010-02-04 3:07 ` tytso
2010-02-04 3:39 ` Dave Chinner
2010-02-04 9:33 ` Nick Piggin
2010-02-04 17:13 ` Christoph Lameter
2010-02-08 7:37 ` Nick Piggin
2010-02-08 17:40 ` Christoph Lameter
2010-02-08 22:13 ` Dave Chinner
2010-02-04 16:59 ` Christoph Lameter
2010-02-06 0:39 ` Dave Chinner
2010-01-31 21:02 ` tytso
2010-02-01 10:17 ` Andi Kleen
2010-02-01 13:47 ` tytso
2010-02-01 13:54 ` Andi Kleen
2010-01-29 20:49 ` Filesystem: Ext2 filesystem defrag Christoph Lameter
2010-01-29 20:49 ` Filesystem: Ext3 " Christoph Lameter
2010-01-29 20:49 ` Filesystem: Ext4 " Christoph Lameter
2010-01-29 20:49 ` Filesystem: XFS slab defragmentation Christoph Lameter
2010-01-29 20:49 ` Filesystems: /proc filesystem support for slab defrag Christoph Lameter
2010-01-29 20:49 ` dentries: dentry defragmentation Christoph Lameter
2010-01-29 22:00 ` Al Viro
2010-02-01 7:08 ` Nick Piggin
2010-02-01 10:10 ` Andi Kleen
2010-02-01 10:16 ` Nick Piggin
2010-02-01 10:22 ` Andi Kleen
2010-02-01 10:35 ` Nick Piggin
2010-02-01 10:45 ` Andi Kleen
2010-02-01 10:56 ` Nick Piggin
2010-02-01 13:25 ` Andi Kleen
2010-02-01 13:36 ` Nick Piggin
2010-01-29 20:49 ` slub defrag: Transition patch upstream -> -next Christoph Lameter
2010-01-30 8:54 ` Slab Fragmentation Reduction V15 Pekka Enberg
2010-01-30 10:48 ` Andi Kleen
2010-01-30 14:53 ` Rik van Riel
2010-02-01 17:53 ` Christoph Lameter
2010-02-01 17:52 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100201063903.GD9085@laptop \
--to=npiggin@suse.de \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=cl@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=david@fromorbit.com \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=nickpiggin@yahoo.com.au \
--cc=penberg@cs.helsinki.fi \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox