public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Lameter <clameter@sgi.com>,
	Rik van Riel <riel@redhat.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	akpm@linux-foundation.org, Miklos Szeredi <miklos@szeredi.hu>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Hugh Dickins <hugh@veritas.com>,
	linux-kernel@vger.kernel.org
Subject: Re: buffer heads: Support slab defrag
Date: Mon, 1 Feb 2010 17:39:03 +1100	[thread overview]
Message-ID: <20100201063903.GD9085@laptop> (raw)
In-Reply-To: <20100129205003.813495196@quilx.com>

On Fri, Jan 29, 2010 at 02:49:41PM -0600, Christoph Lameter wrote:
> Defragmentation support for buffer heads. We convert the references to
> buffers to struct page references and try to remove the buffers from
> those pages. If the pages are dirty then trigger writeout so that the
> buffer heads can be removed later.
> 
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
> 
> ---
>  fs/buffer.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 99 insertions(+)
> 
> Index: slab-2.6/fs/buffer.c
> ===================================================================
> --- slab-2.6.orig/fs/buffer.c	2010-01-22 15:09:43.000000000 -0600
> +++ slab-2.6/fs/buffer.c	2010-01-22 16:17:27.000000000 -0600
> @@ -3352,6 +3352,104 @@ int bh_submit_read(struct buffer_head *b
>  }
>  EXPORT_SYMBOL(bh_submit_read);
>  
> +/*
> + * Writeback a page to clean the dirty state
> + */
> +static void trigger_write(struct page *page)
> +{
> +	struct address_space *mapping = page_mapping(page);
> +	int rc;
> +	struct writeback_control wbc = {
> +		.sync_mode = WB_SYNC_NONE,
> +		.nr_to_write = 1,
> +		.range_start = 0,
> +		.range_end = LLONG_MAX,
> +		.nonblocking = 1,
> +		.for_reclaim = 0
> +	};
> +
> +	if (!mapping->a_ops->writepage)
> +		/* No write method for the address space */
> +		return;
> +
> +	if (!clear_page_dirty_for_io(page))
> +		/* Someone else already triggered a write */
> +		return;
> +
> +	rc = mapping->a_ops->writepage(page, &wbc);
> +	if (rc < 0)
> +		/* I/O Error writing */
> +		return;
> +
> +	if (rc == AOP_WRITEPAGE_ACTIVATE)
> +		unlock_page(page);
> +}
> +
> +/*
> + * Get references on buffers.
> + *
> + * We obtain references on the page that uses the buffer. v[i] will point to
> + * the corresponding page after get_buffers() is through.
> + *
> + * We are safe from the underlying page being removed simply by doing
> + * a get_page_unless_zero. The buffer head removal may race at will.
> + * try_to_free_buffes will later take appropriate locks to remove the
> + * buffers if they are still there.
> + */
> +static void *get_buffers(struct kmem_cache *s, int nr, void **v)
> +{
> +	struct page *page;
> +	struct buffer_head *bh;
> +	int i, j;
> +	int n = 0;
> +
> +	for (i = 0; i < nr; i++) {
> +		bh = v[i];
> +		v[i] = NULL;
> +
> +		page = bh->b_page;
> +
> +		if (page && PagePrivate(page)) {
> +			for (j = 0; j < n; j++)
> +				if (page == v[j])
> +					continue;
> +		}
> +
> +		if (get_page_unless_zero(page))
> +			v[n++] = page;

This seems wrong to me. The page can have been reused at this
stage.

You technically can't re-check using page->private because that
can be anything and doesn't actually need to be a pointer. You
could re-check bh->b_page, provided that you ensure it is always
cleared before a page is detached, and the correct barriers are
in place.


> +	}
> +	return NULL;
> +}
> +
> +/*
> + * Despite its name: kick_buffers operates on a list of pointers to
> + * page structs that was set up by get_buffer().
> + */
> +static void kick_buffers(struct kmem_cache *s, int nr, void **v,
> +							void *private)
> +{
> +	struct page *page;
> +	int i;
> +
> +	for (i = 0; i < nr; i++) {
> +		page = v[i];
> +
> +		if (!page || PageWriteback(page))
> +			continue;
> +
> +		if (trylock_page(page)) {
> +			if (PageDirty(page))
> +				trigger_write(page);
> +			else {
> +				if (PagePrivate(page))
> +					try_to_free_buffers(page);
> +				unlock_page(page);

PagePrivate doesn't necessarily mean it has buffers. try_to_release_page
should be a better idea.

> +			}
> +		}
> +		put_page(page);
> +	}
> +}
> +
>  static void
>  init_buffer_head(void *data)
>  {
> @@ -3370,6 +3468,7 @@ void __init buffer_init(void)
>  				(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
>  				SLAB_MEM_SPREAD),
>  				init_buffer_head);
> +	kmem_cache_setup_defrag(bh_cachep, get_buffers, kick_buffers);
>  
>  	/*
>  	 * Limit the bh occupancy to 10% of ZONE_NORMAL


Buffer heads and buffer head refcounting really stinks badly. Although
I can see the need for a medium term solution until fsblock / some
actual sane refcounting.


  parent reply	other threads:[~2010-02-01  8:30 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-29 20:49 Slab Fragmentation Reduction V15 Christoph Lameter
2010-01-29 20:49 ` slub: Add defrag_ratio field and sysfs support Christoph Lameter
2010-01-29 20:49 ` slub: Replace ctor field with ops field in /sys/slab/* Christoph Lameter
2010-01-29 20:49 ` slub: Add get() and kick() methods Christoph Lameter
2010-01-29 20:49 ` slub: Sort slab cache list and establish maximum objects for defrag slabs Christoph Lameter
2010-01-29 20:49 ` slub: Slab defrag core Christoph Lameter
2010-01-29 20:49 ` slub: Add KICKABLE to avoid repeated kick() attempts Christoph Lameter
2010-01-29 20:49 ` slub: Extend slabinfo to support -D and -F options Christoph Lameter
2010-01-29 20:49 ` slub/slabinfo: add defrag statistics Christoph Lameter
2010-01-29 20:49 ` slub: Trigger defragmentation from memory reclaim Christoph Lameter
2010-01-29 20:49 ` buffer heads: Support slab defrag Christoph Lameter
2010-01-30  1:59   ` Dave Chinner
2010-02-01  6:39   ` Nick Piggin [this message]
2010-01-29 20:49 ` inodes: Support generic defragmentation Christoph Lameter
2010-01-30  2:43   ` Dave Chinner
2010-02-01 17:50     ` Christoph Lameter
2010-01-30 19:26   ` tytso
2010-01-31  8:34     ` Andi Kleen
2010-01-31 13:59       ` Dave Chinner
2010-02-03 15:31         ` Christoph Lameter
2010-02-04  0:34           ` Dave Chinner
2010-02-04  3:07             ` tytso
2010-02-04  3:39               ` Dave Chinner
2010-02-04  9:33                 ` Nick Piggin
2010-02-04 17:13                   ` Christoph Lameter
2010-02-08  7:37                     ` Nick Piggin
2010-02-08 17:40                       ` Christoph Lameter
2010-02-08 22:13                       ` Dave Chinner
2010-02-04 16:59                 ` Christoph Lameter
2010-02-06  0:39                   ` Dave Chinner
2010-01-31 21:02       ` tytso
2010-02-01 10:17         ` Andi Kleen
2010-02-01 13:47           ` tytso
2010-02-01 13:54             ` Andi Kleen
2010-01-29 20:49 ` Filesystem: Ext2 filesystem defrag Christoph Lameter
2010-01-29 20:49 ` Filesystem: Ext3 " Christoph Lameter
2010-01-29 20:49 ` Filesystem: Ext4 " Christoph Lameter
2010-01-29 20:49 ` Filesystem: XFS slab defragmentation Christoph Lameter
2010-01-29 20:49 ` Filesystems: /proc filesystem support for slab defrag Christoph Lameter
2010-01-29 20:49 ` dentries: dentry defragmentation Christoph Lameter
2010-01-29 22:00   ` Al Viro
2010-02-01  7:08     ` Nick Piggin
2010-02-01 10:10       ` Andi Kleen
2010-02-01 10:16         ` Nick Piggin
2010-02-01 10:22           ` Andi Kleen
2010-02-01 10:35             ` Nick Piggin
2010-02-01 10:45               ` Andi Kleen
2010-02-01 10:56                 ` Nick Piggin
2010-02-01 13:25                   ` Andi Kleen
2010-02-01 13:36                     ` Nick Piggin
2010-01-29 20:49 ` slub defrag: Transition patch upstream -> -next Christoph Lameter
2010-01-30  8:54 ` Slab Fragmentation Reduction V15 Pekka Enberg
2010-01-30 10:48 ` Andi Kleen
2010-01-30 14:53   ` Rik van Riel
2010-02-01 17:53     ` Christoph Lameter
2010-02-01 17:52   ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100201063903.GD9085@laptop \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=cl@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=david@fromorbit.com \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=nickpiggin@yahoo.com.au \
    --cc=penberg@cs.helsinki.fi \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox