linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Badari Pulavarty <pbadari@us.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>,
	sct@redhat.com, linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	ext4 <linux-ext4@vger.kernel.org>,
	jack@suse.cz
Subject: Re: [RFC][PATCH] set_page_buffer_dirty should skip unmapped buffers
Date: Tue, 05 Sep 2006 09:11:42 -0700	[thread overview]
Message-ID: <1157472702.23501.12.camel@dyn9047017100.beaverton.ibm.com> (raw)
In-Reply-To: <20060901101801.7845bca2.akpm@osdl.org>

On Fri, 2006-09-01 at 10:18 -0700, Andrew Morton wrote:
> On Fri, 01 Sep 2006 09:32:22 -0700
> Badari Pulavarty <pbadari@us.ibm.com> wrote:
> 
> > > > Kernel BUG at fs/buffer.c:2791
> > > > invalid opcode: 0000 [1] SMP
> > > > 
> > > > Its complaining about BUG_ON(!buffer_mapped(bh)).
> 
> I need to have a little think about this, remember what _should_ be
> happening in this situation.
> 
> We (mainly I) used to do a huge amount of fsx-linux testing on 1k blocksize
> filesystems.  We've done something to make this start happening.  Part of
> resolving this bug will be working out what that was.
> 

It took a while to track this down. 2.6.13 is the last kernel
which runs my fsx tests fine (72+ hours).

Here is the change that seems to cause the problem. Jana Kara
introduced a new mode "SWRITE" for ll_rw_block() - where it
waits for buffer to be unlocked (WRITE will skip locked
buffers) + jbd journal_commit_transaction() has been changed
to make use of SWRITE.

http://marc.theaimsgroup.com/?l=linux-fsdevel&m=112109788702895&w=2

Theoritically same race (between journal_commit_transaction() and
journal_unmap_buffer()+set_page_dirty()) could exist before his changes
- which should trigger bug in submit_bh(). But I can't seem to hit
it without his changes. My guess is ll_rw_block() is always skipping
those cleaned up buffers, before page gets dirtied again ..

Andrew, what should we do ? Do you suggest handling this in jbd
itself (like this patch) ?

Thanks,
Badari

Patch to fix: Kernel BUG at fs/buffer.c:2791
on 1k (2k) filesystems while running fsx.

journal_commit_transaction collects lots of dirty buffer from
and does a single ll_rw_block() to write them out. ll_rw_block()
locks the buffer and checks to see if they are dirty and submits
them for IO.

In the mean while, journal_unmap_buffers() as part of
truncate can unmap the buffer and throw it away. Since its
a 1k (2k) filesystem - each page (4k) will have more than
one buffer_head attached to the page and and we can't free 
up buffer_heads attached to the page (if we are not
invalidating the whole page).

Now, any call to set_page_dirty() (like msync_interval)
could end up setting all the buffer heads attached to
this page again dirty, including the ones those got
cleaned up :(

So, -not-so-elegant fix would be to have make jbd skip all 
the buffers that are not mapped.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
---
 fs/jbd/commit.c |   32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

Index: linux-2.6.18-rc5/fs/jbd/commit.c
===================================================================
--- linux-2.6.18-rc5.orig/fs/jbd/commit.c	2006-08-27 20:41:48.000000000 -0700
+++ linux-2.6.18-rc5/fs/jbd/commit.c	2006-09-01 10:43:54.000000000 -0700
@@ -160,6 +160,34 @@ static int journal_write_commit_record(j
 	return (ret == -EIO);
 }
 
+static void jbd_write_buffers(int nr, struct buffer_head *bhs[])
+{
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		struct buffer_head *bh = bhs[i];
+
+		lock_buffer(bh);
+
+		/*
+		 * case 1: Buffer could have been flushed by now,
+		 * if so nothing to do for us.
+		 * case 2: Buffer could have been unmapped up by
+		 * journal_unmap_buffer - but still attached to the
+		 * page. Any calls to set_page_dirty() would dirty
+		 * the buffer even though its not mapped.  If so,
+		 * we need to skip them.
+		 */
+		if (buffer_mapped(bh) && test_clear_buffer_dirty(bh)) {
+			bh->b_end_io = end_buffer_write_sync;
+			get_bh(bh);
+			submit_bh(WRITE, bh);
+			continue;
+		}
+		unlock_buffer(bh);
+	}
+}
+
 /*
  * journal_commit_transaction
  *
@@ -356,7 +384,7 @@ write_out_data:
 					jbd_debug(2, "submit %d writes\n",
 							bufs);
 					spin_unlock(&journal->j_list_lock);
-					ll_rw_block(SWRITE, bufs, wbuf);
+					jbd_write_buffers(bufs, wbuf);
 					journal_brelse_array(wbuf, bufs);
 					bufs = 0;
 					goto write_out_data;
@@ -379,7 +407,7 @@ write_out_data:
 
 	if (bufs) {
 		spin_unlock(&journal->j_list_lock);
-		ll_rw_block(SWRITE, bufs, wbuf);
+		jbd_write_buffers(bufs, wbuf);
 		journal_brelse_array(wbuf, bufs);
 		spin_lock(&journal->j_list_lock);
 	}



  parent reply	other threads:[~2006-09-05 16:11 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-01 15:50 [RFC][PATCH] set_page_buffer_dirty should skip unmapped buffers Badari Pulavarty
2006-09-01 16:12 ` Anton Altaparmakov
2006-09-01 16:32   ` Badari Pulavarty
2006-09-01 17:18     ` Andrew Morton
2006-09-01 17:43       ` Badari Pulavarty
2006-09-01 21:01       ` Badari Pulavarty
2006-09-05 16:11       ` Badari Pulavarty [this message]
2006-09-06 12:47         ` Jan Kara
2006-09-06 15:12           ` Badari Pulavarty
2006-09-06 15:34             ` Jan Kara
2006-09-06 16:19               ` Badari Pulavarty
2006-09-06 16:27                 ` Jan Kara
2006-09-06 16:43                   ` Badari Pulavarty
2006-09-06 17:03                     ` Jan Kara
2006-09-06 17:16                   ` Badari Pulavarty
2006-09-06 17:27                     ` Jan Kara
2006-09-07  2:14                       ` Andrew Morton
2006-09-07  3:04                         ` Badari Pulavarty
2006-09-07  3:34                           ` Andrew Morton
2006-09-07 15:11                       ` Badari Pulavarty
2006-09-07 20:48                         ` Jan Kara
2006-09-07 22:30                         ` Jan Kara
2006-09-08  4:33                           ` Badari Pulavarty
2006-09-08  8:25                             ` Jan Kara
2006-09-08 14:35                               ` Badari Pulavarty
2006-09-11  9:46                                 ` Jan Kara
2006-09-11 20:45                                   ` Badari Pulavarty
2006-09-11 20:52                                     ` Jan Kara
2006-09-13 20:25                           ` Dave Kleikamp
2006-09-14  3:38                             ` Andrew Morton
2006-09-27 22:01                               ` Eric Sandeen
2006-09-28 15:03                                 ` [PATCH] JBD: Make journal_do_submit_data static Dave Kleikamp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1157472702.23501.12.camel@dyn9047017100.beaverton.ibm.com \
    --to=pbadari@us.ibm.com \
    --cc=aia21@cam.ac.uk \
    --cc=akpm@osdl.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sct@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).