linux-fsdevel.vger.kernel.org archive mirror
* [patch 0/6] reiserfs v3 patches
From: Chris Mason @ 2006-01-16  0:50 UTC
  To: akpm, linux-fsdevel, reiserfs-list

Hello everyone,

Here is my current queue of reiserfs patches.  They originated as fixes
for various bugs in the SUSE SLES9 kernel and have been ported to
2.6.15-git9.

-chris

--


* [patch 1/6] fix reiserfs_invalidatepage race against data=ordered
From: Chris Mason @ 2006-01-16  0:50 UTC
  To: akpm, linux-fsdevel, reiserfs-list

[-- Attachment #1: reiserfs_invalidatepage-race-fix --]
[-- Type: text/plain, Size: 2688 bytes --]

After a transaction has closed but before it has finished commit, there
is a window where data=ordered mode requires invalidatepage to pin pages
instead of freeing them.  This patch fixes a race between the
invalidatepage checks and data=ordered writeback, and it also adds a
check to the reiserfs write_ordered_buffers routines to write any
anonymous buffers that were dirtied after its first writeback loop.

The bug works like this:

proc1: transaction closes and a new one starts
proc1: write_ordered_buffers starts processing data=ordered list
proc1: buffer A is cleaned and written
proc2: buffer A is dirtied by another process
proc2: file is truncated to zero, page A goes through invalidatepage
proc2: reiserfs_invalidatepage sees dirty buffer A with reiserfs
       journal head, pins it
proc1: write_ordered_buffers frees the journal head on buffer A

At this point, buffer A stays dirty forever.
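
For anyone who wants to step through the interleaving outside the
kernel, here is a minimal userspace sketch of the sequence above.  It is
a single-threaded replay, not kernel code: struct toy_bh, toy_write()
and replay_race() are hypothetical names, and the final_write flag
stands in for the extra write this patch adds to write_ordered_buffers.

```c
#include <stdbool.h>

/* Toy stand-in for a buffer head; NOT the kernel's struct buffer_head. */
struct toy_bh {
	bool dirty;        /* buffer holds data that still needs writing */
	bool has_jh;       /* a data=ordered journal head is attached */
	bool pinned;       /* invalidatepage refused to drop the buffer */
	bool has_mapping;  /* page still belongs to a file mapping */
	int writes;        /* how many times the buffer reached "disk" */
};

static void toy_write(struct toy_bh *bh)
{
	bh->dirty = false;
	bh->writes++;
}

/* Replay the changelog sequence; final_write models the new check. */
static struct toy_bh replay_race(bool final_write)
{
	struct toy_bh a = { .dirty = true, .has_jh = true, .has_mapping = true };

	toy_write(&a);             /* proc1: first writeback loop cleans A */
	a.dirty = true;            /* proc2: A is dirtied again */
	a.has_mapping = false;     /* proc2: truncate detaches the page */
	if (a.dirty && a.has_jh)   /* proc2: invalidatepage sees the jh, pins A */
		a.pinned = true;
	a.has_jh = false;          /* proc1: journal head is freed */
	if (final_write && a.dirty && !a.has_mapping)
		toy_write(&a);     /* proc1: the extra write from this patch */
	return a;
}
```

With final_write false the replay ends with A pinned and still dirty
after a single write; with it true, A is written a second time and ends
up clean.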

diff -r 21be96fa294a fs/reiserfs/inode.c
--- a/fs/reiserfs/inode.c	Fri Jan 13 13:48:03 2006 -0500
+++ b/fs/reiserfs/inode.c	Fri Jan 13 13:50:37 2006 -0500
@@ -2743,6 +2743,7 @@ static int invalidatepage_can_drop(struc
 	int ret = 1;
 	struct reiserfs_journal *j = SB_JOURNAL(inode->i_sb);
 
+	lock_buffer(bh);
 	spin_lock(&j->j_dirty_buffers_lock);
 	if (!buffer_mapped(bh)) {
 		goto free_jh;
@@ -2758,7 +2759,7 @@ static int invalidatepage_can_drop(struc
 		if (buffer_journaled(bh) || buffer_journal_dirty(bh)) {
 			ret = 0;
 		}
-	} else if (buffer_dirty(bh) || buffer_locked(bh)) {
+	} else  if (buffer_dirty(bh)) {
 		struct reiserfs_journal_list *jl;
 		struct reiserfs_jh *jh = bh->b_private;
 
@@ -2784,6 +2785,7 @@ static int invalidatepage_can_drop(struc
 		reiserfs_free_jh(bh);
 	}
 	spin_unlock(&j->j_dirty_buffers_lock);
+	unlock_buffer(bh);
 	return ret;
 }
 
diff -r 21be96fa294a fs/reiserfs/journal.c
--- a/fs/reiserfs/journal.c	Fri Jan 13 13:48:03 2006 -0500
+++ b/fs/reiserfs/journal.c	Fri Jan 13 13:50:37 2006 -0500
@@ -878,6 +878,19 @@ static int write_ordered_buffers(spinloc
 		}
 		if (!buffer_uptodate(bh)) {
 			ret = -EIO;
+		}
+		/* ugly interaction with invalidatepage here.
+		 * reiserfs_invalidate_page will pin any buffer that has a valid
+		 * journal head from an older transaction.  If someone else sets
+		 * our buffer dirty after we write it in the first loop, and
+		 * then someone truncates the page away, nobody will ever write
+		 * the buffer. We're safe if we write the page one last time
+		 * after freeing the journal header.
+		 */
+		if (buffer_dirty(bh) && unlikely(bh->b_page->mapping == NULL)) {
+			spin_unlock(lock);
+			ll_rw_block(WRITE, 1, &bh);
+			spin_lock(lock);
 		}
 		put_bh(bh);
 		cond_resched_lock(lock);

--


* [patch 2/6] Zero b_private when allocating buffer heads
From: Chris Mason @ 2006-01-16  0:50 UTC
  To: akpm, linux-fsdevel, reiserfs-list

[-- Attachment #1: b_private-init --]
[-- Type: text/plain, Size: 507 bytes --]

The b_private field in buffer heads needs to be zero-filled
when the buffers are allocated.  Thanks to Nathan Scott for
finding this.  It was causing problems on systems with both XFS and
reiserfs.
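
A rough illustration of the failure mode, under the assumption that the
allocator hands back a recycled object without clearing it: if the
previous user left something in b_private, the next user sees it.  The
one-slot "cache" and the names struct slot_bh and slot_alloc() are
hypothetical, and the zero_private flag only exists to show the before
and after behaviour side by side.

```c
#include <stddef.h>

/* Toy buffer head; field names mirror the kernel's, the type does not. */
struct slot_bh {
	unsigned long b_state;
	int b_count;
	void *b_private;
	size_t b_size;
};

/* One-slot "cache" that returns the same object without clearing it. */
static struct slot_bh the_slot;

static struct slot_bh *slot_alloc(size_t size, int zero_private)
{
	struct slot_bh *bh = &the_slot;

	bh->b_state = 0;
	bh->b_count = 0;
	if (zero_private)
		bh->b_private = NULL;  /* the line this patch adds */
	bh->b_size = size;
	return bh;
}
```

Allocating, storing a pointer in b_private, and allocating again with
zero_private set to 0 returns the old pointer; with it set to 1 the
field comes back NULL.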

diff -r 5ef1fa0a021a fs/buffer.c
--- a/fs/buffer.c	Fri Jan 13 13:50:39 2006 -0500
+++ b/fs/buffer.c	Fri Jan 13 13:51:09 2006 -0500
@@ -1022,6 +1022,7 @@ try_again:
 
 		bh->b_state = 0;
 		atomic_set(&bh->b_count, 0);
+		bh->b_private = NULL;
 		bh->b_size = size;
 
 		/* Link the buffer to its page */

--


* [patch 3/6] reiserfs hang and performance fix for data=journal mode
From: Chris Mason @ 2006-01-16  0:50 UTC
  To: akpm, linux-fsdevel, reiserfs-list

[-- Attachment #1: reiserfs-logging-perf-3 --]
[-- Type: text/plain, Size: 2553 bytes --]

In data=journal mode, reiserfs writepage must not start a transaction
while running under PF_MEMALLOC.  In that case this patch redirties the
page instead of forcing a transaction start.
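
The decision can be written down as a small pure function.  This is a
sketch only: enum wp_action and writepage_action() are hypothetical, and
it assumes that "checked" in the patch means the page needs to be logged
(data=journal), which is my reading of reiserfs_write_full_page rather
than something stated in this mail.

```c
#include <stdbool.h>

/* What writepage does with the page; names are illustrative only. */
enum wp_action {
	WP_REDIRTY,        /* put the page back, let a later pass write it */
	WP_LOG_AND_WRITE,  /* start a transaction and log the buffers */
	WP_WRITE           /* plain writeback, no transaction needed */
};

static enum wp_action writepage_action(bool page_checked, bool in_memalloc)
{
	/* No logging while reclaiming memory: starting a transaction here
	 * could block on the very memory we are trying to free. */
	if (page_checked && in_memalloc)
		return WP_REDIRTY;
	return page_checked ? WP_LOG_AND_WRITE : WP_WRITE;
}
```

Only the combination of a page that needs logging and a PF_MEMALLOC
caller is deferred; every other case proceeds as before.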

Also, calling filemap_fdata* to trigger I/O on the block device can
cause lock inversions on the page lock.  Instead, do simple batching
from flush_commit_list.
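
The batching arithmetic is simple enough to try in userspace.  The two
helpers below are hypothetical names (batch_len() and log_block()); they
mirror the write_len clamp and the wrap-around block computation in the
journal.c hunk of this patch, with the journal geometry passed in as
plain numbers instead of being read from the superblock.

```c
/* Number of log blocks to submit: the transaction plus its description
 * block, but never fewer than min_batch (256 in the patch). */
static int batch_len(int j_len, int min_batch)
{
	int write_len = j_len + 1;

	if (write_len < min_batch)
		write_len = min_batch;
	return write_len;
}

/* Block number of the i-th log block, wrapping around the on-disk journal. */
static unsigned long log_block(unsigned long first_block,
			       unsigned long journal_size,
			       unsigned long j_start, unsigned long i)
{
	return first_block + (j_start + i) % journal_size;
}
```

Because the loop can now run past the end of the current transaction,
some of the blocks it asks for may simply not be cached, which is
presumably why the patch also checks the lookup result before touching
the buffer.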

diff -r c10585019f18 fs/reiserfs/inode.c
--- a/fs/reiserfs/inode.c	Fri Jan 13 13:51:10 2006 -0500
+++ b/fs/reiserfs/inode.c	Fri Jan 13 13:55:09 2006 -0500
@@ -2363,6 +2363,13 @@ static int reiserfs_write_full_page(stru
 	int bh_per_page = PAGE_CACHE_SIZE / s->s_blocksize;
 	th.t_trans_id = 0;
 
+	/* no logging allowed when nonblocking or from PF_MEMALLOC */
+	if (checked && (current->flags & PF_MEMALLOC)) {
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+		return 0;
+	}
+
 	/* The page dirty bit is cleared before writepage is called, which
 	 * means we have to tell create_empty_buffers to make dirty buffers
 	 * The page really should be up to date at this point, so tossing
diff -r c10585019f18 fs/reiserfs/journal.c
--- a/fs/reiserfs/journal.c	Fri Jan 13 13:51:10 2006 -0500
+++ b/fs/reiserfs/journal.c	Fri Jan 13 13:55:09 2006 -0500
@@ -990,6 +990,7 @@ static int flush_commit_list(struct supe
 	struct reiserfs_journal *journal = SB_JOURNAL(s);
 	int barrier = 0;
 	int retval = 0;
+	int write_len;
 
 	reiserfs_check_lock_depth(s, "flush_commit_list");
 
@@ -1039,16 +1040,24 @@ static int flush_commit_list(struct supe
 	BUG_ON(!list_empty(&jl->j_bh_list));
 	/*
 	 * for the description block and all the log blocks, submit any buffers
-	 * that haven't already reached the disk
+	 * that haven't already reached the disk.  Try to write at least 256
+	 * log blocks. later on, we will only wait on blocks that correspond
+	 * to this transaction, but while we're unplugging we might as well
+	 * get a chunk of data on there.
 	 */
 	atomic_inc(&journal->j_async_throttle);
-	for (i = 0; i < (jl->j_len + 1); i++) {
+	write_len = jl->j_len + 1;
+	if (write_len < 256)
+		write_len = 256;
+	for (i = 0 ; i < write_len ; i++) {
 		bn = SB_ONDISK_JOURNAL_1st_BLOCK(s) + (jl->j_start + i) %
 		    SB_ONDISK_JOURNAL_SIZE(s);
 		tbh = journal_find_get_block(s, bn);
-		if (buffer_dirty(tbh))	/* redundant, ll_rw_block() checks */
-			ll_rw_block(SWRITE, 1, &tbh);
-		put_bh(tbh);
+		if (tbh) {
+			if (buffer_dirty(tbh))
+			    ll_rw_block(WRITE, 1, &tbh) ;
+			put_bh(tbh) ; 
+		}
 	}
 	atomic_dec(&journal->j_async_throttle);
 

--


* [patch 4/6] reiserfs write_ordered_buffers should not oops on dirty non-uptodate bh
From: Chris Mason @ 2006-01-16  0:50 UTC
  To: akpm, linux-fsdevel, reiserfs-list

[-- Attachment #1: reiserfs-ordered-io-failure --]
[-- Type: text/plain, Size: 1180 bytes --]

write_ordered_buffers should handle dirty non-uptodate buffers without
hitting a BUG().
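
As a sketch of the two pieces involved (dropping the bad buffer with
-EIO, and keeping the first error seen by the caller), here is a
userspace model.  struct ordered_bh, queue_ordered() and
keep_first_error() are hypothetical; the real code works on journal
heads under a spinlock, which is left out here.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy buffer state; not the kernel's struct buffer_head. */
struct ordered_bh {
	bool uptodate;
	bool dirty;
};

/* Returns how many buffers would be queued for writing.  A buffer that
 * is dirty but not uptodate is cleaned and reported through *err
 * instead of tripping an assertion. */
static int queue_ordered(struct ordered_bh *bhs, int n, int *err)
{
	int queued = 0;

	for (int i = 0; i < n; i++) {
		if (!bhs[i].uptodate && bhs[i].dirty) {
			bhs[i].dirty = false;
			*err = -EIO;
		}
		if (bhs[i].dirty)
			queued++;
	}
	return queued;
}

/* Caller side: remember the first failure, ignore later ones. */
static int keep_first_error(int retval, int ret)
{
	if (ret < 0 && retval == 0)
		retval = ret;
	return retval;
}
```

The "first error wins" rule matches what the flush_commit_list hunk
does with retval: an earlier failure is never overwritten by a later
one.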

diff -r 18fa5554d7e2 fs/reiserfs/journal.c
--- a/fs/reiserfs/journal.c	Fri Jan 13 13:55:10 2006 -0500
+++ b/fs/reiserfs/journal.c	Fri Jan 13 14:00:49 2006 -0500
@@ -848,6 +848,14 @@ static int write_ordered_buffers(spinloc
 			spin_lock(lock);
 			goto loop_next;
 		}
+		/* in theory, dirty non-uptodate buffers should never get here,
+		 * but the upper layer io error paths still have a few quirks.  
+		 * Handle them here as gracefully as we can
+		 */
+		if (!buffer_uptodate(bh) && buffer_dirty(bh)) {
+			clear_buffer_dirty(bh);
+			ret = -EIO;
+		}
 		if (buffer_dirty(bh)) {
 			list_del_init(&jh->list);
 			list_add(&jh->list, &tmp);
@@ -1032,9 +1040,12 @@ static int flush_commit_list(struct supe
 	}
 
 	if (!list_empty(&jl->j_bh_list)) {
+		int ret;
 		unlock_kernel();
-		write_ordered_buffers(&journal->j_dirty_buffers_lock,
-				      journal, jl, &jl->j_bh_list);
+		ret = write_ordered_buffers(&journal->j_dirty_buffers_lock,
+					    journal, jl, &jl->j_bh_list);
+		if (ret < 0 && retval == 0)
+			retval = ret;
 		lock_kernel();
 	}
 	BUG_ON(!list_empty(&jl->j_bh_list));

--


* [patch 5/6] reiserfs fix journal accounting in journal_transaction_should_end
From: Chris Mason @ 2006-01-16  0:50 UTC
  To: akpm, linux-fsdevel, reiserfs-list

[-- Attachment #1: reiserfs-should-end-alloc --]
[-- Type: text/plain, Size: 618 bytes --]

reiserfs: journal_transaction_should_end should increase the count of blocks
allocated so the transaction subsystem can keep new writers from creating
a transaction that is too large.
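
To see why the accounting matters, consider a stripped-down model in
which the only limit is the batch size.  The names struct toy_journal,
struct toy_handle and toy_should_end() are hypothetical, and the real
function checks several other conditions that are omitted here.

```c
/* Simplified journal: only the fields needed for the batch-size check. */
struct toy_journal {
	int j_len_alloc;  /* blocks already promised to writers */
	int j_max_batch;  /* limit on blocks per transaction */
};

struct toy_handle {
	int t_blocks_allocated;  /* blocks promised to this writer */
};

/* Returns 1 if the caller should end the transaction.  Otherwise the
 * request is recorded so the next caller sees the updated total. */
static int toy_should_end(struct toy_journal *j, struct toy_handle *th,
			  int new_alloc)
{
	if (j->j_len_alloc + new_alloc >= j->j_max_batch)
		return 1;
	j->j_len_alloc += new_alloc;
	th->t_blocks_allocated += new_alloc;
	return 0;
}
```

With a limit of 100, a first writer asking for 60 blocks is admitted and
a second writer asking for 60 is told to end the transaction.  Without
the two additions, both requests would be checked against an unchanged
total and both would be admitted.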

diff -r 890bf922a629 fs/reiserfs/journal.c
--- a/fs/reiserfs/journal.c	Fri Jan 13 14:00:50 2006 -0500
+++ b/fs/reiserfs/journal.c	Fri Jan 13 14:01:36 2006 -0500
@@ -2854,6 +2854,9 @@ int journal_transaction_should_end(struc
 	    journal->j_cnode_free < (journal->j_trans_max * 3)) {
 		return 1;
 	}
+	/* protected by the BKL here */
+	journal->j_len_alloc += new_alloc;
+	th->t_blocks_allocated += new_alloc ;
 	return 0;
 }
 

--


* [patch 6/6] reiserfs: check for files > 2GB on 3.5.x disks
From: Chris Mason, Jeff Mahoney @ 2006-01-16  0:50 UTC
  To: akpm, linux-fsdevel, reiserfs-list

[-- Attachment #1: reiserfs-old-format-size.diff --]
[-- Type: text/plain, Size: 1335 bytes --]

When a filesystem has been converted from 3.5.x to 3.6.x, we need
an extra check during file write to make sure we are not trying
to grow a 3.5.x file beyond 2GB.
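
The size check itself can be exercised in userspace.  The helper below
is a hypothetical name (clamp_old_format_write()) and TOY_MAX_NON_LFS is
a local copy of the 2GB - 1 limit; it follows the same three cases as
the hunk in this patch but leaves out the SIGXFSZ delivery and the key
format test.

```c
#include <errno.h>
#include <stddef.h>

/* Largest offset a non-LFS (old format) file can reach: 2^31 - 1. */
#define TOY_MAX_NON_LFS ((1UL << 31) - 1)

/* Returns 0, possibly shrinking *count so the write stops at the limit,
 * or -EFBIG when the starting position is already at or past it. */
static int clamp_old_format_write(unsigned long long pos, size_t *count)
{
	if (pos + *count > TOY_MAX_NON_LFS) {
		if (pos >= TOY_MAX_NON_LFS)
			return -EFBIG;
		if (*count > TOY_MAX_NON_LFS - pos)
			*count = (size_t)(TOY_MAX_NON_LFS - pos);
	}
	return 0;
}
```

A write that starts 100 bytes below the limit is shortened to 100 bytes,
and a write that starts at the limit fails outright.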

diff -r ee81eb208598 fs/reiserfs/file.c
--- a/fs/reiserfs/file.c	Fri Jan 13 14:01:37 2006 -0500
+++ b/fs/reiserfs/file.c	Fri Jan 13 14:08:12 2006 -0500
@@ -1285,6 +1285,23 @@ static ssize_t reiserfs_file_write(struc
 	struct reiserfs_transaction_handle th;
 	th.t_trans_id = 0;
 
+	/* If a filesystem is converted from 3.5 to 3.6, we'll have v3.5 items
+	* lying around (most of the disk, in fact). Despite the filesystem
+	* now being a v3.6 format, the old items still can't support large
+	* file sizes. Catch this case here, as the rest of the VFS layer is
+	* oblivious to the different limitations between old and new items.
+	* reiserfs_setattr catches this for truncates. This chunk is lifted
+	* from generic_write_checks. */
+	if (get_inode_item_key_version (inode) == KEY_FORMAT_3_5 && 
+	    *ppos + count > MAX_NON_LFS) {
+		if (*ppos >= MAX_NON_LFS) {
+			send_sig(SIGXFSZ, current, 0);
+			return -EFBIG;
+		}
+		if (count > MAX_NON_LFS - (unsigned long)*ppos)
+			count = MAX_NON_LFS - (unsigned long)*ppos;
+	}
+
 	if (file->f_flags & O_DIRECT) {	// Direct IO needs treatment
 		ssize_t result, after_file_end = 0;
 		if ((*ppos + count >= inode->i_size)

--
