[PATCH 0/5] ext3/jbd patches in my patch queue


* [PATCH 0/5] ext3/jbd patches in my patch queue
@ 2009-07-21 10:04 Jan Kara
  2009-07-21 10:04 ` [PATCH 1/5] jbd: Fail to load a journal if it is too short Jan Kara
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Jan Kara @ 2009-07-21 10:04 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton

  Hi,

  Here are several patches that I currently carry and plan to merge. As far
as I remember, all of them have already been posted to the list, but some of
them a while ago, so I'd rather resend them. Any comments are welcome.

									Honza


* [PATCH 1/5] jbd: Fail to load a journal if it is too short
  2009-07-21 10:04 [PATCH 0/5] ext3/jbd patches in my patch queue Jan Kara
@ 2009-07-21 10:04 ` Jan Kara
  2009-07-21 16:19   ` Andrew Morton
  2009-07-21 10:04 ` [PATCH 2/5] ext3: Fix truncation of symlinks after failed write Jan Kara
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2009-07-21 10:04 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, Jan Kara

Due to on-disk corruption, the journal can be too short. Fail to load it
in that case so that we don't oops somewhere later.
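
As a rough illustration of the length check described above, here is a minimal
userspace sketch. It assumes a minimum of 1024 blocks as a stand-in for the
kernel's JFS_MIN_JOURNAL_BLOCKS; the names MIN_JOURNAL_BLOCKS and
journal_too_short() are hypothetical and are not part of jbd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed minimum; a stand-in for the kernel's JFS_MIN_JOURNAL_BLOCKS. */
#define MIN_JOURNAL_BLOCKS 1024u

/*
 * Same condition as the patch: first + MIN > last + 1 means the range
 * [first, last] holds fewer than MIN blocks. The operands are widened to
 * 64 bits so that a corrupted 32-bit field cannot wrap the addition.
 */
static bool journal_too_short(uint32_t first, uint32_t last)
{
	return (uint64_t)first + MIN_JOURNAL_BLOCKS > (uint64_t)last + 1;
}
```

With these assumptions, a journal spanning blocks 1-1024 passes the check,
while one spanning blocks 1-1023 is rejected.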

Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd/journal.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 737f724..94a64a1 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -848,6 +848,12 @@ static int journal_reset(journal_t *journal)
 
 	first = be32_to_cpu(sb->s_first);
 	last = be32_to_cpu(sb->s_maxlen);
+	if (first + JFS_MIN_JOURNAL_BLOCKS > last + 1) {
+		printk(KERN_ERR "JBD: Journal too short (blocks %lu-%lu).\n",
+		       first, last);
+		journal_fail_superblock(journal);
+		return -EINVAL;
+	}
 
 	journal->j_first = first;
 	journal->j_last = last;
-- 
1.6.0.2



* [PATCH 2/5] ext3: Fix truncation of symlinks after failed write
  2009-07-21 10:04 [PATCH 0/5] ext3/jbd patches in my patch queue Jan Kara
  2009-07-21 10:04 ` [PATCH 1/5] jbd: Fail to load a journal if it is too short Jan Kara
@ 2009-07-21 10:04 ` Jan Kara
  2009-07-21 10:04 ` [PATCH 3/5] jbd: Fix a race between checkpointing code and journal_get_write_access() Jan Kara
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-07-21 10:04 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, Jan Kara

The contents of long symlinks are written via the standard write methods. So
when the write fails, we add the inode to the orphan list. But symlinks don't
have a .truncate method defined, so nobody properly removes them from the
orphan list (both on disk and in memory).

Fix this by calling ext3_truncate() directly instead of calling vmtruncate()
(which is saner anyway, since we don't need anything vmtruncate() does apart
from calling .truncate in these paths).  We also add the inode to the orphan
list only if ext3_can_truncate() is true (currently, it can be false for
symlinks when there are no blocks allocated); otherwise orphan list processing
will complain and ext3_truncate() will not remove the inode from the on-disk
orphan list.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext3/inode.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 5f51fed..4d7da6f 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1193,15 +1193,16 @@ write_begin_failed:
 		 * i_size_read because we hold i_mutex.
 		 *
 		 * Add inode to orphan list in case we crash before truncate
-		 * finishes.
+		 * finishes. Do this only if ext3_can_truncate() agrees so
+		 * that orphan processing code is happy.
 		 */
-		if (pos + len > inode->i_size)
+		if (pos + len > inode->i_size && ext3_can_truncate(inode))
 			ext3_orphan_add(handle, inode);
 		ext3_journal_stop(handle);
 		unlock_page(page);
 		page_cache_release(page);
 		if (pos + len > inode->i_size)
-			vmtruncate(inode, inode->i_size);
+			ext3_truncate(inode);
 	}
 	if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
 		goto retry;
@@ -1287,7 +1288,7 @@ static int ext3_ordered_write_end(struct file *file,
 	 * There may be allocated blocks outside of i_size because
 	 * we failed to copy some data. Prepare for truncate.
 	 */
-	if (pos + len > inode->i_size)
+	if (pos + len > inode->i_size && ext3_can_truncate(inode))
 		ext3_orphan_add(handle, inode);
 	ret2 = ext3_journal_stop(handle);
 	if (!ret)
@@ -1296,7 +1297,7 @@ static int ext3_ordered_write_end(struct file *file,
 	page_cache_release(page);
 
 	if (pos + len > inode->i_size)
-		vmtruncate(inode, inode->i_size);
+		ext3_truncate(inode);
 	return ret ? ret : copied;
 }
 
@@ -1315,14 +1316,14 @@ static int ext3_writeback_write_end(struct file *file,
 	 * There may be allocated blocks outside of i_size because
 	 * we failed to copy some data. Prepare for truncate.
 	 */
-	if (pos + len > inode->i_size)
+	if (pos + len > inode->i_size && ext3_can_truncate(inode))
 		ext3_orphan_add(handle, inode);
 	ret = ext3_journal_stop(handle);
 	unlock_page(page);
 	page_cache_release(page);
 
 	if (pos + len > inode->i_size)
-		vmtruncate(inode, inode->i_size);
+		ext3_truncate(inode);
 	return ret ? ret : copied;
 }
 
@@ -1358,7 +1359,7 @@ static int ext3_journalled_write_end(struct file *file,
 	 * There may be allocated blocks outside of i_size because
 	 * we failed to copy some data. Prepare for truncate.
 	 */
-	if (pos + len > inode->i_size)
+	if (pos + len > inode->i_size && ext3_can_truncate(inode))
 		ext3_orphan_add(handle, inode);
 	EXT3_I(inode)->i_state |= EXT3_STATE_JDATA;
 	if (inode->i_size > EXT3_I(inode)->i_disksize) {
@@ -1375,7 +1376,7 @@ static int ext3_journalled_write_end(struct file *file,
 	page_cache_release(page);
 
 	if (pos + len > inode->i_size)
-		vmtruncate(inode, inode->i_size);
+		ext3_truncate(inode);
 	return ret ? ret : copied;
 }
 
-- 
1.6.0.2



* [PATCH 3/5] jbd: Fix a race between checkpointing code and journal_get_write_access()
  2009-07-21 10:04 [PATCH 0/5] ext3/jbd patches in my patch queue Jan Kara
  2009-07-21 10:04 ` [PATCH 1/5] jbd: Fail to load a journal if it is too short Jan Kara
  2009-07-21 10:04 ` [PATCH 2/5] ext3: Fix truncation of symlinks after failed write Jan Kara
@ 2009-07-21 10:04 ` Jan Kara
  2009-07-21 10:04 ` [PATCH 4/5] ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() Jan Kara
  2009-07-21 10:04 ` [PATCH 5/5] jbd: fix race between write_metadata_buffer and get_write_access Jan Kara
  4 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-07-21 10:04 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, Jan Kara

The following race can happen:

  CPU1                          CPU2
                                checkpointing code checks the buffer, adds
                                  it to an array for writeback
do_get_write_access()
  ...
  lock_buffer()
  unlock_buffer()
                                  flush_batch() submits the buffer for IO
  __jbd_journal_file_buffer()

  So a buffer under writeout is returned from do_get_write_access(). Since
the filesystem code relies on the fact that journaled buffers cannot be
written out, it does not take the buffer lock and so it can modify the buffer
while it is under writeout. That can lead to filesystem corruption
if we crash at the right moment. A similar problem can happen with
the journal_get_create_access() path.
  We fix the problem by clearing the buffer dirty bit under the buffer lock
even if the buffer is on the BJ_None list. Actually, we clear the dirty bit
regardless of the list the buffer is on and warn about it if
the buffer is already journalled.
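
The state transition behind this fix can be modelled in a few lines. This is
only a single-threaded toy, not the jbd code: struct toy_buffer and both
functions are hypothetical, the "lock" is a plain flag, and the writeback side
assumes that the flusher rechecks the dirty bit after taking the buffer lock.

```c
#include <stdbool.h>

/* Toy stand-in for a buffer_head; all names here are hypothetical. */
struct toy_buffer {
	bool locked;   /* models the buffer lock (single-threaded toy) */
	bool dirty;    /* eligible for ordinary writeback */
	bool jbddirty; /* dirty, but owned by the journal */
};

/*
 * Journal side: move the dirty state to the journal's own bit while the
 * buffer lock is held. Returns true if the buffer was found dirty.
 */
static bool toy_take_for_journal(struct toy_buffer *b)
{
	bool was_dirty;

	b->locked = true;
	was_dirty = b->dirty;
	if (was_dirty) {
		b->dirty = false;
		b->jbddirty = true;
	}
	b->locked = false;
	return was_dirty;
}

/*
 * Writeback side (assumed behaviour): take the lock, then submit the
 * buffer only if it is still dirty. Returns true if IO would be issued.
 */
static bool toy_try_writeback(struct toy_buffer *b)
{
	bool submit;

	b->locked = true;
	submit = b->dirty;
	b->dirty = false;
	b->locked = false;
	return submit;
}
```

In this model, once toy_take_for_journal() has run, a later
toy_try_writeback() finds the dirty bit cleared and submits nothing, which is
the property the patch relies on.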

Thanks for spotting the problem go to dingdinghua <dingdinghua85@gmail.com>.

Reported-by: dingdinghua <dingdinghua85@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd/transaction.c |   68 +++++++++++++++++++++++++------------------------
 1 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 73242ba..c03ac11 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -489,34 +489,15 @@ void journal_unlock_updates (journal_t *journal)
 	wake_up(&journal->j_wait_transaction_locked);
 }
 
-/*
- * Report any unexpected dirty buffers which turn up.  Normally those
- * indicate an error, but they can occur if the user is running (say)
- * tune2fs to modify the live filesystem, so we need the option of
- * continuing as gracefully as possible.  #
- *
- * The caller should already hold the journal lock and
- * j_list_lock spinlock: most callers will need those anyway
- * in order to probe the buffer's journaling state safely.
- */
-static void jbd_unexpected_dirty_buffer(struct journal_head *jh)
+static void warn_dirty_buffer(struct buffer_head *bh)
 {
-	int jlist;
-
-	/* If this buffer is one which might reasonably be dirty
-	 * --- ie. data, or not part of this journal --- then
-	 * we're OK to leave it alone, but otherwise we need to
-	 * move the dirty bit to the journal's own internal
-	 * JBDDirty bit. */
-	jlist = jh->b_jlist;
+	char b[BDEVNAME_SIZE];
 
-	if (jlist == BJ_Metadata || jlist == BJ_Reserved ||
-	    jlist == BJ_Shadow || jlist == BJ_Forget) {
-		struct buffer_head *bh = jh2bh(jh);
-
-		if (test_clear_buffer_dirty(bh))
-			set_buffer_jbddirty(bh);
-	}
+	printk(KERN_WARNING
+	       "JBD: Spotted dirty metadata buffer (dev = %s, blocknr = %llu). "
+	       "There's a risk of filesystem corruption in case of system "
+	       "crash.\n",
+	       bdevname(bh->b_bdev, b), (unsigned long long)bh->b_blocknr);
 }
 
 /*
@@ -583,14 +564,16 @@ repeat:
 			if (jh->b_next_transaction)
 				J_ASSERT_JH(jh, jh->b_next_transaction ==
 							transaction);
+			warn_dirty_buffer(bh);
 		}
 		/*
 		 * In any case we need to clean the dirty flag and we must
 		 * do it under the buffer lock to be sure we don't race
 		 * with running write-out.
 		 */
-		JBUFFER_TRACE(jh, "Unexpected dirty buffer");
-		jbd_unexpected_dirty_buffer(jh);
+		JBUFFER_TRACE(jh, "Journalling dirty buffer");
+		clear_buffer_dirty(bh);
+		set_buffer_jbddirty(bh);
 	}
 
 	unlock_buffer(bh);
@@ -826,6 +809,15 @@ int journal_get_create_access(handle_t *handle, struct buffer_head *bh)
 	J_ASSERT_JH(jh, buffer_locked(jh2bh(jh)));
 
 	if (jh->b_transaction == NULL) {
+		/*
+		 * Previous journal_forget() could have left the buffer
+		 * with jbddirty bit set because it was being committed. When
+		 * the commit finished, we've filed the buffer for
+		 * checkpointing and marked it dirty. Now we are reallocating
+		 * the buffer so the transaction freeing it must have
+		 * committed and so it's safe to clear the dirty bit.
+		 */
+		clear_buffer_dirty(jh2bh(jh));
 		jh->b_transaction = transaction;
 
 		/* first access by this transaction */
@@ -1782,8 +1774,13 @@ static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction)
 
 	if (jh->b_cp_transaction) {
 		JBUFFER_TRACE(jh, "on running+cp transaction");
+		/*
+		 * We don't want to write the buffer anymore, clear the
+		 * bit so that we don't confuse checks in
+		 * __journal_file_buffer
+		 */
+		clear_buffer_dirty(bh);
 		__journal_file_buffer(jh, transaction, BJ_Forget);
-		clear_buffer_jbddirty(bh);
 		may_free = 0;
 	} else {
 		JBUFFER_TRACE(jh, "on running transaction");
@@ -2041,12 +2038,17 @@ void __journal_file_buffer(struct journal_head *jh,
 	if (jh->b_transaction && jh->b_jlist == jlist)
 		return;
 
-	/* The following list of buffer states needs to be consistent
-	 * with __jbd_unexpected_dirty_buffer()'s handling of dirty
-	 * state. */


* [PATCH 4/5] ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle()
  2009-07-21 10:04 [PATCH 0/5] ext3/jbd patches in my patch queue Jan Kara
                   ` (2 preceding siblings ...)
  2009-07-21 10:04 ` [PATCH 3/5] jbd: Fix a race between checkpointing code and journal_get_write_access() Jan Kara
@ 2009-07-21 10:04 ` Jan Kara
  2009-07-21 10:04 ` [PATCH 5/5] jbd: fix race between write_metadata_buffer and get_write_access Jan Kara
  4 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-07-21 10:04 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, Jan Kara

Get rid of the extenddisksize parameter of ext3_get_blocks_handle(). This seems
to be a relic from the old days, and setting disksize in this function does not
make much sense. Currently it is set only by ext3_getblk().  Since the
parameter has some effect only if create == 1, it is easy to check that the
three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext3/dir.c           |    3 +--
 fs/ext3/inode.c         |   13 +++----------
 include/linux/ext3_fs.h |    2 +-
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index 3d724a9..373fa90 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -130,8 +130,7 @@ static int ext3_readdir(struct file * filp,
 		struct buffer_head *bh = NULL;
 
 		map_bh.b_state = 0;
-		err = ext3_get_blocks_handle(NULL, inode, blk, 1,
-						&map_bh, 0, 0);
+		err = ext3_get_blocks_handle(NULL, inode, blk, 1, &map_bh, 0);
 		if (err > 0) {
 			pgoff_t index = map_bh.b_blocknr >>
 					(PAGE_CACHE_SHIFT - inode->i_blkbits);
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 4d7da6f..b49908a 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -788,7 +788,7 @@ err_out:
 int ext3_get_blocks_handle(handle_t *handle, struct inode *inode,
 		sector_t iblock, unsigned long maxblocks,
 		struct buffer_head *bh_result,
-		int create, int extend_disksize)
+		int create)
 {
 	int err = -EIO;
 	int offsets[4];
@@ -911,13 +911,6 @@ int ext3_get_blocks_handle(handle_t *handle, struct inode *inode,
 	if (!err)
 		err = ext3_splice_branch(handle, inode, iblock,
 					partial, indirect_blks, count);
-	/*
-	 * i_disksize growing is protected by truncate_mutex.  Don't forget to
-	 * protect it if you're about to implement concurrent
-	 * ext3_get_block() -bzzz
-	*/
-	if (!err && extend_disksize && inode->i_size > ei->i_disksize)
-		ei->i_disksize = inode->i_size;
 	mutex_unlock(&ei->truncate_mutex);
 	if (err)
 		goto cleanup;
@@ -972,7 +965,7 @@ static int ext3_get_block(struct inode *inode, sector_t iblock,
 	}
 
 	ret = ext3_get_blocks_handle(handle, inode, iblock,
-					max_blocks, bh_result, create, 0);
+					max_blocks, bh_result, create);
 	if (ret > 0) {
 		bh_result->b_size = (ret << inode->i_blkbits);
 		ret = 0;
@@ -1005,7 +998,7 @@ struct buffer_head *ext3_getblk(handle_t *handle, struct inode *inode,
 	dummy.b_blocknr = -1000;
 	buffer_trace_init(&dummy.b_history);
 	err = ext3_get_blocks_handle(handle, inode, block, 1,
-					&dummy, create, 1);
+					&dummy, create);
 	/*
 	 * ext3_get_blocks_handle() returns number of blocks
 	 * mapped. 0 in case of a HOLE.
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index 634a5e5..7499b36 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -874,7 +874,7 @@ struct buffer_head * ext3_getblk (handle_t *, struct inode *, long, int, int *);
 struct buffer_head * ext3_bread (handle_t *, struct inode *, int, int, int *);
 int ext3_get_blocks_handle(handle_t *handle, struct inode *inode,
 	sector_t iblock, unsigned long maxblocks, struct buffer_head *bh_result,
-	int create, int extend_disksize);
+	int create);
 
 extern struct inode *ext3_iget(struct super_block *, unsigned long);
 extern int  ext3_write_inode (struct inode *, int);
-- 
1.6.0.2



* [PATCH 5/5] jbd: fix race between write_metadata_buffer and get_write_access
  2009-07-21 10:04 [PATCH 0/5] ext3/jbd patches in my patch queue Jan Kara
                   ` (3 preceding siblings ...)
  2009-07-21 10:04 ` [PATCH 4/5] ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() Jan Kara
@ 2009-07-21 10:04 ` Jan Kara
  4 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-07-21 10:04 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andrew Morton, dingdinghua, Jan Kara

From: dingdinghua <dingdinghua85@gmail.com>

The function journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in)
too early; this could potentially allow another thread to call get_write_access
on the buffer head, modify the data, and dirty it, allowing the wrong data
to be written into the journal.  Fortunately, if we lose this race, the only
time this will actually cause filesystem corruption is if there is a system
crash or other unclean shutdown of the system before the next commit can take
place.

Signed-off-by: dingdinghua <dingdinghua85@gmail.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd/journal.c |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index 94a64a1..f96f850 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -287,6 +287,7 @@ int journal_write_metadata_buffer(transaction_t *transaction,
 	struct page *new_page;
 	unsigned int new_offset;
 	struct buffer_head *bh_in = jh2bh(jh_in);
+	journal_t *journal = transaction->t_journal;
 
 	/*
 	 * The buffer really shouldn't be locked: only the current committing
@@ -300,6 +301,11 @@ int journal_write_metadata_buffer(transaction_t *transaction,
 	J_ASSERT_BH(bh_in, buffer_jbddirty(bh_in));
 
 	new_bh = alloc_buffer_head(GFP_NOFS|__GFP_NOFAIL);
+	/* keep subsequent assertions sane */
+	new_bh->b_state = 0;
+	init_buffer(new_bh, NULL, NULL);
+	atomic_set(&new_bh->b_count, 1);
+	new_jh = journal_add_journal_head(new_bh);	/* This sleeps */
 
 	/*
 	 * If a new transaction has already done a buffer copy-out, then
@@ -361,14 +367,6 @@ repeat:
 		kunmap_atomic(mapped_data, KM_USER0);
 	}
 
-	/* keep subsequent assertions sane */
-	new_bh->b_state = 0;
-	init_buffer(new_bh, NULL, NULL);
-	atomic_set(&new_bh->b_count, 1);
-	jbd_unlock_bh_state(bh_in);
-
-	new_jh = journal_add_journal_head(new_bh);	/* This sleeps */


* Re: [PATCH 1/5] jbd: Fail to load a journal if it is too short
  2009-07-21 10:04 ` [PATCH 1/5] jbd: Fail to load a journal if it is too short Jan Kara
@ 2009-07-21 16:19   ` Andrew Morton
  2009-07-21 16:50     ` Andreas Dilger
  2009-07-21 21:35     ` Theodore Tso
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2009-07-21 16:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Tue, 21 Jul 2009 12:04:15 +0200 Jan Kara <jack@suse.cz> wrote:

> Due to on disk corruption, it can happen that journal is too short. Fail
> to load it in such case so that we don't oops somewhere later.
> 
> Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/jbd/journal.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
> index 737f724..94a64a1 100644
> --- a/fs/jbd/journal.c
> +++ b/fs/jbd/journal.c
> @@ -848,6 +848,12 @@ static int journal_reset(journal_t *journal)
>  
>  	first = be32_to_cpu(sb->s_first);
>  	last = be32_to_cpu(sb->s_maxlen);
> +	if (first + JFS_MIN_JOURNAL_BLOCKS > last + 1) {
> +		printk(KERN_ERR "JBD: Journal too short (blocks %lu-%lu).\n",
> +		       first, last);
> +		journal_fail_superblock(journal);
> +		return -EINVAL;
> +	}
>  
>  	journal->j_first = first;
>  	journal->j_last = last;

It's odd that sb->s_first/s_maxlen are 32-bit and
journal->j_first/j_last are unsigned long.

These things will only ever be 32-bit unless we change the journal
superblock.


* Re: [PATCH 1/5] jbd: Fail to load a journal if it is too short
  2009-07-21 16:19   ` Andrew Morton
@ 2009-07-21 16:50     ` Andreas Dilger
  2009-07-21 21:35     ` Theodore Tso
  1 sibling, 0 replies; 10+ messages in thread
From: Andreas Dilger @ 2009-07-21 16:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, linux-ext4

On Jul 21, 2009  09:19 -0700, Andrew Morton wrote:
> On Tue, 21 Jul 2009 12:04:15 +0200 Jan Kara <jack@suse.cz> wrote:
> > Due to on disk corruption, it can happen that journal is too short. Fail
> > to load it in such case so that we don't oops somewhere later.
> > 
> > Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/jbd/journal.c |    6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
> > index 737f724..94a64a1 100644
> > --- a/fs/jbd/journal.c
> > +++ b/fs/jbd/journal.c
> > @@ -848,6 +848,12 @@ static int journal_reset(journal_t *journal)
> >  
> >  	first = be32_to_cpu(sb->s_first);
> >  	last = be32_to_cpu(sb->s_maxlen);
> > +	if (first + JFS_MIN_JOURNAL_BLOCKS > last + 1) {
> > +		printk(KERN_ERR "JBD: Journal too short (blocks %lu-%lu).\n",
> > +		       first, last);
> > +		journal_fail_superblock(journal);
> > +		return -EINVAL;
> > +	}
> >  
> >  	journal->j_first = first;
> >  	journal->j_last = last;
> 
> It's odd that sb->s_first/s_maxlen are 32-bit and
> journal->j_first/j_last are unsigned long.
> 
> These things will only ever be 32-bit unless we change the journal
> superblock.

The jbd on-disk structures and APIs cannot handle 64-bit block numbers.
That is one of the first changes we made for jbd2, so that it is possible
to store either 32-bit or 64-bit block numbers in a transaction.  I don't
think that needs to be fixed for the jbd code.
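
To illustrate the idea of storing either width, here is a hedged sketch that
splits a block number into low and high 32-bit halves. struct toy_tag and the
two helpers are hypothetical; they do not reproduce the real jbd2 on-disk tag
layout and they ignore endianness conversion.

```c
#include <stdint.h>

/* Hypothetical tag: the high half is used only in 64-bit mode. */
struct toy_tag {
	uint32_t blocknr_lo;
	uint32_t blocknr_hi;
};

/* Store a block number; in 32-bit mode the upper half is dropped. */
static struct toy_tag toy_tag_encode(uint64_t blocknr, int use_64bit)
{
	struct toy_tag t;

	t.blocknr_lo = (uint32_t)(blocknr & 0xffffffffu);
	t.blocknr_hi = use_64bit ? (uint32_t)(blocknr >> 32) : 0;
	return t;
}

/* Read the block number back according to the mode in use. */
static uint64_t toy_tag_decode(const struct toy_tag *t, int use_64bit)
{
	uint64_t blocknr = t->blocknr_lo;

	if (use_64bit)
		blocknr |= (uint64_t)t->blocknr_hi << 32;
	return blocknr;
}
```

In 32-bit mode only the low half survives a round trip, which is why a format
limited to 32-bit fields cannot address larger devices.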

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: [PATCH 1/5] jbd: Fail to load a journal if it is too short
  2009-07-21 16:19   ` Andrew Morton
  2009-07-21 16:50     ` Andreas Dilger
@ 2009-07-21 21:35     ` Theodore Tso
  2009-07-22  9:52       ` Jan Kara
  1 sibling, 1 reply; 10+ messages in thread
From: Theodore Tso @ 2009-07-21 21:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, linux-ext4

On Tue, Jul 21, 2009 at 09:19:46AM -0700, Andrew Morton wrote:
> 
> It's odd that sb->s_first/s_maxlen are 32-bit and
> journal->j_first/j_last are unsigned long.
> 
> These things will only ever be 32-bit unless we change the journal
> superblock.

In general, if there is any use of "unsigned long" in fs/ext[34], it's
probably a bug.  This is because ulong is 32 bits on x86, and 64 bits
on x86_64, so it just wastes memory space on 64-bit platforms.  The
one exception to this is if the field in question is used by the
standard bitops functions, which only function correctly on "unsigned
long".
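
The size difference is easy to measure in userspace. The following sketch uses
two hypothetical structs (they are not the real journal_t fields) to compare a
pair of "unsigned long" fields with a pair of fixed 32-bit fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structs, only for comparing field widths. */
struct range_ulong {
	unsigned long first;
	unsigned long last;
};

struct range_u32 {
	uint32_t first;
	uint32_t last;
};

/* Bytes saved per instance by using fixed 32-bit fields. */
static size_t bytes_saved_per_range(void)
{
	return sizeof(struct range_ulong) - sizeof(struct range_u32);
}
```

On an LP64 platform such as x86_64 Linux, where "unsigned long" is 8 bytes,
this reports 8 bytes saved per instance; where "unsigned long" is 4 bytes it
reports 0.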

						- Ted


* Re: [PATCH 1/5] jbd: Fail to load a journal if it is too short
  2009-07-21 21:35     ` Theodore Tso
@ 2009-07-22  9:52       ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2009-07-22  9:52 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Andrew Morton, Jan Kara, linux-ext4

On Tue 21-07-09 17:35:48, Theodore Tso wrote:
> On Tue, Jul 21, 2009 at 09:19:46AM -0700, Andrew Morton wrote:
> > 
> > It's odd that sb->s_first/s_maxlen are 32-bit and
> > journal->j_first/j_last are unsigned long.
> > 
> > These things will only ever be 32-bit unless we change the journal
> > superblock.
> 
> In general, if there is any use of "unsigned long" in fs/ext[34], it's
> probably a bug.  This is because ulong is 32-bits on x86, and 64-bits
> on x86_64, so it just wastes memory space on 64-bit platforms.  The
> one exception to this is if the field in question is used by the
> standard bitops functions, which only functions correctly on "unsigned
> long".
  That's a good point. I'll write a cleanup patch at least for the obvious
offenders.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


end of thread, other threads:[~2009-07-22  9:52 UTC | newest]
