stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/4] ext4: Fix data exposure after a crash
       [not found] <1459354767-8693-1-git-send-email-jack@suse.cz>
@ 2016-03-30 16:19 ` Jan Kara
  2016-04-24  3:48   ` Theodore Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2016-03-30 16:19 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Weller.Huang, Jan Kara, stable

Huang has reported that in his powerfail testing he is seeing stale
block contents in some of recently allocated blocks although he mounts
ext4 in data=ordered mode. After some investigation I have found out
that indeed when delayed allocation is used, we don't add inode to
transaction's list of inodes needing flushing before commit. Originally
we were doing that but commit f3b59291a69d removed the logic with a
flawed argument that it is not needed.

The problem is that although for delayed allocated blocks we write their
contents immediately after allocating them, there is no guarantee that
the IO scheduler or device doesn't reorder things and thus transaction
allocating blocks and attaching them to inode can reach stable storage
before actual block contents. Actually whenever we attach freshly
allocated blocks to inode using a written extent, we should add inode to
transaction's ordered inode list to make sure we properly wait for block
contents to be written before committing the transaction. So that is
what we do in this patch. This also handles other cases where stale data
exposure was possible - like filling hole via mmap in
data=ordered,nodelalloc mode.

The only exception to the above rule are extending direct IO writes where
blkdev_direct_IO() waits for IO to complete before increasing i_size and
thus stale data exposure is not possible. For now we don't complicate
the code with optimizing this special case since the overhead is pretty
low. In case this is observed to be a performance problem we can always
handle it using a special flag to ext4_map_blocks().

CC: stable@vger.kernel.org
Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d
Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index dab84a2530ff..747b0e64b9d2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -684,6 +684,20 @@ out_sem:
 		ret = check_block_validity(inode, map);
 		if (ret != 0)
 			return ret;
+
+		/*
+		 * Inodes with freshly allocated blocks where contents will be
+		 * visible after transaction commit must be on transaction's
+		 * ordered data list.
+		 */
+		if (map->m_flags & EXT4_MAP_NEW &&
+		    !(map->m_flags & EXT4_MAP_UNWRITTEN) &&
+		    !(flags & EXT4_GET_BLOCKS_ZERO) &&
+		    ext4_should_order_data(inode)) {
+			ret = ext4_jbd2_file_inode(handle, inode);
+			if (ret)
+				return ret;
+		}
 	}
 	return retval;
 }
@@ -1291,15 +1305,6 @@ static int ext4_write_end(struct file *file,
 	int i_size_changed = 0;
 
 	trace_ext4_write_end(inode, pos, len, copied);
-	if (ext4_test_inode_state(inode, EXT4_STATE_ORDERED_MODE)) {
-		ret = ext4_jbd2_file_inode(handle, inode);
-		if (ret) {
-			unlock_page(page);
-			page_cache_release(page);
-			goto errout;
-		}
-	}
-
 	if (ext4_has_inline_data(inode)) {
 		ret = ext4_write_inline_data_end(inode, pos, len,
 						 copied, page);
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/4] ext4: Fix data exposure after a crash
  2016-03-30 16:19 ` [PATCH 1/4] ext4: Fix data exposure after a crash Jan Kara
@ 2016-04-24  3:48   ` Theodore Ts'o
  2016-04-24  4:55     ` Theodore Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Ts'o @ 2016-04-24  3:48 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Weller.Huang, stable

On Wed, Mar 30, 2016 at 06:19:24PM +0200, Jan Kara wrote:
> Huang has reported that in his powerfail testing he is seeing stale
> block contents in some of recently allocated blocks although he mounts
> ext4 in data=ordered mode. After some investigation I have found out
> that indeed when delayed allocation is used, we don't add inode to
> transaction's list of inodes needing flushing before commit. Originally
> we were doing that but commit f3b59291a69d removed the logic with a
> flawed argument that it is not needed.
> 
> The problem is that although for delayed allocated blocks we write their
> contents immediately after allocating them, there is no guarantee that
> the IO scheduler or device doesn't reorder things and thus transaction
> allocating blocks and attaching them to inode can reach stable storage
> before actual block contents. Actually whenever we attach freshly
> allocated blocks to inode using a written extent, we should add inode to
> transaction's ordered inode list to make sure we properly wait for block
> contents to be written before committing the transaction. So that is
> what we do in this patch. This also handles other cases where stale data
> exposure was possible - like filling hole via mmap in
> data=ordered,nodelalloc mode.
> 
> The only exception to the above rule are extending direct IO writes where
> blkdev_direct_IO() waits for IO to complete before increasing i_size and
> thus stale data exposure is not possible. For now we don't complicate
> the code with optimizing this special case since the overhead is pretty
> low. In case this is observed to be a performance problem we can always
> handle it using a special flag to ext4_map_blocks().
> 
> CC: stable@vger.kernel.org
> Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d
> Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
> Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

Applied, thanks.

						- Ted

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/4] ext4: Fix data exposure after a crash
  2016-04-24  3:48   ` Theodore Ts'o
@ 2016-04-24  4:55     ` Theodore Ts'o
  2016-04-25 10:24       ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Ts'o @ 2016-04-24  4:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Weller.Huang, stable

I had to add a !IS_NOQUOTA check:

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 576f64a..250c2df 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -693,6 +693,7 @@ out_sem:
 		if (map->m_flags & EXT4_MAP_NEW &&
 		    !(map->m_flags & EXT4_MAP_UNWRITTEN) &&
 		    !(flags & EXT4_GET_BLOCKS_ZERO) &&
+		    !IS_NOQUOTA(inode) &&
 		    ext4_should_order_data(inode)) {
 			ret = ext4_jbd2_file_inode(handle, inode);
 			if (ret)


In order to prevent crashes when writing to the quota file (see
below).

Cheers,

				- Ted

generic/244		[00:42:13]run fstests generic/244 at 2016-04-24 00:42:13
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8128a126>] jbd2_journal_file_inode+0x35/0xde
PGD 0 
Oops: 0000 [#1] SMP 
CPU: 0 PID: 3316 Comm: setquota Not tainted 4.6.0-rc4-ext4-00002-gc66f90b #246
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff88007aa8f440 ti: ffff88007a970000 task.ti: ffff88007a970000
RIP: 0010:[<ffffffff8128a126>]  [<ffffffff8128a126>] jbd2_journal_file_inode+0x35/0xde
RSP: 0018:ffff88007a973980  EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88007a973a18 RCX: 00000000a8bac810
RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff88007bbbc440
RBP: ffff88007b9b8e00 R08: 0000000000000002 R09: ffff88007a973950
R10: 0000000000000003 R11: 47ffffffffffffff R12: 0000000000000000
R13: ffff88007a838000 R14: ffff88007bbbc440 R15: 0000000000000001
FS:  00007f8bf247c700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007ab85000 CR4: 00000000000006f0
Stack:
 ffff88007a973a18 ffff880079dbd5f0 ffff880079dbd670 0000000000000000
 ffff88007bbbc440 ffffffff8123f810 ffff88007b8b3a88 ffff88007b8b3a88
 ffff88007b8b3a88 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8123f810>] ? ext4_map_blocks+0x45e/0x47d
 [<ffffffff8123fe0d>] ? ext4_getblk+0x3c/0x140
 [<ffffffff8123ff1d>] ? ext4_bread+0xc/0x60
 [<ffffffff81259598>] ? ext4_quota_write+0xe1/0x19d
 [<ffffffff81206b8f>] ? write_blk+0x2f/0x66
 [<ffffffff81206c8d>] ? get_free_dqblk+0x64/0x8f
 [<ffffffff81207545>] ? do_insert_tree+0x4c/0x384
 [<ffffffff812077fc>] ? do_insert_tree+0x303/0x384
 [<ffffffff812077fc>] ? do_insert_tree+0x303/0x384
 [<ffffffff812077fc>] ? do_insert_tree+0x303/0x384
 [<ffffffff811ad0c1>] ? __kmalloc+0x134/0x1a7
 [<ffffffff81207906>] ? qtree_write_dquot+0x89/0x15e
 [<ffffffff81207bd9>] ? qtree_read_dquot+0x1d6/0x1e3
 [<ffffffff81202b91>] ? dquot_acquire+0x8f/0xf9
 [<ffffffff812577e3>] ? ext4_acquire_dquot+0x76/0x94
 [<ffffffff81204198>] ? dqget+0x273/0x430
 [<ffffffff81204972>] ? dquot_get_dqblk+0xf/0x32
 [<ffffffff81208861>] ? quota_getquota+0x5f/0x111
 [<ffffffff811abba1>] ? cache_alloc_debugcheck_after.isra.17+0x4d/0x15a


   	    


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/4] ext4: Fix data exposure after a crash
  2016-04-24  4:55     ` Theodore Ts'o
@ 2016-04-25 10:24       ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2016-04-25 10:24 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4, Weller.Huang, stable

On Sun 24-04-16 00:55:51, Ted Tso wrote:
> I had to add a !IS_NOQUOTA check:
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 576f64a..250c2df 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -693,6 +693,7 @@ out_sem:
>  		if (map->m_flags & EXT4_MAP_NEW &&
>  		    !(map->m_flags & EXT4_MAP_UNWRITTEN) &&
>  		    !(flags & EXT4_GET_BLOCKS_ZERO) &&
> +		    !IS_NOQUOTA(inode) &&
>  		    ext4_should_order_data(inode)) {
>  			ret = ext4_jbd2_file_inode(handle, inode);
>  			if (ret)
> 
> 
> In order to prevent crashes when writing to the quota file (see
> below).

Good catch. The fix looks correct to me, thanks for fixing this up. Since
data for quota files is actually journalled, it may be actually a cleaner
fix to reflect this in ext4_inode_journal_mode() and thus
ext4_should_order_data() would catch this case. But this has more potential
for breakage elsewhere so I will do that as a separate cleanup patch.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-04-25 10:25 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1459354767-8693-1-git-send-email-jack@suse.cz>
2016-03-30 16:19 ` [PATCH 1/4] ext4: Fix data exposure after a crash Jan Kara
2016-04-24  3:48   ` Theodore Ts'o
2016-04-24  4:55     ` Theodore Ts'o
2016-04-25 10:24       ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).