* [PATCH 0/4] Fix bugs and possible data corruption if blocksize < pagesize @ 2009-03-17 17:33 Jan Kara 2009-03-17 17:33 ` [PATCH 1/4] ext3: Fix false EIO errors Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2009-03-17 17:33 UTC (permalink / raw) To: LKML; +Cc: linux-ext4 Hi, lately, I've been tracking some problems with ext3 reported by HP on ia64 and this is what I came with. The first three patches fix real bugs - the first two fix bugs in ext3 leading to false reports of EIO errors from JBD and aborted journal, the third patch fixes a bug in block_write_full_page() which can possibly lead to data corruption. With these patches, don't see the data corruption I was able to reproduce with fsx-linux under UML. I'm not yet sure whether all the problems HP reported are gone (they're testing now) but since this seems to be a nasty problem I'm posting earlier rather than later. Please someone have a look whether my fix and analysis looks sane. Thanks. Honza ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/4] ext3: Fix false EIO errors 2009-03-17 17:33 [PATCH 0/4] Fix bugs and possible data corruption if blocksize < pagesize Jan Kara @ 2009-03-17 17:33 ` Jan Kara 2009-03-17 17:33 ` [PATCH 2/4] ext3: Avoid " Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2009-03-17 17:33 UTC (permalink / raw) To: LKML; +Cc: linux-ext4, Jan Kara When machine is under heavy memory pressure, it can happen that the page we want to copy data from is paged out from memory before we copy data from it and thus we are called with copied == 0. Generally, we should not file buffers into journal lists if the don't really contain uptodate data. So fix the range of buffers we file to journal list to contain only buffers with copied data. Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext3/inode.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 5fa453b..62005c0 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -1261,7 +1261,7 @@ static int ext3_ordered_write_end(struct file *file, int ret = 0, ret2; from = pos & (PAGE_CACHE_SIZE - 1); - to = from + len; + to = from + copied; ret = walk_page_buffers(handle, page_buffers(page), from, to, NULL, ext3_journal_dirty_data); -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/4] ext3: Avoid false EIO errors 2009-03-17 17:33 ` [PATCH 1/4] ext3: Fix false EIO errors Jan Kara @ 2009-03-17 17:33 ` Jan Kara 2009-03-17 17:33 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Jan Kara 0 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2009-03-17 17:33 UTC (permalink / raw) To: LKML; +Cc: linux-ext4, Jan Kara Sometimes block_write_begin() can map buffers in a page but later we fail to copy data into those buffers (because the source page has been paged out in the mean time). We then end up with !uptodate mapped buffers. We must not file such buffers into a transaction - firstly they contain garbage and secondly it confuses the journaling code (it thinks write has failed and complains about IO errors). Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext3/inode.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 62005c0..d351eab 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -1430,7 +1430,11 @@ static int bput_one(handle_t *handle, struct buffer_head *bh) static int journal_dirty_data_fn(handle_t *handle, struct buffer_head *bh) { - if (buffer_mapped(bh)) + /* + * Parallel write could have mapped the buffer but it didn't copy + * the data in yet. So avoid filing such buffer into a transaction. + */ + if (buffer_mapped(bh) && buffer_uptodate(bh)) return ext3_journal_dirty_data(handle, bh); return 0; } -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize 2009-03-17 17:33 ` [PATCH 2/4] ext3: Avoid " Jan Kara @ 2009-03-17 17:33 ` Jan Kara 2009-03-17 17:33 ` [PATCH 4/4] fs: Warn about writing !uptodate buffers Jan Kara ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Jan Kara @ 2009-03-17 17:33 UTC (permalink / raw) To: LKML; +Cc: linux-ext4, Jan Kara, Nick Piggin Assume the following situation: Filesystem with blocksize < pagesize - suppose blocksize = 1024, pagesize = 4096. File 'f' has first four blocks already allocated. (line with "state:" contains the state of buffers in the page - m = mapped, u = uptodate, d = dirty) process 1: process 2: write to 'f' bytes 0 - 1024 state: |mud,-,-,-|, page dirty write to 'f' bytes 1024 - 4096: __block_prepare_write() maps blocks state: |mud,m,m,m|, page dirty we fail to copy data -> copied = 0 block_write_end() does nothing page gets unlocked writepage() is called on the page block_write_full_page() writes buffers with garbage This patch fixes the problem by skipping !uptodate buffers in block_write_full_page(). CC: Nick Piggin <npiggin@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/buffer.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 9f69741..22c0144 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1774,7 +1774,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page, } while (bh != head); do { - if (!buffer_mapped(bh)) + /* + * Parallel write could have already mapped the buffers but + * it then had to restart before copying in new data. We + * must avoid writing garbage so just skip the buffer. + */ + if (!buffer_mapped(bh) || !buffer_uptodate(bh)) continue; /* * If it's a fully non-blocking write attempt and we cannot -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/4] fs: Warn about writing !uptodate buffers 2009-03-17 17:33 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Jan Kara @ 2009-03-17 17:33 ` Jan Kara 2009-03-18 12:07 ` Nick Piggin 2009-03-18 12:00 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Nick Piggin 2009-03-18 18:42 ` Aneesh Kumar K.V 2 siblings, 1 reply; 11+ messages in thread From: Jan Kara @ 2009-03-17 17:33 UTC (permalink / raw) To: LKML; +Cc: linux-ext4, Jan Kara Make submit_bh() warn about writing !uptodate buffers. Hopefully this warns us about writing garbage (although bugs in write EIO handling are going to trigger this as well as they already trigger the warning in mark_buffer_dirty()). Signed-off-by: Jan Kara <jack@suse.cz> --- fs/buffer.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 22c0144..985f617 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2997,6 +2997,8 @@ int submit_bh(int rw, struct buffer_head * bh) BUG_ON(!buffer_locked(bh)); BUG_ON(!buffer_mapped(bh)); BUG_ON(!bh->b_end_io); + if (rw & WRITE) + WARN_ON_ONCE(!buffer_uptodate(bh)); /* * Mask in barrier bit for a write (could be either a WRITE or a -- 1.6.0.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] fs: Warn about writing !uptodate buffers 2009-03-17 17:33 ` [PATCH 4/4] fs: Warn about writing !uptodate buffers Jan Kara @ 2009-03-18 12:07 ` Nick Piggin 0 siblings, 0 replies; 11+ messages in thread From: Nick Piggin @ 2009-03-18 12:07 UTC (permalink / raw) To: Jan Kara; +Cc: LKML, linux-ext4 On Wednesday 18 March 2009 04:33:55 Jan Kara wrote: > Make submit_bh() warn about writing !uptodate buffers. Hopefully this > warns us about writing garbage (although bugs in write EIO handling > are going to trigger this as well as they already trigger the warning > in mark_buffer_dirty()). > > Signed-off-by: Jan Kara <jack@suse.cz> > --- > fs/buffer.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 22c0144..985f617 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -2997,6 +2997,8 @@ int submit_bh(int rw, struct buffer_head * bh) > BUG_ON(!buffer_locked(bh)); > BUG_ON(!buffer_mapped(bh)); > BUG_ON(!bh->b_end_io); > + if (rw & WRITE) > + WARN_ON_ONCE(!buffer_uptodate(bh)); Yes very nice assertion to have. Arguably I think it should be a BUG_ON because it is definitely some state corruption at least, and writing garbage data to disk at worst. But WARN_ON for now is probably best. I have some patches to fix up some problems with EIO handling in the VM (and I think solves the the buffer warning too). I have to get them out and try to get them merged again... at which point this should probably be turned into a bug. Acked-by: Nick Piggin <npiggin@suse.de> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize 2009-03-17 17:33 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Jan Kara 2009-03-17 17:33 ` [PATCH 4/4] fs: Warn about writing !uptodate buffers Jan Kara @ 2009-03-18 12:00 ` Nick Piggin 2009-03-18 14:13 ` Jan Kara 2009-03-18 18:57 ` Jan Kara 2009-03-18 18:42 ` Aneesh Kumar K.V 2 siblings, 2 replies; 11+ messages in thread From: Nick Piggin @ 2009-03-18 12:00 UTC (permalink / raw) To: Jan Kara; +Cc: LKML, linux-ext4 On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote: > Assume the following situation: > Filesystem with blocksize < pagesize - suppose blocksize = 1024, > pagesize = 4096. File 'f' has first four blocks already allocated. > (line with "state:" contains the state of buffers in the page - m = mapped, > u = uptodate, d = dirty) > > process 1: process 2: > > write to 'f' bytes 0 - 1024 > state: |mud,-,-,-|, page dirty > write to 'f' bytes 1024 - 4096: > __block_prepare_write() maps blocks > state: |mud,m,m,m|, page dirty > we fail to copy data -> copied = 0 > block_write_end() does nothing > page gets unlocked > writepage() is called on the page > block_write_full_page() writes buffers with garbage > > This patch fixes the problem by skipping !uptodate buffers in > block_write_full_page(). > > CC: Nick Piggin <npiggin@suse.de> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > fs/buffer.c | 7 ++++++- > 1 files changed, 6 insertions(+), 1 deletions(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 9f69741..22c0144 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -1774,7 +1774,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page, > } while (bh != head); > > do { > - if (!buffer_mapped(bh)) > + /* > + * Parallel write could have already mapped the buffers but > + * it then had to restart before copying in new data. We > + * must avoid writing garbage so just skip the buffer. > + */ > + if (!buffer_mapped(bh) || !buffer_uptodate(bh)) > continue; I don't quite see how this can happen. Further down in this loop, we do a test_clear_buffer_dirty(), which should exclude this I think? And marking the buffer dirty if it is not uptodate should be a bug. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize 2009-03-18 12:00 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Nick Piggin @ 2009-03-18 14:13 ` Jan Kara 2009-03-18 18:57 ` Jan Kara 1 sibling, 0 replies; 11+ messages in thread From: Jan Kara @ 2009-03-18 14:13 UTC (permalink / raw) To: Nick Piggin; +Cc: LKML, linux-ext4 On Wed 18-03-09 13:00:23, Nick Piggin wrote: > On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote: > > Assume the following situation: > > Filesystem with blocksize < pagesize - suppose blocksize = 1024, > > pagesize = 4096. File 'f' has first four blocks already allocated. > > (line with "state:" contains the state of buffers in the page - m = mapped, > > u = uptodate, d = dirty) > > > > process 1: process 2: > > > > write to 'f' bytes 0 - 1024 > > state: |mud,-,-,-|, page dirty > > write to 'f' bytes 1024 - 4096: > > __block_prepare_write() maps blocks > > state: |mud,m,m,m|, page dirty > > we fail to copy data -> copied = 0 > > block_write_end() does nothing > > page gets unlocked > > writepage() is called on the page > > block_write_full_page() writes buffers with garbage > > > > This patch fixes the problem by skipping !uptodate buffers in > > block_write_full_page(). > > > > CC: Nick Piggin <npiggin@suse.de> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > fs/buffer.c | 7 ++++++- > > 1 files changed, 6 insertions(+), 1 deletions(-) > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > index 9f69741..22c0144 100644 > > --- a/fs/buffer.c > > +++ b/fs/buffer.c > > @@ -1774,7 +1774,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page, > > } while (bh != head); > > > > do { > > - if (!buffer_mapped(bh)) > > + /* > > + * Parallel write could have already mapped the buffers but > > + * it then had to restart before copying in new data. We > > + * must avoid writing garbage so just skip the buffer. > > + */ > > + if (!buffer_mapped(bh) || !buffer_uptodate(bh)) > > continue; > > I don't quite see how this can happen. Further down in this loop, > we do a test_clear_buffer_dirty(), which should exclude this I > think? And marking the buffer dirty if it is not uptodate should > be a bug. Hmm, this patch definitely does something important because without it I hit corruption in UML in ~20 minutes and with it no corruption happens in ~3 hours. Maybe someone calls set_page_dirty() on the page and __set_page_dirty_buffers() unconditionally dirties all the buffers the page has? But I still don't see how the write could be lost which is what I observe in fsx-linux test. I'm doing some more tests to understand this better. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize 2009-03-18 12:00 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Nick Piggin 2009-03-18 14:13 ` Jan Kara @ 2009-03-18 18:57 ` Jan Kara 1 sibling, 0 replies; 11+ messages in thread From: Jan Kara @ 2009-03-18 18:57 UTC (permalink / raw) To: Nick Piggin; +Cc: LKML, linux-ext4 On Wed 18-03-09 13:00:23, Nick Piggin wrote: > On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote: > > Assume the following situation: > > Filesystem with blocksize < pagesize - suppose blocksize = 1024, > > pagesize = 4096. File 'f' has first four blocks already allocated. > > (line with "state:" contains the state of buffers in the page - m = mapped, > > u = uptodate, d = dirty) > > > > process 1: process 2: > > > > write to 'f' bytes 0 - 1024 > > state: |mud,-,-,-|, page dirty > > write to 'f' bytes 1024 - 4096: > > __block_prepare_write() maps blocks > > state: |mud,m,m,m|, page dirty > > we fail to copy data -> copied = 0 > > block_write_end() does nothing > > page gets unlocked > > writepage() is called on the page > > block_write_full_page() writes buffers with garbage > > > > This patch fixes the problem by skipping !uptodate buffers in > > block_write_full_page(). > > > > CC: Nick Piggin <npiggin@suse.de> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > fs/buffer.c | 7 ++++++- > > 1 files changed, 6 insertions(+), 1 deletions(-) > > > > diff --git a/fs/buffer.c b/fs/buffer.c > > index 9f69741..22c0144 100644 > > --- a/fs/buffer.c > > +++ b/fs/buffer.c > > @@ -1774,7 +1774,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page, > > } while (bh != head); > > > > do { > > - if (!buffer_mapped(bh)) > > + /* > > + * Parallel write could have already mapped the buffers but > > + * it then had to restart before copying in new data. We > > + * must avoid writing garbage so just skip the buffer. > > + */ > > + if (!buffer_mapped(bh) || !buffer_uptodate(bh)) > > continue; > > I don't quite see how this can happen. Further down in this loop, > we do a test_clear_buffer_dirty(), which should exclude this I > think? And marking the buffer dirty if it is not uptodate should > be a bug. OK, I spoke too soon. Now I reproduced the corruption under UML even with this patch. So it may be something different... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize 2009-03-17 17:33 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Jan Kara 2009-03-17 17:33 ` [PATCH 4/4] fs: Warn about writing !uptodate buffers Jan Kara 2009-03-18 12:00 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Nick Piggin @ 2009-03-18 18:42 ` Aneesh Kumar K.V 2009-03-18 18:50 ` Jan Kara 2 siblings, 1 reply; 11+ messages in thread From: Aneesh Kumar K.V @ 2009-03-18 18:42 UTC (permalink / raw) To: Jan Kara; +Cc: LKML, linux-ext4, Nick Piggin On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote: > Assume the following situation: > Filesystem with blocksize < pagesize - suppose blocksize = 1024, > pagesize = 4096. File 'f' has first four blocks already allocated. > (line with "state:" contains the state of buffers in the page - m = mapped, > u = uptodate, d = dirty) > > process 1: process 2: > > write to 'f' bytes 0 - 1024 > state: |mud,-,-,-|, page dirty > write to 'f' bytes 1024 - 4096: > __block_prepare_write() maps blocks > state: |mud,m,m,m|, page dirty > we fail to copy data -> copied = 0 > block_write_end() does nothing > page gets unlocked If copied = 0 then in block_write_end we do page_zero_new_buffers(page, start+copied, start+len which would mean we should not see garbage. > writepage() is called on the page > block_write_full_page() writes buffers with garbage > -aneesh ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize 2009-03-18 18:42 ` Aneesh Kumar K.V @ 2009-03-18 18:50 ` Jan Kara 0 siblings, 0 replies; 11+ messages in thread From: Jan Kara @ 2009-03-18 18:50 UTC (permalink / raw) To: Aneesh Kumar K.V; +Cc: LKML, linux-ext4, Nick Piggin On Thu 19-03-09 00:12:22, Aneesh Kumar K.V wrote: > On Tue, Mar 17, 2009 at 06:33:54PM +0100, Jan Kara wrote: > > Assume the following situation: > > Filesystem with blocksize < pagesize - suppose blocksize = 1024, > > pagesize = 4096. File 'f' has first four blocks already allocated. > > (line with "state:" contains the state of buffers in the page - m = mapped, > > u = uptodate, d = dirty) > > > > process 1: process 2: > > > > write to 'f' bytes 0 - 1024 > > state: |mud,-,-,-|, page dirty > > write to 'f' bytes 1024 - 4096: > > __block_prepare_write() maps blocks > > state: |mud,m,m,m|, page dirty > > we fail to copy data -> copied = 0 > > block_write_end() does nothing > > page gets unlocked > > > If copied = 0 then in block_write_end we do > > page_zero_new_buffers(page, start+copied, start+len > > which would mean we should not see garbage. But this will zero only *new* buffers - so if they are already allocated, get_block() won't set new flag and they won't be zeroed... But I'm not saying I understand why this seems to help against a corruption under UML because we don't seem to be writing !uptodate buffers there. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2009-03-18 18:57 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-03-17 17:33 [PATCH 0/4] Fix bugs and possible data corruption if blocksize < pagesize Jan Kara 2009-03-17 17:33 ` [PATCH 1/4] ext3: Fix false EIO errors Jan Kara 2009-03-17 17:33 ` [PATCH 2/4] ext3: Avoid " Jan Kara 2009-03-17 17:33 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Jan Kara 2009-03-17 17:33 ` [PATCH 4/4] fs: Warn about writing !uptodate buffers Jan Kara 2009-03-18 12:07 ` Nick Piggin 2009-03-18 12:00 ` [PATCH 3/4] fs: Avoid data corruption with blocksize < pagesize Nick Piggin 2009-03-18 14:13 ` Jan Kara 2009-03-18 18:57 ` Jan Kara 2009-03-18 18:42 ` Aneesh Kumar K.V 2009-03-18 18:50 ` Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).