Re: [PATCH] ext4: Fix entry corruption when disk online and offline frequently

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Cc: adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: Fix entry corruption when disk online and offline frequently
Date: Thu, 22 Aug 2019 23:02:07 -0400	[thread overview]
Message-ID: <20190823030207.GC8130@mit.edu> (raw)
In-Reply-To: <20190517225940.GC21961@mit.edu>

I've applied this patch with the modified patch summary:

ext4: treat buffers with write errors as containing valid data

Cheers,

					- Ted

On Fri, May 17, 2019 at 06:59:40PM -0400, Theodore Ts'o wrote:
> On Tue, May 14, 2019 at 12:23:37PM +0800, ZhangXiaoxu wrote:
> > I got some errors when I repair an ext4 volume which stacked by an
> > iscsi target:
> >     Entry 'test60' in / (2) has deleted/unused inode 73750.  Clear?
> > It can be reproduced when the network not good enough.
> > 
> > When I debug this I found ext4 will read entry buffer from disk and
> > the buffer is marked with write_io_error.
> > 
> > If the buffer is marked with write_io_error, it means it already
> > wroten to journal, and not checked out to disk. IOW, the journal
> > is newer than the data in disk.
> > If this journal record 'delete test60', it means the 'test60' still
> > on the disk metadata.
> > 
> > In this case, if we read the buffer from disk successfully and create
> > file continue, the new journal record will overwrite the journal
> > which record 'delete test60', then the entry corruptioned.
> > 
> > So, use the buffer rather than read from disk if the buffer marked
> > with write_io_error
> 
> You've raised a number of issues about how we handle write errors,
> especially when they occur due to a flaky transport --- in your case,
> due to iSCSI.  As such, your patch isn't wrong, so much as it is
> incomplete.
> 
> For example, your assumption that if the buffer is marked
> write_io_error, it's safe to clear write_io_error and reset
> buffer_uptodate assumes that journalling is enabled.  If the file
> system does not have the journal, there is no journal to fall back
> upon.  For file systems which do have a journal, if you are using a
> flaky iSCSI transport, there is no protection from write errors which
> occur when the journal is replayed.  (fs/jbd2/recovery.c simply marks
> the buffer dirty and allows the writeback code take care of writing
> the buffer.)  This means that the buffer could have write_io_error set
> due to a failure to write the buffer during recovery, in which case
> relying on the journal having a uptodate copy block is invalid.
> 
> Also, this patch only patches the ex4_bread() path, which is only used
> by directories.  It doesn't deal with metadata reads for allocation
> bitmaps or extent tree blocks.  We are doing this hack for inode table
> blocks, already; perhaps you got the idea to do this for ext4_bread()
> from __ext4_get_inode_loc()?
> 
> We could add some kind of callback from the buffer cache layer when an
> aysnchronous writeback fails --- or we could use a synchronous write
> in the journal recovery code (which would be bad from a performance
> perspective, but ignore that for the moment) --- however, what do we
> do when we discover that there is an error?  Right now, we do nothing
> until we try to read the inode table block (and after your patch,
> reading a directory block).  Under memory pressure, though, the data
> will get lost and we don't even mark the file system as needing to be
> checked.  We could retry the write, but if it's due to a flaky iSCSI
> or FC transport, this write could fail yet again --- and then what?
> 
> So while I could apply this patch, since it doesn't make things worse,
> I want to make sure you are aware that if you have problems with your
> iSCSI device, this patch is far from a complete solution.  At the very
> least, we should handle reads for other metadata block.  
> 
>       	      	   	    	       		  - Ted

     prev parent reply	other threads:[~2019-08-23  3:02 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-14  4:23 [PATCH] ext4: Fix entry corruption when disk online and offline frequently ZhangXiaoxu
2019-05-17 22:59 ` Theodore Ts'o
2019-08-23  3:02   ` Theodore Y. Ts'o [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190823030207.GC8130@mit.edu \
    --to=tytso@mit.edu \
    --cc=adilger.kernel@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=zhangxiaoxu5@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.