* [PATCH RFC] xfs: handle torn writes during log head/tail discovery @ 2015-07-06 18:26 Brian Foster 2015-07-06 19:02 ` Brian Foster 2015-07-07 0:53 ` Dave Chinner 0 siblings, 2 replies; 6+ messages in thread From: Brian Foster @ 2015-07-06 18:26 UTC (permalink / raw) To: xfs Persistent memory without block translation table (btt) support provides a block device suitable for filesystems, but does not provide the sector write atomicity guarantees typical of block storage. This is a problem for log recovery on XFS. The on-disk log record structure already includes a CRC and thus can detect torn writes. The problem is that such a failure isn't detected until log recovery is already in progress and therefore results in a hard error and mount failure. Update the log head/tail discovery algorithm to detect and trim off a torn log record from the end (from a recovery perspective) of the log. Once the head is determined from the log cycle information and we have a pointer to the last record to be recovered, read and verify the CRC of said record before we initiate actual recovery. If CRC verification fails, assume the record is torn and reset the head to the start of the torn record. We reverse seek for the previous log record header from that point and attempt recovery up through that record. Signed-off-by: Brian Foster <bfoster@redhat.com> --- Hi all, As far as I'm aware, the torn log write problem is the only major gap we have to safely run XFS w/ crc=1 on non-btt pmem devices. This is an RFC to attempt to address that problem. I used a custom hack to actually reproduce such torn writes on a ramdisk to primarily help me understand the problem, but I was able to use the same hack to sanity test this approach under the basic corruption case (this is generally untested, otherwise). This could obviously use some cleanup, but is the approach sane? 
I'm curious if this should also be tied to DAX enablement so as to not interfere with unrelated corruption handling (IOW, with what logic is it reasonable enough to assume a crc failure of the final record == torn write?). I'm also wondering if this should be tied to a new mount option rather than make this behavior implicit. Any other thoughts? Brian fs/xfs/xfs_log_recover.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 01dd228..6015d02 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -62,6 +62,9 @@ xlog_recover_check_summary( #define xlog_recover_check_summary(log) #endif +STATIC int +xlog_validate_logrec_crc(struct xlog *, xfs_daddr_t); + /* * This structure is used during recovery to record the buf log items which * have been canceled and should not be replayed. @@ -898,6 +901,7 @@ xlog_find_tail( xfs_daddr_t after_umount_blk; xfs_lsn_t tail_lsn; int hblks; + bool skipped_last = false; found = 0; @@ -925,6 +929,7 @@ xlog_find_tail( /* * Search backwards looking for log record header block */ +retry: ASSERT(*head_blk < INT_MAX); for (i = (int)(*head_blk) - 1; i >= 0; i--) { error = xlog_bread(log, i, 1, bp, &offset); @@ -962,6 +967,31 @@ xlog_find_tail( return -EIO; } + /* + * Now that we think we've found the log head, we have to check that the + * last log record wasn't torn when written out to the log. This is + * possible on devices without sector atomicity guarantees (e.g., pmem). + * + * Verify the CRC of the last log record that was written. If the CRC is + * invalid, point the head at the start of this record and retry the + * above backwards log record header search. We'll try the recovery up + * through this record. Note that we only walk backwards once since this + * is only intended to handle the torn write on power loss case. + * + * TODO: mount option? tied to DAX? 
+ */ + if (xfs_sb_version_hascrc(&log->l_mp->m_sb) && !skipped_last) { + error = xlog_validate_logrec_crc(log, i); + if (error == -EFSBADCRC) { + skipped_last = true; + *head_blk = i; + xfs_warn(log->l_mp, + "WARNING: Torn write? Attempting recovery up to previous record."); + goto retry; + } else if (error) + goto done; + } + /* find blk_no of tail of log */ rhead = (xlog_rec_header_t *)offset; *tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn)); @@ -4607,6 +4637,63 @@ xlog_recover_finish( return 0; } +/* + * Read and CRC validate a full log record. This is used to detect torn log + * writes during head/tail discovery. + */ +STATIC int +xlog_validate_logrec_crc( + struct xlog *log, + xfs_daddr_t rec_blk) +{ + int hblks, bblks; + struct xlog_rec_header *rhead; + struct xfs_buf *hbp = NULL; + struct xfs_buf *dbp = NULL; + int error; + char *offset; + + hblks = 1; /* XXX */ + + /* read and validate the log record header */ + hbp = xlog_get_bp(log, 1); + if (!hbp) + return -ENOMEM; + error = xlog_bread(log, rec_blk, hblks, hbp, &offset); + if (error) + goto out; + + rhead = (struct xlog_rec_header *) offset; + + error = xlog_valid_rec_header(log, rhead, rec_blk); + if (error) + goto out; + + /* read the full record and verify the CRC */ + /* XXX: factor out from do_recovery_pass() ? */ + dbp = xlog_get_bp(log, BTOBB(be32_to_cpu(rhead->h_size))); + if (!dbp) + goto out; + + bblks = (int)BTOBB(be32_to_cpu(rhead->h_len)); + error = xlog_bread(log, rec_blk+hblks, bblks, dbp, &offset); + if (error) + goto out; + + /* + * If the CRC validation fails, convert the return code so the caller + * can distinguish from unrelated errors. 
+ */ + error = xlog_unpack_data_crc(rhead, offset, log); + if (error) + error = -EFSBADCRC; +out: + if (dbp) + xlog_put_bp(dbp); + if (hbp) + xlog_put_bp(hbp); + return error; +} #if defined(DEBUG) /* -- 2.1.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 6+ messages in thread
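The trim-one-torn-record idea in the patch above can be modeled in a few lines outside the kernel. The following Python sketch is a toy model (the record layout, the names, and the use of zlib's CRC-32 are all invented for illustration; this is not the on-disk XFS log format, which uses CRC32c over a more complex record structure): each record carries a CRC of its payload in a header sector, the last record is torn mid-record, and head discovery backs up exactly one record, as the RFC does.

```python
import zlib

SECTOR = 512

def make_record(payload: bytes) -> bytes:
    """Build a toy log record: one CRC 'header' sector + sector-padded payload."""
    pad = (-len(payload)) % SECTOR
    data = payload + b'\0' * pad
    header = zlib.crc32(data).to_bytes(4, 'big').ljust(SECTOR, b'\0')
    return header + data

def record_ok(rec: bytes) -> bool:
    """Recompute the payload CRC and compare against the stored one."""
    stored = int.from_bytes(rec[:4], 'big')
    return stored == zlib.crc32(rec[SECTOR:])

def find_head(records):
    """Walk back from the last record; trim at most one torn record,
    mirroring the single retry (skipped_last) in the RFC patch."""
    head = len(records)
    if head and not record_ok(records[head - 1]):
        head -= 1
    return head

log = [make_record(b'record %d ' % i * 50) for i in range(4)]
# simulate a torn write: a data sector of the last record never hit "disk"
torn = bytearray(log[-1])
torn[SECTOR] ^= 0xff
log[-1] = bytes(torn)
print(find_head(log))  # -> 3: recovery stops before the torn record
```

In the toy model, as in the patch, a CRC mismatch on the final record simply moves the recovery head back one record rather than failing the mount.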
* Re: [PATCH RFC] xfs: handle torn writes during log head/tail discovery 2015-07-06 18:26 [PATCH RFC] xfs: handle torn writes during log head/tail discovery Brian Foster @ 2015-07-06 19:02 ` Brian Foster 2015-07-07 0:53 ` Dave Chinner 1 sibling, 0 replies; 6+ messages in thread From: Brian Foster @ 2015-07-06 19:02 UTC (permalink / raw) To: xfs On Mon, Jul 06, 2015 at 02:26:34PM -0400, Brian Foster wrote: > Persistent memory without block translation table (btt) support provides > a block device suitable for filesystems, but does not provide the sector > write atomicity guarantees typical of block storage. This is a problem > for log recovery on XFS. The on-disk log record structure already > includes a CRC and thus can detect torn writes. The problem is that such > a failure isn't detected until log recovery is already in progress and > therefore results in a hard error and mount failure. > > Update the log head/tail discovery algorithm to detect and trim off a > torn log record from the end (from a recovery perspective) of the log. > Once the head is determined from the log cycle information and we have a > pointer to the last record to be recovered, read and verify the CRC of > said record before we initiate actual recovery. If CRC verification > fails, assume the record is torn and reset the head to the start of the > torn record. We reverse seek for the previous log record header from > that point and attempt recovery up through that record. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > > Hi all, > > As far as I'm aware, the torn log write problem is the only major gap we > have to safely run XFS w/ crc=1 on non-btt pmem devices. This is an RFC > to attempt to address that problem. I used a custom hack to actually > reproduce such torn writes on a ramdisk to primarily help me understand > the problem, but I was able to use the same hack to sanity test this > approach under the basic corruption case (this is generally untested, > otherwise). 
> > This could obviously use some cleanup, but is the approach sane? I'm > curious if this should also be tied to DAX enablement so as to not > interfere with unrelated corruption handling (IOW, with what logic is it > reasonable enough to assume a crc failure of the final record == torn > write?). I'm also wondering if this should be tied to a new mount option > rather than make this behavior implicit. Any other thoughts? > It occurs to me that this may also need to grow some logic for all the log wrapping and whatnot that is handled in xlog_do_recovery_pass(). I'll need to stare at that some more, but FWIW, I did have another hack that added a "dummy" recovery pass that only verified CRCs and walked backwards until verification succeeded. It was ugly as is and quite honestly just another hack to help me try and understand what was going on. That said, perhaps cleaning it up and doing something like a dummy crc verify pass from the log head/tail discovery code and _only_ over the range of the final record might be a reasonable alternative. That would allow reuse of the existing code for any of those wrap around cases that might need to be handled. Anyways, thoughts or other ideas appreciated. Brian > Brian > > fs/xfs/xfs_log_recover.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 87 insertions(+) > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > index 01dd228..6015d02 100644 > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -62,6 +62,9 @@ xlog_recover_check_summary( > #define xlog_recover_check_summary(log) > #endif > > +STATIC int > +xlog_validate_logrec_crc(struct xlog *, xfs_daddr_t); > + > /* > * This structure is used during recovery to record the buf log items which > * have been canceled and should not be replayed. 
> @@ -898,6 +901,7 @@ xlog_find_tail( > xfs_daddr_t after_umount_blk; > xfs_lsn_t tail_lsn; > int hblks; > + bool skipped_last = false; > > found = 0; > > @@ -925,6 +929,7 @@ xlog_find_tail( > /* > * Search backwards looking for log record header block > */ > +retry: > ASSERT(*head_blk < INT_MAX); > for (i = (int)(*head_blk) - 1; i >= 0; i--) { > error = xlog_bread(log, i, 1, bp, &offset); > @@ -962,6 +967,31 @@ xlog_find_tail( > return -EIO; > } > > + /* > + * Now that we think we've found the log head, we have to check that the > + * last log record wasn't torn when written out to the log. This is > + * possible on devices without sector atomicity guarantees (e.g., pmem). > + * > + * Verify the CRC of the last log record that was written. If the CRC is > + * invalid, point the head at the start of this record and retry the > + * above backwards log record header search. We'll try the recovery up > + * through this record. Note that we only walk backwards once since this > + * is only intended to handle the torn write on power loss case. > + * > + * TODO: mount option? tied to DAX? > + */ > + if (xfs_sb_version_hascrc(&log->l_mp->m_sb) && !skipped_last) { > + error = xlog_validate_logrec_crc(log, i); > + if (error == -EFSBADCRC) { > + skipped_last = true; > + *head_blk = i; > + xfs_warn(log->l_mp, > + "WARNING: Torn write? Attempting recovery up to previous record."); > + goto retry; > + } else if (error) > + goto done; > + } > + > /* find blk_no of tail of log */ > rhead = (xlog_rec_header_t *)offset; > *tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn)); > @@ -4607,6 +4637,63 @@ xlog_recover_finish( > return 0; > } > > +/* > + * Read and CRC validate a full log record. This is used to detect torn log > + * writes during head/tail discovery. 
> + */ > +STATIC int > +xlog_validate_logrec_crc( > + struct xlog *log, > + xfs_daddr_t rec_blk) > +{ > + int hblks, bblks; > + struct xlog_rec_header *rhead; > + struct xfs_buf *hbp = NULL; > + struct xfs_buf *dbp = NULL; > + int error; > + char *offset; > + > + hblks = 1; /* XXX */ > + > + /* read and validate the log record header */ > + hbp = xlog_get_bp(log, 1); > + if (!hbp) > + return -ENOMEM; > + error = xlog_bread(log, rec_blk, hblks, hbp, &offset); > + if (error) > + goto out; > + > + rhead = (struct xlog_rec_header *) offset; > + > + error = xlog_valid_rec_header(log, rhead, rec_blk); > + if (error) > + goto out; > + > + /* read the full record and verify the CRC */ > + /* XXX: factor out from do_recovery_pass() ? */ > + dbp = xlog_get_bp(log, BTOBB(be32_to_cpu(rhead->h_size))); > + if (!dbp) > + goto out; > + > + bblks = (int)BTOBB(be32_to_cpu(rhead->h_len)); > + error = xlog_bread(log, rec_blk+hblks, bblks, dbp, &offset); > + if (error) > + goto out; > + > + /* > + * If the CRC validation fails, convert the return code so the caller > + * can distinguish from unrelated errors. > + */ > + error = xlog_unpack_data_crc(rhead, offset, log); > + if (error) > + error = -EFSBADCRC; > +out: > + if (dbp) > + xlog_put_bp(dbp); > + if (hbp) > + xlog_put_bp(hbp); > + return error; > +} > > #if defined(DEBUG) > /* > -- > 2.1.0 > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 6+ messages in thread
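The "dummy CRC-only pass" floated above reduces to a verify-only walk over the records that would be recovered, reporting where verification first fails. A hypothetical Python sketch (invented names; in the kernel this would presumably reuse xlog_do_recovery_pass() with a new verify-only pass flag, and would have to handle log wrap):

```python
def crc_verify_pass(records, verify):
    """Verify-only pass: no replay, just CRC checks.

    Returns the index of the first record that fails verification,
    or len(records) if every record is intact. Recovery would then
    replay only records before that index.
    """
    for i, rec in enumerate(records):
        if not verify(rec):
            return i
    return len(records)

# toy log of 8 records; record 5 was torn, 6 and 7 were written after it
good = {0, 1, 2, 3, 4, 6, 7}
print(crc_verify_pass(range(8), lambda r: r in good))  # -> 5
```

Note that such a pass truncates at the *first* failure, so records written after a torn one are discarded too, which matches the "walk backwards until verification succeeds" behavior described above.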
* Re: [PATCH RFC] xfs: handle torn writes during log head/tail discovery 2015-07-06 18:26 [PATCH RFC] xfs: handle torn writes during log head/tail discovery Brian Foster 2015-07-06 19:02 ` Brian Foster @ 2015-07-07 0:53 ` Dave Chinner 2015-07-07 13:10 ` Brian Foster 1 sibling, 1 reply; 6+ messages in thread From: Dave Chinner @ 2015-07-07 0:53 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Mon, Jul 06, 2015 at 02:26:34PM -0400, Brian Foster wrote: > Persistent memory without block translation table (btt) support provides > a block device suitable for filesystems, but does not provide the sector > write atomicity guarantees typical of block storage. This is a problem > for log recovery on XFS. The on-disk log record structure already > includes a CRC and thus can detect torn writes. The problem is that such > a failure isn't detected until log recovery is already in progress and > therefore results in a hard error and mount failure. > > Update the log head/tail discovery algorithm to detect and trim off a > torn log record from the end (from a recovery perspective) of the log. > Once the head is determined from the log cycle information and we have a > pointer to the last record to be recovered, read and verify the CRC of > said record before we initiate actual recovery. If CRC verification > fails, assume the record is torn and reset the head to the start of the > torn record. We reverse seek for the previous log record header from > that point and attempt recovery up through that record. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > > Hi all, > > As far as I'm aware, the torn log write problem is the only major gap we > have to safely run XFS w/ crc=1 on non-btt pmem devices. This is an RFC > to attempt to address that problem. 
I used a custom hack to actually > reproduce such torn writes on a ramdisk to primarily help me understand > the problem, but I was able to use the same hack to sanity test this > approach under the basic corruption case (this is generally untested, > otherwise). > > This could obviously use some cleanup, but is the approach sane? I'm > curious if this should also be tied to DAX enablement so as to not > interfere with unrelated corruption handling (IOW, with what logic is it > reasonable enough to assume a crc failure of the final record == torn > write?). I'm also wondering if this should be tied to a new mount option > rather than make this behavior implicit. Any other thoughts? Seems good from a conceptual point of view - the fact that we CRC all log writes now means this will work on both v4 and v5 filesystems. The CRC validation needs to be done on more than just the head record. We can have up to 8 IOs in flight at the time power goes down, so we need to validate the CRCs for at least the previous 8 log writes... We also need to validate the tail record, as we could be in a situation where the head is overwriting the tail and it tears and so by discarding the head record we discard the tail update and so now the tail record is also corrupt. In that case, we need to abort recovery because the log is unrecoverable. Also: should we zero out the torn sections before starting recovery, like we do with the call to xlog_clear_stale_blocks() later in xlog_find_tail()? > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > index 01dd228..6015d02 100644 > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -62,6 +62,9 @@ xlog_recover_check_summary( > #define xlog_recover_check_summary(log) > #endif > > +STATIC int > +xlog_validate_logrec_crc(struct xlog *, xfs_daddr_t); > + Put the function here, don't use forward declarations.... 
> /* > * This structure is used during recovery to record the buf log items which > * have been canceled and should not be replayed. > @@ -898,6 +901,7 @@ xlog_find_tail( > xfs_daddr_t after_umount_blk; > xfs_lsn_t tail_lsn; > int hblks; > + bool skipped_last = false; > > found = 0; > > @@ -925,6 +929,7 @@ xlog_find_tail( > /* > * Search backwards looking for log record header block > */ > +retry: > ASSERT(*head_blk < INT_MAX); > for (i = (int)(*head_blk) - 1; i >= 0; i--) { > error = xlog_bread(log, i, 1, bp, &offset); > @@ -962,6 +967,31 @@ xlog_find_tail( > return -EIO; > } > > + /* > + * Now that we think we've found the log head, we have to check that the > + * last log record wasn't torn when written out to the log. This is > + * possible on devices without sector atomicity guarantees (e.g., pmem). > + * > + * Verify the CRC of the last log record that was written. If the CRC is > + * invalid, point the head at the start of this record and retry the > + * above backwards log record header search. We'll try the recovery up > + * through this record. Note that we only walk backwards once since this > + * is only intended to handle the torn write on power loss case. > + * > + * TODO: mount option? tied to DAX? > + */ Always do it - if any hardware is tearing writes, then we should handle it sanely. > + if (xfs_sb_version_hascrc(&log->l_mp->m_sb) && !skipped_last) { No need to be tied to xfs_sb_version_hascrc(), even v4 filesystems now have a CRC calculated on the log. Only skip it on v4 filesystems when the h_crc field is zero (indicating the log came from an old kernel that didn't calculate CRCs). > + error = xlog_validate_logrec_crc(log, i); > + if (error == -EFSBADCRC) { > + skipped_last = true; > + *head_blk = i; > + xfs_warn(log->l_mp, > + "WARNING: Torn write? Attempting recovery up to previous record."); > + goto retry; > + } else if (error) > + goto done; > + } Better to do a loop and factor the code a bit? 
> + > /* find blk_no of tail of log */ > rhead = (xlog_rec_header_t *)offset; > *tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn)); > @@ -4607,6 +4637,63 @@ xlog_recover_finish( > return 0; > } > > +/* > + * Read and CRC validate a full log record. This is used to detect torn log > + * writes during head/tail discovery. > + */ > +STATIC int > +xlog_validate_logrec_crc( > + struct xlog *log, > + xfs_daddr_t rec_blk) > +{ > + int hblks, bblks; > + struct xlog_rec_header *rhead; > + struct xfs_buf *hbp = NULL; > + struct xfs_buf *dbp = NULL; > + int error; > + char *offset; > + > + hblks = 1; /* XXX */ > + > + /* read and validate the log record header */ > + hbp = xlog_get_bp(log, 1); > + if (!hbp) > + return -ENOMEM; > + error = xlog_bread(log, rec_blk, hblks, hbp, &offset); > + if (error) > + goto out; > + > + rhead = (struct xlog_rec_header *) offset; > + > + error = xlog_valid_rec_header(log, rhead, rec_blk); > + if (error) > + goto out; > + > + /* read the full record and verify the CRC */ > + /* XXX: factor out from do_recovery_pass() ? */ Yes, please do ;) > + dbp = xlog_get_bp(log, BTOBB(be32_to_cpu(rhead->h_size))); > + if (!dbp) > + goto out; > + > + bblks = (int)BTOBB(be32_to_cpu(rhead->h_len)); > + error = xlog_bread(log, rec_blk+hblks, bblks, dbp, &offset); > + if (error) > + goto out; > + > + /* > + * If the CRC validation fails, convert the return code so the caller > + * can distinguish from unrelated errors. > + */ > + error = xlog_unpack_data_crc(rhead, offset, log); > + if (error) > + error = -EFSBADCRC; Note that this will only return an error on v5 filesystems so some changes will be needed here to handle v4 filesystems. It will also log a CRC mismatch error, so perhaps we want to make the error reporting in this case a little bit less noisy.... Cheers, Dave. 
-- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 6+ messages in thread
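Dave's two review points — validate at least the previous eight in-flight log writes, not just the head, and treat a torn tail record as unrecoverable — can be sketched as follows. This is a toy Python model with invented names and a caller-supplied verify callback, not the eventual kernel logic:

```python
MAX_ICLOGS = 8  # up to 8 log I/Os can be in flight when power is lost

def trim_head(records, head, verify):
    """Back the head up past any torn records among the last MAX_ICLOGS
    writes; records before that window are assumed stable on disk.
    Returns the index of the earliest torn record in the window (everything
    from there on is discarded), or the original head if all verify."""
    lo = max(0, head - MAX_ICLOGS)
    new_head = head
    for i in range(head - 1, lo - 1, -1):
        if not verify(records[i]):
            new_head = i
    return new_head

def check_tail(records, tail, verify):
    """If the tail record itself fails verification (e.g. the head was
    overwriting it when it tore), the log is unrecoverable."""
    return verify(records[tail])

# toy log of 10 records; records 5, 6 and 8 are torn
good = {0, 1, 2, 3, 4, 7, 9}
print(trim_head(list(range(10)), 10, lambda r: r in good))  # -> 5
```

The head is trimmed all the way back to record 5 even though record 7 verifies, since anything written after the first tear cannot safely be replayed.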
* Re: [PATCH RFC] xfs: handle torn writes during log head/tail discovery 2015-07-07 0:53 ` Dave Chinner @ 2015-07-07 13:10 ` Brian Foster 2015-07-07 23:31 ` Dave Chinner 0 siblings, 1 reply; 6+ messages in thread From: Brian Foster @ 2015-07-07 13:10 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Tue, Jul 07, 2015 at 10:53:31AM +1000, Dave Chinner wrote: > On Mon, Jul 06, 2015 at 02:26:34PM -0400, Brian Foster wrote: > > Persistent memory without block translation table (btt) support provides > > a block device suitable for filesystems, but does not provide the sector > > write atomicity guarantees typical of block storage. This is a problem > > for log recovery on XFS. The on-disk log record structure already > > includes a CRC and thus can detect torn writes. The problem is that such > > a failure isn't detected until log recovery is already in progress and > > therefore results in a hard error and mount failure. > > > > Update the log head/tail discovery algorithm to detect and trim off a > > torn log record from the end (from a recovery perspective) of the log. > > Once the head is determined from the log cycle information and we have a > > pointer to the last record to be recovered, read and verify the CRC of > > said record before we initiate actual recovery. If CRC verification > > fails, assume the record is torn and reset the head to the start of the > > torn record. We reverse seek for the previous log record header from > > that point and attempt recovery up through that record. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > --- > > > > Hi all, > > > > As far as I'm aware, the torn log write problem is the only major gap we > > have to safely run XFS w/ crc=1 on non-btt pmem devices. This is an RFC > > to attempt to address that problem. 
I used a custom hack to actually > > reproduce such torn writes on a ramdisk to primarily help me understand > > the problem, but I was able to use the same hack to sanity test this > > approach under the basic corruption case (this is generally untested, > > otherwise). > > > > This could obviously use some cleanup, but is the approach sane? I'm > > curious if this should also be tied to DAX enablement so as to not > > interfere with unrelated corruption handling (IOW, with what logic is it > > reasonable enough to assume a crc failure of the final record == torn > > write?). I'm also wondering if this should be tied to a new mount option > > rather than make this behavior implicit. Any other thoughts? > > Seems good fom a conceptual point of view - the fact that we CRC all > log writes now means this will work on both v4 and v5 filesystems. > Indeed, though v4 currently doesn't fail log recovery on CRC failure. It just yells about it and attempts to continue. This seems like it could broaden the scope of this a bit, though somewhat unrelated to the original torn write problem (more on this at the end). > The CRC validation needs to be done on more than just the head > record. We can have up to 8 IOs in flight at the time power goes > down, so we need to validate the CRCs for at least the previous 8 > log writes... > Interesting, Ok.. so I take it that the recovery process should be to examine the last N records and recover only up to the point of the last correctly written record. I suppose this can be handled by backing up a fixed size from the last record header that is discovered by the code as is today. > We also need to validate the tail record, as we could be in a > situation where the head is overwriting the tail and it tears and > so by discarding the head record we discard the tail update and so > now the tail record is also corrupt. In that case, we need to abort > recovery because the log is unrecoverable. > I don't follow here. 
If the head is overwriting the tail, hasn't the tail been moved forward and all the associated metadata changes written back before that can occur? > Also: should we zero out the torn sections before starting recovery, > like we do with the call to xlog_clear_stale_blocks() later in > xlog_find_tail()? > The thought crossed my mind just from a conservative implementation standpoint, but I didn't think too hard about it for this rfc. It probably makes sense to do something like this. > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > > index 01dd228..6015d02 100644 > > --- a/fs/xfs/xfs_log_recover.c > > +++ b/fs/xfs/xfs_log_recover.c > > @@ -62,6 +62,9 @@ xlog_recover_check_summary( > > #define xlog_recover_check_summary(log) > > #endif > > > > +STATIC int > > +xlog_validate_logrec_crc(struct xlog *, xfs_daddr_t); > > + > > Put the function here, don't use forward declarations.... > Eh, I put the function at the end because it depends on other functions that don't have declarations but would require them if I put it earlier in the file. The implementation will surely change and I'll avoid it if I can, but I'm not sure how likely that is. > > /* > > * This structure is used during recovery to record the buf log items which > > * have been canceled and should not be replayed. > > @@ -898,6 +901,7 @@ xlog_find_tail( > > xfs_daddr_t after_umount_blk; > > xfs_lsn_t tail_lsn; > > int hblks; > > + bool skipped_last = false; > > > > found = 0; > > > > @@ -925,6 +929,7 @@ xlog_find_tail( > > /* > > * Search backwards looking for log record header block > > */ > > +retry: > > ASSERT(*head_blk < INT_MAX); > > for (i = (int)(*head_blk) - 1; i >= 0; i--) { > > error = xlog_bread(log, i, 1, bp, &offset); > > @@ -962,6 +967,31 @@ xlog_find_tail( > > return -EIO; > > } > > > > + /* > > + * Now that we think we've found the log head, we have to check that the > > + * last log record wasn't torn when written out to the log. 
This is > > + * possible on devices without sector atomicity guarantees (e.g., pmem). > > + * > > + * Verify the CRC of the last log record that was written. If the CRC is > > + * invalid, point the head at the start of this record and retry the > > + * above backwards log record header search. We'll try the recovery up > > + * through this record. Note that we only walk backwards once since this > > + * is only intended to handle the torn write on power loss case. > > + * > > + * TODO: mount option? tied to DAX? > > + */ > > Always do it - if any hardware is tearing writes, then we should > handle it sanely. > Ok. > > + if (xfs_sb_version_hascrc(&log->l_mp->m_sb) && !skipped_last) { > > No need to be tied to xfs_sb_version_hascrc(), even v4 filesystems > now have a CRC calculated on the log. Only skip it on v4 filesystes > when the h_crc field is zero (indicating the log came from an old > kernel that didn't calculate CRCs). > Ah, Ok. > > + error = xlog_validate_logrec_crc(log, i); > > + if (error == -EFSBADCRC) { > > + skipped_last = true; > > + *head_blk = i; > > + xfs_warn(log->l_mp, > > + "WARNING: Torn write? Attempting recovery up to previous record."); > > + goto retry; > > + } else if (error) > > + goto done; > > + } > > Better to do a loop and factor the code a bit? > Yeah, I'll have to revisit how this looks given that we'll need to examine multiple records and figure out which one is the last fully written. > > + > > /* find blk_no of tail of log */ > > rhead = (xlog_rec_header_t *)offset; > > *tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn)); > > @@ -4607,6 +4637,63 @@ xlog_recover_finish( > > return 0; > > } > > > > +/* > > + * Read and CRC validate a full log record. This is used to detect torn log > > + * writes during head/tail discovery. 
> > + */ > > +STATIC int > > +xlog_validate_logrec_crc( > > + struct xlog *log, > > + xfs_daddr_t rec_blk) > > +{ > > + int hblks, bblks; > > + struct xlog_rec_header *rhead; > > + struct xfs_buf *hbp = NULL; > > + struct xfs_buf *dbp = NULL; > > + int error; > > + char *offset; > > + > > + hblks = 1; /* XXX */ > > + > > + /* read and validate the log record header */ > > + hbp = xlog_get_bp(log, 1); > > + if (!hbp) > > + return -ENOMEM; > > + error = xlog_bread(log, rec_blk, hblks, hbp, &offset); > > + if (error) > > + goto out; > > + > > + rhead = (struct xlog_rec_header *) offset; > > + > > + error = xlog_valid_rec_header(log, rhead, rec_blk); > > + if (error) > > + goto out; > > + > > + /* read the full record and verify the CRC */ > > + /* XXX: factor out from do_recovery_pass() ? */ > > Yes, please do ;) > This was originally noting that the hblks = 1 above and the below buffer sizing appear to have more complicated logic to get correct in xlog_do_recovery_pass() that I didn't want to fully duplicate here. Given that and the comments from my follow on mail about handling wrapping around the log, it's sounding more like reuse of xlog_do_recovery_pass() with a dummy XLOG_RECOVER_CRCPASS is the better approach. It can skip all real processing of the records beyond the crc verification and it will probably need to learn how to report the starting block of the first record that fails crc verification. > > + dbp = xlog_get_bp(log, BTOBB(be32_to_cpu(rhead->h_size))); > > + if (!dbp) > > + goto out; > > + > > + bblks = (int)BTOBB(be32_to_cpu(rhead->h_len)); > > + error = xlog_bread(log, rec_blk+hblks, bblks, dbp, &offset); > > + if (error) > > + goto out; > > + > > + /* > > + * If the CRC validation fails, convert the return code so the caller > > + * can distinguish from unrelated errors. 
> > + */ > > + error = xlog_unpack_data_crc(rhead, offset, log); > > + if (error) > > + error = -EFSBADCRC; > > Note that this will only return an error on v5 filesystems so some > changes will be needed here to handle v4 filesystems. It will > also log a CRC mismatch error, so perhaps we want to make the error > reporting in this case a little bit less noisy.... > Yeah, this was tied to the fact that the torn write verification only took place on v5 in this patch. This will require some cleanup if we want to do this unconditionally for all h_crc != 0 log records. Note that this is also a potential behavior change because the v4 logic will attempt to recover the corrupted log records. Finally getting back to the bigger picture... if we do want to do this verification unconditionally, do we want to expand this mechanism further than just targetting torn writes? For example, crc failure towards the end of the log doesn't necessarily mean a torn write. The currently proposed approach is to only validate the last written record(s) and assume crc failure is due to torn writes. Conversely, we do know that a crc failure somewhere in the middle of the dirty log is not a torn write problem. That said and as noted above, we are effectively changing behavior for v4 filesystems and/or those on regular block storage, however. We're also inconsistent depending on precisely where a failure occurs. CRC failure towards the end will now skip the bad and subsequent log records for v4 or v5. Failure earlier in the log might yell and proceed on v4 and hard fail on v5. Do we want to do something like have a generic CRC verification pass across the entire log immediately after the head/tail is discovered and skip everything beyond the first corrupted (for whatever reason) log record (with a warning that the log has been truncated)? 
That's certainly not as conservative an approach as I was targeting, but it almost seems more logical if we're not going to try and keep this as isolated as possible to torn write handling. The tradeoff is against the v4 brute force recovery, but there might not be any better option for the v5 scenario (xfs_repair -L and lose the entire log anyways..?). Thoughts? Thanks for the feedback, Dave. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 6+ messages in thread
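One complication raised earlier in the thread is log wrap: a verify-only pass over the whole dirty log has to read records that straddle the physical end of the log, which is what makes reusing xlog_do_recovery_pass() attractive. A minimal Python sketch of the circular read (invented names, block granularity only):

```python
def read_wrapped(log, start, nblocks):
    """Read nblocks from a circular log, wrapping at the physical end.

    Models the split reads xlog_do_recovery_pass() performs when a
    record header or body straddles the end of the log."""
    n = len(log)
    start %= n
    if start + nblocks <= n:
        return log[start:start + nblocks]
    split = n - start
    # record straddles the end: read the remainder from the start
    return log[start:] + log[:nblocks - split]

log = list(range(8))
print(read_wrapped(log, 6, 4))  # -> [6, 7, 0, 1]
```

A CRC can then be computed over the reassembled buffer exactly as in the non-wrapped case, which is why a verify-only pass layered on the existing recovery-pass code handles wrap for free.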
* Re: [PATCH RFC] xfs: handle torn writes during log head/tail discovery 2015-07-07 13:10 ` Brian Foster @ 2015-07-07 23:31 ` Dave Chinner 2015-07-08 16:26 ` Brian Foster 0 siblings, 1 reply; 6+ messages in thread From: Dave Chinner @ 2015-07-07 23:31 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Tue, Jul 07, 2015 at 09:10:44AM -0400, Brian Foster wrote: > On Tue, Jul 07, 2015 at 10:53:31AM +1000, Dave Chinner wrote: > > On Mon, Jul 06, 2015 at 02:26:34PM -0400, Brian Foster wrote: > > > Persistent memory without block translation table (btt) support provides > > > a block device suitable for filesystems, but does not provide the sector > > > write atomicity guarantees typical of block storage. This is a problem > > > for log recovery on XFS. The on-disk log record structure already > > > includes a CRC and thus can detect torn writes. The problem is that such > > > a failure isn't detected until log recovery is already in progress and > > > therefore results in a hard error and mount failure. .... > > The CRC validation needs to be done on more than just the head > > record. We can have up to 8 IOs in flight at the time power goes > > down, so we need to validate the CRCs for at least the previous 8 > > log writes... > > > > Interesting, Ok.. so I take it that the recovery process should be to > examine the last N records and recover only up to the point of the last > correctly written record. I suppose this can be handled by backing up a > fixed size from the last record header that is discovered by the code as > is today. > > > We also need to validate the tail record, as we could be in a > > situation where the head is overwriting the tail and it tears and > > so by discarding the head record we discard the tail update and so > > now the tail record is also corrupt. In that case, we need to abort > > recovery because the log is unrecoverable. > > > > I don't follow here. 
If the head is overwriting the tail, hasn't the > tail been moved forward and all the associated metadata changes written > back before that can occur? See the comment about having multiple IOs in flight at once. Also, if flushes are not being used (i.e. nobarrier mount option) on storage with volatile write caches, the tail block that the head points to may or may not have been written to disk, hence the need to validate the tail record.... i.e. once we start down the "handle incorrectly written sectors in recovery via CRC validation" path, we need to consider all the different ways they can happen, not just the specific case of pmem-related problems... > > Also: should we zero out the torn sections before starting recovery, > > like we do with the call to xlog_clear_stale_blocks() later in > > xlog_find_tail()? > > The thought crossed my mind just from a conservative implementation > standpoint, but I didn't think too hard about it for this rfc. It > probably makes sense to do something like this. > > > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > > > index 01dd228..6015d02 100644 > > > --- a/fs/xfs/xfs_log_recover.c > > > +++ b/fs/xfs/xfs_log_recover.c > > > @@ -62,6 +62,9 @@ xlog_recover_check_summary( > > > #define xlog_recover_check_summary(log) > > > #endif > > > > > > +STATIC int > > > +xlog_validate_logrec_crc(struct xlog *, xfs_daddr_t); > > > + > > > > Put the function here, don't use forward declarations.... > > > > Eh, I put the function at the end because it depends on other functions > that don't have declarations but would require them if I put it earlier > in the file. The implementation will surely change and I'll avoid it if > I can, but I'm not sure how likely that is. You can move functions around without changing them. ;) As it is, I'm planning to split the log recovery code into two parts (the bit shared with userspace and the bit that is kernel specific) so don't be afraid to move code around... 
> > > + if (!hbp) > > > + return -ENOMEM; > > > + error = xlog_bread(log, rec_blk, hblks, hbp, &offset); > > > + if (error) > > > + goto out; > > > + > > > + rhead = (struct xlog_rec_header *) offset; > > > + > > > + error = xlog_valid_rec_header(log, rhead, rec_blk); > > > + if (error) > > > + goto out; > > > + > > > + /* read the full record and verify the CRC */ > > > + /* XXX: factor out from do_recovery_pass() ? */ > > > > Yes, please do ;) > > > > This was originally noting that the hblks = 1 above and the below buffer > sizing appear to have more complicated logic to get correct in > xlog_do_recovery_pass() that I didn't want to fully duplicate here. > > Given that and the comments from my follow-on mail about handling > wrapping around the log, it's sounding more like reuse of > xlog_do_recovery_pass() with a dummy XLOG_RECOVER_CRCPASS is the better > approach. It can skip all real processing of the records beyond the crc > verification and it will probably need to learn how to report the > starting block of the first record that fails crc verification. We don't really want to add another full pass reading the log if we can avoid it. We already do a bisect to find the head/tail, a pass to find cancelled buffers and then another pass to do the actual recovery. Adding a third pass is something I'd like to avoid if possible, especially if we have a full 2GB log on a drive that only does 100MB/s.... > > > + > > > + bblks = (int)BTOBB(be32_to_cpu(rhead->h_len)); > > > + error = xlog_bread(log, rec_blk+hblks, bblks, dbp, &offset); > > > + if (error) > > > + goto out; > > > + > > > + /* > > > + * If the CRC validation fails, convert the return code so the caller > > > + * can distinguish from unrelated errors. > > > + */ > > > + error = xlog_unpack_data_crc(rhead, offset, log); > > > + if (error) > > > + error = -EFSBADCRC; > > > > Note that this will only return an error on v5 filesystems so some > > changes will be needed here to handle v4 filesystems. 
It will > > also log a CRC mismatch error, so perhaps we want to make the error > > reporting in this case a little bit less noisy.... > > > > Yeah, this was tied to the fact that the torn write verification only > took place on v5 in this patch. This will require some cleanup if we > want to do this unconditionally for all h_crc != 0 log records. Note > that this is also a potential behavior change because the v4 logic will > attempt to recover the corrupted log records. Right, but we did that so that we didn't unexpectedly change behaviour of log recovery for v4 filesystems. Given that nobody has reported a problem w.r.t. CRC failures on log recovery on v4 filesystems since that was added, I think it is probably safe to make v4 behave like v5 now and start bailing out of log recovery if there are bad CRCs. > Finally getting back to the bigger picture... if we do want to do this > verification unconditionally, do we want to expand this mechanism > further than just targetting torn writes? For example, crc failure > towards the end of the log doesn't necessarily mean a torn write. The > currently proposed approach is to only validate the last written > record(s) and assume crc failure is due to torn writes. Conversely, we > do know that a crc failure somewhere in the middle of the dirty log is > not a torn write problem. CRC failures are unlikely to begin with. Failures in the middle of the log tend to imply deeper problems with the storage medium. 
Failure earlier in the log might yell and proceed on v4 > and hard fail on v5. Bad records at the head of the log are different to those in the middle. We can't replay past a failure, so a bad record in the middle of the log leaves us with an inconsistent filesystem after recovery aborts. OTOH, trimming the incomplete records at the head to guarantee we end recovery at the last complete checkpoint before a full recovery pass still results in a consistent filesystem when recovery completes.... > Do we want to do something like have a generic CRC verification pass > across the entire log immediately after the head/tail is discovered and > skip everything beyond the first corrupted (for whatever reason) log > record (with a warning that the log has been truncated)? That's exactly how log recovery behaves now - when we abort log recovery due to a verification failure, we've already replayed and written everything up to the corruption. Hence running xfs_repair -L at that point will only have to fix up metadata inconsistencies in the part of the log that hasn't been replayed. Also, we don't want to continue at that point, because the filesystem is inconsistent. e.g. we can't process unlinked lists, nor can we process EFIs because they are only valid once all the changes in the log have been recovered. xfs_repair is the only option we have after a failed log recovery right now.... Cheers, Dave. -- Dave Chinner david@fromorbit.com
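[Editorial aside: Dave's rule — up to eight log writes may have been in flight, and the head must be trimmed back so recovery ends at the last complete record before the first tear — can be sketched as a small model. Everything below is hypothetical scaffolding, not kernel code: a flag array stands in for actually reading each candidate record and verifying its CRC, and `MAX_INFLIGHT` mirrors the eight-IOs-in-flight figure from the thread.]

```c
#include <stddef.h>

/* Up to this many log writes may have been in flight at power loss. */
#define MAX_INFLIGHT	8

/*
 * Hypothetical model of head trimming: crc_ok[0..nrecs-1] correspond to
 * the log records in write order, and crc_ok[i] stands in for reading
 * record i and verifying its CRC.  Return the index of the new head
 * record, or -1 if nothing in the checked window verifies.
 */
static int find_verified_head(const int *crc_ok, int nrecs)
{
	int lo = nrecs - MAX_INFLIGHT;

	if (lo < 0)
		lo = 0;
	/*
	 * Scan the window oldest-first and stop at the first bad record:
	 * recovery cannot replay past a failure, so records beyond the
	 * first tear are unreachable even if their own CRCs verify.
	 */
	for (int i = lo; i < nrecs; i++) {
		if (!crc_ok[i])
			return i - 1;
	}
	return nrecs - 1;
}
```

Note that a good record sitting beyond a torn one is trimmed along with it — which is consistent with the point made above that recovery must end at the last complete checkpoint, not merely skip individual bad records.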
* Re: [PATCH RFC] xfs: handle torn writes during log head/tail discovery 2015-07-07 23:31 ` Dave Chinner @ 2015-07-08 16:26 ` Brian Foster 0 siblings, 0 replies; 6+ messages in thread From: Brian Foster @ 2015-07-08 16:26 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Wed, Jul 08, 2015 at 09:31:54AM +1000, Dave Chinner wrote: > On Tue, Jul 07, 2015 at 09:10:44AM -0400, Brian Foster wrote: > > On Tue, Jul 07, 2015 at 10:53:31AM +1000, Dave Chinner wrote: > > > On Mon, Jul 06, 2015 at 02:26:34PM -0400, Brian Foster wrote: > > > > Persistent memory without block translation table (btt) support provides > > > > a block device suitable for filesystems, but does not provide the sector > > > > write atomicity guarantees typical of block storage. This is a problem > > > > for log recovery on XFS. The on-disk log record structure already > > > > includes a CRC and thus can detect torn writes. The problem is that such > > > > a failure isn't detected until log recovery is already in progress and > > > > therefore results in a hard error and mount failure. > .... > > > The CRC validation needs to be done on more than just the head > > > record. We can have up to 8 IOs in flight at the time power goes > > > down, so we need to validate the CRCs for at least the previous 8 > > > log writes... > > > > > > > Interesting, Ok.. so I take it that the recovery process should be to > > examine the last N records and recover only up to the point of the last > > correctly written record. I suppose this can be handled by backing up a > > fixed size from the last record header that is discovered by the code as > > is today. > > > > > We also need to validate the tail record, as we could be in a > > > situation where the head is overwriting the tail and it tears and > > > so by discarding the head record we discard the tail update and so > > > now the tail record is also corrupt. In that case, we need to abort > > > recovery because the log is unrecoverable. 
> > > > > > > I don't follow here. If the head is overwriting the tail, hasn't the > > tail been moved forward and all the associated metadata changes written > > back before that can occur? > > See the comment about having multiple IOs in flight at once. Also, > if flushes are not being used (i.e. nobarrier mount option) on > storage with volatile write caches, the tail block that the head > points to may or may not have been written to disk, hence the need > to validate the tail record.... > The multiple I/Os in flight refers to log I/O, yes? If so, I still don't quite follow that train of thought. Are you saying that of the 8 I/Os potentially in flight, the current head could be pointing at a tail that is also in-flight and ends up torn? If so, does that really require anything beyond the previously noted requirement to verify the last 8 I/Os behind the head? IIUC, that case also means we'd skip recovery entirely. The nobarrier case makes sense, though that seems like it could be a different problem. Wouldn't the cycle in the block not be updated in this case (i.e., is that really a crc verification problem)? > i.e. once we start down the "handle incorrectly written sectors in > recovery via CRC validation" path, we need to consider all the > different ways they can happen, not just the specific case of > pmem-related problems... > Agreed, I'm just trying to understand the various possibilities so I know what needs to be done and why. I'm digging deeper into the on-disk log here than I have before so I'm learning as I go. > > > Also: should we zero out the torn sections before starting recovery, > > > like we do with the call to xlog_clear_stale_blocks() later in > > > xlog_find_tail()? > > > > The thought crossed my mind just from a conservative implementation > > standpoint, but I didn't think too hard about it for this rfc. It > > probably makes sense to do something like this. 
> > > > > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > > > > index 01dd228..6015d02 100644 > > > > --- a/fs/xfs/xfs_log_recover.c > > > > +++ b/fs/xfs/xfs_log_recover.c > > > > @@ -62,6 +62,9 @@ xlog_recover_check_summary( > > > > #define xlog_recover_check_summary(log) > > > > #endif > > > > > > > > +STATIC int > > > > +xlog_validate_logrec_crc(struct xlog *, xfs_daddr_t); > > > > + > > > > > > Put the function here, don't use forward declarations.... > > > > > > > Eh, I put the function at the end because it depends on other functions > > that don't have declarations but would require them if I put it earlier > > in the file. The implementation will surely change and I'll avoid it if > > I can, but I'm not sure how likely that is. > > You can move functions around without changing them. ;) > Sure, but why shuffle the file around to avoid a declaration? We use declarations all over the place, including in this file. > As it is, I'm planning to split the log recovery code into two parts > (the bit shared with userspace and the bit that is kernel specific) > so don't be afraid to move code around... > Ok, if that's the reason. The approach I'm on now refactors things a bit as is, so we'll see what falls out of that. I'd prefer to get the core functionality and refactoring right before reorganizing everything, at least. > > > > + if (!hbp) > > > > + return -ENOMEM; > > > > + error = xlog_bread(log, rec_blk, hblks, hbp, &offset); > > > > + if (error) > > > > + goto out; > > > > + > > > > + rhead = (struct xlog_rec_header *) offset; > > > > + > > > > + error = xlog_valid_rec_header(log, rhead, rec_blk); > > > > + if (error) > > > > + goto out; > > > > + > > > > + /* read the full record and verify the CRC */ > > > > + /* XXX: factor out from do_recovery_pass() ? 
*/ > > > > > > Yes, please do ;) > > > > > > > This was originally noting that the hblks = 1 above and the below buffer > > sizing appear to have more complicated logic to get correct in > > xlog_do_recovery_pass() that I didn't want to fully duplicate here. > > > > Given that and the comments from my follow on mail about handling > > wrapping around the log, it's sounding more like reuse of > > xlog_do_recovery_pass() with a dummy XLOG_RECOVER_CRCPASS is the better > > approach. It can skip all real processing of the records beyond the crc > > verification and it will probably need to learn how to report the > > starting block of the first record that fails crc verification. > > We don't really want to add another full pass reading the log if we > can avoid it. we already do a bisect to find the head/tail, a pass > to find cancelld buffers and then another pass to do the actual > recovery. Adding a third pass can is something I'd like to avoid if > possible, especially if we have a full 2GB log on a drive that only > does 100MB/s.... > Hmm, Ok. There's not much of a point anyways if we aren't going to do anything different in general about crc failure in the middle of the log. I have something halfway reworked that adds a CRC pass, but that is with the idea of doing more broad CRC verification. The code can still be reused since xlog_do_recovery_pass() already implements all of the log walking logic and already supports the ability to iterate a specific range of blocks. I think it just changes where and how the code is invoked a bit. > > > > + > > > > + bblks = (int)BTOBB(be32_to_cpu(rhead->h_len)); > > > > + error = xlog_bread(log, rec_blk+hblks, bblks, dbp, &offset); > > > > + if (error) > > > > + goto out; > > > > + > > > > + /* > > > > + * If the CRC validation fails, convert the return code so the caller > > > > + * can distinguish from unrelated errors. 
> > > > + */ > > > > + error = xlog_unpack_data_crc(rhead, offset, log); > > > > + if (error) > > > > + error = -EFSBADCRC; > > > > > > Note that this will only return an error on v5 filesystems so some > > > changes will be needed here to handle v4 filesystems. It will > > > also log a CRC mismatch error, so perhaps we want to make the error > > > reporting in this case a little bit less noisy.... > > > > > > > Yeah, this was tied to the fact that the torn write verification only > > took place on v5 in this patch. This will require some cleanup if we > > want to do this unconditionally for all h_crc != 0 log records. Note > > that this is also a potential behavior change because the v4 logic will > > attempt to recover the corrupted log records. > > Right, but we did that so that we didn't unexpectedly change > behaviour of log recovery for v4 filesystems. Given that nobody has > reported a problem w.r.t. CRC failures on log recovery on v4 > filesystem since that was added, I think it is probably safe to make > v4 behave like v5 now and start bailing out of log recovery if there > are bad CRCs. > Sounds good, at least we'll be consistent with fs versions across the "edges" and middle of the log. > > Finally getting back to the bigger picture... if we do want to do this > > verification unconditionally, do we want to expand this mechanism > > further than just targetting torn writes? For example, crc failure > > towards the end of the log doesn't necessarily mean a torn write. The > > currently proposed approach is to only validate the last written > > record(s) and assume crc failure is due to torn writes. Conversely, we > > do know that a crc failure somewhere in the middle of the dirty log is > > not a torn write problem. > > CRC failures are unlikely to begin with. failures in the middle of > the log tend to imply deeper problems with the storage medium. 
Also, > it's fine to abort recovery half way through if we come across such > problems, so i don't think we need to care about torn writes outside > of the region that we expect active writes to be occurring. > Right... we know crc failure in the middle is not a torn write, as noted above. The train of thought was about doing more broad crc failure handling. As you explained below, mounting after such a halfway recovery is not the right thing to do. > > That said and as noted above, we are effectively changing behavior for > > v4 filesystems and/or those on regular block storage, however. We're > > also inconsistent depending on precisely where a failure occurs. CRC > > failure towards the end will now skip the bad and subsequent log records > > for v4 or v5. Failure earlier in the log might yell and proceed on v4 > > and hard fail on v5. > > Bad records at the head of the log are different to those in the > middle. We can't replay past a failure, so a bad record in the > middle of the log leaves us with an inconsistent filesystems after > recovery aborts. OTOH, trimming the incomplete records at the head > to guarantee we end recovery at the last complete checkpoint before > a full recovery pass still results in a consistent filesystem when > recovery completes.... > > > Do we want to do something like have a generic CRC verification pass > > across the entire log immediately after the head/tail is discovered and > > skip everything beyond the first corrupted (for whatever reason) log > > record (with a warning that the log has been truncated)? > > That's exactly how log recovery behaves now - when we abort log > recovery due to a verification failure, we've already replayed and > written everything up to the corruption. Hence running xfs_repair -L > at that point will only have to fix up metadata inconsistnecies in > the part of the log that hasn't been replayed. > > Also, we don't want to continue at that point, because the filesystem > is inconsistent. e.g. 
we can't process unlinked lists, nor can we > process EFIs because they are only valid once all the changes in the > log have been recovered. xfs_repair is the only option we have after > a failed log recovery right now.... > Ok. When thinking about this, I guess I was thinking that we'd be able to recover and mount across a complete log record boundary, but that is not the case. I take it the reason is that if some log record in the middle is corrupted for something other than a torn write, it was still completely written as far as the kernel was concerned at runtime. Thus, some changes dependent on that record could have been written back to the fs. Failure to replay the records beyond that point can thus leave us inconsistent and in need of repair. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com