public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] xfs: log recovery torn write detection
@ 2015-11-09 20:21 Brian Foster
  2015-11-09 20:21 ` [PATCH 1/8] xfs: detect and handle invalid iclog size set by mkfs Brian Foster
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

Hi all,

Here's a first real pass at XFS log recovery torn write detection. This
series has been tested via xfstests and via repetitive fsstress/shutdown
sequences followed by simulated CRC errors on log recovery. The latter
testing has proven useful in shaking out a few bugs, but I have still
reproduced fs inconsistency after a couple hundred iterations or so.
That said, I suspect the problems at this point are either actual
logging problems (e.g., all of the EFI/EFD logging patches and whatnot
originated from this kind of testing) or due to the nature of the error
simulation.

In short, it simulates log corruption moreso than torn writes because it
injects errors at recovery time. The log buffers are written
successfully at shutdown time and therefore I believe it's still
possible for the filesystem to have modifications that depend on
committed transactions (which are ultimately skipped if a crc error is
simulated). I've marked this patch RFC for the time being because I'd
like to try and come up with something a bit more deterministic, if
possible (so long as it can be done reasonably simply). For example,
perhaps we can replace it with a similar debug mode that intentionally
corrupts a crc at write time and shuts down the fs on write completion
such that the AIL is not updated and there is less risk of inconsistency
due to writing back metadata items in the "corrupted" log buffer(s).
Anyways, the current patch is included so the current test procedure is
documented, reviewable and repeatable.

Patch 1 is a bug fix for a problem exposed by this mechanism. Patches
2-6 are primarily refactoring and introduce the CRC-check-only log
recovery pass. Patch 7 enables log head/tail torn write detection. Patch
8 implements the DEBUG mode error injection mechanism described above.
Thoughts, reviews, flames appreciated.

Brian

v1:
- Added bug fix for mkfs log record header inconsistency.
- Refactored log recovery code to support a CRC-check-only recovery
  pass.
- CRC verify the last 8 records behind the head to account for
  concurrent log writes.
- Verify the tail of the log as well when the head is torn.
- Added (rfc) crc error injection patch for testing purposes.
rfc: http://oss.sgi.com/pipermail/xfs/2015-July/042415.html

Brian Foster (8):
  xfs: detect and handle invalid iclog size set by mkfs
  xfs: refactor log record unpack and data processing
  xfs: refactor and open code log record crc check
  xfs: return start block of first bad log record during recovery
  xfs: support a crc verification only log record pass
  xfs: refactor log record start detection into a new helper
  xfs: detect and trim torn writes during log recovery
  xfs: debug mode log recovery crc error injection

 fs/xfs/libxfs/xfs_log_recover.h |   1 +
 fs/xfs/xfs_globals.c            |   1 +
 fs/xfs/xfs_log_recover.c        | 646 +++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_sysctl.h             |   1 +
 fs/xfs/xfs_sysfs.c              |  31 ++
 5 files changed, 574 insertions(+), 106 deletions(-)

-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/8] xfs: detect and handle invalid iclog size set by mkfs
  2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
@ 2015-11-09 20:21 ` Brian Foster
  2015-11-09 20:21 ` [PATCH 2/8] xfs: refactor log record unpack and data processing Brian Foster
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

XFS log records have separate fields for the record size and the iclog
size used to write the record. mkfs.xfs zeroes the log and writes an
unmount record to generate a clean log for the subsequent mount. The
userspace record logging code has a bug where the iclog size (h_size)
field of the log record is hardcoded to 32k, even if a log stripe unit
is specified. The log record length is correctly extended to the stripe
unit. Since the kernel log recovery code uses the h_size field to
determine the log buffer size, this means that the kernel can attempt to
read/process records larger than the buffer size and overrun the buffer.

This has historically not been a problem because the kernel doesn't
actually run through log recovery in the clean unmount case. Instead,
the kernel detects that a single unmount record exists between the head
and tail and pushes the tail forward such that the log is viewed as
clean (head == tail). Once CRC verification is enabled, however, all
records at the head of the log are verified for CRC errors and thus we
are susceptible to overrun problems if the iclog field is not correct.

While the core problem must be fixed in userspace, this is historical
behavior that must be detected in the kernel to avoid severe side
effects such as memory corruption and crashes. Update the log buffer
size calculation code to detect this condition, warn the user and resize
the log buffer based on the log stripe unit. Return a corruption error
in cases where this does not look like a clean filesystem (i.e., the log
record header indicates more than one operation).

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index f8f1363..b00c85e 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4245,7 +4245,7 @@ xlog_do_recovery_pass(
 	xfs_daddr_t		blk_no;
 	char			*offset;
 	xfs_buf_t		*hbp, *dbp;
-	int			error = 0, h_size;
+	int			error = 0, h_size, h_len;
 	int			bblks, split_bblks;
 	int			hblks, split_hblks, wrapped_hblks;
 	struct hlist_head	rhash[XLOG_RHASH_SIZE];
@@ -4274,7 +4274,31 @@ xlog_do_recovery_pass(
 		error = xlog_valid_rec_header(log, rhead, tail_blk);
 		if (error)
 			goto bread_err1;
+
+		/*
+		 * xfsprogs has a bug where record length is based on lsunit but
+		 * h_size (iclog size) is hardcoded to 32k. Now that we
+		 * unconditionally CRC verify the unmount record, this means the
+		 * log buffer can be too small for the record and cause an
+		 * overrun.
+		 *
+		 * Detect this condition here. Use lsunit for the buffer size as
+		 * long as this looks like the mkfs case. Otherwise, return an
+		 * error to avoid a buffer overrun.
+		 */
 		h_size = be32_to_cpu(rhead->h_size);
+		h_len = be32_to_cpu(rhead->h_len);
+		if (h_len > h_size) {
+			if (h_len <= log->l_mp->m_logbsize &&
+			    be32_to_cpu(rhead->h_num_logops) == 1) {
+				xfs_warn(log->l_mp,
+		"invalid iclog size (%d bytes), using lsunit (%d bytes)",
+					 h_size, log->l_mp->m_logbsize);
+				h_size = log->l_mp->m_logbsize;
+			} else
+				return -EFSCORRUPTED;
+		}
+
 		if ((be32_to_cpu(rhead->h_version) & XLOG_VERSION_2) &&
 		    (h_size > XLOG_HEADER_CYCLE_SIZE)) {
 			hblks = h_size / XLOG_HEADER_CYCLE_SIZE;
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/8] xfs: refactor log record unpack and data processing
  2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
  2015-11-09 20:21 ` [PATCH 1/8] xfs: detect and handle invalid iclog size set by mkfs Brian Foster
@ 2015-11-09 20:21 ` Brian Foster
  2015-11-09 20:21 ` [PATCH 3/8] xfs: refactor and open code log record crc check Brian Foster
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

xlog_do_recovery_pass() duplicates a couple function calls related to
processing log records because the function must handle wrapping around
the end of the log if the head is behind the tail. This is implemented
as separate loops. CRC verification pass support will modify how records
are processed in both of these loops.

Rather than continue to duplicate code, factor the calls that process a
log record into a new helper and call that helper from both loops. This
patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index b00c85e..a15a506 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4190,6 +4190,26 @@ xlog_unpack_data(
 	return 0;
 }
 
+/*
+ * Unpack and process a log record.
+ */
+STATIC int
+xlog_recover_process(
+	struct xlog		*log,
+	struct hlist_head	rhash[],
+	struct xlog_rec_header	*rhead,
+	char			*dp,
+	int			pass)
+{
+	int			error;
+
+	error = xlog_unpack_data(rhead, dp, log);
+	if (error)
+		return error;
+
+	return xlog_recover_process_data(log, rhash, rhead, dp, pass);
+}
+
 STATIC int
 xlog_valid_rec_header(
 	struct xlog		*log,
@@ -4432,12 +4452,8 @@ xlog_do_recovery_pass(
 					goto bread_err2;
 			}
 
-			error = xlog_unpack_data(rhead, offset, log);
-			if (error)
-				goto bread_err2;
-
-			error = xlog_recover_process_data(log, rhash,
-							rhead, offset, pass);
+			error = xlog_recover_process(log, rhash, rhead, offset,
+						     pass);
 			if (error)
 				goto bread_err2;
 			blk_no += bblks;
@@ -4465,12 +4481,7 @@ xlog_do_recovery_pass(
 		if (error)
 			goto bread_err2;
 
-		error = xlog_unpack_data(rhead, offset, log);
-		if (error)
-			goto bread_err2;
-
-		error = xlog_recover_process_data(log, rhash,
-						rhead, offset, pass);
+		error = xlog_recover_process(log, rhash, rhead, offset, pass);
 		if (error)
 			goto bread_err2;
 		blk_no += bblks + hblks;
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/8] xfs: refactor and open code log record crc check
  2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
  2015-11-09 20:21 ` [PATCH 1/8] xfs: detect and handle invalid iclog size set by mkfs Brian Foster
  2015-11-09 20:21 ` [PATCH 2/8] xfs: refactor log record unpack and data processing Brian Foster
@ 2015-11-09 20:21 ` Brian Foster
  2015-11-09 20:21 ` [PATCH 4/8] xfs: return start block of first bad log record during recovery Brian Foster
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

Log record CRC verification currently occurs during active log recovery,
immediately before a log record is unpacked. Therefore, the CRC
calculation code is buried within the data unpack function. CRC
verification pass support only needs to go so far as check the CRC, but
this is not easily allowed as the code is currently organized.

Since we now have a new log record processing helper, pull the record
CRC verification code out from the unpack helper and open-code it at the
top of the new process helper. This facilitates the ability to modify
how records are processed based on the type of the current pass. This
patch contains no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 72 +++++++++++++++++-------------------------------
 1 file changed, 26 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index a15a506..c2bf307 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4118,46 +4118,6 @@ xlog_recover_process_iunlinks(
 	mp->m_dmevmask = mp_dmevmask;
 }
 
-/*
- * Upack the log buffer data and crc check it. If the check fails, issue a
- * warning if and only if the CRC in the header is non-zero. This makes the
- * check an advisory warning, and the zero CRC check will prevent failure
- * warnings from being emitted when upgrading the kernel from one that does not
- * add CRCs by default.
- *
- * When filesystems are CRC enabled, this CRC mismatch becomes a fatal log
- * corruption failure
- */
-STATIC int
-xlog_unpack_data_crc(
-	struct xlog_rec_header	*rhead,
-	char			*dp,
-	struct xlog		*log)
-{
-	__le32			crc;
-
-	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
-	if (crc != rhead->h_crc) {
-		if (rhead->h_crc || xfs_sb_version_hascrc(&log->l_mp->m_sb)) {
-			xfs_alert(log->l_mp,
-		"log record CRC mismatch: found 0x%x, expected 0x%x.",
-					le32_to_cpu(rhead->h_crc),
-					le32_to_cpu(crc));
-			xfs_hex_dump(dp, 32);
-		}
-
-		/*
-		 * If we've detected a log record corruption, then we can't
-		 * recover past this point. Abort recovery if we are enforcing
-		 * CRC protection by punting an error back up the stack.
-		 */
-		if (xfs_sb_version_hascrc(&log->l_mp->m_sb))
-			return -EFSCORRUPTED;
-	}
-
-	return 0;
-}
-
 STATIC int
 xlog_unpack_data(
 	struct xlog_rec_header	*rhead,
@@ -4165,11 +4125,6 @@ xlog_unpack_data(
 	struct xlog		*log)
 {
 	int			i, j, k;
-	int			error;
-
-	error = xlog_unpack_data_crc(rhead, dp, log);
-	if (error)
-		return error;
 
 	for (i = 0; i < BTOBB(be32_to_cpu(rhead->h_len)) &&
 		  i < (XLOG_HEADER_CYCLE_SIZE / BBSIZE); i++) {
@@ -4191,7 +4146,7 @@ xlog_unpack_data(
 }
 
 /*
- * Unpack and process a log record.
+ * CRC check, unpack and process a log record.
  */
 STATIC int
 xlog_recover_process(
@@ -4202,6 +4157,31 @@ xlog_recover_process(
 	int			pass)
 {
 	int			error;
+	__le32			crc;
+
+	/*
+	 * Check the CRC and issue a warning if and only if the CRC in the
+	 * header is non-zero. This is an advisory warning and the zero CRC
+	 * check prevents warnings from being emitted when upgrading the kernel
+	 * from one that does not add CRCs by default.
+	 */
+	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
+	if (crc != le32_to_cpu(rhead->h_crc)) {
+		if (rhead->h_crc || xfs_sb_version_hascrc(&log->l_mp->m_sb)) {
+			xfs_alert(log->l_mp,
+		"log record CRC mismatch: found 0x%x, expected 0x%x.",
+					le32_to_cpu(rhead->h_crc),
+					le32_to_cpu(crc));
+			xfs_hex_dump(dp, 32);
+		}
+
+		/*
+		 * If the filesystem is CRC enabled, this mismatch becomes a
+		 * fatal log corruption failure.
+		 */
+		if (xfs_sb_version_hascrc(&log->l_mp->m_sb))
+			return -EFSCORRUPTED;
+	}
 
 	error = xlog_unpack_data(rhead, dp, log);
 	if (error)
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/8] xfs: return start block of first bad log record during recovery
  2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (2 preceding siblings ...)
  2015-11-09 20:21 ` [PATCH 3/8] xfs: refactor and open code log record crc check Brian Foster
@ 2015-11-09 20:21 ` Brian Foster
  2015-11-10 15:42   ` Brian Foster
  2015-11-09 20:21 ` [PATCH 5/8] xfs: support a crc verification only log record pass Brian Foster
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

Each log recovery pass walks from the tail block to the head block and
processes records appropriately based on the associated log pass type.
There are various failure conditions that can occur through this
sequence, such as I/O errors, CRC errors, etc. Log torn write detection
will perform CRC verification near the head of the log to detect torn
writes and trim torn records from the log appropriately.

As it is, xlog_do_recovery_pass() only returns an error code in the
event of CRC failure, which isn't enough information to trim the head of
the log. Update xlog_do_recovery_pass() to optionally return the start
block of the associated record when an error occurs. This patch contains
no functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index c2bf307..2ef0880 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4239,10 +4239,12 @@ xlog_do_recovery_pass(
 	struct xlog		*log,
 	xfs_daddr_t		head_blk,
 	xfs_daddr_t		tail_blk,
-	int			pass)
+	int			pass,
+	xfs_daddr_t		*first_bad)	/* out: first bad log rec */
 {
 	xlog_rec_header_t	*rhead;
 	xfs_daddr_t		blk_no;
+	xfs_daddr_t		rhead_blk;
 	char			*offset;
 	xfs_buf_t		*hbp, *dbp;
 	int			error = 0, h_size, h_len;
@@ -4251,6 +4253,7 @@ xlog_do_recovery_pass(
 	struct hlist_head	rhash[XLOG_RHASH_SIZE];
 
 	ASSERT(head_blk != tail_blk);
+	rhead_blk = 0;
 
 	/*
 	 * Read the header of the tail block and get the iclog buffer size from
@@ -4325,7 +4328,7 @@ xlog_do_recovery_pass(
 	}
 
 	memset(rhash, 0, sizeof(rhash));
-	blk_no = tail_blk;
+	blk_no = rhead_blk = tail_blk;
 	if (tail_blk > head_blk) {
 		/*
 		 * Perform recovery around the end of the physical log.
@@ -4436,7 +4439,9 @@ xlog_do_recovery_pass(
 						     pass);
 			if (error)
 				goto bread_err2;
+
 			blk_no += bblks;
+			rhead_blk = blk_no;
 		}
 
 		ASSERT(blk_no >= log->l_logBBsize);
@@ -4464,13 +4469,19 @@ xlog_do_recovery_pass(
 		error = xlog_recover_process(log, rhash, rhead, offset, pass);
 		if (error)
 			goto bread_err2;
+
 		blk_no += bblks + hblks;
+		rhead_blk = blk_no;
 	}
 
  bread_err2:
 	xlog_put_bp(dbp);
  bread_err1:
 	xlog_put_bp(hbp);
+
+	if (error && first_bad)
+		*first_bad = rhead_blk;
+
 	return error;
 }
 
@@ -4508,7 +4519,7 @@ xlog_do_log_recovery(
 		INIT_LIST_HEAD(&log->l_buf_cancel_table[i]);
 
 	error = xlog_do_recovery_pass(log, head_blk, tail_blk,
-				      XLOG_RECOVER_PASS1);
+				      XLOG_RECOVER_PASS1, NULL);
 	if (error != 0) {
 		kmem_free(log->l_buf_cancel_table);
 		log->l_buf_cancel_table = NULL;
@@ -4519,7 +4530,7 @@ xlog_do_log_recovery(
 	 * When it is complete free the table of buf cancel items.
 	 */
 	error = xlog_do_recovery_pass(log, head_blk, tail_blk,
-				      XLOG_RECOVER_PASS2);
+				      XLOG_RECOVER_PASS2, NULL);
 #ifdef DEBUG
 	if (!error) {
 		int	i;
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/8] xfs: support a crc verification only log record pass
  2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (3 preceding siblings ...)
  2015-11-09 20:21 ` [PATCH 4/8] xfs: return start block of first bad log record during recovery Brian Foster
@ 2015-11-09 20:21 ` Brian Foster
  2015-11-09 20:21 ` [PATCH 6/8] xfs: refactor log record start detection into a new helper Brian Foster
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

Log recovery torn write detection uses CRC verification over a range of
the active log to identify torn writes. Since the generic log recovery
pass code implements a superset of the functionality required for CRC
verification, it can be easily modified to support a CRC verification
only pass.

Create a new CRC pass type and update the log record processing helper
to skip everything beyond CRC verification when in this mode. This pass
will be invoked in subsequent patches to implement torn write detection.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_log_recover.h |  1 +
 fs/xfs/xfs_log_recover.c        | 24 +++++++++++++++++++-----
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 1c55ccb..8e385f9 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -60,6 +60,7 @@ typedef struct xlog_recover {
  */
 #define	XLOG_BC_TABLE_SIZE	64
 
+#define	XLOG_RECOVER_CRCPASS	0
 #define	XLOG_RECOVER_PASS1	1
 #define	XLOG_RECOVER_PASS2	2
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 2ef0880..ae514df 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4159,13 +4159,27 @@ xlog_recover_process(
 	int			error;
 	__le32			crc;
 
+	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
+
 	/*
-	 * Check the CRC and issue a warning if and only if the CRC in the
-	 * header is non-zero. This is an advisory warning and the zero CRC
-	 * check prevents warnings from being emitted when upgrading the kernel
-	 * from one that does not add CRCs by default.
+	 * Nothing else to do if this is a CRC verification pass. Just return
+	 * if this a record with a non-zero crc. Unfortunately, mkfs always
+	 * sets h_crc to 0 so we must consider this valid even on v5 supers.
+	 * Otherwise, return EFSBADCRC on failure so the callers up the stack
+	 * know precisely what failed.
+	 */
+	if (pass == XLOG_RECOVER_CRCPASS) {
+		if (rhead->h_crc && crc != le32_to_cpu(rhead->h_crc))
+			return -EFSBADCRC;
+		return 0;
+	}
+
+	/*
+	 * We're in the normal recovery path. Issue a warning if and only if the
+	 * CRC in the header is non-zero. This is an advisory warning and the
+	 * zero CRC check prevents warnings from being emitted when upgrading
+	 * the kernel from one that does not add CRCs by default.
 	 */
-	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
 	if (crc != le32_to_cpu(rhead->h_crc)) {
 		if (rhead->h_crc || xfs_sb_version_hascrc(&log->l_mp->m_sb)) {
 			xfs_alert(log->l_mp,
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/8] xfs: refactor log record start detection into a new helper
  2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (4 preceding siblings ...)
  2015-11-09 20:21 ` [PATCH 5/8] xfs: support a crc verification only log record pass Brian Foster
@ 2015-11-09 20:21 ` Brian Foster
  2015-11-09 20:21 ` [PATCH 7/8] xfs: detect and trim torn writes during log recovery Brian Foster
  2015-11-09 20:21 ` [PATCH RFC 8/8] xfs: debug mode log recovery crc error injection Brian Foster
  7 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

As part of the head/tail discovery process, log recovery locates the
head block and then reverse seeks to find the start of the last active
record in the log. This is non-trivial as the record itself could have
wrapped around the end of the physical log. Log recovery torn write
detection potentially needs to walk further behind the last record in
the log, as multiple log I/Os can be in-flight at one time during a
crash event.

Therefore, refactor the reverse log record header search mechanism into
a new helper that supports the ability to seek past an arbitrary number
of log records (or until the tail is hit). Update the head/tail search
mechanism to call the new helper, but otherwise there is no change in
log recovery behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 118 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 83 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index ae514df..1b3f8fe 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -868,6 +868,79 @@ validate_head:
 }
 
 /*
+ * Seek backwards in the log for log record headers.
+ *
+ * Given a starting log block, walk backwards until we find the provided number
+ * of records or hit the provided tail block. The return value is the number of
+ * records encountered or a negative error code. The log block and buffer
+ * pointer of the last record seen are returned in rblk and rhead respectively.
+ */
+STATIC int
+xlog_rseek_logrec_hdr(
+	struct xlog		*log,
+	xfs_daddr_t		head_blk,
+	xfs_daddr_t		tail_blk,
+	int			count,
+	struct xfs_buf		*bp,
+	xfs_daddr_t		*rblk,
+	struct xlog_rec_header	**rhead,
+	bool			*wrapped)
+{
+	int			i;
+	int			error;
+	int			found = 0;
+	char			*offset = NULL;
+	xfs_daddr_t		end_blk;
+
+	*wrapped = false;
+
+	/*
+	 * Walk backwards from the head block until we hit the tail or the first
+	 * block in the log.
+	 */
+	end_blk = head_blk > tail_blk ? tail_blk : 0;
+	for (i = (int) head_blk - 1; i >= end_blk; i--) {
+		error = xlog_bread(log, i, 1, bp, &offset);
+		if (error)
+			goto out_error;
+
+		if (*(__be32 *) offset == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
+			*rblk = i;
+			*rhead = (struct xlog_rec_header *) offset;
+			if (++found == count)
+				break;
+		}
+	}
+
+	/*
+	 * If we haven't hit the tail block or the log record header count,
+	 * start looking again from the end of the physical log. Note that
+	 * callers can pass head == tail if the tail is not yet known.
+	 */
+	if (tail_blk >= head_blk && found != count) {
+		for (i = log->l_logBBsize - 1; i >= (int) tail_blk; i--) {
+			error = xlog_bread(log, i, 1, bp, &offset);
+			if (error)
+				goto out_error;
+
+			if (*(__be32 *)offset ==
+			    cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
+				*wrapped = true;
+				*rblk = i;
+				*rhead = (struct xlog_rec_header *) offset;
+				if (++found == count)
+					break;
+			}
+		}
+	}
+
+	return found;
+
+out_error:
+	return error;
+}
+
+/*
  * Find the sync block number or the tail of the log.
  *
  * This will be the block number of the last record to have its
@@ -898,8 +971,7 @@ xlog_find_tail(
 	xfs_daddr_t		after_umount_blk;
 	xfs_lsn_t		tail_lsn;
 	int			hblks;
-
-	found = 0;
+	bool			wrapped = false;
 
 	/*
 	 * Find previous log record
@@ -923,37 +995,16 @@ xlog_find_tail(
 	}
 
 	/*
-	 * Search backwards looking for log record header block
+	 * Search backwards through the log looking for the log record header
+	 * block. This wraps all the way back around to the head so something is
+	 * seriously wrong if we can't find it.
 	 */
 	ASSERT(*head_blk < INT_MAX);
-	for (i = (int)(*head_blk) - 1; i >= 0; i--) {
-		error = xlog_bread(log, i, 1, bp, &offset);
-		if (error)
-			goto done;
-
-		if (*(__be32 *)offset == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
-			found = 1;
-			break;
-		}
-	}
-	/*
-	 * If we haven't found the log record header block, start looking
-	 * again from the end of the physical log.  XXXmiken: There should be
-	 * a check here to make sure we didn't search more than N blocks in
-	 * the previous code.
-	 */
-	if (!found) {
-		for (i = log->l_logBBsize - 1; i >= (int)(*head_blk); i--) {
-			error = xlog_bread(log, i, 1, bp, &offset);
-			if (error)
-				goto done;
-
-			if (*(__be32 *)offset ==
-			    cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
-				found = 2;
-				break;
-			}
-		}
+	found = xlog_rseek_logrec_hdr(log, *head_blk, *head_blk, 1, bp, &i,
+				      &rhead, &wrapped);
+	if (found < 0) {
+		error = found;
+		goto done;
 	}
 	if (!found) {
 		xfs_warn(log->l_mp, "%s: couldn't find sync record", __func__);
@@ -961,9 +1012,6 @@ xlog_find_tail(
 		ASSERT(0);
 		return -EIO;
 	}
-
-	/* find blk_no of tail of log */
-	rhead = (xlog_rec_header_t *)offset;
 	*tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn));
 
 	/*
@@ -979,7 +1027,7 @@ xlog_find_tail(
 	log->l_prev_block = i;
 	log->l_curr_block = (int)*head_blk;
 	log->l_curr_cycle = be32_to_cpu(rhead->h_cycle);
-	if (found == 2)
+	if (wrapped)
 		log->l_curr_cycle++;
 	atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn));
 	atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(rhead->h_lsn));
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 7/8] xfs: detect and trim torn writes during log recovery
  2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (5 preceding siblings ...)
  2015-11-09 20:21 ` [PATCH 6/8] xfs: refactor log record start detection into a new helper Brian Foster
@ 2015-11-09 20:21 ` Brian Foster
  2015-11-10 13:45   ` Brian Foster
  2015-11-09 20:21 ` [PATCH RFC 8/8] xfs: debug mode log recovery crc error injection Brian Foster
  7 siblings, 1 reply; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

Certain types of storage, such as persistent memory, do not provide
sector atomicity for writes. This means that if a crash occurs while XFS
is writing log records, only part of those records might make it to the
storage. This is problematic because log recovery uses the cycle value
packed at the top of each log block to locate the head/tail of the log.
This can lead to CRC verification failures during log recovery and an
unmountable fs for a filesystem that is otherwise consistent.

Update log recovery to incorporate log record CRC verification as part
of the head/tail discovery process. Once the head is located via the
traditional algorithm, run a CRC-only pass over the records up to the
head of the log. If CRC verification fails, assume that the records are
torn as a matter of policy and trim the head block back to the start of
the first bad record.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 298 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 279 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1b3f8fe..83fd7ca 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -61,6 +61,9 @@ xlog_recover_check_summary(
 #else
 #define	xlog_recover_check_summary(log)
 #endif
+STATIC int
+xlog_do_recovery_pass(
+        struct xlog *, xfs_daddr_t, xfs_daddr_t, int, xfs_daddr_t *);
 
 /*
  * This structure is used during recovery to record the buf log items which
@@ -941,6 +944,270 @@ out_error:
 }
 
 /*
+ * Seek forward in the log for log record headers.
+ *
+ * Given head and tail blocks, walk forward from the tail block until we find
+ * the provided number of records or hit the head block. The return value is the
+ * number of records encountered or a negative error code. The log block and
+ * buffer pointer of the last record seen are returned in rblk and rhead
+ * respectively.
+ */
+STATIC int
+xlog_seek_logrec_hdr(
+	struct xlog		*log,
+	xfs_daddr_t		head_blk,
+	xfs_daddr_t		tail_blk,
+	int			count,
+	struct xfs_buf		*bp,
+	xfs_daddr_t		*rblk,
+	struct xlog_rec_header	**rhead,
+	bool			*wrapped)
+{
+	int			i;
+	int			error;
+	int			found = 0;
+	char			*offset = NULL;
+	xfs_daddr_t		end_blk;
+
+	*wrapped = false;
+
+	/*
+	 * Walk forward from the tail block until we hit the head or the last
+	 * block in the log.
+	 */
+	end_blk = head_blk > tail_blk ? head_blk : log->l_logBBsize - 1;
+	for (i = (int) tail_blk; i <= end_blk; i++) {
+		error = xlog_bread(log, i, 1, bp, &offset);
+		if (error)
+			goto out_error;
+
+		if (*(__be32 *) offset == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
+			*rblk = i;
+			*rhead = (struct xlog_rec_header *) offset;
+			if (++found == count)
+				break;
+		}
+	}
+
+	/*
+	 * If we haven't hit the head block or the log record header count,
+	 * start looking again from the start of the physical log.
+	 */
+	if (tail_blk > head_blk && found != count) {
+		for (i = 0; i < (int) head_blk; i++) {
+			error = xlog_bread(log, i, 1, bp, &offset);
+			if (error)
+				goto out_error;
+
+			if (*(__be32 *)offset ==
+			    cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) {
+				*wrapped = true;
+				*rblk = i;
+				*rhead = (struct xlog_rec_header *) offset;
+				if (++found == count)
+					break;
+			}
+		}
+	}
+
+	return found;
+
+out_error:
+	return error;
+}
+
+/*
+ * Check the log tail for torn writes. This is required when torn writes are
+ * detected at the head and the head had to be walked back to a previous record.
+ * The tail of the previous record must now be verified to ensure the torn
+ * writes didn't corrupt the previous tail.
+ *
+ * Return an error if CRC verification fails as recovery cannot proceed.
+ */
+STATIC int
+xlog_verify_tail(
+	struct xlog		*log,
+	xfs_daddr_t		head_blk,
+	xfs_daddr_t		tail_blk)
+{
+	struct xlog_rec_header	*thead;
+	struct xfs_buf		*bp;
+	xfs_daddr_t		first_bad;
+	int			count;
+	int			error = 0;
+	bool			wrapped;
+	xfs_daddr_t		tmp_head;
+
+	bp = xlog_get_bp(log, 1);
+	if (!bp)
+		return -ENOMEM;
+
+	/*
+	 * Seek XLOG_MAX_ICLOGS + 1 records past the current tail record to get
+	 * a temporary head block that points after the last possible
+	 * concurrently written record of the tail.
+	 */
+	count = xlog_seek_logrec_hdr(log, head_blk, tail_blk,
+				     XLOG_MAX_ICLOGS + 1, bp, &tmp_head, &thead,
+				     &wrapped);
+	if (count < 0) {
+		error = count;
+		goto out;
+	}
+
+	/*
+	 * If the call above didn't find XLOG_MAX_ICLOGS + 1 records, we ran
+	 * into the actual log head. tmp_head points to the start of the record
+	 * so update it to the actual head block.
+	 */
+	if (count < XLOG_MAX_ICLOGS + 1)
+		tmp_head = head_blk;
+
+	/*
+	 * We now have a tail and temporary head block that covers at least
+	 * XLOG_MAX_ICLOGS records from the tail. We need to verify that these
+	 * records were completely written. Run a CRC verification pass from
+	 * tail to head and return the result.
+	 */
+	error = xlog_do_recovery_pass(log, tmp_head, tail_blk,
+				      XLOG_RECOVER_CRCPASS, &first_bad);
+
+out:
+	xlog_put_bp(bp);
+	return error;
+}
+
+/*
+ * Detect and trim torn writes from the head of the log.
+ *
+ * Storage without sector atomicity guarantees can result in torn writes in the
+ * log in the event of a crash. Our only means to detect this scenario is via
+ * CRC verification. While we can't always be certain that CRC verification
+ * failure is due to a torn write vs. an unrelated corruption, we do know that
+ * only a certain number (XLOG_MAX_ICLOGS) of log records can be written out at
+ * one time. Therefore, CRC verify up to XLOG_MAX_ICLOGS records at the head of
+ * the log and treat failures in this range as torn writes as a matter of
+ * policy. In the event of CRC failure, the head is walked back to the last good
+ * record in the log and the tail is updated from that record and verified.
+ */
+STATIC int
+xlog_verify_head(
+	struct xlog		*log,
+	xfs_daddr_t		*head_blk,	/* in/out: unverified head */
+	xfs_daddr_t		*tail_blk,	/* out: tail block */
+	struct xfs_buf		*bp,
+	xfs_daddr_t		*rhead_blk,	/* start blk of last record */
+	struct xlog_rec_header	**rhead,	/* ptr to last record */
+	bool			*wrapped)	/* last rec. wraps phys. log */
+{
+	struct xlog_rec_header	*tmp_rhead;
+	struct xfs_buf		*tmp_bp;
+	xfs_daddr_t		first_bad;
+	xfs_daddr_t		tmp_rhead_blk;
+	int			found;
+	int			error;
+	bool			tmp_wrapped;
+
+	/*
+	 * Search backwards through the log looking for the log record header
+	 * block. This wraps all the way back around to the head so something is
+	 * seriously wrong if we can't find it.
+	 */
+	found = xlog_rseek_logrec_hdr(log, *head_blk, *head_blk, 1, bp, rhead_blk,
+				      rhead, wrapped);
+	if (found < 0)
+		return found;
+	if (!found) {
+		xfs_warn(log->l_mp, "%s: couldn't find sync record", __func__);
+		return -EIO;
+	}
+
+	*tail_blk = BLOCK_LSN(be64_to_cpu((*rhead)->h_tail_lsn));
+
+	/*
+	 * Now that we have a tail block, check the head of the log for torn
+	 * writes. Search again until we hit the tail or the maximum number of
+	 * log record I/Os that could have been in flight at one time. Use a
+	 * temporary buffer so we don't trash the rhead/bp pointer from the
+	 * call above.
+	 */
+	tmp_bp = xlog_get_bp(log, 1);
+	if (!tmp_bp)
+		return -ENOMEM;
+	error = xlog_rseek_logrec_hdr(log, *head_blk, *tail_blk,
+				      XLOG_MAX_ICLOGS, tmp_bp, &tmp_rhead_blk,
+				      &tmp_rhead, &tmp_wrapped);
+	xlog_put_bp(tmp_bp);
+	if (error < 0)
+		return error;
+
+	/*
+	 * Now run a CRC verification pass over the records starting at the
+	 * block found above to the current head. If a CRC failure occurs, the
+	 * log block of the first bad record is saved in first_bad.
+	 */
+	error = xlog_do_recovery_pass(log, *head_blk, tmp_rhead_blk,
+				      XLOG_RECOVER_CRCPASS, &first_bad);
+	if (error == -EFSBADCRC && first_bad != *tail_blk) {
+		/*
+		 * We've hit a potential torn write. Reset the error and warn
+		 * about it.
+		 */
+		error = 0;
+		xfs_warn(log->l_mp,
+"WARNING: torn write (CRC failure) detected at block 0x%llx, recovery truncated",
+			 first_bad);
+
+		/*
+		 * Get the header block and buffer pointer for the last good
+		 * record before the bad record.
+		 *
+		 * Note that xlog_find_tail() clears the blocks at the new head
+		 * (i.e., the records with invalid CRC) if the cycle number
+		 * matches the the current cycle.
+		 */
+		found = xlog_rseek_logrec_hdr(log, first_bad, *tail_blk, 1, bp,
+					      rhead_blk, rhead, wrapped);
+		if (found < 0)
+			return found;
+		if (found == 0)		/* XXX: right thing to do here? */
+			return -EIO;
+
+		/*
+		 * Reset the head block to the starting block of the first bad
+		 * log record and set the tail block based on the last good
+		 * record.
+		 */
+		*head_blk = first_bad;
+		*tail_blk = BLOCK_LSN(be64_to_cpu((*rhead)->h_tail_lsn));
+
+		/*
+		 * Now verify the tail based on the updated head. This is
+		 * required because the torn writes trimmed from the head could
+		 * have been written over the tail of a previous record. Return
+		 * any errors since recovery cannot proceed if the tail is
+		 * corrupt.
+		 *
+		 * XXX: This leaves a gap in truly robust protection from torn
+		 * writes in the log. If the head is behind the tail, the tail
+		 * pushes forward to create some space and then a crash occurs
+		 * causing the writes into the previous record's tail region to
+		 * tear, log recovery isn't able to recover.
+		 *
+		 * How likely is this to occur? If possible, can we do something
+		 * more intelligent here? Is it safe to push the tail forward if
+		 * we can determine that the tail is within the range of the
+		 * torn write (e.g., the kernel can only overwrite the tail if
+		 * it has actually been pushed forward)? Alternatively, could we
+		 * somehow prevent this condition at runtime?
+		 */
+		error = xlog_verify_tail(log, *head_blk, *tail_blk);
+	}
+
+	return error;
+}
+
+/*
  * Find the sync block number or the tail of the log.
  *
  * This will be the block number of the last record to have its
@@ -966,9 +1233,10 @@ xlog_find_tail(
 	xlog_op_header_t	*op_head;
 	char			*offset = NULL;
 	xfs_buf_t		*bp;
-	int			error, i, found;
+	int			error;
 	xfs_daddr_t		umount_data_blk;
 	xfs_daddr_t		after_umount_blk;
+	xfs_daddr_t		rhead_blk;
 	xfs_lsn_t		tail_lsn;
 	int			hblks;
 	bool			wrapped = false;
@@ -995,24 +1263,16 @@ xlog_find_tail(
 	}
 
 	/*
-	 * Search backwards through the log looking for the log record header
-	 * block. This wraps all the way back around to the head so something is
-	 * seriously wrong if we can't find it.
+	 * Trim the head block back to skip over torn records. We can have
+	 * multiple log I/Os in flight at any time, so we assume CRC failures
+	 * back through the previous several records are torn writes and skip
+	 * them.
 	 */
 	ASSERT(*head_blk < INT_MAX);
-	found = xlog_rseek_logrec_hdr(log, *head_blk, *head_blk, 1, bp, &i,
-				      &rhead, &wrapped);
-	if (found < 0) {
-		error = found;
+	error = xlog_verify_head(log, head_blk, tail_blk, bp, &rhead_blk,
+				 &rhead, &wrapped);
+	if (error)
 		goto done;
-	}
-	if (!found) {
-		xfs_warn(log->l_mp, "%s: couldn't find sync record", __func__);
-		xlog_put_bp(bp);
-		ASSERT(0);
-		return -EIO;
-	}
-	*tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn));
 
 	/*
 	 * Reset log values according to the state of the log when we
@@ -1024,7 +1284,7 @@ xlog_find_tail(
 	 * written was complete and ended exactly on the end boundary
 	 * of the physical log.
 	 */
-	log->l_prev_block = i;
+	log->l_prev_block = rhead_blk;
 	log->l_curr_block = (int)*head_blk;
 	log->l_curr_cycle = be32_to_cpu(rhead->h_cycle);
 	if (wrapped)
@@ -1062,12 +1322,12 @@ xlog_find_tail(
 	} else {
 		hblks = 1;
 	}
-	after_umount_blk = (i + hblks + (int)
+	after_umount_blk = (rhead_blk + hblks + (int)
 		BTOBB(be32_to_cpu(rhead->h_len))) % log->l_logBBsize;
 	tail_lsn = atomic64_read(&log->l_tail_lsn);
 	if (*head_blk == after_umount_blk &&
 	    be32_to_cpu(rhead->h_num_logops) == 1) {
-		umount_data_blk = (i + hblks) % log->l_logBBsize;
+		umount_data_blk = (rhead_blk + hblks) % log->l_logBBsize;
 		error = xlog_bread(log, umount_data_blk, 1, bp, &offset);
 		if (error)
 			goto done;
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 8/8] xfs: debug mode log recovery crc error injection
  2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
                   ` (6 preceding siblings ...)
  2015-11-09 20:21 ` [PATCH 7/8] xfs: detect and trim torn writes during log recovery Brian Foster
@ 2015-11-09 20:21 ` Brian Foster
  7 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-09 20:21 UTC (permalink / raw)
  To: xfs

XFS now uses CRC verification over a limited section of the log to
detect torn writes prior to a crash. This is difficult to test directly
due to the timing and hardware requirements to cause a short write.

Add a DEBUG mode error injection mechanism to facilitate testing torn
write detection. The mechanism injects CRC errors for random records in
the range of the log that is scanned for torn writes. Corruptions are
simulated in-core only and not generated on-disk. However, the
subsequent log head truncation and partial recovery can and will cause
permanent data loss. Further, error injection is global in nature and
will inject errors even when the log is clean to allow for test coverage
for torn writes of unmount records.

This option is dangerous and is for development and testing purposes
only. Error injection can be enabled and disabled via:

	/sys/fs/xfs/debug/log_recovery_crc_fail

... on CONFIG_XFS_DEBUG enabled kernels only.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_globals.c     |  1 +
 fs/xfs/xfs_log_recover.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_sysctl.h      |  1 +
 fs/xfs/xfs_sysfs.c       | 31 ++++++++++++++++
 4 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c
index 4d41b24..0f5a242 100644
--- a/fs/xfs/xfs_globals.c
+++ b/fs/xfs/xfs_globals.c
@@ -46,4 +46,5 @@ xfs_param_t xfs_params = {
 
 struct xfs_globals xfs_globals = {
 	.log_recovery_delay	=	0,	/* no delay by default */
+	.log_recovery_crc_fail	=	0,	/* error injection default */
 };
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 83fd7ca..eb088d2 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1078,6 +1078,82 @@ out:
 }
 
 /*
+ * Inject log record CRC failures during recovery. This facilitates testing of
+ * the recovery mechanism for short writes to the log in the event of a crash.
+ *
+ * Note that this is dangerous and for development and testing purposes only.
+ * This can only be enabled via the DEBUG mode log_recovery_crc_fail sysfs
+ * option.
+ */
+STATIC int
+xlog_debug_crc_failure(
+	struct xlog		*log,
+	xfs_daddr_t		head_blk,
+	xfs_daddr_t		tail_blk,
+	int			total,
+	xfs_daddr_t		*first_bad)
+{
+	struct xlog_rec_header	*tmp_rhead;
+	struct xfs_buf		*tmp_bp;
+	xfs_daddr_t		tmp_rhead_blk;
+	int			found;
+	int			badcnt;
+	bool			tmp_wrapped;
+
+	ASSERT(xfs_globals.log_recovery_crc_fail);
+
+	/*
+	 * In practice, a clean log ends with an unmount record with a tail_lsn
+	 * that points to the previous record. The reverse scan from the head
+	 * block thus finds two record headers (e.g., total == 2). This means
+	 * that errors are injected even after clean unmounts in order to
+	 * simulate torn writes of unmount records.
+	 *
+	 * Note: increase this value to 3 or larger to prevent error injection
+	 * on clean mounts.
+	 */
+	if (total < 2)
+		return 0;
+
+	/*
+	 * Randomly determine which record to flag as corrupt.
+	 */
+	badcnt = prandom_u32() % total;
+	if (!badcnt)
+		return 0;
+
+	tmp_bp = xlog_get_bp(log, 1);
+	if (!tmp_bp)
+		return -ENOMEM;
+
+	/*
+	 * Locate the Nth record header block back from the head block. Set
+	 * first_bad to point to this record and return a CRC error to flag it
+	 * as corrupt.
+	 *
+	 * For example, suppose 8 records are written at the head of the log and
+	 * badcnt is set to 3. This locates the 3rd record back from the current
+	 * head block and calls it corrupt. In response, the log recovery CRC
+	 * error handling code truncates these last 3 records from the head,
+	 * verifies the new tail based on the truncated head and attempts
+	 * recovery up through the first 5 records.
+	 */
+	found = xlog_rseek_logrec_hdr(log, head_blk, tail_blk, badcnt, tmp_bp,
+				      &tmp_rhead_blk, &tmp_rhead, &tmp_wrapped);
+	xlog_put_bp(tmp_bp);
+	if (found < 0)
+		return found;
+	if (found != badcnt)
+		return 0;
+
+	xfs_warn(log->l_mp, "Generated CRC error at log block 0x%llx "
+		 "(%d bad records out of %d)", tmp_rhead_blk, badcnt, total);
+
+	*first_bad = tmp_rhead_blk;
+	return -EFSBADCRC;
+}
+
+/*
  * Detect and trim torn writes from the head of the log.
  *
  * Storage without sector atomicity guarantees can result in torn writes in the
@@ -1134,12 +1210,12 @@ xlog_verify_head(
 	tmp_bp = xlog_get_bp(log, 1);
 	if (!tmp_bp)
 		return -ENOMEM;
-	error = xlog_rseek_logrec_hdr(log, *head_blk, *tail_blk,
+	found = xlog_rseek_logrec_hdr(log, *head_blk, *tail_blk,
 				      XLOG_MAX_ICLOGS, tmp_bp, &tmp_rhead_blk,
 				      &tmp_rhead, &tmp_wrapped);
 	xlog_put_bp(tmp_bp);
-	if (error < 0)
-		return error;
+	if (found < 0)
+		return found;
 
 	/*
 	 * Now run a CRC verification pass over the records starting at the
@@ -1148,6 +1224,16 @@ xlog_verify_head(
 	 */
 	error = xlog_do_recovery_pass(log, *head_blk, tmp_rhead_blk,
 				      XLOG_RECOVER_CRCPASS, &first_bad);
+
+	/*
+	 * Run DEBUG mode CRC error injection if it is enabled and we haven't
+	 * legitimately failed. XXX: This is dangerous! For testing purposes
+	 * only!
+	 */
+	if (!error && xfs_globals.log_recovery_crc_fail)
+		error = xlog_debug_crc_failure(log, *head_blk, tmp_rhead_blk,
+					       found, &first_bad);
+
 	if (error == -EFSBADCRC && first_bad != *tail_blk) {
 		/*
 		 * We've hit a potential torn write. Reset the error and warn
diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
index ffef453..54a6f53 100644
--- a/fs/xfs/xfs_sysctl.h
+++ b/fs/xfs/xfs_sysctl.h
@@ -94,6 +94,7 @@ extern xfs_param_t	xfs_params;
 
 struct xfs_globals {
 	int	log_recovery_delay;	/* log recovery delay (secs) */
+	int	log_recovery_crc_fail;	/* generate log crc errors */
 };
 extern struct xfs_globals	xfs_globals;
 
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index ee70f5d..ab0c01b 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -116,8 +116,39 @@ log_recovery_delay_show(
 }
 XFS_SYSFS_ATTR_RW(log_recovery_delay);
 
+STATIC ssize_t
+log_recovery_crc_fail_store(
+	struct kobject	*kobject,
+	const char	*buf,
+	size_t		count)
+{
+	int		ret;
+	int		val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (val < 0 || val > 1)
+		return -EINVAL;
+
+	xfs_globals.log_recovery_crc_fail = val;
+
+	return count;
+}
+
+STATIC ssize_t
+log_recovery_crc_fail_show(
+	struct kobject	*kobject,
+	char		*buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.log_recovery_crc_fail);
+}
+XFS_SYSFS_ATTR_RW(log_recovery_crc_fail);
+
 static struct attribute *xfs_dbg_attrs[] = {
 	ATTR_LIST(log_recovery_delay),
+	ATTR_LIST(log_recovery_crc_fail),
 	NULL,
 };
 
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 7/8] xfs: detect and trim torn writes during log recovery
  2015-11-09 20:21 ` [PATCH 7/8] xfs: detect and trim torn writes during log recovery Brian Foster
@ 2015-11-10 13:45   ` Brian Foster
  0 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-10 13:45 UTC (permalink / raw)
  To: xfs

On Mon, Nov 09, 2015 at 03:21:14PM -0500, Brian Foster wrote:
> Certain types of storage, such as persistent memory, do not provide
> sector atomicity for writes. This means that if a crash occurs while XFS
> is writing log records, only part of those records might make it to the
> storage. This is problematic because log recovery uses the cycle value
> packed at the top of each log block to locate the head/tail of the log.
> This can lead to CRC verification failures during log recovery and an
> unmountable fs for a filesystem that is otherwise consistent.
> 
> Update log recovery to incorporate log record CRC verification as part
> of the head/tail discovery process. Once the head is located via the
> traditional algorithm, run a CRC-only pass over the records up to the
> head of the log. If CRC verification fails, assume that the records are
> torn as a matter of policy and trim the head block back to the start of
> the first bad record.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_log_recover.c | 298 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 279 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 1b3f8fe..83fd7ca 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
...
> +STATIC int
> +xlog_verify_head(
> +	struct xlog		*log,
> +	xfs_daddr_t		*head_blk,	/* in/out: unverified head */
> +	xfs_daddr_t		*tail_blk,	/* out: tail block */
> +	struct xfs_buf		*bp,
> +	xfs_daddr_t		*rhead_blk,	/* start blk of last record */
> +	struct xlog_rec_header	**rhead,	/* ptr to last record */
> +	bool			*wrapped)	/* last rec. wraps phys. log */
> +{
> +	struct xlog_rec_header	*tmp_rhead;
> +	struct xfs_buf		*tmp_bp;
> +	xfs_daddr_t		first_bad;
> +	xfs_daddr_t		tmp_rhead_blk;
> +	int			found;
> +	int			error;
> +	bool			tmp_wrapped;
> +
> +	/*
> +	 * Search backwards through the log looking for the log record header
> +	 * block. This wraps all the way back around to the head so something is
> +	 * seriously wrong if we can't find it.
> +	 */
> +	found = xlog_rseek_logrec_hdr(log, *head_blk, *head_blk, 1, bp, rhead_blk,
> +				      rhead, wrapped);
> +	if (found < 0)
> +		return found;
> +	if (!found) {
> +		xfs_warn(log->l_mp, "%s: couldn't find sync record", __func__);
> +		return -EIO;
> +	}
> +
> +	*tail_blk = BLOCK_LSN(be64_to_cpu((*rhead)->h_tail_lsn));
> +
> +	/*
> +	 * Now that we have a tail block, check the head of the log for torn
> +	 * writes. Search again until we hit the tail or the maximum number of
> +	 * log record I/Os that could have been in flight at one time. Use a
> +	 * temporary buffer so we don't trash the rhead/bp pointer from the
> +	 * call above.
> +	 */
> +	tmp_bp = xlog_get_bp(log, 1);
> +	if (!tmp_bp)
> +		return -ENOMEM;
> +	error = xlog_rseek_logrec_hdr(log, *head_blk, *tail_blk,
> +				      XLOG_MAX_ICLOGS, tmp_bp, &tmp_rhead_blk,
> +				      &tmp_rhead, &tmp_wrapped);
> +	xlog_put_bp(tmp_bp);
> +	if (error < 0)
> +		return error;
> +
> +	/*
> +	 * Now run a CRC verification pass over the records starting at the
> +	 * block found above to the current head. If a CRC failure occurs, the
> +	 * log block of the first bad record is saved in first_bad.
> +	 */
> +	error = xlog_do_recovery_pass(log, *head_blk, tmp_rhead_blk,
> +				      XLOG_RECOVER_CRCPASS, &first_bad);
> +	if (error == -EFSBADCRC && first_bad != *tail_blk) {

I started playing around with an alternate error injection mechanism
that actually writes out an invalid crc record, aborts on log write
completion and shuts down the fs. With that, I hit an interesting
problem with the first_bad != *tail_blk check in that we don't fix up
the head if the first record written after a covered log is torn and
points to itself as the tail. We end up failing the mount (which is what
would happen currently, anyways), but this should probably be fixed up.

Unfortunately I didn't sufficiently document why I had that check there.
I suspect it is a remnant of an old version that didn't update the tail
when the head was truncated back (before I fully grokked what was
necessary here), and then ran into issues further along when the head ==
tail. This version obviously updates and verifies the tail once it finds
a valid head so I can probably just kill the check. Perhaps I'll replace
it with a head == tail check after the head/tail update to simply bail
out and let normal recovery try to run.

Brian

> +		/*
> +		 * We've hit a potential torn write. Reset the error and warn
> +		 * about it.
> +		 */
> +		error = 0;
> +		xfs_warn(log->l_mp,
> +"WARNING: torn write (CRC failure) detected at block 0x%llx, recovery truncated",
> +			 first_bad);
> +
> +		/*
> +		 * Get the header block and buffer pointer for the last good
> +		 * record before the bad record.
> +		 *
> +		 * Note that xlog_find_tail() clears the blocks at the new head
> +		 * (i.e., the records with invalid CRC) if the cycle number
> +		 * matches the the current cycle.
> +		 */
> +		found = xlog_rseek_logrec_hdr(log, first_bad, *tail_blk, 1, bp,
> +					      rhead_blk, rhead, wrapped);
> +		if (found < 0)
> +			return found;
> +		if (found == 0)		/* XXX: right thing to do here? */
> +			return -EIO;
> +
> +		/*
> +		 * Reset the head block to the starting block of the first bad
> +		 * log record and set the tail block based on the last good
> +		 * record.
> +		 */
> +		*head_blk = first_bad;
> +		*tail_blk = BLOCK_LSN(be64_to_cpu((*rhead)->h_tail_lsn));
> +
> +		/*
> +		 * Now verify the tail based on the updated head. This is
> +		 * required because the torn writes trimmed from the head could
> +		 * have been written over the tail of a previous record. Return
> +		 * any errors since recovery cannot proceed if the tail is
> +		 * corrupt.
> +		 *
> +		 * XXX: This leaves a gap in truly robust protection from torn
> +		 * writes in the log. If the head is behind the tail, the tail
> +		 * pushes forward to create some space and then a crash occurs
> +		 * causing the writes into the previous record's tail region to
> +		 * tear, log recovery isn't able to recover.
> +		 *
> +		 * How likely is this to occur? If possible, can we do something
> +		 * more intelligent here? Is it safe to push the tail forward if
> +		 * we can determine that the tail is within the range of the
> +		 * torn write (e.g., the kernel can only overwrite the tail if
> +		 * it has actually been pushed forward)? Alternatively, could we
> +		 * somehow prevent this condition at runtime?
> +		 */
> +		error = xlog_verify_tail(log, *head_blk, *tail_blk);
> +	}
> +
> +	return error;
> +}
> +
> +/*
>   * Find the sync block number or the tail of the log.
>   *
>   * This will be the block number of the last record to have its
> @@ -966,9 +1233,10 @@ xlog_find_tail(
>  	xlog_op_header_t	*op_head;
>  	char			*offset = NULL;
>  	xfs_buf_t		*bp;
> -	int			error, i, found;
> +	int			error;
>  	xfs_daddr_t		umount_data_blk;
>  	xfs_daddr_t		after_umount_blk;
> +	xfs_daddr_t		rhead_blk;
>  	xfs_lsn_t		tail_lsn;
>  	int			hblks;
>  	bool			wrapped = false;
> @@ -995,24 +1263,16 @@ xlog_find_tail(
>  	}
>  
>  	/*
> -	 * Search backwards through the log looking for the log record header
> -	 * block. This wraps all the way back around to the head so something is
> -	 * seriously wrong if we can't find it.
> +	 * Trim the head block back to skip over torn records. We can have
> +	 * multiple log I/Os in flight at any time, so we assume CRC failures
> +	 * back through the previous several records are torn writes and skip
> +	 * them.
>  	 */
>  	ASSERT(*head_blk < INT_MAX);
> -	found = xlog_rseek_logrec_hdr(log, *head_blk, *head_blk, 1, bp, &i,
> -				      &rhead, &wrapped);
> -	if (found < 0) {
> -		error = found;
> +	error = xlog_verify_head(log, head_blk, tail_blk, bp, &rhead_blk,
> +				 &rhead, &wrapped);
> +	if (error)
>  		goto done;
> -	}
> -	if (!found) {
> -		xfs_warn(log->l_mp, "%s: couldn't find sync record", __func__);
> -		xlog_put_bp(bp);
> -		ASSERT(0);
> -		return -EIO;
> -	}
> -	*tail_blk = BLOCK_LSN(be64_to_cpu(rhead->h_tail_lsn));
>  
>  	/*
>  	 * Reset log values according to the state of the log when we
> @@ -1024,7 +1284,7 @@ xlog_find_tail(
>  	 * written was complete and ended exactly on the end boundary
>  	 * of the physical log.
>  	 */
> -	log->l_prev_block = i;
> +	log->l_prev_block = rhead_blk;
>  	log->l_curr_block = (int)*head_blk;
>  	log->l_curr_cycle = be32_to_cpu(rhead->h_cycle);
>  	if (wrapped)
> @@ -1062,12 +1322,12 @@ xlog_find_tail(
>  	} else {
>  		hblks = 1;
>  	}
> -	after_umount_blk = (i + hblks + (int)
> +	after_umount_blk = (rhead_blk + hblks + (int)
>  		BTOBB(be32_to_cpu(rhead->h_len))) % log->l_logBBsize;
>  	tail_lsn = atomic64_read(&log->l_tail_lsn);
>  	if (*head_blk == after_umount_blk &&
>  	    be32_to_cpu(rhead->h_num_logops) == 1) {
> -		umount_data_blk = (i + hblks) % log->l_logBBsize;
> +		umount_data_blk = (rhead_blk + hblks) % log->l_logBBsize;
>  		error = xlog_bread(log, umount_data_blk, 1, bp, &offset);
>  		if (error)
>  			goto done;
> -- 
> 2.1.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/8] xfs: return start block of first bad log record during recovery
  2015-11-09 20:21 ` [PATCH 4/8] xfs: return start block of first bad log record during recovery Brian Foster
@ 2015-11-10 15:42   ` Brian Foster
  0 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2015-11-10 15:42 UTC (permalink / raw)
  To: xfs

On Mon, Nov 09, 2015 at 03:21:11PM -0500, Brian Foster wrote:
> Each log recovery pass walks from the tail block to the head block and
> processes records appropriately based on the associated log pass type.
> There are various failure conditions that can occur through this
> sequence, such as I/O errors, CRC errors, etc. Log torn write detection
> will perform CRC verification near the head of the log to detect torn
> writes and trim torn records from the log appropriately.
> 
> As it is, xlog_do_recovery_pass() only returns an error code in the
> event of CRC failure, which isn't enough information to trim the head of
> the log. Update xlog_do_recovery_pass() to optionally return the start
> block of the associated record when an error occurs. This patch contains
> no functional changes.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_log_recover.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index c2bf307..2ef0880 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -4239,10 +4239,12 @@ xlog_do_recovery_pass(
>  	struct xlog		*log,
>  	xfs_daddr_t		head_blk,
>  	xfs_daddr_t		tail_blk,
> -	int			pass)
> +	int			pass,
> +	xfs_daddr_t		*first_bad)	/* out: first bad log rec */
>  {
>  	xlog_rec_header_t	*rhead;
>  	xfs_daddr_t		blk_no;
> +	xfs_daddr_t		rhead_blk;
>  	char			*offset;
>  	xfs_buf_t		*hbp, *dbp;
>  	int			error = 0, h_size, h_len;
> @@ -4251,6 +4253,7 @@ xlog_do_recovery_pass(
>  	struct hlist_head	rhash[XLOG_RHASH_SIZE];
>  
>  	ASSERT(head_blk != tail_blk);
> +	rhead_blk = 0;
>  
>  	/*
>  	 * Read the header of the tail block and get the iclog buffer size from
> @@ -4325,7 +4328,7 @@ xlog_do_recovery_pass(
>  	}
>  
>  	memset(rhash, 0, sizeof(rhash));
> -	blk_no = tail_blk;
> +	blk_no = rhead_blk = tail_blk;
>  	if (tail_blk > head_blk) {
>  		/*
>  		 * Perform recovery around the end of the physical log.
> @@ -4436,7 +4439,9 @@ xlog_do_recovery_pass(
>  						     pass);
>  			if (error)
>  				goto bread_err2;
> +
>  			blk_no += bblks;
> +			rhead_blk = blk_no;
>  		}
>  
>  		ASSERT(blk_no >= log->l_logBBsize);

There's a rewind of blk_no (not shown in this patch) between the above
loop and the subsequent loop to handle wrapping around the end of the
log. We need to update rhead_blk there as well. Otherwise, if the first
record processed in the following loop is bad, rhead_blk can still point
beyond the end of the log and thus throw off all of the recovery bits
that follow.

Brian

> @@ -4464,13 +4469,19 @@ xlog_do_recovery_pass(
>  		error = xlog_recover_process(log, rhash, rhead, offset, pass);
>  		if (error)
>  			goto bread_err2;
> +
>  		blk_no += bblks + hblks;
> +		rhead_blk = blk_no;
>  	}
>  
>   bread_err2:
>  	xlog_put_bp(dbp);
>   bread_err1:
>  	xlog_put_bp(hbp);
> +
> +	if (error && first_bad)
> +		*first_bad = rhead_blk;
> +
>  	return error;
>  }
>  
> @@ -4508,7 +4519,7 @@ xlog_do_log_recovery(
>  		INIT_LIST_HEAD(&log->l_buf_cancel_table[i]);
>  
>  	error = xlog_do_recovery_pass(log, head_blk, tail_blk,
> -				      XLOG_RECOVER_PASS1);
> +				      XLOG_RECOVER_PASS1, NULL);
>  	if (error != 0) {
>  		kmem_free(log->l_buf_cancel_table);
>  		log->l_buf_cancel_table = NULL;
> @@ -4519,7 +4530,7 @@ xlog_do_log_recovery(
>  	 * When it is complete free the table of buf cancel items.
>  	 */
>  	error = xlog_do_recovery_pass(log, head_blk, tail_blk,
> -				      XLOG_RECOVER_PASS2);
> +				      XLOG_RECOVER_PASS2, NULL);
>  #ifdef DEBUG
>  	if (!error) {
>  		int	i;
> -- 
> 2.1.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-11-10 15:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-09 20:21 [PATCH 0/8] xfs: log recovery torn write detection Brian Foster
2015-11-09 20:21 ` [PATCH 1/8] xfs: detect and handle invalid iclog size set by mkfs Brian Foster
2015-11-09 20:21 ` [PATCH 2/8] xfs: refactor log record unpack and data processing Brian Foster
2015-11-09 20:21 ` [PATCH 3/8] xfs: refactor and open code log record crc check Brian Foster
2015-11-09 20:21 ` [PATCH 4/8] xfs: return start block of first bad log record during recovery Brian Foster
2015-11-10 15:42   ` Brian Foster
2015-11-09 20:21 ` [PATCH 5/8] xfs: support a crc verification only log record pass Brian Foster
2015-11-09 20:21 ` [PATCH 6/8] xfs: refactor log record start detection into a new helper Brian Foster
2015-11-09 20:21 ` [PATCH 7/8] xfs: detect and trim torn writes during log recovery Brian Foster
2015-11-10 13:45   ` Brian Foster
2015-11-09 20:21 ` [PATCH RFC 8/8] xfs: debug mode log recovery crc error injection Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox