* [PATCH 0/2] xfs: CRCs for log buffers
@ 2012-11-07 13:37 Dave Chinner
2012-11-07 13:37 ` [PATCH 1/2] xfs: add CRC infrastructure Dave Chinner
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Dave Chinner @ 2012-11-07 13:37 UTC (permalink / raw)
To: xfs
These patches introduce the first little piece of the CRC picture.
The first patch introduces the calculation and checking functions,
as well as the superblock feature bit for CRCs. The superblock bit
is not set anywhere, not is it really needed for 3.8. There's no
real harm in introducing it now, and doing so means that the log
code can demonstrate how it will differentiate between advisory
warnings and fatal errors on CRC mistmatches during recovery.
The second patch converts the log checksum code to use the CRCs and
enables it for *all* filesystems. This can be done because the log
header already has a CRC field in it, and for production kernels it
is guaranteed to be zeroed. Hence for production kernels, only
issuing a CRC mistmatch warning when the log header CRC field is non
zero means that people can upgrade to a kernel with this
functionality and not see any CRC mismatch warnings.
Warnings look like:
XFS (vda): log record CRC mismatch: found 0xa05866c2, expected 0xd9290110.
ffffc90001088000: 00 00 00 14 00 00 00 00 69 01 00 00 6e 14 a5 3d ........i...n..=
ffffc90001088010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00 ....i...NART*...
The only issue with this is that filesystems that were not cleanly
shut down on debug kernels will throw CRC mismatch warnings the
first time the are recovered after mount. After the first mount on
upgrade, the warnings won't happen again unless you downgrade to an
older debug kernel. I don't see this as a major problem - debug
kernels are not used in production, and anyone using a debug kernel
should be following this mailing list. ;)
Anyway, the overhead is negliable - I don't see any measurable
impact on metadata heavy operations (cpu verhead or performance),
and the benefits of even advisory warnings on production kernels are
of significant benefit. e.g. the recent log buffer wrap recovery
problem would have triggered a CRC mismatch warning long before
the bad client id error was detected....
As such, I'd really like to have this in the 3.8 kernel - it gets
the initial CRC code more testing, and provides us with an immediate
integrity benefit and important debug information when log recovery
problems are reported (i.e. we know definitely that the log is or
isn't corrupted). I think the risk is rather small that it will
cause problems, and the worst it can cause is scary looking noise
in the logs.
Comments?
Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread* [PATCH 1/2] xfs: add CRC infrastructure 2012-11-07 13:37 [PATCH 0/2] xfs: CRCs for log buffers Dave Chinner @ 2012-11-07 13:37 ` Dave Chinner 2012-11-07 16:21 ` Andi Kleen ` (2 more replies) 2012-11-07 13:37 ` [PATCH 2/2] xfs: add CRC checks to the log Dave Chinner ` (2 subsequent siblings) 3 siblings, 3 replies; 16+ messages in thread From: Dave Chinner @ 2012-11-07 13:37 UTC (permalink / raw) To: xfs From: Christoph Hellwig <hch@lst.de> - add a mount feature bit for CRC enabled filesystems - add some helpers for generating and verifying the CRCs - add a copy_uuid helper The checksumming helpers are losely based on similar ones in sctp, all other bits come from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/Kconfig | 1 + fs/xfs/uuid.h | 6 +++++ fs/xfs/xfs_cksum.h | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_linux.h | 1 + fs/xfs/xfs_sb.h | 10 ++++++++- 5 files changed, 79 insertions(+), 1 deletion(-) create mode 100644 fs/xfs/xfs_cksum.h diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig index 6100ec0..5a7ffe5 100644 --- a/fs/xfs/Kconfig +++ b/fs/xfs/Kconfig @@ -2,6 +2,7 @@ config XFS_FS tristate "XFS filesystem support" depends on BLOCK select EXPORTFS + select LIBCRC32C help XFS is a high performance journaling filesystem which originated on the SGI IRIX platform. It is completely multi-threaded, can diff --git a/fs/xfs/uuid.h b/fs/xfs/uuid.h index 4732d71..104db0f 100644 --- a/fs/xfs/uuid.h +++ b/fs/xfs/uuid.h @@ -26,4 +26,10 @@ extern int uuid_is_nil(uuid_t *uuid); extern int uuid_equal(uuid_t *uuid1, uuid_t *uuid2); extern void uuid_getnodeuniq(uuid_t *uuid, int fsid [2]); +static inline void +uuid_copy(uuid_t *dst, uuid_t *src) +{ + memcpy(dst, src, sizeof(uuid_t)); +} + #endif /* __XFS_SUPPORT_UUID_H__ */ diff --git a/fs/xfs/xfs_cksum.h b/fs/xfs/xfs_cksum.h new file mode 100644 index 0000000..623dd7b --- /dev/null +++ b/fs/xfs/xfs_cksum.h @@ -0,0 +1,62 @@ +#ifndef _XFS_CKSUM_H +#define _XFS_CKSUM_H 1 + +#define XFS_CRC_SEED (~(__uint32_t)0) + +/* + * Calculate the intermediate checksum for a buffer that has the CRC field + * inside it. The offset of the 32bit crc fields is passed as the + * cksum_offset parameter. + */ +static inline __uint32_t +xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) +{ + __uint32_t zero = 0; + __uint32_t crc; + + /* Calculate CRC up to the checksum. */ + crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); + + /* Skip checksum field */ + crc = crc32c(crc, &zero, sizeof(__u32)); + + /* Calculate the rest of the CRC. */ + return crc32c(crc, &buffer[cksum_offset + sizeof(__be32)], + length - (cksum_offset + sizeof(__be32))); +} + +/* + * Convert the intermediate checksum to the final ondisk format. + * + * Note that crc32c is already endianess agnostic, so no additional + * byte swap is needed. + */ +static inline __be32 +xfs_end_cksum(__uint32_t crc) +{ + return (__force __be32)~crc; +} + +/* + * Helper to generate the checksum for a buffer. + */ +static inline void +xfs_update_cksum(char *buffer, size_t length, unsigned long cksum_offset) +{ + __uint32_t crc = xfs_start_cksum(buffer, length, cksum_offset); + + *(__be32 *)(buffer + cksum_offset) = xfs_end_cksum(crc); +} + +/* + * Helper to verify the checksum for a buffer. + */ +static inline int +xfs_verify_cksum(char *buffer, size_t length, unsigned long cksum_offset) +{ + __uint32_t crc = xfs_start_cksum(buffer, length, cksum_offset); + + return *(__be32 *)(buffer + cksum_offset) == xfs_end_cksum(crc); +} + +#endif /* _XFS_CKSUM_H */ diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index 828662f..6e288e4 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -44,6 +44,7 @@ #include <linux/kernel.h> #include <linux/blkdev.h> #include <linux/slab.h> +#include <linux/crc32c.h> #include <linux/module.h> #include <linux/mutex.h> #include <linux/file.h> diff --git a/fs/xfs/xfs_sb.h b/fs/xfs/xfs_sb.h index f429d9d..666e89c 100644 --- a/fs/xfs/xfs_sb.h +++ b/fs/xfs/xfs_sb.h @@ -81,11 +81,13 @@ struct xfs_mount; #define XFS_SB_VERSION2_ATTR2BIT 0x00000008 /* Inline attr rework */ #define XFS_SB_VERSION2_PARENTBIT 0x00000010 /* parent pointers */ #define XFS_SB_VERSION2_PROJID32BIT 0x00000080 /* 32 bit project id */ +#define XFS_SB_VERSION2_CRCBIT 0x00000100 /* metadata CRCs */ #define XFS_SB_VERSION2_OKREALFBITS \ (XFS_SB_VERSION2_LAZYSBCOUNTBIT | \ XFS_SB_VERSION2_ATTR2BIT | \ - XFS_SB_VERSION2_PROJID32BIT) + XFS_SB_VERSION2_PROJID32BIT | \ + XFS_SB_VERSION2_CRCBIT) #define XFS_SB_VERSION2_OKSASHFBITS \ (0) #define XFS_SB_VERSION2_OKREALBITS \ @@ -503,6 +505,12 @@ static inline int xfs_sb_version_hasprojid32bit(xfs_sb_t *sbp) (sbp->sb_features2 & XFS_SB_VERSION2_PROJID32BIT); } +static inline int xfs_sb_version_hascrc(xfs_sb_t *sbp) +{ + return (xfs_sb_version_hasmorebits(sbp) && + (sbp->sb_features2 & XFS_SB_VERSION2_CRCBIT)); +} + /* * end of superblock version macros */ -- 1.7.10 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-07 13:37 ` [PATCH 1/2] xfs: add CRC infrastructure Dave Chinner @ 2012-11-07 16:21 ` Andi Kleen 2012-11-07 20:31 ` Dave Chinner 2012-11-08 14:42 ` Mark Tinguely 2012-11-09 22:09 ` Mark Tinguely 2 siblings, 1 reply; 16+ messages in thread From: Andi Kleen @ 2012-11-07 16:21 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs Dave Chinner <david@fromorbit.com> writes: > + */ > +static inline __uint32_t > +xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) > +{ > + __uint32_t zero = 0; > + __uint32_t crc; > + > + /* Calculate CRC up to the checksum. */ > + crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); > + > + /* Skip checksum field */ > + crc = crc32c(crc, &zero, sizeof(__u32)); > + > + /* Calculate the rest of the CRC. */ > + return crc32c(crc, &buffer[cksum_offset + sizeof(__be32)], > + length - (cksum_offset + sizeof(__be32))); Don't you need to remap a zero result to some other value, otherwise a real zero checksum will never be checked? -Andi -- ak@linux.intel.com -- Speaking for myself only _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-07 16:21 ` Andi Kleen @ 2012-11-07 20:31 ` Dave Chinner 2012-11-07 21:39 ` Andi Kleen 0 siblings, 1 reply; 16+ messages in thread From: Dave Chinner @ 2012-11-07 20:31 UTC (permalink / raw) To: Andi Kleen; +Cc: xfs On Wed, Nov 07, 2012 at 08:21:27AM -0800, Andi Kleen wrote: > Dave Chinner <david@fromorbit.com> writes: > > + */ > > +static inline __uint32_t > > +xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) > > +{ > > + __uint32_t zero = 0; > > + __uint32_t crc; > > + > > + /* Calculate CRC up to the checksum. */ > > + crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); > > + > > + /* Skip checksum field */ > > + crc = crc32c(crc, &zero, sizeof(__u32)); > > + > > + /* Calculate the rest of the CRC. */ > > + return crc32c(crc, &buffer[cksum_offset + sizeof(__be32)], > > + length - (cksum_offset + sizeof(__be32))); > > Don't you need to remap a zero result to some other value, otherwise a > real zero checksum will never be checked? Why would that be necessary? We never include the checksum field in the calculation when setting it or verifiying it, and the verification uses exactly the same method as the original calculation to check the CRC, so it doesn't matter if the CRC value is zero or not - if it matches (zero or otherwise), the validation passes.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-07 20:31 ` Dave Chinner @ 2012-11-07 21:39 ` Andi Kleen 2012-11-07 22:28 ` Dave Chinner 0 siblings, 1 reply; 16+ messages in thread From: Andi Kleen @ 2012-11-07 21:39 UTC (permalink / raw) To: Dave Chinner; +Cc: Andi Kleen, xfs > Why would that be necessary? We never include the checksum field in > the calculation when setting it or verifiying it, and the > verification uses exactly the same method as the original > calculation to check the CRC, so it doesn't matter if the CRC value > is zero or not - if it matches (zero or otherwise), the validation > passes.... I thought zero meant the checksum is not there? You stated that somewhere else. -Andi _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-07 21:39 ` Andi Kleen @ 2012-11-07 22:28 ` Dave Chinner 0 siblings, 0 replies; 16+ messages in thread From: Dave Chinner @ 2012-11-07 22:28 UTC (permalink / raw) To: Andi Kleen; +Cc: xfs On Wed, Nov 07, 2012 at 10:39:46PM +0100, Andi Kleen wrote: > > Why would that be necessary? We never include the checksum field in > > the calculation when setting it or verifiying it, and the > > verification uses exactly the same method as the original > > calculation to check the CRC, so it doesn't matter if the CRC value > > is zero or not - if it matches (zero or otherwise), the validation > > passes.... > > I thought zero meant the checksum is not there? You stated that > somewhere else. That's only to avoid spurious warnings when moving from an existing kernel to a new kernel that issues advisory warnings on mismatches. For enforcement, though, when the on-disk format changes so that all metadata is CRC protected, determination of whether zero is a valid CRC value is determined by a superblock feature bit, not a magic CRC mapping value... Hence mapping the zero value just for advisory warnings really doesn't buy us that much other than complexity for this single case. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-07 13:37 ` [PATCH 1/2] xfs: add CRC infrastructure Dave Chinner 2012-11-07 16:21 ` Andi Kleen @ 2012-11-08 14:42 ` Mark Tinguely 2012-11-08 21:22 ` Dave Chinner 2012-11-09 22:09 ` Mark Tinguely 2 siblings, 1 reply; 16+ messages in thread From: Mark Tinguely @ 2012-11-08 14:42 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 11/07/12 07:37, Dave Chinner wrote: > +/* > + * Calculate the intermediate checksum for a buffer that has the CRC field > + * inside it. The offset of the 32bit crc fields is passed as the > + * cksum_offset parameter. > + */ > +static inline __uint32_t > +xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) > +{ > + __uint32_t zero = 0; > + __uint32_t crc; > + > + /* Calculate CRC up to the checksum. */ > + crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); > + > + /* Skip checksum field */ > + crc = crc32c(crc,&zero, sizeof(__u32)); > + > + /* Calculate the rest of the CRC. */ > + return crc32c(crc,&buffer[cksum_offset + sizeof(__be32)], > + length - (cksum_offset + sizeof(__be32))); > +} Why not zero the cksum field and then calculate the crc on the whole buffer? --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-08 14:42 ` Mark Tinguely @ 2012-11-08 21:22 ` Dave Chinner 0 siblings, 0 replies; 16+ messages in thread From: Dave Chinner @ 2012-11-08 21:22 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Thu, Nov 08, 2012 at 08:42:56AM -0600, Mark Tinguely wrote: > On 11/07/12 07:37, Dave Chinner wrote: > >+/* > >+ * Calculate the intermediate checksum for a buffer that has the CRC field > >+ * inside it. The offset of the 32bit crc fields is passed as the > >+ * cksum_offset parameter. > >+ */ > >+static inline __uint32_t > >+xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) > >+{ > >+ __uint32_t zero = 0; > >+ __uint32_t crc; > >+ > >+ /* Calculate CRC up to the checksum. */ > >+ crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); > >+ > >+ /* Skip checksum field */ > >+ crc = crc32c(crc,&zero, sizeof(__u32)); > >+ > >+ /* Calculate the rest of the CRC. */ > >+ return crc32c(crc,&buffer[cksum_offset + sizeof(__be32)], > >+ length - (cksum_offset + sizeof(__be32))); > >+} > > Why not zero the cksum field and then calculate the crc on the whole buffer? Because we use the same calculation function for verification of the CRC and we do not want to zero the CRC field when verifying it. It would make subsequent verification calls on an unmodified buffer fail. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-07 13:37 ` [PATCH 1/2] xfs: add CRC infrastructure Dave Chinner 2012-11-07 16:21 ` Andi Kleen 2012-11-08 14:42 ` Mark Tinguely @ 2012-11-09 22:09 ` Mark Tinguely 2012-11-11 1:26 ` Dave Chinner 2 siblings, 1 reply; 16+ messages in thread From: Mark Tinguely @ 2012-11-09 22:09 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 11/07/12 07:37, Dave Chinner wrote: > From: Christoph Hellwig<hch@lst.de> > > - add a mount feature bit for CRC enabled filesystems > - add some helpers for generating and verifying the CRCs > - add a copy_uuid helper > > The checksumming helpers are losely based on similar ones in sctp, > all other bits come from Dave Chinner. > > Signed-off-by: Christoph Hellwig<hch@lst.de> > Signed-off-by: Dave Chinner<dchinner@redhat.com> > --- > +/* > + * Calculate the intermediate checksum for a buffer that has the CRC field > + * inside it. The offset of the 32bit crc fields is passed as the > + * cksum_offset parameter. > + */ > +static inline __uint32_t > +xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) > +{ > + __uint32_t zero = 0; > + __uint32_t crc; > + > + /* Calculate CRC up to the checksum. */ > + crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); > + > + /* Skip checksum field */ > + crc = crc32c(crc,&zero, sizeof(__u32)); I know this came from sctp and I know I can't convince you to copy/null the *cksum_offset to make one block for those with hardware crc32c devices. Since the *cksum_offset value is never used in creating and verifying the checksum, the 4 bytes of zeros does not add any new information, why not just drop it from the cksum calculation? > + > + /* Calculate the rest of the CRC. */ > + return crc32c(crc,&buffer[cksum_offset + sizeof(__be32)], > + length - (cksum_offset + sizeof(__be32))); > +} > + > +/* > + * Convert the intermediate checksum to the final ondisk format. > + * > + * Note that crc32c is already endianess agnostic, so no additional > + * byte swap is needed. > + */ > +static inline __be32 > +xfs_end_cksum(__uint32_t crc) > +{ > + return (__force __be32)~crc; > +} > + Wouldn't you have to cpu_to_le32() for big endian machines? commit 9c5ff5f75d0d0a1c7928ecfae3f38418b51a88e3 Author: Vlad Yasevich <vladislav.yasevich@hp.com> Date: Thu Jan 22 14:52:23 2009 -0800 sctp: Fix crc32c calculations on big-endian arhes. crc32c algorithm provides a byteswaped result. On little-endian arches, the result ends up in big-endian/network byte order. On big-endinan arches, the result ends up in little-endian order and needs to be byte swapped again. Thus calling cpu_to_le32 gives the right output. Tested-by: Jukka Taimisto <jukka.taimisto@mail.suomi.net> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/include/net/sctp/checksum.h b/include/net/sctp/checksum.h index b799fb2..2fec3c3 100644 --- a/include/net/sctp/checksum.h +++ b/include/net/sctp/checksum.h @@ -79,5 +79,5 @@ static inline __be32 sctp_update_cksum(__u8 *buffer, __u16 length, __be32 crc32) static inline __be32 sctp_end_cksum(__be32 crc32) { - return ~crc32; + return (__force __be32)~cpu_to_le32((__force u32)crc32); } _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-09 22:09 ` Mark Tinguely @ 2012-11-11 1:26 ` Dave Chinner 2012-11-11 19:54 ` Mark Tinguely 0 siblings, 1 reply; 16+ messages in thread From: Dave Chinner @ 2012-11-11 1:26 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Fri, Nov 09, 2012 at 04:09:30PM -0600, Mark Tinguely wrote: > On 11/07/12 07:37, Dave Chinner wrote: > >From: Christoph Hellwig<hch@lst.de> > > > > - add a mount feature bit for CRC enabled filesystems > > - add some helpers for generating and verifying the CRCs > > - add a copy_uuid helper > > > >The checksumming helpers are losely based on similar ones in sctp, > >all other bits come from Dave Chinner. > > > >Signed-off-by: Christoph Hellwig<hch@lst.de> > >Signed-off-by: Dave Chinner<dchinner@redhat.com> > >--- > >+/* > >+ * Calculate the intermediate checksum for a buffer that has the CRC field > >+ * inside it. The offset of the 32bit crc fields is passed as the > >+ * cksum_offset parameter. > >+ */ > >+static inline __uint32_t > >+xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) > >+{ > >+ __uint32_t zero = 0; > >+ __uint32_t crc; > >+ > >+ /* Calculate CRC up to the checksum. */ > >+ crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); > >+ > >+ /* Skip checksum field */ > >+ crc = crc32c(crc,&zero, sizeof(__u32)); > > I know this came from sctp and I know I can't convince you to copy/null > the *cksum_offset to make one block for those with hardware crc32c devices. > > Since the *cksum_offset value is never used in creating and verifying > the checksum, the 4 bytes of zeros does not add any new information, > why not just drop it from the cksum calculation? Because it gives a different CRC value. If we zero the CRC field, and then do an entire block CRC ignoring the location of the CRC, we get the same value as using the above algorithm. While we aren't going to do this type of verification/calculation in the kernel code, there are use cases for it, say in repair, where we don't have to worry about multiple verifications of the object occurring. Hence by making sure we zero the CRC field during the calculation, we retain the flexibility of doing faster, single pass calculation and verification where it makes sense to use it. If we optimise away the zero block for the CRC, then we that flexibility when it comes to implementing other tools that check and recalculate CRC values. > >+ /* Calculate the rest of the CRC. */ > >+ return crc32c(crc,&buffer[cksum_offset + sizeof(__be32)], > >+ length - (cksum_offset + sizeof(__be32))); > >+} > >+ > >+/* > >+ * Convert the intermediate checksum to the final ondisk format. > >+ * > >+ * Note that crc32c is already endianess agnostic, so no additional > >+ * byte swap is needed. > >+ */ > >+static inline __be32 > >+xfs_end_cksum(__uint32_t crc) > >+{ > >+ return (__force __be32)~crc; > >+} > >+ > > Wouldn't you have to cpu_to_le32() for big endian machines? Good catch, I hadn't noticed that fix - it's been quite a while since this particular patch was originally written. So, yeah, it probably does need that fix. FWIW, I don't have a big endian machine immediately handy to test this. Do you? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-11 1:26 ` Dave Chinner @ 2012-11-11 19:54 ` Mark Tinguely 2012-11-11 22:51 ` Dave Chinner 0 siblings, 1 reply; 16+ messages in thread From: Mark Tinguely @ 2012-11-11 19:54 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 11/10/12 19:26, Dave Chinner wrote: > On Fri, Nov 09, 2012 at 04:09:30PM -0600, Mark Tinguely wrote: >> On 11/07/12 07:37, Dave Chinner wrote: >>> From: Christoph Hellwig<hch@lst.de> >>> >>> - add a mount feature bit for CRC enabled filesystems >>> - add some helpers for generating and verifying the CRCs >>> - add a copy_uuid helper >>> >>> The checksumming helpers are losely based on similar ones in sctp, >>> all other bits come from Dave Chinner. >>> >>> Signed-off-by: Christoph Hellwig<hch@lst.de> >>> Signed-off-by: Dave Chinner<dchinner@redhat.com> >>> --- >>> +/* >>> + * Calculate the intermediate checksum for a buffer that has the CRC field >>> + * inside it. The offset of the 32bit crc fields is passed as the >>> + * cksum_offset parameter. >>> + */ >>> +static inline __uint32_t >>> +xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) >>> +{ >>> + __uint32_t zero = 0; >>> + __uint32_t crc; >>> + >>> + /* Calculate CRC up to the checksum. */ >>> + crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); >>> + >>> + /* Skip checksum field */ >>> + crc = crc32c(crc,&zero, sizeof(__u32)); >> >> I know this came from sctp and I know I can't convince you to copy/null >> the *cksum_offset to make one block for those with hardware crc32c devices. >> >> Since the *cksum_offset value is never used in creating and verifying >> the checksum, the 4 bytes of zeros does not add any new information, >> why not just drop it from the cksum calculation? > > Because it gives a different CRC value. If we zero the CRC field, > and then do an entire block CRC ignoring the location of the CRC, we > get the same value as using the above algorithm. While we aren't > going to do this type of verification/calculation in the kernel > code, there are use cases for it, say in repair, where we don't have > to worry about multiple verifications of the object occurring. > > Hence by making sure we zero the CRC field during the calculation, > we retain the flexibility of doing faster, single pass calculation > and verification where it makes sense to use it. If we optimise > away the zero block for the CRC, then we that flexibility when it > comes to implementing other tools that check and recalculate CRC > values. I was mostly thinking about down the road when crc32c offloading is common. The copy/null/checksum/replace model of the routine can be done anytime that it make sense to do so (as long as only one checksum can happen at one time). > >>> + /* Calculate the rest of the CRC. */ >>> + return crc32c(crc,&buffer[cksum_offset + sizeof(__be32)], >>> + length - (cksum_offset + sizeof(__be32))); >>> +} >>> + >>> +/* >>> + * Convert the intermediate checksum to the final ondisk format. >>> + * >>> + * Note that crc32c is already endianess agnostic, so no additional >>> + * byte swap is needed. >>> + */ >>> +static inline __be32 >>> +xfs_end_cksum(__uint32_t crc) >>> +{ >>> + return (__force __be32)~crc; >>> +} >>> + >> >> Wouldn't you have to cpu_to_le32() for big endian machines? > > Good catch, I hadn't noticed that fix - it's been quite a while > since this particular patch was originally written. So, yeah, it > probably does need that fix. > > FWIW, I don't have a big endian machine immediately handy to test > this. Do you? I am sure there is something in the pool. I will ask on Monday. Someone was going to do some testing on a big endian machine before the user space release anyway. I guess I just volunteered. A log recovery test from a different endian machine sounds interesting. > Cheers, > > Dave. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] xfs: add CRC infrastructure 2012-11-11 19:54 ` Mark Tinguely @ 2012-11-11 22:51 ` Dave Chinner 0 siblings, 0 replies; 16+ messages in thread From: Dave Chinner @ 2012-11-11 22:51 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Sun, Nov 11, 2012 at 01:54:28PM -0600, Mark Tinguely wrote: > On 11/10/12 19:26, Dave Chinner wrote: > >On Fri, Nov 09, 2012 at 04:09:30PM -0600, Mark Tinguely wrote: > >>On 11/07/12 07:37, Dave Chinner wrote: > >>>From: Christoph Hellwig<hch@lst.de> > >>> > >>> - add a mount feature bit for CRC enabled filesystems > >>> - add some helpers for generating and verifying the CRCs > >>> - add a copy_uuid helper > >>> > >>>The checksumming helpers are losely based on similar ones in sctp, > >>>all other bits come from Dave Chinner. > >>> > >>>Signed-off-by: Christoph Hellwig<hch@lst.de> > >>>Signed-off-by: Dave Chinner<dchinner@redhat.com> > >>>--- > >>>+/* > >>>+ * Calculate the intermediate checksum for a buffer that has the CRC field > >>>+ * inside it. The offset of the 32bit crc fields is passed as the > >>>+ * cksum_offset parameter. > >>>+ */ > >>>+static inline __uint32_t > >>>+xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset) > >>>+{ > >>>+ __uint32_t zero = 0; > >>>+ __uint32_t crc; > >>>+ > >>>+ /* Calculate CRC up to the checksum. */ > >>>+ crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset); > >>>+ > >>>+ /* Skip checksum field */ > >>>+ crc = crc32c(crc,&zero, sizeof(__u32)); > >> > >>I know this came from sctp and I know I can't convince you to copy/null > >>the *cksum_offset to make one block for those with hardware crc32c devices. > >> > >>Since the *cksum_offset value is never used in creating and verifying > >>the checksum, the 4 bytes of zeros does not add any new information, > >>why not just drop it from the cksum calculation? > > > >Because it gives a different CRC value. If we zero the CRC field, > >and then do an entire block CRC ignoring the location of the CRC, we > >get the same value as using the above algorithm. While we aren't > >going to do this type of verification/calculation in the kernel > >code, there are use cases for it, say in repair, where we don't have > >to worry about multiple verifications of the object occurring. > > > >Hence by making sure we zero the CRC field during the calculation, > >we retain the flexibility of doing faster, single pass calculation > >and verification where it makes sense to use it. If we optimise > >away the zero block for the CRC, then we that flexibility when it > >comes to implementing other tools that check and recalculate CRC > >values. > > I was mostly thinking about down the road when crc32c offloading is > common. The copy/null/checksum/replace model of the routine can be done > anytime that it make sense to do so (as long as only one checksum can > happen at one time). Sure, but it is already common - the crypto framework already has a module for offloading CRC32c to intel CPUs rather than using the generic implementation: The guest: $ grep -A 8 crc32c /proc/crypto name : crc32c driver : crc32c-generic module : kernel priority : 100 refcnt : 2 selftest : passed type : shash blocksize : 1 digestsize : 4 The host: $ $ grep -A 8 crc32c /proc/crypto name : crc32c driver : crc32c-intel module : crc32c_intel priority : 200 refcnt : 2 selftest : passed type : shash blocksize : 1 digestsize : 4 If the hardware offload is to slow for small ranges, then that is up to the crypto driver to deal with (e.g. by not offloading it). > > > > >>>+ /* Calculate the rest of the CRC. */ > >>>+ return crc32c(crc,&buffer[cksum_offset + sizeof(__be32)], > >>>+ length - (cksum_offset + sizeof(__be32))); > >>>+} > >>>+ > >>>+/* > >>>+ * Convert the intermediate checksum to the final ondisk format. > >>>+ * > >>>+ * Note that crc32c is already endianess agnostic, so no additional > >>>+ * byte swap is needed. > >>>+ */ > >>>+static inline __be32 > >>>+xfs_end_cksum(__uint32_t crc) > >>>+{ > >>>+ return (__force __be32)~crc; > >>>+} > >>>+ > >> > >>Wouldn't you have to cpu_to_le32() for big endian machines? > > > >Good catch, I hadn't noticed that fix - it's been quite a while > >since this particular patch was originally written. So, yeah, it > >probably does need that fix. > > > >FWIW, I don't have a big endian machine immediately handy to test > >this. Do you? > > I am sure there is something in the pool. I will ask on Monday. > Someone was going to do some testing on a big endian machine before the > user space release anyway. I guess I just volunteered. A log recovery > test from a different endian machine sounds interesting. Won't work. Log records are written in machine format, not endian neutral. Log recovery can only be done on a machine of the same endianness. Something like the patch below just to output the CRCs in host order will be sufficient to tell us whether they'll end up the same or different on disk without swapping will do the trick. (i.e. if both hosts output the same CRC value, then they'll have different endianess on disk...) Cheers, Dave. -- Dave Chinner david@fromorbit.com --- fs/xfs/xfs_super.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index e0b6863..26bd864 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -51,6 +51,7 @@ #include "xfs_inode_item.h" #include "xfs_icache.h" #include "xfs_trace.h" +#include "xfs_cksum.h" #include <linux/namei.h> #include <linux/init.h> @@ -1504,6 +1505,35 @@ out_destroy_workqueues: goto out_free_sb; } +static void +xfs_crc_test(void) +{ + static char buf[512]; + int i; + __u32 crc; + __u32 vcrc; + __be32 ecrc; + __be32 evcrc; + + for (i = 0; i < 512; i++) { + buf[i] = i & 0xff; + } + + crc = xfs_start_cksum(buf, 512, 64); + ecrc = xfs_end_cksum(crc); + + xfs_warn(NULL, "crc: normal val 0x%x/0x%x, verify = %s", crc, ecrc, + xfs_verify_cksum(buf, 512, 64) ? "good" : "bad"); + + crc = crc32c(XFS_CRC_SEED, buf, 64); + crc = crc32c(crc, &buf[68], 512 - 68); + ecrc = xfs_end_cksum(crc); + + xfs_warn(NULL, "crc: skipped zero val 0x%x/0x%x, verify = %s", crc, ecrc, + xfs_verify_cksum(buf, 512, 64) ? "good" : "bad"); + +} + STATIC struct dentry * xfs_fs_mount( struct file_system_type *fs_type, @@ -1511,6 +1546,7 @@ xfs_fs_mount( const char *dev_name, void *data) { + xfs_crc_test(); return mount_bdev(fs_type, flags, dev_name, data, xfs_fs_fill_super); } _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/2] xfs: add CRC checks to the log 2012-11-07 13:37 [PATCH 0/2] xfs: CRCs for log buffers Dave Chinner 2012-11-07 13:37 ` [PATCH 1/2] xfs: add CRC infrastructure Dave Chinner @ 2012-11-07 13:37 ` Dave Chinner 2012-11-11 19:57 ` Mark Tinguely 2012-11-08 0:21 ` [PATCH 0/2] xfs: CRCs for log buffers Dave Chinner 2012-11-09 0:05 ` Ben Myers 3 siblings, 1 reply; 16+ messages in thread From: Dave Chinner @ 2012-11-07 13:37 UTC (permalink / raw) To: xfs From: Christoph Hellwig <hch@lst.de> Implement CRCs for the log buffers. We re-use a field in struct xlog_rec_header that was used for a weak checksum of the log buffer payload in debug builds before. The new checksumming uses the crc32c checksum we will use elsewhere in XFS, and also protects the record header and addition cycle data. Due to this there are some interesting changes in xlog_sync, as we need to do the cycle wrapping for the split buffer case much earlier, as we would touch the buffer after generating the checksum otherwise. The CRC calculation is always enabled, even for non-CRC filesystems, as adding this CRC does not change the log format. On non-CRC filesystems, only issue an alert if a CRC mismatch is found and allow recovery to continue - this will act as an indicator that log recovery problems are a result of log corruption. On CRC enabled filesystems, however, log recovery will fail. Note that existing debug kernels will write a simple checksum value to the log, so the first time this is run on a filesystem taht was last used on a debug kernel it will through CRC mismatch warning errors. These can be ignored. Initially based on a patch from Dave Chinner, then modified significantly by Christoph Hellwig. Modified again by Dave Chinner to get to this version. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_log.c | 132 ++++++++++++++++++++++++++++++++++++++-------- fs/xfs/xfs_log_priv.h | 11 ++-- fs/xfs/xfs_log_recover.c | 132 ++++++++++++++++++++++------------------------ 3 files changed, 176 insertions(+), 99 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 1d6d2ee..c6d6e13 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -35,6 +35,7 @@ #include "xfs_inode.h" #include "xfs_trace.h" #include "xfs_fsops.h" +#include "xfs_cksum.h" kmem_zone_t *xfs_log_ticket_zone; @@ -1490,6 +1491,84 @@ xlog_grant_push_ail( } /* + * Stamp cycle number in every block + */ +STATIC void +xlog_pack_data( + struct xlog *log, + struct xlog_in_core *iclog, + int roundoff) +{ + int i, j, k; + int size = iclog->ic_offset + roundoff; + __be32 cycle_lsn; + xfs_caddr_t dp; + + cycle_lsn = CYCLE_LSN_DISK(iclog->ic_header.h_lsn); + + dp = iclog->ic_datap; + for (i = 0; i < BTOBB(size); i++) { + if (i >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) + break; + iclog->ic_header.h_cycle_data[i] = *(__be32 *)dp; + *(__be32 *)dp = cycle_lsn; + dp += BBSIZE; + } + + if (xfs_sb_version_haslogv2(&log->l_mp->m_sb)) { + xlog_in_core_2_t *xhdr = iclog->ic_data; + + for ( ; i < BTOBB(size); i++) { + j = i / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); + k = i % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); + xhdr[j].hic_xheader.xh_cycle_data[k] = *(__be32 *)dp; + *(__be32 *)dp = cycle_lsn; + dp += BBSIZE; + } + + for (i = 1; i < log->l_iclog_heads; i++) + xhdr[i].hic_xheader.xh_cycle = cycle_lsn; + } +} + +/* + * Calculate the checksum for a log buffer. + * + * This is a little more complicated than it should be because the various + * headers and the actual data are non-contiguous. + */ +__be32 +xlog_cksum( + struct xlog *log, + struct xlog_rec_header *rhead, + char *dp, + int size) +{ + __uint32_t crc; + + /* first generate the crc for the record header ... */ + crc = xfs_start_cksum((char *)rhead, + sizeof(struct xlog_rec_header), + offsetof(struct xlog_rec_header, h_crc)); + + /* ... then for additional cycle data for v2 logs ... */ + if (xfs_sb_version_haslogv2(&log->l_mp->m_sb)) { + union xlog_in_core2 *xhdr = (union xlog_in_core2 *)rhead; + int i; + + for (i = 1; i < log->l_iclog_heads; i++) { + crc = crc32c(crc, &xhdr[i].hic_xheader, + sizeof(struct xlog_rec_ext_header)); + } + } + + /* ... and finally for the payload */ + crc = crc32c(crc, dp, size); + + return xfs_end_cksum(crc); +} + +/* * The bdstrat callback function for log bufs. This gives us a central * place to trap bufs in case we get hit by a log I/O error and need to * shutdown. Actually, in practice, even when we didn't get a log error, @@ -1549,7 +1628,6 @@ xlog_sync( struct xlog *log, struct xlog_in_core *iclog) { - xfs_caddr_t dptr; /* pointer to byte sized element */ xfs_buf_t *bp; int i; uint count; /* byte count of bwrite */ @@ -1558,6 +1636,7 @@ xlog_sync( int split = 0; /* split write into two regions */ int error; int v2 = xfs_sb_version_haslogv2(&log->l_mp->m_sb); + int size; XFS_STATS_INC(xs_log_writes); ASSERT(atomic_read(&iclog->ic_refcnt) == 0); @@ -1588,13 +1667,10 @@ xlog_sync( xlog_pack_data(log, iclog, roundoff); /* real byte length */ - if (v2) { - iclog->ic_header.h_len = - cpu_to_be32(iclog->ic_offset + roundoff); - } else { - iclog->ic_header.h_len = - cpu_to_be32(iclog->ic_offset); - } + size = iclog->ic_offset; + if (v2) + size += roundoff; + iclog->ic_header.h_len = cpu_to_be32(size); bp = iclog->ic_bp; XFS_BUF_SET_ADDR(bp, BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn))); @@ -1603,12 +1679,36 @@ xlog_sync( /* Do we need to split this write into 2 parts? */ if (XFS_BUF_ADDR(bp) + BTOBB(count) > log->l_logBBsize) { + char *dptr; + split = count - (BBTOB(log->l_logBBsize - XFS_BUF_ADDR(bp))); count = BBTOB(log->l_logBBsize - XFS_BUF_ADDR(bp)); - iclog->ic_bwritecnt = 2; /* split into 2 writes */ + iclog->ic_bwritecnt = 2; + + /* + * Bump the cycle numbers at the start of each block in the + * part of the iclog that ends up in the buffer that gets + * written to the start of the log. + * + * Watch out for the header magic number case, though. + */ + dptr = (char *)&iclog->ic_header + count; + for (i = 0; i < split; i += BBSIZE) { + __uint32_t cycle = be32_to_cpu(*(__be32 *)dptr); + if (++cycle == XLOG_HEADER_MAGIC_NUM) + cycle++; + *(__be32 *)dptr = cpu_to_be32(cycle); + + dptr += BBSIZE; + } } else { iclog->ic_bwritecnt = 1; } + + /* calculcate the checksum */ + iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header, + iclog->ic_datap, size); + bp->b_io_length = BTOBB(count); bp->b_fspriv = iclog; XFS_BUF_ZEROFLAGS(bp); @@ -1662,19 +1762,6 @@ xlog_sync( bp->b_flags |= XBF_SYNCIO; if (log->l_mp->m_flags & XFS_MOUNT_BARRIER) bp->b_flags |= XBF_FUA; - dptr = bp->b_addr; - /* - * Bump the cycle numbers at the start of each block - * since this part of the buffer is at the start of - * a new cycle. Watch out for the header magic number - * case, though. - */ - for (i = 0; i < split; i += BBSIZE) { - be32_add_cpu((__be32 *)dptr, 1); - if (be32_to_cpu(*(__be32 *)dptr) == XLOG_HEADER_MAGIC_NUM) - be32_add_cpu((__be32 *)dptr, 1); - dptr += BBSIZE; - } ASSERT(XFS_BUF_ADDR(bp) <= log->l_logBBsize-1); ASSERT(XFS_BUF_ADDR(bp) + BTOBB(count) <= log->l_logBBsize); @@ -1691,7 +1778,6 @@ xlog_sync( return 0; } /* xlog_sync */ - /* * Deallocate a log structure */ diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 9a4e0e5..0c74e3f 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -139,7 +139,6 @@ static inline uint xlog_get_client_id(__be32 i) /* * Flags for log structure */ -#define XLOG_CHKSUM_MISMATCH 0x1 /* used only during recovery */ #define XLOG_ACTIVE_RECOVERY 0x2 /* in the middle of recovery */ #define XLOG_RECOVERY_NEEDED 0x4 /* log was recovered */ #define XLOG_IO_ERROR 0x8 /* log hit an I/O error, and being @@ -291,7 +290,7 @@ typedef struct xlog_rec_header { __be32 h_len; /* len in bytes; should be 64-bit aligned: 4 */ __be64 h_lsn; /* lsn of this LR : 8 */ __be64 h_tail_lsn; /* lsn of 1st LR w/ buffers not committed: 8 */ - __be32 h_chksum; /* may not be used; non-zero if used : 4 */ + __be32 h_crc; /* crc of log record : 4 */ __be32 h_prev_block; /* block number to previous LR : 4 */ __be32 h_num_logops; /* number of log operations in this LR : 4 */ __be32 h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; @@ -555,11 +554,9 @@ xlog_recover( extern int xlog_recover_finish( struct xlog *log); -extern void -xlog_pack_data( - struct xlog *log, - struct xlog_in_core *iclog, - int); + +extern __be32 xlog_cksum(struct xlog *log, struct xlog_rec_header *rhead, + char *dp, int size); extern kmem_zone_t *xfs_log_ticket_zone; struct xlog_ticket * diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 931e8e2..9c3651c 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -41,6 +41,7 @@ #include "xfs_trans_priv.h" #include "xfs_quota.h" #include "xfs_utils.h" +#include "xfs_cksum.h" #include "xfs_trace.h" #include "xfs_icache.h" @@ -3216,80 +3217,58 @@ xlog_recover_process_iunlinks( mp->m_dmevmask = mp_dmevmask; } - -#ifdef DEBUG -STATIC void -xlog_pack_data_checksum( - struct xlog *log, - struct xlog_in_core *iclog, - int size) -{ - int i; - __be32 *up; - uint chksum = 0; - - up = (__be32 *)iclog->ic_datap; - /* divide length by 4 to get # words */ - for (i = 0; i < (size >> 2); i++) { - chksum ^= be32_to_cpu(*up); - up++; - } - iclog->ic_header.h_chksum = cpu_to_be32(chksum); -} -#else -#define xlog_pack_data_checksum(log, iclog, size) -#endif - /* - * Stamp cycle number in every block + * Upack the log buffer data and crc check it. If the check fails, issue a + * warning if and only if the CRC in the header is non-zero. This makes the + * check an advisory warning, and the zero CRC check will prevent failure + * warnings from being emitted when upgrading the kernel from one that does not + * add CRCs by default. + * + * When filesystems are CRC enabled, this CRC mismatch becomes a fatal log + * corruption failure */ -void -xlog_pack_data( - struct xlog *log, - struct xlog_in_core *iclog, - int roundoff) +STATIC int +xlog_unpack_data_crc( + struct xlog_rec_header *rhead, + xfs_caddr_t dp, + struct xlog *log) { - int i, j, k; - int size = iclog->ic_offset + roundoff; - __be32 cycle_lsn; - xfs_caddr_t dp; - - xlog_pack_data_checksum(log, iclog, size); - - cycle_lsn = CYCLE_LSN_DISK(iclog->ic_header.h_lsn); - - dp = iclog->ic_datap; - for (i = 0; i < BTOBB(size) && - i < (XLOG_HEADER_CYCLE_SIZE / BBSIZE); i++) { - iclog->ic_header.h_cycle_data[i] = *(__be32 *)dp; - *(__be32 *)dp = cycle_lsn; - dp += BBSIZE; - } - - if (xfs_sb_version_haslogv2(&log->l_mp->m_sb)) { - xlog_in_core_2_t *xhdr = iclog->ic_data; - - for ( ; i < BTOBB(size); i++) { - j = i / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - k = i % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - xhdr[j].hic_xheader.xh_cycle_data[k] = *(__be32 *)dp; - *(__be32 *)dp = cycle_lsn; - dp += BBSIZE; + __be32 crc; + + crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len)); + if (crc != rhead->h_crc) { + if (rhead->h_crc || xfs_sb_version_hascrc(&log->l_mp->m_sb)) { + xfs_alert(log->l_mp, + "log record CRC mismatch: found 0x%x, expected 0x%x.\n", + be32_to_cpu(rhead->h_crc), + be32_to_cpu(crc)); + xfs_hex_dump(dp, 32); } - for (i = 1; i < log->l_iclog_heads; i++) { - xhdr[i].hic_xheader.xh_cycle = cycle_lsn; - } + /* + * If we've detected a log record corruption, then we can't + * recover past this point. Abort recovery if we are enforcing + * CRC protection by punting an error back up the stack. + */ + if (xfs_sb_version_hascrc(&log->l_mp->m_sb)) + return EFSCORRUPTED; } + + return 0; } -STATIC void +STATIC int xlog_unpack_data( struct xlog_rec_header *rhead, xfs_caddr_t dp, struct xlog *log) { int i, j, k; + int error; + + error = xlog_unpack_data_crc(rhead, dp, log); + if (error) + return error; for (i = 0; i < BTOBB(be32_to_cpu(rhead->h_len)) && i < (XLOG_HEADER_CYCLE_SIZE / BBSIZE); i++) { @@ -3306,6 +3285,8 @@ xlog_unpack_data( dp += BBSIZE; } } + + return 0; } STATIC int @@ -3437,9 +3418,13 @@ xlog_do_recovery_pass( if (error) goto bread_err2; - xlog_unpack_data(rhead, offset, log); - if ((error = xlog_recover_process_data(log, - rhash, rhead, offset, pass))) + error = xlog_unpack_data(rhead, offset, log); + if (error) + goto bread_err2; + + error = xlog_recover_process_data(log, + rhash, rhead, offset, pass); + if (error) goto bread_err2; blk_no += bblks + hblks; } @@ -3549,9 +3534,14 @@ xlog_do_recovery_pass( if (error) goto bread_err2; } - xlog_unpack_data(rhead, offset, log); - if ((error = xlog_recover_process_data(log, rhash, - rhead, offset, pass))) + + error = xlog_unpack_data(rhead, offset, log); + if (error) + goto bread_err2; + + error = xlog_recover_process_data(log, rhash, + rhead, offset, pass); + if (error) goto bread_err2; blk_no += bblks; } @@ -3576,9 +3566,13 @@ xlog_do_recovery_pass( if (error) goto bread_err2; - xlog_unpack_data(rhead, offset, log); - if ((error = xlog_recover_process_data(log, rhash, - rhead, offset, pass))) + error = xlog_unpack_data(rhead, offset, log); + if (error) + goto bread_err2; + + error = xlog_recover_process_data(log, rhash, + rhead, offset, pass); + if (error) goto bread_err2; blk_no += bblks + hblks; } -- 1.7.10 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/2] xfs: add CRC checks to the log 2012-11-07 13:37 ` [PATCH 2/2] xfs: add CRC checks to the log Dave Chinner @ 2012-11-11 19:57 ` Mark Tinguely 0 siblings, 0 replies; 16+ messages in thread From: Mark Tinguely @ 2012-11-11 19:57 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 11/07/12 07:37, Dave Chinner wrote: > From: Christoph Hellwig<hch@lst.de> > > Implement CRCs for the log buffers. We re-use a field in > struct xlog_rec_header that was used for a weak checksum of the > log buffer payload in debug builds before. > > The new checksumming uses the crc32c checksum we will use elsewhere > in XFS, and also protects the record header and addition cycle data. > > Due to this there are some interesting changes in xlog_sync, as we > need to do the cycle wrapping for the split buffer case much earlier, > as we would touch the buffer after generating the checksum otherwise. > > The CRC calculation is always enabled, even for non-CRC filesystems, > as adding this CRC does not change the log format. On non-CRC > filesystems, only issue an alert if a CRC mismatch is found and > allow recovery to continue - this will act as an indicator that > log recovery problems are a result of log corruption. On CRC enabled > filesystems, however, log recovery will fail. > > Note that existing debug kernels will write a simple checksum value > to the log, so the first time this is run on a filesystem taht was > last used on a debug kernel it will through CRC mismatch warning > errors. These can be ignored. > > Initially based on a patch from Dave Chinner, then modified > significantly by Christoph Hellwig. Modified again by Dave Chinner > to get to this version. > > Signed-off-by: Christoph Hellwig<hch@lst.de> > Signed-off-by: Dave Chinner<dchinner@redhat.com> > --- Looks good. Reviewed-by: Mark Tinguely <tinguely@sgi.com> _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/2] xfs: CRCs for log buffers 2012-11-07 13:37 [PATCH 0/2] xfs: CRCs for log buffers Dave Chinner 2012-11-07 13:37 ` [PATCH 1/2] xfs: add CRC infrastructure Dave Chinner 2012-11-07 13:37 ` [PATCH 2/2] xfs: add CRC checks to the log Dave Chinner @ 2012-11-08 0:21 ` Dave Chinner 2012-11-09 0:05 ` Ben Myers 3 siblings, 0 replies; 16+ messages in thread From: Dave Chinner @ 2012-11-08 0:21 UTC (permalink / raw) To: xfs On Thu, Nov 08, 2012 at 12:37:30AM +1100, Dave Chinner wrote: > These patches introduce the first little piece of the CRC picture. > The first patch introduces the calculation and checking functions, > as well as the superblock feature bit for CRCs. The superblock bit > is not set anywhere, not is it really needed for 3.8. There's no > real harm in introducing it now, and doing so means that the log > code can demonstrate how it will differentiate between advisory > warnings and fatal errors on CRC mistmatches during recovery. > > The second patch converts the log checksum code to use the CRCs and > enables it for *all* filesystems. This can be done because the log > header already has a CRC field in it, and for production kernels it > is guaranteed to be zeroed. Hence for production kernels, only > issuing a CRC mistmatch warning when the log header CRC field is non > zero means that people can upgrade to a kernel with this > functionality and not see any CRC mismatch warnings. > > Warnings look like: > > XFS (vda): log record CRC mismatch: found 0xa05866c2, expected 0xd9290110. > > ffffc90001088000: 00 00 00 14 00 00 00 00 69 01 00 00 6e 14 a5 3d ........i...n..= > ffffc90001088010: 00 00 00 10 69 00 00 00 4e 41 52 54 2a 00 00 00 ....i...NART*... > > > The only issue with this is that filesystems that were not cleanly > shut down on debug kernels will throw CRC mismatch warnings the > first time the are recovered after mount. After the first mount on > upgrade, the warnings won't happen again unless you downgrade to an > older debug kernel. I don't see this as a major problem - debug > kernels are not used in production, and anyone using a debug kernel > should be following this mailing list. ;) > > Anyway, the overhead is negliable - I don't see any measurable > impact on metadata heavy operations (cpu verhead or performance), > and the benefits of even advisory warnings on production kernels are > of significant benefit. e.g. the recent log buffer wrap recovery > problem would have triggered a CRC mismatch warning long before > the bad client id error was detected.... FWIW, perf indicates that the CPU overhead of the CRC overhead when unlinking files at around 60,000/s - a log write rate of about 15MB/s - is: 0.25% [kernel] [k] __crc32c_le When creating files at around 90,000/s, the log write rate is ~40-50MB/s, and the CPU overhead is: 0.74% [kernel] [k] __crc32c_le So roughly linear scaling of about 0.15% of 8p per 10MB/s. On a single CPU, that would be 1% CPU overhead per 10MB/s of log throughput. This was measured inside a VM running on a 2.5 year old 2.26GHz Xeon. The VM does not assert the sse4_2 cpu feature, so can not use the special CPU instruction to calculate the CRC as efficiently as the hardware allows.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/2] xfs: CRCs for log buffers 2012-11-07 13:37 [PATCH 0/2] xfs: CRCs for log buffers Dave Chinner ` (2 preceding siblings ...) 2012-11-08 0:21 ` [PATCH 0/2] xfs: CRCs for log buffers Dave Chinner @ 2012-11-09 0:05 ` Ben Myers 3 siblings, 0 replies; 16+ messages in thread From: Ben Myers @ 2012-11-09 0:05 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs Hi Dave, On Thu, Nov 08, 2012 at 12:37:30AM +1100, Dave Chinner wrote: > These patches introduce the first little piece of the CRC picture. ... > As such, I'd really like to have this in the 3.8 kernel Sounds fine. We'll try and get this in 3.8. Regards, Ben _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2012-11-11 22:49 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-11-07 13:37 [PATCH 0/2] xfs: CRCs for log buffers Dave Chinner 2012-11-07 13:37 ` [PATCH 1/2] xfs: add CRC infrastructure Dave Chinner 2012-11-07 16:21 ` Andi Kleen 2012-11-07 20:31 ` Dave Chinner 2012-11-07 21:39 ` Andi Kleen 2012-11-07 22:28 ` Dave Chinner 2012-11-08 14:42 ` Mark Tinguely 2012-11-08 21:22 ` Dave Chinner 2012-11-09 22:09 ` Mark Tinguely 2012-11-11 1:26 ` Dave Chinner 2012-11-11 19:54 ` Mark Tinguely 2012-11-11 22:51 ` Dave Chinner 2012-11-07 13:37 ` [PATCH 2/2] xfs: add CRC checks to the log Dave Chinner 2012-11-11 19:57 ` Mark Tinguely 2012-11-08 0:21 ` [PATCH 0/2] xfs: CRCs for log buffers Dave Chinner 2012-11-09 0:05 ` Ben Myers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox