* [RFC] ext4 metadata checksumming design
From: Darrick J. Wong @ 2011-08-17 3:25 UTC
To: Theodore Ts'o, Andreas Dilger
Cc: linux-fsdevel, linux-ext4, linux-kernel, Sunil Mushran,
Joel Becker, Mingming Cao, Amir Goldstein, Coly Li, Andi Kleen
Hi all,
I've created a page on the ext4 wiki outlining the patchset that I'm working on
to add metadata checksumming to ext4. The page can be found at this address:
https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
For the most part, the metadata objects in ext4 actually have enough space to
squeeze in a 32-bit checksum; it was trivially easy to find a spot in the
superblock, the extent tree, extended attribute blocks, and the inode. Those
pieces are already done and in my tree, but the patchset as a whole is being
held up by the second class of metadata objects.
That second class of objects is the one that required a bit of work:
- Directory blocks have an "unused" 12-byte directory entry at the very end of
the block; 8 bytes of header are followed by a 32-bit checksum. This can be
taken care of as part of directory rebuilding in e2fsck/rehash.c.
- HTree blocks had to have the dx_entry limit reduced by 1 to accommodate a
checksum. This is also taken care of during e2fsck directory rebuild.
- Extended attribute blocks that are stored in the inode table -- the h_magic
field is written by the kernel, but neither the kernel nor e2fsprogs ever
actually read this field. The field could be reused to checksum the extra
space since (as far as I can tell) EAs are the only user of that empty space.
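To make the directory-block and HTree items a little more concrete, here is a
rough userspace sketch, not the patch itself. The struct and function names
(dirent_tail, dir_block_set_tail, dir_block_get_csum, dx_entry_limit) are
made up for illustration, the file_type placeholder value is an assumption,
and on-disk little-endian conversion is left out. The checksum value is
passed in by the caller, so nothing here depends on which algorithm is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical tail layout: 8 bytes that parse as an unused directory
 * entry header, followed by the 32-bit checksum. Field names are
 * illustrative; byte order is host order in this sketch.
 */
struct dirent_tail {
	uint32_t inode;     /* 0, so the entry reads as unused */
	uint16_t rec_len;   /* 12, so the entry spans exactly the tail */
	uint8_t  name_len;  /* 0, there is no name */
	uint8_t  file_type; /* placeholder; 0 is an assumption */
	uint32_t checksum;  /* checksum supplied by the caller */
};

_Static_assert(sizeof(struct dirent_tail) == 12, "tail must be 12 bytes");

/* Write the fake entry into the last 12 bytes of a directory block. */
static void dir_block_set_tail(uint8_t *block, size_t block_size, uint32_t csum)
{
	struct dirent_tail t = {
		.inode = 0,
		.rec_len = sizeof(struct dirent_tail),
		.name_len = 0,
		.file_type = 0,
		.checksum = csum,
	};

	assert(block_size >= sizeof(t));
	memcpy(block + block_size - sizeof(t), &t, sizeof(t));
}

/* Read the stored checksum back out of the tail. */
static uint32_t dir_block_get_csum(const uint8_t *block, size_t block_size)
{
	struct dirent_tail t;

	assert(block_size >= sizeof(t));
	memcpy(&t, block + block_size - sizeof(t), sizeof(t));
	return t.checksum;
}

#define DX_ENTRY_SIZE 8u /* 32-bit hash + 32-bit block number */

/* Number of dx_entry slots that fit in `space` bytes, one fewer if checksummed. */
static unsigned int dx_entry_limit(size_t space, int has_csum)
{
	unsigned int limit = (unsigned int)(space / DX_ENTRY_SIZE);

	assert(limit > (has_csum ? 1u : 0u));
	return has_csum ? limit - 1 : limit;
}
```

Because the tail parses as an ordinary unused entry, code that walks the
block by rec_len should step over it without knowing about checksums, and
giving up one dx_entry slot frees exactly 8 bytes in an HTree block.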
Other miscellany:
- e2fsprogs had to be converted to always work with ext2_inode_large.
- Various bugs in the htree code....
I hope to have a first draft of the kernel/e2fsprogs patches out on the mailing
list in a week or two, or at least before LPC next month. Still on my todo
list are superblocks, EAs, changing the jbd2 checksum, and rigorous testing on
powerpc.
Please have a look at the design document and please feel free to suggest any
changes.
--D
* Re: [RFC] ext4 metadata checksumming design
From: Andi Kleen @ 2011-08-17 13:57 UTC
To: Darrick J. Wong
Cc: Theodore Ts'o, Andreas Dilger, linux-fsdevel, linux-ext4,
linux-kernel, Sunil Mushran, Joel Becker, Mingming Cao,
Amir Goldstein, Coly Li, Andi Kleen
On Tue, Aug 16, 2011 at 08:25:19PM -0700, Darrick J. Wong wrote:
> Hi all,
>
> I've created a page on the ext4 wiki outlining the patchset that I'm working on
> to add metadata checksumming to ext4. The page can be found at this address:
> https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
Can you summarize the differences to my earlier patchkit?
-Andi
* Re: [RFC] ext4 metadata checksumming design
From: Darrick J. Wong @ 2011-08-17 17:09 UTC
To: Andi Kleen
Cc: Theodore Ts'o, Andreas Dilger, linux-fsdevel, linux-ext4,
linux-kernel, Sunil Mushran, Joel Becker, Mingming Cao,
Amir Goldstein, Coly Li
On Wed, Aug 17, 2011 at 03:57:19PM +0200, Andi Kleen wrote:
> On Tue, Aug 16, 2011 at 08:25:19PM -0700, Darrick J. Wong wrote:
> > Hi all,
> >
> > I've created a page on the ext4 wiki outlining the patchset that I'm working on
> > to add metadata checksumming to ext4. The page can be found at this address:
> > https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
>
> Can you summarize the differences to my earlier patchkit?
My new patchset intends to expand on your earlier inode/superblock patches by
adding checksums to nearly all metadata objects (directory blocks, htree,
extent tree, block/inode bitmaps, extended attributes). It also adds code to
e2fsprogs to verify the checksums, and make the necessary adjustments to the
on-disk format to add space for checksums.
--D
>
> -Andi
* Re: [RFC] ext4 metadata checksumming design
From: Andreas Dilger @ 2011-08-18 6:16 UTC
To: djwong@us.ibm.com
Cc: Theodore Ts'o, Andreas Dilger, linux-fsdevel, linux-ext4,
linux-kernel, Sunil Mushran, Joel Becker, Mingming Cao,
Amir Goldstein, Coly Li, Andi Kleen
On 2011-08-16, at 9:25 PM, "Darrick J. Wong" <djwong@us.ibm.com> wrote:
> - Extended attribute blocks that are stored in the inode table -- the h_magic
> field is written by the kernel, but neither the kernel nor e2fsprogs ever
> actually read this field. The field could be reused to checksum the extra
> space since (as far as I can tell) EAs are the only user of that empty space.
I haven't had a chance to read the document you wrote, but wanted to comment on xattrs. There is a hash field for each xattr (including internal xattrs), and one for the external xattr blocks that can be used to validate the xattr value.
In addition to the hash for the in-inode xattrs, the inode hash itself would serve to validate the xattr values.
I have a patch for e2fsprogs that checks the xattr hash for in-inode xattrs (currently it is always 0).
> Please have a look at the design document and please feel free to suggest any
> changes.
Hopefully soon.
Cheers, Andreas
* Re: [RFC] ext4 metadata checksumming design
From: Darrick J. Wong @ 2011-08-18 18:14 UTC
To: Andreas Dilger
Cc: Theodore Ts'o, Andreas Dilger, linux-fsdevel, linux-ext4,
linux-kernel, Sunil Mushran, Joel Becker, Mingming Cao,
Amir Goldstein, Coly Li, Andi Kleen
On Thu, Aug 18, 2011 at 12:16:00AM -0600, Andreas Dilger wrote:
> On 2011-08-16, at 9:25 PM, "Darrick J. Wong" <djwong@us.ibm.com> wrote:
> > - Extended attribute blocks that are stored in the inode table -- the h_magic
> > field is written by the kernel, but neither the kernel nor e2fsprogs ever
> > actually read this field. The field could be reused to checksum the extra
> > space since (as far as I can tell) EAs are the only user of that empty space.
>
> I haven't had a chance to read the document you wrote, but wanted to comment
> on xattrs. There is a hash field for each xattr (including internal xattrs),
> and one for the external xattr blocks that can be used to validate the xattr
> value.
>
> In addition to the hash for the in-inode xattrs, the inode hash itself would
> serve to validate the xattr values.
>
> I have a patch for e2fsprogs that checks the xattr hash for in-inode xattrs
> (currently it is always 0).
I surveyed the h_hash/e_hash calculation code; it only covers the name and
value fields. Do we care about checksum protection for the extra fields in
struct ext4_xattr_header and struct ext4_xattr_entry? I think it would be
useful to be able to check the sanity of h_refcount and h_blocks. Possibly
that extends to e_value_* as well, though the hash probably covers it. Also,
there's no hardware acceleration available for the xattr hash, though I doubt
xattrs are especially performance sensitive.
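For reference, a minimal userspace sketch of a rotate-and-xor hash of the
kind used for e_hash, to show why the header fields fall outside it. The
struct xattr_sketch and its field names are stand-ins invented for this
example, the shift constants are quoted from memory, and the value is taken
as host-order 32-bit words rather than little-endian on-disk data.

```c
#include <stddef.h>
#include <stdint.h>

#define NAME_HASH_SHIFT  5  /* from memory; treat as an assumption */
#define VALUE_HASH_SHIFT 16 /* from memory; treat as an assumption */

/* Hypothetical, flattened view of one xattr for illustration only. */
struct xattr_sketch {
	uint32_t refcount;     /* stand-in for h_refcount: not fed to the hash */
	uint16_t value_offs;   /* stand-in for e_value_offs: not fed to the hash */
	const uint8_t *name;   /* attribute name bytes */
	size_t name_len;
	const uint32_t *value; /* value padded out to whole 32-bit words */
	size_t value_words;
};

/* Rotate left; only called with shifts strictly between 0 and 32. */
static uint32_t rol32(uint32_t x, unsigned int s)
{
	return (x << s) | (x >> (32 - s));
}

/* Hash the name bytes, then the value words; nothing else is an input. */
static uint32_t xattr_sketch_hash(const struct xattr_sketch *x)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < x->name_len; i++)
		hash = rol32(hash, NAME_HASH_SHIFT) ^ x->name[i];
	for (i = 0; i < x->value_words; i++)
		hash = rol32(hash, VALUE_HASH_SHIFT) ^ x->value[i];
	return hash;
}
```

In this model, corrupting refcount or value_offs leaves the hash unchanged,
which is the gap a separate checksum over the whole header would close.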
--D
>
> > Please have a look at the design document and please feel free to suggest any
> > changes.
>
> Hopefully soon.
>
> Cheers, Andreas
* Re: [RFC] ext4 metadata checksumming design
From: Coly Li @ 2011-08-19 17:46 UTC
To: djwong
Cc: Theodore Ts'o, Andreas Dilger, linux-fsdevel, linux-ext4,
linux-kernel, Sunil Mushran, Joel Becker, Mingming Cao,
Amir Goldstein, Andi Kleen
On 2011-08-17 11:25, Darrick J. Wong wrote:
> Hi all,
>
> I've created a page on the ext4 wiki outlining the patchset that I'm working on
> to add metadata checksumming to ext4. The page can be found at this address:
> https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
>
Hi Darrick,
I just went through the proposal and have no objection to most of the text. There are only a few things I want to confirm:
1) If metadata_csum is disabled on a file system that had it enabled, it would be better to mark, per block group or
per inode, whether the existing (now disabled) checksum is still valid. Then, if people re-enable metadata_csum, we can
save quite a lot of time by not rebuilding checksums for all metadata objects.
2) In no-journal mode, every time we modify a metadata object we may have to take a lock, calculate the checksum,
and release the lock, which may introduce a performance regression. I hope I am just worrying unnecessarily.
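A minimal sketch of the bookkeeping suggested in 1), assuming one validity
bit per block group. Everything here (struct group_csum_state,
group_modified, groups_needing_rebuild) is hypothetical and exists neither
in ext4 nor in the proposal; it only illustrates the idea.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group state; none of these names exist in ext4. */
struct group_csum_state {
	uint32_t csum;       /* last checksum written for this group's metadata */
	bool     csum_valid; /* true while csum still matches the metadata */
};

/* Called whenever a group's metadata changes. */
static void group_modified(struct group_csum_state *g, bool csum_enabled,
			   uint32_t new_csum)
{
	if (csum_enabled) {
		g->csum = new_csum;
		g->csum_valid = true;
	} else {
		/* Feature is off: the stored checksum is now stale. */
		g->csum_valid = false;
	}
}

/* On re-enable, only the groups marked stale need their checksums rebuilt. */
static size_t groups_needing_rebuild(const struct group_csum_state *groups,
				     size_t count)
{
	size_t stale = 0;
	size_t i;

	for (i = 0; i < count; i++)
		if (!groups[i].csum_valid)
			stale++;
	return stale;
}
```

With this, re-enabling the feature costs work proportional to the groups
touched while it was off, rather than to the size of the file system.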
BTW, an engineer on the Taobao kernel team is now trying to count I/O on the different metadata objects at run time.
One of the first efforts is to unify a set of routines to read or dirty metadata object blocks, i.e. something (possibly)
like ext4_read_ext_block(), ext4_read_idx_block(), etc. Then the counting routines can be added inside those metadata
block I/O routines. So far the modification seems non-trivial and needs more study of the code. Anyway,
since you mentioned this on the wiki page, I just wanted to let you know what we are doing now :-)
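A rough userspace sketch of that idea, assuming one counter per metadata
type and a single funnel routine that every typed read goes through. The
names (meta_type, meta_reads, read_meta_block, read_ext_block,
read_idx_block, fake_read) are invented for illustration, loosely mirroring
the two routine names floated above, and fake_read only stands in for real
block I/O.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata categories, one counter each. */
enum meta_type { META_EXT, META_IDX, META_TYPE_COUNT };

static uint64_t meta_reads[META_TYPE_COUNT];

/* Signature of whatever actually fetches a block; returns 0 on success. */
typedef int (*block_read_fn)(uint64_t blocknr, void *buf);

/* Single funnel: count the read attempt, then perform it. */
static int read_meta_block(enum meta_type type, block_read_fn do_read,
			   uint64_t blocknr, void *buf)
{
	meta_reads[type]++;
	return do_read(blocknr, buf);
}

/* Thin typed wrappers, so call sites state which object they touch. */
static int read_ext_block(block_read_fn do_read, uint64_t blocknr, void *buf)
{
	return read_meta_block(META_EXT, do_read, blocknr, buf);
}

static int read_idx_block(block_read_fn do_read, uint64_t blocknr, void *buf)
{
	return read_meta_block(META_IDX, do_read, blocknr, buf);
}

/* Stand-in reader for trying the wrappers without real I/O. */
static int fake_read(uint64_t blocknr, void *buf)
{
	(void)blocknr;
	(void)buf;
	return 0;
}
```

The same funnel would be a natural place to verify a block's checksum after
reading it, which is why the two efforts overlap.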
P.S. The idea of metadata I/O counting is to help us understand the I/O characteristics of our online servers running
Ext4 file systems, which is the basic material for further I/O performance optimization.
Thanks.
--
Coly Li