From: Jörn Engel
Subject: Re: [RFC] TileFS - a proposal for scalable integrity checking
Date: Sun, 29 Apr 2007 14:21:13 +0200
Message-ID: <20070429122112.GA30608@lazybastard.org>
In-Reply-To: <20070428220522.GN11166@waste.org>
To: Matt Mackall
Cc: linux-fsdevel@vger.kernel.org

On Sat, 28 April 2007 17:05:22 -0500, Matt Mackall wrote:
>
> This is a relatively simple scheme for making a filesystem with
> incremental online consistency checks of both data and metadata.
> Overhead can be well under 1% disk space and CPU overhead may also be
> very small, while greatly improving filesystem integrity.

I like it a lot.  So far it appears to solve more problems and cause
fewer new problems than ChunkFS.

> [...]
>
> Divide disk into a bunch of tiles. For each tile, allocate a one
> block tile header that contains (inode, checksum) pairs for each
> block in the tile. Unused blocks get marked inode -1, filesystem
> metadata blocks -2. The first element contains a last-clean
> timestamp, a clean flag and a checksum for the block itself. For 4K
> blocks with 32-bit inode and CRC, that's 512 blocks per tile (2MB),
> with ~.2% overhead.

You should add a 64-bit fpos (file position) field to each entry.  That
allows you to easily check for addressing errors.

> [Note that CRCs are optional so we can cut the overhead in half.
> I choose CRCs here because they're capable of catching the vast
> majority of accidental corruptions at a small cost and mostly serve
> to protect against errors not caught by on-disk ECC (eg cable noise,
> kernel bugs, cosmic rays). Replacing CRCs with a stronger hash like
> SHA-n is perfectly doable.]

The checksum cannot protect against a maliciously prepared medium
anyway, so crypto makes little sense.  CRC32 can provably (if you trust
those who did the proof) detect all 1-, 2- and 3-bit errors and has only
a 1:2^32 chance of missing any remaining error.  That is fairly hard to
improve on.

> Every time we write to a tile, we must mark the tile dirty. To cut
> down time to find dirty tiles, the clean bits can be collected into a
> smaller set of blocks, one clean bitmap block per 64GB of data.

You can and possibly should organize this as a tree, similar to a file.
One bit at the lowest level marks a tile as dirty.  One bit for each
indirect block pointer marks some tiles behind the pointer as dirty.
That scales logarithmically to any filesystem size.

Jörn

-- 
I don't understand it.  Nobody does.
-- Richard P. Feynman
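
The dirty-bit tree suggested above can be modelled in a few lines.  This
is only a toy sketch in Python, not a proposed on-disk format: the name
DirtyTree, the fanout parameter and the use of in-memory lists in place
of bitmap blocks are all assumptions made for illustration.  Level 0
holds one bit per tile; every higher level holds one summary bit per
group of `fanout` bits below it, playing the role of the per-pointer bit
in an indirect block.

```python
class DirtyTree:
    """Toy model of a multi-level dirty bitmap (hypothetical layout)."""

    def __init__(self, num_tiles, fanout=4):
        if num_tiles < 1 or fanout < 2:
            raise ValueError("need num_tiles >= 1 and fanout >= 2")
        self.fanout = fanout
        # levels[0]: one bit per tile.  Each higher level: one summary bit
        # per group of `fanout` bits below, ending in a single root bit.
        self.levels = [[False] * num_tiles]
        while len(self.levels[-1]) > 1:
            size = -(-len(self.levels[-1]) // fanout)  # ceiling division
            self.levels.append([False] * size)

    def mark_dirty(self, tile):
        # Set the tile's bit and the summary bits on its path to the root.
        idx = tile
        for level in self.levels:
            if level[idx]:
                break  # a set bit implies its ancestors are already set
            level[idx] = True
            idx //= self.fanout

    def mark_clean(self, tile):
        # Clear the tile's bit, then clear each summary bit whose whole
        # group has become clean.
        idx = tile
        for level in self.levels:
            level[idx] = False
            start = (idx // self.fanout) * self.fanout
            if any(level[start:start + self.fanout]):
                break  # a sibling is still dirty, so the parent stays set
            idx //= self.fanout

    def dirty_tiles(self):
        # Walk down from the root, descending only below set summary bits.
        found = []
        stack = [(len(self.levels) - 1, 0)]
        while stack:
            depth, idx = stack.pop()
            if not self.levels[depth][idx]:
                continue
            if depth == 0:
                found.append(idx)
                continue
            first = idx * self.fanout
            last = min(first + self.fanout, len(self.levels[depth - 1]))
            stack.extend((depth - 1, child) for child in range(first, last))
        return sorted(found)
```

Marking a tile touches at most one bit per level, and the scan for dirty
tiles skips every subtree whose summary bit is clear, so both costs grow
with the depth of the tree rather than with the number of tiles.  A real
implementation would presumably keep each level in bitmap blocks and
would have to order the bit updates against the tile writes, which this
sketch ignores.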