From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?utf-8?B?SsO2cm4=?= Engel Subject: Re: [RFC] TileFS - a proposal for scalable integrity checking Date: Thu, 10 May 2007 02:03:29 +0200 Message-ID: <20070510000328.GA1257@lazybastard.org> References: <20070428220522.GN11166@waste.org> <20070429232349.GA19937@thunk.org> <20070430014042.GL11115@waste.org> <20070509075638.GJ12859@nifty> <20070509170652.GF11115@waste.org> <20070509185921.GB18778@nifty> <20070509195141.GG11115@waste.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Valerie Henson , Theodore Tso , linux-fsdevel@vger.kernel.org To: Matt Mackall Return-path: Received: from lazybastard.de ([212.112.238.170]:38358 "EHLO longford.lazybastard.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752443AbXEJAHx (ORCPT ); Wed, 9 May 2007 20:07:53 -0400 Content-Disposition: inline In-Reply-To: <20070509195141.GG11115@waste.org> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Wed, 9 May 2007 14:51:41 -0500, Matt Mackall wrote: > On Wed, May 09, 2007 at 11:59:23AM -0700, Valerie Henson wrote: > >=20 > > Hrm. Can you help me understand how you would check i_size then? >=20 > That's pretty straightforward, I think. When we check an inode, we > have to check whether it has a block that corresponds with i_size, an= d > none beyond that. i_size is indeed simple, but that got me thinking. i_blocks is a much harder problem. Luckily it is almost the same problem as the free/used block count for the filesystem. And again the solution would be to hav= e a tree structure and have a sub-total for each node in the tree. Now, inodes already have a tree structure, the indirect blocks. So indirect blocks would need to get an extra field somewhere to store how many used blocks are below them somewhere. Only problem is: indirect blocks have a nice power-of-two size and no spare space around. > That begs the question of when we check various pieces of data. It > seems the best time to check the various elements of an inode is when > we're checking the tile it lives on. This is when we'd check i_size, > that link counts made sense and that the ring of hardlinks was > correct, etc.=20 Yup. Checking i_size costs O(log(n)), i_count with above method is O(log(n)) as well. The hardlink ring is O(number of links). For most people that don't have a forest of hard-linked kernel trees around, tha= t should be fairly small as well. I believe for large files it is important not to check the complete file. We can divide&conquer the physical device, so we can do the same with files. Although I wonder if that would require a dirty bit for inodes as well. > We will, unfortunately, need to be able to check an entire directory > at once. There's no other efficient way to assure that there are no > duplicate names in a directory, for instance. There is. As long as directories are in htree or similar format, that is. Problem is the same as fast lookup. J=C3=B6rn --=20 tglx1 thinks that joern should get a (TM) for "Thinking Is Hard" -- Thomas Gleixner - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel= " in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html