From mboxrd@z Thu Jan 1 00:00:00 1970
From: Al Viro
Subject: Re: [PATCH 1/2 linux-next] Revert "ufs: fix deadlocks introduced by sb mutex merge"
Date: Tue, 23 Jun 2015 22:56:41 +0100
Message-ID: <20150623215641.GR17109@ZenIV.linux.org.uk>
References: <1432754131-27425-1-git-send-email-fabf@skynet.be>
 <20150527145735.e3d1913bc66426038d53be32@linux-foundation.org>
 <20150604050123.GL7232@ZenIV.linux.org.uk>
 <1122467636.634568.1433521621076.open-xchange@webmail.nmp.proximus.be>
 <20150605185018.GX7232@ZenIV.linux.org.uk>
 <20150605220348.GA14402@ZenIV.linux.org.uk>
 <20150617085715.GC1614@quack.suse.cz>
 <20150617203116.GG17109@ZenIV.linux.org.uk>
 <20150619230739.GO17109@ZenIV.linux.org.uk>
 <20150623164608.GR2427@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Fabian Frederick, Andrew Morton, Alexey Khoroshilov, Ian Campbell,
 Roger Pau Monne, Ian Jackson, xen-devel, Evgeniy Dushistov,
 linux-fsdevel@vger.kernel.org
To: Jan Kara
Return-path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:49092 "EHLO
 ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S932319AbbFWV4w (ORCPT ); Tue, 23 Jun 2015 17:56:52 -0400
Content-Disposition: inline
In-Reply-To: <20150623164608.GR2427@quack.suse.cz>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Tue, Jun 23, 2015 at 06:46:08PM +0200, Jan Kara wrote:
> Looks good to me. BTW also ext4 (with BIGALLOC feature) and OCFS2 can have
> block allocation unit (called cluster) larger than page size. However the
> block size of both filesystems is still <= page size. So at least ext4
> handles fun with partially initialized clusters by just marking parts
> of the cluster as uninitialized in the extent tree. But the code is still
> pretty messy to be honest.

Well, with UFS there's no place on disk to store such "this block is
uninitialized" marks - it uses a bog-standard Unix inode structure.
There are two units - fragments and blocks.
A block is an aligned group of adjacent fragments; the normal ratio is
8:1. A block is at least 4KB (and always a power of two), a fragment is
at least one sector, and the block:fragment ratio is at most 8:1.

The inode structure is normal for a Unix filesystem (12 direct +
indirect + double indirect + triple indirect). Each reference covers a
block worth of file offsets and almost all of them point to full blocks.
Indirects are full blocks as well. A reference to a block is represented
as the number of the first fragment in it (i.e. with normal parameters
bits 0..2 are clear). The block bitmap is actually a fragment bitmap
(i.e. one bit per fragment).

The only situation when a reference is *not* to a full block is the last
reference in a file shorter than 12*block size (i.e. one not requiring
indirects at all). In that case the last direct reference points to less
than a full block (unless the size in fragments is a multiple of the
block:fragment ratio, that is).

One unusual thing is that holes can't extend to EOF - the last byte
*must* be allocated.

(BTW, the only difference between UFS2 and UFS1 in that area is that
fragment numbers are 64-bit now. There had been talk about turning the
block:fragment ratio into a per-inode value, but so far nobody has
implemented that; ->di_blksize is there, but it's never used.)
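
To make the arithmetic above concrete, here is a minimal userspace
sketch in C. It is not the code in fs/ufs; struct geom, NDIRECT,
block_bytes(), is_block_ref() and last_ref_frags() are hypothetical
names made up for illustration, and it assumes the block:fragment ratio
is a power of two. It only shows the alignment of block references and
how many fragments the last reference of a file should cover.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: hypothetical names, not the structures in fs/ufs. */
#define NDIRECT 12	/* direct references in the inode */

struct geom {
	uint32_t frag_size;		/* bytes per fragment, power of two */
	uint32_t frags_per_block;	/* assumed power of two, at most 8 */
};

static uint64_t block_bytes(const struct geom *g)
{
	return (uint64_t)g->frag_size * g->frags_per_block;
}

/*
 * A block reference is the number of the block's first fragment, so
 * its low bits are clear (bits 0..2 with the normal 8:1 ratio).
 */
static int is_block_ref(const struct geom *g, uint64_t fragno)
{
	return (fragno & (g->frags_per_block - 1)) == 0;
}

/*
 * How many fragments the last reference of a file of 'size' bytes
 * points to.  Only a file that needs no indirects can end in a
 * partial block; the last byte is always allocated, so for size > 0
 * the last reference is never a hole.
 */
static uint32_t last_ref_frags(const struct geom *g, uint64_t size)
{
	uint64_t frags, tail;

	assert(size > 0);
	if (size >= NDIRECT * block_bytes(g))
		return g->frags_per_block;
	/* round the size up to whole fragments */
	frags = (size + g->frag_size - 1) / g->frag_size;
	tail = frags & (g->frags_per_block - 1);
	return tail ? (uint32_t)tail : g->frags_per_block;
}
```

With 512-byte fragments and 8 fragments per block, a 5000-byte file
takes 10 fragments: one full block plus a 2-fragment tail, so
last_ref_frags() gives 2 for it. Once the size reaches 12 blocks the
answer is always a full block.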