From mboxrd@z Thu Jan  1 00:00:00 1970
From: Evgeniy Dushistov
Subject: Re: [PATCH 1/5]: ufs: missed brelse and wrong baseblk
Date: Mon, 19 Jun 2006 17:17:50 +0400
Message-ID: <20060619131750.GA14770@rain.homenetwork>
References: <20060617101403.GA22098@rain.homenetwork>
	<20060618162054.GW27946@ftp.linux.org.uk>
	<20060618175045.GX27946@ftp.linux.org.uk>
	<20060619064721.GA6106@rain.homenetwork>
	<20060619073229.GI27946@ftp.linux.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Andrew Morton , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Return-path:
Received: from mx2.mail.ru ([194.67.23.122]:42014 "EHLO mx2.mail.ru")
	by vger.kernel.org with ESMTP id S932441AbWFSNM1 (ORCPT );
	Mon, 19 Jun 2006 09:12:27 -0400
To: Al Viro
Content-Disposition: inline
In-Reply-To: <20060619073229.GI27946@ftp.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Mon, Jun 19, 2006 at 08:32:29AM +0100, Al Viro wrote:
> On Mon, Jun 19, 2006 at 10:47:21AM +0400, Evgeniy Dushistov wrote:
> > On Sun, Jun 18, 2006 at 06:50:45PM +0100, Al Viro wrote:
> > > * block may be bigger than page. That can cause all sorts of fun
> > > problems in interaction with our VM, since allocation can affect more than
> > > one page and that has to be taken into account.
> >
> > In fact this is not a problem. Blocks in terms of linux VFS
> > is fragments in terms of UFS.
>
> And?
>
> > And if fragment >4096 we just don't mount such file system.
> >
> > So we can easily support 32K blocks.
>
> .... Which means that we have allocation unit (aka UFS block) covering
> several in-core pages. Which means that if you have a file with 8Kb
> hole in the beginning, mmap it shared and dirty the first couple of
> pages, you will get the situation when parallel writeout on those two
> pages will cause trouble. Both times you'll allocate full block (file
> is couple of megabytes long, so forget about partial blocks, relocations,
> etc.)
> And both will try to put the reference to what they'd allocated
> into the same inode as the first direct block.
>
> Do you see the problem?

No, I don't see it. Can you explain in detail how this affects the
current implementation?

In the case of 1k fragments, msync of two pages causes 8 calls of ufs's
get_block_t with create == 1; they will run one after another because of
synchronization. Only the first one allocates a block; all the others
check whether the reference to the block is nonzero and just return the
appropriate value.

See for example ufs_inode_getblock:

	p = (__fs32 *) bh->b_data + block;
	tmp = fs32_to_cpu(sb, *p);
	if (tmp) {
		*phys = tmp + blockoff;
		goto out;
	}

and that's all, nothing will be allocated.

So is this abstract theory about some abstract implementation of UFS,
or do you see some problems in the code?

-- 
/Evgeniy
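
To make the argument above concrete, here is a minimal user-space sketch of the check-then-allocate pattern. It is not the kernel code: model_get_block, direct_block, next_free and the starting fragment number 1000 are all hypothetical names and values chosen for illustration, and the sketch simply assumes the calls are already serialized, as the message claims.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: one 8K UFS block made of eight 1K fragments. */
#define FRAGS_PER_BLOCK 8

/* Hypothetical stand-in for the inode's first direct block pointer; 0 means a hole. */
static uint32_t direct_block;
/* Hypothetical allocator cursor, counted in fragments. */
static uint32_t next_free = 1000;
/* Number of real allocations performed so far. */
static int allocations;

/*
 * Hypothetical model of one get_block_t call with create == 1.
 * Calls are assumed to be serialized by the caller, so no locking is shown.
 */
static uint32_t model_get_block(unsigned frag)
{
	unsigned blockoff = frag % FRAGS_PER_BLOCK;

	assert(frag < FRAGS_PER_BLOCK);	/* the sketch covers the first block only */

	if (direct_block == 0) {
		/* First caller finds a hole and allocates the whole block. */
		direct_block = next_free;
		next_free += FRAGS_PER_BLOCK;
		allocations++;
	}
	/* Every later caller sees a nonzero reference and only adds the offset. */
	return direct_block + blockoff;
}
```

Calling model_get_block for fragments 0 through 7 (two 4K pages of 1K fragments) performs one allocation and maps the remaining seven fragments into the same block. The sketch says nothing about what happens if the calls are not serialized, which is the case raised in the quoted text.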