From: "Kyungmin Park"
Subject: Re: Proposal to improve filesystem/block snapshot interaction
Date: Wed, 31 Oct 2007 08:19:58 +0900
Message-ID: <9c9fda240710301619y5a066043ye20d8a97bacb644c@mail.gmail.com>
References: <20070927063113.GD2989@sgi.com>
 <7fe698080710300719ne3ff2b0wb4ebd8ffd75a288a@mail.gmail.com>
 <20071030153707.GC13455@lazybastard.org>
 <200710301737.29007.arnd@arndb.de>
In-Reply-To: <200710301737.29007.arnd@arndb.de>
To: "Arnd Bergmann"
Cc: "Jörn Engel", "Dongjun Shin", "Greg Banks",
 "Linux Filesystem Mailing List", "David Chinner", "Donald Douwsma",
 "Christoph Hellwig", "Roger Strassburg", "Mark Goodwin",
 "Brett Jon Grandbois"
List-Id: linux-fsdevel.vger.kernel.org

On 10/31/07, Arnd Bergmann wrote:
> On Tuesday 30 October 2007, Jörn Engel wrote:
> > On Tue, 30 October 2007 23:19:48 +0900, Dongjun Shin wrote:
> > > On 10/30/07, Arnd Bergmann wrote:
> > > >
> > > > Not sure. Why shouldn't you be able to reorder the hints provided that
> > > > they don't overlap with read/write bios for the same block?
> > >
> > > You're right. The bios can be reordered if they don't overlap with hint.
> >
> > I would keep things simpler. Bios can be reordered, full stop. If an
> > erase and a write overlap, the caller (filesystem?) has to add a
> > barrier.
>
> I thought bios were already ordered if they affect the same blocks.
> Either way, I agree that an erase should not be treated special on
> the bio layer, its ordering should be handled the same way we do it
> for writes.
>

To support the new ATA command (trim, or dataset), the suggested hint is
not enough. We have to send the bio with data (at least one sector),
since the new ATA command requires the dataset information.
We also have to follow the ordering strictly, using barriers or other
methods at the filesystem level.

For example, take the delete operation in ext3:
1. Some file is deleted.
2. ext3_delete_inode() is called.
3. ... -> ext3_free_blocks_sb() releases the free blocks.
4. If the hint is sent here, it breaks the ext3 power-off recovery
   scheme, since the data is trimmed based on the given information and
   is gone after reboot.
5. After the transaction, all dirty pages are flushed. Only after this
   work can we trim the free blocks safely.

Another approach is to modify the block framework. The I/O scheduler
would not merge a hint bio (in my terminology, bio control info) with a
general bio. In this case we also have to consider the reordering
problem. I'm not sure it is possible at this time.

Thank you,
Kyungmin Park
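
The ordering in the ext3 delete example (steps 3 to 5) can be sketched
as a toy model. This is only an illustration, not ext3 or block-layer
code: ToyJournal and its methods are hypothetical names made up for this
sketch, and it assumes a single running transaction. Freed blocks are
only recorded while the transaction is open; the trim hints are issued
once the commit has completed.

```python
class ToyJournal:
    """Toy model (hypothetical, not ext3 code) of deferring trim hints."""

    def __init__(self):
        self.pending_trim = []  # blocks freed in the running transaction
        self.issued_trim = []   # trim hints actually sent to the device

    def free_blocks(self, blocks):
        # Step 3: blocks are released, but the transaction is not durable
        # yet, so the hint is recorded instead of being sent (step 4).
        self.pending_trim.extend(blocks)

    def commit(self):
        # Step 5: the transaction is done and dirty pages are flushed;
        # only now are the freed blocks safe to trim.
        self.issued_trim.extend(self.pending_trim)
        self.pending_trim.clear()

    def crash(self):
        # Power loss before commit: recovery undoes the delete, so the
        # hints recorded for the uncommitted transaction are dropped.
        self.pending_trim.clear()


journal = ToyJournal()
journal.free_blocks([10, 11])   # delete a file: nothing trimmed yet
journal.commit()                # blocks 10 and 11 are trimmed now
journal.free_blocks([20])       # second delete
journal.crash()                 # block 20 is never trimmed
```

The point of the design is that a block which was never committed as
free is never trimmed, so recovery after power-off still finds its data.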