From: Ric Wheeler
Subject: Re: atomic write & T10 standards
Date: Wed, 03 Jul 2013 14:31:59 -0400
Message-ID: <51D46E1F.1090501@redhat.com>
In-Reply-To: <20130703155400.14981.4222@localhost.localdomain>
To: Chris Mason
Cc: James Bottomley, "Martin K. Petersen", linux-scsi@vger.kernel.org

On 07/03/2013 11:54 AM, Chris Mason wrote:
> Quoting Ric Wheeler (2013-07-03 11:42:38)
>> On 07/03/2013 11:37 AM, James Bottomley wrote:
>>> On Wed, 2013-07-03 at 11:27 -0400, Ric Wheeler wrote:
>>>> On 07/03/2013 11:22 AM, James Bottomley wrote:
>>>>> On Wed, 2013-07-03 at 11:04 -0400, Ric Wheeler wrote:
>>>>>> Why not have the atomic write actually imply that it is atomic and durable for
>>>>>> just that command?
>>>>> I don't understand why you think you need guaranteed durability for
>>>>> every journal transaction? That's what causes us performance problems
>>>>> because we have to pause on every transaction commit.
>>>>>
>>>>> We require durability for explicit flushes, obviously, but we could
>>>>> achieve far better performance if we could just let the filesystem
>>>>> updates stream to the disk and rely on atomic writes making sure the
>>>>> journal entries were all correct.
>>>>> The reason we require durability for journal entries today is to ensure
>>>>> caching effects don't cause the journal to lie or be corrupt.
>>>> Why would we use atomic writes for things that don't need to be
>>>> durable?
>>>>
>>>> Avoiding a torn page write seems to be the only real difference here if
>>>> you use the atomic operations and don't have durability...
>>> It's not just about torn pages: Journal entries are big complex beasts.
>>> They can be megabytes big (at least on xfs). If we can guarantee all or
>>> nothing atomicity in the entire journal entry write it permits a more
>>> streaming design of the filesystem writeout path.
>>>
>>> James
>>>
>> Journals are normally big (128MB or so?) - I don't think that this is unique to xfs.
> We're mixing a bunch of concepts here. The filesystems have a lot of
> different requirements, and atomics are just one small part.
>
> Creating a new file often uses resources freed by past files. So
> deleting the old must be ordered against allocating the new. They are
> really separate atomic units but you can't handle them completely
> independently.
>
>> If our existing journal commit is:
>>
>> * write the data blocks for a transaction
>> * flush
>> * write the commit block for the transaction
>> * flush
>>
>> Which part of this does an atomic write help with?
>>
>> We would still need at least:
>>
>> * atomic write of data blocks & commit blocks
>> * flush
> Yes. But just because we need the flush here doesn't mean we need the
> flush for every single atomic write.
>
> -chris

The catch is that our current flush mechanisms are still pretty brute force and
act either across the whole device or in a temporal (everything flushed before
this is acked) way.

I still think it would be useful to have the atomic write really be atomic and
durable just for that IO - no flush needed.

Can you give a sequence for the use case for the non-durable atomic write that
would not need a sync?
Can we really trust all devices to make something atomic
that is not durable :) ?

thanks!

ric
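The journal commit orderings discussed in this thread can be checked against a toy crash model. This is a hypothetical Python simulation, not kernel, block-layer or SCSI code: WriteCachedDevice, its atomic_write() group write, the block numbers, and the crash model (a crash keeps everything already flushed plus any subset of the still-cached writes) are all assumptions made for illustration, and no real T10 command semantics are modelled. At every possible crash point it asks whether a commit block could be on disk without its data blocks, and separately whether the commit record is guaranteed to survive.

```python
import itertools

# Hypothetical toy model for illustration only; not kernel, block-layer, or T10 code.
class WriteCachedDevice:
    """A disk with a volatile write cache.

    Writes are acknowledged once they reach the cache; only flush() makes them
    durable. Assumed crash model: a crash preserves every flushed block plus any
    subset of the cached write units, since the device may destage in any order.
    """

    def __init__(self):
        self.durable = {}  # block number -> contents that survive a crash
        self.cache = []    # pending units; each unit is a list of (block, data)

    def write(self, block, data):
        # An ordinary single-block write is its own all-or-nothing unit.
        self.cache.append([(block, data)])

    def atomic_write(self, pairs):
        # Hypothetical multi-block atomic write: the whole group lands or none of it.
        self.cache.append(list(pairs))

    def flush(self):
        # Make every cached unit durable, like a full cache flush.
        for unit in self.cache:
            for block, data in unit:
                self.durable[block] = data
        self.cache.clear()

    def possible_crash_states(self):
        """Yield every on-disk image a crash could leave at this moment."""
        for keep_mask in itertools.product((False, True), repeat=len(self.cache)):
            state = dict(self.durable)
            # Kept units are applied in issue order; this journal never writes
            # the same block twice, so the order does not matter here.
            for keep, unit in zip(keep_mask, self.cache):
                if keep:
                    state.update(unit)
            yield state


DATA_BLOCKS = {10: "d0", 11: "d1", 12: "d2"}  # hypothetical journal block numbers
COMMIT_BLOCK = 13


def journal_is_consistent(state):
    # Replay trusts a transaction only if its commit block is on disk, so a
    # commit block without all of its data blocks would make the journal lie.
    if state.get(COMMIT_BLOCK) != "commit":
        return True  # no commit record: replay skips this transaction
    return all(state.get(b) == d for b, d in DATA_BLOCKS.items())


def write_data(dev):
    for block, data in DATA_BLOCKS.items():
        dev.write(block, data)


def write_commit(dev):
    dev.write(COMMIT_BLOCK, "commit")


def atomic_data_and_commit(dev):
    dev.atomic_write(list(DATA_BLOCKS.items()) + [(COMMIT_BLOCK, "commit")])


def flush(dev):
    dev.flush()


def every_crash_point_consistent(steps):
    """Run the steps, checking all possible crash images after each one."""
    dev = WriteCachedDevice()
    for step in steps:
        step(dev)
        if not all(journal_is_consistent(s) for s in dev.possible_crash_states()):
            return False
    return True


def durable_after(steps):
    """True if no crash after the last step can lose the commit record."""
    dev = WriteCachedDevice()
    for step in steps:
        step(dev)
    return all(s.get(COMMIT_BLOCK) == "commit" for s in dev.possible_crash_states())


classic = [write_data, flush, write_commit, flush]
no_barrier = [write_data, write_commit, flush]
atomic_then_flush = [atomic_data_and_commit, flush]
atomic_no_flush = [atomic_data_and_commit]

for name, steps in [("classic", classic), ("no barrier", no_barrier),
                    ("atomic + flush", atomic_then_flush),
                    ("atomic, no flush", atomic_no_flush)]:
    # Prints: name, consistent at every crash point, commit guaranteed durable
    print(name, every_crash_point_consistent(steps), durable_after(steps))
# classic True True
# no barrier False True
# atomic + flush True True
# atomic, no flush True False
```

Under these assumptions the classic write/flush/commit/flush sequence and the atomic-write-then-flush sequence both keep the journal consistent, while dropping the first flush lets a commit block reach the disk ahead of its data. An atomic write with no flush after it never tears the transaction but is not durable: a crash can still lose the whole thing, which is the atomic-versus-durable distinction being argued about.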