From: Martin Steigerwald
To: linux-xfs@oss.sgi.com
Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
Date: Mon, 15 Dec 2008 19:48:59 +0100
Message-Id: <200812151948.59870.Martin@lichtvoll.de>
In-Reply-To: <18757.33373.744917.457587@tree.ty.sabi.co.uk>

On Sunday 14 December 2008, Peter Grandi wrote:
> [ ... ]
> > But - as far as I understood - the filesystem doesn't have to
> > wait for barriers to complete, but could continue issuing IO
> > requests happily. A barrier only means, any request prior to
> > that have to land before and any after it after it.
> >
> > It doesn't mean that the barrier has to land immediately and
> > the filesystem has to wait for this. At least that always was
> > the whole point of barriers for me.
> > If that's not the case, I
> > misunderstood the purpose of barriers to the maximum extent
> > possible.
>
> Unfortunately that seems the case.
>
> The purpose of barriers is to guarantee that relevant data is
> known to be on persistent storage (kind of hardware 'fsync').
>
> In effect write barrier means "tell me when relevant data is on
> persistent storage", or less precisely "flush/sync writes now
> and tell me when it is done". Properties as to ordering are just
> a side effect.

Interesting to know. Thanks for the long explanation.

Unfortunately, as far as I understand it, none of this is reflected in
Documentation/block/barrier.txt. In particular, it says:

---------------------------------------------------------------------
I/O Barriers
============
Tejun Heo, July 22 2005

I/O barrier requests are used to guarantee ordering around the barrier
requests. Unless you're crazy enough to use disk drives for
implementing synchronization constructs (wow, sounds interesting...),
the ordering is meaningful only for write requests for things like
journal checkpoints. All requests queued before a barrier request
must be finished (made it to the physical medium) before the barrier
request is started, and all requests queued after the barrier request
must be started only after the barrier request is finished (again,
made it to the physical medium)

In other words, I/O barrier requests have the following two properties.

1. Request ordering

Requests cannot pass the barrier request. Preceding requests are
processed before the barrier and following requests after.

Depending on what features a drive supports, this can be done in one
of the following three ways.

i. For devices which have queue depth greater than 1 (TCQ devices) and
support ordered tags, block layer can just issue the barrier as an
ordered request and the lower level driver, controller and drive
itself are responsible for making sure that the ordering constraint is
met.
Most modern SCSI controllers/drives should support this.

NOTE: SCSI ordered tag isn't currently used due to limitation in the
      SCSI midlayer, see the following random notes section.

ii. For devices which have queue depth greater than 1 but don't
support ordered tags, block layer ensures that the requests preceding
a barrier request finishes before issuing the barrier request. Also,
it defers requests following the barrier until the barrier request is
finished. Older SCSI controllers/drives and SATA drives fall in this
category.

iii. Devices which have queue depth of 1. This is a degenerate case
of ii. Just keeping issue order suffices. Ancient SCSI
controllers/drives and IDE drives are in this category.

2. Forced flushing to physical medium

Again, if you're not gonna do synchronization with disk drives (dang,
it sounds even more appealing now!), the reason you use I/O barriers
is mainly to protect filesystem integrity when power failure or some
other events abruptly stop the drive from operating and possibly make
the drive lose data in its cache. So, I/O barriers need to guarantee
that requests actually get written to non-volatile medium in order.

There are four cases,

i. No write-back cache. Keeping requests ordered is enough.

ii. Write-back cache but no flush operation. There's no way to
guarantee physical-medium commit order. This kind of devices can't to
I/O barriers.

iii. Write-back cache and flush operation but no FUA (forced unit
access). We need two cache flushes - before and after the barrier
request.

iv. Write-back cache, flush operation and FUA. We still need one
flush to make sure requests preceding a barrier are written to medium,
but post-barrier flush can be avoided by using FUA write on the
barrier itself.
---------------------------------------------------------------------

I do not see any mention of "tell me when it's finished" in that file.
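The four flushing cases in the quoted barrier.txt text amount to a small decision table. Below is a minimal Python sketch of it; the `Device` class, the `barrier_commands()` function and the command names `FLUSH`, `WRITE` and `WRITE_FUA` are hypothetical labels chosen for illustration, not the Linux block layer's actual interface. The sketch also assumes that request ordering (section 1 of the file) has already been taken care of before these commands are issued.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical disk model; the flags mirror the cases in barrier.txt, section 2."""
    write_back_cache: bool  # volatile cache that may lose data on power failure
    has_flush: bool         # supports a "flush cache" command
    has_fua: bool           # supports Forced Unit Access writes


def barrier_commands(dev: Device, block: str) -> Optional[List[str]]:
    """Return the (hypothetical) commands that commit a barrier write of `block`.

    Assumes request ordering (section 1) is already enforced, i.e. all
    preceding writes were issued before these commands.
    Returns None when the device cannot implement barriers at all.
    """
    if not dev.write_back_cache:
        # Case i: nothing volatile to lose; keeping requests ordered is enough.
        return [f"WRITE {block}"]
    if not dev.has_flush:
        # Case ii: no way to force cached data out, so the commit order on
        # the physical medium cannot be guaranteed.
        return None
    if not dev.has_fua:
        # Case iii: flush before (earlier writes reach the medium first) and
        # after (the barrier write itself reaches the medium).
        return ["FLUSH", f"WRITE {block}", "FLUSH"]
    # Case iv: the pre-flush is still needed; a FUA write replaces the post-flush.
    return ["FLUSH", f"WRITE_FUA {block}"]


if __name__ == "__main__":
    cases = {
        "i   no write-back cache": Device(False, False, False),
        "ii  cache, no flush": Device(True, False, False),
        "iii cache + flush": Device(True, True, False),
        "iv  cache + flush + FUA": Device(True, True, True),
    }
    for label, dev in cases.items():
        print(label, "->", barrier_commands(dev, "journal-commit"))
```

Running it prints one command list per case, with `None` for case ii. The sketch only models which commands are issued for a barrier; it deliberately says nothing about whether the submitter has to wait for them to complete, which is exactly the point in dispute in this thread.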
The file just mentions that a cache flush has to be issued before the
write barrier, and that the barrier itself then has to be issued either
as a FUA (forced unit access) request or followed by another cache
flush. Nowhere does it say that this has to happen immediately. The
documentation file is mainly about ordering requests, and about how
cache flushes may be used to enforce that regular requests cannot pass
barrier requests.

Nor do I understand why the filesystem needs to know whether a barrier
has been completed - it just needs to know whether the block device /
driver can handle barrier requests. If the filesystem knows that
requests are written with a certain ordering constraint, then it
shouldn't matter when they are written. When they are written should
be the user's choice, depending on how much data she / he risks losing
in case of a sudden interruption of writing out requests.

Thus I think the mentioned documentation is at least misleading, if
your description matches the actual implementation of write barriers.
In that case I think it should be changed.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs