From: Ralf Liebenow
Date: Thu, 19 Feb 2009 00:09:58 +0100
Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
Message-ID: <20090218230958.GA6506@theco.de>
References: <200812141912.59649.Martin@lichtvoll.de> <18757.33373.744917.457587@tree.ty.sabi.co.uk> <200812151948.59870.Martin@lichtvoll.de> <18758.57121.570007.816329@tree.ty.sabi.co.uk>
Reply-To: ralf@theco.de
List-Id: XFS Filesystem from SGI
To: xfs@oss.sgi.com

Hello!

> Correct ordering can be proven to be enough to provide transactional
> correctness, enough to ensure that filesystems can not get corrupted
> on power down.

Please beware that caching RAID controllers which are not battery
backed, and the hard disk itself (when write caching is on), may decide
to re-order writes to the disk, so the ordering imposed by the
operating system (filesystem driver) may not be retained.
Hard disks and controllers usually do this to minimize seek times,
and that's what disk command queueing is good for.

So ordering can only be retained if all external caching mechanisms
and command queueing are switched off. Otherwise you need to
have something like fsync points (barriers?)
to have consistent checkpoints you can roll back to ...

So the answer has many variables:

do you have a persistent (battery backed) write cache?
  Yes -> you can go with nobarriers if you can make sure that the
         hard disk cache is off and the filesystem does proper
         write ordering.
  No  -> if you switch off the disk's cache, you _may_ switch off
         barriers, provided the filesystem driver uses properly
         placed write ordering
      -> if you have disk write caching on, you are on your own when
         power goes down and you don't use barriers ... you may be
         lucky or not ...

But to make that clear: it's only a problem when power is failing ...
it's not a problem when the machine crashes ... the disks will
eventually write down their caches then. So if your system is
connected to a redundant power supply and fail-safe power supply
systems (as is the case for most data centers) you can probably live
with disk write caching on and nobarriers, if the filesystem driver
does order its writes properly ....

So I have one open question left: does xfs do proper (transactional)
ordering when barriers are off? I've been using xfs for years now and
have had many machine crashes (not power failures) without xfs getting
corrupted (and that was before 2.6.17 ... and therefore without
barrier support). So I assume it always does proper ordering, and
barrier support only adds "fsynced" checkpoints in time.

Am I right?

   Ralf

> Hello,
>
> On 15 dec 2008, at 23:50, Peter Grandi wrote:
>
> >[ ... ]
> >
> >>>The purpose of barriers is to guarantee that relevant data is
> >>>known to be on persistent storage (kind of hardware 'fsync').
> >>>
> >
> >>[ ... ] Unfortunately in my understanding none of this is
> >>reflected by Documentation/block/barrier.txt
> >
> >But we are talking about XFS and barriers here. That described
> >just a (flawed, buggy) mechanism to implement those. Consider
> >for example:
> >
> >  http://www.xfs.org/index.php/XFS_FAQ#Write_barrier_support.
> >  http://www.xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
> >
> >In any case as to the kernel "barrier" mechanism, its
> >description is misleading because it heavily fixates on the
> >ordering issue, which is just a consequence, but yet mentions
> >the far more important "flush/sync" aspect.
> >
> >Still, there is a lot of confusion about barrier support and
> >what it means at which level, as reflected in several online
> >discussions and the different behaviour of different kernel
> >versions.
> >
> The semantics of a barrier are whatever semantics we describe to it.
> So we can continue to be confused about it.
>
> I strongly disagree on the ordering issue being a side effect.
>
> Correct ordering can be proven to be enough to provide transactional
> correctness, enough to ensure that filesystems can not get corrupted
> on power down.
>
> Using barriers to guarantee that (all submitted) write requests
> (before the barrier) made it to the medium are a stronger predicate.
>
> The Linux approach and documentation talks about the first type of
> semantics (which I rather like for them being strong enough and not
> more).
>
> Regards,
>
> Leon
>
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>

-- 
theCode AG
HRB 78053, Amtsgericht Charlottenbg
VAT ID (USt-IdNr.): DE204114808
Executive board: Ralf Liebenow, Michael Oesterreich, Peter Witzel
Chairman of the supervisory board: Wolf von Jaduczynski
Oranienstr. 10-11, 10997 Berlin [×]
phone +49 30 617 897-0  fax -10
ralf@theCo.de http://www.theCo.de
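
The cache/barrier decision tree in the reply above can be written down as a small function. This is only a sketch of the reasoning as stated in the mail, not XFS or kernel code: the function name, parameter names and result strings are hypothetical, and it assumes the filesystem orders its writes properly, which is exactly the open question the mail asks.

```python
# Hypothetical result labels; they paraphrase the three outcomes in the mail.
NOBARRIER_OK = "nobarrier is acceptable"
NOBARRIER_MAYBE = "nobarrier may be acceptable"
AT_RISK = "keep barriers on, or accept possible corruption on power failure"


def barrier_advice(persistent_write_cache: bool, disk_write_cache_on: bool) -> str:
    """Map a cache setup to the advice given in the mail.

    Assumes the filesystem driver orders its writes properly
    (an assumption of the mail, not something verified here).
    """
    if disk_write_cache_on:
        # The disk may reorder cached writes and lose them when power fails,
        # so without barriers the ordering cannot be relied upon.
        return AT_RISK
    if persistent_write_cache:
        # Battery-backed controller cache, disk cache off.
        return NOBARRIER_OK
    # No persistent cache, but the disk cache is off as well.
    return NOBARRIER_MAYBE


if __name__ == "__main__":
    for persistent in (True, False):
        for disk_cache in (False, True):
            print(persistent, disk_cache, "->", barrier_advice(persistent, disk_cache))
```

The case "battery-backed cache present but disk write cache still on" is not spelled out in the mail; the sketch treats it like any other setup with disk write caching on.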