linux-raid.vger.kernel.org archive mirror
From: Leon Woestenberg <leonw@mailcan.com>
To: Bill Davidsen <davidsen@tmr.com>
Cc: Linux RAID <linux-raid@vger.kernel.org>,
	Peter Grandi <pg_xf2@xf2.for.sabi.co.UK>,
	Linux XFS <xfs@oss.sgi.com>
Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
Date: Thu, 18 Dec 2008 09:20:10 +0100
Message-ID: <494A07BA.1080008@mailcan.com>
In-Reply-To: <494971B2.1000103@tmr.com>

Hello all,

Bill Davidsen wrote:
> Peter Grandi wrote:
>   
>> Unfortunately that seems the case.
>>
>> The purpose of barriers is to guarantee that relevant data is
>> known to be on persistent storage (kind of hardware 'fsync').
>>
>> In effect write barrier means "tell me when relevant data is on
>> persistent storage", or less precisely "flush/sync writes now
>> and tell me when it is done". Properties as to ordering are just
>> a side effect.
>
> I don't get that sense from the barriers stuff in Documentation, in fact 
> I think it's essentially a pure ordering thing, I don't even see that it 
> has an effect of forcing the data to be written to the device, other 
> than by preventing other writes until the drive writes everything. So we 
> read the intended use differently.
>
> What really bothers me is that there's no obvious need for barriers at
> the device level if the file system is just a bit smarter and does its
> own async io (like aio_*), because you can track writes outstanding on a
> per-fd basis, so instead of stopping the flow of data to the drive, you
> can just block a file descriptor and wait for the count of outstanding
> i/o to drop to zero. That provides the order semantics of barriers as
> far as I can see, having tirelessly thought about it for ten minutes or
> so. Oh, and did something very similar decades ago in a long-gone
> mainframe OS.
>   
Did that mainframe OS have re-ordering devices? If it did, you'd still
need barriers all the way down:

The drive itself may still re-order writes, which can cause corruption
if the power goes down partway through. From my understanding,
disabling the write cache simply forces the drive to operate in order.

Barriers need to travel all the way down to the point after which
everything stays in order. Devices with the write cache enabled will
still re-order, but not across barriers (which are implemented as either
a single cache flush plus a forced unit access (FUA) barrier write, or a
double cache flush around the barrier write).
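
To make the "re-order, but not across barriers" point concrete, here is
a small toy model, not kernel code. The BARRIER marker, the request
names and persist_order() are all made up for illustration; the only
assumption is that the drive may commit the writes between two barriers
in any order it likes.

```python
import random

BARRIER = object()  # hypothetical marker for a barrier in the request stream


def persist_order(requests, rng):
    """Return one order in which a write-caching drive might commit the
    requests: shuffled freely inside each epoch, never across a barrier."""
    order, epoch = [], []
    for req in requests:
        if req is BARRIER:
            rng.shuffle(epoch)   # the drive may re-order within the epoch
            order.extend(epoch)  # but the epoch is committed as a whole
            epoch = []           # before anything after the barrier
        else:
            epoch.append(req)
    rng.shuffle(epoch)           # trailing writes after the last barrier
    order.extend(epoch)
    return order


requests = ["a1", "a2", "a3", BARRIER, "b1", "b2", BARRIER, "c1"]
order = persist_order(requests, random.Random(0))
print(order)
```

Whatever the seed, the a* writes always come out before the b* writes,
and those before c1; only the order inside each group changes.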

Whether the data has made it to the drive platters is not really
important from a barrier point of view. However, if part of the data
made it to the platters, then we want to be sure it got there in order.

Only in this way can we ensure that the data that is on the platters is
consistent.
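
As a rough sketch of why ordering alone is enough for consistency, take
a hypothetical journal with two log blocks and a commit record (the
names log1, log2 and commit are invented here, and this is not how XFS
actually lays out its log). A power cut is modelled as keeping only a
prefix of the order in which the drive committed the writes.

```python
from itertools import permutations


def consistent(on_platter):
    """A commit record is only valid if every log block it covers is present."""
    return "commit" not in on_platter or {"log1", "log2"} <= on_platter


def survives_any_power_cut(order):
    """Check every prefix of the persist order, i.e. every possible cut point."""
    return all(consistent(set(order[:n])) for n in range(len(order) + 1))


# With a barrier before the commit record, the drive may only permute
# the two log blocks among themselves.
with_barrier = [list(p) + ["commit"] for p in permutations(["log1", "log2"])]

# Without a barrier, any permutation of all three writes is possible.
without_barrier = [list(p) for p in permutations(["log1", "log2", "commit"])]

print(all(survives_any_power_cut(o) for o in with_barrier))     # True
print(all(survives_any_power_cut(o) for o in without_barrier))  # False
```

In the barrier case it does not matter how much reached the platters
before the cut; whatever is there is consistent. Without the barrier,
the commit record can land first and then point at log blocks that were
never written.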

Regards,

Leon.



