From: Ric Wheeler <rwheeler@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Ric Wheeler <rwheeler@redhat.com>,
	Eric Anopolsky <erpo41@gmail.com>,
	Chris Mason <chris.mason@oracle.com>,
	Stephan von Krawczynski <skraw@ithnet.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Some very basic questions
Date: Wed, 22 Oct 2008 12:34:41 -0400
Message-ID: <48FF5621.6020902@redhat.com>
In-Reply-To: <48FF515B.2030209@kernel.org>

Tejun Heo wrote:
> Ric Wheeler wrote:
>   
>> Waiting for the target to ack an IO is not sufficient, since the target
>> ack does not (with write cache enabled) mean that it is on persistent
>> storage.
>>     
>
> FS waiting for completion of all the dependent writes isn't too good
> latency and throughput-wise tho.  It would be best if FS can indicate
> dependencies between write commands and barrier so that barrier
> doesn't have to empty the whole queue.  Hmm... Can someone tell me how
> much such scheme would help?
>
>   
I think that this is where SCSI ordered tags come in (or similar
schemes). The idea would be to tag all IO and bump the tag at each
ordering point. For example, after you send down the journal data
blocks, you switch to a new tag that is used for the commit block.

The ordering would require that lower ranked tags must all be destaged 
to persistent storage before a subsequent tag is written out.

T13 had a Microsoft proposal in this area:

http://www.t13.org/Documents/UploadedDocuments/docs2007/e07174r0-Write_Barrier_Command_Proposal.doc
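
To make the tag-ordering rule above concrete, here is a minimal Python
sketch of a toy write-back cache. The tag numbers, LBAs and the
`ordered=False` mode (standing in for firmware that is free to reorder,
here by descending LBA) are illustrative assumptions, not SCSI semantics.
The point is only that, with ordering honoured, a commit block on media
implies its journal blocks are there too, whenever power fails.

```python
class TaggedWriteCache:
    """Toy write-back cache (hypothetical model, not real SCSI semantics).

    With ordered=True, every write with a lower tag reaches media before
    any write with a higher tag.
    """

    def __init__(self):
        self.pending = []   # cached writes: (tag, lba, data)
        self.media = {}     # persistent storage: lba -> data

    def submit(self, tag, lba, data):
        self.pending.append((tag, lba, data))

    def destage(self, max_writes, ordered=True):
        """Move up to max_writes cached writes to media, then stop
        (a power failure at that point loses the rest)."""
        if ordered:
            # Stable sort: all lower-ranked tags are destaged first.
            self.pending.sort(key=lambda w: w[0])
        else:
            # Hypothetical firmware free to reorder, here by descending LBA.
            self.pending.sort(key=lambda w: w[1], reverse=True)
        for _ in range(min(max_writes, len(self.pending))):
            _tag, lba, data = self.pending.pop(0)
            self.media[lba] = data


JOURNAL_LBAS = (100, 101, 102)   # hypothetical journal data blocks
COMMIT_LBA = 103                 # hypothetical commit block


def crash_after(k, ordered=True):
    """Return the media contents if power fails after k writes destage."""
    cache = TaggedWriteCache()
    for lba in JOURNAL_LBAS:
        cache.submit(1, lba, b"journal")      # journal blocks share tag 1
    cache.submit(2, COMMIT_LBA, b"commit")    # bump the tag for the commit
    cache.destage(k, ordered)
    return cache.media


def commit_is_safe(media):
    # A commit block on media is only valid if all its journal blocks are too.
    return COMMIT_LBA not in media or all(lba in media for lba in JOURNAL_LBAS)


print(all(commit_is_safe(crash_after(k)) for k in range(5)))   # True
print(commit_is_safe(crash_after(1, ordered=False)))            # False
```

In the unordered case the drive destages the commit block first, so a
power failure after one write leaves a commit record with no journal
data behind it.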


>> The key is to make your transaction commit ensure that the commit block
>> itself is not written out of sequence without flushing the dependent IO
>> from the transaction.
>>
>> If we disable the write cache, then file systems effectively do exactly
>> the right thing today as you describe :-)
>>     
>
> For most SATA drives, disabling write back cache seems to take high
> toll on write throughput.  :-(
>   

I have seen a 50% reduction in my testing on SATA :-(
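
Rather than disabling the write cache, the commit rule quoted above
(flush the transaction's dependent IO before the commit block can reach
media) can be checked against a toy model of a write-back disk. In this
hypothetical Python sketch a power failure may leave any subset of the
cached writes on media; the class, function names and LBAs are made up
for illustration.

```python
from itertools import combinations

JOURNAL = (100, 101, 102)   # hypothetical journal block LBAs
COMMIT = 103                # hypothetical commit block LBA


class WriteBackDisk:
    def __init__(self):
        self.cache = {}     # volatile: lba -> data, lost on power failure
        self.media = {}     # persistent storage

    def write(self, lba, data):
        self.cache[lba] = data   # acked as soon as it is in the cache

    def flush(self):
        self.media.update(self.cache)
        self.cache.clear()

    def crash_states(self):
        """Yield every media image possible if power fails now: any
        subset of the cached writes may already have been destaged."""
        lbas = list(self.cache)
        for r in range(len(lbas) + 1):
            for subset in combinations(lbas, r):
                image = dict(self.media)
                image.update({lba: self.cache[lba] for lba in subset})
                yield image


def consistent(image):
    return COMMIT not in image or all(lba in image for lba in JOURNAL)


def commit_transaction(disk, with_flush):
    for lba in JOURNAL:
        disk.write(lba, b"journal")
    if with_flush:
        disk.flush()          # dependent IO is persistent before the commit
    disk.write(COMMIT, b"commit")
    # A real commit would also flush here to make the commit durable;
    # we stop to ask what a power failure right now could leave behind.
    return all(consistent(img) for img in disk.crash_states())


print(commit_transaction(WriteBackDisk(), with_flush=True))    # True
print(commit_transaction(WriteBackDisk(), with_flush=False))   # False
```

The flush costs one drain of the cache per commit, which is the
throughput concern Tejun raises, but it avoids paying for a write-through
cache on every IO.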

>   
>>> IIUC, that should be detectable from FLUSH whether the destaging
>>> occurred as part of flush or not, no?
>>>   
>>>       
>> I am not sure what happens to a write that fails to get destaged from
>> cache. It probably depends on the target firmware, but I imagine that
>> the target cannot hold onto it forever (or all subsequent flushes would
>> always fail).
>>     
>
> As long as the error status is sticky, it doesn't have to hold on to
> the data, it's not gonna be able to write it anyway.  The drive has to
> hold onto the failure information only.  Yeah, but fully agreed on
> that it's most likely dependent on the specific firmware.  There isn't
> any requirement on how to handle write back failure in the ATA spec.
> It wouldn't be too surprising if there are some drives which happily
> report the old data after silent write failure followed by flush and
> power loss at the right timing.
>
> Thanks.
>
>   
agreed....
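
Tejun's sticky-error idea can be sketched as a toy drive model in
Python. This is an assumption-laden illustration, since, as noted above,
ATA does not say how write-back failures are handled: here the drive
drops data it cannot destage but remembers the failure, and the next
FLUSH reports it once and then clears it, so later flushes do not fail
forever. All names are hypothetical.

```python
class StickyErrorDisk:
    """Hypothetical drive model, not behaviour required by the ATA spec."""

    def __init__(self, bad_lbas=()):
        self.cache = {}                  # volatile write-back cache
        self.media = {}                  # persistent storage
        self.bad_lbas = set(bad_lbas)    # made-up sectors that fail to write
        self.sticky_error = None         # LBA of the last failed destage

    def write(self, lba, data):
        self.cache[lba] = data           # acked once it is in the cache

    def destage(self):
        """Background destage: failed data is dropped, the failure is kept."""
        for lba, data in self.cache.items():
            if lba in self.bad_lbas:
                self.sticky_error = lba
            else:
                self.media[lba] = data
        self.cache.clear()

    def flush(self):
        """Return False if any destage failed since the last flush."""
        self.destage()
        failed, self.sticky_error = self.sticky_error, None
        return failed is None


disk = StickyErrorDisk(bad_lbas={7})
disk.write(7, b"lost")
disk.write(8, b"kept")
disk.destage()          # failure happens before anyone calls FLUSH
print(disk.flush())     # False: the earlier failure is still reported
print(disk.flush())     # True: the error was consumed by the first flush
```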

ric


