From: davidsen@tmr.com (bill davidsen)
To: linux-kernel@vger.kernel.org
Subject: Re: libata in 2.4.24?
Date: 2 Dec 2003 22:34:20 GMT
Message-ID: <bqj41c$drr$1@gatekeeper.tmr.com>
In-Reply-To: 877k1f9e1g.fsf@stark.dyndns.tv

In article <877k1f9e1g.fsf@stark.dyndns.tv>,
Greg Stark  <gsstark@mit.edu> wrote:
| Jeff Garzik <jgarzik@pobox.com> writes:

| > > This doesn't happen with SCSI disks where multiple requests can be pending so
| > > there's no urgency to reporting a false success. The request doesn't complete
| > > until the write hits disk. As a result SCSI disks are reliable for database
| > > operation and IDE disks aren't unless write caching is disabled.
| > 
| > This is not really true.
| > 
| > Regardless of TCQ, if the OS driver has not issued a FLUSH CACHE (IDE)
| > or SYNCHRONIZE CACHE (SCSI), then the data is not guaranteed to be on
| > the disk media.  Plain and simple.
| 
| That doesn't agree with people's experience. People seem to find that SCSI
| drives never cache writes. This sort of makes sense since there's just not
| much reason to report a write success before the write can be performed.
| There's no performance advantage as long as more requests can be queued up.

I hope you mean the drives don't report completion until the data is on
the platter. Clearly the data is cached in the drive until it can be
written.
| 
| 
| > If fsync(2) returns without a flush-cache, then your data is not
| > guaranteed to be on the disk.  And as you noted, flush-cache destroys
| > performance.
| 
| It's my understanding that it doesn't. There was some discussion in the past
| month about making the drivers issue syncs for journalled filesystems, but
| even then the idea of adding it to fsync or O_SYNC files wasn't the
| motivation.

With O_SYNC files there is the possibility of setting a "don't cache"
bit in the packet sent to the drive, even with write caching enabled.
With fsync I don't see any way to do that after the fact for only some
of the data already in the drive cache. That's just an observation.

Clearly, with a completion status that comes back only after actual
completion, O_SYNC or fsync reduces to "wait for the ack from the drive."
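
For reference, here is a minimal sketch of the two paths as an application
sees them. The helper names (write_all, write_osync, write_then_fsync) are
made up for illustration. Whether either path actually forces the data out
of the drive's write cache depends on the kernel, driver and drive, which
is exactly the question in this thread; the code shows only the calls.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes, retrying on short writes.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Path 1: O_SYNC. Each write() returns only once the kernel
 * considers that write complete. */
int write_osync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_all(fd, buf, len);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Path 2: ordinary buffered write, then fsync() after the fact.
 * fsync() covers everything written to the file so far. */
int write_then_fsync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_all(fd, buf, len);
    if (rc == 0 && fsync(fd) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

With O_SYNC the kernel knows at submission time that a given write must be
synchronous; with fsync it only learns that after the data has been queued.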
| 
| 
| > There are three levels:
| > 
| > a) Data is successfully transferred to the controller/drive queue (TCQ).
| > b) Data is successfully transferred to the drive's internal buffers.
| > c) The drive successfully transfers data to the media.
| 
| Only the third is of interest to Postgres or other databases. In fact, I
| suspect only the third is of interest to other systems that are supposed to be
| reliable like MTAs etc. I think Wietse and others would be shocked if they
| were told fsync wasn't guaranteed to have waited until the writes had actually
| hit the media.

I think for reliability fsync has to flush the drive cache, regardless
of the performance hit. I think a drive would be unusably slow if you
flushed after each O_SYNC write, so that's probably not practical.
Clearly the best solution is a full SCSI implementation over PATA/SATA,
but that would eliminate some of the justification for SCSI devices at
premium prices.
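
To illustrate the cost argument from the application side, here is a rough
sketch of batching: several plain writes followed by a single fsync, so
that any flush is paid once per batch rather than once per write. The
function name append_batch is hypothetical, and whether that fsync reaches
the platter is still up to the kernel and drive.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append `count` copies of the record `rec`
 * to `path`, then issue one fsync() for the whole batch.
 * Returns 0 on success, -1 on any error. */
int append_batch(const char *path, const char *rec, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(rec);
    int rc = 0;

    /* Plain writes: these may sit in the page cache for now. */
    for (int i = 0; i < count && rc == 0; i++) {
        ssize_t n = write(fd, rec, len);
        if (n != (ssize_t)len)
            rc = -1;    /* treat errors and short writes as failure */
    }

    /* One synchronization point for everything written above. */
    if (rc == 0 && fsync(fd) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Opening the same file with O_SYNC instead would make every write in the
loop wait on its own, which is the slow case described above.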
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

