From: davidsen@tmr.com (bill davidsen)
To: linux-kernel@vger.kernel.org
Subject: Re: libata in 2.4.24?
Date: 2 Dec 2003 22:34:20 GMT
Message-ID: <bqj41c$drr$1@gatekeeper.tmr.com>
In-Reply-To: 877k1f9e1g.fsf@stark.dyndns.tv

In article <877k1f9e1g.fsf@stark.dyndns.tv>,
Greg Stark  <gsstark@mit.edu> wrote:
| Jeff Garzik <jgarzik@pobox.com> writes:

| > > This doesn't happen with SCSI disks where multiple requests can be pending so
| > > there's no urgency to reporting a false success. The request doesn't complete
| > > until the write hits disk. As a result SCSI disks are reliable for database
| > > operation and IDE disks aren't unless write caching is disabled.
| > 
| > This is not really true.
| > 
| > Regardless of TCQ, if the OS driver has not issued a FLUSH CACHE (IDE)
| > or SYNCHRONIZE CACHE (SCSI), then the data is not guaranteed to be on
| > the disk media.  Plain and simple.
| 
| That doesn't agree with people's experience. People seem to find that SCSI
| drives never cache writes. This sort of makes sense since there's just not
| much reason to report a write success before the write can be performed.
| There's no performance advantage as long as more requests can be queued up.

I hope you mean that the drives don't report completion until the data is
on the platter; clearly the data is cached in the drive until it can be
written.
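To make Jeff's point concrete: on SCSI the explicit flush is the SYNCHRONIZE CACHE command. The following is a minimal sketch. It assumes a Linux host with the SG_IO ioctl from <scsi/sg.h> and an already opened sg or SCSI disk node. The names build_sync_cache10 and issue_sync_cache are hypothetical and do not come from the thread or from any driver.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Fill a SYNCHRONIZE CACHE(10) CDB (opcode 0x35). A zero LBA and a zero
 * block count ask the device to flush its whole write cache; IMMED (byte 1,
 * bit 1) stays clear, so the command completes only after the flush. */
void build_sync_cache10(unsigned char cdb[10])
{
    memset(cdb, 0, 10);
    cdb[0] = 0x35;
}

/* Send the flush through Linux SG_IO. fd is assumed to be an open sg or
 * SCSI disk node. Returns 0 only if the ioctl worked and the device,
 * host adapter and driver all reported success; -1 otherwise. */
int issue_sync_cache(int fd)
{
    unsigned char cdb[10];
    unsigned char sense[32];
    struct sg_io_hdr io;

    build_sync_cache10(cdb);
    memset(sense, 0, sizeof sense);
    memset(&io, 0, sizeof io);
    io.interface_id = 'S';
    io.dxfer_direction = SG_DXFER_NONE;   /* no data phase */
    io.cmd_len = sizeof cdb;
    io.cmdp = cdb;
    io.mx_sb_len = sizeof sense;
    io.sbp = sense;
    io.timeout = 60000;                   /* milliseconds; a flush can be slow */

    if (ioctl(fd, SG_IO, &io) < 0)
        return -1;
    if (io.status != 0 || io.host_status != 0 || io.driver_status != 0)
        return -1;                        /* flush not confirmed */
    return 0;
}
```

A GOOD status here corresponds to Jeff's level c) for everything that was sitting in the drive cache. Whether fsync(2) triggers anything like this is the question the rest of the thread argues about.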
| 
| 
| > If fsync(2) returns without a flush-cache, then your data is not
| > guaranteed to be on the disk.  And as you noted, flush-cache destroys
| > performance.
| 
| It's my understanding that it doesn't. There was some discussion in the past
| month about making the drivers issue syncs for journalled filesystems, but
| even then the idea of adding it to fsync or O_SYNC files wasn't the
| motivation.

With O_SYNC files there is the possibility of setting a "don't cache" bit
(SCSI's FUA, Force Unit Access) in the command sent to the drive, even with
write caching enabled. With fsync I don't see any way to apply that after
the fact to only some of the data in the drive cache. That's just an
observation.
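The per-command "don't cache" bit exists in SCSI as FUA (Force Unit Access) in the WRITE(10) command. The sketch below builds such a CDB. It assumes the standard WRITE(10) layout: opcode 0x2A, FUA in byte 1 bit 3, big-endian LBA in bytes 2-5 and block count in bytes 7-8. build_write10 is a hypothetical helper. Nothing here claims that any particular kernel set FUA for O_SYNC writes.

```c
#include <stdint.h>
#include <string.h>

/* Build a SCSI WRITE(10) CDB. With fua nonzero the device must not report
 * completion until the blocks are on the medium, even if its write cache
 * is enabled. LBA and block count are stored big-endian. */
void build_write10(unsigned char cdb[10], uint32_t lba, uint16_t nblocks, int fua)
{
    memset(cdb, 0, 10);
    cdb[0] = 0x2A;                 /* WRITE(10) opcode */
    if (fua)
        cdb[1] |= 0x08;            /* FUA: byte 1, bit 3 */
    cdb[2] = (unsigned char)(lba >> 24);
    cdb[3] = (unsigned char)(lba >> 16);
    cdb[4] = (unsigned char)(lba >> 8);
    cdb[5] = (unsigned char)lba;
    cdb[7] = (unsigned char)(nblocks >> 8);
    cdb[8] = (unsigned char)nblocks;
}
```

The bit only covers the blocks named in that one command. That is why it fits O_SYNC, where each write is synchronous anyway, but cannot be applied retroactively to data that an earlier write already left in the cache.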

Clearly, if the completion status comes back only after the data is
actually on the media, O_SYNC and fsync reduce to "wait for the ack from
the drive."
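Here are the two user-space paths being compared, as a minimal POSIX sketch. The helper names and /tmp paths are hypothetical. Both calls wait only for the kernel's notion of completion. Whether that includes the drive's own write cache depends on the driver and the drive's cache setting, which is the point under dispute.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* O_SYNC: each write() returns only after the kernel has pushed the data
 * (and the metadata needed to read it back) to the device. A short write
 * is treated as failure in this sketch. */
int write_osync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    int rc = (write(fd, buf, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Plain buffered write(), then a single fsync() that waits until the
 * kernel has written the file's dirty data and metadata to the device. */
int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = (write(fd, buf, len) == (ssize_t)len) ? 0 : -1;
    if (fsync(fd) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

If the device acknowledges only after the data is on the platter, both functions give the same guarantee. If it acknowledges from its cache, neither does, unless the kernel also issues a flush.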
| 
| 
| > There are three levels:
| > 
| > a) Data is successfully transferred to the controller/drive queue (TCQ).
| > b) Data is successfully transferred to the drive's internal buffers.
| > c) The drive successfully transfers data to the media.
| 
| Only the third is of interest to Postgres or other databases. In fact, I
| suspect only the third is of interest to other systems that are supposed to be
| reliable like MTAs etc. I think Wietse and others would be shocked if they
| were told fsync wasn't guaranteed to have waited until the writes had actually
| hit the media.

I think that for reliability fsync has to flush the cache, regardless of
the performance hit. A drive would be unusably slow if you flushed after
every O_SYNC write, so that is probably not practical. Clearly the best
solution is a full SCSI implementation over PATA/SATA, but that would
undercut some of the justification for selling SCSI devices at premium
prices.
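A common way to live with the flush cost described above is group commit. Several records are written, and then one sync covers the whole batch, so the expensive wait is paid per batch rather than per write. This is a general database technique added here for illustration. It is not something proposed in this message. group_commit and the file names are hypothetical, and fdatasync still guarantees only what the driver and drive make of it.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Group commit: append every record in the batch with plain write()s,
 * then wait once with fdatasync() for all of them. Returns 0 on success,
 * -1 on any error (a short write counts as an error in this sketch). */
int group_commit(const char *path, const char *const *records, size_t count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    int rc = 0;
    for (size_t i = 0; i < count && rc == 0; i++) {
        size_t len = strlen(records[i]);
        if (write(fd, records[i], len) != (ssize_t)len)
            rc = -1;
    }
    /* One synchronous wait covers the whole batch, instead of one per
     * record as with O_SYNC. */
    if (rc == 0 && fdatasync(fd) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

With N records per batch, the number of synchronous waits drops from N to 1. The cost is that every record in the batch waits for the slowest one.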
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
