From: Jens Axboe <axboe@suse.de>
To: Greg Stark <gsstark@mit.edu>
Cc: "Mudama, Eric" <eric_mudama@Maxtor.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] ide write barrier support
Date: Fri, 17 Oct 2003 08:44:31 +0200
Message-ID: <20031017064431.GW1128@suse.de>
In-Reply-To: <87ekxcap7a.fsf@stark.dyndns.tv>

On Thu, Oct 16 2003, Greg Stark wrote:
> 
> "Mudama, Eric" <eric_mudama@Maxtor.com> writes:
> 
> > It takes us multiple servo wedges to know that we think our write to the
> > media went in the right place, therefore by definition if we didn't already
> > have the next command's data, we've already missed our target location and
> > have to wait a full revolution to put the new data on the media.  Since we
> > can't report good status for the flush until after we're sure the data is
> > down properly, we'll always blow a rev.
> 
> Ok, on further thought. I think a write barrier isn't really what the database
> needs. It seems to be stronger and more resource intensive than what it really
> needs.
> 
> Postgres writes a transaction log. When the client issues a commit postgres
> cannot return until it knows all the writes for the transaction log for that
> transaction have completed.
> 
> Currently it issues an fsync which is already a bit stronger than necessary.
> But a write barrier sounds even stronger. It would block all other disk i/o
> until the fsync completes. This is completely unnecessary, it would prevent
> other transactions from proceeding at all until the commit finished.
> 
> Ideally postgres just needs to call some kind of fsync syscall that guarantees
> it won't return until all buffers from the file that were dirty prior to the
> sync were flushed and the disk was really synced. It's fine for buffers that
> were dirtied later to get synced as well, as long as all the old buffers are
> all synced.

I've been thinking about adding WRITESYNC to do exactly that, and keeping
WRITEBARRIER with its current functionality for journalled file
systems. WRITESYNC would be exactly what you describe; it just won't
imply any io scheduler ordering. So a post-flush would be enough to
handle that case.

The problem is that, as far as I can see, the best way to make fsync
really work is to make the last write a barrier write. That
automagically gets everything right for you - when the last block goes
to disk, you know the previous ones already have. And when the last
block completes, you know the whole lot is on the platter. If you were just
using WRITESYNC, you would have to WRITESYNC all blocks in that range
instead of just WRITE WRITE WRITE ... WRITEBARRIER. So the barrier would
still end up being cheaper, unless the fsync just flushes a single page,
in which case the WRITESYNC is enough.
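
To make the cost comparison concrete, here is a rough toy model in Python.
It is only a sketch: the Disk class and both function names are made up for
illustration, and it assumes that each WRITESYNC costs one post-flush while
a barrier write costs one flush before and one after. None of this is
kernel code.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy drive with a volatile write cache (hypothetical, not a real device model)."""
    cache: list = field(default_factory=list)
    platter: list = field(default_factory=list)
    flushes: int = 0

    def write(self, block):
        # A plain WRITE only lands in the drive's cache.
        self.cache.append(block)

    def flush(self):
        # A cache flush moves everything cached onto the platter.
        self.platter.extend(self.cache)
        self.cache.clear()
        self.flushes += 1


def fsync_with_writesync(disk, blocks):
    # Assumption: every block is issued as WRITESYNC, each with its own post-flush.
    for block in blocks:
        disk.write(block)
        disk.flush()


def fsync_with_barrier(disk, blocks):
    # WRITE WRITE WRITE ... WRITEBARRIER: only the last write is a barrier.
    for block in blocks[:-1]:
        disk.write(block)
    if blocks:
        disk.flush()            # assumed pre-flush: earlier writes hit the platter first
        disk.write(blocks[-1])
        disk.flush()            # assumed post-flush: the barrier block itself is on platter


if __name__ == "__main__":
    for n in (1, 4, 64):
        a, b = Disk(), Disk()
        fsync_with_writesync(a, list(range(n)))
        fsync_with_barrier(b, list(range(n)))
        print(n, "blocks: WRITESYNC flushes =", a.flushes, " barrier flushes =", b.flushes)
```

Under these assumptions the WRITESYNC variant needs one flush per block,
while the barrier variant needs two flushes no matter how many blocks the
fsync covers, so WRITESYNC only comes out ahead for a single page.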

-- 
Jens Axboe

