linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: NeilBrown <neilb@suse.de>
Cc: Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Theodore Ts'o <tytso@mit.edu>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/2] bdi: Create a flag to indicate that a backing device needs stable page writes
Date: Tue, 30 Oct 2012 14:38:25 +0100
Message-ID: <20121030133825.GA2260@quack.suse.cz>
In-Reply-To: <20121030113441.7f62df51@notabene.brown>

On Tue 30-10-12 11:34:41, NeilBrown wrote:
> On Tue, 30 Oct 2012 01:10:08 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > On Tue 30-10-12 10:48:37, NeilBrown wrote:
> > > On Mon, 29 Oct 2012 19:30:51 +0100 Jan Kara <jack@suse.cz> wrote:
> > > 
> > > > On Mon 29-10-12 19:13:58, Jan Kara wrote:
> > > > > On Fri 26-10-12 18:35:24, Darrick J. Wong wrote:
> > > > > > This creates BDI_CAP_STABLE_WRITES, which indicates that a device requires
> > > > > > stable page writes.  It also plumbs in a sysfs attribute so that admins can
> > > > > > check the device status.
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > >   I guess Jens Axboe <axboe@kernel.dk> would be the best target for this
> > > > > patch (so that he can merge it). The patch looks OK to me. You can add:
> > > > >   Reviewed-by: Jan Kara <jack@suse.cz>
> > > >   One more thing popped up in my mind: What about NFS, Ceph or md RAID5?
> > > > These could (at least theoretically) care about stable writes as well. I'm
> > > > not sure if they really started to use them but it would be good to at
> > > > least let them know.
> > > > 
> > > 
> > > What exactly are the semantics of BDI_CAP_STABLE_WRITES ?
> > > 
> > > If I set it for md/RAID5, do I get a cast-iron guarantee that no byte in any
> > > page submitted for write will ever change until after I call bio_endio()?
> >   Yes.
> > 
> > > If so, is this true for all filesystems? - I would expect a bigger patch would
> > > be needed for that.
> >   Actually the code has been in the kernel for quite some time already.
> > The problem is that it is always enabled, causing unnecessary performance
> > issues for some workloads. So these patches try to be more selective about
> > when the code gets enabled.
> > 
> > Regarding the "all filesystems" question: if we update filemap_page_mkwrite()
> > to call wait_on_page_writeback(), then it should hold for all filesystems.
> 
> Cool.  I didn't realise it had progressed that far.
> 
> I guess it is time to look at the possibility of removing the
> 'copy-into-cache' step for full-page, well-aligned bi_iovecs.
> 
> I assume this applies to swap-out as well ??  It has been a minor source of
> frustration that when you swap-out to RAID1, you can occasionally get
> different data on the two devices because memory changed between the two DMA
> events.
  Really? I'm somewhat surprised. I was under the impression that when a
page is added to the swap cache it is unmapped, so no modification to it
should be possible while it is being swapped out. But maybe it could get
mapped back and modified after we unlock the page and submit the bio. So
mm/memory.c:do_swap_page() might need wait_on_page_writeback() as well.
But I'm not an expert on swap code. I guess I'll experiment with this a
bit. Thanks for the pointer.

								Honza


Thread overview: 35+ messages
2012-10-26 10:19 semi-stable page writes Darrick J. Wong
2012-10-27  1:35 ` [RFC PATCH 1/2] bdi: Create a flag to indicate that a backing device needs stable " Darrick J. Wong
2012-10-29 18:13   ` Jan Kara
2012-10-29 18:30     ` Jan Kara
2012-10-29 23:48       ` NeilBrown
2012-10-30  0:10         ` Jan Kara
2012-10-30  0:34           ` NeilBrown
2012-10-30 13:38             ` Jan Kara [this message]
2012-10-30 21:49               ` NeilBrown
2012-10-30  4:10   ` Martin K. Petersen
2012-10-30  4:48     ` NeilBrown
2012-10-30 12:19       ` Martin K. Petersen
2012-10-30 20:14         ` Darrick J. Wong
2012-10-30 22:14           ` NeilBrown
2012-10-30 23:58             ` Boaz Harrosh
2012-10-31  8:56             ` Darrick J. Wong
2012-10-31 11:56               ` Jan Kara
2012-10-31 19:36                 ` Darrick J. Wong
2012-10-31 23:12                   ` Boaz Harrosh
2012-11-01  5:51                     ` Darrick J. Wong
2012-11-01  6:25                       ` Boaz Harrosh
2012-11-01  8:59                   ` Jan Kara
2012-11-01 17:24                     ` Boaz Harrosh
2012-11-01 22:42                       ` Jan Kara
2012-10-30 22:40   ` Boaz Harrosh
2012-10-27  1:36 ` [RFC PATCH 2/2] mm: Gate stable page writes on the bdi flag Darrick J. Wong
2012-10-29 18:28   ` Jan Kara
2012-10-31  8:58     ` Darrick J. Wong
2012-10-29 22:01 ` semi-stable page writes Dave Chinner
2012-10-30  1:00   ` Theodore Ts'o
2012-10-30 23:30     ` Dave Chinner
2012-10-31 11:45       ` Jan Kara
2012-10-30 20:40   ` Darrick J. Wong
2012-10-30 23:43     ` Dave Chinner
2012-10-31  9:05       ` Darrick J. Wong