From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
    linux-ext4 <linux-ext4@vger.kernel.org>,
    linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: semi-stable page writes
Date: Wed, 31 Oct 2012 02:05:19 -0700
Message-ID: <20121031090519.GE19591@blackbox.djwong.org>
In-Reply-To: <20121030234331.GH29378@dastard>

On Wed, Oct 31, 2012 at 10:43:31AM +1100, Dave Chinner wrote:
> On Tue, Oct 30, 2012 at 01:40:37PM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 30, 2012 at 09:01:22AM +1100, Dave Chinner wrote:
> > > On Fri, Oct 26, 2012 at 03:19:09AM -0700, Darrick J. Wong wrote:
> > > > Hi everyone,
> > > >
> > > > Are people still annoyed about writes taking unexpectedly long amounts of time
> > > > due to the stable page write patchset? I'm guessing yes...
> > >
> > > I haven't heard anyone except the lunatic fringe complain
> > > recently...
> > >
> > > > I'm close to posting a patchset that (a) gates the wait_on_page_writeback calls
> > > > on a flag that you can set in the bdi to indicate that you need stable writes
> > > > (which blk_integrity_register will set);
> > >
> > > I'd prefer stable pages by default (e.g. btrfs needs it for sane
> > > data crc calculations), with an option to turn it off.
> > >
> > > > (b) (ab)uses a page flag bit (PG_slab)
> > > > to indicate that a page is actually being sent out to disk hardware; and (c)
> > >
> > > I don't think you can do that. You can send slab allocated memory to
> > > disk (e.g. kmalloc()d memory) and XFS definitely does that for
> > > sub-page sized metadata. I'm pretty sure that means the PG_slab
> > > flag is not available for (ab)use in the IO path....
> >
> > I gave up on PG_slab and declared my own PG_ bit. Unfortunately, atm I can't
> > remember which bit of code marks the page ptes so that they have to go back
> > through page_mkwrite, where we can trap the write. Hopefully for a shorter
> > duration.
>
> clear_page_dirty_for_io(), IIRC.

Yep, thanks. My memory is a bit rusty due to recent downtime. :/
Now to figure out if I can safely call that from deep inside the SCSI dispatch
functions as part of deferred-checksumming. I have a bad feeling that we have
to lock the page, which implies sleeping, and (unless they fixed this) the SCSI
dispatch functions hold the scsi host lock while running, which means we can't
sleep.

> > Also, I was wondering -- is it possible to pursue a dual strategy? If we can
> > obtain a memory page without sleeping or causing any writeback, then use the
> > page as a bounce buffer. Otherwise, just wait like we do now.
>
> Using bounce buffers for all IO is not a feasible solution. Way too
> much overhead copying data, not to mention we are already suffering
> from the problem of flusher threads going CPU bound trying to issue
> enough IO to keep high bandwidth storage fully utilised...

Ok.
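
For anyone following along, here is roughly what the dual strategy quoted
above would look like as a userspace C model. This is only a sketch, not
kernel code: struct bounce_pool, bounce_try_get(), prepare_write(), PAGE_SZ
and POOL_PAGES are all made-up names, and the fixed preallocated pool is an
assumption standing in for "a page we can get without sleeping or causing
writeback".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ    4096  /* hypothetical page size for this model */
#define POOL_PAGES 2     /* hypothetical pool depth */

/* Preallocated pool: taking a page from it never blocks. */
struct bounce_pool {
    unsigned char pages[POOL_PAGES][PAGE_SZ];
    bool in_use[POOL_PAGES];
};

/* Return a free bounce page, or NULL if none is available right now. */
static unsigned char *bounce_try_get(struct bounce_pool *pool)
{
    for (size_t i = 0; i < POOL_PAGES; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            return pool->pages[i];
        }
    }
    return NULL;
}

enum write_path { WRITE_BOUNCED, WRITE_MUST_WAIT };

/*
 * Dual strategy: if a bounce page can be had without blocking, snapshot
 * the data into it and do the I/O from the copy. Otherwise report that
 * the caller has to wait for writeback, as it does today.
 */
static enum write_path prepare_write(struct bounce_pool *pool,
                                     const unsigned char *src,
                                     unsigned char **io_buf)
{
    unsigned char *bounce = bounce_try_get(pool);

    if (bounce == NULL) {
        *io_buf = NULL;
        return WRITE_MUST_WAIT;
    }
    memcpy(bounce, src, PAGE_SZ);  /* the per-page copy cost objected to above */
    *io_buf = bounce;
    return WRITE_BOUNCED;
}
```

The memcpy() in the bounced path is exactly the overhead the objection is
about: every page written that way pays for a full copy before the I/O is
issued.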
--D