From: Jan Kara <jack@suse.cz>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>, Jan Kara <jack@suse.cz>,
Mike Snitzer <snitzer@redhat.com>,
linux-scsi@vger.kernel.org, neilb@suse.de, dm-devel@redhat.com,
linux-fsdevel@vger.kernel.org, lsf-pc@lists.linux-foundation.org,
"Darrick J. Wong" <djwong@us.ibm.com>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] a few storage topics
Date: Mon, 23 Jan 2012 17:30:24 +0100
Message-ID: <20120123163024.GD28526@quack.suse.cz>
In-Reply-To: <4F1BF39A.6090709@panasas.com>
On Sun 22-01-12 13:31:38, Boaz Harrosh wrote:
> On 01/19/2012 11:39 PM, Andrea Arcangeli wrote:
> > On Thu, Jan 19, 2012 at 09:52:11PM +0100, Jan Kara wrote:
> >> anything. So what will be cheaper depends on how often are redirtied pages
> >> under IO. This is rather rare because pages aren't flushed all that often.
> >> So the effect of stable pages is not observable on throughput. But you can
> >> certainly see it on max latency...
> >
> > I see your point. A problem with migrate though is that the page must
> > be pinned by the I/O layer to prevent migration to free the page under
> > I/O, or how else it could be safe to read from a freed page? And if
> > the page is pinned migration won't work at all. See page_freeze_refs
> > in migrate_page_move_mapping. So the pinning issue would need to be
> > handled somehow. It's needed for example when there's an O_DIRECT
> > read, and the I/O is going to the page, if the page is migrated in
> > that case, we'd lose a part of the I/O. Differentiating how many page
> > pins are ok to be ignored by migration won't be trivial but probably
> > possible to do.
> >
> > Another way maybe would be to detect when there's too much re-dirtying
> > of pages in flight in a short amount of time, and to start the bounce
> > buffering and stop waiting, until the re-dirtying stops, and then you
> > stop the bounce buffering. But unlike migration, it can't prevent an
> > initial burst of high fault latency...
>
> Or just change that RT program that is one - latency bound but, two - does
> unpredictable, statistically bad, things to a memory mapped file.
Right. That's what I told the RT guy as well :) But he didn't like to
hear that because it meant more coding for him.
> Can a memory-mapped-file writer have some control on the time of
> writeback with data_sync or such, or is it purely: timer fired, kernel sees
> a dirty page, starts a writeout? What about if the application maps a
> portion of the file at a time, and the kernel gets more lazy on an active
> memory mapped region. (That's what Windows NT does. It will never IO a mapped
> section unless in OOM conditions. The application needs to map small sections
> and unmap to IO. It's more of a direct_io than mmap)
You can always start writeback by sync_file_range() but you have no
guarantees what writeback does. Also if you need to redirty the page
permanently (e.g. it's the head of your transaction log), there's simply no
good time when it can be written when you also want stable pages.
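To make the sync_file_range() point concrete, here is a minimal userspace
sketch. The helper name dirty_and_kick_writeback() and the one-page file size
are assumptions made for illustration only. It dirties one page through a
shared mapping and then asks the kernel to start writeout of that range.
SYNC_FILE_RANGE_WRITE only initiates writeout; it does not wait, and it says
nothing about when the page may be written again later.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): dirty the first page of a
 * shared mapping of `path`, then ask the kernel to start writeback of it.
 * Returns 0 on success, -1 on any failure.
 */
int dirty_and_kick_writeback(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    char *map;
    int ret = -1;
    int fd;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Make sure the file backs at least one page. */
    if (ftruncate(fd, page) != 0)
        goto out_close;

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    /* Storing through the shared mapping dirties the page. */
    memset(map, 0xab, (size_t)page);

    /*
     * Initiate writeout of dirty pages in [0, page). This does not wait
     * for completion and gives no control over later writeback passes.
     */
    if (sync_file_range(fd, 0, page, SYNC_FILE_RANGE_WRITE) == 0)
        ret = 0;

    munmap(map, (size_t)page);
out_close:
    close(fd);
    return ret;
}
```

An application could call such a helper right after finishing an update, so
that writeback of the range is more likely to be done before the next store.
That only narrows the window; a page that is redirtied continuously will
still hit the wait.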
> In any case, if you are very latency sensitive an mmap writeout is bad for
> you. Not only because of this new problem, but because mmap writeout can
> sync with tons of other things that are due to memory management. (As mentioned
> by Andrea). The best for latency sensitive applications is asynchronous direct-io
> by far. Only with asynchronous and direct-io you can have any real control on
> your latency. (I understand they used to have empirically observed latency bound
> but that is just luck, not real control)
>
> BTW: The application mentioned would probably not want its IO bounced at
> the block layer, otherwise why would it use mmap if not for preventing
> the copy induced by buffered IO?
Yeah, I'm not sure why their design was as it was.
> All that said, a mount option to ext4 (Is ext4 used?) to revert to the old
> behavior is the easiest solution. When originally we brought this up in LSF
> my thought was that the block request Q should have some flag that says
> need_stable_pages. If set by the likes of dm/md-raid, iscsi-with-data-signed, DIF
> enabled devices and so on, and the FS does not guarantee/want stable pages
> then an IO bounce is set up. But if not set then the likes of ext4 need not
> bother.
There's no mount option. The behavior is on unconditionally. And so far I
have not seen enough people complain to introduce something like that -
automatic logic is a different thing of course. That might be nice to have.
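The automatic logic Boaz describes could be modelled roughly as below. This
is only a userspace sketch of the decision, not kernel code: the enum, the
function name pick_stable_page_strategy() and both boolean inputs are
hypothetical names invented here, standing in for a per-queue
need_stable_pages flag and for whether the filesystem itself waits for
writeback before allowing a page to be modified.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the stable-page decision (illustration only). */
enum stable_page_strategy {
    NO_STABILITY_NEEDED,  /* device does not care; fs need not bother */
    STABLE_BY_FS_WAIT,    /* fs blocks writers until writeback finishes */
    STABLE_BY_BOUNCE      /* block layer copies the page before IO */
};

/*
 * queue_needs_stable_pages: models a request-queue flag set by users such
 *   as dm/md-raid, iSCSI with data digests, or DIF-enabled devices.
 * fs_provides_stable_pages: models a filesystem that waits for writeback
 *   before letting a page under IO be redirtied.
 */
enum stable_page_strategy
pick_stable_page_strategy(bool queue_needs_stable_pages,
                          bool fs_provides_stable_pages)
{
    if (!queue_needs_stable_pages)
        return NO_STABILITY_NEEDED;
    if (fs_provides_stable_pages)
        return STABLE_BY_FS_WAIT;
    /* Device needs stable data but the fs gives no guarantee: bounce. */
    return STABLE_BY_BOUNCE;
}
```

With such a scheme the latency cost of waiting would only be paid on devices
that actually need stable data, which is the appeal of automatic logic over
a mount option.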
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR