From: Jan Kara <jack@suse.cz>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>, Jan Kara <jack@suse.cz>,
	Mike Snitzer <snitzer@redhat.com>,
	linux-scsi@vger.kernel.org, neilb@suse.de, dm-devel@redhat.com,
	linux-fsdevel@vger.kernel.org, lsf-pc@lists.linux-foundation.org,
	"Darrick J. Wong" <djwong@us.ibm.com>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] a few storage topics
Date: Mon, 23 Jan 2012 17:30:24 +0100
Message-ID: <20120123163024.GD28526@quack.suse.cz>
In-Reply-To: <4F1BF39A.6090709@panasas.com>

On Sun 22-01-12 13:31:38, Boaz Harrosh wrote:
> On 01/19/2012 11:39 PM, Andrea Arcangeli wrote:
> > On Thu, Jan 19, 2012 at 09:52:11PM +0100, Jan Kara wrote:
> >> anything. So what will be cheaper depends on how often pages under IO
> >> are redirtied. This is rather rare because pages aren't flushed all that
> >> often. So the effect of stable pages is not observable on throughput. But
> >> you can certainly see it on max latency...
> > 
> > I see your point. A problem with migrate though is that the page must
> > be pinned by the I/O layer to prevent migration to free the page under
> > I/O, or how else it could be safe to read from a freed page? And if
> > the page is pinned migration won't work at all. See page_freeze_refs
> > in migrate_page_move_mapping. So the pinning issue would need to be
> > handled somehow. It's needed for example when there's an O_DIRECT
> > read, and the I/O is going to the page, if the page is migrated in
> > that case, we'd lose a part of the I/O. Differentiating how many page
> > pins are ok to be ignored by migration won't be trivial but probably
> > possible to do.
> > 
> > Another way maybe would be to detect when there's too much re-dirtying
> > of pages in flight in a short amount of time, and to start the bounce
> > buffering and stop waiting, until the re-dirtying stops, and then you
> > stop the bounce buffering. But unlike migration, it can't prevent an
> > initial burst of high fault latency...
> 
> Or just change that RT program that is one - latency bound but, two - does
> unpredictable, statistically bad, things to a memory mapped file.
  Right. That's what I told the RT guy as well :) But he didn't like to
hear that because it meant more coding for him.

> Can a memory-mapped-file writer have some control on the time of
> writeback with data_sync or such, or is it purely: Timer fired, Kernel sees
> a dirty page, starts a writeout? What about if the application maps a
> portion of the file at a time, and the Kernel gets more lazy on an active
> memory mapped region. (That's what Windows NT does. It will never IO a mapped
> section unless in OOM conditions. The application needs to map small sections
> and unmap to IO. It's more of a direct_io than mmap)
  You can always start writeback by sync_file_range() but you have no
guarantees about what writeback does. Also if you need to redirty the page
permanently (e.g. it's the head of your transaction log), there's simply no
good time when it can be written when you also want stable pages.
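
A minimal sketch of the sync_file_range() approach mentioned above, assuming
a Linux system with glibc. The helper name dirty_and_kick_writeback() is
hypothetical, and the file behind fd is assumed to be open O_RDWR and at least
len bytes long. It only asks the kernel to start writeback of the range; it
does not wait for completion and gives no durability guarantee.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty `len` bytes of a shared file mapping, then
 * ask the kernel to start writeback of that range.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int dirty_and_kick_writeback(int fd, size_t len)
{
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    /* Dirty the pages through the mapping. */
    memset(map, 'x', len);

    /*
     * SYNC_FILE_RANGE_WRITE initiates writeback of dirty pages in the
     * range that are not already under writeback. It does not wait for
     * the IO to finish and does not flush file metadata.
     */
    int ret = sync_file_range(fd, 0, (off64_t)len, SYNC_FILE_RANGE_WRITE);

    munmap(map, len);
    return ret;
}
```

With stable pages, a store to one of these pages while its writeback is still
in flight can block until the IO completes, which is the latency issue this
thread is about. Starting writeback explicitly only moves that window to a
time the application chooses.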

> In any case, if you are very latency sensitive an mmap writeout is bad for
> you. Not only because of this new problem, but because mmap writeout can
> sync with tons of other things that are due to memory management. (As mentioned
> by Andrea). The best for latency sensitive applications is asynchronous direct-io
> by far. Only with asynchronous and direct-io can you have any real control on
> your latency. (I understand they used to have an empirically observed latency
> bound but that is just luck, not real control)
> 
> BTW: The application mentioned would probably not want its IO bounced at
> the block layer, otherwise why would it use mmap if not for preventing
> the copy induced by buffered IO?
  Yeah, I'm not sure why their design was as it was.

> All that said, a mount option to ext4 (Is ext4 used?) to revert to the old
> behavior is the easiest solution. When originally we brought this up in LSF
> my thought was that the block request queue should have some flag that says
> need_stable_pages. If set by the likes of dm/md-raid, iscsi-with-data-signed, DIF
> enabled devices and so on, and the FS does not guarantee/want stable pages
> then an IO bounce is set up. But if not set then the likes of ext4 need not
> bother.
  There's no mount option. The behavior is on unconditionally. And so far I
have not seen enough people complain to introduce something like that -
automatic logic is a different thing of course. That might be nice to have.
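
The automatic logic proposed in the quoted text could be sketched roughly as
below. This is only an illustration of the decision being discussed, not
kernel code: the struct and function names are hypothetical, and
need_stable_pages is the flag name taken from the proposal above.

```c
#include <stdbool.h>

/* Hypothetical: what the block device (or stacked driver) requires. */
struct queue_caps {
    bool need_stable_pages;     /* e.g. RAID parity, data digests, DIF */
};

/* Hypothetical: what the filesystem is willing to guarantee. */
struct fs_caps {
    bool provides_stable_pages; /* FS blocks writers during writeback */
};

/*
 * The FS only has to make writers wait on pages under writeback when the
 * device actually needs stable pages and the FS has agreed to provide them.
 */
bool fs_must_wait_on_writeback(const struct queue_caps *q,
                               const struct fs_caps *fs)
{
    return q->need_stable_pages && fs->provides_stable_pages;
}

/*
 * The block layer has to bounce (copy) the data when the device needs
 * stable pages but the FS does not guarantee them.
 */
bool block_layer_must_bounce(const struct queue_caps *q,
                             const struct fs_caps *fs)
{
    return q->need_stable_pages && !fs->provides_stable_pages;
}
```

When the device does not set the flag, both functions return false, so
neither the wait nor the copy is paid for.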

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
