linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>, Vivek Goyal <vgoyal@redhat.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	"lsf@lists.linux-foundation.org" <lsf@lists.linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [Lsf] IO less throttling and cgroup aware writeback (Was: Re: Preliminary Agenda and Activities for LSF)
Date: Wed, 20 Apr 2011 12:56:06 +0200
Message-ID: <20110420105606.GA4991@quack.suse.cz>
In-Reply-To: <20110420012131.GB4421@localhost>

On Wed 20-04-11 09:21:31, Wu Fengguang wrote:
> On Wed, Apr 20, 2011 at 04:58:21AM +0800, Jan Kara wrote:
> > On Tue 19-04-11 13:05:43, Vivek Goyal wrote:
> > > On Wed, Apr 20, 2011 at 12:58:38AM +0800, Wu Fengguang wrote:
> > > > On Tue, Apr 19, 2011 at 11:31:06PM +0800, Vivek Goyal wrote:
> > > > > On Tue, Apr 19, 2011 at 11:22:40PM +0800, Wu Fengguang wrote:
> > > > > > On Tue, Apr 19, 2011 at 11:11:11PM +0800, Vivek Goyal wrote:
> > > > > > > On Tue, Apr 19, 2011 at 04:48:32PM +0200, Jan Kara wrote:
> > > > > > > > On Tue 19-04-11 10:34:23, Vivek Goyal wrote:
> > > > > > > > > On Tue, Apr 19, 2011 at 10:17:17PM +0800, Wu Fengguang wrote:
> > > > > > > > > > [snip]
> > > > > > > > > > > > > > For throttling case, apart from metadata, I found that with simple
> > > > > > > > > > > > > > > > throttling of data I ran into issues with journalling with ext4 mounted
> > > > > > > > > > > > > > in ordered mode. So it was suggested that WRITE IO throttling should
> > > > > > > > > > > > > > not be done at device level instead try to do it in higher layers,
> > > > > > > > > > > > > > possibly balance_dirty_pages() and throttle process early.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The problem with doing it at the page cache entry level is that
> > > > > > > > > > > > > > > cache hits then get throttled. It's not really an IO controller at
> > > > > > > > > > > > > that point, and the impact on application performance could be huge
> > > > > > > > > > > > > (i.e. MB/s instead of GB/s).
> > > > > > > > > > > > 
> > > > > > > > > > > > Agreed that throttling cache hits is not a good idea. Can we determine
> > > > > > > > > > > > > if the page being asked for is in cache or not and charge for IO accordingly?
> > > > > > > > > > > 
> > > > > > > > > > > You'd need hooks in find_or_create_page(), though you have no
> > > > > > > > > > > context of whether a read or a write is in progress at that point.
> > > > > > > > > > 
> > > > > > > > > > I'm confused.  Where is the throttling at cache hits?
> > > > > > > > > > 
> > > > > > > > > > The balance_dirty_pages() throttling kicks in at write() syscall and
> > > > > > > > > > page fault time. For example, generic_perform_write(), do_wp_page()
> > > > > > > > > > and __do_fault() will explicitly call
> > > > > > > > > > balance_dirty_pages_ratelimited() to do the write throttling.
> > > > > > > > > 
> > > > > > > > > This comment was in the context of what if we move block IO controller read
> > > > > > > > > throttling also in higher layers. Then we don't want to throttle reads
> > > > > > > > > which are already in cache.
> > > > > > > > > 
> > > > > > > > > Currently throttling hook is in generic_make_request() and it kicks in
> > > > > > > > > only if data is not present in page cache and actual disk IO is initiated.
> > > > > > > >   You can always throttle in readpage(). It's not much higher than
> > > > > > > > generic_make_request() but basically as high as it can get I suspect
> > > > > > > > (otherwise you'd have to deal with lots of different code paths like page
> > > > > > > > faults, splice, read, ...).
> > > > > > > 
> > > > > > > Yep, I was thinking that what do I gain by moving READ throttling up. 
> > > > > > > The only thing generic_make_request() does not catch is network file
> > > > > > > systems. I think for that I can introduce another hook say in NFS and
> > > > > > > I might be all set.
> > > > > > 
> > > > > > Basically all data reads go through the readahead layer, and the
> > > > > > __do_page_cache_readahead() function.
> > > > > > 
> > > > > > Just one more option for your tradeoffs :)
> > > > > 
> > > > > But this does not cover direct IO?
> > > > 
> > > > Yes, sorry!
> > > > 
> > > > > But I guess if I split the hook into two parts (one in direct IO path
> > > > > and one in __do_page_cache_readahead()), then filesystems don't have
> > > > > to mark meta data READS. I will look into it.
> > > > 
> > > > Right, and the hooks should be trivial to add.
> > > > 
> > > > The readahead code is typically invoked in three ways:
> > > > 
> > > > - sync readahead, on page cache miss, => page_cache_sync_readahead()
> > > > 
> > > > - async readahead, on hitting PG_readahead (tagged on one page per readahead window),
> > > >   => page_cache_async_readahead()
> > > > 
> > > > - user space readahead, fadvise(WILLNEED), => force_page_cache_readahead()
> > > > 
> > > > ext3/4 also call into readahead on readdir().
> > > 
> > > So this will be called for even meta data READS. Then there is no
> > > advantage of moving the throttle hook out of generic_make_request()?
> >   No, generally it won't. I think Fengguang was wrong - only ext2 carries
> > directories in page cache and thus uses readahead code. All other
> > filesystems handle directories specially and don't use readpage for them.
> 
> So ext2 is implicitly using readahead? ext3/4 behave differently in that
> ext4_readdir() has an explicit call to page_cache_sync_readahead(),
> passing the blockdev mapping as the page cache container.
  Yes, ext2 implicitly uses readahead because it uses read_mapping_page()
for directory inodes. I forgot that ext3/4 call
page_cache_sync_readahead(), so you were right that they actually use it for
the device inode. I'm sorry for the noise.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
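
The concern raised in the thread is that a read throttle placed too high
would also charge page cache hits, while a hook on the readahead path only
sees reads that need real IO. A minimal user-space sketch of that idea
follows. It is a toy model, not kernel code: toy_cache, toy_readahead(),
toy_read_page(), TOY_NR_PAGES and TOY_RA_WINDOW are all hypothetical names
invented for illustration, and the fixed readahead window is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES  16   /* size of the toy "file", in pages (assumption) */
#define TOY_RA_WINDOW 4    /* fixed readahead window, in pages (assumption) */

/* Hypothetical toy model of a page cache plus a throttle counter. */
struct toy_cache {
    bool cached[TOY_NR_PAGES];  /* true once a page is in the toy cache */
    unsigned long charged;      /* pages charged to the throttler so far */
};

/*
 * Stand-in for a throttle hook on the readahead path: only pages that
 * are not yet cached, and so would need real IO, are charged.
 */
static void toy_readahead(struct toy_cache *c, size_t start, size_t nr)
{
    assert(start + nr <= TOY_NR_PAGES);
    for (size_t i = start; i < start + nr; i++) {
        if (c->cached[i])
            continue;           /* already cached: no IO, no charge */
        c->charged++;           /* this page would be read from disk */
        c->cached[i] = true;
    }
}

/*
 * Read one page. A cache hit returns true and is never charged; a miss
 * triggers "sync readahead" of a small window and returns false.
 */
static bool toy_read_page(struct toy_cache *c, size_t index)
{
    assert(index < TOY_NR_PAGES);
    if (c->cached[index])
        return true;

    size_t nr = TOY_RA_WINDOW;
    if (index + nr > TOY_NR_PAGES)
        nr = TOY_NR_PAGES - index;  /* clamp the window at end of file */
    toy_readahead(c, index, nr);
    return false;
}
```

Starting from an empty cache, reading page 0 misses and charges the whole
window, while a following read of page 1 is a hit and leaves the charge
unchanged. Direct IO bypasses this path entirely, which is why the thread
notes that it would need a separate hook.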
