From: Vivek Goyal <vgoyal@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>,
James Bottomley <James.Bottomley@hansenpartnership.com>,
lsf@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [Lsf] IO less throttling and cgroup aware writeback (Was: Re: Preliminary Agenda and Activities for LSF)
Date: Fri, 15 Apr 2011 17:07:50 -0400
Message-ID: <20110415210750.GC28323@redhat.com>
In-Reply-To: <20110411013630.GM30279@dastard>

On Mon, Apr 11, 2011 at 11:36:30AM +1000, Dave Chinner wrote:
[..]
> > > > > how metadata IO is going to be handled by
> > > > > IO controllers,
> > > >
> > > > So IO controller provides two mechanisms.
> > > >
> > > > - IO throttling(bytes_per_second, io_per_second interface)
> > > > - Proportional weight disk sharing
> > > >
> > > > In case of proportional weight disk sharing, we don't run into issues of
> > > > priority inversion and metadata handling should not be a concern.
> > >
> > > Though metadata IO will affect how much bandwidth/iops is available
> > > for applications to use.
> >
> > I think meta data IO will be accounted to the process submitting the meta
> > data IO. (IO tracking stuff will be used only for page cache pages during
> > page dirtying time). So yes, the process doing meta data IO will be
> > charged for it.
> >
> > I think I am missing something here and not understanding your concern
> > exactly here.
>
> XFS can issue thousands of delayed metadata write IO per second from
> its writeback threads when it needs to (e.g. tail pushing the
> journal). Completely unthrottled due to the context they are issued
> from(*) and can basically consume all the disk iops and bandwidth
> capacity for seconds at a time.
>
> Also, XFS doesn't use the page cache for metadata buffers anymore
> so page cache accounting, throttling and reclaim mechanisms
> are never going to work for controlling XFS metadata IO.
>
>
> (*) It'll be IO issued by workqueues rather than threads RSN:
>
> http://git.kernel.org/?p=linux/kernel/git/dgc/xfsdev.git;a=shortlog;h=refs/heads/xfs-for-2.6.39
>
> And this will become _much_ more common in the not-too-distant
> future. So context passing between threads and to workqueues is
> something you need to think about sooner rather than later if you
> want metadata IO to be throttled in any way....

Ok,

So this seems to be a similar case to WRITE traffic from flusher threads,
which can disrupt IO on the end device even if we have done throttling in
balance_dirty_pages().

How about doing throttling at two layers? All the data throttling is
done in higher layers, and we also retain the mechanism of throttling
at the end device. That way an admin can put an overall limit on such
common write traffic (XFS metadata coming from workqueues, flusher
threads, kswapd etc.).

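The two-layer idea above can be illustrated with a minimal user-space sketch. It is not kernel code and not the actual blkio implementation: `TokenBucket`, `TwoLayerThrottle`, the group name and all rates are hypothetical names and values chosen for illustration, and the clock is passed in explicitly so the behaviour is deterministic. The higher layer charges a per-group bucket at page-dirtying time (in the spirit of balance_dirty_pages()), while the lower layer charges a single per-device bucket for writes that cannot be attributed to a group.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Byte-rate limiter with an explicit clock (hypothetical helper)."""
    rate_bps: float      # refill rate in bytes per second
    burst: float         # maximum number of bytes that may accumulate
    tokens: float = 0.0  # current balance; negative means outstanding debt
    last: float = 0.0    # time of the previous charge

    def delay_for(self, nbytes: int, now: float) -> float:
        """Charge nbytes at time `now`; return how long the caller should wait."""
        # Refill for the time elapsed since the last charge, capped at burst.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bps)
        self.last = now
        self.tokens -= nbytes
        # A negative balance takes -tokens / rate seconds to pay back.
        return max(0.0, -self.tokens / self.rate_bps)


class TwoLayerThrottle:
    """Per-group limits at the higher layer, one overall limit at the device."""

    def __init__(self, group_limits, device_limit_bps):
        # Higher layer: one bucket per group, charged when pages are dirtied.
        self.groups = {
            name: TokenBucket(rate, burst=rate)
            for name, rate in group_limits.items()
        }
        # Lower layer: a single bucket for common, unattributed write traffic
        # (e.g. metadata from workqueues, flusher threads, kswapd).
        self.device = TokenBucket(device_limit_bps, burst=device_limit_bps)

    def dirty_pages(self, group, nbytes, now):
        """Higher-layer throttling of a task in `group` dirtying nbytes."""
        return self.groups[group].delay_for(nbytes, now)

    def submit_common_write(self, nbytes, now):
        """Lower-layer throttling of IO with no per-group context."""
        return self.device.delay_for(nbytes, now)


throttle = TwoLayerThrottle({"grpA": 1_000_000}, device_limit_bps=4_000_000)
first = throttle.dirty_pages("grpA", 500_000, now=1.0)
second = throttle.dirty_pages("grpA", 1_000_000, now=1.0)
common = throttle.submit_common_write(8_000_000, now=1.0)
```

In this sketch the group's first write fits within its budget, the second one overshoots and is asked to wait, and the unattributed write is delayed only by the device-wide limit, never by any group's budget.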
Anyway, we can't attribute this IO to a per-process context/group; otherwise,
most likely something will get serialized in higher layers.

Right now I am speaking purely from the IO throttling point of view and am not
even thinking about CFQ and the IO tracking stuff.

This increases the complexity of the IO cgroup interface, as we now seem to have
four combinations.

  Global throttling
	Throttling at lower layers
	Throttling at higher layers

  Per-device throttling
	Throttling at lower layers
	Throttling at higher layers

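To make the size of that interface concrete, the combinations above can be enumerated as the product of scope and layer. This is only a sketch: the labels below are hypothetical and are not names of actual cgroup files.

```python
from itertools import product

# Hypothetical labels for illustration; not actual cgroup file names.
SCOPES = ("global", "per_device")
LAYERS = ("lower", "higher")


def throttle_knobs():
    """Return every (scope, layer) pair an admin might have to configure."""
    return list(product(SCOPES, LAYERS))


for scope, layer in throttle_knobs():
    print(f"{scope} throttling at {layer} layers")
```

Each pair would need its own limit, which is where the extra interface complexity comes from.
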
Thanks
Vivek