From: Vivek Goyal <vgoyal@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	lsf@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [Lsf] IO less throttling and cgroup aware writeback (Was: Re: Preliminary Agenda and Activities for LSF)
Date: Fri, 1 Apr 2011 13:18:38 -0400
Message-ID: <20110401171838.GD20986@redhat.com>
In-Reply-To: <20110331222756.GC2904@dastard>

On Fri, Apr 01, 2011 at 09:27:56AM +1100, Dave Chinner wrote:
> On Thu, Mar 31, 2011 at 07:50:23AM -0700, Greg Thelen wrote:
> > On Thu, Mar 31, 2011 at 7:16 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > On Thu, Mar 31, 2011 at 09:20:02AM +1100, Dave Chinner wrote:
> > >> > and also try to select inodes intelligently (cgroup aware manner).
> > >>
> > >> Such selection algorithms would need to be able to handle hundreds
> > >> of thousands of newly dirtied inodes per second so sorting and
> > >> selecting them efficiently will be a major issue...
> > >
> > > There was a proposal for the memory cgroup to maintain a per memory
> > > cgroup per bdi structure which will keep a list of inodes which need
> > > writeback from that cgroup.
> > 
> > FYI, I have patches which implement this per memcg per bdi dirty inode
> > list.  I want to debug a few issues before posting an RFC series.  But
> > it is getting close.
> 
> That's all well and good, but we're still trying to work out how to
> scale this list in a sane fashion. We just broke it out into its
> own global lock, so it's going to change soon so that the list+lock
> is not a contention point on large machines. Just breaking it into a
> list per cgroup doesn't solve this problem - it just adds another
> container to the list.
> 
> Also, you have the problem that some filesystems don't use the bdi
> dirty inode list for all the dirty inodes in the filesystem - XFS has
> recently changed to only track VFS dirtied inodes in that list, instead
> using its own active item list to track all logged modifications.
> IIUC, btrfs and ext3/4 do something similar as well. My current plans
> are to modify the dirty inode code to allow filesystems to say to
> the VFS "don't track this dirty inode - I'm doing it myself" so that
> we can reduce the VFS dirty inode list to only those inodes with
> dirty pages....
> 
> > > So any cgroup looking for a writeback will queue up this structure on
> > > bdi and flusher threads can walk through this list and figure out
> > > which memory cgroups and which inodes within memory cgroup need to
> > > be written back.
> > 
> > The way these memcg-writeback patches are currently implemented is
> > that when a memcg is over background dirty limits, it will queue the
> > memcg on a global over_bg_limit list and wake up the bdi flusher.
> 
> No global lists and locks, please. That's one of the big problems
> with the current foreground IO based throttling - it _hammers_ the
> global inode writeback list locks such that on an 8p machine we can
> be wasting 2-3 entire CPUs just contending on it when all 8 CPUs are
> trying to throttle and write back at the same time.....
> 
> > There
> > is no context (memcg or otherwise) given to the bdi flusher.  After
> > the bdi flusher checks system-wide background limits, it uses the
> > over_bg_limit list to find (and rotate) an over limit memcg.  Using
> > the memcg, then the per memcg per bdi dirty inode list is walked to
> > find inode pages to writeback.  Once the memcg dirty memory usage
> > drops below the memcg-thresh, the memcg is removed from the global
> > over_bg_limit list.
> 
> If you want controlled hand-off of writeback, you need to pass the
> memcg that triggered the throttling directly to the bdi. You already
> know what both the bdi and memcg that need writeback are. Yes, this
> needs concurrency at the BDI flush level to handle, but see my
> previous email in this thread for that....
> 

Even with the memcg being passed around, I don't think we get rid of
the global list lock. The reason is that inodes are not exclusive to
memory cgroups: multiple memory cgroups might be writing to the same
inode. So the inode still remains on the global list, and the memory
cgroups will only hold pointers to it. So to start writeback on an
inode you will still have to take the global lock, IIUC.
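
To make the concern concrete, here is a minimal userspace sketch of the
layout I have in mind. It is only a model under my own assumptions, not
kernel code: model_inode, model_bdi, model_memcg, mark_dirty() and
memcg_writeback() are all hypothetical names, and a pthread mutex stands
in for the global list lock. The inode lives on one shared list, each
memcg holds only pointers to it, and every per-memcg writeback pass
still has to take the shared lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MAX_REFS 8

struct model_inode {
    int ino;
    int on_global_list;          /* 1 while queued on the shared list */
    struct model_inode *next;
};

struct model_bdi {
    pthread_mutex_t list_lock;   /* stands in for the global list lock */
    struct model_inode *dirty;   /* single shared list of dirty inodes */
    unsigned long lock_acquisitions;
};

struct model_memcg {
    struct model_inode *refs[MAX_REFS]; /* pointers only, no ownership */
    size_t nr_refs;
};

static void model_bdi_init(struct model_bdi *bdi)
{
    pthread_mutex_init(&bdi->list_lock, NULL);
    bdi->dirty = NULL;
    bdi->lock_acquisitions = 0;
}

/* A memcg dirties an inode: queue it once globally, remember a pointer. */
static void mark_dirty(struct model_bdi *bdi, struct model_memcg *memcg,
                       struct model_inode *inode)
{
    pthread_mutex_lock(&bdi->list_lock);
    bdi->lock_acquisitions++;
    if (!inode->on_global_list) {
        inode->next = bdi->dirty;
        bdi->dirty = inode;
        inode->on_global_list = 1;
    }
    pthread_mutex_unlock(&bdi->list_lock);

    assert(memcg->nr_refs < MAX_REFS);
    memcg->refs[memcg->nr_refs++] = inode;
}

/*
 * Write back everything this memcg points at. Another memcg may share
 * the inode, so the shared lock is needed to take it off the list.
 * Returns the number of inodes this pass actually removed.
 */
static int memcg_writeback(struct model_bdi *bdi, struct model_memcg *memcg)
{
    int written = 0;

    for (size_t i = 0; i < memcg->nr_refs; i++) {
        struct model_inode *inode = memcg->refs[i];
        struct model_inode **pp;

        pthread_mutex_lock(&bdi->list_lock);
        bdi->lock_acquisitions++;
        if (inode->on_global_list) {
            /* Find the link that points at this inode and unlink it. */
            for (pp = &bdi->dirty; *pp != inode; pp = &(*pp)->next)
                ;
            *pp = inode->next;
            inode->next = NULL;
            inode->on_global_list = 0;
            written++;
        }
        pthread_mutex_unlock(&bdi->list_lock);
    }
    memcg->nr_refs = 0;
    return written;
}
```

In this model, if two memcgs dirty the same inode, the first writeback
pass removes it and the second finds nothing left to do, but both passes
go through list_lock. Passing the memcg to the flusher narrows which
inodes are looked at; it does not by itself remove the shared lock.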

Thanks
Vivek

