cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
To: Fengguang Wu <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
	ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	andrea-oIIqvOZpAevzfdHfmsDf5w@public.gmane.org,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	sjayaraman-IBi9RG/b67k@public.gmane.org,
	lsf-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	jmoyer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Subject: Re: [RFC] writeback and cgroup
Date: Thu, 19 Apr 2012 22:26:35 +0200	[thread overview]
Message-ID: <20120419202635.GA4795@quack.suse.cz> (raw)
In-Reply-To: <20120419142343.GA12684@localhost>

On Thu 19-04-12 22:23:43, Wu Fengguang wrote:
> For one instance, splitting the request queues will give rise to
> PG_writeback pages.  Those pages have been the biggest source of
> latency issues in the various parts of the system.
  Well, if we allow more requests to be in flight in total then yes, number
of PG_Writeback pages can be higher as well.

> It's not uncommon for me to see filesystems sleep on PG_writeback
> pages during heavy writeback, within some lock or transaction, which in
> turn stall many tasks that try to do IO or merely dirty some page in
> memory. Random writes are especially susceptible to such stalls. The
> stable page feature also vastly increase the chances of stalls by
> locking the writeback pages. 
> 
> Page reclaim may also block on PG_writeback and/or PG_dirty pages. In
> the case of direct reclaim, it means blocking random tasks that are
> allocating memory in the system.
> 
> PG_writeback pages are much worse than PG_dirty pages in that they are
> not movable. This makes a big difference for high-order page allocations.
> To make room for a 2MB huge page, vmscan has the option to migrate
> PG_dirty pages, but for PG_writeback it has no better choices than to
> wait for IO completion.
> 
> The difficulty of THP allocation goes up *exponentially* with the
> number of PG_writeback pages. Assume PG_writeback pages are randomly
> distributed in the physical memory space. Then we have formula
> 
>         P(reclaimable for THP) = 1 - P(hit PG_writeback)^256
  Well, this implicitely assumes that PG_Writeback pages are scattered
across memory uniformly at random. I'm not sure to which extent this is
true... Also as a nitpick, this isn't really an exponential growth since
the exponent is fixed (256 - actually it should be 512, right?). It's just
a polynomial with a big exponent. But sure, growth in number of PG_Writeback
pages will cause relatively steep drop in the number of available huge
pages.

...
> It's worth to note that running multiple flusher threads per bdi means
> not only disk seeks for spin disks, smaller IO size for SSD, but also
> lock contentions and cache bouncing for metadata heavy workloads and
> fast storage.
  Well, this heavily depends on particular implementation (and chosen
data structures). But yes, we should have that in mind.

...
> > > To me, balance_dirty_pages() is *the* proper layer for buffered writes.
> > > It's always there doing 1:1 proportional throttling. Then you try to
> > > kick in to add *double* throttling in block/cfq layer. Now the low
> > > layer may enforce 10:1 throttling and push balance_dirty_pages() away
> > > from its balanced state, leading to large fluctuations and program
> > > stalls.
> > 
> > Just do the same 1:1 inside each cgroup.
> 
> Sure. But the ratio mismatch I'm talking about is inter-cgroup.
> For example there are only 2 dd tasks doing buffered writes in the
> system. Now consider the mismatch that cfq is dispatching their IO
> requests at 10:1 weights, while balance_dirty_pages() is throttling
> the dd tasks at 1:1 equal split because it's not aware of the cgroup
> weights.
> 
> What will happen in the end? The 1:1 ratio imposed by
> balance_dirty_pages() will take effect and the dd tasks will progress
> at the same pace. The cfq weights will be defeated because the async
> queue for the second dd (and cgroup) constantly runs empty.
  Yup. This just shows that you have to have per-cgroup dirty limits. Once
you have those, things start working again.

								Honza
-- 
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR

  parent reply	other threads:[~2012-04-19 20:26 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-03 18:36 [RFC] writeback and cgroup Tejun Heo
2012-04-04 14:51 ` Vivek Goyal
     [not found]   ` <20120404145134.GC12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-04 15:36     ` [Lsf] " Steve French
2012-04-04 18:56       ` Tejun Heo
2012-04-04 19:19         ` Vivek Goyal
     [not found]           ` <20120404191918.GK12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-25  8:47             ` Suresh Jayaraman
2012-04-04 18:49     ` Tejun Heo
2012-04-04 19:23       ` [Lsf] " Steve French
2012-04-14 12:15         ` Peter Zijlstra
2012-04-04 20:32       ` Vivek Goyal
     [not found]         ` <20120404203239.GM12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-04 23:02           ` Tejun Heo
     [not found]       ` <20120404184909.GB29686-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
2012-04-05 16:38         ` Tejun Heo
     [not found]           ` <20120405163854.GE12854-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-04-05 17:13             ` Vivek Goyal
2012-04-14 11:53         ` [Lsf] " Peter Zijlstra
2012-04-07  8:00   ` Jan Kara
     [not found]     ` <20120407080027.GA2584-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-10 16:23       ` [Lsf] " Steve French
2012-04-10 18:16         ` Vivek Goyal
2012-04-10 18:06     ` Vivek Goyal
     [not found]       ` <20120410180653.GJ21801-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-10 21:05         ` Jan Kara
2012-04-10 21:20           ` Vivek Goyal
2012-04-10 22:24             ` Jan Kara
     [not found]               ` <20120410222425.GF4936-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-11 15:40                 ` Vivek Goyal
2012-04-11 15:45                   ` Vivek Goyal
2012-04-11 17:05                     ` Jan Kara
2012-04-11 17:23                       ` Vivek Goyal
     [not found]                         ` <20120411172311.GF16692-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-11 19:44                           ` Jan Kara
     [not found]                       ` <20120411170542.GB16008-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-17 21:48                         ` Tejun Heo
2012-04-18 18:18                           ` Vivek Goyal
2012-04-11 19:22                   ` Jan Kara
     [not found]                     ` <20120411192231.GF16008-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-12 20:37                       ` Vivek Goyal
2012-04-12 20:51                         ` Tejun Heo
     [not found]                           ` <20120412205148.GA24056-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-04-14 14:36                             ` Fengguang Wu
2012-04-16 14:57                               ` Vivek Goyal
2012-04-24 11:33                                 ` Fengguang Wu
2012-04-24 14:56                                   ` Jan Kara
2012-04-24 15:58                                     ` Vivek Goyal
     [not found]                                       ` <20120424155843.GG26708-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-25  2:42                                         ` Fengguang Wu
     [not found]                                     ` <20120424145655.GA1474-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-25  3:16                                       ` Fengguang Wu
2012-04-25  9:01                                         ` Jan Kara
     [not found]                                           ` <20120425090156.GB12568-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-25 12:05                                             ` Fengguang Wu
2012-04-15 11:37                         ` [Lsf] " Peter Zijlstra
2012-04-17 22:01                       ` Tejun Heo
2012-04-18  6:30                         ` Jan Kara
2012-04-14 12:25                   ` [Lsf] " Peter Zijlstra
2012-04-16 12:54                     ` Vivek Goyal
2012-04-16 13:07                       ` Fengguang Wu
2012-04-16 14:19                         ` Fengguang Wu
2012-04-16 15:52                         ` Vivek Goyal
     [not found]                           ` <20120416155207.GB15437-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-17  2:14                             ` Fengguang Wu
     [not found] ` <20120403183655.GA23106-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
2012-04-04 17:51   ` Fengguang Wu
2012-04-04 18:35     ` Vivek Goyal
     [not found]       ` <20120404183528.GJ12676-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-04-04 21:42         ` Fengguang Wu
2012-04-05 15:10           ` Vivek Goyal
2012-04-06  0:32             ` Fengguang Wu
2012-04-04 19:33     ` Tejun Heo
     [not found]       ` <20120404193355.GD29686-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
2012-04-04 20:18         ` Vivek Goyal
2012-04-05 16:31           ` Tejun Heo
2012-04-05 17:09             ` Vivek Goyal
2012-04-06  9:59         ` Fengguang Wu
2012-04-17 22:38           ` Tejun Heo
     [not found]             ` <20120417223854.GG19975-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-04-19 14:23               ` Fengguang Wu
2012-04-19 18:31                 ` Vivek Goyal
2012-04-20 12:45                   ` Fengguang Wu
2012-04-20 19:29                     ` Vivek Goyal
2012-04-20 21:33                       ` Tejun Heo
2012-04-22 14:26                         ` Fengguang Wu
2012-04-23 12:30                         ` Vivek Goyal
2012-04-23 16:04                           ` Tejun Heo
2012-04-19 20:26                 ` Jan Kara [this message]
2012-04-20 13:34                   ` Fengguang Wu
2012-04-20 19:08                     ` Tejun Heo
     [not found]                       ` <20120420190844.GH32324-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-04-22 14:46                         ` Fengguang Wu
2012-04-23 16:56                           ` Tejun Heo
2012-04-24  7:58                             ` Fengguang Wu
2012-04-25 15:47                               ` Tejun Heo
2012-04-23  9:14                     ` Jan Kara
2012-04-23 10:24                       ` Fengguang Wu
2012-04-23 12:42                         ` Jan Kara
2012-04-23 14:31                           ` Fengguang Wu
2012-04-18  6:57           ` Jan Kara
     [not found]             ` <20120418065720.GA21485-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2012-04-18  7:58               ` Fengguang Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120419202635.GA4795@quack.suse.cz \
    --to=jack-alswssmvlrq@public.gmane.org \
    --cc=andrea-oIIqvOZpAevzfdHfmsDf5w@public.gmane.org \
    --cc=axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=jmoyer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    --cc=lsf-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=sjayaraman-IBi9RG/b67k@public.gmane.org \
    --cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).