Re: [Lsf-pc] [LSF/MM ATTEND] Filesystems -- Btrfs, cgroups, Storage topics from Facebook

From: Jan Kara <jack@suse.cz>
To: Chris Mason <clm@fb.com>
Cc: "tm@tao.ma" <tm@tao.ma>, "jack@suse.cz" <jack@suse.cz>,
	"gnehzuil.liu@gmail.com" <gnehzuil.liu@gmail.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [Lsf-pc] [LSF/MM ATTEND] Filesystems -- Btrfs, cgroups, Storage topics from Facebook
Date: Thu, 2 Jan 2014 07:46:59 +0100
Message-ID: <20140102064659.GF11920@quack.suse.cz>
In-Reply-To: <1388504116.24668.0.camel@ret.masoncoding.com>

On Tue 31-12-13 15:34:40, Chris Mason wrote:
> On Tue, 2013-12-31 at 22:22 +0800, Tao Ma wrote:
> > Hi Chris,
> > On 12/31/2013 09:19 PM, Chris Mason wrote:
> >  
> > > So I'd like to throttle the rate at which dirty pages are created,
> > > preferably based on the rates currently calculated in the BDI of how
> > > quickly the device is doing IO.  This way we can limit dirty creation to
> > > a percentage of the disk capacity during the current workload
> > > (regardless of random vs buffered).
> > Fengguang had already done some work on this, but it seems that the
> > community doesn't have a consensus on where this control file should go.
> >  You can look at this link: https://lkml.org/lkml/2011/4/4/205
> 
> I had forgotten Wu's patches here, it's very close to the starting point
> I was hoping for.
  I specifically don't like those patches because throttling the pagecache
dirty rate is IMHO a rather poor interface. What people want to do is to
limit IO from a container. That means reads & writes, buffered & direct IO.
So the dirty rate is just one of several things which contribute to the
total IO rate. When you have both direct IO & buffered IO happening in the
container they influence each other, so a dirty rate of 50 MB/s may be fine
when nothing else is going on in the container but may be far too much for
the system if there are heavy direct IO reads happening as well.

So you really need to tune the limit on the dirty rate depending on how
fast the writeback can happen (which is what current IO-less throttling
does), not based on some hard throughput number like
50 MB/s (which is what Fengguang's patches did if I remember right).
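
To illustrate the difference, here is a minimal Python sketch, not kernel
code. It assumes writeback bandwidth is sampled periodically and smoothed
with a simple moving average; the class name, the alpha parameter and the
sample numbers are all made up for the example.

```python
class WritebackEstimator:
    """Smoothed estimate of how fast writeback is currently completing.

    Hypothetical helper: a plain exponential moving average, chosen only
    to keep the sketch short.
    """

    def __init__(self, alpha=0.5):
        self.alpha = alpha      # weight given to the newest sample
        self.bps = None         # current estimate in bytes per second

    def update(self, bytes_written, seconds):
        sample = bytes_written / seconds
        if self.bps is None:
            self.bps = sample
        else:
            self.bps += self.alpha * (sample - self.bps)
        return self.bps


FIXED_CAP = 50e6                # a hard 50 MB/s dirty rate limit

est = WritebackEstimator()
# Writeback runs at 100 MB/s, then heavy direct IO reads cut it to 20 MB/s.
for written in (100e6, 20e6, 20e6, 20e6):
    tracked = est.update(written, 1.0)
    print(f"fixed cap {FIXED_CAP / 1e6:.0f} MB/s, "
          f"tracked limit {tracked / 1e6:.0f} MB/s")
```

The fixed cap stays at 50 MB/s while the tracked limit falls towards what
the device can actually write back under the competing load.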

What could work a tad bit better (and that seems to be something you are
proposing) is to have a weight for each memcg and each memcg would be
allowed to dirty at a rate proportional to its weight * writeback
throughput. But this still has a couple of problems:
1) This doesn't take into account the local situation in a memcg - for a
   memcg full of dirty pages you want to throttle dirtying much more than
   for a memcg which has no dirty pages.
2) Flusher thread (or workqueue these days) doesn't know anything about
   memcgs. So it can happily flush a memcg which is relatively OK for a
   rather long time while some other memcg is full of dirty pages and
   struggling to make any progress.
3) This will be somewhat unfair since the total IO allowed to happen from a
   container will depend on whether you are doing only reads (or DIO), only
   writes or both reads & writes.
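
The weight-based scheme can be sketched in a few lines of Python. This is
only an illustration under assumptions: the function name is invented, and
normalizing by the sum of all weights is one possible reading of
"proportional to its weight * writeback throughput".

```python
def dirty_rate_limits(weights, writeback_bps):
    """Split the measured writeback throughput among memcgs by weight.

    weights:       dict mapping memcg name -> weight (hypothetical knob)
    writeback_bps: current writeback throughput in bytes per second
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one memcg needs a positive weight")
    return {name: writeback_bps * w / total for name, w in weights.items()}


# Two memcgs with weights 100 and 300 sharing 80 MiB/s of writeback.
limits = dirty_rate_limits({"a": 100, "b": 300}, 80 * 1024 * 1024)
for name, bps in limits.items():
    print(name, bps / (1024 * 1024), "MiB/s")
```

Note that the inputs are only the weights and the global throughput. The
number of dirty pages in each memcg never enters the computation, which is
exactly problem 1) above.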

In an ideal world you could compute writeback throughput for each memcg
(and writeback from a memcg would be accounted in a proper blkcg - we would
need a unified memcg & blkcg hierarchy for that), take into account the
number of dirty pages in each memcg, and compute the dirty rate according to
these two numbers. But whether this can work in practice heavily depends on
the memcg size and on how smooth / fair the writeback from different memcgs
can be, so that we don't have excessive stalls and throughput estimation
errors...
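
As a rough sketch of combining the two numbers, assume each memcg has its
own dirty page limit and that the allowed dirty rate scales linearly with
how full the memcg is. The linear rule and all names below are assumptions
made for illustration; this is not the formula the kernel's throttling
uses.

```python
def memcg_dirty_rate(writeback_bps, dirty_pages, dirty_limit_pages):
    """Allowed dirtying rate for one memcg, in bytes per second.

    writeback_bps:     estimated writeback throughput of this memcg
    dirty_pages:       dirty pages currently charged to this memcg
    dirty_limit_pages: hypothetical per-memcg dirty limit

    At half the limit the memcg may dirty exactly as fast as it is
    written back; an empty memcg may go twice as fast; a full one stops.
    """
    if dirty_limit_pages <= 0:
        raise ValueError("dirty limit must be positive")
    fill = min(max(dirty_pages / dirty_limit_pages, 0.0), 1.0)
    return writeback_bps * 2.0 * (1.0 - fill)


for dirty in (0, 500, 1000):
    print(dirty, memcg_dirty_rate(100.0, dirty, 1000))
```

Any error in the per-memcg throughput estimate feeds straight into the
result, which is why the smoothness of writeback matters so much here.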

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

Thread overview: 21+ messages
2013-12-30 21:36 [LSF/MM ATTEND] Filesystems -- Btrfs, cgroups, Storage topics from Facebook Chris Mason
2013-12-31  8:49 ` Zheng Liu
2013-12-31  9:36   ` Jeff Liu
2013-12-31 12:45   ` [Lsf-pc] " Jan Kara
2013-12-31 13:19     ` Chris Mason
2013-12-31 14:22       ` Tao Ma
2013-12-31 15:34         ` Chris Mason
2014-01-02  6:46           ` Jan Kara [this message]
2014-01-02 15:21             ` Chris Mason
2014-01-02 16:01               ` tj
2014-01-02 16:14                 ` tj
2014-01-03  6:03                   ` Jan Kara
2014-01-02 17:06                 ` Vivek Goyal
2014-01-02 17:10                   ` tj
2014-01-02 19:11                     ` Chris Mason
2014-01-03  6:39                       ` Jan Kara
2014-01-02 18:27                 ` James Bottomley
2014-01-02 18:36                   ` tj
2014-01-03  7:44                     ` James Bottomley
2014-01-08 15:04       ` Mel Gorman
2014-01-08 16:14         ` Chris Mason
