From: Vivek Goyal <vgoyal@redhat.com>
To: Andrea Righi <andrea@betterlinux.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>,
Suresh Jayaraman <sjayaraman@suse.com>,
lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
Greg Thelen <gthelen@google.com>
Subject: Re: [ATTEND] [LSF/MM TOPIC] Buffered writes throttling
Date: Wed, 7 Mar 2012 15:52:09 -0500
Message-ID: <20120307205209.GK13430@redhat.com>
In-Reply-To: <20120305225801.GB7545@thinkpad>

On Mon, Mar 05, 2012 at 11:58:01PM +0100, Andrea Righi wrote:
[..]
> What about this scenario? (Sorry, I've not followed some of the recent
> discussions on this topic, so I'm sure I'm oversimplifying a bit or
> ignoring some details):
>
> - track inodes per-memcg for writeback IO (provided Greg's patch)
> - provide per-memcg dirty limit (global, not per-device); when this
> limit is exceeded flusher threads are awakened and all tasks that
> continue to generate new dirty pages inside the memcg are put to
> sleep
> - flusher threads start to write some dirty inodes of this memcg (using
> the inode tracking feature), let's say they start with a chunk of N
> pages of the first dirty inode
> - flusher threads can't flush in this way more than N pages / sec
> (where N * PAGE_SIZE / sec is the blkcg "buffered write rate limit"
> on the inode's block device); if a flusher thread exceeds this limit
> it won't be blocked directly, it just stops flushing pages for this
> memcg after the first chunk and it can continue to flush dirty pages
> of a different memcg.
>
So, IIUC, the only thing slightly different here is that throttling is
implemented by the flusher thread. But it is still per device, per cgroup. I
think it is just an implementation detail whether we implement it
in the block layer, in writeback, or somewhere else. We can very well
implement it in the block layer and provide a per-bdi/per-group congestion
flag in the bdi so that the flusher will stop pushing more IO if a group on
a bdi is congested (because its IO is throttled).

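To make the per-bdi/per-group congestion idea concrete, here is a minimal
user-space sketch in Python. It is a toy model, not kernel code: the class
GroupOnBdi, the function flush_interval, and the notion of a fixed
"interval" with a page budget are all assumptions made for illustration.
The only point it shows is that the flusher is never blocked; it skips a
group whose budget on this bdi is used up and keeps flushing other groups.

```python
from collections import deque


class GroupOnBdi:
    """Hypothetical per-bdi, per-cgroup writeback state (not a kernel struct)."""

    def __init__(self, name, rate_limit_pages, dirty_pages):
        self.name = name
        self.rate_limit_pages = rate_limit_pages  # page budget per interval
        self.dirty_pages = dirty_pages            # dirty pages still to write
        self.submitted = 0                        # pages submitted this interval

    @property
    def congested(self):
        # Stand-in for a per-bdi/per-group congestion flag that a
        # throttling layer would set once the group's budget is used up.
        return self.submitted >= self.rate_limit_pages

    def new_interval(self):
        self.submitted = 0


def flush_interval(groups, chunk_pages):
    """Simulate one interval: round-robin over groups in chunks,
    dropping a group from the rotation as soon as it is congested."""
    for g in groups:
        g.new_interval()
    written = {g.name: 0 for g in groups}
    pending = deque(g for g in groups if g.dirty_pages > 0)
    while pending:
        g = pending.popleft()
        if g.congested:
            continue  # flusher is not blocked, it just moves on
        budget_left = g.rate_limit_pages - g.submitted
        n = min(chunk_pages, g.dirty_pages, budget_left)
        g.dirty_pages -= n
        g.submitted += n
        written[g.name] += n
        if g.dirty_pages > 0:
            pending.append(g)
    return written


if __name__ == "__main__":
    groups = [GroupOnBdi("A", 100, 1000), GroupOnBdi("B", 400, 250)]
    print(flush_interval(groups, chunk_pages=64))
```

In this model a heavily throttled group only delays its own dirty pages.
Whether the check lives in the block layer or in writeback does not change
the flusher loop, which is the sense in which it is an implementation
detail.
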
I think the first important thing is to figure out the minimal set of
requirements (as Jan said in another mail) that will solve a wide
variety of cases. I am trying to list some of the points.

- Throttling for buffered writes
  - Do we want per-device throttling limits or global throttling
    limits?
  - Existing direct write limits are per device and implemented in
    the block layer.
  - I personally think that both kinds of limits might make sense.
    But a global limit for async writes might make more sense, at
    least for workloads like backup which can run at a throttled
    speed.
  - Absolute IO throttling will make most sense on the top-level device
    in the IO stack.
  - For per-device rate throttling, do we want a common limit for
    direct writes and buffered writes, or a separate limit just for
    buffered writes?
- Proportional IO for async writes
  - Will probably make most sense on the bottom-most devices in the IO
    stack (if we are able to somehow retain the submitter's context).
  - Logically it will make sense to keep sync and async writes in the
    same group and try to provide a fair share of the disk between groups.
    Technically CFQ can do that, but in practice I think it will be
    problematic. Writes of one group will take precedence over reads
    of another group. Currently any read is prioritized over
    buffered writes. So by splitting buffered writes into their own
    cgroups, they can severely impact the latency of reads in
    another group. Not sure how many people really want to do
    that in practice.
  - Do we really need proportional IO for async writes? CFQ had
    tried implementing ioprio for async writes but it does not
    work. Should we just care about groups of sync IO, let
    all the async IO on a device go in a single queue, and
    make sure it is not starved while sync IO is going on?
  - I thought that most people cared about not impacting
    sync latencies badly while buffered writes are happening. Not
    many complained that buffered writes of one application should
    happen faster than those of another application.
  - If we agree that not many people require service differentiation
    between buffered writes, then we probably don't have to do
    anything in this space and we can keep things simple. I
    personally prefer this option. Trying to provide proportional
    IO for async writes will make things complicated and we might
    not achieve much.
  - CFQ already does a very good job of prioritizing sync over async
    (at the cost of reduced throughput on fast devices). So what's
    the use case for proportional IO for async writes?

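The "groups of sync IO plus a single async queue" option from the list
above can be sketched as a toy dispatcher in Python. This is not how CFQ is
implemented; the Dispatcher class, the weight-scaled virtual time, and the
max_sync_streak knob are assumptions invented for this illustration. Sync
requests are shared between groups in proportion to their weights, all
buffered (async) writes land in one queue regardless of group, and the
async queue gets one dispatch after a bounded run of sync dispatches so it
is not starved.

```python
from collections import deque


class Dispatcher:
    """Toy model: per-group sync queues, one shared async queue."""

    def __init__(self, weights, max_sync_streak=4):
        self.weights = dict(weights)                 # group -> weight
        self.sync = {g: deque() for g in self.weights}
        self.async_q = deque()                       # shared by all groups
        self.vtime = {g: 0.0 for g in self.weights}  # service / weight
        self.max_sync_streak = max_sync_streak       # hypothetical knob
        self.sync_streak = 0

    def submit(self, group, req, sync):
        if sync:
            self.sync[group].append(req)
        else:
            # Group identity is deliberately dropped for async writes.
            self.async_q.append(req)

    def dispatch(self):
        busy = [g for g in self.sync if self.sync[g]]
        starving = (bool(self.async_q)
                    and self.sync_streak >= self.max_sync_streak)
        if busy and not starving:
            # Serve the group that has received the least weighted service.
            g = min(busy, key=lambda name: self.vtime[name])
            self.vtime[g] += 1.0 / self.weights[g]
            self.sync_streak += 1
            return self.sync[g].popleft()
        if self.async_q:
            self.sync_streak = 0
            return self.async_q.popleft()
        return None


if __name__ == "__main__":
    d = Dispatcher({"a": 2, "b": 1}, max_sync_streak=3)
    for i in range(6):
        d.submit("a", f"a{i}", sync=True)
    for i in range(3):
        d.submit("b", f"b{i}", sync=True)
    for i in range(2):
        d.submit("a", f"w{i}", sync=False)
    while (req := d.dispatch()) is not None:
        print(req)
```

In this sketch sync IO still dominates, but async writes make steady
progress, and there is no service differentiation between the buffered
writes of different groups, which is the simpler option described above.
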
Once we figure out what the requirements are, we can discuss the
implementation details.

Thanks
Vivek