From: Jens Axboe <axboe@kernel.dk>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Justin TerAvest <teravest@google.com>,
	Chad Talbott <ctalbott@google.com>,
	Nauman Rafique <nauman@google.com>,
	Divyesh Shah <dpshah@google.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	Corrado Zoccolo <czoccolo@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Greg Thelen <gthelen@google.com>
Subject: Re: Per iocontext request descriptor limits (Was: Re: RFC: default group_isolation to 1, remove option)
Date: Thu, 03 Mar 2011 10:44:17 -0500	[thread overview]
Message-ID: <4D6FB751.3010608@kernel.dk> (raw)
In-Reply-To: <20110303153007.GF16720@redhat.com>

On 2011-03-03 10:30, Vivek Goyal wrote:
> On Wed, Mar 02, 2011 at 10:45:20PM -0500, Jens Axboe wrote:
>> On 2011-03-01 09:20, Vivek Goyal wrote:
>>> I think creating a per-group request pool will complicate the
>>> implementation further (we have done that once in the past). Jens
>>> once mentioned that he liked a per-iocontext request limit better
>>> than an overall queue limit. So if we implement a per-iocontext
>>> limit, it will get rid of the need to do anything extra for the
>>> group infrastructure.
>>>
>>> Jens, do you think a per-iocontext, per-queue limit on request
>>> descriptors makes sense, so that we can get rid of the per-queue
>>> overall limit?
>>
>> Since we practically don't need a limit anymore to begin with (or so is
>> the theory).
> 
> So what has changed that we don't need queue limits on nr_requests anymore?
> If we get rid of queue limits then we also need to get rid of the bdi
> congestion logic and come up with some kind of ioc congestion logic, so
> that a thread which does not want to sleep while submitting a request can
> check its own ioc for being congested or not for a specific device/bdi.

Right now congestion is a measure of request starvation on the OS side.
It may make sense to keep the notion of a congested device when we are
operating at the device limits. But as a blocking measure it should go
away.
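
For a submitter that doesn't want to sleep, the check would then be
against its own ioc rather than the bdi. Purely a sketch -- the per-ioc
counters and limit below don't exist today, the names are made up:

static inline bool ioc_congested(struct io_context *ioc)
{
        /*
         * "Congested" here means this ioc is at its own request limit,
         * not that the queue as a whole is starved. A real version
         * would probably track this per queue inside the ioc.
         */
        return atomic_read(&ioc->nr_requests) >= ioc->max_requests;
}

Something like the flusher would test current->io_context with a
helper like that instead of calling bdi_write_congested().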

No recent change is causing us to be able to throw away the limit. It
used to be that the vm got really unhappy with long queues, since you
could have tons of memory dirty. This works a LOT better now. And one
would hope that it does, since there are a number of drivers that don't
have limits. So when I say we "practically" don't need limits anymore, the
hope is that we'll behave well enough with just per-ioc limits in place.

>> then yes we can move to per-ioc limits instead and get rid
>> of that queue state. We'd have to hold on to the ioc for the duration of
>> the IO explicitly from the request then.
> 
> I think every request submitted on a request queue already takes a
> reference on the ioc (set_request), and the reference is not dropped
> till completion. So the ioc is around anyway until the request
> completes.

That's only true for CFQ, it's not a block layer property. This would
have to be explicitly done.
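
Roughly, the block layer itself would have to do something like the
below at allocation and completion time. rq->ioc is hypothetical, just
to show where the reference would be held; get_io_context() and
put_io_context() are the existing helpers:

        /* in get_request(), when the request is set up */
        rq->ioc = get_io_context(GFP_ATOMIC, q->node);

        /* in blk_finish_request(), once the request is done */
        if (rq->ioc) {
                put_io_context(rq->ioc);
                rq->ioc = NULL;
        }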

>> I primarily like that implementation since it means we can make the IO
>> completion lockless, at least on the block layer side. We still have
>> state to complete in the schedulers that require that, but it's a good
>> step at least.
> 
> Ok, so in the completion path the contention will move from the
> queue_lock to an ioc lock or something like that. (We hope that there
> are no other dependencies on the queue here; the devil lies in the
> details :-))

Right, so it's spread out and in most cases the ioc will be completely
uncontended since it's usually private to the process.
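
The completion side accounting could then be done with atomics on the
ioc and no queue_lock at all, along these lines (nr_requests, the
limit and the wait queue on the ioc are again made-up fields):

static void blk_ioc_complete_request(struct request *rq)
{
        struct io_context *ioc = rq->ioc;

        /* dropping below the limit lets throttled submitters go again */
        if (atomic_dec_return(&ioc->nr_requests) < ioc->max_requests)
                wake_up(&ioc->free_requests_wait);
        put_io_context(ioc);
}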

> The other potential issue with this approach is how we will handle the
> case of the flusher thread submitting IO. At some point we want to
> account it to the right cgroup.
> 
> Retrieving the iocontext from the bio will be hard, as it will at
> least require an extra pointer in page_cgroup, and I am not sure how
> feasible that is.
> 
> Or we could come up with the concept of a group iocontext. With the
> help of page_cgroup we should be able to get to the cgroup, retrieve
> the right group iocontext and check the limit against that. But I
> guess this gets complicated.
> 
> So if we move to an ioc based limit, then for async IO a reasonable
> way would be to find the io context of the submitting task and operate
> on that, even if that means an increased page_cgroup size.

For now it's not a complicated effort, I already have a patch for this.
If page tracking needs extra complexity, it'll have to remain in the
page tracking code.
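
Just to illustrate, what you describe would boil down to something
like this on the writeback side -- pc->ioc being the extra pointer in
page_cgroup you mention, which doesn't exist today:

static struct io_context *page_dirtier_ioc(struct page *page)
{
        struct page_cgroup *pc = lookup_page_cgroup(page);

        /* ioc of the task that dirtied the page, if we tracked it */
        return pc ? pc->ioc : NULL;
}

The flusher thread would then charge the request against that ioc
instead of its own.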

-- 
Jens Axboe

