All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Justin TerAvest <teravest@google.com>,
	Chad Talbott <ctalbott@google.com>,
	Nauman Rafique <nauman@google.com>,
	Divyesh Shah <dpshah@google.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	Corrado Zoccolo <czoccolo@gmail.com>
Subject: Re: RFC: default group_isolation to 1, remove option
Date: Mon, 07 Mar 2011 21:32:54 +0100	[thread overview]
Message-ID: <4D7540F6.3080303@kernel.dk> (raw)
In-Reply-To: <20110307202432.GH9540@redhat.com>

On 2011-03-07 21:24, Vivek Goyal wrote:
> On Mon, Mar 07, 2011 at 08:39:52PM +0100, Jens Axboe wrote:
>> It's at least not my goal, it has nothing to do with isolation. Since we
>> have ->make_request_fn() drivers operating completely without queuing
>> limits, it may just be that we can drop the tracking completely on the
>> request side. Either one is currently broken, or both will work that
>> way. And if that is the case, then we don't have to do this ioc tracking
>> at all. With the additional complication of now needing
>> per-disk-per-process io contexts, that approach is looking a lot more
>> tasty right now.
> 
> I am writting the code for per-disk-per-process io context and it is
> significant amount of code and as code size is growing I am also wondering
> if it worth the complication.

Yep, I don't think we should do that.

> Currently request queue blocks a process if device is congested. It might
> happen that one process in a low weight cgroup is doing writes and has
> consumed all available request descriptors (it is really easy to produce)
> and now device is congested. Now any writes from high weight/prio cgroup
> will not even be submitted on request queue and hence they can not be
> given priority by CFQ.
> 
>>
>> Or not get rid of limits completely, but do a lot more relaxed
>> accounting at the queue level still. That will not require any
>> additional tracking of io contexts etc, but still impose some limit on
>> the number of queued IOs.
> 
> A lot more relaxed limit accounting should help a bit but it after a 
> while it might happen that slow movers eat up lots of request descriptors
> and making not much of progress.
> 
> Long back I had implemented this additional notion of q->nr_group_requests
> where we defined per group number of requests allowed submitter will
> be put to sleep. I also extended it to also export per bdi per group
> congestion notion. So a flusher thread can look at the page and cgroup
> of the page and determine if respective cgroup is congested or not. If
> cgroup is congested, flusher thread can move to next inode so that it
> is not put to sleep behind a slow mover.
> 
> Completely limitless queueu will solve the problem completely. But I guess
> then we can get that complain back that flusher thread submitted too much
> of IO to device.
> 
> So given then fact that per-ioc-per-disk accounting of request descriptors
> makes the accounting complicated and also makes it hard for block IO
> controller to use it, the other approach of implementing per group limit
> and per-group-per-bdi congested might be reasonable. Having said that, the
> patch I had written for per group descritor was also not necessarily very
> simple.

So before all of this gets over designed a lot... If we get rid of the
one remaining direct buffered writeback in bdp(), then only the flusher
threads should be sending huge amounts of IO. So if we attack the
problem from that end instead, have it do that accounting in the bdi.
With that in place, I'm fairly confident that we can remove the request
limits.

Basically just replace the congestion_wait() in there with a bit of
accounting logic. Since it's per bdi anyway, we don't even have to
maintain that state in the bdi itself. It can remain in the thread
stack.

-- 
Jens Axboe


  reply	other threads:[~2011-03-07 20:32 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-01  0:19 RFC: default group_isolation to 1, remove option Justin TerAvest
2011-03-01 14:20 ` Vivek Goyal
2011-03-01 18:44   ` Justin TerAvest
2011-03-02 21:28     ` Vivek Goyal
2011-03-06 16:06       ` Andrea Righi
2011-03-03  3:45   ` Jens Axboe
2011-03-03 15:30     ` Per iocontext request descriptor limits (Was: Re: RFC: default group_isolation to 1, remove option) Vivek Goyal
2011-03-03 15:44       ` Jens Axboe
2011-03-03 16:57         ` Vivek Goyal
2011-03-03 18:03           ` Vivek Goyal
2011-03-04 11:01             ` Jens Axboe
2011-03-04 21:31               ` Vivek Goyal
2011-03-04 21:34                 ` Jens Axboe
2011-03-07 18:20     ` RFC: default group_isolation to 1, remove option Justin TerAvest
2011-03-07 19:39       ` Jens Axboe
2011-03-07 20:24         ` Vivek Goyal
2011-03-07 20:32           ` Jens Axboe [this message]
2011-03-07 20:46             ` Vivek Goyal
2011-03-07 20:47               ` Jens Axboe
2011-03-07 23:41                 ` Justin TerAvest
2011-03-08  0:05             ` Vivek Goyal
2011-03-07 20:34           ` Vivek Goyal
2011-03-07 20:36             ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D7540F6.3080303@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=ctalbott@google.com \
    --cc=czoccolo@gmail.com \
    --cc=dpshah@google.com \
    --cc=guijianfeng@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nauman@google.com \
    --cc=teravest@google.com \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.