From: Vivek Goyal <vgoyal@redhat.com>
To: Chad Talbott <ctalbott@google.com>
Cc: jaxboe@fusionio.com, guijianfeng@cn.fujitsu.com,
mrubin@google.com, teravest@google.com, jmoyer@redhat.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Avoid preferential treatment of groups that aren't backlogged
Date: Fri, 18 Feb 2011 14:54:55 -0500
Message-ID: <20110218195454.GI26654@redhat.com>
In-Reply-To: <20110211181533.GG8773@redhat.com>
On Fri, Feb 11, 2011 at 01:15:33PM -0500, Vivek Goyal wrote:
> On Thu, Feb 10, 2011 at 04:36:25PM -0800, Chad Talbott wrote:
> > On Thu, Feb 10, 2011 at 10:57 AM, Chad Talbott <ctalbott@google.com> wrote:
> > > On Wed, Feb 9, 2011 at 7:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > >> If you ran different random readers in different groups of different
> > >> weight with group_isolation=1, then there is a case for service
> > >> differentiation. In that case we will idle for 8ms on each group before
> > >> we expire the group. So in these test cases, are low weight groups not
> > >> submitting IO within 8ms? Putting a random reader with think time > 8ms
> > >> in a separate group is, I think, going to hurt a lot, because for every
> > >> single IO dispatched the group is going to wait for 8ms before it is
> > >> expired.
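A back-of-the-envelope model of the group_idle cost described above. The 5 ms service time per random read is an assumption, and the functions are hypothetical helpers for illustration, not CFQ code.

```python
def idle_ms_per_io(think_ms: float, group_idle_ms: float) -> float:
    """Disk idle time after each IO of a lone random reader in its group.

    Hypothetical model, not CFQ code: if the next IO arrives inside the
    group_idle window, the disk idles only for the think time; otherwise it
    idles for the full window and then the group is expired.
    """
    return min(think_ms, group_idle_ms)


def idle_fraction(service_ms: float, think_ms: float, group_idle_ms: float) -> float:
    """Fraction of disk time spent idling while serving this group."""
    idle = idle_ms_per_io(think_ms, group_idle_ms)
    return idle / (service_ms + idle)


# Assumed 5 ms per random read; 8 ms group_idle as discussed in the thread.
for think in (10, 50):
    print(f"think {think} ms: {idle_fraction(5, think, 8):.2f} of disk time idle")
```

Under these assumptions both readers leave the disk idle for 8 of every 13 ms while their group is being served, regardless of whether the think time is 10 ms or 50 ms.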
> > >
> > > You're right about the behavior of group_idle. We have more
> > > experience with earlier kernels (before group_idle). With this patch
> > > we are able to achieve isolation without group_idle even with these
> > > large ratios. (Without group_idle the random reader workloads will
> > > get marked seeky, and idling is disabled. Without group_idle, we have
> > > to remember the vdisktime to get isolation.)
> > >
> > >> Can you run blktrace and verify what's happening?
> > >
> > > I can run a blktrace, and I think it will show what you expect.
> >
> > So, I ran the following two tests and took a blktrace.
> >
> > 950 rdrand, 50 rdrand.delay10
> > weight 950 random reader with low think time vs weight 50 random
> > reader with 10ms think time
> >
> > 950 rdrand, 50 rdrand.delay50 # 50ms think time
> > weight 950 random reader with low think time vs weight 50 random
> > reader with 50ms think time
> >
> > I find that we are still idling for these random readers, even the one
> > with 50ms think time. group_idle is 0 according to blktrace.
> >
> > With this patch, both of these cases have correct isolation. Without
> > this patch, the small weight reader is able to get more than its
> > share.
> >
> > I think that idling for a random reader with a 50ms think time is
> > likely a bug, but a separate issue.
>
> Thanks for checking this out. I agree that for a low weight random
> reader/writer with high think time, we need to remember the vdisktime,
> otherwise it will show up as a fresh new candidate and get more done.
>
> Having said that, one can say that a random reader/writer doing a small
> amount of IO should be able to get its job done really fast, and the ones
> who are hogging the disk for a long time should get a higher vdisktime.
>
> So with this scheme, a random reader/writer shall have to be of higher
> weight to get the job done fast. A low weight reader/writer will still
> get a higher vdisktime and a smaller share. I think that is reasonable.
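The effect of remembering vs. forgetting a group's vdisktime can be illustrated with a toy simulation. Everything here is an assumption for illustration: one slice per round, a charge of BASE_WEIGHT / weight per slice, a 3-round think time, and a "forget" policy that restarts a re-backlogged group at the tree minimum. That policy stands in for any placement that ignores past usage; it is not CFQ's exact rule, and none of this is the patch under discussion.

```python
BASE_WEIGHT = 500  # hypothetical scaling constant for vdisktime charges


def light_group_share(remember_vdisktime: bool, rounds: int = 1000,
                      think_slices: int = 3) -> float:
    """Fraction of slices won by a weight-50 group that keeps going idle.

    Toy model, not CFQ code: each round one slice goes to the backlogged
    group with the smallest vdisktime, which is then charged
    BASE_WEIGHT / weight. The weight-950 group is always backlogged; the
    weight-50 group uses one slice, then thinks for `think_slices` rounds
    and is off the service tree meanwhile.
    """
    weight = {"heavy": 950, "light": 50}
    vdisk = {"heavy": 0.0, "light": 0.0}
    wins = 0
    light_on_tree = True
    light_ready_at = 0
    for t in range(rounds):
        if not light_on_tree and t >= light_ready_at:
            # Light group is backlogged again. "heavy" is the only group on
            # the tree, so its vdisktime is the tree minimum.
            min_vdisk = vdisk["heavy"]
            if remember_vdisktime:
                vdisk["light"] = max(vdisk["light"], min_vdisk)
            else:
                vdisk["light"] = min_vdisk  # fresh candidate: history lost
            light_on_tree = True
        on_tree = ["heavy", "light"] if light_on_tree else ["heavy"]
        chosen = min(on_tree, key=lambda g: vdisk[g])  # ties go to "heavy"
        vdisk[chosen] += BASE_WEIGHT / weight[chosen]
        if chosen == "light":
            wins += 1
            light_on_tree = False
            light_ready_at = t + 1 + think_slices
    return wins / rounds


print(f"vdisktime forgotten:  {light_group_share(False):.1%}")
print(f"vdisktime remembered: {light_group_share(True):.1%}")
```

With these made-up parameters, forgetting the vdisktime hands the light group about 20% of the slices, four times its 5% weight share, while remembering it keeps the light group close to 5%.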
>
> And yes, even with group_idle=0 if we are idling on a 50ms thinktime
> random reader it sounds like a bug.
Thinking more about it, I suspect this is happening because random IO
goes onto the group's sync-noidle service tree, and there we idle on the
whole tree. I think if you set slice_idle=0 along with group_idle=0, that
idling should go away.
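The suspected cause can be summarized as a small decision function. This is a hypothetical simplification of the behaviour described above, not CFQ's actual code path, and the parameter names are made up for illustration. In CFQ the two tunables are exposed as slice_idle and group_idle under /sys/block/<dev>/queue/iosched/, both defaulting to 8 ms.

```python
def idle_window_ms(slice_idle_ms: int, group_idle_ms: int,
                   queue_is_seeky: bool, last_queue_on_tree: bool,
                   last_queue_in_group: bool) -> int:
    """How long the scheduler waits for a queue's next request once it drains.

    Hypothetical simplification, not CFQ code:
      * a non-seeky queue gets its own idle window (slice_idle);
      * a seeky queue on the sync-noidle tree still gets slice_idle if it is
        the last queue on that tree, i.e. we idle on the whole tree;
      * otherwise, if it is the last queue in its group, we idle on the
        group for group_idle.
    """
    if slice_idle_ms and (not queue_is_seeky or last_queue_on_tree):
        return slice_idle_ms
    if group_idle_ms and last_queue_in_group:
        return group_idle_ms
    return 0


# A seeky random reader alone in its group (and on its sync-noidle tree).
print(idle_window_ms(8, 0, True, True, True))  # group_idle=0 only
print(idle_window_ms(0, 0, True, True, True))  # slice_idle=0 and group_idle=0
```

In this model, setting group_idle=0 alone still leaves an 8 ms window for a seeky reader that is the last queue on its group's sync-noidle tree; only clearing slice_idle as well removes the idling.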
Thanks
Vivek
Thread overview: 11+ messages
2011-02-10 1:32 [PATCH] Avoid preferential treatment of groups that aren't backlogged Chad Talbott
2011-02-10 2:09 ` Vivek Goyal
2011-02-10 2:45 ` Chad Talbott
2011-02-10 3:57 ` Vivek Goyal
2011-02-10 18:57 ` Chad Talbott
2011-02-11 0:36 ` Chad Talbott
2011-02-11 18:15 ` Vivek Goyal
2011-02-18 19:54 ` Vivek Goyal [this message]
2011-02-10 4:02 ` Vivek Goyal
2011-02-10 19:06 ` Chad Talbott
2011-02-11 18:30 ` Vivek Goyal