From: Kent Overstreet <kmo@daterainc.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <tom.leiming@gmail.com>,
Alexander Gordeev <agordeev@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Shaohua Li <shli@kernel.org>,
Nicholas Bellinger <nab@linux-iscsi.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH RFC 0/2] percpu_ida: Take into account CPU topology when stealing tags
Date: Thu, 1 May 2014 19:38:27 -0700 [thread overview]
Message-ID: <20140502023827.GA2276@kmo> (raw)
In-Reply-To: <536300BB.5060906@kernel.dk>
On Thu, May 01, 2014 at 08:19:39PM -0600, Jens Axboe wrote:
> On 2014-05-01 16:47, Kent Overstreet wrote:
> >On Tue, Apr 29, 2014 at 03:13:38PM -0600, Jens Axboe wrote:
> >>On 04/29/2014 05:35 AM, Ming Lei wrote:
> >>>On Sat, Apr 26, 2014 at 10:03 AM, Jens Axboe <axboe@kernel.dk> wrote:
> >>>>On 2014-04-25 18:01, Ming Lei wrote:
> >>>>>
> >>>>>Hi Jens,
> >>>>>
> >>>>>On Sat, Apr 26, 2014 at 5:23 AM, Jens Axboe <axboe@kernel.dk> wrote:
> >>>>>>
> >>>>>>On 04/25/2014 03:10 AM, Ming Lei wrote:
> >>>>>>
> >>>>>>Sorry, I did run it the other day. It has little to no effect here, but
> >>>>>>that's mostly because there's so much other crap going on in there. The
> >>>>>>most effective way to currently make it work better, is just to ensure
> >>>>>>the caching pool is of a sane size.
> >>>>>
> >>>>>
> >>>>>Yes, that is just what the patch is doing, :-)
> >>>>
> >>>>
> >>>>But it's not enough.
> >>>
> >>>Yes, the patch is only for cases of mutli hw queue and having
> >>>offline CPUs existed.
> >>>
> >>>>For instance, my test case, it's 255 tags and 64 CPUs.
> >>>>We end up in cross-cpu spinlock nightmare mode.
> >>>
> >>>IMO, the scaling problem for the above case might be
> >>>caused by either current percpu ida design or blk-mq's
> >>>usage on it.
> >>
> >>That is pretty much my claim, yes. Basically I don't think per-cpu tag
> >>caching is ever going to be the best solution for the combination of
> >>modern machines and the hardware that is out there (limited tags).
> >
> >Sorry for not being more active in the discussion earlier, but anyways - I'm in
> >100% agreement with this.
> >
> >Percpu freelists are _fundamentally_ only _useful_ when you don't need to be
> >using all your available tags, because percpu sharding requires wasting your tag
> >space. I could write a mathematical proof of this if I cared enough.
> >
> >Otherwise what happens is on alloc failure you're touching all the other
> >cachelines every single time and now you're bouncing _more_ cachelines than if
> >you just had a single global freelist.
> >
> >So yeah, for small tag spaces just use a single simple bit vector on a single
> >cacheline.
>
> I've taken the consequence of this and implemented another tagging scheme
> that blk-mq will use if it deems that percpu_ida isn't going to be effective
> for the device being initialized. But I really hate to have both of them in
> there. Unfortunately I have no devices available that have a tag space that
> will justify using percu_ida, so comparisons are a bit hard at the moment.
> NVMe should change that, though, so decision will have to be deferred until
> that is tested.
Yeah, I agree that is annoying. I've thought about the issue too though and I
haven't been able to come up with any better ideas, myself.
A given driver probably should be able to always use one or the other though, so
we shouldn't _need_ a runtime conditional because of this, though structuring
the code that way might be more trouble than it's worth from my vague
recollection of what blk-mq looks likee...
(I've actually been fighting with unrelated issues at a very similar layer of
abstraction, it's quite annoying.).
> >BTW, Shaohua Li's patch d835502f3dacad1638d516ab156d66f0ba377cf5 that changed
> >when steal_tags() runs was fundamentally wrong and broken in this respect, and
> >should be reverted, whatever usage it was that was expecting to be able to
> >allocate the entire tag space was the problem.
>
> It's hard to blame Shaohua, and I helped push that. It was an attempt in
> making percpu_ida actually useful for what blk-mq needs it for, and being
> the primary user of it, it was definitely worth doing. A tagging scheme that
> requires the tag space to be effectively sparse/huge to be fast is not a
> good generic tagging algorithm.
Yeah it was definitely not an unreasonable attempt and it's probably my own
fault for not speaking up louder at the time (I can't remember how much I
commented at the time). Ah well, irritating problem space :)
next prev parent reply other threads:[~2014-05-02 2:38 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-26 13:34 [PATCH RFC 0/2] percpu_ida: Take into account CPU topology when stealing tags Alexander Gordeev
2014-03-26 13:34 ` [PATCH RFC 1/2] sched: Introduce topology level masks and for_each_tlm() macro Alexander Gordeev
2014-03-26 13:34 ` [PATCH RFC 2/2] percpu_ida: Use for_each_tlm() macro for CPU lookup in steal_tags() Alexander Gordeev
2014-04-22 7:10 ` [PATCH RFC 0/2] percpu_ida: Take into account CPU topology when stealing tags Alexander Gordeev
2014-04-22 14:03 ` Jens Axboe
2014-04-22 15:57 ` Jens Axboe
2014-04-23 0:53 ` Ming Lei
2014-04-23 1:25 ` Jens Axboe
2014-04-25 9:10 ` Ming Lei
2014-04-25 21:23 ` Jens Axboe
2014-04-26 0:01 ` Ming Lei
2014-04-26 2:03 ` Jens Axboe
2014-04-29 11:35 ` Ming Lei
2014-04-29 21:13 ` Jens Axboe
2014-04-30 9:40 ` Ming Lei
2014-05-01 22:47 ` Kent Overstreet
2014-05-02 2:19 ` Jens Axboe
2014-05-02 2:38 ` Kent Overstreet [this message]
2014-05-02 2:44 ` Jens Axboe
2014-05-02 5:05 ` Christoph Hellwig
2014-05-02 16:41 ` Jens Axboe
2014-05-02 16:43 ` Christoph Hellwig
2014-05-02 16:56 ` Jens Axboe
2014-04-22 11:56 ` Peter Zijlstra
2014-05-01 21:24 ` Alexander Gordeev
2014-05-01 22:04 ` Alexander Gordeev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140502023827.GA2276@kmo \
--to=kmo@daterainc.com \
--cc=agordeev@redhat.com \
--cc=axboe@kernel.dk \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=nab@linux-iscsi.org \
--cc=peterz@infradead.org \
--cc=shli@kernel.org \
--cc=tom.leiming@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox