public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Kent Overstreet <koverstreet@google.com>
Cc: linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>, Andi Kleen <andi@firstfloor.org>,
	Jens Axboe <axboe@kernel.dk>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>
Subject: Re: [PATCH] Percpu tag allocator
Date: Wed, 12 Jun 2013 22:46:23 -0700	[thread overview]
Message-ID: <20130612224623.bf303e33.akpm@linux-foundation.org> (raw)
In-Reply-To: <20130613035432.GE10979@localhost>

On Wed, 12 Jun 2013 20:54:32 -0700 Kent Overstreet <koverstreet@google.com> wrote:

> On Wed, Jun 12, 2013 at 08:03:11PM -0700, Andrew Morton wrote:
> > On Wed, 12 Jun 2013 19:05:36 -0700 Kent Overstreet <koverstreet@google.com> wrote:
> > 
> > > ...
> > >
> > > > Why can't we use ida_get_new_above()?
> > > > 
> > > >    If it is "speed" then how bad is the problem and what efforts have
> > > >    been made to address them within the idr code?  (A per-cpu magazine
> > > >    frontend to ida_get_new_above() might suit).
> > > > 
> > > >    If it is "functionality" then what efforts have been made to
> > > >    suitably modify the ida code?
> > > 
> > > Originally, it was because every time I opened idr.[ch] I was confronted
> > > with an enormous pile of little functions, with very little indication
> > > in the way of what they were all trying to do or which ones I might want
> > > to start with.
> > > 
> > > Having finally read enough of the code to maybe get a handle on what
> > > it's about - performance is a large part of it, but idr is also a more
> > > complicated api that does more than what I wanted.
> > 
> > They all sound like pretty crappy reasons ;) If the idr/ida interface
> > is nasty then it can be wrapped to provide the same interface as the
> > percpu tag allocator.
> 
> Well, the way I see it right now, idr and this tag allocator are doing
> two somewhat different things and I'm really not sure how one piece of
> code could do both jobs.
> 
> Supposing idr could be made percpu and just as fast as my tag allocator:

We don't know how fast either is, nor what the performance requirements are.

> the problem is, using idr where right now I'd use the tag allocator
> forces you to do two allocations instead of one:
> 
> First, you allocate your idr slot - but now that has to point to
> something, so you need a kmem_cache_alloc() too.

Only a single allocation is needed - for the bit used to reserve
ida_get_new_above()'s ID, plus epsilon for idr metadata. 
ida_get_new_above() will call kmalloc approximately 0.001 times per
invocation.

> So right now at least I honestly think letting the tag allocator and idr
> be distinct things is probably the way to go.

That depends on performance testing results and upon performance
requirements.  Neither are known at this time.

> (now someone will point out to me all the fancy percpu optimizations in
> idr I missed).

idr use percpu data for reliability, not for performance.

Cross-cpu costs might be significant.  There are surely ways to reduce
them.  I could suggest one but I've found that when I make a
suggestion, others busily work on shooting down my suggestion while
avoiding thinking of their own ideas.

> > I think all this could be done with test_and_set_bit() and friends,
> > btw.  xchg() hurts my brain.
> 
> Ahh, you were talking about with a slightly bigger rework.

Not really.  Just that the xchg() in this code could be replaced in a
straightforward fashion with test_and_set_bit().  Or maybe not.

> > > You mean if a GFP_NOWAIT allocation fails? It's the same as any other
> > > allocation, I suppose.
> > > 
> > > Did you have something else in mind that could be implemented? I don't
> > > want to add code for a reserve to this code - IMO, if a reserve is
> > > needed it should be done elsewhere (like how mempools work on top of
> > > existing allocators).
> > 
> > Dunno, really - I'm just wondering what the implications of an
> > allocation failure will be.  I suspect it's EIO?  Which in some
> > circumstances could be as serious as total system failure (loss of
> > data), so reliability/robustness is a pretty big deal.
> 
> Ahh. That's just outside the scope of this code - IME, in driver code
> GFP_NOWAIT allocations are not the norm - most tags are created when
> you're submitting a bio or processing a request, and then you're in
> process context. But in your error handling code you could also need to
> allocate tags to resubmit things - that'd be an issue if you're silly
> enough to stick all your error handling code in your interrupt handling
> path.

Our experience with alloc_pages(GFP_ATOMIC) has no relevance to this
code, because this code doesn't call alloc_pages()!  Instead of failing
if page reserves are exhausted, it will fail if no tags are available
in this new data structure.

The reliability (or otherwise) of alloc_pages(GFP_ATOMIC) is well
understood whereas the reliability of percpu_tag_alloc(GFP_ATOMIC) is
not understood at all.  So let us think about both this and about the
consequences of allocation failures.

> > Another thing: CPU hot-unplug.  If it's "don't care" then the
> > forthcoming changelog should thoroughly explain why, please.  Otherwise
> > it will need a notifier to spill back any reservation.
> 
> Oh - doesn't care, becasue steal_tags() will eventually get it. We
> _could_ satisfy more allocations if we had a notifier - but we'll still
> meet our guarantees and it's absolutely not a correctness issue, so I
> lean towards just leaving it out.

I agree.  It's a design decision which should be explained and justified
in the changelog, please.

  reply	other threads:[~2013-06-13  5:46 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-12  4:03 [PATCH] Percpu tag allocator Kent Overstreet
2013-06-12 17:08 ` Oleg Nesterov
2013-06-12 17:59   ` Kent Overstreet
2013-06-12 19:14     ` Oleg Nesterov
2013-06-12 23:38 ` Andrew Morton
2013-06-13  2:05   ` Kent Overstreet
2013-06-13  3:03     ` Andrew Morton
2013-06-13  3:54       ` Kent Overstreet
2013-06-13  5:46         ` Andrew Morton [this message]
2013-06-13 18:53       ` Tejun Heo
2013-06-13 19:04         ` Andrew Morton
2013-06-13 19:15           ` Tejun Heo
2013-06-13 19:23             ` Andrew Morton
2013-06-13 19:35               ` Tejun Heo
2013-06-13 22:10                 ` Andrew Morton
2013-06-13 22:30                   ` Tejun Heo
2013-06-13 22:35                     ` Andrew Morton
2013-06-13 23:13                       ` Tejun Heo
2013-06-13 23:23                         ` Tejun Heo
2013-06-19  1:32               ` Kent Overstreet
2013-06-13 19:08         ` J. Bruce Fields
2013-06-13 19:09         ` Jeff Layton
2013-06-13 21:53         ` Kent Overstreet
2013-06-13 19:06   ` Tejun Heo
2013-06-13 19:13     ` Andrew Morton
2013-06-13 19:21       ` Tejun Heo
2013-06-13 21:14         ` Kent Overstreet
2013-06-13 21:50           ` Tejun Heo
2013-06-13 21:58             ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130612224623.bf303e33.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=axboe@kernel.dk \
    --cc=cl@linux-foundation.org \
    --cc=koverstreet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nab@linux-iscsi.org \
    --cc=oleg@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox