All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Kent Overstreet <koverstreet@google.com>
Cc: linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>, Andi Kleen <andi@firstfloor.org>,
	Jens Axboe <axboe@kernel.dk>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>
Subject: Re: [PATCH] Percpu tag allocator
Date: Wed, 12 Jun 2013 22:46:23 -0700	[thread overview]
Message-ID: <20130612224623.bf303e33.akpm@linux-foundation.org> (raw)
In-Reply-To: <20130613035432.GE10979@localhost>

On Wed, 12 Jun 2013 20:54:32 -0700 Kent Overstreet <koverstreet@google.com> wrote:

> On Wed, Jun 12, 2013 at 08:03:11PM -0700, Andrew Morton wrote:
> > On Wed, 12 Jun 2013 19:05:36 -0700 Kent Overstreet <koverstreet@google.com> wrote:
> > 
> > > ...
> > >
> > > > Why can't we use ida_get_new_above()?
> > > > 
> > > >    If it is "speed" then how bad is the problem and what efforts have
> > > >    been made to address them within the idr code?  (A per-cpu magazine
> > > >    frontend to ida_get_new_above() might suit).
> > > > 
> > > >    If it is "functionality" then what efforts have been made to
> > > >    suitably modify the ida code?
> > > 
> > > Originally, it was because every time I opened idr.[ch] I was confronted
> > > with an enormous pile of little functions, with very little indication
> > > in the way of what they were all trying to do or which ones I might want
> > > to start with.
> > > 
> > > Having finally read enough of the code to maybe get a handle on what
> > > it's about - performance is a large part of it, but idr is also a more
> > > complicated api that does more than what I wanted.
> > 
> > They all sound like pretty crappy reasons ;) If the idr/ida interface
> > is nasty then it can be wrapped to provide the same interface as the
> > percpu tag allocator.
> 
> Well, the way I see it right now, idr and this tag allocator are doing
> two somewhat different things and I'm really not sure how one piece of
> code could do both jobs.
> 
> Supposing idr could be made percpu and just as fast as my tag allocator:

We don't know how fast either is, nor what the performance requirements are.

> the problem is, using idr where right now I'd use the tag allocator
> forces you to do two allocations instead of one:
> 
> First, you allocate your idr slot - but now that has to point to
> something, so you need a kmem_cache_alloc() too.

Only a single allocation is needed - for the bit used to reserve
ida_get_new_above()'s ID, plus epsilon for idr metadata. 
ida_get_new_above() will call kmalloc approximately 0.001 times per
invocation.

> So right now at least I honestly think letting the tag allocator and idr
> be distinct things is probably the way to go.

That depends on performance testing results and upon performance
requirements.  Neither are known at this time.

> (now someone will point out to me all the fancy percpu optimizations in
> idr I missed).

idr use percpu data for reliability, not for performance.

Cross-cpu costs might be significant.  There are surely ways to reduce
them.  I could suggest one but I've found that when I make a
suggestion, others busily work on shooting down my suggestion while
avoiding thinking of their own ideas.

> > I think all this could be done with test_and_set_bit() and friends,
> > btw.  xchg() hurts my brain.
> 
> Ahh, you were talking about with a slightly bigger rework.

Not really.  Just that the xchg() in this code could be replaced in a
straightforward fashion with test_and_set_bit().  Or maybe not.

> > > You mean if a GFP_NOWAIT allocation fails? It's the same as any other
> > > allocation, I suppose.
> > > 
> > > Did you have something else in mind that could be implemented? I don't
> > > want to add code for a reserve to this code - IMO, if a reserve is
> > > needed it should be done elsewhere (like how mempools work on top of
> > > existing allocators).
> > 
> > Dunno, really - I'm just wondering what the implications of an
> > allocation failure will be.  I suspect it's EIO?  Which in some
> > circumstances could be as serious as total system failure (loss of
> > data), so reliability/robustness is a pretty big deal.
> 
> Ahh. That's just outside the scope of this code - IME, in driver code
> GFP_NOWAIT allocations are not the norm - most tags are created when
> you're submitting a bio or processing a request, and then you're in
> process context. But in your error handling code you could also need to
> allocate tags to resubmit things - that'd be an issue if you're silly
> enough to stick all your error handling code in your interrupt handling
> path.

Our experience with alloc_pages(GFP_ATOMIC) has no relevance to this
code, because this code doesn't call alloc_pages()!  Instead of failing
if page reserves are exhausted, it will fail if no tags are available
in this new data structure.

The reliability (or otherwise) of alloc_pages(GFP_ATOMIC) is well
understood whereas the reliability of percpu_tag_alloc(GFP_ATOMIC) is
not understood at all.  So let us think about both this and about the
consequences of allocation failures.

> > Another thing: CPU hot-unplug.  If it's "don't care" then the
> > forthcoming changelog should thoroughly explain why, please.  Otherwise
> > it will need a notifier to spill back any reservation.
> 
> Oh - doesn't care, becasue steal_tags() will eventually get it. We
> _could_ satisfy more allocations if we had a notifier - but we'll still
> meet our guarantees and it's absolutely not a correctness issue, so I
> lean towards just leaving it out.

I agree.  It's a design decision which should be explained and justified
in the changelog, please.

  reply	other threads:[~2013-06-13  5:46 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-12  4:03 [PATCH] Percpu tag allocator Kent Overstreet
2013-06-12 17:08 ` Oleg Nesterov
2013-06-12 17:59   ` Kent Overstreet
2013-06-12 19:14     ` Oleg Nesterov
2013-06-12 23:38 ` Andrew Morton
2013-06-13  2:05   ` Kent Overstreet
2013-06-13  3:03     ` Andrew Morton
2013-06-13  3:54       ` Kent Overstreet
2013-06-13  5:46         ` Andrew Morton [this message]
2013-06-13 18:53       ` Tejun Heo
2013-06-13 19:04         ` Andrew Morton
2013-06-13 19:15           ` Tejun Heo
2013-06-13 19:23             ` Andrew Morton
2013-06-13 19:35               ` Tejun Heo
2013-06-13 22:10                 ` Andrew Morton
2013-06-13 22:30                   ` Tejun Heo
2013-06-13 22:35                     ` Andrew Morton
2013-06-13 23:13                       ` Tejun Heo
2013-06-13 23:23                         ` Tejun Heo
2013-06-19  1:32               ` Kent Overstreet
2013-06-13 19:08         ` J. Bruce Fields
2013-06-13 19:09         ` Jeff Layton
2013-06-13 21:53         ` Kent Overstreet
2013-06-13 19:06   ` Tejun Heo
2013-06-13 19:13     ` Andrew Morton
2013-06-13 19:21       ` Tejun Heo
2013-06-13 21:14         ` Kent Overstreet
2013-06-13 21:50           ` Tejun Heo
2013-06-13 21:58             ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130612224623.bf303e33.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=axboe@kernel.dk \
    --cc=cl@linux-foundation.org \
    --cc=koverstreet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nab@linux-iscsi.org \
    --cc=oleg@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.