From: Andrew Morton <akpm@linux-foundation.org>
To: Tejun Heo <tj@kernel.org>
Cc: avi@redhat.com, nate@cpanel.net, cl@linux-foundation.org,
	oleg@redhat.com, axboe@kernel.dk, vgoyal@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock
Date: Thu, 22 Dec 2011 14:54:26 -0800
Message-ID: <20111222145426.5844df96.akpm@linux-foundation.org>
In-Reply-To: <20111222224117.GL17084@google.com>

On Thu, 22 Dec 2011 14:41:17 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello, Andrew.
> 
> On Thu, Dec 22, 2011 at 02:20:58PM -0800, Andrew Morton wrote:
> > Don't just consider my suggestions - please try to come up with your own
> > alternatives too!  If all else fails then this patch is a last resort.
> 
> Umm... this is my alternative.

We're beyond the point where any additional kernel complexity should
be considered a regression.

> > > but apparently those percpu stats reduced
> > > CPU overhead significantly.
> > 
> > Deleting them would save even more CPU.
> > 
> > Or make them runtime or compile-time configurable, so only the
> > developers see the impact.
> > 
> > Some specifics on which counters are causing the problems would help here.
> 
> These stats are userland visible and quite useful ones if blkcg is in
> use.  I don't really see how these can be removed.

What stats?

And why are we doing percpu *allocation* so deep in the code?  You mean
we're *creating* stats counters on an IO path?  Sounds odd.  Where is
this code?

> > > > Or how about we fix the percpu memory allocation code so that it
> > > > propagates the gfp flags, then delete this patchset?
> > > 
> > > Oh, no, this is gonna make things *way* more complex.  I tried.
> > 
> > But there's a difference between fixing a problem and working around it.
> 
> Yeah, that was my first direction too.  The reason why percpu can't do
> NOIO is the same one why vmalloc can't do it.  It reaches pretty deep
> into page table code and I don't think doing all that churning is
> worthwhile or even desirable.  An alternative approach would be
> implementing transparent front buffer to percpu allocator, which I
> *might* do if there really are more of these users, but I think
> keeping percpu allocator painful to use from reclaim context isn't
> such a bad idea.
> 
> There have been multiple requests for atomic allocation and they all
> have been successfully pushed back, but IMHO this is a valid one and I
> don't see a better way around the problem, so while I agree using
> mempool for this is a workaround, I think it is a right choice, for
> now, anyway.

For starters, doing pagetable allocation on the I/O path sounds nutty.

Secondly, GFP_NOIO is a *weaker* allocation mode than GFP_KERNEL.  By
permitting it with this patchset, we have a kernel which is more likely
to get OOM failures.  Fixing the kernel to not perform GFP_NOIO
allocations for these counters will result in a more robust kernel.
This is a good thing, which improves the kernel while avoiding adding
more complexity elsewhere.
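
To make the "weaker" point concrete, here is a small userspace sketch.
The SKETCH_* names and bit values are made up for illustration and are
not the kernel's definitions.  The only thing assumed from the kernel is
the composition: GFP_KERNEL permits waiting, starting IO and entering
the filesystem, while GFP_NOIO permits only waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real definitions. */
typedef unsigned int gfp_t;
#define SKETCH_GFP_WAIT   0x1u	/* allocator may sleep */
#define SKETCH_GFP_IO     0x2u	/* allocator may start physical IO */
#define SKETCH_GFP_FS     0x4u	/* allocator may call into the fs */

#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOIO   (SKETCH_GFP_WAIT)

/* May reclaim write dirty pages or swap to free memory? */
static bool sketch_may_start_io(gfp_t flags)
{
	return (flags & SKETCH_GFP_IO) != 0;
}

/* Count how many of the three reclaim permissions a mode grants. */
static int sketch_reclaim_options(gfp_t flags)
{
	int n = 0;

	if (flags & SKETCH_GFP_WAIT)
		n++;
	if (flags & SKETCH_GFP_IO)
		n++;
	if (flags & SKETCH_GFP_FS)
		n++;
	return n;
}

/* "a" is weaker than "b" if it grants a strict subset of b's bits. */
static bool sketch_is_weaker(gfp_t a, gfp_t b)
{
	return (a & b) == a && a != b;
}
```

With fewer ways to free memory, an allocation in the NOIO mode has
fewer chances to succeed under pressure, which is the robustness
concern.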

This patchset is the worst option and we should try much harder to avoid
applying it!

> > > If we're gonna have many more NOIO percpu users, which I don't
> > > think we would or should, that might make sense but, for fringe
> > > cases, extending mempool to cover percpu is a much better sized
> > > solution.
> > 
> > I've long felt that we goofed with the gfp_flags thing and that it
> > should be a field in the task_struct.  Now *that* would be a large
> > patch!
> 
> Yeah, some of PF_* flags already carry related role information.  I'm
> not too sure how much pushing the whole thing into task_struct would
> change tho.  We would need push/popping.  It could be simpler in some
> cases but in essence wouldn't we have just relocated the position of
> parameter?

The code would get considerably simpler.  The big benefit comes when
you have deep call stacks - we're presently passing a gfp_t down five
layers of function call while none of the intermediate functions even
use the thing - they just pass it on to the next guy.  Pass it via the
task_struct and all that goes away.  It would make maintenance a lot
easier - at present if you want to add a new kmalloc() to a leaf
function you need to edit all five layers of caller functions.
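
A rough userspace sketch of the idea, not a proposal for real code.
Everything named sketch_* or SKETCH_* is hypothetical, the flag values
are illustrative, and the push/pop helpers are only one assumed way of
doing the save/restore Tejun mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative bit values only; NOT the kernel's real definitions. */
typedef unsigned int gfp_t;
#define SKETCH_GFP_WAIT   0x1u
#define SKETCH_GFP_IO     0x2u
#define SKETCH_GFP_FS     0x4u
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOIO   (SKETCH_GFP_WAIT)

/* Hypothetical stand-in for task_struct carrying the allocation mode. */
struct sketch_task {
	gfp_t gfp_allowed;
};

static struct sketch_task sketch_current = { SKETCH_GFP_KERNEL };

/* Narrow the current task's mode; return the old one for restoring. */
static gfp_t sketch_gfp_push(gfp_t allowed)
{
	gfp_t old = sketch_current.gfp_allowed;

	sketch_current.gfp_allowed = old & allowed;
	return old;
}

static void sketch_gfp_pop(gfp_t old)
{
	sketch_current.gfp_allowed = old;
}

/* Records the mode the leaf allocator saw, for inspection. */
static gfp_t sketch_last_alloc_flags;

/* Leaf allocator: reads the mode from the task, not from an argument. */
static void *sketch_alloc(size_t size)
{
	sketch_last_alloc_flags = sketch_current.gfp_allowed;
	return malloc(size);
}

/* Intermediate layers no longer take or forward a gfp_t. */
static void *sketch_layer3(size_t size) { return sketch_alloc(size); }
static void *sketch_layer2(size_t size) { return sketch_layer3(size); }
static void *sketch_layer1(size_t size) { return sketch_layer2(size); }

/* IO path entry: set the mode once, call down, restore on the way out. */
static gfp_t sketch_io_path_alloc_flags(void)
{
	gfp_t old = sketch_gfp_push(SKETCH_GFP_NOIO);
	void *p = sketch_layer1(64);
	gfp_t seen = sketch_last_alloc_flags;

	free(p);
	sketch_gfp_pop(old);
	return seen;
}
```

Adding a new allocation to sketch_layer3() here touches only that one
function, and the push/pop pair sits only at the point where the
context actually changes.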

