Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock

From: Tejun Heo <tj@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: avi@redhat.com, nate@cpanel.net, cl@linux-foundation.org,
	oleg@redhat.com, axboe@kernel.dk, vgoyal@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock
Date: Thu, 22 Dec 2011 15:00:47 -0800
Message-ID: <20111222230047.GN17084@google.com>
In-Reply-To: <20111222145426.5844df96.akpm@linux-foundation.org>

Hello, Andrew.

On Thu, Dec 22, 2011 at 02:54:26PM -0800, Andrew Morton wrote:
> > These stats are userland visible and quite useful ones if blkcg is in
> > use.  I don't really see how these can be removed.
> 
> What stats?

The ones allocated in the last patch.  blk_group_cpu_stats.

> And why are we doing percpu *allocation* so deep in the code?  You mean
> we're *creating* stats counters on an IO path?  Sounds odd.  Where is
> this code?

Please read below.

> > > > > Or how about we fix the percpu memory allocation code so that it
> > > > > propagates the gfp flags, then delete this patchset?
> > > > 
> > > > Oh, no, this is gonna make things *way* more complex.  I tried.
> > > 
> > > But there's a difference between fixing a problem and working around it.
> > 
> > Yeah, that was my first direction too.  The reason why percpu can't do
> > NOIO is the same one why vmalloc can't do it.  It reaches pretty deep
> > into page table code and I don't think doing all that churning is
> > worthwhile or even desirable.  An alternative approach would be
> > implementing transparent front buffer to percpu allocator, which I
> > *might* do if there really are more of these users, but I think
> > keeping percpu allocator painful to use from reclaim context isn't
> > such a bad idea.
> > 
> > There have been multiple requests for atomic allocation and they all
> > have been successfully pushed back, but IMHO this is a valid one and I
> > don't see a better way around the problem, so while I agree using
> > mempool for this is a workaround, I think it is a right choice, for
> > now, anyway.
> 
> For starters, doing pagetable allocation on the I/O path sounds nutty.
> 
> Secondly, GFP_NOIO is a *weaker* allocation mode than GFP_KERNEL.  By
> permitting it with this patchset, we have a kernel which is more likely
> to get oom failures.  Fixing the kernel to not perform GFP_NOIO
> allocations for these counters will result in a more robust kernel. 
> This is a good thing, which improves the kernel while avoiding adding
> more complexity elsewhere.
> 
> This patchset is the worst option and we should try much harder to avoid
> applying it!

The stats are per (cgroup, request_queue) pair.  We don't want to
preallocate them for every combination, as there are
configurations with a stupid number of request_queues and silly many
cgroups, and #cgroups * #request_queues * #cpus can be huge.  So, we
want on-demand allocation.  While the stats are important, they are
not critical, and the allocation can be opportunistic.  If the
allocation fails this time, we can retry the next time.
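
To make the on-demand, opportunistic scheme concrete, here is a
minimal userland sketch.  It is not the code from the patchset (which
uses a percpu mempool); every name in it (struct group,
try_alloc_stats(), account_io(), fail_next_alloc, NR_CPUS_SKETCH) is
hypothetical, and calloc() merely stands in for an allocation that is
allowed to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count for the sketch */

struct cpu_stats {
	unsigned long sectors;
};

struct group {
	struct cpu_stats *stats;	/* NULL until an allocation succeeds */
	unsigned long dropped;		/* updates skipped while stats were absent */
};

/* Test hook: force the next allocation attempt to fail once. */
static bool fail_next_alloc;

/* Stand-in for an allocation that may fail without blocking. */
static struct cpu_stats *try_alloc_stats(void)
{
	if (fail_next_alloc) {
		fail_next_alloc = false;
		return NULL;
	}
	return calloc(NR_CPUS_SKETCH, sizeof(struct cpu_stats));
}

/*
 * Account one IO.  The stats are allocated lazily on first use; if the
 * allocation fails, this update is dropped and the next call retries.
 */
static void account_io(struct group *g, int cpu, unsigned long sectors)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);

	if (!g->stats)
		g->stats = try_alloc_stats();
	if (!g->stats) {
		g->dropped++;
		return;
	}
	g->stats[cpu].sectors += sectors;
}
```

The point of the sketch is only that a failed attempt costs one lost
sample rather than a failed IO, and that memory is spent only on pairs
that actually see traffic.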

So, yeah, the suggested solution fits the problem.  If you have a
better idea, please don't be shy.

> > Yeah, some of PF_* flags already carry related role information.  I'm
> > not too sure how much pushing the whole thing into task_struct would
> > change tho.  We would need push/popping.  It could be simpler in some
> > cases but in essence wouldn't we have just relocated the position of
> > parameter?
> 
> The code would get considerably simpler.  The big benefit comes when
> you have deep call stacks - we're presently passing a gfp_t down five
> layers of function call while none of the intermediate functions even
> use the thing - they just pass it on to the next guy.  Pass it via the
> task_struct and all that goes away.  It would make maintenance a lot
> easier - at present if you want to add a new kmalloc() to a leaf
> function you need to edit all five layers of caller functions.

Hmmm... yeah, the relocation could save a lot of hassle, I suppose.
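
As a rough illustration of what relocating the parameter might look
like, here is a userland sketch with push/pop of an allocation
context.  All names (CTX_ALLOW_IO, CTX_ALLOW_FS, task_alloc_ctx,
alloc_ctx_push_noio(), alloc_ctx_pop()) are hypothetical and a plain
static variable stands in for a per-task field; this is not an
existing kernel interface.

```c
#include <assert.h>

/* Hypothetical context bits; not the kernel's real gfp flag values. */
#define CTX_ALLOW_IO	0x1u
#define CTX_ALLOW_FS	0x2u

/* Stand-in for a field that would live in the task structure. */
static unsigned int task_alloc_ctx = CTX_ALLOW_IO | CTX_ALLOW_FS;

/* Enter a no-IO section; returns the previous context for the pop. */
static unsigned int alloc_ctx_push_noio(void)
{
	unsigned int old = task_alloc_ctx;

	task_alloc_ctx &= ~(CTX_ALLOW_IO | CTX_ALLOW_FS);
	return old;
}

/* Leave the section by restoring the saved context. */
static void alloc_ctx_pop(unsigned int old)
{
	task_alloc_ctx = old;
}

/* The leaf reads the context itself instead of taking a parameter. */
static unsigned int leaf_alloc_mode(void)
{
	return task_alloc_ctx;
}

/* Intermediate layers no longer pass anything through. */
static unsigned int middle_layer(void)
{
	return leaf_alloc_mode();
}

static unsigned int top_layer(void)
{
	return middle_layer();
}
```

Saving and restoring the old value, rather than setting and clearing a
bit, is what lets such sections nest.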

Thanks.

-- 
tejun

