From: Andrew Morton <akpm@linux-foundation.org>
To: Tejun Heo <tj@kernel.org>
Cc: avi@redhat.com, nate@cpanel.net, cl@linux-foundation.org,
	oleg@redhat.com, axboe@kernel.dk, vgoyal@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock
Date: Thu, 22 Dec 2011 17:14:32 -0800
Message-ID: <20111222171432.e429c041.akpm@linux-foundation.org>
In-Reply-To: <20111222235455.GT17084@google.com>

On Thu, 22 Dec 2011 15:54:55 -0800 Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Thu, Dec 22, 2011 at 03:41:38PM -0800, Andrew Morton wrote:
> > All the code I'm looking at assumes that blkio_group.stats_cpu is
> > non-zero. Won't the kernel just go splat if that allocation failed?
> >
> > If the code *does* correctly handle ->stats_cpu == NULL then we have
> > options.
>
> I think it's supposed to just skip creating whole blk_group if percpu
> allocation fails, so ->stats_cpu of existing groups are guaranteed to
> be !%NULL.

What is the role of ->elevator_set_req_fn(), and when is it called?
It seems that we allocate the blkio_group within the
elevator_set_req_fn() context?

(Your stack trace in the "block: fix deadlock through percpu allocation
in blk-cgroup" changelog is some unhelpful ACPI thing. It would be
better if it showed the offending trace into the block code.)

> > a) Give userspace a new procfs/debugfs file to start stats gathering
> > on a particular cgroup/request_queue pair. Allocate the stats
> > memory in that.
> >
> > b) Or allocate stats_cpu on the first call to blkio_read_stat_cpu()
> > and return zeroes for this first call.
>
> Hmmm... IIRC, the stats aren't exported per cgroup-request_queue pair,
> so reads are issued per cgroup. We can't tell which request_queues
> userland is actually interested in.

Doesn't matter. The stats are allocated on a per-blkio_group basis.
blkio_read_stat_cpu() is passed the blkio_group. Populate ->stats_cpu
there.

Advantages:

- performs allocation with the more reliable GFP_KERNEL
- avoids burdening users with the space and CPU overhead when they're
  not using the stats
- avoids adding more code into the mempool code.
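
A minimal userspace sketch of option (b), populate on first read, may
make the idea concrete. It is not blk-cgroup code: struct lazy_group,
struct lazy_stats_cpu, lazy_account(), lazy_read_sectors() and
LAZY_NR_CPUS are hypothetical names, calloc() stands in for a GFP_KERNEL
percpu allocation, and all locking is left out.

```c
#include <assert.h>
#include <stdlib.h>

#define LAZY_NR_CPUS 4	/* hypothetical fixed CPU count for this sketch */

/* Hypothetical stand-in for the per-cpu counters of one group. */
struct lazy_stats_cpu {
	unsigned long sectors;
};

/* Hypothetical stand-in for blkio_group; only the relevant field. */
struct lazy_group {
	struct lazy_stats_cpu *stats_cpu;	/* NULL until first read */
};

/* Hot path: must not allocate, so it skips accounting when unpopulated. */
void lazy_account(struct lazy_group *grp, int cpu, unsigned long sectors)
{
	assert(cpu >= 0 && cpu < LAZY_NR_CPUS);
	if (grp->stats_cpu)
		grp->stats_cpu[cpu].sectors += sectors;
}

/*
 * Read path: assumed to run where a blocking allocation is allowed.
 * Populates the counters on first use and reports zero for that call.
 * Returns 0 on success, -1 if the allocation failed.
 */
int lazy_read_sectors(struct lazy_group *grp, unsigned long *total)
{
	*total = 0;
	if (!grp->stats_cpu) {
		/* calloc() stands in for a GFP_KERNEL percpu allocation. */
		grp->stats_cpu = calloc(LAZY_NR_CPUS, sizeof(*grp->stats_cpu));
		return grp->stats_cpu ? 0 : -1;
	}
	for (int cpu = 0; cpu < LAZY_NR_CPUS; cpu++)
		*total += grp->stats_cpu[cpu].sectors;
	return 0;
}
```

In this sketch, I/O accounted before the first read is simply not
counted, which is the cost of never allocating on the hot path.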
> > c) Or change the low-level code to do
> > blkio_group.want_stats_cpu=true, then test that at the top level
> > after we've determined that blkio_group.stats_cpu is NULL.
>
> Not following. Where's the "top level"?

Somewhere appropriate where we can use GFP_KERNEL, i.e. the correct
context for percpu_alloc().
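
Option (c) could look roughly like the userspace sketch below. Every
name in it is hypothetical (struct flagged_group, flagged_account(),
flagged_populate(), FLAGGED_NR_CPUS); want_stats_cpu is the flag from
the suggestion above, calloc() stands in for a GFP_KERNEL percpu
allocation, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define FLAGGED_NR_CPUS 4	/* hypothetical fixed CPU count */

struct flagged_stats_cpu {
	unsigned long sectors;
};

/* Hypothetical stand-in for blkio_group. */
struct flagged_group {
	struct flagged_stats_cpu *stats_cpu;	/* NULL until populated */
	bool want_stats_cpu;			/* set by the low-level path */
};

/* Low-level path: cannot block, so it only records that stats are wanted. */
void flagged_account(struct flagged_group *grp, int cpu, unsigned long sectors)
{
	assert(cpu >= 0 && cpu < FLAGGED_NR_CPUS);
	if (!grp->stats_cpu) {
		grp->want_stats_cpu = true;
		return;
	}
	grp->stats_cpu[cpu].sectors += sectors;
}

/*
 * "Top level": assumed to be called from a context that may sleep.
 * Allocates only if the low-level path asked for it.
 * Returns true if stats_cpu is populated afterwards.
 */
bool flagged_populate(struct flagged_group *grp)
{
	if (!grp->stats_cpu && grp->want_stats_cpu) {
		/* calloc() stands in for a GFP_KERNEL percpu allocation. */
		grp->stats_cpu = calloc(FLAGGED_NR_CPUS,
					sizeof(*grp->stats_cpu));
		if (grp->stats_cpu)
			grp->want_stats_cpu = false;
	}
	return grp->stats_cpu != NULL;
}
```

Where flagged_populate() would be called from is exactly the open
question; the sketch only shows how the request is separated from the
allocation.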

Separately...

Mixing mempools and percpu_alloc() in the proposed fashion seems a
pretty poor fit. mempools are for high-frequency low-level allocations
which have key characteristics: there are typically a finite number of
elements in flight and we *know* that elements are being freed in a
timely manner.

This doesn't fit with percpu_alloc(), which is a very heavyweight
operation requiring GFP_KERNEL and it doesn't fit with
blkio_group_stats_cpu because blkio_group_stats_cpu does not have the
"freed in a timely manner" behaviour.

To resolve these things you've added the workqueue to keep the pool
populated, which turns percpu_mempool into a quite different concept
which happens to borrow some mempool code (not necessarily a bad thing).
This will result in some memory wastage, keeping that pool full.

More significantly, it's pretty unreliable: if the allocations outpace
the kernel thread's ability to refill the pool, all we can do is to
wait for the kernel thread to do some work. But we're holding
low-level locks while doing that wait, which will block the kernel
thread. Deadlock.
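
The failure mode can be modelled in a few lines of single-threaded C.
This is a toy model, not the proposed percpu_mempool code: struct
pool_model, pool_take() and pool_refill() are hypothetical, and the
model assumes, as argued above, that the refill worker cannot make
progress while the allocating path holds its low-level lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a pre-filled pool and the lock held by the allocator. */
struct pool_model {
	int reserve;	/* elements currently available in the pool */
	int capacity;	/* level the refill worker restores */
	bool lock_held;	/* low-level lock held by the allocating path */
};

/* Atomic-context allocation: takes from the reserve and never blocks. */
bool pool_take(struct pool_model *p)
{
	if (p->reserve == 0)
		return false;
	p->reserve--;
	return true;
}

/*
 * Refill worker: assumed to be blocked for as long as the allocating
 * path holds the lock. Returns true if the pool was refilled.
 */
bool pool_refill(struct pool_model *p)
{
	if (p->lock_held)
		return false;
	p->reserve = p->capacity;
	return true;
}
```

With the lock held and the reserve at zero, pool_take() and
pool_refill() both fail indefinitely, so an allocator that waits for
the refill while holding the lock never gets an element.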