From: Vivek Goyal <vgoyal@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>,
avi@redhat.com, nate@cpanel.net, cl@linux-foundation.org,
oleg@redhat.com, axboe@kernel.dk, linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock
Date: Thu, 22 Dec 2011 20:21:12 -0500
Message-ID: <20111223012112.GB12738@redhat.com>
In-Reply-To: <20111222154138.d6c583e3.akpm@linux-foundation.org>

On Thu, Dec 22, 2011 at 03:41:38PM -0800, Andrew Morton wrote:
> On Thu, 22 Dec 2011 15:24:33 -0800
> Tejun Heo <tj@kernel.org> wrote:
>
> > Hello,
> >
> > On Thu, Dec 22, 2011 at 03:16:49PM -0800, Andrew Morton wrote:
> > > > The ones allocated in the last patch. blk_group_cpu_stats.
> > >
> > > What last patch.
> > >
> > > I can find no occurrence of "blk_group_cpu_stats" on linux-kernel or in
> > > the kernel tree.
> >
> > Sorry it's blkio_group_stats_cpu. It's in the seventh patch in this
> > series.
> >
> > > > The stats are per cgroup - request_queue pair. We don't want to
> > > > allocate for all of them for each combination as there are
> > > > configurations with stupid number of request_queues and silly many
> > > > cgroups and #cgroups * #request_queue * #cpus can be huge. So, we
> > > > want on-demand allocation. While the stats are important, they are
> > > > not critical and allocations can be opportunistic. If the allocation
> > > > fails this time, we can try it for the next time.
> > >
> > > Without code to look at I am at a loss.
> >
> > block/blk-cgroup.c blk-throttle.c cfq-iosched.c. Have fun.
> >
> > > request_queues are allocated in blk_alloc_queue_node(), which uses
> > > GFP_KERNEL (and also mysteriously takes a gfp_t arg).
> >
> > Yeah, sure, we *can* allocate everything for every combination when
> > either request_queue or cgroup comes up. That's the thing I tried to
> > explain in the above quoted paragraph.
> >
>
> All the code I'm looking at assumes that blkio_group.stats_cpu is
> non-zero. Won't the kernel just go splat if that allocation failed?

If the per-cpu stats allocation fails, we fail the whole group
allocation. The IO is then accounted to the root group, and the
allocation is retried when the next IO request comes in. See
throtl_alloc_tg() in block/blk-throttle.c.

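For illustration only, here is a rough userspace model of that fallback. It is not kernel code: every name (model_group, model_alloc_group(), model_get_group(), MODEL_NR_CPUS) is hypothetical, calloc() stands in for the per-cpu allocator, and allocation failure is forced through a flag. The point it shows is that a group is published only once its stats exist, and that a NULL group means "charge the root group and retry on the next IO".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; not the kernel's struct blkio_group. */
#define MODEL_NR_CPUS 4

struct model_stats_cpu {
	unsigned long long sectors;
	unsigned long long serviced;
};

struct model_group {
	struct model_stats_cpu *stats_cpu;	/* one slot per CPU */
};

/* Stand-in for the root group, which always exists. */
static struct model_stats_cpu root_stats[MODEL_NR_CPUS];
static struct model_group root_group = { .stats_cpu = root_stats };

/*
 * Allocate a group together with its per-CPU stats. If the stats
 * allocation fails (forced here via fail_stats), free the group and
 * return NULL, so a group is never published half-initialized.
 */
static struct model_group *model_alloc_group(bool fail_stats)
{
	struct model_group *g = calloc(1, sizeof(*g));

	if (!g)
		return NULL;
	g->stats_cpu = fail_stats ? NULL :
		calloc(MODEL_NR_CPUS, sizeof(*g->stats_cpu));
	if (!g->stats_cpu) {
		free(g);
		return NULL;
	}
	return g;
}

/*
 * Pick the group to charge an IO to. On allocation failure, fall back
 * to the root group; *slot stays NULL, so the next IO retries.
 */
static struct model_group *model_get_group(struct model_group **slot,
					   bool fail_stats)
{
	if (!*slot)
		*slot = model_alloc_group(fail_stats);
	return *slot ? *slot : &root_group;
}

/* Charge one IO of 'sectors' sectors to group g on the given CPU. */
static void model_account_io(struct model_group *g, int cpu,
			     unsigned long long sectors)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	g->stats_cpu[cpu].sectors += sectors;
	g->stats_cpu[cpu].serviced++;
}
```

Calling model_get_group(&slot, true) first returns &root_group and leaves slot NULL; a later call with fail_stats false allocates and returns the real group, so the IO path never sees a group whose stats_cpu is NULL.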
>
> If the code *does* correctly handle ->stats_cpu == NULL then we have
> options.
>
> a) Give userspace a new procfs/debugfs file to start stats gathering
> on a particular cgroup/request_queue pair. Allocate the stats
> memory in that.
>
> b) Or allocate stats_cpu on the first call to blkio_read_stat_cpu()
> and return zeroes for this first call.

Isn't the purpose of stats that they are gathered even if nobody has
read them yet? If I create a cgroup and put a task into it which does
some IO, I think stat collection should start immediately, without the
user taking any action. Forcing the user to read a stat before
collection starts seems odd to me.

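A small userspace sketch of option b) makes the concern concrete. It is a hypothetical model, not the blkio code: lazy_group, lazy_account_io() and lazy_read_stat() are invented names, and calloc() stands in for the per-cpu allocation that the real read path could do with GFP_KERNEL. The IO path tolerates a NULL stats pointer, and the first read allocates and reports zero.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4

/* Hypothetical model of option b): stats stay NULL until the first read. */
struct lazy_group {
	unsigned long long *sectors_cpu;	/* NULL until someone reads stats */
};

/* IO path: account only if stats already exist; never dereference NULL. */
static void lazy_account_io(struct lazy_group *g, int cpu,
			    unsigned long long sectors)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (g->sectors_cpu)
		g->sectors_cpu[cpu] += sectors;
}

/*
 * Read path: runs in process context, so the real kernel could sleep
 * here. The first call allocates the stats and reports 0; later calls
 * return the sum over all CPUs. An allocation failure also reports 0.
 */
static unsigned long long lazy_read_stat(struct lazy_group *g)
{
	unsigned long long sum = 0;

	if (!g->sectors_cpu) {
		g->sectors_cpu = calloc(MODEL_NR_CPUS, sizeof(*g->sectors_cpu));
		return 0;
	}
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += g->sectors_cpu[i];
	return sum;
}
```

If a task issues 8 sectors of IO, then someone reads the stats, then 16 more sectors are issued, the second read reports 16: everything done before the first read is silently missing from the totals.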
>
> c) Or change the low-level code to do
> blkio_group.want_stats_cpu=true, then test that at the top level
> after we've determined that blkio_group.stats_cpu is NULL.
>
> d) Or, worse, punt the allocation into a workqueue thread.

I implemented a patch that punts the allocation to a worker thread.
Tejun did not like it, but I personally think it is the less intrusive
way to fix this specific problem:
https://lkml.org/lkml/2011/12/19/291

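Options c) and d), like the worker-based patch linked above, rest on the same idea: the IO path, which cannot sleep, only records that stats are wanted, and the allocation happens later in a context that can. The single-threaded userspace model below sketches that idea under stated assumptions. All names (deferred_group, want_stats_cpu, deferred_run_pending(), the fixed-size pending array) are hypothetical, the array stands in for a work item or for the top-level caller, and a real kernel version would also need locking between the IO path and the allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4
#define MODEL_MAX_PENDING 16

struct deferred_group {
	unsigned long long *sectors_cpu;	/* NULL until allocation completes */
	bool want_stats_cpu;			/* set by IO path, cleared by allocator */
};

/* Groups waiting for stats allocation (stand-in for a workqueue item). */
static struct deferred_group *pending[MODEL_MAX_PENDING];
static int nr_pending;

/*
 * IO path: must not sleep, so it never allocates. If stats are missing,
 * it marks the group and queues it once; this IO goes unaccounted.
 */
static void deferred_account_io(struct deferred_group *g, int cpu,
				unsigned long long sectors)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (g->sectors_cpu) {
		g->sectors_cpu[cpu] += sectors;
		return;
	}
	if (!g->want_stats_cpu && nr_pending < MODEL_MAX_PENDING) {
		g->want_stats_cpu = true;
		pending[nr_pending++] = g;
	}
}

/*
 * Sleepable context (top level of the call chain, or a worker thread):
 * this is where the kernel could use GFP_KERNEL. Clearing the flag even
 * on failure lets the next IO re-queue the group. Returns the number of
 * groups whose stats were allocated.
 */
static int deferred_run_pending(void)
{
	int done = 0;

	while (nr_pending > 0) {
		struct deferred_group *g = pending[--nr_pending];

		g->want_stats_cpu = false;
		g->sectors_cpu = calloc(MODEL_NR_CPUS, sizeof(*g->sectors_cpu));
		if (g->sectors_cpu)
			done++;
	}
	return done;
}
```

The trade-off is visible in the model: stats start a little late (IO issued before the deferred allocation runs is not counted), but no user action is needed and the allocation itself can block safely.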
>
> Note that all these options will permit us to use GFP_KERNEL, which is
> better.
>
> Note that a) and b) mean that users get control over whether these
> stats are accumulated at all, so many won't incur needless memory and
> CPU consumption.
>
> I think I like b). Fix the code so it doesn't oops when ->stats_cpu is
> NULL, then turn on stats gathering the first time someone tries to read
> the stats.
>
> (Someone appears to have misspelled "throttle" as "throtl" for no
> apparent reason about 1000 times. Sigh.)

That someone would be me. I thought that "throtl" still conveys the
meaning while keeping all the identifiers relatively short. But if it
does not look good, I can change it.

Thanks
Vivek