From: Vivek Goyal <vgoyal@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>,
avi@redhat.com, nate@cpanel.net, cl@linux-foundation.org,
oleg@redhat.com, axboe@kernel.dk, linux-kernel@vger.kernel.org
Subject: Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock
Date: Fri, 23 Dec 2011 09:58:56 -0500
Message-ID: <20111223145856.GB16818@redhat.com>
In-Reply-To: <20111222191144.78aec23a.akpm@linux-foundation.org>

On Thu, Dec 22, 2011 at 07:11:44PM -0800, Andrew Morton wrote:
> On Thu, 22 Dec 2011 21:54:11 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > On Thu, Dec 22, 2011 at 05:38:20PM -0800, Andrew Morton wrote:
> > > On Thu, 22 Dec 2011 20:21:12 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > > On Thu, Dec 22, 2011 at 03:41:38PM -0800, Andrew Morton wrote:
> > > > >
> > > > > If the code *does* correctly handle ->stats_cpu == NULL then we have
> > > > > options.
> > > > >
> > > > > a) Give userspace a new procfs/debugfs file to start stats gathering
> > > > > on a particular cgroup/request_queue pair. Allocate the stats
> > > > > memory in that.
> > > > >
> > > > > b) Or allocate stats_cpu on the first call to blkio_read_stat_cpu()
> > > > > and return zeroes for this first call.
> > > >
> > > > But the purpose of stats is that they are gathered even if somebody
> > > > has not read them even once?
> > >
> > > That's not a useful way of using stats. The normal usage would be to
> > > record the stats then start the workload then monitor how the stats
> > > have changed as work proceeds.
> >
> > I have at least one example, "iostat", which does not follow this. Its
> > first report shows the total stats since the system boot and each
> > subsequent report covers time since previous report. With stats being
> > available since the cgroup creation time, one can think of extending
> > iostat tool to display per IO cgroup stats too.
>
> If that's useful (dubious) then it can be addressed by creating the
> stats when a device is bound to the cgroup (below).
>
> > Also we have a knob "reset_stats" to reset all the stats to zero. So
> > one can first reset stats, start the workload and then monitor it (if one
> > does not like stats since the cgroup creation time).
> >
> > >
> > > > So if I create a cgroup and put some
> > > > task into it which does some IO, I think stat collection should start
> > > > immediately without user taking any action.
> > >
> > > If you really want to know the stats since cgroup creation then trigger
> > > the stats initialisation from userspace when creating the blkio_cgroup.
> >
> > These per cpu stats are per cgroup per device. So if a workload in a
> > cgroup is doing IO to 4 devices, we allocate 4 percpu stat areas for
> > stats. So at cgroup creation time we just don't know how many of these
> > to create and also it does not cover the case of device hotplug after
> > cgroup creation.
>
> Mark the cgroup as "needing stats" then allocate the stats (if needed)
> when a device is bound to the cgroup. Rather than on first I/O.

This will work for the throttling case, where the user has to set
throttling rules explicitly for each device; setting a rule can be
considered binding the device to the cgroup. In that case we will not
collect stats for devices that have no rules. I guess I can live with that.

But it still does not work for CFQ, where there might not be any
user-initiated binding of a device to the cgroup. The user might just
specify a cgroup weight (like task ioprio), and the binding to a device
is created automatically on the first IO to that device from the cgroup.
The user does not initiate any specific binding.
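
To illustrate the CFQ case, here is a rough userspace sketch of "binding
created on first IO". All names (io_cgroup, grp_stats, find_or_create_grp)
are made up for illustration and are not the actual blk-cgroup code. It uses
plain calloc() and does not model the percpu allocation constraints being
discussed in this thread.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEVS 8	/* arbitrary bound for this sketch */

/* Hypothetical stand-in for the per-cgroup, per-device stats area. */
struct grp_stats {
	unsigned long ios;
};

/* Hypothetical stand-in for a blkio cgroup: one slot per device. */
struct io_cgroup {
	unsigned int weight;			/* the only thing the user sets */
	struct grp_stats *grp[MAX_DEVS];	/* NULL until first IO to that device */
};

/*
 * Called from the (simulated) IO path. Creates the cgroup-device binding
 * on first use. Returns NULL if the allocation fails.
 */
static struct grp_stats *find_or_create_grp(struct io_cgroup *cg, int dev)
{
	assert(dev >= 0 && dev < MAX_DEVS);
	if (!cg->grp[dev])
		cg->grp[dev] = calloc(1, sizeof(*cg->grp[dev]));
	return cg->grp[dev];
}

/* Account one IO; the sample is dropped if no stats area could be allocated. */
static void account_io(struct io_cgroup *cg, int dev)
{
	struct grp_stats *st = find_or_create_grp(cg, dev);

	if (st)
		st->ios++;
}

/* Number of devices this cgroup has actually been bound to. */
static int nr_bound(const struct io_cgroup *cg)
{
	int dev, n = 0;

	for (dev = 0; dev < MAX_DEVS; dev++)
		if (cg->grp[dev])
			n++;
	return n;
}
```

The user only ever touches "weight" here. There is no user-visible event
between cgroup creation and account_io() at which the stats could be
allocated instead.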
>
> > >
> > > > Forcing the user to first
> > > > read a stat before the collection starts is kind of odd to me.
> > >
> > > Well one could add a separate stats_enable knob. Doing it
> > > automatically from read() would be for approximate-back-compatibility
> > > with existing behaviour.
> > >
> > > Plus (again) this way we also avoid burdening non-stats-users with the
> > > overhead of stats.
> >
> > Even if we do that we have the problem with a hotplugged device. Assume a
> > cgroup is created and stats are enabled; now a new device shows up and
> > some task in the group does IO on that device. Now we need to create percpu
> > data area for that cgroup-device pair dynamically in IO path and we are
> > back to the same problem.
>
> Why do the allocation during I/O? Can't it be done in the hotplug handler?
>
Even if we can do it in the hotplug handler, it will be very wasteful of
memory. If there are 100 IO cgroups in the system, then upon every block
device hotplug we will allocate per cpu memory for all 100 cgroups,
irrespective of whether they are doing IO to the device or not.

Now expand this to a system with 100 cgroups and 100 LUNs: 10000
allocations for no reason. (Doing it only for cgroups needing stats
does not help much.) The current scheme allocates memory for the group
only if a specific cgroup is doing IO to a specific block device.
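
The arithmetic can be put as a small sketch. The function names and the
"active" array are purely illustrative, not kernel interfaces. It only
counts allocations and says nothing about the size of each percpu area.

```c
#include <assert.h>

/*
 * Allocations if every hotplugged device gets a stats area for every
 * cgroup up front.
 */
static unsigned long allocs_at_hotplug(unsigned long nr_cgroups,
				       unsigned long nr_devs)
{
	return nr_cgroups * nr_devs;
}

/*
 * Allocations if a stats area is created only for cgroup-device pairs
 * that see IO. active[i] is the number of devices cgroup i does IO to
 * (illustrative input).
 */
static unsigned long allocs_on_first_io(const unsigned long *active,
					unsigned long nr_cgroups)
{
	unsigned long i, total = 0;

	assert(active || nr_cgroups == 0);
	for (i = 0; i < nr_cgroups; i++)
		total += active[i];
	return total;
}
```

allocs_at_hotplug(100, 100) gives the 10000 above. If, say, each of the 100
cgroups does IO to 4 devices (an assumed workload), allocs_on_first_io()
gives 400.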
Thanks
Vivek