Date: Mon, 16 Jan 2012 10:26:05 -0500
From: Vivek Goyal
To: Tejun Heo
Cc: Andrew Morton, avi@redhat.com, nate@cpanel.net, cl@linux-foundation.org,
	oleg@redhat.com, axboe@kernel.dk, linux-kernel@vger.kernel.org,
	Divyesh Shah
Subject: Re: [PATCHSET] block, mempool, percpu: implement percpu mempool and fix blkcg percpu alloc deadlock
Message-ID: <20120116152605.GA9129@redhat.com>
References: <20111222154138.d6c583e3.akpm@linux-foundation.org>
	<20111223012112.GB12738@redhat.com>
	<20111222173820.3461be5d.akpm@linux-foundation.org>
	<20111223025411.GD12738@redhat.com>
	<20111222191144.78aec23a.akpm@linux-foundation.org>
	<20111223145856.GB16818@redhat.com>
	<20111227132501.ad7f895f.akpm@linux-foundation.org>
	<20111227220753.GH17712@google.com>
	<20111227142156.7943446e.akpm@linux-foundation.org>
	<20111227223012.GJ17712@google.com>
In-Reply-To: <20111227223012.GJ17712@google.com>

On Tue, Dec 27, 2011 at 02:30:12PM -0800, Tejun Heo wrote:
> Hello, Andrew.
>
> On Tue, Dec 27, 2011 at 02:21:56PM -0800, Andrew Morton wrote:
> > For those users who don't want the stats, stats shouldn't
> > consume any resources at all.
>
> Hmmm.... For common use cases - a few cgroups doing IOs to most likely
> single physical device and maybe a couple virtual ones, I don't think
> this would show up anywhere both in terms of memory and process
> overhead.  While avoiding it would be nice, I don't think that should
> be the focus of optimization or design decisions.
>
> > And I bet that the majority of the minority who want stats simply want
> > to know "how much IO is this cgroup doing", and don't need per-cgroup,
> > per-device accounting.
> >
> > And it could be that the minority of the minority who want per-device,
> > per-cgroup stats only want those for a minority of the time.
> >
> > IOW, what happens if we give 'em atomic_add() and be done with it?
>
> I really don't know.  That surely is an enticing idea tho.  Jens,
> Vivek, can you guys chime in?  Is gutting out (or drastically
> simplifying) cgroup-dev stats an option?  Are there users who are
> actually interested in this stuff?

Ok, I am back after a break of 3 weeks, so time to restart the discussion.

We seem to be talking about two things:

- Use atomic_add() for the stats.
- Do not keep stats per cgroup per device; instead keep only a global
  per-cgroup stat.

For the first point, is an atomic operation really that much cheaper than
taking a spin lock? The whole point of introducing the per-cpu data
structures was to make the fast path lockless. My understanding is that an
atomic operation on the IO submission path is still expensive, so to me it
does not really solve the overhead problem.

For the second point, the Google folks (Divyesh Shah) originally introduced
the additional files to display stats per cgroup per device, and I am
assuming they are making use of them. To me, knowing how IO from a cgroup is
distributed across devices is a useful thing to know.
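
To make the comparison concrete, here is a rough sketch of the two update
schemes being discussed. This is illustrative only -- the names
(nr_bytes_atomic, stat_add_percpu(), etc.) are made up and this is not the
actual blk-cgroup code:

#include <linux/atomic.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>

/*
 * Option A: one shared atomic counter. No lock is taken, but every IO
 * still bounces the counter's cache line between CPUs on the
 * submission path.
 */
static atomic64_t nr_bytes_atomic;

static inline void stat_add_atomic(u64 bytes)
{
	atomic64_add(bytes, &nr_bytes_atomic);
}

/*
 * Option B: per-cpu counter (allocated with alloc_percpu(u64)). The
 * submission path touches only the local CPU's copy; the cost of
 * aggregation is paid by the reader in process context, not by the
 * IO fast path.
 */
static u64 __percpu *nr_bytes_pcpu;

static inline void stat_add_percpu(u64 bytes)
{
	this_cpu_add(*nr_bytes_pcpu, bytes);
}

static u64 stat_read_percpu(void)
{
	u64 sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += *per_cpu_ptr(nr_bytes_pcpu, cpu);
	return sum;
}

The whole reason the per-cpu variant (B) exists is that, on a box with many
CPUs submitting IO to the same device, variant (A) turns every submission
into a cross-CPU cache line bounce. (B) keeps the fast path cheap, but it is
also what forces the per-cpu allocation and hence the deadlock this patchset
is trying to fix.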
Keeping the stats per device also helps in that the aggregation of the stats
happens from process context, and we reduce contention on stat updates
coming from different devices. So to me it is a good thing to keep the stats
per device and then display them in whatever way users find useful (either
per cgroup, or per cgroup per device).

So to me neither of the above options really solves the issue of reducing
the cost/overhead of atomic operations in the IO submission path. Please
correct me if I missed something here.

Thanks
Vivek