Message-ID: <44A42AEE.1010107@sgi.com>
Date: Thu, 29 Jun 2006 12:33:02 -0700
From: Jay Lan
To: Andrew Morton
Cc: Paul Jackson, Valdis.Kletnieks@vt.edu, jlan@engr.sgi.com, nagar@watson.ibm.com, balbir@in.ibm.com, csturtiv@sgi.com, linux-kernel@vger.kernel.org
Subject: Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats
In-Reply-To: <20060629110107.2e56310b.akpm@osdl.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Andrew Morton wrote:
> On Thu, 29 Jun 2006 09:44:08 -0700
> Paul Jackson wrote:
>
>
>>>You're probably correct on that model. However, it all depends on the actual
>>>workload. Are people who actually have large-CPU (>256) systems actually
>>>running fork()-heavy things like webservers on them, or are they running things
>>>like database servers and computations, which tend to have persistent
>>>processes?
>>
>>It may well be mostly as you say - the large-CPU systems not running
>>the fork() heavy jobs.
>>
>>Sooner or later, someone will want to run a fork()-heavy job on a
>>large-CPU system. On a 1024 CPU system, it would apparently take
>>just 14 exits/sec/CPU to hit this bottleneck, if Jay's number of
>>14000 applied.
>>
>>Chris Sturdivant's reply is reasonable -- we'll hit it sooner or later,
>>and deal with it then.
>>
>
>
> I agree, and I'm viewing this as blocking the taskstats merge. Because if
> this _is_ a problem then it's a big one because fixing it will be
> intrusive, and might well involve userspace-visible changes.
>
> The only ways I can see of fixing the problem generally are to either
>
> a) throw more CPU(s) at stats collection: allow userspace to register for
>    "stats generated by CPU N", then run a stats collection daemon on each
>    CPU or

Clearly this approach (or the per-cpuset approach Paul suggested) can solve
the large-CPU system issues. As technology advances, this _WILL_ become a
problem sooner or later. However, the taskstats header carries a version
number. Would a change like this be too intrusive to add in a later version
of the interface?
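To put numbers on approach (a): Paul's figure comes from 14000 / 1024, about 13.7, i.e. roughly 14 exits/sec/CPU. At that rate a single global listener on a 1024-CPU box has to absorb 1024 * 14 = 14336 records/sec, while a per-CPU daemon only sees its own CPU's 14/sec. The C sketch below is a hypothetical user-level model of that routing, assuming one listener slot per CPU; the names (exit_record, per_cpu_listener, deliver_exit, simulate_one_second) are made up for illustration and are not the taskstats or netlink API.

```c
#include <assert.h>

/* Hypothetical model of approach (a): one stats-collection daemon per CPU.
 * Not the taskstats/netlink API; all names are illustrative. */

#define MAX_CPUS 1024

struct exit_record {
	int pid;
	int cpu;		/* CPU on which the task exited */
};

struct per_cpu_listener {
	unsigned long delivered;	/* records handed to this CPU's daemon */
};

static struct per_cpu_listener listeners[MAX_CPUS];

/* Route an exit record only to the daemon registered for its CPU,
 * rather than funnelling every exit through one global listener. */
static void deliver_exit(const struct exit_record *rec)
{
	assert(rec->cpu >= 0 && rec->cpu < MAX_CPUS);
	listeners[rec->cpu].delivered++;
}

/* Simulate one second of load: `rate` exits on each of `ncpus` CPUs.
 * Returns the total number of records, i.e. what a single global
 * listener would have had to absorb in that second. */
static unsigned long simulate_one_second(int ncpus, int rate)
{
	unsigned long total = 0;

	assert(ncpus > 0 && ncpus <= MAX_CPUS);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		for (int i = 0; i < rate; i++) {
			struct exit_record rec = { .pid = cpu * rate + i, .cpu = cpu };

			deliver_exit(&rec);
			total++;
		}
	}
	return total;
}
```

The point of the model is only the load split: the total is unchanged, but no single consumer has to keep up with all of it.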
Regards,
  - jay

>
> b) make the kernel recognise when it's getting overloaded and switch to
>    some degraded mode where it stops trying to send all the data to
>    userspace - just send a summary, or a "we goofed" message or something.
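Approach (b) could look roughly like the following, again as a hypothetical model rather than kernel code: a per-interval budget of full records, with any exits beyond it only counted and then reported once as a summary when the interval ends. The struct, the function names, and the idea of a fixed per-interval budget are assumptions for illustration; how the kernel would actually detect overload is left open in the proposal above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of approach (b): once the exits in the current
 * interval exceed a budget, stop sending full records and fall back to
 * a summary. Names and the budget are illustrative, not taskstats. */

struct stats_sender {
	unsigned long budget;	/* max full records per interval */
	unsigned long sent;	/* full records sent this interval */
	unsigned long dropped;	/* exits only counted, not sent */
};

/* Called on each task exit. Returns 1 if a full record was sent,
 * 0 if the sender is in degraded mode and merely counted the exit. */
static int on_task_exit(struct stats_sender *s)
{
	if (s->sent < s->budget) {
		s->sent++;
		return 1;
	}
	s->dropped++;
	return 0;
}

/* Called at the end of each interval: emit one summary line if any
 * exits were dropped, reset the counters, and return the drop count. */
static unsigned long end_interval(struct stats_sender *s)
{
	unsigned long dropped = s->dropped;

	assert(s->sent <= s->budget);
	if (dropped)
		printf("degraded: %lu exit records summarised, not sent\n",
		       dropped);
	s->sent = 0;
	s->dropped = 0;
	return dropped;
}
```

With a budget of 10000 and 14336 exits in one interval (1024 CPUs at 14 exits/sec each, per Paul's numbers), 10000 records go out in full and the remaining 4336 are reported in a single summary line. The userspace-visible part is the summary message itself, which is where a versioned header would matter.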