From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner+willy=40w.ods.org-S1751215AbWFZRwc@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751215AbWFZRwc (ORCPT <rfc822;willy@w.ods.org>);
	Mon, 26 Jun 2006 13:52:32 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751227AbWFZRwc
	(ORCPT <rfc822;linux-kernel-outgoing>);
	Mon, 26 Jun 2006 13:52:32 -0400
Received: from e31.co.us.ibm.com ([32.97.110.149]:28043 "EHLO
	e31.co.us.ibm.com") by vger.kernel.org with ESMTP id S1751215AbWFZRwb
	(ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 26 Jun 2006 13:52:31 -0400
Message-ID: <44A01EDC.6040401@watson.ibm.com>
Date: Mon, 26 Jun 2006 13:52:28 -0400
From: Shailabh Nagar <nagar@watson.ibm.com>
User-Agent: Mozilla Thunderbird 1.0.7 (Windows/20050923)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Jay Lan <jlan@sgi.com>
CC: Andrew Morton <akpm@osdl.org>, jlan@engr.sgi.com, balbir@in.ibm.com,
       csturtiv@sgi.com, linux-kernel@vger.kernel.org
Subject: Re: [Patch][RFC]  Disabling per-tgid stats on task exit in taskstats
References: <44892610.6040001@watson.ibm.com>	<20060609010057.e454a14f.akpm@osdl.org>	<448952C2.1060708@in.ibm.com>	<20060609042129.ae97018c.akpm@osdl.org>	<4489EE7C.3080007@watson.ibm.com>	<449999D1.7000403@engr.sgi.com>	<44999A98.8030406@engr.sgi.com>	<44999F5A.2080809@watson.ibm.com>	<4499D7CD.1020303@engr.sgi.com>	<449C2181.6000007@watson.ibm.com>	<20060623141926.b28a5fc0.akpm@osdl.org>	<449C6620.1020203@engr.sgi.com>	<20060623164743.c894c314.akpm@osdl.org>	<449CAA78.4080902@watson.ibm.com> <20060623213912.96056b02.akpm@osdl.org> <449CD4B3.8020300@watson.ibm.com> <44A01A50.1050403@sgi.com>
In-Reply-To: <44A01A50.1050403@sgi.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Jay Lan wrote:

> Shailabh Nagar wrote:
>
>> Andrew Morton wrote:
>>
>>> On Fri, 23 Jun 2006 22:59:04 -0400
>>> Shailabh Nagar <nagar@watson.ibm.com> wrote:
>>>
>>>  
>>>
>>>>>> It was due to a loop in fill_tgid() when per-TG stats
>>>>>> data are assembled for netlink:
>>>>>>       do {
>>>>>>               rc = delayacct_add_tsk(stats, tsk);
>>>>>>               if (rc)
>>>>>>                       break;
>>>>>>
>>>>>>       } while_each_thread(first, tsk);
>>>>>>
>>>>>> and it is executed inside a lock.
>>>>>> Fortunately single threaded appls do not hit this code.
>>>>>>  
>>>>>>       
>>>>>
>>>>>
>>>>> Am I reading this right?  We do that loop when each thread within the
>>>>> thread group exits?
>>>>>
>>>>>     
>>>>
>>>>
>>>> Yes.
>>>>
>>>>  
>>>>
>>>>> How come?
>>>>>
>>>>>
>>>>>     
>>>>
>>>>
>>>> To get the sum of all per-tid data for threads that are currently 
>>>> alive.
>>>> This is returned to userspace with each thread exit.
>>>>   
>>>
>>>
>>>
>>> I realise that.  How about we stop doing it?
>>>
>>> When a thread exits it only makes sense to send up the stats for that
>>> thread. 
>>
>>
>>
>>> Why does the kernel assume that userspace is also interested in
>>> the accumulated stats of its siblings?  And if userspace _is_ 
>>> interested in
>>> that info, it's still present in-kernel and can be queried for.
>>>  
>>>
>> The reason for sending out sum of siblings's stats was as follows:
>> I didn't maintain a per-tgid data structure in-kernel where the 
>> exiting threads taskstats could be accumalated
>> , erroneously thinking that this would require such a structure to be 
>> *additionally* updated each  time a statististic
>> was being collected and that would be way too much overhead. Also to 
>> save on space. Thus if userspace wants to get the per-tgid stats for 
>> the thread group when the last thread exits, then it cannot
>> do so by querying since such a query only returns the sum of 
>> currently live threads (data from exited threads is lost).
>>
>> So, the current design chooses to return the sum of all siblings + 
>> self when each thread exits. Using this userspace
>> can maintain the per-tgid data for all currently living threads of 
>> the group + previously exited threads.
>>
>> But as pointed out in an earlier mail, it looks like this is 
>> unnecessarily elaborate way of trying to avoid maintaining
>> a separate per-tgid data structure in the kernel (in addition to the 
>> per-tid ones we already have).
>>
>> What can be done is to create a taskstats structure for a thread 
>> group the moment the *second* thread gets created.
>> Then each exiting thread can accumalate its stats to this struct. If 
>> userspace queries for per-tgid data, the sum of all
>> live threads + value in this struct can be returned. And when the 
>> last thread of the thread group exits, the struct's
>> value can be output.
>>
>> While this will mean an extra taskstats structure hanging around for 
>> the lifetime of a multithreaded app (not single threaded
>> ones), it should cut down on the overhead of running through all 
>> threads that we see in the current design.
>> More importantly, it will reduce the frequency of per-tgid data send 
>> to once for each thread group exit  instead of once
>> per thread exit.
>>
>> Will that work for everyone ?
>
>
> As long as the per-pid delayacct struct has a pointer to the per-tgid
> data struct 

It doesn't need to....the per-tgid thing is allocated inside tsk->signal.
Let me send the patch out and we can discuss the design/implementation.

> and deoes not need to go through the loop on every exit.

Yes. Thats not needed anymore.

>
>>
>>>>> Is there some better lock we can use in there?  It only has to be
>>>>> threadgroup-wide rather than kernel-wide.
>>>>>
>>>>>
>>>>>     
>>>>
>>>>
>>>> The lock we're holding is the tasklist_lock. To go through all the 
>>>> threads of a thread group
>>>> thats the only lock that can protect integrity of while_each_thread 
>>>> afaics.
>>>>   
>>>
>>>
>>>
>>> At present, yes.  That's persumably not impossible to fix.
>>>  
>>>
>> In the above design, if a userspace query for per-tgid data arrives, 
>> then I'll still need to run through
>> all the threads of a thread group (to return their sum + that of 
>> already exited threads accumalated in the
>> extra per-tgid taskstats struct).
>
>
> But, this query-reply logic can be separated from that executed at
> exit.

Yes, it already is.

Thanks,
Shailabh