From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753520Ab0CZVvX (ORCPT ); Fri, 26 Mar 2010 17:51:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39802 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752034Ab0CZVvV (ORCPT ); Fri, 26 Mar 2010 17:51:21 -0400 Date: Fri, 26 Mar 2010 22:49:06 +0100 From: Oleg Nesterov To: Balbir Singh Cc: Andrew Morton , Americo Wang , "Eric W. Biederman" , Hidetoshi Seto , Ingo Molnar , Peter Zijlstra , Roland McGrath , Spencer Candland , Stanislaw Gruszka , linux-kernel@vger.kernel.org Subject: Re: [RFC,PATCH 1/2] cputimers/proc: do_task_stat()->task_times() can race with getrusage() Message-ID: <20100326214906.GA18467@redhat.com> References: <20100324204550.GA31777@redhat.com> <20100326035344.GQ3308@balbir.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100326035344.GQ3308@balbir.in.ibm.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/26, Balbir Singh wrote: > > * Oleg Nesterov [2010-03-24 21:45:50]: > > > do_task_stat()->task_times() can race with getrusage(), they both can > > try to update task->prev_Xtime at the same time. > > > > Remove this bit of d180c5bc "sched: Introduce task_times() to replace > > task_{u,s}time()". > > One of the reasons for adding this accuracy was to avoid sampling > based noise and errors that occur with utime and stime. > > As long as there is no preemption during the assignment, I think we > should be OK. I don't think preemp_disable() can help. Probably we can use task_lock(). As for do_task_stat()->thread_group_times(), I think we can make it rc-safe without breaking /bin/top. 1. add spin_lock_irqsave(&sig->cputimer.lock) around sig->prev_Xtime = max(...) 2. Add a couple of barriers into thread_group_cputime() and __exit_signal() so that without ->siglock we can never overestimate utime/stime if we race with exit. If we underestimate these values, this should be fine: - the error can't be "systematic", the next read from /prod/pid/stat will see the updated values - the prev_Xtime logic in thread_group_times() ensures the reported time can never go back. IOW: at worse, cat /proc/pid/stat can miss the time which the exited thread spent on CPU after the previous read of /proc/pid/stat. This looks absolutely harmless, the next read will see this time. Probably we can even detect this case if we look at sig->nr_threads and retry. I'll try to make patches unless someone has a better idea. I just can't accept the fact that we are doing while_each_thread() under ->siglock here ;) Oleg.