From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752545Ab1AFKPn (ORCPT ); Thu, 6 Jan 2011 05:15:43 -0500 Received: from casper.infradead.org ([85.118.1.10]:36323 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752423Ab1AFKPm convert rfc822-to-8bit (ORCPT ); Thu, 6 Jan 2011 05:15:42 -0500 Subject: Re: [PATCH 4/5] perf_events: add cgroup support (v7) From: Peter Zijlstra To: Stephane Eranian Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, paulus@samba.org, davem@davemloft.net, fweisbec@gmail.com, perfmon2-devel@lists.sf.net, eranian@gmail.com, robert.richter@amd.com, acme@redhat.com, lizf@cn.fujitsu.com In-Reply-To: References: <4d2205a1.8b02e30a.3541.ffff8aee@mx.google.com> <1294226618.2016.259.camel@laptop> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT Date: Thu, 06 Jan 2011 11:15:09 +0100 Message-ID: <1294308909.2016.321.camel@laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.30.3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2011-01-05 at 22:39 +0100, Stephane Eranian wrote: > Peter, > > On Wed, Jan 5, 2011 at 2:01 PM, Stephane Eranian wrote: > > On Wed, Jan 5, 2011 at 12:23 PM, Peter Zijlstra wrote: > >> On Mon, 2011-01-03 at 18:20 +0200, Stephane Eranian wrote: > >>> +void > >>> +perf_cgroup_switch(struct task_struct *task, struct task_struct *next) > >>> +{ > >>> + struct perf_cgroup *cgrp_out = perf_cgroup_from_task(task); > >>> + struct perf_cgroup *cgrp_in = perf_cgroup_from_task(next); > >>> + struct perf_cpu_context *cpuctx; > >>> + struct pmu *pmu; > >>> + /* > >>> + * if task is DEAD, then css_out is irrelevant, it has > >>> + * been changed to init_cgrp in cgroup_exit() from do_exit(). > >>> + * Furthermore, perf_cgroup_exit_task(), has scheduled out > >>> + * all css constrained events, only unconstrained events > >>> + * remain. Therefore we need to reschedule based on css_in. > >>> + */ > >>> + if (task->state != TASK_DEAD && cgrp_out == cgrp_in) > >>> + return; > >> > >> I think that check is broken, TASK_DEAD is set way after calling > >> cgroup_exit(), so if we get preempted in between there you'll still go > >> funny. > >> > > I looked at this part again. > > The original code checking for TASK_DEAD is correct. > > The reason is simple, you're looking at perf_cgroup_switch() which is > invoked as part of schedule() and NOT perf_event_task_exit() (called > prior to cgroup_exit()). > Thus, by the time you do the final schedule(), the task state has indeed > been switched to TASK_DEAD. > > I remember testing for this condition during the debug phase. But, cgroup_exit() detaches the task from the cgroup, after which the cgroup can disappear. Furthermore, we can schedule after cgroup_exit() and before the explicit schedule() invocation. Some of the exit functions (say proc_exit_connector) can block and cause scheduling, and with PREEMPT=y we can get preempted. This means you'll be context switching, and thus possibly calling perf_cgroup_switch(), on a task who's cgroup is possibly destroyed. So I'm not at all seeing how this is correct.