From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from sc8-sf-mx2-b.sourceforge.net ([10.3.1.12] helo=sc8-sf-mx2.sourceforge.net) by sc8-sf-list1.sourceforge.net with esmtp (Exim 4.30) id 1Cxgy3-0006dP-Gw for user-mode-linux-devel@lists.sourceforge.net; Sat, 05 Feb 2005 23:36:27 -0800 Received: from smtp209.mail.sc5.yahoo.com ([216.136.130.117]) by sc8-sf-mx2.sourceforge.net with smtp (Exim 4.41) id 1Cxgy2-0003Ov-4L for user-mode-linux-devel@lists.sourceforge.net; Sat, 05 Feb 2005 23:36:27 -0800 Message-ID: <4205C8EF.2000604@yahoo.com.au> From: Nick Piggin MIME-Version: 1.0 References: <42021E35.8050601@fujitsu-siemens.com> <4202C18F.5010605@yahoo.com.au> <42036C2C.5040503@fujitsu-siemens.com> <4203F40C.8040707@yahoo.com.au> <20050204143917.1f9507cb.akpm@osdl.org> <4204020F.2000501@yahoo.com.au> <42044D17.5040703@yahoo.com.au> <42058E52.8030306@yahoo.com.au> <20050206071935.GA19991@elte.hu> In-Reply-To: <20050206071935.GA19991@elte.hu> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Subject: [uml-devel] Re: [PATCH] fix wait_task_inactive race (was Re: Race condition in ptrace) Sender: user-mode-linux-devel-admin@lists.sourceforge.net Errors-To: user-mode-linux-devel-admin@lists.sourceforge.net List-Unsubscribe: , List-Id: The user-mode Linux development list List-Post: List-Help: List-Subscribe: , List-Archive: Date: Sun, 06 Feb 2005 18:36:15 +1100 To: Ingo Molnar Cc: Andrew Morton , bstroesser@fujitsu-siemens.com, roland@redhat.com, jdike@addtoit.com, blaisorblade_spam@yahoo.it, user-mode-linux-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org Ingo Molnar wrote: > * Nick Piggin wrote: > > >>When a task is put to sleep, it is dequeued from the runqueue >>while it is still running. The problem is that the runqueue >>lock can be dropped and retaken in schedule() before the task >>actually schedules off, and wait_task_inactive did not account >>for this. > > > ugh. This has been the Nth time we got bitten by the fundamental > unrobustness of non-atomic scheduling on some architectures ... > (And i'll say the N+1th time that this is not good.) > This is actually due to wake_sleeping_dependent and dependent_sleeper dropping the runqueue lock. Actually idle_balance can (in rare cases) drop the lock as well, which I didn't notice earlier, so it is something that we have been doing forever. The smtnice locking has just exposed the problem for us, so I wrongfully bashed it earlier *blush*. > >>+static int task_onqueue(runqueue_t *rq, task_t *p) >>+{ >>+ return (p->array || task_running(rq, p)); >>+} > > > the fix looks good, but i'd go for the simplified oneliner patch below. > I dont like the name 'task_onqueue()', a because a task is 'on the > queue' when it has p->array != NULL. The task is running when it's > task_running(). On architectures with nonatomic scheduling a task may > be running while not on the queue and external observers with the > runqueue lock might notice this - and wait_task_inactive() has to take > care of this case. I'm not sure how introducing a function named > "task_onqueue()" will make the situation any clearer. > > ok? > Well just because there is a specific condition that both callsites require. That is, the task is neither running nor on the runqueue. While task_onqueue is technically wrong if you're looking down into the fine details of the priority queue management, I think it is OK to go up a level of abstraction and say that the task is "on the runqueue" if it is either running or waiting to run. It is really the one condition that is made un-intuitive due to the locking in schedule(), so I thought formalising it would be better. Suggestions for a better name welcome? ;) Your one liner would fix the problem too, of course. The important thing at this stage is that it gets fixed for 2.6.11. Thanks, Nick ------------------------------------------------------- This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting Tool for open source databases. Create drag-&-drop reports. Save time by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. Download a FREE copy at http://www.intelliview.com/go/osdn_nl _______________________________________________ User-mode-linux-devel mailing list User-mode-linux-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S270981AbVBFHgn (ORCPT ); Sun, 6 Feb 2005 02:36:43 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262938AbVBFHgm (ORCPT ); Sun, 6 Feb 2005 02:36:42 -0500 Received: from smtp209.mail.sc5.yahoo.com ([216.136.130.117]:26554 "HELO smtp209.mail.sc5.yahoo.com") by vger.kernel.org with SMTP id S262481AbVBFHgX (ORCPT ); Sun, 6 Feb 2005 02:36:23 -0500 Message-ID: <4205C8EF.2000604@yahoo.com.au> Date: Sun, 06 Feb 2005 18:36:15 +1100 From: Nick Piggin User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20050105 Debian/1.7.5-1 X-Accept-Language: en MIME-Version: 1.0 To: Ingo Molnar CC: Andrew Morton , bstroesser@fujitsu-siemens.com, roland@redhat.com, jdike@addtoit.com, blaisorblade_spam@yahoo.it, user-mode-linux-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org Subject: Re: [PATCH] fix wait_task_inactive race (was Re: Race condition in ptrace) References: <42021E35.8050601@fujitsu-siemens.com> <4202C18F.5010605@yahoo.com.au> <42036C2C.5040503@fujitsu-siemens.com> <4203F40C.8040707@yahoo.com.au> <20050204143917.1f9507cb.akpm@osdl.org> <4204020F.2000501@yahoo.com.au> <42044D17.5040703@yahoo.com.au> <42058E52.8030306@yahoo.com.au> <20050206071935.GA19991@elte.hu> In-Reply-To: <20050206071935.GA19991@elte.hu> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Ingo Molnar wrote: > * Nick Piggin wrote: > > >>When a task is put to sleep, it is dequeued from the runqueue >>while it is still running. The problem is that the runqueue >>lock can be dropped and retaken in schedule() before the task >>actually schedules off, and wait_task_inactive did not account >>for this. > > > ugh. This has been the Nth time we got bitten by the fundamental > unrobustness of non-atomic scheduling on some architectures ... > (And i'll say the N+1th time that this is not good.) > This is actually due to wake_sleeping_dependent and dependent_sleeper dropping the runqueue lock. Actually idle_balance can (in rare cases) drop the lock as well, which I didn't notice earlier, so it is something that we have been doing forever. The smtnice locking has just exposed the problem for us, so I wrongfully bashed it earlier *blush*. > >>+static int task_onqueue(runqueue_t *rq, task_t *p) >>+{ >>+ return (p->array || task_running(rq, p)); >>+} > > > the fix looks good, but i'd go for the simplified oneliner patch below. > I dont like the name 'task_onqueue()', a because a task is 'on the > queue' when it has p->array != NULL. The task is running when it's > task_running(). On architectures with nonatomic scheduling a task may > be running while not on the queue and external observers with the > runqueue lock might notice this - and wait_task_inactive() has to take > care of this case. I'm not sure how introducing a function named > "task_onqueue()" will make the situation any clearer. > > ok? > Well just because there is a specific condition that both callsites require. That is, the task is neither running nor on the runqueue. While task_onqueue is technically wrong if you're looking down into the fine details of the priority queue management, I think it is OK to go up a level of abstraction and say that the task is "on the runqueue" if it is either running or waiting to run. It is really the one condition that is made un-intuitive due to the locking in schedule(), so I thought formalising it would be better. Suggestions for a better name welcome? ;) Your one liner would fix the problem too, of course. The important thing at this stage is that it gets fixed for 2.6.11. Thanks, Nick