Date: Wed, 29 Oct 2014 15:26:54 +0100
From: Peter Zijlstra
To: Oleg Nesterov
Cc: Mike Galbraith, mingo@kernel.org, torvalds@linux-foundation.org, tglx@linutronix.de, ilya.dryomov@inktank.com, linux-kernel@vger.kernel.org, Eric Paris, rafael.j.wysocki@intel.com
Subject: Re: [PATCH 00/11] nested sleeps, fixes and debug infrastructure
Message-ID: <20141029142654.GD3337@twins.programming.kicks-ass.net>
In-Reply-To: <20141029000055.GA12107@redhat.com>

On Wed, Oct 29, 2014 at 01:00:56AM +0100, Oleg Nesterov wrote:
> On 10/28, Peter Zijlstra wrote:
> > So I talked to Rafael yesterday and I'm going to replace all the
> > wait_event*() stuff, and I suppose also freezable_schedule() because
> > they're racy.
> >
> > The moment we call freezer_do_not_count() the freezer will ignore us,
> > this means the thread could still be running (albeit not for long) when
> > the freezer reports success.
>
> Yes, sure.
> IIRC the theory was that a PF_FREEZER_SKIP will do nothing
> "wrong" wrt freezing/suspend before it actually sleeps, but I guess
> today we can't assume this.

Especially the wait_event_freezable*() family seems suspicious, in that
the cond stmt can actually result in quite a lot of code running while
the task is skipped. But see below: I don't think we have a guarantee it
will _ever_ sleep.

Also, this calls schedule(); try_to_freeze() in a suitable loop that's
safe against spurious wakeups, OTOH..

> > Ideally I'll be able to kill the entire freezer_do_not_count() stuff.
>
> Agreed... but it is not clear to me what exactly we can/should do.

.. I looked at freezable_schedule() and I'm not sure how to 'fix' that.
The problem is things like signal.c:ptrace_stop(), which will actually
misbehave in the face of the spurious wakeups that try_to_freeze()
allows.

Then again, freezable_schedule() isn't nearly as bad as the
wait_event_freezable() stuff, because it does indeed guarantee the task
only calls schedule() between freezer_do_not_count() and freezer_count().

Then again, it is still possible to miss such a task and report freeze
success while it is running all the same: suppose it has already been
woken but is preempted before freezer_count(). The
for_each_process_thread() loop in try_to_freeze_tasks() will skip over
it.

And all I can come up with is horrible.. maybe for another day.
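A minimal user-space sketch of the missed-task race, assuming a deterministic single-threaded simulation. The model_* names are hypothetical and only loosely mirror freezer_do_not_count(), freezer_count() and the counting pass of try_to_freeze_tasks(); this is not kernel code. A task that set its skip flag, slept, was woken and was then preempted before it cleared the flag is not counted, so the freezer sees nothing left to do while the task is running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model, NOT kernel code: these types and helpers
 * only loosely mirror the freezer's skip flag and counting pass.
 */
enum model_state { MODEL_RUNNING, MODEL_SLEEPING, MODEL_FROZEN };

struct model_task {
	enum model_state state;
	bool freezer_skip;	/* stands in for PF_FREEZER_SKIP */
};

/* Like freezer_do_not_count(): ask the freezer to ignore this task. */
static void model_do_not_count(struct model_task *t)
{
	t->freezer_skip = true;
}

/* Like freezer_count() minus its try_to_freeze(): become visible again. */
static void model_count(struct model_task *t)
{
	t->freezer_skip = false;
}

/* One counting pass: how many tasks the freezer still waits for. */
static size_t model_freezer_todo(const struct model_task *tasks, size_t n)
{
	size_t todo = 0;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].state == MODEL_FROZEN)
			continue;
		if (tasks[i].freezer_skip)	/* skipped, assumed asleep */
			continue;
		todo++;
	}
	return todo;
}

/*
 * Task sleeps with the skip flag set, is woken, and is "preempted" before
 * model_count(). Returns true when the freezer reports success (todo == 0)
 * even though the task is running.
 */
static bool model_missed_running_task(void)
{
	struct model_task t = { MODEL_RUNNING, false };

	model_do_not_count(&t);
	t.state = MODEL_SLEEPING;	/* schedule() */
	t.state = MODEL_RUNNING;	/* woken, preempted before model_count() */

	return model_freezer_todo(&t, 1) == 0 && t.state == MODEL_RUNNING;
}

/* Same sequence, but the freezer samples after model_count() has run. */
static bool model_counted_task_seen(void)
{
	struct model_task t = { MODEL_RUNNING, false };

	model_do_not_count(&t);
	t.state = MODEL_SLEEPING;
	t.state = MODEL_RUNNING;
	model_count(&t);

	return model_freezer_todo(&t, 1) == 1;
}
```

Both model_missed_running_task() and model_counted_task_seen() return true; the only difference is whether the counting pass runs before or after the task clears its skip flag.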