From: "Rafael J. Wysocki"
To: Pavel Machek
Cc: Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov
Subject: Re: Back to the future.
Date: Sun, 29 Apr 2007 11:22:15 +0200
Message-Id: <200704291122.16616.rjw@sisk.pl>
In-Reply-To: <20070429082313.GA1900@elf.ucw.cz>
References: <1177567481.5025.211.camel@nigel.suspend2.net> <20070429082313.GA1900@elf.ucw.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sunday, 29 April 2007 10:23, Pavel Machek wrote:
> Hi!
>
> > > > The freezer has *caused* those deadlocks (eg by stopping threads that were
> > > > needed for the suspend writeouts to succeed!), not solved them.
> > >
> > > I can't remember anything like this, but I believe you have a specific test
> > > case in mind.
> >
> > Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place?
> >
> > Rafael, you really don't know what you're talking about, do you?
> >
> > Just _look_ at them. It's the IO threads etc that shouldn't be frozen,
> > exactly *because* they do IO. You claim that kernel threads shouldn't do
> > IO, but that's the point: if you cannot do IO when snapshotting to disk,
> > here's a damn big clue for you: how do you think that snapshot is going to
> > get written?
> >
> > I *guarantee* you that we've had a lot more problems with threads that
> > should *not* have been frozen than with those hypothetical threads that
> > you think should have been frozen.
>
> Well, we had nasty corruption on XFS, caused by thread that was not
> frozen and should be. (While the other case leads "only" to deadlocks,
> so it is easier to debug.)
>
> The locking point.. when I added freezing to swsusp, I knew very
> little about kernel locking, so I "simply" decided to avoid the
> problem altogether... using the freezer.
>
> You may be right that locks are not a big problem for the hibernation
> after all; I just do not know.

Still, I think that if a kernel thread is part of a device driver, then
_in_ _principle_ it needs _some_ synchronization with the driver's
suspend/freeze and resume/thaw callbacks. For example, it's reasonable to
assume that the thread should be quiet between suspend/freeze and
resume/thaw.

With the freezing of kernel threads we provide a simple means of such
synchronization: call try_to_freeze() at a suitable place in your kernel
thread and you're done. [Well, there should be a second part for making the
thread die if the thaw callback doesn't find the device, but that's in the
works.]

Without it, there may be race conditions that we are not even aware of and
that may trigger in, say, 1 in 10 suspends or so, and I wish you good luck
debugging such things.

Greetings,
Rafael
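
To illustrate the synchronization argument above, here is a minimal userspace
sketch in Python. It is an analogy only, not kernel code: the Freezer class,
driver_thread() and suspend_cycle() are hypothetical names invented for this
example, and only the try_to_freeze() name is borrowed from the message. The
idea it tries to show is that a thread which parks itself at one well-defined
point never touches the device between the suspend and resume callbacks.

```python
import threading
import time


class Freezer:
    """Userspace stand-in for the freezer (hypothetical, not the kernel's code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._freezing = False   # True while a suspend is in progress
        self._parked = 0         # threads currently blocked in try_to_freeze()

    def try_to_freeze(self):
        """Called by a thread at a point where it is safe for it to stop."""
        with self._cond:
            if not self._freezing:
                return False
            self._parked += 1
            self._cond.notify_all()          # let freeze() know we are parked
            while self._freezing:
                self._cond.wait()
            self._parked -= 1
            return True

    def freeze(self, nr_workers):
        """Request a freeze; return once nr_workers threads are parked."""
        with self._cond:
            self._freezing = True
            while self._parked < nr_workers:
                self._cond.wait()

    def thaw(self):
        """Release every parked thread."""
        with self._cond:
            self._freezing = False
            self._cond.notify_all()


def driver_thread(freezer, stop, device, seen):
    """A driver's helper thread: touches the device once per loop iteration."""
    while not stop.is_set():
        freezer.try_to_freeze()              # the single synchronization point
        seen.append(device["suspended"])     # "touch" the device
        time.sleep(0.001)


def suspend_cycle():
    """Run one freeze/suspend/resume/thaw cycle; return what the thread saw."""
    freezer, stop = Freezer(), threading.Event()
    device, seen = {"suspended": False}, []
    t = threading.Thread(target=driver_thread,
                         args=(freezer, stop, device, seen))
    t.start()
    freezer.freeze(nr_workers=1)   # returns only after the thread is parked
    device["suspended"] = True     # stands in for the suspend/freeze callback
    time.sleep(0.01)
    device["suspended"] = False    # stands in for the resume/thaw callback
    freezer.thaw()
    stop.set()
    t.join()
    return seen
```

Calling suspend_cycle() returns the list of device states the thread observed.
Under the assumptions of this sketch, none of them should show the device as
suspended, because freeze() does not return until the thread is parked. Without
the try_to_freeze() call the thread could observe the suspended state at an
unpredictable moment, which is the kind of race described above.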