From: "Rafael J. Wysocki"
Subject: Re: [RFC][PATCH -mm 3/3] Freezer: Replace the timeout
Date: Wed, 1 Aug 2007 12:43:24 +0200
Message-ID: <200708011243.25276.rjw@sisk.pl>
References: <200707251401.48340.rjw@sisk.pl> <200708010029.43652.rjw@sisk.pl> <20070801083146.GX2087@elf.ucw.cz>
In-Reply-To: <20070801083146.GX2087@elf.ucw.cz>
To: Pavel Machek
Cc: Nigel Cunningham, Andres Salomon, linux-pm@lists.linux-foundation.org, Chris Ball, David Woodhouse, Oleg Nesterov
List-Id: linux-pm@vger.kernel.org

On Wednesday, 1 August 2007 10:31, Pavel Machek wrote:
> Hi!
>
> > Instead of using the global timeout, we can use a more fine grained method of
> > checking if the freezing of tasks should fail.  Namely, we can measure the time
> > in which no tasks have entered the refrigerator by counting the number of calls
> > to wait_event_timeout() in try_to_freeze_tasks() that have returned 0 (in a
> > row).
> >
> > After sending freeze requests to the tasks regarded as freezable
> > try_to_freeze_tasks() goes to sleep and waits until at least one task enters the
> > refrigerator.  If the refrigerator is not entered by any tasks before WAIT_TIME
> > expires, try_to_freeze_tasks() increases the counter of expired timeouts and
> > sends freeze requests to the remaining tasks.  If the number of expired timeouts
> > becomes greater than MAX_WAITS, the freezing of tasks fails (the counter of
> > expired timeouts is reset whenever a task enters the refrigerator).
>
> I do not get logic behind this.
>
> Old logic was "we give system 20 seconds to come into quiet state".
>
> New logic is "if we do no progress within second, we fail"... which is
> quite a big change.

Well, I agree, and that's why I wanted to separate this part from the two
previous patches ...

> What happens on loaded ext3 filesystem, for example? Bunch of userland tasks
> will wait on data to be synced to disk, taking more than second, no?

IMHO this is only a question of what the value of MAX_WAITS should be.
[I took 5 because it turned out to be enough in my testing, but that could be 10
or more.]

The point is that in 99.(9)% of cases the 20s timeout is unnecessary, because:
(1) most often we succeed within 1s
(2) if we are going to fail, we can say that we'll fail way before the 20s
    expires.

Now, the question is how we can check that we'll fail, and this patch attempts
to use a simple mechanism:
* measure the time in which no tasks have entered the refrigerator and, if this
  time is long enough, safely assume the "blocking" tasks to be stuck
  somewhere and give up.

This isn't bulletproof, but it should cover the vast majority of cases.

Anyway, eventually, I'd like the freezer to detect failures relatively early,
so the user won't have to wait 20s each time it's going to fail.

Greetings,
Rafael

-- 
"Premature optimization is the root of all evil." - Donald Knuth
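
For illustration, the give-up rule discussed above can be sketched in user
space roughly as follows.  This is not the patch itself: freeze_would_succeed()
and the frozen_per_wait array are hypothetical stand-ins for the loop in
try_to_freeze_tasks() and for the results of successive wait_event_timeout()
calls.  Only MAX_WAITS, with the value 5 mentioned above, is taken from the
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITS 5	/* value from the thread; it could be 10 or more */

/*
 * Hypothetical model of the "no progress" rule.
 *
 * frozen_per_wait[i] is the number of tasks that entered the refrigerator
 * during the i-th wait interval (0 models a wait that timed out).
 * todo is the number of tasks that still have to freeze.
 *
 * Returns true if all tasks froze.  Returns false if more than MAX_WAITS
 * consecutive intervals passed with no task freezing, or if the recorded
 * intervals ran out before all tasks froze.
 */
bool freeze_would_succeed(const unsigned int *frozen_per_wait,
			  size_t n_waits, unsigned int todo)
{
	unsigned int expired = 0;	/* consecutive timed-out waits */

	assert(frozen_per_wait != NULL || n_waits == 0);

	for (size_t i = 0; i < n_waits && todo > 0; i++) {
		unsigned int frozen = frozen_per_wait[i];

		if (frozen == 0) {
			/* No progress in this interval: count it and maybe give up. */
			if (++expired > MAX_WAITS)
				return false;
			continue;
		}

		/* Progress was made: reset the counter of expired timeouts. */
		expired = 0;
		todo -= (frozen < todo) ? frozen : todo;
	}

	return todo == 0;
}
```

With this model, five timed-out waits in a row followed by progress still
succeed, while a sixth consecutive timed-out wait makes the freezing fail, no
matter how much time has passed overall.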