From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fernando Luis Vazquez Cao
Subject: Re: [PATCH] fsfreeze: tell hung_task about processes put to sleep
Date: Tue, 16 Oct 2012 11:30:48 +0900
Message-ID: <507CC6D8.9080504@lab.ntt.co.jp>
References: <1350035252.6500.2.camel@nexus.lab.ntt.co.jp> <20121013010613.GP2739@dastard> <507B820B.3000908@lab.ntt.co.jp> <20121015063608.GW2739@dastard> <507BB276.8020502@lab.ntt.co.jp> <20121015210217.GC2739@dastard>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Al Viro, Ingo Molnar, Jan Kara, linux-fsdevel@vger.kernel.org
To: Dave Chinner
Return-path:
Received: from tama50.ecl.ntt.co.jp ([129.60.39.147]:50046 "EHLO tama50.ecl.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751094Ab2JPCbK (ORCPT ); Mon, 15 Oct 2012 22:31:10 -0400
In-Reply-To: <20121015210217.GC2739@dastard>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 2012/10/16 06:02, Dave Chinner wrote:
> On Mon, Oct 15, 2012 at 03:51:34PM +0900, Fernando Luis Vazquez Cao wrote:
>> As I mentioned in my previous email if you want to emit a
>> warning do it in the right place and make sure that it is
>> something informative. hung_check certainly isn't the
>> right place to do it.
> So, how do we now know when a freeze fails to complete, as opposed
> to a thaw that hasn't occurred? We won't get any reports from
> threads that are stuck waiting for the freeze to complete, and so
> we'll end up with a silent hang.

Are you referring to a situation where freeze_super() fails to
complete? If so hung_check will detect that the task that called
the freeze ioctl is stuck and will dump its stack's contents,
which is *precisely* the information we want.

If freeze_super() completes we know that the filesystem is either
in the frozen state (SB_FREEZE_COMPLETE) or in the thawed state
(SB_UNFROZEN), with no tasks waiting. We take care of things
properly even when ->freeze_fs(sb) fails: we print an error message
and wake up any tasks that may be waiting for the freeze to
complete. Once the ioctl returns we know that there is nothing
wrong with the kernel, and spewing random stack dumps or causing a
kernel panic is not called for.

> Indeed, if you have a daemon that freezes the filesystem, and you
> haven't architected it with a watchdog to handle restarts due to
> failures, then you don't have a resilient system at all, regardless
> of these warnings. If it's a HA daemon/agent that doesn't get
> restarted and clean up it's mess automatically, then IMO it is
> fundamentally broken and that's the problem that needs fixing.

Absolutely. By the way, to handle restarts properly we need check
ioctls or a sysfs/procfs equivalent for fsfreeze, which my previous
patch set implements.

> Removing kernel warnings doesn't change the fact that the
> application doing freeze/thaw is broken by design...

It is precisely because we want to handle things in user space that
we need to get hung_task related panics and unneeded warnings out of
the way. As I mentioned above, if the freeze failed to complete we
already got the warnings we need from the kernel. If it completed
but ->freeze_fs() failed, we get a warning and the ioctl returns an
error code, so the application will know that something went wrong.
If the ioctl returns without errors the application can take care of
things by itself (especially once we get the check API, in whatever
form, merged).

Thanks,
Fernando
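
For illustration, the user-space side of the argument above (the freeze
ioctl either returns an error the application can act on, or returns
success and leaves the thaw to the application) could look roughly like
the sketch below. It is a minimal sketch, not code from the patch set:
with_frozen_fs() and its work callback are hypothetical names, and only
the FIFREEZE and FITHAW ioctls from <linux/fs.h> are assumed to exist.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any patch in this thread): freeze the
 * filesystem mounted at `mountpoint`, run `work` while it is frozen, then
 * thaw it again. `work` is expected to return 0 on success or non-zero
 * with errno set. Returns 0 on success, -1 on failure with errno holding
 * the first error that was seen.
 */
int with_frozen_fs(const char *mountpoint, int (*work)(void *), void *arg)
{
    int ret = 0, saved_errno = 0;
    int fd = open(mountpoint, O_RDONLY);

    if (fd < 0)
        return -1;                      /* errno was set by open() */

    /* If the freeze fails, the ioctl reports it through its return value. */
    if (ioctl(fd, FIFREEZE, 0) < 0) {
        saved_errno = errno;
        fprintf(stderr, "freeze of %s failed: %s\n",
                mountpoint, strerror(saved_errno));
        close(fd);
        errno = saved_errno;
        return -1;
    }

    /* The filesystem is frozen now; e.g. take a snapshot here. */
    if (work && work(arg) != 0) {
        saved_errno = errno;
        ret = -1;
    }

    /* Always attempt the thaw, even if the work failed. */
    if (ioctl(fd, FITHAW, 0) < 0) {
        if (ret == 0)
            saved_errno = errno;
        fprintf(stderr, "thaw of %s failed: %s\n",
                mountpoint, strerror(errno));
        ret = -1;
    }

    close(fd);
    if (ret < 0)
        errno = saved_errno;
    return ret;
}
```

Actually freezing a filesystem this way requires CAP_SYS_ADMIN. A daemon
built around such a helper would still need a watchdog that thaws the
filesystem if the daemon dies between the two ioctls, which is the
restart-handling problem discussed above.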