From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f180.google.com (mail-pd0-f180.google.com [209.85.192.180]) by kanga.kvack.org (Postfix) with ESMTP id 4104F6B0037 for ; Mon, 16 Jun 2014 12:30:32 -0400 (EDT) Received: by mail-pd0-f180.google.com with SMTP id fp1so1309523pdb.11 for ; Mon, 16 Jun 2014 09:30:31 -0700 (PDT) Received: from mga03.intel.com (mga03.intel.com. [143.182.124.21]) by mx.google.com with ESMTP id iy1si11468680pbb.115.2014.06.16.09.30.30 for ; Mon, 16 Jun 2014 09:30:31 -0700 (PDT) Message-ID: <539F1B66.2020006@intel.com> Date: Mon, 16 Jun 2014 18:29:26 +0200 From: "Rafael J. Wysocki" MIME-Version: 1.0 Subject: Re: [Bug 75101] New: [bisected] s2disk / hibernate blocks on "Saving 506031 image data pages () ..." References: <20140505233358.GC19914@cmpxchg.org> <5368227D.7060302@intel.com> <20140612220200.GA25344@cmpxchg.org> <539A3CD7.6080100@intel.com> <20140613045557.GL2878@cmpxchg.org> In-Reply-To: <20140613045557.GL2878@cmpxchg.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: =?ISO-8859-1?Q?=22Rodolfo_Garc=EDa_Pe=F1as_=28kix=29=22?= , Oliver Winker , Jan Kara , Andrew Morton , bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org, Maxim Patlasov , Fengguang Wu , Tejun Heo , "Rafael J. Wysocki" On 6/13/2014 6:55 AM, Johannes Weiner wrote: > On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote: >> On 6/13/2014 12:02 AM, Johannes Weiner wrote: >>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote: >>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote: >>>>> Hi Oliver, >>>>> >>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote: >>>>>> Hello, >>>>>> >>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1] >>>>>> attached. >>>>>> >>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail >>>>>> Probably more efficient when one of you guys looks directly. >>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the >>>>> bdi_wq workqueue as it should: >>>>> >>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory <-global_dirty_limits >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited >>>>> but the worker wakeup doesn't actually do anything: >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch <-__schedule >>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq <-worker_thread >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker <-worker_thread >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle <-worker_thread >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers <-worker_enter_idle >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule <-worker_thread >>>>> >>>>> My suspicion is that this fails because the bdi_wq is frozen at this >>>>> point and so the flush work never runs until resume, whereas before my >>>>> patch the effective dirty limit was high enough so that image could be >>>>> written in one go without being throttled; followed by an fsync() that >>>>> then writes the pages in the context of the unfrozen s2disk. >>>>> >>>>> Does this make sense? Rafael? Tejun? >>>> Well, it does seem to make sense to me. >>> From what I see, this is a deadlock in the userspace suspend model and >>> just happened to work by chance in the past. >> Well, it had been working for quite a while, so it was a rather large >> opportunity >> window it seems. :-) > No doubt about that, and I feel bad that it broke. But it's still a > deadlock that can't reasonably be accommodated from dirty throttling. > > It can't just put the flushers to sleep and then issue a large amount > of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot > the messenger, this bug needs to be addressed, not get papered over. > >>> Can we patch suspend-utils as follows? >> Perhaps we can. Let's ask the new maintainer. >> >> Rodolfo, do you think you can apply the patch below to suspend-utils? >> >>> Alternatively, suspend-utils >>> could clear the dirty limits before it starts writing and restore them >>> post-resume. >> That (and the patch too) doesn't seem to address the problem with existing >> suspend-utils >> binaries, however. > It's userspace that freezes the system before issuing buffered IO, so > my conclusion was that the bug is in there. This is arguable. I also > wouldn't be opposed to a patch that sets the dirty limits to infinity > from the ioctl that freezes the system or creates the image. OK, that sounds like a workable plan. How do I set those limits to infinity? Rafael -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org