linux-mm.kvack.org archive mirror
From: Rainer Fiebig <jrf@mailbox.org>
To: Matheus Fillipe <matheusfillipeag@gmail.com>
Cc: "Jan Kara" <jack@suse.cz>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Rodolfo García Peñas (kix)" <kix@kix.es>,
	"Oliver Winker" <oliverml1@oli1170.net>,
	bugzilla-daemon@bugzilla.kernel.org, linux-mm@kvack.org,
	"Maxim Patlasov" <mpatlasov@parallels.com>,
	"Fengguang Wu" <fengguang.wu@intel.com>,
	"Tejun Heo" <tj@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	killian.de.volder@megasoft.be,
	"Atilla Karaca" <atillakaraca72@hotmail.com>
Subject: Re: [Bug 75101] New: [bisected] s2disk / hibernate blocks on "Saving 506031 image data pages () ..."
Date: Wed, 3 Apr 2019 19:55:59 +0200
Message-ID: <56c1efb7-142b-9ae3-7f59-852d739f6632@mailbox.org>
In-Reply-To: <CAFWuBvcS-8AFZ4KoimMrLPjFXGE8a48QnSqV3_gajJNWYZymGA@mail.gmail.com>



On 03.04.19 at 18:59, Matheus Fillipe wrote:
> Yes I can sorta confirm the bug is in uswsusp. I removed the package
> and pm-utils 

Matheus,

there is no need to uninstall pm-utils. You actually need this to have
comfortable suspend/hibernate.

The only additional option you will get from uswsusp is true s2both
(which is nice, imo).

pm-utils provides something similar called "suspend-hybrid" which means
that the computer suspends and after a configurable time wakes up again
to go into hibernation.

> and used both "systemctl hibernate" and "echo disk >>
> /sys/power/state" to hibernate. It seems to succeed and shuts down, I
> am just not able to resume from it, which seems to be a classical
> problem solved just by setting the resume swap file/partition on grub.
> (which i tried and didn't work even with nvidia disabled)
> 
> Anyway uswsusp is still necessary because the default kernel
> hibernation doesn't work with the proprietary nvidia drivers as far
> as I know and have tested.

What doesn't work: hibernating or resuming?
And /var/log/pm-suspend.log might give you a clue what causes the problem.

> 
> Is there anyway I could get any workaround to this bug on my current
> OS by the way?

*I* don't know, I don't use Ubuntu. But what I would do now is
re-install pm-utils *without* uswsusp and make sure that you have got
the swap-partition/file right in grub.cfg or menu.lst (grub legacy).

Then do a few pm-hibernate/resume and tell us what happened.
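
To double-check the resume setting on the running system, something like
the following could help. It is only a sketch: it assumes the resume
device is passed via the standard resume= (and, for swap files,
resume_offset=) kernel parameters, and parse_resume_params is a made-up
helper name, not part of pm-utils.

```python
def parse_resume_params(cmdline: str) -> dict:
    """Return the hibernation-related parameters found on a kernel command line."""
    wanted = ("resume", "resume_offset")
    found = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "resume=UUID=..." keeps its value intact.
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            found[key] = value
    return found


# Example with a made-up command line; on a real system, pass in the
# contents of /proc/cmdline instead.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 resume=UUID=abcd-1234 resume_offset=34816 quiet"
print(parse_resume_params(sample))
```

If "resume" is missing from the result, the entry in grub.cfg is the
first thing I would look at.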

So long!

> 
> On Wed, Apr 3, 2019 at 7:04 AM Rainer Fiebig <jrf@mailbox.org> wrote:
>>
>> On 03.04.19 at 11:34, Jan Kara wrote:
>>> On Tue 02-04-19 16:25:00, Andrew Morton wrote:
>>>>
>>>> I cc'ed a bunch of people from bugzilla.
>>>>
>>>> Folks, please please please remember to reply via emailed
>>>> reply-to-all.  Don't use the bugzilla interface!
>>>>
>>>> On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> wrote:
>>>>
>>>>> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
>>>>>> On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
>>>>>>> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
>>>>>>>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
>>>>>>>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
>>>>>>>>>> Hi Oliver,
>>>>>>>>>>
>>>>>>>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
>>>>>>>>>>> attached.
>>>>>>>>>>>
>>>>>>>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail
>>>>>>>>>>> Probably more efficient when one of you guys looks directly.
>>>>>>>>>> Thanks, this looks interesting.  balance_dirty_pages() wakes up the
>>>>>>>>>> bdi_wq workqueue as it should:
>>>>>>>>>>
>>>>>>>>>> [  249.148009]   s2disk-3327    2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
>>>>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
>>>>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
>>>>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
>>>>>>>>>> [  249.148009]   s2disk-3327    2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
>>>>>>>>>>
>>>>>>>>>> but the worker wakeup doesn't actually do anything:
>>>>>>>>>>
>>>>>>>>>> [  249.148009] kworker/-3466    2d... 48550431us : finish_task_switch <-__schedule
>>>>>>>>>> [  249.148009] kworker/-3466    2.... 48550431us : _raw_spin_lock_irq <-worker_thread
>>>>>>>>>> [  249.148009] kworker/-3466    2d... 48550431us : need_to_create_worker <-worker_thread
>>>>>>>>>> [  249.148009] kworker/-3466    2d... 48550432us : worker_enter_idle <-worker_thread
>>>>>>>>>> [  249.148009] kworker/-3466    2d... 48550432us : too_many_workers <-worker_enter_idle
>>>>>>>>>> [  249.148009] kworker/-3466    2.... 48550432us : schedule <-worker_thread
>>>>>>>>>> [  249.148009] kworker/-3466    2.... 48550432us : __schedule <-worker_thread
>>>>>>>>>>
>>>>>>>>>> My suspicion is that this fails because the bdi_wq is frozen at this
>>>>>>>>>> point and so the flush work never runs until resume, whereas before my
>>>>>>>>>> patch the effective dirty limit was high enough so that image could be
>>>>>>>>>> written in one go without being throttled; followed by an fsync() that
>>>>>>>>>> then writes the pages in the context of the unfrozen s2disk.
>>>>>>>>>>
>>>>>>>>>> Does this make sense?  Rafael?  Tejun?
>>>>>>>>> Well, it does seem to make sense to me.
>>>>>>>> From what I see, this is a deadlock in the userspace suspend model and
>>>>>>>> just happened to work by chance in the past.
>>>>>>> Well, it had been working for quite a while, so it was a rather large
>>>>>>> opportunity window it seems. :-)
>>>>>> No doubt about that, and I feel bad that it broke.  But it's still a
>>>>>> deadlock that can't reasonably be accommodated from dirty throttling.
>>>>>>
>>>>>> It can't just put the flushers to sleep and then issue a large amount
>>>>>> of buffered IO, hoping it doesn't hit the dirty limits.  Don't shoot
>>>>>> the messenger, this bug needs to be addressed, not get papered over.
>>>>>>
>>>>>>>> Can we patch suspend-utils as follows?
>>>>>>> Perhaps we can.  Let's ask the new maintainer.
>>>>>>>
>>>>>>> Rodolfo, do you think you can apply the patch below to suspend-utils?
>>>>>>>
>>>>>>>> Alternatively, suspend-utils
>>>>>>>> could clear the dirty limits before it starts writing and restore them
>>>>>>>> post-resume.
>>>>>>> That (and the patch too) doesn't seem to address the problem with existing
>>>>>>> suspend-utils binaries, however.
>>>>>> It's userspace that freezes the system before issuing buffered IO, so
>>>>>> my conclusion was that the bug is in there.  This is arguable.  I also
>>>>>> wouldn't be opposed to a patch that sets the dirty limits to infinity
>>>>>> from the ioctl that freezes the system or creates the image.
>>>>>
>>>>> OK, that sounds like a workable plan.
>>>>>
>>>>> How do I set those limits to infinity?
>>>>
>>>> Five years have passed and people are still hitting this.
>>>>
>>>> Killian described the workaround in comment 14 at
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=75101.
>>>>
>>>> People can use this workaround manually by hand or in scripts.  But we
>>>> really should find a proper solution.  Maybe special-case the freezing
>>>> of the flusher threads until all the writeout has completed.  Or
>>>> something else.
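
For anyone scripting this: the idea quoted further up (raise the dirty
limits before the image is written, restore them afterwards) could be
wrapped roughly as below. This is a sketch under assumptions, not
Killian's workaround from comment 14 and not suspend-utils code.
temporary_sysctls is a hypothetical helper, the knobs live under
/proc/sys/vm and need root to write, and this thread does not say which
values are appropriate or how the restore should be ordered relative to
the snapshot.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_sysctls(values, base=Path("/proc/sys/vm")):
    """Write `values` to files under `base`, restoring the old contents on exit."""
    saved = {}
    try:
        for name, new_value in values.items():
            knob = Path(base) / name
            # Remember the current setting before overwriting it.
            saved[name] = knob.read_text().strip()
            knob.write_text(f"{new_value}\n")
        yield saved
    finally:
        # Undo the changes in reverse order, even if the body raised.
        for name in reversed(list(saved)):
            (Path(base) / name).write_text(f"{saved[name]}\n")
```

Usage would be along the lines of
"with temporary_sysctls({"dirty_ratio": 90}): ..." around the code that
writes the image, where 90 is a placeholder value and not a
recommendation.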
>>>
>>> I've refreshed my memory wrt this bug and I believe the bug is really on
>>> the side of suspend-utils (uswsusp or however it is called). They are low
>>> level system tools, they ask the kernel to freeze all processes
>>> (SNAPSHOT_FREEZE ioctl), and then they rely on buffered writeback (which is
>>> relatively heavyweight infrastructure) to work. That is wrong in my
>>> opinion.
>>>
>>> I can see Johannes was suggesting in comment 11 to use O_SYNC in
>>> suspend-utils which worked but was too slow. Indeed O_SYNC is rather big
>>> hammer but using O_DIRECT should be what they need and get better
>>> performance - no additional buffering in the kernel, no dirty throttling,
>>> etc. They only need their buffer & device offsets sector aligned - they
>>> seem to be even page aligned in suspend-utils so they should be fine. And
>>> if the performance still sucks (currently they appear to do mostly random
>>> 4k writes so it probably would for rotating disks), they could use AIO DIO
>>> to get multiple pages in flight (as many as they dare to allocate buffers)
>>> and then the IO scheduler will reorder things as good as it can and they
>>> should get reasonable performance.
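
To illustrate the O_DIRECT suggestion above: page-aligned buffer,
page-aligned offsets, no page cache involved. This is a sketch in Python
rather than suspend-utils' C, and it makes several assumptions: Linux
(os.O_DIRECT), chunks of exactly one page, and a target file or device
that accepts O_DIRECT. write_pages_direct is a hypothetical helper. It
writes synchronously, one page at a time, so it does not cover the AIO
part of the suggestion.

```python
import mmap
import os
from typing import Iterable

PAGE_SIZE = mmap.PAGESIZE


def write_pages_direct(path: str, pages: Iterable[bytes], start_offset: int = 0) -> int:
    """Write page-sized chunks to `path` with O_DIRECT; return the bytes written."""
    if start_offset % PAGE_SIZE:
        raise ValueError("start_offset must be page aligned")
    # An anonymous mmap is page aligned, which satisfies O_DIRECT's
    # alignment requirement for the user buffer.
    buf = mmap.mmap(-1, PAGE_SIZE)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
        try:
            offset = start_offset
            for page in pages:
                if len(page) != PAGE_SIZE:
                    raise ValueError("each chunk must be exactly one page")
                buf[:] = page
                written = os.pwrite(fd, buf, offset)
                if written != PAGE_SIZE:
                    raise OSError(f"short write at offset {offset}")
                offset += PAGE_SIZE
            return offset - start_offset
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Because the data bypasses the page cache, nothing here depends on the
flusher threads or on dirty throttling.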
>>>
>>> Is there someone who works on suspend-utils these days? Because the repo
>>> I've found on kernel.org seems to be long dead (last commit in 2012).
>>>
>>>                                                               Honza
>>>
>>
>> Whether it's suspend-utils (or uswsusp) or not could be answered quickly
>> by de-installing this package and using the kernel-methods instead.
>>
>>


