From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Feng Tang <feng.79.tang@gmail.com>, feng.tang@intel.com,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-arch@vger.kernel.org, Rik van Riel <riel@redhat.com>,
	"Srivatsa S. Bhat" <srivatsa@mit.edu>,
	Peter Zijlstra <peterz@infradead.org>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Oleg Nesterov <oleg@redhat.com>, Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Paul Turner <pjt@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Zhang, Rui" <rui.zhang@intel>
Subject: Re: S3 resume regression [1cf4f629d9d2 ("cpu/hotplug: Move online calls to hotplugged cpu")]
Date: Mon, 7 Nov 2016 13:49:37 +0200
Message-ID: <20161107114937.GX4617@intel.com>
In-Reply-To: <20161101204737.GB4617@intel.com>

On Tue, Nov 01, 2016 at 10:47:37PM +0200, Ville Syrjälä wrote:
> On Fri, Oct 28, 2016 at 08:58:41PM +0200, Thomas Gleixner wrote:
> > On Fri, 28 Oct 2016, Ville Syrjälä wrote:
> > > On Thu, Oct 27, 2016 at 10:41:18PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 27 Oct 2016, Ville Syrjälä wrote:
> > > > > On Thu, Oct 27, 2016 at 09:25:05PM +0200, Thomas Gleixner wrote:
> > > > > > So it would be interesting whether that hunk in resume_broadcast() is
> > > > > > sufficient.
> > > > >
> > > > > So far it looks like the answer is yes.
> > > > >
> > > > > Looks to be about 5 seconds slower than acpi-idle in resuming, but
> > > > > I suppose that's not all that surprising ;)
> > > >
> > > > Well, set it to 1msec then. If that works reliably then we really can do
> > > > that unconditionally. There is no harm in firing a useless timer during
> > > > resume once.
> > >
> > > I narrowed down the required timeout, and looks like 25ms is the
> > > minimum that works. With 24ms I already started to have failures. So
> > > maybe just bump it up by an order of magnitude to 250ms for some
> > > safety margin?
>
> I left the thing running for the weekend and it failed 26 out of 16057
> times with the 25ms timeout. Looks like it takes ~5 minutes to resume
> when it fails, but eventually it does come back.
>
> >
> > Sure, but what puzzles me is that we need a timeout that big. What happens
> > between broadcast_resume() and broadcast_resume() + 25ms?
> >
> > IOW, what is the event/resume function which we need to bridge. We should
> > really try to track that down.
>
> My hunch would be that SMM trap in the DSDT/SSDT since that's where
> things ended up last time I was tracing these resume problems. Though I
> can't recall if that was just with acpi-idle or if intel_idle landed in
> the same spot as well.
>
> I guess I can try to repeat that test tomorrow, or I'll try your function
> tracer method if the other thing fails.

I didn't manage to find a lot of time to play around with this, but it
definitely looks like the SMM trap is the problem here. I repeated my
pm_trace experiments and when it gets stuck it is trying to execute the
_WAK ACPI method which is where the SMM trap happens.

Maybe the SMM code was written with the expectation of a periodic tick
or something like that?

>
> >
> > You might try to enable function tracing and do a tracing_off() when that
> > 25ms timeout fires.
> >
> > Something like
> >
> > 	stop_trace = true;
> >
> > in broadcast_resume() and then in the broadcast timer function:
> >
> > 	if (stop_trace) {
> > 		stop_trace = false;
> > 		tracing_off();
> > 	}
> >
> > Then when the machine is up read the trace, compress and upload it
> > somewhere or send it in private mail if it's not that big.
> >
> > Thanks,
> >
> > 	tglx
> >
>
> --
> Ville Syrjälä
> Intel OTC

--
Ville Syrjälä
Intel OTC
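
For readers following the tracing suggestion quoted above, here is a minimal user-space sketch of the one-shot flag pattern it describes. It is not kernel code: tracing_off() is replaced by a counting stub, and resume_hook() and timer_hook() are hypothetical names standing in for the spots in broadcast_resume() and the broadcast timer function where the quoted lines would go.

```c
#include <assert.h>
#include <stdbool.h>

/* Stub (assumption): stands in for the kernel's tracing_off() and
 * only counts how often the trace would have been stopped. */
static int tracing_off_calls;

static void tracing_off(void)
{
	tracing_off_calls++;
}

/* Armed on resume, consumed by the first timer expiry afterwards. */
static bool stop_trace;

/* Hypothetical hook: what would be added to broadcast_resume(). */
static void resume_hook(void)
{
	stop_trace = true;
}

/* Hypothetical hook: what would be added to the broadcast timer function. */
static void timer_hook(void)
{
	if (stop_trace) {
		stop_trace = false;
		tracing_off();	/* freeze the trace buffer at this point */
	}
	assert(!stop_trace);	/* the flag is always clear once the timer ran */
}
```

Clearing the flag before calling tracing_off() makes the stop one-shot: only the first timer expiry after a resume freezes the buffer, so later expiries leave the captured window untouched.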