public inbox for intel-gfx@lists.freedesktop.org
From: Lyude Paul <lyude@redhat.com>
To: Takashi Iwai <tiwai@suse.de>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: Panic after S3 resume and modeset with MST
Date: Thu, 30 Mar 2017 16:01:31 -0400
Message-ID: <1490904091.19826.3.camel@redhat.com>
In-Reply-To: <s5h37duiopt.wl-tiwai@suse.de>

On Thu, 2017-03-30 at 20:50 +0200, Takashi Iwai wrote:
<snip>
> 
> Sure, if we get a proper stack dump, we can analyze it somehow.  You
> can use addr2line, or even check objdump output manually.
> But in this case, as already mentioned, it was impossible to get any
> sensible stack trace on my machine with 4.11-rc, so far,
> unfortunately.  So no material to read.

Huh? I thought that was what the file called "screenshot showing kernel
panic trace" on the bugzilla was (although that backtrace definitely
didn't look too relevant)... Anyway, if you are having trouble getting
even a stack trace, one of my coworkers here has taught me a
trick called divide and conquer.

The idea is pretty simple. Let's say we have a block of code like this
in the kernel:

void some_resume_func() {
	cool_function_call();
	this_is_neat_too();

	foo();
	bar();
	death();
	baz();
	zab();
}

And you know it's crashing inside this function on resume (e.g. it
could be in foo(), bar(), or that suspicious death() function) but you
have no way of getting a back trace.

This is where the trick comes in: while you might not be able to get a
stack trace, you can probably at least tell the difference between the
machine rebooting immediately as a result of calling
emergency_restart() and the machine just hanging due to the bug.

So what you do is kind of like bisecting, except instead of testing
different commits you see what happens when you insert a call to
emergency_restart() and move it around:

- Try #1:

void some_resume_func() {
	cool_function_call();
	this_is_neat_too();

	foo();
	emergency_restart();
	bar();
	death();
	baz();
	zab();
}

The machine immediately reboots, so everything up to and including
foo() ran fine and the problem is below where we inserted the
emergency_restart() call.

- Try #2:

void some_resume_func() {
	cool_function_call();
	this_is_neat_too();

	foo();
	bar();
	death();
	emergency_restart();
	baz();
	zab();
}

The machine hangs, so we know the problem's either in the call to bar()
or death().

- Try #3:

void some_resume_func() {
	cool_function_call();
	this_is_neat_too();

	foo();
	bar();
	emergency_restart();
	death();
	baz();
	zab();
}

The machine reboots immediately this time, which means that the problem
has to be occurring inside the suspicious death() function. Of course,
if we want to keep debugging further we can go into the death()
function itself and try the same thing to figure out which line inside
it is causing the issue.
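
Seen abstractly, the three tries above are a binary search over marker
positions. The following is a minimal userspace C sketch of that
search, not kernel code. The names find_hanging_call(), probe_fn and
simulated_probe() are made up for illustration, and each call to the
probe stands in for one manual rebuild, reproduce and observe cycle. It
assumes exactly one call hangs and that the hang is reproducible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one manual experiment: place the
 * emergency_restart() marker right before call number marker_pos,
 * reproduce the bug, and report true if the machine rebooted
 * (marker reached) or false if it hung (marker never reached).
 */
typedef bool (*probe_fn)(size_t marker_pos, void *ctx);

/*
 * Return the index of the hanging call among calls 0..n_calls-1.
 * Invariant: a marker before call lo is reached, while a marker
 * before call hi (or after the last call) is never reached.
 */
size_t find_hanging_call(size_t n_calls, probe_fn reboots, void *ctx)
{
	size_t lo = 0, hi = n_calls;

	assert(n_calls > 0);
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (reboots(mid, ctx))
			lo = mid;	/* calls before mid are fine */
		else
			hi = mid;	/* hang happens before mid */
	}
	return lo;
}

/*
 * Simulated machine for trying the search out: ctx points at the
 * index of the call that hangs. The marker is reached only if the
 * hanging call comes at or after the marker position.
 */
bool simulated_probe(size_t marker_pos, void *ctx)
{
	size_t hanging = *(size_t *)ctx;

	return hanging >= marker_pos;
}
```

With the seven calls in some_resume_func() numbered 0 to 6, death() is
call 4, and the search settles on it after three probes (markers before
calls 3, 5 and 4), which matches the three tries above.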

So you'd do the same thing, just around wherever it looks like this
crash might be happening. From:

https://bugzilla.suse.com/show_bug.cgi?id=1029634#c5

It sounds like this happens on hotplugging, so the place to start this
would probably be i915_hotplug_work_func(). Keep going down the call
stack there and you should eventually find the culprit.

The only complication I foresee here is that you'll have to write a
little bit of additional debugging code so that
i915_hotplug_work_func() doesn't actually call emergency_restart()
until right before the moment where the crash happens. This shouldn't
be too difficult: you could, for example, add a module parameter to
i915 that enables the calls to emergency_restart(), and set it right
before the final step of reproducing the bug. If you have any trouble
with this part, feel free to let me know and I'll hack together a quick
patch you can use.
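
One way the gating could look is sketched below. This is a sketch under
stated assumptions, not an actual i915 patch: the parameter name
debug_restart and the helper debug_restart_point() are hypothetical.
module_param(), MODULE_PARM_DESC() and emergency_restart() are real
kernel interfaces; the #else branch only supplies userspace stand-ins
so the gating logic can be compiled and tried outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/moduleparam.h>
#include <linux/reboot.h>

/* Hypothetical parameter: off by default, so normal boots and the
 * early steps of the reproduction are unaffected. */
static bool debug_restart;
module_param(debug_restart, bool, 0644);
MODULE_PARM_DESC(debug_restart,
		 "Call emergency_restart() at debug markers (debugging only)");

#define do_restart() emergency_restart()
#else
/* Userspace stand-ins: count restarts instead of rebooting. */
#include <stdbool.h>

static bool debug_restart;
static int restart_calls;

static void do_restart(void)
{
	restart_calls++;
}
#endif

/*
 * Hypothetical helper: put a call to this wherever a bare
 * emergency_restart() marker would otherwise go. It does nothing
 * until the parameter has been switched on.
 */
static inline void debug_restart_point(void)
{
	if (debug_restart)
		do_restart();
}
```

With permissions 0644, root could arm the marker at runtime by writing
Y to the parameter's file under /sys/module/<module>/parameters/ just
before the final reproduction step.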

Lemme know if this helps at all :).

> 
> That is, the problem isn't how to translate it, but how to get it.
> Normal ways didn't work.  Maybe I can try AMT, but I doubt that it'll
> give any output since kdump already failed...
> 
> 
> thanks,
> 
> Takashi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
