From: Baoquan He <bhe@redhat.com>
To: Dave Young <dyoung@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>, David Airlie <airlied@redhat.com>,
x86@kernel.org, kexec@lists.infradead.org,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: mgag200 fails kdump kernel booting
Date: Tue, 2 Jul 2019 13:34:43 +0800
Message-ID: <20190702053443.GE3178@localhost.localdomain>
In-Reply-To: <20190702031715.GB3327@dhcp-128-65.nay.redhat.com>
On 07/02/19 at 11:17am, Dave Young wrote:
> On 07/02/19 at 09:41am, Baoquan He wrote:
> > On 07/02/19 at 06:51am, David Airlie wrote:
> > > On Wed, Jun 26, 2019 at 6:29 PM Baoquan He <bhe@redhat.com> wrote:
> > > >
> > > > On 06/26/19 at 04:15pm, Baoquan He wrote:
> > > > > Hi Dave,
> > > > >
> > > > > > We hit a kdump kernel boot failure on a Lenovo system. The kdump
> > > > > > kernel failed to boot and simply reset to firmware, rebooting the
> > > > > > system. Nothing was printed out.
> > > > >
> > > > > > The machine is a big server with 6T of memory and many CPUs; its
> > > > > > graphics driver module is mgag200.
> > > > >
> > > > > > When 'earlyprintk=ttyS0' was added to the kernel command line, only
> > > > > > one line was printed to the console during kdump kernel booting:
> > > > > KASLR disabled: 'nokaslr' on cmdline.
> > > > >
> > > > > > Then it reset to firmware and rebooted the system.
> > > > >
> > > > > > Further debugging showed that the failure happens in
> > > > > > arch/x86/boot/compressed/misc.c, during the kernel decompression
> > > > > > stage. It is triggered by the VGA printing. As you can see, in
> > > > > > __putstr() of arch/x86/boot/compressed/misc.c, the code checks
> > > > > > whether earlyprintk= is specified and, if so, prints to that target.
> > > > > > Then, whether or not earlyprintk= is added, it prints to VGA, and
> > > > > > printing to VGA caused the reset to firmware. That's why we see
> > > > > > nothing when earlyprintk= isn't specified, but see only one line of
> > > > > > printing about the 'KASLR disabled'.
> > > >
> > > > Here I mean:
> > > > > That's why we see nothing when earlyprintk= isn't specified, but see
> > > > > only one line, the 'KASLR disabled' message, when earlyprintk=ttyS0
> > > > > is added.
> > >
> > > > Just to clarify, if the original kernel is booted with mgag200 turned
> > > > off, then kexec works, but if the original kernel loads mgag200, the
> > > > kexec kernel resets hard when the VGA is used to write stuff out.
> >
> > Thanks for looking into this, Dave.
> >
> > > Yeah, in fact the issue was found in the kdump kernel. I haven't checked
> > > the kexec jumping. Kexec jumping will call device_shutdown() to attempt
> > > to shut down all devices before jumping to the 2nd kernel. But kdump
> > > jumping won't.
> >
> > >
> > > > This *might* be fixable in the controlled kexec case by having an
> > > > mgag200 shutdown path that tries to put the GPU back into a state
> > > > where VGA doesn't die, but for the uncontrolled kexec it'll still be a
> > > > problem: once the GPU is up and running and VGA is disabled, it
> > > > doesn't expect to see any more VGA transactions.
> >
> > > Yes, I see. In the kexec case it should have been shut down by
> > > device_shutdown(). By the uncontrolled case, I guess you mean the kdump
> > > case. In the kdump case we don't call device_shutdown() before jumping
> > > because the 1st kernel is already in a crashed state; we just want to
> > > switch to the kdump kernel asap. So I'm wondering how other GPU/VGA
> > > devices/drivers behave; so far we haven't got any reports about them.
> > > Probably mgag200 is very new, or we just haven't met them. This issue
> > > was hit on a newly bought server.
>
> I assumed the VGA writing only takes effect when earlyprintk is provided,
> e.g. with earlyprintk=ttyS0 the x86 early decompression code will write to
> both VGA and ttyS0. So if one does not use earlyprintk, he/she still gets
> nothing. But if one provides earlyprintk, then he/she should provide the
> correct param he/she wants, instead of blindly assuming the kernel will
> write to VGA even when ttyS0 is used.

No, the VGA printing always takes effect; otherwise those warn() and
error() calls wouldn't work. It takes effect regardless of whether
CONFIG_EARLY_PRINTK is enabled and whether any earlyprintk= is specified.

That's why I prefer to pursue a fix on the driver side. Having the
error/warn messages printed out without any specific setup makes sense
to me.
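
To make the control flow under discussion concrete, here is a small
userspace model of it. This is only a sketch, not the code from
arch/x86/boot/compressed/misc.c: struct sink, struct model,
serial_enabled and model_putstr() are hypothetical names made up for
illustration, and the real function writes to the serial port and to
video memory rather than to buffers. It only shows the ordering
described above: serial output is conditional on an early serial
console, and the VGA write is attempted afterwards in every case.

```c
#include <stddef.h>

#define SINK_SIZE 256

/* Hypothetical stand-in for an output device; not a kernel structure. */
struct sink {
	char buf[SINK_SIZE];
	size_t len;
};

static void sink_putc(struct sink *s, char c)
{
	/* Keep the buffer NUL-terminated and silently drop overflow. */
	if (s->len + 1 < SINK_SIZE) {
		s->buf[s->len++] = c;
		s->buf[s->len] = '\0';
	}
}

struct model {
	int serial_enabled;	/* models "earlyprintk=ttyS0 was given" */
	struct sink serial;
	struct sink vga;
};

static void model_putstr(struct model *m, const char *s)
{
	const char *p;

	/* Step 1: serial output, only if an early serial console exists. */
	if (m->serial_enabled) {
		for (p = s; *p; p++) {
			if (*p == '\n')
				sink_putc(&m->serial, '\r');
			sink_putc(&m->serial, *p);
		}
	}

	/*
	 * Step 2: the VGA write is attempted whether or not step 1 ran.
	 * In this model it is a harmless buffer write; on the affected
	 * machine this is the step that triggers the reset.
	 */
	for (p = s; *p; p++)
		sink_putc(&m->vga, *p);
}
```

In this model, a message reaches the serial sink before the VGA write
is attempted, which matches seeing exactly one line on ttyS0 before
the reset, and nothing at all when earlyprintk= is absent.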