From mboxrd@z Thu Jan 1 00:00:00 1970
From: Minjun Hong
Subject: Re: kernel panic with no call trace
Date: Fri, 8 Sep 2017 01:29:12 +0900
Message-ID:
References: <7a1c005f-a470-3a86-6b22-db128a2b7876@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============5282422412209875479=="
Return-path:
In-Reply-To: <7a1c005f-a470-3a86-6b22-db128a2b7876@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel"
To: Andrew Cooper
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

--===============5282422412209875479==
Content-Type: multipart/alternative; boundary="001a1144018aee750e05589bf5f7"

--001a1144018aee750e05589bf5f7
Content-Type: text/plain; charset="UTF-8"

Thank you for your reply, even though this response is quite late.
As you mentioned, I know there is an error (or several errors), but I cannot guess where it is, so I would like to know where I should start debugging.
However, even though I am using a serial console, I cannot get enough clues from the kernel log alone. All I can get from it is:
1) the line and file that caused the panic, by using the call trace
2) the linear address that brought about this situation: the 'Faulting linear address'

I think the 'Faulting linear address' is the key point, because I have heard that it represents a bad pointer.
Given that pointer (it is just an address, and I cannot infer what it means), is there any way to figure out the actual data it refers to, or the corresponding line in the C source code?
If you have any other approach that can be used in cases like this, could you please give me some guidance?

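One way to sanity-check the faulting address against the "Pagetable walk" in the log below is to split it into its per-level table indices. The following is only a minimal sketch in Python, assuming standard x86-64 4-level paging with 4 KiB pages (9 index bits per level, 12-bit page offset). The function names are illustrative and are not part of Xen.

```python
# Assumption: x86-64 4-level paging, 9 index bits per level, 12-bit page offset.
# The helper names below are hypothetical; they are not Xen functions.

def pagetable_indices(linear_addr: int) -> dict:
    """Split a 64-bit linear address into its L4..L1 table indices and page offset."""
    return {
        "L4": (linear_addr >> 39) & 0x1FF,
        "L3": (linear_addr >> 30) & 0x1FF,
        "L2": (linear_addr >> 21) & 0x1FF,
        "L1": (linear_addr >> 12) & 0x1FF,
        "offset": linear_addr & 0xFFF,
    }


def decode_pf_error_code(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # False: the page was not present
        "write": bool(code & 0x2),    # False: the access was a read
        "user": bool(code & 0x4),     # False: the access came from supervisor mode
    }


# "Faulting linear address" from the log (the same value appears in cr2 and rdx).
FAULT_ADDR = 0xFFFF830088002C98

idx = pagetable_indices(FAULT_ADDR)
err = decode_pf_error_code(0x0000)

print({name: hex(value) for name, value in idx.items()})
print(err)
```

The L4, L3 and L2 indices computed this way can be compared with the L4[...], L3[...] and L2[...] entries in the walk. An all-zero entry at some level means nothing is mapped there, and an error code of 0000 corresponds to a supervisor-mode read of a page that is not present. Since rdx holds the same value as cr2, the faulting instruction was probably dereferencing rdx, although only a disassembly of csched_schedule at the reported offset would confirm that.
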
Below is the kernel log from the serial console:

(XEN) ----[ Xen-4.5.0  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82d080120973>] csched_schedule+0x373/0x1180
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) rax: 00000000ffffffff   rbx: ffff830087ffa000   rcx: ffff830461d20000
(XEN) rdx: ffff830088002c98   rsi: ffff830461d20000   rdi: 0000000000000000
(XEN) rbp: ffff830461ce2ae0   rsp: ffff830461d27d10   r8:  0000001e582339ec
(XEN) r9:  0000000000000004   r10: 000000000000003c   r11: 0000000000000004
(XEN) r12: 0000000000000001   r13: ffff82d0803f26a0   r14: ffff830461c53000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000003526f0
(XEN) cr3: 0000000086077000   cr2: ffff830088002c98
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff830461d27d10:
(XEN)    ffff830461d03950 ffff82d0804081e0 ffff830461c74068 ffff830461d27de0
(XEN)    ffff830461c24c30 ffff830461cec800 ffff82d0804081e0 0000000600000002
(XEN)    ffff830461ce29d0 ffff830461d20000 ffff82d0804081e0 ffff830461d3a720
(XEN)    0000000000000002 ffff830461d3a700 00ffffc000000000 ffff830461d27dd0
(XEN)    ffff830461d27e68 ffff82d080408180 0000001e5c106499 0000000001c9c380
(XEN)    0000000000000000 0000000000000000 ffff8300864e3000 ffff8302e1596fb0
(XEN)    ffff830461d27dd0 ffff830461d27dd0 000000000000004b 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff830461d3a738 ffff8300864e3000
(XEN)    ffff82d0804081e0 ffff830461d2e068 0000001e5c106499 ffff830461d2e060
(XEN)    ffff82d0803f26a0 ffff82d080128cb3 0000001e00000000 ffff830461d2e080
(XEN)    ffff82d080279944 ffff82d08015f295 0000001e5c0504ce ffff830461d3ad80
(XEN)    0000001e5c1054ba ffff82d08012f64e ffff82d0803f26a0 00000000ffffffff
(XEN)    ffff82d0803df880 0000000000000001 ffff82d0803df780 ffffffffffffffff
(XEN)    ffff830461d20000 ffff82d08012c03c ffffffffffffffff 00000000ffffffff
(XEN)    ffff830461d20000 ffff830461d2e068 0000001e5b762541 ffff830461d2e060
(XEN)    ffff82d0803f26a0 ffff82d080162e3a 0000000000000000 ffff8300864e3000
(XEN)    ffff8300864e3000 ffff8800f8bbbfd8 0000000000000000 ffff8800f8bbbfd8
(XEN)    0000000000000003 ffff8800f8bbbec0 0000000000000000 0000000000000246
(XEN)    0000000000007ff0 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffffffff810013aa 0000000000000001 0000000000000000 0000000000000001
(XEN) Xen call trace:
(XEN)    [<ffff82d080120973>] csched_schedule+0x373/0x1180
(XEN)    [<ffff82d080128cb3>] schedule+0xf3/0x590
(XEN)    [<ffff82d08015f295>] reprogram_timer+0x75/0xe0
(XEN)    [<ffff82d08012f64e>] timer_softirq_action+0x13e/0x210
(XEN)    [<ffff82d08012c03c>] __do_softirq+0x7c/0xd0
(XEN)    [<ffff82d080162e3a>] idle_loop+0x3a/0x70
(XEN)
(XEN) Pagetable walk from ffff830088002c98:
(XEN)  L4[0x106] = 0000000086075063 ffffffffffffffff
(XEN)  L3[0x002] = 0000000086071063 ffffffffffffffff
(XEN)  L2[0x040] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff830088002c98
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

I hope you can help.

Sincerely,

On Wed, Sep 6, 2017 at 4:45 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> On 06/09/2017 03:39, Minjun Hong wrote:
> > Hello~~
> > I'm struggling to resolve a kernel panic problem during developing
> > scheduler code.
> > But I have not made any progress since I can not get any meaningful
> > information from the serial log.
> > When the panic occurred, always there is no call trace and only panic
> > notification like following:
> >
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) cpu:20, vcpu:20 in csched_schedule(1891)
> > (XEN) cpu:21, vcpu:21 in csched_schedule(1891)
> > (XEN) cpumask_test_cpu(cpu, prv->in_cosched) in csched_schedule(1907)
> > (XEN) cpumask_test_cpu(cpu, prv->in_cosched) in csched_schedule(1907)
> > (XEN) cpumask_test_cpu(cpu, prv->in_cosched) in csched_schedule(1907)
> > (XEN) cpumask_test_cpu(cpu, prv->in_cosched) in csched_schedule(1907)
> > (XEN) FATAL PAGE FAULT
> > (XEN) [error_code=0000]
> > (XEN) Faulting linear address: ffff830078efcc98
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
> >
> > I'm using Xen-4.5.0 on my server having 2 Intel Xeon E5-2620 v4 cpus,
> > 128 GB RAM(16 GB DDR4 * 4) and 1 TB HDD and, using Ubuntu 14.04 LTS.
> >
> > Is there any method to make the call trace show up or
> > when there is no call trace, please tell me from where I should start
> > to debug.
> >
> > Thanks in advance and I wait for your comments.
>
> There is a call trace, but as you've clearly added printk()'s to the
> scheduler, the calltrace will be getting lost in the spew of logging
> beforehand.
>
> From what you've printed, you've fallen over a bad pointer which isn't
> present, although the offset into the directmap does look semi
> plausible.  Either way, you've got memory corruption of some kind.
>
> ~Andrew
>

--001a1144018aee750e05589bf5f7
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--001a1144018aee750e05589bf5f7--

--===============5282422412209875479==--