public inbox for linux-kernel@vger.kernel.org
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Andres Freund <andres@anarazel.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Guenter Roeck <linux@roeck-us.net>,
	linux-kernel@vger.kernel.org,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: upstream kernel crashes
Date: Mon, 15 Aug 2022 03:29:43 -0400
Message-ID: <20220815031549-mutt-send-email-mst@kernel.org>
In-Reply-To: <20220815071143.n2t5xsmifnigttq2@awork3.anarazel.de>

On Mon, Aug 15, 2022 at 12:11:43AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
> > On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Some of the symptoms could be related to the issue in this thread, hence
> > > listing them here
> > 
> > Smells like slab corruption to me, and the problems may end up being
> > then largely random just depending on who ends up using the allocation
> > that gets trampled on.
> > 
> > I wouldn't be surprised if it's all the same thing - including your
> > network issue.
> 
> Yea. As I just wrote in
> https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
> bisected it down to one commit (762faee5a267). With that commit I only see the
> networking issue across a few reboots, but with ebcce4926365 some boots oops
> badly and other times it's "just" the network not working.
> 
> 
> [    2.447668] general protection fault, probably for non-canonical address 0xffff000000000800: 0000 [#1] PREEMPT SMP PTI
> [    2.449168] CPU: 1 PID: 109 Comm: systemd-udevd Not tainted 5.19.0-bisect8-00051-gebcce4926365 #8
> [    2.450397] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
> [    2.451670] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [    2.452399] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [    2.455454] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [    2.456181] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [    2.457195] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [    2.458211] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> [    2.459183] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [    2.460268] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> [    2.461368] FS:  00007fd02b8b88c0(0000) GS:ffff8da8b7d00000(0000) knlGS:0000000000000000
> [    2.462605] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.463436] CR2: 000055a42d2ee250 CR3: 0000000100328001 CR4: 00000000003706e0
> [    2.464527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.465520] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.466509] Call Trace:
> [    2.466882]  <TASK>
> [    2.467218]  copy_process+0x1eb/0x1a00
> [    2.467827]  ? _raw_spin_unlock_irqrestore+0x16/0x30
> [    2.468578]  kernel_clone+0xba/0x400
> [    2.470455]  __do_sys_clone+0x78/0xa0
> [    2.471006]  do_syscall_64+0x37/0x90
> [    2.471526]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [    2.472267] RIP: 0033:0x7fd02bf98cb3
> [    2.472889] Code: 1f 84 00 00 00 00 00 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 41 89 c0 85 c0 75 2a 64 48 8b 04 25 10 00
> [    2.475504] RSP: 002b:00007ffc6a3abf08 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
> [    2.476565] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd02bf98cb3
> [    2.477554] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
> [    2.478574] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [    2.479608] R10: 00007fd02b8b8b90 R11: 0000000000000246 R12: 0000000000000001
> [    2.480675] R13: 00007ffc6a3ac0c0 R14: 0000000000000000 R15: 0000000000000001
> [    2.481686]  </TASK>
> [    2.482119] Modules linked in:
> [    2.482704] ---[ end trace 0000000000000000 ]---
> [    2.483456] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [    2.484282] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [    2.487024] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [    2.487817] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [    2.488805] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [    2.489869] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> [    2.490842] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [    2.491905] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> [    2.492975] FS:  00007fd02b8b88c0(0000) GS:ffff8da8b7d00000(0000) knlGS:0000000000000000
> [    2.494140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.495082] CR2: 000055a42d2ee250 CR3: 0000000100328001 CR4: 00000000003706e0
> [    2.496080] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.497084] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.498524] systemd-udevd (109) used greatest stack depth: 13688 bytes left
> [    2.503905] general protection fault, probably for non-canonical address 0xffff000000000000: 0000 [#2] PREEMPT SMP PTI
> [    2.505504] CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G      D           5.19.0-bisect8-00051-gebcce4926365 #8
> [    2.507037] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
> [    2.508313] RIP: 0010:rcu_core+0x280/0x920
> [    2.508968] Code: 3f 00 00 48 89 c2 48 85 c0 0f 84 2b 03 00 00 49 89 dd 48 83 c3 01 0f 1f 44 00 00 48 8b 42 08 48 89 d7 48 c7 42 08 00 00 00 00 <ff> d0 0f 1f 00 65 8b 05 64 f5 ad 6c f6 c4 01 75 97 be 00 02 00 00
> [    2.511684] RSP: 0000:ffffa2b40007fe20 EFLAGS: 00010202
> [    2.512410] RAX: ffff000000000000 RBX: 0000000000000002 RCX: 0000000080170011
> [    2.513497] RDX: ffff8da783372a20 RSI: 0000000080170011 RDI: ffff8da783372a20
> [    2.514604] RBP: ffff8da8b7c2b940 R08: 0000000000000001 R09: ffffffff9353b752
> [    2.515667] R10: ffffffff94a060c0 R11: 000000000009b776 R12: ffff8da78020c000
> [    2.516650] R13: 0000000000000001 R14: ffff8da8b7c2b9b8 R15: 0000000000000000
> [    2.517628] FS:  0000000000000000(0000) GS:ffff8da8b7c00000(0000) knlGS:0000000000000000
> [    2.518840] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.519645] CR2: 0000557194db70f8 CR3: 0000000100364006 CR4: 00000000003706f0
> [    2.520641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.521629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.522592] Call Trace:
> [    2.522963]  <TASK>
> [    2.523299]  __do_softirq+0xe1/0x2ec
> [    2.523883]  ? sort_range+0x20/0x20
> [    2.524404]  run_ksoftirqd+0x25/0x30
> [    2.524944]  smpboot_thread_fn+0x180/0x220
> [    2.525519]  kthread+0xe1/0x110
> [    2.526001]  ? kthread_complete_and_exit+0x20/0x20
> [    2.526673]  ret_from_fork+0x1f/0x30
> [    2.527182]  </TASK>
> [    2.527518] Modules linked in:
> [    2.528005] ---[ end trace 0000000000000000 ]---
> [    2.528662] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [    2.529524] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [    2.532396] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [    2.533201] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [    2.534376] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [    2.535398] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> Begin: Loading essential drivers ... done.
> [    2.536401] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [    2.537641] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> [    2.538737] FS:  0000000000000000(0000) GS:ffff8da8b7c00000(0000) knlGS:0000000000000000
> [    2.540028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.540843] CR2: 0000557194db70f8 CR3: 000000015080c002 CR4: 00000000003706f0
> [    2.541953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    2.542924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    2.543902] Kernel panic - not syncing: Fatal exception in interrupt
> [    2.544967] Kernel Offset: 0x12400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [    2.546637] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> 
> 
> If somebody knowledgeable staring at 762faee5a267 doesn't surface something, I
> can build a kernel with more debugging enabled, if somebody tells me what
> would work best here.
> 
> 
> Greetings,
> 
> Andres Freund

Thanks a lot for the work!
Just a small clarification:

So IIUC you see several issues, right?

With 762faee5a2678559d3dc09d95f8f2c54cd0466a7 you see networking issues.

With ebcce492636506443e4361db6587e6acd1a624f9 you see crashes.

-- 
MST

