From: Guillaume Nault <g.nault@alphalink.fr>
To: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: ppp/pppoe, still panic 4.15.3 in ppp_push
Date: Wed, 21 Feb 2018 19:38:06 +0100
Message-ID: <20180221183805.GA1322@alphalink.fr>
In-Reply-To: <24bfdcd534ad2cbb00e1a9509f1c3c32@nuclearcat.com>
On Sun, Feb 18, 2018 at 12:01:02PM +0200, Denys Fedoryshchenko wrote:
> On 2018-02-16 20:48, Guillaume Nault wrote:
> > On Fri, Feb 16, 2018 at 01:13:18PM +0200, Denys Fedoryshchenko wrote:
> > > As far as i can see there is only KASAN triggered again(and server
> > > rebooted
> > > shortly after that), but nothing else:
> > >
> > Ok, so no refcount failure detected. Not what I expected... but that's
> > still an information. It's getting even harder to find a ppp scenario
> > that could lead to such symptoms.
> > If that's acceptable for you, you can try reverting the few commits
> > that entered after 4.14.
> >
> > 02612bb05e51df8489db5e94d0cf8d1c81f87b0c pppoe: take ->needed_headroom
> > of lower device into account on xmit
> > 0171c41835591e9aa2e384b703ef9a6ae367c610 ppp: unlock all_ppp_mutex
> > before registering device
> > e6675000f9a404f7651724c0b2e2e71f7247d3a1 ppp: exit_net cleanup checks
> > added
> > f02b2320b27c16b644691267ee3b5c110846f49e ppp: Destroy the mutex when
> > cleanup
> > 90e229ef61fad240554f5899eb122fbe44990f78 ppp: allow usage in namespaces
> > 709c89b45b874b2f81a074b8802a736009873f48 drivers, net, ppp: convert
> > syncppp.refcnt from atomic_t to refcount_t
> > d780cd44e3cea119a3346e6d7c04d35b9c50d54b drivers, net, ppp: convert
> > ppp_file.refcnt from atomic_t to refcount_t
> > 313a912155c78ed87ad6fca175dc56b75fd00a58 drivers, net, ppp: convert
> > asyncppp.refcnt from atomic_t to refcount_t
> >
> > Sorry, but I have nothing better to propose for now. At least that
> > should help narrowing the problem space.
> > I'm going to stress test ppp_generic and pppoe on my side.
> >
> Quick update.
> Testing 5 first patches didn't change anything.
> But reverting more, with last 4 patches also (I did all together) is changing
> things, probably I need to repeat one night more reverting just all
> refcount_t patches.
>
So you got the following trace with all 8 patches reverted, right?
I prefer to concentrate on the other traces for now. If this one turns
out to be reproducible, you can try enabling lockdep (for lack of a
better suggestion).
> [25222.173840] ------------[ cut here ]------------
> [25222.174259] NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 3 timed out
> [25222.174618] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:323
> dev_watchdog+0x44a/0x555
> [25222.175212] Modules linked in: pppoe pppox ppp_generic slhc netconsole
> configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp
> nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 x
> t_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set
> xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_na
> t nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables
> x_tables 8021q garp mrp stp llc ixgbe dca
> [25222.177133] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G B W
> 4.15.3-build-0134 #6
> [25222.184121] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80
> 04/02/2015
> [25222.184457] RIP: 0010:dev_watchdog+0x44a/0x555
> [25222.184791] RSP: 0018:ffff8803f22c7d98 EFLAGS: 00010292
> [25222.185127] RAX: 0000000000000000 RBX: ffff8803ded00438 RCX:
> 0000000000000000
> [25222.185463] RDX: 0000000000000001 RSI: 0000000000000002 RDI:
> ffffed007e458fa8
> [25222.185797] RBP: ffff8803ded00000 R08: 0000000000000001 R09:
> 0000000000000000
> [25222.186133] R10: ffff8803f22c7e30 R11: 0000000000000001 R12:
> ffff8803ded28450
> [25222.186471] R13: 0000000000000003 R14: dffffc0000000000 R15:
> ffff8803ded283c0
> [25222.186804] FS: 0000000000000000(0000) GS:ffff8803f22c0000(0000)
> knlGS:0000000000000000
> [25222.187401] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [25222.187739] CR2: 0000561f5bffc128 CR3: 0000000445a0d003 CR4:
> 00000000001606e0
> [25222.188077] Call Trace:
> [25222.188410] <IRQ>
> [25222.188740] ? dev_graft_qdisc+0xfa/0xfa
> [25222.189072] call_timer_fn+0x15/0x72
> [25222.189407] ? dev_graft_qdisc+0xfa/0xfa
> [25222.189741] expire_timers+0x1b9/0x1d5
> [25222.190072] run_timer_softirq+0x184/0x361
> [25222.190400] ? expire_timers+0x1d5/0x1d5
> [25222.190723] ? enqueue_hrtimer+0xce/0xd8
> [25222.191048] ? __hrtimer_run_queues+0x1ec/0x24d
> [25222.191373] __do_softirq+0x17f/0x34a
> [25222.191702] irq_exit+0x8f/0xf9
> [25222.192034] smp_apic_timer_interrupt+0xcb/0xd6
> [25222.192365] apic_timer_interrupt+0x92/0xa0
> [25222.192695] </IRQ>
> [25222.193023] RIP: 0010:mwait_idle+0x99/0xac
> [25222.193355] RSP: 0018:ffff8803f030fef8 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff11
> [25222.193956] RAX: 0000000000000000 RBX: ffff8803f02e3500 RCX:
> 0000000000000000
> [25222.194290] RDX: 1ffff1007e05c6a0 RSI: 0000000000000000 RDI:
> 0000000000000000
> [25222.194626] RBP: ffff8803f02e3500 R08: ffffed007ccc8eef R09:
> ffff8803e6647728
> [25222.194958] R10: ffff8803f030fdd0 R11: 0000000000000001 R12:
> 0000000000000000
> [25222.195292] R13: dffffc0000000000 R14: ffffed007e05c6a0 R15:
> ffff8803f02e3500
> [25222.195627] do_idle+0xe6/0x19a
> [25222.195963] cpu_startup_entry+0x18/0x1a
> [25222.196295] secondary_startup_64+0xa5/0xb0
> [25222.196625] Code: 68 87 40 01 00 75 3f 48 89 ef c6 05 5c 87 40 01 01 e8
> 64 93 fa ff 44 89 e9 48 89 c2 48 89 ee 48 c7 c7 80 28 68 83 e8 25 69 6d fe
> <0f> ff eb 17 41 ff c5 49 81 c4 40 0
> 1 00 00 44 3b 6c 24 04 0f 85
> [25222.197511] ---[ end trace 4b04e9c6754a1cd5 ]---
>
> and then
>
> [25222.197853] ixgbe 0000:04:00.1 eth1: initiating reset due to tx timeout
> [25222.198194] ixgbe 0000:04:00.1 eth1: Reset adapter
> [25227.805896] ixgbe 0000:04:00.1 eth1: initiating reset due to tx timeout
> [25232.925944] ixgbe 0000:04:00.1 eth1: initiating reset due to tx timeout
> [25236.084968] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
> [accel-pppd:12627]
> [25236.085562] Modules linked in: pppoe pppox ppp_generic slhc netconsole
> configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp
> nf_conntrack_proto_gre tun xt_TEE nf_dup_ipv4 x
> t_REDIRECT nf_nat_redirect xt_nat xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set
> xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_na
> t nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables
> x_tables 8021q garp mrp stp llc ixgbe dca
> [25236.087496] CPU: 0 PID: 12627 Comm: accel-pppd Tainted: G B W
> 4.15.3-build-0134 #6
> [25236.088095] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80
> 04/02/2015
> [25236.088430] RIP: 0010:queued_spin_lock_slowpath+0xb1/0x418
> [25236.088759] RSP: 0018:ffff8803e6457a98 EFLAGS: 00000213 ORIG_RAX:
> ffffffffffffff11
> [25236.089353] RAX: 00000000000001fb RBX: ffff880345e75fe0 RCX:
> ffffffff811aeca3
> [25236.089685] RDX: 0000000000000000 RSI: 0000000000000004 RDI:
> ffff880345e75fe0
> [25236.090026] RBP: ffffed0068bcebfc R08: 06030a0001012180 R09:
> ffffed006cc9beb2
> [25236.090369] R10: ffffed006cc9beb3 R11: 0000000000000001 R12:
> 0000000000000003
> [25236.090705] R13: 0000000000008021 R14: 0000000000008021 R15:
> 00000000034e4b06
> [25236.091043] FS: 00007f94bd26c700(0000) GS:ffff8803f2200000(0000)
> knlGS:0000000000000000
> [25236.091636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [25236.091966] CR2: 00007ffc0935eff8 CR3: 00000003d709b003 CR4:
> 00000000001606f0
> [25236.092304] Call Trace:
> [25236.092638] ppp_push+0x112/0xdda [ppp_generic]
> [25236.092975] ? enqueue_hrtimer+0xce/0xd8
> [25236.093304] ? hrtimer_start_range_ns+0x827/0x854
> [25236.093635] __ppp_xmit_process+0xc6a/0xdd5 [ppp_generic]
> [25236.093969] ? __kmalloc_reserve.isra.5+0x29/0x96
> [25236.094302] ? memset+0x1f/0x31
> [25236.094631] ? ppp_receive_nonmp_frame+0x138c/0x138c [ppp_generic]
> [25236.094962] ? __alloc_skb+0x2ec/0x431
> [25236.095292] ? __kmalloc_reserve.isra.5+0x96/0x96
> [25236.095620] ? timerfd_release+0x1d3/0x1d3
> [25236.095950] ppp_xmit_process+0xc3/0x194 [ppp_generic]
> [25236.096284] ppp_write+0x1b7/0x1c3 [ppp_generic]
> [25236.096617] __vfs_write+0xd9/0x4ad
> [25236.096953] ? kernel_read+0xed/0xed
> [25236.097283] ? vfs_copy_file_range+0x6a8/0x6a8
> [25236.097614] ? bit_waitqueue+0x2a/0x2a
> [25236.097946] ? __fsnotify_inode_delete+0xc/0xc
> [25236.098276] ? __fsnotify_inode_delete+0xc/0xc
> [25236.098610] ? SyS_sendmmsg+0x13/0x13
> [25236.098936] vfs_write+0x18c/0x378
> [25236.099258] SyS_write+0xc4/0x13b
> [25236.099579] ? SyS_read+0x13b/0x13b
> [25236.099902] ? exit_to_usermode_loop+0x7c/0xaf
> [25236.100225] ? SyS_read+0x13b/0x13b
> [25236.100550] do_syscall_64+0x1b1/0x31f
> [25236.100879] entry_SYSCALL_64_after_hwframe+0x21/0x86
> [25236.101210] RIP: 0033:0x7f94bca53b2d
> [25236.101536] RSP: 002b:00007f94bd26bb80 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000001
> [25236.102127] RAX: ffffffffffffffda RBX: 00007f94bb59f1e3 RCX:
> 00007f94bca53b2d
> [25236.102461] RDX: 000000000000000c RSI: 00007f94b78895d0 RDI:
> 0000000000002f92
> [25236.102793] RBP: 00007f94bd26bbb0 R08: 0000000000000030 R09:
> 0000000000000027
> [25236.103127] R10: 0000000000000000 R11: 0000000000000293 R12:
> 00007f94b6450eb8
> [25236.103460] R13: 00007ffc8c047a6f R14: 0000000000000000 R15:
> 00007f94bd26c700
> [25236.103790] Code: 83 03 00 00 48 89 dd 49 89 dc 48 b8 00 00 00 00 00 fc
> ff df 48 c1 ed 03 41 83 e4 07 48 01 c5 41 83 c4 03 8a 45 00 41 38 c4 7c 0c
> <84> c0 74 08 48 89 df e8 31 54 17 0
> 0 8b 03 84 c0 74 04 f3 90 eb
>
> Then system autorebooted.
> Maybe i am hitting some qdisc bug now...
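Regarding the lockdep suggestion above: here is a minimal sketch of how one might check that a kernel config actually has lockdep enabled before booting a test build. The function name check_lockdep and the list of options checked are my own choices for illustration, not something taken from this thread. CONFIG_PROVE_LOCKING is the option that matters most, since it normally selects CONFIG_LOCKDEP and CONFIG_DEBUG_SPINLOCK.

```shell
#!/bin/sh
# Hypothetical helper: report whether lockdep-related options are set to "y"
# in a kernel config file. Returns 0 if all are enabled, 1 otherwise.
# The option list is an assumption for illustration; adjust as needed.
check_lockdep() {
    config_file="$1"
    missing=0
    for opt in CONFIG_PROVE_LOCKING CONFIG_LOCKDEP CONFIG_DEBUG_SPINLOCK; do
        # Match only an exact "OPTION=y" line, not "# OPTION is not set".
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be run as `check_lockdep /path/to/build/.config` (the path is a placeholder). If an option is reported missing, enabling CONFIG_PROVE_LOCKING and rebuilding should be enough in most cases.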