From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nuclearcat.com ([144.76.183.226]:36258 "EHLO nuclearcat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750764AbeBVSvX (ORCPT ); Thu, 22 Feb 2018 13:51:23 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Thu, 22 Feb 2018 20:51:19 +0200 From: Denys Fedoryshchenko To: Guillaume Nault Cc: Cong Wang , Linux Kernel Network Developers , netdev-owner@vger.kernel.org Subject: Re: ppp/pppoe, still panic 4.15.3 in ppp_push In-Reply-To: <20180222183038.GD1322@alphalink.fr> References: <20180214160739.GF1422@alphalink.fr> <3ffb97617fd5dcb436848b5522850b7f@nuclearcat.com> <20180214164728.GG1422@alphalink.fr> <802ba894fee75e34576d39aed5022dc8@nuclearcat.com> <20180214172549.GH1422@alphalink.fr> <20180215155517.GI1422@alphalink.fr> <79ba31ea47137ca539beea112bec8e57@nuclearcat.com> <20180215193114.GK1422@alphalink.fr> <20180222183038.GD1322@alphalink.fr> Message-ID: <336f2012881f89f2b86b29a853c5a672@nuclearcat.com> Sender: netdev-owner@vger.kernel.org List-ID: On 2018-02-22 20:30, Guillaume Nault wrote: > On Wed, Feb 21, 2018 at 12:04:30PM -0800, Cong Wang wrote: >> On Thu, Feb 15, 2018 at 11:31 AM, Guillaume Nault >> wrote: >> > On Thu, Feb 15, 2018 at 06:01:16PM +0200, Denys Fedoryshchenko wrote: >> >> On 2018-02-15 17:55, Guillaume Nault wrote: >> >> > On Thu, Feb 15, 2018 at 12:19:52PM +0200, Denys Fedoryshchenko wrote: >> >> > > Here we go: >> >> > > >> >> > > [24558.921549] >> >> > > ================================================================== >> >> > > [24558.922167] BUG: KASAN: use-after-free in >> >> > > ppp_ioctl+0xa6a/0x1522 >> >> > > [ppp_generic] >> >> > > [24558.922776] Write of size 8 at addr ffff8803d35bf3f8 by task >> >> > > accel-pppd/12622 >> >> > > [24558.923113] >> >> > > [24558.923451] CPU: 0 PID: 12622 Comm: accel-pppd Tainted: G >> >> > > W >> >> > > 4.15.3-build-0134 #1 >> >> > > [24558.924058] Hardware name: HP ProLiant DL320e Gen8 v2, >> >> > > BIOS P80 >> >> > > 04/02/2015 >> >> > > [24558.924406] Call Trace: >> >> > > [24558.924753] dump_stack+0x46/0x59 >> >> > > [24558.925103] print_address_description+0x6b/0x23b >> >> > > [24558.925451] ? ppp_ioctl+0xa6a/0x1522 [ppp_generic] >> >> > > [24558.925797] kasan_report+0x21b/0x241 >> >> > > [24558.926136] ppp_ioctl+0xa6a/0x1522 [ppp_generic] >> >> > > [24558.926479] ? ppp_nl_newlink+0x1da/0x1da [ppp_generic] >> >> > > [24558.926829] ? sock_sendmsg+0x89/0x99 >> >> > > [24558.927176] ? __vfs_write+0xd9/0x4ad >> >> > > [24558.927523] ? kernel_read+0xed/0xed >> >> > > [24558.927872] ? SyS_getpeername+0x18c/0x18c >> >> > > [24558.928213] ? bit_waitqueue+0x2a/0x2a >> >> > > [24558.928561] ? wake_atomic_t_function+0x115/0x115 >> >> > > [24558.928898] vfs_ioctl+0x6e/0x81 >> >> > > [24558.929228] do_vfs_ioctl+0xa00/0xb10 >> >> > > [24558.929571] ? sigprocmask+0x1a6/0x1d0 >> >> > > [24558.929907] ? sigsuspend+0x13e/0x13e >> >> > > [24558.930239] ? ioctl_preallocate+0x14e/0x14e >> >> > > [24558.930568] ? SyS_rt_sigprocmask+0xf1/0x142 >> >> > > [24558.930904] ? sigprocmask+0x1d0/0x1d0 >> >> > > [24558.931252] SyS_ioctl+0x39/0x55 >> >> > > [24558.931595] ? do_vfs_ioctl+0xb10/0xb10 >> >> > > [24558.931942] do_syscall_64+0x1b1/0x31f >> >> > > [24558.932288] entry_SYSCALL_64_after_hwframe+0x21/0x86 >> >> > > [24558.932627] RIP: 0033:0x7f302849d8a7 >> >> > > [24558.932965] RSP: 002b:00007f3029a52af8 EFLAGS: 00000206 >> >> > > ORIG_RAX: >> >> > > 0000000000000010 >> >> > > [24558.933578] RAX: ffffffffffffffda RBX: 00007f3027d861e3 RCX: >> >> > > 00007f302849d8a7 >> >> > > [24558.933927] RDX: 00007f3023f49468 RSI: 000000004004743a RDI: >> >> > > 0000000000003a67 >> >> > > [24558.934266] RBP: 00007f3029a52b20 R08: 0000000000000000 R09: >> >> > > 000055c8308d8e40 >> >> > > [24558.934607] R10: 0000000000000008 R11: 0000000000000206 R12: >> >> > > 00007f3023f49358 >> >> > > [24558.934947] R13: 00007ffe86e5723f R14: 0000000000000000 R15: >> >> > > 00007f3029a53700 >> >> > > [24558.935288] >> >> > > [24558.935626] Allocated by task 12622: >> >> > > [24558.935972] ppp_register_net_channel+0x5f/0x5c6 >> >> > > [ppp_generic] >> >> > > [24558.936306] pppoe_connect+0xab7/0xc71 [pppoe] >> >> > > [24558.936640] SyS_connect+0x14b/0x1b7 >> >> > > [24558.936975] do_syscall_64+0x1b1/0x31f >> >> > > [24558.937319] entry_SYSCALL_64_after_hwframe+0x21/0x86 >> >> > > [24558.937655] >> >> > > [24558.937993] Freed by task 12622: >> >> > > [24558.938321] kfree+0xb0/0x11d >> >> > > [24558.938658] ppp_release+0x111/0x120 [ppp_generic] >> >> > > [24558.938994] __fput+0x2ba/0x51a >> >> > > [24558.939332] task_work_run+0x11c/0x13d >> >> > > [24558.939676] exit_to_usermode_loop+0x7c/0xaf >> >> > > [24558.940022] do_syscall_64+0x2ea/0x31f >> >> > > [24558.940368] entry_SYSCALL_64_after_hwframe+0x21/0x86 >> >> > > [24558.947099] >> >> > >> >> > Your first guess was right. It looks like we have an issue with >> >> > reference counting on the channels. Can you send me your ppp_generic.o? >> >> http://nuclearcat.com/ppp_generic.o >> >> Compiled with gcc version 6.4.0 (Gentoo 6.4.0-r1 p1.3) >> >> >> > From what I can see, ppp_release() and ioctl(PPPIOCCONNECT) are called >> > concurrently on the same ppp_file. Even if this ppp_file was pointed at >> > by two different file descriptors, I can't see how this could defeat >> > the reference counting mechanism. I'm going to think more about it. >> >> For me it looks like pch->clist is not removed from the list >> ppp->channels >> when destroyed via ppp_release(). But I don't want to pretend I >> understand >> ppp logic. >> > I've thought about that too, but couldn't find a scenario that could > trigger the bug. > > To get ->private_data pointing to a struct channel pointer, a file > needs > to ioctl(PPPIOCATTCHAN) first. For this call to succeed, the channel > must have been registered with ppp_register_net_channel(). Both > operations take a reference on the channel, which means that, before > adding pch->clist to a ppp->channels list (with ppp_connect_channel()), > the channel is already held by a /dev/ppp file and by the code that > registered the channel in the first place. > > Therefore, closing the /dev/ppp file can't be enough to make > ppp_release() free the channel. We need to unregister the channel for > the refcount to drop to 0. And ppp_unregister_channel(), removes > pch->clist from ppp->channels before decrementing the reference > counter. > > I'm simplifying a bit, and there could always be a race somewhere. But > I couldn't find anything so far. I'm using accel-ppp that has unit-cache option, i guess for "reusing" ppp interfaces (because creating a lot of interfaces on BRAS with 8k users quite expensive). Maybe it is somehow related and can be that scenario causing this bug?