From mboxrd@z Thu Jan 1 00:00:00 1970
From: Guillaume Nault
Subject: Re: [PATCH net] ppp: don't override sk->sk_state in pppoe_flush_dev()
Date: Tue, 6 Oct 2015 10:50:36 +0200
Message-ID: <20151006085036.GC2882@alphalink.fr>
References: <7045c1dad4647944f61c958511d45fcd@visp.net.lb>
 <20151002175426.GE2911@alphalink.fr>
 <356ca8b8094bb2460c0182c00e120378@visp.net.lb>
 <1444018131.14634.6.camel@mattb-dl>
 <20151005122459.GG2911@alphalink.fr>
 <1444091180.1468.17.camel@mattb-dl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "core@irc.lg.ua", "netdev@vger.kernel.org", "davem@davemloft.net",
 "paulus@samba.org", "nuclearcat@nuclearcat.com"
To: Matt Bennett
Return-path:
Received: from zimbra.alphalink.fr ([217.15.80.77]:39037 "EHLO
 mail-2-cbv2.admin.alphalink.fr" rhost-flags-OK-OK-OK-FAIL)
 by vger.kernel.org with ESMTP id S1751340AbbJFIuj (ORCPT );
 Tue, 6 Oct 2015 04:50:39 -0400
Content-Disposition: inline
In-Reply-To: <1444091180.1468.17.camel@mattb-dl>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, Oct 06, 2015 at 12:26:20AM +0000, Matt Bennett wrote:
> On Mon, 2015-10-05 at 14:24 +0200, Guillaume Nault wrote:
> > On Mon, Oct 05, 2015 at 04:08:51AM +0000, Matt Bennett wrote:
> > > Hi, I am seeing this panic occur occasionally however I am unsure how to
> > > go about reproducing it. Is it enough to simply keep creating and
> > > tearing down the PPP interface? I can also test and/or investigate this
> > > issue if a suitable reproduction method is available.
> > >
> > There are at least two issues resulting in similar Oops.
> >
> > The first one goes with MTU/address/link state updates on the
> > underlying interface: any such update on an interface used by a
> > PPPoE connection will generally result in an Oops when releasing the
> > PPPoE connection. This is fixed by e6740165b8f7 ("ppp: don't override
> > sk->sk_state in pppoe_flush_dev()").
> >
> Without your patch ("ppp: don't override sk->sk_state in
> pppoe_flush_dev()") I can see the following function calls being made
> when changing the mtu on the underlying ethernet interface for the PPPoE
> connection:
>
> 1. pppoe_flush_dev() - setting PPPOX_ZOMBIE
>
> 2. pppoe_connect - setting PPPOX_NONE (shown below)
>
> 	/* Delete the old binding */
> 	if (stage_session(po->pppoe_pa.sid)) {
> 		pppox_unbind_sock(sk);
> 		pn = pppoe_pernet(sock_net(sk));
> 		delete_item(pn, po->pppoe_pa.sid,
> 			    po->pppoe_pa.remote, po->pppoe_ifindex);
> 		if (po->pppoe_dev) {
> 			dev_put(po->pppoe_dev);
> 			po->pppoe_dev = NULL;
> 		}
>
> 		memset(sk_pppox(po) + 1, 0,
> 		       sizeof(struct pppox_sock) - sizeof(struct sock));
> 		sk->sk_state = PPPOX_NONE;
> 	}
>
> 3. pppoe_release - No oops (since sk->sk_state is no longer in
> {PPPOX_CONNECTED,PPPOX_BOUND,PPPOX_ZOMBIE})
>
> It doesn't look to me like the above functions can execute
> asynchronously but I'd have to look harder. I am using 3.16 by the way.
>
Just drop the pppoe_connect() call. Right after the pppoe_flush_dev()
call, sk_state is PPPOX_ZOMBIE and pppoe_dev is NULL. This is enough
to make pppoe_release() crash.

The typical scenario e6740165b8f7 ("ppp: don't override sk->sk_state in
pppoe_flush_dev()") fixes is:

Userspace process #1:                   Userspace process #2:
---------------------                   ---------------------

fd = socket(AF_PPPOX, PX_PROTO_OE, 0);
connect(fd, {AF_PPPOX, PX_PROTO_OE,
             $sid, $mac_addr, $ifname},
        sizeof(struct sockaddr_pppox));
...
process_packets()
...
                                        # ip link set $ifname mtu $mtu
close(fd);
--> Kernel Oops
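
To make the state sequence above easier to follow, here is a toy userspace
model of it. It is only a sketch and not kernel code: every toy_* name is
made up for illustration, the state values are placeholders, and the only
behaviour it assumes is what this thread describes (the pre-fix
pppoe_flush_dev() leaves the socket in the ZOMBIE state with a NULL device,
the rebinding path in pppoe_connect() resets the state to NONE, and
pppoe_release() expects a device for CONNECTED, BOUND and ZOMBIE sockets).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PPPOX_* socket states (illustrative values). */
enum toy_state {
	TOY_NONE      = 0,
	TOY_CONNECTED = 1,
	TOY_BOUND     = 2,
	TOY_ZOMBIE    = 8
};

/* Hypothetical stand-in for the PPPoE socket: a state plus a device pointer. */
struct toy_sock {
	int state;
	const char *dev;	/* models po->pppoe_dev; NULL once dropped */
};

/* Pre-fix flush behaviour as described in the thread: ZOMBIE, device gone. */
void toy_flush_dev(struct toy_sock *s)
{
	s->state = TOY_ZOMBIE;
	s->dev = NULL;
}

/* Rebinding path of connect: old binding deleted, state reset to NONE. */
void toy_reconnect(struct toy_sock *s)
{
	s->dev = NULL;
	s->state = TOY_NONE;
}

/* True when release would use the device although it is NULL (the Oops case). */
bool toy_release_would_oops(const struct toy_sock *s)
{
	bool expects_dev =
		(s->state & (TOY_CONNECTED | TOY_BOUND | TOY_ZOMBIE)) != 0;

	return expects_dev && s->dev == NULL;
}

/* Walks through both traces discussed in the thread; returns 0 on success. */
int toy_demo(void)
{
	struct toy_sock s = { TOY_CONNECTED, "eth0" };

	assert(!toy_release_would_oops(&s));	/* plain close(): fine */

	toy_flush_dev(&s);			/* MTU change on the device */
	assert(toy_release_would_oops(&s));	/* close() now hits NULL */

	toy_reconnect(&s);			/* extra connect() in between */
	assert(!toy_release_would_oops(&s));	/* state is NONE: no Oops */

	return 0;
}
```

In this model the middle step corresponds to the scenario above (MTU change
followed directly by close()), and the last step corresponds to the trace
with the extra pppoe_connect() call, which hides the problem.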