From: Guillaume Nault
Subject: Re: Kernel 4.1.12 crash
Date: Mon, 30 Nov 2015 21:42:08 +0100
Message-ID: <20151130204208.GA6046@alphalink.fr>
References: <564F26FF.3040605@seti.kr.ua> <564FA904.7020603@gmail.com> <5650287B.9070901@seti.kr.ua> <56514FF5.7060906@gmail.com> <5654EBE8.9030705@seti.kr.ua> <20151125141048.GA3868@alphalink.fr> <5655CCAE.6000300@seti.kr.ua> <20151126164452.GA2988@alphalink.fr> <565B7699.8030105@seti.kr.ua> <20151130150337.GC3059@alphalink.fr>
In-Reply-To: <20151130150337.GC3059@alphalink.fr>
To: Andrew
Cc: Alexander Duyck, netdev@vger.kernel.org, Simon Farnsworth

[Adding Simon to the discussion]

On Mon, Nov 30, 2015 at 04:03:37PM +0100, Guillaume Nault wrote:
> On Mon, Nov 30, 2015 at 12:05:13AM +0200, Andrew wrote:
> > 26.11.2015 18:44, Guillaume Nault wrote:
> > >On Wed, Nov 25, 2015 at 04:58:54PM +0200, Andrew wrote:
> > >>25.11.2015 16:10, Guillaume Nault wrote:
> > >>>On Wed, Nov 25, 2015 at 12:59:52AM +0200, Andrew wrote:
> > >>>>Hi.
> > >>>>
> > >>>>I tried to reproduce errors in virtual environment (some VMs on my
> > >>>>notebook).
> > >>>>
> > >>>>I've tried to create 1000 client PPPoE sessions from this box via script:
> > >>>>for i in `seq 1 1000`; do pppd plugin rp-pppoe.so user test password test
> > >>>>nodefaultroute maxfail 0 persist nodefaultroute holdoff 1 noauth eth0; done
> > >>>>
> > >>>I've tried to reproduce the bug with your script, but couldn't get
> > >>>anything to crash (VM is Debian Jessie i386 running on KVM with upstream
> > >>>kernel 4.1.12). Does the crash happen before all sessions get
> > >>>established?
> > >>Yes, crash happens even before all daemon instances are started. Sessions
> > >>don't get established because BRAS configured to reject sessions (so a lot
> > >>of concurrent connection retries happens) - I still didn't created account
> > >>for test user on it.
> > >>
> > >Ok, I got the crash too. In fact I had misunderstood your previous
> > >message, crash happens when PPP sessions don't get established
> > >(authentication failures in my case).
> > >
> > >I'll investigate on that and let you know.
> >
> > It seems like bug appears on mass ppp devices removing (I planned to use
> > this test environment to reproduce BRAS periodical crashes, but suddenly
> > I've got crashes on test client).
> >
> > I've checked it with some kernels - it's present in 4.3.0, but it isn't
> > present in 3.10.57. I'll try to build 3.14/3.18 kernels to look how they
> > will work in this case.
>
> Yes, it most likely was introduced by 287f3a943fef ("pppoe: Use
> workqueue to die properly when a PADT is received"). I still have to
> figure out why.

I confirm the bug comes from this commit. It happens if pppoe_connect()
reinitialises po->proto.pppoe.padt_work after pppoe_disc_rcv() has added
it to the system's work queue, and before that work got scheduled. Then
when scheduling occurs, the worker thread tries to run a corrupted
structure and crashes.

I'm going to work on a patch.
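
To illustrate the ordering problem described above, here is a small
userspace model in C. It is only a sketch under loud assumptions: struct
work_item, struct work_queue, init_work(), queue_work(), run_pending() and
simulate() are hypothetical names made up for this example, not the
kernel's workqueue API and not the pppoe code. It only shows why
reinitialising an item that is still linked into a pending list leaves the
list in a state the worker was never meant to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a work item; NOT the kernel's struct work_struct. */
struct work_item {
	struct work_item *next;	/* link in the pending list */
	bool pending;		/* true while the item sits in the queue */
	int *runs;		/* counter bumped each time the handler runs */
};

struct work_queue {
	struct work_item *head;
};

/* Plays the role of the initialiser: wipes the link and the pending flag. */
static void init_work(struct work_item *w, int *runs)
{
	w->next = NULL;
	w->pending = false;
	w->runs = runs;
}

/* Push an item onto the pending list; refuse if it is already queued. */
static bool queue_work(struct work_queue *q, struct work_item *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->next = q->head;
	q->head = w;
	return true;
}

/*
 * Worker: drain the list and run each item. Returns how many items were
 * found linked in the list although their pending flag was clear.
 */
static int run_pending(struct work_queue *q)
{
	int inconsistent = 0;

	while (q->head) {
		struct work_item *w = q->head;

		q->head = w->next;
		if (!w->pending)
			inconsistent++;
		w->pending = false;
		w->next = NULL;
		assert(w->runs != NULL);
		(*w->runs)++;
	}
	return inconsistent;
}

struct outcome {
	int executed;		/* items the worker actually ran */
	int inconsistent;	/* items seen in a state that should not exist */
};

/* Queue two items, optionally reinitialise the newest one, then run. */
static struct outcome simulate(bool reinit_while_queued)
{
	struct work_queue q = { NULL };
	struct work_item other, padt;
	struct outcome out;
	int runs = 0;

	init_work(&other, &runs);
	init_work(&padt, &runs);
	queue_work(&q, &other);
	queue_work(&q, &padt);		/* list is now: padt -> other */

	if (reinit_while_queued)
		init_work(&padt, &runs);	/* wipes padt.next, cutting off "other" */

	out.inconsistent = run_pending(&q);
	out.executed = runs;
	return out;
}
```

In this model, simulate(false) runs both queued items, while simulate(true)
runs only one and finds it linked with its pending flag cleared. The real
kernel structures are more involved, so the failure there is a crash rather
than a lost item, but the ordering that triggers it is the same: init after
queue, before run.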