Date: Fri, 29 Sep 2017 09:07:13 -0700
From: "Paul E. McKenney"
To: Linus Torvalds
Cc: Kyle Sanderson, Linux-Kernal
Subject: Re: Kernel panic - not syncing: Fatal exception in interrupt (file_free_rcu+0x14)
Reply-To: paulmck@linux.vnet.ibm.com
Message-Id: <20170929160713.GX3521@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 28, 2017 at 09:00:52PM -0700, Linus Torvalds wrote:
> On Thu, Sep 28, 2017 at 8:32 PM, Kyle Sanderson wrote:
> > Not sure if the stack is crap or not, but this looks like an RCU crash?
> >
> > https://i.imgur.com/sBnNe1p.jpg
>
> Hmm. Not the clearest picture, and the "Code:" line in particular is
> missing the interesting part, but at a guess it's taking a fault in
> put_cred(), which inlines to
>
>         if (atomic_dec_and_test(&(cred)->usage))
>                 __put_cred(cred);
>
> and I think it's that "cred" pointer that may be NULL, which makes
> "&(cred)->usage" be a NULL pointer too, and you get a page fault when
> it tries to decrement the usage count.
>
> Now, it goes without saying that the cred pointer should never *be*
> NULL on a filp that is on the RCU freeing list, because we always
> initialize file->f_cred when we allocate a file to the current creds.
>
> So there's something odd going on. Possibly entirely unrelated memory
> corruption.
>
> Nothing obvious stands out, I think we'd need to see more of a pattern
> of the problem to see what is up.

Kyle, if this is reproducible, please build your kernel with
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y. This can detect the call_rcu()
equivalent of a double free, and these double frees have sometimes
resulted in output looking something like the .jpg along with the
NULL-pointer problem that Linus suspects.

							Thanx, Paul
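
For readers unfamiliar with what "the call_rcu() equivalent of a double free" means, here is a minimal userspace sketch of the idea. It is a toy model, not the kernel's implementation: every name in it (toy_rcu_head, toy_call_rcu(), toy_run_callbacks(), toy_file) is hypothetical, and the "queued" flag merely stands in for the state tracking that the debug-objects machinery is assumed to do. The point is only that handing the same callback head to the queue twice, before the first callback has run, can be caught at queue time instead of corrupting the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
	bool queued;	/* debug state: already waiting for a grace period? */
};

/* Hypothetical object that embeds a callback head, loosely like struct file. */
struct toy_file {
	struct toy_rcu_head rcu;	/* must stay the first member (see cast below) */
	int free_count;			/* how many times the "free" callback ran */
};

static struct toy_rcu_head *toy_cb_list;	/* pending callbacks */
static int toy_double_queue_reports;		/* duplicates detected so far */

/*
 * Queue a callback.  Returns true if queued, false if the head was
 * already pending, which is the toy analogue of a double call_rcu().
 */
static bool toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	if (head->queued) {
		toy_double_queue_reports++;
		return false;	/* refuse, so the list is not corrupted */
	}
	head->queued = true;
	head->func = func;
	head->next = toy_cb_list;
	toy_cb_list = head;
	return true;
}

/* Pretend a grace period has ended: invoke every pending callback once. */
static int toy_run_callbacks(void)
{
	struct toy_rcu_head *head = toy_cb_list;
	int invoked = 0;

	toy_cb_list = NULL;
	while (head) {
		/* Read next first: a real callback may free the object. */
		struct toy_rcu_head *next = head->next;

		head->queued = false;
		head->func(head);
		invoked++;
		head = next;
	}
	return invoked;
}

/* Toy "free" callback: only counts invocations instead of freeing. */
static void toy_file_free(struct toy_rcu_head *head)
{
	/* Valid because rcu is the first member of struct toy_file. */
	struct toy_file *f = (struct toy_file *)head;

	f->free_count++;
}
```

In this sketch, calling toy_call_rcu() twice on the same toy_file before toy_run_callbacks() leaves exactly one pending callback and bumps toy_double_queue_reports, so the second, buggy caller is identified at the point where it queues the duplicate.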