From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Glauber
To: Will Deacon
CC: Alexander Viro, "linux-fsdevel@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: Re: dcache_readdir NULL inode oops
Date: Wed, 21 Nov 2018 13:19:06 +0000
Message-ID: <20181121131900.GA18931@hc>
References: <20181109143744.GA12128@hc> <20181109155856.GC2091@brain-police> <20181110111656.GA16667@hc> <20181120182854.GC28838@arm.com> <20181120190317.GA29161@arm.com>
In-Reply-To: <20181120190317.GA29161@arm.com>
Sender: linux-kernel-owner@vger.kernel.org

On Tue, Nov 20, 2018 at 07:03:17PM +0000, Will Deacon wrote:
> On Tue, Nov 20, 2018 at 06:28:54PM +0000, Will Deacon wrote:
> > On Sat, Nov 10, 2018 at 11:17:03AM +0000, Jan Glauber wrote:
> > > On Fri, Nov 09, 2018 at 03:58:56PM +0000, Will Deacon wrote:
> > > > On Fri, Nov 09, 2018 at 02:37:51PM +0000, Jan Glauber wrote:
> > > > > I'm seeing the following oops reproducible with upstream kernel on arm64
> > > > > (ThunderX2):
> > > >
> > > > [...]
> > > >
> > > > > It happens after 1-3 hours of running 'stress-ng --dev 128'. This testcase
> > > > > does a scandir of /dev and then calls random stuff like ioctl, lseek,
> > > > > open/close etc. on the entries. I assume no files are deleted under /dev
> > > > > during the testcase.
> > > > >
> > > > > The NULL pointer is the inode pointer of next. The next dentry->d_flags is
> > > > > DCACHE_RCUACCESS when this happens.
> > > > >
> > > > > Any hints on how to further debug this?
> > > >
> > > > Can you reproduce the issue with vanilla -rc1 and do you have a "known good"
> > > > kernel?
> > >
> > > I can try out -rc1, but IIRC this wasn't bisectible as the bug was present at
> > > least back to 4.14. I need to double check that as there were other issues
> > > that are resolved now so I may confuse things here. I've definitely seen
> > > the same bug with 4.18.
> > >
> > > Unfortunately I lost access to the machine as our data center seems to be
> > > moving currently so it might take some days until I can try -rc1.
> >
> > Ok, I've just managed to reproduce this in a KVM guest running v4.20-rc3 on
> > both the host and the guest, so if anybody has any ideas of things to try then
> > I'm happy to give them a shot. In the meantime, I'll try again with a bunch of
> > debug checks enabled.

Hi Will,

good that you can reproduce the issue. I've verified that the issue is
indeed reproducible with 4.14.

>
> Weee, I eventually hit a use-after-free from KASAN. See below.

I ran KASAN (and all the other debug stuff) but didn't trigger anything
in the host.

--Jan