From mboxrd@z Thu Jan 1 00:00:00 1970 From: Aleksa Sarai Subject: Re: [PATCH RFC 0/1] mount: universally disallow mounting over symlinks Date: Mon, 30 Dec 2019 19:32:24 +1100 Message-ID: <20191230083224.sbk2jspqmup43obs@yavin.dot.cyphar.com> References: <20191230052036.8765-1-cyphar@cyphar.com> <20191230054413.GX4203@ZenIV.linux.org.uk> <20191230054913.c5avdjqbygtur2l7@yavin.dot.cyphar.com> <20191230072959.62kcojxpthhdwmfa@yavin.dot.cyphar.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="4jbwm72isg7wdcgm" Return-path: Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org To: Linus Torvalds Cc: Al Viro , David Howells , Eric Biederman , stable , Christian Brauner , Serge Hallyn , dev@opencontainers.org, Linux Containers , Linux API , linux-fsdevel , Linux Kernel Mailing List List-Id: linux-api@vger.kernel.org --4jbwm72isg7wdcgm Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2019-12-29, Linus Torvalds wrote: > On Sun, Dec 29, 2019 at 11:30 PM Aleksa Sarai wrote: > > > > BUG: kernel NULL pointer dereference, address: 0000000000000000 >=20 > Would you mind building with debug info, and then running the oops through >=20 > scripts/decode_stacktrace.sh >=20 > which makes those addresses much more legible. Will do. > > #PF: supervisor instruction fetch in kernel mode > > #PF: error_code(0x0010) - not-present page >=20 > Somebody jumped through a NULL pointer. >=20 > > RAX: 0000000000000000 RBX: ffff906d0cc3bb40 RCX: 0000000000000abc > > RDX: 0000000000000089 RSI: ffff906d74623cc0 RDI: ffff906d74475df0 > > RBP: ffff906d74475df0 R08: ffffd70b7fb24c20 R09: ffff906d066a5000 > > R10: 0000000000000000 R11: 8080807fffffffff R12: ffff906d74623cc0 > > R13: 0000000000000089 R14: ffffb70b82963dc0 R15: 0000000000000080 > > FS: 00007fbc2a8f0540(0000) GS:ffff906dcf500000(0000) knlGS:0000000= 000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: ffffffffffffffd6 CR3: 00000003c68f8001 CR4: 00000000003606e0 > > Call Trace: > > __lookup_slow+0x94/0x160 >=20 > And "__lookup_slow()" has two indirect calls (they aren't obvious with > retpoline, but look for something like >=20 > call __x86_indirect_thunk_rax >=20 > which is the modern sad way of doing "call *%rax"). One is for > revalidatinging an old dentry, but the one I _suspect_ you trigger is > this one: >=20 > old =3D inode->i_op->lookup(inode, dentry, flags); >=20 > but I thought we only could get here if we know it's a directory. >=20 > How did we miss the "d_can_lookup()", which is what should check that > yes, we can call that ->lookup() routine. I'll try applying a trivial patch to add d_can_lookup() to see if it fixes the immediate issue. > This is why I have that suspicion that it's somehow that O_PATH fd > opened in another process without O_PATH causes confusion... >=20 > So what I think has happened is that because of the O_PATH thing, > we've ended up with an inode that has never been truly opened (because > O_PATH skips that part), but then with the /proc//fd/xyz open, we > now have a file descriptor that _looks_ like it is valid, and we're > treating that inode as if it can be used. I'm not sure I agree -- as I mentioned in my other mail, re-opening through /proc/self/fd/$n works *very* well and has for a long time (in fact, both LXC and runc depend on this working). --=20 Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH --4jbwm72isg7wdcgm Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSxZm6dtfE8gxLLfYqdlLljIbnQEgUCXgm2FQAKCRCdlLljIbnQ Es8nAQDVLlsiprSDBJzgPUIJzecdqNxCZqJKorIf34AfFNF2FgD/YY1dxfmAH2LS EX3M69u6T8mfTLPNWSZlIyF13X2S2w4= =JUfW -----END PGP SIGNATURE----- --4jbwm72isg7wdcgm--