* Re: [RFC] Null Namespaces
From: Al Viro @ 2026-06-26 0:15 UTC (permalink / raw)
To: John Ericson
Cc: Andy Lutomirski, Li Chen, Cong Wang, Christian Brauner,
linux-arch, LKML, linux-fsdevel, linux-api, Arnd Bergmann,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Kees Cook,
Sergei Zimmerman, Farid Zakaria
In-Reply-To: <a75a9b82-a15b-4893-8f92-62b62664ea83@app.fastmail.com>
On Wed, Jun 24, 2026 at 11:41:07PM -0400, John Ericson wrote:
> The current working directory, roughly, is *just* some global state
> holding a directory file descriptor.
So's the descriptor table; what's the difference?
> But I don't want that global state.
Don't use it, then... out of curiosity, does that extend to stdout et al.?
> If I am writing my userland program (that is not a shell), I would not
> create the global variable. I do not appreciate the fact that the kernel
> foists that state upon me whether I like it or not.
<wry> Kernel will have to live without your appreciation, I suppose. </wry>
> Now obviously we cannot have a giant breaking change removing the notion
> of a current working directory altogether. But we can allow individual
> processes which don't want it to opt out, and that is what nulling out
> these fields (and updating the path resolution code to cope with that)
> allows.
>
> There is no loss of expressive power doing this, because one can (and
> should!) just use the `*at` and file descriptors. But there is, however,
> the imposition of discipline.
So supply a library of your own and try to convince people to use it
instead of libc. You'll have to anyway, seeing that a large and
hard-to-predict part of libc will be non-functional. Which syscalls
are used by your library is entirely up to you.
Would adding that kind of thing kernel-side assist the development of such
a library? Maybe, but I wouldn't bet too much on that - if you start from
scratch, you can trivially verify that you don't even attempt a given
set of syscalls, and if you use libc as a starting point, you get to
debug all the failure exits you've added...
> The programmer (or coding agent) is
> encouraged to do everything with file descriptors rather than path
> concatenations etc., because they need to use `*at` anyways, and then
> voilà, without browbeating anyone in security seminars or code review, a
> bunch of TOCTOU issues disappear simply because doing the right thing is
> now the path of least resistance.
I'm sorry, but the path of least resistance is picking a snippet from google
that will implement open(), etc., on top of your setup and using it.
_Especially_ if coding agents are going to be involved, precisely because
they'll do a convincing simulation of human duhveloper's behaviour, i.e.
"cut'n'paste it from the net".
* Re: [RFC] Null Namespaces
From: David Laight @ 2026-06-26 8:27 UTC (permalink / raw)
To: Andy Lutomirski
Cc: John Ericson, H. Peter Anvin, Al Viro, Li Chen, Cong Wang,
Christian Brauner, linux-arch, LKML, linux-fsdevel, linux-api,
Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, Jan Kara, Jonathan Corbet, Shuah Khan, Kees Cook,
Sergei Zimmerman, Farid Zakaria
In-Reply-To: <CALCETrWxi1g1wy2Bi4y6URW728Cmo8D3tchdMqs4GZ7S476iJA@mail.gmail.com>
On Thu, 25 Jun 2026 16:09:58 -0700
Andy Lutomirski <luto@kernel.org> wrote:
> On Thu, Jun 25, 2026 at 2:53 PM John Ericson <mail@johnericson.me> wrote:
> >
> > On Thu, Jun 25, 2026, at 5:00 PM, H. Peter Anvin wrote:
> > > On 2026-06-24 16:12, Al Viro wrote:
> > > > On Wed, Jun 24, 2026 at 06:51:47PM -0400, John Ericson wrote:
> > > >
> > > >> #### Null mount namespace
> > > >>
> > > >> - requires:
> > > >>
> > > >> - null root file system: absolute paths don't work.
> > > >>
> > > >> - null current working directory: relative paths with traditional,
> > > >> non-`*at` system calls (and `*at` ones using `AT_FDCWD`) don't work.
> > > >>
> > > >> - All operations relating to the "ambient" mount tree don't work.
> > > >>
> > > >> - `*at` operations with a file descriptor do work.
> > > >
> > > > Huh? The last bit looks like it contradicts the previous one - if you have
> > > > an opened directory in a mount from some namespace, those `*at` operations
> > > > with that descriptor *will* be seeing the mount tree of that namespace,
> > > > whatever the hell is "ambient" supposed to mean. Either that, or you
> > > > will be exposing whatever's overmounted in that mount, which is a huge
> > > > can of worms.
> > >
> > > It seems to me that this is really no different *in practice* to having an
> > > empty mount namespace, no? You might still be able to stat("/") and get a
> > > d--------- result, but how does that actually affect anything?
> >
> > The argument against just having an empty, immutable root directory and
> > calling it a day is the tie-in with a new process-spawning API discussed
> > near the bottom of my original email. I want to have nice secure
> > defaults, rather than forcing the programmer to remember to unshare, but
> > I also don't want to degrade performance by speculatively creating new
> > empty mount namespaces that might just be thrown away. Null fields alone
> > get us both --- security and good performance.
>
> This seems like a false dichotomy. There's such a thing as a singleton.
>
> In fact, we have this spiffy nullfs_fs_get_tree. It seems relatively
> straightforward to have an API to get an fd to the singleton nullfs,
> and the default for a newly spawned process could even be to have cwd
> pointing at nullfs.
>
> root is still harder, because of the shadowing issue. I think I
> proposed, ages ago, relaxing the chroot rules so that, at least under
> certain circumstances (e.g. the task is not already chrooted) an
> unprivileged task could chroot. chrooting to nullfs seems like a
> somewhat useful operation.
>
> I can imagine more complex schemes to allow even a chrooted process to
> safely start acting as though their root is nullfs, but that would be
> potentially fairly nasty. *Maybe* everything would work if there was
> a root-for-dotdot and a separate root-for-absolute-paths, and
> nameidata->root could point to the former, but I'm certainly not
> willing to say that I think this would work with any confidence at
> all.
You'd also need to sort out the 'pwd' mess.
The kernel inode always has its real parent; inside a chroot the scan stops
when the inode is the same as that of the base of the chroot.
But faff about with namespaces (IIRC I was doing an unshare to get out of
a network namespace) and that comparison can fail (if the chroot base isn't
a mount point) - so "../.." can go all the way back to the real root rather
than stopping at the base of the chroot (as you would expect).
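
As a rough userspace analogy for the stop check described above (not the
kernel's actual code), the following Python sketch walks upward with ".."
and stops when the current directory has the same identity as a chosen
base. The helper names `same_dir`, `levels_up_to` and `walk_demo` are
hypothetical, and comparing (st_dev, st_ino) is an assumption standing in
for whatever identity test the kernel really uses. When the comparison
never matches, the walk only ends at the real root.

```python
import os
import tempfile


def same_dir(a: os.stat_result, b: os.stat_result) -> bool:
    """Identity test used by the walk: same device and inode number."""
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def levels_up_to(base_fd: int, start_fd: int, limit: int = 256) -> int:
    """Count ".." steps from start_fd until base_fd is reached.

    Returns -1 if the walk reaches the real root (where ".." is the
    directory itself) without ever matching the base.
    """
    base = os.fstat(base_fd)
    cur = os.dup(start_fd)
    try:
        for steps in range(limit):
            here = os.fstat(cur)
            if same_dir(here, base):
                return steps
            parent = os.open("..", os.O_RDONLY | os.O_DIRECTORY, dir_fd=cur)
            at_root = same_dir(os.fstat(parent), here)
            os.close(cur)
            cur = parent
            if at_root:
                return -1
        return -1
    finally:
        os.close(cur)


def walk_demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "base", "a", "b"))
        os.makedirs(os.path.join(tmp, "other"))

        def dfd(*parts: str) -> int:
            return os.open(os.path.join(tmp, *parts),
                           os.O_RDONLY | os.O_DIRECTORY)

        base, deep, other = dfd("base"), dfd("base", "a", "b"), dfd("other")
        try:
            # First value: the walk stops at the base. Second value: the
            # start is not beneath the base, so nothing stops the walk.
            return levels_up_to(base, deep), levels_up_to(base, other)
        finally:
            for fd in (base, deep, other):
                os.close(fd)
```

The second case is the interesting one: once the identity comparison
cannot succeed, the only remaining stop is the real root.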
David
>
> --Andy
>
* Re: [PATCH 01/19] dt-bindings: crypto: add Rambus CryptoManager Hub
From: Krzysztof Kozlowski @ 2026-06-26 10:55 UTC (permalink / raw)
To: Saravanakrishnan Krishnamoorthy
Cc: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
Paul Walmsley, Rob Herring, Shuah Khan, Alexandre Ghiti,
devicetree, Joel Wittenauer, linux-api, linux-crypto, linux-doc,
linux-kernel, linux-kselftest, linux-riscv, Shuah Khan,
sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-2-skrishnamoorthy@rambus.com>
On Thu, Jun 25, 2026 at 10:33:09AM -0700, Saravanakrishnan Krishnamoorthy wrote:
> From: Alex Ousherovitch <aousherovitch@rambus.com>
>
> Add device tree binding schema for the CRI CryptoManager Hub (CMH)
> hardware crypto accelerator. The binding covers the parent SoC-level
> node with register region, interrupt, DMA properties, and per-core
> child nodes identified by compatible string and unit address.
...
>
> ** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **
OK, we are done. I am removing your posting from Patchwork.
Best regards,
Krzysztof
* Re: [PATCH 19/19] MAINTAINERS: add Rambus CryptoManager Hub (CMH)
From: Krzysztof Kozlowski @ 2026-06-26 10:57 UTC (permalink / raw)
To: Saravanakrishnan Krishnamoorthy
Cc: Albert Ou, Alex Ousherovitch, Conor Dooley, David S. Miller,
Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
Paul Walmsley, Rob Herring, Shuah Khan, Alexandre Ghiti,
devicetree, Joel Wittenauer, linux-api, linux-crypto, linux-doc,
linux-kernel, linux-kselftest, linux-riscv, Shuah Khan,
sipsupport, Thi Nguyen
In-Reply-To: <20260625173328.1140487-20-skrishnamoorthy@rambus.com>
On Thu, Jun 25, 2026 at 10:33:27AM -0700, Saravanakrishnan Krishnamoorthy wrote:
> From: Alex Ousherovitch <aousherovitch@rambus.com>
>
> Add MAINTAINERS entry for the CRI CryptoManager Hub (CMH) hardware
> crypto accelerator driver under drivers/crypto/cmh/.
>
> Co-developed-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
> Signed-off-by: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
> Signed-off-by: Alex Ousherovitch <aousherovitch@rambus.com>
> Reviewed-by: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
> Reviewed-by: Thi Nguyen <thin@rambus.com>
Did these people really provide you with the Reviewer's statement of
oversight? Do they understand what it means?
> ---
> MAINTAINERS | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 90034eb7874e..ecb389795e3d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6797,6 +6797,25 @@ F: kernel/cred.c
> F: rust/kernel/cred.rs
> F: Documentation/security/credentials.rst
>
> +CRI CRYPTOMANAGER HUB (CMH) HARDWARE CRYPTO ACCELERATOR
> +M: Alex Ousherovitch <aousherovitch@rambus.com>
> +M: Saravanakrishnan Krishnamoorthy <skrishnamoorthy@rambus.com>
> +R: Joel Wittenauer <Joel.Wittenauer@cryptography.com>
> +R: Thi Nguyen <thin@rambus.com>
> +L: linux-crypto@vger.kernel.org
> +L: sipsupport@rambus.com (moderated for non-subscribers)
NAK, drop. You are not allowed to add internal moderated mailing lists
here. We are not going to participate in your corporate dances.
> +S: Maintained
> +T: git https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
Drop, you do not have commit rights there.
> +F: Documentation/ABI/testing/cmh-mgmt
> +F: Documentation/ABI/testing/debugfs-driver-cmh
> +F: Documentation/ABI/testing/sysfs-driver-cmh
> +F: Documentation/crypto/device_drivers/cmh.rst
> +F: Documentation/devicetree/bindings/crypto/cri,cmh.yaml
> +F: Documentation/userspace-api/ioctl/cmh_mgmt.rst
> +F: drivers/crypto/cmh/
> +F: include/uapi/linux/cmh_mgmt_ioctl.h
> +F: tools/testing/selftests/drivers/crypto/cmh/
> +
> INTEL CRPS COMMON REDUNDANT PSU DRIVER
> M: Ninad Palsule <ninad@linux.ibm.com>
> L: linux-hwmon@vger.kernel.org
> --
> 2.43.7
>
>
> ** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **
Heh, I should have ignored your message...
Best regards,
Krzysztof
* Re: [RFC] Null Namespaces
From: John Ericson @ 2026-06-26 16:26 UTC (permalink / raw)
To: Al Viro
Cc: Andy Lutomirski, Li Chen, Cong Wang, Christian Brauner,
linux-arch, LKML, linux-fsdevel, linux-api, Arnd Bergmann,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Kees Cook,
Sergei Zimmerman, Farid Zakaria
In-Reply-To: <20260626001538.GO2636677@ZenIV>
On Thu, Jun 25, 2026, at 8:15 PM, Al Viro wrote:
> On Wed, Jun 24, 2026 at 11:41:07PM -0400, John Ericson wrote:
>
> > But I don't want that global state.
>
> Don't use it, then... out of curiosity, does that extend to stdout et al.?
Good question; it turns out I like the standard streams much better!
First of all, the standard streams are just an idiom --- there is
nothing actually special about file descriptors 0, 1, and 2. That's a
clean design --- the kernel doesn't need to know about userspace idioms.
Second of all, if you don't want any of those, you can just close 'em!
You can't do that with the cwd, however. It's stuck open.
Ideally `*at` would have been with us from the beginning, and, say, file
descriptor 3 would have been the "current working directory" merely by
convention.
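
To make the "working directory by convention" idea concrete, here is a
minimal Python sketch (Python's `dir_fd` argument corresponds to the `*at`
family of calls). The names `read_relative`, `convention_demo` and
`note.txt` are hypothetical and exist only for illustration: the program
holds a directory descriptor explicitly and resolves names against it,
without consulting the cwd.

```python
import os
import tempfile


def read_relative(dir_fd: int, name: str) -> bytes:
    """Read a file named relative to an explicit directory descriptor.

    os.open(..., dir_fd=...) resolves a relative `name` against `dir_fd`
    (openat-style), so the process cwd is not consulted.
    """
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, 4096)
    finally:
        os.close(fd)


def convention_demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "note.txt"), "wb") as f:
            f.write(b"hello")
        # The "working directory by convention": just another descriptor
        # which, unlike the real cwd, can be closed when no longer wanted.
        work_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
        try:
            return read_relative(work_fd, "note.txt")
        finally:
            os.close(work_fd)
```

Nothing here depends on which descriptor number `work_fd` happens to get;
a fixed number such as 3 would be purely a userspace convention.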
> Would adding that kind of thing kernel-side assist the development of such
> a library? Maybe, but I wouldn't bet too much on that - if you start from
> scratch, you can trivially verify that you don't even attempt a given
> set of syscalls, and if you use libc as a starting point, you get to
> debug all the failure exits you've added...
First of all, I am trying to change what processes are allowed to do,
and this includes programs I did not write. A libc-based solution is the
program cooperating with its own sandboxing; this is not a solution for
running arbitrary, possibly untrusted programs in a restricted manner.
Second of all, this would be very laborious in practice, because we're
talking not about what syscalls the program uses, but about what data is
passed in those syscalls. Any program that consumes arbitrary user input
(like shell utilities) might receive an absolute or relative path, and
so it would have to manually check for that, lest the user input "trick"
the program into using the root dir and cwd it is trying to ignore.
Making a few tiny edits in the kernel path resolution logic to allow for
these null fields is much more practical than defending a much broader
perimeter in userspace.
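
A small Python sketch of how user input can bypass a directory descriptor
today, assuming a naive helper (the names `open_under`,
`absolute_path_bypasses_dir_fd` and `secret.txt` are hypothetical): an
absolute path makes the `*at`-style call ignore the descriptor and resolve
from the process root.

```python
import os
import tempfile


def open_under(dir_fd: int, user_path: str) -> int:
    """Naive helper: open user-supplied `user_path` relative to `dir_fd`."""
    return os.open(user_path, os.O_RDONLY, dir_fd=dir_fd)


def absolute_path_bypasses_dir_fd() -> bool:
    with tempfile.TemporaryDirectory() as jail, \
            tempfile.TemporaryDirectory() as outside:
        secret = os.path.join(outside, "secret.txt")
        with open(secret, "wb") as f:
            f.write(b"outside data")
        jail_fd = os.open(jail, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # `secret` is an absolute path, so dir_fd is ignored and the
            # lookup starts from the process root instead of `jail`.
            fd = open_under(jail_fd, secret)
            try:
                return os.read(fd, 64) == b"outside data"
            finally:
                os.close(fd)
        finally:
            os.close(jail_fd)
```

Every call site that accepts a path from outside would need its own guard
against this, which is the broad userspace perimeter referred to above.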
> > The programmer (or coding agent) is
> > encouraged to do everything with file descriptors rather than path
> > concatenations etc., because they need to use `*at` anyways, and then
> > voilà, without browbeating anyone in security seminars or code review, a
> > bunch of TOCTOU issues disappear simply because doing the right thing is
> > now the path of least resistance.
>
> I'm sorry, but the path of least resistance is picking a snippet from google
> that will implement open(), etc., on top of your setup and using it.
> _Especially_ if coding agents are going to be involved, precisely because
> they'll do a convincing simulation of human duhveloper's behaviour, i.e.
> "cut'n'paste it from the net".
We agree! But this is precisely why it is important to make these things
fail. Mindless Stack Overflow cut'n'pasters (human or agent) still run
their program to make sure it works. Making the thing you don't want
them to do *actually fail* creates sufficiently strong and incremental
feedback that they will end up doing the right thing.
> > The current working directory, roughly, is *just* some global state
> > holding a directory file descriptor.
>
> So's the descriptor table; what's the difference?
Now that I've responded to everything else, I can answer this in
summary:
- File descriptors can be closed; cwd and root cannot be.
- File descriptors need to be explicitly used in syscalls. The cwd and
root are implicitly used (in too many different syscalls to make
syscall-level auditing practical) based on the sort of path string
argument to the syscall, without the program's explicit consent.
John
* Re: [PATCH 01/19] dt-bindings: crypto: add Rambus CryptoManager Hub
From: Krishnamoorthy, Saravanakrishnan @ 2026-06-26 17:15 UTC (permalink / raw)
To: Krzysztof Kozlowski
Cc: Albert Ou, Ousherovitch, Alex, Conor Dooley, David S. Miller,
Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
Paul Walmsley, Rob Herring, Shuah Khan, Alexandre Ghiti,
devicetree@vger.kernel.org, Wittenauer, Joel,
linux-api@vger.kernel.org, linux-crypto@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-riscv@lists.infradead.org,
Shuah Khan, SIPSupport, Nguyen, Thi
In-Reply-To: <20260626-radiant-affable-raccoon-f48b9a@quoll>
Hi Krzysztof,
Understood, and apologies. The confidentiality footer was auto-appended by our corporate mail gateway, not something we intended on an open-source submission. We've had IT disable it, so it won't be on future mail. We'll resend the series as v2 without the disclaimer.
Sorry for the noise.
Krishnan (Saravanakrishnan Krishnamoorthy)
* Re: [PATCH 19/19] MAINTAINERS: add Rambus CryptoManager Hub (CMH)
From: Krishnamoorthy, Saravanakrishnan @ 2026-06-26 17:22 UTC (permalink / raw)
To: Krzysztof Kozlowski
Cc: Albert Ou, Ousherovitch, Alex, Conor Dooley, David S. Miller,
Herbert Xu, Jonathan Corbet, Krzysztof Kozlowski, Palmer Dabbelt,
Paul Walmsley, Rob Herring, Shuah Khan, Alexandre Ghiti,
devicetree@vger.kernel.org, Wittenauer, Joel,
linux-api@vger.kernel.org, linux-crypto@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-riscv@lists.infradead.org,
Shuah Khan, SIPSupport, Nguyen, Thi
In-Reply-To: <20260626-lush-eel-of-election-5fcbde@quoll>
Hi Krzysztof,
Thanks for the review - all fair, and we'll address these in v2:
- Drop L: sipsupport@rambus.com (keeping only linux-crypto).
- Drop the T: line - we don't maintain a tree; the driver will go through the crypto tree.
Yes, Joel and Thi reviewed the patch and acknowledged the Reviewer's statement of oversight.
Krishnan
* Re: [RFC] Null Namespaces
From: John Ericson @ 2026-06-26 17:23 UTC (permalink / raw)
To: David Laight, Andy Lutomirski
Cc: H. Peter Anvin, Al Viro, Li Chen, Cong Wang, Christian Brauner,
linux-arch, LKML, linux-fsdevel, linux-api, Arnd Bergmann,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Jan Kara, Jonathan Corbet, Shuah Khan, Kees Cook,
Sergei Zimmerman, Farid Zakaria
In-Reply-To: <20260626092750.58a8de9c@pumpkin>
I am replying to both Andy and David in a single email --- hope that is
not confusing.
On Thu, Jun 25, 2026, at 7:09 PM, Andy Lutomirski wrote:
> On Thu, Jun 25, 2026 at 2:53 PM John Ericson <mail@johnericson.me> wrote:
> >
> > The argument against just having an empty, immutable root directory and
> > calling it a day is the tie-in with a new process-spawning API discussed
> > near the bottom of my original email. I want to have nice secure
> > defaults, rather than forcing the programmer to remember to unshare, but
> > I also don't want to degrade performance by speculatively creating new
> > empty mount namespaces that might just be thrown away. Null fields alone
> > get us both --- security and good performance.
>
> This seems like a false dichotomy. There's such a thing as a singleton.
>
> In fact, we have this spiffy nullfs_fs_get_tree. It seems relatively
> straightforward to have an API to get an fd to the singleton nullfs,
> and the default for a newly spawned process could even be to have cwd
> pointing at nullfs.
Ah! This is the first I am learning about the new nullfs. OK yes I agree
this gives us both properties, since it is truly immutably empty.
I still have a slight preference for something that also makes
statting/opening/etc. of `/` itself fail, but this is otherwise good ---
there's no denying it.
> root is still harder, because of the shadowing issue. I think I
> proposed, ages ago, relaxing the chroot rules so that, at least under
> certain circumstances (e.g. the task is not already chrooted) an
> unprivileged task could chroot. chrooting to nullfs seems like a
> somewhat useful operation.
>
> I can imagine more complex schemes to allow even a chrooted process to
> safely start acting as though their root is nullfs, but that would be
> potentially fairly nasty. *Maybe* everything would work if there was
> a root-for-dotdot and a separate root-for-absolute-paths, and
> nameidata->root could point to the former, but I'm certainly not
> willing to say that I think this would work with any confidence at
> all.
I really like these ideas!
- Splitting the two uses of root sounds great. Even more generally (at
least as a thought experiment, I don't like the O(n) performance), one
can imagine a set of paths one must not `cd ..` past. Conceptually, I
feel optimistic that inserting another boundary path into the set on
every `chroot` makes it safe.
- In the original "real root", the "root for .." field could be null,
since no `..` check is actually needed. Then, if we only want to have
a single "root for .." (to avoid the O(n)), only the initial
assignment of it from null to non-null would be unprivileged --- this
would implement your "task is not already chrooted" idea. Subsequent
assignment would still be privileged since we are replacing, not
extending our "set". (The nullable single path means we have 0 or 1
paths in our set.)
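
The single nullable "root for .." policy above can be written down as a
toy model. This is a sketch of the thought experiment only, not kernel
code: the class `DotDotRoot`, its `set` method, `policy_demo` and the
example paths are all hypothetical.

```python
from typing import Optional


class DotDotRoot:
    """Toy model of a nullable "root for .." field (hypothetical policy)."""

    def __init__(self) -> None:
        # None models the original "real root": no ".." check is needed.
        self.path: Optional[str] = None

    def set(self, path: str, privileged: bool) -> None:
        # The first assignment (null -> non-null) only adds a boundary, so
        # this sketch allows it unprivileged. Replacing an existing
        # boundary could drop a restriction, so it stays privileged.
        if self.path is not None and not privileged:
            raise PermissionError("replacing the '..' root needs privilege")
        self.path = path


def policy_demo() -> list:
    root = DotDotRoot()
    log = []
    root.set("/srv/jail", privileged=False)   # first assignment: allowed
    log.append(root.path)
    try:
        root.set("/", privileged=False)       # replacement: refused
    except PermissionError:
        log.append("denied")
    root.set("/srv/other", privileged=True)   # replacement with privilege
    log.append(root.path)
    return log
```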
----
On Fri, Jun 26, 2026, at 4:27 AM, David Laight wrote:
>
> You'd also need to sort out the 'pwd' mess.
> The kernel inode always has its real parent; inside a chroot the scan stops
> when the inode is the same as that of the base of the chroot.
> But faff about with namespaces (IIRC I was doing an unshare to get out of
> a network namespace) and that comparison can fail (if the chroot base isn't
> a mount point) - so "../.." can go all the way back to the real root rather
> than stopping at the base of the chroot (as you would expect).
>
> David
I did get the impression that the `..` check is...rather fragile. I am
also thinking that a global setting like `openat2`'s `RESOLVE_BENEATH`
to make `..` never work would be useful; then all manner of chrooting is
trivially safe, because you cannot go up regardless!
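
For illustration, a userspace approximation of "`..` never works" can be
sketched in Python. This is not `openat2` and not the kernel's
`RESOLVE_BENEATH` logic; it is a stricter stand-in that rejects absolute
paths and every `..` component and refuses to follow symlinks. The names
`open_beneath_strict` and `beneath_demo` are hypothetical.

```python
import os
import tempfile


def open_beneath_strict(dir_fd: int, path: str) -> int:
    """Open `path` under `dir_fd`, refusing anything that could go up.

    Userspace approximation only: rejects absolute paths and any ".."
    component, and refuses to follow symlinks at every step.
    """
    if path.startswith("/"):
        raise PermissionError("absolute paths are not allowed")
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if not parts or ".." in parts:
        raise PermissionError("'..' or empty path is not allowed")
    cur = os.dup(dir_fd)
    try:
        for i, part in enumerate(parts):
            flags = os.O_RDONLY | os.O_NOFOLLOW
            if i < len(parts) - 1:
                flags |= os.O_DIRECTORY  # intermediate steps must be dirs
            nxt = os.open(part, flags, dir_fd=cur)
            os.close(cur)
            cur = nxt
        return cur
    except BaseException:
        os.close(cur)
        raise


def beneath_demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "b"))
        with open(os.path.join(tmp, "a", "b", "f"), "wb") as fh:
            fh.write(b"ok")
        base = os.open(os.path.join(tmp, "a"), os.O_RDONLY | os.O_DIRECTORY)
        try:
            fd = open_beneath_strict(base, "b/f")
            data = os.read(fd, 16)
            os.close(fd)
            try:
                open_beneath_strict(base, "b/../../a/b/f")
                escaped = True
            except PermissionError:
                escaped = False
            return data, escaped
        finally:
            os.close(base)
```

Doing this per call site in userspace is exactly the kind of discipline a
global setting would make unnecessary.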
----
Given the state of the discussion, I'll go submit my null cwd and root
patch momentarily. The nullfs alternative is quite compelling; to the
extent that I do prefer making the root operations fail as I said above,
I think my best shot is demonstrating that this patch is so small and
lightweight that this slight benefit is paid for by the simplicity of
the implementation.
John