Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Mickaël Salaün" <mic@digikod.net>
To: "Günther Noack" <gnoack@google.com>
Cc: Tingmao Wang <m@maowtm.org>, Jann Horn <jannh@google.com>,
	 John Johansen <john.johansen@canonical.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	 Ben Scarlato <akhna@google.com>,
	Christian Brauner <brauner@kernel.org>,
	 Daniel Burgener <dburgener@linux.microsoft.com>,
	Jeff Xu <jeffxu@google.com>, NeilBrown <neil@brown.name>,
	 Paul Moore <paul@paul-moore.com>,
	Ryan Sullivan <rysulliv@redhat.com>, Song Liu <song@kernel.org>,
	 linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories
Date: Thu, 24 Jul 2025 19:10:11 +0200	[thread overview]
Message-ID: <20250724.ECiGoor4ulub@digikod.net> (raw)
In-Reply-To: <aIJH9CoEKWNq0HwN@google.com>

On Thu, Jul 24, 2025 at 04:49:24PM +0200, Günther Noack wrote:
> On Wed, Jul 23, 2025 at 11:01:42PM +0200, Mickaël Salaün wrote:
> > On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote:
> > > On the other hand, I'm still a bit uncertain about the domain check
> > > semantics.  While it would not cause a rename to be allowed if it is
> > > otherwise not allowed by any rules on or above the mountpoint, this gets a
> > > bit weird if we have a situation where renames are allowed on the
> > > mountpoint or everywhere, but not read/writes, however read/writes are
> > > allowed directly on a file, but the dir containing that file gets
> > > disconnected so the sandboxed application can't read or write to it.
> > > (Maybe someone would set up such a policy where renames are allowed,
> > > expecting Landlock to always prevent renames where additional permissions
> > > would be exposed?)
> > > 
> > > In the above situation, if the file is then moved to a connected
> > > directory, it will become readable/writable again.
> > 
> > We can generalize this issue to not only the end file but any component
> > of the path: disconnected directories.  In fact, the main issue is the
> > potential inconsistency of access checks over time (e.g. between two
> > renames).  This could be exploited to bypass the security checks done
> > for FS_REFER.
> > 
> > I see two solutions:
> > 
> > 1. *Always* walk down to the IS_ROOT directory, and then jump to the
> >    mount point.  This makes it possible to have consistent access checks
> >    for renames and open/use.  The first downside is that that would
> >    change the current behavior for bind mounts that could get more
> >    access rights (if the policy explicitly sets rights for the hidden
> >    directories).  The second downside is that we'll do more walk.
> > 
> > 2. Return -EACCES (or -ENOENT) for actions involving disconnected
> >    directories, or renames of disconnected opened files.  This second
> >    solution is simpler and safer but completely disables the use of
> >    disconnected directories and the rename of disconnected files for
> >    sandboxed processes.
> > 
> > It would be much better to be able to handle opened directories as
> > (object) capabilities, but that is not currently possible because of the
> > way paths are handled by the VFS and LSM hooks.
> > 
> > Tingmao, Günther, Jann, what do you think?
> 
> I have to admit that so far, I still failed to wrap my head around the
> full patch set and its possible corner cases.  I hope I did not
> misunderstand things all too badly below:
> 
> As far as I understand the proposed patch, we are "checkpointing" the
> intermediate results of the path walk at every mount point boundary,
> and in the case where we run into a disconnected directory in one of
> the nested mount points, we restore from the intermediate result at
> the previous mount point directory and skip to the next mount point.

Correct

> 
> Visually speaking, if the layout is this (where ":" denotes a
> mountpoint boundary between the mountpoints MP1, MP2, MP3):
> 
>                           dirfd
>                             |
>           :                 V         :
> 	  :       ham <--- spam <--- eggs <--- x.txt
> 	  :    (disconn.)             :
>           :                           :
>   / <--- foo <--- bar <--- baz        :
>           :                           :
>     MP1                 MP2                  MP3
> 
> When a process holds a reference to the "spam" directory, which is now
> disconnected, and invokes openat(dirfd, "eggs/x.txt", ...), then we
> would:
> 
>   * traverse x.txt
>   * traverse eggs (checkpointing the intermediate result) <-.
>   * traverse spam                                           |
>   * traverse ham                                            |
>   * discover that ham is disconnected:                      |
>      * restore the intermediate result from "eggs" ---------'
>      * continue the walk at foo
>   * end up at the root
> 
> So effectively, since the results from "spam" and "ham" are discarded,
> we would traverse only the inodes in the outer and inner mountpoints
> MP1 and MP3, but effectively return a result that looks like we did
> not traverse MP2?

We'd still check MP2's inode, but otherwise yes.

> 
> Maybe (likely) I misread the code. :) It's not clear to me what the
> thinking behind this is.  Also, if there was another directory in
> between "spam" and "eggs" in MP2, wouldn't we be missing the access
> rights attached to this directory?

Yes, we would ignore this access right because we don't know that the
path was resolved from spam.

> 
> 
> Regarding the capability approach:
> 
> I agree that a "capability" approach would be the better solution, but
> it seems infeasible with the existing LSM hooks at the moment.  I
> would be in favor of it though.

Yes, it would be a new feature with potential important changes.

In the meantime, we still need a fix for disconnected directories, and
this fix needs to be backported.  That's why the capability approach is
not part of the two solutions. ;)

> 
> To spell it out a bit more explicitly what that would mean in my mind:
> 
> When a path is looked up relative to a dirfd, the path walk upwards
> would terminate at the dirfd and use previously calculated access
> rights stored in the associated struct file.  These access rights
> would be determined at the time of opening the dirfd, similar to how we
> are already storing the "truncate" access right today for regular
> files.
> 
> (Remark: There might still be corner cases where we have to solve it
> the hard way, if someone uses ".." together with a dirfd-relative
> lookup.)

Yep, real capabilities don't have ".." in their design.  On Linux (and
Landlock), we need to properly handle "..", which is challenging.

> 
> I also looked at what it would take to change the LSM hooks to pass
> the directory that the lookup was done relative to, but it seems that
> this would have to be passed through a bunch of VFS callbacks as well,
> which seems like a larger change.  I would be curious whether that
> would be deemed an acceptable change.
> 
> —Günther
> 
> 
> P.S. Related to relative directory lookups, there is some movement in
> the BSDs as well to use dirfds as capabilities, by adding a flag to
> open directories that enforces O_BENEATH on subsequent opens:
> 
>  * https://undeadly.org/cgi?action=article;sid=20250529080623
>  * https://reviews.freebsd.org/D50371
> 
> (both found via https://news.ycombinator.com/item?id=44575361)
> 
> If a dirfd had such a flag, that would get rid of the corner case
> above.

This would be nice but it would not solve the current issue because we
cannot force all processes to use this flag (which breaks some use
cases).

FYI, Capsicum is a more complete implementation:
https://man.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4
See the vfs.lookup_cap_dotdot sysctl too.

next prev parent reply	other threads:[~2025-07-24 17:10 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-19 10:41 [PATCH v3 0/4] Landlock: Disconnected directory handling Mickaël Salaün
2025-07-19 10:42 ` [PATCH v3 1/4] landlock: Fix cosmetic change Mickaël Salaün
2025-07-19 10:42 ` [PATCH v3 2/4] landlock: Fix handling of disconnected directories Mickaël Salaün
2025-07-22 18:04   ` Tingmao Wang
2025-07-23 21:01     ` Mickaël Salaün
2025-07-24 14:49       ` Günther Noack
2025-07-24 17:10         ` Mickaël Salaün [this message]
2025-07-25  3:54           ` Abhinav Saxena
2025-07-25  9:05             ` Mickaël Salaün
2025-07-28  0:13       ` Tingmao Wang
2025-07-19 10:42 ` [PATCH v3 3/4] selftests/landlock: Add tests for access through disconnected paths Mickaël Salaün
2025-07-19 10:42 ` [PATCH v3 4/4] selftests/landlock: Add disconnected leafs and branch test suites Mickaël Salaün

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250724.ECiGoor4ulub@digikod.net \
    --to=mic@digikod.net \
    --cc=akhna@google.com \
    --cc=brauner@kernel.org \
    --cc=dburgener@linux.microsoft.com \
    --cc=gnoack@google.com \
    --cc=jannh@google.com \
    --cc=jeffxu@google.com \
    --cc=john.johansen@canonical.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=m@maowtm.org \
    --cc=neil@brown.name \
    --cc=paul@paul-moore.com \
    --cc=rysulliv@redhat.com \
    --cc=song@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).