Linux Security Modules development
 help / color / mirror / Atom feed
* Re: [PATCH RFC 00/12] Enroll kernel keys thru MOK
From: Eric Snowberg @ 2021-07-08 23:17 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: keyrings, linux-integrity, David Howells, David Woodhouse,
	Herbert Xu, David S . Miller, Jarkko Sakkinen, James Morris,
	Serge E . Hallyn, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, glin, konrad.wilk@oracle.com
In-Reply-To: <490941a5197bf4bcf0d6f95610085ee4d46ed9bb.camel@linux.ibm.com>


> On Jul 8, 2021, at 1:31 PM, Mimi Zohar <zohar@linux.ibm.com> wrote:
> 
> On Thu, 2021-07-08 at 11:59 -0600, Eric Snowberg wrote:
>> 
>>> Asumming a
>>> function similar to "restrict_link_by_builtin_and_secondary_trusted" is
>>> defined to include the MOK keyring, the CA keys in the MOK db would be
>>> loaded onto the MOK keyring, the other keys that meet the secondary
>>> keyring restriction would be loaded directly onto the secondary
>>> keyring[1], and as you currently have, the remaining keys onto the
>>> platform keyring.
>>> 
>>> This eliminates the exemption needed for loading keys onto the
>>> secondary keyring.  The MOK keyring, containing just CA keys, becomes a
>>> new trust source.
>> 
>> I just want to make sure I understand. If we kept the .mok keyring around, 
>> we would store it into the system_keyring code, just like the platform 
>> keyring is stored.  This would allow the move exemption code to be removed.
>> If the mok keyring is a new trust source, whenever the secondary keyring 
>> is referenced in verify_ code, the mok keyring will be checked too.  If 
>> I have this right, let me know and I’ll work on a v2.  Thanks.
> 
> All the firmware keys are loaded onto the "platform" keyring, without
> any restriction.  Your reference point should be the "builtin" and
> "secondary" keyrings, not the "platform" keyring.
> 
> Changes:
> - defining a new keyring restriction which only allows CA keys to be
> loaded onto the MOK keyring.
> - defining a new keyring restriction something along the lines of
> "restrict_link_by_builtin_mok_and_secondary_trusted()".
> 
> In the case of "restrict_link_by_builtin_and_secondary_trusted()", it's
> based on a build time option.  In the case of MOK, it might be both a
> build time and runtime firmware variable option.  There are quite a few
> permutations that will somehow need to be addressed:  secondary keyring
> not defined, but MOK keyring defined, and the reverse.
> 
> Once all the CA keys in the MOK db are loaded onto the MOK keyring,

To avoid confusion with the new keyring name, would it be more appropriate 
to change what we are calling the .mok keyring to the .trusted_platform 
keyring instead? Or just leave it as .mok?

> there will be no need for moving keys to the secondary keyring.  The
> secondary keyring restriction will just work.  The main question is
> whether there will need to be two passes.   One pass to first load all
> the CA keys onto the MOK keyring.  A second pass to load the keys onto
> the secondary keyring, based on the keyring restriction, and the
> remaining ones onto the "platform" keyring to avoid the regression.
> 
> [Once the CA keys are loaded onto the MOK keyring, userspace will be
> able to load certificates, signed by a key on the MOK keyring, onto the
> secondary keyring.]

Other than that, I think I got it, I’ll start working on a v2.  Thanks.


^ permalink raw reply

* Re: [PATCH RFC 00/12] Enroll kernel keys thru MOK
From: Mimi Zohar @ 2021-07-09  1:10 UTC (permalink / raw)
  To: Eric Snowberg
  Cc: keyrings, linux-integrity, David Howells, David Woodhouse,
	Herbert Xu, David S . Miller, Jarkko Sakkinen, James Morris,
	Serge E . Hallyn, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, glin, konrad.wilk@oracle.com
In-Reply-To: <839EF700-7A2C-4282-AF97-768FAD1A9957@oracle.com>

On Thu, 2021-07-08 at 17:17 -0600, Eric Snowberg wrote:
> > Once all the CA keys in the MOK db are loaded onto the MOK keyring,
> 
> To avoid confusion with the new keyring name, would it be more appropriate 
> to change what we are calling the .mok keyring to the .trusted_platform 
> keyring instead? Or just leave it as .mok?

Definitely not ".trusted_platform" keyring, as it would be too
confusing with the existing "trusted" key type [1].  At least for now,
leave it as ".mok".

thanks,

Mimi

[1] Documentation/security/keys/trusted-encrypted.rst


^ permalink raw reply

* Re: [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: Christian Brauner @ 2021-07-09  9:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-kernel, viro, virtio-fs, dwalsh, dgilbert,
	casey.schaufler, linux-security-module, selinux, tytso, miklos,
	gscrivan, jack, Christoph Hellwig
In-Reply-To: <20210708175738.360757-2-vgoyal@redhat.com>

On Thu, Jul 08, 2021 at 01:57:38PM -0400, Vivek Goyal wrote:
> Currently user.* xattr are not allowed on symlink and special files.
> 
> man xattr and recent discussion suggested that primary reason for this
> restriction is how file permissions for symlinks and special files
> are little different from regular files and directories.
> 
> For symlinks, they are world readable/writable and if user xattr were
> to be permitted, it will allow unpriviliged users to dump a huge amount
> of user.* xattrs on symlinks without any control.
> 
> For special files, permissions typically control capability to read/write
> from devices (and not necessarily from filesystem). So if a user can
> write to device (/dev/null), does not necessarily mean it should be allowed
> to write large number of user.* xattrs on the filesystem device node is
> residing in.
> 
> This patch proposes to relax the restrictions a bit and allow file owner
> or priviliged user (CAP_FOWNER), to be able to read/write user.* xattrs
> on symlink and special files.
> 
> virtiofs daemon has a need to store user.* xatrrs on all the files
> (including symlinks and special files), and currently that fails. This
> patch should help.
> 
> Link: https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@redhat.com/
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---

Seems reasonable and useful.
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

One question, do all filesystem supporting xattrs deal with setting them
on symlinks/device files correctly?

>  fs/xattr.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 5c8c5175b385..2f1855c8b620 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -120,12 +120,14 @@ xattr_permission(struct user_namespace *mnt_userns, struct inode *inode,
>  	}
>  
>  	/*
> -	 * In the user.* namespace, only regular files and directories can have
> -	 * extended attributes. For sticky directories, only the owner and
> -	 * privileged users can write attributes.
> +	 * In the user.* namespace, for symlinks and special files, only
> +	 * the owner and priviliged users can read/write attributes.
> +	 * For sticky directories, only the owner and privileged users can
> +	 * write attributes.
>  	 */
>  	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
> -		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
> +		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
> +		    !inode_owner_or_capable(mnt_userns, inode))
>  			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
>  		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
>  		    (mask & MAY_WRITE) &&
> -- 
> 2.25.4
> 

^ permalink raw reply

* Re: [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: Vivek Goyal @ 2021-07-09 15:27 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, linux-kernel, viro, virtio-fs, dwalsh, dgilbert,
	casey.schaufler, linux-security-module, selinux, tytso, miklos,
	gscrivan, jack, Christoph Hellwig
In-Reply-To: <20210709091915.2bd4snyfjndexw2b@wittgenstein>

On Fri, Jul 09, 2021 at 11:19:15AM +0200, Christian Brauner wrote:
> On Thu, Jul 08, 2021 at 01:57:38PM -0400, Vivek Goyal wrote:
> > Currently user.* xattr are not allowed on symlink and special files.
> > 
> > man xattr and recent discussion suggested that primary reason for this
> > restriction is how file permissions for symlinks and special files
> > are little different from regular files and directories.
> > 
> > For symlinks, they are world readable/writable and if user xattr were
> > to be permitted, it will allow unpriviliged users to dump a huge amount
> > of user.* xattrs on symlinks without any control.
> > 
> > For special files, permissions typically control capability to read/write
> > from devices (and not necessarily from filesystem). So if a user can
> > write to device (/dev/null), does not necessarily mean it should be allowed
> > to write large number of user.* xattrs on the filesystem device node is
> > residing in.
> > 
> > This patch proposes to relax the restrictions a bit and allow file owner
> > or priviliged user (CAP_FOWNER), to be able to read/write user.* xattrs
> > on symlink and special files.
> > 
> > virtiofs daemon has a need to store user.* xatrrs on all the files
> > (including symlinks and special files), and currently that fails. This
> > patch should help.
> > 
> > Link: https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@redhat.com/
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> 
> Seems reasonable and useful.
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> 
> One question, do all filesystem supporting xattrs deal with setting them
> on symlinks/device files correctly?

Wrote a simple bash script to do setfattr/getfattr user.foo xattr on
symlink and device node on ext4, xfs and btrfs and it works fine.

https://github.com/rhvgoyal/misc/blob/master/generic-programs/user-xattr-special-files.sh

I probably can add some more filesystems to test.

Thanks
Vivek

> 
> >  fs/xattr.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xattr.c b/fs/xattr.c
> > index 5c8c5175b385..2f1855c8b620 100644
> > --- a/fs/xattr.c
> > +++ b/fs/xattr.c
> > @@ -120,12 +120,14 @@ xattr_permission(struct user_namespace *mnt_userns, struct inode *inode,
> >  	}
> >  
> >  	/*
> > -	 * In the user.* namespace, only regular files and directories can have
> > -	 * extended attributes. For sticky directories, only the owner and
> > -	 * privileged users can write attributes.
> > +	 * In the user.* namespace, for symlinks and special files, only
> > +	 * the owner and priviliged users can read/write attributes.
> > +	 * For sticky directories, only the owner and privileged users can
> > +	 * write attributes.
> >  	 */
> >  	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
> > -		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
> > +		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
> > +		    !inode_owner_or_capable(mnt_userns, inode))
> >  			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
> >  		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
> >  		    (mask & MAY_WRITE) &&
> > -- 
> > 2.25.4
> > 
> 


^ permalink raw reply

* Re: [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: Casey Schaufler @ 2021-07-09 15:34 UTC (permalink / raw)
  To: Vivek Goyal, Christian Brauner
  Cc: linux-fsdevel, linux-kernel, viro, virtio-fs, dwalsh, dgilbert,
	casey.schaufler, linux-security-module, selinux, tytso, miklos,
	gscrivan, jack, Christoph Hellwig
In-Reply-To: <20210709152737.GA398382@redhat.com>

On 7/9/2021 8:27 AM, Vivek Goyal wrote:
> On Fri, Jul 09, 2021 at 11:19:15AM +0200, Christian Brauner wrote:
>> On Thu, Jul 08, 2021 at 01:57:38PM -0400, Vivek Goyal wrote:
>>> Currently user.* xattr are not allowed on symlink and special files.
>>>
>>> man xattr and recent discussion suggested that primary reason for this
>>> restriction is how file permissions for symlinks and special files
>>> are little different from regular files and directories.
>>>
>>> For symlinks, they are world readable/writable and if user xattr were
>>> to be permitted, it will allow unpriviliged users to dump a huge amount
>>> of user.* xattrs on symlinks without any control.
>>>
>>> For special files, permissions typically control capability to read/write
>>> from devices (and not necessarily from filesystem). So if a user can
>>> write to device (/dev/null), does not necessarily mean it should be allowed
>>> to write large number of user.* xattrs on the filesystem device node is
>>> residing in.
>>>
>>> This patch proposes to relax the restrictions a bit and allow file owner
>>> or priviliged user (CAP_FOWNER), to be able to read/write user.* xattrs
>>> on symlink and special files.
>>>
>>> virtiofs daemon has a need to store user.* xatrrs on all the files
>>> (including symlinks and special files), and currently that fails. This
>>> patch should help.
>>>
>>> Link: https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@redhat.com/
>>> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
>>> ---
>> Seems reasonable and useful.
>> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
>>
>> One question, do all filesystem supporting xattrs deal with setting them
>> on symlinks/device files correctly?
> Wrote a simple bash script to do setfattr/getfattr user.foo xattr on
> symlink and device node on ext4, xfs and btrfs and it works fine.

How about nfs, tmpfs, overlayfs and/or some of the other less conventional
filesystems?

>
> https://github.com/rhvgoyal/misc/blob/master/generic-programs/user-xattr-special-files.sh
>
> I probably can add some more filesystems to test.
>
> Thanks
> Vivek
>
>>>  fs/xattr.c | 10 ++++++----
>>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/xattr.c b/fs/xattr.c
>>> index 5c8c5175b385..2f1855c8b620 100644
>>> --- a/fs/xattr.c
>>> +++ b/fs/xattr.c
>>> @@ -120,12 +120,14 @@ xattr_permission(struct user_namespace *mnt_userns, struct inode *inode,
>>>  	}
>>>  
>>>  	/*
>>> -	 * In the user.* namespace, only regular files and directories can have
>>> -	 * extended attributes. For sticky directories, only the owner and
>>> -	 * privileged users can write attributes.
>>> +	 * In the user.* namespace, for symlinks and special files, only
>>> +	 * the owner and priviliged users can read/write attributes.
>>> +	 * For sticky directories, only the owner and privileged users can
>>> +	 * write attributes.
>>>  	 */
>>>  	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
>>> -		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
>>> +		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
>>> +		    !inode_owner_or_capable(mnt_userns, inode))
>>>  			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
>>>  		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
>>>  		    (mask & MAY_WRITE) &&
>>> -- 
>>> 2.25.4
>>>

^ permalink raw reply

* Re: [PATCH RFC 00/12] Enroll kernel keys thru MOK
From: Nayna @ 2021-07-09 15:59 UTC (permalink / raw)
  To: Mimi Zohar, Eric Snowberg
  Cc: keyrings, linux-integrity, David Howells, David Woodhouse,
	Herbert Xu, David S . Miller, Jarkko Sakkinen, James Morris,
	Serge E . Hallyn, keescook, gregkh, torvalds, scott.branden,
	weiyongjun1, nayna, ebiggers, ardb, nramas, lszubowi,
	linux-kernel, linux-crypto, linux-security-module,
	James.Bottomley, pjones, glin, konrad.wilk@oracle.com
In-Reply-To: <ef480c8f83780eea4ff8fdcd35c6208760b5e1d7.camel@linux.ibm.com>


On 7/8/21 9:10 PM, Mimi Zohar wrote:
> Definitely not ".trusted_platform" keyring, as it would be too
> confusing with the existing "trusted" key type [1].  At least for now,
> leave it as ".mok".

Since this keyring is meant only for "CA" keys, can we name it as ".ca" ?

Thanks & Regards,

     - Nayna


^ permalink raw reply

* Re: [RFC PATCH v2 0/1] Relax restrictions on user.* xattr
From: Daniel Walsh @ 2021-07-09 16:00 UTC (permalink / raw)
  To: Vivek Goyal, linux-fsdevel, linux-kernel, viro
  Cc: virtio-fs, dgilbert, christian.brauner, casey.schaufler,
	linux-security-module, selinux, tytso, miklos, gscrivan, jack
In-Reply-To: <20210708175738.360757-1-vgoyal@redhat.com>

On 7/8/21 13:57, Vivek Goyal wrote:
> Hi,
>
> This is V2 of the patch. Posted V1 here.
>
> https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@redhat.com/
>
> Right now we don't allow setting user.* xattrs on symlinks and special
> files at all. Initially I thought that real reason behind this
> restriction is quota limitations but from last conversation it seemed
> that real reason is that permission bits on symlink and special files
> are special and different from regular files and directories, hence
> this restriction is in place.
>
> Given it probably is not a quota issue (I tested with xfs user quota
> enabled and quota restrictions kicked in on symlink), I dropped the
> idea of allowing user.* xattr if process has CAP_SYS_RESOURCE.
>
> Instead this version of patch allows reading/writing user.* xattr
> on symlink and special files if caller is owner or priviliged (has
> CAP_FOWNER) w.r.t inode.
>
> We need this for virtiofs daemon. I also found one more user. Giuseppe,
> seems to set user.* xattr attrs on unpriviliged fuse-overlay as well
> and he ran into similar issue. So fuse-overlay should benefit from
> this change as well.
>
> Who wants to set user.* xattr on symlink/special files
> -----------------------------------------------------
>
> In virtiofs, actual file server is virtiosd daemon running on host.
> There we have a mode where xattrs can be remapped to something else.
> For example security.selinux can be remapped to
> user.virtiofsd.securit.selinux on the host.
>
> This remapping is useful when SELinux is enabled in guest and virtiofs
> as being used as rootfs. Guest and host SELinux policy might not match
> and host policy might deny security.selinux xattr setting by guest
> onto host. Or host might have SELinux disabled and in that case to
> be able to set security.selinux xattr, virtiofsd will need to have
> CAP_SYS_ADMIN (which we are trying to avoid). Being able to remap
> guest security.selinux (or other xattrs) on host to something else
> is also better from security point of view.
>
> But when we try this, we noticed that SELinux relabeling in guest
> is failing on some symlinks. When I debugged a little more, I
> came to know that "user.*" xattrs are not allowed on symlinks
> or special files.
>
> So if we allow owner (or CAP_FOWNER) to set user.* xattr, it will
> allow virtiofs to arbitrarily remap guests's xattrs to something
> else on host and that solves this SELinux issue nicely and provides
> two SELinux policies (host and guest) to co-exist nicely without
> interfering with each other.
>
> Thanks
> Vivek
>
>
> Vivek Goyal (1):
>    xattr: Allow user.* xattr on symlink and special files
>
>   fs/xattr.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
>
I just wanted to point out that the work Giuseppe is doing is to support 
nfs homedirs with container runtimes like Rootless Podman.


Basically fuse-overlayfs on top of NFS homedir needs to be able to use 
user xattrs to set file permissions and ownership fields to be 
represented to containers.

Currently NFS Servers do not understand User Namespace and seeing a 
client user attempting to chown to a different user, is blocked on the 
server, even though user namespace on the client allows it.  
fuse-overlay intercepts the chown from the container and writes out the 
user.Xattr the permissions and owner/group as user.Xattrs.  And all the 
server sees is the user modifying the xattrs now chowning the real UID 
of the file.



^ permalink raw reply

* Re: [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: Vivek Goyal @ 2021-07-09 17:59 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Christian Brauner, linux-fsdevel, linux-kernel, viro, virtio-fs,
	dwalsh, dgilbert, casey.schaufler, linux-security-module, selinux,
	tytso, miklos, gscrivan, jack, Christoph Hellwig, bfields
In-Reply-To: <710d1c6f-d477-384f-0cc1-8914258f1fb1@schaufler-ca.com>

On Fri, Jul 09, 2021 at 08:34:41AM -0700, Casey Schaufler wrote:
> On 7/9/2021 8:27 AM, Vivek Goyal wrote:
> > On Fri, Jul 09, 2021 at 11:19:15AM +0200, Christian Brauner wrote:
> >> On Thu, Jul 08, 2021 at 01:57:38PM -0400, Vivek Goyal wrote:
> >>> Currently user.* xattr are not allowed on symlink and special files.
> >>>
> >>> man xattr and recent discussion suggested that primary reason for this
> >>> restriction is how file permissions for symlinks and special files
> >>> are little different from regular files and directories.
> >>>
> >>> For symlinks, they are world readable/writable and if user xattr were
> >>> to be permitted, it will allow unpriviliged users to dump a huge amount
> >>> of user.* xattrs on symlinks without any control.
> >>>
> >>> For special files, permissions typically control capability to read/write
> >>> from devices (and not necessarily from filesystem). So if a user can
> >>> write to device (/dev/null), does not necessarily mean it should be allowed
> >>> to write large number of user.* xattrs on the filesystem device node is
> >>> residing in.
> >>>
> >>> This patch proposes to relax the restrictions a bit and allow file owner
> >>> or priviliged user (CAP_FOWNER), to be able to read/write user.* xattrs
> >>> on symlink and special files.
> >>>
> >>> virtiofs daemon has a need to store user.* xatrrs on all the files
> >>> (including symlinks and special files), and currently that fails. This
> >>> patch should help.
> >>>
> >>> Link: https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@redhat.com/
> >>> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >>> ---
> >> Seems reasonable and useful.
> >> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> >>
> >> One question, do all filesystem supporting xattrs deal with setting them
> >> on symlinks/device files correctly?
> > Wrote a simple bash script to do setfattr/getfattr user.foo xattr on
> > symlink and device node on ext4, xfs and btrfs and it works fine.
> 
> How about nfs, tmpfs, overlayfs and/or some of the other less conventional
> filesystems?

tmpfs does not support user.* xattr at all on any kind of files.

overlayfs works fine. I updated my test too.

nfs seems to have some issues. 

- I can set user.foo xattr on symlink and query it back using xattr name.

  getfattr -h -n user.foo foo-link.txt

  But when I try to dump all xattrs on this file, user.foo is being
  filtered out it looks like. Not sure why.

- I can't set "user.foo" xattr on a device node on nfs and I get
  "Permission denied". I am assuming nfs server is returning this.
  I am using knfsd with following in /etc/exports.

  /mnt/test/nfs-server 127.0.0.1(insecure,no_root_squash,rw,async)

Copying Bruce. He might have an idea.

Thanks
Vivek

> 
> >
> > https://github.com/rhvgoyal/misc/blob/master/generic-programs/user-xattr-special-files.sh
> >
> > I probably can add some more filesystems to test.
> >
> > Thanks
> > Vivek
> >
> >>>  fs/xattr.c | 10 ++++++----
> >>>  1 file changed, 6 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/fs/xattr.c b/fs/xattr.c
> >>> index 5c8c5175b385..2f1855c8b620 100644
> >>> --- a/fs/xattr.c
> >>> +++ b/fs/xattr.c
> >>> @@ -120,12 +120,14 @@ xattr_permission(struct user_namespace *mnt_userns, struct inode *inode,
> >>>  	}
> >>>  
> >>>  	/*
> >>> -	 * In the user.* namespace, only regular files and directories can have
> >>> -	 * extended attributes. For sticky directories, only the owner and
> >>> -	 * privileged users can write attributes.
> >>> +	 * In the user.* namespace, for symlinks and special files, only
> >>> +	 * the owner and priviliged users can read/write attributes.
> >>> +	 * For sticky directories, only the owner and privileged users can
> >>> +	 * write attributes.
> >>>  	 */
> >>>  	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
> >>> -		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
> >>> +		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
> >>> +		    !inode_owner_or_capable(mnt_userns, inode))
> >>>  			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
> >>>  		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
> >>>  		    (mask & MAY_WRITE) &&
> >>> -- 
> >>> 2.25.4
> >>>
> 


^ permalink raw reply

* Re: [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: Bruce Fields @ 2021-07-09 20:10 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Casey Schaufler, Christian Brauner, linux-fsdevel, linux-kernel,
	viro, virtio-fs, dwalsh, dgilbert, casey.schaufler,
	linux-security-module, selinux, tytso, miklos, gscrivan, jack,
	Christoph Hellwig
In-Reply-To: <20210709175947.GB398382@redhat.com>

On Fri, Jul 9, 2021 at 1:59 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> nfs seems to have some issues.

I'm not sure what the expected behavior is for nfs.  All I have for
now is some generic troubleshooting ideas, sorry:

> - I can set user.foo xattr on symlink and query it back using xattr name.
>
>   getfattr -h -n user.foo foo-link.txt
>
>   But when I try to dump all xattrs on this file, user.foo is being
>   filtered out it looks like. Not sure why.

Logging into the server and seeing what's set there could help confirm
whether it's the client or server that's at fault.  (Or watching the
traffic in wireshark; there are GET/SET/LISTXATTR ops that should be
easy to spot.)

> - I can't set "user.foo" xattr on a device node on nfs and I get
>   "Permission denied". I am assuming nfs server is returning this.

Wireshark should tell you whether it's the server or client doing that.

The RFC is https://datatracker.ietf.org/doc/html/rfc8276, and I don't
see any explicit statement about what the server should do in the case
of symlinks or device nodes, but I do see "Any regular file or
directory may have a set of extended attributes", so that was clearly
the assumption.  Also, NFS4ERR_WRONG_TYPE is listed as a possible
error return for the xattr ops.  But on a quick skim I don't see any
explicit checks in the nfsd code, so I *think* it's just relying on
the vfs for any file type checks.

--b.


^ permalink raw reply

* Re: [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: Theodore Ts'o @ 2021-07-09 20:36 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Vivek Goyal, Christian Brauner, linux-fsdevel, linux-kernel, viro,
	virtio-fs, dwalsh, dgilbert, casey.schaufler,
	linux-security-module, selinux, miklos, gscrivan, jack,
	Christoph Hellwig
In-Reply-To: <710d1c6f-d477-384f-0cc1-8914258f1fb1@schaufler-ca.com>

On Fri, Jul 09, 2021 at 08:34:41AM -0700, Casey Schaufler wrote:
> >> One question, do all filesystem supporting xattrs deal with setting them
> >> on symlinks/device files correctly?
> > Wrote a simple bash script to do setfattr/getfattr user.foo xattr on
> > symlink and device node on ext4, xfs and btrfs and it works fine.
> 
> How about nfs, tmpfs, overlayfs and/or some of the other less conventional
> filesystems?

As a suggestion, perhaps you could take your bash script and turn it
into an xfstests test so we can more easily test various file systems,
both now and once the commit is accepted, to look for regressions?

Cheers,

					- Ted

^ permalink raw reply

* [PATCH 0/2] net: cipso: bug fixes
From: Pavel Skripkin @ 2021-07-10  7:03 UTC (permalink / raw)
  To: paul, davem, yoshfuji, dsahern, kuba
  Cc: netdev, linux-security-module, linux-kernel, Pavel Skripkin

This small patch series fixes 2 bugs in cipso code. The first one was
found by syzbot and the second was found while testing fix for the
fisrt one. All 2 bugs can be triggered by reproducer provided by syzbot:

https://syzkaller.appspot.com/bug?id=d4bc7d67efe79c6ead3cb6bd94b84dbd287f1069

Pavel Skripkin (2):
  net: cipso: fix warnings in netlbl_cipsov4_add_std
  net: cipso: fix memory leak in cipso_v4_doi_free

 net/ipv4/cipso_ipv4.c            | 1 +
 net/netlabel/netlabel_cipso_v4.c | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

-- 
2.32.0


^ permalink raw reply

* [PATCH 1/2] net: cipso: fix warnings in netlbl_cipsov4_add_std
From: Pavel Skripkin @ 2021-07-10  7:03 UTC (permalink / raw)
  To: paul, davem, yoshfuji, dsahern, kuba
  Cc: netdev, linux-security-module, linux-kernel, Pavel Skripkin,
	syzbot+cdd51ee2e6b0b2e18c0d
In-Reply-To: <cover.1625900431.git.paskripkin@gmail.com>

Syzbot reported warning in netlbl_cipsov4_add(). The
problem was in too big doi_def->map.std->lvl.local_size
passed to kcalloc(). Since this value comes from userpace there is
no need to warn if value is not correct.

The same problem may occur with other kcalloc() calls in
this function, so, I've added __GFP_NOWARN flag to all
kcalloc() calls there.

Reported-and-tested-by: syzbot+cdd51ee2e6b0b2e18c0d@syzkaller.appspotmail.com
Fixes: 96cb8e3313c7 ("[NetLabel]: CIPSOv4 and Unlabeled packet integration")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
---
 net/netlabel/netlabel_cipso_v4.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netlabel/netlabel_cipso_v4.c b/net/netlabel/netlabel_cipso_v4.c
index 4f50a64315cf..50f40943c815 100644
--- a/net/netlabel/netlabel_cipso_v4.c
+++ b/net/netlabel/netlabel_cipso_v4.c
@@ -187,14 +187,14 @@ static int netlbl_cipsov4_add_std(struct genl_info *info,
 		}
 	doi_def->map.std->lvl.local = kcalloc(doi_def->map.std->lvl.local_size,
 					      sizeof(u32),
-					      GFP_KERNEL);
+					      GFP_KERNEL | __GFP_NOWARN);
 	if (doi_def->map.std->lvl.local == NULL) {
 		ret_val = -ENOMEM;
 		goto add_std_failure;
 	}
 	doi_def->map.std->lvl.cipso = kcalloc(doi_def->map.std->lvl.cipso_size,
 					      sizeof(u32),
-					      GFP_KERNEL);
+					      GFP_KERNEL | __GFP_NOWARN);
 	if (doi_def->map.std->lvl.cipso == NULL) {
 		ret_val = -ENOMEM;
 		goto add_std_failure;
@@ -263,7 +263,7 @@ static int netlbl_cipsov4_add_std(struct genl_info *info,
 		doi_def->map.std->cat.local = kcalloc(
 					      doi_def->map.std->cat.local_size,
 					      sizeof(u32),
-					      GFP_KERNEL);
+					      GFP_KERNEL | __GFP_NOWARN);
 		if (doi_def->map.std->cat.local == NULL) {
 			ret_val = -ENOMEM;
 			goto add_std_failure;
@@ -271,7 +271,7 @@ static int netlbl_cipsov4_add_std(struct genl_info *info,
 		doi_def->map.std->cat.cipso = kcalloc(
 					      doi_def->map.std->cat.cipso_size,
 					      sizeof(u32),
-					      GFP_KERNEL);
+					      GFP_KERNEL | __GFP_NOWARN);
 		if (doi_def->map.std->cat.cipso == NULL) {
 			ret_val = -ENOMEM;
 			goto add_std_failure;
-- 
2.32.0


^ permalink raw reply related

* [PATCH 2/2] net: cipso: fix memory leak in cipso_v4_doi_free
From: Pavel Skripkin @ 2021-07-10  7:03 UTC (permalink / raw)
  To: paul, davem, yoshfuji, dsahern, kuba
  Cc: netdev, linux-security-module, linux-kernel, Pavel Skripkin
In-Reply-To: <cover.1625900431.git.paskripkin@gmail.com>

When doi_def->type == CIPSO_V4_MAP_TRANS doi_def->map.std should
be freed to avoid memory leak.

Fail log:

BUG: memory leak
unreferenced object 0xffff88801b936d00 (size 64):
comm "a.out", pid 8478, jiffies 4295042353 (age 15.260s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 15 b8 12 26 00 00 00 00 00 00 00 00  .......&........
backtrace:
netlbl_cipsov4_add (net/netlabel/netlabel_cipso_v4.c:145 net/netlabel/netlabel_cipso_v4.c:416)
genl_family_rcv_msg_doit (net/netlink/genetlink.c:741)
genl_rcv_msg (net/netlink/genetlink.c:783 net/netlink/genetlink.c:800)
netlink_rcv_skb (net/netlink/af_netlink.c:2505)
genl_rcv (net/netlink/genetlink.c:813)

Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking with refrerence
counts")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
---
 net/ipv4/cipso_ipv4.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index bfaf327e9d12..e0480c6cebaa 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -472,6 +472,7 @@ void cipso_v4_doi_free(struct cipso_v4_doi *doi_def)
 		kfree(doi_def->map.std->lvl.local);
 		kfree(doi_def->map.std->cat.cipso);
 		kfree(doi_def->map.std->cat.local);
+		kfree(doi_def->map.std);
 		break;
 	}
 	kfree(doi_def);
-- 
2.32.0


^ permalink raw reply related

* Re: [PATCH 2/2] net: cipso: fix memory leak in cipso_v4_doi_free
From: Dongliang Mu @ 2021-07-10  7:29 UTC (permalink / raw)
  To: Pavel Skripkin
  Cc: Paul Moore, David S. Miller, yoshfuji, dsahern, Jakub Kicinski,
	open list:NETWORKING [GENERAL], linux-security-module,
	linux-kernel
In-Reply-To: <cec894625531da243df3a9f05466b83e107e50d7.1625900431.git.paskripkin@gmail.com>

On Sat, Jul 10, 2021 at 3:10 PM Pavel Skripkin <paskripkin@gmail.com> wrote:
>
> When doi_def->type == CIPSO_V4_MAP_TRANS doi_def->map.std should
> be freed to avoid memory leak.
>
> Fail log:
>
> BUG: memory leak
> unreferenced object 0xffff88801b936d00 (size 64):
> comm "a.out", pid 8478, jiffies 4295042353 (age 15.260s)
> hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> 00 00 00 00 15 b8 12 26 00 00 00 00 00 00 00 00  .......&........
> backtrace:
> netlbl_cipsov4_add (net/netlabel/netlabel_cipso_v4.c:145 net/netlabel/netlabel_cipso_v4.c:416)
> genl_family_rcv_msg_doit (net/netlink/genetlink.c:741)
> genl_rcv_msg (net/netlink/genetlink.c:783 net/netlink/genetlink.c:800)
> netlink_rcv_skb (net/netlink/af_netlink.c:2505)
> genl_rcv (net/netlink/genetlink.c:813)
>
> Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking with refrerence
> counts")
> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> ---
>  net/ipv4/cipso_ipv4.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index bfaf327e9d12..e0480c6cebaa 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -472,6 +472,7 @@ void cipso_v4_doi_free(struct cipso_v4_doi *doi_def)
>                 kfree(doi_def->map.std->lvl.local);
>                 kfree(doi_def->map.std->cat.cipso);
>                 kfree(doi_def->map.std->cat.local);
> +               kfree(doi_def->map.std);
>                 break;
>         }
>         kfree(doi_def);
> --

Hi Paval,

this patch is already merged by Paul. See [1] for more details.

[1] https://lore.kernel.org/netdev/CAHC9VhQZVOmy7n14nTSRGHzwN-y=E_JTUP+NpRCgD8rJN5sOGA@mail.gmail.com/T/

> 2.32.0
>

^ permalink raw reply

* Re: [PATCH 2/2] net: cipso: fix memory leak in cipso_v4_doi_free
From: Pavel Skripkin @ 2021-07-10  7:40 UTC (permalink / raw)
  To: Dongliang Mu
  Cc: Paul Moore, David S. Miller, yoshfuji, dsahern, Jakub Kicinski,
	open list:NETWORKING [GENERAL], linux-security-module,
	linux-kernel
In-Reply-To: <CAD-N9QWcOv0s4uzPW0kGk70tpkCjorQCKpa3RrtbxyMmSW5b=Q@mail.gmail.com>

On Sat, 10 Jul 2021 15:29:19 +0800
Dongliang Mu <mudongliangabcd@gmail.com> wrote:

> On Sat, Jul 10, 2021 at 3:10 PM Pavel Skripkin <paskripkin@gmail.com>
> wrote:
> >
> > When doi_def->type == CIPSO_V4_MAP_TRANS doi_def->map.std should
> > be freed to avoid memory leak.
> >
> > Fail log:
> >
> > BUG: memory leak
> > unreferenced object 0xffff88801b936d00 (size 64):
> > comm "a.out", pid 8478, jiffies 4295042353 (age 15.260s)
> > hex dump (first 32 bytes):
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > 00 00 00 00 15 b8 12 26 00 00 00 00 00 00 00 00  .......&........
> > backtrace:
> > netlbl_cipsov4_add (net/netlabel/netlabel_cipso_v4.c:145
> > net/netlabel/netlabel_cipso_v4.c:416) genl_family_rcv_msg_doit
> > (net/netlink/genetlink.c:741) genl_rcv_msg
> > (net/netlink/genetlink.c:783 net/netlink/genetlink.c:800)
> > netlink_rcv_skb (net/netlink/af_netlink.c:2505) genl_rcv
> > (net/netlink/genetlink.c:813)
> >
> > Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking
> > with refrerence counts")
> > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> > ---
> >  net/ipv4/cipso_ipv4.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> > index bfaf327e9d12..e0480c6cebaa 100644
> > --- a/net/ipv4/cipso_ipv4.c
> > +++ b/net/ipv4/cipso_ipv4.c
> > @@ -472,6 +472,7 @@ void cipso_v4_doi_free(struct cipso_v4_doi
> > *doi_def) kfree(doi_def->map.std->lvl.local);
> >                 kfree(doi_def->map.std->cat.cipso);
> >                 kfree(doi_def->map.std->cat.local);
> > +               kfree(doi_def->map.std);
> >                 break;
> >         }
> >         kfree(doi_def);
> > --
> 
> Hi Paval,
> 
> this patch is already merged by Paul. See [1] for more details.
> 
> [1]
> https://lore.kernel.org/netdev/CAHC9VhQZVOmy7n14nTSRGHzwN-y=E_JTUP+NpRCgD8rJN5sOGA@mail.gmail.com/T/
> 


Hi, Dongliang!

Thank you for the information. I'm wondering, can maintainer pick only 1
patch from series, or I should send v2? 



With regards,
Pavel Skripkin

^ permalink raw reply

* Re: [PATCH 2/2] net: cipso: fix memory leak in cipso_v4_doi_free
From: Dongliang Mu @ 2021-07-10  7:42 UTC (permalink / raw)
  To: Pavel Skripkin
  Cc: Paul Moore, David S. Miller, yoshfuji, dsahern, Jakub Kicinski,
	open list:NETWORKING [GENERAL], linux-security-module,
	linux-kernel
In-Reply-To: <20210710104052.1469c94a@gmail.com>

On Sat, Jul 10, 2021 at 3:41 PM Pavel Skripkin <paskripkin@gmail.com> wrote:
>
> On Sat, 10 Jul 2021 15:29:19 +0800
> Dongliang Mu <mudongliangabcd@gmail.com> wrote:
>
> > On Sat, Jul 10, 2021 at 3:10 PM Pavel Skripkin <paskripkin@gmail.com>
> > wrote:
> > >
> > > When doi_def->type == CIPSO_V4_MAP_TRANS doi_def->map.std should
> > > be freed to avoid memory leak.
> > >
> > > Fail log:
> > >
> > > BUG: memory leak
> > > unreferenced object 0xffff88801b936d00 (size 64):
> > > comm "a.out", pid 8478, jiffies 4295042353 (age 15.260s)
> > > hex dump (first 32 bytes):
> > > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > > 00 00 00 00 15 b8 12 26 00 00 00 00 00 00 00 00  .......&........
> > > backtrace:
> > > netlbl_cipsov4_add (net/netlabel/netlabel_cipso_v4.c:145
> > > net/netlabel/netlabel_cipso_v4.c:416) genl_family_rcv_msg_doit
> > > (net/netlink/genetlink.c:741) genl_rcv_msg
> > > (net/netlink/genetlink.c:783 net/netlink/genetlink.c:800)
> > > netlink_rcv_skb (net/netlink/af_netlink.c:2505) genl_rcv
> > > (net/netlink/genetlink.c:813)
> > >
> > > Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking
> > > with refrerence counts")
> > > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> > > ---
> > >  net/ipv4/cipso_ipv4.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> > > index bfaf327e9d12..e0480c6cebaa 100644
> > > --- a/net/ipv4/cipso_ipv4.c
> > > +++ b/net/ipv4/cipso_ipv4.c
> > > @@ -472,6 +472,7 @@ void cipso_v4_doi_free(struct cipso_v4_doi
> > > *doi_def) kfree(doi_def->map.std->lvl.local);
> > >                 kfree(doi_def->map.std->cat.cipso);
> > >                 kfree(doi_def->map.std->cat.local);
> > > +               kfree(doi_def->map.std);
> > >                 break;
> > >         }
> > >         kfree(doi_def);
> > > --
> >
> > Hi Paval,
> >
> > this patch is already merged by Paul. See [1] for more details.
> >
> > [1]
> > https://lore.kernel.org/netdev/CAHC9VhQZVOmy7n14nTSRGHzwN-y=E_JTUP+NpRCgD8rJN5sOGA@mail.gmail.com/T/
> >
>
>
> Hi, Dongliang!
>
> Thank you for the information. I'm wondering, can maintainer pick only 1
> patch from series, or I should send v2?

I don't know. Maybe you can wait for the reply of the maintainers.

>
>
>
> With regards,
> Pavel Skripkin

^ permalink raw reply

* Re: [PATCH v1 1/4] landlock.7: Add a new page to introduce Landlock
From: Alejandro Colomar (man-pages) @ 2021-07-10 18:12 UTC (permalink / raw)
  To: Mickaël Salaün, Michael Kerrisk
  Cc: Jann Horn, Jonathan Corbet, Kees Cook, Randy Dunlap,
	Vincent Dagonneau, landlock, linux-kernel, linux-man,
	linux-security-module, Mickaël Salaün
In-Reply-To: <20210706182217.32338-2-mic@digikod.net>

Hi Mickaël,

On 7/6/21 8:22 PM, Mickaël Salaün wrote:
> From: Mickaël Salaün <mic@linux.microsoft.com>
> 
> From the user point of view, Landlock is a set of system calls enabling
> to build and enforce a set of access-control rules.  A ruleset can be
> created with landlock_create_ruleset(2), populated with
> landlock_add_rule(2) and enforced with landlock_restrict_self(2).  This
> man page gives an overview of the whole mechanism.  Details of these
> system calls are documented in their respective man pages.
> 
> This is an adaptation of
> https://www.kernel.org/doc/html/v5.13/userspace-api/landlock.html
> 
> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
> Link: https://lore.kernel.org/r/20210706182217.32338-2-mic@digikod.net

Please see a few comments below.

Thanks,

Alex

> ---
>  man7/landlock.7 | 354 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 354 insertions(+)
>  create mode 100644 man7/landlock.7
> 
> diff --git a/man7/landlock.7 b/man7/landlock.7
> new file mode 100644
> index 000000000000..32127d3b2061
> --- /dev/null
> +++ b/man7/landlock.7
> @@ -0,0 +1,354 @@
> +.\" Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
> +.\" Copyright © 2019-2020 ANSSI
> +.\" Copyright © 2021 Microsoft Corporation
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH LANDLOCK 7 2021-06-27 Linux "Linux Programmer's Manual"
> +.SH NAME
> +Landlock \- security sandboxing
> +.SH DESCRIPTION
> +Landlock is a sandboxing mechanism that enables any processes to securely
> +restrict themselves and their future children.
> +Because Landlock is a stackable LSM, it makes possible to create safe
> +security sandboxes as new security layers in addition to the existing
> +system-wide access-controls.  This kind of sandbox is expected to help
> +mitigate the security impact of bugs, and unexpected or malicious behaviors
> +in applications.
> +.PP
> +A Landlock security policy is a set of access rights (e.g., open a file in
> +read-only, make a directory, etc.) tied to a file hierarchy.  Such policy
> +can be configured and enforced by processes for themselves using three
> +system calls:


See the following extract from man-pages(7):

$ man 7 man-pages | sed -n '/Use semantic newlines/,/^$/p';
   Use semantic newlines
       In the source of a manual page,  new  sentences  should  be
       started  on new lines, and long sentences should split into
       lines at clause breaks (commas, semicolons, colons, and  so
       on).   This  convention,  sometimes known as "semantic new‐
       lines", makes it easier to see the effect of patches, which
       often  operate at the level of individual sentences or sen‐
       tence clauses.

> +.IP \(bu 2
> +.BR landlock_create_ruleset (2)
> +creates a new ruleset;
> +.IP \(bu
> +.BR landlock_add_rule (2)
> +adds a new rule to a ruleset;
> +.IP \(bu
> +.BR landlock_restrict_self (2)
> +enforces a ruleset on the calling thread.
> +.PP
> +To be able to use these system calls, the running kernel must support
> +Landlock and it must be enabled at boot time.
> +.\"
> +.SS Landlock rules
> +A Landlock rule describes an action on an object.  An object is currently a
> +file hierarchy, and the related filesystem actions are defined with access
> +rights (see
> +.BR landlock_add_rule (2)
> +).  A set of rules is aggregated in a ruleset, which can
> +then restrict the thread enforcing it, and its future children.
> +.\"
> +.SS Defining and enforcing a security policy
> +We first need to create the ruleset that will contain our rules.  For this
> +example, the ruleset will contain rules that only allow read actions, but
> +write actions will be denied.  The ruleset then needs to handle both of
> +these kind of actions.  See below for the description of filesystem
> +actions.
> +.PP
> +.in +4n
> +.EX
> +int ruleset_fd;
> +struct landlock_ruleset_attr ruleset_attr = {
> +    .handled_access_fs =
> +        LANDLOCK_ACCESS_FS_EXECUTE |
> +        LANDLOCK_ACCESS_FS_WRITE_FILE |
> +        LANDLOCK_ACCESS_FS_READ_FILE |
> +        LANDLOCK_ACCESS_FS_READ_DIR |
> +        LANDLOCK_ACCESS_FS_REMOVE_DIR |
> +        LANDLOCK_ACCESS_FS_REMOVE_FILE |
> +        LANDLOCK_ACCESS_FS_MAKE_CHAR |
> +        LANDLOCK_ACCESS_FS_MAKE_DIR |
> +        LANDLOCK_ACCESS_FS_MAKE_REG |
> +        LANDLOCK_ACCESS_FS_MAKE_SOCK |
> +        LANDLOCK_ACCESS_FS_MAKE_FIFO |
> +        LANDLOCK_ACCESS_FS_MAKE_BLOCK |
> +        LANDLOCK_ACCESS_FS_MAKE_SYM,
> +};
> +
> +ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
> +if (ruleset_fd < 0) {
> +    perror("Failed to create a ruleset");
> +    return 1;
> +}
> +.EE
> +.in
> +.PP
> +We can now add a new rule to this ruleset thanks to the returned file
> +descriptor referring to this ruleset.  The rule will only allow reading the
> +file hierarchy
> +.I /usr

Use the following for formatting part of a "word":

.IR /usr .

See man_groff(7).

> +\&.  Without another rule, write actions would then be denied by the
> +ruleset.  To add
> +.I /usr
> +to the ruleset, we open it with the
> +.I O_PATH
> +flag and fill the
> +.I struct landlock_path_beneath_attr
> +with this file descriptor.
> +.PP
> +.in +4n
> +.EX
> +int err;
> +struct landlock_path_beneath_attr path_beneath = {
> +    .allowed_access =
> +        LANDLOCK_ACCESS_FS_EXECUTE |
> +        LANDLOCK_ACCESS_FS_READ_FILE |
> +        LANDLOCK_ACCESS_FS_READ_DIR,
> +};
> +
> +path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
> +if (path_beneath.parent_fd < 0) {
> +    perror("Failed to open file");
> +    close(ruleset_fd);
> +    return 1;
> +}
> +err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
> +                        &path_beneath, 0);
> +close(path_beneath.parent_fd);
> +if (err) {
> +    perror("Failed to update ruleset");
> +    close(ruleset_fd);
> +    return 1;
> +}
> +.EE
> +.in
> +.PP
> +We now have a ruleset with one rule allowing read access to
> +.I /usr
> +while denying all other handled accesses for the filesystem.  The next step
> +is to restrict the current thread from gaining more privileges (e.g.,
> +thanks to a set-user-ID binary).
> +.PP
> +.in +4n
> +.EX
> +if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
> +    perror("Failed to restrict privileges");
> +    close(ruleset_fd);
> +    return 1;
> +}
> +.EE
> +.in
> +.PP
> +The current thread is now ready to sandbox itself with the ruleset.
> +.PP
> +.in +4n
> +.EX
> +if (landlock_restrict_self(ruleset_fd, 0)) {
> +    perror("Failed to enforce ruleset");
> +    close(ruleset_fd);
> +    return 1;
> +}
> +close(ruleset_fd);
> +.EE
> +.in
> +.PP
> +If the
> +.BR landlock_restrict_self (2)
> +system call succeeds, the current thread is now restricted and this policy
> +will be enforced on all its subsequently created children as well.  Once a
> +thread is landlocked, there is no way to remove its security policy; only
> +adding more restrictions is allowed.  These threads are now in a new
> +Landlock domain, merge of their parent one (if any) with the new ruleset.
> +.PP
> +Full working code can be found in
> +.UR https://git.kernel.org\:/pub\:/scm\:/linux\:/kernel\:/git\:/stable\:/linux.git\:/tree\:/samples\:/landlock\:/sandboxer.c
> +.UE
> +.\"
> +.SS Filesystem actions
> +These flags enable to restrict a sandboxed process to a set of actions on
> +files and directories.  Files or directories opened before the sandboxing
> +are not subject to these restrictions.  See
> +.BR landlock_add_rule (2)
> +and
> +.BR landlock_create_ruleset (2)
> +for more context.
> +.PP
> +A file can only receive these access rights:
> +.TP
> +.BR LANDLOCK_ACCESS_FS_EXECUTE

Here it should be .B and not .BR (no roman part).  Also below...

> +Execute a file.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_WRITE_FILE
> +Open a file with write access.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_READ_FILE
> +Open a file with read access.
> +.PP
> +A directory can receive access rights related to files or directories.  The
> +following access right is applied to the directory itself, and the
> +directories beneath it:
> +.TP
> +.BR LANDLOCK_ACCESS_FS_READ_DIR
> +Open a directory or list its content.
> +.PP
> +However, the following access rights only apply to the content of a
> +directory, not the directory itself:
> +.TP
> +.BR LANDLOCK_ACCESS_FS_REMOVE_DIR
> +Remove an empty directory or rename one.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_REMOVE_FILE
> +Unlink (or rename) a file.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_MAKE_CHAR
> +Create (or rename or link) a character device.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_MAKE_DIR
> +Create (or rename) a directory.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_MAKE_REG
> +Create (or rename or link) a regular file.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_MAKE_SOCK
> +Create (or rename or link) a UNIX domain socket.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_MAKE_FIFO
> +Create (or rename or link) a named pipe.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_MAKE_BLOCK
> +Create (or rename or link) a block device.
> +.TP
> +.BR LANDLOCK_ACCESS_FS_MAKE_SYM
> +Create (or rename or link) a symbolic link.
> +.\"
> +.SS Layers of file path access rights
> +Each time a thread enforces a ruleset on itself, it updates its Landlock
> +domain with a new layer of policy.  Indeed, this complementary policy is
> +composed with the potentially other rulesets already restricting this
> +thread.  A sandboxed thread can then safely add more constraints to itself
> +with a new enforced ruleset.
> +.PP
> +One policy layer grants access to a file path if at least one of its rules
> +encountered on the path grants the access.  A sandboxed thread can only
> +access a file path if all its enforced policy layers grant the access as
> +well as all the other system access controls (e.g., filesystem DAC, other
> +LSM policies, etc.).
> +.\"
> +.SS Bind mounts and OverlayFS
> +Landlock enables restricting access to file hierarchies, which means that
> +these access rights can be propagated with bind mounts (cf.
> +.BR mount_namespaces (7)
> +) but not with OverlayFS.
> +.PP
> +A bind mount mirrors a source file hierarchy to a destination.  The
> +destination hierarchy is then composed of the exact same files, on which
> +Landlock rules can be tied, either via the source or the destination path.
> +These rules restrict access when they are encountered on a path, which
> +means that they can restrict access to multiple file hierarchies at the
> +same time, whether these hierarchies are the result of bind mounts or not.
> +.PP
> +An OverlayFS mount point consists of upper and lower layers.  These layers
> +are combined in a merge directory, result of the mount point.  This merge
> +hierarchy may include files from the upper and lower layers, but
> +modifications performed on the merge hierarchy only reflects on the upper
> +layer.  From a Landlock policy point of view, each OverlayFS layers and
> +merge hierarchies are standalone and contains their own set of files and
> +directories, which is different from bind mounts.  A policy restricting an
> +OverlayFS layer will not restrict the resulted merged hierarchy, and vice
> +versa.  Landlock users should then only think about file hierarchies they
> +want to allow access to, regardless of the underlying filesystem.
> +.\"
> +.SS Inheritance
> +Every new thread resulting from a
> +.BR clone (2)
> +inherits Landlock domain restrictions from its parent.  This is similar to
> +the
> +.BR seccomp (2)
> +inheritance or any other LSM dealing with task's
> +.BR credentials (7)
> +\&.  For instance, one process's thread may apply Landlock rules to itself,
> +but they will not be automatically applied to other sibling threads (unlike
> +POSIX thread credential changes, cf.
> +.BR nptl (7)
> +).
> +.PP
> +When a thread sandboxes itself, we have the guarantee that the related
> +security policy will stay enforced on all this thread's descendants.  This
> +allows creating standalone and modular security policies per application,
> +which will automatically be composed between themselves according to their
> +runtime parent policies.
> +.\"
> +.SS Ptrace restrictions
> +A sandboxed process has less privileges than a non-sandboxed process and
> +must then be subject to additional restrictions when manipulating another
> +process.  To be allowed to use
> +.BR ptrace (2)
> +and related syscalls on a target process, a sandboxed process should have a
> +subset of the target process rules, which means the tracee must be in a
> +sub-domain of the tracer.
> +.SH VERSIONS
> +Landlock was added in Linux 5.13.
> +.SH NOTES
> +Landlock is enabled by CONFIG_SECURITY_LANDLOCK.
> +The
> +.I lsm=lsm1,...,lsmN
> +command line parameter controls the sequence of the initialization of
> +Linux Security Modules.
> +It must contain the string
> +.I landlock
> +to enable Landlock.
> +If the command line parameter is not specified,
> +the initialization falls back to the value of the deprecated
> +.I security=
> +command line parameter and further to the value of CONFIG_LSM.
> +We can check that Landlock is enabled by looking for
> +.I
> +landlock: Up and running.
> +in kernel logs.
> +.\"
> +.PP
> +It is currently not possible to restrict some file-related actions
> +accessible through these syscall families:
> +.BR chdir (2)
> +,
> +.BR truncate (2)
> +,

In a single line:

.BR truncate (2),

> +.BR stat (2)
> +,
> +.BR flock (2)
> +,
> +.BR chmod (2)
> +,
> +.BR chown (2)
> +,
> +.BR setxattr (2)
> +,
> +.BR utime (2)
> +,
> +.BR ioctl (2)
> +,
> +.BR fcntl (2)
> +,
> +.BR access (2)
> +\&.

And

.BR access (2).

> +Future Landlock evolutions will enable to restrict them.
> +.SH SEE ALSO
> +.BR landlock_create_ruleset (2),
> +.BR landlock_add_rule (2),
> +.BR landlock_restrict_self (2)
> +.PP
> +.UR https://landlock.io\:/
> +.UE
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply

* Re: [PATCH v3 1/2] perf: Fix required permissions if sigtrap is requested
From: Marco Elver @ 2021-07-12 10:32 UTC (permalink / raw)
  To: elver, peterz
  Cc: tglx, mingo, dvyukov, glider, kasan-dev, linux-kernel, mingo,
	acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
	linux-perf-users, ebiederm, omosnace, serge,
	linux-security-module, stable
In-Reply-To: <20210705084453.2151729-1-elver@google.com>

It'd be good to get this sorted -- please take another look.

Many thanks,
-- Marco

On Mon, 5 Jul 2021 at 10:45, Marco Elver <elver@google.com> wrote:
> If perf_event_open() is called with another task as target and
> perf_event_attr::sigtrap is set, and the target task's user does not
> match the calling user, also require the CAP_KILL capability or
> PTRACE_MODE_ATTACH permissions.
>
> Otherwise, with the CAP_PERFMON capability alone it would be possible
> for a user to send SIGTRAP signals via perf events to another user's
> tasks. This could potentially result in those tasks being terminated if
> they cannot handle SIGTRAP signals.
>
> Note: The check complements the existing capability check, but is not
> supposed to supersede the ptrace_may_access() check. At a high level we
> now have:
>
>         capable of CAP_PERFMON and (CAP_KILL if sigtrap)
>                 OR
>         ptrace_may_access(...) // also checks for same thread-group and uid
>
> Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
> Cc: <stable@vger.kernel.org> # 5.13+
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v3:
> * Upgrade ptrace mode check to ATTACH if attr.sigtrap, otherwise it's
>   possible to change the target task (send signal) even if only read
>   ptrace permissions were granted (reported by Eric W. Biederman).
>
> v2: https://lkml.kernel.org/r/20210701083842.580466-1-elver@google.com
> * Drop kill_capable() and just check CAP_KILL (reported by Ondrej Mosnacek).
> * Use ns_capable(__task_cred(task)->user_ns, CAP_KILL) to check for
>   capability in target task's ns (reported by Ondrej Mosnacek).
>
> v1: https://lkml.kernel.org/r/20210630093709.3612997-1-elver@google.com
> ---
>  kernel/events/core.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index fe88d6eea3c2..f79ee82e644a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -12152,10 +12152,33 @@ SYSCALL_DEFINE5(perf_event_open,
>         }
>
>         if (task) {
> +               unsigned int ptrace_mode = PTRACE_MODE_READ_REALCREDS;
> +               bool is_capable;
> +
>                 err = down_read_interruptible(&task->signal->exec_update_lock);
>                 if (err)
>                         goto err_file;
>
> +               is_capable = perfmon_capable();
> +               if (attr.sigtrap) {
> +                       /*
> +                        * perf_event_attr::sigtrap sends signals to the other
> +                        * task. Require the current task to also have
> +                        * CAP_KILL.
> +                        */
> +                       rcu_read_lock();
> +                       is_capable &= ns_capable(__task_cred(task)->user_ns, CAP_KILL);
> +                       rcu_read_unlock();
> +
> +                       /*
> +                        * If the required capabilities aren't available, checks
> +                        * for ptrace permissions: upgrade to ATTACH, since
> +                        * sending signals can effectively change the target
> +                        * task.
> +                        */
> +                       ptrace_mode = PTRACE_MODE_ATTACH_REALCREDS;
> +               }
> +
>                 /*
>                  * Preserve ptrace permission check for backwards compatibility.
>                  *
> @@ -12165,7 +12188,7 @@ SYSCALL_DEFINE5(perf_event_open,
>                  * perf_event_exit_task() that could imply).
>                  */
>                 err = -EACCES;
> -               if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
> +               if (!is_capable && !ptrace_may_access(task, ptrace_mode))
>                         goto err_cred;
>         }
>
> --
> 2.32.0.93.g670b81a890-goog
>

^ permalink raw reply

* Re: [Virtio-fs] [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: Greg Kurz @ 2021-07-12 12:49 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Vivek Goyal, Christian Brauner, gscrivan, tytso, miklos, selinux,
	linux-kernel, virtio-fs, casey.schaufler, linux-security-module,
	viro, Christoph Hellwig, linux-fsdevel, jack
In-Reply-To: <710d1c6f-d477-384f-0cc1-8914258f1fb1@schaufler-ca.com>

On Fri, 9 Jul 2021 08:34:41 -0700
Casey Schaufler <casey@schaufler-ca.com> wrote:

> On 7/9/2021 8:27 AM, Vivek Goyal wrote:
> > On Fri, Jul 09, 2021 at 11:19:15AM +0200, Christian Brauner wrote:
> >> On Thu, Jul 08, 2021 at 01:57:38PM -0400, Vivek Goyal wrote:
> >>> Currently user.* xattr are not allowed on symlink and special files.
> >>>
> >>> man xattr and recent discussion suggested that primary reason for this
> >>> restriction is how file permissions for symlinks and special files
> >>> are little different from regular files and directories.
> >>>
> >>> For symlinks, they are world readable/writable and if user xattr were
> >>> to be permitted, it will allow unpriviliged users to dump a huge amount
> >>> of user.* xattrs on symlinks without any control.
> >>>
> >>> For special files, permissions typically control capability to read/write
> >>> from devices (and not necessarily from filesystem). So if a user can
> >>> write to device (/dev/null), does not necessarily mean it should be allowed
> >>> to write large number of user.* xattrs on the filesystem device node is
> >>> residing in.
> >>>
> >>> This patch proposes to relax the restrictions a bit and allow file owner
> >>> or priviliged user (CAP_FOWNER), to be able to read/write user.* xattrs
> >>> on symlink and special files.
> >>>
> >>> virtiofs daemon has a need to store user.* xatrrs on all the files
> >>> (including symlinks and special files), and currently that fails. This
> >>> patch should help.
> >>>
> >>> Link: https://lore.kernel.org/linux-fsdevel/20210625191229.1752531-1-vgoyal@redhat.com/
> >>> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> >>> ---
> >> Seems reasonable and useful.
> >> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> >>
> >> One question, do all filesystem supporting xattrs deal with setting them
> >> on symlinks/device files correctly?
> > Wrote a simple bash script to do setfattr/getfattr user.foo xattr on
> > symlink and device node on ext4, xfs and btrfs and it works fine.
> 
> How about nfs, tmpfs, overlayfs and/or some of the other less conventional
> filesystems?
> 

How about virtiofs then ? :-)

> >
> > https://github.com/rhvgoyal/misc/blob/master/generic-programs/user-xattr-special-files.sh
> >
> > I probably can add some more filesystems to test.
> >
> > Thanks
> > Vivek
> >
> >>>  fs/xattr.c | 10 ++++++----
> >>>  1 file changed, 6 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/fs/xattr.c b/fs/xattr.c
> >>> index 5c8c5175b385..2f1855c8b620 100644
> >>> --- a/fs/xattr.c
> >>> +++ b/fs/xattr.c
> >>> @@ -120,12 +120,14 @@ xattr_permission(struct user_namespace *mnt_userns, struct inode *inode,
> >>>  	}
> >>>  
> >>>  	/*
> >>> -	 * In the user.* namespace, only regular files and directories can have
> >>> -	 * extended attributes. For sticky directories, only the owner and
> >>> -	 * privileged users can write attributes.
> >>> +	 * In the user.* namespace, for symlinks and special files, only
> >>> +	 * the owner and priviliged users can read/write attributes.
> >>> +	 * For sticky directories, only the owner and privileged users can
> >>> +	 * write attributes.
> >>>  	 */
> >>>  	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
> >>> -		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
> >>> +		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode) &&
> >>> +		    !inode_owner_or_capable(mnt_userns, inode))
> >>>  			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
> >>>  		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
> >>>  		    (mask & MAY_WRITE) &&
> >>> -- 
> >>> 2.25.4
> >>>
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 


^ permalink raw reply

* Re: [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: Vivek Goyal @ 2021-07-12 14:02 UTC (permalink / raw)
  To: Bruce Fields
  Cc: Casey Schaufler, Christian Brauner, linux-fsdevel, linux-kernel,
	viro, virtio-fs, dwalsh, dgilbert, casey.schaufler,
	linux-security-module, selinux, tytso, miklos, gscrivan, jack,
	Christoph Hellwig
In-Reply-To: <CAPL3RVGKg4G5qiiHo7KYPcsWWgeoW=qNPOSQpd3Sv329jrWrLQ@mail.gmail.com>

On Fri, Jul 09, 2021 at 04:10:16PM -0400, Bruce Fields wrote:
> On Fri, Jul 9, 2021 at 1:59 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > nfs seems to have some issues.
> 
> I'm not sure what the expected behavior is for nfs.  All I have for
> now is some generic troubleshooting ideas, sorry:
> 
> > - I can set user.foo xattr on symlink and query it back using xattr name.
> >
> >   getfattr -h -n user.foo foo-link.txt
> >
> >   But when I try to dump all xattrs on this file, user.foo is being
> >   filtered out it looks like. Not sure why.
> 
> Logging into the server and seeing what's set there could help confirm
> whether it's the client or server that's at fault.  (Or watching the
> traffic in wireshark; there are GET/SET/LISTXATTR ops that should be
> easy to spot.)
> 
> > - I can't set "user.foo" xattr on a device node on nfs and I get
> >   "Permission denied". I am assuming nfs server is returning this.
> 
> Wireshark should tell you whether it's the server or client doing that.
> 
> The RFC is https://datatracker.ietf.org/doc/html/rfc8276, and I don't
> see any explicit statement about what the server should do in the case
> of symlinks or device nodes, but I do see "Any regular file or
> directory may have a set of extended attributes", so that was clearly
> the assumption.  Also, NFS4ERR_WRONG_TYPE is listed as a possible
> error return for the xattr ops.  But on a quick skim I don't see any
> explicit checks in the nfsd code, so I *think* it's just relying on
> the vfs for any file type checks.

Hi Bruce,

Thanks for the response. I am just trying to do set a user.foo xattr on
a device node on nfs.

setfattr -n "user.foo" -v "bar" /mnt/nfs/test-dev

and I get -EACCESS.

I put some printk() statements and EACCESS is being returned from here.

nfs4_xattr_set_nfs4_user() {
        if (!nfs_access_get_cached(inode, current_cred(), &cache, true)) {
                if (!(cache.mask & NFS_ACCESS_XAWRITE)) {
                        return -EACCES;
                }
        }
}

Value of cache.mask=0xd at the time of error.

Thanks
Vivek


^ permalink raw reply

* Re: [PATCH 1/2] net: cipso: fix warnings in netlbl_cipsov4_add_std
From: Paul Moore @ 2021-07-12 15:03 UTC (permalink / raw)
  To: Pavel Skripkin
  Cc: davem, yoshfuji, dsahern, kuba, netdev, linux-security-module,
	linux-kernel, syzbot+cdd51ee2e6b0b2e18c0d
In-Reply-To: <53de0ccd1aa3fffa6bce2a2ae7a5ca07e0af6d3a.1625900431.git.paskripkin@gmail.com>

On Sat, Jul 10, 2021 at 3:03 AM Pavel Skripkin <paskripkin@gmail.com> wrote:
>
> Syzbot reported warning in netlbl_cipsov4_add(). The
> problem was in too big doi_def->map.std->lvl.local_size
> passed to kcalloc(). Since this value comes from userpace there is
> no need to warn if value is not correct.
>
> The same problem may occur with other kcalloc() calls in
> this function, so, I've added __GFP_NOWARN flag to all
> kcalloc() calls there.
>
> Reported-and-tested-by: syzbot+cdd51ee2e6b0b2e18c0d@syzkaller.appspotmail.com
> Fixes: 96cb8e3313c7 ("[NetLabel]: CIPSOv4 and Unlabeled packet integration")
> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> ---
>  net/netlabel/netlabel_cipso_v4.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

This seems fine to me, callers will get a ENOMEM error code too so it
isn't like the failure is going to be a mystery, especially in the
case where an obscenely large translation mapping is being attempted.

Acked-by: Paul Moore <paul@paul-moore.com>

As an aside, I see no reason why this patch can't be merged and 2/2
simply dropped as already in-tree.  As has already been pointed out,
patch 2/2 is a duplicate; the in-tree commit is d612c3f3fae2 ("net:
ipv4: fix memory leak in netlbl_cipsov4_add_std").

> diff --git a/net/netlabel/netlabel_cipso_v4.c b/net/netlabel/netlabel_cipso_v4.c
> index 4f50a64315cf..50f40943c815 100644
> --- a/net/netlabel/netlabel_cipso_v4.c
> +++ b/net/netlabel/netlabel_cipso_v4.c
> @@ -187,14 +187,14 @@ static int netlbl_cipsov4_add_std(struct genl_info *info,
>                 }
>         doi_def->map.std->lvl.local = kcalloc(doi_def->map.std->lvl.local_size,
>                                               sizeof(u32),
> -                                             GFP_KERNEL);
> +                                             GFP_KERNEL | __GFP_NOWARN);
>         if (doi_def->map.std->lvl.local == NULL) {
>                 ret_val = -ENOMEM;
>                 goto add_std_failure;
>         }
>         doi_def->map.std->lvl.cipso = kcalloc(doi_def->map.std->lvl.cipso_size,
>                                               sizeof(u32),
> -                                             GFP_KERNEL);
> +                                             GFP_KERNEL | __GFP_NOWARN);
>         if (doi_def->map.std->lvl.cipso == NULL) {
>                 ret_val = -ENOMEM;
>                 goto add_std_failure;
> @@ -263,7 +263,7 @@ static int netlbl_cipsov4_add_std(struct genl_info *info,
>                 doi_def->map.std->cat.local = kcalloc(
>                                               doi_def->map.std->cat.local_size,
>                                               sizeof(u32),
> -                                             GFP_KERNEL);
> +                                             GFP_KERNEL | __GFP_NOWARN);
>                 if (doi_def->map.std->cat.local == NULL) {
>                         ret_val = -ENOMEM;
>                         goto add_std_failure;
> @@ -271,7 +271,7 @@ static int netlbl_cipsov4_add_std(struct genl_info *info,
>                 doi_def->map.std->cat.cipso = kcalloc(
>                                               doi_def->map.std->cat.cipso_size,
>                                               sizeof(u32),
> -                                             GFP_KERNEL);
> +                                             GFP_KERNEL | __GFP_NOWARN);
>                 if (doi_def->map.std->cat.cipso == NULL) {
>                         ret_val = -ENOMEM;
>                         goto add_std_failure;
> --
> 2.32.0

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH v2 1/1] xattr: Allow user.* xattr on symlink and special files
From: J. Bruce Fields @ 2021-07-12 15:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Bruce Fields, Casey Schaufler, Christian Brauner, linux-fsdevel,
	linux-kernel, viro, virtio-fs, dwalsh, dgilbert, casey.schaufler,
	linux-security-module, selinux, tytso, miklos, gscrivan, jack,
	Christoph Hellwig
In-Reply-To: <20210712140247.GA486376@redhat.com>

On Mon, Jul 12, 2021 at 10:02:47AM -0400, Vivek Goyal wrote:
> On Fri, Jul 09, 2021 at 04:10:16PM -0400, Bruce Fields wrote:
> > On Fri, Jul 9, 2021 at 1:59 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > nfs seems to have some issues.
> > 
> > I'm not sure what the expected behavior is for nfs.  All I have for
> > now is some generic troubleshooting ideas, sorry:
> > 
> > > - I can set user.foo xattr on symlink and query it back using xattr name.
> > >
> > >   getfattr -h -n user.foo foo-link.txt
> > >
> > >   But when I try to dump all xattrs on this file, user.foo is being
> > >   filtered out it looks like. Not sure why.
> > 
> > Logging into the server and seeing what's set there could help confirm
> > whether it's the client or server that's at fault.  (Or watching the
> > traffic in wireshark; there are GET/SET/LISTXATTR ops that should be
> > easy to spot.)
> > 
> > > - I can't set "user.foo" xattr on a device node on nfs and I get
> > >   "Permission denied". I am assuming nfs server is returning this.
> > 
> > Wireshark should tell you whether it's the server or client doing that.
> > 
> > The RFC is https://datatracker.ietf.org/doc/html/rfc8276, and I don't
> > see any explicit statement about what the server should do in the case
> > of symlinks or device nodes, but I do see "Any regular file or
> > directory may have a set of extended attributes", so that was clearly
> > the assumption.  Also, NFS4ERR_WRONG_TYPE is listed as a possible
> > error return for the xattr ops.  But on a quick skim I don't see any
> > explicit checks in the nfsd code, so I *think* it's just relying on
> > the vfs for any file type checks.
> 
> Hi Bruce,
> 
> Thanks for the response. I am just trying to do set a user.foo xattr on
> a device node on nfs.
> 
> setfattr -n "user.foo" -v "bar" /mnt/nfs/test-dev
> 
> and I get -EACCESS.
> 
> I put some printk() statements and EACCESS is being returned from here.
> 
> nfs4_xattr_set_nfs4_user() {
>         if (!nfs_access_get_cached(inode, current_cred(), &cache, true)) {
>                 if (!(cache.mask & NFS_ACCESS_XAWRITE)) {
>                         return -EACCES;
>                 }
>         }
> }
> 
> Value of cache.mask=0xd at the time of error.

Looks like 0xd is what the server returns to access on a device node
with mode bits rw- for the caller.

Commit c11d7fd1b317 "nfsd: take xattr bits into account for permission
checks" added the ACCESS_X* bits for regular files and directories but
not others.

But you don't want to determine permission from the mode bits anyway,
you want it to depend on the owner, so I guess we should be calling
xattr_permission somewhere if we want that behavior.

The RFC assumes user xattrs are for regular files and directories,
without, as far as I can tell, actually explicitly forbidding them on
other objects.  We should also raise this with the working group if we
want to increase the chances that you'll get the behavior you want on
non-Linux servers.

The "User extended attributes" section of the xattr(7) man page will
need updating.

--b.

^ permalink raw reply

* [PATCH v2 2/4] landlock_create_ruleset.2: Document new syscall
From: Mickaël Salaün @ 2021-07-12 15:57 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk
  Cc: Mickaël Salaün, Jann Horn, Jonathan Corbet, Kees Cook,
	Randy Dunlap, Vincent Dagonneau, landlock, linux-kernel,
	linux-man, linux-security-module, Mickaël Salaün
In-Reply-To: <20210712155745.831580-1-mic@digikod.net>

From: Mickaël Salaün <mic@linux.microsoft.com>

This is an adaptation of
https://www.kernel.org/doc/html/v5.13/userspace-api/landlock.html

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210712155745.831580-3-mic@digikod.net
---

Changes since v1:
* Replace all ".I" with ".IR", except when used for titles.
* Append punctuation to ".IR" and ".BR" when it makes sense (requested
  by Alejandro Colomar).
* Cut lines according to the semantic newline rules (requested by
  Alejandro Colomar).
* Remove roman style from ".TP" section titles (requested by Alejandro
  Colomar).
* Add comma after "i.e." and "e.g.".
* Add a "CONFORMING TO" section.
* Replace "(2)" with "()" for the described syscall name.
---
 man2/landlock_create_ruleset.2 | 136 +++++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)
 create mode 100644 man2/landlock_create_ruleset.2

diff --git a/man2/landlock_create_ruleset.2 b/man2/landlock_create_ruleset.2
new file mode 100644
index 000000000000..38029c2cc48a
--- /dev/null
+++ b/man2/landlock_create_ruleset.2
@@ -0,0 +1,136 @@
+.\" Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+.\" Copyright © 2019-2020 ANSSI
+.\" Copyright © 2021 Microsoft Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH LANDLOCK_CREATE_RULESET 2 2021-06-27 Linux "Linux Programmer's Manual"
+.SH NAME
+landlock_create_ruleset \- create a new Landlock ruleset
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/landlock.h>" "  /* Definition of " LANDLOCK_* " constants */"
+.BR "#include <sys/syscall.h>" "     /* Definition of " SYS_* " constants */"
+.PP
+.BI "int syscall(SYS_landlock_create_ruleset,
+.BI "            const struct landlock_ruleset_attr " attr ,
+.BI "            size_t " size " , __u32 " flags );
+.SH DESCRIPTION
+A Landlock ruleset identifies a set of rules (i.e., actions on objects).
+This
+.BR landlock_create_ruleset ()
+system call enables creating a new file descriptor identifying a ruleset.
+This file descriptor can then be used by
+.BR landlock_add_rule (2)
+and
+.BR landlock_restrict_self (2).
+See
+.BR landlock (7)
+for a global overview.
+.PP
+.IR attr
+specifies the properties of the new ruleset.
+It points to the following structure:
+.IP
+.in +4n
+.EX
+struct landlock_ruleset_attr {
+	__u64 handled_access_fs;
+};
+.EE
+.in
+.IP
+.IR handled_access_fs
+is a bitmask of actions that is handled by this ruleset and should then be
+forbidden if no rule explicitly allow them
+(see
+.BR "Filesystem actions"
+in
+.BR landlock (7)).
+This enables simply restricting ambient rights
+(e.g., global filesystem access) and is needed for compatibility reasons.
+.PP
+.IR size
+must be specified as
+.IR "sizeof(struct landlock_ruleset_attr)"
+for compatibility reasons.
+.PP
+.IR flags
+must be 0 if
+.IR attr
+is used.
+Otherwise,
+.IR flags
+can be set to:
+.TP
+.B LANDLOCK_CREATE_RULESET_VERSION
+If
+.IR attr
+is NULL and
+.IR size
+is 0, then the returned value is the highest supported Landlock ABI version
+(starting at 1).
+This version can be used for a best-effort security approach,
+which is encouraged when user space is not pinned to a specific kernel
+version.
+All features documented in these man pages are available with the version
+1.
+.SH RETURN VALUE
+On success,
+.BR landlock_create_ruleset ()
+returns a new Landlock ruleset file descriptor, or a Landlock ABI version
+according to
+.IR flags .
+.SH ERRORS
+.BR landlock_create_ruleset ()
+can failed for the following reasons:
+.TP
+.B EOPNOTSUPP
+Landlock is supported by the kernel but disabled at boot time.
+.TP
+.B EINVAL
+Unknown
+.IR flags ,
+or unknown access, or too small
+.IR size .
+.TP
+.B E2BIG
+.IR size
+is too big.
+.TP
+.B EFAULT
+.IR attr
+was not a valid address.
+.TP
+.B ENOMSG
+Empty accesses (i.e.,
+.IR attr->handled_access_fs
+is 0).
+.SH VERSIONS
+Landlock was added in Linux 5.13.
+.SH CONFORMING TO
+This system call is Linux-specific.
+.SH SEE ALSO
+.BR landlock (7),
+.BR landlock_add_rule (2),
+.BR landlock_restrict_self (2)
-- 
2.32.0


^ permalink raw reply related

* [PATCH v2 0/4] Add Landlock man pages
From: Mickaël Salaün @ 2021-07-12 15:57 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk
  Cc: Mickaël Salaün, Jann Horn, Jonathan Corbet, Kees Cook,
	Randy Dunlap, Vincent Dagonneau, landlock, linux-kernel,
	linux-man, linux-security-module, Mickaël Salaün

From: Mickaël Salaün <mic@linux.microsoft.com>

Hi,

These four documents give a global overview of Landlock and explain each
system calls.  This is mainly a formatting of the current kernel
documentation with some new additional details.

This second patch series slightly improves the content and fixes some
syntax issues pointed out by Alejandro Colomar.

This patch series can be found in a Git repository:
https://github.com/landlock-lsm/man-pages/commits/landlock-v2

Previous version:
https://lore.kernel.org/linux-man/20210706182217.32338-1-mic@digikod.net/

Regards,

Mickaël Salaün (4):
  landlock.7: Add a new page to introduce Landlock
  landlock_create_ruleset.2: Document new syscall
  landlock_add_rule.2: Document new syscall
  landlock_restrict_self.2: Document new syscall

 man2/landlock_add_rule.2       | 139 +++++++++++++
 man2/landlock_create_ruleset.2 | 136 +++++++++++++
 man2/landlock_restrict_self.2  | 130 ++++++++++++
 man7/landlock.7                | 356 +++++++++++++++++++++++++++++++++
 4 files changed, 761 insertions(+)
 create mode 100644 man2/landlock_add_rule.2
 create mode 100644 man2/landlock_create_ruleset.2
 create mode 100644 man2/landlock_restrict_self.2
 create mode 100644 man7/landlock.7


base-commit: 33248cfe50ebb8762208e7ef3264676dad71b016
-- 
2.32.0


^ permalink raw reply

* [PATCH v2 1/4] landlock.7: Add a new page to introduce Landlock
From: Mickaël Salaün @ 2021-07-12 15:57 UTC (permalink / raw)
  To: Alejandro Colomar, Michael Kerrisk
  Cc: Mickaël Salaün, Jann Horn, Jonathan Corbet, Kees Cook,
	Randy Dunlap, Vincent Dagonneau, landlock, linux-kernel,
	linux-man, linux-security-module, Mickaël Salaün
In-Reply-To: <20210712155745.831580-1-mic@digikod.net>

From: Mickaël Salaün <mic@linux.microsoft.com>

From the user point of view, Landlock is a set of system calls enabling
to build and enforce a set of access-control rules.  A ruleset can be
created with landlock_create_ruleset(2), populated with
landlock_add_rule(2) and enforced with landlock_restrict_self(2).  This
man page gives an overview of the whole mechanism.  Details of these
system calls are documented in their respective man pages.

This is an adaptation of
https://www.kernel.org/doc/html/v5.13/userspace-api/landlock.html

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210712155745.831580-2-mic@digikod.net
---

Changes since v1:
* Replace all ".I" with ".IR", except when used for titles.
* Append punctuation to ".IR" and ".BR" when it makes sense (requested
  by Alejandro Colomar).
* Cut lines according to the semantic newline rules (requested by
  Alejandro Colomar).
* Remove roman style from ".TP" section titles (requested by Alejandro
  Colomar).
* Add comma after "i.e." and "e.g.".
* Move the example in a new EXAMPLES section.
* Improve title.
* Explain the LSM acronym at first use.
---
 man7/landlock.7 | 356 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 356 insertions(+)
 create mode 100644 man7/landlock.7

diff --git a/man7/landlock.7 b/man7/landlock.7
new file mode 100644
index 000000000000..c89f5b1cabb6
--- /dev/null
+++ b/man7/landlock.7
@@ -0,0 +1,356 @@
+.\" Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+.\" Copyright © 2019-2020 ANSSI
+.\" Copyright © 2021 Microsoft Corporation
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH LANDLOCK 7 2021-06-27 Linux "Linux Programmer's Manual"
+.SH NAME
+Landlock \- unprivileged access-control
+.SH DESCRIPTION
+Landlock is an access-control system that enables any processes to securely
+restrict themselves and their future children.
+Because Landlock is a stackable Linux Security Module (LSM),
+it makes possible to create safe security sandboxes as new security layers
+in addition to the existing system-wide access-controls.
+This kind of sandbox is expected to help mitigate the security impact of
+bugs, and unexpected or malicious behaviors in applications.
+.PP
+A Landlock security policy is a set of access rights
+(e.g., open a file in read-only, make a directory, etc.)
+tied to a file hierarchy.
+Such policy can be configured and enforced by processes for themselves
+using three system calls:
+.IP \(bu 2
+.BR landlock_create_ruleset (2)
+creates a new ruleset;
+.IP \(bu
+.BR landlock_add_rule (2)
+adds a new rule to a ruleset;
+.IP \(bu
+.BR landlock_restrict_self (2)
+enforces a ruleset on the calling thread.
+.PP
+To be able to use these system calls,
+the running kernel must support Landlock and it must be enabled at boot
+time.
+.\"
+.SS Landlock rules
+A Landlock rule describes an action on an object.
+An object is currently a file hierarchy,
+and the related filesystem actions are defined with access rights (see
+.BR landlock_add_rule (2)).
+A set of rules is aggregated in a ruleset, which can
+then restrict the thread enforcing it, and its future children.
+.\"
+.SS Filesystem actions
+These flags enable to restrict a sandboxed process to a set of actions on
+files and directories.
+Files or directories opened before the sandboxing are not subject to these
+restrictions.
+See
+.BR landlock_add_rule (2)
+and
+.BR landlock_create_ruleset (2)
+for more context.
+.PP
+A file can only receive these access rights:
+.TP
+.B LANDLOCK_ACCESS_FS_EXECUTE
+Execute a file.
+.TP
+.B LANDLOCK_ACCESS_FS_WRITE_FILE
+Open a file with write access.
+.TP
+.B LANDLOCK_ACCESS_FS_READ_FILE
+Open a file with read access.
+.PP
+A directory can receive access rights related to files or directories.
+The following access right is applied to the directory itself,
+and the directories beneath it:
+.TP
+.B LANDLOCK_ACCESS_FS_READ_DIR
+Open a directory or list its content.
+.PP
+However,
+the following access rights only apply to the content of a directory,
+not the directory itself:
+.TP
+.B LANDLOCK_ACCESS_FS_REMOVE_DIR
+Remove an empty directory or rename one.
+.TP
+.B LANDLOCK_ACCESS_FS_REMOVE_FILE
+Unlink (or rename) a file.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_CHAR
+Create (or rename or link) a character device.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_DIR
+Create (or rename) a directory.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_REG
+Create (or rename or link) a regular file.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_SOCK
+Create (or rename or link) a UNIX domain socket.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_FIFO
+Create (or rename or link) a named pipe.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_BLOCK
+Create (or rename or link) a block device.
+.TP
+.B LANDLOCK_ACCESS_FS_MAKE_SYM
+Create (or rename or link) a symbolic link.
+.\"
+.SS Layers of file path access rights
+Each time a thread enforces a ruleset on itself, it updates its Landlock
+domain with a new layer of policy.
+Indeed, this complementary policy is composed with the potentially other
+rulesets already restricting this thread.
+A sandboxed thread can then safely add more constraints to itself with a
+new enforced ruleset.
+.PP
+One policy layer grants access to a file path if at least one of its rules
+encountered on the path grants the access.
+A sandboxed thread can only access a file path if all its enforced policy
+layers grant the access as well as all the other system access controls
+(e.g., filesystem DAC, other LSM policies, etc.).
+.\"
+.SS Bind mounts and OverlayFS
+Landlock enables restricting access to file hierarchies,
+which means that these access rights can be propagated with bind mounts
+(cf.
+.BR mount_namespaces (7))
+but not with OverlayFS.
+.PP
+A bind mount mirrors a source file hierarchy to a destination.
+The destination hierarchy is then composed of the exact same files,
+on which Landlock rules can be tied, either via the source or the
+destination path.
+These rules restrict access when they are encountered on a path,
+which means that they can restrict access to multiple file hierarchies at
+the same time,
+whether these hierarchies are the result of bind mounts or not.
+.PP
+An OverlayFS mount point consists of upper and lower layers.
+These layers are combined in a merge directory, result of the mount point.
+This merge hierarchy may include files from the upper and lower layers,
+but modifications performed on the merge hierarchy only reflects on the
+upper layer.
+From a Landlock policy point of view,
+each OverlayFS layers and merge hierarchies are standalone and contains
+their own set of files and directories,
+which is different from bind mounts.
+A policy restricting an OverlayFS layer will not restrict the resulted
+merged hierarchy, and vice versa.
+Landlock users should then only think about file hierarchies they want to
+allow access to, regardless of the underlying filesystem.
+.\"
+.SS Inheritance
+Every new thread resulting from a
+.BR clone (2)
+inherits Landlock domain restrictions from its parent.
+This is similar to the
+.BR seccomp (2)
+inheritance or any other LSM dealing with task's
+.BR credentials (7).
+For instance, one process's thread may apply Landlock rules to itself,
+but they will not be automatically applied to other sibling threads
+(unlike POSIX thread credential changes, cf.
+.BR nptl (7)).
+.PP
+When a thread sandboxes itself, we have the guarantee that the related
+security policy will stay enforced on all this thread's descendants.
+This allows creating standalone and modular security policies per
+application,
+which will automatically be composed between themselves according to their
+runtime parent policies.
+.\"
+.SS Ptrace restrictions
+A sandboxed process has less privileges than a non-sandboxed process and
+must then be subject to additional restrictions when manipulating another
+process.
+To be allowed to use
+.BR ptrace (2)
+and related syscalls on a target process,
+a sandboxed process should have a subset of the target process rules,
+which means the tracee must be in a sub-domain of the tracer.
+.SH VERSIONS
+Landlock was added in Linux 5.13.
+.SH NOTES
+Landlock is enabled by CONFIG_SECURITY_LANDLOCK.
+The
+.IR lsm=lsm1,...,lsmN
+command line parameter controls the sequence of the initialization of
+Linux Security Modules.
+It must contain the string
+.IR landlock
+to enable Landlock.
+If the command line parameter is not specified,
+the initialization falls back to the value of the deprecated
+.IR security=
+command line parameter and further to the value of CONFIG_LSM.
+We can check that Landlock is enabled by looking for
+.IR "landlock: Up and running."
+in kernel logs.
+.PP
+It is currently not possible to restrict some file-related actions
+accessible through these syscall families:
+.BR chdir (2),
+.BR truncate (2),
+.BR stat (2),
+.BR flock (2),
+.BR chmod (2),
+.BR chown (2),
+.BR setxattr (2),
+.BR utime (2),
+.BR ioctl (2),
+.BR fcntl (2),
+.BR access (2).
+Future Landlock evolutions will enable to restrict them.
+.SH EXAMPLES
+We first need to create the ruleset that will contain our rules.
+For this example,
+the ruleset will contain rules that only allow read actions,
+but write actions will be denied.
+The ruleset then needs to handle both of these kind of actions.
+See below for the description of filesystem actions.
+.PP
+.in +4n
+.EX
+int ruleset_fd;
+struct landlock_ruleset_attr ruleset_attr = {
+    .handled_access_fs =
+        LANDLOCK_ACCESS_FS_EXECUTE |
+        LANDLOCK_ACCESS_FS_WRITE_FILE |
+        LANDLOCK_ACCESS_FS_READ_FILE |
+        LANDLOCK_ACCESS_FS_READ_DIR |
+        LANDLOCK_ACCESS_FS_REMOVE_DIR |
+        LANDLOCK_ACCESS_FS_REMOVE_FILE |
+        LANDLOCK_ACCESS_FS_MAKE_CHAR |
+        LANDLOCK_ACCESS_FS_MAKE_DIR |
+        LANDLOCK_ACCESS_FS_MAKE_REG |
+        LANDLOCK_ACCESS_FS_MAKE_SOCK |
+        LANDLOCK_ACCESS_FS_MAKE_FIFO |
+        LANDLOCK_ACCESS_FS_MAKE_BLOCK |
+        LANDLOCK_ACCESS_FS_MAKE_SYM,
+};
+
+ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+if (ruleset_fd < 0) {
+    perror("Failed to create a ruleset");
+    return 1;
+}
+.EE
+.in
+.PP
+We can now add a new rule to this ruleset thanks to the returned file
+descriptor referring to this ruleset.
+The rule will only allow reading the file hierarchy
+.IR /usr .
+Without another rule, write actions would then be denied by the ruleset.
+To add
+.IR /usr
+to the ruleset, we open it with the
+.IR O_PATH
+flag and fill the
+.IR "struct landlock_path_beneath_attr"
+with this file descriptor.
+.PP
+.in +4n
+.EX
+int err;
+struct landlock_path_beneath_attr path_beneath = {
+    .allowed_access =
+        LANDLOCK_ACCESS_FS_EXECUTE |
+        LANDLOCK_ACCESS_FS_READ_FILE |
+        LANDLOCK_ACCESS_FS_READ_DIR,
+};
+
+path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
+if (path_beneath.parent_fd < 0) {
+    perror("Failed to open file");
+    close(ruleset_fd);
+    return 1;
+}
+err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+                        &path_beneath, 0);
+close(path_beneath.parent_fd);
+if (err) {
+    perror("Failed to update ruleset");
+    close(ruleset_fd);
+    return 1;
+}
+.EE
+.in
+.PP
+We now have a ruleset with one rule allowing read access to
+.IR /usr
+while denying all other handled accesses for the filesystem.
+The next step is to restrict the current thread from gaining more
+privileges
+(e.g., thanks to a set-user-ID binary).
+.PP
+.in +4n
+.EX
+if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+    perror("Failed to restrict privileges");
+    close(ruleset_fd);
+    return 1;
+}
+.EE
+.in
+.PP
+The current thread is now ready to sandbox itself with the ruleset.
+.PP
+.in +4n
+.EX
+if (landlock_restrict_self(ruleset_fd, 0)) {
+    perror("Failed to enforce ruleset");
+    close(ruleset_fd);
+    return 1;
+}
+close(ruleset_fd);
+.EE
+.in
+.PP
+If the
+.BR landlock_restrict_self (2)
+system call succeeds, the current thread is now restricted and this policy
+will be enforced on all its subsequently created children as well.
+Once a thread is landlocked, there is no way to remove its security policy;
+only adding more restrictions is allowed.
+These threads are now in a new Landlock domain, merge of their parent one
+(if any) with the new ruleset.
+.PP
+Full working code can be found in
+.UR https://git.kernel.org\:/pub\:/scm\:/linux\:/kernel\:/git\:/stable\:/linux.git\:/tree\:/samples\:/landlock\:/sandboxer.c
+.UE
+.SH SEE ALSO
+.BR landlock_create_ruleset (2),
+.BR landlock_add_rule (2),
+.BR landlock_restrict_self (2)
+.PP
+.UR https://landlock.io\:/
+.UE
-- 
2.32.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox