Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: [PATCH 4/4] module: Add hook for security_kernel_post_read_file()
From: Kees Cook @ 2020-07-08  3:10 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: James Morris, Jessica Yu, Luis Chamberlain, Scott Branden,
	Greg Kroah-Hartman, Rafael J. Wysocki, Alexander Viro,
	Dmitry Kasatkin, Serge E. Hallyn, Casey Schaufler,
	Eric W. Biederman, Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module
In-Reply-To: <1594169240.23056.143.camel@linux.ibm.com>

On Tue, Jul 07, 2020 at 08:47:20PM -0400, Mimi Zohar wrote:
> On Tue, 2020-07-07 at 01:19 -0700, Kees Cook wrote:
> > Calls to security_kernel_load_data() should be paired with a call to
> > security_kernel_post_read_file() with a NULL file argument. Add the
> > missing call so the module contents are visible to the LSMs interested
> > in measuring the module content. (This also paves the way for moving
> > module signature checking out of the module core and into an LSM.)
> > 
> > Cc: Jessica Yu <jeyu@kernel.org>
> > Fixes: c77b8cdf745d ("module: replace the existing LSM hook in init_module")
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >  kernel/module.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 0c6573b98c36..af9679f8e5c6 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -2980,7 +2980,12 @@ static int copy_module_from_user(const void __user *umod, unsigned long len,
> >  		return -EFAULT;
> >  	}
> >  
> > -	return 0;
> > +	err = security_kernel_post_read_file(NULL, (char *)info->hdr,
> > +					     info->len, READING_MODULE);
> 
> There was a lot of push back on calling security_kernel_read_file()
> with a NULL file descriptor here.[1]  The result was defining a new
> security hook - security_kernel_load_data - and enumeration -
> LOADING_MODULE.  I would prefer calling the same pre and post security
> hook.
> 
> Mimi
> 
> [1] http://kernsec.org/pipermail/linux-security-module-archive/2018-May/007110.html

Ah yes, thanks for the pointer to the discussion.

I think we have four cases then, for differing LSM hooks:

- no "file", no contents
	e.g. init_module() before copying user buffer
	security_kernel_load_data()
- only a "file" available, no contents
	e.g. kernel_read_file() before actually reading anything
	security_kernel_read_file()
- "file" and contents
	e.g. kernel_read_file() after reading
	security_kernel_post_read_file()
- no "file" available, just the contents
	e.g. firmware platform fallback from EFI space (no "file")
	unimplemented!

If an LSM wants to be able to examine the contents of firmware, modules,
kexec, etc, it needs either a "file" or the full contents.

The "file" methods all pass through the kernel_read_file()-family. The
others happen via blobs coming from userspace or (more recently) the EFI
universe.

So, if a NULL file is unreasonable, we need, perhaps,
security_kernel_post_load_data()

?

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH 2/4] fs: Remove FIRMWARE_PREALLOC_BUFFER from kernel_read_file() enums
From: Scott Branden @ 2020-07-08  3:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: James Morris, Luis Chamberlain, Mimi Zohar, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alexander Viro, Jessica Yu, Dmitry Kasatkin,
	Serge E. Hallyn, Casey Schaufler, Eric W. Biederman,
	Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module,
	Christoph Hellwig
In-Reply-To: <202007071447.D96AA42ECE@keescook>

Hi Kees,

Thanks for looking at my patch series to see how it relates.
I see what you're trying to accomplish in various areas of cleanup.
I'll comment as I go through your individual emails.
1 comment below.

On 2020-07-07 2:55 p.m., Kees Cook wrote:
> On Tue, Jul 07, 2020 at 09:42:02AM -0700, Scott Branden wrote:
>> On 2020-07-07 1:19 a.m., Kees Cook wrote:
>>> FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs
>>> that are interested in filtering between types of things. The "how"
>>> should be an internal detail made uninteresting to the LSMs.
>>>
>>> Fixes: a098ecd2fa7d ("firmware: support loading into a pre-allocated buffer")
>>> Fixes: fd90bc559bfb ("ima: based on policy verify firmware signatures (pre-allocated buffer)")
>>> Fixes: 4f0496d8ffa3 ("ima: based on policy warn about loading firmware (pre-allocated buffer)")
>>> [...]
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index 3f881a892ea7..95fc775ed937 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -2993,10 +2993,10 @@ static inline void i_readcount_inc(struct inode *inode)
>>>    #endif
>>>    extern int do_pipe_flags(int *, int);
>>> +/* This is a list of *what* is being read, not *how*. */
>>>    #define __kernel_read_file_id(id) \
>>>    	id(UNKNOWN, unknown)		\
>>>    	id(FIRMWARE, firmware)		\
>> With this change, I'm trying to figure out how the partial firmware read is
>> going to work on top of this reachitecture.
>> Is it going to be ok to add READING_PARTIAL_FIRMWARE here as that is a
>> "what"?
> No, that's why I said you need to do the implementation within the API
> and not expect each LSM to implement their own (as I mentioned both
> times):
>
> https://lore.kernel.org/lkml/202005221551.5CA1372@keescook/
> https://lore.kernel.org/lkml/202007061950.F6B3D9E6A@keescook/
>
> I will reply in the thread above.
>
>>> -	id(FIRMWARE_PREALLOC_BUFFER, firmware)	\
>> My patch series gets rejected any time I make a change to the
>> kernel_read_file* region in linux/fs.h.
>> The requirement is for this api to move to another header file outside of
>> linux/fs.h
>> It seems the same should apply to your change.
> Well I'm hardly making the same level of changes, but yeah, sure, if
> that helps move things along, I can include that here.
>
>> Could you please add the following patch to the start of you patch series to
>> move the kernel_read_file* to its own include file?
>> https://patchwork.kernel.org/patch/11647063/
> https://lore.kernel.org/lkml/20200706232309.12010-2-scott.branden@broadcom.com/
>
> You've included it in include/linux/security.h and that should be pretty
> comprehensive, it shouldn't be needed in so many .c files.
Some people want the header files included in each c file they are used.
Others want header files not included if they are included in another 
header file.
I chose the first approach: every file that uses the api includes the 
header file.
I didn't know there was a standard approach to only put it in security.h
>


^ permalink raw reply

* [security:secure_uffd_v5.9 1/3] fs/anon_inodes.c:92:10-17: WARNING: ERR_CAST can be used with inode
From: kernel test robot @ 2020-07-08  2:53 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: kbuild-all, linux-security-module, James Morris

[-- Attachment #1: Type: text/plain, Size: 747 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git secure_uffd_v5.9
head:   d08ac70b1e0dc71ac2315007bcc3efb283b2eae4
commit: 2749d3f84a708110ed0359579be0eddd2c4036da [1/3] Add a new LSM-supporting anonymous inode interface
config: sh-randconfig-c022-20200707 (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


coccinelle warnings: (new ones prefixed by >>)

>> fs/anon_inodes.c:92:10-17: WARNING: ERR_CAST can be used with inode

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26323 bytes --]

^ permalink raw reply

* [PATCH] fix err_cast.cocci warnings
From: kernel test robot @ 2020-07-08  2:53 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: kbuild-all, linux-security-module, James Morris
In-Reply-To: <202007081008.YPssDcFv%lkp@intel.com>

From: kernel test robot <lkp@intel.com>

fs/anon_inodes.c:92:10-17: WARNING: ERR_CAST can be used with inode


 Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))

Generated by: scripts/coccinelle/api/err_cast.cocci

Fixes: 2749d3f84a70 ("Add a new LSM-supporting anonymous inode interface")
CC: Daniel Colascione <dancol@google.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git secure_uffd_v5.9
head:   d08ac70b1e0dc71ac2315007bcc3efb283b2eae4
commit: 2749d3f84a708110ed0359579be0eddd2c4036da [1/3] Add a new LSM-supporting anonymous inode interface

 anon_inodes.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -89,7 +89,7 @@ struct file *_anon_inode_getfile(const c
 		inode =	anon_inode_make_secure_inode(
 			name, context_inode);
 		if (IS_ERR(inode))
-			return ERR_PTR(PTR_ERR(inode));
+			return ERR_CAST(inode);
 	} else {
 		inode =	anon_inode_inode;
 		if (IS_ERR(inode))

^ permalink raw reply

* Re: [PATCH 4/4] module: Add hook for security_kernel_post_read_file()
From: Mimi Zohar @ 2020-07-08  0:47 UTC (permalink / raw)
  To: Kees Cook, James Morris
  Cc: Jessica Yu, Luis Chamberlain, Scott Branden, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alexander Viro, Dmitry Kasatkin,
	Serge E. Hallyn, Casey Schaufler, Eric W. Biederman,
	Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module
In-Reply-To: <20200707081926.3688096-5-keescook@chromium.org>

On Tue, 2020-07-07 at 01:19 -0700, Kees Cook wrote:
> Calls to security_kernel_load_data() should be paired with a call to
> security_kernel_post_read_file() with a NULL file argument. Add the
> missing call so the module contents are visible to the LSMs interested
> in measuring the module content. (This also paves the way for moving
> module signature checking out of the module core and into an LSM.)
> 
> Cc: Jessica Yu <jeyu@kernel.org>
> Fixes: c77b8cdf745d ("module: replace the existing LSM hook in init_module")
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  kernel/module.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 0c6573b98c36..af9679f8e5c6 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2980,7 +2980,12 @@ static int copy_module_from_user(const void __user *umod, unsigned long len,
>  		return -EFAULT;
>  	}
>  
> -	return 0;
> +	err = security_kernel_post_read_file(NULL, (char *)info->hdr,
> +					     info->len, READING_MODULE);

There was a lot of push back on calling security_kernel_read_file()
with a NULL file descriptor here.[1]  The result was defining a new
security hook - security_kernel_load_data - and enumeration -
LOADING_MODULE.  I would prefer calling the same pre and post security
hook.

Mimi

[1] http://kernsec.org/pipermail/linux-security-module-archive/2018-Ma
y/007110.html

> +	if (err)
> +		vfree(info->hdr);
> +
> +	return err;
>  }
>  
>  static void free_copy(struct load_info *info)


^ permalink raw reply

* Re: [PATCH v10 2/9] fs: introduce kernel_pread_file* support
From: Mimi Zohar @ 2020-07-08  0:24 UTC (permalink / raw)
  To: Kees Cook, Scott Branden
  Cc: Luis Chamberlain, Wolfram Sang, Greg Kroah-Hartman, David Brown,
	Alexander Viro, Shuah Khan, bjorn.andersson, Shuah Khan,
	Arnd Bergmann, Rafael J . Wysocki, linux-kernel, linux-arm-msm,
	linux-fsdevel, BCM Kernel Feedback, Olof Johansson, Andrew Morton,
	Dan Carpenter, Colin Ian King, Takashi Iwai, linux-kselftest,
	Andy Gross, linux-integrity, linux-security-module
In-Reply-To: <202007071642.AA705B2A@keescook>

On Tue, 2020-07-07 at 16:56 -0700, Kees Cook wrote:
> > @@ -951,21 +955,32 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
> >               ret = -EINVAL;
> >               goto out;
> >       }
> > -     if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
> > +
> > +     /* Default read to end of file */
> > +     read_end = i_size;
> > +
> > +     /* Allow reading partial portion of file */
> > +     if ((id == READING_FIRMWARE_PARTIAL_READ) &&
> > +         (i_size > (pos + max_size)))
> > +             read_end = pos + max_size;
> 
> There's no need to involve "id" here. There are other signals about
> what's happening (i.e. pos != 0, max_size != i_size, etc).

Both the pre and post security kernel_read_file hooks are called here,
but there isn't enough information being passed to the LSM/IMA to be
able to different which hook is applicable.  One method of providing
that additional information is by enumeration.  The other option would
be to pass some additional information.

For example, on the post kernel_read_file hook, the file is read once
into memory.  IMA calculates the firmware file hash based on the
buffer contents.  On the pre kernel_read_file hook, IMA would need to
read the entire file, calculating the file hash.  Both methods of
calculating the file hash work, but the post hook is more efficient.

Mimi

^ permalink raw reply

* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll
From: Daniel Borkmann @ 2020-07-08  0:05 UTC (permalink / raw)
  To: Eric W. Biederman, Christian Brauner
  Cc: Alexei Starovoitov, linux-kernel, David Miller,
	Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton,
	Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler, Luis Chamberlain, Linus Torvalds
In-Reply-To: <87mu4bjlqm.fsf@x220.int.ebiederm.org>

On 7/7/20 7:09 PM, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
>> On Fri, Jul 03, 2020 at 04:37:47PM -0500, Eric W. Biederman wrote:
>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>>
>>>> On Thu, Jul 02, 2020 at 11:41:37AM -0500, Eric W. Biederman wrote:
>>>>> Create an independent helper thread_group_exited report return true
>>>>> when all threads have passed exit_notify in do_exit.  AKA all of the
>>>>> threads are at least zombies and might be dead or completely gone.
>>>>>
>>>>> Create this helper by taking the logic out of pidfd_poll where
>>>>> it is already tested, and adding a missing READ_ONCE on
>>>>> the read of task->exit_state.
>>>>>
>>>>> I will be changing the user mode driver code to use this same logic
>>>>> to know when a user mode driver needs to be restarted.
>>>>>
>>>>> Place the new helper thread_group_exited in kernel/exit.c and
>>>>> EXPORT it so it can be used by modules.
>>>>>
>>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>>> ---
>>>>>   include/linux/sched/signal.h |  2 ++
>>>>>   kernel/exit.c                | 24 ++++++++++++++++++++++++
>>>>>   kernel/fork.c                |  6 +-----
>>>>>   3 files changed, 27 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>>>>> index 0ee5e696c5d8..1bad18a1d8ba 100644
>>>>> --- a/include/linux/sched/signal.h
>>>>> +++ b/include/linux/sched/signal.h
>>>>> @@ -674,6 +674,8 @@ static inline int thread_group_empty(struct task_struct *p)
>>>>>   #define delay_group_leader(p) \
>>>>>   		(thread_group_leader(p) && !thread_group_empty(p))
>>>>>   
>>>>> +extern bool thread_group_exited(struct pid *pid);
>>>>> +
>>>>>   extern struct sighand_struct *__lock_task_sighand(struct task_struct *task,
>>>>>   							unsigned long *flags);
>>>>>   
>>>>> diff --git a/kernel/exit.c b/kernel/exit.c
>>>>> index d3294b611df1..a7f112feb0f6 100644
>>>>> --- a/kernel/exit.c
>>>>> +++ b/kernel/exit.c
>>>>> @@ -1713,6 +1713,30 @@ COMPAT_SYSCALL_DEFINE5(waitid,
>>>>>   }
>>>>>   #endif
>>>>>   
>>>>> +/**
>>>>> + * thread_group_exited - check that a thread group has exited
>>>>> + * @pid: tgid of thread group to be checked.
>>>>> + *
>>>>> + * Test if thread group is has exited (all threads are zombies, dead
>>>>> + * or completely gone).
>>>>> + *
>>>>> + * Return: true if the thread group has exited. false otherwise.
>>>>> + */
>>>>> +bool thread_group_exited(struct pid *pid)
>>>>> +{
>>>>> +	struct task_struct *task;
>>>>> +	bool exited;
>>>>> +
>>>>> +	rcu_read_lock();
>>>>> +	task = pid_task(pid, PIDTYPE_PID);
>>>>> +	exited = !task ||
>>>>> +		(READ_ONCE(task->exit_state) && thread_group_empty(task));
>>>>> +	rcu_read_unlock();
>>>>> +
>>>>> +	return exited;
>>>>> +}
>>>>
>>>> I'm not sure why you think READ_ONCE was missing.
>>>> It's different in wait_consider_task() where READ_ONCE is needed because
>>>> of multiple checks. Here it's done once.
>>>
>>> In practice it probably has no effect on the generated code.  But
>>> READ_ONCE is about telling the compiler not to be clever.  Don't use
>>> tearing loads or stores etc.  When all of the other readers are using
>>> READ_ONCE I just get nervous if we have a case that doesn't.
>>
>> That's not true. The only place where READ_ONCE(->exit_state) is used is
>> in wait_consider_task() and nowhere else. We had that discussion a while
>> ago where I or someone proposed to simply place a READ_ONCE() around all
>> accesses to exit_state for the sake of kcsan and we agreed that it's
>> unnecessary and not to do this.
>> But it obviously doesn't hurt to have it.
> 
> There is a larger discussion to be had around the proper handling of
> exit_state.
> 
> In this particular case because we are accessing exit_state with
> only rcu_read_lock protection, because the outcome of the read
> is about correctness, and because the compiler has nothing else
> telling it not to re-read exit_state, I believe we actually need
> the READ_ONCE.
> 
> At the same time it would take a pretty special compiler to want to
> reaccess that field in thread_group_exited.
> 
> I have looked through and I don't find any of the other access of
> exit_state where the result is about correctness (so that we care)
> and we don't hold tasklist_lock.
> 
> But I have removed the necessary wording from the commit comment.

Hey Eric, are you planning to push the final version into a topic branch
so it can be pulled into bpf-next as discussed earlier?

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH v10 7/9] misc: bcm-vk: add Broadcom VK driver
From: Kees Cook @ 2020-07-08  0:03 UTC (permalink / raw)
  To: Scott Branden
  Cc: Luis Chamberlain, Wolfram Sang, Greg Kroah-Hartman, David Brown,
	Alexander Viro, Shuah Khan, bjorn.andersson, Shuah Khan,
	Arnd Bergmann, Mimi Zohar, Rafael J . Wysocki, linux-kernel,
	linux-arm-msm, linux-fsdevel, BCM Kernel Feedback, Olof Johansson,
	Andrew Morton, Dan Carpenter, Colin Ian King, Takashi Iwai,
	linux-kselftest, Andy Gross, linux-integrity,
	linux-security-module, Desmond Yan, James Hu
In-Reply-To: <20200706232309.12010-8-scott.branden@broadcom.com>

On Mon, Jul 06, 2020 at 04:23:07PM -0700, Scott Branden wrote:
> Add Broadcom VK driver offload engine.
> This driver interfaces to the VK PCIe offload engine to perform
> should offload functions as video transcoding on multiple streams
> in parallel.  VK device is booted from files loaded using
> request_firmware_into_buf mechanism.  After booted card status is updated
> and messages can then be sent to the card.
> Such messages contain scatter gather list of addresses
> to pull data from the host to perform operations on.
> 
> Signed-off-by: Scott Branden <scott.branden@broadcom.com>
> Signed-off-by: Desmond Yan <desmond.yan@broadcom.com>

nit: your S-o-b chain doesn't make sense (I would expect you at the end
since you're sending it and showing as the Author). Is it Co-developed-by?
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by

> [...]
> +
> +		max_buf = SZ_4M;
> +		bufp = dma_alloc_coherent(dev,
> +					  max_buf,
> +					  &boot_dma_addr, GFP_KERNEL);
> +		if (!bufp) {
> +			dev_err(dev, "Error allocating 0x%zx\n", max_buf);
> +			ret = -ENOMEM;
> +			goto err_buf_out;
> +		}
> +
> +		bcm_vk_buf_notify(vk, bufp, boot_dma_addr, max_buf);
> +	} else {
> +		dev_err(dev, "Error invalid image type 0x%x\n", load_type);
> +		ret = -EINVAL;
> +		goto err_buf_out;
> +	}
> +
> +	ret = request_partial_firmware_into_buf(&fw, filename, dev,
> +						bufp, max_buf, 0);

Unless I don't understand what's happening here, this needs to be
reordered if you're going to keep Mimi happy and disallow the device
being able to see the firmware before it has been verified. (i.e. please
load the firmware before mapping DMA across the buffer.)

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v10 4/9] test_firmware: add partial read support for request_firmware_into_buf
From: Kees Cook @ 2020-07-07 23:59 UTC (permalink / raw)
  To: Scott Branden
  Cc: Luis Chamberlain, Wolfram Sang, Greg Kroah-Hartman, David Brown,
	Alexander Viro, Shuah Khan, bjorn.andersson, Shuah Khan,
	Arnd Bergmann, Mimi Zohar, Rafael J . Wysocki, linux-kernel,
	linux-arm-msm, linux-fsdevel, BCM Kernel Feedback, Olof Johansson,
	Andrew Morton, Dan Carpenter, Colin Ian King, Takashi Iwai,
	linux-kselftest, Andy Gross, linux-integrity,
	linux-security-module
In-Reply-To: <20200706232309.12010-5-scott.branden@broadcom.com>

On Mon, Jul 06, 2020 at 04:23:04PM -0700, Scott Branden wrote:
> Add additional hooks to test_firmware to pass in support
> for partial file read using request_firmware_into_buf.
> buf_size: size of buffer to request firmware into
> partial: indicates that a partial file request is being made
> file_offset: to indicate offset into file to request
> 
> Signed-off-by: Scott Branden <scott.branden@broadcom.com>

I am a fan of tests. :) If Luis gives an Ack here, you're good.

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v10 3/9] firmware: add request_partial_firmware_into_buf
From: Kees Cook @ 2020-07-07 23:58 UTC (permalink / raw)
  To: Scott Branden
  Cc: Luis Chamberlain, Wolfram Sang, Greg Kroah-Hartman, David Brown,
	Alexander Viro, Shuah Khan, bjorn.andersson, Shuah Khan,
	Arnd Bergmann, Mimi Zohar, Rafael J . Wysocki, linux-kernel,
	linux-arm-msm, linux-fsdevel, BCM Kernel Feedback, Olof Johansson,
	Andrew Morton, Dan Carpenter, Colin Ian King, Takashi Iwai,
	linux-kselftest, Andy Gross, linux-integrity,
	linux-security-module
In-Reply-To: <20200706232309.12010-4-scott.branden@broadcom.com>

On Mon, Jul 06, 2020 at 04:23:03PM -0700, Scott Branden wrote:
> Add request_partial_firmware_into_buf to allow for portions
> of firmware file to be read into a buffer.  Necessary where firmware
> needs to be loaded in portions from file in memory constrained systems.

Just tear out the differing "id" and just use FW_OPT_PARTIAL and I think
if Luis is happy, you're all set.

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v10 2/9] fs: introduce kernel_pread_file* support
From: Kees Cook @ 2020-07-07 23:56 UTC (permalink / raw)
  To: Scott Branden
  Cc: Luis Chamberlain, Wolfram Sang, Greg Kroah-Hartman, David Brown,
	Alexander Viro, Shuah Khan, bjorn.andersson, Shuah Khan,
	Arnd Bergmann, Mimi Zohar, Rafael J . Wysocki, linux-kernel,
	linux-arm-msm, linux-fsdevel, BCM Kernel Feedback, Olof Johansson,
	Andrew Morton, Dan Carpenter, Colin Ian King, Takashi Iwai,
	linux-kselftest, Andy Gross, linux-integrity,
	linux-security-module
In-Reply-To: <20200706232309.12010-3-scott.branden@broadcom.com>

On Mon, Jul 06, 2020 at 04:23:02PM -0700, Scott Branden wrote:
> Add kernel_pread_file* support to kernel to allow for partial read
> of files with an offset into the file.
> 
> Signed-off-by: Scott Branden <scott.branden@broadcom.com>
> ---
>  fs/exec.c                        | 93 ++++++++++++++++++++++++--------
>  include/linux/kernel_read_file.h | 17 ++++++
>  2 files changed, 87 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 4ea87db5e4d5..e6a8a65f7478 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -928,10 +928,14 @@ struct file *open_exec(const char *name)
>  }
>  EXPORT_SYMBOL(open_exec);
>  
> -int kernel_read_file(struct file *file, void **buf, loff_t *size,
> -		     loff_t max_size, enum kernel_read_file_id id)
> -{
> -	loff_t i_size, pos;
> +int kernel_pread_file(struct file *file, void **buf, loff_t *size,
> +		      loff_t max_size, loff_t pos,
> +		      enum kernel_read_file_id id)
> +{
> +	loff_t alloc_size;
> +	loff_t buf_pos;
> +	loff_t read_end;
> +	loff_t i_size;
>  	ssize_t bytes = 0;
>  	int ret;
>  
> @@ -951,21 +955,32 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>  		ret = -EINVAL;
>  		goto out;
>  	}
> -	if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
> +
> +	/* Default read to end of file */
> +	read_end = i_size;
> +
> +	/* Allow reading partial portion of file */
> +	if ((id == READING_FIRMWARE_PARTIAL_READ) &&
> +	    (i_size > (pos + max_size)))
> +		read_end = pos + max_size;

There's no need to involve "id" here. There are other signals about
what's happening (i.e. pos != 0, max_size != i_size, etc).

> +
> +	alloc_size = read_end - pos;
> +	if (i_size > SIZE_MAX || (max_size > 0 && alloc_size > max_size)) {
>  		ret = -EFBIG;
>  		goto out;
>  	}
>  
> -	if (id != READING_FIRMWARE_PREALLOC_BUFFER)
> -		*buf = vmalloc(i_size);
> +	if ((id != READING_FIRMWARE_PARTIAL_READ) &&
> +	    (id != READING_FIRMWARE_PREALLOC_BUFFER))
> +		*buf = vmalloc(alloc_size);
>  	if (!*buf) {
>  		ret = -ENOMEM;
>  		goto out;
>  	}

The id usage here was a mistake in upstream, and the series I sent is
trying to clean that up.

Greg, it seems this series is going to end up in your tree due to it
being drivers/misc? I guess I need to direct my series to Greg then, but
get LSM folks Acks.

>  
> -	pos = 0;
> -	while (pos < i_size) {
> -		bytes = kernel_read(file, *buf + pos, i_size - pos, &pos);
> +	buf_pos = 0;
> +	while (pos < read_end) {
> +		bytes = kernel_read(file, *buf + buf_pos, read_end - pos, &pos);
>  		if (bytes < 0) {
>  			ret = bytes;
>  			goto out_free;
> @@ -973,20 +988,23 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>  
>  		if (bytes == 0)
>  			break;
> +
> +		buf_pos += bytes;
>  	}
>  
> -	if (pos != i_size) {
> +	if (pos != read_end) {
>  		ret = -EIO;
>  		goto out_free;
>  	}
>  
> -	ret = security_kernel_post_read_file(file, *buf, i_size, id);
> +	ret = security_kernel_post_read_file(file, *buf, alloc_size, id);
>  	if (!ret)
>  		*size = pos;

This call cannot be inside kernel_pread_file(): any future LSMs will see
a moving window of contents, etc. It'll need to be in kernel_read_file()
proper.

>  
>  out_free:
>  	if (ret < 0) {
> -		if (id != READING_FIRMWARE_PREALLOC_BUFFER) {
> +		if ((id != READING_FIRMWARE_PARTIAL_READ) &&
> +		    (id != READING_FIRMWARE_PREALLOC_BUFFER)) {
>  			vfree(*buf);
>  			*buf = NULL;
>  		}
> @@ -996,10 +1014,18 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
>  	allow_write_access(file);
>  	return ret;
>  }
> +
> +int kernel_read_file(struct file *file, void **buf, loff_t *size,
> +		     loff_t max_size, enum kernel_read_file_id id)
> +{
> +	return kernel_pread_file(file, buf, size, max_size, 0, id);
> +}
>  EXPORT_SYMBOL_GPL(kernel_read_file);
>  
> -int kernel_read_file_from_path(const char *path, void **buf, loff_t *size,
> -			       loff_t max_size, enum kernel_read_file_id id)
> +int kernel_pread_file_from_path(const char *path, void **buf,
> +				loff_t *size,
> +				loff_t max_size, loff_t pos,
> +				enum kernel_read_file_id id)
>  {
>  	struct file *file;
>  	int ret;
> @@ -1011,15 +1037,22 @@ int kernel_read_file_from_path(const char *path, void **buf, loff_t *size,
>  	if (IS_ERR(file))
>  		return PTR_ERR(file);
>  
> -	ret = kernel_read_file(file, buf, size, max_size, id);
> +	ret = kernel_pread_file(file, buf, size, max_size, pos, id);
>  	fput(file);
>  	return ret;
>  }
> +
> +int kernel_read_file_from_path(const char *path, void **buf, loff_t *size,
> +			       loff_t max_size, enum kernel_read_file_id id)
> +{
> +	return kernel_pread_file_from_path(path, buf, size, max_size, 0, id);
> +}
>  EXPORT_SYMBOL_GPL(kernel_read_file_from_path);
>  
> -int kernel_read_file_from_path_initns(const char *path, void **buf,
> -				      loff_t *size, loff_t max_size,
> -				      enum kernel_read_file_id id)
> +int kernel_pread_file_from_path_initns(const char *path, void **buf,
> +				       loff_t *size,
> +				       loff_t max_size, loff_t pos,
> +				       enum kernel_read_file_id id)
>  {
>  	struct file *file;
>  	struct path root;
> @@ -1037,14 +1070,22 @@ int kernel_read_file_from_path_initns(const char *path, void **buf,
>  	if (IS_ERR(file))
>  		return PTR_ERR(file);
>  
> -	ret = kernel_read_file(file, buf, size, max_size, id);
> +	ret = kernel_pread_file(file, buf, size, max_size, pos, id);
>  	fput(file);
>  	return ret;
>  }
> +
> +int kernel_read_file_from_path_initns(const char *path, void **buf,
> +				      loff_t *size, loff_t max_size,
> +				      enum kernel_read_file_id id)
> +{
> +	return kernel_pread_file_from_path_initns(path, buf, size, max_size, 0, id);
> +}
>  EXPORT_SYMBOL_GPL(kernel_read_file_from_path_initns);
>  
> -int kernel_read_file_from_fd(int fd, void **buf, loff_t *size, loff_t max_size,
> -			     enum kernel_read_file_id id)
> +int kernel_pread_file_from_fd(int fd, void **buf, loff_t *size,
> +			      loff_t max_size, loff_t pos,
> +			      enum kernel_read_file_id id)
>  {
>  	struct fd f = fdget(fd);
>  	int ret = -EBADF;
> @@ -1052,11 +1093,17 @@ int kernel_read_file_from_fd(int fd, void **buf, loff_t *size, loff_t max_size,
>  	if (!f.file)
>  		goto out;
>  
> -	ret = kernel_read_file(f.file, buf, size, max_size, id);
> +	ret = kernel_pread_file(f.file, buf, size, max_size, pos, id);
>  out:
>  	fdput(f);
>  	return ret;
>  }
> +
> +int kernel_read_file_from_fd(int fd, void **buf, loff_t *size, loff_t max_size,
> +			     enum kernel_read_file_id id)
> +{
> +	return kernel_pread_file_from_fd(fd, buf, size, max_size, 0, id);
> +}
>  EXPORT_SYMBOL_GPL(kernel_read_file_from_fd);

For each of these execution path, the mapping to LSM hooks is:

- all path must call security_kernel_read_file(file, id) before reading
  (this appears to be fine as-is in your series).

- anything doing a "full" read needs to call
  security_kernel_post_read_file() with the file and full buffer, size,
  etc (so all the kernel_read_file*() paths). I imagine instead of
  adding 3 copy/pasted versions of this, it may be possible to refactor
  the helpers into a single core "full" caller that takes struct file,
  or doing some logic in kernel_pread_file() that notices it has the
  entire file in the buffer and doing the call then.
  As an example of what I mean about doing the call, here's how I might
  imagine it for one of the paths if it took struct file:

int kernel_read_file_from_file(struct file *file, void **buf, loff_t *size,
			       loff_t max_size, enum kernel_read_file_id id)
{
	int ret;

	ret = kernel_pread_file_from_file(file, buf, size, max_size, 0, id);
	if (ret)
		return ret;
	return security_kernel_post_read_file(file, buf, *size, id);
}

>  
>  #if defined(CONFIG_HAVE_AOUT) || defined(CONFIG_BINFMT_FLAT) || \
> diff --git a/include/linux/kernel_read_file.h b/include/linux/kernel_read_file.h
> index 53f5ca41519a..f061ccb8d0b4 100644
> --- a/include/linux/kernel_read_file.h
> +++ b/include/linux/kernel_read_file.h
> @@ -8,6 +8,7 @@
>  #define __kernel_read_file_id(id) \
>  	id(UNKNOWN, unknown)		\
>  	id(FIRMWARE, firmware)		\
> +	id(FIRMWARE_PARTIAL_READ, firmware)	\
>  	id(FIRMWARE_PREALLOC_BUFFER, firmware)	\
>  	id(FIRMWARE_EFI_EMBEDDED, firmware)	\

And again, sorry that this was in here as a misleading example.

>  	id(MODULE, kernel-module)		\
> @@ -36,15 +37,31 @@ static inline const char *kernel_read_file_id_str(enum kernel_read_file_id id)
>  	return kernel_read_file_str[id];
>  }
>  
> +int kernel_pread_file(struct file *file,
> +		      void **buf, loff_t *size, loff_t pos,
> +		      loff_t max_size,
> +		      enum kernel_read_file_id id);
>  int kernel_read_file(struct file *file,
>  		     void **buf, loff_t *size, loff_t max_size,
>  		     enum kernel_read_file_id id);
> +int kernel_pread_file_from_path(const char *path,
> +				void **buf, loff_t *size, loff_t pos,
> +				loff_t max_size,
> +				enum kernel_read_file_id id);
>  int kernel_read_file_from_path(const char *path,
>  			       void **buf, loff_t *size, loff_t max_size,
>  			       enum kernel_read_file_id id);
> +int kernel_pread_file_from_path_initns(const char *path,
> +				       void **buf, loff_t *size, loff_t pos,
> +				       loff_t max_size,
> +				       enum kernel_read_file_id id);
>  int kernel_read_file_from_path_initns(const char *path,
>  				      void **buf, loff_t *size, loff_t max_size,
>  				      enum kernel_read_file_id id);
> +int kernel_pread_file_from_fd(int fd,
> +			      void **buf, loff_t *size, loff_t pos,
> +			      loff_t max_size,
> +			      enum kernel_read_file_id id);
>  int kernel_read_file_from_fd(int fd,
>  			     void **buf, loff_t *size, loff_t max_size,
>  			     enum kernel_read_file_id id);

I remain concerned that adding these helpers will lead a poor
interaction with LSMs, but I guess I get to hold my tongue. :)

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v10 1/9] fs: move kernel_read_file* to its own include file
From: Kees Cook @ 2020-07-07 23:40 UTC (permalink / raw)
  To: Scott Branden
  Cc: Luis Chamberlain, Wolfram Sang, Greg Kroah-Hartman, David Brown,
	Alexander Viro, Shuah Khan, bjorn.andersson, Shuah Khan,
	Arnd Bergmann, Mimi Zohar, Rafael J . Wysocki, linux-kernel,
	linux-arm-msm, linux-fsdevel, BCM Kernel Feedback, Olof Johansson,
	Andrew Morton, Dan Carpenter, Colin Ian King, Takashi Iwai,
	linux-kselftest, Andy Gross, linux-integrity,
	linux-security-module
In-Reply-To: <20200706232309.12010-2-scott.branden@broadcom.com>

On Mon, Jul 06, 2020 at 04:23:01PM -0700, Scott Branden wrote:
> Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h
> include file. That header gets pulled in just about everywhere
> and doesn't really need functions not related to the general fs interface.
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Scott Branden <scott.branden@broadcom.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  drivers/base/firmware_loader/main.c |  1 +
>  fs/exec.c                           |  1 +
>  include/linux/fs.h                  | 39 ----------------------
>  include/linux/ima.h                 |  1 +
>  include/linux/kernel_read_file.h    | 52 +++++++++++++++++++++++++++++
>  include/linux/security.h            |  1 +
>  kernel/kexec_file.c                 |  1 +
>  kernel/module.c                     |  1 +
>  security/integrity/digsig.c         |  1 +
>  security/integrity/ima/ima_fs.c     |  1 +
>  security/integrity/ima/ima_main.c   |  1 +
>  security/integrity/ima/ima_policy.c |  1 +
>  security/loadpin/loadpin.c          |  1 +
>  security/security.c                 |  1 +
>  security/selinux/hooks.c            |  1 +
>  15 files changed, 65 insertions(+), 39 deletions(-)
>  create mode 100644 include/linux/kernel_read_file.h

This looks like too many files are getting touched. If it got added to
security.h, very few of the above .c files will need it explicitly
added (maybe none). You can test future versions of this change with an
allmodconfig build and make sure you have a matching .o for each .c
file that calls kernel_read_file(). :)

But otherwise, sure, seems good.

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v10 9/9] ima: add FIRMWARE_PARTIAL_READ support
From: Kees Cook @ 2020-07-07 23:36 UTC (permalink / raw)
  To: Scott Branden
  Cc: Luis Chamberlain, Wolfram Sang, Greg Kroah-Hartman, David Brown,
	Alexander Viro, Shuah Khan, bjorn.andersson, Shuah Khan,
	Arnd Bergmann, Mimi Zohar, Rafael J . Wysocki, linux-kernel,
	linux-arm-msm, linux-fsdevel, BCM Kernel Feedback, Olof Johansson,
	Andrew Morton, Dan Carpenter, Colin Ian King, Takashi Iwai,
	linux-kselftest, Andy Gross, linux-integrity,
	linux-security-module
In-Reply-To: <df45cc5b-62d7-21c7-a852-1433a45b68ef@broadcom.com>

On Tue, Jul 07, 2020 at 10:13:42AM -0700, Scott Branden wrote:
> You and others are certainly more experts in the filesystem and security
> infrastructure of the kernel.
> What I am trying to accomplish is a simple operation:
> request part of a file into a buffer rather than the whole file.
> If someone could add such support I would be more than happy to use it.

Sure, and I totally understand that, but as it happens, no one has stepped
up with spare time to do that work. Since you're the person with the need
for the API, it falls to you to do it. And I understand what feature creep
feels like (I needed to fix one design problem[1] with timers, and I spent
months sending hundreds of patches). Some times you get lucky and it's
easy, and sometimes you end up touching something that needs a LOT of work
to refactor before you can make your desired change work well with the
rest of the kernel and be maintainable by other people into the future.

Quick tangent: I can't find in the many many threads where you explain
how large these firmware images are and why existing kernel memory
allocations are insufficient to load them. How large are these[2] files?

/lib/firmware/vk-boot1-bcm958401m2.ecdsa.bin
/lib/firmware/vk-boot2-bcm958401m2_a72.ecdsa.bin

For me, the requirements for partial read support are these things,
which are the characteristics of the existing API:

- the LSM must be able to validate the entire file contents before
  any data is available to any reader. (Which was pointed out back in
  August 2019[3].) And "any" reader includes having a DMA window open
  on the memory range used for reading the contents (which was pointed
  out at by Mimi[4] but went unanswered and remains broken still in this
  v10, but I will comment separately on that.)

- the integrity of the file contents must be maintained between
  validation and delivery (currently this is handled internally via
  disallow_writes()).

> This has now bubbled into many other designs issues in the existing
> codebase.

Correct -- this is one of the many difficulties of contributing to a
large and complex code base with many maintainers. There can be a lot
of requirements for the code that have nothing to do with seemingly more
narrow areas of endeavour.

> I will need more details on your comments - see below.
> 
> On 2020-07-06 8:08 p.m., Kees Cook wrote:
> > On Mon, Jul 06, 2020 at 04:23:09PM -0700, Scott Branden wrote:
> > > Add FIRMWARE_PARTIAL_READ support for integrity
> > > measurement on partial reads of firmware files.
> > Hi,
> > 
> > Several versions ago I'd suggested that the LSM infrastructure handle
> > the "full read" semantics so that individual LSMs don't need to each
> > duplicate the same efforts. As it happens, only IMA is impacted (SELinux
> > ignores everything except modules, and LoadPin only cares about origin
> > not contents).
> 
> Does your patch series "Fix misused kernel_read_file() enums" handle this
> because this suggestion is outside the scope of my change?

My proposed patch series cleans up a number of mistakes that were made
to the kernel_read_file() API, and helps clarify my point about the
enums being used for *what*, and not *how* or *where*, which needs to
be fixed in this series and shouldn't be a big deal (I will reply to
individual patches).

> > Next is the problem that enum kernel_read_file_id is an object
> > TYPE enum, not a HOW enum. (And it seems I missed the addition of
> > READING_FIRMWARE_PREALLOC_BUFFER, which may share a similar problem.)
> > That it's a partial read doesn't change _what_ you're reading: that's an
> > internal API detail. What happens when I attempt to do a partial read of
> > a kexec image?
> 
> It does not appear there is any user of partial reads of kexec images?
> I have been informed by Greg K-H to not add apis that are not used so such
> support doesn't make sense to add at this time.

But you are proposing a generic API enhancement that any other user in
the kernel may end up using. Just because the bcm-vk driver is the only
user now, and IMA is the only LSM performing content analysis, it
doesn't mean that there won't be another driver added later, nor another
LSM. In fact, the new BPF LSM means that anything exposed by LSM hooks
is now available for analysis.

> >   I'll use kernel_pread_file() and pass READING_KEXEC_IMAGE,
> > but the LSMs will have no idea it's a partial read.
> The addition I am adding is for request_partial_firmware_into_buf.
> In order to do so it adds internal support for partial reads of firmware
> files,
> not kexec image.

Yes, but you're changing kernel_read_file() APIs to do it. There are
plenty of users of that API. Maybe they would like to also use partial
reads?

$ git grep kernel_read_file | wc -l
77

> The above seems outside the scope of my patch?

Unfortunately, it is not. Part of your responsibility as a kernel
developer making API changes/additions is that those changes need to
interact correctly with the rest of the kernel (and be maintainable).

> > Finally, what keeps the contents of the file from changing between the
> > first call (which IMA will read the entire file for) and the next reads
> > which will bypass IMA?
> 
> The request is for a partial read.  IMA ensures the whole file integrity
> even though I only do a partial read.
> The next partial read will re-read and check integrity of file.

So, while terribly inefficient, I guess this approach is tenable. It
means that each partial read will trigger a full read for LSMs that care
about the hook.

So, to that end, I wonder why IMA doesn't do this for all file types?

It also means that we won't have a strict pairing of
security_kernel_read_file() to security_kernel_post_read_file() in the
LSMs, but it seems that only IMA currently explicitly cares about this
(or maybe not at all).

I'm not entirely happy about using this design, but it does appear
sufficient.

> >   I'd suggested that the open file must have writes
> > disabled on it (as execve() does).
> The file will be reopened and integrity checked on the next partial read (if
> there is one).
> So I don't think there is any change to be made here.
> If writes aren't already disabled for a whole file read then that is
> something that needs to be fixed in the existing code.

My suggestion quoted here was operating on the idea that whole-file
verification wasn't happening on every partial read, so this isn't a
problem in that case.

> > 
> > So, please redesign this:
> > - do not add an enum
> I used existing infrastructure provided by Mimi but now looks like it will
> have to fit with your patches from yesterday.

Right, this won't be hard. I will send a v2 of my patches to clarify the
purpose of the 3 file content hooks (load_data, read_file,
post_read_file), which might need renaming...

> > - make the file unwritable for the life of having the handle open
> It's no different than a full file read so no change to be made here.

Correct.

> > - make the "full read" happen as part of the first partial read so the
> >    LSMs don't have to reimplement everything
> Each partial read is an individual operation so I think a "full read" is
> performed every time
> if your security IMA is enabled.  If someone wants to add a file lock and
> then partial reads in the kernel
> then that would be different than what is needed by the kernel driver.

So, given that Mimi is (I think?) satisfied with your approach here, I
can't realistically complain. I still don't like the idea of each LSM
needing to perform it's full read loop for the contents, but so be it:
IMA will have the code, SELinux doesn't care (yet), and LoadPin doesn't
care about contents.

-Kees

[1] https://lore.kernel.org/lkml/20170808003345.GA107289@beast/
    git log --oneline --grep 'Convert timer' --author "Kees Cook" | wc -l
    271
[2] https://lore.kernel.org/lkml/824407ae-8ab8-0fe3-bd72-270fce960ac5@broadcom.com/
[3] https://lore.kernel.org/lkml/s5hsgpsqd49.wl-tiwai@suse.de/
[4] https://lore.kernel.org/lkml/1591622862.4638.74.camel@linux.ibm.com/

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH 2/4] fs: Remove FIRMWARE_PREALLOC_BUFFER from kernel_read_file() enums
From: Kees Cook @ 2020-07-07 21:55 UTC (permalink / raw)
  To: Scott Branden
  Cc: James Morris, Luis Chamberlain, Mimi Zohar, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alexander Viro, Jessica Yu, Dmitry Kasatkin,
	Serge E. Hallyn, Casey Schaufler, Eric W. Biederman,
	Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module,
	Christoph Hellwig
In-Reply-To: <0a5e2c2e-507c-9114-5328-5943f63d707e@broadcom.com>

On Tue, Jul 07, 2020 at 09:42:02AM -0700, Scott Branden wrote:
> On 2020-07-07 1:19 a.m., Kees Cook wrote:
> > FIRMWARE_PREALLOC_BUFFER is a "how", not a "what", and confuses the LSMs
> > that are interested in filtering between types of things. The "how"
> > should be an internal detail made uninteresting to the LSMs.
> > 
> > Fixes: a098ecd2fa7d ("firmware: support loading into a pre-allocated buffer")
> > Fixes: fd90bc559bfb ("ima: based on policy verify firmware signatures (pre-allocated buffer)")
> > Fixes: 4f0496d8ffa3 ("ima: based on policy warn about loading firmware (pre-allocated buffer)")
> > [...]
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 3f881a892ea7..95fc775ed937 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2993,10 +2993,10 @@ static inline void i_readcount_inc(struct inode *inode)
> >   #endif
> >   extern int do_pipe_flags(int *, int);
> > +/* This is a list of *what* is being read, not *how*. */
> >   #define __kernel_read_file_id(id) \
> >   	id(UNKNOWN, unknown)		\
> >   	id(FIRMWARE, firmware)		\
> With this change, I'm trying to figure out how the partial firmware read is
> going to work on top of this reachitecture.
> Is it going to be ok to add READING_PARTIAL_FIRMWARE here as that is a
> "what"?

No, that's why I said you need to do the implementation within the API
and not expect each LSM to implement their own (as I mentioned both
times):

https://lore.kernel.org/lkml/202005221551.5CA1372@keescook/
https://lore.kernel.org/lkml/202007061950.F6B3D9E6A@keescook/

I will reply in the thread above.

> > -	id(FIRMWARE_PREALLOC_BUFFER, firmware)	\
> My patch series gets rejected any time I make a change to the
> kernel_read_file* region in linux/fs.h.
> The requirement is for this api to move to another header file outside of
> linux/fs.h
> It seems the same should apply to your change.

Well I'm hardly making the same level of changes, but yeah, sure, if
that helps move things along, I can include that here.

> Could you please add the following patch to the start of you patch series to
> move the kernel_read_file* to its own include file?
> https://patchwork.kernel.org/patch/11647063/

https://lore.kernel.org/lkml/20200706232309.12010-2-scott.branden@broadcom.com/

You've included it in include/linux/security.h and that should be pretty
comprehensive, it shouldn't be needed in so many .c files.

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH 0/4] Fix misused kernel_read_file() enums
From: Kees Cook @ 2020-07-07 21:45 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: James Morris, Luis Chamberlain, Scott Branden, Greg Kroah-Hartman,
	Rafael J. Wysocki, Alexander Viro, Jessica Yu, Dmitry Kasatkin,
	Serge E. Hallyn, Casey Schaufler, Eric W. Biederman,
	Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, linux-kernel,
	linux-fsdevel, linux-integrity, linux-security-module
In-Reply-To: <1594136164.23056.76.camel@linux.ibm.com>

On Tue, Jul 07, 2020 at 11:36:04AM -0400, Mimi Zohar wrote:
> Hi Kees,
> 
> On Tue, 2020-07-07 at 01:19 -0700, Kees Cook wrote:
> > Hi,
> > 
> > In looking for closely at the additions that got made to the
> > kernel_read_file() enums, I noticed that FIRMWARE_PREALLOC_BUFFER
> > and FIRMWARE_EFI_EMBEDDED were added, but they are not appropriate
> > *kinds* of files for the LSM to reason about. They are a "how" and
> > "where", respectively. Remove these improper aliases and refactor the
> > code to adapt to the changes.
> 
> Thank you for adding the missing calls and the firmware pre allocated
> buffer comment update.
> 
> > 
> > Additionally adds in missing calls to security_kernel_post_read_file()
> > in the platform firmware fallback path (to match the sysfs firmware
> > fallback path) and in module loading. I considered entirely removing
> > security_kernel_post_read_file() hook since it is technically unused,
> > but IMA probably wants to be able to measure EFI-stored firmware images,
> > so I wired it up and matched it for modules, in case anyone wants to
> > move the module signature checks out of the module core and into an LSM
> > to avoid the current layering violations.
> 
> IMa has always verified kernel module signatures.  Recently appended

Right, but not through the kernel_post_read_file() hook, nor via
out-of-band hooks in kernel/module.c. I was just meaning that future
work could be done here to regularize module_sig_check() into an actual
LSM (which could, in theory, be extended to kexec() to avoid code
duplication there, as kimage_validate_signature() has some overlap with
mod_verify_sig()). into a bit more normal of an LSM.

As far as IMA and regularizing things, though, what about fixing IMA to
not manually stack:

$ grep -B3 ima_ security/security.c
        ret = call_int_hook(bprm_check_security, 0, bprm);
        if (ret)
                return ret;
        return ima_bprm_check(bprm);
--
        ret = call_int_hook(file_mprotect, 0, vma, reqprot, prot);
        if (ret)
                return ret;
        return ima_file_mprotect(vma, prot);
...

Can IMA implement a hook-last method to join the regular stacking routines?

> kernel module signature support was added to IMA.  The same appended
> signature format is also being used to sign and verify the kexec
> kernel image.
> 
> With IMA's new kernel module appended signature support and patch 4/4
> in this series, IMA won't be limit to the finit_module syscall, but
> could support the init_module syscall as well.

Exactly.

> > This touches several trees, and I suspect it would be best to go through
> > James's LSM tree.
> 
> Sure.

Is this an "Acked-by"? :)

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH] Replace HTTP links with HTTPS ones: security
From: James Morris @ 2020-07-07 20:28 UTC (permalink / raw)
  To: Alexander A. Klimov
  Cc: serge, john.johansen, zohar, dmitry.kasatkin, dhowells,
	jarkko.sakkinen, linux-security-module, linux-kernel,
	linux-integrity, keyrings
In-Reply-To: <20200705214512.28498-1-grandmaster@al2klimov.de>

On Sun, 5 Jul 2020, Alexander A. Klimov wrote:

> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.
> 
> Deterministic algorithm:
> For each file:
>   If not .svg:
>     For each line:
>       If doesn't contain `\bxmlns\b`:
>         For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
>           If both the HTTP and HTTPS versions
>           return 200 OK and serve the same content:
>             Replace HTTP with HTTPS.
> 
> Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>

Thanks.

Applied to 
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next-general


-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply

* Re: [PATCH v4 3/3] prctl: Allow ptrace capable processes to change /proc/self/exe
From: Cyrill Gorcunov @ 2020-07-07 20:27 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Nicolas Viennot, Jann Horn, Paul Moore, Serge E. Hallyn,
	Adrian Reber, Eric Biederman, Pavel Emelyanov, Oleg Nesterov,
	Dmitry Safonov, Andrei Vagin, Michał Cłapiński,
	Kamil Yurtsever, Dirk Petersen, Christine Flood, Casey Schaufler,
	Mike Rapoport, Radostin Stoyanov, Stephen Smalley, Sargun Dhillon,
	Arnd Bergmann, linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org, selinux@vger.kernel.org, Eric Paris,
	linux-fsdevel@vger.kernel.org
In-Reply-To: <20200707154504.aknxmw6qavpjkr24@wittgenstein>

On Tue, Jul 07, 2020 at 05:45:04PM +0200, Christian Brauner wrote:
...
> 
> Ok, so the original patch proposal was presented in [4] in 2014. The
> final version of that patch added the PR_SET_MM_MAP we know today. The
> initial version presented in [4] did not require _any_ privilege.
> 

True. I still think that relyng on /proc/<pid>/exe being immutable (or
guarded by caps) in a sake of security is a bit misleading, this link
only a hint without any guarantees of what code is being executed once
we pass cs:rip to userspace right after exec is completed. Nowadays I rather
think we might need to call audit_log() here or something similar to point
that exe link is changed (by criu or someone else) and simply notify
node's administrator, that's all. But as you pointed tomoyo may be
affected if we simply drops all caps from here. Thus I agree that
the new cap won't make situation worse.

Still I'm not in touch with kernel code for a couple of years already
and might be missing something obvious here.

^ permalink raw reply

* Re: [PATCH v19 07/12] landlock: Support filesystem access-control
From: Randy Dunlap @ 2020-07-07 20:11 UTC (permalink / raw)
  To: Mickaël Salaün, linux-kernel
  Cc: Al Viro, Andy Lutomirski, Anton Ivanov, Arnd Bergmann,
	Casey Schaufler, James Morris, Jann Horn, Jeff Dike,
	Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86
In-Reply-To: <20200707180955.53024-8-mic@digikod.net>

Hi--

On 7/7/20 11:09 AM, Mickaël Salaün wrote:
> ---
>  arch/Kconfig                  |   7 +
>  arch/um/Kconfig               |   1 +
>  include/uapi/linux/landlock.h |  78 +++++
>  security/landlock/Kconfig     |   2 +-
>  security/landlock/Makefile    |   2 +-
>  security/landlock/fs.c        | 609 ++++++++++++++++++++++++++++++++++
>  security/landlock/fs.h        |  60 ++++
>  security/landlock/setup.c     |   7 +
>  security/landlock/setup.h     |   2 +
>  9 files changed, 766 insertions(+), 2 deletions(-)
>  create mode 100644 include/uapi/linux/landlock.h
>  create mode 100644 security/landlock/fs.c
>  create mode 100644 security/landlock/fs.h
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 8cc35dc556c7..483b7476ac69 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -845,6 +845,13 @@ config COMPAT_32BIT_TIME
>  config ARCH_NO_PREEMPT
>  	bool
>  
> +config ARCH_EPHEMERAL_STATES
> +	def_bool n
> +	help
> +	  An arch should select this symbol if it do not keep an internal kernel

	                                       it does not

> +	  state for kernel objects such as inodes, but instead rely on something

	                                               instead relies on

> +	  else (e.g. the host kernel for an UML kernel).
> +
>  config ARCH_SUPPORTS_RT
>  	bool
>  

thanks.
-- 
~Randy


^ permalink raw reply

* [PATCH v19 05/12] LSM: Infrastructure management of the superblock
From: Mickaël Salaün @ 2020-07-07 18:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski, Anton Ivanov,
	Arnd Bergmann, Casey Schaufler, James Morris, Jann Horn,
	Jeff Dike, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86, John Johansen, Stephen Smalley
In-Reply-To: <20200707180955.53024-1-mic@digikod.net>

From: Casey Schaufler <casey@schaufler-ca.com>

Move management of the superblock->sb_security blob out
of the individual security modules and into the security
infrastructure. Instead of allocating the blobs from within
the modules the modules tell the infrastructure how much
space is required, and the space is allocated there.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20190829232935.7099-2-casey@schaufler-ca.com
---

Changes since v17:
* Rebase the original LSM stacking patch from v5.3 to v5.7: I fixed some
  diff conflicts caused by code moves and function renames in
  selinux/include/objsec.h and selinux/hooks.c .  I checked that it
  builds but I didn't test the changes for SELinux nor SMACK.
---
 include/linux/lsm_hooks.h         |  1 +
 security/security.c               | 46 ++++++++++++++++++++----
 security/selinux/hooks.c          | 58 ++++++++++++-------------------
 security/selinux/include/objsec.h |  6 ++++
 security/selinux/ss/services.c    |  3 +-
 security/smack/smack.h            |  6 ++++
 security/smack/smack_lsm.c        | 35 +++++--------------
 7 files changed, 85 insertions(+), 70 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 95b7c1d32062..80c629d8a2ad 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1550,6 +1550,7 @@ struct lsm_blob_sizes {
 	int	lbs_cred;
 	int	lbs_file;
 	int	lbs_inode;
+	int	lbs_superblock;
 	int	lbs_ipc;
 	int	lbs_msg_msg;
 	int	lbs_task;
diff --git a/security/security.c b/security/security.c
index 70a7ad357bc6..d60aa835b670 100644
--- a/security/security.c
+++ b/security/security.c
@@ -201,6 +201,7 @@ static void __init lsm_set_blob_sizes(struct lsm_blob_sizes *needed)
 	lsm_set_blob_size(&needed->lbs_inode, &blob_sizes.lbs_inode);
 	lsm_set_blob_size(&needed->lbs_ipc, &blob_sizes.lbs_ipc);
 	lsm_set_blob_size(&needed->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
+	lsm_set_blob_size(&needed->lbs_superblock, &blob_sizes.lbs_superblock);
 	lsm_set_blob_size(&needed->lbs_task, &blob_sizes.lbs_task);
 }
 
@@ -331,12 +332,13 @@ static void __init ordered_lsm_init(void)
 	for (lsm = ordered_lsms; *lsm; lsm++)
 		prepare_lsm(*lsm);
 
-	init_debug("cred blob size     = %d\n", blob_sizes.lbs_cred);
-	init_debug("file blob size     = %d\n", blob_sizes.lbs_file);
-	init_debug("inode blob size    = %d\n", blob_sizes.lbs_inode);
-	init_debug("ipc blob size      = %d\n", blob_sizes.lbs_ipc);
-	init_debug("msg_msg blob size  = %d\n", blob_sizes.lbs_msg_msg);
-	init_debug("task blob size     = %d\n", blob_sizes.lbs_task);
+	init_debug("cred blob size       = %d\n", blob_sizes.lbs_cred);
+	init_debug("file blob size       = %d\n", blob_sizes.lbs_file);
+	init_debug("inode blob size      = %d\n", blob_sizes.lbs_inode);
+	init_debug("ipc blob size        = %d\n", blob_sizes.lbs_ipc);
+	init_debug("msg_msg blob size    = %d\n", blob_sizes.lbs_msg_msg);
+	init_debug("superblock blob size = %d\n", blob_sizes.lbs_superblock);
+	init_debug("task blob size       = %d\n", blob_sizes.lbs_task);
 
 	/*
 	 * Create any kmem_caches needed for blobs
@@ -668,6 +670,27 @@ static void __init lsm_early_task(struct task_struct *task)
 		panic("%s: Early task alloc failed.\n", __func__);
 }
 
+/**
+ * lsm_superblock_alloc - allocate a composite superblock blob
+ * @sb: the superblock that needs a blob
+ *
+ * Allocate the superblock blob for all the modules
+ *
+ * Returns 0, or -ENOMEM if memory can't be allocated.
+ */
+static int lsm_superblock_alloc(struct super_block *sb)
+{
+	if (blob_sizes.lbs_superblock == 0) {
+		sb->s_security = NULL;
+		return 0;
+	}
+
+	sb->s_security = kzalloc(blob_sizes.lbs_superblock, GFP_KERNEL);
+	if (sb->s_security == NULL)
+		return -ENOMEM;
+	return 0;
+}
+
 /*
  * The default value of the LSM hook is defined in linux/lsm_hook_defs.h and
  * can be accessed with:
@@ -865,12 +888,21 @@ int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *
 
 int security_sb_alloc(struct super_block *sb)
 {
-	return call_int_hook(sb_alloc_security, 0, sb);
+	int rc = lsm_superblock_alloc(sb);
+
+	if (unlikely(rc))
+		return rc;
+	rc = call_int_hook(sb_alloc_security, 0, sb);
+	if (unlikely(rc))
+		security_sb_free(sb);
+	return rc;
 }
 
 void security_sb_free(struct super_block *sb)
 {
 	call_void_hook(sb_free_security, sb);
+	kfree(sb->s_security);
+	sb->s_security = NULL;
 }
 
 void security_free_mnt_opts(void **mnt_opts)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index efa6108b1ce9..a88e21b90639 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -321,7 +321,7 @@ static void inode_free_security(struct inode *inode)
 
 	if (!isec)
 		return;
-	sbsec = inode->i_sb->s_security;
+	sbsec = selinux_superblock(inode->i_sb);
 	/*
 	 * As not all inode security structures are in a list, we check for
 	 * empty list outside of the lock to make sure that we won't waste
@@ -339,13 +339,6 @@ static void inode_free_security(struct inode *inode)
 	}
 }
 
-static void superblock_free_security(struct super_block *sb)
-{
-	struct superblock_security_struct *sbsec = sb->s_security;
-	sb->s_security = NULL;
-	kfree(sbsec);
-}
-
 struct selinux_mnt_opts {
 	const char *fscontext, *context, *rootcontext, *defcontext;
 };
@@ -457,7 +450,7 @@ static int selinux_is_genfs_special_handling(struct super_block *sb)
 
 static int selinux_is_sblabel_mnt(struct super_block *sb)
 {
-	struct superblock_security_struct *sbsec = sb->s_security;
+	struct superblock_security_struct *sbsec = selinux_superblock(sb);
 
 	/*
 	 * IMPORTANT: Double-check logic in this function when adding a new
@@ -485,7 +478,7 @@ static int selinux_is_sblabel_mnt(struct super_block *sb)
 
 static int sb_finish_set_opts(struct super_block *sb)
 {
-	struct superblock_security_struct *sbsec = sb->s_security;
+	struct superblock_security_struct *sbsec = selinux_superblock(sb);
 	struct dentry *root = sb->s_root;
 	struct inode *root_inode = d_backing_inode(root);
 	int rc = 0;
@@ -598,7 +591,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 				unsigned long *set_kern_flags)
 {
 	const struct cred *cred = current_cred();
-	struct superblock_security_struct *sbsec = sb->s_security;
+	struct superblock_security_struct *sbsec = selinux_superblock(sb);
 	struct dentry *root = sbsec->sb->s_root;
 	struct selinux_mnt_opts *opts = mnt_opts;
 	struct inode_security_struct *root_isec;
@@ -835,8 +828,8 @@ static int selinux_set_mnt_opts(struct super_block *sb,
 static int selinux_cmp_sb_context(const struct super_block *oldsb,
 				    const struct super_block *newsb)
 {
-	struct superblock_security_struct *old = oldsb->s_security;
-	struct superblock_security_struct *new = newsb->s_security;
+	struct superblock_security_struct *old = selinux_superblock(oldsb);
+	struct superblock_security_struct *new = selinux_superblock(newsb);
 	char oldflags = old->flags & SE_MNTMASK;
 	char newflags = new->flags & SE_MNTMASK;
 
@@ -868,8 +861,9 @@ static int selinux_sb_clone_mnt_opts(const struct super_block *oldsb,
 					unsigned long *set_kern_flags)
 {
 	int rc = 0;
-	const struct superblock_security_struct *oldsbsec = oldsb->s_security;
-	struct superblock_security_struct *newsbsec = newsb->s_security;
+	const struct superblock_security_struct *oldsbsec =
+						selinux_superblock(oldsb);
+	struct superblock_security_struct *newsbsec = selinux_superblock(newsb);
 
 	int set_fscontext =	(oldsbsec->flags & FSCONTEXT_MNT);
 	int set_context =	(oldsbsec->flags & CONTEXT_MNT);
@@ -1048,7 +1042,7 @@ static int show_sid(struct seq_file *m, u32 sid)
 
 static int selinux_sb_show_options(struct seq_file *m, struct super_block *sb)
 {
-	struct superblock_security_struct *sbsec = sb->s_security;
+	struct superblock_security_struct *sbsec = selinux_superblock(sb);
 	int rc;
 
 	if (!(sbsec->flags & SE_SBINITIALIZED))
@@ -1398,7 +1392,7 @@ static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dent
 	if (isec->sclass == SECCLASS_FILE)
 		isec->sclass = inode_mode_to_security_class(inode->i_mode);
 
-	sbsec = inode->i_sb->s_security;
+	sbsec = selinux_superblock(inode->i_sb);
 	if (!(sbsec->flags & SE_SBINITIALIZED)) {
 		/* Defer initialization until selinux_complete_init,
 		   after the initial policy is loaded and the security
@@ -1741,7 +1735,8 @@ selinux_determine_inode_label(const struct task_security_struct *tsec,
 				 const struct qstr *name, u16 tclass,
 				 u32 *_new_isid)
 {
-	const struct superblock_security_struct *sbsec = dir->i_sb->s_security;
+	const struct superblock_security_struct *sbsec =
+						selinux_superblock(dir->i_sb);
 
 	if ((sbsec->flags & SE_SBINITIALIZED) &&
 	    (sbsec->behavior == SECURITY_FS_USE_MNTPOINT)) {
@@ -1772,7 +1767,7 @@ static int may_create(struct inode *dir,
 	int rc;
 
 	dsec = inode_security(dir);
-	sbsec = dir->i_sb->s_security;
+	sbsec = selinux_superblock(dir->i_sb);
 
 	sid = tsec->sid;
 
@@ -1921,7 +1916,7 @@ static int superblock_has_perm(const struct cred *cred,
 	struct superblock_security_struct *sbsec;
 	u32 sid = cred_sid(cred);
 
-	sbsec = sb->s_security;
+	sbsec = selinux_superblock(sb);
 	return avc_has_perm(&selinux_state,
 			    sid, sbsec->sid, SECCLASS_FILESYSTEM, perms, ad);
 }
@@ -2550,11 +2545,7 @@ static void selinux_bprm_committed_creds(struct linux_binprm *bprm)
 
 static int selinux_sb_alloc_security(struct super_block *sb)
 {
-	struct superblock_security_struct *sbsec;
-
-	sbsec = kzalloc(sizeof(struct superblock_security_struct), GFP_KERNEL);
-	if (!sbsec)
-		return -ENOMEM;
+	struct superblock_security_struct *sbsec = selinux_superblock(sb);
 
 	mutex_init(&sbsec->lock);
 	INIT_LIST_HEAD(&sbsec->isec_head);
@@ -2563,16 +2554,10 @@ static int selinux_sb_alloc_security(struct super_block *sb)
 	sbsec->sid = SECINITSID_UNLABELED;
 	sbsec->def_sid = SECINITSID_FILE;
 	sbsec->mntpoint_sid = SECINITSID_UNLABELED;
-	sb->s_security = sbsec;
 
 	return 0;
 }
 
-static void selinux_sb_free_security(struct super_block *sb)
-{
-	superblock_free_security(sb);
-}
-
 static inline int opt_len(const char *s)
 {
 	bool open_quote = false;
@@ -2651,7 +2636,7 @@ static int selinux_sb_eat_lsm_opts(char *options, void **mnt_opts)
 static int selinux_sb_remount(struct super_block *sb, void *mnt_opts)
 {
 	struct selinux_mnt_opts *opts = mnt_opts;
-	struct superblock_security_struct *sbsec = sb->s_security;
+	struct superblock_security_struct *sbsec = selinux_superblock(sb);
 	u32 sid;
 	int rc;
 
@@ -2889,7 +2874,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
 	int rc;
 	char *context;
 
-	sbsec = dir->i_sb->s_security;
+	sbsec = selinux_superblock(dir->i_sb);
 
 	newsid = tsec->create_sid;
 
@@ -3134,7 +3119,7 @@ static int selinux_inode_setxattr(struct dentry *dentry, const char *name,
 	if (!selinux_initialized(&selinux_state))
 		return (inode_owner_or_capable(inode) ? 0 : -EPERM);
 
-	sbsec = inode->i_sb->s_security;
+	sbsec = selinux_superblock(inode->i_sb);
 	if (!(sbsec->flags & SBLABEL_MNT))
 		return -EOPNOTSUPP;
 
@@ -3368,13 +3353,14 @@ static int selinux_inode_setsecurity(struct inode *inode, const char *name,
 				     const void *value, size_t size, int flags)
 {
 	struct inode_security_struct *isec = inode_security_novalidate(inode);
-	struct superblock_security_struct *sbsec = inode->i_sb->s_security;
+	struct superblock_security_struct *sbsec;
 	u32 newsid;
 	int rc;
 
 	if (strcmp(name, XATTR_SELINUX_SUFFIX))
 		return -EOPNOTSUPP;
 
+	sbsec = selinux_superblock(inode->i_sb);
 	if (!(sbsec->flags & SBLABEL_MNT))
 		return -EOPNOTSUPP;
 
@@ -6870,6 +6856,7 @@ struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = {
 	.lbs_inode = sizeof(struct inode_security_struct),
 	.lbs_ipc = sizeof(struct ipc_security_struct),
 	.lbs_msg_msg = sizeof(struct msg_security_struct),
+	.lbs_superblock = sizeof(struct superblock_security_struct),
 };
 
 #ifdef CONFIG_PERF_EVENTS
@@ -6970,7 +6957,6 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(bprm_committing_creds, selinux_bprm_committing_creds),
 	LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds),
 
-	LSM_HOOK_INIT(sb_free_security, selinux_sb_free_security),
 	LSM_HOOK_INIT(sb_free_mnt_opts, selinux_free_mnt_opts),
 	LSM_HOOK_INIT(sb_remount, selinux_sb_remount),
 	LSM_HOOK_INIT(sb_kern_mount, selinux_sb_kern_mount),
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 330b7b6d44e0..dcebd2b95ca7 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -189,4 +189,10 @@ static inline u32 current_sid(void)
 	return tsec->sid;
 }
 
+static inline struct superblock_security_struct *selinux_superblock(
+					const struct super_block *superblock)
+{
+	return superblock->s_security + selinux_blob_sizes.lbs_superblock;
+}
+
 #endif /* _SELINUX_OBJSEC_H_ */
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index ef0afd878bfc..f838010cb0fe 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -47,6 +47,7 @@
 #include <linux/sched.h>
 #include <linux/audit.h>
 #include <linux/vmalloc.h>
+#include <linux/lsm_hooks.h>
 #include <net/netlabel.h>
 
 #include "flask.h"
@@ -2795,7 +2796,7 @@ int security_fs_use(struct selinux_state *state, struct super_block *sb)
 	struct sidtab *sidtab;
 	int rc = 0;
 	struct ocontext *c;
-	struct superblock_security_struct *sbsec = sb->s_security;
+	struct superblock_security_struct *sbsec = selinux_superblock(sb);
 	const char *fstype = sb->s_type->name;
 
 	read_lock(&state->ss->policy_rwlock);
diff --git a/security/smack/smack.h b/security/smack/smack.h
index e9e817d09785..d33d2a7b73a3 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -364,6 +364,12 @@ static inline struct smack_known **smack_ipc(const struct kern_ipc_perm *ipc)
 	return ipc->security + smack_blob_sizes.lbs_ipc;
 }
 
+static inline struct superblock_smack *smack_superblock(
+					const struct super_block *superblock)
+{
+	return superblock->s_security + smack_blob_sizes.lbs_superblock;
+}
+
 /*
  * Is the directory transmuting?
  */
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 8ffbf951b7ed..8d8f70a99374 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -535,12 +535,7 @@ static int smack_syslog(int typefrom_file)
  */
 static int smack_sb_alloc_security(struct super_block *sb)
 {
-	struct superblock_smack *sbsp;
-
-	sbsp = kzalloc(sizeof(struct superblock_smack), GFP_KERNEL);
-
-	if (sbsp == NULL)
-		return -ENOMEM;
+	struct superblock_smack *sbsp = smack_superblock(sb);
 
 	sbsp->smk_root = &smack_known_floor;
 	sbsp->smk_default = &smack_known_floor;
@@ -549,22 +544,10 @@ static int smack_sb_alloc_security(struct super_block *sb)
 	/*
 	 * SMK_SB_INITIALIZED will be zero from kzalloc.
 	 */
-	sb->s_security = sbsp;
 
 	return 0;
 }
 
-/**
- * smack_sb_free_security - free a superblock blob
- * @sb: the superblock getting the blob
- *
- */
-static void smack_sb_free_security(struct super_block *sb)
-{
-	kfree(sb->s_security);
-	sb->s_security = NULL;
-}
-
 struct smack_mnt_opts {
 	const char *fsdefault, *fsfloor, *fshat, *fsroot, *fstransmute;
 };
@@ -772,7 +755,7 @@ static int smack_set_mnt_opts(struct super_block *sb,
 {
 	struct dentry *root = sb->s_root;
 	struct inode *inode = d_backing_inode(root);
-	struct superblock_smack *sp = sb->s_security;
+	struct superblock_smack *sp = smack_superblock(sb);
 	struct inode_smack *isp;
 	struct smack_known *skp;
 	struct smack_mnt_opts *opts = mnt_opts;
@@ -871,7 +854,7 @@ static int smack_set_mnt_opts(struct super_block *sb,
  */
 static int smack_sb_statfs(struct dentry *dentry)
 {
-	struct superblock_smack *sbp = dentry->d_sb->s_security;
+	struct superblock_smack *sbp = smack_superblock(dentry->d_sb);
 	int rc;
 	struct smk_audit_info ad;
 
@@ -905,7 +888,7 @@ static int smack_bprm_creds_for_exec(struct linux_binprm *bprm)
 	if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task)
 		return 0;
 
-	sbsp = inode->i_sb->s_security;
+	sbsp = smack_superblock(inode->i_sb);
 	if ((sbsp->smk_flags & SMK_SB_UNTRUSTED) &&
 	    isp->smk_task != sbsp->smk_root)
 		return 0;
@@ -1157,7 +1140,7 @@ static int smack_inode_rename(struct inode *old_inode,
  */
 static int smack_inode_permission(struct inode *inode, int mask)
 {
-	struct superblock_smack *sbsp = inode->i_sb->s_security;
+	struct superblock_smack *sbsp = smack_superblock(inode->i_sb);
 	struct smk_audit_info ad;
 	int no_block = mask & MAY_NOT_BLOCK;
 	int rc;
@@ -1398,7 +1381,7 @@ static int smack_inode_removexattr(struct dentry *dentry, const char *name)
 	 */
 	if (strcmp(name, XATTR_NAME_SMACK) == 0) {
 		struct super_block *sbp = dentry->d_sb;
-		struct superblock_smack *sbsp = sbp->s_security;
+		struct superblock_smack *sbsp = smack_superblock(sbp);
 
 		isp->smk_inode = sbsp->smk_default;
 	} else if (strcmp(name, XATTR_NAME_SMACKEXEC) == 0)
@@ -1668,7 +1651,7 @@ static int smack_mmap_file(struct file *file,
 	isp = smack_inode(file_inode(file));
 	if (isp->smk_mmap == NULL)
 		return 0;
-	sbsp = file_inode(file)->i_sb->s_security;
+	sbsp = smack_superblock(file_inode(file)->i_sb);
 	if (sbsp->smk_flags & SMK_SB_UNTRUSTED &&
 	    isp->smk_mmap != sbsp->smk_root)
 		return -EACCES;
@@ -3268,7 +3251,7 @@ static void smack_d_instantiate(struct dentry *opt_dentry, struct inode *inode)
 		return;
 
 	sbp = inode->i_sb;
-	sbsp = sbp->s_security;
+	sbsp = smack_superblock(sbp);
 	/*
 	 * We're going to use the superblock default label
 	 * if there's no label on the file.
@@ -4653,6 +4636,7 @@ struct lsm_blob_sizes smack_blob_sizes __lsm_ro_after_init = {
 	.lbs_inode = sizeof(struct inode_smack),
 	.lbs_ipc = sizeof(struct smack_known *),
 	.lbs_msg_msg = sizeof(struct smack_known *),
+	.lbs_superblock = sizeof(struct superblock_smack),
 };
 
 static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
@@ -4664,7 +4648,6 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(fs_context_parse_param, smack_fs_context_parse_param),
 
 	LSM_HOOK_INIT(sb_alloc_security, smack_sb_alloc_security),
-	LSM_HOOK_INIT(sb_free_security, smack_sb_free_security),
 	LSM_HOOK_INIT(sb_free_mnt_opts, smack_free_mnt_opts),
 	LSM_HOOK_INIT(sb_eat_lsm_opts, smack_sb_eat_lsm_opts),
 	LSM_HOOK_INIT(sb_statfs, smack_sb_statfs),
-- 
2.27.0


^ permalink raw reply related

* [PATCH v19 10/12] selftests/landlock: Add initial tests
From: Mickaël Salaün @ 2020-07-07 18:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski, Anton Ivanov,
	Arnd Bergmann, Casey Schaufler, James Morris, Jann Horn,
	Jeff Dike, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86
In-Reply-To: <20200707180955.53024-1-mic@digikod.net>

Test landlock syscall, ptrace hooks semantic and filesystem
access-control.

Test coverage for security/landlock/ is 93.6% of lines.  The code not
covered only deals with internal kernel errors (e.g. memory allocation)
and race conditions.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Shuah Khan <shuah@kernel.org>
---

Changes since v18:
* Replace ruleset_rw.inval with layout1.inval to avoid inexistent test
  layout.
* Use the new FIXTURE_VARIANT for ptrace_test: makes the tests more
  readable and usable.
* Add ARRAY_SIZE() macro to please checkpatch.

Changes since v17:
* Add new test for mknod with a zero mode.
* Use memset(3) to initialize attr_features in base_test.

Changes since v16:
* Add new unpriv_enforce_without_no_new_privs test: check that ruleset
  enforcing is forbiden without no_new_privs and CAP_SYS_ADMIN.
* Drop capabilities when useful.
* Check the new size_attr_features field from struct
  landlock_attr_features.
* Update the empty_or_same_ruleset test to check complementary empty
  ruleset.
* Update base_test according to the new attribute structures and fix the
  inconsistent_attr test accordingly.
* Switch syscall attribute pointer and size arguments.
* Rename test files with a "_test" suffix.

Changes since v14:
* Add new tests:
  - superset: check new layer bitmask.
  - max_layers: check maximum number of layers.
  - release_inodes: check that umount work well.
  - empty_or_same_ruleset.
  - inconsistent_attr: checks copy_to_user limits.
  - in ruleset_rw.inval to check ruleset FD.
  - proc_unlinked_file: check file access through /proc/self/fd .
  - file_access_rights: check that a file can only get consistent access
    rights.
  - unpriv: check that NO_NEW_PRIVS or CAP_SYS_ADMIN is required.
  - check pipe access through /proc/self/fd .
  - check move_mount(2).
  - check ruleset file descriptor properties.
  - proc_nsfs: extend to check that internal filesystems (e.g. nsfs) are
    allowed.
* Double-check read and write effective actions.
* Fix potential desynchronization between the kernel sources and
  installed headers by overriding the build step in the Makefile.  This
  also enable to build with Clang.
* Add two files in the test directories (for link test and rename test).
* Remove test for ruleset's show_fdinfo().
* Replace EBADR with EBADFD.
* Update tests accordingly to the changes of rename and link rights.
* Fix (now) illegal access rights tied to files.
* Update rename and link tests.
* Remove superfluous '\n' in TH_LOG() calls.
* Make assert calls consistent and readable.
* Fix the execute test.
* Make tests future-proof.
* Cosmetic fixes.

Changes since v14:
* Add new tests:
  - Compatibility: empty_attr_{ruleset,path_beneath,enforce} to check
    minimal attr size.
  - Access types: link_to, rename_from, rename_to, rmdir, unlink,
    make_char, make_block, make_reg, make_sock, make_fifo, make_sym,
    make_dir, chroot, execute.
  - Test privilege escalation prevention by enforcing a nested rule, on
    a parent directory, with less restrictions than one on a child
    directory.
  - Test for empty and more than 32-bits allowed_access
* Merge the two test mount hierarchies.
* Complete relative path tests by combining chdir and chroot.
* Adjust tests:
  - Remove the layout1/extend_ruleset_with_denied_path test.
  - Extend layout1/whitelist test with checks on file.
  - Add and use create_dir_and_file().
* Only use read/write checks but not stat(2) for tests.
* Rename test.h to common.h and improve it.
* Rename path name to make them more consistent, easy to understand and
  make them in a common directory.
* Make create_ruleset() more generic.
* Constify variables.
* Re-add static global variables.
* Remove useless openat(2).
* Fix and complete kernel config.
* Set umask and clean up file modes.
* Clean up open flags.
* Improve Makefile.
* Fix spelling.
* Improve comments and error messages.

Changes since v13:
* Add back the filesystem tests (from v10) and extend them.
* Add tests for the new syscall.

Previous changes:
https://lore.kernel.org/lkml/20191104172146.30797-7-mic@digikod.net/
---
 tools/testing/selftests/Makefile              |    1 +
 tools/testing/selftests/landlock/.gitignore   |    2 +
 tools/testing/selftests/landlock/Makefile     |   29 +
 tools/testing/selftests/landlock/base_test.c  |  163 ++
 tools/testing/selftests/landlock/common.h     |   93 +
 tools/testing/selftests/landlock/config       |    5 +
 tools/testing/selftests/landlock/fs_test.c    | 1740 +++++++++++++++++
 .../testing/selftests/landlock/ptrace_test.c  |  321 +++
 tools/testing/selftests/landlock/true.c       |    5 +
 9 files changed, 2359 insertions(+)
 create mode 100644 tools/testing/selftests/landlock/.gitignore
 create mode 100644 tools/testing/selftests/landlock/Makefile
 create mode 100644 tools/testing/selftests/landlock/base_test.c
 create mode 100644 tools/testing/selftests/landlock/common.h
 create mode 100644 tools/testing/selftests/landlock/config
 create mode 100644 tools/testing/selftests/landlock/fs_test.c
 create mode 100644 tools/testing/selftests/landlock/ptrace_test.c
 create mode 100644 tools/testing/selftests/landlock/true.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 1195bd85af38..25f00c514cc1 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -24,6 +24,7 @@ TARGETS += ir
 TARGETS += kcmp
 TARGETS += kexec
 TARGETS += kvm
+TARGETS += landlock
 TARGETS += lib
 TARGETS += livepatch
 TARGETS += lkdtm
diff --git a/tools/testing/selftests/landlock/.gitignore b/tools/testing/selftests/landlock/.gitignore
new file mode 100644
index 000000000000..470203a7cd73
--- /dev/null
+++ b/tools/testing/selftests/landlock/.gitignore
@@ -0,0 +1,2 @@
+/*_test
+/true
diff --git a/tools/testing/selftests/landlock/Makefile b/tools/testing/selftests/landlock/Makefile
new file mode 100644
index 000000000000..d29bd0804c14
--- /dev/null
+++ b/tools/testing/selftests/landlock/Makefile
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: GPL-2.0
+
+src_test := $(wildcard *_test.c)
+
+TEST_GEN_PROGS := $(src_test:.c=)
+
+TEST_GEN_PROGS_EXTENDED := true
+
+KSFT_KHDR_INSTALL := 1
+OVERRIDE_TARGETS := 1
+include ../lib.mk
+
+# Cf. tools/testing/selftests/arm64/Makefile
+ifeq ($(KBUILD_OUTPUT),)
+khdr_dir = $(top_srcdir)/usr/include
+else
+khdr_dir = $(KBUILD_OUTPUT)/kselftest/usr/include
+endif
+
+CFLAGS += -Wall -O2
+
+$(khdr_dir)/linux/landlock.h: khdr
+	@:
+
+$(OUTPUT)/true: true.c
+	$(LINK.c) $< $(LDLIBS) -o $@ -static
+
+$(OUTPUT)/%_test: %_test.c $(khdr_dir)/linux/landlock.h ../kselftest_harness.h common.h
+	$(LINK.c) $< $(LDLIBS) -o $@ -lcap -I$(khdr_dir)
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
new file mode 100644
index 000000000000..f6bc8d6419f8
--- /dev/null
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Common user space base
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <string.h>
+#include <sys/prctl.h>
+
+#include "common.h"
+
+#ifndef O_PATH
+#define O_PATH		010000000
+#endif
+
+TEST(features)
+{
+	struct landlock_attr_features attr_features;
+
+	/* Tests that all fields are properly initialized. */
+	memset(&attr_features, 0xff, sizeof(attr_features));
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES,
+				LANDLOCK_OPT_GET_FEATURES,
+				&attr_features, sizeof(attr_features)));
+	ASSERT_EQ(((LANDLOCK_OPT_GET_FEATURES << 1) - 1),
+			attr_features.options_get_features);
+	ASSERT_EQ(((LANDLOCK_OPT_CREATE_RULESET << 1) - 1),
+			attr_features.options_create_ruleset);
+	ASSERT_EQ(((LANDLOCK_OPT_ADD_RULE_PATH_BENEATH << 1) - 1),
+			attr_features.options_add_rule);
+	ASSERT_EQ(((LANDLOCK_OPT_ENFORCE_RULESET << 1) - 1),
+			attr_features.options_enforce_ruleset);
+	ASSERT_EQ(sizeof(struct landlock_attr_features),
+		attr_features.size_attr_features);
+	ASSERT_EQ(sizeof(struct landlock_attr_ruleset),
+		attr_features.size_attr_ruleset);
+	ASSERT_EQ(sizeof(struct landlock_attr_path_beneath),
+		attr_features.size_attr_path_beneath);
+	ASSERT_EQ(sizeof(struct landlock_attr_enforce),
+		attr_features.size_attr_enforce);
+	ASSERT_EQ(((LANDLOCK_ACCESS_FS_MAKE_SYM << 1) - 1),
+			attr_features.access_fs);
+}
+
+TEST(inconsistent_attr) {
+	const long page_size = sysconf(_SC_PAGESIZE);
+	char *buf = malloc(page_size + 1);
+
+	/* Checks copy_from_user(). */
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, buf, 0));
+	/* The size if less than sizeof(struct landlock_attr_enforce). */
+	ASSERT_EQ(EINVAL, errno);
+
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, NULL, 1));
+	/* The size if less than sizeof(struct landlock_attr_enforce). */
+	ASSERT_EQ(EINVAL, errno);
+
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET,
+			NULL, sizeof(struct landlock_attr_ruleset)));
+	ASSERT_EQ(EFAULT, errno);
+
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, buf, page_size + 1));
+	ASSERT_EQ(E2BIG, errno);
+
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, buf, page_size));
+	ASSERT_EQ(ENOMSG, errno);
+
+	/* Checks non-zero value. */
+	buf[page_size - 2] = '.';
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, buf, page_size));
+	ASSERT_EQ(E2BIG, errno);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, buf, page_size + 1));
+	ASSERT_EQ(E2BIG, errno);
+
+	/* Checks copy_to_user(). */
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_GET_FEATURES,
+			LANDLOCK_OPT_GET_FEATURES, NULL, 0));
+	ASSERT_EQ(ENODATA, errno);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_GET_FEATURES,
+			LANDLOCK_OPT_GET_FEATURES, buf, 0));
+	ASSERT_EQ(ENODATA, errno);
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES,
+				LANDLOCK_OPT_GET_FEATURES, buf, 1));
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_GET_FEATURES,
+			LANDLOCK_OPT_GET_FEATURES, buf, page_size + 1));
+	ASSERT_EQ(E2BIG, errno);
+	ASSERT_EQ('.', buf[page_size - 2]);
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES,
+			LANDLOCK_OPT_GET_FEATURES, buf, page_size));
+	ASSERT_EQ('\0', buf[page_size - 2]);
+
+	free(buf);
+}
+
+TEST(empty_attr_ruleset) {
+	/* Similar to struct landlock_attr_ruleset.handled_access_fs = 0 */
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, NULL, 0));
+	ASSERT_EQ(EINVAL, errno);
+}
+
+TEST(empty_attr_path_beneath) {
+	/* Similar to struct landlock_attr_path_beneath.*_fd = 0 */
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE,
+			LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, NULL, 0));
+	ASSERT_EQ(EINVAL, errno);
+}
+
+TEST(empty_attr_enforce) {
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+
+	/* Similar to struct landlock_attr_enforce.ruleset_fd = 0 */
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+			LANDLOCK_OPT_ENFORCE_RULESET, NULL, 0));
+	ASSERT_EQ(EINVAL, errno);
+}
+
+TEST(unpriv_enforce_without_no_new_privs) {
+	int err;
+
+	disable_caps(_metadata);
+	err = landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+			LANDLOCK_OPT_ENFORCE_RULESET, NULL, 0);
+	ASSERT_EQ(errno, EPERM);
+	ASSERT_EQ(err, -1);
+}
+
+TEST(ruleset_fd)
+{
+	struct landlock_attr_ruleset attr_ruleset = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+	};
+	int ruleset_fd;
+	char buf;
+
+	disable_caps(_metadata);
+	ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, &attr_ruleset,
+			sizeof(attr_ruleset));
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(-1, write(ruleset_fd, ".", 1));
+	ASSERT_EQ(EINVAL, errno);
+	ASSERT_EQ(-1, read(ruleset_fd, &buf, 1));
+	ASSERT_EQ(EINVAL, errno);
+
+	ASSERT_EQ(0, close(ruleset_fd));
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
new file mode 100644
index 000000000000..f5459ab85cfd
--- /dev/null
+++ b/tools/testing/selftests/landlock/common.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Landlock test helpers
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#include <errno.h>
+#include <linux/landlock.h>
+#include <sys/capability.h>
+#include <sys/syscall.h>
+
+#include "../kselftest_harness.h"
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#ifndef landlock
+static inline int landlock(const unsigned int command,
+		const unsigned int options,
+		void *const attr_ptr, const size_t attr_size)
+{
+	errno = 0;
+	return syscall(__NR_landlock, command, options, attr_ptr, attr_size,
+			NULL, 0);
+}
+#endif
+
+static void disable_caps(struct __test_metadata *const _metadata)
+{
+	cap_t cap_p;
+	/* Only these two capabilities are useful for the tests. */
+	const cap_value_t caps[] = {
+		CAP_MKNOD,
+		CAP_SYS_ADMIN,
+		CAP_SYS_CHROOT,
+	};
+
+	cap_p = cap_get_proc();
+	ASSERT_NE(NULL, cap_p) {
+		TH_LOG("Failed to cap_get_proc: %s", strerror(errno));
+	}
+	ASSERT_NE(-1, cap_clear(cap_p)) {
+		TH_LOG("Failed to cap_clear: %s", strerror(errno));
+	}
+	ASSERT_NE(-1, cap_set_flag(cap_p, CAP_PERMITTED, ARRAY_SIZE(caps),
+				caps, CAP_SET)) {
+		TH_LOG("Failed to cap_set_flag: %s", strerror(errno));
+	}
+	ASSERT_NE(-1, cap_set_proc(cap_p)) {
+		TH_LOG("Failed to cap_set_proc: %s", strerror(errno));
+	}
+	ASSERT_NE(-1, cap_free(cap_p)) {
+		TH_LOG("Failed to cap_free: %s", strerror(errno));
+	}
+}
+
+static void effective_cap(struct __test_metadata *const _metadata,
+		const cap_value_t caps, const cap_flag_value_t value)
+{
+	cap_t cap_p;
+
+	cap_p = cap_get_proc();
+	ASSERT_NE(NULL, cap_p) {
+		TH_LOG("Failed to cap_get_proc: %s", strerror(errno));
+	}
+	ASSERT_NE(-1, cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &caps, value)) {
+		TH_LOG("Failed to cap_set_flag: %s", strerror(errno));
+	}
+	ASSERT_NE(-1, cap_set_proc(cap_p)) {
+		TH_LOG("Failed to cap_set_proc: %s", strerror(errno));
+	}
+	ASSERT_NE(-1, cap_free(cap_p)) {
+		TH_LOG("Failed to cap_free: %s", strerror(errno));
+	}
+}
+
+/* We can't put such helpers in a library because of kselftest_harness.h . */
+__attribute__((__unused__))
+static void set_cap(struct __test_metadata *const _metadata,
+		const cap_value_t caps)
+{
+	effective_cap(_metadata, caps, CAP_SET);
+}
+
+__attribute__((__unused__))
+static void clear_cap(struct __test_metadata *const _metadata,
+		const cap_value_t caps)
+{
+	effective_cap(_metadata, caps, CAP_CLEAR);
+}
diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
new file mode 100644
index 000000000000..042298105821
--- /dev/null
+++ b/tools/testing/selftests/landlock/config
@@ -0,0 +1,5 @@
+CONFIG_SECURITY_LANDLOCK=y
+CONFIG_SECURITY_PATH=y
+CONFIG_SECURITY=y
+CONFIG_SHMEM=y
+CONFIG_TMPFS=y
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
new file mode 100644
index 000000000000..6847a6730df6
--- /dev/null
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -0,0 +1,1740 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Filesystem
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <sched.h>
+#include <string.h>
+#include <sys/capability.h>
+#include <sys/mount.h>
+#include <sys/prctl.h>
+#include <sys/sendfile.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <unistd.h>
+
+#include "common.h"
+
+#define TMP_DIR "tmp/"
+#define FILE_1 "file1"
+#define FILE_2 "file2"
+#define BINARY_PATH "./true"
+
+/* Paths (sibling number and depth) */
+static const char dir_s1d1[] = TMP_DIR "s1d1";
+static const char file1_s1d1[] = TMP_DIR "s1d1/" FILE_1;
+static const char file2_s1d1[] = TMP_DIR "s1d1/" FILE_2;
+static const char dir_s1d2[] = TMP_DIR "s1d1/s1d2";
+static const char file1_s1d2[] = TMP_DIR "s1d1/s1d2/" FILE_1;
+static const char file2_s1d2[] = TMP_DIR "s1d1/s1d2/" FILE_2;
+static const char dir_s1d3[] = TMP_DIR "s1d1/s1d2/s1d3";
+static const char file1_s1d3[] = TMP_DIR "s1d1/s1d2/s1d3/" FILE_1;
+static const char file2_s1d3[] = TMP_DIR "s1d1/s1d2/s1d3/" FILE_2;
+
+static const char dir_s2d1[] = TMP_DIR "s2d1";
+static const char file1_s2d1[] = TMP_DIR "s2d1/" FILE_1;
+static const char dir_s2d2[] = TMP_DIR "s2d1/s2d2";
+static const char file1_s2d2[] = TMP_DIR "s2d1/s2d2/" FILE_1;
+static const char dir_s2d3[] = TMP_DIR "s2d1/s2d2/s2d3";
+static const char file1_s2d3[] = TMP_DIR "s2d1/s2d2/s2d3/" FILE_1;
+static const char file2_s2d3[] = TMP_DIR "s2d1/s2d2/s2d3/" FILE_2;
+
+static const char dir_s3d1[] = TMP_DIR "s3d1";
+/* dir_s3d2 is a mount point. */
+static const char dir_s3d2[] = TMP_DIR "s3d1/s3d2";
+static const char dir_s3d3[] = TMP_DIR "s3d1/s3d2/s3d3";
+
+static void create_dir_and_file(struct __test_metadata *const _metadata,
+		const char *const dir_path)
+{
+	int file_fd;
+	char *const file1_path = alloca(strlen(dir_path) + sizeof(FILE_1) + 2);
+	char *const file2_path = alloca(strlen(dir_path) + sizeof(FILE_2) + 2);
+
+	strcpy(file1_path, dir_path);
+	strcat(file1_path, "/");
+	strcat(file1_path, FILE_1);
+
+	strcpy(file2_path, dir_path);
+	strcat(file2_path, "/");
+	strcat(file2_path, FILE_2);
+
+	ASSERT_EQ(0, mkdir(dir_path, 0700)) {
+		TH_LOG("Failed to create directory \"%s\": %s", dir_path,
+				strerror(errno));
+	}
+	file_fd = open(file1_path, O_CREAT | O_EXCL | O_WRONLY | O_CLOEXEC,
+			0700);
+	ASSERT_LE(0, file_fd);
+	ASSERT_EQ(0, close(file_fd));
+
+	file_fd = open(file2_path, O_CREAT | O_EXCL | O_WRONLY | O_CLOEXEC,
+			0700);
+	ASSERT_LE(0, file_fd);
+	ASSERT_EQ(0, close(file_fd));
+}
+
+static void delete_dir_and_file(const char *const dir_path)
+{
+	char *const file1_path = alloca(strlen(dir_path) +
+			sizeof(FILE_1) + 2);
+	char *const file2_path = alloca(strlen(dir_path) +
+			sizeof(FILE_2) + 2);
+
+	strcpy(file1_path, dir_path);
+	strcat(file1_path, "/");
+	strcat(file1_path, FILE_1);
+
+	strcpy(file2_path, dir_path);
+	strcat(file2_path, "/");
+	strcat(file2_path, FILE_2);
+
+	unlink(file1_path);
+	unlink(file2_path);
+	/* file1_path may be a directory, cf. layout1/make_directory. */
+	rmdir(file1_path);
+	rmdir(dir_path);
+}
+
+static void cleanup_layout1(struct __test_metadata *const _metadata)
+{
+	delete_dir_and_file(dir_s1d3);
+	delete_dir_and_file(dir_s1d2);
+	delete_dir_and_file(dir_s1d1);
+
+	delete_dir_and_file(dir_s2d3);
+	delete_dir_and_file(dir_s2d2);
+	delete_dir_and_file(dir_s2d1);
+
+	delete_dir_and_file(dir_s3d3);
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	umount(dir_s3d2);
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	delete_dir_and_file(dir_s3d2);
+	delete_dir_and_file(dir_s3d1);
+
+	delete_dir_and_file(TMP_DIR);
+}
+
+FIXTURE(layout1) {
+};
+
+FIXTURE_SETUP(layout1)
+{
+	disable_caps(_metadata);
+	cleanup_layout1(_metadata);
+
+	/* Do not pollute the rest of the system. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, unshare(CLONE_NEWNS));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	umask(0077);
+	create_dir_and_file(_metadata, TMP_DIR);
+
+	create_dir_and_file(_metadata, dir_s1d1);
+	create_dir_and_file(_metadata, dir_s1d2);
+	create_dir_and_file(_metadata, dir_s1d3);
+
+	create_dir_and_file(_metadata, dir_s2d1);
+	create_dir_and_file(_metadata, dir_s2d2);
+	create_dir_and_file(_metadata, dir_s2d3);
+
+	create_dir_and_file(_metadata, dir_s3d1);
+	create_dir_and_file(_metadata, dir_s3d2);
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, mount("tmp", dir_s3d2, "tmpfs", 0, "size=4m,mode=700"));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+	create_dir_and_file(_metadata, dir_s3d3);
+}
+
+FIXTURE_TEARDOWN(layout1)
+{
+	/*
+	 * cleanup_layout1() would be denied here, use TEST(cleanup) instead.
+	 */
+}
+
+static void test_path_rel(struct __test_metadata *const _metadata,
+		const int dirfd, const char *const path, const int ret)
+{
+	int fd;
+
+	/* Works with file and directories. */
+	fd = openat(dirfd, path, O_RDONLY | O_CLOEXEC);
+	if (ret) {
+		ASSERT_EQ(-1, fd) {
+			TH_LOG("Successfully opened \"%s\": %s", path,
+					strerror(errno));
+		}
+		ASSERT_EQ(EACCES, errno) {
+			TH_LOG("Wrong error code to open \"%s\": %s", path,
+					strerror(errno));
+		}
+	} else {
+		ASSERT_LE(0, fd) {
+			TH_LOG("Failed to open \"%s\": %s", path,
+					strerror(errno));
+		}
+		EXPECT_EQ(0, close(fd));
+	}
+}
+
+static void test_path(struct __test_metadata *const _metadata,
+		const char *const path, const int ret)
+{
+	return test_path_rel(_metadata, AT_FDCWD, path, ret);
+}
+
+TEST_F(layout1, no_restriction)
+{
+	test_path(_metadata, dir_s1d1, 0);
+	test_path(_metadata, file1_s1d1, 0);
+	test_path(_metadata, file2_s1d1, 0);
+	test_path(_metadata, dir_s1d2, 0);
+	test_path(_metadata, file1_s1d2, 0);
+	test_path(_metadata, file2_s1d2, 0);
+	test_path(_metadata, dir_s1d3, 0);
+	test_path(_metadata, file1_s1d3, 0);
+
+	test_path(_metadata, dir_s2d1, 0);
+	test_path(_metadata, file1_s2d1, 0);
+	test_path(_metadata, dir_s2d2, 0);
+	test_path(_metadata, file1_s2d2, 0);
+	test_path(_metadata, dir_s2d3, 0);
+	test_path(_metadata, file1_s2d3, 0);
+
+	test_path(_metadata, dir_s3d1, 0);
+	test_path(_metadata, dir_s3d2, 0);
+	test_path(_metadata, dir_s3d3, 0);
+}
+
+TEST_F(layout1, inval)
+{
+	struct landlock_attr_path_beneath path_beneath = {
+		.allowed_access = LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_WRITE_FILE,
+		.parent_fd = -1,
+	};
+	struct landlock_attr_ruleset attr_ruleset = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_WRITE_FILE,
+	};
+	struct landlock_attr_enforce attr_enforce = {};
+	const int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET,
+			&attr_ruleset, sizeof(attr_ruleset));
+
+	ASSERT_LE(0, ruleset_fd);
+	path_beneath.parent_fd = open(dir_s1d2, O_PATH | O_DIRECTORY |
+			O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd);
+
+	path_beneath.ruleset_fd = open(dir_s1d1, O_PATH | O_DIRECTORY |
+			O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.ruleset_fd);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	/* Returns EBADF because ruleset_fd contains O_PATH. */
+	ASSERT_EQ(EBADF, errno);
+	ASSERT_EQ(0, close(path_beneath.ruleset_fd));
+
+	path_beneath.ruleset_fd = open(dir_s1d1, O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.ruleset_fd);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	/* Returns EBADFD because ruleset_fd is not a valid ruleset. */
+	ASSERT_EQ(EBADFD, errno);
+	ASSERT_EQ(0, close(path_beneath.ruleset_fd));
+
+	path_beneath.ruleset_fd = ruleset_fd;
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	/* Tests without O_PATH. */
+	path_beneath.parent_fd = open(dir_s1d2, O_DIRECTORY |
+			O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	ASSERT_EQ(EBADFD, errno);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	/* Checks unhandled allowed_access. */
+	path_beneath.parent_fd = open(dir_s1d2, O_PATH | O_DIRECTORY |
+			O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd);
+
+	/* Test with legitimate values. */
+	path_beneath.allowed_access |= LANDLOCK_ACCESS_FS_EXECUTE;
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	ASSERT_EQ(EINVAL, errno);
+	path_beneath.allowed_access &= ~LANDLOCK_ACCESS_FS_EXECUTE;
+
+	/* Test with unknown (64-bits) value. */
+	path_beneath.allowed_access |= (1ULL << 60);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	ASSERT_EQ(EINVAL, errno);
+	path_beneath.allowed_access &= ~(1ULL << 60);
+
+	/* Test with no access. */
+	path_beneath.allowed_access = 0;
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	path_beneath.allowed_access &= ~(1ULL << 60);
+
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	/* Enforces the ruleset. */
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+
+	attr_enforce.ruleset_fd = ruleset_fd;
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+				LANDLOCK_OPT_ENFORCE_RULESET, &attr_enforce,
+				sizeof(attr_enforce)));
+
+	ASSERT_EQ(0, close(ruleset_fd));
+}
+
+#define ACCESS_FILE ( \
+	LANDLOCK_ACCESS_FS_EXECUTE | \
+	LANDLOCK_ACCESS_FS_WRITE_FILE | \
+	LANDLOCK_ACCESS_FS_READ_FILE)
+
+#define ACCESS_LAST LANDLOCK_ACCESS_FS_MAKE_SYM
+
+#define ACCESS_ALL ( \
+	ACCESS_FILE | \
+	LANDLOCK_ACCESS_FS_READ_DIR | \
+	LANDLOCK_ACCESS_FS_CHROOT | \
+	LANDLOCK_ACCESS_FS_REMOVE_DIR | \
+	LANDLOCK_ACCESS_FS_REMOVE_FILE | \
+	LANDLOCK_ACCESS_FS_MAKE_CHAR | \
+	LANDLOCK_ACCESS_FS_MAKE_DIR | \
+	LANDLOCK_ACCESS_FS_MAKE_REG | \
+	LANDLOCK_ACCESS_FS_MAKE_SOCK | \
+	LANDLOCK_ACCESS_FS_MAKE_FIFO | \
+	LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
+	ACCESS_LAST)
+
+TEST_F(layout1, file_access_rights)
+{
+	__u64 access;
+	int err;
+	struct landlock_attr_path_beneath path_beneath = {};
+	struct landlock_attr_ruleset attr_ruleset = {
+		.handled_access_fs = ACCESS_ALL,
+	};
+
+	path_beneath.ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, &attr_ruleset,
+			sizeof(attr_ruleset));
+	ASSERT_LE(0, path_beneath.ruleset_fd);
+
+	/* Tests access rights for files. */
+	path_beneath.parent_fd = open(file1_s1d2, O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd);
+	for (access = 1; access <= ACCESS_LAST; access <<= 1) {
+		path_beneath.allowed_access = access;
+		err = landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath));
+		if ((access | ACCESS_FILE) == ACCESS_FILE) {
+			ASSERT_EQ(0, err);
+		} else {
+			ASSERT_EQ(-1, err);
+			ASSERT_EQ(EINVAL, errno);
+		}
+	}
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+}
+
+static void add_path_beneath(struct __test_metadata *const _metadata,
+		const int ruleset_fd, const __u64 allowed_access,
+		const char *const path)
+{
+	struct landlock_attr_path_beneath path_beneath = {
+		.ruleset_fd = ruleset_fd,
+		.allowed_access = allowed_access,
+	};
+
+	path_beneath.parent_fd = open(path, O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd) {
+		TH_LOG("Failed to open directory \"%s\": %s", path,
+				strerror(errno));
+	}
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath))) {
+		TH_LOG("Failed to update the ruleset with \"%s\": %s", path,
+				strerror(errno));
+	}
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+}
+
+struct rule {
+	const char *path;
+	__u64 access;
+};
+
+#define ACCESS_RO ( \
+	LANDLOCK_ACCESS_FS_READ_FILE | \
+	LANDLOCK_ACCESS_FS_READ_DIR)
+
+#define ACCESS_RW ( \
+	ACCESS_RO | \
+	LANDLOCK_ACCESS_FS_WRITE_FILE)
+
+static int create_ruleset(struct __test_metadata *const _metadata,
+		const __u64 handled_access_fs, const struct rule rules[])
+{
+	int ruleset_fd, i;
+	struct landlock_attr_features attr_features = {};
+	struct landlock_attr_ruleset attr_ruleset = {
+		.handled_access_fs = handled_access_fs,
+	};
+
+	ASSERT_NE(NULL, rules) {
+		TH_LOG("No rule list");
+	}
+	ASSERT_NE(NULL, rules[0].path) {
+		TH_LOG("Empty rule list");
+	}
+
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES,
+				LANDLOCK_OPT_GET_FEATURES, &attr_features,
+				sizeof(attr_features)));
+	/* Only for test, use a binary AND for real application instead. */
+	ASSERT_EQ(attr_ruleset.handled_access_fs,
+			attr_ruleset.handled_access_fs &
+			attr_features.access_fs);
+	ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, &attr_ruleset,
+			sizeof(attr_ruleset));
+	ASSERT_LE(0, ruleset_fd) {
+		TH_LOG("Failed to create a ruleset: %s", strerror(errno));
+	}
+
+	for (i = 0; rules[i].path; i++) {
+		ASSERT_EQ(rules[i].access, rules[i].access &
+				attr_features.access_fs);
+		add_path_beneath(_metadata, ruleset_fd, rules[i].access,
+				rules[i].path);
+	}
+	return ruleset_fd;
+}
+
+static void enforce_ruleset(struct __test_metadata *const _metadata,
+		const int ruleset_fd)
+{
+	struct landlock_attr_enforce attr_enforce = {
+		.ruleset_fd = ruleset_fd,
+	};
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+				LANDLOCK_OPT_ENFORCE_RULESET, &attr_enforce,
+				sizeof(attr_enforce))) {
+		TH_LOG("Failed to enforce ruleset: %s", strerror(errno));
+	}
+}
+
+TEST_F(layout1, proc_nsfs)
+{
+	const struct rule rules[] = {
+		{
+			.path = "/dev/null",
+			.access = LANDLOCK_ACCESS_FS_READ_FILE |
+				LANDLOCK_ACCESS_FS_WRITE_FILE,
+		},
+		{}
+	};
+	struct landlock_attr_path_beneath path_beneath;
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access |
+			LANDLOCK_ACCESS_FS_READ_DIR, rules);
+
+	test_path(_metadata, "/proc/self/ns/mnt", 0);
+
+	enforce_ruleset(_metadata, ruleset_fd);
+
+	test_path(_metadata, "/", -1);
+	test_path(_metadata, "/dev", -1);
+	test_path(_metadata, "/dev/null", 0);
+	test_path(_metadata, "/dev/full", -1);
+
+	test_path(_metadata, "/proc", -1);
+	test_path(_metadata, "/proc/self", -1);
+	test_path(_metadata, "/proc/self/ns", -1);
+	/*
+	 * Because nsfs is an internal filesystem, /proc/self/ns/mnt is a
+	 * disconnected path.  Such path cannot be identified and must then be
+	 * allowed.
+	 */
+	test_path(_metadata, "/proc/self/ns/mnt", 0);
+
+	/*
+	 * Checks that it is not possible to add nsfs-like filesystem
+	 * references to a ruleset.
+	 */
+	path_beneath.ruleset_fd = ruleset_fd;
+	path_beneath.allowed_access = LANDLOCK_ACCESS_FS_READ_FILE |
+		LANDLOCK_ACCESS_FS_WRITE_FILE,
+	path_beneath.parent_fd = open("/proc/self/ns/mnt", O_PATH | O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	ASSERT_EQ(EBADFD, errno);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+}
+
+static void drop_privileges(struct __test_metadata *const _metadata)
+{
+	cap_t caps;
+	const cap_value_t cap_val = CAP_SYS_ADMIN;
+
+	caps = cap_get_proc();
+	ASSERT_NE(NULL, caps);
+	ASSERT_EQ(0, cap_set_flag(caps, CAP_EFFECTIVE, 1, &cap_val,
+				CAP_CLEAR));
+	ASSERT_EQ(0, cap_set_proc(caps));
+	ASSERT_EQ(0, cap_free(caps));
+}
+
+TEST_F(layout1, unpriv) {
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	struct landlock_attr_enforce attr_enforce = {};
+
+	drop_privileges(_metadata);
+	attr_enforce.ruleset_fd = create_ruleset(_metadata, ACCESS_RO, rules);
+	ASSERT_LE(0, attr_enforce.ruleset_fd);
+	ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+				LANDLOCK_OPT_ENFORCE_RULESET, &attr_enforce,
+				sizeof(attr_enforce)));
+	ASSERT_EQ(EPERM, errno);
+
+	/* enforce_ruleset() calls prctl(no_new_privs). */
+	enforce_ruleset(_metadata, attr_enforce.ruleset_fd);
+	EXPECT_EQ(0, close(attr_enforce.ruleset_fd));
+}
+
+TEST_F(layout1, whitelist)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = ACCESS_RO,
+		},
+		{
+			.path = file1_s2d2,
+			.access = LANDLOCK_ACCESS_FS_READ_FILE |
+				LANDLOCK_ACCESS_FS_WRITE_FILE,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+	char buf;
+	int reg_fd;
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Tests on a directory. */
+	test_path(_metadata, "/", -1);
+	test_path(_metadata, dir_s1d1, -1);
+	test_path(_metadata, file1_s1d1, -1);
+	test_path(_metadata, dir_s1d2, 0);
+	test_path(_metadata, file1_s1d2, 0);
+	test_path(_metadata, dir_s1d3, 0);
+	test_path(_metadata, file1_s1d3, 0);
+
+	/* Tests on a file. */
+	test_path(_metadata, dir_s2d2, -1);
+	test_path(_metadata, file1_s2d2, 0);
+
+	/* Checks effective read and write actions. */
+	reg_fd = open(file1_s2d2, O_RDWR | O_CLOEXEC);
+	ASSERT_LE(0, reg_fd);
+	ASSERT_EQ(1, write(reg_fd, ".", 1));
+	ASSERT_LE(0, lseek(reg_fd, 0, SEEK_SET));
+	ASSERT_EQ(1, read(reg_fd, &buf, 1));
+	ASSERT_EQ('.', buf);
+	ASSERT_EQ(0, close(reg_fd));
+
+	/* Just in case, double-checks effective actions. */
+	reg_fd = open(file1_s2d2, O_RDONLY | O_CLOEXEC);
+	ASSERT_LE(0, reg_fd);
+	ASSERT_EQ(-1, write(reg_fd, &buf, 1));
+	ASSERT_EQ(EBADF, errno);
+	ASSERT_EQ(0, close(reg_fd));
+}
+
+TEST_F(layout1, unhandled_access)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	set_cap(_metadata, CAP_SYS_CHROOT);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * Because the policy does not handle LANDLOCK_ACCESS_FS_CHROOT,
+	 * chroot(2) should be allowed.
+	 */
+	ASSERT_EQ(0, chroot(dir_s1d1));
+	ASSERT_EQ(0, chroot(dir_s1d2));
+	ASSERT_EQ(0, chroot(dir_s1d3));
+}
+
+TEST_F(layout1, ruleset_overlap)
+{
+	const struct rule rules[] = {
+		/* These rules should be ORed among them. */
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_WRITE_FILE,
+		},
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_READ_DIR,
+		},
+		{}
+	};
+	int open_fd;
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, open(file1_s1d1, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(-1, open(dir_s1d1, O_RDONLY | O_DIRECTORY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+
+	open_fd = open(file1_s1d2, O_WRONLY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	EXPECT_EQ(0, close(open_fd));
+	open_fd = open(dir_s1d2, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	EXPECT_EQ(0, close(open_fd));
+
+	open_fd = open(file1_s1d3, O_WRONLY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	EXPECT_EQ(0, close(open_fd));
+	open_fd = open(dir_s1d3, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	EXPECT_EQ(0, close(open_fd));
+}
+
+TEST_F(layout1, inherit_subset)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_READ_FILE |
+				LANDLOCK_ACCESS_FS_READ_DIR,
+		},
+		{}
+	};
+	int open_fd;
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+
+	ASSERT_EQ(-1, open(file1_s1d1, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(-1, open(dir_s1d1, O_RDONLY | O_DIRECTORY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+
+	/* Write access is forbidden. */
+	ASSERT_EQ(-1, open(file1_s1d2, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	/* Readdir access is allowed. */
+	open_fd = open(dir_s1d2, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/* Write access is forbidden. */
+	ASSERT_EQ(-1, open(file1_s1d3, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	/* Readdir access is allowed. */
+	open_fd = open(dir_s1d3, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/*
+	 * Tests shared rule extension: the following rules should not grant
+	 * any new access, only remove some.  Once enforced, these rules are
+	 * ANDed with the previous ones.
+	 */
+	add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_WRITE_FILE,
+			dir_s1d2);
+	/*
+	 * According to ruleset_fd, dir_s1d2 should now have the
+	 * LANDLOCK_ACCESS_FS_READ_FILE and LANDLOCK_ACCESS_FS_WRITE_FILE
+	 * access rights (even if this directory is opened a second time).
+	 * However, when enforcing this updated ruleset, the ruleset tied to
+	 * the current process (i.e. its domain) will still only have the
+	 * dir_s1d2 with LANDLOCK_ACCESS_FS_READ_FILE and
+	 * LANDLOCK_ACCESS_FS_READ_DIR accesses, but
+	 * LANDLOCK_ACCESS_FS_WRITE_FILE must not be allowed because it would
+	 * be a privilege escalation.
+	 */
+	enforce_ruleset(_metadata, ruleset_fd);
+
+	/* Same tests and results as above. */
+	ASSERT_EQ(-1, open(file1_s1d1, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(-1, open(dir_s1d1, O_RDONLY | O_DIRECTORY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+
+	/* It is still forbidden to write in file1_s1d2. */
+	ASSERT_EQ(-1, open(file1_s1d2, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	/* Readdir access is still allowed. */
+	open_fd = open(dir_s1d2, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/* It is still forbidden to write in file1_s1d3. */
+	ASSERT_EQ(-1, open(file1_s1d3, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	/* Readdir access is still allowed. */
+	open_fd = open(dir_s1d3, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/*
+	 * Try to get more privileges by adding new access rights to the parent
+	 * directory: dir_s1d1.
+	 */
+	add_path_beneath(_metadata, ruleset_fd, ACCESS_RW, dir_s1d1);
+	enforce_ruleset(_metadata, ruleset_fd);
+
+	/* Same tests and results as above. */
+	ASSERT_EQ(-1, open(file1_s1d1, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(-1, open(dir_s1d1, O_RDONLY | O_DIRECTORY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+
+	/* It is still forbidden to write in file1_s1d2. */
+	ASSERT_EQ(-1, open(file1_s1d2, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	/* Readdir access is still allowed. */
+	open_fd = open(dir_s1d2, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/* It is still forbidden to write in file1_s1d3. */
+	ASSERT_EQ(-1, open(file1_s1d3, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	/* Readdir access is still allowed. */
+	open_fd = open(dir_s1d3, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/*
+	 * Now, dir_s1d3 get a new rule tied to it, only allowing
+	 * LANDLOCK_ACCESS_FS_WRITE_FILE.  The (kernel internal) difference is
+	 * that there was no rule tied to it before.
+	 */
+	add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_WRITE_FILE,
+			dir_s1d3);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/*
+	 * Same tests and results as above, except for open(dir_s1d3) which is
+	 * now denied because the new rule mask the rule previously inherited
+	 * from dir_s1d2.
+	 */
+
+	/* Same tests and results as above. */
+	ASSERT_EQ(-1, open(file1_s1d1, O_WRONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(-1, open(dir_s1d1, O_RDONLY | O_DIRECTORY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+
+	/* It is still forbidden to write in file1_s1d2. */
+	ASSERT_EQ(-1, open(file1_s1d2, O_WRONLY | O_CLOEXEC));
+	/* Readdir access is still allowed. */
+	open_fd = open(dir_s1d2, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/* It is still forbidden to write in file1_s1d3. */
+	ASSERT_EQ(-1, open(file1_s1d3, O_WRONLY | O_CLOEXEC));
+	open_fd = open(dir_s1d3, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	/* Readdir of dir_s1d3 is now forbidden too. */
+	ASSERT_EQ(-1, open_fd);
+	ASSERT_EQ(EACCES, errno);
+}
+
+TEST_F(layout1, inherit_superset)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d3,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	int open_fd;
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+
+	/* Readdir access is denied for dir_s1d2. */
+	open_fd = open(dir_s1d2, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_EQ(-1, open_fd);
+	ASSERT_EQ(EACCES, errno);
+	/* Readdir access is allowed for dir_s1d3. */
+	open_fd = open(dir_s1d3, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+
+	/* Now dir_s1d2, parent of dir_s1d3, gets a new rule tied to it. */
+	add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_READ_DIR, dir_s1d2);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Readdir access is still denied for dir_s1d2. */
+	open_fd = open(dir_s1d2, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_EQ(-1, open_fd);
+	ASSERT_EQ(EACCES, errno);
+	/* Readdir access is still allowed for dir_s1d3. */
+	open_fd = open(dir_s1d3, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+	ASSERT_LE(0, open_fd);
+	ASSERT_EQ(0, close(open_fd));
+}
+
+TEST_F(layout1, max_layers)
+{
+	int i, err;
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+	struct landlock_attr_enforce attr_enforce = {
+		.ruleset_fd = ruleset_fd,
+	};
+
+	ASSERT_LE(0, ruleset_fd);
+	for (i = 0; i < 64; i++)
+		enforce_ruleset(_metadata, ruleset_fd);
+
+	for (i = 0; i < 2; i++) {
+		err = landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+				LANDLOCK_OPT_ENFORCE_RULESET, &attr_enforce,
+				sizeof(attr_enforce));
+		ASSERT_EQ(-1, err);
+		ASSERT_EQ(E2BIG, errno);
+	}
+	EXPECT_EQ(0, close(ruleset_fd));
+}
+
+TEST_F(layout1, empty_or_same_ruleset)
+{
+	struct landlock_attr_enforce attr_enforce = {};
+	struct landlock_attr_ruleset attr_ruleset = {};
+
+	/* Tests empty handled_access_fs. */
+	attr_enforce.ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, &attr_ruleset,
+			sizeof(attr_ruleset));
+	ASSERT_LE(-1, attr_enforce.ruleset_fd);
+	ASSERT_EQ(ENOMSG, errno);
+
+	/* Enforces policy which deny read access to all files. */
+	attr_ruleset.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE;
+	attr_enforce.ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, &attr_ruleset,
+			sizeof(attr_ruleset));
+	ASSERT_LE(0, attr_enforce.ruleset_fd);
+	enforce_ruleset(_metadata, attr_enforce.ruleset_fd);
+	test_path(_metadata, file1_s1d1, -1);
+	test_path(_metadata, dir_s1d1, 0);
+
+	/* Nests a policy which deny read access to all directories. */
+	attr_ruleset.handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR;
+	attr_enforce.ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, &attr_ruleset,
+			sizeof(attr_ruleset));
+	ASSERT_LE(0, attr_enforce.ruleset_fd);
+	enforce_ruleset(_metadata, attr_enforce.ruleset_fd);
+	test_path(_metadata, file1_s1d1, -1);
+	test_path(_metadata, dir_s1d1, -1);
+
+	/* Enforces a second time with the same ruleset. */
+	enforce_ruleset(_metadata, attr_enforce.ruleset_fd);
+	EXPECT_EQ(0, close(attr_enforce.ruleset_fd));
+}
+
+TEST_F(layout1, rule_on_mountpoint)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d1,
+			.access = ACCESS_RO,
+		},
+		{
+			/* dir_s3d2 is a mount point. */
+			.path = dir_s3d2,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, dir_s1d1, 0);
+
+	test_path(_metadata, dir_s2d1, -1);
+
+	test_path(_metadata, dir_s3d1, -1);
+	test_path(_metadata, dir_s3d2, 0);
+	test_path(_metadata, dir_s3d3, 0);
+}
+
+TEST_F(layout1, rule_over_mountpoint)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d1,
+			.access = ACCESS_RO,
+		},
+		{
+			/* dir_s3d2 is a mount point. */
+			.path = dir_s3d1,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, dir_s1d1, 0);
+
+	test_path(_metadata, dir_s2d1, -1);
+
+	test_path(_metadata, dir_s3d1, 0);
+	test_path(_metadata, dir_s3d2, 0);
+	test_path(_metadata, dir_s3d3, 0);
+}
+
+/*
+ * This test verifies that we can apply a landlock rule on the root (/), it
+ * might require special handling.
+ */
+TEST_F(layout1, rule_over_root)
+{
+	const struct rule rules[] = {
+		{
+			.path = "/",
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, "/", 0);
+	test_path(_metadata, dir_s1d1, 0);
+}
+
+TEST_F(layout1, rule_inside_mount_ns)
+{
+	const struct rule rules[] = {
+		{
+			.path = "s3d3",
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	int ruleset_fd;
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+	ASSERT_EQ(0, syscall(SYS_pivot_root, dir_s3d2, dir_s3d3)) {
+		TH_LOG("Failed to pivot_root into \"%s\": %s", dir_s3d2,
+				strerror(errno));
+	};
+	ASSERT_EQ(0, chdir("/"));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_path(_metadata, "s3d3", 0);
+	test_path(_metadata, "/", -1);
+}
+
+TEST_F(layout1, mount_and_pivot)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s3d2,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+	ASSERT_EQ(EPERM, errno);
+	ASSERT_EQ(-1, syscall(SYS_pivot_root, dir_s3d2, dir_s3d3));
+	ASSERT_EQ(EPERM, errno);
+}
+
+TEST_F(layout1, move_mount)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s3d2,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL));
+	ASSERT_EQ(0, syscall(SYS_move_mount, AT_FDCWD, dir_s3d2, AT_FDCWD,
+				dir_s1d2, 0)) {
+		TH_LOG("Failed to move_mount: %s", strerror(errno));
+	}
+	ASSERT_EQ(0, syscall(SYS_move_mount, AT_FDCWD, dir_s1d2, AT_FDCWD,
+				dir_s3d2, 0));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, syscall(SYS_move_mount, AT_FDCWD, dir_s3d2, AT_FDCWD,
+				dir_s1d2, 0));
+	ASSERT_EQ(EPERM, errno);
+}
+
+TEST_F(layout1, release_inodes)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d1,
+			.access = ACCESS_RO,
+		},
+		{
+			.path = dir_s3d2,
+			.access = ACCESS_RO,
+		},
+		{
+			.path = dir_s3d3,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+	int test_fd;
+
+	ASSERT_LE(0, ruleset_fd);
+	/* Unmount a file hierarchy while it is being used by a ruleset. */
+	set_cap(_metadata, CAP_SYS_ADMIN);
+	ASSERT_EQ(0, umount(dir_s3d2));
+	clear_cap(_metadata, CAP_SYS_ADMIN);
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_fd = open(file1_s1d1, O_RDONLY | O_CLOEXEC);
+	ASSERT_LE(0, test_fd);
+	/* This dir_s3d2 is not allowed, only the tmpfs on it was. */
+	ASSERT_EQ(-1, open(dir_s3d2, O_RDONLY | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	/* This dir_s3d3 would not be allowed and does not exist anyway. */
+	ASSERT_EQ(-1, open(dir_s3d3, O_RDONLY | O_CLOEXEC));
+	ASSERT_EQ(ENOENT, errno);
+}
+
+enum relative_access {
+	REL_OPEN,
+	REL_CHDIR,
+	REL_CHROOT_ONLY,
+	REL_CHROOT_CHDIR,
+};
+
+static void test_relative_path(struct __test_metadata *const _metadata,
+		const enum relative_access rel)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = ACCESS_RO,
+		},
+		{
+			.path = dir_s2d2,
+			.access = ACCESS_RO,
+		},
+		{}
+	};
+	int dirfd;
+	const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	switch (rel) {
+	case REL_OPEN:
+	case REL_CHDIR:
+		break;
+	case REL_CHROOT_ONLY:
+		ASSERT_EQ(0, chdir(dir_s2d2));
+		break;
+	case REL_CHROOT_CHDIR:
+		ASSERT_EQ(0, chdir(dir_s1d2));
+		break;
+	default:
+		ASSERT_TRUE(false);
+		return;
+	}
+
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	enforce_ruleset(_metadata, ruleset_fd);
+
+	switch (rel) {
+	case REL_OPEN:
+		dirfd = open(dir_s1d2, O_DIRECTORY);
+		ASSERT_LE(0, dirfd);
+		break;
+	case REL_CHDIR:
+		ASSERT_EQ(0, chdir(dir_s1d2));
+		dirfd = AT_FDCWD;
+		break;
+	case REL_CHROOT_ONLY:
+		/* Do chroot into dir_s1d2 (relative to dir_s2d2). */
+		ASSERT_EQ(0, chroot("../../s1d1/s1d2")) {
+			TH_LOG("Failed to chroot: %s", strerror(errno));
+		}
+		dirfd = AT_FDCWD;
+		break;
+	case REL_CHROOT_CHDIR:
+		/* Do chroot into dir_s1d2. */
+		ASSERT_EQ(0, chroot(".")) {
+			TH_LOG("Failed to chroot: %s", strerror(errno));
+		}
+		dirfd = AT_FDCWD;
+		break;
+	}
+
+	test_path_rel(_metadata, dirfd, "..",
+			(rel == REL_CHROOT_CHDIR) ? 0 : -1);
+	test_path_rel(_metadata, dirfd, ".", 0);
+
+	if (rel == REL_CHROOT_ONLY)
+		/* The current directory is dir_s2d2. */
+		test_path_rel(_metadata, dirfd, "./s2d3", 0);
+	else
+		/* The current directory is dir_s1d2. */
+		test_path_rel(_metadata, dirfd, "./s1d3", 0);
+
+	if (rel != REL_CHROOT_CHDIR) {
+		test_path_rel(_metadata, dirfd, "../../s1d1", -1);
+		test_path_rel(_metadata, dirfd, "../../s1d1/s1d2", 0);
+		test_path_rel(_metadata, dirfd, "../../s1d1/s1d2/s1d3", 0);
+
+		test_path_rel(_metadata, dirfd, "../../s2d1", -1);
+		test_path_rel(_metadata, dirfd, "../../s2d1/s2d2", 0);
+		test_path_rel(_metadata, dirfd, "../../s2d1/s2d2/s2d3", 0);
+	}
+
+	if (rel == REL_OPEN)
+		EXPECT_EQ(0, close(dirfd));
+	EXPECT_EQ(0, close(ruleset_fd));
+}
+
+TEST_F(layout1, relative_open)
+{
+	test_relative_path(_metadata, REL_OPEN);
+}
+
+TEST_F(layout1, relative_chdir)
+{
+	test_relative_path(_metadata, REL_CHDIR);
+}
+
+TEST_F(layout1, relative_chroot_only)
+{
+	test_relative_path(_metadata, REL_CHROOT_ONLY);
+}
+
+TEST_F(layout1, relative_chroot_chdir)
+{
+	test_relative_path(_metadata, REL_CHROOT_CHDIR);
+}
+
+TEST_F(layout1, chroot)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_CHROOT,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	set_cap(_metadata, CAP_SYS_CHROOT);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, chroot(dir_s1d1));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(0, chroot(dir_s1d2)) {
+		TH_LOG("Failed to chroot into \"%s\": %s", file1_s1d2,
+				strerror(errno));
+	};
+	/* This chroot still works because we didn't chdir(dir_s1d2). */
+	ASSERT_EQ(0, chroot(dir_s1d3));
+}
+
+static void copy_binary(struct __test_metadata *const _metadata,
+		const char *const dst_path)
+{
+	int dst_fd, src_fd;
+	struct stat statbuf;
+
+	dst_fd = open(dst_path, O_WRONLY | O_TRUNC | O_CLOEXEC);
+	ASSERT_LE(0, dst_fd) {
+		TH_LOG("Failed to open \"%s\": %s", dst_path,
+				strerror(errno));
+	}
+	src_fd = open(BINARY_PATH, O_RDONLY | O_CLOEXEC);
+	ASSERT_LE(0, src_fd) {
+		TH_LOG("Failed to open \"" BINARY_PATH "\": %s",
+				strerror(errno));
+	}
+	ASSERT_EQ(0, fstat(src_fd, &statbuf));
+	ASSERT_EQ(statbuf.st_size, sendfile(dst_fd, src_fd, 0,
+				statbuf.st_size));
+	ASSERT_EQ(0, close(src_fd));
+	ASSERT_EQ(0, close(dst_fd));
+}
+
+static void test_execute(struct __test_metadata *const _metadata,
+		const char *const path, const int ret)
+{
+	int status;
+	char *const argv[] = {(char *)path, NULL};
+	const pid_t child = fork();
+
+	ASSERT_LE(0, child);
+	if (child == 0) {
+		ASSERT_EQ(ret, execve(path, argv, NULL)) {
+			TH_LOG("Failed to execute \"%s\": %s", path,
+					strerror(errno));
+		};
+		ASSERT_EQ(EACCES, errno);
+		_exit(_metadata->passed ? 2 : 1);
+		return;
+	}
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	ASSERT_EQ(1, WIFEXITED(status));
+	ASSERT_EQ(ret ? 2 : 0, WEXITSTATUS(status)) {
+		TH_LOG("Unexpected return code for \"%s\": %s", path,
+				strerror(errno));
+	};
+}
+
+TEST_F(layout1, execute)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_EXECUTE,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	copy_binary(_metadata, file1_s1d1);
+	copy_binary(_metadata, file1_s1d2);
+	copy_binary(_metadata, file1_s1d3);
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	test_execute(_metadata, file1_s1d1, -1);
+	test_execute(_metadata, file1_s1d2, 0);
+	test_execute(_metadata, file1_s1d3, 0);
+}
+
+TEST_F(layout1, link)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_MAKE_REG,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(0, unlink(file1_s1d1));
+	ASSERT_EQ(0, unlink(file1_s1d2));
+	ASSERT_EQ(0, unlink(file1_s1d3));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, link(file2_s1d1, file1_s1d1));
+	ASSERT_EQ(EACCES, errno);
+	/* Denies linking because of reparenting. */
+	ASSERT_EQ(-1, link(file1_s2d1, file1_s1d2));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(-1, link(file2_s1d2, file1_s1d3));
+	ASSERT_EQ(EACCES, errno);
+
+	ASSERT_EQ(0, link(file2_s1d2, file1_s1d2)) {
+		TH_LOG("Failed to link file to \"%s\": %s", file2_s1d2,
+				strerror(errno));
+	};
+	ASSERT_EQ(0, link(file2_s1d3, file1_s1d3));
+}
+
+TEST_F(layout1, rename_file)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d3,
+			.access = LANDLOCK_ACCESS_FS_REMOVE_FILE,
+		},
+		{
+			.path = dir_s2d2,
+			.access = LANDLOCK_ACCESS_FS_REMOVE_FILE,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(0, unlink(file1_s1d1));
+	ASSERT_EQ(0, unlink(file1_s1d2));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Replaces file. */
+	ASSERT_EQ(-1, rename(file1_s2d3, file1_s1d3));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(-1, rename(file1_s2d1, file1_s1d3));
+	ASSERT_EQ(EACCES, errno);
+	/* Same parent. */
+	ASSERT_EQ(0, rename(file2_s2d3, file1_s2d3)) {
+		TH_LOG("Failed to rename file \"%s\": %s", file2_s2d3,
+				strerror(errno));
+	};
+
+	/* Renames files. */
+	ASSERT_EQ(-1, rename(file1_s2d2, file1_s1d2));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(0, unlink(file1_s1d3));
+	ASSERT_EQ(-1, rename(file1_s2d1, file1_s1d3));
+	ASSERT_EQ(EACCES, errno);
+	/* Same parent. */
+	ASSERT_EQ(0, rename(file2_s1d3, file1_s1d3));
+}
+
+TEST_F(layout1, rename_dir)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_REMOVE_DIR,
+		},
+		{
+			.path = dir_s2d1,
+			.access = LANDLOCK_ACCESS_FS_REMOVE_DIR,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	/* Empties dir_s1d3. */
+	ASSERT_EQ(0, unlink(file1_s1d3));
+	ASSERT_EQ(0, unlink(file2_s1d3));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Renames directory. */
+	ASSERT_EQ(-1, rename(dir_s2d3, dir_s1d3));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(0, unlink(file1_s1d2));
+	ASSERT_EQ(0, rename(dir_s1d3, file1_s1d2)) {
+		TH_LOG("Failed to rename directory \"%s\": %s", dir_s1d3,
+				strerror(errno));
+	};
+	ASSERT_EQ(0, rmdir(file1_s1d2));
+}
+
+TEST_F(layout1, rmdir)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_REMOVE_DIR,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(0, unlink(file1_s1d1));
+	ASSERT_EQ(0, unlink(file1_s1d2));
+	ASSERT_EQ(0, unlink(file1_s1d3));
+	ASSERT_EQ(0, unlink(file2_s1d3));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(0, rmdir(dir_s1d3));
+	/* dir_s1d2 itself cannot be removed. */
+	ASSERT_EQ(-1, rmdir(dir_s1d2));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(-1, rmdir(dir_s1d1));
+	ASSERT_EQ(EACCES, errno);
+}
+
+TEST_F(layout1, unlink)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_REMOVE_FILE,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, unlink(file1_s1d1));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(0, unlink(file1_s1d2)) {
+		TH_LOG("Failed to unlink file \"%s\": %s", file1_s1d2,
+				strerror(errno));
+	};
+	ASSERT_EQ(0, unlink(file1_s1d3));
+}
+
+static void test_make_file(struct __test_metadata *const _metadata,
+		const __u64 access, const mode_t mode, const dev_t dev)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = access,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, access, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	unlink(file1_s1d1);
+	unlink(file1_s1d2);
+	unlink(file1_s1d3);
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, mknod(file1_s1d1, mode | 0400, dev));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(0, mknod(file1_s1d2, mode | 0400, dev)) {
+		TH_LOG("Failed to make file \"%s\": %s",
+				file1_s1d2, strerror(errno));
+	};
+	ASSERT_EQ(0, mknod(file1_s1d3, mode | 0400, dev));
+}
+
+TEST_F(layout1, make_char)
+{
+	/* Creates a /dev/null device. */
+	set_cap(_metadata, CAP_MKNOD);
+	test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_CHAR, S_IFCHR,
+			major(1) | minor(3));
+}
+
+TEST_F(layout1, make_block)
+{
+	/* Creates a /dev/loop0 device. */
+	set_cap(_metadata, CAP_MKNOD);
+	test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_BLOCK, S_IFBLK,
+			major(7) | minor(0));
+}
+
+TEST_F(layout1, make_reg)
+{
+	test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_REG, S_IFREG, 0);
+	test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_REG, 0, 0);
+}
+
+TEST_F(layout1, make_sock)
+{
+	test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_SOCK, S_IFSOCK, 0);
+}
+
+TEST_F(layout1, make_fifo)
+{
+	test_make_file(_metadata, LANDLOCK_ACCESS_FS_MAKE_FIFO, S_IFIFO, 0);
+}
+
+TEST_F(layout1, make_sym)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_MAKE_SYM,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(0, unlink(file1_s1d1));
+	ASSERT_EQ(0, unlink(file1_s1d2));
+	ASSERT_EQ(0, unlink(file1_s1d3));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, symlink("none", file1_s1d1));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(0, symlink("none", file1_s1d2)) {
+		TH_LOG("Failed to make symlink \"%s\": %s",
+				file1_s1d2, strerror(errno));
+	};
+	ASSERT_EQ(0, symlink("none", file1_s1d3));
+}
+
+TEST_F(layout1, make_dir)
+{
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_MAKE_DIR,
+		},
+		{}
+	};
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+
+	ASSERT_EQ(0, unlink(file1_s1d1));
+	ASSERT_EQ(0, unlink(file1_s1d2));
+	ASSERT_EQ(0, unlink(file1_s1d3));
+
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Uses file_* as directory names. */
+	ASSERT_EQ(-1, mkdir(file1_s1d1, 0700));
+	ASSERT_EQ(EACCES, errno);
+	ASSERT_EQ(0, mkdir(file1_s1d2, 0700)) {
+		TH_LOG("Failed to make directory \"%s\": %s",
+				file1_s1d2, strerror(errno));
+	};
+	ASSERT_EQ(0, mkdir(file1_s1d3, 0700));
+}
+
+static int open_proc_fd(struct __test_metadata *const _metadata, const int fd,
+		const int open_flags)
+{
+	static const char path_template[] = "/proc/self/fd/%d";
+	char procfd_path[sizeof(path_template) + 10];
+	const int procfd_path_size = snprintf(procfd_path, sizeof(procfd_path),
+			path_template, fd);
+
+	ASSERT_LE(procfd_path_size, sizeof(procfd_path));
+	return open(procfd_path, open_flags);
+}
+
+TEST_F(layout1, proc_unlinked_file)
+{
+	const struct rule rules[] = {
+		{
+			.path = file1_s1d2,
+			.access = LANDLOCK_ACCESS_FS_READ_FILE,
+		},
+		{}
+	};
+	int reg_fd, proc_fd;
+	const int ruleset_fd = create_ruleset(_metadata,
+			LANDLOCK_ACCESS_FS_READ_FILE |
+			LANDLOCK_ACCESS_FS_WRITE_FILE, rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	ASSERT_EQ(-1, open(file1_s1d2, O_RDWR | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+	reg_fd = open(file1_s1d2, O_RDONLY | O_CLOEXEC);
+	ASSERT_LE(0, reg_fd);
+	test_path(_metadata, file1_s1d2, 0);
+	ASSERT_EQ(0, unlink(file1_s1d2));
+
+	proc_fd = open_proc_fd(_metadata, reg_fd, O_RDONLY | O_CLOEXEC);
+	ASSERT_LE(0, proc_fd);
+	EXPECT_EQ(0, close(proc_fd));
+
+	proc_fd = open_proc_fd(_metadata, reg_fd, O_RDWR | O_CLOEXEC);
+	ASSERT_EQ(-1, proc_fd) {
+		TH_LOG("Successfully opened /proc/self/fd/%d: %s",
+				reg_fd, strerror(errno));
+	}
+	ASSERT_EQ(EACCES, errno);
+
+	EXPECT_EQ(0, close(reg_fd));
+}
+
+TEST_F(layout1, proc_pipe)
+{
+	int reg_fd, proc_fd;
+	int pipe_fds[2];
+	char buf = '\0';
+	const struct rule rules[] = {
+		{
+			.path = dir_s1d2,
+			.access = LANDLOCK_ACCESS_FS_READ_FILE |
+				LANDLOCK_ACCESS_FS_WRITE_FILE,
+		},
+		{}
+	};
+	/* Limits read and write access to files tied to the filesystem. */
+	const int ruleset_fd = create_ruleset(_metadata, rules[0].access,
+			rules);
+
+	ASSERT_LE(0, ruleset_fd);
+	enforce_ruleset(_metadata, ruleset_fd);
+	EXPECT_EQ(0, close(ruleset_fd));
+
+	/* Checks enforcement for normal files. */
+	reg_fd = open(file1_s1d2, O_RDWR | O_CLOEXEC);
+	ASSERT_LE(0, reg_fd);
+	EXPECT_EQ(0, close(reg_fd));
+	ASSERT_EQ(-1, open(file1_s1d1, O_RDWR | O_CLOEXEC));
+	ASSERT_EQ(EACCES, errno);
+
+	/* Checks access to pipes through FD. */
+	ASSERT_EQ(0, pipe(pipe_fds));
+	ASSERT_EQ(1, write(pipe_fds[1], ".", 1)) {
+		TH_LOG("Failed to write in pipe: %s", strerror(errno));
+	}
+	ASSERT_EQ(1, read(pipe_fds[0], &buf, 1));
+	ASSERT_EQ('.', buf);
+
+	/* Checks write access to pipe through /proc/self/fd . */
+	proc_fd = open_proc_fd(_metadata, pipe_fds[1], O_WRONLY | O_CLOEXEC);
+	ASSERT_LE(0, proc_fd);
+	ASSERT_EQ(1, write(proc_fd, ".", 1)) {
+		TH_LOG("Failed to write through /proc/self/fd/%d: %s",
+				pipe_fds[1], strerror(errno));
+	}
+	EXPECT_EQ(0, close(proc_fd));
+
+	/* Checks read access to pipe through /proc/self/fd . */
+	proc_fd = open_proc_fd(_metadata, pipe_fds[0], O_RDONLY | O_CLOEXEC);
+	ASSERT_LE(0, proc_fd);
+	buf = '\0';
+	ASSERT_EQ(1, read(proc_fd, &buf, 1)) {
+		TH_LOG("Failed to read through /proc/self/fd/%d: %s",
+				pipe_fds[1], strerror(errno));
+	}
+	EXPECT_EQ(0, close(proc_fd));
+
+	EXPECT_EQ(0, close(pipe_fds[0]));
+	EXPECT_EQ(0, close(pipe_fds[1]));
+}
+
+TEST(cleanup)
+{
+	cleanup_layout1(_metadata);
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/ptrace_test.c b/tools/testing/selftests/landlock/ptrace_test.c
new file mode 100644
index 000000000000..87d1b9ae35f5
--- /dev/null
+++ b/tools/testing/selftests/landlock/ptrace_test.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Landlock tests - Ptrace
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019-2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <signal.h>
+#include <sys/prctl.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "common.h"
+
+static void create_domain(struct __test_metadata *const _metadata)
+{
+	int ruleset_fd;
+	struct landlock_attr_features attr_features;
+	struct landlock_attr_enforce attr_enforce;
+	struct landlock_attr_ruleset attr_ruleset = {
+		.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
+	};
+	struct landlock_attr_path_beneath path_beneath = {
+		.allowed_access = LANDLOCK_ACCESS_FS_READ_FILE,
+	};
+
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES,
+				LANDLOCK_OPT_GET_FEATURES,
+				&attr_features, sizeof(attr_features)));
+	/* Only for test, use a binary AND for real application instead. */
+	ASSERT_EQ(attr_ruleset.handled_access_fs,
+			attr_ruleset.handled_access_fs &
+			attr_features.access_fs);
+	ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, &attr_ruleset,
+			sizeof(attr_ruleset));
+	ASSERT_LE(0, ruleset_fd) {
+		TH_LOG("Failed to create a ruleset: %s", strerror(errno));
+	}
+	path_beneath.ruleset_fd = ruleset_fd;
+	path_beneath.parent_fd = open("/tmp", O_PATH | O_NOFOLLOW | O_DIRECTORY
+			| O_CLOEXEC);
+	ASSERT_LE(0, path_beneath.parent_fd);
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_ADD_RULE,
+				LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+				&path_beneath, sizeof(path_beneath)));
+	ASSERT_EQ(0, errno);
+	ASSERT_EQ(0, close(path_beneath.parent_fd));
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	ASSERT_EQ(0, errno);
+
+	attr_enforce.ruleset_fd = ruleset_fd;
+	ASSERT_EQ(0, landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+				LANDLOCK_OPT_ENFORCE_RULESET, &attr_enforce,
+				sizeof(attr_enforce)));
+	ASSERT_EQ(0, errno);
+
+	ASSERT_EQ(0, close(ruleset_fd));
+}
+
+FIXTURE(hierarchy) { };
+
+FIXTURE_VARIANT(hierarchy) {
+	bool domain_both;
+	bool domain_parent;
+	bool domain_child;
+};
+
+/*
+ * Test multiple tracing combinations between a parent process P1 and a child
+ * process P2.
+ *
+ * Yama's scoped ptrace is presumed disabled.  If enabled, this optional
+ * restriction is enforced in addition to any Landlock check, which means that
+ * all P2 requests to trace P1 would be denied.
+ */
+
+/*
+ *        No domain
+ *
+ *   P1-.               P1 -> P2 : allow
+ *       \              P2 -> P1 : allow
+ *        'P2
+ */
+FIXTURE_VARIANT_ADD(hierarchy, allow_without_domain) {
+	.domain_both = false,
+	.domain_parent = false,
+	.domain_child = false,
+};
+
+/*
+ *        Child domain
+ *
+ *   P1--.              P1 -> P2 : allow
+ *        \             P2 -> P1 : deny
+ *        .'-----.
+ *        |  P2  |
+ *        '------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, allow_with_one_domain) {
+	.domain_both = false,
+	.domain_parent = false,
+	.domain_child = true,
+};
+
+/*
+ *        Parent domain
+ * .------.
+ * |  P1  --.           P1 -> P2 : deny
+ * '------'  \          P2 -> P1 : allow
+ *            '
+ *            P2
+ */
+FIXTURE_VARIANT_ADD(hierarchy, deny_with_parent_domain) {
+	.domain_both = false,
+	.domain_parent = true,
+	.domain_child = false,
+};
+
+/*
+ *        Parent + child domain (siblings)
+ * .------.
+ * |  P1  ---.          P1 -> P2 : deny
+ * '------'   \         P2 -> P1 : deny
+ *         .---'--.
+ *         |  P2  |
+ *         '------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, deny_with_sibling_domain) {
+	.domain_both = false,
+	.domain_parent = true,
+	.domain_child = true,
+};
+
+/*
+ *         Same domain (inherited)
+ * .-------------.
+ * | P1----.     |      P1 -> P2 : allow
+ * |        \    |      P2 -> P1 : allow
+ * |         '   |
+ * |         P2  |
+ * '-------------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, allow_sibling_domain) {
+	.domain_both = true,
+	.domain_parent = false,
+	.domain_child = false,
+};
+
+/*
+ *         Inherited + child domain
+ * .-----------------.
+ * |  P1----.        |  P1 -> P2 : allow
+ * |         \       |  P2 -> P1 : deny
+ * |        .-'----. |
+ * |        |  P2  | |
+ * |        '------' |
+ * '-----------------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, allow_with_nested_domain) {
+	.domain_both = true,
+	.domain_parent = false,
+	.domain_child = true,
+};
+
+/*
+ *         Inherited + parent domain
+ * .-----------------.
+ * |.------.         |  P1 -> P2 : deny
+ * ||  P1  ----.     |  P2 -> P1 : allow
+ * |'------'    \    |
+ * |             '   |
+ * |             P2  |
+ * '-----------------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, deny_with_nested_and_parent_domain) {
+	.domain_both = true,
+	.domain_parent = true,
+	.domain_child = false,
+};
+
+/*
+ *         Inherited + parent and child domain (siblings)
+ * .-----------------.
+ * | .------.        |  P1 -> P2 : deny
+ * | |  P1  .        |  P2 -> P1 : deny
+ * | '------'\       |
+ * |          \      |
+ * |        .--'---. |
+ * |        |  P2  | |
+ * |        '------' |
+ * '-----------------'
+ */
+FIXTURE_VARIANT_ADD(hierarchy, deny_with_forked_domain) {
+	.domain_both = true,
+	.domain_parent = true,
+	.domain_child = true,
+};
+
+FIXTURE_SETUP(hierarchy)
+{ }
+
+FIXTURE_TEARDOWN(hierarchy)
+{ }
+
+/* test PTRACE_TRACEME and PTRACE_ATTACH for parent and child */
+TEST_F(hierarchy, trace)
+{
+	pid_t child, parent;
+	int status;
+	int pipe_child[2], pipe_parent[2];
+	char buf_parent;
+
+	disable_caps(_metadata);
+
+	parent = getpid();
+	ASSERT_EQ(0, pipe(pipe_child));
+	ASSERT_EQ(0, pipe(pipe_parent));
+	if (variant->domain_both)
+		create_domain(_metadata);
+
+	child = fork();
+	ASSERT_LE(0, child);
+	if (child == 0) {
+		char buf_child;
+
+		EXPECT_EQ(0, close(pipe_parent[1]));
+		EXPECT_EQ(0, close(pipe_child[0]));
+		if (variant->domain_child)
+			create_domain(_metadata);
+
+		/* sync #1 */
+		ASSERT_EQ(1, read(pipe_parent[0], &buf_child, 1)) {
+			TH_LOG("Failed to read() sync #1 from parent");
+		}
+		ASSERT_EQ('.', buf_child);
+
+		/* Tests the parent protection. */
+		ASSERT_EQ(variant->domain_child ? -1 : 0,
+				ptrace(PTRACE_ATTACH, parent, NULL, 0));
+		if (variant->domain_child) {
+			ASSERT_EQ(EPERM, errno);
+		} else {
+			ASSERT_EQ(parent, waitpid(parent, &status, 0));
+			ASSERT_EQ(1, WIFSTOPPED(status));
+			ASSERT_EQ(0, ptrace(PTRACE_DETACH, parent, NULL, 0));
+		}
+
+		/* sync #2 */
+		ASSERT_EQ(1, write(pipe_child[1], ".", 1)) {
+			TH_LOG("Failed to write() sync #2 to parent");
+		}
+
+		/* Tests traceme. */
+		ASSERT_EQ(variant->domain_parent ? -1 : 0, ptrace(PTRACE_TRACEME));
+		if (variant->domain_parent) {
+			ASSERT_EQ(EPERM, errno);
+		} else {
+			ASSERT_EQ(0, raise(SIGSTOP));
+		}
+
+		/* sync #3 */
+		ASSERT_EQ(1, read(pipe_parent[0], &buf_child, 1)) {
+			TH_LOG("Failed to read() sync #3 from parent");
+		}
+		ASSERT_EQ('.', buf_child);
+		_exit(_metadata->passed ? EXIT_SUCCESS : EXIT_FAILURE);
+	}
+
+	EXPECT_EQ(0, close(pipe_child[1]));
+	EXPECT_EQ(0, close(pipe_parent[0]));
+	if (variant->domain_parent)
+		create_domain(_metadata);
+
+	/* sync #1 */
+	ASSERT_EQ(1, write(pipe_parent[1], ".", 1)) {
+		TH_LOG("Failed to write() sync #1 to child");
+	}
+
+	/* Tests the parent protection. */
+	/* sync #2 */
+	ASSERT_EQ(1, read(pipe_child[0], &buf_parent, 1)) {
+		TH_LOG("Failed to read() sync #2 from child");
+	}
+	ASSERT_EQ('.', buf_parent);
+
+	/* Tests traceme. */
+	if (!variant->domain_parent) {
+		ASSERT_EQ(child, waitpid(child, &status, 0));
+		ASSERT_EQ(1, WIFSTOPPED(status));
+		ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0));
+	}
+	/* Tests attach. */
+	ASSERT_EQ(variant->domain_parent ? -1 : 0,
+			ptrace(PTRACE_ATTACH, child, NULL, 0));
+	if (variant->domain_parent) {
+		ASSERT_EQ(EPERM, errno);
+	} else {
+		ASSERT_EQ(child, waitpid(child, &status, 0));
+		ASSERT_EQ(1, WIFSTOPPED(status));
+		ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0));
+	}
+
+	/* sync #3 */
+	ASSERT_EQ(1, write(pipe_parent[1], ".", 1)) {
+		TH_LOG("Failed to write() sync #3 to child");
+	}
+	ASSERT_EQ(child, waitpid(child, &status, 0));
+	if (WIFSIGNALED(status) || WEXITSTATUS(status))
+		_metadata->passed = 0;
+}
+
+TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/landlock/true.c b/tools/testing/selftests/landlock/true.c
new file mode 100644
index 000000000000..3f9ccbf52783
--- /dev/null
+++ b/tools/testing/selftests/landlock/true.c
@@ -0,0 +1,5 @@
+// SPDX-License-Identifier: GPL-2.0
+int main(void)
+{
+	return 0;
+}
-- 
2.27.0


^ permalink raw reply related

* [PATCH v19 11/12] samples/landlock: Add a sandbox manager example
From: Mickaël Salaün @ 2020-07-07 18:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski, Anton Ivanov,
	Arnd Bergmann, Casey Schaufler, James Morris, Jann Horn,
	Jeff Dike, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86
In-Reply-To: <20200707180955.53024-1-mic@digikod.net>

Add a basic sandbox tool to launch a command which can only access a
whitelist of file hierarchies in a read-only or read-write way.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v16:
* Switch syscall attribute pointer and size arguments.

Changes since v15:
* Update access right names.
* Properly assign access right to files according to the new related
  syscall restriction.
* Replace "select" with "depends on" HEADERS_INSTALL (suggested by Randy
  Dunlap).

Changes since v14:
* Fix Kconfig dependency.
* Remove access rights that may be required for FD-only requests:
  mmap, truncate, getattr, lock, chmod, chown, chgrp, ioctl.
* Fix useless hardcoded syscall number.
* Use execvpe().
* Follow symlinks.
* Extend help with common file paths.
* Constify variables.
* Clean up comments.
* Improve error message.

Changes since v11:
* Add back the filesystem sandbox manager and update it to work with the
  new Landlock syscall.

Previous changes:
https://lore.kernel.org/lkml/20190721213116.23476-9-mic@digikod.net/
---
 samples/Kconfig              |   7 ++
 samples/Makefile             |   1 +
 samples/landlock/.gitignore  |   1 +
 samples/landlock/Makefile    |  15 +++
 samples/landlock/sandboxer.c | 228 +++++++++++++++++++++++++++++++++++
 5 files changed, 252 insertions(+)
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandboxer.c

diff --git a/samples/Kconfig b/samples/Kconfig
index 0ed6e4d71d87..092962924f0d 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -124,6 +124,13 @@ config SAMPLE_HIDRAW
 	bool "hidraw sample"
 	depends on CC_CAN_LINK && HEADERS_INSTALL
 
+config SAMPLE_LANDLOCK
+	bool "Build Landlock sample code"
+	depends on HEADERS_INSTALL
+	help
+	  Build a simple Landlock sandbox manager able to launch a process
+	  restricted by a user-defined filesystem access-control security policy.
+
 config SAMPLE_PIDFD
 	bool "pidfd sample"
 	depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index 754553597581..4a6ce8f64a4c 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_SAMPLE_KDB)		+= kdb/
 obj-$(CONFIG_SAMPLE_KFIFO)		+= kfifo/
 obj-$(CONFIG_SAMPLE_KOBJECT)		+= kobject/
 obj-$(CONFIG_SAMPLE_KPROBES)		+= kprobes/
+subdir-$(CONFIG_SAMPLE_LANDLOCK)	+= landlock
 obj-$(CONFIG_SAMPLE_LIVEPATCH)		+= livepatch/
 subdir-$(CONFIG_SAMPLE_PIDFD)		+= pidfd
 obj-$(CONFIG_SAMPLE_QMI_CLIENT)		+= qmi/
diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore
new file mode 100644
index 000000000000..f43668b2d318
--- /dev/null
+++ b/samples/landlock/.gitignore
@@ -0,0 +1 @@
+/sandboxer
diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile
new file mode 100644
index 000000000000..9dfb571641ba
--- /dev/null
+++ b/samples/landlock/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+
+hostprogs-y := sandboxer
+
+always := $(hostprogs-y)
+
+KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
+
+.PHONY: all clean
+
+all:
+	$(MAKE) -C ../.. samples/landlock/
+
+clean:
+	$(MAKE) -C ../.. M=samples/landlock/ clean
diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
new file mode 100644
index 000000000000..e0059706c11f
--- /dev/null
+++ b/samples/landlock/sandboxer.c
@@ -0,0 +1,228 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Simple Landlock sandbox manager able to launch a process restricted by a
+ * user-defined filesystem access-control security policy.
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2020 ANSSI
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/landlock.h>
+#include <linux/prctl.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+#ifndef landlock
+static inline int landlock(const unsigned int command,
+		const unsigned int options, void *const attr_ptr,
+		const size_t attr_size)
+{
+	errno = 0;
+	return syscall(__NR_landlock, command, options, attr_ptr, attr_size,
+			NULL, 0);
+}
+#endif
+
+#define ENV_FS_RO_NAME "LL_FS_RO"
+#define ENV_FS_RW_NAME "LL_FS_RW"
+#define ENV_PATH_TOKEN ":"
+
+static int parse_path(char *env_path, const char ***const path_list)
+{
+	int i, path_nb = 0;
+
+	if (env_path) {
+		path_nb++;
+		for (i = 0; env_path[i]; i++) {
+			if (env_path[i] == ENV_PATH_TOKEN[0])
+				path_nb++;
+		}
+	}
+	*path_list = malloc(path_nb * sizeof(**path_list));
+	for (i = 0; i < path_nb; i++)
+		(*path_list)[i] = strsep(&env_path, ENV_PATH_TOKEN);
+
+	return path_nb;
+}
+
+#define ACCESS_FILE ( \
+	LANDLOCK_ACCESS_FS_EXECUTE | \
+	LANDLOCK_ACCESS_FS_WRITE_FILE | \
+	LANDLOCK_ACCESS_FS_READ_FILE)
+
+static int populate_ruleset(
+		const struct landlock_attr_features *const attr_features,
+		const char *const env_var, const int ruleset_fd,
+		const __u64 allowed_access)
+{
+	int path_nb, i;
+	char *env_path_name;
+	const char **path_list = NULL;
+	struct landlock_attr_path_beneath path_beneath = {
+		.ruleset_fd = ruleset_fd,
+		.parent_fd = -1,
+	};
+
+	env_path_name = getenv(env_var);
+	if (!env_path_name) {
+		fprintf(stderr, "Missing environment variable %s\n", env_var);
+		return 1;
+	}
+	env_path_name = strdup(env_path_name);
+	unsetenv(env_var);
+	path_nb = parse_path(env_path_name, &path_list);
+	if (path_nb == 1 && path_list[0][0] == '\0') {
+		fprintf(stderr, "Missing path in %s\n", env_var);
+		goto err_free_name;
+	}
+
+	for (i = 0; i < path_nb; i++) {
+		struct stat statbuf;
+
+		path_beneath.parent_fd = open(path_list[i], O_PATH |
+				O_CLOEXEC);
+		if (path_beneath.parent_fd < 0) {
+			fprintf(stderr, "Failed to open \"%s\": %s\n",
+					path_list[i],
+					strerror(errno));
+			goto err_free_name;
+		}
+		if (fstat(path_beneath.parent_fd, &statbuf)) {
+			close(path_beneath.parent_fd);
+			goto err_free_name;
+		}
+		/* Follows a best-effort approach. */
+		path_beneath.allowed_access = allowed_access &
+			attr_features->access_fs;
+		if (!S_ISDIR(statbuf.st_mode))
+			path_beneath.allowed_access &= ACCESS_FILE;
+		if (landlock(LANDLOCK_CMD_ADD_RULE,
+					LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+					&path_beneath, sizeof(path_beneath))) {
+			fprintf(stderr, "Failed to update the ruleset with \"%s\": %s\n",
+					path_list[i], strerror(errno));
+			close(path_beneath.parent_fd);
+			goto err_free_name;
+		}
+		close(path_beneath.parent_fd);
+	}
+	free(env_path_name);
+	return 0;
+
+err_free_name:
+	free(env_path_name);
+	return 1;
+}
+
+#define ACCESS_FS_ROUGHLY_READ ( \
+	LANDLOCK_ACCESS_FS_EXECUTE | \
+	LANDLOCK_ACCESS_FS_READ_FILE | \
+	LANDLOCK_ACCESS_FS_READ_DIR | \
+	LANDLOCK_ACCESS_FS_CHROOT)
+
+#define ACCESS_FS_ROUGHLY_WRITE ( \
+	LANDLOCK_ACCESS_FS_WRITE_FILE | \
+	LANDLOCK_ACCESS_FS_REMOVE_DIR | \
+	LANDLOCK_ACCESS_FS_REMOVE_FILE | \
+	LANDLOCK_ACCESS_FS_MAKE_CHAR | \
+	LANDLOCK_ACCESS_FS_MAKE_DIR | \
+	LANDLOCK_ACCESS_FS_MAKE_REG | \
+	LANDLOCK_ACCESS_FS_MAKE_SOCK | \
+	LANDLOCK_ACCESS_FS_MAKE_FIFO | \
+	LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
+	LANDLOCK_ACCESS_FS_MAKE_SYM)
+
+int main(const int argc, char *const argv[], char *const *const envp)
+{
+	const char *cmd_path;
+	char *const *cmd_argv;
+	int ruleset_fd;
+	struct landlock_attr_features attr_features;
+	struct landlock_attr_ruleset ruleset = {
+		.handled_access_fs = ACCESS_FS_ROUGHLY_READ |
+			ACCESS_FS_ROUGHLY_WRITE,
+	};
+	struct landlock_attr_enforce attr_enforce = {};
+
+	if (argc < 2) {
+		fprintf(stderr, "usage: %s=\"...\" %s=\"...\" %s <cmd> [args]...\n\n",
+				ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
+		fprintf(stderr, "Launch a command in a restricted environment.\n\n");
+		fprintf(stderr, "Environment variables containing paths, each separated by a colon:\n");
+		fprintf(stderr, "* %s: list of paths allowed to be used in a read-only way.\n",
+				ENV_FS_RO_NAME);
+		fprintf(stderr, "* %s: list of paths allowed to be used in a read-write way.\n",
+				ENV_FS_RO_NAME);
+		fprintf(stderr, "\nexample:\n"
+				"%s=\"/bin:/lib:/usr:/proc:/etc:/dev/urandom\" "
+				"%s=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
+				"%s bash -i\n",
+				ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
+		return 1;
+	}
+
+	if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES,
+				&attr_features, sizeof(attr_features))) {
+		perror("Failed to probe the Landlock supported features");
+		switch (errno) {
+		case ENOSYS:
+			fprintf(stderr, "Hint: this kernel does not support Landlock.\n");
+			break;
+		case ENOPKG:
+			fprintf(stderr, "Hint: Landlock is currently disabled. It can be enabled in the kernel configuration or at boot with the \"lsm=landlock\" parameter.\n");
+			break;
+		}
+		return 1;
+	}
+	/* Follows a best-effort approach. */
+	ruleset.handled_access_fs &= attr_features.access_fs;
+	ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+			LANDLOCK_OPT_CREATE_RULESET, &ruleset,
+			sizeof(ruleset));
+	if (ruleset_fd < 0) {
+		perror("Failed to create a ruleset");
+		return 1;
+	}
+	if (populate_ruleset(&attr_features, ENV_FS_RO_NAME, ruleset_fd,
+				ACCESS_FS_ROUGHLY_READ)) {
+		goto err_close_ruleset;
+	}
+	if (populate_ruleset(&attr_features, ENV_FS_RW_NAME, ruleset_fd,
+				ACCESS_FS_ROUGHLY_READ |
+				ACCESS_FS_ROUGHLY_WRITE)) {
+		goto err_close_ruleset;
+	}
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("Failed to restrict privileges");
+		goto err_close_ruleset;
+	}
+	attr_enforce.ruleset_fd = ruleset_fd;
+	if (landlock(LANDLOCK_CMD_ENFORCE_RULESET,
+				LANDLOCK_OPT_ENFORCE_RULESET, &attr_enforce,
+				sizeof(attr_enforce))) {
+		perror("Failed to enforce ruleset");
+		goto err_close_ruleset;
+	}
+	close(ruleset_fd);
+
+	cmd_path = argv[1];
+	cmd_argv = argv + 1;
+	execvpe(cmd_path, cmd_argv, envp);
+	fprintf(stderr, "Failed to execute \"%s\": %s\n", cmd_path,
+			strerror(errno));
+	fprintf(stderr, "Hint: access to the binary, the interpreter or shared libraries may be denied.\n");
+	return 1;
+
+err_close_ruleset:
+	close(ruleset_fd);
+	return 1;
+}
-- 
2.27.0


^ permalink raw reply related

* [PATCH v19 03/12] landlock: Set up the security framework and manage credentials
From: Mickaël Salaün @ 2020-07-07 18:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski, Anton Ivanov,
	Arnd Bergmann, Casey Schaufler, James Morris, Jann Horn,
	Jeff Dike, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86
In-Reply-To: <20200707180955.53024-1-mic@digikod.net>

A process credentials point to a Landlock domain, which is underneath
implemented with a ruleset.  In the following commits, this domain is
used to check and enforce the ptrace and filesystem security policies.
A domain is inherited from a parent to its child the same way a thread
inherits a seccomp policy.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v17:
* Constify returned domain pointers from landlock_get_current_domain()
  and landlock_get_task_domain() helpers.

Changes since v15:
* Optimize landlocked() for current thread.
* Display the greeting message when everything is initialized.

Changes since v14:
* Uses pr_fmt from common.h .
* Constify variables.
* Remove useless NULL initialization.

Changes since v13:
* totally get ride of the seccomp dependency
* only keep credential management and LSM setup.

Previous changes:
https://lore.kernel.org/lkml/20191104172146.30797-4-mic@digikod.net/
---
 security/Kconfig           | 10 +++----
 security/landlock/Makefile |  3 +-
 security/landlock/common.h | 20 +++++++++++++
 security/landlock/cred.c   | 46 ++++++++++++++++++++++++++++++
 security/landlock/cred.h   | 58 ++++++++++++++++++++++++++++++++++++++
 security/landlock/setup.c  | 31 ++++++++++++++++++++
 security/landlock/setup.h  | 16 +++++++++++
 7 files changed, 178 insertions(+), 6 deletions(-)
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/cred.c
 create mode 100644 security/landlock/cred.h
 create mode 100644 security/landlock/setup.c
 create mode 100644 security/landlock/setup.h

diff --git a/security/Kconfig b/security/Kconfig
index 582fd777a757..a96ee1c7fd25 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -278,11 +278,11 @@ endchoice
 
 config LSM
 	string "Ordered list of enabled LSMs"
-	default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
-	default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
-	default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
-	default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
-	default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
+	default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
 	help
 	  A comma-separated list of LSMs, in initialization order.
 	  Any LSMs left off this list will be ignored. This can be
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index d846eba445bb..041ea242e627 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := object.o ruleset.o
+landlock-y := setup.o object.o ruleset.o \
+	cred.o
diff --git a/security/landlock/common.h b/security/landlock/common.h
new file mode 100644
index 000000000000..5dc0fe15707d
--- /dev/null
+++ b/security/landlock/common.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Common constants and helpers
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_COMMON_H
+#define _SECURITY_LANDLOCK_COMMON_H
+
+#define LANDLOCK_NAME "landlock"
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) LANDLOCK_NAME ": " fmt
+
+#endif /* _SECURITY_LANDLOCK_COMMON_H */
diff --git a/security/landlock/cred.c b/security/landlock/cred.c
new file mode 100644
index 000000000000..7074149d2517
--- /dev/null
+++ b/security/landlock/cred.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Credential hooks
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/cred.h>
+#include <linux/lsm_hooks.h>
+
+#include "common.h"
+#include "cred.h"
+#include "ruleset.h"
+#include "setup.h"
+
+static int hook_cred_prepare(struct cred *const new,
+		const struct cred *const old, const gfp_t gfp)
+{
+	const struct landlock_cred_security *cred_old = landlock_cred(old);
+	struct landlock_cred_security *cred_new = landlock_cred(new);
+	struct landlock_ruleset *dom_old;
+
+	dom_old = cred_old->domain;
+	if (dom_old) {
+		landlock_get_ruleset(dom_old);
+		cred_new->domain = dom_old;
+	}
+	return 0;
+}
+
+static void hook_cred_free(struct cred *const cred)
+{
+	landlock_put_ruleset_deferred(landlock_cred(cred)->domain);
+}
+
+static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = {
+	LSM_HOOK_INIT(cred_prepare, hook_cred_prepare),
+	LSM_HOOK_INIT(cred_free, hook_cred_free),
+};
+
+__init void landlock_add_hooks_cred(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			LANDLOCK_NAME);
+}
diff --git a/security/landlock/cred.h b/security/landlock/cred.h
new file mode 100644
index 000000000000..2983dd4dda46
--- /dev/null
+++ b/security/landlock/cred.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Credential hooks
+ *
+ * Copyright © 2019 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2019 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_CRED_H
+#define _SECURITY_LANDLOCK_CRED_H
+
+#include <linux/cred.h>
+#include <linux/init.h>
+#include <linux/rcupdate.h>
+
+#include "ruleset.h"
+#include "setup.h"
+
+struct landlock_cred_security {
+	struct landlock_ruleset *domain;
+};
+
+static inline struct landlock_cred_security *landlock_cred(
+		const struct cred *cred)
+{
+	return cred->security + landlock_blob_sizes.lbs_cred;
+}
+
+static inline const struct landlock_ruleset *landlock_get_current_domain(void)
+{
+	return landlock_cred(current_cred())->domain;
+}
+
+/*
+ * The call needs to come from an RCU read-side critical section.
+ */
+static inline const struct landlock_ruleset *landlock_get_task_domain(
+		const struct task_struct *const task)
+{
+	return landlock_cred(__task_cred(task))->domain;
+}
+
+static inline bool landlocked(const struct task_struct *const task)
+{
+	bool has_dom;
+
+	if (task == current)
+		return !!landlock_get_current_domain();
+
+	rcu_read_lock();
+	has_dom = !!landlock_get_task_domain(task);
+	rcu_read_unlock();
+	return has_dom;
+}
+
+__init void landlock_add_hooks_cred(void);
+
+#endif /* _SECURITY_LANDLOCK_CRED_H */
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
new file mode 100644
index 000000000000..39ee1766f175
--- /dev/null
+++ b/security/landlock/setup.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Security framework setup
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/init.h>
+#include <linux/lsm_hooks.h>
+
+#include "common.h"
+#include "cred.h"
+#include "setup.h"
+
+struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
+	.lbs_cred = sizeof(struct landlock_cred_security),
+};
+
+static int __init landlock_init(void)
+{
+	landlock_add_hooks_cred();
+	pr_info("Up and running.\n");
+	return 0;
+}
+
+DEFINE_LSM(LANDLOCK_NAME) = {
+	.name = LANDLOCK_NAME,
+	.init = landlock_init,
+	.blobs = &landlock_blob_sizes,
+};
diff --git a/security/landlock/setup.h b/security/landlock/setup.h
new file mode 100644
index 000000000000..9fdbf33fcc33
--- /dev/null
+++ b/security/landlock/setup.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Security framework setup
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_SETUP_H
+#define _SECURITY_LANDLOCK_SETUP_H
+
+#include <linux/lsm_hooks.h>
+
+extern struct lsm_blob_sizes landlock_blob_sizes;
+
+#endif /* _SECURITY_LANDLOCK_SETUP_H */
-- 
2.27.0


^ permalink raw reply related

* [PATCH v19 02/12] landlock: Add ruleset and domain management
From: Mickaël Salaün @ 2020-07-07 18:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski, Anton Ivanov,
	Arnd Bergmann, Casey Schaufler, James Morris, Jann Horn,
	Jeff Dike, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86
In-Reply-To: <20200707180955.53024-1-mic@digikod.net>

A Landlock ruleset is mainly a red-black tree with Landlock rules as
nodes.  This enables quick update and lookup to match a requested access
e.g., to a file.  A ruleset is usable through a dedicated file
descriptor (cf. following commit implementing the syscall) which enables
a process to create and populate a ruleset with new rules.

A domain is a ruleset tied to a set of processes.  This group of rules
define the security policy enforced on these processes and their future
children.  A domain can transition to a new domain which is the
intersection of all its constraints and those of a ruleset provided by
the current process.  This modification only impact the current process.
This means that a process can only gain more constraints (i.e. lose
accesses) over time.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v18:
* Account rulesets to kmemcg.
* Remove struct holes.
* Cosmetic changes.

Changes since v17:
* Move include/uapi/linux/landlock.h and _LANDLOCK_ACCESS_FS_* to a
  following patch.

Changes since v16:
* Allow enforcement of empty ruleset, which enables deny-all policies.

Changes since v15:
* Replace layer_levels and layer_depth with a bitfield of layers, cf.
  filesystem commit.
* Rename the LANDLOCK_ACCESS_FS_{UNLINK,RMDIR} with
  LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} because it makes sense to use
  them for the action of renaming a file or a directory, which may lead
  to the removal of the source file or directory.  Removes the
  LANDLOCK_ACCESS_FS_{LINK_TO,RENAME_FROM,RENAME_TO} which are now
  replaced with LANDLOCK_ACCESS_FS_REMOVE_{FILE,DIR} and
  LANDLOCK_ACCESS_FS_MAKE_* .
* Update the documentation accordingly and highlight how the access
  rights are taken into account.
* Change nb_rules from atomic_t to u32 because it is not use anymore by
  show_fdinfo().
* Add safeguard for level variables types.
* Check max number of rules.
* Replace struct landlock_access (self and beneath bitfields) with one
  bitfield.
* Remove useless variable.
* Add comments.

Changes since v14:
* Simplify the object, rule and ruleset management at the expense of a
  less aggressive memory freeing (contributed by Jann Horn, with
  additional modifications):
  - Make a domain immutable (remove the opportunistic cleaning).
  - Remove RCU pointers.
  - Merge struct landlock_ref and struct landlock_ruleset_elem into
    landlock_rule: get ride of rule's RCU.
  - Adjust union.
  - Remove the landlock_insert_rule() check about a new object with the
    same address as a previously disabled one, because it is not
    possible to disable a rule anymore.
  Cf. https://lore.kernel.org/lkml/CAG48ez21bEn0wL1bbmTiiu8j9jP5iEWtHOwz4tURUJ+ki0ydYw@mail.gmail.com/
* Fix nested domains by implementing a notion of layer level and depth:
  - Update landlock_insert_rule() to manage such layers.
  - Add an inherit_ruleset() helper to properly create a new domain.
  - Rename landlock_find_access() to landlock_find_rule() and return a
    full rule reference.
  - Add a layer_level and a layer_depth fields to struct landlock_rule.
  - Add a top_layer_level field to struct landlock_ruleset.
* Remove access rights that may be required for FD-only requests:
  truncate, getattr, lock, chmod, chown, chgrp, ioctl.  This will be
  handle in a future evolution of Landlock, but right now the goal is to
  lighten the code to ease review.
* Remove LANDLOCK_ACCESS_FS_OPEN and rename
  LANDLOCK_ACCESS_FS_{READ,WRITE} with a FILE suffix.
* Rename LANDLOCK_ACCESS_FS_READDIR to match the *_FILE pattern.
* Remove LANDLOCK_ACCESS_FS_MAP which was useless.
* Fix memory leak in put_hierarchy() (reported by Jann Horn).
* Fix user-after-free and rename free_ruleset() (reported by Jann Horn).
* Replace the for loops with rbtree_postorder_for_each_entry_safe().
* Constify variables.
* Only use refcount_inc() through getter helpers.
* Change Landlock_insert_ruleset_access() to
  Landlock_insert_ruleset_rule().
* Rename landlock_put_ruleset_enqueue() to landlock_put_ruleset_deferred().
* Improve kernel documentation and add a warning about the unhandled
  access/syscall families.
* Move ABI check to syscall.c .

Changes since v13:
* New implementation, inspired by the previous inode eBPF map, but
  agnostic to the underlying kernel object.

Previous changes:
https://lore.kernel.org/lkml/20190721213116.23476-7-mic@digikod.net/
---
 MAINTAINERS                 |   1 +
 security/landlock/Makefile  |   2 +-
 security/landlock/ruleset.c | 342 ++++++++++++++++++++++++++++++++++++
 security/landlock/ruleset.h | 157 +++++++++++++++++
 4 files changed, 501 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/ruleset.c
 create mode 100644 security/landlock/ruleset.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 4c229c961d0d..f2c2480d8590 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9636,6 +9636,7 @@ L:	linux-security-module@vger.kernel.org
 S:	Supported
 W:	https://landlock.io
 T:	git https://github.com/landlock-lsm/linux.git
+F:	include/uapi/linux/landlock.h
 F:	security/landlock/
 K:	landlock
 K:	LANDLOCK
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index cb6deefbf4c0..d846eba445bb 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := object.o
+landlock-y := object.o ruleset.o
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
new file mode 100644
index 000000000000..f9ef8a6793e2
--- /dev/null
+++ b/security/landlock/ruleset.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Ruleset management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#include <linux/bits.h>
+#include <linux/bug.h>
+#include <linux/compiler_types.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/limits.h>
+#include <linux/rbtree.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+
+#include "object.h"
+#include "ruleset.h"
+
+static struct landlock_ruleset *create_ruleset(void)
+{
+	struct landlock_ruleset *new_ruleset;
+
+	new_ruleset = kzalloc(sizeof(*new_ruleset), GFP_KERNEL_ACCOUNT);
+	if (!new_ruleset)
+		return ERR_PTR(-ENOMEM);
+	refcount_set(&new_ruleset->usage, 1);
+	mutex_init(&new_ruleset->lock);
+	/*
+	 * root = RB_ROOT
+	 * hierarchy = NULL
+	 * nb_rules = 0
+	 * nb_layers = 0
+	 * fs_access_mask = 0
+	 */
+	return new_ruleset;
+}
+
+struct landlock_ruleset *landlock_create_ruleset(const u32 fs_access_mask)
+{
+	struct landlock_ruleset *new_ruleset;
+
+	/* Informs about useless ruleset. */
+	if (!fs_access_mask)
+		return ERR_PTR(-ENOMSG);
+	new_ruleset = create_ruleset();
+	if (!IS_ERR(new_ruleset))
+		new_ruleset->fs_access_mask = fs_access_mask;
+	return new_ruleset;
+}
+
+static struct landlock_rule *duplicate_rule(struct landlock_rule *const src)
+{
+	struct landlock_rule *new_rule;
+
+	new_rule = kzalloc(sizeof(*new_rule), GFP_KERNEL_ACCOUNT);
+	if (!new_rule)
+		return ERR_PTR(-ENOMEM);
+	RB_CLEAR_NODE(&new_rule->node);
+	landlock_get_object(src->object);
+	new_rule->object = src->object;
+	new_rule->access = src->access;
+	new_rule->layers = src->layers;
+	return new_rule;
+}
+
+static void put_rule(struct landlock_rule *const rule)
+{
+	might_sleep();
+	if (!rule)
+		return;
+	landlock_put_object(rule->object);
+	kfree(rule);
+}
+
+/*
+ * Assumptions:
+ * - An inserted rule can not be removed.
+ * - The underlying kernel object must be held by the caller.
+ *
+ * @rule: Read-only payload to be inserted (not own by this function).
+ * @is_merge: If true, intersects access rights and updates the rule's layers
+ * (e.g. merge two rulesets), else do a union of access rights and keep the
+ * rule's layers (e.g. extend a ruleset)
+ */
+int landlock_insert_rule(struct landlock_ruleset *const ruleset,
+		struct landlock_rule *const rule, const bool is_merge)
+{
+	struct rb_node **walker_node;
+	struct rb_node *parent_node = NULL;
+	struct landlock_rule *new_rule;
+
+	might_sleep();
+	lockdep_assert_held(&ruleset->lock);
+	walker_node = &(ruleset->root.rb_node);
+	while (*walker_node) {
+		struct landlock_rule *const this = rb_entry(*walker_node,
+				struct landlock_rule, node);
+
+		if (this->object != rule->object) {
+			parent_node = *walker_node;
+			if (this->object < rule->object)
+				walker_node = &((*walker_node)->rb_right);
+			else
+				walker_node = &((*walker_node)->rb_left);
+			continue;
+		}
+
+		/* If there is a matching rule, updates it. */
+		if (is_merge) {
+			/* Intersects access rights. */
+			this->access &= rule->access;
+
+			/* Updates the rule layers with the next one. */
+			this->layers |= BIT_ULL(ruleset->nb_layers);
+		} else {
+			/* Extends access rights. */
+			this->access |= rule->access;
+		}
+		return 0;
+	}
+
+	/* There is no match for @rule->object. */
+	if (ruleset->nb_rules == U32_MAX)
+		return -E2BIG;
+	new_rule = duplicate_rule(rule);
+	if (IS_ERR(new_rule))
+		return PTR_ERR(new_rule);
+	if (is_merge)
+		/* Sets the rule layer to the next one. */
+		new_rule->layers = BIT_ULL(ruleset->nb_layers);
+	rb_link_node(&new_rule->node, parent_node, walker_node);
+	rb_insert_color(&new_rule->node, &ruleset->root);
+	ruleset->nb_rules++;
+	return 0;
+}
+
+static inline void get_hierarchy(struct landlock_hierarchy *const hierarchy)
+{
+	if (hierarchy)
+		refcount_inc(&hierarchy->usage);
+}
+
+static void put_hierarchy(struct landlock_hierarchy *hierarchy)
+{
+	while (hierarchy && refcount_dec_and_test(&hierarchy->usage)) {
+		const struct landlock_hierarchy *const freeme = hierarchy;
+
+		hierarchy = hierarchy->parent;
+		kfree(freeme);
+	}
+}
+
+static int merge_ruleset(struct landlock_ruleset *const dst,
+		struct landlock_ruleset *const src)
+{
+	struct landlock_rule *walker_rule, *next_rule;
+	int err = 0;
+
+	might_sleep();
+	if (!src)
+		return 0;
+	/* Only merge into a domain. */
+	if (WARN_ON_ONCE(!dst || !dst->hierarchy))
+		return -EFAULT;
+
+	mutex_lock(&dst->lock);
+	mutex_lock_nested(&src->lock, 1);
+	/*
+	 * Makes a new layer, but only increments the number of layers after
+	 * the rules are inserted.
+	 */
+	if (dst->nb_layers == sizeof(walker_rule->layers) * BITS_PER_BYTE) {
+		err = -E2BIG;
+		goto out_unlock;
+	}
+	dst->fs_access_mask |= src->fs_access_mask;
+
+	/* Merges the @src tree. */
+	rbtree_postorder_for_each_entry_safe(walker_rule, next_rule,
+			&src->root, node) {
+		err = landlock_insert_rule(dst, walker_rule, true);
+		if (err)
+			goto out_unlock;
+	}
+	dst->nb_layers++;
+
+out_unlock:
+	mutex_unlock(&src->lock);
+	mutex_unlock(&dst->lock);
+	return err;
+}
+
+static struct landlock_ruleset *inherit_ruleset(
+		struct landlock_ruleset *const parent)
+{
+	struct landlock_rule *walker_rule, *next_rule;
+	struct landlock_ruleset *new_ruleset;
+	int err = 0;
+
+	might_sleep();
+	new_ruleset = create_ruleset();
+	if (IS_ERR(new_ruleset))
+		return new_ruleset;
+
+	new_ruleset->hierarchy = kzalloc(sizeof(*new_ruleset->hierarchy),
+			GFP_KERNEL_ACCOUNT);
+	if (!new_ruleset->hierarchy) {
+		err = -ENOMEM;
+		goto out_put_ruleset;
+	}
+	refcount_set(&new_ruleset->hierarchy->usage, 1);
+	if (!parent)
+		return new_ruleset;
+
+	mutex_lock(&new_ruleset->lock);
+	mutex_lock_nested(&parent->lock, 1);
+	new_ruleset->nb_layers = parent->nb_layers;
+	new_ruleset->fs_access_mask = parent->fs_access_mask;
+	WARN_ON_ONCE(!parent->hierarchy);
+	get_hierarchy(parent->hierarchy);
+	new_ruleset->hierarchy->parent = parent->hierarchy;
+
+	/* Copies the @parent tree. */
+	rbtree_postorder_for_each_entry_safe(walker_rule, next_rule,
+			&parent->root, node) {
+		err = landlock_insert_rule(new_ruleset, walker_rule, false);
+		if (err)
+			goto out_unlock;
+	}
+	mutex_unlock(&parent->lock);
+	mutex_unlock(&new_ruleset->lock);
+	return new_ruleset;
+
+out_unlock:
+	mutex_unlock(&parent->lock);
+	mutex_unlock(&new_ruleset->lock);
+
+out_put_ruleset:
+	landlock_put_ruleset(new_ruleset);
+	return ERR_PTR(err);
+}
+
+static void free_ruleset(struct landlock_ruleset *const ruleset)
+{
+	struct landlock_rule *freeme, *next;
+
+	might_sleep();
+	rbtree_postorder_for_each_entry_safe(freeme, next, &ruleset->root,
+			node)
+		put_rule(freeme);
+	put_hierarchy(ruleset->hierarchy);
+	kfree(ruleset);
+}
+
+void landlock_put_ruleset(struct landlock_ruleset *const ruleset)
+{
+	might_sleep();
+	if (ruleset && refcount_dec_and_test(&ruleset->usage))
+		free_ruleset(ruleset);
+}
+
+static void free_ruleset_work(struct work_struct *const work)
+{
+	struct landlock_ruleset *ruleset;
+
+	ruleset = container_of(work, struct landlock_ruleset, work_free);
+	free_ruleset(ruleset);
+}
+
+void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset)
+{
+	if (ruleset && refcount_dec_and_test(&ruleset->usage)) {
+		INIT_WORK(&ruleset->work_free, free_ruleset_work);
+		schedule_work(&ruleset->work_free);
+	}
+}
+
+/*
+ * Creates a new transition domain, intersection of @parent and @ruleset, or
+ * return @parent if @ruleset is empty.  If @parent is empty, returns a
+ * duplicate of @ruleset.
+ */
+struct landlock_ruleset *landlock_merge_ruleset(
+		struct landlock_ruleset *const parent,
+		struct landlock_ruleset *const ruleset)
+{
+	struct landlock_ruleset *new_dom;
+	int err;
+
+	might_sleep();
+	/*
+	 * Merging duplicates a ruleset, so a new ruleset can't be
+	 * the same as the parent, but they can have similar content.
+	 */
+	if (WARN_ON_ONCE(!ruleset || parent == ruleset)) {
+		landlock_get_ruleset(parent);
+		return parent;
+	}
+
+	new_dom = inherit_ruleset(parent);
+	if (IS_ERR(new_dom))
+		return new_dom;
+
+	err = merge_ruleset(new_dom, ruleset);
+	if (err) {
+		landlock_put_ruleset(new_dom);
+		return ERR_PTR(err);
+	}
+	return new_dom;
+}
+
+/*
+ * The returned access has the same lifetime as @ruleset.
+ */
+const struct landlock_rule *landlock_find_rule(
+		const struct landlock_ruleset *const ruleset,
+		const struct landlock_object *const object)
+{
+	const struct rb_node *node;
+
+	if (!object)
+		return NULL;
+	node = ruleset->root.rb_node;
+	while (node) {
+		struct landlock_rule *this = rb_entry(node,
+				struct landlock_rule, node);
+
+		if (this->object == object)
+			return this;
+		if (this->object < object)
+			node = node->rb_right;
+		else
+			node = node->rb_left;
+	}
+	return NULL;
+}
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
new file mode 100644
index 000000000000..d5fcec4c1a17
--- /dev/null
+++ b/security/landlock/ruleset.h
@@ -0,0 +1,157 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Ruleset management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_RULESET_H
+#define _SECURITY_LANDLOCK_RULESET_H
+
+#include <linux/mutex.h>
+#include <linux/rbtree.h>
+#include <linux/refcount.h>
+#include <linux/workqueue.h>
+
+#include "object.h"
+
+/**
+ * struct landlock_rule - Access rights tied to an object
+ *
+ * When enforcing a ruleset (i.e. merging a ruleset into the current domain),
+ * the layer level of a new rule is the incremented top layer level (cf.
+ * &struct landlock_ruleset).  If there is no rule (from this domain) tied to
+ * the same object, then the depth of the new rule is 1. However, if there is
+ * already a rule tied to the same object and if this rule's layer level is the
+ * previous top layer level, then the depth and the layer level are both
+ * incremented and the rule is updated with the new access rights (boolean
+ * AND).
+ */
+struct landlock_rule {
+	/**
+	 * @node: Node in the red-black tree.
+	 */
+	struct rb_node node;
+	/**
+	 * @object: Pointer to identify a kernel object (e.g. an inode).  This
+	 * is used as a key for this ruleset element.  This pointer is set once
+	 * and never modified.  It always point to an allocated object because
+	 * each rule increment the refcount of there object.
+	 */
+	struct landlock_object *object;
+	/**
+	 * @layers: Bitfield to identify the layers which resulted to @access
+	 * from different consecutive intersections.
+	 */
+	u64 layers;
+	/**
+	 * @access: Bitfield of allowed actions on the kernel object.  They are
+	 * relative to the object type (e.g. %LANDLOCK_ACTION_FS_READ).  This
+	 * may be the result of the merged access rights (boolean AND) from
+	 * multiple layers referring to the same object.
+	 */
+	u32 access;
+};
+
+/**
+ * struct landlock_hierarchy - Node in a ruleset hierarchy
+ */
+struct landlock_hierarchy {
+	/**
+	 * @parent: Pointer to the parent node, or NULL if it is a root Lanlock
+	 * domain.
+	 */
+	struct landlock_hierarchy *parent;
+	/**
+	 * @usage: Number of potential children domains plus their parent
+	 * domain.
+	 */
+	refcount_t usage;
+};
+
+/**
+ * struct landlock_ruleset - Landlock ruleset
+ *
+ * This data structure must contains unique entries, be updatable, and quick to
+ * match an object.
+ */
+struct landlock_ruleset {
+	/**
+	 * @root: Root of a red-black tree containing &struct landlock_rule
+	 * nodes.
+	 */
+	struct rb_root root;
+	/**
+	 * @hierarchy: Enables hierarchy identification even when a parent
+	 * domain vanishes.  This is needed for the ptrace protection.
+	 */
+	struct landlock_hierarchy *hierarchy;
+	union {
+		/**
+		 * @work_free: Enables to free a ruleset within a lockless
+		 * section.  This is only used by
+		 * landlock_put_ruleset_deferred() when @usage reaches zero.
+		 * The fields @lock, @usage, @nb_layers, @nb_rules and
+		 * @fs_access_mask are then unused.
+		 */
+		struct work_struct work_free;
+		struct {
+			/**
+			 * @lock: Guards against concurrent modifications of
+			 * @root, if @usage is greater than zero.
+			 */
+			struct mutex lock;
+			/**
+			 * @usage: Number of processes (i.e. domains) or file
+			 * descriptors referencing this ruleset.
+			 */
+			refcount_t usage;
+			/**
+			 * @nb_rules: Number of non-overlapping (i.e. not for
+			 * the same object) rules in this ruleset.
+			 */
+			u32 nb_rules;
+			/**
+			 * @nb_layers: Number of layers which are used in this
+			 * ruleset.  This enables to check that all the layers
+			 * allow an access request.  A value of 0 identify a
+			 * non-merged ruleset (i.e. not a domain).
+			 */
+			u32 nb_layers;
+			/**
+			 * @fs_access_mask: Contains the subset of filesystem
+			 * actions which are restricted by a ruleset.  This is
+			 * used when merging rulesets and for user space
+			 * backward compatibility (i.e. future-proof).  Set
+			 * once and never changed for the lifetime of the
+			 * ruleset.
+			 */
+			u32 fs_access_mask;
+		};
+	};
+};
+
+struct landlock_ruleset *landlock_create_ruleset(const u32 fs_access_mask);
+
+void landlock_put_ruleset(struct landlock_ruleset *const ruleset);
+void landlock_put_ruleset_deferred(struct landlock_ruleset *const ruleset);
+
+int landlock_insert_rule(struct landlock_ruleset *const ruleset,
+		struct landlock_rule *const rule, const bool is_merge);
+
+struct landlock_ruleset *landlock_merge_ruleset(
+		struct landlock_ruleset *const parent,
+		struct landlock_ruleset *const ruleset);
+
+const struct landlock_rule *landlock_find_rule(
+		const struct landlock_ruleset *const ruleset,
+		const struct landlock_object *const object);
+
+static inline void landlock_get_ruleset(struct landlock_ruleset *const ruleset)
+{
+	if (ruleset)
+		refcount_inc(&ruleset->usage);
+}
+
+#endif /* _SECURITY_LANDLOCK_RULESET_H */
-- 
2.27.0


^ permalink raw reply related

* [PATCH v19 12/12] landlock: Add user and kernel documentation
From: Mickaël Salaün @ 2020-07-07 18:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski, Anton Ivanov,
	Arnd Bergmann, Casey Schaufler, James Morris, Jann Horn,
	Jeff Dike, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86
In-Reply-To: <20200707180955.53024-1-mic@digikod.net>

This documentation can be built with the Sphinx framework.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v15:
* Add current limitations.

Changes since v14:
* Fix spelling (contributed by Randy Dunlap).
* Extend documentation about inheritance and explain layer levels.
* Remove the use of now-removed access rights.
* Use GitHub links.
* Improve kernel documentation.
* Add section for tests.
* Update example.

Changes since v13:
* Rewrote the documentation according to the major revamp.

Previous changes:
https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/
---
 Documentation/security/index.rst           |   1 +
 Documentation/security/landlock/index.rst  |  18 ++
 Documentation/security/landlock/kernel.rst |  69 ++++++
 Documentation/security/landlock/user.rst   | 268 +++++++++++++++++++++
 4 files changed, 356 insertions(+)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst

diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 8129405eb2cc..e3f2bf4fef77 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -16,3 +16,4 @@ Security Documentation
    siphash
    tpm/index
    digsig
+   landlock/index
diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst
new file mode 100644
index 000000000000..2520f8f33f5e
--- /dev/null
+++ b/Documentation/security/landlock/index.rst
@@ -0,0 +1,18 @@
+=========================================
+Landlock LSM: unprivileged access control
+=========================================
+
+:Author: Mickaël Salaün
+
+The goal of Landlock is to enable to restrict ambient rights (e.g.  global
+filesystem access) for a set of processes.  Because Landlock is a stackable
+LSM, it makes possible to create safe security sandboxes as new security layers
+in addition to the existing system-wide access-controls. This kind of sandbox
+is expected to help mitigate the security impact of bugs or
+unexpected/malicious behaviors in user-space applications. Landlock empowers
+any process, including unprivileged ones, to securely restrict themselves.
+
+.. toctree::
+
+    user
+    kernel
diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
new file mode 100644
index 000000000000..f8e2b8f5c824
--- /dev/null
+++ b/Documentation/security/landlock/kernel.rst
@@ -0,0 +1,69 @@
+==============================
+Landlock: kernel documentation
+==============================
+
+Landlock's goal is to create scoped access-control (i.e. sandboxing).  To
+harden a whole system, this feature should be available to any process,
+including unprivileged ones.  Because such process may be compromised or
+backdoored (i.e. untrusted), Landlock's features must be safe to use from the
+kernel and other processes point of view.  Landlock's interface must therefore
+expose a minimal attack surface.
+
+Landlock is designed to be usable by unprivileged processes while following the
+system security policy enforced by other access control mechanisms (e.g. DAC,
+LSM).  Indeed, a Landlock rule shall not interfere with other access-controls
+enforced on the system, only add more restrictions.
+
+Any user can enforce Landlock rulesets on their processes.  They are merged and
+evaluated according to the inherited ones in a way that ensures that only more
+constraints can be added.
+
+Guiding principles for safe access controls
+===========================================
+
+* A Landlock rule shall be focused on access control on kernel objects instead
+  of syscall filtering (i.e. syscall arguments), which is the purpose of
+  seccomp-bpf.
+* To avoid multiple kinds of side-channel attacks (e.g. leak of security
+  policies, CPU-based attacks), Landlock rules shall not be able to
+  programmatically communicate with user space.
+* Kernel access check shall not slow down access request from unsandboxed
+  processes.
+* Computation related to Landlock operations (e.g. enforce a ruleset) shall
+  only impact the processes requesting them.
+
+Tests
+=====
+
+Userspace tests for backward compatibility, ptrace restrictions and filesystem
+support can be found here: `tools/testing/selftests/landlock/`_.
+
+Kernel structures
+=================
+
+Object
+------
+
+.. kernel-doc:: security/landlock/object.h
+    :identifiers:
+
+Ruleset and domain
+------------------
+
+A domain is a read-only ruleset tied to a set of subjects (i.e. tasks'
+credentials).  Each time a ruleset is enforced on a task, the current domain is
+duplicated and the ruleset is imported as a new layer of rules in the new
+domain.  Indeed, once in a domain, each rule is tied to a layer level.  To
+grant access to an object, at least one rule of each layer must allow the
+requested action on the object.  A task can then only transit to a new domain
+which is the intersection of the constraints from the current domain and those
+of a ruleset provided by the task.
+
+The definition of a subject is implicit for a task sandboxing itself, which
+makes the reasoning much easier and helps avoid pitfalls.
+
+.. kernel-doc:: security/landlock/ruleset.h
+    :identifiers:
+
+.. Links
+.. _tools/testing/selftests/landlock/: https://github.com/landlock-lsm/linux/tree/landlock-v19/tools/testing/selftests/landlock/
diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst
new file mode 100644
index 000000000000..5420f74d9158
--- /dev/null
+++ b/Documentation/security/landlock/user.rst
@@ -0,0 +1,268 @@
+=================================
+Landlock: userspace documentation
+=================================
+
+Landlock rules
+==============
+
+A Landlock rule enables to describe an action on an object.  An object is
+currently a file hierarchy, and the related filesystem actions are defined in
+`Access rights`_.  A set of rules is aggregated in a ruleset, which can then
+restrict the thread enforcing it, and its future children.
+
+Defining and enforcing a security policy
+----------------------------------------
+
+Before defining a security policy, an application should first probe for the
+features supported by the running kernel, which is important to be compatible
+with older kernels.  This can be done thanks to the `landlock` syscall (cf.
+:ref:`syscall`).
+
+.. code-block:: c
+
+    struct landlock_attr_features attr_features;
+
+    if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES,
+            sizeof(attr_features), &attr_features)) {
+        perror("Failed to probe the Landlock supported features");
+        return 1;
+    }
+
+Then, we need to create the ruleset that will contain our rules.  For this
+example, the ruleset will contain rules which only allow read actions, but
+write actions will be denied.  The ruleset then needs to handle both of these
+kind of actions.  To have a backward compatibility, these actions should be
+ANDed with the supported ones.
+
+.. code-block:: c
+
+    int ruleset_fd;
+    struct landlock_attr_ruleset ruleset = {
+        .handled_access_fs =
+            LANDLOCK_ACCESS_FS_EXECUTE |
+            LANDLOCK_ACCESS_FS_WRITE_FILE |
+            LANDLOCK_ACCESS_FS_READ_FILE |
+            LANDLOCK_ACCESS_FS_READ_DIR |
+            LANDLOCK_ACCESS_FS_REMOVE_DIR |
+            LANDLOCK_ACCESS_FS_REMOVE_FILE |
+            LANDLOCK_ACCESS_FS_MAKE_CHAR |
+            LANDLOCK_ACCESS_FS_MAKE_DIR |
+            LANDLOCK_ACCESS_FS_MAKE_REG |
+            LANDLOCK_ACCESS_FS_MAKE_SOCK |
+            LANDLOCK_ACCESS_FS_MAKE_FIFO |
+            LANDLOCK_ACCESS_FS_MAKE_BLOCK |
+            LANDLOCK_ACCESS_FS_MAKE_SYM,
+    };
+
+    ruleset.handled_access_fs &= attr_features.access_fs;
+    ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
+                    LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
+    if (ruleset_fd < 0) {
+        perror("Failed to create a ruleset");
+        return 1;
+    }
+
+We can now add a new rule to this ruleset thanks to the returned file
+descriptor referring to this ruleset.  The rule will only enable to read the
+file hierarchy ``/usr``.  Without another rule, write actions would then be
+denied by the ruleset.  To add ``/usr`` to the ruleset, we open it with the
+``O_PATH`` flag and fill the &struct landlock_attr_path_beneath with this file
+descriptor.
+
+.. code-block:: c
+
+    int err;
+    struct landlock_attr_path_beneath path_beneath = {
+        .ruleset_fd = ruleset_fd,
+        .allowed_access =
+            LANDLOCK_ACCESS_FS_EXECUTE |
+            LANDLOCK_ACCESS_FS_READ_FILE |
+            LANDLOCK_ACCESS_FS_READ_DIR,
+    };
+
+    path_beneath.allowed_access &= attr_features.access_fs;
+    path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
+    if (path_beneath.parent_fd < 0) {
+        perror("Failed to open file");
+        close(ruleset_fd);
+        return 1;
+    }
+    err = landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
+            sizeof(path_beneath), &path_beneath);
+    close(path_beneath.parent_fd);
+    if (err) {
+        perror("Failed to update ruleset");
+        close(ruleset_fd);
+        return 1;
+    }
+
+We now have a ruleset with one rule allowing read access to ``/usr`` while
+denying all accesses featured in ``attr_features.access_fs`` to everything else
+on the filesystem.  The next step is to restrict the current thread from
+gaining more privileges (e.g. thanks to a SUID binary).
+
+.. code-block:: c
+
+    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+        perror("Failed to restrict privileges");
+        close(ruleset_fd);
+        return 1;
+    }
+
+The current thread is now ready to sandbox itself with the ruleset.
+
+.. code-block:: c
+
+    struct landlock_attr_enforce attr_enforce = {
+        .ruleset_fd = ruleset_fd,
+    };
+
+    if (landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
+            sizeof(attr_enforce), &attr_enforce)) {
+        perror("Failed to enforce ruleset");
+        close(ruleset_fd);
+        return 1;
+    }
+    close(ruleset_fd);
+
+If the last `landlock` system call succeeds, the current thread is now
+restricted and this policy will be enforced on all its subsequently created
+children as well.  Once a thread is landlocked, there is no way to remove its
+security policy; only adding more restrictions is allowed.  These threads are
+now in a new Landlock domain, merge of their parent one (if any) with the new
+ruleset.
+
+Full working code can be found in `samples/landlock/sandboxer.c`_.
+
+Inheritance
+-----------
+
+Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain
+restrictions from its parent.  This is similar to the seccomp inheritance (cf.
+:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's
+:manpage:`credentials(7)`.  For instance, one process's thread may apply
+Landlock rules to itself, but they will not be automatically applied to other
+sibling threads (unlike POSIX thread credential changes, cf.
+:manpage:`nptl(7)`).
+
+When a thread sandbox itself, we have the grantee that the related security
+policy will stay enforced on all this thread's descendants.  This enables to
+create standalone and modular security policies per application, which will
+automatically be composed between themselves according to their runtime parent
+policies.
+
+Ptrace restrictions
+-------------------
+
+A sandboxed process has less privileges than a non-sandboxed process and must
+then be subject to additional restrictions when manipulating another process.
+To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target
+process, a sandboxed process should have a subset of the target process rules,
+which means the tracee must be in a sub-domain of the tracer.
+
+.. _syscall:
+
+The `landlock` syscall and its arguments
+========================================
+
+.. kernel-doc:: security/landlock/syscall.c
+    :identifiers: sys_landlock
+
+Commands
+--------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :identifiers: landlock_cmd
+
+Options
+-------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :identifiers: options_intro
+                  options_get_features options_create_ruleset
+                  options_add_rule options_enforce_ruleset
+
+Attributes
+----------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :identifiers: landlock_attr_features landlock_attr_ruleset
+                  landlock_attr_path_beneath landlock_attr_enforce
+
+Access rights
+-------------
+
+.. kernel-doc:: include/uapi/linux/landlock.h
+    :identifiers: fs_access
+
+Current limitations
+===================
+
+File renaming and linking
+-------------------------
+
+Because Landlock targets unprivileged access controls, it is needed to properly
+handle composition of rules.  Such property also implies rules nesting.
+Properly handling multiple layers of ruleset, each one of them able to restrict
+access to files, also imply to inherit the ruleset restrictions from a parent
+to its hierarchy.  Because files are identified and restricted by their
+hierarchy, moving or linking a file from one directory to another imply to
+propagate the hierarchy constraints.  To protect against privilege escalations
+through renaming or linking, and for the sack of simplicity, Landlock currently
+limits linking and renaming to the same directory.  Future Landlock evolutions
+will enable more flexibility for renaming and linking, with dedicated ruleset
+options.
+
+OverlayFS
+---------
+
+An OverlayFS mount point consists of upper and lower layers.  It is currently
+not possible to reliably infer which underlying file hierarchy matches an
+OverlayFS path composed of such layers.  It is then not currently possible to
+track the source of an indirect access-request, and then not possible to
+properly identify and allow an unified OverlayFS hierarchy.  Restricting files
+in an OverlayFS mount point works, but files allowed in one layer may not be
+allowed in a related OverlayFS mount point.  A future Landlock evolution will
+make possible to properly work with OverlayFS, according to a dedicated ruleset
+option.
+
+
+Special filesystems
+-------------------
+
+Access to regular files and directories can be restricted by Landlock,
+according to the handled accesses of a ruleset.  However, files which do not
+come from a user-visible filesystem (e.g. pipe, socket), but can still be
+accessed through /proc/self/fd/, cannot currently be restricted.  Likewise,
+some special kernel filesystems such as nsfs which can be accessed through
+/proc/self/ns/, cannot currently be restricted.  For now, these kind of special
+paths are then always allowed.  Future Landlock evolutions will enable to
+restrict such paths, with dedicated ruleset options.
+
+Questions and answers
+=====================
+
+What about user space sandbox managers?
+---------------------------------------
+
+Using user space process to enforce restrictions on kernel resources can lead
+to race conditions or inconsistent evaluations (i.e. `Incorrect mirroring of
+the OS code and state
+<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_).
+
+What about namespaces and containers?
+-------------------------------------
+
+Namespaces can help create sandboxes but they are not designed for
+access-control and then miss useful features for such use case (e.g. no
+fine-grained restrictions).  Moreover, their complexity can lead to security
+issues, especially when untrusted processes can manipulate them (cf.
+`Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_).
+
+Additional documentation
+========================
+
+See https://landlock.io
+
+.. Links
+.. _samples/landlock/sandboxer.c: https://github.com/landlock-lsm/linux/tree/landlock-v19/samples/landlock/sandboxer.c
-- 
2.27.0


^ permalink raw reply related

* [PATCH v19 09/12] arch: Wire up landlock() syscall
From: Mickaël Salaün @ 2020-07-07 18:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mickaël Salaün, Al Viro, Andy Lutomirski, Anton Ivanov,
	Arnd Bergmann, Casey Schaufler, James Morris, Jann Horn,
	Jeff Dike, Jonathan Corbet, Kees Cook, Michael Kerrisk,
	Mickaël Salaün, Richard Weinberger, Serge E . Hallyn,
	Shuah Khan, Vincent Dagonneau, kernel-hardening, linux-api,
	linux-arch, linux-doc, linux-fsdevel, linux-kselftest,
	linux-security-module, x86
In-Reply-To: <20200707180955.53024-1-mic@digikod.net>

Wire up the landlock() system call for all architectures.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v18:
* Increase the syscall number because of the new faccessat2(2).

Changes since v14:
* Add all architectures.

Changes since v13:
* New implementation.
---
 arch/alpha/kernel/syscalls/syscall.tbl      | 1 +
 arch/arm/tools/syscall.tbl                  | 1 +
 arch/arm64/include/asm/unistd.h             | 2 +-
 arch/arm64/include/asm/unistd32.h           | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl       | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl       | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl     | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    | 1 +
 arch/s390/kernel/syscalls/syscall.tbl       | 1 +
 arch/sh/kernel/syscalls/syscall.tbl         | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl      | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl      | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     | 1 +
 include/uapi/asm-generic/unistd.h           | 4 +++-
 19 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 5ddd128d4b7a..f11e690e0419 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -478,3 +478,4 @@
 547	common	openat2				sys_openat2
 548	common	pidfd_getfd			sys_pidfd_getfd
 549	common	faccessat2			sys_faccessat2
+550	common	landlock			sys_landlock
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index d5cae5ffede0..1ebac2411785 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -452,3 +452,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 3b859596840d..b3b2019f8d16 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		440
+#define __NR_compat_syscalls		441
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6d95d0c8bf2f..083f7994d924 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -885,6 +885,8 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_landlock 440
+__SYSCALL(__NR_landlock, sys_landlock)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 49e325b604b3..210692084aff 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -359,3 +359,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index f71b1bbcc198..31ed52871053 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -438,3 +438,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index edacc4561f2b..56a1d49884a5 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -444,3 +444,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f777141f5256..37457511334a 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -377,3 +377,4 @@
 437	n32	openat2				sys_openat2
 438	n32	pidfd_getfd			sys_pidfd_getfd
 439	n32	faccessat2			sys_faccessat2
+440	n32	landlock			sys_landlock
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index da8c76394e17..0c68b01f47d2 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -353,3 +353,4 @@
 437	n64	openat2				sys_openat2
 438	n64	pidfd_getfd			sys_pidfd_getfd
 439	n64	faccessat2			sys_faccessat2
+440	n64	landlock			sys_landlock
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 13280625d312..7e4ba2c41f8c 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -426,3 +426,4 @@
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
 439	o32	faccessat2			sys_faccessat2
+440	o32	landlock			sys_landlock
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 5a758fa6ec52..40d14e1480cb 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index f833a3190822..1615462665e1 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -528,3 +528,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index bfdcb7633957..5f0835738e90 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -441,3 +441,4 @@
 437  common	openat2			sys_openat2			sys_openat2
 438  common	pidfd_getfd		sys_pidfd_getfd			sys_pidfd_getfd
 439  common	faccessat2		sys_faccessat2			sys_faccessat2
+440  common	landlock		sys_landlock			sys_landlock
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index acc35daa1b79..6e7414763f10 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -441,3 +441,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index 8004a276cb74..b9996e1f70f0 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -484,3 +484,4 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d8f8a1a69ed1..1673c287f2e6 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -443,3 +443,4 @@
 437	i386	openat2			sys_openat2
 438	i386	pidfd_getfd		sys_pidfd_getfd
 439	i386	faccessat2		sys_faccessat2
+440	i386	landlock		sys_landlock
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78847b32e137..b3bae66a2c7c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -360,6 +360,7 @@
 437	common	openat2			sys_openat2
 438	common	pidfd_getfd		sys_pidfd_getfd
 439	common	faccessat2		sys_faccessat2
+440	common	landlock		sys_landlock
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 69d0d73876b3..ea0f0b186e8e 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -409,3 +409,4 @@
 437	common	openat2				sys_openat2
 438	common	pidfd_getfd			sys_pidfd_getfd
 439	common	faccessat2			sys_faccessat2
+440	common	landlock			sys_landlock
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index f4a01305d9a6..a63a411a74d5 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -857,9 +857,11 @@ __SYSCALL(__NR_openat2, sys_openat2)
 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
 #define __NR_faccessat2 439
 __SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_landlock 440
+__SYSCALL(__NR_landlock, sys_landlock)
 
 #undef __NR_syscalls
-#define __NR_syscalls 440
+#define __NR_syscalls 441
 
 /*
  * 32 bit systems traditionally used different
-- 
2.27.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox