Re: [PATCH 0/8] loopfs

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH 0/8] loopfs
       [not found] <20200408152151.5780-1-christian.brauner@ubuntu.com>
@ 2020-04-08 16:24 ` Jann Horn
  2020-04-08 16:41   ` Stéphane Graber
  0 siblings, 1 reply; 3+ messages in thread
From: Jann Horn @ 2020-04-08 16:24 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jens Axboe, Greg Kroah-Hartman, kernel list, linux-block,
	Linux API, Jonathan Corbet, Serge Hallyn, Rafael J. Wysocki,
	Tejun Heo, David S. Miller, Saravana Kannan, Jan Kara,
	David Howells, Seth Forshee, David Rheinsberg, Tom Gundersen,
	Christian Kellner, Dmitry Vyukov, Stéphane Graber, linux-doc,
	Network Development, Matthew Garrett, linux-fsdevel

On Wed, Apr 8, 2020 at 5:23 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
> One of the use-cases for loopfs is to allow to dynamically allocate loop
> devices in sandboxed workloads without exposing /dev or
> /dev/loop-control to the workload in question and without having to
> implement a complex and also racy protocol to send around file
> descriptors for loop devices. With loopfs each mount is a new instance,
> i.e. loop devices created in one loopfs instance are independent of any
> loop devices created in another loopfs instance. This allows
> sufficiently privileged tools to have their own private stash of loop
> device instances. Dmitry has expressed his desire to use this for
> syzkaller in a private discussion. And various parties that want to use
> it are Cced here too.
>
> In addition, the loopfs filesystem can be mounted by user namespace root
> and is thus suitable for use in containers. Combined with syscall
> interception this makes it possible to securely delegate mounting of
> images on loop devices, i.e. when a user calls mount -o loop <image>
> <mountpoint> it will be possible to completely setup the loop device.
> The final mount syscall to actually perform the mount will be handled
> through syscall interception and be performed by a sufficiently
> privileged process. Syscall interception is already supported through a
> new seccomp feature we implemented in [1] and extended in [2] and is
> actively used in production workloads. The additional loopfs work will
> be used there and in various other workloads too. You'll find a short
> illustration how this works with syscall interception below in [4].

Would that privileged process then allow you to mount your filesystem
images with things like ext4? As far as I know, the filesystem
maintainers don't generally consider "untrusted filesystem image" to
be a strongly enforced security boundary; and worse, if an attacker
has access to a loop device from which something like ext4 is mounted,
things like "struct ext4_dir_entry_2" will effectively be in shared
memory, and an attacker can trivially bypass e.g.
ext4_check_dir_entry(). At the moment, that's not a huge problem (for
anything other than kernel lockdown) because only root normally has
access to loop devices.

Ubuntu carries an out-of-tree patch that afaik blocks the shared
memory thing: <https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/eoan/commit?id=4bc428fdf5500b7366313f166b7c9c50ee43f2c4>

But even with that patch, I'm not super excited about exposing
filesystem image parsing attack surface to containers unless you run
the filesystem in a sandboxed environment (at which point you don't
need a loop device anymore either).

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 0/8] loopfs
  2020-04-08 16:24 ` [PATCH 0/8] loopfs Jann Horn
@ 2020-04-08 16:41   ` Stéphane Graber
  2020-04-09  7:02     ` Dmitry Vyukov
  0 siblings, 1 reply; 3+ messages in thread
From: Stéphane Graber @ 2020-04-08 16:41 UTC (permalink / raw)
  To: Jann Horn
  Cc: Christian Brauner, Jens Axboe, Greg Kroah-Hartman, kernel list,
	linux-block, Linux API, Jonathan Corbet, Serge Hallyn,
	Rafael J. Wysocki, Tejun Heo, David S. Miller, Saravana Kannan,
	Jan Kara, David Howells, Seth Forshee, David Rheinsberg,
	Tom Gundersen, Christian Kellner, Dmitry Vyukov, linux-doc,
	Network Development, Matthew Garrett, linux-fsdevel

On Wed, Apr 8, 2020 at 12:24 PM Jann Horn <jannh@google.com> wrote:
>
> On Wed, Apr 8, 2020 at 5:23 PM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> > One of the use-cases for loopfs is to allow to dynamically allocate loop
> > devices in sandboxed workloads without exposing /dev or
> > /dev/loop-control to the workload in question and without having to
> > implement a complex and also racy protocol to send around file
> > descriptors for loop devices. With loopfs each mount is a new instance,
> > i.e. loop devices created in one loopfs instance are independent of any
> > loop devices created in another loopfs instance. This allows
> > sufficiently privileged tools to have their own private stash of loop
> > device instances. Dmitry has expressed his desire to use this for
> > syzkaller in a private discussion. And various parties that want to use
> > it are Cced here too.
> >
> > In addition, the loopfs filesystem can be mounted by user namespace root
> > and is thus suitable for use in containers. Combined with syscall
> > interception this makes it possible to securely delegate mounting of
> > images on loop devices, i.e. when a user calls mount -o loop <image>
> > <mountpoint> it will be possible to completely setup the loop device.
> > The final mount syscall to actually perform the mount will be handled
> > through syscall interception and be performed by a sufficiently
> > privileged process. Syscall interception is already supported through a
> > new seccomp feature we implemented in [1] and extended in [2] and is
> > actively used in production workloads. The additional loopfs work will
> > be used there and in various other workloads too. You'll find a short
> > illustration how this works with syscall interception below in [4].
>
> Would that privileged process then allow you to mount your filesystem
> images with things like ext4? As far as I know, the filesystem
> maintainers don't generally consider "untrusted filesystem image" to
> be a strongly enforced security boundary; and worse, if an attacker
> has access to a loop device from which something like ext4 is mounted,
> things like "struct ext4_dir_entry_2" will effectively be in shared
> memory, and an attacker can trivially bypass e.g.
> ext4_check_dir_entry(). At the moment, that's not a huge problem (for
> anything other than kernel lockdown) because only root normally has
> access to loop devices.
>
> Ubuntu carries an out-of-tree patch that afaik blocks the shared
> memory thing: <https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/eoan/commit?id=4bc428fdf5500b7366313f166b7c9c50ee43f2c4>
>
> But even with that patch, I'm not super excited about exposing
> filesystem image parsing attack surface to containers unless you run
> the filesystem in a sandboxed environment (at which point you don't
> need a loop device anymore either).

So in general we certainly agree that you should never expose someone
that you wouldn't trust with root on the host to syscall interception
mounting of real kernel filesystems.

But that's not all that our syscall interception logic can do. We have
support for rewriting a normal filesystem mount attempt to instead use
an available FUSE implementation. As far as the user is concerned,
they ran "mount /dev/sdaX /mnt" and got that ext4 filesystem mounted
on /mnt as requested, except that the container manager intercepted
the mount attempt and instead spawned fuse2fs for that mount. This
requires absolutely no change to the software the user is running.

loopfs, with that interception mode, will let us also handle all cases
where a loop would be used, similarly without needing any change to
the software being run. If a piece of software calls the command
"mount -o loop blah.img /mnt", the "mount" command will setup a loop
device as it normally would (doing so through loopfs) and then will
call the "mount" syscall, which will get intercepted and redirected to
a FUSE implementation if so configured, resulting in the expected
filesystem being mounted for the user.

LXD with syscall interception offers both straight up privileged
mounting using the kernel fs or using a FUSE based implementation.
This is configurable on a per-filesystem and per-container basis.

I hope that clarifies what we're doing here :)

Stéphane

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 0/8] loopfs
  2020-04-08 16:41   ` Stéphane Graber
@ 2020-04-09  7:02     ` Dmitry Vyukov
  0 siblings, 0 replies; 3+ messages in thread
From: Dmitry Vyukov @ 2020-04-09  7:02 UTC (permalink / raw)
  To: Stéphane Graber
  Cc: Jann Horn, Christian Brauner, Jens Axboe, Greg Kroah-Hartman,
	kernel list, linux-block, Linux API, Jonathan Corbet,
	Serge Hallyn, Rafael J. Wysocki, Tejun Heo, David S. Miller,
	Saravana Kannan, Jan Kara, David Howells, Seth Forshee,
	David Rheinsberg, Tom Gundersen, Christian Kellner,
	open list:DOCUMENTATION, Network Development, Matthew Garrett,
	linux-fsdevel, syzkaller

On Wed, Apr 8, 2020 at 6:41 PM Stéphane Graber <stgraber@ubuntu.com> wrote:
>
> On Wed, Apr 8, 2020 at 12:24 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Wed, Apr 8, 2020 at 5:23 PM Christian Brauner
> > <christian.brauner@ubuntu.com> wrote:
> > > One of the use-cases for loopfs is to allow to dynamically allocate loop
> > > devices in sandboxed workloads without exposing /dev or
> > > /dev/loop-control to the workload in question and without having to
> > > implement a complex and also racy protocol to send around file
> > > descriptors for loop devices. With loopfs each mount is a new instance,
> > > i.e. loop devices created in one loopfs instance are independent of any
> > > loop devices created in another loopfs instance. This allows
> > > sufficiently privileged tools to have their own private stash of loop
> > > device instances. Dmitry has expressed his desire to use this for
> > > syzkaller in a private discussion. And various parties that want to use
> > > it are Cced here too.
> > >
> > > In addition, the loopfs filesystem can be mounted by user namespace root
> > > and is thus suitable for use in containers. Combined with syscall
> > > interception this makes it possible to securely delegate mounting of
> > > images on loop devices, i.e. when a user calls mount -o loop <image>
> > > <mountpoint> it will be possible to completely setup the loop device.
> > > The final mount syscall to actually perform the mount will be handled
> > > through syscall interception and be performed by a sufficiently
> > > privileged process. Syscall interception is already supported through a
> > > new seccomp feature we implemented in [1] and extended in [2] and is
> > > actively used in production workloads. The additional loopfs work will
> > > be used there and in various other workloads too. You'll find a short
> > > illustration how this works with syscall interception below in [4].
> >
> > Would that privileged process then allow you to mount your filesystem
> > images with things like ext4? As far as I know, the filesystem
> > maintainers don't generally consider "untrusted filesystem image" to
> > be a strongly enforced security boundary; and worse, if an attacker
> > has access to a loop device from which something like ext4 is mounted,
> > things like "struct ext4_dir_entry_2" will effectively be in shared
> > memory, and an attacker can trivially bypass e.g.
> > ext4_check_dir_entry(). At the moment, that's not a huge problem (for
> > anything other than kernel lockdown) because only root normally has
> > access to loop devices.
> >
> > Ubuntu carries an out-of-tree patch that afaik blocks the shared
> > memory thing: <https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/eoan/commit?id=4bc428fdf5500b7366313f166b7c9c50ee43f2c4>
> >
> > But even with that patch, I'm not super excited about exposing
> > filesystem image parsing attack surface to containers unless you run
> > the filesystem in a sandboxed environment (at which point you don't
> > need a loop device anymore either).
>
> So in general we certainly agree that you should never expose someone
> that you wouldn't trust with root on the host to syscall interception
> mounting of real kernel filesystems.
>
> But that's not all that our syscall interception logic can do. We have
> support for rewriting a normal filesystem mount attempt to instead use
> an available FUSE implementation. As far as the user is concerned,
> they ran "mount /dev/sdaX /mnt" and got that ext4 filesystem mounted
> on /mnt as requested, except that the container manager intercepted
> the mount attempt and instead spawned fuse2fs for that mount. This
> requires absolutely no change to the software the user is running.
>
> loopfs, with that interception mode, will let us also handle all cases
> where a loop would be used, similarly without needing any change to
> the software being run. If a piece of software calls the command
> "mount -o loop blah.img /mnt", the "mount" command will setup a loop
> device as it normally would (doing so through loopfs) and then will
> call the "mount" syscall, which will get intercepted and redirected to
> a FUSE implementation if so configured, resulting in the expected
> filesystem being mounted for the user.
>
> LXD with syscall interception offers both straight up privileged
> mounting using the kernel fs or using a FUSE based implementation.
> This is configurable on a per-filesystem and per-container basis.
>
> I hope that clarifies what we're doing here :)
>
> Stéphane


Hi Christian,

Our use case for loopfs in syzkaller would be isolation of several
test processes from each other.
Currently all loop devices and loop-control are global and cause test
processes to collide, which in turn causes non-reproducible coverage
and non-reproducible crashes. Ideally we give each test process its
own loopfs instance.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-09  7:03 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20200408152151.5780-1-christian.brauner@ubuntu.com>
2020-04-08 16:24 ` [PATCH 0/8] loopfs Jann Horn
2020-04-08 16:41   ` Stéphane Graber
2020-04-09  7:02     ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).