Re: [PATCH 0/8] loopfs - Dmitry Vyukov

linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Dmitry Vyukov <dvyukov@google.com>
To: "Stéphane Graber" <stgraber@ubuntu.com>
Cc: Jann Horn <jannh@google.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Jens Axboe <axboe@kernel.dk>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Jonathan Corbet <corbet@lwn.net>, Serge Hallyn <serge@hallyn.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Tejun Heo <tj@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Saravana Kannan <saravanak@google.com>, Jan Kara <jack@suse.cz>,
	David Howells <dhowells@redhat.com>,
	Seth Forshee <seth.forshee@canonical.com>,
	David Rheinsberg <david.rheinsberg@gmail.com>,
	Tom Gundersen <teg@jklm.no>,
	Christian Kellner <ckellner@redhat.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	Matthew Garrett <mjg59@google.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	syzkaller <syzkaller@googlegroups.com>
Subject: Re: [PATCH 0/8] loopfs
Date: Thu, 9 Apr 2020 09:02:54 +0200	[thread overview]
Message-ID: <CACT4Y+aDeSAARG0b9FjDFyWuhjb=YVxpGtsvBmoKnHo+0TF4gA@mail.gmail.com> (raw)
In-Reply-To: <CA+enf=uhTi1yWtOe+iuv2FvdZzo69pwsP-NNU2775jN01aDcVQ@mail.gmail.com>

On Wed, Apr 8, 2020 at 6:41 PM Stéphane Graber <stgraber@ubuntu.com> wrote:
>
> On Wed, Apr 8, 2020 at 12:24 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Wed, Apr 8, 2020 at 5:23 PM Christian Brauner
> > <christian.brauner@ubuntu.com> wrote:
> > > One of the use-cases for loopfs is to allow to dynamically allocate loop
> > > devices in sandboxed workloads without exposing /dev or
> > > /dev/loop-control to the workload in question and without having to
> > > implement a complex and also racy protocol to send around file
> > > descriptors for loop devices. With loopfs each mount is a new instance,
> > > i.e. loop devices created in one loopfs instance are independent of any
> > > loop devices created in another loopfs instance. This allows
> > > sufficiently privileged tools to have their own private stash of loop
> > > device instances. Dmitry has expressed his desire to use this for
> > > syzkaller in a private discussion. And various parties that want to use
> > > it are Cced here too.
> > >
> > > In addition, the loopfs filesystem can be mounted by user namespace root
> > > and is thus suitable for use in containers. Combined with syscall
> > > interception this makes it possible to securely delegate mounting of
> > > images on loop devices, i.e. when a user calls mount -o loop <image>
> > > <mountpoint> it will be possible to completely setup the loop device.
> > > The final mount syscall to actually perform the mount will be handled
> > > through syscall interception and be performed by a sufficiently
> > > privileged process. Syscall interception is already supported through a
> > > new seccomp feature we implemented in [1] and extended in [2] and is
> > > actively used in production workloads. The additional loopfs work will
> > > be used there and in various other workloads too. You'll find a short
> > > illustration how this works with syscall interception below in [4].
> >
> > Would that privileged process then allow you to mount your filesystem
> > images with things like ext4? As far as I know, the filesystem
> > maintainers don't generally consider "untrusted filesystem image" to
> > be a strongly enforced security boundary; and worse, if an attacker
> > has access to a loop device from which something like ext4 is mounted,
> > things like "struct ext4_dir_entry_2" will effectively be in shared
> > memory, and an attacker can trivially bypass e.g.
> > ext4_check_dir_entry(). At the moment, that's not a huge problem (for
> > anything other than kernel lockdown) because only root normally has
> > access to loop devices.
> >
> > Ubuntu carries an out-of-tree patch that afaik blocks the shared
> > memory thing: <https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/eoan/commit?id=4bc428fdf5500b7366313f166b7c9c50ee43f2c4>
> >
> > But even with that patch, I'm not super excited about exposing
> > filesystem image parsing attack surface to containers unless you run
> > the filesystem in a sandboxed environment (at which point you don't
> > need a loop device anymore either).
>
> So in general we certainly agree that you should never expose someone
> that you wouldn't trust with root on the host to syscall interception
> mounting of real kernel filesystems.
>
> But that's not all that our syscall interception logic can do. We have
> support for rewriting a normal filesystem mount attempt to instead use
> an available FUSE implementation. As far as the user is concerned,
> they ran "mount /dev/sdaX /mnt" and got that ext4 filesystem mounted
> on /mnt as requested, except that the container manager intercepted
> the mount attempt and instead spawned fuse2fs for that mount. This
> requires absolutely no change to the software the user is running.
>
> loopfs, with that interception mode, will let us also handle all cases
> where a loop would be used, similarly without needing any change to
> the software being run. If a piece of software calls the command
> "mount -o loop blah.img /mnt", the "mount" command will setup a loop
> device as it normally would (doing so through loopfs) and then will
> call the "mount" syscall, which will get intercepted and redirected to
> a FUSE implementation if so configured, resulting in the expected
> filesystem being mounted for the user.
>
> LXD with syscall interception offers both straight up privileged
> mounting using the kernel fs or using a FUSE based implementation.
> This is configurable on a per-filesystem and per-container basis.
>
> I hope that clarifies what we're doing here :)
>
> Stéphane


Hi Christian,

Our use case for loopfs in syzkaller would be isolation of several
test processes from each other.
Currently all loop devices and loop-control are global and cause test
processes to collide, which in turn causes non-reproducible coverage
and non-reproducible crashes. Ideally we give each test process its
own loopfs instance.

     prev parent reply	other threads:[~2020-04-09  7:03 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-08 15:21 [PATCH 0/8] loopfs Christian Brauner
2020-04-08 15:21 ` [PATCH 1/8] kobject_uevent: remove unneeded netlink_ns check Christian Brauner
2020-04-08 15:21 ` [PATCH 2/8] loopfs: implement loopfs Christian Brauner
2020-04-09  5:39   ` David Rheinsberg
2020-04-09  8:26     ` Christian Brauner
2020-04-12 10:38       ` David Rheinsberg
2020-04-12 12:03         ` Christian Brauner
2020-04-12 13:04           ` Christian Brauner
2020-04-12 13:44           ` David Rheinsberg
2020-04-09  7:53   ` Christoph Hellwig
2020-04-09  8:33     ` Christian Brauner
2020-04-08 15:21 ` [PATCH 3/8] loop: use ns_capable for some loop operations Christian Brauner
2020-04-08 15:21 ` [PATCH 4/8] kernfs: handle multiple namespace tags Christian Brauner
2020-04-13 18:46   ` Tejun Heo
2020-04-08 15:21 ` [PATCH 5/8] kernfs: let objects opt-in to propagating from the initial namespace Christian Brauner
2020-04-13 19:02   ` Tejun Heo
2020-04-13 19:39     ` Christian Brauner
2020-04-13 19:45       ` Tejun Heo
2020-04-13 19:59         ` Christian Brauner
2020-04-13 20:37           ` Tejun Heo
2020-04-14 10:39             ` Christian Brauner
2020-04-08 15:21 ` [PATCH 6/8] genhd: add minimal namespace infrastructure Christian Brauner
2020-04-13 19:04   ` Tejun Heo
2020-04-13 19:42     ` Christian Brauner
2020-04-08 15:21 ` [PATCH 7/8] loopfs: start attaching correct namespace during loop_add() Christian Brauner
2020-04-08 15:21 ` [PATCH 8/8] loopfs: only show devices in their correct instance Christian Brauner
2020-04-08 16:24 ` [PATCH 0/8] loopfs Jann Horn
2020-04-08 16:41   ` Stéphane Graber
2020-04-09  7:02     ` Dmitry Vyukov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACT4Y+aDeSAARG0b9FjDFyWuhjb=YVxpGtsvBmoKnHo+0TF4gA@mail.gmail.com' \
    --to=dvyukov@google.com \
    --cc=axboe@kernel.dk \
    --cc=christian.brauner@ubuntu.com \
    --cc=ckellner@redhat.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=david.rheinsberg@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjg59@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=saravanak@google.com \
    --cc=serge@hallyn.com \
    --cc=seth.forshee@canonical.com \
    --cc=stgraber@ubuntu.com \
    --cc=syzkaller@googlegroups.com \
    --cc=teg@jklm.no \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).