linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jan Kara <jack@suse.cz>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Seth Forshee <seth.forshee@canonical.com>,
	linux-fsdevel@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Djalal Harouni <tixxdz@gmail.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Andy Lutomirski <luto@amacapital.net>, Jann Horn <jann@thejh.net>,
	Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: [PATCH review 08/12] quota: Ensure qids map to the filesystem
Date: Wed, 13 Jul 2016 11:34:36 +1000	[thread overview]
Message-ID: <20160713013436.GM1922@dastard> (raw)
In-Reply-To: <878tx8dowu.fsf@x220.int.ebiederm.org>

On Mon, Jul 11, 2016 at 01:12:49PM -0500, Eric W. Biederman wrote:
> The place where I am concerned about thorough review and testing is
> someone poisoning quota files and then the kernel trying to use them.
> In the preliminary work we have done in other places in the kernel and
> for other filesystems there almost always winds up being some way to
> confuse the kernel and get it to misbave if you can poison the disk
> based inputs.  As poison disk based inputs is not something filesystems
> are stronlgy concerned about.  In most cases the disk the filesystem
> resides on is in the box and therefore under control of the OS at all
> times.  Dave Chinner has even said he will never consider handling
> poisoned disk based inputs for XFS as the run time cost is too high.

I didn't say that. I said that comprehensive checks to catch all
possible malicious inputs is too expensive to consider a viable
solution for allowing user-mounts of arbitrary filesystem images
through the kernel.

We already have runtime validation that bounds check most on-disk
fields when they are read - that deals with fuzz based poisoning
testing quite well and provides some protection against directed
structural attacks as well. IOWs, we already handle a large scope of
poisoned inputs safely, but it's not comprehensive because we can't
easily determine cross-object reference validity within the
format-determined limits.

e.g. we can check that the number of records in a btree block is
within the valid bounds of a block, but we cannot determine that the
record count has been incremented by 1 and a bogus record has been
inserted somewhere inside the block and the CRC recalculated to
match the modification. We can also check the records themselves for
being within bound (e.g. we can check a freespace record points to
block within range and of valid length) but we can't check that the
extent is actually free space. That requires doing a full filesystem
traversal to determine if the extent is actually free space or not.

Of course, we could look up the rmap tree (if we have one) to
determine if the space is actually used or not, but an attacker can
also insert/remove records in that tree, too, so if we can't trust
the free space tree, we can't trust the rmap tree either.
Hence we have to fall back to brute force validation if we
want to be certain that the metadata has not been tampered with.

To bring this back to quota files, the only way to validate that a
quota file has not been tampered with is to run a quotacheck on the
filesystem once it has been mounted. This requires visiting every
inode in the filesystem, so it an expensive operation. Only XFS has
this functionality in kernel, so for untrusted mounts we could
simply run it on every mount that has quotas enabled. Of course,
users won't care that mounting their filesystem now takes several
minutes (hours, even, when we have millions of inodes in the fs)
while these checks are run...

Detecting malicious corruptions that specifically manipulate the
on-disk structure within the bounds of format validity are difficult
to detect and costly to protect against. We'd need to move large
parts of fsck into the kernel and run it to validate every piece of
metadata read into the kernel. Then we've got a much larger attack
surface in the kernel (all the validity checking code needs to be
robust against invalid structures, too!), a lot more complexity
(more bugs!) and a lot of additional runtime overhead (slow
filesystem = unhappy users!). It's just not a practical solution to
the problem.

> Between actually finding issues that can cause problems, and the
> increased amount of kernel code executed (and thus the increase in
> kernel attack surface) I am very paranoid about enabling code that
> trusts data that could be poisoned data from a hostile party.
>
> At the same time I am very uncomfortable with the fact the kernel does
> not protect against malicious disks and poisoned disk images.  As
> poisoned disk images are a well known exploit vector in the wild.  A
> well known and demonstrated attack vector that works is to leave a usb
> stick in a public place, and helpful people will place it into their
> computer to try and figure out who it belongs to.  In trying to be
> helpful their computer will unbeknownst to them start executing code
> that does not serve the interests of the computer owner.  I hate that we
> can not currently protect people from shenanigans like that.

Yes, we know all about these problems. Unfortunately someone
appears to not be listening when they being repeatedly told that
hardening all the kernel filesystem implementations against poisoned
images is simply not a viable solution to the problem.

Move the parsing of untrusted structures out of the kernel - work
with the various filesystem teams to build viable FUSE
implementations (where it's much easier to incorporate parts of the
userspace fsck code) and provide the FUSE filesystems to container
users wanting to mount their own filesystem images.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2016-07-13  1:34 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-02 17:18 [PATCH review 0/11] General unprivileged mount support Eric W. Biederman
2016-07-02 17:20 ` [PATCH review 01/11] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 02/11] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 03/11] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 04/11] fs: Check for invalid i_uid in may_follow_link() Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 05/11] cred: Reject inodes with invalid ids in set_create_file_as() Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 06/11] vfs: Don't modify inodes with a uid or gid unknown to the vfs Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 07/11] vfs: Don't create " Eric W. Biederman
2016-07-04  7:59     ` Jan Kara
2016-07-05 14:55       ` Eric W. Biederman
2016-07-06  9:07         ` Jan Kara
2016-07-06 15:37           ` Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 08/11] quota: Ensure qids map to the filesystem Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 09/11] quota: Handle quota data stored in s_user_ns Eric W. Biederman
2016-07-02 17:33     ` [PATCH v2 " Eric W. Biederman
2016-07-04  9:11       ` Jan Kara
2016-07-05 14:48         ` Seth Forshee
2016-07-05 15:34         ` Eric W. Biederman
2016-07-05 20:57           ` Dave Chinner
2016-07-05 21:28             ` Eric W. Biederman
2016-07-06  6:35               ` Dave Chinner
2016-07-06  8:25                 ` Jan Kara
2016-07-06 17:51                   ` Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 10/11] evm: Translate user/group ids relative to s_user_ns when computing HMAC Eric W. Biederman
2016-07-02 17:20   ` [PATCH review 11/11] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns Eric W. Biederman
2016-07-04  8:52 ` [PATCH review 0/11] General unprivileged mount support Jan Kara
2016-07-04 16:27   ` Eric W. Biederman
2016-07-06  8:54     ` Jan Kara
2016-07-06 13:54       ` Seth Forshee
2016-07-06 14:22         ` Jan Kara
2016-07-06 14:46           ` Seth Forshee
2016-07-06 15:01           ` Eric W. Biederman
2016-07-06 15:23           ` James Bottomley
2016-07-06 16:35             ` Eric W. Biederman
2016-07-06 13:44 ` Andy Lutomirski
2016-07-06 15:21   ` Eric W. Biederman
2016-07-06 14:01 ` Andy Lutomirski
2016-07-06 15:19   ` Eric W. Biederman
2016-07-06 18:10 ` [PATCH review 0/12] General unprivileged mount support v2 Eric W. Biederman
2016-07-06 18:12   ` [PATCH review 01/12] fs: Refuse uid/gid changes which don't map into s_user_ns Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 02/12] userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 03/12] vfs: Verify acls are valid within superblock's s_user_ns Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 04/12] fs: Check for invalid i_uid in may_follow_link() Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 05/12] cred: Reject inodes with invalid ids in set_create_file_as() Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 06/12] vfs: Don't modify inodes with a uid or gid unknown to the vfs Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 07/12] vfs: Don't create " Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 08/12] quota: Ensure qids map to the filesystem Eric W. Biederman
2016-07-11 10:14       ` Jan Kara
2016-07-11 18:12         ` Eric W. Biederman
2016-07-13  1:34           ` Dave Chinner [this message]
2016-07-13  3:45             ` Dave Chinner
2016-07-13  5:43             ` Jann Horn
2016-07-14 17:03               ` Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 09/12] quota: Handle quota data stored in s_user_ns in quota_setxquota Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 10/12] dquot: For now explicitly don't support filesystems outside of init_user_ns Eric W. Biederman
2016-07-11 10:09       ` Jan Kara
2016-07-06 18:12     ` [PATCH review 11/12] evm: Translate user/group ids relative to s_user_ns when computing HMAC Eric W. Biederman
2016-07-06 18:12     ` [PATCH review 12/12] fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160713013436.GM1922@dastard \
    --to=david@fromorbit.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=jack@suse.cz \
    --cc=jann@thejh.net \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mtk.manpages@gmail.com \
    --cc=serge@hallyn.com \
    --cc=seth.forshee@canonical.com \
    --cc=tixxdz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).