From: Al Viro <viro@ZenIV.linux.org.uk>
To: Kees Cook <keescook@chromium.org>
Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>,
Willem de Bruijn <willemb@google.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
David Miller <davem@davemloft.net>,
LKML <linux-kernel@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>,
Christoph Hellwig <hch@infradead.org>,
Thomas Garnier <thgarnie@google.com>,
Jann Horn <jannh@google.com>
Subject: Re: netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
Date: Fri, 1 Dec 2017 17:39:41 +0000
Message-ID: <20171201173941.GP21978@ZenIV.linux.org.uk>
In-Reply-To: <20171201045439.GO21978@ZenIV.linux.org.uk>

On Fri, Dec 01, 2017 at 04:54:39AM +0000, Al Viro wrote:
> On Fri, Dec 01, 2017 at 03:48:59AM +0000, Al Viro wrote:
>
> > Something similar to get_prog_path_type() above might make for a usable
> > primitive, IMO...
>
> Incidentally, bpf_obj_get_user()/bpf_obj_do_get() should just use
> user_path(), rather than wanking with getname()+kern_path(pname->name)+putname().
> Note that kern_path() will do getname_kernel() to get struct filename...
>
> Would cause problems for tracepoints in there, though. And that, BTW,
> is precisely why I don't want tracepoints in core VFS, TYVM - makes
> restructuring the code harder...

Egads... Contortions in bpf ->mknod() are really obnoxious.

First of all, it checks that ->d_fsdata is non-NULL and fails otherwise.
The only time ->d_fsdata gets non-NULL on that fs? In bpf_obj_do_pin(), this:
	dentry->d_fsdata = raw;
	ret = vfs_mknod(dir, dentry, mode, devt);
	dentry->d_fsdata = NULL;
In other words, it's *not* going to work from normal mknod(2). Why go through
->mknod(), then, especially since it requires that kind of contortion to
pass the data in?

devt is 0:1 or 0:2 here. mode? Character or block device, right? Like hell -
it's a regular file. And devt is a cute way to pass a flag down into bpf_mkobj()
(aka ->mknod()) through vfs_mknod(). No, it doesn't go into ->i_rdev...

And to make things even more fun, the damn thing is passed to a couple
of Linux S&M hooks - security_path_mknod() and security_inode_mknod(). Oh, sorry -
three hooks. There's devcgroup_inode_mknod() as well, but that thing sees S_IFREG
in mode and buggers off quietly. Our esteemed sadomaso^Wsecurity community gets
to play, though. Without any way to see _what_ we are attaching to that place in
the bpf fs tree, but hey - it's security, it doesn't need to make sense...

What the hell? If you need a clean way to do something, why don't you describe
(on fsdevel, or in off-list mail to relevant people) what you really want?
Sure, you can "work around" anything, but doesn't that level of perversion
strike you as a clear sign of something being not right?

For crying out loud, you are trying to pass a tagged pointer to one or another
kind of object into your own function. For that you
	* use a field in a globally visible data structure as temporary storage
	  for a pointer
	* encode your tag (essentially a boolean) into a fucking _device_ _number_,
	  of all things, and shove it through, hoping that no LSM module gets
	  weirded out by non-zero device number combined with regular file for mode.

If that does not scream "wrong or missing primitive", I don't know what would.

You want something along the lines of "create a filesystem object at given
location, calling this function with this argument for actual object creation"?
Fair enough, but then let's add a primitive that would do just that.

And grepping around for similar sick tricks catches a slightly milder example -
mq_open(2) doesn't play with encoding stuff into dev_t, but otherwise it's very
similar and could also benefit from the same primitive.

How about something like this:

int vfs_mkobj(struct dentry *dentry, umode_t mode,
		int (*f)(struct dentry *, umode_t, void *),
		void *arg)
{
	struct inode *dir = dentry->d_parent->d_inode;
	int error = may_create(dir, dentry);
	if (error)
		return error;

	mode &= S_IALLUGO;
	mode |= S_IFREG;
	error = security_inode_create(dir, dentry, mode);
	if (error)
		return error;
	error = f(dentry, mode, arg);
	if (!error)
		fsnotify_create(dir, dentry);
	return error;
}

exported by fs/namei.c, with your code doing

	switch (type) {
	case BPF_TYPE_PROG:
		error = vfs_mkobj(path.dentry, mode, bpf_mkprog, raw);
		break;
	case BPF_TYPE_MAP:
		error = vfs_mkobj(path.dentry, mode, bpf_mkmap, raw);
		break;
	default:
		error = -EPERM;
	}

instead of that vfs_mknod() hack, with

/*
 * Callbacks have to match the signature vfs_mkobj() expects for f:
 * (dentry, mode, arg).  The parent directory comes from the dentry.
 */
static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw,
			 const struct inode_operations *iops)
{
	struct inode *dir = dentry->d_parent->d_inode;
	struct inode *inode;

	inode = bpf_get_inode(dir->i_sb, dir, mode);
	if (IS_ERR(inode))
		return PTR_ERR(inode);

	inode->i_op = iops;
	inode->i_private = raw;

	bpf_dentry_finalize(dentry, inode, dir);
	return 0;
}

static int bpf_mkprog(struct dentry *dentry, umode_t mode, void *raw)
{
	return bpf_mkobj_ops(dentry, mode, raw, &bpf_prog_iops);
}

static int bpf_mkmap(struct dentry *dentry, umode_t mode, void *raw)
{
	return bpf_mkobj_ops(dentry, mode, raw, &bpf_map_iops);
}

And to hell with messing with dev_t, ->d_fsdata or having ->mknod() there at all...
Might want to replace security_path_mknod() with something saner, while we are
at it.

Objections?

PS: mqueue.c would also benefit from such a primitive - do_create() there would
simply pass attr as callback's argument into vfs_mkobj(), with callback being
the guts of mqueue_create()...