From: "Mickaël Salaün" <mic@digikod.net>
To: Sargun Dhillon <sargun@sargun.me>
Cc: Kees Cook <keescook@chromium.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	linux-security-module <linux-security-module@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	"Reshetova, Elena" <elena.reshetova@intel.com>
Subject: Re: [RFC 0/4] RFC: Add Checmate, BPF-driven minor LSM
Date: Mon, 15 Aug 2016 12:59:13 +0200
Message-ID: <57B1A081.9030209@digikod.net>
In-Reply-To: <20160815030952.GC31242@ircssh.c.rugged-nimbus-611.internal>


On 15/08/2016 05:09, Sargun Dhillon wrote:
> On Mon, Aug 15, 2016 at 12:57:44AM +0200, Mickaël Salaün wrote:
>> Our approaches have some common points (i.e. use eBPF in an LSM, stacked 
>> filters like seccomp) but I'm focused on a kind of unprivileged LSM (i.e. no 
>> CAP_SYS_ADMIN), to make standalone sandboxes, which brings more constraints 
>> (e.g. no use of unsafe functions like bpf_probe_read(), take care of privacy, 
>> SUID exec, stable ABI…). However, I don't want to handle resource limits, 
>> which should be the job of cgroups.
>>
> Kind of. Sometimes describing these resource limits is difficult. For example, I 
> have a customer who is trying to restrict containers from burning up all the 
> ephemeral ports on the machine. In this, they have an incredibly elaborate chain 
> of wiring to prevent a given container from connecting to the same (proto, 
> destip, destport) more than 1000 times.
> 
> I'm unsure of how you'd model that in a cgroup. 

This looks like a Netfilter rule. Have you tried applying this limitation with the connlimit module?
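
To make the policy under discussion concrete, here is a minimal sketch of the limit described in the quoted text: at most N open connections to the same (proto, destip, destport). It is only a toy model in userspace, not how connlimit or any kernel hook is implemented; the class name, method names, and example addresses are hypothetical.

```python
from collections import Counter

# Limit taken from the example in the quoted message ("more than 1000 times").
MAX_CONNS_PER_DEST = 1000


class DestinationLimiter:
    """Toy model: count open connections per (proto, dest_ip, dest_port)."""

    def __init__(self, limit=MAX_CONNS_PER_DEST):
        self.limit = limit
        self.open_conns = Counter()

    def try_connect(self, proto, dest_ip, dest_port):
        """Return True and record the connection if the limit allows it."""
        key = (proto, dest_ip, dest_port)
        if self.open_conns[key] >= self.limit:
            return False
        self.open_conns[key] += 1
        return True

    def close(self, proto, dest_ip, dest_port):
        """Release one previously recorded connection, if any."""
        key = (proto, dest_ip, dest_port)
        if self.open_conns[key] > 0:
            self.open_conns[key] -= 1


# Small limit so the behaviour is easy to follow.
limiter = DestinationLimiter(limit=2)
results = [limiter.try_connect("tcp", "192.0.2.10", 443) for _ in range(3)]
other = limiter.try_connect("tcp", "192.0.2.10", 80)   # different tuple
limiter.close("tcp", "192.0.2.10", 443)
after_close = limiter.try_connect("tcp", "192.0.2.10", 443)
```

The third attempt to the same tuple is refused, a different destination port is counted separately, and closing one connection frees a slot again.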


> 
>> For now, I'm focusing on file-system access control, which is one of the more 
>> complex systems to filter properly. I also plan to support basic network access 
>> control.
>>
>> What you are trying to accomplish seems more related to a Netfilter extension 
>> (something like ipset but with eBPF maybe?).
>>
> I don't only want to do network access control, I also want to write to the 
> value once it's copied into kernel space. There are a lot of benefits of doing 
> this at the syscall level, but the two primary ones are performance, and 
> capability. 
> 
> One of the biggest complaints with our current approach to filtering & load 
> balancing (iptables) is that it hides information. When people connect through 
> the load balancer, they want to find out who they connected to, and without some 
> high application-level mechanism, this isn't possible. On the other hand, if we 
> just rewrite the destination address in the connect hook, we can pretty easily
> allow them to do getpeername.

What exactly is not doable with Netfilter (e.g. REDIRECT or TPROXY)?
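
For reference, a minimal sketch of the getpeername point raised in the quoted text. It only shows that getpeername() reports the address the socket itself was connected to; the assumption (not demonstrated here) is that a rewrite done below the socket layer, as with NAT, would not change that value, whereas a rewrite in the connect hook would. The local listener is a stand-in for a backend.

```python
import socket

# Listener standing in for a backend; port 0 lets the kernel pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
backend_addr = server.getsockname()

# The client connects and then asks the socket who its peer is.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(backend_addr)
peer = client.getpeername()  # address recorded by the socket at connect time

client.close()
server.close()
```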


> 
> I'm curious about your filesystem access limiter. Do you have a way to make it so
> that a given container can only write, say, 100 MB of data to disk? 

It's a filesystem access control mechanism. It doesn't deal with quotas, and it targets process hierarchies rather than containers (which is more generic).

What is not doable with a quota mount option? It may be more appropriate to enhance the VFS (or overlayfs) to apply this kind of limitation, if needed.

