From: "Mickaël Salaün" <mic@digikod.net>
To: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Cc: "Günther Noack" <gnoack3000@gmail.com>,
"Günther Noack" <gnoack@google.com>,
willemdebruijn.kernel@gmail.com,
linux-security-module@vger.kernel.org, netdev@vger.kernel.org,
netfilter-devel@vger.kernel.org, yusongping@huawei.com,
artem.kuzin@huawei.com, konstantin.meskhidze@huawei.com,
"Matthieu Buffet" <matthieu@buffet.re>
Subject: Re: [RFC PATCH v3 01/19] landlock: Support socket access-control
Date: Fri, 24 Jan 2025 15:02:47 +0100 [thread overview]
Message-ID: <20250124.sei0Aur6aegu@digikod.net> (raw)
In-Reply-To: <aac17a25-eb9e-342e-953a-094ae0e86b54@huawei-partners.com>
On Fri, Jan 24, 2025 at 03:28:02PM +0300, Mikhail Ivanov wrote:
> On 1/14/2025 9:31 PM, Mickaël Salaün wrote:
> > Happy new year!
> >
> > On Fri, Jan 10, 2025 at 07:55:10PM +0300, Mikhail Ivanov wrote:
> > > On 1/10/2025 7:27 PM, Günther Noack wrote:
> > > > On Fri, Jan 10, 2025 at 04:02:42PM +0300, Mikhail Ivanov wrote:
> > > > > Correct, but we also agreed to use bitmasks for "family" field as well:
> > > > >
> > > > > https://lore.kernel.org/all/af72be74-50c7-d251-5df3-a2c63c73296a@huawei-partners.com/
> > > >
> > > > OK
> > > >
> > > >
> > > > > On 1/10/2025 2:12 PM, Günther Noack wrote:
> > > > > > I do not understand why this convenience feature in the UAPI layer
> > > > > > requires a change to the data structures that Landlock uses
> > > > > > internally? As far as I can tell, struct landlock_socket_attr is only
> > > > > > used in syscalls.c and converted to other data structures already. I
> > > > > > would have imagined that we'd "unroll" the specified bitmasks into the
> > > > > > possible combinations in the add_rule_socket() function and then call
> > > > > > landlock_append_socket_rule() multiple times with each of these?
> >
> > I agree that UAPI should not necessarily dictate the kernel
> > implementation.
> >
> > > > >
> > > > > I thought about unrolling bitmask into multiple entries in rbtree, and
> > > > > came up with following possible issue:
> > > > >
> > > > > Imagine that a user creates a rule that allows to create sockets of all
> > > > > possible families and types (with protocol=0 for example):
> > > > > {
> > > > > .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > > > > .families = INT64_MAX, /* 64 set bits */
> > > > > .types = INT16_MAX, /* 16 set bits */
> > > > > .protocol = 0,
> > > > > },
> > > > >
> > > > > This will add 64 x 16 = 1024 entries to the rbtree. Currently, the
> > > > > struct landlock_rule, which is used to store rules, weighs 40B. So, it
> > > > > will be 40kB by a single rule. Even if we allow rules with only
> > > > > "existing" families and types, it will be 46 x 7 = 322 entries and ~12kB
> > > > > by a single rule.
> > > > >
> > > > > I understand that this may be degenerate case and most common rule will
> > > > > result in less then 8 (or 4) entries in rbtree, but I think API should
> > > > > be as intuitive as possible. User can expect to see the same
> > > > > memory usage regardless of the content of the rule.
> > > > >
> > > > > I'm not really sure if this case is really an issue, so I'd be glad
> > > > > to hear your opinion on it.
> > > >
> > > > I think there are two separate questions here:
> > > >
> > > > (a) I think it is OK that it is *possible* to allocate 40kB of kernel
> > > > memory. At least, this is already possible today by calling
> > > > landlock_add_rule() repeatedly.
> > > >
> > > > I assume that the GFP_KERNEL_ACCOUNT flag is limiting the
> > > > potential damage to the caller here? That flag was added in the
> > > > Landlock v19 patch set [1] ("Account objects to kmemcg.").
> > > >
> > > > (b) I agree it might be counterintuitive when a single
> > > > landlock_add_rule() call allocates more space than expected.
> > > >
> > > > Mickaël, since it is already possible today (but harder), I assume
> > > > that you have thought about this problem before -- is it a problem
> > > > when users allocate more kernel memory through Landlock than they
> > > > expected?
> >
> > The imbalance between the user request and the required kernel memory is
> > interesting. It would not be a security issue thanks to the
> > GFP_KERNEL_ACCOUNT, but it can be surprising for users that for some
> > requests they can receive -ENOMEM but not for quite-similar ones (e.g.
> > with only some bits different). We should keep the principle of least
> > astonishment in mind, but also the fact that the kernel implementation
> > and the related required memory may change between two kernel versions
> > for the exact same user request (because of Landlock implementation
> > differences or other parts of the kernel). In summary, we should be
> > careful to prevent users from being able to use a large amount of memory
> > with only one syscall. This which would be an usability issue.
> >
> > > >
> > > >
> > > > Naive proposal:
> > > >
> > > > If this is an issue, how about we set a low limit to the number of
> > > > families and the number of types that can be used in a single
> > > > landlock_add_rule() invocation? (It tends to be easier to start with
> > > > a restrictive API and open it up later than the other way around.)
> > >
> > > Looks like a good approach! Better to return with an error (which almost
> > > always won't happen) and let the user to refactor the code than to
> > > allow ruleset to eat an unexpected amount of memory.
> > >
> > > >
> > > > For instance,
> > > >
> > > > * In the families field, at most 2 bits may be set.
> > > > * In the types field, at most 2 bits may be set.
> > >
> > > Or we can say that rule can contain not more than 4 combinations of
> > > family and type.
> > >
> > > BTW, 4 seems to be sufficient, at least for IP protocols.
> > >
> > > >
> > > > In my understanding, the main use case of the bit vectors is that
> > > > there is a short way to say "all IPv4+v6 stream+dgram sockets", but we
> > > > do not know scenarios where much more than that is needed? With that,
> > > > we would still keep people from accidentally allocating larger amounts
> > > > of memory, while permitting the main use case.
> > >
> > > Agreed
> > >
> > > >
> > > > Having independent limits for the family and type fields is a bit
> > > > easier to understand and document than imposing a limit on the
> > > > multiplication result.
> >
> > This is a safer approach that can indeed be documented, but it looks
> > unintuitive and an arbitrary limitation. Here is another proposal:
> >
> > Let's ignore my previous suggestion to use bitfields for families and
> > protocols. To keep it simple, we can get back to the initial struct but
> > specifically handle the (i64)-1 value (which cannot be a family,
> > protocol, nor a type):
> > {
> > .allowed_access = LANDLOCK_ACCESS_SOCKET_CREATE,
> > .family = AF_INET,
> > .type = SOCK_STREAM,
> > .protocol = -1,
> > },
> >
> > This would read: deny socket creation except for AF_INET with
> > SOCK_STREAM (and any protocol).
> >
> > Users might need to add several rules (e.g. one for AF_INET and another
> > for AF_INET6) but that's OK. Unfortunately we cannot identify a TCP
> > socket with only protocol = IPPROTO_TCP because protocol definitions
> > are relative to network families. Specifying the protocol without the
> > family should then return an error.
> >
> > Before rule could be loaded, users define how much they want to match a
> > socket: at least the family, optionally the type, and if the type is
> > also set then the protocol can also be set. These dependencies are
> > required to transform this triplet to a key number, see below.
> >
> > A landlock_ruleset_attr.handled_socket_layers field would define how
> > much we want to match a socket:
> > - 1: family only
> > - 2: family and type
> > - 3: family, type, and protocol
> >
> > According to this ruleset's property, users will be allowed to fill the
> > family, type, or protocol fields in landlock_socket_attr rules. If a
> > socket layer is not handled, it should contain (i64)-1 for the kernel to
> > detect misuse of the API.
> >
> > This enables us to get a key from this triplet:
> >
> > family_bits = 6; // 45 values for now
> > type_bits = 3; // 7 values for now
> > protocol_bits = 5; // 28 values for now
> >
> > // attr.* are the sanitized UAPI values, including -1 replaced with 0.
> > // In this example, landlock_ruleset_attr.handled_socket_layers is 3, so
> > // the key is composed of all the 3 properties.
> > landlock_key.data = attr.family << (type_bits + protocol_bits) |
> > attr.type << protocol_bits | attr.protocol;
> >
> > For each layer of restriction in a domain, we know how precise they
> > define a socket (i.e. with how many "socket layers"). We can then look
> > for at most 3 entries in the red-black tree: one with only the family,
> > another with the family and the type, and potentially a third also
> > including the protocol. Each key would have the same significant bits
> > but with the lower bits masked according to each
> > landlock_ruleset_attr.handled_socket_layers . Composing the related
> > access masks according to the defined socket layers, we can create an
> > array of struct access_masks for the request and then check if such
> > request is allowed by the current domain. As for the currently stored
> > data, we can also identify the domain layer that blocked the request
> > (required for audit).
>
> I do not quite understand why we need socket_layers. Without it,
> user can set (i64)(-1) to type or protocol whenever he wants. While
> transforming triplet to a key we can replace (i64)(-1) with some
> constant (e.g. (1 << type_bits - 1) for the type if type_bits = 8 and
> (1 << protocol_bits - 1) for the protocol if protocol_bits = 16).
We can indead add a virtual any/wildcard type and protocol, translated
from user space's (i64)-1 . The downside is that for the same domain
layer we would need to check 4 different keys for the same triplet (i.e.
famility is always set, but type and protocol are optionals). With the
socket_layers number, we only have to check for one key per domain
layer. However, in the worse case with at least 3 domain layers having
different socket_layers value, we would still need to check for 3
different keys. So, I think it's reasonable to get rid of the
socket_layers as you suggested (while requiring the family to always be
set).
>
> >
> > With this design, each sandbox can define a socket as much as it wants.
> >
> > The downside is that we lost the bitfields and we need several calls to
> > filter more complex sockets (e.g. 4 for UDP and TCP with IPv4 and IPv6),
> > which looks OK compared to the required calls for filesystem access
> > control.
> >
> > > >
> > > > > > That being said, I am not a big fan of red-black trees for such simple
> > > > > > integer lookups either, and I also think there should be something
> > > > > > better if we make more use of the properties of the input ranges. The
> > > > > > question is though whether you want to couple that to this socket type
> > > > > > patch set, or rather do it in a follow up? (So far we have been doing
> > > > > > fine with the red black trees, and we are already contemplating the
> > > > > > possibility of changing these internal structures in [2]. We have
> > > > > > also used RB trees for the "port" rules with a similar reasoning,
> > > > > > IIRC.)
> > > > >
> > > > > I think it'll be better to have a separate series for [2] if the socket
> > > > > restriction can be implemented without rbtree refactoring.
> > > >
> > > > Sounds good to me. 👍
> > > >
> > > > [1] https://lore.kernel.org/all/20200707180955.53024-2-mic@digikod.net/
> >
> > red-black trees are a good generic data structure for the current main
> > use case (for dynamic rulesets and static domains), but we'll need to
> > use more appropriate data structures. I think this should not be a
> > blocker for this patch series. It will be required to match (port)
> > ranges though (even if the use case seems limited), and in general for
> > better performances.
>
next prev parent reply other threads:[~2025-01-24 14:12 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-04 10:48 [RFC PATCH v3 00/19] Support socket access-control Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 01/19] landlock: " Mikhail Ivanov
2024-09-06 13:09 ` Günther Noack
2024-09-09 7:23 ` Mikhail Ivanov
2024-11-11 16:29 ` Mikhail Ivanov
2024-11-22 17:45 ` Günther Noack
2024-11-25 11:04 ` Mikhail Ivanov
2024-11-27 18:43 ` Mickaël Salaün
2024-11-28 12:01 ` Mikhail Ivanov
2024-11-28 20:52 ` Mickaël Salaün
2024-12-02 11:32 ` Mikhail Ivanov
2024-12-24 16:55 ` Mikhail Ivanov
2025-01-10 11:12 ` Günther Noack
2025-01-10 13:02 ` Mikhail Ivanov
2025-01-10 16:27 ` Günther Noack
2025-01-10 16:55 ` Mikhail Ivanov
2025-01-14 18:31 ` Mickaël Salaün
2025-01-24 12:28 ` Mikhail Ivanov
2025-01-24 14:02 ` Mickaël Salaün [this message]
2024-09-04 10:48 ` [RFC PATCH v3 02/19] landlock: Add hook on socket creation Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 03/19] selftests/landlock: Test basic socket restriction Mikhail Ivanov
2024-09-10 9:53 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 04/19] selftests/landlock: Test adding a rule with each supported access Mikhail Ivanov
2024-09-10 9:53 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 05/19] selftests/landlock: Test adding a rule for each unknown access Mikhail Ivanov
2024-09-10 9:53 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 06/19] selftests/landlock: Test adding a rule for unhandled access Mikhail Ivanov
2024-09-10 9:22 ` Günther Noack
2024-09-11 8:19 ` Mikhail Ivanov
2024-09-13 15:04 ` Günther Noack
2024-09-13 16:15 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 07/19] selftests/landlock: Test adding a rule for empty access Mikhail Ivanov
2024-09-18 12:42 ` Günther Noack
2024-09-18 13:03 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 08/19] selftests/landlock: Test overlapped restriction Mikhail Ivanov
2024-09-18 12:42 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 09/19] selftests/landlock: Test creating a ruleset with unknown access Mikhail Ivanov
2024-09-18 12:44 ` Günther Noack
2024-09-04 10:48 ` [RFC PATCH v3 10/19] selftests/landlock: Test adding a rule with family and type outside the range Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 11/19] selftests/landlock: Test unsupported protocol restriction Mikhail Ivanov
2024-09-18 12:54 ` Günther Noack
2024-09-18 13:36 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 12/19] selftests/landlock: Test that kernel space sockets are not restricted Mikhail Ivanov
2024-09-04 12:45 ` Mikhail Ivanov
2024-09-18 13:00 ` Günther Noack
2024-09-19 10:53 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 13/19] selftests/landlock: Test packet protocol alias Mikhail Ivanov
2024-09-18 13:33 ` Günther Noack
2024-09-18 14:01 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 14/19] selftests/landlock: Test socketpair(2) restriction Mikhail Ivanov
2024-09-18 13:47 ` Günther Noack
2024-09-23 12:57 ` Mikhail Ivanov
2024-09-25 12:17 ` Mikhail Ivanov
2024-09-27 9:48 ` Günther Noack
2024-09-28 20:06 ` Günther Noack
2024-09-29 17:31 ` Mickaël Salaün
2024-10-03 17:27 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 15/19] selftests/landlock: Test SCTP peeloff restriction Mikhail Ivanov
2024-09-27 14:35 ` Günther Noack
2024-10-03 12:15 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 16/19] selftests/landlock: Test that accept(2) is not restricted Mikhail Ivanov
2024-09-27 14:53 ` Günther Noack
2024-10-03 12:41 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 17/19] samples/landlock: Replace atoi() with strtoull() in populate_ruleset_net() Mikhail Ivanov
2024-09-27 15:12 ` Günther Noack
2024-10-03 12:59 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 18/19] samples/landlock: Support socket protocol restrictions Mikhail Ivanov
2024-10-01 7:56 ` Günther Noack
2024-10-03 13:15 ` Mikhail Ivanov
2024-09-04 10:48 ` [RFC PATCH v3 19/19] landlock: Document socket rule type support Mikhail Ivanov
2024-10-01 7:09 ` Günther Noack
2024-10-03 14:00 ` Mikhail Ivanov
2024-10-03 16:21 ` Günther Noack
2025-04-22 17:19 ` [RFC PATCH v3 00/19] Support socket access-control Mickaël Salaün
2025-04-25 13:58 ` Günther Noack
2025-04-29 11:59 ` Mikhail Ivanov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250124.sei0Aur6aegu@digikod.net \
--to=mic@digikod.net \
--cc=artem.kuzin@huawei.com \
--cc=gnoack3000@gmail.com \
--cc=gnoack@google.com \
--cc=ivanov.mikhail1@huawei-partners.com \
--cc=konstantin.meskhidze@huawei.com \
--cc=linux-security-module@vger.kernel.org \
--cc=matthieu@buffet.re \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=willemdebruijn.kernel@gmail.com \
--cc=yusongping@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.