linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors
@ 2024-10-17 11:04 Mikhail Ivanov
  2024-10-17 11:04 ` [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction Mikhail Ivanov
                   ` (7 more replies)
  0 siblings, 8 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Hello!
This patchset provides two general fixes for TCP Landlock hooks:

First one fixes incorrect restriction of non-TCP bind/connect actions.
There is two commits related to testing MPTCP and SCTP protocols which were
incorrectly restricted. SCTP implementation has invalid check for minimal
address length in bind(2) call [1], therefore commit with SCTP testing can be
applied later after necessary SCTP fixes.

[1] https://lore.kernel.org/all/20241004.Hohpheipieh2@digikod.net/
Closes: https://github.com/landlock-lsm/linux/issues/40

Second one fixes inconsistency of errors in bind and connect hooks for
TCP sockets. It provides per-operation helpers, which consist of a set
of checks from the TCP network stack. Due to TCP connect(2) implementation
it's not possible to obtain full consistency, but the unhandled cases are
rather special scenarios that should almost should not normally appear.
Two new tests were implemented to validate errors consistency.

Diffs of second and third commits were unreadable, so I've decided to
rewrite net.c file to simplify reviewing process.

Code coverage
=============
Code coverage(gcov) report with the launch of net_test selftest:
 * security/landlock/net.c:
lines......: 98.8% (79 of 80 lines)
functions..: 100% (8 of 8 functions)

One uncovered line is documented in check_tcp_connect_consistency_and_get_port().

General changes
===============
 * Rebases on current linux-mic/next (based on Linux v6.12-rc3)
 * Fixes inconsistency of TCP actions errors and implements two related
   tests.
 * Removes SMC test suits.
 * Adds separate commit for SCTP test suits.
 * Adds test suits of protocol fixture for sockets created with
   protocol=IPPROTO_TCP (C.f. socket(2)).

Previous versions
=================
v1: https://lore.kernel.org/all/20241003143932.2431249-1-ivanov.mikhail1@huawei-partners.com/

Mikhail Ivanov (8):
  landlock: Fix non-TCP sockets restriction
  landlock: Make network stack layer checks explicit for each TCP action
  landlock: Fix inconsistency of errors for TCP actions
  selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP
  selftests/landlock: Test that MPTCP actions are not restricted
  selftests/landlock: Test consistency of errors for TCP actions
  landlock: Add note about errors consistency in documentation
  selftests/landlock: Test that SCTP actions are not restricted

 Documentation/userspace-api/landlock.rst    |   3 +-
 security/landlock/net.c                     | 501 +++++++++++-------
 tools/testing/selftests/landlock/common.h   |   1 +
 tools/testing/selftests/landlock/config     |   4 +
 tools/testing/selftests/landlock/net_test.c | 532 ++++++++++++++++++--
 5 files changed, 825 insertions(+), 216 deletions(-)
 rewrite security/landlock/net.c (36%)


base-commit: fe76bd133024aaef12d12a7d58fa3e8d138d3bf3
-- 
2.34.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
@ 2024-10-17 11:04 ` Mikhail Ivanov
  2024-10-17 12:59   ` Matthieu Baerts
  2024-12-04 19:30   ` Mickaël Salaün
  2024-10-17 11:04 ` [RFC PATCH v2 2/8] landlock: Make network stack layer checks explicit for each TCP action Mikhail Ivanov
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Do not check TCP access right if socket protocol is not IPPROTO_TCP.
LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
should not restrict bind(2) and connect(2) for non-TCP protocols
(SCTP, MPTCP, SMC).

sk_is_tcp() is used for this to check address family of the socket
before doing INET-specific address length validation. This is required
for error consistency.

Closes: https://github.com/landlock-lsm/linux/issues/40
Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---

Changes since v1:
* Validate socket family (=INET{,6}) before any other checks
  with sk_is_tcp().
---
 security/landlock/net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/landlock/net.c b/security/landlock/net.c
index fdc1bb0a9c5d..1e80782ba239 100644
--- a/security/landlock/net.c
+++ b/security/landlock/net.c
@@ -66,8 +66,8 @@ static int current_check_access_socket(struct socket *const sock,
 	if (WARN_ON_ONCE(dom->num_layers < 1))
 		return -EACCES;
 
-	/* Checks if it's a (potential) TCP socket. */
-	if (sock->type != SOCK_STREAM)
+	/* Do not restrict non-TCP sockets. */
+	if (!sk_is_tcp(sock->sk))
 		return 0;
 
 	/* Checks for minimal header length to safely read sa_family. */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v2 2/8] landlock: Make network stack layer checks explicit for each TCP action
  2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
  2024-10-17 11:04 ` [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction Mikhail Ivanov
@ 2024-10-17 11:04 ` Mikhail Ivanov
  2024-10-17 11:04 ` [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions Mikhail Ivanov
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Move port extraction and TCP checks required for errors consistency
to hook_socket_bind() and hook_socket_connect(). This separation
simplifies the comparison with the order of network stack layer errors
for each controlled operation.

Replace current_check_access_socket() with check_access_port().

Use sk->sk_family instead of sk->__sk_common.skc_family.

Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---
 security/landlock/net.c | 414 ++++++++++++++++++++++------------------
 1 file changed, 228 insertions(+), 186 deletions(-)
 rewrite security/landlock/net.c (22%)

diff --git a/security/landlock/net.c b/security/landlock/net.c
dissimilarity index 22%
index 1e80782ba239..a3142f9b15ee 100644
--- a/security/landlock/net.c
+++ b/security/landlock/net.c
@@ -1,186 +1,228 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Landlock LSM - Network management and hooks
- *
- * Copyright © 2022-2023 Huawei Tech. Co., Ltd.
- * Copyright © 2022-2023 Microsoft Corporation
- */
-
-#include <linux/in.h>
-#include <linux/net.h>
-#include <linux/socket.h>
-#include <net/ipv6.h>
-
-#include "common.h"
-#include "cred.h"
-#include "limits.h"
-#include "net.h"
-#include "ruleset.h"
-
-int landlock_append_net_rule(struct landlock_ruleset *const ruleset,
-			     const u16 port, access_mask_t access_rights)
-{
-	int err;
-	const struct landlock_id id = {
-		.key.data = (__force uintptr_t)htons(port),
-		.type = LANDLOCK_KEY_NET_PORT,
-	};
-
-	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
-
-	/* Transforms relative access rights to absolute ones. */
-	access_rights |= LANDLOCK_MASK_ACCESS_NET &
-			 ~landlock_get_net_access_mask(ruleset, 0);
-
-	mutex_lock(&ruleset->lock);
-	err = landlock_insert_rule(ruleset, id, access_rights);
-	mutex_unlock(&ruleset->lock);
-
-	return err;
-}
-
-static const struct landlock_ruleset *get_current_net_domain(void)
-{
-	const union access_masks any_net = {
-		.net = ~0,
-	};
-
-	return landlock_match_ruleset(landlock_get_current_domain(), any_net);
-}
-
-static int current_check_access_socket(struct socket *const sock,
-				       struct sockaddr *const address,
-				       const int addrlen,
-				       access_mask_t access_request)
-{
-	__be16 port;
-	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_NET] = {};
-	const struct landlock_rule *rule;
-	struct landlock_id id = {
-		.type = LANDLOCK_KEY_NET_PORT,
-	};
-	const struct landlock_ruleset *const dom = get_current_net_domain();
-
-	if (!dom)
-		return 0;
-	if (WARN_ON_ONCE(dom->num_layers < 1))
-		return -EACCES;
-
-	/* Do not restrict non-TCP sockets. */
-	if (!sk_is_tcp(sock->sk))
-		return 0;
-
-	/* Checks for minimal header length to safely read sa_family. */
-	if (addrlen < offsetofend(typeof(*address), sa_family))
-		return -EINVAL;
-
-	switch (address->sa_family) {
-	case AF_UNSPEC:
-	case AF_INET:
-		if (addrlen < sizeof(struct sockaddr_in))
-			return -EINVAL;
-		port = ((struct sockaddr_in *)address)->sin_port;
-		break;
-
-#if IS_ENABLED(CONFIG_IPV6)
-	case AF_INET6:
-		if (addrlen < SIN6_LEN_RFC2133)
-			return -EINVAL;
-		port = ((struct sockaddr_in6 *)address)->sin6_port;
-		break;
-#endif /* IS_ENABLED(CONFIG_IPV6) */
-
-	default:
-		return 0;
-	}
-
-	/* Specific AF_UNSPEC handling. */
-	if (address->sa_family == AF_UNSPEC) {
-		/*
-		 * Connecting to an address with AF_UNSPEC dissolves the TCP
-		 * association, which have the same effect as closing the
-		 * connection while retaining the socket object (i.e., the file
-		 * descriptor).  As for dropping privileges, closing
-		 * connections is always allowed.
-		 *
-		 * For a TCP access control system, this request is legitimate.
-		 * Let the network stack handle potential inconsistencies and
-		 * return -EINVAL if needed.
-		 */
-		if (access_request == LANDLOCK_ACCESS_NET_CONNECT_TCP)
-			return 0;
-
-		/*
-		 * For compatibility reason, accept AF_UNSPEC for bind
-		 * accesses (mapped to AF_INET) only if the address is
-		 * INADDR_ANY (cf. __inet_bind).  Checking the address is
-		 * required to not wrongfully return -EACCES instead of
-		 * -EAFNOSUPPORT.
-		 *
-		 * We could return 0 and let the network stack handle these
-		 * checks, but it is safer to return a proper error and test
-		 * consistency thanks to kselftest.
-		 */
-		if (access_request == LANDLOCK_ACCESS_NET_BIND_TCP) {
-			/* addrlen has already been checked for AF_UNSPEC. */
-			const struct sockaddr_in *const sockaddr =
-				(struct sockaddr_in *)address;
-
-			if (sock->sk->__sk_common.skc_family != AF_INET)
-				return -EINVAL;
-
-			if (sockaddr->sin_addr.s_addr != htonl(INADDR_ANY))
-				return -EAFNOSUPPORT;
-		}
-	} else {
-		/*
-		 * Checks sa_family consistency to not wrongfully return
-		 * -EACCES instead of -EINVAL.  Valid sa_family changes are
-		 * only (from AF_INET or AF_INET6) to AF_UNSPEC.
-		 *
-		 * We could return 0 and let the network stack handle this
-		 * check, but it is safer to return a proper error and test
-		 * consistency thanks to kselftest.
-		 */
-		if (address->sa_family != sock->sk->__sk_common.skc_family)
-			return -EINVAL;
-	}
-
-	id.key.data = (__force uintptr_t)port;
-	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
-
-	rule = landlock_find_rule(dom, id);
-	access_request = landlock_init_layer_masks(
-		dom, access_request, &layer_masks, LANDLOCK_KEY_NET_PORT);
-	if (landlock_unmask_layers(rule, access_request, &layer_masks,
-				   ARRAY_SIZE(layer_masks)))
-		return 0;
-
-	return -EACCES;
-}
-
-static int hook_socket_bind(struct socket *const sock,
-			    struct sockaddr *const address, const int addrlen)
-{
-	return current_check_access_socket(sock, address, addrlen,
-					   LANDLOCK_ACCESS_NET_BIND_TCP);
-}
-
-static int hook_socket_connect(struct socket *const sock,
-			       struct sockaddr *const address,
-			       const int addrlen)
-{
-	return current_check_access_socket(sock, address, addrlen,
-					   LANDLOCK_ACCESS_NET_CONNECT_TCP);
-}
-
-static struct security_hook_list landlock_hooks[] __ro_after_init = {
-	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
-	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
-};
-
-__init void landlock_add_net_hooks(void)
-{
-	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
-			   &landlock_lsmid);
-}
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Network management and hooks
+ *
+ * Copyright © 2022-2023 Huawei Tech. Co., Ltd.
+ * Copyright © 2022-2023 Microsoft Corporation
+ */
+
+#include <linux/in.h>
+#include <linux/net.h>
+#include <linux/socket.h>
+#include <net/ipv6.h>
+
+#include "common.h"
+#include "cred.h"
+#include "limits.h"
+#include "net.h"
+#include "ruleset.h"
+
+int landlock_append_net_rule(struct landlock_ruleset *const ruleset,
+			     const u16 port, access_mask_t access_rights)
+{
+	int err;
+	const struct landlock_id id = {
+		.key.data = (__force uintptr_t)htons(port),
+		.type = LANDLOCK_KEY_NET_PORT,
+	};
+
+	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
+
+	/* Transforms relative access rights to absolute ones. */
+	access_rights |= LANDLOCK_MASK_ACCESS_NET &
+			 ~landlock_get_net_access_mask(ruleset, 0);
+
+	mutex_lock(&ruleset->lock);
+	err = landlock_insert_rule(ruleset, id, access_rights);
+	mutex_unlock(&ruleset->lock);
+
+	return err;
+}
+
+static const struct landlock_ruleset *get_current_net_domain(void)
+{
+	const union access_masks any_net = {
+		.net = ~0,
+	};
+
+	return landlock_match_ruleset(landlock_get_current_domain(), any_net);
+}
+
+static int check_access_port(const struct landlock_ruleset *const dom,
+			     __be16 port, access_mask_t access_request)
+{
+	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_NET] = {};
+	const struct landlock_rule *rule;
+	struct landlock_id id = {
+		.type = LANDLOCK_KEY_NET_PORT,
+	};
+
+	id.key.data = (__force uintptr_t)port;
+	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
+
+	rule = landlock_find_rule(dom, id);
+	access_request = landlock_init_layer_masks(
+		dom, access_request, &layer_masks, LANDLOCK_KEY_NET_PORT);
+	if (landlock_unmask_layers(rule, access_request, &layer_masks,
+				   ARRAY_SIZE(layer_masks)))
+		return 0;
+
+	return -EACCES;
+}
+
+static int hook_socket_bind(struct socket *const sock,
+			    struct sockaddr *const address, const int addrlen)
+{
+	__be16 port;
+	struct sock *const sk = sock->sk;
+	const struct landlock_ruleset *const dom = get_current_net_domain();
+
+	if (!dom)
+		return 0;
+	if (WARN_ON_ONCE(dom->num_layers < 1))
+		return -EACCES;
+
+	if (sk_is_tcp(sk)) {
+		/* Checks for minimal header length to safely read sa_family. */
+		if (addrlen < offsetofend(typeof(*address), sa_family))
+			return -EINVAL;
+
+		switch (address->sa_family) {
+		case AF_UNSPEC:
+		case AF_INET:
+			if (addrlen < sizeof(struct sockaddr_in))
+				return -EINVAL;
+			port = ((struct sockaddr_in *)address)->sin_port;
+			break;
+
+#if IS_ENABLED(CONFIG_IPV6)
+		case AF_INET6:
+			if (addrlen < SIN6_LEN_RFC2133)
+				return -EINVAL;
+			port = ((struct sockaddr_in6 *)address)->sin6_port;
+			break;
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+
+		default:
+			return 0;
+		}
+
+		/*
+		 * For compatibility reason, accept AF_UNSPEC for bind
+		 * accesses (mapped to AF_INET) only if the address is
+		 * INADDR_ANY (cf. __inet_bind).  Checking the address is
+		 * required to not wrongfully return -EACCES instead of
+		 * -EAFNOSUPPORT.
+		 *
+		 * We could return 0 and let the network stack handle these
+		 * checks, but it is safer to return a proper error and test
+		 * consistency thanks to kselftest.
+		 */
+		if (address->sa_family == AF_UNSPEC) {
+			/* addrlen has already been checked for AF_UNSPEC. */
+			const struct sockaddr_in *const sockaddr =
+				(struct sockaddr_in *)address;
+
+			if (sk->sk_family != AF_INET)
+				return -EINVAL;
+
+			if (sockaddr->sin_addr.s_addr != htonl(INADDR_ANY))
+				return -EAFNOSUPPORT;
+		} else {
+			/*
+			 * Checks sa_family consistency to not wrongfully return
+			 * -EACCES instead of -EINVAL.  Valid sa_family changes are
+			 * only (from AF_INET or AF_INET6) to AF_UNSPEC.
+			 *
+			 * We could return 0 and let the network stack handle this
+			 * check, but it is safer to return a proper error and test
+			 * consistency thanks to kselftest.
+			 */
+			if (address->sa_family != sk->sk_family)
+				return -EINVAL;
+		}
+		return check_access_port(dom, port,
+					 LANDLOCK_ACCESS_NET_BIND_TCP);
+	}
+	return 0;
+}
+
+static int hook_socket_connect(struct socket *const sock,
+			       struct sockaddr *const address,
+			       const int addrlen)
+{
+	__be16 port;
+	struct sock *const sk = sock->sk;
+	const struct landlock_ruleset *const dom = get_current_net_domain();
+
+	if (!dom)
+		return 0;
+	if (WARN_ON_ONCE(dom->num_layers < 1))
+		return -EACCES;
+
+	if (sk_is_tcp(sk)) {
+		/* Checks for minimal header length to safely read sa_family. */
+		if (addrlen < offsetofend(typeof(*address), sa_family))
+			return -EINVAL;
+
+		switch (address->sa_family) {
+		case AF_UNSPEC:
+		case AF_INET:
+			if (addrlen < sizeof(struct sockaddr_in))
+				return -EINVAL;
+			port = ((struct sockaddr_in *)address)->sin_port;
+			break;
+
+#if IS_ENABLED(CONFIG_IPV6)
+		case AF_INET6:
+			if (addrlen < SIN6_LEN_RFC2133)
+				return -EINVAL;
+			port = ((struct sockaddr_in6 *)address)->sin6_port;
+			break;
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+
+		default:
+			return 0;
+		}
+
+		/*
+		 * Connecting to an address with AF_UNSPEC dissolves the TCP
+		 * association, which have the same effect as closing the
+		 * connection while retaining the socket object (i.e., the file
+		 * descriptor).  As for dropping privileges, closing
+		 * connections is always allowed.
+		 *
+		 * For a TCP access control system, this request is legitimate.
+		 * Let the network stack handle potential inconsistencies and
+		 * return -EINVAL if needed.
+		 */
+		if (address->sa_family == AF_UNSPEC)
+			return 0;
+		/*
+		 * Checks sa_family consistency to not wrongfully return
+		 * -EACCES instead of -EINVAL.  Valid sa_family changes are
+		 * only (from AF_INET or AF_INET6) to AF_UNSPEC.
+		 *
+		 * We could return 0 and let the network stack handle this
+		 * check, but it is safer to return a proper error and test
+		 * consistency thanks to kselftest.
+		 */
+		if (address->sa_family != sk->sk_family)
+			return -EINVAL;
+
+		return check_access_port(dom, port,
+					 LANDLOCK_ACCESS_NET_CONNECT_TCP);
+	}
+	return 0;
+}
+
+static struct security_hook_list landlock_hooks[] __ro_after_init = {
+	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
+	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
+};
+
+__init void landlock_add_net_hooks(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			   &landlock_lsmid);
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions
  2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
  2024-10-17 11:04 ` [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction Mikhail Ivanov
  2024-10-17 11:04 ` [RFC PATCH v2 2/8] landlock: Make network stack layer checks explicit for each TCP action Mikhail Ivanov
@ 2024-10-17 11:04 ` Mikhail Ivanov
  2024-10-17 11:34   ` Mikhail Ivanov
                     ` (2 more replies)
  2024-10-17 11:04 ` [RFC PATCH v2 4/8] selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP Mikhail Ivanov
                   ` (4 subsequent siblings)
  7 siblings, 3 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Add two helpers for TCP bind/connect accesses, which will serve to perform
action-specific network stack level checks and safely extract the port from
the address.

Return -EAFNOSUPPORT instead of -EINVAL in sin_family checks.

Check socket state before validating address for TCP connect access. This
is necessary to follow the error order of network stack.

Read sk_family value from socket structure with READ_ONCE to safely handle
IPV6_ADDRFORM case (see [1]).

[1] https://lore.kernel.org/all/20240202095404.183274-1-edumazet@google.com/

Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---
 security/landlock/net.c | 543 +++++++++++++++++++++++-----------------
 1 file changed, 315 insertions(+), 228 deletions(-)
 rewrite security/landlock/net.c (37%)

diff --git a/security/landlock/net.c b/security/landlock/net.c
dissimilarity index 37%
index a3142f9b15ee..06791aba9196 100644
--- a/security/landlock/net.c
+++ b/security/landlock/net.c
@@ -1,228 +1,315 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Landlock LSM - Network management and hooks
- *
- * Copyright © 2022-2023 Huawei Tech. Co., Ltd.
- * Copyright © 2022-2023 Microsoft Corporation
- */
-
-#include <linux/in.h>
-#include <linux/net.h>
-#include <linux/socket.h>
-#include <net/ipv6.h>
-
-#include "common.h"
-#include "cred.h"
-#include "limits.h"
-#include "net.h"
-#include "ruleset.h"
-
-int landlock_append_net_rule(struct landlock_ruleset *const ruleset,
-			     const u16 port, access_mask_t access_rights)
-{
-	int err;
-	const struct landlock_id id = {
-		.key.data = (__force uintptr_t)htons(port),
-		.type = LANDLOCK_KEY_NET_PORT,
-	};
-
-	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
-
-	/* Transforms relative access rights to absolute ones. */
-	access_rights |= LANDLOCK_MASK_ACCESS_NET &
-			 ~landlock_get_net_access_mask(ruleset, 0);
-
-	mutex_lock(&ruleset->lock);
-	err = landlock_insert_rule(ruleset, id, access_rights);
-	mutex_unlock(&ruleset->lock);
-
-	return err;
-}
-
-static const struct landlock_ruleset *get_current_net_domain(void)
-{
-	const union access_masks any_net = {
-		.net = ~0,
-	};
-
-	return landlock_match_ruleset(landlock_get_current_domain(), any_net);
-}
-
-static int check_access_port(const struct landlock_ruleset *const dom,
-			     __be16 port, access_mask_t access_request)
-{
-	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_NET] = {};
-	const struct landlock_rule *rule;
-	struct landlock_id id = {
-		.type = LANDLOCK_KEY_NET_PORT,
-	};
-
-	id.key.data = (__force uintptr_t)port;
-	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
-
-	rule = landlock_find_rule(dom, id);
-	access_request = landlock_init_layer_masks(
-		dom, access_request, &layer_masks, LANDLOCK_KEY_NET_PORT);
-	if (landlock_unmask_layers(rule, access_request, &layer_masks,
-				   ARRAY_SIZE(layer_masks)))
-		return 0;
-
-	return -EACCES;
-}
-
-static int hook_socket_bind(struct socket *const sock,
-			    struct sockaddr *const address, const int addrlen)
-{
-	__be16 port;
-	struct sock *const sk = sock->sk;
-	const struct landlock_ruleset *const dom = get_current_net_domain();
-
-	if (!dom)
-		return 0;
-	if (WARN_ON_ONCE(dom->num_layers < 1))
-		return -EACCES;
-
-	if (sk_is_tcp(sk)) {
-		/* Checks for minimal header length to safely read sa_family. */
-		if (addrlen < offsetofend(typeof(*address), sa_family))
-			return -EINVAL;
-
-		switch (address->sa_family) {
-		case AF_UNSPEC:
-		case AF_INET:
-			if (addrlen < sizeof(struct sockaddr_in))
-				return -EINVAL;
-			port = ((struct sockaddr_in *)address)->sin_port;
-			break;
-
-#if IS_ENABLED(CONFIG_IPV6)
-		case AF_INET6:
-			if (addrlen < SIN6_LEN_RFC2133)
-				return -EINVAL;
-			port = ((struct sockaddr_in6 *)address)->sin6_port;
-			break;
-#endif /* IS_ENABLED(CONFIG_IPV6) */
-
-		default:
-			return 0;
-		}
-
-		/*
-		 * For compatibility reason, accept AF_UNSPEC for bind
-		 * accesses (mapped to AF_INET) only if the address is
-		 * INADDR_ANY (cf. __inet_bind).  Checking the address is
-		 * required to not wrongfully return -EACCES instead of
-		 * -EAFNOSUPPORT.
-		 *
-		 * We could return 0 and let the network stack handle these
-		 * checks, but it is safer to return a proper error and test
-		 * consistency thanks to kselftest.
-		 */
-		if (address->sa_family == AF_UNSPEC) {
-			/* addrlen has already been checked for AF_UNSPEC. */
-			const struct sockaddr_in *const sockaddr =
-				(struct sockaddr_in *)address;
-
-			if (sk->sk_family != AF_INET)
-				return -EINVAL;
-
-			if (sockaddr->sin_addr.s_addr != htonl(INADDR_ANY))
-				return -EAFNOSUPPORT;
-		} else {
-			/*
-			 * Checks sa_family consistency to not wrongfully return
-			 * -EACCES instead of -EINVAL.  Valid sa_family changes are
-			 * only (from AF_INET or AF_INET6) to AF_UNSPEC.
-			 *
-			 * We could return 0 and let the network stack handle this
-			 * check, but it is safer to return a proper error and test
-			 * consistency thanks to kselftest.
-			 */
-			if (address->sa_family != sk->sk_family)
-				return -EINVAL;
-		}
-		return check_access_port(dom, port,
-					 LANDLOCK_ACCESS_NET_BIND_TCP);
-	}
-	return 0;
-}
-
-static int hook_socket_connect(struct socket *const sock,
-			       struct sockaddr *const address,
-			       const int addrlen)
-{
-	__be16 port;
-	struct sock *const sk = sock->sk;
-	const struct landlock_ruleset *const dom = get_current_net_domain();
-
-	if (!dom)
-		return 0;
-	if (WARN_ON_ONCE(dom->num_layers < 1))
-		return -EACCES;
-
-	if (sk_is_tcp(sk)) {
-		/* Checks for minimal header length to safely read sa_family. */
-		if (addrlen < offsetofend(typeof(*address), sa_family))
-			return -EINVAL;
-
-		switch (address->sa_family) {
-		case AF_UNSPEC:
-		case AF_INET:
-			if (addrlen < sizeof(struct sockaddr_in))
-				return -EINVAL;
-			port = ((struct sockaddr_in *)address)->sin_port;
-			break;
-
-#if IS_ENABLED(CONFIG_IPV6)
-		case AF_INET6:
-			if (addrlen < SIN6_LEN_RFC2133)
-				return -EINVAL;
-			port = ((struct sockaddr_in6 *)address)->sin6_port;
-			break;
-#endif /* IS_ENABLED(CONFIG_IPV6) */
-
-		default:
-			return 0;
-		}
-
-		/*
-		 * Connecting to an address with AF_UNSPEC dissolves the TCP
-		 * association, which have the same effect as closing the
-		 * connection while retaining the socket object (i.e., the file
-		 * descriptor).  As for dropping privileges, closing
-		 * connections is always allowed.
-		 *
-		 * For a TCP access control system, this request is legitimate.
-		 * Let the network stack handle potential inconsistencies and
-		 * return -EINVAL if needed.
-		 */
-		if (address->sa_family == AF_UNSPEC)
-			return 0;
-		/*
-		 * Checks sa_family consistency to not wrongfully return
-		 * -EACCES instead of -EINVAL.  Valid sa_family changes are
-		 * only (from AF_INET or AF_INET6) to AF_UNSPEC.
-		 *
-		 * We could return 0 and let the network stack handle this
-		 * check, but it is safer to return a proper error and test
-		 * consistency thanks to kselftest.
-		 */
-		if (address->sa_family != sk->sk_family)
-			return -EINVAL;
-
-		return check_access_port(dom, port,
-					 LANDLOCK_ACCESS_NET_CONNECT_TCP);
-	}
-	return 0;
-}
-
-static struct security_hook_list landlock_hooks[] __ro_after_init = {
-	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
-	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
-};
-
-__init void landlock_add_net_hooks(void)
-{
-	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
-			   &landlock_lsmid);
-}
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Network management and hooks
+ *
+ * Copyright © 2022-2023 Huawei Tech. Co., Ltd.
+ * Copyright © 2022-2023 Microsoft Corporation
+ */
+
+#include <linux/in.h>
+#include <linux/net.h>
+#include <linux/socket.h>
+#include <net/ipv6.h>
+
+#include "common.h"
+#include "cred.h"
+#include "limits.h"
+#include "net.h"
+#include "ruleset.h"
+
+int landlock_append_net_rule(struct landlock_ruleset *const ruleset,
+			     const u16 port, access_mask_t access_rights)
+{
+	int err;
+	const struct landlock_id id = {
+		.key.data = (__force uintptr_t)htons(port),
+		.type = LANDLOCK_KEY_NET_PORT,
+	};
+
+	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
+
+	/* Transforms relative access rights to absolute ones. */
+	access_rights |= LANDLOCK_MASK_ACCESS_NET &
+			 ~landlock_get_net_access_mask(ruleset, 0);
+
+	mutex_lock(&ruleset->lock);
+	err = landlock_insert_rule(ruleset, id, access_rights);
+	mutex_unlock(&ruleset->lock);
+
+	return err;
+}
+
+static const struct landlock_ruleset *get_current_net_domain(void)
+{
+	const union access_masks any_net = {
+		.net = ~0,
+	};
+
+	return landlock_match_ruleset(landlock_get_current_domain(), any_net);
+}
+
+static int check_access_port(const struct landlock_ruleset *const dom,
+			     __be16 port, access_mask_t access_request)
+{
+	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_NET] = {};
+	const struct landlock_rule *rule;
+	struct landlock_id id = {
+		.type = LANDLOCK_KEY_NET_PORT,
+	};
+
+	id.key.data = (__force uintptr_t)port;
+	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
+
+	rule = landlock_find_rule(dom, id);
+	access_request = landlock_init_layer_masks(
+		dom, access_request, &layer_masks, LANDLOCK_KEY_NET_PORT);
+	if (landlock_unmask_layers(rule, access_request, &layer_masks,
+				   ARRAY_SIZE(layer_masks)))
+		return 0;
+
+	return -EACCES;
+}
+
+/*
+ * Checks that TCP @sock and @address attributes are correct for bind(2).
+ *
+ * On success, extracts port from @address in @port and returns 0.
+ *
+ * This validation is consistent with network stack and returns the error
+ * in the order corresponding to the order of errors from the network stack.
+ * It's required to not wrongfully return -EACCES instead of meaningful network
+ * stack level errors. Consistency is tested with kselftest.
+ *
+ * This helper does not provide consistency of error codes for BPF filter
+ * (if any).
+ */
+static int
+check_tcp_bind_consistency_and_get_port(struct socket *const sock,
+					struct sockaddr *const address,
+					const int addrlen, __be16 *port)
+{
+	/* IPV6_ADDRFORM can change sk->sk_family under us. */
+	switch (READ_ONCE(sock->sk->sk_family)) {
+	case AF_INET:
+		const struct sockaddr_in *const addr =
+			(struct sockaddr_in *)address;
+
+		/* Cf. inet_bind_sk(). */
+		if (addrlen < sizeof(struct sockaddr_in))
+			return -EINVAL;
+		/*
+		 * For compatibility reason, accept AF_UNSPEC for bind
+		 * accesses (mapped to AF_INET) only if the address is
+		 * INADDR_ANY (cf. __inet_bind).
+		 */
+		if (addr->sin_family != AF_INET) {
+			if (addr->sin_family != AF_UNSPEC ||
+			    addr->sin_addr.s_addr != htonl(INADDR_ANY))
+				return -EAFNOSUPPORT;
+		}
+		*port = ((struct sockaddr_in *)address)->sin_port;
+		break;
+#if IS_ENABLED(CONFIG_IPV6)
+	case AF_INET6:
+		/* Cf. inet6_bind_sk(). */
+		if (addrlen < SIN6_LEN_RFC2133)
+			return -EINVAL;
+		/* Cf. __inet6_bind(). */
+		if (address->sa_family != AF_INET6)
+			return -EAFNOSUPPORT;
+		*port = ((struct sockaddr_in6 *)address)->sin6_port;
+		break;
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+	default:
+		WARN_ON_ONCE(0);
+		return -EACCES;
+	}
+	return 0;
+}
+
+/*
+ * Checks that TCP @sock and @address attributes are correct for connect(2).
+ *
+ * On success, extracts port from @address in @port and returns 0.
+ *
+ * This validation is consistent with network stack and returns the error
+ * in the order corresponding to the order of errors from the network stack.
+ * It's required to not wrongfully return -EACCES instead of meaningful network
+ * stack level error. Consistency is partially tested with kselftest.
+ *
+ * This helper does not provide consistency of error codes for BPF filter
+ * (if any).
+ *
+ * The function holds socket lock while checking the socket state.
+ */
+static int
+check_tcp_connect_consistency_and_get_port(struct socket *const sock,
+					   struct sockaddr *const address,
+					   const int addrlen, __be16 *port)
+{
+	int err = 0;
+	struct sock *const sk = sock->sk;
+
+	/* Cf. __inet_stream_connect(). */
+	lock_sock(sk);
+	switch (sock->state) {
+	default:
+		err = -EINVAL;
+		break;
+	case SS_CONNECTED:
+		err = -EISCONN;
+		break;
+	case SS_CONNECTING:
+		/*
+		 * Calling connect(2) on nonblocking socket with SYN_SENT or SYN_RECV
+		 * state immediately returns -EISCONN and -EALREADY (Cf. __inet_stream_connect()).
+		 *
+		 * This check is not tested with kselftests.
+		 */
+		if ((sock->file->f_flags & O_NONBLOCK) &&
+		    ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) {
+			if (inet_test_bit(DEFER_CONNECT, sk))
+				err = -EISCONN;
+			else
+				err = -EALREADY;
+			break;
+		}
+
+		/*
+		 * Current state is possible in two cases:
+		 * 1. connect(2) is called upon nonblocking socket and previous
+		 *    connection attempt was closed by RST packet (therefore socket is
+		 *    in TCP_CLOSE state). In this case connect(2) calls
+		 *    sk_prot->disconnect(), changes socket state and increases number
+		 *    of disconnects.
+		 * 2. connect(2) is called twice upon socket with TCP_FASTOPEN_CONNECT
+		 *    option set. If socket state is TCP_CLOSE connect(2) does the
+		 *    same logic as in point 1 case. Otherwise connect(2) may freeze
+		 *    after inet_wait_for_connect() call since SYN was never sent.
+		 *
+		 * For both this cases Landlock cannot provide error consistency since
+		 * 1. Both cases involve executing some network stack logic and changing
+		 *    the socket state.
+		 * 2. It cannot omit access check and allow network stack handle error
+		 *    consistency since socket can change its state to SS_UNCONNECTED
+		 *    before it will be locked again in inet_stream_connect().
+		 *
+		 * Therefore it is only possible to return 0 and check access right with
+		 * check_access_port() helper.
+		 */
+		release_sock(sk);
+		return 0;
+	case SS_UNCONNECTED:
+		if (sk->sk_state != TCP_CLOSE)
+			err = -EISCONN;
+		break;
+	}
+	release_sock(sk);
+
+	if (err)
+		return err;
+
+	/* IPV6_ADDRFORM can change sk->sk_family under us. */
+	switch (READ_ONCE(sk->sk_family)) {
+	case AF_INET:
+		/* Cf. tcp_v4_connect(). */
+		if (addrlen < sizeof(struct sockaddr_in))
+			return -EINVAL;
+		if (address->sa_family != AF_INET)
+			return -EAFNOSUPPORT;
+
+		*port = ((struct sockaddr_in *)address)->sin_port;
+		break;
+#if IS_ENABLED(CONFIG_IPV6)
+	case AF_INET6:
+		/* Cf. tcp_v6_connect(). */
+		if (addrlen < SIN6_LEN_RFC2133)
+			return -EINVAL;
+		if (address->sa_family != AF_INET6)
+			return -EAFNOSUPPORT;
+
+		*port = ((struct sockaddr_in6 *)address)->sin6_port;
+		break;
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+	default:
+		WARN_ON_ONCE(0);
+		return -EACCES;
+	}
+
+	return 0;
+}
+
+static int hook_socket_bind(struct socket *const sock,
+			    struct sockaddr *const address, const int addrlen)
+{
+	int err;
+	__be16 port;
+	const struct landlock_ruleset *const dom = get_current_net_domain();
+
+	if (!dom)
+		return 0;
+	if (WARN_ON_ONCE(dom->num_layers < 1))
+		return -EACCES;
+
+	if (sk_is_tcp(sock->sk)) {
+		err = check_tcp_bind_consistency_and_get_port(sock, address,
+							      addrlen, &port);
+		if (err)
+			return err;
+		return check_access_port(dom, port,
+					 LANDLOCK_ACCESS_NET_BIND_TCP);
+	}
+	return 0;
+}
+
+static int hook_socket_connect(struct socket *const sock,
+			       struct sockaddr *const address,
+			       const int addrlen)
+{
+	int err;
+	__be16 port;
+	const struct landlock_ruleset *const dom = get_current_net_domain();
+
+	if (!dom)
+		return 0;
+	if (WARN_ON_ONCE(dom->num_layers < 1))
+		return -EACCES;
+
+	if (sk_is_tcp(sock->sk)) {
+		/* Checks for minimal header length to safely read sa_family. */
+		if (addrlen < sizeof(address->sa_family))
+			return -EINVAL;
+		/*
+		 * Connecting to an address with AF_UNSPEC dissolves the TCP
+		 * association, which have the same effect as closing the
+		 * connection while retaining the socket object (i.e., the file
+		 * descriptor).  As for dropping privileges, closing
+		 * connections is always allowed.
+		 *
+		 * For a TCP access control system, this request is legitimate.
+		 * Let the network stack handle potential inconsistencies and
+		 * return -EINVAL if needed.
+		 */
+		if (address->sa_family == AF_UNSPEC)
+			return 0;
+
+		err = check_tcp_connect_consistency_and_get_port(
+			sock, address, addrlen, &port);
+		if (err)
+			return err;
+		return check_access_port(dom, port,
+					 LANDLOCK_ACCESS_NET_CONNECT_TCP);
+	}
+	return 0;
+}
+
+static struct security_hook_list landlock_hooks[] __ro_after_init = {
+	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
+	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
+};
+
+__init void landlock_add_net_hooks(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			   &landlock_lsmid);
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v2 4/8] selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP
  2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
                   ` (2 preceding siblings ...)
  2024-10-17 11:04 ` [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions Mikhail Ivanov
@ 2024-10-17 11:04 ` Mikhail Ivanov
  2024-10-17 11:04 ` [RFC PATCH v2 5/8] selftests/landlock: Test that MPTCP actions are not restricted Mikhail Ivanov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Extend protocol_variant structure with protocol field (Cf. socket(2)).

Extend protocol fixture with TCP test suits with protocol=IPPROTO_TCP
which can be used as an alias for IPPROTO_IP (=0) in socket(2).

Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---
 tools/testing/selftests/landlock/common.h   |  1 +
 tools/testing/selftests/landlock/net_test.c | 80 +++++++++++++++++----
 2 files changed, 67 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
index 61056fa074bb..40a2def50b83 100644
--- a/tools/testing/selftests/landlock/common.h
+++ b/tools/testing/selftests/landlock/common.h
@@ -234,6 +234,7 @@ enforce_ruleset(struct __test_metadata *const _metadata, const int ruleset_fd)
 struct protocol_variant {
 	int domain;
 	int type;
+	int protocol;
 };
 
 struct service_fixture {
diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index 4e0aeb53b225..333263780fae 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -85,18 +85,18 @@ static void setup_loopback(struct __test_metadata *const _metadata)
 	clear_ambient_cap(_metadata, CAP_NET_ADMIN);
 }
 
+static bool prot_is_tcp(const struct protocol_variant *const prot)
+{
+	return (prot->domain == AF_INET || prot->domain == AF_INET6) &&
+	       prot->type == SOCK_STREAM &&
+	       (prot->protocol == IPPROTO_TCP || prot->protocol == IPPROTO_IP);
+}
+
 static bool is_restricted(const struct protocol_variant *const prot,
 			  const enum sandbox_type sandbox)
 {
-	switch (prot->domain) {
-	case AF_INET:
-	case AF_INET6:
-		switch (prot->type) {
-		case SOCK_STREAM:
-			return sandbox == TCP_SANDBOX;
-		}
-		break;
-	}
+	if (sandbox == TCP_SANDBOX)
+		return prot_is_tcp(prot);
 	return false;
 }
 
@@ -105,7 +105,7 @@ static int socket_variant(const struct service_fixture *const srv)
 	int ret;
 
 	ret = socket(srv->protocol.domain, srv->protocol.type | SOCK_CLOEXEC,
-		     0);
+		     srv->protocol.protocol);
 	if (ret < 0)
 		return -errno;
 	return ret;
@@ -290,22 +290,48 @@ FIXTURE_TEARDOWN(protocol)
 }
 
 /* clang-format off */
-FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_tcp) {
+FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_tcp1) {
 	/* clang-format on */
 	.sandbox = NO_SANDBOX,
 	.prot = {
 		.domain = AF_INET,
 		.type = SOCK_STREAM,
+		/* IPPROTO_IP == 0 */
+		.protocol = IPPROTO_IP,
 	},
 };
 
 /* clang-format off */
-FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_tcp) {
+FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_tcp2) {
+	/* clang-format on */
+	.sandbox = NO_SANDBOX,
+	.prot = {
+		.domain = AF_INET,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_TCP,
+	},
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_tcp1) {
 	/* clang-format on */
 	.sandbox = NO_SANDBOX,
 	.prot = {
 		.domain = AF_INET6,
 		.type = SOCK_STREAM,
+		/* IPPROTO_IP == 0 */
+		.protocol = IPPROTO_IP,
+	},
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_tcp2) {
+	/* clang-format on */
+	.sandbox = NO_SANDBOX,
+	.prot = {
+		.domain = AF_INET6,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_TCP,
 	},
 };
 
@@ -350,22 +376,48 @@ FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_unix_datagram) {
 };
 
 /* clang-format off */
-FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_tcp) {
+FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_tcp1) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.prot = {
+		.domain = AF_INET,
+		.type = SOCK_STREAM,
+		/* IPPROTO_IP == 0 */
+		.protocol = IPPROTO_IP,
+	},
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_tcp2) {
 	/* clang-format on */
 	.sandbox = TCP_SANDBOX,
 	.prot = {
 		.domain = AF_INET,
 		.type = SOCK_STREAM,
+		.protocol = IPPROTO_TCP,
+	},
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_tcp1) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.prot = {
+		.domain = AF_INET6,
+		.type = SOCK_STREAM,
+		/* IPPROTO_IP == 0 */
+		.protocol = IPPROTO_IP,
 	},
 };
 
 /* clang-format off */
-FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_tcp) {
+FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_tcp2) {
 	/* clang-format on */
 	.sandbox = TCP_SANDBOX,
 	.prot = {
 		.domain = AF_INET6,
 		.type = SOCK_STREAM,
+		.protocol = IPPROTO_TCP,
 	},
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v2 5/8] selftests/landlock: Test that MPTCP actions are not restricted
  2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
                   ` (3 preceding siblings ...)
  2024-10-17 11:04 ` [RFC PATCH v2 4/8] selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP Mikhail Ivanov
@ 2024-10-17 11:04 ` Mikhail Ivanov
  2024-10-17 11:04 ` [RFC PATCH v2 6/8] selftests/landlock: Test consistency of errors for TCP actions Mikhail Ivanov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Extend protocol fixture with test suits for MPTCP protocol.
Add CONFIG_MPTCP and CONFIG_MPTCP_IPV6 options in config.

Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---

Changes since v1:
* Removes SMC test suits and puts SCTP test suits in a separate commit.
---
 tools/testing/selftests/landlock/config     |  2 +
 tools/testing/selftests/landlock/net_test.c | 44 +++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
index 29af19c4e9f9..a8982da4acbd 100644
--- a/tools/testing/selftests/landlock/config
+++ b/tools/testing/selftests/landlock/config
@@ -3,6 +3,8 @@ CONFIG_CGROUP_SCHED=y
 CONFIG_INET=y
 CONFIG_IPV6=y
 CONFIG_KEYS=y
+CONFIG_MPTCP=y
+CONFIG_MPTCP_IPV6=y
 CONFIG_NET=y
 CONFIG_NET_NS=y
 CONFIG_OVERLAY_FS=y
diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index 333263780fae..d9de0ee49ebc 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -312,6 +312,17 @@ FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_tcp2) {
 	},
 };
 
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_mptcp) {
+	/* clang-format on */
+	.sandbox = NO_SANDBOX,
+	.prot = {
+		.domain = AF_INET,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_MPTCP,
+	},
+};
+
 /* clang-format off */
 FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_tcp1) {
 	/* clang-format on */
@@ -335,6 +346,17 @@ FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_tcp2) {
 	},
 };
 
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_mptcp) {
+	/* clang-format on */
+	.sandbox = NO_SANDBOX,
+	.prot = {
+		.domain = AF_INET6,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_MPTCP,
+	},
+};
+
 /* clang-format off */
 FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_udp) {
 	/* clang-format on */
@@ -398,6 +420,17 @@ FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_tcp2) {
 	},
 };
 
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_mptcp) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.prot = {
+		.domain = AF_INET,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_MPTCP,
+	},
+};
+
 /* clang-format off */
 FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_tcp1) {
 	/* clang-format on */
@@ -421,6 +454,17 @@ FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_tcp2) {
 	},
 };
 
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_mptcp) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.prot = {
+		.domain = AF_INET6,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_MPTCP,
+	},
+};
+
 /* clang-format off */
 FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_udp) {
 	/* clang-format on */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v2 6/8] selftests/landlock: Test consistency of errors for TCP actions
  2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
                   ` (4 preceding siblings ...)
  2024-10-17 11:04 ` [RFC PATCH v2 5/8] selftests/landlock: Test that MPTCP actions are not restricted Mikhail Ivanov
@ 2024-10-17 11:04 ` Mikhail Ivanov
  2024-12-10 18:07   ` Mickaël Salaün
  2024-10-17 11:04 ` [RFC PATCH v2 7/8] landlock: Add note about errors consistency in documentation Mikhail Ivanov
  2024-10-17 11:04 ` [RFC PATCH v2 8/8] selftests/landlock: Test that SCTP actions are not restricted Mikhail Ivanov
  7 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Add tcp_errors_consistency fixture for TCP errors consistency tests.

Add 6 test suits for this fixture to configure tested address family of
socket (ipv4 or ipv6), sandboxed mode and whether TCP action is allowed
in a sandboxed mode.

Add tests which validate errors consistency provided by Landlock for
bind(2) and connect(2) restrictable TCP actions.

Add sys_bind(), sys_connect() helpers for convenient checks of bind(2)
and connect(2). Add set_ipv4_tcp_address(), set_ipv6_tcp_address()
helpers.

Add CONFIG_LSM="landlock" option in config. Some LSMs (e.g. SElinux)
can be loaded before Landlock and return inconsistent error code for
bind(2) and connect(2) calls.

Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---
 tools/testing/selftests/landlock/config     |   1 +
 tools/testing/selftests/landlock/net_test.c | 329 +++++++++++++++++++-
 2 files changed, 324 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
index a8982da4acbd..52988e8a56cc 100644
--- a/tools/testing/selftests/landlock/config
+++ b/tools/testing/selftests/landlock/config
@@ -3,6 +3,7 @@ CONFIG_CGROUP_SCHED=y
 CONFIG_INET=y
 CONFIG_IPV6=y
 CONFIG_KEYS=y
+CONFIG_LSM="landlock"
 CONFIG_MPTCP=y
 CONFIG_MPTCP_IPV6=y
 CONFIG_NET=y
diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index d9de0ee49ebc..30b29bf10bdc 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -36,6 +36,22 @@ enum sandbox_type {
 	TCP_SANDBOX,
 };
 
+static void set_ipv4_tcp_address(const struct service_fixture *const srv,
+				 struct sockaddr_in *ipv4_addr)
+{
+	ipv4_addr->sin_family = srv->protocol.domain;
+	ipv4_addr->sin_port = htons(srv->port);
+	ipv4_addr->sin_addr.s_addr = inet_addr(loopback_ipv4);
+}
+
+static void set_ipv6_tcp_address(const struct service_fixture *const srv,
+				 struct sockaddr_in6 *ipv6_addr)
+{
+	ipv6_addr->sin6_family = srv->protocol.domain;
+	ipv6_addr->sin6_port = htons(srv->port);
+	inet_pton(AF_INET6, loopback_ipv6, &ipv6_addr->sin6_addr);
+}
+
 static int set_service(struct service_fixture *const srv,
 		       const struct protocol_variant prot,
 		       const unsigned short index)
@@ -56,15 +72,11 @@ static int set_service(struct service_fixture *const srv,
 	switch (prot.domain) {
 	case AF_UNSPEC:
 	case AF_INET:
-		srv->ipv4_addr.sin_family = prot.domain;
-		srv->ipv4_addr.sin_port = htons(srv->port);
-		srv->ipv4_addr.sin_addr.s_addr = inet_addr(loopback_ipv4);
+		set_ipv4_tcp_address(srv, &srv->ipv4_addr);
 		return 0;
 
 	case AF_INET6:
-		srv->ipv6_addr.sin6_family = prot.domain;
-		srv->ipv6_addr.sin6_port = htons(srv->port);
-		inet_pton(AF_INET6, loopback_ipv6, &srv->ipv6_addr.sin6_addr);
+		set_ipv6_tcp_address(srv, &srv->ipv6_addr);
 		return 0;
 
 	case AF_UNIX:
@@ -181,6 +193,17 @@ static uint16_t get_binded_port(int socket_fd,
 	}
 }
 
+static int sys_bind(const int sock_fd, const struct sockaddr *addr,
+		    socklen_t addrlen)
+{
+	int ret;
+
+	ret = bind(sock_fd, addr, addrlen);
+	if (ret < 0)
+		return -errno;
+	return 0;
+}
+
 static int bind_variant_addrlen(const int sock_fd,
 				const struct service_fixture *const srv,
 				const socklen_t addrlen)
@@ -217,6 +240,17 @@ static int bind_variant(const int sock_fd,
 	return bind_variant_addrlen(sock_fd, srv, get_addrlen(srv, false));
 }
 
+static int sys_connect(const int sock_fd, const struct sockaddr *addr,
+		       socklen_t addrlen)
+{
+	int ret;
+
+	ret = connect(sock_fd, addr, addrlen);
+	if (ret < 0)
+		return -errno;
+	return 0;
+}
+
 static int connect_variant_addrlen(const int sock_fd,
 				   const struct service_fixture *const srv,
 				   const socklen_t addrlen)
@@ -923,6 +957,289 @@ TEST_F(protocol, connect_unspec)
 	EXPECT_EQ(0, close(bind_fd));
 }
 
+FIXTURE(tcp_errors_consistency)
+{
+	struct service_fixture srv0, srv1;
+	struct sockaddr *inval_addr_p0;
+	socklen_t addrlen_min;
+
+	struct sockaddr_in inval_ipv4_addr;
+	struct sockaddr_in6 inval_ipv6_addr;
+};
+
+FIXTURE_VARIANT(tcp_errors_consistency)
+{
+	const enum sandbox_type sandbox;
+	const int domain;
+	bool allowed;
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(tcp_errors_consistency, no_sandbox_with_ipv4) {
+	/* clang-format on */
+	.sandbox = NO_SANDBOX,
+	.domain = AF_INET,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(tcp_errors_consistency, no_sandbox_with_ipv6) {
+	/* clang-format on */
+	.sandbox = NO_SANDBOX,
+	.domain = AF_INET6,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(tcp_errors_consistency, denied_with_ipv4) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.domain = AF_INET,
+	.allowed = false,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(tcp_errors_consistency, allowed_with_ipv4) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.domain = AF_INET,
+	.allowed = true,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(tcp_errors_consistency, denied_with_ipv6) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.domain = AF_INET6,
+	.allowed = false,
+};
+
+/* clang-format off */
+FIXTURE_VARIANT_ADD(tcp_errors_consistency, allowed_with_ipv6) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.domain = AF_INET6,
+	.allowed = true,
+};
+
+FIXTURE_SETUP(tcp_errors_consistency)
+{
+	const struct protocol_variant tcp_prot = {
+		.domain = variant->domain,
+		.type = SOCK_STREAM,
+	};
+
+	disable_caps(_metadata);
+
+	set_service(&self->srv0, tcp_prot, 0);
+	set_service(&self->srv1, tcp_prot, 1);
+
+	if (variant->domain == AF_INET) {
+		set_ipv4_tcp_address(&self->srv0, &self->inval_ipv4_addr);
+		self->inval_ipv4_addr.sin_family = AF_INET6;
+
+		self->inval_addr_p0 = (struct sockaddr *)&self->inval_ipv4_addr;
+		self->addrlen_min = sizeof(struct sockaddr_in);
+	} else {
+		set_ipv6_tcp_address(&self->srv0, &self->inval_ipv6_addr);
+		self->inval_ipv6_addr.sin6_family = AF_INET;
+
+		self->inval_addr_p0 = (struct sockaddr *)&self->inval_ipv6_addr;
+		self->addrlen_min = SIN6_LEN_RFC2133;
+	}
+
+	setup_loopback(_metadata);
+};
+
+FIXTURE_TEARDOWN(tcp_errors_consistency)
+{
+}
+
+/*
+ * Validates that Landlock provides errors consistency for bind(2) operation
+ * (not restricted, allowed and denied).
+ *
+ * Error consistency implies that in sandboxed process, bind(2) returns the same
+ * errors and in the same order (assuming multiple errors) as during normal
+ * execution.
+ */
+TEST_F(tcp_errors_consistency, bind)
+{
+	if (variant->sandbox == TCP_SANDBOX) {
+		const struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP,
+		};
+		int ruleset_fd;
+
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		ASSERT_LE(0, ruleset_fd);
+
+		if (variant->allowed) {
+			const struct landlock_net_port_attr tcp_bind_p0 = {
+				.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
+				.port = self->srv0.port,
+			};
+
+			/* Allows bind for the first port. */
+			ASSERT_EQ(0, landlock_add_rule(ruleset_fd,
+						       LANDLOCK_RULE_NET_PORT,
+						       &tcp_bind_p0, 0));
+		}
+
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+	}
+	int sock_fd;
+
+	sock_fd = socket_variant(&self->srv0);
+	ASSERT_LE(0, sock_fd);
+
+	/*
+	 * Tries to bind socket to address with invalid sa_family value
+	 * (AF_INET for ipv6 socket and AF_INET6 for ipv4 socket).
+	 */
+	EXPECT_EQ(-EAFNOSUPPORT,
+		  sys_bind(sock_fd, self->inval_addr_p0, self->addrlen_min));
+
+	if (variant->domain == AF_INET) {
+		struct sockaddr_in ipv4_unspec_addr;
+
+		set_ipv4_tcp_address(&self->srv0, &ipv4_unspec_addr);
+		ipv4_unspec_addr.sin_family = AF_UNSPEC;
+		/*
+		 * Ipv4 bind(2) accepts AF_UNSPEC family in address only if address is
+		 * INADDR_ANY. Otherwise, returns -EAFNOSUPPORT.
+		 */
+		EXPECT_EQ(-EAFNOSUPPORT,
+			  sys_bind(sock_fd,
+				   (struct sockaddr *)&ipv4_unspec_addr,
+				   self->addrlen_min));
+	}
+
+	/* Tries to bind with too small addrlen (Cf. inet_bind_sk). */
+	EXPECT_EQ(-EINVAL, sys_bind(sock_fd, self->inval_addr_p0,
+				    self->addrlen_min - 1));
+
+	ASSERT_EQ(0, close(sock_fd));
+}
+
+/*
+ * Validates that Landlock provides errors consistency for connect(2) operation
+ * (not restricted, allowed and denied).
+ *
+ * Error consistency implies that in sandboxed process, connect(2) returns the
+ * same errors and in the same order (assuming multiple errors) as during normal
+ * execution.
+ */
+TEST_F(tcp_errors_consistency, connect)
+{
+	int nonblock_p0_fd;
+
+	nonblock_p0_fd = socket(variant->domain,
+				SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
+	ASSERT_LE(0, nonblock_p0_fd);
+
+	/* Tries to connect nonblocking socket before establishing ruleset. */
+	ASSERT_EQ(-EINPROGRESS, connect_variant(nonblock_p0_fd, &self->srv0));
+
+	if (variant->sandbox == TCP_SANDBOX) {
+		const struct landlock_ruleset_attr ruleset_attr = {
+			.handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP,
+		};
+		const struct landlock_net_port_attr tcp_connect_p1 = {
+			.allowed_access = LANDLOCK_ACCESS_NET_CONNECT_TCP,
+			.port = self->srv1.port,
+		};
+		int ruleset_fd;
+
+		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
+						     sizeof(ruleset_attr), 0);
+		ASSERT_LE(0, ruleset_fd);
+
+		/* Allows connect for the second port. */
+		ASSERT_EQ(0,
+			  landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
+					    &tcp_connect_p1, 0));
+
+		if (variant->allowed) {
+			const struct landlock_net_port_attr tcp_connect_p0 = {
+				.allowed_access =
+					LANDLOCK_ACCESS_NET_CONNECT_TCP,
+				.port = self->srv0.port,
+			};
+
+			/* Allows connect for the first port. */
+			ASSERT_EQ(0, landlock_add_rule(ruleset_fd,
+						       LANDLOCK_RULE_NET_PORT,
+						       &tcp_connect_p0, 0));
+		}
+
+		enforce_ruleset(_metadata, ruleset_fd);
+		EXPECT_EQ(0, close(ruleset_fd));
+	}
+	int client_p0_fd, client_p1_fd, server_p0_fd, server_p1_fd;
+
+	client_p0_fd = socket_variant(&self->srv0);
+	ASSERT_LE(0, client_p0_fd);
+	/*
+	 * Tries to connect socket to address with invalid sa_family value
+	 * (AF_INET for ipv6 socket and AF_INET6 for ipv4 socket).
+	 */
+	EXPECT_EQ(-EAFNOSUPPORT, sys_connect(client_p0_fd, self->inval_addr_p0,
+					     self->addrlen_min));
+
+	/* Tries to connect with too small addrlen. */
+	EXPECT_EQ(-EINVAL, sys_connect(client_p0_fd, self->inval_addr_p0,
+				       self->addrlen_min - 1));
+
+	/* Creates socket listening on zero port. */
+	server_p0_fd = socket_variant(&self->srv0);
+	ASSERT_LE(0, server_p0_fd);
+
+	ASSERT_EQ(0, bind_variant(server_p0_fd, &self->srv0));
+	ASSERT_EQ(0, listen(server_p0_fd, backlog));
+	/* Tries to connect listening socket. */
+	EXPECT_EQ(-EISCONN, sys_connect(server_p0_fd, self->inval_addr_p0,
+					self->addrlen_min - 1));
+
+	/* Creates socket listening on first port. */
+	server_p1_fd = socket_variant(&self->srv1);
+	ASSERT_LE(0, server_p1_fd);
+
+	ASSERT_EQ(0, bind_variant(server_p1_fd, &self->srv1));
+	ASSERT_EQ(0, listen(server_p1_fd, backlog));
+
+	client_p1_fd = socket_variant(&self->srv1);
+	ASSERT_LE(0, client_p1_fd);
+
+	/* Connects to server_p1_fd. */
+	ASSERT_EQ(0, connect_variant(client_p1_fd, &self->srv1));
+	/* Tries to connect already connected socket. */
+	EXPECT_EQ(-EISCONN, sys_connect(client_p1_fd, self->inval_addr_p0,
+					self->addrlen_min - 1));
+
+	/*
+	 * connect(2) is called upon nonblocking socket and previous connection
+	 * attempt was closed by RST packet. Landlock cannot provide error
+	 * consistency in this case (Cf. check_tcp_connect_consistency_and_get_port()).
+	 */
+	if (variant->sandbox == TCP_SANDBOX) {
+		EXPECT_EQ(-EACCES,
+			  connect_variant(nonblock_p0_fd, &self->srv0));
+	} else {
+		EXPECT_EQ(-ECONNREFUSED,
+			  connect_variant(nonblock_p0_fd, &self->srv0));
+	}
+
+	/* Tries to connect with zero as addrlen. */
+	EXPECT_EQ(-EINVAL, sys_connect(client_p0_fd, self->inval_addr_p0, 0));
+
+	ASSERT_EQ(0, close(client_p1_fd));
+	ASSERT_EQ(0, close(server_p1_fd));
+	ASSERT_EQ(0, close(server_p0_fd));
+	ASSERT_EQ(0, close(client_p0_fd));
+	ASSERT_EQ(0, close(nonblock_p0_fd));
+}
+
 FIXTURE(ipv4)
 {
 	struct service_fixture srv0, srv1;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v2 7/8] landlock: Add note about errors consistency in documentation
  2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
                   ` (5 preceding siblings ...)
  2024-10-17 11:04 ` [RFC PATCH v2 6/8] selftests/landlock: Test consistency of errors for TCP actions Mikhail Ivanov
@ 2024-10-17 11:04 ` Mikhail Ivanov
  2024-12-10 18:08   ` Mickaël Salaün
  2024-10-17 11:04 ` [RFC PATCH v2 8/8] selftests/landlock: Test that SCTP actions are not restricted Mikhail Ivanov
  7 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Add recommendation to specify Landlock first in CONFIG_LSM list, so user
can have better LSM errors consistency provided by Landlock.

Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---
 Documentation/userspace-api/landlock.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index bb7480a05e2c..0db5eee9bffa 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -610,7 +610,8 @@ time as the other security modules.  The list of security modules enabled by
 default is set with ``CONFIG_LSM``.  The kernel configuration should then
 contains ``CONFIG_LSM=landlock,[...]`` with ``[...]``  as the list of other
 potentially useful security modules for the running system (see the
-``CONFIG_LSM`` help).
+``CONFIG_LSM`` help). It is recommended to specify Landlock first of all other
+modules in CONFIG_LSM list since it provides better errors consistency.
 
 Boot time configuration
 -----------------------
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [RFC PATCH v2 8/8] selftests/landlock: Test that SCTP actions are not restricted
  2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
                   ` (6 preceding siblings ...)
  2024-10-17 11:04 ` [RFC PATCH v2 7/8] landlock: Add note about errors consistency in documentation Mikhail Ivanov
@ 2024-10-17 11:04 ` Mikhail Ivanov
  7 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:04 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

Extend protocol fixture with test suits for SCTP protocol.
Add CONFIG_IP_SCTP option in config.

Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
---
 tools/testing/selftests/landlock/config     |  1 +
 tools/testing/selftests/landlock/net_test.c | 83 ++++++++++++++++++---
 2 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
index 52988e8a56cc..a96d42dc850d 100644
--- a/tools/testing/selftests/landlock/config
+++ b/tools/testing/selftests/landlock/config
@@ -1,6 +1,7 @@
 CONFIG_CGROUPS=y
 CONFIG_CGROUP_SCHED=y
 CONFIG_INET=y
+CONFIG_IP_SCTP=y
 CONFIG_IPV6=y
 CONFIG_KEYS=y
 CONFIG_LSM="landlock"
diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index 30b29bf10bdc..fa382a2e3b58 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -97,13 +97,28 @@ static void setup_loopback(struct __test_metadata *const _metadata)
 	clear_ambient_cap(_metadata, CAP_NET_ADMIN);
 }
 
-static bool prot_is_tcp(const struct protocol_variant *const prot)
+static bool prot_is_inet_stream(const struct protocol_variant *const prot)
 {
 	return (prot->domain == AF_INET || prot->domain == AF_INET6) &&
-	       prot->type == SOCK_STREAM &&
+	       prot->type == SOCK_STREAM;
+}
+
+static bool prot_is_tcp(const struct protocol_variant *const prot)
+{
+	return prot_is_inet_stream(prot) &&
 	       (prot->protocol == IPPROTO_TCP || prot->protocol == IPPROTO_IP);
 }
 
+static bool prot_is_sctp(const struct protocol_variant *const prot)
+{
+	return prot_is_inet_stream(prot) && prot->protocol == IPPROTO_SCTP;
+}
+
+static bool prot_is_unix_stream(const struct protocol_variant *const prot)
+{
+	return prot->domain == AF_UNIX && prot->type == SOCK_STREAM;
+}
+
 static bool is_restricted(const struct protocol_variant *const prot,
 			  const enum sandbox_type sandbox)
 {
@@ -357,6 +372,17 @@ FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_mptcp) {
 	},
 };
 
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_sctp) {
+	/* clang-format on */
+	.sandbox = NO_SANDBOX,
+	.prot = {
+		.domain = AF_INET,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_SCTP,
+	},
+};
+
 /* clang-format off */
 FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_tcp1) {
 	/* clang-format on */
@@ -391,6 +417,17 @@ FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_mptcp) {
 	},
 };
 
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_sctp) {
+	/* clang-format on */
+	.sandbox = NO_SANDBOX,
+	.prot = {
+		.domain = AF_INET6,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_SCTP,
+	},
+};
+
 /* clang-format off */
 FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_udp) {
 	/* clang-format on */
@@ -465,6 +502,17 @@ FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_mptcp) {
 	},
 };
 
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_sctp) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.prot = {
+		.domain = AF_INET,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_SCTP,
+	},
+};
+
 /* clang-format off */
 FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_tcp1) {
 	/* clang-format on */
@@ -499,6 +547,17 @@ FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_mptcp) {
 	},
 };
 
+/* clang-format off */
+FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_sctp) {
+	/* clang-format on */
+	.sandbox = TCP_SANDBOX,
+	.prot = {
+		.domain = AF_INET6,
+		.type = SOCK_STREAM,
+		.protocol = IPPROTO_SCTP,
+	},
+};
+
 /* clang-format off */
 FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_udp) {
 	/* clang-format on */
@@ -793,7 +852,7 @@ TEST_F(protocol, bind_unspec)
 
 	/* Allowed bind on AF_UNSPEC/INADDR_ANY. */
 	ret = bind_variant(bind_fd, &self->unspec_any0);
-	if (variant->prot.domain == AF_INET) {
+	if (variant->prot.domain == AF_INET && !prot_is_sctp(&variant->prot)) {
 		EXPECT_EQ(0, ret)
 		{
 			TH_LOG("Failed to bind to unspec/any socket: %s",
@@ -819,7 +878,7 @@ TEST_F(protocol, bind_unspec)
 
 	/* Denied bind on AF_UNSPEC/INADDR_ANY. */
 	ret = bind_variant(bind_fd, &self->unspec_any0);
-	if (variant->prot.domain == AF_INET) {
+	if (variant->prot.domain == AF_INET && !prot_is_sctp(&variant->prot)) {
 		if (is_restricted(&variant->prot, variant->sandbox)) {
 			EXPECT_EQ(-EACCES, ret);
 		} else {
@@ -834,7 +893,7 @@ TEST_F(protocol, bind_unspec)
 	bind_fd = socket_variant(&self->srv0);
 	ASSERT_LE(0, bind_fd);
 	ret = bind_variant(bind_fd, &self->unspec_srv0);
-	if (variant->prot.domain == AF_INET) {
+	if (variant->prot.domain == AF_INET && !prot_is_sctp(&variant->prot)) {
 		EXPECT_EQ(-EAFNOSUPPORT, ret);
 	} else {
 		EXPECT_EQ(-EINVAL, ret)
@@ -899,17 +958,18 @@ TEST_F(protocol, connect_unspec)
 
 		/* Disconnects already connected socket, or set peer. */
 		ret = connect_variant(connect_fd, &self->unspec_any0);
-		if (self->srv0.protocol.domain == AF_UNIX &&
-		    self->srv0.protocol.type == SOCK_STREAM) {
+		if (prot_is_unix_stream(&variant->prot)) {
 			EXPECT_EQ(-EINVAL, ret);
+		} else if (prot_is_sctp(&variant->prot)) {
+			EXPECT_EQ(-EOPNOTSUPP, ret);
 		} else {
 			EXPECT_EQ(0, ret);
 		}
 
 		/* Tries to reconnect, or set peer. */
 		ret = connect_variant(connect_fd, &self->srv0);
-		if (self->srv0.protocol.domain == AF_UNIX &&
-		    self->srv0.protocol.type == SOCK_STREAM) {
+		if (prot_is_unix_stream(&variant->prot) ||
+		    prot_is_sctp(&variant->prot)) {
 			EXPECT_EQ(-EISCONN, ret);
 		} else {
 			EXPECT_EQ(0, ret);
@@ -926,9 +986,10 @@ TEST_F(protocol, connect_unspec)
 		}
 
 		ret = connect_variant(connect_fd, &self->unspec_any0);
-		if (self->srv0.protocol.domain == AF_UNIX &&
-		    self->srv0.protocol.type == SOCK_STREAM) {
+		if (prot_is_unix_stream(&variant->prot)) {
 			EXPECT_EQ(-EINVAL, ret);
+		} else if (prot_is_sctp(&variant->prot)) {
+			EXPECT_EQ(-EOPNOTSUPP, ret);
 		} else {
 			/* Always allowed to disconnect. */
 			EXPECT_EQ(0, ret);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions
  2024-10-17 11:04 ` [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions Mikhail Ivanov
@ 2024-10-17 11:34   ` Mikhail Ivanov
  2024-10-17 12:48   ` Tetsuo Handa
  2024-12-04 19:32   ` Mickaël Salaün
  2 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-17 11:34 UTC (permalink / raw)
  To: mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze

On 10/17/2024 2:04 PM, Mikhail Ivanov wrote:

[...]

> +static int
> +check_tcp_connect_consistency_and_get_port(struct socket *const sock,
> +					   struct sockaddr *const address,
> +					   const int addrlen, __be16 *port)
> +{
> +	int err = 0;
> +	struct sock *const sk = sock->sk;
> +
> +	/* Cf. __inet_stream_connect(). */
> +	lock_sock(sk);
> +	switch (sock->state) {
> +	default:
> +		err = -EINVAL;
> +		break;
> +	case SS_CONNECTED:
> +		err = -EISCONN;
> +		break;
> +	case SS_CONNECTING:
> +		/*
> +		 * Calling connect(2) on nonblocking socket with SYN_SENT or SYN_RECV
> +		 * state immediately returns -EISCONN and -EALREADY (Cf. __inet_stream_connect()).
> +		 *
> +		 * This check is not tested with kselftests.
> +		 */
> +		if ((sock->file->f_flags & O_NONBLOCK) &&
> +		    ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) {
> +			if (inet_test_bit(DEFER_CONNECT, sk))
> +				err = -EISCONN;
> +			else
> +				err = -EALREADY;
> +			break;
> +		}
> +
> +		/*
> +		 * Current state is possible in two cases:
> +		 * 1. connect(2) is called upon nonblocking socket and previous
> +		 *    connection attempt was closed by RST packet (therefore socket is
> +		 *    in TCP_CLOSE state). In this case connect(2) calls
> +		 *    sk_prot->disconnect(), changes socket state and increases number
> +		 *    of disconnects.
> +		 * 2. connect(2) is called twice upon socket with TCP_FASTOPEN_CONNECT
> +		 *    option set. If socket state is TCP_CLOSE connect(2) does the
> +		 *    same logic as in point 1 case. Otherwise connect(2) may freeze
> +		 *    after inet_wait_for_connect() call since SYN was never sent.
> +		 *
> +		 * For both this cases Landlock cannot provide error consistency since
> +		 * 1. Both cases involve executing some network stack logic and changing
> +		 *    the socket state.
> +		 * 2. It cannot omit access check and allow network stack handle error
> +		 *    consistency since socket can change its state to SS_UNCONNECTED
> +		 *    before it will be locked again in inet_stream_connect().
> +		 *
> +		 * Therefore it is only possible to return 0 and check access right with
> +		 * check_access_port() helper.
> +		 */
> +		release_sock(sk);
> +		return 0;

Returning 0 is incorrect since port was not extracted yet. Last two
lines should be replaced with a "break" to let further switch safely
extract a port.

This also requires fix in tcp_errors_consistency.connect kselftest.

> +	case SS_UNCONNECTED:
> +		if (sk->sk_state != TCP_CLOSE)
> +			err = -EISCONN;
> +		break;
> +	}
> +	release_sock(sk);
> +
> +	if (err)
> +		return err;
> +
> +	/* IPV6_ADDRFORM can change sk->sk_family under us. */
> +	switch (READ_ONCE(sk->sk_family)) {
> +	case AF_INET:
> +		/* Cf. tcp_v4_connect(). */
> +		if (addrlen < sizeof(struct sockaddr_in))
> +			return -EINVAL;
> +		if (address->sa_family != AF_INET)
> +			return -EAFNOSUPPORT;
> +
> +		*port = ((struct sockaddr_in *)address)->sin_port;
> +		break;
> +#if IS_ENABLED(CONFIG_IPV6)
> +	case AF_INET6:
> +		/* Cf. tcp_v6_connect(). */
> +		if (addrlen < SIN6_LEN_RFC2133)
> +			return -EINVAL;
> +		if (address->sa_family != AF_INET6)
> +			return -EAFNOSUPPORT;
> +
> +		*port = ((struct sockaddr_in6 *)address)->sin6_port;
> +		break;
> +#endif /* IS_ENABLED(CONFIG_IPV6) */
> +	default:
> +		WARN_ON_ONCE(0);
> +		return -EACCES;
> +	}
> +
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions
  2024-10-17 11:04 ` [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions Mikhail Ivanov
  2024-10-17 11:34   ` Mikhail Ivanov
@ 2024-10-17 12:48   ` Tetsuo Handa
  2024-11-06  9:27     ` Mikhail Ivanov
  2024-12-04 19:32   ` Mickaël Salaün
  2 siblings, 1 reply; 50+ messages in thread
From: Tetsuo Handa @ 2024-10-17 12:48 UTC (permalink / raw)
  To: Mikhail Ivanov, mic, gnoack; +Cc: linux-security-module

On 2024/10/17 20:04, Mikhail Ivanov wrote:
> +#endif /* IS_ENABLED(CONFIG_IPV6) */
> +	default:
> +		WARN_ON_ONCE(0);

WARN_ON_ONCE(0) is pointless.

> +		return -EACCES;
> +	}


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-10-17 11:04 ` [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction Mikhail Ivanov
@ 2024-10-17 12:59   ` Matthieu Baerts
  2024-10-18 18:08     ` Mickaël Salaün
  2024-12-04 19:30   ` Mickaël Salaün
  1 sibling, 1 reply; 50+ messages in thread
From: Matthieu Baerts @ 2024-10-17 12:59 UTC (permalink / raw)
  To: Mikhail Ivanov, mic, gnoack
  Cc: willemdebruijn.kernel, matthieu, linux-security-module, netdev,
	netfilter-devel, yusongping, artem.kuzin, konstantin.meskhidze,
	MPTCP Linux

Hi Mikhail and Landlock maintainers,

+cc MPTCP list.

On 17/10/2024 13:04, Mikhail Ivanov wrote:
> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> should not restrict bind(2) and connect(2) for non-TCP protocols
> (SCTP, MPTCP, SMC).

Thank you for the patch!

I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
see TCP packets with extra TCP options. On Linux, there is indeed a
dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
because we needed such dedicated socket to talk to the userspace.

I don't know Landlock well, but I think it is important to know that an
MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
will do a fallback to "plain" TCP if MPTCP is not supported by the other
peer or by a middlebox. It means that with this patch, if TCP is blocked
by Landlock, someone can simply force an application to create an MPTCP
socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
certainly work, even when connecting to a peer not supporting MPTCP.

Please note that I'm not against this modification -- especially here
when we remove restrictions around MPTCP sockets :) -- I'm just saying
it might be less confusing for users if MPTCP is considered as being
part of TCP. A bit similar to what someone would do with a firewall: if
TCP is blocked, MPTCP is blocked as well.

I understand that a future goal might probably be to have dedicated
restrictions for MPTCP and the other stream protocols (and/or for all
stream protocols like it was before this patch), but in the meantime, it
might be less confusing considering MPTCP as being part of TCP (I'm not
sure about the other stream protocols).


> sk_is_tcp() is used for this to check address family of the socket
> before doing INET-specific address length validation. This is required
> for error consistency.
> 
> Closes: https://github.com/landlock-lsm/linux/issues/40
> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")

I don't know how fixes are considered in Landlock, but should this patch
be considered as a fix? It might be surprising for someone who thought
all "stream" connections were blocked to have them unblocked when
updating to a minor kernel version, no?

(Personally, I would understand such behaviour change when upgrading to
a major version, and still, maybe only if there were alternatives to
continue having the same behaviour, e.g. a way to restrict all stream
sockets the same way, or something per stream socket. But that's just me
:) )

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-10-17 12:59   ` Matthieu Baerts
@ 2024-10-18 18:08     ` Mickaël Salaün
  2024-10-31 16:21       ` Mikhail Ivanov
  2024-12-04 19:27       ` Mickaël Salaün
  0 siblings, 2 replies; 50+ messages in thread
From: Mickaël Salaün @ 2024-10-18 18:08 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Mikhail Ivanov, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux

On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> Hi Mikhail and Landlock maintainers,
> 
> +cc MPTCP list.

Thanks, we should include this list in the next series.

> 
> On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > should not restrict bind(2) and connect(2) for non-TCP protocols
> > (SCTP, MPTCP, SMC).
> 
> Thank you for the patch!
> 
> I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> see TCP packets with extra TCP options. On Linux, there is indeed a
> dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> because we needed such dedicated socket to talk to the userspace.
> 
> I don't know Landlock well, but I think it is important to know that an
> MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> will do a fallback to "plain" TCP if MPTCP is not supported by the other
> peer or by a middlebox. It means that with this patch, if TCP is blocked
> by Landlock, someone can simply force an application to create an MPTCP
> socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> certainly work, even when connecting to a peer not supporting MPTCP.
> 
> Please note that I'm not against this modification -- especially here
> when we remove restrictions around MPTCP sockets :) -- I'm just saying
> it might be less confusing for users if MPTCP is considered as being
> part of TCP. A bit similar to what someone would do with a firewall: if
> TCP is blocked, MPTCP is blocked as well.

Good point!  I don't know well MPTCP but I think you're right.  Given
it's close relationship with TCP and the fallback mechanism, it would
make sense for users to not make a difference and it would avoid bypass
of misleading restrictions.  Moreover the Landlock rules are simple and
only control TCP ports, not peer addresses, which seems to be the main
evolution of MPTCP.

> 
> I understand that a future goal might probably be to have dedicated
> restrictions for MPTCP and the other stream protocols (and/or for all
> stream protocols like it was before this patch), but in the meantime, it
> might be less confusing considering MPTCP as being part of TCP (I'm not
> sure about the other stream protocols).

We need to take a closer look at the other stream protocols indeed.

> 
> 
> > sk_is_tcp() is used for this to check address family of the socket
> > before doing INET-specific address length validation. This is required
> > for error consistency.
> > 
> > Closes: https://github.com/landlock-lsm/linux/issues/40
> > Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
> 
> I don't know how fixes are considered in Landlock, but should this patch
> be considered as a fix? It might be surprising for someone who thought
> all "stream" connections were blocked to have them unblocked when
> updating to a minor kernel version, no?

Indeed.  The main issue was with the semantic/definition of
LANDLOCK_ACCESS_FS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
code with the documentation, one way or the other, preferably following
the principle of least astonishment.

> 
> (Personally, I would understand such behaviour change when upgrading to
> a major version, and still, maybe only if there were alternatives to

This "fix" needs to be backported, but we're not clear yet on what it
should be. :)

> continue having the same behaviour, e.g. a way to restrict all stream
> sockets the same way, or something per stream socket. But that's just me
> :) )

The documentation and the initial idea was to control TCP bind and
connect.  The kernel implementation does more than that, so we need to
synthronize somehow.

> 
> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-10-18 18:08     ` Mickaël Salaün
@ 2024-10-31 16:21       ` Mikhail Ivanov
  2024-11-08 17:16         ` David Laight
  2024-12-12 18:43         ` Mickaël Salaün
  2024-12-04 19:27       ` Mickaël Salaün
  1 sibling, 2 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-10-31 16:21 UTC (permalink / raw)
  To: Mickaël Salaün, Matthieu Baerts
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux

On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
> On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
>> Hi Mikhail and Landlock maintainers,
>>
>> +cc MPTCP list.
> 
> Thanks, we should include this list in the next series.
> 
>>
>> On 17/10/2024 13:04, Mikhail Ivanov wrote:
>>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
>>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
>>> should not restrict bind(2) and connect(2) for non-TCP protocols
>>> (SCTP, MPTCP, SMC).
>>
>> Thank you for the patch!
>>
>> I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
>> treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
>> see TCP packets with extra TCP options. On Linux, there is indeed a
>> dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
>> because we needed such dedicated socket to talk to the userspace.
>>
>> I don't know Landlock well, but I think it is important to know that an
>> MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
>> will do a fallback to "plain" TCP if MPTCP is not supported by the other
>> peer or by a middlebox. It means that with this patch, if TCP is blocked
>> by Landlock, someone can simply force an application to create an MPTCP
>> socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
>> certainly work, even when connecting to a peer not supporting MPTCP.
>>
>> Please note that I'm not against this modification -- especially here
>> when we remove restrictions around MPTCP sockets :) -- I'm just saying
>> it might be less confusing for users if MPTCP is considered as being
>> part of TCP. A bit similar to what someone would do with a firewall: if
>> TCP is blocked, MPTCP is blocked as well.
> 
> Good point!  I don't know well MPTCP but I think you're right.  Given
> it's close relationship with TCP and the fallback mechanism, it would
> make sense for users to not make a difference and it would avoid bypass
> of misleading restrictions.  Moreover the Landlock rules are simple and
> only control TCP ports, not peer addresses, which seems to be the main
> evolution of MPTCP. >
>>
>> I understand that a future goal might probably be to have dedicated
>> restrictions for MPTCP and the other stream protocols (and/or for all
>> stream protocols like it was before this patch), but in the meantime, it
>> might be less confusing considering MPTCP as being part of TCP (I'm not
>> sure about the other stream protocols).
> 
> We need to take a closer look at the other stream protocols indeed.
Hello! Sorry for the late reply, I was on a small business trip.

Thanks a lot for this catch, without doubt MPTCP should be controlled
with TCP access rights.

In that case, we should reconsider current semantics of TCP control.

Currently, it looks like this:
* LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port.
* LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to a
   remote port.

According to these definitions only TCP sockets should be restricted and
this is already provided by Landlock (considering observing commit)
(assuming that "TCP socket" := user space socket of IPPROTO_TCP
protocol).

AFAICS the two objectives of TCP access rights are to control
(1) which ports can be used for sending or receiving TCP packets
     (including SYN, ACK or other service packets).
(2) which ports can be used to establish TCP connection (performed by
     kernel network stack on server or client side).

In most cases denying (2) cause denying (1). Sending or receiving TCP
packets without initial 3-way handshake is only possible on RAW [1] or
PACKET [2] sockets. Usage of such sockets requires root privilligies, so
there is no point to control them with Landlock.

Therefore Landlock should only take care about case (2). For now
(please correct me if I'm wrong), we only considered control of
connection performed on user space plain TCP sockets (created with
IPPROTO_TCP).

TCP kernel sockets are generally used in the following ways:
* in a couple of other user space protocols (MPTCP, SMC, RDS)
* in a few network filesystems (e.g. NFS communication over TCP)

For the second case TCP connection is currently not restricted by
Landlock. This approach is may be correct, since NFS should not have
access to a plain TCP communication and TCP restriction of NFS may
be too implicit. Nevertheless, I think that restriction via current
access rights should be considered.

For the first case, each protocol use TCP differently, so they should
be considered separately.

In the case of MPTCP TCP internal sockets are used to establish
connection and exchange data between two network interfaces. MPTCP
allows to have multiple TCP connections between two MPTCP sockets by
connecting different network interfaces (e.g. WIFI and 3G).

Shared Memory Communication is a protocol that allows TCP applications
transparently use RDMA for communication [3]. TCP internal socket is
used to exchange service CLC messages when establishing SMC connection
(which seems harmless for sandboxing) and for communication in the case
of fallback. Fallback happens only if RDMA communication became
impossible (e.g. if RDMA capable RNIC card went down on host or peer
side). So, preventing TCP communication may be achieved by controlling
fallback mechanism.

Reliable Datagram Socket is connectionless protocol implemented by
Oracle [4]. It uses TCP stack or Infiniband to reliably deliever
datagrams. For every sendmsg(2), recvmsg(2) it establishes TCP
connection and use it to deliever splitted message.

In comparison with previous protocols, RDS sockets cannot be binded or
connected to special TCP ports (e.g. with bind(2), connect(2)). 16385
port is assigned to receiving side and sending side is binded to the
port allocated by the kernel (by using zero as port number).

It may be useful to restrict RDS-over-TCP with current access rights,
since it allows to perform TCP communication from user-space. But it
would be only possible to fully allow or deny sending/receiving
(since used ports are not controlled from user space).

Restricting any TCP connection in the kernel is probably simplest
design, but we should consider above cases to provide the most useful
one.

[1] https://man7.org/linux/man-pages/man7/raw.7.html
[2] https://man7.org/linux/man-pages/man7/packet.7.html
[3] https://datatracker.ietf.org/doc/html/rfc7609
[4] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html

> 
>>
>>
>>> sk_is_tcp() is used for this to check address family of the socket
>>> before doing INET-specific address length validation. This is required
>>> for error consistency.
>>>
>>> Closes: https://github.com/landlock-lsm/linux/issues/40
>>> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
>>
>> I don't know how fixes are considered in Landlock, but should this patch
>> be considered as a fix? It might be surprising for someone who thought
>> all "stream" connections were blocked to have them unblocked when
>> updating to a minor kernel version, no?
> 
> Indeed.  The main issue was with the semantic/definition of
> LANDLOCK_ACCESS_FS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
> code with the documentation, one way or the other, preferably following
> the principle of least astonishment.
> 
>>
>> (Personally, I would understand such behaviour change when upgrading to
>> a major version, and still, maybe only if there were alternatives to
> 
> This "fix" needs to be backported, but we're not clear yet on what it
> should be. :)
> 
>> continue having the same behaviour, e.g. a way to restrict all stream
>> sockets the same way, or something per stream socket. But that's just me
>> :) )
> 
> The documentation and the initial idea was to control TCP bind and
> connect.  The kernel implementation does more than that, so we need to
> synthronize somehow.
> 
>>
>> Cheers,
>> Matt
>> -- 
>> Sponsored by the NGI0 Core fund.
>>
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions
  2024-10-17 12:48   ` Tetsuo Handa
@ 2024-11-06  9:27     ` Mikhail Ivanov
  0 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-11-06  9:27 UTC (permalink / raw)
  To: Tetsuo Handa, mic, gnoack; +Cc: linux-security-module

On 10/17/2024 3:48 PM, Tetsuo Handa wrote:
> On 2024/10/17 20:04, Mikhail Ivanov wrote:
>> +#endif /* IS_ENABLED(CONFIG_IPV6) */
>> +	default:
>> +		WARN_ON_ONCE(0);
> 
> WARN_ON_ONCE(0) is pointless.

thanks! will be fixed

> 
>> +		return -EACCES;
>> +	}
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-10-31 16:21       ` Mikhail Ivanov
@ 2024-11-08 17:16         ` David Laight
  2024-12-04 19:29           ` Mickaël Salaün
  2024-12-12 18:43         ` Mickaël Salaün
  1 sibling, 1 reply; 50+ messages in thread
From: David Laight @ 2024-11-08 17:16 UTC (permalink / raw)
  To: 'Mikhail Ivanov', Mickaël Salaün,
	Matthieu Baerts, linux-sctp@vger.kernel.org
  Cc: gnoack@google.com, willemdebruijn.kernel@gmail.com,
	matthieu@buffet.re, linux-security-module@vger.kernel.org,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	yusongping@huawei.com, artem.kuzin@huawei.com,
	konstantin.meskhidze@huawei.com, MPTCP Linux

From: Mikhail Ivanov
> Sent: 31 October 2024 16:22
> 
> On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
> > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> >> Hi Mikhail and Landlock maintainers,
> >>
> >> +cc MPTCP list.
> >
> > Thanks, we should include this list in the next series.
> >
> >>
> >> On 17/10/2024 13:04, Mikhail Ivanov wrote:
> >>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> >>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> >>> should not restrict bind(2) and connect(2) for non-TCP protocols
> >>> (SCTP, MPTCP, SMC).

I suspect you should check all IP protocols.
After all if TCP is banned why should SCTP be allowed?
Maybe you should have a different (probably more severe) restriction on SCTP.
You'd also need to look at the socket options used to add additional
local and remote IP addresses to a connect attempt.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-10-18 18:08     ` Mickaël Salaün
  2024-10-31 16:21       ` Mikhail Ivanov
@ 2024-12-04 19:27       ` Mickaël Salaün
  2024-12-04 19:35         ` Mickaël Salaün
  1 sibling, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-04 19:27 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Mikhail Ivanov, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, David Laight

On Fri, Oct 18, 2024 at 08:08:12PM +0200, Mickaël Salaün wrote:
> On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > Hi Mikhail and Landlock maintainers,
> > 
> > +cc MPTCP list.
> 
> Thanks, we should include this list in the next series.
> 
> > 
> > On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > > should not restrict bind(2) and connect(2) for non-TCP protocols
> > > (SCTP, MPTCP, SMC).
> > 
> > Thank you for the patch!
> > 
> > I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> > treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> > see TCP packets with extra TCP options. On Linux, there is indeed a
> > dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> > because we needed such dedicated socket to talk to the userspace.
> > 
> > I don't know Landlock well, but I think it is important to know that an
> > MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> > will do a fallback to "plain" TCP if MPTCP is not supported by the other
> > peer or by a middlebox. It means that with this patch, if TCP is blocked
> > by Landlock, someone can simply force an application to create an MPTCP
> > socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> > certainly work, even when connecting to a peer not supporting MPTCP.
> > 
> > Please note that I'm not against this modification -- especially here
> > when we remove restrictions around MPTCP sockets :) -- I'm just saying
> > it might be less confusing for users if MPTCP is considered as being
> > part of TCP. A bit similar to what someone would do with a firewall: if
> > TCP is blocked, MPTCP is blocked as well.
> 
> Good point!  I don't know well MPTCP but I think you're right.  Given
> it's close relationship with TCP and the fallback mechanism, it would
> make sense for users to not make a difference and it would avoid bypass
> of misleading restrictions.  Moreover the Landlock rules are simple and
> only control TCP ports, not peer addresses, which seems to be the main
> evolution of MPTCP.

Thinking more about this, this makes sense from the point of view of the
network stack, but looking at external (potentially bogus) firewalls or
malware detection systems, it is something different.  If we don't
provide a way for users to differenciate the control of SCTP from TCP,
malicious use of SCTP could still bypass this kind of bogus security
appliances.  It would then be safer to stick to the protocol semantic by
clearly differenciating TCP from MPTCP (or any other protocol).

Mikhail, could you please send a new patch series containing one patch
to fix the kernel and another to extend tests?  We should also include
this rationale in the commit message.

> 
> > 
> > I understand that a future goal might probably be to have dedicated
> > restrictions for MPTCP and the other stream protocols (and/or for all
> > stream protocols like it was before this patch), but in the meantime, it
> > might be less confusing considering MPTCP as being part of TCP (I'm not
> > sure about the other stream protocols).
> 
> We need to take a closer look at the other stream protocols indeed.

It would be nice to add support for MPTCP too, but this will be treated
as a new Landlock feature (with a proper ABI bump).

> 
> > 
> > 
> > > sk_is_tcp() is used for this to check address family of the socket
> > > before doing INET-specific address length validation. This is required
> > > for error consistency.
> > > 
> > > Closes: https://github.com/landlock-lsm/linux/issues/40
> > > Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-11-08 17:16         ` David Laight
@ 2024-12-04 19:29           ` Mickaël Salaün
  0 siblings, 0 replies; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-04 19:29 UTC (permalink / raw)
  To: David Laight
  Cc: 'Mikhail Ivanov', Matthieu Baerts,
	linux-sctp@vger.kernel.org, gnoack@google.com,
	willemdebruijn.kernel@gmail.com, matthieu@buffet.re,
	linux-security-module@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, yusongping@huawei.com,
	artem.kuzin@huawei.com, konstantin.meskhidze@huawei.com,
	MPTCP Linux

On Fri, Nov 08, 2024 at 05:16:50PM +0000, David Laight wrote:
> From: Mikhail Ivanov
> > Sent: 31 October 2024 16:22
> > 
> > On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
> > > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > >> Hi Mikhail and Landlock maintainers,
> > >>
> > >> +cc MPTCP list.
> > >
> > > Thanks, we should include this list in the next series.
> > >
> > >>
> > >> On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > >>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > >>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > >>> should not restrict bind(2) and connect(2) for non-TCP protocols
> > >>> (SCTP, MPTCP, SMC).
> 
> I suspect you should check all IP protocols.
> After all if TCP is banned why should SCTP be allowed?
> Maybe you should have a different (probably more severe) restriction on SCTP.
> You'd also need to look at the socket options used to add additional
> local and remote IP addresses to a connect attempt.

Indeed, setsockopt()'s SCTP_SOCKOPT_BINDX_ADD and SCTP_SOCKOPT_CONNECTX
don't go through the socket_bind() nor socket_connect() LSM hooks bu the
security_sctp_bind_connect() hook instead.  This SCTP-specific hook is
not implemented for Landlock and the current implementation only
partially control such operations for SCTP.  This also make it clear
that we really need to stick to TCP-only for the TCP access rights.

It would be nice to add support for SCTP but we'll need to implement
security_sctp_bind_connect() and new tests with the setsockopt()
commands.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-10-17 11:04 ` [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction Mikhail Ivanov
  2024-10-17 12:59   ` Matthieu Baerts
@ 2024-12-04 19:30   ` Mickaël Salaün
  2024-12-09 10:19     ` Mikhail Ivanov
  1 sibling, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-04 19:30 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze

On Thu, Oct 17, 2024 at 07:04:47PM +0800, Mikhail Ivanov wrote:
> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> should not restrict bind(2) and connect(2) for non-TCP protocols
> (SCTP, MPTCP, SMC).
> 
> sk_is_tcp() is used for this to check address family of the socket
> before doing INET-specific address length validation. This is required
> for error consistency.
> 
> Closes: https://github.com/landlock-lsm/linux/issues/40
> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
> ---
> 
> Changes since v1:
> * Validate socket family (=INET{,6}) before any other checks
>   with sk_is_tcp().
> ---
>  security/landlock/net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/security/landlock/net.c b/security/landlock/net.c
> index fdc1bb0a9c5d..1e80782ba239 100644
> --- a/security/landlock/net.c
> +++ b/security/landlock/net.c
> @@ -66,8 +66,8 @@ static int current_check_access_socket(struct socket *const sock,
>  	if (WARN_ON_ONCE(dom->num_layers < 1))
>  		return -EACCES;
>  
> -	/* Checks if it's a (potential) TCP socket. */
> -	if (sock->type != SOCK_STREAM)
> +	/* Do not restrict non-TCP sockets. */

You can remove this comment because the following check is explicit.

> +	if (!sk_is_tcp(sock->sk))
>  		return 0;
>  
>  	/* Checks for minimal header length to safely read sa_family. */
> -- 
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions
  2024-10-17 11:04 ` [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions Mikhail Ivanov
  2024-10-17 11:34   ` Mikhail Ivanov
  2024-10-17 12:48   ` Tetsuo Handa
@ 2024-12-04 19:32   ` Mickaël Salaün
  2 siblings, 0 replies; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-04 19:32 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze

Something is wrong with this patch.

On Thu, Oct 17, 2024 at 07:04:49PM +0800, Mikhail Ivanov wrote:
> Add two helpers for TCP bind/connect accesses, which will serve to perform
> action-specific network stack level checks and safely extract the port from
> the address.
> 
> Return -EAFNOSUPPORT instead of -EINVAL in sin_family checks.
> 
> Check socket state before validating address for TCP connect access. This
> is necessary to follow the error order of network stack.
> 
> Read sk_family value from socket structure with READ_ONCE to safely handle
> IPV6_ADDRFORM case (see [1]).
> 
> [1] https://lore.kernel.org/all/20240202095404.183274-1-edumazet@google.com/
> 
> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
> ---
>  security/landlock/net.c | 543 +++++++++++++++++++++++-----------------
>  1 file changed, 315 insertions(+), 228 deletions(-)
>  rewrite security/landlock/net.c (37%)
> 
> diff --git a/security/landlock/net.c b/security/landlock/net.c
> dissimilarity index 37%
> index a3142f9b15ee..06791aba9196 100644
> --- a/security/landlock/net.c
> +++ b/security/landlock/net.c
> @@ -1,228 +1,315 @@
> -// SPDX-License-Identifier: GPL-2.0-only
> -/*
> - * Landlock LSM - Network management and hooks
> - *
> - * Copyright © 2022-2023 Huawei Tech. Co., Ltd.
> - * Copyright © 2022-2023 Microsoft Corporation
> - */
> -
> -#include <linux/in.h>
> -#include <linux/net.h>
> -#include <linux/socket.h>
> -#include <net/ipv6.h>
> -
> -#include "common.h"
> -#include "cred.h"
> -#include "limits.h"
> -#include "net.h"
> -#include "ruleset.h"
> -
> -int landlock_append_net_rule(struct landlock_ruleset *const ruleset,
> -			     const u16 port, access_mask_t access_rights)
> -{
> -	int err;
> -	const struct landlock_id id = {
> -		.key.data = (__force uintptr_t)htons(port),
> -		.type = LANDLOCK_KEY_NET_PORT,
> -	};
> -
> -	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
> -
> -	/* Transforms relative access rights to absolute ones. */
> -	access_rights |= LANDLOCK_MASK_ACCESS_NET &
> -			 ~landlock_get_net_access_mask(ruleset, 0);
> -
> -	mutex_lock(&ruleset->lock);
> -	err = landlock_insert_rule(ruleset, id, access_rights);
> -	mutex_unlock(&ruleset->lock);
> -
> -	return err;
> -}
> -
> -static const struct landlock_ruleset *get_current_net_domain(void)
> -{
> -	const union access_masks any_net = {
> -		.net = ~0,
> -	};
> -
> -	return landlock_match_ruleset(landlock_get_current_domain(), any_net);
> -}
> -
> -static int check_access_port(const struct landlock_ruleset *const dom,
> -			     __be16 port, access_mask_t access_request)
> -{
> -	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_NET] = {};
> -	const struct landlock_rule *rule;
> -	struct landlock_id id = {
> -		.type = LANDLOCK_KEY_NET_PORT,
> -	};
> -
> -	id.key.data = (__force uintptr_t)port;
> -	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
> -
> -	rule = landlock_find_rule(dom, id);
> -	access_request = landlock_init_layer_masks(
> -		dom, access_request, &layer_masks, LANDLOCK_KEY_NET_PORT);
> -	if (landlock_unmask_layers(rule, access_request, &layer_masks,
> -				   ARRAY_SIZE(layer_masks)))
> -		return 0;
> -
> -	return -EACCES;
> -}
> -
> -static int hook_socket_bind(struct socket *const sock,
> -			    struct sockaddr *const address, const int addrlen)
> -{
> -	__be16 port;
> -	struct sock *const sk = sock->sk;
> -	const struct landlock_ruleset *const dom = get_current_net_domain();
> -
> -	if (!dom)
> -		return 0;
> -	if (WARN_ON_ONCE(dom->num_layers < 1))
> -		return -EACCES;
> -
> -	if (sk_is_tcp(sk)) {
> -		/* Checks for minimal header length to safely read sa_family. */
> -		if (addrlen < offsetofend(typeof(*address), sa_family))
> -			return -EINVAL;
> -
> -		switch (address->sa_family) {
> -		case AF_UNSPEC:
> -		case AF_INET:
> -			if (addrlen < sizeof(struct sockaddr_in))
> -				return -EINVAL;
> -			port = ((struct sockaddr_in *)address)->sin_port;
> -			break;
> -
> -#if IS_ENABLED(CONFIG_IPV6)
> -		case AF_INET6:
> -			if (addrlen < SIN6_LEN_RFC2133)
> -				return -EINVAL;
> -			port = ((struct sockaddr_in6 *)address)->sin6_port;
> -			break;
> -#endif /* IS_ENABLED(CONFIG_IPV6) */
> -
> -		default:
> -			return 0;
> -		}
> -
> -		/*
> -		 * For compatibility reason, accept AF_UNSPEC for bind
> -		 * accesses (mapped to AF_INET) only if the address is
> -		 * INADDR_ANY (cf. __inet_bind).  Checking the address is
> -		 * required to not wrongfully return -EACCES instead of
> -		 * -EAFNOSUPPORT.
> -		 *
> -		 * We could return 0 and let the network stack handle these
> -		 * checks, but it is safer to return a proper error and test
> -		 * consistency thanks to kselftest.
> -		 */
> -		if (address->sa_family == AF_UNSPEC) {
> -			/* addrlen has already been checked for AF_UNSPEC. */
> -			const struct sockaddr_in *const sockaddr =
> -				(struct sockaddr_in *)address;
> -
> -			if (sk->sk_family != AF_INET)
> -				return -EINVAL;
> -
> -			if (sockaddr->sin_addr.s_addr != htonl(INADDR_ANY))
> -				return -EAFNOSUPPORT;
> -		} else {
> -			/*
> -			 * Checks sa_family consistency to not wrongfully return
> -			 * -EACCES instead of -EINVAL.  Valid sa_family changes are
> -			 * only (from AF_INET or AF_INET6) to AF_UNSPEC.
> -			 *
> -			 * We could return 0 and let the network stack handle this
> -			 * check, but it is safer to return a proper error and test
> -			 * consistency thanks to kselftest.
> -			 */
> -			if (address->sa_family != sk->sk_family)
> -				return -EINVAL;
> -		}
> -		return check_access_port(dom, port,
> -					 LANDLOCK_ACCESS_NET_BIND_TCP);
> -	}
> -	return 0;
> -}
> -
> -static int hook_socket_connect(struct socket *const sock,
> -			       struct sockaddr *const address,
> -			       const int addrlen)
> -{
> -	__be16 port;
> -	struct sock *const sk = sock->sk;
> -	const struct landlock_ruleset *const dom = get_current_net_domain();
> -
> -	if (!dom)
> -		return 0;
> -	if (WARN_ON_ONCE(dom->num_layers < 1))
> -		return -EACCES;
> -
> -	if (sk_is_tcp(sk)) {
> -		/* Checks for minimal header length to safely read sa_family. */
> -		if (addrlen < offsetofend(typeof(*address), sa_family))
> -			return -EINVAL;
> -
> -		switch (address->sa_family) {
> -		case AF_UNSPEC:
> -		case AF_INET:
> -			if (addrlen < sizeof(struct sockaddr_in))
> -				return -EINVAL;
> -			port = ((struct sockaddr_in *)address)->sin_port;
> -			break;
> -
> -#if IS_ENABLED(CONFIG_IPV6)
> -		case AF_INET6:
> -			if (addrlen < SIN6_LEN_RFC2133)
> -				return -EINVAL;
> -			port = ((struct sockaddr_in6 *)address)->sin6_port;
> -			break;
> -#endif /* IS_ENABLED(CONFIG_IPV6) */
> -
> -		default:
> -			return 0;
> -		}
> -
> -		/*
> -		 * Connecting to an address with AF_UNSPEC dissolves the TCP
> -		 * association, which have the same effect as closing the
> -		 * connection while retaining the socket object (i.e., the file
> -		 * descriptor).  As for dropping privileges, closing
> -		 * connections is always allowed.
> -		 *
> -		 * For a TCP access control system, this request is legitimate.
> -		 * Let the network stack handle potential inconsistencies and
> -		 * return -EINVAL if needed.
> -		 */
> -		if (address->sa_family == AF_UNSPEC)
> -			return 0;
> -		/*
> -		 * Checks sa_family consistency to not wrongfully return
> -		 * -EACCES instead of -EINVAL.  Valid sa_family changes are
> -		 * only (from AF_INET or AF_INET6) to AF_UNSPEC.
> -		 *
> -		 * We could return 0 and let the network stack handle this
> -		 * check, but it is safer to return a proper error and test
> -		 * consistency thanks to kselftest.
> -		 */
> -		if (address->sa_family != sk->sk_family)
> -			return -EINVAL;
> -
> -		return check_access_port(dom, port,
> -					 LANDLOCK_ACCESS_NET_CONNECT_TCP);
> -	}
> -	return 0;
> -}
> -
> -static struct security_hook_list landlock_hooks[] __ro_after_init = {
> -	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
> -	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
> -};
> -
> -__init void landlock_add_net_hooks(void)
> -{
> -	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
> -			   &landlock_lsmid);
> -}
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Landlock LSM - Network management and hooks
> + *
> + * Copyright © 2022-2023 Huawei Tech. Co., Ltd.
> + * Copyright © 2022-2023 Microsoft Corporation
> + */
> +
> +#include <linux/in.h>
> +#include <linux/net.h>
> +#include <linux/socket.h>
> +#include <net/ipv6.h>
> +
> +#include "common.h"
> +#include "cred.h"
> +#include "limits.h"
> +#include "net.h"
> +#include "ruleset.h"
> +
> +int landlock_append_net_rule(struct landlock_ruleset *const ruleset,
> +			     const u16 port, access_mask_t access_rights)
> +{
> +	int err;
> +	const struct landlock_id id = {
> +		.key.data = (__force uintptr_t)htons(port),
> +		.type = LANDLOCK_KEY_NET_PORT,
> +	};
> +
> +	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
> +
> +	/* Transforms relative access rights to absolute ones. */
> +	access_rights |= LANDLOCK_MASK_ACCESS_NET &
> +			 ~landlock_get_net_access_mask(ruleset, 0);
> +
> +	mutex_lock(&ruleset->lock);
> +	err = landlock_insert_rule(ruleset, id, access_rights);
> +	mutex_unlock(&ruleset->lock);
> +
> +	return err;
> +}
> +
> +static const struct landlock_ruleset *get_current_net_domain(void)
> +{
> +	const union access_masks any_net = {
> +		.net = ~0,
> +	};
> +
> +	return landlock_match_ruleset(landlock_get_current_domain(), any_net);
> +}
> +
> +static int check_access_port(const struct landlock_ruleset *const dom,
> +			     __be16 port, access_mask_t access_request)
> +{
> +	layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_NET] = {};
> +	const struct landlock_rule *rule;
> +	struct landlock_id id = {
> +		.type = LANDLOCK_KEY_NET_PORT,
> +	};
> +
> +	id.key.data = (__force uintptr_t)port;
> +	BUILD_BUG_ON(sizeof(port) > sizeof(id.key.data));
> +
> +	rule = landlock_find_rule(dom, id);
> +	access_request = landlock_init_layer_masks(
> +		dom, access_request, &layer_masks, LANDLOCK_KEY_NET_PORT);
> +	if (landlock_unmask_layers(rule, access_request, &layer_masks,
> +				   ARRAY_SIZE(layer_masks)))
> +		return 0;
> +
> +	return -EACCES;
> +}
> +
> +/*
> + * Checks that TCP @sock and @address attributes are correct for bind(2).
> + *
> + * On success, extracts port from @address in @port and returns 0.
> + *
> + * This validation is consistent with network stack and returns the error
> + * in the order corresponding to the order of errors from the network stack.
> + * It's required to not wrongfully return -EACCES instead of meaningful network
> + * stack level errors. Consistency is tested with kselftest.
> + *
> + * This helper does not provide consistency of error codes for BPF filter
> + * (if any).
> + */
> +static int
> +check_tcp_bind_consistency_and_get_port(struct socket *const sock,
> +					struct sockaddr *const address,
> +					const int addrlen, __be16 *port)
> +{
> +	/* IPV6_ADDRFORM can change sk->sk_family under us. */
> +	switch (READ_ONCE(sock->sk->sk_family)) {
> +	case AF_INET:
> +		const struct sockaddr_in *const addr =
> +			(struct sockaddr_in *)address;
> +
> +		/* Cf. inet_bind_sk(). */
> +		if (addrlen < sizeof(struct sockaddr_in))
> +			return -EINVAL;
> +		/*
> +		 * For compatibility reason, accept AF_UNSPEC for bind
> +		 * accesses (mapped to AF_INET) only if the address is
> +		 * INADDR_ANY (cf. __inet_bind).
> +		 */
> +		if (addr->sin_family != AF_INET) {
> +			if (addr->sin_family != AF_UNSPEC ||
> +			    addr->sin_addr.s_addr != htonl(INADDR_ANY))
> +				return -EAFNOSUPPORT;
> +		}
> +		*port = ((struct sockaddr_in *)address)->sin_port;
> +		break;
> +#if IS_ENABLED(CONFIG_IPV6)
> +	case AF_INET6:
> +		/* Cf. inet6_bind_sk(). */
> +		if (addrlen < SIN6_LEN_RFC2133)
> +			return -EINVAL;
> +		/* Cf. __inet6_bind(). */
> +		if (address->sa_family != AF_INET6)
> +			return -EAFNOSUPPORT;
> +		*port = ((struct sockaddr_in6 *)address)->sin6_port;
> +		break;
> +#endif /* IS_ENABLED(CONFIG_IPV6) */
> +	default:
> +		WARN_ON_ONCE(0);
> +		return -EACCES;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Checks that TCP @sock and @address attributes are correct for connect(2).
> + *
> + * On success, extracts port from @address in @port and returns 0.
> + *
> + * This validation is consistent with network stack and returns the error
> + * in the order corresponding to the order of errors from the network stack.
> + * It's required to not wrongfully return -EACCES instead of meaningful network
> + * stack level error. Consistency is partially tested with kselftest.
> + *
> + * This helper does not provide consistency of error codes for BPF filter
> + * (if any).
> + *
> + * The function holds socket lock while checking the socket state.
> + */
> +static int
> +check_tcp_connect_consistency_and_get_port(struct socket *const sock,
> +					   struct sockaddr *const address,
> +					   const int addrlen, __be16 *port)
> +{
> +	int err = 0;
> +	struct sock *const sk = sock->sk;
> +
> +	/* Cf. __inet_stream_connect(). */
> +	lock_sock(sk);
> +	switch (sock->state) {
> +	default:
> +		err = -EINVAL;
> +		break;
> +	case SS_CONNECTED:
> +		err = -EISCONN;
> +		break;
> +	case SS_CONNECTING:
> +		/*
> +		 * Calling connect(2) on nonblocking socket with SYN_SENT or SYN_RECV
> +		 * state immediately returns -EISCONN and -EALREADY (Cf. __inet_stream_connect()).
> +		 *
> +		 * This check is not tested with kselftests.
> +		 */
> +		if ((sock->file->f_flags & O_NONBLOCK) &&
> +		    ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) {
> +			if (inet_test_bit(DEFER_CONNECT, sk))
> +				err = -EISCONN;
> +			else
> +				err = -EALREADY;
> +			break;
> +		}
> +
> +		/*
> +		 * Current state is possible in two cases:
> +		 * 1. connect(2) is called upon nonblocking socket and previous
> +		 *    connection attempt was closed by RST packet (therefore socket is
> +		 *    in TCP_CLOSE state). In this case connect(2) calls
> +		 *    sk_prot->disconnect(), changes socket state and increases number
> +		 *    of disconnects.
> +		 * 2. connect(2) is called twice upon socket with TCP_FASTOPEN_CONNECT
> +		 *    option set. If socket state is TCP_CLOSE connect(2) does the
> +		 *    same logic as in point 1 case. Otherwise connect(2) may freeze
> +		 *    after inet_wait_for_connect() call since SYN was never sent.
> +		 *
> +		 * For both this cases Landlock cannot provide error consistency since
> +		 * 1. Both cases involve executing some network stack logic and changing
> +		 *    the socket state.
> +		 * 2. It cannot omit access check and allow network stack handle error
> +		 *    consistency since socket can change its state to SS_UNCONNECTED
> +		 *    before it will be locked again in inet_stream_connect().
> +		 *
> +		 * Therefore it is only possible to return 0 and check access right with
> +		 * check_access_port() helper.
> +		 */
> +		release_sock(sk);
> +		return 0;
> +	case SS_UNCONNECTED:
> +		if (sk->sk_state != TCP_CLOSE)
> +			err = -EISCONN;
> +		break;
> +	}
> +	release_sock(sk);
> +
> +	if (err)
> +		return err;
> +
> +	/* IPV6_ADDRFORM can change sk->sk_family under us. */
> +	switch (READ_ONCE(sk->sk_family)) {
> +	case AF_INET:
> +		/* Cf. tcp_v4_connect(). */
> +		if (addrlen < sizeof(struct sockaddr_in))
> +			return -EINVAL;
> +		if (address->sa_family != AF_INET)
> +			return -EAFNOSUPPORT;
> +
> +		*port = ((struct sockaddr_in *)address)->sin_port;
> +		break;
> +#if IS_ENABLED(CONFIG_IPV6)
> +	case AF_INET6:
> +		/* Cf. tcp_v6_connect(). */
> +		if (addrlen < SIN6_LEN_RFC2133)
> +			return -EINVAL;
> +		if (address->sa_family != AF_INET6)
> +			return -EAFNOSUPPORT;
> +
> +		*port = ((struct sockaddr_in6 *)address)->sin6_port;
> +		break;
> +#endif /* IS_ENABLED(CONFIG_IPV6) */
> +	default:
> +		WARN_ON_ONCE(0);
> +		return -EACCES;
> +	}
> +
> +	return 0;
> +}
> +
> +static int hook_socket_bind(struct socket *const sock,
> +			    struct sockaddr *const address, const int addrlen)
> +{
> +	int err;
> +	__be16 port;
> +	const struct landlock_ruleset *const dom = get_current_net_domain();
> +
> +	if (!dom)
> +		return 0;
> +	if (WARN_ON_ONCE(dom->num_layers < 1))
> +		return -EACCES;
> +
> +	if (sk_is_tcp(sock->sk)) {
> +		err = check_tcp_bind_consistency_and_get_port(sock, address,
> +							      addrlen, &port);
> +		if (err)
> +			return err;
> +		return check_access_port(dom, port,
> +					 LANDLOCK_ACCESS_NET_BIND_TCP);
> +	}
> +	return 0;
> +}
> +
> +static int hook_socket_connect(struct socket *const sock,
> +			       struct sockaddr *const address,
> +			       const int addrlen)
> +{
> +	int err;
> +	__be16 port;
> +	const struct landlock_ruleset *const dom = get_current_net_domain();
> +
> +	if (!dom)
> +		return 0;
> +	if (WARN_ON_ONCE(dom->num_layers < 1))
> +		return -EACCES;
> +
> +	if (sk_is_tcp(sock->sk)) {
> +		/* Checks for minimal header length to safely read sa_family. */
> +		if (addrlen < sizeof(address->sa_family))
> +			return -EINVAL;
> +		/*
> +		 * Connecting to an address with AF_UNSPEC dissolves the TCP
> +		 * association, which have the same effect as closing the
> +		 * connection while retaining the socket object (i.e., the file
> +		 * descriptor).  As for dropping privileges, closing
> +		 * connections is always allowed.
> +		 *
> +		 * For a TCP access control system, this request is legitimate.
> +		 * Let the network stack handle potential inconsistencies and
> +		 * return -EINVAL if needed.
> +		 */
> +		if (address->sa_family == AF_UNSPEC)
> +			return 0;
> +
> +		err = check_tcp_connect_consistency_and_get_port(
> +			sock, address, addrlen, &port);
> +		if (err)
> +			return err;
> +		return check_access_port(dom, port,
> +					 LANDLOCK_ACCESS_NET_CONNECT_TCP);
> +	}
> +	return 0;
> +}
> +
> +static struct security_hook_list landlock_hooks[] __ro_after_init = {
> +	LSM_HOOK_INIT(socket_bind, hook_socket_bind),
> +	LSM_HOOK_INIT(socket_connect, hook_socket_connect),
> +};
> +
> +__init void landlock_add_net_hooks(void)
> +{
> +	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
> +			   &landlock_lsmid);
> +}
> -- 
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-04 19:27       ` Mickaël Salaün
@ 2024-12-04 19:35         ` Mickaël Salaün
  2024-12-09 10:19           ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-04 19:35 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Mikhail Ivanov, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, David Laight

On Wed, Dec 04, 2024 at 08:27:58PM +0100, Mickaël Salaün wrote:
> On Fri, Oct 18, 2024 at 08:08:12PM +0200, Mickaël Salaün wrote:
> > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > > Hi Mikhail and Landlock maintainers,
> > > 
> > > +cc MPTCP list.
> > 
> > Thanks, we should include this list in the next series.
> > 
> > > 
> > > On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > > > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > > > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > > > should not restrict bind(2) and connect(2) for non-TCP protocols
> > > > (SCTP, MPTCP, SMC).
> > > 
> > > Thank you for the patch!
> > > 
> > > I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> > > treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> > > see TCP packets with extra TCP options. On Linux, there is indeed a
> > > dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> > > because we needed such dedicated socket to talk to the userspace.
> > > 
> > > I don't know Landlock well, but I think it is important to know that an
> > > MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> > > will do a fallback to "plain" TCP if MPTCP is not supported by the other
> > > peer or by a middlebox. It means that with this patch, if TCP is blocked
> > > by Landlock, someone can simply force an application to create an MPTCP
> > > socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> > > certainly work, even when connecting to a peer not supporting MPTCP.
> > > 
> > > Please note that I'm not against this modification -- especially here
> > > when we remove restrictions around MPTCP sockets :) -- I'm just saying
> > > it might be less confusing for users if MPTCP is considered as being
> > > part of TCP. A bit similar to what someone would do with a firewall: if
> > > TCP is blocked, MPTCP is blocked as well.
> > 
> > Good point!  I don't know well MPTCP but I think you're right.  Given
> > it's close relationship with TCP and the fallback mechanism, it would
> > make sense for users to not make a difference and it would avoid bypass
> > of misleading restrictions.  Moreover the Landlock rules are simple and
> > only control TCP ports, not peer addresses, which seems to be the main
> > evolution of MPTCP.
> 
> Thinking more about this, this makes sense from the point of view of the
> network stack, but looking at external (potentially bogus) firewalls or
> malware detection systems, it is something different.  If we don't
> provide a way for users to differenciate the control of SCTP from TCP,
> malicious use of SCTP could still bypass this kind of bogus security
> appliances.  It would then be safer to stick to the protocol semantic by
> clearly differenciating TCP from MPTCP (or any other protocol).
> 
> Mikhail, could you please send a new patch series containing one patch
> to fix the kernel and another to extend tests?

No need to squash them in one, please keep the current split of the test
patches.  However, it would be good to be able to easily backport them,
or at least the most relevant for this fix, which means to avoid
extended refactoring.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-04 19:35         ` Mickaël Salaün
@ 2024-12-09 10:19           ` Mikhail Ivanov
  2024-12-10 18:04             ` Mickaël Salaün
  0 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2024-12-09 10:19 UTC (permalink / raw)
  To: Mickaël Salaün, Matthieu Baerts
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, David Laight

On 12/4/2024 10:35 PM, Mickaël Salaün wrote:
> On Wed, Dec 04, 2024 at 08:27:58PM +0100, Mickaël Salaün wrote:
>> On Fri, Oct 18, 2024 at 08:08:12PM +0200, Mickaël Salaün wrote:
>>> On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
>>>> Hi Mikhail and Landlock maintainers,
>>>>
>>>> +cc MPTCP list.
>>>
>>> Thanks, we should include this list in the next series.
>>>
>>>>
>>>> On 17/10/2024 13:04, Mikhail Ivanov wrote:
>>>>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
>>>>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
>>>>> should not restrict bind(2) and connect(2) for non-TCP protocols
>>>>> (SCTP, MPTCP, SMC).
>>>>
>>>> Thank you for the patch!
>>>>
>>>> I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
>>>> treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
>>>> see TCP packets with extra TCP options. On Linux, there is indeed a
>>>> dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
>>>> because we needed such dedicated socket to talk to the userspace.
>>>>
>>>> I don't know Landlock well, but I think it is important to know that an
>>>> MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
>>>> will do a fallback to "plain" TCP if MPTCP is not supported by the other
>>>> peer or by a middlebox. It means that with this patch, if TCP is blocked
>>>> by Landlock, someone can simply force an application to create an MPTCP
>>>> socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
>>>> certainly work, even when connecting to a peer not supporting MPTCP.
>>>>
>>>> Please note that I'm not against this modification -- especially here
>>>> when we remove restrictions around MPTCP sockets :) -- I'm just saying
>>>> it might be less confusing for users if MPTCP is considered as being
>>>> part of TCP. A bit similar to what someone would do with a firewall: if
>>>> TCP is blocked, MPTCP is blocked as well.
>>>
>>> Good point!  I don't know well MPTCP but I think you're right.  Given
>>> it's close relationship with TCP and the fallback mechanism, it would
>>> make sense for users to not make a difference and it would avoid bypass
>>> of misleading restrictions.  Moreover the Landlock rules are simple and
>>> only control TCP ports, not peer addresses, which seems to be the main
>>> evolution of MPTCP.
>>
>> Thinking more about this, this makes sense from the point of view of the
>> network stack, but looking at external (potentially bogus) firewalls or
>> malware detection systems, it is something different.  If we don't
>> provide a way for users to differenciate the control of SCTP from TCP,
>> malicious use of SCTP could still bypass this kind of bogus security
>> appliances.  It would then be safer to stick to the protocol semantic by
>> clearly differenciating TCP from MPTCP (or any other protocol).

You mean that these firewals have protocol granularity (e.g. different
restrictions for MPTCP and TCP sockets)?

>>
>> Mikhail, could you please send a new patch series containing one patch
>> to fix the kernel and another to extend tests?
> 
> No need to squash them in one, please keep the current split of the test
> patches.  However, it would be good to be able to easily backport them,
> or at least the most relevant for this fix, which means to avoid
> extended refactoring.

No problem, I'll remove the fix of error consistency from this patchset.
BTW, what do you think about second and third commits? Should I send the
new version of them as well (in separate patch)?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-04 19:30   ` Mickaël Salaün
@ 2024-12-09 10:19     ` Mikhail Ivanov
  0 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-12-09 10:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze

On 12/4/2024 10:30 PM, Mickaël Salaün wrote:
> On Thu, Oct 17, 2024 at 07:04:47PM +0800, Mikhail Ivanov wrote:
>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
>> should not restrict bind(2) and connect(2) for non-TCP protocols
>> (SCTP, MPTCP, SMC).
>>
>> sk_is_tcp() is used for this to check address family of the socket
>> before doing INET-specific address length validation. This is required
>> for error consistency.
>>
>> Closes: https://github.com/landlock-lsm/linux/issues/40
>> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
>> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
>> ---
>>
>> Changes since v1:
>> * Validate socket family (=INET{,6}) before any other checks
>>    with sk_is_tcp().
>> ---
>>   security/landlock/net.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/security/landlock/net.c b/security/landlock/net.c
>> index fdc1bb0a9c5d..1e80782ba239 100644
>> --- a/security/landlock/net.c
>> +++ b/security/landlock/net.c
>> @@ -66,8 +66,8 @@ static int current_check_access_socket(struct socket *const sock,
>>   	if (WARN_ON_ONCE(dom->num_layers < 1))
>>   		return -EACCES;
>>   
>> -	/* Checks if it's a (potential) TCP socket. */
>> -	if (sock->type != SOCK_STREAM)
>> +	/* Do not restrict non-TCP sockets. */
> 
> You can remove this comment because the following check is explicit.

ok, thx

> 
>> +	if (!sk_is_tcp(sock->sk))
>>   		return 0;
>>   
>>   	/* Checks for minimal header length to safely read sa_family. */
>> -- 
>> 2.34.1
>>
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-09 10:19           ` Mikhail Ivanov
@ 2024-12-10 18:04             ` Mickaël Salaün
  2024-12-10 18:05               ` Mickaël Salaün
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-10 18:04 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, David Laight

On Mon, Dec 09, 2024 at 01:19:19PM +0300, Mikhail Ivanov wrote:
> On 12/4/2024 10:35 PM, Mickaël Salaün wrote:
> > On Wed, Dec 04, 2024 at 08:27:58PM +0100, Mickaël Salaün wrote:
> > > On Fri, Oct 18, 2024 at 08:08:12PM +0200, Mickaël Salaün wrote:
> > > > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > > > > Hi Mikhail and Landlock maintainers,
> > > > > 
> > > > > +cc MPTCP list.
> > > > 
> > > > Thanks, we should include this list in the next series.
> > > > 
> > > > > 
> > > > > On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > > > > > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > > > > > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > > > > > should not restrict bind(2) and connect(2) for non-TCP protocols
> > > > > > (SCTP, MPTCP, SMC).
> > > > > 
> > > > > Thank you for the patch!
> > > > > 
> > > > > I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> > > > > treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> > > > > see TCP packets with extra TCP options. On Linux, there is indeed a
> > > > > dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> > > > > because we needed such dedicated socket to talk to the userspace.
> > > > > 
> > > > > I don't know Landlock well, but I think it is important to know that an
> > > > > MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> > > > > will do a fallback to "plain" TCP if MPTCP is not supported by the other
> > > > > peer or by a middlebox. It means that with this patch, if TCP is blocked
> > > > > by Landlock, someone can simply force an application to create an MPTCP
> > > > > socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> > > > > certainly work, even when connecting to a peer not supporting MPTCP.
> > > > > 
> > > > > Please note that I'm not against this modification -- especially here
> > > > > when we remove restrictions around MPTCP sockets :) -- I'm just saying
> > > > > it might be less confusing for users if MPTCP is considered as being
> > > > > part of TCP. A bit similar to what someone would do with a firewall: if
> > > > > TCP is blocked, MPTCP is blocked as well.
> > > > 
> > > > Good point!  I don't know well MPTCP but I think you're right.  Given
> > > > it's close relationship with TCP and the fallback mechanism, it would
> > > > make sense for users to not make a difference and it would avoid bypass
> > > > of misleading restrictions.  Moreover the Landlock rules are simple and
> > > > only control TCP ports, not peer addresses, which seems to be the main
> > > > evolution of MPTCP.
> > > 
> > > Thinking more about this, this makes sense from the point of view of the
> > > network stack, but looking at external (potentially bogus) firewalls or
> > > malware detection systems, it is something different.  If we don't
> > > provide a way for users to differenciate the control of SCTP from TCP,
> > > malicious use of SCTP could still bypass this kind of bogus security
> > > appliances.  It would then be safer to stick to the protocol semantic by
> > > clearly differenciating TCP from MPTCP (or any other protocol).
> 
> You mean that these firewals have protocol granularity (e.g. different
> restrictions for MPTCP and TCP sockets)?

Yes, and more importantly they can miss the MTCP semantic and then not
properly filter such packet, which can be use to escape the network
policy.  See some issues here:
https://en.wikipedia.org/wiki/Multipath_TCP

The point is that we cannot assume anything about other networking
stacks, and if Landlock can properly differentiate between TCP and MTCP
(e.g. with new LANDLOCK_ACCESS_NET_CONNECT_MTCP) users of such firewalls
could still limit the impact of their firewall's bugs.  However, if
Landlock treats TCP and MTCP the same way, we'll not be able to only
deny MTCP.  In most use cases, the network policy should treat both TCP
and MTCP the same way though, but we should let users decide according
to their context.

From an implementation point of view, adding MTCP support should be
simple, mainly tests will grow.

> 
> > > 
> > > Mikhail, could you please send a new patch series containing one patch
> > > to fix the kernel and another to extend tests?
> > 
> > No need to squash them in one, please keep the current split of the test
> > patches.  However, it would be good to be able to easily backport them,
> > or at least the most relevant for this fix, which means to avoid
> > extended refactoring.
> 
> No problem, I'll remove the fix of error consistency from this patchset.
> BTW, what do you think about second and third commits? Should I send the
> new version of them as well (in separate patch)?

According to the description, patch 2 may be included in this series if
it can be tested with any other LSM, but I cannot read these patches:
https://lore.kernel.org/all/20241017110454.265818-3-ivanov.mikhail1@huawei-partners.com/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-10 18:04             ` Mickaël Salaün
@ 2024-12-10 18:05               ` Mickaël Salaün
  2024-12-11 15:24                 ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-10 18:05 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, David Laight

On Tue, Dec 10, 2024 at 07:04:15PM +0100, Mickaël Salaün wrote:
> On Mon, Dec 09, 2024 at 01:19:19PM +0300, Mikhail Ivanov wrote:
> > On 12/4/2024 10:35 PM, Mickaël Salaün wrote:
> > > On Wed, Dec 04, 2024 at 08:27:58PM +0100, Mickaël Salaün wrote:
> > > > On Fri, Oct 18, 2024 at 08:08:12PM +0200, Mickaël Salaün wrote:
> > > > > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > > > > > Hi Mikhail and Landlock maintainers,
> > > > > > 
> > > > > > +cc MPTCP list.
> > > > > 
> > > > > Thanks, we should include this list in the next series.
> > > > > 
> > > > > > 
> > > > > > On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > > > > > > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > > > > > > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > > > > > > should not restrict bind(2) and connect(2) for non-TCP protocols
> > > > > > > (SCTP, MPTCP, SMC).
> > > > > > 
> > > > > > Thank you for the patch!
> > > > > > 
> > > > > > I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> > > > > > treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> > > > > > see TCP packets with extra TCP options. On Linux, there is indeed a
> > > > > > dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> > > > > > because we needed such dedicated socket to talk to the userspace.
> > > > > > 
> > > > > > I don't know Landlock well, but I think it is important to know that an
> > > > > > MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> > > > > > will do a fallback to "plain" TCP if MPTCP is not supported by the other
> > > > > > peer or by a middlebox. It means that with this patch, if TCP is blocked
> > > > > > by Landlock, someone can simply force an application to create an MPTCP
> > > > > > socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> > > > > > certainly work, even when connecting to a peer not supporting MPTCP.
> > > > > > 
> > > > > > Please note that I'm not against this modification -- especially here
> > > > > > when we remove restrictions around MPTCP sockets :) -- I'm just saying
> > > > > > it might be less confusing for users if MPTCP is considered as being
> > > > > > part of TCP. A bit similar to what someone would do with a firewall: if
> > > > > > TCP is blocked, MPTCP is blocked as well.
> > > > > 
> > > > > Good point!  I don't know well MPTCP but I think you're right.  Given
> > > > > it's close relationship with TCP and the fallback mechanism, it would
> > > > > make sense for users to not make a difference and it would avoid bypass
> > > > > of misleading restrictions.  Moreover the Landlock rules are simple and
> > > > > only control TCP ports, not peer addresses, which seems to be the main
> > > > > evolution of MPTCP.
> > > > 
> > > > Thinking more about this, this makes sense from the point of view of the
> > > > network stack, but looking at external (potentially bogus) firewalls or
> > > > malware detection systems, it is something different.  If we don't
> > > > provide a way for users to differenciate the control of SCTP from TCP,
> > > > malicious use of SCTP could still bypass this kind of bogus security
> > > > appliances.  It would then be safer to stick to the protocol semantic by
> > > > clearly differenciating TCP from MPTCP (or any other protocol).
> > 
> > You mean that these firewals have protocol granularity (e.g. different
> > restrictions for MPTCP and TCP sockets)?
> 
> Yes, and more importantly they can miss the MTCP semantic and then not
> properly filter such packet, which can be use to escape the network
> policy.  See some issues here:
> https://en.wikipedia.org/wiki/Multipath_TCP
> 
> The point is that we cannot assume anything about other networking
> stacks, and if Landlock can properly differentiate between TCP and MTCP
> (e.g. with new LANDLOCK_ACCESS_NET_CONNECT_MTCP) users of such firewalls
> could still limit the impact of their firewall's bugs.  However, if
> Landlock treats TCP and MTCP the same way, we'll not be able to only
> deny MTCP.  In most use cases, the network policy should treat both TCP
> and MTCP the same way though, but we should let users decide according
> to their context.
> 
> From an implementation point of view, adding MTCP support should be
> simple, mainly tests will grow.

s/MTCP/MPTCP/g of course.

> 
> > 
> > > > 
> > > > Mikhail, could you please send a new patch series containing one patch
> > > > to fix the kernel and another to extend tests?
> > > 
> > > No need to squash them in one, please keep the current split of the test
> > > patches.  However, it would be good to be able to easily backport them,
> > > or at least the most relevant for this fix, which means to avoid
> > > extended refactoring.
> > 
> > No problem, I'll remove the fix of error consistency from this patchset.
> > BTW, what do you think about second and third commits? Should I send the
> > new version of them as well (in separate patch)?
> 
> According to the description, patch 2 may be included in this series if
> it can be tested with any other LSM, but I cannot read these patches:
> https://lore.kernel.org/all/20241017110454.265818-3-ivanov.mikhail1@huawei-partners.com/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 6/8] selftests/landlock: Test consistency of errors for TCP actions
  2024-10-17 11:04 ` [RFC PATCH v2 6/8] selftests/landlock: Test consistency of errors for TCP actions Mikhail Ivanov
@ 2024-12-10 18:07   ` Mickaël Salaün
  2024-12-11 15:29     ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-10 18:07 UTC (permalink / raw)
  To: Mikhail Ivanov, Paul Moore
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze

On Thu, Oct 17, 2024 at 07:04:52PM +0800, Mikhail Ivanov wrote:
> Add tcp_errors_consistency fixture for TCP errors consistency tests.
> 
> Add 6 test suits for this fixture to configure tested address family of
> socket (ipv4 or ipv6), sandboxed mode and whether TCP action is allowed
> in a sandboxed mode.
> 
> Add tests which validate errors consistency provided by Landlock for
> bind(2) and connect(2) restrictable TCP actions.
> 
> Add sys_bind(), sys_connect() helpers for convenient checks of bind(2)
> and connect(2). Add set_ipv4_tcp_address(), set_ipv6_tcp_address()
> helpers.
> 
> Add CONFIG_LSM="landlock" option in config. Some LSMs (e.g. SElinux)
> can be loaded before Landlock and return inconsistent error code for
> bind(2) and connect(2) calls.
> 
> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
> ---
>  tools/testing/selftests/landlock/config     |   1 +
>  tools/testing/selftests/landlock/net_test.c | 329 +++++++++++++++++++-
>  2 files changed, 324 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
> index a8982da4acbd..52988e8a56cc 100644
> --- a/tools/testing/selftests/landlock/config
> +++ b/tools/testing/selftests/landlock/config
> @@ -3,6 +3,7 @@ CONFIG_CGROUP_SCHED=y
>  CONFIG_INET=y
>  CONFIG_IPV6=y
>  CONFIG_KEYS=y
> +CONFIG_LSM="landlock"

We should not force CONFIG_LSM because we may want to test Landlock with
other LSMs.

For now, I think we should ignore wrong error codes that may be returned
by other LSMs but send this patch with a patch series fixing the LSM
framework as a whole.  Feel free to include these patches too:
https://lore.kernel.org/all/20240327120036.233641-1-mic@digikod.net/

>  CONFIG_MPTCP=y
>  CONFIG_MPTCP_IPV6=y
>  CONFIG_NET=y
> diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
> index d9de0ee49ebc..30b29bf10bdc 100644
> --- a/tools/testing/selftests/landlock/net_test.c
> +++ b/tools/testing/selftests/landlock/net_test.c
> @@ -36,6 +36,22 @@ enum sandbox_type {
>  	TCP_SANDBOX,
>  };
>  
> +static void set_ipv4_tcp_address(const struct service_fixture *const srv,
> +				 struct sockaddr_in *ipv4_addr)
> +{
> +	ipv4_addr->sin_family = srv->protocol.domain;
> +	ipv4_addr->sin_port = htons(srv->port);
> +	ipv4_addr->sin_addr.s_addr = inet_addr(loopback_ipv4);
> +}
> +
> +static void set_ipv6_tcp_address(const struct service_fixture *const srv,
> +				 struct sockaddr_in6 *ipv6_addr)
> +{
> +	ipv6_addr->sin6_family = srv->protocol.domain;
> +	ipv6_addr->sin6_port = htons(srv->port);
> +	inet_pton(AF_INET6, loopback_ipv6, &ipv6_addr->sin6_addr);
> +}
> +
>  static int set_service(struct service_fixture *const srv,
>  		       const struct protocol_variant prot,
>  		       const unsigned short index)
> @@ -56,15 +72,11 @@ static int set_service(struct service_fixture *const srv,
>  	switch (prot.domain) {
>  	case AF_UNSPEC:
>  	case AF_INET:
> -		srv->ipv4_addr.sin_family = prot.domain;
> -		srv->ipv4_addr.sin_port = htons(srv->port);
> -		srv->ipv4_addr.sin_addr.s_addr = inet_addr(loopback_ipv4);
> +		set_ipv4_tcp_address(srv, &srv->ipv4_addr);
>  		return 0;
>  
>  	case AF_INET6:
> -		srv->ipv6_addr.sin6_family = prot.domain;
> -		srv->ipv6_addr.sin6_port = htons(srv->port);
> -		inet_pton(AF_INET6, loopback_ipv6, &srv->ipv6_addr.sin6_addr);
> +		set_ipv6_tcp_address(srv, &srv->ipv6_addr);
>  		return 0;
>  
>  	case AF_UNIX:
> @@ -181,6 +193,17 @@ static uint16_t get_binded_port(int socket_fd,
>  	}
>  }
>  
> +static int sys_bind(const int sock_fd, const struct sockaddr *addr,
> +		    socklen_t addrlen)
> +{
> +	int ret;
> +
> +	ret = bind(sock_fd, addr, addrlen);
> +	if (ret < 0)
> +		return -errno;
> +	return 0;
> +}
> +
>  static int bind_variant_addrlen(const int sock_fd,
>  				const struct service_fixture *const srv,
>  				const socklen_t addrlen)
> @@ -217,6 +240,17 @@ static int bind_variant(const int sock_fd,
>  	return bind_variant_addrlen(sock_fd, srv, get_addrlen(srv, false));
>  }
>  
> +static int sys_connect(const int sock_fd, const struct sockaddr *addr,
> +		       socklen_t addrlen)
> +{
> +	int ret;
> +
> +	ret = connect(sock_fd, addr, addrlen);
> +	if (ret < 0)
> +		return -errno;
> +	return 0;
> +}
> +
>  static int connect_variant_addrlen(const int sock_fd,
>  				   const struct service_fixture *const srv,
>  				   const socklen_t addrlen)
> @@ -923,6 +957,289 @@ TEST_F(protocol, connect_unspec)
>  	EXPECT_EQ(0, close(bind_fd));
>  }
>  
> +FIXTURE(tcp_errors_consistency)
> +{
> +	struct service_fixture srv0, srv1;
> +	struct sockaddr *inval_addr_p0;
> +	socklen_t addrlen_min;
> +
> +	struct sockaddr_in inval_ipv4_addr;
> +	struct sockaddr_in6 inval_ipv6_addr;
> +};
> +
> +FIXTURE_VARIANT(tcp_errors_consistency)
> +{
> +	const enum sandbox_type sandbox;
> +	const int domain;
> +	bool allowed;
> +};
> +
> +/* clang-format off */
> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, no_sandbox_with_ipv4) {
> +	/* clang-format on */
> +	.sandbox = NO_SANDBOX,
> +	.domain = AF_INET,
> +};
> +
> +/* clang-format off */
> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, no_sandbox_with_ipv6) {
> +	/* clang-format on */
> +	.sandbox = NO_SANDBOX,
> +	.domain = AF_INET6,
> +};
> +
> +/* clang-format off */
> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, denied_with_ipv4) {
> +	/* clang-format on */
> +	.sandbox = TCP_SANDBOX,
> +	.domain = AF_INET,
> +	.allowed = false,
> +};
> +
> +/* clang-format off */
> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, allowed_with_ipv4) {
> +	/* clang-format on */
> +	.sandbox = TCP_SANDBOX,
> +	.domain = AF_INET,
> +	.allowed = true,
> +};
> +
> +/* clang-format off */
> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, denied_with_ipv6) {
> +	/* clang-format on */
> +	.sandbox = TCP_SANDBOX,
> +	.domain = AF_INET6,
> +	.allowed = false,
> +};
> +
> +/* clang-format off */
> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, allowed_with_ipv6) {
> +	/* clang-format on */
> +	.sandbox = TCP_SANDBOX,
> +	.domain = AF_INET6,
> +	.allowed = true,
> +};
> +
> +FIXTURE_SETUP(tcp_errors_consistency)
> +{
> +	const struct protocol_variant tcp_prot = {
> +		.domain = variant->domain,
> +		.type = SOCK_STREAM,
> +	};
> +
> +	disable_caps(_metadata);
> +
> +	set_service(&self->srv0, tcp_prot, 0);
> +	set_service(&self->srv1, tcp_prot, 1);
> +
> +	if (variant->domain == AF_INET) {
> +		set_ipv4_tcp_address(&self->srv0, &self->inval_ipv4_addr);
> +		self->inval_ipv4_addr.sin_family = AF_INET6;
> +
> +		self->inval_addr_p0 = (struct sockaddr *)&self->inval_ipv4_addr;
> +		self->addrlen_min = sizeof(struct sockaddr_in);
> +	} else {
> +		set_ipv6_tcp_address(&self->srv0, &self->inval_ipv6_addr);
> +		self->inval_ipv6_addr.sin6_family = AF_INET;
> +
> +		self->inval_addr_p0 = (struct sockaddr *)&self->inval_ipv6_addr;
> +		self->addrlen_min = SIN6_LEN_RFC2133;
> +	}
> +
> +	setup_loopback(_metadata);
> +};
> +
> +FIXTURE_TEARDOWN(tcp_errors_consistency)
> +{
> +}
> +
> +/*
> + * Validates that Landlock provides errors consistency for bind(2) operation
> + * (not restricted, allowed and denied).
> + *
> + * Error consistency implies that in sandboxed process, bind(2) returns the same
> + * errors and in the same order (assuming multiple errors) as during normal
> + * execution.
> + */
> +TEST_F(tcp_errors_consistency, bind)
> +{
> +	if (variant->sandbox == TCP_SANDBOX) {
> +		const struct landlock_ruleset_attr ruleset_attr = {
> +			.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP,
> +		};
> +		int ruleset_fd;
> +
> +		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
> +						     sizeof(ruleset_attr), 0);
> +		ASSERT_LE(0, ruleset_fd);
> +
> +		if (variant->allowed) {
> +			const struct landlock_net_port_attr tcp_bind_p0 = {
> +				.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
> +				.port = self->srv0.port,
> +			};
> +
> +			/* Allows bind for the first port. */
> +			ASSERT_EQ(0, landlock_add_rule(ruleset_fd,
> +						       LANDLOCK_RULE_NET_PORT,
> +						       &tcp_bind_p0, 0));
> +		}
> +
> +		enforce_ruleset(_metadata, ruleset_fd);
> +		EXPECT_EQ(0, close(ruleset_fd));
> +	}
> +	int sock_fd;
> +
> +	sock_fd = socket_variant(&self->srv0);
> +	ASSERT_LE(0, sock_fd);
> +
> +	/*
> +	 * Tries to bind socket to address with invalid sa_family value
> +	 * (AF_INET for ipv6 socket and AF_INET6 for ipv4 socket).
> +	 */
> +	EXPECT_EQ(-EAFNOSUPPORT,
> +		  sys_bind(sock_fd, self->inval_addr_p0, self->addrlen_min));
> +
> +	if (variant->domain == AF_INET) {
> +		struct sockaddr_in ipv4_unspec_addr;
> +
> +		set_ipv4_tcp_address(&self->srv0, &ipv4_unspec_addr);
> +		ipv4_unspec_addr.sin_family = AF_UNSPEC;
> +		/*
> +		 * Ipv4 bind(2) accepts AF_UNSPEC family in address only if address is
> +		 * INADDR_ANY. Otherwise, returns -EAFNOSUPPORT.
> +		 */
> +		EXPECT_EQ(-EAFNOSUPPORT,
> +			  sys_bind(sock_fd,
> +				   (struct sockaddr *)&ipv4_unspec_addr,
> +				   self->addrlen_min));
> +	}
> +
> +	/* Tries to bind with too small addrlen (Cf. inet_bind_sk). */
> +	EXPECT_EQ(-EINVAL, sys_bind(sock_fd, self->inval_addr_p0,
> +				    self->addrlen_min - 1));
> +
> +	ASSERT_EQ(0, close(sock_fd));
> +}
> +
> +/*
> + * Validates that Landlock provides errors consistency for connect(2) operation
> + * (not restricted, allowed and denied).
> + *
> + * Error consistency implies that in sandboxed process, connect(2) returns the
> + * same errors and in the same order (assuming multiple errors) as during normal
> + * execution.
> + */
> +TEST_F(tcp_errors_consistency, connect)
> +{
> +	int nonblock_p0_fd;
> +
> +	nonblock_p0_fd = socket(variant->domain,
> +				SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
> +	ASSERT_LE(0, nonblock_p0_fd);
> +
> +	/* Tries to connect nonblocking socket before establishing ruleset. */
> +	ASSERT_EQ(-EINPROGRESS, connect_variant(nonblock_p0_fd, &self->srv0));
> +
> +	if (variant->sandbox == TCP_SANDBOX) {
> +		const struct landlock_ruleset_attr ruleset_attr = {
> +			.handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP,
> +		};
> +		const struct landlock_net_port_attr tcp_connect_p1 = {
> +			.allowed_access = LANDLOCK_ACCESS_NET_CONNECT_TCP,
> +			.port = self->srv1.port,
> +		};
> +		int ruleset_fd;
> +
> +		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
> +						     sizeof(ruleset_attr), 0);
> +		ASSERT_LE(0, ruleset_fd);
> +
> +		/* Allows connect for the second port. */
> +		ASSERT_EQ(0,
> +			  landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
> +					    &tcp_connect_p1, 0));
> +
> +		if (variant->allowed) {
> +			const struct landlock_net_port_attr tcp_connect_p0 = {
> +				.allowed_access =
> +					LANDLOCK_ACCESS_NET_CONNECT_TCP,
> +				.port = self->srv0.port,
> +			};
> +
> +			/* Allows connect for the first port. */
> +			ASSERT_EQ(0, landlock_add_rule(ruleset_fd,
> +						       LANDLOCK_RULE_NET_PORT,
> +						       &tcp_connect_p0, 0));
> +		}
> +
> +		enforce_ruleset(_metadata, ruleset_fd);
> +		EXPECT_EQ(0, close(ruleset_fd));
> +	}
> +	int client_p0_fd, client_p1_fd, server_p0_fd, server_p1_fd;
> +
> +	client_p0_fd = socket_variant(&self->srv0);
> +	ASSERT_LE(0, client_p0_fd);
> +	/*
> +	 * Tries to connect socket to address with invalid sa_family value
> +	 * (AF_INET for ipv6 socket and AF_INET6 for ipv4 socket).
> +	 */
> +	EXPECT_EQ(-EAFNOSUPPORT, sys_connect(client_p0_fd, self->inval_addr_p0,
> +					     self->addrlen_min));
> +
> +	/* Tries to connect with too small addrlen. */
> +	EXPECT_EQ(-EINVAL, sys_connect(client_p0_fd, self->inval_addr_p0,
> +				       self->addrlen_min - 1));
> +
> +	/* Creates socket listening on zero port. */
> +	server_p0_fd = socket_variant(&self->srv0);
> +	ASSERT_LE(0, server_p0_fd);
> +
> +	ASSERT_EQ(0, bind_variant(server_p0_fd, &self->srv0));
> +	ASSERT_EQ(0, listen(server_p0_fd, backlog));
> +	/* Tries to connect listening socket. */
> +	EXPECT_EQ(-EISCONN, sys_connect(server_p0_fd, self->inval_addr_p0,
> +					self->addrlen_min - 1));
> +
> +	/* Creates socket listening on first port. */
> +	server_p1_fd = socket_variant(&self->srv1);
> +	ASSERT_LE(0, server_p1_fd);
> +
> +	ASSERT_EQ(0, bind_variant(server_p1_fd, &self->srv1));
> +	ASSERT_EQ(0, listen(server_p1_fd, backlog));
> +
> +	client_p1_fd = socket_variant(&self->srv1);
> +	ASSERT_LE(0, client_p1_fd);
> +
> +	/* Connects to server_p1_fd. */
> +	ASSERT_EQ(0, connect_variant(client_p1_fd, &self->srv1));
> +	/* Tries to connect already connected socket. */
> +	EXPECT_EQ(-EISCONN, sys_connect(client_p1_fd, self->inval_addr_p0,
> +					self->addrlen_min - 1));
> +
> +	/*
> +	 * connect(2) is called upon nonblocking socket and previous connection
> +	 * attempt was closed by RST packet. Landlock cannot provide error
> +	 * consistency in this case (Cf. check_tcp_connect_consistency_and_get_port()).
> +	 */
> +	if (variant->sandbox == TCP_SANDBOX) {
> +		EXPECT_EQ(-EACCES,
> +			  connect_variant(nonblock_p0_fd, &self->srv0));
> +	} else {
> +		EXPECT_EQ(-ECONNREFUSED,
> +			  connect_variant(nonblock_p0_fd, &self->srv0));
> +	}
> +
> +	/* Tries to connect with zero as addrlen. */
> +	EXPECT_EQ(-EINVAL, sys_connect(client_p0_fd, self->inval_addr_p0, 0));
> +
> +	ASSERT_EQ(0, close(client_p1_fd));
> +	ASSERT_EQ(0, close(server_p1_fd));
> +	ASSERT_EQ(0, close(server_p0_fd));
> +	ASSERT_EQ(0, close(client_p0_fd));
> +	ASSERT_EQ(0, close(nonblock_p0_fd));
> +}
> +
>  FIXTURE(ipv4)
>  {
>  	struct service_fixture srv0, srv1;
> -- 
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 7/8] landlock: Add note about errors consistency in documentation
  2024-10-17 11:04 ` [RFC PATCH v2 7/8] landlock: Add note about errors consistency in documentation Mikhail Ivanov
@ 2024-12-10 18:08   ` Mickaël Salaün
  2024-12-11 15:30     ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-10 18:08 UTC (permalink / raw)
  To: Mikhail Ivanov, Paul Moore
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze

On Thu, Oct 17, 2024 at 07:04:53PM +0800, Mikhail Ivanov wrote:
> Add recommendation to specify Landlock first in CONFIG_LSM list, so user
> can have better LSM errors consistency provided by Landlock.
> 
> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
> ---
>  Documentation/userspace-api/landlock.rst | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
> index bb7480a05e2c..0db5eee9bffa 100644
> --- a/Documentation/userspace-api/landlock.rst
> +++ b/Documentation/userspace-api/landlock.rst
> @@ -610,7 +610,8 @@ time as the other security modules.  The list of security modules enabled by
>  default is set with ``CONFIG_LSM``.  The kernel configuration should then
>  contains ``CONFIG_LSM=landlock,[...]`` with ``[...]``  as the list of other
>  potentially useful security modules for the running system (see the
> -``CONFIG_LSM`` help).
> +``CONFIG_LSM`` help). It is recommended to specify Landlock first of all other
> +modules in CONFIG_LSM list since it provides better errors consistency.

This is partially correct because Landlock may not block anything
whereas another LSM could deny a network action, with potentially a
wrong error code.  I don't think this patch is worth it, especially
because other LSMs have bugs that should be fixed.

>  
>  Boot time configuration
>  -----------------------
> -- 
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-10 18:05               ` Mickaël Salaün
@ 2024-12-11 15:24                 ` Mikhail Ivanov
  2024-12-12 18:43                   ` Mickaël Salaün
  0 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2024-12-11 15:24 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, David Laight

On 12/10/2024 9:05 PM, Mickaël Salaün wrote:
> On Tue, Dec 10, 2024 at 07:04:15PM +0100, Mickaël Salaün wrote:
>> On Mon, Dec 09, 2024 at 01:19:19PM +0300, Mikhail Ivanov wrote:
>>> On 12/4/2024 10:35 PM, Mickaël Salaün wrote:
>>>> On Wed, Dec 04, 2024 at 08:27:58PM +0100, Mickaël Salaün wrote:
>>>>> On Fri, Oct 18, 2024 at 08:08:12PM +0200, Mickaël Salaün wrote:
>>>>>> On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
>>>>>>> Hi Mikhail and Landlock maintainers,
>>>>>>>
>>>>>>> +cc MPTCP list.
>>>>>>
>>>>>> Thanks, we should include this list in the next series.
>>>>>>
>>>>>>>
>>>>>>> On 17/10/2024 13:04, Mikhail Ivanov wrote:
>>>>>>>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
>>>>>>>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
>>>>>>>> should not restrict bind(2) and connect(2) for non-TCP protocols
>>>>>>>> (SCTP, MPTCP, SMC).
>>>>>>>
>>>>>>> Thank you for the patch!
>>>>>>>
>>>>>>> I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
>>>>>>> treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
>>>>>>> see TCP packets with extra TCP options. On Linux, there is indeed a
>>>>>>> dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
>>>>>>> because we needed such dedicated socket to talk to the userspace.
>>>>>>>
>>>>>>> I don't know Landlock well, but I think it is important to know that an
>>>>>>> MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
>>>>>>> will do a fallback to "plain" TCP if MPTCP is not supported by the other
>>>>>>> peer or by a middlebox. It means that with this patch, if TCP is blocked
>>>>>>> by Landlock, someone can simply force an application to create an MPTCP
>>>>>>> socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
>>>>>>> certainly work, even when connecting to a peer not supporting MPTCP.
>>>>>>>
>>>>>>> Please note that I'm not against this modification -- especially here
>>>>>>> when we remove restrictions around MPTCP sockets :) -- I'm just saying
>>>>>>> it might be less confusing for users if MPTCP is considered as being
>>>>>>> part of TCP. A bit similar to what someone would do with a firewall: if
>>>>>>> TCP is blocked, MPTCP is blocked as well.
>>>>>>
>>>>>> Good point!  I don't know well MPTCP but I think you're right.  Given
>>>>>> it's close relationship with TCP and the fallback mechanism, it would
>>>>>> make sense for users to not make a difference and it would avoid bypass
>>>>>> of misleading restrictions.  Moreover the Landlock rules are simple and
>>>>>> only control TCP ports, not peer addresses, which seems to be the main
>>>>>> evolution of MPTCP.
>>>>>
>>>>> Thinking more about this, this makes sense from the point of view of the
>>>>> network stack, but looking at external (potentially bogus) firewalls or
>>>>> malware detection systems, it is something different.  If we don't
>>>>> provide a way for users to differenciate the control of SCTP from TCP,
>>>>> malicious use of SCTP could still bypass this kind of bogus security
>>>>> appliances.  It would then be safer to stick to the protocol semantic by
>>>>> clearly differenciating TCP from MPTCP (or any other protocol).
>>>
>>> You mean that these firewals have protocol granularity (e.g. different
>>> restrictions for MPTCP and TCP sockets)?
>>
>> Yes, and more importantly they can miss the MTCP semantic and then not
>> properly filter such packet, which can be use to escape the network
>> policy.  See some issues here:
>> https://en.wikipedia.org/wiki/Multipath_TCP
>>
>> The point is that we cannot assume anything about other networking
>> stacks, and if Landlock can properly differentiate between TCP and MTCP
>> (e.g. with new LANDLOCK_ACCESS_NET_CONNECT_MTCP) users of such firewalls
>> could still limit the impact of their firewall's bugs.  However, if
>> Landlock treats TCP and MTCP the same way, we'll not be able to only
>> deny MTCP.  In most use cases, the network policy should treat both TCP
>> and MTCP the same way though, but we should let users decide according
>> to their context.
>>
>>  From an implementation point of view, adding MTCP support should be
>> simple, mainly tests will grow.
> 
> s/MTCP/MPTCP/g of course.

That's reasonable, thanks for explanation!

We should also consider control of other protocols that use TCP
internally [1], since it should be easy to bypass TCP restriction by
using them (e.g. provoking a fallback of MPTCP or SMC connection to
TCP).

The simplest solution is to implement separate access rights for SMC and
RDS, as well as for MPTCP. I think we should stick to it.

I was worried if there was a case where it would be useful to allow only
SMC (and deny TCP). If there are any, it would be more correct to
restrict only the fallback SMC -> TCP with TCP access rights. But such
logic seems too complicated for the kernel and implicit for SMC
applications that can rely on a TCP connection.

[1] 
https://lore.kernel.org/all/62336067-18c2-3493-d0ec-6dd6a6d3a1b5@huawei-partners.com/

> 
>>
>>>
>>>>>
>>>>> Mikhail, could you please send a new patch series containing one patch
>>>>> to fix the kernel and another to extend tests?
>>>>
>>>> No need to squash them in one, please keep the current split of the test
>>>> patches.  However, it would be good to be able to easily backport them,
>>>> or at least the most relevant for this fix, which means to avoid
>>>> extended refactoring.
>>>
>>> No problem, I'll remove the fix of error consistency from this patchset.
>>> BTW, what do you think about second and third commits? Should I send the
>>> new version of them as well (in separate patch)?
>>
>> According to the description, patch 2 may be included in this series if
>> it can be tested with any other LSM, but I cannot read these patches:
>> https://lore.kernel.org/all/20241017110454.265818-3-ivanov.mikhail1@huawei-partners.com/

Ok I'll do this, since this patch doesn't make any functional changes.

About readability, a lot of code blocks were moved in this patch, and
because of this, the regular diff file has become too unreadable.
So, I decided to re-generate it with --break-rewrites option of git
format- patch. Do you have any advice on how best to compose a diff for
this patch?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 6/8] selftests/landlock: Test consistency of errors for TCP actions
  2024-12-10 18:07   ` Mickaël Salaün
@ 2024-12-11 15:29     ` Mikhail Ivanov
  0 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-12-11 15:29 UTC (permalink / raw)
  To: Mickaël Salaün, Paul Moore
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze

On 12/10/2024 9:07 PM, Mickaël Salaün wrote:
> On Thu, Oct 17, 2024 at 07:04:52PM +0800, Mikhail Ivanov wrote:
>> Add tcp_errors_consistency fixture for TCP errors consistency tests.
>>
>> Add 6 test suits for this fixture to configure tested address family of
>> socket (ipv4 or ipv6), sandboxed mode and whether TCP action is allowed
>> in a sandboxed mode.
>>
>> Add tests which validate errors consistency provided by Landlock for
>> bind(2) and connect(2) restrictable TCP actions.
>>
>> Add sys_bind(), sys_connect() helpers for convenient checks of bind(2)
>> and connect(2). Add set_ipv4_tcp_address(), set_ipv6_tcp_address()
>> helpers.
>>
>> Add CONFIG_LSM="landlock" option in config. Some LSMs (e.g. SElinux)
>> can be loaded before Landlock and return inconsistent error code for
>> bind(2) and connect(2) calls.
>>
>> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
>> ---
>>   tools/testing/selftests/landlock/config     |   1 +
>>   tools/testing/selftests/landlock/net_test.c | 329 +++++++++++++++++++-
>>   2 files changed, 324 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config
>> index a8982da4acbd..52988e8a56cc 100644
>> --- a/tools/testing/selftests/landlock/config
>> +++ b/tools/testing/selftests/landlock/config
>> @@ -3,6 +3,7 @@ CONFIG_CGROUP_SCHED=y
>>   CONFIG_INET=y
>>   CONFIG_IPV6=y
>>   CONFIG_KEYS=y
>> +CONFIG_LSM="landlock"
> 
> We should not force CONFIG_LSM because we may want to test Landlock with
> other LSMs.

Ok, I see

> 
> For now, I think we should ignore wrong error codes that may be returned
> by other LSMs but send this patch with a patch series fixing the LSM
> framework as a whole.  Feel free to include these patches too:
> https://lore.kernel.org/all/20240327120036.233641-1-mic@digikod.net/

Fix for the whole LSM subsystem ofc looks better. Lets try to make it.

> 
>>   CONFIG_MPTCP=y
>>   CONFIG_MPTCP_IPV6=y
>>   CONFIG_NET=y
>> diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
>> index d9de0ee49ebc..30b29bf10bdc 100644
>> --- a/tools/testing/selftests/landlock/net_test.c
>> +++ b/tools/testing/selftests/landlock/net_test.c
>> @@ -36,6 +36,22 @@ enum sandbox_type {
>>   	TCP_SANDBOX,
>>   };
>>   
>> +static void set_ipv4_tcp_address(const struct service_fixture *const srv,
>> +				 struct sockaddr_in *ipv4_addr)
>> +{
>> +	ipv4_addr->sin_family = srv->protocol.domain;
>> +	ipv4_addr->sin_port = htons(srv->port);
>> +	ipv4_addr->sin_addr.s_addr = inet_addr(loopback_ipv4);
>> +}
>> +
>> +static void set_ipv6_tcp_address(const struct service_fixture *const srv,
>> +				 struct sockaddr_in6 *ipv6_addr)
>> +{
>> +	ipv6_addr->sin6_family = srv->protocol.domain;
>> +	ipv6_addr->sin6_port = htons(srv->port);
>> +	inet_pton(AF_INET6, loopback_ipv6, &ipv6_addr->sin6_addr);
>> +}
>> +
>>   static int set_service(struct service_fixture *const srv,
>>   		       const struct protocol_variant prot,
>>   		       const unsigned short index)
>> @@ -56,15 +72,11 @@ static int set_service(struct service_fixture *const srv,
>>   	switch (prot.domain) {
>>   	case AF_UNSPEC:
>>   	case AF_INET:
>> -		srv->ipv4_addr.sin_family = prot.domain;
>> -		srv->ipv4_addr.sin_port = htons(srv->port);
>> -		srv->ipv4_addr.sin_addr.s_addr = inet_addr(loopback_ipv4);
>> +		set_ipv4_tcp_address(srv, &srv->ipv4_addr);
>>   		return 0;
>>   
>>   	case AF_INET6:
>> -		srv->ipv6_addr.sin6_family = prot.domain;
>> -		srv->ipv6_addr.sin6_port = htons(srv->port);
>> -		inet_pton(AF_INET6, loopback_ipv6, &srv->ipv6_addr.sin6_addr);
>> +		set_ipv6_tcp_address(srv, &srv->ipv6_addr);
>>   		return 0;
>>   
>>   	case AF_UNIX:
>> @@ -181,6 +193,17 @@ static uint16_t get_binded_port(int socket_fd,
>>   	}
>>   }
>>   
>> +static int sys_bind(const int sock_fd, const struct sockaddr *addr,
>> +		    socklen_t addrlen)
>> +{
>> +	int ret;
>> +
>> +	ret = bind(sock_fd, addr, addrlen);
>> +	if (ret < 0)
>> +		return -errno;
>> +	return 0;
>> +}
>> +
>>   static int bind_variant_addrlen(const int sock_fd,
>>   				const struct service_fixture *const srv,
>>   				const socklen_t addrlen)
>> @@ -217,6 +240,17 @@ static int bind_variant(const int sock_fd,
>>   	return bind_variant_addrlen(sock_fd, srv, get_addrlen(srv, false));
>>   }
>>   
>> +static int sys_connect(const int sock_fd, const struct sockaddr *addr,
>> +		       socklen_t addrlen)
>> +{
>> +	int ret;
>> +
>> +	ret = connect(sock_fd, addr, addrlen);
>> +	if (ret < 0)
>> +		return -errno;
>> +	return 0;
>> +}
>> +
>>   static int connect_variant_addrlen(const int sock_fd,
>>   				   const struct service_fixture *const srv,
>>   				   const socklen_t addrlen)
>> @@ -923,6 +957,289 @@ TEST_F(protocol, connect_unspec)
>>   	EXPECT_EQ(0, close(bind_fd));
>>   }
>>   
>> +FIXTURE(tcp_errors_consistency)
>> +{
>> +	struct service_fixture srv0, srv1;
>> +	struct sockaddr *inval_addr_p0;
>> +	socklen_t addrlen_min;
>> +
>> +	struct sockaddr_in inval_ipv4_addr;
>> +	struct sockaddr_in6 inval_ipv6_addr;
>> +};
>> +
>> +FIXTURE_VARIANT(tcp_errors_consistency)
>> +{
>> +	const enum sandbox_type sandbox;
>> +	const int domain;
>> +	bool allowed;
>> +};
>> +
>> +/* clang-format off */
>> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, no_sandbox_with_ipv4) {
>> +	/* clang-format on */
>> +	.sandbox = NO_SANDBOX,
>> +	.domain = AF_INET,
>> +};
>> +
>> +/* clang-format off */
>> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, no_sandbox_with_ipv6) {
>> +	/* clang-format on */
>> +	.sandbox = NO_SANDBOX,
>> +	.domain = AF_INET6,
>> +};
>> +
>> +/* clang-format off */
>> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, denied_with_ipv4) {
>> +	/* clang-format on */
>> +	.sandbox = TCP_SANDBOX,
>> +	.domain = AF_INET,
>> +	.allowed = false,
>> +};
>> +
>> +/* clang-format off */
>> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, allowed_with_ipv4) {
>> +	/* clang-format on */
>> +	.sandbox = TCP_SANDBOX,
>> +	.domain = AF_INET,
>> +	.allowed = true,
>> +};
>> +
>> +/* clang-format off */
>> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, denied_with_ipv6) {
>> +	/* clang-format on */
>> +	.sandbox = TCP_SANDBOX,
>> +	.domain = AF_INET6,
>> +	.allowed = false,
>> +};
>> +
>> +/* clang-format off */
>> +FIXTURE_VARIANT_ADD(tcp_errors_consistency, allowed_with_ipv6) {
>> +	/* clang-format on */
>> +	.sandbox = TCP_SANDBOX,
>> +	.domain = AF_INET6,
>> +	.allowed = true,
>> +};
>> +
>> +FIXTURE_SETUP(tcp_errors_consistency)
>> +{
>> +	const struct protocol_variant tcp_prot = {
>> +		.domain = variant->domain,
>> +		.type = SOCK_STREAM,
>> +	};
>> +
>> +	disable_caps(_metadata);
>> +
>> +	set_service(&self->srv0, tcp_prot, 0);
>> +	set_service(&self->srv1, tcp_prot, 1);
>> +
>> +	if (variant->domain == AF_INET) {
>> +		set_ipv4_tcp_address(&self->srv0, &self->inval_ipv4_addr);
>> +		self->inval_ipv4_addr.sin_family = AF_INET6;
>> +
>> +		self->inval_addr_p0 = (struct sockaddr *)&self->inval_ipv4_addr;
>> +		self->addrlen_min = sizeof(struct sockaddr_in);
>> +	} else {
>> +		set_ipv6_tcp_address(&self->srv0, &self->inval_ipv6_addr);
>> +		self->inval_ipv6_addr.sin6_family = AF_INET;
>> +
>> +		self->inval_addr_p0 = (struct sockaddr *)&self->inval_ipv6_addr;
>> +		self->addrlen_min = SIN6_LEN_RFC2133;
>> +	}
>> +
>> +	setup_loopback(_metadata);
>> +};
>> +
>> +FIXTURE_TEARDOWN(tcp_errors_consistency)
>> +{
>> +}
>> +
>> +/*
>> + * Validates that Landlock provides errors consistency for bind(2) operation
>> + * (not restricted, allowed and denied).
>> + *
>> + * Error consistency implies that in sandboxed process, bind(2) returns the same
>> + * errors and in the same order (assuming multiple errors) as during normal
>> + * execution.
>> + */
>> +TEST_F(tcp_errors_consistency, bind)
>> +{
>> +	if (variant->sandbox == TCP_SANDBOX) {
>> +		const struct landlock_ruleset_attr ruleset_attr = {
>> +			.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP,
>> +		};
>> +		int ruleset_fd;
>> +
>> +		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
>> +						     sizeof(ruleset_attr), 0);
>> +		ASSERT_LE(0, ruleset_fd);
>> +
>> +		if (variant->allowed) {
>> +			const struct landlock_net_port_attr tcp_bind_p0 = {
>> +				.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
>> +				.port = self->srv0.port,
>> +			};
>> +
>> +			/* Allows bind for the first port. */
>> +			ASSERT_EQ(0, landlock_add_rule(ruleset_fd,
>> +						       LANDLOCK_RULE_NET_PORT,
>> +						       &tcp_bind_p0, 0));
>> +		}
>> +
>> +		enforce_ruleset(_metadata, ruleset_fd);
>> +		EXPECT_EQ(0, close(ruleset_fd));
>> +	}
>> +	int sock_fd;
>> +
>> +	sock_fd = socket_variant(&self->srv0);
>> +	ASSERT_LE(0, sock_fd);
>> +
>> +	/*
>> +	 * Tries to bind socket to address with invalid sa_family value
>> +	 * (AF_INET for ipv6 socket and AF_INET6 for ipv4 socket).
>> +	 */
>> +	EXPECT_EQ(-EAFNOSUPPORT,
>> +		  sys_bind(sock_fd, self->inval_addr_p0, self->addrlen_min));
>> +
>> +	if (variant->domain == AF_INET) {
>> +		struct sockaddr_in ipv4_unspec_addr;
>> +
>> +		set_ipv4_tcp_address(&self->srv0, &ipv4_unspec_addr);
>> +		ipv4_unspec_addr.sin_family = AF_UNSPEC;
>> +		/*
>> +		 * Ipv4 bind(2) accepts AF_UNSPEC family in address only if address is
>> +		 * INADDR_ANY. Otherwise, returns -EAFNOSUPPORT.
>> +		 */
>> +		EXPECT_EQ(-EAFNOSUPPORT,
>> +			  sys_bind(sock_fd,
>> +				   (struct sockaddr *)&ipv4_unspec_addr,
>> +				   self->addrlen_min));
>> +	}
>> +
>> +	/* Tries to bind with too small addrlen (Cf. inet_bind_sk). */
>> +	EXPECT_EQ(-EINVAL, sys_bind(sock_fd, self->inval_addr_p0,
>> +				    self->addrlen_min - 1));
>> +
>> +	ASSERT_EQ(0, close(sock_fd));
>> +}
>> +
>> +/*
>> + * Validates that Landlock provides errors consistency for connect(2) operation
>> + * (not restricted, allowed and denied).
>> + *
>> + * Error consistency implies that in sandboxed process, connect(2) returns the
>> + * same errors and in the same order (assuming multiple errors) as during normal
>> + * execution.
>> + */
>> +TEST_F(tcp_errors_consistency, connect)
>> +{
>> +	int nonblock_p0_fd;
>> +
>> +	nonblock_p0_fd = socket(variant->domain,
>> +				SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0);
>> +	ASSERT_LE(0, nonblock_p0_fd);
>> +
>> +	/* Tries to connect nonblocking socket before establishing ruleset. */
>> +	ASSERT_EQ(-EINPROGRESS, connect_variant(nonblock_p0_fd, &self->srv0));
>> +
>> +	if (variant->sandbox == TCP_SANDBOX) {
>> +		const struct landlock_ruleset_attr ruleset_attr = {
>> +			.handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP,
>> +		};
>> +		const struct landlock_net_port_attr tcp_connect_p1 = {
>> +			.allowed_access = LANDLOCK_ACCESS_NET_CONNECT_TCP,
>> +			.port = self->srv1.port,
>> +		};
>> +		int ruleset_fd;
>> +
>> +		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
>> +						     sizeof(ruleset_attr), 0);
>> +		ASSERT_LE(0, ruleset_fd);
>> +
>> +		/* Allows connect for the second port. */
>> +		ASSERT_EQ(0,
>> +			  landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
>> +					    &tcp_connect_p1, 0));
>> +
>> +		if (variant->allowed) {
>> +			const struct landlock_net_port_attr tcp_connect_p0 = {
>> +				.allowed_access =
>> +					LANDLOCK_ACCESS_NET_CONNECT_TCP,
>> +				.port = self->srv0.port,
>> +			};
>> +
>> +			/* Allows connect for the first port. */
>> +			ASSERT_EQ(0, landlock_add_rule(ruleset_fd,
>> +						       LANDLOCK_RULE_NET_PORT,
>> +						       &tcp_connect_p0, 0));
>> +		}
>> +
>> +		enforce_ruleset(_metadata, ruleset_fd);
>> +		EXPECT_EQ(0, close(ruleset_fd));
>> +	}
>> +	int client_p0_fd, client_p1_fd, server_p0_fd, server_p1_fd;
>> +
>> +	client_p0_fd = socket_variant(&self->srv0);
>> +	ASSERT_LE(0, client_p0_fd);
>> +	/*
>> +	 * Tries to connect socket to address with invalid sa_family value
>> +	 * (AF_INET for ipv6 socket and AF_INET6 for ipv4 socket).
>> +	 */
>> +	EXPECT_EQ(-EAFNOSUPPORT, sys_connect(client_p0_fd, self->inval_addr_p0,
>> +					     self->addrlen_min));
>> +
>> +	/* Tries to connect with too small addrlen. */
>> +	EXPECT_EQ(-EINVAL, sys_connect(client_p0_fd, self->inval_addr_p0,
>> +				       self->addrlen_min - 1));
>> +
>> +	/* Creates socket listening on zero port. */
>> +	server_p0_fd = socket_variant(&self->srv0);
>> +	ASSERT_LE(0, server_p0_fd);
>> +
>> +	ASSERT_EQ(0, bind_variant(server_p0_fd, &self->srv0));
>> +	ASSERT_EQ(0, listen(server_p0_fd, backlog));
>> +	/* Tries to connect listening socket. */
>> +	EXPECT_EQ(-EISCONN, sys_connect(server_p0_fd, self->inval_addr_p0,
>> +					self->addrlen_min - 1));
>> +
>> +	/* Creates socket listening on first port. */
>> +	server_p1_fd = socket_variant(&self->srv1);
>> +	ASSERT_LE(0, server_p1_fd);
>> +
>> +	ASSERT_EQ(0, bind_variant(server_p1_fd, &self->srv1));
>> +	ASSERT_EQ(0, listen(server_p1_fd, backlog));
>> +
>> +	client_p1_fd = socket_variant(&self->srv1);
>> +	ASSERT_LE(0, client_p1_fd);
>> +
>> +	/* Connects to server_p1_fd. */
>> +	ASSERT_EQ(0, connect_variant(client_p1_fd, &self->srv1));
>> +	/* Tries to connect already connected socket. */
>> +	EXPECT_EQ(-EISCONN, sys_connect(client_p1_fd, self->inval_addr_p0,
>> +					self->addrlen_min - 1));
>> +
>> +	/*
>> +	 * connect(2) is called upon nonblocking socket and previous connection
>> +	 * attempt was closed by RST packet. Landlock cannot provide error
>> +	 * consistency in this case (Cf. check_tcp_connect_consistency_and_get_port()).
>> +	 */
>> +	if (variant->sandbox == TCP_SANDBOX) {
>> +		EXPECT_EQ(-EACCES,
>> +			  connect_variant(nonblock_p0_fd, &self->srv0));
>> +	} else {
>> +		EXPECT_EQ(-ECONNREFUSED,
>> +			  connect_variant(nonblock_p0_fd, &self->srv0));
>> +	}
>> +
>> +	/* Tries to connect with zero as addrlen. */
>> +	EXPECT_EQ(-EINVAL, sys_connect(client_p0_fd, self->inval_addr_p0, 0));
>> +
>> +	ASSERT_EQ(0, close(client_p1_fd));
>> +	ASSERT_EQ(0, close(server_p1_fd));
>> +	ASSERT_EQ(0, close(server_p0_fd));
>> +	ASSERT_EQ(0, close(client_p0_fd));
>> +	ASSERT_EQ(0, close(nonblock_p0_fd));
>> +}
>> +
>>   FIXTURE(ipv4)
>>   {
>>   	struct service_fixture srv0, srv1;
>> -- 
>> 2.34.1
>>
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 7/8] landlock: Add note about errors consistency in documentation
  2024-12-10 18:08   ` Mickaël Salaün
@ 2024-12-11 15:30     ` Mikhail Ivanov
  0 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-12-11 15:30 UTC (permalink / raw)
  To: Mickaël Salaün, Paul Moore
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze

On 12/10/2024 9:08 PM, Mickaël Salaün wrote:
> On Thu, Oct 17, 2024 at 07:04:53PM +0800, Mikhail Ivanov wrote:
>> Add recommendation to specify Landlock first in CONFIG_LSM list, so user
>> can have better LSM errors consistency provided by Landlock.
>>
>> Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
>> ---
>>   Documentation/userspace-api/landlock.rst | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
>> index bb7480a05e2c..0db5eee9bffa 100644
>> --- a/Documentation/userspace-api/landlock.rst
>> +++ b/Documentation/userspace-api/landlock.rst
>> @@ -610,7 +610,8 @@ time as the other security modules.  The list of security modules enabled by
>>   default is set with ``CONFIG_LSM``.  The kernel configuration should then
>>   contains ``CONFIG_LSM=landlock,[...]`` with ``[...]``  as the list of other
>>   potentially useful security modules for the running system (see the
>> -``CONFIG_LSM`` help).
>> +``CONFIG_LSM`` help). It is recommended to specify Landlock first of all other
>> +modules in CONFIG_LSM list since it provides better errors consistency.
> 
> This is partially correct because Landlock may not block anything
> whereas another LSM could deny a network action, with potentially a
> wrong error code.  I don't think this patch is worth it, especially
> because other LSMs have bugs that should be fixed.

Ok, agreed

> 
>>   
>>   Boot time configuration
>>   -----------------------
>> -- 
>> 2.34.1
>>
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-11 15:24                 ` Mikhail Ivanov
@ 2024-12-12 18:43                   ` Mickaël Salaün
  2024-12-13 11:42                     ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-12 18:43 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, David Laight

On Wed, Dec 11, 2024 at 06:24:53PM +0300, Mikhail Ivanov wrote:
> On 12/10/2024 9:05 PM, Mickaël Salaün wrote:
> > On Tue, Dec 10, 2024 at 07:04:15PM +0100, Mickaël Salaün wrote:
> > > On Mon, Dec 09, 2024 at 01:19:19PM +0300, Mikhail Ivanov wrote:
> > > > On 12/4/2024 10:35 PM, Mickaël Salaün wrote:
> > > > > On Wed, Dec 04, 2024 at 08:27:58PM +0100, Mickaël Salaün wrote:
> > > > > > On Fri, Oct 18, 2024 at 08:08:12PM +0200, Mickaël Salaün wrote:
> > > > > > > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > > > > > > > Hi Mikhail and Landlock maintainers,
> > > > > > > > 
> > > > > > > > +cc MPTCP list.
> > > > > > > 
> > > > > > > Thanks, we should include this list in the next series.
> > > > > > > 
> > > > > > > > 
> > > > > > > > On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > > > > > > > > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > > > > > > > > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > > > > > > > > should not restrict bind(2) and connect(2) for non-TCP protocols
> > > > > > > > > (SCTP, MPTCP, SMC).
> > > > > > > > 
> > > > > > > > Thank you for the patch!
> > > > > > > > 
> > > > > > > > I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> > > > > > > > treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> > > > > > > > see TCP packets with extra TCP options. On Linux, there is indeed a
> > > > > > > > dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> > > > > > > > because we needed such dedicated socket to talk to the userspace.
> > > > > > > > 
> > > > > > > > I don't know Landlock well, but I think it is important to know that an
> > > > > > > > MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> > > > > > > > will do a fallback to "plain" TCP if MPTCP is not supported by the other
> > > > > > > > peer or by a middlebox. It means that with this patch, if TCP is blocked
> > > > > > > > by Landlock, someone can simply force an application to create an MPTCP
> > > > > > > > socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> > > > > > > > certainly work, even when connecting to a peer not supporting MPTCP.
> > > > > > > > 
> > > > > > > > Please note that I'm not against this modification -- especially here
> > > > > > > > when we remove restrictions around MPTCP sockets :) -- I'm just saying
> > > > > > > > it might be less confusing for users if MPTCP is considered as being
> > > > > > > > part of TCP. A bit similar to what someone would do with a firewall: if
> > > > > > > > TCP is blocked, MPTCP is blocked as well.
> > > > > > > 
> > > > > > > Good point!  I don't know well MPTCP but I think you're right.  Given
> > > > > > > it's close relationship with TCP and the fallback mechanism, it would
> > > > > > > make sense for users to not make a difference and it would avoid bypass
> > > > > > > of misleading restrictions.  Moreover the Landlock rules are simple and
> > > > > > > only control TCP ports, not peer addresses, which seems to be the main
> > > > > > > evolution of MPTCP.
> > > > > > 
> > > > > > Thinking more about this, this makes sense from the point of view of the
> > > > > > network stack, but looking at external (potentially bogus) firewalls or
> > > > > > malware detection systems, it is something different.  If we don't
> > > > > > provide a way for users to differenciate the control of SCTP from TCP,
> > > > > > malicious use of SCTP could still bypass this kind of bogus security
> > > > > > appliances.  It would then be safer to stick to the protocol semantic by
> > > > > > clearly differenciating TCP from MPTCP (or any other protocol).
> > > > 
> > > > You mean that these firewals have protocol granularity (e.g. different
> > > > restrictions for MPTCP and TCP sockets)?
> > > 
> > > Yes, and more importantly they can miss the MTCP semantic and then not
> > > properly filter such packet, which can be use to escape the network
> > > policy.  See some issues here:
> > > https://en.wikipedia.org/wiki/Multipath_TCP
> > > 
> > > The point is that we cannot assume anything about other networking
> > > stacks, and if Landlock can properly differentiate between TCP and MTCP
> > > (e.g. with new LANDLOCK_ACCESS_NET_CONNECT_MTCP) users of such firewalls
> > > could still limit the impact of their firewall's bugs.  However, if
> > > Landlock treats TCP and MTCP the same way, we'll not be able to only
> > > deny MTCP.  In most use cases, the network policy should treat both TCP
> > > and MTCP the same way though, but we should let users decide according
> > > to their context.
> > > 
> > >  From an implementation point of view, adding MTCP support should be
> > > simple, mainly tests will grow.
> > 
> > s/MTCP/MPTCP/g of course.
> 
> That's reasonable, thanks for explanation!
> 
> We should also consider control of other protocols that use TCP
> internally [1], since it should be easy to bypass TCP restriction by
> using them (e.g. provoking a fallback of MPTCP or SMC connection to
> TCP).
> 
> The simplest solution is to implement separate access rights for SMC and
> RDS, as well as for MPTCP. I think we should stick to it.
> 
> I was worried if there was a case where it would be useful to allow only
> SMC (and deny TCP). If there are any, it would be more correct to
> restrict only the fallback SMC -> TCP with TCP access rights. But such
> logic seems too complicated for the kernel and implicit for SMC
> applications that can rely on a TCP connection.
> 
> [1] https://lore.kernel.org/all/62336067-18c2-3493-d0ec-6dd6a6d3a1b5@huawei-partners.com/

Let's continue the discussion on this thread.

> 
> > 
> > > 
> > > > 
> > > > > > 
> > > > > > Mikhail, could you please send a new patch series containing one patch
> > > > > > to fix the kernel and another to extend tests?
> > > > > 
> > > > > No need to squash them in one, please keep the current split of the test
> > > > > patches.  However, it would be good to be able to easily backport them,
> > > > > or at least the most relevant for this fix, which means to avoid
> > > > > extended refactoring.
> > > > 
> > > > No problem, I'll remove the fix of error consistency from this patchset.
> > > > BTW, what do you think about second and third commits? Should I send the
> > > > new version of them as well (in separate patch)?
> > > 
> > > According to the description, patch 2 may be included in this series if
> > > it can be tested with any other LSM, but I cannot read these patches:
> > > https://lore.kernel.org/all/20241017110454.265818-3-ivanov.mikhail1@huawei-partners.com/
> 
> Ok I'll do this, since this patch doesn't make any functional changes.
> 
> About readability, a lot of code blocks were moved in this patch, and
> because of this, the regular diff file has become too unreadable.
> So, I decided to re-generate it with --break-rewrites option of git
> format- patch. Do you have any advice on how best to compose a diff for
> this patch?

The changes are not clear to me so I don't know.  If a lot of parts are
changed, maybe splitting this patch into a few patches would help.  I'm
a bit worried that too much parts are changed though.

When I try to apply this series I get:

  Patch failed at 0002 landlock: Make network stack layer checks explicit
  for each TCP action
  error: patch failed: security/landlock/net.c:1
  error: security/landlock/net.c: patch does not apply

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-10-31 16:21       ` Mikhail Ivanov
  2024-11-08 17:16         ` David Laight
@ 2024-12-12 18:43         ` Mickaël Salaün
  2024-12-13 18:19           ` Mikhail Ivanov
  1 sibling, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2024-12-12 18:43 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs

On Thu, Oct 31, 2024 at 07:21:44PM +0300, Mikhail Ivanov wrote:
> On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
> > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > > Hi Mikhail and Landlock maintainers,
> > > 
> > > +cc MPTCP list.
> > 
> > Thanks, we should include this list in the next series.
> > 
> > > 
> > > On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > > > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > > > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > > > should not restrict bind(2) and connect(2) for non-TCP protocols
> > > > (SCTP, MPTCP, SMC).
> > > 
> > > Thank you for the patch!
> > > 
> > > I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> > > treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> > > see TCP packets with extra TCP options. On Linux, there is indeed a
> > > dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> > > because we needed such dedicated socket to talk to the userspace.
> > > 
> > > I don't know Landlock well, but I think it is important to know that an
> > > MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> > > will do a fallback to "plain" TCP if MPTCP is not supported by the other
> > > peer or by a middlebox. It means that with this patch, if TCP is blocked
> > > by Landlock, someone can simply force an application to create an MPTCP
> > > socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> > > certainly work, even when connecting to a peer not supporting MPTCP.
> > > 
> > > Please note that I'm not against this modification -- especially here
> > > when we remove restrictions around MPTCP sockets :) -- I'm just saying
> > > it might be less confusing for users if MPTCP is considered as being
> > > part of TCP. A bit similar to what someone would do with a firewall: if
> > > TCP is blocked, MPTCP is blocked as well.
> > 
> > Good point!  I don't know well MPTCP but I think you're right.  Given
> > it's close relationship with TCP and the fallback mechanism, it would
> > make sense for users to not make a difference and it would avoid bypass
> > of misleading restrictions.  Moreover the Landlock rules are simple and
> > only control TCP ports, not peer addresses, which seems to be the main
> > evolution of MPTCP. >
> > > 
> > > I understand that a future goal might probably be to have dedicated
> > > restrictions for MPTCP and the other stream protocols (and/or for all
> > > stream protocols like it was before this patch), but in the meantime, it
> > > might be less confusing considering MPTCP as being part of TCP (I'm not
> > > sure about the other stream protocols).
> > 
> > We need to take a closer look at the other stream protocols indeed.
> Hello! Sorry for the late reply, I was on a small business trip.
> 
> Thanks a lot for this catch, without doubt MPTCP should be controlled
> with TCP access rights.
> 
> In that case, we should reconsider current semantics of TCP control.
> 
> Currently, it looks like this:
> * LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port.
> * LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to a
>   remote port.
> 
> According to these definitions only TCP sockets should be restricted and
> this is already provided by Landlock (considering observing commit)
> (assuming that "TCP socket" := user space socket of IPPROTO_TCP
> protocol).
> 
> AFAICS the two objectives of TCP access rights are to control
> (1) which ports can be used for sending or receiving TCP packets
>     (including SYN, ACK or other service packets).
> (2) which ports can be used to establish TCP connection (performed by
>     kernel network stack on server or client side).
> 
> In most cases denying (2) cause denying (1). Sending or receiving TCP
> packets without initial 3-way handshake is only possible on RAW [1] or
> PACKET [2] sockets. Usage of such sockets requires root privilligies, so
> there is no point to control them with Landlock.

I agree.

> 
> Therefore Landlock should only take care about case (2). For now
> (please correct me if I'm wrong), we only considered control of
> connection performed on user space plain TCP sockets (created with
> IPPROTO_TCP).

Correct. Landlock is dedicated to sandbox user space processes and the
related access rights should focus on restricting what is possible
through syscalls (mainly).

> 
> TCP kernel sockets are generally used in the following ways:
> * in a couple of other user space protocols (MPTCP, SMC, RDS)
> * in a few network filesystems (e.g. NFS communication over TCP)
> 
> For the second case TCP connection is currently not restricted by
> Landlock. This approach is may be correct, since NFS should not have
> access to a plain TCP communication and TCP restriction of NFS may
> be too implicit. Nevertheless, I think that restriction via current
> access rights should be considered.

I'm not sure what you mean here.  I'm not familiar with NFS in the
kernel.  AFAIK there is no socket type for NFS.

> 
> For the first case, each protocol use TCP differently, so they should
> be considered separately.

Yes, for user-accessible protocols.

> 
> In the case of MPTCP TCP internal sockets are used to establish
> connection and exchange data between two network interfaces. MPTCP
> allows to have multiple TCP connections between two MPTCP sockets by
> connecting different network interfaces (e.g. WIFI and 3G).
> 
> Shared Memory Communication is a protocol that allows TCP applications
> transparently use RDMA for communication [3]. TCP internal socket is
> used to exchange service CLC messages when establishing SMC connection
> (which seems harmless for sandboxing) and for communication in the case
> of fallback. Fallback happens only if RDMA communication became
> impossible (e.g. if RDMA capable RNIC card went down on host or peer
> side). So, preventing TCP communication may be achieved by controlling
> fallback mechanism.
> 
> Reliable Datagram Socket is connectionless protocol implemented by
> Oracle [4]. It uses TCP stack or Infiniband to reliably deliever
> datagrams. For every sendmsg(2), recvmsg(2) it establishes TCP
> connection and use it to deliever splitted message.
> 
> In comparison with previous protocols, RDS sockets cannot be binded or
> connected to special TCP ports (e.g. with bind(2), connect(2)). 16385
> port is assigned to receiving side and sending side is binded to the
> port allocated by the kernel (by using zero as port number).
> 
> It may be useful to restrict RDS-over-TCP with current access rights,
> since it allows to perform TCP communication from user-space. But it
> would be only possible to fully allow or deny sending/receiving
> (since used ports are not controlled from user space).

Thanks for these explanations.  The ability to fine-control specific
protocol operations (e.g. connect, bind) can be useful for widely used
protocol such as TCP and UDP (or if someone wants to implement it for
another protocol), but this approach would not scale with all protocols
because of their own semantic and the development efforts.  The Landlock
access rights should be explicit, and we should also be able to deny
access to a whole set of protocols.  This should be partially possible
with your socket creation patch series.  I guess the remaining cases
would be to cover transformation of one socket type to another.  I think
we could control such transformation by building on top of the socket
creation control foundation: instead of controlling socket creation, add
a new access right to control socket transformation.  What do you think?

> 
> Restricting any TCP connection in the kernel is probably simplest
> design, but we should consider above cases to provide the most useful
> one.
> 
> [1] https://man7.org/linux/man-pages/man7/raw.7.html
> [2] https://man7.org/linux/man-pages/man7/packet.7.html
> [3] https://datatracker.ietf.org/doc/html/rfc7609
> [4] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html
> 
> > 
> > > 
> > > 
> > > > sk_is_tcp() is used for this to check address family of the socket
> > > > before doing INET-specific address length validation. This is required
> > > > for error consistency.
> > > > 
> > > > Closes: https://github.com/landlock-lsm/linux/issues/40
> > > > Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
> > > 
> > > I don't know how fixes are considered in Landlock, but should this patch
> > > be considered as a fix? It might be surprising for someone who thought
> > > all "stream" connections were blocked to have them unblocked when
> > > updating to a minor kernel version, no?
> > 
> > Indeed.  The main issue was with the semantic/definition of
> > LANDLOCK_ACCESS_FS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
> > code with the documentation, one way or the other, preferably following
> > the principle of least astonishment.
> > 
> > > 
> > > (Personally, I would understand such behaviour change when upgrading to
> > > a major version, and still, maybe only if there were alternatives to
> > 
> > This "fix" needs to be backported, but we're not clear yet on what it
> > should be. :)
> > 
> > > continue having the same behaviour, e.g. a way to restrict all stream
> > > sockets the same way, or something per stream socket. But that's just me
> > > :) )
> > 
> > The documentation and the initial idea was to control TCP bind and
> > connect.  The kernel implementation does more than that, so we need to
> > synthronize somehow.
> > 
> > > 
> > > Cheers,
> > > Matt
> > > -- 
> > > Sponsored by the NGI0 Core fund.
> > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-12 18:43                   ` Mickaël Salaün
@ 2024-12-13 11:42                     ` Mikhail Ivanov
  0 siblings, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2024-12-13 11:42 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, David Laight

On 12/12/2024 9:43 PM, Mickaël Salaün wrote:
> On Wed, Dec 11, 2024 at 06:24:53PM +0300, Mikhail Ivanov wrote:
>> On 12/10/2024 9:05 PM, Mickaël Salaün wrote:
>>> On Tue, Dec 10, 2024 at 07:04:15PM +0100, Mickaël Salaün wrote:
>>>> On Mon, Dec 09, 2024 at 01:19:19PM +0300, Mikhail Ivanov wrote:
>>>>> On 12/4/2024 10:35 PM, Mickaël Salaün wrote:
>>>>>> On Wed, Dec 04, 2024 at 08:27:58PM +0100, Mickaël Salaün wrote:
>>>>>>> On Fri, Oct 18, 2024 at 08:08:12PM +0200, Mickaël Salaün wrote:
>>>>>>>> On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
>>>>>>>>> Hi Mikhail and Landlock maintainers,
>>>>>>>>>
>>>>>>>>> +cc MPTCP list.
>>>>>>>>
>>>>>>>> Thanks, we should include this list in the next series.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17/10/2024 13:04, Mikhail Ivanov wrote:
>>>>>>>>>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
>>>>>>>>>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
>>>>>>>>>> should not restrict bind(2) and connect(2) for non-TCP protocols
>>>>>>>>>> (SCTP, MPTCP, SMC).
>>>>>>>>>
>>>>>>>>> Thank you for the patch!
>>>>>>>>>
>>>>>>>>> I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
>>>>>>>>> treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
>>>>>>>>> see TCP packets with extra TCP options. On Linux, there is indeed a
>>>>>>>>> dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
>>>>>>>>> because we needed such dedicated socket to talk to the userspace.
>>>>>>>>>
>>>>>>>>> I don't know Landlock well, but I think it is important to know that an
>>>>>>>>> MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
>>>>>>>>> will do a fallback to "plain" TCP if MPTCP is not supported by the other
>>>>>>>>> peer or by a middlebox. It means that with this patch, if TCP is blocked
>>>>>>>>> by Landlock, someone can simply force an application to create an MPTCP
>>>>>>>>> socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
>>>>>>>>> certainly work, even when connecting to a peer not supporting MPTCP.
>>>>>>>>>
>>>>>>>>> Please note that I'm not against this modification -- especially here
>>>>>>>>> when we remove restrictions around MPTCP sockets :) -- I'm just saying
>>>>>>>>> it might be less confusing for users if MPTCP is considered as being
>>>>>>>>> part of TCP. A bit similar to what someone would do with a firewall: if
>>>>>>>>> TCP is blocked, MPTCP is blocked as well.
>>>>>>>>
>>>>>>>> Good point!  I don't know well MPTCP but I think you're right.  Given
>>>>>>>> it's close relationship with TCP and the fallback mechanism, it would
>>>>>>>> make sense for users to not make a difference and it would avoid bypass
>>>>>>>> of misleading restrictions.  Moreover the Landlock rules are simple and
>>>>>>>> only control TCP ports, not peer addresses, which seems to be the main
>>>>>>>> evolution of MPTCP.
>>>>>>>
>>>>>>> Thinking more about this, this makes sense from the point of view of the
>>>>>>> network stack, but looking at external (potentially bogus) firewalls or
>>>>>>> malware detection systems, it is something different.  If we don't
>>>>>>> provide a way for users to differenciate the control of SCTP from TCP,
>>>>>>> malicious use of SCTP could still bypass this kind of bogus security
>>>>>>> appliances.  It would then be safer to stick to the protocol semantic by
>>>>>>> clearly differenciating TCP from MPTCP (or any other protocol).
>>>>>
>>>>> You mean that these firewals have protocol granularity (e.g. different
>>>>> restrictions for MPTCP and TCP sockets)?
>>>>
>>>> Yes, and more importantly they can miss the MTCP semantic and then not
>>>> properly filter such packet, which can be use to escape the network
>>>> policy.  See some issues here:
>>>> https://en.wikipedia.org/wiki/Multipath_TCP
>>>>
>>>> The point is that we cannot assume anything about other networking
>>>> stacks, and if Landlock can properly differentiate between TCP and MTCP
>>>> (e.g. with new LANDLOCK_ACCESS_NET_CONNECT_MTCP) users of such firewalls
>>>> could still limit the impact of their firewall's bugs.  However, if
>>>> Landlock treats TCP and MTCP the same way, we'll not be able to only
>>>> deny MTCP.  In most use cases, the network policy should treat both TCP
>>>> and MTCP the same way though, but we should let users decide according
>>>> to their context.
>>>>
>>>>   From an implementation point of view, adding MTCP support should be
>>>> simple, mainly tests will grow.
>>>
>>> s/MTCP/MPTCP/g of course.
>>
>> That's reasonable, thanks for explanation!
>>
>> We should also consider control of other protocols that use TCP
>> internally [1], since it should be easy to bypass TCP restriction by
>> using them (e.g. provoking a fallback of MPTCP or SMC connection to
>> TCP).
>>
>> The simplest solution is to implement separate access rights for SMC and
>> RDS, as well as for MPTCP. I think we should stick to it.
>>
>> I was worried if there was a case where it would be useful to allow only
>> SMC (and deny TCP). If there are any, it would be more correct to
>> restrict only the fallback SMC -> TCP with TCP access rights. But such
>> logic seems too complicated for the kernel and implicit for SMC
>> applications that can rely on a TCP connection.
>>
>> [1] https://lore.kernel.org/all/62336067-18c2-3493-d0ec-6dd6a6d3a1b5@huawei-partners.com/
> 
> Let's continue the discussion on this thread.
> 
>>
>>>
>>>>
>>>>>
>>>>>>>
>>>>>>> Mikhail, could you please send a new patch series containing one patch
>>>>>>> to fix the kernel and another to extend tests?
>>>>>>
>>>>>> No need to squash them in one, please keep the current split of the test
>>>>>> patches.  However, it would be good to be able to easily backport them,
>>>>>> or at least the most relevant for this fix, which means to avoid
>>>>>> extended refactoring.
>>>>>
>>>>> No problem, I'll remove the fix of error consistency from this patchset.
>>>>> BTW, what do you think about second and third commits? Should I send the
>>>>> new version of them as well (in separate patch)?
>>>>
>>>> According to the description, patch 2 may be included in this series if
>>>> it can be tested with any other LSM, but I cannot read these patches:
>>>> https://lore.kernel.org/all/20241017110454.265818-3-ivanov.mikhail1@huawei-partners.com/
>>
>> Ok I'll do this, since this patch doesn't make any functional changes.
>>
>> About readability, a lot of code blocks were moved in this patch, and
>> because of this, the regular diff file has become too unreadable.
>> So, I decided to re-generate it with --break-rewrites option of git
>> format- patch. Do you have any advice on how best to compose a diff for
>> this patch?
> 
> The changes are not clear to me so I don't know.  If a lot of parts are
> changed, maybe splitting this patch into a few patches would help.  I'm
> a bit worried that too much parts are changed though.

Mostly, there are just bind() and connect() related checks moved to
hook_socket_{connect, bind}.

I think I'd better move all refactoring-related fixes to a separate
patchset.

> 
> When I try to apply this series I get:
> 
>    Patch failed at 0002 landlock: Make network stack layer checks explicit
>    for each TCP action
>    error: patch failed: security/landlock/net.c:1
>    error: security/landlock/net.c: patch does not apply

Sorry, it looks like patches created using the --break-rewrites option
of git format can only be applied manually. I'll try to split this patch
in v3 so that it can be applied automatically.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-12 18:43         ` Mickaël Salaün
@ 2024-12-13 18:19           ` Mikhail Ivanov
  2025-01-24 15:02             ` Mickaël Salaün
  0 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2024-12-13 18:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs

On 12/12/2024 9:43 PM, Mickaël Salaün wrote:
> On Thu, Oct 31, 2024 at 07:21:44PM +0300, Mikhail Ivanov wrote:
>> On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
>>> On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
>>>> Hi Mikhail and Landlock maintainers,
>>>>
>>>> +cc MPTCP list.
>>>
>>> Thanks, we should include this list in the next series.
>>>
>>>>
>>>> On 17/10/2024 13:04, Mikhail Ivanov wrote:
>>>>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
>>>>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
>>>>> should not restrict bind(2) and connect(2) for non-TCP protocols
>>>>> (SCTP, MPTCP, SMC).
>>>>
>>>> Thank you for the patch!
>>>>
>>>> I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
>>>> treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
>>>> see TCP packets with extra TCP options. On Linux, there is indeed a
>>>> dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
>>>> because we needed such dedicated socket to talk to the userspace.
>>>>
>>>> I don't know Landlock well, but I think it is important to know that an
>>>> MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
>>>> will do a fallback to "plain" TCP if MPTCP is not supported by the other
>>>> peer or by a middlebox. It means that with this patch, if TCP is blocked
>>>> by Landlock, someone can simply force an application to create an MPTCP
>>>> socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
>>>> certainly work, even when connecting to a peer not supporting MPTCP.
>>>>
>>>> Please note that I'm not against this modification -- especially here
>>>> when we remove restrictions around MPTCP sockets :) -- I'm just saying
>>>> it might be less confusing for users if MPTCP is considered as being
>>>> part of TCP. A bit similar to what someone would do with a firewall: if
>>>> TCP is blocked, MPTCP is blocked as well.
>>>
>>> Good point!  I don't know well MPTCP but I think you're right.  Given
>>> it's close relationship with TCP and the fallback mechanism, it would
>>> make sense for users to not make a difference and it would avoid bypass
>>> of misleading restrictions.  Moreover the Landlock rules are simple and
>>> only control TCP ports, not peer addresses, which seems to be the main
>>> evolution of MPTCP. >
>>>>
>>>> I understand that a future goal might probably be to have dedicated
>>>> restrictions for MPTCP and the other stream protocols (and/or for all
>>>> stream protocols like it was before this patch), but in the meantime, it
>>>> might be less confusing considering MPTCP as being part of TCP (I'm not
>>>> sure about the other stream protocols).
>>>
>>> We need to take a closer look at the other stream protocols indeed.
>> Hello! Sorry for the late reply, I was on a small business trip.
>>
>> Thanks a lot for this catch, without doubt MPTCP should be controlled
>> with TCP access rights.
>>
>> In that case, we should reconsider current semantics of TCP control.
>>
>> Currently, it looks like this:
>> * LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port.
>> * LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to a
>>    remote port.
>>
>> According to these definitions only TCP sockets should be restricted and
>> this is already provided by Landlock (considering observing commit)
>> (assuming that "TCP socket" := user space socket of IPPROTO_TCP
>> protocol).
>>
>> AFAICS the two objectives of TCP access rights are to control
>> (1) which ports can be used for sending or receiving TCP packets
>>      (including SYN, ACK or other service packets).
>> (2) which ports can be used to establish TCP connection (performed by
>>      kernel network stack on server or client side).
>>
>> In most cases denying (2) cause denying (1). Sending or receiving TCP
>> packets without initial 3-way handshake is only possible on RAW [1] or
>> PACKET [2] sockets. Usage of such sockets requires root privilligies, so
>> there is no point to control them with Landlock.
> 
> I agree.
> 
>>
>> Therefore Landlock should only take care about case (2). For now
>> (please correct me if I'm wrong), we only considered control of
>> connection performed on user space plain TCP sockets (created with
>> IPPROTO_TCP).
> 
> Correct. Landlock is dedicated to sandbox user space processes and the
> related access rights should focus on restricting what is possible
> through syscalls (mainly).
> 
>>
>> TCP kernel sockets are generally used in the following ways:
>> * in a couple of other user space protocols (MPTCP, SMC, RDS)
>> * in a few network filesystems (e.g. NFS communication over TCP)
>>
>> For the second case TCP connection is currently not restricted by
>> Landlock. This approach is may be correct, since NFS should not have
>> access to a plain TCP communication and TCP restriction of NFS may
>> be too implicit. Nevertheless, I think that restriction via current
>> access rights should be considered.
> 
> I'm not sure what you mean here.  I'm not familiar with NFS in the
> kernel.  AFAIK there is no socket type for NFS.

NFS client makes RPC requests to perform remote file operations on the
NFS server. RPC requests can be sent using TCP, UDP, or RDMA sockets at
the transport layer.

Call trace of creating TCP socket for client->server communication:
	nfs_create_rpc_client()
	rpc_create()
	xprt_create_transport()
	xs_setup_tcp()
	xs_tcp_setup_socket()
	xs_create_sock()

And RPC request is forwarded to TCP stack by calling
	xs_tcp_send_request().

> 
>>
>> For the first case, each protocol use TCP differently, so they should
>> be considered separately.
> 
> Yes, for user-accessible protocols.
> 
>>
>> In the case of MPTCP TCP internal sockets are used to establish
>> connection and exchange data between two network interfaces. MPTCP
>> allows to have multiple TCP connections between two MPTCP sockets by
>> connecting different network interfaces (e.g. WIFI and 3G).
>>
>> Shared Memory Communication is a protocol that allows TCP applications
>> transparently use RDMA for communication [3]. TCP internal socket is
>> used to exchange service CLC messages when establishing SMC connection
>> (which seems harmless for sandboxing) and for communication in the case
>> of fallback. Fallback happens only if RDMA communication became
>> impossible (e.g. if RDMA capable RNIC card went down on host or peer
>> side). So, preventing TCP communication may be achieved by controlling
>> fallback mechanism.
>>
>> Reliable Datagram Socket is connectionless protocol implemented by
>> Oracle [4]. It uses TCP stack or Infiniband to reliably deliever
>> datagrams. For every sendmsg(2), recvmsg(2) it establishes TCP
>> connection and use it to deliever splitted message.
>>
>> In comparison with previous protocols, RDS sockets cannot be binded or
>> connected to special TCP ports (e.g. with bind(2), connect(2)). 16385
>> port is assigned to receiving side and sending side is binded to the
>> port allocated by the kernel (by using zero as port number).
>>
>> It may be useful to restrict RDS-over-TCP with current access rights,
>> since it allows to perform TCP communication from user-space. But it
>> would be only possible to fully allow or deny sending/receiving
>> (since used ports are not controlled from user space).
> 
> Thanks for these explanations.  The ability to fine-control specific
> protocol operations (e.g. connect, bind) can be useful for widely used
> protocol such as TCP and UDP (or if someone wants to implement it for
> another protocol), but this approach would not scale with all protocols
> because of their own semantic and the development efforts.  The Landlock
> access rights should be explicit, and we should also be able to deny
> access to a whole set of protocols.  This should be partially possible
> with your socket creation patch series.  I guess the remaining cases
> would be to cover transformation of one socket type to another.  I think
> we could control such transformation by building on top of the socket
> creation control foundation: instead of controlling socket creation, add
> a new access right to control socket transformation.  What do you think?

I agree that implementing fine-control network access rights for other
protocols only to be able to completely restrict TCP operations seems
excessive.

Do you mean the implementation of 2 access rights: for creating and
transforming sockets?

If so, there are only 2 socket protocols that can be transformed to TCP
(in the fallback path) - MPTCP and SMC. Recall that in the case of RDS,
a TCP socket can be used implicitly to deliver an RDS datagram. Let's
assume that the process of configuring TCP as a transport for RDS is
also included in the socket transformation control.

Socket creation control is sufficient to restrict the implicit use of a
TCP connection. Theoretically, separate socket transformation
control is only required if the user wants to use (for example) SMC
sockets with restricted (partially or completely) TCP bind(2) and
connect(2) actions. But SMC (or MPTCP) applications should rely on TCP
communication in case of fallback. I think they are unlikely to have any
TCP restrictions.

However, control of fallback to TCP by applying socket creation rules
is too implicit and inconvenient.

Initially, I thought that users could expect TCP access rights to
completely restrict the corresponding TCP actions without additional
rules for sockets. I have concerns that socket transformation control
would not be explicit enough for such purpose.

Probably, it will be more correctly to apply rules that deny creation of
SMC, MPTCP and RDS sockets (or their transformation to TCP) in
landlock_restrict_self() if TCP actions are not fully allowed?

> 
>>
>> Restricting any TCP connection in the kernel is probably simplest
>> design, but we should consider above cases to provide the most useful
>> one.
>>
>> [1] https://man7.org/linux/man-pages/man7/raw.7.html
>> [2] https://man7.org/linux/man-pages/man7/packet.7.html
>> [3] https://datatracker.ietf.org/doc/html/rfc7609
>> [4] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html
>>
>>>
>>>>
>>>>
>>>>> sk_is_tcp() is used for this to check address family of the socket
>>>>> before doing INET-specific address length validation. This is required
>>>>> for error consistency.
>>>>>
>>>>> Closes: https://github.com/landlock-lsm/linux/issues/40
>>>>> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
>>>>
>>>> I don't know how fixes are considered in Landlock, but should this patch
>>>> be considered as a fix? It might be surprising for someone who thought
>>>> all "stream" connections were blocked to have them unblocked when
>>>> updating to a minor kernel version, no?
>>>
>>> Indeed.  The main issue was with the semantic/definition of
>>> LANDLOCK_ACCESS_FS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
>>> code with the documentation, one way or the other, preferably following
>>> the principle of least astonishment.
>>>
>>>>
>>>> (Personally, I would understand such behaviour change when upgrading to
>>>> a major version, and still, maybe only if there were alternatives to
>>>
>>> This "fix" needs to be backported, but we're not clear yet on what it
>>> should be. :)
>>>
>>>> continue having the same behaviour, e.g. a way to restrict all stream
>>>> sockets the same way, or something per stream socket. But that's just me
>>>> :) )
>>>
>>> The documentation and the initial idea was to control TCP bind and
>>> connect.  The kernel implementation does more than that, so we need to
>>> synthronize somehow.
>>>
>>>>
>>>> Cheers,
>>>> Matt
>>>> -- 
>>>> Sponsored by the NGI0 Core fund.
>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2024-12-13 18:19           ` Mikhail Ivanov
@ 2025-01-24 15:02             ` Mickaël Salaün
  2025-01-27 12:40               ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2025-01-24 15:02 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs

On Fri, Dec 13, 2024 at 09:19:10PM +0300, Mikhail Ivanov wrote:
> On 12/12/2024 9:43 PM, Mickaël Salaün wrote:
> > On Thu, Oct 31, 2024 at 07:21:44PM +0300, Mikhail Ivanov wrote:
> > > On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
> > > > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > > > > Hi Mikhail and Landlock maintainers,
> > > > > 
> > > > > +cc MPTCP list.
> > > > 
> > > > Thanks, we should include this list in the next series.
> > > > 
> > > > > 
> > > > > On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > > > > > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > > > > > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > > > > > should not restrict bind(2) and connect(2) for non-TCP protocols
> > > > > > (SCTP, MPTCP, SMC).
> > > > > 
> > > > > Thank you for the patch!
> > > > > 
> > > > > I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> > > > > treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> > > > > see TCP packets with extra TCP options. On Linux, there is indeed a
> > > > > dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> > > > > because we needed such dedicated socket to talk to the userspace.
> > > > > 
> > > > > I don't know Landlock well, but I think it is important to know that an
> > > > > MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> > > > > will do a fallback to "plain" TCP if MPTCP is not supported by the other
> > > > > peer or by a middlebox. It means that with this patch, if TCP is blocked
> > > > > by Landlock, someone can simply force an application to create an MPTCP
> > > > > socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> > > > > certainly work, even when connecting to a peer not supporting MPTCP.
> > > > > 
> > > > > Please note that I'm not against this modification -- especially here
> > > > > when we remove restrictions around MPTCP sockets :) -- I'm just saying
> > > > > it might be less confusing for users if MPTCP is considered as being
> > > > > part of TCP. A bit similar to what someone would do with a firewall: if
> > > > > TCP is blocked, MPTCP is blocked as well.
> > > > 
> > > > Good point!  I don't know well MPTCP but I think you're right.  Given
> > > > it's close relationship with TCP and the fallback mechanism, it would
> > > > make sense for users to not make a difference and it would avoid bypass
> > > > of misleading restrictions.  Moreover the Landlock rules are simple and
> > > > only control TCP ports, not peer addresses, which seems to be the main
> > > > evolution of MPTCP. >
> > > > > 
> > > > > I understand that a future goal might probably be to have dedicated
> > > > > restrictions for MPTCP and the other stream protocols (and/or for all
> > > > > stream protocols like it was before this patch), but in the meantime, it
> > > > > might be less confusing considering MPTCP as being part of TCP (I'm not
> > > > > sure about the other stream protocols).
> > > > 
> > > > We need to take a closer look at the other stream protocols indeed.
> > > Hello! Sorry for the late reply, I was on a small business trip.
> > > 
> > > Thanks a lot for this catch, without doubt MPTCP should be controlled
> > > with TCP access rights.
> > > 
> > > In that case, we should reconsider current semantics of TCP control.
> > > 
> > > Currently, it looks like this:
> > > * LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port.
> > > * LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to a
> > >    remote port.
> > > 
> > > According to these definitions only TCP sockets should be restricted and
> > > this is already provided by Landlock (considering observing commit)
> > > (assuming that "TCP socket" := user space socket of IPPROTO_TCP
> > > protocol).
> > > 
> > > AFAICS the two objectives of TCP access rights are to control
> > > (1) which ports can be used for sending or receiving TCP packets
> > >      (including SYN, ACK or other service packets).
> > > (2) which ports can be used to establish TCP connection (performed by
> > >      kernel network stack on server or client side).
> > > 
> > > In most cases denying (2) cause denying (1). Sending or receiving TCP
> > > packets without initial 3-way handshake is only possible on RAW [1] or
> > > PACKET [2] sockets. Usage of such sockets requires root privilligies, so
> > > there is no point to control them with Landlock.
> > 
> > I agree.
> > 
> > > 
> > > Therefore Landlock should only take care about case (2). For now
> > > (please correct me if I'm wrong), we only considered control of
> > > connection performed on user space plain TCP sockets (created with
> > > IPPROTO_TCP).
> > 
> > Correct. Landlock is dedicated to sandbox user space processes and the
> > related access rights should focus on restricting what is possible
> > through syscalls (mainly).
> > 
> > > 
> > > TCP kernel sockets are generally used in the following ways:
> > > * in a couple of other user space protocols (MPTCP, SMC, RDS)
> > > * in a few network filesystems (e.g. NFS communication over TCP)
> > > 
> > > For the second case TCP connection is currently not restricted by
> > > Landlock. This approach is may be correct, since NFS should not have
> > > access to a plain TCP communication and TCP restriction of NFS may
> > > be too implicit. Nevertheless, I think that restriction via current
> > > access rights should be considered.
> > 
> > I'm not sure what you mean here.  I'm not familiar with NFS in the
> > kernel.  AFAIK there is no socket type for NFS.
> 
> NFS client makes RPC requests to perform remote file operations on the
> NFS server. RPC requests can be sent using TCP, UDP, or RDMA sockets at
> the transport layer.
> 
> Call trace of creating TCP socket for client->server communication:
> 	nfs_create_rpc_client()
> 	rpc_create()
> 	xprt_create_transport()
> 	xs_setup_tcp()
> 	xs_tcp_setup_socket()
> 	xs_create_sock()
> 
> And RPC request is forwarded to TCP stack by calling
> 	xs_tcp_send_request().

OK, but it looks like this is connections on behalf of the kernel, that
only the kernel can use.  In other words, when these functions are
called, I guess current_cred() doesn't point to user space credentials.
Because the kernel cannot be restricted by Landlock, we should be good.

> 
> > 
> > > 
> > > For the first case, each protocol use TCP differently, so they should
> > > be considered separately.
> > 
> > Yes, for user-accessible protocols.
> > 
> > > 
> > > In the case of MPTCP TCP internal sockets are used to establish
> > > connection and exchange data between two network interfaces. MPTCP
> > > allows to have multiple TCP connections between two MPTCP sockets by
> > > connecting different network interfaces (e.g. WIFI and 3G).
> > > 
> > > Shared Memory Communication is a protocol that allows TCP applications
> > > transparently use RDMA for communication [3]. TCP internal socket is
> > > used to exchange service CLC messages when establishing SMC connection
> > > (which seems harmless for sandboxing) and for communication in the case
> > > of fallback. Fallback happens only if RDMA communication became
> > > impossible (e.g. if RDMA capable RNIC card went down on host or peer
> > > side). So, preventing TCP communication may be achieved by controlling
> > > fallback mechanism.
> > > 
> > > Reliable Datagram Socket is connectionless protocol implemented by
> > > Oracle [4]. It uses TCP stack or Infiniband to reliably deliever
> > > datagrams. For every sendmsg(2), recvmsg(2) it establishes TCP
> > > connection and use it to deliever splitted message.
> > > 
> > > In comparison with previous protocols, RDS sockets cannot be binded or
> > > connected to special TCP ports (e.g. with bind(2), connect(2)). 16385
> > > port is assigned to receiving side and sending side is binded to the
> > > port allocated by the kernel (by using zero as port number).
> > > 
> > > It may be useful to restrict RDS-over-TCP with current access rights,
> > > since it allows to perform TCP communication from user-space. But it
> > > would be only possible to fully allow or deny sending/receiving
> > > (since used ports are not controlled from user space).
> > 
> > Thanks for these explanations.  The ability to fine-control specific
> > protocol operations (e.g. connect, bind) can be useful for widely used
> > protocol such as TCP and UDP (or if someone wants to implement it for
> > another protocol), but this approach would not scale with all protocols
> > because of their own semantic and the development efforts.  The Landlock
> > access rights should be explicit, and we should also be able to deny
> > access to a whole set of protocols.  This should be partially possible
> > with your socket creation patch series.  I guess the remaining cases
> > would be to cover transformation of one socket type to another.  I think
> > we could control such transformation by building on top of the socket
> > creation control foundation: instead of controlling socket creation, add
> > a new access right to control socket transformation.  What do you think?
> 
> I agree that implementing fine-control network access rights for other
> protocols only to be able to completely restrict TCP operations seems
> excessive.
> 
> Do you mean the implementation of 2 access rights: for creating and
> transforming sockets?

Yes, but if it's not too complex I think it would make sense to only
have one access right that will cover these two cases.  I'm not sure
there is one common point where to check these socket transformation
though.

> 
> If so, there are only 2 socket protocols that can be transformed to TCP
> (in the fallback path) - MPTCP and SMC. Recall that in the case of RDS,
> a TCP socket can be used implicitly to deliver an RDS datagram.

Hmm, interesting.  Then we'll also need an access right to use a
protocol?  I'm worried that this kind of check would have a significant
performance impact.  I think we could tag a socket at creation time with
the allowed protocol transitions.

> Let's
> assume that the process of configuring TCP as a transport for RDS is
> also included in the socket transformation control.
> 
> Socket creation control is sufficient to restrict the implicit use of a
> TCP connection. Theoretically, separate socket transformation
> control is only required if the user wants to use (for example) SMC
> sockets with restricted (partially or completely) TCP bind(2) and
> connect(2) actions. But SMC (or MPTCP) applications should rely on TCP
> communication in case of fallback. I think they are unlikely to have any
> TCP restrictions.
> 
> However, control of fallback to TCP by applying socket creation rules
> is too implicit and inconvenient.
> 
> Initially, I thought that users could expect TCP access rights to
> completely restrict the corresponding TCP actions without additional
> rules for sockets. I have concerns that socket transformation control
> would not be explicit enough for such purpose.
> 
> Probably, it will be more correctly to apply rules that deny creation of
> SMC, MPTCP and RDS sockets (or their transformation to TCP) in
> landlock_restrict_self() if TCP actions are not fully allowed?

That should be achieved with your socket creation control patch series
right?

I'm not sure to understand the use of landlock_restrict_self() here.
Rulesets should fully define an access control on their own.

> 
> > 
> > > 
> > > Restricting any TCP connection in the kernel is probably simplest
> > > design, but we should consider above cases to provide the most useful
> > > one.
> > > 
> > > [1] https://man7.org/linux/man-pages/man7/raw.7.html
> > > [2] https://man7.org/linux/man-pages/man7/packet.7.html
> > > [3] https://datatracker.ietf.org/doc/html/rfc7609
> > > [4] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html
> > > 
> > > > 
> > > > > 
> > > > > 
> > > > > > sk_is_tcp() is used for this to check address family of the socket
> > > > > > before doing INET-specific address length validation. This is required
> > > > > > for error consistency.

Could you please send a new patch series for this specific fix,
including minimal tests?  I'd like to merge that as soon as possible,
and it will be backported to all kernel versions.

> > > > > > 
> > > > > > Closes: https://github.com/landlock-lsm/linux/issues/40
> > > > > > Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
> > > > > 
> > > > > I don't know how fixes are considered in Landlock, but should this patch
> > > > > be considered as a fix? It might be surprising for someone who thought
> > > > > all "stream" connections were blocked to have them unblocked when
> > > > > updating to a minor kernel version, no?
> > > > 
> > > > Indeed.  The main issue was with the semantic/definition of
> > > > LANDLOCK_ACCESS_FS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
> > > > code with the documentation, one way or the other, preferably following
> > > > the principle of least astonishment.
> > > > 
> > > > > 
> > > > > (Personally, I would understand such behaviour change when upgrading to
> > > > > a major version, and still, maybe only if there were alternatives to
> > > > 
> > > > This "fix" needs to be backported, but we're not clear yet on what it
> > > > should be. :)
> > > > 
> > > > > continue having the same behaviour, e.g. a way to restrict all stream
> > > > > sockets the same way, or something per stream socket. But that's just me
> > > > > :) )
> > > > 
> > > > The documentation and the initial idea was to control TCP bind and
> > > > connect.  The kernel implementation does more than that, so we need to
> > > > synthronize somehow.
> > > > 
> > > > > 
> > > > > Cheers,
> > > > > Matt
> > > > > -- 
> > > > > Sponsored by the NGI0 Core fund.
> > > > > 
> > > > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-24 15:02             ` Mickaël Salaün
@ 2025-01-27 12:40               ` Mikhail Ivanov
  2025-01-27 19:48                 ` Mickaël Salaün
  0 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2025-01-27 12:40 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs

On 1/24/2025 6:02 PM, Mickaël Salaün wrote:
> On Fri, Dec 13, 2024 at 09:19:10PM +0300, Mikhail Ivanov wrote:
>> On 12/12/2024 9:43 PM, Mickaël Salaün wrote:
>>> On Thu, Oct 31, 2024 at 07:21:44PM +0300, Mikhail Ivanov wrote:
>>>> On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
>>>>> On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
>>>>>> Hi Mikhail and Landlock maintainers,
>>>>>>
>>>>>> +cc MPTCP list.
>>>>>
>>>>> Thanks, we should include this list in the next series.
>>>>>
>>>>>>
>>>>>> On 17/10/2024 13:04, Mikhail Ivanov wrote:
>>>>>>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
>>>>>>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
>>>>>>> should not restrict bind(2) and connect(2) for non-TCP protocols
>>>>>>> (SCTP, MPTCP, SMC).
>>>>>>
>>>>>> Thank you for the patch!
>>>>>>
>>>>>> I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
>>>>>> treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
>>>>>> see TCP packets with extra TCP options. On Linux, there is indeed a
>>>>>> dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
>>>>>> because we needed such dedicated socket to talk to the userspace.
>>>>>>
>>>>>> I don't know Landlock well, but I think it is important to know that an
>>>>>> MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
>>>>>> will do a fallback to "plain" TCP if MPTCP is not supported by the other
>>>>>> peer or by a middlebox. It means that with this patch, if TCP is blocked
>>>>>> by Landlock, someone can simply force an application to create an MPTCP
>>>>>> socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
>>>>>> certainly work, even when connecting to a peer not supporting MPTCP.
>>>>>>
>>>>>> Please note that I'm not against this modification -- especially here
>>>>>> when we remove restrictions around MPTCP sockets :) -- I'm just saying
>>>>>> it might be less confusing for users if MPTCP is considered as being
>>>>>> part of TCP. A bit similar to what someone would do with a firewall: if
>>>>>> TCP is blocked, MPTCP is blocked as well.
>>>>>
>>>>> Good point!  I don't know well MPTCP but I think you're right.  Given
>>>>> it's close relationship with TCP and the fallback mechanism, it would
>>>>> make sense for users to not make a difference and it would avoid bypass
>>>>> of misleading restrictions.  Moreover the Landlock rules are simple and
>>>>> only control TCP ports, not peer addresses, which seems to be the main
>>>>> evolution of MPTCP. >
>>>>>>
>>>>>> I understand that a future goal might probably be to have dedicated
>>>>>> restrictions for MPTCP and the other stream protocols (and/or for all
>>>>>> stream protocols like it was before this patch), but in the meantime, it
>>>>>> might be less confusing considering MPTCP as being part of TCP (I'm not
>>>>>> sure about the other stream protocols).
>>>>>
>>>>> We need to take a closer look at the other stream protocols indeed.
>>>> Hello! Sorry for the late reply, I was on a small business trip.
>>>>
>>>> Thanks a lot for this catch, without doubt MPTCP should be controlled
>>>> with TCP access rights.
>>>>
>>>> In that case, we should reconsider current semantics of TCP control.
>>>>
>>>> Currently, it looks like this:
>>>> * LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port.
>>>> * LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to a
>>>>     remote port.
>>>>
>>>> According to these definitions only TCP sockets should be restricted and
>>>> this is already provided by Landlock (considering observing commit)
>>>> (assuming that "TCP socket" := user space socket of IPPROTO_TCP
>>>> protocol).
>>>>
>>>> AFAICS the two objectives of TCP access rights are to control
>>>> (1) which ports can be used for sending or receiving TCP packets
>>>>       (including SYN, ACK or other service packets).
>>>> (2) which ports can be used to establish TCP connection (performed by
>>>>       kernel network stack on server or client side).
>>>>
>>>> In most cases denying (2) cause denying (1). Sending or receiving TCP
>>>> packets without initial 3-way handshake is only possible on RAW [1] or
>>>> PACKET [2] sockets. Usage of such sockets requires root privilligies, so
>>>> there is no point to control them with Landlock.
>>>
>>> I agree.
>>>
>>>>
>>>> Therefore Landlock should only take care about case (2). For now
>>>> (please correct me if I'm wrong), we only considered control of
>>>> connection performed on user space plain TCP sockets (created with
>>>> IPPROTO_TCP).
>>>
>>> Correct. Landlock is dedicated to sandbox user space processes and the
>>> related access rights should focus on restricting what is possible
>>> through syscalls (mainly).
>>>
>>>>
>>>> TCP kernel sockets are generally used in the following ways:
>>>> * in a couple of other user space protocols (MPTCP, SMC, RDS)
>>>> * in a few network filesystems (e.g. NFS communication over TCP)
>>>>
>>>> For the second case TCP connection is currently not restricted by
>>>> Landlock. This approach is may be correct, since NFS should not have
>>>> access to a plain TCP communication and TCP restriction of NFS may
>>>> be too implicit. Nevertheless, I think that restriction via current
>>>> access rights should be considered.
>>>
>>> I'm not sure what you mean here.  I'm not familiar with NFS in the
>>> kernel.  AFAIK there is no socket type for NFS.
>>
>> NFS client makes RPC requests to perform remote file operations on the
>> NFS server. RPC requests can be sent using TCP, UDP, or RDMA sockets at
>> the transport layer.
>>
>> Call trace of creating TCP socket for client->server communication:
>> 	nfs_create_rpc_client()
>> 	rpc_create()
>> 	xprt_create_transport()
>> 	xs_setup_tcp()
>> 	xs_tcp_setup_socket()
>> 	xs_create_sock()
>>
>> And RPC request is forwarded to TCP stack by calling
>> 	xs_tcp_send_request().
> 
> OK, but it looks like this is connections on behalf of the kernel, that
> only the kernel can use.  In other words, when these functions are
> called, I guess current_cred() doesn't point to user space credentials.
> Because the kernel cannot be restricted by Landlock, we should be good.

Agreed, only NFS can establish and use its connections directly.
NFS uses kernel_{bind, connect}() methods on kernel sockets, so TCP
operations are not checked by LSM.

> 
>>
>>>
>>>>
>>>> For the first case, each protocol use TCP differently, so they should
>>>> be considered separately.
>>>
>>> Yes, for user-accessible protocols.
>>>
>>>>
>>>> In the case of MPTCP TCP internal sockets are used to establish
>>>> connection and exchange data between two network interfaces. MPTCP
>>>> allows to have multiple TCP connections between two MPTCP sockets by
>>>> connecting different network interfaces (e.g. WIFI and 3G).
>>>>
>>>> Shared Memory Communication is a protocol that allows TCP applications
>>>> transparently use RDMA for communication [3]. TCP internal socket is
>>>> used to exchange service CLC messages when establishing SMC connection
>>>> (which seems harmless for sandboxing) and for communication in the case
>>>> of fallback. Fallback happens only if RDMA communication became
>>>> impossible (e.g. if RDMA capable RNIC card went down on host or peer
>>>> side). So, preventing TCP communication may be achieved by controlling
>>>> fallback mechanism.
>>>>
>>>> Reliable Datagram Socket is connectionless protocol implemented by
>>>> Oracle [4]. It uses TCP stack or Infiniband to reliably deliever
>>>> datagrams. For every sendmsg(2), recvmsg(2) it establishes TCP
>>>> connection and use it to deliever splitted message.
>>>>
>>>> In comparison with previous protocols, RDS sockets cannot be binded or
>>>> connected to special TCP ports (e.g. with bind(2), connect(2)). 16385
>>>> port is assigned to receiving side and sending side is binded to the
>>>> port allocated by the kernel (by using zero as port number).
>>>>
>>>> It may be useful to restrict RDS-over-TCP with current access rights,
>>>> since it allows to perform TCP communication from user-space. But it
>>>> would be only possible to fully allow or deny sending/receiving
>>>> (since used ports are not controlled from user space).
>>>
>>> Thanks for these explanations.  The ability to fine-control specific
>>> protocol operations (e.g. connect, bind) can be useful for widely used
>>> protocol such as TCP and UDP (or if someone wants to implement it for
>>> another protocol), but this approach would not scale with all protocols
>>> because of their own semantic and the development efforts.  The Landlock
>>> access rights should be explicit, and we should also be able to deny
>>> access to a whole set of protocols.  This should be partially possible
>>> with your socket creation patch series.  I guess the remaining cases
>>> would be to cover transformation of one socket type to another.  I think
>>> we could control such transformation by building on top of the socket
>>> creation control foundation: instead of controlling socket creation, add
>>> a new access right to control socket transformation.  What do you think?
>>
>> I agree that implementing fine-control network access rights for other
>> protocols only to be able to completely restrict TCP operations seems
>> excessive.
>>
>> Do you mean the implementation of 2 access rights: for creating and
>> transforming sockets?
> 
> Yes, but if it's not too complex I think it would make sense to only
> have one access right that will cover these two cases.  I'm not sure
> there is one common point where to check these socket transformation
> though.

There are at least 3 different places where some kind of transformation
is taking place.

> 
>>
>> If so, there are only 2 socket protocols that can be transformed to TCP
>> (in the fallback path) - MPTCP and SMC. Recall that in the case of RDS,
>> a TCP socket can be used implicitly to deliver an RDS datagram.
> 
> Hmm, interesting.  Then we'll also need an access right to use a
> protocol?  I'm worried that this kind of check would have a significant
> performance impact.  I think we could tag a socket at creation time with
> the allowed protocol transitions.

What do you mean by "to use a protocol"?

> 
>> Let's
>> assume that the process of configuring TCP as a transport for RDS is
>> also included in the socket transformation control.
>>
>> Socket creation control is sufficient to restrict the implicit use of a
>> TCP connection. Theoretically, separate socket transformation
>> control is only required if the user wants to use (for example) SMC
>> sockets with restricted (partially or completely) TCP bind(2) and
>> connect(2) actions. But SMC (or MPTCP) applications should rely on TCP
>> communication in case of fallback. I think they are unlikely to have any
>> TCP restrictions.
>>
>> However, control of fallback to TCP by applying socket creation rules
>> is too implicit and inconvenient.
>>
>> Initially, I thought that users could expect TCP access rights to
>> completely restrict the corresponding TCP actions without additional
>> rules for sockets. I have concerns that socket transformation control
>> would not be explicit enough for such purpose.
>>
>> Probably, it will be more correctly to apply rules that deny creation of
>> SMC, MPTCP and RDS sockets (or their transformation to TCP) in
>> landlock_restrict_self() if TCP actions are not fully allowed?
> 
> That should be achieved with your socket creation control patch series
> right?

That's correct. I was just a little worried about a possible unawareness
on the part of the user about the sockets transformation. I'll better
just make a note in the documentation about this.

> 
> I'm not sure to understand the use of landlock_restrict_self() here.
> Rulesets should fully define an access control on their own.

You're right, landlock_restrict_self() can not define any additional
rules.

> 
>>
>>>
>>>>
>>>> Restricting any TCP connection in the kernel is probably simplest
>>>> design, but we should consider above cases to provide the most useful
>>>> one.
>>>>
>>>> [1] https://man7.org/linux/man-pages/man7/raw.7.html
>>>> [2] https://man7.org/linux/man-pages/man7/packet.7.html
>>>> [3] https://datatracker.ietf.org/doc/html/rfc7609
>>>> [4] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> sk_is_tcp() is used for this to check address family of the socket
>>>>>>> before doing INET-specific address length validation. This is required
>>>>>>> for error consistency.
> 
> Could you please send a new patch series for this specific fix,
> including minimal tests?  I'd like to merge that as soon as possible,
> and it will be backported to all kernel versions.

Ok, I'll do it ASAP.

> 
>>>>>>>
>>>>>>> Closes: https://github.com/landlock-lsm/linux/issues/40
>>>>>>> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
>>>>>>
>>>>>> I don't know how fixes are considered in Landlock, but should this patch
>>>>>> be considered as a fix? It might be surprising for someone who thought
>>>>>> all "stream" connections were blocked to have them unblocked when
>>>>>> updating to a minor kernel version, no?
>>>>>
>>>>> Indeed.  The main issue was with the semantic/definition of
>>>>> LANDLOCK_ACCESS_FS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
>>>>> code with the documentation, one way or the other, preferably following
>>>>> the principle of least astonishment.
>>>>>
>>>>>>
>>>>>> (Personally, I would understand such behaviour change when upgrading to
>>>>>> a major version, and still, maybe only if there were alternatives to
>>>>>
>>>>> This "fix" needs to be backported, but we're not clear yet on what it
>>>>> should be. :)
>>>>>
>>>>>> continue having the same behaviour, e.g. a way to restrict all stream
>>>>>> sockets the same way, or something per stream socket. But that's just me
>>>>>> :) )
>>>>>
>>>>> The documentation and the initial idea was to control TCP bind and
>>>>> connect.  The kernel implementation does more than that, so we need to
>>>>> synthronize somehow.
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Matt
>>>>>> -- 
>>>>>> Sponsored by the NGI0 Core fund.
>>>>>>
>>>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-27 12:40               ` Mikhail Ivanov
@ 2025-01-27 19:48                 ` Mickaël Salaün
  2025-01-28 10:56                   ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2025-01-27 19:48 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs,
	Paul Moore

On Mon, Jan 27, 2025 at 03:40:33PM +0300, Mikhail Ivanov wrote:
> On 1/24/2025 6:02 PM, Mickaël Salaün wrote:
> > On Fri, Dec 13, 2024 at 09:19:10PM +0300, Mikhail Ivanov wrote:
> > > On 12/12/2024 9:43 PM, Mickaël Salaün wrote:
> > > > On Thu, Oct 31, 2024 at 07:21:44PM +0300, Mikhail Ivanov wrote:
> > > > > On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
> > > > > > On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
> > > > > > > Hi Mikhail and Landlock maintainers,
> > > > > > > 
> > > > > > > +cc MPTCP list.
> > > > > > 
> > > > > > Thanks, we should include this list in the next series.
> > > > > > 
> > > > > > > 
> > > > > > > On 17/10/2024 13:04, Mikhail Ivanov wrote:
> > > > > > > > Do not check TCP access right if socket protocol is not IPPROTO_TCP.
> > > > > > > > LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
> > > > > > > > should not restrict bind(2) and connect(2) for non-TCP protocols
> > > > > > > > (SCTP, MPTCP, SMC).
> > > > > > > 
> > > > > > > Thank you for the patch!
> > > > > > > 
> > > > > > > I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
> > > > > > > treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
> > > > > > > see TCP packets with extra TCP options. On Linux, there is indeed a
> > > > > > > dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
> > > > > > > because we needed such dedicated socket to talk to the userspace.
> > > > > > > 
> > > > > > > I don't know Landlock well, but I think it is important to know that an
> > > > > > > MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
> > > > > > > will do a fallback to "plain" TCP if MPTCP is not supported by the other
> > > > > > > peer or by a middlebox. It means that with this patch, if TCP is blocked
> > > > > > > by Landlock, someone can simply force an application to create an MPTCP
> > > > > > > socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
> > > > > > > certainly work, even when connecting to a peer not supporting MPTCP.
> > > > > > > 
> > > > > > > Please note that I'm not against this modification -- especially here
> > > > > > > when we remove restrictions around MPTCP sockets :) -- I'm just saying
> > > > > > > it might be less confusing for users if MPTCP is considered as being
> > > > > > > part of TCP. A bit similar to what someone would do with a firewall: if
> > > > > > > TCP is blocked, MPTCP is blocked as well.
> > > > > > 
> > > > > > Good point!  I don't know well MPTCP but I think you're right.  Given
> > > > > > it's close relationship with TCP and the fallback mechanism, it would
> > > > > > make sense for users to not make a difference and it would avoid bypass
> > > > > > of misleading restrictions.  Moreover the Landlock rules are simple and
> > > > > > only control TCP ports, not peer addresses, which seems to be the main
> > > > > > evolution of MPTCP. >
> > > > > > > 
> > > > > > > I understand that a future goal might probably be to have dedicated
> > > > > > > restrictions for MPTCP and the other stream protocols (and/or for all
> > > > > > > stream protocols like it was before this patch), but in the meantime, it
> > > > > > > might be less confusing considering MPTCP as being part of TCP (I'm not
> > > > > > > sure about the other stream protocols).
> > > > > > 
> > > > > > We need to take a closer look at the other stream protocols indeed.
> > > > > Hello! Sorry for the late reply, I was on a small business trip.
> > > > > 
> > > > > Thanks a lot for this catch, without doubt MPTCP should be controlled
> > > > > with TCP access rights.
> > > > > 
> > > > > In that case, we should reconsider current semantics of TCP control.
> > > > > 
> > > > > Currently, it looks like this:
> > > > > * LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port.
> > > > > * LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to a
> > > > >     remote port.
> > > > > 
> > > > > According to these definitions only TCP sockets should be restricted and
> > > > > this is already provided by Landlock (considering observing commit)
> > > > > (assuming that "TCP socket" := user space socket of IPPROTO_TCP
> > > > > protocol).
> > > > > 
> > > > > AFAICS the two objectives of TCP access rights are to control
> > > > > (1) which ports can be used for sending or receiving TCP packets
> > > > >       (including SYN, ACK or other service packets).
> > > > > (2) which ports can be used to establish TCP connection (performed by
> > > > >       kernel network stack on server or client side).
> > > > > 
> > > > > In most cases denying (2) cause denying (1). Sending or receiving TCP
> > > > > packets without initial 3-way handshake is only possible on RAW [1] or
> > > > > PACKET [2] sockets. Usage of such sockets requires root privilligies, so
> > > > > there is no point to control them with Landlock.
> > > > 
> > > > I agree.
> > > > 
> > > > > 
> > > > > Therefore Landlock should only take care about case (2). For now
> > > > > (please correct me if I'm wrong), we only considered control of
> > > > > connection performed on user space plain TCP sockets (created with
> > > > > IPPROTO_TCP).
> > > > 
> > > > Correct. Landlock is dedicated to sandbox user space processes and the
> > > > related access rights should focus on restricting what is possible
> > > > through syscalls (mainly).
> > > > 
> > > > > 
> > > > > TCP kernel sockets are generally used in the following ways:
> > > > > * in a couple of other user space protocols (MPTCP, SMC, RDS)
> > > > > * in a few network filesystems (e.g. NFS communication over TCP)
> > > > > 
> > > > > For the second case TCP connection is currently not restricted by
> > > > > Landlock. This approach is may be correct, since NFS should not have
> > > > > access to a plain TCP communication and TCP restriction of NFS may
> > > > > be too implicit. Nevertheless, I think that restriction via current
> > > > > access rights should be considered.
> > > > 
> > > > I'm not sure what you mean here.  I'm not familiar with NFS in the
> > > > kernel.  AFAIK there is no socket type for NFS.
> > > 
> > > NFS client makes RPC requests to perform remote file operations on the
> > > NFS server. RPC requests can be sent using TCP, UDP, or RDMA sockets at
> > > the transport layer.
> > > 
> > > Call trace of creating TCP socket for client->server communication:
> > > 	nfs_create_rpc_client()
> > > 	rpc_create()
> > > 	xprt_create_transport()
> > > 	xs_setup_tcp()
> > > 	xs_tcp_setup_socket()
> > > 	xs_create_sock()
> > > 
> > > And RPC request is forwarded to TCP stack by calling
> > > 	xs_tcp_send_request().
> > 
> > OK, but it looks like this is connections on behalf of the kernel, that
> > only the kernel can use.  In other words, when these functions are
> > called, I guess current_cred() doesn't point to user space credentials.
> > Because the kernel cannot be restricted by Landlock, we should be good.
> 
> Agreed, only NFS can establish and use its connections directly.
> NFS uses kernel_{bind, connect}() methods on kernel sockets, so TCP
> operations are not checked by LSM.
> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > For the first case, each protocol use TCP differently, so they should
> > > > > be considered separately.
> > > > 
> > > > Yes, for user-accessible protocols.
> > > > 
> > > > > 
> > > > > In the case of MPTCP TCP internal sockets are used to establish
> > > > > connection and exchange data between two network interfaces. MPTCP
> > > > > allows to have multiple TCP connections between two MPTCP sockets by
> > > > > connecting different network interfaces (e.g. WIFI and 3G).
> > > > > 
> > > > > Shared Memory Communication is a protocol that allows TCP applications
> > > > > transparently use RDMA for communication [3]. TCP internal socket is
> > > > > used to exchange service CLC messages when establishing SMC connection
> > > > > (which seems harmless for sandboxing) and for communication in the case
> > > > > of fallback. Fallback happens only if RDMA communication became
> > > > > impossible (e.g. if RDMA capable RNIC card went down on host or peer
> > > > > side). So, preventing TCP communication may be achieved by controlling
> > > > > fallback mechanism.
> > > > > 
> > > > > Reliable Datagram Socket is connectionless protocol implemented by
> > > > > Oracle [4]. It uses TCP stack or Infiniband to reliably deliever
> > > > > datagrams. For every sendmsg(2), recvmsg(2) it establishes TCP
> > > > > connection and use it to deliever splitted message.
> > > > > 
> > > > > In comparison with previous protocols, RDS sockets cannot be binded or
> > > > > connected to special TCP ports (e.g. with bind(2), connect(2)). 16385
> > > > > port is assigned to receiving side and sending side is binded to the
> > > > > port allocated by the kernel (by using zero as port number).
> > > > > 
> > > > > It may be useful to restrict RDS-over-TCP with current access rights,
> > > > > since it allows to perform TCP communication from user-space. But it
> > > > > would be only possible to fully allow or deny sending/receiving
> > > > > (since used ports are not controlled from user space).
> > > > 
> > > > Thanks for these explanations.  The ability to fine-control specific
> > > > protocol operations (e.g. connect, bind) can be useful for widely used
> > > > protocol such as TCP and UDP (or if someone wants to implement it for
> > > > another protocol), but this approach would not scale with all protocols
> > > > because of their own semantic and the development efforts.  The Landlock
> > > > access rights should be explicit, and we should also be able to deny
> > > > access to a whole set of protocols.  This should be partially possible
> > > > with your socket creation patch series.  I guess the remaining cases
> > > > would be to cover transformation of one socket type to another.  I think
> > > > we could control such transformation by building on top of the socket
> > > > creation control foundation: instead of controlling socket creation, add
> > > > a new access right to control socket transformation.  What do you think?
> > > 
> > > I agree that implementing fine-control network access rights for other
> > > protocols only to be able to completely restrict TCP operations seems
> > > excessive.
> > > 
> > > Do you mean the implementation of 2 access rights: for creating and
> > > transforming sockets?
> > 
> > Yes, but if it's not too complex I think it would make sense to only
> > have one access right that will cover these two cases.  I'm not sure
> > there is one common point where to check these socket transformation
> > though.
> 
> There are at least 3 different places where some kind of transformation
> is taking place.

I'm a bit worried that we miss some of these places (now or in future
kernel versions).  We'll need a new LSM hook for that.

Could you list the current locations?

> 
> > 
> > > 
> > > If so, there are only 2 socket protocols that can be transformed to TCP
> > > (in the fallback path) - MPTCP and SMC. Recall that in the case of RDS,
> > > a TCP socket can be used implicitly to deliver an RDS datagram.
> > 
> > Hmm, interesting.  Then we'll also need an access right to use a
> > protocol?  I'm worried that this kind of check would have a significant
> > performance impact.  I think we could tag a socket at creation time with
> > the allowed protocol transitions.
> 
> What do you mean by "to use a protocol"?

To use a socket with a specific protocol.  Until now, I though being
able to control socket creation would be enough, but being able to use
one kind of socket with different protocols would be an issue if users
want to control the use of protocols (which makes sense from an access
control point of view).

> 
> > 
> > > Let's
> > > assume that the process of configuring TCP as a transport for RDS is
> > > also included in the socket transformation control.
> > > 
> > > Socket creation control is sufficient to restrict the implicit use of a
> > > TCP connection. Theoretically, separate socket transformation
> > > control is only required if the user wants to use (for example) SMC
> > > sockets with restricted (partially or completely) TCP bind(2) and
> > > connect(2) actions. But SMC (or MPTCP) applications should rely on TCP
> > > communication in case of fallback. I think they are unlikely to have any
> > > TCP restrictions.
> > > 
> > > However, control of fallback to TCP by applying socket creation rules
> > > is too implicit and inconvenient.
> > > 
> > > Initially, I thought that users could expect TCP access rights to
> > > completely restrict the corresponding TCP actions without additional
> > > rules for sockets. I have concerns that socket transformation control
> > > would not be explicit enough for such purpose.
> > > 
> > > Probably, it will be more correctly to apply rules that deny creation of
> > > SMC, MPTCP and RDS sockets (or their transformation to TCP) in
> > > landlock_restrict_self() if TCP actions are not fully allowed?
> > 
> > That should be achieved with your socket creation control patch series
> > right?
> 
> That's correct. I was just a little worried about a possible unawareness
> on the part of the user about the sockets transformation. I'll better
> just make a note in the documentation about this.

That's why I was talking about a dedicated access right to get a clear
semantic (socket creation vs. and socket use/transition).  However, I
don't really see use cases where one should be used and not the other,
and that could also misleading to users, which means we should probably
only have one access right and consider protocol transitions as a kind
of socket creation (and find a more appropriate name).

> 
> > 
> > I'm not sure to understand the use of landlock_restrict_self() here.
> > Rulesets should fully define an access control on their own.
> 
> You're right, landlock_restrict_self() can not define any additional
> rules.
> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > Restricting any TCP connection in the kernel is probably simplest
> > > > > design, but we should consider above cases to provide the most useful
> > > > > one.
> > > > > 
> > > > > [1] https://man7.org/linux/man-pages/man7/raw.7.html
> > > > > [2] https://man7.org/linux/man-pages/man7/packet.7.html
> > > > > [3] https://datatracker.ietf.org/doc/html/rfc7609
> > > > > [4] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > sk_is_tcp() is used for this to check address family of the socket
> > > > > > > > before doing INET-specific address length validation. This is required
> > > > > > > > for error consistency.
> > 
> > Could you please send a new patch series for this specific fix,
> > including minimal tests?  I'd like to merge that as soon as possible,
> > and it will be backported to all kernel versions.
> 
> Ok, I'll do it ASAP.

Great

> 
> > 
> > > > > > > > 
> > > > > > > > Closes: https://github.com/landlock-lsm/linux/issues/40
> > > > > > > > Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
> > > > > > > 
> > > > > > > I don't know how fixes are considered in Landlock, but should this patch
> > > > > > > be considered as a fix? It might be surprising for someone who thought
> > > > > > > all "stream" connections were blocked to have them unblocked when
> > > > > > > updating to a minor kernel version, no?
> > > > > > 
> > > > > > Indeed.  The main issue was with the semantic/definition of
> > > > > > LANDLOCK_ACCESS_FS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
> > > > > > code with the documentation, one way or the other, preferably following
> > > > > > the principle of least astonishment.
> > > > > > 
> > > > > > > 
> > > > > > > (Personally, I would understand such behaviour change when upgrading to
> > > > > > > a major version, and still, maybe only if there were alternatives to
> > > > > > 
> > > > > > This "fix" needs to be backported, but we're not clear yet on what it
> > > > > > should be. :)
> > > > > > 
> > > > > > > continue having the same behaviour, e.g. a way to restrict all stream
> > > > > > > sockets the same way, or something per stream socket. But that's just me
> > > > > > > :) )
> > > > > > 
> > > > > > The documentation and the initial idea was to control TCP bind and
> > > > > > connect.  The kernel implementation does more than that, so we need to
> > > > > > synthronize somehow.
> > > > > > 
> > > > > > > 
> > > > > > > Cheers,
> > > > > > > Matt
> > > > > > > -- 
> > > > > > > Sponsored by the NGI0 Core fund.
> > > > > > > 
> > > > > > > 
> > > > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-27 19:48                 ` Mickaël Salaün
@ 2025-01-28 10:56                   ` Mikhail Ivanov
  2025-01-28 18:14                     ` Matthieu Baerts
  0 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2025-01-28 10:56 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs,
	Paul Moore

On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
> On Mon, Jan 27, 2025 at 03:40:33PM +0300, Mikhail Ivanov wrote:
>> On 1/24/2025 6:02 PM, Mickaël Salaün wrote:
>>> On Fri, Dec 13, 2024 at 09:19:10PM +0300, Mikhail Ivanov wrote:
>>>> On 12/12/2024 9:43 PM, Mickaël Salaün wrote:
>>>>> On Thu, Oct 31, 2024 at 07:21:44PM +0300, Mikhail Ivanov wrote:
>>>>>> On 10/18/2024 9:08 PM, Mickaël Salaün wrote:
>>>>>>> On Thu, Oct 17, 2024 at 02:59:48PM +0200, Matthieu Baerts wrote:
>>>>>>>> Hi Mikhail and Landlock maintainers,
>>>>>>>>
>>>>>>>> +cc MPTCP list.
>>>>>>>
>>>>>>> Thanks, we should include this list in the next series.
>>>>>>>
>>>>>>>>
>>>>>>>> On 17/10/2024 13:04, Mikhail Ivanov wrote:
>>>>>>>>> Do not check TCP access right if socket protocol is not IPPROTO_TCP.
>>>>>>>>> LANDLOCK_ACCESS_NET_BIND_TCP and LANDLOCK_ACCESS_NET_CONNECT_TCP
>>>>>>>>> should not restrict bind(2) and connect(2) for non-TCP protocols
>>>>>>>>> (SCTP, MPTCP, SMC).
>>>>>>>>
>>>>>>>> Thank you for the patch!
>>>>>>>>
>>>>>>>> I'm part of the MPTCP team, and I'm wondering if MPTCP should not be
>>>>>>>> treated like TCP here. MPTCP is an extension to TCP: on the wire, we can
>>>>>>>> see TCP packets with extra TCP options. On Linux, there is indeed a
>>>>>>>> dedicated MPTCP socket (IPPROTO_MPTCP), but that's just internal,
>>>>>>>> because we needed such dedicated socket to talk to the userspace.
>>>>>>>>
>>>>>>>> I don't know Landlock well, but I think it is important to know that an
>>>>>>>> MPTCP socket can be used to discuss with "plain" TCP packets: the kernel
>>>>>>>> will do a fallback to "plain" TCP if MPTCP is not supported by the other
>>>>>>>> peer or by a middlebox. It means that with this patch, if TCP is blocked
>>>>>>>> by Landlock, someone can simply force an application to create an MPTCP
>>>>>>>> socket -- e.g. via LD_PRELOAD -- and bypass the restrictions. It will
>>>>>>>> certainly work, even when connecting to a peer not supporting MPTCP.
>>>>>>>>
>>>>>>>> Please note that I'm not against this modification -- especially here
>>>>>>>> when we remove restrictions around MPTCP sockets :) -- I'm just saying
>>>>>>>> it might be less confusing for users if MPTCP is considered as being
>>>>>>>> part of TCP. A bit similar to what someone would do with a firewall: if
>>>>>>>> TCP is blocked, MPTCP is blocked as well.
>>>>>>>
>>>>>>> Good point!  I don't know well MPTCP but I think you're right.  Given
>>>>>>> it's close relationship with TCP and the fallback mechanism, it would
>>>>>>> make sense for users to not make a difference and it would avoid bypass
>>>>>>> of misleading restrictions.  Moreover the Landlock rules are simple and
>>>>>>> only control TCP ports, not peer addresses, which seems to be the main
>>>>>>> evolution of MPTCP. >
>>>>>>>>
>>>>>>>> I understand that a future goal might probably be to have dedicated
>>>>>>>> restrictions for MPTCP and the other stream protocols (and/or for all
>>>>>>>> stream protocols like it was before this patch), but in the meantime, it
>>>>>>>> might be less confusing considering MPTCP as being part of TCP (I'm not
>>>>>>>> sure about the other stream protocols).
>>>>>>>
>>>>>>> We need to take a closer look at the other stream protocols indeed.
>>>>>> Hello! Sorry for the late reply, I was on a small business trip.
>>>>>>
>>>>>> Thanks a lot for this catch, without doubt MPTCP should be controlled
>>>>>> with TCP access rights.
>>>>>>
>>>>>> In that case, we should reconsider current semantics of TCP control.
>>>>>>
>>>>>> Currently, it looks like this:
>>>>>> * LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port.
>>>>>> * LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to a
>>>>>>      remote port.
>>>>>>
>>>>>> According to these definitions only TCP sockets should be restricted and
>>>>>> this is already provided by Landlock (considering observing commit)
>>>>>> (assuming that "TCP socket" := user space socket of IPPROTO_TCP
>>>>>> protocol).
>>>>>>
>>>>>> AFAICS the two objectives of TCP access rights are to control
>>>>>> (1) which ports can be used for sending or receiving TCP packets
>>>>>>        (including SYN, ACK or other service packets).
>>>>>> (2) which ports can be used to establish TCP connection (performed by
>>>>>>        kernel network stack on server or client side).
>>>>>>
>>>>>> In most cases denying (2) cause denying (1). Sending or receiving TCP
>>>>>> packets without initial 3-way handshake is only possible on RAW [1] or
>>>>>> PACKET [2] sockets. Usage of such sockets requires root privilligies, so
>>>>>> there is no point to control them with Landlock.
>>>>>
>>>>> I agree.
>>>>>
>>>>>>
>>>>>> Therefore Landlock should only take care about case (2). For now
>>>>>> (please correct me if I'm wrong), we only considered control of
>>>>>> connection performed on user space plain TCP sockets (created with
>>>>>> IPPROTO_TCP).
>>>>>
>>>>> Correct. Landlock is dedicated to sandbox user space processes and the
>>>>> related access rights should focus on restricting what is possible
>>>>> through syscalls (mainly).
>>>>>
>>>>>>
>>>>>> TCP kernel sockets are generally used in the following ways:
>>>>>> * in a couple of other user space protocols (MPTCP, SMC, RDS)
>>>>>> * in a few network filesystems (e.g. NFS communication over TCP)
>>>>>>
>>>>>> For the second case TCP connection is currently not restricted by
>>>>>> Landlock. This approach is may be correct, since NFS should not have
>>>>>> access to a plain TCP communication and TCP restriction of NFS may
>>>>>> be too implicit. Nevertheless, I think that restriction via current
>>>>>> access rights should be considered.
>>>>>
>>>>> I'm not sure what you mean here.  I'm not familiar with NFS in the
>>>>> kernel.  AFAIK there is no socket type for NFS.
>>>>
>>>> NFS client makes RPC requests to perform remote file operations on the
>>>> NFS server. RPC requests can be sent using TCP, UDP, or RDMA sockets at
>>>> the transport layer.
>>>>
>>>> Call trace of creating TCP socket for client->server communication:
>>>> 	nfs_create_rpc_client()
>>>> 	rpc_create()
>>>> 	xprt_create_transport()
>>>> 	xs_setup_tcp()
>>>> 	xs_tcp_setup_socket()
>>>> 	xs_create_sock()
>>>>
>>>> And RPC request is forwarded to TCP stack by calling
>>>> 	xs_tcp_send_request().
>>>
>>> OK, but it looks like this is connections on behalf of the kernel, that
>>> only the kernel can use.  In other words, when these functions are
>>> called, I guess current_cred() doesn't point to user space credentials.
>>> Because the kernel cannot be restricted by Landlock, we should be good.
>>
>> Agreed, only NFS can establish and use its connections directly.
>> NFS uses kernel_{bind, connect}() methods on kernel sockets, so TCP
>> operations are not checked by LSM.
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> For the first case, each protocol use TCP differently, so they should
>>>>>> be considered separately.
>>>>>
>>>>> Yes, for user-accessible protocols.
>>>>>
>>>>>>
>>>>>> In the case of MPTCP TCP internal sockets are used to establish
>>>>>> connection and exchange data between two network interfaces. MPTCP
>>>>>> allows to have multiple TCP connections between two MPTCP sockets by
>>>>>> connecting different network interfaces (e.g. WIFI and 3G).
>>>>>>
>>>>>> Shared Memory Communication is a protocol that allows TCP applications
>>>>>> transparently use RDMA for communication [3]. TCP internal socket is
>>>>>> used to exchange service CLC messages when establishing SMC connection
>>>>>> (which seems harmless for sandboxing) and for communication in the case
>>>>>> of fallback. Fallback happens only if RDMA communication became
>>>>>> impossible (e.g. if RDMA capable RNIC card went down on host or peer
>>>>>> side). So, preventing TCP communication may be achieved by controlling
>>>>>> fallback mechanism.
>>>>>>
>>>>>> Reliable Datagram Socket is connectionless protocol implemented by
>>>>>> Oracle [4]. It uses TCP stack or Infiniband to reliably deliever
>>>>>> datagrams. For every sendmsg(2), recvmsg(2) it establishes TCP
>>>>>> connection and use it to deliever splitted message.
>>>>>>
>>>>>> In comparison with previous protocols, RDS sockets cannot be binded or
>>>>>> connected to special TCP ports (e.g. with bind(2), connect(2)). 16385
>>>>>> port is assigned to receiving side and sending side is binded to the
>>>>>> port allocated by the kernel (by using zero as port number).
>>>>>>
>>>>>> It may be useful to restrict RDS-over-TCP with current access rights,
>>>>>> since it allows to perform TCP communication from user-space. But it
>>>>>> would be only possible to fully allow or deny sending/receiving
>>>>>> (since used ports are not controlled from user space).
>>>>>
>>>>> Thanks for these explanations.  The ability to fine-control specific
>>>>> protocol operations (e.g. connect, bind) can be useful for widely used
>>>>> protocol such as TCP and UDP (or if someone wants to implement it for
>>>>> another protocol), but this approach would not scale with all protocols
>>>>> because of their own semantic and the development efforts.  The Landlock
>>>>> access rights should be explicit, and we should also be able to deny
>>>>> access to a whole set of protocols.  This should be partially possible
>>>>> with your socket creation patch series.  I guess the remaining cases
>>>>> would be to cover transformation of one socket type to another.  I think
>>>>> we could control such transformation by building on top of the socket
>>>>> creation control foundation: instead of controlling socket creation, add
>>>>> a new access right to control socket transformation.  What do you think?
>>>>
>>>> I agree that implementing fine-control network access rights for other
>>>> protocols only to be able to completely restrict TCP operations seems
>>>> excessive.
>>>>
>>>> Do you mean the implementation of 2 access rights: for creating and
>>>> transforming sockets?
>>>
>>> Yes, but if it's not too complex I think it would make sense to only
>>> have one access right that will cover these two cases.  I'm not sure
>>> there is one common point where to check these socket transformation
>>> though.
>>
>> There are at least 3 different places where some kind of transformation
>> is taking place.
> 
> I'm a bit worried that we miss some of these places (now or in future
> kernel versions).  We'll need a new LSM hook for that.
> 
> Could you list the current locations?

Currently, I know only about TCP-related transformations:

* SMC can fallback to TCP during connection. TCP connection is used
   (1) to exchange CLC control messages in default case and (2) for the
   communication in the case of fallback. If socket was connected or
   connection failed, socket can not be reconnected again. There is no
   existing security hook to control the fallback case,

* MPTCP uses TCP for communication between two network interfaces in the
   default case and can fallback to plain TCP if remote peer does not
   support MPTCP. AFAICS, there is also no security hook to control the
   fallback transformation,

* IPv6 -> IPv4 transformation for TCP and UDP sockets with
   IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.

As I said before, I wonder if user may want to use SMC or MPTCP and deny
TCP communication, since he should rely on fallback transformation
during the connection in the common case. It may be unexpected for
connect(2) to fail during the fallback due to security politics.

Theoretically, any TCP restriction should cause similar SMC and MPTCP
restriction. If we deny creation of TCP sockets, we should also deny
creation of SMC and MPTCP sockets. I thought that such dependencies may
be too complex and it will be better to leave them for the user and not
provide any transformation control at all. What do you think?

IPV6_ADDRFORM case is simple and should be covered with "socket
creation" access right.

> 
>>
>>>
>>>>
>>>> If so, there are only 2 socket protocols that can be transformed to TCP
>>>> (in the fallback path) - MPTCP and SMC. Recall that in the case of RDS,
>>>> a TCP socket can be used implicitly to deliver an RDS datagram.
>>>
>>> Hmm, interesting.  Then we'll also need an access right to use a
>>> protocol?  I'm worried that this kind of check would have a significant
>>> performance impact.  I think we could tag a socket at creation time with
>>> the allowed protocol transitions.
>>
>> What do you mean by "to use a protocol"?
> 
> To use a socket with a specific protocol.  Until now, I though being
> able to control socket creation would be enough, but being able to use
> one kind of socket with different protocols would be an issue if users
> want to control the use of protocols (which makes sense from an access
> control point of view).

Got it, thanks!

> 
>>
>>>
>>>> Let's
>>>> assume that the process of configuring TCP as a transport for RDS is
>>>> also included in the socket transformation control.
>>>>
>>>> Socket creation control is sufficient to restrict the implicit use of a
>>>> TCP connection. Theoretically, separate socket transformation
>>>> control is only required if the user wants to use (for example) SMC
>>>> sockets with restricted (partially or completely) TCP bind(2) and
>>>> connect(2) actions. But SMC (or MPTCP) applications should rely on TCP
>>>> communication in case of fallback. I think they are unlikely to have any
>>>> TCP restrictions.
>>>>
>>>> However, control of fallback to TCP by applying socket creation rules
>>>> is too implicit and inconvenient.
>>>>
>>>> Initially, I thought that users could expect TCP access rights to
>>>> completely restrict the corresponding TCP actions without additional
>>>> rules for sockets. I have concerns that socket transformation control
>>>> would not be explicit enough for such purpose.
>>>>
>>>> Probably, it will be more correctly to apply rules that deny creation of
>>>> SMC, MPTCP and RDS sockets (or their transformation to TCP) in
>>>> landlock_restrict_self() if TCP actions are not fully allowed?
>>>
>>> That should be achieved with your socket creation control patch series
>>> right?
>>
>> That's correct. I was just a little worried about a possible unawareness
>> on the part of the user about the sockets transformation. I'll better
>> just make a note in the documentation about this.
> 
> That's why I was talking about a dedicated access right to get a clear
> semantic (socket creation vs. and socket use/transition).  However, I
> don't really see use cases where one should be used and not the other,
> and that could also misleading to users, which means we should probably
> only have one access right and consider protocol transitions as a kind
> of socket creation (and find a more appropriate name).

Agreed. There is no point to control socket transformation with a
separate right.

> 
>>
>>>
>>> I'm not sure to understand the use of landlock_restrict_self() here.
>>> Rulesets should fully define an access control on their own.
>>
>> You're right, landlock_restrict_self() can not define any additional
>> rules.
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Restricting any TCP connection in the kernel is probably simplest
>>>>>> design, but we should consider above cases to provide the most useful
>>>>>> one.
>>>>>>
>>>>>> [1] https://man7.org/linux/man-pages/man7/raw.7.html
>>>>>> [2] https://man7.org/linux/man-pages/man7/packet.7.html
>>>>>> [3] https://datatracker.ietf.org/doc/html/rfc7609
>>>>>> [4] https://oss.oracle.com/projects/rds/dist/documentation/rds-3.1-spec.html
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> sk_is_tcp() is used for this to check address family of the socket
>>>>>>>>> before doing INET-specific address length validation. This is required
>>>>>>>>> for error consistency.
>>>
>>> Could you please send a new patch series for this specific fix,
>>> including minimal tests?  I'd like to merge that as soon as possible,
>>> and it will be backported to all kernel versions.
>>
>> Ok, I'll do it ASAP.
> 
> Great
> 
>>
>>>
>>>>>>>>>
>>>>>>>>> Closes: https://github.com/landlock-lsm/linux/issues/40
>>>>>>>>> Fixes: fff69fb03dde ("landlock: Support network rules with TCP bind and connect")
>>>>>>>>
>>>>>>>> I don't know how fixes are considered in Landlock, but should this patch
>>>>>>>> be considered as a fix? It might be surprising for someone who thought
>>>>>>>> all "stream" connections were blocked to have them unblocked when
>>>>>>>> updating to a minor kernel version, no?
>>>>>>>
>>>>>>> Indeed.  The main issue was with the semantic/definition of
>>>>>>> LANDLOCK_ACCESS_FS_NET_{CONNECT,BIND}_TCP.  We need to synchronize the
>>>>>>> code with the documentation, one way or the other, preferably following
>>>>>>> the principle of least astonishment.
>>>>>>>
>>>>>>>>
>>>>>>>> (Personally, I would understand such behaviour change when upgrading to
>>>>>>>> a major version, and still, maybe only if there were alternatives to
>>>>>>>
>>>>>>> This "fix" needs to be backported, but we're not clear yet on what it
>>>>>>> should be. :)
>>>>>>>
>>>>>>>> continue having the same behaviour, e.g. a way to restrict all stream
>>>>>>>> sockets the same way, or something per stream socket. But that's just me
>>>>>>>> :) )
>>>>>>>
>>>>>>> The documentation and the initial idea was to control TCP bind and
>>>>>>> connect.  The kernel implementation does more than that, so we need to
>>>>>>> synthronize somehow.
>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Matt
>>>>>>>> -- 
>>>>>>>> Sponsored by the NGI0 Core fund.
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-28 10:56                   ` Mikhail Ivanov
@ 2025-01-28 18:14                     ` Matthieu Baerts
  2025-01-29  9:52                       ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Matthieu Baerts @ 2025-01-28 18:14 UTC (permalink / raw)
  To: Mikhail Ivanov, Mickaël Salaün
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, linux-nfs, Paul Moore

Hi Mikhail,

Sorry, I didn't follow all the discussions in this thread, but here are
some comments, hoping this can help to clarify the MPTCP case.

On 28/01/2025 11:56, Mikhail Ivanov wrote:
> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:

(...)

>> I'm a bit worried that we miss some of these places (now or in future
>> kernel versions).  We'll need a new LSM hook for that.
>>
>> Could you list the current locations?
> 
> Currently, I know only about TCP-related transformations:
> 
> * SMC can fallback to TCP during connection. TCP connection is used
>   (1) to exchange CLC control messages in default case and (2) for the
>   communication in the case of fallback. If socket was connected or
>   connection failed, socket can not be reconnected again. There is no
>   existing security hook to control the fallback case,
> 
> * MPTCP uses TCP for communication between two network interfaces in the
>   default case and can fallback to plain TCP if remote peer does not
>   support MPTCP. AFAICS, there is also no security hook to control the
>   fallback transformation,

There are security hooks to control the path creation, but not to
control the "fallback transformation".

Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
socket. This is only used "internally": to communicate between the
userspace and the kernelspace, but not directly used between network
interfaces. This "external" communication is done via one or multiple
kernel TCP sockets carrying extra TCP options for the mapping. The
userspace cannot directly control these sockets created by the kernel.

In case of fallback, the kernel TCP socket "simply" drop the extra TCP
options needed for MPTCP, and carry on like normal TCP. So on the wire
and in the Linux network stack, it is the same TCP connection, without
the MPTCP options in the TCP header. The userspace continue to
communicate with the same socket.

I'm not sure if there is a need to block the fallback: it means only one
path can be used at a time.

> * IPv6 -> IPv4 transformation for TCP and UDP sockets with
>   IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
> 
> As I said before, I wonder if user may want to use SMC or MPTCP and deny
> TCP communication, since he should rely on fallback transformation
> during the connection in the common case. It may be unexpected for
> connect(2) to fail during the fallback due to security politics.

With MPTCP, fallbacks can happen at the beginning of a connection, when
there is only one path. This is done after the userspace's connect(). If
the fallback is blocked, I guess the userspace will get the same errors
as when an open connection is reset.

(Note that on the listener side, the fallback can happen before the
userspace's accept() which can even get an IPPROTO_TCP socket in return)

> Theoretically, any TCP restriction should cause similar SMC and MPTCP
> restriction. If we deny creation of TCP sockets, we should also deny
> creation of SMC and MPTCP sockets. I thought that such dependencies may
> be too complex and it will be better to leave them for the user and not
> provide any transformation control at all. What do you think?
I guess the creation of "kernel" TCP sockets used by MPTCP (and SMC?)
can be restricted, it depends on where this hook is placed I suppose.

(...)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-28 18:14                     ` Matthieu Baerts
@ 2025-01-29  9:52                       ` Mikhail Ivanov
  2025-01-29 10:25                         ` Matthieu Baerts
  0 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2025-01-29  9:52 UTC (permalink / raw)
  To: Matthieu Baerts, Mickaël Salaün
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, linux-nfs, Paul Moore

On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
> Hi Mikhail,
> 
> Sorry, I didn't follow all the discussions in this thread, but here are
> some comments, hoping this can help to clarify the MPTCP case.

Thanks a lot for sharing your knowledge, Matthieu!

> 
> On 28/01/2025 11:56, Mikhail Ivanov wrote:
>> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
> 
> (...)
> 
>>> I'm a bit worried that we miss some of these places (now or in future
>>> kernel versions).  We'll need a new LSM hook for that.
>>>
>>> Could you list the current locations?
>>
>> Currently, I know only about TCP-related transformations:
>>
>> * SMC can fallback to TCP during connection. TCP connection is used
>>    (1) to exchange CLC control messages in default case and (2) for the
>>    communication in the case of fallback. If socket was connected or
>>    connection failed, socket can not be reconnected again. There is no
>>    existing security hook to control the fallback case,
>>
>> * MPTCP uses TCP for communication between two network interfaces in the
>>    default case and can fallback to plain TCP if remote peer does not
>>    support MPTCP. AFAICS, there is also no security hook to control the
>>    fallback transformation,
> 
> There are security hooks to control the path creation, but not to
> control the "fallback transformation".
> 
> Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
> socket. This is only used "internally": to communicate between the
> userspace and the kernelspace, but not directly used between network
> interfaces. This "external" communication is done via one or multiple
> kernel TCP sockets carrying extra TCP options for the mapping. The
> userspace cannot directly control these sockets created by the kernel.
> 
> In case of fallback, the kernel TCP socket "simply" drop the extra TCP
> options needed for MPTCP, and carry on like normal TCP. So on the wire
> and in the Linux network stack, it is the same TCP connection, without
> the MPTCP options in the TCP header. The userspace continue to
> communicate with the same socket.
> 
> I'm not sure if there is a need to block the fallback: it means only one
> path can be used at a time.

You mean that users always rely on a plain TCP communication in the case
the connection of MPTCP multipath communication fails?

> 
>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
>>    IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
>>
>> As I said before, I wonder if user may want to use SMC or MPTCP and deny
>> TCP communication, since he should rely on fallback transformation
>> during the connection in the common case. It may be unexpected for
>> connect(2) to fail during the fallback due to security politics.
> 
> With MPTCP, fallbacks can happen at the beginning of a connection, when
> there is only one path. This is done after the userspace's connect(). If
> the fallback is blocked, I guess the userspace will get the same errors
> as when an open connection is reset.

In the case of blocking due to security policy, userspace should get
-EACESS. I mean, the user might not expect the fallback path to be
blocked during the connection if he has allowed only MPTCP communication
using the Landlock policy.

> 
> (Note that on the listener side, the fallback can happen before the
> userspace's accept() which can even get an IPPROTO_TCP socket in return)

Indeed, fallback can happen on a server side as well.

> 
>> Theoretically, any TCP restriction should cause similar SMC and MPTCP
>> restriction. If we deny creation of TCP sockets, we should also deny
>> creation of SMC and MPTCP sockets. I thought that such dependencies may
>> be too complex and it will be better to leave them for the user and not
>> provide any transformation control at all. What do you think?
> I guess the creation of "kernel" TCP sockets used by MPTCP (and SMC?)
> can be restricted, it depends on where this hook is placed I suppose.

Calling
	socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)
causes creation of kernel TCP socket, so we can use
security_socket_create() hook for this purpose.

> 
> (...)
> 
> Cheers,
> Matt

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29  9:52                       ` Mikhail Ivanov
@ 2025-01-29 10:25                         ` Matthieu Baerts
  2025-01-29 11:02                           ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Matthieu Baerts @ 2025-01-29 10:25 UTC (permalink / raw)
  To: Mikhail Ivanov, Mickaël Salaün
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, linux-nfs, Paul Moore

Hi Mikhail,

On 29/01/2025 10:52, Mikhail Ivanov wrote:
> On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
>> Hi Mikhail,
>>
>> Sorry, I didn't follow all the discussions in this thread, but here are
>> some comments, hoping this can help to clarify the MPTCP case.
> 
> Thanks a lot for sharing your knowledge, Matthieu!
> 
>>
>> On 28/01/2025 11:56, Mikhail Ivanov wrote:
>>> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
>>
>> (...)
>>
>>>> I'm a bit worried that we miss some of these places (now or in future
>>>> kernel versions).  We'll need a new LSM hook for that.
>>>>
>>>> Could you list the current locations?
>>>
>>> Currently, I know only about TCP-related transformations:
>>>
>>> * SMC can fallback to TCP during connection. TCP connection is used
>>>    (1) to exchange CLC control messages in default case and (2) for the
>>>    communication in the case of fallback. If socket was connected or
>>>    connection failed, socket can not be reconnected again. There is no
>>>    existing security hook to control the fallback case,
>>>
>>> * MPTCP uses TCP for communication between two network interfaces in the
>>>    default case and can fallback to plain TCP if remote peer does not
>>>    support MPTCP. AFAICS, there is also no security hook to control the
>>>    fallback transformation,
>>
>> There are security hooks to control the path creation, but not to
>> control the "fallback transformation".
>>
>> Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
>> socket. This is only used "internally": to communicate between the
>> userspace and the kernelspace, but not directly used between network
>> interfaces. This "external" communication is done via one or multiple
>> kernel TCP sockets carrying extra TCP options for the mapping. The
>> userspace cannot directly control these sockets created by the kernel.
>>
>> In case of fallback, the kernel TCP socket "simply" drop the extra TCP
>> options needed for MPTCP, and carry on like normal TCP. So on the wire
>> and in the Linux network stack, it is the same TCP connection, without
>> the MPTCP options in the TCP header. The userspace continue to
>> communicate with the same socket.
>>
>> I'm not sure if there is a need to block the fallback: it means only one
>> path can be used at a time.
> 
> You mean that users always rely on a plain TCP communication in the case
> the connection of MPTCP multipath communication fails?

Yes, that's the same TCP connection, just without extra bit to be able
to use multiple TCP connections associated to the same MPTCP one.

>>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
>>>    IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
>>>
>>> As I said before, I wonder if user may want to use SMC or MPTCP and deny
>>> TCP communication, since he should rely on fallback transformation
>>> during the connection in the common case. It may be unexpected for
>>> connect(2) to fail during the fallback due to security politics.
>>
>> With MPTCP, fallbacks can happen at the beginning of a connection, when
>> there is only one path. This is done after the userspace's connect(). If
>> the fallback is blocked, I guess the userspace will get the same errors
>> as when an open connection is reset.
> 
> In the case of blocking due to security policy, userspace should get
> -EACESS. I mean, the user might not expect the fallback path to be
> blocked during the connection if he has allowed only MPTCP communication
> using the Landlock policy.

A "fallback" can happen on different occasions as mentioned in the
RFC8684 [1], e.g.

- The client asks to use MPTCP, but the other peer doesn't support it:

  Client                Server
  |     SYN + MP_CAPABLE     |
  |------------------------->|
  |         SYN/ACK          |
  |<-------------------------|  => Fallback on the client side
  |           ACK            |
  |------------------------->|

- A middle box doesn't touch the 3WHS, but intercept the communication
just after:

  Client                Server
  |     SYN + MP_CAPABLE     |
  |------------------------->|
  |   SYN/ACK + MP_CAPABLE   |
  |<-------------------------|
  |     ACK + MP_CAPABLE     |
  |------------------------->|
  |        DSS + data        | => but the server doesn't receive the DSS
  |------------------------->| => So fallback on the server side
  |           ACK            |
  |<-------------------------| => Fallback on the client side

- etc.

So the connect(), even in blocking mode, can be OK, but the "fallback"
will happen later.

Again, once the "fallback" has been done, it just means there will be no
more MPTCP options in the TCP headers, and these TCP connections,
created and controlled by the kernel, will continue as "plain" TCP
connections. It simply means that the MPTCP connection will be
restricted to one path, because it will not be possible to create
additional paths any more without these MPTCP options in the initial path.

[1] https://datatracker.ietf.org/doc/html/rfc8684#name-fallback

>> (Note that on the listener side, the fallback can happen before the
>> userspace's accept() which can even get an IPPROTO_TCP socket in return)
> 
> Indeed, fallback can happen on a server side as well.

Same here, this fallback can happen at different stages of the
connection, e.g. the server, supporting MPTCP, can receive a SYN without
MP_CAPABLE option ; or the 3WHS is OK, but the MPTCP options are
stripped later.

>>> Theoretically, any TCP restriction should cause similar SMC and MPTCP
>>> restriction. If we deny creation of TCP sockets, we should also deny
>>> creation of SMC and MPTCP sockets. I thought that such dependencies may
>>> be too complex and it will be better to leave them for the user and not
>>> provide any transformation control at all. What do you think?
>> I guess the creation of "kernel" TCP sockets used by MPTCP (and SMC?)
>> can be restricted, it depends on where this hook is placed I suppose.
> 
> Calling
>     socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)
> causes creation of kernel TCP socket, so we can use
> security_socket_create() hook for this purpose.

That's good if you use this hook then!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29 10:25                         ` Matthieu Baerts
@ 2025-01-29 11:02                           ` Mikhail Ivanov
  2025-01-29 11:33                             ` Matthieu Baerts
  0 siblings, 1 reply; 50+ messages in thread
From: Mikhail Ivanov @ 2025-01-29 11:02 UTC (permalink / raw)
  To: Matthieu Baerts, Mickaël Salaün
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, linux-nfs, Paul Moore

On 1/29/2025 1:25 PM, Matthieu Baerts wrote:
> Hi Mikhail,
> 
> On 29/01/2025 10:52, Mikhail Ivanov wrote:
>> On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
>>> Hi Mikhail,
>>>
>>> Sorry, I didn't follow all the discussions in this thread, but here are
>>> some comments, hoping this can help to clarify the MPTCP case.
>>
>> Thanks a lot for sharing your knowledge, Matthieu!
>>
>>>
>>> On 28/01/2025 11:56, Mikhail Ivanov wrote:
>>>> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
>>>
>>> (...)
>>>
>>>>> I'm a bit worried that we miss some of these places (now or in future
>>>>> kernel versions).  We'll need a new LSM hook for that.
>>>>>
>>>>> Could you list the current locations?
>>>>
>>>> Currently, I know only about TCP-related transformations:
>>>>
>>>> * SMC can fallback to TCP during connection. TCP connection is used
>>>>     (1) to exchange CLC control messages in default case and (2) for the
>>>>     communication in the case of fallback. If socket was connected or
>>>>     connection failed, socket can not be reconnected again. There is no
>>>>     existing security hook to control the fallback case,
>>>>
>>>> * MPTCP uses TCP for communication between two network interfaces in the
>>>>     default case and can fallback to plain TCP if remote peer does not
>>>>     support MPTCP. AFAICS, there is also no security hook to control the
>>>>     fallback transformation,
>>>
>>> There are security hooks to control the path creation, but not to
>>> control the "fallback transformation".
>>>
>>> Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
>>> socket. This is only used "internally": to communicate between the
>>> userspace and the kernelspace, but not directly used between network
>>> interfaces. This "external" communication is done via one or multiple
>>> kernel TCP sockets carrying extra TCP options for the mapping. The
>>> userspace cannot directly control these sockets created by the kernel.
>>>
>>> In case of fallback, the kernel TCP socket "simply" drop the extra TCP
>>> options needed for MPTCP, and carry on like normal TCP. So on the wire
>>> and in the Linux network stack, it is the same TCP connection, without
>>> the MPTCP options in the TCP header. The userspace continue to
>>> communicate with the same socket.
>>>
>>> I'm not sure if there is a need to block the fallback: it means only one
>>> path can be used at a time.
>>
>> You mean that users always rely on a plain TCP communication in the case
>> the connection of MPTCP multipath communication fails?
> 
> Yes, that's the same TCP connection, just without extra bit to be able
> to use multiple TCP connections associated to the same MPTCP one.

Indeed, so MPTCP communication should be restricted the same way as TCP.
AFAICS this should be intuitive for MPTCP users and it'll be better
to let userland define this dependency.

> 
>>>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
>>>>     IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
>>>>
>>>> As I said before, I wonder if user may want to use SMC or MPTCP and deny
>>>> TCP communication, since he should rely on fallback transformation
>>>> during the connection in the common case. It may be unexpected for
>>>> connect(2) to fail during the fallback due to security politics.
>>>
>>> With MPTCP, fallbacks can happen at the beginning of a connection, when
>>> there is only one path. This is done after the userspace's connect(). If
>>> the fallback is blocked, I guess the userspace will get the same errors
>>> as when an open connection is reset.
>>
>> In the case of blocking due to security policy, userspace should get
>> -EACESS. I mean, the user might not expect the fallback path to be
>> blocked during the connection if he has allowed only MPTCP communication
>> using the Landlock policy.
> 
> A "fallback" can happen on different occasions as mentioned in the
> RFC8684 [1], e.g.
> 
> - The client asks to use MPTCP, but the other peer doesn't support it:
> 
>    Client                Server
>    |     SYN + MP_CAPABLE     |
>    |------------------------->|
>    |         SYN/ACK          |
>    |<-------------------------|  => Fallback on the client side
>    |           ACK            |
>    |------------------------->|
> 
> - A middle box doesn't touch the 3WHS, but intercept the communication
> just after:
> 
>    Client                Server
>    |     SYN + MP_CAPABLE     |
>    |------------------------->|
>    |   SYN/ACK + MP_CAPABLE   |
>    |<-------------------------|
>    |     ACK + MP_CAPABLE     |
>    |------------------------->|
>    |        DSS + data        | => but the server doesn't receive the DSS
>    |------------------------->| => So fallback on the server side
>    |           ACK            |
>    |<-------------------------| => Fallback on the client side
> 
> - etc.
> 
> So the connect(), even in blocking mode, can be OK, but the "fallback"
> will happen later.

Thanks! Theoretical "socket transformation" control should cover all
these cases.

You mean that it might be reasonable for a Landlock policy to block
MPTCP fallback when establishing first sublflow (when client does not
receive MP_CAPABLE)?

> 
> Again, once the "fallback" has been done, it just means there will be no
> more MPTCP options in the TCP headers, and these TCP connections,
> created and controlled by the kernel, will continue as "plain" TCP
> connections. It simply means that the MPTCP connection will be
> restricted to one path, because it will not be possible to create
> additional paths any more without these MPTCP options in the initial path.

Correct, thanks

> 
> [1] https://datatracker.ietf.org/doc/html/rfc8684#name-fallback
> 
>>> (Note that on the listener side, the fallback can happen before the
>>> userspace's accept() which can even get an IPPROTO_TCP socket in return)
>>
>> Indeed, fallback can happen on a server side as well.
> 
> Same here, this fallback can happen at different stages of the
> connection, e.g. the server, supporting MPTCP, can receive a SYN without
> MP_CAPABLE option ; or the 3WHS is OK, but the MPTCP options are
> stripped later.
> 
>>>> Theoretically, any TCP restriction should cause similar SMC and MPTCP
>>>> restriction. If we deny creation of TCP sockets, we should also deny
>>>> creation of SMC and MPTCP sockets. I thought that such dependencies may
>>>> be too complex and it will be better to leave them for the user and not
>>>> provide any transformation control at all. What do you think?
>>> I guess the creation of "kernel" TCP sockets used by MPTCP (and SMC?)
>>> can be restricted, it depends on where this hook is placed I suppose.
>>
>> Calling
>>      socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)
>> causes creation of kernel TCP socket, so we can use
>> security_socket_create() hook for this purpose.
> 
> That's good if you use this hook then!
> 
> Cheers,
> Matt

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29 11:02                           ` Mikhail Ivanov
@ 2025-01-29 11:33                             ` Matthieu Baerts
  2025-01-29 11:47                               ` Mikhail Ivanov
  0 siblings, 1 reply; 50+ messages in thread
From: Matthieu Baerts @ 2025-01-29 11:33 UTC (permalink / raw)
  To: Mikhail Ivanov, Mickaël Salaün
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, linux-nfs, Paul Moore

On 29/01/2025 12:02, Mikhail Ivanov wrote:
> On 1/29/2025 1:25 PM, Matthieu Baerts wrote:
>> Hi Mikhail,
>>
>> On 29/01/2025 10:52, Mikhail Ivanov wrote:
>>> On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
>>>> Hi Mikhail,
>>>>
>>>> Sorry, I didn't follow all the discussions in this thread, but here are
>>>> some comments, hoping this can help to clarify the MPTCP case.
>>>
>>> Thanks a lot for sharing your knowledge, Matthieu!
>>>
>>>>
>>>> On 28/01/2025 11:56, Mikhail Ivanov wrote:
>>>>> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
>>>>
>>>> (...)
>>>>
>>>>>> I'm a bit worried that we miss some of these places (now or in future
>>>>>> kernel versions).  We'll need a new LSM hook for that.
>>>>>>
>>>>>> Could you list the current locations?
>>>>>
>>>>> Currently, I know only about TCP-related transformations:
>>>>>
>>>>> * SMC can fallback to TCP during connection. TCP connection is used
>>>>>     (1) to exchange CLC control messages in default case and (2)
>>>>> for the
>>>>>     communication in the case of fallback. If socket was connected or
>>>>>     connection failed, socket can not be reconnected again. There
>>>>> is no
>>>>>     existing security hook to control the fallback case,
>>>>>
>>>>> * MPTCP uses TCP for communication between two network interfaces
>>>>> in the
>>>>>     default case and can fallback to plain TCP if remote peer does not
>>>>>     support MPTCP. AFAICS, there is also no security hook to
>>>>> control the
>>>>>     fallback transformation,
>>>>
>>>> There are security hooks to control the path creation, but not to
>>>> control the "fallback transformation".
>>>>
>>>> Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
>>>> socket. This is only used "internally": to communicate between the
>>>> userspace and the kernelspace, but not directly used between network
>>>> interfaces. This "external" communication is done via one or multiple
>>>> kernel TCP sockets carrying extra TCP options for the mapping. The
>>>> userspace cannot directly control these sockets created by the kernel.
>>>>
>>>> In case of fallback, the kernel TCP socket "simply" drop the extra TCP
>>>> options needed for MPTCP, and carry on like normal TCP. So on the wire
>>>> and in the Linux network stack, it is the same TCP connection, without
>>>> the MPTCP options in the TCP header. The userspace continue to
>>>> communicate with the same socket.
>>>>
>>>> I'm not sure if there is a need to block the fallback: it means only
>>>> one
>>>> path can be used at a time.
>>>
>>> You mean that users always rely on a plain TCP communication in the case
>>> the connection of MPTCP multipath communication fails?
>>
>> Yes, that's the same TCP connection, just without extra bit to be able
>> to use multiple TCP connections associated to the same MPTCP one.
> 
> Indeed, so MPTCP communication should be restricted the same way as TCP.
> AFAICS this should be intuitive for MPTCP users and it'll be better
> to let userland define this dependency.

Yes, I think that would make more sense.

I guess we can look at MPTCP as TCP with extra features.

So if TCP is blocked, MPTCP should be blocked as well. (And eventually
having the possibility to block only TCP but not MPTCP and the opposite,
but that's a different topic: a possible new feature, but not a bug-fix)

>>>>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
>>>>>     IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
>>>>>
>>>>> As I said before, I wonder if user may want to use SMC or MPTCP and
>>>>> deny
>>>>> TCP communication, since he should rely on fallback transformation
>>>>> during the connection in the common case. It may be unexpected for
>>>>> connect(2) to fail during the fallback due to security politics.
>>>>
>>>> With MPTCP, fallbacks can happen at the beginning of a connection, when
>>>> there is only one path. This is done after the userspace's
>>>> connect(). If
>>>> the fallback is blocked, I guess the userspace will get the same errors
>>>> as when an open connection is reset.
>>>
>>> In the case of blocking due to security policy, userspace should get
>>> -EACESS. I mean, the user might not expect the fallback path to be
>>> blocked during the connection if he has allowed only MPTCP communication
>>> using the Landlock policy.
>>
>> A "fallback" can happen on different occasions as mentioned in the
>> RFC8684 [1], e.g.
>>
>> - The client asks to use MPTCP, but the other peer doesn't support it:
>>
>>    Client                Server
>>    |     SYN + MP_CAPABLE     |
>>    |------------------------->|
>>    |         SYN/ACK          |
>>    |<-------------------------|  => Fallback on the client side
>>    |           ACK            |
>>    |------------------------->|
>>
>> - A middle box doesn't touch the 3WHS, but intercept the communication
>> just after:
>>
>>    Client                Server
>>    |     SYN + MP_CAPABLE     |
>>    |------------------------->|
>>    |   SYN/ACK + MP_CAPABLE   |
>>    |<-------------------------|
>>    |     ACK + MP_CAPABLE     |
>>    |------------------------->|
>>    |        DSS + data        | => but the server doesn't receive the DSS
>>    |------------------------->| => So fallback on the server side
>>    |           ACK            |
>>    |<-------------------------| => Fallback on the client side
>>
>> - etc.
>>
>> So the connect(), even in blocking mode, can be OK, but the "fallback"
>> will happen later.
> 
> Thanks! Theoretical "socket transformation" control should cover all
> these cases.
> 
> You mean that it might be reasonable for a Landlock policy to block
> MPTCP fallback when establishing first sublflow (when client does not
> receive MP_CAPABLE)?

Personally, I don't even know if there is really a need for such
policies. The fallback is there not to block a connection if the other
peer doesn't support MPTCP, or if a middlebox decides to mess-up with
MPTCP options. So instead of an error, the connection continues but is
"degraded" by not being able to create multiple paths later on.

Maybe best to wait for a concrete use-case before implementing this?

(...)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29 11:33                             ` Matthieu Baerts
@ 2025-01-29 11:47                               ` Mikhail Ivanov
  2025-01-29 11:57                                 ` Matthieu Baerts
  2025-01-29 14:51                                 ` Mickaël Salaün
  0 siblings, 2 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2025-01-29 11:47 UTC (permalink / raw)
  To: Matthieu Baerts, Mickaël Salaün
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, linux-nfs, Paul Moore

On 1/29/2025 2:33 PM, Matthieu Baerts wrote:
> On 29/01/2025 12:02, Mikhail Ivanov wrote:
>> On 1/29/2025 1:25 PM, Matthieu Baerts wrote:
>>> Hi Mikhail,
>>>
>>> On 29/01/2025 10:52, Mikhail Ivanov wrote:
>>>> On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
>>>>> Hi Mikhail,
>>>>>
>>>>> Sorry, I didn't follow all the discussions in this thread, but here are
>>>>> some comments, hoping this can help to clarify the MPTCP case.
>>>>
>>>> Thanks a lot for sharing your knowledge, Matthieu!
>>>>
>>>>>
>>>>> On 28/01/2025 11:56, Mikhail Ivanov wrote:
>>>>>> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
>>>>>
>>>>> (...)
>>>>>
>>>>>>> I'm a bit worried that we miss some of these places (now or in future
>>>>>>> kernel versions).  We'll need a new LSM hook for that.
>>>>>>>
>>>>>>> Could you list the current locations?
>>>>>>
>>>>>> Currently, I know only about TCP-related transformations:
>>>>>>
>>>>>> * SMC can fallback to TCP during connection. TCP connection is used
>>>>>>      (1) to exchange CLC control messages in default case and (2)
>>>>>> for the
>>>>>>      communication in the case of fallback. If socket was connected or
>>>>>>      connection failed, socket can not be reconnected again. There
>>>>>> is no
>>>>>>      existing security hook to control the fallback case,
>>>>>>
>>>>>> * MPTCP uses TCP for communication between two network interfaces
>>>>>> in the
>>>>>>      default case and can fallback to plain TCP if remote peer does not
>>>>>>      support MPTCP. AFAICS, there is also no security hook to
>>>>>> control the
>>>>>>      fallback transformation,
>>>>>
>>>>> There are security hooks to control the path creation, but not to
>>>>> control the "fallback transformation".
>>>>>
>>>>> Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
>>>>> socket. This is only used "internally": to communicate between the
>>>>> userspace and the kernelspace, but not directly used between network
>>>>> interfaces. This "external" communication is done via one or multiple
>>>>> kernel TCP sockets carrying extra TCP options for the mapping. The
>>>>> userspace cannot directly control these sockets created by the kernel.
>>>>>
>>>>> In case of fallback, the kernel TCP socket "simply" drop the extra TCP
>>>>> options needed for MPTCP, and carry on like normal TCP. So on the wire
>>>>> and in the Linux network stack, it is the same TCP connection, without
>>>>> the MPTCP options in the TCP header. The userspace continue to
>>>>> communicate with the same socket.
>>>>>
>>>>> I'm not sure if there is a need to block the fallback: it means only
>>>>> one
>>>>> path can be used at a time.
>>>>
>>>> You mean that users always rely on a plain TCP communication in the case
>>>> the connection of MPTCP multipath communication fails?
>>>
>>> Yes, that's the same TCP connection, just without extra bit to be able
>>> to use multiple TCP connections associated to the same MPTCP one.
>>
>> Indeed, so MPTCP communication should be restricted the same way as TCP.
>> AFAICS this should be intuitive for MPTCP users and it'll be better
>> to let userland define this dependency.
> 
> Yes, I think that would make more sense.
> 
> I guess we can look at MPTCP as TCP with extra features.

Yeap

> 
> So if TCP is blocked, MPTCP should be blocked as well. (And eventually
> having the possibility to block only TCP but not MPTCP and the opposite,
> but that's a different topic: a possible new feature, but not a bug-fix)
What do you mean by the "bug fix"?

> 
>>>>>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
>>>>>>      IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
>>>>>>
>>>>>> As I said before, I wonder if user may want to use SMC or MPTCP and
>>>>>> deny
>>>>>> TCP communication, since he should rely on fallback transformation
>>>>>> during the connection in the common case. It may be unexpected for
>>>>>> connect(2) to fail during the fallback due to security politics.
>>>>>
>>>>> With MPTCP, fallbacks can happen at the beginning of a connection, when
>>>>> there is only one path. This is done after the userspace's
>>>>> connect(). If
>>>>> the fallback is blocked, I guess the userspace will get the same errors
>>>>> as when an open connection is reset.
>>>>
>>>> In the case of blocking due to security policy, userspace should get
>>>> -EACESS. I mean, the user might not expect the fallback path to be
>>>> blocked during the connection if he has allowed only MPTCP communication
>>>> using the Landlock policy.
>>>
>>> A "fallback" can happen on different occasions as mentioned in the
>>> RFC8684 [1], e.g.
>>>
>>> - The client asks to use MPTCP, but the other peer doesn't support it:
>>>
>>>     Client                Server
>>>     |     SYN + MP_CAPABLE     |
>>>     |------------------------->|
>>>     |         SYN/ACK          |
>>>     |<-------------------------|  => Fallback on the client side
>>>     |           ACK            |
>>>     |------------------------->|
>>>
>>> - A middle box doesn't touch the 3WHS, but intercept the communication
>>> just after:
>>>
>>>     Client                Server
>>>     |     SYN + MP_CAPABLE     |
>>>     |------------------------->|
>>>     |   SYN/ACK + MP_CAPABLE   |
>>>     |<-------------------------|
>>>     |     ACK + MP_CAPABLE     |
>>>     |------------------------->|
>>>     |        DSS + data        | => but the server doesn't receive the DSS
>>>     |------------------------->| => So fallback on the server side
>>>     |           ACK            |
>>>     |<-------------------------| => Fallback on the client side
>>>
>>> - etc.
>>>
>>> So the connect(), even in blocking mode, can be OK, but the "fallback"
>>> will happen later.
>>
>> Thanks! Theoretical "socket transformation" control should cover all
>> these cases.
>>
>> You mean that it might be reasonable for a Landlock policy to block
>> MPTCP fallback when establishing first sublflow (when client does not
>> receive MP_CAPABLE)?
> 
> Personally, I don't even know if there is really a need for such
> policies. The fallback is there not to block a connection if the other
> peer doesn't support MPTCP, or if a middlebox decides to mess-up with
> MPTCP options. So instead of an error, the connection continues but is
> "degraded" by not being able to create multiple paths later on.
> 
> Maybe best to wait for a concrete use-case before implementing this?

Ok, got it! I agree that such policies does not seem to be useful.

> 
> (...)
> 
> Cheers,
> Matt

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29 11:47                               ` Mikhail Ivanov
@ 2025-01-29 11:57                                 ` Matthieu Baerts
  2025-01-29 14:51                                 ` Mickaël Salaün
  1 sibling, 0 replies; 50+ messages in thread
From: Matthieu Baerts @ 2025-01-29 11:57 UTC (permalink / raw)
  To: Mikhail Ivanov, Mickaël Salaün
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, linux-nfs, Paul Moore

On 29/01/2025 12:47, Mikhail Ivanov wrote:
> On 1/29/2025 2:33 PM, Matthieu Baerts wrote:
>> So if TCP is blocked, MPTCP should be blocked as well. (And eventually
>> having the possibility to block only TCP but not MPTCP and the opposite,
>> but that's a different topic: a possible new feature, but not a bug-fix)
>
> What do you mean by the "bug fix"?

I mean that to me, adding the possibility to block one but not the other
might be seen as a new feature. But at the end, that's up to the
Landlocks maintainers to decide! So feel free to ignore this previous
comment :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29 11:47                               ` Mikhail Ivanov
  2025-01-29 11:57                                 ` Matthieu Baerts
@ 2025-01-29 14:51                                 ` Mickaël Salaün
  2025-01-29 15:44                                   ` Matthieu Baerts
  2025-01-31 11:04                                   ` Mikhail Ivanov
  1 sibling, 2 replies; 50+ messages in thread
From: Mickaël Salaün @ 2025-01-29 14:51 UTC (permalink / raw)
  To: Mikhail Ivanov
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs,
	Paul Moore

On Wed, Jan 29, 2025 at 02:47:19PM +0300, Mikhail Ivanov wrote:
> On 1/29/2025 2:33 PM, Matthieu Baerts wrote:
> > On 29/01/2025 12:02, Mikhail Ivanov wrote:
> > > On 1/29/2025 1:25 PM, Matthieu Baerts wrote:
> > > > Hi Mikhail,
> > > > 
> > > > On 29/01/2025 10:52, Mikhail Ivanov wrote:
> > > > > On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
> > > > > > Hi Mikhail,
> > > > > > 
> > > > > > Sorry, I didn't follow all the discussions in this thread, but here are
> > > > > > some comments, hoping this can help to clarify the MPTCP case.
> > > > > 
> > > > > Thanks a lot for sharing your knowledge, Matthieu!
> > > > > 
> > > > > > 
> > > > > > On 28/01/2025 11:56, Mikhail Ivanov wrote:
> > > > > > > On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
> > > > > > 
> > > > > > (...)
> > > > > > 
> > > > > > > > I'm a bit worried that we miss some of these places (now or in future
> > > > > > > > kernel versions).  We'll need a new LSM hook for that.
> > > > > > > > 
> > > > > > > > Could you list the current locations?
> > > > > > > 
> > > > > > > Currently, I know only about TCP-related transformations:
> > > > > > > 
> > > > > > > * SMC can fallback to TCP during connection. TCP connection is used
> > > > > > >      (1) to exchange CLC control messages in default case and (2)
> > > > > > > for the
> > > > > > >      communication in the case of fallback. If socket was connected or
> > > > > > >      connection failed, socket can not be reconnected again. There
> > > > > > > is no
> > > > > > >      existing security hook to control the fallback case,
> > > > > > > 
> > > > > > > * MPTCP uses TCP for communication between two network interfaces
> > > > > > > in the
> > > > > > >      default case and can fallback to plain TCP if remote peer does not
> > > > > > >      support MPTCP. AFAICS, there is also no security hook to
> > > > > > > control the
> > > > > > >      fallback transformation,
> > > > > > 
> > > > > > There are security hooks to control the path creation, but not to
> > > > > > control the "fallback transformation".
> > > > > > 
> > > > > > Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
> > > > > > socket. This is only used "internally": to communicate between the
> > > > > > userspace and the kernelspace, but not directly used between network
> > > > > > interfaces. This "external" communication is done via one or multiple
> > > > > > kernel TCP sockets carrying extra TCP options for the mapping. The
> > > > > > userspace cannot directly control these sockets created by the kernel.
> > > > > > 
> > > > > > In case of fallback, the kernel TCP socket "simply" drop the extra TCP
> > > > > > options needed for MPTCP, and carry on like normal TCP. So on the wire
> > > > > > and in the Linux network stack, it is the same TCP connection, without
> > > > > > the MPTCP options in the TCP header. The userspace continue to
> > > > > > communicate with the same socket.
> > > > > > 
> > > > > > I'm not sure if there is a need to block the fallback: it means only
> > > > > > one
> > > > > > path can be used at a time.

Thanks Matthieu.

So user space needs to specific IPPROTO_MPTCP to use MPTCP, but on the
network this socket can translate to "augmented" or plain TCP.

From Landlock point of view, what matters is to have a consistent policy
that maps to user space code.  The fear was that a malicious user space
that is only allowed to use MPTCP could still transform an MPTCP socket
to a TCP socket, while it wasn't allowed to create a TCP socket in the
first place.  I now think this should not be an issue because:
1. MPTCP is kind of a superset of TCP
2. user space legitimately using MPTCP should not get any error related
   to a Landlock policy because of TCP/any automatic fallback.  To say
   it another way, such fallback is independent of user space requests
   and may not be predicted because it is related to the current network
   path.  This follows the principle of least astonishment (at least
   from user space point of view).

So, if I understand correctly, this should be simple for the Landlock
socket creation control:  we only check socket properties at creation
time and we ignore potential fallbacks.  This should be documented
though.

As an example, if a Landlock policies only allows MPTCP: socket(...,
IPPROTO_MPTCP) should be allowed and any legitimate use of the returned
socket (according to MPTCP) should be allowed, including TCP fallback.
However, socket(..., IPPROTO_TCP/0), should only be allowed if TCP is
explicitly allowed.  This means that we might end up with an MPTCP
socket only using TCP, which is OK.

I guess this should be the same for other protocols, except if user
space can explicitly transform a specific socket type to use an
*arbitrary* protocol, but I think this is not possible.

> > > > > 
> > > > > You mean that users always rely on a plain TCP communication in the case
> > > > > the connection of MPTCP multipath communication fails?
> > > > 
> > > > Yes, that's the same TCP connection, just without extra bit to be able
> > > > to use multiple TCP connections associated to the same MPTCP one.
> > > 
> > > Indeed, so MPTCP communication should be restricted the same way as TCP.
> > > AFAICS this should be intuitive for MPTCP users and it'll be better
> > > to let userland define this dependency.
> > 
> > Yes, I think that would make more sense.
> > 
> > I guess we can look at MPTCP as TCP with extra features.
> 
> Yeap
> 
> > 
> > So if TCP is blocked, MPTCP should be blocked as well. (And eventually
> > having the possibility to block only TCP but not MPTCP and the opposite,
> > but that's a different topic: a possible new feature, but not a bug-fix)
> What do you mean by the "bug fix"?
> 
> > 
> > > > > > > * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
> > > > > > >      IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.

According to the man page: "It is allowed only for IPv6 sockets that are
connected and bound to a v4-mapped-on-v6 address."

This compatibility feature makes sense from user space point of view and
should not result in an error because of Landlock.

> > > > > > > 
> > > > > > > As I said before, I wonder if user may want to use SMC or MPTCP and
> > > > > > > deny
> > > > > > > TCP communication, since he should rely on fallback transformation
> > > > > > > during the connection in the common case. It may be unexpected for
> > > > > > > connect(2) to fail during the fallback due to security politics.
> > > > > > 
> > > > > > With MPTCP, fallbacks can happen at the beginning of a connection, when
> > > > > > there is only one path. This is done after the userspace's
> > > > > > connect(). If

A remaining question is then, can we repurpose an MPTCP socket that did
fallback to TCP, to (re)connect to another destination (this time
directly with TCP)?

I guess this is possible.  If it is the case, I think it should be OK
anyway.  That could be used by an attacker, but that should not give
more access because of the MPTCP fallback mechanism anyway.  We should
see MPTCP as a superset of TCP.  At the end, security policy is in the
hands of user space.

> > > > > > the fallback is blocked, I guess the userspace will get the same errors
> > > > > > as when an open connection is reset.
> > > > > 
> > > > > In the case of blocking due to security policy, userspace should get
> > > > > -EACESS. I mean, the user might not expect the fallback path to be
> > > > > blocked during the connection if he has allowed only MPTCP communication
> > > > > using the Landlock policy.
> > > > 
> > > > A "fallback" can happen on different occasions as mentioned in the
> > > > RFC8684 [1], e.g.
> > > > 
> > > > - The client asks to use MPTCP, but the other peer doesn't support it:
> > > > 
> > > >     Client                Server
> > > >     |     SYN + MP_CAPABLE     |
> > > >     |------------------------->|
> > > >     |         SYN/ACK          |
> > > >     |<-------------------------|  => Fallback on the client side
> > > >     |           ACK            |
> > > >     |------------------------->|
> > > > 
> > > > - A middle box doesn't touch the 3WHS, but intercept the communication
> > > > just after:
> > > > 
> > > >     Client                Server
> > > >     |     SYN + MP_CAPABLE     |
> > > >     |------------------------->|
> > > >     |   SYN/ACK + MP_CAPABLE   |
> > > >     |<-------------------------|
> > > >     |     ACK + MP_CAPABLE     |
> > > >     |------------------------->|
> > > >     |        DSS + data        | => but the server doesn't receive the DSS
> > > >     |------------------------->| => So fallback on the server side
> > > >     |           ACK            |
> > > >     |<-------------------------| => Fallback on the client side
> > > > 
> > > > - etc.
> > > > 
> > > > So the connect(), even in blocking mode, can be OK, but the "fallback"
> > > > will happen later.
> > > 
> > > Thanks! Theoretical "socket transformation" control should cover all
> > > these cases.
> > > 
> > > You mean that it might be reasonable for a Landlock policy to block
> > > MPTCP fallback when establishing first sublflow (when client does not
> > > receive MP_CAPABLE)?
> > 
> > Personally, I don't even know if there is really a need for such
> > policies. The fallback is there not to block a connection if the other
> > peer doesn't support MPTCP, or if a middlebox decides to mess-up with
> > MPTCP options. So instead of an error, the connection continues but is
> > "degraded" by not being able to create multiple paths later on.

I agree, this kind of compatibility feature should not be denied.

> > 
> > Maybe best to wait for a concrete use-case before implementing this?
> 
> Ok, got it! I agree that such policies does not seem to be useful.
> 
> > 
> > (...)
> > 
> > Cheers,
> > Matt
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29 14:51                                 ` Mickaël Salaün
@ 2025-01-29 15:44                                   ` Matthieu Baerts
  2025-01-30  9:51                                     ` Mickaël Salaün
  2025-01-31 11:04                                   ` Mikhail Ivanov
  1 sibling, 1 reply; 50+ messages in thread
From: Matthieu Baerts @ 2025-01-29 15:44 UTC (permalink / raw)
  To: Mickaël Salaün, Mikhail Ivanov
  Cc: gnoack, willemdebruijn.kernel, matthieu, linux-security-module,
	netdev, netfilter-devel, yusongping, artem.kuzin,
	konstantin.meskhidze, MPTCP Linux, linux-nfs, Paul Moore

Hi Mickaël,

On 29/01/2025 15:51, Mickaël Salaün wrote:
> On Wed, Jan 29, 2025 at 02:47:19PM +0300, Mikhail Ivanov wrote:
>> On 1/29/2025 2:33 PM, Matthieu Baerts wrote:
>>> On 29/01/2025 12:02, Mikhail Ivanov wrote:
>>>> On 1/29/2025 1:25 PM, Matthieu Baerts wrote:
>>>>> Hi Mikhail,
>>>>>
>>>>> On 29/01/2025 10:52, Mikhail Ivanov wrote:
>>>>>> On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
>>>>>>> Hi Mikhail,
>>>>>>>
>>>>>>> Sorry, I didn't follow all the discussions in this thread, but here are
>>>>>>> some comments, hoping this can help to clarify the MPTCP case.
>>>>>>
>>>>>> Thanks a lot for sharing your knowledge, Matthieu!
>>>>>>
>>>>>>>
>>>>>>> On 28/01/2025 11:56, Mikhail Ivanov wrote:
>>>>>>>> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
>>>>>>>
>>>>>>> (...)
>>>>>>>
>>>>>>>>> I'm a bit worried that we miss some of these places (now or in future
>>>>>>>>> kernel versions).  We'll need a new LSM hook for that.
>>>>>>>>>
>>>>>>>>> Could you list the current locations?
>>>>>>>>
>>>>>>>> Currently, I know only about TCP-related transformations:
>>>>>>>>
>>>>>>>> * SMC can fallback to TCP during connection. TCP connection is used
>>>>>>>>      (1) to exchange CLC control messages in default case and (2)
>>>>>>>> for the
>>>>>>>>      communication in the case of fallback. If socket was connected or
>>>>>>>>      connection failed, socket can not be reconnected again. There
>>>>>>>> is no
>>>>>>>>      existing security hook to control the fallback case,
>>>>>>>>
>>>>>>>> * MPTCP uses TCP for communication between two network interfaces
>>>>>>>> in the
>>>>>>>>      default case and can fallback to plain TCP if remote peer does not
>>>>>>>>      support MPTCP. AFAICS, there is also no security hook to
>>>>>>>> control the
>>>>>>>>      fallback transformation,
>>>>>>>
>>>>>>> There are security hooks to control the path creation, but not to
>>>>>>> control the "fallback transformation".
>>>>>>>
>>>>>>> Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
>>>>>>> socket. This is only used "internally": to communicate between the
>>>>>>> userspace and the kernelspace, but not directly used between network
>>>>>>> interfaces. This "external" communication is done via one or multiple
>>>>>>> kernel TCP sockets carrying extra TCP options for the mapping. The
>>>>>>> userspace cannot directly control these sockets created by the kernel.
>>>>>>>
>>>>>>> In case of fallback, the kernel TCP socket "simply" drop the extra TCP
>>>>>>> options needed for MPTCP, and carry on like normal TCP. So on the wire
>>>>>>> and in the Linux network stack, it is the same TCP connection, without
>>>>>>> the MPTCP options in the TCP header. The userspace continue to
>>>>>>> communicate with the same socket.
>>>>>>>
>>>>>>> I'm not sure if there is a need to block the fallback: it means only
>>>>>>> one
>>>>>>> path can be used at a time.
> 
> Thanks Matthieu.
> 
> So user space needs to specific IPPROTO_MPTCP to use MPTCP, but on the
> network this socket can translate to "augmented" or plain TCP.

Correct. On the wire, you will only see packet with the IPPROTO_TCP
protocol. When MPTCP is used, extra MPTCP options will be present in the
TCP headers, but the protocol is still IPPROTO_TCP on the network.

> From Landlock point of view, what matters is to have a consistent policy
> that maps to user space code.  The fear was that a malicious user space
> that is only allowed to use MPTCP could still transform an MPTCP socket
> to a TCP socket, while it wasn't allowed to create a TCP socket in the
> first place.  I now think this should not be an issue because:
> 1. MPTCP is kind of a superset of TCP
> 2. user space legitimately using MPTCP should not get any error related
>    to a Landlock policy because of TCP/any automatic fallback.  To say
>    it another way, such fallback is independent of user space requests
>    and may not be predicted because it is related to the current network
>    path.  This follows the principle of least astonishment (at least
>    from user space point of view).
> 
> So, if I understand correctly, this should be simple for the Landlock
> socket creation control:  we only check socket properties at creation
> time and we ignore potential fallbacks.  This should be documented
> though.

It depends on the restrictions that are put in place: are the user and
kernel sockets treated the same way? If yes, blocking TCP means that
even if it will be possible for the userspace to create an IPPROTO_MPTCP
socket, the kernel will not be allowed to IPPROTO_TCP ones to
communicate with the outside world. So blocking TCP will implicitly
block MPTCP.

On the other hand, if only TCP user sockets are blocked, then it will be
possible to use MPTCP to communicate to any TCP sockets: with an
IPPROTO_MPTCP socket, it is possible to communicate with any IPPROTO_TCP
sockets, but without the extra features supported by MPTCP.

> As an example, if a Landlock policies only allows MPTCP: socket(...,
> IPPROTO_MPTCP) should be allowed and any legitimate use of the returned
> socket (according to MPTCP) should be allowed, including TCP fallback.
> However, socket(..., IPPROTO_TCP/0), should only be allowed if TCP is
> explicitly allowed.  This means that we might end up with an MPTCP
> socket only using TCP, which is OK.

Would it not be confusing for the person who set the Landlock policies?
Especially for the ones who had policies to block TCP, and thought they
were "safe", no?

If only TCP is blocked on the userspace side, simply using IPPROTO_MPTCP
instead of IPPROTO_TCP will allow any users to continue to talk with the
outside world. Also, it is easy to force apps to use IPPROTO_MPTCP
instead of IPPROTO_TCP, e.g. using 'mptcpize' which set LD_PRELOAD in
order to change the parameters of the socket() call.

   mptcpize run curl https://check.mptcp.dev

> I guess this should be the same for other protocols, except if user
> space can explicitly transform a specific socket type to use an
> *arbitrary* protocol, but I think this is not possible.
I'm sorry, I don't know what is possible with the other ones. But again,
blocking both user and kernel sockets the same way might make more sense
here.

>>>>>>
>>>>>> You mean that users always rely on a plain TCP communication in the case
>>>>>> the connection of MPTCP multipath communication fails?
>>>>>
>>>>> Yes, that's the same TCP connection, just without extra bit to be able
>>>>> to use multiple TCP connections associated to the same MPTCP one.
>>>>
>>>> Indeed, so MPTCP communication should be restricted the same way as TCP.
>>>> AFAICS this should be intuitive for MPTCP users and it'll be better
>>>> to let userland define this dependency.
>>>
>>> Yes, I think that would make more sense.
>>>
>>> I guess we can look at MPTCP as TCP with extra features.
>>
>> Yeap
>>
>>>
>>> So if TCP is blocked, MPTCP should be blocked as well. (And eventually
>>> having the possibility to block only TCP but not MPTCP and the opposite,
>>> but that's a different topic: a possible new feature, but not a bug-fix)
>> What do you mean by the "bug fix"?
>>
>>>
>>>>>>>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
>>>>>>>>      IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
> 
> According to the man page: "It is allowed only for IPv6 sockets that are
> connected and bound to a v4-mapped-on-v6 address."
> 
> This compatibility feature makes sense from user space point of view and
> should not result in an error because of Landlock.
> 
>>>>>>>>
>>>>>>>> As I said before, I wonder if user may want to use SMC or MPTCP and
>>>>>>>> deny
>>>>>>>> TCP communication, since he should rely on fallback transformation
>>>>>>>> during the connection in the common case. It may be unexpected for
>>>>>>>> connect(2) to fail during the fallback due to security politics.
>>>>>>>
>>>>>>> With MPTCP, fallbacks can happen at the beginning of a connection, when
>>>>>>> there is only one path. This is done after the userspace's
>>>>>>> connect(). If
> 
> A remaining question is then, can we repurpose an MPTCP socket that did
> fallback to TCP, to (re)connect to another destination (this time
> directly with TCP)?

If the socket was created with the IPPROTO_MPTCP protocol, the protocol
will not change after a disconnection. But still, with an MPTCP socket,
it is by design possible to connect to a TCP one no mater how the socket
was used before.

> I guess this is possible.  If it is the case, I think it should be OK
> anyway.  That could be used by an attacker, but that should not give
> more access because of the MPTCP fallback mechanism anyway.  We should
> see MPTCP as a superset of TCP.  At the end, security policy is in the
> hands of user space.

As long as it is documented and not seen as a regression :)

To me, it sounds strange to have to add extra rules for MPTCP if TCP is
blocked, but that's certainly because I see MPTCP like it is seen on the
wire: as an extension to TCP, not as a different protocol.

(...)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29 15:44                                   ` Matthieu Baerts
@ 2025-01-30  9:51                                     ` Mickaël Salaün
  2025-01-30 10:18                                       ` Matthieu Baerts
  0 siblings, 1 reply; 50+ messages in thread
From: Mickaël Salaün @ 2025-01-30  9:51 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Mikhail Ivanov, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs,
	Paul Moore

On Wed, Jan 29, 2025 at 04:44:18PM +0100, Matthieu Baerts wrote:
> Hi Mickaël,
> 
> On 29/01/2025 15:51, Mickaël Salaün wrote:
> > On Wed, Jan 29, 2025 at 02:47:19PM +0300, Mikhail Ivanov wrote:
> >> On 1/29/2025 2:33 PM, Matthieu Baerts wrote:
> >>> On 29/01/2025 12:02, Mikhail Ivanov wrote:
> >>>> On 1/29/2025 1:25 PM, Matthieu Baerts wrote:
> >>>>> Hi Mikhail,
> >>>>>
> >>>>> On 29/01/2025 10:52, Mikhail Ivanov wrote:
> >>>>>> On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
> >>>>>>> Hi Mikhail,
> >>>>>>>
> >>>>>>> Sorry, I didn't follow all the discussions in this thread, but here are
> >>>>>>> some comments, hoping this can help to clarify the MPTCP case.
> >>>>>>
> >>>>>> Thanks a lot for sharing your knowledge, Matthieu!
> >>>>>>
> >>>>>>>
> >>>>>>> On 28/01/2025 11:56, Mikhail Ivanov wrote:
> >>>>>>>> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
> >>>>>>>
> >>>>>>> (...)
> >>>>>>>
> >>>>>>>>> I'm a bit worried that we miss some of these places (now or in future
> >>>>>>>>> kernel versions).  We'll need a new LSM hook for that.
> >>>>>>>>>
> >>>>>>>>> Could you list the current locations?
> >>>>>>>>
> >>>>>>>> Currently, I know only about TCP-related transformations:
> >>>>>>>>
> >>>>>>>> * SMC can fallback to TCP during connection. TCP connection is used
> >>>>>>>>      (1) to exchange CLC control messages in default case and (2)
> >>>>>>>> for the
> >>>>>>>>      communication in the case of fallback. If socket was connected or
> >>>>>>>>      connection failed, socket can not be reconnected again. There
> >>>>>>>> is no
> >>>>>>>>      existing security hook to control the fallback case,
> >>>>>>>>
> >>>>>>>> * MPTCP uses TCP for communication between two network interfaces
> >>>>>>>> in the
> >>>>>>>>      default case and can fallback to plain TCP if remote peer does not
> >>>>>>>>      support MPTCP. AFAICS, there is also no security hook to
> >>>>>>>> control the
> >>>>>>>>      fallback transformation,
> >>>>>>>
> >>>>>>> There are security hooks to control the path creation, but not to
> >>>>>>> control the "fallback transformation".
> >>>>>>>
> >>>>>>> Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
> >>>>>>> socket. This is only used "internally": to communicate between the
> >>>>>>> userspace and the kernelspace, but not directly used between network
> >>>>>>> interfaces. This "external" communication is done via one or multiple
> >>>>>>> kernel TCP sockets carrying extra TCP options for the mapping. The
> >>>>>>> userspace cannot directly control these sockets created by the kernel.
> >>>>>>>
> >>>>>>> In case of fallback, the kernel TCP socket "simply" drop the extra TCP
> >>>>>>> options needed for MPTCP, and carry on like normal TCP. So on the wire
> >>>>>>> and in the Linux network stack, it is the same TCP connection, without
> >>>>>>> the MPTCP options in the TCP header. The userspace continue to
> >>>>>>> communicate with the same socket.
> >>>>>>>
> >>>>>>> I'm not sure if there is a need to block the fallback: it means only
> >>>>>>> one
> >>>>>>> path can be used at a time.
> > 
> > Thanks Matthieu.
> > 
> > So user space needs to specific IPPROTO_MPTCP to use MPTCP, but on the
> > network this socket can translate to "augmented" or plain TCP.
> 
> Correct. On the wire, you will only see packet with the IPPROTO_TCP
> protocol. When MPTCP is used, extra MPTCP options will be present in the
> TCP headers, but the protocol is still IPPROTO_TCP on the network.
> 
> > From Landlock point of view, what matters is to have a consistent policy
> > that maps to user space code.  The fear was that a malicious user space
> > that is only allowed to use MPTCP could still transform an MPTCP socket
> > to a TCP socket, while it wasn't allowed to create a TCP socket in the
> > first place.  I now think this should not be an issue because:
> > 1. MPTCP is kind of a superset of TCP
> > 2. user space legitimately using MPTCP should not get any error related
> >    to a Landlock policy because of TCP/any automatic fallback.  To say
> >    it another way, such fallback is independent of user space requests
> >    and may not be predicted because it is related to the current network
> >    path.  This follows the principle of least astonishment (at least
> >    from user space point of view).
> > 
> > So, if I understand correctly, this should be simple for the Landlock
> > socket creation control:  we only check socket properties at creation
> > time and we ignore potential fallbacks.  This should be documented
> > though.
> 
> It depends on the restrictions that are put in place: are the user and
> kernel sockets treated the same way? If yes, blocking TCP means that
> even if it will be possible for the userspace to create an IPPROTO_MPTCP
> socket, the kernel will not be allowed to IPPROTO_TCP ones to
> communicate with the outside world. So blocking TCP will implicitly
> block MPTCP.
> 
> On the other hand, if only TCP user sockets are blocked, then it will be
> possible to use MPTCP to communicate to any TCP sockets: with an
> IPPROTO_MPTCP socket, it is possible to communicate with any IPPROTO_TCP
> sockets, but without the extra features supported by MPTCP.

Yes, that how Landlock works, it only enforces a security policy defined
by user space on user space.  The kernel on its own is never restricted.

> 
> > As an example, if a Landlock policies only allows MPTCP: socket(...,
> > IPPROTO_MPTCP) should be allowed and any legitimate use of the returned
> > socket (according to MPTCP) should be allowed, including TCP fallback.
> > However, socket(..., IPPROTO_TCP/0), should only be allowed if TCP is
> > explicitly allowed.  This means that we might end up with an MPTCP
> > socket only using TCP, which is OK.
> 
> Would it not be confusing for the person who set the Landlock policies?
> Especially for the ones who had policies to block TCP, and thought they
> were "safe", no?

There are two kind of users for Landlock:
1. developers sandboxing their applications;
2. sysadmins or security experts sandboxing execution environments (e.g.
   with container runtimes, service managers, sandboxing tools...).

It would make sense for developers to allow what their code request,
whatever fallback the kernel might use instead.  In this case, they
should not care about MPTCP being TCP with some flags underneath.
Moreover, developers might not be aware of the system on which their
application is running, and their concern should mainly be about
compatibility.

For security or network experts, implying that allowing MPTCP means that
fallback to TCP is allowed might be a bit surprising at first, but they
should have the knowledge to know how MPTCP works underneath, including
this fallback mechanism.  Moreover, this kind of users can (and should)
also rely on system-wide security policies such as Netfilter, which
give more control.

In a nutshell, Landlock should favor compatibility at the sandboxing/app
layers and we should rely on system-wide security policies (taking into
account the running system's context) for more fine-grained control.
This compatibility behaviors should be explained in the Landlock
documentation though.

> 
> If only TCP is blocked on the userspace side, simply using IPPROTO_MPTCP
> instead of IPPROTO_TCP will allow any users to continue to talk with the
> outside world. Also, it is easy to force apps to use IPPROTO_MPTCP
> instead of IPPROTO_TCP, e.g. using 'mptcpize' which set LD_PRELOAD in
> order to change the parameters of the socket() call.
> 
>    mptcpize run curl https://check.mptcp.dev

Landlock restrictions are enforced at a specific time for a process and
all its future children.  LD_PRELOAD is not an issue because a security
policy cannot be disabled once enforced.  If a sandboxed program uses
MPTCP (because of LD_PRELOAD) instead of TCP, the previously enforced
policy will be enforced the same (either to allow or deny the use of
MPTCP).

The only issue with LD_PRELOAD could be when e.g. curl sandboxes itself
and denies itself the use of MPTCP, whereas mptcpize would "patch" the
curl process to use MPTCP.  In this case, connections would failed.  A
solution would be for mptcpize to "patch" the Landlock security as well,
or for curl to be more permissive.  If the sandboxing happens before
calling mptcpize, or if it is enforced by mptcpize, then it would work
as expected.

> 
> > I guess this should be the same for other protocols, except if user
> > space can explicitly transform a specific socket type to use an
> > *arbitrary* protocol, but I think this is not possible.
> I'm sorry, I don't know what is possible with the other ones. But again,
> blocking both user and kernel sockets the same way might make more sense
> here.
> 
> >>>>>>
> >>>>>> You mean that users always rely on a plain TCP communication in the case
> >>>>>> the connection of MPTCP multipath communication fails?
> >>>>>
> >>>>> Yes, that's the same TCP connection, just without extra bit to be able
> >>>>> to use multiple TCP connections associated to the same MPTCP one.
> >>>>
> >>>> Indeed, so MPTCP communication should be restricted the same way as TCP.
> >>>> AFAICS this should be intuitive for MPTCP users and it'll be better
> >>>> to let userland define this dependency.
> >>>
> >>> Yes, I think that would make more sense.
> >>>
> >>> I guess we can look at MPTCP as TCP with extra features.
> >>
> >> Yeap
> >>
> >>>
> >>> So if TCP is blocked, MPTCP should be blocked as well. (And eventually
> >>> having the possibility to block only TCP but not MPTCP and the opposite,
> >>> but that's a different topic: a possible new feature, but not a bug-fix)
> >> What do you mean by the "bug fix"?
> >>
> >>>
> >>>>>>>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
> >>>>>>>>      IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
> > 
> > According to the man page: "It is allowed only for IPv6 sockets that are
> > connected and bound to a v4-mapped-on-v6 address."
> > 
> > This compatibility feature makes sense from user space point of view and
> > should not result in an error because of Landlock.
> > 
> >>>>>>>>
> >>>>>>>> As I said before, I wonder if user may want to use SMC or MPTCP and
> >>>>>>>> deny
> >>>>>>>> TCP communication, since he should rely on fallback transformation
> >>>>>>>> during the connection in the common case. It may be unexpected for
> >>>>>>>> connect(2) to fail during the fallback due to security politics.
> >>>>>>>
> >>>>>>> With MPTCP, fallbacks can happen at the beginning of a connection, when
> >>>>>>> there is only one path. This is done after the userspace's
> >>>>>>> connect(). If
> > 
> > A remaining question is then, can we repurpose an MPTCP socket that did
> > fallback to TCP, to (re)connect to another destination (this time
> > directly with TCP)?
> 
> If the socket was created with the IPPROTO_MPTCP protocol, the protocol
> will not change after a disconnection. But still, with an MPTCP socket,
> it is by design possible to connect to a TCP one no mater how the socket
> was used before.

OK, this makes sense if we see MPTCP as a superset of TCP.

> 
> > I guess this is possible.  If it is the case, I think it should be OK
> > anyway.  That could be used by an attacker, but that should not give
> > more access because of the MPTCP fallback mechanism anyway.  We should
> > see MPTCP as a superset of TCP.  At the end, security policy is in the
> > hands of user space.
> 
> As long as it is documented and not seen as a regression :)
> 
> To me, it sounds strange to have to add extra rules for MPTCP if TCP is
> blocked, but that's certainly because I see MPTCP like it is seen on the
> wire: as an extension to TCP, not as a different protocol.

I understand.  For Landlock, I'd prefer to not add exceptions according
to protocol implementations, but to define a security policy that could
easily map to user space code.  The current proposal is to map the
Landlock API to (a superset of) the socket(2) API, and then being able
to specify restrictions on a domain, a type, or a protocol.  However, we
could document and encourage users to only specify AF_INET/AF_INET6 +
SOCK_STREAM but without specifying any protocol (not "0" but a wildcard
"(u64)-1"), which would then implicitly allow TCP and MPTCP.

> 
> (...)
> 
> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-30  9:51                                     ` Mickaël Salaün
@ 2025-01-30 10:18                                       ` Matthieu Baerts
  0 siblings, 0 replies; 50+ messages in thread
From: Matthieu Baerts @ 2025-01-30 10:18 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Mikhail Ivanov, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs,
	Paul Moore

Hi Mickaël,

On 30/01/2025 10:51, Mickaël Salaün wrote:
> On Wed, Jan 29, 2025 at 04:44:18PM +0100, Matthieu Baerts wrote:
>> Hi Mickaël,
>>
>> On 29/01/2025 15:51, Mickaël Salaün wrote:
>>> On Wed, Jan 29, 2025 at 02:47:19PM +0300, Mikhail Ivanov wrote:
>>>> On 1/29/2025 2:33 PM, Matthieu Baerts wrote:
>>>>> On 29/01/2025 12:02, Mikhail Ivanov wrote:
>>>>>> On 1/29/2025 1:25 PM, Matthieu Baerts wrote:
>>>>>>> Hi Mikhail,
>>>>>>>
>>>>>>> On 29/01/2025 10:52, Mikhail Ivanov wrote:
>>>>>>>> On 1/28/2025 9:14 PM, Matthieu Baerts wrote:
>>>>>>>>> Hi Mikhail,
>>>>>>>>>
>>>>>>>>> Sorry, I didn't follow all the discussions in this thread, but here are
>>>>>>>>> some comments, hoping this can help to clarify the MPTCP case.
>>>>>>>>
>>>>>>>> Thanks a lot for sharing your knowledge, Matthieu!
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28/01/2025 11:56, Mikhail Ivanov wrote:
>>>>>>>>>> On 1/27/2025 10:48 PM, Mickaël Salaün wrote:
>>>>>>>>>
>>>>>>>>> (...)
>>>>>>>>>
>>>>>>>>>>> I'm a bit worried that we miss some of these places (now or in future
>>>>>>>>>>> kernel versions).  We'll need a new LSM hook for that.
>>>>>>>>>>>
>>>>>>>>>>> Could you list the current locations?
>>>>>>>>>>
>>>>>>>>>> Currently, I know only about TCP-related transformations:
>>>>>>>>>>
>>>>>>>>>> * SMC can fallback to TCP during connection. TCP connection is used
>>>>>>>>>>      (1) to exchange CLC control messages in default case and (2)
>>>>>>>>>> for the
>>>>>>>>>>      communication in the case of fallback. If socket was connected or
>>>>>>>>>>      connection failed, socket can not be reconnected again. There
>>>>>>>>>> is no
>>>>>>>>>>      existing security hook to control the fallback case,
>>>>>>>>>>
>>>>>>>>>> * MPTCP uses TCP for communication between two network interfaces
>>>>>>>>>> in the
>>>>>>>>>>      default case and can fallback to plain TCP if remote peer does not
>>>>>>>>>>      support MPTCP. AFAICS, there is also no security hook to
>>>>>>>>>> control the
>>>>>>>>>>      fallback transformation,
>>>>>>>>>
>>>>>>>>> There are security hooks to control the path creation, but not to
>>>>>>>>> control the "fallback transformation".
>>>>>>>>>
>>>>>>>>> Technically, with MPTCP, the userspace will create an IPPROTO_MPTCP
>>>>>>>>> socket. This is only used "internally": to communicate between the
>>>>>>>>> userspace and the kernelspace, but not directly used between network
>>>>>>>>> interfaces. This "external" communication is done via one or multiple
>>>>>>>>> kernel TCP sockets carrying extra TCP options for the mapping. The
>>>>>>>>> userspace cannot directly control these sockets created by the kernel.
>>>>>>>>>
>>>>>>>>> In case of fallback, the kernel TCP socket "simply" drop the extra TCP
>>>>>>>>> options needed for MPTCP, and carry on like normal TCP. So on the wire
>>>>>>>>> and in the Linux network stack, it is the same TCP connection, without
>>>>>>>>> the MPTCP options in the TCP header. The userspace continue to
>>>>>>>>> communicate with the same socket.
>>>>>>>>>
>>>>>>>>> I'm not sure if there is a need to block the fallback: it means only
>>>>>>>>> one
>>>>>>>>> path can be used at a time.
>>>
>>> Thanks Matthieu.
>>>
>>> So user space needs to specific IPPROTO_MPTCP to use MPTCP, but on the
>>> network this socket can translate to "augmented" or plain TCP.
>>
>> Correct. On the wire, you will only see packet with the IPPROTO_TCP
>> protocol. When MPTCP is used, extra MPTCP options will be present in the
>> TCP headers, but the protocol is still IPPROTO_TCP on the network.
>>
>>> From Landlock point of view, what matters is to have a consistent policy
>>> that maps to user space code.  The fear was that a malicious user space
>>> that is only allowed to use MPTCP could still transform an MPTCP socket
>>> to a TCP socket, while it wasn't allowed to create a TCP socket in the
>>> first place.  I now think this should not be an issue because:
>>> 1. MPTCP is kind of a superset of TCP
>>> 2. user space legitimately using MPTCP should not get any error related
>>>    to a Landlock policy because of TCP/any automatic fallback.  To say
>>>    it another way, such fallback is independent of user space requests
>>>    and may not be predicted because it is related to the current network
>>>    path.  This follows the principle of least astonishment (at least
>>>    from user space point of view).
>>>
>>> So, if I understand correctly, this should be simple for the Landlock
>>> socket creation control:  we only check socket properties at creation
>>> time and we ignore potential fallbacks.  This should be documented
>>> though.
>>
>> It depends on the restrictions that are put in place: are the user and
>> kernel sockets treated the same way? If yes, blocking TCP means that
>> even if it will be possible for the userspace to create an IPPROTO_MPTCP
>> socket, the kernel will not be allowed to IPPROTO_TCP ones to
>> communicate with the outside world. So blocking TCP will implicitly
>> block MPTCP.
>>
>> On the other hand, if only TCP user sockets are blocked, then it will be
>> possible to use MPTCP to communicate to any TCP sockets: with an
>> IPPROTO_MPTCP socket, it is possible to communicate with any IPPROTO_TCP
>> sockets, but without the extra features supported by MPTCP.
> 
> Yes, that how Landlock works, it only enforces a security policy defined
> by user space on user space.  The kernel on its own is never restricted.

OK, thank you, that's clearer.

>>> As an example, if a Landlock policies only allows MPTCP: socket(...,
>>> IPPROTO_MPTCP) should be allowed and any legitimate use of the returned
>>> socket (according to MPTCP) should be allowed, including TCP fallback.
>>> However, socket(..., IPPROTO_TCP/0), should only be allowed if TCP is
>>> explicitly allowed.  This means that we might end up with an MPTCP
>>> socket only using TCP, which is OK.
>>
>> Would it not be confusing for the person who set the Landlock policies?
>> Especially for the ones who had policies to block TCP, and thought they
>> were "safe", no?
> 
> There are two kind of users for Landlock:
> 1. developers sandboxing their applications;
> 2. sysadmins or security experts sandboxing execution environments (e.g.
>    with container runtimes, service managers, sandboxing tools...).
> 
> It would make sense for developers to allow what their code request,
> whatever fallback the kernel might use instead.  In this case, they
> should not care about MPTCP being TCP with some flags underneath.
> Moreover, developers might not be aware of the system on which their
> application is running, and their concern should mainly be about
> compatibility.
> 
> For security or network experts, implying that allowing MPTCP means that
> fallback to TCP is allowed might be a bit surprising at first, but they
> should have the knowledge to know how MPTCP works underneath, including
> this fallback mechanism.  Moreover, this kind of users can (and should)
> also rely on system-wide security policies such as Netfilter, which
> give more control.
> 
> In a nutshell, Landlock should favor compatibility at the sandboxing/app
> layers and we should rely on system-wide security policies (taking into
> account the running system's context) for more fine-grained control.
> This compatibility behaviors should be explained in the Landlock
> documentation though.

Thank you, also clearer!

In my mind, Landlock would be used to get a sort of "jail" so that "any"
users could use it to run untrusted apps for example. In that case, I
was thinking no everybody will know that MPTCP can be used to bypass
some restrictions only applied to TCP sockets.

>> If only TCP is blocked on the userspace side, simply using IPPROTO_MPTCP
>> instead of IPPROTO_TCP will allow any users to continue to talk with the
>> outside world. Also, it is easy to force apps to use IPPROTO_MPTCP
>> instead of IPPROTO_TCP, e.g. using 'mptcpize' which set LD_PRELOAD in
>> order to change the parameters of the socket() call.
>>
>>    mptcpize run curl https://check.mptcp.dev
> 
> Landlock restrictions are enforced at a specific time for a process and
> all its future children.  LD_PRELOAD is not an issue because a security
> policy cannot be disabled once enforced.  If a sandboxed program uses
> MPTCP (because of LD_PRELOAD) instead of TCP, the previously enforced
> policy will be enforced the same (either to allow or deny the use of
> MPTCP).
> 
> The only issue with LD_PRELOAD could be when e.g. curl sandboxes itself
> and denies itself the use of MPTCP, whereas mptcpize would "patch" the
> curl process to use MPTCP.  In this case, connections would failed.  A
> solution would be for mptcpize to "patch" the Landlock security as well,
> or for curl to be more permissive.  If the sandboxing happens before
> calling mptcpize, or if it is enforced by mptcpize, then it would work
> as expected.

OK, it is clearer for me now that I understand apps can sandbox themselves!

>>> I guess this should be the same for other protocols, except if user
>>> space can explicitly transform a specific socket type to use an
>>> *arbitrary* protocol, but I think this is not possible.
>> I'm sorry, I don't know what is possible with the other ones. But again,
>> blocking both user and kernel sockets the same way might make more sense
>> here.
>>
>>>>>>>>
>>>>>>>> You mean that users always rely on a plain TCP communication in the case
>>>>>>>> the connection of MPTCP multipath communication fails?
>>>>>>>
>>>>>>> Yes, that's the same TCP connection, just without extra bit to be able
>>>>>>> to use multiple TCP connections associated to the same MPTCP one.
>>>>>>
>>>>>> Indeed, so MPTCP communication should be restricted the same way as TCP.
>>>>>> AFAICS this should be intuitive for MPTCP users and it'll be better
>>>>>> to let userland define this dependency.
>>>>>
>>>>> Yes, I think that would make more sense.
>>>>>
>>>>> I guess we can look at MPTCP as TCP with extra features.
>>>>
>>>> Yeap
>>>>
>>>>>
>>>>> So if TCP is blocked, MPTCP should be blocked as well. (And eventually
>>>>> having the possibility to block only TCP but not MPTCP and the opposite,
>>>>> but that's a different topic: a possible new feature, but not a bug-fix)
>>>> What do you mean by the "bug fix"?
>>>>
>>>>>
>>>>>>>>>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
>>>>>>>>>>      IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
>>>
>>> According to the man page: "It is allowed only for IPv6 sockets that are
>>> connected and bound to a v4-mapped-on-v6 address."
>>>
>>> This compatibility feature makes sense from user space point of view and
>>> should not result in an error because of Landlock.
>>>
>>>>>>>>>>
>>>>>>>>>> As I said before, I wonder if user may want to use SMC or MPTCP and
>>>>>>>>>> deny
>>>>>>>>>> TCP communication, since he should rely on fallback transformation
>>>>>>>>>> during the connection in the common case. It may be unexpected for
>>>>>>>>>> connect(2) to fail during the fallback due to security politics.
>>>>>>>>>
>>>>>>>>> With MPTCP, fallbacks can happen at the beginning of a connection, when
>>>>>>>>> there is only one path. This is done after the userspace's
>>>>>>>>> connect(). If
>>>
>>> A remaining question is then, can we repurpose an MPTCP socket that did
>>> fallback to TCP, to (re)connect to another destination (this time
>>> directly with TCP)?
>>
>> If the socket was created with the IPPROTO_MPTCP protocol, the protocol
>> will not change after a disconnection. But still, with an MPTCP socket,
>> it is by design possible to connect to a TCP one no mater how the socket
>> was used before.
> 
> OK, this makes sense if we see MPTCP as a superset of TCP.
> 
>>
>>> I guess this is possible.  If it is the case, I think it should be OK
>>> anyway.  That could be used by an attacker, but that should not give
>>> more access because of the MPTCP fallback mechanism anyway.  We should
>>> see MPTCP as a superset of TCP.  At the end, security policy is in the
>>> hands of user space.
>>
>> As long as it is documented and not seen as a regression :)
>>
>> To me, it sounds strange to have to add extra rules for MPTCP if TCP is
>> blocked, but that's certainly because I see MPTCP like it is seen on the
>> wire: as an extension to TCP, not as a different protocol.
> 
> I understand.  For Landlock, I'd prefer to not add exceptions according
> to protocol implementations, but to define a security policy that could
> easily map to user space code.  The current proposal is to map the
> Landlock API to (a superset of) the socket(2) API, and then being able
> to specify restrictions on a domain, a type, or a protocol.  However, we
> could document and encourage users to only specify AF_INET/AF_INET6 +
> SOCK_STREAM but without specifying any protocol (not "0" but a wildcard
> "(u64)-1"), which would then implicitly allow TCP and MPTCP.

Good idea!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction
  2025-01-29 14:51                                 ` Mickaël Salaün
  2025-01-29 15:44                                   ` Matthieu Baerts
@ 2025-01-31 11:04                                   ` Mikhail Ivanov
  1 sibling, 0 replies; 50+ messages in thread
From: Mikhail Ivanov @ 2025-01-31 11:04 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Matthieu Baerts, gnoack, willemdebruijn.kernel, matthieu,
	linux-security-module, netdev, netfilter-devel, yusongping,
	artem.kuzin, konstantin.meskhidze, MPTCP Linux, linux-nfs,
	Paul Moore

On 1/29/2025 5:51 PM, Mickaël Salaün wrote:>>>>>>> On 28/01/2025 11:56, 
Mikhail Ivanov wrote:

[...]

>>>>>>>> * IPv6 -> IPv4 transformation for TCP and UDP sockets withon
>>>>>>>>       IPV6_ADDRFORM. Can be controlled with setsockopt() security hook.
> 
> According to the man page: "It is allowed only for IPv6 sockets that are
> connected and bound to a v4-mapped-on-v6 address."
> 
> This compatibility feature makes sense from user space point of view and
> should not result in an error because of Landlock.

IPV6_ADDRFORM is useful to pass IPv6 sockets binded and connected to
v4-mapped-on-v6 addresses to pure IPv4 applications [1].

I just realized we first need to consider restriction of IPv4 access
for IPv4/v6 dual stack. It's possible to communicate with IPv4 peer
using IPv6 socket (on client or server side) that is mapped on
v4-mapped-on-v6 address (RFC 3493 [2]). If socket access rights provide
separate control over IPv6 and IPv4, v4-mapped-on-v6 looks like possible
bypass of IPv4 restriction and violation of the least astonishment
principle.

This can be controlled with IPV6_V6ONLY socket option or with
net.ipv6.bindv6only sysctl knob. Restriction with sysctl knob is applied
globally and may break some dual-stack dependent applications.

I'm currently trying to collect real-world examples in which user may
want to allow IPv6-only communication in a sandboxed environment.
Theoretically, this can be seen as unprivileged reduction of attack
surface for IPv6-only programs in dual-stack network (disallow to open
IPv4 connections and communicate with loopback via IPv4 stack).

Earlier, it was also discussed about possible security issues on the
userland side related to different address representation and address
filtering [3]. But, I don't really think these are the good examples for
the motivation.

If the v4-mapped-on-v6 addressing control is deemed reasonable, it
should be better implemented with a new access right for
LANDLOCK_RULE_NET_PORT rather than a part of socket creation control.

[1] https://man7.org/linux/man-pages/man7/ipv6.7.html
[2] https://datatracker.ietf.org/doc/html/rfc3493#section-3.7
[3] https://lwn.net/Articles/688462/




^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2025-01-31 11:04 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-17 11:04 [RFC PATCH v2 0/8] Fix non-TCP restriction and inconsistency of TCP errors Mikhail Ivanov
2024-10-17 11:04 ` [RFC PATCH v2 1/8] landlock: Fix non-TCP sockets restriction Mikhail Ivanov
2024-10-17 12:59   ` Matthieu Baerts
2024-10-18 18:08     ` Mickaël Salaün
2024-10-31 16:21       ` Mikhail Ivanov
2024-11-08 17:16         ` David Laight
2024-12-04 19:29           ` Mickaël Salaün
2024-12-12 18:43         ` Mickaël Salaün
2024-12-13 18:19           ` Mikhail Ivanov
2025-01-24 15:02             ` Mickaël Salaün
2025-01-27 12:40               ` Mikhail Ivanov
2025-01-27 19:48                 ` Mickaël Salaün
2025-01-28 10:56                   ` Mikhail Ivanov
2025-01-28 18:14                     ` Matthieu Baerts
2025-01-29  9:52                       ` Mikhail Ivanov
2025-01-29 10:25                         ` Matthieu Baerts
2025-01-29 11:02                           ` Mikhail Ivanov
2025-01-29 11:33                             ` Matthieu Baerts
2025-01-29 11:47                               ` Mikhail Ivanov
2025-01-29 11:57                                 ` Matthieu Baerts
2025-01-29 14:51                                 ` Mickaël Salaün
2025-01-29 15:44                                   ` Matthieu Baerts
2025-01-30  9:51                                     ` Mickaël Salaün
2025-01-30 10:18                                       ` Matthieu Baerts
2025-01-31 11:04                                   ` Mikhail Ivanov
2024-12-04 19:27       ` Mickaël Salaün
2024-12-04 19:35         ` Mickaël Salaün
2024-12-09 10:19           ` Mikhail Ivanov
2024-12-10 18:04             ` Mickaël Salaün
2024-12-10 18:05               ` Mickaël Salaün
2024-12-11 15:24                 ` Mikhail Ivanov
2024-12-12 18:43                   ` Mickaël Salaün
2024-12-13 11:42                     ` Mikhail Ivanov
2024-12-04 19:30   ` Mickaël Salaün
2024-12-09 10:19     ` Mikhail Ivanov
2024-10-17 11:04 ` [RFC PATCH v2 2/8] landlock: Make network stack layer checks explicit for each TCP action Mikhail Ivanov
2024-10-17 11:04 ` [RFC PATCH v2 3/8] landlock: Fix inconsistency of errors for TCP actions Mikhail Ivanov
2024-10-17 11:34   ` Mikhail Ivanov
2024-10-17 12:48   ` Tetsuo Handa
2024-11-06  9:27     ` Mikhail Ivanov
2024-12-04 19:32   ` Mickaël Salaün
2024-10-17 11:04 ` [RFC PATCH v2 4/8] selftests/landlock: Test TCP accesses with protocol=IPPROTO_TCP Mikhail Ivanov
2024-10-17 11:04 ` [RFC PATCH v2 5/8] selftests/landlock: Test that MPTCP actions are not restricted Mikhail Ivanov
2024-10-17 11:04 ` [RFC PATCH v2 6/8] selftests/landlock: Test consistency of errors for TCP actions Mikhail Ivanov
2024-12-10 18:07   ` Mickaël Salaün
2024-12-11 15:29     ` Mikhail Ivanov
2024-10-17 11:04 ` [RFC PATCH v2 7/8] landlock: Add note about errors consistency in documentation Mikhail Ivanov
2024-12-10 18:08   ` Mickaël Salaün
2024-12-11 15:30     ` Mikhail Ivanov
2024-10-17 11:04 ` [RFC PATCH v2 8/8] selftests/landlock: Test that SCTP actions are not restricted Mikhail Ivanov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).