From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, cgroups@vger.kernel.org,
Nadia Pinaeva <n.m.pinaeva@gmail.com>,
Florian Westphal <fw@strlen.de>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 52/63] netfilter: nft_socket: make cgroupsv2 matching work with namespaces
Date: Mon, 16 Sep 2024 13:44:31 +0200 [thread overview]
Message-ID: <20240916114222.900057413@linuxfoundation.org> (raw)
In-Reply-To: <20240916114221.021192667@linuxfoundation.org>
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal <fw@strlen.de>
[ Upstream commit 7f3287db654395f9c5ddd246325ff7889f550286 ]
When running in container environmment, /sys/fs/cgroup/ might not be
the real root node of the sk-attached cgroup.
Example:
In container:
% stat /sys//fs/cgroup/
Device: 0,21 Inode: 2214 ..
% stat /sys/fs/cgroup/foo
Device: 0,21 Inode: 2264 ..
The expectation would be for:
nft add rule .. socket cgroupv2 level 1 "foo" counter
to match traffic from a process that got added to "foo" via
"echo $pid > /sys/fs/cgroup/foo/cgroup.procs".
However, 'level 3' is needed to make this work.
Seen from initial namespace, the complete hierarchy is:
% stat /sys/fs/cgroup/system.slice/docker-.../foo
Device: 0,21 Inode: 2264 ..
i.e. hierarchy is
0 1 2 3
/ -> system.slice -> docker-1... -> foo
... but the container doesn't know that its "/" is the "docker-1.."
cgroup. Current code will retrieve the 'system.slice' cgroup node
and store its kn->id in the destination register, so compare with
2264 ("foo" cgroup id) will not match.
Fetch "/" cgroup from ->init() and add its level to the level we try to
extract. cgroup root-level is 0 for the init-namespace or the level
of the ancestor that is exposed as the cgroup root inside the container.
In the above case, cgrp->level of "/" resolved in the container is 2
(docker-1...scope/) and request for 'level 1' will get adjusted
to fetch the actual level (3).
v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it.
(kernel test robot)
Cc: cgroups@vger.kernel.org
Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2")
Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nft_socket.c | 41 +++++++++++++++++++++++++++++++++++---
1 file changed, 38 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c
index 0f37738e4b26..8722f712c019 100644
--- a/net/netfilter/nft_socket.c
+++ b/net/netfilter/nft_socket.c
@@ -9,7 +9,8 @@
struct nft_socket {
enum nft_socket_keys key:8;
- u8 level;
+ u8 level; /* cgroupv2 level to extract */
+ u8 level_user; /* cgroupv2 level provided by userspace */
u8 len;
union {
u8 dreg;
@@ -53,6 +54,28 @@ nft_sock_get_eval_cgroupv2(u32 *dest, struct sock *sk, const struct nft_pktinfo
memcpy(dest, &cgid, sizeof(u64));
return true;
}
+
+/* process context only, uses current->nsproxy. */
+static noinline int nft_socket_cgroup_subtree_level(void)
+{
+ struct cgroup *cgrp = cgroup_get_from_path("/");
+ int level;
+
+ if (!cgrp)
+ return -ENOENT;
+
+ level = cgrp->level;
+
+ cgroup_put(cgrp);
+
+ if (WARN_ON_ONCE(level > 255))
+ return -ERANGE;
+
+ if (WARN_ON_ONCE(level < 0))
+ return -EINVAL;
+
+ return level;
+}
#endif
static struct sock *nft_socket_do_lookup(const struct nft_pktinfo *pkt)
@@ -174,9 +197,10 @@ static int nft_socket_init(const struct nft_ctx *ctx,
case NFT_SOCKET_MARK:
len = sizeof(u32);
break;
-#ifdef CONFIG_CGROUPS
+#ifdef CONFIG_SOCK_CGROUP_DATA
case NFT_SOCKET_CGROUPV2: {
unsigned int level;
+ int err;
if (!tb[NFTA_SOCKET_LEVEL])
return -EINVAL;
@@ -185,6 +209,17 @@ static int nft_socket_init(const struct nft_ctx *ctx,
if (level > 255)
return -EOPNOTSUPP;
+ err = nft_socket_cgroup_subtree_level();
+ if (err < 0)
+ return err;
+
+ priv->level_user = level;
+
+ level += err;
+ /* Implies a giant cgroup tree */
+ if (WARN_ON_ONCE(level > 255))
+ return -EOPNOTSUPP;
+
priv->level = level;
len = sizeof(u64);
break;
@@ -209,7 +244,7 @@ static int nft_socket_dump(struct sk_buff *skb,
if (nft_dump_register(skb, NFTA_SOCKET_DREG, priv->dreg))
return -1;
if (priv->key == NFT_SOCKET_CGROUPV2 &&
- nla_put_be32(skb, NFTA_SOCKET_LEVEL, htonl(priv->level)))
+ nla_put_be32(skb, NFTA_SOCKET_LEVEL, htonl(priv->level_user)))
return -1;
return 0;
}
--
2.43.0
next prev parent reply other threads:[~2024-09-16 12:01 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-16 11:43 [PATCH 6.1 00/63] 6.1.111-rc1 review Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 01/63] ksmbd: override fsids for share path check Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 02/63] ksmbd: override fsids for smb2_query_info() Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 03/63] usbnet: ipheth: fix carrier detection in modes 1 and 4 Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 04/63] net: ethernet: use ip_hdrlen() instead of bit shift Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 05/63] drm: panel-orientation-quirks: Add quirk for Ayn Loki Zero Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 06/63] drm: panel-orientation-quirks: Add quirk for Ayn Loki Max Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 07/63] net: phy: vitesse: repair vsc73xx autonegotiation Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 08/63] powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 09/63] btrfs: update target inodes ctime on unlink Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 10/63] Input: ads7846 - ratelimit the spi_sync error message Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 11/63] Input: synaptics - enable SMBus for HP Elitebook 840 G2 Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 12/63] HID: multitouch: Add support for GT7868Q Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 13/63] scripts: kconfig: merge_config: config files: add a trailing newline Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 14/63] platform/surface: aggregator_registry: Add Support for Surface Pro 10 Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 15/63] platform/surface: aggregator_registry: Add support for Surface Laptop Go 3 Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 16/63] drm/msm/adreno: Fix error return if missing firmware-name Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 17/63] Input: i8042 - add Fujitsu Lifebook E756 to i8042 quirk table Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 18/63] smb/server: fix return value of smb2_open() Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 19/63] NFSv4: Fix clearing of layout segments in layoutreturn Greg Kroah-Hartman
2024-09-16 11:43 ` [PATCH 6.1 20/63] NFS: Avoid unnecessary rescanning of the per-server delegation list Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 21/63] platform/x86: panasonic-laptop: Fix SINF array out of bounds accesses Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 22/63] platform/x86: panasonic-laptop: Allocate 1 entry extra in the sinf array Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 23/63] mptcp: pm: Fix uaf in __timer_delete_sync Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 24/63] arm64: dts: rockchip: fix eMMC/SPI corruption when audio has been used on RK3399 Puma Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 25/63] arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog " Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 26/63] minmax: reduce min/max macro expansion in atomisp driver Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 27/63] net: tighten bad gso csum offset check in virtio_net_hdr Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 28/63] dm-integrity: fix a race condition when accessing recalc_sector Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 29/63] mm: avoid leaving partial pfn mappings around in error case Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 30/63] net: xilinx: axienet: Fix race in axienet_stop Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 31/63] pmdomain: ti: Add a null pointer check to the omap_prm_domain_init Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 32/63] fs/ntfs3: Use kvfree to free memory allocated by kvmalloc Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 33/63] arm64: dts: rockchip: fix PMIC interrupt pin in pinctrl for ROCK Pi E Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 34/63] eeprom: digsy_mtc: Fix 93xx46 driver probe failure Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 35/63] cxl/core: Fix incorrect vendor debug UUID define Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 36/63] selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 37/63] hwmon: (pmbus) Conditionally clear individual status bits for pmbus rev >= 1.2 Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 38/63] ice: fix accounting for filters shared by multiple VSIs Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 39/63] igb: Always call igb_xdp_ring_update_tail() under Tx lock Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 40/63] net/mlx5: Update the list of the PCI supported devices Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 41/63] net/mlx5e: Add missing link modes to ptys2ethtool_map Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 42/63] net/mlx5: Explicitly set scheduling element and TSAR type Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 43/63] net/mlx5: Add missing masks and QoS bit masks for scheduling elements Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 44/63] net/mlx5: Correct TASR typo into TSAR Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 45/63] net/mlx5: Verify support for scheduling element and TSAR type Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 46/63] net/mlx5: Fix bridge mode operations when there are no VFs Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 47/63] fou: fix initialization of grc Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 48/63] octeontx2-af: Set XOFF on other child transmit schedulers during SMQ flush Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 49/63] octeontx2-af: Modify SMQ flush sequence to drop packets Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 50/63] net: ftgmac100: Enable TX interrupt to avoid TX timeout Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 51/63] netfilter: nft_socket: fix sk refcount leaks Greg Kroah-Hartman
2024-09-16 11:44 ` Greg Kroah-Hartman [this message]
2024-09-16 11:44 ` [PATCH 6.1 53/63] net: dpaa: Pad packets to ETH_ZLEN Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 54/63] spi: nxp-fspi: fix the KASAN report out-of-bounds bug Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 55/63] soundwire: stream: Revert "soundwire: stream: fix programming slave ports for non-continous port maps" Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 56/63] dma-buf: heaps: Fix off-by-one in CMA heap fault handler Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 57/63] drm/amdgpu/atomfirmware: Silence UBSAN warning Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 58/63] spi: geni-qcom: Convert to platform remove callback returning void Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 59/63] spi: geni-qcom: Undo runtime PM changes at driver exit time Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 60/63] spi: geni-qcom: Fix incorrect free_irq() sequence Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 61/63] drm/i915/guc: prevent a possible int overflow in wq offsets Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 62/63] pinctrl: meteorlake: Add Arrow Lake-H/U ACPI ID Greg Kroah-Hartman
2024-09-16 11:44 ` [PATCH 6.1 63/63] ASoC: meson: axg-card: fix use-after-free Greg Kroah-Hartman
2024-09-16 17:28 ` [PATCH 6.1 00/63] 6.1.111-rc1 review Peter Schneider
2024-09-17 9:38 ` Yann Sionneau
2024-09-17 9:56 ` Mark Brown
2024-09-17 14:43 ` Naresh Kamboju
2024-09-18 6:19 ` Greg Kroah-Hartman
2024-09-18 13:56 ` Naresh Kamboju
2024-09-18 12:08 ` Georgi Djakov
2024-09-18 13:54 ` Naresh Kamboju
2024-09-25 15:42 ` Dan Carpenter
2024-10-08 23:43 ` Georgi Djakov
2024-09-17 15:18 ` Jon Hunter
2024-09-17 19:06 ` Pavel Machek
2024-09-17 21:29 ` Florian Fainelli
2024-09-17 22:42 ` Ron Economos
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240916114222.900057413@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=cgroups@vger.kernel.org \
--cc=fw@strlen.de \
--cc=n.m.pinaeva@gmail.com \
--cc=pablo@netfilter.org \
--cc=patches@lists.linux.dev \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox