* [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE @ 2023-04-07 22:21 Dave Pifke 2023-04-08 18:23 ` Pablo Neira Ayuso 2023-04-08 23:09 ` Pablo Neira Ayuso 0 siblings, 2 replies; 7+ messages in thread From: Dave Pifke @ 2023-04-07 22:21 UTC (permalink / raw) To: netfilter-devel Prior to this patch, nft inside a systemd-nspawn container was failing to install my ruleset (which includes a large-ish map), with the error netlink: Error: Could not process rule: Message too long strace reveals: setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted) This is despite the nspawn process supposedly having CAP_NET_ADMIN, and despite /proc/sys/net/core/wmem_max (in the main host namespace) being set larger than the requested size: net.core.wmem_max = 16777216 A web search reveals at least one other user having the same issue: https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/ After this patch, nft succeeds. --- src/mnl.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/mnl.c b/src/mnl.c index 26f943db..ab6750c8 100644 --- a/src/mnl.c +++ b/src/mnl.c @@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl, return; /* Rise sender buffer length to avoid hitting -EMSGSIZE */ + if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF, + &newbuffsiz, sizeof(socklen_t)) == 0) + return; + + /* If the above fails (probably because it exceeds + * /proc/sys/net/core/wmem_max), try again with SO_SNDBUFFORCE. + * This requires CAP_NET_ADMIN. */ if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE, &newbuffsiz, sizeof(socklen_t)) < 0) return; -- 2.20.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE 2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke @ 2023-04-08 18:23 ` Pablo Neira Ayuso 2023-04-08 18:34 ` Dave Pifke 2023-04-08 23:09 ` Pablo Neira Ayuso 1 sibling, 1 reply; 7+ messages in thread From: Pablo Neira Ayuso @ 2023-04-08 18:23 UTC (permalink / raw) To: Dave Pifke; +Cc: netfilter-devel On Fri, Apr 07, 2023 at 04:21:57PM -0600, Dave Pifke wrote: > Prior to this patch, nft inside a systemd-nspawn container was failing > to install my ruleset (which includes a large-ish map), with the error > > netlink: Error: Could not process rule: Message too long > > strace reveals: > > setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted) > > This is despite the nspawn process supposedly having CAP_NET_ADMIN, > and despite /proc/sys/net/core/wmem_max (in the main host namespace) > being set larger than the requested size: > > net.core.wmem_max = 16777216 > > A web search reveals at least one other user having the same issue: > > https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/ > > After this patch, nft succeeds. Patch LGTM. May I add your Signed-off-by: tag to this patch before applying it? Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE 2023-04-08 18:23 ` Pablo Neira Ayuso @ 2023-04-08 18:34 ` Dave Pifke 0 siblings, 0 replies; 7+ messages in thread From: Dave Pifke @ 2023-04-08 18:34 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: netfilter-devel Pablo Neira Ayuso <pablo@netfilter.org> writes: > May I add your Signed-off-by: tag to this patch before applying it? Yes. Thank you! -- Dave Pifke, dave@pifke.org ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE 2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke 2023-04-08 18:23 ` Pablo Neira Ayuso @ 2023-04-08 23:09 ` Pablo Neira Ayuso 2023-04-10 9:04 ` Pablo Neira Ayuso 2023-04-10 18:03 ` Dave Pifke 1 sibling, 2 replies; 7+ messages in thread From: Pablo Neira Ayuso @ 2023-04-08 23:09 UTC (permalink / raw) To: Dave Pifke; +Cc: netfilter-devel Hi again, Let me revisit this. On Fri, Apr 07, 2023 at 04:21:57PM -0600, Dave Pifke wrote: > Prior to this patch, nft inside a systemd-nspawn container was failing > to install my ruleset (which includes a large-ish map), with the error > > netlink: Error: Could not process rule: Message too long > > strace reveals: > > setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted) > > This is despite the nspawn process supposedly having CAP_NET_ADMIN, > and despite /proc/sys/net/core/wmem_max (in the main host namespace) > being set larger than the requested size: > > net.core.wmem_max = 16777216 OK, so you indeed increased net.core.wmem_max on the host namespace. > A web search reveals at least one other user having the same issue: > > https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/ > > After this patch, nft succeeds. > --- > src/mnl.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/src/mnl.c b/src/mnl.c > index 26f943db..ab6750c8 100644 > --- a/src/mnl.c > +++ b/src/mnl.c > @@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl, > return; > > /* Rise sender buffer length to avoid hitting -EMSGSIZE */ > + if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF, > + &newbuffsiz, sizeof(socklen_t)) == 0) > + return; setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is specified by net.core.wmem_max This needs to call: setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF, &newbuffsiz, sizeof(socklen_t)); without checking the return value. Otherwise, SO_SNDBUFFORCE is never going to be called after this patch. This needs a v2. On top of this patch, you still needed to increase net.core.wmem_max in your host container for this to work. > + /* If the above fails (probably because it exceeds > + * /proc/sys/net/core/wmem_max), try again with SO_SNDBUFFORCE. > + * This requires CAP_NET_ADMIN. */ > if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE, > &newbuffsiz, sizeof(socklen_t)) < 0) > return; > -- > 2.20.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE 2023-04-08 23:09 ` Pablo Neira Ayuso @ 2023-04-10 9:04 ` Pablo Neira Ayuso 2023-04-10 18:03 ` Dave Pifke 1 sibling, 0 replies; 7+ messages in thread From: Pablo Neira Ayuso @ 2023-04-10 9:04 UTC (permalink / raw) To: Dave Pifke; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 897 bytes --] Hi, On Sun, Apr 09, 2023 at 01:09:07AM +0200, Pablo Neira Ayuso wrote: > > diff --git a/src/mnl.c b/src/mnl.c > > index 26f943db..ab6750c8 100644 > > --- a/src/mnl.c > > +++ b/src/mnl.c > > @@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl, > > return; > > > > /* Rise sender buffer length to avoid hitting -EMSGSIZE */ > > + if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF, > > + &newbuffsiz, sizeof(socklen_t)) == 0) > > + return; > > setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is > specified by net.core.wmem_max > > This needs to call: > > setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF, > &newbuffsiz, sizeof(socklen_t)); > > without checking the return value. Otherwise, SO_SNDBUFFORCE is never > going to be called after this patch. This needs a v2. I think this patch should be fine. [-- Attachment #2: x.patch --] [-- Type: text/x-diff, Size: 678 bytes --] diff --git a/src/mnl.c b/src/mnl.c index 26f943dbb4c8..ee62c0c9c2a0 100644 --- a/src/mnl.c +++ b/src/mnl.c @@ -261,8 +261,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl, /* Rise sender buffer length to avoid hitting -EMSGSIZE */ if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE, - &newbuffsiz, sizeof(socklen_t)) < 0) - return; + &newbuffsiz, sizeof(socklen_t)) < 0) { + /* Fall back to SO_SNDBUF, this never fails, kernel trims down + * the size to net.core.wmem_max. + */ + setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF, + &newbuffsiz, sizeof(socklen_t)); + } } static unsigned int nlsndbufsiz; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE 2023-04-08 23:09 ` Pablo Neira Ayuso 2023-04-10 9:04 ` Pablo Neira Ayuso @ 2023-04-10 18:03 ` Dave Pifke 2023-04-18 10:10 ` Pablo Neira Ayuso 1 sibling, 1 reply; 7+ messages in thread From: Dave Pifke @ 2023-04-10 18:03 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1121 bytes --] Pablo Neira Ayuso <pablo@netfilter.org> writes: > setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is > specified by net.core.wmem_max Oh, good catch! Your revised patch LGTM, and is closer to what was being done in the immediately proceeding function, mnl_set_rcvbuffer. However, after thinking about it, I feel we should be checking the receiver value after setsockopt returns. If someone is running e.g. AppArmor, it seems better to me to attempt the non-privileged operation first, to avoid adding noise in the logs. Also, I don't think there are any current situations where SO_SNDBUFFORCE might also trim down the value, but after re-reading the man page, I'm not sure the contract precludes that in the future. Attached is a V3 patch for consideration, which also changes the code to attempt the non-privileged SO_RCVBUF before SO_RCVBUFFORCE. I defer to your judgment on which version is actually better; I tested both and they both work a) in a container where SO_SNDBUFFORCE fails, and b) outside a container with wmem_max set to a small-ish value where SO_SNDBUFFORCE is required. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: so-sndbuf-v3.patch --] [-- Type: text/x-diff, Size: 1945 bytes --] diff --git a/src/mnl.c b/src/mnl.c index 26f943db..dcc22b82 100644 --- a/src/mnl.c +++ b/src/mnl.c @@ -259,10 +259,19 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl, if (newbuffsiz <= sndnlbuffsiz) return; - /* Rise sender buffer length to avoid hitting -EMSGSIZE */ - if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE, - &newbuffsiz, sizeof(socklen_t)) < 0) - return; + /* Raise sender buffer length to avoid hitting -EMSGSIZE. The kernel may + * reduce this to /proc/sys/net/core/wmem_max, see socket(7). + */ + sndnlbuffsiz = newbuffsiz; + if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF, + &sndnlbuffsiz, sizeof(socklen_t)) < 0 || sndnlbuffsiz < newbuffsiz) { + /* If SO_SNDBUF failed or the resulting size is still too small, try + * again with SO_SNDBUFFORCE. This requires CAP_NET_ADMIN. + */ + sndnlbuffsiz = newbuffsiz; + setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE, + &sndnlbuffsiz, sizeof(socklen_t)); + } } static unsigned int nlsndbufsiz; @@ -280,14 +289,16 @@ static int mnl_set_rcvbuffer(const struct mnl_socket *nl, socklen_t bufsiz) if (nlsndbufsiz >= bufsiz) return 0; - ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUFFORCE, - &bufsiz, sizeof(socklen_t)); - if (ret < 0) { - /* If this doesn't work, try to reach the system wide maximum - * (or whatever the user requested). + nlsndbufsiz = bufsiz; + ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUF, + &nlsndbufsiz, sizeof(socklen_t)); + if (ret < 0 || nlsndbufsiz < bufsiz) { + /* If this doesn't work, try again with SO_RCVBUFFORCE. This requires + * CAP_NET_ADMIN. */ - ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUF, - &bufsiz, sizeof(socklen_t)); + nlsndbufsiz = bufsiz; + ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUFFORCE, + &nlsndbufsiz, sizeof(socklen_t)); } return ret; [-- Attachment #3: Type: text/plain, Size: 33 bytes --] -- Dave Pifke, dave@pifke.org ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE 2023-04-10 18:03 ` Dave Pifke @ 2023-04-18 10:10 ` Pablo Neira Ayuso 0 siblings, 0 replies; 7+ messages in thread From: Pablo Neira Ayuso @ 2023-04-18 10:10 UTC (permalink / raw) To: Dave Pifke; +Cc: netfilter-devel On Mon, Apr 10, 2023 at 12:03:34PM -0600, Dave Pifke wrote: > Pablo Neira Ayuso <pablo@netfilter.org> writes: > > > setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is > > specified by net.core.wmem_max > > Oh, good catch! Your revised patch LGTM, and is closer to what was > being done in the immediately proceeding function, mnl_set_rcvbuffer. > > However, after thinking about it, I feel we should be checking the > receiver value after setsockopt returns. If someone is running > e.g. AppArmor, it seems better to me to attempt the non-privileged > operation first, to avoid adding noise in the logs. > > Also, I don't think there are any current situations where > SO_SNDBUFFORCE might also trim down the value, but after re-reading the > man page, I'm not sure the contract precludes that in the future. > > Attached is a V3 patch for consideration, which also changes the code to > attempt the non-privileged SO_RCVBUF before SO_RCVBUFFORCE. I defer to > your judgment on which version is actually better; I tested both and > they both work a) in a container where SO_SNDBUFFORCE fails, and b) > outside a container with wmem_max set to a small-ish value where > SO_SNDBUFFORCE is required. Thanks for your patch. setsockopt() does not update the &sndnlbuffsiz that is passed as argument in Linux. I have posted this patch: https://patchwork.ozlabs.org/project/netfilter-devel/patch/20230418100223.158964-1-pablo@netfilter.org/ ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-04-18 10:11 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke 2023-04-08 18:23 ` Pablo Neira Ayuso 2023-04-08 18:34 ` Dave Pifke 2023-04-08 23:09 ` Pablo Neira Ayuso 2023-04-10 9:04 ` Pablo Neira Ayuso 2023-04-10 18:03 ` Dave Pifke 2023-04-18 10:10 ` Pablo Neira Ayuso
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.