* Duplicate invocation of NF_INET_POST_ROUTING rule for outbound multicast?
From: Austin Hendrix @ 2024-10-31 7:23 UTC (permalink / raw)
To: netdev
I am trying to use the iptables cgroup match to filter outbound
multicast; the goal is to only allow processes in one net_cls cgroup to
send traffic to a particular multicast address.
These are not my firewall rules, but for the sake of example, we could
allow processes in a cgroup to send to the mDNS multicast group, and
prevent all other processes from doing this. Let's also log those
packets, while we're at it.
-A POSTROUTING -d 224.0.0.251 -j NFLOG
-A POSTROUTING -m cgroup --cgroup 0x4242 -d 224.0.0.251 -j ACCEPT
-A POSTROUTING -d 224.0.0.251 -j DROP
In practice, when I have a single, well-behaved application that tries
to send to this multicast group, I see some very weird things:
* The NFLOG rule is hit twice for every outbound packet (I'll get back to this)
* The cgroup rule is hit once for every outbound packet, and the
packet goes out successfully.
* The DROP rule is _also_ hit once for every outbound packet.
When I use tcpdump to take a capture with nflog, things get
interesting. tcpdump captures two copies of every packet. The FIRST
copy does not have the NFULA_UID and NFULA_GID attributes set; the
second copy does!
I've been staring at the linux source code for a while, and I think
this part of ip_mc_output explains it.
		if (sk_mc_loop(sk)
#ifdef CONFIG_IP_MROUTE
		/* Small optimization: do not loopback not local frames,
		   which returned after forwarding; they will be dropped
		   by ip_mr_input in any case.
		   Note, that local frames are looped back to be delivered
		   to local recipients.
		   This check is duplicated in ip_mr_input at the moment.
		 */
		    &&
		    ((rt->rt_flags & RTCF_LOCAL) ||
		     !(IPCB(skb)->flags & IPSKB_FORWARDED))
#endif
		   ) {
			struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
			if (newskb)
				NF_HOOK(NFPROTO_IPV4, NF_INET_POST_ROUTING,
					net, sk, newskb, NULL, newskb->dev,
					ip_mc_finish_output);
		}
It looks like ip_mc_output duplicates outgoing multicast, sends the
copy through POSTROUTING first (remember how the first copy didn't
have UID and GID?), and then loops that copy back for local multicast
listeners.
I haven't followed all of the details yet, but it looks like the copy
that is looped back lacks the sk_buff attributes which identify the
UID, GID and cgroup of the sender.
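The counters described above can be reproduced with a small userspace
model. This is only a sketch: every toy_* name is hypothetical, the three
rules are hard-coded in the order shown earlier, and it assumes the
looped-back clone carries no owning socket while the original does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for kernel objects; all names here are hypothetical. */
struct toy_sock { uint32_t classid; };          /* net_cls classid of the sender */
struct toy_skb  { const struct toy_sock *sk; }; /* NULL means "no owning socket" */

struct toy_counters { int nflog, cgroup_accept, drop; };

/* Walk the three example rules in order; returns true if the packet passes. */
static bool toy_postrouting(const struct toy_skb *skb, struct toy_counters *c)
{
	c->nflog++;                                  /* -j NFLOG (non-terminating) */
	if (skb->sk && skb->sk->classid == 0x4242) { /* -m cgroup --cgroup 0x4242 */
		c->cgroup_accept++;
		return true;
	}
	c->drop++;                                   /* -j DROP */
	return false;
}

/* Model of the multicast output path: the loopback clone traverses the
 * chain first and has no owner, then the original traverses it. */
static bool toy_mc_output(const struct toy_sock *sk, bool mc_loop,
			  struct toy_counters *c)
{
	struct toy_skb orig = { .sk = sk };

	if (mc_loop) {
		struct toy_skb clone = orig;

		clone.sk = NULL;                     /* assumption: clone is unowned */
		toy_postrouting(&clone, c);
	}
	return toy_postrouting(&orig, c);
}
```

For a sender in classid 0x4242 with loopback enabled, the model counts
two NFLOG hits, one cgroup ACCEPT and one DROP per packet, which is the
pattern I see in the rule counters.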
Is my understanding of this correct? Is the netdev team willing to
discuss possible solutions to this, or is this behavior "by design?"
Thanks
-Austin
* Re: Duplicate invocation of NF_INET_POST_ROUTING rule for outbound multicast?
From: Florian Westphal @ 2024-10-31 21:52 UTC (permalink / raw)
To: Austin Hendrix; +Cc: netdev
Austin Hendrix <namniart@gmail.com> wrote:
> I've been staring at the linux source code for a while, and I think
> this part of ip_mc_output explains it.
>
> 		if (sk_mc_loop(sk)
> #ifdef CONFIG_IP_MROUTE
> 		/* Small optimization: do not loopback not local frames,
> 		   which returned after forwarding; they will be dropped
> 		   by ip_mr_input in any case.
> 		   Note, that local frames are looped back to be delivered
> 		   to local recipients.
>
> 		   This check is duplicated in ip_mr_input at the moment.
> 		 */
> 		    &&
> 		    ((rt->rt_flags & RTCF_LOCAL) ||
> 		     !(IPCB(skb)->flags & IPSKB_FORWARDED))
> #endif
> 		   ) {
> 			struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
> 			if (newskb)
> 				NF_HOOK(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> 					net, sk, newskb, NULL, newskb->dev,
> 					ip_mc_finish_output);
> 		}
>
> It looks like ip_mc_output duplicates outgoing multicast, sends the
> copy through POSTROUTING first (remember how the first copy didn't
> have UID and GID?), and then loops that copy back for local multicast
> listeners.
>
> I haven't followed all of the details yet, but it looks like the copy
> that is looped back lacks the sk_buff attributes which identify the
> UID, GID and cgroup of the sender.
Yes, skb_clone'd skbs are not owned by any socket.
> Is my understanding of this correct? Is the netdev team willing to
> discuss possible solutions to this, or is this behavior "by design?"
It's for historic reasons, this is very old and predates cgroups.
You could try this (untested) patch, ipv6 would need similar treatment.
We'd probably also want to extend this to RTCF_BROADCAST, i.e. add
skb_clone_sk() or similar helper and then use that for these clones.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -396,10 +396,16 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 #endif
 		   ) {
 			struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
-			if (newskb)
+			if (newskb) {
+				struct sock *skb_sk = skb->sk;
+
+				if (skb_sk)
+					skb_set_owner_edemux(newskb, skb_sk);
+
 				NF_HOOK(NFPROTO_IPV4, NF_INET_POST_ROUTING,
 					net, sk, newskb, NULL, newskb->dev,
 					ip_mc_finish_output);
+			}
 		}
 
 		/* Multicasts with ttl 0 must not go beyond the host */
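The ownership idea behind the patch, and behind a combined clone helper,
can be illustrated in userspace. This is a rough model, not kernel code:
all toy_* names are made up, and it assumes that setting an owner takes a
reference on the socket which a destructor drops when the clone is freed.

```c
#include <stddef.h>

/* Userspace model only; none of these are kernel APIs. */
struct toy_sock { int refcnt; unsigned int classid; };
struct toy_skb {
	struct toy_sock *sk;
	void (*destructor)(struct toy_skb *skb);
};

/* Destructor: drop the socket reference the skb was holding. */
static void toy_sock_put(struct toy_skb *skb)
{
	skb->sk->refcnt--;
}

/* Attach an owner to a clone, but only if the socket is still alive. */
static void toy_set_owner(struct toy_skb *skb, struct toy_sock *sk)
{
	if (sk->refcnt == 0)
		return;              /* socket already being torn down */
	sk->refcnt++;
	skb->sk = sk;
	skb->destructor = toy_sock_put;
}

/* Clone as in the unpatched path: the copy starts without an owner. */
static struct toy_skb toy_clone(const struct toy_skb *skb)
{
	struct toy_skb n = *skb;

	n.sk = NULL;
	n.destructor = NULL;
	return n;
}

/* Hypothetical combined helper: clone, then inherit the owner if any. */
static struct toy_skb toy_clone_with_owner(const struct toy_skb *skb)
{
	struct toy_skb n = toy_clone(skb);

	if (skb->sk)
		toy_set_owner(&n, skb->sk);
	return n;
}

/* Release a clone: run the destructor (if any) and clear the owner. */
static void toy_free(struct toy_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->sk = NULL;
	skb->destructor = NULL;
}
```

With an owner attached, a match that looks at the socket (uid, gid,
cgroup) sees the same sender on the clone as on the original, and the
extra reference goes away again once the clone is freed.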
* Re: Duplicate invocation of NF_INET_POST_ROUTING rule for outbound multicast?
From: namniart @ 2024-11-01 1:59 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev
Thanks for the quick reply; I will work through our build process and try to get that tested in the next few days.
I was thinking the fix for this might be more substantial; call NF_HOOK without a callback at the top of ip_mc_output to determine the fate of the packet, and then make and loop back copies of the packet only if the packet passed the postrouting chain. That’d prevent the nf chain from being called multiple times for the exact same packet, could apply to multicast and RTCF_BROADCAST, and would solve the cgroup issue at the same time.
If you think that’s a useful approach, I am willing to write and test the patch.
-Austin
* Re: Duplicate invocation of NF_INET_POST_ROUTING rule for outbound multicast?
From: Florian Westphal @ 2024-11-02 16:12 UTC (permalink / raw)
To: namniart; +Cc: Florian Westphal, netdev
namniart@gmail.com <namniart@gmail.com> wrote:
> Thanks for the quick reply; I will work through our build process and try to get that tested in the next few days.
>
> I was thinking the fix for this might be more substantial; call NF_HOOK without a callback at the top of ip_mc_output to determine the fate of the packet,
The NF_QUEUE verdict can delegate this decision to a userspace process;
if that happens, it's too late to do the clone.
So this would work only if we had a way to remove support for NF_QUEUE
for multicast packets. I don't think this can be done after two
decades.
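To make the ordering problem concrete, here is a toy model of a hook
whose verdict may arrive later from userspace. It is a sketch under
simplifying assumptions: the toy_* names are invented, only one packet
can be parked at a time, and the verdict is passed in directly instead
of coming from a ruleset.

```c
#include <stddef.h>

enum toy_verdict { TOY_ACCEPT, TOY_DROP, TOY_QUEUE };

struct toy_pkt { int id; };
typedef void (*toy_okfn)(struct toy_pkt *pkt);

/* One parked packet is enough for the illustration. */
static struct toy_pkt *toy_parked;
static toy_okfn toy_parked_okfn;
static int toy_delivered;

/* Continuation: what would normally run after the chain accepts. */
static void toy_finish(struct toy_pkt *pkt)
{
	(void)pkt;
	toy_delivered++;
}

/* Returns 1 if the packet passed, -1 if it was dropped, and 0 if it was
 * queued, in which case the caller does not know the outcome yet. */
static int toy_hook(enum toy_verdict v, struct toy_pkt *pkt, toy_okfn okfn)
{
	if (v == TOY_DROP)
		return -1;
	if (v == TOY_QUEUE) {
		toy_parked = pkt;        /* wait for the userspace verdict */
		toy_parked_okfn = okfn;  /* the only way to resume later */
		return 0;
	}
	if (okfn)
		okfn(pkt);
	return 1;
}

/* Userspace answers some time later. Returns 1 if processing resumed. */
static int toy_reinject(enum toy_verdict v)
{
	struct toy_pkt *pkt = toy_parked;
	toy_okfn okfn = toy_parked_okfn;

	toy_parked = NULL;
	toy_parked_okfn = NULL;
	if (!pkt || v != TOY_ACCEPT || !okfn)
		return 0;
	okfn(pkt);
	return 1;
}
```

Calling toy_hook() with TOY_QUEUE and no callback returns 0 before any
decision exists, and a later toy_reinject(TOY_ACCEPT) has nothing to
resume. Passing toy_finish as the callback lets the deferred accept
continue, which is why the clone has to exist before the hook is entered.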