* [PATCH v3] veth: Drop MTU check when forwarding packets
From: Duan Jiong <djduanjiong@gmail.com>
Date: 2024-08-08 7:04 UTC
To: netdev
Cc: Duan Jiong

When the two ends of a veth pair are configured with different MTUs,
there is no need to check the receiving device's MTU when forwarding
packets; accepting packets larger than your own MTU should be
permissible behavior.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
---
 drivers/net/veth.c        | 2 +-
 include/linux/netdevice.h | 1 +
 net/core/dev.c            | 6 ++++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 426e68a95067..f505fe2a55c1 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -317,7 +317,7 @@ static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
 static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb,
 			    struct veth_rq *rq, bool xdp)
 {
-	return __dev_forward_skb(dev, skb) ?: xdp ?
+	return __dev_forward_skb_nomtu(dev, skb) ?: xdp ?
 		veth_xdp_rx(rq, skb) :
 		__netif_rx(skb);
 }

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d20c6c99eb88..8cee9b40e50e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3943,6 +3943,7 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 u8 dev_xdp_prog_count(struct net_device *dev);
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
 
+int __dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb);
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 int dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb);

diff --git a/net/core/dev.c b/net/core/dev.c
index e1bb6d7856d9..acd740f78b1c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2190,6 +2190,12 @@ static int __dev_forward_skb2(struct net_device *dev, struct sk_buff *skb,
 	return ret;
 }
 
+int __dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb)
+{
+	return __dev_forward_skb2(dev, skb, false);
+}
+EXPORT_SYMBOL_GPL(__dev_forward_skb_nomtu);
+
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 {
 	return __dev_forward_skb2(dev, skb, true);
-- 
2.38.1
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Toke Høiland-Jørgensen <toke@kernel.org>
Date: 2024-08-08 9:23 UTC
To: Duan Jiong, netdev

Duan Jiong <djduanjiong@gmail.com> writes:

> When the two ends of a veth pair are configured with different MTUs,
> there is no need to check the receiving device's MTU when forwarding
> packets; accepting packets larger than your own MTU should be
> permissible behavior.

Erm, huh? The MTU check is against the receiving interface, so AFAICT
your patch just disables MTU checking entirely for veth devices, which
seems ill-advised.

What's the issue you are trying to fix, exactly?

-Toke
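For context, the check being discussed here is the one applied by
__dev_forward_skb() via is_skb_forwardable(). A slightly simplified
sketch of that logic, paraphrased from net/core/dev.c rather than
quoted verbatim:

static bool __is_skb_forwardable(const struct net_device *dev,
				 const struct sk_buff *skb,
				 const bool check_mtu)
{
	unsigned int len;

	/* never deliver to a device that is down */
	if (!(dev->flags & IFF_UP))
		return false;

	/* the *_nomtu helpers pass check_mtu == false */
	if (!check_mtu)
		return true;

	/* allow the receiver's MTU plus link headers and a VLAN tag */
	len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
	if (skb->len <= len)
		return true;

	/* GSO packets are segmented later, so don't reject them here */
	if (skb_is_gso(skb))
		return true;

	return false;
}

The check_mtu parameter is exactly what the patch above toggles: the
proposed __dev_forward_skb_nomtu() passes false, so an oversized skb is
no longer rejected at this point.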
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Duan Jiong <djduanjiong@gmail.com>
Date: 2024-08-08 9:55 UTC
To: Toke Høiland-Jørgensen
Cc: netdev

Consider a veth device pair with vethA on the host side and vethB on
the container side. When vethA sends data to vethB and vethB is
configured with a smaller MTU, the data is discarded. In my opinion
there is no problem with receiving that data, and I would like it to
be accepted. When vethB sends data, the kernel stack already segments
it according to vethB's MTU.

On Thu, Aug 8, 2024 at 5:23 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>
> Duan Jiong <djduanjiong@gmail.com> writes:
>
> > When the two ends of a veth pair are configured with different MTUs,
> > there is no need to check the receiving device's MTU when forwarding
> > packets; accepting packets larger than your own MTU should be
> > permissible behavior.
>
> Erm, huh? The MTU check is against the receiving interface, so AFAICT
> your patch just disables MTU checking entirely for veth devices, which
> seems ill-advised.
>
> What's the issue you are trying to fix, exactly?
>
> -Toke
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Toke Høiland-Jørgensen <toke@kernel.org>
Date: 2024-08-08 10:21 UTC
To: Duan Jiong
Cc: netdev

Duan Jiong <djduanjiong@gmail.com> writes:

> Consider a veth device pair with vethA on the host side and vethB on
> the container side. When vethA sends data to vethB and vethB is
> configured with a smaller MTU, the data is discarded. In my opinion
> there is no problem with receiving that data, and I would like it to
> be accepted. When vethB sends data, the kernel stack already segments
> it according to vethB's MTU.

Well, yes, if the MTU is too small, the packet will be dropped. That's
what the MTU setting is for :) If you want to send bigger packets just,
y'know, raise the MTU? What is the reason you can't do that?

-Toke
[parent not found: <00f872ac-4f59-4857-9c50-2d87ed860d4f@Spark>]
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Toke Høiland-Jørgensen <toke@kernel.org>
Date: 2024-08-08 12:24 UTC
To: djduanjiong
Cc: netdev

djduanjiong@gmail.com writes:

> This is similar to a virtual machine, which receives packets larger
> than its own MTU regardless of the MTU configured in the guest. Or,
> to put it another way, what are the negative effects of this change?

Well, it's changing long-standing behaviour (the MTU check has been
there throughout the history of veth). Changing it will mean that
applications that set the MTU, and rely on the fact that they will
never receive packets larger than the MTU, will potentially break in
interesting ways.

You still haven't answered what's keeping you from setting the MTU
correctly on the veth devices you're using?

-Toke
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2024-08-08 19:38 UTC
To: Toke Høiland-Jørgensen, djduanjiong
Cc: netdev

Toke Høiland-Jørgensen wrote:
> djduanjiong@gmail.com writes:
>
> > This is similar to a virtual machine, which receives packets larger
> > than its own MTU regardless of the MTU configured in the guest. Or,
> > to put it another way, what are the negative effects of this change?
>
> Well, it's changing long-standing behaviour (the MTU check has been
> there throughout the history of veth). Changing it will mean that
> applications that set the MTU, and rely on the fact that they will
> never receive packets larger than the MTU, will potentially break in
> interesting ways.

That this works is very veth specific, though?

In general this max *transfer* unit configuration makes no assurances
about the size of packets arriving. Though posted rx buffer size does,
for which veth has no equivalent.

> You still haven't answered what's keeping you from setting the MTU
> correctly on the veth devices you're using?

Agreed that it has a risk, so some justification is in order. Similar
to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
redirect to ingress") addressed a specific need.
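For context, commit 5f7d57280c19 is also the change that introduced the
check_mtu parameter of __dev_forward_skb2() that this patch builds on,
and the kernel already carries an MTU-less forwarding helper for that
BPF path. Roughly, paraphrased from net/core/dev.c (not verbatim):

/* The default forwarding helper keeps the receiver-side MTU check... */
int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
{
	return __dev_forward_skb(dev, skb) ?: netif_rx_internal(skb);
}

/* ...while the BPF redirect-to-ingress path opts out of it. */
int dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb)
{
	return __dev_forward_skb2(dev, skb, false) ?: netif_rx_internal(skb);
}

The patch in this thread effectively extends that same opt-out from the
one BPF redirect path to every packet a veth device forwards.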
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Toke Høiland-Jørgensen <toke@kernel.org>
Date: 2024-08-09 9:47 UTC
To: Willem de Bruijn, djduanjiong
Cc: netdev

Willem de Bruijn <willemdebruijn.kernel@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> djduanjiong@gmail.com writes:
>>
>> > This is similar to a virtual machine, which receives packets larger
>> > than its own MTU regardless of the MTU configured in the guest. Or,
>> > to put it another way, what are the negative effects of this change?
>>
>> Well, it's changing long-standing behaviour (the MTU check has been
>> there throughout the history of veth). Changing it will mean that
>> applications that set the MTU, and rely on the fact that they will
>> never receive packets larger than the MTU, will potentially break in
>> interesting ways.
>
> That this works is very veth specific, though?
>
> In general this max *transfer* unit configuration makes no assurances
> about the size of packets arriving. Though posted rx buffer size does,
> for which veth has no equivalent.

Well, in practice we do use the MTU to limit the RX size as well. See
for instance the limit on MTU sizes if an XDP program without
multi-buffer support is loaded. And I don't think having an asymmetric
MTU setting on a physical point-to-point Ethernet link generally works
either. So in that sense it does make sense that veth has this
limitation, given that it's basically emulating an Ethernet wire.

I do see your point that a virtual device doesn't really *have* to
respect MTU, though. So if we were implementing a new driver this
argument would be a lot easier to make. In fact, AFAICT the newer netkit
driver doesn't check the MTU setting before forwarding, so there's
already some inconsistency there.

>> You still haven't answered what's keeping you from setting the MTU
>> correctly on the veth devices you're using?
>
> Agreed that it has a risk, so some justification is in order. Similar
> to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
> redirect to ingress") addressed a specific need.

Exactly :)

And cf the above, using netkit may be an alternative that doesn't carry
this risk (assuming that's compatible with the use case).

-Toke
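The XDP limit referred to above is enforced at program-attach time; a
paraphrased fragment of the relevant check inside veth_xdp_set() in
drivers/net/veth.c, a sketch rather than verbatim kernel source:

	/* A program that cannot handle multi-buffer frames caps the
	 * peer MTU at what fits in a single page of receive buffer.
	 */
	max_mtu = SKB_WITH_OVERHEAD(PAGE_SIZE - VETH_XDP_HEADROOM) -
		  peer->hard_header_len;

	/* Programs that support XDP fragments may go beyond one page. */
	if (prog->aux->xdp_has_frags)
		max_mtu += PAGE_SIZE * MAX_SKB_FRAGS;

	if (peer->mtu > max_mtu) {
		NL_SET_ERR_MSG_MOD(extack, "Peer MTU is too large to set XDP");
		goto err;
	}

So here the MTU does act as a receive-side bound: it is what guarantees
an attached single-buffer XDP program never sees a frame larger than
its buffer.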
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Duan Jiong <djduanjiong@gmail.com>
Date: 2024-08-12 9:11 UTC
To: Toke Høiland-Jørgensen
Cc: Willem de Bruijn, netdev

Sorry for responding to the email so late.

> I do see your point that a virtual device doesn't really *have* to
> respect MTU, though. So if we were implementing a new driver this
> argument would be a lot easier to make. In fact, AFAICT the newer netkit
> driver doesn't check the MTU setting before forwarding, so there's
> already some inconsistency there.
>
> >> You still haven't answered what's keeping you from setting the MTU
> >> correctly on the veth devices you're using?

vm1 (mtu 1600) --- ovs --- ipsec vpn1 (mtu 1500) --- ipsec vpn2 (mtu 1500) --- ovs --- vm2 (mtu 1600)

My scenario is two VMs communicating through IPsec VPN gateways. The
two VPN gateways are interconnected over the public network, and each
gateway has only one NIC, in single-arm mode. The VPN gateway MTU will
generally be 1500, but the packets the VMs send to the gateway may be
larger than 1500, and with the existing veth driver those packets are
discarded. If the gateway were allowed to receive the large packets, it
could ESP-encapsulate them and then fragment the result, so in the end
network connectivity would not be affected.

> > Agreed that it has a risk, so some justification is in order. Similar
> > to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
> > redirect to ingress") addressed a specific need.
>
> Exactly :)
>
> And cf the above, using netkit may be an alternative that doesn't carry
> this risk (assuming that's compatible with the use case).
>
> -Toke

I can see how there could be a potential risk here; could we consider
adding a switchable option to control this behavior?
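A purely hypothetical sketch of what such a toggle could look like,
building on the helper introduced by the patch; the ignore_rx_mtu field
and its plumbing are invented here for illustration and exist in no
tree:

/* Hypothetical: ignore_rx_mtu is NOT a real field of struct veth_priv;
 * it stands in for whatever opt-in knob (netlink attribute, ethtool
 * flag, etc.) would be chosen if this route were taken.
 */
static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb,
			    struct veth_rq *rq, bool xdp)
{
	struct veth_priv *priv = netdev_priv(dev);
	int ret;

	/* Skip the receiver-side MTU check only when the per-device
	 * opt-in flag is set; the default behaviour is unchanged.
	 */
	ret = READ_ONCE(priv->ignore_rx_mtu) ?
		__dev_forward_skb_nomtu(dev, skb) :
		__dev_forward_skb(dev, skb);

	return ret ?: (xdp ? veth_xdp_rx(rq, skb) : __netif_rx(skb));
}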
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Toke Høiland-Jørgensen <toke@kernel.org>
Date: 2024-08-13 11:40 UTC
To: Duan Jiong
Cc: Willem de Bruijn, netdev

Duan Jiong <djduanjiong@gmail.com> writes:

> Sorry for responding to the email so late.
>
> vm1 (mtu 1600) --- ovs --- ipsec vpn1 (mtu 1500) --- ipsec vpn2 (mtu 1500) --- ovs --- vm2 (mtu 1600)

Where's the veth device in this setup?

> My scenario is two VMs communicating through IPsec VPN gateways. The
> two VPN gateways are interconnected over the public network, and each
> gateway has only one NIC, in single-arm mode. The VPN gateway MTU will
> generally be 1500, but the packets the VMs send to the gateway may be
> larger than 1500, and with the existing veth driver those packets are
> discarded. If the gateway were allowed to receive the large packets,
> it could ESP-encapsulate them and then fragment the result, so in the
> end network connectivity would not be affected.

I'm not sure I quite get the setup; it sounds like you want a subset of
the traffic to adhere to one MTU, and another subset to adhere to a
different MTU, on the same interface? Could you not divide the traffic
over two different interfaces (with different MTUs) instead?

> I can see how there could be a potential risk here; could we consider
> adding a switchable option to control this behavior?

Hmm, a toggle has its own cost in terms of complexity and overhead. Plus
it's adding new UAPI. It may be that this is the least bad option in the
end, but before going that route we should be very sure that there's not
another way to solve your problem (cf the above).

This has been discussed before, BTW, most recently five-and-some years
ago:

https://patchwork.ozlabs.org/project/netdev/patch/CAMJ5cBHZ4DqjE6Md-0apA8aaLLk9Hpiypfooo7ud-p9XyFyeng@mail.gmail.com/

-Toke
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Duan Jiong <djduanjiong@gmail.com>
Date: 2024-08-14 6:42 UTC
To: Toke Høiland-Jørgensen
Cc: Willem de Bruijn, netdev

On Tue, Aug 13, 2024 at 7:40 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>
> Duan Jiong <djduanjiong@gmail.com> writes:
>
> > vm1 (mtu 1600) --- ovs --- ipsec vpn1 (mtu 1500) --- ipsec vpn2 (mtu 1500) --- ovs --- vm2 (mtu 1600)
>
> Where's the veth device in this setup?

The veth device is used by the IPsec VPN containers to connect to OVS;
traffic both before and after ESP encapsulation goes through this NIC.

> > My scenario is two VMs communicating through IPsec VPN gateways. The
> > two VPN gateways are interconnected over the public network, and
> > each gateway has only one NIC, in single-arm mode. The VPN gateway
> > MTU will generally be 1500, but the packets the VMs send to the
> > gateway may be larger than 1500, and with the existing veth driver
> > those packets are discarded. If the gateway were allowed to receive
> > the large packets, it could ESP-encapsulate them and then fragment
> > the result, so in the end network connectivity would not be affected.
>
> I'm not sure I quite get the setup; it sounds like you want a subset of
> the traffic to adhere to one MTU, and another subset to adhere to a
> different MTU, on the same interface? Could you not divide the traffic
> over two different interfaces (with different MTUs) instead?

This is indeed a viable option, but it's not easy to change our own
implementation right now, so we're just seeing whether it's feasible to
skip the veth MTU check.

> Hmm, a toggle has its own cost in terms of complexity and overhead. Plus
> it's adding new UAPI. It may be that this is the least bad option in the
> end, but before going that route we should be very sure that there's not
> another way to solve your problem (cf the above).
>
> This has been discussed before, BTW, most recently five-and-some years
> ago:
>
> https://patchwork.ozlabs.org/project/netdev/patch/CAMJ5cBHZ4DqjE6Md-0apA8aaLLk9Hpiypfooo7ud-p9XyFyeng@mail.gmail.com/
>
> -Toke
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
From: Toke Høiland-Jørgensen <toke@kernel.org>
Date: 2024-08-16 21:09 UTC
To: Duan Jiong
Cc: Willem de Bruijn, netdev

Duan Jiong <djduanjiong@gmail.com> writes:

> On Tue, Aug 13, 2024 at 7:40 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>>
>> I'm not sure I quite get the setup; it sounds like you want a subset
>> of the traffic to adhere to one MTU, and another subset to adhere to
>> a different MTU, on the same interface? Could you not divide the
>> traffic over two different interfaces (with different MTUs) instead?
>
> This is indeed a viable option, but it's not easy to change our own
> implementation right now, so we're just seeing whether it's feasible
> to skip the veth MTU check.

Hmm, well, that basically means asking the kernel community to take on
an extra maintenance burden so that you won't have to do that yourself.
And since there's a workaround, my inclination would be to say that the
use case is not sufficiently compelling that it's worth doing so.

However, I don't feel that strongly about it, and if others think it's
worth adding a knob to disable the MTU check on forward, I won't
object.

-Toke
Thread overview: 11+ messages
2024-08-08 7:04 [PATCH v3] veth: Drop MTU check when forwarding packets Duan Jiong
2024-08-08 9:23 ` Toke Høiland-Jørgensen
2024-08-08 9:55 ` Duan Jiong
2024-08-08 10:21 ` Toke Høiland-Jørgensen
[not found] ` <00f872ac-4f59-4857-9c50-2d87ed860d4f@Spark>
2024-08-08 12:24 ` Toke Høiland-Jørgensen
2024-08-08 19:38 ` Willem de Bruijn
2024-08-09 9:47 ` Toke Høiland-Jørgensen
2024-08-12 9:11 ` Duan Jiong
2024-08-13 11:40 ` Toke Høiland-Jørgensen
2024-08-14 6:42 ` Duan Jiong
2024-08-16 21:09 ` Toke Høiland-Jørgensen