* [PATCH v3] veth: Drop MTU check when forwarding packets
@ 2024-08-08 7:04 Duan Jiong
2024-08-08 9:23 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 11+ messages in thread
From: Duan Jiong @ 2024-08-08 7:04 UTC (permalink / raw)
To: netdev; +Cc: Duan Jiong
When the MTUs of the two ends of a veth pair differ, there is no
need to check the MTU when forwarding packets; it should be
permissible for a device to receive packets larger than its own
MTU.
Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
---
drivers/net/veth.c | 2 +-
include/linux/netdevice.h | 1 +
net/core/dev.c | 6 ++++++
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 426e68a95067..f505fe2a55c1 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -317,7 +317,7 @@ static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb,
struct veth_rq *rq, bool xdp)
{
- return __dev_forward_skb(dev, skb) ?: xdp ?
+ return __dev_forward_skb_nomtu(dev, skb) ?: xdp ?
veth_xdp_rx(rq, skb) :
__netif_rx(skb);
}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d20c6c99eb88..8cee9b40e50e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3943,6 +3943,7 @@ int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
u8 dev_xdp_prog_count(struct net_device *dev);
u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
+int __dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb);
int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
int dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index e1bb6d7856d9..acd740f78b1c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2190,6 +2190,12 @@ static int __dev_forward_skb2(struct net_device *dev, struct sk_buff *skb,
return ret;
}
+int __dev_forward_skb_nomtu(struct net_device *dev, struct sk_buff *skb)
+{
+ return __dev_forward_skb2(dev, skb, false);
+}
+EXPORT_SYMBOL_GPL(__dev_forward_skb_nomtu);
+
int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
{
return __dev_forward_skb2(dev, skb, true);
--
2.38.1
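For context, the MTU check this patch bypasses lives in __is_skb_forwardable(),
which the __dev_forward_skb2() path invokes with its check_mtu argument. Below
is a simplified sketch paraphrased from include/linux/netdevice.h of this era,
not the verbatim source (details vary across kernel versions):

static __always_inline bool __is_skb_forwardable(const struct net_device *dev,
						 const struct sk_buff *skb,
						 const bool check_mtu)
{
	unsigned int len;

	if (!(dev->flags & IFF_UP))
		return false;

	if (!check_mtu)		/* the path the new _nomtu helper takes */
		return true;

	/* allow for the link-layer header plus one VLAN tag */
	len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
	if (skb->len <= len)
		return true;

	/* GSO packets can still be segmented to fit, so let them through */
	if (skb_is_gso(skb))
		return true;

	return false;
}

With check_mtu == false, only the IFF_UP test remains, so oversized packets
are no longer dropped at the forwarding step.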
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-08 7:04 [PATCH v3] veth: Drop MTU check when forwarding packets Duan Jiong
@ 2024-08-08 9:23 ` Toke Høiland-Jørgensen
2024-08-08 9:55 ` Duan Jiong
0 siblings, 1 reply; 11+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-08-08 9:23 UTC (permalink / raw)
To: Duan Jiong, netdev
Duan Jiong <djduanjiong@gmail.com> writes:
> When the MTUs of the two ends of a veth pair differ, there is no
> need to check the MTU when forwarding packets; it should be
> permissible for a device to receive packets larger than its own
> MTU.
Erm, huh? The MTU check is against the receiving interface, so AFAICT
your patch just disables MTU checking entirely for veth devices, which
seems ill-advised.
What's the issue you are trying to fix, exactly?
-Toke
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-08 9:23 ` Toke Høiland-Jørgensen
@ 2024-08-08 9:55 ` Duan Jiong
2024-08-08 10:21 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 11+ messages in thread
From: Duan Jiong @ 2024-08-08 9:55 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: netdev
Consider a veth device pair with vethA on the host side and vethB on
the container side. When vethA sends data to vethB and vethB is
configured with a smaller MTU, the data is discarded. In my opinion
there is no problem with receiving that data, and I would like it to
be allowed. When vethB sends data, the kernel stack still fragments
it according to vethB's MTU.
On Thu, Aug 8, 2024 at 5:23 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>
> Duan Jiong <djduanjiong@gmail.com> writes:
>
> > When the MTUs of the two ends of a veth pair differ, there is no
> > need to check the MTU when forwarding packets; it should be
> > permissible for a device to receive packets larger than its own
> > MTU.
>
> Erm, huh? The MTU check is against the receiving interface, so AFAICT
> your patch just disables MTU checking entirely for veth devices, which
> seems ill-advised.
>
> What's the issue you are trying to fix, exactly?
>
> -Toke
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-08 9:55 ` Duan Jiong
@ 2024-08-08 10:21 ` Toke Høiland-Jørgensen
[not found] ` <00f872ac-4f59-4857-9c50-2d87ed860d4f@Spark>
0 siblings, 1 reply; 11+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-08-08 10:21 UTC (permalink / raw)
To: Duan Jiong; +Cc: netdev
Duan Jiong <djduanjiong@gmail.com> writes:
> Consider a veth device pair with vethA on the host side and vethB on
> the container side. When vethA sends data to vethB and vethB is
> configured with a smaller MTU, the data is discarded. In my opinion
> there is no problem with receiving that data, and I would like it to
> be allowed. When vethB sends data, the kernel stack still fragments
> it according to vethB's MTU.
Well, yes, if the MTU is too small, the packet will be dropped. That's
what the MTU setting is for :)
If you want to send bigger packets just, y'know, raise the MTU? What is
the reason you can't do that?
-Toke
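For reference, raising the MTU is a one-liner per device with iproute2
(vethA/vethB are the example names from earlier in this thread, and the
namespace name is a placeholder):

# host side
ip link set dev vethA mtu 1600
# container side, run inside its network namespace
ip netns exec <container-ns> ip link set dev vethB mtu 1600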
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
[not found] ` <00f872ac-4f59-4857-9c50-2d87ed860d4f@Spark>
@ 2024-08-08 12:24 ` Toke Høiland-Jørgensen
2024-08-08 19:38 ` Willem de Bruijn
0 siblings, 1 reply; 11+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-08-08 12:24 UTC (permalink / raw)
To: djduanjiong; +Cc: netdev
djduanjiong@gmail.com writes:
> This is similar to a virtual machine that receives packets larger than
> its own MTU, regardless of the MTU configured in the guest. Or, to
> put it another way, what are the negative effects of this change?
Well, it's changing long-standing behaviour (the MTU check has been
there throughout the history of veth). Changing it will mean that
applications that set the MTU and rely on the fact that they will never
receive packets larger than the MTU will potentially break in
interesting ways.
You still haven't answered what's keeping you from setting the MTU
correctly on the veth devices you're using?
-Toke
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-08 12:24 ` Toke Høiland-Jørgensen
@ 2024-08-08 19:38 ` Willem de Bruijn
2024-08-09 9:47 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2024-08-08 19:38 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, djduanjiong; +Cc: netdev
Toke Høiland-Jørgensen wrote:
> djduanjiong@gmail.com writes:
>
> > This is similar to a virtual machine that receives packets larger than
> > its own MTU, regardless of the MTU configured in the guest. Or, to
> > put it another way, what are the negative effects of this change?
>
> Well, it's changing long-standing behaviour (the MTU check has been
> there throughout the history of veth). Changing it will mean that
> applications that set the MTU and rely on the fact that they will never
> receive packets larger than the MTU will potentially break in
> interesting ways.
That this works is very veth specific, though?
In general this max *transfer* unit configuration makes no assurances
on the size of packets arriving. Though posted rx buffer size does,
for which veth has no equivalent.
> You still haven't answered what's keeping you from setting the MTU
> correctly on the veth devices you're using?
Agreed that it has a risk, so some justification is in order. Similar
to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
redirect to ingress") addressed a specific need.
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-08 19:38 ` Willem de Bruijn
@ 2024-08-09 9:47 ` Toke Høiland-Jørgensen
2024-08-12 9:11 ` Duan Jiong
0 siblings, 1 reply; 11+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-08-09 9:47 UTC (permalink / raw)
To: Willem de Bruijn, djduanjiong; +Cc: netdev
Willem de Bruijn <willemdebruijn.kernel@gmail.com> writes:
> Toke Høiland-Jørgensen wrote:
>> djduanjiong@gmail.com writes:
>>
>> > This is similar to a virtual machine that receives packets larger than
>> > its own MTU, regardless of the MTU configured in the guest. Or, to
>> > put it another way, what are the negative effects of this change?
>>
>> Well, it's changing long-standing behaviour (the MTU check has been
>> there throughout the history of veth). Changing it will mean that
>> applications that set the MTU and rely on the fact that they will never
>> receive packets larger than the MTU will potentially break in
>> interesting ways.
>
> That this works is very veth specific, though?
>
> In general this max *transfer* unit configuration makes no assurances
> on the size of packets arriving. Though posted rx buffer size does,
> for which veth has no equivalent.
Well, in practice we do use the MTU to limit the RX size as well. See
for instance the limit on MTU sizes if an XDP program without
multibuffer support is loaded. And I don't think having an asymmetric
MTU setting on a physical point-to-point Ethernet link generally works
either. So in that sense it does make sense that veth has this
limitation, given that it's basically emulating an ethernet wire.
I do see your point that a virtual device doesn't really *have* to
respect MTU, though. So if we were implementing a new driver this
argument would be a lot easier to make. In fact, AFAICT the newer netkit
driver doesn't check the MTU setting before forwarding, so there's
already some inconsistency there.
>> You still haven't answered what's keeping you from setting the MTU
>> correctly on the veth devices you're using?
>
> Agreed that it has a risk, so some justification is in order. Similar
> to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
> redirect to ingress") addressed a specific need.
Exactly :)
And cf the above, using netkit may be an alternative that doesn't carry
this risk (assuming that's compatible with the use case).
-Toke
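If netkit is an option, creating a pair is analogous to veth. The syntax below
is an assumption based on recent iproute2 (it may differ by version), and
nk0/nk1 are placeholder names:

ip link add nk0 type netkit peer name nk1
ip link set nk0 up && ip link set nk1 up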
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-09 9:47 ` Toke Høiland-Jørgensen
@ 2024-08-12 9:11 ` Duan Jiong
2024-08-13 11:40 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 11+ messages in thread
From: Duan Jiong @ 2024-08-12 9:11 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Willem de Bruijn, netdev
Sorry for responding to the email so late.
>
> I do see your point that a virtual device doesn't really *have* to
> respect MTU, though. So if we were implementing a new driver this
> argument would be a lot easier to make. In fact, AFAICT the newer netkit
> driver doesn't check the MTU setting before forwarding, so there's
> already some inconsistency there.
>
> >> You still haven't answered what's keeping you from setting the MTU
> >> correctly on the veth devices you're using?
> >
vm1(mtu 1600)---ovs---ipsec vpn1(mtu 1500)---ipsec vpn2(mtu 1500)---ovs---vm2(mtu 1600)
My scenario is that two VMs communicate through IPsec VPN gateways. The
two VPN gateways are interconnected over the public network, and each
gateway has only one NIC (single-arm mode). The VPN gateway MTU will
generally be 1500, but the packets the VMs send to the gateway may be
larger than 1500, and with the existing veth driver those packets are
discarded. If the gateway were allowed to receive the large packets, it
could ESP-encapsulate and then fragment them, so in the end network
connectivity would not be affected.
> > Agreed that it has a risk, so some justification is in order. Similar
> > to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
> > redirect to ingress") addressed a specific need.
>
> Exactly :)
>
> And cf the above, using netkit may be an alternative that doesn't carry
> this risk (assuming that's compatible with the use case).
>
> -Toke
I can see how there could be a potential risk here; could we consider
adding a switchable option to control this behavior?
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-12 9:11 ` Duan Jiong
@ 2024-08-13 11:40 ` Toke Høiland-Jørgensen
2024-08-14 6:42 ` Duan Jiong
0 siblings, 1 reply; 11+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-08-13 11:40 UTC (permalink / raw)
To: Duan Jiong; +Cc: Willem de Bruijn, netdev
Duan Jiong <djduanjiong@gmail.com> writes:
> Sorry for responding to the email so late.
>
>>
>> I do see your point that a virtual device doesn't really *have* to
>> respect MTU, though. So if we were implementing a new driver this
>> argument would be a lot easier to make. In fact, AFAICT the newer netkit
>> driver doesn't check the MTU setting before forwarding, so there's
>> already some inconsistency there.
>>
>> >> You still haven't answered what's keeping you from setting the MTU
>> >> correctly on the veth devices you're using?
>> >
>
> vm1(mtu 1600)---ovs---ipsec vpn1(mtu 1500)---ipsec vpn2(mtu 1500)---ovs---vm2(mtu 1600)
Where's the veth device in this setup?
> My scenario is that two VMs communicate through IPsec VPN gateways. The
> two VPN gateways are interconnected over the public network, and each
> gateway has only one NIC (single-arm mode). The VPN gateway MTU will
> generally be 1500, but the packets the VMs send to the gateway may be
> larger than 1500, and with the existing veth driver those packets are
> discarded. If the gateway were allowed to receive the large packets, it
> could ESP-encapsulate and then fragment them, so in the end network
> connectivity would not be affected.
I'm not sure I quite get the setup; it sounds like you want a subset of
the traffic to adhere to one MTU, and another subset to adhere to a
different MTU, on the same interface? Could you not divide the traffic
over two different interfaces (with different MTUs) instead?
>> > Agreed that it has a risk, so some justification is in order. Similar
>> > to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
>> > redirect to ingress") addressed a specific need.
>>
>> Exactly :)
>>
>> And cf the above, using netkit may be an alternative that doesn't carry
>> this risk (assuming that's compatible with the use case).
>>
>> -Toke
>
>
> I can see how there could be a potential risk here; could we consider
> adding a switchable option to control this behavior?
Hmm, a toggle has its own cost in terms of complexity and overhead. Plus
it's adding new UAPI. It may be that this is the least bad option in the
end, but before going that route we should be very sure that there's not
another way to solve your problem (cf the above).
This has been discussed before, BTW, most recently five-and-some
years ago:
https://patchwork.ozlabs.org/project/netdev/patch/CAMJ5cBHZ4DqjE6Md-0apA8aaLLk9Hpiypfooo7ud-p9XyFyeng@mail.gmail.com/
-Toke
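To make the shape of such a knob concrete, here is a purely hypothetical
sketch (none of this is existing UAPI; the ignore_rx_mtu flag is invented
for illustration). It keys veth's forward path on a per-device flag and
reuses the helper from this patch:

static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb,
			    struct veth_rq *rq, bool xdp)
{
	struct veth_priv *priv = netdev_priv(dev);
	int ret;

	/* ignore_rx_mtu is a hypothetical per-device toggle, not real UAPI */
	ret = priv->ignore_rx_mtu ?
		__dev_forward_skb_nomtu(dev, skb) :
		__dev_forward_skb(dev, skb);

	return ret ?: (xdp ? veth_xdp_rx(rq, skb) : __netif_rx(skb));
}

Even then, the flag would need rtnetlink or ethtool plumbing plus
documentation, which is exactly the new-UAPI cost weighed above.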
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-13 11:40 ` Toke Høiland-Jørgensen
@ 2024-08-14 6:42 ` Duan Jiong
2024-08-16 21:09 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 11+ messages in thread
From: Duan Jiong @ 2024-08-14 6:42 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Willem de Bruijn, netdev
On Tue, Aug 13, 2024 at 7:40 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>
> Duan Jiong <djduanjiong@gmail.com> writes:
>
> >> >
> >
> > vm1(mtu 1600)---ovs---ipsec vpn1(mtu 1500)---ipsec vpn2(mtu 1500)---ovs---vm2(mtu 1600)
>
> Where's the veth device in this setup?
>
The veth device is used by the IPsec VPN containers to connect to OVS,
and traffic both before and after ESP encapsulation goes through this
NIC.
> > My scenario is that two VMs communicate through IPsec VPN gateways. The
> > two VPN gateways are interconnected over the public network, and each
> > gateway has only one NIC (single-arm mode). The VPN gateway MTU will
> > generally be 1500, but the packets the VMs send to the gateway may be
> > larger than 1500, and with the existing veth driver those packets are
> > discarded. If the gateway were allowed to receive the large packets, it
> > could ESP-encapsulate and then fragment them, so in the end network
> > connectivity would not be affected.
>
> I'm not sure I quite get the setup; it sounds like you want a subset of
> the traffic to adhere to one MTU, and another subset to adhere to a
> different MTU, on the same interface? Could you not divide the traffic
> over two different interfaces (with different MTUs) instead?
>
This is indeed a viable option, but it's not easy to change our own
implementation right now, so we're just seeing if it's feasible to skip
the veth MTU check.
> >> > Agreed that it has a risk, so some justification is in order. Similar
> >> > to how commit 5f7d57280c19 ("bpf: Drop MTU check when doing TC-BPF
> >> > redirect to ingress") addressed a specific need.
> >>
> >> Exactly :)
> >>
> >> And cf the above, using netkit may be an alternative that doesn't carry
> >> this risk (assuming that's compatible with the use case).
> >>
> >> -Toke
> >
> >
> > I can see how there could be a potential risk here; could we consider
> > adding a switchable option to control this behavior?
>
> Hmm, a toggle has its own cost in terms of complexity and overhead. Plus
> it's adding new UAPI. It may be that this is the least bad option in the
> end, but before going that route we should be very sure that there's not
> another way to solve your problem (cf the above).
>
> This has been discussed before, BTW, most recently five-and-some
> years ago:
>
> https://patchwork.ozlabs.org/project/netdev/patch/CAMJ5cBHZ4DqjE6Md-0apA8aaLLk9Hpiypfooo7ud-p9XyFyeng@mail.gmail.com/
>
> -Toke
* Re: [PATCH v3] veth: Drop MTU check when forwarding packets
2024-08-14 6:42 ` Duan Jiong
@ 2024-08-16 21:09 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 11+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-08-16 21:09 UTC (permalink / raw)
To: Duan Jiong; +Cc: Willem de Bruijn, netdev
Duan Jiong <djduanjiong@gmail.com> writes:
> On Tue, Aug 13, 2024 at 7:40 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>>
>> Duan Jiong <djduanjiong@gmail.com> writes:
>>
>> >> >
>> >
>> > vm1(mtu 1600)---ovs---ipsec vpn1(mtu 1500)---ipsec vpn2(mtu 1500)---ovs---vm2(mtu 1600)
>>
>> Where's the veth device in this setup?
>>
>
> The veth device is used by the IPsec VPN containers to connect to OVS,
> and traffic both before and after ESP encapsulation goes through this
> NIC.
>
>
>> > My scenario is that two VMs communicate through IPsec VPN gateways. The
>> > two VPN gateways are interconnected over the public network, and each
>> > gateway has only one NIC (single-arm mode). The VPN gateway MTU will
>> > generally be 1500, but the packets the VMs send to the gateway may be
>> > larger than 1500, and with the existing veth driver those packets are
>> > discarded. If the gateway were allowed to receive the large packets, it
>> > could ESP-encapsulate and then fragment them, so in the end network
>> > connectivity would not be affected.
>>
>> I'm not sure I quite get the setup; it sounds like you want a subset of
>> the traffic to adhere to one MTU, and another subset to adhere to a
>> different MTU, on the same interface? Could you not divide the traffic
>> over two different interfaces (with different MTUs) instead?
>>
>
> This is indeed a viable option, but it's not easy to change our own
> implementation right now, so we're just seeing if it's feasible to skip
> the veth MTU check.
Hmm, well, that basically means asking the kernel community to take on
an extra maintenance burden so that you won't have to do that yourself.
And since there's a workaround, my inclination would be to say that the
use case is not sufficiently compelling that it's worth doing so.
However, I don't feel that strongly about it, and if others think it's
worth adding a knob to disable the MTU check on forward, I won't object.
-Toke