From: Florian Westphal <fw@strlen.de>
To: "Henrik Lindström" <lindstrom515@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: macvtap performs IP defragmentation, causing MTU problems for virtual machines
Date: Mon, 2 Oct 2023 11:20:10 +0200
Message-ID: <20231002092010.GA30843@breakpoint.cc>
In-Reply-To: <CAHkKap3sdN4wZm_euAZEyt3XB4bvr6cV-oAMGtrmrm5Z8biZ_Q@mail.gmail.com>
Henrik Lindström <lindstrom515@gmail.com> wrote:
> I found this old thread describing why macvlan does this:
> https://lore.kernel.org/netdev/4E8C89EE.3090600@candelatech.com/
> Interestingly, the problem described in that thread seems to be more
> general than macvlans, and I can still reproduce it by simply having
> multiple physical interfaces.
> So it looks like macvlans are being special-cased right now, as a
> workaround for a more general defragmentation problem?
Looks like it; maybe Eric remembers the details here.

AFAIU, however, this issue isn't specific to macvlan. It looks like some
people insist that receiving a fragmented multicast packet on n devices
means we should pass n defragmented packets up to the stack (we don't;
ip defrag will discard "duplicates").

There is a vif identifier for l3mdev's sake (it did not exist back then).
It is part of the fragment queue lookup key, so we could use it as a
discriminator for the mcast case: fragments with different vif values
would then be reassembled separately.

Something like this (totally untested):
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -479,11 +479,29 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 	return err;
 }
 
+static int ip_defrag_vif(const struct sk_buff *skb, const struct net_device *dev)
+{
+	int vif = l3mdev_master_ifindex_rcu(dev);
+
+	if (vif)
+		return vif;
+
+	/* some folks insist that receiving a fragmented mcast dgram on n devices shall
+	 * result in n defragmented packets.
+	 */
+	if (skb->pkt_type == PACKET_BROADCAST || skb->pkt_type == PACKET_MULTICAST) {
+		if (dev)
+			vif = dev->ifindex;
+	}
+
+	return vif;
+}
+
 /* Process an incoming IP datagram fragment. */
 int ip_defrag(struct net *net, struct sk_buff *skb, u32 user)
 {
 	struct net_device *dev = skb->dev ? : skb_dst(skb)->dev;
-	int vif = l3mdev_master_ifindex_rcu(dev);
+	int vif = ip_defrag_vif(skb, dev);
 	struct ipq *qp;
 
 	__IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS);
... which should allow removing the macvlan defrag step.
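
To illustrate the intended effect of the discriminator, here is a minimal
user-space sketch in C. It is not kernel code: every type and function name
in it (model_dev, model_frag, model_key, model_defrag_vif, model_same_queue)
is a hypothetical stand-in, and the key is simplified (the real lookup key
also carries the defrag "user"). It only assumes that two fragments land in
the same reassembly queue when all key fields, including vif, match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
enum model_pkt_type { MODEL_HOST, MODEL_BROADCAST, MODEL_MULTICAST };

struct model_dev {
	int ifindex;       /* index of the receiving device */
	int l3mdev_master; /* ifindex of the l3mdev master, 0 if none */
};

struct model_frag {
	uint32_t saddr, daddr;
	uint16_t id;
	uint8_t protocol;
	enum model_pkt_type pkt_type;
	const struct model_dev *dev;
};

/* Simplified reassembly key: fragments merge only if every field matches. */
struct model_key {
	uint32_t saddr, daddr;
	uint16_t id;
	uint8_t protocol;
	int vif;
};

/* Mirrors the proposed selection: l3mdev master first, then, for
 * broadcast/multicast only, the receiving device's own ifindex.
 */
static int model_defrag_vif(const struct model_frag *f)
{
	int vif;

	assert(f != NULL);

	vif = f->dev ? f->dev->l3mdev_master : 0;
	if (vif)
		return vif;

	if (f->pkt_type == MODEL_BROADCAST || f->pkt_type == MODEL_MULTICAST) {
		if (f->dev)
			vif = f->dev->ifindex;
	}

	return vif;
}

static struct model_key model_key_of(const struct model_frag *f)
{
	struct model_key k = {
		.saddr = f->saddr,
		.daddr = f->daddr,
		.id = f->id,
		.protocol = f->protocol,
		.vif = model_defrag_vif(f),
	};

	return k;
}

/* True if both fragments would be fed into the same reassembly queue. */
static bool model_same_queue(const struct model_frag *a,
			     const struct model_frag *b)
{
	struct model_key ka = model_key_of(a);
	struct model_key kb = model_key_of(b);

	return ka.saddr == kb.saddr && ka.daddr == kb.daddr &&
	       ka.id == kb.id && ka.protocol == kb.protocol &&
	       ka.vif == kb.vif;
}
```

In this model, the same multicast fragment seen on two plain devices gets
two different keys (one queue per device, hence n reassembled packets),
while unicast fragments keep vif 0 and share a queue as before. Devices
enslaved to the same l3mdev master also share a queue, since the master's
ifindex takes precedence.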