* [PATCH net-next v3 0/3] Mitigate the two-reallocations issue for iptunnels
From: Justin Iurman @ 2024-10-28 22:36 UTC
To: netdev
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, linux-kernel,
justin.iurman
v3:
- fix compilation error in seg6_iptunnel
v2:
- add missing "static" keywords in seg6_iptunnel
- use a static-inline function to return the dev overhead (as suggested
by Olek, thanks)
The same pattern is found in ioam6, rpl6, and seg6. Basically, it first
makes sure there is enough room for inserting a new header:
(1) err = skb_cow_head(skb, len + skb->mac_len);
Then, when the insertion (encap or inline) is performed, the input and
output handlers respectively make sure there is enough room for layer 2:
(2) err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
skb_cow_head() does nothing when there is enough room. Otherwise, it
reallocates more room, which depends on the architecture. Briefly,
skb_cow_head() calls __skb_cow() which then calls pskb_expand_head() as
follows:
pskb_expand_head(skb, ALIGN(delta, NET_SKB_PAD), 0, GFP_ATOMIC);
"delta" represents the number of bytes to be added. This value is
aligned with NET_SKB_PAD, which is defined as follows:
NET_SKB_PAD = max(32, L1_CACHE_BYTES)
... where L1_CACHE_BYTES also depends on the architecture. In our case
(x86), it is defined as follows:
L1_CACHE_BYTES = (1 << CONFIG_X86_L1_CACHE_SHIFT)
... where (again, in our case) CONFIG_X86_L1_CACHE_SHIFT equals 6
(=X86_GENERIC).
All this to say, skb_cow_head() would reallocate to the next multiple of
NET_SKB_PAD (in our case a 64-byte multiple) when there is not enough
room.
Back to the main issue with the pattern: in some cases, two
reallocations are triggered, resulting in a performance drop (i.e.,
lines (1) and (2) would both trigger an implicit reallocation). How's
that possible? Well, this is kind of bad luck: it happens when we hit an
exact NET_SKB_PAD boundary and skb->mac_len (=14) is smaller than
LL_RESERVED_SPACE(dst->dev) (=16 in our case). For an x86 arch, it
happens in the following cases (with the default needed_headroom):
- ioam6:
- (inline mode) pre-allocated data trace of 236 or 240 bytes
- (encap mode) pre-allocated data trace of 196 or 200 bytes
- seg6:
- (encap mode) for 13, 17, 21, 25, 29, 33, ...(+4)... prefixes
Let's illustrate the problem, i.e., when we fall on the exact
NET_SKB_PAD boundary. In the case of ioam6, for the above problematic
values, the total overhead is 256 bytes for both modes. Based on line
(1), skb->mac_len (=14) is added, therefore passing 270 bytes to
skb_cow_head(). At that moment, the headroom has 206 bytes available (in
our case). Since 270 > 206, skb_cow_head() performs a reallocation and
the new headroom is now 206 + 64 (NET_SKB_PAD) = 270, which is exactly
the room we asked for. After the insertion, the headroom has no spare
room left (only the 14 bytes accounted for skb->mac_len remain). But
line (2) still needs 16 bytes, which triggers another reallocation.
The same logic is applied to seg6 (although it does not happen with the
inline mode, i.e., -40 bytes). It happens with other L1 cache line sizes
too (the larger the cache line, the less often it happens). For example,
with a 32-byte cache line (instead of 64), the following numbers of
segments would trigger two reallocations: 11, 15, 19, ... With a 128-byte
cache line, the following numbers of segments would trigger two
reallocations: 17, 25, 33, ... And so on. Note that it is the same for
both the "encap" and "l2encap" modes. For the "encap.red" and
"l2encap.red" modes, it is the same logic but with "segs+1" (e.g., 14,
18, 22, 26, etc. for a 64-byte cache line). Note also that it may happen
with rpl6 (based on some calculations), although it did not in our case.
This series provides a solution to mitigate the aforementioned issue for
ioam6, seg6, and rpl6. It provides the dst_entry (in the cache) to
skb_cow_head() **before** the insertion (line (1)). As a result, the
very first iteration would still trigger two reallocations (i.e., empty
cache), while next iterations would only trigger a single reallocation.
Justin Iurman (3):
net: ipv6: ioam6_iptunnel: mitigate 2-realloc issue
net: ipv6: seg6_iptunnel: mitigate 2-realloc issue
net: ipv6: rpl_iptunnel: mitigate 2-realloc issue
net/ipv6/ioam6_iptunnel.c | 90 ++++++++++++++++--------------
net/ipv6/rpl_iptunnel.c | 67 ++++++++++++----------
net/ipv6/seg6_iptunnel.c | 114 ++++++++++++++++++++++----------------
3 files changed, 153 insertions(+), 118 deletions(-)
--
2.34.1
* [PATCH net-next v3 1/3] net: ipv6: ioam6_iptunnel: mitigate 2-realloc issue
From: Justin Iurman @ 2024-10-28 22:36 UTC
To: netdev
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, linux-kernel,
justin.iurman
This patch mitigates the two-reallocations issue with ioam6_iptunnel by
providing the dst_entry (in the cache) to the first call to
skb_cow_head(). As a result, the very first iteration would still
trigger two reallocations (i.e., empty cache), while next iterations
would only trigger a single reallocation.
Performance tests before/after applying this patch, which clearly show
the improvement:
- inline mode:
- before: https://ibb.co/LhQ8V63
- after: https://ibb.co/x5YT2bS
- encap mode:
- before: https://ibb.co/3Cjm5m0
- after: https://ibb.co/TwpsxTC
- encap mode with tunsrc:
- before: https://ibb.co/Gpy9QPg
- after: https://ibb.co/PW1bZFT
This patch also fixes an incorrect behavior: after the insertion, the
second call to skb_cow_head() makes sure that the dev has enough
headroom in the skb for layer 2. In that case, the "old" dst_entry was
used, which is now fixed. After discussing with Paolo, it appears that
both patches can be merged into a single one (this one), for the sake
of readability, and target net-next.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
net/ipv6/ioam6_iptunnel.c | 90 +++++++++++++++++++++------------------
1 file changed, 49 insertions(+), 41 deletions(-)
diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
index beb6b4cfc551..07bfd557e08a 100644
--- a/net/ipv6/ioam6_iptunnel.c
+++ b/net/ipv6/ioam6_iptunnel.c
@@ -254,15 +254,24 @@ static int ioam6_do_fill(struct net *net, struct sk_buff *skb)
return 0;
}
+static inline int dev_overhead(struct dst_entry *dst, struct sk_buff *skb)
+{
+ if (likely(dst))
+ return LL_RESERVED_SPACE(dst->dev);
+
+ return skb->mac_len;
+}
+
static int ioam6_do_inline(struct net *net, struct sk_buff *skb,
- struct ioam6_lwt_encap *tuninfo)
+ struct ioam6_lwt_encap *tuninfo,
+ struct dst_entry *dst)
{
struct ipv6hdr *oldhdr, *hdr;
int hdrlen, err;
hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
- err = skb_cow_head(skb, hdrlen + skb->mac_len);
+ err = skb_cow_head(skb, hdrlen + dev_overhead(dst, skb));
if (unlikely(err))
return err;
@@ -293,16 +302,16 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
struct ioam6_lwt_encap *tuninfo,
bool has_tunsrc,
struct in6_addr *tunsrc,
- struct in6_addr *tundst)
+ struct in6_addr *tundst,
+ struct dst_entry *dst)
{
- struct dst_entry *dst = skb_dst(skb);
struct ipv6hdr *hdr, *inner_hdr;
int hdrlen, len, err;
hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
len = sizeof(*hdr) + hdrlen;
- err = skb_cow_head(skb, len + skb->mac_len);
+ err = skb_cow_head(skb, len + dev_overhead(dst, skb));
if (unlikely(err))
return err;
@@ -326,7 +335,7 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
if (has_tunsrc)
memcpy(&hdr->saddr, tunsrc, sizeof(*tunsrc));
else
- ipv6_dev_get_saddr(net, dst->dev, &hdr->daddr,
+ ipv6_dev_get_saddr(net, skb_dst(skb)->dev, &hdr->daddr,
IPV6_PREFER_SRC_PUBLIC, &hdr->saddr);
skb_postpush_rcsum(skb, hdr, len);
@@ -336,7 +345,7 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
- struct dst_entry *dst = skb_dst(skb);
+ struct dst_entry *dst, *orig_dst = skb_dst(skb);
struct in6_addr orig_daddr;
struct ioam6_lwt *ilwt;
int err = -EINVAL;
@@ -345,7 +354,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
if (skb->protocol != htons(ETH_P_IPV6))
goto drop;
- ilwt = ioam6_lwt_state(dst->lwtstate);
+ ilwt = ioam6_lwt_state(orig_dst->lwtstate);
/* Check for insertion frequency (i.e., "k over n" insertions) */
pkt_cnt = atomic_fetch_inc(&ilwt->pkt_cnt);
@@ -354,6 +363,10 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
orig_daddr = ipv6_hdr(skb)->daddr;
+ local_bh_disable();
+ dst = dst_cache_get(&ilwt->cache);
+ local_bh_enable();
+
switch (ilwt->mode) {
case IOAM6_IPTUNNEL_MODE_INLINE:
do_inline:
@@ -361,7 +374,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP)
goto out;
- err = ioam6_do_inline(net, skb, &ilwt->tuninfo);
+ err = ioam6_do_inline(net, skb, &ilwt->tuninfo, dst);
if (unlikely(err))
goto drop;
@@ -371,7 +384,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
/* Encapsulation (ip6ip6) */
err = ioam6_do_encap(net, skb, &ilwt->tuninfo,
ilwt->has_tunsrc, &ilwt->tunsrc,
- &ilwt->tundst);
+ &ilwt->tundst, dst);
if (unlikely(err))
goto drop;
@@ -389,45 +402,40 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
goto drop;
}
- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
- if (unlikely(err))
- goto drop;
+ if (unlikely(!dst)) {
+ struct ipv6hdr *hdr = ipv6_hdr(skb);
+ struct flowi6 fl6;
+
+ memset(&fl6, 0, sizeof(fl6));
+ fl6.daddr = hdr->daddr;
+ fl6.saddr = hdr->saddr;
+ fl6.flowlabel = ip6_flowinfo(hdr);
+ fl6.flowi6_mark = skb->mark;
+ fl6.flowi6_proto = hdr->nexthdr;
+
+ dst = ip6_route_output(net, NULL, &fl6);
+ if (dst->error) {
+ err = dst->error;
+ dst_release(dst);
+ goto drop;
+ }
- if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
local_bh_disable();
- dst = dst_cache_get(&ilwt->cache);
+ dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr);
local_bh_enable();
- if (unlikely(!dst)) {
- struct ipv6hdr *hdr = ipv6_hdr(skb);
- struct flowi6 fl6;
-
- memset(&fl6, 0, sizeof(fl6));
- fl6.daddr = hdr->daddr;
- fl6.saddr = hdr->saddr;
- fl6.flowlabel = ip6_flowinfo(hdr);
- fl6.flowi6_mark = skb->mark;
- fl6.flowi6_proto = hdr->nexthdr;
-
- dst = ip6_route_output(net, NULL, &fl6);
- if (dst->error) {
- err = dst->error;
- dst_release(dst);
- goto drop;
- }
-
- local_bh_disable();
- dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr);
- local_bh_enable();
- }
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+ goto drop;
+ }
- skb_dst_drop(skb);
- skb_dst_set(skb, dst);
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+ if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr))
return dst_output(net, sk, skb);
- }
out:
- return dst->lwtstate->orig_output(net, sk, skb);
+ return orig_dst->lwtstate->orig_output(net, sk, skb);
drop:
kfree_skb(skb);
return err;
--
2.34.1
* [PATCH net-next v3 2/3] net: ipv6: seg6_iptunnel: mitigate 2-realloc issue
From: Justin Iurman @ 2024-10-28 22:36 UTC
To: netdev
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, linux-kernel,
justin.iurman, David Lebrun
This patch mitigates the two-reallocations issue with seg6_iptunnel by
providing the dst_entry (in the cache) to the first call to
skb_cow_head(). As a result, the very first iteration would still
trigger two reallocations (i.e., empty cache), while next iterations
would only trigger a single reallocation.
Performance tests before/after applying this patch, which clearly show
the improvement:
- before: https://ibb.co/3Cg4sNH
- after: https://ibb.co/8rQ350r
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Cc: David Lebrun <dlebrun@google.com>
---
net/ipv6/seg6_iptunnel.c | 114 ++++++++++++++++++++++-----------------
1 file changed, 66 insertions(+), 48 deletions(-)
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 098632adc9b5..1897e1338bb8 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -124,11 +124,18 @@ static __be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb,
return flowlabel;
}
-/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
-int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
+static inline int dev_overhead(struct dst_entry *dst, struct sk_buff *skb)
+{
+ if (likely(dst))
+ return LL_RESERVED_SPACE(dst->dev);
+
+ return skb->mac_len;
+}
+
+static int __seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
+ int proto, struct dst_entry *dst)
{
- struct dst_entry *dst = skb_dst(skb);
- struct net *net = dev_net(dst->dev);
+ struct net *net = dev_net(skb_dst(skb)->dev);
struct ipv6hdr *hdr, *inner_hdr;
struct ipv6_sr_hdr *isrh;
int hdrlen, tot_len, err;
@@ -137,7 +144,7 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
hdrlen = (osrh->hdrlen + 1) << 3;
tot_len = hdrlen + sizeof(*hdr);
- err = skb_cow_head(skb, tot_len + skb->mac_len);
+ err = skb_cow_head(skb, tot_len + dev_overhead(dst, skb));
if (unlikely(err))
return err;
@@ -181,7 +188,7 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
isrh->nexthdr = proto;
hdr->daddr = isrh->segments[isrh->first_segment];
- set_tun_src(net, dst->dev, &hdr->daddr, &hdr->saddr);
+ set_tun_src(net, skb_dst(skb)->dev, &hdr->daddr, &hdr->saddr);
#ifdef CONFIG_IPV6_SEG6_HMAC
if (sr_has_hmac(isrh)) {
@@ -197,15 +204,21 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
return 0;
}
+
+/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
+int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
+{
+ return __seg6_do_srh_encap(skb, osrh, proto, NULL);
+}
EXPORT_SYMBOL_GPL(seg6_do_srh_encap);
/* encapsulate an IPv6 packet within an outer IPv6 header with reduced SRH */
static int seg6_do_srh_encap_red(struct sk_buff *skb,
- struct ipv6_sr_hdr *osrh, int proto)
+ struct ipv6_sr_hdr *osrh, int proto,
+ struct dst_entry *dst)
{
__u8 first_seg = osrh->first_segment;
- struct dst_entry *dst = skb_dst(skb);
- struct net *net = dev_net(dst->dev);
+ struct net *net = dev_net(skb_dst(skb)->dev);
struct ipv6hdr *hdr, *inner_hdr;
int hdrlen = ipv6_optlen(osrh);
int red_tlv_offset, tlv_offset;
@@ -230,7 +243,7 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb,
tot_len = red_hdrlen + sizeof(struct ipv6hdr);
- err = skb_cow_head(skb, tot_len + skb->mac_len);
+ err = skb_cow_head(skb, tot_len + dev_overhead(dst, skb));
if (unlikely(err))
return err;
@@ -263,7 +276,7 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb,
if (skip_srh) {
hdr->nexthdr = proto;
- set_tun_src(net, dst->dev, &hdr->daddr, &hdr->saddr);
+ set_tun_src(net, skb_dst(skb)->dev, &hdr->daddr, &hdr->saddr);
goto out;
}
@@ -299,7 +312,7 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb,
srcaddr:
isrh->nexthdr = proto;
- set_tun_src(net, dst->dev, &hdr->daddr, &hdr->saddr);
+ set_tun_src(net, skb_dst(skb)->dev, &hdr->daddr, &hdr->saddr);
#ifdef CONFIG_IPV6_SEG6_HMAC
if (unlikely(!skip_srh && sr_has_hmac(isrh))) {
@@ -317,8 +330,8 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb,
return 0;
}
-/* insert an SRH within an IPv6 packet, just after the IPv6 header */
-int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+static int __seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
+ struct dst_entry *dst)
{
struct ipv6hdr *hdr, *oldhdr;
struct ipv6_sr_hdr *isrh;
@@ -326,7 +339,7 @@ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
hdrlen = (osrh->hdrlen + 1) << 3;
- err = skb_cow_head(skb, hdrlen + skb->mac_len);
+ err = skb_cow_head(skb, hdrlen + dev_overhead(dst, skb));
if (unlikely(err))
return err;
@@ -369,22 +382,20 @@ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
return 0;
}
-EXPORT_SYMBOL_GPL(seg6_do_srh_inline);
-static int seg6_do_srh(struct sk_buff *skb)
+static int seg6_do_srh(struct sk_buff *skb, struct dst_entry *dst)
{
- struct dst_entry *dst = skb_dst(skb);
struct seg6_iptunnel_encap *tinfo;
int proto, err = 0;
- tinfo = seg6_encap_lwtunnel(dst->lwtstate);
+ tinfo = seg6_encap_lwtunnel(skb_dst(skb)->lwtstate);
switch (tinfo->mode) {
case SEG6_IPTUN_MODE_INLINE:
if (skb->protocol != htons(ETH_P_IPV6))
return -EINVAL;
- err = seg6_do_srh_inline(skb, tinfo->srh);
+ err = __seg6_do_srh_inline(skb, tinfo->srh, dst);
if (err)
return err;
break;
@@ -402,9 +413,9 @@ static int seg6_do_srh(struct sk_buff *skb)
return -EINVAL;
if (tinfo->mode == SEG6_IPTUN_MODE_ENCAP)
- err = seg6_do_srh_encap(skb, tinfo->srh, proto);
+ err = __seg6_do_srh_encap(skb, tinfo->srh, proto, dst);
else
- err = seg6_do_srh_encap_red(skb, tinfo->srh, proto);
+ err = seg6_do_srh_encap_red(skb, tinfo->srh, proto, dst);
if (err)
return err;
@@ -425,11 +436,11 @@ static int seg6_do_srh(struct sk_buff *skb)
skb_push(skb, skb->mac_len);
if (tinfo->mode == SEG6_IPTUN_MODE_L2ENCAP)
- err = seg6_do_srh_encap(skb, tinfo->srh,
- IPPROTO_ETHERNET);
+ err = __seg6_do_srh_encap(skb, tinfo->srh,
+ IPPROTO_ETHERNET, dst);
else
err = seg6_do_srh_encap_red(skb, tinfo->srh,
- IPPROTO_ETHERNET);
+ IPPROTO_ETHERNET, dst);
if (err)
return err;
@@ -444,6 +455,13 @@ static int seg6_do_srh(struct sk_buff *skb)
return 0;
}
+/* insert an SRH within an IPv6 packet, just after the IPv6 header */
+int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+{
+ return __seg6_do_srh_inline(skb, osrh, NULL);
+}
+EXPORT_SYMBOL_GPL(seg6_do_srh_inline);
+
static int seg6_input_finish(struct net *net, struct sock *sk,
struct sk_buff *skb)
{
@@ -453,36 +471,37 @@ static int seg6_input_finish(struct net *net, struct sock *sk,
static int seg6_input_core(struct net *net, struct sock *sk,
struct sk_buff *skb)
{
- struct dst_entry *orig_dst = skb_dst(skb);
- struct dst_entry *dst = NULL;
+ struct dst_entry *dst;
struct seg6_lwt *slwt;
int err;
- err = seg6_do_srh(skb);
- if (unlikely(err))
- goto drop;
-
- slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
+ slwt = seg6_lwt_lwtunnel(skb_dst(skb)->lwtstate);
local_bh_disable();
dst = dst_cache_get(&slwt->cache);
+ local_bh_enable();
+
+ err = seg6_do_srh(skb, dst);
+ if (unlikely(err))
+ goto drop;
if (!dst) {
ip6_route_input(skb);
dst = skb_dst(skb);
if (!dst->error) {
+ local_bh_disable();
dst_cache_set_ip6(&slwt->cache, dst,
&ipv6_hdr(skb)->saddr);
+ local_bh_enable();
}
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+ goto drop;
} else {
skb_dst_drop(skb);
skb_dst_set(skb, dst);
}
- local_bh_enable();
-
- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
- if (unlikely(err))
- goto drop;
if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
@@ -523,21 +542,20 @@ static int seg6_input(struct sk_buff *skb)
static int seg6_output_core(struct net *net, struct sock *sk,
struct sk_buff *skb)
{
- struct dst_entry *orig_dst = skb_dst(skb);
- struct dst_entry *dst = NULL;
+ struct dst_entry *dst;
struct seg6_lwt *slwt;
int err;
- err = seg6_do_srh(skb);
- if (unlikely(err))
- goto drop;
-
- slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
+ slwt = seg6_lwt_lwtunnel(skb_dst(skb)->lwtstate);
local_bh_disable();
dst = dst_cache_get(&slwt->cache);
local_bh_enable();
+ err = seg6_do_srh(skb, dst);
+ if (unlikely(err))
+ goto drop;
+
if (unlikely(!dst)) {
struct ipv6hdr *hdr = ipv6_hdr(skb);
struct flowi6 fl6;
@@ -559,15 +577,15 @@ static int seg6_output_core(struct net *net, struct sock *sk,
local_bh_disable();
dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
local_bh_enable();
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+ goto drop;
}
skb_dst_drop(skb);
skb_dst_set(skb, dst);
- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
- if (unlikely(err))
- goto drop;
-
if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, skb,
NULL, skb_dst(skb)->dev, dst_output);
--
2.34.1
* [PATCH net-next v3 3/3] net: ipv6: rpl_iptunnel: mitigate 2-realloc issue
From: Justin Iurman @ 2024-10-28 22:36 UTC
To: netdev
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, linux-kernel,
justin.iurman, Alexander Aring
This patch mitigates the two-reallocations issue with rpl_iptunnel by
providing the dst_entry (in the cache) to the first call to
skb_cow_head(). As a result, the very first iteration would still
trigger two reallocations (i.e., empty cache), while next iterations
would only trigger a single reallocation.
Performance tests before/after applying this patch, which clearly show
there is no negative impact (they even show an improvement):
- before: https://ibb.co/nQJhqwc
- after: https://ibb.co/4ZvW6wV
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Cc: Alexander Aring <aahringo@redhat.com>
---
net/ipv6/rpl_iptunnel.c | 67 +++++++++++++++++++++++------------------
1 file changed, 38 insertions(+), 29 deletions(-)
diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
index db3c19a42e1c..c518728460a2 100644
--- a/net/ipv6/rpl_iptunnel.c
+++ b/net/ipv6/rpl_iptunnel.c
@@ -124,8 +124,17 @@ static void rpl_destroy_state(struct lwtunnel_state *lwt)
dst_cache_destroy(&rpl_lwt_lwtunnel(lwt)->cache);
}
+static inline int dev_overhead(struct dst_entry *dst, struct sk_buff *skb)
+{
+ if (likely(dst))
+ return LL_RESERVED_SPACE(dst->dev);
+
+ return skb->mac_len;
+}
+
static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
- const struct ipv6_rpl_sr_hdr *srh)
+ const struct ipv6_rpl_sr_hdr *srh,
+ struct dst_entry *dst)
{
struct ipv6_rpl_sr_hdr *isrh, *csrh;
const struct ipv6hdr *oldhdr;
@@ -153,7 +162,7 @@ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
hdrlen = ((csrh->hdrlen + 1) << 3);
- err = skb_cow_head(skb, hdrlen + skb->mac_len);
+ err = skb_cow_head(skb, hdrlen + dev_overhead(dst, skb));
if (unlikely(err)) {
kfree(buf);
return err;
@@ -186,36 +195,35 @@ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
return 0;
}
-static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt)
+static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt,
+ struct dst_entry *dst)
{
- struct dst_entry *dst = skb_dst(skb);
struct rpl_iptunnel_encap *tinfo;
if (skb->protocol != htons(ETH_P_IPV6))
return -EINVAL;
- tinfo = rpl_encap_lwtunnel(dst->lwtstate);
+ tinfo = rpl_encap_lwtunnel(skb_dst(skb)->lwtstate);
- return rpl_do_srh_inline(skb, rlwt, tinfo->srh);
+ return rpl_do_srh_inline(skb, rlwt, tinfo->srh, dst);
}
static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
- struct dst_entry *orig_dst = skb_dst(skb);
- struct dst_entry *dst = NULL;
+ struct dst_entry *dst;
struct rpl_lwt *rlwt;
int err;
- rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate);
-
- err = rpl_do_srh(skb, rlwt);
- if (unlikely(err))
- goto drop;
+ rlwt = rpl_lwt_lwtunnel(skb_dst(skb)->lwtstate);
local_bh_disable();
dst = dst_cache_get(&rlwt->cache);
local_bh_enable();
+ err = rpl_do_srh(skb, rlwt, dst);
+ if (unlikely(err))
+ goto drop;
+
if (unlikely(!dst)) {
struct ipv6hdr *hdr = ipv6_hdr(skb);
struct flowi6 fl6;
@@ -237,15 +245,15 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
local_bh_disable();
dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
local_bh_enable();
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+ goto drop;
}
skb_dst_drop(skb);
skb_dst_set(skb, dst);
- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
- if (unlikely(err))
- goto drop;
-
return dst_output(net, sk, skb);
drop:
@@ -255,36 +263,37 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
static int rpl_input(struct sk_buff *skb)
{
- struct dst_entry *orig_dst = skb_dst(skb);
- struct dst_entry *dst = NULL;
+ struct dst_entry *dst;
struct rpl_lwt *rlwt;
int err;
- rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate);
-
- err = rpl_do_srh(skb, rlwt);
- if (unlikely(err))
- goto drop;
+ rlwt = rpl_lwt_lwtunnel(skb_dst(skb)->lwtstate);
local_bh_disable();
dst = dst_cache_get(&rlwt->cache);
+ local_bh_enable();
+
+ err = rpl_do_srh(skb, rlwt, dst);
+ if (unlikely(err))
+ goto drop;
if (!dst) {
ip6_route_input(skb);
dst = skb_dst(skb);
if (!dst->error) {
+ local_bh_disable();
dst_cache_set_ip6(&rlwt->cache, dst,
&ipv6_hdr(skb)->saddr);
+ local_bh_enable();
}
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+ goto drop;
} else {
skb_dst_drop(skb);
skb_dst_set(skb, dst);
}
- local_bh_enable();
-
- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
- if (unlikely(err))
- goto drop;
return dst_input(skb);
--
2.34.1
* Re: [PATCH net-next v3 1/3] net: ipv6: ioam6_iptunnel: mitigate 2-realloc issue
From: Vadim Fedorenko @ 2024-10-29 10:58 UTC
To: Justin Iurman
Cc: davem, dsahern, edumazet, kuba, netdev, pabeni, horms,
linux-kernel
On 28/10/2024 22:36, Justin Iurman wrote:
> This patch mitigates the two-reallocations issue with ioam6_iptunnel by
> providing the dst_entry (in the cache) to the first call to
> skb_cow_head(). As a result, the very first iteration would still
> trigger two reallocations (i.e., empty cache), while next iterations
> would only trigger a single reallocation.
>
> Performance tests before/after applying this patch, which clearly shows
> the improvement:
> - inline mode:
> - before: https://ibb.co/LhQ8V63
> - after: https://ibb.co/x5YT2bS
> - encap mode:
> - before: https://ibb.co/3Cjm5m0
> - after: https://ibb.co/TwpsxTC
> - encap mode with tunsrc:
> - before: https://ibb.co/Gpy9QPg
> - after: https://ibb.co/PW1bZFT
>
> This patch also fixes an incorrect behavior: after the insertion, the
> second call to skb_cow_head() makes sure that the dev has enough
> headroom in the skb for layer 2 and stuff. In that case, the "old"
> dst_entry was used, which is now fixed. After discussing with Paolo, it
> appears that both patches can be merged into a single one -this one-
> (for the sake of readability) and target net-next.
>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
> net/ipv6/ioam6_iptunnel.c | 90 +++++++++++++++++++++------------------
> 1 file changed, 49 insertions(+), 41 deletions(-)
>
> diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
> index beb6b4cfc551..07bfd557e08a 100644
> --- a/net/ipv6/ioam6_iptunnel.c
> +++ b/net/ipv6/ioam6_iptunnel.c
> @@ -254,15 +254,24 @@ static int ioam6_do_fill(struct net *net, struct sk_buff *skb)
> return 0;
> }
>
> +static inline int dev_overhead(struct dst_entry *dst, struct sk_buff *skb)
> +{
> + if (likely(dst))
> + return LL_RESERVED_SPACE(dst->dev);
> +
> + return skb->mac_len;
> +}
static inline functions in .c files are not welcome.
Consider moving this helper to a header, probably dev.h or dst.h,
and reusing it in the other tunnels.
And please honor the 24h rule before the next submission.
> static int ioam6_do_inline(struct net *net, struct sk_buff *skb,
> - struct ioam6_lwt_encap *tuninfo)
> + struct ioam6_lwt_encap *tuninfo,
> + struct dst_entry *dst)
> {
> struct ipv6hdr *oldhdr, *hdr;
> int hdrlen, err;
>
> hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
>
> - err = skb_cow_head(skb, hdrlen + skb->mac_len);
> + err = skb_cow_head(skb, hdrlen + dev_overhead(dst, skb));
> if (unlikely(err))
> return err;
>
[.. snip ..]