* [RFC net-next v3 1/2] IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
@ 2022-11-29 20:06 Coco Li
2022-11-29 20:06 ` [RFC net-next v3 2/2] bnxt: Use generic HBH removal helper in tx path Coco Li
0 siblings, 1 reply; 5+ messages in thread
From: Coco Li @ 2022-11-29 20:06 UTC (permalink / raw)
To: David S. Miller, Hideaki YOSHIFUJI, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Michael Chan
Cc: netdev, Daisuke Nishimura, linux-kernel, Coco Li
IPv6/TCP and GRO stacks can build big TCP packets with an added
temporary Hop By Hop header.
If GSO is not involved, then the temporary header needs to be removed in
the driver. This patch provides a generic helper for drivers that need
to modify their headers in place.
Signed-off-by: Coco Li <lixiaoyan@google.com>
---
include/net/ipv6.h | 35 +++++++++++++++++++++++++++++++++++
net/ipv6/ip6_offload.c | 27 ++++-----------------------
2 files changed, 39 insertions(+), 23 deletions(-)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index d383c895592a..08adec74f067 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -500,6 +500,41 @@ static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
return jhdr->nexthdr;
}
+/* Return 0 if HBH header is successfully removed
+ * Or if HBH removal is unnecessary (packet is not big TCP)
+ * Return error to indicate dropping the packet
+ */
+static inline int ipv6_hopopt_jumbo_remove(struct sk_buff *skb)
+{
+ const int hophdr_len = sizeof(struct hop_jumbo_hdr);
+ int nexthdr = ipv6_has_hopopt_jumbo(skb);
+ struct ipv6hdr *h6;
+
+ if (!nexthdr)
+ return 0;
+
+ if (skb_cow_head(skb, 0))
+ return -1;
+
+ /* Remove the HBH header.
+ * Layout: [Ethernet header][IPv6 header][HBH][L4 Header]
+ */
+ memmove(skb_mac_header(skb) + hophdr_len, skb_mac_header(skb),
+ skb_network_header(skb) - skb_mac_header(skb) +
+ sizeof(struct ipv6hdr));
+
+ if (unlikely(!pskb_may_pull(skb, hophdr_len)))
+ return -1;
+
+ skb->network_header += hophdr_len;
+ skb->mac_header += hophdr_len;
+
+ h6 = ipv6_hdr(skb);
+ h6->nexthdr = nexthdr;
+
+ return 0;
+}
+
static inline bool ipv6_accept_ra(struct inet6_dev *idev)
{
/* If forwarding is enabled, RA are not accepted unless the special
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 3ee345672849..00dc2e3b0184 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -77,7 +77,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
struct sk_buff *segs = ERR_PTR(-EINVAL);
struct ipv6hdr *ipv6h;
const struct net_offload *ops;
- int proto, nexthdr;
+ int proto, err;
struct frag_hdr *fptr;
unsigned int payload_len;
u8 *prevhdr;
@@ -87,28 +87,9 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
bool gso_partial;
skb_reset_network_header(skb);
- nexthdr = ipv6_has_hopopt_jumbo(skb);
- if (nexthdr) {
- const int hophdr_len = sizeof(struct hop_jumbo_hdr);
- int err;
-
- err = skb_cow_head(skb, 0);
- if (err < 0)
- return ERR_PTR(err);
-
- /* remove the HBH header.
- * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
- */
- memmove(skb_mac_header(skb) + hophdr_len,
- skb_mac_header(skb),
- ETH_HLEN + sizeof(struct ipv6hdr));
- skb->data += hophdr_len;
- skb->len -= hophdr_len;
- skb->network_header += hophdr_len;
- skb->mac_header += hophdr_len;
- ipv6h = (struct ipv6hdr *)skb->data;
- ipv6h->nexthdr = nexthdr;
- }
+ err = ipv6_hopopt_jumbo_remove(skb);
+ if (err)
+ return ERR_PTR(err);
nhoff = skb_network_header(skb) - skb_mac_header(skb);
if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
goto out;
--
2.38.1.584.g0f3c55d4c2-goog
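For readers who want to see the byte movement outside the kernel, here is a minimal userspace sketch of the same idea: shift the Ethernet and IPv6 headers forward over the 8-byte hop-by-hop jumbo header and restore the next-header field. It is only an illustration under stated assumptions. The function name strip_hbh_jumbo, the fixed 14-byte Ethernet header, and the flat byte buffer are all hypothetical stand-ins for the skb handling in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN      14   /* assumed: plain Ethernet header, no VLAN tag */
#define IPV6_HLEN     40   /* fixed IPv6 header length */
#define HBH_JUMBO_LEN 8    /* hop-by-hop header carrying one jumbo option */
#define NEXTHDR_HOP   0    /* IPv6 next-header value for hop-by-hop options */
#define NEXTHDR_TCP   6
#define TLV_JUMBO     0xC2 /* jumbo payload option type */

/*
 * Hypothetical userspace model, not kernel code.
 * Frame layout on entry: [Ethernet][IPv6][HBH jumbo][L4 ...]
 * Returns the number of bytes stripped (0 if nothing was removed).
 * After the call the frame starts at frame + return value.
 */
static size_t strip_hbh_jumbo(uint8_t *frame, size_t len)
{
	uint8_t *ip6, *hbh, l4;

	assert(frame != NULL);
	if (len < ETH_HLEN + IPV6_HLEN + HBH_JUMBO_LEN)
		return 0;

	ip6 = frame + ETH_HLEN;
	hbh = ip6 + IPV6_HLEN;

	/* Byte 6 of the IPv6 header is the next-header field. */
	if (ip6[6] != NEXTHDR_HOP)
		return 0;
	/* hbh[0] = next header, hbh[1] = ext len (0 means 8 bytes),
	 * hbh[2] = option type.
	 */
	if (hbh[0] != NEXTHDR_TCP || hbh[1] != 0 || hbh[2] != TLV_JUMBO)
		return 0;

	/* Save the L4 protocol first: the move below overwrites the HBH bytes. */
	l4 = hbh[0];

	/* Slide [Ethernet][IPv6] forward so it sits directly before L4. */
	memmove(frame + HBH_JUMBO_LEN, frame, ETH_HLEN + IPV6_HLEN);

	/* The IPv6 header now points straight at the L4 header. */
	frame[HBH_JUMBO_LEN + ETH_HLEN + 6] = l4;

	return HBH_JUMBO_LEN;
}
```

A caller would treat frame + strip_hbh_jumbo(frame, len) as the new start of the frame, which loosely corresponds to advancing the data, mac_header and network_header offsets of the skb.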
* [RFC net-next v3 2/2] bnxt: Use generic HBH removal helper in tx path
2022-11-29 20:06 [RFC net-next v3 1/2] IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver Coco Li
@ 2022-11-29 20:06 ` Coco Li
2022-11-29 20:41 ` Michael Chan
0 siblings, 1 reply; 5+ messages in thread
From: Coco Li @ 2022-11-29 20:06 UTC (permalink / raw)
To: David S. Miller, Hideaki YOSHIFUJI, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Michael Chan
Cc: netdev, Daisuke Nishimura, linux-kernel, Coco Li
Eric Dumazet implemented Big TCP that allowed bigger TSO/GRO packet sizes
for IPv6 traffic. See patch series:
'commit 89527be8d8d6 ("net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes")'
This reduces the number of packets traversing the networking stack and
should usually improve performance. However, it also inserts a
temporary Hop-by-hop IPv6 extension header.
Using the HBH header removal helper from the previous patch, the extra
header can be removed in the bnxt driver, allowing it to send big TCP
packets (bigger TSO packets) as well.
Tested:
Compiled locally
To further test functional correctness, update the GSO/GRO limit on the
physical NIC:
ip link set eth0 gso_max_size 181000
ip link set eth0 gro_max_size 181000
Note that if there are bonding or ipvlan devices on top of the physical
NIC, their GSO sizes need to be updated as well.
Then, IPv6/TCP packets with sizes larger than 64k can be observed.
Big TCP functionality has been tested by Michael; the feature checks
have not been tested yet.
Tested by Michael:
I've confirmed with our hardware team that this is supported by our
chips, and I've tested it up to gso_max_size of 524280. Thanks.
Tested-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 0fe164b42c5d..f144a5ef2e04 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_BUSY;
}
+ if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
+ goto tx_free;
+
length = skb->len;
len = skb_headlen(skb);
last_frag = skb_shinfo(skb)->nr_frags;
@@ -11342,9 +11345,15 @@ static bool bnxt_exthdr_check(struct bnxt *bp, struct sk_buff *skb, int nw_off,
if (hdrlen > 64)
return false;
+
+ /* The ext header may be a hop-by-hop header inserted for
+ * big TCP purposes. This will be removed before sending
+ * from NIC, so do not count it.
+ */
+ if (!(*nexthdr == NEXTHDR_HOP && ipv6_has_hopopt_jumbo(skb)))
+ hdr_count++;
nexthdr = &hp->nexthdr;
start += hdrlen;
- hdr_count++;
}
if (nextp) {
/* Caller will check inner protocol */
@@ -13657,6 +13666,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dev->features &= ~NETIF_F_LRO;
dev->priv_flags |= IFF_UNICAST_FLT;
+ netif_set_tso_max_size(dev, GSO_MAX_SIZE);
+
#ifdef CONFIG_BNXT_SRIOV
init_waitqueue_head(&bp->sriov_cfg_wait);
#endif
--
2.38.1.584.g0f3c55d4c2-goog
* Re: [RFC net-next v3 2/2] bnxt: Use generic HBH removal helper in tx path
2022-11-29 20:06 ` [RFC net-next v3 2/2] bnxt: Use generic HBH removal helper in tx path Coco Li
@ 2022-11-29 20:41 ` Michael Chan
2022-12-02 2:03 ` Coco Li
0 siblings, 1 reply; 5+ messages in thread
From: Michael Chan @ 2022-11-29 20:41 UTC (permalink / raw)
To: Coco Li
Cc: David S. Miller, Hideaki YOSHIFUJI, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Daisuke Nishimura,
linux-kernel
On Tue, Nov 29, 2022 at 12:07 PM Coco Li <lixiaoyan@google.com> wrote:
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 0fe164b42c5d..f144a5ef2e04 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> return NETDEV_TX_BUSY;
> }
>
> + if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
> + goto tx_free;
> +
> length = skb->len;
> len = skb_headlen(skb);
> last_frag = skb_shinfo(skb)->nr_frags;
> @@ -11342,9 +11345,15 @@ static bool bnxt_exthdr_check(struct bnxt *bp, struct sk_buff *skb, int nw_off,
>
> if (hdrlen > 64)
> return false;
> +
> + /* The ext header may be a hop-by-hop header inserted for
> + * big TCP purposes. This will be removed before sending
> + * from NIC, so do not count it.
> + */
> + if (!(*nexthdr == NEXTHDR_HOP && ipv6_has_hopopt_jumbo(skb)))
To be more efficient, why not just check the header's tlv_type here
instead of calling ipv6_has_hopopt_jumbo()?
> + hdr_count++;
> nexthdr = &hp->nexthdr;
> start += hdrlen;
> - hdr_count++;
> }
> if (nextp) {
> /* Caller will check inner protocol */
> @@ -13657,6 +13666,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> dev->features &= ~NETIF_F_LRO;
> dev->priv_flags |= IFF_UNICAST_FLT;
>
> + netif_set_tso_max_size(dev, GSO_MAX_SIZE);
> +
> #ifdef CONFIG_BNXT_SRIOV
> init_waitqueue_head(&bp->sriov_cfg_wait);
> #endif
> --
> 2.38.1.584.g0f3c55d4c2-goog
>
* Re: [RFC net-next v3 2/2] bnxt: Use generic HBH removal helper in tx path
2022-11-29 20:41 ` Michael Chan
@ 2022-12-02 2:03 ` Coco Li
2022-12-02 5:56 ` Michael Chan
0 siblings, 1 reply; 5+ messages in thread
From: Coco Li @ 2022-12-02 2:03 UTC (permalink / raw)
To: Michael Chan
Cc: David S. Miller, Hideaki YOSHIFUJI, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Daisuke Nishimura,
linux-kernel
On Tue, Nov 29, 2022 at 12:42 PM Michael Chan <michael.chan@broadcom.com> wrote:
>
> On Tue, Nov 29, 2022 at 12:07 PM Coco Li <lixiaoyan@google.com> wrote:
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 0fe164b42c5d..f144a5ef2e04 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > return NETDEV_TX_BUSY;
> > }
> >
> > + if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
> > + goto tx_free;
> > +
> > length = skb->len;
> > len = skb_headlen(skb);
> > last_frag = skb_shinfo(skb)->nr_frags;
> > @@ -11342,9 +11345,15 @@ static bool bnxt_exthdr_check(struct bnxt *bp, struct sk_buff *skb, int nw_off,
> >
> > if (hdrlen > 64)
> > return false;
> > +
> > + /* The ext header may be a hop-by-hop header inserted for
> > + * big TCP purposes. This will be removed before sending
> > + * from NIC, so do not count it.
> > + */
> > + if (!(*nexthdr == NEXTHDR_HOP && ipv6_has_hopopt_jumbo(skb)))
>
> To be more efficient, why not just check the header's tlv_type here
> instead of calling ipv6_has_hopopt_jumbo()?
>
It is possible that the next header is hop-by-hop but the packet is
not TCP, meaning that the header would not be removed and we would
still want to count it towards the limit.
ipv6_has_hopopt_jumbo() checks specifically for the big TCP case (GSO,
and skb len above a certain size).
> > + hdr_count++;
> > nexthdr = &hp->nexthdr;
> > start += hdrlen;
> > - hdr_count++;
> > }
> > if (nextp) {
> > /* Caller will check inner protocol */
> > @@ -13657,6 +13666,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> > dev->features &= ~NETIF_F_LRO;
> > dev->priv_flags |= IFF_UNICAST_FLT;
> >
> > + netif_set_tso_max_size(dev, GSO_MAX_SIZE);
> > +
> > #ifdef CONFIG_BNXT_SRIOV
> > init_waitqueue_head(&bp->sriov_cfg_wait);
> > #endif
> > --
> > 2.38.1.584.g0f3c55d4c2-goog
> >
* Re: [RFC net-next v3 2/2] bnxt: Use generic HBH removal helper in tx path
2022-12-02 2:03 ` Coco Li
@ 2022-12-02 5:56 ` Michael Chan
0 siblings, 0 replies; 5+ messages in thread
From: Michael Chan @ 2022-12-02 5:56 UTC (permalink / raw)
To: Coco Li
Cc: David S. Miller, Hideaki YOSHIFUJI, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Daisuke Nishimura,
linux-kernel
On Thu, Dec 1, 2022 at 6:03 PM Coco Li <lixiaoyan@google.com> wrote:
>
> On Tue, Nov 29, 2022 at 12:42 PM Michael Chan <michael.chan@broadcom.com> wrote:
> >
> > On Tue, Nov 29, 2022 at 12:07 PM Coco Li <lixiaoyan@google.com> wrote:
> > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > index 0fe164b42c5d..f144a5ef2e04 100644
> > > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > > @@ -389,6 +389,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > return NETDEV_TX_BUSY;
> > > }
> > >
> > > + if (unlikely(ipv6_hopopt_jumbo_remove(skb)))
> > > + goto tx_free;
> > > +
> > > length = skb->len;
> > > len = skb_headlen(skb);
> > > last_frag = skb_shinfo(skb)->nr_frags;
> > > @@ -11342,9 +11345,15 @@ static bool bnxt_exthdr_check(struct bnxt *bp, struct sk_buff *skb, int nw_off,
> > >
> > > if (hdrlen > 64)
> > > return false;
> > > +
> > > + /* The ext header may be a hop-by-hop header inserted for
> > > + * big TCP purposes. This will be removed before sending
> > > + * from NIC, so do not count it.
> > > + */
> > > + if (!(*nexthdr == NEXTHDR_HOP && ipv6_has_hopopt_jumbo(skb)))
> >
> > To be more efficient, why not just check the header's tlv_type here
> > instead of calling ipv6_has_hopopt_jumbo()?
> >
>
> It may be possible that the next header is Hop_by_hop but the packet
> is not tcp, meaning that it would not be removed and we'd still want
> to count this header towards the limit.
> ipv6_has_hopopt_jumbo checks for the big tcp case (gso, skb len
> reaches a certain size) particularly.
>
We can add all the additional checks here and it will still be more
efficient, because we already know this is IPv6 and we are already
looking at the extension header. This is the fast path, so I think we
want to be as efficient as possible.