* TX pre-headers...
@ 2009-02-06 9:41 David Miller
2009-02-06 10:46 ` Steve.Glendinning
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: David Miller @ 2009-02-06 9:41 UTC (permalink / raw)
To: netdev
Some NIC hardware wants a pre-header pushed in front of the packet
data on transmit.
When routing or bridging this will cause a reallocation of skb->data
on every packet forwarded because there will only be NET_IP_ALIGN
space reserved at the head by the device receive path.
NIU is one such NIC and I only noticed this because of some things I
saw in some of Robert Olsson's routing stress test oprofile dumps.
Putting a hack into NIU is the wrong way to do this and would only fix
cases where NIU is the receiver and transmitting device. e1000 to
NIU would still be broken, for example.
I think the way to solve this is to have each device indicate how
much TX slack space it needs for its pre-headers.  On device
registration we have some global "netdev_max_tx_hdr_space" that
records the maximum value seen.
We could decrease it on unregister (by walking the device list)
but I don't think that is worth it.
We also round netdev_max_tx_hdr_space up to be a multiple of 4
or something reasonable like that.
Then we get drivers to use a new interface:
struct sk_buff *netdev_alloc_rx_skb(struct net_device *dev, int size);
which is nearly identical to netdev_alloc_skb() except that it does:
	size += NET_IP_ALIGN + netdev_max_tx_hdr_space;
	skb = netdev_alloc_skb(dev, size);
	if (skb)
		skb_reserve(skb, NET_IP_ALIGN + netdev_max_tx_hdr_space);
	return skb;
Seems reasonable?
* Re: TX pre-headers...
@ 2009-02-06 10:46 Steve.Glendinning
From: Steve.Glendinning @ 2009-02-06 10:46 UTC (permalink / raw)
To: David Miller; +Cc: netdev

David Miller <davem@davemloft.net> wrote on 06/02/2009 09:41:07:

> Some NIC hardware wants a pre-header pushed in front of the packet
> data on transmit.
>
> When routing or bridging this will cause a reallocation of skb->data
> on every packet forwarded because there will only be NET_IP_ALIGN
> space reserved at the head by the device receive path.
>
> NIU is one such NIC and I only noticed this because of some things I
> saw in some of Robert Olsson's routing stress test oprofile dumps.

Many of the usbnet drivers do this as well - for example smsc95xx needs
either 8 or 12 bytes for its pre-header (depending whether tx checksum
offload is enabled) and does this in its tx_fixup:

	if (skb_headroom(skb) < overhead) {
		struct sk_buff *skb2 = skb_copy_expand(skb,
			overhead, 0, flags);
		dev_kfree_skb_any(skb);
		skb = skb2;
		if (!skb)
			return NULL;
	}

It would be really good to get rid of these copies.

Interestingly, this device can change the amount of pre-header space it
needs if tx checksums are enabled/disabled at runtime.  I guess the
device would just indicate its worst case requirement at registration.

Does this just apply to the routing/bridging case, or can packets
originating from the host also allocate this extra pre-header?

Steve
* Re: TX pre-headers...
@ 2009-02-06 11:11 David Miller
From: David Miller @ 2009-02-06 11:11 UTC (permalink / raw)
To: Steve.Glendinning; +Cc: netdev

From: Steve.Glendinning@smsc.com
Date: Fri, 6 Feb 2009 10:46:24 +0000

> Many of the usbnet drivers do this as well - for example smsc95xx needs
> either 8 or 12 bytes for its pre-header (depending whether tx checksum
> offload is enabled) and does this in its tx_fixup:
>
> 	if (skb_headroom(skb) < overhead) {
> 		struct sk_buff *skb2 = skb_copy_expand(skb,
> 			overhead, 0, flags);
> 		dev_kfree_skb_any(skb);
> 		skb = skb2;
> 		if (!skb)
> 			return NULL;
> 	}
>
> It would be really good to get rid of these copies.

This is exactly the kind of code that the NIU transmit handler has.

> Interestingly, this device can change the amount of pre-header space it
> needs if tx checksums are enabled/disabled at runtime.  I guess the
> device would just indicate its worst case requirement at registration.

Right.

> Does this just apply to the routing/bridging case, or can packets
> originating from the host also allocate this extra pre-header?

ARP packets, as one specific case, are still going to trigger that
conditional above and need the copy.

TCP, UDP and others usually don't trigger it, but only because they add
MAX_TCP_HEADER et al. to all of their allocations, which essentially
gives 128 bytes of slack space at the front of every TX frame from
transport protocols.
* Re: TX pre-headers...
@ 2009-02-06 18:53 Inaky Perez-Gonzalez
From: Inaky Perez-Gonzalez @ 2009-02-06 18:53 UTC (permalink / raw)
To: Steve.Glendinning; +Cc: David Miller, netdev

On Friday 06 February 2009, Steve.Glendinning@smsc.com wrote:
> David Miller <davem@davemloft.net> wrote on 06/02/2009 09:41:07:
> > Some NIC hardware wants a pre-header pushed in front of the packet
> > data on transmit.
> >
> > When routing or bridging this will cause a reallocation of skb->data
> > on every packet forwarded because there will only be NET_IP_ALIGN
> > space reserved at the head by the device receive path.
>
> Many of the usbnet drivers do this as well - for example smsc95xx needs
> either 8 or 12 bytes for its pre-header (depending whether tx checksum
> offload is enabled) and does this in its tx_fixup:

The wimax i2400m has a similar thing, but on top of that it requires
packets to be coalesced, and then yet another header with packet
descriptors has to be added to them:

	(header pld0 pld1 pld2 ... padding pl0 pl1 pl2 ...)

	pld - payload descriptor
	pl  - payload

Each IP packet payload has a prefix header too, inside the payload.

So it would not be able to take advantage of this (it uses a FIFO right
now).  I wanted to explore doing all that coalescing using an sg vector,
but until the USB and SDIO stacks support that without too much
overhead, it is not worth the effort.

-- Inaky
* Re: TX pre-headers...
@ 2009-02-06 12:02 Frank Blaschka
From: Frank Blaschka @ 2009-02-06 12:02 UTC (permalink / raw)
To: David Miller; +Cc: netdev

David Miller schrieb:
> Some NIC hardware wants a pre-header pushed in front of the packet
> data on transmit.
>
> When routing or bridging this will cause a reallocation of skb->data
> on every packet forwarded because there will only be NET_IP_ALIGN
> space reserved at the head by the device receive path.
>
> [...]
>
> Then we get drivers to use a new interface:
>
> 	struct sk_buff *netdev_alloc_rx_skb(struct net_device *dev, int size);
>
> which is nearly identical to netdev_alloc_skb() except that it does:
>
> 	size += NET_IP_ALIGN + netdev_max_tx_hdr_space;
> 	skb = netdev_alloc_skb(dev, size);
> 	if (skb)
> 		skb_reserve(skb, NET_IP_ALIGN + netdev_max_tx_hdr_space);
> 	return skb;
>
> Seems reasonable?

Absolutely yes, this would help the s/390 qeth drivers too.
* Re: TX pre-headers...
@ 2009-02-07 8:10 David Miller
From: David Miller @ 2009-02-07 8:10 UTC (permalink / raw)
To: blaschka; +Cc: netdev

From: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Date: Fri, 06 Feb 2009 13:02:13 +0100

> Absolutely yes, this would help the s/390 qeth drivers too

Well, I did some research and it seems all of the cases we could
actually solve with such a scheme need at most 32 bytes.

We already ensure 16 bytes, via NET_SKB_PAD.

So instead of all of this complex "who has the largest TX header size
and what is it" code, we can simply increase NET_SKB_PAD to 32.

You still need that headroom check there, simply because tunneling and
other device stacking situations can cause the headroom to be depleted
before your device sees the packet.

Any objections? :-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 08670d0..5eba400 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1287,7 +1287,7 @@ static inline int skb_network_offset(const struct sk_buff *skb)
  * The networking layer reserves some headroom in skb data (via
  * dev_alloc_skb). This is used to avoid having to reallocate skb data when
  * the header has to grow. In the default case, if the header has to grow
- * 16 bytes or less we avoid the reallocation.
+ * 32 bytes or less we avoid the reallocation.
  *
  * Unfortunately this headroom changes the DMA alignment of the resulting
  * network packet. As for NET_IP_ALIGN, this unaligned DMA is expensive
@@ -1295,11 +1295,11 @@ static inline int skb_network_offset(const struct sk_buff *skb)
  * perhaps setting it to a cacheline in size (since that will maintain
  * cacheline alignment of the DMA). It must be a power of 2.
  *
- * Various parts of the networking layer expect at least 16 bytes of
+ * Various parts of the networking layer expect at least 32 bytes of
  * headroom, you should not reduce this.
  */
 #ifndef NET_SKB_PAD
-#define NET_SKB_PAD	16
+#define NET_SKB_PAD	32
 #endif
 
 extern int ___pskb_trim(struct sk_buff *skb, unsigned int len);
* Re: TX pre-headers...
@ 2009-02-09 3:34 David Miller
From: David Miller @ 2009-02-09 3:34 UTC (permalink / raw)
To: blaschka; +Cc: netdev

From: David Miller <davem@davemloft.net>
Date: Sat, 07 Feb 2009 00:10:41 -0800 (PST)

> Well, I did some research and it seems all of the cases we could
> actually solve with such a scheme need at most 32 bytes.
>
> [...]
>
> Any objections? :-)

Nobody objected, at least for now, so I committed this change, as
follows:

    net: Increase default NET_SKB_PAD to 32.

    Several devices need to insert some "pre headers" in front of the
    main packet data when they transmit a packet.  Currently we allocate
    only 16 bytes of pad room and this ends up not being enough for some
    types of hardware (NIU, usb-net, s390 qeth, etc.)

    So increase this to 32.

    Note that drivers still need to check in their transmit routine
    whether enough headroom exists, and if not use
    skb_realloc_headroom().  Tunneling, IPSEC, and other encapsulation
    methods can cause the padding area to be used up.

    Signed-off-by: David S. Miller <davem@davemloft.net>

---
 include/linux/skbuff.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 08670d0..5eba400 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1287,7 +1287,7 @@ static inline int skb_network_offset(const struct sk_buff *skb)
  * The networking layer reserves some headroom in skb data (via
  * dev_alloc_skb). This is used to avoid having to reallocate skb data when
  * the header has to grow. In the default case, if the header has to grow
- * 16 bytes or less we avoid the reallocation.
+ * 32 bytes or less we avoid the reallocation.
  *
  * Unfortunately this headroom changes the DMA alignment of the resulting
  * network packet. As for NET_IP_ALIGN, this unaligned DMA is expensive
@@ -1295,11 +1295,11 @@ static inline int skb_network_offset(const struct sk_buff *skb)
  * perhaps setting it to a cacheline in size (since that will maintain
  * cacheline alignment of the DMA). It must be a power of 2.
  *
- * Various parts of the networking layer expect at least 16 bytes of
+ * Various parts of the networking layer expect at least 32 bytes of
  * headroom, you should not reduce this.
  */
 #ifndef NET_SKB_PAD
-#define NET_SKB_PAD	16
+#define NET_SKB_PAD	32
 #endif
 
 extern int ___pskb_trim(struct sk_buff *skb, unsigned int len);
-- 
1.6.1.2.253.ga34a
* Re: TX pre-headers...
@ 2009-02-09 10:07 Herbert Xu
From: Herbert Xu @ 2009-02-09 10:07 UTC (permalink / raw)
To: David Miller; +Cc: netdev

David Miller <davem@davemloft.net> wrote:
>
> I think the way to solve this is to have each device indicate how
> much TX slack space it needs for its pre-headers.  On device
> registration we have some global "netdev_max_tx_hdr_space" that
> records the maximum value seen.

We can already solve it right now with dev->needed_headroom.

Of course it's not optimal because it involves a copy, so we might
still need your solution.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: TX pre-headers...
@ 2009-02-09 10:14 David Miller
From: David Miller @ 2009-02-09 10:14 UTC (permalink / raw)
To: herbert; +Cc: netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 9 Feb 2009 21:07:13 +1100

> We can already solve it right now with dev->needed_headroom.
>
> Of course it's not optimal because it involves a copy so we
> might still need your solution.

Indeed, I noticed needed_headroom when I set out to start coding
things up. :)

There are all sorts of cases that won't work well, as you implicitly
suggest.  For example, forwarding to a path that encapsulates to a GRE
or other type of tunnel.

For IPSEC we're predominantly COW'ing the data anyways, so...
* Re: TX pre-headers...
@ 2009-02-09 10:19 Herbert Xu
From: Herbert Xu @ 2009-02-09 10:19 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Mon, Feb 09, 2009 at 02:14:03AM -0800, David Miller wrote:
>
> Indeed, I noticed needed_headroom when I set out to start coding
> things up. :)
>
> There are all sorts of cases that won't work well, as you implicitly
> suggest.  For example, forwarding to a path that encapsulates to a GRE
> or other type of tunnel.
>
> For IPSEC we're predominantly COW'ing the data anyways, so...

Yes, tunneling (especially when it's hidden by netfilter) is the
hard part.

Here's a thought: what if we had your global maximum, and every
time we have to reallocate a packet because of a failed headroom
check, we increase the maximum by the needed amount (we should
add a ceiling in case some buggy path ignores this maximum when
allocating a new packet)?

This way even tunnels could benefit from not having to copy all
the time.

As a failed check should be rare (given that we're continuously
increasing it), the overhead in updating the maximum should be
reasonable.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: TX pre-headers...
@ 2009-02-09 15:39 Krzysztof Halasa
From: Krzysztof Halasa @ 2009-02-09 15:39 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev

Herbert Xu <herbert@gondor.apana.org.au> writes:

> Here's a thought, what if we had your global maximum, and every
> time we have to reallocate a packet because of a failed headroom
> check, we increase the maximum by the needed amount (we should
> add a ceiling in case some buggy path ignores this maximum when
> allocating a new packet).
>
> This way even tunnels could benefit from not having to copy all
> the time.

I agree.

OTOH and FWIW, my stuff (= WAN) alone, without stacking, fits in 32
bytes; the typical max case is 10 bytes for the Frame Relay header +
14 bytes for the Ethernet header + 4 bytes for the 802.1Q tag =
28 bytes.

OTOH I wonder if these changes make dev->hard_header_len no longer
needed.
-- 
Krzysztof Halasa
* Re: TX pre-headers...
@ 2009-02-10 8:15 David Miller
From: David Miller @ 2009-02-10 8:15 UTC (permalink / raw)
To: khc; +Cc: herbert, netdev

From: Krzysztof Halasa <khc@pm.waw.pl>
Date: Mon, 09 Feb 2009 16:39:48 +0100

> OTOH I wonder if these changes make dev->hard_header_len no longer
> needed.

Not really; this value tells the kernel how large the cached link-level
headers are that get copied directly under the network header.
end of thread, other threads:[~2009-02-10 8:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-06  9:41 TX pre-headers... David Miller
2009-02-06 10:46 ` Steve.Glendinning
2009-02-06 11:11   ` David Miller
2009-02-06 18:53   ` Inaky Perez-Gonzalez
2009-02-06 12:02 ` Frank Blaschka
2009-02-07  8:10   ` David Miller
2009-02-09  3:34     ` David Miller
2009-02-09 10:07 ` Herbert Xu
2009-02-09 10:14   ` David Miller
2009-02-09 10:19     ` Herbert Xu
2009-02-09 15:39       ` Krzysztof Halasa
2009-02-10  8:15         ` David Miller