* [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
@ 2014-09-25 14:17 Eric Dumazet
2014-09-25 14:43 ` Jesper Dangaard Brouer
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 14:17 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Amir Vadai
From: Eric Dumazet <edumazet@google.com>
skb->xmit_more tells us if another skb is coming next.
We need to send a doorbell when xmit_more is not set, or when the TX queue
is stopped (preventing the next skb from coming immediately).
Tested with a modified pktgen version, I got a 40% increase in throughput.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 23 +++++++++++++------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index c44f4237b9be..adedc47e947d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -667,6 +667,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
int lso_header_size;
void *fragptr;
bool bounce = false;
+ bool send_doorbell;
if (!priv->port_up)
goto tx_drop;
@@ -878,12 +879,16 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
skb_tx_timestamp(skb);
- if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tx_tag_present(skb)) {
+ send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
+
+ if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
+ !vlan_tx_tag_present(skb) && send_doorbell) {
tx_desc->ctrl.bf_qpn |= cpu_to_be32(ring->doorbell_qpn);
op_own |= htonl((bf_index & 0xffff) << 8);
- /* Ensure new descirptor hits memory
- * before setting ownership of this descriptor to HW */
+ /* Ensure new descriptor hits memory
+ * before setting ownership of this descriptor to HW
+ */
wmb();
tx_desc->ctrl.owner_opcode = op_own;
@@ -896,12 +901,16 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
ring->bf.offset ^= ring->bf.buf_size;
} else {
- /* Ensure new descirptor hits memory
- * before setting ownership of this descriptor to HW */
+ /* Ensure new descriptor hits memory
+ * before setting ownership of this descriptor to HW
+ */
wmb();
tx_desc->ctrl.owner_opcode = op_own;
- wmb();
- iowrite32be(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
+ if (send_doorbell) {
+ wmb();
+ iowrite32be(ring->doorbell_qpn,
+ ring->bf.uar->map + MLX4_SEND_DOORBELL);
+ }
}
return NETDEV_TX_OK;
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 14:17 [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Eric Dumazet
@ 2014-09-25 14:43 ` Jesper Dangaard Brouer
2014-09-25 15:06 ` Eric Dumazet
2014-09-25 15:53 ` [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Alexei Starovoitov
2014-09-28 21:27 ` David Miller
2 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-25 14:43 UTC (permalink / raw)
To: Eric Dumazet; +Cc: brouer, David Miller, netdev, Amir Vadai, John Fastabend
On Thu, 25 Sep 2014 07:17:49 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.
Pktgen is artificial benchmarking, you know ;-) :-P
If you really must use pktgen for this, you can use an unmodified
pktgen against the qdisc layer by adding a VLAN interface on top of the
device you want to test. I blame John, for telling me this ;-)
/me running away
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 14:43 ` Jesper Dangaard Brouer
@ 2014-09-25 15:06 ` Eric Dumazet
2014-09-25 15:36 ` Eric Dumazet
0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 15:06 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend
On Thu, 2014-09-25 at 16:43 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 25 Sep 2014 07:17:49 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Tested with a modified pktgen version, I got a 40% increase of
> > throughput.
>
> Pktgen is artificial benchmarking, you know ;-) :-P
>
> If you really must use pktgen for this, you can use an unmodified
> pktgen against the qdisc layer by adding a VLAN interface on-top of the
> device you want to test. I blame John, for telling me this ;-)
>
> /me running away
But... nothing in the qdisc layer sets xmit_more yet, unless TSO is
disabled and a GSO packet is segmented.
So you might notice some improvement with this patch, but not if you use
TSO, and a 40Gb NIC is better with TSO on, you know ;)
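For reference, the only thing that provides the hint at the moment is the
segmentation loop in the core transmit path, which marks every segment but
the last one. A simplified sketch of that loop (paraphrased; the helper name
is illustrative, this is not the exact net/core/dev.c code):

static netdev_tx_t xmit_segment_list(struct sk_buff *first,
				     struct net_device *dev,
				     struct netdev_queue *txq)
{
	struct sk_buff *skb = first;
	netdev_tx_t rc = NETDEV_TX_OK;

	while (skb) {
		struct sk_buff *next = skb->next;

		skb->next = NULL;
		/* another segment follows: let the driver defer its doorbell */
		skb->xmit_more = (next != NULL);
		rc = dev->netdev_ops->ndo_start_xmit(skb, dev);
		if (rc != NETDEV_TX_OK || netif_xmit_stopped(txq))
			break;	/* requeueing of remaining segments omitted */
		skb = next;
	}
	return rc;
}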
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:06 ` Eric Dumazet
@ 2014-09-25 15:36 ` Eric Dumazet
2014-09-25 15:40 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 15:36 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend
On Thu, 2014-09-25 at 08:06 -0700, Eric Dumazet wrote:
>
> But... Nothing yet in qdisc layer sets xmit_more, unless TSO is
> disabled, and a GSO packet is segmented.
>
> So you might notice some improvement with this patch, but not if you use
> TSO, and a 40GB NIC is better with TSO on you know ;)
>
So playing with TSO=off (but GSO=on), we see about a 12% increase in
throughput on a TCP_STREAM flow.
The kernel profile clearly shows the difference between sending the
doorbell or not, as mlx4_en_xmit() shifts. Note also that
mlx4_en_free_tx_desc() disappears; this might point to some false sharing
or something else worth investigating.
Before patch :
26.36% [kernel] [k] __copy_skb_header
16.59% [kernel] [k] mlx4_en_xmit
5.62% [kernel] [k] __alloc_skb
5.48% [kernel] [k] copy_user_enhanced_fast_string
4.48% [kernel] [k] skb_segment
2.46% [kernel] [k] mlx4_en_free_tx_desc.isra.27
2.31% [kernel] [k] _raw_spin_lock
2.10% [kernel] [k] memcpy
2.01% [kernel] [k] tcp_sendmsg
1.62% [kernel] [k] __iowrite64_copy
After patch :
32.78% [kernel] [k] __copy_skb_header
8.26% [kernel] [k] mlx4_en_xmit
7.25% [kernel] [k] __alloc_skb
7.18% [kernel] [k] copy_user_enhanced_fast_string
4.39% [kernel] [k] skb_segment
2.87% [kernel] [k] memcpy
2.59% [kernel] [k] tcp_sendmsg
2.50% [kernel] [k] _raw_spin_lock
2.38% [kernel] [k] dev_hard_start_xmit
1.52% [kernel] [k] tcp_gso_segment
1.50% [kernel] [k] kmem_cache_alloc_node_trace
1.40% [kernel] [k] kmem_cache_alloc_node
1.16% [kernel] [k] ip_send_check
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:36 ` Eric Dumazet
@ 2014-09-25 15:40 ` Eric Dumazet
2014-09-25 16:19 ` Eric Dumazet
2014-09-25 20:47 ` Eric Dumazet
2 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 15:40 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend
On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:
> So playing with TSO=off (but GSO=on), we see about 12 % increase of
> throughput on a TCP_STREAM flow.
Correction: this was a 20% increase (849284 pps -> 1023469 pps).
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 14:17 [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Eric Dumazet
2014-09-25 14:43 ` Jesper Dangaard Brouer
@ 2014-09-25 15:53 ` Alexei Starovoitov
2014-09-25 16:05 ` Eric Dumazet
2014-09-28 21:27 ` David Miller
2 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2014-09-25 15:53 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Amir Vadai
On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> skb->xmit_more tells us if another skb is coming next.
>
> We need to send doorbell when : xmit_more is not set,
> or txqueue is stopped (preventing next skb to come immediately)
>
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.
this is awesome!
I've been hacking pktgen as well, based on Jesper's earlier patch, but in
a slightly different way. Sounds like you already have a working pktgen
with xmit_more. Do you mind sharing it? Even a rough patch would be great.
Thanks
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:53 ` [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Alexei Starovoitov
@ 2014-09-25 16:05 ` Eric Dumazet
2014-09-25 16:08 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 16:05 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: David Miller, netdev, Amir Vadai
On Thu, 2014-09-25 at 08:53 -0700, Alexei Starovoitov wrote:
> On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > skb->xmit_more tells us if another skb is coming next.
> >
> > We need to send doorbell when : xmit_more is not set,
> > or txqueue is stopped (preventing next skb to come immediately)
> >
> > Tested with a modified pktgen version, I got a 40% increase of
> > throughput.
>
> this is awesome!
>
> I've been hacking pktgen as well based on Jesper's earlier patch,
> but in slightly different way. Sounds like you already have working
> pktgen with xmit_more. Do you mind sharing it? even rough patch
> would be great.
The plan was to add a 'pburst x' parameter.
Not only do we want to avoid sending the doorbell, we could also avoid
taking the spinlock for every start_xmit().
We can also enforce a global [p]rate (say 100000 packets per second),
but still allow pktgen to send bursts of x packets to exercise (and test)
this xmit_more.
Nothing ready yet ;)
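Very roughly, what I have in mind for the burst path looks like the sketch
below (nothing like this exists yet; the helper name and its exact shape are
only illustrative): take the tx lock once, submit the whole burst, and let
only the last packet of the burst ring the doorbell.

static void pktgen_xmit_burst(struct pktgen_dev *pkt_dev,
			      struct netdev_queue *txq, int burst)
{
	struct net_device *odev = pkt_dev->odev;
	int i;

	__netif_tx_lock_bh(txq);
	for (i = 0; i < burst; i++) {
		struct sk_buff *skb = pkt_dev->skb;
		netdev_tx_t ret;

		atomic_inc(&skb->users);	/* the driver consumes one reference */
		/* only the last packet of the burst triggers the doorbell */
		skb->xmit_more = (i != burst - 1) &&
				 !netif_xmit_frozen_or_drv_stopped(txq);
		ret = odev->netdev_ops->ndo_start_xmit(skb, odev);
		if (ret != NETDEV_TX_OK) {
			atomic_dec(&skb->users);	/* not consumed by the driver */
			/* NB: a complete version must make sure a doorbell
			 * still goes out for the packets already queued
			 */
			break;
		}
		pkt_dev->sofar++;
	}
	__netif_tx_unlock_bh(txq);
}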
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 16:05 ` Eric Dumazet
@ 2014-09-25 16:08 ` Alexei Starovoitov
0 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2014-09-25 16:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Amir Vadai
On Thu, Sep 25, 2014 at 9:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-09-25 at 08:53 -0700, Alexei Starovoitov wrote:
>> On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > From: Eric Dumazet <edumazet@google.com>
>> >
>> > skb->xmit_more tells us if another skb is coming next.
>> >
>> > We need to send doorbell when : xmit_more is not set,
>> > or txqueue is stopped (preventing next skb to come immediately)
>> >
>> > Tested with a modified pktgen version, I got a 40% increase of
>> > throughput.
>>
>> this is awesome!
>>
>> I've been hacking pktgen as well based on Jesper's earlier patch,
>> but in slightly different way. Sounds like you already have working
>> pktgen with xmit_more. Do you mind sharing it? even rough patch
>> would be great.
>
> The plan was to add a 'pburst x' parameter.
>
> Because not only we want to avoid sending the doorbell, but we could
> also not doing the spinlock for every start_xmit().
exactly!
I also hacked
atomic_inc(&(pkt_dev->skb->users))
into a single atomic_add() of N packets for bursting, to amortize the cost
of both the spin_lock and the lock xadd.
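i.e. something along these lines (sketch only, the helper name is made up):

static void pktgen_take_burst_refs(struct pktgen_dev *pkt_dev, int burst)
{
	/* one lock-prefixed add for the whole burst instead of one
	 * atomic_inc() per packet; each ndo_start_xmit() then consumes
	 * one of these references
	 */
	atomic_add(burst, &pkt_dev->skb->users);
}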
> We also can enforce a global [p]rate (say 100000 packets per second),
> but still allow pktgen to send burts of x packets to use (and test) this
> xmit_more
that would be nice. I'm not sure how yet.
> Nothing ready yet ;)
can't wait :)
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:36 ` Eric Dumazet
2014-09-25 15:40 ` Eric Dumazet
@ 2014-09-25 16:19 ` Eric Dumazet
2014-09-25 20:47 ` Eric Dumazet
2 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 16:19 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend
On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:
> 26.36% [kernel] [k] __copy_skb_header
Note to myself: time to optimize this thing by doing a single memcpy() to
copy a bunch of consecutive fields.
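The idea, sketched here with a made-up struct rather than the real sk_buff
layout (the actual patch appears later in this thread), is to fence all the
copied fields between two zero-length markers and copy the whole range at
once:

struct foo {
	unsigned long	private_state;	/* fields that must NOT be copied stay above the marker */

	__u32		headers_start[0];	/* zero-sized marker, adds no storage */
	__u32		a;
	__u16		b, c;
	__u32		headers_end[0];
};

static void foo_copy_header(struct foo *new, const struct foo *old)
{
	/* single memcpy() over the whole [headers_start, headers_end) range */
	memcpy(&new->headers_start, &old->headers_start,
	       offsetof(struct foo, headers_end) -
	       offsetof(struct foo, headers_start));
}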
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:36 ` Eric Dumazet
2014-09-25 15:40 ` Eric Dumazet
2014-09-25 16:19 ` Eric Dumazet
@ 2014-09-25 20:47 ` Eric Dumazet
2014-09-25 20:58 ` Joe Perches
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
2 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 20:47 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend
On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:
> After patch :
>
> 32.78% [kernel] [k] __copy_skb_header
> 8.26% [kernel] [k] mlx4_en_xmit
> 7.25% [kernel] [k] __alloc_skb
> 7.18% [kernel] [k] copy_user_enhanced_fast_string
> 4.39% [kernel] [k] skb_segment
> 2.87% [kernel] [k] memcpy
> 2.59% [kernel] [k] tcp_sendmsg
> 2.50% [kernel] [k] _raw_spin_lock
> 2.38% [kernel] [k] dev_hard_start_xmit
> 1.52% [kernel] [k] tcp_gso_segment
> 1.50% [kernel] [k] kmem_cache_alloc_node_trace
> 1.40% [kernel] [k] kmem_cache_alloc_node
> 1.16% [kernel] [k] ip_send_check
>
After the __copy_skb_header() optimization I get something even nicer ;)
19.90% [kernel] [k] __copy_skb_header
15.16% [kernel] [k] skb_segment
7.02% [kernel] [k] __alloc_skb
6.89% [kernel] [k] copy_user_enhanced_fast_string
6.60% [kernel] [k] mlx4_en_xmit
2.70% [kernel] [k] _raw_spin_lock
2.68% [kernel] [k] memcpy
2.41% [kernel] [k] tcp_sendmsg
I'll post this patch asap.
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 20:47 ` Eric Dumazet
@ 2014-09-25 20:58 ` Joe Perches
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
1 sibling, 0 replies; 19+ messages in thread
From: Joe Perches @ 2014-09-25 20:58 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jesper Dangaard Brouer, David Miller, netdev, Amir Vadai,
John Fastabend
On Thu, 2014-09-25 at 13:47 -0700, Eric Dumazet wrote:
> After patch :
> 32.78% [kernel] [k] __copy_skb_header
> After __copy_skb_header() optimization I get something even nicer ;)
> 19.90% [kernel] [k] __copy_skb_header
> I'll post this patch asap.
Eric, you are _killing_ it. network ninja indeed...
* [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-25 20:47 ` Eric Dumazet
2014-09-25 20:58 ` Joe Perches
@ 2014-09-25 21:20 ` Eric Dumazet
2014-09-26 8:35 ` Jesper Dangaard Brouer
` (2 more replies)
1 sibling, 3 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 21:20 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Amir Vadai, John Fastabend, Jesper Dangaard Brouer
From: Eric Dumazet <edumazet@google.com>
With the proliferation of bit fields in sk_buff, __copy_skb_header() became
quite expensive, showing up as the most expensive function in a GSO
workload.
__copy_skb_header() performance is also critical for non-GSO TCP
operations, as it is used from skb_clone().
This patch carefully moves all the fields that were not copied into a
separate zone: cloned, nohdr, fclone, peeked, head_frag, xmit_more.
Then I moved all the other (copied) fields into a section delimited by
headers_start[0]/headers_end[0], so that we can use a single memcpy()
call, inlined by the compiler using long-word loads/stores.
I also tried to keep all copies in the natural order of the sk_buff
fields, to help hardware prefetching.
I made sure the sk_buff size did not change.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/skbuff.h | 132 ++++++++++++++++++++-------------------
net/core/skbuff.c | 80 ++++++++++++-----------
2 files changed, 112 insertions(+), 100 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f1bfa3781c75..180f55b9ef21 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -527,27 +527,41 @@ struct sk_buff {
char cb[48] __aligned(8);
unsigned long _skb_refdst;
+ void (*destructor)(struct sk_buff *skb);
#ifdef CONFIG_XFRM
struct sec_path *sp;
#endif
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+ struct nf_conntrack *nfct;
+#endif
+#ifdef CONFIG_BRIDGE_NETFILTER
+ struct nf_bridge_info *nf_bridge;
+#endif
unsigned int len,
data_len;
__u16 mac_len,
hdr_len;
- union {
- __wsum csum;
- struct {
- __u16 csum_start;
- __u16 csum_offset;
- };
- };
- __u32 priority;
+
+ /* Following fields are _not_ copied in __copy_skb_header()
+ * Note that queue_mapping is here mostly to fill a hole.
+ */
kmemcheck_bitfield_begin(flags1);
- __u8 ignore_df:1,
- cloned:1,
- ip_summed:2,
+ __u16 queue_mapping;
+ __u8 cloned:1,
nohdr:1,
- nfctinfo:3;
+ fclone:2,
+ peeked:1,
+ head_frag:1,
+ xmit_more:1;
+ /* one bit hole */
+ kmemcheck_bitfield_end(flags1);
+
+
+
+ /* fields enclosed in headers_start/headers_end are copied
+ * using a single memcpy() in __copy_skb_header()
+ */
+ __u32 headers_start[0];
/* if you move pkt_type around you also must adapt those constants */
#ifdef __BIG_ENDIAN_BITFIELD
@@ -558,58 +572,52 @@ struct sk_buff {
#define PKT_TYPE_OFFSET() offsetof(struct sk_buff, __pkt_type_offset)
__u8 __pkt_type_offset[0];
- __u8 pkt_type:3,
- fclone:2,
- ipvs_property:1,
- peeked:1,
- nf_trace:1;
- kmemcheck_bitfield_end(flags1);
- __be16 protocol;
-
- void (*destructor)(struct sk_buff *skb);
-#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
- struct nf_conntrack *nfct;
-#endif
-#ifdef CONFIG_BRIDGE_NETFILTER
- struct nf_bridge_info *nf_bridge;
-#endif
-
- int skb_iif;
-
- __u32 hash;
-
- __be16 vlan_proto;
- __u16 vlan_tci;
-
-#ifdef CONFIG_NET_SCHED
- __u16 tc_index; /* traffic control index */
-#ifdef CONFIG_NET_CLS_ACT
- __u16 tc_verd; /* traffic control verdict */
-#endif
-#endif
-
- __u16 queue_mapping;
- kmemcheck_bitfield_begin(flags2);
- __u8 xmit_more:1;
-#ifdef CONFIG_IPV6_NDISC_NODETYPE
- __u8 ndisc_nodetype:2;
-#endif
+ __u8 pkt_type:3;
__u8 pfmemalloc:1;
+ __u8 ignore_df:1;
+ __u8 nfctinfo:3;
+
+ __u8 nf_trace:1;
+ __u8 ip_summed:2;
__u8 ooo_okay:1;
__u8 l4_hash:1;
__u8 sw_hash:1;
__u8 wifi_acked_valid:1;
__u8 wifi_acked:1;
+
__u8 no_fcs:1;
- __u8 head_frag:1;
/* Indicates the inner headers are valid in the skbuff. */
__u8 encapsulation:1;
__u8 encap_hdr_csum:1;
__u8 csum_valid:1;
__u8 csum_complete_sw:1;
- /* 1/3 bit hole (depending on ndisc_nodetype presence) */
- kmemcheck_bitfield_end(flags2);
+ __u8 csum_level:2;
+ __u8 csum_bad:1;
+#ifdef CONFIG_IPV6_NDISC_NODETYPE
+ __u8 ndisc_nodetype:2;
+#endif
+ /* 6/8 bit hole */
+
+#ifdef CONFIG_NET_SCHED
+ __u16 tc_index; /* traffic control index */
+#ifdef CONFIG_NET_CLS_ACT
+ __u16 tc_verd; /* traffic control verdict */
+#endif
+#endif
+
+ union {
+ __wsum csum;
+ struct {
+ __u16 csum_start;
+ __u16 csum_offset;
+ };
+ };
+ __u32 priority;
+ int skb_iif;
+ __u32 hash;
+ __be16 vlan_proto;
+ __u16 vlan_tci;
#if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL
union {
unsigned int napi_id;
@@ -625,19 +633,18 @@ struct sk_buff {
__u32 reserved_tailroom;
};
- kmemcheck_bitfield_begin(flags3);
- __u8 csum_level:2;
- __u8 csum_bad:1;
- /* 13 bit hole */
- kmemcheck_bitfield_end(flags3);
-
__be16 inner_protocol;
__u16 inner_transport_header;
__u16 inner_network_header;
__u16 inner_mac_header;
+
+ __be16 protocol;
__u16 transport_header;
__u16 network_header;
__u16 mac_header;
+
+ __u32 headers_end[0];
+
/* These elements must be at the end, see alloc_skb() for details. */
sk_buff_data_t tail;
sk_buff_data_t end;
@@ -3025,19 +3032,22 @@ static inline void nf_reset_trace(struct sk_buff *skb)
}
/* Note: This doesn't put any conntrack and bridge info in dst. */
-static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src)
+static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src,
+ bool copy)
{
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
dst->nfct = src->nfct;
nf_conntrack_get(src->nfct);
- dst->nfctinfo = src->nfctinfo;
+ if (copy)
+ dst->nfctinfo = src->nfctinfo;
#endif
#ifdef CONFIG_BRIDGE_NETFILTER
dst->nf_bridge = src->nf_bridge;
nf_bridge_get(src->nf_bridge);
#endif
#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE) || defined(CONFIG_NF_TABLES)
- dst->nf_trace = src->nf_trace;
+ if (copy)
+ dst->nf_trace = src->nf_trace;
#endif
}
@@ -3049,7 +3059,7 @@ static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src)
#ifdef CONFIG_BRIDGE_NETFILTER
nf_bridge_put(dst->nf_bridge);
#endif
- __nf_copy(dst, src);
+ __nf_copy(dst, src, true);
}
#ifdef CONFIG_NETWORK_SECMARK
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 06a8feb10099..57821927700e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -261,7 +261,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
atomic_t *fclone_ref = (atomic_t *) (child + 1);
kmemcheck_annotate_bitfield(child, flags1);
- kmemcheck_annotate_bitfield(child, flags2);
skb->fclone = SKB_FCLONE_ORIG;
atomic_set(fclone_ref, 1);
@@ -674,57 +673,61 @@ void consume_skb(struct sk_buff *skb)
}
EXPORT_SYMBOL(consume_skb);
+/* Make sure a field is enclosed inside headers_start/headers_end section */
+#define CHECK_SKB_FIELD(field) \
+ BUILD_BUG_ON(offsetof(struct sk_buff, field) < \
+ offsetof(struct sk_buff, headers_start)); \
+ BUILD_BUG_ON(offsetof(struct sk_buff, field) >= \
+ offsetof(struct sk_buff, headers_end)); \
+
static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
{
new->tstamp = old->tstamp;
+ /* We do not copy old->sk */
new->dev = old->dev;
- new->transport_header = old->transport_header;
- new->network_header = old->network_header;
- new->mac_header = old->mac_header;
- new->inner_protocol = old->inner_protocol;
- new->inner_transport_header = old->inner_transport_header;
- new->inner_network_header = old->inner_network_header;
- new->inner_mac_header = old->inner_mac_header;
+ memcpy(new->cb, old->cb, sizeof(old->cb));
skb_dst_copy(new, old);
- skb_copy_hash(new, old);
- new->ooo_okay = old->ooo_okay;
- new->no_fcs = old->no_fcs;
- new->encapsulation = old->encapsulation;
- new->encap_hdr_csum = old->encap_hdr_csum;
- new->csum_valid = old->csum_valid;
- new->csum_complete_sw = old->csum_complete_sw;
#ifdef CONFIG_XFRM
new->sp = secpath_get(old->sp);
#endif
- memcpy(new->cb, old->cb, sizeof(old->cb));
- new->csum = old->csum;
- new->ignore_df = old->ignore_df;
- new->pkt_type = old->pkt_type;
- new->ip_summed = old->ip_summed;
- skb_copy_queue_mapping(new, old);
- new->priority = old->priority;
-#if IS_ENABLED(CONFIG_IP_VS)
- new->ipvs_property = old->ipvs_property;
+ __nf_copy(new, old, false);
+
+ /* Note : this field could be in headers_start/headers_end section
+ * It is not yet because we do not want to have a 16 bit hole
+ */
+ new->queue_mapping = old->queue_mapping;
+
+ memcpy(&new->headers_start, &old->headers_start,
+ offsetof(struct sk_buff, headers_end) -
+ offsetof(struct sk_buff, headers_start));
+ CHECK_SKB_FIELD(protocol);
+ CHECK_SKB_FIELD(csum);
+ CHECK_SKB_FIELD(hash);
+ CHECK_SKB_FIELD(priority);
+ CHECK_SKB_FIELD(skb_iif);
+ CHECK_SKB_FIELD(vlan_proto);
+ CHECK_SKB_FIELD(vlan_tci);
+ CHECK_SKB_FIELD(transport_header);
+ CHECK_SKB_FIELD(network_header);
+ CHECK_SKB_FIELD(mac_header);
+ CHECK_SKB_FIELD(inner_protocol);
+ CHECK_SKB_FIELD(inner_transport_header);
+ CHECK_SKB_FIELD(inner_network_header);
+ CHECK_SKB_FIELD(inner_mac_header);
+ CHECK_SKB_FIELD(mark);
+#ifdef CONFIG_NETWORK_SECMARK
+ CHECK_SKB_FIELD(secmark);
+#endif
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ CHECK_SKB_FIELD(napi_id);
#endif
- new->pfmemalloc = old->pfmemalloc;
- new->protocol = old->protocol;
- new->mark = old->mark;
- new->skb_iif = old->skb_iif;
- __nf_copy(new, old);
#ifdef CONFIG_NET_SCHED
- new->tc_index = old->tc_index;
+ CHECK_SKB_FIELD(tc_index);
#ifdef CONFIG_NET_CLS_ACT
- new->tc_verd = old->tc_verd;
+ CHECK_SKB_FIELD(tc_verd);
#endif
#endif
- new->vlan_proto = old->vlan_proto;
- new->vlan_tci = old->vlan_tci;
-
- skb_copy_secmark(new, old);
-#ifdef CONFIG_NET_RX_BUSY_POLL
- new->napi_id = old->napi_id;
-#endif
}
/*
@@ -875,7 +878,6 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
return NULL;
kmemcheck_annotate_bitfield(n, flags1);
- kmemcheck_annotate_bitfield(n, flags2);
n->fclone = SKB_FCLONE_UNAVAILABLE;
}
* Re: [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
@ 2014-09-26 8:35 ` Jesper Dangaard Brouer
2014-09-28 21:37 ` David Miller
2014-09-29 5:18 ` [PATCH v2 " Eric Dumazet
2 siblings, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-26 8:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Amir Vadai, John Fastabend, brouer
On Thu, 25 Sep 2014 14:20:04 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> With proliferation of bit fields in sk_buff, __copy_skb_header() became
> quite expensive, showing as the most expensive function in a GSO
> workload.
>
> __copy_skb_header() performance is also critical for non GSO TCP
> operations, as it is used from skb_clone()
>
> This patch carefully moves all the fields that were not copied in a
> separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
>
> Then I moved all other fields and all other copied fields in a section
> delimited by headers_start[0]/headers_end[0] section so that we
> can use a single memcpy() call, inlined by compiler using long
> word load/stores.
>
> I also tried to make all copies in the natural orders of sk_buff,
> to help hardware prefetching.
>
> I made sure sk_buff size did not change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
I'm impressed, network ninja!
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
It would have been nice if you had included the performance
improvement you saw, e.g. from:
http://thread.gmane.org/gmane.linux.network/332035/focus=332086
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 14:17 [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Eric Dumazet
2014-09-25 14:43 ` Jesper Dangaard Brouer
2014-09-25 15:53 ` [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Alexei Starovoitov
@ 2014-09-28 21:27 ` David Miller
2 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-09-28 21:27 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, amirv
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 25 Sep 2014 07:17:49 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> skb->xmit_more tells us if another skb is coming next.
>
> We need to send doorbell when : xmit_more is not set,
> or txqueue is stopped (preventing next skb to come immediately)
>
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks Eric.
* Re: [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
2014-09-26 8:35 ` Jesper Dangaard Brouer
@ 2014-09-28 21:37 ` David Miller
2014-09-29 4:06 ` David Miller
2014-09-29 5:18 ` [PATCH v2 " Eric Dumazet
2 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2014-09-28 21:37 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, amirv, john.r.fastabend, brouer
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 25 Sep 2014 14:20:04 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> With proliferation of bit fields in sk_buff, __copy_skb_header() became
> quite expensive, showing as the most expensive function in a GSO
> workload.
>
> __copy_skb_header() performance is also critical for non GSO TCP
> operations, as it is used from skb_clone()
>
> This patch carefully moves all the fields that were not copied in a
> separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
>
> Then I moved all other fields and all other copied fields in a section
> delimited by headers_start[0]/headers_end[0] section so that we
> can use a single memcpy() call, inlined by compiler using long
> word load/stores.
>
> I also tried to make all copies in the natural orders of sk_buff,
> to help hardware prefetching.
>
> I made sure sk_buff size did not change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks!
* Re: [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-28 21:37 ` David Miller
@ 2014-09-29 4:06 ` David Miller
2014-09-29 4:57 ` Eric Dumazet
0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2014-09-29 4:06 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, amirv, john.r.fastabend, brouer
From: David Miller <davem@davemloft.net>
Date: Sun, 28 Sep 2014 17:37:59 -0400 (EDT)
> Applied, thanks!
Eric, I had to revert. Please enable IPVS in your build tests.
net/ipv4/ip_output.c: In function ‘ip_copy_metadata’:
net/ipv4/ip_output.c:470:4: error: ‘struct sk_buff’ has no member named ‘ipvs_property’
to->ipvs_property = from->ipvs_property;
^
net/ipv4/ip_output.c:470:26: error: ‘struct sk_buff’ has no member named ‘ipvs_property’
to->ipvs_property = from->ipvs_property;
^
* Re: [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-29 4:06 ` David Miller
@ 2014-09-29 4:57 ` Eric Dumazet
0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-29 4:57 UTC (permalink / raw)
To: David Miller; +Cc: netdev, amirv, john.r.fastabend, brouer
On Mon, 2014-09-29 at 00:06 -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Sun, 28 Sep 2014 17:37:59 -0400 (EDT)
>
> > Applied, thanks!
>
> Eric, I had to revert. Please enable IPVS in your build tests.
>
> net/ipv4/ip_output.c: In function ‘ip_copy_metadata’:
> net/ipv4/ip_output.c:470:4: error: ‘struct sk_buff’ has no member named ‘ipvs_property’
> to->ipvs_property = from->ipvs_property;
> ^
> net/ipv4/ip_output.c:470:26: error: ‘struct sk_buff’ has no member named ‘ipvs_property’
> to->ipvs_property = from->ipvs_property;
Ah sorry, will send a v2.
Thanks
* [PATCH v2 net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
2014-09-26 8:35 ` Jesper Dangaard Brouer
2014-09-28 21:37 ` David Miller
@ 2014-09-29 5:18 ` Eric Dumazet
2014-09-29 16:27 ` David Miller
2 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-09-29 5:18 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Amir Vadai, John Fastabend, Jesper Dangaard Brouer
From: Eric Dumazet <edumazet@google.com>
With the proliferation of bit fields in sk_buff, __copy_skb_header() became
quite expensive, showing up as the most expensive function in a GSO
workload.
__copy_skb_header() performance is also critical for non-GSO TCP
operations, as it is used from skb_clone().
This patch carefully moves all the fields that were not copied into a
separate zone: cloned, nohdr, fclone, peeked, head_frag, xmit_more.
Then I moved all the other (copied) fields into a section delimited by
headers_start[0]/headers_end[0], so that we can use a single memcpy()
call, inlined by the compiler using long-word loads/stores.
I also tried to keep all copies in the natural order of the sk_buff
fields, to help hardware prefetching.
I made sure the sk_buff size did not change.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2 : added back ipvs_property, oops ;)
include/linux/skbuff.h | 133 +++++++++++++++++++++------------------
net/core/skbuff.c | 80 ++++++++++++-----------
2 files changed, 113 insertions(+), 100 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8eaa62400fca..b6cced304b26 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -527,27 +527,41 @@ struct sk_buff {
char cb[48] __aligned(8);
unsigned long _skb_refdst;
+ void (*destructor)(struct sk_buff *skb);
#ifdef CONFIG_XFRM
struct sec_path *sp;
#endif
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+ struct nf_conntrack *nfct;
+#endif
+#ifdef CONFIG_BRIDGE_NETFILTER
+ struct nf_bridge_info *nf_bridge;
+#endif
unsigned int len,
data_len;
__u16 mac_len,
hdr_len;
- union {
- __wsum csum;
- struct {
- __u16 csum_start;
- __u16 csum_offset;
- };
- };
- __u32 priority;
+
+ /* Following fields are _not_ copied in __copy_skb_header()
+ * Note that queue_mapping is here mostly to fill a hole.
+ */
kmemcheck_bitfield_begin(flags1);
- __u8 ignore_df:1,
- cloned:1,
- ip_summed:2,
+ __u16 queue_mapping;
+ __u8 cloned:1,
nohdr:1,
- nfctinfo:3;
+ fclone:2,
+ peeked:1,
+ head_frag:1,
+ xmit_more:1;
+ /* one bit hole */
+ kmemcheck_bitfield_end(flags1);
+
+
+
+ /* fields enclosed in headers_start/headers_end are copied
+ * using a single memcpy() in __copy_skb_header()
+ */
+ __u32 headers_start[0];
/* if you move pkt_type around you also must adapt those constants */
#ifdef __BIG_ENDIAN_BITFIELD
@@ -558,58 +572,53 @@ struct sk_buff {
#define PKT_TYPE_OFFSET() offsetof(struct sk_buff, __pkt_type_offset)
__u8 __pkt_type_offset[0];
- __u8 pkt_type:3,
- fclone:2,
- ipvs_property:1,
- peeked:1,
- nf_trace:1;
- kmemcheck_bitfield_end(flags1);
- __be16 protocol;
-
- void (*destructor)(struct sk_buff *skb);
-#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
- struct nf_conntrack *nfct;
-#endif
-#ifdef CONFIG_BRIDGE_NETFILTER
- struct nf_bridge_info *nf_bridge;
-#endif
-
- int skb_iif;
-
- __u32 hash;
-
- __be16 vlan_proto;
- __u16 vlan_tci;
-
-#ifdef CONFIG_NET_SCHED
- __u16 tc_index; /* traffic control index */
-#ifdef CONFIG_NET_CLS_ACT
- __u16 tc_verd; /* traffic control verdict */
-#endif
-#endif
-
- __u16 queue_mapping;
- kmemcheck_bitfield_begin(flags2);
- __u8 xmit_more:1;
-#ifdef CONFIG_IPV6_NDISC_NODETYPE
- __u8 ndisc_nodetype:2;
-#endif
+ __u8 pkt_type:3;
__u8 pfmemalloc:1;
+ __u8 ignore_df:1;
+ __u8 nfctinfo:3;
+
+ __u8 nf_trace:1;
+ __u8 ip_summed:2;
__u8 ooo_okay:1;
__u8 l4_hash:1;
__u8 sw_hash:1;
__u8 wifi_acked_valid:1;
__u8 wifi_acked:1;
+
__u8 no_fcs:1;
- __u8 head_frag:1;
/* Indicates the inner headers are valid in the skbuff. */
__u8 encapsulation:1;
__u8 encap_hdr_csum:1;
__u8 csum_valid:1;
__u8 csum_complete_sw:1;
- /* 1/3 bit hole (depending on ndisc_nodetype presence) */
- kmemcheck_bitfield_end(flags2);
+ __u8 csum_level:2;
+ __u8 csum_bad:1;
+#ifdef CONFIG_IPV6_NDISC_NODETYPE
+ __u8 ndisc_nodetype:2;
+#endif
+ __u8 ipvs_property:1;
+ /* 5 or 7 bit hole */
+
+#ifdef CONFIG_NET_SCHED
+ __u16 tc_index; /* traffic control index */
+#ifdef CONFIG_NET_CLS_ACT
+ __u16 tc_verd; /* traffic control verdict */
+#endif
+#endif
+
+ union {
+ __wsum csum;
+ struct {
+ __u16 csum_start;
+ __u16 csum_offset;
+ };
+ };
+ __u32 priority;
+ int skb_iif;
+ __u32 hash;
+ __be16 vlan_proto;
+ __u16 vlan_tci;
#if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL
union {
unsigned int napi_id;
@@ -625,19 +634,18 @@ struct sk_buff {
__u32 reserved_tailroom;
};
- kmemcheck_bitfield_begin(flags3);
- __u8 csum_level:2;
- __u8 csum_bad:1;
- /* 13 bit hole */
- kmemcheck_bitfield_end(flags3);
-
__be16 inner_protocol;
__u16 inner_transport_header;
__u16 inner_network_header;
__u16 inner_mac_header;
+
+ __be16 protocol;
__u16 transport_header;
__u16 network_header;
__u16 mac_header;
+
+ __u32 headers_end[0];
+
/* These elements must be at the end, see alloc_skb() for details. */
sk_buff_data_t tail;
sk_buff_data_t end;
@@ -3040,19 +3048,22 @@ static inline void nf_reset_trace(struct sk_buff *skb)
}
/* Note: This doesn't put any conntrack and bridge info in dst. */
-static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src)
+static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src,
+ bool copy)
{
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
dst->nfct = src->nfct;
nf_conntrack_get(src->nfct);
- dst->nfctinfo = src->nfctinfo;
+ if (copy)
+ dst->nfctinfo = src->nfctinfo;
#endif
#ifdef CONFIG_BRIDGE_NETFILTER
dst->nf_bridge = src->nf_bridge;
nf_bridge_get(src->nf_bridge);
#endif
#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE) || defined(CONFIG_NF_TABLES)
- dst->nf_trace = src->nf_trace;
+ if (copy)
+ dst->nf_trace = src->nf_trace;
#endif
}
@@ -3064,7 +3075,7 @@ static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src)
#ifdef CONFIG_BRIDGE_NETFILTER
nf_bridge_put(dst->nf_bridge);
#endif
- __nf_copy(dst, src);
+ __nf_copy(dst, src, true);
}
#ifdef CONFIG_NETWORK_SECMARK
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d4fdc649112c..4be570a4ab21 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -261,7 +261,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
atomic_t *fclone_ref = (atomic_t *) (child + 1);
kmemcheck_annotate_bitfield(child, flags1);
- kmemcheck_annotate_bitfield(child, flags2);
skb->fclone = SKB_FCLONE_ORIG;
atomic_set(fclone_ref, 1);
@@ -675,57 +674,61 @@ void consume_skb(struct sk_buff *skb)
}
EXPORT_SYMBOL(consume_skb);
+/* Make sure a field is enclosed inside headers_start/headers_end section */
+#define CHECK_SKB_FIELD(field) \
+ BUILD_BUG_ON(offsetof(struct sk_buff, field) < \
+ offsetof(struct sk_buff, headers_start)); \
+ BUILD_BUG_ON(offsetof(struct sk_buff, field) > \
+ offsetof(struct sk_buff, headers_end)); \
+
static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
{
new->tstamp = old->tstamp;
+ /* We do not copy old->sk */
new->dev = old->dev;
- new->transport_header = old->transport_header;
- new->network_header = old->network_header;
- new->mac_header = old->mac_header;
- new->inner_protocol = old->inner_protocol;
- new->inner_transport_header = old->inner_transport_header;
- new->inner_network_header = old->inner_network_header;
- new->inner_mac_header = old->inner_mac_header;
+ memcpy(new->cb, old->cb, sizeof(old->cb));
skb_dst_copy(new, old);
- skb_copy_hash(new, old);
- new->ooo_okay = old->ooo_okay;
- new->no_fcs = old->no_fcs;
- new->encapsulation = old->encapsulation;
- new->encap_hdr_csum = old->encap_hdr_csum;
- new->csum_valid = old->csum_valid;
- new->csum_complete_sw = old->csum_complete_sw;
#ifdef CONFIG_XFRM
new->sp = secpath_get(old->sp);
#endif
- memcpy(new->cb, old->cb, sizeof(old->cb));
- new->csum = old->csum;
- new->ignore_df = old->ignore_df;
- new->pkt_type = old->pkt_type;
- new->ip_summed = old->ip_summed;
- skb_copy_queue_mapping(new, old);
- new->priority = old->priority;
-#if IS_ENABLED(CONFIG_IP_VS)
- new->ipvs_property = old->ipvs_property;
+ __nf_copy(new, old, false);
+
+ /* Note : this field could be in headers_start/headers_end section
+ * It is not yet because we do not want to have a 16 bit hole
+ */
+ new->queue_mapping = old->queue_mapping;
+
+ memcpy(&new->headers_start, &old->headers_start,
+ offsetof(struct sk_buff, headers_end) -
+ offsetof(struct sk_buff, headers_start));
+ CHECK_SKB_FIELD(protocol);
+ CHECK_SKB_FIELD(csum);
+ CHECK_SKB_FIELD(hash);
+ CHECK_SKB_FIELD(priority);
+ CHECK_SKB_FIELD(skb_iif);
+ CHECK_SKB_FIELD(vlan_proto);
+ CHECK_SKB_FIELD(vlan_tci);
+ CHECK_SKB_FIELD(transport_header);
+ CHECK_SKB_FIELD(network_header);
+ CHECK_SKB_FIELD(mac_header);
+ CHECK_SKB_FIELD(inner_protocol);
+ CHECK_SKB_FIELD(inner_transport_header);
+ CHECK_SKB_FIELD(inner_network_header);
+ CHECK_SKB_FIELD(inner_mac_header);
+ CHECK_SKB_FIELD(mark);
+#ifdef CONFIG_NETWORK_SECMARK
+ CHECK_SKB_FIELD(secmark);
+#endif
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ CHECK_SKB_FIELD(napi_id);
#endif
- new->pfmemalloc = old->pfmemalloc;
- new->protocol = old->protocol;
- new->mark = old->mark;
- new->skb_iif = old->skb_iif;
- __nf_copy(new, old);
#ifdef CONFIG_NET_SCHED
- new->tc_index = old->tc_index;
+ CHECK_SKB_FIELD(tc_index);
#ifdef CONFIG_NET_CLS_ACT
- new->tc_verd = old->tc_verd;
+ CHECK_SKB_FIELD(tc_verd);
#endif
#endif
- new->vlan_proto = old->vlan_proto;
- new->vlan_tci = old->vlan_tci;
-
- skb_copy_secmark(new, old);
-#ifdef CONFIG_NET_RX_BUSY_POLL
- new->napi_id = old->napi_id;
-#endif
}
/*
@@ -876,7 +879,6 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
return NULL;
kmemcheck_annotate_bitfield(n, flags1);
- kmemcheck_annotate_bitfield(n, flags2);
n->fclone = SKB_FCLONE_UNAVAILABLE;
}
* Re: [PATCH v2 net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-29 5:18 ` [PATCH v2 " Eric Dumazet
@ 2014-09-29 16:27 ` David Miller
0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-09-29 16:27 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, amirv, john.r.fastabend, brouer
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 28 Sep 2014 22:18:47 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> With proliferation of bit fields in sk_buff, __copy_skb_header() became
> quite expensive, showing as the most expensive function in a GSO
> workload.
>
> __copy_skb_header() performance is also critical for non GSO TCP
> operations, as it is used from skb_clone()
>
> This patch carefully moves all the fields that were not copied in a
> separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
>
> Then I moved all other fields and all other copied fields in a section
> delimited by headers_start[0]/headers_end[0] section so that we
> can use a single memcpy() call, inlined by compiler using long
> word load/stores.
>
> I also tried to make all copies in the natural orders of sk_buff,
> to help hardware prefetching.
>
> I made sure sk_buff size did not change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> v2 : added back ipvs_property, oops ;)
Applied :-)