* [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
@ 2014-09-25 14:17 Eric Dumazet
2014-09-25 14:43 ` Jesper Dangaard Brouer
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 14:17 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Amir Vadai
From: Eric Dumazet <edumazet@google.com>
skb->xmit_more tells us if another skb is coming next.
We need to send the doorbell when xmit_more is not set,
or when the txqueue is stopped (preventing the next skb from coming immediately).
Tested with a modified pktgen version, I got a 40% increase of
throughput.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 23 +++++++++++++------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index c44f4237b9be..adedc47e947d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -667,6 +667,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
int lso_header_size;
void *fragptr;
bool bounce = false;
+ bool send_doorbell;
if (!priv->port_up)
goto tx_drop;
@@ -878,12 +879,16 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
skb_tx_timestamp(skb);
- if (ring->bf_enabled && desc_size <= MAX_BF && !bounce && !vlan_tx_tag_present(skb)) {
+ send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
+
+ if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
+ !vlan_tx_tag_present(skb) && send_doorbell) {
tx_desc->ctrl.bf_qpn |= cpu_to_be32(ring->doorbell_qpn);
op_own |= htonl((bf_index & 0xffff) << 8);
- /* Ensure new descirptor hits memory
- * before setting ownership of this descriptor to HW */
+ /* Ensure new descriptor hits memory
+ * before setting ownership of this descriptor to HW
+ */
wmb();
tx_desc->ctrl.owner_opcode = op_own;
@@ -896,12 +901,16 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
ring->bf.offset ^= ring->bf.buf_size;
} else {
- /* Ensure new descirptor hits memory
- * before setting ownership of this descriptor to HW */
+ /* Ensure new descriptor hits memory
+ * before setting ownership of this descriptor to HW
+ */
wmb();
tx_desc->ctrl.owner_opcode = op_own;
- wmb();
- iowrite32be(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
+ if (send_doorbell) {
+ wmb();
+ iowrite32be(ring->doorbell_qpn,
+ ring->bf.uar->map + MLX4_SEND_DOORBELL);
+ }
}
return NETDEV_TX_OK;
^ permalink raw reply related [flat|nested] 19+ messages in thread

* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 14:17 [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Eric Dumazet
@ 2014-09-25 14:43 ` Jesper Dangaard Brouer
2014-09-25 15:06 ` Eric Dumazet
2014-09-25 15:53 ` [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Alexei Starovoitov
2014-09-28 21:27 ` David Miller
2 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-25 14:43 UTC (permalink / raw)
To: Eric Dumazet; +Cc: brouer, David Miller, netdev, Amir Vadai, John Fastabend

On Thu, 25 Sep 2014 07:17:49 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
>
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.

Pktgen is artificial benchmarking, you know ;-) :-P

If you really must use pktgen for this, you can use an unmodified
pktgen against the qdisc layer by adding a VLAN interface on-top of the
device you want to test. I blame John, for telling me this ;-)

/me running away

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 14:43 ` Jesper Dangaard Brouer
@ 2014-09-25 15:06 ` Eric Dumazet
2014-09-25 15:36 ` Eric Dumazet
0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 15:06 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend

On Thu, 2014-09-25 at 16:43 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 25 Sep 2014 07:17:49 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Tested with a modified pktgen version, I got a 40% increase of
> > throughput.
>
> Pktgen is artificial benchmarking, you know ;-) :-P
>
> If you really must use pktgen for this, you can use an unmodified
> pktgen against the qdisc layer by adding a VLAN interface on-top of the
> device you want to test. I blame John, for telling me this ;-)
>
> /me running away

But... Nothing yet in qdisc layer sets xmit_more, unless TSO is
disabled, and a GSO packet is segmented.

So you might notice some improvement with this patch, but not if you use
TSO, and a 40GB NIC is better with TSO on you know ;)

^ permalink raw reply [flat|nested] 19+ messages in thread
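[ Illustrative sketch: the generic form of the doorbell-deferral pattern discussed
in this thread. The foo_* names and the ring layout are placeholders, not the
mlx4 code; only skb->xmit_more and netif_xmit_stopped() are real interfaces of
that era. ]

	static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		struct foo_ring *ring = foo_select_ring(dev, skb);
		bool send_doorbell;

		foo_fill_descriptors(ring, skb);	/* queue the packet in the TX ring */

		/* Ring the doorbell only when no further skb is coming, or when the
		 * queue just stopped (nothing else would trigger the send for us).
		 */
		send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->txq);
		if (send_doorbell) {
			wmb();		/* descriptors visible to HW before the doorbell */
			foo_ring_doorbell(ring);
		}
		return NETDEV_TX_OK;
	}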
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:06 ` Eric Dumazet
@ 2014-09-25 15:36 ` Eric Dumazet
2014-09-25 15:40 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 15:36 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend

On Thu, 2014-09-25 at 08:06 -0700, Eric Dumazet wrote:
>
> But... Nothing yet in qdisc layer sets xmit_more, unless TSO is
> disabled, and a GSO packet is segmented.
>
> So you might notice some improvement with this patch, but not if you use
> TSO, and a 40GB NIC is better with TSO on you know ;)
>

So playing with TSO=off (but GSO=on), we see about 12 % increase of
throughput on a TCP_STREAM flow.

And kernel profile clearly shows the difference of sending or not the
doorbell, as mlx4_en_xmit() shifts.

Note also mlx4_en_free_tx_desc() disappears, this might point to some
false sharing or something worth investigating.

Before patch :

26.36%  [kernel]  [k] __copy_skb_header
16.59%  [kernel]  [k] mlx4_en_xmit
 5.62%  [kernel]  [k] __alloc_skb
 5.48%  [kernel]  [k] copy_user_enhanced_fast_string
 4.48%  [kernel]  [k] skb_segment
 2.46%  [kernel]  [k] mlx4_en_free_tx_desc.isra.27
 2.31%  [kernel]  [k] _raw_spin_lock
 2.10%  [kernel]  [k] memcpy
 2.01%  [kernel]  [k] tcp_sendmsg
 1.62%  [kernel]  [k] __iowrite64_copy

After patch :

32.78%  [kernel]  [k] __copy_skb_header
 8.26%  [kernel]  [k] mlx4_en_xmit
 7.25%  [kernel]  [k] __alloc_skb
 7.18%  [kernel]  [k] copy_user_enhanced_fast_string
 4.39%  [kernel]  [k] skb_segment
 2.87%  [kernel]  [k] memcpy
 2.59%  [kernel]  [k] tcp_sendmsg
 2.50%  [kernel]  [k] _raw_spin_lock
 2.38%  [kernel]  [k] dev_hard_start_xmit
 1.52%  [kernel]  [k] tcp_gso_segment
 1.50%  [kernel]  [k] kmem_cache_alloc_node_trace
 1.40%  [kernel]  [k] kmem_cache_alloc_node
 1.16%  [kernel]  [k] ip_send_check

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:36 ` Eric Dumazet
@ 2014-09-25 15:40 ` Eric Dumazet
2014-09-25 16:19 ` Eric Dumazet
2014-09-25 20:47 ` Eric Dumazet
2 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 15:40 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend

On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:

> So playing with TSO=off (but GSO=on), we see about 12 % increase of
> throughput on a TCP_STREAM flow.

Correction, this was a 20 % increase. ( 849284 pps -> 1023469 pps)

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:36 ` Eric Dumazet
2014-09-25 15:40 ` Eric Dumazet
@ 2014-09-25 16:19 ` Eric Dumazet
2014-09-25 20:47 ` Eric Dumazet
2 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 16:19 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend

On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:

> 26.36%  [kernel]  [k] __copy_skb_header

Note to myself : Time to optimize this thing, doing a memset()
to copy a bunch of consecutive fields.

^ permalink raw reply [flat|nested] 19+ messages in thread
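[ Illustrative sketch: the marker-array idiom Eric hints at here and implements
in the patch posted later in this thread. Fields between two zero-size arrays
can be copied with one memcpy(); the struct below uses placeholder fields, not
the real struct sk_buff layout. ]

	struct pkt_meta {
		unsigned long	refcnt;			/* NOT copied */

		__u32		headers_start[0];	/* zero-size marker (GNU C) */
		__u32		priority;
		__u32		hash;
		__u16		vlan_tci;
		__u16		queue_mapping;
		__u32		headers_end[0];		/* zero-size marker */
	};

	static void copy_meta(struct pkt_meta *new, const struct pkt_meta *old)
	{
		/* one memcpy() instead of one assignment per field */
		memcpy(&new->headers_start, &old->headers_start,
		       offsetof(struct pkt_meta, headers_end) -
		       offsetof(struct pkt_meta, headers_start));
	}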
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:36 ` Eric Dumazet
2014-09-25 15:40 ` Eric Dumazet
2014-09-25 16:19 ` Eric Dumazet
@ 2014-09-25 20:47 ` Eric Dumazet
2014-09-25 20:58 ` Joe Perches
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
2 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 20:47 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: David Miller, netdev, Amir Vadai, John Fastabend

On Thu, 2014-09-25 at 08:36 -0700, Eric Dumazet wrote:

> After patch :
>
> 32.78%  [kernel]  [k] __copy_skb_header
>  8.26%  [kernel]  [k] mlx4_en_xmit
>  7.25%  [kernel]  [k] __alloc_skb
>  7.18%  [kernel]  [k] copy_user_enhanced_fast_string
>  4.39%  [kernel]  [k] skb_segment
>  2.87%  [kernel]  [k] memcpy
>  2.59%  [kernel]  [k] tcp_sendmsg
>  2.50%  [kernel]  [k] _raw_spin_lock
>  2.38%  [kernel]  [k] dev_hard_start_xmit
>  1.52%  [kernel]  [k] tcp_gso_segment
>  1.50%  [kernel]  [k] kmem_cache_alloc_node_trace
>  1.40%  [kernel]  [k] kmem_cache_alloc_node
>  1.16%  [kernel]  [k] ip_send_check
>

After __copy_skb_header() optimization I get something even nicer ;)

19.90%  [kernel]  [k] __copy_skb_header
15.16%  [kernel]  [k] skb_segment
 7.02%  [kernel]  [k] __alloc_skb
 6.89%  [kernel]  [k] copy_user_enhanced_fast_string
 6.60%  [kernel]  [k] mlx4_en_xmit
 2.70%  [kernel]  [k] _raw_spin_lock
 2.68%  [kernel]  [k] memcpy
 2.41%  [kernel]  [k] tcp_sendmsg

I'll post this patch asap.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 20:47 ` Eric Dumazet
@ 2014-09-25 20:58 ` Joe Perches
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
1 sibling, 0 replies; 19+ messages in thread
From: Joe Perches @ 2014-09-25 20:58 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jesper Dangaard Brouer, David Miller, netdev, Amir Vadai, John Fastabend

On Thu, 2014-09-25 at 13:47 -0700, Eric Dumazet wrote:
> After patch :
> 32.78%  [kernel]  [k] __copy_skb_header
> After __copy_skb_header() optimization I get something even nicer ;)
> 19.90%  [kernel]  [k] __copy_skb_header
> I'll post this patch asap.

Eric, you are _killing_ it. network ninja indeed...

^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() 2014-09-25 20:47 ` Eric Dumazet 2014-09-25 20:58 ` Joe Perches @ 2014-09-25 21:20 ` Eric Dumazet 2014-09-26 8:35 ` Jesper Dangaard Brouer ` (2 more replies) 1 sibling, 3 replies; 19+ messages in thread From: Eric Dumazet @ 2014-09-25 21:20 UTC (permalink / raw) To: David Miller; +Cc: netdev, Amir Vadai, John Fastabend, Jesper Dangaard Brouer From: Eric Dumazet <edumazet@google.com> With proliferation of bit fields in sk_buff, __copy_skb_header() became quite expensive, showing as the most expensive function in a GSO workload. __copy_skb_header() performance is also critical for non GSO TCP operations, as it is used from skb_clone() This patch carefully moves all the fields that were not copied in a separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more Then I moved all other fields and all other copied fields in a section delimited by headers_start[0]/headers_end[0] section so that we can use a single memcpy() call, inlined by compiler using long word load/stores. I also tried to make all copies in the natural orders of sk_buff, to help hardware prefetching. I made sure sk_buff size did not change. Signed-off-by: Eric Dumazet <edumazet@google.com> --- include/linux/skbuff.h | 132 ++++++++++++++++++++------------------- net/core/skbuff.c | 80 ++++++++++++----------- 2 files changed, 112 insertions(+), 100 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f1bfa3781c75..180f55b9ef21 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -527,27 +527,41 @@ struct sk_buff { char cb[48] __aligned(8); unsigned long _skb_refdst; + void (*destructor)(struct sk_buff *skb); #ifdef CONFIG_XFRM struct sec_path *sp; #endif +#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) + struct nf_conntrack *nfct; +#endif +#ifdef CONFIG_BRIDGE_NETFILTER + struct nf_bridge_info *nf_bridge; +#endif unsigned int len, data_len; __u16 mac_len, hdr_len; - union { - __wsum csum; - struct { - __u16 csum_start; - __u16 csum_offset; - }; - }; - __u32 priority; + + /* Following fields are _not_ copied in __copy_skb_header() + * Note that queue_mapping is here mostly to fill a hole. 
+ */ kmemcheck_bitfield_begin(flags1); - __u8 ignore_df:1, - cloned:1, - ip_summed:2, + __u16 queue_mapping; + __u8 cloned:1, nohdr:1, - nfctinfo:3; + fclone:2, + peeked:1, + head_frag:1, + xmit_more:1; + /* one bit hole */ + kmemcheck_bitfield_end(flags1); + + + + /* fields enclosed in headers_start/headers_end are copied + * using a single memcpy() in __copy_skb_header() + */ + __u32 headers_start[0]; /* if you move pkt_type around you also must adapt those constants */ #ifdef __BIG_ENDIAN_BITFIELD @@ -558,58 +572,52 @@ struct sk_buff { #define PKT_TYPE_OFFSET() offsetof(struct sk_buff, __pkt_type_offset) __u8 __pkt_type_offset[0]; - __u8 pkt_type:3, - fclone:2, - ipvs_property:1, - peeked:1, - nf_trace:1; - kmemcheck_bitfield_end(flags1); - __be16 protocol; - - void (*destructor)(struct sk_buff *skb); -#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) - struct nf_conntrack *nfct; -#endif -#ifdef CONFIG_BRIDGE_NETFILTER - struct nf_bridge_info *nf_bridge; -#endif - - int skb_iif; - - __u32 hash; - - __be16 vlan_proto; - __u16 vlan_tci; - -#ifdef CONFIG_NET_SCHED - __u16 tc_index; /* traffic control index */ -#ifdef CONFIG_NET_CLS_ACT - __u16 tc_verd; /* traffic control verdict */ -#endif -#endif - - __u16 queue_mapping; - kmemcheck_bitfield_begin(flags2); - __u8 xmit_more:1; -#ifdef CONFIG_IPV6_NDISC_NODETYPE - __u8 ndisc_nodetype:2; -#endif + __u8 pkt_type:3; __u8 pfmemalloc:1; + __u8 ignore_df:1; + __u8 nfctinfo:3; + + __u8 nf_trace:1; + __u8 ip_summed:2; __u8 ooo_okay:1; __u8 l4_hash:1; __u8 sw_hash:1; __u8 wifi_acked_valid:1; __u8 wifi_acked:1; + __u8 no_fcs:1; - __u8 head_frag:1; /* Indicates the inner headers are valid in the skbuff. */ __u8 encapsulation:1; __u8 encap_hdr_csum:1; __u8 csum_valid:1; __u8 csum_complete_sw:1; - /* 1/3 bit hole (depending on ndisc_nodetype presence) */ - kmemcheck_bitfield_end(flags2); + __u8 csum_level:2; + __u8 csum_bad:1; +#ifdef CONFIG_IPV6_NDISC_NODETYPE + __u8 ndisc_nodetype:2; +#endif + /* 6/8 bit hole */ + +#ifdef CONFIG_NET_SCHED + __u16 tc_index; /* traffic control index */ +#ifdef CONFIG_NET_CLS_ACT + __u16 tc_verd; /* traffic control verdict */ +#endif +#endif + + union { + __wsum csum; + struct { + __u16 csum_start; + __u16 csum_offset; + }; + }; + __u32 priority; + int skb_iif; + __u32 hash; + __be16 vlan_proto; + __u16 vlan_tci; #if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL union { unsigned int napi_id; @@ -625,19 +633,18 @@ struct sk_buff { __u32 reserved_tailroom; }; - kmemcheck_bitfield_begin(flags3); - __u8 csum_level:2; - __u8 csum_bad:1; - /* 13 bit hole */ - kmemcheck_bitfield_end(flags3); - __be16 inner_protocol; __u16 inner_transport_header; __u16 inner_network_header; __u16 inner_mac_header; + + __be16 protocol; __u16 transport_header; __u16 network_header; __u16 mac_header; + + __u32 headers_end[0]; + /* These elements must be at the end, see alloc_skb() for details. */ sk_buff_data_t tail; sk_buff_data_t end; @@ -3025,19 +3032,22 @@ static inline void nf_reset_trace(struct sk_buff *skb) } /* Note: This doesn't put any conntrack and bridge info in dst. 
*/ -static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src) +static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src, + bool copy) { #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) dst->nfct = src->nfct; nf_conntrack_get(src->nfct); - dst->nfctinfo = src->nfctinfo; + if (copy) + dst->nfctinfo = src->nfctinfo; #endif #ifdef CONFIG_BRIDGE_NETFILTER dst->nf_bridge = src->nf_bridge; nf_bridge_get(src->nf_bridge); #endif #if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE) || defined(CONFIG_NF_TABLES) - dst->nf_trace = src->nf_trace; + if (copy) + dst->nf_trace = src->nf_trace; #endif } @@ -3049,7 +3059,7 @@ static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src) #ifdef CONFIG_BRIDGE_NETFILTER nf_bridge_put(dst->nf_bridge); #endif - __nf_copy(dst, src); + __nf_copy(dst, src, true); } #ifdef CONFIG_NETWORK_SECMARK diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 06a8feb10099..57821927700e 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -261,7 +261,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, atomic_t *fclone_ref = (atomic_t *) (child + 1); kmemcheck_annotate_bitfield(child, flags1); - kmemcheck_annotate_bitfield(child, flags2); skb->fclone = SKB_FCLONE_ORIG; atomic_set(fclone_ref, 1); @@ -674,57 +673,61 @@ void consume_skb(struct sk_buff *skb) } EXPORT_SYMBOL(consume_skb); +/* Make sure a field is enclosed inside headers_start/headers_end section */ +#define CHECK_SKB_FIELD(field) \ + BUILD_BUG_ON(offsetof(struct sk_buff, field) < \ + offsetof(struct sk_buff, headers_start)); \ + BUILD_BUG_ON(offsetof(struct sk_buff, field) >= \ + offsetof(struct sk_buff, headers_end)); \ + static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old) { new->tstamp = old->tstamp; + /* We do not copy old->sk */ new->dev = old->dev; - new->transport_header = old->transport_header; - new->network_header = old->network_header; - new->mac_header = old->mac_header; - new->inner_protocol = old->inner_protocol; - new->inner_transport_header = old->inner_transport_header; - new->inner_network_header = old->inner_network_header; - new->inner_mac_header = old->inner_mac_header; + memcpy(new->cb, old->cb, sizeof(old->cb)); skb_dst_copy(new, old); - skb_copy_hash(new, old); - new->ooo_okay = old->ooo_okay; - new->no_fcs = old->no_fcs; - new->encapsulation = old->encapsulation; - new->encap_hdr_csum = old->encap_hdr_csum; - new->csum_valid = old->csum_valid; - new->csum_complete_sw = old->csum_complete_sw; #ifdef CONFIG_XFRM new->sp = secpath_get(old->sp); #endif - memcpy(new->cb, old->cb, sizeof(old->cb)); - new->csum = old->csum; - new->ignore_df = old->ignore_df; - new->pkt_type = old->pkt_type; - new->ip_summed = old->ip_summed; - skb_copy_queue_mapping(new, old); - new->priority = old->priority; -#if IS_ENABLED(CONFIG_IP_VS) - new->ipvs_property = old->ipvs_property; + __nf_copy(new, old, false); + + /* Note : this field could be in headers_start/headers_end section + * It is not yet because we do not want to have a 16 bit hole + */ + new->queue_mapping = old->queue_mapping; + + memcpy(&new->headers_start, &old->headers_start, + offsetof(struct sk_buff, headers_end) - + offsetof(struct sk_buff, headers_start)); + CHECK_SKB_FIELD(protocol); + CHECK_SKB_FIELD(csum); + CHECK_SKB_FIELD(hash); + CHECK_SKB_FIELD(priority); + CHECK_SKB_FIELD(skb_iif); + CHECK_SKB_FIELD(vlan_proto); + CHECK_SKB_FIELD(vlan_tci); + CHECK_SKB_FIELD(transport_header); + CHECK_SKB_FIELD(network_header); + 
CHECK_SKB_FIELD(mac_header); + CHECK_SKB_FIELD(inner_protocol); + CHECK_SKB_FIELD(inner_transport_header); + CHECK_SKB_FIELD(inner_network_header); + CHECK_SKB_FIELD(inner_mac_header); + CHECK_SKB_FIELD(mark); +#ifdef CONFIG_NETWORK_SECMARK + CHECK_SKB_FIELD(secmark); +#endif +#ifdef CONFIG_NET_RX_BUSY_POLL + CHECK_SKB_FIELD(napi_id); #endif - new->pfmemalloc = old->pfmemalloc; - new->protocol = old->protocol; - new->mark = old->mark; - new->skb_iif = old->skb_iif; - __nf_copy(new, old); #ifdef CONFIG_NET_SCHED - new->tc_index = old->tc_index; + CHECK_SKB_FIELD(tc_index); #ifdef CONFIG_NET_CLS_ACT - new->tc_verd = old->tc_verd; + CHECK_SKB_FIELD(tc_verd); #endif #endif - new->vlan_proto = old->vlan_proto; - new->vlan_tci = old->vlan_tci; - - skb_copy_secmark(new, old); -#ifdef CONFIG_NET_RX_BUSY_POLL - new->napi_id = old->napi_id; -#endif } /* @@ -875,7 +878,6 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask) return NULL; kmemcheck_annotate_bitfield(n, flags1); - kmemcheck_annotate_bitfield(n, flags2); n->fclone = SKB_FCLONE_UNAVAILABLE; } ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
@ 2014-09-26 8:35 ` Jesper Dangaard Brouer
2014-09-28 21:37 ` David Miller
2014-09-29 5:18 ` [PATCH v2 " Eric Dumazet
2 siblings, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-26 8:35 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Amir Vadai, John Fastabend, brouer

On Thu, 25 Sep 2014 14:20:04 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
>
> With proliferation of bit fields in sk_buff, __copy_skb_header() became
> quite expensive, showing as the most expensive function in a GSO
> workload.
>
> __copy_skb_header() performance is also critical for non GSO TCP
> operations, as it is used from skb_clone()
>
> This patch carefully moves all the fields that were not copied in a
> separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
>
> Then I moved all other fields and all other copied fields in a section
> delimited by headers_start[0]/headers_end[0] section so that we
> can use a single memcpy() call, inlined by compiler using long
> word load/stores.
>
> I also tried to make all copies in the natural orders of sk_buff,
> to help hardware prefetching.
>
> I made sure sk_buff size did not change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

I'm impressed, network ninja!

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

It would have been nice if you would have included the performance
improvement you saw, e.g. from:
http://thread.gmane.org/gmane.linux.network/332035/focus=332086

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
2014-09-26 8:35 ` Jesper Dangaard Brouer
@ 2014-09-28 21:37 ` David Miller
2014-09-29 4:06 ` David Miller
2014-09-29 5:18 ` [PATCH v2 " Eric Dumazet
2 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2014-09-28 21:37 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, amirv, john.r.fastabend, brouer

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 25 Sep 2014 14:20:04 -0700

> From: Eric Dumazet <edumazet@google.com>
>
> With proliferation of bit fields in sk_buff, __copy_skb_header() became
> quite expensive, showing as the most expensive function in a GSO
> workload.
>
> __copy_skb_header() performance is also critical for non GSO TCP
> operations, as it is used from skb_clone()
>
> This patch carefully moves all the fields that were not copied in a
> separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
>
> Then I moved all other fields and all other copied fields in a section
> delimited by headers_start[0]/headers_end[0] section so that we
> can use a single memcpy() call, inlined by compiler using long
> word load/stores.
>
> I also tried to make all copies in the natural orders of sk_buff,
> to help hardware prefetching.
>
> I made sure sk_buff size did not change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks!

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-28 21:37 ` David Miller
@ 2014-09-29 4:06 ` David Miller
2014-09-29 4:57 ` Eric Dumazet
0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2014-09-29 4:06 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, amirv, john.r.fastabend, brouer

From: David Miller <davem@davemloft.net>
Date: Sun, 28 Sep 2014 17:37:59 -0400 (EDT)

> Applied, thanks!

Eric, I had to revert. Please enable IPVS in your build tests.

net/ipv4/ip_output.c: In function ‘ip_copy_metadata’:
net/ipv4/ip_output.c:470:4: error: ‘struct sk_buff’ has no member named ‘ipvs_property’
  to->ipvs_property = from->ipvs_property;
  ^
net/ipv4/ip_output.c:470:26: error: ‘struct sk_buff’ has no member named ‘ipvs_property’
  to->ipvs_property = from->ipvs_property;
                          ^

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-29 4:06 ` David Miller
@ 2014-09-29 4:57 ` Eric Dumazet
0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2014-09-29 4:57 UTC (permalink / raw)
To: David Miller; +Cc: netdev, amirv, john.r.fastabend, brouer

On Mon, 2014-09-29 at 00:06 -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Sun, 28 Sep 2014 17:37:59 -0400 (EDT)
>
> > Applied, thanks!
>
> Eric, I had to revert. Please enable IPVS in your build tests.
>
> net/ipv4/ip_output.c: In function ‘ip_copy_metadata’:
> net/ipv4/ip_output.c:470:4: error: ‘struct sk_buff’ has no member named ‘ipvs_property’
>   to->ipvs_property = from->ipvs_property;
>   ^
> net/ipv4/ip_output.c:470:26: error: ‘struct sk_buff’ has no member named ‘ipvs_property’
>   to->ipvs_property = from->ipvs_property;

Ah sorry, will send a v2.

Thanks

^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 net-next] net: reorganize sk_buff for faster __copy_skb_header() 2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet 2014-09-26 8:35 ` Jesper Dangaard Brouer 2014-09-28 21:37 ` David Miller @ 2014-09-29 5:18 ` Eric Dumazet 2014-09-29 16:27 ` David Miller 2 siblings, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2014-09-29 5:18 UTC (permalink / raw) To: David Miller; +Cc: netdev, Amir Vadai, John Fastabend, Jesper Dangaard Brouer From: Eric Dumazet <edumazet@google.com> With proliferation of bit fields in sk_buff, __copy_skb_header() became quite expensive, showing as the most expensive function in a GSO workload. __copy_skb_header() performance is also critical for non GSO TCP operations, as it is used from skb_clone() This patch carefully moves all the fields that were not copied in a separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more Then I moved all other fields and all other copied fields in a section delimited by headers_start[0]/headers_end[0] section so that we can use a single memcpy() call, inlined by compiler using long word load/stores. I also tried to make all copies in the natural orders of sk_buff, to help hardware prefetching. I made sure sk_buff size did not change. Signed-off-by: Eric Dumazet <edumazet@google.com> --- v2 : added back ipvs_property, oops ;) include/linux/skbuff.h | 133 +++++++++++++++++++++------------------ net/core/skbuff.c | 80 ++++++++++++----------- 2 files changed, 113 insertions(+), 100 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 8eaa62400fca..b6cced304b26 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -527,27 +527,41 @@ struct sk_buff { char cb[48] __aligned(8); unsigned long _skb_refdst; + void (*destructor)(struct sk_buff *skb); #ifdef CONFIG_XFRM struct sec_path *sp; #endif +#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) + struct nf_conntrack *nfct; +#endif +#ifdef CONFIG_BRIDGE_NETFILTER + struct nf_bridge_info *nf_bridge; +#endif unsigned int len, data_len; __u16 mac_len, hdr_len; - union { - __wsum csum; - struct { - __u16 csum_start; - __u16 csum_offset; - }; - }; - __u32 priority; + + /* Following fields are _not_ copied in __copy_skb_header() + * Note that queue_mapping is here mostly to fill a hole. 
+ */ kmemcheck_bitfield_begin(flags1); - __u8 ignore_df:1, - cloned:1, - ip_summed:2, + __u16 queue_mapping; + __u8 cloned:1, nohdr:1, - nfctinfo:3; + fclone:2, + peeked:1, + head_frag:1, + xmit_more:1; + /* one bit hole */ + kmemcheck_bitfield_end(flags1); + + + + /* fields enclosed in headers_start/headers_end are copied + * using a single memcpy() in __copy_skb_header() + */ + __u32 headers_start[0]; /* if you move pkt_type around you also must adapt those constants */ #ifdef __BIG_ENDIAN_BITFIELD @@ -558,58 +572,53 @@ struct sk_buff { #define PKT_TYPE_OFFSET() offsetof(struct sk_buff, __pkt_type_offset) __u8 __pkt_type_offset[0]; - __u8 pkt_type:3, - fclone:2, - ipvs_property:1, - peeked:1, - nf_trace:1; - kmemcheck_bitfield_end(flags1); - __be16 protocol; - - void (*destructor)(struct sk_buff *skb); -#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) - struct nf_conntrack *nfct; -#endif -#ifdef CONFIG_BRIDGE_NETFILTER - struct nf_bridge_info *nf_bridge; -#endif - - int skb_iif; - - __u32 hash; - - __be16 vlan_proto; - __u16 vlan_tci; - -#ifdef CONFIG_NET_SCHED - __u16 tc_index; /* traffic control index */ -#ifdef CONFIG_NET_CLS_ACT - __u16 tc_verd; /* traffic control verdict */ -#endif -#endif - - __u16 queue_mapping; - kmemcheck_bitfield_begin(flags2); - __u8 xmit_more:1; -#ifdef CONFIG_IPV6_NDISC_NODETYPE - __u8 ndisc_nodetype:2; -#endif + __u8 pkt_type:3; __u8 pfmemalloc:1; + __u8 ignore_df:1; + __u8 nfctinfo:3; + + __u8 nf_trace:1; + __u8 ip_summed:2; __u8 ooo_okay:1; __u8 l4_hash:1; __u8 sw_hash:1; __u8 wifi_acked_valid:1; __u8 wifi_acked:1; + __u8 no_fcs:1; - __u8 head_frag:1; /* Indicates the inner headers are valid in the skbuff. */ __u8 encapsulation:1; __u8 encap_hdr_csum:1; __u8 csum_valid:1; __u8 csum_complete_sw:1; - /* 1/3 bit hole (depending on ndisc_nodetype presence) */ - kmemcheck_bitfield_end(flags2); + __u8 csum_level:2; + __u8 csum_bad:1; +#ifdef CONFIG_IPV6_NDISC_NODETYPE + __u8 ndisc_nodetype:2; +#endif + __u8 ipvs_property:1; + /* 5 or 7 bit hole */ + +#ifdef CONFIG_NET_SCHED + __u16 tc_index; /* traffic control index */ +#ifdef CONFIG_NET_CLS_ACT + __u16 tc_verd; /* traffic control verdict */ +#endif +#endif + + union { + __wsum csum; + struct { + __u16 csum_start; + __u16 csum_offset; + }; + }; + __u32 priority; + int skb_iif; + __u32 hash; + __be16 vlan_proto; + __u16 vlan_tci; #if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL union { unsigned int napi_id; @@ -625,19 +634,18 @@ struct sk_buff { __u32 reserved_tailroom; }; - kmemcheck_bitfield_begin(flags3); - __u8 csum_level:2; - __u8 csum_bad:1; - /* 13 bit hole */ - kmemcheck_bitfield_end(flags3); - __be16 inner_protocol; __u16 inner_transport_header; __u16 inner_network_header; __u16 inner_mac_header; + + __be16 protocol; __u16 transport_header; __u16 network_header; __u16 mac_header; + + __u32 headers_end[0]; + /* These elements must be at the end, see alloc_skb() for details. */ sk_buff_data_t tail; sk_buff_data_t end; @@ -3040,19 +3048,22 @@ static inline void nf_reset_trace(struct sk_buff *skb) } /* Note: This doesn't put any conntrack and bridge info in dst. 
*/ -static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src) +static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src, + bool copy) { #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) dst->nfct = src->nfct; nf_conntrack_get(src->nfct); - dst->nfctinfo = src->nfctinfo; + if (copy) + dst->nfctinfo = src->nfctinfo; #endif #ifdef CONFIG_BRIDGE_NETFILTER dst->nf_bridge = src->nf_bridge; nf_bridge_get(src->nf_bridge); #endif #if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE) || defined(CONFIG_NF_TABLES) - dst->nf_trace = src->nf_trace; + if (copy) + dst->nf_trace = src->nf_trace; #endif } @@ -3064,7 +3075,7 @@ static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src) #ifdef CONFIG_BRIDGE_NETFILTER nf_bridge_put(dst->nf_bridge); #endif - __nf_copy(dst, src); + __nf_copy(dst, src, true); } #ifdef CONFIG_NETWORK_SECMARK diff --git a/net/core/skbuff.c b/net/core/skbuff.c index d4fdc649112c..4be570a4ab21 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -261,7 +261,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, atomic_t *fclone_ref = (atomic_t *) (child + 1); kmemcheck_annotate_bitfield(child, flags1); - kmemcheck_annotate_bitfield(child, flags2); skb->fclone = SKB_FCLONE_ORIG; atomic_set(fclone_ref, 1); @@ -675,57 +674,61 @@ void consume_skb(struct sk_buff *skb) } EXPORT_SYMBOL(consume_skb); +/* Make sure a field is enclosed inside headers_start/headers_end section */ +#define CHECK_SKB_FIELD(field) \ + BUILD_BUG_ON(offsetof(struct sk_buff, field) < \ + offsetof(struct sk_buff, headers_start)); \ + BUILD_BUG_ON(offsetof(struct sk_buff, field) > \ + offsetof(struct sk_buff, headers_end)); \ + static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old) { new->tstamp = old->tstamp; + /* We do not copy old->sk */ new->dev = old->dev; - new->transport_header = old->transport_header; - new->network_header = old->network_header; - new->mac_header = old->mac_header; - new->inner_protocol = old->inner_protocol; - new->inner_transport_header = old->inner_transport_header; - new->inner_network_header = old->inner_network_header; - new->inner_mac_header = old->inner_mac_header; + memcpy(new->cb, old->cb, sizeof(old->cb)); skb_dst_copy(new, old); - skb_copy_hash(new, old); - new->ooo_okay = old->ooo_okay; - new->no_fcs = old->no_fcs; - new->encapsulation = old->encapsulation; - new->encap_hdr_csum = old->encap_hdr_csum; - new->csum_valid = old->csum_valid; - new->csum_complete_sw = old->csum_complete_sw; #ifdef CONFIG_XFRM new->sp = secpath_get(old->sp); #endif - memcpy(new->cb, old->cb, sizeof(old->cb)); - new->csum = old->csum; - new->ignore_df = old->ignore_df; - new->pkt_type = old->pkt_type; - new->ip_summed = old->ip_summed; - skb_copy_queue_mapping(new, old); - new->priority = old->priority; -#if IS_ENABLED(CONFIG_IP_VS) - new->ipvs_property = old->ipvs_property; + __nf_copy(new, old, false); + + /* Note : this field could be in headers_start/headers_end section + * It is not yet because we do not want to have a 16 bit hole + */ + new->queue_mapping = old->queue_mapping; + + memcpy(&new->headers_start, &old->headers_start, + offsetof(struct sk_buff, headers_end) - + offsetof(struct sk_buff, headers_start)); + CHECK_SKB_FIELD(protocol); + CHECK_SKB_FIELD(csum); + CHECK_SKB_FIELD(hash); + CHECK_SKB_FIELD(priority); + CHECK_SKB_FIELD(skb_iif); + CHECK_SKB_FIELD(vlan_proto); + CHECK_SKB_FIELD(vlan_tci); + CHECK_SKB_FIELD(transport_header); + CHECK_SKB_FIELD(network_header); + 
CHECK_SKB_FIELD(mac_header); + CHECK_SKB_FIELD(inner_protocol); + CHECK_SKB_FIELD(inner_transport_header); + CHECK_SKB_FIELD(inner_network_header); + CHECK_SKB_FIELD(inner_mac_header); + CHECK_SKB_FIELD(mark); +#ifdef CONFIG_NETWORK_SECMARK + CHECK_SKB_FIELD(secmark); +#endif +#ifdef CONFIG_NET_RX_BUSY_POLL + CHECK_SKB_FIELD(napi_id); #endif - new->pfmemalloc = old->pfmemalloc; - new->protocol = old->protocol; - new->mark = old->mark; - new->skb_iif = old->skb_iif; - __nf_copy(new, old); #ifdef CONFIG_NET_SCHED - new->tc_index = old->tc_index; + CHECK_SKB_FIELD(tc_index); #ifdef CONFIG_NET_CLS_ACT - new->tc_verd = old->tc_verd; + CHECK_SKB_FIELD(tc_verd); #endif #endif - new->vlan_proto = old->vlan_proto; - new->vlan_tci = old->vlan_tci; - - skb_copy_secmark(new, old); -#ifdef CONFIG_NET_RX_BUSY_POLL - new->napi_id = old->napi_id; -#endif } /* @@ -876,7 +879,6 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask) return NULL; kmemcheck_annotate_bitfield(n, flags1); - kmemcheck_annotate_bitfield(n, flags2); n->fclone = SKB_FCLONE_UNAVAILABLE; } ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2 net-next] net: reorganize sk_buff for faster __copy_skb_header()
2014-09-29 5:18 ` [PATCH v2 " Eric Dumazet
@ 2014-09-29 16:27 ` David Miller
0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-09-29 16:27 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, amirv, john.r.fastabend, brouer

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 28 Sep 2014 22:18:47 -0700

> From: Eric Dumazet <edumazet@google.com>
>
> With proliferation of bit fields in sk_buff, __copy_skb_header() became
> quite expensive, showing as the most expensive function in a GSO
> workload.
>
> __copy_skb_header() performance is also critical for non GSO TCP
> operations, as it is used from skb_clone()
>
> This patch carefully moves all the fields that were not copied in a
> separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
>
> Then I moved all other fields and all other copied fields in a section
> delimited by headers_start[0]/headers_end[0] section so that we
> can use a single memcpy() call, inlined by compiler using long
> word load/stores.
>
> I also tried to make all copies in the natural orders of sk_buff,
> to help hardware prefetching.
>
> I made sure sk_buff size did not change.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> v2 : added back ipvs_property, oops ;)

Applied :-)

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 14:17 [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Eric Dumazet
2014-09-25 14:43 ` Jesper Dangaard Brouer
@ 2014-09-25 15:53 ` Alexei Starovoitov
2014-09-25 16:05 ` Eric Dumazet
2014-09-28 21:27 ` David Miller
2 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2014-09-25 15:53 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Amir Vadai

On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> skb->xmit_more tells us if another skb is coming next.
>
> We need to send the doorbell when xmit_more is not set,
> or when the txqueue is stopped (preventing the next skb from coming immediately).
>
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.

this is awesome!

I've been hacking pktgen as well based on Jesper's earlier patch,
but in slightly different way. Sounds like you already have working
pktgen with xmit_more. Do you mind sharing it? even rough patch
would be great.

Thanks

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 15:53 ` [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Alexei Starovoitov
@ 2014-09-25 16:05 ` Eric Dumazet
2014-09-25 16:08 ` Alexei Starovoitov
0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-09-25 16:05 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: David Miller, netdev, Amir Vadai

On Thu, 2014-09-25 at 08:53 -0700, Alexei Starovoitov wrote:
> On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > skb->xmit_more tells us if another skb is coming next.
> >
> > We need to send the doorbell when xmit_more is not set,
> > or when the txqueue is stopped (preventing the next skb from coming immediately).
> >
> > Tested with a modified pktgen version, I got a 40% increase of
> > throughput.
>
> this is awesome!
>
> I've been hacking pktgen as well based on Jesper's earlier patch,
> but in slightly different way. Sounds like you already have working
> pktgen with xmit_more. Do you mind sharing it? even rough patch
> would be great.

The plan was to add a 'pburst x' parameter.

Because not only we want to avoid sending the doorbell, but we could
also not doing the spinlock for every start_xmit().

We also can enforce a global [p]rate (say 100000 packets per second),
but still allow pktgen to send bursts of x packets to use (and test) this
xmit_more

Nothing ready yet ;)

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 16:05 ` Eric Dumazet
@ 2014-09-25 16:08 ` Alexei Starovoitov
0 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2014-09-25 16:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Amir Vadai

On Thu, Sep 25, 2014 at 9:05 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-09-25 at 08:53 -0700, Alexei Starovoitov wrote:
>> On Thu, Sep 25, 2014 at 7:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > From: Eric Dumazet <edumazet@google.com>
>> >
>> > skb->xmit_more tells us if another skb is coming next.
>> >
>> > We need to send the doorbell when xmit_more is not set,
>> > or when the txqueue is stopped (preventing the next skb from coming immediately).
>> >
>> > Tested with a modified pktgen version, I got a 40% increase of
>> > throughput.
>>
>> this is awesome!
>>
>> I've been hacking pktgen as well based on Jesper's earlier patch,
>> but in slightly different way. Sounds like you already have working
>> pktgen with xmit_more. Do you mind sharing it? even rough patch
>> would be great.
>
> The plan was to add a 'pburst x' parameter.
>
> Because not only we want to avoid sending the doorbell, but we could
> also not doing the spinlock for every start_xmit().

exactly! I also hacked atomic_inc(&(pkt_dev->skb->users)) into
single atomic_add of N packets for bursting to amortize the cost
of both spin_lock and lock xadd

> We also can enforce a global [p]rate (say 100000 packets per second),
> but still allow pktgen to send bursts of x packets to use (and test) this
> xmit_more

that would be nice. I'm not sure how yet.

> Nothing ready yet ;)

can't wait :)

^ permalink raw reply [flat|nested] 19+ messages in thread
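[ Illustrative sketch: the bursting idea described above, taking the skb
reference count for a whole burst with one atomic_add() and letting only the
last packet of the burst clear xmit_more so the driver finally rings the
doorbell. This is not the actual pktgen patch (none had been posted at this
point in the thread); foo_xmit_one() is a placeholder. ]

	static void xmit_burst(struct pktgen_dev *pkt_dev, int burst)
	{
		struct sk_buff *skb = pkt_dev->skb;
		int i;

		/* one refcount bump for the whole burst instead of 'burst' atomic_inc() */
		atomic_add(burst, &skb->users);

		for (i = 0; i < burst; i++) {
			/* only the last packet of the burst asks the driver to
			 * flush (doorbell / tail-pointer write)
			 */
			skb->xmit_more = (i + 1 < burst);
			foo_xmit_one(pkt_dev, skb);	/* placeholder for the actual xmit call */
		}
	}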
* Re: [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell
2014-09-25 14:17 [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Eric Dumazet
2014-09-25 14:43 ` Jesper Dangaard Brouer
2014-09-25 15:53 ` [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Alexei Starovoitov
@ 2014-09-28 21:27 ` David Miller
2 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-09-28 21:27 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, amirv

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 25 Sep 2014 07:17:49 -0700

> From: Eric Dumazet <edumazet@google.com>
>
> skb->xmit_more tells us if another skb is coming next.
>
> We need to send the doorbell when xmit_more is not set,
> or when the txqueue is stopped (preventing the next skb from coming immediately).
>
> Tested with a modified pktgen version, I got a 40% increase of
> throughput.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2014-09-29 16:27 UTC | newest]

Thread overview: 19+ messages
2014-09-25 14:17 [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Eric Dumazet
2014-09-25 14:43 ` Jesper Dangaard Brouer
2014-09-25 15:06 ` Eric Dumazet
2014-09-25 15:36 ` Eric Dumazet
2014-09-25 15:40 ` Eric Dumazet
2014-09-25 16:19 ` Eric Dumazet
2014-09-25 20:47 ` Eric Dumazet
2014-09-25 20:58 ` Joe Perches
2014-09-25 21:20 ` [PATCH net-next] net: reorganize sk_buff for faster __copy_skb_header() Eric Dumazet
2014-09-26 8:35 ` Jesper Dangaard Brouer
2014-09-28 21:37 ` David Miller
2014-09-29 4:06 ` David Miller
2014-09-29 4:57 ` Eric Dumazet
2014-09-29 5:18 ` [PATCH v2 " Eric Dumazet
2014-09-29 16:27 ` David Miller
2014-09-25 15:53 ` [PATCH net-next] mlx4: exploit skb->xmit_more to conditionally send doorbell Alexei Starovoitov
2014-09-25 16:05 ` Eric Dumazet
2014-09-25 16:08 ` Alexei Starovoitov
2014-09-28 21:27 ` David Miller