From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: [PATCH] Fixed zero copy GSO without orphaning the fragments
Date: Sun, 1 Jun 2014 15:21:52 +0300
Message-ID: <20140601122152.GB22450@redhat.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "David S. Miller", Eric Dumazet, Tom Herbert, Daniel Borkmann,
	Nicolas Dichtel, Simon Horman, Joe Perches, Jiri Pirko,
	Florian Westphal, Paul Durrant, Thomas Graf, Jan Beulich,
	Herbert Xu, Miklos Szeredi, linux-kernel, netdev, Anton Nayshtut
To: Igor Royzis
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On Sun, Jun 01, 2014 at 02:39:39PM +0300, Igor Royzis wrote:
> The patch discussion seems to have been lost due to the delay it took
> us to get the numbers. We believe that a 24% improvement in VM network
> performance (and probably a bigger improvement the more guests are
> running on a host) is worth commenting on and bringing to a conclusion.

Absolutely, but we need to find a way to address Eric's comments.

> > Before your patch :
> >
> > sizeof(struct skb_shared_info)=0x140
> > offsetof(struct skb_shared_info, frags[1])=0x40
> >
> > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) -> 0x140
> >
> > After your patch :
> >
> > sizeof(struct skb_shared_info)=0x148
> > offsetof(struct skb_shared_info, frags[1])=0x48
> >
> > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) -> 0x180
> >
> > That's a serious bump, because it increases all skb truesizes, and a
> > typical skb with one fragment will use 2 cache lines instead of one
> > in struct skb_shared_info, so this adds memory pressure in the fast
> > path.
> >
> > So while this patch might increase performance for some workloads,
> > it generally decreases performance on many others.
>
> Would moving the parent fragment pointer from skb_shared_info to
> struct sk_buff solve this issue?
>
> Regards,
> -Igor
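For context on Eric's numbers above: SKB_DATA_ALIGN() rounds the
shared-info footprint up to a whole number of cache lines, so an 8-byte
field can end up costing a full 64-byte line in every skb's truesize.
A minimal userspace sketch of that arithmetic (CACHE_LINE is an
assumption standing in for the kernel's SMP_CACHE_BYTES, which is 64 on
most x86_64 systems):

	#include <stdio.h>

	/* Stand-in for SMP_CACHE_BYTES. */
	#define CACHE_LINE 64UL

	/* Mirrors the kernel's SKB_DATA_ALIGN(): round x up to a cache line. */
	#define SKB_DATA_ALIGN(x) (((x) + (CACHE_LINE - 1)) & ~(CACHE_LINE - 1))

	int main(void)
	{
		/* The sizes Eric measured, before and after the 8-byte pointer. */
		unsigned long before = 0x140, after = 0x148;

		printf("before: 0x%lx -> 0x%lx\n", before, SKB_DATA_ALIGN(before));
		printf("after:  0x%lx -> 0x%lx\n", after, SKB_DATA_ALIGN(after));
		/* 0x140 is exactly 5 cache lines, so it is unchanged; 0x148
		 * spills into a 6th line and rounds up to 0x180, adding 64
		 * bytes to every skb's truesize. */
		return 0;
	}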
> On Sun, May 25, 2014 at 1:54 PM, Igor Royzis wrote:
> >> If true, I'd like to see some performance numbers please.
> >
> > The numbers have been obtained by running iperf between 2 QEMU
> > Win2012 VMs, 4 vCPU / 4 GB RAM each.
> > iperf parameters: -w 256K -l 256K -t 300
> > Original kernel 3.15.0-rc5: 34.4 GBytes transferred, 984 Mbits/sec
> > bandwidth.
> > Kernel 3.15.0-rc5 with our patch: 42.5 GBytes transferred, 1.22
> > Gbits/sec bandwidth.
> >
> > The overall improvement is about 24%.
> > Below are the raw iperf outputs.
> >
> > kernel 3.15.0-rc5:
> > C:\iperf>iperf -c 192.168.11.2 -w 256K -l 256K -t 300
> > ------------------------------------------------------------
> > Client connecting to 192.168.11.2, TCP port 5001
> > TCP window size: 256 KByte
> > ------------------------------------------------------------
> > [  3] local 192.168.11.1 port 49167 connected with 192.168.11.2 port 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0-300.7 sec  34.4 GBytes   984 Mbits/sec
> >
> > kernel 3.15.0-rc5-patched:
> > C:\iperf>iperf -c 192.168.11.2 -w 256K -l 256K -t 300
> > ------------------------------------------------------------
> > Client connecting to 192.168.11.2, TCP port 5001
> > TCP window size: 256 KByte
> > ------------------------------------------------------------
> > [  3] local 192.168.11.1 port 49167 connected with 192.168.11.2 port 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0-300.7 sec  42.5 GBytes  1.22 Gbits/sec
> >
> > On Tue, May 20, 2014 at 2:50 PM, Michael S. Tsirkin wrote:
> >>
> >> On Tue, May 20, 2014 at 02:24:21PM +0300, Igor Royzis wrote:
> >> > Fix accessing GSO fragment memory (and the possible corruption
> >> > that results) after completion has been reported via the zero copy
> >> > callback. The previous fix in commit 1fd819ec orphaned the frags,
> >> > which eliminates the zero copy advantage. This fix makes the
> >> > completion run only after all the fragments have been processed,
> >> > avoiding the unnecessary orphaning/copying from userspace.
> >> >
> >> > The GSO fragment corruption issue was observed in a typical
> >> > QEMU/KVM VM setup that hosts a Windows guest (since the QEMU
> >> > virtio-net Windows driver doesn't support GRO).
> >> > The fix has been verified by running the HCK OffloadLSO test.
> >> >
> >> > Signed-off-by: Igor Royzis
> >> > Signed-off-by: Anton Nayshtut
> >>
> >> OK but with 1fd819ec there's no corruption, correct?
> >> So this patch is in fact an optimization?
> >> If true, I'd like to see some performance numbers please.
> >>
> >> Thanks!
> >>
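To make the lifetime rule in the patch below concrete: skb_segment()
makes each segment store a back-pointer to the zero-copy parent and take
a reference on it, and kfree_skb()/consume_skb() drop that reference
only after the segment itself has been released. The parent, and with it
the userspace buffers that the zero-copy completion guards, therefore
stays alive until the last segment is freed. A minimal userspace model
of this deferred-release pattern (the struct names and the completion
hook are illustrative only, not kernel API):

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdatomic.h>

	/* Toy stand-in for the zero-copy parent skb: refcounted, with a
	 * completion callback that must not run while segments still
	 * reference the parent's fragment pages. */
	struct parent {
		atomic_int users;
		void (*complete)(struct parent *p);
	};

	/* Toy stand-in for one GSO segment. */
	struct segment {
		struct parent *zcopy_src;	/* back-pointer, as in the patch */
	};

	static void parent_put(struct parent *p)
	{
		if (atomic_fetch_sub(&p->users, 1) == 1) {
			p->complete(p);	/* last reference: completion is safe */
			free(p);
		}
	}

	static struct segment *segment_new(struct parent *p)
	{
		struct segment *s = malloc(sizeof(*s));

		s->zcopy_src = p;
		atomic_fetch_add(&p->users, 1);	/* each segment pins the parent */
		return s;
	}

	static void segment_free(struct segment *s)
	{
		struct parent *p = s->zcopy_src;

		free(s);	/* the segment goes away first ... */
		parent_put(p);	/* ... then its parent reference is dropped */
	}

	static void done(struct parent *p)
	{
		puts("zero-copy completion fires");
	}

	int main(void)
	{
		struct parent *p = malloc(sizeof(*p));
		struct segment *a, *b;

		atomic_init(&p->users, 1);	/* the original sender's reference */
		p->complete = done;

		a = segment_new(p);
		b = segment_new(p);
		parent_put(p);		/* sender done; segments still pin the parent */
		segment_free(a);	/* no completion yet */
		segment_free(b);	/* last reference gone: completion runs here */
		return 0;
	}

The patch's free paths have the same shape: they read zcopy_src before
__kfree_skb() tears the segment down, then drop the parent reference
afterwards.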
> >> > ---
> >> >  include/linux/skbuff.h |    1 +
> >> >  net/core/skbuff.c      |   18 +++++++++++++-----
> >> >  2 files changed, 14 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> > index 08074a8..8c49edc 100644
> >> > --- a/include/linux/skbuff.h
> >> > +++ b/include/linux/skbuff.h
> >> > @@ -287,6 +287,7 @@ struct skb_shared_info {
> >> >  	struct sk_buff	*frag_list;
> >> >  	struct skb_shared_hwtstamps hwtstamps;
> >> >  	__be32		ip6_frag_id;
> >> > +	struct sk_buff	*zcopy_src;
> >> >
> >> >  	/*
> >> >  	 * Warning : all fields before dataref are cleared in __alloc_skb()
> >> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> > index 1b62343..6fa6342 100644
> >> > --- a/net/core/skbuff.c
> >> > +++ b/net/core/skbuff.c
> >> > @@ -610,14 +610,18 @@ EXPORT_SYMBOL(__kfree_skb);
> >> >   */
> >> >  void kfree_skb(struct sk_buff *skb)
> >> >  {
> >> > +	struct sk_buff *zcopy_src;
> >> >  	if (unlikely(!skb))
> >> >  		return;
> >> >  	if (likely(atomic_read(&skb->users) == 1))
> >> >  		smp_rmb();
> >> >  	else if (likely(!atomic_dec_and_test(&skb->users)))
> >> >  		return;
> >> > +	zcopy_src = skb_shinfo(skb)->zcopy_src;
> >> >  	trace_kfree_skb(skb, __builtin_return_address(0));
> >> >  	__kfree_skb(skb);
> >> > +	if (unlikely(zcopy_src))
> >> > +		kfree_skb(zcopy_src);
> >> >  }
> >> >  EXPORT_SYMBOL(kfree_skb);
> >> >
> >> > @@ -662,14 +666,18 @@ EXPORT_SYMBOL(skb_tx_error);
> >> >   */
> >> >  void consume_skb(struct sk_buff *skb)
> >> >  {
> >> > +	struct sk_buff *zcopy_src;
> >> >  	if (unlikely(!skb))
> >> >  		return;
> >> >  	if (likely(atomic_read(&skb->users) == 1))
> >> >  		smp_rmb();
> >> >  	else if (likely(!atomic_dec_and_test(&skb->users)))
> >> >  		return;
> >> > +	zcopy_src = skb_shinfo(skb)->zcopy_src;
> >> >  	trace_consume_skb(skb);
> >> >  	__kfree_skb(skb);
> >> > +	if (unlikely(zcopy_src))
> >> > +		consume_skb(zcopy_src);
> >> >  }
> >> >  EXPORT_SYMBOL(consume_skb);
> >> >
> >> > @@ -2867,7 +2875,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >> >  	skb_frag_t *frag = skb_shinfo(head_skb)->frags;
> >> >  	unsigned int mss = skb_shinfo(head_skb)->gso_size;
> >> >  	unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
> >> > -	struct sk_buff *frag_skb = head_skb;
> >> >  	unsigned int offset = doffset;
> >> >  	unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
> >> >  	unsigned int headroom;
> >> > @@ -2913,7 +2920,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >> >  			i = 0;
> >> >  			nfrags = skb_shinfo(list_skb)->nr_frags;
> >> >  			frag = skb_shinfo(list_skb)->frags;
> >> > -			frag_skb = list_skb;
> >> >  			pos += skb_headlen(list_skb);
> >> >
> >> >  			while (pos < offset + len) {
> >> > @@ -2975,6 +2981,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >> >  						 nskb->data - tnl_hlen,
> >> >  						 doffset + tnl_hlen);
> >> >
> >> > +		if (skb_shinfo(head_skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
> >> > +			skb_shinfo(nskb)->zcopy_src = head_skb;
> >> > +			atomic_inc(&head_skb->users);
> >> > +		}
> >> > +
> >> >  		if (nskb->len == len + doffset)
> >> >  			goto perform_csum_check;
> >> >
> >> > @@ -3001,7 +3012,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >> >  				i = 0;
> >> >  				nfrags = skb_shinfo(list_skb)->nr_frags;
> >> >  				frag = skb_shinfo(list_skb)->frags;
> >> > -				frag_skb = list_skb;
> >> >
> >> >  				BUG_ON(!nfrags);
> >> >
> >> > @@ -3016,8 +3026,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> >> >  					goto err;
> >> >  				}
> >> >
> >> > -				if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
> >> > -					goto err;
> >> >
> >> >  				*nskb_frag = *frag;
> >> >  				__skb_frag_ref(nskb_frag);
> >> > --
> >> > 1.7.9.5
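A note on Igor's open question about moving the pointer: struct
skb_shared_info lives at the tail of the skb's data buffer, and
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) is charged to every
skb's truesize, which is exactly where the 0x140 -> 0x180 bump comes
from. struct sk_buff is allocated from its own slab cache, so a pointer
there would not inflate truesize the same way, though it would grow
every sk_buff by 8 bytes and would need the same sizeof/offsetof
measurement Eric did before drawing conclusions. As a hypothetical
sketch only (skb->zcopy_src is Igor's suggestion, not a field that
exists), the kfree_skb() hunk above would then read:

	void kfree_skb(struct sk_buff *skb)
	{
		struct sk_buff *zcopy_src;

		if (unlikely(!skb))
			return;
		if (likely(atomic_read(&skb->users) == 1))
			smp_rmb();
		else if (likely(!atomic_dec_and_test(&skb->users)))
			return;
		zcopy_src = skb->zcopy_src;	/* read before the skb is freed */
		trace_kfree_skb(skb, __builtin_return_address(0));
		__kfree_skb(skb);
		if (unlikely(zcopy_src))
			kfree_skb(zcopy_src);	/* drop the parent reference */
	}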