From: Eric Dumazet
Subject: Re: Question about __alloc_skb() speedup
Date: Fri, 03 Dec 2010 11:50:29 +0100
Message-ID: <1291373429.2897.96.camel@edumazet-laptop>
In-Reply-To: <20101203101450.GA9573@Desktop-Junchang>
References: <20101203101450.GA9573@Desktop-Junchang>
To: Junchang Wang
Cc: netdev@vger.kernel.org

On Friday, 03 December 2010 at 18:14 +0800, Junchang Wang wrote:
> Hi Eric,
>
> I'm reading your patch (ec7d2f2cf3a1 __alloc_skb() speedup),
> in which you prefetch skb and the shinfo part. I'm very
> curious why we don't prefetch skb->data. It seems that will
> help tx path a lot.
>
> I added the following code
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 104f844..c60a808 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -222,6 +222,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>
>  		child->fclone = SKB_FCLONE_UNAVAILABLE;
>  	}
> +	prefetchw(data);
> +
>  out:
>  	return skb;
>  nodata:
>
> and the pktgen in my server (A Intel SR1625 server with two E5530
> 4-core processors and a single ixgbe-based NIC) goes from 7.6Mpps to
> 8.4Mpps (64 byte), with 10% performance gain.
>
> For rx path, I did experiments on both ixgbe and igb with pktgen+kute,
> and there is no change in system performance.
>
> welcome any suggestions and corrections.
>
> Thanks.

This is because __alloc_skb() is generic: we don't know whether
skb->data is going to be used right away or not at all.

For example, NIC drivers call __alloc_skb() to refill their RX ring
buffer. There is no gain in prefetching data in this case, since the
data is going to be written by the NIC hardware. The reverse would
actually be needed: ask the local CPU to evict the data from its cache,
so that the device can DMA it faster (fewer bus transactions).

By the way, adding prefetchw() right before the "return skb;" is
probably not very useful. You can certainly try adding the prefetchw()
in pktgen itself, since there you know for sure that you are going to
write the data.

I don't understand your 10% speedup, because pktgen actually uses
__netdev_alloc_skb(), so it calls skb_reserve(skb, NET_SKB_PAD): your
prefetchw() is bringing in a cache line that won't be used at all by
pktgen.

I would say 10% sounds highly suspect to me...
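
The cache-line argument above can be checked with a small userspace sketch. This is not kernel code: SKETCH_CACHE_LINE and SKETCH_SKB_PAD are assumed values (64 bytes each) standing in for the real cache line size and NET_SKB_PAD, the helper names are hypothetical, and __builtin_prefetch() (a GCC/Clang builtin) is used as a rough stand-in for prefetchw().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the kernel derives these per arch. */
#define SKETCH_CACHE_LINE 64u /* assumption: 64-byte cache lines */
#define SKETCH_SKB_PAD    64u /* assumption: stand-in for NET_SKB_PAD */

/* Index of the cache line containing addr (line_size must be a power of two). */
static uintptr_t cache_line_of(uintptr_t addr, unsigned line_size)
{
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	return addr / line_size;
}

/*
 * Returns 1 if a prefetch issued at `data` pulls in the same cache line
 * that holds the first byte written after reserving `pad` bytes of
 * headroom, 0 otherwise.
 */
static int prefetch_hits_payload(uintptr_t data, unsigned pad, unsigned line_size)
{
	return cache_line_of(data, line_size) ==
	       cache_line_of(data + pad, line_size);
}

/* Userspace stand-in for prefetchw(): hint that the line will be written. */
static void prefetch_for_write(const void *p)
{
	__builtin_prefetch(p, 1);
}
```

Under these assumptions, prefetch_hits_payload(data, SKETCH_SKB_PAD, SKETCH_CACHE_LINE) is 0 for a cache-line-aligned buffer: the line fetched at the start of the buffer is the headroom, not the bytes the caller writes. Issuing prefetch_for_write() on the pointer obtained after the reserve, in the caller that knows it will write the data, would target the useful line instead.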