Re: tbench regression in 2.6.25-rc1

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Eric Dumazet <dada1@cosmosbay.com>
To: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Valdis.Kletnieks@vt.edu, David Miller <davem@davemloft.net>,
	herbert@gondor.apana.org.au, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: tbench regression in 2.6.25-rc1
Date: Wed, 20 Feb 2008 08:38:17 +0100	[thread overview]
Message-ID: <47BBD8E9.2090700@cosmosbay.com> (raw)
In-Reply-To: <1203491081.3248.60.camel@ymzhang>

Zhang, Yanmin a écrit :
> On Tue, 2008-02-19 at 08:40 +0100, Eric Dumazet wrote:
>> Zhang, Yanmin a �crit :
>>> On Mon, 2008-02-18 at 12:33 -0500, Valdis.Kletnieks@vt.edu wrote: 
>>>> On Mon, 18 Feb 2008 16:12:38 +0800, "Zhang, Yanmin" said:
>>>>
>>>>> I also think __refcnt is the key. I did a new testing by adding 2 unsigned long
>>>>> pading before lastuse, so the 3 members are moved to next cache line. The performance is
>>>>> recovered.
>>>>>
>>>>> How about below patch? Almost all performance is recovered with the new patch.
>>>>>
>>>>> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
>>>> Could you add a comment someplace that says "refcnt wants to be on a different
>>>> cache line from input/output/ops or performance tanks badly", to warn some
>>>> future kernel hacker who starts adding new fields to the structure?
>>> Ok. Below is the new patch.
>>>
>>> 1) Move tclassid under ops in case CONFIG_NET_CLS_ROUTE=y. So sizeof(dst_entry)=200
>>> no matter if CONFIG_NET_CLS_ROUTE=y/n. I tested many patches on my 16-core tigerton by
>>> moving tclassid to different place. It looks like tclassid could also have impact on
>>> performance.
>>> If moving tclassid before metrics, or just don't move tclassid, the performance isn't
>>> good. So I move it behind metrics.
>>>
>>> 2) Add comments before __refcnt.
>>>
>>> If CONFIG_NET_CLS_ROUTE=y, the result with below patch is about 18% better than
>>> the one without the patch.
>>>
>>> If CONFIG_NET_CLS_ROUTE=n, the result with below patch is about 30% better than
>>> the one without the patch.
>>>
>>> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
>>>
>>> ---
>>>
>>> --- linux-2.6.25-rc1/include/net/dst.h	2008-02-21 14:33:43.000000000 +0800
>>> +++ linux-2.6.25-rc1_work/include/net/dst.h	2008-02-22 12:52:19.000000000 +0800
>>> @@ -52,15 +52,10 @@ struct dst_entry
>>>  	unsigned short		header_len;	/* more space at head required */
>>>  	unsigned short		trailer_len;	/* space to reserve at tail */
>>>  
>>> -	u32			metrics[RTAX_MAX];
>>> -	struct dst_entry	*path;
>>> -
>>> -	unsigned long		rate_last;	/* rate limiting for ICMP */
>>>  	unsigned int		rate_tokens;
>>> +	unsigned long		rate_last;	/* rate limiting for ICMP */
>>>  
>>> -#ifdef CONFIG_NET_CLS_ROUTE
>>> -	__u32			tclassid;
>>> -#endif
>>> +	struct dst_entry	*path;
>>>  
>>>  	struct neighbour	*neighbour;
>>>  	struct hh_cache		*hh;
>>> @@ -70,10 +65,20 @@ struct dst_entry
>>>  	int			(*output)(struct sk_buff*);
>>>  
>>>  	struct  dst_ops	        *ops;
>>> -		
>>> -	unsigned long		lastuse;
>>> +
>>> +	u32			metrics[RTAX_MAX];
>>> +
>>> +#ifdef CONFIG_NET_CLS_ROUTE
>>> +	__u32			tclassid;
>>> +#endif
>>> +
>>> +	/*
>>> +	 * __refcnt wants to be on a different cache line from
>>> +	 * input/output/ops or performance tanks badly
>>> +	 */
>>>  	atomic_t		__refcnt;	/* client references	*/
>>>  	int			__use;
>>> +	unsigned long		lastuse;
>>>  	union {
>>>  		struct dst_entry *next;
>>>  		struct rtable    *rt_next;
>>>
>>>
>>>
>> I prefer this patch, but unfortunatly your perf numbers are for 64 bits kernels.
>>
>> Could you please test now with 32 bits one ?
> I tested it with 32bit 2.6.25-rc1 on 8-core stoakley. The result almost has no difference
> between pure kernel and patched kernel.
> 
> New update: On 8-core stoakley, the regression becomes 2~3% with kernel 2.6.25-rc2. On
> tigerton, the regression is still 30% with 2.6.25-rc2. On Tulsa( 8 cores+hyperthreading),
> the regression is still 4% with 2.6.25-rc2.
> 
> With my patch, on tigerton, almost all regression disappears. On tulsa, only about 2%
> regression disappears.
> 
> So this issue is triggerred with multiple-cpu. Perhaps process scheduler is another
> factor causing the issue to happen, but it's very hard to change scheduler.
> 

Thanks very much Yanmin, I think we can apply your patch as is, if no 
regression was found for 32bits.

> 
> Eric,
> 
> I tested your new patch in function loopback_xmit. It has no improvement, while it doesn't
> introduce new issues. As you tested it on dual-core machine and got improvement, how about
> merging your patch with mine?

No, thank you, that was an experiment and is not related to your findings on 
dst_entry.

I am currently working on a 'distributed refcount' infrastructure, to be able 
to spread on several nodes (for NUMA machines) or several cache lines (normal 
SMP machines)  the high pressure we currently have on some refcnt (struct 
dst_entry, struct net_device, and many more refcnts ...)

Instead of NR_CPUS allocations, goal is to be able to restrict to a small 
value like 4, 8 or 16 the number of 32bits entities used to store one refcnt, 
even if NR_CPUS=4096 or so.

atomic_inc(&p->refcnt) ->  distref_inc(&p->refcnt)

distref_inc(struct distref *p)
{
	atomic_inc(myptr[p->offset]);
}

next prev parent reply	other threads:[~2008-02-20  7:38 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-15  1:52 tbench regression in 2.6.25-rc1 Zhang, Yanmin
2008-02-15  6:05 ` Eric Dumazet
2008-02-15  6:30   ` Zhang, Yanmin
2008-02-15 14:21     ` Eric Dumazet
2008-02-15 23:22       ` David Miller
2008-02-18  8:12         ` Zhang, Yanmin
2008-02-18 10:11           ` Eric Dumazet
2008-02-19  2:44             ` Zhang, Yanmin
2008-02-19  7:35               ` Eric Dumazet
2008-02-19  8:40                 ` Zhang, Yanmin
2008-02-18 17:33           ` Valdis.Kletnieks
2008-02-19  6:51             ` Zhang, Yanmin
2008-02-19  7:40               ` Eric Dumazet
2008-02-20  7:04                 ` Zhang, Yanmin
2008-02-20  7:38                   ` Eric Dumazet [this message]
2008-02-20  8:14                     ` David Miller
2008-02-20  8:41                       ` Zhang, Yanmin
2008-02-18  1:39       ` Zhang, Yanmin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47BBD8E9.2090700@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).