Re: tbench regression in 2.6.25-rc1

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Eric Dumazet <dada1@cosmosbay.com>
To: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: David Miller <davem@davemloft.net>,
	herbert@gondor.apana.org.au, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: tbench regression in 2.6.25-rc1
Date: Tue, 19 Feb 2008 08:35:36 +0100	[thread overview]
Message-ID: <47BA86C8.4050207@cosmosbay.com> (raw)
In-Reply-To: <1203389095.3248.6.camel@ymzhang>

Zhang, Yanmin a écrit :
> On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote:
>> On Mon, 18 Feb 2008 16:12:38 +0800
>> "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:
>>
>>> On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
>>>> From: Eric Dumazet <dada1@cosmosbay.com>
>>>> Date: Fri, 15 Feb 2008 15:21:48 +0100
>>>>
>>>>> On linux-2.6.25-rc1 x86_64 :
>>>>>
>>>>> offsetof(struct dst_entry, lastuse)=0xb0
>>>>> offsetof(struct dst_entry, __refcnt)=0xb8
>>>>> offsetof(struct dst_entry, __use)=0xbc
>>>>> offsetof(struct dst_entry, next)=0xc0
>>>>>
>>>>> So it should be optimal... I dont know why tbench prefers __refcnt being 
>>>>> on 0xc0, since in this case lastuse will be on a different cache line...
>>>>>
>>>>> Each incoming IP packet will need to change lastuse, __refcnt and __use, 
>>>>> so keeping them in the same cache line is a win.
>>>>>
>>>>> I suspect then that even this patch could help tbench, since it avoids 
>>>>> writing lastuse...
>>>> I think your suspicions are right, and even moreso
>>>> it helps to keep __refcnt out of the same cache line
>>>> as input/output/ops which are read-almost-entirely :-
>>> I think you are right. The issue is these three variables sharing the same cache line
>>> with input/output/ops.
>>>
>>>> )
>>>>
>>>> I haven't done an exhaustive analysis, but it seems that
>>>> the write traffic to lastuse and __refcnt are about the
>>>> same.  However if we find that __refcnt gets hit more
>>>> than lastuse in this workload, it explains the regression.
>>> I also think __refcnt is the key. I did a new testing by adding 2 unsigned long
>>> pading before lastuse, so the 3 members are moved to next cache line. The performance is
>>> recovered.
>>>
>>> How about below patch? Almost all performance is recovered with the new patch.
>>>
>>> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
>>>
>>> ---
>>>
>>> --- linux-2.6.25-rc1/include/net/dst.h	2008-02-21 14:33:43.000000000 +0800
>>> +++ linux-2.6.25-rc1_work/include/net/dst.h	2008-02-21 14:36:22.000000000 +0800
>>> @@ -52,11 +52,10 @@ struct dst_entry
>>>  	unsigned short		header_len;	/* more space at head required */
>>>  	unsigned short		trailer_len;	/* space to reserve at tail */
>>>  
>>> -	u32			metrics[RTAX_MAX];
>>> -	struct dst_entry	*path;
>>> -
>>> -	unsigned long		rate_last;	/* rate limiting for ICMP */
>>>  	unsigned int		rate_tokens;
>>> +	unsigned long		rate_last;	/* rate limiting for ICMP */
>>> +
>>> +	struct dst_entry	*path;
>>>  
>>>  #ifdef CONFIG_NET_CLS_ROUTE
>>>  	__u32			tclassid;
>>> @@ -70,10 +69,12 @@ struct dst_entry
>>>  	int			(*output)(struct sk_buff*);
>>>  
>>>  	struct  dst_ops	        *ops;
>>> -		
>>> -	unsigned long		lastuse;
>>> +
>>> +	u32			metrics[RTAX_MAX];
>>> +
>>>  	atomic_t		__refcnt;	/* client references	*/
>>>  	int			__use;
>>> +	unsigned long		lastuse;
>>>  	union {
>>>  		struct dst_entry *next;
>>>  		struct rtable    *rt_next;
>>>
>>>
>> Well, after this patch, we grow dst_entry by 8 bytes :
> With my .config, it doesn't grow. Perhaps because of CONFIG_NET_CLS_ROUTE, I don't
> enable it. I will move tclassid under ops.
> 
>> sizeof(struct dst_entry)=0xd0
>> offsetof(struct dst_entry, input)=0x68
>> offsetof(struct dst_entry, output)=0x70
>> offsetof(struct dst_entry, __refcnt)=0xb4
>> offsetof(struct dst_entry, lastuse)=0xc0
>> offsetof(struct dst_entry, __use)=0xb8
>> sizeof(struct rtable)=0x140
>>
>>
>> So we dirty two cache lines instead of one, unless your cpu have 128 bytes cache lines ?
>>
>> I am quite suprised that my patch to not change lastuse if already set to jiffies changes nothing...
>>
>> If you have some time, could you also test this (unrelated) patch ?
>>
>> We can avoid dirty all the time a cache line of loopback device.
>>
>> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
>> index f2a6e71..0a4186a 100644
>> --- a/drivers/net/loopback.c
>> +++ b/drivers/net/loopback.c
>> @@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
>>                 return 0;
>>         }
>>  #endif
>> -       dev->last_rx = jiffies;
>> +#ifdef CONFIG_SMP
>> +       if (dev->last_rx != jiffies)
>> +#endif
>> +               dev->last_rx = jiffies;
>>  
>>         /* it's OK to use per_cpu_ptr() because BHs are off */
>>         pcpu_lstats = netdev_priv(dev);
>>
> Although I didn't test it, I don't think it's ok. The key is __refcnt shares the same
> cache line with ops/input/output.
> 

Note it was unrelated to struct dst, but dirtying of one cache line of 
'loopback netdevice'

I tested it, and tbench result was better with this patch : 890 MB/s instead 
of 870 MB/s on a bi dual core machine.


I was curious of the potential gain on your 16 cores (4x4) machine.

next prev parent reply	other threads:[~2008-02-19  7:35 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-15  1:52 tbench regression in 2.6.25-rc1 Zhang, Yanmin
2008-02-15  6:05 ` Eric Dumazet
2008-02-15  6:30   ` Zhang, Yanmin
2008-02-15 14:21     ` Eric Dumazet
2008-02-15 23:22       ` David Miller
2008-02-18  8:12         ` Zhang, Yanmin
2008-02-18 10:11           ` Eric Dumazet
2008-02-19  2:44             ` Zhang, Yanmin
2008-02-19  7:35               ` Eric Dumazet [this message]
2008-02-19  8:40                 ` Zhang, Yanmin
2008-02-18 17:33           ` Valdis.Kletnieks
2008-02-19  6:51             ` Zhang, Yanmin
2008-02-19  7:40               ` Eric Dumazet
2008-02-20  7:04                 ` Zhang, Yanmin
2008-02-20  7:38                   ` Eric Dumazet
2008-02-20  8:14                     ` David Miller
2008-02-20  8:41                       ` Zhang, Yanmin
2008-02-18  1:39       ` Zhang, Yanmin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47BA86C8.4050207@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.