From: Eric Dumazet <dada1@cosmosbay.com>
To: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: David Miller <davem@davemloft.net>,
herbert@gondor.apana.org.au, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
Subject: Re: tbench regression in 2.6.25-rc1
Date: Tue, 19 Feb 2008 08:35:36 +0100
Message-ID: <47BA86C8.4050207@cosmosbay.com>
In-Reply-To: <1203389095.3248.6.camel@ymzhang>
Zhang, Yanmin wrote:
> On Mon, 2008-02-18 at 11:11 +0100, Eric Dumazet wrote:
>> On Mon, 18 Feb 2008 16:12:38 +0800
>> "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:
>>
>>> On Fri, 2008-02-15 at 15:22 -0800, David Miller wrote:
>>>> From: Eric Dumazet <dada1@cosmosbay.com>
>>>> Date: Fri, 15 Feb 2008 15:21:48 +0100
>>>>
>>>>> On linux-2.6.25-rc1 x86_64 :
>>>>>
>>>>> offsetof(struct dst_entry, lastuse)=0xb0
>>>>> offsetof(struct dst_entry, __refcnt)=0xb8
>>>>> offsetof(struct dst_entry, __use)=0xbc
>>>>> offsetof(struct dst_entry, next)=0xc0
>>>>>
>>>>> So it should be optimal... I don't know why tbench prefers __refcnt being
>>>>> on 0xc0, since in this case lastuse will be on a different cache line...
>>>>>
>>>>> Each incoming IP packet will need to change lastuse, __refcnt and __use,
>>>>> so keeping them in the same cache line is a win.
>>>>>
>>>>> I suspect then that even this patch could help tbench, since it avoids
>>>>> writing lastuse...
>>>> I think your suspicions are right, and even moreso
>>>> it helps to keep __refcnt out of the same cache line
>>>> as input/output/ops which are read-almost-entirely :-
>>> I think you are right. The issue is these three variables sharing the same cache line
>>> with input/output/ops.
>>>
>>>> )
>>>>
>>>> I haven't done an exhaustive analysis, but it seems that
>>>> the write traffic to lastuse and __refcnt are about the
>>>> same. However if we find that __refcnt gets hit more
>>>> than lastuse in this workload, it explains the regression.
>>> I also think __refcnt is the key. I ran a new test, adding 2 unsigned longs of
>>> padding before lastuse, so the 3 members are moved to the next cache line. The performance is
>>> recovered.
>>>
>>> How about below patch? Almost all performance is recovered with the new patch.
>>>
>>> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
>>>
>>> ---
>>>
>>> --- linux-2.6.25-rc1/include/net/dst.h 2008-02-21 14:33:43.000000000 +0800
>>> +++ linux-2.6.25-rc1_work/include/net/dst.h 2008-02-21 14:36:22.000000000 +0800
>>> @@ -52,11 +52,10 @@ struct dst_entry
>>> unsigned short header_len; /* more space at head required */
>>> unsigned short trailer_len; /* space to reserve at tail */
>>>
>>> - u32 metrics[RTAX_MAX];
>>> - struct dst_entry *path;
>>> -
>>> - unsigned long rate_last; /* rate limiting for ICMP */
>>> unsigned int rate_tokens;
>>> + unsigned long rate_last; /* rate limiting for ICMP */
>>> +
>>> + struct dst_entry *path;
>>>
>>> #ifdef CONFIG_NET_CLS_ROUTE
>>> __u32 tclassid;
>>> @@ -70,10 +69,12 @@ struct dst_entry
>>> int (*output)(struct sk_buff*);
>>>
>>> struct dst_ops *ops;
>>> -
>>> - unsigned long lastuse;
>>> +
>>> + u32 metrics[RTAX_MAX];
>>> +
>>> atomic_t __refcnt; /* client references */
>>> int __use;
>>> + unsigned long lastuse;
>>> union {
>>> struct dst_entry *next;
>>> struct rtable *rt_next;
>>>
>>>
>> Well, after this patch, we grow dst_entry by 8 bytes :
> With my .config, it doesn't grow. Perhaps because of CONFIG_NET_CLS_ROUTE, I don't
> enable it. I will move tclassid under ops.
>
>> sizeof(struct dst_entry)=0xd0
>> offsetof(struct dst_entry, input)=0x68
>> offsetof(struct dst_entry, output)=0x70
>> offsetof(struct dst_entry, __refcnt)=0xb4
>> offsetof(struct dst_entry, lastuse)=0xc0
>> offsetof(struct dst_entry, __use)=0xb8
>> sizeof(struct rtable)=0x140
>>
>>
>> So we dirty two cache lines instead of one, unless your CPU has 128-byte cache lines?
>>
>> I am quite surprised that my patch to not change lastuse if it is already set to jiffies changes nothing...
>>
>> If you have some time, could you also test this (unrelated) patch ?
>>
>> We can avoid dirtying a cache line of the loopback device all the time.
>>
>> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
>> index f2a6e71..0a4186a 100644
>> --- a/drivers/net/loopback.c
>> +++ b/drivers/net/loopback.c
>> @@ -150,7 +150,10 @@ static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
>> return 0;
>> }
>> #endif
>> - dev->last_rx = jiffies;
>> +#ifdef CONFIG_SMP
>> + if (dev->last_rx != jiffies)
>> +#endif
>> + dev->last_rx = jiffies;
>>
>> /* it's OK to use per_cpu_ptr() because BHs are off */
>> pcpu_lstats = netdev_priv(dev);
>>
> Although I didn't test it, I don't think it's ok. The key is __refcnt shares the same
> cache line with ops/input/output.
>
Note that this patch is unrelated to struct dst; it is about the dirtying of one
cache line of the loopback netdevice.

I tested it, and the tbench result was better with this patch: 890 MB/s instead
of 870 MB/s on a machine with two dual-core CPUs.

I was curious about the potential gain on your 16-core (4x4) machine.