From: Yajun Deng <yajun.deng@linux.dev>
To: Eric Dumazet <edumazet@google.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Alexander Lobakin <aleksander.lobakin@intel.com>
Subject: Re: [PATCH v6] net/core: Introduce netdev_core_stats_inc()
Date: Fri, 29 Sep 2023 13:38:33 +0800 [thread overview]
Message-ID: <e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev> (raw)
In-Reply-To: <a94ca1e1-d29a-5d98-bf39-97c7a1f25372@linux.dev>
On 2023/9/29 00:32, Yajun Deng wrote:
>
> On 2023/9/29 00:23, Eric Dumazet wrote:
>> On Thu, Sep 28, 2023 at 6:16 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>
>>> On 2023/9/28 23:44, Eric Dumazet wrote:
>>>> On Thu, Sep 28, 2023 at 5:40 PM Yajun Deng <yajun.deng@linux.dev>
>>>> wrote:
>>>>> On 2023/9/28 22:18, Eric Dumazet wrote:
>>>>>> On Thu, Sep 28, 2023 at 12:04 PM Yajun Deng
>>>>>> <yajun.deng@linux.dev> wrote:
>>>>>>> Although there is a kfree_skb_reason() helper function that can
>>>>>>> be used to
>>>>>>> find the reason why this skb is dropped, but most callers didn't
>>>>>>> increase
>>>>>>> one of rx_dropped, tx_dropped, rx_nohandler and
>>>>>>> rx_otherhost_dropped.
>>>>>>>
>>>>>>> For the users, people are more concerned about why the dropped
>>>>>>> in ip
>>>>>>> is increasing.
>>>>>>>
>>>>>>> Introduce netdev_core_stats_inc() for trace the caller of the
>>>>>>> dropped
>>>>>>> skb. Also, add __code to netdev_core_stats_alloc(), as it's called
>>>>>>> unlinkly.
>>>>>>>
>>>>>>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
>>>>>>> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>>>>> ---
>>>>>>> v6: merge netdev_core_stats and netdev_core_stats_inc together
>>>>>>> v5: Access the per cpu pointer before reach the relevant offset.
>>>>>>> v4: Introduce netdev_core_stats_inc() instead of export
>>>>>>> dev_core_stats_*_inc()
>>>>>>> v3: __cold should be added to the netdev_core_stats_alloc().
>>>>>>> v2: use __cold instead of inline in dev_core_stats().
>>>>>>> v1:
>>>>>>> https://lore.kernel.org/netdev/20230911082016.3694700-1-yajun.deng@linux.dev/
>>>>>>> ---
>>>>>>> include/linux/netdevice.h | 21 ++++-----------------
>>>>>>> net/core/dev.c | 17 +++++++++++++++--
>>>>>>> 2 files changed, 19 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>> index 7e520c14eb8c..eb1fa04fbccc 100644
>>>>>>> --- a/include/linux/netdevice.h
>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>> @@ -4002,32 +4002,19 @@ static __always_inline bool
>>>>>>> __is_skb_forwardable(const struct net_device *dev,
>>>>>>> return false;
>>>>>>> }
>>>>>>>
>>>>>>> -struct net_device_core_stats __percpu
>>>>>>> *netdev_core_stats_alloc(struct net_device *dev);
>>>>>>> -
>>>>>>> -static inline struct net_device_core_stats __percpu
>>>>>>> *dev_core_stats(struct net_device *dev)
>>>>>>> -{
>>>>>>> - /* This READ_ONCE() pairs with the write in
>>>>>>> netdev_core_stats_alloc() */
>>>>>>> - struct net_device_core_stats __percpu *p =
>>>>>>> READ_ONCE(dev->core_stats);
>>>>>>> -
>>>>>>> - if (likely(p))
>>>>>>> - return p;
>>>>>>> -
>>>>>>> - return netdev_core_stats_alloc(dev);
>>>>>>> -}
>>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset);
>>>>>>>
>>>>>>> #define DEV_CORE_STATS_INC(FIELD) \
>>>>>>> static inline void dev_core_stats_##FIELD##_inc(struct
>>>>>>> net_device *dev) \
>>>>>>> { \
>>>>>>> - struct net_device_core_stats __percpu
>>>>>>> *p; \
>>>>>>> - \
>>>>>>> - p = dev_core_stats(dev); \
>>>>>>> - if (p) \
>>>>>>> - this_cpu_inc(p->FIELD); \
>>>>>> Note that we were using this_cpu_inc() which implied :
>>>>>> - IRQ safety, and
>>>>>> - a barrier paired with :
>>>>>>
>>>>>> net/core/dev.c:10548: storage->rx_dropped +=
>>>>>> READ_ONCE(core_stats->rx_dropped);
>>>>>> net/core/dev.c:10549: storage->tx_dropped +=
>>>>>> READ_ONCE(core_stats->tx_dropped);
>>>>>> net/core/dev.c:10550: storage->rx_nohandler +=
>>>>>> READ_ONCE(core_stats->rx_nohandler);
>>>>>> net/core/dev.c:10551: storage->rx_otherhost_dropped
>>>>>> += READ_ONCE(core_stats->rx_otherhost_dropped);
>>>>>>
>>>>>>
>>>>>>> + netdev_core_stats_inc(dev, \
>>>>>>> + offsetof(struct net_device_core_stats,
>>>>>>> FIELD)); \
>>>>>>> }
>>>>>>> DEV_CORE_STATS_INC(rx_dropped)
>>>>>>> DEV_CORE_STATS_INC(tx_dropped)
>>>>>>> DEV_CORE_STATS_INC(rx_nohandler)
>>>>>>> DEV_CORE_STATS_INC(rx_otherhost_dropped)
>>>>>>> +#undef DEV_CORE_STATS_INC
>>>>>>>
>>>>>>> static __always_inline int ____dev_forward_skb(struct
>>>>>>> net_device *dev,
>>>>>>> struct sk_buff *skb,
>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>> index 606a366cc209..88a32c392c1d 100644
>>>>>>> --- a/net/core/dev.c
>>>>>>> +++ b/net/core/dev.c
>>>>>>> @@ -10497,7 +10497,8 @@ void netdev_stats_to_stats64(struct
>>>>>>> rtnl_link_stats64 *stats64,
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(netdev_stats_to_stats64);
>>>>>>>
>>>>>>> -struct net_device_core_stats __percpu
>>>>>>> *netdev_core_stats_alloc(struct net_device *dev)
>>>>>>> +static __cold struct net_device_core_stats __percpu
>>>>>>> *netdev_core_stats_alloc(
>>>>>>> + struct net_device *dev)
>>>>>>> {
>>>>>>> struct net_device_core_stats __percpu *p;
>>>>>>>
>>>>>>> @@ -10510,7 +10511,19 @@ struct net_device_core_stats __percpu
>>>>>>> *netdev_core_stats_alloc(struct net_device
>>>>>>> /* This READ_ONCE() pairs with the cmpxchg() above */
>>>>>>> return READ_ONCE(dev->core_stats);
>>>>>>> }
>>>>>>> -EXPORT_SYMBOL(netdev_core_stats_alloc);
>>>>>>> +
>>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>>> +{
>>>>>>> + /* This READ_ONCE() pairs with the write in
>>>>>>> netdev_core_stats_alloc() */
>>>>>>> + struct net_device_core_stats __percpu *p =
>>>>>>> READ_ONCE(dev->core_stats);
>>>>>>> +
>>>>>>> + if (unlikely(!p))
>>>>>>> + p = netdev_core_stats_alloc(dev);
>>>>>>> +
>>>>>>> + if (p)
>>>>>>> + (*(unsigned long *)((void *)this_cpu_ptr(p) +
>>>>>>> offset))++;
>>>>>> While here you are using a ++ operation that :
>>>>>>
>>>>>> - is not irq safe
>>>>>> - might cause store-tearing.
>>>>>>
>>>>>> I would suggest a preliminary patch converting the "unsigned
>>>>>> long" fields in
>>>>>> struct net_device_core_stats to local_t
>>>>> Do you mean it needs to revert the commit 6510ea973d8d ("net: Use
>>>>> this_cpu_inc() to increment
>>>>>
>>>>> net->core_stats") first? But it would allocate memory which breaks on
>>>>> PREEMPT_RT.
>>>> I think I provided an (untested) alternative.
>>>>
>>>> unsigned long __percpu *field = (__force unsigned long __percpu *)
>>>> ((__force u8 *)p + offset);
>>>> this_cpu_inc(field);
>>> unsigned long __percpu *field = (__force unsigned long __percpu *)
>>> ((__force u8 *)p + offset);
>>> this_cpu_inc(*(int *)field);
>>>
>>> This would compiler success. But I didn't test it.
>>> This cold look complex.
>> Why exactly ? Not very different from the cast you already had.
> Okay, I'll test it.
It seems something wrong.
"ip -s a" would see the 'dropped' is increasing. But I cann't trace
anything by the following cmd.
"sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc"
If I change back to "(*(unsigned long *)((void *)this_cpu_ptr(p) +
offset))++; ", I can trace the caller.
So the following code would accidentally change somthing.
unsigned long __percpu *field = (__force unsigned long __percpu *)
((__force u8 *)p + offset);
this_cpu_inc(*field);
>>
>>> Shoud I base v3? Export dev_core_stats_*_inc() intead of introduce
>>> netdev_core_stats_inc().
>>> That would be easy.
>> Well, you tell me, but this does not look incremental to me.
>>
>> I do not think we need 4 different (and maybe more to come if struct
>> net_device_core_stats
>> grows in the future) functions for some hardly used path.
prev parent reply other threads:[~2023-09-29 5:38 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-28 10:04 [PATCH v6] net/core: Introduce netdev_core_stats_inc() Yajun Deng
2023-09-28 14:18 ` Eric Dumazet
2023-09-28 15:40 ` Yajun Deng
2023-09-28 15:44 ` Eric Dumazet
2023-09-28 16:16 ` Yajun Deng
2023-09-28 16:23 ` Eric Dumazet
2023-09-28 16:32 ` Yajun Deng
2023-09-29 5:38 ` Yajun Deng [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev \
--to=yajun.deng@linux.dev \
--cc=aleksander.lobakin@intel.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.