Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()


From: Yajun Deng <yajun.deng@linux.dev>
To: Eric Dumazet <edumazet@google.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alexander Lobakin <aleksander.lobakin@intel.com>
Subject: Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
Date: Sun, 8 Oct 2023 16:44:48 +0800
Message-ID: <9f4fb613-d63f-9b86-fe92-11bf4dfb7275@linux.dev>
In-Reply-To: <CANn89i+u5dXdYm_0_LwhXg5Nw+gHXx+nPUmbYhvT=k9P4+9JRQ@mail.gmail.com>


On 2023/10/8 15:18, Eric Dumazet wrote:
> On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>
>> On 2023/10/8 14:45, Eric Dumazet wrote:
>>> On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>> On 2023/10/7 13:29, Eric Dumazet wrote:
>>>>> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>> Although there is a kfree_skb_reason() helper function that can be used to
>>>>>> find the reason why this skb is dropped, but most callers didn't increase
>>>>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>>>>>>
>>>>> ...
>>>>>
>>>>>> +
>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>> +{
>>>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>>> +       unsigned long *field;
>>>>>> +
>>>>>> +       if (unlikely(!p))
>>>>>> +               p = netdev_core_stats_alloc(dev);
>>>>>> +
>>>>>> +       if (p) {
>>>>>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>>>>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>>>>> This is broken...
>>>>>
>>>>> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
>>>>> many different contexts:
>>>>>
>>>>> 1) process contexts, where preemption and migration are allowed.
>>>>> 2) interrupt contexts.
>>>>>
>>>>> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
>>>>>
>>>>> I _think_ I already gave you how to deal with this ?
>>>> Yes, I replied in v6.
>>>>
>>>> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
>>>>
>>>>> Please try instead:
>>>>>
>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>> +{
>>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>> +       unsigned long __percpu *field;
>>>>> +
>>>>> +       if (unlikely(!p)) {
>>>>> +               p = netdev_core_stats_alloc(dev);
>>>>> +               if (!p)
>>>>> +                       return;
>>>>> +       }
>>>>> +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
>>>>> +       this_cpu_inc(*field);
>>>>> +}
>>>> This wouldn't trace anything even though rx_dropped is increasing. It
>>>> needs an extra operation added, such as:
>>> I honestly do not know what you are talking about.
>>>
>>> Have you even tried to change your patch to use
>>>
>>> field = (__force unsigned long __percpu *)((__force void *)p + offset);
>>> this_cpu_inc(*field);
>>
>> Yes, I tested this code. But the following didn't show anything even
>> though rx_dropped is increasing.
>>
>> 'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'
> Well, I am not sure about this, "bpftrace" worked for me.
>
> Make sure your toolchain generates something that looks like what I got:
>
> 000000000000ef20 <netdev_core_stats_inc>:
>      ef20: f3 0f 1e fa           endbr64
>      ef24: e8 00 00 00 00        call   ef29 <netdev_core_stats_inc+0x9>
>                  ef25: R_X86_64_PLT32 __fentry__-0x4
>      ef29: 55                    push   %rbp
>      ef2a: 48 89 e5              mov    %rsp,%rbp
>      ef2d: 53                    push   %rbx
>      ef2e: 89 f3                 mov    %esi,%ebx
>      ef30: 48 8b 87 f0 01 00 00  mov    0x1f0(%rdi),%rax
>      ef37: 48 85 c0              test   %rax,%rax
>      ef3a: 74 0b                 je     ef47 <netdev_core_stats_inc+0x27>
>      ef3c: 89 d9                 mov    %ebx,%ecx
>      ef3e: 65 48 ff 04 08        incq   %gs:(%rax,%rcx,1)
>      ef43: 5b                    pop    %rbx
>      ef44: 5d                    pop    %rbp
>      ef45: c3                    ret
>      ef46: cc                    int3
>      ef47: e8 00 00 00 00        call   ef4c <netdev_core_stats_inc+0x2c>
>                  ef48: R_X86_64_PLT32 .text.unlikely.+0x13c
>      ef4c: 48 85 c0              test   %rax,%rax
>      ef4f: 75 eb                 jne    ef3c <netdev_core_stats_inc+0x1c>
>      ef51: eb f0                 jmp    ef43 <netdev_core_stats_inc+0x23>
>      ef53: 66 66 66 66 2e 0f 1f  data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
>      ef5a: 84 00 00 00 00 00


Let me share what I can see on my side.

1.

objdump -D vmlinux

ffffffff81b2f170 <netdev_core_stats_inc>:
ffffffff81b2f170:    e8 8b ea 55 ff           callq  ffffffff8108dc00 <__fentry__>
ffffffff81b2f175:    55                       push   %rbp
ffffffff81b2f176:    48 89 e5                 mov    %rsp,%rbp
ffffffff81b2f179:    48 83 ec 08              sub    $0x8,%rsp
ffffffff81b2f17d:    48 8b 87 e8 01 00 00     mov    0x1e8(%rdi),%rax
ffffffff81b2f184:    48 85 c0                 test   %rax,%rax
ffffffff81b2f187:    74 0d                    je     ffffffff81b2f196 <netdev_core_stats_inc+0x26>
ffffffff81b2f189:    89 f6                    mov    %esi,%esi
ffffffff81b2f18b:    65 48 ff 04 30           incq   %gs:(%rax,%rsi,1)
ffffffff81b2f190:    c9                       leaveq
ffffffff81b2f191:    e9 aa 31 6d 00           jmpq   ffffffff82202340 <__x86_return_thunk>
ffffffff81b2f196:    89 75 fc                 mov    %esi,-0x4(%rbp)
ffffffff81b2f199:    e8 82 ff ff ff           callq  ffffffff81b2f120 <netdev_core_stats_alloc>
ffffffff81b2f19e:    8b 75 fc                 mov    -0x4(%rbp),%esi
ffffffff81b2f1a1:    48 85 c0                 test   %rax,%rax
ffffffff81b2f1a4:    75 e3                    jne    ffffffff81b2f189 <netdev_core_stats_inc+0x19>
ffffffff81b2f1a6:    c9                       leaveq
ffffffff81b2f1a7:    e9 94 31 6d 00           jmpq   ffffffff82202340 <__x86_return_thunk>
ffffffff81b2f1ac:    0f 1f 40 00              nopl   0x0(%rax)


2.

sudo cat /proc/kallsyms | grep netdev_core_stats_inc

ffffffff9c72f120 T netdev_core_stats_inc
ffffffff9ca2676c t netdev_core_stats_inc.cold
ffffffff9d5235e0 r __ksymtab_netdev_core_stats_inc


3.

➜  ~ ifconfig enp34s0f0
enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
         inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
         ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
         RX packets 29024  bytes 3118278 (3.1 MB)
         RX errors 0  dropped 794  overruns 0  frame 0
         TX packets 16961  bytes 2662290 (2.6 MB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 29  memory 0x39fff4000000-39fff47fffff

➜  ~ ifconfig enp34s0f0
enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
         inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
         ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
         RX packets 29272  bytes 3148997 (3.1 MB)
         RX errors 0  dropped 798  overruns 0  frame 0
         TX packets 17098  bytes 2683547 (2.6 MB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 29  memory 0x39fff4000000-39fff47fffff


rx_dropped is increasing (794 -> 798 between the two samples).


4.

sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc

TIME     PID     TID     COMM            FUNC

(Empty, I didn't see anything.)


5.

sudo trace-cmd record -p function -l netdev_core_stats_inc

sudo trace-cmd report

(Empty, I didn't see anything.)


If I add a 'pr_info("\n");' before the increment, like this:

+       pr_info("\n");
        field = (__force unsigned long __percpu *)((__force void *)p + offset);
        this_cpu_inc(*field);


then everything is OK and the function is traced. The 'pr_info("\n");'
can be replaced with anything else, but tracing does not work without it.
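
For anyone following the '(void *)p + offset' arithmetic in the snippet
above, here is a minimal userspace sketch of the same idea. It is only an
illustration: 'struct core_stats' and 'core_stats_inc()' are hypothetical
stand-ins, not kernel code, and a plain increment replaces this_cpu_inc()
because there is no per-CPU data in userspace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct net_device_core_stats. */
struct core_stats {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
	unsigned long rx_nohandler;
	unsigned long rx_otherhost_dropped;
};

/*
 * Bump the counter that lives `offset` bytes into *stats.
 * The kernel version applies this_cpu_inc() to a __percpu pointer;
 * here there is a single copy, so a plain increment is enough.
 */
void core_stats_inc(struct core_stats *stats, size_t offset)
{
	unsigned long *field;

	/* The offset must address a whole counter inside the struct. */
	assert(offset + sizeof(unsigned long) <= sizeof(*stats));

	field = (unsigned long *)((char *)stats + offset);
	(*field)++;
}
```

A caller would pick the counter with offsetof(), for example
core_stats_inc(&s, offsetof(struct core_stats, rx_dropped)), so one
out-of-line function can serve all four counters.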
