Re: [PATCH] net/tun: expose queue utilization stats via ethtool

bpf.vger.kernel.org archive mirror

* Re: [PATCH] net/tun: expose queue utilization stats via ethtool
       [not found] ` <20250514233931.56961-1-alex-shalimov@yandex-team.ru>
@ 2025-05-15 14:12   ` Willem de Bruijn
  2025-05-16  1:56     ` Daniel Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2025-05-15 14:12 UTC (permalink / raw)
  To: Alexander Shalimov, willemdebruijn.kernel
  Cc: alex-shalimov, andrew, davem, edumazet, jacob.e.keller, jasowang,
	kuba, linux-kernel, netdev, pabeni, bpf

Alexander Shalimov wrote:
> 06.05.2025, 22:32, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com>:
> > Perhaps bpftrace with a kfunc at a suitable function entry point to
> > get access to these ring structures.
> 
> Thank you for your responses!
> 
> Initially, we implemented such monitoring using bpftrace but we were
> not satisfied with the need to double-check the structure definitions
> in tun.c for each new kernel version.
> 
> We attached a kprobe to the "tun_net_xmit()" function. This function
> gets a "struct net_device" as an argument, which is then explicitly
> cast to a tun_struct - "struct tun_struct *tun = netdev_priv(dev)".
> However, performing such a cast within bpftrace is difficult because
> tun_struct is defined in tun.c - meaning the structure definition
> cannot be included directly (not a header file). As a result, we were
> forced to add fake "struct tun_struct" and "struct tun_file"
> definitions, whose maintenance across kernel versions became
> cumbersome (see below). The same problem exists even with kfunc and
> BTF - we are not able to properly cast netdev to tun_struct.
> 
> That’s why we decided to add this functionality directly to the kernel.

Let's solve this in bpftrace instead. That's no reason to revert to
hardcoded kernel APIs.

It quite possibly already is. I'm no bpftrace expert. Cc:ing bpf@

There seem to be two parts:

The field lookup in struct tun_struct. This should be captured by BTF:

	$ bpftool btf dump file /sys/kernel/btf/vmlinux | grep tun_struct | wc -l
	1

The cast from netdev_priv to struct tun_struct. Note that in recent
kernels netdev_priv is just args->dev->priv. No need for this manual
struct tun_net_device.
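
To illustrate why a hand-computed padding such as NET_DEVICE_TUN_OFFSET
drifts between kernel versions, here is a small userspace Python model.
It assumes the historical layout, in which the private area starts at
the end of struct net_device rounded up to NETDEV_ALIGN (32) bytes. The
struct sizes 2290 and 2400 are made up for the example and are not real
kernel values; make_netdev and priv_offset are hypothetical helper names.

```python
import ctypes

NETDEV_ALIGN = 32  # alignment historically applied to the private area


def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def make_netdev(field_bytes: int):
    """Build a toy device struct holding `field_bytes` of opaque fields.

    This is NOT struct net_device; only its size matters for the model.
    """
    class ToyNetDevice(ctypes.Structure):
        _fields_ = [("fields", ctypes.c_ubyte * field_bytes)]
    return ToyNetDevice


def priv_offset(netdev_type) -> int:
    """Offset of the driver-private area: aligned end of the device struct."""
    return align_up(ctypes.sizeof(netdev_type), NETDEV_ALIGN)


old_kernel = make_netdev(2290)  # made-up size
new_kernel = make_netdev(2400)  # same struct after a few fields were added

print(hex(priv_offset(old_kernel)))  # 0x900
print(hex(priv_offset(new_kernel)))  # 0x960
```

Any field added to the device struct can move the private area, so a
constant like 0x900 has to be rechecked per kernel build. A named priv
member resolved through BTF avoids that bookkeeping.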

> 
> Here is an example of bpftrace:
> 
> #define NET_DEVICE_TUN_OFFSET 0x900
> 
> struct tun_net_device {
>     unsigned char padding[NET_DEVICE_TUN_OFFSET]; // computing this offset by hand is painful
>     struct tun_struct tun;
> };
> 
> kprobe:tun_net_xmit {
>     $skb = (struct sk_buff*) arg0;
>     $netdev = $skb->dev;
>     $tun_dev = (struct tun_net_device *)arg1;
>     $tun = $tun_dev->tun;
>    ....
> }
> 
> Could you please recommend the right way to implement such a bpftrace
> script, or alternatively a better place in the kernel for the patch?




* Re: [PATCH] net/tun: expose queue utilization stats via ethtool
  2025-05-15 14:12   ` [PATCH] net/tun: expose queue utilization stats via ethtool Willem de Bruijn
@ 2025-05-16  1:56     ` Daniel Xu
  2025-05-16 17:22       ` Willem de Bruijn
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Xu @ 2025-05-16  1:56 UTC (permalink / raw)
  To: Willem de Bruijn, Alexander Shalimov
  Cc: andrew, David Miller, Eric Dumazet, jacob.e.keller, jasowang,
	Jakub Kicinski, linux-kernel, netdev, Paolo Abeni,
	bpf@vger.kernel.org

On Thu, May 15, 2025, at 7:12 AM, Willem de Bruijn wrote:
> Let's solve this in bpftrace instead. That's no reason to revert to
> hardcoded kernel APIs.
>
> It quite possibly already is. I'm no bpftrace expert. Cc:ing bpf@

Yeah, should be possible. You haven't needed to include header
files to access type information available in BTF for a while now.
This seems to work for me - mind giving this a try?

```
fentry:tun:tun_net_xmit {
    $tun = (struct tun_struct *)args->dev->priv;
    print($tun->numqueues);  // or whatever else you want
}
```

fentry probes are better in general than kprobes if all you're doing
is attaching to the entry of a function.

You could do the same with kprobes like this if you really want, though:

```
kprobe:tun:tun_net_xmit {
    $dev = (struct net_device *)arg1;
    $tun = (struct tun_struct *)$dev->priv;
    print($tun->numqueues);  // or whatever else you want
}
```

Although it looks like there's a bug when you omit the module name
where bpftrace doesn't find the struct definition. I'll look into that.

Thanks,
Daniel


* Re: [PATCH] net/tun: expose queue utilization stats via ethtool
  2025-05-16  1:56     ` Daniel Xu
@ 2025-05-16 17:22       ` Willem de Bruijn
  2025-05-16 20:21         ` Daniel Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2025-05-16 17:22 UTC (permalink / raw)
  To: Daniel Xu, Willem de Bruijn, Alexander Shalimov
  Cc: andrew, David Miller, Eric Dumazet, jacob.e.keller, jasowang,
	Jakub Kicinski, linux-kernel, netdev, Paolo Abeni,
	bpf@vger.kernel.org

Daniel Xu wrote:
> Although it looks like there's a bug when you omit the module name
> where bpftrace doesn't find the struct definition. I'll look into that.

Minor: unless tun is built-in.

Thanks a lot for your response, Daniel. Good to know that we can get
this information without kernel changes. And I learned something new
:) Replicated your examples.



* Re: [PATCH] net/tun: expose queue utilization stats via ethtool
  2025-05-16 17:22       ` Willem de Bruijn
@ 2025-05-16 20:21         ` Daniel Xu
  2025-05-22 14:26           ` Alexander Shalimov
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Xu @ 2025-05-16 20:21 UTC (permalink / raw)
  To: Willem de Bruijn, Alexander Shalimov
  Cc: Andrew Lunn, David Miller, Eric Dumazet, Jacob Keller, jasowang,
	Jakub Kicinski, linux-kernel, netdev, Paolo Abeni,
	bpf@vger.kernel.org



On Fri, May 16, 2025, at 10:22 AM, Willem de Bruijn wrote:
> Minor: unless tun is built-in.

Ah, right.

>
> Thanks a lot for your response, Daniel. Good to know that we can get
> this information without kernel changes. And I learned something new
> :) Replicated your examples.

Nice! Feel free to CC me if you have other stuff in the future.

Bug fix for parsing implicit module BTF up here:
https://github.com/bpftrace/bpftrace/pull/4137


* Re: [PATCH] net/tun: expose queue utilization stats via ethtool
  2025-05-16 20:21         ` Daniel Xu
@ 2025-05-22 14:26           ` Alexander Shalimov
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Shalimov @ 2025-05-22 14:26 UTC (permalink / raw)
  To: dxu
  Cc: alex-shalimov, andrew, bpf, davem, edumazet, jacob.e.keller,
	jasowang, kuba, linux-kernel, netdev, pabeni,
	willemdebruijn.kernel

>> Thanks a lot for your response, Daniel. Good to know that we can get
>> this information without kernel changes. And I learned something new
>> :) Replicated your examples.
>
> Nice! Feel free to CC me if you have other stuff in the future.
> 
> Bug fix for parsing implicit module BTF up here:
> https://github.com/bpftrace/bpftrace/pull/4137

Daniel and Willem, I appreciate your help!

My mistake was that we had tun configured as a module (CONFIG_TUN=m), and I
didn't explicitly specify the module name in the kprobe. Also, thank you for
pointing out that recent kernels have added the 'priv' field to net_device.
As a result, the script has now become much more universal and simpler.
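
One possible way to avoid repeating that mistake is to choose the probe
string based on whether the module exposes its own BTF. The helper below
is only a sketch and not part of bpftrace. It assumes that a loaded
module built with BTF shows up as a file under /sys/kernel/btf; the
names list_btf and probe_spec are hypothetical.

```python
import os
from typing import Set


def list_btf(btf_dir: str = "/sys/kernel/btf") -> Set[str]:
    """Names of BTF objects exposed by the running kernel (empty if unavailable)."""
    try:
        return set(os.listdir(btf_dir))
    except OSError:
        return set()


def probe_spec(func: str, module: str, available: Set[str]) -> str:
    """Name the module explicitly when it carries its own BTF."""
    if module in available:
        return f"fentry:{module}:{func}"
    # Assumed built-in case: the function and its types live in vmlinux BTF.
    # If the module is simply not loaded, attaching will fail either way.
    return f"fentry:{func}"


print(probe_spec("tun_net_xmit", "tun", list_btf()))
```

With CONFIG_TUN=m and the module loaded, this would be expected to yield
the explicit fentry:tun:tun_net_xmit form used earlier in the thread.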

Now we will think about how to efficiently implement monitoring on top of
our bpftrace script, which dynamically reports queue utilization.
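
As a rough starting point for that monitoring, the sketch below turns
bpftrace output into per-queue utilization figures. Everything specific
here is an assumption: that the script is run with bpftrace's JSON
output mode and emits records shaped like {"type": "map", "data": ...},
that it maintains a map named @qlen keyed by queue index, and that the
ring size is 500. The sample lines are hand-written, not captured output.

```python
import json
from typing import Dict, Iterable


def latest_map(lines: Iterable[str], map_name: str) -> Dict[str, int]:
    """Return the most recent snapshot of `map_name` from JSON-lines output."""
    snapshot: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(record, dict) or record.get("type") != "map":
            continue
        data = record.get("data")
        if isinstance(data, dict) and isinstance(data.get(map_name), dict):
            snapshot = {str(k): int(v) for k, v in data[map_name].items()}
    return snapshot


def utilization(queue_len: Dict[str, int], ring_size: int) -> Dict[str, float]:
    """Fraction of each ring in use, clamped to the range [0, 1]."""
    if ring_size <= 0:
        raise ValueError("ring_size must be positive")
    return {q: min(max(n / ring_size, 0.0), 1.0) for q, n in queue_len.items()}


# Hand-written sample input (hypothetical map name and values).
sample = [
    '{"type": "attached_probes", "data": {"probes": 1}}',
    '{"type": "map", "data": {"@qlen": {"0": 125, "1": 500}}}',
]
print(utilization(latest_map(sample, "@qlen"), ring_size=500))
# {'0': 0.25, '1': 1.0}
```

Keeping the parsing separate from the utilization arithmetic makes it
easy to swap in whatever the script actually reports.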
