Re: [PATCH net-next v4] net: Re-use and set mono_delivery

BPF List
 help / color / mirror / Atom feed

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
       [not found] <20240301201348.2815102-1-quic_abchauha@quicinc.com>
@ 2024-03-12 23:52 ` Martin KaFai Lau
  2024-03-13  4:34   ` Abhishek Chauhan (ABC)
  2024-03-13  8:52   ` Willem de Bruijn
  0 siblings, 2 replies; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-12 23:52 UTC (permalink / raw)
  To: Abhishek Chauhan, Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
> 
> Setting the same bit (mono_delivery_time) to avoid dropping of
> userspace tstamp packets in the forwarding path.
> 
> Existing functionality of mono_delivery_time remains unaltered here,
> instead just extended with userspace tstamp support for bridge
> forwarding path.

The patch currently broke the bpf selftest test_tc_dtime: 
https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675

In particular, there is a uapi field __sk_buff->tstamp_type which currently has 
BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. 
BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at 
ingress or a delivery time set by user space).

__sk_buff->tstamp_type depends on skb->mono_delivery_time which does not 
necessarily mean mono after this patch. I thought about fixing it on the bpf 
side such that reading __sk_buff->tstamp_type only returns 
BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk 
is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().

There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, 
BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the 
skb->mono_delivery_time. The expectation is this could change skb->tstamp in the 
ingress skb and redirect to egress sch_fq. It could also set a mono time to 
skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then 
bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, 
BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects 
BPF_SKB_TSTAMP_DELIVERY_MONO also.

I ran out of idea to solve this uapi breakage.

I am afraid it may need to go back to v1 idea and use another bit 
(user_delivery_time) in the skb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-12 23:52 ` [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets Martin KaFai Lau
@ 2024-03-13  4:34   ` Abhishek Chauhan (ABC)
  2024-03-13  5:32     ` Abhishek Chauhan (ABC)
  2024-03-13  8:52   ` Willem de Bruijn
  1 sibling, 1 reply; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-13  4:34 UTC (permalink / raw)
  To: Martin KaFai Lau, Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/12/2024 4:52 PM, Martin KaFai Lau wrote:
> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>> Bridge driver today has no support to forward the userspace timestamp
>> packets and ends up resetting the timestamp. ETF qdisc checks the
>> packet coming from userspace and encounters to be 0 thereby dropping
>> time sensitive packets. These changes will allow userspace timestamps
>> packets to be forwarded from the bridge to NIC drivers.
>>
>> Setting the same bit (mono_delivery_time) to avoid dropping of
>> userspace tstamp packets in the forwarding path.
>>
>> Existing functionality of mono_delivery_time remains unaltered here,
>> instead just extended with userspace tstamp support for bridge
>> forwarding path.
> 
> The patch currently broke the bpf selftest test_tc_dtime: https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> 
> In particular, there is a uapi field __sk_buff->tstamp_type which currently has BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at ingress or a delivery time set by user space).
> 
> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono after this patch. I thought about fixing it on the bpf side such that reading __sk_buff->tstamp_type only returns BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> 
> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the skb->mono_delivery_time. The expectation is this could change skb->tstamp in the ingress skb and redirect to egress sch_fq. It could also set a mono time to skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects BPF_SKB_TSTAMP_DELIVERY_MONO also.
> 
> I ran out of idea to solve this uapi breakage.
> 
> I am afraid it may need to go back to v1 idea and use another bit (user_delivery_time) in the skb.
> 
I am okay to switch back to version 1 of this patch by adding another bit called userspace_time_stamp as that was the initial intent. 

Martin what do you suggest ? 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13  4:34   ` Abhishek Chauhan (ABC)
@ 2024-03-13  5:32     ` Abhishek Chauhan (ABC)
  0 siblings, 0 replies; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-13  5:32 UTC (permalink / raw)
  To: Martin KaFai Lau, Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/12/2024 9:34 PM, Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/12/2024 4:52 PM, Martin KaFai Lau wrote:
>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>> Bridge driver today has no support to forward the userspace timestamp
>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>> packet coming from userspace and encounters to be 0 thereby dropping
>>> time sensitive packets. These changes will allow userspace timestamps
>>> packets to be forwarded from the bridge to NIC drivers.
>>>
>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>> userspace tstamp packets in the forwarding path.
>>>
>>> Existing functionality of mono_delivery_time remains unaltered here,
>>> instead just extended with userspace tstamp support for bridge
>>> forwarding path.
>>
>> The patch currently broke the bpf selftest test_tc_dtime: https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>
>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at ingress or a delivery time set by user space).
>>
>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono after this patch. I thought about fixing it on the bpf side such that reading __sk_buff->tstamp_type only returns BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>
>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the skb->mono_delivery_time. The expectation is this could change skb->tstamp in the ingress skb and redirect to egress sch_fq. It could also set a mono time to skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>
>> I ran out of idea to solve this uapi breakage.
>>
>> I am afraid it may need to go back to v1 idea and use another bit (user_delivery_time) in the skb.
>>
> I am okay to switch back to version 1 of this patch by adding another bit called userspace_time_stamp as that was the initial intent. 
> 
> Martin what do you suggest ? 
Sorry i meant Willem. 

Willem do we want to fall back to version 1 where we add an extra bit for userspace timestamp as Martin is facing some issue in one of his test case. 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-12 23:52 ` [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets Martin KaFai Lau
  2024-03-13  4:34   ` Abhishek Chauhan (ABC)
@ 2024-03-13  8:52   ` Willem de Bruijn
  2024-03-13 18:42     ` Martin KaFai Lau
  1 sibling, 1 reply; 25+ messages in thread
From: Willem de Bruijn @ 2024-03-13  8:52 UTC (permalink / raw)
  To: Martin KaFai Lau, Abhishek Chauhan, Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

Martin KaFai Lau wrote:
> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> > Bridge driver today has no support to forward the userspace timestamp
> > packets and ends up resetting the timestamp. ETF qdisc checks the
> > packet coming from userspace and encounters to be 0 thereby dropping
> > time sensitive packets. These changes will allow userspace timestamps
> > packets to be forwarded from the bridge to NIC drivers.
> > 
> > Setting the same bit (mono_delivery_time) to avoid dropping of
> > userspace tstamp packets in the forwarding path.
> > 
> > Existing functionality of mono_delivery_time remains unaltered here,
> > instead just extended with userspace tstamp support for bridge
> > forwarding path.
> 
> The patch currently broke the bpf selftest test_tc_dtime: 
> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> 
> In particular, there is a uapi field __sk_buff->tstamp_type which currently has 
> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. 
> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at 
> ingress or a delivery time set by user space).
> 
> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not 
> necessarily mean mono after this patch. I thought about fixing it on the bpf 
> side such that reading __sk_buff->tstamp_type only returns 
> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk 
> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> 
> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, 
> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the 
> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the 
> ingress skb and redirect to egress sch_fq. It could also set a mono time to 
> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then 
> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, 
> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects 
> BPF_SKB_TSTAMP_DELIVERY_MONO also.
> 
> I ran out of idea to solve this uapi breakage.
> 
> I am afraid it may need to go back to v1 idea and use another bit 
> (user_delivery_time) in the skb.

Is the only conflict when bpf_skb_set_tstamp is called for an skb from
a socket with sk_clockid set (and thus SO_TXTIME called)?

Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
skb->sk is NULL is fine. This is the ingress to egress redirect case.

I don't see an immediate use for this BPF function on egress where it
would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.

Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
already explicitly programmed?

    skb->sk &&
    sock_flag(sk, SOCK_TXTIME) &&
    skb->sk->sk_clockid != CLOCK_MONOTONIC

Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
fall back on interpreting as monotonic.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13  8:52   ` Willem de Bruijn
@ 2024-03-13 18:42     ` Martin KaFai Lau
  2024-03-13 19:36       ` Willem de Bruijn
  0 siblings, 1 reply; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-13 18:42 UTC (permalink / raw)
  To: Willem de Bruijn, Abhishek Chauhan
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/13/24 1:52 AM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>> Bridge driver today has no support to forward the userspace timestamp
>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>> packet coming from userspace and encounters to be 0 thereby dropping
>>> time sensitive packets. These changes will allow userspace timestamps
>>> packets to be forwarded from the bridge to NIC drivers.
>>>
>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>> userspace tstamp packets in the forwarding path.
>>>
>>> Existing functionality of mono_delivery_time remains unaltered here,
>>> instead just extended with userspace tstamp support for bridge
>>> forwarding path.
>>
>> The patch currently broke the bpf selftest test_tc_dtime:
>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>
>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>> ingress or a delivery time set by user space).
>>
>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>> side such that reading __sk_buff->tstamp_type only returns
>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>
>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>
>> I ran out of idea to solve this uapi breakage.
>>
>> I am afraid it may need to go back to v1 idea and use another bit
>> (user_delivery_time) in the skb.
> 
> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
> a socket with sk_clockid set (and thus SO_TXTIME called)?

Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and 
its interpretation depends on skb->sk->sk_clockid.

> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
> skb->sk is NULL is fine. This is the ingress to egress redirect case.

skb->sk == NULL is fine. I tried something like this in 
bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:

__sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:

	skb->mono_delivery_time == 1 &&
	(!skb->sk ||
	 !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
	 skb->sk->sk_protocol == IPPROTO_TCP)

Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.

> 
> I don't see an immediate use for this BPF function on egress where it
> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.

The bpf prog may act as a traffic shaper that limits the bandwidth usage of all 
outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before 
sending to the sch_fq.

I currently also don't have a use case for skb->sk->sk_clockid != 
CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now 
before queuing to sch_fq.

The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) 
setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() 
between the veth pair.

> 
> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
> already explicitly programmed?

This will change the existing bpf_skb_set_tstamp() behavior, so probably not 
acceptable.

> 
>      skb->sk &&
>      sock_flag(sk, SOCK_TXTIME) &&
>      skb->sk->sk_clockid != CLOCK_MONOTONIC

> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
> fall back on interpreting as monotonic.

Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.

sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The 
tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only 
changes skb and does not change skb->sk.

I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space 
visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing 
from looking at __sock_cmsg_send().

There may be a short period of disconnect between what is in sk->sk_flags and 
what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) 
again after skb->tstamp is set by bpf. This could be considered a small glitch 
for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).

I think all this is crying for another bit in skb to mean user_delivery_time 
(skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the 
mono time either set by kernel-tcp or bpf. If we need to revert the 
mono_delivery_time reuse patch later, we will need to revert the netdev patch 
and the (to-be-made) bpf patch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 18:42     ` Martin KaFai Lau
@ 2024-03-13 19:36       ` Willem de Bruijn
  2024-03-13 20:59         ` Abhishek Chauhan (ABC)
  2024-03-13 21:01         ` Martin KaFai Lau
  0 siblings, 2 replies; 25+ messages in thread
From: Willem de Bruijn @ 2024-03-13 19:36 UTC (permalink / raw)
  To: Martin KaFai Lau, Willem de Bruijn, Abhishek Chauhan
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

Martin KaFai Lau wrote:
> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
> > Martin KaFai Lau wrote:
> >> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> >>> Bridge driver today has no support to forward the userspace timestamp
> >>> packets and ends up resetting the timestamp. ETF qdisc checks the
> >>> packet coming from userspace and encounters to be 0 thereby dropping
> >>> time sensitive packets. These changes will allow userspace timestamps
> >>> packets to be forwarded from the bridge to NIC drivers.
> >>>
> >>> Setting the same bit (mono_delivery_time) to avoid dropping of
> >>> userspace tstamp packets in the forwarding path.
> >>>
> >>> Existing functionality of mono_delivery_time remains unaltered here,
> >>> instead just extended with userspace tstamp support for bridge
> >>> forwarding path.
> >>
> >> The patch currently broke the bpf selftest test_tc_dtime:
> >> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> >>
> >> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
> >> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
> >> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
> >> ingress or a delivery time set by user space).
> >>
> >> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
> >> necessarily mean mono after this patch. I thought about fixing it on the bpf
> >> side such that reading __sk_buff->tstamp_type only returns
> >> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
> >> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> >>
> >> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
> >> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
> >> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
> >> ingress skb and redirect to egress sch_fq. It could also set a mono time to
> >> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
> >> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
> >> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
> >> BPF_SKB_TSTAMP_DELIVERY_MONO also.
> >>
> >> I ran out of idea to solve this uapi breakage.
> >>
> >> I am afraid it may need to go back to v1 idea and use another bit
> >> (user_delivery_time) in the skb.
> > 
> > Is the only conflict when bpf_skb_set_tstamp is called for an skb from
> > a socket with sk_clockid set (and thus SO_TXTIME called)?
> 
> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and 
> its interpretation depends on skb->sk->sk_clockid.
> 
> > Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
> > skb->sk is NULL is fine. This is the ingress to egress redirect case.
> 
> skb->sk == NULL is fine. I tried something like this in 
> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
> 
> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
> 
> 	skb->mono_delivery_time == 1 &&
> 	(!skb->sk ||
> 	 !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
> 	 skb->sk->sk_protocol == IPPROTO_TCP)
> 
> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
> 
> > 
> > I don't see an immediate use for this BPF function on egress where it
> > would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
> > but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
> 
> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all 
> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before 
> sending to the sch_fq.
> 
> I currently also don't have a use case for skb->sk->sk_clockid != 
> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now 
> before queuing to sch_fq.
> 
> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) 
> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() 
> between the veth pair.
> 
> > 
> > Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
> > already explicitly programmed?
> 
> This will change the existing bpf_skb_set_tstamp() behavior, so probably not 
> acceptable.
>
> > 
> >      skb->sk &&
> >      sock_flag(sk, SOCK_TXTIME) &&
> >      skb->sk->sk_clockid != CLOCK_MONOTONIC
> 
> > Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
> > fall back on interpreting as monotonic.
> 
> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
> 
> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The 
> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only 
> changes skb and does not change skb->sk.
> 
> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space 
> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing 
> from looking at __sock_cmsg_send().
> 
> There may be a short period of disconnect between what is in sk->sk_flags and 
> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) 
> again after skb->tstamp is set by bpf. This could be considered a small glitch 
> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
> 
> I think all this is crying for another bit in skb to mean user_delivery_time 
> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the 
> mono time either set by kernel-tcp or bpf.

It does sound like the approach with least side effects.

If we're going to increase to two bits per skb, it's perhaps
better to just encode the (selected supported) CLOCK_ type, rather
than only supporting clockid through skb->sk->sk_clockid.

This BPF function is the analogue to SO_TXTIME. It is clearly
extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
work with sch_etf, say.

> If we need to revert the 
> mono_delivery_time reuse patch later, we will need to revert the netdev patch 
> and the (to-be-made) bpf patch.


 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 19:36       ` Willem de Bruijn
@ 2024-03-13 20:59         ` Abhishek Chauhan (ABC)
  2024-03-13 21:19           ` Martin KaFai Lau
  2024-03-13 21:01         ` Martin KaFai Lau
  1 sibling, 1 reply; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-13 20:59 UTC (permalink / raw)
  To: Willem de Bruijn, Martin KaFai Lau
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/13/2024 12:36 PM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
>>> Martin KaFai Lau wrote:
>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>>>> Bridge driver today has no support to forward the userspace timestamp
>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>>> time sensitive packets. These changes will allow userspace timestamps
>>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>>
>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>>>> userspace tstamp packets in the forwarding path.
>>>>>
>>>>> Existing functionality of mono_delivery_time remains unaltered here,
>>>>> instead just extended with userspace tstamp support for bridge
>>>>> forwarding path.
>>>>
>>>> The patch currently broke the bpf selftest test_tc_dtime:
>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>>>
>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>>>> ingress or a delivery time set by user space).
>>>>
>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>>>> side such that reading __sk_buff->tstamp_type only returns
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>>>
>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>>>
>>>> I ran out of idea to solve this uapi breakage.
>>>>
>>>> I am afraid it may need to go back to v1 idea and use another bit
>>>> (user_delivery_time) in the skb.
>>>
>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
>>
>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and 
>> its interpretation depends on skb->sk->sk_clockid.
>>
>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
>>
>> skb->sk == NULL is fine. I tried something like this in 
>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
>>
>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
>>
>> 	skb->mono_delivery_time == 1 &&
>> 	(!skb->sk ||
>> 	 !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
>> 	 skb->sk->sk_protocol == IPPROTO_TCP)
>>
>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
>>
>>>
>>> I don't see an immediate use for this BPF function on egress where it
>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
>>
>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all 
>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before 
>> sending to the sch_fq.
>>
>> I currently also don't have a use case for skb->sk->sk_clockid != 
>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now 
>> before queuing to sch_fq.
>>
>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) 
>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() 
>> between the veth pair.
>>
>>>
>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
>>> already explicitly programmed?
>>
>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not 
>> acceptable.
>>
>>>
>>>      skb->sk &&
>>>      sock_flag(sk, SOCK_TXTIME) &&
>>>      skb->sk->sk_clockid != CLOCK_MONOTONIC
>>
>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
>>> fall back on interpreting as monotonic.
>>
>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
>>
>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The 
>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only 
>> changes skb and does not change skb->sk.
>>
>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space 
>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing 
>> from looking at __sock_cmsg_send().
>>
>> There may be a short period of disconnect between what is in sk->sk_flags and 
>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) 
>> again after skb->tstamp is set by bpf. This could be considered a small glitch 
>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
>>
>> I think all this is crying for another bit in skb to mean user_delivery_time 
>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the 
>> mono time either set by kernel-tcp or bpf.
> 
> It does sound like the approach with least side effects.
> 
Martin and Willem , while we are discussing to use seperate bit per skb to check if its 
SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff
called in filter framework
@tc_at_ingress: used within tc_classify to distinguish in/egress
@from_ingress: packet was redirected from the ingress path

Since i believe the above testcase is based on redirecting from ingress to egress part
why cant we use these already existing flags ? Please advice 

I am not completely sure if we can use these flags to solve the bpf problem or not. 
but considering the problem statment put forth by Willem stating that we are just adding 
flags for every usecase and not keeping the design scalable for future needs. 

> If we're going to increase to two bits per skb, it's perhaps
> better to just encode the (selected supported) CLOCK_ type, rather
> than only supporting clockid through skb->sk->sk_clockid.
> 
> This BPF function is the analogue to SO_TXTIME. It is clearly
> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
> work with sch_etf, say.
> 
>> If we need to revert the 
>> mono_delivery_time reuse patch later, we will need to revert the netdev patch 
>> and the (to-be-made) bpf patch.
> 
> 
>  
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 20:59         ` Abhishek Chauhan (ABC)
@ 2024-03-13 21:19           ` Martin KaFai Lau
  2024-03-13 21:41             ` Daniel Borkmann
  0 siblings, 1 reply; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-13 21:19 UTC (permalink / raw)
  To: Abhishek Chauhan (ABC), Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/13/24 1:59 PM, Abhishek Chauhan (ABC) wrote:
>>> I think all this is crying for another bit in skb to mean user_delivery_time
>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>>> mono time either set by kernel-tcp or bpf.
>>
>> It does sound like the approach with least side effects.
>>
> Martin and Willem , while we are discussing to use seperate bit per skb to check if its
> SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff
> called in filter framework
> @tc_at_ingress: used within tc_classify to distinguish in/egress
> @from_ingress: packet was redirected from the ingress path
> 
> Since i believe the above testcase is based on redirecting from ingress to egress part
> why cant we use these already existing flags ? Please advice
> 
> I am not completely sure if we can use these flags to solve the bpf problem or not.

I don't see how they are related. It can tell where a skb is at. It cannot tell 
what is in skb->tstamp.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 21:19           ` Martin KaFai Lau
@ 2024-03-13 21:41             ` Daniel Borkmann
  0 siblings, 0 replies; 25+ messages in thread
From: Daniel Borkmann @ 2024-03-13 21:41 UTC (permalink / raw)
  To: Martin KaFai Lau, Abhishek Chauhan (ABC), Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Alexei Starovoitov, Andrii Nakryiko

On 3/13/24 10:19 PM, Martin KaFai Lau wrote:
> On 3/13/24 1:59 PM, Abhishek Chauhan (ABC) wrote:
>>>> I think all this is crying for another bit in skb to mean user_delivery_time
>>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>>>> mono time either set by kernel-tcp or bpf.
>>>
>>> It does sound like the approach with least side effects.
>>>
>> Martin and Willem , while we are discussing to use seperate bit per skb to check if its
>> SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff
>> called in filter framework
>> @tc_at_ingress: used within tc_classify to distinguish in/egress
>> @from_ingress: packet was redirected from the ingress path
>>
>> Since i believe the above testcase is based on redirecting from ingress to egress part
>> why cant we use these already existing flags ? Please advice
>>
>> I am not completely sure if we can use these flags to solve the bpf problem or not.
> 
> I don't see how they are related. It can tell where a skb is at. It cannot tell what is in skb->tstamp.

+1, if we go that route as per discussion, I'd also prefer clearly encoding delivery
time clock base in skb to avoid ambiguity.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 19:36       ` Willem de Bruijn
  2024-03-13 20:59         ` Abhishek Chauhan (ABC)
@ 2024-03-13 21:01         ` Martin KaFai Lau
  2024-03-13 21:26           ` Abhishek Chauhan (ABC)
  1 sibling, 1 reply; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-13 21:01 UTC (permalink / raw)
  To: Willem de Bruijn, Abhishek Chauhan
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/13/24 12:36 PM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
>>> Martin KaFai Lau wrote:
>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>>>> Bridge driver today has no support to forward the userspace timestamp
>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>>> time sensitive packets. These changes will allow userspace timestamps
>>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>>
>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>>>> userspace tstamp packets in the forwarding path.
>>>>>
>>>>> Existing functionality of mono_delivery_time remains unaltered here,
>>>>> instead just extended with userspace tstamp support for bridge
>>>>> forwarding path.
>>>>
>>>> The patch currently broke the bpf selftest test_tc_dtime:
>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>>>
>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>>>> ingress or a delivery time set by user space).
>>>>
>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>>>> side such that reading __sk_buff->tstamp_type only returns
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>>>
>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>>>
>>>> I ran out of idea to solve this uapi breakage.
>>>>
>>>> I am afraid it may need to go back to v1 idea and use another bit
>>>> (user_delivery_time) in the skb.
>>>
>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
>>
>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
>> its interpretation depends on skb->sk->sk_clockid.
>>
>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
>>
>> skb->sk == NULL is fine. I tried something like this in
>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
>>
>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
>>
>> 	skb->mono_delivery_time == 1 &&
>> 	(!skb->sk ||
>> 	 !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
>> 	 skb->sk->sk_protocol == IPPROTO_TCP)
>>
>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
>>
>>>
>>> I don't see an immediate use for this BPF function on egress where it
>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
>>
>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
>> sending to the sch_fq.
>>
>> I currently also don't have a use case for skb->sk->sk_clockid !=
>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
>> before queuing to sch_fq.
>>
>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
>> between the veth pair.
>>
>>>
>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
>>> already explicitly programmed?
>>
>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
>> acceptable.
>>
>>>
>>>       skb->sk &&
>>>       sock_flag(sk, SOCK_TXTIME) &&
>>>       skb->sk->sk_clockid != CLOCK_MONOTONIC
>>
>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
>>> fall back on interpreting as monotonic.
>>
>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
>>
>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
>> changes skb and does not change skb->sk.
>>
>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
>> from looking at __sock_cmsg_send().
>>
>> There may be a short period of disconnect between what is in sk->sk_flags and
>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
>> again after skb->tstamp is set by bpf. This could be considered a small glitch
>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
>>
>> I think all this is crying for another bit in skb to mean user_delivery_time
>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>> mono time either set by kernel-tcp or bpf.
> 
> It does sound like the approach with least side effects.
> 
> If we're going to increase to two bits per skb, it's perhaps
> better to just encode the (selected supported) CLOCK_ type, rather
> than only supporting clockid through skb->sk->sk_clockid.

Good idea. May be starting with mono and tai (Abishek's use case?), only forward 
these two clocks and reset the skb->tstamp for others.

> 
> This BPF function is the analogue to SO_TXTIME. It is clearly
> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
> work with sch_etf, say.

Yes, if there are bits in skb to describe the clock in the skb->tstamp, 
BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the 
values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.

The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes 
to include the consideration of these two bits. I think we have mostly settled 
with the approach (thanks for the discussion!). Abhishek, not sure how much can 
be reused from this patch for the two bits apporach, do you want to revert the 
current patch first and then start from clean?

> 
>> If we need to revert the
>> mono_delivery_time reuse patch later, we will need to revert the netdev patch
>> and the (to-be-made) bpf patch.
> 
> 
>   
> 
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 21:01         ` Martin KaFai Lau
@ 2024-03-13 21:26           ` Abhishek Chauhan (ABC)
  2024-03-13 21:40             ` Willem de Bruijn
  0 siblings, 1 reply; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-13 21:26 UTC (permalink / raw)
  To: Martin KaFai Lau, Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/13/2024 2:01 PM, Martin KaFai Lau wrote:
> On 3/13/24 12:36 PM, Willem de Bruijn wrote:
>> Martin KaFai Lau wrote:
>>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
>>>> Martin KaFai Lau wrote:
>>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>>>>> Bridge driver today has no support to forward the userspace timestamp
>>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>>>> time sensitive packets. These changes will allow userspace timestamps
>>>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>>>
>>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>>>>> userspace tstamp packets in the forwarding path.
>>>>>>
>>>>>> Existing functionality of mono_delivery_time remains unaltered here,
>>>>>> instead just extended with userspace tstamp support for bridge
>>>>>> forwarding path.
>>>>>
>>>>> The patch currently broke the bpf selftest test_tc_dtime:
>>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>>>>
>>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>>>>> ingress or a delivery time set by user space).
>>>>>
>>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>>>>> side such that reading __sk_buff->tstamp_type only returns
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>>>>
>>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>>>>
>>>>> I ran out of idea to solve this uapi breakage.
>>>>>
>>>>> I am afraid it may need to go back to v1 idea and use another bit
>>>>> (user_delivery_time) in the skb.
>>>>
>>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
>>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
>>>
>>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
>>> its interpretation depends on skb->sk->sk_clockid.
>>>
>>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
>>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
>>>
>>> skb->sk == NULL is fine. I tried something like this in
>>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
>>>
>>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
>>>
>>>     skb->mono_delivery_time == 1 &&
>>>     (!skb->sk ||
>>>      !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
>>>      skb->sk->sk_protocol == IPPROTO_TCP)
>>>
>>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
>>>
>>>>
>>>> I don't see an immediate use for this BPF function on egress where it
>>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
>>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
>>>
>>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
>>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
>>> sending to the sch_fq.
>>>
>>> I currently also don't have a use case for skb->sk->sk_clockid !=
>>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
>>> before queuing to sch_fq.
>>>
>>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
>>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
>>> between the veth pair.
>>>
>>>>
>>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
>>>> already explicitly programmed?
>>>
>>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
>>> acceptable.
>>>
>>>>
>>>>       skb->sk &&
>>>>       sock_flag(sk, SOCK_TXTIME) &&
>>>>       skb->sk->sk_clockid != CLOCK_MONOTONIC
>>>
>>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
>>>> fall back on interpreting as monotonic.
>>>
>>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
>>>
>>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
>>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
>>> changes skb and does not change skb->sk.
>>>
>>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
>>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
>>> from looking at __sock_cmsg_send().
>>>
>>> There may be a short period of disconnect between what is in sk->sk_flags and
>>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
>>> again after skb->tstamp is set by bpf. This could be considered a small glitch
>>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
>>>
>>> I think all this is crying for another bit in skb to mean user_delivery_time
>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>>> mono time either set by kernel-tcp or bpf.
>>
>> It does sound like the approach with least side effects.
>>
>> If we're going to increase to two bits per skb, it's perhaps
>> better to just encode the (selected supported) CLOCK_ type, rather
>> than only supporting clockid through skb->sk->sk_clockid.
> 
> Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others.
> 
>>
>> This BPF function is the analogue to SO_TXTIME. It is clearly
>> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
>> work with sch_etf, say.
> 
> Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.
> 
> The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean?
> 
Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem)

Here is what i will do from myside
1. Revert the v4 patch :-  net: Re-use and set mono_delivery_time bit for userspace tstamp packets 
2. Keep mono_delivery_time as ease 
3. Add user_delivery_time as a new bitfield 
4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support
5. do not reset the time if either mono_delivery_time or user_delivery_time is set. 

Let me know if i have covered all the design details or add if i missed anything. 

>>
>>> If we need to revert the
>>> mono_delivery_time reuse patch later, we will need to revert the netdev patch
>>> and the (to-be-made) bpf patch.
>>
>>
>>  
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 21:26           ` Abhishek Chauhan (ABC)
@ 2024-03-13 21:40             ` Willem de Bruijn
  2024-03-13 22:08               ` Martin KaFai Lau
  0 siblings, 1 reply; 25+ messages in thread
From: Willem de Bruijn @ 2024-03-13 21:40 UTC (permalink / raw)
  To: Abhishek Chauhan (ABC), Martin KaFai Lau, Willem de Bruijn
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/13/2024 2:01 PM, Martin KaFai Lau wrote:
> > On 3/13/24 12:36 PM, Willem de Bruijn wrote:
> >> Martin KaFai Lau wrote:
> >>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
> >>>> Martin KaFai Lau wrote:
> >>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> >>>>>> Bridge driver today has no support to forward the userspace timestamp
> >>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
> >>>>>> packet coming from userspace and encounters to be 0 thereby dropping
> >>>>>> time sensitive packets. These changes will allow userspace timestamps
> >>>>>> packets to be forwarded from the bridge to NIC drivers.
> >>>>>>
> >>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
> >>>>>> userspace tstamp packets in the forwarding path.
> >>>>>>
> >>>>>> Existing functionality of mono_delivery_time remains unaltered here,
> >>>>>> instead just extended with userspace tstamp support for bridge
> >>>>>> forwarding path.
> >>>>>
> >>>>> The patch currently broke the bpf selftest test_tc_dtime:
> >>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> >>>>>
> >>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
> >>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
> >>>>> ingress or a delivery time set by user space).
> >>>>>
> >>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
> >>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
> >>>>> side such that reading __sk_buff->tstamp_type only returns
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
> >>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> >>>>>
> >>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
> >>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
> >>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
> >>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
> >>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
> >>>>>
> >>>>> I ran out of idea to solve this uapi breakage.
> >>>>>
> >>>>> I am afraid it may need to go back to v1 idea and use another bit
> >>>>> (user_delivery_time) in the skb.
> >>>>
> >>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
> >>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
> >>>
> >>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
> >>> its interpretation depends on skb->sk->sk_clockid.
> >>>
> >>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
> >>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
> >>>
> >>> skb->sk == NULL is fine. I tried something like this in
> >>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
> >>>
> >>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
> >>>
> >>>     skb->mono_delivery_time == 1 &&
> >>>     (!skb->sk ||
> >>>      !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
> >>>      skb->sk->sk_protocol == IPPROTO_TCP)
> >>>
> >>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
> >>>
> >>>>
> >>>> I don't see an immediate use for this BPF function on egress where it
> >>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
> >>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
> >>>
> >>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
> >>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
> >>> sending to the sch_fq.
> >>>
> >>> I currently also don't have a use case for skb->sk->sk_clockid !=
> >>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
> >>> before queuing to sch_fq.
> >>>
> >>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
> >>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
> >>> between the veth pair.
> >>>
> >>>>
> >>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
> >>>> already explicitly programmed?
> >>>
> >>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
> >>> acceptable.
> >>>
> >>>>
> >>>>       skb->sk &&
> >>>>       sock_flag(sk, SOCK_TXTIME) &&
> >>>>       skb->sk->sk_clockid != CLOCK_MONOTONIC
> >>>
> >>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
> >>>> fall back on interpreting as monotonic.
> >>>
> >>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
> >>>
> >>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
> >>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
> >>> changes skb and does not change skb->sk.
> >>>
> >>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
> >>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
> >>> from looking at __sock_cmsg_send().
> >>>
> >>> There may be a short period of disconnect between what is in sk->sk_flags and
> >>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
> >>> again after skb->tstamp is set by bpf. This could be considered a small glitch
> >>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
> >>>
> >>> I think all this is crying for another bit in skb to mean user_delivery_time
> >>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
> >>> mono time either set by kernel-tcp or bpf.
> >>
> >> It does sound like the approach with least side effects.
> >>
> >> If we're going to increase to two bits per skb, it's perhaps
> >> better to just encode the (selected supported) CLOCK_ type, rather
> >> than only supporting clockid through skb->sk->sk_clockid.
> > 
> > Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others.
> > 
> >>
> >> This BPF function is the analogue to SO_TXTIME. It is clearly
> >> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
> >> work with sch_etf, say.
> > 
> > Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.
> > 
> > The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean?
> > 
> Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem)
> 
> Here is what i will do from myside
> 1. Revert the v4 patch :-  net: Re-use and set mono_delivery_time bit for userspace tstamp packets 
> 2. Keep mono_delivery_time as ease 
> 3. Add user_delivery_time as a new bitfield 
> 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support
> 5. do not reset the time if either mono_delivery_time or user_delivery_time is set. 
> 
> Let me know if i have covered all the design details or add if i missed anything. 

Thanks for revising this.

No need to add the BPF part here.

I was mistaken that we can encode the clock_id in two skb bits.
SO_TXTIME allows essentially all CLOCK_...

So indeed we will need a user_delivery_time bit and use that to look
at sk_clockid.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 21:40             ` Willem de Bruijn
@ 2024-03-13 22:08               ` Martin KaFai Lau
  2024-03-14  9:49                 ` Willem de Bruijn
  0 siblings, 1 reply; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-13 22:08 UTC (permalink / raw)
  To: Willem de Bruijn, Abhishek Chauhan (ABC)
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/13/24 2:40 PM, Willem de Bruijn wrote:
> Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 3/13/2024 2:01 PM, Martin KaFai Lau wrote:
>>> On 3/13/24 12:36 PM, Willem de Bruijn wrote:
>>>> Martin KaFai Lau wrote:
>>>>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
>>>>>> Martin KaFai Lau wrote:
>>>>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>>>>>>> Bridge driver today has no support to forward the userspace timestamp
>>>>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>>>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>>>>>> time sensitive packets. These changes will allow userspace timestamps
>>>>>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>>>>>
>>>>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>>>>>>> userspace tstamp packets in the forwarding path.
>>>>>>>>
>>>>>>>> Existing functionality of mono_delivery_time remains unaltered here,
>>>>>>>> instead just extended with userspace tstamp support for bridge
>>>>>>>> forwarding path.
>>>>>>>
>>>>>>> The patch currently broke the bpf selftest test_tc_dtime:
>>>>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>>>>>>
>>>>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>>>>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>>>>>>> ingress or a delivery time set by user space).
>>>>>>>
>>>>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>>>>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>>>>>>> side such that reading __sk_buff->tstamp_type only returns
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>>>>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>>>>>>
>>>>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>>>>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>>>>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>>>>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>>>>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>>>>>>
>>>>>>> I ran out of idea to solve this uapi breakage.
>>>>>>>
>>>>>>> I am afraid it may need to go back to v1 idea and use another bit
>>>>>>> (user_delivery_time) in the skb.
>>>>>>
>>>>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
>>>>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
>>>>>
>>>>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
>>>>> its interpretation depends on skb->sk->sk_clockid.
>>>>>
>>>>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
>>>>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
>>>>>
>>>>> skb->sk == NULL is fine. I tried something like this in
>>>>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
>>>>>
>>>>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
>>>>>
>>>>>      skb->mono_delivery_time == 1 &&
>>>>>      (!skb->sk ||
>>>>>       !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
>>>>>       skb->sk->sk_protocol == IPPROTO_TCP)
>>>>>
>>>>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
>>>>>
>>>>>>
>>>>>> I don't see an immediate use for this BPF function on egress where it
>>>>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
>>>>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
>>>>>
>>>>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
>>>>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
>>>>> sending to the sch_fq.
>>>>>
>>>>> I currently also don't have a use case for skb->sk->sk_clockid !=
>>>>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
>>>>> before queuing to sch_fq.
>>>>>
>>>>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
>>>>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
>>>>> between the veth pair.
>>>>>
>>>>>>
>>>>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
>>>>>> already explicitly programmed?
>>>>>
>>>>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
>>>>> acceptable.
>>>>>
>>>>>>
>>>>>>        skb->sk &&
>>>>>>        sock_flag(sk, SOCK_TXTIME) &&
>>>>>>        skb->sk->sk_clockid != CLOCK_MONOTONIC
>>>>>
>>>>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
>>>>>> fall back on interpreting as monotonic.
>>>>>
>>>>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
>>>>>
>>>>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
>>>>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
>>>>> changes skb and does not change skb->sk.
>>>>>
>>>>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
>>>>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
>>>>> from looking at __sock_cmsg_send().
>>>>>
>>>>> There may be a short period of disconnect between what is in sk->sk_flags and
>>>>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
>>>>> again after skb->tstamp is set by bpf. This could be considered a small glitch
>>>>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
>>>>>
>>>>> I think all this is crying for another bit in skb to mean user_delivery_time
>>>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>>>>> mono time either set by kernel-tcp or bpf.
>>>>
>>>> It does sound like the approach with least side effects.
>>>>
>>>> If we're going to increase to two bits per skb, it's perhaps
>>>> better to just encode the (selected supported) CLOCK_ type, rather
>>>> than only supporting clockid through skb->sk->sk_clockid.
>>>
>>> Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others.
>>>
>>>>
>>>> This BPF function is the analogue to SO_TXTIME. It is clearly
>>>> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
>>>> work with sch_etf, say.
>>>
>>> Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.
>>>
>>> The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean?
>>>
>> Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem)
>>
>> Here is what i will do from myside
>> 1. Revert the v4 patch :-  net: Re-use and set mono_delivery_time bit for userspace tstamp packets
>> 2. Keep mono_delivery_time as ease
>> 3. Add user_delivery_time as a new bitfield
>> 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support
>> 5. do not reset the time if either mono_delivery_time or user_delivery_time is set.
>>
>> Let me know if i have covered all the design details or add if i missed anything.
> 
> Thanks for revising this.
> 
> No need to add the BPF part here.
> 
> I was mistaken that we can encode the clock_id in two skb bits.
> SO_TXTIME allows essentially all CLOCK_...

The two bits could potentially only encode the delivery time that is allowed to 
be forwarded without reset. 0 could mean refering back to sk_clockid and don't 
forward. The final consumer of the forwarded skb->tstamp is the qdisc which 
currently only has mono and tai. The concern is more on future extension to 
allow more clock type to be forwarded?

I don't have a use case for having BPF_SKB_TSTAMP_DELIVERY_TAI but curious on 
how other clock types are used now.

Thanks.

> 
> So indeed we will need a user_delivery_time bit and use that to look
> at sk_clockid.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-13 22:08               ` Martin KaFai Lau
@ 2024-03-14  9:49                 ` Willem de Bruijn
  2024-03-14 19:21                   ` Martin KaFai Lau
  0 siblings, 1 reply; 25+ messages in thread
From: Willem de Bruijn @ 2024-03-14  9:49 UTC (permalink / raw)
  To: Martin KaFai Lau, Willem de Bruijn, Abhishek Chauhan (ABC)
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

Martin KaFai Lau wrote:
> On 3/13/24 2:40 PM, Willem de Bruijn wrote:
> > Abhishek Chauhan (ABC) wrote:
> >>
> >>
> >> On 3/13/2024 2:01 PM, Martin KaFai Lau wrote:
> >>> On 3/13/24 12:36 PM, Willem de Bruijn wrote:
> >>>> Martin KaFai Lau wrote:
> >>>>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
> >>>>>> Martin KaFai Lau wrote:
> >>>>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> >>>>>>>> Bridge driver today has no support to forward the userspace timestamp
> >>>>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
> >>>>>>>> packet coming from userspace and encounters to be 0 thereby dropping
> >>>>>>>> time sensitive packets. These changes will allow userspace timestamps
> >>>>>>>> packets to be forwarded from the bridge to NIC drivers.
> >>>>>>>>
> >>>>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
> >>>>>>>> userspace tstamp packets in the forwarding path.
> >>>>>>>>
> >>>>>>>> Existing functionality of mono_delivery_time remains unaltered here,
> >>>>>>>> instead just extended with userspace tstamp support for bridge
> >>>>>>>> forwarding path.
> >>>>>>>
> >>>>>>> The patch currently broke the bpf selftest test_tc_dtime:
> >>>>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> >>>>>>>
> >>>>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
> >>>>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
> >>>>>>> ingress or a delivery time set by user space).
> >>>>>>>
> >>>>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
> >>>>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
> >>>>>>> side such that reading __sk_buff->tstamp_type only returns
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
> >>>>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> >>>>>>>
> >>>>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
> >>>>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
> >>>>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
> >>>>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
> >>>>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
> >>>>>>>
> >>>>>>> I ran out of idea to solve this uapi breakage.
> >>>>>>>
> >>>>>>> I am afraid it may need to go back to v1 idea and use another bit
> >>>>>>> (user_delivery_time) in the skb.
> >>>>>>
> >>>>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
> >>>>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
> >>>>>
> >>>>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
> >>>>> its interpretation depends on skb->sk->sk_clockid.
> >>>>>
> >>>>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
> >>>>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
> >>>>>
> >>>>> skb->sk == NULL is fine. I tried something like this in
> >>>>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
> >>>>>
> >>>>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
> >>>>>
> >>>>>      skb->mono_delivery_time == 1 &&
> >>>>>      (!skb->sk ||
> >>>>>       !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
> >>>>>       skb->sk->sk_protocol == IPPROTO_TCP)
> >>>>>
> >>>>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
> >>>>>
> >>>>>>
> >>>>>> I don't see an immediate use for this BPF function on egress where it
> >>>>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
> >>>>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
> >>>>>
> >>>>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
> >>>>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
> >>>>> sending to the sch_fq.
> >>>>>
> >>>>> I currently also don't have a use case for skb->sk->sk_clockid !=
> >>>>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
> >>>>> before queuing to sch_fq.
> >>>>>
> >>>>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
> >>>>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
> >>>>> between the veth pair.
> >>>>>
> >>>>>>
> >>>>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
> >>>>>> already explicitly programmed?
> >>>>>
> >>>>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
> >>>>> acceptable.
> >>>>>
> >>>>>>
> >>>>>>        skb->sk &&
> >>>>>>        sock_flag(sk, SOCK_TXTIME) &&
> >>>>>>        skb->sk->sk_clockid != CLOCK_MONOTONIC
> >>>>>
> >>>>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
> >>>>>> fall back on interpreting as monotonic.
> >>>>>
> >>>>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
> >>>>>
> >>>>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
> >>>>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
> >>>>> changes skb and does not change skb->sk.
> >>>>>
> >>>>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
> >>>>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
> >>>>> from looking at __sock_cmsg_send().
> >>>>>
> >>>>> There may be a short period of disconnect between what is in sk->sk_flags and
> >>>>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
> >>>>> again after skb->tstamp is set by bpf. This could be considered a small glitch
> >>>>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
> >>>>>
> >>>>> I think all this is crying for another bit in skb to mean user_delivery_time
> >>>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
> >>>>> mono time either set by kernel-tcp or bpf.
> >>>>
> >>>> It does sound like the approach with least side effects.
> >>>>
> >>>> If we're going to increase to two bits per skb, it's perhaps
> >>>> better to just encode the (selected supported) CLOCK_ type, rather
> >>>> than only supporting clockid through skb->sk->sk_clockid.
> >>>
> >>> Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others.
> >>>
> >>>>
> >>>> This BPF function is the analogue to SO_TXTIME. It is clearly
> >>>> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
> >>>> work with sch_etf, say.
> >>>
> >>> Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.
> >>>
> >>> The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean?
> >>>
> >> Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem)
> >>
> >> Here is what i will do from myside
> >> 1. Revert the v4 patch :-  net: Re-use and set mono_delivery_time bit for userspace tstamp packets
> >> 2. Keep mono_delivery_time as ease
> >> 3. Add user_delivery_time as a new bitfield
> >> 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support
> >> 5. do not reset the time if either mono_delivery_time or user_delivery_time is set.
> >>
> >> Let me know if i have covered all the design details or add if i missed anything.
> > 
> > Thanks for revising this.
> > 
> > No need to add the BPF part here.
> > 
> > I was mistaken that we can encode the clock_id in two skb bits.
> > SO_TXTIME allows essentially all CLOCK_...
> 
> The two bits could potentially only encode the delivery time that is allowed to 
> be forwarded without reset. 0 could mean refering back to sk_clockid and don't 
> forward. The final consumer of the forwarded skb->tstamp is the qdisc which 
> currently only has mono and tai.

So the followinng meaning of bit pair
{ skb->mono_delivery_time, skb->user_delivery_time } ?
 
- { 0, 0 } legacy skb->tstamp: realtime on rx
- { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit
- { 0, 1 } skb->tstamp is tai: analogous to mono case
- { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid

> The concern is more on future extension to 
> allow more clock type to be forwarded?

Right. Not because I have an immediate use for them. Only because
SO_TXTIME already allows configuring other values for sk_clockid.
  
> I don't have a use case for having BPF_SKB_TSTAMP_DELIVERY_TAI but curious on 
> how other clock types are used now.
> 
> Thanks.
> 
> > 
> > So indeed we will need a user_delivery_time bit and use that to look
> > at sk_clockid.
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-14  9:49                 ` Willem de Bruijn
@ 2024-03-14 19:21                   ` Martin KaFai Lau
  2024-03-14 20:28                     ` Willem de Bruijn
  0 siblings, 1 reply; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-14 19:21 UTC (permalink / raw)
  To: Willem de Bruijn, Abhishek Chauhan (ABC)
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/14/24 2:49 AM, Willem de Bruijn wrote:
>> The two bits could potentially only encode the delivery time that is allowed to
>> be forwarded without reset. 0 could mean refering back to sk_clockid and don't
>> forward. The final consumer of the forwarded skb->tstamp is the qdisc which
>> currently only has mono and tai.
> 
> So the followinng meaning of bit pair
> { skb->mono_delivery_time, skb->user_delivery_time } ?
>   
> - { 0, 0 } legacy skb->tstamp: realtime on rx
> - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit
> - { 0, 1 } skb->tstamp is tai: analogous to mono case
> - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid

I was thinking only forward mono and tai until it is clearer how other clocks 
will be useful for forwarding between e/ingress. By resetting all skb->tstamp 
other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 
} at egress will mean go look skb->sk->sk_clockid.

I do like your scheme such that it is much clearer what is in skb->tstamp 
without depending on other bits like tc_at_ingress or not.

"{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped 
for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed.
Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by 
skb->sk->sk_clockid".

The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be 
changed to handle the new "user_delivery_time" bit anyway, e.g. 
bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the 
"user_delivery_time" bit.

I think the "struct inet_frag_queue" also needs a new "user_delivery_time" 
field. "mono_delivery_time" is already in there.

It may as well be cleaner to combine mono_delivery_time and user_delivery_time 
into a 2 bits field like:

struct sk_buff {
	__u8 tstamp_type:2;
};

enum {
	SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
	SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */

	/* A TX (delivery) time and its clock is in skb->sk->sk_clockid.
	 *
	 * BPF_SKB_TSTAMP_DELIVERY_USER should be added
	 * such that reading __sk_buff->tstamp_type will match the
	 * SKB_TSTAMP_TYPE_TX_USER.
	 *
	 * The bpf program can learn the clockid by
	 * reading skb->sk->sk_clockid.
	 *
	 * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER)
	 * should be disallowed for now until the use case
	 * is more clear. Potentially, we could allow it
	 * in the future as long as
	 * the sock_flag(sk, SOCK_TXTIME) is true at that moment.
	 */
	SKB_TSTAMP_TYPE_TX_USER = 2,

	/* UNUSED_FOR_FUTURE = 3, */
};

It will have more code churns in the first patch to rename 
s/mono_delivery_time/tstamp_type/.

wdyt?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-14 19:21                   ` Martin KaFai Lau
@ 2024-03-14 20:28                     ` Willem de Bruijn
  2024-03-14 20:53                       ` Abhishek Chauhan (ABC)
  0 siblings, 1 reply; 25+ messages in thread
From: Willem de Bruijn @ 2024-03-14 20:28 UTC (permalink / raw)
  To: Martin KaFai Lau, Willem de Bruijn, Abhishek Chauhan (ABC)
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

Martin KaFai Lau wrote:
> On 3/14/24 2:49 AM, Willem de Bruijn wrote:
> >> The two bits could potentially only encode the delivery time that is allowed to
> >> be forwarded without reset. 0 could mean refering back to sk_clockid and don't
> >> forward. The final consumer of the forwarded skb->tstamp is the qdisc which
> >> currently only has mono and tai.
> > 
> > So the followinng meaning of bit pair
> > { skb->mono_delivery_time, skb->user_delivery_time } ?
> >   
> > - { 0, 0 } legacy skb->tstamp: realtime on rx
> > - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit
> > - { 0, 1 } skb->tstamp is tai: analogous to mono case
> > - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid
> 
> I was thinking only forward mono and tai until it is clearer how other clocks 
> will be useful for forwarding between e/ingress. By resetting all skb->tstamp 
> other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 
> } at egress will mean go look skb->sk->sk_clockid.
> 
> I do like your scheme such that it is much clearer what is in skb->tstamp 
> without depending on other bits like tc_at_ingress or not.
> 
> "{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped 
> for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed.
> Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by 
> skb->sk->sk_clockid".
> 
> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be 
> changed to handle the new "user_delivery_time" bit anyway, e.g. 
> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the 
> "user_delivery_time" bit.
> 
> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" 
> field. "mono_delivery_time" is already in there.
> 
> It may as well be cleaner to combine mono_delivery_time and user_delivery_time 
> into a 2 bits field like:
> 
> struct sk_buff {
> 	__u8 tstamp_type:2;
> };
> 
> enum {
> 	SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
> 	SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
> 
> 	/* A TX (delivery) time and its clock is in skb->sk->sk_clockid.
> 	 *
> 	 * BPF_SKB_TSTAMP_DELIVERY_USER should be added
> 	 * such that reading __sk_buff->tstamp_type will match the
> 	 * SKB_TSTAMP_TYPE_TX_USER.
> 	 *
> 	 * The bpf program can learn the clockid by
> 	 * reading skb->sk->sk_clockid.
> 	 *
> 	 * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER)
> 	 * should be disallowed for now until the use case
> 	 * is more clear. Potentially, we could allow it
> 	 * in the future as long as
> 	 * the sock_flag(sk, SOCK_TXTIME) is true at that moment.
> 	 */
> 	SKB_TSTAMP_TYPE_TX_USER = 2,
> 
> 	/* UNUSED_FOR_FUTURE = 3, */
> };
> 
> It will have more code churns in the first patch to rename 
> s/mono_delivery_time/tstamp_type/.
> 
> wdyt?

I asked for such code churn in the original patch. We then decided to
leave the variable name as is, as the churn was significant.

Long term, it is obviously cleaner.

I don't have a strong opinion. If doing this, let's at least make it
two separate patches, one that is a NOOP rename only.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-14 20:28                     ` Willem de Bruijn
@ 2024-03-14 20:53                       ` Abhishek Chauhan (ABC)
  2024-03-14 21:48                         ` Martin KaFai Lau
  0 siblings, 1 reply; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-14 20:53 UTC (permalink / raw)
  To: Willem de Bruijn, Martin KaFai Lau
  Cc: kernel, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/14/2024 1:28 PM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 3/14/24 2:49 AM, Willem de Bruijn wrote:
>>>> The two bits could potentially only encode the delivery time that is allowed to
>>>> be forwarded without reset. 0 could mean refering back to sk_clockid and don't
>>>> forward. The final consumer of the forwarded skb->tstamp is the qdisc which
>>>> currently only has mono and tai.
>>>
>>> So the followinng meaning of bit pair
>>> { skb->mono_delivery_time, skb->user_delivery_time } ?
>>>   
>>> - { 0, 0 } legacy skb->tstamp: realtime on rx
>>> - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit
>>> - { 0, 1 } skb->tstamp is tai: analogous to mono case
>>> - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid
>>
>> I was thinking only forward mono and tai until it is clearer how other clocks 
>> will be useful for forwarding between e/ingress. By resetting all skb->tstamp 
>> other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 
>> } at egress will mean go look skb->sk->sk_clockid.
>>
>> I do like your scheme such that it is much clearer what is in skb->tstamp 
>> without depending on other bits like tc_at_ingress or not.
>>
>> "{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped 
>> for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed.
>> Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by 
>> skb->sk->sk_clockid".
>>
>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be 
>> changed to handle the new "user_delivery_time" bit anyway, e.g. 
>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the 
>> "user_delivery_time" bit.
>>
>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" 
>> field. "mono_delivery_time" is already in there.
>>
>> It may as well be cleaner to combine mono_delivery_time and user_delivery_time 
>> into a 2 bits field like:
>>
>> struct sk_buff {
>> 	__u8 tstamp_type:2;
>> };
>>
>> enum {
>> 	SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>> 	SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
>>
>> 	/* A TX (delivery) time and its clock is in skb->sk->sk_clockid.
>> 	 *
>> 	 * BPF_SKB_TSTAMP_DELIVERY_USER should be added
>> 	 * such that reading __sk_buff->tstamp_type will match the
>> 	 * SKB_TSTAMP_TYPE_TX_USER.
>> 	 *
>> 	 * The bpf program can learn the clockid by
>> 	 * reading skb->sk->sk_clockid.
>> 	 *
>> 	 * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER)
>> 	 * should be disallowed for now until the use case
>> 	 * is more clear. Potentially, we could allow it
>> 	 * in the future as long as
>> 	 * the sock_flag(sk, SOCK_TXTIME) is true at that moment.
>> 	 */
>> 	SKB_TSTAMP_TYPE_TX_USER = 2,
>>
>> 	/* UNUSED_FOR_FUTURE = 3, */
>> };
>>
>> It will have more code churns in the first patch to rename 
>> s/mono_delivery_time/tstamp_type/.
>>
>> wdyt?
> 
> I asked for such code churn in the original patch. We then decided to
> leave the variable name as is, as the churn was significant.
> 
> Long term, it is obviously cleaner.
> 
> I don't have a strong opinion. If doing this, let's at least make it
> two separate patches, one that is a NOOP rename only.
> Martin and Willem.
Lets do the cleaner approach. I feel its now or never. 

1. I will raise one patch to introduce rename mono_delivery_time to 
tstamp_type 
2. I will introduce setting of userspace timestamp type as the second bit 
whem transmit_time is set. 
3. This will be a first step to make the design scalable. 
4. Tomorrow if we have more timestamp to support, upstream community has to do is 
update the enum and increase the bitfield from 2=>3 and so on. 

I need help from Martin to test the patch which renames the mono_delivery_time 
to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)

Sounds like a plan ? 

> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-14 20:53                       ` Abhishek Chauhan (ABC)
@ 2024-03-14 21:48                         ` Martin KaFai Lau
  2024-03-14 21:54                           ` Martin KaFai Lau
  2024-03-14 22:29                           ` Abhishek Chauhan (ABC)
  0 siblings, 2 replies; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-14 21:48 UTC (permalink / raw)
  To: Abhishek Chauhan (ABC)
  Cc: Willem de Bruijn, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>> "user_delivery_time" bit.
>>>
>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>> field. "mono_delivery_time" is already in there.

[ ... ]

I would think the first step is to revert this patch. I don't think much of the 
current patch can be reused.

> 1. I will raise one patch to introduce rename mono_delivery_time to
> tstamp_type

Right, I expect something like this:

struct sk_buff {
		/* ... */
-	        __u8                    mono_delivery_time:1;
+		__u8			tstamp_type:1;
		/* ... */
};

> 2. I will introduce setting of userspace timestamp type as the second bit
> whem transmit_time is set.

I expect the second patch should be introducing the enum first

enum skb_tstamp_type {
	SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
	SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
};

and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
"skb->tstamp_type = 1;"

and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
"skb->tstamp_type = 0;"

This one I am not sure but probably need to change the skb_set_delivery_time() 
function signature also:

static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
-                                        bool mono)
+					 enum skb_tstamp_type tstamp_type)

The third patch is to change tstamp_type from 1 bit to 2 bits and add 
SKB_TSTAMP_TYPE_TX_USER.

struct sk_buff {
		/* ... */
-		__u8			tstamp_type:1;
+		__u8			tstamp_type:2;
		/* ... */
};

enum skb_tstamp_type {
	SKB_TSTAMP_TYPE_RX_REAL = 0,	/* A RX (receive) time in real */
	SKB_TSTAMP_TYPE_TX_MONO = 1,	/* A TX (delivery) time in mono */
+	SKB_TSTAMP_TYPE_TX_USER = 2,	/* A TX (delivery) time and its clock
					 * is in skb->sk->sk_clockid.
					 */

};

This will shift a bit out of the byte where tstamp_type lives. It should be the 
"inner_protocol_type" bit by my hand count. Please check if it is directly used 
in bpf instruction (filter.c). As far as I look, it is not, so should be fine. 
Some details about bpf instruction accessible skb bit field here: 
https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/

> 3. This will be a first step to make the design scalable.
> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
> update the enum and increase the bitfield from 2=>3 and so on.
> 
> I need help from Martin to test the patch which renames the mono_delivery_time
> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)

The bpf change is not a no-op rename of mono_delivery_time. It needs to take 
care of the new bit added to the tstamp_type. Please see the previous email (and 
I also left it in the beginning of this email).

Thus, you need to compile the selftests/bpf/ and run it to verify the changes 
when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto 
details. You probably only need the newer llvm (newer gcc should work also as 
bpf CI has been using it) and the newer pahole. I can definitely help if there 
is issue in running the test_progs in selftests/bpf or you have question on 
making the changes in filter.c. To run the test: "./test_progs -t 
tc_redirect/tc_redirect_dtime"

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-14 21:48                         ` Martin KaFai Lau
@ 2024-03-14 21:54                           ` Martin KaFai Lau
  2024-03-14 22:29                           ` Abhishek Chauhan (ABC)
  1 sibling, 0 replies; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-14 21:54 UTC (permalink / raw)
  To: Abhishek Chauhan (ABC)
  Cc: Willem de Bruijn, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/14/24 2:48 PM, Martin KaFai Lau wrote:
> 
> I would think the first step is to revert this patch. I don't think much of the 
> current patch can be reused.

My bad. Please ignore. Just catch up on the revert patch you sent.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-14 21:48                         ` Martin KaFai Lau
  2024-03-14 21:54                           ` Martin KaFai Lau
@ 2024-03-14 22:29                           ` Abhishek Chauhan (ABC)
  2024-03-18 19:02                             ` Abhishek Chauhan (ABC)
  1 sibling, 1 reply; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-14 22:29 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Willem de Bruijn, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/14/2024 2:48 PM, Martin KaFai Lau wrote:
> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>>> "user_delivery_time" bit.
>>>>
>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>> field. "mono_delivery_time" is already in there.
> 
> [ ... ]
> 
> I would think the first step is to revert this patch. I don't think much of the current patch can be reused.
> 
>> 1. I will raise one patch to introduce rename mono_delivery_time to
>> tstamp_type
> 
> Right, I expect something like this:
> 
> struct sk_buff {
>         /* ... */
> -            __u8                    mono_delivery_time:1;
> +        __u8            tstamp_type:1;
>         /* ... */
> };
> 

Okay ,This should be straight-forward. 

>> 2. I will introduce setting of userspace timestamp type as the second bit
>> whem transmit_time is set.
> 
> I expect the second patch should be introducing the enum first
> 
> enum skb_tstamp_type {
>     SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>     SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
> };
> 
> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
> "skb->tstamp_type = 1;"
> 
> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
> "skb->tstamp_type = 0;"
> 
> 
> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also:
> 
> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
> -                                        bool mono)
> +                     enum skb_tstamp_type tstamp_type)
> 
This should be straight-forward as well 

> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER.
> 
> struct sk_buff {
>         /* ... */
> -        __u8            tstamp_type:1;
> +        __u8            tstamp_type:2;
>         /* ... */
> };
> 
> enum skb_tstamp_type {
>     SKB_TSTAMP_TYPE_RX_REAL = 0,    /* A RX (receive) time in real */
>     SKB_TSTAMP_TYPE_TX_MONO = 1,    /* A TX (delivery) time in mono */
> +    SKB_TSTAMP_TYPE_TX_USER = 2,    /* A TX (delivery) time and its clock
>                      * is in skb->sk->sk_clockid.
>                      */
>                
> };
> 
> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/
This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. 
But i have plan on this and i know how to do it. 

Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm 
Fingers crossed :) 

> 
> 
>> 3. This will be a first step to make the design scalable.
>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
>> update the enum and increase the bitfield from 2=>3 and so on.
>>
>> I need help from Martin to test the patch which renames the mono_delivery_time
>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)
> 
> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email).
> 
> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-14 22:29                           ` Abhishek Chauhan (ABC)
@ 2024-03-18 19:02                             ` Abhishek Chauhan (ABC)
  2024-03-19 19:46                               ` Martin KaFai Lau
  2024-03-20  6:22                               ` Abhishek Chauhan (ABC)
  0 siblings, 2 replies; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-18 19:02 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Willem de Bruijn, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/14/2024 2:48 PM, Martin KaFai Lau wrote:
>> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>>>> "user_delivery_time" bit.
>>>>>
>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>> field. "mono_delivery_time" is already in there.
>>
>> [ ... ]
>>

Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to 
distinguish between timestamp anyway .

Let me know what you think ? 

>> I would think the first step is to revert this patch. I don't think much of the current patch can be reused.
>>
>>> 1. I will raise one patch to introduce rename mono_delivery_time to
>>> tstamp_type
>>
>> Right, I expect something like this:
>>
>> struct sk_buff {
>>         /* ... */
>> -            __u8                    mono_delivery_time:1;
>> +        __u8            tstamp_type:1;
>>         /* ... */
>> };
>>
> 
> Okay ,This should be straight-forward. 
> 
>>> 2. I will introduce setting of userspace timestamp type as the second bit
>>> whem transmit_time is set.
>>
>> I expect the second patch should be introducing the enum first
>>
>> enum skb_tstamp_type {
>>     SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>>     SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
>> };
>>
>> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
>> "skb->tstamp_type = 1;"
>>
>> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
>> "skb->tstamp_type = 0;"
>>
>>
>> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also:
>>
>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>> -                                        bool mono)
>> +                     enum skb_tstamp_type tstamp_type)
>>
> This should be straight-forward as well 
> 
>> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER.
>>
>> struct sk_buff {
>>         /* ... */
>> -        __u8            tstamp_type:1;
>> +        __u8            tstamp_type:2;
>>         /* ... */
>> };
>>
>> enum skb_tstamp_type {
>>     SKB_TSTAMP_TYPE_RX_REAL = 0,    /* A RX (receive) time in real */
>>     SKB_TSTAMP_TYPE_TX_MONO = 1,    /* A TX (delivery) time in mono */
>> +    SKB_TSTAMP_TYPE_TX_USER = 2,    /* A TX (delivery) time and its clock
>>                      * is in skb->sk->sk_clockid.
>>                      */
>>                
>> };
>>
>> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/
> This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. 
> But i have plan on this and i know how to do it. 
> 
> Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm 
> Fingers crossed :) 
> 
>>
>>
>>> 3. This will be a first step to make the design scalable.
>>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
>>> update the enum and increase the bitfield from 2=>3 and so on.
>>>
>>> I need help from Martin to test the patch which renames the mono_delivery_time
>>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)
>>
>> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email).
>>
>> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-18 19:02                             ` Abhishek Chauhan (ABC)
@ 2024-03-19 19:46                               ` Martin KaFai Lau
  2024-03-19 20:12                                 ` Abhishek Chauhan (ABC)
  2024-03-20  6:22                               ` Abhishek Chauhan (ABC)
  1 sibling, 1 reply; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-19 19:46 UTC (permalink / raw)
  To: Abhishek Chauhan (ABC)
  Cc: Willem de Bruijn, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/18/24 12:02 PM, Abhishek Chauhan (ABC) wrote:
>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>>> field. "mono_delivery_time" is already in there.
>>> [ ... ]
>>>
> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to
> distinguish between timestamp anyway .


The context for this was before combining mono_delivery_time:1 and 
user_delivery_time:1 into tstamp_type:2. No need to add user_delivery_time to 
inet_frag_queue because it is combined into tstamp_type:2. If 
mono_delivery_time:1 is replaced with tstamp_type:2 in sk_buff, the same should 
be done in inet_frag_queue.



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-19 19:46                               ` Martin KaFai Lau
@ 2024-03-19 20:12                                 ` Abhishek Chauhan (ABC)
  0 siblings, 0 replies; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-19 20:12 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Willem de Bruijn, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/19/2024 12:46 PM, Martin KaFai Lau wrote:
> On 3/18/24 12:02 PM, Abhishek Chauhan (ABC) wrote:
>>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>>>> field. "mono_delivery_time" is already in there.
>>>> [ ... ]
>>>>
>> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to
>> distinguish between timestamp anyway .
> 
> 
> The context for this was before combining mono_delivery_time:1 and user_delivery_time:1 into tstamp_type:2. No need to add user_delivery_time to inet_frag_queue because it is combined into tstamp_type:2. If mono_delivery_time:1 is replaced with tstamp_type:2 in sk_buff, the same should be done in inet_frag_queue.
> 
Thats what i was planning to do. This is more or like the patch 3 - when brings in the changes of two bits. 
Thanks Martin for clarification. 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-18 19:02                             ` Abhishek Chauhan (ABC)
  2024-03-19 19:46                               ` Martin KaFai Lau
@ 2024-03-20  6:22                               ` Abhishek Chauhan (ABC)
  2024-03-20 20:30                                 ` Martin KaFai Lau
  1 sibling, 1 reply; 25+ messages in thread
From: Abhishek Chauhan (ABC) @ 2024-03-20  6:22 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Willem de Bruijn, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko



On 3/18/2024 12:02 PM, Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 3/14/2024 2:48 PM, Martin KaFai Lau wrote:
>>> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>>>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>>>>> "user_delivery_time" bit.
>>>>>>
>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>>> field. "mono_delivery_time" is already in there.
>>>
>>> [ ... ]
>>>
> 
> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to 
> distinguish between timestamp anyway .
> 
> Let me know what you think ? 
> 
>>> I would think the first step is to revert this patch. I don't think much of the current patch can be reused.
>>>
>>>> 1. I will raise one patch to introduce rename mono_delivery_time to
>>>> tstamp_type
>>>
>>> Right, I expect something like this:
>>>
>>> struct sk_buff {
>>>         /* ... */
>>> -            __u8                    mono_delivery_time:1;
>>> +        __u8            tstamp_type:1;
>>>         /* ... */
>>> };
>>>
>>
>> Okay ,This should be straight-forward. 
>>
>>>> 2. I will introduce setting of userspace timestamp type as the second bit
>>>> whem transmit_time is set.
>>>
>>> I expect the second patch should be introducing the enum first
>>>
>>> enum skb_tstamp_type {
>>>     SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>>>     SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
>>> };
>>>
>>> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
>>> "skb->tstamp_type = 1;"
>>>
>>> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
>>> "skb->tstamp_type = 0;"
>>>
>>>
>>> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also:
>>>
>>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>>> -                                        bool mono)
>>> +                     enum skb_tstamp_type tstamp_type)
>>>
>> This should be straight-forward as well 
>>
>>> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER.
>>>
>>> struct sk_buff {
>>>         /* ... */
>>> -        __u8            tstamp_type:1;
>>> +        __u8            tstamp_type:2;
>>>         /* ... */
>>> };
>>>
>>> enum skb_tstamp_type {
>>>     SKB_TSTAMP_TYPE_RX_REAL = 0,    /* A RX (receive) time in real */
>>>     SKB_TSTAMP_TYPE_TX_MONO = 1,    /* A TX (delivery) time in mono */
>>> +    SKB_TSTAMP_TYPE_TX_USER = 2,    /* A TX (delivery) time and its clock
>>>                      * is in skb->sk->sk_clockid.
>>>                      */
>>>                
>>> };
>>>
>>> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/
>> This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. 
>> But i have plan on this and i know how to do it. 
>>
>> Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm 
>> Fingers crossed :) 
>>
>>>
>>>
>>>> 3. This will be a first step to make the design scalable.
>>>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
>>>> update the enum and increase the bitfield from 2=>3 and so on.
>>>>
>>>> I need help from Martin to test the patch which renames the mono_delivery_time
>>>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)
>>>
>>> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email).
>>>
>>> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
>>>

Martin,
I was able to compile test_progs and execute the above command mentioned by you . Does the output look okay for you ? 

[ 3076.040766] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src_fwd: link becomes ready
[ 3076.040809] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready
[ 3076.072844] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst: link becomes ready
[ 3076.072880] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst_fwd: link becomes ready
#214/5   tc_redirect/tc_redirect_dtime:OK
#214     tc_redirect:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
  2024-03-20  6:22                               ` Abhishek Chauhan (ABC)
@ 2024-03-20 20:30                                 ` Martin KaFai Lau
  0 siblings, 0 replies; 25+ messages in thread
From: Martin KaFai Lau @ 2024-03-20 20:30 UTC (permalink / raw)
  To: Abhishek Chauhan (ABC)
  Cc: Willem de Bruijn, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney,
	Martin KaFai Lau, bpf, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko

On 3/19/24 11:22 PM, Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/18/2024 12:02 PM, Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote:
>>>
>>>
>>> On 3/14/2024 2:48 PM, Martin KaFai Lau wrote:
>>>> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>>>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>>>>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>>>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>>>>>> "user_delivery_time" bit.
>>>>>>>
>>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>>>> field. "mono_delivery_time" is already in there.
>>>>
>>>> [ ... ]
>>>>
>>
>> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to
>> distinguish between timestamp anyway .
>>
>> Let me know what you think ?
>>
>>>> I would think the first step is to revert this patch. I don't think much of the current patch can be reused.
>>>>
>>>>> 1. I will raise one patch to introduce rename mono_delivery_time to
>>>>> tstamp_type
>>>>
>>>> Right, I expect something like this:
>>>>
>>>> struct sk_buff {
>>>>          /* ... */
>>>> -            __u8                    mono_delivery_time:1;
>>>> +        __u8            tstamp_type:1;
>>>>          /* ... */
>>>> };
>>>>
>>>
>>> Okay ,This should be straight-forward.
>>>
>>>>> 2. I will introduce setting of userspace timestamp type as the second bit
>>>>> whem transmit_time is set.
>>>>
>>>> I expect the second patch should be introducing the enum first
>>>>
>>>> enum skb_tstamp_type {
>>>>      SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>>>>      SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
>>>> };
>>>>
>>>> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
>>>> "skb->tstamp_type = 1;"
>>>>
>>>> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
>>>> "skb->tstamp_type = 0;"
>>>>
>>>>
>>>> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also:
>>>>
>>>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>>>> -                                        bool mono)
>>>> +                     enum skb_tstamp_type tstamp_type)
>>>>
>>> This should be straight-forward as well
>>>
>>>> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER.
>>>>
>>>> struct sk_buff {
>>>>          /* ... */
>>>> -        __u8            tstamp_type:1;
>>>> +        __u8            tstamp_type:2;
>>>>          /* ... */
>>>> };
>>>>
>>>> enum skb_tstamp_type {
>>>>      SKB_TSTAMP_TYPE_RX_REAL = 0,    /* A RX (receive) time in real */
>>>>      SKB_TSTAMP_TYPE_TX_MONO = 1,    /* A TX (delivery) time in mono */
>>>> +    SKB_TSTAMP_TYPE_TX_USER = 2,    /* A TX (delivery) time and its clock
>>>>                       * is in skb->sk->sk_clockid.
>>>>                       */
>>>>                 
>>>> };
>>>>
>>>> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/
>>> This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them.
>>> But i have plan on this and i know how to do it.
>>>
>>> Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm
>>> Fingers crossed :)
>>>
>>>>
>>>>
>>>>> 3. This will be a first step to make the design scalable.
>>>>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
>>>>> update the enum and increase the bitfield from 2=>3 and so on.
>>>>>
>>>>> I need help from Martin to test the patch which renames the mono_delivery_time
>>>>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)
>>>>
>>>> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email).
>>>>
>>>> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
>>>>
> 
> Martin,
> I was able to compile test_progs and execute the above command mentioned by you . Does the output look okay for you ?
> 
> [ 3076.040766] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src_fwd: link becomes ready
> [ 3076.040809] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready
> [ 3076.072844] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst: link becomes ready
> [ 3076.072880] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst_fwd: link becomes ready
> #214/5   tc_redirect/tc_redirect_dtime:OK
> #214     tc_redirect:OK

lgtm. thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-03-20 20:30 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20240301201348.2815102-1-quic_abchauha@quicinc.com>
2024-03-12 23:52 ` [PATCH net-next v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets Martin KaFai Lau
2024-03-13  4:34   ` Abhishek Chauhan (ABC)
2024-03-13  5:32     ` Abhishek Chauhan (ABC)
2024-03-13  8:52   ` Willem de Bruijn
2024-03-13 18:42     ` Martin KaFai Lau
2024-03-13 19:36       ` Willem de Bruijn
2024-03-13 20:59         ` Abhishek Chauhan (ABC)
2024-03-13 21:19           ` Martin KaFai Lau
2024-03-13 21:41             ` Daniel Borkmann
2024-03-13 21:01         ` Martin KaFai Lau
2024-03-13 21:26           ` Abhishek Chauhan (ABC)
2024-03-13 21:40             ` Willem de Bruijn
2024-03-13 22:08               ` Martin KaFai Lau
2024-03-14  9:49                 ` Willem de Bruijn
2024-03-14 19:21                   ` Martin KaFai Lau
2024-03-14 20:28                     ` Willem de Bruijn
2024-03-14 20:53                       ` Abhishek Chauhan (ABC)
2024-03-14 21:48                         ` Martin KaFai Lau
2024-03-14 21:54                           ` Martin KaFai Lau
2024-03-14 22:29                           ` Abhishek Chauhan (ABC)
2024-03-18 19:02                             ` Abhishek Chauhan (ABC)
2024-03-19 19:46                               ` Martin KaFai Lau
2024-03-19 20:12                                 ` Abhishek Chauhan (ABC)
2024-03-20  6:22                               ` Abhishek Chauhan (ABC)
2024-03-20 20:30                                 ` Martin KaFai Lau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox