* [RFC PATCH bpf-next v4 0/2] Replace mono_delivery_time with tstamp_type @ 2024-04-18 0:43 Abhishek Chauhan 2024-04-18 0:43 ` [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan 2024-04-18 0:43 ` [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type Abhishek Chauhan 0 siblings, 2 replies; 16+ messages in thread From: Abhishek Chauhan @ 2024-04-18 0:43 UTC (permalink / raw) To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Willem de Bruijn, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel Patch 1 :- This patch takes care of only renaming the mono delivery timestamp to tstamp_type with no change in functionality of existing available code in kernel also Starts assigning tstamp_type with either mono or real and introduces a new enum in the skbuff.h, again no change in functionality of the existing available code in kernel , just making the code scalable. Patch 2 :- Additional bit was added to support tai timestamp type to avoid tstamp drops in the forwarding path when testing TC-ETF. With this patch i am updating bpf filter.c and not sure what impacts it has towards BPF code. I need upstream BPF community to help me in adding the necessary BPF changes to avoid any BPF test case failures. I have added some BPF changes as part of this commit to handle cases of both tai and mono bit being set at the same time. Abhishek Chauhan (2): net: Rename mono_delivery_time to tstamp_type for scalabilty net: Add additional bit to support clockid_t timestamp type include/linux/skbuff.h | 61 +++++++++++++++---- include/net/inet_frag.h | 4 +- include/uapi/linux/bpf.h | 1 + net/bridge/netfilter/nf_conntrack_bridge.c | 6 +- net/core/dev.c | 2 +- net/core/filter.c | 50 ++++++++++++--- net/ieee802154/6lowpan/reassembly.c | 2 +- net/ipv4/inet_fragment.c | 2 +- net/ipv4/ip_fragment.c | 2 +- net/ipv4/ip_output.c | 11 ++-- net/ipv4/raw.c | 2 +- net/ipv4/tcp_output.c | 14 ++--- net/ipv6/ip6_output.c | 8 +-- net/ipv6/netfilter.c | 6 +- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +- net/ipv6/raw.c | 2 +- net/ipv6/reassembly.c | 2 +- net/ipv6/tcp_ipv6.c | 2 +- net/packet/af_packet.c | 7 +-- net/sched/act_bpf.c | 4 +- net/sched/cls_bpf.c | 4 +- .../selftests/bpf/prog_tests/ctx_rewrite.c | 13 ++-- 22 files changed, 139 insertions(+), 68 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-04-18 0:43 [RFC PATCH bpf-next v4 0/2] Replace mono_delivery_time with tstamp_type Abhishek Chauhan @ 2024-04-18 0:43 ` Abhishek Chauhan 2024-04-18 18:47 ` Willem de Bruijn 2024-04-18 0:43 ` [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type Abhishek Chauhan 1 sibling, 1 reply; 16+ messages in thread From: Abhishek Chauhan @ 2024-04-18 0:43 UTC (permalink / raw) To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Willem de Bruijn, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel mono_delivery_time was added to check if skb->tstamp has delivery time in mono clock base (i.e. EDT) otherwise skb->tstamp has timestamp in ingress and delivery_time at egress. Renaming the bitfield from mono_delivery_time to tstamp_type is for extensibilty for other timestamps such as userspace timestamp (i.e. SO_TXTIME) set via sock opts. As we are renaming the mono_delivery_time to tstamp_type, it makes sense to start assigning tstamp_type based on enum defined in this commit. Earlier we used bool arg flag to check if the tstamp is mono in function skb_set_delivery_time, Now the signature of the functions accepts tstamp_type to distinguish between mono and real time. In future tstamp_type:1 can be extended to support userspace timestamp by increasing the bitfield. Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> --- Changes since v3 - Fixed inconsistent capitalization in skbuff.h - remove reference to MONO_DELIVERY_TIME_MASK in skbuff.h and point it to skb_tstamp_type now. - Explicitely setting SKB_CLOCK_MONO if valid transmit_time ip_send_unicast_reply - Keeping skb_tstamp inline with skb_clear_tstamp. - skb_set_delivery_time checks if timstamp is 0 and sets the tstamp_type to SKB_CLOCK_REAL. - Above comments are given by Willem - Found out that skbuff.h has access to uapi/linux/time.h So now instead of using CLOCK_REAL/CLOCK_MONO i am checking actual clockid_t directly to set tstamp_type example:- CLOCK_REALTIME/CLOCK_MONOTONIC - Compilation error fixed in net/ieee802154/6lowpan/reassembly.c Changes since v2 - Minor changes to commit subject Changes since v1 - Squashed the two commits into one as mentioned by Willem. - Introduced switch in skb_set_delivery_time. - Renamed and removed directionality aspects w.r.t tstamp_type as mentioned by Willem. include/linux/skbuff.h | 44 +++++++++++++++++----- include/net/inet_frag.h | 4 +- net/bridge/netfilter/nf_conntrack_bridge.c | 6 +-- net/core/dev.c | 2 +- net/core/filter.c | 10 ++--- net/ieee802154/6lowpan/reassembly.c | 2 +- net/ipv4/inet_fragment.c | 2 +- net/ipv4/ip_fragment.c | 2 +- net/ipv4/ip_output.c | 9 +++-- net/ipv4/tcp_output.c | 14 +++---- net/ipv6/ip6_output.c | 6 +-- net/ipv6/netfilter.c | 6 +-- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +- net/ipv6/reassembly.c | 2 +- net/ipv6/tcp_ipv6.c | 2 +- net/sched/act_bpf.c | 4 +- net/sched/cls_bpf.c | 4 +- 17 files changed, 73 insertions(+), 48 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index aded4949ea79..2e7811c7e4db 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -702,6 +702,17 @@ typedef unsigned int sk_buff_data_t; typedef unsigned char *sk_buff_data_t; #endif +/** + * tstamp_type:1 can take 2 values each + * represented by time base in skb + * 0x0 => real timestamp_type + * 0x1 => mono timestamp_type + */ +enum skb_tstamp_type { + SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ + SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ +}; + /** * DOC: Basic sk_buff geometry * @@ -819,7 +830,7 @@ typedef unsigned char *sk_buff_data_t; * @dst_pending_confirm: need to confirm neighbour * @decrypted: Decrypted SKB * @slow_gro: state present at GRO time, slower prepare step required - * @mono_delivery_time: When set, skb->tstamp has the + * @tstamp_type: When set, skb->tstamp has the * delivery_time in mono clock base (i.e. EDT). Otherwise, the * skb->tstamp has the (rcv) timestamp at ingress and * delivery_time at egress. @@ -950,7 +961,7 @@ struct sk_buff { /* private: */ __u8 __mono_tc_offset[0]; /* public: */ - __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ + __u8 tstamp_type:1; /* See SKB_CLOCK_*_MASK */ #ifdef CONFIG_NET_XGRESS __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ __u8 tc_skip_classify:1; @@ -4170,7 +4181,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb, static inline void __net_timestamp(struct sk_buff *skb) { skb->tstamp = ktime_get_real(); - skb->mono_delivery_time = 0; + skb->tstamp_type = SKB_CLOCK_REAL; } static inline ktime_t net_timedelta(ktime_t t) @@ -4179,10 +4190,23 @@ static inline ktime_t net_timedelta(ktime_t t) } static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, - bool mono) + u8 tstamp_type) { skb->tstamp = kt; - skb->mono_delivery_time = kt && mono; + + if (!kt) { + skb->tstamp_type = SKB_CLOCK_REAL; + return; + } + + switch (tstamp_type) { + case CLOCK_REALTIME: + skb->tstamp_type = SKB_CLOCK_REAL; + break; + case CLOCK_MONOTONIC: + skb->tstamp_type = SKB_CLOCK_MONO; + break; + } } DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); @@ -4192,8 +4216,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); */ static inline void skb_clear_delivery_time(struct sk_buff *skb) { - if (skb->mono_delivery_time) { - skb->mono_delivery_time = 0; + if (skb->tstamp_type) { + skb->tstamp_type = SKB_CLOCK_REAL; if (static_branch_unlikely(&netstamp_needed_key)) skb->tstamp = ktime_get_real(); else @@ -4203,7 +4227,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb) static inline void skb_clear_tstamp(struct sk_buff *skb) { - if (skb->mono_delivery_time) + if (skb->tstamp_type) return; skb->tstamp = 0; @@ -4211,7 +4235,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb) static inline ktime_t skb_tstamp(const struct sk_buff *skb) { - if (skb->mono_delivery_time) + if (skb->tstamp_type) return 0; return skb->tstamp; @@ -4219,7 +4243,7 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb) static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond) { - if (!skb->mono_delivery_time && skb->tstamp) + if (skb->tstamp_type != SKB_CLOCK_MONO && skb->tstamp) return skb->tstamp; if (static_branch_unlikely(&netstamp_needed_key) || cond) diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index 153960663ce4..5af6eb14c5db 100644 --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -76,7 +76,7 @@ struct frag_v6_compare_key { * @stamp: timestamp of the last received fragment * @len: total length of the original datagram * @meat: length of received fragments so far - * @mono_delivery_time: stamp has a mono delivery time (EDT) + * @tstamp_type: stamp has a mono delivery time (EDT) * @flags: fragment queue flags * @max_size: maximum received fragment size * @fqdir: pointer to struct fqdir @@ -97,7 +97,7 @@ struct inet_frag_queue { ktime_t stamp; int len; int meat; - u8 mono_delivery_time; + u8 tstamp_type; __u8 flags; u16 max_size; struct fqdir *fqdir; diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c index c3c51b9a6826..816bb0fde718 100644 --- a/net/bridge/netfilter/nf_conntrack_bridge.c +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, struct sk_buff *)) { int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size; - bool mono_delivery_time = skb->mono_delivery_time; + u8 tstamp_type = skb->tstamp_type; unsigned int hlen, ll_rs, mtu; ktime_t tstamp = skb->tstamp; struct ip_frag_state state; @@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, if (iter.frag) ip_fraglist_prepare(skb, &iter); - skb_set_delivery_time(skb, tstamp, mono_delivery_time); + skb_set_delivery_time(skb, tstamp, tstamp_type); err = output(net, sk, data, skb); if (err || !iter.frag) break; @@ -113,7 +113,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, goto blackhole; } - skb_set_delivery_time(skb2, tstamp, mono_delivery_time); + skb_set_delivery_time(skb2, tstamp, tstamp_type); err = output(net, sk, data, skb2); if (err) goto blackhole; diff --git a/net/core/dev.c b/net/core/dev.c index 854a3a28a8d8..4e7dfefbb824 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2146,7 +2146,7 @@ EXPORT_SYMBOL(net_disable_timestamp); static inline void net_timestamp_set(struct sk_buff *skb) { skb->tstamp = 0; - skb->mono_delivery_time = 0; + skb->tstamp_type = SKB_CLOCK_REAL; if (static_branch_unlikely(&netstamp_needed_key)) skb->tstamp = ktime_get_real(); } diff --git a/net/core/filter.c b/net/core/filter.c index 8d185d99a643..b1192e672cbb 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7709,13 +7709,13 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb, if (!tstamp) return -EINVAL; skb->tstamp = tstamp; - skb->mono_delivery_time = 1; + skb->tstamp_type = SKB_CLOCK_MONO; break; case BPF_SKB_TSTAMP_UNSPEC: if (tstamp) return -EINVAL; skb->tstamp = 0; - skb->mono_delivery_time = 0; + skb->tstamp_type = SKB_CLOCK_REAL; break; default: return -EINVAL; @@ -9422,7 +9422,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); - /* skb->tc_at_ingress && skb->mono_delivery_time, + /* skb->tc_at_ingress && skb->tstamp_type:1, * read 0 as the (rcv) timestamp. */ *insn++ = BPF_MOV64_IMM(value_reg, 0); @@ -9447,7 +9447,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog, * the bpf prog is aware the tstamp could have delivery time. * Thus, write skb->tstamp as is if tstamp_type_access is true. * Otherwise, writing at ingress will have to clear the - * mono_delivery_time bit also. + * mono_delivery_time (skb->tstamp_type:1)bit also. */ if (!prog->tstamp_type_access) { __u8 tmp_reg = BPF_REG_AX; @@ -9457,7 +9457,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog, *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1); /* goto <store> */ *insn++ = BPF_JMP_A(2); - /* <clear>: mono_delivery_time */ + /* <clear>: mono_delivery_time or (skb->tstamp_type:1) */ *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK); *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET); } diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c index 6dd960ec558c..ba0455ad7701 100644 --- a/net/ieee802154/6lowpan/reassembly.c +++ b/net/ieee802154/6lowpan/reassembly.c @@ -130,7 +130,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq, goto err; fq->q.stamp = skb->tstamp; - fq->q.mono_delivery_time = skb->mono_delivery_time; + fq->q.tstamp_type = skb->tstamp_type; if (frag_type == LOWPAN_DISPATCH_FRAG1) fq->q.flags |= INET_FRAG_FIRST_IN; diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index faaec92a46ac..d179a2c84222 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -619,7 +619,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head, skb_mark_not_on_list(head); head->prev = NULL; head->tstamp = q->stamp; - head->mono_delivery_time = q->mono_delivery_time; + head->tstamp_type = q->tstamp_type; if (sk) refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc); diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index fb947d1613fe..787aa86800f5 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -355,7 +355,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) qp->iif = dev->ifindex; qp->q.stamp = skb->tstamp; - qp->q.mono_delivery_time = skb->mono_delivery_time; + qp->q.tstamp_type = skb->tstamp_type; qp->q.meat += skb->len; qp->ecn |= ecn; add_frag_mem_limit(qp->q.fqdir, skb->truesize); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 1fe794967211..f29349151203 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -764,7 +764,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, { struct iphdr *iph; struct sk_buff *skb2; - bool mono_delivery_time = skb->mono_delivery_time; + u8 tstamp_type = skb->tstamp_type; struct rtable *rt = skb_rtable(skb); unsigned int mtu, hlen, ll_rs; struct ip_fraglist_iter iter; @@ -856,7 +856,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, } } - skb_set_delivery_time(skb, tstamp, mono_delivery_time); + skb_set_delivery_time(skb, tstamp, tstamp_type); err = output(net, sk, skb); if (!err) @@ -912,7 +912,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, /* * Put this fragment into the sending queue. */ - skb_set_delivery_time(skb2, tstamp, mono_delivery_time); + skb_set_delivery_time(skb2, tstamp, tstamp_type); err = output(net, sk, skb2); if (err) goto fail; @@ -1649,7 +1649,8 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, arg->csumoffset) = csum_fold(csum_add(nskb->csum, arg->csum)); nskb->ip_summed = CHECKSUM_NONE; - nskb->mono_delivery_time = !!transmit_time; + if (transmit_time) + nskb->tstamp_type = SKB_CLOCK_MONO; if (txhash) skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4); ip_push_pending_frames(sk, &fl4); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 61119d42b0fd..a062f88c47c3 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1300,7 +1300,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, tp = tcp_sk(sk); prior_wstamp = tp->tcp_wstamp_ns; tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache); - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true); + skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC); if (clone_it) { oskb = skb; @@ -1650,7 +1650,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue, skb_split(skb, buff, len); - skb_set_delivery_time(buff, skb->tstamp, true); + skb_set_delivery_time(buff, skb->tstamp, CLOCK_MONOTONIC); tcp_fragment_tstamp(skb, buff); old_factor = tcp_skb_pcount(skb); @@ -2731,7 +2731,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) { /* "skb_mstamp_ns" is used as a start point for the retransmit timer */ tp->tcp_wstamp_ns = tp->tcp_clock_cache; - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true); + skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC); list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue); tcp_init_tso_segs(skb, mss_now); goto repair; /* Skip network transmission */ @@ -3714,11 +3714,11 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, #ifdef CONFIG_SYN_COOKIES if (unlikely(synack_type == TCP_SYNACK_COOKIE && ireq->tstamp_ok)) skb_set_delivery_time(skb, cookie_init_timestamp(req, now), - true); + CLOCK_MONOTONIC); else #endif { - skb_set_delivery_time(skb, now, true); + skb_set_delivery_time(skb, now, CLOCK_MONOTONIC); if (!tcp_rsk(req)->snt_synack) /* Timestamp first SYNACK */ tcp_rsk(req)->snt_synack = tcp_skb_timestamp_us(skb); } @@ -3805,7 +3805,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb, synack_type, &opts); - skb_set_delivery_time(skb, now, true); + skb_set_delivery_time(skb, now, CLOCK_MONOTONIC); tcp_add_tx_delay(skb, tp); return skb; @@ -3989,7 +3989,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn) err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation); - skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, true); + skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, CLOCK_MONOTONIC); /* Now full SYN+DATA was cloned and sent (or not), * remove the SYN from the original skb (syn_data) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index b9dd3a66e423..a9e819115622 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -859,7 +859,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ? inet6_sk(skb->sk) : NULL; - bool mono_delivery_time = skb->mono_delivery_time; + u8 tstamp_type = skb->tstamp_type; struct ip6_frag_state state; unsigned int mtu, hlen, nexthdr_offset; ktime_t tstamp = skb->tstamp; @@ -955,7 +955,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, if (iter.frag) ip6_fraglist_prepare(skb, &iter); - skb_set_delivery_time(skb, tstamp, mono_delivery_time); + skb_set_delivery_time(skb, tstamp, tstamp_type); err = output(net, sk, skb); if (!err) IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), @@ -1016,7 +1016,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, /* * Put this fragment into the sending queue. */ - skb_set_delivery_time(frag, tstamp, mono_delivery_time); + skb_set_delivery_time(frag, tstamp, tstamp_type); err = output(net, sk, frag); if (err) goto fail; diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c index 53d255838e6a..e0c2347b4dc6 100644 --- a/net/ipv6/netfilter.c +++ b/net/ipv6/netfilter.c @@ -126,7 +126,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, struct sk_buff *)) { int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size; - bool mono_delivery_time = skb->mono_delivery_time; + u8 tstamp_type = skb->tstamp_type; ktime_t tstamp = skb->tstamp; struct ip6_frag_state state; u8 *prevhdr, nexthdr = 0; @@ -192,7 +192,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, if (iter.frag) ip6_fraglist_prepare(skb, &iter); - skb_set_delivery_time(skb, tstamp, mono_delivery_time); + skb_set_delivery_time(skb, tstamp, tstamp_type); err = output(net, sk, data, skb); if (err || !iter.frag) break; @@ -225,7 +225,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, goto blackhole; } - skb_set_delivery_time(skb2, tstamp, mono_delivery_time); + skb_set_delivery_time(skb2, tstamp, tstamp_type); err = output(net, sk, data, skb2); if (err) goto blackhole; diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index d0dcbaca1994..5cc5d823d33f 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, fq->iif = dev->ifindex; fq->q.stamp = skb->tstamp; - fq->q.mono_delivery_time = skb->mono_delivery_time; + fq->q.tstamp_type = skb->tstamp_type; fq->q.meat += skb->len; fq->ecn |= ecn; if (payload_len > fq->q.max_size) diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index acb4f119e11f..ea724ff558b4 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -198,7 +198,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, fq->iif = dev->ifindex; fq->q.stamp = skb->tstamp; - fq->q.mono_delivery_time = skb->mono_delivery_time; + fq->q.tstamp_type = skb->tstamp_type; fq->q.meat += skb->len; fq->ecn |= ecn; add_frag_mem_limit(fq->q.fqdir, skb->truesize); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index bb7c3caf4f85..7f8a8bd77547 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -975,7 +975,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 mark = inet_twsk(sk)->tw_mark; else mark = READ_ONCE(sk->sk_mark); - skb_set_delivery_time(buff, tcp_transmit_time(sk), true); + skb_set_delivery_time(buff, tcp_transmit_time(sk), CLOCK_MONOTONIC); } if (txhash) { /* autoflowlabel/skb_get_hash_flowi6 rely on buff->hash */ diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c index 0e3cf11ae5fc..9cd12d61818a 100644 --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -54,8 +54,8 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb, bpf_compute_data_pointers(skb); filter_res = bpf_prog_run(filter, skb); } - if (unlikely(!skb->tstamp && skb->mono_delivery_time)) - skb->mono_delivery_time = 0; + if (unlikely(!skb->tstamp && skb->tstamp_type)) + skb->tstamp_type = SKB_CLOCK_REAL; if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK) skb_orphan(skb); diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 5e83e890f6a4..506516be5287 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -104,8 +104,8 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb, bpf_compute_data_pointers(skb); filter_res = bpf_prog_run(prog->filter, skb); } - if (unlikely(!skb->tstamp && skb->mono_delivery_time)) - skb->mono_delivery_time = 0; + if (unlikely(!skb->tstamp && skb->tstamp_type)) + skb->tstamp_type = SKB_CLOCK_REAL; if (prog->exts_integrated) { res->class = 0; -- 2.25.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-04-18 0:43 ` [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan @ 2024-04-18 18:47 ` Willem de Bruijn 2024-04-18 19:58 ` Abhishek Chauhan (ABC) 0 siblings, 1 reply; 16+ messages in thread From: Willem de Bruijn @ 2024-04-18 18:47 UTC (permalink / raw) To: Abhishek Chauhan, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Willem de Bruijn, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel Abhishek Chauhan wrote: > mono_delivery_time was added to check if skb->tstamp has delivery > time in mono clock base (i.e. EDT) otherwise skb->tstamp has > timestamp in ingress and delivery_time at egress. > > Renaming the bitfield from mono_delivery_time to tstamp_type is for > extensibilty for other timestamps such as userspace timestamp > (i.e. SO_TXTIME) set via sock opts. > > As we are renaming the mono_delivery_time to tstamp_type, it makes > sense to start assigning tstamp_type based on enum defined > in this commit. > > Earlier we used bool arg flag to check if the tstamp is mono in > function skb_set_delivery_time, Now the signature of the functions > accepts tstamp_type to distinguish between mono and real time. > > In future tstamp_type:1 can be extended to support userspace timestamp > by increasing the bitfield. > > Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ > Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > +/** > + * tstamp_type:1 can take 2 values each > + * represented by time base in skb > + * 0x0 => real timestamp_type > + * 0x1 => mono timestamp_type > + */ > +enum skb_tstamp_type { > + SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ > + SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ > +}; > + Can drop the comments. These names are self documenting. > /** > * DOC: Basic sk_buff geometry > * > @@ -819,7 +830,7 @@ typedef unsigned char *sk_buff_data_t; > * @dst_pending_confirm: need to confirm neighbour > * @decrypted: Decrypted SKB > * @slow_gro: state present at GRO time, slower prepare step required > - * @mono_delivery_time: When set, skb->tstamp has the > + * @tstamp_type: When set, skb->tstamp has the > * delivery_time in mono clock base (i.e. EDT). Otherwise, the > * skb->tstamp has the (rcv) timestamp at ingress and > * delivery_time at egress. Is this still correct? I think all egress does now annotate correctly as SKB_CLOCK_MONO. So when not set it always is SKB_CLOCK_REAL. > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 61119d42b0fd..a062f88c47c3 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1300,7 +1300,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, > tp = tcp_sk(sk); > prior_wstamp = tp->tcp_wstamp_ns; > tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache); > - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true); > + skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC); Multiple references to CLOCK_MONOTONIC left ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-04-18 18:47 ` Willem de Bruijn @ 2024-04-18 19:58 ` Abhishek Chauhan (ABC) 2024-04-18 20:11 ` Willem de Bruijn 0 siblings, 1 reply; 16+ messages in thread From: Abhishek Chauhan (ABC) @ 2024-04-18 19:58 UTC (permalink / raw) To: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel On 4/18/2024 11:47 AM, Willem de Bruijn wrote: > Abhishek Chauhan wrote: >> mono_delivery_time was added to check if skb->tstamp has delivery >> time in mono clock base (i.e. EDT) otherwise skb->tstamp has >> timestamp in ingress and delivery_time at egress. >> >> Renaming the bitfield from mono_delivery_time to tstamp_type is for >> extensibilty for other timestamps such as userspace timestamp >> (i.e. SO_TXTIME) set via sock opts. >> >> As we are renaming the mono_delivery_time to tstamp_type, it makes >> sense to start assigning tstamp_type based on enum defined >> in this commit. >> >> Earlier we used bool arg flag to check if the tstamp is mono in >> function skb_set_delivery_time, Now the signature of the functions >> accepts tstamp_type to distinguish between mono and real time. >> >> In future tstamp_type:1 can be extended to support userspace timestamp >> by increasing the bitfield. >> >> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > >> +/** >> + * tstamp_type:1 can take 2 values each >> + * represented by time base in skb >> + * 0x0 => real timestamp_type >> + * 0x1 => mono timestamp_type >> + */ >> +enum skb_tstamp_type { >> + SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ >> + SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ >> +}; >> + > > Can drop the comments. These names are self documenting. Noted! . I will take care of this > >> /** >> * DOC: Basic sk_buff geometry >> * >> @@ -819,7 +830,7 @@ typedef unsigned char *sk_buff_data_t; >> * @dst_pending_confirm: need to confirm neighbour >> * @decrypted: Decrypted SKB >> * @slow_gro: state present at GRO time, slower prepare step required >> - * @mono_delivery_time: When set, skb->tstamp has the >> + * @tstamp_type: When set, skb->tstamp has the >> * delivery_time in mono clock base (i.e. EDT). Otherwise, the >> * skb->tstamp has the (rcv) timestamp at ingress and >> * delivery_time at egress. > > Is this still correct? I think all egress does now annotate correctly > as SKB_CLOCK_MONO. So when not set it always is SKB_CLOCK_REAL. > That is correct. >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >> index 61119d42b0fd..a062f88c47c3 100644 >> --- a/net/ipv4/tcp_output.c >> +++ b/net/ipv4/tcp_output.c >> @@ -1300,7 +1300,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, >> tp = tcp_sk(sk); >> prior_wstamp = tp->tcp_wstamp_ns; >> tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache); >> - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true); >> + skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC); > > Multiple references to CLOCK_MONOTONIC left > I think i took care of all the references. Apologies if i didn't understand your comment here. > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-04-18 19:58 ` Abhishek Chauhan (ABC) @ 2024-04-18 20:11 ` Willem de Bruijn 2024-04-18 20:38 ` Abhishek Chauhan (ABC) 0 siblings, 1 reply; 16+ messages in thread From: Willem de Bruijn @ 2024-04-18 20:11 UTC (permalink / raw) To: Abhishek Chauhan (ABC), Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel Abhishek Chauhan (ABC) wrote: > > > On 4/18/2024 11:47 AM, Willem de Bruijn wrote: > > Abhishek Chauhan wrote: > >> mono_delivery_time was added to check if skb->tstamp has delivery > >> time in mono clock base (i.e. EDT) otherwise skb->tstamp has > >> timestamp in ingress and delivery_time at egress. > >> > >> Renaming the bitfield from mono_delivery_time to tstamp_type is for > >> extensibilty for other timestamps such as userspace timestamp > >> (i.e. SO_TXTIME) set via sock opts. > >> > >> As we are renaming the mono_delivery_time to tstamp_type, it makes > >> sense to start assigning tstamp_type based on enum defined > >> in this commit. > >> > >> Earlier we used bool arg flag to check if the tstamp is mono in > >> function skb_set_delivery_time, Now the signature of the functions > >> accepts tstamp_type to distinguish between mono and real time. > >> > >> In future tstamp_type:1 can be extended to support userspace timestamp > >> by increasing the bitfield. > >> > >> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ > >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > > > >> +/** > >> + * tstamp_type:1 can take 2 values each > >> + * represented by time base in skb > >> + * 0x0 => real timestamp_type > >> + * 0x1 => mono timestamp_type > >> + */ > >> +enum skb_tstamp_type { > >> + SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ > >> + SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ > >> +}; > >> + > > > > Can drop the comments. These names are self documenting. > > Noted! . I will take care of this > > > >> /** > >> * DOC: Basic sk_buff geometry > >> * > >> @@ -819,7 +830,7 @@ typedef unsigned char *sk_buff_data_t; > >> * @dst_pending_confirm: need to confirm neighbour > >> * @decrypted: Decrypted SKB > >> * @slow_gro: state present at GRO time, slower prepare step required > >> - * @mono_delivery_time: When set, skb->tstamp has the > >> + * @tstamp_type: When set, skb->tstamp has the > >> * delivery_time in mono clock base (i.e. EDT). Otherwise, the > >> * skb->tstamp has the (rcv) timestamp at ingress and > >> * delivery_time at egress. > > > > Is this still correct? I think all egress does now annotate correctly > > as SKB_CLOCK_MONO. So when not set it always is SKB_CLOCK_REAL. > > > That is correct. > > >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > >> index 61119d42b0fd..a062f88c47c3 100644 > >> --- a/net/ipv4/tcp_output.c > >> +++ b/net/ipv4/tcp_output.c > >> @@ -1300,7 +1300,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, > >> tp = tcp_sk(sk); > >> prior_wstamp = tp->tcp_wstamp_ns; > >> tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache); > >> - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true); > >> + skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC); > > > > Multiple references to CLOCK_MONOTONIC left > > > I think i took care of all the references. Apologies if i didn't understand your comment here. On closer read, there is a type issue here. skb_set_delivery_time takes a u8 tstamp_type. But it is often passed a clockid_t, and that is also what the switch expects. But it does also get called with a tstamp_type in code like the following: + u8 tstamp_type = skb->tstamp_type; unsigned int hlen, ll_rs, mtu; ktime_t tstamp = skb->tstamp; struct ip_frag_state state; @@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, if (iter.frag) ip_fraglist_prepare(skb, &iter); - skb_set_delivery_time(skb, tstamp, mono_delivery_time); + skb_set_delivery_time(skb, tstamp, tstamp_type); So maybe we need two variants, one that takes a tstamp_type and one that tames a clockid_t? The first can be simple, not switch needed. Just apply the two stores. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-04-18 20:11 ` Willem de Bruijn @ 2024-04-18 20:38 ` Abhishek Chauhan (ABC) 2024-04-18 20:49 ` Willem de Bruijn 0 siblings, 1 reply; 16+ messages in thread From: Abhishek Chauhan (ABC) @ 2024-04-18 20:38 UTC (permalink / raw) To: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel On 4/18/2024 1:11 PM, Willem de Bruijn wrote: > Abhishek Chauhan (ABC) wrote: >> >> >> On 4/18/2024 11:47 AM, Willem de Bruijn wrote: >>> Abhishek Chauhan wrote: >>>> mono_delivery_time was added to check if skb->tstamp has delivery >>>> time in mono clock base (i.e. EDT) otherwise skb->tstamp has >>>> timestamp in ingress and delivery_time at egress. >>>> >>>> Renaming the bitfield from mono_delivery_time to tstamp_type is for >>>> extensibilty for other timestamps such as userspace timestamp >>>> (i.e. SO_TXTIME) set via sock opts. >>>> >>>> As we are renaming the mono_delivery_time to tstamp_type, it makes >>>> sense to start assigning tstamp_type based on enum defined >>>> in this commit. >>>> >>>> Earlier we used bool arg flag to check if the tstamp is mono in >>>> function skb_set_delivery_time, Now the signature of the functions >>>> accepts tstamp_type to distinguish between mono and real time. >>>> >>>> In future tstamp_type:1 can be extended to support userspace timestamp >>>> by increasing the bitfield. >>>> >>>> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ >>>> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> >>> >>>> +/** >>>> + * tstamp_type:1 can take 2 values each >>>> + * represented by time base in skb >>>> + * 0x0 => real timestamp_type >>>> + * 0x1 => mono timestamp_type >>>> + */ >>>> +enum skb_tstamp_type { >>>> + SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ >>>> + SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ >>>> +}; >>>> + >>> >>> Can drop the comments. These names are self documenting. >> >> Noted! . I will take care of this >>> >>>> /** >>>> * DOC: Basic sk_buff geometry >>>> * >>>> @@ -819,7 +830,7 @@ typedef unsigned char *sk_buff_data_t; >>>> * @dst_pending_confirm: need to confirm neighbour >>>> * @decrypted: Decrypted SKB >>>> * @slow_gro: state present at GRO time, slower prepare step required >>>> - * @mono_delivery_time: When set, skb->tstamp has the >>>> + * @tstamp_type: When set, skb->tstamp has the >>>> * delivery_time in mono clock base (i.e. EDT). Otherwise, the >>>> * skb->tstamp has the (rcv) timestamp at ingress and >>>> * delivery_time at egress. >>> >>> Is this still correct? I think all egress does now annotate correctly >>> as SKB_CLOCK_MONO. So when not set it always is SKB_CLOCK_REAL. >>> >> That is correct. >> >>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >>>> index 61119d42b0fd..a062f88c47c3 100644 >>>> --- a/net/ipv4/tcp_output.c >>>> +++ b/net/ipv4/tcp_output.c >>>> @@ -1300,7 +1300,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, >>>> tp = tcp_sk(sk); >>>> prior_wstamp = tp->tcp_wstamp_ns; >>>> tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache); >>>> - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true); >>>> + skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC); >>> >>> Multiple references to CLOCK_MONOTONIC left >>> >> I think i took care of all the references. Apologies if i didn't understand your comment here. > > On closer read, there is a type issue here. > > skb_set_delivery_time takes a u8 tstamp_type. But it is often passed > a clockid_t, and that is also what the switch expects. > > But it does also get called with a tstamp_type in code like the > following: > > + u8 tstamp_type = skb->tstamp_type; > unsigned int hlen, ll_rs, mtu; > ktime_t tstamp = skb->tstamp; > struct ip_frag_state state; > @@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, > if (iter.frag) > ip_fraglist_prepare(skb, &iter); > > - skb_set_delivery_time(skb, tstamp, mono_delivery_time); > + skb_set_delivery_time(skb, tstamp, tstamp_type); > > So maybe we need two variants, one that takes a tstamp_type and one > that tames a clockid_t? > > The first can be simple, not switch needed. Just apply the two stores. I agree to what you are saying but clockid_t => points to int itself. For example :- void qdisc_watchdog_init_clockid(struct qdisc_watchdog *wd, struct Qdisc *qdisc, clockid_t clockid) qdisc_watchdog_init_clockid(wd, qdisc, CLOCK_MONOTONIC); => sch_api.c qdisc_watchdog_init_clockid(&q->watchdog, sch, q->clockid); =>sch_etf.c (q->clockid is int) But i can change it to two new APIs one which accepts only clock_id (with switch) and other accepts u8 to directly store whatever is given. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-04-18 20:38 ` Abhishek Chauhan (ABC) @ 2024-04-18 20:49 ` Willem de Bruijn 2024-04-18 20:51 ` Abhishek Chauhan (ABC) 0 siblings, 1 reply; 16+ messages in thread From: Willem de Bruijn @ 2024-04-18 20:49 UTC (permalink / raw) To: Abhishek Chauhan (ABC), Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel Abhishek Chauhan (ABC) wrote: > > > On 4/18/2024 1:11 PM, Willem de Bruijn wrote: > > Abhishek Chauhan (ABC) wrote: > >> > >> > >> On 4/18/2024 11:47 AM, Willem de Bruijn wrote: > >>> Abhishek Chauhan wrote: > >>>> mono_delivery_time was added to check if skb->tstamp has delivery > >>>> time in mono clock base (i.e. EDT) otherwise skb->tstamp has > >>>> timestamp in ingress and delivery_time at egress. > >>>> > >>>> Renaming the bitfield from mono_delivery_time to tstamp_type is for > >>>> extensibilty for other timestamps such as userspace timestamp > >>>> (i.e. SO_TXTIME) set via sock opts. > >>>> > >>>> As we are renaming the mono_delivery_time to tstamp_type, it makes > >>>> sense to start assigning tstamp_type based on enum defined > >>>> in this commit. > >>>> > >>>> Earlier we used bool arg flag to check if the tstamp is mono in > >>>> function skb_set_delivery_time, Now the signature of the functions > >>>> accepts tstamp_type to distinguish between mono and real time. > >>>> > >>>> In future tstamp_type:1 can be extended to support userspace timestamp > >>>> by increasing the bitfield. > >>>> > >>>> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ > >>>> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > >>> > >>>> +/** > >>>> + * tstamp_type:1 can take 2 values each > >>>> + * represented by time base in skb > >>>> + * 0x0 => real timestamp_type > >>>> + * 0x1 => mono timestamp_type > >>>> + */ > >>>> +enum skb_tstamp_type { > >>>> + SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ > >>>> + SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ > >>>> +}; > >>>> + > >>> > >>> Can drop the comments. These names are self documenting. > >> > >> Noted! . I will take care of this > >>> > >>>> /** > >>>> * DOC: Basic sk_buff geometry > >>>> * > >>>> @@ -819,7 +830,7 @@ typedef unsigned char *sk_buff_data_t; > >>>> * @dst_pending_confirm: need to confirm neighbour > >>>> * @decrypted: Decrypted SKB > >>>> * @slow_gro: state present at GRO time, slower prepare step required > >>>> - * @mono_delivery_time: When set, skb->tstamp has the > >>>> + * @tstamp_type: When set, skb->tstamp has the > >>>> * delivery_time in mono clock base (i.e. EDT). Otherwise, the > >>>> * skb->tstamp has the (rcv) timestamp at ingress and > >>>> * delivery_time at egress. > >>> > >>> Is this still correct? I think all egress does now annotate correctly > >>> as SKB_CLOCK_MONO. So when not set it always is SKB_CLOCK_REAL. > >>> > >> That is correct. > >> > >>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > >>>> index 61119d42b0fd..a062f88c47c3 100644 > >>>> --- a/net/ipv4/tcp_output.c > >>>> +++ b/net/ipv4/tcp_output.c > >>>> @@ -1300,7 +1300,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, > >>>> tp = tcp_sk(sk); > >>>> prior_wstamp = tp->tcp_wstamp_ns; > >>>> tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache); > >>>> - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true); > >>>> + skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC); > >>> > >>> Multiple references to CLOCK_MONOTONIC left > >>> > >> I think i took care of all the references. Apologies if i didn't understand your comment here. > > > > On closer read, there is a type issue here. > > > > skb_set_delivery_time takes a u8 tstamp_type. But it is often passed > > a clockid_t, and that is also what the switch expects. > > > > But it does also get called with a tstamp_type in code like the > > following: > > > > + u8 tstamp_type = skb->tstamp_type; > > unsigned int hlen, ll_rs, mtu; > > ktime_t tstamp = skb->tstamp; > > struct ip_frag_state state; > > @@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, > > if (iter.frag) > > ip_fraglist_prepare(skb, &iter); > > > > - skb_set_delivery_time(skb, tstamp, mono_delivery_time); > > + skb_set_delivery_time(skb, tstamp, tstamp_type); > > > > So maybe we need two variants, one that takes a tstamp_type and one > > that tames a clockid_t? > > > > The first can be simple, not switch needed. Just apply the two stores. > I agree to what you are saying but clockid_t => points to int itself. > > For example :- > void qdisc_watchdog_init_clockid(struct qdisc_watchdog *wd, struct Qdisc *qdisc, > clockid_t clockid) > > qdisc_watchdog_init_clockid(wd, qdisc, CLOCK_MONOTONIC); => sch_api.c > qdisc_watchdog_init_clockid(&q->watchdog, sch, q->clockid); =>sch_etf.c (q->clockid is int) My concern is more that we use CLOCK_MONOTONIC and SKB_CLOCK_MONO (and other clocks) interchangeably, without invariant checks to make sure that they map onto the same integer value. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty 2024-04-18 20:49 ` Willem de Bruijn @ 2024-04-18 20:51 ` Abhishek Chauhan (ABC) 0 siblings, 0 replies; 16+ messages in thread From: Abhishek Chauhan (ABC) @ 2024-04-18 20:51 UTC (permalink / raw) To: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel On 4/18/2024 1:49 PM, Willem de Bruijn wrote: > Abhishek Chauhan (ABC) wrote: >> >> >> On 4/18/2024 1:11 PM, Willem de Bruijn wrote: >>> Abhishek Chauhan (ABC) wrote: >>>> >>>> >>>> On 4/18/2024 11:47 AM, Willem de Bruijn wrote: >>>>> Abhishek Chauhan wrote: >>>>>> mono_delivery_time was added to check if skb->tstamp has delivery >>>>>> time in mono clock base (i.e. EDT) otherwise skb->tstamp has >>>>>> timestamp in ingress and delivery_time at egress. >>>>>> >>>>>> Renaming the bitfield from mono_delivery_time to tstamp_type is for >>>>>> extensibilty for other timestamps such as userspace timestamp >>>>>> (i.e. SO_TXTIME) set via sock opts. >>>>>> >>>>>> As we are renaming the mono_delivery_time to tstamp_type, it makes >>>>>> sense to start assigning tstamp_type based on enum defined >>>>>> in this commit. >>>>>> >>>>>> Earlier we used bool arg flag to check if the tstamp is mono in >>>>>> function skb_set_delivery_time, Now the signature of the functions >>>>>> accepts tstamp_type to distinguish between mono and real time. >>>>>> >>>>>> In future tstamp_type:1 can be extended to support userspace timestamp >>>>>> by increasing the bitfield. >>>>>> >>>>>> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ >>>>>> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> >>>>> >>>>>> +/** >>>>>> + * tstamp_type:1 can take 2 values each >>>>>> + * represented by time base in skb >>>>>> + * 0x0 => real timestamp_type >>>>>> + * 0x1 => mono timestamp_type >>>>>> + */ >>>>>> +enum skb_tstamp_type { >>>>>> + SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ >>>>>> + SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ >>>>>> +}; >>>>>> + >>>>> >>>>> Can drop the comments. These names are self documenting. >>>> >>>> Noted! . I will take care of this >>>>> >>>>>> /** >>>>>> * DOC: Basic sk_buff geometry >>>>>> * >>>>>> @@ -819,7 +830,7 @@ typedef unsigned char *sk_buff_data_t; >>>>>> * @dst_pending_confirm: need to confirm neighbour >>>>>> * @decrypted: Decrypted SKB >>>>>> * @slow_gro: state present at GRO time, slower prepare step required >>>>>> - * @mono_delivery_time: When set, skb->tstamp has the >>>>>> + * @tstamp_type: When set, skb->tstamp has the >>>>>> * delivery_time in mono clock base (i.e. EDT). Otherwise, the >>>>>> * skb->tstamp has the (rcv) timestamp at ingress and >>>>>> * delivery_time at egress. >>>>> >>>>> Is this still correct? I think all egress does now annotate correctly >>>>> as SKB_CLOCK_MONO. So when not set it always is SKB_CLOCK_REAL. >>>>> >>>> That is correct. >>>> >>>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c >>>>>> index 61119d42b0fd..a062f88c47c3 100644 >>>>>> --- a/net/ipv4/tcp_output.c >>>>>> +++ b/net/ipv4/tcp_output.c >>>>>> @@ -1300,7 +1300,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, >>>>>> tp = tcp_sk(sk); >>>>>> prior_wstamp = tp->tcp_wstamp_ns; >>>>>> tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache); >>>>>> - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true); >>>>>> + skb_set_delivery_time(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC); >>>>> >>>>> Multiple references to CLOCK_MONOTONIC left >>>>> >>>> I think i took care of all the references. Apologies if i didn't understand your comment here. >>> >>> On closer read, there is a type issue here. >>> >>> skb_set_delivery_time takes a u8 tstamp_type. But it is often passed >>> a clockid_t, and that is also what the switch expects. >>> >>> But it does also get called with a tstamp_type in code like the >>> following: >>> >>> + u8 tstamp_type = skb->tstamp_type; >>> unsigned int hlen, ll_rs, mtu; >>> ktime_t tstamp = skb->tstamp; >>> struct ip_frag_state state; >>> @@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk, >>> if (iter.frag) >>> ip_fraglist_prepare(skb, &iter); >>> >>> - skb_set_delivery_time(skb, tstamp, mono_delivery_time); >>> + skb_set_delivery_time(skb, tstamp, tstamp_type); >>> >>> So maybe we need two variants, one that takes a tstamp_type and one >>> that tames a clockid_t? >>> >>> The first can be simple, not switch needed. Just apply the two stores. >> I agree to what you are saying but clockid_t => points to int itself. >> >> For example :- >> void qdisc_watchdog_init_clockid(struct qdisc_watchdog *wd, struct Qdisc *qdisc, >> clockid_t clockid) >> >> qdisc_watchdog_init_clockid(wd, qdisc, CLOCK_MONOTONIC); => sch_api.c >> qdisc_watchdog_init_clockid(&q->watchdog, sch, q->clockid); =>sch_etf.c (q->clockid is int) > > My concern is more that we use CLOCK_MONOTONIC and SKB_CLOCK_MONO > (and other clocks) interchangeably, without invariant checks to make > sure that they map onto the same integer value. Ah i see. I got it . I will make two APIs . Makes sense. 1. One can check for clockid => switch => set 2. One can set it directly. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type 2024-04-18 0:43 [RFC PATCH bpf-next v4 0/2] Replace mono_delivery_time with tstamp_type Abhishek Chauhan 2024-04-18 0:43 ` [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan @ 2024-04-18 0:43 ` Abhishek Chauhan 2024-04-18 19:06 ` Willem de Bruijn 1 sibling, 1 reply; 16+ messages in thread From: Abhishek Chauhan @ 2024-04-18 0:43 UTC (permalink / raw) To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Willem de Bruijn, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel tstamp_type is now set based on actual clockid_t compressed into 2 bits. To make the design scalable for future needs this commit bring in the change to extend the tstamp_type:1 to tstamp_type:2 to support other clockid_t timestamp. We now support CLOCK_TAI as part of tstamp_type as part of this commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME. Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> --- Changes since v3 - Carefully reviewed BPF APIs and made changes in BPF code as well. - Re-used actual clockid_t values since skbuff.h indirectly includes uapi/linux/time.h - Added CLOCK_TAI as part of the skb_set_delivery_time handling instead of CLOCK_USER - Added default in switch for unsupported and invalid timestamp with an WARN_ONCE. - All of the above comments were given by Willem - Made changes in filter.c as per Martin's comments to handle invalid cases in bpf code with addition of SKB_TAI_DELIVERY_TIME_MASK Changes since v2 - Minor changes to commit subject Changes since v1 - identified additional changes in BPF framework. - Bit shift in SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK. - Made changes in skb_set_delivery_time to keep changes similar to previous code for mono_delivery_time and just setting tstamp_type bit 1 for userspace timestamp. include/linux/skbuff.h | 21 +++++++--- include/uapi/linux/bpf.h | 1 + net/core/filter.c | 40 ++++++++++++++++--- net/ipv4/ip_output.c | 2 +- net/ipv4/raw.c | 2 +- net/ipv6/ip6_output.c | 2 +- net/ipv6/raw.c | 2 +- net/packet/af_packet.c | 7 ++-- .../selftests/bpf/prog_tests/ctx_rewrite.c | 13 ++++-- 9 files changed, 68 insertions(+), 22 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2e7811c7e4db..647522ef41cd 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -703,14 +703,17 @@ typedef unsigned char *sk_buff_data_t; #endif /** - * tstamp_type:1 can take 2 values each + * tstamp_type:2 can take 4 values each * represented by time base in skb * 0x0 => real timestamp_type * 0x1 => mono timestamp_type + * 0x2 => tai timestamp_type + * 0x3 => undefined timestamp_type */ enum skb_tstamp_type { SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ + SKB_CLOCK_TAI, /* Time base in skb is TAI */ }; /** @@ -833,7 +836,8 @@ enum skb_tstamp_type { * @tstamp_type: When set, skb->tstamp has the * delivery_time in mono clock base (i.e. EDT). Otherwise, the * skb->tstamp has the (rcv) timestamp at ingress and - * delivery_time at egress. + * delivery_time at egress or skb->tstamp defined by skb->sk->sk_clockid + * coming from userspace * @napi_id: id of the NAPI struct this skb came from * @sender_cpu: (aka @napi_id) source CPU in XPS * @alloc_cpu: CPU which did the skb allocation. @@ -961,7 +965,7 @@ struct sk_buff { /* private: */ __u8 __mono_tc_offset[0]; /* public: */ - __u8 tstamp_type:1; /* See SKB_CLOCK_*_MASK */ + __u8 tstamp_type:2; /* See skb_tstamp_type enum */ #ifdef CONFIG_NET_XGRESS __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ __u8 tc_skip_classify:1; @@ -1096,10 +1100,12 @@ struct sk_buff { */ #ifdef __BIG_ENDIAN_BITFIELD #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) -#define TC_AT_INGRESS_MASK (1 << 6) +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 6) +#define TC_AT_INGRESS_MASK (1 << 5) #else #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) -#define TC_AT_INGRESS_MASK (1 << 1) +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 1) +#define TC_AT_INGRESS_MASK (1 << 2) #endif #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) @@ -4206,6 +4212,11 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, case CLOCK_MONOTONIC: skb->tstamp_type = SKB_CLOCK_MONO; break; + case CLOCK_TAI: + skb->tstamp_type = SKB_CLOCK_TAI; + break; + default: + WARN_ONCE(true, "clockid %d not supported", tstamp_type); } } diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index cee0a7915c08..1376ed5ece10 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -6209,6 +6209,7 @@ union { \ enum { BPF_SKB_TSTAMP_UNSPEC, BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */ + BPF_SKB_TSTAMP_DELIVERY_TAI, /* tstamp has tai delivery time */ /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle, * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC * and try to deduce it by ingress, egress or skb->sk->sk_clockid. diff --git a/net/core/filter.c b/net/core/filter.c index b1192e672cbb..de86c42b5859 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7711,6 +7711,12 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb, skb->tstamp = tstamp; skb->tstamp_type = SKB_CLOCK_MONO; break; + case BPF_SKB_TSTAMP_DELIVERY_TAI: + if (!tstamp) + return -EINVAL; + skb->tstamp = tstamp; + skb->tstamp_type = SKB_CLOCK_TAI; + break; case BPF_SKB_TSTAMP_UNSPEC: if (tstamp) return -EINVAL; @@ -9372,10 +9378,16 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, - SKB_MONO_DELIVERY_TIME_MASK, 2); + SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, + SKB_MONO_DELIVERY_TIME_MASK, 3); + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, + SKB_TAI_DELIVERY_TIME_MASK, 4); *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); *insn++ = BPF_JMP_A(1); *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); + *insn++ = BPF_JMP_A(1); + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI); return insn; } @@ -9418,10 +9430,26 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, __u8 tmp_reg = BPF_REG_AX; *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); + /*check if all three bits are set*/ *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | + SKB_TAI_DELIVERY_TIME_MASK); + /*if all 3 bits are set jump 3 instructions and clear the register */ + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | + SKB_TAI_DELIVERY_TIME_MASK, 4); + /*Now check Mono is set with ingress mask if so clear */ + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); + /*Now Check tai is set with ingress mask if so clear */ + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, + TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); + /*Now Check tai and mono are set if so clear */ + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, + SKB_MONO_DELIVERY_TIME_MASK | + SKB_TAI_DELIVERY_TIME_MASK, 1); + /* goto <store> */ + *insn++ = BPF_JMP_A(2); /* skb->tc_at_ingress && skb->tstamp_type:1, * read 0 as the (rcv) timestamp. */ @@ -9456,9 +9484,11 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog, /* Writing __sk_buff->tstamp as ingress, goto <clear> */ *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1); /* goto <store> */ - *insn++ = BPF_JMP_A(2); + *insn++ = BPF_JMP_A(3); /* <clear>: mono_delivery_time or (skb->tstamp_type:1) */ *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK); + /* <clear>: tai delivery_time or (skb->tstamp_type:2) */ + *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TAI_DELIVERY_TIME_MASK); *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET); } #endif diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index f29349151203..91e80a3d411f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority); skb->mark = cork->mark; - skb->tstamp = cork->transmit_time; + skb_set_delivery_time(skb, cork->transmit_time, sk->sk_clockid); /* * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec * on dst refcount diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index dcb11f22cbf2..bbc46a40c8b6 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -360,7 +360,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, skb->protocol = htons(ETH_P_IP); skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; - skb->tstamp = sockc->transmit_time; + skb_set_delivery_time(skb, sockc->transmit_time, sk->sk_clockid); skb_dst_set(skb, &rt->dst); *rtp = NULL; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a9e819115622..0b8193bdd98f 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1924,7 +1924,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = cork->base.mark; - skb->tstamp = cork->base.transmit_time; + skb_set_delivery_time(skb, cork->base.transmit_time, sk->sk_clockid); ip6_cork_steal_dst(skb, cork); IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS); diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 0d896ca7b589..625f3a917e50 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -621,7 +621,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, skb->protocol = htons(ETH_P_IPV6); skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; - skb->tstamp = sockc->transmit_time; + skb_set_delivery_time(skb, sockc->transmit_time, sk->sk_clockid); skb_put(skb, length); skb_reset_network_header(skb); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8c6d3fbb4ed8..356c96f23370 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2056,8 +2056,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, skb->dev = dev; skb->priority = READ_ONCE(sk->sk_priority); skb->mark = READ_ONCE(sk->sk_mark); - skb->tstamp = sockc.transmit_time; - + skb_set_delivery_time(skb, sockc.transmit_time, sk->sk_clockid); skb_setup_tx_timestamp(skb, sockc.tsflags); if (unlikely(extra_len == 4)) @@ -2585,7 +2584,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, skb->dev = dev; skb->priority = READ_ONCE(po->sk.sk_priority); skb->mark = READ_ONCE(po->sk.sk_mark); - skb->tstamp = sockc->transmit_time; + skb_set_delivery_time(skb, sockc->transmit_time, po->sk.sk_clockid); skb_setup_tx_timestamp(skb, sockc->tsflags); skb_zcopy_set_nouarg(skb, ph.raw); @@ -3063,7 +3062,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) skb->dev = dev; skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc.mark; - skb->tstamp = sockc.transmit_time; + skb_set_delivery_time(skb, sockc.transmit_time, sk->sk_clockid); if (unlikely(extra_len == 4)) skb->no_fcs = 1; diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c index 3b7c57fe55a5..9d0c33d7f52e 100644 --- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c +++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c @@ -69,15 +69,20 @@ static struct test_case test_cases[] = { { N(SCHED_CLS, struct __sk_buff, tstamp), .read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);" - "w11 &= 3;" - "if w11 != 0x3 goto pc+2;" + "w11 &= 7;" + "if w11 == 0x7 goto pc+4;" + "if w11 == 0x5 goto pc+3;" + "if w11 == 0x6 goto pc+2;" + "if w11 == 0x3 goto pc+1;" + "goto pc+2" "$dst = 0;" "goto pc+1;" "$dst = *(u64 *)($ctx + sk_buff::tstamp);", .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);" - "if w11 & 0x2 goto pc+1;" - "goto pc+2;" + "if w11 & 0x4 goto pc+1;" + "goto pc+3;" "w11 &= -2;" + "w11 &= -3;" "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;" "*(u64 *)($ctx + sk_buff::tstamp) = $src;", }, -- 2.25.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type 2024-04-18 0:43 ` [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type Abhishek Chauhan @ 2024-04-18 19:06 ` Willem de Bruijn 2024-04-18 20:10 ` Abhishek Chauhan (ABC) 0 siblings, 1 reply; 16+ messages in thread From: Willem de Bruijn @ 2024-04-18 19:06 UTC (permalink / raw) To: Abhishek Chauhan, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Willem de Bruijn, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel Abhishek Chauhan wrote: > tstamp_type is now set based on actual clockid_t compressed > into 2 bits. > > To make the design scalable for future needs this commit bring in > the change to extend the tstamp_type:1 to tstamp_type:2 to support > other clockid_t timestamp. > > We now support CLOCK_TAI as part of tstamp_type as part of this > commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME. > > Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ > Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> > > /** > - * tstamp_type:1 can take 2 values each > + * tstamp_type:2 can take 4 values each > * represented by time base in skb > * 0x0 => real timestamp_type > * 0x1 => mono timestamp_type > + * 0x2 => tai timestamp_type > + * 0x3 => undefined timestamp_type Same point as previous patch about comment that repeats name. > @@ -833,7 +836,8 @@ enum skb_tstamp_type { > * @tstamp_type: When set, skb->tstamp has the > * delivery_time in mono clock base (i.e. EDT). Otherwise, the > * skb->tstamp has the (rcv) timestamp at ingress and > - * delivery_time at egress. > + * delivery_time at egress or skb->tstamp defined by skb->sk->sk_clockid > + * coming from userspace I would simplify the comment: clock base of skb->tstamp. Already in the first patch. > * @napi_id: id of the NAPI struct this skb came from > * @sender_cpu: (aka @napi_id) source CPU in XPS > * @alloc_cpu: CPU which did the skb allocation. > @@ -961,7 +965,7 @@ struct sk_buff { > /* private: */ > __u8 __mono_tc_offset[0]; > /* public: */ > - __u8 tstamp_type:1; /* See SKB_CLOCK_*_MASK */ > + __u8 tstamp_type:2; /* See skb_tstamp_type enum */ Probably good to call out that according to pahole this fills a hole. > #ifdef CONFIG_NET_XGRESS > __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ > __u8 tc_skip_classify:1; > @@ -1096,10 +1100,12 @@ struct sk_buff { > */ > #ifdef __BIG_ENDIAN_BITFIELD > #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) > -#define TC_AT_INGRESS_MASK (1 << 6) > +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 6) SKB_TSTAMP_TYPE_BIT2_MASK? > +#define TC_AT_INGRESS_MASK (1 << 5) > #else > #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) > -#define TC_AT_INGRESS_MASK (1 << 1) > +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 1) > +#define TC_AT_INGRESS_MASK (1 << 2) > #endif > #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) > > @@ -4206,6 +4212,11 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, > case CLOCK_MONOTONIC: > skb->tstamp_type = SKB_CLOCK_MONO; > break; > + case CLOCK_TAI: > + skb->tstamp_type = SKB_CLOCK_TAI; > + break; > + default: > + WARN_ONCE(true, "clockid %d not supported", tstamp_type); and set to 0 and default tstamp_type? > } > } > > @@ -9372,10 +9378,16 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, > *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, > SKB_BF_MONO_TC_OFFSET); > *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, > - SKB_MONO_DELIVERY_TIME_MASK, 2); > + SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); > + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, > + SKB_MONO_DELIVERY_TIME_MASK, 3); > + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, > + SKB_TAI_DELIVERY_TIME_MASK, 4); > *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); > *insn++ = BPF_JMP_A(1); > *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); > + *insn++ = BPF_JMP_A(1); > + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI); > > return insn; > } > @@ -9418,10 +9430,26 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, > __u8 tmp_reg = BPF_REG_AX; > > *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); > + /*check if all three bits are set*/ > *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, > - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); > - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, > - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); > + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | > + SKB_TAI_DELIVERY_TIME_MASK); > + /*if all 3 bits are set jump 3 instructions and clear the register */ > + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, > + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | > + SKB_TAI_DELIVERY_TIME_MASK, 4); > + /*Now check Mono is set with ingress mask if so clear */ > + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, > + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); > + /*Now Check tai is set with ingress mask if so clear */ > + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, > + TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); > + /*Now Check tai and mono are set if so clear */ > + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, > + SKB_MONO_DELIVERY_TIME_MASK | > + SKB_TAI_DELIVERY_TIME_MASK, 1); This looks as if all JEQ result in "if so clear"? Is the goal to only do something different for the two bits being 0x1, can we have a single test with a two-bit mask, rather than four tests? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type 2024-04-18 19:06 ` Willem de Bruijn @ 2024-04-18 20:10 ` Abhishek Chauhan (ABC) 2024-04-18 21:57 ` Martin KaFai Lau 0 siblings, 1 reply; 16+ messages in thread From: Abhishek Chauhan (ABC) @ 2024-04-18 20:10 UTC (permalink / raw) To: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Martin KaFai Lau, Daniel Borkmann, bpf Cc: kernel On 4/18/2024 12:06 PM, Willem de Bruijn wrote: > Abhishek Chauhan wrote: >> tstamp_type is now set based on actual clockid_t compressed >> into 2 bits. >> >> To make the design scalable for future needs this commit bring in >> the change to extend the tstamp_type:1 to tstamp_type:2 to support >> other clockid_t timestamp. >> >> We now support CLOCK_TAI as part of tstamp_type as part of this >> commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME. >> >> Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ >> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> >> >> /** >> - * tstamp_type:1 can take 2 values each >> + * tstamp_type:2 can take 4 values each >> * represented by time base in skb >> * 0x0 => real timestamp_type >> * 0x1 => mono timestamp_type >> + * 0x2 => tai timestamp_type >> + * 0x3 => undefined timestamp_type > > Same point as previous patch about comment that repeats name. > Will take care, Noted! >> @@ -833,7 +836,8 @@ enum skb_tstamp_type { >> * @tstamp_type: When set, skb->tstamp has the >> * delivery_time in mono clock base (i.e. EDT). Otherwise, the >> * skb->tstamp has the (rcv) timestamp at ingress and >> - * delivery_time at egress. >> + * delivery_time at egress or skb->tstamp defined by skb->sk->sk_clockid >> + * coming from userspace > > I would simplify the comment: clock base of skb->tstamp. > Already in the first patch. > Will take care, Noted! >> * @napi_id: id of the NAPI struct this skb came from >> * @sender_cpu: (aka @napi_id) source CPU in XPS >> * @alloc_cpu: CPU which did the skb allocation. >> @@ -961,7 +965,7 @@ struct sk_buff { >> /* private: */ >> __u8 __mono_tc_offset[0]; >> /* public: */ >> - __u8 tstamp_type:1; /* See SKB_CLOCK_*_MASK */ >> + __u8 tstamp_type:2; /* See skb_tstamp_type enum */ > > Probably good to call out that according to pahole this fills a hole. > I will do that . >> #ifdef CONFIG_NET_XGRESS >> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >> __u8 tc_skip_classify:1; >> @@ -1096,10 +1100,12 @@ struct sk_buff { >> */ >> #ifdef __BIG_ENDIAN_BITFIELD >> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) >> -#define TC_AT_INGRESS_MASK (1 << 6) >> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 6) > > SKB_TSTAMP_TYPE_BIT2_MASK? > I was thinking to keep it as TAI because it will confuse developers. I hope thats okay. >> +#define TC_AT_INGRESS_MASK (1 << 5) >> #else >> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) >> -#define TC_AT_INGRESS_MASK (1 << 1) >> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 1) >> +#define TC_AT_INGRESS_MASK (1 << 2) >> #endif >> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) >> >> @@ -4206,6 +4212,11 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >> case CLOCK_MONOTONIC: >> skb->tstamp_type = SKB_CLOCK_MONO; >> break; >> + case CLOCK_TAI: >> + skb->tstamp_type = SKB_CLOCK_TAI; >> + break; >> + default: >> + WARN_ONCE(true, "clockid %d not supported", tstamp_type); > > and set to 0 and default tstamp_type? > Actually thinking about it. I feel if its unsupported just fall back to default is the correct thing. I will take care of this. >> } >> } > >> > > @@ -9372,10 +9378,16 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, >> SKB_BF_MONO_TC_OFFSET); >> *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >> - SKB_MONO_DELIVERY_TIME_MASK, 2); >> + SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >> + SKB_MONO_DELIVERY_TIME_MASK, 3); >> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >> + SKB_TAI_DELIVERY_TIME_MASK, 4); >> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); >> *insn++ = BPF_JMP_A(1); >> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); >> + *insn++ = BPF_JMP_A(1); >> + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI); >> >> return insn; >> } >> @@ -9418,10 +9430,26 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, >> __u8 tmp_reg = BPF_REG_AX; >> >> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >> + /*check if all three bits are set*/ >> *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, >> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); >> - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, >> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); >> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >> + SKB_TAI_DELIVERY_TIME_MASK); >> + /*if all 3 bits are set jump 3 instructions and clear the register */ >> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >> + SKB_TAI_DELIVERY_TIME_MASK, 4); >> + /*Now check Mono is set with ingress mask if so clear */ >> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); >> + /*Now Check tai is set with ingress mask if so clear */ >> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >> + TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >> + /*Now Check tai and mono are set if so clear */ >> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >> + SKB_MONO_DELIVERY_TIME_MASK | >> + SKB_TAI_DELIVERY_TIME_MASK, 1); > > This looks as if all JEQ result in "if so clear"? > > Is the goal to only do something different for the two bits being 0x1, > can we have a single test with a two-bit mask, rather than four tests? > I think Martin wanted to take care of TAI as well. I will wait for his comment here My Goal was to take care of invalid combos which does not hold valid 1. If all 3 bits are set => invalid combo (Test case written is Insane) 2. If 2 bits are set (tai+mono)(Test case written is Insane) => this cannot happen (because clock base can only be one in skb) 3. If 2 bit are set (ingress + tai/mono) => This is existing logic + tai being added (clear tstamp in ingress) 4. For all other cases go ahead and fill in the tstamp in the dest register. > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type 2024-04-18 20:10 ` Abhishek Chauhan (ABC) @ 2024-04-18 21:57 ` Martin KaFai Lau 2024-04-19 0:30 ` Abhishek Chauhan (ABC) 0 siblings, 1 reply; 16+ messages in thread From: Martin KaFai Lau @ 2024-04-18 21:57 UTC (permalink / raw) To: Abhishek Chauhan (ABC), Willem de Bruijn Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Daniel Borkmann, bpf, kernel On 4/18/24 1:10 PM, Abhishek Chauhan (ABC) wrote: >>> #ifdef CONFIG_NET_XGRESS >>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >>> __u8 tc_skip_classify:1; >>> @@ -1096,10 +1100,12 @@ struct sk_buff { >>> */ >>> #ifdef __BIG_ENDIAN_BITFIELD >>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) >>> -#define TC_AT_INGRESS_MASK (1 << 6) >>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 6) >> >> SKB_TSTAMP_TYPE_BIT2_MASK? nit. Shorten it to just SKB_TSTAMP_TYPE_MASK? #ifdef __BIG_ENDIAN_BITFIELD #define SKB_TSTAMP_TYPE_MASK (3 << 6) #define SKB_TSTAMP_TYPE_RSH (6) /* more on this later */ #else #define SKB_TSTAMP_TYPE_MASK (3) #endif >> > I was thinking to keep it as TAI because it will confuse developers. I hope thats okay. I think it is not very useful to distinguish each bit since it is an enum value now. It becomes more like the "pkt_type:3" and its PKT_TYPE_MAX. >>> +#define TC_AT_INGRESS_MASK (1 << 5) >>> #else >>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) >>> -#define TC_AT_INGRESS_MASK (1 << 1) >>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 1) >>> +#define TC_AT_INGRESS_MASK (1 << 2) >>> #endif >>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) >>> >>> @@ -4206,6 +4212,11 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >>> case CLOCK_MONOTONIC: >>> skb->tstamp_type = SKB_CLOCK_MONO; >>> break; >>> + case CLOCK_TAI: >>> + skb->tstamp_type = SKB_CLOCK_TAI; >>> + break; >>> + default: >>> + WARN_ONCE(true, "clockid %d not supported", tstamp_type); >> >> and set to 0 and default tstamp_type? >> Actually thinking about it. I feel if its unsupported just fall back to default is the correct thing. I will take care of this. >>> } >>> } >> >>> > >> @@ -9372,10 +9378,16 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, >>> SKB_BF_MONO_TC_OFFSET); >>> *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>> - SKB_MONO_DELIVERY_TIME_MASK, 2); >>> + SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>> + SKB_MONO_DELIVERY_TIME_MASK, 3); >>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); >>> *insn++ = BPF_JMP_A(1); >>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); >>> + *insn++ = BPF_JMP_A(1); >>> + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI); With SKB_TSTAMP_TYPE_MASK defined like above, this could be simplified like this (untested): static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, struct bpf_insn *insn) { __u8 value_reg = si->dst_reg; __u8 skb_reg = si->src_reg; BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK); #ifdef __BIG_ENDIAN_BITFIELD *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSH); #else BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1)); #endif return insn; } >>> >>> return insn; >>> } >>> @@ -9418,10 +9430,26 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, >>> __u8 tmp_reg = BPF_REG_AX; >>> >>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >>> + /*check if all three bits are set*/ >>> *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, >>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); >>> - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, >>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); >>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>> + SKB_TAI_DELIVERY_TIME_MASK); >>> + /*if all 3 bits are set jump 3 instructions and clear the register */ >>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>> + /*Now check Mono is set with ingress mask if so clear */ >>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); >>> + /*Now Check tai is set with ingress mask if so clear */ >>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>> + TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>> + /*Now Check tai and mono are set if so clear */ >>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>> + SKB_MONO_DELIVERY_TIME_MASK | >>> + SKB_TAI_DELIVERY_TIME_MASK, 1); Same as the bpf_convert_tstamp_type_read, this could be simplified with SKB_TSTAMP_TYPE_MASK. >> >> This looks as if all JEQ result in "if so clear"? >> >> Is the goal to only do something different for the two bits being 0x1, >> can we have a single test with a two-bit mask, rather than four tests? >> > I think Martin wanted to take care of TAI as well. I will wait for his comment here > > My Goal was to take care of invalid combos which does not hold valid > 1. If all 3 bits are set => invalid combo (Test case written is Insane) > 2. If 2 bits are set (tai+mono)(Test case written is Insane) => this cannot happen (because clock base can only be one in skb) > 3. If 2 bit are set (ingress + tai/mono) => This is existing logic + tai being added (clear tstamp in ingress) > 4. For all other cases go ahead and fill in the tstamp in the dest register. If it is to ensure no new type is added without adding BPF_SKB_TSTAMP_DELIVERY_XYZ, I would simplify this runtime bpf insns here and use a BUILD_BUG_ON to catch it at compile time. Something like, enum skb_tstamp_type { SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ SKB_CLOCK_TAI, /* Time base in skb is TAI */ __SKB_CLOCK_MAX = SKB_CLOCK_TAI, }; /* Same one used in the bpf_convert_tstamp_type_read() above */ BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); Another thing is, the UDP test in test_tc_dtime.c probably needs to be adjusted, the userspace is using the CLOCK_TAI in SO_TXTIME and it is getting forwarded now. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type 2024-04-18 21:57 ` Martin KaFai Lau @ 2024-04-19 0:30 ` Abhishek Chauhan (ABC) 2024-04-20 1:13 ` Abhishek Chauhan (ABC) 0 siblings, 1 reply; 16+ messages in thread From: Abhishek Chauhan (ABC) @ 2024-04-19 0:30 UTC (permalink / raw) To: Martin KaFai Lau, Willem de Bruijn Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Daniel Borkmann, bpf, kernel On 4/18/2024 2:57 PM, Martin KaFai Lau wrote: > On 4/18/24 1:10 PM, Abhishek Chauhan (ABC) wrote: >>>> #ifdef CONFIG_NET_XGRESS >>>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >>>> __u8 tc_skip_classify:1; >>>> @@ -1096,10 +1100,12 @@ struct sk_buff { >>>> */ >>>> #ifdef __BIG_ENDIAN_BITFIELD >>>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) >>>> -#define TC_AT_INGRESS_MASK (1 << 6) >>>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 6) >>> >>> SKB_TSTAMP_TYPE_BIT2_MASK? > > nit. Shorten it to just SKB_TSTAMP_TYPE_MASK? > Okay i will do the same. Noted! > #ifdef __BIG_ENDIAN_BITFIELD > #define SKB_TSTAMP_TYPE_MASK (3 << 6) > #define SKB_TSTAMP_TYPE_RSH (6) /* more on this later */ > #else > #define SKB_TSTAMP_TYPE_MASK (3) > #endif > >>> >> I was thinking to keep it as TAI because it will confuse developers. I hope thats okay. > > I think it is not very useful to distinguish each bit since it is an enum value now. It becomes more like the "pkt_type:3" and its PKT_TYPE_MAX. >I see what you are saying. >>>> +#define TC_AT_INGRESS_MASK (1 << 5) >>>> #else >>>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) >>>> -#define TC_AT_INGRESS_MASK (1 << 1) >>>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 1) >>>> +#define TC_AT_INGRESS_MASK (1 << 2) >>>> #endif >>>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) >>>> @@ -4206,6 +4212,11 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >>>> case CLOCK_MONOTONIC: >>>> skb->tstamp_type = SKB_CLOCK_MONO; >>>> break; >>>> + case CLOCK_TAI: >>>> + skb->tstamp_type = SKB_CLOCK_TAI; >>>> + break; >>>> + default: >>>> + WARN_ONCE(true, "clockid %d not supported", tstamp_type); >>> >>> and set to 0 and default tstamp_type? >>> Actually thinking about it. I feel if its unsupported just fall back to default is the correct thing. I will take care of this. >>>> } >>>> } >>> >>>> > >>> @@ -9372,10 +9378,16 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >>>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, >>>> SKB_BF_MONO_TC_OFFSET); >>>> *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>> - SKB_MONO_DELIVERY_TIME_MASK, 2); >>>> + SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>> + SKB_MONO_DELIVERY_TIME_MASK, 3); >>>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); >>>> *insn++ = BPF_JMP_A(1); >>>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); >>>> + *insn++ = BPF_JMP_A(1); >>>> + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI); > > With SKB_TSTAMP_TYPE_MASK defined like above, this could be simplified like this (untested): > Let me think this through and raise it as part of the next rfc patch. > static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, > struct bpf_insn *insn) > { > __u8 value_reg = si->dst_reg; > __u8 skb_reg = si->src_reg; > > BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); > *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); > *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK); > #ifdef __BIG_ENDIAN_BITFIELD > *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSH); > #else > BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1)); > #endif > > return insn; > } > >>>> return insn; >>>> } >>>> @@ -9418,10 +9430,26 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, >>>> __u8 tmp_reg = BPF_REG_AX; >>>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >>>> + /*check if all three bits are set*/ >>>> *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, >>>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); >>>> - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, >>>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); >>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>>> + SKB_TAI_DELIVERY_TIME_MASK); >>>> + /*if all 3 bits are set jump 3 instructions and clear the register */ >>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>>> + /*Now check Mono is set with ingress mask if so clear */ >>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); >>>> + /*Now Check tai is set with ingress mask if so clear */ >>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>> + TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>>> + /*Now Check tai and mono are set if so clear */ >>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>> + SKB_MONO_DELIVERY_TIME_MASK | >>>> + SKB_TAI_DELIVERY_TIME_MASK, 1); > > Same as the bpf_convert_tstamp_type_read, this could be simplified with SKB_TSTAMP_TYPE_MASK. > >>> >>> This looks as if all JEQ result in "if so clear"? >>> >>> Is the goal to only do something different for the two bits being 0x1, >>> can we have a single test with a two-bit mask, rather than four tests? >>> >> I think Martin wanted to take care of TAI as well. I will wait for his comment here >> >> My Goal was to take care of invalid combos which does not hold valid >> 1. If all 3 bits are set => invalid combo (Test case written is Insane) >> 2. If 2 bits are set (tai+mono)(Test case written is Insane) => this cannot happen (because clock base can only be one in skb) >> 3. If 2 bit are set (ingress + tai/mono) => This is existing logic + tai being added (clear tstamp in ingress) >> 4. For all other cases go ahead and fill in the tstamp in the dest register. > > If it is to ensure no new type is added without adding BPF_SKB_TSTAMP_DELIVERY_XYZ, I would simplify this runtime bpf insns here and use a BUILD_BUG_ON to catch it at compile time. Something like, > > enum skb_tstamp_type { > SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ > SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ > SKB_CLOCK_TAI, /* Time base in skb is TAI */ > __SKB_CLOCK_MAX = SKB_CLOCK_TAI, > }; > > /* Same one used in the bpf_convert_tstamp_type_read() above */ > BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); > > Another thing is, the UDP test in test_tc_dtime.c probably needs to be adjusted, the userspace is using the CLOCK_TAI in SO_TXTIME and it is getting forwarded now. Noted ! Let me check and evalute this as well. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type 2024-04-19 0:30 ` Abhishek Chauhan (ABC) @ 2024-04-20 1:13 ` Abhishek Chauhan (ABC) 2024-04-22 18:46 ` Martin KaFai Lau 0 siblings, 1 reply; 16+ messages in thread From: Abhishek Chauhan (ABC) @ 2024-04-20 1:13 UTC (permalink / raw) To: Martin KaFai Lau, Willem de Bruijn Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Daniel Borkmann, bpf, kernel On 4/18/2024 5:30 PM, Abhishek Chauhan (ABC) wrote: > > > On 4/18/2024 2:57 PM, Martin KaFai Lau wrote: >> On 4/18/24 1:10 PM, Abhishek Chauhan (ABC) wrote: >>>>> #ifdef CONFIG_NET_XGRESS >>>>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >>>>> __u8 tc_skip_classify:1; >>>>> @@ -1096,10 +1100,12 @@ struct sk_buff { >>>>> */ >>>>> #ifdef __BIG_ENDIAN_BITFIELD >>>>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) >>>>> -#define TC_AT_INGRESS_MASK (1 << 6) >>>>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 6) >>>> >>>> SKB_TSTAMP_TYPE_BIT2_MASK? >> >> nit. Shorten it to just SKB_TSTAMP_TYPE_MASK? >> > Okay i will do the same. Noted! >> #ifdef __BIG_ENDIAN_BITFIELD >> #define SKB_TSTAMP_TYPE_MASK (3 << 6) >> #define SKB_TSTAMP_TYPE_RSH (6) /* more on this later */ >> #else >> #define SKB_TSTAMP_TYPE_MASK (3) >> #endif >> >>>> >>> I was thinking to keep it as TAI because it will confuse developers. I hope thats okay. >> >> I think it is not very useful to distinguish each bit since it is an enum value now. It becomes more like the "pkt_type:3" and its PKT_TYPE_MAX. >> I see what you are saying. >>>>> +#define TC_AT_INGRESS_MASK (1 << 5) >>>>> #else >>>>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) >>>>> -#define TC_AT_INGRESS_MASK (1 << 1) >>>>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 1) >>>>> +#define TC_AT_INGRESS_MASK (1 << 2) >>>>> #endif >>>>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) >>>>> @@ -4206,6 +4212,11 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >>>>> case CLOCK_MONOTONIC: >>>>> skb->tstamp_type = SKB_CLOCK_MONO; >>>>> break; >>>>> + case CLOCK_TAI: >>>>> + skb->tstamp_type = SKB_CLOCK_TAI; >>>>> + break; >>>>> + default: >>>>> + WARN_ONCE(true, "clockid %d not supported", tstamp_type); >>>> >>>> and set to 0 and default tstamp_type? >>>> Actually thinking about it. I feel if its unsupported just fall back to default is the correct thing. I will take care of this. >>>>> } >>>>> } >>>> >>>>> > >>>> @@ -9372,10 +9378,16 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >>>>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, >>>>> SKB_BF_MONO_TC_OFFSET); >>>>> *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>> - SKB_MONO_DELIVERY_TIME_MASK, 2); >>>>> + SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>>>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>> + SKB_MONO_DELIVERY_TIME_MASK, 3); >>>>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>>>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); >>>>> *insn++ = BPF_JMP_A(1); >>>>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); >>>>> + *insn++ = BPF_JMP_A(1); >>>>> + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI); >> >> With SKB_TSTAMP_TYPE_MASK defined like above, this could be simplified like this (untested): >> > Let me think this through and raise it as part of the next rfc patch. >> static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >> struct bpf_insn *insn) >> { >> __u8 value_reg = si->dst_reg; >> __u8 skb_reg = si->src_reg; >> >> BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); >> *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >> *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK); >> #ifdef __BIG_ENDIAN_BITFIELD >> *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSH); >> #else >> BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1)); >> #endif >> >> return insn; >> } >> >>>>> return insn; >>>>> } >>>>> @@ -9418,10 +9430,26 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, >>>>> __u8 tmp_reg = BPF_REG_AX; >>>>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >>>>> + /*check if all three bits are set*/ >>>>> *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, >>>>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); >>>>> - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, >>>>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); >>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>>>> + SKB_TAI_DELIVERY_TIME_MASK); >>>>> + /*if all 3 bits are set jump 3 instructions and clear the register */ >>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>>>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>>>> + /*Now check Mono is set with ingress mask if so clear */ >>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); >>>>> + /*Now Check tai is set with ingress mask if so clear */ >>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>> + TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>>>> + /*Now Check tai and mono are set if so clear */ >>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>> + SKB_MONO_DELIVERY_TIME_MASK | >>>>> + SKB_TAI_DELIVERY_TIME_MASK, 1); >> >> Same as the bpf_convert_tstamp_type_read, this could be simplified with SKB_TSTAMP_TYPE_MASK. >> Willem and Martin, When do we clear the tstamp and make it 0 in bpf_convert_tstamp_read? meaning which configuration? I see previously(current upstream code) if mono_delivery is set and tc_ingress_mask is set upstream code used to set the tstamp as 0. Which means with addition of tai mask the new implementation should take care of following cases(correct me if i am wrong) 1. ( tai mask set + ingress mask set ) = Clear tstamp 2. ( mono mask set + ingress mask set ) = Clear tstamp 3. ( mono mask set + tai mask set + ingress mask set ) = Clear tstamp 4. ( No mask set ) = Clear tstamp 5. ( Tai mask set + mono mask set ) = Clear tstamp This leaves us with only two values which can be support which is 0x1 and 0x2 This means the tstamp_type should be either 0x1(mono) and tstamp_type 0x2 (tai) to set the value_reg with tstamp Is my understanding correct ? Do you think the below simplified version looks okay ? static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, const struct bpf_insn *si, struct bpf_insn *insn) { __u8 value_reg = si->dst_reg; __u8 skb_reg = si->src_reg; BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); #ifdef CONFIG_NET_XGRESS /* If the tstamp_type is read, * the bpf prog is aware the tstamp could have delivery time. * Thus, read skb->tstamp as is if tstamp_type_access is true. */ if (!prog->tstamp_type_access) { /* AX is needed because src_reg and dst_reg could be the same */ __u8 tmp_reg = BPF_REG_AX; *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); /* check if all three bits are set*/ *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, TC_AT_INGRESS_MASK | SKB_TSTAMP_TYPE_MASK); /* If the value of tmp_reg is 7,6,5,4,3,0 which means invalid * configuration set the tstamp to 0, value 0x1 and 0x2 * is correct configuration */ #ifdef __BIG_ENDIAN_BITFIELD *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x1 << SKB_TSTAMP_TYPE_RSH, 3); *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x2 << SKB_TSTAMP_TYPE_RSH, 2); #endif *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x1, 3); *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x2, 2); #endif /* skb->tc_at_ingress && skb->tstamp_type:2, * read 0 as the (rcv) timestamp. */ *insn++ = BPF_MOV64_IMM(value_reg, 0); *insn++ = BPF_JMP_A(1); } #endif *insn++ = BPF_LDX_MEM(BPF_DW, value_reg, skb_reg, offsetof(struct sk_buff, tstamp)); return insn; } >>>> >>>> This looks as if all JEQ result in "if so clear"? >>>> >>>> Is the goal to only do something different for the two bits being 0x1, >>>> can we have a single test with a two-bit mask, rather than four tests? >>>> >>> I think Martin wanted to take care of TAI as well. I will wait for his comment here >>> >>> My Goal was to take care of invalid combos which does not hold valid >>> 1. If all 3 bits are set => invalid combo (Test case written is Insane) >>> 2. If 2 bits are set (tai+mono)(Test case written is Insane) => this cannot happen (because clock base can only be one in skb) >>> 3. If 2 bit are set (ingress + tai/mono) => This is existing logic + tai being added (clear tstamp in ingress) >>> 4. For all other cases go ahead and fill in the tstamp in the dest register. >> >> If it is to ensure no new type is added without adding BPF_SKB_TSTAMP_DELIVERY_XYZ, I would simplify this runtime bpf insns here and use a BUILD_BUG_ON to catch it at compile time. Something like, >> >> enum skb_tstamp_type { >> SKB_CLOCK_REAL, /* Time base is skb is REALTIME */ >> SKB_CLOCK_MONO, /* Time base is skb is MONOTONIC */ >> SKB_CLOCK_TAI, /* Time base in skb is TAI */ >> __SKB_CLOCK_MAX = SKB_CLOCK_TAI, >> }; >> >> /* Same one used in the bpf_convert_tstamp_type_read() above */ >> BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); >> >> Another thing is, the UDP test in test_tc_dtime.c probably needs to be adjusted, the userspace is using the CLOCK_TAI in SO_TXTIME and it is getting forwarded now. > Noted ! Let me check and evalute this as well. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type 2024-04-20 1:13 ` Abhishek Chauhan (ABC) @ 2024-04-22 18:46 ` Martin KaFai Lau 2024-04-24 18:03 ` Abhishek Chauhan (ABC) 0 siblings, 1 reply; 16+ messages in thread From: Martin KaFai Lau @ 2024-04-22 18:46 UTC (permalink / raw) To: Abhishek Chauhan (ABC), Willem de Bruijn Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Daniel Borkmann, bpf, kernel On 4/19/24 6:13 PM, Abhishek Chauhan (ABC) wrote: > > > On 4/18/2024 5:30 PM, Abhishek Chauhan (ABC) wrote: >> >> >> On 4/18/2024 2:57 PM, Martin KaFai Lau wrote: >>> On 4/18/24 1:10 PM, Abhishek Chauhan (ABC) wrote: >>>>>> #ifdef CONFIG_NET_XGRESS >>>>>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >>>>>> __u8 tc_skip_classify:1; >>>>>> @@ -1096,10 +1100,12 @@ struct sk_buff { >>>>>> */ >>>>>> #ifdef __BIG_ENDIAN_BITFIELD >>>>>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) >>>>>> -#define TC_AT_INGRESS_MASK (1 << 6) >>>>>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 6) >>>>> >>>>> SKB_TSTAMP_TYPE_BIT2_MASK? >>> >>> nit. Shorten it to just SKB_TSTAMP_TYPE_MASK? >>> >> Okay i will do the same. Noted! >>> #ifdef __BIG_ENDIAN_BITFIELD >>> #define SKB_TSTAMP_TYPE_MASK (3 << 6) >>> #define SKB_TSTAMP_TYPE_RSH (6) /* more on this later */ >>> #else >>> #define SKB_TSTAMP_TYPE_MASK (3) >>> #endif >>> >>>>> >>>> I was thinking to keep it as TAI because it will confuse developers. I hope thats okay. >>> >>> I think it is not very useful to distinguish each bit since it is an enum value now. It becomes more like the "pkt_type:3" and its PKT_TYPE_MAX. >>> I see what you are saying. >>>>>> +#define TC_AT_INGRESS_MASK (1 << 5) >>>>>> #else >>>>>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) >>>>>> -#define TC_AT_INGRESS_MASK (1 << 1) >>>>>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 1) >>>>>> +#define TC_AT_INGRESS_MASK (1 << 2) >>>>>> #endif >>>>>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) >>>>>> @@ -4206,6 +4212,11 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >>>>>> case CLOCK_MONOTONIC: >>>>>> skb->tstamp_type = SKB_CLOCK_MONO; >>>>>> break; >>>>>> + case CLOCK_TAI: >>>>>> + skb->tstamp_type = SKB_CLOCK_TAI; >>>>>> + break; >>>>>> + default: >>>>>> + WARN_ONCE(true, "clockid %d not supported", tstamp_type); >>>>> >>>>> and set to 0 and default tstamp_type? >>>>> Actually thinking about it. I feel if its unsupported just fall back to default is the correct thing. I will take care of this. >>>>>> } >>>>>> } >>>>> >>>>>> > >>>>> @@ -9372,10 +9378,16 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >>>>>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, >>>>>> SKB_BF_MONO_TC_OFFSET); >>>>>> *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>>> - SKB_MONO_DELIVERY_TIME_MASK, 2); >>>>>> + SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>>> + SKB_MONO_DELIVERY_TIME_MASK, 3); >>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>>>>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); >>>>>> *insn++ = BPF_JMP_A(1); >>>>>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); >>>>>> + *insn++ = BPF_JMP_A(1); >>>>>> + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI); >>> >>> With SKB_TSTAMP_TYPE_MASK defined like above, this could be simplified like this (untested): >>> >> Let me think this through and raise it as part of the next rfc patch. >>> static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >>> struct bpf_insn *insn) >>> { >>> __u8 value_reg = si->dst_reg; >>> __u8 skb_reg = si->src_reg; >>> >>> BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); >>> *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >>> *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK); >>> #ifdef __BIG_ENDIAN_BITFIELD >>> *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSH); >>> #else >>> BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1)); >>> #endif >>> >>> return insn; >>> } >>> >>>>>> return insn; >>>>>> } >>>>>> @@ -9418,10 +9430,26 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, >>>>>> __u8 tmp_reg = BPF_REG_AX; >>>>>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >>>>>> + /*check if all three bits are set*/ >>>>>> *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, >>>>>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); >>>>>> - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, >>>>>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); >>>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>>>>> + SKB_TAI_DELIVERY_TIME_MASK); >>>>>> + /*if all 3 bits are set jump 3 instructions and clear the register */ >>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>>>>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>>>>> + /*Now check Mono is set with ingress mask if so clear */ >>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); >>>>>> + /*Now Check tai is set with ingress mask if so clear */ >>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>>> + TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>>>>> + /*Now Check tai and mono are set if so clear */ >>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>>> + SKB_MONO_DELIVERY_TIME_MASK | >>>>>> + SKB_TAI_DELIVERY_TIME_MASK, 1); >>> >>> Same as the bpf_convert_tstamp_type_read, this could be simplified with SKB_TSTAMP_TYPE_MASK. >>> > Willem and Martin, > When do we clear the tstamp and make it 0 in bpf_convert_tstamp_read? meaning which configuration? When the bpf prog does not check the skb->tstamp_type. It is the "if (!prog->tstamp_type_access)" in bpf_convert_tstamp_read(). If bpf prog does not check the skb->tstamp_type and it is at ingress, bpf prog expects recv tstamp (ie. real clock), so it needs to clear out the tstamp (i.e read as 0 tstamp). > I see previously(current upstream code) if mono_delivery is set and tc_ingress_mask is set > upstream code used to set the tstamp as 0. > > Which means with addition of tai mask the new implementation should take care of following cases(correct me if i am wrong) > 1. ( tai mask set + ingress mask set ) = Clear tstamp > 2. ( mono mask set + ingress mask set ) = Clear tstamp > 3. ( mono mask set + tai mask set + ingress mask set ) = Clear tstamp > 4. ( No mask set ) = Clear tstamp > 5. ( Tai mask set + mono mask set ) = Clear tstamp No need to check the individual mono and tai bit here. Check the tstamp_type as a whole. Like in pseudo C: if (skb->tc_at_ingress && skb->tstamp_type) value_reg = 0; untested code for tstamp_read() and tstamp_write(): static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, const struct bpf_insn *si, struct bpf_insn *insn) { __u8 value_reg = si->dst_reg; __u8 skb_reg = si->src_reg; #ifdef CONFIG_NET_XGRESS /* If the tstamp_type is read, * the bpf prog is aware the tstamp could have delivery time. * Thus, read skb->tstamp as is if tstamp_type_access is true. */ if (!prog->tstamp_type_access) { /* AX is needed because src_reg and dst_reg could be the same */ __u8 tmp_reg = BPF_REG_AX; *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1); /* goto <read> */ BPF_JMP_A(4); *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, SKB_TSTAMP_TYPE_MASK, 1); /* goto <read> */ BPF_JMP_A(2); /* skb->tc_at_ingress && skb->tstamp_type, * read 0 as the (rcv) timestamp. */ *insn++ = BPF_MOV64_IMM(value_reg, 0); *insn++ = BPF_JMP_A(1); } #endif /* <read>: value_reg = skb->tstamp */ *insn++ = BPF_LDX_MEM(BPF_DW, value_reg, skb_reg, offsetof(struct sk_buff, tstamp)); return insn; } static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog, const struct bpf_insn *si, struct bpf_insn *insn) { __u8 value_reg = si->src_reg; __u8 skb_reg = si->dst_reg; #ifdef CONFIG_NET_XGRESS /* If the tstamp_type is read, * the bpf prog is aware the tstamp could have delivery time. * Thus, write skb->tstamp as is if tstamp_type_access is true. * Otherwise, writing at ingress will have to clear the * mono_delivery_time (skb->tstamp_type:1)bit also. */ if (!prog->tstamp_type_access) { __u8 tmp_reg = BPF_REG_AX; *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); /* Writing __sk_buff->tstamp as ingress, goto <clear> */ *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1); /* goto <store> */ *insn++ = BPF_JMP_A(2); /* <clear>: skb->tstamp_type */ *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TSTAMP_TYPE_MASK); *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET); } #endif /* <store>: skb->tstamp = tstamp */ *insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_DW | BPF_MEM, skb_reg, value_reg, offsetof(struct sk_buff, tstamp), si->imm); return insn; } > > This leaves us with only two values which can be support which is 0x1 and 0x2 > > This means the tstamp_type should be either 0x1(mono) and tstamp_type 0x2 (tai) to set the value_reg with tstamp > Is my understanding correct ? > > Do you think the below simplified version looks okay ? > > static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, > const struct bpf_insn *si, > struct bpf_insn *insn) > { > __u8 value_reg = si->dst_reg; > __u8 skb_reg = si->src_reg; > > BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); > #ifdef CONFIG_NET_XGRESS > /* If the tstamp_type is read, > * the bpf prog is aware the tstamp could have delivery time. > * Thus, read skb->tstamp as is if tstamp_type_access is true. > */ > if (!prog->tstamp_type_access) { > /* AX is needed because src_reg and dst_reg could be the same */ > __u8 tmp_reg = BPF_REG_AX; > > *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); > /* check if all three bits are set*/ > *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, > TC_AT_INGRESS_MASK | SKB_TSTAMP_TYPE_MASK); > > /* If the value of tmp_reg is 7,6,5,4,3,0 which means invalid > * configuration set the tstamp to 0, value 0x1 and 0x2 > * is correct configuration > */ > #ifdef __BIG_ENDIAN_BITFIELD > *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x1 << SKB_TSTAMP_TYPE_RSH, 3); > *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x2 << SKB_TSTAMP_TYPE_RSH, 2); > #endif > *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x1, 3); > *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x2, 2); > #endif > /* skb->tc_at_ingress && skb->tstamp_type:2, > * read 0 as the (rcv) timestamp. > */ > *insn++ = BPF_MOV64_IMM(value_reg, 0); > *insn++ = BPF_JMP_A(1); > } > #endif > > *insn++ = BPF_LDX_MEM(BPF_DW, value_reg, skb_reg, > offsetof(struct sk_buff, tstamp)); > return insn; > } > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type 2024-04-22 18:46 ` Martin KaFai Lau @ 2024-04-24 18:03 ` Abhishek Chauhan (ABC) 0 siblings, 0 replies; 16+ messages in thread From: Abhishek Chauhan (ABC) @ 2024-04-24 18:03 UTC (permalink / raw) To: Martin KaFai Lau, Willem de Bruijn Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel, Andrew Halaney, Martin KaFai Lau, Daniel Borkmann, bpf, kernel On 4/22/2024 11:46 AM, Martin KaFai Lau wrote: > On 4/19/24 6:13 PM, Abhishek Chauhan (ABC) wrote: >> >> >> On 4/18/2024 5:30 PM, Abhishek Chauhan (ABC) wrote: >>> >>> >>> On 4/18/2024 2:57 PM, Martin KaFai Lau wrote: >>>> On 4/18/24 1:10 PM, Abhishek Chauhan (ABC) wrote: >>>>>>> #ifdef CONFIG_NET_XGRESS >>>>>>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ >>>>>>> __u8 tc_skip_classify:1; >>>>>>> @@ -1096,10 +1100,12 @@ struct sk_buff { >>>>>>> */ >>>>>>> #ifdef __BIG_ENDIAN_BITFIELD >>>>>>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 7) >>>>>>> -#define TC_AT_INGRESS_MASK (1 << 6) >>>>>>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 6) >>>>>> >>>>>> SKB_TSTAMP_TYPE_BIT2_MASK? >>>> >>>> nit. Shorten it to just SKB_TSTAMP_TYPE_MASK? >>>> >>> Okay i will do the same. Noted! >>>> #ifdef __BIG_ENDIAN_BITFIELD >>>> #define SKB_TSTAMP_TYPE_MASK (3 << 6) >>>> #define SKB_TSTAMP_TYPE_RSH (6) /* more on this later */ >>>> #else >>>> #define SKB_TSTAMP_TYPE_MASK (3) >>>> #endif >>>> >>>>>> >>>>> I was thinking to keep it as TAI because it will confuse developers. I hope thats okay. >>>> >>>> I think it is not very useful to distinguish each bit since it is an enum value now. It becomes more like the "pkt_type:3" and its PKT_TYPE_MAX. >>>> I see what you are saying. >>>>>>> +#define TC_AT_INGRESS_MASK (1 << 5) >>>>>>> #else >>>>>>> #define SKB_MONO_DELIVERY_TIME_MASK (1 << 0) >>>>>>> -#define TC_AT_INGRESS_MASK (1 << 1) >>>>>>> +#define SKB_TAI_DELIVERY_TIME_MASK (1 << 1) >>>>>>> +#define TC_AT_INGRESS_MASK (1 << 2) >>>>>>> #endif >>>>>>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset) >>>>>>> @@ -4206,6 +4212,11 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >>>>>>> case CLOCK_MONOTONIC: >>>>>>> skb->tstamp_type = SKB_CLOCK_MONO; >>>>>>> break; >>>>>>> + case CLOCK_TAI: >>>>>>> + skb->tstamp_type = SKB_CLOCK_TAI; >>>>>>> + break; >>>>>>> + default: >>>>>>> + WARN_ONCE(true, "clockid %d not supported", tstamp_type); >>>>>> >>>>>> and set to 0 and default tstamp_type? >>>>>> Actually thinking about it. I feel if its unsupported just fall back to default is the correct thing. I will take care of this. >>>>>>> } >>>>>>> } >>>>>> >>>>>>> > >>>>>> @@ -9372,10 +9378,16 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >>>>>>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, >>>>>>> SKB_BF_MONO_TC_OFFSET); >>>>>>> *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>>>> - SKB_MONO_DELIVERY_TIME_MASK, 2); >>>>>>> + SKB_MONO_DELIVERY_TIME_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>>>> + SKB_MONO_DELIVERY_TIME_MASK, 3); >>>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, >>>>>>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>>>>>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC); >>>>>>> *insn++ = BPF_JMP_A(1); >>>>>>> *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO); >>>>>>> + *insn++ = BPF_JMP_A(1); >>>>>>> + *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_TAI); >>>> >>>> With SKB_TSTAMP_TYPE_MASK defined like above, this could be simplified like this (untested): >>>> >>> Let me think this through and raise it as part of the next rfc patch. >>>> static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si, >>>> struct bpf_insn *insn) >>>> { >>>> __u8 value_reg = si->dst_reg; >>>> __u8 skb_reg = si->src_reg; >>>> >>>> BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); >>>> *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >>>> *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK); >>>> #ifdef __BIG_ENDIAN_BITFIELD >>>> *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSH); >>>> #else >>>> BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1)); >>>> #endif >>>> >>>> return insn; >>>> } >>>> >>>>>>> return insn; >>>>>>> } >>>>>>> @@ -9418,10 +9430,26 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, >>>>>>> __u8 tmp_reg = BPF_REG_AX; >>>>>>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >>>>>>> + /*check if all three bits are set*/ >>>>>>> *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, >>>>>>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK); >>>>>>> - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg, >>>>>>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2); >>>>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>>>>>> + SKB_TAI_DELIVERY_TIME_MASK); >>>>>>> + /*if all 3 bits are set jump 3 instructions and clear the register */ >>>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK | >>>>>>> + SKB_TAI_DELIVERY_TIME_MASK, 4); >>>>>>> + /*Now check Mono is set with ingress mask if so clear */ >>>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>>>> + TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 3); >>>>>>> + /*Now Check tai is set with ingress mask if so clear */ >>>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>>>> + TC_AT_INGRESS_MASK | SKB_TAI_DELIVERY_TIME_MASK, 2); >>>>>>> + /*Now Check tai and mono are set if so clear */ >>>>>>> + *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, >>>>>>> + SKB_MONO_DELIVERY_TIME_MASK | >>>>>>> + SKB_TAI_DELIVERY_TIME_MASK, 1); >>>> >>>> Same as the bpf_convert_tstamp_type_read, this could be simplified with SKB_TSTAMP_TYPE_MASK. >>>> >> Willem and Martin, >> When do we clear the tstamp and make it 0 in bpf_convert_tstamp_read? meaning which configuration? > > When the bpf prog does not check the skb->tstamp_type. It is > the "if (!prog->tstamp_type_access)" in bpf_convert_tstamp_read(). > > If bpf prog does not check the skb->tstamp_type and it is at ingress, > bpf prog expects recv tstamp (ie. real clock), so it needs to clear > out the tstamp (i.e read as 0 tstamp). > >> I see previously(current upstream code) if mono_delivery is set and tc_ingress_mask is set >> upstream code used to set the tstamp as 0. >> >> Which means with addition of tai mask the new implementation should take care of following cases(correct me if i am wrong) >> 1. ( tai mask set + ingress mask set ) = Clear tstamp >> 2. ( mono mask set + ingress mask set ) = Clear tstamp >> 3. ( mono mask set + tai mask set + ingress mask set ) = Clear tstamp >> 4. ( No mask set ) = Clear tstamp >> 5. ( Tai mask set + mono mask set ) = Clear tstamp > > No need to check the individual mono and tai bit here. Check the > tstamp_type as a whole. Like in pseudo C: > > if (skb->tc_at_ingress && skb->tstamp_type) > value_reg = 0; > > untested code for tstamp_read() and tstamp_write(): > > static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, > const struct bpf_insn *si, > struct bpf_insn *insn) > { > __u8 value_reg = si->dst_reg; > __u8 skb_reg = si->src_reg; > > #ifdef CONFIG_NET_XGRESS > /* If the tstamp_type is read, > * the bpf prog is aware the tstamp could have delivery time. > * Thus, read skb->tstamp as is if tstamp_type_access is true. > */ > if (!prog->tstamp_type_access) { > /* AX is needed because src_reg and dst_reg could be the same */ > __u8 tmp_reg = BPF_REG_AX; > > *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); > *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1); > /* goto <read> */ > BPF_JMP_A(4); > *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, SKB_TSTAMP_TYPE_MASK, 1); > /* goto <read> */ > BPF_JMP_A(2); > /* skb->tc_at_ingress && skb->tstamp_type, > * read 0 as the (rcv) timestamp. > */ > *insn++ = BPF_MOV64_IMM(value_reg, 0); > *insn++ = BPF_JMP_A(1); > } > #endif > > /* <read>: value_reg = skb->tstamp */ > *insn++ = BPF_LDX_MEM(BPF_DW, value_reg, skb_reg, > offsetof(struct sk_buff, tstamp)); > return insn; > } > > static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog, > const struct bpf_insn *si, > struct bpf_insn *insn) > { > __u8 value_reg = si->src_reg; > __u8 skb_reg = si->dst_reg; > > #ifdef CONFIG_NET_XGRESS > /* If the tstamp_type is read, > * the bpf prog is aware the tstamp could have delivery time. > * Thus, write skb->tstamp as is if tstamp_type_access is true. > * Otherwise, writing at ingress will have to clear the > * mono_delivery_time (skb->tstamp_type:1)bit also. > */ > if (!prog->tstamp_type_access) { > __u8 tmp_reg = BPF_REG_AX; > > *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); > /* Writing __sk_buff->tstamp as ingress, goto <clear> */ > *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1); > /* goto <store> */ > *insn++ = BPF_JMP_A(2); > /* <clear>: skb->tstamp_type */ > *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TSTAMP_TYPE_MASK); > *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET); > } > #endif > > /* <store>: skb->tstamp = tstamp */ > *insn++ = BPF_RAW_INSN(BPF_CLASS(si->code) | BPF_DW | BPF_MEM, > skb_reg, value_reg, offsetof(struct sk_buff, tstamp), si->imm); > return insn; > } > Thanks Martin. I will raise the RFC patch v5 today. Also made changes in test_tc_dtime.c which needs a closer review too. >> >> This leaves us with only two values which can be support which is 0x1 and 0x2 >> >> This means the tstamp_type should be either 0x1(mono) and tstamp_type 0x2 (tai) to set the value_reg with tstamp >> Is my understanding correct ? >> >> Do you think the below simplified version looks okay ? >> >> static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog, >> const struct bpf_insn *si, >> struct bpf_insn *insn) >> { >> __u8 value_reg = si->dst_reg; >> __u8 skb_reg = si->src_reg; >> >> BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI); >> #ifdef CONFIG_NET_XGRESS >> /* If the tstamp_type is read, >> * the bpf prog is aware the tstamp could have delivery time. >> * Thus, read skb->tstamp as is if tstamp_type_access is true. >> */ >> if (!prog->tstamp_type_access) { >> /* AX is needed because src_reg and dst_reg could be the same */ >> __u8 tmp_reg = BPF_REG_AX; >> >> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET); >> /* check if all three bits are set*/ >> *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, >> TC_AT_INGRESS_MASK | SKB_TSTAMP_TYPE_MASK); >> >> /* If the value of tmp_reg is 7,6,5,4,3,0 which means invalid >> * configuration set the tstamp to 0, value 0x1 and 0x2 >> * is correct configuration >> */ >> #ifdef __BIG_ENDIAN_BITFIELD >> *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x1 << SKB_TSTAMP_TYPE_RSH, 3); >> *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x2 << SKB_TSTAMP_TYPE_RSH, 2); >> #endif >> *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x1, 3); >> *insn++ = BPF_JMP32_IMM(BPF_JEQ, tmp_reg, 0x2, 2); >> #endif >> /* skb->tc_at_ingress && skb->tstamp_type:2, >> * read 0 as the (rcv) timestamp. >> */ >> *insn++ = BPF_MOV64_IMM(value_reg, 0); >> *insn++ = BPF_JMP_A(1); >> } >> #endif >> >> *insn++ = BPF_LDX_MEM(BPF_DW, value_reg, skb_reg, >> offsetof(struct sk_buff, tstamp)); >> return insn; >> } >> >> > ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-04-24 18:04 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-18 0:43 [RFC PATCH bpf-next v4 0/2] Replace mono_delivery_time with tstamp_type Abhishek Chauhan 2024-04-18 0:43 ` [RFC PATCH bpf-next v4 1/2] net: Rename mono_delivery_time to tstamp_type for scalabilty Abhishek Chauhan 2024-04-18 18:47 ` Willem de Bruijn 2024-04-18 19:58 ` Abhishek Chauhan (ABC) 2024-04-18 20:11 ` Willem de Bruijn 2024-04-18 20:38 ` Abhishek Chauhan (ABC) 2024-04-18 20:49 ` Willem de Bruijn 2024-04-18 20:51 ` Abhishek Chauhan (ABC) 2024-04-18 0:43 ` [RFC PATCH bpf-next v4 2/2] net: Add additional bit to support clockid_t timestamp type Abhishek Chauhan 2024-04-18 19:06 ` Willem de Bruijn 2024-04-18 20:10 ` Abhishek Chauhan (ABC) 2024-04-18 21:57 ` Martin KaFai Lau 2024-04-19 0:30 ` Abhishek Chauhan (ABC) 2024-04-20 1:13 ` Abhishek Chauhan (ABC) 2024-04-22 18:46 ` Martin KaFai Lau 2024-04-24 18:03 ` Abhishek Chauhan (ABC)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).