* [PATCH v2 net-next 0/2] net: prepare pacing offload support
@ 2024-10-02 15:12 Eric Dumazet
2024-10-02 15:12 ` [PATCH v2 net-next 1/2] net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute Eric Dumazet
2024-10-02 15:12 ` [PATCH v2 net-next 2/2] net_sched: sch_fq: add the ability to offload pacing Eric Dumazet
0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2024-10-02 15:12 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Willem de Bruijn, Jeffrey Ji, netdev, eric.dumazet, Eric Dumazet
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ
packet scheduler.
Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.
In order to upstream the NIC support, this series adds :
1) timing wheel horizon as a per-device attribute.
2) FQ packet scheduler support, to let paced packets
below the timing wheel horizon be handled by the driver.
v2: addressed Jakub feedback
( https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/#mf6294d714c41cc459962154cc2580ce3c9693663 )
Eric Dumazet (1):
net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute
Jeffrey Ji (1):
net_sched: sch_fq: add the ability to offload pacing
.../networking/net_cachelines/net_device.rst | 1 +
include/linux/netdevice.h | 4 +++
include/uapi/linux/if_link.h | 1 +
include/uapi/linux/pkt_sched.h | 2 ++
net/core/rtnetlink.c | 4 +++
net/sched/sch_fq.c | 33 +++++++++++++++----
tools/include/uapi/linux/if_link.h | 1 +
7 files changed, 40 insertions(+), 6 deletions(-)
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 net-next 1/2] net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute
2024-10-02 15:12 [PATCH v2 net-next 0/2] net: prepare pacing offload support Eric Dumazet
@ 2024-10-02 15:12 ` Eric Dumazet
2024-10-03 0:41 ` Jakub Kicinski
2024-10-02 15:12 ` [PATCH v2 net-next 2/2] net_sched: sch_fq: add the ability to offload pacing Eric Dumazet
1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2024-10-02 15:12 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Willem de Bruijn, Jeffrey Ji, netdev, eric.dumazet, Eric Dumazet
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ
packet scheduler.
Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.
This patch adds dev->max_pacing_offload_horizon expressing
this timing wheel horizon in nsec units.
This is a read-only attribute.
Unless a driver sets it, dev->max_pacing_offload_horizon
is zero.
v2: addressed Jakub feedback
( https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/#mf6294d714c41cc459962154cc2580ce3c9693663 )
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
Documentation/networking/net_cachelines/net_device.rst | 1 +
include/linux/netdevice.h | 4 ++++
include/uapi/linux/if_link.h | 1 +
net/core/rtnetlink.c | 4 ++++
tools/include/uapi/linux/if_link.h | 1 +
5 files changed, 11 insertions(+)
diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index 22b07c814f4a4575d255fdf472d07c549536e543..49f03cb78c6e25109af969654c86ebeb19d38e12 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -183,3 +183,4 @@ struct_devlink_port* devlink_port
struct_dpll_pin* dpll_pin
struct hlist_head page_pools
struct dim_irq_moder* irq_moder
+u64 max_pacing_offload_horizon
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e87b5e4883259a0723278ae3f1bee87e940af895..9eb5d9c63630e9a29a8ce2f8bc8042a520ed8398 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2009,6 +2009,8 @@ enum netdev_reg_state {
* @dpll_pin: Pointer to the SyncE source pin of a DPLL subsystem,
* where the clock is recovered.
*
+ * @max_pacing_offload_horizon: max EDT offload horizon in nsec.
+ *
* FIXME: cleanup struct net_device such that network protocol info
* moves out.
*/
@@ -2399,6 +2401,8 @@ struct net_device {
/** @irq_moder: dim parameters used if IS_ENABLED(CONFIG_DIMLIB). */
struct dim_irq_moder *irq_moder;
+ u64 max_pacing_offload_horizon;
+
u8 priv[] ____cacheline_aligned
__counted_by(priv_len);
} ____cacheline_aligned;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6dc258993b177093a77317ee5f2deab97fb04674..506ba9c80e83a5039f003c9def8b4fce41f43847 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -377,6 +377,7 @@ enum {
IFLA_GSO_IPV4_MAX_SIZE,
IFLA_GRO_IPV4_MAX_SIZE,
IFLA_DPLL_PIN,
+ IFLA_MAX_PACING_OFFLOAD_HORIZON,
__IFLA_MAX
};
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f0a52098708584aa27461b7ee941fa324adcaf20..682d8d3127db1d11a3f04c4526119d08349e2bd6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1118,6 +1118,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
+ nla_total_size(MAX_ADDR_LEN) /* IFLA_PERM_ADDRESS */
+ rtnl_devlink_port_size(dev)
+ rtnl_dpll_pin_size(dev)
+ + nla_total_size(8) /* IFLA_MAX_PACING_OFFLOAD_HORIZON */
+ 0;
}
@@ -1867,6 +1868,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
READ_ONCE(dev->tso_max_size)) ||
nla_put_u32(skb, IFLA_TSO_MAX_SEGS,
READ_ONCE(dev->tso_max_segs)) ||
+ nla_put_uint(skb, IFLA_MAX_PACING_OFFLOAD_HORIZON,
+ READ_ONCE(dev->max_pacing_offload_horizon)) ||
#ifdef CONFIG_RPS
nla_put_u32(skb, IFLA_NUM_RX_QUEUES,
READ_ONCE(dev->num_rx_queues)) ||
@@ -1975,6 +1978,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
}
static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
+ [IFLA_UNSPEC] = { .strict_start_type = IFLA_DPLL_PIN },
[IFLA_IFNAME] = { .type = NLA_STRING, .len = IFNAMSIZ-1 },
[IFLA_ADDRESS] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
[IFLA_BROADCAST] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index f0d71b2a3f1e1a3d0945bc3a0efe31cd95940f72..96ec2b01e725b304874816af171d2455bc7b495c 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -377,6 +377,7 @@ enum {
IFLA_GSO_IPV4_MAX_SIZE,
IFLA_GRO_IPV4_MAX_SIZE,
IFLA_DPLL_PIN,
+ IFLA_MAX_PACING_OFFLOAD_HORIZON,
__IFLA_MAX
};
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 net-next 2/2] net_sched: sch_fq: add the ability to offload pacing
2024-10-02 15:12 [PATCH v2 net-next 0/2] net: prepare pacing offload support Eric Dumazet
2024-10-02 15:12 ` [PATCH v2 net-next 1/2] net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute Eric Dumazet
@ 2024-10-02 15:12 ` Eric Dumazet
1 sibling, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2024-10-02 15:12 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Willem de Bruijn, Jeffrey Ji, netdev, eric.dumazet, Eric Dumazet
From: Jeffrey Ji <jeffreyji@google.com>
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ packet
scheduler.
Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.
This patchs adds to FQ packet scheduler TCA_FQ_OFFLOAD_HORIZON
attribute.
Its value is capped by the device max_pacing_offload_horizon,
added in the prior patch.
It allows FQ to let packets within pacing offload horizon
to be delivered to the device, which will handle the needed
delay without host involvement.
Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
include/uapi/linux/pkt_sched.h | 2 ++
net/sched/sch_fq.c | 33 +++++++++++++++++++++++++++------
2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index a3cd0c2dc9956f8c873f35c7b33b2bcf93feb2f1..25a9a47001cdde59cf052ea658ba1ac26f4c34e8 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -836,6 +836,8 @@ enum {
TCA_FQ_WEIGHTS, /* Weights for each band */
+ TCA_FQ_OFFLOAD_HORIZON, /* dequeue paced packets within this horizon immediately (us units) */
+
__TCA_FQ_MAX
};
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 19a49af5a9e527ed0371a3bb96e0113755375eac..aeabf45c9200c4aea75fb6c63986e37eddfea5f9 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -111,6 +111,7 @@ struct fq_perband_flows {
struct fq_sched_data {
/* Read mostly cache line */
+ u64 offload_horizon;
u32 quantum;
u32 initial_quantum;
u32 flow_refill_delay;
@@ -299,7 +300,7 @@ static void fq_gc(struct fq_sched_data *q,
}
/* Fast path can be used if :
- * 1) Packet tstamp is in the past.
+ * 1) Packet tstamp is in the past, or within the pacing offload horizon.
* 2) FQ qlen == 0 OR
* (no flow is currently eligible for transmit,
* AND fast path queue has less than 8 packets)
@@ -314,7 +315,7 @@ static bool fq_fastpath_check(const struct Qdisc *sch, struct sk_buff *skb,
const struct fq_sched_data *q = qdisc_priv(sch);
const struct sock *sk;
- if (fq_skb_cb(skb)->time_to_send > now)
+ if (fq_skb_cb(skb)->time_to_send > now + q->offload_horizon)
return false;
if (sch->q.qlen != 0) {
@@ -595,15 +596,18 @@ static void fq_check_throttled(struct fq_sched_data *q, u64 now)
unsigned long sample;
struct rb_node *p;
- if (q->time_next_delayed_flow > now)
+ if (q->time_next_delayed_flow > now + q->offload_horizon)
return;
/* Update unthrottle latency EWMA.
* This is cheap and can help diagnosing timer/latency problems.
*/
sample = (unsigned long)(now - q->time_next_delayed_flow);
- q->unthrottle_latency_ns -= q->unthrottle_latency_ns >> 3;
- q->unthrottle_latency_ns += sample >> 3;
+ if ((long)sample > 0) {
+ q->unthrottle_latency_ns -= q->unthrottle_latency_ns >> 3;
+ q->unthrottle_latency_ns += sample >> 3;
+ }
+ now += q->offload_horizon;
q->time_next_delayed_flow = ~0ULL;
while ((p = rb_first(&q->delayed)) != NULL) {
@@ -687,7 +691,7 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
u64 time_next_packet = max_t(u64, fq_skb_cb(skb)->time_to_send,
f->time_next_packet);
- if (now < time_next_packet) {
+ if (now + q->offload_horizon < time_next_packet) {
head->first = f->next;
f->time_next_packet = time_next_packet;
fq_flow_set_throttled(q, f);
@@ -925,6 +929,7 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = {
[TCA_FQ_HORIZON_DROP] = { .type = NLA_U8 },
[TCA_FQ_PRIOMAP] = NLA_POLICY_EXACT_LEN(sizeof(struct tc_prio_qopt)),
[TCA_FQ_WEIGHTS] = NLA_POLICY_EXACT_LEN(FQ_BANDS * sizeof(s32)),
+ [TCA_FQ_OFFLOAD_HORIZON] = { .type = NLA_U32 },
};
/* compress a u8 array with all elems <= 3 to an array of 2-bit fields */
@@ -1100,6 +1105,17 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
WRITE_ONCE(q->horizon_drop,
nla_get_u8(tb[TCA_FQ_HORIZON_DROP]));
+ if (tb[TCA_FQ_OFFLOAD_HORIZON]) {
+ u64 offload_horizon = (u64)NSEC_PER_USEC *
+ nla_get_u32(tb[TCA_FQ_OFFLOAD_HORIZON]);
+
+ if (offload_horizon <= qdisc_dev(sch)->max_pacing_offload_horizon) {
+ WRITE_ONCE(q->offload_horizon, offload_horizon);
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "invalid offload_horizon");
+ err = -EINVAL;
+ }
+ }
if (!err) {
sch_tree_unlock(sch);
@@ -1183,6 +1199,7 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
.bands = FQ_BANDS,
};
struct nlattr *opts;
+ u64 offload_horizon;
u64 ce_threshold;
s32 weights[3];
u64 horizon;
@@ -1199,6 +1216,9 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
horizon = READ_ONCE(q->horizon);
do_div(horizon, NSEC_PER_USEC);
+ offload_horizon = READ_ONCE(q->offload_horizon);
+ do_div(offload_horizon, NSEC_PER_USEC);
+
if (nla_put_u32(skb, TCA_FQ_PLIMIT,
READ_ONCE(sch->limit)) ||
nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT,
@@ -1224,6 +1244,7 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
nla_put_u32(skb, TCA_FQ_TIMER_SLACK,
READ_ONCE(q->timer_slack)) ||
nla_put_u32(skb, TCA_FQ_HORIZON, (u32)horizon) ||
+ nla_put_u32(skb, TCA_FQ_OFFLOAD_HORIZON, (u32)offload_horizon) ||
nla_put_u8(skb, TCA_FQ_HORIZON_DROP,
READ_ONCE(q->horizon_drop)))
goto nla_put_failure;
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net-next 1/2] net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute
2024-10-02 15:12 ` [PATCH v2 net-next 1/2] net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute Eric Dumazet
@ 2024-10-03 0:41 ` Jakub Kicinski
2024-10-03 9:52 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2024-10-03 0:41 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, Willem de Bruijn, Jeffrey Ji,
netdev, eric.dumazet
On Wed, 2 Oct 2024 15:12:18 +0000 Eric Dumazet wrote:
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 6dc258993b177093a77317ee5f2deab97fb04674..506ba9c80e83a5039f003c9def8b4fce41f43847 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -377,6 +377,7 @@ enum {
> IFLA_GSO_IPV4_MAX_SIZE,
> IFLA_GRO_IPV4_MAX_SIZE,
> IFLA_DPLL_PIN,
> + IFLA_MAX_PACING_OFFLOAD_HORIZON,
> __IFLA_MAX
> };
Sorry for not catching this earlier, looks like CI is upset that
we didn't add this to the YAML. Please squash or LMK if you prefer
for me to follow up:
diff --git a/Documentation/netlink/specs/rt_link.yaml b/Documentation/netlink/specs/rt_link.yaml
index 0c4d5d40cae9..d7131a1afadf 100644
--- a/Documentation/netlink/specs/rt_link.yaml
+++ b/Documentation/netlink/specs/rt_link.yaml
@@ -1137,6 +1137,10 @@ protonum: 0
name: dpll-pin
type: nest
nested-attributes: link-dpll-pin-attrs
+ -
+ name: max-pacing-offload-horizon
+ type: uint
+ doc: EDT offload horizon supported by the device (in nsec).
-
name: af-spec-attrs
attributes:
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net-next 1/2] net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute
2024-10-03 0:41 ` Jakub Kicinski
@ 2024-10-03 9:52 ` Eric Dumazet
0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2024-10-03 9:52 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, Willem de Bruijn, Jeffrey Ji,
netdev, eric.dumazet
On Thu, Oct 3, 2024 at 2:41 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 2 Oct 2024 15:12:18 +0000 Eric Dumazet wrote:
> > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> > index 6dc258993b177093a77317ee5f2deab97fb04674..506ba9c80e83a5039f003c9def8b4fce41f43847 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -377,6 +377,7 @@ enum {
> > IFLA_GSO_IPV4_MAX_SIZE,
> > IFLA_GRO_IPV4_MAX_SIZE,
> > IFLA_DPLL_PIN,
> > + IFLA_MAX_PACING_OFFLOAD_HORIZON,
> > __IFLA_MAX
> > };
>
> Sorry for not catching this earlier, looks like CI is upset that
> we didn't add this to the YAML. Please squash or LMK if you prefer
> for me to follow up:
>
> diff --git a/Documentation/netlink/specs/rt_link.yaml b/Documentation/netlink/specs/rt_link.yaml
> index 0c4d5d40cae9..d7131a1afadf 100644
> --- a/Documentation/netlink/specs/rt_link.yaml
> +++ b/Documentation/netlink/specs/rt_link.yaml
> @@ -1137,6 +1137,10 @@ protonum: 0
> name: dpll-pin
> type: nest
> nested-attributes: link-dpll-pin-attrs
> + -
> + name: max-pacing-offload-horizon
> + type: uint
> + doc: EDT offload horizon supported by the device (in nsec).
> -
> name: af-spec-attrs
> attributes:
Sure thing, I will squash this in v3, thanks !
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-10-03 9:53 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-02 15:12 [PATCH v2 net-next 0/2] net: prepare pacing offload support Eric Dumazet
2024-10-02 15:12 ` [PATCH v2 net-next 1/2] net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attribute Eric Dumazet
2024-10-03 0:41 ` Jakub Kicinski
2024-10-03 9:52 ` Eric Dumazet
2024-10-02 15:12 ` [PATCH v2 net-next 2/2] net_sched: sch_fq: add the ability to offload pacing Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).