* [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching
2023-11-07 11:20 [PATCH v2 net 0/7] qbv cycle time extension/truncation Faizal Rahim
@ 2023-11-07 11:20 ` Faizal Rahim
2023-11-08 22:27 ` Vladimir Oltean
2023-11-12 10:31 ` Simon Horman
2023-11-07 11:20 ` [PATCH v2 net 2/7] net/sched: taprio: fix cycle time adjustment for next entry Faizal Rahim
` (6 subsequent siblings)
7 siblings, 2 replies; 31+ messages in thread
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
In the current taprio code for dynamic schedule change,
admin/oper schedule switching happens immediately when
should_change_schedules() is true. Then the last entry of
the old admin schedule stops being valid anymore from
taprio_dequeue_from_txq’s perspective.
To solve this, we have to delay the switch_schedules() call via
the new cycle_time_correction variable. The variable serves 2
purposes:
1. Upon entering advance_sched(), if the value is set to a
non-initialized value, it indicates that we need to change
schedule.
2. Store the cycle time correction value which will be used for
negative or positive correction.
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 2e1949de4171..dee103647823 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -41,6 +41,7 @@ static struct static_key_false taprio_have_working_mqprio;
#define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
#define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)
#define TAPRIO_FLAGS_INVALID U32_MAX
+#define INIT_CYCLE_TIME_CORRECTION S64_MIN
struct sched_entry {
/* Durations between this GCL entry and the GCL entry where the
@@ -75,6 +76,7 @@ struct sched_gate_list {
ktime_t cycle_end_time;
s64 cycle_time;
s64 cycle_time_extension;
+ s64 cycle_time_correction;
s64 base_time;
};
@@ -940,8 +942,10 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
admin = rcu_dereference_protected(q->admin_sched,
lockdep_is_held(&q->current_entry_lock));
- if (!oper)
+ if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
+ oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
switch_schedules(q, &admin, &oper);
+ }
/* This can happen in two cases: 1. this is the very first run
* of this function (i.e. we weren't running any schedule
@@ -981,7 +985,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
* schedule runs.
*/
end_time = sched_base_time(admin);
- switch_schedules(q, &admin, &oper);
+ oper->cycle_time_correction = 0;
}
next->end_time = end_time;
@@ -1174,6 +1178,7 @@ static int parse_taprio_schedule(struct taprio_sched *q, struct nlattr **tb,
}
taprio_calculate_gate_durations(q, new);
+ new->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
return 0;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching
2023-11-07 11:20 ` [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching Faizal Rahim
@ 2023-11-08 22:27 ` Vladimir Oltean
2023-11-15 11:54 ` Abdul Rahim, Faizal
2023-11-12 10:31 ` Simon Horman
1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-08 22:27 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:17AM -0500, Faizal Rahim wrote:
> In the current taprio code for dynamic schedule change,
> admin/oper schedule switching happens immediately when
> should_change_schedules() is true. Then the last entry of
> the old admin schedule stops being valid anymore from
> taprio_dequeue_from_txq’s perspective.
Admittedly, I may have become a bit detached from this code base in the
past months, but I don't understand the reasoning here.
Could you please explain what makes the last entry of the old admin
schedule be invalid from taprio_dequeue_from_txq()'s perspective?
What I see is that when should_change_schedules() is true, we change
q->oper_sched and q->admin_sched through the switch_schedules() call,
but we don't change q->current_entry, so I fail to understand the
connection you are implying.
On the other hand (and I see I did mention this in the other thread),
it seems that taprio_skb_exceeds_queue_max_sdu() - called from the
enqueue() path - looks at q->oper_sched, and that's a valid reason why
we'd want to delay the schedule switch until admin's actual base time,
rather than the current oper's cycle_end_time.
But please, let's spare no expense in providing a proper problem
description, justification for the change and Fixes: tag. This is not
optional.
> To solve this, we have to delay the switch_schedules() call via
> the new cycle_time_correction variable. The variable serves 2
> purposes:
> 1. Upon entering advance_sched(), if the value is set to a
> non-initialized value, it indicates that we need to change
> schedule.
> 2. Store the cycle time correction value which will be used for
> negative or positive correction.
It needs to be stated much more clearly that only purpose 1 is relevant
here (I would even go as far as to omit its secondary purpose here).
The only reason we are using the correction variable is because it
happens that we'll need that in later changes.
>
> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
I believe that since the only observable problem has to do with
taprio_skb_exceeds_queue_max_sdu(), the Fixes: tag should be the commit
which added that logic. Which is:
Fixes: a878fd46fe43 ("net/sched: keep the max_frm_len information inside struct sched_gate_list")
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> ---
> net/sched/sch_taprio.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 2e1949de4171..dee103647823 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -41,6 +41,7 @@ static struct static_key_false taprio_have_working_mqprio;
> #define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
> #define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)
> #define TAPRIO_FLAGS_INVALID U32_MAX
> +#define INIT_CYCLE_TIME_CORRECTION S64_MIN
I would prefer naming it CYCLE_TIME_CORRECTION_INVALID or _UNSPEC.
It is not just used as the "initial" value.
>
> struct sched_entry {
> /* Durations between this GCL entry and the GCL entry where the
> @@ -75,6 +76,7 @@ struct sched_gate_list {
> ktime_t cycle_end_time;
> s64 cycle_time;
> s64 cycle_time_extension;
> + s64 cycle_time_correction;
> s64 base_time;
> };
>
> @@ -940,8 +942,10 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> admin = rcu_dereference_protected(q->admin_sched,
> lockdep_is_held(&q->current_entry_lock));
>
> - if (!oper)
> + if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
You could introduce even as early as this change a "static bool
sched_switch_pending(struct sched_gate_list *oper)" function, which
incorporates the entire body of this "if" expression.
> + oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
> switch_schedules(q, &admin, &oper);
> + }
>
> /* This can happen in two cases: 1. this is the very first run
> * of this function (i.e. we weren't running any schedule
> @@ -981,7 +985,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> * schedule runs.
> */
> end_time = sched_base_time(admin);
> - switch_schedules(q, &admin, &oper);
> + oper->cycle_time_correction = 0;
> }
>
> next->end_time = end_time;
> @@ -1174,6 +1178,7 @@ static int parse_taprio_schedule(struct taprio_sched *q, struct nlattr **tb,
> }
>
> taprio_calculate_gate_durations(q, new);
> + new->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
>
> return 0;
> }
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching
2023-11-08 22:27 ` Vladimir Oltean
@ 2023-11-15 11:54 ` Abdul Rahim, Faizal
0 siblings, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-15 11:54 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On 9/11/2023 6:27 am, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:17AM -0500, Faizal Rahim wrote:
>> In the current taprio code for dynamic schedule change,
>> admin/oper schedule switching happens immediately when
>> should_change_schedules() is true. Then the last entry of
>> the old admin schedule stops being valid anymore from
>> taprio_dequeue_from_txq’s perspective.
>
> Admittedly, I may have become a bit detached from this code base in the
> past months, but I don't understand the reasoning here.
>
> Could you please explain what makes the last entry of the old admin
> schedule be invalid from taprio_dequeue_from_txq()'s perspective?
>
> What I see is that when should_change_schedules() is true, we change
> q->oper_sched and q->admin_sched through the switch_schedules() call,
> but we don't change q->current_entry, so I fail to understand the
> connection you are implying.
My bad – I used part of the explanation from the original author without
thoroughly checking it. I have some guesses about
taprio_dequeue_from_txq(), but they're not solid without more testing.
So, I'll swap it with your suggestion below, which highlights the obvious
issue.
> On the other hand (and I see I did mention this in the other thread),
> it seems that taprio_skb_exceeds_queue_max_sdu() - called from the
> enqueue() path - looks at q->oper_sched, and that's a valid reason why
> we'd want to delay the schedule switch until admin's actual base time,
> rather than the current oper's cycle_end_time.
Agree, observed this too.
> But please, let's spare no expense in providing a proper problem
> description, justification for the change and Fixes: tag. This is not
> optional.
>
>> To solve this, we have to delay the switch_schedules() call via
>> the new cycle_time_correction variable. The variable serves 2
>> purposes:
>> 1. Upon entering advance_sched(), if the value is set to a
>> non-initialized value, it indicates that we need to change
>> schedule.
>> 2. Store the cycle time correction value which will be used for
>> negative or positive correction.
>
> It needs to be stated much more clearly that only purpose 1 is relevant
> here (I would even go as far as to omit its secondary purpose here).
> The only reason we are using the correction variable is because it
> happens that we'll need that in later changes.
Got it.
>
>>
>> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
>
> I believe that since the only observable problem has to do with
> taprio_skb_exceeds_queue_max_sdu(), the Fixes: tag should be the commit
> which added that logic. Which is:
>
> Fixes: a878fd46fe43 ("net/sched: keep the max_frm_len information inside struct sched_gate_list")
Will replace the patch explanation with this. Thanks.
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> ---
>> net/sched/sch_taprio.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index 2e1949de4171..dee103647823 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -41,6 +41,7 @@ static struct static_key_false taprio_have_working_mqprio;
>> #define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
>> #define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)
>> #define TAPRIO_FLAGS_INVALID U32_MAX
>> +#define INIT_CYCLE_TIME_CORRECTION S64_MIN
>
> I would prefer naming it CYCLE_TIME_CORRECTION_INVALID or _UNSPEC.
> It is not just used as the "initial" value.
>
>>
>> struct sched_entry {
>> /* Durations between this GCL entry and the GCL entry where the
>> @@ -75,6 +76,7 @@ struct sched_gate_list {
>> ktime_t cycle_end_time;
>> s64 cycle_time;
>> s64 cycle_time_extension;
>> + s64 cycle_time_correction;
>> s64 base_time;
>> };
>>
>> @@ -940,8 +942,10 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> admin = rcu_dereference_protected(q->admin_sched,
>> lockdep_is_held(&q->current_entry_lock));
>>
>> - if (!oper)
>> + if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
>
> You could introduce even as early as this change a "static bool
> sched_switch_pending(struct sched_gate_list *oper)" function, which
> incorporates the entire body of this "if" expression.
>
>> + oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
>> switch_schedules(q, &admin, &oper);
>> + }
>>
>> /* This can happen in two cases: 1. this is the very first run
>> * of this function (i.e. we weren't running any schedule
>> @@ -981,7 +985,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> * schedule runs.
>> */
>> end_time = sched_base_time(admin);
>> - switch_schedules(q, &admin, &oper);
>> + oper->cycle_time_correction = 0;
>> }
>>
>> next->end_time = end_time;
>> @@ -1174,6 +1178,7 @@ static int parse_taprio_schedule(struct taprio_sched *q, struct nlattr **tb,
>> }
>>
>> taprio_calculate_gate_durations(q, new);
>> + new->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
>>
>> return 0;
>> }
>> --
>> 2.25.1
>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching
2023-11-07 11:20 ` [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching Faizal Rahim
2023-11-08 22:27 ` Vladimir Oltean
@ 2023-11-12 10:31 ` Simon Horman
2023-11-16 5:59 ` Abdul Rahim, Faizal
1 sibling, 1 reply; 31+ messages in thread
From: Simon Horman @ 2023-11-12 10:31 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:17AM -0500, Faizal Rahim wrote:
> In the current taprio code for dynamic schedule change,
> admin/oper schedule switching happens immediately when
> should_change_schedules() is true. Then the last entry of
> the old admin schedule stops being valid anymore from
> taprio_dequeue_from_txq’s perspective.
>
> To solve this, we have to delay the switch_schedules() call via
> the new cycle_time_correction variable. The variable serves 2
> purposes:
> 1. Upon entering advance_sched(), if the value is set to a
> non-initialized value, it indicates that we need to change
> schedule.
> 2. Store the cycle time correction value which will be used for
> negative or positive correction.
>
> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> ---
> net/sched/sch_taprio.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 2e1949de4171..dee103647823 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -41,6 +41,7 @@ static struct static_key_false taprio_have_working_mqprio;
> #define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
> #define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)
> #define TAPRIO_FLAGS_INVALID U32_MAX
> +#define INIT_CYCLE_TIME_CORRECTION S64_MIN
>
> struct sched_entry {
> /* Durations between this GCL entry and the GCL entry where the
> @@ -75,6 +76,7 @@ struct sched_gate_list {
> ktime_t cycle_end_time;
> s64 cycle_time;
> s64 cycle_time_extension;
> + s64 cycle_time_correction;
> s64 base_time;
> };
>
> @@ -940,8 +942,10 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> admin = rcu_dereference_protected(q->admin_sched,
> lockdep_is_held(&q->current_entry_lock));
>
> - if (!oper)
> + if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
Hi Faizal,
The first condition above expects that oper may be NULL, but the line below
dereferences it unconditionally. This doesn't seem correct, one way or the
other.
As flagged by Smatch and Coccinelle.
> + oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
> switch_schedules(q, &admin, &oper);
> + }
>
> /* This can happen in two cases: 1. this is the very first run
> * of this function (i.e. we weren't running any schedule
...
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching
2023-11-12 10:31 ` Simon Horman
@ 2023-11-16 5:59 ` Abdul Rahim, Faizal
0 siblings, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-16 5:59 UTC (permalink / raw)
To: Simon Horman
Cc: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
On 12/11/2023 6:31 pm, Simon Horman wrote:
> On Tue, Nov 07, 2023 at 06:20:17AM -0500, Faizal Rahim wrote:
>> In the current taprio code for dynamic schedule change,
>> admin/oper schedule switching happens immediately when
>> should_change_schedules() is true. Then the last entry of
>> the old admin schedule stops being valid anymore from
>> taprio_dequeue_from_txq’s perspective.
>>
>> To solve this, we have to delay the switch_schedules() call via
>> the new cycle_time_correction variable. The variable serves 2
>> purposes:
>> 1. Upon entering advance_sched(), if the value is set to a
>> non-initialized value, it indicates that we need to change
>> schedule.
>> 2. Store the cycle time correction value which will be used for
>> negative or positive correction.
>>
>> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> ---
>> net/sched/sch_taprio.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index 2e1949de4171..dee103647823 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -41,6 +41,7 @@ static struct static_key_false taprio_have_working_mqprio;
>> #define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
>> #define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)
>> #define TAPRIO_FLAGS_INVALID U32_MAX
>> +#define INIT_CYCLE_TIME_CORRECTION S64_MIN
>>
>> struct sched_entry {
>> /* Durations between this GCL entry and the GCL entry where the
>> @@ -75,6 +76,7 @@ struct sched_gate_list {
>> ktime_t cycle_end_time;
>> s64 cycle_time;
>> s64 cycle_time_extension;
>> + s64 cycle_time_correction;
>> s64 base_time;
>> };
>>
>> @@ -940,8 +942,10 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> admin = rcu_dereference_protected(q->admin_sched,
>> lockdep_is_held(&q->current_entry_lock));
>>
>> - if (!oper)
>> + if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
>
> Hi Faizal,
>
> The first condition above expects that oper may be NULL, but the line below
> dereferences it unconditionally. This doesn't seem correct, one way or the
> other.
>
Thanks for pointing this out, sorry for this blunder.
Will update.
> As flagged by Smatch and Coccinelle.
>
>> + oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
>> switch_schedules(q, &admin, &oper);
>> + }
>>
>> /* This can happen in two cases: 1. this is the very first run
>> * of this function (i.e. we weren't running any schedule
>
> ...
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 net 2/7] net/sched: taprio: fix cycle time adjustment for next entry
2023-11-07 11:20 [PATCH v2 net 0/7] qbv cycle time extension/truncation Faizal Rahim
2023-11-07 11:20 ` [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching Faizal Rahim
@ 2023-11-07 11:20 ` Faizal Rahim
2023-11-08 23:20 ` Vladimir Oltean
2023-11-07 11:20 ` [PATCH v2 net 3/7] net/sched: taprio: update impacted fields during cycle time adjustment Faizal Rahim
` (5 subsequent siblings)
7 siblings, 1 reply; 31+ messages in thread
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension:
"the Cycle Time Extension variable allows this extension of the last old
cycle to be done in a defined way. If the last complete old cycle would
normally end less than OperCycleTimeExtension nanoseconds before the new
base time, then the last complete cycle before AdminBaseTime is reached
is extended so that it ends at AdminBaseTime."
The current taprio implementation does not extend the last old cycle in
the defined manner specified in the Qbv Spec. This is part of the fix
covered in this patch.
Here are the changes made:
1. A new API, get_cycle_time_correction(), has been added to return the
correction value. If it returns a non-initialize value, it indicates
changes required for the next entry schedule, and upon the completion
of the next entry's duration, entries will be loaded from the new admin
schedule.
2. Store correction values in cycle_time_correction:
a) Positive correction value/extension
We calculate the correction between the last operational cycle and the
new admin base time. Note that for positive correction to take place,
the next entry should be the last entry from oper and the new admin base
time falls within the next cycle time of old oper.
b) Negative correction value
The new admin base time starts earlier than the next entry's end time.
c) Zero correction value
The new admin base time aligns exactly with the old cycle.
3. When cycle_time_correction is set to a non-initialized value, it is
used to:
a) Indicate that cycle correction is active to trigger adjustments in
affected fields like interval and cycle_time. A new API,
cycle_corr_active(), has been created to help with this purpose.
b) Transition to the new admin schedule at the beginning of
advance_sched(). A new API, should_change_sched(), has been introduced
for this purpose.
4. Remove the previous definition of should_change_scheds() API. A new
should_change_sched() API has been introduced, but it serves a
completely different purpose than the one that was removed.
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 105 +++++++++++++++++++++++++++--------------
1 file changed, 70 insertions(+), 35 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index dee103647823..ed32654b46f5 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -259,6 +259,14 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
}
+static bool cycle_corr_active(s64 cycle_time_correction)
+{
+ if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
+ return false;
+ else
+ return true;
+}
+
/* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
* q->max_sdu[] requested by the user and the max_sdu dynamically determined by
* the maximum open gate durations at the given link speed.
@@ -888,38 +896,59 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
return false;
}
-static bool should_change_schedules(const struct sched_gate_list *admin,
- const struct sched_gate_list *oper,
- ktime_t end_time)
+static bool should_change_sched(struct sched_gate_list *oper)
{
- ktime_t next_base_time, extension_time;
+ bool change_to_admin_sched = false;
- if (!admin)
- return false;
+ if (oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
+ /* The recent entry ran is the last one from oper */
+ change_to_admin_sched = true;
+ oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
+ }
- next_base_time = sched_base_time(admin);
+ return change_to_admin_sched;
+}
- /* This is the simple case, the end_time would fall after
- * the next schedule base_time.
- */
- if (ktime_compare(next_base_time, end_time) <= 0)
+static bool should_extend_cycle(const struct sched_gate_list *oper,
+ ktime_t new_base_time,
+ ktime_t entry_end_time,
+ const struct sched_entry *entry)
+{
+ ktime_t next_cycle_end_time = ktime_add_ns(oper->cycle_end_time,
+ oper->cycle_time);
+ bool extension_supported = oper->cycle_time_extension > 0 ? true : false;
+ s64 extension_limit = oper->cycle_time_extension;
+
+ if (extension_supported &&
+ list_is_last(&entry->list, &oper->entries) &&
+ ktime_before(new_base_time, next_cycle_end_time) &&
+ ktime_sub(new_base_time, entry_end_time) < extension_limit)
return true;
+ else
+ return false;
+}
- /* This is the cycle_time_extension case, if the end_time
- * plus the amount that can be extended would fall after the
- * next schedule base_time, we can extend the current schedule
- * for that amount.
- */
- extension_time = ktime_add_ns(end_time, oper->cycle_time_extension);
+static s64 get_cycle_time_correction(const struct sched_gate_list *oper,
+ ktime_t new_base_time,
+ ktime_t entry_end_time,
+ const struct sched_entry *entry)
+{
+ s64 correction = INIT_CYCLE_TIME_CORRECTION;
- /* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
- * how precisely the extension should be made. So after
- * conformance testing, this logic may change.
- */
- if (ktime_compare(next_base_time, extension_time) <= 0)
- return true;
+ if (!entry || !oper)
+ return correction;
- return false;
+ if (ktime_compare(new_base_time, entry_end_time) <= 0) {
+ /* negative or zero correction */
+ correction = ktime_sub(new_base_time, entry_end_time);
+ } else if (ktime_after(new_base_time, entry_end_time) &&
+ should_extend_cycle(oper, new_base_time,
+ entry_end_time, entry)) {
+ /* positive correction */
+ correction = ktime_sub(new_base_time, entry_end_time);
+ }
+
+ return correction;
}
static enum hrtimer_restart advance_sched(struct hrtimer *timer)
@@ -942,10 +971,8 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
admin = rcu_dereference_protected(q->admin_sched,
lockdep_is_held(&q->current_entry_lock));
- if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
- oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
+ if (!oper || should_change_sched(oper))
switch_schedules(q, &admin, &oper);
- }
/* This can happen in two cases: 1. this is the very first run
* of this function (i.e. we weren't running any schedule
@@ -972,6 +999,22 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
end_time = ktime_add_ns(entry->end_time, next->interval);
end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
+ if (admin) {
+ ktime_t new_base_time = sched_base_time(admin);
+
+ oper->cycle_time_correction =
+ get_cycle_time_correction(oper, new_base_time,
+ end_time, next);
+
+ if (cycle_corr_active(oper->cycle_time_correction)) {
+ /* The next entry is the last entry we will run from
+ * oper, subsequent ones will take from the new admin
+ */
+ oper->cycle_end_time = new_base_time;
+ end_time = new_base_time;
+ }
+ }
+
for (tc = 0; tc < num_tc; tc++) {
if (next->gate_duration[tc] == oper->cycle_time)
next->gate_close_time[tc] = KTIME_MAX;
@@ -980,14 +1023,6 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
next->gate_duration[tc]);
}
- if (should_change_schedules(admin, oper, end_time)) {
- /* Set things so the next time this runs, the new
- * schedule runs.
- */
- end_time = sched_base_time(admin);
- oper->cycle_time_correction = 0;
- }
-
next->end_time = end_time;
taprio_set_budgets(q, oper, next);
--
2.25.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 2/7] net/sched: taprio: fix cycle time adjustment for next entry
2023-11-07 11:20 ` [PATCH v2 net 2/7] net/sched: taprio: fix cycle time adjustment for next entry Faizal Rahim
@ 2023-11-08 23:20 ` Vladimir Oltean
2023-11-15 11:55 ` Abdul Rahim, Faizal
0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-08 23:20 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:18AM -0500, Faizal Rahim wrote:
> According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension:
> "the Cycle Time Extension variable allows this extension of the last old
> cycle to be done in a defined way. If the last complete old cycle would
> normally end less than OperCycleTimeExtension nanoseconds before the new
> base time, then the last complete cycle before AdminBaseTime is reached
> is extended so that it ends at AdminBaseTime."
So far, so good.
> The current taprio implementation does not extend the last old cycle in
> the defined manner specified in the Qbv Spec. This is part of the fix
> covered in this patch.
In the discussion on v1, I said that prior to commit a1e6ad30fa19
("net/sched: taprio: calculate guard band against actual TC gate close
time"), the last entry's next->close_time actually used to include the
oper schedule's correction, but it no longer does. This points to a
regression, and not to something that was never there. Am I wrong?
> Here are the changes made:
>
> 1. A new API, get_cycle_time_correction(), has been added to return the
I would call an "API" an interface between two distinct software layers,
based on an agreed-upon contract. Not a function in sch_taprio.c which
is called by another function in sch_taprio.c.
> correction value. If it returns a non-initialize value, it indicates
> changes required for the next entry schedule, and upon the completion
> of the next entry's duration, entries will be loaded from the new admin
> schedule.
This paragraph doesn't really help. It gets the reader lost in
irrelevant details which are actually not that hard to deduce from the
code with some good naming. Actually I find it poor naming to say
"non-initialize value" when what you mean is "!= INIT_CYCLE_TIME_CORRECTION".
I think I would name this a "specific" or "valid" cycle correction, when
it takes a value different from CYCLE_TIME_CORRECTION_UNSPEC.
>
> 2. Store correction values in cycle_time_correction:
> a) Positive correction value/extension
> We calculate the correction between the last operational cycle and the
> new admin base time. Note that for positive correction to take place,
> the next entry should be the last entry from oper and the new admin base
> time falls within the next cycle time of old oper.
>
> b) Negative correction value
> The new admin base time starts earlier than the next entry's end time.
>
> c) Zero correction value
> The new admin base time aligns exactly with the old cycle.
>
> 3. When cycle_time_correction is set to a non-initialized value, it is
> used to:
> a) Indicate that cycle correction is active to trigger adjustments in
> affected fields like interval and cycle_time. A new API,
> cycle_corr_active(), has been created to help with this purpose.
Again, it's exaggerated to call this an "API".
Although, what you can do is provide a kerneldoc-style comment above the
functions which you wish to describe, and remove this explanation from
the commit message.
>
> b) Transition to the new admin schedule at the beginning of
> advance_sched(). A new API, should_change_sched(), has been introduced
> for this purpose.
This should have been done since patch 1, not here.
> 4. Remove the previous definition of should_change_scheds() API. A new
> should_change_sched() API has been introduced, but it serves a
> completely different purpose than the one that was removed.
So don't name it the same!
>
> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> ---
> net/sched/sch_taprio.c | 105 +++++++++++++++++++++++++++--------------
> 1 file changed, 70 insertions(+), 35 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index dee103647823..ed32654b46f5 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -259,6 +259,14 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
> return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
> }
>
> +static bool cycle_corr_active(s64 cycle_time_correction)
> +{
> + if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
> + return false;
> + else
> + return true;
> +}
> +
Could look like this:
static bool cycle_corr_active(struct sched_gate_list *oper)
{
return oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION;
}
> /* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
> * q->max_sdu[] requested by the user and the max_sdu dynamically determined by
> * the maximum open gate durations at the given link speed.
> @@ -888,38 +896,59 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
> return false;
> }
>
> -static bool should_change_schedules(const struct sched_gate_list *admin,
> - const struct sched_gate_list *oper,
> - ktime_t end_time)
> +static bool should_change_sched(struct sched_gate_list *oper)
> {
> - ktime_t next_base_time, extension_time;
> + bool change_to_admin_sched = false;
>
> - if (!admin)
> - return false;
> + if (oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
> + /* The recent entry ran is the last one from oper */
> + change_to_admin_sched = true;
> + oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
Don't make a function called should_change_sched() stateful. Don't make
this function reset the value of oper->cycle_time_correction, since that
practically equates with actually starting the schedule change.
The oper->cycle_time_correction assignment actually belongs to
switch_schedules(), in my opinion.
And if you make should_change_sched() stateless, then surprise-surprise,
it will contain the exact same logic, and return the exact same thing,
as cycle_corr_active(). I think that naming this single function
sched_change_pending() and providing a kerneldoc comment as to why it is
implemented the way it is should be sufficient.
> + }
>
> - next_base_time = sched_base_time(admin);
> + return change_to_admin_sched;
> +}
>
> - /* This is the simple case, the end_time would fall after
> - * the next schedule base_time.
> - */
> - if (ktime_compare(next_base_time, end_time) <= 0)
> +static bool should_extend_cycle(const struct sched_gate_list *oper,
> + ktime_t new_base_time,
> + ktime_t entry_end_time,
> + const struct sched_entry *entry)
> +{
> + ktime_t next_cycle_end_time = ktime_add_ns(oper->cycle_end_time,
> + oper->cycle_time);
> + bool extension_supported = oper->cycle_time_extension > 0 ? true : false;
"? true : false" is redundant. Just "extension_supported = oper->cycle_time_extension > 0"
is enough.
> + s64 extension_limit = oper->cycle_time_extension;
> +
> + if (extension_supported &&
> + list_is_last(&entry->list, &oper->entries) &&
> + ktime_before(new_base_time, next_cycle_end_time) &&
> + ktime_sub(new_base_time, entry_end_time) < extension_limit)
> return true;
> + else
> + return false;
Style nitpick:
return extension_supported &&
list_is_last(&entry->list, &oper->entries) &&
ktime_before(new_base_time, next_cycle_end_time) &&
ktime_sub(new_base_time, entry_end_time) < extension_limit;
> +}
>
> - /* This is the cycle_time_extension case, if the end_time
> - * plus the amount that can be extended would fall after the
> - * next schedule base_time, we can extend the current schedule
> - * for that amount.
> - */
> - extension_time = ktime_add_ns(end_time, oper->cycle_time_extension);
> +static s64 get_cycle_time_correction(const struct sched_gate_list *oper,
> + ktime_t new_base_time,
> + ktime_t entry_end_time,
> + const struct sched_entry *entry)
> +{
> + s64 correction = INIT_CYCLE_TIME_CORRECTION;
>
> - /* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
> - * how precisely the extension should be made. So after
> - * conformance testing, this logic may change.
> - */
> - if (ktime_compare(next_base_time, extension_time) <= 0)
> - return true;
> + if (!entry || !oper)
> + return correction;
This function is called as follows:
oper->cycle_time_correction = get_cycle_time_correction(oper, ...);
So, "oper" cannot be NULL if we dereference "oper" in the left hand side
of the assignment and expect the kernel not to crash, no?
"entry" - assigned from the "next" variable in advance_sched() - should
not be NULL either, from the way in which it is assigned.
>
> - return false;
> + if (ktime_compare(new_base_time, entry_end_time) <= 0) {
> + /* negative or zero correction */
At least for me, it would be helpful if you could transplant the
explanation from the commit message ("The new admin base time starts
earlier than the next entry's end time") into an expanded comment here.
I am easily confused about the "ktime_compare(a, b) <= 0" construction.
> + correction = ktime_sub(new_base_time, entry_end_time);
> + } else if (ktime_after(new_base_time, entry_end_time) &&
> + should_extend_cycle(oper, new_base_time,
> + entry_end_time, entry)) {
> + /* positive correction */
Same here - move the explanation from the commit message to the comment,
please.
> + correction = ktime_sub(new_base_time, entry_end_time);
> + }
> +
> + return correction;
> }
>
> static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> @@ -942,10 +971,8 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> admin = rcu_dereference_protected(q->admin_sched,
> lockdep_is_held(&q->current_entry_lock));
>
> - if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
> - oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
> + if (!oper || should_change_sched(oper))
> switch_schedules(q, &admin, &oper);
> - }
>
> /* This can happen in two cases: 1. this is the very first run
> * of this function (i.e. we weren't running any schedule
> @@ -972,6 +999,22 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> end_time = ktime_add_ns(entry->end_time, next->interval);
> end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
>
> + if (admin) {
> + ktime_t new_base_time = sched_base_time(admin);
> +
> + oper->cycle_time_correction =
> + get_cycle_time_correction(oper, new_base_time,
> + end_time, next);
> +
> + if (cycle_corr_active(oper->cycle_time_correction)) {
> + /* The next entry is the last entry we will run from
> + * oper, subsequent ones will take from the new admin
> + */
> + oper->cycle_end_time = new_base_time;
> + end_time = new_base_time;
> + }
> + }
> +
> for (tc = 0; tc < num_tc; tc++) {
> if (next->gate_duration[tc] == oper->cycle_time)
> next->gate_close_time[tc] = KTIME_MAX;
> @@ -980,14 +1023,6 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> next->gate_duration[tc]);
> }
>
> - if (should_change_schedules(admin, oper, end_time)) {
> - /* Set things so the next time this runs, the new
> - * schedule runs.
> - */
> - end_time = sched_base_time(admin);
> - oper->cycle_time_correction = 0;
> - }
> -
> next->end_time = end_time;
> taprio_set_budgets(q, oper, next);
>
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 2/7] net/sched: taprio: fix cycle time adjustment for next entry
2023-11-08 23:20 ` Vladimir Oltean
@ 2023-11-15 11:55 ` Abdul Rahim, Faizal
0 siblings, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-15 11:55 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On 9/11/2023 7:20 am, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:18AM -0500, Faizal Rahim wrote:
>> According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension:
>> "the Cycle Time Extension variable allows this extension of the last old
>> cycle to be done in a defined way. If the last complete old cycle would
>> normally end less than OperCycleTimeExtension nanoseconds before the new
>> base time, then the last complete cycle before AdminBaseTime is reached
>> is extended so that it ends at AdminBaseTime."
>
> So far, so good.
>
>> The current taprio implementation does not extend the last old cycle in
>> the defined manner specified in the Qbv Spec. This is part of the fix
>> covered in this patch.
>
> In the discussion on v1, I said that prior to commit a1e6ad30fa19
> ("net/sched: taprio: calculate guard band against actual TC gate close
> time"), the last entry's next->close_time actually used to include the
> oper schedule's correction, but it no longer does. This points to a
> regression, and not to something that was never there. Am I wrong?
That is correct, that patch you mentioned is a regression to some of the
extension correction logic.
The regression caused by the patch you mentioned is resolved by
("net/sched: taprio: update impacted fields during cycle time adjustment").
Here's a snippet:
if (cycle_corr_active(oper->cycle_time_correction) &&
(next->gate_mask & BIT(tc)))
next->gate_close_time[tc] = end_time;
The `end_time` here is already corrected. The corrected `gate_close_time`
is then used by taprio_dequeue_from_txq() -> taprio_entry_allows_tx(),
which now takes dynamic scheduling into account.
However, the extension logic had other issues even before that patch.
Example, in should_change_schedules(), it didn't consider if the next entry
is the last one in the oper cycle before extending the schedule.
Although this comes as no surprise as there was already a FIXME tag in
should_change_schedules().
This patch primarily addresses the should_change_schedules() logic using
the new get_cycle_time_correction().
>> Here are the changes made:
>>
>> 1. A new API, get_cycle_time_correction(), has been added to return the
>
> I would call an "API" an interface between two distinct software layers,
> based on an agreed-upon contract. Not a function in sch_taprio.c which
> is called by another function in sch_taprio.c.
Alright, my use of the term "API" was a bit casual without much
consideration – my mistake.
>> correction value. If it returns a non-initialize value, it indicates
>> changes required for the next entry schedule, and upon the completion
>> of the next entry's duration, entries will be loaded from the new admin
>> schedule.
>
> This paragraph doesn't really help. It gets the reader lost in
> irrelevant details which are actually not that hard to deduce from the
> code with some good naming. Actually I find it poor naming to say
> "non-initialize value" when what you mean is "!= INIT_CYCLE_TIME_CORRECTION".
> I think I would name this a "specific" or "valid" cycle correction, when
> it takes a value different from CYCLE_TIME_CORRECTION_UNSPEC.
Will rename
>>
>> 2. Store correction values in cycle_time_correction:
>> a) Positive correction value/extension
>> We calculate the correction between the last operational cycle and the
>> new admin base time. Note that for positive correction to take place,
>> the next entry should be the last entry from oper and the new admin base
>> time falls within the next cycle time of old oper.
>>
>> b) Negative correction value
>> The new admin base time starts earlier than the next entry's end time.
>>
>> c) Zero correction value
>> The new admin base time aligns exactly with the old cycle.
>>
>> 3. When cycle_time_correction is set to a non-initialized value, it is
>> used to:
>> a) Indicate that cycle correction is active to trigger adjustments in
>> affected fields like interval and cycle_time. A new API,
>> cycle_corr_active(), has been created to help with this purpose.
>
> Again, it's exaggerated to call this an "API".
> Although, what you can do is provide a kerneldoc-style comment above the
> functions which you wish to describe, and remove this explanation from
> the commit message.
okay
>>
>> b) Transition to the new admin schedule at the beginning of
>> advance_sched(). A new API, should_change_sched(), has been introduced
>> for this purpose.
>
> This should have been done since patch 1, not here.
Understood. My bad - that would have been a better choice.
>> 4. Remove the previous definition of should_change_scheds() API. A new
>> should_change_sched() API has been introduced, but it serves a
>> completely different purpose than the one that was removed.
>
> So don't name it the same!
>
>>
>> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> ---
>> net/sched/sch_taprio.c | 105 +++++++++++++++++++++++++++--------------
>> 1 file changed, 70 insertions(+), 35 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index dee103647823..ed32654b46f5 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -259,6 +259,14 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
>> return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
>> }
>>
>> +static bool cycle_corr_active(s64 cycle_time_correction)
>> +{
>> + if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
>> + return false;
>> + else
>> + return true;
>> +}
>> +
>
> Could look like this:
>
> static bool cycle_corr_active(struct sched_gate_list *oper)
> {
> return oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION;
> }
>
>> /* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
>> * q->max_sdu[] requested by the user and the max_sdu dynamically determined by
>> * the maximum open gate durations at the given link speed.
>> @@ -888,38 +896,59 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
>> return false;
>> }
>>
>> -static bool should_change_schedules(const struct sched_gate_list *admin,
>> - const struct sched_gate_list *oper,
>> - ktime_t end_time)
>> +static bool should_change_sched(struct sched_gate_list *oper)
>> {
>> - ktime_t next_base_time, extension_time;
>> + bool change_to_admin_sched = false;
>>
>> - if (!admin)
>> - return false;
>> + if (oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
>> + /* The recent entry ran is the last one from oper */
>> + change_to_admin_sched = true;
>> + oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
>
> Don't make a function called should_change_sched() stateful. Don't make
> this function reset the value of oper->cycle_time_correction, since that
> practically equates with actually starting the schedule change.
>
> The oper->cycle_time_correction assignment actually belongs to
> switch_schedules(), in my opinion.
>
> And if you make should_change_sched() stateless, then surprise-surprise,
> it will contain the exact same logic, and return the exact same thing,
> as cycle_corr_active(). I think that naming this single function
> sched_change_pending() and providing a kerneldoc comment as to why it is
> implemented the way it is should be sufficient.
>
I intentionally made should_change_schedule() slightly different from
cycle_corr_active() and, unfortunately, stateful by resetting
oper->cycle_time_correction. My aim was to have two functions with clear
names reflecting their intentions based on usage.
For instance, this:
if (should_change_schedule(oper)) {
oper->cycle_end_time = new_base_time;
end_time = new_base_time;
is less intuitive than this:
if (cycle_corr_active(oper->cycle_time_correction)) {
oper->cycle_end_time = new_base_time;
end_time = new_base_time;
And this:
if (!oper || should_change_sched(oper))
switch_schedules(q, &admin, &oper);
reads clearer than:
if (!oper || cycle_corr_active(oper->cycle_time_correction))
switch_schedules(q, &admin, &oper);
Normally I prefer clear function names that don't need comments for
explanation. But I probably overthink it, seems to have more cons than
pros. Will replace with a single sched_change_pending() as suggested, thanks.
>> + }
>>
>> - next_base_time = sched_base_time(admin);
>> + return change_to_admin_sched;
>> +}
>>
>> - /* This is the simple case, the end_time would fall after
>> - * the next schedule base_time.
>> - */
>> - if (ktime_compare(next_base_time, end_time) <= 0)
>> +static bool should_extend_cycle(const struct sched_gate_list *oper,
>> + ktime_t new_base_time,
>> + ktime_t entry_end_time,
>> + const struct sched_entry *entry)
>> +{
>> + ktime_t next_cycle_end_time = ktime_add_ns(oper->cycle_end_time,
>> + oper->cycle_time);
>> + bool extension_supported = oper->cycle_time_extension > 0 ? true : false;
>
> "? true : false" is redundant. Just "extension_supported = oper->cycle_time_extension > 0"
> is enough.
Ooops sorry for this blunder. Will change.
>> + s64 extension_limit = oper->cycle_time_extension;
>> +
>> + if (extension_supported &&
>> + list_is_last(&entry->list, &oper->entries) &&
>> + ktime_before(new_base_time, next_cycle_end_time) &&
>> + ktime_sub(new_base_time, entry_end_time) < extension_limit)
>> return true;
>> + else
>> + return false;
>
> Style nitpick:
>
> return extension_supported &&
> list_is_last(&entry->list, &oper->entries) &&
> ktime_before(new_base_time, next_cycle_end_time) &&
> ktime_sub(new_base_time, entry_end_time) < extension_limit;
>
>> +}
>>
>> - /* This is the cycle_time_extension case, if the end_time
>> - * plus the amount that can be extended would fall after the
>> - * next schedule base_time, we can extend the current schedule
>> - * for that amount.
>> - */
>> - extension_time = ktime_add_ns(end_time, oper->cycle_time_extension);
>> +static s64 get_cycle_time_correction(const struct sched_gate_list *oper,
>> + ktime_t new_base_time,
>> + ktime_t entry_end_time,
>> + const struct sched_entry *entry)
>> +{
>> + s64 correction = INIT_CYCLE_TIME_CORRECTION;
>>
>> - /* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
>> - * how precisely the extension should be made. So after
>> - * conformance testing, this logic may change.
>> - */
>> - if (ktime_compare(next_base_time, extension_time) <= 0)
>> - return true;
>> + if (!entry || !oper)
>> + return correction;
>
> This function is called as follows:
>
> oper->cycle_time_correction = get_cycle_time_correction(oper, ...);
>
> So, "oper" cannot be NULL if we dereference "oper" in the left hand side
> of the assignment and expect the kernel not to crash, no?
>
> "entry" - assigned from the "next" variable in advance_sched() - should
> not be NULL either, from the way in which it is assigned.
My bad on the oper null check.
Will remove both.
>>
>> - return false;
>> + if (ktime_compare(new_base_time, entry_end_time) <= 0) {
>> + /* negative or zero correction */
>
> At least for me, it would be helpful if you could transplant the
> explanation from the commit message ("The new admin base time starts
> earlier than the next entry's end time") into an expanded comment here.
> I am easily confused about the "ktime_compare(a, b) <= 0" construction.
>
>> + correction = ktime_sub(new_base_time, entry_end_time);
>> + } else if (ktime_after(new_base_time, entry_end_time) &&
>> + should_extend_cycle(oper, new_base_time,
>> + entry_end_time, entry)) {
>> + /* positive correction */
>
> Same here - move the explanation from the commit message to the comment,
> please.
Will do.
>
>> + correction = ktime_sub(new_base_time, entry_end_time);
>> + }
>> +
>> + return correction;
>> }
>>
>> static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> @@ -942,10 +971,8 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> admin = rcu_dereference_protected(q->admin_sched,
>> lockdep_is_held(&q->current_entry_lock));
>>
>> - if (!oper || oper->cycle_time_correction != INIT_CYCLE_TIME_CORRECTION) {
>> - oper->cycle_time_correction = INIT_CYCLE_TIME_CORRECTION;
>> + if (!oper || should_change_sched(oper))
>> switch_schedules(q, &admin, &oper);
>> - }
>>
>> /* This can happen in two cases: 1. this is the very first run
>> * of this function (i.e. we weren't running any schedule
>> @@ -972,6 +999,22 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> end_time = ktime_add_ns(entry->end_time, next->interval);
>> end_time = min_t(ktime_t, end_time, oper->cycle_end_time);
>>
>> + if (admin) {
>> + ktime_t new_base_time = sched_base_time(admin);
>> +
>> + oper->cycle_time_correction =
>> + get_cycle_time_correction(oper, new_base_time,
>> + end_time, next);
>> +
>> + if (cycle_corr_active(oper->cycle_time_correction)) {
>> + /* The next entry is the last entry we will run from
>> + * oper, subsequent ones will take from the new admin
>> + */
>> + oper->cycle_end_time = new_base_time;
>> + end_time = new_base_time;
>> + }
>> + }
>> +
>> for (tc = 0; tc < num_tc; tc++) {
>> if (next->gate_duration[tc] == oper->cycle_time)
>> next->gate_close_time[tc] = KTIME_MAX;
>> @@ -980,14 +1023,6 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> next->gate_duration[tc]);
>> }
>>
>> - if (should_change_schedules(admin, oper, end_time)) {
>> - /* Set things so the next time this runs, the new
>> - * schedule runs.
>> - */
>> - end_time = sched_base_time(admin);
>> - oper->cycle_time_correction = 0;
>> - }
>> -
>> next->end_time = end_time;
>> taprio_set_budgets(q, oper, next);
>>
>> --
>> 2.25.1
>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 net 3/7] net/sched: taprio: update impacted fields during cycle time adjustment
2023-11-07 11:20 [PATCH v2 net 0/7] qbv cycle time extension/truncation Faizal Rahim
2023-11-07 11:20 ` [PATCH v2 net 1/7] net/sched: taprio: fix too early schedules switching Faizal Rahim
2023-11-07 11:20 ` [PATCH v2 net 2/7] net/sched: taprio: fix cycle time adjustment for next entry Faizal Rahim
@ 2023-11-07 11:20 ` Faizal Rahim
2023-11-08 23:41 ` Vladimir Oltean
2023-11-07 11:20 ` [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval Faizal Rahim
` (4 subsequent siblings)
7 siblings, 1 reply; 31+ messages in thread
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
Update impacted fields in advance_sched() if cycle_corr_active()
is true, which indicates that the next entry is the last entry
from oper that it will run.
Update impacted fields:
1. gate_duration[tc], max_open_gate_duration[tc]
Created a new API update_open_gate_duration().The API sets the
duration based on the last remaining entry, the original value
was based on consideration of multiple entries.
2. gate_close_time[tc]
Update next entry gate close time according to the new admin
base time
3. max_sdu[tc], budget[tc]
Restrict from setting to max value because there's only a single
entry left to run from oper before changing to the new admin
schedule, so we shouldn't set to max.
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 49 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 46 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index ed32654b46f5..119dec3bbe88 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -288,7 +288,8 @@ static void taprio_update_queue_max_sdu(struct taprio_sched *q,
/* TC gate never closes => keep the queueMaxSDU
* selected by the user
*/
- if (sched->max_open_gate_duration[tc] == sched->cycle_time) {
+ if (sched->max_open_gate_duration[tc] == sched->cycle_time &&
+ !cycle_corr_active(sched->cycle_time_correction)) {
max_sdu_dynamic = U32_MAX;
} else {
u32 max_frm_len;
@@ -684,7 +685,8 @@ static void taprio_set_budgets(struct taprio_sched *q,
for (tc = 0; tc < num_tc; tc++) {
/* Traffic classes which never close have infinite budget */
- if (entry->gate_duration[tc] == sched->cycle_time)
+ if (entry->gate_duration[tc] == sched->cycle_time &&
+ !cycle_corr_active(sched->cycle_time_correction))
budget = INT_MAX;
else
budget = div64_u64((u64)entry->gate_duration[tc] * PSEC_PER_NSEC,
@@ -896,6 +898,32 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
return false;
}
+/* Open gate duration were calculated at the beginning with consideration of
+ * multiple entries. If cycle time correction is active, there's only a single
+ * remaining entry left from oper to run.
+ * Update open gate duration based on this last entry.
+ */
+static void update_open_gate_duration(struct sched_entry *entry,
+ struct sched_gate_list *oper,
+ int num_tc,
+ u64 open_gate_duration)
+{
+ int tc;
+
+ if (!entry || !oper)
+ return;
+
+ for (tc = 0; tc < num_tc; tc++) {
+ if (entry->gate_mask & BIT(tc)) {
+ entry->gate_duration[tc] = open_gate_duration;
+ oper->max_open_gate_duration[tc] = open_gate_duration;
+ } else {
+ entry->gate_duration[tc] = 0;
+ oper->max_open_gate_duration[tc] = 0;
+ }
+ }
+}
+
static bool should_change_sched(struct sched_gate_list *oper)
{
bool change_to_admin_sched = false;
@@ -1010,13 +1038,28 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
/* The next entry is the last entry we will run from
* oper, subsequent ones will take from the new admin
*/
+ u64 new_gate_duration =
+ next->interval + oper->cycle_time_correction;
+ struct qdisc_size_table *stab =
+ rtnl_dereference(q->root->stab);
+
oper->cycle_end_time = new_base_time;
end_time = new_base_time;
+
+ update_open_gate_duration(next, oper, num_tc,
+ new_gate_duration);
+ taprio_update_queue_max_sdu(q, oper, stab);
}
}
for (tc = 0; tc < num_tc; tc++) {
- if (next->gate_duration[tc] == oper->cycle_time)
+ if (cycle_corr_active(oper->cycle_time_correction) &&
+ (next->gate_mask & BIT(tc)))
+ /* Set to the new base time, ensuring a smooth transition
+ * to the new schedule when the next entry finishes.
+ */
+ next->gate_close_time[tc] = end_time;
+ else if (next->gate_duration[tc] == oper->cycle_time)
next->gate_close_time[tc] = KTIME_MAX;
else
next->gate_close_time[tc] = ktime_add_ns(entry->end_time,
--
2.25.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 3/7] net/sched: taprio: update impacted fields during cycle time adjustment
2023-11-07 11:20 ` [PATCH v2 net 3/7] net/sched: taprio: update impacted fields during cycle time adjustment Faizal Rahim
@ 2023-11-08 23:41 ` Vladimir Oltean
2023-11-15 11:55 ` Abdul Rahim, Faizal
0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-08 23:41 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:19AM -0500, Faizal Rahim wrote:
> Update impacted fields in advance_sched() if cycle_corr_active()
> is true, which indicates that the next entry is the last entry
> from oper that it will run.
>
> Update impacted fields:
>
> 1. gate_duration[tc], max_open_gate_duration[tc]
> Created a new API update_open_gate_duration().The API sets the
> duration based on the last remaining entry, the original value
> was based on consideration of multiple entries.
>
> 2. gate_close_time[tc]
> Update next entry gate close time according to the new admin
> base time
>
> 3. max_sdu[tc], budget[tc]
> Restrict from setting to max value because there's only a single
> entry left to run from oper before changing to the new admin
> schedule, so we shouldn't set to max.
>
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
The commit message shouldn't be a text-to-speech output of the commit body.
Say very shortly how the system should behave, what's wrong such that it
doesn't behave as expected, what's the user-visible impact of the bug,
and try to identify why the bug happened.
In this case, what happened is that commit a306a90c8ffe ("net/sched:
taprio: calculate tc gate durations"), which introduced the impacted
fields you are changing, never took dynamic schedule changes into
consideration. So this commit should also appear in the Fixes: tag.
> ---
> net/sched/sch_taprio.c | 49 +++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 46 insertions(+), 3 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index ed32654b46f5..119dec3bbe88 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -288,7 +288,8 @@ static void taprio_update_queue_max_sdu(struct taprio_sched *q,
> /* TC gate never closes => keep the queueMaxSDU
> * selected by the user
> */
> - if (sched->max_open_gate_duration[tc] == sched->cycle_time) {
> + if (sched->max_open_gate_duration[tc] == sched->cycle_time &&
> + !cycle_corr_active(sched->cycle_time_correction)) {
> max_sdu_dynamic = U32_MAX;
> } else {
> u32 max_frm_len;
> @@ -684,7 +685,8 @@ static void taprio_set_budgets(struct taprio_sched *q,
>
> for (tc = 0; tc < num_tc; tc++) {
> /* Traffic classes which never close have infinite budget */
> - if (entry->gate_duration[tc] == sched->cycle_time)
> + if (entry->gate_duration[tc] == sched->cycle_time &&
> + !cycle_corr_active(sched->cycle_time_correction))
> budget = INT_MAX;
> else
> budget = div64_u64((u64)entry->gate_duration[tc] * PSEC_PER_NSEC,
> @@ -896,6 +898,32 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
> return false;
> }
>
> +/* Open gate duration were calculated at the beginning with consideration of
> + * multiple entries. If cycle time correction is active, there's only a single
> + * remaining entry left from oper to run.
> + * Update open gate duration based on this last entry.
> + */
> +static void update_open_gate_duration(struct sched_entry *entry,
> + struct sched_gate_list *oper,
> + int num_tc,
> + u64 open_gate_duration)
> +{
> + int tc;
> +
> + if (!entry || !oper)
> + return;
> +
> + for (tc = 0; tc < num_tc; tc++) {
> + if (entry->gate_mask & BIT(tc)) {
> + entry->gate_duration[tc] = open_gate_duration;
> + oper->max_open_gate_duration[tc] = open_gate_duration;
> + } else {
> + entry->gate_duration[tc] = 0;
> + oper->max_open_gate_duration[tc] = 0;
> + }
> + }
> +}
> +
> static bool should_change_sched(struct sched_gate_list *oper)
> {
> bool change_to_admin_sched = false;
> @@ -1010,13 +1038,28 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
> /* The next entry is the last entry we will run from
> * oper, subsequent ones will take from the new admin
> */
> + u64 new_gate_duration =
> + next->interval + oper->cycle_time_correction;
> + struct qdisc_size_table *stab =
> + rtnl_dereference(q->root->stab);
> +
The lockdep annotation for this RCU accessor is bogus.
rtnl_dereference() is the same as rcu_dereference_protected(..., lockdep_rtnl_is_held()),
which cannot be true in a hrtimer callback, as the rtnetlink lock is a
sleepable mutex and hrtimers run in atomic context.
Running with lockdep enabled will tell you as much:
$ ./test_taprio_cycle_extension.sh
Testing config change with a delay of 5250000000 ns between schedules
[ 100.734925]
[ 100.736703] =============================
[ 100.740780] WARNING: suspicious RCU usage
[ 100.744857] 6.6.0-10114-gca572939947f #1495 Not tainted
[ 100.750162] -----------------------------
[ 100.754236] net/sched/sch_taprio.c:1064 suspicious rcu_dereference_protected() usage!
[ 100.762155]
[ 100.762155] other info that might help us debug this:
[ 100.762155]
[ 100.770242]
[ 100.770242] rcu_scheduler_active = 2, debug_locks = 1
[ 100.776851] 1 lock held by swapper/0/0:
[ 100.780756] #0: ffff3d9784b83b00 (&q->current_entry_lock){-...}-{3:3}, at: advance_sched+0x44/0x59c
[ 100.790099]
[ 100.790099] stack backtrace:
[ 100.794477] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-10114-gca572939947f #1495
[ 100.802346] Hardware name: LS1028A RDB Board (DT)
[ 100.807072] Call trace:
[ 100.809531] dump_backtrace+0xf4/0x140
[ 100.813305] show_stack+0x18/0x2c
[ 100.816638] dump_stack_lvl+0x60/0x80
[ 100.820321] dump_stack+0x18/0x24
[ 100.823654] lockdep_rcu_suspicious+0x170/0x210
[ 100.828210] advance_sched+0x384/0x59c
[ 100.831978] __hrtimer_run_queues+0x200/0x430
[ 100.836360] hrtimer_interrupt+0xdc/0x39c
[ 100.840392] arch_timer_handler_phys+0x3c/0x4c
[ 100.844862] handle_percpu_devid_irq+0xb8/0x28c
[ 100.849417] generic_handle_domain_irq+0x2c/0x44
[ 100.854060] gic_handle_irq+0x4c/0x110
[ 100.857830] call_on_irq_stack+0x24/0x4c
[ 100.861775] el1_interrupt+0x74/0xc0
[ 100.865370] el1h_64_irq_handler+0x18/0x24
[ 100.869489] el1h_64_irq+0x64/0x68
[ 100.872909] arch_local_irq_enable+0x8/0xc
[ 100.877032] cpuidle_enter+0x38/0x50
[ 100.880629] do_idle+0x1ec/0x280
[ 100.883877] cpu_startup_entry+0x34/0x38
[ 100.887822] kernel_init+0x0/0x1a0
[ 100.891245] start_kernel+0x0/0x3b0
[ 100.894756] start_kernel+0x2f8/0x3b0
[ 100.898439] __primary_switched+0xbc/0xc4
What I would do is:
struct qdisc_size_table *stab;
rcu_read_lock();
stab = rcu_dereference(q->root->stab);
taprio_update_queue_max_sdu(q, oper, stab);
rcu_read_unlock();
> oper->cycle_end_time = new_base_time;
> end_time = new_base_time;
> +
> + update_open_gate_duration(next, oper, num_tc,
> + new_gate_duration);
> + taprio_update_queue_max_sdu(q, oper, stab);
> }
> }
>
> for (tc = 0; tc < num_tc; tc++) {
> - if (next->gate_duration[tc] == oper->cycle_time)
> + if (cycle_corr_active(oper->cycle_time_correction) &&
> + (next->gate_mask & BIT(tc)))
> + /* Set to the new base time, ensuring a smooth transition
> + * to the new schedule when the next entry finishes.
> + */
> + next->gate_close_time[tc] = end_time;
> + else if (next->gate_duration[tc] == oper->cycle_time)
> next->gate_close_time[tc] = KTIME_MAX;
> else
> next->gate_close_time[tc] = ktime_add_ns(entry->end_time,
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 3/7] net/sched: taprio: update impacted fields during cycle time adjustment
2023-11-08 23:41 ` Vladimir Oltean
@ 2023-11-15 11:55 ` Abdul Rahim, Faizal
0 siblings, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-15 11:55 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On 9/11/2023 7:41 am, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:19AM -0500, Faizal Rahim wrote:
>> Update impacted fields in advance_sched() if cycle_corr_active()
>> is true, which indicates that the next entry is the last entry
>> from oper that it will run.
>>
>> Update impacted fields:
>>
>> 1. gate_duration[tc], max_open_gate_duration[tc]
>> Created a new API update_open_gate_duration().The API sets the
>> duration based on the last remaining entry, the original value
>> was based on consideration of multiple entries.
>>
>> 2. gate_close_time[tc]
>> Update next entry gate close time according to the new admin
>> base time
>>
>> 3. max_sdu[tc], budget[tc]
>> Restrict from setting to max value because there's only a single
>> entry left to run from oper before changing to the new admin
>> schedule, so we shouldn't set to max.
>>
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>
> The commit message shouldn't be a text-to-speech output of the commit body.
> Say very shortly how the system should behave, what's wrong such that it
> doesn't behave as expected, what's the user-visible impact of the bug,
> and try to identify why the bug happened.
>
> In this case, what happened is that commit a306a90c8ffe ("net/sched:
> taprio: calculate tc gate durations"), which introduced the impacted
> fields you are changing, never took dynamic schedule changes into
> consideration. So this commit should also appear in the Fixes: tag.
Got it.
>> ---
>> net/sched/sch_taprio.c | 49 +++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 46 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index ed32654b46f5..119dec3bbe88 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -288,7 +288,8 @@ static void taprio_update_queue_max_sdu(struct taprio_sched *q,
>> /* TC gate never closes => keep the queueMaxSDU
>> * selected by the user
>> */
>> - if (sched->max_open_gate_duration[tc] == sched->cycle_time) {
>> + if (sched->max_open_gate_duration[tc] == sched->cycle_time &&
>> + !cycle_corr_active(sched->cycle_time_correction)) {
>> max_sdu_dynamic = U32_MAX;
>> } else {
>> u32 max_frm_len;
>> @@ -684,7 +685,8 @@ static void taprio_set_budgets(struct taprio_sched *q,
>>
>> for (tc = 0; tc < num_tc; tc++) {
>> /* Traffic classes which never close have infinite budget */
>> - if (entry->gate_duration[tc] == sched->cycle_time)
>> + if (entry->gate_duration[tc] == sched->cycle_time &&
>> + !cycle_corr_active(sched->cycle_time_correction))
>> budget = INT_MAX;
>> else
>> budget = div64_u64((u64)entry->gate_duration[tc] * PSEC_PER_NSEC,
>> @@ -896,6 +898,32 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
>> return false;
>> }
>>
>> +/* Open gate duration were calculated at the beginning with consideration of
>> + * multiple entries. If cycle time correction is active, there's only a single
>> + * remaining entry left from oper to run.
>> + * Update open gate duration based on this last entry.
>> + */
>> +static void update_open_gate_duration(struct sched_entry *entry,
>> + struct sched_gate_list *oper,
>> + int num_tc,
>> + u64 open_gate_duration)
>> +{
>> + int tc;
>> +
>> + if (!entry || !oper)
>> + return;
>> +
>> + for (tc = 0; tc < num_tc; tc++) {
>> + if (entry->gate_mask & BIT(tc)) {
>> + entry->gate_duration[tc] = open_gate_duration;
>> + oper->max_open_gate_duration[tc] = open_gate_duration;
>> + } else {
>> + entry->gate_duration[tc] = 0;
>> + oper->max_open_gate_duration[tc] = 0;
>> + }
>> + }
>> +}
>> +
>> static bool should_change_sched(struct sched_gate_list *oper)
>> {
>> bool change_to_admin_sched = false;
>> @@ -1010,13 +1038,28 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
>> /* The next entry is the last entry we will run from
>> * oper, subsequent ones will take from the new admin
>> */
>> + u64 new_gate_duration =
>> + next->interval + oper->cycle_time_correction;
>> + struct qdisc_size_table *stab =
>> + rtnl_dereference(q->root->stab);
>> +
>
> The lockdep annotation for this RCU accessor is bogus.
> rtnl_dereference() is the same as rcu_dereference_protected(..., lockdep_rtnl_is_held()),
> which cannot be true in a hrtimer callback, as the rtnetlink lock is a
> sleepable mutex and hrtimers run in atomic context.
>
> Running with lockdep enabled will tell you as much:
>
> $ ./test_taprio_cycle_extension.sh
> Testing config change with a delay of 5250000000 ns between schedules
> [ 100.734925]
> [ 100.736703] =============================
> [ 100.740780] WARNING: suspicious RCU usage
> [ 100.744857] 6.6.0-10114-gca572939947f #1495 Not tainted
> [ 100.750162] -----------------------------
> [ 100.754236] net/sched/sch_taprio.c:1064 suspicious rcu_dereference_protected() usage!
> [ 100.762155]
> [ 100.762155] other info that might help us debug this:
> [ 100.762155]
> [ 100.770242]
> [ 100.770242] rcu_scheduler_active = 2, debug_locks = 1
> [ 100.776851] 1 lock held by swapper/0/0:
> [ 100.780756] #0: ffff3d9784b83b00 (&q->current_entry_lock){-...}-{3:3}, at: advance_sched+0x44/0x59c
> [ 100.790099]
> [ 100.790099] stack backtrace:
> [ 100.794477] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-10114-gca572939947f #1495
> [ 100.802346] Hardware name: LS1028A RDB Board (DT)
> [ 100.807072] Call trace:
> [ 100.809531] dump_backtrace+0xf4/0x140
> [ 100.813305] show_stack+0x18/0x2c
> [ 100.816638] dump_stack_lvl+0x60/0x80
> [ 100.820321] dump_stack+0x18/0x24
> [ 100.823654] lockdep_rcu_suspicious+0x170/0x210
> [ 100.828210] advance_sched+0x384/0x59c
> [ 100.831978] __hrtimer_run_queues+0x200/0x430
> [ 100.836360] hrtimer_interrupt+0xdc/0x39c
> [ 100.840392] arch_timer_handler_phys+0x3c/0x4c
> [ 100.844862] handle_percpu_devid_irq+0xb8/0x28c
> [ 100.849417] generic_handle_domain_irq+0x2c/0x44
> [ 100.854060] gic_handle_irq+0x4c/0x110
> [ 100.857830] call_on_irq_stack+0x24/0x4c
> [ 100.861775] el1_interrupt+0x74/0xc0
> [ 100.865370] el1h_64_irq_handler+0x18/0x24
> [ 100.869489] el1h_64_irq+0x64/0x68
> [ 100.872909] arch_local_irq_enable+0x8/0xc
> [ 100.877032] cpuidle_enter+0x38/0x50
> [ 100.880629] do_idle+0x1ec/0x280
> [ 100.883877] cpu_startup_entry+0x34/0x38
> [ 100.887822] kernel_init+0x0/0x1a0
> [ 100.891245] start_kernel+0x0/0x3b0
> [ 100.894756] start_kernel+0x2f8/0x3b0
> [ 100.898439] __primary_switched+0xbc/0xc4
>
> What I would do is:
>
> struct qdisc_size_table *stab;
>
> rcu_read_lock();
> stab = rcu_dereference(q->root->stab);
> taprio_update_queue_max_sdu(q, oper, stab);
> rcu_read_unlock();
>
>> oper->cycle_end_time = new_base_time;
>> end_time = new_base_time;
>> +
>> + update_open_gate_duration(next, oper, num_tc,
>> + new_gate_duration);
>> + taprio_update_queue_max_sdu(q, oper, stab);
>> }
>> }
>>
>> for (tc = 0; tc < num_tc; tc++) {
>> - if (next->gate_duration[tc] == oper->cycle_time)
>> + if (cycle_corr_active(oper->cycle_time_correction) &&
>> + (next->gate_mask & BIT(tc)))
>> + /* Set to the new base time, ensuring a smooth transition
>> + * to the new schedule when the next entry finishes.
>> + */
>> + next->gate_close_time[tc] = end_time;
>> + else if (next->gate_duration[tc] == oper->cycle_time)
>> next->gate_close_time[tc] = KTIME_MAX;
>> else
>> next->gate_close_time[tc] = ktime_add_ns(entry->end_time,
>> --
>> 2.25.1
>>
>
Will update, thanks.
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
2023-11-07 11:20 [PATCH v2 net 0/7] qbv cycle time extension/truncation Faizal Rahim
` (2 preceding siblings ...)
2023-11-07 11:20 ` [PATCH v2 net 3/7] net/sched: taprio: update impacted fields during cycle time adjustment Faizal Rahim
@ 2023-11-07 11:20 ` Faizal Rahim
2023-11-07 22:45 ` kernel test robot
` (3 more replies)
2023-11-07 11:20 ` [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry Faizal Rahim
` (3 subsequent siblings)
7 siblings, 4 replies; 31+ messages in thread
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
Retrieve adjusted cycle_time and interval values through new APIs.
Note that in some cases where the original values are required,
such as in dump_schedule() and setup_first_end_time(), direct calls
to cycle_time and interval are retained without using the new APIs.
Added a new field, correction_active, in the sched_entry struct to
determine the entry's correction state. This field is required due
to specific flow like find_entry_to_transmit() -> get_interval_end_time()
which retrieves the interval for each entry. During positive cycle
time correction, it's known that the last entry interval requires
correction. However, for negative correction, the affected entry
is unknown, which is why this new field is necessary.
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 50 ++++++++++++++++++++++++++++++------------
1 file changed, 36 insertions(+), 14 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 119dec3bbe88..f18a5fe12f0c 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -61,6 +61,7 @@ struct sched_entry {
u32 gate_mask;
u32 interval;
u8 command;
+ bool correction_active;
};
struct sched_gate_list {
@@ -215,6 +216,31 @@ static void switch_schedules(struct taprio_sched *q,
*admin = NULL;
}
+static bool cycle_corr_active(s64 cycle_time_correction)
+{
+ if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
+ return false;
+ else
+ return true;
+}
+
+u32 get_interval(const struct sched_entry *entry,
+ const struct sched_gate_list *oper)
+{
+ if (entry->correction_active)
+ return entry->interval + oper->cycle_time_correction;
+ else
+ return entry->interval;
+}
+
+s64 get_cycle_time(const struct sched_gate_list *oper)
+{
+ if (cycle_corr_active(oper->cycle_time_correction))
+ return oper->cycle_time + oper->cycle_time_correction;
+ else
+ return oper->cycle_time;
+}
+
/* Get how much time has been already elapsed in the current cycle. */
static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
{
@@ -222,7 +248,7 @@ static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
s32 time_elapsed;
time_since_sched_start = ktime_sub(time, sched->base_time);
- div_s64_rem(time_since_sched_start, sched->cycle_time, &time_elapsed);
+ div_s64_rem(time_since_sched_start, get_cycle_time(sched), &time_elapsed);
return time_elapsed;
}
@@ -235,8 +261,9 @@ static ktime_t get_interval_end_time(struct sched_gate_list *sched,
s32 cycle_elapsed = get_cycle_time_elapsed(sched, intv_start);
ktime_t intv_end, cycle_ext_end, cycle_end;
- cycle_end = ktime_add_ns(intv_start, sched->cycle_time - cycle_elapsed);
- intv_end = ktime_add_ns(intv_start, entry->interval);
+ cycle_end = ktime_add_ns(intv_start,
+ get_cycle_time(sched) - cycle_elapsed);
+ intv_end = ktime_add_ns(intv_start, get_interval(entry, sched));
cycle_ext_end = ktime_add(cycle_end, sched->cycle_time_extension);
if (ktime_before(intv_end, cycle_end))
@@ -259,14 +286,6 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
}
-static bool cycle_corr_active(s64 cycle_time_correction)
-{
- if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
- return false;
- else
- return true;
-}
-
/* Sets sched->max_sdu[] and sched->max_frm_len[] to the minimum between the
* q->max_sdu[] requested by the user and the max_sdu dynamically determined by
* the maximum open gate durations at the given link speed.
@@ -351,7 +370,7 @@ static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
if (!sched)
return NULL;
- cycle = sched->cycle_time;
+ cycle = get_cycle_time(sched);
cycle_elapsed = get_cycle_time_elapsed(sched, time);
curr_intv_end = ktime_sub_ns(time, cycle_elapsed);
cycle_end = ktime_add_ns(curr_intv_end, cycle);
@@ -365,7 +384,7 @@ static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
break;
if (!(entry->gate_mask & BIT(tc)) ||
- packet_transmit_time > entry->interval)
+ packet_transmit_time > get_interval(entry, sched))
continue;
txtime = entry->next_txtime;
@@ -543,7 +562,8 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
* interval starts.
*/
if (ktime_after(transmit_end_time, interval_end))
- entry->next_txtime = ktime_add(interval_start, sched->cycle_time);
+ entry->next_txtime =
+ ktime_add(interval_start, get_cycle_time(sched));
} while (sched_changed || ktime_after(transmit_end_time, interval_end));
entry->next_txtime = transmit_end_time;
@@ -1045,6 +1065,7 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
oper->cycle_end_time = new_base_time;
end_time = new_base_time;
+ next->correction_active = true;
update_open_gate_duration(next, oper, num_tc,
new_gate_duration);
@@ -1146,6 +1167,7 @@ static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
}
entry->interval = interval;
+ entry->correction_active = false;
return 0;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
2023-11-07 11:20 ` [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval Faizal Rahim
@ 2023-11-07 22:45 ` kernel test robot
2023-11-09 11:11 ` Vladimir Oltean
` (2 subsequent siblings)
3 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2023-11-07 22:45 UTC (permalink / raw)
To: Faizal Rahim, Vladimir Oltean, Vinicius Costa Gomes,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: oe-kbuild-all, netdev, linux-kernel
Hi Faizal,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net/main]
url: https://github.com/intel-lab-lkp/linux/commits/Faizal-Rahim/net-sched-taprio-fix-too-early-schedules-switching/20231107-192843
base: net/main
patch link: https://lore.kernel.org/r/20231107112023.676016-5-faizal.abdul.rahim%40linux.intel.com
patch subject: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20231108/202311080506.qMlPx2WA-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231108/202311080506.qMlPx2WA-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311080506.qMlPx2WA-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> net/sched/sch_taprio.c:227:5: warning: no previous prototype for 'get_interval' [-Wmissing-prototypes]
227 | u32 get_interval(const struct sched_entry *entry,
| ^~~~~~~~~~~~
>> net/sched/sch_taprio.c:236:5: warning: no previous prototype for 'get_cycle_time' [-Wmissing-prototypes]
236 | s64 get_cycle_time(const struct sched_gate_list *oper)
| ^~~~~~~~~~~~~~
vim +/get_interval +227 net/sched/sch_taprio.c
226
> 227 u32 get_interval(const struct sched_entry *entry,
228 const struct sched_gate_list *oper)
229 {
230 if (entry->correction_active)
231 return entry->interval + oper->cycle_time_correction;
232 else
233 return entry->interval;
234 }
235
> 236 s64 get_cycle_time(const struct sched_gate_list *oper)
237 {
238 if (cycle_corr_active(oper->cycle_time_correction))
239 return oper->cycle_time + oper->cycle_time_correction;
240 else
241 return oper->cycle_time;
242 }
243
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
2023-11-07 11:20 ` [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval Faizal Rahim
2023-11-07 22:45 ` kernel test robot
@ 2023-11-09 11:11 ` Vladimir Oltean
2023-11-15 11:55 ` Abdul Rahim, Faizal
2023-11-17 2:36 ` Vinicius Costa Gomes
2023-11-09 12:01 ` Vladimir Oltean
2023-11-10 19:15 ` kernel test robot
3 siblings, 2 replies; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-09 11:11 UTC (permalink / raw)
To: Faizal Rahim, Vinicius Costa Gomes
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
> Retrieve adjusted cycle_time and interval values through new APIs.
> Note that in some cases where the original values are required,
> such as in dump_schedule() and setup_first_end_time(), direct calls
> to cycle_time and interval are retained without using the new APIs.
>
> Added a new field, correction_active, in the sched_entry struct to
> determine the entry's correction state. This field is required due
> to specific flow like find_entry_to_transmit() -> get_interval_end_time()
> which retrieves the interval for each entry. During positive cycle
> time correction, it's known that the last entry interval requires
> correction. However, for negative correction, the affected entry
> is unknown, which is why this new field is necessary.
I agree with the motivation, but I'm not sure if the chosen solution is
correct.
static u32 get_interval(const struct sched_entry *entry,
const struct sched_gate_list *oper)
{
if (entry->correction_active)
return entry->interval + oper->cycle_time_correction;
return entry->interval;
}
What if the schedule looks like this:
sched-entry S 0x01 125000000
sched-entry S 0x02 125000000
sched-entry S 0x04 125000000
sched-entry S 0x08 125000000
sched-entry S 0x10 125000000
sched-entry S 0x20 125000000
sched-entry S 0x40 125000000
sched-entry S 0x80 125000000
and the calculated cycle_time_correction is -200000000? That would
eliminate the entire last sched-entry (0x80), and the previous one
(0x40) would run for just 75000000 ns. But your calculation would say
that its interval is −75000000 ns (actually reported as an u32 positive
integer, so it would be a completely bogus value).
So not only is the affected entry unknown, but also the amount of cycle
time correction that applies to it is unknown.
I'm looking at where we need get_interval(), and it's from:
taprio_enqueue_one()
-> is_valid_interval()
-> find_entry_to_transmit()
-> get_interval_end_time()
-> get_packet_txtime()
-> find_entry_to_transmit()
I admit it's a part of taprio which I don't understand too well. Why do
we perform such complex calculations in get_interval_end_time() when we
should have struct sched_entry :: end_time precomputed and available for
this purpose (although it was primarily inteded for advance_sched() and
not for enqueue())?
Vinicius, do you know?
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
2023-11-09 11:11 ` Vladimir Oltean
@ 2023-11-15 11:55 ` Abdul Rahim, Faizal
2023-11-17 2:36 ` Vinicius Costa Gomes
1 sibling, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-15 11:55 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
On 9/11/2023 7:11 pm, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
>> Retrieve adjusted cycle_time and interval values through new APIs.
>> Note that in some cases where the original values are required,
>> such as in dump_schedule() and setup_first_end_time(), direct calls
>> to cycle_time and interval are retained without using the new APIs.
>>
>> Added a new field, correction_active, in the sched_entry struct to
>> determine the entry's correction state. This field is required due
>> to specific flow like find_entry_to_transmit() -> get_interval_end_time()
>> which retrieves the interval for each entry. During positive cycle
>> time correction, it's known that the last entry interval requires
>> correction. However, for negative correction, the affected entry
>> is unknown, which is why this new field is necessary.
>
> I agree with the motivation, but I'm not sure if the chosen solution is
> correct.
>
> static u32 get_interval(const struct sched_entry *entry,
> const struct sched_gate_list *oper)
> {
> if (entry->correction_active)
> return entry->interval + oper->cycle_time_correction;
>
> return entry->interval;
> }
>
> What if the schedule looks like this:
>
> sched-entry S 0x01 125000000
> sched-entry S 0x02 125000000
> sched-entry S 0x04 125000000
> sched-entry S 0x08 125000000
> sched-entry S 0x10 125000000
> sched-entry S 0x20 125000000
> sched-entry S 0x40 125000000
> sched-entry S 0x80 125000000
>
> and the calculated cycle_time_correction is -200000000? That would
> eliminate the entire last sched-entry (0x80), and the previous one
> (0x40) would run for just 75000000 ns. But your calculation would say
> that its interval is −75000000 ns (actually reported as an u32 positive
> integer, so it would be a completely bogus value).
>
> So not only is the affected entry unknown, but also the amount of cycle
> time correction that applies to it is unknown.
>
Just an FYI, my cycle time extension test for sending packets fails without
updating the interval and cycle_time – the duration doesn't extend
properly. I only observe proper extension when this patch is included.
In patch series v1, interval and cycle_time were updated directly. However,
due to concerns in v1 comments about updating the fields directly, v2
doesn't do that.
Regarding the concern about negative correction exceeding the interval
value, I've checked the logic in get_cycle_time_correction() that sets
cycle_time_correction, I don't see the possibility of this happening....
Still, if it does, it suggests an error much earlier than the
get_interval() call. So, I propose a failure check in
get_cycle_time_correction(). If the correction value is negative and
consumes the entire entry interval or more, we set the negative
cycle_time_correction to some arbitrary value, maybe half of the interval,
just to mitigate the impact of the unknown error that occurred earlier.
What do you think ?
> I'm looking at where we need get_interval(), and it's from:
>
> taprio_enqueue_one()
> -> is_valid_interval()
> -> find_entry_to_transmit()
> -> get_interval_end_time()
> -> get_packet_txtime()
> -> find_entry_to_transmit()
>
> I admit it's a part of taprio which I don't understand too well. Why do
> we perform such complex calculations in get_interval_end_time() when we
> should have struct sched_entry :: end_time precomputed and available for
> this purpose (although it was primarily inteded for advance_sched() and
> not for enqueue())?
>
> Vinicius, do you know?
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
2023-11-09 11:11 ` Vladimir Oltean
2023-11-15 11:55 ` Abdul Rahim, Faizal
@ 2023-11-17 2:36 ` Vinicius Costa Gomes
1 sibling, 0 replies; 31+ messages in thread
From: Vinicius Costa Gomes @ 2023-11-17 2:36 UTC (permalink / raw)
To: Vladimir Oltean, Faizal Rahim
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
Hi Vladimir,
Vladimir Oltean <vladimir.oltean@nxp.com> writes:
> On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
>> Retrieve adjusted cycle_time and interval values through new APIs.
>> Note that in some cases where the original values are required,
>> such as in dump_schedule() and setup_first_end_time(), direct calls
>> to cycle_time and interval are retained without using the new APIs.
>>
>> Added a new field, correction_active, in the sched_entry struct to
>> determine the entry's correction state. This field is required due
>> to specific flow like find_entry_to_transmit() -> get_interval_end_time()
>> which retrieves the interval for each entry. During positive cycle
>> time correction, it's known that the last entry interval requires
>> correction. However, for negative correction, the affected entry
>> is unknown, which is why this new field is necessary.
>
> I agree with the motivation, but I'm not sure if the chosen solution is
> correct.
>
> static u32 get_interval(const struct sched_entry *entry,
> const struct sched_gate_list *oper)
> {
> if (entry->correction_active)
> return entry->interval + oper->cycle_time_correction;
>
> return entry->interval;
> }
>
> What if the schedule looks like this:
>
> sched-entry S 0x01 125000000
> sched-entry S 0x02 125000000
> sched-entry S 0x04 125000000
> sched-entry S 0x08 125000000
> sched-entry S 0x10 125000000
> sched-entry S 0x20 125000000
> sched-entry S 0x40 125000000
> sched-entry S 0x80 125000000
>
> and the calculated cycle_time_correction is -200000000? That would
> eliminate the entire last sched-entry (0x80), and the previous one
> (0x40) would run for just 75000000 ns. But your calculation would say
> that its interval is −75000000 ns (actually reported as an u32 positive
> integer, so it would be a completely bogus value).
>
> So not only is the affected entry unknown, but also the amount of cycle
> time correction that applies to it is unknown.
>
> I'm looking at where we need get_interval(), and it's from:
>
> taprio_enqueue_one()
> -> is_valid_interval()
> -> find_entry_to_transmit()
> -> get_interval_end_time()
> -> get_packet_txtime()
> -> find_entry_to_transmit()
>
> I admit it's a part of taprio which I don't understand too well. Why do
> we perform such complex calculations in get_interval_end_time() when we
> should have struct sched_entry :: end_time precomputed and available for
> this purpose (although it was primarily inteded for advance_sched() and
> not for enqueue())?
>
> Vinicius, do you know?
Sorry for the delay, I thought that I went through all the messages in
this thread, but missed this one.
I think what is missing is some context, this series from Faizal also
includes fixes for taprio "txtime-assisted mode", where we try to
support for 801.1Qbv schedules, including cycle-extension and schedules
with arbitrary number of entries.
The basic idea is that during enqueue, taprio will calculate the txtime
of a packet so it "follows" the configured schedule, and pass that
packet to ETF, which is running as child of taprio. It is a bit of hack,
but it works well enough.
And I agree with your opinion, that this part of the code is
complicated. I have one permanent item on my todo list to spend some
quality time looking at it, and trying to make it simpler.
But fixing it to make it work with cycle-time-extension comes first.
Then, it's on me to not break it later.
Sorry for the rambling. Does this answer your question?
Cheers,
--
Vinicius
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
2023-11-07 11:20 ` [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval Faizal Rahim
2023-11-07 22:45 ` kernel test robot
2023-11-09 11:11 ` Vladimir Oltean
@ 2023-11-09 12:01 ` Vladimir Oltean
2023-11-10 19:15 ` kernel test robot
3 siblings, 0 replies; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-09 12:01 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:20AM -0500, Faizal Rahim wrote:
> @@ -215,6 +216,31 @@ static void switch_schedules(struct taprio_sched *q,
> *admin = NULL;
> }
>
> +static bool cycle_corr_active(s64 cycle_time_correction)
> +{
> + if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
> + return false;
> + else
> + return true;
> +}
> @@ -259,14 +286,6 @@ static int duration_to_length(struct taprio_sched *q, u64 duration)
> return div_u64(duration * PSEC_PER_NSEC, atomic64_read(&q->picos_per_byte));
> }
>
> -static bool cycle_corr_active(s64 cycle_time_correction)
> -{
> - if (cycle_time_correction == INIT_CYCLE_TIME_CORRECTION)
> - return false;
> - else
> - return true;
> -}
> -
Don't move code around that you've introduced in earlier changes. Just
place it where it needs to be from the beginning.
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
2023-11-07 11:20 ` [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval Faizal Rahim
` (2 preceding siblings ...)
2023-11-09 12:01 ` Vladimir Oltean
@ 2023-11-10 19:15 ` kernel test robot
3 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2023-11-10 19:15 UTC (permalink / raw)
To: Faizal Rahim, Vladimir Oltean, Vinicius Costa Gomes,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: llvm, oe-kbuild-all, netdev, linux-kernel
Hi Faizal,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net/main]
url: https://github.com/intel-lab-lkp/linux/commits/Faizal-Rahim/net-sched-taprio-fix-too-early-schedules-switching/20231107-192843
base: net/main
patch link: https://lore.kernel.org/r/20231107112023.676016-5-faizal.abdul.rahim%40linux.intel.com
patch subject: [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval
config: arm64-allmodconfig (https://download.01.org/0day-ci/archive/20231111/202311110208.GT4trtEk-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231111/202311110208.GT4trtEk-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311110208.GT4trtEk-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> net/sched/sch_taprio.c:227:5: warning: no previous prototype for function 'get_interval' [-Wmissing-prototypes]
227 | u32 get_interval(const struct sched_entry *entry,
| ^
net/sched/sch_taprio.c:227:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
227 | u32 get_interval(const struct sched_entry *entry,
| ^
| static
>> net/sched/sch_taprio.c:236:5: warning: no previous prototype for function 'get_cycle_time' [-Wmissing-prototypes]
236 | s64 get_cycle_time(const struct sched_gate_list *oper)
| ^
net/sched/sch_taprio.c:236:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
236 | s64 get_cycle_time(const struct sched_gate_list *oper)
| ^
| static
2 warnings generated.
vim +/get_interval +227 net/sched/sch_taprio.c
226
> 227 u32 get_interval(const struct sched_entry *entry,
228 const struct sched_gate_list *oper)
229 {
230 if (entry->correction_active)
231 return entry->interval + oper->cycle_time_correction;
232 else
233 return entry->interval;
234 }
235
> 236 s64 get_cycle_time(const struct sched_gate_list *oper)
237 {
238 if (cycle_corr_active(oper->cycle_time_correction))
239 return oper->cycle_time + oper->cycle_time_correction;
240 else
241 return oper->cycle_time;
242 }
243
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry
2023-11-07 11:20 [PATCH v2 net 0/7] qbv cycle time extension/truncation Faizal Rahim
` (3 preceding siblings ...)
2023-11-07 11:20 ` [PATCH v2 net 4/7] net/sched: taprio: get corrected value of cycle_time and interval Faizal Rahim
@ 2023-11-07 11:20 ` Faizal Rahim
2023-11-09 11:50 ` Vladimir Oltean
2023-11-09 12:24 ` Vladimir Oltean
2023-11-07 11:20 ` [PATCH v2 net 6/7] net/sched: taprio: fix q->current_entry is NULL before its expiry Faizal Rahim
` (2 subsequent siblings)
7 siblings, 2 replies; 31+ messages in thread
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
If a new GCL is triggered and the new admin base time falls before the
expiry of advance_timer (current running entry from oper),
taprio_start_sched() resets the current advance_timer expiry to the
new admin base time. However, upon expiry, advance_sched() doesn't
immediately switch to the admin schedule. It continues running entries
from the old oper schedule, and only switches to the new admin schedule
much later. Ideally, if the advance_timer is shorten to align with the
new admin base time, when the timer expires, advance_sched() should
trigger switch_schedules() at the beginning.
To resolve this issue, set the cycle_time_correction to a non-initialized
value in taprio_start_sched(). advance_sched() will use it to initiate
switch_schedules() at the beginning.
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index f18a5fe12f0c..01b114edec30 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1379,14 +1379,19 @@ static void setup_first_end_time(struct taprio_sched *q,
}
static void taprio_start_sched(struct Qdisc *sch,
- ktime_t start, struct sched_gate_list *new)
+ ktime_t new_base_time,
+ struct sched_gate_list *new)
{
struct taprio_sched *q = qdisc_priv(sch);
- ktime_t expires;
+ struct sched_gate_list *oper = NULL;
+ ktime_t expires, start;
if (FULL_OFFLOAD_IS_ENABLED(q->flags))
return;
+ oper = rcu_dereference_protected(q->oper_sched,
+ lockdep_is_held(&q->current_entry_lock));
+
expires = hrtimer_get_expires(&q->advance_timer);
if (expires == 0)
expires = KTIME_MAX;
@@ -1395,7 +1400,17 @@ static void taprio_start_sched(struct Qdisc *sch,
* reprogram it to the earliest one, so we change the admin
* schedule to the operational one at the right time.
*/
- start = min_t(ktime_t, start, expires);
+ start = min_t(ktime_t, new_base_time, expires);
+
+ if (expires != KTIME_MAX &&
+ ktime_compare(start, new_base_time) == 0) {
+ /* Since timer was changed to align to the new admin schedule,
+ * setting the variable below to a non-initialized value will
+ * indicate to advance_sched() to call switch_schedules() after
+ * this timer expires.
+ */
+ oper->cycle_time_correction = 0;
+ }
hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry
2023-11-07 11:20 ` [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry Faizal Rahim
@ 2023-11-09 11:50 ` Vladimir Oltean
2023-11-15 11:56 ` Abdul Rahim, Faizal
2023-11-09 12:24 ` Vladimir Oltean
1 sibling, 1 reply; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-09 11:50 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:21AM -0500, Faizal Rahim wrote:
> If a new GCL is triggered and the new admin base time falls before the
> expiry of advance_timer (current running entry from oper),
> taprio_start_sched() resets the current advance_timer expiry to the
> new admin base time. However, upon expiry, advance_sched() doesn't
> immediately switch to the admin schedule. It continues running entries
> from the old oper schedule, and only switches to the new admin schedule
> much later. Ideally, if the advance_timer is shorten to align with the
> new admin base time, when the timer expires, advance_sched() should
> trigger switch_schedules() at the beginning.
>
> To resolve this issue, set the cycle_time_correction to a non-initialized
> value in taprio_start_sched(). advance_sched() will use it to initiate
> switch_schedules() at the beginning.
>
> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Did the commit you blame really introduce this issue, or was it your
rework to trigger switch_schedules() based on the correction?
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> ---
> net/sched/sch_taprio.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index f18a5fe12f0c..01b114edec30 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -1379,14 +1379,19 @@ static void setup_first_end_time(struct taprio_sched *q,
> }
>
> static void taprio_start_sched(struct Qdisc *sch,
> - ktime_t start, struct sched_gate_list *new)
> + ktime_t new_base_time,
> + struct sched_gate_list *new)
> {
> struct taprio_sched *q = qdisc_priv(sch);
> - ktime_t expires;
> + struct sched_gate_list *oper = NULL;
> + ktime_t expires, start;
>
> if (FULL_OFFLOAD_IS_ENABLED(q->flags))
> return;
>
> + oper = rcu_dereference_protected(q->oper_sched,
> + lockdep_is_held(&q->current_entry_lock));
> +
> expires = hrtimer_get_expires(&q->advance_timer);
> if (expires == 0)
> expires = KTIME_MAX;
> @@ -1395,7 +1400,17 @@ static void taprio_start_sched(struct Qdisc *sch,
> * reprogram it to the earliest one, so we change the admin
> * schedule to the operational one at the right time.
> */
> - start = min_t(ktime_t, start, expires);
> + start = min_t(ktime_t, new_base_time, expires);
> +
> + if (expires != KTIME_MAX &&
> + ktime_compare(start, new_base_time) == 0) {
> + /* Since timer was changed to align to the new admin schedule,
> + * setting the variable below to a non-initialized value will
I find the wording "setting the variable below to a non-initialized value"
confusing. 0 is non-initialized? You're talking about a value different
than INIT_CYCLE_TIME_CORRECTION. What about "setting a specific cycle
correction will indicate ..."?
> + * indicate to advance_sched() to call switch_schedules() after
> + * this timer expires.
> + */
> + oper->cycle_time_correction = 0;
Why 0 and not ktime_sub(new_base_time, oper->cycle_end_time)? Doesn't
the precise correction value make a difference?
> + }
>
> hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
> }
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry
2023-11-09 11:50 ` Vladimir Oltean
@ 2023-11-15 11:56 ` Abdul Rahim, Faizal
0 siblings, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-15 11:56 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On 9/11/2023 7:50 pm, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:21AM -0500, Faizal Rahim wrote:
>> If a new GCL is triggered and the new admin base time falls before the
>> expiry of advance_timer (current running entry from oper),
>> taprio_start_sched() resets the current advance_timer expiry to the
>> new admin base time. However, upon expiry, advance_sched() doesn't
>> immediately switch to the admin schedule. It continues running entries
>> from the old oper schedule, and only switches to the new admin schedule
>> much later. Ideally, if the advance_timer is shorten to align with the
>> new admin base time, when the timer expires, advance_sched() should
>> trigger switch_schedules() at the beginning.
>>
>> To resolve this issue, set the cycle_time_correction to a non-initialized
>> value in taprio_start_sched(). advance_sched() will use it to initiate
>> switch_schedules() at the beginning.
>>
>> Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
>
> Did the commit you blame really introduce this issue, or was it your
> rework to trigger switch_schedules() based on the correction?
>
Ohh actually this issue happens even without my whole patch set.
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> ---
>> net/sched/sch_taprio.c | 21 ++++++++++++++++++---
>> 1 file changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index f18a5fe12f0c..01b114edec30 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -1379,14 +1379,19 @@ static void setup_first_end_time(struct taprio_sched *q,
>> }
>>
>> static void taprio_start_sched(struct Qdisc *sch,
>> - ktime_t start, struct sched_gate_list *new)
>> + ktime_t new_base_time,
>> + struct sched_gate_list *new)
>> {
>> struct taprio_sched *q = qdisc_priv(sch);
>> - ktime_t expires;
>> + struct sched_gate_list *oper = NULL;
>> + ktime_t expires, start;
>>
>> if (FULL_OFFLOAD_IS_ENABLED(q->flags))
>> return;
>>
>> + oper = rcu_dereference_protected(q->oper_sched,
>> + lockdep_is_held(&q->current_entry_lock));
>> +
>> expires = hrtimer_get_expires(&q->advance_timer);
>> if (expires == 0)
>> expires = KTIME_MAX;
>> @@ -1395,7 +1400,17 @@ static void taprio_start_sched(struct Qdisc *sch,
>> * reprogram it to the earliest one, so we change the admin
>> * schedule to the operational one at the right time.
>> */
>> - start = min_t(ktime_t, start, expires);
>> + start = min_t(ktime_t, new_base_time, expires);
>> +
>> + if (expires != KTIME_MAX &&
>> + ktime_compare(start, new_base_time) == 0) {
>> + /* Since timer was changed to align to the new admin schedule,
>> + * setting the variable below to a non-initialized value will
>
> I find the wording "setting the variable below to a non-initialized value"
> confusing. 0 is non-initialized? You're talking about a value different
> than INIT_CYCLE_TIME_CORRECTION. What about "setting a specific cycle
> correction will indicate ..."?
>
Sure
>> + * indicate to advance_sched() to call switch_schedules() after
>> + * this timer expires.
>> + */
>> + oper->cycle_time_correction = 0;
>
> Why 0 and not ktime_sub(new_base_time, oper->cycle_end_time)? Doesn't
> the precise correction value make a difference?
>
Negative correction and its calculation is a separate problem handled in
different patch.
My intention is to highlight a specific issue and address it with a single
patch. The core problem stemmed from the new admin schedule not making an
immediate transition in advance_sched().
I'll rework this patch to focus specifically on resolving this problem
while gradually aligning with the overall series. Importantly, I won't be
removing anything from this patch in the process.
Is that okay ?
>> + }
>>
>> hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
>> }
>> --
>> 2.25.1
>>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry
2023-11-07 11:20 ` [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry Faizal Rahim
2023-11-09 11:50 ` Vladimir Oltean
@ 2023-11-09 12:24 ` Vladimir Oltean
1 sibling, 0 replies; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-09 12:24 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:21AM -0500, Faizal Rahim wrote:
> static void taprio_start_sched(struct Qdisc *sch,
> - ktime_t start, struct sched_gate_list *new)
> + ktime_t new_base_time,
> + struct sched_gate_list *new)
> {
> struct taprio_sched *q = qdisc_priv(sch);
> - ktime_t expires;
> + struct sched_gate_list *oper = NULL;
No point in initializing with NULL if this will be overwritten later.
> + ktime_t expires, start;
>
> if (FULL_OFFLOAD_IS_ENABLED(q->flags))
> return;
>
> + oper = rcu_dereference_protected(q->oper_sched,
> + lockdep_is_held(&q->current_entry_lock));
> +
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 net 6/7] net/sched: taprio: fix q->current_entry is NULL before its expiry
2023-11-07 11:20 [PATCH v2 net 0/7] qbv cycle time extension/truncation Faizal Rahim
` (4 preceding siblings ...)
2023-11-07 11:20 ` [PATCH v2 net 5/7] net/sched: taprio: fix delayed switching to new schedule after timer expiry Faizal Rahim
@ 2023-11-07 11:20 ` Faizal Rahim
2023-11-09 11:55 ` Vladimir Oltean
2023-11-07 11:20 ` [PATCH v2 net 7/7] net/sched: taprio: enable cycle time adjustment for current entry Faizal Rahim
2023-11-08 15:51 ` [PATCH v2 net 0/7] qbv cycle time extension/truncation Vladimir Oltean
7 siblings, 1 reply; 31+ messages in thread
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
Fix the issue of prematurely setting q->current_entry to NULL in the
setup_first_end_time() function when a new admin schedule arrives
while the oper schedule is still running but hasn't transitioned yet.
This premature setting causes problems because any reference to
q->current_entry, such as in taprio_dequeue(), will result in NULL
during this period, which is incorrect. q->current_entry should remain
valid until the currently running entry expires.
To address this issue, only set q->current_entry to NULL when there is
no oper schedule currently running.
Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 01b114edec30..c60e9e7ac193 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1375,7 +1375,8 @@ static void setup_first_end_time(struct taprio_sched *q,
first->gate_close_time[tc] = ktime_add_ns(base, first->gate_duration[tc]);
}
- rcu_assign_pointer(q->current_entry, NULL);
+ if (!hrtimer_active(&q->advance_timer))
+ rcu_assign_pointer(q->current_entry, NULL);
}
static void taprio_start_sched(struct Qdisc *sch,
--
2.25.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 6/7] net/sched: taprio: fix q->current_entry is NULL before its expiry
2023-11-07 11:20 ` [PATCH v2 net 6/7] net/sched: taprio: fix q->current_entry is NULL before its expiry Faizal Rahim
@ 2023-11-09 11:55 ` Vladimir Oltean
2023-11-15 11:56 ` Abdul Rahim, Faizal
0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-09 11:55 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:22AM -0500, Faizal Rahim wrote:
> Fix the issue of prematurely setting q->current_entry to NULL in the
> setup_first_end_time() function when a new admin schedule arrives
> while the oper schedule is still running but hasn't transitioned yet.
> This premature setting causes problems because any reference to
> q->current_entry, such as in taprio_dequeue(), will result in NULL
> during this period, which is incorrect. q->current_entry should remain
> valid until the currently running entry expires.
>
> To address this issue, only set q->current_entry to NULL when there is
> no oper schedule currently running.
>
> Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
The "Fixes" tag represents the commit where the code started becoming a
problem, not when the code that you're changing was first introduced.
I find it hard to believe that the problem was in commit 5a781ccbd19e
("tc: Add support for configuring the taprio scheduler"), because we
didn't support admin -> oper dynamic schedules back then.
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> ---
> net/sched/sch_taprio.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 01b114edec30..c60e9e7ac193 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -1375,7 +1375,8 @@ static void setup_first_end_time(struct taprio_sched *q,
> first->gate_close_time[tc] = ktime_add_ns(base, first->gate_duration[tc]);
> }
>
> - rcu_assign_pointer(q->current_entry, NULL);
> + if (!hrtimer_active(&q->advance_timer))
> + rcu_assign_pointer(q->current_entry, NULL);
> }
>
> static void taprio_start_sched(struct Qdisc *sch,
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 6/7] net/sched: taprio: fix q->current_entry is NULL before its expiry
2023-11-09 11:55 ` Vladimir Oltean
@ 2023-11-15 11:56 ` Abdul Rahim, Faizal
0 siblings, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-15 11:56 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On 9/11/2023 7:55 pm, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:22AM -0500, Faizal Rahim wrote:
>> Fix the issue of prematurely setting q->current_entry to NULL in the
>> setup_first_end_time() function when a new admin schedule arrives
>> while the oper schedule is still running but hasn't transitioned yet.
>> This premature setting causes problems because any reference to
>> q->current_entry, such as in taprio_dequeue(), will result in NULL
>> during this period, which is incorrect. q->current_entry should remain
>> valid until the currently running entry expires.
>>
>> To address this issue, only set q->current_entry to NULL when there is
>> no oper schedule currently running.
>>
>> Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
>
> The "Fixes" tag represents the commit where the code started becoming a
> problem, not when the code that you're changing was first introduced.
>
> I find it hard to believe that the problem was in commit 5a781ccbd19e
> ("tc: Add support for configuring the taprio scheduler"), because we
> didn't support admin -> oper dynamic schedules back then.
>
My bad, will update accordingly.
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> ---
>> net/sched/sch_taprio.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index 01b114edec30..c60e9e7ac193 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -1375,7 +1375,8 @@ static void setup_first_end_time(struct taprio_sched *q,
>> first->gate_close_time[tc] = ktime_add_ns(base, first->gate_duration[tc]);
>> }
>>
>> - rcu_assign_pointer(q->current_entry, NULL);
>> + if (!hrtimer_active(&q->advance_timer))
>> + rcu_assign_pointer(q->current_entry, NULL);
>> }
>>
>> static void taprio_start_sched(struct Qdisc *sch,
>> --
>> 2.25.1
>>
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 net 7/7] net/sched: taprio: enable cycle time adjustment for current entry
2023-11-07 11:20 [PATCH v2 net 0/7] qbv cycle time extension/truncation Faizal Rahim
` (5 preceding siblings ...)
2023-11-07 11:20 ` [PATCH v2 net 6/7] net/sched: taprio: fix q->current_entry is NULL before its expiry Faizal Rahim
@ 2023-11-07 11:20 ` Faizal Rahim
2023-11-09 13:18 ` Vladimir Oltean
2023-11-08 15:51 ` [PATCH v2 net 0/7] qbv cycle time extension/truncation Vladimir Oltean
7 siblings, 1 reply; 31+ messages in thread
From: Faizal Rahim @ 2023-11-07 11:20 UTC (permalink / raw)
To: Vladimir Oltean, Vinicius Costa Gomes, Jamal Hadi Salim,
Cong Wang, Jiri Pirko, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel
Handles cycle time adjustments for the current active entry
when new admin base time occurs quickly, either within the
current entry or the next one.
Changes covers:
1. Negative cycle correction or truncation
Occurs when the new admin base time falls before the expiry of the
current running entry.
2. Positive cycle correction or extension
Occurs when the new admin base time falls within the next entry,
and the current entry is the cycle's last entry. In this case, the
changes in taprio_start_sched() extends the schedule, preventing
old oper schedule from resuming and getting truncated in the next
advance_sched() call.
3. A new API, update_gate_close_time(), has been created to update
the gate_close_time of the current entry in the event of cycle
correction.
Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
---
net/sched/sch_taprio.c | 72 +++++++++++++++++++++++++++++++-----------
1 file changed, 53 insertions(+), 19 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index c60e9e7ac193..56743754d42e 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1379,41 +1379,75 @@ static void setup_first_end_time(struct taprio_sched *q,
rcu_assign_pointer(q->current_entry, NULL);
}
+static void update_gate_close_time(struct sched_entry *current_entry,
+ ktime_t new_end_time,
+ int num_tc)
+{
+ int tc;
+
+ for (tc = 0; tc < num_tc; tc++) {
+ if (current_entry->gate_mask & BIT(tc))
+ current_entry->gate_close_time[tc] = new_end_time;
+ }
+}
+
static void taprio_start_sched(struct Qdisc *sch,
ktime_t new_base_time,
- struct sched_gate_list *new)
+ struct sched_gate_list *admin)
{
struct taprio_sched *q = qdisc_priv(sch);
+ ktime_t expires = hrtimer_get_expires(&q->advance_timer);
+ struct net_device *dev = qdisc_dev(q->root);
+ struct sched_entry *curr_entry = NULL;
struct sched_gate_list *oper = NULL;
- ktime_t expires, start;
if (FULL_OFFLOAD_IS_ENABLED(q->flags))
return;
oper = rcu_dereference_protected(q->oper_sched,
lockdep_is_held(&q->current_entry_lock));
+ curr_entry = rcu_dereference_protected(q->current_entry,
+ lockdep_is_held(&q->current_entry_lock));
- expires = hrtimer_get_expires(&q->advance_timer);
- if (expires == 0)
- expires = KTIME_MAX;
+ if (hrtimer_active(&q->advance_timer)) {
+ oper->cycle_time_correction =
+ get_cycle_time_correction(oper, new_base_time,
+ curr_entry->end_time,
+ curr_entry);
- /* If the new schedule starts before the next expiration, we
- * reprogram it to the earliest one, so we change the admin
- * schedule to the operational one at the right time.
- */
- start = min_t(ktime_t, new_base_time, expires);
-
- if (expires != KTIME_MAX &&
- ktime_compare(start, new_base_time) == 0) {
- /* Since timer was changed to align to the new admin schedule,
- * setting the variable below to a non-initialized value will
- * indicate to advance_sched() to call switch_schedules() after
- * this timer expires.
+ if (cycle_corr_active(oper->cycle_time_correction)) {
+ /* This is the last entry we are running from oper,
+ * subsequent entry will take from the new admin.
+ */
+ ktime_t now = taprio_get_time(q);
+ u64 gate_duration_left = ktime_sub(new_base_time, now);
+ struct qdisc_size_table *stab =
+ rtnl_dereference(q->root->stab);
+ int num_tc = netdev_get_num_tc(dev);
+
+ oper->cycle_end_time = new_base_time;
+ curr_entry->end_time = new_base_time;
+ curr_entry->correction_active = true;
+
+ update_open_gate_duration(curr_entry, oper, num_tc,
+ gate_duration_left);
+ update_gate_close_time(curr_entry, new_base_time, num_tc);
+ taprio_update_queue_max_sdu(q, oper, stab);
+ taprio_set_budgets(q, oper, curr_entry);
+ }
+ }
+
+ if (!hrtimer_active(&q->advance_timer) ||
+ cycle_corr_active(oper->cycle_time_correction)) {
+ /* Use new admin base time if :
+ * 1. there's no active oper
+ * 2. there's active oper and we will change to the new admin
+ * schedule after the current entry from oper ends
*/
- oper->cycle_time_correction = 0;
+ expires = new_base_time;
}
- hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
+ hrtimer_start(&q->advance_timer, expires, HRTIMER_MODE_ABS);
}
static void taprio_set_picos_per_byte(struct net_device *dev,
--
2.25.1
^ permalink raw reply related [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 7/7] net/sched: taprio: enable cycle time adjustment for current entry
2023-11-07 11:20 ` [PATCH v2 net 7/7] net/sched: taprio: enable cycle time adjustment for current entry Faizal Rahim
@ 2023-11-09 13:18 ` Vladimir Oltean
2023-11-15 11:57 ` Abdul Rahim, Faizal
0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-09 13:18 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On Tue, Nov 07, 2023 at 06:20:23AM -0500, Faizal Rahim wrote:
> Handles cycle time adjustments for the current active entry
Use the imperative mood for commit messages, i.e. "handle".
> when new admin base time occurs quickly, either within the
> current entry or the next one.
>
> Changes covers:
> 1. Negative cycle correction or truncation
> Occurs when the new admin base time falls before the expiry of the
> current running entry.
>
> 2. Positive cycle correction or extension
> Occurs when the new admin base time falls within the next entry,
> and the current entry is the cycle's last entry. In this case, the
> changes in taprio_start_sched() extends the schedule, preventing
> old oper schedule from resuming and getting truncated in the next
> advance_sched() call.
>
> 3. A new API, update_gate_close_time(), has been created to update
> the gate_close_time of the current entry in the event of cycle
> correction.
>
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> ---
> net/sched/sch_taprio.c | 72 +++++++++++++++++++++++++++++++-----------
> 1 file changed, 53 insertions(+), 19 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index c60e9e7ac193..56743754d42e 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -1379,41 +1379,75 @@ static void setup_first_end_time(struct taprio_sched *q,
> rcu_assign_pointer(q->current_entry, NULL);
> }
>
> +static void update_gate_close_time(struct sched_entry *current_entry,
> + ktime_t new_end_time,
> + int num_tc)
> +{
> + int tc;
> +
> + for (tc = 0; tc < num_tc; tc++) {
> + if (current_entry->gate_mask & BIT(tc))
> + current_entry->gate_close_time[tc] = new_end_time;
> + }
> +}
> +
> static void taprio_start_sched(struct Qdisc *sch,
> ktime_t new_base_time,
> - struct sched_gate_list *new)
> + struct sched_gate_list *admin)
> {
> struct taprio_sched *q = qdisc_priv(sch);
> + ktime_t expires = hrtimer_get_expires(&q->advance_timer);
> + struct net_device *dev = qdisc_dev(q->root);
> + struct sched_entry *curr_entry = NULL;
> struct sched_gate_list *oper = NULL;
> - ktime_t expires, start;
>
> if (FULL_OFFLOAD_IS_ENABLED(q->flags))
> return;
>
> oper = rcu_dereference_protected(q->oper_sched,
> lockdep_is_held(&q->current_entry_lock));
> + curr_entry = rcu_dereference_protected(q->current_entry,
> + lockdep_is_held(&q->current_entry_lock));
>
> - expires = hrtimer_get_expires(&q->advance_timer);
> - if (expires == 0)
> - expires = KTIME_MAX;
> + if (hrtimer_active(&q->advance_timer)) {
> + oper->cycle_time_correction =
> + get_cycle_time_correction(oper, new_base_time,
> + curr_entry->end_time,
> + curr_entry);
>
> - /* If the new schedule starts before the next expiration, we
> - * reprogram it to the earliest one, so we change the admin
> - * schedule to the operational one at the right time.
> - */
> - start = min_t(ktime_t, new_base_time, expires);
> -
> - if (expires != KTIME_MAX &&
> - ktime_compare(start, new_base_time) == 0) {
> - /* Since timer was changed to align to the new admin schedule,
> - * setting the variable below to a non-initialized value will
> - * indicate to advance_sched() to call switch_schedules() after
> - * this timer expires.
I would appreciate not changing things that you've established in
earlier changes. Try to keep stuff introduced earlier in a form that is
as close as possible to the final form.
> + if (cycle_corr_active(oper->cycle_time_correction)) {
> + /* This is the last entry we are running from oper,
> + * subsequent entry will take from the new admin.
> + */
> + ktime_t now = taprio_get_time(q);
> + u64 gate_duration_left = ktime_sub(new_base_time, now);
What is special about "now" as a moment in time? Gate durations are
calculated relative to the moment when the sched_entry begins.
> + struct qdisc_size_table *stab =
> + rtnl_dereference(q->root->stab);
"q->root" is "sch".
> + int num_tc = netdev_get_num_tc(dev);
It would be nice if you could pay some attention to the preferred
variable declaration style, i.e. longer lines come first. If you cannot
easily respect that, you could split the variable declarations from
their initialization.
> +
> + oper->cycle_end_time = new_base_time;
> + curr_entry->end_time = new_base_time;
> + curr_entry->correction_active = true;
> +
> + update_open_gate_duration(curr_entry, oper, num_tc,
> + gate_duration_left);
Recalculating open gate durations with a cycle time correction seems
very complicated, at least from this code path. What depends on this?
The data path only looks at the gate_close_time. Can we get away with
updating only the gate_close_time?
> + update_gate_close_time(curr_entry, new_base_time, num_tc);
> + taprio_update_queue_max_sdu(q, oper, stab);
> + taprio_set_budgets(q, oper, curr_entry);
There's a lot of duplication between the correction management from
advance_sched() and the one from taprio_start_sched(). I wonder if some
of it can go into a common function.
> + }
> + }
> +
> + if (!hrtimer_active(&q->advance_timer) ||
> + cycle_corr_active(oper->cycle_time_correction)) {
> + /* Use new admin base time if :
> + * 1. there's no active oper
> + * 2. there's active oper and we will change to the new admin
> + * schedule after the current entry from oper ends
> */
> - oper->cycle_time_correction = 0;
> + expires = new_base_time;
> }
>
> - hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
> + hrtimer_start(&q->advance_timer, expires, HRTIMER_MODE_ABS);
> }
>
> static void taprio_set_picos_per_byte(struct net_device *dev,
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 7/7] net/sched: taprio: enable cycle time adjustment for current entry
2023-11-09 13:18 ` Vladimir Oltean
@ 2023-11-15 11:57 ` Abdul Rahim, Faizal
0 siblings, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-15 11:57 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On 9/11/2023 9:18 pm, Vladimir Oltean wrote:
> On Tue, Nov 07, 2023 at 06:20:23AM -0500, Faizal Rahim wrote:
>> Handles cycle time adjustments for the current active entry
>
> Use the imperative mood for commit messages, i.e. "handle".
>
>> when new admin base time occurs quickly, either within the
>> current entry or the next one.
>>
>> Changes covers:
>> 1. Negative cycle correction or truncation
>> Occurs when the new admin base time falls before the expiry of the
>> current running entry.
>>
>> 2. Positive cycle correction or extension
>> Occurs when the new admin base time falls within the next entry,
>> and the current entry is the cycle's last entry. In this case, the
>> changes in taprio_start_sched() extends the schedule, preventing
>> old oper schedule from resuming and getting truncated in the next
>> advance_sched() call.
>>
>> 3. A new API, update_gate_close_time(), has been created to update
>> the gate_close_time of the current entry in the event of cycle
>> correction.
>>
>> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
>> ---
>> net/sched/sch_taprio.c | 72 +++++++++++++++++++++++++++++++-----------
>> 1 file changed, 53 insertions(+), 19 deletions(-)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index c60e9e7ac193..56743754d42e 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -1379,41 +1379,75 @@ static void setup_first_end_time(struct taprio_sched *q,
>> rcu_assign_pointer(q->current_entry, NULL);
>> }
>>
>> +static void update_gate_close_time(struct sched_entry *current_entry,
>> + ktime_t new_end_time,
>> + int num_tc)
>> +{
>> + int tc;
>> +
>> + for (tc = 0; tc < num_tc; tc++) {
>> + if (current_entry->gate_mask & BIT(tc))
>> + current_entry->gate_close_time[tc] = new_end_time;
>> + }
>> +}
>> +
>> static void taprio_start_sched(struct Qdisc *sch,
>> ktime_t new_base_time,
>> - struct sched_gate_list *new)
>> + struct sched_gate_list *admin)
>> {
>> struct taprio_sched *q = qdisc_priv(sch);
>> + ktime_t expires = hrtimer_get_expires(&q->advance_timer);
>> + struct net_device *dev = qdisc_dev(q->root);
>> + struct sched_entry *curr_entry = NULL;
>> struct sched_gate_list *oper = NULL;
>> - ktime_t expires, start;
>>
>> if (FULL_OFFLOAD_IS_ENABLED(q->flags))
>> return;
>>
>> oper = rcu_dereference_protected(q->oper_sched,
>> lockdep_is_held(&q->current_entry_lock));
>> + curr_entry = rcu_dereference_protected(q->current_entry,
>> + lockdep_is_held(&q->current_entry_lock));
>>
>> - expires = hrtimer_get_expires(&q->advance_timer);
>> - if (expires == 0)
>> - expires = KTIME_MAX;
>> + if (hrtimer_active(&q->advance_timer)) {
>> + oper->cycle_time_correction =
>> + get_cycle_time_correction(oper, new_base_time,
>> + curr_entry->end_time,
>> + curr_entry);
>>
>> - /* If the new schedule starts before the next expiration, we
>> - * reprogram it to the earliest one, so we change the admin
>> - * schedule to the operational one at the right time.
>> - */
>> - start = min_t(ktime_t, new_base_time, expires);
>> -
>> - if (expires != KTIME_MAX &&
>> - ktime_compare(start, new_base_time) == 0) {
>> - /* Since timer was changed to align to the new admin schedule,
>> - * setting the variable below to a non-initialized value will
>> - * indicate to advance_sched() to call switch_schedules() after
>> - * this timer expires.
>
> I would appreciate not changing things that you've established in
> earlier changes. Try to keep stuff introduced earlier in a form that is
> as close as possible to the final form.
>
Sorry for that, it occurred because I aimed to split a patch that
specifically addressed a problem I identified, versus my entire patch set.
I'll keep that feedback in mind going forward.
>> + if (cycle_corr_active(oper->cycle_time_correction)) {
>> + /* This is the last entry we are running from oper,
>> + * subsequent entry will take from the new admin.
>> + */
>> + ktime_t now = taprio_get_time(q);
>> + u64 gate_duration_left = ktime_sub(new_base_time, now);
>
> What is special about "now" as a moment in time? Gate durations are
> calculated relative to the moment when the sched_entry begins.
>
My reasoning behind using "now" is that, by the time taprio_start_sched()
is called, "maybe" the entry has already run—up to 80% of its total
duration, for example. So, if an extension is needed for this last entry,
wouldn't it be more accurate to calculate the remaining 20% of the entry
duration plus the duration to the new base time? That's the idea behind the
calculation above.
In contrast, in the case of extension in `advance_sched()`, gate durations
during correction are calculated using the full interval duration. This is
because the entry hasn't run yet and will run for the entire interval
duration. Hence, the logic in the code below makes sense for advance_sched():
u64 new_gate_duration = next->interval + oper->cycle_time_correction;
Let me know what you think, my interpretation of what value gate_duration
should be, in this scenario, could be wrong.
>> + struct qdisc_size_table *stab =
>> + rtnl_dereference(q->root->stab);
>
> "q->root" is "sch".
>
>> + int num_tc = netdev_get_num_tc(dev);
>
> It would be nice if you could pay some attention to the preferred
> variable declaration style, i.e. longer lines come first. If you cannot
> easily respect that, you could split the variable declarations from
> their initialization.
>
Hmm, I usually try to follow that that reverse Christmas tree style, but in
this scenario, ensuring the proper initialization value for the local
variable seemed more important to me than adhering to that style.
I wasn't aware it was a set-in-stone kind of rule.
I'll make the change accordingly then.
>> +
>> + oper->cycle_end_time = new_base_time;
>> + curr_entry->end_time = new_base_time;
>> + curr_entry->correction_active = true;
>> +
>> + update_open_gate_duration(curr_entry, oper, num_tc,
>> + gate_duration_left);
>
> Recalculating open gate durations with a cycle time correction seems
> very complicated, at least from this code path. What depends on this?
> The data path only looks at the gate_close_time. Can we get away with
> updating only the gate_close_time?
>
Actually, it follows a similar logic to the earlier patches:
"net/sched: taprio: fix cycle time adjustment for next entry" and
"net/sched: taprio: update impacted fields during cycle time adjustment."
Those patches also involved recalculating gate duration with cycle time
correction.
Skipping the update for max_open_gate_duration means max_sdu value will be
incorrect. Neglecting gate_duration means entry->budget[tc] won't get the
correct update. Both of these affect taprio_dequeue_from_txq() and
taprio_enqueue().
By the way, weren't all these consequences part of your original concern in
the comment back in v1?
The only difference in this patch is that it's handling the "current entry"
instead of the "next entry" - like in advance_sched().
>> + update_gate_close_time(curr_entry, new_base_time, num_tc);
>> + taprio_update_queue_max_sdu(q, oper, stab);
>> + taprio_set_budgets(q, oper, curr_entry);
>
> There's a lot of duplication between the correction management from
> advance_sched() and the one from taprio_start_sched(). I wonder if some
> of it can go into a common function.
>
Considered that, but it's not as doable as it seems.
I can't toss all of these into a single function and call it for both
because there are some differences to navigate.
The common functions being called are taprio_update_queue_max_sdu(),
update_open_gate_duration(), and taprio_set_budgets().
However, advance_sched() doesn't call update_gate_close_time(), and there
are some differences in the order of where it's called in advance_sched().
Nevertheless, I'm all ears for suggestion on the common function.
>> + }
>> + }
>> +
>> + if (!hrtimer_active(&q->advance_timer) ||
>> + cycle_corr_active(oper->cycle_time_correction)) {
>> + /* Use new admin base time if :
>> + * 1. there's no active oper
>> + * 2. there's active oper and we will change to the new admin
>> + * schedule after the current entry from oper ends
>> */
>> - oper->cycle_time_correction = 0;
>> + expires = new_base_time;
>> }
>>
>> - hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
>> + hrtimer_start(&q->advance_timer, expires, HRTIMER_MODE_ABS);
>> }
>>
>> static void taprio_set_picos_per_byte(struct net_device *dev,
>> --
>> 2.25.1
>>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 net 0/7] qbv cycle time extension/truncation
2023-11-07 11:20 [PATCH v2 net 0/7] qbv cycle time extension/truncation Faizal Rahim
` (6 preceding siblings ...)
2023-11-07 11:20 ` [PATCH v2 net 7/7] net/sched: taprio: enable cycle time adjustment for current entry Faizal Rahim
@ 2023-11-08 15:51 ` Vladimir Oltean
2023-11-10 11:06 ` Abdul Rahim, Faizal
7 siblings, 1 reply; 31+ messages in thread
From: Vladimir Oltean @ 2023-11-08 15:51 UTC (permalink / raw)
To: Faizal Rahim
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
Hi Faizal,
On Tue, Nov 07, 2023 at 06:20:16AM -0500, Faizal Rahim wrote:
> According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension,
> the Cycle Time Extension variable allows this extension of the last old
> cycle to be done in a defined way. If the last complete old cycle would
> normally end less than OperCycleTimeExtension nanoseconds before the new
> base time, then the last complete cycle before AdminBaseTime is reached
> is extended so that it ends at AdminBaseTime.
>
> Changes in v2:
>
> - Added 's64 cycle_time_correction' in 'sched_gate_list struct'.
> - Removed sched_changed created in v1 since the new cycle_time_correction
> field can also serve to indicate the need for a schedule change.
> - Added 'bool correction_active' in 'struct sched_entry' to represent
> the correction state from the entry's perspective and return corrected
> interval value when active.
> - Fix cycle time correction logics for the next entry in advance_sched()
> - Fix and implement proper cycle time correction logics for current
> entry in taprio_start_sched()
>
> v1 at:
> https://lore.kernel.org/lkml/20230530082541.495-1-muhammad.husaini.zulkifli@intel.com/
I like what came of this patch series. Thanks for following up and
taking over. I have some comments on individual patches.
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: [PATCH v2 net 0/7] qbv cycle time extension/truncation
2023-11-08 15:51 ` [PATCH v2 net 0/7] qbv cycle time extension/truncation Vladimir Oltean
@ 2023-11-10 11:06 ` Abdul Rahim, Faizal
0 siblings, 0 replies; 31+ messages in thread
From: Abdul Rahim, Faizal @ 2023-11-10 11:06 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev, linux-kernel
On 8/11/2023 11:51 pm, Vladimir Oltean wrote:
> Hi Faizal,
>
> On Tue, Nov 07, 2023 at 06:20:16AM -0500, Faizal Rahim wrote:
>> According to IEEE Std. 802.1Q-2018 section Q.5 CycleTimeExtension,
>> the Cycle Time Extension variable allows this extension of the last old
>> cycle to be done in a defined way. If the last complete old cycle would
>> normally end less than OperCycleTimeExtension nanoseconds before the new
>> base time, then the last complete cycle before AdminBaseTime is reached
>> is extended so that it ends at AdminBaseTime.
>>
>> Changes in v2:
>>
>> - Added 's64 cycle_time_correction' in 'sched_gate_list struct'.
>> - Removed sched_changed created in v1 since the new cycle_time_correction
>> field can also serve to indicate the need for a schedule change.
>> - Added 'bool correction_active' in 'struct sched_entry' to represent
>> the correction state from the entry's perspective and return corrected
>> interval value when active.
>> - Fix cycle time correction logics for the next entry in advance_sched()
>> - Fix and implement proper cycle time correction logics for current
>> entry in taprio_start_sched()
>>
>> v1 at:
>> https://lore.kernel.org/lkml/20230530082541.495-1-muhammad.husaini.zulkifli@intel.com/
>
> I like what came of this patch series. Thanks for following up and
> taking over. I have some comments on individual patches.
>
Hi Vladimir,
Thanks a bunch for your review and your patience with some of my basic
mistakes.
Appreciate the time and effort you put into it.
I'll take a bit to double-check the code and retest some stuff.
Will loop back with you soon.
Cheers.
^ permalink raw reply [flat|nested] 31+ messages in thread