[PATCH RFC 0/3] Patches on top of PE series

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH RFC 0/3] Patches on top of PE series
@ 2022-11-23  1:21 Joel Fernandes (Google)
  2022-11-23  1:21 ` [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate Joel Fernandes (Google)
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Joel Fernandes (Google) @ 2022-11-23  1:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Paul E. McKenney, Peter Zijlstra,
	Steven Rostedt, Valentin Schneider, Vincent Guittot, kernel-team,
	John Stultz, Joel Fernandes, Qais Yousef, Will Deacon,
	Waiman Long, Boqun Feng, Connor O'Brien

Just some work I did recently on PE for the balancing issue and lock torture.
Could you please consider testing and/or including in next PE series posting?

Last PE series revived by Connor:
https://lore.kernel.org/lkml/20221003214501.2050087-1-connoro@google.com/

Thank you.

Joel Fernandes (Google) (3):
  sched/pe: Exclude balance callback queuing during proxy()'s migrate
  locktorture: Allow non-rtmutex lock types to be boosted
  locktorture: Make the rt_boost factor a tunable

 kernel/locking/locktorture.c | 93 +++++++++++++++++++-----------------
 kernel/sched/core.c          | 72 +++++++++++++++++++++++++++-
 kernel/sched/sched.h         |  7 ++-
 3 files changed, 126 insertions(+), 46 deletions(-)

-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate
  2022-11-23  1:21 [PATCH RFC 0/3] Patches on top of PE series Joel Fernandes (Google)
@ 2022-11-23  1:21 ` Joel Fernandes (Google)
  2022-12-09 15:07   ` Dietmar Eggemann
  2022-11-23  1:21 ` [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted Joel Fernandes (Google)
  2022-11-23  1:21 ` [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable Joel Fernandes (Google)
  2 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes (Google) @ 2022-11-23  1:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Dietmar Eggemann, Ben Segall,
	Daniel Bristot de Oliveira, Davidlohr Bueso, Ingo Molnar,
	Josh Triplett, Juri Lelli, Mel Gorman, Paul E. McKenney,
	Peter Zijlstra, Steven Rostedt, Valentin Schneider,
	Vincent Guittot, kernel-team, John Stultz, Joel Fernandes,
	Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

In commit 565790d28b1e ("sched: Fix balance_callback()"), it is clear
that rq lock needs to be held from when balance callbacks are queued to
when __balance_callbacks() in schedule() is called.

This has to be done without dropping the runqueue lock in between. If
dropped between queuing and executing callbacks, it is possible that
another CPU, say in __sched_setscheduler() can queue balancing callbacks
and cause issues as show in that commit.

This is precisely what happens in proxy(). During a proxy(), the current
CPU's rq lock is temporary dropped when moving the tasks in the migrate
list to the owner CPU.

This commit therefore make proxy() exclude balance callback queuing on
other CPUs, in the section where proxy() temporarily drops the rq lock
of current CPU.

CPUs that acquire a remote CPU's rq lock but later queue a balance
callback, are made to call the new helpers in this commit to check
whether balance_lock is held. If it is held, then the rq lock is
released and a re-attempt is made to acquire it in the hopes that
the ban on balancing callback queuing has elapsed.

Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/sched/core.c  | 72 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h |  7 ++++-
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 88a5fa34dc06..aba90b3dc3ef 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -633,6 +633,29 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 	}
 }
 
+/*
+ * Helper to call __task_rq_lock safely, in scenarios where we might be about to
+ * queue a balance callback on a remote CPU. That CPU might be in proxy(), and
+ * could have released its rq lock while holding balance_lock. So release rq
+ * lock in such a situation to avoid deadlock, and retry.
+ */
+struct rq *__task_rq_lock_balance(struct task_struct *p, struct rq_flags *rf)
+{
+	struct rq *rq;
+	bool locked = false;
+
+	do {
+		if (locked) {
+			__task_rq_unlock(rq, rf);
+			cpu_relax();
+		}
+		rq = __task_rq_lock(p, rf);
+		locked = true;
+	} while (raw_spin_is_locked(&rq->balance_lock));
+
+	return rq;
+}
+
 /*
  * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
  */
@@ -675,6 +698,29 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 	}
 }
 
+/*
+ * Helper to call task_rq_lock safely, in scenarios where we might be about to
+ * queue a balance callback on a remote CPU. That CPU might be in proxy(), and
+ * could have released its rq lock while holding balance_lock. So release rq
+ * lock in such a situation to avoid deadlock, and retry.
+ */
+struct rq *task_rq_lock_balance(struct task_struct *p, struct rq_flags *rf)
+{
+	struct rq *rq;
+	bool locked = false;
+
+	do {
+		if (locked) {
+			task_rq_unlock(rq, p, rf);
+			cpu_relax();
+		}
+		rq = task_rq_lock(p, rf);
+		locked = true;
+	} while (raw_spin_is_locked(&rq->balance_lock));
+
+	return rq;
+}
+
 /*
  * RQ-clock updating methods:
  */
@@ -6739,6 +6785,12 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
 		p->wake_cpu = wake_cpu;
 	}
 
+	/*
+	 * Prevent other CPUs from queuing balance callbacks while we migrate
+	 * tasks in the migrate_list with the rq lock released.
+	 */
+	raw_spin_lock(&rq->balance_lock);
+
 	rq_unpin_lock(rq, rf);
 	raw_spin_rq_unlock(rq);
 	raw_spin_rq_lock(that_rq);
@@ -6758,7 +6810,21 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
 	}
 
 	raw_spin_rq_unlock(that_rq);
+
+	/*
+	 * This may make lockdep unhappy as we acquire rq->lock with
+	 * balance_lock held. But that should be a false positive, as the
+	 * following pattern happens only on the current CPU with interrupts
+	 * disabled:
+	 * rq_lock()
+	 * balance_lock();
+	 * rq_unlock();
+	 * rq_lock();
+	 */
 	raw_spin_rq_lock(rq);
+
+	raw_spin_unlock(&rq->balance_lock);
+
 	rq_repin_lock(rq, rf);
 
 	return NULL; /* Retry task selection on _this_ CPU. */
@@ -7489,7 +7555,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 	if (p->pi_top_task == pi_task && prio == p->prio && !dl_prio(prio))
 		return;
 
-	rq = __task_rq_lock(p, &rf);
+	rq = __task_rq_lock_balance(p, &rf);
 	update_rq_clock(rq);
 	/*
 	 * Set under pi_lock && rq->lock, such that the value can be used under
@@ -8093,7 +8159,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	 * To be able to change p->policy safely, the appropriate
 	 * runqueue lock must be held.
 	 */
-	rq = task_rq_lock(p, &rf);
+	rq = task_rq_lock_balance(p, &rf);
+
 	update_rq_clock(rq);
 
 	/*
@@ -10312,6 +10379,7 @@ void __init sched_init(void)
 
 		rq = cpu_rq(i);
 		raw_spin_lock_init(&rq->__lock);
+		raw_spin_lock_init(&rq->balance_lock);
 		rq->nr_running = 0;
 		rq->calc_load_active = 0;
 		rq->calc_load_update = jiffies + LOAD_FREQ;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 354e75587fed..17947a4009de 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1057,6 +1057,7 @@ struct rq {
 	unsigned long		cpu_capacity_orig;
 
 	struct callback_head	*balance_callback;
+	raw_spinlock_t		balance_lock;
 
 	unsigned char		nohz_idle_balance;
 	unsigned char		idle_balance;
@@ -1748,18 +1749,22 @@ queue_balance_callback(struct rq *rq,
 		       void (*func)(struct rq *rq))
 {
 	lockdep_assert_rq_held(rq);
+	raw_spin_lock(&rq->balance_lock);
 
 	/*
 	 * Don't (re)queue an already queued item; nor queue anything when
 	 * balance_push() is active, see the comment with
 	 * balance_push_callback.
 	 */
-	if (unlikely(head->next || rq->balance_callback == &balance_push_callback))
+	if (unlikely(head->next || rq->balance_callback == &balance_push_callback)) {
+		raw_spin_unlock(&rq->balance_lock);
 		return;
+	}
 
 	head->func = (void (*)(struct callback_head *))func;
 	head->next = rq->balance_callback;
 	rq->balance_callback = head;
+	raw_spin_unlock(&rq->balance_lock);
 }
 
 #define rcu_dereference_check_sched_domain(p) \
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate
  2022-11-23  1:21 ` [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate Joel Fernandes (Google)
@ 2022-12-09 15:07   ` Dietmar Eggemann
  2022-12-09 16:52     ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Dietmar Eggemann @ 2022-12-09 15:07 UTC (permalink / raw)
  To: Joel Fernandes (Google), linux-kernel
  Cc: Ben Segall, Daniel Bristot de Oliveira, Davidlohr Bueso,
	Ingo Molnar, Josh Triplett, Juri Lelli, Mel Gorman,
	Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

On 23/11/2022 02:21, Joel Fernandes (Google) wrote:
> In commit 565790d28b1e ("sched: Fix balance_callback()"), it is clear
> that rq lock needs to be held from when balance callbacks are queued to
> when __balance_callbacks() in schedule() is called.
> 
> This has to be done without dropping the runqueue lock in between. If
> dropped between queuing and executing callbacks, it is possible that
> another CPU, say in __sched_setscheduler() can queue balancing callbacks
> and cause issues as show in that commit.
> 
> This is precisely what happens in proxy(). During a proxy(), the current
> CPU's rq lock is temporary dropped when moving the tasks in the migrate
> list to the owner CPU.
> 
> This commit therefore make proxy() exclude balance callback queuing on
> other CPUs, in the section where proxy() temporarily drops the rq lock
> of current CPU.
> 
> CPUs that acquire a remote CPU's rq lock but later queue a balance
> callback, are made to call the new helpers in this commit to check
> whether balance_lock is held. If it is held, then the rq lock is
> released and a re-attempt is made to acquire it in the hopes that
> the ban on balancing callback queuing has elapsed.
> 
> Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/sched/core.c  | 72 ++++++++++++++++++++++++++++++++++++++++++--
>  kernel/sched/sched.h |  7 ++++-
>  2 files changed, 76 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 88a5fa34dc06..aba90b3dc3ef 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -633,6 +633,29 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
>  	}
>  }
>  
> +/*
> + * Helper to call __task_rq_lock safely, in scenarios where we might be about to
> + * queue a balance callback on a remote CPU. That CPU might be in proxy(), and
> + * could have released its rq lock while holding balance_lock. So release rq
> + * lock in such a situation to avoid deadlock, and retry.
> + */
> +struct rq *__task_rq_lock_balance(struct task_struct *p, struct rq_flags *rf)
> +{
> +	struct rq *rq;
> +	bool locked = false;
> +
> +	do {
> +		if (locked) {
> +			__task_rq_unlock(rq, rf);
> +			cpu_relax();
> +		}
> +		rq = __task_rq_lock(p, rf);
> +		locked = true;
> +	} while (raw_spin_is_locked(&rq->balance_lock));
> +
> +	return rq;
> +}
> +
>  /*
>   * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
>   */
> @@ -675,6 +698,29 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
>  	}
>  }
>  
> +/*
> + * Helper to call task_rq_lock safely, in scenarios where we might be about to
> + * queue a balance callback on a remote CPU. That CPU might be in proxy(), and
> + * could have released its rq lock while holding balance_lock. So release rq
> + * lock in such a situation to avoid deadlock, and retry.
> + */
> +struct rq *task_rq_lock_balance(struct task_struct *p, struct rq_flags *rf)
> +{
> +	struct rq *rq;
> +	bool locked = false;
> +
> +	do {
> +		if (locked) {
> +			task_rq_unlock(rq, p, rf);
> +			cpu_relax();
> +		}
> +		rq = task_rq_lock(p, rf);
> +		locked = true;
> +	} while (raw_spin_is_locked(&rq->balance_lock));
> +
> +	return rq;
> +}
> +
>  /*
>   * RQ-clock updating methods:
>   */
> @@ -6739,6 +6785,12 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
>  		p->wake_cpu = wake_cpu;
>  	}
>  
> +	/*
> +	 * Prevent other CPUs from queuing balance callbacks while we migrate
> +	 * tasks in the migrate_list with the rq lock released.
> +	 */
> +	raw_spin_lock(&rq->balance_lock);
> +
>  	rq_unpin_lock(rq, rf);
>  	raw_spin_rq_unlock(rq);
>  	raw_spin_rq_lock(that_rq);
> @@ -6758,7 +6810,21 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
>  	}
>  
>  	raw_spin_rq_unlock(that_rq);
> +
> +	/*
> +	 * This may make lockdep unhappy as we acquire rq->lock with
> +	 * balance_lock held. But that should be a false positive, as the
> +	 * following pattern happens only on the current CPU with interrupts
> +	 * disabled:
> +	 * rq_lock()
> +	 * balance_lock();
> +	 * rq_unlock();
> +	 * rq_lock();
> +	 */
>  	raw_spin_rq_lock(rq);
> +
> +	raw_spin_unlock(&rq->balance_lock);
> +
>  	rq_repin_lock(rq, rf);
>  
>  	return NULL; /* Retry task selection on _this_ CPU. */
> @@ -7489,7 +7555,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
>  	if (p->pi_top_task == pi_task && prio == p->prio && !dl_prio(prio))
>  		return;
>  
> -	rq = __task_rq_lock(p, &rf);
> +	rq = __task_rq_lock_balance(p, &rf);
>  	update_rq_clock(rq);
>  	/*
>  	 * Set under pi_lock && rq->lock, such that the value can be used under
> @@ -8093,7 +8159,8 @@ static int __sched_setscheduler(struct task_struct *p,
>  	 * To be able to change p->policy safely, the appropriate
>  	 * runqueue lock must be held.
>  	 */
> -	rq = task_rq_lock(p, &rf);
> +	rq = task_rq_lock_balance(p, &rf);
> +
>  	update_rq_clock(rq);
>  
>  	/*

You consider rt_mutex_setprio() and __sched_setscheduler() versus
proxy() but what about all the other places like load_balance(),
update_blocked_averages(),  __set_cpus_allowed_ptr() and many
more in which we take the rq lock (via task_rq_lock() or
rq_lock{_xxx}())?

With your changes to locktorture in {2-3}/3 you still run CFS
lock_torture_writers but you should see the issue popping up
in __set_cpus_allowed_ptr() (from torture_shuffle()) for example.

Tried with:

insmod /lib/modules/torture.ko
insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=1 nlocks=3
                                                                      ^^^^^^^^^^^^^^^^^

When changing all lock_torture_writer's to FIFO it becomes even
more visible.

-->8--

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index e4529c2166e9..ea75d525fe7c 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -683,7 +683,8 @@ static int lock_torture_writer(void *arg)
        DEFINE_TORTURE_RANDOM(rand);
 
        VERBOSE_TOROUT_STRING("lock_torture_writer task started");
-       set_user_nice(current, MAX_NICE);
+       if (!rt_task(current))
+               set_user_nice(current, MAX_NICE);
 
        do {
                if ((torture_random(&rand) & 0xfffff) == 0)
diff --git a/kernel/torture.c b/kernel/torture.c
index 1d0dd88369e3..55d8ac417a4a 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -57,6 +57,9 @@ module_param(verbose_sleep_duration, int, 0444);
 static int random_shuffle;
 module_param(random_shuffle, int, 0444);
 
+static int lock_torture_writer_fifo;
+module_param(lock_torture_writer_fifo, int, 0444);
+
 static char *torture_type;
 static int verbose;
 
@@ -734,7 +737,7 @@ bool stutter_wait(const char *title)
        cond_resched_tasks_rcu_qs();
        spt = READ_ONCE(stutter_pause_test);
        for (; spt; spt = READ_ONCE(stutter_pause_test)) {
-               if (!ret) {
+               if (!ret && !rt_task(current)) {
                        sched_set_normal(current, MAX_NICE);
                        ret = true;
                }
@@ -944,6 +947,11 @@ int _torture_create_kthread(int (*fn)(void *arg), void *arg, char *s, char *m,
                *tp = NULL;
                return ret;
        }
+
+       if (lock_torture_writer_fifo &&
+           !strncmp(s, "lock_torture_writer", strlen(s)))
+               sched_set_fifo(*tp);
+
        wake_up_process(*tp);  // Process is sleeping, so ordering provided.
        torture_shuffle_task_register(*tp);
        return ret;

[...]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate
  2022-12-09 15:07   ` Dietmar Eggemann
@ 2022-12-09 16:52     ` Joel Fernandes
  2022-12-12 14:39       ` Dietmar Eggemann
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2022-12-09 16:52 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Juri Lelli,
	Mel Gorman, Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

Hi Dietmar!

On Fri, Dec 9, 2022 at 3:07 PM Dietmar Eggemann
<dietmar.eggemann@arm.com> wrote:
>
> On 23/11/2022 02:21, Joel Fernandes (Google) wrote:
> > In commit 565790d28b1e ("sched: Fix balance_callback()"), it is clear
> > that rq lock needs to be held from when balance callbacks are queued to
> > when __balance_callbacks() in schedule() is called.
> >
> > This has to be done without dropping the runqueue lock in between. If
> > dropped between queuing and executing callbacks, it is possible that
> > another CPU, say in __sched_setscheduler() can queue balancing callbacks
> > and cause issues as show in that commit.
> >
> > This is precisely what happens in proxy(). During a proxy(), the current
> > CPU's rq lock is temporary dropped when moving the tasks in the migrate
> > list to the owner CPU.
> >
> > This commit therefore make proxy() exclude balance callback queuing on
> > other CPUs, in the section where proxy() temporarily drops the rq lock
> > of current CPU.
> >
> > CPUs that acquire a remote CPU's rq lock but later queue a balance
> > callback, are made to call the new helpers in this commit to check
> > whether balance_lock is held. If it is held, then the rq lock is
> > released and a re-attempt is made to acquire it in the hopes that
> > the ban on balancing callback queuing has elapsed.
> >
> > Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/sched/core.c  | 72 ++++++++++++++++++++++++++++++++++++++++++--
> >  kernel/sched/sched.h |  7 ++++-
> >  2 files changed, 76 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 88a5fa34dc06..aba90b3dc3ef 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -633,6 +633,29 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
> >       }
> >  }
> >
> > +/*
> > + * Helper to call __task_rq_lock safely, in scenarios where we might be about to
> > + * queue a balance callback on a remote CPU. That CPU might be in proxy(), and
> > + * could have released its rq lock while holding balance_lock. So release rq
> > + * lock in such a situation to avoid deadlock, and retry.
> > + */
> > +struct rq *__task_rq_lock_balance(struct task_struct *p, struct rq_flags *rf)
> > +{
> > +     struct rq *rq;
> > +     bool locked = false;
> > +
> > +     do {
> > +             if (locked) {
> > +                     __task_rq_unlock(rq, rf);
> > +                     cpu_relax();
> > +             }
> > +             rq = __task_rq_lock(p, rf);
> > +             locked = true;
> > +     } while (raw_spin_is_locked(&rq->balance_lock));
> > +
> > +     return rq;
> > +}
> > +
> >  /*
> >   * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
> >   */
> > @@ -675,6 +698,29 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
> >       }
> >  }
> >
> > +/*
> > + * Helper to call task_rq_lock safely, in scenarios where we might be about to
> > + * queue a balance callback on a remote CPU. That CPU might be in proxy(), and
> > + * could have released its rq lock while holding balance_lock. So release rq
> > + * lock in such a situation to avoid deadlock, and retry.
> > + */
> > +struct rq *task_rq_lock_balance(struct task_struct *p, struct rq_flags *rf)
> > +{
> > +     struct rq *rq;
> > +     bool locked = false;
> > +
> > +     do {
> > +             if (locked) {
> > +                     task_rq_unlock(rq, p, rf);
> > +                     cpu_relax();
> > +             }
> > +             rq = task_rq_lock(p, rf);
> > +             locked = true;
> > +     } while (raw_spin_is_locked(&rq->balance_lock));
> > +
> > +     return rq;
> > +}
> > +
> >  /*
> >   * RQ-clock updating methods:
> >   */
> > @@ -6739,6 +6785,12 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
> >               p->wake_cpu = wake_cpu;
> >       }
> >
> > +     /*
> > +      * Prevent other CPUs from queuing balance callbacks while we migrate
> > +      * tasks in the migrate_list with the rq lock released.
> > +      */
> > +     raw_spin_lock(&rq->balance_lock);
> > +
> >       rq_unpin_lock(rq, rf);
> >       raw_spin_rq_unlock(rq);
> >       raw_spin_rq_lock(that_rq);
> > @@ -6758,7 +6810,21 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
> >       }
> >
> >       raw_spin_rq_unlock(that_rq);
> > +
> > +     /*
> > +      * This may make lockdep unhappy as we acquire rq->lock with
> > +      * balance_lock held. But that should be a false positive, as the
> > +      * following pattern happens only on the current CPU with interrupts
> > +      * disabled:
> > +      * rq_lock()
> > +      * balance_lock();
> > +      * rq_unlock();
> > +      * rq_lock();
> > +      */
> >       raw_spin_rq_lock(rq);
> > +
> > +     raw_spin_unlock(&rq->balance_lock);
> > +
> >       rq_repin_lock(rq, rf);
> >
> >       return NULL; /* Retry task selection on _this_ CPU. */
> > @@ -7489,7 +7555,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
> >       if (p->pi_top_task == pi_task && prio == p->prio && !dl_prio(prio))
> >               return;
> >
> > -     rq = __task_rq_lock(p, &rf);
> > +     rq = __task_rq_lock_balance(p, &rf);
> >       update_rq_clock(rq);
> >       /*
> >        * Set under pi_lock && rq->lock, such that the value can be used under
> > @@ -8093,7 +8159,8 @@ static int __sched_setscheduler(struct task_struct *p,
> >        * To be able to change p->policy safely, the appropriate
> >        * runqueue lock must be held.
> >        */
> > -     rq = task_rq_lock(p, &rf);
> > +     rq = task_rq_lock_balance(p, &rf);
> > +
> >       update_rq_clock(rq);
> >
> >       /*
>
> You consider rt_mutex_setprio() and __sched_setscheduler() versus
> proxy() but what about all the other places like load_balance(),
> update_blocked_averages(),  __set_cpus_allowed_ptr() and many
> more in which we take the rq lock (via task_rq_lock() or
> rq_lock{_xxx}())?

You are right. Those paths potentially need updates as well. Any
chance you could post stack traces or logs of those issues, just in
case they have new nuggets of information? If you don't have them,
don't bother, I'll reproduce it.

> With your changes to locktorture in {2-3}/3 you still run CFS
> lock_torture_writers but you should see the issue popping up
> in __set_cpus_allowed_ptr() (from torture_shuffle()) for example.
>
> Tried with:
>
> insmod /lib/modules/torture.ko
> insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=1 nlocks=3
>                                                                       ^^^^^^^^^^^^^^^^^
>
> When changing all lock_torture_writer's to FIFO it becomes even
> more visible.

Ok, thank you, I will make it more aggressively set to RT. The
previous locktorture was setting RT once every minute or so, at least
now that is down to 10 seconds ;-). But as you highlight with the
locktorture diff below, it needs to go further.

Anyway, this is going to be a nice holiday/Christmas project for me,
so thank you in advance for the holiday gift  :-)  ^_^

thanks,

 - Joel

>
> -->8--
>
> diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> index e4529c2166e9..ea75d525fe7c 100644
> --- a/kernel/locking/locktorture.c
> +++ b/kernel/locking/locktorture.c
> @@ -683,7 +683,8 @@ static int lock_torture_writer(void *arg)
>         DEFINE_TORTURE_RANDOM(rand);
>
>         VERBOSE_TOROUT_STRING("lock_torture_writer task started");
> -       set_user_nice(current, MAX_NICE);
> +       if (!rt_task(current))
> +               set_user_nice(current, MAX_NICE);
>
>         do {
>                 if ((torture_random(&rand) & 0xfffff) == 0)
> diff --git a/kernel/torture.c b/kernel/torture.c
> index 1d0dd88369e3..55d8ac417a4a 100644
> --- a/kernel/torture.c
> +++ b/kernel/torture.c
> @@ -57,6 +57,9 @@ module_param(verbose_sleep_duration, int, 0444);
>  static int random_shuffle;
>  module_param(random_shuffle, int, 0444);
>
> +static int lock_torture_writer_fifo;
> +module_param(lock_torture_writer_fifo, int, 0444);
> +
>  static char *torture_type;
>  static int verbose;
>
> @@ -734,7 +737,7 @@ bool stutter_wait(const char *title)
>         cond_resched_tasks_rcu_qs();
>         spt = READ_ONCE(stutter_pause_test);
>         for (; spt; spt = READ_ONCE(stutter_pause_test)) {
> -               if (!ret) {
> +               if (!ret && !rt_task(current)) {
>                         sched_set_normal(current, MAX_NICE);
>                         ret = true;
>                 }
> @@ -944,6 +947,11 @@ int _torture_create_kthread(int (*fn)(void *arg), void *arg, char *s, char *m,
>                 *tp = NULL;
>                 return ret;
>         }
> +
> +       if (lock_torture_writer_fifo &&
> +           !strncmp(s, "lock_torture_writer", strlen(s)))
> +               sched_set_fifo(*tp);
> +
>         wake_up_process(*tp);  // Process is sleeping, so ordering provided.
>         torture_shuffle_task_register(*tp);
>         return ret;
>
> [...]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate
  2022-12-09 16:52     ` Joel Fernandes
@ 2022-12-12 14:39       ` Dietmar Eggemann
  2022-12-15 23:12         ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Dietmar Eggemann @ 2022-12-12 14:39 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Juri Lelli,
	Mel Gorman, Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

Hi Joel,

On 09/12/2022 17:52, Joel Fernandes wrote:
> Hi Dietmar!
> 
> On Fri, Dec 9, 2022 at 3:07 PM Dietmar Eggemann
> <dietmar.eggemann@arm.com> wrote:
>>
>> On 23/11/2022 02:21, Joel Fernandes (Google) wrote:

[...]

>> You consider rt_mutex_setprio() and __sched_setscheduler() versus
>> proxy() but what about all the other places like load_balance(),
>> update_blocked_averages(),  __set_cpus_allowed_ptr() and many
>> more in which we take the rq lock (via task_rq_lock() or
>> rq_lock{_xxx}())?
> 
> You are right. Those paths potentially need updates as well. Any

IMHO, you would still have to lock the entire `queue->execute` (1)->(2)
thing, like keeping the rq lock currently.

__schedule()

  pick_next_task()
    pick_next_task_{rt/dl}()
      set_next_task_{rt/dl}() 
       {rt/deadline}_queue_push_tasks()
         queue_balance_callback() -->    (1)

  proxy()                         ---    !!!

  finish_lock_switch()
    __balance_callbacks()         <--    (2)

  __balance_callbacks(rq)         <--    (2)

Otherwise. something like this could happen:

With `x-x` : smp_processor_id()-cpu_of(rq)

lock_torture_wr-1745 [003] 338.270963: queue_balance_callback: 3-3-> 
lock_torture_wr-1745 [003] 338.270965: queue_balance_callback: 3-3<-
            cat-1726 [001] 338.270969: __schedule: proxy() 1-1->
lock_torture_wr-1745 [003] 338.270971: __schedule: proxy() 3-3-> 
            cat-1726 [001] 338.270972: __schedule: proxy() 1-1<-
lock_torture_wr-1745 [003] 338.270979: __schedule: proxy() 3-3<-
         <idle>-0    [002] 338.270981: __schedule: proxy() 2-2->
         <idle>-0    [002] 338.270984: __schedule: proxy() 2-2<-
lock_torture_wr-1745 [003] 338.270985: __schedule: proxy() 3-3->
    migration/4-34   [004] 338.270989: active_load_balance_cpu_stop: rq_pin_lock() 4-3 <-- ! cb on CPU3 still enqueued

> chance you could post stack traces or logs of those issues, just in
> case they have new nuggets of information? If you don't have them,
> don't bother, I'll reproduce it.

insmod /lib/modules/torture.ko random_shuffle=1 lock_torture_writer_fifo=1
insmod /lib/modules/locktorture.ko torture_type=mutex_lock nlocks=3

[  447.046916] rq->balance_callback && rq->balance_callback != &balance_push_callback
[  447.046926] WARNING: CPU: 1 PID: 1648 at kernel/sched/sched.h:1583 task_rq_lock+0x148/0x170
[  447.062914] Modules linked in: locktorture(O) torture(O)
[  447.068254] CPU: 1 PID: 1648 Comm: torture_shuffle Tainted: G W  O 6.1.0-rc2-00036-g397ce37b37a8-dirty #203
[  447.079168] Hardware name: ARM Juno development board (r0) (DT)
[  447.085106] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  447.092095] pc : task_rq_lock+0x148/0x170
[  447.096121] lr : task_rq_lock+0x148/0x170
[  447.100146] sp : ffff80000b85bd30
...
[  447.175138] Call trace:
[  447.177589]  task_rq_lock+0x148/0x170
[  447.181267]  __set_cpus_allowed_ptr+0x34/0xc0
[  447.185643]  set_cpus_allowed_ptr+0x30/0x60
[  447.189843]  torture_shuffle+0x158/0x224 [torture]
[  447.194666]  kthread+0x10c/0x110
[  447.197906]  ret_from_fork+0x10/0x20
...
[  447.233560] ---[ end trace 0000000000000000 ]---


[  446.542532] ------------[ cut here ]------------
[  446.553224] rq->balance_callback && rq->balance_callback != &balance_push_callback
[  446.553243] WARNING: CPU: 3 PID: 0 at kernel/sched/sched.h:1583 update_blocked_averages+0x784/0x78c
[  446.576089] Modules linked in: locktorture(O+) torture(O)
[  446.581551] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 6.1.0-rc2-00036-g397ce37b37a8-dirty #203
[  446.591723] Hardware name: ARM Juno development board (r0) (DT)
[  446.597691] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  446.604712] pc : update_blocked_averages+0x784/0x78c
[  446.609723] lr : update_blocked_averages+0x784/0x78c
[  446.614733] sp : ffff80000b403b70
...
[  446.690142] Call trace:
[  446.692609]  update_blocked_averages+0x784/0x78c
[  446.697270]  newidle_balance+0x184/0x5f0
[  446.701232]  pick_next_task_fair+0x2c/0x500
[  446.705456]  __schedule+0x1d4/0x1084
[  446.709070]  schedule_idle+0x28/0x4c
[  446.712682]  do_idle+0x1d4/0x2d0
[  446.715946]  cpu_startup_entry+0x28/0x30
[  446.719909]  secondary_start_kernel+0x138/0x15c
[  446.724486]  __secondary_switched+0xb0/0xb4
...
[  446.765848] ---[ end trace 0000000000000000 ]---


[   95.091675] ------------[ cut here ]------------
[   95.096301] rq->balance_callback && rq->balance_callback != &balance_push_callback
[   95.096322] WARNING: CPU: 5 PID: 39 at kernel/sched/sched.h:1583 load_balance+0x644/0xdc0
[   95.103135] mutex_lock-torture: Creating lock_torture_writer task
[   95.103238] mutex_lock-torture: lock_torture_writer task started
[   95.110692] Modules linked in: locktorture(O+) torture(O)
[   95.136699] CPU: 5 PID: 39 Comm: migration/5 Tainted: G O 6.1.0-rc2-00036-g397ce37b37a8-dirty #204
[   95.147137] Hardware name: ARM Juno development board (r0) (DT)
[   95.153105] Stopper: 0x0 <- 0x0
[   95.156282] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   95.163306] pc : load_balance+0x644/0xdc0
[   95.167356] lr : load_balance+0x644/0xdc0
[   95.171405] sp : ffff80000b4cbaf0
...
[   95.246833] Call trace:
[   95.249300]  load_balance+0x644/0xdc0
[   95.253000]  newidle_balance+0x290/0x6f0
[   95.256963]  pick_next_task_fair+0x2c/0x510
[   95.261188]  __schedule+0x1d4/0x1084
[   95.264801]  schedule+0x64/0x11c
[   95.268063]  smpboot_thread_fn+0xa4/0x270
[   95.272115]  kthread+0x10c/0x110
[   95.275375]  ret_from_fork+0x10/0x20
...
[   95.316477] ---[ end trace 0000000000000000 ]---


[  134.893379] ------------[ cut here ]------------
[  134.898066] rq->balance_callback && rq->balance_callback != &balance_push_callback
[  134.898088] WARNING: CPU: 4 PID: 0 at kernel/sched/sched.h:1583 sched_rt_period_timer+0x1dc/0x3f0
[  134.914683] Modules linked in: locktorture(O) torture(O)
[  134.920059] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W  O 6.1.0-rc2-00036-g397ce37b37a8-dirty #204
[  134.930235] Hardware name: ARM Juno development board (r0) (DT)
[  134.936205] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  134.943229] pc : sched_rt_period_timer+0x1dc/0x3f0
[  134.948069] lr : sched_rt_period_timer+0x1dc/0x3f0
[  134.952908] sp : ffff80000b2cbde0
...
[  135.028342] Call trace:
[  135.030810]  sched_rt_period_timer+0x1dc/0x3f0
[  135.035300]  __hrtimer_run_queues+0x184/0x504
[  135.039700]  hrtimer_interrupt+0xe8/0x244
[  135.043749]  arch_timer_handler_phys+0x2c/0x44
[  135.048239]  handle_percpu_devid_irq+0x8c/0x150
[  135.052815]  generic_handle_domain_irq+0x2c/0x44
[  135.057480]  gic_handle_irq+0x44/0xc4
[  135.061180]  call_on_irq_stack+0x2c/0x5c
[  135.065143]  do_interrupt_handler+0x80/0x84
...
[  135.141122] ---[ end trace 0000000000000000 ]---
 
>> With your changes to locktorture in {2-3}/3 you still run CFS
>> lock_torture_writers but you should see the issue popping up
>> in __set_cpus_allowed_ptr() (from torture_shuffle()) for example.
>>
>> Tried with:
>>
>> insmod /lib/modules/torture.ko
>> insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=1 nlocks=3
>>                                                                       ^^^^^^^^^^^^^^^^^
>>
>> When changing all lock_torture_writer's to FIFO it becomes even
>> more visible.
> 
> Ok, thank you, I will make it more aggressively set to RT. The
> previous locktorture was setting RT once every minute or so, at least
> now that is down to 10 seconds ;-). But as you highlight with the
> locktorture diff below, it needs to go further.

Yeah, the test env is more aggressive this way and you spot potential
issues quicker.

> Anyway, this is going to be a nice holiday/Christmas project for me,
> so thank you in advance for the holiday gift  :-)  ^_^

Enjoy ;-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate
  2022-12-12 14:39       ` Dietmar Eggemann
@ 2022-12-15 23:12         ` Joel Fernandes
  2022-12-15 23:31           ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Joel Fernandes @ 2022-12-15 23:12 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Juri Lelli,
	Mel Gorman, Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

On Mon, Dec 12, 2022 at 9:39 AM Dietmar Eggemann
<dietmar.eggemann@arm.com> wrote:
>
> Hi Joel,
>
> On 09/12/2022 17:52, Joel Fernandes wrote:
> > Hi Dietmar!
> >
> > On Fri, Dec 9, 2022 at 3:07 PM Dietmar Eggemann
> > <dietmar.eggemann@arm.com> wrote:
> >>
> >> On 23/11/2022 02:21, Joel Fernandes (Google) wrote:
>
> [...]
>
> >> You consider rt_mutex_setprio() and __sched_setscheduler() versus
> >> proxy() but what about all the other places like load_balance(),
> >> update_blocked_averages(),  __set_cpus_allowed_ptr() and many
> >> more in which we take the rq lock (via task_rq_lock() or
> >> rq_lock{_xxx}())?
> >
> > You are right. Those paths potentially need updates as well. Any
>
> IMHO, you would still have to lock the entire `queue->execute` (1)->(2)
> thing, like keeping the rq lock currently.
>
> __schedule()
>
>   pick_next_task()
>     pick_next_task_{rt/dl}()
>       set_next_task_{rt/dl}()
>        {rt/deadline}_queue_push_tasks()
>          queue_balance_callback() -->    (1)
>
>   proxy()                         ---    !!!
>
>   finish_lock_switch()
>     __balance_callbacks()         <--    (2)
>
>   __balance_callbacks(rq)         <--    (2)

Instead of locking it throughout, I think we can keep my initial patch
and just execute the balance callbacks in proxy() before dropping the
rq lock. I *think* that will make it work properly, but I could be
missing something.

Anyway I see the issue you bring up, I took care of balance CBs queued
from *other CPUs* while the rq lock is dropped, but the current CPU
executing proxy() could itself have queued balance callbacks. Dropping
the rq lock then causes other CPUs to see the proxy() CPU's balance
CBs in the callback list.

Anyway I will try the above and get back to you. Thanks so much again
for the insights. Will test as you suggested below.

Thanks,

 - Joel


> Otherwise. something like this could happen:
>
> With `x-x` : smp_processor_id()-cpu_of(rq)
>
> lock_torture_wr-1745 [003] 338.270963: queue_balance_callback: 3-3->
> lock_torture_wr-1745 [003] 338.270965: queue_balance_callback: 3-3<-
>             cat-1726 [001] 338.270969: __schedule: proxy() 1-1->
> lock_torture_wr-1745 [003] 338.270971: __schedule: proxy() 3-3->
>             cat-1726 [001] 338.270972: __schedule: proxy() 1-1<-
> lock_torture_wr-1745 [003] 338.270979: __schedule: proxy() 3-3<-
>          <idle>-0    [002] 338.270981: __schedule: proxy() 2-2->
>          <idle>-0    [002] 338.270984: __schedule: proxy() 2-2<-
> lock_torture_wr-1745 [003] 338.270985: __schedule: proxy() 3-3->
>     migration/4-34   [004] 338.270989: active_load_balance_cpu_stop: rq_pin_lock() 4-3 <-- ! cb on CPU3 still enqueued
>
> > chance you could post stack traces or logs of those issues, just in
> > case they have new nuggets of information? If you don't have them,
> > don't bother, I'll reproduce it.
>
> insmod /lib/modules/torture.ko random_shuffle=1 lock_torture_writer_fifo=1
> insmod /lib/modules/locktorture.ko torture_type=mutex_lock nlocks=3
>
> [  447.046916] rq->balance_callback && rq->balance_callback != &balance_push_callback
> [  447.046926] WARNING: CPU: 1 PID: 1648 at kernel/sched/sched.h:1583 task_rq_lock+0x148/0x170
> [  447.062914] Modules linked in: locktorture(O) torture(O)
> [  447.068254] CPU: 1 PID: 1648 Comm: torture_shuffle Tainted: G W  O 6.1.0-rc2-00036-g397ce37b37a8-dirty #203
> [  447.079168] Hardware name: ARM Juno development board (r0) (DT)
> [  447.085106] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  447.092095] pc : task_rq_lock+0x148/0x170
> [  447.096121] lr : task_rq_lock+0x148/0x170
> [  447.100146] sp : ffff80000b85bd30
> ...
> [  447.175138] Call trace:
> [  447.177589]  task_rq_lock+0x148/0x170
> [  447.181267]  __set_cpus_allowed_ptr+0x34/0xc0
> [  447.185643]  set_cpus_allowed_ptr+0x30/0x60
> [  447.189843]  torture_shuffle+0x158/0x224 [torture]
> [  447.194666]  kthread+0x10c/0x110
> [  447.197906]  ret_from_fork+0x10/0x20
> ...
> [  447.233560] ---[ end trace 0000000000000000 ]---
>
>
> [  446.542532] ------------[ cut here ]------------
> [  446.553224] rq->balance_callback && rq->balance_callback != &balance_push_callback
> [  446.553243] WARNING: CPU: 3 PID: 0 at kernel/sched/sched.h:1583 update_blocked_averages+0x784/0x78c
> [  446.576089] Modules linked in: locktorture(O+) torture(O)
> [  446.581551] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 6.1.0-rc2-00036-g397ce37b37a8-dirty #203
> [  446.591723] Hardware name: ARM Juno development board (r0) (DT)
> [  446.597691] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  446.604712] pc : update_blocked_averages+0x784/0x78c
> [  446.609723] lr : update_blocked_averages+0x784/0x78c
> [  446.614733] sp : ffff80000b403b70
> ...
> [  446.690142] Call trace:
> [  446.692609]  update_blocked_averages+0x784/0x78c
> [  446.697270]  newidle_balance+0x184/0x5f0
> [  446.701232]  pick_next_task_fair+0x2c/0x500
> [  446.705456]  __schedule+0x1d4/0x1084
> [  446.709070]  schedule_idle+0x28/0x4c
> [  446.712682]  do_idle+0x1d4/0x2d0
> [  446.715946]  cpu_startup_entry+0x28/0x30
> [  446.719909]  secondary_start_kernel+0x138/0x15c
> [  446.724486]  __secondary_switched+0xb0/0xb4
> ...
> [  446.765848] ---[ end trace 0000000000000000 ]---
>
>
> [   95.091675] ------------[ cut here ]------------
> [   95.096301] rq->balance_callback && rq->balance_callback != &balance_push_callback
> [   95.096322] WARNING: CPU: 5 PID: 39 at kernel/sched/sched.h:1583 load_balance+0x644/0xdc0
> [   95.103135] mutex_lock-torture: Creating lock_torture_writer task
> [   95.103238] mutex_lock-torture: lock_torture_writer task started
> [   95.110692] Modules linked in: locktorture(O+) torture(O)
> [   95.136699] CPU: 5 PID: 39 Comm: migration/5 Tainted: G O 6.1.0-rc2-00036-g397ce37b37a8-dirty #204
> [   95.147137] Hardware name: ARM Juno development board (r0) (DT)
> [   95.153105] Stopper: 0x0 <- 0x0
> [   95.156282] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   95.163306] pc : load_balance+0x644/0xdc0
> [   95.167356] lr : load_balance+0x644/0xdc0
> [   95.171405] sp : ffff80000b4cbaf0
> ...
> [   95.246833] Call trace:
> [   95.249300]  load_balance+0x644/0xdc0
> [   95.253000]  newidle_balance+0x290/0x6f0
> [   95.256963]  pick_next_task_fair+0x2c/0x510
> [   95.261188]  __schedule+0x1d4/0x1084
> [   95.264801]  schedule+0x64/0x11c
> [   95.268063]  smpboot_thread_fn+0xa4/0x270
> [   95.272115]  kthread+0x10c/0x110
> [   95.275375]  ret_from_fork+0x10/0x20
> ...
> [   95.316477] ---[ end trace 0000000000000000 ]---
>
>
> [  134.893379] ------------[ cut here ]------------
> [  134.898066] rq->balance_callback && rq->balance_callback != &balance_push_callback
> [  134.898088] WARNING: CPU: 4 PID: 0 at kernel/sched/sched.h:1583 sched_rt_period_timer+0x1dc/0x3f0
> [  134.914683] Modules linked in: locktorture(O) torture(O)
> [  134.920059] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W  O 6.1.0-rc2-00036-g397ce37b37a8-dirty #204
> [  134.930235] Hardware name: ARM Juno development board (r0) (DT)
> [  134.936205] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  134.943229] pc : sched_rt_period_timer+0x1dc/0x3f0
> [  134.948069] lr : sched_rt_period_timer+0x1dc/0x3f0
> [  134.952908] sp : ffff80000b2cbde0
> ...
> [  135.028342] Call trace:
> [  135.030810]  sched_rt_period_timer+0x1dc/0x3f0
> [  135.035300]  __hrtimer_run_queues+0x184/0x504
> [  135.039700]  hrtimer_interrupt+0xe8/0x244
> [  135.043749]  arch_timer_handler_phys+0x2c/0x44
> [  135.048239]  handle_percpu_devid_irq+0x8c/0x150
> [  135.052815]  generic_handle_domain_irq+0x2c/0x44
> [  135.057480]  gic_handle_irq+0x44/0xc4
> [  135.061180]  call_on_irq_stack+0x2c/0x5c
> [  135.065143]  do_interrupt_handler+0x80/0x84
> ...
> [  135.141122] ---[ end trace 0000000000000000 ]---
>
> >> With your changes to locktorture in {2-3}/3 you still run CFS
> >> lock_torture_writers but you should see the issue popping up
> >> in __set_cpus_allowed_ptr() (from torture_shuffle()) for example.
> >>
> >> Tried with:
> >>
> >> insmod /lib/modules/torture.ko
> >> insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=1 nlocks=3
> >>                                                                       ^^^^^^^^^^^^^^^^^
> >>
> >> When changing all lock_torture_writer's to FIFO it becomes even
> >> more visible.
> >
> > Ok, thank you, I will make it more aggressively set to RT. The
> > previous locktorture was setting RT once every minute or so, at least
> > now that is down to 10 seconds ;-). But as you highlight with the
> > locktorture diff below, it needs to go further.
>
> Yeah, the test env is more aggressive this way and you spot potential
> issues quicker.
>
> > Anyway, this is going to be a nice holiday/Christmas project for me,
> > so thank you in advance for the holiday gift  :-)  ^_^
>
> Enjoy ;-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate
  2022-12-15 23:12         ` Joel Fernandes
@ 2022-12-15 23:31           ` Joel Fernandes
  0 siblings, 0 replies; 18+ messages in thread
From: Joel Fernandes @ 2022-12-15 23:31 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Ingo Molnar, Josh Triplett, Juri Lelli,
	Mel Gorman, Paul E. McKenney, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien



> On Dec 15, 2022, at 6:12 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> On Mon, Dec 12, 2022 at 9:39 AM Dietmar Eggemann
> <dietmar.eggemann@arm.com> wrote:
>> 
>> Hi Joel,
>> 
>>> On 09/12/2022 17:52, Joel Fernandes wrote:
>>> Hi Dietmar!
>>> 
>>> On Fri, Dec 9, 2022 at 3:07 PM Dietmar Eggemann
>>> <dietmar.eggemann@arm.com> wrote:
>>>> 
>>>> On 23/11/2022 02:21, Joel Fernandes (Google) wrote:
>> 
>> [...]
>> 
>>>> You consider rt_mutex_setprio() and __sched_setscheduler() versus
>>>> proxy() but what about all the other places like load_balance(),
>>>> update_blocked_averages(),  __set_cpus_allowed_ptr() and many
>>>> more in which we take the rq lock (via task_rq_lock() or
>>>> rq_lock{_xxx}())?
>>> 
>>> You are right. Those paths potentially need updates as well. Any
>> 
>> IMHO, you would still have to lock the entire `queue->execute` (1)->(2)
>> thing, like keeping the rq lock currently.
>> 
>> __schedule()
>> 
>>  pick_next_task()
>>    pick_next_task_{rt/dl}()
>>      set_next_task_{rt/dl}()
>>       {rt/deadline}_queue_push_tasks()
>>         queue_balance_callback() -->    (1)
>> 
>>  proxy()                         ---    !!!
>> 
>>  finish_lock_switch()
>>    __balance_callbacks()         <--    (2)
>> 
>>  __balance_callbacks(rq)         <--    (2)
> 
> Instead of locking it throughout, I think we can keep my initial patch
> and just execute the balance callbacks in proxy() before dropping the
> rq lock. I *think* that will make it work properly, but I could be
> missing something.
> 
> Anyway I see the issue you bring up, I took care of balance CBs queued
> from *other CPUs* while the rq lock is dropped, but the current CPU
> executing proxy() could itself have queued balance callbacks. Dropping
> the rq lock then causes other CPUs to see the proxy() CPU's balance
> CBs in the callback list.
> 
> Anyway I will try the above and get back to you. Thanks so much again
> for the insights. Will test as you suggested below.

Another option is to dequeue them before dropping the rq lock, and then requeue them later. Again not sure if that’s a can of worms. But the first option appears to me safer in theory. Anyway, it looks like we have a couple of options on the table here for me to try.

 - Joel 


> 
> Thanks,
> 
> - Joel
> 
> 
>> Otherwise. something like this could happen:
>> 
>> With `x-x` : smp_processor_id()-cpu_of(rq)
>> 
>> lock_torture_wr-1745 [003] 338.270963: queue_balance_callback: 3-3->
>> lock_torture_wr-1745 [003] 338.270965: queue_balance_callback: 3-3<-
>>            cat-1726 [001] 338.270969: __schedule: proxy() 1-1->
>> lock_torture_wr-1745 [003] 338.270971: __schedule: proxy() 3-3->
>>            cat-1726 [001] 338.270972: __schedule: proxy() 1-1<-
>> lock_torture_wr-1745 [003] 338.270979: __schedule: proxy() 3-3<-
>>         <idle>-0    [002] 338.270981: __schedule: proxy() 2-2->
>>         <idle>-0    [002] 338.270984: __schedule: proxy() 2-2<-
>> lock_torture_wr-1745 [003] 338.270985: __schedule: proxy() 3-3->
>>    migration/4-34   [004] 338.270989: active_load_balance_cpu_stop: rq_pin_lock() 4-3 <-- ! cb on CPU3 still enqueued
>> 
>>> chance you could post stack traces or logs of those issues, just in
>>> case they have new nuggets of information? If you don't have them,
>>> don't bother, I'll reproduce it.
>> 
>> insmod /lib/modules/torture.ko random_shuffle=1 lock_torture_writer_fifo=1
>> insmod /lib/modules/locktorture.ko torture_type=mutex_lock nlocks=3
>> 
>> [  447.046916] rq->balance_callback && rq->balance_callback != &balance_push_callback
>> [  447.046926] WARNING: CPU: 1 PID: 1648 at kernel/sched/sched.h:1583 task_rq_lock+0x148/0x170
>> [  447.062914] Modules linked in: locktorture(O) torture(O)
>> [  447.068254] CPU: 1 PID: 1648 Comm: torture_shuffle Tainted: G W  O 6.1.0-rc2-00036-g397ce37b37a8-dirty #203
>> [  447.079168] Hardware name: ARM Juno development board (r0) (DT)
>> [  447.085106] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [  447.092095] pc : task_rq_lock+0x148/0x170
>> [  447.096121] lr : task_rq_lock+0x148/0x170
>> [  447.100146] sp : ffff80000b85bd30
>> ...
>> [  447.175138] Call trace:
>> [  447.177589]  task_rq_lock+0x148/0x170
>> [  447.181267]  __set_cpus_allowed_ptr+0x34/0xc0
>> [  447.185643]  set_cpus_allowed_ptr+0x30/0x60
>> [  447.189843]  torture_shuffle+0x158/0x224 [torture]
>> [  447.194666]  kthread+0x10c/0x110
>> [  447.197906]  ret_from_fork+0x10/0x20
>> ...
>> [  447.233560] ---[ end trace 0000000000000000 ]---
>> 
>> 
>> [  446.542532] ------------[ cut here ]------------
>> [  446.553224] rq->balance_callback && rq->balance_callback != &balance_push_callback
>> [  446.553243] WARNING: CPU: 3 PID: 0 at kernel/sched/sched.h:1583 update_blocked_averages+0x784/0x78c
>> [  446.576089] Modules linked in: locktorture(O+) torture(O)
>> [  446.581551] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 6.1.0-rc2-00036-g397ce37b37a8-dirty #203
>> [  446.591723] Hardware name: ARM Juno development board (r0) (DT)
>> [  446.597691] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [  446.604712] pc : update_blocked_averages+0x784/0x78c
>> [  446.609723] lr : update_blocked_averages+0x784/0x78c
>> [  446.614733] sp : ffff80000b403b70
>> ...
>> [  446.690142] Call trace:
>> [  446.692609]  update_blocked_averages+0x784/0x78c
>> [  446.697270]  newidle_balance+0x184/0x5f0
>> [  446.701232]  pick_next_task_fair+0x2c/0x500
>> [  446.705456]  __schedule+0x1d4/0x1084
>> [  446.709070]  schedule_idle+0x28/0x4c
>> [  446.712682]  do_idle+0x1d4/0x2d0
>> [  446.715946]  cpu_startup_entry+0x28/0x30
>> [  446.719909]  secondary_start_kernel+0x138/0x15c
>> [  446.724486]  __secondary_switched+0xb0/0xb4
>> ...
>> [  446.765848] ---[ end trace 0000000000000000 ]---
>> 
>> 
>> [   95.091675] ------------[ cut here ]------------
>> [   95.096301] rq->balance_callback && rq->balance_callback != &balance_push_callback
>> [   95.096322] WARNING: CPU: 5 PID: 39 at kernel/sched/sched.h:1583 load_balance+0x644/0xdc0
>> [   95.103135] mutex_lock-torture: Creating lock_torture_writer task
>> [   95.103238] mutex_lock-torture: lock_torture_writer task started
>> [   95.110692] Modules linked in: locktorture(O+) torture(O)
>> [   95.136699] CPU: 5 PID: 39 Comm: migration/5 Tainted: G O 6.1.0-rc2-00036-g397ce37b37a8-dirty #204
>> [   95.147137] Hardware name: ARM Juno development board (r0) (DT)
>> [   95.153105] Stopper: 0x0 <- 0x0
>> [   95.156282] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [   95.163306] pc : load_balance+0x644/0xdc0
>> [   95.167356] lr : load_balance+0x644/0xdc0
>> [   95.171405] sp : ffff80000b4cbaf0
>> ...
>> [   95.246833] Call trace:
>> [   95.249300]  load_balance+0x644/0xdc0
>> [   95.253000]  newidle_balance+0x290/0x6f0
>> [   95.256963]  pick_next_task_fair+0x2c/0x510
>> [   95.261188]  __schedule+0x1d4/0x1084
>> [   95.264801]  schedule+0x64/0x11c
>> [   95.268063]  smpboot_thread_fn+0xa4/0x270
>> [   95.272115]  kthread+0x10c/0x110
>> [   95.275375]  ret_from_fork+0x10/0x20
>> ...
>> [   95.316477] ---[ end trace 0000000000000000 ]---
>> 
>> 
>> [  134.893379] ------------[ cut here ]------------
>> [  134.898066] rq->balance_callback && rq->balance_callback != &balance_push_callback
>> [  134.898088] WARNING: CPU: 4 PID: 0 at kernel/sched/sched.h:1583 sched_rt_period_timer+0x1dc/0x3f0
>> [  134.914683] Modules linked in: locktorture(O) torture(O)
>> [  134.920059] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W  O 6.1.0-rc2-00036-g397ce37b37a8-dirty #204
>> [  134.930235] Hardware name: ARM Juno development board (r0) (DT)
>> [  134.936205] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [  134.943229] pc : sched_rt_period_timer+0x1dc/0x3f0
>> [  134.948069] lr : sched_rt_period_timer+0x1dc/0x3f0
>> [  134.952908] sp : ffff80000b2cbde0
>> ...
>> [  135.028342] Call trace:
>> [  135.030810]  sched_rt_period_timer+0x1dc/0x3f0
>> [  135.035300]  __hrtimer_run_queues+0x184/0x504
>> [  135.039700]  hrtimer_interrupt+0xe8/0x244
>> [  135.043749]  arch_timer_handler_phys+0x2c/0x44
>> [  135.048239]  handle_percpu_devid_irq+0x8c/0x150
>> [  135.052815]  generic_handle_domain_irq+0x2c/0x44
>> [  135.057480]  gic_handle_irq+0x44/0xc4
>> [  135.061180]  call_on_irq_stack+0x2c/0x5c
>> [  135.065143]  do_interrupt_handler+0x80/0x84
>> ...
>> [  135.141122] ---[ end trace 0000000000000000 ]---
>> 
>>>> With your changes to locktorture in {2-3}/3 you still run CFS
>>>> lock_torture_writers but you should see the issue popping up
>>>> in __set_cpus_allowed_ptr() (from torture_shuffle()) for example.
>>>> 
>>>> Tried with:
>>>> 
>>>> insmod /lib/modules/torture.ko
>>>> insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=1 nlocks=3
>>>>                                                                      ^^^^^^^^^^^^^^^^^
>>>> 
>>>> When changing all lock_torture_writer's to FIFO it becomes even
>>>> more visible.
>>> 
>>> Ok, thank you, I will make it more aggressively set to RT. The
>>> previous locktorture was setting RT once every minute or so, at least
>>> now that is down to 10 seconds ;-). But as you highlight with the
>>> locktorture diff below, it needs to go further.
>> 
>> Yeah, the test env is more aggressive this way and you spot potential
>> issues quicker.
>> 
>>> Anyway, this is going to be a nice holiday/Christmas project for me,
>>> so thank you in advance for the holiday gift  :-)  ^_^
>> 
>> Enjoy ;-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted
  2022-11-23  1:21 [PATCH RFC 0/3] Patches on top of PE series Joel Fernandes (Google)
  2022-11-23  1:21 ` [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate Joel Fernandes (Google)
@ 2022-11-23  1:21 ` Joel Fernandes (Google)
  2022-12-07 22:14   ` Paul E. McKenney
  2022-12-21  4:21   ` Chen Yu
  2022-11-23  1:21 ` [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable Joel Fernandes (Google)
  2 siblings, 2 replies; 18+ messages in thread
From: Joel Fernandes (Google) @ 2022-11-23  1:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Paul E. McKenney, Peter Zijlstra,
	Steven Rostedt, Valentin Schneider, Vincent Guittot, kernel-team,
	John Stultz, Joel Fernandes, Qais Yousef, Will Deacon,
	Waiman Long, Boqun Feng, Connor O'Brien

Currently RT boosting is only done for rtmutex_lock, however with proxy
execution, we also have the mutex_lock participating in priorities. To
exercise the testing better, add RT boosting to other lock testing types
as well, using a new knob (rt_boost).

Tested with boot parameters:
locktorture.torture_type=mutex_lock
locktorture.onoff_interval=1
locktorture.nwriters_stress=8
locktorture.stutter=0
locktorture.rt_boost=1
locktorture.rt_boost_factor=1
locktorture.nlocks=3

For the rtmutex test, rt_boost is always enabled even if disabling is
requested.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/locking/locktorture.c | 91 +++++++++++++++++++-----------------
 1 file changed, 48 insertions(+), 43 deletions(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index bc3557677eed..5a388ac96a9b 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -46,6 +46,7 @@ torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
 torture_param(int, stat_interval, 60,
 	     "Number of seconds between stats printk()s");
 torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
+torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
 torture_param(int, verbose, 1,
 	     "Enable verbose debugging printk()s");
 torture_param(int, nlocks, 1,
@@ -129,15 +130,44 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
 	  /* BUGGY, do not use in real life!!! */
 }
 
-static void torture_boost_dummy(struct torture_random_state *trsp)
+static void torture_rt_boost(struct torture_random_state *trsp)
 {
-	/* Only rtmutexes care about priority */
+	const unsigned int factor = 50000; /* yes, quite arbitrary */
+
+	if (!rt_boost)
+		return;
+
+	if (!rt_task(current)) {
+		/*
+		 * Boost priority once every ~50k operations. When the
+		 * task tries to take the lock, the rtmutex it will account
+		 * for the new priority, and do any corresponding pi-dance.
+		 */
+		if (trsp && !(torture_random(trsp) %
+			      (cxt.nrealwriters_stress * factor))) {
+			sched_set_fifo(current);
+		} else /* common case, do nothing */
+			return;
+	} else {
+		/*
+		 * The task will remain boosted for another ~500k operations,
+		 * then restored back to its original prio, and so forth.
+		 *
+		 * When @trsp is nil, we want to force-reset the task for
+		 * stopping the kthread.
+		 */
+		if (!trsp || !(torture_random(trsp) %
+			       (cxt.nrealwriters_stress * factor * 2))) {
+			sched_set_normal(current, 0);
+		} else /* common case, do nothing */
+			return;
+	}
 }
 
 static struct lock_torture_ops lock_busted_ops = {
 	.writelock	= torture_lock_busted_write_lock,
 	.write_delay	= torture_lock_busted_write_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_lock_busted_write_unlock,
 	.readlock       = NULL,
 	.read_delay     = NULL,
@@ -181,7 +211,7 @@ __releases(torture_spinlock)
 static struct lock_torture_ops spin_lock_ops = {
 	.writelock	= torture_spin_lock_write_lock,
 	.write_delay	= torture_spin_lock_write_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_spin_lock_write_unlock,
 	.readlock       = NULL,
 	.read_delay     = NULL,
@@ -208,7 +238,7 @@ __releases(torture_spinlock)
 static struct lock_torture_ops spin_lock_irq_ops = {
 	.writelock	= torture_spin_lock_write_lock_irq,
 	.write_delay	= torture_spin_lock_write_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_lock_spin_write_unlock_irq,
 	.readlock       = NULL,
 	.read_delay     = NULL,
@@ -277,7 +307,7 @@ __releases(torture_rwlock)
 static struct lock_torture_ops rw_lock_ops = {
 	.writelock	= torture_rwlock_write_lock,
 	.write_delay	= torture_rwlock_write_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_rwlock_write_unlock,
 	.readlock       = torture_rwlock_read_lock,
 	.read_delay     = torture_rwlock_read_delay,
@@ -320,7 +350,7 @@ __releases(torture_rwlock)
 static struct lock_torture_ops rw_lock_irq_ops = {
 	.writelock	= torture_rwlock_write_lock_irq,
 	.write_delay	= torture_rwlock_write_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_rwlock_write_unlock_irq,
 	.readlock       = torture_rwlock_read_lock_irq,
 	.read_delay     = torture_rwlock_read_delay,
@@ -362,7 +392,7 @@ __releases(torture_mutex)
 static struct lock_torture_ops mutex_lock_ops = {
 	.writelock	= torture_mutex_lock,
 	.write_delay	= torture_mutex_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_mutex_unlock,
 	.readlock       = NULL,
 	.read_delay     = NULL,
@@ -460,7 +490,7 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
 	.exit		= torture_ww_mutex_exit,
 	.writelock	= torture_ww_mutex_lock,
 	.write_delay	= torture_mutex_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_ww_mutex_unlock,
 	.readlock       = NULL,
 	.read_delay     = NULL,
@@ -471,6 +501,11 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
 #ifdef CONFIG_RT_MUTEXES
 static DEFINE_RT_MUTEX(torture_rtmutex);
 
+static void torture_rtmutex_init(void)
+{
+	rt_boost = 1;
+}
+
 static int torture_rtmutex_lock(int tid __maybe_unused)
 __acquires(torture_rtmutex)
 {
@@ -478,37 +513,6 @@ __acquires(torture_rtmutex)
 	return 0;
 }
 
-static void torture_rtmutex_boost(struct torture_random_state *trsp)
-{
-	const unsigned int factor = 50000; /* yes, quite arbitrary */
-
-	if (!rt_task(current)) {
-		/*
-		 * Boost priority once every ~50k operations. When the
-		 * task tries to take the lock, the rtmutex it will account
-		 * for the new priority, and do any corresponding pi-dance.
-		 */
-		if (trsp && !(torture_random(trsp) %
-			      (cxt.nrealwriters_stress * factor))) {
-			sched_set_fifo(current);
-		} else /* common case, do nothing */
-			return;
-	} else {
-		/*
-		 * The task will remain boosted for another ~500k operations,
-		 * then restored back to its original prio, and so forth.
-		 *
-		 * When @trsp is nil, we want to force-reset the task for
-		 * stopping the kthread.
-		 */
-		if (!trsp || !(torture_random(trsp) %
-			       (cxt.nrealwriters_stress * factor * 2))) {
-			sched_set_normal(current, 0);
-		} else /* common case, do nothing */
-			return;
-	}
-}
-
 static void torture_rtmutex_delay(struct torture_random_state *trsp)
 {
 	const unsigned long shortdelay_us = 2;
@@ -535,9 +539,10 @@ __releases(torture_rtmutex)
 }
 
 static struct lock_torture_ops rtmutex_lock_ops = {
+	.init		= torture_rtmutex_init,
 	.writelock	= torture_rtmutex_lock,
 	.write_delay	= torture_rtmutex_delay,
-	.task_boost     = torture_rtmutex_boost,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_rtmutex_unlock,
 	.readlock       = NULL,
 	.read_delay     = NULL,
@@ -604,7 +609,7 @@ __releases(torture_rwsem)
 static struct lock_torture_ops rwsem_lock_ops = {
 	.writelock	= torture_rwsem_down_write,
 	.write_delay	= torture_rwsem_write_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_rwsem_up_write,
 	.readlock       = torture_rwsem_down_read,
 	.read_delay     = torture_rwsem_read_delay,
@@ -656,7 +661,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
 	.exit		= torture_percpu_rwsem_exit,
 	.writelock	= torture_percpu_rwsem_down_write,
 	.write_delay	= torture_rwsem_write_delay,
-	.task_boost     = torture_boost_dummy,
+	.task_boost     = torture_rt_boost,
 	.writeunlock	= torture_percpu_rwsem_up_write,
 	.readlock       = torture_percpu_rwsem_down_read,
 	.read_delay     = torture_rwsem_read_delay,
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted
  2022-11-23  1:21 ` [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted Joel Fernandes (Google)
@ 2022-12-07 22:14   ` Paul E. McKenney
  2022-12-07 22:23     ` Joel Fernandes
  2022-12-21  4:21   ` Chen Yu
  1 sibling, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-07 22:14 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

On Wed, Nov 23, 2022 at 01:21:03AM +0000, Joel Fernandes (Google) wrote:
> Currently RT boosting is only done for rtmutex_lock, however with proxy
> execution, we also have the mutex_lock participating in priorities. To
> exercise the testing better, add RT boosting to other lock testing types
> as well, using a new knob (rt_boost).
> 
> Tested with boot parameters:
> locktorture.torture_type=mutex_lock
> locktorture.onoff_interval=1
> locktorture.nwriters_stress=8
> locktorture.stutter=0
> locktorture.rt_boost=1
> locktorture.rt_boost_factor=1
> locktorture.nlocks=3
> 
> For the rtmutex test, rt_boost is always enabled even if disabling is
> requested.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/locking/locktorture.c | 91 +++++++++++++++++++-----------------
>  1 file changed, 48 insertions(+), 43 deletions(-)
> 
> diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> index bc3557677eed..5a388ac96a9b 100644
> --- a/kernel/locking/locktorture.c
> +++ b/kernel/locking/locktorture.c
> @@ -46,6 +46,7 @@ torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
>  torture_param(int, stat_interval, 60,
>  	     "Number of seconds between stats printk()s");
>  torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
> +torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
>  torture_param(int, verbose, 1,
>  	     "Enable verbose debugging printk()s");
>  torture_param(int, nlocks, 1,
> @@ -129,15 +130,44 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
>  	  /* BUGGY, do not use in real life!!! */
>  }
>  
> -static void torture_boost_dummy(struct torture_random_state *trsp)

We no longer have torture_boot_dummy().  Is the point that the
"spinlocks" to priority boosting in PREEMPT_RT kernels?  If so,
would it make sense to do something like this for spinlock?

	.task_boost     = IS_ENABLED(CONFIG_PREEMPT_RT) ? torture_rt_boost : torture_boost_dummy,

Or maybe using a similar approach for the default value of the rt_boost
module parameter?

Or is there some benefit of priority boosting for spinlocks even in
non-PREEMPT_RT kernels that I am missing?

> +static void torture_rt_boost(struct torture_random_state *trsp)
>  {
> -	/* Only rtmutexes care about priority */
> +	const unsigned int factor = 50000; /* yes, quite arbitrary */

OK, this one looks like code movement combined with 50000 being named
"factor".  Whoever originally wrote these comments needs to have done
a better job.  ;-)

> +
> +	if (!rt_boost)
> +		return;
> +
> +	if (!rt_task(current)) {
> +		/*
> +		 * Boost priority once every ~50k operations. When the
> +		 * task tries to take the lock, the rtmutex it will account
> +		 * for the new priority, and do any corresponding pi-dance.
> +		 */
> +		if (trsp && !(torture_random(trsp) %
> +			      (cxt.nrealwriters_stress * factor))) {
> +			sched_set_fifo(current);
> +		} else /* common case, do nothing */
> +			return;
> +	} else {
> +		/*
> +		 * The task will remain boosted for another ~500k operations,
> +		 * then restored back to its original prio, and so forth.
> +		 *
> +		 * When @trsp is nil, we want to force-reset the task for
> +		 * stopping the kthread.
> +		 */
> +		if (!trsp || !(torture_random(trsp) %
> +			       (cxt.nrealwriters_stress * factor * 2))) {
> +			sched_set_normal(current, 0);
> +		} else /* common case, do nothing */
> +			return;
> +	}
>  }
>  
>  static struct lock_torture_ops lock_busted_ops = {
>  	.writelock	= torture_lock_busted_write_lock,
>  	.write_delay	= torture_lock_busted_write_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_lock_busted_write_unlock,
>  	.readlock       = NULL,
>  	.read_delay     = NULL,
> @@ -181,7 +211,7 @@ __releases(torture_spinlock)
>  static struct lock_torture_ops spin_lock_ops = {
>  	.writelock	= torture_spin_lock_write_lock,
>  	.write_delay	= torture_spin_lock_write_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_spin_lock_write_unlock,
>  	.readlock       = NULL,
>  	.read_delay     = NULL,
> @@ -208,7 +238,7 @@ __releases(torture_spinlock)
>  static struct lock_torture_ops spin_lock_irq_ops = {
>  	.writelock	= torture_spin_lock_write_lock_irq,
>  	.write_delay	= torture_spin_lock_write_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_lock_spin_write_unlock_irq,
>  	.readlock       = NULL,
>  	.read_delay     = NULL,
> @@ -277,7 +307,7 @@ __releases(torture_rwlock)
>  static struct lock_torture_ops rw_lock_ops = {
>  	.writelock	= torture_rwlock_write_lock,
>  	.write_delay	= torture_rwlock_write_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_rwlock_write_unlock,
>  	.readlock       = torture_rwlock_read_lock,
>  	.read_delay     = torture_rwlock_read_delay,
> @@ -320,7 +350,7 @@ __releases(torture_rwlock)
>  static struct lock_torture_ops rw_lock_irq_ops = {
>  	.writelock	= torture_rwlock_write_lock_irq,
>  	.write_delay	= torture_rwlock_write_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_rwlock_write_unlock_irq,
>  	.readlock       = torture_rwlock_read_lock_irq,
>  	.read_delay     = torture_rwlock_read_delay,
> @@ -362,7 +392,7 @@ __releases(torture_mutex)
>  static struct lock_torture_ops mutex_lock_ops = {
>  	.writelock	= torture_mutex_lock,
>  	.write_delay	= torture_mutex_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_mutex_unlock,
>  	.readlock       = NULL,
>  	.read_delay     = NULL,
> @@ -460,7 +490,7 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
>  	.exit		= torture_ww_mutex_exit,
>  	.writelock	= torture_ww_mutex_lock,
>  	.write_delay	= torture_mutex_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_ww_mutex_unlock,
>  	.readlock       = NULL,
>  	.read_delay     = NULL,
> @@ -471,6 +501,11 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
>  #ifdef CONFIG_RT_MUTEXES
>  static DEFINE_RT_MUTEX(torture_rtmutex);
>  
> +static void torture_rtmutex_init(void)
> +{
> +	rt_boost = 1;
> +}
> +
>  static int torture_rtmutex_lock(int tid __maybe_unused)
>  __acquires(torture_rtmutex)
>  {
> @@ -478,37 +513,6 @@ __acquires(torture_rtmutex)
>  	return 0;
>  }
>  
> -static void torture_rtmutex_boost(struct torture_random_state *trsp)
> -{
> -	const unsigned int factor = 50000; /* yes, quite arbitrary */
> -
> -	if (!rt_task(current)) {
> -		/*
> -		 * Boost priority once every ~50k operations. When the
> -		 * task tries to take the lock, the rtmutex it will account
> -		 * for the new priority, and do any corresponding pi-dance.
> -		 */
> -		if (trsp && !(torture_random(trsp) %
> -			      (cxt.nrealwriters_stress * factor))) {
> -			sched_set_fifo(current);
> -		} else /* common case, do nothing */
> -			return;
> -	} else {
> -		/*
> -		 * The task will remain boosted for another ~500k operations,
> -		 * then restored back to its original prio, and so forth.
> -		 *
> -		 * When @trsp is nil, we want to force-reset the task for
> -		 * stopping the kthread.
> -		 */
> -		if (!trsp || !(torture_random(trsp) %
> -			       (cxt.nrealwriters_stress * factor * 2))) {
> -			sched_set_normal(current, 0);
> -		} else /* common case, do nothing */
> -			return;
> -	}
> -}
> -
>  static void torture_rtmutex_delay(struct torture_random_state *trsp)
>  {
>  	const unsigned long shortdelay_us = 2;
> @@ -535,9 +539,10 @@ __releases(torture_rtmutex)
>  }
>  
>  static struct lock_torture_ops rtmutex_lock_ops = {
> +	.init		= torture_rtmutex_init,

OK, so rt_boost defaults on for rtmutex.  In fact, it cannot be disabled,
which might make things more difficult for debugging.

Another approach would to do something similar to the test_boost module
parameter for RCU.  This defaults to "1", which means "Boost if it
makes sense in this situation".  It can be set to "0", which means
"Never boost", and also to "2", which means "Boost even if it makes no
sense to do so.  This last helps verify rcutorture's ability to detect
boost failures.  There is a can_boost field in the rcu_torture_ops
structure that defines when it makes sense to boost, and this field
is initialized based on CONFIG_RCU_BOOST.

In this case, it makes sense to boost rt_mutex always, and it makes
sense to boost exclusive spinlocks in PREEMPT_RT kernels.  It might make
sense to boost reader-writer spinlock situations involving only writers,
but that would likely require additional changes.

Or is there some reason why this approach would not work well?

							Thanx, Paul

>  	.writelock	= torture_rtmutex_lock,
>  	.write_delay	= torture_rtmutex_delay,
> -	.task_boost     = torture_rtmutex_boost,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_rtmutex_unlock,
>  	.readlock       = NULL,
>  	.read_delay     = NULL,
> @@ -604,7 +609,7 @@ __releases(torture_rwsem)
>  static struct lock_torture_ops rwsem_lock_ops = {
>  	.writelock	= torture_rwsem_down_write,
>  	.write_delay	= torture_rwsem_write_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_rwsem_up_write,
>  	.readlock       = torture_rwsem_down_read,
>  	.read_delay     = torture_rwsem_read_delay,
> @@ -656,7 +661,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
>  	.exit		= torture_percpu_rwsem_exit,
>  	.writelock	= torture_percpu_rwsem_down_write,
>  	.write_delay	= torture_rwsem_write_delay,
> -	.task_boost     = torture_boost_dummy,
> +	.task_boost     = torture_rt_boost,
>  	.writeunlock	= torture_percpu_rwsem_up_write,
>  	.readlock       = torture_percpu_rwsem_down_read,
>  	.read_delay     = torture_rwsem_read_delay,
> -- 
> 2.38.1.584.g0f3c55d4c2-goog
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted
  2022-12-07 22:14   ` Paul E. McKenney
@ 2022-12-07 22:23     ` Joel Fernandes
  2022-12-07 22:42       ` Joel Fernandes
  2022-12-08  5:06       ` Paul E. McKenney
  0 siblings, 2 replies; 18+ messages in thread
From: Joel Fernandes @ 2022-12-07 22:23 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

Hi Paul,

On Wed, Dec 7, 2022 at 10:14 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Nov 23, 2022 at 01:21:03AM +0000, Joel Fernandes (Google) wrote:
> > Currently RT boosting is only done for rtmutex_lock, however with proxy
> > execution, we also have the mutex_lock participating in priorities. To
> > exercise the testing better, add RT boosting to other lock testing types
> > as well, using a new knob (rt_boost).
> >
> > Tested with boot parameters:
> > locktorture.torture_type=mutex_lock
> > locktorture.onoff_interval=1
> > locktorture.nwriters_stress=8
> > locktorture.stutter=0
> > locktorture.rt_boost=1
> > locktorture.rt_boost_factor=1
> > locktorture.nlocks=3
> >
> > For the rtmutex test, rt_boost is always enabled even if disabling is
> > requested.
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/locking/locktorture.c | 91 +++++++++++++++++++-----------------
> >  1 file changed, 48 insertions(+), 43 deletions(-)
> >
> > diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> > index bc3557677eed..5a388ac96a9b 100644
> > --- a/kernel/locking/locktorture.c
> > +++ b/kernel/locking/locktorture.c
> > @@ -46,6 +46,7 @@ torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
> >  torture_param(int, stat_interval, 60,
> >            "Number of seconds between stats printk()s");
> >  torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
> > +torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
> >  torture_param(int, verbose, 1,
> >            "Enable verbose debugging printk()s");
> >  torture_param(int, nlocks, 1,
> > @@ -129,15 +130,44 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
> >         /* BUGGY, do not use in real life!!! */
> >  }
> >
> > -static void torture_boost_dummy(struct torture_random_state *trsp)
>
> We no longer have torture_boot_dummy().  Is the point that the
> "spinlocks" to priority boosting in PREEMPT_RT kernels?  If so,
> would it make sense to do something like this for spinlock?
>
>         .task_boost     = IS_ENABLED(CONFIG_PREEMPT_RT) ? torture_rt_boost : torture_boost_dummy,
>
> Or maybe using a similar approach for the default value of the rt_boost
> module parameter?
>
> Or is there some benefit of priority boosting for spinlocks even in
> non-PREEMPT_RT kernels that I am missing?

There are 2 advantages as far as I can see:

1. The shuffle thread which ends up in setscheduler exercises the same
path as the rt mutex boost, so that would test races with that and the
boost path.

2. In the future, proxy execution deals with migrations, and changes
of the tasks' class there can race with boosting / and schedule().

If there is no harm, I would like us to keep torture_rt_boost even in
!PREEMPT_RT, just so we can shake bugs out more. Thoughts?

> > +static void torture_rt_boost(struct torture_random_state *trsp)
> >  {
> > -     /* Only rtmutexes care about priority */
> > +     const unsigned int factor = 50000; /* yes, quite arbitrary */
>
> OK, this one looks like code movement combined with 50000 being named
> "factor".  Whoever originally wrote these comments needs to have done
> a better job.  ;-)

True, I will adjust the comments in v2 :)

> > +
> > +     if (!rt_boost)
> > +             return;
> > +
> > +     if (!rt_task(current)) {
> > +             /*
> > +              * Boost priority once every ~50k operations. When the
> > +              * task tries to take the lock, the rtmutex it will account
> > +              * for the new priority, and do any corresponding pi-dance.
> > +              */
> > +             if (trsp && !(torture_random(trsp) %
> > +                           (cxt.nrealwriters_stress * factor))) {
> > +                     sched_set_fifo(current);
> > +             } else /* common case, do nothing */
> > +                     return;
> > +     } else {
> > +             /*
> > +              * The task will remain boosted for another ~500k operations,
> > +              * then restored back to its original prio, and so forth.
> > +              *
> > +              * When @trsp is nil, we want to force-reset the task for
> > +              * stopping the kthread.
> > +              */
> > +             if (!trsp || !(torture_random(trsp) %
> > +                            (cxt.nrealwriters_stress * factor * 2))) {
> > +                     sched_set_normal(current, 0);
> > +             } else /* common case, do nothing */
> > +                     return;
> > +     }
> >  }
> >
> >  static struct lock_torture_ops lock_busted_ops = {
> >       .writelock      = torture_lock_busted_write_lock,
> >       .write_delay    = torture_lock_busted_write_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_lock_busted_write_unlock,
> >       .readlock       = NULL,
> >       .read_delay     = NULL,
> > @@ -181,7 +211,7 @@ __releases(torture_spinlock)
> >  static struct lock_torture_ops spin_lock_ops = {
> >       .writelock      = torture_spin_lock_write_lock,
> >       .write_delay    = torture_spin_lock_write_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_spin_lock_write_unlock,
> >       .readlock       = NULL,
> >       .read_delay     = NULL,
> > @@ -208,7 +238,7 @@ __releases(torture_spinlock)
> >  static struct lock_torture_ops spin_lock_irq_ops = {
> >       .writelock      = torture_spin_lock_write_lock_irq,
> >       .write_delay    = torture_spin_lock_write_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_lock_spin_write_unlock_irq,
> >       .readlock       = NULL,
> >       .read_delay     = NULL,
> > @@ -277,7 +307,7 @@ __releases(torture_rwlock)
> >  static struct lock_torture_ops rw_lock_ops = {
> >       .writelock      = torture_rwlock_write_lock,
> >       .write_delay    = torture_rwlock_write_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_rwlock_write_unlock,
> >       .readlock       = torture_rwlock_read_lock,
> >       .read_delay     = torture_rwlock_read_delay,
> > @@ -320,7 +350,7 @@ __releases(torture_rwlock)
> >  static struct lock_torture_ops rw_lock_irq_ops = {
> >       .writelock      = torture_rwlock_write_lock_irq,
> >       .write_delay    = torture_rwlock_write_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_rwlock_write_unlock_irq,
> >       .readlock       = torture_rwlock_read_lock_irq,
> >       .read_delay     = torture_rwlock_read_delay,
> > @@ -362,7 +392,7 @@ __releases(torture_mutex)
> >  static struct lock_torture_ops mutex_lock_ops = {
> >       .writelock      = torture_mutex_lock,
> >       .write_delay    = torture_mutex_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_mutex_unlock,
> >       .readlock       = NULL,
> >       .read_delay     = NULL,
> > @@ -460,7 +490,7 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
> >       .exit           = torture_ww_mutex_exit,
> >       .writelock      = torture_ww_mutex_lock,
> >       .write_delay    = torture_mutex_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_ww_mutex_unlock,
> >       .readlock       = NULL,
> >       .read_delay     = NULL,
> > @@ -471,6 +501,11 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
> >  #ifdef CONFIG_RT_MUTEXES
> >  static DEFINE_RT_MUTEX(torture_rtmutex);
> >
> > +static void torture_rtmutex_init(void)
> > +{
> > +     rt_boost = 1;
> > +}
> > +
> >  static int torture_rtmutex_lock(int tid __maybe_unused)
> >  __acquires(torture_rtmutex)
> >  {
> > @@ -478,37 +513,6 @@ __acquires(torture_rtmutex)
> >       return 0;
> >  }
> >
> > -static void torture_rtmutex_boost(struct torture_random_state *trsp)
> > -{
> > -     const unsigned int factor = 50000; /* yes, quite arbitrary */
> > -
> > -     if (!rt_task(current)) {
> > -             /*
> > -              * Boost priority once every ~50k operations. When the
> > -              * task tries to take the lock, the rtmutex it will account
> > -              * for the new priority, and do any corresponding pi-dance.
> > -              */
> > -             if (trsp && !(torture_random(trsp) %
> > -                           (cxt.nrealwriters_stress * factor))) {
> > -                     sched_set_fifo(current);
> > -             } else /* common case, do nothing */
> > -                     return;
> > -     } else {
> > -             /*
> > -              * The task will remain boosted for another ~500k operations,
> > -              * then restored back to its original prio, and so forth.
> > -              *
> > -              * When @trsp is nil, we want to force-reset the task for
> > -              * stopping the kthread.
> > -              */
> > -             if (!trsp || !(torture_random(trsp) %
> > -                            (cxt.nrealwriters_stress * factor * 2))) {
> > -                     sched_set_normal(current, 0);
> > -             } else /* common case, do nothing */
> > -                     return;
> > -     }
> > -}
> > -
> >  static void torture_rtmutex_delay(struct torture_random_state *trsp)
> >  {
> >       const unsigned long shortdelay_us = 2;
> > @@ -535,9 +539,10 @@ __releases(torture_rtmutex)
> >  }
> >
> >  static struct lock_torture_ops rtmutex_lock_ops = {
> > +     .init           = torture_rtmutex_init,
>
> OK, so rt_boost defaults on for rtmutex.  In fact, it cannot be disabled,
> which might make things more difficult for debugging.

Ah ok, true. I was hoping the number of users who want it off for
rtmutex would be ~0 :-D

> Another approach would to do something similar to the test_boost module
> parameter for RCU.  This defaults to "1", which means "Boost if it
> makes sense in this situation".  It can be set to "0", which means
> "Never boost", and also to "2", which means "Boost even if it makes no
> sense to do so.  This last helps verify rcutorture's ability to detect
> boost failures.  There is a can_boost field in the rcu_torture_ops
> structure that defines when it makes sense to boost, and this field
> is initialized based on CONFIG_RCU_BOOST.
>
> In this case, it makes sense to boost rt_mutex always, and it makes
> sense to boost exclusive spinlocks in PREEMPT_RT kernels.  It might make
> sense to boost reader-writer spinlock situations involving only writers,
> but that would likely require additional changes.
>
> Or is there some reason why this approach would not work well?

I am thinking let us default to always boosting, for the reasons
mentioned above, and also because it will exercise more scheduler
paths and shake out bugs.

Thoughts?

thanks,

  - Joel

>
>                                                         Thanx, Paul
>
> >       .writelock      = torture_rtmutex_lock,
> >       .write_delay    = torture_rtmutex_delay,
> > -     .task_boost     = torture_rtmutex_boost,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_rtmutex_unlock,
> >       .readlock       = NULL,
> >       .read_delay     = NULL,
> > @@ -604,7 +609,7 @@ __releases(torture_rwsem)
> >  static struct lock_torture_ops rwsem_lock_ops = {
> >       .writelock      = torture_rwsem_down_write,
> >       .write_delay    = torture_rwsem_write_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_rwsem_up_write,
> >       .readlock       = torture_rwsem_down_read,
> >       .read_delay     = torture_rwsem_read_delay,
> > @@ -656,7 +661,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
> >       .exit           = torture_percpu_rwsem_exit,
> >       .writelock      = torture_percpu_rwsem_down_write,
> >       .write_delay    = torture_rwsem_write_delay,
> > -     .task_boost     = torture_boost_dummy,
> > +     .task_boost     = torture_rt_boost,
> >       .writeunlock    = torture_percpu_rwsem_up_write,
> >       .readlock       = torture_percpu_rwsem_down_read,
> >       .read_delay     = torture_rwsem_read_delay,
> > --
> > 2.38.1.584.g0f3c55d4c2-goog
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted
  2022-12-07 22:23     ` Joel Fernandes
@ 2022-12-07 22:42       ` Joel Fernandes
  2022-12-08  5:06       ` Paul E. McKenney
  1 sibling, 0 replies; 18+ messages in thread
From: Joel Fernandes @ 2022-12-07 22:42 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: paulmck, linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

On Wed, Dec 7, 2022 at 10:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> Hi Paul,
>
> On Wed, Dec 7, 2022 at 10:14 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Nov 23, 2022 at 01:21:03AM +0000, Joel Fernandes (Google) wrote:
> > > Currently RT boosting is only done for rtmutex_lock, however with proxy
> > > execution, we also have the mutex_lock participating in priorities. To
> > > exercise the testing better, add RT boosting to other lock testing types
> > > as well, using a new knob (rt_boost).
> > >
> > > Tested with boot parameters:
> > > locktorture.torture_type=mutex_lock
> > > locktorture.onoff_interval=1
> > > locktorture.nwriters_stress=8
> > > locktorture.stutter=0
> > > locktorture.rt_boost=1
> > > locktorture.rt_boost_factor=1
> > > locktorture.nlocks=3
> > >
> > > For the rtmutex test, rt_boost is always enabled even if disabling is
> > > requested.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/locking/locktorture.c | 91 +++++++++++++++++++-----------------
> > >  1 file changed, 48 insertions(+), 43 deletions(-)
> > >
> > > diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> > > index bc3557677eed..5a388ac96a9b 100644
> > > --- a/kernel/locking/locktorture.c
> > > +++ b/kernel/locking/locktorture.c
> > > @@ -46,6 +46,7 @@ torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
> > >  torture_param(int, stat_interval, 60,
> > >            "Number of seconds between stats printk()s");
> > >  torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
> > > +torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
> > >  torture_param(int, verbose, 1,
> > >            "Enable verbose debugging printk()s");
> > >  torture_param(int, nlocks, 1,
> > > @@ -129,15 +130,44 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
> > >         /* BUGGY, do not use in real life!!! */
> > >  }
> > >
> > > -static void torture_boost_dummy(struct torture_random_state *trsp)
> >
> > We no longer have torture_boot_dummy().  Is the point that the
> > "spinlocks" to priority boosting in PREEMPT_RT kernels?  If so,
> > would it make sense to do something like this for spinlock?
> >
> >         .task_boost     = IS_ENABLED(CONFIG_PREEMPT_RT) ? torture_rt_boost : torture_boost_dummy,
> >
> > Or maybe using a similar approach for the default value of the rt_boost
> > module parameter?
> >
> > Or is there some benefit of priority boosting for spinlocks even in
> > non-PREEMPT_RT kernels that I am missing?
>
> There are 2 advantages as far as I can see:
>
> 1. The shuffle thread which ends up in setscheduler exercises the same
> path as the rt mutex boost, so that would test races with that and the
> boost path.

I was too hasty in my reply, I meant here the thread's "RT boost"
racing with other paths such as migration / schedule() / other
setscheduler() paths.

Obviously rtmutex boost does not happen when there is no rtmutex to begin with.

Thanks,

  - Joel


>
> >
> >                                                         Thanx, Paul
> >
> > >       .writelock      = torture_rtmutex_lock,
> > >       .write_delay    = torture_rtmutex_delay,
> > > -     .task_boost     = torture_rtmutex_boost,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_rtmutex_unlock,
> > >       .readlock       = NULL,
> > >       .read_delay     = NULL,
> > > @@ -604,7 +609,7 @@ __releases(torture_rwsem)
> > >  static struct lock_torture_ops rwsem_lock_ops = {
> > >       .writelock      = torture_rwsem_down_write,
> > >       .write_delay    = torture_rwsem_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_rwsem_up_write,
> > >       .readlock       = torture_rwsem_down_read,
> > >       .read_delay     = torture_rwsem_read_delay,
> > > @@ -656,7 +661,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
> > >       .exit           = torture_percpu_rwsem_exit,
> > >       .writelock      = torture_percpu_rwsem_down_write,
> > >       .write_delay    = torture_rwsem_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_percpu_rwsem_up_write,
> > >       .readlock       = torture_percpu_rwsem_down_read,
> > >       .read_delay     = torture_rwsem_read_delay,
> > > --
> > > 2.38.1.584.g0f3c55d4c2-goog
> > >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted
  2022-12-07 22:23     ` Joel Fernandes
  2022-12-07 22:42       ` Joel Fernandes
@ 2022-12-08  5:06       ` Paul E. McKenney
  1 sibling, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-08  5:06 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

On Wed, Dec 07, 2022 at 10:23:05PM +0000, Joel Fernandes wrote:
> Hi Paul,
> 
> On Wed, Dec 7, 2022 at 10:14 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Nov 23, 2022 at 01:21:03AM +0000, Joel Fernandes (Google) wrote:
> > > Currently RT boosting is only done for rtmutex_lock, however with proxy
> > > execution, we also have the mutex_lock participating in priorities. To
> > > exercise the testing better, add RT boosting to other lock testing types
> > > as well, using a new knob (rt_boost).
> > >
> > > Tested with boot parameters:
> > > locktorture.torture_type=mutex_lock
> > > locktorture.onoff_interval=1
> > > locktorture.nwriters_stress=8
> > > locktorture.stutter=0
> > > locktorture.rt_boost=1
> > > locktorture.rt_boost_factor=1
> > > locktorture.nlocks=3
> > >
> > > For the rtmutex test, rt_boost is always enabled even if disabling is
> > > requested.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/locking/locktorture.c | 91 +++++++++++++++++++-----------------
> > >  1 file changed, 48 insertions(+), 43 deletions(-)
> > >
> > > diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> > > index bc3557677eed..5a388ac96a9b 100644
> > > --- a/kernel/locking/locktorture.c
> > > +++ b/kernel/locking/locktorture.c
> > > @@ -46,6 +46,7 @@ torture_param(int, shutdown_secs, 0, "Shutdown time (j), <= zero to disable.");
> > >  torture_param(int, stat_interval, 60,
> > >            "Number of seconds between stats printk()s");
> > >  torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
> > > +torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
> > >  torture_param(int, verbose, 1,
> > >            "Enable verbose debugging printk()s");
> > >  torture_param(int, nlocks, 1,
> > > @@ -129,15 +130,44 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
> > >         /* BUGGY, do not use in real life!!! */
> > >  }
> > >
> > > -static void torture_boost_dummy(struct torture_random_state *trsp)
> >
> > We no longer have torture_boot_dummy().  Is the point that the
> > "spinlocks" to priority boosting in PREEMPT_RT kernels?  If so,
> > would it make sense to do something like this for spinlock?
> >
> >         .task_boost     = IS_ENABLED(CONFIG_PREEMPT_RT) ? torture_rt_boost : torture_boost_dummy,
> >
> > Or maybe using a similar approach for the default value of the rt_boost
> > module parameter?
> >
> > Or is there some benefit of priority boosting for spinlocks even in
> > non-PREEMPT_RT kernels that I am missing?
> 
> There are 2 advantages as far as I can see:
> 
> 1. The shuffle thread which ends up in setscheduler exercises the same
> path as the rt mutex boost, so that would test races with that and the
> boost path.
> 
> 2. In the future, proxy execution deals with migrations, and changes
> of the tasks' class there can race with boosting / and schedule().
> 
> If there is no harm, I would like us to keep torture_rt_boost even in
> !PREEMPT_RT, just so we can shake bugs out more. Thoughts?

We need to give people the choice.  Not that I expect that all
that many people run locktorture...

> > > +static void torture_rt_boost(struct torture_random_state *trsp)
> > >  {
> > > -     /* Only rtmutexes care about priority */
> > > +     const unsigned int factor = 50000; /* yes, quite arbitrary */
> >
> > OK, this one looks like code movement combined with 50000 being named
> > "factor".  Whoever originally wrote these comments needs to have done
> > a better job.  ;-)
> 
> True, I will adjust the comments in v2 :)

Thank you for volunteering to clean up my mess.  ;-)

> > > +
> > > +     if (!rt_boost)
> > > +             return;
> > > +
> > > +     if (!rt_task(current)) {
> > > +             /*
> > > +              * Boost priority once every ~50k operations. When the
> > > +              * task tries to take the lock, the rtmutex it will account
> > > +              * for the new priority, and do any corresponding pi-dance.
> > > +              */
> > > +             if (trsp && !(torture_random(trsp) %
> > > +                           (cxt.nrealwriters_stress * factor))) {
> > > +                     sched_set_fifo(current);
> > > +             } else /* common case, do nothing */
> > > +                     return;
> > > +     } else {
> > > +             /*
> > > +              * The task will remain boosted for another ~500k operations,
> > > +              * then restored back to its original prio, and so forth.
> > > +              *
> > > +              * When @trsp is nil, we want to force-reset the task for
> > > +              * stopping the kthread.
> > > +              */
> > > +             if (!trsp || !(torture_random(trsp) %
> > > +                            (cxt.nrealwriters_stress * factor * 2))) {
> > > +                     sched_set_normal(current, 0);
> > > +             } else /* common case, do nothing */
> > > +                     return;
> > > +     }
> > >  }
> > >
> > >  static struct lock_torture_ops lock_busted_ops = {
> > >       .writelock      = torture_lock_busted_write_lock,
> > >       .write_delay    = torture_lock_busted_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_lock_busted_write_unlock,
> > >       .readlock       = NULL,
> > >       .read_delay     = NULL,
> > > @@ -181,7 +211,7 @@ __releases(torture_spinlock)
> > >  static struct lock_torture_ops spin_lock_ops = {
> > >       .writelock      = torture_spin_lock_write_lock,
> > >       .write_delay    = torture_spin_lock_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_spin_lock_write_unlock,
> > >       .readlock       = NULL,
> > >       .read_delay     = NULL,
> > > @@ -208,7 +238,7 @@ __releases(torture_spinlock)
> > >  static struct lock_torture_ops spin_lock_irq_ops = {
> > >       .writelock      = torture_spin_lock_write_lock_irq,
> > >       .write_delay    = torture_spin_lock_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_lock_spin_write_unlock_irq,
> > >       .readlock       = NULL,
> > >       .read_delay     = NULL,
> > > @@ -277,7 +307,7 @@ __releases(torture_rwlock)
> > >  static struct lock_torture_ops rw_lock_ops = {
> > >       .writelock      = torture_rwlock_write_lock,
> > >       .write_delay    = torture_rwlock_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_rwlock_write_unlock,
> > >       .readlock       = torture_rwlock_read_lock,
> > >       .read_delay     = torture_rwlock_read_delay,
> > > @@ -320,7 +350,7 @@ __releases(torture_rwlock)
> > >  static struct lock_torture_ops rw_lock_irq_ops = {
> > >       .writelock      = torture_rwlock_write_lock_irq,
> > >       .write_delay    = torture_rwlock_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_rwlock_write_unlock_irq,
> > >       .readlock       = torture_rwlock_read_lock_irq,
> > >       .read_delay     = torture_rwlock_read_delay,
> > > @@ -362,7 +392,7 @@ __releases(torture_mutex)
> > >  static struct lock_torture_ops mutex_lock_ops = {
> > >       .writelock      = torture_mutex_lock,
> > >       .write_delay    = torture_mutex_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_mutex_unlock,
> > >       .readlock       = NULL,
> > >       .read_delay     = NULL,
> > > @@ -460,7 +490,7 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
> > >       .exit           = torture_ww_mutex_exit,
> > >       .writelock      = torture_ww_mutex_lock,
> > >       .write_delay    = torture_mutex_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_ww_mutex_unlock,
> > >       .readlock       = NULL,
> > >       .read_delay     = NULL,
> > > @@ -471,6 +501,11 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
> > >  #ifdef CONFIG_RT_MUTEXES
> > >  static DEFINE_RT_MUTEX(torture_rtmutex);
> > >
> > > +static void torture_rtmutex_init(void)
> > > +{
> > > +     rt_boost = 1;
> > > +}
> > > +
> > >  static int torture_rtmutex_lock(int tid __maybe_unused)
> > >  __acquires(torture_rtmutex)
> > >  {
> > > @@ -478,37 +513,6 @@ __acquires(torture_rtmutex)
> > >       return 0;
> > >  }
> > >
> > > -static void torture_rtmutex_boost(struct torture_random_state *trsp)
> > > -{
> > > -     const unsigned int factor = 50000; /* yes, quite arbitrary */
> > > -
> > > -     if (!rt_task(current)) {
> > > -             /*
> > > -              * Boost priority once every ~50k operations. When the
> > > -              * task tries to take the lock, the rtmutex it will account
> > > -              * for the new priority, and do any corresponding pi-dance.
> > > -              */
> > > -             if (trsp && !(torture_random(trsp) %
> > > -                           (cxt.nrealwriters_stress * factor))) {
> > > -                     sched_set_fifo(current);
> > > -             } else /* common case, do nothing */
> > > -                     return;
> > > -     } else {
> > > -             /*
> > > -              * The task will remain boosted for another ~500k operations,
> > > -              * then restored back to its original prio, and so forth.
> > > -              *
> > > -              * When @trsp is nil, we want to force-reset the task for
> > > -              * stopping the kthread.
> > > -              */
> > > -             if (!trsp || !(torture_random(trsp) %
> > > -                            (cxt.nrealwriters_stress * factor * 2))) {
> > > -                     sched_set_normal(current, 0);
> > > -             } else /* common case, do nothing */
> > > -                     return;
> > > -     }
> > > -}
> > > -
> > >  static void torture_rtmutex_delay(struct torture_random_state *trsp)
> > >  {
> > >       const unsigned long shortdelay_us = 2;
> > > @@ -535,9 +539,10 @@ __releases(torture_rtmutex)
> > >  }
> > >
> > >  static struct lock_torture_ops rtmutex_lock_ops = {
> > > +     .init           = torture_rtmutex_init,
> >
> > OK, so rt_boost defaults on for rtmutex.  In fact, it cannot be disabled,
> > which might make things more difficult for debugging.
> 
> Ah ok, true. I was hoping the number of users who want it off for
> rtmutex would be ~0 :-D

Yes, if someone was getting an rtmutex failure, an immediate question
would very likely be "does this happen without quite so much priority
boosting in the picture".  ;-)

> > Another approach would to do something similar to the test_boost module
> > parameter for RCU.  This defaults to "1", which means "Boost if it
> > makes sense in this situation".  It can be set to "0", which means
> > "Never boost", and also to "2", which means "Boost even if it makes no
> > sense to do so.  This last helps verify rcutorture's ability to detect
> > boost failures.  There is a can_boost field in the rcu_torture_ops
> > structure that defines when it makes sense to boost, and this field
> > is initialized based on CONFIG_RCU_BOOST.
> >
> > In this case, it makes sense to boost rt_mutex always, and it makes
> > sense to boost exclusive spinlocks in PREEMPT_RT kernels.  It might make
> > sense to boost reader-writer spinlock situations involving only writers,
> > but that would likely require additional changes.
> >
> > Or is there some reason why this approach would not work well?
> 
> I am thinking let us default to always boosting, for the reasons
> mentioned above, and also because it will exercise more scheduler
> paths and shake out bugs.
> 
> Thoughts?

Why not put that choice into the scripting?  For example, the main way
I run locktorture is from torture.sh, which could specify this setup.

Perhaps more to the point, the locktorture default could be made to be
"boost even if it does not make sense", while still allowing people to
easily choose other options.

							Thanx, Paul

> thanks,
> 
>   - Joel
> 
> >
> >                                                         Thanx, Paul
> >
> > >       .writelock      = torture_rtmutex_lock,
> > >       .write_delay    = torture_rtmutex_delay,
> > > -     .task_boost     = torture_rtmutex_boost,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_rtmutex_unlock,
> > >       .readlock       = NULL,
> > >       .read_delay     = NULL,
> > > @@ -604,7 +609,7 @@ __releases(torture_rwsem)
> > >  static struct lock_torture_ops rwsem_lock_ops = {
> > >       .writelock      = torture_rwsem_down_write,
> > >       .write_delay    = torture_rwsem_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_rwsem_up_write,
> > >       .readlock       = torture_rwsem_down_read,
> > >       .read_delay     = torture_rwsem_read_delay,
> > > @@ -656,7 +661,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
> > >       .exit           = torture_percpu_rwsem_exit,
> > >       .writelock      = torture_percpu_rwsem_down_write,
> > >       .write_delay    = torture_rwsem_write_delay,
> > > -     .task_boost     = torture_boost_dummy,
> > > +     .task_boost     = torture_rt_boost,
> > >       .writeunlock    = torture_percpu_rwsem_up_write,
> > >       .readlock       = torture_percpu_rwsem_down_read,
> > >       .read_delay     = torture_rwsem_read_delay,
> > > --
> > > 2.38.1.584.g0f3c55d4c2-goog
> > >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted
  2022-11-23  1:21 ` [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted Joel Fernandes (Google)
  2022-12-07 22:14   ` Paul E. McKenney
@ 2022-12-21  4:21   ` Chen Yu
  1 sibling, 0 replies; 18+ messages in thread
From: Chen Yu @ 2022-12-21  4:21 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Paul E. McKenney, Peter Zijlstra,
	Steven Rostedt, Valentin Schneider, Vincent Guittot, kernel-team,
	John Stultz, Joel Fernandes, Qais Yousef, Will Deacon,
	Waiman Long, Boqun Feng, Connor O'Brien

On 2022-11-23 at 01:21:03 +0000, Joel Fernandes (Google) wrote:
> Currently RT boosting is only done for rtmutex_lock, however with proxy
> execution, we also have the mutex_lock participating in priorities. To
> exercise the testing better, add RT boosting to other lock testing types
> as well, using a new knob (rt_boost).
> 
> Tested with boot parameters:
> locktorture.torture_type=mutex_lock
> locktorture.onoff_interval=1
> locktorture.nwriters_stress=8
> locktorture.stutter=0
> locktorture.rt_boost=1
> locktorture.rt_boost_factor=1
> locktorture.nlocks=3
> 
> For the rtmutex test, rt_boost is always enabled even if disabling is
> requested.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
I have tested this patch set based on
https://lore.kernel.org/lkml/20221003214501.2050087-1-connoro@google.com/
using:
insmod locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=1 stutter=0 nlocks=3
on a dual sockets with 228 CPUs. So far I did not find any error/warning in the past
8 hours.

I'm trying to ramp up what scenario PE would bring benefit to. For mutex_lock(), would
it benefit frequent cgroup hierarchy update because of global cgroup_mutex contention?
But in theory the cgroup hierarchy update would be quite rare in daily usage?

Besides, according to
https://lore.kernel.org/lkml/20221003214501.2050087-1-connoro@google.com/
PE could also mitigate cgroup 'priority inversion':
"One notable scenario arises when cpu cgroups
are used to throttle less important background tasks. Priority inversion
can occur when an "important" unthrottled task blocks on a mutex held by
an "unimportant" task whose CPU time is constrained using cpu
shares. The result is higher worst case latencies for the unthrottled
task." Is it applicable to add cgroup based(different shares) mutex_lock contention
in locktorture?

BTW, can we also expose the rt_boost/rt_boost_factor in lock_torture_print_module_parms()?

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable
  2022-11-23  1:21 [PATCH RFC 0/3] Patches on top of PE series Joel Fernandes (Google)
  2022-11-23  1:21 ` [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate Joel Fernandes (Google)
  2022-11-23  1:21 ` [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted Joel Fernandes (Google)
@ 2022-11-23  1:21 ` Joel Fernandes (Google)
  2022-12-07 22:15   ` Paul E. McKenney
  2022-12-21  4:28   ` Chen Yu
  2 siblings, 2 replies; 18+ messages in thread
From: Joel Fernandes (Google) @ 2022-11-23  1:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Paul E. McKenney, Peter Zijlstra,
	Steven Rostedt, Valentin Schneider, Vincent Guittot, kernel-team,
	John Stultz, Joel Fernandes, Qais Yousef, Will Deacon,
	Waiman Long, Boqun Feng, Connor O'Brien

The rt boosting in locktorture has a factor variable large enough that
boosting only happens once every minute or so. Add a tunable to educe
the factor so that boosting happens more often, to test paths and arrive
at failure modes earlier. With this change, I can set the factor to
like 50 and have the boosting happens every 10 seconds or so.

Tested with boot parameters:
locktorture.torture_type=mutex_lock
locktorture.onoff_interval=1
locktorture.nwriters_stress=8
locktorture.stutter=0
locktorture.rt_boost=1
locktorture.rt_boost_factor=50
locktorture.nlocks=3

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/locking/locktorture.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 5a388ac96a9b..e4529c2166e9 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -47,6 +47,7 @@ torture_param(int, stat_interval, 60,
 	     "Number of seconds between stats printk()s");
 torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
 torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
+torture_param(int, rt_boost_factor, 50000, "A factor determining how often rt-boost happens");
 torture_param(int, verbose, 1,
 	     "Enable verbose debugging printk()s");
 torture_param(int, nlocks, 1,
@@ -132,15 +133,15 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
 
 static void torture_rt_boost(struct torture_random_state *trsp)
 {
-	const unsigned int factor = 50000; /* yes, quite arbitrary */
+	const unsigned int factor = rt_boost_factor; /* yes, quite arbitrary */
 
 	if (!rt_boost)
 		return;
 
 	if (!rt_task(current)) {
 		/*
-		 * Boost priority once every ~50k operations. When the
-		 * task tries to take the lock, the rtmutex it will account
+		 * Boost priority once every rt_boost_factor operations. When
+		 * the task tries to take the lock, the rtmutex it will account
 		 * for the new priority, and do any corresponding pi-dance.
 		 */
 		if (trsp && !(torture_random(trsp) %
@@ -150,8 +151,9 @@ static void torture_rt_boost(struct torture_random_state *trsp)
 			return;
 	} else {
 		/*
-		 * The task will remain boosted for another ~500k operations,
-		 * then restored back to its original prio, and so forth.
+		 * The task will remain boosted for another 10*rt_boost_factor
+		 * operations, then restored back to its original prio, and so
+		 * forth.
 		 *
 		 * When @trsp is nil, we want to force-reset the task for
 		 * stopping the kthread.
-- 
2.38.1.584.g0f3c55d4c2-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable
  2022-11-23  1:21 ` [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable Joel Fernandes (Google)
@ 2022-12-07 22:15   ` Paul E. McKenney
  2022-12-21  4:28   ` Chen Yu
  1 sibling, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-07 22:15 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Joel Fernandes, Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

On Wed, Nov 23, 2022 at 01:21:04AM +0000, Joel Fernandes (Google) wrote:
> The rt boosting in locktorture has a factor variable large enough that
> boosting only happens once every minute or so. Add a tunable to educe
> the factor so that boosting happens more often, to test paths and arrive
> at failure modes earlier. With this change, I can set the factor to
> like 50 and have the boosting happens every 10 seconds or so.
> 
> Tested with boot parameters:
> locktorture.torture_type=mutex_lock
> locktorture.onoff_interval=1
> locktorture.nwriters_stress=8
> locktorture.stutter=0
> locktorture.rt_boost=1
> locktorture.rt_boost_factor=50
> locktorture.nlocks=3
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

This looks good, and is what I should have done to start with.  But
it depends on the first commit, so I will hold off for the moment.

						Thanx, Paul

> ---
>  kernel/locking/locktorture.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> index 5a388ac96a9b..e4529c2166e9 100644
> --- a/kernel/locking/locktorture.c
> +++ b/kernel/locking/locktorture.c
> @@ -47,6 +47,7 @@ torture_param(int, stat_interval, 60,
>  	     "Number of seconds between stats printk()s");
>  torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
>  torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
> +torture_param(int, rt_boost_factor, 50000, "A factor determining how often rt-boost happens");
>  torture_param(int, verbose, 1,
>  	     "Enable verbose debugging printk()s");
>  torture_param(int, nlocks, 1,
> @@ -132,15 +133,15 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
>  
>  static void torture_rt_boost(struct torture_random_state *trsp)
>  {
> -	const unsigned int factor = 50000; /* yes, quite arbitrary */
> +	const unsigned int factor = rt_boost_factor; /* yes, quite arbitrary */
>  
>  	if (!rt_boost)
>  		return;
>  
>  	if (!rt_task(current)) {
>  		/*
> -		 * Boost priority once every ~50k operations. When the
> -		 * task tries to take the lock, the rtmutex it will account
> +		 * Boost priority once every rt_boost_factor operations. When
> +		 * the task tries to take the lock, the rtmutex it will account
>  		 * for the new priority, and do any corresponding pi-dance.
>  		 */
>  		if (trsp && !(torture_random(trsp) %
> @@ -150,8 +151,9 @@ static void torture_rt_boost(struct torture_random_state *trsp)
>  			return;
>  	} else {
>  		/*
> -		 * The task will remain boosted for another ~500k operations,
> -		 * then restored back to its original prio, and so forth.
> +		 * The task will remain boosted for another 10*rt_boost_factor
> +		 * operations, then restored back to its original prio, and so
> +		 * forth.
>  		 *
>  		 * When @trsp is nil, we want to force-reset the task for
>  		 * stopping the kthread.
> -- 
> 2.38.1.584.g0f3c55d4c2-goog
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable
  2022-11-23  1:21 ` [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable Joel Fernandes (Google)
  2022-12-07 22:15   ` Paul E. McKenney
@ 2022-12-21  4:28   ` Chen Yu
  2023-01-05 18:27     ` Paul E. McKenney
  1 sibling, 1 reply; 18+ messages in thread
From: Chen Yu @ 2022-12-21  4:28 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Paul E. McKenney, Peter Zijlstra,
	Steven Rostedt, Valentin Schneider, Vincent Guittot, kernel-team,
	John Stultz, Joel Fernandes, Qais Yousef, Will Deacon,
	Waiman Long, Boqun Feng, Connor O'Brien

On 2022-11-23 at 01:21:04 +0000, Joel Fernandes (Google) wrote:
> The rt boosting in locktorture has a factor variable large enough that
> boosting only happens once every minute or so. Add a tunable to educe
> the factor so that boosting happens more often, to test paths and arrive
> at failure modes earlier. With this change, I can set the factor to
> like 50 and have the boosting happens every 10 seconds or so.
> 
> Tested with boot parameters:
> locktorture.torture_type=mutex_lock
> locktorture.onoff_interval=1
> locktorture.nwriters_stress=8
> locktorture.stutter=0
> locktorture.rt_boost=1
> locktorture.rt_boost_factor=50
> locktorture.nlocks=3
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/locking/locktorture.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> index 5a388ac96a9b..e4529c2166e9 100644
> --- a/kernel/locking/locktorture.c
> +++ b/kernel/locking/locktorture.c
> @@ -47,6 +47,7 @@ torture_param(int, stat_interval, 60,
>  	     "Number of seconds between stats printk()s");
>  torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
>  torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
> +torture_param(int, rt_boost_factor, 50000, "A factor determining how often rt-boost happens");
>  torture_param(int, verbose, 1,
>  	     "Enable verbose debugging printk()s");
>  torture_param(int, nlocks, 1,
> @@ -132,15 +133,15 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
>  
>  static void torture_rt_boost(struct torture_random_state *trsp)
>  {
> -	const unsigned int factor = 50000; /* yes, quite arbitrary */
> +	const unsigned int factor = rt_boost_factor; /* yes, quite arbitrary */
>  
>  	if (!rt_boost)
>  		return;
>  
>  	if (!rt_task(current)) {
>  		/*
> -		 * Boost priority once every ~50k operations. When the
> -		 * task tries to take the lock, the rtmutex it will account
> +		 * Boost priority once every rt_boost_factor operations. When
> +		 * the task tries to take the lock, the rtmutex it will account
>  		 * for the new priority, and do any corresponding pi-dance.
>  		 */
>  		if (trsp && !(torture_random(trsp) %
> @@ -150,8 +151,9 @@ static void torture_rt_boost(struct torture_random_state *trsp)
>  			return;
>  	} else {
>  		/*
> -		 * The task will remain boosted for another ~500k operations,
> -		 * then restored back to its original prio, and so forth.
> +		 * The task will remain boosted for another 10*rt_boost_factor
Maybe I understand incorrectly, the code is
cxt.nrealwriters_stress * factor * 2, should it be 2 rather than 10?
May I know where the 10 comes from?

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable
  2022-12-21  4:28   ` Chen Yu
@ 2023-01-05 18:27     ` Paul E. McKenney
  2023-01-05 20:19       ` Joel Fernandes
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2023-01-05 18:27 UTC (permalink / raw)
  To: Chen Yu
  Cc: Joel Fernandes (Google), linux-kernel, Ben Segall,
	Daniel Bristot de Oliveira, Davidlohr Bueso, Dietmar Eggemann,
	Ingo Molnar, Josh Triplett, Juri Lelli, Mel Gorman,
	Peter Zijlstra, Steven Rostedt, Valentin Schneider,
	Vincent Guittot, kernel-team, John Stultz, Joel Fernandes,
	Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

On Wed, Dec 21, 2022 at 12:28:39PM +0800, Chen Yu wrote:
> On 2022-11-23 at 01:21:04 +0000, Joel Fernandes (Google) wrote:
> > The rt boosting in locktorture has a factor variable large enough that
> > boosting only happens once every minute or so. Add a tunable to educe
> > the factor so that boosting happens more often, to test paths and arrive
> > at failure modes earlier. With this change, I can set the factor to
> > like 50 and have the boosting happens every 10 seconds or so.
> > 
> > Tested with boot parameters:
> > locktorture.torture_type=mutex_lock
> > locktorture.onoff_interval=1
> > locktorture.nwriters_stress=8
> > locktorture.stutter=0
> > locktorture.rt_boost=1
> > locktorture.rt_boost_factor=50
> > locktorture.nlocks=3
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/locking/locktorture.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> > index 5a388ac96a9b..e4529c2166e9 100644
> > --- a/kernel/locking/locktorture.c
> > +++ b/kernel/locking/locktorture.c
> > @@ -47,6 +47,7 @@ torture_param(int, stat_interval, 60,
> >  	     "Number of seconds between stats printk()s");
> >  torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
> >  torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
> > +torture_param(int, rt_boost_factor, 50000, "A factor determining how often rt-boost happens");
> >  torture_param(int, verbose, 1,
> >  	     "Enable verbose debugging printk()s");
> >  torture_param(int, nlocks, 1,
> > @@ -132,15 +133,15 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
> >  
> >  static void torture_rt_boost(struct torture_random_state *trsp)
> >  {
> > -	const unsigned int factor = 50000; /* yes, quite arbitrary */
> > +	const unsigned int factor = rt_boost_factor; /* yes, quite arbitrary */
> >  
> >  	if (!rt_boost)
> >  		return;
> >  
> >  	if (!rt_task(current)) {
> >  		/*
> > -		 * Boost priority once every ~50k operations. When the
> > -		 * task tries to take the lock, the rtmutex it will account
> > +		 * Boost priority once every rt_boost_factor operations. When
> > +		 * the task tries to take the lock, the rtmutex it will account
> >  		 * for the new priority, and do any corresponding pi-dance.
> >  		 */
> >  		if (trsp && !(torture_random(trsp) %
> > @@ -150,8 +151,9 @@ static void torture_rt_boost(struct torture_random_state *trsp)
> >  			return;
> >  	} else {
> >  		/*
> > -		 * The task will remain boosted for another ~500k operations,
> > -		 * then restored back to its original prio, and so forth.
> > +		 * The task will remain boosted for another 10*rt_boost_factor
> Maybe I understand incorrectly, the code is
> cxt.nrealwriters_stress * factor * 2, should it be 2 rather than 10?

It looks that way to me, but I might be missing something.  Joel?

							Thanx, Paul

> May I know where the 10 comes from?
> 
> thanks,
> Chenyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable
  2023-01-05 18:27     ` Paul E. McKenney
@ 2023-01-05 20:19       ` Joel Fernandes
  0 siblings, 0 replies; 18+ messages in thread
From: Joel Fernandes @ 2023-01-05 20:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Chen Yu, linux-kernel, Ben Segall, Daniel Bristot de Oliveira,
	Davidlohr Bueso, Dietmar Eggemann, Ingo Molnar, Josh Triplett,
	Juri Lelli, Mel Gorman, Peter Zijlstra, Steven Rostedt,
	Valentin Schneider, Vincent Guittot, kernel-team, John Stultz,
	Qais Yousef, Will Deacon, Waiman Long, Boqun Feng,
	Connor O'Brien

On Thu, Jan 05, 2023 at 10:27:18AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 21, 2022 at 12:28:39PM +0800, Chen Yu wrote:
> > On 2022-11-23 at 01:21:04 +0000, Joel Fernandes (Google) wrote:
> > > The rt boosting in locktorture has a factor variable large enough that
> > > boosting only happens once every minute or so. Add a tunable to educe
> > > the factor so that boosting happens more often, to test paths and arrive
> > > at failure modes earlier. With this change, I can set the factor to
> > > like 50 and have the boosting happens every 10 seconds or so.
> > > 
> > > Tested with boot parameters:
> > > locktorture.torture_type=mutex_lock
> > > locktorture.onoff_interval=1
> > > locktorture.nwriters_stress=8
> > > locktorture.stutter=0
> > > locktorture.rt_boost=1
> > > locktorture.rt_boost_factor=50
> > > locktorture.nlocks=3
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/locking/locktorture.c | 12 +++++++-----
> > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> > > index 5a388ac96a9b..e4529c2166e9 100644
> > > --- a/kernel/locking/locktorture.c
> > > +++ b/kernel/locking/locktorture.c
> > > @@ -47,6 +47,7 @@ torture_param(int, stat_interval, 60,
> > >  	     "Number of seconds between stats printk()s");
> > >  torture_param(int, stutter, 5, "Number of jiffies to run/halt test, 0=disable");
> > >  torture_param(int, rt_boost, 0, "Perform an rt-boost from the writer, always 1 for rtmutex_lock");
> > > +torture_param(int, rt_boost_factor, 50000, "A factor determining how often rt-boost happens");
> > >  torture_param(int, verbose, 1,
> > >  	     "Enable verbose debugging printk()s");
> > >  torture_param(int, nlocks, 1,
> > > @@ -132,15 +133,15 @@ static void torture_lock_busted_write_unlock(int tid __maybe_unused)
> > >  
> > >  static void torture_rt_boost(struct torture_random_state *trsp)
> > >  {
> > > -	const unsigned int factor = 50000; /* yes, quite arbitrary */
> > > +	const unsigned int factor = rt_boost_factor; /* yes, quite arbitrary */
> > >  
> > >  	if (!rt_boost)
> > >  		return;
> > >  
> > >  	if (!rt_task(current)) {
> > >  		/*
> > > -		 * Boost priority once every ~50k operations. When the
> > > -		 * task tries to take the lock, the rtmutex it will account
> > > +		 * Boost priority once every rt_boost_factor operations. When
> > > +		 * the task tries to take the lock, the rtmutex it will account
> > >  		 * for the new priority, and do any corresponding pi-dance.
> > >  		 */
> > >  		if (trsp && !(torture_random(trsp) %
> > > @@ -150,8 +151,9 @@ static void torture_rt_boost(struct torture_random_state *trsp)
> > >  			return;
> > >  	} else {
> > >  		/*
> > > -		 * The task will remain boosted for another ~500k operations,
> > > -		 * then restored back to its original prio, and so forth.
> > > +		 * The task will remain boosted for another 10*rt_boost_factor
> > Maybe I understand incorrectly, the code is
> > cxt.nrealwriters_stress * factor * 2, should it be 2 rather than 10?
> 
> It looks that way to me, but I might be missing something.  Joel?
> > May I know where the 10 comes from?

The comment in existing code was 500k ops.

Yes, Chen is right, the comment can be improved to mention the actual
equation. I was just going by the initial comment of ~500K ops. Since factor
now defaults to 50k, this translates to 500k (10 times the factor) ops which
it does for a 4-5 CPU system.

But I am Ok with the comment changing to what Chen suggested though!

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-01-05 20:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-11-23  1:21 [PATCH RFC 0/3] Patches on top of PE series Joel Fernandes (Google)
2022-11-23  1:21 ` [PATCH RFC 1/3] sched/pe: Exclude balance callback queuing during proxy()'s migrate Joel Fernandes (Google)
2022-12-09 15:07   ` Dietmar Eggemann
2022-12-09 16:52     ` Joel Fernandes
2022-12-12 14:39       ` Dietmar Eggemann
2022-12-15 23:12         ` Joel Fernandes
2022-12-15 23:31           ` Joel Fernandes
2022-11-23  1:21 ` [PATCH RFC 2/3] locktorture: Allow non-rtmutex lock types to be boosted Joel Fernandes (Google)
2022-12-07 22:14   ` Paul E. McKenney
2022-12-07 22:23     ` Joel Fernandes
2022-12-07 22:42       ` Joel Fernandes
2022-12-08  5:06       ` Paul E. McKenney
2022-12-21  4:21   ` Chen Yu
2022-11-23  1:21 ` [PATCH RFC 3/3] locktorture: Make the rt_boost factor a tunable Joel Fernandes (Google)
2022-12-07 22:15   ` Paul E. McKenney
2022-12-21  4:28   ` Chen Yu
2023-01-05 18:27     ` Paul E. McKenney
2023-01-05 20:19       ` Joel Fernandes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.