* [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() @ 2023-08-24 8:42 Zqiang 2023-08-24 8:59 ` Z qiang 2023-08-24 13:21 ` Paul E. McKenney 0 siblings, 2 replies; 14+ messages in thread From: Zqiang @ 2023-08-24 8:42 UTC (permalink / raw) To: paulmck, joel; +Cc: qiang.zhang1211, linux-kernel, rcu Currently, the maxcpu is set by traversing online CPUs, however, if the rcutorture.onoff_holdoff is set zero and onoff_interval is set non-zero, and the some CPUs with larger cpuid has been offline before setting maxcpu, for these CPUs, even if they are online again, also cannot be offload or deoffload. This commit therefore use for_each_possible_cpu() instead of for_each_online_cpu() in rcu_nocb_toggle(). Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> --- kernel/rcu/rcutorture.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index a58372bdf0c1..b75d0fe558ce 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); while (!rcu_inkernel_boot_has_ended()) schedule_timeout_interruptible(HZ / 10); - for_each_online_cpu(cpu) + for_each_possible_cpu(cpu) maxcpu = cpu; WARN_ON(maxcpu < 0); if (toggle_interval > ULONG_MAX) -- 2.17.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-24 8:42 [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() Zqiang @ 2023-08-24 8:59 ` Z qiang 2023-08-24 13:21 ` Paul E. McKenney 1 sibling, 0 replies; 14+ messages in thread From: Z qiang @ 2023-08-24 8:59 UTC (permalink / raw) To: paulmck, joel; +Cc: linux-kernel, rcu Sorry for the repeat sending. > > Currently, the maxcpu is set by traversing online CPUs, however, if > the rcutorture.onoff_holdoff is set zero and onoff_interval is set > non-zero, and the some CPUs with larger cpuid has been offline before > setting maxcpu, for these CPUs, even if they are online again, also > cannot be offload or deoffload. > > This commit therefore use for_each_possible_cpu() instead of > for_each_online_cpu() in rcu_nocb_toggle(). > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > --- > kernel/rcu/rcutorture.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > index a58372bdf0c1..b75d0fe558ce 100644 > --- a/kernel/rcu/rcutorture.c > +++ b/kernel/rcu/rcutorture.c > @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) > VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); > while (!rcu_inkernel_boot_has_ended()) > schedule_timeout_interruptible(HZ / 10); > - for_each_online_cpu(cpu) > + for_each_possible_cpu(cpu) > maxcpu = cpu; > WARN_ON(maxcpu < 0); > if (toggle_interval > ULONG_MAX) > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-24 8:42 [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() Zqiang 2023-08-24 8:59 ` Z qiang @ 2023-08-24 13:21 ` Paul E. McKenney 2023-08-25 2:28 ` Z qiang 1 sibling, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2023-08-24 13:21 UTC (permalink / raw) To: Zqiang; +Cc: joel, linux-kernel, rcu On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote: > Currently, the maxcpu is set by traversing online CPUs, however, if > the rcutorture.onoff_holdoff is set zero and onoff_interval is set > non-zero, and the some CPUs with larger cpuid has been offline before > setting maxcpu, for these CPUs, even if they are online again, also > cannot be offload or deoffload. > > This commit therefore use for_each_possible_cpu() instead of > for_each_online_cpu() in rcu_nocb_toggle(). > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > --- > kernel/rcu/rcutorture.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > index a58372bdf0c1..b75d0fe558ce 100644 > --- a/kernel/rcu/rcutorture.c > +++ b/kernel/rcu/rcutorture.c > @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) > VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); > while (!rcu_inkernel_boot_has_ended()) > schedule_timeout_interruptible(HZ / 10); > - for_each_online_cpu(cpu) > + for_each_possible_cpu(cpu) Last I checked, bad things could happen if the code attempted to nocb_toggle a CPU that had not yet come online. Has that changed? Thanx, Paul > maxcpu = cpu; > WARN_ON(maxcpu < 0); > if (toggle_interval > ULONG_MAX) > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-24 13:21 ` Paul E. McKenney @ 2023-08-25 2:28 ` Z qiang 2023-08-26 1:29 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Z qiang @ 2023-08-25 2:28 UTC (permalink / raw) To: paulmck; +Cc: joel, linux-kernel, rcu > > On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote: > > Currently, the maxcpu is set by traversing online CPUs, however, if > > the rcutorture.onoff_holdoff is set zero and onoff_interval is set > > non-zero, and the some CPUs with larger cpuid has been offline before > > setting maxcpu, for these CPUs, even if they are online again, also > > cannot be offload or deoffload. > > > > This commit therefore use for_each_possible_cpu() instead of > > for_each_online_cpu() in rcu_nocb_toggle(). > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > --- > > kernel/rcu/rcutorture.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > > index a58372bdf0c1..b75d0fe558ce 100644 > > --- a/kernel/rcu/rcutorture.c > > +++ b/kernel/rcu/rcutorture.c > > @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) > > VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); > > while (!rcu_inkernel_boot_has_ended()) > > schedule_timeout_interruptible(HZ / 10); > > - for_each_online_cpu(cpu) > > + for_each_possible_cpu(cpu) > > Last I checked, bad things could happen if the code attempted to > nocb_toggle a CPU that had not yet come online. Has that changed? For example, there are 8 online CPUs in the system, before we traversing online CPUs and set maxcpu, CPU7 has been offline, this causes us to miss nocb_toggle for CPU7(maxcpu=6) Even though we still use for_each_online_cpu(), the things described above also happen. before we toggle the CPU, this CPU has been offline. Thanks Zqiang > > Thanx, Paul > > > maxcpu = cpu; > > WARN_ON(maxcpu < 0); > > if (toggle_interval > ULONG_MAX) > > -- > > 2.17.1 > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-25 2:28 ` Z qiang @ 2023-08-26 1:29 ` Paul E. McKenney 2023-08-26 6:13 ` Z qiang 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2023-08-26 1:29 UTC (permalink / raw) To: Z qiang; +Cc: joel, linux-kernel, rcu On Fri, Aug 25, 2023 at 10:28:37AM +0800, Z qiang wrote: > > > > On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote: > > > Currently, the maxcpu is set by traversing online CPUs, however, if > > > the rcutorture.onoff_holdoff is set zero and onoff_interval is set > > > non-zero, and the some CPUs with larger cpuid has been offline before > > > setting maxcpu, for these CPUs, even if they are online again, also > > > cannot be offload or deoffload. > > > > > > This commit therefore use for_each_possible_cpu() instead of > > > for_each_online_cpu() in rcu_nocb_toggle(). > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > --- > > > kernel/rcu/rcutorture.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > > > index a58372bdf0c1..b75d0fe558ce 100644 > > > --- a/kernel/rcu/rcutorture.c > > > +++ b/kernel/rcu/rcutorture.c > > > @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) > > > VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); > > > while (!rcu_inkernel_boot_has_ended()) > > > schedule_timeout_interruptible(HZ / 10); > > > - for_each_online_cpu(cpu) > > > + for_each_possible_cpu(cpu) > > > > Last I checked, bad things could happen if the code attempted to > > nocb_toggle a CPU that had not yet come online. Has that changed? > > For example, there are 8 online CPUs in the system, before we traversing online > CPUs and set maxcpu, CPU7 has been offline, this causes us to miss nocb_toggle > for CPU7(maxcpu=6) > > Even though we still use for_each_online_cpu(), the things described > above also happen. before we toggle the CPU, this CPU has been offline. Suppose we have a system whose possible CPUs are 0, 1, 2, and 3. However, only 0 and 1 are present in this system, and until some manual action is taken, only 0 and 1 will ever be online. (Yes, this really can happen!) In that state, won't toggling CPU 2 and 3 result in failures? Thanx, Paul > Thanks > Zqiang > > > > > > Thanx, Paul > > > > > maxcpu = cpu; > > > WARN_ON(maxcpu < 0); > > > if (toggle_interval > ULONG_MAX) > > > -- > > > 2.17.1 > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-26 1:29 ` Paul E. McKenney @ 2023-08-26 6:13 ` Z qiang 2023-08-26 13:06 ` Paul E. McKenney 0 siblings, 1 reply; 14+ messages in thread From: Z qiang @ 2023-08-26 6:13 UTC (permalink / raw) To: paulmck; +Cc: joel, linux-kernel, rcu > > On Fri, Aug 25, 2023 at 10:28:37AM +0800, Z qiang wrote: > > > > > > On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote: > > > > Currently, the maxcpu is set by traversing online CPUs, however, if > > > > the rcutorture.onoff_holdoff is set zero and onoff_interval is set > > > > non-zero, and the some CPUs with larger cpuid has been offline before > > > > setting maxcpu, for these CPUs, even if they are online again, also > > > > cannot be offload or deoffload. > > > > > > > > This commit therefore use for_each_possible_cpu() instead of > > > > for_each_online_cpu() in rcu_nocb_toggle(). > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > --- > > > > kernel/rcu/rcutorture.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > > > > index a58372bdf0c1..b75d0fe558ce 100644 > > > > --- a/kernel/rcu/rcutorture.c > > > > +++ b/kernel/rcu/rcutorture.c > > > > @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) > > > > VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); > > > > while (!rcu_inkernel_boot_has_ended()) > > > > schedule_timeout_interruptible(HZ / 10); > > > > - for_each_online_cpu(cpu) > > > > + for_each_possible_cpu(cpu) > > > > > > Last I checked, bad things could happen if the code attempted to > > > nocb_toggle a CPU that had not yet come online. Has that changed? > > > > For example, there are 8 online CPUs in the system, before we traversing online > > CPUs and set maxcpu, CPU7 has been offline, this causes us to miss nocb_toggle > > for CPU7(maxcpu=6) > > > > Even though we still use for_each_online_cpu(), the things described > > above also happen. before we toggle the CPU, this CPU has been offline. > > Suppose we have a system whose possible CPUs are 0, 1, 2, and 3. However, > only 0 and 1 are present in this system, and until some manual action is > taken, only 0 and 1 will ever be online. (Yes, this really can happen!) > In that state, won't toggling CPU 2 and 3 result in failures? > Agree. As long as we enabled rcutorture.onoff_interval, regardless of whether we use online CPUs or possible CPUs to set maxcpu, It is all possible to toggling the CPUs failure and print "NOCB: Cannot CB-offload offline CPU" log. but the failures due to CPU offline are acceptable. but at least the toggling operation on CPU7 will not be missed. when CPU7 comes online again. Would it be better to use for_each_present_cpu() ? Thanks Zqiang > > Thanx, Paul > > > Thanks > > Zqiang > > > > > > > > > > Thanx, Paul > > > > > > > maxcpu = cpu; > > > > WARN_ON(maxcpu < 0); > > > > if (toggle_interval > ULONG_MAX) > > > > -- > > > > 2.17.1 > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-26 6:13 ` Z qiang @ 2023-08-26 13:06 ` Paul E. McKenney 2023-08-28 15:17 ` Frederic Weisbecker 0 siblings, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2023-08-26 13:06 UTC (permalink / raw) To: Z qiang; +Cc: joel, linux-kernel, rcu On Sat, Aug 26, 2023 at 02:13:39PM +0800, Z qiang wrote: > > > > On Fri, Aug 25, 2023 at 10:28:37AM +0800, Z qiang wrote: > > > > > > > > On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote: > > > > > Currently, the maxcpu is set by traversing online CPUs, however, if > > > > > the rcutorture.onoff_holdoff is set zero and onoff_interval is set > > > > > non-zero, and the some CPUs with larger cpuid has been offline before > > > > > setting maxcpu, for these CPUs, even if they are online again, also > > > > > cannot be offload or deoffload. > > > > > > > > > > This commit therefore use for_each_possible_cpu() instead of > > > > > for_each_online_cpu() in rcu_nocb_toggle(). > > > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > > --- > > > > > kernel/rcu/rcutorture.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > > > > > index a58372bdf0c1..b75d0fe558ce 100644 > > > > > --- a/kernel/rcu/rcutorture.c > > > > > +++ b/kernel/rcu/rcutorture.c > > > > > @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) > > > > > VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); > > > > > while (!rcu_inkernel_boot_has_ended()) > > > > > schedule_timeout_interruptible(HZ / 10); > > > > > - for_each_online_cpu(cpu) > > > > > + for_each_possible_cpu(cpu) > > > > > > > > Last I checked, bad things could happen if the code attempted to > > > > nocb_toggle a CPU that had not yet come online. Has that changed? > > > > > > For example, there are 8 online CPUs in the system, before we traversing online > > > CPUs and set maxcpu, CPU7 has been offline, this causes us to miss nocb_toggle > > > for CPU7(maxcpu=6) > > > > > > Even though we still use for_each_online_cpu(), the things described > > > above also happen. before we toggle the CPU, this CPU has been offline. > > > > Suppose we have a system whose possible CPUs are 0, 1, 2, and 3. However, > > only 0 and 1 are present in this system, and until some manual action is > > taken, only 0 and 1 will ever be online. (Yes, this really can happen!) > > In that state, won't toggling CPU 2 and 3 result in failures? > > > > Agree. > As long as we enabled rcutorture.onoff_interval, regardless of whether we use > online CPUs or possible CPUs to set maxcpu, It is all possible to > toggling the CPUs failure > and print "NOCB: Cannot CB-offload offline CPU" log. but the failures > due to CPU offline are acceptable. > > but at least the toggling operation on CPU7 will not be missed. when > CPU7 comes online again. > > Would it be better to use for_each_present_cpu() ? The problem we face is that RCU and rcutorture have no reasonable way of knowing when the boot-time CPU bringup has completed. If there was a way of knowing that, then my approach would be to make rcutorture react to a holdoff of zero by waiting for all the CPUs to come online. Failing that, for_each_present_cpu() with a holdoff of zero will likely get us transient failures between the time rcutorture starts and the last CPU has come online. Or is there now a way for in-kernel code know when boot-time CPU onlining has completed? Thanx, Paul > Thanks > Zqiang > > > > > Thanx, Paul > > > > > Thanks > > > Zqiang > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > maxcpu = cpu; > > > > > WARN_ON(maxcpu < 0); > > > > > if (toggle_interval > ULONG_MAX) > > > > > -- > > > > > 2.17.1 > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-26 13:06 ` Paul E. McKenney @ 2023-08-28 15:17 ` Frederic Weisbecker 2023-08-28 21:51 ` Joel Fernandes 0 siblings, 1 reply; 14+ messages in thread From: Frederic Weisbecker @ 2023-08-28 15:17 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Z qiang, joel, linux-kernel, rcu Le Sat, Aug 26, 2023 at 06:06:20AM -0700, Paul E. McKenney a écrit : > On Sat, Aug 26, 2023 at 02:13:39PM +0800, Z qiang wrote: > > > > > > On Fri, Aug 25, 2023 at 10:28:37AM +0800, Z qiang wrote: > > > > > > > > > > On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote: > > > > > > Currently, the maxcpu is set by traversing online CPUs, however, if > > > > > > the rcutorture.onoff_holdoff is set zero and onoff_interval is set > > > > > > non-zero, and the some CPUs with larger cpuid has been offline before > > > > > > setting maxcpu, for these CPUs, even if they are online again, also > > > > > > cannot be offload or deoffload. > > > > > > > > > > > > This commit therefore use for_each_possible_cpu() instead of > > > > > > for_each_online_cpu() in rcu_nocb_toggle(). > > > > > > > > > > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > > > > > --- > > > > > > kernel/rcu/rcutorture.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > > > > > > index a58372bdf0c1..b75d0fe558ce 100644 > > > > > > --- a/kernel/rcu/rcutorture.c > > > > > > +++ b/kernel/rcu/rcutorture.c > > > > > > @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) > > > > > > VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); > > > > > > while (!rcu_inkernel_boot_has_ended()) > > > > > > schedule_timeout_interruptible(HZ / 10); > > > > > > - for_each_online_cpu(cpu) > > > > > > + for_each_possible_cpu(cpu) > > > > > > > > > > Last I checked, bad things could happen if the code attempted to > > > > > nocb_toggle a CPU that had not yet come online. Has that changed? > > > > > > > > For example, there are 8 online CPUs in the system, before we traversing online > > > > CPUs and set maxcpu, CPU7 has been offline, this causes us to miss nocb_toggle > > > > for CPU7(maxcpu=6) > > > > > > > > Even though we still use for_each_online_cpu(), the things described > > > > above also happen. before we toggle the CPU, this CPU has been offline. > > > > > > Suppose we have a system whose possible CPUs are 0, 1, 2, and 3. However, > > > only 0 and 1 are present in this system, and until some manual action is > > > taken, only 0 and 1 will ever be online. (Yes, this really can happen!) > > > In that state, won't toggling CPU 2 and 3 result in failures? > > > > > > > Agree. > > As long as we enabled rcutorture.onoff_interval, regardless of whether we use > > online CPUs or possible CPUs to set maxcpu, It is all possible to > > toggling the CPUs failure > > and print "NOCB: Cannot CB-offload offline CPU" log. but the failures > > due to CPU offline are acceptable. > > > > but at least the toggling operation on CPU7 will not be missed. when > > CPU7 comes online again. > > > > Would it be better to use for_each_present_cpu() ? > > The problem we face is that RCU and rcutorture have no reasonable way > of knowing when the boot-time CPU bringup has completed. If there was a > way of knowing that, then my approach would be to make rcutorture react > to a holdoff of zero by waiting for all the CPUs to come online. > > Failing that, for_each_present_cpu() with a holdoff of zero will likely > get us transient failures between the time rcutorture starts and the > last CPU has come online. > > Or is there now a way for in-kernel code know when boot-time CPU onlining > has completed? We don't need to wait for all CPUs to be online though. Toggling already handles well failures due to offline CPUs and since toggling happens concurrently with offlining in rcutorture, we already see lots of failures reported in the logs. Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-28 15:17 ` Frederic Weisbecker @ 2023-08-28 21:51 ` Joel Fernandes 2023-08-29 9:24 ` Frederic Weisbecker 0 siblings, 1 reply; 14+ messages in thread From: Joel Fernandes @ 2023-08-28 21:51 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Paul E. McKenney, Z qiang, linux-kernel, rcu > On Aug 28, 2023, at 11:17 AM, Frederic Weisbecker <frederic@kernel.org> wrote: > > Le Sat, Aug 26, 2023 at 06:06:20AM -0700, Paul E. McKenney a écrit : >> On Sat, Aug 26, 2023 at 02:13:39PM +0800, Z qiang wrote: >>>> >>>> On Fri, Aug 25, 2023 at 10:28:37AM +0800, Z qiang wrote: >>>>>> >>>>>>> On Thu, Aug 24, 2023 at 04:42:06PM +0800, Zqiang wrote: >>>>>>>> Currently, the maxcpu is set by traversing online CPUs, however, if >>>>>>>> the rcutorture.onoff_holdoff is set zero and onoff_interval is set >>>>>>>> non-zero, and the some CPUs with larger cpuid has been offline before >>>>>>>> setting maxcpu, for these CPUs, even if they are online again, also >>>>>>>> cannot be offload or deoffload. >>>>>>>> >>>>>>>> This commit therefore use for_each_possible_cpu() instead of >>>>>>>> for_each_online_cpu() in rcu_nocb_toggle(). >>>>>>>> >>>>>>>> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> >>>>>>>> --- >>>>>>>> kernel/rcu/rcutorture.c | 2 +- >>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c >>>>>>>> index a58372bdf0c1..b75d0fe558ce 100644 >>>>>>>> --- a/kernel/rcu/rcutorture.c >>>>>>>> +++ b/kernel/rcu/rcutorture.c >>>>>>>> @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) >>>>>>>> VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); >>>>>>>> while (!rcu_inkernel_boot_has_ended()) >>>>>>>> schedule_timeout_interruptible(HZ / 10); >>>>>>>> - for_each_online_cpu(cpu) >>>>>>>> + for_each_possible_cpu(cpu) >>>>>>> >>>>>>> Last I checked, bad things could happen if the code attempted to >>>>>>> nocb_toggle a CPU that had not yet come online. Has that changed? >>>>>> >>>>>> For example, there are 8 online CPUs in the system, before we traversing online >>>>>> CPUs and set maxcpu, CPU7 has been offline, this causes us to miss nocb_toggle >>>>>> for CPU7(maxcpu=6) >>>>>> >>>>>> Even though we still use for_each_online_cpu(), the things described >>>>>> above also happen. before we toggle the CPU, this CPU has been offline. >>>>> >>>>> Suppose we have a system whose possible CPUs are 0, 1, 2, and 3. However, >>>>> only 0 and 1 are present in this system, and until some manual action is >>>>> taken, only 0 and 1 will ever be online. (Yes, this really can happen!) >>>>> In that state, won't toggling CPU 2 and 3 result in failures? >>>>> >>> >>> Agree. >>> As long as we enabled rcutorture.onoff_interval, regardless of whether we use >>> online CPUs or possible CPUs to set maxcpu, It is all possible to >>> toggling the CPUs failure >>> and print "NOCB: Cannot CB-offload offline CPU" log. but the failures >>> due to CPU offline are acceptable. >>> >>> but at least the toggling operation on CPU7 will not be missed. when >>> CPU7 comes online again. >>> >>> Would it be better to use for_each_present_cpu() ? >> >> The problem we face is that RCU and rcutorture have no reasonable way >> of knowing when the boot-time CPU bringup has completed. If there was a >> way of knowing that, then my approach would be to make rcutorture react >> to a holdoff of zero by waiting for all the CPUs to come online. >> >> Failing that, for_each_present_cpu() with a holdoff of zero will likely >> get us transient failures between the time rcutorture starts and the >> last CPU has come online. >> >> Or is there now a way for in-kernel code know when boot-time CPU onlining >> has completed? > > We don't need to wait for all CPUs to be online though. Toggling > already handles well failures due to offline CPUs and since toggling > happens concurrently with offlining in rcutorture, we already see lots > of failures reported in the logs. I think the issue is the loop later in the function does not try to toggle cpus that came online too late. So it does not test offloading on all CPUs just because max got updated too late. One fix could be to periodically check in the loop if a new cpu at maxcpu + 1 ever got onlined. If it did, update the maxcpu. Thanks. > > Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-28 21:51 ` Joel Fernandes @ 2023-08-29 9:24 ` Frederic Weisbecker 2023-08-29 10:12 ` Joel Fernandes 2023-08-29 12:38 ` Paul E. McKenney 0 siblings, 2 replies; 14+ messages in thread From: Frederic Weisbecker @ 2023-08-29 9:24 UTC (permalink / raw) To: Joel Fernandes; +Cc: Paul E. McKenney, Z qiang, linux-kernel, rcu On Mon, Aug 28, 2023 at 05:51:09PM -0400, Joel Fernandes wrote: > I think the issue is the loop later in the function does > not try to toggle cpus that came online too late. > > So it does not test offloading on all CPUs just because max got updated too > late. Right, and therefore for_each_possible_cpu() or for_each_present_cpu() should be fine to iterate since it's ok to try to toggle an offline CPU. > > One fix could be to periodically check in the loop if a new cpu at maxcpu + 1 > ever got onlined. If it did, update the maxcpu. Is it worth the complication though? Thanks. > > Thanks. > > > > > > Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-29 9:24 ` Frederic Weisbecker @ 2023-08-29 10:12 ` Joel Fernandes 2023-08-29 12:38 ` Paul E. McKenney 1 sibling, 0 replies; 14+ messages in thread From: Joel Fernandes @ 2023-08-29 10:12 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Paul E. McKenney, Z qiang, linux-kernel, rcu > On Aug 29, 2023, at 5:24 AM, Frederic Weisbecker <frederic@kernel.org> wrote: > > On Mon, Aug 28, 2023 at 05:51:09PM -0400, Joel Fernandes wrote: >> I think the issue is the loop later in the function does >> not try to toggle cpus that came online too late. >> >> So it does not test offloading on all CPUs just because max got updated too >> late. > > Right, and therefore for_each_possible_cpu() or for_each_present_cpu() > should be fine to iterate since it's ok to try to toggle an offline CPU. Ah I see what you mean, sounds good. > >> >> One fix could be to periodically check in the loop if a new cpu at maxcpu + 1 >> ever got onlined. If it did, update the maxcpu. > > Is it worth the complication though? Probably not and so your suggestion sounds fine. Thanks! - Joel > > Thanks. > >> >> Thanks. >> >> >>> >>> Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-29 9:24 ` Frederic Weisbecker 2023-08-29 10:12 ` Joel Fernandes @ 2023-08-29 12:38 ` Paul E. McKenney 2023-09-01 14:56 ` Paul E. McKenney 1 sibling, 1 reply; 14+ messages in thread From: Paul E. McKenney @ 2023-08-29 12:38 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Joel Fernandes, Z qiang, linux-kernel, rcu On Tue, Aug 29, 2023 at 11:24:24AM +0200, Frederic Weisbecker wrote: > On Mon, Aug 28, 2023 at 05:51:09PM -0400, Joel Fernandes wrote: > > I think the issue is the loop later in the function does > > not try to toggle cpus that came online too late. > > > > So it does not test offloading on all CPUs just because max got updated too > > late. > > Right, and therefore for_each_possible_cpu() or for_each_present_cpu() > should be fine to iterate since it's ok to try to toggle an offline CPU. OK, so I will accept the original patch which did just that. Thank you! I recently got burned by lack of workqueues on a not-yet-onlined CPU, hence my questions here. ;-) Thanx, Paul > > One fix could be to periodically check in the loop if a new cpu at maxcpu + 1 > > ever got onlined. If it did, update the maxcpu. > > Is it worth the complication though? > > Thanks. > > > > > Thanks. > > > > > > > > > > Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() 2023-08-29 12:38 ` Paul E. McKenney @ 2023-09-01 14:56 ` Paul E. McKenney 0 siblings, 0 replies; 14+ messages in thread From: Paul E. McKenney @ 2023-09-01 14:56 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Joel Fernandes, Z qiang, linux-kernel, rcu On Tue, Aug 29, 2023 at 05:38:54AM -0700, Paul E. McKenney wrote: > On Tue, Aug 29, 2023 at 11:24:24AM +0200, Frederic Weisbecker wrote: > > On Mon, Aug 28, 2023 at 05:51:09PM -0400, Joel Fernandes wrote: > > > I think the issue is the loop later in the function does > > > not try to toggle cpus that came online too late. > > > > > > So it does not test offloading on all CPUs just because max got updated too > > > late. > > > > Right, and therefore for_each_possible_cpu() or for_each_present_cpu() > > should be fine to iterate since it's ok to try to toggle an offline CPU. > > OK, so I will accept the original patch which did just that. Which I finally did. I took the liberty of adding Frederic's Reviewed-by, but please let me know if this is in any way inappropriate. Thanx, Paul > Thank you! > > I recently got burned by lack of workqueues on a not-yet-onlined CPU, > hence my questions here. ;-) > > Thanx, Paul > > > > One fix could be to periodically check in the loop if a new cpu at maxcpu + 1 > > > ever got onlined. If it did, update the maxcpu. > > > > Is it worth the complication though? > > > > Thanks. > > > > > > > > Thanks. > > > > > > > > > > > > > > Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() @ 2023-08-24 8:41 Zqiang 0 siblings, 0 replies; 14+ messages in thread From: Zqiang @ 2023-08-24 8:41 UTC (permalink / raw) To: paulmck, joel; +Cc: qiang.zhang1211, linux-kernel, rcu Currently, the maxcpu is set by traversing online CPUs, however, if the rcutorture.onoff_holdoff is set zero and onoff_interval is set non-zero, and the some CPUs with larger cpuid has been offline before setting maxcpu, for these CPUs, even if they are online again, also cannot be offload or deoffload. This commit therefore use for_each_possible_cpu() instead of for_each_online_cpu() in rcu_nocb_toggle(). Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> --- kernel/rcu/rcutorture.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index a58372bdf0c1..b75d0fe558ce 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -2131,7 +2131,7 @@ static int rcu_nocb_toggle(void *arg) VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); while (!rcu_inkernel_boot_has_ended()) schedule_timeout_interruptible(HZ / 10); - for_each_online_cpu(cpu) + for_each_possible_cpu(cpu) maxcpu = cpu; WARN_ON(maxcpu < 0); if (toggle_interval > ULONG_MAX) -- 2.17.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-09-01 14:56 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-24 8:42 [PATCH] rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle() Zqiang 2023-08-24 8:59 ` Z qiang 2023-08-24 13:21 ` Paul E. McKenney 2023-08-25 2:28 ` Z qiang 2023-08-26 1:29 ` Paul E. McKenney 2023-08-26 6:13 ` Z qiang 2023-08-26 13:06 ` Paul E. McKenney 2023-08-28 15:17 ` Frederic Weisbecker 2023-08-28 21:51 ` Joel Fernandes 2023-08-29 9:24 ` Frederic Weisbecker 2023-08-29 10:12 ` Joel Fernandes 2023-08-29 12:38 ` Paul E. McKenney 2023-09-01 14:56 ` Paul E. McKenney -- strict thread matches above, loose matches on Subject: below -- 2023-08-24 8:41 Zqiang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox