netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [syzbot] WARNING in c_start
       [not found]     ` <Y0qfLyhSoTodAdxu@zn.tnic>
@ 2022-10-15 20:44       ` Yury Norov
  2022-10-16  0:24         ` Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Yury Norov @ 2022-10-15 20:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tetsuo Handa, syzbot, syzkaller-bugs, Andrew Jones, netdev,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, yury.norov, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel

Add people from other threads discussing this.

On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
> > That's an invalid command line. The correct syntax is:
> > 
> > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> The fix is not in Linus' tree yet.
> 
> > Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
> > other architectures have the same problem. And fixing all callers will
> > not be in time for this merge window.
> 
> Why won't there be time? That's why the -rcs are for.
> 
> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
> 
> So no, we will take Andrew's fixes for all arches in time for 6.1.

Summarizing things:

1. cpumask_check() was introduced to make sure that the cpu number
passed into cpumask API belongs to a valid range. But the check is
broken for a very long time. And because of that there are a lot of
places where cpumask API is used wrongly.

2. Underlying bitmap functions handle that correctly - when user
passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
everything is working smoothly.

3. I fixed all warnings that I was aware at the time of submitting the
patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
proposed fixes for c_start() in arch code.

4. The code paths mentioned above are all known to me that violate
cpumask_check() rules. (Did I miss something?)

With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
me about this problem with c_start(). But I don't like the idea to revert
cpumask_check() fix. This way we'll never clean that mess. 

If for some reason those warnings are unacceptable for -rcs (and like
Boris, I don't understand why), than instead of reverting commits, I'd
suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
fix their code. I can send a patch shortly, if we'll decide going this way.

How people would even realize that they're doing something wrong if
they will not get warned about it?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-15 20:44       ` [syzbot] WARNING in c_start Yury Norov
@ 2022-10-16  0:24         ` Tetsuo Handa
  2022-10-16  0:28           ` Randy Dunlap
  2022-10-16  1:12           ` Yury Norov
  0 siblings, 2 replies; 9+ messages in thread
From: Tetsuo Handa @ 2022-10-16  0:24 UTC (permalink / raw)
  To: Yury Norov, Borislav Petkov, Linus Torvalds
  Cc: syzbot, syzkaller-bugs, Andrew Jones, netdev, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel

On 2022/10/16 5:44, Yury Norov wrote:
> Add people from other threads discussing this.
> 
> On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
>> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
>>> That's an invalid command line. The correct syntax is:
>>>
>>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>
>> The fix is not in Linus' tree yet.
>>
>>> Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
>>> other architectures have the same problem. And fixing all callers will
>>> not be in time for this merge window.
>>
>> Why won't there be time? That's why the -rcs are for.
>>
>> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
>>
>> So no, we will take Andrew's fixes for all arches in time for 6.1.
> 
> Summarizing things:
> 
> 1. cpumask_check() was introduced to make sure that the cpu number
> passed into cpumask API belongs to a valid range. But the check is
> broken for a very long time. And because of that there are a lot of
> places where cpumask API is used wrongly.
> 
> 2. Underlying bitmap functions handle that correctly - when user
> passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
> what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
> everything is working smoothly.
> 
> 3. I fixed all warnings that I was aware at the time of submitting the
> patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
> netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
> cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
> proposed fixes for c_start() in arch code.
> 
> 4. The code paths mentioned above are all known to me that violate
> cpumask_check() rules. (Did I miss something?)
> 
> With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
> me about this problem with c_start(). But I don't like the idea to revert
> cpumask_check() fix. This way we'll never clean that mess. 
> 
> If for some reason those warnings are unacceptable for -rcs (and like
> Boris, I don't understand why), than instead of reverting commits, I'd
> suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
> config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
> fix their code. I can send a patch shortly, if we'll decide going this way.
> 
> How people would even realize that they're doing something wrong if
> they will not get warned about it?

I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.
Just printing messages (without "BUG:"/"WARNING:" string which also breaks
syzkaller) like below diff is sufficient for people to realize that they're
doing something wrong.

Again, please do revert "cpumask: fix checking valid cpu range" immediately.

 include/linux/cpumask.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index c2aa0aa26b45..31af2cc5f0c2 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -118,6 +118,18 @@ static __always_inline unsigned int cpumask_check(unsigned int cpu)
 	return cpu;
 }
 
+/*
+ * We want to avoid passing -1 as a valid cpu argument.
+ * But we should not crash the kernel until all in-tree callers are fixed.
+ */
+static __always_inline void report_negative_cpuid(void)
+{
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+	pr_warn_once("FIXME: Passing -1 as CPU argument needs to be avoided.\n");
+	DO_ONCE_LITE(dump_stack);
+#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+}
+
 /**
  * cpumask_first - get the first cpu in a cpumask
  * @srcp: the cpumask pointer
@@ -177,6 +189,8 @@ unsigned int cpumask_next(int n, const struct cpumask *srcp)
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
 }
 
@@ -192,6 +206,8 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1);
 }
 
@@ -234,6 +250,8 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
 		nr_cpumask_bits, n + 1);
 }
@@ -265,6 +283,8 @@ unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, boo
 	cpumask_check(start);
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 
 	/*
 	 * Return the first available CPU when wrapping, or when starting before cpu0,
-- 
2.18.4

> 
> Thanks,
> Yury


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  0:24         ` Tetsuo Handa
@ 2022-10-16  0:28           ` Randy Dunlap
  2022-10-16  0:34             ` Tetsuo Handa
  2022-10-16  1:12           ` Yury Norov
  1 sibling, 1 reply; 9+ messages in thread
From: Randy Dunlap @ 2022-10-16  0:28 UTC (permalink / raw)
  To: Tetsuo Handa, Yury Norov, Borislav Petkov, Linus Torvalds
  Cc: syzbot, syzkaller-bugs, Andrew Jones, netdev, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel

Hi--

On 10/15/22 17:24, Tetsuo Handa wrote:
> On 2022/10/16 5:44, Yury Norov wrote:
>> Add people from other threads discussing this.
>>
>> On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
>>> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
>>>> That's an invalid command line. The correct syntax is:
>>>>
>>>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>
>>> The fix is not in Linus' tree yet.
>>>
>>>> Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
>>>> other architectures have the same problem. And fixing all callers will
>>>> not be in time for this merge window.
>>>
>>> Why won't there be time? That's why the -rcs are for.
>>>
>>> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
>>>
>>> So no, we will take Andrew's fixes for all arches in time for 6.1.
>>
>> Summarizing things:
>>
>> 1. cpumask_check() was introduced to make sure that the cpu number
>> passed into cpumask API belongs to a valid range. But the check is
>> broken for a very long time. And because of that there are a lot of
>> places where cpumask API is used wrongly.
>>
>> 2. Underlying bitmap functions handle that correctly - when user
>> passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
>> what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
>> everything is working smoothly.
>>
>> 3. I fixed all warnings that I was aware at the time of submitting the
>> patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
>> netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
>> cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
>> proposed fixes for c_start() in arch code.
>>
>> 4. The code paths mentioned above are all known to me that violate
>> cpumask_check() rules. (Did I miss something?)
>>
>> With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
>> me about this problem with c_start(). But I don't like the idea to revert
>> cpumask_check() fix. This way we'll never clean that mess. 
>>
>> If for some reason those warnings are unacceptable for -rcs (and like
>> Boris, I don't understand why), than instead of reverting commits, I'd
>> suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
>> config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
>> fix their code. I can send a patch shortly, if we'll decide going this way.
>>
>> How people would even realize that they're doing something wrong if
>> they will not get warned about it?
> 
> I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.
> Just printing messages (without "BUG:"/"WARNING:" string which also breaks
> syzkaller) like below diff is sufficient for people to realize that they're
> doing something wrong.
> 
> Again, please do revert "cpumask: fix checking valid cpu range" immediately.
> 
>  include/linux/cpumask.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index c2aa0aa26b45..31af2cc5f0c2 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -118,6 +118,18 @@ static __always_inline unsigned int cpumask_check(unsigned int cpu)
>  	return cpu;
>  }
>  
> +/*
> + * We want to avoid passing -1 as a valid cpu argument.
> + * But we should not crash the kernel until all in-tree callers are fixed.
> + */

Why not say that any negative cpu argument is invalid?
Or is it OK to pass -2 as the cpu arg?

> +static __always_inline void report_negative_cpuid(void)
> +{
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +	pr_warn_once("FIXME: Passing -1 as CPU argument needs to be avoided.\n");
> +	DO_ONCE_LITE(dump_stack);
> +#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
> +}
> +
>  /**
>   * cpumask_first - get the first cpu in a cpumask
>   * @srcp: the cpumask pointer
> @@ -177,6 +189,8 @@ unsigned int cpumask_next(int n, const struct cpumask *srcp)
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
>  }
>  
> @@ -192,6 +206,8 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1);
>  }
>  
> @@ -234,6 +250,8 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
>  		nr_cpumask_bits, n + 1);
>  }
> @@ -265,6 +283,8 @@ unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, boo
>  	cpumask_check(start);
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  
>  	/*
>  	 * Return the first available CPU when wrapping, or when starting before cpu0,

-- 
~Randy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  0:28           ` Randy Dunlap
@ 2022-10-16  0:34             ` Tetsuo Handa
  0 siblings, 0 replies; 9+ messages in thread
From: Tetsuo Handa @ 2022-10-16  0:34 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: syzbot, syzkaller-bugs, Andrew Jones, netdev, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel, Yury Norov, Borislav Petkov,
	Linus Torvalds

On 2022/10/16 9:28, Randy Dunlap wrote:
>> +/*
>> + * We want to avoid passing -1 as a valid cpu argument.
>> + * But we should not crash the kernel until all in-tree callers are fixed.
>> + */
> 
> Why not say that any negative cpu argument is invalid?

Currently only -1 is accepted as exception.

>>  	if (n != -1)
>>  		cpumask_check(n);
>> +	else
>> +		report_negative_cpuid();

> Or is it OK to pass -2 as the cpu arg?

Passing -2 will hit WARN_ON_ONCE(cpu >= nr_cpumask_bits) path.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  0:24         ` Tetsuo Handa
  2022-10-16  0:28           ` Randy Dunlap
@ 2022-10-16  1:12           ` Yury Norov
  2022-10-16  4:10             ` Tetsuo Handa
  2022-10-16 17:52             ` Linus Torvalds
  1 sibling, 2 replies; 9+ messages in thread
From: Yury Norov @ 2022-10-16  1:12 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Borislav Petkov, Linus Torvalds, syzbot, syzkaller-bugs,
	Andrew Jones, netdev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Sebastian Andrzej Siewior,
	Menglong Dong, Kuniyuki Iwashima, Petr Machata, Guo Ren,
	Michael S . Tsirkin, Alexander Gordeev, andriy.shevchenko, linux,
	caraitto, willemb, jonolson, amritha.nambiar, linux-kernel

On Sun, Oct 16, 2022 at 09:24:57AM +0900, Tetsuo Handa wrote:
> On 2022/10/16 5:44, Yury Norov wrote:
> > Add people from other threads discussing this.
> > 
> > On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
> >> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
> >>> That's an invalid command line. The correct syntax is:
> >>>
> >>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >>
> >> The fix is not in Linus' tree yet.
> >>
> >>> Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
> >>> other architectures have the same problem. And fixing all callers will
> >>> not be in time for this merge window.
> >>
> >> Why won't there be time? That's why the -rcs are for.
> >>
> >> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
> >>
> >> So no, we will take Andrew's fixes for all arches in time for 6.1.
> > 
> > Summarizing things:
> > 
> > 1. cpumask_check() was introduced to make sure that the cpu number
> > passed into cpumask API belongs to a valid range. But the check is
> > broken for a very long time. And because of that there are a lot of
> > places where cpumask API is used wrongly.
> > 
> > 2. Underlying bitmap functions handle that correctly - when user
> > passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
> > what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
> > everything is working smoothly.
> > 
> > 3. I fixed all warnings that I was aware at the time of submitting the
> > patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
> > netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
> > cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
> > proposed fixes for c_start() in arch code.
> > 
> > 4. The code paths mentioned above are all known to me that violate
> > cpumask_check() rules. (Did I miss something?)
> > 
> > With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
> > me about this problem with c_start(). But I don't like the idea to revert
> > cpumask_check() fix. This way we'll never clean that mess. 
> > 
> > If for some reason those warnings are unacceptable for -rcs (and like
> > Boris, I don't understand why), than instead of reverting commits, I'd
> > suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
> > config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
> > fix their code. I can send a patch shortly, if we'll decide going this way.
> > 
> > How people would even realize that they're doing something wrong if
> > they will not get warned about it?
> 
> I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.

It's not me who added WARN_ON() in the cpumask_check(). You're asking
a wrong person. 

What for do we have WARN_ON(), if we can't use it?

> Just printing messages (without "BUG:"/"WARNING:" string which also breaks
> syzkaller) like below diff is sufficient for people to realize that they're
> doing something wrong.
> 
> Again, please do revert "cpumask: fix checking valid cpu range" immediately.

The revert is already in Jakub's batch for -rc2, AFAIK.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  1:12           ` Yury Norov
@ 2022-10-16  4:10             ` Tetsuo Handa
  2022-10-16 17:52             ` Linus Torvalds
  1 sibling, 0 replies; 9+ messages in thread
From: Tetsuo Handa @ 2022-10-16  4:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Borislav Petkov, Linus Torvalds, syzbot, syzkaller-bugs,
	Andrew Jones, netdev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Sebastian Andrzej Siewior,
	Menglong Dong, Kuniyuki Iwashima, Petr Machata, Guo Ren,
	Michael S . Tsirkin, Alexander Gordeev, andriy.shevchenko, linux,
	caraitto, willemb, jonolson, amritha.nambiar, linux-kernel

On 2022/10/16 10:12, Yury Norov wrote:
> On Sun, Oct 16, 2022 at 09:24:57AM +0900, Tetsuo Handa wrote:
>> I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.
> 
> It's not me who added WARN_ON() in the cpumask_check(). You're asking
> a wrong person. 

Because you broke the kernel testing. It was obvious that passing "cpu + 1"
instead of "cpu" will trivially hit cpu == nr_cpumask_bits condition.
If your patch were reviewed and tested, we would not have done this discussion.

> 
> What for do we have WARN_ON(), if we can't use it?

WARN_ON() could be used which should not happen.
But cpu == nr_cpumask_bits condition shall happen in your patch.

RCs are not for fixing bugs that causes boot failures. Such bugs
should have been tested and fixed before patches are sent to linux.git .

> 
>> Just printing messages (without "BUG:"/"WARNING:" string which also breaks
>> syzkaller) like below diff is sufficient for people to realize that they're
>> doing something wrong.
>>
>> Again, please do revert "cpumask: fix checking valid cpu range" immediately.
> 
> The revert is already in Jakub's batch for -rc2, AFAIK.

OK.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  1:12           ` Yury Norov
  2022-10-16  4:10             ` Tetsuo Handa
@ 2022-10-16 17:52             ` Linus Torvalds
  2022-10-17  2:54               ` Tetsuo Handa
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2022-10-16 17:52 UTC (permalink / raw)
  To: Yury Norov
  Cc: Tetsuo Handa, Borislav Petkov, syzbot, syzkaller-bugs,
	Andrew Jones, netdev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Sebastian Andrzej Siewior,
	Menglong Dong, Kuniyuki Iwashima, Petr Machata, Guo Ren,
	Michael S . Tsirkin, Alexander Gordeev, andriy.shevchenko, linux,
	caraitto, willemb, jonolson, amritha.nambiar, linux-kernel

On Sat, Oct 15, 2022 at 6:12 PM Yury Norov <yury.norov@gmail.com> wrote:
>
> On Sun, Oct 16, 2022 at 09:24:57AM +0900, Tetsuo Handa wrote:
> >
> > Again, please do revert "cpumask: fix checking valid cpu range" immediately.
>
> The revert is already in Jakub's batch for -rc2, AFAIK.

Hmm.

I've looked at this, and at the discussion, and the various reports,
and my gut feel is that the problem is that the whole
"cpumask_check()" is completely bogus for all the "starting at bit X"
cases.

And I think it was wrong even before, and yes, I think the "+1"
simplification just made things worse.

I think that where it makes sense is in contexts where we actually
*use* the bit value as a bit, so cpumask_clear_cpu() doing

        clear_bit(cpumask_check(cpu), cpumask_bits(dstp));

makes 100% sense and is unequivocally something that merits a warning.
An out-of-range cpu number is a serious bug in that context.

But all the cases where the function fundamentally already limits
things to the number of CPU's (with comments like "Returns >=
nr_cpu_ids if no further cpus unset.") should simply not have the
cpumask_check() at all.

All it results in just moving the onus of testing things into the
callers, or just makes for odd complications ("-1 is valid, because it
acts as the previous cpu for the beginning because we add one to get
the next CPU").

Anyway, since rc1 is fairly imminent, I will just revert it for now -
I don't want to have a pending revert wait until -rc2.

But I actually suspect that the thing we should really do is to just
remove the check entirely for these functions that are already defined
in terms of "if no more bits, return nr_cpu_ids". They already
basically return an error case, having them *warn* about the error
they are going to return is just obnoxious.

                     Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16 17:52             ` Linus Torvalds
@ 2022-10-17  2:54               ` Tetsuo Handa
  2022-10-17 21:26                 ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2022-10-17  2:54 UTC (permalink / raw)
  To: Linus Torvalds, Yury Norov, Jakub Kicinski
  Cc: Borislav Petkov, syzbot, syzkaller-bugs, Andrew Jones, netdev,
	David S . Miller, Eric Dumazet, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel

On 2022/10/17 2:52, Linus Torvalds wrote:
> Anyway, since rc1 is fairly imminent, I will just revert it for now -
> I don't want to have a pending revert wait until -rc2.

Thank you, Linus.

Yury or Jakub, please send a revert request on commit 854701ba4c39 ("net: fix cpu_max_bits_warn()
usage in netif_attrmask_next{,_and}"), for https://syzkaller.appspot.com/bug?extid=9abe5ecc348676215427
says that boot is still failing.

> But I actually suspect that the thing we should really do is to just
> remove the check entirely for these functions that are already defined
> in terms of "if no more bits, return nr_cpu_ids". They already
> basically return an error case, having them *warn* about the error
> they are going to return is just obnoxious.

I agree.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-17  2:54               ` Tetsuo Handa
@ 2022-10-17 21:26                 ` Jakub Kicinski
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2022-10-17 21:26 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Linus Torvalds, Yury Norov, Borislav Petkov, syzbot,
	syzkaller-bugs, Andrew Jones, netdev, David S . Miller,
	Eric Dumazet, Paolo Abeni, Sebastian Andrzej Siewior,
	Menglong Dong, Kuniyuki Iwashima, Petr Machata, Guo Ren,
	Michael S . Tsirkin, Alexander Gordeev, andriy.shevchenko, linux,
	caraitto, willemb, jonolson, amritha.nambiar, linux-kernel

On Mon, 17 Oct 2022 11:54:01 +0900 Tetsuo Handa wrote:
> On 2022/10/17 2:52, Linus Torvalds wrote:
> > Anyway, since rc1 is fairly imminent, I will just revert it for now -
> > I don't want to have a pending revert wait until -rc2.  
> 
> Thank you, Linus.
> 
> Yury or Jakub, please send a revert request on commit 854701ba4c39 ("net: fix cpu_max_bits_warn()
> usage in netif_attrmask_next{,_and}"), for https://syzkaller.appspot.com/bug?extid=9abe5ecc348676215427
> says that boot is still failing.

Yup, we got that revert queued up. LMK if it's a show stopper 
otherwise it'll get to Linus on Thu.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-10-17 21:26 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <0000000000007647ec05eb05249c@google.com>
     [not found] ` <Y0nTd9HSnnt/KDap@zn.tnic>
     [not found]   ` <2eaf1386-8ab0-bd65-acee-e29f1c5a6623@I-love.SAKURA.ne.jp>
     [not found]     ` <Y0qfLyhSoTodAdxu@zn.tnic>
2022-10-15 20:44       ` [syzbot] WARNING in c_start Yury Norov
2022-10-16  0:24         ` Tetsuo Handa
2022-10-16  0:28           ` Randy Dunlap
2022-10-16  0:34             ` Tetsuo Handa
2022-10-16  1:12           ` Yury Norov
2022-10-16  4:10             ` Tetsuo Handa
2022-10-16 17:52             ` Linus Torvalds
2022-10-17  2:54               ` Tetsuo Handa
2022-10-17 21:26                 ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).