Re: [syzbot] general protection fault in __device

public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [syzbot] general protection fault in __device_attach
       [not found] <000000000000bb7f1c05da29b601@google.com>
@ 2022-06-03 10:02 ` syzbot
  2022-06-03 11:04   ` Andy Shevchenko
  2024-01-10 13:12 ` [syzbot] [kernel?] " syzbot
  1 sibling, 1 reply; 14+ messages in thread
From: syzbot @ 2022-06-03 10:02 UTC (permalink / raw)
  To: andriy.shevchenko, gregkh, hdanton, lenb, linux-acpi,
	linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs

syzbot has bisected this issue to:

commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date:   Fri Jun 18 13:41:27 2021 +0000

    ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1040b80df00000
start commit:   d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect
git tree:       upstream
final oops:     https://syzkaller.appspot.com/x/report.txt?x=1240b80df00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1440b80df00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665
dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000

Reported-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com
Fixes: a9c4cf299f5f ("ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-03 10:02 ` [syzbot] general protection fault in __device_attach syzbot
@ 2022-06-03 11:04   ` Andy Shevchenko
  2022-06-03 15:42     ` Alan Stern
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Shevchenko @ 2022-06-03 11:04 UTC (permalink / raw)
  To: syzbot
  Cc: gregkh, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki,
	rafael, rjw, syzkaller-bugs, linux-usb, Alan Stern

On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> syzbot has bisected this issue to:
> 
> commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Date:   Fri Jun 18 13:41:27 2021 +0000
> 
>     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros

Hmm... It's not obvious at all how this change can alter the behaviour so
drastically. device_add() is called from USB core with intf->dev.name == NULL
by some reason. A-ha, seems like fault injector, which looks like

	dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
		     dev->devpath, configuration, ifnum);

missed the return code check.

But I'm not familiar with that code at all, adding Linux USB ML and Alan.

> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1040b80df00000
> start commit:   d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect
> git tree:       upstream
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=1240b80df00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=1440b80df00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665
> dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000
> 
> Reported-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com
> Fixes: a9c4cf299f5f ("ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-03 11:04   ` Andy Shevchenko
@ 2022-06-03 15:42     ` Alan Stern
  2022-06-03 15:52       ` Greg KH
  0 siblings, 1 reply; 14+ messages in thread
From: Alan Stern @ 2022-06-03 15:42 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: syzbot, gregkh, hdanton, lenb, linux-acpi, linux-kernel,
	rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote:
> On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> > syzbot has bisected this issue to:
> > 
> > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Date:   Fri Jun 18 13:41:27 2021 +0000
> > 
> >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> 
> Hmm... It's not obvious at all how this change can alter the behaviour so
> drastically. device_add() is called from USB core with intf->dev.name == NULL
> by some reason. A-ha, seems like fault injector, which looks like
> 
> 	dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> 		     dev->devpath, configuration, ifnum);
> 
> missed the return code check.
> 
> But I'm not familiar with that code at all, adding Linux USB ML and Alan.

I can't see any connection between this bug and acpi/sysfs.c.  Is it a 
bad bisection?

It looks like you're right about dev_set_name() failing.  In fact, the 
kernel appears to be littered with calls to that routine which do not 
check the return code (the entire subtree below drivers/usb/ contains 
only _one_ call that does check the return code!).  The function doesn't 
have any __must_check annotation, and its kerneldoc doesn't mention the 
return code or the possibility of a failure.

Apparently the assumption is that if dev_set_name() fails then 
device_add() later on will also fail, and the problem will be detected 
then.

So now what should happen when device_add() for an interface fails in 
usb_set_configuration()?  I guess the interface should be deleted; 
otherwise we have the possibility that people might still try to access 
it via usbfs, as in the syzbot test run.  Same goes for the 
of_device_is_available() check.

Fixing that will be a little painful.  Right now there are plenty of 
places in the USB core that aren't prepared to cope with a non-existent 
interface.

Alan Stern

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-03 15:42     ` Alan Stern
@ 2022-06-03 15:52       ` Greg KH
  2022-06-03 16:03         ` Alan Stern
  0 siblings, 1 reply; 14+ messages in thread
From: Greg KH @ 2022-06-03 15:52 UTC (permalink / raw)
  To: Alan Stern
  Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel,
	rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote:
> On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote:
> > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> > > syzbot has bisected this issue to:
> > > 
> > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > Date:   Fri Jun 18 13:41:27 2021 +0000
> > > 
> > >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> > 
> > Hmm... It's not obvious at all how this change can alter the behaviour so
> > drastically. device_add() is called from USB core with intf->dev.name == NULL
> > by some reason. A-ha, seems like fault injector, which looks like
> > 
> > 	dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> > 		     dev->devpath, configuration, ifnum);
> > 
> > missed the return code check.
> > 
> > But I'm not familiar with that code at all, adding Linux USB ML and Alan.
> 
> I can't see any connection between this bug and acpi/sysfs.c.  Is it a 
> bad bisection?
> 
> It looks like you're right about dev_set_name() failing.  In fact, the 
> kernel appears to be littered with calls to that routine which do not 
> check the return code (the entire subtree below drivers/usb/ contains 
> only _one_ call that does check the return code!).  The function doesn't 
> have any __must_check annotation, and its kerneldoc doesn't mention the 
> return code or the possibility of a failure.
> 
> Apparently the assumption is that if dev_set_name() fails then 
> device_add() later on will also fail, and the problem will be detected 
> then.
> 
> So now what should happen when device_add() for an interface fails in 
> usb_set_configuration()?

But how can that really fail on a real system?

Is this just due to error-injection stuff?  If so, I'm really loath to
rework the world for something that can never happen in real life.

Or is this a real syzbot-found-with-reproducer issue?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-03 15:52       ` Greg KH
@ 2022-06-03 16:03         ` Alan Stern
  2022-06-03 16:11           ` Greg KH
  0 siblings, 1 reply; 14+ messages in thread
From: Alan Stern @ 2022-06-03 16:03 UTC (permalink / raw)
  To: Greg KH
  Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel,
	rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote:
> On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote:
> > On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote:
> > > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> > > > syzbot has bisected this issue to:
> > > > 
> > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > Date:   Fri Jun 18 13:41:27 2021 +0000
> > > > 
> > > >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> > > 
> > > Hmm... It's not obvious at all how this change can alter the behaviour so
> > > drastically. device_add() is called from USB core with intf->dev.name == NULL
> > > by some reason. A-ha, seems like fault injector, which looks like
> > > 
> > > 	dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> > > 		     dev->devpath, configuration, ifnum);
> > > 
> > > missed the return code check.
> > > 
> > > But I'm not familiar with that code at all, adding Linux USB ML and Alan.
> > 
> > I can't see any connection between this bug and acpi/sysfs.c.  Is it a 
> > bad bisection?
> > 
> > It looks like you're right about dev_set_name() failing.  In fact, the 
> > kernel appears to be littered with calls to that routine which do not 
> > check the return code (the entire subtree below drivers/usb/ contains 
> > only _one_ call that does check the return code!).  The function doesn't 
> > have any __must_check annotation, and its kerneldoc doesn't mention the 
> > return code or the possibility of a failure.
> > 
> > Apparently the assumption is that if dev_set_name() fails then 
> > device_add() later on will also fail, and the problem will be detected 
> > then.
> > 
> > So now what should happen when device_add() for an interface fails in 
> > usb_set_configuration()?
> 
> But how can that really fail on a real system?
> 
> Is this just due to error-injection stuff?  If so, I'm really loath to
> rework the world for something that can never happen in real life.
> 
> Or is this a real syzbot-found-with-reproducer issue?

Aren't there quite a few reasons why device_add() might fail?  (Although 
most of them probably are memory allocation errors...)

Basically, you have to make up your mind.  If a function can fail, you 
should be prepared to handle the failure.  If it can't fail, there's no 
point in even checking the return code.

Alan Stern

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-03 16:03         ` Alan Stern
@ 2022-06-03 16:11           ` Greg KH
  2022-06-03 16:27             ` Alan Stern
  2022-06-04  8:32             ` Dmitry Vyukov
  0 siblings, 2 replies; 14+ messages in thread
From: Greg KH @ 2022-06-03 16:11 UTC (permalink / raw)
  To: Alan Stern
  Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel,
	rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 12:03:32PM -0400, Alan Stern wrote:
> On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote:
> > On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote:
> > > On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote:
> > > > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote:
> > > > > syzbot has bisected this issue to:
> > > > > 
> > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > > Date:   Fri Jun 18 13:41:27 2021 +0000
> > > > > 
> > > > >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> > > > 
> > > > Hmm... It's not obvious at all how this change can alter the behaviour so
> > > > drastically. device_add() is called from USB core with intf->dev.name == NULL
> > > > by some reason. A-ha, seems like fault injector, which looks like
> > > > 
> > > > 	dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> > > > 		     dev->devpath, configuration, ifnum);
> > > > 
> > > > missed the return code check.
> > > > 
> > > > But I'm not familiar with that code at all, adding Linux USB ML and Alan.
> > > 
> > > I can't see any connection between this bug and acpi/sysfs.c.  Is it a 
> > > bad bisection?
> > > 
> > > It looks like you're right about dev_set_name() failing.  In fact, the 
> > > kernel appears to be littered with calls to that routine which do not 
> > > check the return code (the entire subtree below drivers/usb/ contains 
> > > only _one_ call that does check the return code!).  The function doesn't 
> > > have any __must_check annotation, and its kerneldoc doesn't mention the 
> > > return code or the possibility of a failure.
> > > 
> > > Apparently the assumption is that if dev_set_name() fails then 
> > > device_add() later on will also fail, and the problem will be detected 
> > > then.
> > > 
> > > So now what should happen when device_add() for an interface fails in 
> > > usb_set_configuration()?
> > 
> > But how can that really fail on a real system?
> > 
> > Is this just due to error-injection stuff?  If so, I'm really loath to
> > rework the world for something that can never happen in real life.
> > 
> > Or is this a real syzbot-found-with-reproducer issue?
> 
> Aren't there quite a few reasons why device_add() might fail?  (Although 
> most of them probably are memory allocation errors...)

I was thinking of the dev_set_name() issue further back in the call
chain.

> Basically, you have to make up your mind.  If a function can fail, you 
> should be prepared to handle the failure.  If it can't fail, there's no 
> point in even checking the return code.

True, ok, we should unwind the mess.  I'll try to look at it after the
merge window...

But again, is this a "real and able to be triggered from userspace"
problem, or just fault-injection-induced?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-03 16:11           ` Greg KH
@ 2022-06-03 16:27             ` Alan Stern
  2022-06-04  8:32             ` Dmitry Vyukov
  1 sibling, 0 replies; 14+ messages in thread
From: Alan Stern @ 2022-06-03 16:27 UTC (permalink / raw)
  To: Greg KH
  Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel,
	rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb

On Fri, Jun 03, 2022 at 06:11:55PM +0200, Greg KH wrote:
> On Fri, Jun 03, 2022 at 12:03:32PM -0400, Alan Stern wrote:
> > On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote:
> > > On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote:
> > > > So now what should happen when device_add() for an interface fails in 
> > > > usb_set_configuration()?
> > > 
> > > But how can that really fail on a real system?
> > > 
> > > Is this just due to error-injection stuff?  If so, I'm really loath to
> > > rework the world for something that can never happen in real life.
> > > 
> > > Or is this a real syzbot-found-with-reproducer issue?
> > 
> > Aren't there quite a few reasons why device_add() might fail?  (Although 
> > most of them probably are memory allocation errors...)
> 
> I was thinking of the dev_set_name() issue further back in the call
> chain.

As far as I know, the only reason for dev_set_name() to fail is -ENOMEM.  
That's not something the user can control directly.

> > Basically, you have to make up your mind.  If a function can fail, you 
> > should be prepared to handle the failure.  If it can't fail, there's no 
> > point in even checking the return code.
> 
> True, ok, we should unwind the mess.  I'll try to look at it after the
> merge window...
> 
> But again, is this a "real and able to be triggered from userspace"
> problem, or just fault-injection-induced?

I don't think any of the failure paths here are controlled by the user.  
They all seem to involve something going wrong internally in the kernel 
(i.e., corruption or memory allocation failure for a small buffer).  
Once that happens, the game is pretty much over anyway.

Is it worth handling this sort of thing, or should we ignore the 
possibility and allow it to escalate to the point where the user can 
potentially trigger a kernel panic?  Another way of putting it is: How 
gracefully do you want the kernel to collapse when this sort of 
corruption happens?

Alan Stern

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-03 16:11           ` Greg KH
  2022-06-03 16:27             ` Alan Stern
@ 2022-06-04  8:32             ` Dmitry Vyukov
  2022-06-06 12:38               ` Dan Carpenter
  1 sibling, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2022-06-04  8:32 UTC (permalink / raw)
  To: Greg KH
  Cc: Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi,
	linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs,
	linux-usb

On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > > syzbot has bisected this issue to:
> > > > > >
> > > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1
> > > > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > > > > Date:   Fri Jun 18 13:41:27 2021 +0000
> > > > > >
> > > > > >     ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
> > > > >
> > > > > Hmm... It's not obvious at all how this change can alter the behaviour so
> > > > > drastically. device_add() is called from USB core with intf->dev.name == NULL
> > > > > by some reason. A-ha, seems like fault injector, which looks like
> > > > >
> > > > >         dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum,
> > > > >                      dev->devpath, configuration, ifnum);
> > > > >
> > > > > missed the return code check.
> > > > >
> > > > > But I'm not familiar with that code at all, adding Linux USB ML and Alan.
> > > >
> > > > I can't see any connection between this bug and acpi/sysfs.c.  Is it a
> > > > bad bisection?
> > > >
> > > > It looks like you're right about dev_set_name() failing.  In fact, the
> > > > kernel appears to be littered with calls to that routine which do not
> > > > check the return code (the entire subtree below drivers/usb/ contains
> > > > only _one_ call that does check the return code!).  The function doesn't
> > > > have any __must_check annotation, and its kerneldoc doesn't mention the
> > > > return code or the possibility of a failure.
> > > >
> > > > Apparently the assumption is that if dev_set_name() fails then
> > > > device_add() later on will also fail, and the problem will be detected
> > > > then.
> > > >
> > > > So now what should happen when device_add() for an interface fails in
> > > > usb_set_configuration()?
> > >
> > > But how can that really fail on a real system?
> > >
> > > Is this just due to error-injection stuff?  If so, I'm really loath to
> > > rework the world for something that can never happen in real life.
> > >
> > > Or is this a real syzbot-found-with-reproducer issue?
> >
> > Aren't there quite a few reasons why device_add() might fail?  (Although
> > most of them probably are memory allocation errors...)
>
> I was thinking of the dev_set_name() issue further back in the call
> chain.
>
> > Basically, you have to make up your mind.  If a function can fail, you
> > should be prepared to handle the failure.  If it can't fail, there's no
> > point in even checking the return code.
>
> True, ok, we should unwind the mess.  I'll try to look at it after the
> merge window...
>
> But again, is this a "real and able to be triggered from userspace"
> problem, or just fault-injection-induced?

Then this is something to fix in the fault injection subsystem.
Testing systems shouldn't be reporting false positives.
What allocations cannot fail in real life? Is it <=page_size?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-04  8:32             ` Dmitry Vyukov
@ 2022-06-06 12:38               ` Dan Carpenter
  2022-06-07  7:15                 ` Dmitry Vyukov
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Carpenter @ 2022-06-06 12:38 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb,
	linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw,
	syzkaller-bugs, linux-usb

On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > But again, is this a "real and able to be triggered from userspace"
> > problem, or just fault-injection-induced?
> 
> Then this is something to fix in the fault injection subsystem.
> Testing systems shouldn't be reporting false positives.
> What allocations cannot fail in real life? Is it <=page_size?
> 

Apparently in 2014, anything less than *EIGHT?!!* pages succeeded!

https://lwn.net/Articles/627419/

I have been on the look out since that article and never seen anyone
mention it changing.  I think we should ignore that and say that
anything over PAGE_SIZE can fail.  Possibly we could go smaller than
PAGE_SIZE...

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-06 12:38               ` Dan Carpenter
@ 2022-06-07  7:15                 ` Dmitry Vyukov
  2022-06-08  3:25                   ` Matthew Wilcox
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2022-06-07  7:15 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb,
	linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw,
	syzkaller-bugs, linux-usb, Linux-MM

On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > But again, is this a "real and able to be triggered from userspace"
> > > problem, or just fault-injection-induced?
> >
> > Then this is something to fix in the fault injection subsystem.
> > Testing systems shouldn't be reporting false positives.
> > What allocations cannot fail in real life? Is it <=page_size?
> >
>
> Apparently in 2014, anything less than *EIGHT?!!* pages succeeded!
>
> https://lwn.net/Articles/627419/
>
> I have been on the look out since that article and never seen anyone
> mention it changing.  I think we should ignore that and say that
> anything over PAGE_SIZE can fail.  Possibly we could go smaller than
> PAGE_SIZE...

+linux-mm for GFP expertise re what allocations cannot possibly fail
and should be excluded from fault injection.

Interesting, thanks for the link.

PAGE_SIZE looks like a good start. Once we have the predicate in
place, we can refine it later when/if we have more inputs.

But I wonder about GFP flags. They definitely have some impact on allocations.
If GFP_ACCOUNT is set, all allocations can fail, right?
If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones?
If GFP_NORETRY is set, allocations can fail?
What about GFP_NOMEMALLOC and GFP_ATOMIC?
What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At
least some of these need to be set for allocations to not fail? Which
ones?
Any other flags are required to be set/unset for allocations to not fail?

FTR here is quick link to flags list:
https://elixir.bootlin.com/linux/v5.19-rc1/source/include/linux/gfp.h#L32

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-07  7:15                 ` Dmitry Vyukov
@ 2022-06-08  3:25                   ` Matthew Wilcox
  2022-06-08  8:20                     ` Dmitry Vyukov
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2022-06-08  3:25 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot,
	hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael,
	rjw, syzkaller-bugs, linux-usb, Linux-MM

On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote:
> On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> >
> > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > But again, is this a "real and able to be triggered from userspace"
> > > > problem, or just fault-injection-induced?
> > >
> > > Then this is something to fix in the fault injection subsystem.
> > > Testing systems shouldn't be reporting false positives.
> > > What allocations cannot fail in real life? Is it <=page_size?
> > >
> >
> > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded!
> >
> > https://lwn.net/Articles/627419/
> >
> > I have been on the look out since that article and never seen anyone
> > mention it changing.  I think we should ignore that and say that
> > anything over PAGE_SIZE can fail.  Possibly we could go smaller than
> > PAGE_SIZE...
> 
> +linux-mm for GFP expertise re what allocations cannot possibly fail
> and should be excluded from fault injection.
> 
> Interesting, thanks for the link.
> 
> PAGE_SIZE looks like a good start. Once we have the predicate in
> place, we can refine it later when/if we have more inputs.
> 
> But I wonder about GFP flags. They definitely have some impact on allocations.
> If GFP_ACCOUNT is set, all allocations can fail, right?
> If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones?
> If GFP_NORETRY is set, allocations can fail?
> What about GFP_NOMEMALLOC and GFP_ATOMIC?
> What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At
> least some of these need to be set for allocations to not fail? Which
> ones?
> Any other flags are required to be set/unset for allocations to not fail?

I'm not the expert on page allocation, but ...

I don't think GFP_ACCOUNT makes allocations fail.  It might make reclaim
happen from within that cgroup, and it might cause an OOM kill for
something in that cgroup.  But I don't think it makes a (low order)
allocation more likely to fail.

There's usually less memory avilable in DMA/DMA32 zones, but we have
so few allocations from those zones, I question the utility of focusing
testing on those allocations.

GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely
to fail.  KSWAPD_RECLAIM has no effect on whether _this_ allocation
succeeds or fails; it kicks kswapd to do reclaim, rather than doing
reclaim directly.  DIRECT_RECLAIM definitely makes allocations more likely
to succeed.  GFP_FS allows (direct) reclaim to happen from filesystems.
GFP_IO allows IO to start (ie writeback can start) in order to clean
dirty memory.

Anyway, I hope somebody who knows the page allocator better than I do
can say smarter things than this.  Even better if they can put it into
Documentation/ somewhere ;-)

https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html
exists but isn't quite enough to answer this question.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-08  3:25                   ` Matthew Wilcox
@ 2022-06-08  8:20                     ` Dmitry Vyukov
  2022-06-08  8:24                       ` Dmitry Vyukov
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2022-06-08  8:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot,
	hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael,
	rjw, syzkaller-bugs, linux-usb, Linux-MM

On Wed, 8 Jun 2022 at 05:25, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote:
> > On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> > >
> > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > But again, is this a "real and able to be triggered from userspace"
> > > > > problem, or just fault-injection-induced?
> > > >
> > > > Then this is something to fix in the fault injection subsystem.
> > > > Testing systems shouldn't be reporting false positives.
> > > > What allocations cannot fail in real life? Is it <=page_size?
> > > >
> > >
> > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded!
> > >
> > > https://lwn.net/Articles/627419/
> > >
> > > I have been on the look out since that article and never seen anyone
> > > mention it changing.  I think we should ignore that and say that
> > > anything over PAGE_SIZE can fail.  Possibly we could go smaller than
> > > PAGE_SIZE...
> >
> > +linux-mm for GFP expertise re what allocations cannot possibly fail
> > and should be excluded from fault injection.
> >
> > Interesting, thanks for the link.
> >
> > PAGE_SIZE looks like a good start. Once we have the predicate in
> > place, we can refine it later when/if we have more inputs.
> >
> > But I wonder about GFP flags. They definitely have some impact on allocations.
> > If GFP_ACCOUNT is set, all allocations can fail, right?
> > If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones?
> > If GFP_NORETRY is set, allocations can fail?
> > What about GFP_NOMEMALLOC and GFP_ATOMIC?
> > What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At
> > least some of these need to be set for allocations to not fail? Which
> > ones?
> > Any other flags are required to be set/unset for allocations to not fail?
>
> I'm not the expert on page allocation, but ...
>
> I don't think GFP_ACCOUNT makes allocations fail.  It might make reclaim
> happen from within that cgroup, and it might cause an OOM kill for
> something in that cgroup.  But I don't think it makes a (low order)
> allocation more likely to fail.

Interesting.
I was thinking of some malicious specifically crafted configurations
with very low limit and particular pattern of allocations. Also what
if there is just 1 process (current)? Is it possible to kill and
reclaim the current process when a thread is stuck in the middle of
the kernel on a kmalloc?
Also I see e.g.:
        Tasks with the OOM protection (oom_score_adj set to -1000)
        are treated as an exception and are never killed.

I am not an expert on this either, but I think it may be hard to fight
with a specifically crafted attack.


> There's usually less memory avilable in DMA/DMA32 zones, but we have
> so few allocations from those zones, I question the utility of focusing
> testing on those allocations.
>
> GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely
> to fail.  KSWAPD_RECLAIM has no effect on whether _this_ allocation
> succeeds or fails; it kicks kswapd to do reclaim, rather than doing
> reclaim directly.  DIRECT_RECLAIM definitely makes allocations more likely
> to succeed.  GFP_FS allows (direct) reclaim to happen from filesystems.
> GFP_IO allows IO to start (ie writeback can start) in order to clean
> dirty memory.
>
> Anyway, I hope somebody who knows the page allocator better than I do
> can say smarter things than this.  Even better if they can put it into
> Documentation/ somewhere ;-)

Even better to put this into code as a predicate function that fault
injection will use. It will also serve as precise up-to-date
documentation.

> https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html
> exists but isn't quite enough to answer this question.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] general protection fault in __device_attach
  2022-06-08  8:20                     ` Dmitry Vyukov
@ 2022-06-08  8:24                       ` Dmitry Vyukov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Vyukov @ 2022-06-08  8:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot,
	hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael,
	rjw, syzkaller-bugs, linux-usb, Linux-MM

On Wed, 8 Jun 2022 at 10:20, Dmitry Vyukov <dvyukov@google.com> wrote:
> > On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote:
> > > On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> > > >
> > > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > > > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > >
> > > > > > But again, is this a "real and able to be triggered from userspace"
> > > > > > problem, or just fault-injection-induced?
> > > > >
> > > > > Then this is something to fix in the fault injection subsystem.
> > > > > Testing systems shouldn't be reporting false positives.
> > > > > What allocations cannot fail in real life? Is it <=page_size?
> > > > >
> > > >
> > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded!
> > > >
> > > > https://lwn.net/Articles/627419/
> > > >
> > > > I have been on the look out since that article and never seen anyone
> > > > mention it changing.  I think we should ignore that and say that
> > > > anything over PAGE_SIZE can fail.  Possibly we could go smaller than
> > > > PAGE_SIZE...
> > >
> > > +linux-mm for GFP expertise re what allocations cannot possibly fail
> > > and should be excluded from fault injection.
> > >
> > > Interesting, thanks for the link.
> > >
> > > PAGE_SIZE looks like a good start. Once we have the predicate in
> > > place, we can refine it later when/if we have more inputs.
> > >
> > > But I wonder about GFP flags. They definitely have some impact on allocations.
> > > If GFP_ACCOUNT is set, all allocations can fail, right?
> > > If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones?
> > > If GFP_NORETRY is set, allocations can fail?
> > > What about GFP_NOMEMALLOC and GFP_ATOMIC?
> > > What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At
> > > least some of these need to be set for allocations to not fail? Which
> > > ones?
> > > Any other flags are required to be set/unset for allocations to not fail?
> >
> > I'm not the expert on page allocation, but ...
> >
> > I don't think GFP_ACCOUNT makes allocations fail.  It might make reclaim
> > happen from within that cgroup, and it might cause an OOM kill for
> > something in that cgroup.  But I don't think it makes a (low order)
> > allocation more likely to fail.
>
> Interesting.
> I was thinking of some malicious specifically crafted configurations
> with very low limit and particular pattern of allocations. Also what
> if there is just 1 process (current)? Is it possible to kill and
> reclaim the current process when a thread is stuck in the middle of
> the kernel on a kmalloc?
> Also I see e.g.:
>         Tasks with the OOM protection (oom_score_adj set to -1000)
>         are treated as an exception and are never killed.
>
> I am not an expert on this either, but I think it may be hard to fight
> with a specifically crafted attack.
>
>
> > There's usually less memory avilable in DMA/DMA32 zones, but we have
> > so few allocations from those zones, I question the utility of focusing
> > testing on those allocations.
> >
> > GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely
> > to fail.  KSWAPD_RECLAIM has no effect on whether _this_ allocation
> > succeeds or fails; it kicks kswapd to do reclaim, rather than doing
> > reclaim directly.  DIRECT_RECLAIM definitely makes allocations more likely
> > to succeed.  GFP_FS allows (direct) reclaim to happen from filesystems.
> > GFP_IO allows IO to start (ie writeback can start) in order to clean
> > dirty memory.
> >
> > Anyway, I hope somebody who knows the page allocator better than I do
> > can say smarter things than this.  Even better if they can put it into
> > Documentation/ somewhere ;-)
>
> Even better to put this into code as a predicate function that fault
> injection will use. It will also serve as precise up-to-date
> documentation.

Also at the end of kmalloc as:
WARN_ON(!ret && !cant_fail(size, gfp));
!

> > https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html
> > exists but isn't quite enough to answer this question.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [syzbot] [kernel?] general protection fault in __device_attach
       [not found] <000000000000bb7f1c05da29b601@google.com>
  2022-06-03 10:02 ` [syzbot] general protection fault in __device_attach syzbot
@ 2024-01-10 13:12 ` syzbot
  1 sibling, 0 replies; 14+ messages in thread
From: syzbot @ 2024-01-10 13:12 UTC (permalink / raw)
  To: 42.hyeyoo, andriy.shevchenko, dan.carpenter, dvyukov, gregkh,
	hdanton, keescook, lenb, linux-acpi, linux-kernel, linux-mm,
	linux-usb, rafael.j.wysocki, rafael, rientjes, rjw, stern,
	syzkaller-bugs, vbabka, willy

syzbot suspects this issue was fixed by commit:

commit 49378a05ce7f01a203550eb7c2ef772f6d24565c
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Thu Oct 26 15:45:42 2023 +0000

    mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15b179cde80000
start commit:   d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665
dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-01-10 13:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <000000000000bb7f1c05da29b601@google.com>
2022-06-03 10:02 ` [syzbot] general protection fault in __device_attach syzbot
2022-06-03 11:04   ` Andy Shevchenko
2022-06-03 15:42     ` Alan Stern
2022-06-03 15:52       ` Greg KH
2022-06-03 16:03         ` Alan Stern
2022-06-03 16:11           ` Greg KH
2022-06-03 16:27             ` Alan Stern
2022-06-04  8:32             ` Dmitry Vyukov
2022-06-06 12:38               ` Dan Carpenter
2022-06-07  7:15                 ` Dmitry Vyukov
2022-06-08  3:25                   ` Matthew Wilcox
2022-06-08  8:20                     ` Dmitry Vyukov
2022-06-08  8:24                       ` Dmitry Vyukov
2024-01-10 13:12 ` [syzbot] [kernel?] " syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox