* Re: [syzbot] general protection fault in __device_attach [not found] <000000000000bb7f1c05da29b601@google.com> @ 2022-06-03 10:02 ` syzbot 2022-06-03 11:04 ` Andy Shevchenko 2024-01-10 13:12 ` [syzbot] [kernel?] " syzbot 1 sibling, 1 reply; 14+ messages in thread From: syzbot @ 2022-06-03 10:02 UTC (permalink / raw) To: andriy.shevchenko, gregkh, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs syzbot has bisected this issue to: commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1 Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Date: Fri Jun 18 13:41:27 2021 +0000 ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1040b80df00000 start commit: d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=1240b80df00000 console output: https://syzkaller.appspot.com/x/log.txt?x=1440b80df00000 kernel config: https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665 dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000 Reported-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com Fixes: a9c4cf299f5f ("ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-03 10:02 ` [syzbot] general protection fault in __device_attach syzbot @ 2022-06-03 11:04 ` Andy Shevchenko 2022-06-03 15:42 ` Alan Stern 0 siblings, 1 reply; 14+ messages in thread From: Andy Shevchenko @ 2022-06-03 11:04 UTC (permalink / raw) To: syzbot Cc: gregkh, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Alan Stern On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote: > syzbot has bisected this issue to: > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1 > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Date: Fri Jun 18 13:41:27 2021 +0000 > > ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros Hmm... It's not obvious at all how this change can alter the behaviour so drastically. device_add() is called from USB core with intf->dev.name == NULL by some reason. A-ha, seems like fault injector, which looks like dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum, dev->devpath, configuration, ifnum); missed the return code check. But I'm not familiar with that code at all, adding Linux USB ML and Alan. > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1040b80df00000 > start commit: d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect > git tree: upstream > final oops: https://syzkaller.appspot.com/x/report.txt?x=1240b80df00000 > console output: https://syzkaller.appspot.com/x/log.txt?x=1440b80df00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665 > dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000 > > Reported-by: syzbot+dd3c97de244683533381@syzkaller.appspotmail.com > Fixes: a9c4cf299f5f ("ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros") > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-03 11:04 ` Andy Shevchenko @ 2022-06-03 15:42 ` Alan Stern 2022-06-03 15:52 ` Greg KH 0 siblings, 1 reply; 14+ messages in thread From: Alan Stern @ 2022-06-03 15:42 UTC (permalink / raw) To: Andy Shevchenko Cc: syzbot, gregkh, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote: > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote: > > syzbot has bisected this issue to: > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1 > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > Date: Fri Jun 18 13:41:27 2021 +0000 > > > > ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros > > Hmm... It's not obvious at all how this change can alter the behaviour so > drastically. device_add() is called from USB core with intf->dev.name == NULL > by some reason. A-ha, seems like fault injector, which looks like > > dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum, > dev->devpath, configuration, ifnum); > > missed the return code check. > > But I'm not familiar with that code at all, adding Linux USB ML and Alan. I can't see any connection between this bug and acpi/sysfs.c. Is it a bad bisection? It looks like you're right about dev_set_name() failing. In fact, the kernel appears to be littered with calls to that routine which do not check the return code (the entire subtree below drivers/usb/ contains only _one_ call that does check the return code!). The function doesn't have any __must_check annotation, and its kerneldoc doesn't mention the return code or the possibility of a failure. Apparently the assumption is that if dev_set_name() fails then device_add() later on will also fail, and the problem will be detected then. So now what should happen when device_add() for an interface fails in usb_set_configuration()? I guess the interface should be deleted; otherwise we have the possibility that people might still try to access it via usbfs, as in the syzbot test run. Same goes for the of_device_is_available() check. Fixing that will be a little painful. Right now there are plenty of places in the USB core that aren't prepared to cope with a non-existent interface. Alan Stern ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-03 15:42 ` Alan Stern @ 2022-06-03 15:52 ` Greg KH 2022-06-03 16:03 ` Alan Stern 0 siblings, 1 reply; 14+ messages in thread From: Greg KH @ 2022-06-03 15:52 UTC (permalink / raw) To: Alan Stern Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote: > On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote: > > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote: > > > syzbot has bisected this issue to: > > > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1 > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > Date: Fri Jun 18 13:41:27 2021 +0000 > > > > > > ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros > > > > Hmm... It's not obvious at all how this change can alter the behaviour so > > drastically. device_add() is called from USB core with intf->dev.name == NULL > > by some reason. A-ha, seems like fault injector, which looks like > > > > dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum, > > dev->devpath, configuration, ifnum); > > > > missed the return code check. > > > > But I'm not familiar with that code at all, adding Linux USB ML and Alan. > > I can't see any connection between this bug and acpi/sysfs.c. Is it a > bad bisection? > > It looks like you're right about dev_set_name() failing. In fact, the > kernel appears to be littered with calls to that routine which do not > check the return code (the entire subtree below drivers/usb/ contains > only _one_ call that does check the return code!). The function doesn't > have any __must_check annotation, and its kerneldoc doesn't mention the > return code or the possibility of a failure. > > Apparently the assumption is that if dev_set_name() fails then > device_add() later on will also fail, and the problem will be detected > then. > > So now what should happen when device_add() for an interface fails in > usb_set_configuration()? But how can that really fail on a real system? Is this just due to error-injection stuff? If so, I'm really loath to rework the world for something that can never happen in real life. Or is this a real syzbot-found-with-reproducer issue? thanks, greg k-h ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-03 15:52 ` Greg KH @ 2022-06-03 16:03 ` Alan Stern 2022-06-03 16:11 ` Greg KH 0 siblings, 1 reply; 14+ messages in thread From: Alan Stern @ 2022-06-03 16:03 UTC (permalink / raw) To: Greg KH Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote: > On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote: > > On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote: > > > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote: > > > > syzbot has bisected this issue to: > > > > > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1 > > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > Date: Fri Jun 18 13:41:27 2021 +0000 > > > > > > > > ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros > > > > > > Hmm... It's not obvious at all how this change can alter the behaviour so > > > drastically. device_add() is called from USB core with intf->dev.name == NULL > > > by some reason. A-ha, seems like fault injector, which looks like > > > > > > dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum, > > > dev->devpath, configuration, ifnum); > > > > > > missed the return code check. > > > > > > But I'm not familiar with that code at all, adding Linux USB ML and Alan. > > > > I can't see any connection between this bug and acpi/sysfs.c. Is it a > > bad bisection? > > > > It looks like you're right about dev_set_name() failing. In fact, the > > kernel appears to be littered with calls to that routine which do not > > check the return code (the entire subtree below drivers/usb/ contains > > only _one_ call that does check the return code!). The function doesn't > > have any __must_check annotation, and its kerneldoc doesn't mention the > > return code or the possibility of a failure. > > > > Apparently the assumption is that if dev_set_name() fails then > > device_add() later on will also fail, and the problem will be detected > > then. > > > > So now what should happen when device_add() for an interface fails in > > usb_set_configuration()? > > But how can that really fail on a real system? > > Is this just due to error-injection stuff? If so, I'm really loath to > rework the world for something that can never happen in real life. > > Or is this a real syzbot-found-with-reproducer issue? Aren't there quite a few reasons why device_add() might fail? (Although most of them probably are memory allocation errors...) Basically, you have to make up your mind. If a function can fail, you should be prepared to handle the failure. If it can't fail, there's no point in even checking the return code. Alan Stern ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-03 16:03 ` Alan Stern @ 2022-06-03 16:11 ` Greg KH 2022-06-03 16:27 ` Alan Stern 2022-06-04 8:32 ` Dmitry Vyukov 0 siblings, 2 replies; 14+ messages in thread From: Greg KH @ 2022-06-03 16:11 UTC (permalink / raw) To: Alan Stern Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb On Fri, Jun 03, 2022 at 12:03:32PM -0400, Alan Stern wrote: > On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote: > > On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote: > > > On Fri, Jun 03, 2022 at 02:04:04PM +0300, Andy Shevchenko wrote: > > > > On Fri, Jun 03, 2022 at 03:02:07AM -0700, syzbot wrote: > > > > > syzbot has bisected this issue to: > > > > > > > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1 > > > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > Date: Fri Jun 18 13:41:27 2021 +0000 > > > > > > > > > > ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros > > > > > > > > Hmm... It's not obvious at all how this change can alter the behaviour so > > > > drastically. device_add() is called from USB core with intf->dev.name == NULL > > > > by some reason. A-ha, seems like fault injector, which looks like > > > > > > > > dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum, > > > > dev->devpath, configuration, ifnum); > > > > > > > > missed the return code check. > > > > > > > > But I'm not familiar with that code at all, adding Linux USB ML and Alan. > > > > > > I can't see any connection between this bug and acpi/sysfs.c. Is it a > > > bad bisection? > > > > > > It looks like you're right about dev_set_name() failing. In fact, the > > > kernel appears to be littered with calls to that routine which do not > > > check the return code (the entire subtree below drivers/usb/ contains > > > only _one_ call that does check the return code!). The function doesn't > > > have any __must_check annotation, and its kerneldoc doesn't mention the > > > return code or the possibility of a failure. > > > > > > Apparently the assumption is that if dev_set_name() fails then > > > device_add() later on will also fail, and the problem will be detected > > > then. > > > > > > So now what should happen when device_add() for an interface fails in > > > usb_set_configuration()? > > > > But how can that really fail on a real system? > > > > Is this just due to error-injection stuff? If so, I'm really loath to > > rework the world for something that can never happen in real life. > > > > Or is this a real syzbot-found-with-reproducer issue? > > Aren't there quite a few reasons why device_add() might fail? (Although > most of them probably are memory allocation errors...) I was thinking of the dev_set_name() issue further back in the call chain. > Basically, you have to make up your mind. If a function can fail, you > should be prepared to handle the failure. If it can't fail, there's no > point in even checking the return code. True, ok, we should unwind the mess. I'll try to look at it after the merge window... But again, is this a "real and able to be triggered from userspace" problem, or just fault-injection-induced? thanks, greg k-h ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-03 16:11 ` Greg KH @ 2022-06-03 16:27 ` Alan Stern 2022-06-04 8:32 ` Dmitry Vyukov 1 sibling, 0 replies; 14+ messages in thread From: Alan Stern @ 2022-06-03 16:27 UTC (permalink / raw) To: Greg KH Cc: Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb On Fri, Jun 03, 2022 at 06:11:55PM +0200, Greg KH wrote: > On Fri, Jun 03, 2022 at 12:03:32PM -0400, Alan Stern wrote: > > On Fri, Jun 03, 2022 at 05:52:38PM +0200, Greg KH wrote: > > > On Fri, Jun 03, 2022 at 11:42:19AM -0400, Alan Stern wrote: > > > > So now what should happen when device_add() for an interface fails in > > > > usb_set_configuration()? > > > > > > But how can that really fail on a real system? > > > > > > Is this just due to error-injection stuff? If so, I'm really loath to > > > rework the world for something that can never happen in real life. > > > > > > Or is this a real syzbot-found-with-reproducer issue? > > > > Aren't there quite a few reasons why device_add() might fail? (Although > > most of them probably are memory allocation errors...) > > I was thinking of the dev_set_name() issue further back in the call > chain. As far as I know, the only reason for dev_set_name() to fail is -ENOMEM. That's not something the user can control directly. > > Basically, you have to make up your mind. If a function can fail, you > > should be prepared to handle the failure. If it can't fail, there's no > > point in even checking the return code. > > True, ok, we should unwind the mess. I'll try to look at it after the > merge window... > > But again, is this a "real and able to be triggered from userspace" > problem, or just fault-injection-induced? I don't think any of the failure paths here are controlled by the user. They all seem to involve something going wrong internally in the kernel (i.e., corruption or memory allocation failure for a small buffer). Once that happens, the game is pretty much over anyway. Is it worth handling this sort of thing, or should we ignore the possibility and allow it to escalate to the point where the user can potentially trigger a kernel panic? Another way of putting it is: How gracefully do you want the kernel to collapse when this sort of corruption happens? Alan Stern ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-03 16:11 ` Greg KH 2022-06-03 16:27 ` Alan Stern @ 2022-06-04 8:32 ` Dmitry Vyukov 2022-06-06 12:38 ` Dan Carpenter 1 sibling, 1 reply; 14+ messages in thread From: Dmitry Vyukov @ 2022-06-04 8:32 UTC (permalink / raw) To: Greg KH Cc: Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > syzbot has bisected this issue to: > > > > > > > > > > > > commit a9c4cf299f5f79d5016c8a9646fa1fc49381a8c1 > > > > > > Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > > > > Date: Fri Jun 18 13:41:27 2021 +0000 > > > > > > > > > > > > ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros > > > > > > > > > > Hmm... It's not obvious at all how this change can alter the behaviour so > > > > > drastically. device_add() is called from USB core with intf->dev.name == NULL > > > > > by some reason. A-ha, seems like fault injector, which looks like > > > > > > > > > > dev_set_name(&intf->dev, "%d-%s:%d.%d", dev->bus->busnum, > > > > > dev->devpath, configuration, ifnum); > > > > > > > > > > missed the return code check. > > > > > > > > > > But I'm not familiar with that code at all, adding Linux USB ML and Alan. > > > > > > > > I can't see any connection between this bug and acpi/sysfs.c. Is it a > > > > bad bisection? > > > > > > > > It looks like you're right about dev_set_name() failing. In fact, the > > > > kernel appears to be littered with calls to that routine which do not > > > > check the return code (the entire subtree below drivers/usb/ contains > > > > only _one_ call that does check the return code!). The function doesn't > > > > have any __must_check annotation, and its kerneldoc doesn't mention the > > > > return code or the possibility of a failure. > > > > > > > > Apparently the assumption is that if dev_set_name() fails then > > > > device_add() later on will also fail, and the problem will be detected > > > > then. > > > > > > > > So now what should happen when device_add() for an interface fails in > > > > usb_set_configuration()? > > > > > > But how can that really fail on a real system? > > > > > > Is this just due to error-injection stuff? If so, I'm really loath to > > > rework the world for something that can never happen in real life. > > > > > > Or is this a real syzbot-found-with-reproducer issue? > > > > Aren't there quite a few reasons why device_add() might fail? (Although > > most of them probably are memory allocation errors...) > > I was thinking of the dev_set_name() issue further back in the call > chain. > > > Basically, you have to make up your mind. If a function can fail, you > > should be prepared to handle the failure. If it can't fail, there's no > > point in even checking the return code. > > True, ok, we should unwind the mess. I'll try to look at it after the > merge window... > > But again, is this a "real and able to be triggered from userspace" > problem, or just fault-injection-induced? Then this is something to fix in the fault injection subsystem. Testing systems shouldn't be reporting false positives. What allocations cannot fail in real life? Is it <=page_size? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-04 8:32 ` Dmitry Vyukov @ 2022-06-06 12:38 ` Dan Carpenter 2022-06-07 7:15 ` Dmitry Vyukov 0 siblings, 1 reply; 14+ messages in thread From: Dan Carpenter @ 2022-06-06 12:38 UTC (permalink / raw) To: Dmitry Vyukov Cc: Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > But again, is this a "real and able to be triggered from userspace" > > problem, or just fault-injection-induced? > > Then this is something to fix in the fault injection subsystem. > Testing systems shouldn't be reporting false positives. > What allocations cannot fail in real life? Is it <=page_size? > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! https://lwn.net/Articles/627419/ I have been on the look out since that article and never seen anyone mention it changing. I think we should ignore that and say that anything over PAGE_SIZE can fail. Possibly we could go smaller than PAGE_SIZE... regards, dan carpenter ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-06 12:38 ` Dan Carpenter @ 2022-06-07 7:15 ` Dmitry Vyukov 2022-06-08 3:25 ` Matthew Wilcox 0 siblings, 1 reply; 14+ messages in thread From: Dmitry Vyukov @ 2022-06-07 7:15 UTC (permalink / raw) To: Dan Carpenter Cc: Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Linux-MM On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote: > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > But again, is this a "real and able to be triggered from userspace" > > > problem, or just fault-injection-induced? > > > > Then this is something to fix in the fault injection subsystem. > > Testing systems shouldn't be reporting false positives. > > What allocations cannot fail in real life? Is it <=page_size? > > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! > > https://lwn.net/Articles/627419/ > > I have been on the look out since that article and never seen anyone > mention it changing. I think we should ignore that and say that > anything over PAGE_SIZE can fail. Possibly we could go smaller than > PAGE_SIZE... +linux-mm for GFP expertise re what allocations cannot possibly fail and should be excluded from fault injection. Interesting, thanks for the link. PAGE_SIZE looks like a good start. Once we have the predicate in place, we can refine it later when/if we have more inputs. But I wonder about GFP flags. They definitely have some impact on allocations. If GFP_ACCOUNT is set, all allocations can fail, right? If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones? If GFP_NORETRY is set, allocations can fail? What about GFP_NOMEMALLOC and GFP_ATOMIC? What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At least some of these need to be set for allocations to not fail? Which ones? Any other flags are required to be set/unset for allocations to not fail? FTR here is quick link to flags list: https://elixir.bootlin.com/linux/v5.19-rc1/source/include/linux/gfp.h#L32 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-07 7:15 ` Dmitry Vyukov @ 2022-06-08 3:25 ` Matthew Wilcox 2022-06-08 8:20 ` Dmitry Vyukov 0 siblings, 1 reply; 14+ messages in thread From: Matthew Wilcox @ 2022-06-08 3:25 UTC (permalink / raw) To: Dmitry Vyukov Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Linux-MM On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote: > On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote: > > > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > But again, is this a "real and able to be triggered from userspace" > > > > problem, or just fault-injection-induced? > > > > > > Then this is something to fix in the fault injection subsystem. > > > Testing systems shouldn't be reporting false positives. > > > What allocations cannot fail in real life? Is it <=page_size? > > > > > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! > > > > https://lwn.net/Articles/627419/ > > > > I have been on the look out since that article and never seen anyone > > mention it changing. I think we should ignore that and say that > > anything over PAGE_SIZE can fail. Possibly we could go smaller than > > PAGE_SIZE... > > +linux-mm for GFP expertise re what allocations cannot possibly fail > and should be excluded from fault injection. > > Interesting, thanks for the link. > > PAGE_SIZE looks like a good start. Once we have the predicate in > place, we can refine it later when/if we have more inputs. > > But I wonder about GFP flags. They definitely have some impact on allocations. > If GFP_ACCOUNT is set, all allocations can fail, right? > If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones? > If GFP_NORETRY is set, allocations can fail? > What about GFP_NOMEMALLOC and GFP_ATOMIC? > What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At > least some of these need to be set for allocations to not fail? Which > ones? > Any other flags are required to be set/unset for allocations to not fail? I'm not the expert on page allocation, but ... I don't think GFP_ACCOUNT makes allocations fail. It might make reclaim happen from within that cgroup, and it might cause an OOM kill for something in that cgroup. But I don't think it makes a (low order) allocation more likely to fail. There's usually less memory avilable in DMA/DMA32 zones, but we have so few allocations from those zones, I question the utility of focusing testing on those allocations. GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely to fail. KSWAPD_RECLAIM has no effect on whether _this_ allocation succeeds or fails; it kicks kswapd to do reclaim, rather than doing reclaim directly. DIRECT_RECLAIM definitely makes allocations more likely to succeed. GFP_FS allows (direct) reclaim to happen from filesystems. GFP_IO allows IO to start (ie writeback can start) in order to clean dirty memory. Anyway, I hope somebody who knows the page allocator better than I do can say smarter things than this. Even better if they can put it into Documentation/ somewhere ;-) https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html exists but isn't quite enough to answer this question. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-08 3:25 ` Matthew Wilcox @ 2022-06-08 8:20 ` Dmitry Vyukov 2022-06-08 8:24 ` Dmitry Vyukov 0 siblings, 1 reply; 14+ messages in thread From: Dmitry Vyukov @ 2022-06-08 8:20 UTC (permalink / raw) To: Matthew Wilcox Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Linux-MM On Wed, 8 Jun 2022 at 05:25, Matthew Wilcox <willy@infradead.org> wrote: > > On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote: > > On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote: > > > > > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > > > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > But again, is this a "real and able to be triggered from userspace" > > > > > problem, or just fault-injection-induced? > > > > > > > > Then this is something to fix in the fault injection subsystem. > > > > Testing systems shouldn't be reporting false positives. > > > > What allocations cannot fail in real life? Is it <=page_size? > > > > > > > > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! > > > > > > https://lwn.net/Articles/627419/ > > > > > > I have been on the look out since that article and never seen anyone > > > mention it changing. I think we should ignore that and say that > > > anything over PAGE_SIZE can fail. Possibly we could go smaller than > > > PAGE_SIZE... > > > > +linux-mm for GFP expertise re what allocations cannot possibly fail > > and should be excluded from fault injection. > > > > Interesting, thanks for the link. > > > > PAGE_SIZE looks like a good start. Once we have the predicate in > > place, we can refine it later when/if we have more inputs. > > > > But I wonder about GFP flags. They definitely have some impact on allocations. > > If GFP_ACCOUNT is set, all allocations can fail, right? > > If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones? > > If GFP_NORETRY is set, allocations can fail? > > What about GFP_NOMEMALLOC and GFP_ATOMIC? > > What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At > > least some of these need to be set for allocations to not fail? Which > > ones? > > Any other flags are required to be set/unset for allocations to not fail? > > I'm not the expert on page allocation, but ... > > I don't think GFP_ACCOUNT makes allocations fail. It might make reclaim > happen from within that cgroup, and it might cause an OOM kill for > something in that cgroup. But I don't think it makes a (low order) > allocation more likely to fail. Interesting. I was thinking of some malicious specifically crafted configurations with very low limit and particular pattern of allocations. Also what if there is just 1 process (current)? Is it possible to kill and reclaim the current process when a thread is stuck in the middle of the kernel on a kmalloc? Also I see e.g.: Tasks with the OOM protection (oom_score_adj set to -1000) are treated as an exception and are never killed. I am not an expert on this either, but I think it may be hard to fight with a specifically crafted attack. > There's usually less memory avilable in DMA/DMA32 zones, but we have > so few allocations from those zones, I question the utility of focusing > testing on those allocations. > > GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely > to fail. KSWAPD_RECLAIM has no effect on whether _this_ allocation > succeeds or fails; it kicks kswapd to do reclaim, rather than doing > reclaim directly. DIRECT_RECLAIM definitely makes allocations more likely > to succeed. GFP_FS allows (direct) reclaim to happen from filesystems. > GFP_IO allows IO to start (ie writeback can start) in order to clean > dirty memory. > > Anyway, I hope somebody who knows the page allocator better than I do > can say smarter things than this. Even better if they can put it into > Documentation/ somewhere ;-) Even better to put this into code as a predicate function that fault injection will use. It will also serve as precise up-to-date documentation. > https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html > exists but isn't quite enough to answer this question. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] general protection fault in __device_attach 2022-06-08 8:20 ` Dmitry Vyukov @ 2022-06-08 8:24 ` Dmitry Vyukov 0 siblings, 0 replies; 14+ messages in thread From: Dmitry Vyukov @ 2022-06-08 8:24 UTC (permalink / raw) To: Matthew Wilcox Cc: Dan Carpenter, Greg KH, Alan Stern, Andy Shevchenko, syzbot, hdanton, lenb, linux-acpi, linux-kernel, rafael.j.wysocki, rafael, rjw, syzkaller-bugs, linux-usb, Linux-MM On Wed, 8 Jun 2022 at 10:20, Dmitry Vyukov <dvyukov@google.com> wrote: > > On Tue, Jun 07, 2022 at 09:15:09AM +0200, Dmitry Vyukov wrote: > > > On Mon, 6 Jun 2022 at 14:39, Dan Carpenter <dan.carpenter@oracle.com> wrote: > > > > > > > > On Sat, Jun 04, 2022 at 10:32:46AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote: > > > > > On Fri, 3 Jun 2022 at 18:12, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > But again, is this a "real and able to be triggered from userspace" > > > > > > problem, or just fault-injection-induced? > > > > > > > > > > Then this is something to fix in the fault injection subsystem. > > > > > Testing systems shouldn't be reporting false positives. > > > > > What allocations cannot fail in real life? Is it <=page_size? > > > > > > > > > > > > > Apparently in 2014, anything less than *EIGHT?!!* pages succeeded! > > > > > > > > https://lwn.net/Articles/627419/ > > > > > > > > I have been on the look out since that article and never seen anyone > > > > mention it changing. I think we should ignore that and say that > > > > anything over PAGE_SIZE can fail. Possibly we could go smaller than > > > > PAGE_SIZE... > > > > > > +linux-mm for GFP expertise re what allocations cannot possibly fail > > > and should be excluded from fault injection. > > > > > > Interesting, thanks for the link. > > > > > > PAGE_SIZE looks like a good start. Once we have the predicate in > > > place, we can refine it later when/if we have more inputs. > > > > > > But I wonder about GFP flags. They definitely have some impact on allocations. > > > If GFP_ACCOUNT is set, all allocations can fail, right? > > > If GFP_DMA/DMA32 is set, allocations can fail, right? What about other zones? > > > If GFP_NORETRY is set, allocations can fail? > > > What about GFP_NOMEMALLOC and GFP_ATOMIC? > > > What about GFP_IO/GFP_FS/GFP_DIRECT_RECLAIM/GFP_KSWAPD_RECLAIM? At > > > least some of these need to be set for allocations to not fail? Which > > > ones? > > > Any other flags are required to be set/unset for allocations to not fail? > > > > I'm not the expert on page allocation, but ... > > > > I don't think GFP_ACCOUNT makes allocations fail. It might make reclaim > > happen from within that cgroup, and it might cause an OOM kill for > > something in that cgroup. But I don't think it makes a (low order) > > allocation more likely to fail. > > Interesting. > I was thinking of some malicious specifically crafted configurations > with very low limit and particular pattern of allocations. Also what > if there is just 1 process (current)? Is it possible to kill and > reclaim the current process when a thread is stuck in the middle of > the kernel on a kmalloc? > Also I see e.g.: > Tasks with the OOM protection (oom_score_adj set to -1000) > are treated as an exception and are never killed. > > I am not an expert on this either, but I think it may be hard to fight > with a specifically crafted attack. > > > > There's usually less memory avilable in DMA/DMA32 zones, but we have > > so few allocations from those zones, I question the utility of focusing > > testing on those allocations. > > > > GFP_ATOMIC allows access to emergency pools, so I would say _less_ likely > > to fail. KSWAPD_RECLAIM has no effect on whether _this_ allocation > > succeeds or fails; it kicks kswapd to do reclaim, rather than doing > > reclaim directly. DIRECT_RECLAIM definitely makes allocations more likely > > to succeed. GFP_FS allows (direct) reclaim to happen from filesystems. > > GFP_IO allows IO to start (ie writeback can start) in order to clean > > dirty memory. > > > > Anyway, I hope somebody who knows the page allocator better than I do > > can say smarter things than this. Even better if they can put it into > > Documentation/ somewhere ;-) > > Even better to put this into code as a predicate function that fault > injection will use. It will also serve as precise up-to-date > documentation. Also at the end of kmalloc as: WARN_ON(!ret && !cant_fail(size, gfp)); ! > > https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html > > exists but isn't quite enough to answer this question. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [syzbot] [kernel?] general protection fault in __device_attach [not found] <000000000000bb7f1c05da29b601@google.com> 2022-06-03 10:02 ` [syzbot] general protection fault in __device_attach syzbot @ 2024-01-10 13:12 ` syzbot 1 sibling, 0 replies; 14+ messages in thread From: syzbot @ 2024-01-10 13:12 UTC (permalink / raw) To: 42.hyeyoo, andriy.shevchenko, dan.carpenter, dvyukov, gregkh, hdanton, keescook, lenb, linux-acpi, linux-kernel, linux-mm, linux-usb, rafael.j.wysocki, rafael, rientjes, rjw, stern, syzkaller-bugs, vbabka, willy syzbot suspects this issue was fixed by commit: commit 49378a05ce7f01a203550eb7c2ef772f6d24565c Author: Vlastimil Babka <vbabka@suse.cz> Date: Thu Oct 26 15:45:42 2023 +0000 mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15b179cde80000 start commit: d1dc87763f40 assoc_array: Fix BUG_ON during garbage collect git tree: upstream kernel config: https://syzkaller.appspot.com/x/.config?x=c51cd24814bb5665 dashboard link: https://syzkaller.appspot.com/bug?extid=dd3c97de244683533381 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15613e2bf00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c90adbf00000 If the result looks correct, please mark the issue as fixed by replying with: #syz fix: mm/slub: remove slab_alloc() and __kmem_cache_alloc_lru() wrappers For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-01-10 13:12 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <000000000000bb7f1c05da29b601@google.com>
2022-06-03 10:02 ` [syzbot] general protection fault in __device_attach syzbot
2022-06-03 11:04 ` Andy Shevchenko
2022-06-03 15:42 ` Alan Stern
2022-06-03 15:52 ` Greg KH
2022-06-03 16:03 ` Alan Stern
2022-06-03 16:11 ` Greg KH
2022-06-03 16:27 ` Alan Stern
2022-06-04 8:32 ` Dmitry Vyukov
2022-06-06 12:38 ` Dan Carpenter
2022-06-07 7:15 ` Dmitry Vyukov
2022-06-08 3:25 ` Matthew Wilcox
2022-06-08 8:20 ` Dmitry Vyukov
2022-06-08 8:24 ` Dmitry Vyukov
2024-01-10 13:12 ` [syzbot] [kernel?] " syzbot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox