* Bug in PCI core @ 2006-10-11 20:41 Alan Stern 2006-10-13 1:01 ` [linux-pm] " Benjamin Herrenschmidt 2006-10-14 5:34 ` Greg KH 0 siblings, 2 replies; 30+ messages in thread From: Alan Stern @ 2006-10-11 20:41 UTC (permalink / raw) To: Greg KH; +Cc: linux-pci, Linux-pm mailing list, Kernel development list When a PCI device is suspended, its driver calls pci_save_state() so that the config space can be restored when the device is resumed. Then the driver calls pci_set_power_state(). However pci_set_power_state() calls pci_block_user_cfg_access(), and that routine calls pci_save_state() again. This overwrites the saved state with data in which memory, I/O, and bus master accesses are disabled. As a result, when the device is resumed it doesn't work. Obviously pci_block_user_cfg_access() needs to be fixed. I don't know the right way to fix it; hopefully somebody else does. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-11 20:41 Bug in PCI core Alan Stern @ 2006-10-13 1:01 ` Benjamin Herrenschmidt 2006-10-13 8:50 ` Adam Belay 2006-10-14 5:34 ` Greg KH 1 sibling, 1 reply; 30+ messages in thread From: Benjamin Herrenschmidt @ 2006-10-13 1:01 UTC (permalink / raw) To: Alan Stern Cc: Greg KH, linux-pci, Linux-pm mailing list, Kernel development list On Wed, 2006-10-11 at 16:41 -0400, Alan Stern wrote: > When a PCI device is suspended, its driver calls pci_save_state() so that > the config space can be restored when the device is resumed. Then the > driver calls pci_set_power_state(). > > However pci_set_power_state() calls pci_block_user_cfg_access(), and that > routine calls pci_save_state() again. This overwrites the saved state > with data in which memory, I/O, and bus master accesses are disabled. As > a result, when the device is resumed it doesn't work. > > Obviously pci_block_user_cfg_access() needs to be fixed. I don't know the > right way to fix it; hopefully somebody else does. Well, blocking user cfg access snapshots the config space to be able to respond to user space while the device is offline. Maybe it should be done from a separate config space image buffer ? ugh.... Ben. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 1:01 ` [linux-pm] " Benjamin Herrenschmidt @ 2006-10-13 8:50 ` Adam Belay 2006-10-13 9:16 ` Benjamin Herrenschmidt 2006-10-13 14:29 ` Alan Stern 0 siblings, 2 replies; 30+ messages in thread From: Adam Belay @ 2006-10-13 8:50 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: linux-pci, Kernel development list, Linux-pm mailing list On Fri, 2006-10-13 at 11:01 +1000, Benjamin Herrenschmidt wrote: > On Wed, 2006-10-11 at 16:41 -0400, Alan Stern wrote: > > When a PCI device is suspended, its driver calls pci_save_state() so that > > the config space can be restored when the device is resumed. Then the > > driver calls pci_set_power_state(). > > > > However pci_set_power_state() calls pci_block_user_cfg_access(), and that > > routine calls pci_save_state() again. This overwrites the saved state > > with data in which memory, I/O, and bus master accesses are disabled. As > > a result, when the device is resumed it doesn't work. > > > > Obviously pci_block_user_cfg_access() needs to be fixed. I don't know the > > right way to fix it; hopefully somebody else does. > > Well, blocking user cfg access snapshots the config space to be able to > respond to user space while the device is offline. Maybe it should be > done from a separate config space image buffer ? ugh.... > > Ben. Personally, I don't think exposing a cached version of the PCI config space when direct device access is prohibited is the right approach here. We really shouldn't be lying about the internal state of PCI devices (the cached version could be quite inaccurate). After all, if the device is in D3cold, then the spec claims it's perfectly valid for it to not respond to PCI configuration access. I can only assume this hack was done to satisfy some terribly broken userspace app. It's not surprising that even reading PCI config can easily crash systems. However, it's the responsibility of those apps with permission to access the PCI sysfs interface, not the kernel, to be aware of these constraints. The PCI configuration space cache was originally introduced to support power management. However, it's mostly incorrect, as it unnecessarily stores the values of read only registers (and even BIST which is potentially dangerous). A while back I posted a series of patches that address this issue, and the net result was that the config cache stays around wasting memory because of the pci_block_user_cfg_access() dependency despite being useless to PCI PM. I'd like to propose that we have the pci config sysfs interface return -EIO when it's blocked (e.g. active BIST or D3cold). This accurately reflects the state of the device to userspace, reduces complexity, and could potentially save some memory per PCI device instance. Thanks, Adam ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 8:50 ` Adam Belay @ 2006-10-13 9:16 ` Benjamin Herrenschmidt 2006-10-13 9:31 ` Martin Mares 2006-10-13 14:29 ` Alan Stern 1 sibling, 1 reply; 30+ messages in thread From: Benjamin Herrenschmidt @ 2006-10-13 9:16 UTC (permalink / raw) To: Adam Belay; +Cc: linux-pci, Kernel development list, Linux-pm mailing list > Personally, I don't think exposing a cached version of the PCI config > space when direct device access is prohibited is the right approach > here. We really shouldn't be lying about the internal state of PCI > devices (the cached version could be quite inaccurate). After all, if > the device is in D3cold, then the spec claims it's perfectly valid for > it to not respond to PCI configuration access. Yes, but the problem is that lspci etc... will suddenly see a bunch of ffff's all over the place instead of the device. In fact, I think we do use -some- cached infos already there but not for everything. And that breaks .... distro installers :0 For example, currently, if I power off the ethernet of my mac, or the firewire chip (which are powered off if the module isn't loaded), lspci will get the device id and vendor id right ... but won't get the class code. I'm not sure about kudzu/udev/whoever, I've had problems with distros not loading the modules because they couldn't find the chip because it was off because the module wasn't loaded ... (typical of chips like firewire which are loaded by class code). So I agree... but :) In a perfect world were everthing goes via sysfs and we have files that expose all the necessary cached infos for identifying the device (vendor, device, subsystem ids, class code, etc... which we do have in sysfs), then yes, we can probably make config space accesses to an off device just return ff's or an error (on some platform, they have to be blocked as they can lockup, happens with some cells on the mac when unclocked). > I can only assume this hack was done to satisfy some terribly broken > userspace app. Yes, distro HW detection tools mostly. > It's not surprising that even reading PCI config can > easily crash systems. However, it's the responsibility of those apps > with permission to access the PCI sysfs interface, not the kernel, to be > aware of these constraints. I agree. > The PCI configuration space cache was originally introduced to support > power management. However, it's mostly incorrect, as it unnecessarily > stores the values of read only registers (and even BIST which is > potentially dangerous). A while back I posted a series of patches that > address this issue, and the net result was that the config cache stays > around wasting memory because of the pci_block_user_cfg_access() > dependency despite being useless to PCI PM. > > I'd like to propose that we have the pci config sysfs interface return > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately > reflects the state of the device to userspace, reduces complexity, and > could potentially save some memory per PCI device instance. We need to enquire which userspace apps have a problem here. It's mostly a distro matter... or we can force their hand :) The problem is mostly obsolete crap not using sysfs, like ... lspci :) Ben. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 9:16 ` Benjamin Herrenschmidt @ 2006-10-13 9:31 ` Martin Mares 2006-10-13 12:25 ` [linux-pm] " Benjamin Herrenschmidt 0 siblings, 1 reply; 30+ messages in thread From: Martin Mares @ 2006-10-13 9:31 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: linux-pci, Linux-pm mailing list, Kernel development list, Adam Belay Hi! > For example, currently, if I power off the ethernet of my mac, or the > firewire chip (which are powered off if the module isn't loaded), lspci > will get the device id and vendor id right ... but won't get the class > code. Ehm, you aren't using any recent pciutils, are you? ;-) Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth Lottery -- a tax on people who can't do math. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-13 9:31 ` Martin Mares @ 2006-10-13 12:25 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 30+ messages in thread From: Benjamin Herrenschmidt @ 2006-10-13 12:25 UTC (permalink / raw) To: Martin Mares Cc: Adam Belay, Alan Stern, Greg KH, linux-pci, Linux-pm mailing list, Kernel development list On Fri, 2006-10-13 at 11:31 +0200, Martin Mares wrote: > Hi! > > > For example, currently, if I power off the ethernet of my mac, or the > > firewire chip (which are powered off if the module isn't loaded), lspci > > will get the device id and vendor id right ... but won't get the class > > code. > > Ehm, you aren't using any recent pciutils, are you? ;-) Whatever came with the distro that complained about the problem back then :) I agree that the problem is fixed on the kernel level (sysfs) and I'm happy to hear that pciutils is fixed too :) So we can probably do what Adam suggest and just return errors or ff's Ben. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 8:50 ` Adam Belay 2006-10-13 9:16 ` Benjamin Herrenschmidt @ 2006-10-13 14:29 ` Alan Stern 2006-10-13 15:26 ` [linux-pm] " Alan Cox 2006-10-13 20:48 ` [linux-pm] " Pavel Machek 1 sibling, 2 replies; 30+ messages in thread From: Alan Stern @ 2006-10-13 14:29 UTC (permalink / raw) To: Adam Belay; +Cc: linux-pci, Linux-pm mailing list, Kernel development list On Fri, 13 Oct 2006, Adam Belay wrote: > Personally, I don't think exposing a cached version of the PCI config > space when direct device access is prohibited is the right approach > here. We really shouldn't be lying about the internal state of PCI > devices (the cached version could be quite inaccurate). After all, if > the device is in D3cold, then the spec claims it's perfectly valid for > it to not respond to PCI configuration access. > > I can only assume this hack was done to satisfy some terribly broken > userspace app. It's not surprising that even reading PCI config can > easily crash systems. However, it's the responsibility of those apps > with permission to access the PCI sysfs interface, not the kernel, to be > aware of these constraints. According to the comment in the code, access is blocked only during the time that the device is making the transition between different power states. > The PCI configuration space cache was originally introduced to support > power management. However, it's mostly incorrect, as it unnecessarily > stores the values of read only registers (and even BIST which is > potentially dangerous). A while back I posted a series of patches that > address this issue, and the net result was that the config cache stays > around wasting memory because of the pci_block_user_cfg_access() > dependency despite being useless to PCI PM. > > I'd like to propose that we have the pci config sysfs interface return > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately > reflects the state of the device to userspace, reduces complexity, and > could potentially save some memory per PCI device instance. Could you resubmit your old patches and include a corresponding fix for this access problem? BTW, is it possible for userspace to try accessing a device in D3cold? Doesn't the fact that it is D3cold rather than D3hot mean the computer is turned off? Or have I been missing out on new developments? Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-13 14:29 ` Alan Stern @ 2006-10-13 15:26 ` Alan Cox 2006-10-13 15:29 ` Arjan van de Ven 2006-10-13 20:48 ` [linux-pm] " Pavel Machek 1 sibling, 1 reply; 30+ messages in thread From: Alan Cox @ 2006-10-13 15:26 UTC (permalink / raw) To: Alan Stern Cc: Adam Belay, Benjamin Herrenschmidt, Greg KH, linux-pci, Linux-pm mailing list, Kernel development list Ar Gwe, 2006-10-13 am 10:29 -0400, ysgrifennodd Alan Stern: > > I'd like to propose that we have the pci config sysfs interface return > > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately > > reflects the state of the device to userspace, reduces complexity, and > > could potentially save some memory per PCI device instance. > > Could you resubmit your old patches and include a corresponding fix for > this access problem? And then you can fix the applications it breaks, like the X server which does actually want to know where all the devices are located in PCI space. Alan ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-13 15:26 ` [linux-pm] " Alan Cox @ 2006-10-13 15:29 ` Arjan van de Ven 2006-10-13 16:06 ` Alan Cox 2006-10-13 16:40 ` Adam Belay 0 siblings, 2 replies; 30+ messages in thread From: Arjan van de Ven @ 2006-10-13 15:29 UTC (permalink / raw) To: Alan Cox Cc: Alan Stern, Adam Belay, Benjamin Herrenschmidt, Greg KH, linux-pci, Linux-pm mailing list, Kernel development list On Fri, 2006-10-13 at 16:26 +0100, Alan Cox wrote: > Ar Gwe, 2006-10-13 am 10:29 -0400, ysgrifennodd Alan Stern: > > > I'd like to propose that we have the pci config sysfs interface return > > > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately > > > reflects the state of the device to userspace, reduces complexity, and > > > could potentially save some memory per PCI device instance. > > > > Could you resubmit your old patches and include a corresponding fix for > > this access problem? > > And then you can fix the applications it breaks, like the X server which > does actually want to know where all the devices are located in PCI > space. > .. but which could equally well mmap the resource from sysfs ;) also the thing this patch does is ONLY when the device is effectively off the bus return -EIO. One can argue that -EAGAIN is nicer since it's only a temporary condition though.... ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-13 15:29 ` Arjan van de Ven @ 2006-10-13 16:06 ` Alan Cox 2006-10-13 16:34 ` Adam Belay 2006-10-13 16:40 ` Adam Belay 1 sibling, 1 reply; 30+ messages in thread From: Alan Cox @ 2006-10-13 16:06 UTC (permalink / raw) To: Arjan van de Ven Cc: Alan Stern, Adam Belay, Benjamin Herrenschmidt, Greg KH, linux-pci, Linux-pm mailing list, Kernel development list Ar Gwe, 2006-10-13 am 17:29 +0200, ysgrifennodd Arjan van de Ven: > > And then you can fix the applications it breaks, like the X server which > > does actually want to know where all the devices are located in PCI > > space. > > > > .. but which could equally well mmap the resource from sysfs ;) That doesn't help deal with the location and PCI control side of things X has to perform and deal with. You also forgot to attach the tested patch set for the X server and other affected apps. The cached stuff was put in place precisely because stuff broke ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 16:06 ` Alan Cox @ 2006-10-13 16:34 ` Adam Belay 2006-10-13 16:36 ` [linux-pm] " Matthew Wilcox 2006-10-13 17:09 ` Alan Cox 0 siblings, 2 replies; 30+ messages in thread From: Adam Belay @ 2006-10-13 16:34 UTC (permalink / raw) To: Alan Cox Cc: linux-pci, Linux-pm mailing list, Kernel development list, Arjan van de Ven On Fri, 2006-10-13 at 17:06 +0100, Alan Cox wrote: > Ar Gwe, 2006-10-13 am 17:29 +0200, ysgrifennodd Arjan van de Ven: > > > And then you can fix the applications it breaks, like the X server which > > > does actually want to know where all the devices are located in PCI > > > space. > > > > > > > .. but which could equally well mmap the resource from sysfs ;) > > That doesn't help deal with the location and PCI control side of things > X has to perform and deal with. You also forgot to attach the tested > patch set for the X server and other affected apps. > > The cached stuff was put in place precisely because stuff broke > I agree this needs to be fixed. However, as I previously mentioned, this isn't the right place to attack the problem. Remember, this wasn't originally a kernel regression. Rather it's a workaround for a known X/lspci/whatever bug. It's not the kernel's job to babysit userspace. If a userspace app that has the proper permissions decides to take a course of action that could potentially crash the system, then it has a right to do so. There are probably dozens of vectors for these sorts of problems (e.g. mmap as Arjan has mentioned) so why stop at the pci config sysfs interface? In this specific case, the workaround for this userspace bug actually makes it impossible for programs that are implemented correctly (i.e. understand that PCI configuration space can be inaccessible under certain conditions) from working optimally because the kernel gives inaccurate PCI config data rather than reporting the reality of the situation. I'd much rather give correct code the advantage then work around buggy software that really needs to be fixed directly. Finally, it's worth noting that this issue is really a corner-case, and in most systems it's extremely rare that even incorrect userspace apps would have any issue. Thanks, Adam ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-13 16:34 ` Adam Belay @ 2006-10-13 16:36 ` Matthew Wilcox 2006-10-13 17:09 ` Alan Cox 1 sibling, 0 replies; 30+ messages in thread From: Matthew Wilcox @ 2006-10-13 16:36 UTC (permalink / raw) To: Adam Belay Cc: Alan Cox, Arjan van de Ven, Alan Stern, Benjamin Herrenschmidt, Greg KH, linux-pci, Linux-pm mailing list, Kernel development list On Fri, Oct 13, 2006 at 12:34:20PM -0400, Adam Belay wrote: > I agree this needs to be fixed. However, as I previously mentioned, > this isn't the right place to attack the problem. Remember, this wasn't > originally a kernel regression. Rather it's a workaround for a known > X/lspci/whatever bug. It's not the kernel's job to babysit userspace. > If a userspace app that has the proper permissions decides to take a > course of action that could potentially crash the system, then it has a > right to do so. There are probably dozens of vectors for these sorts of > problems (e.g. mmap as Arjan has mentioned) so why stop at the pci > config sysfs interface? The patch I posted (to deny user access while the device is transitioning D-states) is to fix a bug where *any* local user can bring the system into undefined territory, simply by typing lspci at the right moment. No special permission is needed. I hadn't realised that pci_block_user_cfg_access() would call pci_save_state(). There's only one other user of pci_block_user_cfg_access() -- drivers/scsi/ipr.c and I think it could be induced to call pci_save_state() itself. It's an odd asymmetry anyway -- block calls save state, but unblock doesn't call restore_state. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 16:34 ` Adam Belay 2006-10-13 16:36 ` [linux-pm] " Matthew Wilcox @ 2006-10-13 17:09 ` Alan Cox 2006-10-13 16:49 ` Matthew Wilcox 2006-10-13 17:01 ` Adam Belay 1 sibling, 2 replies; 30+ messages in thread From: Alan Cox @ 2006-10-13 17:09 UTC (permalink / raw) To: Adam Belay Cc: linux-pci, Linux-pm mailing list, Kernel development list, Arjan van de Ven Ar Gwe, 2006-10-13 am 12:34 -0400, ysgrifennodd Adam Belay: > I agree this needs to be fixed. However, as I previously mentioned, > this isn't the right place to attack the problem. Remember, this wasn't > originally a kernel regression. Rather it's a workaround for a known It's a kernel regression. It used to be reliable to read X resource addresses at any time. > Finally, it's worth noting that this issue is really a corner-case, and > in most systems it's extremely rare that even incorrect userspace apps > would have any issue. Except just occasionally and randomly in the field, probably almost undebuggable and irreproducable - the very worst conceivable kind of bug. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 17:09 ` Alan Cox @ 2006-10-13 16:49 ` Matthew Wilcox 2006-10-13 17:34 ` Alan Cox 2006-10-13 17:01 ` Adam Belay 1 sibling, 1 reply; 30+ messages in thread From: Matthew Wilcox @ 2006-10-13 16:49 UTC (permalink / raw) To: Alan Cox Cc: linux-pci, Linux-pm mailing list, Kernel development list, Adam Belay, Arjan van de Ven On Fri, Oct 13, 2006 at 06:09:09PM +0100, Alan Cox wrote: > Ar Gwe, 2006-10-13 am 12:34 -0400, ysgrifennodd Adam Belay: > > I agree this needs to be fixed. However, as I previously mentioned, > > this isn't the right place to attack the problem. Remember, this wasn't > > originally a kernel regression. Rather it's a workaround for a known > > It's a kernel regression. It used to be reliable to read X resource > addresses at any time. No it didn't. It's undefined behaviour to perform *any* PCI config access to the device while it's doing a D-state transition. It may have happened to work with the chips you tried it with, but more likely you never hit that window because X simply didn't try to do that. > > Finally, it's worth noting that this issue is really a corner-case, and > > in most systems it's extremely rare that even incorrect userspace apps > > would have any issue. > > Except just occasionally and randomly in the field, probably almost > undebuggable and irreproducable - the very worst conceivable kind of > bug. Indeed. Only now we have software producing it, rather than hardware producing it. That's actually an improvement I think, since it forces awareness of the issue. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 16:49 ` Matthew Wilcox @ 2006-10-13 17:34 ` Alan Cox 2006-10-13 17:13 ` Arjan van de Ven ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Alan Cox @ 2006-10-13 17:34 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-pci, Linux-pm mailing list, Kernel development list, Adam Belay, Arjan van de Ven Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox: > No it didn't. It's undefined behaviour to perform *any* PCI config > access to the device while it's doing a D-state transition. It may have I think you missed the earlier parts of the story - the kernel caches the base config register state. > happened to work with the chips you tried it with, but more likely you > never hit that window because X simply didn't try to do that. Which is why the kernel caches the register state. This all came up long ago and the solution we currently have was the one chosen after considerable debate and analysis about things like locking. We preserved the historical reliable interface going back to the early Linux PCI support and used by all the apps. There are several problems with making it return an error - What does user space do ? while(pci_...() == -EAGAIN) yield(); which is useful how - there is no select operation for waiting here, and while it could be added it just gets uglier - Who actually wants to get an error in that specific case ? If you can find someone who desperately wants an error code then code in O_DIRECT support to do it and preserve the existing sane API. The job of the kernel is not to expose hardware directly, it is to provide sane interfaces to it. We don't have separate interfaces to conf1, conf2, pcibios etc for good reason. Exposing everyone to ugly minor details of the PCI transition handling isn't progress. Alan ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 17:34 ` Alan Cox @ 2006-10-13 17:13 ` Arjan van de Ven 2006-10-13 17:57 ` Alan Stern ` (2 subsequent siblings) 3 siblings, 0 replies; 30+ messages in thread From: Arjan van de Ven @ 2006-10-13 17:13 UTC (permalink / raw) To: Alan Cox Cc: linux-pci, Matthew Wilcox, Linux-pm mailing list, Kernel development list, Adam Belay On Fri, 2006-10-13 at 18:34 +0100, Alan Cox wrote: > Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox: > > No it didn't. It's undefined behaviour to perform *any* PCI config > > access to the device while it's doing a D-state transition. It may have > > I think you missed the earlier parts of the story - the kernel caches > the base config register state. > > > happened to work with the chips you tried it with, but more likely you > > never hit that window because X simply didn't try to do that. > > Which is why the kernel caches the register state. but... it didn't USE the cache in the case we're protecting here. Instead the hardware would just go splat. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 17:34 ` Alan Cox 2006-10-13 17:13 ` Arjan van de Ven @ 2006-10-13 17:57 ` Alan Stern 2006-10-13 19:18 ` [linux-pm] " Matthew Wilcox 2006-10-13 19:30 ` Adam Belay 2006-10-13 23:00 ` Benjamin Herrenschmidt 3 siblings, 1 reply; 30+ messages in thread From: Alan Stern @ 2006-10-13 17:57 UTC (permalink / raw) To: Alan Cox Cc: linux-pci, Matthew Wilcox, Linux-pm mailing list, Kernel development list, Adam Belay, Arjan van de Ven On Fri, 13 Oct 2006, Alan Cox wrote: > Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox: > > No it didn't. It's undefined behaviour to perform *any* PCI config > > access to the device while it's doing a D-state transition. It may have > > I think you missed the earlier parts of the story - the kernel caches > the base config register state. > > > happened to work with the chips you tried it with, but more likely you > > never hit that window because X simply didn't try to do that. > > Which is why the kernel caches the register state. This all came up long > ago and the solution we currently have was the one chosen after > considerable debate and analysis about things like locking. We preserved > the historical reliable interface going back to the early Linux PCI > support and used by all the apps. Would it be okay for pci_block_user_cfg_access() to use its own cache, so it doesn't interfere with data previously cached by pci_save_state()? Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-13 17:57 ` Alan Stern @ 2006-10-13 19:18 ` Matthew Wilcox 2006-10-13 20:59 ` Alan Stern 0 siblings, 1 reply; 30+ messages in thread From: Matthew Wilcox @ 2006-10-13 19:18 UTC (permalink / raw) To: Alan Stern Cc: Alan Cox, Adam Belay, Arjan van de Ven, Benjamin Herrenschmidt, Greg KH, linux-pci, Linux-pm mailing list, Kernel development list On Fri, Oct 13, 2006 at 01:57:48PM -0400, Alan Stern wrote: > Would it be okay for pci_block_user_cfg_access() to use its own cache, so > it doesn't interfere with data previously cached by pci_save_state()? My suggestion is just to require that the callers have previously called pci_save_state(). The PCI PM stack already has, and it's a one-line change to the IPR driver. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 19:18 ` [linux-pm] " Matthew Wilcox @ 2006-10-13 20:59 ` Alan Stern 0 siblings, 0 replies; 30+ messages in thread From: Alan Stern @ 2006-10-13 20:59 UTC (permalink / raw) To: Matthew Wilcox Cc: Alan Cox, linux-pci, Linux-pm mailing list, Kernel development list, Adam Belay, Arjan van de Ven On Fri, 13 Oct 2006, Matthew Wilcox wrote: > On Fri, Oct 13, 2006 at 01:57:48PM -0400, Alan Stern wrote: > > Would it be okay for pci_block_user_cfg_access() to use its own cache, so > > it doesn't interfere with data previously cached by pci_save_state()? > > My suggestion is just to require that the callers have previously called > pci_save_state(). The PCI PM stack already has, and it's a one-line > change to the IPR driver. Okay. Would you like to write a patch with that fix? Be sure to add a comment explaining the need for a previous call to pci_save_state(). At least it will get things going for now, even if it isn't perfectly correct in the long run. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 17:34 ` Alan Cox 2006-10-13 17:13 ` Arjan van de Ven 2006-10-13 17:57 ` Alan Stern @ 2006-10-13 19:30 ` Adam Belay 2006-10-13 23:00 ` Benjamin Herrenschmidt 3 siblings, 0 replies; 30+ messages in thread From: Adam Belay @ 2006-10-13 19:30 UTC (permalink / raw) To: Alan Cox Cc: linux-pci, Matthew Wilcox, Linux-pm mailing list, Kernel development list, Arjan van de Ven On Fri, 2006-10-13 at 18:34 +0100, Alan Cox wrote: Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox: > > No it didn't. It's undefined behaviour to perform *any* PCI config > > access to the device while it's doing a D-state transition. It may have > > I think you missed the earlier parts of the story - the kernel caches > the base config register state. > > > happened to work with the chips you tried it with, but more likely you > > never hit that window because X simply didn't try to do that. > > Which is why the kernel caches the register state. This all came up long > ago and the solution we currently have was the one chosen after > considerable debate and analysis about things like locking. We preserved > the historical reliable interface going back to the early Linux PCI > support and used by all the apps. > > > There are several problems with making it return an error > > - What does user space do ? > > while(pci_...() == -EAGAIN) yield(); > > which is useful how - there is no select operation for waiting here, and > while it could be added it just gets uglier > If the sysfs file blocked, this could be handled quite cleanly, and would reflect accurate PCI config state. > - Who actually wants to get an error in that specific case ? > Let's say the device is in D3cold (i.e. the parent bridge has been powered down). In that case, you might want to get an error (probably -EIO, but maybe FF...). A buffered copy would be incorrect if used by a userspace driver, as this would be hiding a legitimate failure condition. > If you can find someone who desperately wants an error code then code in > O_DIRECT support to do it and preserve the existing sane API. > > The job of the kernel is not to expose hardware directly, it is to > provide sane interfaces to it. We don't have separate interfaces to > conf1, conf2, pcibios etc for good reason. Exposing everyone to ugly > minor details of the PCI transition handling isn't progress. > I suppose we have very different ideas about the actual role and purpose of this sysfs interface. As I see it, it provides direct access to hardware for userspace device drivers (software that actually cares about the ugly PCI details). It's much lower-level than the highly abstracted "vendor", "device", "resourceX", etc. interfaces. As such, it's very important that it accurately reflects what's actually going on in hardware, even if this is of potentially greater hassle to userspace. Now that's not to suggest that we shouldn't block this interface when making a power state transition. But I think it's best to expose the hardware failure and powered off cases as errors. On the other hand you seem to suggest that it is a potentially approximate cache of the pci config space that primarily serves to provide pci configuration data to userspace hardware detection mechanisms. However, in this case, I think it may as well be marked as deprecated, as it's clearly inferior to the higher order sysfs attributes ("vendor", "device", "irq", "class", etc.) with regard to accuracy, code complexity (both for the kernel and userspace), and ease-of-use. In other words, I don't see a reason any userspace app should ever use it other than for debugging (i.e. lspci). Adam ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 17:34 ` Alan Cox ` (2 preceding siblings ...) 2006-10-13 19:30 ` Adam Belay @ 2006-10-13 23:00 ` Benjamin Herrenschmidt 2006-10-14 2:33 ` Alan Stern 3 siblings, 1 reply; 30+ messages in thread From: Benjamin Herrenschmidt @ 2006-10-13 23:00 UTC (permalink / raw) To: Alan Cox Cc: Matthew Wilcox, linux-pci, Linux-pm mailing list, Kernel development list, Adam Belay, Arjan van de Ven On Fri, 2006-10-13 at 18:34 +0100, Alan Cox wrote: > Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox: > > No it didn't. It's undefined behaviour to perform *any* PCI config > > access to the device while it's doing a D-state transition. It may have > > I think you missed the earlier parts of the story - the kernel caches > the base config register state. > > > happened to work with the chips you tried it with, but more likely you > > never hit that window because X simply didn't try to do that. > > Which is why the kernel caches the register state. This all came up long > ago and the solution we currently have was the one chosen after > considerable debate and analysis about things like locking. We preserved > the historical reliable interface going back to the early Linux PCI > support and used by all the apps. Well, we have two different things here. One is short term block. For example, PM transitions, or BIST. In that case, I reckon it might be worth just making the user space PCI config space accessors block in the kernel during the transition (a wait queue ?) One is long term block: the device is off. That's where it becomes tricky. For D3, I suppose it's actually correct to return cached infos provided that those do actually cache the PM capability indicating D3 state (that is we need to update the cache after the transition). And it's correct to prevent writes too I suppose. Then there are problems with things like embedded or some Apple ASICs where we toggle power/clock lines of devices but not directly using PCI PM (in fact, those devices might not even have PCI PM capability exposed). Returning cached info is fine, but we can't tell userland about the powered off (or unclocked) state of the device that way. Ben. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 23:00 ` Benjamin Herrenschmidt @ 2006-10-14 2:33 ` Alan Stern 2006-10-14 3:04 ` [linux-pm] " Roland Dreier ` (3 more replies) 0 siblings, 4 replies; 30+ messages in thread From: Alan Stern @ 2006-10-14 2:33 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alan Cox, Matthew Wilcox, linux-pci, Linux-pm mailing list, Kernel development list, Adam Belay, Arjan van de Ven On Sat, 14 Oct 2006, Benjamin Herrenschmidt wrote: > Well, we have two different things here. > > One is short term block. For example, PM transitions, or BIST. In that > case, I reckon it might be worth just making the user space PCI config > space accessors block in the kernel during the transition (a wait > queue ?) That seems like a reasonable thing to do. (BTW, can anyone explain quickly what "BIST" means?) > One is long term block: the device is off. That's where it becomes > tricky. For D3, I suppose it's actually correct to return cached infos > provided that those do actually cache the PM capability indicating D3 > state (that is we need to update the cache after the transition). And > it's correct to prevent writes too I suppose. > > Then there are problems with things like embedded or some Apple ASICs > where we toggle power/clock lines of devices but not directly using PCI > PM (in fact, those devices might not even have PCI PM capability > exposed). Returning cached info is fine, but we can't tell userland > about the powered off (or unclocked) state of the device that way. Now you're starting to tread in the dangerous waters of runtime PM userspace APIs. So far nobody has figured out a good general way of exposing internal power states to userspace. There may not even be such a thing. Alan Stern ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-14 2:33 ` Alan Stern @ 2006-10-14 3:04 ` Roland Dreier 2006-10-14 3:07 ` Matthew Wilcox ` (2 subsequent siblings) 3 siblings, 0 replies; 30+ messages in thread From: Roland Dreier @ 2006-10-14 3:04 UTC (permalink / raw) To: Alan Stern Cc: Benjamin Herrenschmidt, Alan Cox, Matthew Wilcox, Adam Belay, Arjan van de Ven, Greg KH, linux-pci, Linux-pm mailing list, Kernel development list Alan> That seems like a reasonable thing to do. (BTW, can anyone Alan> explain quickly what "BIST" means?) "Built-In Self Test" -- make the device go away and run some internal tests for a while and then report back. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-14 2:33 ` Alan Stern 2006-10-14 3:04 ` [linux-pm] " Roland Dreier @ 2006-10-14 3:07 ` Matthew Wilcox 2006-10-14 3:19 ` Bill Randle 2006-10-14 5:47 ` Greg KH 3 siblings, 0 replies; 30+ messages in thread From: Matthew Wilcox @ 2006-10-14 3:07 UTC (permalink / raw) To: Alan Stern Cc: Benjamin Herrenschmidt, Alan Cox, Adam Belay, Arjan van de Ven, Greg KH, linux-pci, Linux-pm mailing list, Kernel development list On Fri, Oct 13, 2006 at 10:33:04PM -0400, Alan Stern wrote: > That seems like a reasonable thing to do. (BTW, can anyone explain > quickly what "BIST" means?) Built In Self Test. Devices can take up to 2 seconds to complete the test, and some drop off the PCI bus while doing it. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-14 2:33 ` Alan Stern 2006-10-14 3:04 ` [linux-pm] " Roland Dreier 2006-10-14 3:07 ` Matthew Wilcox @ 2006-10-14 3:19 ` Bill Randle 2006-10-14 5:47 ` Greg KH 3 siblings, 0 replies; 30+ messages in thread From: Bill Randle @ 2006-10-14 3:19 UTC (permalink / raw) To: Alan Stern Cc: Alan Cox, linux-pci, Matthew Wilcox, Linux-pm mailing list, Kernel development list, Adam Belay, Arjan van de Ven On Fri, 2006-10-13 at 22:33 -0400, Alan Stern wrote: > That seems like a reasonable thing to do. (BTW, can anyone explain > quickly what "BIST" means?) Built-In Self Test -Bill ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-14 2:33 ` Alan Stern ` (2 preceding siblings ...) 2006-10-14 3:19 ` Bill Randle @ 2006-10-14 5:47 ` Greg KH 3 siblings, 0 replies; 30+ messages in thread From: Greg KH @ 2006-10-14 5:47 UTC (permalink / raw) To: Alan Stern Cc: Alan Cox, Matthew Wilcox, linux-pci, Linux-pm mailing list, Kernel development list, Adam Belay, Arjan van de Ven On Fri, Oct 13, 2006 at 10:33:04PM -0400, Alan Stern wrote: > (BTW, can anyone explain quickly what "BIST" means?) "Built In Self Test" ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 17:09 ` Alan Cox 2006-10-13 16:49 ` Matthew Wilcox @ 2006-10-13 17:01 ` Adam Belay 1 sibling, 0 replies; 30+ messages in thread From: Adam Belay @ 2006-10-13 17:01 UTC (permalink / raw) To: Alan Cox Cc: linux-pci, Linux-pm mailing list, Kernel development list, Arjan van de Ven On Fri, 2006-10-13 at 18:09 +0100, Alan Cox wrote: > Ar Gwe, 2006-10-13 am 12:34 -0400, ysgrifennodd Adam Belay: > > I agree this needs to be fixed. However, as I previously mentioned, > > this isn't the right place to attack the problem. Remember, this wasn't > > originally a kernel regression. Rather it's a workaround for a known > > It's a kernel regression. It used to be reliable to read X resource > addresses at any time. Not true, reading these registers during a BIST has always been a problem. > > > Finally, it's worth noting that this issue is really a corner-case, and > > in most systems it's extremely rare that even incorrect userspace apps > > would have any issue. > > Except just occasionally and randomly in the field, probably almost > undebuggable and irreproducable - the very worst conceivable kind of > bug. > Which is why returning an error code during device transitions is a reasonable compromise between correctness and practicality. -Adam ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-13 15:29 ` Arjan van de Ven 2006-10-13 16:06 ` Alan Cox @ 2006-10-13 16:40 ` Adam Belay 1 sibling, 0 replies; 30+ messages in thread From: Adam Belay @ 2006-10-13 16:40 UTC (permalink / raw) To: Arjan van de Ven Cc: linux-pci, Linux-pm mailing list, Kernel development list, Alan Cox On Fri, 2006-10-13 at 17:29 +0200, Arjan van de Ven wrote: > On Fri, 2006-10-13 at 16:26 +0100, Alan Cox wrote: > > Ar Gwe, 2006-10-13 am 10:29 -0400, ysgrifennodd Alan Stern: > > > > I'd like to propose that we have the pci config sysfs interface return > > > > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately > > > > reflects the state of the device to userspace, reduces complexity, and > > > > could potentially save some memory per PCI device instance. > > > > > > Could you resubmit your old patches and include a corresponding fix for > > > this access problem? > > > > And then you can fix the applications it breaks, like the X server which > > does actually want to know where all the devices are located in PCI > > space. > > > > .. but which could equally well mmap the resource from sysfs ;) > > > also the thing this patch does is ONLY when the device is effectively > off the bus return -EIO. > One can argue that -EAGAIN is nicer since it's only a temporary > condition though.... > > Yeah, perhaps -EAGAIN would be more appropriate, especially in the power state transition and BIST cases. An interesting possibility might be to have the file actually block, although I'm not sure if the O_NONBLOCK flag or polling for that matter can be supported through the sysfs/driver-core API. -Adam ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [linux-pm] Bug in PCI core 2006-10-13 14:29 ` Alan Stern 2006-10-13 15:26 ` [linux-pm] " Alan Cox @ 2006-10-13 20:48 ` Pavel Machek 1 sibling, 0 replies; 30+ messages in thread From: Pavel Machek @ 2006-10-13 20:48 UTC (permalink / raw) To: Alan Stern Cc: Adam Belay, linux-pci, Linux-pm mailing list, Kernel development list Hi! > > The PCI configuration space cache was originally introduced to support > > power management. However, it's mostly incorrect, as it unnecessarily > > stores the values of read only registers (and even BIST which is > > potentially dangerous). A while back I posted a series of patches that > > address this issue, and the net result was that the config cache stays > > around wasting memory because of the pci_block_user_cfg_access() > > dependency despite being useless to PCI PM. > > > > I'd like to propose that we have the pci config sysfs interface return > > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately > > reflects the state of the device to userspace, reduces complexity, and > > could potentially save some memory per PCI device instance. > > Could you resubmit your old patches and include a corresponding fix for > this access problem? > > BTW, is it possible for userspace to try accessing a device in D3cold? > Doesn't the fact that it is D3cold rather than D3hot mean the computer is > turned off? Or have I been missing out on new developments? I'm not sure about D3cold vs. D3hot... IIRC D3hot means that it is possible to wake up the system, while D3cold means slot is completely powered down. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bug in PCI core 2006-10-11 20:41 Bug in PCI core Alan Stern 2006-10-13 1:01 ` [linux-pm] " Benjamin Herrenschmidt @ 2006-10-14 5:34 ` Greg KH 1 sibling, 0 replies; 30+ messages in thread From: Greg KH @ 2006-10-14 5:34 UTC (permalink / raw) To: Alan Stern; +Cc: linux-pci, Linux-pm mailing list, Kernel development list On Wed, Oct 11, 2006 at 04:41:22PM -0400, Alan Stern wrote: > When a PCI device is suspended, its driver calls pci_save_state() so that > the config space can be restored when the device is resumed. Then the > driver calls pci_set_power_state(). > > However pci_set_power_state() calls pci_block_user_cfg_access(), and that > routine calls pci_save_state() again. This overwrites the saved state > with data in which memory, I/O, and bus master accesses are disabled. As > a result, when the device is resumed it doesn't work. > > Obviously pci_block_user_cfg_access() needs to be fixed. I don't know the > right way to fix it; hopefully somebody else does. Yeah, I'm just going to drop the patch from Matthew that added this. Andrew also noted that it causes bad things to happen on his laptop. Thanks for pointing it out. I gotta get suspend working on a machine here so I can test these things better... thanks, greg k-h ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2006-10-14 5:47 UTC | newest] Thread overview: 30+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-10-11 20:41 Bug in PCI core Alan Stern 2006-10-13 1:01 ` [linux-pm] " Benjamin Herrenschmidt 2006-10-13 8:50 ` Adam Belay 2006-10-13 9:16 ` Benjamin Herrenschmidt 2006-10-13 9:31 ` Martin Mares 2006-10-13 12:25 ` [linux-pm] " Benjamin Herrenschmidt 2006-10-13 14:29 ` Alan Stern 2006-10-13 15:26 ` [linux-pm] " Alan Cox 2006-10-13 15:29 ` Arjan van de Ven 2006-10-13 16:06 ` Alan Cox 2006-10-13 16:34 ` Adam Belay 2006-10-13 16:36 ` [linux-pm] " Matthew Wilcox 2006-10-13 17:09 ` Alan Cox 2006-10-13 16:49 ` Matthew Wilcox 2006-10-13 17:34 ` Alan Cox 2006-10-13 17:13 ` Arjan van de Ven 2006-10-13 17:57 ` Alan Stern 2006-10-13 19:18 ` [linux-pm] " Matthew Wilcox 2006-10-13 20:59 ` Alan Stern 2006-10-13 19:30 ` Adam Belay 2006-10-13 23:00 ` Benjamin Herrenschmidt 2006-10-14 2:33 ` Alan Stern 2006-10-14 3:04 ` [linux-pm] " Roland Dreier 2006-10-14 3:07 ` Matthew Wilcox 2006-10-14 3:19 ` Bill Randle 2006-10-14 5:47 ` Greg KH 2006-10-13 17:01 ` Adam Belay 2006-10-13 16:40 ` Adam Belay 2006-10-13 20:48 ` [linux-pm] " Pavel Machek 2006-10-14 5:34 ` Greg KH
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox