* Re: pci error recovery procedure [not found] <1157008212.20092.36.camel@ymzhang-perf.sh.intel.com> @ 2006-08-31 17:50 ` Linas Vepstas 2006-09-01 3:33 ` Zhang, Yanmin 2006-09-01 3:42 ` Zhang, Yanmin 0 siblings, 2 replies; 21+ messages in thread From: Linas Vepstas @ 2006-08-31 17:50 UTC (permalink / raw) To: Zhang, Yanmin Cc: linuxppc-dev, linux-pci, Yanmin Zhang, inux-kernel, Rajesh Shah On Thu, Aug 31, 2006 at 03:10:12PM +0800, Zhang, Yanmin wrote: > Linas, > > I am reviewing the error handlers of e1000 driver and got some ideas. My > startpoint is to simplify the err handler implementations for drivers, or > driver developers are *not willing* to add it if it's too complicated. I don't see that its to complicated ... > 1) Callback mmio_enabled looks useless. Documentation/pci-error-recovery.txt > says the current powerpc implementation does not implement this callback. I don't know if its useless or not. I have not needed it yet for the symbios, ipr and e1000 drivers, but its possible that some more sophisticated device may want it. I'm tempted to keep it a while longer befoe discarding it. The scenario is this: the device driver decides that, rather than asking for a full electical reset of the card, instead, it wants to perform its own recovery. It can do this as follows: a) enable MMIO b) issue reset command to adapter c) enable DMA. If we enabled both DMA and MMIO at the same time, there are mnay cases where the card will immediately trap again -- for example, if its DMA'ing to some crazy address. Thus, typically, one wants DMA disabled until after the card reset. Withouth the mmio_enabled() reset, there is no way of doing this. > 2) Callback slot_reset could be merged with resume. The new resume could be: > int (*error_resume)(struct pci_dev *dev); I checked e1000 and e100 drivers and > think there is no actual reason to have both slot_reset and resume. The idea here was to handle multi-function cards. On a multi-function card, *all* devices need to indicate that they were able to reset. Once all devices have been successfuly reset, then operation can be resumed. If the reset of one function fails, then operation is not resumed for any f the functions. > 3) link_reset is not used in pci express aer implementation, so it could be > deleted also. OK. Link reset was added explicitly to support PCI-E, so if its not wanted, we can eliminate it. > How did you test e1000 err_handler? We have three methods (I thought these were documented). In one, a technician brushes a grounding strap to some of the signal pins. In the second, slots are populated with known-bad cards. The third test involes sending a command down to the pci bridge chip, telling it to behave as if it detected an error. For development, the last is quick-n-easy. > In the simulated enviroment, the testing might be > incorrect. Why would it be incorrect? I mean, we don't simulate having someone pour a cup of coffee into the guts of the machine ... but my understanding is the machines do get standard vibration/thermal/humidity testing, which is good enough for me. > For example, e1000_io_error_detected would call e1000_down to reset NIC. Why would that be incorrect? > During > our last discussion on LKML, you said PowerPC will block further I/O if the platform captures > a pci error, so the all I/O in e1000_down will be blocked. Later on, e1000_io_slot_reset > will reenable pci device and initiate NIC. I guess late initiate might fail because prior > e1000_down I/O don't reach NIC. Why would it fail? The e1000_down serves primarily to get the Linux kernel into a known state. It doesn't matter what happens to the card, since the next step will be to perform an electrical reset of the card. --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-08-31 17:50 ` pci error recovery procedure Linas Vepstas @ 2006-09-01 3:33 ` Zhang, Yanmin 2006-09-01 21:25 ` Linas Vepstas 2006-09-01 3:42 ` Zhang, Yanmin 1 sibling, 1 reply; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-01 3:33 UTC (permalink / raw) To: Linas Vepstas Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, inux-kernel, Rajesh Shah On Fri, 2006-09-01 at 01:50, Linas Vepstas wrote: > On Thu, Aug 31, 2006 at 03:10:12PM +0800, Zhang, Yanmin wrote: > > Linas, > > > > I am reviewing the error handlers of e1000 driver and got some ideas. My > > startpoint is to simplify the err handler implementations for drivers, or > > driver developers are *not willing* to add it if it's too complicated. > > I don't see that its to complicated ... Originally, I didn't think so, but after I try to add err_handlers to some drivers, I feel it's too complicated. > > > 1) Callback mmio_enabled looks useless. Documentation/pci-error-recovery.txt > > says the current powerpc implementation does not implement this callback. > > I don't know if its useless or not. I have not needed it yet for the > symbios, ipr and e1000 drivers, but its possible that some more > sophisticated device may want it. I'm tempted to keep it a while > longer befoe discarding it. > > The scenario is this: the device driver decides that, rather than asking > for a full electical reset of the card, instead, it wants to perform > its own recovery. It can do this as follows: > > a) enable MMIO > b) issue reset command to adapter > c) enable DMA. > > If we enabled both DMA and MMIO at the same time, there are mnay cases > where the card will immediately trap again -- for example, if its > DMA'ing to some crazy address. Thus, typically, one wants DMA disabled > until after the card reset. Withouth the mmio_enabled() reset, there > is no way of doing this. The new error_resume, or the old slot_reset could take care of it. The specific device driver knows all the details about how to initiate the devices. The error_resume could call the step a) b) c) sequencially while doing checking among steps. If there is really a device having specific requirement to reinitiate it (very rarely), it could use walkaround, such like schedule a WORKER. No need to provide a generic mmio_enabled. > > 2) Callback slot_reset could be merged with resume. The new resume could be: > > int (*error_resume)(struct pci_dev *dev); I checked e1000 and e100 drivers and > > think there is no actual reason to have both slot_reset and resume. > > The idea here was to handle multi-function cards. On a multi-function card, > *all* devices need to indicate that they were able to reset. Once all devices > have been successfuly reset, then operation can be resumed. If the reset > of one function fails, then operation is not resumed for any f the > functions. I don't think we need slot_reset to coordinate multi-function devices. The new error_resume could take care of multi-function card. 'reset' here means driver need do I/O to detect if the device (function) still works well. If a function of a multi-function device couldn't reset while other functions could reset, other functions could just go on to reinitiate. In the end, the error recovery procedure (handle_eeh_events in PowerPC implementation) could check all the returning values of error_resume. If there is a failure value, then removes all the functions' pci_dev of the device from the bus. > > > 3) link_reset is not used in pci express aer implementation, so it could be > > deleted also. > > OK. Link reset was added explicitly to support PCI-E, so if its not wanted, > we can eliminate it. > > > How did you test e1000 err_handler? > > We have three methods (I thought these were documented). In one, a > technician brushes a grounding strap to some of the signal pins. > In the second, slots are populated with known-bad cards. The third test > involes sending a command down to the pci bridge chip, telling it to > behave as if it detected an error. For development, the last is > quick-n-easy. Thanks for your explanation. > > > In the simulated enviroment, the testing might be > > incorrect. > > Why would it be incorrect? I mean, we don't simulate having someone pour a > cup of coffee into the guts of the machine ... but my understanding is > the machines do get standard vibration/thermal/humidity testing, which > is good enough for me. > > > For example, e1000_io_error_detected would call e1000_down to reset NIC. > > Why would that be incorrect? > > > During > > our last discussion on LKML, you said PowerPC will block further I/O if the platform captures > > a pci error, so the all I/O in e1000_down will be blocked. Later on, e1000_io_slot_reset > > will reenable pci device and initiate NIC. I guess late initiate might fail because prior > > e1000_down I/O don't reach NIC. > > Why would it fail? The e1000_down serves primarily to get the Linux > kernel into a known state. It doesn't matter what happens to the card, > since the next step will be to perform an electrical reset of the card. Who will perform the electrical reset of the card? Function e1000_reset or the platform? If it's the platform, I agree with you, but if it's e1000_reset, it might not work because e1000_reset uses a e1000-specific approach to reset the card. I'm not sure if the e1000_reset will restore the NIC to fresh system power-on state. At least, from the source codes, e1000_reset couldn't. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-01 3:33 ` Zhang, Yanmin @ 2006-09-01 21:25 ` Linas Vepstas 2006-09-04 5:47 ` Zhang, Yanmin 0 siblings, 1 reply; 21+ messages in thread From: Linas Vepstas @ 2006-09-01 21:25 UTC (permalink / raw) To: Zhang, Yanmin Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, linux-kernel, Rajesh Shah On Fri, Sep 01, 2006 at 11:33:49AM +0800, Zhang, Yanmin wrote: > On Fri, 2006-09-01 at 01:50, Linas Vepstas wrote: > > On Thu, Aug 31, 2006 at 03:10:12PM +0800, Zhang, Yanmin wrote: > > > Linas, > > > > > > I am reviewing the error handlers of e1000 driver and got some ideas. My > > > startpoint is to simplify the err handler implementations for drivers, or > > > driver developers are *not willing* to add it if it's too complicated. > > > > I don't see that its to complicated ... > Originally, I didn't think so, but after I try to add err_handlers to some > drivers, I feel it's too complicated. Which drivers are you working on? > > > 1) Callback mmio_enabled looks useless. Documentation/pci-error-recovery.txt > > > says the current powerpc implementation does not implement this callback. > > > > I don't know if its useless or not. I have not needed it yet for the > > symbios, ipr and e1000 drivers, but its possible that some more > > sophisticated device may want it. I'm tempted to keep it a while > > longer befoe discarding it. > > > > The scenario is this: the device driver decides that, rather than asking > > for a full electical reset of the card, instead, it wants to perform > > its own recovery. It can do this as follows: > > > > a) enable MMIO > > b) issue reset command to adapter > > c) enable DMA. > > > > If we enabled both DMA and MMIO at the same time, there are mnay cases > > where the card will immediately trap again -- for example, if its > > DMA'ing to some crazy address. Thus, typically, one wants DMA disabled > > until after the card reset. Withouth the mmio_enabled() reset, there > > is no way of doing this. > The new error_resume, or the old slot_reset could take care of it. The specific > device driver knows all the details about how to initiate the devices. The > error_resume could call the step a) b) c) sequencially while doing checking among > steps. Again, consider the multi-function cards. On pSeries, I can only enable DMA on a er-slot basis, not a per-function basis. So if one driver enables DMA before some other driver has reset appropriately, everything breaks. > If there is really a device having specific requirement to reinitiate it (very rarely), > it could use walkaround, such like schedule a WORKER. No need to provide a generic > mmio_enabled. I don't understand. Enabling MMIO and enabling DMA both require specific commands to be sent to the PCI-host bridge. These commands are not a part of the PCI spec. > > > 2) Callback slot_reset could be merged with resume. The new resume could be: > > > int (*error_resume)(struct pci_dev *dev); I checked e1000 and e100 drivers and > > > think there is no actual reason to have both slot_reset and resume. > > > > The idea here was to handle multi-function cards. On a multi-function card, > > *all* devices need to indicate that they were able to reset. Once all devices > > have been successfuly reset, then operation can be resumed. If the reset > > of one function fails, then operation is not resumed for any f the > > functions. > I don't think we need slot_reset to coordinate multi-function devices. The new > error_resume could take care of multi-function card. How? > 'reset' here means driver > need do I/O to detect if the device (function) still works well. If a function > of a multi-function device couldn't reset while other functions could reset, > other functions could just go on to reinitiate. In the end, the error recovery > procedure (handle_eeh_events in PowerPC implementation) could check all the > returning values of error_resume. If there is a failure value, then removes > all the functions' pci_dev of the device from the bus. I can only enable or disable an entire PCI slot, and not individual PCI functions. If there are some pins that are shorted, or parity errors or whatever, I can only turn off the whole card. > > > During > > > our last discussion on LKML, you said PowerPC will block further I/O if the platform captures > > > a pci error, so the all I/O in e1000_down will be blocked. Later on, e1000_io_slot_reset > > > will reenable pci device and initiate NIC. I guess late initiate might fail because prior > > > e1000_down I/O don't reach NIC. > > > > Why would it fail? The e1000_down serves primarily to get the Linux > > kernel into a known state. It doesn't matter what happens to the card, > > since the next step will be to perform an electrical reset of the card. > Who will perform the electrical reset of the card? Function e1000_reset or the platform? The platform. By "electrical reset", I mean "dropping the #RST pin low for 200mS". Only the platform can do this. > If it's the platform, I agree with you, but if it's e1000_reset, it might not work because > e1000_reset uses a e1000-specific approach to reset the card. The driver has to choices: it can ask for the electrical reset, by returning PCI_ERS_RESULT_NEED_RESET. But if the driver doesn't need the electrical reset, then it can return PCI_ERS_RESULT_CAN_RECOVER, and issue whatever device-specific commands it needs to reset. > I'm not sure if the e1000_reset > will restore the NIC to fresh system power-on state. At least, from the source codes, e1000_reset > couldn't. I have no idea. That's why this driver issues PCI_ERS_RESULT_NEED_RESET, which will get it into a fresh system power-on state. Its easy, its brute-force, it works. --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-01 21:25 ` Linas Vepstas @ 2006-09-04 5:47 ` Zhang, Yanmin 2006-09-04 9:03 ` Benjamin Herrenschmidt 2006-09-05 19:17 ` Linas Vepstas 0 siblings, 2 replies; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-04 5:47 UTC (permalink / raw) To: Linas Vepstas Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Sat, 2006-09-02 at 05:25, Linas Vepstas wrote: > On Fri, Sep 01, 2006 at 11:33:49AM +0800, Zhang, Yanmin wrote: > > On Fri, 2006-09-01 at 01:50, Linas Vepstas wrote: > > > On Thu, Aug 31, 2006 at 03:10:12PM +0800, Zhang, Yanmin wrote: > > > > Linas, > > > > > > > > I am reviewing the error handlers of e1000 driver and got some ideas. My > > > > startpoint is to simplify the err handler implementations for drivers, or > > > > driver developers are *not willing* to add it if it's too complicated. > > > > > > I don't see that its to complicated ... > > Originally, I didn't think so, but after I try to add err_handlers to some > > drivers, I feel it's too complicated. > > Which drivers are you working on? I worked out error handlers for tg3 NIC driver. I'm also checking e1000 driver to try to move all I/O operations from e1000_io_error_detected to e1000_io_slot_reset. > > > > > 1) Callback mmio_enabled looks useless. Documentation/pci-error-recovery.txt > > > > says the current powerpc implementation does not implement this callback. > > > > > > I don't know if its useless or not. I have not needed it yet for the > > > symbios, ipr and e1000 drivers, but its possible that some more > > > sophisticated device may want it. I'm tempted to keep it a while > > > longer befoe discarding it. > > > > > > The scenario is this: the device driver decides that, rather than asking > > > for a full electical reset of the card, instead, it wants to perform > > > its own recovery. It can do this as follows: > > > > > > a) enable MMIO > > > b) issue reset command to adapter > > > c) enable DMA. > > > > > > If we enabled both DMA and MMIO at the same time, there are mnay cases > > > where the card will immediately trap again -- for example, if its > > > DMA'ing to some crazy address. Thus, typically, one wants DMA disabled > > > until after the card reset. I think most drivers' error_detected callbacks return PCI_ERS_RESULT_NEED_RESET, so the slot will be reset. Then, the example that one wants DMA disabled until after the card reset is not reasonable. > Withouth the mmio_enabled() reset, there > > > is no way of doing this. > > The new error_resume, or the old slot_reset could take care of it. The specific > > device driver knows all the details about how to initiate the devices. The > > error_resume could call the step a) b) c) sequencially while doing checking among > > steps. > > Again, consider the multi-function cards. On pSeries, I can only enable > DMA on a er-slot basis, not a per-function basis. So if one driver > enables DMA before some other driver has reset appropriately, everything > breaks. Does here 'reset' mean hardware slot reset? In error_detected, driver needs cancel all pending request and don't start any I/O operations. Then, if the slot is always reset, there will be no the problem. Function handle_eeh_events always resets slot except hard failure. See beow more comments. As you know, all functions of a device share the same bus number and 5 bit dev number. They just have different 3 bit function number. We could deduce if functions are in the same device (slot). If mmio_enabled is not used currently, I think we could delete it firstly. Later on, if a platform really need it, we could add it, so we could keep the simplied codes. > > > If there is really a device having specific requirement to reinitiate it (very rarely), > > it could use walkaround, such like schedule a WORKER. No need to provide a generic > > mmio_enabled. > > I don't understand. Enabling MMIO and enabling DMA both require specific > commands to be sent to the PCI-host bridge. These commands are not a > part of the PCI spec. Thanks. Now I understand why you specified mmio_enabled and slot_reset. They are just to map to pSeries platform hardware operation steps. I know little about pSeries hardware, but is it possible to merge such hardware steps from software point of view? I checked the source codes of pSeries eeh_driver. Function handle_eeh_events does nothing between pci_walk_bus(frozen_bus, eeh_report_reset, NULL) and pci_walk_bus(frozen_bus, eeh_report_resume, NULL), that is, it doesn't enable DMA after slot_reset. handle_eeh_events always resets slot except hard failure. So, slot_reset could be merged with resume. > > > > > > 2) Callback slot_reset could be merged with resume. The new resume could be: > > > > int (*error_resume)(struct pci_dev *dev); I checked e1000 and e100 drivers and > > > > think there is no actual reason to have both slot_reset and resume. > > > > > > The idea here was to handle multi-function cards. On a multi-function card, > > > *all* devices need to indicate that they were able to reset. Once all devices > > > have been successfuly reset, then operation can be resumed. If the reset > > > of one function fails, then operation is not resumed for any f the > > > functions. > > I don't think we need slot_reset to coordinate multi-function devices. The new > > error_resume could take care of multi-function card. > > How? > > > 'reset' here means driver > > need do I/O to detect if the device (function) still works well. If a function > > of a multi-function device couldn't reset while other functions could reset, > > other functions could just go on to reinitiate. In the end, the error recovery > > procedure (handle_eeh_events in PowerPC implementation) could check all the > > returning values of error_resume. If there is a failure value, then removes > > all the functions' pci_dev of the device from the bus. > > I can only enable or disable an entire PCI slot, and not individual PCI > functions. If there are some pins that are shorted, or parity errors or > whatever, I can only turn off the whole card. It doesn't matter with the simplification. I don't mean that a device function should be disabled immediately after the error_resume of the function driver returns.The disable operation could be delayed till all error_resume return. > > > > > > During > > > > our last discussion on LKML, you said PowerPC will block further I/O if the platform captures > > > > a pci error, so the all I/O in e1000_down will be blocked. Later on, e1000_io_slot_reset > > > > will reenable pci device and initiate NIC. I guess late initiate might fail because prior > > > > e1000_down I/O don't reach NIC. > > > > > > Why would it fail? The e1000_down serves primarily to get the Linux > > > kernel into a known state. It doesn't matter what happens to the card, > > > since the next step will be to perform an electrical reset of the card. > > Who will perform the electrical reset of the card? Function e1000_reset or the platform? > > The platform. By "electrical reset", I mean "dropping the #RST pin low > for 200mS". Only the platform can do this. Thanks for your explanation. I assume after the electrical reset, all device functions of the device slot will go back to the initial status before attaching their drivers. > > > If it's the platform, I agree with you, but if it's e1000_reset, it might not work because > > e1000_reset uses a e1000-specific approach to reset the card. > > The driver has to choices: it can ask for the electrical reset, by > returning PCI_ERS_RESULT_NEED_RESET. But if the driver doesn't need > the electrical reset, then it can return PCI_ERS_RESULT_CAN_RECOVER, > and issue whatever device-specific commands it needs to reset. > > I'm not sure if the e1000_reset > > will restore the NIC to fresh system power-on state. At least, from the source codes, e1000_reset > > couldn't. > > I have no idea. That's why this driver issues PCI_ERS_RESULT_NEED_RESET, > which will get it into a fresh system power-on state. Its easy, its > brute-force, it works. I found a problem of e1000 driver when testing its error handlers. After the NIC is resumed, its RX/TX packets numbers are crazy. Now, I think it's a bug of function e1000_reset, not the error handlers. Sorry for bothering you on e1000. I copy another email below, so we could keep the discussion in one thread. On Fri, Sep 01, 2006 at 05:04:09PM +0800, Zhang, Yanmin wrote: > > One more comment: The second parameter of error_detected also could be deleted > > because recovery procedures will save error to pci_dev->error_state. > > Yes, I beleive so. Thanks. > > > So, the err_handler pci_error_handlers could be: > > struct pci_error_handlers > > { > > pci_ers_result_t (*error_detected)(struct pci_dev *dev); > > pci_ers_result_t (*error_resume)(struct pci_dev *dev); > > }; > > No, as per other email, we still need a multi-step process for > multi-function cards, As above discussion, reset slot could resolve it like you did for pSeries. > and for cards that may not want to get > a full electrical reset. So I think slot is reset only when a error_detected returns PCI_ERS_RESULT_NEED_RESET. > Finally, there might be platforms > that cannot perform a per-slot electrical reset, and would > therefore require drivers that can recover on thier own. The new pci_error_handlers could process it easily. The driver's error_resume just need schedule a driver-specific worker and returns PCI_ERS_RESULT_RECOVERED. The worker could do recover on the driver own later on. By checking drivers who support err_handler in the latest kernel, we could find they all returns PCI_ERS_RESULT_NEED_RESET. They all could be converted to use the new simplified pci_error_handlers. The new pci_error_handlers also gives drivers flexibility to have more control on error recovery. It's hard to look for a perfect solution. I mean, it's a trade-off. As long as it could finish most functionality, the simpler, the better. Yanmin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-04 5:47 ` Zhang, Yanmin @ 2006-09-04 9:03 ` Benjamin Herrenschmidt 2006-09-05 2:32 ` Zhang, Yanmin 2006-09-05 18:50 ` Linas Vepstas 2006-09-05 19:17 ` Linas Vepstas 1 sibling, 2 replies; 21+ messages in thread From: Benjamin Herrenschmidt @ 2006-09-04 9:03 UTC (permalink / raw) To: Zhang, Yanmin Cc: Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist > As you know, all functions of a device share the same bus number and 5 bit dev number. > They just have different 3 bit function number. We could deduce if functions are in the same > device (slot). Until you have a P2P bridge ... > Thanks. Now I understand why you specified mmio_enabled and slot_reset. They are just > to map to pSeries platform hardware operation steps. I know little about pSeries hardware, > but is it possible to merge such hardware steps from software point of view? One of the ideas we had when defining those steps is to be precise enough to let drivers who _can_ deal with those fine grained pSeries step implement them, but also have the fallback to slot reset whenever possible. Now, if in practice, after actually implementing this in a number of drivers, we see that slot reset is the only ever used path, then we might want to simplify things a bit. I didn't want to impose that restriction in the initial design though. It's my understanding that doing no slot reset (hardware reset) but just re-enabling MMIO, DMA and clearing pending error status in the PCI config space is, as far as the driver is concerned, almost functionally equivalent to a PCIe link reset. That is, the link reset might not (or will not) actually reset the hardware beyond the PCIe link layer. Thus we could simplify the split between link reset / hard reset. The former is an attempt at recovery with only resetting the PCI path to the device, which on PCIe becomes a link reset, and on old PCI, just clearing of the various error bits along the path (and on pSeries, re-enabling MMIO and DMA access). However, there is still the problem that if you do that, on pSeries at least, you really want to 1- enable MMIO, 2- soft reset the card using MMIO, that is make sure all pending DMA is stopped, and 3- re-enable DMA. While if we collapse that into a single 'link reset' type of operation, we'll end up re-enabling MMIO and DMA before the driver has a chance to stop pending DMA's and thus increase the chance that we crap out due to a pending DMA on the chip. Ben. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-04 9:03 ` Benjamin Herrenschmidt @ 2006-09-05 2:32 ` Zhang, Yanmin 2006-09-05 19:01 ` Linas Vepstas 2006-09-05 18:50 ` Linas Vepstas 1 sibling, 1 reply; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-05 2:32 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist On Mon, 2006-09-04 at 17:03, Benjamin Herrenschmidt wrote: > > As you know, all functions of a device share the same bus number and 5 bit dev number. > > They just have different 3 bit function number. We could deduce if functions are in the same > > device (slot). > > Until you have a P2P bridge ... > > Thanks. Now I understand why you specified mmio_enabled and slot_reset. They are just > > to map to pSeries platform hardware operation steps. I know little about pSeries hardware, > > but is it possible to merge such hardware steps from software point of view? > > One of the ideas we had when defining those steps is to be precise > enough to let drivers who _can_ deal with those fine grained pSeries > step implement them, but also have the fallback to slot reset whenever > possible. > > Now, if in practice, after actually implementing this in a number of > drivers, we see that slot reset is the only ever used path, then we > might want to simplify things a bit. I didn't want to impose that > restriction in the initial design though. Thanks for your explanation. Now it's the time to delete mmio_enabled and merge slot_reset with resume. > > It's my understanding that doing no slot reset (hardware reset) but just > re-enabling MMIO, DMA and clearing pending error status in the PCI > config space is, as far as the driver is concerned, almost functionally > equivalent to a PCIe link reset. That is, the link reset might not (or > will not) actually reset the hardware beyond the PCIe link layer. I agree. > > Thus we could simplify the split between link reset / hard reset. The > former is an attempt at recovery with only resetting the PCI path to the > device, which on PCIe becomes a link reset, and on old PCI, just > clearing of the various error bits along the path (and on pSeries, > re-enabling MMIO and DMA access). However, there is still the problem > that if you do that, on pSeries at least, you really want to 1- enable > MMIO, 2- soft reset the card using MMIO, that is make sure all pending > DMA is stopped, and 3- re-enable DMA. While if we collapse that into a > single 'link reset' type of operation, we'll end up re-enabling MMIO and > DMA before the driver has a chance to stop pending DMA's Is it the exclusive reason to have multi-steps? 1) Here link reset and hard reset are hardware operations, not the link_reset and slot_reset callback in pci_error_handlers. 2) Callback error_detected will notify drivers there is PCI errors. Drivers shouldn't do any I/O in error_detected. 3) If both the link and slot are reset after all error_detected are called, the device should go back to initial status and all DMA should be stopped automatically. Why does the driver still need a chance to stop DMA? The error_detected of the drivers in the latest kernel who support err handlers always returns PCI_ERS_RESULT_NEED_RESET. They are typical examples. > and thus > increase the chance that we crap out due to a pending DMA on the chip. > > Ben. Above discussion is only about if mmio_enabled is needed. As for slot_reset, I think it could be merged with resume, because platforms do nothing between calling slot_reset and resume. Any comment? Yanmin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-05 2:32 ` Zhang, Yanmin @ 2006-09-05 19:01 ` Linas Vepstas 2006-09-06 1:26 ` Zhang, Yanmin 0 siblings, 1 reply; 21+ messages in thread From: Linas Vepstas @ 2006-09-05 19:01 UTC (permalink / raw) To: Zhang, Yanmin Cc: Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist On Tue, Sep 05, 2006 at 10:32:08AM +0800, Zhang, Yanmin wrote: > Is it the exclusive reason to have multi-steps? I don't understand the question. A previous email explained the reason to have mutiple steps. > 1) Here link reset and hard reset are hardware operations, not the > link_reset and slot_reset callback in pci_error_handlers. I don't understand the comment. > 2) Callback error_detected will notify drivers there is PCI errors. Drivers > shouldn't do any I/O in error_detected. It shouldn't matter. If it is truly important for a particular platform to make sure that there is no i/o, then the low-level i/o routines could be modified to drop any accidentally issued i/o on the floor. This doesn't require a change to either the API or the policy. > 3) If both the link and slot are reset after all error_detected are called, > the device should go back to initial status and all DMA should be stopped > automatically. Why does the driver still need a chance to stop DMA? As explained previously, not all drivers may want to have a full electrical device reset. > The > error_detected of the drivers in the latest kernel who support err handlers > always returns PCI_ERS_RESULT_NEED_RESET. They are typical examples. Just because the current drivers do it this way does not mean that this is the best way to do things. A full reset is time-consuming. Some drivers may want to implement a faster and quicker reset. --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-05 19:01 ` Linas Vepstas @ 2006-09-06 1:26 ` Zhang, Yanmin 2006-09-06 20:01 ` Linas Vepstas 0 siblings, 1 reply; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-06 1:26 UTC (permalink / raw) To: Linas Vepstas Cc: Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist On Wed, 2006-09-06 at 03:01, Linas Vepstas wrote: > On Tue, Sep 05, 2006 at 10:32:08AM +0800, Zhang, Yanmin wrote: > > Is it the exclusive reason to have multi-steps? > > I don't understand the question. A previous email explained the reason > to have mutiple steps. The question is against Ben's comments. Pls. don't delete his comments in your reply. > > > 1) Here link reset and hard reset are hardware operations, not the > > link_reset and slot_reset callback in pci_error_handlers. > > I don't understand the comment. I wanted to clarify that we need differentiate link/hard reset from callback link_reset and slot_reset when discussing the API. > > > 2) Callback error_detected will notify drivers there is PCI errors. Drivers > > shouldn't do any I/O in error_detected. > > It shouldn't matter. If it is truly important for a particular platform > to make sure that there is no i/o, then the low-level i/o routines > could be modified to drop any accidentally issued i/o on the floor. > This doesn't require a change to either the API or the policy. > > 3) If both the link and slot are reset after all error_detected are called, > > the device should go back to initial status and all DMA should be stopped > > automatically. Why does the driver still need a chance to stop DMA? > > As explained previously, not all drivers may want to have a full > electrical device reset. I need repeat my idea. 1) My new pci_error_handlers doesn't always choose to reset slot. It still depends on the return value of error_detected. 2) As a matter of fact, most cases of specific device's error_detected callback will choose to return PCI_ERS_RESULT_NEED_RESET. Like what you did for e100/e1000/ipr. > > > The > > error_detected of the drivers in the latest kernel who support err handlers > > always returns PCI_ERS_RESULT_NEED_RESET. They are typical examples. > > Just because the current drivers do it this way does not mean that this is > the best way to do things. If it's not the best way, why did you choose to reset slot for e1000/e100/ipr error handlers? They are typical widely-used devices. To make it easier to add error handlers? > A full reset is time-consuming. Some drivers > may want to implement a faster and quicker reset. > > --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-06 1:26 ` Zhang, Yanmin @ 2006-09-06 20:01 ` Linas Vepstas 2006-09-07 1:56 ` Zhang, Yanmin 0 siblings, 1 reply; 21+ messages in thread From: Linas Vepstas @ 2006-09-06 20:01 UTC (permalink / raw) To: Zhang, Yanmin Cc: Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist On Wed, Sep 06, 2006 at 09:26:56AM +0800, Zhang, Yanmin wrote: > > > The > > > error_detected of the drivers in the latest kernel who support err handlers > > > always returns PCI_ERS_RESULT_NEED_RESET. They are typical examples. > > > > Just because the current drivers do it this way does not mean that this is > > the best way to do things. > > If it's not the best way, why did you choose to reset slot for e1000/e100/ipr > error handlers? They are typical widely-used devices. To make it easier to > add error handlers? I did it that way just to get going, get something working. I do not have hardware specs for any of these devices, and do not have much of an idea of what they are capable of; the recovery code I wrote is of "brute force, hit it with a hammer"-nature. Driver writers who know thier hardware well, and are interested in a more refined approach are encouraged to actualy use a more refined approach. --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-06 20:01 ` Linas Vepstas @ 2006-09-07 1:56 ` Zhang, Yanmin 0 siblings, 0 replies; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-07 1:56 UTC (permalink / raw) To: Linas Vepstas Cc: Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist On Thu, 2006-09-07 at 04:01, Linas Vepstas wrote: > On Wed, Sep 06, 2006 at 09:26:56AM +0800, Zhang, Yanmin wrote: > > > > The > > > > error_detected of the drivers in the latest kernel who support err handlers > > > > always returns PCI_ERS_RESULT_NEED_RESET. They are typical examples. > > > > > > Just because the current drivers do it this way does not mean that this is > > > the best way to do things. > > > > If it's not the best way, why did you choose to reset slot for e1000/e100/ipr > > error handlers? They are typical widely-used devices. To make it easier to > > add error handlers? > > I did it that way just to get going, get something working. I do not > have hardware specs for any of these devices, and do not have much of > an idea of what they are capable of; Yes, it's difficult to add fine-grained error handlers for guys who are not the driver developers. > the recovery code I wrote is of > "brute force, hit it with a hammer"-nature. Driver writers who > know thier hardware well, and are interested in a more refined > approach are encouraged to actualy use a more refined approach. I guess almost no driver developer is happy to spend lots of time to add refined steps. They would like to focus on normal process (for achievement feeling? :) ). In addition, if they use fine-grained steps in error handlers, all these steps might be rewritten when the device specs is upgraded. Fine-grained steps in error handlers are more difficut to debug. It's impossible for you to develop error handlers for all device drivers. The error handlers look a little like suspend/resume. Of course, it's more complicated. If we could keep it as simple as suspend/resume, it's more welcomed. pci error shouldn't happen frequently. And when it happens, I think mostly it's an endpoint device instead of bridge. When it happens, if we choose always reset slot, performance could be degraded, but not too much. I just deduce, and didn't test it on a machine with hundreds of devices. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-04 9:03 ` Benjamin Herrenschmidt 2006-09-05 2:32 ` Zhang, Yanmin @ 2006-09-05 18:50 ` Linas Vepstas 2006-09-05 21:19 ` Benjamin Herrenschmidt 2006-09-06 1:35 ` Zhang, Yanmin 1 sibling, 2 replies; 21+ messages in thread From: Linas Vepstas @ 2006-09-05 18:50 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Zhang, Yanmin, Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist On Mon, Sep 04, 2006 at 07:03:12PM +1000, Benjamin Herrenschmidt wrote: > > > As you know, all functions of a device share the same bus number and 5 bit dev number. > > They just have different 3 bit function number. We could deduce if functions are in the same > > device (slot). > > Until you have a P2P bridge ... And this is not theoretical: for example, the matrox graphics cards: 0000:c8:01.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 13) 0000:c9:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 85) Now, I could have sworn there was another device behind this bridge, some serial or joystick controller or something, although this particular card doesn't seem to have it. ------ It's not clear to me what hardware may show up in the future. For example, someone may build a 32x PCI-E card that will act as a bridge to a drawer with half-a-dozen ordinary PCI-X slots in it. This is perhaps a bit hypothetical, but changing the API will make it harder to implement eror recovery for such a system. FWIW, there is at least one pSeries system in the lab which has several hundred PCI slots attached to it, although I've never done testing on it. Hmm. Maybe its time I did ... --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-05 18:50 ` Linas Vepstas @ 2006-09-05 21:19 ` Benjamin Herrenschmidt 2006-09-06 1:35 ` Zhang, Yanmin 1 sibling, 0 replies; 21+ messages in thread From: Benjamin Herrenschmidt @ 2006-09-05 21:19 UTC (permalink / raw) To: Linas Vepstas Cc: Zhang, Yanmin, Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist On Tue, 2006-09-05 at 13:50 -0500, Linas Vepstas wrote: > On Mon, Sep 04, 2006 at 07:03:12PM +1000, Benjamin Herrenschmidt wrote: > > > > > As you know, all functions of a device share the same bus number and 5 bit dev number. > > > They just have different 3 bit function number. We could deduce if functions are in the same > > > device (slot). > > > > Until you have a P2P bridge ... > > And this is not theoretical: for example, the matrox graphics cards: > > 0000:c8:01.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 13) > 0000:c9:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 85) It's also very common with multiple ports network cards Ben. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-05 18:50 ` Linas Vepstas 2006-09-05 21:19 ` Benjamin Herrenschmidt @ 2006-09-06 1:35 ` Zhang, Yanmin 1 sibling, 0 replies; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-06 1:35 UTC (permalink / raw) To: Linas Vepstas Cc: Yanmin Zhang, LKML, Rajesh Shah, linuxppc-dev, linux-pci maillist On Wed, 2006-09-06 at 02:50, Linas Vepstas wrote: > On Mon, Sep 04, 2006 at 07:03:12PM +1000, Benjamin Herrenschmidt wrote: > > > > > As you know, all functions of a device share the same bus number and 5 bit dev number. > > > They just have different 3 bit function number. We could deduce if functions are in the same > > > device (slot). > > > > Until you have a P2P bridge ... > > And this is not theoretical: for example, the matrox graphics cards: > > 0000:c8:01.0 PCI bridge: Hint Corp HB6 Universal PCI-PCI bridge (non-transparent mode) (rev 13) > 0000:c9:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 85) > > Now, I could have sworn there was another device behind this bridge, > some serial or joystick controller or something, although this > particular card doesn't seem to have it. Thanks. My comments above in this email is just to try to find a method to judge if 2 or more functions belongs to the same device. If it's not right, it still doesn't hurt the new API pci_error_handlers. > > ------ > It's not clear to me what hardware may show up in the future. > For example, someone may build a 32x PCI-E card that will act > as a bridge to a drawer with half-a-dozen ordinary PCI-X slots > in it. This is perhaps a bit hypothetical, but changing the API > will make it harder to implement eror recovery for such a system. I agree that it's difficult to predict the future. At least the new API could process the example. > FWIW, there is at least one pSeries system in the lab which has > several hundred PCI slots attached to it, although I've never > done testing on it. Hmm. Maybe its time I did ... > > --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-04 5:47 ` Zhang, Yanmin 2006-09-04 9:03 ` Benjamin Herrenschmidt @ 2006-09-05 19:17 ` Linas Vepstas 2006-09-06 2:04 ` Zhang, Yanmin 1 sibling, 1 reply; 21+ messages in thread From: Linas Vepstas @ 2006-09-05 19:17 UTC (permalink / raw) To: Zhang, Yanmin Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Mon, Sep 04, 2006 at 01:47:30PM +0800, Zhang, Yanmin wrote: > > > > Again, consider the multi-function cards. On pSeries, I can only enable > > DMA on a per-slot basis, not a per-function basis. So if one driver > > enables DMA before some other driver has reset appropriately, everything > > breaks. > Does here 'reset' mean hardware slot reset? I should have said: If one driver of a multi-function card enables DMA before another driver has stabilized its harware, then everything breaks. > Then, if the slot is always reset, there will be no the problem. But that assumes that a hardware #RST will always be done. The API was designed to get away from this requirement. > If mmio_enabled is not used currently, I think we could delete it firstly. Later on, > if a platform really need it, we could add it, so we could keep the simplied codes. It would be very difficult to add it later. And it would be especially silly, given that someone would find this discussion in the mailing list archives. > Thanks. Now I understand why you specified mmio_enabled and slot_reset. They are just > to map to pSeries platform hardware operation steps. I know little about pSeries hardware, The hardware was designed that way because the hardware engineers thought that this is what the device driver writers would need. Thay are there to map to actual recovery steps that actual device drivers might want to do. > but is it possible to merge such hardware steps from software point of view? The previous email explained why this would be a bad idea. > > The platform. By "electrical reset", I mean "dropping the #RST pin low > > for 200mS". Only the platform can do this. > Thanks for your explanation. I assume after the electrical reset, all device > functions of the device slot will go back to the initial status before > attaching their drivers. Maybe. Depends on what yur BIOS does. On pSeries, I also need to set up the adress BARs > I found a problem of e1000 driver when testing its error handlers. After the NIC is resumed, > its RX/TX packets numbers are crazy. Hmm. There is a patch to prevent this from happening. I thought it was applied a long time ago. e1000_update_stats() should include the lines: if (pdev->error_state && pdev->error_state != pci_channel_io_normal) return; which is enough to prevent crazy stats on my machine. --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-05 19:17 ` Linas Vepstas @ 2006-09-06 2:04 ` Zhang, Yanmin 2006-09-06 20:39 ` Linas Vepstas 0 siblings, 1 reply; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-06 2:04 UTC (permalink / raw) To: Linas Vepstas Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Wed, 2006-09-06 at 03:17, Linas Vepstas wrote: > On Mon, Sep 04, 2006 at 01:47:30PM +0800, Zhang, Yanmin wrote: > > > > > > Again, consider the multi-function cards. On pSeries, I can only enable > > > DMA on a per-slot basis, not a per-function basis. So if one driver > > > enables DMA before some other driver has reset appropriately, everything > > > breaks. > > Does here 'reset' mean hardware slot reset? > > I should have said: If one driver of a multi-function card enables DMA before > another driver has stabilized its harware, then everything breaks. What's another driver's hardware? A function of the previous multi-function card? Or a function of another device? Ok. now, I copy what you said before below for more discussion. > If we enabled both DMA and MMIO at the same time, there are mnay cases > where the card will immediately trap again -- for example, if its > DMA'ing to some crazy address. Thus, typically, one wants DMA disabled > until after the card reset. Withouth the mmio_enabled() reset, there > is no way of doing this. Did you asume the card reset is executed by callback mmio_enabled? > Again, consider the multi-function cards. On pSeries, I can only enable > DMA on a er-slot basis, not a per-function basis. So if one driver > enables DMA before some other driver has reset appropriately, everything > breaks. What does 'I' above stand for? The platform error recovery procedure or the error callbacks of drivers? I guess it means platform, that is, only platform enables DMA for the whole slot. But why does the last sentence become driver enables DMA? As you know, driver binds device function instead of slot. Could driver enable DMA for a function? > > > Then, if the slot is always reset, there will be no the problem. > > But that assumes that a hardware #RST will always be done. The API > was designed to get away from this requirement. > > > If mmio_enabled is not used currently, I think we could delete it firstly. Later on, > > if a platform really need it, we could add it, so we could keep the simplied codes. > > It would be very difficult to add it later. And it would be especially > silly, given that someone would find this discussion in the mailing list > archives. You stick to keep mmio_enabled which is not used currently, but if there will be a new platform who uses a more fine-grained steps to recover pci/pci-e, would you say 'it would be very difficut' and refuse add new callbacks? > > > Thanks. Now I understand why you specified mmio_enabled and slot_reset. They are just > > to map to pSeries platform hardware operation steps. I know little about pSeries hardware, > > The hardware was designed that way because the hardware engineers > thought that this is what the device driver writers would need. > Thay are there to map to actual recovery steps that actual device > drivers might want to do. It doesn't prevent software from merging some steps. And, we want to implement pci/pci-e error recovery for more platforms instead of just pSeries. > > > but is it possible to merge such hardware steps from software point of view? > > The previous email explained why this would be a bad idea. Obviously, such conclusion is too early. > > > > The platform. By "electrical reset", I mean "dropping the #RST pin low > > > for 200mS". Only the platform can do this. > > Thanks for your explanation. I assume after the electrical reset, all device > > functions of the device slot will go back to the initial status before > > attaching their drivers. > > Maybe. Depends on what yur BIOS does. On pSeries, I also need to > set up the adress BARs > > > I found a problem of e1000 driver when testing its error handlers. After the NIC is resumed, > > its RX/TX packets numbers are crazy. > > Hmm. There is a patch to prevent this from happening. I thought > it was applied a long time ago. e1000_update_stats() should include the > lines: > > if (pdev->error_state && pdev->error_state != pci_channel_io_normal) > return; > > which is enough to prevent crazy stats on my machine. Thanks a lot! Yanmin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-06 2:04 ` Zhang, Yanmin @ 2006-09-06 20:39 ` Linas Vepstas 2006-09-07 3:18 ` Zhang, Yanmin 0 siblings, 1 reply; 21+ messages in thread From: Linas Vepstas @ 2006-09-06 20:39 UTC (permalink / raw) To: Zhang, Yanmin Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Wed, Sep 06, 2006 at 10:04:31AM +0800, Zhang, Yanmin wrote: > On Wed, 2006-09-06 at 03:17, Linas Vepstas wrote: > > On Mon, Sep 04, 2006 at 01:47:30PM +0800, Zhang, Yanmin wrote: > > > > > > > > Again, consider the multi-function cards. On pSeries, I can only enable > > > > DMA on a per-slot basis, not a per-function basis. So if one driver > > > > enables DMA before some other driver has reset appropriately, everything > > > > breaks. > > > Does here 'reset' mean hardware slot reset? > > > > I should have said: If one driver of a multi-function card enables DMA before > > another driver has stabilized its harware, then everything breaks. > What's another driver's hardware? A function of the previous multi-function > card? Or a function of another device? Yes. Either. Both. Doesn't matter. Enabling DMA is "granular" at a different size scale than pci functions, and possibly even pci devices or slots, dependeing on the architecture. Before DMA can be enabled, *all* affected device drivers have to be approve, and have to be ready for it. > > If we enabled both DMA and MMIO at the same time, there are many cases > > where the card will immediately trap again -- for example, if its > > DMA'ing to some crazy address. Thus, typically, one wants DMA disabled > > until after the card reset. Without the mmio_enabled() reset, there > > is no way of doing this. > > Did you asume the card reset is executed by callback mmio_enabled? I am assuming that, when a driver receives the mmio_enabled() callback, it will perform some sort of register i/o. For example, I am currently planning to modify the e1000 driver to do the following: -- The error_occurred() callback returns PCI_ERS_RESULT_CAN_RECOVER -- The arch enables mmio, and then calls the mmio_enabled() callback. -- The mmio_enabled() callback in the driver takes a full dump of all of the regsters on the card. It then returns PCI_ERS_RESULT_NEED_RESET -- The arch performs the full electrical #RST of device. Recovery from this point proceeds as before. > > Again, consider the multi-function cards. On pSeries, I can only enable > > DMA on a per-slot basis, not a per-function basis. So if one driver > > enables DMA before some other driver has reset appropriately, everything > > breaks. > > What does 'I' above stand for? The platform error recovery procedure Yes. The pSeries platform error recovery procedure can only enable DMA on a per-slot basis. > I guess it means platform, that is, > only platform enables DMA for the whole slot. Yes. > But why does the last sentence > become driver enables DMA? In your proposal, you were suggesting that MMIO and DMA be enabled with one and the same routine, and I was attempting to explain why that can't work. > Could driver enable DMA for a function? No, not on pSeries hardware. > > > If mmio_enabled is not used currently, I think we could delete it firstly. Later on, > > > if a platform really need it, we could add it, so we could keep the simplied codes. > > > > It would be very difficult to add it later. And it would be especially > > silly, given that someone would find this discussion in the mailing list > > archives. > You stick to keep mmio_enabled which is not used currently, but if there will be > a new platform who uses a more fine-grained steps to recover pci/pci-e, would > you say 'it would be very difficut' and refuse add new callbacks? Yes. > It doesn't prevent software from merging some steps. And, we want > to implement pci/pci-e error recovery for more platforms instead of just > pSeries. Yes. The API was designed so that it could be supported on every current and future platform we could think of. You haven't yet claimed that "pci-e can't be supported". Based on what I understand, changing the API wouldn't make the implementation any easier. (It is very easy to call a callback, and then examine its return value. Removing a few callbacks does not materially simplify the recovery mechanism. Managing these callbacks is *not* the hard part of implementing this thing.) --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-06 20:39 ` Linas Vepstas @ 2006-09-07 3:18 ` Zhang, Yanmin 2006-09-12 19:38 ` Linas Vepstas 0 siblings, 1 reply; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-07 3:18 UTC (permalink / raw) To: Linas Vepstas Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Thu, 2006-09-07 at 04:39, Linas Vepstas wrote: > On Wed, Sep 06, 2006 at 10:04:31AM +0800, Zhang, Yanmin wrote: > > On Wed, 2006-09-06 at 03:17, Linas Vepstas wrote: > > > On Mon, Sep 04, 2006 at 01:47:30PM +0800, Zhang, Yanmin wrote: > > > > > > > > > > Again, consider the multi-function cards. On pSeries, I can only enable > > > > > DMA on a per-slot basis, not a per-function basis. So if one driver > > > > > enables DMA before some other driver has reset appropriately, everything > > > > > breaks. > > > > Does here 'reset' mean hardware slot reset? > > > > > > I should have said: If one driver of a multi-function card enables DMA before > > > another driver has stabilized its harware, then everything breaks. > > What's another driver's hardware? A function of the previous multi-function > > card? Or a function of another device? > > Yes. Either. Both. Doesn't matter. Enabling DMA is "granular" at a > different size scale than pci functions, and possibly even pci devices > or slots, dependeing on the architecture. Before DMA can be enabled, > *all* affected device drivers have to be approve, and have to be ready > for it. > > > If we enabled both DMA and MMIO at the same time, there are many cases > > > where the card will immediately trap again -- for example, if its > > > DMA'ing to some crazy address. Thus, typically, one wants DMA disabled > > > until after the card reset. Without the mmio_enabled() reset, there > > > is no way of doing this. > > > > Did you asume the card reset is executed by callback mmio_enabled? > > I am assuming that, when a driver receives the mmio_enabled() callback, > it will perform some sort of register i/o. For example, I am currently > planning to modify the e1000 driver to do the following: > > -- The error_occurred() callback returns PCI_ERS_RESULT_CAN_RECOVER > -- The arch enables mmio, and then calls the mmio_enabled() callback. > -- The mmio_enabled() callback in the driver takes a full dump of all > of the regsters on the card. It then returns PCI_ERS_RESULT_NEED_RESET Such dumping are random data and might be useless. The error recovery procedures are to process pci hardware errors instead of device driver bug. > -- The arch performs the full electrical #RST of device. Recovery from > this point proceeds as before. The steps are exquisite. Scenario: The e1000 NIC and another device (maybe a function) are on the same bus. The error_detected of the second device returns PCI_ERS_RESULT_NEED_RESET, so although error_detected of e1000 returns PCI_ERS_RESULT_CAN_RECOVER, the slot will be reset immediately, then error recovery will go to call slot_reset callback directly. The mmio_enabled is not called. My above scenario is just to say something is easy to be out of control if the steps are complicated. > > > > Again, consider the multi-function cards. On pSeries, I can only enable > > > DMA on a per-slot basis, not a per-function basis. So if one driver > > > enables DMA before some other driver has reset appropriately, everything > > > breaks. > > > > What does 'I' above stand for? The platform error recovery procedure > > Yes. The pSeries platform error recovery procedure can only enable DMA > on a per-slot basis. > > > I guess it means platform, that is, > > only platform enables DMA for the whole slot. > > Yes. > > > But why does the last sentence > > become driver enables DMA? > > In your proposal, you were suggesting that MMIO and DMA be enabled with > one and the same routine, and I was attempting to explain why that can't > work. Thanks for your explanations. My point is that if driver could enable DMA, it could do so in the new error_resume. Driver should do more checking before enabling DMA. Your scenario only exists when: 1) Only platform could enable DMA and enable it per-slot instead of per-function. 2) And at least one device doesn't want a hard slot reset to recover while all other impacted devices also don't want a hard slot; Because if one device want a hard reset, mmio_enabled of all impacted drivers won't be called. 3) And at least one device's DMA is crazy. If using my new API, I just need destroy one condition above. My requirement is: Only if a device uses DMA and the driver is not sure or sure if DMA is pending, its error_detected should return PCI_ERS_RESULT_NEED_RESET. Otherwise, error_detected is allowed to return whatever. > > > Could driver enable DMA for a function? > > No, not on pSeries hardware. > > > > > If mmio_enabled is not used currently, I think we could delete it firstly. Later on, > > > > if a platform really need it, we could add it, so we could keep the simplied codes. > > > > > > It would be very difficult to add it later. And it would be especially > > > silly, given that someone would find this discussion in the mailing list > > > archives. > > You stick to keep mmio_enabled which is not used currently, but if there will be > > a new platform who uses a more fine-grained steps to recover pci/pci-e, would > > you say 'it would be very difficut' and refuse add new callbacks? > > Yes. It's not fare to such other platforms although I have no such example now. > > > It doesn't prevent software from merging some steps. And, we want > > to implement pci/pci-e error recovery for more platforms instead of just > > pSeries. > > Yes. The API was designed so that it could be supported on every > current and future platform we could think of. You haven't yet > claimed that "pci-e can't be supported". Current error handler infrastructure could support pci-e, but I want a better solution to faciliate driver developers to add error handlers more easily. My startpoint is driver developer. If they are not willing to add error handlers, it's impossible to do so for all drivers by you and me. > Based on what > I understand, changing the API wouldn't make the implementation > any easier. (It is very easy to call a callback, and then > examine its return value. It's not easy. Just like above scenario, mmio_enabled might be jumped over when coordinating 2 more devices. Checking current e100/e1000/ipr error handlers, they look ugly. > Removing a few callbacks does not > materially simplify the recovery mechanism. Managing these > callbacks is *not* the hard part of implementing this thing.) Above comments are totally from error recovery design point of view. No considering for driver developers. BTW, most discussion is about if mmio_enabled should be deleted. As for merging slot_reset and resume, my reason is that there is no platform specific operation between calling slot_reset and resume. Yanmin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-07 3:18 ` Zhang, Yanmin @ 2006-09-12 19:38 ` Linas Vepstas 0 siblings, 0 replies; 21+ messages in thread From: Linas Vepstas @ 2006-09-12 19:38 UTC (permalink / raw) To: Zhang, Yanmin Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Thu, Sep 07, 2006 at 11:18:56AM +0800, Zhang, Yanmin wrote: > The error recovery procedures > are to process pci hardware errors instead of device driver bug. Over the last three years, we've uncovered (and fixed) dozens of device driver bugs that were only detected because of the pci error detection hardware. The ability to get device dumps is important, because many of these bugs are hard to reproduce, require getting PCI bus analyzers attached to the system, etc. > Current error handler infrastructure could support pci-e, but I want a better > solution to faciliate driver developers to add error handlers more easily. My > startpoint is driver developer. If they are not willing to add error handlers, > it's impossible to do so for all drivers by you and me. Right. As a result, we only care about the products that we actually sell to customers. PCI error recovery is not some "gee its nice" piece of eye-candy or chrome: either one is serious about high-availability, or one is not. --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-08-31 17:50 ` pci error recovery procedure Linas Vepstas 2006-09-01 3:33 ` Zhang, Yanmin @ 2006-09-01 3:42 ` Zhang, Yanmin 2006-09-01 9:04 ` Zhang, Yanmin 1 sibling, 1 reply; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-01 3:42 UTC (permalink / raw) To: Linas Vepstas Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Fri, 2006-09-01 at 01:50, Linas Vepstas wrote: > On Thu, Aug 31, 2006 at 03:10:12PM +0800, Zhang, Yanmin wrote: > > Linas, > > > > I am reviewing the error handlers of e1000 driver and got some ideas. My > > startpoint is to simplify the err handler implementations for drivers, or > > driver developers are *not willing* to add it if it's too complicated. > > I don't see that its to complicated ... Originally, I didn't think so, but after I try to add err_handlers to some drivers, I feel it's too complicated. > > > 1) Callback mmio_enabled looks useless. Documentation/pci-error-recovery.txt > > says the current powerpc implementation does not implement this callback. > > I don't know if its useless or not. I have not needed it yet for the > symbios, ipr and e1000 drivers, but its possible that some more > sophisticated device may want it. I'm tempted to keep it a while > longer befoe discarding it. > > The scenario is this: the device driver decides that, rather than asking > for a full electical reset of the card, instead, it wants to perform > its own recovery. It can do this as follows: > > a) enable MMIO > b) issue reset command to adapter > c) enable DMA. > > If we enabled both DMA and MMIO at the same time, there are mnay cases > where the card will immediately trap again -- for example, if its > DMA'ing to some crazy address. Thus, typically, one wants DMA disabled > until after the card reset. Withouth the mmio_enabled() reset, there > is no way of doing this. The new error_resume, or the old slot_reset could take care of it. The specific device driver knows all the details about how to initiate the devices. The error_resume could call the step a) b) c) sequencially while doing checking among steps. If there is really a device having specific requirement to reinitiate it (very rarely), it could use walkaround, such like schedule a WORKER. No need to provide a generic mmio_enabled. > > > 2) Callback slot_reset could be merged with resume. The new resume could be: > > int (*error_resume)(struct pci_dev *dev); I checked e1000 and e100 drivers and > > think there is no actual reason to have both slot_reset and resume. > > The idea here was to handle multi-function cards. On a multi-function card, > *all* devices need to indicate that they were able to reset. Once all devices > have been successfuly reset, then operation can be resumed. If the reset > of one function fails, then operation is not resumed for any f the > functions. I don't think we need slot_reset to coordinate multi-function devices. The new error_resume could take care of multi-function card. 'reset' here means driver need do I/O to detect if the device (function) still works well. If a function of a multi-function device couldn't reset while other functions could reset, other functions could just go on to reinitiate. In the end, the error recovery procedure (handle_eeh_events in PowerPC implementation) could check all the returning values of error_resume. If there is a failure value, then removes all the functions' pci_dev of the device from the bus. > > > 3) link_reset is not used in pci express aer implementation, so it could be > > deleted also. > > OK. Link reset was added explicitly to support PCI-E, so if its not wanted, > we can eliminate it. > > > How did you test e1000 err_handler? > > We have three methods (I thought these were documented). In one, a > technician brushes a grounding strap to some of the signal pins. > In the second, slots are populated with known-bad cards. The third test > involes sending a command down to the pci bridge chip, telling it to > behave as if it detected an error. For development, the last is > quick-n-easy. Thanks for your explanation. > > > In the simulated enviroment, the testing might be > > incorrect. > > Why would it be incorrect? I mean, we don't simulate having someone pour a > cup of coffee into the guts of the machine ... but my understanding is > the machines do get standard vibration/thermal/humidity testing, which > is good enough for me. > > > For example, e1000_io_error_detected would call e1000_down to reset NIC. > > Why would that be incorrect? > > > During > > our last discussion on LKML, you said PowerPC will block further I/O if the platform captures > > a pci error, so the all I/O in e1000_down will be blocked. Later on, e1000_io_slot_reset > > will reenable pci device and initiate NIC. I guess late initiate might fail because prior > > e1000_down I/O don't reach NIC. > > Why would it fail? The e1000_down serves primarily to get the Linux > kernel into a known state. It doesn't matter what happens to the card, > since the next step will be to perform an electrical reset of the card. Who will perform the electrical reset of the card? Function e1000_reset or the platform? If it's the platform, I agree with you, but if it's e1000_reset, it might not work because e1000_reset uses a e1000-specific approach to reset the card. I'm not sure if the e1000_reset will restore the NIC to fresh system power-on state. At least, from the source codes, e1000_reset couldn't. Yanmin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-01 3:42 ` Zhang, Yanmin @ 2006-09-01 9:04 ` Zhang, Yanmin 2006-09-01 21:32 ` Linas Vepstas 0 siblings, 1 reply; 21+ messages in thread From: Zhang, Yanmin @ 2006-09-01 9:04 UTC (permalink / raw) To: Linas Vepstas Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Fri, 2006-09-01 at 11:42, Zhang, Yanmin wrote: > On Fri, 2006-09-01 at 01:50, Linas Vepstas wrote: > > On Thu, Aug 31, 2006 at 03:10:12PM +0800, Zhang, Yanmin wrote: > > > Linas, > > > > > > I am reviewing the error handlers of e1000 driver and got some ideas. My > > > startpoint is to simplify the err handler implementations for drivers, or > > > driver developers are *not willing* to add it if it's too complicated. > > > > I don't see that its to complicated ... > Originally, I didn't think so, but after I try to add err_handlers to some > drivers, I feel it's too complicated. > > > > > > 1) Callback mmio_enabled looks useless. Documentation/pci-error-recovery.txt > > > says the current powerpc implementation does not implement this callback. > > > > I don't know if its useless or not. I have not needed it yet for the > > symbios, ipr and e1000 drivers, but its possible that some more > > sophisticated device may want it. I'm tempted to keep it a while > > longer befoe discarding it. > > > > The scenario is this: the device driver decides that, rather than asking > > for a full electical reset of the card, instead, it wants to perform > > its own recovery. It can do this as follows: > > > > a) enable MMIO > > b) issue reset command to adapter > > c) enable DMA. > > > > If we enabled both DMA and MMIO at the same time, there are mnay cases > > where the card will immediately trap again -- for example, if its > > DMA'ing to some crazy address. Thus, typically, one wants DMA disabled > > until after the card reset. Withouth the mmio_enabled() reset, there > > is no way of doing this. > The new error_resume, or the old slot_reset could take care of it. The specific > device driver knows all the details about how to initiate the devices. The > error_resume could call the step a) b) c) sequencially while doing checking among > steps. > > If there is really a device having specific requirement to reinitiate it (very rarely), > it could use walkaround, such like schedule a WORKER. No need to provide a generic > mmio_enabled. > > > > > > 2) Callback slot_reset could be merged with resume. The new resume could be: > > > int (*error_resume)(struct pci_dev *dev); I checked e1000 and e100 drivers and > > > think there is no actual reason to have both slot_reset and resume. > > > > The idea here was to handle multi-function cards. On a multi-function card, > > *all* devices need to indicate that they were able to reset. Once all devices > > have been successfuly reset, then operation can be resumed. If the reset > > of one function fails, then operation is not resumed for any f the > > functions. > I don't think we need slot_reset to coordinate multi-function devices. The new > error_resume could take care of multi-function card. 'reset' here means driver > need do I/O to detect if the device (function) still works well. If a function > of a multi-function device couldn't reset while other functions could reset, > other functions could just go on to reinitiate. In the end, the error recovery > procedure (handle_eeh_events in PowerPC implementation) could check all the > returning values of error_resume. If there is a failure value, then removes > all the functions' pci_dev of the device from the bus. > > > > > > 3) link_reset is not used in pci express aer implementation, so it could be > > > deleted also. > > > > OK. Link reset was added explicitly to support PCI-E, so if its not wanted, > > we can eliminate it. > > > > > How did you test e1000 err_handler? > > > > We have three methods (I thought these were documented). In one, a > > technician brushes a grounding strap to some of the signal pins. > > In the second, slots are populated with known-bad cards. The third test > > involes sending a command down to the pci bridge chip, telling it to > > behave as if it detected an error. For development, the last is > > quick-n-easy. > Thanks for your explanation. > > > > > > In the simulated enviroment, the testing might be > > > incorrect. > > > > Why would it be incorrect? I mean, we don't simulate having someone pour a > > cup of coffee into the guts of the machine ... but my understanding is > > the machines do get standard vibration/thermal/humidity testing, which > > is good enough for me. > > > > > For example, e1000_io_error_detected would call e1000_down to reset NIC. > > > > Why would that be incorrect? > > > > > During > > > our last discussion on LKML, you said PowerPC will block further I/O if the platform captures > > > a pci error, so the all I/O in e1000_down will be blocked. Later on, e1000_io_slot_reset > > > will reenable pci device and initiate NIC. I guess late initiate might fail because prior > > > e1000_down I/O don't reach NIC. > > > > Why would it fail? The e1000_down serves primarily to get the Linux > > kernel into a known state. It doesn't matter what happens to the card, > > since the next step will be to perform an electrical reset of the card. > Who will perform the electrical reset of the card? Function e1000_reset or the platform? > If it's the platform, I agree with you, but if it's e1000_reset, it might not work because > e1000_reset uses a e1000-specific approach to reset the card. I'm not sure if the e1000_reset > will restore the NIC to fresh system power-on state. At least, from the source codes, e1000_reset > couldn't. One more comment: The second parameter of error_detected also could be deleted because recovery procedures will save error to pci_dev->error_state. So, the err_handler pci_error_handlers could be: struct pci_error_handlers { pci_ers_result_t (*error_detected)(struct pci_dev *dev); pci_ers_result_t (*error_resume)(struct pci_dev *dev); }; Yanmin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: pci error recovery procedure 2006-09-01 9:04 ` Zhang, Yanmin @ 2006-09-01 21:32 ` Linas Vepstas 0 siblings, 0 replies; 21+ messages in thread From: Linas Vepstas @ 2006-09-01 21:32 UTC (permalink / raw) To: Zhang, Yanmin Cc: linuxppc-dev, linux-pci maillist, Yanmin Zhang, LKML, Rajesh Shah On Fri, Sep 01, 2006 at 05:04:09PM +0800, Zhang, Yanmin wrote: > One more comment: The second parameter of error_detected also could be deleted > because recovery procedures will save error to pci_dev->error_state. Yes, I beleive so. > So, the err_handler pci_error_handlers could be: > struct pci_error_handlers > { > pci_ers_result_t (*error_detected)(struct pci_dev *dev); > pci_ers_result_t (*error_resume)(struct pci_dev *dev); > }; No, as per other email, we still need a multi-step process for multi-function cards, and for cards that may not want to get a full electrical reset. Finally, there might be platforms that cannot perform a per-slot electrical reset, and would therefore require drivers that can recover on thier own. --linas ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2006-09-12 19:39 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1157008212.20092.36.camel@ymzhang-perf.sh.intel.com>
2006-08-31 17:50 ` pci error recovery procedure Linas Vepstas
2006-09-01 3:33 ` Zhang, Yanmin
2006-09-01 21:25 ` Linas Vepstas
2006-09-04 5:47 ` Zhang, Yanmin
2006-09-04 9:03 ` Benjamin Herrenschmidt
2006-09-05 2:32 ` Zhang, Yanmin
2006-09-05 19:01 ` Linas Vepstas
2006-09-06 1:26 ` Zhang, Yanmin
2006-09-06 20:01 ` Linas Vepstas
2006-09-07 1:56 ` Zhang, Yanmin
2006-09-05 18:50 ` Linas Vepstas
2006-09-05 21:19 ` Benjamin Herrenschmidt
2006-09-06 1:35 ` Zhang, Yanmin
2006-09-05 19:17 ` Linas Vepstas
2006-09-06 2:04 ` Zhang, Yanmin
2006-09-06 20:39 ` Linas Vepstas
2006-09-07 3:18 ` Zhang, Yanmin
2006-09-12 19:38 ` Linas Vepstas
2006-09-01 3:42 ` Zhang, Yanmin
2006-09-01 9:04 ` Zhang, Yanmin
2006-09-01 21:32 ` Linas Vepstas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).