* PCIe endpoint crosstalk
From: Ludwig Petrosyan @ 2013-08-27 7:53 UTC
To: linux-pci

Hi all,

My name is Ludwig Petrosyan. I am developing PCIe endpoint drivers
under Ubuntu. I have run into a problem that I think is at the pciport
driver level. I will try to describe the problem; maybe somebody can
help me.

So: I use a microTCA system with a PCIe bus. There are two AMC cards
(PCIe endpoints), let's call them card A and card B, and there are
two device drivers, one for A and one for B. Card B has a bug: after
a PCIe memory write operation (MWr) the card sends back a Completion
packet without data (Cpl). (I know it is wrong, but the card was
designed this way and has to be changed.)

User process Ua reads data from card A in a loop and everything is
OK, but when I start a second user process Ub, which writes data to
card B (the bugged card) in a loop, Ua gets wrong data. After fixing
card B the problem was solved, but maybe this has to be checked at
the PCIe driver level as well.

With best regards,
Ludwig Petrosyan

* Re: PCIe endpoint crosstalk
From: Bjorn Helgaas @ 2013-08-27 16:27 UTC
To: Ludwig Petrosyan; +Cc: linux-pci

On Tue, Aug 27, 2013 at 09:53:35AM +0200, Ludwig Petrosyan wrote:
> So: I use a microTCA system with a PCIe bus. There are two AMC cards
> (PCIe endpoints), let's call them card A and card B, and there are
> two device drivers, one for A and one for B. Card B has a bug: after
> a PCIe memory write operation (MWr) the card sends back a Completion
> packet without data (Cpl). (I know it is wrong, but the card was
> designed this way and has to be changed.)
> User process Ua reads data from card A in a loop and everything is
> OK, but when I start a second user process Ub, which writes data to
> card B (the bugged card) in a loop, Ua gets wrong data. After fixing
> card B the problem was solved, but maybe this has to be checked at
> the PCIe driver level as well.

PCIe transactions (MWr, MRd, Cpl, etc.) are not directly visible
to the OS or the driver.

The only thing I can think of that we could do is add a quirk to
blacklist the broken version of card B. You can look at existing
quirks in drivers/pci/quirks.c. Most of them work around issues
that aren't quite as severe as this one, but we could probably
figure out a way to make the device completely unusable.

Or do you have something else in mind?

Bjorn

* Re: PCIe endpoint crosstalk
From: Ludwig Petrosyan @ 2013-08-28 8:09 UTC
To: Bjorn Helgaas; +Cc: linux-pci

On 08/27/2013 06:27 PM, Bjorn Helgaas wrote:
> On Tue, Aug 27, 2013 at 09:53:35AM +0200, Ludwig Petrosyan wrote:
>> So: I use a microTCA system with a PCIe bus. There are two AMC cards
>> (PCIe endpoints), let's call them card A and card B, and there are
>> two device drivers, one for A and one for B. Card B has a bug: after
>> a PCIe memory write operation (MWr) the card sends back a Completion
>> packet without data (Cpl). (I know it is wrong, but the card was
>> designed this way and has to be changed.)
>> User process Ua reads data from card A in a loop and everything is
>> OK, but when I start a second user process Ub, which writes data to
>> card B (the bugged card) in a loop, Ua gets wrong data. After fixing
>> card B the problem was solved, but maybe this has to be checked at
>> the PCIe driver level as well.
> PCIe transactions (MWr, MRd, Cpl, etc.) are not directly visible
> to the OS or the driver.
>
> The only thing I can think of that we could do is add a quirk to
> blacklist the broken version of card B. You can look at existing
> quirks in drivers/pci/quirks.c. Most of them work around issues
> that aren't quite as severe as this one, but we could probably
> figure out a way to make the device completely unusable.
>
> Or do you have something else in mind?
>
> Bjorn

We have fixed the bug in card B and now it is OK, but the question
remains open: what will happen if we get some other PCIe endpoint card
with the same bug? Read operations from other PCIe devices could be
broken. I just think this problem should be solved at the OS level
(I am not sure).

I will try to explain what I think is going on:

User process Ub sends a Memory Write request to card B. This is a
Posted request, so right after sending the request Ub forgets about
it. The TLP of this packet contains the Requester ID of the Root
Complex. At the same time user process Ua (the Root Complex is free
now) sends a non-Posted Memory Read request to card A and waits for
the Completion packet. But at the same time card B (the bugged card;
it should not send a Completion for a Posted memory write request)
sends the Root Complex a Completion packet without data, and somehow
Ua gets this as the result of its Memory Read request. It seems the
Completer ID (or Tag field) in the Completion packet is not checked,
and a completion from one PCIe endpoint is returned as the completion
of a read request to another PCIe endpoint.

I want to say this is only an assumption; I just want to be sure that
a bugged PCIe device won't influence the operation of other devices.
But it could be that this problem has to be solved on the PCIe switch
or Root Complex side, not on the OS side...

With best regards,
Ludwig

* Re: PCIe endpoint crosstalk
From: Bjorn Helgaas @ 2013-08-28 12:43 UTC
To: Ludwig Petrosyan; +Cc: linux-pci@vger.kernel.org

On Wed, Aug 28, 2013 at 2:09 AM, Ludwig Petrosyan
<ludwig.petrosyan@desy.de> wrote:
> On 08/27/2013 06:27 PM, Bjorn Helgaas wrote:
>> On Tue, Aug 27, 2013 at 09:53:35AM +0200, Ludwig Petrosyan wrote:
>>> So: I use a microTCA system with a PCIe bus. There are two AMC cards
>>> (PCIe endpoints), let's call them card A and card B, and there are
>>> two device drivers, one for A and one for B. Card B has a bug: after
>>> a PCIe memory write operation (MWr) the card sends back a Completion
>>> packet without data (Cpl). (I know it is wrong, but the card was
>>> designed this way and has to be changed.)
>>> User process Ua reads data from card A in a loop and everything is
>>> OK, but when I start a second user process Ub, which writes data to
>>> card B (the bugged card) in a loop, Ua gets wrong data. After fixing
>>> card B the problem was solved, but maybe this has to be checked at
>>> the PCIe driver level as well.
>> PCIe transactions (MWr, MRd, Cpl, etc.) are not directly visible
>> to the OS or the driver.
>>
>> The only thing I can think of that we could do is add a quirk to
>> blacklist the broken version of card B. You can look at existing
>> quirks in drivers/pci/quirks.c. Most of them work around issues
>> that aren't quite as severe as this one, but we could probably
>> figure out a way to make the device completely unusable.
>>
>> Or do you have something else in mind?
>>
>> Bjorn
> We have fixed the bug in card B and now it is OK, but the question
> remains open: what will happen if we get some other PCIe endpoint card
> with the same bug? Read operations from other PCIe devices could be
> broken. I just think this problem should be solved at the OS level
> (I am not sure).
>
> I will try to explain what I think is going on:
>
> User process Ub sends a Memory Write request to card B. This is a
> Posted request, so right after sending the request Ub forgets about
> it. The TLP of this packet contains the Requester ID of the Root
> Complex. At the same time user process Ua (the Root Complex is free
> now) sends a non-Posted Memory Read request to card A and waits for
> the Completion packet. But at the same time card B (the bugged card;
> it should not send a Completion for a Posted memory write request)
> sends the Root Complex a Completion packet without data, and somehow
> Ua gets this as the result of its Memory Read request. It seems the
> Completer ID (or Tag field) in the Completion packet is not checked,
> and a completion from one PCIe endpoint is returned as the completion
> of a read request to another PCIe endpoint.
>
> I want to say this is only an assumption; I just want to be sure that
> a bugged PCIe device won't influence the operation of other devices.
> But it could be that this problem has to be solved on the PCIe switch
> or Root Complex side, not on the OS side...

Yes. I can't conceive of a way for the OS to deal with this problem.
The only thing I can think of is to disable card B altogether.

Bjorn
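
The failure mode Ludwig describes can be sketched as a small model. The
Python below is only an illustration of his hypothesis, not a description
of any real Root Complex: a requester tracks outstanding reads by Tag,
matches incoming completions on Requester ID and Tag, and counts anything
unmatched as unexpected. All class names, IDs, and tag values are
hypothetical, and the `check_completer` option is an assumed extra check
added for comparison; nothing in the thread says real hardware does or
does not perform it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Completion:
    """Simplified completion TLP; field names are illustrative only."""
    requester_id: int              # ID of the device that issued the request
    tag: int                       # tag copied from the request
    completer_id: int              # ID of the device sending the completion
    data: Optional[bytes] = None   # None models a Cpl without data


@dataclass
class Requester:
    """Toy model of a requester that tracks outstanding non-posted reads."""
    requester_id: int
    check_completer: bool = False  # hypothetical extra check (assumption)
    outstanding: Dict[int, int] = field(default_factory=dict)  # tag -> target
    unexpected: int = 0

    def issue_read(self, tag: int, target_id: int) -> None:
        # Remember which device the read with this tag was sent to.
        self.outstanding[tag] = target_id

    def receive(self, cpl: Completion) -> str:
        expected = self.outstanding.get(cpl.tag)
        matched = cpl.requester_id == self.requester_id and expected is not None
        if matched and self.check_completer and cpl.completer_id != expected:
            matched = False
        if not matched:
            self.unexpected += 1
            return "unexpected"
        del self.outstanding[cpl.tag]
        return "matched"


def run_scenario(check_completer: bool):
    """Card B sends a stray Cpl whose tag collides with a read to card A."""
    ROOT, CARD_A, CARD_B = 0x0000, 0x0100, 0x0200   # made-up IDs
    rc = Requester(ROOT, check_completer=check_completer)
    rc.issue_read(tag=5, target_id=CARD_A)
    stray = rc.receive(Completion(ROOT, 5, CARD_B, None))
    real = rc.receive(Completion(ROOT, 5, CARD_A, b"\x12\x34"))
    return stray, real


if __name__ == "__main__":
    print(run_scenario(check_completer=False))
    print(run_scenario(check_completer=True))
```

In this model, matching on Requester ID and Tag alone lets a stray
completion whose tag collides with an outstanding read complete that
read with no data, and the genuine completion from card A then has
nothing left to match. Whether real hardware behaves this way depends on
the Root Complex or switch, which fits Bjorn's point that the OS does
not see individual transactions.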