linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/1] Add AtomicOp Requester support
@ 2015-08-19 21:10 Jay Cornwall
  2015-08-19 21:10 ` [PATCH RFC 1/1] PCI: Add pci_enable_atomic_request Jay Cornwall
  2015-09-14 19:58 ` [PATCH RFC 0/1] Add AtomicOp Requester support Bjorn Helgaas
  0 siblings, 2 replies; 8+ messages in thread
From: Jay Cornwall @ 2015-08-19 21:10 UTC (permalink / raw)
  To: linux-pci; +Cc: Jay Cornwall

The PCIe 3.0 AtomicOp (6.15) feature allows atomic transctions to be requested
by, routed through and completed by PCIe components. Routing and completion
do not require software support. Component support for each is detectable via
the DEVCAP2 register.

AtomicOp requests are permitted only if a component's
DEVCTL2.ATOMICOP_REQUESTER_ENABLE field is set. This capability cannot be
detected but is a no-op if set on a component with no support. These requests
can only be serviced if the upstream components support AtomicOp completion
and/or routing to a component which does.

A concrete example is the AMD Fiji-class GPU, which is specified to support
AtomicOp requests, routed through a PLX 8747 switch (advertising AtomicOp
routing) to a Haswell host bridge (advertising AtomicOp completion support).
When AtomicOp requests are disabled the GPU logs attempts to initiate requests
to an MMIO register for debugging.

Several approaches for software support might be considered:

  1. drivers/pci sets DEVCTL2.ATOMICOP_REQUESTER_ENABLE unconditionally for
     all endpoints and root ports

  2. drivers/pci attempts to establish a routable path to a completer prior to
     setting DEVCTL2.ATOMICOP_REQUESTER_ENABLE

  3. Approach 1/2 with individual drivers (drm/amdgpu in the above example)
     initiating the request for AtomicOp requester support through a function
     exported from drivers/pci

  4. Individual drivers specify a target component for AtomicOp completion

Approach 1 has two drawbacks. There is no guarantee that there is a reachable
component which can complete an AtomicOp request. It also prevents individual
drivers from blacklisting devices with known incorrect implementations. This
might otherwise provide useful diagnostics information. (e.g. AMD GPUs will
log an error to an MMIO register if the AtomicOp requester is disabled when an
atomic memory request would have been promoted to an AtomicOp.)

Approach 2 could only establish that there is a path to at least one completer,
but it would not prevent requests being sent to a different device which does
not support AtomicOp completion. For example, a root complex might support
completion but a transaction could be sent to a different device which does
not. The routable guarantee is not precise and so less useful.

Approach 3 allows drivers to enable AtomicOp requests on a per-device basis to
support blacklisting. A downside is that if AtomicOp support becomes more
prevalent it may be undesirable to explicitly enable the feature in individual
drivers.

Approach 4 is intractable as the target for a transaction is generally known
only by the application. DEVCTL2.ATOMICOP_REQUESTER_ENABLE is also a 1:many
capability and would not align well with this model.

In the absence of an ideal solution, I think approach 3(1) is the most
appropriate. I am open to suggestions for an improved implementation.

Jay Cornwall (1):
  PCI: Add pci_enable_atomic_request

 drivers/pci/pci.c             | 23 +++++++++++++++++++++++
 include/linux/pci.h           |  1 +
 include/uapi/linux/pci_regs.h |  1 +
 3 files changed, 25 insertions(+)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RFC 1/1] PCI: Add pci_enable_atomic_request
  2015-08-19 21:10 [PATCH RFC 0/1] Add AtomicOp Requester support Jay Cornwall
@ 2015-08-19 21:10 ` Jay Cornwall
  2015-09-14 19:58 ` [PATCH RFC 0/1] Add AtomicOp Requester support Bjorn Helgaas
  1 sibling, 0 replies; 8+ messages in thread
From: Jay Cornwall @ 2015-08-19 21:10 UTC (permalink / raw)
  To: linux-pci; +Cc: Jay Cornwall

Allow individual drivers to control AtomicOp Requester PCIe 3.0 capability.
This is a no-op on devices which do not support AtomicOp requests.

Signed-off-by: Jay Cornwall <jay@jcornwall.me>
---
 drivers/pci/pci.c             | 23 +++++++++++++++++++++++
 include/linux/pci.h           |  1 +
 include/uapi/linux/pci_regs.h |  1 +
 3 files changed, 25 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0008c95..1b9d1a9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2410,6 +2410,29 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 }
 
 /**
+ * pci_enable_atomic_request - enable or disable AtomicOp requester
+ * @dev: the PCI device
+ */
+void pci_enable_atomic_request(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	switch (pci_pcie_type(dev)) {
+	/* PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
+	 * to implement AtomicOp requester capabilities. */
+	case PCI_EXP_TYPE_ENDPOINT:
+	case PCI_EXP_TYPE_LEG_END:
+	case PCI_EXP_TYPE_RC_END:
+	case PCI_EXP_TYPE_ROOT_PORT:
+		pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
+					 PCI_EXP_DEVCTL2_ATOMICOP_REQ);
+		break;
+	}
+}
+EXPORT_SYMBOL(pci_enable_atomic_request);
+
+/**
  * pci_acs_path_enable - test ACS flags from start to end in a hierarchy
  * @start: starting downstream device
  * @end: ending upstream device or NULL to search to the root bus
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8a0321a..946c3ce 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1750,6 +1750,7 @@ void pci_request_acs(void);
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
 bool pci_acs_path_enabled(struct pci_dev *start,
 			  struct pci_dev *end, u16 acs_flags);
+void pci_enable_atomic_request(struct pci_dev *dev);
 
 #define PCI_VPD_LRDT			0x80	/* Large Resource Data Type */
 #define PCI_VPD_LRDT_ID(x)		((x) | PCI_VPD_LRDT)
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 413417f..d2d2e8d 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -578,6 +578,7 @@
 #define PCI_EXP_DEVCTL2		40	/* Device Control 2 */
 #define  PCI_EXP_DEVCTL2_COMP_TIMEOUT	0x000f	/* Completion Timeout Value */
 #define  PCI_EXP_DEVCTL2_ARI		0x0020	/* Alternative Routing-ID */
+#define  PCI_EXP_DEVCTL2_ATOMICOP_REQ	0x0040	/* Allow AtomicOp Requests */
 #define  PCI_EXP_DEVCTL2_IDO_REQ_EN	0x0100	/* Allow IDO for requests */
 #define  PCI_EXP_DEVCTL2_IDO_CMP_EN	0x0200	/* Allow IDO for completions */
 #define  PCI_EXP_DEVCTL2_LTR_EN		0x0400	/* Enable LTR mechanism */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/1] Add AtomicOp Requester support
  2015-08-19 21:10 [PATCH RFC 0/1] Add AtomicOp Requester support Jay Cornwall
  2015-08-19 21:10 ` [PATCH RFC 1/1] PCI: Add pci_enable_atomic_request Jay Cornwall
@ 2015-09-14 19:58 ` Bjorn Helgaas
  2015-09-21 20:44   ` Jay Cornwall
  1 sibling, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2015-09-14 19:58 UTC (permalink / raw)
  To: Jay Cornwall; +Cc: linux-pci

Hi Jay,

The [0/n] overview text doesn't get included in the changelog, but
some of this description is very useful background.  So I'd suggest
incorporating at least part of it into the patch changelog.

On Wed, Aug 19, 2015 at 04:10:01PM -0500, Jay Cornwall wrote:
> The PCIe 3.0 AtomicOp (6.15) feature allows atomic transctions to be requested
> by, routed through and completed by PCIe components. Routing and completion
> do not require software support. Component support for each is detectable via
> the DEVCAP2 register.
> 
> AtomicOp requests are permitted only if a component's
> DEVCTL2.ATOMICOP_REQUESTER_ENABLE field is set. This capability cannot be
> detected but is a no-op if set on a component with no support. These requests
> can only be serviced if the upstream components support AtomicOp completion
> and/or routing to a component which does.
> 
> A concrete example is the AMD Fiji-class GPU, which is specified to support
> AtomicOp requests, routed through a PLX 8747 switch (advertising AtomicOp
> routing) to a Haswell host bridge (advertising AtomicOp completion support).
> When AtomicOp requests are disabled the GPU logs attempts to initiate requests
> to an MMIO register for debugging.
> 
> Several approaches for software support might be considered:
> 
>   1. drivers/pci sets DEVCTL2.ATOMICOP_REQUESTER_ENABLE unconditionally for
>      all endpoints and root ports
> 
>   2. drivers/pci attempts to establish a routable path to a completer prior to
>      setting DEVCTL2.ATOMICOP_REQUESTER_ENABLE
> 
>   3. Approach 1/2 with individual drivers (drm/amdgpu in the above example)
>      initiating the request for AtomicOp requester support through a function
>      exported from drivers/pci
> 
>   4. Individual drivers specify a target component for AtomicOp completion
> 
> Approach 1 has two drawbacks. There is no guarantee that there is a reachable
> component which can complete an AtomicOp request. It also prevents individual
> drivers from blacklisting devices with known incorrect implementations. This
> might otherwise provide useful diagnostics information. (e.g. AMD GPUs will
> log an error to an MMIO register if the AtomicOp requester is disabled when an
> atomic memory request would have been promoted to an AtomicOp.)
> 
> Approach 2 could only establish that there is a path to at least one completer,
> but it would not prevent requests being sent to a different device which does
> not support AtomicOp completion. For example, a root complex might support
> completion but a transaction could be sent to a different device which does
> not. The routable guarantee is not precise and so less useful.

I assume the common usage scenario is to enable AtomicOps for
host-to-device and/or device-to-host transactions, and we can ignore
device-to-device transactions for now.

If I understand correctly, AtomicOps must be supported by all devices
along the path, e.g., a Root Port, possibly some Switch Ports, and
finally an Endpoint.  I guess your worry with Approach 2 is for a
scenario like this:

  00:1c.0: PCI bridge to [bus 01-04]  Root Port, with AtomicOp Routing
  01:00.0: PCI bridge to [bus 02-04]  Upstream Port, with AtomicOp Routing
  02:00.0: PCI bridge to [bus 03]     Downstream Port, with AtomicOp Routing
  03:00.0: endpoint                   AtomicOp Completer Supported
  02:00.1: PCI bridge to [bus 04]     Downstream Port, with AtomicOp Routing
  04:00.0: endpoint                   no AtomicOp Completer support

It's true that we wouldn't want to enable AtomicOp routing to 04:00.0,
but isn't that what the AtomicOp Egress Blocking bit is for?  If we
set that in 02:00.1, we should be safe in the sense that AtomicOps
targeting 04:00.0 should cause non-fatal errors.

> Approach 3 allows drivers to enable AtomicOp requests on a per-device basis to
> support blacklisting. A downside is that if AtomicOp support becomes more
> prevalent it may be undesirable to explicitly enable the feature in individual
> drivers.

Your pci_enable_atomic_request() enables AtomicOps for one component.
I assume that means the driver would have to map out the topology,
figure out whether all the components support AtomicOp routing, and
call pci_enable_atomic_request() for the Root Port and the Endpoint.
That seems like a lot of grubbing around for a driver.

I think a driver should only call pci_enable_atomic_request() for its
own device, and the PCI core should figure out whether it can be
enabled, and if it can, do everything needed to enable it.

> Approach 4 is intractable as the target for a transaction is generally known
> only by the application. DEVCTL2.ATOMICOP_REQUESTER_ENABLE is also a 1:many
> capability and would not align well with this model.
> 
> In the absence of an ideal solution, I think approach 3(1) is the most
> appropriate. I am open to suggestions for an improved implementation.
> 
> Jay Cornwall (1):
>   PCI: Add pci_enable_atomic_request
> 
>  drivers/pci/pci.c             | 23 +++++++++++++++++++++++
>  include/linux/pci.h           |  1 +
>  include/uapi/linux/pci_regs.h |  1 +
>  3 files changed, 25 insertions(+)
> 
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/1] Add AtomicOp Requester support
  2015-09-14 19:58 ` [PATCH RFC 0/1] Add AtomicOp Requester support Bjorn Helgaas
@ 2015-09-21 20:44   ` Jay Cornwall
  2015-09-21 22:46     ` Bjorn Helgaas
  0 siblings, 1 reply; 8+ messages in thread
From: Jay Cornwall @ 2015-09-21 20:44 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

On 2015-09-14 14:58, Bjorn Helgaas wrote:

> On Wed, Aug 19, 2015 at 04:10:01PM -0500, Jay Cornwall wrote:

>> Approach 2 could only establish that there is a path to at least one 
>> completer,
>> but it would not prevent requests being sent to a different device 
>> which does
>> not support AtomicOp completion. For example, a root complex might 
>> support
>> completion but a transaction could be sent to a different device which 
>> does
>> not. The routable guarantee is not precise and so less useful.

> I assume the common usage scenario is to enable AtomicOps for
> host-to-device and/or device-to-host transactions, and we can ignore
> device-to-device transactions for now.
> 
> If I understand correctly, AtomicOps must be supported by all devices
> along the path, e.g., a Root Port, possibly some Switch Ports, and
> finally an Endpoint. I guess your worry with Approach 2 is for a
> scenario like this:
> 
> 00:1c.0: PCI bridge to [bus 01-04] Root Port, with AtomicOp Routing
> 01:00.0: PCI bridge to [bus 02-04] Upstream Port, with AtomicOp Routing
> 02:00.0: PCI bridge to [bus 03] Downstream Port, with AtomicOp Routing
> 03:00.0: endpoint AtomicOp Completer Supported
> 02:00.1: PCI bridge to [bus 04] Downstream Port, with AtomicOp Routing
> 04:00.0: endpoint no AtomicOp Completer support
> 
> It's true that we wouldn't want to enable AtomicOp routing to 04:00.0,
> but isn't that what the AtomicOp Egress Blocking bit is for? If we
> set that in 02:00.1, we should be safe in the sense that AtomicOps
> targeting 04:00.0 should cause non-fatal errors.

If 02:00.1 had egress blocking then, if I understand correctly, a 
00:1c.0 -> 04:00.0 AtomicOp request would be blocked. Host-to-device and 
device-to-device look quite similar from this perspective.

> Your pci_enable_atomic_request() enables AtomicOps for one component.
> I assume that means the driver would have to map out the topology,
> figure out whether all the components support AtomicOp routing, and
> call pci_enable_atomic_request() for the Root Port and the Endpoint.
> That seems like a lot of grubbing around for a driver.
> 
> I think a driver should only call pci_enable_atomic_request() for its
> own device, and the PCI core should figure out whether it can be
> enabled, and if it can, do everything needed to enable it.

OK. I've put together a v2 patch which checks the upstream bridges for 
AtomicOp routing and the root port for completion. I've left out egress 
blocking per the above concern.

-- 
Jay Cornwall

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/1] Add AtomicOp Requester support
  2015-09-21 20:44   ` Jay Cornwall
@ 2015-09-21 22:46     ` Bjorn Helgaas
  2015-09-22  0:16       ` Jay Cornwall
  0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2015-09-21 22:46 UTC (permalink / raw)
  To: Jay Cornwall; +Cc: linux-pci

On Mon, Sep 21, 2015 at 03:44:59PM -0500, Jay Cornwall wrote:
> On 2015-09-14 14:58, Bjorn Helgaas wrote:
> 
> >On Wed, Aug 19, 2015 at 04:10:01PM -0500, Jay Cornwall wrote:
> 
> >>Approach 2 could only establish that there is a path to at least
> >>one completer,
> >>but it would not prevent requests being sent to a different
> >>device which does
> >>not support AtomicOp completion. For example, a root complex
> >>might support
> >>completion but a transaction could be sent to a different device
> >>which does
> >>not. The routable guarantee is not precise and so less useful.
> 
> >I assume the common usage scenario is to enable AtomicOps for
> >host-to-device and/or device-to-host transactions, and we can ignore
> >device-to-device transactions for now.
> >
> >If I understand correctly, AtomicOps must be supported by all devices
> >along the path, e.g., a Root Port, possibly some Switch Ports, and
> >finally an Endpoint. I guess your worry with Approach 2 is for a
> >scenario like this:
> >
> >00:1c.0: PCI bridge to [bus 01-04] Root Port, with AtomicOp Routing
> >01:00.0: PCI bridge to [bus 02-04] Upstream Port, with AtomicOp Routing
> >02:00.0: PCI bridge to [bus 03] Downstream Port, with AtomicOp Routing
> >03:00.0: endpoint AtomicOp Completer Supported
> >02:00.1: PCI bridge to [bus 04] Downstream Port, with AtomicOp Routing
> >04:00.0: endpoint no AtomicOp Completer support
> >
> >It's true that we wouldn't want to enable AtomicOp routing to 04:00.0,
> >but isn't that what the AtomicOp Egress Blocking bit is for? If we
> >set that in 02:00.1, we should be safe in the sense that AtomicOps
> >targeting 04:00.0 should cause non-fatal errors.
> 
> If 02:00.1 had egress blocking then, if I understand correctly, a
> 00:1c.0 -> 04:00.0 AtomicOp request would be blocked.

Yes, a 1c.0 -> 04:00.0 AtomicOp request would be blocked, but 04:00.0
doesn't support AtomicOps, so we *want* that request to be blocked,
don't we?  If 04:00.0 received an AtomicOp, I think it would handle it
as a Malformed TLP, which by default is a Fatal Error.

If we set AtomicOpEgress Blocking in 02:00.1 and attempt a 1c.0 ->
04:00.0 AtomicOp request, my reading is that 02:00.1 should report an
AtomicOp Egress Blocked error, which by default is an Advisory
Non-Fatal Error, and 04:00.0 should never receive the AtomicOp.

This is from the second-to-last paragraph of PCIe spec r3.0, sec 6.15.

Even if we set AtomicOpEgress Blocking in 02:00.1, an AtomicOp to
03:00.0 should work, because that would be routed via 02:00.0, not
02:00.1.

Bjorn

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/1] Add AtomicOp Requester support
  2015-09-21 22:46     ` Bjorn Helgaas
@ 2015-09-22  0:16       ` Jay Cornwall
  2015-09-22 16:22         ` Bjorn Helgaas
  0 siblings, 1 reply; 8+ messages in thread
From: Jay Cornwall @ 2015-09-22  0:16 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

On 2015-09-21 17:46, Bjorn Helgaas wrote:
> On Mon, Sep 21, 2015 at 03:44:59PM -0500, Jay Cornwall wrote:
>> On 2015-09-14 14:58, Bjorn Helgaas wrote:
>> 
>> >00:1c.0: PCI bridge to [bus 01-04] Root Port, with AtomicOp Routing
>> >01:00.0: PCI bridge to [bus 02-04] Upstream Port, with AtomicOp Routing
>> >02:00.0: PCI bridge to [bus 03] Downstream Port, with AtomicOp Routing
>> >03:00.0: endpoint AtomicOp Completer Supported
>> >02:00.1: PCI bridge to [bus 04] Downstream Port, with AtomicOp Routing
>> >04:00.0: endpoint no AtomicOp Completer support
>> >
>> >It's true that we wouldn't want to enable AtomicOp routing to 04:00.0,
>> >but isn't that what the AtomicOp Egress Blocking bit is for? If we
>> >set that in 02:00.1, we should be safe in the sense that AtomicOps
>> >targeting 04:00.0 should cause non-fatal errors.
>> 
>> If 02:00.1 had egress blocking then, if I understand correctly, a
>> 00:1c.0 -> 04:00.0 AtomicOp request would be blocked.
> 
> Yes, a 1c.0 -> 04:00.0 AtomicOp request would be blocked, but 04:00.0
> doesn't support AtomicOps, so we *want* that request to be blocked,
> don't we?  If 04:00.0 received an AtomicOp, I think it would handle it
> as a Malformed TLP, which by default is a Fatal Error.

I think I was confused by your earlier comment:

>> >I assume the common usage scenario is to enable AtomicOps for
>> >host-to-device and/or device-to-host transactions, and we can ignore
>> >device-to-device transactions for now.

00:1c.0 -> 04:00.0 would be the host-to-device scenario. It's true that 
04:00.0 does not support AtomicOp completion in the above example. 
However, enabling AtomicOp requests for 04:00.0 would cause egress 
blocks to be set in the 00:1c.0 -> 03:00.0 path.

Our immediate use case for AtomicOps is indeed device-to-host, and I 
expect this is the most common case. I can make a V3 patch which 
explicitly supports this case only by setting egress blocks on 
downstream ports of upstream bridges.

This would guarantee that AtomicOp requests terminate correctly, at the 
expense of potential users of host-to-device or device-to-device 
scenarios.

-- 
Jay Cornwall

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/1] Add AtomicOp Requester support
  2015-09-22  0:16       ` Jay Cornwall
@ 2015-09-22 16:22         ` Bjorn Helgaas
  2015-09-23 16:44           ` Jay Cornwall
  0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2015-09-22 16:22 UTC (permalink / raw)
  To: Jay Cornwall; +Cc: linux-pci

On Mon, Sep 21, 2015 at 07:16:08PM -0500, Jay Cornwall wrote:
> On 2015-09-21 17:46, Bjorn Helgaas wrote:
> >On Mon, Sep 21, 2015 at 03:44:59PM -0500, Jay Cornwall wrote:
> >>On 2015-09-14 14:58, Bjorn Helgaas wrote:
> >>
> >>>00:1c.0: PCI bridge to [bus 01-04] Root Port, with AtomicOp Routing
> >>>01:00.0: PCI bridge to [bus 02-04] Upstream Port, with AtomicOp Routing
> >>>02:00.0: PCI bridge to [bus 03] Downstream Port, with AtomicOp Routing
> >>>03:00.0: endpoint AtomicOp Completer Supported
> >>>02:00.1: PCI bridge to [bus 04] Downstream Port, with AtomicOp Routing
> >>>04:00.0: endpoint no AtomicOp Completer support
> >>>
> >>>It's true that we wouldn't want to enable AtomicOp routing to 04:00.0,
> >>>but isn't that what the AtomicOp Egress Blocking bit is for? If we
> >>>set that in 02:00.1, we should be safe in the sense that AtomicOps
> >>>targeting 04:00.0 should cause non-fatal errors.
> >>
> >>If 02:00.1 had egress blocking then, if I understand correctly, a
> >>00:1c.0 -> 04:00.0 AtomicOp request would be blocked.
> >
> >Yes, a 1c.0 -> 04:00.0 AtomicOp request would be blocked, but 04:00.0
> >doesn't support AtomicOps, so we *want* that request to be blocked,
> >don't we?  If 04:00.0 received an AtomicOp, I think it would handle it
> >as a Malformed TLP, which by default is a Fatal Error.
> 
> I think I was confused by your earlier comment:
> 
> >>>I assume the common usage scenario is to enable AtomicOps for
> >>>host-to-device and/or device-to-host transactions, and we can ignore
> >>>device-to-device transactions for now.
> 
> 00:1c.0 -> 04:00.0 would be the host-to-device scenario. It's true
> that 04:00.0 does not support AtomicOp completion in the above
> example. However, enabling AtomicOp requests for 04:00.0 would cause
> egress blocks to be set in the 00:1c.0 -> 03:00.0 path.

In this particular example, enabling AtomicOp requests for 04:00.0
should fail and do nothing, because 04:00.0 doesn't support AtomicOps.

But you're right that if pci_enable_atomic_request() turns on egress
blocking bits, and both 03:00.0 and 04:00.0 support AtomicOps,
enabling AtomicOps for one device would block them for the other.

I'm not sure how we should use egress blocking.  I could imagine
setting egress blocking bits for the whole tree at enumeration-time,
based on analysis of which root ports and functions have "AtomicOp
Complete Supported" bits set.

I could also imagine setting egress blocking bits at the time a driver
calls pci_enable_atomic_request(), but I don't really like the way this
could affect unrelated devices.  For example,

  Case A
    - Enumerate devices, leaving egress blocking bits cleared
    - Send 1c.0 -> 04:00.0 AtomicOp
    - 04:00.0 raises Malformed TLP Fatal Error

  Case B
    - Enumerate devices, leaving egress blocking bits cleared
    - 03:00.0 driver calls pci_enable_atomic_request()
      - Set 02:00.01 egress blocking bit
    - Send 1c.0 -> 04:00.0 AtomicOp
    - 02:00.01 raises AtomicOp Egress Blocked Advisory Non-Fatal Error

It seems wrong that the same action (AtomicOp to 04:00.0) causes
different errors, depending on an unrelated device.

We don't have any guarantee that AtomicOps terminate correctly today,
so maybe we don't need to add one right now.  If we *do* decide to add
one, it seems like it would be better done at enumeration-time.

So I think your v2 patch seems like pretty much what we want, at least
when we want to set up device-to-host AtomicOps: it validates that the
the fabric from device to the root port supports AtomicOp routing,
validates that the root port supports AtomicOp completion, and enables
AtomicOp requests from the function.

The host-to-device case seems a little different, though.  In that
case, I think we need to turn on AtomicOp Requester Enable in the root
port.  I think it'd be nicer if the driver didn't have to look up the
root port device, and there might be other issues, too.  If you don't
have a need for this case yet, maybe we should leave it unsupported
for now.

Bjorn

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/1] Add AtomicOp Requester support
  2015-09-22 16:22         ` Bjorn Helgaas
@ 2015-09-23 16:44           ` Jay Cornwall
  0 siblings, 0 replies; 8+ messages in thread
From: Jay Cornwall @ 2015-09-23 16:44 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

On 2015-09-22 11:22, Bjorn Helgaas wrote:
> On Mon, Sep 21, 2015 at 07:16:08PM -0500, Jay Cornwall wrote:
>> On 2015-09-21 17:46, Bjorn Helgaas wrote:
>> >On Mon, Sep 21, 2015 at 03:44:59PM -0500, Jay Cornwall wrote:
>> >>On 2015-09-14 14:58, Bjorn Helgaas wrote:
>> >>
>> >>>00:1c.0: PCI bridge to [bus 01-04] Root Port, with AtomicOp Routing
>> >>>01:00.0: PCI bridge to [bus 02-04] Upstream Port, with AtomicOp Routing
>> >>>02:00.0: PCI bridge to [bus 03] Downstream Port, with AtomicOp Routing
>> >>>03:00.0: endpoint AtomicOp Completer Supported
>> >>>02:00.1: PCI bridge to [bus 04] Downstream Port, with AtomicOp Routing
>> >>>04:00.0: endpoint no AtomicOp Completer support
>> >>>
>> I think I was confused by your earlier comment:
>> 
>> >>>I assume the common usage scenario is to enable AtomicOps for
>> >>>host-to-device and/or device-to-host transactions, and we can ignore
>> >>>device-to-device transactions for now.
>> 
>> 00:1c.0 -> 04:00.0 would be the host-to-device scenario. It's true
>> that 04:00.0 does not support AtomicOp completion in the above
>> example. However, enabling AtomicOp requests for 04:00.0 would cause
>> egress blocks to be set in the 00:1c.0 -> 03:00.0 path.
> 
> In this particular example, enabling AtomicOp requests for 04:00.0
> should fail and do nothing, because 04:00.0 doesn't support AtomicOps.
> 
> But you're right that if pci_enable_atomic_request() turns on egress
> blocking bits, and both 03:00.0 and 04:00.0 support AtomicOps,
> enabling AtomicOps for one device would block them for the other.
> 
> I'm not sure how we should use egress blocking.  I could imagine
> setting egress blocking bits for the whole tree at enumeration-time,
> based on analysis of which root ports and functions have "AtomicOp
> Complete Supported" bits set.

I think this was my central concern. If we were to do this analysis at 
enumeration-time then a similar argument might be put forward for 
setting AtomicOp Requester Enable here as well. This would make 
pci_enable_atomic_request redundant.

Thinking more, though, this case is different because AtomicOp Requester 
capability cannot be detected. Having a per-device call for drivers with 
knowledge of this capability would seem reasonable. The alternative 
would be to set DEVCTL2.ATOMICOP_REQUESTER_ENABLE for all devices at 
enumeration time.

> We don't have any guarantee that AtomicOps terminate correctly today,
> so maybe we don't need to add one right now.  If we *do* decide to add
> one, it seems like it would be better done at enumeration-time.
> 
> So I think your v2 patch seems like pretty much what we want, at least
> when we want to set up device-to-host AtomicOps: it validates that the
> the fabric from device to the root port supports AtomicOp routing,
> validates that the root port supports AtomicOp completion, and enables
> AtomicOp requests from the function.
> 
> The host-to-device case seems a little different, though.  In that
> case, I think we need to turn on AtomicOp Requester Enable in the root
> port.  I think it'd be nicer if the driver didn't have to look up the
> root port device, and there might be other issues, too.  If you don't
> have a need for this case yet, maybe we should leave it unsupported
> for now.

Yes, that case is somewhat awkward.

Our (AMD's) use case is to allow the CPU and GPU to synchronize access 
to shared data structures, in a uniform way with our integrated parts. 
These are located in system memory (so, device->host atomics). I can't 
think of a good reason to reverse this scenario. (Our GPUs don't support 
AtomicOp Completion in any case.)

I'll address your comments and submit a V3 with no design changes.

-- 
Jay Cornwall

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-09-23 16:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-19 21:10 [PATCH RFC 0/1] Add AtomicOp Requester support Jay Cornwall
2015-08-19 21:10 ` [PATCH RFC 1/1] PCI: Add pci_enable_atomic_request Jay Cornwall
2015-09-14 19:58 ` [PATCH RFC 0/1] Add AtomicOp Requester support Bjorn Helgaas
2015-09-21 20:44   ` Jay Cornwall
2015-09-21 22:46     ` Bjorn Helgaas
2015-09-22  0:16       ` Jay Cornwall
2015-09-22 16:22         ` Bjorn Helgaas
2015-09-23 16:44           ` Jay Cornwall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).