From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from jcornwall.me ([50.116.27.114]:47663 "EHLO jcornwall.me" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753148AbbIWQrU (ORCPT ); Wed, 23 Sep 2015 12:47:20 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Date: Wed, 23 Sep 2015 11:44:33 -0500 From: Jay Cornwall To: Bjorn Helgaas Cc: linux-pci@vger.kernel.org Subject: Re: [PATCH RFC 0/1] Add AtomicOp Requester support In-Reply-To: <20150922162208.GU25767@google.com> References: <1440018602-4212-1-git-send-email-jay@jcornwall.me> <20150914195840.GA25767@google.com> <96e2c6b2b12b99282c59c97dbb8e5b69@jcornwall.me> <20150921224629.GS25767@google.com> <99163c8fb92c573ec65219b14cb653ea@jcornwall.me> <20150922162208.GU25767@google.com> Message-ID: Sender: linux-pci-owner@vger.kernel.org List-ID: On 2015-09-22 11:22, Bjorn Helgaas wrote: > On Mon, Sep 21, 2015 at 07:16:08PM -0500, Jay Cornwall wrote: >> On 2015-09-21 17:46, Bjorn Helgaas wrote: >> >On Mon, Sep 21, 2015 at 03:44:59PM -0500, Jay Cornwall wrote: >> >>On 2015-09-14 14:58, Bjorn Helgaas wrote: >> >> >> >>>00:1c.0: PCI bridge to [bus 01-04] Root Port, with AtomicOp Routing >> >>>01:00.0: PCI bridge to [bus 02-04] Upstream Port, with AtomicOp Routing >> >>>02:00.0: PCI bridge to [bus 03] Downstream Port, with AtomicOp Routing >> >>>03:00.0: endpoint AtomicOp Completer Supported >> >>>02:00.1: PCI bridge to [bus 04] Downstream Port, with AtomicOp Routing >> >>>04:00.0: endpoint no AtomicOp Completer support >> >>> >> I think I was confused by your earlier comment: >> >> >>>I assume the common usage scenario is to enable AtomicOps for >> >>>host-to-device and/or device-to-host transactions, and we can ignore >> >>>device-to-device transactions for now. >> >> 00:1c.0 -> 04:00.0 would be the host-to-device scenario. It's true >> that 04:00.0 does not support AtomicOp completion in the above >> example. However, enabling AtomicOp requests for 04:00.0 would cause >> egress blocks to be set in the 00:1c.0 -> 03:00.0 path. > > In this particular example, enabling AtomicOp requests for 04:00.0 > should fail and do nothing, because 04:00.0 doesn't support AtomicOps. > > But you're right that if pci_enable_atomic_request() turns on egress > blocking bits, and both 03:00.0 and 04:00.0 support AtomicOps, > enabling AtomicOps for one device would block them for the other. > > I'm not sure how we should use egress blocking. I could imagine > setting egress blocking bits for the whole tree at enumeration-time, > based on analysis of which root ports and functions have "AtomicOp > Complete Supported" bits set. I think this was my central concern. If we were to do this analysis at enumeration-time then a similar argument might be put forward for setting AtomicOp Requester Enable here as well. This would make pci_enable_atomic_request redundant. Thinking more, though, this case is different because AtomicOp Requester capability cannot be detected. Having a per-device call for drivers with knowledge of this capability would seem reasonable. The alternative would be to set DEVCTL2.ATOMICOP_REQUESTER_ENABLE for all devices at enumeration time. > We don't have any guarantee that AtomicOps terminate correctly today, > so maybe we don't need to add one right now. If we *do* decide to add > one, it seems like it would be better done at enumeration-time. > > So I think your v2 patch seems like pretty much what we want, at least > when we want to set up device-to-host AtomicOps: it validates that the > the fabric from device to the root port supports AtomicOp routing, > validates that the root port supports AtomicOp completion, and enables > AtomicOp requests from the function. > > The host-to-device case seems a little different, though. In that > case, I think we need to turn on AtomicOp Requester Enable in the root > port. I think it'd be nicer if the driver didn't have to look up the > root port device, and there might be other issues, too. If you don't > have a need for this case yet, maybe we should leave it unsupported > for now. Yes, that case is somewhat awkward. Our (AMD's) use case is to allow the CPU and GPU to synchronize access to shared data structures, in a uniform way with our integrated parts. These are located in system memory (so, device->host atomics). I can't think of a good reason to reverse this scenario. (Our GPUs don't support AtomicOp Completion in any case.) I'll address your comments and submit a V3 with no design changes. -- Jay Cornwall