From: Bjorn Helgaas <helgaas@kernel.org>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
"Stephen Bates" <sbates@raithlin.com>,
"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
"Keith Busch" <keith.busch@intel.com>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Jason Gunthorpe" <jgg@mellanox.com>,
"Max Gurtovoy" <maxg@mellanox.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Benjamin Herrenschmidt" <benh@kernel.crashing.org>
Subject: Re: [PATCH 01/12] pci-p2p: Support peer to peer memory
Date: Thu, 4 Jan 2018 15:40:29 -0600 [thread overview]
Message-ID: <20180104214028.GD189897@bhelgaas-glaptop.roam.corp.google.com> (raw)
In-Reply-To: <20180104190137.7654-2-logang@deltatee.com>
Run "git log --oneline drivers/pci" and follow the convention. I
think it would make sense to add a new tag like "PCI/P2P", although
"P2P" has historically also been used in the "PCI-to-PCI bridge"
context, so maybe there's something less ambiguous. "P2PDMA"?
When you add new files, I guess we're looking for the new SPDX
license identifiers?
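For reference, I believe the convention being settled on is an identifier
as the very first line of each new .c file, e.g.:

```c
// SPDX-License-Identifier: GPL-2.0
```

(Headers use the /* SPDX-License-Identifier: GPL-2.0 */ form instead,
since // comments aren't valid in all contexts that include them.)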
On Thu, Jan 04, 2018 at 12:01:26PM -0700, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in Peer-to-Peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
>
> A kernel interface is provided so that other subsystems can find and
> allocate chunks of P2P memory as necessary to facilitate transfers
> between two PCI peers. Depending on hardware, this may reduce the
> bandwidth of the transfer but would significantly reduce pressure
> on system memory. This may be desirable in many cases: for example a
> system could be designed with a small CPU connected to a PCI switch by a
> small number of lanes which would maximize the number of lanes available
> to connect to NVME devices.
>
> The interface requires a user driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary. The ACS bits on the
> downstream switch port will be managed for all the registered clients.
>
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI switch. This is because
> using P2P transactions through the PCI root complex can have performance
> limitations or, worse, might not work at all. Finding out how well a
> particular RC supports P2P transfers is non-trivial.
It's more than "non-trivial" or "with good performance" (from Kconfig
help), isn't it? AFAIK, there's no standard way at all to discover
whether P2P DMA is supported between root ports or RCs.
> +config PCI_P2P
> + bool "PCI Peer to Peer transfer support"
> + depends on ZONE_DEVICE
> + select GENERIC_ALLOCATOR
> + help
> + Enables drivers to do PCI peer to peer transactions to and from
> + bars that are exposed to other devices in the same domain.
s/bars/BARs/ (and similarly below, except in C code)
Similarly, s/dma/DMA/ and s/pci/PCI/ below.
And probably also s/p2p/peer-to-peer DMA/ in messages.
Maybe clarify this domain bit. Using "domain" suggests the common PCI
segment/domain usage, but I think you really mean something like the
part of the hierarchy where peer-to-peer DMA is guaranteed by the PCI
spec to work, i.e., anything below a single PCI bridge.
> +
> + Many PCIe root complexes do not support P2P transactions and
> + it's hard to tell which support it with good performance, so
> + at this time you will need a PCIe switch.
> +
> + If unsure, say N.
> + * pci_p2pmem_add_resource - add memory for use as p2p memory
> + * @pdev: the device to add the memory to
> + * @bar: PCI bar to add
> + * @size: size of the memory to add, may be zero to use the whole bar
> + * @offset: offset into the PCI bar
> + *
> + * The memory will be given ZONE_DEVICE struct pages so that it may
> + * be used with any dma request.
> + */
> +int pci_p2pmem_add_resource(struct pci_dev *pdev, int bar, size_t size,
> + u64 offset)
> +{
> + struct dev_pagemap *pgmap;
> + void *addr;
> + int error;
Seems like there should be

    if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
        return -EINVAL;

or similar here?
> + if (WARN_ON(offset >= pci_resource_len(pdev, bar)))
> + return -EINVAL;
Are these WARN_ONs for debugging purposes, or do you think we need
them in production? Granted, hitting it would probably be a kernel
driver bug, but still, not sure if the PCI core needs to coddle the
driver author that much.
> + if (!size)
> + size = pci_resource_len(pdev, bar) - offset;
> +
> + if (WARN_ON(size + offset > pci_resource_len(pdev, bar)))
> + return -EINVAL;
> +
> + if (!pdev->p2p) {
> + error = pci_p2pmem_setup(pdev);
> + if (error)
> + return error;
> + }
> +
> + pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
> + if (!pgmap)
> + return -ENOMEM;
> +
> + pgmap->res.start = pci_resource_start(pdev, bar) + offset;
> + pgmap->res.end = pgmap->res.start + size - 1;
I'm guessing Christoph's dev_pagemap revamp repo must change
pgmap->res from a pointer to a structure, but I don't see the actual
link in your cover letter.
I think you should set pgmap->res.flags here, too.
> + pgmap->ref = &pdev->p2p->devmap_ref;
> + pgmap->type = MEMORY_DEVICE_PCI_P2P;
> +
> + addr = devm_memremap_pages(&pdev->dev, pgmap);
> + if (IS_ERR(addr))
> + return PTR_ERR(addr);
> +
> + error = gen_pool_add_virt(pdev->p2p->pool, (uintptr_t)addr,
> + pci_bus_address(pdev, bar) + offset,
> + resource_size(&pgmap->res), dev_to_node(&pdev->dev));
> + if (error)
> + return error;
> +
> + error = devm_add_action_or_reset(&pdev->dev, pci_p2pmem_percpu_kill,
> + &pdev->p2p->devmap_ref);
> + if (error)
> + return error;
> +
> + dev_info(&pdev->dev, "added %zdB of p2p memory\n", size);
Can we add %pR and print pgmap->res itself, too?
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_add_resource);
> + * If a device is behind a switch, we try to find the upstream bridge
> + * port of the switch. This requires two calls to pci_upstream_bridge:
> + * one for the upstream port on the switch, one on the upstream port
> + * for the next level in the hierarchy. Because of this, devices connected
> + * to the root port will be rejected.
> + */
> +static struct pci_dev *get_upstream_switch_port(struct pci_dev *pdev)
> +{
> + struct pci_dev *up1, *up2;
> +
> + if (!pdev)
> + return NULL;
> +
> + up1 = pci_dev_get(pci_upstream_bridge(pdev));
> + if (!up1)
> + return NULL;
> +
> + up2 = pci_dev_get(pci_upstream_bridge(up1));
> + pci_dev_put(up1);
> +
> + return up2;
> +}
> +
> +static bool __upstream_bridges_match(struct pci_dev *upstream,
> + struct pci_dev *client)
> +{
> + struct pci_dev *dma_up;
> + bool ret = true;
> +
> + dma_up = get_upstream_switch_port(client);
> +
> + if (!dma_up) {
> + dev_dbg(&client->dev, "not a pci device behind a switch\n");
You have a bit of a mix of PCI ("pci device", "bridge") and PCIe
("switch", "switch port") terminology. I haven't read the rest of the
patches yet, so I don't know if you intend to restrict this to
PCIe-only, e.g., so you can use ACS, or if you want to make it
available on conventional PCI as well.
If the latter, I would use the generic PCI terminology, i.e., "bridge"
instead of "switch".
> + * pci_p2pmem_virt_to_bus - return the pci bus address for a given virtual
> + * address obtained with pci_alloc_p2pmem
> + * @pdev: the device the memory was allocated from
> + * @addr: address of the memory that was allocated
> + */
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
> +{
> + if (!addr)
> + return 0;
> + if (!pdev->p2p)
> + return 0;
> +
> + return gen_pool_virt_to_phys(pdev->p2p->pool, (unsigned long)addr);
This doesn't seem right. A physical address is not the same as a PCI
bus address. I expected something like pci_bus_address() or
pcibios_resource_to_bus() here. Am I missing something? If so, a
clarifying comment would be helpful.
> + * pci_p2pmem_publish - publish the p2p memory for use by other devices
> + * with pci_p2pmem_find
> + * @pdev: the device with p2p memory to publish
> + * @publish: set to true to publish the memory, false to unpublish it
> + */
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> + if (WARN_ON(publish && !pdev->p2p))
> + return;
Same WARN_ON question -- is this really intended for production?
Doesn't seem like the end user can really do anything with the warning
information.
> diff --git a/include/linux/pci-p2p.h b/include/linux/pci-p2p.h
> new file mode 100644
> index 000000000000..f811c97a5886
> --- /dev/null
> +++ b/include/linux/pci-p2p.h
> @@ -0,0 +1,85 @@
> +#ifndef _LINUX_PCI_P2P_H
> +#define _LINUX_PCI_P2P_H
> +/*
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + */
> +
> +#include <linux/pci.h>
> +
> +struct block_device;
> +struct scatterlist;
I've been noticing that we're accumulating PCI-related files in
include/linux: pci.h, pci-aspm.h, pci-ats.h, pci-dma.h, pcieport_if.h,
etc. I'm not sure there's value in all those and am thinking maybe
they should just be folded into pci.h. What do you think?
> +#ifdef CONFIG_PCI_P2P
> +int pci_p2pmem_add_resource(struct pci_dev *pdev, int bar, size_t size,
> + u64 offset);
> +int pci_p2pmem_add_client(struct list_head *head, struct device *dev);
> ...
Bjorn
Thread overview: 56+ messages
2018-01-04 19:01 [PATCH 00/11] Copy Offload in NVMe Fabrics with P2P PCI Memory Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 01/12] pci-p2p: Support peer to peer memory Logan Gunthorpe
2018-01-04 21:40 ` Bjorn Helgaas [this message]
2018-01-04 23:06 ` Logan Gunthorpe
2018-01-04 21:59 ` Bjorn Helgaas
2018-01-05 0:20 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 02/12] pci-p2p: Add sysfs group to display p2pmem stats Logan Gunthorpe
2018-01-04 21:50 ` Bjorn Helgaas
2018-01-04 22:25 ` Jason Gunthorpe
2018-01-04 23:13 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 03/12] pci-p2p: Add PCI p2pmem dma mappings to adjust the bus offset Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 04/12] pci-p2p: Clear ACS P2P flags for all client devices Logan Gunthorpe
2018-01-04 21:57 ` Bjorn Helgaas
2018-01-04 22:35 ` Alex Williamson
2018-01-05 0:00 ` Logan Gunthorpe
2018-01-05 1:09 ` Logan Gunthorpe
2018-01-05 3:33 ` Alex Williamson
2018-01-05 6:47 ` Jerome Glisse
2018-01-05 15:41 ` Alex Williamson
2018-01-05 17:10 ` Logan Gunthorpe
2018-01-05 17:18 ` Alex Williamson
2018-01-04 19:01 ` [PATCH 05/12] block: Introduce PCI P2P flags for request and request queue Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]() Logan Gunthorpe
2018-01-04 19:22 ` Jason Gunthorpe
2018-01-04 19:52 ` Logan Gunthorpe
2018-01-04 22:13 ` Jason Gunthorpe
2018-01-04 23:44 ` Logan Gunthorpe
2018-01-05 4:50 ` Jason Gunthorpe
2018-01-08 14:59 ` Christoph Hellwig
2018-01-08 18:09 ` Jason Gunthorpe
2018-01-08 18:17 ` Logan Gunthorpe
2018-01-08 18:29 ` Jason Gunthorpe
2018-01-08 18:34 ` Christoph Hellwig
2018-01-08 18:44 ` Logan Gunthorpe
2018-01-08 18:57 ` Christoph Hellwig
2018-01-08 19:05 ` Logan Gunthorpe
2018-01-09 16:47 ` Christoph Hellwig
2018-01-08 19:49 ` Jason Gunthorpe
2018-01-09 16:46 ` Christoph Hellwig
2018-01-09 17:10 ` Jason Gunthorpe
2018-01-08 19:01 ` Jason Gunthorpe
2018-01-09 16:55 ` Christoph Hellwig
2018-01-04 19:01 ` [PATCH 07/12] nvme-pci: clean up CMB initialization Logan Gunthorpe
2018-01-04 19:08 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 08/12] nvme-pci: clean up SMBSZ bit definitions Logan Gunthorpe
2018-01-04 19:08 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 09/12] nvme-pci: Use PCI p2pmem subsystem to manage the CMB Logan Gunthorpe
2018-01-05 15:30 ` Marta Rybczynska
2018-01-05 18:14 ` Logan Gunthorpe
2018-01-05 18:11 ` Keith Busch
2018-01-05 18:19 ` Logan Gunthorpe
2018-01-05 19:01 ` Keith Busch
2018-01-05 19:04 ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 10/12] nvme-pci: Add support for P2P memory in requests Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 11/12] nvme-pci: Add a quirk for a pseudo CMB Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 12/12] nvmet: Optionally use PCI P2P memory Logan Gunthorpe