public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
	"Stephen Bates" <sbates@raithlin.com>,
	"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
	"Keith Busch" <keith.busch@intel.com>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Jason Gunthorpe" <jgg@mellanox.com>,
	"Max Gurtovoy" <maxg@mellanox.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>
Subject: Re: [PATCH 01/12] pci-p2p: Support peer to peer memory
Date: Thu, 4 Jan 2018 15:40:29 -0600	[thread overview]
Message-ID: <20180104214028.GD189897@bhelgaas-glaptop.roam.corp.google.com> (raw)
In-Reply-To: <20180104190137.7654-2-logang@deltatee.com>

Run "git log --oneline drivers/pci" and follow the convention.  I
think it would make sense to add a new tag like "PCI/P2P", although
"P2P" has historically also been used in the "PCI-to-PCI bridge"
context, so maybe there's something less ambiguous.  "P2PDMA"?

When you add new files, I guess we're looking for the new SPDX
copyright stuff?

On Thu, Jan 04, 2018 at 12:01:26PM -0700, Logan Gunthorpe wrote:
> Some PCI devices may have memory mapped in a BAR space that's
> intended for use in Peer-to-Peer transactions. In order to enable
> such transactions the memory must be registered with ZONE_DEVICE pages
> so it can be used by DMA interfaces in existing drivers.
> 
> A kernel interface is provided so that other subsystems can find and
> allocate chunks of P2P memory as necessary to facilitate transfers
> between two PCI peers. Depending on hardware, this may reduce the
> bandwidth of the transfer but would significantly reduce pressure
> on system memory. This may be desirable in many cases: for example a
> system could be designed with a small CPU connected to a PCI switch by a
> small number of lanes which would maximize the number of lanes available
> to connect to NVME devices.
> 
> The interface requires a user driver to collect a list of client devices
> involved in the transaction with the pci_p2pmem_add_client*() functions
> then call pci_p2pmem_find() to obtain any suitable P2P memory. Once
> this is done the list is bound to the memory and the calling driver is
> free to add and remove clients as necessary. The ACS bits on the
> downstream switch port will be managed for all the registered clients.
> 
> The code is designed to only utilize the p2pmem device if all the devices
> involved in a transfer are behind the same PCI switch. This is because
> using P2P transactions through the PCI root complex can have performance
> limitations or, worse, might not work at all. Finding out how well a
> particular RC supports P2P transfers is non-trivial. 

It's more than "non-trivial" or "with good performance" (from Kconfig
help), isn't it?  AFAIK, there's no standard way at all to discover
whether P2P DMA is supported between root ports or RCs.

> +config PCI_P2P
> +	bool "PCI Peer to Peer transfer support"
> +	depends on ZONE_DEVICE
> +	select GENERIC_ALLOCATOR
> +	help
> +	  Enableѕ drivers to do PCI peer to peer transactions to and from
> +	  bars that are exposed to other devices in the same domain.

s/bars/BARs/ (and similarly below, except in C code)

Similarly, s/dma/DMA/ and s/pci/PCI/ below.

And probably also s/p2p/peer-to-peer DMA/ in messages.

Maybe clarify this domain bit.  Using "domain" suggests the common PCI
segment/domain usage, but I think you really mean something like the
part of the hierarchy where peer-to-peer DMA is guaranteed by the PCI
spec to work, i.e., anything below a single PCI bridge.

> +
> +	  Many PCIe root complexes do not support P2P transactions and
> +	  it's hard to tell which support it with good performance, so
> +	  at this time you will need a PCIe switch.
> +
> +	  If unsure, say N.

> + * pci_p2pmem_add_resource - add memory for use as p2p memory
> + * @pci: the device to add the memory to
> + * @bar: PCI bar to add
> + * @size: size of the memory to add, may be zero to use the whole bar
> + * @offset: offset into the PCI bar
> + *
> + * The memory will be given ZONE_DEVICE struct pages so that it may
> + * be used with any dma request.
> + */
> +int pci_p2pmem_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +			    u64 offset)
> +{
> +	struct dev_pagemap *pgmap;
> +	void *addr;
> +	int error;

Seems like there should be

  if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
    return -EINVAL;

or similar here?

> +	if (WARN_ON(offset >= pci_resource_len(pdev, bar)))
> +		return -EINVAL;

Are these WARN_ONs for debugging purposes, or do you think we need
them in production?  Granted, hitting it would probably be a kernel
driver bug, but still, not sure if the PCI core needs to coddle the
driver author that much.

> +	if (!size)
> +		size = pci_resource_len(pdev, bar) - offset;
> +
> +	if (WARN_ON(size + offset > pci_resource_len(pdev, bar)))
> +		return -EINVAL;
> +
> +	if (!pdev->p2p) {
> +		error = pci_p2pmem_setup(pdev);
> +		if (error)
> +			return error;
> +	}
> +
> +	pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
> +	if (!pgmap)
> +		return -ENOMEM;
> +
> +	pgmap->res.start = pci_resource_start(pdev, bar) + offset;
> +	pgmap->res.end = pgmap->res.start + size - 1;

I'm guessing Christoph's dev_pagemap revamp repo must change
pgmap->res from a pointer to a structure, but I don't see the actual
link in your cover letter.

I think you should set pgmap->res.flags here, too.

> +	pgmap->ref = &pdev->p2p->devmap_ref;
> +	pgmap->type = MEMORY_DEVICE_PCI_P2P;
> +
> +	addr = devm_memremap_pages(&pdev->dev, pgmap);
> +	if (IS_ERR(addr))
> +		return PTR_ERR(addr);
> +
> +	error = gen_pool_add_virt(pdev->p2p->pool, (uintptr_t)addr,
> +			pci_bus_address(pdev, bar) + offset,
> +			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
> +	if (error)
> +		return error;
> +
> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pmem_percpu_kill,
> +					  &pdev->p2p->devmap_ref);
> +	if (error)
> +		return error;
> +
> +	dev_info(&pdev->dev, "added %zdB of p2p memory\n", size);

Can we add %pR and print pgmap->res itself, too?

> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_p2pmem_add_resource);

> + * If a device is behind a switch, we try to find the upstream bridge
> + * port of the switch. This requires two calls to pci_upstream_bridge:
> + * one for the upstream port on the switch, one on the upstream port
> + * for the next level in the hierarchy. Because of this, devices connected
> + * to the root port will be rejected.
> + */
> +static struct pci_dev *get_upstream_switch_port(struct pci_dev *pdev)
> +{
> +	struct pci_dev *up1, *up2;
> +
> +	if (!pdev)
> +		return NULL;
> +
> +	up1 = pci_dev_get(pci_upstream_bridge(pdev));
> +	if (!up1)
> +		return NULL;
> +
> +	up2 = pci_dev_get(pci_upstream_bridge(up1));
> +	pci_dev_put(up1);
> +
> +	return up2;
> +}
> +
> +static bool __upstream_bridges_match(struct pci_dev *upstream,
> +				     struct pci_dev *client)
> +{
> +	struct pci_dev *dma_up;
> +	bool ret = true;
> +
> +	dma_up = get_upstream_switch_port(client);
> +
> +	if (!dma_up) {
> +		dev_dbg(&client->dev, "not a pci device behind a switch\n");

You have a bit of a mix of PCI ("pci device", "bridge") and PCIe
("switch", "switch port") terminology.  I haven't read the rest of the
patches yet, so I don't know if you intend to restrict this to
PCIe-only, e.g., so you can use ACS, or if you want to make it
available on conventional PCI as well.

If the latter, I would use the generic PCI terminology, i.e., "bridge"
instead of "switch".

> + * pci_virt_to_bus - return the pci bus address for a given virtual
> + *	address obtained with pci_alloc_p2pmem
> + * @pdev:	the device the memory was allocated from
> + * @addr:	address of the memory that was allocated
> + */
> +pci_bus_addr_t pci_p2pmem_virt_to_bus(struct pci_dev *pdev, void *addr)
> +{
> +	if (!addr)
> +		return 0;
> +	if (!pdev->p2p)
> +		return 0;
> +
> +	return gen_pool_virt_to_phys(pdev->p2p->pool, (unsigned long)addr);

This doesn't seem right.  A physical address is not the same as a PCI
bus address.  I expected something like pci_bus_address() or
pcibios_resource_to_bus() here.  Am I missing something?  If so, a
clarifying comment would be helpful.

> + * pci_p2pmem_publish - publish the p2p memory for use by other devices
> + *	with pci_p2pmem_find
> + * @pdev:	the device with p2p memory to publish
> + * @publish:	set to true to publish the memory, false to unpublish it
> + */
> +void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
> +{
> +	if (WARN_ON(publish && !pdev->p2p))
> +		return;

Same WARN_ON question -- is this really intended for production?
Doesn't seem like the end user can really do anything with the warning
information.

> diff --git a/include/linux/pci-p2p.h b/include/linux/pci-p2p.h
> new file mode 100644
> index 000000000000..f811c97a5886
> --- /dev/null
> +++ b/include/linux/pci-p2p.h
> @@ -0,0 +1,85 @@
> +#ifndef _LINUX_PCI_P2P_H
> +#define _LINUX_PCI_P2P_H
> +/*
> + * Copyright (c) 2016-2017, Microsemi Corporation
> + * Copyright (c) 2017, Christoph Hellwig.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#include <linux/pci.h>
> +
> +struct block_device;
> +struct scatterlist;

I've been noticing that we're accumulating PCI-related files in
include/linux: pci.h, pci-aspm.h pci-ats.h, pci-dma.h, pcieport_if.h,
etc.  I'm not sure there's value in all those and am thinking maybe
they should just be folded into pci.h.  What do you think?

> +#ifdef CONFIG_PCI_P2P
> +int pci_p2pmem_add_resource(struct pci_dev *pdev, int bar, size_t size,
> +		u64 offset);
> +int pci_p2pmem_add_client(struct list_head *head, struct device *dev);
> ...

Bjorn

  reply	other threads:[~2018-01-04 21:40 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-04 19:01 [PATCH 00/11] Copy Offload in NVMe Fabrics with P2P PCI Memory Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 01/12] pci-p2p: Support peer to peer memory Logan Gunthorpe
2018-01-04 21:40   ` Bjorn Helgaas [this message]
2018-01-04 23:06     ` Logan Gunthorpe
2018-01-04 21:59   ` Bjorn Helgaas
2018-01-05  0:20     ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 02/12] pci-p2p: Add sysfs group to display p2pmem stats Logan Gunthorpe
2018-01-04 21:50   ` Bjorn Helgaas
2018-01-04 22:25     ` Jason Gunthorpe
2018-01-04 23:13     ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 03/12] pci-p2p: Add PCI p2pmem dma mappings to adjust the bus offset Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 04/12] pci-p2p: Clear ACS P2P flags for all client devices Logan Gunthorpe
2018-01-04 21:57   ` Bjorn Helgaas
2018-01-04 22:35     ` Alex Williamson
2018-01-05  0:00       ` Logan Gunthorpe
2018-01-05  1:09         ` Logan Gunthorpe
2018-01-05  3:33         ` Alex Williamson
2018-01-05  6:47           ` Jerome Glisse
2018-01-05 15:41             ` Alex Williamson
2018-01-05 17:10           ` Logan Gunthorpe
2018-01-05 17:18             ` Alex Williamson
2018-01-04 19:01 ` [PATCH 05/12] block: Introduce PCI P2P flags for request and request queue Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]() Logan Gunthorpe
2018-01-04 19:22   ` Jason Gunthorpe
2018-01-04 19:52     ` Logan Gunthorpe
2018-01-04 22:13       ` Jason Gunthorpe
2018-01-04 23:44         ` Logan Gunthorpe
2018-01-05  4:50           ` Jason Gunthorpe
2018-01-08 14:59             ` Christoph Hellwig
2018-01-08 18:09               ` Jason Gunthorpe
2018-01-08 18:17                 ` Logan Gunthorpe
2018-01-08 18:29                   ` Jason Gunthorpe
2018-01-08 18:34                 ` Christoph Hellwig
2018-01-08 18:44                   ` Logan Gunthorpe
2018-01-08 18:57                     ` Christoph Hellwig
2018-01-08 19:05                       ` Logan Gunthorpe
2018-01-09 16:47                         ` Christoph Hellwig
2018-01-08 19:49                       ` Jason Gunthorpe
2018-01-09 16:46                         ` Christoph Hellwig
2018-01-09 17:10                           ` Jason Gunthorpe
2018-01-08 19:01                   ` Jason Gunthorpe
2018-01-09 16:55                     ` Christoph Hellwig
2018-01-04 19:01 ` [PATCH 07/12] nvme-pci: clean up CMB initialization Logan Gunthorpe
2018-01-04 19:08   ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 08/12] nvme-pci: clean up SMBSZ bit definitions Logan Gunthorpe
2018-01-04 19:08   ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 09/12] nvme-pci: Use PCI p2pmem subsystem to manage the CMB Logan Gunthorpe
2018-01-05 15:30   ` Marta Rybczynska
2018-01-05 18:14     ` Logan Gunthorpe
2018-01-05 18:11   ` Keith Busch
2018-01-05 18:19     ` Logan Gunthorpe
2018-01-05 19:01       ` Keith Busch
2018-01-05 19:04         ` Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 10/12] nvme-pci: Add support for P2P memory in requests Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 11/12] nvme-pci: Add a quirk for a pseudo CMB Logan Gunthorpe
2018-01-04 19:01 ` [PATCH 12/12] nvmet: Optionally use PCI P2P memory Logan Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180104214028.GD189897@bhelgaas-glaptop.roam.corp.google.com \
    --to=helgaas@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=jgg@mellanox.com \
    --cc=jglisse@redhat.com \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=maxg@mellanox.com \
    --cc=sagi@grimberg.me \
    --cc=sbates@raithlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox