From: Christoph Hellwig <hch@infradead.org>
To: davide rossetti <davide.rossetti@gmail.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
	Haggai Eran <haggaie@mellanox.com>,
	Kovalyov Artemy <artemyko@mellanox.com>,
	"dledford@redhat.com" <dledford@redhat.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"leon@leon.ro" <leon@leon.ro>, Sagi Grimberg <sagig@mellanox.com>
Subject: Re: [RFC 0/7] Peer-direct memory
Date: Wed, 17 Feb 2016 00:44:12 -0800
Message-ID: <20160217084412.GA13616@infradead.org>
In-Reply-To: <CAPSaadx3vNBSxoWuvjrTp2n8_-DVqofttFGZRR+X8zdWwV86nw@mail.gmail.com>

[disclaimer: I've been involved with ZONE_DEVICE support and the pmem
 driver and wrote parts of the code and discussed a lot of the tradeoffs
 on how we handle I/O to memory in BARs]

On Tue, Feb 16, 2016 at 08:13:58PM -0800, davide rossetti wrote:
> 1) I see mm as appropriate for real memory, i.e. something that
> user-space apps can pass around.

mm is memory management, and this clearly falls under the umbrella,
so it absolutely needs to be under mm/ and reviewed by the linux-mm
crowd.

> This is not totally true for BAR
> memory, for instance:
>  a) as long as CPU-initiated atomic ops are not supported on the BAR
> space of PCIe devices.
>  b) OTOH, CPU reads from BAR space are awful (bandwidth is abysmal,
> ~10 MB/s), while high-bandwidth writes require vector instructions (at
> least on x86_64).
> Bottom line is, BAR mappings are not like plain memory.

That doesn't change how they are managed.  We've always supported mapping
BARs to userspace in various drivers (see the driver mmap sketch after
the list below), and the only real news with things like the pmem driver
with DAX, or some of the things people want to do with the NVMe
controller memory buffer, is that there are much bigger quantities of
it, and:

 a) people want to be able to have cacheable mappings of various kinds
    instead of the old uncacheable default.
 b) we want to be able to DMA (including RDMA) to the regions in the
    BARs.
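
For reference, the classic uncached BAR mapping mentioned above has
always been a few lines of driver boilerplate.  A minimal sketch - the
my_dev structure and the surrounding driver are hypothetical, only the
PCI and mm helpers are real kernel APIs:

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pci.h>

/* hypothetical per-device structure */
struct my_dev {
	struct pci_dev *pdev;
};

/* ->mmap handler exposing BAR 0 with the old uncached default */
static int my_dev_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct my_dev *dev = file->private_data;
	resource_size_t start = pci_resource_start(dev->pdev, 0);
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > pci_resource_len(dev->pdev, 0))
		return -EINVAL;

	/* the old uncacheable default from a) above */
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	return io_remap_pfn_range(vma, vma->vm_start,
				  start >> PAGE_SHIFT,
				  size, vma->vm_page_prot);
}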

a) is something that needs small amounts of work in all kinds of areas
to be done properly, but in principle GPU drivers have been doing this
forever using all kinds of hacks.
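
As an aside on the quoted point about CPU access: once a BAR is mapped
write-combining, the usual way to get decent write bandwidth from the
CPU is non-temporal stores.  A userspace sketch - the /dev/mydev node
and the assumption that its driver maps BAR space write-combined are
hypothetical, the intrinsics are standard SSE2:

/* high-bandwidth writes to a (assumed) write-combining BAR mapping */
#include <emmintrin.h>	/* _mm_stream_si128, _mm_sfence */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void bar_write(void *bar, const void *src, size_t len)
{
	__m128i *dst = bar;
	const __m128i *s = src;

	/* non-temporal stores bypass the cache and let the WC buffers
	 * combine them into full-size bursts on the bus */
	for (size_t i = 0; i < len / sizeof(__m128i); i++)
		_mm_stream_si128(&dst[i], s[i]);
	_mm_sfence();	/* drain the WC buffers before notifying the device */
}

int main(void)
{
	int fd = open("/dev/mydev", O_RDWR);	/* hypothetical node */
	size_t len = 4096;
	void *bar;
	uint8_t buf[4096] __attribute__((aligned(16)));

	if (fd < 0)
		return 1;
	bar = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (bar == MAP_FAILED)
		return 1;
	memset(buf, 0xab, sizeof(buf));
	bar_write(bar, buf, len);
	munmap(bar, len);
	close(fd);
	return 0;
}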

b) is the real issue.  The Linux DMA support code doesn't really operate
on just physical addresses, but on page structures, and we don't
allocate those for BARs.  We investigated two ways to address this:
1) allow DMA operations without struct page, and 2) create struct page
structures for BARs that we want to be able to use DMA operations on.
For various reasons option 2) was favored, and this is how we ended up
with ZONE_DEVICE.  Read the linux-mm and linux-nvdimm lists for the
lengthy discussions on how we ended up here.
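
For the curious: creating those struct pages boils down to a single
call.  A rough sketch against the original two-argument
devm_memremap_pages() - note the signature has changed more than once in
later kernels, the surrounding driver is hypothetical, and whether the
platform's dma ops actually accept these pages is a separate question:

#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/pci.h>

static int my_add_bar_pages(struct pci_dev *pdev)
{
	struct resource *res = &pdev->resource[0];	/* BAR 0 */
	struct page *page;
	dma_addr_t dma;
	void *addr;

	/* build a memmap (struct page array) covering the BAR range */
	addr = devm_memremap_pages(&pdev->dev, res);
	if (IS_ERR(addr))
		return PTR_ERR(addr);

	/* the ordinary page-based DMA API can now target the BAR */
	page = virt_to_page(addr);
	dma = dma_map_page(&pdev->dev, page, 0, PAGE_SIZE,
			   DMA_FROM_DEVICE);
	if (dma_mapping_error(&pdev->dev, dma))
		return -ENOMEM;
	dma_unmap_page(&pdev->dev, dma, PAGE_SIZE, DMA_FROM_DEVICE);
	return 0;
}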

Additional issues, like which instructions to use for access, build on
top of these basic building blocks.

> 2) Instead, I see it as appropriate that two sophisticated devices,
> like an IB NIC and a storage/accelerator device, can freely target
> each other for I/O, i.e. exchange peer-to-peer PCIe transactions. And
> as long as the existing sophisticated initiators are confined to the
> RDMA subsystem, that is where this support belongs.

It doesn't.  There is absolutely nothing RDMA-specific here - please
work with the overall community to do the right thing.
