From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@ziepe.ca>,
Mina Almasry <almasrymina@google.com>,
Christoph Hellwig <hch@lst.de>,
John Hubbard <jhubbard@nvidia.com>,
"Dan Williams" <dan.j.williams@intel.com>
Cc: David Ahern <dsahern@kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
"Jesper Dangaard Brouer" <jbrouer@redhat.com>,
<brouer@redhat.com>, Alexander Duyck <alexander.duyck@gmail.com>,
Yunsheng Lin <linyunsheng@huawei.com>, <davem@davemloft.net>,
<pabeni@redhat.com>, <netdev@vger.kernel.org>,
<linux-kernel@vger.kernel.org>,
Lorenzo Bianconi <lorenzo@kernel.org>,
"Yisen Zhuang" <yisen.zhuang@huawei.com>,
Salil Mehta <salil.mehta@huawei.com>,
"Eric Dumazet" <edumazet@google.com>,
Sunil Goutham <sgoutham@marvell.com>,
"Geetha sowjanya" <gakula@marvell.com>,
Subbaraya Sundeep <sbhatta@marvell.com>,
hariprasad <hkelam@marvell.com>,
Saeed Mahameed <saeedm@nvidia.com>,
"Leon Romanovsky" <leon@kernel.org>, Felix Fietkau <nbd@nbd.name>,
Ryder Lee <ryder.lee@mediatek.com>,
Shayne Chen <shayne.chen@mediatek.com>,
Sean Wang <sean.wang@mediatek.com>, Kalle Valo <kvalo@kernel.org>,
Matthias Brugger <matthias.bgg@gmail.com>,
AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
<linux-rdma@vger.kernel.org>, <linux-wireless@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>,
<linux-mediatek@lists.infradead.org>,
Jonathan Lemon <jonathan.lemon@gmail.com>
Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)
Date: Mon, 10 Jul 2023 23:52:14 -0700 [thread overview]
Message-ID: <64acfc1e46a80_8e17829462@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <ZKyZBbKEpmkFkpWV@ziepe.ca>
Jason Gunthorpe wrote:
> On Mon, Jul 10, 2023 at 04:02:59PM -0700, Mina Almasry wrote:
> > On Mon, Jul 10, 2023 at 10:44 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Jul 05, 2023 at 06:17:39PM -0700, Mina Almasry wrote:
> > >
> > > > Another issue is that in networks with low MTU, we could be DMAing
> > > > 1400/1500 bytes into each allocation, which is problematic if the
> > > > allocation is 8K+. I would need to investigate a bit to see if/how to
> > > > solve that, and we may end up having to split the page and again run
> > > > into the 'not enough room in struct page' problem.
> > >
> > > You don't have an intree driver to use this with, so who knows, but
> > > the out of tree GPU drivers tend to use a 64k memory management page
> > > size, and I don't expect you'd make progress with a design where a 64K
> > > naturaly sized allocator is producing 4k/8k non-compound pages just
> > > for netdev. We are still struggling with pagemap support for variable
> > > page size folios, so there is a bunch of technical blockers before
> > > drivers could do this.
> > >
> > > This is why it is so important to come with a complete in-tree
> > > solution, as we cannot review this design if your work is done with
> > > hacked up out of tree drivers.
> > >
> >
> > I think you're assuming the proposal requires dma-buf exporter driver
> > changes, and I have a 'hacked up out of tree driver' not visible to
> > you.
>
> Oh, I thought it was obvious what you did in patch 1 was a total
> non-starter when I said you can't abuse the ZONE_DEVICE pages like
> this.
>
> You must create ZONE_DEVICE P2P pages, not MEMORY_DEVICE_PRIVATE to
> represent P2P memory, and you can't do that automatically from the
> dmabuf core code.
>
> Without doing this the DMA API doesn't actually work properly because
> it doesn't have enough information to know about what the underlying
> exporter is.
>
> The entire point of DEVICE_PRIVATE is that the page content, and
> access to the page's physical location, is *explicitly* unavailable to
> anyone but the pgmap owner.
>
> > > Fully and properly adding P2P ZONE_DEVICE to a real world driver is a
> > > pretty big ask still.
> >
> > There is no such ask.
>
> Well, there is from me if you want to use stuct pages as handles for
> P2P memory. That is what we have defined in the pgmap area.
>
> Also I should warn you that your 'option 2' is being NAK'd by
> Christoph for now, we are not adding any new code around DMABUF's
> hacky use of NULL sg_page scatterlist for P2P memory either. I've been
> working on solutions here but it is slow going.
>
> > On dma-buf changes required. I do need approval from the dma-buf
> > maintainers,
>
> It is a neat hack, of sorts, to abuse DEVICE_PRIVATE to create struct
> pages for the exclusive use of pagepool - but you need more approval
> than just dmabuf maintainers to abuse the pgmap framework like
> this.
>
> At least from my position I want to see MEMORY_DEVICE_PCI_P2PDMA used
> to represent P2P memory. You haven't CC'd anyone from the mm community
> so I've added some people here to see if there are other opinions.
>
> To be clear, you need an explicit ack from mm people on the abusive
> use of pgmap in patch 1.
This thread is long so I am only reacting to this first message I am
copied on, but yes agree with Jason anything peer-to-peer DMA needs to
reuse p2pdma and it definitely needs in-tree producers and consumers for
all infrastructure.
One piece of technical debt standing in the way of any proposed
expansion of pgmap usage is the final resolution of this topic:
https://lore.kernel.org/all/166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com/
I am also generally reticent to entertain taking on new pgmap
maintenance. I.e. now that accelerator memory attached over a coherent
link is an open hardware standard (CXL) that addresses what pgmap
infrastructure enabled in software.
> I know it is not what you want to hear, but there are actual reasons
> why the P2P DMA problem has been festering for so long, and hacky
> quick fixes like this are not going to be enough..
>
> Jason
next prev parent reply other threads:[~2023-07-11 6:52 UTC|newest]
Thread overview: 92+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-12 13:02 [PATCH net-next v4 0/5] introduce page_pool_alloc() API Yunsheng Lin
2023-06-12 13:02 ` [PATCH net-next v4 1/5] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
2023-06-13 13:30 ` Alexander Lobakin
2023-06-14 3:36 ` Yunsheng Lin
2023-06-14 3:55 ` Jakub Kicinski
2023-06-14 10:52 ` Alexander Lobakin
2023-06-14 12:15 ` Yunsheng Lin
2023-06-14 12:42 ` Alexander Lobakin
2023-06-14 4:09 ` Jakub Kicinski
2023-06-14 11:42 ` Yunsheng Lin
2023-06-14 17:07 ` Jakub Kicinski
2023-06-15 7:29 ` Yunsheng Lin
2023-06-12 13:02 ` [PATCH net-next v4 2/5] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
2023-06-14 4:33 ` Jakub Kicinski
2023-06-14 11:55 ` Yunsheng Lin
2023-06-12 13:02 ` [PATCH net-next v4 3/5] page_pool: introduce page_pool_alloc() API Yunsheng Lin
2023-06-13 13:08 ` Alexander Lobakin
2023-06-13 13:11 ` Alexander Lobakin
2023-06-14 3:17 ` Yunsheng Lin
2023-06-12 13:02 ` [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag Yunsheng Lin
2023-06-14 17:19 ` Jakub Kicinski
2023-06-15 7:17 ` Yunsheng Lin
2023-06-15 16:51 ` Jakub Kicinski
2023-06-15 18:26 ` Alexander Duyck
2023-06-16 12:20 ` Yunsheng Lin
2023-06-16 15:01 ` Alexander Duyck
2023-06-16 18:59 ` Jesper Dangaard Brouer
2023-06-16 19:21 ` Jakub Kicinski
2023-06-16 20:42 ` Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag) Jesper Dangaard Brouer
2023-06-19 18:07 ` Jakub Kicinski
2023-06-20 15:12 ` Jesper Dangaard Brouer
2023-06-20 15:39 ` Jakub Kicinski
2023-06-30 2:27 ` Mina Almasry
2023-07-03 4:20 ` David Ahern
2023-07-03 6:22 ` Mina Almasry
2023-07-03 14:45 ` David Ahern
2023-07-03 17:13 ` Eric Dumazet
2023-07-03 17:23 ` David Ahern
2023-07-06 1:19 ` Mina Almasry
2023-07-03 17:15 ` Eric Dumazet
2023-07-03 17:25 ` David Ahern
2023-07-03 21:43 ` Jason Gunthorpe
2023-07-06 1:17 ` Mina Almasry
2023-07-10 17:44 ` Jason Gunthorpe
2023-07-10 23:02 ` Mina Almasry
2023-07-10 23:49 ` Jason Gunthorpe
2023-07-11 0:45 ` Mina Almasry
2023-07-11 13:11 ` Jason Gunthorpe
2023-07-11 17:24 ` Mina Almasry
2023-07-11 4:27 ` Christoph Hellwig
2023-07-11 4:59 ` Jakub Kicinski
2023-07-11 5:04 ` Christoph Hellwig
2023-07-11 12:05 ` Jason Gunthorpe
2023-07-11 16:00 ` Jakub Kicinski
2023-07-11 16:20 ` David Ahern
2023-07-11 16:32 ` Jakub Kicinski
2023-07-11 17:06 ` Mina Almasry
2023-07-11 20:39 ` Jakub Kicinski
2023-07-11 21:39 ` David Ahern
2023-07-12 3:42 ` Mina Almasry
2023-07-12 7:55 ` Christian König
2023-07-12 13:03 ` Jason Gunthorpe
2023-07-12 13:35 ` Christian König
2023-07-12 22:41 ` Mina Almasry
2023-07-12 13:01 ` Jason Gunthorpe
2023-07-12 20:16 ` Mina Almasry
2023-07-12 23:57 ` Jason Gunthorpe
2023-07-13 7:56 ` Christian König
2023-07-14 14:55 ` Mina Almasry
2023-07-14 15:18 ` David Ahern
2023-07-17 2:05 ` Mina Almasry
2023-07-17 3:08 ` David Ahern
2023-07-14 15:55 ` Jason Gunthorpe
2023-07-17 1:53 ` Mina Almasry
2023-07-24 14:56 ` Jesper Dangaard Brouer
2023-07-24 16:28 ` Jason Gunthorpe
2023-07-25 4:04 ` Mina Almasry
2023-07-26 17:36 ` Jesper Dangaard Brouer
2023-07-11 16:42 ` Jason Gunthorpe
2023-07-11 17:06 ` Jakub Kicinski
2023-07-11 18:52 ` Jason Gunthorpe
2023-07-11 20:34 ` Jakub Kicinski
2023-07-11 23:56 ` Jason Gunthorpe
2023-07-11 6:52 ` Dan Williams [this message]
2023-07-06 16:50 ` Jakub Kicinski
2023-06-17 12:19 ` [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag Yunsheng Lin
2023-06-15 13:59 ` Alexander Lobakin
2023-06-12 13:02 ` [PATCH net-next v4 5/5] page_pool: update document about frag API Yunsheng Lin
2023-06-14 4:40 ` Jakub Kicinski
2023-06-14 12:04 ` Yunsheng Lin
2023-06-14 16:56 ` Jakub Kicinski
2023-06-15 6:49 ` Yunsheng Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=64acfc1e46a80_8e17829462@dwillia2-xfh.jf.intel.com.notmuch \
--to=dan.j.williams@intel.com \
--cc=alexander.duyck@gmail.com \
--cc=almasrymina@google.com \
--cc=angelogioacchino.delregno@collabora.com \
--cc=brouer@redhat.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=gakula@marvell.com \
--cc=hawk@kernel.org \
--cc=hch@lst.de \
--cc=hkelam@marvell.com \
--cc=ilias.apalodimas@linaro.org \
--cc=jbrouer@redhat.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=jonathan.lemon@gmail.com \
--cc=kuba@kernel.org \
--cc=kvalo@kernel.org \
--cc=leon@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux-rdma@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=linyunsheng@huawei.com \
--cc=lorenzo@kernel.org \
--cc=matthias.bgg@gmail.com \
--cc=nbd@nbd.name \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=ryder.lee@mediatek.com \
--cc=saeedm@nvidia.com \
--cc=salil.mehta@huawei.com \
--cc=sbhatta@marvell.com \
--cc=sean.wang@mediatek.com \
--cc=sgoutham@marvell.com \
--cc=shayne.chen@mediatek.com \
--cc=yisen.zhuang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).