From: Thomas Zimmermann <tzimmermann@suse.de>
To: Matthew Wilcox <willy@infradead.org>, linux-kernel@vger.kernel.org
Cc: nvdimm@lists.linux.dev, linux-rdma@vger.kernel.org,
John Hubbard <jhubbard@nvidia.com>,
dri-devel@lists.freedesktop.org, Ming Lei <ming.lei@redhat.com>,
linux-block@vger.kernel.org, linux-mm@kvack.org,
Jason Gunthorpe <jgg@nvidia.com>,
netdev@vger.kernel.org, Joao Martins <joao.m.martins@oracle.com>,
Logan Gunthorpe <logang@deltatee.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: Phyr Starter
Date: Tue, 11 Jan 2022 12:40:10 +0100
Message-ID: <f7bd672f-dfa8-93fa-e101-e57b90faeb1e@suse.de>
In-Reply-To: <YdyKWeU0HTv8m7wD@casper.infradead.org>
Hi
On 10.01.22 at 20:34, Matthew Wilcox wrote:
> TLDR: I want to introduce a new data type:
>
> struct phyr {
> 	phys_addr_t addr;
> 	size_t len;
> };
Did you look at struct dma_buf_map? [1]

For graphics framebuffers, we have the problem that these buffers can be
in I/O or system memory (and possibly move between them). Linux's
traditional interfaces (memcpy_toio(), etc.) don't deal with the
differences well.

So we added struct dma_buf_map as an abstraction of the buffer address.
There are interfaces for accessing and copying the data. I also have a
patchset somewhere that adds caching information to the structure.

struct dma_buf_map is for graphics, but it is really just another memory
API. When we introduced struct dma_buf_map, we thought of additional use
cases, but couldn't really find any at the time. Maybe what you're
describing is that use case, and struct dma_buf_map could be extended for
this purpose.
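To make the idea concrete, here is a minimal userspace sketch of the
abstraction. The struct layout and the helper names follow the v5.16
header at [1], but this is a simplified model, not the kernel code:
the kernel's __iomem annotation is dropped, and fake_memcpy_toio() is a
hypothetical stand-in for memcpy_toio() so that the example builds
outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified userspace model of struct dma_buf_map (v5.16). */
struct dma_buf_map {
	union {
		void *vaddr_iomem;	/* I/O memory; "void __iomem *" in the kernel */
		void *vaddr;		/* system memory */
	};
	bool is_iomem;			/* selects which union member is valid */
};

static inline void dma_buf_map_set_vaddr(struct dma_buf_map *map, void *vaddr)
{
	map->vaddr = vaddr;
	map->is_iomem = false;
}

static inline void dma_buf_map_set_vaddr_iomem(struct dma_buf_map *map,
					       void *vaddr_iomem)
{
	map->vaddr_iomem = vaddr_iomem;
	map->is_iomem = true;
}

/* Hypothetical stand-in for memcpy_toio(): a plain byte-wise volatile copy. */
static void fake_memcpy_toio(volatile void *dst, const void *src, size_t len)
{
	volatile unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
}

/* Copy into the mapping; the caller does not need to know the memory type. */
static inline void dma_buf_map_memcpy_to(struct dma_buf_map *dst,
					 const void *src, size_t len)
{
	assert(dst->vaddr != NULL);	/* the mapping must have been set */

	if (dst->is_iomem)
		fake_memcpy_toio(dst->vaddr_iomem, src, len);
	else
		memcpy(dst->vaddr, src, len);
}
```

The point of the design is that the choice between the I/O and the
system-memory accessors is made in one place, inside the helper, instead
of in every caller.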
Best regards
Thomas

[1] https://elixir.bootlin.com/linux/v5.16/source/include/linux/dma-buf-map.h#L115
>
> and use it to replace bio_vec as well as using it to replace the array
> of struct pages used by get_user_pages() and friends.
>
> ---
>
> There are two distinct problems I want to address: doing I/O to memory
> which does not have a struct page and efficiently doing I/O to large
> blobs of physically contiguous memory, regardless of whether it has a
> struct page. There are some other improvements which I regard as minor.
>
> There are many types of memory that one might want to do I/O to that do
> not have a struct page, some examples:
>  - Memory on a graphics card (or other PCI card, but gfx seems to be
>    the primary provider of DRAM on the PCI bus today)
>  - DAX, or other pmem (there are some fake pages today, but this is
>    mostly a workaround for the IO problem today)
>  - Guest memory being accessed from the hypervisor (KVM needs to
>    create struct pages to make this happen. Xen doesn't ...)
> All of these kinds of memories can be addressed by the CPU and so also
> by a bus master. That is, there is a physical address that the CPU
> can use which will address this memory, and there is a way to convert
> that to a DMA address which can be programmed into another device.
> There's no intent here to support memory which can be accessed by a
> complex scheme like writing an address to a control register and then
> accessing the memory through a FIFO; this is for memory which can be
> accessed by DMA and CPU loads and stores.
>
> For get_user_pages() and friends, we currently fill an array of struct
> pages, each one representing PAGE_SIZE bytes. For an application that
> is using 1GB hugepages, writing 2^18 entries is a significant overhead.
> It also makes drivers hard to write as they have to recoalesce the
> struct pages, even though the VM can tell it whether those 2^18 pages
> are contiguous.
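For reference, the 2^18 figure is 1GB / 4KB = 2^30 / 2^12, assuming 4KB
base pages. The recoalescing that drivers do today can be sketched in
userspace roughly as follows. Everything except struct phyr itself is an
assumption made for illustration: phyr_coalesce() and PHYR_PAGE_SIZE are
hypothetical names, and phys_addr_t is modeled as a 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHYR_PAGE_SIZE 4096u	/* assumed base page size */

typedef uint64_t phys_addr_t;	/* userspace stand-in for the kernel type */

struct phyr {
	phys_addr_t addr;
	size_t len;
};

/*
 * Merge an array of per-page physical addresses into physically
 * contiguous ranges. Returns the number of ranges written to out;
 * stops early if more than max_out ranges would be needed.
 */
static size_t phyr_coalesce(const phys_addr_t *pages, size_t npages,
			    struct phyr *out, size_t max_out)
{
	size_t n = 0;

	assert(pages != NULL && out != NULL);

	for (size_t i = 0; i < npages; i++) {
		/* Extend the current range if this page directly follows it. */
		if (n > 0 && out[n - 1].addr + out[n - 1].len == pages[i]) {
			out[n - 1].len += PHYR_PAGE_SIZE;
			continue;
		}
		if (n == max_out)
			break;
		/* Otherwise start a new range at this page. */
		out[n].addr = pages[i];
		out[n].len = PHYR_PAGE_SIZE;
		n++;
	}
	return n;
}
```

With this model, a physically contiguous 1GB hugepage collapses from 2^18
page entries into a single range, which is the saving the proposal is
after when the range is produced directly instead of being reconstructed.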
>
> On the minor side, struct phyr can represent any mappable chunk of memory.
> A bio_vec is limited to 2^32 bytes, while on 64-bit machines a phyr
> can represent larger than 4GB. A phyr is the same size as a bio_vec
> on 64-bit (16 bytes), and the same size for 32-bit with PAE (12 bytes).
> It is smaller for 32-bit machines without PAE (8 bytes instead of 12).
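The 64-bit size claim can be checked with a small userspace model. The
bio_vec layout below (a page pointer plus two unsigned ints) matches the
kernel's definition as far as I know; phys_addr_t as a 64-bit integer is
an assumption, and the check is only compiled on hosts with 64-bit
pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* assumes 64-bit physical addresses */

struct page;			/* opaque here; only a pointer to it is stored */

/* Model of the kernel's struct bio_vec. */
struct bio_vec {
	struct page *bv_page;
	unsigned int bv_len;	/* 32-bit length, hence the 2^32 byte limit */
	unsigned int bv_offset;
};

struct phyr {
	phys_addr_t addr;
	size_t len;		/* 64-bit on 64-bit machines, so > 4GB fits */
};

#if UINTPTR_MAX == UINT64_MAX
/* 64-bit: both are 16 bytes, as stated in the quoted text. */
static_assert(sizeof(struct bio_vec) == 16, "bio_vec is 16 bytes on 64-bit");
static_assert(sizeof(struct phyr) == 16, "phyr is 16 bytes on 64-bit");
#endif
```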
>
> Finally, it may be possible to stop using scatterlist to describe the
> input to the DMA-mapping operation. We may be able to get struct
> scatterlist down to just dma_address and dma_length, with chaining
> handled through an enclosing struct.
>
> I would like to see phyr replace bio_vec everywhere it's currently used.
> I don't have time to do that work now because I'm busy with folios.
> If someone else wants to take that on, I shall cheer from the sidelines.
> What I do intend to do is:
>
>  - Add an interface to gup.c to pin/unpin N phyrs
>  - Add a sg_map_phyrs()
>    This will take an array of phyrs and allocate an sg for them
>  - Whatever else I need to do to make one RDMA driver happy with
>    this scheme
>
> At that point, I intend to stop and let others more familiar with this
> area of the kernel continue the conversion of drivers.
>
> P.S. If you've had the Prodigy song running through your head the whole
> time you've been reading this email ... I'm sorry / You're welcome.
> If people insist, we can rename this to phys_range or something boring,
> but I quite like the spelling of phyr with the pronunciation of "fire".
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev