From: Christoph Hellwig <hch@lst.de>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Joao Martins <joao.m.martins@oracle.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Ming Lei <ming.lei@redhat.com>,
	linux-block@vger.kernel.org, netdev@vger.kernel.org,
	linux-mm@kvack.org, linux-rdma@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nvdimm@lists.linux.dev
Subject: Re: Phyr Starter
Date: Thu, 20 Jan 2022 14:56:02 +0100	[thread overview]
Message-ID: <20220120135602.GA11223@lst.de> (raw)
In-Reply-To: <20220111004126.GJ2328285@nvidia.com>

On Mon, Jan 10, 2022 at 08:41:26PM -0400, Jason Gunthorpe wrote:
> > Finally, it may be possible to stop using scatterlist to describe the
> > input to the DMA-mapping operation.  We may be able to get struct
> > scatterlist down to just dma_address and dma_length, with chaining
> > handled through an enclosing struct.
> 
> Can you talk about this some more? IMHO one of the key properties of
> the scatterlist is that it can hold huge amounts of pages without
> having to do any kind of special allocation due to the chaining.
> 
> The same will be true of the phyr idea right?

No special allocations as in no vmalloc?  The chaining still has to
allocate memory using a mempool.
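To make the point about chaining concrete, here is a minimal userspace sketch under stated assumptions. All names (struct phyr, struct seg_chunk, struct chunk_pool, CHUNK_SEGS, POOL_RESERVE, chain_add() and friends) are hypothetical. A chain is built from fixed-size links, so no single huge allocation is needed, but every new link is still an allocation. The pool loosely mimics the kernel's mempool_alloc()/mempool_free() behaviour: try the normal allocator first, fall back to a small preallocated reserve, and refill that reserve on free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SEGS   8   /* hypothetical number of entries per chain link */
#define POOL_RESERVE 4   /* hypothetical reserved links (like a mempool's min_nr) */

struct phyr { uint64_t addr; uint32_t len; };  /* hypothetical segment */

struct seg_chunk {
    struct seg_chunk *next;        /* next link in the chain, or NULL */
    unsigned int nr;               /* used entries in segs[] */
    struct phyr segs[CHUNK_SEGS];
};

/* Userspace stand-in for a mempool: a few links are preallocated so that
 * allocation can still make progress when the normal allocator fails. */
struct chunk_pool {
    struct seg_chunk *reserve[POOL_RESERVE];
    unsigned int nr_free;
};

/* On failure, nr_free still counts the links that were allocated,
 * so pool_destroy() cleans up correctly. */
static int pool_init(struct chunk_pool *p)
{
    for (p->nr_free = 0; p->nr_free < POOL_RESERVE; p->nr_free++) {
        p->reserve[p->nr_free] = malloc(sizeof(struct seg_chunk));
        if (!p->reserve[p->nr_free])
            return -1;
    }
    return 0;
}

/* Try the normal allocator first, then fall back to the reserve. */
static struct seg_chunk *pool_alloc(struct chunk_pool *p)
{
    struct seg_chunk *c = malloc(sizeof(*c));

    if (!c && p->nr_free > 0)
        c = p->reserve[--p->nr_free];
    if (c) {
        c->next = NULL;
        c->nr = 0;
    }
    return c;
}

/* Refill the reserve before handing memory back to the allocator. */
static void pool_free(struct chunk_pool *p, struct seg_chunk *c)
{
    if (p->nr_free < POOL_RESERVE)
        p->reserve[p->nr_free++] = c;
    else
        free(c);
}

static void pool_destroy(struct chunk_pool *p)
{
    while (p->nr_free > 0)
        free(p->reserve[--p->nr_free]);
}

/* Append one segment, growing the chain by one link when the tail is full. */
static int chain_add(struct chunk_pool *p, struct seg_chunk **head,
                     struct seg_chunk **tail, uint64_t addr, uint32_t len)
{
    assert(len > 0);
    if (!*tail || (*tail)->nr == CHUNK_SEGS) {
        struct seg_chunk *c = pool_alloc(p);

        if (!c)
            return -1;
        if (*tail)
            (*tail)->next = c;
        else
            *head = c;
        *tail = c;
    }
    (*tail)->segs[(*tail)->nr++] = (struct phyr){ .addr = addr, .len = len };
    return 0;
}

/* Return every link to the pool (which frees it once the reserve is full). */
static void chain_free(struct chunk_pool *p, struct seg_chunk *head)
{
    while (head) {
        struct seg_chunk *next = head->next;

        pool_free(p, head);
        head = next;
    }
}
```

Adding 20 segments with CHUNK_SEGS = 8 produces three links holding 8, 8 and 4 entries: the table can grow without bound, but each link costs one trip to the pool.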

Anyway, to explain my idea which is very similar but not identical to
the one willy has:

 - on the input side to dma mapping the bio_vecs (or phyrs) are chained
   as bios or whatever the containing structure is.  These already exist
   and have infrastructure at least in the block layer
 - on the output side I plan for two options:

	1) we have a sane IOMMU and everything will be coalesced into a
	   single dma_range.  This requires setting the block layer
	   merge boundary to match the IOMMU page size, but that is
	   a very good thing to do anyway.
	2) we have no IOMMU (or a weird one) and get one output dma_range
	   per input bio_vec.  We'd either have to support chaining or use
	   vmalloc for huge numbers of entries.
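The two output options can be sketched in C, purely as an illustration under assumptions. struct phyr, struct dma_range, can_coalesce(), map_with_iommu(), map_direct() and SKETCH_IOMMU_PAGE_SIZE are hypothetical names, the IOVA base is assumed to be page aligned, and the no-IOMMU case assumes DMA address == physical address. With an IOMMU, the inputs fit into one contiguous IOVA range only if every interior boundary sits on an IOMMU page boundary, which is why the block layer merge boundary has to match the IOMMU page size. Without one, there is one output per input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IOMMU_PAGE_SIZE 4096u   /* assumed IOMMU page size */

struct phyr { uint64_t addr; uint32_t len; };      /* hypothetical input segment */
struct dma_range { uint64_t addr; uint64_t len; }; /* hypothetical output range */

/* Interior boundaries must fall on IOMMU page boundaries for the segments
 * to be laid out back to back in a single IOVA range. */
static bool can_coalesce(const struct phyr *in, size_t n)
{
    const uint64_t mask = SKETCH_IOMMU_PAGE_SIZE - 1;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && (in[i].addr & mask))
            return false;   /* interior segment must start page aligned */
        if (i + 1 < n && ((in[i].addr + in[i].len) & mask))
            return false;   /* interior segment must end page aligned */
    }
    return true;
}

/* Option 1: everything becomes a single dma_range starting at a page-aligned
 * IOVA base.  Returns the number of outputs, or 0 if coalescing is impossible. */
static size_t map_with_iommu(const struct phyr *in, size_t n,
                             uint64_t iova_base, struct dma_range *out)
{
    uint64_t total = 0;

    if (n == 0 || !can_coalesce(in, n))
        return 0;
    for (size_t i = 0; i < n; i++)
        total += in[i].len;
    /* Only the first segment may start mid-page; keep its in-page offset. */
    out[0].addr = iova_base + (in[0].addr & (SKETCH_IOMMU_PAGE_SIZE - 1));
    out[0].len = total;
    return 1;
}

/* Option 2: no IOMMU (assuming DMA address == physical address): one output
 * range per input segment, so the output array is as long as the input. */
static size_t map_direct(const struct phyr *in, size_t n, struct dma_range *out)
{
    for (size_t i = 0; i < n; i++) {
        assert(in[i].len > 0);
        out[i].addr = in[i].addr;
        out[i].len = in[i].len;
    }
    return n;
}
```

For example, segments {0x10000200, 0xe00}, {0x20000000, 0x2000} and {0x30000000, 0x100} meet the boundary rule, so with an IOVA base of 0x80000000 they collapse into one range at 0x80000200 of length 0x2f00. map_direct() returns three ranges for the same input.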

> If you limit to that scenario then we can be more optimal because
> things like byte granular offsets and size in the interior pages don't
> need to exist. Every interior chunk is always aligned to its order and
> we only need to record the order.

The block layer does need small offsets.  Direct I/O can often be
512 byte aligned, and some other passthrough commands can have even
smaller alignment, although I don't think we ever go below 4-byte
alignment anywhere in the block layer.
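As a small illustration of what "512 byte aligned" or "4-byte alignment" means for a segment, the sketch below checks both the start address and the length against an alignment mask. It is similar in spirit to the per-queue DMA alignment check the block layer already performs (for example in blk_rq_aligned()), but seg_dma_aligned() itself is a hypothetical userspace helper, not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a segment satisfy a DMA alignment mask?
 * A mask of 511 means 512-byte alignment (typical for direct I/O); a mask
 * of 3 means 4-byte alignment.  Both address and length must be aligned. */
static bool seg_dma_aligned(uint64_t addr, uint32_t len, uint32_t align_mask)
{
    /* align_mask must be of the form (power of two) - 1 */
    assert((align_mask & (align_mask + 1)) == 0);
    return ((addr | len) & align_mask) == 0;
}
```

A 1 KiB direct I/O segment at 0x1200 passes a 512-byte mask but fails a 4 KiB one, so any representation that assumes page-aligned interior chunks cannot describe it.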

> IMHO storage density here is quite important, we end up having to keep
> this stuff around for a long time.

If we play these tricks it won't be general purpose enough to get rid
of the existing scatterlist usage.
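The density trade-off can be sketched as follows, under stated assumptions. struct byte_seg, order_seg, order_encodable(), order_encode() and SKETCH_PAGE_SHIFT are hypothetical, the page size is assumed to be 4 KiB, and physical addresses are assumed to fit in 58 bits. An order-only entry packs the PFN and a 6-bit order into 8 bytes, while a byte-granular entry needs an address plus a length. The catch is that the compact form only works for power-of-two, naturally aligned chunks of at least a page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumed 4 KiB pages */

/* Byte-granular segment: can describe any offset and length. */
struct byte_seg { uint64_t addr; uint32_t len; };

/* Order-only entry: PFN in the upper bits, order in the low 6 bits. */
typedef uint64_t order_seg;

/* Only power-of-two chunks of at least one page, aligned to their size. */
static bool order_encodable(uint64_t addr, uint64_t len)
{
    if (len < (1ull << SKETCH_PAGE_SHIFT) || (len & (len - 1)))
        return false;
    return (addr & (len - 1)) == 0;
}

static unsigned int ilog2_u64(uint64_t v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

static order_seg order_encode(uint64_t addr, uint64_t len)
{
    assert(order_encodable(addr, len));
    unsigned int order = ilog2_u64(len) - SKETCH_PAGE_SHIFT;

    return ((addr >> SKETCH_PAGE_SHIFT) << 6) | order;
}

static uint64_t order_seg_addr(order_seg s)
{
    return (s >> 6) << SKETCH_PAGE_SHIFT;
}

static uint64_t order_seg_len(order_seg s)
{
    return 1ull << ((s & 63) + SKETCH_PAGE_SHIFT);
}
```

A naturally aligned 2 MiB chunk round-trips through the 8-byte encoding, but a 512-byte direct I/O segment at offset 0x200 cannot be encoded at all. That is why the compact form could not replace general scatterlist usage.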
