From: Christoph Hellwig <hch@infradead.org>
To: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Al Viro <viro@zeniv.linux.org.uk>,
willy@infradead.org, dchinner@redhat.com,
Steve French <smfrench@gmail.com>,
Shyam Prasad N <nspmangalore@gmail.com>,
Rohith Surabattula <rohiths.msft@gmail.com>,
Jeff Layton <jlayton@kernel.org>, Ira Weiny <ira.weiny@intel.com>,
torvalds@linux-foundation.org, linux-cifs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: How to convert I/O iterators to iterators, sglists and RDMA lists
Date: Mon, 24 Oct 2022 07:57:24 -0700
Message-ID: <Y1an1NFcowiSS9ms@infradead.org>
In-Reply-To: <1415915.1666274636@warthog.procyon.org.uk>
On Thu, Oct 20, 2022 at 03:03:56PM +0100, David Howells wrote:
> > What block file systems do is to take the pages from the iter and some flags
> > on what is pinned. We can generalize this to store all extra state in a
> > flags word, or bite the bullet and allow cloning of the iter in one form or
> > another.
>
> Yeah, I know. A list of pages is not an ideal solution. It can only handle
> contiguous runs of pages, possibly with a partial page at either end. A bvec
> iterator would be of more use as it can handle a series of partial pages.
>
> Note also that I would need to turn the pages *back* into an iterator in order
> to commune with sendmsg() in the nether reaches of some network filesystems.
Yes. So I think the right thing here is to make sure we can send
the iter through the whole stack without a conversion.
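A minimal userspace sketch of the shape difference discussed above, assuming 4 KiB pages. `struct sketch_bvec` stands in for the kernel's `struct bio_vec` (a page is just an index here), and every `sketch_*` name plus the `SKETCH_BV_PINNED` flag is hypothetical. It shows that a page list plus start offset and length can only describe a contiguous run with partial pages at either end, that an arbitrary bvec array does not fit back into that shape, and how a single flags word could carry the pinned-versus-referenced state.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define SKETCH_BV_PINNED 0x1u    /* hypothetical flag: pages were pinned */

/* Stand-in for struct bio_vec: a page (an index here) plus the
 * offset and length of the bytes used within that page. */
struct sketch_bvec {
	size_t page;
	unsigned int offset;
	unsigned int len;
};

/* Hypothetical extraction result: segments plus one flags word that
 * records extra state, such as whether the pages are pinned. */
struct sketch_extract {
	struct sketch_bvec *bv;
	size_t nr;
	unsigned int flags;
};

/* Describe bytes [start, start + len) of a page list as bvec segments.
 * Only the first segment can start mid-page and only the last can end
 * mid-page; every segment in between is a full page. */
static size_t pagelist_to_bvecs(size_t start, size_t len,
				struct sketch_bvec *out, size_t max)
{
	size_t n = 0;

	while (len > 0) {
		unsigned int off = start % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - off;

		if (chunk > len)
			chunk = len;
		assert(n < max);
		out[n].page = start / SKETCH_PAGE_SIZE;
		out[n].offset = off;
		out[n].len = (unsigned int)chunk;
		n++;
		start += chunk;
		len -= chunk;
	}
	return n;
}

/* A bvec array fits back into "page list + offset + length" only if
 * all interior boundaries fall exactly on page boundaries. */
static int bvecs_fit_pagelist(const struct sketch_bvec *bv, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && bv[i].offset != 0)
			return 0;
		if (i + 1 < n && bv[i].offset + bv[i].len != SKETCH_PAGE_SIZE)
			return 0;
	}
	return 1;
}

/* Choose the release path from the flags word.  In the kernel the two
 * cases would correspond to unpin_user_page() versus put_page(). */
static int sketch_needs_unpin(const struct sketch_extract *ex)
{
	return (ex->flags & SKETCH_BV_PINNED) != 0;
}
```

For example, bytes 100..10099 of a page list become three segments (3996 bytes from offset 100, one full page, then 1908 bytes), while a bvec array such as { page 5, offset 512, len 100 }, { page 2, offset 0, len 50 } has no page-list equivalent.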
> It would be nice to be able to pass an iterator to the crypto layer. I'm not
> sure what the crypto people think of that.
Let's ask them.
> On the other hand, if you think the RDMA API should be taking scatterlists
> rather than sge lists, that would be fine. Even better if I can just pass an
> iterator in directly - though neither scatterlist nor iterator has a place to
> put the RDMA local_dma_key - though I wonder if that's actually necessary for
> each sge element, or whether it could be handed through as part of the request
> as a whole.
Well, in the long run it should not take scatterlists either, as they
are a bad data structure. But what should happen in the long run is
that the DMA mapping is only done in the hardware drivers, not in the
ULPs; mapping in the ULPs is a really nasty layering violation. It
requires the strange ib_dma_* stubs to disable DMA mapping for the
software drivers, and it also does completely unneeded DMA mappings for
sends that are inline in the SQE, as supported by some Mellanox / Nvidia
hardware.
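A userspace model of driver-side mapping, with hypothetical `sketch_*` names throughout. The ULP hands over unmapped buffers plus one local key for the whole request (the per-request option raised in the quoted question), and the driver decides per send whether to copy the payload inline into the SQE or to map it. `struct sketch_sge` copies the field layout of the kernel's `struct ib_sge` (addr, length, lkey); the 64-byte inline limit, the 4-SGE limit and `sketch_dma_map()`, which only counts calls, are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_INLINE 64u /* assumption: device inline-data limit */
#define SKETCH_MAX_SGE 4u     /* assumption: SGEs per send WQE */

/* Same field layout as the kernel's struct ib_sge. */
struct sketch_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Hypothetical ULP request: unmapped CPU buffers, with the local key
 * carried once for the whole request instead of per element. */
struct sketch_req {
	uint32_t lkey;
	size_t nsegs;
	const void *const *bufs;
	const uint32_t *lens;
};

/* Hypothetical send queue entry with room for an inline payload. */
struct sketch_sqe {
	int inlined;
	size_t nsge;
	struct sketch_sge sge[SKETCH_MAX_SGE];
	uint32_t inline_len;
	unsigned char inline_data[SKETCH_MAX_INLINE];
};

static unsigned int sketch_nr_maps; /* number of stand-in mapping calls */

/* Stand-in for a DMA mapping call: counts the call and returns the
 * CPU address as if it were the bus address. */
static uint64_t sketch_dma_map(const void *buf, uint32_t len)
{
	(void)len;
	sketch_nr_maps++;
	return (uint64_t)(uintptr_t)buf;
}

/* Only the driver knows whether the payload fits inline, so only the
 * driver decides whether anything needs mapping at all. */
static void sketch_post_send(const struct sketch_req *req,
			     struct sketch_sqe *sqe)
{
	size_t total = 0;

	for (size_t i = 0; i < req->nsegs; i++)
		total += req->lens[i];

	sqe->inline_len = 0;
	sqe->nsge = 0;
	if (total <= SKETCH_MAX_INLINE) {
		/* Small send: copy into the SQE; nothing is mapped. */
		sqe->inlined = 1;
		for (size_t i = 0; i < req->nsegs; i++) {
			memcpy(sqe->inline_data + sqe->inline_len,
			       req->bufs[i], req->lens[i]);
			sqe->inline_len += req->lens[i];
		}
		return;
	}

	/* Large send: map each buffer and stamp the request's key on every SGE. */
	assert(req->nsegs <= SKETCH_MAX_SGE);
	sqe->inlined = 0;
	for (size_t i = 0; i < req->nsegs; i++) {
		sqe->sge[i].addr = sketch_dma_map(req->bufs[i], req->lens[i]);
		sqe->sge[i].length = req->lens[i];
		sqe->sge[i].lkey = req->lkey;
	}
	sqe->nsge = req->nsegs;
}
```

With the mapping done in the ULP instead, the 16-byte send in a case like this would still cost a mapping call even though the hardware never reads it through DMA.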
> That's fine in principle. However, I have some extraction code that can
> convert an iterator to another iterator, an sglist or an rdma sge list, using
> a common core of code to do all three.
So I think iterator-to-iterator conversion is a really bad idea and we
should not have it at all. It just works around the issue of not being
able to easily keep state after an iter-based get_user_pages, but that
is being addressed at the moment. The iter to ib_sge/scatterlist
conversions are very much RDMA-specific at the moment, so I guess the
RDMA code might be a good place to keep them. In fact I suspect the
scatterlist conversion should not be a public API at all, but should be
hidden in rw.c and used only internally for the DMA mapping.
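A minimal userspace sketch of the "common core" idea quoted above, assuming 4 KiB pages: one walker splits a list of (address, length) segments at page boundaries and hands each chunk to a per-format emitter, here a scatterlist-like one and an ib_sge-like one. All `sketch_*` names are hypothetical, the CPU address stands in for a DMA address, and there is deliberately no iterator-to-iterator output. Keeping the emitters static in one file loosely mirrors the suggestion to hide the scatterlist conversion inside rw.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical source segment: a virtual address and a length. */
struct sketch_seg { uintptr_t addr; size_t len; };

/* Called once per page-bounded chunk; return -1 to stop the walk. */
typedef int (*sketch_emit_t)(void *ctx, uintptr_t addr, size_t len);

/* Common core: walk all segments, splitting each at page boundaries. */
static int sketch_extract(const struct sketch_seg *segs, size_t n,
			  sketch_emit_t emit, void *ctx)
{
	assert(emit != NULL);
	for (size_t i = 0; i < n; i++) {
		uintptr_t addr = segs[i].addr;
		size_t len = segs[i].len;

		while (len) {
			size_t chunk = SKETCH_PAGE_SIZE - addr % SKETCH_PAGE_SIZE;

			if (chunk > len)
				chunk = len;
			if (emit(ctx, addr, chunk) < 0)
				return -1;
			addr += chunk;
			len -= chunk;
		}
	}
	return 0;
}

/* Emitter 1: scatterlist-like entries (page frame, offset, length). */
struct sketch_sg { uintptr_t pfn; unsigned int offset, length; };
struct sketch_sg_ctx { struct sketch_sg *sg; size_t n, max; };

static int sketch_emit_sg(void *p, uintptr_t addr, size_t len)
{
	struct sketch_sg_ctx *c = p;

	if (c->n == c->max)
		return -1;
	c->sg[c->n].pfn = addr / SKETCH_PAGE_SIZE;
	c->sg[c->n].offset = addr % SKETCH_PAGE_SIZE;
	c->sg[c->n].length = len;
	c->n++;
	return 0;
}

/* Emitter 2: ib_sge-like entries; page splitting is not required for
 * SGEs, it just falls out of sharing the walker. */
struct sketch_sge { uint64_t addr; uint32_t length; uint32_t lkey; };
struct sketch_sge_ctx { struct sketch_sge *sge; size_t n, max; uint32_t lkey; };

static int sketch_emit_sge(void *p, uintptr_t addr, size_t len)
{
	struct sketch_sge_ctx *c = p;

	if (c->n == c->max)
		return -1;
	c->sge[c->n].addr = addr;
	c->sge[c->n].length = len;
	c->sge[c->n].lkey = c->lkey;
	c->n++;
	return 0;
}
```

A 200-byte segment starting 4000 bytes into a page comes out as two entries (96 and 104 bytes) for either emitter; only the output format differs.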