linux-fsdevel.vger.kernel.org archive mirror
From: Christoph Hellwig <hch@infradead.org>
To: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	willy@infradead.org, dchinner@redhat.com,
	Steve French <smfrench@gmail.com>,
	Shyam Prasad N <nspmangalore@gmail.com>,
	Rohith Surabattula <rohiths.msft@gmail.com>,
	Jeff Layton <jlayton@kernel.org>, Ira Weiny <ira.weiny@intel.com>,
	torvalds@linux-foundation.org, linux-cifs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: How to convert I/O iterators to iterators, sglists and RDMA lists
Date: Mon, 24 Oct 2022 07:57:24 -0700
Message-ID: <Y1an1NFcowiSS9ms@infradead.org>
In-Reply-To: <1415915.1666274636@warthog.procyon.org.uk>

On Thu, Oct 20, 2022 at 03:03:56PM +0100, David Howells wrote:
> > What block file systems do is to take the pages from the iter and some flags
> > on what is pinned.  We can generalize this to store all extra state in a
> > flags word, or bite the bullet and allow cloning of the iter in one form or
> > another.
> 
> Yeah, I know.  A list of pages is not an ideal solution.  It can only handle
> contiguous runs of pages, possibly with a partial page at either end.  A bvec
> iterator would be of more use as it can handle a series of partial pages.
> 
> Note also that I would need to turn the pages *back* into an iterator in order
> to commune with sendmsg() in the nether reaches of some network filesystems.

Yes.  So I think the right thing here is to make sure we can send
the iter through the whole stack without a conversion.

> It would be nice to be able to pass an iterator to the crypto layer.  I'm not
> sure what the crypto people think of that.

Let's ask them..

> On the other hand, if you think the RDMA API should be taking scatterlists
> rather than sge lists, that would be fine.  Even better if I can just pass an
> iterator in directly - though neither scatterlist nor iterator has a place to
> put the RDMA local_dma_key - though I wonder if that's actually necessary for
> each sge element, or whether it could be handed through as part of the request
> as a whole.

Well, in the long run it should not take scatterlists either, as they
are a bad data structure.  But what should happen in the long run is
that the DMA mapping is only done in the hardware drivers, not the ULPs,
which is a really nasty layering violation.  This requires the strange
ib_dma_* stubs to disable DMA mapping for the software drivers, and it
also does completely unneeded DMA mappings for sends that are inline in
the SQE as supported by some Mellanox / Nvidia hardware.

> That's fine in principle.  However, I have some extraction code that can
> convert an iterator to another iterator, an sglist or an rdma sge list, using
> a common core of code to do all three.

So I think the iterator to iterator is a really bad idea and we should
not have it at all.  It just works around the issue of not being
able to easily keep state after an iter-based get_user_pages, but
that is being addressed at the moment.  The iter to ib_sge/scatterlist
are very much RDMA specific at the moment, so I guess that might be a
good place to keep them.  In fact I suspect the scatterlist conversion
should not be a public API at all, but hidden in rw.c and only be used
internally for the DMA mapping.
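
To make the iter to ib_sge case a bit more concrete, here is a rough
userspace sketch of walking a series of partial pages and filling an
address/length list, with the key carried once per request rather than
per element (the open question quoted above).  Everything in it is
hypothetical: struct seg, struct seg_iter, struct sge, struct sge_request
and extract_to_sge() are stand-ins invented for illustration, not the
kernel's struct bio_vec, struct iov_iter or struct ib_sge, and no DMA
mapping is modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one bvec-like element: a possibly partial page. */
struct seg {
	uintptr_t page_addr;	/* base address of the page */
	size_t offset;		/* start of the data within the page */
	size_t len;		/* number of bytes of data */
};

/* Hypothetical iterator over an array of segments. */
struct seg_iter {
	const struct seg *segs;
	size_t nr_segs;
	size_t idx;		/* current segment */
	size_t done;		/* bytes already consumed in segs[idx] */
};

/* Hypothetical sge element: address and length only, no per-element key. */
struct sge {
	uint64_t addr;
	uint32_t length;
};

/* Hypothetical request: the key is stored once for the whole list. */
struct sge_request {
	uint32_t lkey;
	size_t nr_sge;
	struct sge sge[8];
};

/*
 * Append up to max_bytes from the iterator to req->sge[] and advance the
 * iterator by what was taken.  Stops early when the sge array is full.
 * Returns the number of bytes extracted.
 */
static size_t extract_to_sge(struct seg_iter *it, struct sge_request *req,
			     size_t max_bytes)
{
	const size_t cap = sizeof(req->sge) / sizeof(req->sge[0]);
	size_t total = 0;

	while (it->idx < it->nr_segs && req->nr_sge < cap && total < max_bytes) {
		const struct seg *s = &it->segs[it->idx];
		size_t avail = s->len - it->done;
		size_t want = max_bytes - total;
		size_t take = avail < want ? avail : want;

		if (take > 0) {
			struct sge *e = &req->sge[req->nr_sge++];

			e->addr = (uint64_t)s->page_addr + s->offset + it->done;
			e->length = (uint32_t)take;
			total += take;
			it->done += take;
		}
		/* Move on once the current segment is fully consumed. */
		if (it->done == s->len) {
			it->idx++;
			it->done = 0;
		}
	}
	return total;
}
```

Because the iterator keeps its position in idx and done, a caller can
extract in several chunks without converting to a second iterator first.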
