From: Christoph Hellwig <hch@infradead.org>
To: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	willy@infradead.org, dchinner@redhat.com,
	Steve French <smfrench@gmail.com>,
	Shyam Prasad N <nspmangalore@gmail.com>,
	Rohith Surabattula <rohiths.msft@gmail.com>,
	Jeff Layton <jlayton@kernel.org>,
	torvalds@linux-foundation.org, linux-cifs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: How to convert I/O iterators to iterators, sglists and RDMA lists
Date: Mon, 17 Oct 2022 06:15:56 -0700
Message-ID: <Y01VjOE2RrLVA2T6@infradead.org>
In-Reply-To: <1762414.1665761217@warthog.procyon.org.uk>

On Fri, Oct 14, 2022 at 04:26:57PM +0100, David Howells wrote:
>  (1) Async direct I/O.
> 
>      In the async direct I/O case, we cannot hold on to the iterator when we
>      return, even if the operation is still in progress (ie. we return
>      EIOCBQUEUED), as it is likely to be on the caller's stack.
> 
>      Also, simply copying the iterator isn't sufficient as virtual userspace
>      addresses cannot be trusted and we may have to pin the pages that
>      comprise the buffer.

This is closely related to the discussion we are having with Ira and Al
about pinning for O_DIRECT.  What block file systems do is to take the
pages from the iter, plus some flags recording what is pinned.  We can
generalize this to store all extra state in a flags word, or bite the
bullet and allow cloning of the iter in one form or another.
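
To make the "pages plus a flags word" idea concrete, here is a minimal
userspace sketch, not kernel code.  Everything in it (struct fake_page,
struct dio_req, REQ_PAGES_PINNED, req_capture(), req_complete()) is a
hypothetical name invented for illustration, and it assumes that only
user-backed buffers need pinning:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: the pages were pinned at submission time. */
#define REQ_PAGES_PINNED 0x1u
#define REQ_MAX_PAGES 8

/* Stand-in for struct page; only models a pin count. */
struct fake_page {
	int pin_count;
};

/* Per-request state that outlives the caller's on-stack iterator. */
struct dio_req {
	struct fake_page *pages[REQ_MAX_PAGES];
	size_t nr_pages;
	unsigned int flags;
};

/*
 * Copy the page pointers into the request and record in the flags
 * word whether they were pinned, so completion knows what to undo.
 */
static void req_capture(struct dio_req *req, struct fake_page **pages,
			size_t nr, int user_backed)
{
	assert(nr <= REQ_MAX_PAGES);
	req->nr_pages = nr;
	req->flags = user_backed ? REQ_PAGES_PINNED : 0;
	for (size_t i = 0; i < nr; i++) {
		req->pages[i] = pages[i];
		if (user_backed)
			pages[i]->pin_count++;
	}
}

/* Completion: unpin only if the flags word says we pinned. */
static void req_complete(struct dio_req *req)
{
	if (req->flags & REQ_PAGES_PINNED)
		for (size_t i = 0; i < req->nr_pages; i++)
			req->pages[i]->pin_count--;
	req->nr_pages = 0;
}
```

The point of the sketch is only that the completion path consults the
request's own flags and never looks at the original iterator again.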

>  (2) Crypto.
> 
>      The crypto interface takes scatterlists, not iterators, so we need to be
>      able to convert an iterator into a scatterlist in order to do content
>      encryption within netfslib.  Doing this in netfslib makes it easier to
>      store content-encrypted files encrypted in fscache.

Note that the scatterlist is generally a pretty bad interface.  We've
been talking for a while about having an interface that takes a page
array as input and returns an array of { dma_addr, len } tuples.
Thinking about it, taking in an iter might actually be an even better
idea.
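
As a rough illustration of the shape such an interface could have, here
is a userspace sketch, not an existing kernel API.  struct seg,
struct addr_len and map_segs() are hypothetical names, the "mapping" is
just the identity (dma_addr == addr), and merging of adjacent segments
is an assumption about what a mapper might do:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input segment (think: one bvec or run of pages). */
struct seg {
	uint64_t addr;
	size_t len;
};

/* Hypothetical output tuple, modelled on { dma_addr, len }. */
struct addr_len {
	uint64_t dma_addr;
	size_t len;
};

/*
 * Walk the input segments and emit { dma_addr, len } tuples, merging
 * segments that are contiguous in the (identity-mapped) address space.
 * Returns the number of tuples written, or -1 if out[] is too small.
 */
static int map_segs(const struct seg *in, size_t nr_in,
		    struct addr_len *out, size_t nr_out)
{
	size_t n = 0;

	assert(in && out);
	for (size_t i = 0; i < nr_in; i++) {
		if (in[i].len == 0)
			continue;	/* skip empty segments */
		if (n > 0 &&
		    out[n - 1].dma_addr + out[n - 1].len == in[i].addr) {
			out[n - 1].len += in[i].len;	/* extend previous */
			continue;
		}
		if (n == nr_out)
			return -1;	/* no room for another tuple */
		out[n].dma_addr = in[i].addr;
		out[n].len = in[i].len;
		n++;
	}
	return (int)n;
}
```

A caller gets back a flat array sized by the number of discontiguous
ranges, with no intermediate scatterlist allocation.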

>  (3) RDMA.
> 
>      To perform RDMA, a buffer list needs to be presented as a QPE array.
>      Currently, cifs converts the iterator it is given to lists of pages, then
>      each list to a scatterlist and thence to a QPE array.  I have code to
>      pass the iterator down to the bottom, using an intermediate BVEC iterator
>      instead of a page list if I can't pass down the original directly (eg. an
>      XARRAY iterator on the pagecache), but I still end up converting it to a
>      scatterlist, which is then converted to a QPE.  I'm trying to go directly
>      from an iterator to a QPE array, thus avoiding the need to allocate an
>      sglist.

I'm not sure what you mean by QPE.  The fundamental low-level
interface in RDMA is the ib_sge.  If you feed it to RDMA READ/WRITE
requests, the interface for that is the RDMA R/W API in
drivers/infiniband/core/rw.c, which currently takes a scatterlist but
to which all of the above remarks on the DMA interface apply.  For RDMA
SEND the ULP has to do a dma_map_single/page to fill it, which is
quite a horrible layering violation and should move into the driver,
but that is going to be a massive change to the whole RDMA subsystem,
so it is unlikely to happen anytime soon.

Neither case has anything to do with what should be in common iov_iter
code; all this needs to live in the RDMA subsystem as a consumer.
