All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
	linux-block@vger.kernel.org, hch@lst.de, viro@zeniv.linux.org.uk
Subject: Re: [PATCH 16/22] aio: add support for submission/completion rings
Date: Wed, 2 Jan 2019 09:28:23 -0700	[thread overview]
Message-ID: <91c8d717-37ab-9696-ff41-700f339a1960@kernel.dk> (raw)
In-Reply-To: <20181227134737.GA24160@infradead.org>

On 12/27/18 6:47 AM, Christoph Hellwig wrote:
> On Fri, Dec 21, 2018 at 12:22:30PM -0700, Jens Axboe wrote:
>> The submission queue (SQ) and completion queue (CQ) rings are shared
>> between the application and the kernel. This eliminates the need to
>> copy data back and forth to submit and complete IO. We use the same
>> structures as the old aio interface. The SQ rings are indexes into a
>> struct iocb array, like we would submit through io_submit(), and the
>> CQ rings are struct io_event, like we would pass in (and copy back)
>> from io_getevents().
>>
>> A new system call is added for this, io_ring_enter(). This system call
>> submits IO that is stored in the SQ ring, and/or completes IO and stores
>> the results in the CQ ring. Hence it's possible to both complete and
>> submit IO in a single system call.
> 
> So this still reuses the aio interface, but I think that is a really
> bad idea.  For one the aio_context_t which really is a chunk of memory
> mapped into the user address space really make no sense at all here.

I don't think that's a big deal at all, the new ring interface just ends
up being a subset of it. We don't map the old user ring, for instance.

> We'd much rather allocate a file descriptor using anon_inode_getfd
> and operate on that.  That also means we can just close that fd
> instead of needing the magic io_destroy, and save all the checks for
> which kind of FD we operate on.

I'm not against doing something else for setup, but we still need some
way of passing in information about what we want. Things like ring
sizing. The actual rings we could mmap from known offsets if we went the
anon route. 

> The big question to me is if we really need the polled version of
> 'classic aio'.  If not we could save io_setup2 and would just need
> io_ring_setup and io_ring_enter as the new syscalls, and basically
> avoid touching the old aio code entirely.

Don't feel strongly about that part. I do like to continue using the aio
infrastructure instead of rewriting everything. And that means that
regular aio gets polling for free, essentially. So seems kind of silly
NOT to offer it as an option.

> Also as another potential interface enhancement I think we should
> consider pre-registering the files we want to do I/O to in the
> io_ring_setup system call, thus avoiding fget/fput entirely
> in io_ring_enter.  In general the set of files used for aio-like
> operations is rather static and should be known at setup time,
> in the worst case we might have to have a version of the setup
> call that can modify the set up files (similar to how say
> epoll works).

I agree, that would be a nice improvement to not have to get/put files
all the time.

> Also this whole API needs a better description, and a CC to the
> linux-api list.

Yep

>> +static void aio_commit_cqring(struct kioctx *ctx, unsigned next_tail)
>> +{
>> +	struct aio_cq_ring *ring = page_address(ctx->cq_ring.pages[0]);
> 
> I don't think we can use page_address here as the memory might be
> highmem.  Same for all the other uses of page_address.

Made changes to support that.

>> +	range->pages = kzalloc(nr_pages * sizeof(struct page *), GFP_KERNEL);
> 
> This should use kcalloc.  Same for a few other instances.

Done

>> +static int __io_ring_enter(struct kioctx *ctx, unsigned int to_submit,
>> +			   unsigned int min_complete, unsigned int flags)
>> +{
>> +	int ret = 0;
>> +
>> +	if (flags & IORING_FLAG_SUBMIT) {
>> +		ret = aio_ring_submit(ctx, to_submit);
>> +		if (ret < 0)
>> +			return ret;
>> +	}
> 
> I don't think we need the IORING_FLAG_SUBMIT flag - a non-zero
> to_submit argument should be a good enough indicator.

Made that change.

> Also this interface will need some cache flushing help, othewise
> it won't work at all for architectures with VIVT caches.

Fixed this up too.

-- 
Jens Axboe


  reply	other threads:[~2019-01-02 16:28 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-21 19:22 [PATCHSET v10] Support for polled and buffered aio (and more) Jens Axboe
2018-12-21 19:22 ` [PATCH 01/22] fs: add an iopoll method to struct file_operations Jens Axboe
2018-12-21 19:22 ` [PATCH 02/22] block: add bio_set_polled() helper Jens Axboe
2018-12-21 19:22 ` [PATCH 03/22] block: wire up block device iopoll method Jens Axboe
2018-12-21 19:22 ` [PATCH 04/22] block: use REQ_HIPRI_ASYNC for non-sync polled IO Jens Axboe
2018-12-21 19:25   ` Jens Axboe
2018-12-27 13:15     ` Christoph Hellwig
2018-12-27 20:22       ` Jens Axboe
2018-12-21 19:22 ` [PATCH 05/22] block: use bio_set_polled() helper for O_DIRECT Jens Axboe
2018-12-21 19:22 ` [PATCH 06/22] iomap: wire up the iopoll method Jens Axboe
2018-12-21 19:22 ` [PATCH 07/22] aio: add io_setup2() system call Jens Axboe
2018-12-27 13:55   ` Christoph Hellwig
2018-12-27 20:27     ` Jens Axboe
2018-12-21 19:22 ` [PATCH 08/22] aio: support for IO polling Jens Axboe
2018-12-27 13:55   ` Christoph Hellwig
2018-12-27 20:29     ` Jens Axboe
2018-12-21 19:22 ` [PATCH 09/22] aio: add submission side request cache Jens Axboe
2018-12-27 13:56   ` Christoph Hellwig
2018-12-27 20:31     ` Jens Axboe
2018-12-21 19:22 ` [PATCH 10/22] fs: add fget_many() and fput_many() Jens Axboe
2018-12-21 19:22 ` [PATCH 11/22] aio: use fget/fput_many() for file references Jens Axboe
2018-12-21 19:22 ` [PATCH 12/22] aio: split iocb init from allocation Jens Axboe
2018-12-21 19:22 ` [PATCH 13/22] aio: batch aio_kiocb allocation Jens Axboe
2018-12-21 19:22 ` [PATCH 14/22] aio: split old ring complete out from aio_complete() Jens Axboe
2018-12-21 19:22 ` [PATCH 15/22] aio: pass in user index to __io_submit_one() Jens Axboe
2018-12-21 19:22 ` [PATCH 16/22] aio: add support for submission/completion rings Jens Axboe
2018-12-27 13:47   ` Christoph Hellwig
2019-01-02 16:28     ` Jens Axboe [this message]
2019-01-02 20:32       ` Jens Axboe
2019-01-02 19:11   ` Jeff Moyer
2018-12-21 19:22 ` [PATCH 17/22] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2018-12-21 19:22 ` [PATCH 18/22] aio: add support for pre-mapped user IO buffers Jens Axboe
2018-12-27 13:57   ` Christoph Hellwig
2018-12-27 13:57     ` Christoph Hellwig
2018-12-21 19:22 ` [PATCH 19/22] aio: support kernel side submission for aio with SCQRING Jens Axboe
2018-12-27 13:57   ` Christoph Hellwig
2018-12-27 20:34     ` Jens Axboe
2018-12-21 19:22 ` [PATCH 20/22] aio: enable polling for IOCTX_FLAG_SQTHREAD Jens Axboe
2018-12-21 19:22 ` [PATCH 21/22] aio: utilize io_event->res2 for CQ ring Jens Axboe
2018-12-21 19:22 ` [PATCH 22/22] aio: add my copyright Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91c8d717-37ab-9696-ff41-700f339a1960@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=linux-aio@kvack.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.