Linux block layer

* Re: [PATCH 0/8 v2] Non-blocking AIO
From: Avi Kivity @ 2017-03-06 18:17 UTC
  To: Jens Axboe, Jan Kara
  Cc: Goldwyn Rodrigues, jack, hch, linux-fsdevel, linux-block,
	linux-btrfs, linux-ext4, linux-xfs
In-Reply-To: <7aabb6b4-df8d-8554-fbe3-90504887fb8e@kernel.dk>

On 03/06/2017 07:06 PM, Jens Axboe wrote:
> On 03/06/2017 09:59 AM, Avi Kivity wrote:
>>
>> On 03/06/2017 06:08 PM, Jens Axboe wrote:
>>> On 03/06/2017 08:59 AM, Avi Kivity wrote:
>>>> On 03/06/2017 05:38 PM, Jens Axboe wrote:
>>>>> On 03/06/2017 08:29 AM, Avi Kivity wrote:
>>>>>> On 03/06/2017 05:19 PM, Jens Axboe wrote:
>>>>>>> On 03/06/2017 01:25 AM, Jan Kara wrote:
>>>>>>>> On Sun 05-03-17 16:56:21, Avi Kivity wrote:
>>>>>>>>>> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
>>>>>>>>>> any of these conditions are met. This way userspace can push most
>>>>>>>>>> of the write()s to the kernel to the best of its ability to complete
>>>>>>>>>> and if it returns -EAGAIN, can defer it to another thread.
>>>>>>>>>>
>>>>>>>>> Is it not possible to push the iocb to a workqueue?  This will allow
>>>>>>>>> existing userspace to work with the new functionality, unchanged. Any
>>>>>>>>> userspace implementation would have to do the same thing, so it's not like
>>>>>>>>> we're saving anything by pushing it there.
>>>>>>>> That is not easy because until IO is fully submitted, you need some parts
>>>>>>>> of the context of the process which submits the IO (e.g. memory mappings,
>>>>>>>> but possibly also other credentials). So you would need to somehow transfer
>>>>>>>> this information to the workqueue.
>>>>>>> Outside of technical challenges, the API also needs to return EAGAIN or
>>>>>>> start blocking at some point. We can't expose a direct connection to
>>>>>>> queue work like that, and let any user potentially create millions of
>>>>>>> pending work items (and IOs).
>>>>>> You wouldn't expect more concurrent events than the maxevents parameter
>>>>>> that was supplied to io_setup syscall; it should have reserved any
>>>>>> resources needed.
>>>>> Doesn't matter what limit you apply, my point still stands - at some
>>>>> point you have to return EAGAIN, or block. Returning EAGAIN without
>>>>> the caller having flagged support for that change of behavior would
>>>>> be problematic.
>>>> Doesn't it already return EAGAIN (or some other error) if you exceed
>>>> maxevents?
>>> It's a setup thing. We check these limits when someone creates an IO
>>> context, and carve out the specified entries from our global pool. Then
>>> we free those "resources" when the io context is freed.
>>>
>>> Right now I can setup an IO context with 1000 entries on it, yet that
>>> number has NO bearing on when io_submit() would potentially block or
>>> return EAGAIN.
>>>
>>> We can have a huge gap on the intent signaled by io context setup, and
>>> the reality imposed by what actually happens on the IO submission side.
>> Isn't that a bug?  Shouldn't that 1001st incomplete io_submit() return
>> EAGAIN?
>>
>> Just tested it, and maxevents is not respected for this:
>>
>> io_setup(1, [0x7fc64537f000])           = 0
>> io_submit(0x7fc64537f000, 10, [{pread, fildes=3, buf=0x1eb4000,
>> nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096,
>> offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0},
>> {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, {pread,
>> fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, {pread, fildes=3,
>> buf=0x1eb4000, nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000,
>> nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096,
>> offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0},
>> {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}]) = 10
>>
>> which is unexpected, to me.
> ioctx_alloc()
> {
>          [...]
>
>          /*
>           * We keep track of the number of available ringbuffer slots, to prevent
>           * overflow (reqs_available), and we also use percpu counters for this.
>           *
>           * So since up to half the slots might be on other cpu's percpu counters
>           * and unavailable, double nr_events so userspace sees what they
>           * expected: additionally, we move req_batch slots to/from percpu
>           * counters at a time, so make sure that isn't 0:
>           */
>          nr_events = max(nr_events, num_possible_cpus() * 4);
>          nr_events *= 2;
> }

On a desktop with 4 logical cores:

io_setup(1, [0x7fc210041000])           = 0
io_submit(0x7fc210041000, 10000, [big array]) = 126
io_submit(0x7fc210041000, 10000, [big array]) = -1 EAGAIN (Resource temporarily unavailable)

So the user should already expect EAGAIN from io_submit() due to
resource limits.  I'm sure the check could be tightened so that, if we do
have to use a workqueue, we respect the user's limit rather than some
inflated number.
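
As a rough cross-check of the 126 above, the arithmetic can be sketched
as follows. Only the max() and doubling steps come from the ioctx_alloc()
excerpt quoted earlier; the two spare slots, the rounding up to whole
pages, the final minus one, and the 32-byte sizes are assumptions about
how the ring was sized in kernels of that period, and
aio_reqs_available() is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes on x86-64: ring header and struct io_event, 32 bytes each. */
enum { AIO_RING_HDR = 32, IO_EVENT_SIZE = 32 };

/*
 * Hypothetical helper: estimate how many requests io_submit() accepts
 * before returning EAGAIN for a context created with io_setup(nr_events).
 */
static unsigned long aio_reqs_available(unsigned long nr_events,
                                        unsigned long possible_cpus,
                                        unsigned long page_size)
{
    unsigned long min_events = possible_cpus * 4;
    unsigned long size, nr_pages;

    assert(page_size > AIO_RING_HDR);

    /* From the quoted ioctx_alloc() excerpt: clamp to 4 per CPU, then double. */
    if (nr_events < min_events)
        nr_events = min_events;
    nr_events *= 2;

    /* Assumption: ring setup adds two spare slots and rounds up to whole pages. */
    nr_events += 2;
    size = AIO_RING_HDR + IO_EVENT_SIZE * nr_events;
    nr_pages = (size + page_size - 1) / page_size;

    /* Assumption: every event that fits in those pages counts, minus one. */
    nr_events = (page_size * nr_pages - AIO_RING_HDR) / IO_EVENT_SIZE;
    return nr_events - 1;
}
```

With nr_events = 1, 4 possible CPUs and 4096-byte pages this yields 126,
matching the first io_submit() return above. num_possible_cpus() can
exceed the number of online cores, so other machines may see a different
figure.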
