linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH RFC] iomap: only return IO error if no data has been transferred
Date: Wed, 18 Nov 2020 14:36:47 -0700	[thread overview]
Message-ID: <83c8c94e-0d70-bd9a-d5b2-0621c1d977ac@kernel.dk> (raw)
In-Reply-To: <20201118213341.GR7391@dread.disaster.area>

On 11/18/20 2:33 PM, Dave Chinner wrote:
> On Wed, Nov 18, 2020 at 02:19:30PM -0700, Jens Axboe wrote:
>> On 11/18/20 2:15 PM, Dave Chinner wrote:
>>> On Wed, Nov 18, 2020 at 02:00:06PM -0700, Jens Axboe wrote:
>>>> On 11/18/20 1:37 PM, Dave Chinner wrote:
>>>>> On Wed, Nov 18, 2020 at 08:26:50AM -0700, Jens Axboe wrote:
>>>>>> On 11/18/20 12:19 AM, Dave Chinner wrote:
>>>>>>> On Tue, Nov 17, 2020 at 03:17:18PM -0700, Jens Axboe wrote:
>>>>>>>> If we've successfully transferred some data in __iomap_dio_rw(),
>>>>>>>> don't mark an error for a latter segment in the dio.
>>>>>>>>
>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Debugging an issue with io_uring, which uses IOCB_NOWAIT for the
>>>>>>>> IO. If we do parts of an IO, then once that completes, we still
>>>>>>>> return -EAGAIN if we ran into a problem later on. That seems wrong,
>>>>>>>> normal convention would be to return the short IO instead. For the
>>>>>>>> -EAGAIN case, io_uring will retry later parts without IOCB_NOWAIT
>>>>>>>> and complete it successfully.
>>>>>>>
>>>>>>> So you are getting a write IO that is split across an allocated
>>>>>>> extent and a hole, and the second mapping is returning EAGAIN
>>>>>>> because allocation would be required? This sort of split extent IO
>>>>>>> is fairly common, so I'm not sure that splitting them into two
>>>>>>> separate IOs may not be the best approach.
>>>>>>
>>>>>> The case I seem to be hitting is this one:
>>>>>>
>>>>>> if (iocb->ki_flags & IOCB_NOWAIT) {
>>>>>> 	if (filemap_range_has_page(mapping, pos, end)) {
>>>>>>                   ret = -EAGAIN;
>>>>>>                   goto out_free_dio;
>>>>>> 	}
>>>>>> 	flags |= IOMAP_NOWAIT;
>>>>>> }
>>>>>>
>>>>>> in __iomap_dio_rw(), which isn't something we can detect upfront like IO
>>>>>> over a multiple extents...
>>>>>
>>>>> This specific situation cannot result in the partial IO behaviour
>>>>> you described.  It is an -upfront check- that is done before any IO
>>>>> is mapped or issued so results in the entire IO being skipped and we
>>>>> don't get anywhere near the code you changed.
>>>>>
>>>>> IOWs, this doesn't explain why you saw a partial IO, or why changing
>>>>> partial IO return values avoids -EAGAIN from a range we apparently
>>>>> just did a partial IO over and -didn't have page cache pages-
>>>>> sitting over it.
>>>>
>>>> You are right, I double checked and recreated my debugging. What's
>>>> triggering is that we're hitting this in xfs_direct_write_iomap_begin()
>>>> after we've already done some IO:
>>>>
>>>> allocate_blocks:
>>>> 	error = -EAGAIN;
>>>> 	if (flags & IOMAP_NOWAIT)
>>>> 		goto out_unlock;
>>>
>>> Ok, that's exactly the case the reproducer I wrote triggers.
>>
>> OK good, then we're on the same page :-)
>>
>>>>> Can you provide an actual event trace of the IOs in question that
>>>>> are failing in your tests (e.g. from something like `trace-cmd
>>>>> record -e xfs_file\* -e xfs_i\* -e xfs_\*write -e iomap\*` over the
>>>>> sequential that reproduces the issue) so that there's no ambiguity
>>>>> over how this problem is occurring in your systems?
>>>>
>>>> Let me know if you still want this!
>>>
>>> No, it makes sense now :)
>>
>> What's the next step here? Are you working on an XFS fix for this?
> 
> I'm just building the patch now for testing.

Nice, you're fast...

>> Was looking at other potential -EAGAIN during the loop, and seems like
>> we'd be able to hit this if we fail xfs_ilock_for_iomap() as well. And
>> not sure how that would be solvable, without accepting that IOCB_NOWAIT
>> reads/writes can be short.
> 
> The change I'm making should solves that, too. i.e. NOWAIT IO must
> map entirely within a single extent, so there is no scope for a
> short IO and re-entering the mapping code under NOWAIT conditions
> that could then fail.

Perfect, thanks Dave!

-- 
Jens Axboe


  reply	other threads:[~2020-11-18 21:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-17 22:17 [PATCH RFC] iomap: only return IO error if no data has been transferred Jens Axboe
2020-11-18  7:09 ` Christoph Hellwig
2020-11-18  7:19 ` Dave Chinner
2020-11-18  7:55   ` Dave Chinner
2020-11-18 15:26   ` Jens Axboe
2020-11-18 20:37     ` Dave Chinner
2020-11-18 21:00       ` Jens Axboe
2020-11-18 21:15         ` Dave Chinner
2020-11-18 21:19           ` Jens Axboe
2020-11-18 21:33             ` Dave Chinner
2020-11-18 21:36               ` Jens Axboe [this message]
2020-11-18 21:48                 ` Dave Chinner
2020-11-18 21:55                   ` Jens Axboe
2020-11-18 22:21                     ` Dave Chinner
2020-11-26 14:00 ` [iomap] b258228a4e: Assertion_failed kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83c8c94e-0d70-bd9a-d5b2-0621c1d977ac@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).