From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
To: Jan Kara <jack@suse.cz>
Cc: "Theodore Y. Ts'o" <tytso@mit.edu>,
	adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, hch@infradead.org,
	david@fromorbit.com, darrick.wong@oracle.com
Subject: Re: [PATCH v6 00/11] ext4: port direct I/O to iomap infrastructure
Date: Fri, 1 Nov 2019 09:58:28 +1100
Message-ID: <20191031225826.GA19790@bobrowski>
In-Reply-To: <20191031165416.GD13321@quack2.suse.cz>

On Thu, Oct 31, 2019 at 05:54:16PM +0100, Jan Kara wrote:
> On Thu 31-10-19 20:16:41, Matthew Bobrowski wrote:
> > On Wed, Oct 30, 2019 at 12:39:18PM +0100, Jan Kara wrote:
> > > On Wed 30-10-19 12:26:52, Jan Kara wrote:
> > > Hum, actually no. This write from fsx output:
> > >
> > > 24( 24 mod 256): WRITE 0x23000 thru 0x285ff (0x5600 bytes)
> > >
> > > should have allocated blocks to where the failed write was going (0x24000).
> > > But still I'd expect some interaction between how buffered writes to holes
> > > interact with following direct IO writes... One of the subtle differences
> > > we have introduced with iomap conversion is that the old code in
> > > __generic_file_write_iter() did fsync & invalidate written range after
> > > buffered write fallback and we don't seem to do that now (probably should
> > > be fixed regardless of relation to this bug).
> >
> > After performing some debugging this afternoon, I quickly realised
> > that the fix for this is rather trivial. Within the previous direct
> > I/O implementation, we passed EXT4_GET_BLOCKS_CREATE to
> > ext4_map_blocks() for any writes to inodes without extents. I seem to
> > have missed that here and consequently block allocation for a write
> > wasn't performing correctly in such cases.
>
> No, this is not correct. For inodes without extents we used
> ext4_dio_get_block() and we pass DIO_SKIP_HOLES to __blockdev_direct_IO().
> Now DIO_SKIP_HOLES means that if starting block is within i_size, we pass
> 'create == 0' to get_blocks() function and thus ext4_dio_get_block() uses
> '0' argument to ext4_map_blocks() similarly to what you do.

Ah right, I missed that part. :(

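
To make sure I have the rule straight this time, here is how I read it,
written as a small userspace model rather than the real kernel code. The
function name and its parameters are made up for illustration; only
DIO_SKIP_HOLES, i_size and the 'create' argument come from the description
above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model (NOT kernel code) of the 'create' decision described
 * above. dio_wants_create() and its parameters are hypothetical names.
 */
static bool dio_wants_create(bool is_write, bool skip_holes,
			     uint64_t i_size, uint64_t start_blk,
			     unsigned int blkbits)
{
	/* By default only writes ask the filesystem to allocate blocks. */
	bool create = is_write;

	/*
	 * With DIO_SKIP_HOLES set, a write whose starting block lies
	 * within i_size must not allocate, so holes are left alone.
	 */
	if (skip_holes && i_size > 0 &&
	    start_blk <= ((i_size - 1) >> blkbits))
		create = false;

	return create;
}
```

So with 4k blocks (blkbits == 12) and an i_size of 0x40000, a write starting
at block 0x24 gets create == 0, and filling the hole has to go through the
buffered path.
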
> And indeed for inodes without extents we must fallback to buffered IO for
> filling holes inside a file to avoid stale data exposure (racing DIO read
> could read block contents before data is written to it if we used
> EXT4_GET_BLOCKS_CREATE).

Well, in this case I'm pretty sure I know exactly where the problem
resides. I fall back to buffered I/O from ext4_dio_write_iter()
without taking into account any data that the direct I/O may already
have partially written. As a result, the byte count returned to
userspace is just whatever ext4_buffered_write_iter() returns, which
isn't necessarily the number of bytes that were actually written. It
should rather be the sum of the two, i.e. the bytes written by the
direct I/O in ext4_dio_write_iter() + the bytes written by
ext4_buffered_write_iter()...

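
Roughly, the accounting I have in mind looks like the sketch below. It is a
userspace model of the arithmetic only; combine_write_counts() is a
hypothetical name, and what to return when the fallback fails after a
partial direct write is an assumption on my part, not something we have
settled.

```c
#include <sys/types.h>

/*
 * Userspace model of the return value accounting only.
 *
 * dio_written:  bytes the direct I/O part managed to write (>= 0)
 * buffered_ret: result of the buffered fallback, i.e. bytes written
 *               or a negative errno
 */
static ssize_t combine_write_counts(ssize_t dio_written, ssize_t buffered_ret)
{
	/*
	 * ASSUMPTION: if the fallback fails, report the partial direct
	 * write when there was one, otherwise propagate the error.
	 */
	if (buffered_ret < 0)
		return dio_written > 0 ? dio_written : buffered_ret;

	/* Normal case: userspace sees the sum of both parts. */
	return dio_written + buffered_ret;
}
```

With purely illustrative numbers, a 0x5600 byte write where the direct part
wrote 0x1000 bytes and the fallback wrote the remaining 0x4600 would return
0x5600 rather than 0x4600.
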
> > Also, I agree, the fsync + page cache invalidation bits need to be
> > implemented. I'm just thinking to branch out within
> > ext4_buffered_write_iter() and implement those bits there i.e.
> >
> >     ...
> >     ret = generic_perform_write();
> >
> >     if (ret > 0 && iocb->ki_flags & IOCB_DIRECT) {
> >         err = filemap_write_and_wait_range();
> >
> >         if (!err)
> >             invalidate_mapping_pages();
> >     ...
> >
> > AFAICT, this would be the most appropriate place to put it? Or, did
> > you have something else in mind?
>
> Yes, either this, or maybe in ext4_dio_write_iter() after returning from
> ext4_buffered_write_iter() would be even more logical.

Yes, let's stick with doing it within ext4_dio_write_iter().

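
For the range to flush and invalidate, I'm thinking of something along the
lines of the sketch below. Again this is a userspace model, not the patch
itself: struct page_range, fallback_page_range() and MODEL_PAGE_SHIFT are
hypothetical names, and 4k pages are assumed. The idea is that the byte
range [offset, endbyte] would go to filemap_write_and_wait_range() and the
corresponding page index range to invalidate_mapping_pages().

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* ASSUMPTION: 4k pages */

/* Inclusive range of page cache indices (hypothetical helper type). */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Given the file offset at which the buffered fallback started and the
 * number of bytes it wrote (must be > 0), compute the inclusive page
 * index range covering the written bytes.
 */
static struct page_range fallback_page_range(uint64_t offset, uint64_t written)
{
	uint64_t endbyte = offset + written - 1;
	struct page_range r = {
		.first = offset >> MODEL_PAGE_SHIFT,
		.last  = endbyte >> MODEL_PAGE_SHIFT,
	};

	return r;
}
```

E.g. a fallback that starts at 0x24000 and writes 0x4600 bytes ends at byte
0x285ff and covers page indices 0x24 through 0x28.
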
--<M>--