From: Liu Bo <bo.li.liu@oracle.com>
To: Filipe Manana <fdmanana@gmail.com>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
David Sterba <dsterba@suse.cz>, Josef Bacik <jbacik@fb.com>
Subject: Re: resend: Re: Btrfs: adjust len of writes if following a preallocated extent
Date: Fri, 16 Dec 2016 14:37:02 -0800 [thread overview]
Message-ID: <20161216223702.GB2002@localhost.localdomain> (raw)
In-Reply-To: <CAL3q7H4_KhCpjtfQguNW41EsKFzj=+hRVOixi4ppbOgsVDggKw@mail.gmail.com>
On Fri, Dec 02, 2016 at 12:25:19PM +0000, Filipe Manana wrote:
> On Fri, Dec 2, 2016 at 1:41 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> > On Thu, Nov 24, 2016 at 11:13:37AM +0000, Filipe Manana wrote:
> >> On Wed, Nov 23, 2016 at 9:22 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
...
> >>
> >> The analysis is correct Bo.
> >> Originally to fix races between fsync and direct IO writes there was a
> >> solution [1, 2] that didn't involve adding a semaphore and relied on
> >> creating first the ordered extents and then the extent maps only in
> >> the direct IO write path (we do things in the reverse order everywhere
> >> else). It worked and was documented in comments but wasn't
> >> particularly elegant and Josef was not happy because of that, so then
> >> we added the semaphore and made direct IO write path create the extent
> >> maps and ordered extents in the same order as everywhere else [3].
> >>
> >> So here I can only see 2 simple solutions. Either revert [3] (which
> >> added the semaphore) or acquire the semaphore at a higher level in
> >> direct IO write path like this:
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 1f980ef..b2c277d 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -7237,7 +7237,6 @@ static struct extent_map
> >> *btrfs_create_dio_extent(struct inode *inode,
> >> struct extent_map *em = NULL;
> >> int ret;
> >>
> >> - down_read(&BTRFS_I(inode)->dio_sem);
> >> if (type != BTRFS_ORDERED_NOCOW) {
> >> em = create_pinned_em(inode, start, len, orig_start,
> >> block_start, block_len, orig_block_len,
> >> @@ -7256,8 +7255,6 @@ static struct extent_map
> >> *btrfs_create_dio_extent(struct inode *inode,
> >> em = ERR_PTR(ret);
> >> }
> >> out:
> >> - up_read(&BTRFS_I(inode)->dio_sem);
> >> -
> >> return em;
> >> }
> >>
> >> @@ -8715,11 +8712,14 @@ static ssize_t btrfs_direct_IO(struct kiocb
> >> *iocb, struct iov_iter *iter)
> >> wakeup = false;
> >> }
> >>
> >> + if (iov_iter_rw(iter) == WRITE)
> >> + down_read(&BTRFS_I(inode)->dio_sem);
> >> ret = __blockdev_direct_IO(iocb, inode,
> >>
> >> BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev,
> >> iter, btrfs_get_blocks_direct, NULL,
> >> btrfs_submit_direct, flags);
> >> if (iov_iter_rw(iter) == WRITE) {
> >> + up_read(&BTRFS_I(inode)->dio_sem);
> >> current->journal_info = NULL;
> >> if (ret < 0 && ret != -EIOCBQUEUED) {
> >> if (dio_data.reserve)
> >>
> >>
> >> Let me know what you think. Thanks.
> >
> > Hi Filipe,
> >
> > After going over again fs/direct-io.c, the AIO dio_complete is diffrent from the
> > 'Note that' part in your patch [1], what am I missing?
> >
> > -------------------------------------------------------------------------
> > static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
> > {
> > ...
> > if (!(dio->flags & DIO_SKIP_DIO_COUNT))
> > inode_dio_end(dio->inode);
> >
> > if (is_async) {
> > /*
> > * generic_write_sync expects ki_pos to have been updated
> > * already, but the submission path only does this for
> > * synchronous I/O.
> > */
> > dio->iocb->ki_pos += transferred;
> >
> > if (dio->op == REQ_OP_WRITE)
> > ret = generic_write_sync(dio->iocb, transferred);
> > dio->iocb->ki_complete(dio->iocb, ret, 0);
> > }
> > ...
> > }
>
> It's still the same as when I checked. The problem is that even after
> that call to inode_dio_end(), the inode->i_dio_count() is still
> non-zero (it's 1 iirc).
> I don't have any longer the debugging patch I used to find out that
> out (nor remember all the details), but I just tried again the
> approach of calling inode_dio_wait() in btrfs_sync_file() after
> locking the inode:
>
> https://friendpaste.com/pRwJkgsFXv6HglftK1BqH
>
> And the same deadlock ends up happening, which is trivial to reproduce
> with fstest generic/113:
>
...
> I don't have the time to go now figure out all the details of the aio
> code path again as I'm on vacation. But it's pretty evident that that
> solution doesn't work unfortunately.
All right...I've got the deadlock under space pressure, i.e. run
generic/113 with '-o fragment=data'.
I was not able to reproduce the deadlock with generic/113 after applying
'inode_dio_wait' in fsync, but I agree on not using 'inode_dio_wait'
approach because dio reads also contribute to inode->i_dio_count.
With the above patch that acquires the semaphore at a higher level
in direct IO write path, the deadlock now has gone away.
So would you please make a single patch? (It's also a good candidate
for stable kernel.)
Thanks,
-liubo
prev parent reply other threads:[~2016-12-16 22:37 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-04 19:20 [PATCH] Btrfs: adjust len of writes if following a preallocated extent Liu Bo
2016-11-22 19:59 ` Chris Mason
2016-11-23 15:54 ` NOC - Profihost AG
2016-11-23 17:21 ` resend: " Stefan Priebe - Profihost AG
2016-11-23 18:23 ` Holger Hoffstätte
2016-11-23 18:58 ` Stefan Priebe - Profihost AG
2016-11-23 21:22 ` Liu Bo
2016-11-24 11:13 ` Filipe Manana
2016-12-02 1:41 ` Liu Bo
2016-12-02 12:25 ` Filipe Manana
2016-12-16 22:37 ` Liu Bo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161216223702.GB2002@localhost.localdomain \
--to=bo.li.liu@oracle.com \
--cc=dsterba@suse.cz \
--cc=fdmanana@gmail.com \
--cc=jbacik@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=s.priebe@profihost.ag \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).