From: Joseph Qi <joseph.qi@linux.alibaba.com>
To: Jan Kara <jack@suse.cz>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>,
    tytso@mit.edu, linux-ext4@vger.kernel.org, david@fromorbit.com,
    hch@infradead.org, adilger@dilger.ca, mbobrowski@mbobrowski.org,
    rgoldwyn@suse.de
Subject: Re: [RFC 0/2] ext4: Improve locking sequence in DIO write path
Date: Wed, 25 Sep 2019 09:17:22 +0800
Message-ID: <dc63d1cc-b6ae-7a6d-d932-9f36e0ca29bd@linux.alibaba.com>
In-Reply-To: <20190924151025.GD11819@quack2.suse.cz>

On 19/9/24 23:10, Jan Kara wrote:
> Hi Joseph!
>
> On Wed 18-09-19 14:35:15, Joseph Qi wrote:
>> On 19/9/17 18:32, Ritesh Harjani wrote:
>>> Hello,
>>>
>>> This patch series is based on the upstream discussion with Jan
>>> & Joseph @ [1].
>>> It is based on top of Matthew's v3 ext4 iomap patch series [2]
>>>
>>> Patch-1: Adds the ext4_ilock/unlock APIs and also replaces all
>>> inode_lock/unlock instances from fs/ext4/*
>>>
>>> For now I have already accounted for the trylock/lock semantics issue
>>> (which was discussed here [3]) in the same patch,
>>> since this whole patch is about the inode_lock/unlock API,
>>> so I thought it would be best to address that issue in the same patch.
>>> However, kindly let me know if otherwise.
>>>
>>> Patch-2: The commit message of this patch describes in detail
>>> what it is doing.
>>> In brief - we first try to take the shared lock (instead of the exclusive
>>> lock), unless it is an unaligned_io or extend_io. Then in
>>> ext4_dio_write_checks(), if we start with the shared lock, we see
>>> if we can really continue with the shared lock or not. If not, then
>>> we release the shared lock, acquire the exclusive lock,
>>> and restart ext4_dio_write_checks().
>>>
>>>
>>> Tested against a few xfstests (with the dioread_nolock mount option);
>>> those ran fine (ext4 & generic).
>>>
>>> I tried collecting performance numbers on my VM (since I could not get
>>> hold of any real h/w based test device). I could verify
>>> that earlier we were trying to do a downgrade_write() of the lock, but with
>>> this patch that path is now avoided for the fio test case
>>> (as reported by Joseph in [4]).
>>> But for the actual results, I am not sure if testing on a VM could
>>> really give the reliable perf numbers which we want to take a look at.
>>> I do observe some form of perf improvement, but I could not
>>> get any reliable numbers (not even with the same list of with/without
>>> patches with which Joseph posted his numbers [1]).
>>>
>>>
>>> @Joseph,
>>> Would it be possible for you to give your test case a run with these
>>> patches? That would be really helpful.
>>>
>>> The branch for this is hosted at the tree below.
>>>
>>> https://github.com/riteshharjani/linux/tree/ext4-ilock-RFC
>>>
>> I've tested your branch, the result is:
>> mounting with dioread_nolock, it behaves the same as reverting
>> parallel dio reads + dioread_nolock;
>> while mounting without dioread_nolock, there is no improvement, or it is
>> even worse.
>> Please refer to the test data below.
>>
>> fio -name=parallel_dio_reads_test -filename=/mnt/nvme0n1/testfile
>> -direct=1 -iodepth=1 -thread -rw=randrw -ioengine=psync -bs=$bs
>> -size=20G -numjobs=8 -runtime=600 -group_reporting
>>
>> w/ = with parallel dio reads
>> w/o = reverting parallel dio reads
>
> This is with 16c54688592ce8 "ext4: Allow parallel DIO reads" reverted,
> right?
Yes, actually, it also reverts the related patches:
  Revert "ext4: remove EXT4_STATE_DIOREAD_LOCK flag"
  Revert "ext4: fix off-by-one error when writing back pages before dio read"
  Revert "ext4: Allow parallel DIO reads"
>
>> w/o+ = reverting parallel dio reads + dioread_nolock
>> ilock = ext4-ilock-RFC
>> ilock+ = ext4-ilock-RFC + dioread_nolock
>>
>> bs=4k:
>> --------------------------------------------------------------
>>       | READ                      | WRITE                    |
>> --------------------------------------------------------------
>> w/    | 30898KB/s,7724,555.00us   | 30875KB/s,7718,479.70us  |
>> --------------------------------------------------------------
>> w/o   | 117915KB/s,29478,248.18us | 117854KB/s,29463,21.91us |
>> --------------------------------------------------------------
>
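A note on reading the table above. Assuming each cell is fio's bandwidth, IOPS and a latency figure (the table does not label the fields), the IOPS value is simply bandwidth divided by block size. The small check below reproduces that for bs=4k and also gives the w/o vs. w/ read throughput ratio. `demo_iops()` and `demo_ratio_x100()` are made-up helpers, not anything from fio.

```c
#include <assert.h>

/*
 * Hypothetical helper: whole I/Os per second for a bandwidth in KB/s and
 * a block size in KB. Integer division truncates, which matches the
 * figures quoted in the table.
 */
static long demo_iops(long bw_kb_per_s, long bs_kb)
{
    assert(bs_kb > 0);
    return bw_kb_per_s / bs_kb;
}

/* Hypothetical helper: bw_a / bw_b scaled by 100, to stay in integers. */
static long demo_ratio_x100(long bw_a, long bw_b)
{
    assert(bw_b > 0);
    return bw_a * 100 / bw_b;
}
```

With the read numbers from the table, demo_iops(30898, 4) gives 7724 and demo_iops(117915, 4) gives 29478, and demo_ratio_x100(117915, 30898) gives 381, i.e. roughly 3.8x higher read throughput with parallel dio reads reverted.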
> I'm really surprised by the numbers here. They would mean that when DIO
> read takes i_rwsem exclusive lock instead of shared, it is a win for your
> workload... Argh, now checking code in fs/direct-io.c I think I can see the
> difference. The trick in do_blockdev_direct_IO() is:
>
>         if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING))
>                 inode_unlock(dio->inode);
>         if (dio->is_async && retval == 0 && dio->result &&
>             (iov_iter_rw(iter) == READ || dio->result == count))
>                 retval = -EIOCBQUEUED;
>         else
>                 dio_await_completion(dio);
>
> So actually only direct IO read submission is protected by i_rwsem with
> DIO_LOCKING. Actual waiting for sync DIO read happens with i_rwsem dropped.
>
> After some thought I think the best solution for this is to just finally
> finish the conversion of ext4 so that dioread_nolock is the only DIO path.
> With i_rwsem held in shared mode even for "unlocked" DIO, it should be
> actually relatively simple and most of the dances with unwritten extents
> shouldn't be needed anymore.
>
> Honza
>