From: Jan Kara <jack@suse.cz>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: jack@suse.cz, tytso@mit.edu, linux-ext4@vger.kernel.org,
linux-fsdevel@vger.kernel.org, mbobrowski@mbobrowski.org,
joseph.qi@linux.alibaba.com
Subject: Re: [PATCHv4 2/3] ext4: Start with shared i_rwsem in case of DIO instead of exclusive
Date: Thu, 5 Dec 2019 13:03:07 +0100 [thread overview]
Message-ID: <20191205120307.GA32639@quack2.suse.cz> (raw)
In-Reply-To: <20191205064624.13419-3-riteshh@linux.ibm.com>
On Thu 05-12-19 12:16:23, Ritesh Harjani wrote:
> Earlier there was no shared lock in DIO read path. But this patch
> (16c54688592ce: ext4: Allow parallel DIO reads)
> simplified some of the locking mechanism while still allowing for parallel DIO
> reads by adding shared lock in inode DIO read path.
>
> But this created problem with mixed read/write workload. It is due to the fact
> that in DIO path, we first start with exclusive lock and only when we determine
> that it is a ovewrite IO, we downgrade the lock. This causes the problem, since
> we still have shared locking in DIO reads.
>
> So, this patch tries to fix this issue by starting with shared lock and then
> switching to exclusive lock only when required based on ext4_dio_write_checks().
>
> Other than that, it also simplifies below cases:-
>
> 1. Simplified ext4_unaligned_aio API to ext4_unaligned_io. Previous API was
> abused in the sense that it was not really checking for AIO anywhere also it
> used to check for extending writes. So this API was renamed and simplified to
> ext4_unaligned_io() which actully only checks if the IO is really unaligned.
>
> Now, in case of unaligned direct IO, iomap_dio_rw needs to do zeroing of partial
> block and that will require serialization against other direct IOs in the same
> block. So we take a exclusive inode lock for any unaligned DIO. In case of AIO
> we also need to wait for any outstanding IOs to complete so that conversion from
> unwritten to written is completed before anyone try to map the overlapping block.
> Hence we take exclusive inode lock and also wait for inode_dio_wait() for
> unaligned DIO case. Please note since we are anyway taking an exclusive lock in
> unaligned IO, inode_dio_wait() becomes a no-op in case of non-AIO DIO.
>
> 2. Added ext4_extending_io(). This checks if the IO is extending the file.
>
> 3. Added ext4_dio_write_checks(). In this we start with shared inode lock and
> only switch to exclusive lock if required. So in most cases with aligned,
> non-extending, dioread_nolock & overwrites, it tries to write with a shared
> lock. If not, then we restart the operation in ext4_dio_write_checks(), after
> acquiring exclusive lock.
>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Cool, the patch looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Two small nits below:
> -static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
> +static ssize_t ext4_generic_write_checks(struct kiocb *iocb,
> + struct iov_iter *from)
> {
> struct inode *inode = file_inode(iocb->ki_filp);
> ssize_t ret;
> @@ -228,11 +235,21 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
> iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos);
> }
>
> + return iov_iter_count(from);
> +}
You return iov_iter_count() from ext4_generic_write_checks()...
> +static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from,
> + bool *ilock_shared, bool *extend)
> +{
> + struct file *file = iocb->ki_filp;
> + struct inode *inode = file_inode(file);
> + loff_t offset;
> + size_t count;
> + ssize_t ret;
> +
> +restart:
> + ret = ext4_generic_write_checks(iocb, from);
> + if (ret <= 0)
> + goto out;
> +
> + offset = iocb->ki_pos;
> + count = iov_iter_count(from);
But you don't use the returned count here and just call iov_iter_count()
again (which is cheap anyway but still it's strange).
> + if (ext4_extending_io(inode, offset, count))
> + *extend = true;
> + /*
> + * Determine whether the IO operation will overwrite allocated
> + * and initialized blocks. If so, check to see whether it is
> + * possible to take the dioread_nolock path.
> + *
> + * We need exclusive i_rwsem for changing security info
> + * in file_modified().
> + */
> + if (*ilock_shared && (!IS_NOSEC(inode) || *extend ||
> + !ext4_should_dioread_nolock(inode) ||
> + !ext4_overwrite_io(inode, offset, count))) {
> + inode_unlock_shared(inode);
> + *ilock_shared = false;
> + inode_lock(inode);
> + goto restart;
> + }
> +
> + ret = file_modified(file);
> + if (ret < 0)
> + goto out;
> +
> + return count;
And then you return count from ext4_dio_write_checks() here...
> - ret = ext4_write_checks(iocb, from);
> - if (ret <= 0) {
> - inode_unlock(inode);
> + ret = ext4_dio_write_checks(iocb, from, &ilock_shared, &extend);
> + if (ret <= 0)
> return ret;
> - }
>
> - /*
> - * Unaligned asynchronous direct I/O must be serialized among each
> - * other as the zeroing of partial blocks of two competing unaligned
> - * asynchronous direct I/O writes can result in data corruption.
> - */
> offset = iocb->ki_pos;
> count = iov_iter_count(from);
And then again just don't use the value here...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2019-12-05 12:03 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-05 6:46 [PATCHv4 0/3] Fix inode_lock sequence to scale performance of DIO mixed R/W workload Ritesh Harjani
2019-12-05 6:46 ` [PATCHv4 1/3] ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT Ritesh Harjani
2019-12-05 6:46 ` [PATCHv4 2/3] ext4: Start with shared i_rwsem in case of DIO instead of exclusive Ritesh Harjani
2019-12-05 12:03 ` Jan Kara [this message]
2019-12-05 13:40 ` Ritesh Harjani
2019-12-05 6:46 ` [PATCHv4 3/3] ext4: Move to shared i_rwsem even without dioread_nolock mount opt Ritesh Harjani
2019-12-05 12:05 ` Jan Kara
2019-12-05 13:41 ` Ritesh Harjani
2019-12-05 13:46 ` Ritesh Harjani
2019-12-06 8:46 ` [PATCHv4 0/3] Fix inode_lock sequence to scale performance of DIO mixed R/W workload Joseph Qi
2019-12-06 8:49 ` Ritesh Harjani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191205120307.GA32639@quack2.suse.cz \
--to=jack@suse.cz \
--cc=joseph.qi@linux.alibaba.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mbobrowski@mbobrowski.org \
--cc=riteshh@linux.ibm.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.