linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: jack@suse.cz, tytso@mit.edu, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, mbobrowski@mbobrowski.org,
	joseph.qi@linux.alibaba.com
Subject: Re: [PATCHv4 2/3] ext4: Start with shared i_rwsem in case of DIO instead of exclusive
Date: Thu, 5 Dec 2019 13:03:07 +0100	[thread overview]
Message-ID: <20191205120307.GA32639@quack2.suse.cz> (raw)
In-Reply-To: <20191205064624.13419-3-riteshh@linux.ibm.com>

On Thu 05-12-19 12:16:23, Ritesh Harjani wrote:
> Earlier there was no shared lock in DIO read path. But this patch
> (16c54688592ce: ext4: Allow parallel DIO reads)
> simplified some of the locking mechanism while still allowing for parallel DIO
> reads by adding shared lock in inode DIO read path.
> 
> But this created problem with mixed read/write workload. It is due to the fact
> that in DIO path, we first start with exclusive lock and only when we determine
> that it is a ovewrite IO, we downgrade the lock. This causes the problem, since
> we still have shared locking in DIO reads.
> 
> So, this patch tries to fix this issue by starting with shared lock and then
> switching to exclusive lock only when required based on ext4_dio_write_checks().
> 
> Other than that, it also simplifies below cases:-
> 
> 1. Simplified ext4_unaligned_aio API to ext4_unaligned_io. Previous API was
> abused in the sense that it was not really checking for AIO anywhere also it
> used to check for extending writes. So this API was renamed and simplified to
> ext4_unaligned_io() which actully only checks if the IO is really unaligned.
> 
> Now, in case of unaligned direct IO, iomap_dio_rw needs to do zeroing of partial
> block and that will require serialization against other direct IOs in the same
> block. So we take a exclusive inode lock for any unaligned DIO. In case of AIO
> we also need to wait for any outstanding IOs to complete so that conversion from
> unwritten to written is completed before anyone try to map the overlapping block.
> Hence we take exclusive inode lock and also wait for inode_dio_wait() for
> unaligned DIO case. Please note since we are anyway taking an exclusive lock in
> unaligned IO, inode_dio_wait() becomes a no-op in case of non-AIO DIO.
> 
> 2. Added ext4_extending_io(). This checks if the IO is extending the file.
> 
> 3. Added ext4_dio_write_checks(). In this we start with shared inode lock and
> only switch to exclusive lock if required. So in most cases with aligned,
> non-extending, dioread_nolock & overwrites, it tries to write with a shared
> lock. If not, then we restart the operation in ext4_dio_write_checks(), after
> acquiring exclusive lock.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Cool, the patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

Two small nits below:

> -static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
> +static ssize_t ext4_generic_write_checks(struct kiocb *iocb,
> +					 struct iov_iter *from)
>  {
>  	struct inode *inode = file_inode(iocb->ki_filp);
>  	ssize_t ret;
> @@ -228,11 +235,21 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
>  		iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos);
>  	}
>  
> +	return iov_iter_count(from);
> +}

You return iov_iter_count() from ext4_generic_write_checks()...

> +static ssize_t ext4_dio_write_checks(struct kiocb *iocb, struct iov_iter *from,
> +				     bool *ilock_shared, bool *extend)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct inode *inode = file_inode(file);
> +	loff_t offset;
> +	size_t count;
> +	ssize_t ret;
> +
> +restart:
> +	ret = ext4_generic_write_checks(iocb, from);
> +	if (ret <= 0)
> +		goto out;
> +
> +	offset = iocb->ki_pos;
> +	count = iov_iter_count(from);

But you don't use the returned count here and just call iov_iter_count()
again (which is cheap anyway but still it's strange).

> +	if (ext4_extending_io(inode, offset, count))
> +		*extend = true;
> +	/*
> +	 * Determine whether the IO operation will overwrite allocated
> +	 * and initialized blocks. If so, check to see whether it is
> +	 * possible to take the dioread_nolock path.
> +	 *
> +	 * We need exclusive i_rwsem for changing security info
> +	 * in file_modified().
> +	 */
> +	if (*ilock_shared && (!IS_NOSEC(inode) || *extend ||
> +	     !ext4_should_dioread_nolock(inode) ||
> +	     !ext4_overwrite_io(inode, offset, count))) {
> +		inode_unlock_shared(inode);
> +		*ilock_shared = false;
> +		inode_lock(inode);
> +		goto restart;
> +	}
> +
> +	ret = file_modified(file);
> +	if (ret < 0)
> +		goto out;
> +
> +	return count;

And then you return count from ext4_dio_write_checks() here...

> -	ret = ext4_write_checks(iocb, from);
> -	if (ret <= 0) {
> -		inode_unlock(inode);
> +	ret = ext4_dio_write_checks(iocb, from, &ilock_shared, &extend);
> +	if (ret <= 0)
>  		return ret;
> -	}
>  
> -	/*
> -	 * Unaligned asynchronous direct I/O must be serialized among each
> -	 * other as the zeroing of partial blocks of two competing unaligned
> -	 * asynchronous direct I/O writes can result in data corruption.
> -	 */
>  	offset = iocb->ki_pos;
>  	count = iov_iter_count(from);

And then again just don't use the value here...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2019-12-05 12:03 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-05  6:46 [PATCHv4 0/3] Fix inode_lock sequence to scale performance of DIO mixed R/W workload Ritesh Harjani
2019-12-05  6:46 ` [PATCHv4 1/3] ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT Ritesh Harjani
2019-12-05  6:46 ` [PATCHv4 2/3] ext4: Start with shared i_rwsem in case of DIO instead of exclusive Ritesh Harjani
2019-12-05 12:03   ` Jan Kara [this message]
2019-12-05 13:40     ` Ritesh Harjani
2019-12-05  6:46 ` [PATCHv4 3/3] ext4: Move to shared i_rwsem even without dioread_nolock mount opt Ritesh Harjani
2019-12-05 12:05   ` Jan Kara
2019-12-05 13:41     ` Ritesh Harjani
2019-12-05 13:46     ` Ritesh Harjani
2019-12-06  8:46 ` [PATCHv4 0/3] Fix inode_lock sequence to scale performance of DIO mixed R/W workload Joseph Qi
2019-12-06  8:49   ` Ritesh Harjani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191205120307.GA32639@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mbobrowski@mbobrowski.org \
    --cc=riteshh@linux.ibm.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).