Re: [PATCH v2] ext4: make dioread_nolock consistent in each mapping round

From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Jinke Han <hanjinke.666@bytedance.com>,
	tytso@mit.edu, adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	yi.zhang@huawei.com, Jinke Han <hanjinke.666@bytedance.com>
Subject: Re: [PATCH v2] ext4: make dioread_nolock consistent in each mapping round
Date: Thu, 16 Feb 2023 18:43:22 +0530
Message-ID: <87fsb5lpn1.fsf@doe.com>
In-Reply-To: <20230207124136.88222-1-hanjinke.666@bytedance.com>

Jinke Han <hanjinke.666@bytedance.com> writes:

> From: Jinke Han <hanjinke.666@bytedance.com>
>
> When dioread_nolock is disabled and re-enabled by remount, we may see
> dioread_lock in ext4_do_writepages but dioread_nolock in
> mpage_map_one_extent. This inconsistency may trigger the warning
> in ext4_add_complete_io when io_end->handle is NULL. Although
> this warning is harmless in most cases, there is still a risk of
> insufficient log reservation in conversion of unwritten extents.
>
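
For readers following the thread, the inconsistency described in the
commit message can be modeled in user space roughly as below. This is
only a sketch under stated assumptions: opt_dioread_nolock,
struct round_state and the round_*() helpers are hypothetical names,
not ext4 symbols, and the credit count is an arbitrary number.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the mount option that a remount can flip. */
static bool opt_dioread_nolock;

/* Hypothetical per-round state, loosely modeled on struct mpage_da_data. */
struct round_state {
	bool dioread_nolock;	/* value sampled when the round starts */
	int rsv_blocks;		/* credits reserved for extent conversion */
};

/* Sample the option once and derive the reservation from that sample. */
static void round_begin(struct round_state *rs, int conv_credits)
{
	rs->dioread_nolock = opt_dioread_nolock;
	rs->rsv_blocks = rs->dioread_nolock ? conv_credits : 0;
}

/* Later step of the same round: use the sample, not the live option. */
static bool round_wants_unwritten(const struct round_state *rs)
{
	return rs->dioread_nolock;
}

/*
 * True if the mapping step would ask for unwritten extents although
 * no conversion credits were reserved at the start of the round.
 */
static bool round_is_inconsistent(const struct round_state *rs,
				  bool map_decision)
{
	return map_decision && rs->rsv_blocks == 0;
}
```

If the mapping step re-reads opt_dioread_nolock after it was flipped,
round_is_inconsistent() reports the mismatch; feeding it
round_wants_unwritten() instead keeps both steps in agreement.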

Sorry, I haven't completely gone through the patch yet. But this idea of
caching the initial value of a mount parameter and passing it to different
functions while an I/O request completes does not look right to me.

If that's the case, shouldn't we disallow changing this mount option
until all outstanding I/Os have completed?
Then we would not need to cache the value of dioread_nolock at the start of
writepages and keep passing it down in case it changes.
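
To illustrate the alternative suggested above, here is a minimal
single-threaded user-space sketch that refuses to change the option
while I/O is outstanding. struct fs_model, io_start(), io_done() and
remount_set_dioread_nolock() are hypothetical names, not existing ext4
interfaces, and a real implementation would also have to stop new I/O
from starting between the check and the change.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the per-filesystem state involved. */
struct fs_model {
	bool dioread_nolock;	/* current value of the mount option */
	int inflight;		/* number of outstanding I/O requests */
};

/* Account for one I/O request being issued. */
static void io_start(struct fs_model *fs)
{
	fs->inflight++;
}

/* Account for one I/O request completing. */
static void io_done(struct fs_model *fs)
{
	fs->inflight--;
}

/*
 * Change the option only when no I/O is outstanding.
 * Returns 0 on success or -EBUSY if requests are still in flight.
 */
static int remount_set_dioread_nolock(struct fs_model *fs, bool enable)
{
	if (fs->inflight > 0)
		return -EBUSY;
	fs->dioread_nolock = enable;
	return 0;
}
```

With this rule every request sees a single value of the option from
submission to completion, so nothing has to be cached per round.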

Just my initial thoughts.

-ritesh

> Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
>  fs/ext4/inode.c | 28 ++++++++++++++++------------
>  1 file changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 40579ef513b7..122a22ccddb3 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1568,6 +1568,7 @@ struct mpage_da_data {
>  	struct ext4_io_submit io_submit;	/* IO submission data */
>  	unsigned int do_map:1;
>  	unsigned int scanned_until_end:1;
> +	unsigned int dioread_nolock:1;
>  };
>
>  static void mpage_release_unused_pages(struct mpage_da_data *mpd,
> @@ -2391,7 +2392,7 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
>  	struct inode *inode = mpd->inode;
>  	struct ext4_map_blocks *map = &mpd->map;
>  	int get_blocks_flags;
> -	int err, dioread_nolock;
> +	int err, dioread_nolock = mpd->dioread_nolock;
>
>  	trace_ext4_da_write_pages_extent(inode, map);
>  	/*
> @@ -2412,7 +2413,6 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
>  	get_blocks_flags = EXT4_GET_BLOCKS_CREATE |
>  			   EXT4_GET_BLOCKS_METADATA_NOFAIL |
>  			   EXT4_GET_BLOCKS_IO_SUBMIT;
> -	dioread_nolock = ext4_should_dioread_nolock(inode);
>  	if (dioread_nolock)
>  		get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
>  	if (map->m_flags & BIT(BH_Delay))
> @@ -2727,10 +2727,11 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
>  	handle_t *handle = NULL;
>  	struct inode *inode = mpd->inode;
>  	struct address_space *mapping = inode->i_mapping;
> -	int needed_blocks, rsv_blocks = 0, ret = 0;
> +	int needed_blocks, rsv_blocks = 0, rsv = 0, ret = 0;
>  	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
>  	struct blk_plug plug;
>  	bool give_up_on_write = false;
> +	bool dioread_nolock;
>
>  	trace_ext4_writepages(inode, wbc);
>
> @@ -2783,15 +2784,6 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
>  		ext4_journal_stop(handle);
>  	}
>
> -	if (ext4_should_dioread_nolock(inode)) {
> -		/*
> -		 * We may need to convert up to one extent per block in
> -		 * the page and we may dirty the inode.
> -		 */
> -		rsv_blocks = 1 + ext4_chunk_trans_blocks(inode,
> -						PAGE_SIZE >> inode->i_blkbits);
> -	}
> -
>  	if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
>  		range_whole = 1;
>
> @@ -2837,6 +2829,18 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
>  		goto unplug;
>
>  	while (!mpd->scanned_until_end && wbc->nr_to_write > 0) {
> +		dioread_nolock = ext4_should_dioread_nolock(inode);
> +		if (!rsv && dioread_nolock) {
> +			/*
> +			 * We may need to convert up to one extent per block in
> +			 * the page and we may dirty the inode.
> +			 */
> +			rsv = 1 + ext4_chunk_trans_blocks(inode,
> +							PAGE_SIZE >> inode->i_blkbits);
> +		}
> +		rsv_blocks = dioread_nolock ? rsv : 0;
> +		mpd->dioread_nolock = dioread_nolock;
> +
>  		/* For each extent of pages we use new io_end */
>  		mpd->io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
>  		if (!mpd->io_submit.io_end) {
> --
> 2.20.1

Thread overview: 3+ messages
2023-02-07 12:41 [PATCH v2] ext4: make dioread_nolock consistent in each mapping round Jinke Han
2023-02-16 13:13 ` Ritesh Harjani [this message]
2023-02-16 16:10   ` [External] " hanjinke
