Re: [PATCH] zonefs: Always invalidate last cache page on append write


From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Hans Holmberg <hans.holmberg@wdc.com>
Subject: Re: [PATCH] zonefs: Always invalidate last cache page on append write
Date: Wed, 29 Mar 2023 17:27:43 +0900
Message-ID: <46acc134-3f38-2a2d-c2aa-11d2fbee2abc@opensource.wdc.com>
In-Reply-To: <ZCPzbFzjFyiOVDdl@infradead.org>

On 3/29/23 17:14, Christoph Hellwig wrote:
> On Wed, Mar 29, 2023 at 02:58:23PM +0900, Damien Le Moal wrote:
>> +	/*
>> +	 * If the inode block size (sector size) is smaller than the
>> +	 * page size, we may be appending data belonging to an already
>> +	 * cached last page of the inode. So make sure to invalidate that
>> +	 * last cached page. This will always be a no-op for the case where
>> +	 * the block size is equal to the page size.
>> +	 */
>> +	ret = invalidate_inode_pages2_range(inode->i_mapping,
>> +					    iocb->ki_pos >> PAGE_SHIFT, -1);
>> +	if (ret)
>> +		return ret;
> 
> The missing truncate here obviously is a bug and needs fixing.
> 
> But why does this not follow the logic in __iomap_dio_rw to return
> -ENOTBLK for any error so that the write falls back to buffered I/O?

This is a write to sequential zones so we cannot use buffered writes. We have to
do a direct write to ensure ordering between writes.

Note that this is the special blocking write case where we issue a zone append.
Async regular writes go through iomap, so this bug does not exist there. However,
I now realize that __iomap_dio_rw() falling back to buffered I/O could also
create a write ordering problem.
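
As a rough user-space illustration of the ordering constraint discussed above,
the sketch below models a sequential zone as a write pointer plus a capacity.
All names (sketch_zone, sketch_zone_write, sketch_zone_append) are hypothetical
and are not zonefs or block layer code. The only assumption is the general
behavior of sequential write required zones: a regular write must land exactly
at the write pointer, while a zone append lets the device pick the location and
report it back.

```c
/* Hypothetical model of a sequential zone; not kernel code. */
struct sketch_zone {
	unsigned long long wp;       /* current write pointer, in bytes */
	unsigned long long capacity; /* usable zone size, in bytes */
};

/*
 * Regular write: only accepted at the write pointer. A write that is
 * reordered ahead of an earlier one fails instead of landing out of place.
 */
static int sketch_zone_write(struct sketch_zone *z,
			     unsigned long long off, unsigned long long len)
{
	if (off != z->wp || len > z->capacity - z->wp)
		return -1;
	z->wp += len;
	return 0;
}

/*
 * Zone append: the caller does not choose the offset. The data goes to the
 * current write pointer and that location is returned, or -1 if the zone
 * does not have enough room left.
 */
static long long sketch_zone_append(struct sketch_zone *z,
				    unsigned long long len)
{
	unsigned long long written_at = z->wp;

	if (len > z->capacity - z->wp)
		return -1;
	z->wp += len;
	return (long long)written_at;
}
```

In this model a write at offset 4096 issued before the write at offset 0 is
rejected. Buffered writeback that does not preserve submission order would hit
the same kind of failure.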

> Also as far as I can tell from reading the code, -1 is not a valid
> end special case for invalidate_inode_pages2_range, so you'll actually
> have to pass a valid end here.

I wondered about that but then saw:

int invalidate_inode_pages2(struct address_space *mapping)
{
	return invalidate_inode_pages2_range(mapping, 0, -1);
}
EXPORT_SYMBOL_GPL(invalidate_inode_pages2);

which tends to indicate that "-1" is fine. The end is passed to
find_get_entries() -> find_get_entry(), where it becomes a "max" pgoff_t, so
using -1 should be OK.
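
To make the arithmetic concrete, here is a small user-space sketch. It is not
kernel code and all sketch_* names are hypothetical. It assumes 4 KiB pages
(the kernel's PAGE_SHIFT depends on the architecture) and mirrors the kernel's
definition of pgoff_t as unsigned long. It shows what an int -1 becomes once
converted to that type, how the start index ki_pos >> PAGE_SHIFT is derived,
and when the append position falls inside a page rather than on a page
boundary.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; the real PAGE_SHIFT is architecture dependent. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1ULL << SKETCH_PAGE_SHIFT)

/* The kernel defines pgoff_t as unsigned long; mirrored here for user space. */
typedef unsigned long sketch_pgoff_t;

/* Index of the page that contains byte offset pos (pos must be >= 0). */
static sketch_pgoff_t sketch_page_index(long long pos)
{
	return (sketch_pgoff_t)(pos >> SKETCH_PAGE_SHIFT);
}

/* Value an int -1 argument takes once converted to the unsigned index type. */
static sketch_pgoff_t sketch_end_from_minus_one(void)
{
	return (sketch_pgoff_t)-1;
}

/*
 * True when pos is not page aligned, i.e. the append starts inside a page
 * that may already be cached with the data written before pos.
 */
static bool sketch_append_hits_partial_page(long long pos)
{
	return ((unsigned long long)pos & (SKETCH_PAGE_SIZE - 1)) != 0;
}
```

With these definitions, sketch_end_from_minus_one() equals ULONG_MAX, so the
range covers every index from the start page onward. An append at offset 512
starts inside page 0, while an append at offset 8192 starts exactly at the
beginning of page 2.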


-- 
Damien Le Moal
Western Digital Research

Thread overview: 10+ messages
2023-03-29  5:58 [PATCH] zonefs: Always invalidate last cache page on append write Damien Le Moal
2023-03-29  6:14 ` Johannes Thumshirn
2023-03-29  8:14 ` Christoph Hellwig
2023-03-29  8:27   ` Damien Le Moal [this message]
2023-03-29  9:49     ` Damien Le Moal
2023-03-29 23:36     ` Christoph Hellwig
2023-03-29 23:57       ` Damien Le Moal
2023-03-30  0:07         ` Christoph Hellwig
2023-03-30  0:22           ` Damien Le Moal
2023-03-29 11:04 ` Hans Holmberg
