From: "Theodore Ts'o" <tytso@mit.edu>
To: Leah Rumancik <leah.rumancik@gmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH v4 1/3] ext4: add discard/zeroout flags to journal flush
Date: Thu, 13 May 2021 14:09:26 -0400 [thread overview]
Message-ID: <YJ1rVr7Sf7Az+MQm@mit.edu> (raw)
In-Reply-To: <20210511180428.3358267-1-leah.rumancik@gmail.com>
On Tue, May 11, 2021 at 06:04:26PM +0000, Leah Rumancik wrote:
> @@ -3223,7 +3223,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
> ext4_clear_inode_state(inode, EXT4_STATE_JDATA);
> journal = EXT4_JOURNAL(inode);
> jbd2_journal_lock_updates(journal);
> - err = jbd2_journal_flush(journal);
> + err = jbd2_journal_flush(journal, 0);
In the ocfs2 changes, I noticed you are using "false", instead of 0,
in the second argument to jbd2_journal_flush.
When I looked more closely, the function signature of
jbd2_journal_flush is also using an unsigned long long for flags,
which struck me as strange:
> +extern int jbd2_journal_flush(journal_t *journal, unsigned long long flags);
I then noticed that later in the patch series, the ioctl argument is
taking an unsigned long long and we're passing that straight through
to jbd2_journal_flush().
First of all, unsigned long long is not very efficient on many
platforms (especially 32-bit platforms), but also on platforms where
int is 32 bits. If we don't expect us to need more than 32 flag bits,
I'd suggest explicit ly using __u32 in ioctl interface. (__u32 is
fine; it's the use of the base int type which can get us into trouble,
since int can be either 32 or 64 bits depending on the architecture).
Secondly, I'd suggest using a different set of flags for
jbd2_journal_flush(), which is an internal kernel interface, and the
EXT4_IOC_CHECKPOINT interface. We might in the future want to add
some internal flags to jbd2_journal_flush that we do *not* want to
expose via EXT4_IOC_CHECKPOINT, and so it's best that we keep those
two interfaces separate.
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 2dc944442802..f86929dbca3c 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -1686,6 +1686,106 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
> write_unlock(&journal->j_state_lock);
> }
>
> +#define JBD2_ERASE_FLAG_DISCARD 1
> +#define JBD2_ERASE_FLAG_ZEROOUT 2
I'd suggest defining these in include/linux/jbd2.h, and giving them
names like: JBD2_JOURNAL_FLUSH_DISCARD and JBD2_JOURNAL_FLUSH_ERASE...
(and making the flags parameter an unsigned int).
> + /* flags must be set to either discard or zeroout */
> + if ((flags & JBD2_ERASE_FLAG_DISCARD & JBD2_ERASE_FLAG_ZEROOUT) || !flags)
> + return -EINVAL;
The expression (flags & JBD2_ERASE_FLAG_DISCARD & JBD2_ERASE_FLAG_ZEROOUT)
is always going to evaluate to zero, since (1 & 2) is 0.
What you probably want is something like:
#define JBD2_JOURNAL_FLUSH_DISCARD 0x0001
#define JBD2_JOURNAL_FLUSH_ZEROOUT 0x0002
#define JBD2_JOURNAL_FLUSH_VALID 0x0003
if ((flags & ~JBD2_JOURNAL_FLUSH_VALID) ||
((flags & JBD2_JOURNAL_FLUSH_DISCARD) &&
(flags & JBD2_JOURNAL_FLUSH_ZEROOUT)))
return -EINVAL;
> +
> + err = jbd2_journal_bmap(journal, log_offset, &block_start);
> + if (err) {
> + printk(KERN_ERR "JBD2: bad block at offset %lu", log_offset);
> + return err;
> + }
We could get rid of this, and instead make sure block_start is initialized
to ~((unsigned long long) 0). Then in the loop we can do...
> +
> + /*
> + * use block_start - 1 to meet check for contiguous with previous region:
> + * phys_block == block_stop + 1
> + */
> + block_stop = block_start - 1;
> +
> + for (block = log_offset; block < journal->j_total_len; block++) {
> + err = jbd2_journal_bmap(journal, block, &phys_block);
> + if (err) {
> + printk(KERN_ERR "JBD2: bad block at offset %lu", block);
> + return err;
> + }
if (block_start == ~((unsigned long long) 0)) {
block_start = phys_block;
block_Stop = block_start - 1;
}
> +
> + if (block == journal->j_total_len - 1) {
> + block_stop = phys_block;
> + } else if (phys_block == block_stop + 1) {
> + block_stop++;
> + continue;
> + }
> +
> + /*
> + * not contiguous with prior physical block or this is last
> + * block of journal, take care of the region
> + */
> + byte_start = block_start * journal->j_blocksize;
> + byte_stop = block_stop * journal->j_blocksize;
> + byte_count = (block_stop - block_start + 1) *
> + journal->j_blocksize;
> +
> + truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
> + byte_start, byte_stop);
> +
> + if (flags & JBD2_ERASE_FLAG_DISCARD) {
> + err = blkdev_issue_discard(journal->j_dev,
> + byte_start >> SECTOR_SHIFT,
> + byte_count >> SECTOR_SHIFT,
> + GFP_NOFS, 0);
> + } else if (flags & JBD2_ERASE_FLAG_ZEROOUT) {
> + err = blkdev_issue_zeroout(journal->j_dev,
> + byte_start >> SECTOR_SHIFT,
> + byte_count >> SECTOR_SHIFT,
> + GFP_NOFS, 0);
> + }
> +
> + if (unlikely(err != 0)) {
> + printk(KERN_ERR "JBD2: (error %d) unable to wipe journal at physical blocks %llu - %llu",
> + err, block_start, block_stop);
> + return err;
> + }
> +
> + block_start = phys_block;
> + block_stop = phys_block;
Is this right? When we initialized the loop, above, block_stop was
set to block_start-1 (where block_start == phys_block). So I think it
might be more correct to replace the above two lines with:
block_start = ~((unsigned long long) 0);
... and then let block_start and block_stop be initialized in a single
place. Do you agree? Does this make sense to you?
- Ted
next prev parent reply other threads:[~2021-05-13 18:09 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-11 18:04 [PATCH v4 1/3] ext4: add discard/zeroout flags to journal flush Leah Rumancik
2021-05-11 18:04 ` [PATCH v4 2/3] ext4: add ioctl EXT4_IOC_CHECKPOINT Leah Rumancik
2021-05-13 21:05 ` Theodore Ts'o
2021-05-11 18:04 ` [PATCH v4 3/3] ext4: update journal documentation Leah Rumancik
2021-05-13 18:09 ` Theodore Ts'o [this message]
2021-05-13 20:18 ` [PATCH v4 1/3] ext4: add discard/zeroout flags to journal flush Darrick J. Wong
2021-05-13 20:27 ` Leah Rumancik
2021-05-14 21:26 ` Theodore Ts'o
2021-05-14 16:19 ` Leah Rumancik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YJ1rVr7Sf7Az+MQm@mit.edu \
--to=tytso@mit.edu \
--cc=leah.rumancik@gmail.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.