From: Leah Rumancik <leah.rumancik@gmail.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH v4 1/3] ext4: add discard/zeroout flags to journal flush
Date: Thu, 13 May 2021 16:27:23 -0400 [thread overview]
Message-ID: <YJ2Lq4rTCv7RciL1@google.com> (raw)
In-Reply-To: <YJ1rVr7Sf7Az+MQm@mit.edu>
On Thu, May 13, 2021 at 02:09:26PM -0400, Theodore Ts'o wrote:
> On Tue, May 11, 2021 at 06:04:26PM +0000, Leah Rumancik wrote:
> > @@ -3223,7 +3223,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
> > ext4_clear_inode_state(inode, EXT4_STATE_JDATA);
> > journal = EXT4_JOURNAL(inode);
> > jbd2_journal_lock_updates(journal);
> > - err = jbd2_journal_flush(journal);
> > + err = jbd2_journal_flush(journal, 0);
>
> In the ocfs2 changes, I noticed you are using "false", instead of 0,
> in the second argument to jbd2_journal_flush.
>
> When I looked more closely, the function signature of
> jbd2_journal_flush is also using an unsigned long long for flags,
> which struck me as strange:
>
> > +extern int jbd2_journal_flush(journal_t *journal, unsigned long long flags);
>
> I then noticed that later in the patch series, the ioctl argument is
> taking an unsigned long long and we're passing that straight through
> to jbd2_journal_flush().
>
> First of all, unsigned long long is not very efficient on many
> platforms (especially 32-bit platforms), but also on platforms where
> int is 32 bits. If we don't expect us to need more than 32 flag bits,
> I'd suggest explicit ly using __u32 in ioctl interface. (__u32 is
> fine; it's the use of the base int type which can get us into trouble,
> since int can be either 32 or 64 bits depending on the architecture).
>
Just to make sure I understand correctly, the explicit __u32 is critical
due to the size being read in by the ioctl, specifically through
copy_from_user? When do you switch from __u32 to unsigned long? I don't
see the __* types being carried throughout.
(Also, just got Darrick's reply about the 32 vs. 64. Yes, originally
went with 64 because there was an argument for it. I believe the 32
is likely sufficient, but I don't feel that strongly about this matter)
> Secondly, I'd suggest using a different set of flags for
> jbd2_journal_flush(), which is an internal kernel interface, and the
> EXT4_IOC_CHECKPOINT interface. We might in the future want to add
> some internal flags to jbd2_journal_flush that we do *not* want to
> expose via EXT4_IOC_CHECKPOINT, and so it's best that we keep those
> two interfaces separate.
>
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index 2dc944442802..f86929dbca3c 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -1686,6 +1686,106 @@ static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
> > write_unlock(&journal->j_state_lock);
> > }
> >
> > +#define JBD2_ERASE_FLAG_DISCARD 1
> > +#define JBD2_ERASE_FLAG_ZEROOUT 2
>
> I'd suggest defining these in include/linux/jbd2.h, and giving them
> names like: JBD2_JOURNAL_FLUSH_DISCARD and JBD2_JOURNAL_FLUSH_ERASE...
> (and making the flags parameter an unsigned int).
>
> > + /* flags must be set to either discard or zeroout */
> > + if ((flags & JBD2_ERASE_FLAG_DISCARD & JBD2_ERASE_FLAG_ZEROOUT) || !flags)
> > + return -EINVAL;
>
> The expression (flags & JBD2_ERASE_FLAG_DISCARD & JBD2_ERASE_FLAG_ZEROOUT)
> is always going to evaluate to zero, since (1 & 2) is 0.
>
> What you probably want is something like:
>
> #define JBD2_JOURNAL_FLUSH_DISCARD 0x0001
> #define JBD2_JOURNAL_FLUSH_ZEROOUT 0x0002
> #define JBD2_JOURNAL_FLUSH_VALID 0x0003
>
Why call them JBD2_JOURNAL_FLUSH* instead of JBD2_JOURNAL_ERASE* since
they get passed directly through to the erase function? I feel like it
would be weird if someone wanted to use the erase function directly but
had to use JBD2_JOURNAL_FLUSH* flags.
> if ((flags & ~JBD2_JOURNAL_FLUSH_VALID) ||
> ((flags & JBD2_JOURNAL_FLUSH_DISCARD) &&
> (flags & JBD2_JOURNAL_FLUSH_ZEROOUT)))
> return -EINVAL;
>
Ah, great. Thanks!
> > +
> > + err = jbd2_journal_bmap(journal, log_offset, &block_start);
> > + if (err) {
> > + printk(KERN_ERR "JBD2: bad block at offset %lu", log_offset);
> > + return err;
> > + }
>
> We could get rid of this, and instead make sure block_start is initialized
> to ~((unsigned long long) 0). Then in the loop we can do...
>
> > +
> > + /*
> > + * use block_start - 1 to meet check for contiguous with previous region:
> > + * phys_block == block_stop + 1
> > + */
> > + block_stop = block_start - 1;
> > +
> > + for (block = log_offset; block < journal->j_total_len; block++) {
> > + err = jbd2_journal_bmap(journal, block, &phys_block);
> > + if (err) {
> > + printk(KERN_ERR "JBD2: bad block at offset %lu", block);
> > + return err;
> > + }
>
> if (block_start == ~((unsigned long long) 0)) {
> block_start = phys_block;
> block_Stop = block_start - 1;
> }
>
> > +
> > + if (block == journal->j_total_len - 1) {
> > + block_stop = phys_block;
> > + } else if (phys_block == block_stop + 1) {
> > + block_stop++;
> > + continue;
> > + }
> > +
> > + /*
> > + * not contiguous with prior physical block or this is last
> > + * block of journal, take care of the region
> > + */
> > + byte_start = block_start * journal->j_blocksize;
> > + byte_stop = block_stop * journal->j_blocksize;
> > + byte_count = (block_stop - block_start + 1) *
> > + journal->j_blocksize;
> > +
> > + truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
> > + byte_start, byte_stop);
> > +
> > + if (flags & JBD2_ERASE_FLAG_DISCARD) {
> > + err = blkdev_issue_discard(journal->j_dev,
> > + byte_start >> SECTOR_SHIFT,
> > + byte_count >> SECTOR_SHIFT,
> > + GFP_NOFS, 0);
> > + } else if (flags & JBD2_ERASE_FLAG_ZEROOUT) {
> > + err = blkdev_issue_zeroout(journal->j_dev,
> > + byte_start >> SECTOR_SHIFT,
> > + byte_count >> SECTOR_SHIFT,
> > + GFP_NOFS, 0);
> > + }
> > +
> > + if (unlikely(err != 0)) {
> > + printk(KERN_ERR "JBD2: (error %d) unable to wipe journal at physical blocks %llu - %llu",
> > + err, block_start, block_stop);
> > + return err;
> > + }
> > +
> > + block_start = phys_block;
> > + block_stop = phys_block;
>
> Is this right? When we initialized the loop, above, block_stop was
> set to block_start-1 (where block_start == phys_block). So I think it
> might be more correct to replace the above two lines with:
>
> block_start = ~((unsigned long long) 0);
>
I'll play around with this and see if I can get it to work. Seems like
it might simplify the code a bit.
> ... and then let block_start and block_stop be initialized in a single
> place. Do you agree? Does this make sense to you?
>
> - Ted
Thanks!
Leah
next prev parent reply other threads:[~2021-05-13 20:27 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-11 18:04 [PATCH v4 1/3] ext4: add discard/zeroout flags to journal flush Leah Rumancik
2021-05-11 18:04 ` [PATCH v4 2/3] ext4: add ioctl EXT4_IOC_CHECKPOINT Leah Rumancik
2021-05-13 21:05 ` Theodore Ts'o
2021-05-11 18:04 ` [PATCH v4 3/3] ext4: update journal documentation Leah Rumancik
2021-05-13 18:09 ` [PATCH v4 1/3] ext4: add discard/zeroout flags to journal flush Theodore Ts'o
2021-05-13 20:18 ` Darrick J. Wong
2021-05-13 20:27 ` Leah Rumancik [this message]
2021-05-14 21:26 ` Theodore Ts'o
2021-05-14 16:19 ` Leah Rumancik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YJ2Lq4rTCv7RciL1@google.com \
--to=leah.rumancik@gmail.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.