Re: [PATCH 5/7] ext4: pass -ESHUTDOWN code to jbd2 layer

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH 5/7] ext4: pass -ESHUTDOWN code to jbd2 layer
       [not found] ` <20180220023038.19883-6-tytso@mit.edu>
@ 2018-03-06 17:10   ` Jan Kara
  2018-03-22 15:26     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2018-03-06 17:10 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, stable

On Mon 19-02-18 21:30:36, Theodore Ts'o wrote:
> Previously the jbd2 layer assumed that a file system check would be
> required after a journal abort.  In the case of the deliberate file
> system shutdown, this should not be necessary.  Allow the jbd2 layer
> to distinguish between these two cases by using the ESHUTDOWN errno.
> 
> Also add proper locking to __journal_abort_soft().
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: stable@vger.kernel.org
> ---

Maybe a cleaner api would be to have a separate JBD2 function like
jbd2_journal_abort_clean() that would just abort the journal without
setting an error to it.

> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 3fbf48ec2188..efa0c72a0b9f 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2105,12 +2108,22 @@ void __jbd2_journal_abort_hard(journal_t *journal)
>   * but don't do any other IO. */
>  static void __journal_abort_soft (journal_t *journal, int errno)
>  {
> -	if (journal->j_flags & JBD2_ABORT)
> -		return;
> +	int old_errno;
>  
> -	if (!journal->j_errno)
> +	write_lock(&journal->j_state_lock);
> +	old_errno = journal->j_errno;
> +	if (!journal->j_errno || errno == -ESHUTDOWN)
>  		journal->j_errno = errno;
>  
> +	if (journal->j_flags & JBD2_ABORT) {
> +		write_unlock(&journal->j_state_lock);
> +		if (!old_errno && old_errno != -ESHUTDOWN &&
> +		    errno == -ESHUTDOWN)
> +			jbd2_journal_update_sb_errno(journal);
> +		return;
> +	}
> +	write_unlock(&journal->j_state_lock);
> +

Is it really correct that once the filesystem gets shutdown you clear the
previous error from the journal? Because if we hit some real fs corruption,
the journal gets aborted, and then someone calls ext4_shutdown(), we'd
clear that error which looks like a bug to me because that shutdown hardly
fixes the fs corruption...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 5/7] ext4: pass -ESHUTDOWN code to jbd2 layer
  2018-03-06 17:10   ` [PATCH 5/7] ext4: pass -ESHUTDOWN code to jbd2 layer Jan Kara
@ 2018-03-22 15:26     ` Theodore Y. Ts'o
  2018-05-17 15:18       ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Theodore Y. Ts'o @ 2018-03-22 15:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, stable

On Tue, Mar 06, 2018 at 06:10:41PM +0100, Jan Kara wrote:
> Is it really correct that once the filesystem gets shutdown you clear the
> previous error from the journal? Because if we hit some real fs corruption,
> the journal gets aborted, and then someone calls ext4_shutdown(), we'd
> clear that error which looks like a bug to me because that shutdown hardly
> fixes the fs corruption...

That's not what the code does.  If journal->j_errno is set, then we
won't clear it, for precisely what concern you've articulated.

      	    	    	      	   	   - Ted

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 5/7] ext4: pass -ESHUTDOWN code to jbd2 layer
  2018-03-22 15:26     ` Theodore Y. Ts'o
@ 2018-05-17 15:18       ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2018-05-17 15:18 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: Jan Kara, Ext4 Developers List, stable

On Thu 22-03-18 11:26:45, Theodore Y. Ts'o wrote:
> On Tue, Mar 06, 2018 at 06:10:41PM +0100, Jan Kara wrote:
> > Is it really correct that once the filesystem gets shutdown you clear the
> > previous error from the journal? Because if we hit some real fs corruption,
> > the journal gets aborted, and then someone calls ext4_shutdown(), we'd
> > clear that error which looks like a bug to me because that shutdown hardly
> > fixes the fs corruption...
> 
> That's not what the code does.  If journal->j_errno is set, then we
> won't clear it, for precisely what concern you've articulated.

Sorry for not following up on this earlier but now I've returned to this
and I still cannot wrap my head around checks in __journal_abort_soft().
There's:

       if (!journal->j_errno || errno == -ESHUTDOWN)
               journal->j_errno = errno;

Due to this ESHUTDOWN will override anything in journal->j_errno. Why is
that?

And then:

       if (journal->j_flags & JBD2_ABORT) {
                write_unlock(&journal->j_state_lock);
                if (!old_errno && old_errno != -ESHUTDOWN &&
                    errno == -ESHUTDOWN)
                        jbd2_journal_update_sb_errno(journal);

And here can the test ever be true? Probably only if we hard-aborted the
journal. The test !old_errno && old_errno != -ESHUTDOWN definitely looks
weird and is equivalent to old_errno == 0. Furthermore the errno ==
-ESHUTDOWN part looks strange as well as in that case we'd only clear the
sb->s_errno field... So what was really the intention here?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

[parent not found: <20180220023038.19883-7-tytso@mit.edu>]

* Re: [PATCH 6/7] jbd2: if the journal is aborted then don't allow update of the log tail
       [not found] ` <20180220023038.19883-7-tytso@mit.edu>
@ 2018-03-07  8:55   ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2018-03-07  8:55 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, stable

On Mon 19-02-18 21:30:37, Theodore Ts'o wrote:
> This updates the jbd2 superblock unnecessarily, and on an abort we
> shouldn't truncate the log.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: stable@vger.kernel.org

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/jbd2/journal.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index efa0c72a0b9f..dfb057900e79 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -974,7 +974,7 @@ int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block)
>  }
>  
>  /*
> - * This is a variaon of __jbd2_update_log_tail which checks for validity of
> + * This is a variation of __jbd2_update_log_tail which checks for validity of
>   * provided log tail and locks j_checkpoint_mutex. So it is safe against races
>   * with other threads updating log tail.
>   */
> @@ -1417,6 +1417,9 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
>  	journal_superblock_t *sb = journal->j_superblock;
>  	int ret;
>  
> +	if (is_journal_aborted(journal))
> +		return -EIO;
> +
>  	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
>  	jbd_debug(1, "JBD2: updating superblock (start %lu, seq %u)\n",
>  		  tail_block, tail_tid);
> -- 
> 2.16.1.72.g5be1f00a9a
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

[parent not found: <20180220023038.19883-8-tytso@mit.edu>]

* Re: [PATCH 7/7] ext4: don't update checksum of new initialized bitmaps
       [not found] ` <20180220023038.19883-8-tytso@mit.edu>
@ 2018-03-07  9:06   ` Jan Kara
  2018-03-22 15:29     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2018-03-07  9:06 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, stable

On Mon 19-02-18 21:30:38, Theodore Ts'o wrote:
> When reading the inode or block allocation bitmap, if the bitmap needs
> to be initialized, do not update the checksum in the block group
> descriptor.  That's because we're not set up to journal those changes.
> Instead, just set the verified bit on the bitmap block, so that it's
> not necessary to validate the checksum.
> 
> When a block or inode allocation actually happens, at that point the
> checksum will be calculated, and update of the bg descriptor block
> will be properly journalled.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: stable@vger.kernel.org
> ---
>  fs/ext4/balloc.c |  3 +--
>  fs/ext4/ialloc.c | 47 +++--------------------------------------------
>  2 files changed, 4 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index f9b3e0a83526..f82c4966f4ce 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -243,8 +243,6 @@ static int ext4_init_block_bitmap(struct super_block *sb,
>  	 */
>  	ext4_mark_bitmap_end(num_clusters_in_group(sb, block_group),
>  			     sb->s_blocksize * 8, bh->b_data);
> -	ext4_block_bitmap_csum_set(sb, block_group, gdp, bh);
> -	ext4_group_desc_csum_set(sb, block_group, gdp);
>  	return 0;
>  }

Probably you should remove the bad checksum handling in
ext4_init_block_bitmap() the same way as you did in
ext4_init_inode_bitmap()? Otherwise the patch looks good.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7/7] ext4: don't update checksum of new initialized bitmaps
  2018-03-07  9:06   ` [PATCH 7/7] ext4: don't update checksum of new initialized bitmaps Jan Kara
@ 2018-03-22 15:29     ` Theodore Y. Ts'o
  0 siblings, 0 replies; 9+ messages in thread
From: Theodore Y. Ts'o @ 2018-03-22 15:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, stable

On Wed, Mar 07, 2018 at 10:06:07AM +0100, Jan Kara wrote:
> 
> Probably you should remove the bad checksum handling in
> ext4_init_block_bitmap() the same way as you did in
> ext4_init_inode_bitmap()? Otherwise the patch looks good.

There isn't an ext4_init_inode_bitmap() --- the functionality is
inlined ext4_read_inode_bitmap(), and it's already doing it that way
(it doesn't set the checksum; it just sets the verified bit).

    	    	    	      	      - Ted
    

^ permalink raw reply	[flat|nested] 9+ messages in thread

[parent not found: <20180220023038.19883-5-tytso@mit.edu>]

* Re: [PATCH 4/7] ext4: eliminate sleep from shutdown ioctl
       [not found] ` <20180220023038.19883-5-tytso@mit.edu>
@ 2018-03-07  9:07   ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2018-03-07  9:07 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, stable

On Mon 19-02-18 21:30:35, Theodore Ts'o wrote:
> The msleep() when processing EXT4_GOING_FLAGS_NOLOGFLUSH was a hack to
> avoid some races (that are now fixed), but in fact it introduced its
> own race.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: stable@vger.kernel.org

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/ioctl.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 4d1b1575f8ac..16d3d1325f5b 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -498,10 +498,8 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
>  		break;
>  	case EXT4_GOING_FLAGS_NOLOGFLUSH:
>  		set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
> -		if (sbi->s_journal && !is_journal_aborted(sbi->s_journal)) {
> -			msleep(100);
> +		if (sbi->s_journal && !is_journal_aborted(sbi->s_journal))
>  			jbd2_journal_abort(sbi->s_journal, 0);
> -		}
>  		break;
>  	default:
>  		return -EINVAL;
> -- 
> 2.16.1.72.g5be1f00a9a
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

[parent not found: <20180220023038.19883-4-tytso@mit.edu>]

* Re: [PATCH 3/7] ext4: shutdown should not prevent get_write_access
       [not found] ` <20180220023038.19883-4-tytso@mit.edu>
@ 2018-03-07  9:42   ` Jan Kara
  2018-03-22 15:38     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2018-03-07  9:42 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, stable

On Mon 19-02-18 21:30:34, Theodore Ts'o wrote:
> The ext4 forced shutdown flag needs to prevent new handles from being
> started, but it needs to allow existing handles to complete.  So the
> forced shutdown flag should not force ext4_journal_get_write_access to
> fail.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: stable@vger.kernel.org

OK, if you want the semantics of ext4 shutdown to be that running
handles should be allowed to complete, I see where you are going with this
patch. However there are more problems with this semantics than just
__ext4_journal_get_write_access(). Just for example
ext4_reserve_inode_write() will bail in case the fs got shutdown and thus
inode changes won't be properly added to the running handle. Also places
that rely on nested transactions being possible will not work because
ext4_journal_start_sb() will refuse to get refcount of a running handle
(ext4_journal_check_start() fails) in case fs got shutdown. And I may have
missed other cases.

The above problems are a reason why I though the semantics of ext4 shutdown
was terminate the fs *now* - effectively a software equivalent of power
off. That is much easier to implement since we just have to make sure no
running handle makes it to the journal... Since I've said Google is using
ext4 shutdown - is there any reason why you need the "running handles are
allowed to finish" semantics? After all it seems it's just a race whether
some handle makes it before the cut off or not...

								Honza

> ---
>  fs/ext4/ext4_jbd2.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
> index 2d593201cf7a..7c70b08d104c 100644
> --- a/fs/ext4/ext4_jbd2.c
> +++ b/fs/ext4/ext4_jbd2.c
> @@ -166,13 +166,6 @@ int __ext4_journal_get_write_access(const char *where, unsigned int line,
>  	might_sleep();
>  
>  	if (ext4_handle_valid(handle)) {
> -		struct super_block *sb;
> -
> -		sb = handle->h_transaction->t_journal->j_private;
> -		if (unlikely(ext4_forced_shutdown(EXT4_SB(sb)))) {
> -			jbd2_journal_abort_handle(handle);
> -			return -EIO;
> -		}
>  		err = jbd2_journal_get_write_access(handle, bh);
>  		if (err)
>  			ext4_journal_abort_handle(where, line, __func__, bh,
> -- 
> 2.16.1.72.g5be1f00a9a
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/7] ext4: shutdown should not prevent get_write_access
  2018-03-07  9:42   ` [PATCH 3/7] ext4: shutdown should not prevent get_write_access Jan Kara
@ 2018-03-22 15:38     ` Theodore Y. Ts'o
  0 siblings, 0 replies; 9+ messages in thread
From: Theodore Y. Ts'o @ 2018-03-22 15:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, stable

On Wed, Mar 07, 2018 at 10:42:01AM +0100, Jan Kara wrote:
> On Mon 19-02-18 21:30:34, Theodore Ts'o wrote:
> > The ext4 forced shutdown flag needs to prevent new handles from being
> > started, but it needs to allow existing handles to complete.  So the
> > forced shutdown flag should not force ext4_journal_get_write_access to
> > fail.
> > 
> > Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> > Cc: stable@vger.kernel.org
> 
> OK, if you want the semantics of ext4 shutdown to be that running
> handles should be allowed to complete, I see where you are going with this
> patch. However there are more problems with this semantics than just
> __ext4_journal_get_write_access(). Just for example
> ext4_reserve_inode_write() will bail in case the fs got shutdown and thus
> inode changes won't be properly added to the running handle. Also places
> that rely on nested transactions being possible will not work because
> ext4_journal_start_sb() will refuse to get refcount of a running handle
> (ext4_journal_check_start() fails) in case fs got shutdown. And I may have
> missed other cases.

We have three cases in ext4_shutdown:

EXT4_GOING_FLAGS_DEFUALT:
    Freezes the block device, sets the EXT4_FLAGS_SHUTDOWN flag, and then
    unfreezes the block device:

EXT4_GOING_FLAGS_LOGFLUSH:
    Sets the EXT4_FLAGS_SHUTDOWN flag, forces a commit, and then aborts
    the journal.

EXT4_GOING_FLAGS_NOLOGFLUSH
    Sets the EXT4_FLAGS_SHUTDOWN flag, aborts the journal.

> The above problems are a reason why I though the semantics of ext4 shutdown
> was terminate the fs *now* - effectively a software equivalent of power
> off. That is much easier to implement since we just have to make sure no
> running handle makes it to the journal... Since I've said Google is using
> ext4 shutdown - is there any reason why you need the "running handles are
> allowed to finish" semantics? After all it seems it's just a race whether
> some handle makes it before the cut off or not...

Good points.  I'll have to look at what happens if we just drop the
EXT4_FLAGS_SHUTDOWN flag altogether.

							- Ted

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-05-17 15:18 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20180220023038.19883-1-tytso@mit.edu>
     [not found] ` <20180220023038.19883-6-tytso@mit.edu>
2018-03-06 17:10   ` [PATCH 5/7] ext4: pass -ESHUTDOWN code to jbd2 layer Jan Kara
2018-03-22 15:26     ` Theodore Y. Ts'o
2018-05-17 15:18       ` Jan Kara
     [not found] ` <20180220023038.19883-7-tytso@mit.edu>
2018-03-07  8:55   ` [PATCH 6/7] jbd2: if the journal is aborted then don't allow update of the log tail Jan Kara
     [not found] ` <20180220023038.19883-8-tytso@mit.edu>
2018-03-07  9:06   ` [PATCH 7/7] ext4: don't update checksum of new initialized bitmaps Jan Kara
2018-03-22 15:29     ` Theodore Y. Ts'o
     [not found] ` <20180220023038.19883-5-tytso@mit.edu>
2018-03-07  9:07   ` [PATCH 4/7] ext4: eliminate sleep from shutdown ioctl Jan Kara
     [not found] ` <20180220023038.19883-4-tytso@mit.edu>
2018-03-07  9:42   ` [PATCH 3/7] ext4: shutdown should not prevent get_write_access Jan Kara
2018-03-22 15:38     ` Theodore Y. Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).