public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: fix deadlock while checkpoint thread waits commit thread to finish
Date: Fri, 23 Nov 2018 10:45:20 +0800	[thread overview]
Message-ID: <7f9dab38-26f8-6fd8-299e-30d711ee61b8@linux.alibaba.com> (raw)
In-Reply-To: <20181122123601.GO9840@quack2.suse.cz>

hi,

> On Wed 14-11-18 19:49:35, Xiaoguang Wang wrote:
>> This issue was found when I tried to put checkpoint work in a separate thread,
>> the deadlock below happened:
>>           Thread1                                |   Thread2
>> __jbd2_log_wait_for_space                       |
>> jbd2_log_do_checkpoint (hold j_checkpoint_mutex)|
>>    if (jh->b_transaction != NULL)                |
>>      ...                                         |
>>      jbd2_log_start_commit(journal, tid);        |jbd2_update_log_tail
>>                                                  |  will lock j_checkpoint_mutex,
>>                                                  |  but will be blocked here.
>>                                                  |
>>      jbd2_log_wait_commit(journal, tid);         |
>>      wait_event(journal->j_wait_done_commit,     |
>>       !tid_gt(tid, journal->j_commit_sequence)); |
>>       ...                                        |wake_up(j_wait_done_commit)
>>    }                                             |
>>
>> then deadlock occurs, Thread1 will never be waken up.
>>
>> To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint()
>> when we are going to wait for transaction commit.
>>
>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> 
> Thanks for the patch! One comment below...
> 
>> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
>> index 26f8d7e46462..e728844f2f0e 100644
>> --- a/fs/jbd2/checkpoint.c
>> +++ b/fs/jbd2/checkpoint.c
>> @@ -113,7 +113,7 @@ void __jbd2_log_wait_for_space(journal_t *journal)
>>   	nblocks = jbd2_space_needed(journal);
>>   	while (jbd2_log_space_left(journal) < nblocks) {
>>   		write_unlock(&journal->j_state_lock);
>> -		mutex_lock(&journal->j_checkpoint_mutex);
>> +		mutex_lock_io(&journal->j_checkpoint_mutex);
>>   
>>   		/*
>>   		 * Test again, another process may have checkpointed while we
>> @@ -241,8 +241,8 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>>   	 * done (maybe it's a new transaction, but it fell at the same
>>   	 * address).
>>   	 */
>> -	if (journal->j_checkpoint_transactions != transaction ||
>> -	    transaction->t_tid != this_tid)
>> +	if (journal->j_checkpoint_transactions == NULL ||
>> +	    journal->j_checkpoint_transactions->t_tid != this_tid)
>>   		goto out;
> 
> Why did you change this? As far as I can tell there's no difference and the
> previous condition makes it more obvious that we are still looking at the
> same transaction.
In this patch, we may drop j_checkpoint_mutex, then another thread may acquire
this lock, do checkpoint work and freed current transaction, "transaction->t_tid"
will cause an invalid pointer dereference.

Regards,
Xiaoguang Wang
> 
> Otherwise the patch looks good so after removing the above hunk feel free to
> add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> 								Honza
> 
> 
>> @@ -276,9 +276,22 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>>   		"JBD2: %s: Waiting for Godot: block %llu\n",
>>   		journal->j_devname, (unsigned long long) bh->b_blocknr);
>>   
>> +			if (batch_count)
>> +				__flush_batch(journal, &batch_count);
>>   			jbd2_log_start_commit(journal, tid);
>> +			/*
>> +			 * jbd2_journal_commit_transaction() may want
>> +			 * to take the checkpoint_mutex if JBD2_FLUSHED
>> +			 * is set, jbd2_update_log_tail() called by
>> +			 * jbd2_journal_commit_transaction() may also take
>> +			 * checkpoint_mutex.  So we need to temporarily
>> +			 * drop it.
>> +			 */
>> +			mutex_unlock(&journal->j_checkpoint_mutex);
>>   			jbd2_log_wait_commit(journal, tid);
>> -			goto retry;
>> +			mutex_lock_io(&journal->j_checkpoint_mutex);
>> +			spin_lock(&journal->j_list_lock);
>> +			goto restart;
>>   		}
>>   		if (!buffer_dirty(bh)) {
>>   			if (unlikely(buffer_write_io_error(bh)) && !result)
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index 8ef6b6daaa7a..88d8f22d2cba 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -2067,7 +2067,7 @@ int jbd2_journal_wipe(journal_t *journal, int write)
>>   	err = jbd2_journal_skip_recovery(journal);
>>   	if (write) {
>>   		/* Lock to make assertions happy... */
>> -		mutex_lock(&journal->j_checkpoint_mutex);
>> +		mutex_lock_io(&journal->j_checkpoint_mutex);
>>   		jbd2_mark_journal_empty(journal, REQ_SYNC | REQ_FUA);
>>   		mutex_unlock(&journal->j_checkpoint_mutex);
>>   	}
>> -- 
>> 2.17.2
>>

  reply	other threads:[~2018-11-23 13:27 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-14 11:49 [PATCH] ext4: fix deadlock while checkpoint thread waits commit thread to finish Xiaoguang Wang
2018-11-22 12:36 ` Jan Kara
2018-11-23  2:45   ` Xiaoguang Wang [this message]
2018-11-23 11:15     ` Jan Kara
2018-11-24  3:40       ` Xiaoguang Wang
  -- strict thread matches above, loose matches on Subject: below --
2018-11-13  8:59 Xiaoguang Wang
2018-11-13 12:39 ` Jan Kara
2018-11-13 13:00   ` Xiaoguang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7f9dab38-26f8-6fd8-299e-30d711ee61b8@linux.alibaba.com \
    --to=xiaoguang.wang@linux.alibaba.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox