From: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
To: Yongqiang Yang <xiaoqiangnk@gmail.com>
Cc: toshi.okajima@jp.fujitsu.com, tytso@mit.edu,
adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org
Subject: Re: [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations
Date: Tue, 20 Dec 2011 19:44:00 +0900 [thread overview]
Message-ID: <4EF066F0.5010809@jp.fujitsu.com> (raw)
In-Reply-To: <CAGBYx2YZ4M8QwwOcnLAaf+934WAcZoNS_pPrw8j3Nkr3=6fA5g@mail.gmail.com>
Hi, Yongqiang.
Thanks for reviewing.
(2011/12/16 21:22), Yongqiang Yang wrote:
> On Fri, Dec 16, 2011 at 7:19 PM, Toshiyuki Okajima
> <toshi.okajima@jp.fujitsu.com> wrote:
>> Hi.
>>
>> I found a bug by executing the following reproducer (needs tuning).
>> (The reproducer never ends.)
>>
>> I wrote it for a confirmation of
>> "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
>> Therefore, without
>> "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode"
>> patch, the other problem (panic) which the patch describes can happen more
>> frequently:
>> -------------------------------------------------------------------------------
>> #!/bin/sh
>>
>> date
>> LOOP=100000
>> dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
>> /sbin/mkfs.ext4 -Fq /tmp/ext4.img
>> mount -o loop /tmp/ext4.img /mnt
>> rm -f /mnt/file
>> echo "0" > /mnt/file
>> (while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
>> PID=$!
>> for ((i=0; i < LOOP; i++));
>> do
>> DATE=$(date)
>> echo -n "[LOOP $i] $DATE"
>> if ((i%2 == 0));
>> then
>> chattr +j /mnt/file
>> else
>> chattr -j /mnt/file
>> fi
>> echo "0" >> /mnt/file
>> done
>> kill -9 $PID
>> rm -f /mnt/file*
>> umount /mnt
>> exit 0
>> -------------------------------------------------------------------------------
>> Though I ran it, it never ended forever.
>> (At one of my tries to reproduce the hang, the reproducer stopped in 6
>> hours.)
>>
>> Because jbd2_journal_lock_updates() never returns from schedule()
>> after prepare_to_wait().
>>
>> The detailed explanation is as follows:
>> ===============================================================================
>> Current implementation of jbd2_journal_stop() has a bug not to synchronize
>> t_updates with other operators, jbd2_journal_lock_updates(), start_this_handle()
> IMHO, the description is not very exactly, the problem is not due to
> desync of t_updates, t_updates is a var of atomic_t. The reason is
> that we have a critical code section here:[operation on t_updates,
> wait or wakeup], this section should be synchronized.
I understood. I will rewrite the description much better than this by
applying your comment.
>
>> and jbd2_journal_commit_transaction().
>>
>> This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
>> commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.
>>
>> ---------------------------
>> jbd2_journal_lock_updates()
>> ---------------------------
>> 509 write_lock(&journal->j_state_lock);
>> 510 ++journal->j_barrier_count;
>> ...
>> 513 while (1) {
>> ...
>> 519 spin_lock(&transaction->t_handle_lock);
>> 520 if (!atomic_read(&transaction->t_updates)) {
>> 521 spin_unlock(&transaction->t_handle_lock);
>> 522 break;
>> 523 }
>> 524 prepare_to_wait(&journal->j_wait_updates, &wait,
>> 525 TASK_UNINTERRUPTIBLE);
>> 526 spin_unlock(&transaction->t_handle_lock);
> It seems that t_handle_lock is used to protect the critical section
> here. I think t_handle is not needed here at all. t_handle_lock
> should be used to protect transaction and j_state_lock should be used
> to protect journal. prepare_to_wait operates on journal, so we should
> remove t_handle_lock here.
I think so, too. Then, I will write another patch which deletes the statements
of t_handle_lock.
>
>> 527 write_unlock(&journal->j_state_lock);
>> 528 schedule();
>> 529 finish_wait(&journal->j_wait_updates, &wait);
>> 530 write_lock(&journal->j_state_lock);
>> 531 }
>> 532 write_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
>> and spin_lock(&transaction->t_handle_lock).
>>
>> -------------------
>> start_this_handle()
>> -------------------
>> 160 read_lock(&journal->j_state_lock);
>> ...
>> 278 atomic_inc(&transaction->t_updates);
>> ...
>> 284 read_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by read_lock(&journal->j_state_lock).
>>
>> ---------------------------------
>> jbd2_journal_commit_transaction()
>> ---------------------------------
>> 358 write_lock(&journal->j_state_lock);
>> ...
>> 367 spin_lock(&commit_transaction->t_handle_lock);
>> 368 while (atomic_read(&commit_transaction->t_updates)) {
>> 369 DEFINE_WAIT(wait);
>> 370
>> 371 prepare_to_wait(&journal->j_wait_updates, &wait,
>> 372 TASK_UNINTERRUPTIBLE);
>> 373 if (atomic_read(&commit_transaction->t_updates)) {
>> 374 spin_unlock(&commit_transaction->t_handle_lock);
> Also, t_handle_lock is used to protect the critical section. I think
> t_handle_lock is not needed here at all.
Yes.
>
>> 375 write_unlock(&journal->j_state_lock);
>> 376 schedule();
>> 377 write_lock(&journal->j_state_lock);
>> 378 spin_lock(&commit_transaction->t_handle_lock);
>> 379 }
>> 380 finish_wait(&journal->j_wait_updates, &wait);
>> 381 }
>> 382 spin_unlock(&commit_transaction->t_handle_lock);
>> ...
>> 447 wake_up(&journal->j_wait_transaction_locked);
>> 448 write_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
>> and spin_lock(&transaction->t_handle_lock).
>>
>> -------------------
>> jbd2_journal_stop()
>> -------------------
>> ...
>> 1452 if (atomic_dec_and_test(&transaction->t_updates)) {
>> 1453 wake_up(&journal->j_wait_updates);
>> 1454 if (journal->j_barrier_count)
>> 1455 wake_up(&journal->j_wait_transaction_locked);
>> 1456 }
>> 1457
>> 1458 if (wait_for_commit)
>> 1459 err = jbd2_log_wait_commit(journal, tid);
>> 1460
>> 1461 lock_map_release(&handle->h_lockdep_map);
>> 1462
>> 1463 jbd2_free_handle(handle);
>> 1464 return err;
>> 1465 }
>>
>> * The operation of t_updates is not protected.
>>
>> Therefore,
>> - jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
>> - jbd2_journal_stop() tries to wake someone up
>> (wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0.
>> => But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
>> if it has not yet slept completely.
>>
>> After we run the reproducer, we can see the processes which become hang-up.
>>
>> -------------------------------------------------------------------------------
>> crash> ps | grep UN
>> 16645 23737 1 ffff88007a592b20 UN 0.0 105136 508 dd
>> 16684 23727 1 ffff88007a2a9480 UN 0.0 107152 512 chattr
>> 23722 2 1 ffff88007b496080 UN 0.0 0 0 [flush-7:0]
>>
>> [1]
>> crash> bt 16645
>> PID: 16645 TASK: ffff88007a592b20 CPU: 1 COMMAND: "dd"
>> #0 [ffff88007b575878] __schedule at ffffffff814b9b77
>> #1 [ffff88007b575920] schedule at ffffffff814ba3cf
>> #2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
>> #3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
>> #4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
>> #5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
>> #6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
>> #7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
>> #8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
>> #9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
>> #10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
>> #11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
>> #12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
>> #13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
>> #14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
>> RIP: 00000034930d4230 RSP: 00007fffd8176980 RFLAGS: 00000206
>> RAX: 0000000000000001 RBX: ffffffff814c3f42 RCX: 0000000000000010
>> RDX: 0000000000001000 RSI: 0000000001ac6000 RDI: 0000000000000001
>> RBP: 0000000001ac6000 R8: 0000000000000003 R9: 0000000000040000
>> R10: 0000000000003003 R11: 0000000000000246 R12: 0000000001ac5fff
>> R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000246
>> ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
>>
>> [2]
>> crash> bt 16684
>> PID: 16684 TASK: ffff88007a2a9480 CPU: 1 COMMAND: "chattr"
>> #0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
>> #1 [ffff88007af87d50] schedule at ffffffff814ba3cf
>> #2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
>> #3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
>> #4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
>> #5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
>> #6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
>> #7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
>> RIP: 00000034930d95f7 RSP: 00007fff8830e0d8 RFLAGS: 00010213
>> RAX: 0000000000000010 RBX: ffffffff814c3f42 RCX: 00000034930d95f7
>> RDX: 00007fff8830e17c RSI: 0000000040086602 RDI: 0000000000000003
>> RBP: 0000000000084000 R8: 0000000000000000 R9: 0000000000000001
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff8830f8db
>> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003
>> ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
>>
>> [3]
>> crash> bt 23722
>> PID: 23722 TASK: ffff88007b496080 CPU: 1 COMMAND: "flush-7:0"
>> #0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
>> #1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
>> #2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
>> #3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
>> #4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
>> #5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
>> #6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
>> #7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
>> #8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
>> #9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
>> #10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
>> #11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
>> #12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
>> #13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
>> #14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
>> #15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4
>>
>> Step to reproduce a hang-up(figure):
>> =======================================+=======================================
>> ( "chattr -j" process PID: 16684 ) | ( a certain process )
>> t ======================================+=======================================
>> i |
>> m +---------------------------------------
>> e |jbd2_journal_start()
>> | +---------------------------------------
>> | | . . .
>> | | t_updates++ // t_updates = 1
>> | | . . .
>> | +---------------------------------------
>> | ...
>> |---------------------------------------+---------------------------------------
>> |jbd2_journal_lock_updates() |jbd2_journal_stop()
>> |---------------------------------------+---------------------------------------
>> | write_lock(&journal->j_state_lock) | .
>> | ++journal->j_barrier_count | .
>> | spin_lock(&tran->t_handle_lock) | .
>> | atomic_read(&tran->t_updates) //not 0 |
>> | | atomic_dec_and_test(&tran->t_updates)
>> | | // t_updates = 0
>> | | wake_up(&journal->j_wait_updates)
>> | prepare_to_wait() | // no process is woken up.
>> | spin_unlock(&tran->t_handle_lock) + wake_up(&journal->j_wait_tran_locked)
>> | write_unlock(&journal->j_state_lock) | // no process is woken up.
>> | schedule() // never return +---------------------------------------
>> | | ( "dd" PID: 16645 or "flush-7:0" )
>> | +=======================================
>> | |start_this_handle()
>> | +---------------------------------------
>> | | . . .
>> | | read_lock(&journal->j_state_lock)
>> | | if (journal->j_barrier_count)
>> | | read_unlock(&journal->j_state_lock)
>> | | wait_event(journal->j_wait_tran_locked)
>> v | // never return
>>
>> Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or
>> read_lock(&journal->j_state_lock) to synchronize trans->t_updates with other
>> operators.
>> ===============================================================================
>>
>> Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
>> ---
>> fs/jbd2/transaction.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index a0e41a4..d48f395 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -1448,12 +1448,14 @@ int jbd2_journal_stop(handle_t *handle)
>> * once we do this, we must not dereference transaction
>> * pointer again.
>> */
>> + read_lock(&journal->j_state_lock);
>> tid = transaction->t_tid;
>> if (atomic_dec_and_test(&transaction->t_updates)) {
>> wake_up(&journal->j_wait_updates);
>> if (journal->j_barrier_count)
> oops, j_barrier_count is not protected too in original code. The
> problem does not exists in patched code any more:-)
OK.
I will add the notice into the description which I will rewrite.
Best Regards,
Toshiyuki Okajima
>
>
> Yongqiang.
>> wake_up(&journal->j_wait_transaction_locked);
>> }
>> + read_unlock(&journal->j_state_lock);
>>
>> if (wait_for_commit)
>> err = jbd2_log_wait_commit(journal, tid);
>> --
>> 1.5.5.6
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
next prev parent reply other threads:[~2011-12-20 10:40 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-16 11:19 [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations Toshiyuki Okajima
2011-12-16 12:22 ` Yongqiang Yang
2011-12-20 10:44 ` Toshiyuki Okajima [this message]
2011-12-22 11:36 ` [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates Toshiyuki Okajima
2011-12-22 11:56 ` [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations Toshiyuki Okajima
2012-01-03 15:32 ` Jan Kara
2012-01-05 3:12 ` Ted Ts'o
2011-12-22 12:00 ` [PATCH 2/2] jbd2: remove all t_handle_lock statements Toshiyuki Okajima
2011-12-22 14:07 ` Yongqiang Yang
2011-12-26 0:20 ` Toshiyuki Okajima
2011-12-26 1:17 ` [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock) Toshiyuki Okajima
2011-12-31 6:35 ` Yongqiang Yang
2012-01-03 15:35 ` Jan Kara
2012-01-05 3:27 ` Ted Ts'o
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4EF066F0.5010809@jp.fujitsu.com \
--to=toshi.okajima@jp.fujitsu.com \
--cc=adilger.kernel@dilger.ca \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=xiaoqiangnk@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.