From: bugzilla-daemon@bugzilla.kernel.org
To: linux-ext4@vger.kernel.org
Subject: [Bug 102751] New: infinite loop in jbd2_journal_destroy()
Date: Wed, 12 Aug 2015 22:38:22 +0000
Message-ID: <bug-102751-13602@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=102751

            Bug ID: 102751
           Summary: infinite loop in jbd2_journal_destroy()
           Product: File System
           Version: 2.5
    Kernel Version: 4.1.5
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@kernel-bugs.osdl.org
          Reporter: mihai.dontu@gmail.com
        Regression: No

While watching a video from a removable disk (USB), the connecting cable failed
(too much use) and I had to unplug it. I noticed, however, that vlc had started
consuming 100% CPU time while being a zombie. An Alt+SysRq+l showed this:

NMI backtrace for cpu 2
CPU: 2 PID: 17378 Comm: vlc Tainted: G           O    4.1.5-gentoo #1
Hardware name: Dell Inc. Latitude E7440/07F3F4, BIOS A15 05/19/2015
task: ffff88029d050000 ti: ffff8802cd80c000 task.ti: ffff8802cd80c000
RIP: 0010:[<ffffffff8cec3320>]  [<ffffffff8cec3320>] mutex_unlock+0x10/0x20
RSP: 0018:ffff8802cd80fcd0  EFLAGS: 00000202
RAX: 00000000fffffffb RBX: ffff880084068000 RCX: 0000000000000000
RDX: 0000000080000001 RSI: 0000000000000000 RDI: ffff8800840680e8
RBP: ffff8802cd80fd38 R08: 000000000000000a R09: 00000000000004b0
R10: 0000000000017e98 R11: 00000000000004b0 R12: ffff880084068398
R13: ffff8800840680e8 R14: ffff8802cd80fcf0 R15: ffff8800840680a0
FS:  00007fa8ac663700(0000) GS:ffff88041eb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5b3e946000 CR3: 000000000d80d000 CR4: 00000000001426e0
Stack:
 ffffffff8c3d1318 ffff880200000000 ffff88029d050000 ffffffff8c179cc0
 ffff8802cd80fcf0 ffff8802cd80fcf0 0000000028b119c8 ffff88015d99c400
 ffff88008406c000 ffff880185940400 ffff880084068800 ffff88029d050000
Call Trace:
 [<ffffffff8c3d1318>] ? jbd2_journal_destroy+0x138/0x240
 [<ffffffff8c179cc0>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffff8c38f0e7>] ext4_put_super+0x67/0x360
 [<ffffffff8c29d726>] generic_shutdown_super+0x76/0x100
 [<ffffffff8c29dae7>] kill_block_super+0x27/0x80
 [<ffffffff8c29de59>] deactivate_locked_super+0x49/0x80
 [<ffffffff8c29e2cc>] deactivate_super+0x6c/0x80
 [<ffffffff8c2bc033>] cleanup_mnt+0x43/0xa0
 [<ffffffff8c2bc0e2>] __cleanup_mnt+0x12/0x20
 [<ffffffff8c153804>] task_work_run+0xd4/0xf0
 [<ffffffff8c139174>] do_exit+0x2f4/0xb90
 [<ffffffff8c1d381c>] ? __audit_syscall_entry+0xac/0x100
 [<ffffffff8c05f745>] ? do_audit_syscall_entry+0x55/0x80
 [<ffffffff8c139a9b>] do_group_exit+0x3b/0xb0
 [<ffffffff8c139b24>] SyS_exit_group+0x14/0x20
 [<ffffffff8cec59db>] system_call_fastpath+0x16/0x6e
Code: ff 4c 89 e7 e8 d2 1e 00 00 5b 41 5c 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00
00 00 00 0f 1f 44 00 00 48 c7 47 18 00 00 00 00 f0 ff 07 <7f> 0a 55 48 89 e5 e8
95 ff ff ff 5d c3 0f 1f 00 0f 1f 44 00 00

and perf top (first 9 lines):

  18.08%  [kernel]  [k] _raw_spin_lock
  17.97%  [kernel]  [k] mutex_lock
  15.36%  [kernel]  [k] mutex_unlock
  10.89%  [kernel]  [k] _raw_spin_unlock
   6.49%  [kernel]  [k] jbd2_log_do_checkpoint
   6.16%  [kernel]  [k] preempt_count_add
   4.53%  [kernel]  [k] jbd2_cleanup_journal_tail
   3.96%  [kernel]  [k] preempt_count_sub
   3.21%  [kernel]  [k] jbd2_journal_destroy

Looking at the code it would seem that I've hit a race in:

  while (journal->j_checkpoint_transactions != NULL) { ... }

because it's waiting for checkpoint transactions that can no longer be written:

Buffer I/O error on dev dm-1, logical block 243826688, lost sync page write
JBD2: Error -5 detected when updating journal superblock for dm-1-8.
Aborting journal on device dm-1-8.
Buffer I/O error on dev dm-1, logical block 243826688, lost sync page write
JBD2: Error -5 detected when updating journal superblock for dm-1-8.

Maybe the loop should be abandoned on jbd2_log_do_checkpoint() error?
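
To make the suggestion concrete, here is a minimal userspace sketch in C of a
drain loop that gives up on the first checkpoint error. It is a model only, not
kernel code: struct model_journal, model_do_checkpoint() and model_destroy()
are hypothetical names invented for this illustration, with an integer counter
standing in for the j_checkpoint_transactions list and a flag standing in for
the aborted journal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, NOT the kernel's journal_t. */
struct model_journal {
    int  pending_checkpoints; /* stands in for j_checkpoint_transactions != NULL */
    bool aborted;             /* device gone: journal writes fail with -EIO */
};

/* Stand-in for jbd2_log_do_checkpoint(): makes progress only if I/O works. */
static int model_do_checkpoint(struct model_journal *j)
{
    assert(j != NULL);
    if (j->aborted)
        return -EIO;          /* no progress: the pending count stays non-zero */
    if (j->pending_checkpoints > 0)
        j->pending_checkpoints--;
    return 0;
}

/*
 * Drain loop with the exit suggested above: stop on the first error.
 * Returns 0 once drained, or a negative errno; *iterations counts loop passes.
 */
static int model_destroy(struct model_journal *j, long *iterations)
{
    int err = 0;

    *iterations = 0;
    while (j->pending_checkpoints != 0) {
        (*iterations)++;
        err = model_do_checkpoint(j);
        if (err)
            break;            /* without this, an aborted journal spins forever */
    }
    return err;
}
```

In this model, a failed device makes model_destroy() return -EIO after a single
pass instead of spinning. Whether the real loop can safely bail out this way,
and what would have to be cleaned up afterwards, is for the jbd2 maintainers to
judge.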

The USB failure happened several times before, but I've never seen vlc get
stuck. This also means that I'm unlikely to be able to reproduce this. :-(

One more detail: the ext4 filesystem sits on top of a LUKS device.
