From: Sergei Trofimovich <slyich@gmail.com>
To: Josef Bacik <josef@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>,
linux-btrfs@vger.kernel.org, Chris Mason <chris.mason@oracle.com>
Subject: Re: [PATCH] btrfs: fix warning in iput for bad-inode
Date: Tue, 30 Aug 2011 22:31:16 +0300 [thread overview]
Message-ID: <20110830223116.5fc21074@sf> (raw)
In-Reply-To: <4E5D25BD.3020901@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 11394 bytes --]
On Tue, 30 Aug 2011 14:02:37 -0400
Josef Bacik <josef@redhat.com> wrote:
> On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
> >> Running 'sync' program after the load does not finish and eats
> >> 100%CPU busy-waiting for something in kernel.
> >>
> >> It's easy to reproduce hang with patch for me. I just run
> >> liferea and sync after it. Without patch I haven't managed to
> >> hang btrfs up.
> >
> > And I think it's another btrfs bug. I've managed to reproduce it
> > _without_ your patch and _without_ autodefrag enabled by manually
> > running the following commands: $ btrfs fi defrag
> > file-with-20_000-extents $ sync
> >
> > I think your patch just shuffles things a bit and forces
> > autodefrag races to pop-up sooner (which is good! :])
> >
>
> Sergei, can you do sysrq+w when this is happening, and maybe turn on
> the softlockup detector so we can see where sync is getting stuck?
> Thanks,
Sure. As I keep telling about 2 cases in IRC I will state both here explicitely:
==The First Issue (aka "The Hung sync()" case) ==
- it's an unpatched linus's v3.1-rc4-80-g0f43dd5
- /dev/root on / type btrfs (rw,noatime,compress=lzo)
- 50% full 30GB filesystem (usual nonmixed mode)
How I hung it:
$ /usr/sbin/filefrag ~/.bogofilter/wordlist.db
/home/st/.bogofilter/wordlist.db: 19070 extents found
the file is 138MB sqlite database for bayesian SPAM filter,
it's being read and written every 20 minutes or so.
Maybe, it was writtent even in defrag/sync time!
$~/dev/git/btrfs-progs-unstable/btrfs fi defrag ~/.bogofilter/wordlist.db
$ sync
^C<hung in D-state>
I didn't try to reproduce it yet.
As for lockdep I'll try but I'm afraid I will fail to reproduce, but I'll try tomorrow.
I suspect I'll need to seriously fragment some file first down to such horrible state.
With help of David I've some (hopefully relevant) info:
#!/bin/sh -x
for i in $(ps aux|grep " D[+ ]\?"|awk '{print $2}'); do
ps $i
sudo cat /proc/$i/stack
done
PID TTY STAT TIME COMMAND
1291 ? D 0:00 [btrfs-endio-wri]
[<ffffffff8130055d>] btrfs_tree_read_lock+0x6d/0x120
[<ffffffff812b8e88>] btrfs_search_slot+0x698/0x8b0
[<ffffffff812c9e18>] btrfs_lookup_csum+0x68/0x190
[<ffffffff812ca10f>] __btrfs_lookup_bio_sums+0x1cf/0x3e0
[<ffffffff812ca371>] btrfs_lookup_bio_sums+0x11/0x20
[<ffffffff812d6a50>] btrfs_submit_bio_hook+0x140/0x170
[<ffffffff812ed594>] submit_one_bio+0x64/0xa0
[<ffffffff812f14f5>] extent_readpages+0xe5/0x100
[<ffffffff812d7aaa>] btrfs_readpages+0x1a/0x20
[<ffffffff810a6a02>] __do_page_cache_readahead+0x1d2/0x280
[<ffffffff810a6d8c>] ra_submit+0x1c/0x20
[<ffffffff810a6ebd>] ondemand_readahead+0x12d/0x270
[<ffffffff810a70cc>] page_cache_sync_readahead+0x2c/0x40
[<ffffffff81309987>] __load_free_space_cache+0x1a7/0x5b0
[<ffffffff81309e61>] load_free_space_cache+0xd1/0x190
[<ffffffff812be07b>] cache_block_group+0xab/0x290
[<ffffffff812c3def>] find_free_extent.clone.71+0x39f/0xab0
[<ffffffff812c5160>] btrfs_reserve_extent+0xe0/0x170
[<ffffffff812c56df>] btrfs_alloc_free_block+0xcf/0x330
[<ffffffff812b498d>] __btrfs_cow_block+0x11d/0x4a0
[<ffffffff812b4df8>] btrfs_cow_block+0xe8/0x1a0
[<ffffffff812b8965>] btrfs_search_slot+0x175/0x8b0
[<ffffffff812c9e18>] btrfs_lookup_csum+0x68/0x190
[<ffffffff812caf6e>] btrfs_csum_file_blocks+0xbe/0x670
[<ffffffff812d7d91>] add_pending_csums.clone.39+0x41/0x60
[<ffffffff812da528>] btrfs_finish_ordered_io+0x218/0x310
[<ffffffff812da635>] btrfs_writepage_end_io_hook+0x15/0x20
[<ffffffff8130c71a>] end_compressed_bio_write+0x7a/0xe0
[<ffffffff811146f8>] bio_endio+0x18/0x30
[<ffffffff812cd8fc>] end_workqueue_fn+0xec/0x120
[<ffffffff812fb0ac>] worker_loop+0xac/0x520
[<ffffffff8105d486>] kthread+0x96/0xa0
[<ffffffff815f9214>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
PID TTY STAT TIME COMMAND
1296 ? D 0:00 [btrfs-transacti]
[<ffffffff812d3d6e>] btrfs_commit_transaction+0x22e/0x870
[<ffffffff812cd5bb>] transaction_kthread+0x26b/0x280
[<ffffffff8105d486>] kthread+0x96/0xa0
[<ffffffff815f9214>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
PID TTY STAT TIME COMMAND
2891 pts/0 D+ 0:00 sync
[<ffffffff8109d069>] sleep_on_page+0x9/0x10
[<ffffffff8109d28e>] wait_on_page_bit+0x6e/0x80
[<ffffffff8109d9aa>] filemap_fdatawait_range+0xfa/0x190
[<ffffffff8109da62>] filemap_fdatawait+0x22/0x30
[<ffffffff8110ae5a>] sync_inodes_sb+0x17a/0x1e0
[<ffffffff8110eae8>] __sync_filesystem+0x78/0x80
[<ffffffff8110eb07>] sync_one_sb+0x17/0x20
[<ffffffff810e9cc5>] iterate_supers+0x95/0xf0
[<ffffffff8110ea2b>] sync_filesystems+0x1b/0x20
[<ffffffff8110eb8c>] sys_sync+0x1c/0x40
[<ffffffff815f813b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
PID TTY STAT TIME COMMAND
cat: /proc/4454/stack: Нет такого файла или каталога
disassebbled btrfs_tree_read_lock:
ffffffff813004f0 <btrfs_tree_read_lock> push %rbp
ffffffff813004f1 <btrfs_tree_read_lock+0x1> mov %gs:0xb5c0,%rax
ffffffff813004fa <btrfs_tree_read_lock+0xa> mov %rsp,%rbp
ffffffff813004fd <btrfs_tree_read_lock+0xd> push %r15
ffffffff813004ff <btrfs_tree_read_lock+0xf> push %r14
ffffffff81300501 <btrfs_tree_read_lock+0x11> lea 0x6c(%rdi),%r14
ffffffff81300505 <btrfs_tree_read_lock+0x15> push %r13
ffffffff81300507 <btrfs_tree_read_lock+0x17> lea 0x70(%rdi),%r13
ffffffff8130050b <btrfs_tree_read_lock+0x1b> push %r12
ffffffff8130050d <btrfs_tree_read_lock+0x1d> lea -0x60(%rbp),%r12
ffffffff81300511 <btrfs_tree_read_lock+0x21> push %rbx
ffffffff81300512 <btrfs_tree_read_lock+0x22> lea 0x18(%r12),%r15
ffffffff81300517 <btrfs_tree_read_lock+0x27> lea 0x5c(%rdi),%rbx
ffffffff8130051b <btrfs_tree_read_lock+0x2b> sub $0x58,%rsp
ffffffff8130051f <btrfs_tree_read_lock+0x2f> mov %rdi,-0x70(%rbp)
ffffffff81300523 <btrfs_tree_read_lock+0x33> mov %rax,-0x68(%rbp)
ffffffff81300527 <btrfs_tree_read_lock+0x37> mov %rax,-0x78(%rbp)
ffffffff8130052b <btrfs_tree_read_lock+0x3b> nopl 0x0(%rax,%rax,1)
ffffffff81300530 <btrfs_tree_read_lock+0x40> mov (%rbx),%eax
ffffffff81300532 <btrfs_tree_read_lock+0x42> test %eax,%eax
ffffffff81300534 <btrfs_tree_read_lock+0x44> je ffffffff8130057e <btrfs_tree_read_lock+0x8e>
ffffffff81300536 <btrfs_tree_read_lock+0x46> mov -0x68(%rbp),%rax
ffffffff8130053a <btrfs_tree_read_lock+0x4a> movq $0x0,-0x60(%rbp)
ffffffff81300542 <btrfs_tree_read_lock+0x52> movq $0xffffffff8105d950,-0x50(%rbp)
ffffffff8130054a <btrfs_tree_read_lock+0x5a> mov %r15,-0x48(%rbp)
ffffffff8130054e <btrfs_tree_read_lock+0x5e> mov %r15,-0x40(%rbp)
ffffffff81300552 <btrfs_tree_read_lock+0x62> mov %rax,-0x58(%rbp)
ffffffff81300556 <btrfs_tree_read_lock+0x66> jmp ffffffff8130055d <btrfs_tree_read_lock+0x6d>
ffffffff81300558 <btrfs_tree_read_lock+0x68> callq ffffffff815f4670 <schedule>
ffffffff8130055d <btrfs_tree_read_lock+0x6d> mov $0x2,%edx
ffffffff81300562 <btrfs_tree_read_lock+0x72> mov %r12,%rsi
ffffffff81300565 <btrfs_tree_read_lock+0x75> mov %r13,%rdi
ffffffff81300568 <btrfs_tree_read_lock+0x78> callq ffffffff8105db90 <prepare_to_wait>
ffffffff8130056d <btrfs_tree_read_lock+0x7d> mov (%rbx),%eax
ffffffff8130056f <btrfs_tree_read_lock+0x7f> test %eax,%eax
ffffffff81300571 <btrfs_tree_read_lock+0x81> jne ffffffff81300558 <btrfs_tree_read_lock+0x68>
ffffffff81300573 <btrfs_tree_read_lock+0x83> mov %r12,%rsi
ffffffff81300576 <btrfs_tree_read_lock+0x86> mov %r13,%rdi
ffffffff81300579 <btrfs_tree_read_lock+0x89> callq ffffffff8105da80 <finish_wait>
ffffffff8130057e <btrfs_tree_read_lock+0x8e> mov %r14,%rdi
ffffffff81300581 <btrfs_tree_read_lock+0x91> callq ffffffff815f76d0 <_raw_read_lock>
ffffffff81300586 <btrfs_tree_read_lock+0x96> mov (%rbx),%eax
ffffffff81300588 <btrfs_tree_read_lock+0x98> test %eax,%eax
ffffffff8130058a <btrfs_tree_read_lock+0x9a> je ffffffff813005eb <btrfs_tree_read_lock+0xfb>
ffffffff8130058c <btrfs_tree_read_lock+0x9c> mov %r14,%rdi
ffffffff8130058f <btrfs_tree_read_lock+0x9f> callq ffffffff815f73f0 <_raw_read_unlock>
ffffffff81300594 <btrfs_tree_read_lock+0xa4> mov (%rbx),%eax
ffffffff81300596 <btrfs_tree_read_lock+0xa6> test %eax,%eax
ffffffff81300598 <btrfs_tree_read_lock+0xa8> je ffffffff81300530 <btrfs_tree_read_lock+0x40>
ffffffff8130059a <btrfs_tree_read_lock+0xaa> mov -0x78(%rbp),%rax
ffffffff8130059e <btrfs_tree_read_lock+0xae> movq $0x0,-0x60(%rbp)
ffffffff813005a6 <btrfs_tree_read_lock+0xb6> movq $0xffffffff8105d950,-0x50(%rbp)
ffffffff813005ae <btrfs_tree_read_lock+0xbe> mov %r15,-0x48(%rbp)
ffffffff813005b2 <btrfs_tree_read_lock+0xc2> mov %r15,-0x40(%rbp)
ffffffff813005b6 <btrfs_tree_read_lock+0xc6> mov %rax,-0x58(%rbp)
ffffffff813005ba <btrfs_tree_read_lock+0xca> jmp ffffffff813005c5 <btrfs_tree_read_lock+0xd5>
ffffffff813005bc <btrfs_tree_read_lock+0xcc> nopl 0x0(%rax)
ffffffff813005c0 <btrfs_tree_read_lock+0xd0> callq ffffffff815f4670 <schedule>
ffffffff813005c5 <btrfs_tree_read_lock+0xd5> mov $0x2,%edx
ffffffff813005ca <btrfs_tree_read_lock+0xda> mov %r12,%rsi
ffffffff813005cd <btrfs_tree_read_lock+0xdd> mov %r13,%rdi
ffffffff813005d0 <btrfs_tree_read_lock+0xe0> callq ffffffff8105db90 <prepare_to_wait>
ffffffff813005d5 <btrfs_tree_read_lock+0xe5> mov (%rbx),%eax
ffffffff813005d7 <btrfs_tree_read_lock+0xe7> test %eax,%eax
ffffffff813005d9 <btrfs_tree_read_lock+0xe9> jne ffffffff813005c0 <btrfs_tree_read_lock+0xd0>
ffffffff813005db <btrfs_tree_read_lock+0xeb> mov %r12,%rsi
ffffffff813005de <btrfs_tree_read_lock+0xee> mov %r13,%rdi
ffffffff813005e1 <btrfs_tree_read_lock+0xf1> callq ffffffff8105da80 <finish_wait>
ffffffff813005e6 <btrfs_tree_read_lock+0xf6> jmpq ffffffff81300530 <btrfs_tree_read_lock+0x40>
ffffffff813005eb <btrfs_tree_read_lock+0xfb> mov -0x70(%rbp),%rax
ffffffff813005ef <btrfs_tree_read_lock+0xff> lock incl 0x58(%rax)
ffffffff813005f3 <btrfs_tree_read_lock+0x103> lock incl 0x64(%rax)
ffffffff813005f7 <btrfs_tree_read_lock+0x107> add $0x58,%rsp
ffffffff813005fb <btrfs_tree_read_lock+0x10b> pop %rbx
ffffffff813005fc <btrfs_tree_read_lock+0x10c> pop %r12
ffffffff813005fe <btrfs_tree_read_lock+0x10e> pop %r13
ffffffff81300600 <btrfs_tree_read_lock+0x110> pop %r14
ffffffff81300602 <btrfs_tree_read_lock+0x112> pop %r15
ffffffff81300604 <btrfs_tree_read_lock+0x114> leaveq
ffffffff81300605 <btrfs_tree_read_lock+0x115> retq
ffffffff81300606 <btrfs_tree_read_lock+0x116> nopw %cs:0x0(%rax,%rax,1)
==The Second Issue (aka "The Busy Looping sync()" case) ==
The box is different from first, so conditions are a bit different.
- it's a _patched_ linus's 3.1.0-rc3-00271-g3230739 (+patch from $SUBJ on top)
- /dev/root on / type btrfs (rw,noatime,autodefrag)
(note autodefrag!)
- 15% full 594GB filesystem (usual nonmixed mode)
How I hung it:
$ liferea
<wait it to calm down. it does a lot of SQLite reads/writes>
$ sync
Got CPU is 100% loaded
Perfectly reproducible.
I don't remembed it it exits or hangs. Will send detailed followup today.
--
Sergei
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
next prev parent reply other threads:[~2011-08-30 19:31 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-17 18:56 [PATCH] btrfs: fix warning in iput for bad-inode Konstantin Khlebnikov
2011-08-29 3:34 ` Sergei Trofimovich
2011-08-30 16:53 ` Sergei Trofimovich
2011-08-30 18:02 ` Josef Bacik
2011-08-30 19:31 ` Sergei Trofimovich [this message]
2011-08-30 19:40 ` Josef Bacik
2011-08-30 19:45 ` Josef Bacik
2011-08-30 20:46 ` Sergei Trofimovich
2011-08-30 21:17 ` Sergei Trofimovich
2011-09-02 17:01 ` slyich
2011-09-07 9:18 ` David Sterba
2011-09-01 2:45 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110830223116.5fc21074@sf \
--to=slyich@gmail.com \
--cc=chris.mason@oracle.com \
--cc=josef@redhat.com \
--cc=khlebnikov@openvz.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).