From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sergei Trofimovich
Subject: Re: [PATCH] btrfs: fix warning in iput for bad-inode
Date: Tue, 30 Aug 2011 23:46:00 +0300
Message-ID: <20110830234600.27db6565@sf>
References: <20110817185619.4660.78543.stgit@localhost6> <20110829063416.00f4652c@sf> <20110830195309.4882f478@sf> <4E5D25BD.3020901@redhat.com> <20110830223116.5fc21074@sf> <4E5D3CC1.2090505@redhat.com> <4E5D3DDD.2090401@redhat.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/154M91hV2RByQW6vAsIYmul"; protocol="application/pgp-signature"
Cc: Konstantin Khlebnikov, linux-btrfs@vger.kernel.org, Chris Mason
To: Josef Bacik
Return-path:
In-Reply-To: <4E5D3DDD.2090401@redhat.com>
List-ID:

--Sig_/154M91hV2RByQW6vAsIYmul
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

> On 08/30/2011 03:40 PM, Josef Bacik wrote:
> > On 08/30/2011 03:31 PM, Sergei Trofimovich wrote:
> >> On Tue, 30 Aug 2011 14:02:37 -0400 Josef Bacik wrote:
> >>
> >>> On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
> >>>>> Running 'sync' program after the load does not finish and
> >>>>> eats 100%CPU busy-waiting for something in kernel.
> >>>>>
> >>>>> It's easy to reproduce hang with patch for me. I just run
> >>>>> liferea and sync after it. Without patch I haven't managed
> >>>>> to hang btrfs up.
> >>>>
> >>>> And I think it's another btrfs bug. I've managed to reproduce
> >>>> it _without_ your patch and _without_ autodefrag enabled by
> >>>> manually running the following commands:
> >>>> $ btrfs fi defrag file-with-20_000-extents
> >>>> $ sync
> >>>>
> >>>> I think your patch just shuffles things a bit and forces
> >>>> autodefrag races to pop-up sooner (which is good! :])
> >>>>
> >>>
> >>> Sergei, can you do sysrq+w when this is happening, and maybe turn
> >>> on the softlockup detector so we can see where sync is getting
> >>> stuck? Thanks,
> >>
> >> Sure.
> >> As I keep telling about 2 cases in IRC I will state both here
> >> explicitly:
> >>
> >> ==The First Issue (aka "The Hung sync()" case) ==
> >>
> >> - it's an unpatched linus's v3.1-rc4-80-g0f43dd5
> >> - /dev/root on / type btrfs (rw,noatime,compress=lzo)
> >> - 50% full 30GB filesystem (usual nonmixed mode)
> >>
> >> How I hung it:
> >> $ /usr/sbin/filefrag ~/.bogofilter/wordlist.db
> >> /home/st/.bogofilter/wordlist.db: 19070 extents found
> >> the file is 138MB sqlite database for bayesian SPAM filter, it's
> >> being read and written every 20 minutes or so. Maybe, it was
> >> written even in defrag/sync time!
> >> $ ~/dev/git/btrfs-progs-unstable/btrfs fi defrag ~/.bogofilter/wordlist.db
> >> $ sync
> >> ^C
> >>
> >> I didn't try to reproduce it yet. As for lockdep I'll try but I'm
> >> afraid I will fail to reproduce, but I'll try tomorrow. I suspect
> >> I'll need to seriously fragment some file first down to such
> >> horrible state.
> >>
> >> With help of David I've some (hopefully relevant) info:
> >> #!/bin/sh -x
> >>
> >> for i in $(ps aux|grep " D[+ ]\?"|awk '{print $2}'); do
> >>     ps $i
> >>     sudo cat /proc/$i/stack
> >> done
> >>
> >> PID TTY STAT TIME COMMAND
> >> 1291 ? D 0:00 [btrfs-endio-wri]
> >> [] btrfs_tree_read_lock+0x6d/0x120
> >> [] btrfs_search_slot+0x698/0x8b0
> >> [] btrfs_lookup_csum+0x68/0x190
> >> [] __btrfs_lookup_bio_sums+0x1cf/0x3e0
> >> [] btrfs_lookup_bio_sums+0x11/0x20
> >> [] btrfs_submit_bio_hook+0x140/0x170
> >> [] submit_one_bio+0x64/0xa0
> >> [] extent_readpages+0xe5/0x100
> >> [] btrfs_readpages+0x1a/0x20
> >> [] __do_page_cache_readahead+0x1d2/0x280
> >> [] ra_submit+0x1c/0x20
> >> [] ondemand_readahead+0x12d/0x270
> >> [] page_cache_sync_readahead+0x2c/0x40
> >> [] __load_free_space_cache+0x1a7/0x5b0
> >> [] load_free_space_cache+0xd1/0x190
> >> [] cache_block_group+0xab/0x290
> >> [] find_free_extent.clone.71+0x39f/0xab0
> >> [] btrfs_reserve_extent+0xe0/0x170
> >> [] btrfs_alloc_free_block+0xcf/0x330
> >> [] __btrfs_cow_block+0x11d/0x4a0
> >> [] btrfs_cow_block+0xe8/0x1a0
> >> [] btrfs_search_slot+0x175/0x8b0
> >> [] btrfs_lookup_csum+0x68/0x190
> >> [] btrfs_csum_file_blocks+0xbe/0x670
> >> [] add_pending_csums.clone.39+0x41/0x60
> >> [] btrfs_finish_ordered_io+0x218/0x310
> >> [] btrfs_writepage_end_io_hook+0x15/0x20
> >> [] end_compressed_bio_write+0x7a/0xe0
> >> [] bio_endio+0x18/0x30
> >> [] end_workqueue_fn+0xec/0x120
> >> [] worker_loop+0xac/0x520
> >> [] kthread+0x96/0xa0
> >> [] kernel_thread_helper+0x4/0x10
> >> [] 0xffffffffffffffff
> >
> > Ok this should have been fixed with
> >
> > Btrfs: use the commit_root for reading free_space_inode crcs
> >
> > which is commit # 2cf8572dac62cc2ff7e995173e95b6c694401b3f. Does your
> > kernel have this commit? Because if it does then we did something
> > wrong. If not it should be in linus's latest tree, so update and it
> > should go away (hopefully). Thanks,

Yeah, this one was in my local tree when hung.

> Oops looks like that patch won't fix it completely, I just sent another
> patch that will fix this problem totally, sorry about that
>
> [PATCH] Btrfs: skip locking if searching the commit root in lookup_csums

I'll try to reproduce/test it tomorrow.

About the second one:

> ==The Second Issue (aka "The Busy Looping sync()" case) ==
> The box is different from first, so conditions are a bit different.
> - /dev/root on / type btrfs (rw,noatime,autodefrag)
>   (note autodefrag!)
> - 15% full 594GB filesystem (usual nonmixed mode)
>
> $ liferea
>
> $ sync
> Got CPU is 100% loaded

Still reproducible with 2 patches above + $SUBJ one.
strace says it hangs in the sync() syscall. Stack trace is odd:
# cat /proc/`pidof sync`/stack
[] 0xffffffffffffffff

-- 
Sergei

--Sig_/154M91hV2RByQW6vAsIYmul
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)

iEYEARECAAYFAk5dTAwACgkQcaHudmEf86rl6wCfeI9pApEV7Pi0riq1udmieqz9
ig8An2cYinvSx92A+0IHbHCC97s5stUx
=kDeY
-----END PGP SIGNATURE-----

--Sig_/154M91hV2RByQW6vAsIYmul--
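
For anyone wanting to collect the same kind of info as in the first
issue: the quoted loop greps the whole `ps aux` line, so it can also pick
up processes that merely have " D" somewhere in their command line. Below
is a rough sketch of a stricter variant that looks only at the STAT
column. The function names d_state_pids and dump_d_state_stacks are made
up for this example, and it assumes a kernel that exposes
/proc/PID/stack (reading it normally needs root).

```shell
#!/bin/sh
# Sketch only: find tasks in uninterruptible sleep (STAT starts with "D")
# and print their kernel stacks. Helper names are hypothetical.

# Reads "PID STAT" pairs on stdin and prints the PIDs whose state
# starts with D (covers "D", "D+", "D<" and so on).
d_state_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Prints a ps row and the kernel stack for every D-state task.
# Assumes /proc/PID/stack exists; typically only root may read it.
dump_d_state_stacks() {
    ps -e -o pid= -o stat= | d_state_pids | while read -r pid; do
        ps -o pid,stat,comm -p "$pid"
        cat "/proc/$pid/stack" 2>/dev/null ||
            echo "(cannot read /proc/$pid/stack)"
    done
}
```

After sourcing the file, running dump_d_state_stacks as root while sync
is stuck should give output comparable to the trace quoted above. It will
not help with the busy-looping case, since that task is running rather
than sleeping in D state.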