From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Mason
Subject: Re: [BUG] sleeping function called from atomic context
Date: Fri, 4 May 2012 11:38:44 -0400
Message-ID: <20120504153844.GC25477@shiny>
References: <4FA3AD0B.7040500@giantdisaster.de> <20120504132536.GA25477@shiny> <4FA3F3B0.6020902@giantdisaster.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Linux Btrfs List
To: Stefan Behrens
Return-path:
In-Reply-To: <4FA3F3B0.6020902@giantdisaster.de>
List-ID:

On Fri, May 04, 2012 at 05:20:16PM +0200, Stefan Behrens wrote:
> On 5/4/2012 3:25 PM, Chris Mason wrote:
> > On Fri, May 04, 2012 at 12:18:51PM +0200, Stefan Behrens wrote:
> >> Looks like after "btrfs read error corrected" of chunk tree block while
> >> reading the chunk tree in open_ctree(), we stay in atomic state (in
> >> 3.4-rc5).
> >
> > I'm having a hard time reproducing this here. Do you have lockdep on?
> > It might tell us which lock we're leaving around.
>
> Next two tries with lockdep and with a reboot before the mount. For the
> first one I was using dd(1) to damage (overwrite) one mirror of the chunk
> tree and the issue was there immediately. The second time with some
> writes to the disk in degraded state and a remount with all disks
> afterwards. This time umount was raising the issue.
>
> And as Dave wrote, SLUB checks whether someone is in atomic state and
> calls kmem_cache_alloc() with __GFP_WAIT. I see no reason for being in
> atomic state at this point, that's the issue.
>
>
> BUG: sleeping function called from invalid context at mm/slub.c:937
> in_atomic(): 1, irqs_disabled(): 0, pid: 3650, name: mount
> 2 locks held by mount/3650:
> #0: (&type->s_umount_key#32/1){+.+.+.}, at: []
> sget+0x248/0x540
> #1: (btrfs-fs-03){++++..}, at: []
> btrfs_clear_lock_blocking_rw+0x62/0x160 [btrfs]
> Pid: 3650, comm: mount Not tainted 3.3.0+ #58
> Call Trace:
> [] __might_sleep+0xe1/0x110
> [] kmem_cache_alloc+0x4d/0x160
> [] alloc_extent_state+0x2b/0xf0 [btrfs]
> [] __set_extent_bit+0x357/0x590 [btrfs]
> [] ? trace_hardirqs_off+0xd/0x10
> [] lock_extent_bits+0x6c/0xa0 [btrfs]
> [] verify_parent_transid+0x84/0x160 [btrfs]

Oh, that makes sense. verify_parent_transid can sleep because it locks
the extent, and we're being called with a spinning lock.

Hmmm, let me think on this one.

-chris
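
As a rough illustration of the failure mode described above (a userspace
sketch, not kernel code and not a fix): a spinning lock is modelled as a
counter, and a might-sleep check fires whenever a call that may block runs
while that counter is nonzero. Every model_* name below is hypothetical and
only stands in for the corresponding step in the trace
(verify_parent_transid -> lock_extent_bits -> alloc_extent_state ->
kmem_cache_alloc -> __might_sleep).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of this is the kernel's real code. */
static int atomic_depth;     /* models the count raised while a spinning lock is held */
static int sleep_violations; /* times a possibly-sleeping call ran in atomic context */

static void model_spin_lock(void)
{
    atomic_depth++;
}

static void model_spin_unlock(void)
{
    assert(atomic_depth > 0); /* unlock without a matching lock is a bug */
    atomic_depth--;
}

static bool model_in_atomic(void)
{
    return atomic_depth > 0;
}

/* Models the debug check: complain if we might block while atomic. */
static void model_might_sleep(const char *where)
{
    if (model_in_atomic()) {
        sleep_violations++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s\n",
                where);
    }
}

/* Stands in for an allocation that is allowed to wait. */
static void model_blocking_alloc(void)
{
    model_might_sleep("model_blocking_alloc");
}

/* Stands in for locking an extent range, which allocates state. */
static void model_lock_extent(void)
{
    model_blocking_alloc();
}

/* Stands in for the transid check, which locks the extent. */
static void model_verify(void)
{
    model_lock_extent();
}

/* Calls the check under a spinning lock; returns the violations seen. */
static int run_with_spinning_lock(void)
{
    int before = sleep_violations;

    model_spin_lock();
    model_verify();
    model_spin_unlock();
    return sleep_violations - before;
}

/* Calls the same check with no spinning lock held. */
static int run_without_spinning_lock(void)
{
    int before = sleep_violations;

    model_verify();
    return sleep_violations - before;
}
```

Calling run_with_spinning_lock() records one violation and prints the
warning; run_without_spinning_lock() records none. The sketch only shows
why the check fires; which real lock is held at that point is what the
lockdep output above is meant to answer.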