From: Jan Kara
Subject: Re: [rfc][patch] fs: turn iprune_mutex into rwsem
Date: Sat, 15 Aug 2009 01:39:57 +0200
Message-ID: <20090814233957.GA21814@duck.novell.com>
References: <20090814152504.GA19195@wotan.suse.de> <20090814155847.860dd23f.akpm@linux-foundation.org>
In-Reply-To: <20090814155847.860dd23f.akpm@linux-foundation.org>
To: Andrew Morton
Cc: Nick Piggin, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org

On Fri 14-08-09 15:58:47, Andrew Morton wrote:
> On Fri, 14 Aug 2009 17:25:05 +0200
> Nick Piggin wrote:
>
> >
> > We have had a report of memory allocation hangs during DVD-RAM (UDF) writing.
> >
> > Jan tracked the cause of this down to UDF inode reclaim blocking:
> >
> > gnome-screens D ffff810006d1d598     0 20686      1
> >  ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800
> >  ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580
> >  ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0
> > Call Trace:
> >  [] io_schedule+0x63/0xa5
> >  [] sync_buffer+0x3b/0x3f
> >  [] __wait_on_bit+0x47/0x79
> >  [] out_of_line_wait_on_bit+0x6a/0x77
> >  [] __wait_on_buffer+0x1f/0x21
> >  [] __bread+0x70/0x86
> >  [] :udf:udf_tread+0x38/0x3a
> >  [] :udf:udf_update_inode+0x4d/0x68c
> >  [] :udf:udf_write_inode+0x1d/0x2b
> >  [] __writeback_single_inode+0x1c0/0x394
> >  [] write_inode_now+0x7d/0xc4
> >  [] :udf:udf_clear_inode+0x3d/0x53
> >  [] clear_inode+0xc2/0x11b
> >  [] dispose_list+0x5b/0x102
> >  [] shrink_icache_memory+0x1dd/0x213
> >  [] shrink_slab+0xe3/0x158
> >  [] try_to_free_pages+0x177/0x232
> >  [] __alloc_pages+0x1fa/0x392
> >  [] alloc_page_vma+0x176/0x189
> >  [] __do_fault+0x10c/0x417
> >  [] handle_mm_fault+0x466/0x940
> >  [] do_page_fault+0x676/0xabf
> >
> > Which blocks with the inode lock held, which then blocks other
> > reclaimers:
> >
> > X             D ffff81009d47c400     0 17285  14831
> >  ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288
> >  ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400
> >  ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740
> > Call Trace:
> >  [] __mutex_lock_slowpath+0x72/0xa9
> >  [] mutex_lock+0x1e/0x22
> >  [] shrink_icache_memory+0x49/0x213
> >  [] shrink_slab+0xe3/0x158
> >  [] try_to_free_pages+0x177/0x232
> >  [] __alloc_pages+0x1fa/0x392
> >  [] alloc_pages_current+0xd1/0xd6
> >  [] __get_free_pages+0xe/0x4d
> >  [] __pollwait+0x5e/0xdf
> >  [] :nvidia:nv_kern_poll+0x2e/0x73
> >  [] do_select+0x308/0x506
> >  [] core_sys_select+0x1a6/0x254
> >  [] sys_select+0xb5/0x157
>
> That isn't a hang. When the bread() completes, everything proceeds.
>
> > Now I think the main problem is having the filesystem block (and do IO)
> > in inode reclaim. The problem is that this doesn't get accounted well
> > and penalizes a random allocator with a big latency spike caused by
> > work generated from elsewhere.
>
> Yes. Why does UDF do all that stuff in ->clear_inode()? Other
> filesystems have very simple, non-blocking, non-IO-doing
> ->clear_inode() implementations. This sounds like a design problem
> within UDF.

Yes, it's a problem within the UDF code. I have already removed the
discarding of preallocation from clear_inode(), but the last extent is
still truncated there. The trouble with getting rid of that is that,
according to the spec, the length of the last extent has to match
i_size exactly (and even for directories and symlinks, i_size isn't
blocksize aligned). So far we keep all extent lengths block aligned and
set the length of the last one in clear_inode().
To move this out of clear_inode(), I'll probably set the length of the
last extent on each inode write, but one has to be careful about races
with truncate...

								Honza
-- 
Jan Kara
SUSE Labs, CR