From mboxrd@z Thu Jan 1 00:00:00 1970 From: Oleg Nesterov Subject: Re: __sb_start_write() && force_trylock hack Date: Wed, 19 Aug 2015 17:00:26 +0200 Message-ID: <20150819150026.GA19317@redhat.com> References: <20150818144900.GA26478@redhat.com> <20150819031158.GO714@dastard> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org To: Dave Chinner Return-path: Content-Disposition: inline In-Reply-To: <20150819031158.GO714@dastard> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On 08/19, Dave Chinner wrote: > > On Tue, Aug 18, 2015 at 04:49:00PM +0200, Oleg Nesterov wrote: > > Jan, Dave, perhaps you can take a look... > > > > On 08/14, Oleg Nesterov wrote: > > > > > > Plus another patch which removes the "trylock" > > > hack in __sb_start_write(). > > > > I meant the patch we already discussed (attached at the end). And yes, > > previously I reported it passed the tests. However, I only ran the same > > 'grep -il freeze tests/*/???' tests. When I tried to run all tests, I > > got the new reports from lockdep. > > > > [ 2098.281171] May be due to missing lock nesting notation > > > > > [ 2098.288744] 4 locks held by fsstress/50971: > > [ 2098.293408] #0: (sb_writers#13){++++++}, at: [] __sb_start_write+0x32/0x40 > > [ 2098.303085] #1: (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [] filename_create+0x7f/0x170 > > [ 2098.314038] #2: (sb_internal#2){++++++}, at: [] __sb_start_write+0x32/0x40 > > [ 2098.323711] #3: (&type->s_umount_key#54){++++++}, at: [] xfs_flush_inodes+0x1c/0x40 [xfs] > > [ 2098.334898] > > stack backtrace: > > [ 2098.339762] CPU: 3 PID: 50971 Comm: fsstress Not tainted 4.2.0-rc6+ #27 > > [ 2098.347143] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013 > > [ 2098.358303] 0000000000000000 00000000e70ee864 ffff880c05a2b9c8 ffffffff817ee692 > > [ 2098.366603] 0000000000000000 ffffffff826f8030 ffff880c05a2bab8 ffffffff810f45be > > [ 2098.374900] 0000000000000000 ffff880c05a2bb20 0000000000000000 0000000000000000 > > [ 2098.383197] Call Trace: > > [ 2098.385930] [] dump_stack+0x45/0x57 > > [ 2098.391667] [] __lock_acquire+0x1e9e/0x2040 > > [ 2098.398177] [] ? __lock_acquire+0x9ed/0x2040 > > [ 2098.404787] [] ? pagevec_lookup_entries+0x22/0x30 > > [ 2098.411879] [] ? truncate_inode_pages_range+0x2b2/0x7e0 > > [ 2098.419551] [] lock_acquire+0xbe/0x150 > > [ 2098.425566] [] ? __sb_start_write+0x32/0x40 > > [ 2098.432079] [] percpu_down_read+0x51/0xa0 > > [ 2098.438396] [] ? __sb_start_write+0x32/0x40 > > [ 2098.444905] [] __sb_start_write+0x32/0x40 > > [ 2098.451244] [] xfs_trans_alloc+0x24/0x40 [xfs] > > [ 2098.458076] [] xfs_inactive_truncate+0x32/0x110 [xfs] > > [ 2098.465580] [] xfs_inactive+0x102/0x120 [xfs] > > [ 2098.472308] [] xfs_fs_evict_inode+0x7b/0xc0 [xfs] > > [ 2098.479401] [] evict+0xab/0x170 > > [ 2098.484748] [] iput+0x1a8/0x230 > > [ 2098.490100] [] sync_inodes_sb+0x14c/0x1d0 > > [ 2098.496439] [] xfs_flush_inodes+0x28/0x40 [xfs] > > [ 2098.503361] [] xfs_create+0x5c3/0x6d0 [xfs] > > Another false positive. Confused... Dave, I removed your explanation which I can't understand anyway, let me remind that I know absolutely nothing about filesystems. > But there is no deadlock here Yes, we hold SB_FREEZE_WRITE lock, so recursive SB_FREEZE_FS is safe. But, this means that the comment in __sb_start_write() is still correct, "XFS for example gets freeze protection on internal level twice" is true, and we can not remove this force_trylock hack. (Plus we have the jbd2_handle/sb_internal lock inversion but this is the same thing, safe because of the held lock on higher level). Or I misunderstood? Oleg.