From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5538E6C7.9050201@suse.com>
Date: Thu, 23 Apr 2015 13:34:15 +0100
From: Filipe Manana
MIME-Version: 1.0
To: Holger Hoffstätte, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix race when reusing stale extent buffers that leads to BUG_ON
References: <1429784928-12665-1-git-send-email-fdmanana@suse.com>
Content-Type: text/plain; charset=utf-8
Sender: linux-btrfs-owner@vger.kernel.org

On 04/23/2015 01:16 PM, Holger Hoffstätte wrote:
> On Thu, 23 Apr 2015 11:28:48 +0100, Filipe Manana wrote:
>
>> There's a race between releasing extent buffers that are flagged as stale
>> and recycling them that makes us hit the following BUG_ON at
>> btrfs_release_extent_buffer_page:
>>
>>   BUG_ON(extent_buffer_under_io(eb))
>>
>> The BUG_ON is triggered because the extent buffer has the flag
>> EXTENT_BUFFER_DIRTY set as a consequence of having been reused and made
>> dirty by another concurrent task.
>
> Awesome analysis!
>
>> @@ -4768,6 +4768,25 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
>>  			       start >> PAGE_CACHE_SHIFT);
>>  	if (eb && atomic_inc_not_zero(&eb->refs)) {
>>  		rcu_read_unlock();
>> +		/*
>> +		 * Lock our eb's refs_lock to avoid races with
>> +		 * free_extent_buffer. When we get our eb it might be flagged
>> +		 * with EXTENT_BUFFER_STALE and another task running
>> +		 * free_extent_buffer might have seen that flag set,
>> +		 * eb->refs == 2, that the buffer isn't under IO (dirty and
>> +		 * writeback flags not set) and it's still in the tree (flag
>> +		 * EXTENT_BUFFER_TREE_REF set), therefore being in the process
>> +		 * of decrementing the extent buffer's reference count twice.
>> +		 * So here we could race and increment the eb's reference count,
>> +		 * clear its stale flag, mark it as dirty and drop our reference
>> +		 * before the other task finishes executing free_extent_buffer,
>> +		 * which would later result in an attempt to free an extent
>> +		 * buffer that is dirty.
>> +		 */
>> +		if (test_bit(EXTENT_BUFFER_STALE, &eb->bflags)) {
>> +			spin_lock(&eb->refs_lock);
>> +			spin_unlock(&eb->refs_lock);
>> +		}
>>  		mark_extent_buffer_accessed(eb, NULL);
>>  		return eb;
>>  	}
>
> After staring at this (and the Lovecraftian horrors of free_extent_buffer())
> for over an hour and trying to understand how and why this could even
> remotely work, I cannot help but think that this fix would shift the race
> to the much smaller window between the test_bit and the first spin_lock.
> Essentially you subtly phase-shifted all participants and made them avoid
> the race most of the time, yet I cannot help but think it's still there
> (just much smaller), and could strike again with different scheduling
> intervals.
>
> Would this be accurate?

Hi Holger,

Can you explain how the race can still happen? The goal here is just to
make sure a reader does not advance too fast if the eb is stale and
there's a concurrent call to free_extent_buffer() in progress.

thanks

>
> -h
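
For anyone following along, here is a minimal user-space sketch of the
lock-then-unlock barrier pattern under discussion. This is not the btrfs
code: struct buf, buf_init(), release_stale() and lookup() are simplified,
hypothetical stand-ins for struct extent_buffer, free_extent_buffer() and
find_extent_buffer(), with pthread spinlocks and C11 atomics standing in
for the kernel primitives.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct buf {
	atomic_int refs;              /* stands in for eb->refs            */
	atomic_bool stale;            /* stands in for EXTENT_BUFFER_STALE */
	pthread_spinlock_t refs_lock; /* stands in for eb->refs_lock       */
};

void buf_init(struct buf *b)
{
	atomic_init(&b->refs, 2);     /* one tree ref + one owner ref */
	atomic_init(&b->stale, true);
	pthread_spin_init(&b->refs_lock, PTHREAD_PROCESS_PRIVATE);
}

/*
 * Freeing path: decides to drop the extra "tree" reference based on the
 * ref count it observed, with the whole double decrement done inside a
 * refs_lock critical section, analogous to free_extent_buffer().
 */
void release_stale(struct buf *b)
{
	pthread_spin_lock(&b->refs_lock);
	if (atomic_load(&b->stale) && atomic_load(&b->refs) == 2)
		atomic_fetch_sub(&b->refs, 1); /* drop the tree reference  */
	atomic_fetch_sub(&b->refs, 1);         /* drop the owner reference */
	pthread_spin_unlock(&b->refs_lock);
}

/*
 * Lookup path: after taking a reference (the atomic_inc_not_zero()
 * equivalent), a reader that sees the stale flag bounces through
 * refs_lock before touching the buffer. If a concurrent release_stale()
 * has already entered its critical section, the reader blocks here until
 * that release fully finishes, so it cannot clear the stale flag and
 * dirty the buffer underneath an in-flight release.
 */
struct buf *lookup(struct buf *b)
{
	int old = atomic_load(&b->refs);

	do {
		if (old == 0)
			return NULL; /* already gone */
	} while (!atomic_compare_exchange_weak(&b->refs, &old, old + 1));

	if (atomic_load(&b->stale)) {
		pthread_spin_lock(&b->refs_lock);   /* wait out the freer */
		pthread_spin_unlock(&b->refs_lock);
		atomic_store(&b->stale, false);     /* safe to reuse now  */
	}
	return b;
}

Under these assumptions, the lock/unlock pair in lookup() does nothing
unless the freeing side already holds refs_lock; it serializes the reader
against an in-flight release rather than preventing the interleaving
outright, which matches the intent Filipe describes above.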