From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-btrfs-owner@vger.kernel.org>
Received: from mx2.suse.de ([195.135.220.15]:51045 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1160998AbcFBVke (ORCPT <rfc822;linux-btrfs@vger.kernel.org>);
	Thu, 2 Jun 2016 17:40:34 -0400
Date: Thu, 2 Jun 2016 14:40:32 -0700
From: Mark Fasheh <mfasheh@suse.de>
To: Jeff Mahoney <jeffm@suse.com>
Cc: dsterba@suse.cz, Lu Fengqi <lufq.fnst@cn.fujitsu.com>,
        linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl
Message-ID: <20160602214032.GT7633@wotan.suse.de>
Reply-To: Mark Fasheh <mfasheh@suse.de>
References: <1464760085-12078-1-git-send-email-lufq.fnst@cn.fujitsu.com>
 <20160601211522.GI7633@wotan.suse.de>
 <20160602170732.GJ29147@twin.jikos.cz>
 <20160602190814.GQ7633@wotan.suse.de>
 <75b66bb1-9d20-a1af-2f41-688b9aee73a3@suse.com>
 <20160602211740.GS7633@wotan.suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20160602211740.GS7633@wotan.suse.de>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: <linux-btrfs.vger.kernel.org>

On Thu, Jun 02, 2016 at 02:17:40PM -0700, Mark Fasheh wrote:
> On Thu, Jun 02, 2016 at 04:56:06PM -0400, Jeff Mahoney wrote:
> > On 6/2/16 3:08 PM, Mark Fasheh wrote:
> > > On Thu, Jun 02, 2016 at 07:07:32PM +0200, David Sterba wrote:
> > >> On Wed, Jun 01, 2016 at 02:15:22PM -0700, Mark Fasheh wrote:
> > >>>> +/* dynamically allocate and initialize a ref_root */
> > >>>> +static struct ref_root *ref_root_alloc(void)
> > >>>> +{
> > >>>> +	struct ref_root *ref_tree;
> > >>>> +
> > >>>> +	ref_tree = kmalloc(sizeof(*ref_tree), GFP_KERNEL);
> > >>>
> > >>> I'm pretty sure we want GFP_NOFS here.
> > >>
> > >> Then please explain to me why/where the reasoning below is wrong:
> > > 
> > > The general reasoning of when to use GFP_NOFS below is fine, I don't
> > > disagree with that at all. If there is no way a recursion back into btrfs
> > > can't happen at that allocation site then we can use GFP_KERNEL.
> > > 
> > > That said, have you closely audited this path? Does the allocation happen
> > > completely outside any locks that might be shared with the writeout path?
> > > What happens if we have to do writeout of the inode being fiemapped in order
> > > to allocate space? If the answer to all my questions is "there is no way
> > > this can deadlock" then by all means, we should use GFP_KERNEL. Otherwise
> > > GFP_NOFS is a sensible guard against possible future deadlocks.
> > 
> > This is exactly the situation we discussed at LSF/MM this year.  The MM
> > folks are pushing back because the fs folks tend to use GFP_NOFS as a
> > talisman.  The audit needs to happen, otherwise that last sentence is
> > another talisman.
> 
> There's nothing here I disagree with. I'm not seeing a strong technical
> justification, which is what I want (being called from an ioctl means
> nothing in this case).

A small amount of searching shows me that extent_fiemap() does
lock_extent_bits() and writepage_delalloc() also calls lock_extent_bits()
(via find_lock_delalloc_range()).

I'm no expert on the extent locking but that seems pretty deadlocky to me.
	--Mark

--
Mark Fasheh