From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de ([195.135.220.15]:51045 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1160998AbcFBVke (ORCPT ); Thu, 2 Jun 2016 17:40:34 -0400 Date: Thu, 2 Jun 2016 14:40:32 -0700 From: Mark Fasheh To: Jeff Mahoney Cc: dsterba@suse.cz, Lu Fengqi , linux-btrfs@vger.kernel.org Subject: Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl Message-ID: <20160602214032.GT7633@wotan.suse.de> Reply-To: Mark Fasheh References: <1464760085-12078-1-git-send-email-lufq.fnst@cn.fujitsu.com> <20160601211522.GI7633@wotan.suse.de> <20160602170732.GJ29147@twin.jikos.cz> <20160602190814.GQ7633@wotan.suse.de> <75b66bb1-9d20-a1af-2f41-688b9aee73a3@suse.com> <20160602211740.GS7633@wotan.suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20160602211740.GS7633@wotan.suse.de> Sender: linux-btrfs-owner@vger.kernel.org List-ID: On Thu, Jun 02, 2016 at 02:17:40PM -0700, Mark Fasheh wrote: > On Thu, Jun 02, 2016 at 04:56:06PM -0400, Jeff Mahoney wrote: > > On 6/2/16 3:08 PM, Mark Fasheh wrote: > > > On Thu, Jun 02, 2016 at 07:07:32PM +0200, David Sterba wrote: > > >> On Wed, Jun 01, 2016 at 02:15:22PM -0700, Mark Fasheh wrote: > > >>>> +/* dynamically allocate and initialize a ref_root */ > > >>>> +static struct ref_root *ref_root_alloc(void) > > >>>> +{ > > >>>> + struct ref_root *ref_tree; > > >>>> + > > >>>> + ref_tree = kmalloc(sizeof(*ref_tree), GFP_KERNEL); > > >>> > > >>> I'm pretty sure we want GFP_NOFS here. > > >> > > >> Then please explain to me why/where the reasoning below is wrong: > > > > > > The general reasoning of when to use GFP_NOFS below is fine, I don't > > > disagree with that at all. If there is no way a recursion back into btrfs > > > can't happen at that allocation site then we can use GFP_KERNEL. > > > > > > That said, have you closely audited this path? Does the allocation happen > > > completely outside any locks that might be shared with the writeout path? > > > What happens if we have to do writeout of the inode being fiemapped in order > > > to allocate space? If the answer to all my questions is "there is no way > > > this can deadlock" then by all means, we should use GFP_KERNEL. Otherwise > > > GFP_NOFS is a sensible guard against possible future deadlocks. > > > > This is exactly the situation we discussed at LSF/MM this year. The MM > > folks are pushing back because the fs folks tend to use GFP_NOFS as a > > talisman. The audit needs to happen, otherwise that last sentence is > > another talisman. > > There's nothing here I disagree with. I'm not seeing a strong technical > justification, which is what I want (being called from an ioctl means > nothing in this case). A small amount of searching shows me that extent_fiemap() does lock_extent_bits() and writepage_delalloc() also calls lock_extent_bits() (via find_lock_delalloc_range()). I'm no expert on the extent locking but that seems pretty deadlocky to me. --Mark -- Mark Fasheh