linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>,
	Dave Chinner <dchinner@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	xfs@oss.sgi.com
Subject: Re: [PATCH] mm: add gfp_mask parameter to vm_map_ram()
Date: Thu, 14 Jun 2012 13:34:29 +1000	[thread overview]
Message-ID: <20120614033429.GD7339@dastard> (raw)
In-Reply-To: <4FD94779.3030108@kernel.org>

On Thu, Jun 14, 2012 at 11:07:53AM +0900, Minchan Kim wrote:
> Hi Fengguang,
> 
> On 06/14/2012 10:49 AM, Fengguang Wu wrote:
> 
> > On Thu, Jun 14, 2012 at 11:20:26AM +1000, Dave Chinner wrote:
> >> On Wed, Jun 13, 2012 at 08:39:32PM +0800, Fengguang Wu wrote:
> >>> Hi Christoph, Dave,
> >>>
> >>> I got this lockdep warning on XFS when running the xfs tests:

[rant warning]

> >> Bug in vm_map_ram - it does an unconditional GFP_KERNEL allocation
> >> here, and we are in a GFP_NOFS context. We can't pass a gfp_mask to
> >> vm_map_ram(), so until vm_map_ram() grows that we can't fix it...
> > 
> > This trivial patch should fix it.
.....
> 
> It shouldn't work because vmap_page_range still can allocate GFP_KERNEL by pud_alloc in vmap_pud_range.
> For it, I tried [1] but other mm guys want to add WARNING [2] so let's avoiding gfp context passing.

Oh, wonderful, you're pulling the "it's not a MM issue, don't use
vmalloc" card.

> [1] https://lkml.org/lkml/2012/4/23/77

https://lkml.org/lkml/2012/4/24/29

"vmalloc was never supposed to use gfp flags for allocation
"context" restriction. I.e., it was always supposed to have
blocking, fs, and io capable allocation context."

vmalloc always was a badly neglected, ugly step-sister of kmalloc
that was kept in the basement and only brought out when the tax
collector called.  But that inner ugliness doesn't change the fact
that beatiful things have been built around it. XFS has used
vm_map_ram() and it's predecessor since it was first ported to linux
some 13 or 14 years ago, so the above claim is way out of date. i.e.
vmalloc has been used in GFP_NOFS context since before that flag
even existed....

http://lkml.org/lkml/2012/4/24/67

"I would say add a bit of warnings and documentation, and see what
can be done about callers."

Wonderful. Well, there's about 2 years of work queued up for me
before I even get to the do the open heart surgery that would allow
XFS to handle memory allocation failures at this level without
causing the filesystem to shut down.

Andrew Morton's response:

https://lkml.org/lkml/2012/4/24/413

"There are gruesome problems in block/blk-throttle.c (thread
"mempool, percpu, blkcg: fix percpu stat allocation and remove
stats_lock").  It wants to do an alloc_percpu()->vmalloc() from the
IO submission path, under GFP_NOIO.

Changing vmalloc() to take a gfp_t does make lots of sense, although
I worry a bit about making vmalloc() easier to use!"

OK, so according to Andrew there is no technical reason why it can't
be done, it's just handwaving about "vmalloc is bad"....


> [2] https://lkml.org/lkml/2012/5/2/340

https://lkml.org/lkml/2012/5/2/452

"> Where are these offending callsites?

dm:
...
ubi:
....
ext4:
....
ntfs:
....
ubifs:
....
mm:
....
ceph:
...."

So, we've got a bunch of filesystems that require vmalloc under
GFP_NOFS conditions. Perhaps there's a reason for needing to be able
to do this in filesystem code? Like, perhaps, avoiding memory
reclaim deadlocks?

https://lkml.org/lkml/2012/5/3/27

"Note that in writeback paths, a "good citizen" filesystem should
not require any allocations, or at least it should be able to
tolerate allocation failures.  So fixing that would be a good idea
anyway."

Oh, please. I have been hearing this for years, and are we any
closer to it? No, we are further away from ever being able to
acheive this than ever. Face it, filesystems require memory
allocation to write dirty data to disk, and the amount is almost
impossible to define. Hence mempools can't be used because we can't
give any guarantees of forward progress. And for vmalloc?

Filesystems widely use vmalloc/vm_map_ram because kmalloc fails on
large contiguous allocations. This renders kmalloc unfit for
purpose, so we have to fall back to single page allocation and
vm_map_ram or vmalloc so that the filesystem can function properly.
And to avoid deadlocks, all memory allocation must be able to
specify GFP_NOFS to prevent the MM subsystem from recursing into the
filesystem. Therefore, vmalloc needs to support GFP_NOFS.

I don't care how you make it happen, just fix it. Trying to place
the blame on the filesystem folk for using vmalloc in GFP_NOFS
contexts is a total and utter cop-out, because mm folk of all people
should know that non-zero order kmalloc is not a reliable
alternative....

[end rant]

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2012-06-14  3:34 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20120612012134.GA7706@localhost>
2012-06-13 12:39 ` xfs ip->i_lock: inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage Fengguang Wu
2012-06-14  1:20   ` Dave Chinner
2012-06-14  1:49     ` [PATCH] mm: add gfp_mask parameter to vm_map_ram() Fengguang Wu
2012-06-14  2:07       ` Minchan Kim
2012-06-14  2:21         ` Tejun Heo
2012-06-14  2:39           ` Minchan Kim
2012-06-14  3:34         ` Dave Chinner [this message]
2012-06-14  3:53           ` David Rientjes
2012-06-14  5:51           ` Minchan Kim
2012-06-14  7:52           ` Andreas Dilger
2012-06-14  2:15       ` Dave Chinner
2012-06-14  1:22   ` xfs ip->i_lock: inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage Dave Chinner
2012-06-14  1:29     ` Fengguang Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120614033429.GD7339@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=dchinner@redhat.com \
    --cc=fengguang.wu@intel.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=tj@kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).