All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Neil Brown <neilb@suse.de>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Sage Weil <sage@inktank.com>, Mark Fasheh <mfasheh@suse.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
Date: Fri, 20 Mar 2015 14:48:20 +1100	[thread overview]
Message-ID: <20150320034820.GH28621@dastard> (raw)
In-Reply-To: <20150319124441.GC12466@dhcp22.suse.cz>

On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> On Thu 19-03-15 18:14:39, Dave Chinner wrote:
> > On Wed, Mar 18, 2015 at 03:55:28PM +0100, Michal Hocko wrote:
> > > On Wed 18-03-15 10:44:11, Rik van Riel wrote:
> > > > On 03/18/2015 10:09 AM, Michal Hocko wrote:
> > > > > page_cache_read has been historically using page_cache_alloc_cold to
> > > > > allocate a new page. This means that mapping_gfp_mask is used as the
> > > > > base for the gfp_mask. Many filesystems are setting this mask to
> > > > > GFP_NOFS to prevent from fs recursion issues. page_cache_read is,
> > > > > however, not called from the fs layer 
> > > > 
> > > > Is that true for filesystems that have directories in
> > > > the page cache?
> > > 
> > > I haven't found any explicit callers of filemap_fault except for ocfs2
> > > and ceph and those seem OK to me. Which filesystems you have in mind?
> > 
> > Just about every major filesystem calls filemap_fault through the
> > .fault callout.
> 
> That is right but the callback is called from the VM layer where we
> obviously do not take any fs locks (we are holding only mmap_sem
> for reading).
> Those who call filemap_fault directly (ocfs2 and ceph) and those
> who call the callback directly: qxl_ttm_fault, radeon_ttm_fault,
> kernfs_vma_fault, shm_fault seem to be safe from the reclaim recursion
> POV. radeon_ttm_fault takes a lock for reading but that one doesn't seem
> to be used from the reclaim context.
> 
> Or did I miss your point? Are you concerned about some fs overloading
> filemap_fault and do some locking before delegating to filemap_fault?

The latter:

https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b

> > GFP_KERNEL allocation for mappings is simply wrong. All mapping
> > allocations where the caller cannot pass a gfp_mask need to obey
> > the mapping_gfp_mask that is set by the mapping owner....
> 
> Hmm, I thought this is true only when the function might be called from
> the fs path.

How do you know in, say, mpage_readpages, you aren't being called
from a fs path that holds locks? e.g. we can get there from ext4
doing readdir, so it is holding an i_mutex lock at that point.

Many other paths into mpages_readpages don't hold locks, but there
are some that do, and those that do need functionals like this to
obey the mapping_gfp_mask because it is set appropriately for the
allocation context of the inode that owns the mapping....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Neil Brown <neilb@suse.de>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Sage Weil <sage@inktank.com>, Mark Fasheh <mfasheh@suse.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read
Date: Fri, 20 Mar 2015 14:48:20 +1100	[thread overview]
Message-ID: <20150320034820.GH28621@dastard> (raw)
In-Reply-To: <20150319124441.GC12466@dhcp22.suse.cz>

On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> On Thu 19-03-15 18:14:39, Dave Chinner wrote:
> > On Wed, Mar 18, 2015 at 03:55:28PM +0100, Michal Hocko wrote:
> > > On Wed 18-03-15 10:44:11, Rik van Riel wrote:
> > > > On 03/18/2015 10:09 AM, Michal Hocko wrote:
> > > > > page_cache_read has been historically using page_cache_alloc_cold to
> > > > > allocate a new page. This means that mapping_gfp_mask is used as the
> > > > > base for the gfp_mask. Many filesystems are setting this mask to
> > > > > GFP_NOFS to prevent from fs recursion issues. page_cache_read is,
> > > > > however, not called from the fs layer 
> > > > 
> > > > Is that true for filesystems that have directories in
> > > > the page cache?
> > > 
> > > I haven't found any explicit callers of filemap_fault except for ocfs2
> > > and ceph and those seem OK to me. Which filesystems you have in mind?
> > 
> > Just about every major filesystem calls filemap_fault through the
> > .fault callout.
> 
> That is right but the callback is called from the VM layer where we
> obviously do not take any fs locks (we are holding only mmap_sem
> for reading).
> Those who call filemap_fault directly (ocfs2 and ceph) and those
> who call the callback directly: qxl_ttm_fault, radeon_ttm_fault,
> kernfs_vma_fault, shm_fault seem to be safe from the reclaim recursion
> POV. radeon_ttm_fault takes a lock for reading but that one doesn't seem
> to be used from the reclaim context.
> 
> Or did I miss your point? Are you concerned about some fs overloading
> filemap_fault and do some locking before delegating to filemap_fault?

The latter:

https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b

> > GFP_KERNEL allocation for mappings is simply wrong. All mapping
> > allocations where the caller cannot pass a gfp_mask need to obey
> > the mapping_gfp_mask that is set by the mapping owner....
> 
> Hmm, I thought this is true only when the function might be called from
> the fs path.

How do you know in, say, mpage_readpages, you aren't being called
from a fs path that holds locks? e.g. we can get there from ext4
doing readdir, so it is holding an i_mutex lock at that point.

Many other paths into mpages_readpages don't hold locks, but there
are some that do, and those that do need functionals like this to
obey the mapping_gfp_mask because it is set appropriately for the
allocation context of the inode that owns the mapping....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2015-03-20  3:48 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-18 14:09 [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read Michal Hocko
2015-03-18 14:09 ` Michal Hocko
2015-03-18 14:32 ` Rik van Riel
2015-03-18 14:32   ` Rik van Riel
2015-03-18 14:37   ` Michal Hocko
2015-03-18 14:37     ` Michal Hocko
2015-03-18 14:38 ` Mel Gorman
2015-03-18 14:38   ` Mel Gorman
2015-03-18 14:43   ` Michal Hocko
2015-03-18 14:43     ` Michal Hocko
2015-03-18 14:44 ` Rik van Riel
2015-03-18 14:44   ` Rik van Riel
2015-03-18 14:55   ` Michal Hocko
2015-03-18 14:55     ` Michal Hocko
2015-03-19  7:14     ` Dave Chinner
2015-03-19  7:14       ` Dave Chinner
2015-03-19 11:11       ` [PATCH] mm: Use GFP_KERNEL allocation for the page cache inpage_cache_read Tetsuo Handa
2015-03-19 11:11         ` Tetsuo Handa
2015-03-19 12:44       ` [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read Michal Hocko
2015-03-19 12:44         ` Michal Hocko
2015-03-20  3:48         ` Dave Chinner [this message]
2015-03-20  3:48           ` Dave Chinner
2015-03-20 13:14           ` Michal Hocko
2015-03-20 13:14             ` Michal Hocko
2015-03-20 22:51             ` Dave Chinner
2015-03-20 22:51               ` Dave Chinner
2015-03-23 13:02               ` Michal Hocko
2015-03-23 13:02                 ` Michal Hocko
2015-03-26  9:53           ` Michal Hocko
2015-03-26  9:53             ` Michal Hocko
2015-03-26 21:43             ` Dave Chinner
2015-03-26 21:43               ` Dave Chinner
2015-03-30  8:22               ` Michal Hocko
2015-03-30  8:22                 ` Michal Hocko
2015-03-31 21:46                 ` Dave Chinner
2015-03-31 21:46                   ` Dave Chinner
2015-04-07 12:16                   ` Michal Hocko
2015-04-07 12:16                     ` Michal Hocko
2015-03-18 15:45 ` Michal Hocko
2015-03-18 15:45   ` Michal Hocko
2015-03-18 21:38   ` NeilBrown
2015-03-19 13:55     ` Michal Hocko
2015-03-19 13:55       ` Michal Hocko
2015-03-19 14:27       ` Michal Hocko
2015-03-19 14:27         ` Michal Hocko
2015-03-20  3:57       ` Dave Chinner
2015-03-20  3:57         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150320034820.GH28621@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mfasheh@suse.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=neilb@suse.de \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=riel@redhat.com \
    --cc=sage@inktank.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.