Re: [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Mel Gorman <mgorman@suse.de>, Dave Chinner <david@fromorbit.com>,
	Mark Fasheh <mfasheh@suse.com>,
	ocfs2-devel@oss.oracle.com, ceph-devel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation
Date: Thu, 12 Nov 2015 10:53:01 +0100	[thread overview]
Message-ID: <20151112095301.GA25265@quack.suse.cz> (raw)
In-Reply-To: <1447251233-14449-1-git-send-email-mhocko@kernel.org>

On Wed 11-11-15 15:13:53, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> page_cache_read has been historically using page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems are setting this mask to
> GFP_NOFS to prevent from fs recursion issues. page_cache_read is
> called from the vm_operations_struct::fault() context during the page
> fault. This context doesn't need the reclaim protection normally.
> 
> ceph and ocfs2 which call filemap_fault from their fault handlers
> seem to be OK because they are not taking any fs lock before invoking
> generic implementation. xfs which takes XFS_MMAPLOCK_SHARED is safe
> from the reclaim recursion POV because this lock serializes truncate
> and punch hole with the page faults and it doesn't get involved in the
> reclaim.
> 
> There is simply no reason to deliberately use a weaker allocation
> context when a __GFP_FS | __GFP_IO can be used. The GFP_NOFS
> protection might be even harmful. There is a push to fail GFP_NOFS
> allocations rather than loop within allocator indefinitely with a
> very limited reclaim ability. Once we start failing those requests
> the OOM killer might be triggered prematurely because the page cache
> allocation failure is propagated up the page fault path and end up in
> pagefault_out_of_memory.
> 
> We cannot play with mapping_gfp_mask directly because that would be racy
> wrt. parallel page faults and it might interfere with other users who
> really rely on NOFS semantic from the stored gfp_mask. The mask is also
> inode proper so it would even be a layering violation. What we can do
> instead is to push the gfp_mask into struct vm_fault and allow fs layer
> to overwrite it should the callback need to be called with a different
> allocation context.
> 
> Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
> because this should be safe from the page fault path normally. Why do we
> care about mapping_gfp_mask at all then? Because this doesn't hold only
> reclaim protection flags but it also might contain zone and movability
> restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have to respect
> those.
> 
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> this has been posted previously as a part of larger GFP_NOFS related
> patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org)
> but I think it makes sense to discuss it even out of that scope.
> 
> I would like to hear FS and other MM people about the proposed interface.
> Using mapping_gfp_mask blindly doesn't sound good to me and vm_fault
> looks like a proper channel to communicate between MM and FS layers.
> 
> Comments? Are there any better ideas?

Makes sense to me and the filesystems I know should be fine with this
(famous last words ;). Feel free to add:

Acked-by: Jan Kara <jack@suse.com>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Jan Kara <jack@suse.cz>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Mel Gorman <mgorman@suse.de>, Dave Chinner <david@fromorbit.com>,
	Mark Fasheh <mfasheh@suse.com>,
	ocfs2-devel@oss.oracle.com, ceph-devel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: [Ocfs2-devel] [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation
Date: Thu, 12 Nov 2015 10:53:01 +0100	[thread overview]
Message-ID: <20151112095301.GA25265@quack.suse.cz> (raw)
In-Reply-To: <1447251233-14449-1-git-send-email-mhocko@kernel.org>

On Wed 11-11-15 15:13:53, mhocko at kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> page_cache_read has been historically using page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems are setting this mask to
> GFP_NOFS to prevent from fs recursion issues. page_cache_read is
> called from the vm_operations_struct::fault() context during the page
> fault. This context doesn't need the reclaim protection normally.
> 
> ceph and ocfs2 which call filemap_fault from their fault handlers
> seem to be OK because they are not taking any fs lock before invoking
> generic implementation. xfs which takes XFS_MMAPLOCK_SHARED is safe
> from the reclaim recursion POV because this lock serializes truncate
> and punch hole with the page faults and it doesn't get involved in the
> reclaim.
> 
> There is simply no reason to deliberately use a weaker allocation
> context when a __GFP_FS | __GFP_IO can be used. The GFP_NOFS
> protection might be even harmful. There is a push to fail GFP_NOFS
> allocations rather than loop within allocator indefinitely with a
> very limited reclaim ability. Once we start failing those requests
> the OOM killer might be triggered prematurely because the page cache
> allocation failure is propagated up the page fault path and end up in
> pagefault_out_of_memory.
> 
> We cannot play with mapping_gfp_mask directly because that would be racy
> wrt. parallel page faults and it might interfere with other users who
> really rely on NOFS semantic from the stored gfp_mask. The mask is also
> inode proper so it would even be a layering violation. What we can do
> instead is to push the gfp_mask into struct vm_fault and allow fs layer
> to overwrite it should the callback need to be called with a different
> allocation context.
> 
> Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
> because this should be safe from the page fault path normally. Why do we
> care about mapping_gfp_mask at all then? Because this doesn't hold only
> reclaim protection flags but it also might contain zone and movability
> restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have to respect
> those.
> 
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> this has been posted previously as a part of larger GFP_NOFS related
> patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org)
> but I think it makes sense to discuss it even out of that scope.
> 
> I would like to hear FS and other MM people about the proposed interface.
> Using mapping_gfp_mask blindly doesn't sound good to me and vm_fault
> looks like a proper channel to communicate between MM and FS layers.
> 
> Comments? Are there any better ideas?

Makes sense to me and the filesystems I know should be fine with this
(famous last words ;). Feel free to add:

Acked-by: Jan Kara <jack@suse.com>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

WARNING: multiple messages have this Message-ID (diff)

From: Jan Kara <jack@suse.cz>
To: mhocko@kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Mel Gorman <mgorman@suse.de>, Dave Chinner <david@fromorbit.com>,
	Mark Fasheh <mfasheh@suse.com>,
	ocfs2-devel@oss.oracle.com, ceph-devel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation
Date: Thu, 12 Nov 2015 10:53:01 +0100	[thread overview]
Message-ID: <20151112095301.GA25265@quack.suse.cz> (raw)
In-Reply-To: <1447251233-14449-1-git-send-email-mhocko@kernel.org>

On Wed 11-11-15 15:13:53, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> page_cache_read has been historically using page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems are setting this mask to
> GFP_NOFS to prevent from fs recursion issues. page_cache_read is
> called from the vm_operations_struct::fault() context during the page
> fault. This context doesn't need the reclaim protection normally.
> 
> ceph and ocfs2 which call filemap_fault from their fault handlers
> seem to be OK because they are not taking any fs lock before invoking
> generic implementation. xfs which takes XFS_MMAPLOCK_SHARED is safe
> from the reclaim recursion POV because this lock serializes truncate
> and punch hole with the page faults and it doesn't get involved in the
> reclaim.
> 
> There is simply no reason to deliberately use a weaker allocation
> context when a __GFP_FS | __GFP_IO can be used. The GFP_NOFS
> protection might be even harmful. There is a push to fail GFP_NOFS
> allocations rather than loop within allocator indefinitely with a
> very limited reclaim ability. Once we start failing those requests
> the OOM killer might be triggered prematurely because the page cache
> allocation failure is propagated up the page fault path and end up in
> pagefault_out_of_memory.
> 
> We cannot play with mapping_gfp_mask directly because that would be racy
> wrt. parallel page faults and it might interfere with other users who
> really rely on NOFS semantic from the stored gfp_mask. The mask is also
> inode proper so it would even be a layering violation. What we can do
> instead is to push the gfp_mask into struct vm_fault and allow fs layer
> to overwrite it should the callback need to be called with a different
> allocation context.
> 
> Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
> because this should be safe from the page fault path normally. Why do we
> care about mapping_gfp_mask at all then? Because this doesn't hold only
> reclaim protection flags but it also might contain zone and movability
> restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have to respect
> those.
> 
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Hi,
> this has been posted previously as a part of larger GFP_NOFS related
> patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org)
> but I think it makes sense to discuss it even out of that scope.
> 
> I would like to hear FS and other MM people about the proposed interface.
> Using mapping_gfp_mask blindly doesn't sound good to me and vm_fault
> looks like a proper channel to communicate between MM and FS layers.
> 
> Comments? Are there any better ideas?

Makes sense to me and the filesystems I know should be fine with this
(famous last words ;). Feel free to add:

Acked-by: Jan Kara <jack@suse.com>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

next prev parent reply	other threads:[~2015-11-12  9:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-11 14:13 [PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation mhocko
2015-11-11 14:13 ` mhocko
2015-11-11 14:13 ` mhocko
2015-11-11 14:13 ` mhocko
2015-11-12  9:53 ` Jan Kara [this message]
2015-11-12  9:53   ` Jan Kara
2015-11-12  9:53   ` [Ocfs2-devel] " Jan Kara
2015-11-26 15:08   ` Michal Hocko
2015-11-26 15:08     ` Michal Hocko
2015-11-27 16:40     ` Vlastimil Babka
2015-11-27 16:40       ` Vlastimil Babka
2015-11-27 17:05       ` Michal Hocko
2015-11-27 17:05         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151112095301.GA25265@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mfasheh@suse.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mhocko@suse.com \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.