From: NeilBrown <mr-eq65iwfR9nKIECXXMXunQA@public.gmane.org>
To: Steven Whitehouse
<swhiteho-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
reiserfs-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-f2fs-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
logfs-PCqxUs/MD9bYtjvyW6yDsg@public.gmane.org,
cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
Chris Mason <clm-b10kYP2dOMg@public.gmane.org>,
linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org,
ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
cluster-devel <cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [Cluster-devel] [PATCH 0/2] scop GFP_NOFS api
Date: Sun, 01 May 2016 07:17:56 +1000
Message-ID: <87wpneu77f.fsf@notabene.neil.brown.name>
In-Reply-To: <57233571.50509-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Fri, Apr 29 2016, Steven Whitehouse wrote:
> Hi,
>
> On 29/04/16 06:35, NeilBrown wrote:
>> If we could similarly move evict() into kswapd (and I believe we can)
>> then most file systems would do nothing in reclaim context that
>> interferes with allocation context.
> evict() is an issue, but moving it into kswapd would be a potential
> problem for GFS2. We already have a memory allocation issue when
> recovery is taking place and memory is short. The code path is as follows:
>
> 1. Inode is scheduled for eviction (which requires deallocation)
> 2. The glock is required in order to perform the deallocation, which
> implies getting a DLM lock
> 3. Another node in the cluster fails, so needs recovery
> 4. When the DLM lock is requested, it gets blocked until recovery is
> complete (for the failed node)
> 5. Recovery is performed using a userland fencing utility
> 6. Fencing requires memory and then blocks on the eviction
> 7. Deadlock (Fencing waiting on memory alloc, memory alloc waiting on
> DLM lock, DLM lock waiting on fencing)
You even have user-space in the loop there - impressive! You can't
really pass GFP_NOFS to a user-space thread, can you :-?
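
Spelling that cycle out as a wait-for graph makes it easier to see
which edge an async eviction would cut. The following is only a
user-space sketch: the enum, the waits_on[] table and
sketch_in_cycle() are hypothetical names invented for illustration,
not anything in GFS2, DLM or the mm code.

```c
#include <assert.h>
#include <stdbool.h>

/* The three parties from step 7 of the list above (names are illustrative). */
enum waiter { FENCING, MEMORY_ALLOC, DLM_LOCK, NR_WAITERS };

/* waits_on[x] is the party that x is blocked on, or -1 if x can make progress. */
const int waits_on[NR_WAITERS] = {
	[FENCING]      = MEMORY_ALLOC, /* fencing needs memory */
	[MEMORY_ALLOC] = DLM_LOCK,     /* reclaim blocks in eviction on the DLM lock */
	[DLM_LOCK]     = FENCING,      /* lock is held up until recovery completes */
};

/*
 * Follow the single outgoing edge from 'start' for at most n steps.
 * Returns true if the walk comes back to 'start', i.e. 'start' is
 * part of a wait cycle.
 */
bool sketch_in_cycle(const int *edges, int n, int start)
{
	int cur = start;

	assert(start >= 0 && start < n);
	for (int steps = 0; steps < n; steps++) {
		cur = edges[cur];
		if (cur < 0)
			return false;	/* chain ends: someone can progress */
		if (cur == start)
			return true;
	}
	return false;
}
```

If the allocation side never waits on the DLM lock (its entry becomes
-1), the walk from any of the three terminates and the cycle is gone.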
>
> It doesn't happen often, but we've been looking at the best place to
> break that cycle, and one of the things we've been wondering is whether
> we could avoid deallocation evictions from memory related contexts, or
> at least make it async somehow.
I think "async" is definitely the answer, and I think
evict()/evict_inode() is the best place to focus attention.

I can see now (thanks) that just moving the evict() call to kswapd isn't
really a solution, as it would just serve to block kswapd and so lots of
other freeing of memory wouldn't happen.

I'm now imagining giving ->evict_inode() a "don't sleep" flag and
allowing it to return -EAGAIN. In that case evict() would queue the inode
to kswapd (or maybe another thread) for periodic retry.

The flag would only get set when prune_icache_sb() calls dispose_list()
to call evict(). Other uses (e.g. unmount, iput) would still be
synchronous.

How difficult would it be to change GFS2's evict_inode() to optionally
never block?

For this to work we would need to add a way for
deactivate_locked_super() to wait for all the async evictions to
complete. Currently prune_icache_sb() is called under s_umount. If we
moved part of the eviction out of that lock, some other synchronization
would be needed. Possibly a per-superblock list of "inodes being
evicted" would suffice.
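
To make the shape of that concrete, here is a rough user-space model
of the flow: a non-sleeping evict attempt that can return -EAGAIN, a
per-superblock pending list, a retry pass, and the "is anything still
pending" check that unmount would have to wait on. Every name
(sketch_inode, sketch_sb, sketch_evict_inode() and friends) is
hypothetical, there is no locking, and the lock_available flag merely
stands in for whatever would make a real ->evict_inode() block. It is
a sketch of the idea, not a proposed patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel structures. */
struct sketch_inode {
	bool lock_available;		/* could the cluster lock be taken without sleeping? */
	bool evicted;
	struct sketch_inode *next;	/* link on the per-superblock pending list */
};

struct sketch_sb {
	struct sketch_inode *pending;	/* "inodes being evicted" */
	size_t nr_pending;
};

/* Models ->evict_inode() taking a "don't sleep" flag. */
int sketch_evict_inode(struct sketch_inode *inode, bool nosleep)
{
	if (!inode->lock_available) {
		if (nosleep)
			return -EAGAIN;	/* would have to block: let the caller retry later */
		/* A synchronous caller would sleep here; the model just grants the lock. */
		inode->lock_available = true;
	}
	inode->evicted = true;
	return 0;
}

/* Models evict(): only the reclaim path passes the "don't sleep" flag. */
void sketch_evict(struct sketch_sb *sb, struct sketch_inode *inode, bool from_reclaim)
{
	if (sketch_evict_inode(inode, from_reclaim) == -EAGAIN) {
		inode->next = sb->pending;
		sb->pending = inode;
		sb->nr_pending++;
	}
}

/* One pass of the periodic retry; returns how many inodes are still pending. */
size_t sketch_retry_pending(struct sketch_sb *sb)
{
	struct sketch_inode **link = &sb->pending;

	while (*link) {
		struct sketch_inode *inode = *link;

		if (sketch_evict_inode(inode, true) == 0) {
			*link = inode->next;	/* done: unlink from the pending list */
			inode->next = NULL;
			sb->nr_pending--;
		} else {
			link = &inode->next;	/* still blocked: keep it queued */
		}
	}
	return sb->nr_pending;
}

/* The condition unmount would need to wait for before tearing down the sb. */
bool sketch_sb_idle(const struct sketch_sb *sb)
{
	assert((sb->pending == NULL) == (sb->nr_pending == 0));
	return sb->pending == NULL;
}
```

In this model unmount and iput would call sketch_evict() with
from_reclaim false and so never see -EAGAIN, which matches the "other
uses would still be synchronous" part above.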
Thanks,
NeilBrown
>
> The issue is that it is not possible to know in advance whether an
> eviction will result in merely writing things back to disk (because the
> inode is being dropped from cache, but still resides on disk) which is
> easy to do, or whether it requires a full deallocation (i_nlink==0) which
> may require significant resources and time.
>
> Steve.