From: NeilBrown <mr-eq65iwfR9nKIECXXMXunQA@public.gmane.org>
To: Steven Whitehouse
<swhiteho-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
linux-ntfs-dev-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
reiserfs-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-f2fs-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
logfs-PCqxUs/MD9bYtjvyW6yDsg@public.gmane.org,
cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
Chris Mason <clm-b10kYP2dOMg@public.gmane.org>,
linux-mtd-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
xfs-VZNHf3L845pBDgjK7y7TUQ@public.gmane.org,
ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-afs-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
cluster-devel
<cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [Cluster-devel] [PATCH 0/2] scop GFP_NOFS api
Date: Sun, 01 May 2016 07:17:56 +1000
Message-ID: <87wpneu77f.fsf@notabene.neil.brown.name>
In-Reply-To: <57233571.50509-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Fri, Apr 29 2016, Steven Whitehouse wrote:
> Hi,
>
> On 29/04/16 06:35, NeilBrown wrote:
>> If we could similarly move evict() into kswapd (and I believe we can)
>> then most file systems would do nothing in reclaim context that
>> interferes with allocation context.
> evict() is an issue, but moving it into kswapd would be a potential
> problem for GFS2. We already have a memory allocation issue when
> recovery is taking place and memory is short. The code path is as follows:
>
> 1. Inode is scheduled for eviction (which requires deallocation)
> 2. The glock is required in order to perform the deallocation, which
> implies getting a DLM lock
> 3. Another node in the cluster fails, so needs recovery
> 4. When the DLM lock is requested, it gets blocked until recovery is
> complete (for the failed node)
> 5. Recovery is performed using a userland fencing utility
> 6. Fencing requires memory and then blocks on the eviction
> 7. Deadlock (Fencing waiting on memory alloc, memory alloc waiting on
> DLM lock, DLM lock waiting on fencing)
You even have user-space in the loop there - impressive! You can't
really pass GFP_NOFS to a user-space thread, can you :-?
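As a rough illustration of the cycle in step 7, the three waiters can be modelled as a tiny wait-for graph. This is only a sketch: the actor names and the `has_wait_cycle()` helper are invented for the example and do not correspond to anything in GFS2 or DLM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors in the scenario above; the names are illustrative. */
enum actor { FENCING, MEMORY_ALLOC, DLM_LOCK, NR_ACTORS };

/* waits_on[a] is the actor that a is blocked on, or -1 if a can run. */
static bool has_wait_cycle(const int waits_on[NR_ACTORS], int start)
{
	int cur = waits_on[start];
	int steps;

	/* Follow at most NR_ACTORS edges; coming back to start is a cycle. */
	for (steps = 0; steps < NR_ACTORS && cur >= 0; steps++) {
		if (cur == start)
			return true;
		cur = waits_on[cur];
	}
	return false;
}

/* Step 7: fencing -> memory alloc -> DLM lock -> fencing. */
bool sync_eviction_deadlocks(void)
{
	const int waits_on[NR_ACTORS] = {
		[FENCING] = MEMORY_ALLOC,
		[MEMORY_ALLOC] = DLM_LOCK,
		[DLM_LOCK] = FENCING,
	};

	return has_wait_cycle(waits_on, FENCING);
}

/* If reclaim never sleeps on the DLM lock, that edge disappears. */
bool async_eviction_deadlocks(void)
{
	const int waits_on[NR_ACTORS] = {
		[FENCING] = MEMORY_ALLOC,
		[MEMORY_ALLOC] = -1,
		[DLM_LOCK] = FENCING,
	};

	return has_wait_cycle(waits_on, FENCING);
}
```

Removing any one edge breaks the cycle; the example drops the edge from memory allocation to the DLM lock because that is the one the kernel side controls.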
>
> It doesn't happen often, but we've been looking at the best place to
> break that cycle, and one of the things we've been wondering is whether
> we could avoid deallocation evictions from memory related contexts, or
> at least make it async somehow.
I think "async" is definitely the answer and I think
evict()/evict_inode() is the best place to focus attention.
I can see now (thanks) that just moving the evict() call to kswapd isn't
really a solution as it will just serve to block kswapd and so lots of
other freeing of memory won't happen.
I'm now imagining giving ->evict_inode() a "don't sleep" flag and
allowing it to return -EAGAIN. In that case evict would queue the inode
to kswapd (or maybe another thread) for periodic retry.
The flag would only get set when prune_icache_sb() calls dispose_list()
to call evict(). Other uses (e.g. unmount, iput) would still be
synchronous.
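A minimal user-space sketch of that idea, assuming a hypothetical `nonblock` argument to the eviction hook. The `toy_*` names and the `struct toy_inode` fields are made up for illustration; this is not the real VFS interface, where ->evict_inode() returns void.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct inode; every field here is hypothetical. */
struct toy_inode {
	unsigned int nlink;      /* 0 means eviction must deallocate */
	bool lock_available;     /* can the cluster lock be taken right now? */
	bool evicted;
	struct toy_inode *next;  /* link in the deferred-retry queue */
};

/* Hypothetical ->evict_inode() variant taking a "don't sleep" flag. */
int toy_evict_inode(struct toy_inode *inode, bool nonblock)
{
	if (inode->nlink == 0 && !inode->lock_available) {
		if (nonblock)
			return -EAGAIN;  /* would have slept on the lock */
		/* Synchronous callers wait; modelled as the lock arriving. */
		inode->lock_available = true;
	}
	inode->evicted = true;
	return 0;
}

/* Reclaim-side caller: never sleeps, defers inodes that would block. */
void toy_evict_from_reclaim(struct toy_inode *inode,
			    struct toy_inode **retry_queue)
{
	if (toy_evict_inode(inode, true) == -EAGAIN) {
		inode->next = *retry_queue;
		*retry_queue = inode;
	}
}

/* Periodic retry pass; returns how many inodes are still pending. */
int toy_retry_deferred(struct toy_inode **retry_queue)
{
	struct toy_inode **link = retry_queue;
	int pending = 0;

	while (*link) {
		struct toy_inode *inode = *link;

		if (toy_evict_inode(inode, true) == 0) {
			*link = inode->next;  /* unlink the evicted inode */
			inode->next = NULL;
		} else {
			link = &inode->next;
			pending++;
		}
	}
	return pending;
}
```

Only the reclaim path passes `nonblock = true`; a caller standing in for unmount or iput would pass `false` and keep today's synchronous behaviour.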
How difficult would it be to change GFS2's evict_inode() to optionally
never block?
For this to work we would need to add a way for
deactivate_locked_super() to wait for all the async evictions to
complete. Currently prune_icache_sb() is called under s_umount. If we
moved part of the eviction out of that lock some other synchronization
would be needed. Possibly a per-superblock list of "inodes being
evicted" would suffice.
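One possible shape for that synchronization, sketched in user space with pthreads. It uses a per-superblock counter rather than a full list, and `struct toy_sb` and the `toy_sb_*` helpers are assumptions made for this example, not existing kernel interfaces.

```c
#include <assert.h>
#include <pthread.h>

/* Toy per-superblock state; the struct and field names are hypothetical. */
struct toy_sb {
	pthread_mutex_t lock;
	pthread_cond_t idle;       /* signalled when nr_evicting hits 0 */
	unsigned int nr_evicting;  /* async evictions still in flight */
};

void toy_sb_init(struct toy_sb *sb)
{
	pthread_mutex_init(&sb->lock, NULL);
	pthread_cond_init(&sb->idle, NULL);
	sb->nr_evicting = 0;
}

/* Called when an eviction is handed off for asynchronous retry. */
void toy_sb_eviction_start(struct toy_sb *sb)
{
	pthread_mutex_lock(&sb->lock);
	sb->nr_evicting++;
	pthread_mutex_unlock(&sb->lock);
}

/* Called by the retry thread once the inode is finally gone. */
void toy_sb_eviction_done(struct toy_sb *sb)
{
	pthread_mutex_lock(&sb->lock);
	if (--sb->nr_evicting == 0)
		pthread_cond_broadcast(&sb->idle);
	pthread_mutex_unlock(&sb->lock);
}

/* Unmount side: sleep until no async eviction remains; returns 0. */
unsigned int toy_sb_wait_for_evictions(struct toy_sb *sb)
{
	unsigned int left;

	pthread_mutex_lock(&sb->lock);
	while (sb->nr_evicting > 0)
		pthread_cond_wait(&sb->idle, &sb->lock);
	left = sb->nr_evicting;
	pthread_mutex_unlock(&sb->lock);
	return left;
}
```

A real list of "inodes being evicted" would additionally let unmount find and push the stragglers; the counter only lets it wait for them.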
Thanks,
NeilBrown
>
> The issue is that it is not possible to know in advance whether an
> eviction will result in merely writing things back to disk (because the
> inode is being dropped from cache, but still resides on disk) which is
> easy to do, or whether it requires a full deallocation (n_link==0) which
> may require significant resources and time.
>
> Steve.