From: Boaz Harrosh <boaz@plexistor.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-nfs@vger.kernel.org, Matt Benjamin <matt@linuxbox.com>
Subject: [PATCH] pnfs: Kick a pnfs_layoutcommit_inode on recall
Date: Tue, 26 Aug 2014 17:10:13 +0300
Message-ID: <53FC9545.4000800@plexistor.com>
In-Reply-To: <20140824191839.GA9717@lst.de>
From: Boaz Harrosh <boaz@plexistor.com>
This fixes a deadlock in the pNFS recall processing.

pnfs_layoutcommit_inode() is called through update_inode(), which
the VFS invokes after pNFS write I/O marks the inode dirty with
set_inode_dirty. But the VFS will not schedule another
update_inode() if it is already inside an update_inode() or an
sb-writeback.

As part of writeback, the pNFS code might get stuck in LAYOUT_GET,
with the server returning ERR_RECALL_CONFLICT because some
operation has caused the server to RECALL all layouts, including
those held by our client.

So the RECALL is received, but our client returns ERR_DELAY
because its write segments need a LAYOUT_COMMIT. That
pnfs_layoutcommit_inode() call will never come, because it is
scheduled behind the LAYOUT_GET, which is stuck waiting for the
recall to finish.

Hence the deadlock: the client is stuck polling LAYOUT_GET and
receiving ERR_RECALL_CONFLICT, while the server is stuck polling
the RECALL and receiving ERR_DELAY.

With pnfs-objects, the above condition can easily happen when a
file grows beyond a group of devices. The pnfs-objects server will
RECALL all layouts because the file-objects map changes and all
old layouts will have stale attributes. The RECALL is therefore
initiated as part of a LAYOUT_GET, and it can be triggered from
within a single client operation.

A simple solution is to kick off a pnfs_layoutcommit_inode() from
within the recall handler, to release any segments held only
because they need a commit, and let the client return success on
the RECALL, so streaming can continue.

This patch is based on 3.17-rc1. It is completely UNTESTED.
I tested a version of this patch around the 3.12 kernel, at which
point the deadlock was resolved, but I hit some race conditions in
pNFS state management further on, so the overall processing was
still not fixed. Hopefully those have since been fixed by Trond
and Christoph, and it should work better now.

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
fs/nfs/callback_proc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 41db525..8660f96 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -171,6 +171,14 @@ static u32 initiate_file_draining(struct nfs_client *clp,
goto out;
ino = lo->plh_inode;
+
+ spin_lock(&ino->i_lock);
+ pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
+ spin_unlock(&ino->i_lock);
+
+ /* kick out any segs held by need to commit */
+ pnfs_layoutcommit_inode(ino, true);
+
spin_lock(&ino->i_lock);
if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) ||
pnfs_mark_matching_lsegs_invalid(lo, &free_me_list,
@@ -178,7 +186,6 @@ static u32 initiate_file_draining(struct nfs_client *clp,
rv = NFS4ERR_DELAY;
else
rv = NFS4ERR_NOMATCHING_LAYOUT;
- pnfs_set_layout_stateid(lo, &args->cbl_stateid, true);
spin_unlock(&ino->i_lock);
pnfs_free_lseg_list(&free_me_list);
pnfs_put_layout_hdr(lo);
--
1.9.3