public inbox for linux-kernel@vger.kernel.org
* [PATCH] pNFS: deadlock in pnfs_send_layoutreturn
@ 2026-04-07 15:20 Ben Roberts
  2026-04-08 11:39 ` kernel test robot
  2026-04-10  5:28 ` kernel test robot
  0 siblings, 2 replies; 3+ messages in thread
From: Ben Roberts @ 2026-04-07 15:20 UTC
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, linux-kernel, Ben Roberts

Apologies, resending due to improper mail client settings on earlier
attempts.

On an HPC cluster running 5.14.0-611.9.1.el9.x86_64, regular deadlocks were seen
in pnfs_send_layoutreturn(). They left userspace processes stuck in
uninterruptible sleep, and a reboot was the only way to clear them. The
deadlocks occurred frequently, sometimes several times per day, on specific
heavily loaded hosts. Claude Code was tasked with hunting down potential
deadlocks in pnfs_send_layoutreturn() and identified the condition described
below. This patch has been running in production on top of the EL9 kernel for
over three months with no recurrence of the deadlock.

pnfs_send_layoutreturn() can deadlock when its memory allocation fails.
The problem is in the error path. That path calls pnfs_put_layout_hdr(),
which may call pnfs_layoutreturn_before_put_layout_hdr(), which can in
turn call back into pnfs_send_layoutreturn() recursively.

Call chain that triggers the deadlock:
1. pnfs_send_layoutreturn() - kzalloc() fails
2. Error path calls pnfs_put_layout_hdr(lo)
3. pnfs_put_layout_hdr() calls pnfs_layoutreturn_before_put_layout_hdr()
4. If NFS_LAYOUT_RETURN_REQUESTED is still set, it attempts another
   layoutreturn and re-enters pnfs_send_layoutreturn(). This creates the
   recursion that leads to the deadlock.

The fix calls pnfs_clear_layoutreturn_info() in the allocation failure
path while ino->i_lock is held. This clears NFS_LAYOUT_RETURN_REQUESTED
before pnfs_put_layout_hdr() is called, so
pnfs_layoutreturn_before_put_layout_hdr() no longer attempts another
layout return and the recursion cycle is broken.
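The call chain and the fix above can be illustrated with a small, single-threaded user-space model. This is a hypothetical sketch, not kernel code. Every model_* name and simulate() is invented for illustration, and locking is omitted. The patch's pnfs_clear_layoutreturn_info() is modeled only as clearing the stand-in for NFS_LAYOUT_RETURN_REQUESTED. The model shows that, without the fix, each failed allocation adds another level of nesting. With the fix, the error path returns after a single level.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: none of these names are kernel APIs. */
struct model_layout_hdr {
	int refcount;
	bool return_requested;	/* stands in for NFS_LAYOUT_RETURN_REQUESTED */
};

static int allocs_to_fail;	/* how many upcoming allocations will fail */
static int depth;		/* current nesting of model_send_layoutreturn() */
static int max_depth;		/* deepest nesting observed in this run */

static void model_put_layout_hdr(struct model_layout_hdr *lo, bool fix);

/* Stand-in for the allocation of struct nfs4_layoutreturn (step 1). */
static void *model_alloc(void)
{
	if (allocs_to_fail > 0) {
		allocs_to_fail--;
		return NULL;
	}
	return calloc(1, 64);
}

static int model_send_layoutreturn(struct model_layout_hdr *lo, bool fix)
{
	void *lrp;
	int status = 0;

	if (++depth > max_depth)
		max_depth = depth;

	lrp = model_alloc();
	if (lrp == NULL) {
		status = -ENOMEM;
		if (fix)
			lo->return_requested = false;	/* the proposed fix */
		model_put_layout_hdr(lo, fix);	/* step 2: error path drops the reference */
	} else {
		/* Simplified: treat the return as sent and completed at once. */
		free(lrp);
		lo->return_requested = false;
		lo->refcount--;
	}
	depth--;
	return status;
}

/* Step 4: a still-requested return is attempted before the put. */
static void model_layoutreturn_before_put(struct model_layout_hdr *lo, bool fix)
{
	if (!lo->return_requested)
		return;
	lo->refcount++;		/* reference handed to the new layoutreturn */
	model_send_layoutreturn(lo, fix);
}

/* Step 3: the put first runs the before-put hook, then drops a reference. */
static void model_put_layout_hdr(struct model_layout_hdr *lo, bool fix)
{
	model_layoutreturn_before_put(lo, fix);
	lo->refcount--;
	assert(lo->refcount >= 0);
}

/* Returns the deepest nesting of model_send_layoutreturn() in one run. */
int simulate(int failures, bool fix)
{
	/* One reference owned by the caller, one for the layoutreturn. */
	struct model_layout_hdr lo = { .refcount = 2, .return_requested = true };

	allocs_to_fail = failures;
	depth = 0;
	max_depth = 0;
	model_send_layoutreturn(&lo, fix);
	assert(lo.refcount == 1);	/* only the caller's reference remains */
	return max_depth;
}
```

For example, simulate(3, false) returns 4: one level for each failed allocation plus the final successful attempt. simulate(3, true) returns 1. The model only demonstrates the re-entry. It does not reproduce the locking or task state that turns the re-entry into a hang in a real kernel.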

Signed-off-by: Ben Roberts <ben.roberts@gsacapital.com>
Assisted-by: Claude:claude-sonnet-4-5
---
 fs/nfs/pnfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index bc13d1e69449..47bda53b2b3a 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1361,6 +1361,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo,
 	if (unlikely(lrp == NULL)) {
 		status = -ENOMEM;
 		spin_lock(&ino->i_lock);
+		pnfs_clear_layoutreturn_info(lo)
 		pnfs_clear_layoutreturn_waitbit(lo);
 		spin_unlock(&ino->i_lock);
 		put_cred(cred);
--
2.43.0


* Re: [PATCH] pNFS: deadlock in pnfs_send_layoutreturn
From: kernel test robot @ 2026-04-08 11:39 UTC
  To: Ben Roberts, Trond Myklebust, Anna Schumaker
  Cc: llvm, oe-kbuild-all, linux-nfs, linux-kernel, Ben Roberts

Hi Ben,

kernel test robot noticed the following build errors:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on linus/master v7.0-rc7 next-20260407]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ben-Roberts/pNFS-deadlock-in-pnfs_send_layoutreturn/20260408-135718
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link:    https://lore.kernel.org/r/20260407152035.4034628-1-ben.roberts%40gsacapital.com
patch subject: [PATCH] pNFS: deadlock in pnfs_send_layoutreturn
config: powerpc-motionpro_defconfig (https://download.01.org/0day-ci/archive/20260408/202604081929.yB1AglTU-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project c80443cd37b2e2788cba67ffa180a6331e5f0791)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260408/202604081929.yB1AglTU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604081929.yB1AglTU-lkp@intel.com/

All errors (new ones prefixed by >>):

>> fs/nfs/pnfs.c:1364:35: error: expected ';' after expression
    1364 |                 pnfs_clear_layoutreturn_info(lo)
         |                                                 ^
         |                                                 ;
   1 error generated.


vim +1364 fs/nfs/pnfs.c

  1345	
  1346	static int
  1347	pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo,
  1348			       const nfs4_stateid *stateid,
  1349			       const struct cred **pcred,
  1350			       enum pnfs_iomode iomode,
  1351			       unsigned int flags)
  1352	{
  1353		struct inode *ino = lo->plh_inode;
  1354		struct pnfs_layoutdriver_type *ld = NFS_SERVER(ino)->pnfs_curr_ld;
  1355		struct nfs4_layoutreturn *lrp;
  1356		const struct cred *cred = *pcred;
  1357		int status = 0;
  1358	
  1359		*pcred = NULL;
  1360		lrp = kzalloc_obj(*lrp, nfs_io_gfp_mask());
  1361		if (unlikely(lrp == NULL)) {
  1362			status = -ENOMEM;
  1363			spin_lock(&ino->i_lock);
> 1364			pnfs_clear_layoutreturn_info(lo)
  1365			pnfs_clear_layoutreturn_waitbit(lo);
  1366			spin_unlock(&ino->i_lock);
  1367			put_cred(cred);
  1368			pnfs_put_layout_hdr(lo);
  1369			goto out;
  1370		}
  1371	
  1372		pnfs_init_layoutreturn_args(&lrp->args, lo, stateid, iomode);
  1373		lrp->args.ld_private = &lrp->ld_private;
  1374		lrp->clp = NFS_SERVER(ino)->nfs_client;
  1375		lrp->cred = cred;
  1376		if (ld->prepare_layoutreturn)
  1377			ld->prepare_layoutreturn(&lrp->args);
  1378	
  1379		status = nfs4_proc_layoutreturn(lrp, flags);
  1380	out:
  1381		dprintk("<-- %s status: %d\n", __func__, status);
  1382		return status;
  1383	}
  1384	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH] pNFS: deadlock in pnfs_send_layoutreturn
From: kernel test robot @ 2026-04-10  5:28 UTC
  To: Ben Roberts, Trond Myklebust, Anna Schumaker
  Cc: oe-kbuild-all, linux-nfs, linux-kernel, Ben Roberts

Hi Ben,

kernel test robot noticed the following build errors:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on linus/master v7.0-rc7 next-20260407]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ben-Roberts/pNFS-deadlock-in-pnfs_send_layoutreturn/20260408-135718
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link:    https://lore.kernel.org/r/20260407152035.4034628-1-ben.roberts%40gsacapital.com
patch subject: [PATCH] pNFS: deadlock in pnfs_send_layoutreturn
config: m68k-defconfig (https://download.01.org/0day-ci/archive/20260410/202604101309.6eRanxOC-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260410/202604101309.6eRanxOC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202604101309.6eRanxOC-lkp@intel.com/

All errors (new ones prefixed by >>):

   fs/nfs/pnfs.c: In function 'pnfs_send_layoutreturn':
>> fs/nfs/pnfs.c:1364:49: error: expected ';' before 'pnfs_clear_layoutreturn_waitbit'
    1364 |                 pnfs_clear_layoutreturn_info(lo)
         |                                                 ^
         |                                                 ;
    1365 |                 pnfs_clear_layoutreturn_waitbit(lo);
         |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  


