public inbox for linux-nfs@vger.kernel.org
From: Mike Snitzer <snitzer@kernel.org>
To: NeilBrown <neil@brown.name>
Cc: Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna.schumaker@oracle.com>,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 0/3] Fix localio hangs
Date: Wed, 16 Jul 2025 19:27:00 -0400
Message-ID: <aHg1RLw-5Csbiber@kernel.org>
In-Reply-To: <175270375199.2234665.7748991440226043304@noble.neil.brown.name>

On Thu, Jul 17, 2025 at 08:09:11AM +1000, NeilBrown wrote:
> On Thu, 17 Jul 2025, Trond Myklebust wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > The following patch series fixes a series of issues with the current
> > localio code, as reported in the link
> > https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
> > 
> > 
> > Trond Myklebust (3):
> >   NFS/localio: nfs_close_local_fh() fix check for file closed
> >   NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
> >   NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
> 
> That all looks good to me - thanks a lot for finding and fixing my bugs.
> 
> Reviewed-by: NeilBrown <neil@brown.name>
> 
> I'd still like to fix the nfsd_file_cache_purge() issue but that is
> quite separate especially now that you've prevented it causing problems
> for nfs_uuid_put().
> 
> thanks,
> NeilBrown

Unfortunately, even with these 3 v2 fixes applied, I was just able to hit
the same hang on NFSD shutdown.  It took 5 iterations of the fio test
reported here:
https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
So the hang is harder to hit with these v2 fixes, but it still happens:

[  369.528839] task:rpc.nfsd        state:D stack:0     pid:10569 tgid:10569 ppid:1      flags:0x00004006
[  369.528985] Call Trace:
[  369.529127]  <TASK>
[  369.529295]  __schedule+0x26d/0x530
[  369.529435]  schedule+0x27/0xa0
[  369.529566]  schedule_timeout+0x14e/0x160
[  369.529700]  ? svc_destroy+0xce/0x160 [sunrpc]
[  369.529882]  ? lockd_put+0x5f/0x90 [lockd]
[  369.530022]  __wait_for_common+0x8f/0x1d0
[  369.530154]  ? __pfx_schedule_timeout+0x10/0x10
[  369.530329]  nfsd_destroy_serv+0x13f/0x1a0 [nfsd]
[  369.530516]  nfsd_svc+0xe0/0x170 [nfsd]
[  369.530684]  write_threads+0xc3/0x190 [nfsd]
[  369.530845]  ? simple_transaction_get+0xc2/0xe0
[  369.530973]  ? __pfx_write_threads+0x10/0x10 [nfsd]
[  369.531133]  nfsctl_transaction_write+0x47/0x80 [nfsd]
[  369.531324]  vfs_write+0xfa/0x420
[  369.531448]  ? do_filp_open+0xae/0x150
[  369.531574]  ksys_write+0x63/0xe0
[  369.531693]  do_syscall_64+0x7d/0x160
[  369.531816]  ? do_sys_openat2+0x81/0xd0
[  369.531937]  ? syscall_exit_work+0xf3/0x120
[  369.532058]  ? syscall_exit_to_user_mode+0x32/0x1b0
[  369.532178]  ? do_syscall_64+0x89/0x160
[  369.532344]  ? __mod_memcg_lruvec_state+0x95/0x150
[  369.532465]  ? __lruvec_stat_mod_folio+0x84/0xd0
[  369.532584]  ? syscall_exit_work+0xf3/0x120
[  369.532705]  ? syscall_exit_to_user_mode+0x32/0x1b0
[  369.532827]  ? do_syscall_64+0x89/0x160
[  369.532947]  ? __handle_mm_fault+0x326/0x730
[  369.533066]  ? __mod_memcg_lruvec_state+0x95/0x150
[  369.533187]  ? __count_memcg_events+0x53/0xf0
[  369.533306]  ? handle_mm_fault+0x245/0x340
[  369.533427]  ? do_user_addr_fault+0x341/0x6b0
[  369.533547]  ? exc_page_fault+0x70/0x160
[  369.533666]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  369.533787] RIP: 0033:0x7f1db10fd617

crash> dis -l nfsd_destroy_serv+0x13f
/root/snitm/git/linux-HS/fs/nfsd/nfssvc.c: 468
0xffffffffc172e36f <nfsd_destroy_serv+319>:     mov    %r12,%rdi

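which maps to the percpu_ref_exit() in nfsd_shutdown_net().  That address
is the return address of the preceding call, so the task is actually
blocked in wait_for_completion(&nn->nfsd_net_free_done):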

static void nfsd_shutdown_net(struct net *net)
{
        struct nfsd_net *nn = net_generic(net, nfsd_net_id);

        if (!nn->nfsd_net_up)
                return;

        percpu_ref_kill_and_confirm(&nn->nfsd_net_ref, nfsd_net_done);
        wait_for_completion(&nn->nfsd_net_confirm_done);

        nfsd_export_flush(net);
        nfs4_state_shutdown_net(net);
        nfsd_reply_cache_shutdown(nn);
        nfsd_file_cache_shutdown_net(net);
        if (nn->lockd_up) {
                lockd_down(net);
                nn->lockd_up = false;
        }

        wait_for_completion(&nn->nfsd_net_free_done);
   ---> percpu_ref_exit(&nn->nfsd_net_ref);

        nn->nfsd_net_up = false;
        nfsd_shutdown_generic();
}
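
To make the hang above concrete, here is a minimal userspace sketch of the
reference lifecycle that nfsd_shutdown_net() waits on.  It is not kernel
code: struct model_net and the model_*() helpers are hypothetical names, and
it assumes that the release callback of nn->nfsd_net_ref is what completes
nfsd_net_free_done once the last reference is dropped.  Under that
assumption, any reference still held when shutdown runs (for example by a
LOCALIO client) leaves the second wait_for_completion() blocked:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the fields of struct nfsd_net used above. */
struct model_net {
	long refs;       /* models nn->nfsd_net_ref; the initial reference counts as 1 */
	bool killed;     /* set once shutdown has killed the ref */
	bool free_done;  /* models completion of nn->nfsd_net_free_done */
};

static void model_init(struct model_net *nn)
{
	nn->refs = 1;
	nn->killed = false;
	nn->free_done = false;
}

/* Take a reference; refused once the ref has been killed. */
static bool model_tryget(struct model_net *nn)
{
	if (nn->killed)
		return false;
	nn->refs++;
	return true;
}

/* Drop a reference; the last put plays the role of the release callback. */
static void model_put(struct model_net *nn)
{
	assert(nn->refs > 0);
	if (--nn->refs == 0)
		nn->free_done = true;
}

/* Models the kill step: refuse new users, then drop the initial reference. */
static void model_kill(struct model_net *nn)
{
	nn->killed = true;
	model_put(nn);
}

/* True if the second wait_for_completion() in shutdown could return now. */
static bool model_shutdown_can_finish(const struct model_net *nn)
{
	return nn->killed && nn->free_done;
}
```

In this model, calling model_tryget() before model_kill() and never calling
the matching model_put() leaves free_done false forever, which corresponds
to the D-state rpc.nfsd task in the trace.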
