* [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
@ 2024-10-30 14:48 Jeff Layton
From: Jeff Layton @ 2024-10-30 14:48 UTC (permalink / raw)
To: Chuck Lever, Neil Brown, Dai Ngo, Tom Talpey
Cc: Olga Kornievskaia, linux-nfs, linux-kernel, Jeff Layton
A few more minor updates to the set to fix some small-ish bugs, and do a
bit of cleanup. This seems to test OK for me so far.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v3:
- add patch to convert se_flags to single se_dead bool
- fix off-by-one bug in handling of NFSD_BC_SLOT_TABLE_MAX
- don't reject target highest slot value of 0
- Link to v2: https://lore.kernel.org/r/20241029-bcwide-v2-1-e9010b6ef55d@kernel.org
Changes in v2:
- take cl_lock when fetching fields from session to be encoded
- use fls() instead of bespoke highest_unset_index()
- rename variables in several functions with more descriptive names
- clamp limit of for loop in update_cb_slot_table()
- re-add missing rpc_wake_up_queued_task() call
- fix slotid check in decode_cb_sequence4resok()
- add new per-session spinlock
---
Jeff Layton (2):
nfsd: make nfsd4_session->se_flags a bool
nfsd: allow for up to 32 callback session slots
 fs/nfsd/nfs4callback.c | 108 ++++++++++++++++++++++++++++++++++---------------
 fs/nfsd/nfs4state.c    |  17 +++++---
 fs/nfsd/state.h        |  19 ++++-----
 fs/nfsd/trace.h        |   2 +-
 4 files changed, 98 insertions(+), 48 deletions(-)
---
base-commit: 06c049d2a81a81f01ff072c6519d0c38b646b550
change-id: 20241025-bcwide-6bd7e4b63db2
Best regards,
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/2] nfsd: make nfsd4_session->se_flags a bool
From: Jeff Layton @ 2024-10-30 14:48 UTC (permalink / raw)
To: Chuck Lever, Neil Brown, Dai Ngo, Tom Talpey
Cc: Olga Kornievskaia, linux-nfs, linux-kernel, Jeff Layton

While this holds the flags from the CREATE_SESSION request, nothing ever
consults them. The only flag used is NFS4_SESSION_DEAD. Make it a simple
bool instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4state.c | 6 +++---
 fs/nfsd/state.h     | 4 +---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5b718b349396f1aecd0ad4c63b2f43342841bbd4..baf7994131fe1b0a4715174ba943fd2a9882aa12 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -149,14 +149,14 @@ void nfsd4_destroy_laundry_wq(void)
 
 static bool is_session_dead(struct nfsd4_session *ses)
 {
-	return ses->se_flags & NFS4_SESSION_DEAD;
+	return ses->se_dead;
 }
 
 static __be32 mark_session_dead_locked(struct nfsd4_session *ses, int ref_held_by_me)
 {
 	if (atomic_read(&ses->se_ref) > ref_held_by_me)
 		return nfserr_jukebox;
-	ses->se_flags |= NFS4_SESSION_DEAD;
+	ses->se_dead = true;
 	return nfs_ok;
 }
 
@@ -2133,7 +2133,7 @@ static void init_session(struct svc_rqst *rqstp, struct nfsd4_session *new, stru
 	INIT_LIST_HEAD(&new->se_conns);
 
 	new->se_cb_seq_nr = 1;
-	new->se_flags = cses->flags;
+	new->se_dead = false;
 	new->se_cb_prog = cses->callback_prog;
 	new->se_cb_sec = cses->cb_sec;
 	atomic_set(&new->se_ref, 0);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 41cda86fea1f6166a0fd0215d3d458c93ced3e6a..d22e4f2c9039324a0953a9e15a3c255fb8ee1a44 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -314,11 +314,9 @@ struct nfsd4_conn {
  */
 struct nfsd4_session {
 	atomic_t		se_ref;
+	bool			se_dead;
 	struct list_head	se_hash;	/* hash by sessionid */
 	struct list_head	se_perclnt;
-/* See SESSION4_PERSIST, etc. for standard flags; this is internal-only: */
-#define NFS4_SESSION_DEAD	0x010
-	u32			se_flags;
 	struct nfs4_client	*se_client;
 	struct nfs4_sessionid	se_sessionid;
 	struct nfsd4_channel_attrs se_fchannel;

-- 
2.47.0
* [PATCH v3 2/2] nfsd: allow for up to 32 callback session slots
From: Jeff Layton @ 2024-10-30 14:48 UTC (permalink / raw)
To: Chuck Lever, Neil Brown, Dai Ngo, Tom Talpey
Cc: Olga Kornievskaia, linux-nfs, linux-kernel, Jeff Layton

nfsd currently only uses a single slot in the callback channel, which is
proving to be a bottleneck in some cases. Widen the callback channel to
a max of 32 slots (subject to the client's target_maxreqs value).

Change the cb_holds_slot boolean to an integer that tracks the current
slot number (with -1 meaning "unassigned"). Move the callback slot
tracking info into the session. Add a new u32 that acts as a bitmap to
track which slots are in use, and a u32 to track the latest callback
target_slotid that the client reports. To protect the new fields, add a
new per-session spinlock (the se_lock). Fix nfsd41_cb_get_slot to always
search for the lowest slotid (using ffs()), and change it to retry until
there is a slot available or the rpc_task is signalled.

Finally, convert the session->se_cb_seq_nr field into an array of
counters and add the necessary handling to ensure that the seqids get
reset at the appropriate times.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4callback.c | 108 ++++++++++++++++++++++++++++++++++---------------
 fs/nfsd/nfs4state.c    |  11 +++--
 fs/nfsd/state.h        |  15 ++++---
 fs/nfsd/trace.h        |   2 +-
 4 files changed, 94 insertions(+), 42 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index e38fa834b3d91333acf1425eb14c644e5d5f2601..04b52ae182fc55659662232712a2439fb9a3b95a 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -406,6 +406,19 @@ encode_cb_getattr4args(struct xdr_stream *xdr, struct nfs4_cb_compound_hdr *hdr,
 	hdr->nops++;
 }
 
+static u32 highest_slotid(struct nfsd4_session *ses)
+{
+	u32 idx;
+
+	spin_lock(&ses->se_lock);
+	idx = fls(~ses->se_cb_slot_avail);
+	if (idx > 0)
+		--idx;
+	idx = max(idx, ses->se_cb_highest_slot);
+	spin_unlock(&ses->se_lock);
+	return idx;
+}
+
 /*
  * CB_SEQUENCE4args
  *
@@ -432,15 +445,35 @@ static void encode_cb_sequence4args(struct xdr_stream *xdr,
 	encode_sessionid4(xdr, session);
 
 	p = xdr_reserve_space(xdr, 4 + 4 + 4 + 4 + 4);
-	*p++ = cpu_to_be32(session->se_cb_seq_nr);	/* csa_sequenceid */
-	*p++ = xdr_zero;			/* csa_slotid */
-	*p++ = xdr_zero;			/* csa_highest_slotid */
+	*p++ = cpu_to_be32(session->se_cb_seq_nr[cb->cb_held_slot]);	/* csa_sequenceid */
+	*p++ = cpu_to_be32(cb->cb_held_slot);	/* csa_slotid */
+	*p++ = cpu_to_be32(highest_slotid(session)); /* csa_highest_slotid */
 	*p++ = xdr_zero;			/* csa_cachethis */
 	xdr_encode_empty_array(p);		/* csa_referring_call_lists */
 
 	hdr->nops++;
 }
 
+static void update_cb_slot_table(struct nfsd4_session *ses, u32 target)
+{
+	/* No need to do anything if nothing changed */
+	if (likely(target == READ_ONCE(ses->se_cb_highest_slot)))
+		return;
+
+	spin_lock(&ses->se_lock);
+	if (target > ses->se_cb_highest_slot) {
+		int i;
+
+		target = min(target, NFSD_BC_SLOT_TABLE_MAX);
+
+		/* Growing the slot table. Reset any new sequences to 1 */
+		for (i = ses->se_cb_highest_slot + 1; i <= target; ++i)
+			ses->se_cb_seq_nr[i] = 1;
+	}
+	ses->se_cb_highest_slot = target;
+	spin_unlock(&ses->se_lock);
+}
+
 /*
  * CB_SEQUENCE4resok
  *
@@ -468,7 +501,7 @@ static int decode_cb_sequence4resok(struct xdr_stream *xdr,
 	struct nfsd4_session *session = cb->cb_clp->cl_cb_session;
 	int status = -ESERVERFAULT;
 	__be32 *p;
-	u32 dummy;
+	u32 seqid, slotid, target;
 
 	/*
 	 * If the server returns different values for sessionID, slotID or
@@ -484,21 +517,22 @@ static int decode_cb_sequence4resok(struct xdr_stream *xdr,
 	}
 	p += XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN);
 
-	dummy = be32_to_cpup(p++);
-	if (dummy != session->se_cb_seq_nr) {
+	seqid = be32_to_cpup(p++);
+	if (seqid != session->se_cb_seq_nr[cb->cb_held_slot]) {
 		dprintk("NFS: %s Invalid sequence number\n", __func__);
 		goto out;
 	}
-	dummy = be32_to_cpup(p++);
-	if (dummy != 0) {
+	slotid = be32_to_cpup(p++);
+	if (slotid != cb->cb_held_slot) {
 		dprintk("NFS: %s Invalid slotid\n", __func__);
 		goto out;
 	}
-	/*
-	 * FIXME: process highest slotid and target highest slotid
-	 */
+
+	p++; // ignore current highest slot value
+
+	target = be32_to_cpup(p++);
+	update_cb_slot_table(session, target);
 	status = 0;
 out:
 	cb->cb_seq_status = status;
@@ -1211,28 +1245,39 @@ void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *conn)
 static bool nfsd41_cb_get_slot(struct nfsd4_callback *cb, struct rpc_task *task)
 {
 	struct nfs4_client *clp = cb->cb_clp;
+	struct nfsd4_session *ses = clp->cl_cb_session;
+	int idx;
 
-	if (!cb->cb_holds_slot &&
-	    test_and_set_bit(0, &clp->cl_cb_slot_busy) != 0) {
-		rpc_sleep_on(&clp->cl_cb_waitq, task, NULL);
-		/* Race breaker */
-		if (test_and_set_bit(0, &clp->cl_cb_slot_busy) != 0) {
-			dprintk("%s slot is busy\n", __func__);
+	if (cb->cb_held_slot >= 0)
+		return true;
+retry:
+	spin_lock(&ses->se_lock);
+	idx = ffs(ses->se_cb_slot_avail) - 1;
+	if (idx < 0 || idx > ses->se_cb_highest_slot) {
+		spin_unlock(&ses->se_lock);
+		if (RPC_SIGNALLED(task))
 			return false;
-		}
+		rpc_sleep_on(&clp->cl_cb_waitq, task, NULL);
 		rpc_wake_up_queued_task(&clp->cl_cb_waitq, task);
+		goto retry;
 	}
-	cb->cb_holds_slot = true;
+	/* clear the bit for the slot */
+	ses->se_cb_slot_avail &= ~BIT(idx);
+	spin_unlock(&ses->se_lock);
+	cb->cb_held_slot = idx;
 	return true;
 }
 
 static void nfsd41_cb_release_slot(struct nfsd4_callback *cb)
 {
 	struct nfs4_client *clp = cb->cb_clp;
+	struct nfsd4_session *ses = clp->cl_cb_session;
 
-	if (cb->cb_holds_slot) {
-		cb->cb_holds_slot = false;
-		clear_bit(0, &clp->cl_cb_slot_busy);
+	if (cb->cb_held_slot >= 0) {
+		spin_lock(&ses->se_lock);
+		ses->se_cb_slot_avail |= BIT(cb->cb_held_slot);
+		spin_unlock(&ses->se_lock);
+		cb->cb_held_slot = -1;
 		rpc_wake_up_next(&clp->cl_cb_waitq);
 	}
 }
@@ -1249,8 +1294,8 @@ static void nfsd41_destroy_cb(struct nfsd4_callback *cb)
 }
 
 /*
- * TODO: cb_sequence should support referring call lists, cachethis, multiple
- * slots, and mark callback channel down on communication errors.
+ * TODO: cb_sequence should support referring call lists, cachethis,
+ * and mark callback channel down on communication errors.
  */
 static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata)
 {
@@ -1292,7 +1337,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		return true;
 	}
 
-	if (!cb->cb_holds_slot)
+	if (cb->cb_held_slot < 0)
 		goto need_restart;
 
 	/* This is the operation status code for CB_SEQUENCE */
@@ -1306,10 +1351,10 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		 * If CB_SEQUENCE returns an error, then the state of the slot
 		 * (sequence ID, cached reply) MUST NOT change.
 		 */
-		++session->se_cb_seq_nr;
+		++session->se_cb_seq_nr[cb->cb_held_slot];
 		break;
 	case -ESERVERFAULT:
-		++session->se_cb_seq_nr;
+		++session->se_cb_seq_nr[cb->cb_held_slot];
 		nfsd4_mark_cb_fault(cb->cb_clp);
 		ret = false;
 		break;
@@ -1335,17 +1380,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 	case -NFS4ERR_BADSLOT:
 		goto retry_nowait;
 	case -NFS4ERR_SEQ_MISORDERED:
-		if (session->se_cb_seq_nr != 1) {
-			session->se_cb_seq_nr = 1;
+		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
+			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
 			goto retry_nowait;
 		}
 		break;
 	default:
 		nfsd4_mark_cb_fault(cb->cb_clp);
 	}
-	nfsd41_cb_release_slot(cb);
-
 	trace_nfsd_cb_free_slot(task, cb);
+	nfsd41_cb_release_slot(cb);
 
 	if (RPC_SIGNALLED(task))
 		goto need_restart;
@@ -1565,7 +1609,7 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp,
 	INIT_WORK(&cb->cb_work, nfsd4_run_cb_work);
 	cb->cb_status = 0;
 	cb->cb_need_restart = false;
-	cb->cb_holds_slot = false;
+	cb->cb_held_slot = -1;
 }
 
 /**
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index baf7994131fe1b0a4715174ba943fd2a9882aa12..75557e7cc9265517f51952563beaa4cfe8adcc3f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2002,6 +2002,9 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs,
 	}
 
 	memcpy(&new->se_fchannel, fattrs, sizeof(struct nfsd4_channel_attrs));
+	new->se_cb_slot_avail = ~0U;
+	new->se_cb_highest_slot = battrs->maxreqs - 1;
+	spin_lock_init(&new->se_lock);
 	return new;
 out_free:
 	while (i--)
@@ -2132,11 +2135,14 @@ static void init_session(struct svc_rqst *rqstp, struct nfsd4_session *new, stru
 
 	INIT_LIST_HEAD(&new->se_conns);
 
-	new->se_cb_seq_nr = 1;
+	atomic_set(&new->se_ref, 0);
 	new->se_dead = false;
 	new->se_cb_prog = cses->callback_prog;
 	new->se_cb_sec = cses->cb_sec;
-	atomic_set(&new->se_ref, 0);
+
+	for (idx = 0; idx < NFSD_BC_SLOT_TABLE_MAX; ++idx)
+		new->se_cb_seq_nr[idx] = 1;
+
 	idx = hash_sessionid(&new->se_sessionid);
 	list_add(&new->se_hash, &nn->sessionid_hashtbl[idx]);
 	spin_lock(&clp->cl_lock);
@@ -3159,7 +3165,6 @@ static struct nfs4_client *create_client(struct xdr_netobj name,
 	kref_init(&clp->cl_nfsdfs.cl_ref);
 	nfsd4_init_cb(&clp->cl_cb_null, clp, NULL, NFSPROC4_CLNT_CB_NULL);
 	clp->cl_time = ktime_get_boottime_seconds();
-	clear_bit(0, &clp->cl_cb_slot_busy);
 	copy_verf(clp, verf);
 	memcpy(&clp->cl_addr, sa, sizeof(struct sockaddr_storage));
 	clp->cl_cb_session = NULL;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index d22e4f2c9039324a0953a9e15a3c255fb8ee1a44..848d023cb308f0b69916c4ee34b09075708f0de3 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -71,8 +71,8 @@ struct nfsd4_callback {
 	struct work_struct cb_work;
 	int cb_seq_status;
 	int cb_status;
+	int cb_held_slot;
 	bool cb_need_restart;
-	bool cb_holds_slot;
 };
 
 struct nfsd4_callback_ops {
@@ -307,6 +307,9 @@ struct nfsd4_conn {
 	unsigned char cn_flags;
 };
 
+/* Highest slot index that nfsd implements in NFSv4.1+ backchannel */
+#define NFSD_BC_SLOT_TABLE_MAX (sizeof(u32) * 8 - 1)
+
 /*
  * Representation of a v4.1+ session. These are refcounted in a similar fashion
  * to the nfs4_client. References are only taken when the server is actively
@@ -314,6 +317,10 @@ struct nfsd4_conn {
  */
 struct nfsd4_session {
 	atomic_t		se_ref;
+	spinlock_t		se_lock;
+	u32			se_cb_slot_avail; /* bitmap of available slots */
+	u32			se_cb_highest_slot; /* highest slot client wants */
+	u32			se_cb_prog;
 	bool			se_dead;
 	struct list_head	se_hash;	/* hash by sessionid */
 	struct list_head	se_perclnt;
@@ -322,8 +329,7 @@ struct nfsd4_session {
 	struct nfsd4_channel_attrs se_fchannel;
 	struct nfsd4_cb_sec	se_cb_sec;
 	struct list_head	se_conns;
-	u32			se_cb_prog;
-	u32			se_cb_seq_nr;
+	u32			se_cb_seq_nr[NFSD_BC_SLOT_TABLE_MAX + 1];
 	struct nfsd4_slot	*se_slots[];	/* forward channel slots */
 };
 
@@ -457,9 +463,6 @@ struct nfs4_client {
 	 */
 	struct dentry		*cl_nfsd_info_dentry;
 
-	/* for nfs41 callbacks */
-	/* We currently support a single back channel with a single slot */
-	unsigned long		cl_cb_slot_busy;
 	struct rpc_wait_queue	cl_cb_waitq;	/* backchannel callers may */
 						/* wait here for slots */
 	struct net		*net;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index f318898cfc31614b5a84a4867e18c2b3a07122c9..a9c17186b6892f1df8d7f7b90e250c2913ab23fe 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1697,7 +1697,7 @@ TRACE_EVENT(nfsd_cb_free_slot,
 		__entry->cl_id = sid->clientid.cl_id;
 		__entry->seqno = sid->sequence;
 		__entry->reserved = sid->reserved;
-		__entry->slot_seqno = session->se_cb_seq_nr;
+		__entry->slot_seqno = session->se_cb_seq_nr[cb->cb_held_slot];
 	),
 	TP_printk(SUNRPC_TRACE_TASK_SPECIFIER
 		" sessionid=%08x:%08x:%08x:%08x new slot seqno=%u",

-- 
2.47.0
* Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
From: cel @ 2024-10-30 20:30 UTC (permalink / raw)
To: Neil Brown, Dai Ngo, Tom Talpey, Jeff Layton
Cc: Chuck Lever, Olga Kornievskaia, linux-nfs, linux-kernel

From: Chuck Lever <chuck.lever@oracle.com>

On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> A few more minor updates to the set to fix some small-ish bugs, and do a
> bit of cleanup. This seems to test OK for me so far.
>

Applied to nfsd-next for v6.13, thanks! Still open for comments and
test results.

[1/2] nfsd: make nfsd4_session->se_flags a bool
      commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
[2/2] nfsd: allow for up to 32 callback session slots
      commit: 6c8910ac1cd360ea01136d707158690b5159a1d0

--
Chuck Lever
* Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
From: Olga Kornievskaia @ 2024-11-05 22:08 UTC (permalink / raw)
To: cel, Jeff Layton
Cc: Neil Brown, Dai Ngo, Tom Talpey, Chuck Lever, Olga Kornievskaia,
	linux-nfs, linux-kernel

Hi Jeff/Chuck,

Hitting the following softlockup when running with the nfsd-next code.
The testing is the same: open a bunch of files, get delegations, then do
a local conflicting operation. A network trace shows a few cb_recalls
occurring successfully before the soft lockup (I can confirm that more
than one slot was used, but I also see that the server isn't trying to
use the lowest available slot and instead just bumps the number and uses
the next one; by that I mean, say slot 0 was used and a reply came back,
but the next callback would use slot 1 instead of slot 0).

[  344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/u24:28:205]
[  344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg fuse
[  344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump: loaded Not tainted 6.12.0-rc4+ #42
[  344.050821] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023
[  344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc]
[  344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  344.051821] pc : kasan_check_range+0x0/0x188
[  344.052011] lr : __kasan_check_write+0x1c/0x28
[  344.052208] sp : ffff800087027920
[  344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170
[  344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0
[  344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a
[  344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000
[  344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d
[  344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f
[  344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000
[  344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001
[  344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc
[  344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172
[  344.056479] Call trace:
[  344.056636]  kasan_check_range+0x0/0x188
[  344.056886]  queued_spin_lock_slowpath+0x5f4/0xaa0
[  344.057192]  _raw_spin_lock+0x180/0x1a8
[  344.057436]  rpc_sleep_on+0x78/0xe8 [sunrpc]
[  344.057700]  nfsd4_cb_prepare+0x15c/0x468 [nfsd]
[  344.057935]  rpc_prepare_task+0x70/0xa0 [sunrpc]
[  344.058165]  __rpc_execute+0x1e8/0xa48 [sunrpc]
[  344.058388]  rpc_async_schedule+0x90/0x100 [sunrpc]
[  344.058623]  process_one_work+0x598/0x1100
[  344.058818]  worker_thread+0x6c0/0xa58
[  344.058992]  kthread+0x288/0x310
[  344.059145]  ret_from_fork+0x10/0x20
[  344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/u24:27:204]
[  344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64 drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg fuse
[  344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump: loaded Tainted: G             L     6.12.0-rc4+ #42
[  344.080290] Tainted: [L]=SOFTLOCKUP
[  344.080495] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023
[  344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc]
[  344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  344.081630] pc : _raw_spin_lock+0x108/0x1a8
[  344.081815] lr : _raw_spin_lock+0xf4/0x1a8
[  344.081998] sp : ffff800087017a30
[  344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081
[  344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40
[  344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50
[  344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000
[  344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10
[  344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b
[  344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000
[  344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001
[  344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154
[  344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
[  344.085915] Call trace:
[  344.086028]  _raw_spin_lock+0x108/0x1a8
[  344.086210]  rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc]
[  344.086465]  nfsd4_cb_prepare+0x168/0x468 [nfsd]
[  344.086694]  rpc_prepare_task+0x70/0xa0 [sunrpc]
[  344.086922]  __rpc_execute+0x1e8/0xa48 [sunrpc]
[  344.087148]  rpc_async_schedule+0x90/0x100 [sunrpc]
[  344.087389]  process_one_work+0x598/0x1100
[  344.087584]  worker_thread+0x6c0/0xa58
[  344.087758]  kthread+0x288/0x310
[  344.087909]  ret_from_fork+0x10/0x20

On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> > A few more minor updates to the set to fix some small-ish bugs, and do a
> > bit of cleanup. This seems to test OK for me so far.
> >
>
> Applied to nfsd-next for v6.13, thanks! Still open for comments and
> test results.
>
> [1/2] nfsd: make nfsd4_session->se_flags a bool
>       commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
> [2/2] nfsd: allow for up to 32 callback session slots
>       commit: 6c8910ac1cd360ea01136d707158690b5159a1d0
>
> --
> Chuck Lever
* Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
From: Jeff Layton @ 2024-11-05 22:27 UTC (permalink / raw)
To: Olga Kornievskaia, cel
Cc: Neil Brown, Dai Ngo, Tom Talpey, Chuck Lever, Olga Kornievskaia,
	linux-nfs, linux-kernel

On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote:
> Hi Jeff/Chuck,
>
> Hitting the following softlockup when running with the nfsd-next code.
> The testing is the same: open a bunch of files, get delegations, then do
> a local conflicting operation. A network trace shows a few cb_recalls
> occurring successfully before the soft lockup (I can confirm that more
> than one slot was used, but I also see that the server isn't trying to
> use the lowest available slot and instead just bumps the number and uses
> the next one; by that I mean, say slot 0 was used and a reply came back,
> but the next callback would use slot 1 instead of slot 0).
>

If the slots are being consumed and not released then that's what you'd
see. The question is why those slots aren't being released.

Did the client return a SEQUENCE error on some of those callbacks? It
looks like the slot doesn't always get released if that occurs, so that
might be one possibility.

> [  344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/u24:28:205]

[...]

--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
  2024-11-05 22:27     ` Jeff Layton
@ 2024-11-05 22:40       ` Olga Kornievskaia
  2024-11-05 22:55         ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: Olga Kornievskaia @ 2024-11-05 22:40 UTC (permalink / raw)
  To: Jeff Layton
  Cc: cel, Neil Brown, Dai Ngo, Tom Talpey, Chuck Lever,
	Olga Kornievskaia, linux-nfs, linux-kernel

On Tue, Nov 5, 2024 at 5:27 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote:
> > Hi Jeff/Chuck,
> >
> > I'm hitting the following soft lockup when running the nfsd-next
> > code. The test is the same as before: open a bunch of files, get
> > delegations, then do a conflicting local operation. A network trace
> > shows a few CB_RECALLs completing successfully before the soft
> > lockup (I can confirm that more than one slot was used). But I also
> > see that the server isn't trying to use the lowest available slot;
> > instead it just bumps the number and uses the next one. By that I
> > mean, say slot 0 was used and a reply came back, but the next
> > callback would use slot 1 instead of slot 0.
> >
>
> If the slots are being consumed and not released, then that's what
> you'd see. The question is why those slots aren't being released.
>
> Did the client return a SEQUENCE error on some of those callbacks? It
> looks like the slot doesn't always get released if that occurs, so
> that might be one possibility.

No sequence errors. The CB_SEQUENCE and CB_RECALL replies are all
successful.

> > [ 344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s!
> > [kworker/u24:28:205]
> > [ 344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
> > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
> > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
> > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
> > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
> > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
> > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
> > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
> > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
> > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
> > fuse
> > [ 344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump:
> > loaded Not tainted 6.12.0-rc4+ #42
> > [ 344.050821] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
> > VMW201.00V.21805430.BA64.2305221830 05/22/2023
> > [ 344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > [ 344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > [ 344.051821] pc : kasan_check_range+0x0/0x188
> > [ 344.052011] lr : __kasan_check_write+0x1c/0x28
> > [ 344.052208] sp : ffff800087027920
> > [ 344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170
> > [ 344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0
> > [ 344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a
> > [ 344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000
> > [ 344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d
> > [ 344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f
> > [ 344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000
> > [ 344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001
> > [ 344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc
> > [ 344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172
> > [ 344.056479] Call trace:
> > [ 344.056636]  kasan_check_range+0x0/0x188
> > [ 344.056886]  queued_spin_lock_slowpath+0x5f4/0xaa0
> > [ 344.057192]  _raw_spin_lock+0x180/0x1a8
> > [ 344.057436]  rpc_sleep_on+0x78/0xe8 [sunrpc]
> > [ 344.057700]  nfsd4_cb_prepare+0x15c/0x468 [nfsd]
> > [ 344.057935]  rpc_prepare_task+0x70/0xa0 [sunrpc]
> > [ 344.058165]  __rpc_execute+0x1e8/0xa48 [sunrpc]
> > [ 344.058388]  rpc_async_schedule+0x90/0x100 [sunrpc]
> > [ 344.058623]  process_one_work+0x598/0x1100
> > [ 344.058818]  worker_thread+0x6c0/0xa58
> > [ 344.058992]  kthread+0x288/0x310
> > [ 344.059145]  ret_from_fork+0x10/0x20
> > [ 344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s!
> > [kworker/u24:27:204]
> > [ 344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
> > nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
> > snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
> > rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
> > snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
> > videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
> > snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
> > snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
> > drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
> > drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
> > fuse
> > [ 344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump:
> > loaded Tainted: G             L     6.12.0-rc4+ #42
> > [ 344.080290] Tainted: [L]=SOFTLOCKUP
> > [ 344.080495] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
> > VMW201.00V.21805430.BA64.2305221830 05/22/2023
> > [ 344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc]
> > [ 344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > [ 344.081630] pc : _raw_spin_lock+0x108/0x1a8
> > [ 344.081815] lr : _raw_spin_lock+0xf4/0x1a8
> > [ 344.081998] sp : ffff800087017a30
> > [ 344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081
> > [ 344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40
> > [ 344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50
> > [ 344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000
> > [ 344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10
> > [ 344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b
> > [ 344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000
> > [ 344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001
> > [ 344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154
> > [ 344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
> > [ 344.085915] Call trace:
> > [ 344.086028]  _raw_spin_lock+0x108/0x1a8
> > [ 344.086210]  rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc]
> > [ 344.086465]  nfsd4_cb_prepare+0x168/0x468 [nfsd]
> > [ 344.086694]  rpc_prepare_task+0x70/0xa0 [sunrpc]
> > [ 344.086922]  __rpc_execute+0x1e8/0xa48 [sunrpc]
> > [ 344.087148]  rpc_async_schedule+0x90/0x100 [sunrpc]
> > [ 344.087389]  process_one_work+0x598/0x1100
> > [ 344.087584]  worker_thread+0x6c0/0xa58
> > [ 344.087758]  kthread+0x288/0x310
> > [ 344.087909]  ret_from_fork+0x10/0x20
> >
> > On Wed, Oct 30, 2024 at 4:30 PM <cel@kernel.org> wrote:
> > >
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > >
> > > On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> > > > A few more minor updates to the set to fix some small-ish bugs, and do a
> > > > bit of cleanup. This seems to test OK for me so far.
> > > >
> > >
> > > Applied to nfsd-next for v6.13, thanks! Still open for comments and
> > > test results.
> > >
> > > [1/2] nfsd: make nfsd4_session->se_flags a bool
> > >       commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
> > > [2/2] nfsd: allow for up to 32 callback session slots
> > >       commit: 6c8910ac1cd360ea01136d707158690b5159a1d0
> > >
> > > --
> > > Chuck Lever
> > >
>
> --
> Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
  2024-11-05 22:40       ` Olga Kornievskaia
@ 2024-11-05 22:55         ` Jeff Layton
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Layton @ 2024-11-05 22:55 UTC (permalink / raw)
  To: Olga Kornievskaia
  Cc: cel, Neil Brown, Dai Ngo, Tom Talpey, Chuck Lever,
	Olga Kornievskaia, linux-nfs, linux-kernel

On Tue, 2024-11-05 at 17:40 -0500, Olga Kornievskaia wrote:
> On Tue, Nov 5, 2024 at 5:27 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Tue, 2024-11-05 at 17:08 -0500, Olga Kornievskaia wrote:
> > > Hi Jeff/Chuck,
> > >
> > > I'm hitting the following soft lockup when running the nfsd-next
> > > code. The test is the same as before: open a bunch of files, get
> > > delegations, then do a conflicting local operation. A network
> > > trace shows a few CB_RECALLs completing successfully before the
> > > soft lockup (I can confirm that more than one slot was used). But
> > > I also see that the server isn't trying to use the lowest
> > > available slot; instead it just bumps the number and uses the next
> > > one. By that I mean, say slot 0 was used and a reply came back,
> > > but the next callback would use slot 1 instead of slot 0.
> > >
> >
> > If the slots are being consumed and not released, then that's what
> > you'd see. The question is why those slots aren't being released.
> >
> > Did the client return a SEQUENCE error on some of those callbacks?
> > It looks like the slot doesn't always get released if that occurs,
> > so that might be one possibility.
>
> No sequence errors. The CB_SEQUENCE and CB_RECALL replies are all
> successful.
>

Nevermind. I think I see the problem. I think I've got the
rpc_sleep_on() handling all wrong here. I'll have to respin this patch.

Chuck, mind dropping this one for now?

Thanks,
Jeff

> > > [ same soft-lockup traces and quoted cover letter as in the
> > >   previous message -- snipped ]

--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
  2024-10-30 14:48 [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots Jeff Layton
                   ` (2 preceding siblings ...)
  2024-10-30 20:30 ` [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots cel
@ 2024-11-06 15:16 ` Sebastian Feld
  2024-11-06 15:37   ` Chuck Lever III
  3 siblings, 1 reply; 10+ messages in thread
From: Sebastian Feld @ 2024-11-06 15:16 UTC (permalink / raw)
  To: linux-nfs, linux-kernel

On Wed, Oct 30, 2024 at 3:50 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> A few more minor updates to the set to fix some small-ish bugs, and do a
> bit of cleanup. This seems to test OK for me so far.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> Changes in v3:
> - add patch to convert se_flags to single se_dead bool
> - fix off-by-one bug in handling of NFSD_BC_SLOT_TABLE_MAX
> - don't reject target highest slot value of 0
> - Link to v2: https://lore.kernel.org/r/20241029-bcwide-v2-1-e9010b6ef55d@kernel.org
>
> Changes in v2:
> - take cl_lock when fetching fields from session to be encoded
> - use fls() instead of bespoke highest_unset_index()
> - rename variables in several functions with more descriptive names
> - clamp limit of for loop in update_cb_slot_table()
> - re-add missing rpc_wake_up_queued_task() call
> - fix slotid check in decode_cb_sequence4resok()
> - add new per-session spinlock
>

What does an NFSv4.1 client need to do to be compatible with this change?

Sebi
--
Sebastian Feld - IT security expert
* Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
  2024-11-06 15:16 ` Sebastian Feld
@ 2024-11-06 15:37   ` Chuck Lever III
  0 siblings, 0 replies; 10+ messages in thread
From: Chuck Lever III @ 2024-11-06 15:37 UTC (permalink / raw)
  To: Sebastian Feld; +Cc: Linux NFS Mailing List, linux-kernel@vger.kernel.org

> On Nov 6, 2024, at 10:16 AM, Sebastian Feld <sebastian.n.feld@gmail.com> wrote:
>
> On Wed, Oct 30, 2024 at 3:50 PM Jeff Layton <jlayton@kernel.org> wrote:
>>
>> A few more minor updates to the set to fix some small-ish bugs, and do a
>> bit of cleanup. This seems to test OK for me so far.
>>
>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> ---
>> Changes in v3:
>> - add patch to convert se_flags to single se_dead bool
>> - fix off-by-one bug in handling of NFSD_BC_SLOT_TABLE_MAX
>> - don't reject target highest slot value of 0
>> - Link to v2: https://lore.kernel.org/r/20241029-bcwide-v2-1-e9010b6ef55d@kernel.org
>>
>> Changes in v2:
>> - take cl_lock when fetching fields from session to be encoded
>> - use fls() instead of bespoke highest_unset_index()
>> - rename variables in several functions with more descriptive names
>> - clamp limit of for loop in update_cb_slot_table()
>> - re-add missing rpc_wake_up_queued_task() call
>> - fix slotid check in decode_cb_sequence4resok()
>> - add new per-session spinlock
>>
>
> What does an NFSv4.1 client need to do to be compatible with this change?

Hi Sebastian -

NFSD will continue to use 1 slot if that's all the client can handle.
This is negotiated by the CREATE_SESSION operation. This is part of
the NFSv4.1 protocol as specified in RFC 8881. If the client complies
with that spec, the only thing that stands in the way of compatibility
is implementation bugs.

--
Chuck Lever
end of thread, other threads: [~2024-11-06 15:37 UTC | newest]

Thread overview: 10+ messages
2024-10-30 14:48 [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots Jeff Layton
2024-10-30 14:48 ` [PATCH v3 1/2] nfsd: make nfsd4_session->se_flags a bool Jeff Layton
2024-10-30 14:48 ` [PATCH v3 2/2] nfsd: allow for up to 32 callback session slots Jeff Layton
2024-10-30 20:30 ` [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots cel
2024-11-05 22:08   ` Olga Kornievskaia
2024-11-05 22:27     ` Jeff Layton
2024-11-05 22:40       ` Olga Kornievskaia
2024-11-05 22:55         ` Jeff Layton
2024-11-06 15:16 ` Sebastian Feld
2024-11-06 15:37   ` Chuck Lever III