* [PATCH AUTOSEL 6.6 04/66] sunrpc: update nextcheck time when adding new cache entries
[not found] <20250601233744.3514795-1-sashal@kernel.org>
@ 2025-06-01 23:36 ` Sasha Levin
2025-06-01 23:37 ` [PATCH AUTOSEL 6.6 31/66] sunrpc: fix race in cache cleanup causing stale nextcheck time Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-06-01 23:36 UTC (permalink / raw)
To: patches, stable
Cc: Long Li, Chuck Lever, Sasha Levin, trondmy, anna, jlayton, davem,
edumazet, kuba, pabeni, linux-nfs, netdev, linux-kernel
From: Long Li <leo.lilong@huawei.com>
[ Upstream commit 5ca00634c8bbb2979c73465588f486b9632f5ed5 ]
The cache_detail structure uses a "nextcheck" field to control hash table
scanning intervals. When a table scan begins, nextcheck is set to current
time plus 1800 seconds. During scanning, if cache_detail is not empty and
a cache entry's expiry time is earlier than the current nextcheck, the
nextcheck is updated to that expiry time.
This mechanism ensures that:
1) Empty cache_details are scanned every 1800 seconds to avoid unnecessary
scans
2) Non-empty cache_details are scanned based on the earliest expiry time
found
However, when adding a new cache entry to an empty cache_detail, the
nextcheck time was not being updated, remaining at 1800 seconds. This
could delay cache cleanup for up to 1800 seconds, potentially blocking
threads(such as nfsd) that are waiting for cache cleanup.
Fix this by updating the nextcheck time whenever a new cache entry is
added.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Based on my analysis of the commit, the code changes, and the kernel
repository context, here is my assessment: **YES** This commit should be
backported to stable kernel trees. Here's my extensive analysis: ##
Analysis of the Problem The commit addresses a specific cache management
issue in the SUNRPC subsystem that can cause significant delays (up to
1800 seconds/30 minutes) in cache cleanup operations. The problem occurs
when: 1. A cache_detail starts empty, with nextcheck set to current time
+ 1800 seconds 2. A new cache entry is added to this empty cache_detail
3. The nextcheck time is NOT updated to reflect the new entry's expiry
time 4. This means cache cleanup won't occur until the original
1800-second timeout expires 5. During this delay, threads like nfsd that
depend on cache cleanup can be blocked ## Code Change Analysis The fix
is minimal and surgical: ```c + if (detail->nextcheck >
new->expiry_time) + detail->nextcheck = new->expiry_time + 1; ``` This
2-line addition in `sunrpc_cache_add_entry()` ensures that when a new
cache entry is added, if its expiry time is earlier than the current
nextcheck time, nextcheck is updated to trigger cleanup at the
appropriate time. ## Why This Qualifies for Stable Backporting **1. Bug
Fix Nature**: This clearly fixes a bug where cache cleanup could be
inappropriately delayed, affecting system functionality. **2. Minimal
Risk**: The change is extremely small and contained. It only adds a
simple conditional check that makes the cache cleanup more responsive.
There's no architectural change or complex logic. **3. Critical
Subsystem Impact**: The SUNRPC cache affects NFS operations, which are
critical for many production systems. A 1800-second delay in cache
cleanup can severely impact NFS server responsiveness. **4. Clear User
Impact**: The commit message explicitly mentions that this can block
threads like nfsd, which directly affects users experiencing NFS service
delays or hangs. **5. Historical Context**: Looking at the similar
commits provided as reference, this fix is actually addressing a more
fundamental timing issue than some of the other cache-related changes
that were marked as "NO" for backporting. Those commits dealt with
interface behavior changes or feature additions, while this fixes a
clear performance/delay bug. **6. Low Complexity**: The fix doesn't
introduce new features, change APIs, or modify behavioral contracts. It
simply ensures that cache cleanup happens when it should happen. **7.
Contained Scope**: The change is isolated to a single function
(`sunrpc_cache_add_entry`) and only affects the timing of cache cleanup,
not the cleanup logic itself. ## Comparison with Reference Commits
Unlike the reference commits marked "NO" which involved: - Interface
behavior changes (flush interface semantics) - Feature additions (new
cache operations) - Complex timing logic modifications - Architectural
changes This commit simply fixes a timing bug with a minimal, obvious
correction that ensures cache cleanup occurs at the appropriate time
rather than being unnecessarily delayed. The fix follows the stable tree
rules perfectly: it's an important bugfix with minimal risk that
addresses a clear user-visible issue in a critical kernel subsystem.
net/sunrpc/cache.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index cb6a6bc9fea77..50013f0c9f2eb 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -135,6 +135,8 @@ static struct cache_head *sunrpc_cache_add_entry(struct cache_detail *detail,
hlist_add_head_rcu(&new->cache_list, head);
detail->entries++;
+ if (detail->nextcheck > new->expiry_time)
+ detail->nextcheck = new->expiry_time + 1;
cache_get(new);
spin_unlock(&detail->hash_lock);
--
2.39.5
^ permalink raw reply related [flat|nested] 2+ messages in thread* [PATCH AUTOSEL 6.6 31/66] sunrpc: fix race in cache cleanup causing stale nextcheck time
[not found] <20250601233744.3514795-1-sashal@kernel.org>
2025-06-01 23:36 ` [PATCH AUTOSEL 6.6 04/66] sunrpc: update nextcheck time when adding new cache entries Sasha Levin
@ 2025-06-01 23:37 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-06-01 23:37 UTC (permalink / raw)
To: patches, stable
Cc: Long Li, Chuck Lever, Sasha Levin, jlayton, trondmy, anna, davem,
edumazet, kuba, pabeni, linux-nfs, netdev, linux-kernel
From: Long Li <leo.lilong@huawei.com>
[ Upstream commit 2298abcbe11e9b553d03c0f1d084da786f7eff88 ]
When cache cleanup runs concurrently with cache entry removal, a race
condition can occur that leads to incorrect nextcheck times. This can
delay cache cleanup for the cache_detail by up to 1800 seconds:
1. cache_clean() sets nextcheck to current time plus 1800 seconds
2. While scanning a non-empty bucket, concurrent cache entry removal can
empty that bucket
3. cache_clean() finds no cache entries in the now-empty bucket to update
the nextcheck time
4. This maybe delays the next scan of the cache_detail by up to 1800
seconds even when it should be scanned earlier based on remaining
entries
Fix this by moving the hash_lock acquisition earlier in cache_clean().
This ensures bucket emptiness checks and nextcheck updates happen
atomically, preventing the race between cleanup and entry removal.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Now let me analyze this commit in detail based on what I've learned:
**ANALYSIS:** **YES** - This commit should be backported to stable
kernel trees. ## Detailed Analysis: ### 1. **Bug Impact Assessment**
This commit fixes a **race condition** that can cause cache cleanup
delays of up to **1800 seconds (30 minutes)**. This is a significant
performance regression that affects the SUNRPC cache subsystem's ability
to clean expired entries in a timely manner. ### 2. **Root Cause
Analysis** The race condition occurs in the `cache_clean()` function
between these steps: - **Line 461**: `current_detail->nextcheck =
seconds_since_boot()+30*60;` (sets nextcheck to current time + 1800
seconds) - **Lines 466-469**: The code searches for non-empty buckets
**without holding hash_lock** - **Lines 479-496**: The code then
acquires `hash_lock` and processes entries, updating nextcheck based on
entry expiry times **The race window**: Between lines 466-469 and 479,
concurrent cache operations (like `sunrpc_cache_unhash()`) can remove
entries from buckets, making them empty. When `cache_clean()` later
acquires the hash_lock, it finds empty buckets and cannot update the
nextcheck time appropriately. ### 3. **Code Change Analysis** The fix is
**minimal and surgical**: ```c // BEFORE: hash_lock acquired late (line
479) if (current_detail && current_index < current_detail->hash_size) {
// ... variable declarations ... spin_lock(¤t_detail->hash_lock);
// TOO LATE! // AFTER: hash_lock acquired early (before bucket search)
spin_lock(¤t_detail->hash_lock); while (current_index <
current_detail->hash_size &&
hlist_empty(¤t_detail->hash_table[current_index]))
current_index++; ``` This ensures that: 1. **Atomicity**: Bucket
emptiness checks and nextcheck updates happen atomically 2. **Race
prevention**: No concurrent entry removal can happen during bucket
scanning 3. **Correctness**: nextcheck times are updated based on the
actual state of buckets at scan time ### 4. **Risk Assessment** **Low
Risk**: - **Scope**: Changes only affect the locking order in
`cache_clean()` - **Backwards compatibility**: No API changes - **Side
effects**: Minimal - just holds the lock slightly longer, which is safe
- **Testing**: The change preserves existing logic while fixing the race
### 5. **Stable Tree Criteria Compliance** ✅ **Fixes important bug**:
30-minute cache cleanup delays significantly impact performance ✅
**Small and contained**: Only 12 lines changed in a single function ✅
**No architectural changes**: Just reorders lock acquisition ✅ **Minimal
regression risk**: The change is conservative and preserves existing
behavior ✅ **Clear side effects**: Only improves correctness, no
negative impacts ### 6. **Comparison with Similar Commits** Looking at
the historical examples: - **Similar to Commit #3 (YES)**: Fixes a race
condition with minimal code changes and clear correctness improvement -
**Unlike Commits #1,#2,#4,#5 (NO)**: Those were either reverted,
cosmetic, or changed behavior significantly ### 7. **Critical System
Impact** SUNRPC cache is used by: - **NFS client/server operations** -
**RPC authentication caching** - **Various network filesystem
operations** A 30-minute delay in cache cleanup can cause: - Memory
bloat from stale entries - Performance degradation - Potential service
disruptions **Conclusion**: This is a clear bug fix with minimal risk
that addresses a significant performance issue in a critical kernel
subsystem. It meets all criteria for stable tree backporting.
net/sunrpc/cache.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 50013f0c9f2eb..0e0a2f46a7968 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -453,24 +453,21 @@ static int cache_clean(void)
}
}
+ spin_lock(¤t_detail->hash_lock);
+
/* find a non-empty bucket in the table */
- while (current_detail &&
- current_index < current_detail->hash_size &&
+ while (current_index < current_detail->hash_size &&
hlist_empty(¤t_detail->hash_table[current_index]))
current_index++;
/* find a cleanable entry in the bucket and clean it, or set to next bucket */
-
- if (current_detail && current_index < current_detail->hash_size) {
+ if (current_index < current_detail->hash_size) {
struct cache_head *ch = NULL;
struct cache_detail *d;
struct hlist_head *head;
struct hlist_node *tmp;
- spin_lock(¤t_detail->hash_lock);
-
/* Ok, now to clean this strand */
-
head = ¤t_detail->hash_table[current_index];
hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
if (current_detail->nextcheck > ch->expiry_time)
@@ -491,8 +488,10 @@ static int cache_clean(void)
spin_unlock(&cache_list_lock);
if (ch)
sunrpc_end_cache_remove_entry(ch, d);
- } else
+ } else {
+ spin_unlock(¤t_detail->hash_lock);
spin_unlock(&cache_list_lock);
+ }
return rv;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 2+ messages in thread