From: Aaron Tomlin <atomlin@atomlin.com>
To: steved@redhat.com, tbecker@redhat.com
Cc: yi.zhang@redhat.com, linux-nfs@vger.kernel.org
Subject: [PATCH] nfsrahead: enable event-driven mountinfo monitoring and skip non-NFS devices
Date: Fri, 6 Mar 2026 11:19:29 -0500
Message-ID: <20260306161929.4148128-1-atomlin@atomlin.com>
The nfsrahead utility relies on parsing "/proc/self/mountinfo" to
correlate a device number with a specific NFS mount point. However, due
to the asynchronous nature of system initialisation, the relevant entry
in mountinfo may not be immediately available when the tool is executed.

Currently, the utility employs a naive polling mechanism, retrying the
search five times with a fixed 50ms delay (totalling 250ms). This
approach proves brittle on systems under heavy load or during
particularly slow boot sequences.

To mitigate this race condition and improve robustness, update
get_device_info() to utilise the libmount monitoring API.

The new implementation introduces the following logic:

  1. Initialises a monitor on /proc/self/mountinfo using
     mnt_new_monitor() and mnt_monitor_enable_kernel().
  2. Replaces the fixed polling loop with mnt_monitor_wait(), keeping
     the loop only as a fallback if the monitor cannot be set up.
  3. Increases the maximum wait time to 10 seconds (MNT_NM_TIMEOUT).
  4. Introduces a fast-path rejection mechanism. NFS backing devices are
     allocated from the kernel's unnamed block device pool (major number
     0). While some local multi-device filesystems (such as Btrfs) also
     utilise anonymous device numbers, physical hardware block devices
     (e.g., sda, nvme) always possess specific, non-zero major numbers.
     By instantly exiting with -ENODEV for any device string not
     beginning with "0:", we safely bypass the monitor for physical
     drives, preventing the exhaustion of udev worker threads.
     See set_anon_super() and get_anon_bdev().
  5. Implements strict monotonic deadline tracking within the monitor
     loop to prevent indefinite blocking.

Fixes: 2b62ac4c ("nfsrahead: enable event-driven mountinfo monitoring")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/CAHj4cs8URj2fJ7KyP9ViAm6npVOaMiAErnw2uFyPYEU2wb7G_w@mail.gmail.com/T/#t
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
Hi Steve,
This patch should resolve the udev worker exhaustion issue reported by
Yi. It applies cleanly on top of the current nfs-utils tree, after your
revert [1].
Thank you.
[1]: https://lore.kernel.org/linux-nfs/20260305124221.55407-1-steved@redhat.com/
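
For illustration, the fast-path rejection described in item 4 of the
commit message can be sketched in isolation as below. This is a minimal
sketch, not code from the patch: the helper names is_anon_device() and
check_fast_path() are hypothetical, and it assumes the device number is
always passed as a "major:minor" string, as it is in get_device_info().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): returns true when the
 * "major:minor" string names a device with major number 0, i.e. one
 * allocated from the kernel's anonymous device pool.
 */
static bool is_anon_device(const char *device_number)
{
	assert(device_number != NULL);
	return strncmp(device_number, "0:", 2) == 0;
}

/*
 * Hypothetical wrapper mirroring the early return in the patch:
 * 0 means "may be an NFS mount, keep looking", -ENODEV means
 * "cannot be an NFS mount, give up at once".
 */
static int check_fast_path(const char *device_number)
{
	return is_anon_device(device_number) ? 0 : -ENODEV;
}
```

With these helpers, a string such as "8:0" is rejected immediately,
while "0:52" proceeds to the mountinfo lookup. A major number of 0 does
not prove the device is NFS; it only means the device cannot be ruled
out cheaply.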
tools/nfsrahead/main.c | 55 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
diff --git a/tools/nfsrahead/main.c b/tools/nfsrahead/main.c
index b7b889ff..78cd2581 100644
--- a/tools/nfsrahead/main.c
+++ b/tools/nfsrahead/main.c
@@ -3,6 +3,7 @@
 #include <stdlib.h>
 #include <errno.h>
 #include <unistd.h>
+#include <time.h>
 #include <libmount/libmount.h>
 #include <sys/sysmacros.h>
@@ -17,6 +18,8 @@
 #define CONF_NAME "nfsrahead"
 #define NFS_DEFAULT_READAHEAD 128
+#define MNT_NM_TIMEOUT 10000
+
 /* Device information from the system */
 struct device_info {
 	char *device_number;
@@ -117,7 +120,57 @@ out_free_device_info:
 static int get_device_info(const char *device_number, struct device_info *device_info)
 {
-	int ret = get_mountinfo(device_number, device_info, MOUNTINFO_PATH);
+	int ret;
+	struct libmnt_monitor *mn = NULL;
+	struct timespec start, now;
+	int remaining_ms = MNT_NM_TIMEOUT;
+
+	/*
+	 * Fast-path rejection:
+	 * NFS backing devices always use the anonymous block device major number (0).
+	 * If the device number does not start with "0:", it is a physical block device
+	 * and will never be an NFS mount. Exit immediately to prevent blocking udev.
+	 */
+	if (strncmp(device_number, "0:", 2) != 0)
+		return -ENODEV;
+
+	ret = get_mountinfo(device_number, device_info, MOUNTINFO_PATH);
+	if (ret == 0)
+		return 0;
+
+	mn = mnt_new_monitor();
+	if (!mn)
+		goto fallback;
+
+	if (mnt_monitor_enable_kernel(mn, 1) < 0) {
+		mnt_unref_monitor(mn);
+		goto fallback;
+	}
+
+	clock_gettime(CLOCK_MONOTONIC, &start);
+
+	while (remaining_ms > 0) {
+		int rc = mnt_monitor_wait(mn, remaining_ms);
+		if (rc > 0) {
+			ret = get_mountinfo(device_number, device_info, MOUNTINFO_PATH);
+			if (ret == 0) {
+				mnt_unref_monitor(mn);
+				return 0;
+			}
+		} else {
+			break;
+		}
+
+		clock_gettime(CLOCK_MONOTONIC, &now);
+		long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000 +
+				  (now.tv_nsec - start.tv_nsec) / 1000000;
+		remaining_ms = MNT_NM_TIMEOUT - elapsed_ms;
+	}
+
+	mnt_unref_monitor(mn);
+	return ret;
+
+fallback:
 	for (int retry_count = 0; retry_count < 5 && ret != 0; retry_count++) {
 		usleep(50000);
 		ret = get_mountinfo(device_number, device_info, MOUNTINFO_PATH);
--
2.51.0
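
The deadline tracking in the monitor loop amounts to recomputing the
remaining budget from a fixed CLOCK_MONOTONIC start time after every
wake-up, so that repeated unrelated mount events cannot extend the total
wait. A minimal sketch of that arithmetic follows. The helper names
elapsed_ms() and remaining_ms() are hypothetical and not part of the
patch, and the clamping to the range [0, timeout_ms] is an addition made
for this example only.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: whole milliseconds between two samples of the
 * same clock. The nanosecond difference may be negative; adding it to
 * the seconds term still gives a result within 1ms of the true value.
 */
static long elapsed_ms(const struct timespec *start, const struct timespec *now)
{
	return (long)(now->tv_sec - start->tv_sec) * 1000L +
	       (now->tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: time left out of timeout_ms, measured from
 * start and clamped to [0, timeout_ms]. A result of 0 means the
 * deadline has passed and the caller should stop waiting.
 */
static int remaining_ms(const struct timespec *start,
			const struct timespec *now, int timeout_ms)
{
	long left;

	assert(timeout_ms >= 0);
	left = (long)timeout_ms - elapsed_ms(start, now);
	if (left <= 0)
		return 0;
	if (left > timeout_ms)
		return timeout_ms;
	return (int)left;
}
```

A caller would sample the clock once before the loop, then pass the
result of remaining_ms() as the timeout of each subsequent wait and
leave the loop when it reaches 0.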