public inbox for linux-nfs@vger.kernel.org
From: Aaron Tomlin <atomlin@atomlin.com>
To: steved@redhat.com, tbecker@redhat.com
Cc: yi.zhang@redhat.com, linux-nfs@vger.kernel.org
Subject: [PATCH] nfsrahead: enable event-driven mountinfo monitoring and skip non-NFS devices
Date: Fri,  6 Mar 2026 11:19:29 -0500
Message-ID: <20260306161929.4148128-1-atomlin@atomlin.com>

The nfsrahead utility relies on parsing "/proc/self/mountinfo" to
correlate a device number with a specific NFS mount point. However, due
to the asynchronous nature of system initialisation, the relevant entry
in mountinfo may not be immediately available when the tool is executed.

Currently, the utility employs a naive polling mechanism, retrying the
search five times with a fixed 50ms delay (totalling 250ms). This
approach is brittle on systems under heavy load or during particularly
slow boot sequences.

To mitigate this race condition and improve robustness, update
get_device_info() to utilise the libmount monitoring API.

The new implementation introduces the following logic:

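    1.  Initialises a libmount monitor with mnt_new_monitor() and enables
        kernel mountinfo events with mnt_monitor_enable_kernel().

    2.  Replaces the fixed polling loop with mnt_monitor_wait(). The
        polling loop is kept only as a fallback for when the monitor
        cannot be set up.

    3.  Increases the maximum wait time from 250ms to 10 seconds
        (MNT_NM_TIMEOUT).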

    4.  Introduces a fast-path rejection mechanism. NFS backing devices are
        allocated from the kernel's unnamed block device pool (major number
        0). While some local multi-device filesystems (such as Btrfs) also
        utilise anonymous device numbers, physical hardware block devices
        (e.g., sda, nvme) always possess specific, non-zero major numbers.
        By instantly exiting with -ENODEV for any device string not
        beginning with "0:", we safely bypass the monitor for physical
        drives, preventing the exhaustion of udev worker threads.
        See set_anon_super() and get_anon_bdev().

    5.  Implements strict monotonic deadline tracking within the monitor
        loop to prevent indefinite blocking.

Fixes: 2b62ac4c ("nfsrahead: enable event-driven mountinfo monitoring")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/CAHj4cs8URj2fJ7KyP9ViAm6npVOaMiAErnw2uFyPYEU2wb7G_w@mail.gmail.com/T/#t
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---

Hi Steve,

This patch should resolve the udev worker exhaustion issue reported by
Yi. It applies cleanly on top of the current nfs-utils tree, after your
revert [1].
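
For reference, the fast-path check from item 4 can be sketched in
isolation as below. This is only an illustration, not part of the patch;
is_anon_device() is a hypothetical helper name, and the sketch assumes
the device string has the "major:minor" form that get_device_info()
already receives.

```c
#include <string.h>

/*
 * Hypothetical helper (not in the patch): return 1 when a
 * "major:minor" string names an anonymous device, i.e. major 0.
 * Only the literal "0:" prefix is compared, as in the patch.
 */
static int is_anon_device(const char *device_number)
{
	return strncmp(device_number, "0:", 2) == 0;
}
```

With this, "0:52" is accepted and goes on to the mountinfo lookup, while
"8:0" or "259:1" would be rejected up front with -ENODEV.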

Thank you.

[1]: https://lore.kernel.org/linux-nfs/20260305124221.55407-1-steved@redhat.com/
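
The deadline tracking from item 5 boils down to the arithmetic below. It
is a minimal standalone sketch rather than the patch code: TIMEOUT_MS
mirrors MNT_NM_TIMEOUT, while elapsed_ms() and remaining_ms() are
hypothetical helper names, and the clamp to zero is an addition for
clarity (the patch simply leaves the loop once the budget is not
positive).

```c
#include <time.h>

#define TIMEOUT_MS 10000	/* mirrors MNT_NM_TIMEOUT in the patch */

/* Milliseconds elapsed between two CLOCK_MONOTONIC samples. */
static long elapsed_ms(const struct timespec *start,
		       const struct timespec *now)
{
	return (now->tv_sec - start->tv_sec) * 1000L +
	       (now->tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Budget left for the next wait. Clamped at 0 once the deadline has
 * passed, so a caller can loop on "remaining > 0".
 */
static int remaining_ms(const struct timespec *start,
			const struct timespec *now)
{
	long left = TIMEOUT_MS - elapsed_ms(start, now);

	return left > 0 ? (int)left : 0;
}
```

Because each wait is given only the remaining budget, a stream of
unrelated mountinfo events cannot extend the total wait beyond the
timeout.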


 tools/nfsrahead/main.c | 55 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/tools/nfsrahead/main.c b/tools/nfsrahead/main.c
index b7b889ff..78cd2581 100644
--- a/tools/nfsrahead/main.c
+++ b/tools/nfsrahead/main.c
@@ -3,6 +3,7 @@
 #include <stdlib.h>
 #include <errno.h>
 #include <unistd.h>
+#include <time.h>
 
 #include <libmount/libmount.h>
 #include <sys/sysmacros.h>
@@ -17,6 +18,8 @@
 #define CONF_NAME "nfsrahead"
 #define NFS_DEFAULT_READAHEAD 128
 
+#define MNT_NM_TIMEOUT 10000
+
 /* Device information from the system */
 struct device_info {
 	char *device_number;
@@ -117,7 +120,57 @@ out_free_device_info:
 
 static int get_device_info(const char *device_number, struct device_info *device_info)
 {
-	int ret = get_mountinfo(device_number, device_info, MOUNTINFO_PATH);
+	int ret;
+	struct libmnt_monitor *mn = NULL;
+	struct timespec start, now;
+	int remaining_ms = MNT_NM_TIMEOUT;
+
+	/*
+	 * Fast-path rejection:
+	 * NFS mounts always get an anonymous device number (major 0). A device
+	 * number that does not start with "0:" belongs to a real block device
+	 * and can never be an NFS mount, so bail out without blocking udev.
+	 */
+	if (strncmp(device_number, "0:", 2) != 0)
+		return -ENODEV;
+
+	ret = get_mountinfo(device_number, device_info, MOUNTINFO_PATH);
+	if (ret == 0)
+		return 0;
+
+	mn = mnt_new_monitor();
+	if (!mn)
+		goto fallback;
+
+	if (mnt_monitor_enable_kernel(mn, 1) < 0) {
+		mnt_unref_monitor(mn);
+		goto fallback;
+	}
+
+	clock_gettime(CLOCK_MONOTONIC, &start);
+
+	while (remaining_ms > 0) {
+		int rc = mnt_monitor_wait(mn, remaining_ms);
+		if (rc > 0) {
+			ret = get_mountinfo(device_number, device_info, MOUNTINFO_PATH);
+			if (ret == 0) {
+				mnt_unref_monitor(mn);
+				return 0;
+			}
+		} else {
+			break;
+		}
+
+		clock_gettime(CLOCK_MONOTONIC, &now);
+		long elapsed_ms = (now.tv_sec - start.tv_sec) * 1000 +
+				  (now.tv_nsec - start.tv_nsec) / 1000000;
+		remaining_ms = MNT_NM_TIMEOUT - elapsed_ms;
+	}
+
+	mnt_unref_monitor(mn);
+	return ret;
+
+fallback:
 	for (int retry_count = 0; retry_count < 5 && ret != 0; retry_count++) {
 		usleep(50000);
 		ret = get_mountinfo(device_number, device_info, MOUNTINFO_PATH);
-- 
2.51.0

