From: Alexander Aring <aahringo@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH dlm-tool 2/2] dlm_controld: set SO_RCVBUF for netlink socket
Date: Fri, 4 Sep 2020 10:29:46 -0400
Message-ID: <20200904142946.8684-2-aahringo@redhat.com>
In-Reply-To: <20200904142946.8684-1-aahringo@redhat.com>
I saw messages like:
1597148652 uevent recv error -1 errno 105
in a dlm_tool dump. errno 105 is ENOBUFS, returned by a recv() on an
AF_NETLINK socket. Further investigation showed that uevents are dropped
in this case; see the added comment. The error message above came from a
node that hung inside do_uevent() in the dlm kernel code, which means the
node was waiting for a sysfs write to "event_done". My guess is that
dlm_controld missed some "important" uevents and therefore never wrote to
"event_done". We should make such ENOBUFS cases in netlink less likely,
which this patch tries to do in a simple way.
---
dlm_controld/dlm_daemon.h | 2 ++
dlm_controld/main.c | 19 +++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/dlm_controld/dlm_daemon.h b/dlm_controld/dlm_daemon.h
index 0b4ae5f2..95848201 100644
--- a/dlm_controld/dlm_daemon.h
+++ b/dlm_controld/dlm_daemon.h
@@ -83,6 +83,8 @@
#define DEFAULT_LOGFILE_PRIORITY LOG_INFO
#define DEFAULT_LOGFILE LOG_FILE_PATH
+#define DEFAULT_NETLINK_RCVBUF (2 * 1024 * 1024)
+
enum {
no_arg = 0,
req_arg_bool = 1,
diff --git a/dlm_controld/main.c b/dlm_controld/main.c
index 470a067c..a82fc9c2 100644
--- a/dlm_controld/main.c
+++ b/dlm_controld/main.c
@@ -765,6 +765,7 @@ static void process_uevent(int ci)
static int setup_uevent(void)
{
struct sockaddr_nl snl;
+ int rcvbuf;
int s, rv;
s = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
@@ -773,6 +774,24 @@ static int setup_uevent(void)
return s;
}
+ /* man 7 netlink:
+ *
+ * However, reliable transmissions from kernel to user are impossible in
+ * any case. The kernel can't send a netlink message if the socket buffer
+ * is full: the message will be dropped and the kernel and the user-space
+ * process will no longer have the same view of kernel state. It is up to
+ * the application to detect when this happens (via the ENOBUFS error
+ * returned by recvmsg(2)) and resynchronize.
+ *
+ * To prevent ENOBUFS errors we just set the receive buffer to two
+ * megabytes, as other applications do. This does not guarantee that we
+ * never receive ENOBUFS, but it makes it less likely. Maybe it's worth
+ * handling ENOBUFS errors in a different way in the future.
+ */
+ rcvbuf = DEFAULT_NETLINK_RCVBUF;
+ setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
+ setsockopt(s, SOL_SOCKET, SO_RCVBUFFORCE, &rcvbuf, sizeof(rcvbuf));
+
memset(&snl, 0, sizeof(snl));
snl.nl_family = AF_NETLINK;
snl.nl_pid = getpid();
--
2.26.2
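
The quoted netlink(7) text says the application has to detect ENOBUFS
itself and resynchronize. As a rough sketch of what separate handling
could look like (this is an assumption about a possible follow-up, not
what dlm_controld does today), the receive path could classify the recv()
result. All names below are hypothetical, and the actual resync step is
left open.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical result codes for one uevent receive attempt. */
enum uevent_rx {
	UEVENT_RX_OK,      /* a message was received */
	UEVENT_RX_RETRY,   /* transient condition, try again later */
	UEVENT_RX_OVERRUN, /* ENOBUFS: the kernel dropped uevents */
	UEVENT_RX_ERROR,   /* any other error */
};

/* Map a recv() return value and the errno it left behind to a result. */
enum uevent_rx classify_recv(ssize_t rv, int err)
{
	if (rv >= 0)
		return UEVENT_RX_OK;
	if (err == EINTR || err == EAGAIN)
		return UEVENT_RX_RETRY;
	if (err == ENOBUFS)
		return UEVENT_RX_OVERRUN;
	return UEVENT_RX_ERROR;
}

/* Receive one message into buf; *n gets the raw recv() return value. */
enum uevent_rx recv_uevent(int fd, char *buf, size_t len, ssize_t *n)
{
	*n = recv(fd, buf, len, 0);
	return classify_recv(*n, errno);
}
```

On UEVENT_RX_OVERRUN the caller would know that uevents were lost and
could trigger whatever resynchronization fits, instead of only logging
the error.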