netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v3] Expose RDS features via sysfs.
@ 2025-06-18  3:13 Konrad Rzeszutek Wilk
  2025-06-18  3:13 ` [PATCH net v3] rds: Expose feature parameters via sysfs (and ELF) Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 4+ messages in thread
From: Konrad Rzeszutek Wilk @ 2025-06-18  3:13 UTC (permalink / raw)
  To: allison.henderson, netdev, linux-rdma, rds-devel, tj, andrew
  Cc: hannes, mkoutny, cgroups

Hi folks,

Changelog:
+ Since v2:
 - Changed it to use sysfs and expose a 'features' file with the data.
+ Since v3:
 - Use a 'features' directory similar to ext4, btrfs and expose each feature.
 - Expand on the Documentation
 
This patch addresses an issue where we have applications compiled against
against older (or newer) kernels. RDS does not have any ioctls to query
for what version of ABIs it has or what features it has. Hence this solution
that proposes to put this ABI information in sysfs space.


 Documentation/ABI/stable/sysfs-transport-rds | 55 ++++++++++++++++++++++++
 net/rds/af_rds.c                             | 62 +++++++++++++++++++++++++++-
 2 files changed, 116 insertions(+), 1 deletion(-)

Konrad Rzeszutek Wilk (1):
      rds: Expose feature parameters via sysfs (and ELF)


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH net v3] rds: Expose feature parameters via sysfs (and ELF)
  2025-06-18  3:13 [PATCH net v3] Expose RDS features via sysfs Konrad Rzeszutek Wilk
@ 2025-06-18  3:13 ` Konrad Rzeszutek Wilk
  2025-06-19  3:19   ` Allison Henderson
  2025-06-19 14:14   ` Jakub Kicinski
  0 siblings, 2 replies; 4+ messages in thread
From: Konrad Rzeszutek Wilk @ 2025-06-18  3:13 UTC (permalink / raw)
  To: allison.henderson, netdev, linux-rdma, rds-devel, tj, andrew
  Cc: hannes, mkoutny, cgroups, Konrad Rzeszutek Wilk

We would like to have a programmatic way for applications
to query which of the features defined in include/uapi/linux/rds.h
are actually implemented by the kernel.

The problem is that applications can be built against newer
kernel (or older) and they may have the feature implemented or not.

The lack of a certain feature would signify that the kernel
does not support it. The presence of it signifies the existence
of it.

This would provide the application to query the sysfs and figure
out what is supported on a running system.

This patch would expose this extra sysfs file:

/sys/kernel/rds/features/ioctl_get_tos
/sys/kernel/rds/features/ioctl_set_tos
/sys/kernel/rds/features/socket_cancel_sent_to
/sys/kernel/rds/features/socket_cong_monitor
/sys/kernel/rds/features/socket_free_mr
/sys/kernel/rds/features/socket_get_mr
/sys/kernel/rds/features/socket_get_mr_for_dest
/sys/kernel/rds/features/socket_recverr
/sys/kernel/rds/features/socket_so_rxpath_latency
/sys/kernel/rds/features/socket_so_transport

With the value of 'supported' in them. In the future this value
could change to say 'deprecated' or have other values (for example
different versions) or can be runtime changed.

The choice to use sysfs and this particular way is modeled on the
filesystems usage exposing their features.

Alternative solution such as exposing one file ('features') with
each feature enumerated (which cgroup does) is a bit limited in
that it does not provide means to provide extra content in the future
for each feature. For example if one of the features had three
modes and one wanted to set a particular one at runtime - that
does not exist in cgroup (albeit it can be implemented but it would
be quite hectic to have just one single attribute).

Another solution of using an ioctl to expose a bitmask has the
disadvantage of being less flexible in the future and while it can
have a bit of supported/unsupported, it is not clear how one would
change modes or expose versions. It is most certainly feasible
but it can get seriously complex fast.

As such this mechanism offers the basic support we require
now and offers the flexibility for the future.

Lastly, we also utilize the ELF note macro to expose these via
so that applications that have not yet initialized RDS transport
can inspect the kernel module to see if they have the appropiate
support and choose an alternative protocol if they wish so.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v3: Add the missing documentation
    Remove the CONFIG_SYSFS #ifdef machinations
    Redo it with each feature as a seperate file in 'features'
    directory.
---
 Documentation/ABI/stable/sysfs-transport-rds | 55 +++++++++++++++++
 net/rds/af_rds.c                             | 62 +++++++++++++++++++-
 2 files changed, 116 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/stable/sysfs-transport-rds

diff --git a/Documentation/ABI/stable/sysfs-transport-rds b/Documentation/ABI/stable/sysfs-transport-rds
new file mode 100644
index 000000000000..db4de277f717
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-transport-rds
@@ -0,0 +1,55 @@
+What:          /sys/kernel/rds/features/*
+Date:          June 2025
+KernelVersion: 6.17
+Contact:       rds-devel@oss.oracle.com
+Description:   This directory contains the features that this kernel
+               has been built with and supports. They correspond
+               to the include/uapi/linux/rds.h features.
+
+	       The intent is for applications compiled against rds.h
+	       to be able to query and find out what features the
+	       driver supports. The current expected value is 'supported'.
+
+	       The features so far are:
+
+	       ioctl_get_tos
+	       ioctl_set_tos
+
+		Allows the user to set on the socket a type of
+		service(tos) value associated forever.
+
+	       socket_cancel_sent_to
+
+		Allows to cancel all pending messages to a given destination.
+
+	       socket_cong_monitor
+
+		RDS provides an explicit monitoring wherein a 64-bit mask value
+		on the socket and each bit corresponds to a group of ports.
+
+		When a congestion update arrives, RDS checks the set of ports
+		that became uncongested against the bit mask.
+
+		If they overlap, a control messages is enqueued on the socket,
+		and the application is woken up.
+
+	       socket_get_mr
+	       socket_get_mr_for_dest
+	       socket_free_mr
+
+		RDS allows a process to register or release memory ranges for
+		RDMA.
+
+	       socket_recverr
+
+		RDS will send RDMA notification messages to the application for
+		any RDMA operation that fails. By default this is off.
+
+	       socket_so_rxpath_latency
+
+		Receive path latency in various stages of receive path.
+
+	       socket_so_transport
+
+		Attach the socket to the underlaying transport (TCP or RDMA)
+		before invoking bind on the socket.
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 8435a20968ef..449b20f236a5 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -32,7 +32,9 @@
  */
 #include <linux/module.h>
 #include <linux/errno.h>
+#include <linux/elfnote.h>
 #include <linux/kernel.h>
+#include <linux/kobject.h>
 #include <linux/gfp.h>
 #include <linux/in.h>
 #include <linux/ipv6.h>
@@ -871,6 +873,53 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
 }
 #endif
 
+static ssize_t supported_show(struct kobject *kobj, struct kobj_attribute *attr,
+			     char *buf)
+{
+	return sysfs_emit(buf, "supported\n");
+}
+
+#define SYSFS_ATTR(_name)					\
+ELFNOTE64("rds." #_name, 0, 1);				\
+static struct kobj_attribute rds_attr_##_name = {		\
+	.attr = {.name = __stringify(_name), .mode = 0444 },	\
+	.show = supported_show,					\
+}
+
+SYSFS_ATTR(ioctl_set_tos);
+SYSFS_ATTR(ioctl_get_tos);
+SYSFS_ATTR(socket_cancel_sent_to);
+SYSFS_ATTR(socket_cong_monitor);
+SYSFS_ATTR(socket_get_mr);
+SYSFS_ATTR(socket_get_mr_for_dest);
+SYSFS_ATTR(socket_free_mr);
+SYSFS_ATTR(socket_recverr);
+SYSFS_ATTR(socket_so_rxpath_latency);
+SYSFS_ATTR(socket_so_transport);
+
+#define ATTR_LIST(_name) &rds_attr_##_name.attr
+
+static struct attribute *rds_feat_attrs[] = {
+	ATTR_LIST(ioctl_set_tos),
+	ATTR_LIST(ioctl_get_tos),
+	ATTR_LIST(socket_cancel_sent_to),
+	ATTR_LIST(socket_cong_monitor),
+	ATTR_LIST(socket_get_mr),
+	ATTR_LIST(socket_get_mr_for_dest),
+	ATTR_LIST(socket_free_mr),
+	ATTR_LIST(socket_recverr),
+	ATTR_LIST(socket_so_rxpath_latency),
+	ATTR_LIST(socket_so_transport),
+	NULL,
+};
+
+static const struct attribute_group rds_feat_group = {
+	.attrs = rds_feat_attrs,
+	.name = "features",
+};
+
+static struct kobject *rds_sysfs_kobj;
+
 static void rds_exit(void)
 {
 	sock_unregister(rds_family_ops.family);
@@ -882,6 +931,8 @@ static void rds_exit(void)
 	rds_stats_exit();
 	rds_page_exit();
 	rds_bind_lock_destroy();
+	sysfs_remove_group(rds_sysfs_kobj, &rds_feat_group);
+	kobject_put(rds_sysfs_kobj);
 	rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
 	rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -923,6 +974,14 @@ static int __init rds_init(void)
 	if (ret)
 		goto out_proto;
 
+	rds_sysfs_kobj = kobject_create_and_add("rds", kernel_kobj);
+	if (!rds_sysfs_kobj)
+		goto out_proto;
+
+	ret = sysfs_create_group(rds_sysfs_kobj, &rds_feat_group);
+	if (ret)
+		goto out_kobject;
+
 	rds_info_register_func(RDS_INFO_SOCKETS, rds_sock_info);
 	rds_info_register_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -931,7 +990,8 @@ static int __init rds_init(void)
 #endif
 
 	goto out;
-
+out_kobject:
+	kobject_put(rds_sysfs_kobj);
 out_proto:
 	proto_unregister(&rds_proto);
 out_stats:
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net v3] rds: Expose feature parameters via sysfs (and ELF)
  2025-06-18  3:13 ` [PATCH net v3] rds: Expose feature parameters via sysfs (and ELF) Konrad Rzeszutek Wilk
@ 2025-06-19  3:19   ` Allison Henderson
  2025-06-19 14:14   ` Jakub Kicinski
  1 sibling, 0 replies; 4+ messages in thread
From: Allison Henderson @ 2025-06-19  3:19 UTC (permalink / raw)
  To: andrew@lunn.ch, rds-devel@oss.oracle.com, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, Konrad Wilk, tj@kernel.org
  Cc: mkoutny@suse.com, hannes@cmpxchg.org, cgroups@vger.kernel.org

On Tue, 2025-06-17 at 23:13 -0400, Konrad Rzeszutek Wilk wrote:
> We would like to have a programmatic way for applications
> to query which of the features defined in include/uapi/linux/rds.h
> are actually implemented by the kernel.
> 
> The problem is that applications can be built against newer
> kernel (or older) and they may have the feature implemented or not.
> 
> The lack of a certain feature would signify that the kernel
> does not support it. The presence of it signifies the existence
> of it.
> 
> This would provide the application to query the sysfs and figure
> out what is supported on a running system.
> 
> This patch would expose this extra sysfs file:
> 
> /sys/kernel/rds/features/ioctl_get_tos
> /sys/kernel/rds/features/ioctl_set_tos
> /sys/kernel/rds/features/socket_cancel_sent_to
> /sys/kernel/rds/features/socket_cong_monitor
> /sys/kernel/rds/features/socket_free_mr
> /sys/kernel/rds/features/socket_get_mr
> /sys/kernel/rds/features/socket_get_mr_for_dest
> /sys/kernel/rds/features/socket_recverr
> /sys/kernel/rds/features/socket_so_rxpath_latency
> /sys/kernel/rds/features/socket_so_transport
> 
> With the value of 'supported' in them. In the future this value
> could change to say 'deprecated' or have other values (for example
> different versions) or can be runtime changed.
> 
> The choice to use sysfs and this particular way is modeled on the
> filesystems usage exposing their features.
> 
> Alternative solution such as exposing one file ('features') with
> each feature enumerated (which cgroup does) is a bit limited in
> that it does not provide means to provide extra content in the future
> for each feature. For example if one of the features had three
> modes and one wanted to set a particular one at runtime - that
> does not exist in cgroup (albeit it can be implemented but it would
> be quite hectic to have just one single attribute).
> 
> Another solution of using an ioctl to expose a bitmask has the
> disadvantage of being less flexible in the future and while it can
> have a bit of supported/unsupported, it is not clear how one would
> change modes or expose versions. It is most certainly feasible
> but it can get seriously complex fast.
> 
> As such this mechanism offers the basic support we require
> now and offers the flexibility for the future.
> 
> Lastly, we also utilize the ELF note macro to expose these via
> so that applications that have not yet initialized RDS transport
> can inspect the kernel module to see if they have the appropiate
> support and choose an alternative protocol if they wish so.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v3: Add the missing documentation
>     Remove the CONFIG_SYSFS #ifdef machinations
>     Redo it with each feature as a seperate file in 'features'
>     directory.
> ---
>  Documentation/ABI/stable/sysfs-transport-rds | 55 +++++++++++++++++
>  net/rds/af_rds.c                             | 62 +++++++++++++++++++-
>  2 files changed, 116 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/ABI/stable/sysfs-transport-rds
> 
> diff --git a/Documentation/ABI/stable/sysfs-transport-rds b/Documentation/ABI/stable/sysfs-transport-rds
> new file mode 100644
> index 000000000000..db4de277f717
> --- /dev/null
> +++ b/Documentation/ABI/stable/sysfs-transport-rds
> @@ -0,0 +1,55 @@
> +What:          /sys/kernel/rds/features/*
> +Date:          June 2025
> +KernelVersion: 6.17
> +Contact:       rds-devel@oss.oracle.com
> +Description:   This directory contains the features that this kernel
> +               has been built with and supports. They correspond
> +               to the include/uapi/linux/rds.h features.
> +
> +	       The intent is for applications compiled against rds.h
> +	       to be able to query and find out what features the
> +	       driver supports. The current expected value is 'supported'.
> +
I noticed some inconsistent whitespace indentation here.  I think most of the 
other documents use 2 tabs to align the description block.

> +	       The features so far are:
> +
> +	       ioctl_get_tos
> +	       ioctl_set_tos
> +
> +		Allows the user to set on the socket a type of
nit: "Allows the user to get or set the sockets type of"
> +		service(tos) value associated forever.

Perhaps you meant to have another level of indentation to the feature descriptions too?  So, 3 tabs here.  

I did some looking around to see if there is a common documentation style, though it appears informal.  I think what you
have is most similar to Documentation/ABI/stable/sysfs-module, which lists the full feature paths.  So something like
this:

		/sys/kernel/rds/features/ioctl_get_tos
		/sys/kernel/rds/features/ioctl_set_tos
			Allows the user to get or set the socket a type of
			service(tos) value associated forever.

Lets follow the same format for the rest as well.  While I did notice quite a few documents repeat the entire
what/date/version/contact block for each feature, I thought that was a bit repetitive in this case.


> +
> +	       socket_cancel_sent_to
> +
> +		Allows to cancel all pending messages to a given destination.
> +
> +	       socket_cong_monitor
> +
> +		RDS provides an explicit monitoring wherein a 64-bit mask value
> +		on the socket and each bit corresponds to a group of ports.
		"RDS provides explicit congestion monitoring for a socket using 
		a 64-bit mask. Each bit in the mask corresponds to a group of
		ports."

> +
> +		When a congestion update arrives, RDS checks the set of ports
> +		that became uncongested against the bit mask.
> +
> +		If they overlap, a control messages is enqueued on the socket,
> +		and the application is woken up.
> +
> +	       socket_get_mr
> +	       socket_get_mr_for_dest
> +	       socket_free_mr
> +
> +		RDS allows a process to register or release memory ranges for
> +		RDMA.
> +
> +	       socket_recverr
> +
> +		RDS will send RDMA notification messages to the application for
> +		any RDMA operation that fails. By default this is off.
> +
> +	       socket_so_rxpath_latency
> +
> +		Receive path latency in various stages of receive path.
> +
> +	       socket_so_transport
> +
> +		Attach the socket to the underlaying transport (TCP or RDMA)
		"(TCP, RDMA or LOOP)"

> +		before invoking bind on the socket.
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index 8435a20968ef..449b20f236a5 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
> @@ -32,7 +32,9 @@
>   */
>  #include <linux/module.h>
>  #include <linux/errno.h>
> +#include <linux/elfnote.h>
>  #include <linux/kernel.h>
> +#include <linux/kobject.h>
>  #include <linux/gfp.h>
>  #include <linux/in.h>
>  #include <linux/ipv6.h>
> @@ -871,6 +873,53 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
>  }
>  #endif
>  
> +static ssize_t supported_show(struct kobject *kobj, struct kobj_attribute *attr,
> +			     char *buf)
> +{
> +	return sysfs_emit(buf, "supported\n");
> +}
> +
> +#define SYSFS_ATTR(_name)					\
> +ELFNOTE64("rds." #_name, 0, 1);				\
> +static struct kobj_attribute rds_attr_##_name = {		\
> +	.attr = {.name = __stringify(_name), .mode = 0444 },	\
> +	.show = supported_show,					\
> +}
> +
> +SYSFS_ATTR(ioctl_set_tos);
> +SYSFS_ATTR(ioctl_get_tos);
> +SYSFS_ATTR(socket_cancel_sent_to);
> +SYSFS_ATTR(socket_cong_monitor);
> +SYSFS_ATTR(socket_get_mr);
> +SYSFS_ATTR(socket_get_mr_for_dest);
> +SYSFS_ATTR(socket_free_mr);
> +SYSFS_ATTR(socket_recverr);
> +SYSFS_ATTR(socket_so_rxpath_latency);
> +SYSFS_ATTR(socket_so_transport);
> +
> +#define ATTR_LIST(_name) &rds_attr_##_name.attr
> +
> +static struct attribute *rds_feat_attrs[] = {
> +	ATTR_LIST(ioctl_set_tos),
> +	ATTR_LIST(ioctl_get_tos),
> +	ATTR_LIST(socket_cancel_sent_to),
> +	ATTR_LIST(socket_cong_monitor),
> +	ATTR_LIST(socket_get_mr),
> +	ATTR_LIST(socket_get_mr_for_dest),
> +	ATTR_LIST(socket_free_mr),
> +	ATTR_LIST(socket_recverr),
> +	ATTR_LIST(socket_so_rxpath_latency),
> +	ATTR_LIST(socket_so_transport),
> +	NULL,
> +};
> +
> +static const struct attribute_group rds_feat_group = {
> +	.attrs = rds_feat_attrs,
> +	.name = "features",
> +};
> +
> +static struct kobject *rds_sysfs_kobj;
> +
>  static void rds_exit(void)
>  {
>  	sock_unregister(rds_family_ops.family);
> @@ -882,6 +931,8 @@ static void rds_exit(void)
>  	rds_stats_exit();
>  	rds_page_exit();
>  	rds_bind_lock_destroy();
> +	sysfs_remove_group(rds_sysfs_kobj, &rds_feat_group);
> +	kobject_put(rds_sysfs_kobj);
>  	rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
>  	rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
>  #if IS_ENABLED(CONFIG_IPV6)
> @@ -923,6 +974,14 @@ static int __init rds_init(void)
>  	if (ret)
>  		goto out_proto;
>  
> +	rds_sysfs_kobj = kobject_create_and_add("rds", kernel_kobj);
> +	if (!rds_sysfs_kobj)
I think we need to assign something to ret here, or it will just be 0 when we jump to out_proto.  
I think "ret = -ENOMEM;" is appropriate

> +		goto out_proto;
> +
> +	ret = sysfs_create_group(rds_sysfs_kobj, &rds_feat_group);
> +	if (ret)
> +		goto out_kobject;
> +
>  	rds_info_register_func(RDS_INFO_SOCKETS, rds_sock_info);
>  	rds_info_register_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
>  #if IS_ENABLED(CONFIG_IPV6)
> @@ -931,7 +990,8 @@ static int __init rds_init(void)
>  #endif
>  
>  	goto out;
> -
> +out_kobject:
> +	kobject_put(rds_sysfs_kobj);
>  out_proto:
>  	proto_unregister(&rds_proto);
>  out_stats:

Other than a few nits, I think this is looking very good.  Thanks for working on this!

Allison


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v3] rds: Expose feature parameters via sysfs (and ELF)
  2025-06-18  3:13 ` [PATCH net v3] rds: Expose feature parameters via sysfs (and ELF) Konrad Rzeszutek Wilk
  2025-06-19  3:19   ` Allison Henderson
@ 2025-06-19 14:14   ` Jakub Kicinski
  1 sibling, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2025-06-19 14:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: allison.henderson, netdev, linux-rdma, rds-devel, tj, andrew,
	hannes, mkoutny, cgroups

On Tue, 17 Jun 2025 23:13:09 -0400 Konrad Rzeszutek Wilk wrote:
> We would like to have a programmatic way for applications
> to query which of the features defined in include/uapi/linux/rds.h
> are actually implemented by the kernel.

Hi Konrad, when you repost to fix Allison's comments please use 
  [PATCH net-next v4]
as the subject tag, [PATCH net] usually means fixes for current release.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-06-19 14:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-18  3:13 [PATCH net v3] Expose RDS features via sysfs Konrad Rzeszutek Wilk
2025-06-18  3:13 ` [PATCH net v3] rds: Expose feature parameters via sysfs (and ELF) Konrad Rzeszutek Wilk
2025-06-19  3:19   ` Allison Henderson
2025-06-19 14:14   ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).