* [PATCH net-next v4] Expose RDS features via sysfs. @ 2025-06-21 0:26 Konrad Rzeszutek Wilk 2025-06-21 0:26 ` [PATCH net-next v4] rds: Expose feature parameters via sysfs (and ELF) Konrad Rzeszutek Wilk 0 siblings, 1 reply; 3+ messages in thread From: Konrad Rzeszutek Wilk @ 2025-06-21 0:26 UTC (permalink / raw) To: allison.henderson, netdev, linux-rdma, rds-devel, tj, andrew Cc: hannes, mkoutny, cgroups Hi folks, Changelog: + Since v1: - Changed it to use sysfs and expose a 'features' file with the data. + Since v2: - Use a 'features' directory similar to ext4, btrfs and expose each feature. + Since v3: - Fix ret returning 0 if kobject_create_and_add fails - Follow the syntax of Documentation/ABI/stable/sysfs-module. This patch addresses an issue where we have applications compiled against against older (or newer) kernels. RDS does not have any ioctls to query for what version of ABIs it has or what features it has. Hence this solution that proposes to put this ABI information in sysfs space. Documentation/ABI/stable/sysfs-transport-rds | 43 +++++++++++++++++++ net/rds/af_rds.c | 63 +++++++++++++++++++++++++++- 2 files changed, 105 insertions(+), 1 deletion(-) Konrad Rzeszutek Wilk (1): rds: Expose feature parameters via sysfs (and ELF) ^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH net-next v4] rds: Expose feature parameters via sysfs (and ELF) 2025-06-21 0:26 [PATCH net-next v4] Expose RDS features via sysfs Konrad Rzeszutek Wilk @ 2025-06-21 0:26 ` Konrad Rzeszutek Wilk 2025-06-22 17:21 ` Allison Henderson 0 siblings, 1 reply; 3+ messages in thread From: Konrad Rzeszutek Wilk @ 2025-06-21 0:26 UTC (permalink / raw) To: allison.henderson, netdev, linux-rdma, rds-devel, tj, andrew Cc: hannes, mkoutny, cgroups, Konrad Rzeszutek Wilk We would like to have a programmatic way for applications to query which of the features defined in include/uapi/linux/rds.h are actually implemented by the kernel. The problem is that applications can be built against newer kernel (or older) and they may have the feature implemented or not. The lack of a certain feature would signify that the kernel does not support it. The presence of it signifies the existence of it. This would provide the application to query the sysfs and figure out what is supported on a running system. This patch would expose this extra sysfs file: /sys/kernel/rds/features/ioctl_get_tos /sys/kernel/rds/features/ioctl_set_tos /sys/kernel/rds/features/socket_cancel_sent_to /sys/kernel/rds/features/socket_cong_monitor /sys/kernel/rds/features/socket_free_mr /sys/kernel/rds/features/socket_get_mr /sys/kernel/rds/features/socket_get_mr_for_dest /sys/kernel/rds/features/socket_recverr /sys/kernel/rds/features/socket_so_rxpath_latency /sys/kernel/rds/features/socket_so_transport With the value of 'supported' in them. In the future this value could change to say 'deprecated' or have other values (for example different versions) or can be runtime changed. The choice to use sysfs and this particular way is modeled on the filesystems usage exposing their features. Alternative solution such as exposing one file ('features') with each feature enumerated (which cgroup does) is a bit limited in that it does not provide means to provide extra content in the future for each feature. For example if one of the features had three modes and one wanted to set a particular one at runtime - that does not exist in cgroup (albeit it can be implemented but it would be quite hectic to have just one single attribute). Another solution of using an ioctl to expose a bitmask has the disadvantage of being less flexible in the future and while it can have a bit of supported/unsupported, it is not clear how one would change modes or expose versions. It is most certainly feasible but it can get seriously complex fast. As such this mechanism offers the basic support we require now and offers the flexibility for the future. Lastly, we also utilize the ELF note macro to expose these via so that applications that have not yet initialized RDS transport can inspect the kernel module to see if they have the appropiate support and choose an alternative protocol if they wish so. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v3: Add the missing documentation Remove the CONFIG_SYSFS #ifdef machinations Redo it with each feature as a seperate file in 'features' directory. v4: Follow the syntax in Documentation/ABI/stable/sysfs-module Tack on the ret value if we fail kobject_create_and_add --- Documentation/ABI/stable/sysfs-transport-rds | 43 +++++++++++++ net/rds/af_rds.c | 63 +++++++++++++++++++- 2 files changed, 105 insertions(+), 1 deletion(-) create mode 100644 Documentation/ABI/stable/sysfs-transport-rds diff --git a/Documentation/ABI/stable/sysfs-transport-rds b/Documentation/ABI/stable/sysfs-transport-rds new file mode 100644 index 000000000000..749a043d4e88 --- /dev/null +++ b/Documentation/ABI/stable/sysfs-transport-rds @@ -0,0 +1,43 @@ +What: /sys/kernel/rds/features/* +Date: June 2025 +KernelVersion: 6.17 +Contact: rds-devel@oss.oracle.com +Description: This directory contains the features that this kernel + has been built with and supports. They correspond + to the include/uapi/linux/rds.h features. + + The intent is for applications compiled against rds.h + to be able to query and find out what features the + driver supports. The current expected value is 'supported'. + +What: /sys/kernel/rds/features/ioctl_[get,set]_tos +Description: Allows the user to set on the socket a type of + service(tos) value associated forever. + +What: /sys/kernel/rds/features/socket_cancel_sent_to +Description: Allows to cancel all pending messages to a given destination. + +What: /sys/kernel/rds/features/socket_cong_monitor +Description: RDS provides explicit congestion monitoring for a socket using + a 64-bit mask. Each bit in the mask corresponds to a group of ports. + + When a congestion update arrives, RDS checks the set of ports + that became uncongested against the bit mask. + + If they overlap, a control messages is enqueued on the socket, + and the application is woken up. + +What: /sys/kernel/rds/features/socket_[get_mr.get_mr_for_dest,free_mr] +Description: RDS allows a process to register or release memory ranges for + RDMA. + +What: /sys/kernel/rds/features/socket_recverr +Description: RDS will send RDMA notification messages to the application for + any RDMA operation that fails. By default this is off. + +What: /sys/kernel/rds/features/socket_so_rxpath_latency +Description: Receive path latency in various stages of receive path. + +What: /sys/kernel/rds/features/socket_so_transport +Description: Attach the socket to the underlaying transport (TCP, RDMA + or loop) before invoking bind on the socket. diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c index 8435a20968ef..f920cc6f0b2b 100644 --- a/net/rds/af_rds.c +++ b/net/rds/af_rds.c @@ -32,7 +32,9 @@ */ #include <linux/module.h> #include <linux/errno.h> +#include <linux/elfnote.h> #include <linux/kernel.h> +#include <linux/kobject.h> #include <linux/gfp.h> #include <linux/in.h> #include <linux/ipv6.h> @@ -871,6 +873,53 @@ static void rds6_sock_info(struct socket *sock, unsigned int len, } #endif +static ssize_t supported_show(struct kobject *kobj, struct kobj_attribute *attr, + char *buf) +{ + return sysfs_emit(buf, "supported\n"); +} + +#define SYSFS_ATTR(_name) \ +ELFNOTE64("rds." #_name, 0, 1); \ +static struct kobj_attribute rds_attr_##_name = { \ + .attr = {.name = __stringify(_name), .mode = 0444 }, \ + .show = supported_show, \ +} + +SYSFS_ATTR(ioctl_set_tos); +SYSFS_ATTR(ioctl_get_tos); +SYSFS_ATTR(socket_cancel_sent_to); +SYSFS_ATTR(socket_cong_monitor); +SYSFS_ATTR(socket_get_mr); +SYSFS_ATTR(socket_get_mr_for_dest); +SYSFS_ATTR(socket_free_mr); +SYSFS_ATTR(socket_recverr); +SYSFS_ATTR(socket_so_rxpath_latency); +SYSFS_ATTR(socket_so_transport); + +#define ATTR_LIST(_name) &rds_attr_##_name.attr + +static struct attribute *rds_feat_attrs[] = { + ATTR_LIST(ioctl_set_tos), + ATTR_LIST(ioctl_get_tos), + ATTR_LIST(socket_cancel_sent_to), + ATTR_LIST(socket_cong_monitor), + ATTR_LIST(socket_get_mr), + ATTR_LIST(socket_get_mr_for_dest), + ATTR_LIST(socket_free_mr), + ATTR_LIST(socket_recverr), + ATTR_LIST(socket_so_rxpath_latency), + ATTR_LIST(socket_so_transport), + NULL, +}; + +static const struct attribute_group rds_feat_group = { + .attrs = rds_feat_attrs, + .name = "features", +}; + +static struct kobject *rds_sysfs_kobj; + static void rds_exit(void) { sock_unregister(rds_family_ops.family); @@ -882,6 +931,8 @@ static void rds_exit(void) rds_stats_exit(); rds_page_exit(); rds_bind_lock_destroy(); + sysfs_remove_group(rds_sysfs_kobj, &rds_feat_group); + kobject_put(rds_sysfs_kobj); rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info); rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info); #if IS_ENABLED(CONFIG_IPV6) @@ -923,6 +974,15 @@ static int __init rds_init(void) if (ret) goto out_proto; + rds_sysfs_kobj = kobject_create_and_add("rds", kernel_kobj); + if (!rds_sysfs_kobj) { + ret = -ENOMEM; + goto out_proto; + } + ret = sysfs_create_group(rds_sysfs_kobj, &rds_feat_group); + if (ret) + goto out_kobject; + rds_info_register_func(RDS_INFO_SOCKETS, rds_sock_info); rds_info_register_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info); #if IS_ENABLED(CONFIG_IPV6) @@ -931,7 +991,8 @@ static int __init rds_init(void) #endif goto out; - +out_kobject: + kobject_put(rds_sysfs_kobj); out_proto: proto_unregister(&rds_proto); out_stats: -- 2.43.5 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net-next v4] rds: Expose feature parameters via sysfs (and ELF) 2025-06-21 0:26 ` [PATCH net-next v4] rds: Expose feature parameters via sysfs (and ELF) Konrad Rzeszutek Wilk @ 2025-06-22 17:21 ` Allison Henderson 0 siblings, 0 replies; 3+ messages in thread From: Allison Henderson @ 2025-06-22 17:21 UTC (permalink / raw) To: andrew@lunn.ch, rds-devel@oss.oracle.com, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, Konrad Wilk, tj@kernel.org Cc: mkoutny@suse.com, hannes@cmpxchg.org, cgroups@vger.kernel.org On Fri, 2025-06-20 at 20:26 -0400, Konrad Rzeszutek Wilk wrote: > We would like to have a programmatic way for applications > to query which of the features defined in include/uapi/linux/rds.h > are actually implemented by the kernel. > > The problem is that applications can be built against newer > kernel (or older) and they may have the feature implemented or not. > > The lack of a certain feature would signify that the kernel > does not support it. The presence of it signifies the existence > of it. > > This would provide the application to query the sysfs and figure > out what is supported on a running system. > > This patch would expose this extra sysfs file: > > /sys/kernel/rds/features/ioctl_get_tos > /sys/kernel/rds/features/ioctl_set_tos > /sys/kernel/rds/features/socket_cancel_sent_to > /sys/kernel/rds/features/socket_cong_monitor > /sys/kernel/rds/features/socket_free_mr > /sys/kernel/rds/features/socket_get_mr > /sys/kernel/rds/features/socket_get_mr_for_dest > /sys/kernel/rds/features/socket_recverr > /sys/kernel/rds/features/socket_so_rxpath_latency > /sys/kernel/rds/features/socket_so_transport > > With the value of 'supported' in them. In the future this value > could change to say 'deprecated' or have other values (for example > different versions) or can be runtime changed. > > The choice to use sysfs and this particular way is modeled on the > filesystems usage exposing their features. > > Alternative solution such as exposing one file ('features') with > each feature enumerated (which cgroup does) is a bit limited in > that it does not provide means to provide extra content in the future > for each feature. For example if one of the features had three > modes and one wanted to set a particular one at runtime - that > does not exist in cgroup (albeit it can be implemented but it would > be quite hectic to have just one single attribute). > > Another solution of using an ioctl to expose a bitmask has the > disadvantage of being less flexible in the future and while it can > have a bit of supported/unsupported, it is not clear how one would > change modes or expose versions. It is most certainly feasible > but it can get seriously complex fast. > > As such this mechanism offers the basic support we require > now and offers the flexibility for the future. > > Lastly, we also utilize the ELF note macro to expose these via > so that applications that have not yet initialized RDS transport > can inspect the kernel module to see if they have the appropiate > support and choose an alternative protocol if they wish so. > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > v3: Add the missing documentation > Remove the CONFIG_SYSFS #ifdef machinations > Redo it with each feature as a seperate file in 'features' > directory. > v4: Follow the syntax in Documentation/ABI/stable/sysfs-module > Tack on the ret value if we fail kobject_create_and_add > --- > Documentation/ABI/stable/sysfs-transport-rds | 43 +++++++++++++ > net/rds/af_rds.c | 63 +++++++++++++++++++- > 2 files changed, 105 insertions(+), 1 deletion(-) > create mode 100644 Documentation/ABI/stable/sysfs-transport-rds > > diff --git a/Documentation/ABI/stable/sysfs-transport-rds b/Documentation/ABI/stable/sysfs-transport-rds > new file mode 100644 > index 000000000000..749a043d4e88 > --- /dev/null > +++ b/Documentation/ABI/stable/sysfs-transport-rds > @@ -0,0 +1,43 @@ > +What: /sys/kernel/rds/features/* > +Date: June 2025 > +KernelVersion: 6.17 > +Contact: rds-devel@oss.oracle.com > +Description: This directory contains the features that this kernel > + has been built with and supports. They correspond > + to the include/uapi/linux/rds.h features. > + > + The intent is for applications compiled against rds.h > + to be able to query and find out what features the > + driver supports. The current expected value is 'supported'. > + > +What: /sys/kernel/rds/features/ioctl_[get,set]_tos > +Description: Allows the user to set on the socket a type of > + service(tos) value associated forever. > + > +What: /sys/kernel/rds/features/socket_cancel_sent_to > +Description: Allows to cancel all pending messages to a given destination. > + > +What: /sys/kernel/rds/features/socket_cong_monitor > +Description: RDS provides explicit congestion monitoring for a socket using > + a 64-bit mask. Each bit in the mask corresponds to a group of ports. > + > + When a congestion update arrives, RDS checks the set of ports > + that became uncongested against the bit mask. > + > + If they overlap, a control messages is enqueued on the socket, > + and the application is woken up. > + > +What: /sys/kernel/rds/features/socket_[get_mr.get_mr_for_dest,free_mr] One last nit here: comma between get_mr and get_mr_for_des Other than that, this version looks good to me! Thank you for the update! What that fix, you can add my rvb: Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Thank you! Allison > +Description: RDS allows a process to register or release memory ranges for > + RDMA. > + > +What: /sys/kernel/rds/features/socket_recverr > +Description: RDS will send RDMA notification messages to the application for > + any RDMA operation that fails. By default this is off. > + > +What: /sys/kernel/rds/features/socket_so_rxpath_latency > +Description: Receive path latency in various stages of receive path. > + > +What: /sys/kernel/rds/features/socket_so_transport > +Description: Attach the socket to the underlaying transport (TCP, RDMA > + or loop) before invoking bind on the socket. > diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c > index 8435a20968ef..f920cc6f0b2b 100644 > --- a/net/rds/af_rds.c > +++ b/net/rds/af_rds.c > @@ -32,7 +32,9 @@ > */ > #include <linux/module.h> > #include <linux/errno.h> > +#include <linux/elfnote.h> > #include <linux/kernel.h> > +#include <linux/kobject.h> > #include <linux/gfp.h> > #include <linux/in.h> > #include <linux/ipv6.h> > @@ -871,6 +873,53 @@ static void rds6_sock_info(struct socket *sock, unsigned int len, > } > #endif > > +static ssize_t supported_show(struct kobject *kobj, struct kobj_attribute *attr, > + char *buf) > +{ > + return sysfs_emit(buf, "supported\n"); > +} > + > +#define SYSFS_ATTR(_name) \ > +ELFNOTE64("rds." #_name, 0, 1); \ > +static struct kobj_attribute rds_attr_##_name = { \ > + .attr = {.name = __stringify(_name), .mode = 0444 }, \ > + .show = supported_show, \ > +} > + > +SYSFS_ATTR(ioctl_set_tos); > +SYSFS_ATTR(ioctl_get_tos); > +SYSFS_ATTR(socket_cancel_sent_to); > +SYSFS_ATTR(socket_cong_monitor); > +SYSFS_ATTR(socket_get_mr); > +SYSFS_ATTR(socket_get_mr_for_dest); > +SYSFS_ATTR(socket_free_mr); > +SYSFS_ATTR(socket_recverr); > +SYSFS_ATTR(socket_so_rxpath_latency); > +SYSFS_ATTR(socket_so_transport); > + > +#define ATTR_LIST(_name) &rds_attr_##_name.attr > + > +static struct attribute *rds_feat_attrs[] = { > + ATTR_LIST(ioctl_set_tos), > + ATTR_LIST(ioctl_get_tos), > + ATTR_LIST(socket_cancel_sent_to), > + ATTR_LIST(socket_cong_monitor), > + ATTR_LIST(socket_get_mr), > + ATTR_LIST(socket_get_mr_for_dest), > + ATTR_LIST(socket_free_mr), > + ATTR_LIST(socket_recverr), > + ATTR_LIST(socket_so_rxpath_latency), > + ATTR_LIST(socket_so_transport), > + NULL, > +}; > + > +static const struct attribute_group rds_feat_group = { > + .attrs = rds_feat_attrs, > + .name = "features", > +}; > + > +static struct kobject *rds_sysfs_kobj; > + > static void rds_exit(void) > { > sock_unregister(rds_family_ops.family); > @@ -882,6 +931,8 @@ static void rds_exit(void) > rds_stats_exit(); > rds_page_exit(); > rds_bind_lock_destroy(); > + sysfs_remove_group(rds_sysfs_kobj, &rds_feat_group); > + kobject_put(rds_sysfs_kobj); > rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info); > rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info); > #if IS_ENABLED(CONFIG_IPV6) > @@ -923,6 +974,15 @@ static int __init rds_init(void) > if (ret) > goto out_proto; > > + rds_sysfs_kobj = kobject_create_and_add("rds", kernel_kobj); > + if (!rds_sysfs_kobj) { > + ret = -ENOMEM; > + goto out_proto; > + } > + ret = sysfs_create_group(rds_sysfs_kobj, &rds_feat_group); > + if (ret) > + goto out_kobject; > + > rds_info_register_func(RDS_INFO_SOCKETS, rds_sock_info); > rds_info_register_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info); > #if IS_ENABLED(CONFIG_IPV6) > @@ -931,7 +991,8 @@ static int __init rds_init(void) > #endif > > goto out; > - > +out_kobject: > + kobject_put(rds_sysfs_kobj); > out_proto: > proto_unregister(&rds_proto); > out_stats: ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-06-22 17:21 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-21 0:26 [PATCH net-next v4] Expose RDS features via sysfs Konrad Rzeszutek Wilk 2025-06-21 0:26 ` [PATCH net-next v4] rds: Expose feature parameters via sysfs (and ELF) Konrad Rzeszutek Wilk 2025-06-22 17:21 ` Allison Henderson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).