* Re: [PATCH RFC net-next v4 7/8] vsock: Add lockless sendmsg() support
From: Stefano Garzarella @ 2023-06-22 16:37 UTC
To: Bobby Eshleman
Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, VMware PV-Drivers Reviewers, Dan Carpenter,
Simon Horman, Krasnov Arseniy, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf
In-Reply-To: <20230413-b4-vsock-dgram-v4-7-0cebbb2ae899@bytedance.com>
On Sat, Jun 10, 2023 at 12:58:34AM +0000, Bobby Eshleman wrote:
>Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock,
>it does not scale when many senders share a socket.
>
>Prior to this patch the socket lock is used to protect both reads and
>writes to the local_addr, remote_addr, transport, and buffer size
>variables of a vsock socket. What follows are the new protection schemes
>for these fields that ensure a race-free and usually lock-free
>multi-sender sendmsg() path for vsock dgrams.
>
>- local_addr
>local_addr changes as a result of binding a socket. The write path
>for local_addr is bind() and various vsock_auto_bind() call sites.
>After a socket has been bound via vsock_auto_bind() or bind(), subsequent
>calls to bind()/vsock_auto_bind() do not write to local_addr again. bind()
>rejects the user request and vsock_auto_bind() early exits.
>Therefore, local_addr cannot change while a parallel thread is
>in sendmsg(), and lock-free reads of local_addr in sendmsg() are safe.
>Change: only acquire lock for auto-binding as-needed in sendmsg().
>
>- buffer size variables
>Not used by dgram, so they do not need protection. No change.
>
>- remote_addr and transport
>Because a remote_addr update may result in a changed transport, and we
>would like to read these two fields lock-free but coherently in the
>vsock send path, this patch packages these two fields into a new
>struct vsock_remote_info that is referenced by an RCU-protected pointer.
>
>Writes are synchronized as usual by the socket lock. Reads only take
>place in RCU read-side critical sections. When remote_addr or transport
>is updated, a new remote info is allocated. Old readers still see the
>old coherent remote_addr/transport pair, and new readers will refer to
>the new coherent pair. The coherency between remote_addr and transport
>previously provided by the socket lock alone is now also preserved by
>RCU, except with the highly-scalable lock-free read-side.
>
>Helpers are introduced for accessing and updating the new pointer.
>
>The new structure contains an rcu_head so that kfree_rcu() can be
>used. This removes the need for writers to call synchronize_rcu() before
>freeing old structures, which is more efficient and reduces code
>churn where remote_addr/transport are already being updated inside RCU
>read-side sections.
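To make the coherency argument concrete, here is a minimal userspace sketch, assuming C11 atomics as a stand-in for rcu_assign_pointer()/rcu_dereference(). The names remote_info, remote_info_update() and remote_info_read() are hypothetical, and reclamation of the old object is left to the caller, because the kernel patch relies on kfree_rcu() for that step and this model does not reproduce RCU grace periods.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vsock_remote_info: the remote address and
 * the transport it implies live in one object, so they change together. */
struct remote_info {
	uint32_t remote_cid;
	uint32_t remote_port;
	int transport_id;	/* models const struct vsock_transport * */
};

/* The single published pointer (RCU-protected in the kernel patch). */
static _Atomic(struct remote_info *) remote_info_ptr;

/* Writer: assumes the caller holds the socket lock, so writers never race.
 * Builds a new pair, publishes it with one atomic exchange and hands back the
 * previous object in *old. The caller must not free *old until no reader can
 * still be using it (kfree_rcu() provides that guarantee in the kernel).
 * Returns -1 on allocation failure, leaving the current pair published. */
static int remote_info_update(uint32_t cid, uint32_t port, int transport_id,
			      struct remote_info **old)
{
	struct remote_info *info = malloc(sizeof(*info));

	if (!info)
		return -1;
	info->remote_cid = cid;
	info->remote_port = port;
	info->transport_id = transport_id;

	/* Release ordering makes the fields visible before the pointer. */
	*old = atomic_exchange_explicit(&remote_info_ptr, info,
					memory_order_acq_rel);
	return 0;
}

/* Reader: one acquire load yields a coherent snapshot of both fields.
 * In the kernel this sits inside rcu_read_lock()/rcu_read_unlock().
 * Returns -1 if nothing has been published yet. */
static int remote_info_read(struct remote_info *snap)
{
	struct remote_info *info =
		atomic_load_explicit(&remote_info_ptr, memory_order_acquire);

	if (!info)
		return -1;
	*snap = *info;
	return 0;
}
```

A reader never sees a new remote_cid paired with an old transport, because both fields only change by swapping the single published pointer.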
>
>Only virtio has been tested, but updates were necessary to the VMCI and
>hyperv code. Unfortunately the author does not have access to
>VMCI/hyperv systems so those changes are untested.
@Dexuan, @Vishnu, @Bryan, can you test this?
>
>Perf Tests (results from patch v2)
>vCPUS: 16
>Threads: 16
>Payload: 4KB
>Test Runs: 5
>Type: SOCK_DGRAM
>
>Before: 245.2 MB/s
>After: 509.2 MB/s (+107%)
>
>Notably, on the same test system, vsock dgram even outperforms
>multi-threaded UDP over virtio-net with vhost and MQ support enabled.
>
>Throughput metrics for single-threaded SOCK_DGRAM and
>single/multi-threaded SOCK_STREAM showed no statistically significant
>throughput changes (lowest p-value 0.27), with the mean difference
>ranging from -5% to +1%.
>
Quite nice. Did you also see any improvements on stream/seqpacket
sockets?
However, this is a big change, so maybe I would move it to another
series, because it will take time to review and test properly.
WDYT?
Thanks,
Stefano
* Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams
From: Stefano Garzarella @ 2023-06-22 16:31 UTC
To: Bobby Eshleman
Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, VMware PV-Drivers Reviewers, Dan Carpenter,
Simon Horman, Krasnov Arseniy, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf
In-Reply-To: <20230413-b4-vsock-dgram-v4-6-0cebbb2ae899@bytedance.com>
On Sat, Jun 10, 2023 at 12:58:33AM +0000, Bobby Eshleman wrote:
>This commit adds support for datagrams over virtio/vsock.
>
>Message boundaries are preserved on a per-skb and per-vq entry basis.
>Messages are copied in whole from the user to an SKB, which in turn is
>added to the scatterlist for the virtqueue in whole for the device.
>Messages do not straddle skbs and they do not straddle packets.
>Messages may be truncated by the receiving user if their buffer is
>shorter than the message.
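The truncation behaviour can be illustrated with a small model, assuming the usual datagram recvmsg(2) convention of copying at most the buffer length and reporting MSG_TRUNC. dgram_copy_to_user() is a hypothetical name, and whether this patch sets MSG_TRUNC is not shown here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>	/* MSG_TRUNC */

/* Hypothetical model of receiving one datagram: the whole message lives in a
 * single skb, so the receiver copies at most buflen bytes and the remainder
 * is discarded. Returns the number of bytes copied and sets MSG_TRUNC in
 * *msg_flags when the message was longer than the user buffer. */
static size_t dgram_copy_to_user(const char *msg, size_t msg_len,
				 char *buf, size_t buflen, int *msg_flags)
{
	size_t n = msg_len < buflen ? msg_len : buflen;

	memcpy(buf, msg, n);
	if (n < msg_len)
		*msg_flags |= MSG_TRUNC;
	return n;
}
```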
>
>Other properties of vsock datagrams:
>- Datagrams self-throttle at the per-socket sk_sndbuf threshold.
>- The same virtqueue is used as for streams and seqpacket flows
>- Credits are not used for datagrams
>- Packets are dropped silently by the device, which means the virtqueue
> will still get kicked even during high packet loss, so long as the
> socket does not exceed sk_sndbuf.
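A simplified model of the sk_sndbuf self-throttling follows, assuming the admission check used by sock_alloc_send_skb(): allocation proceeds while queued bytes are below sk_sndbuf, otherwise a non-blocking sender gets -EAGAIN. The sock_model struct and its helpers are hypothetical, blocking waits are not simulated, and skb truesize is approximated by the payload size.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of per-socket send-buffer accounting. */
struct sock_model {
	size_t wmem_alloc;	/* bytes held by queued, not yet freed skbs */
	size_t sndbuf;		/* sk_sndbuf limit */
};

/* Admit a new packet only while in-flight bytes are below sndbuf; a
 * non-blocking sender gets -EAGAIN (a blocking one would sleep instead). */
static int model_alloc_send(struct sock_model *sk, size_t size)
{
	if (sk->wmem_alloc >= sk->sndbuf)
		return -EAGAIN;
	sk->wmem_alloc += size;
	return 0;
}

/* Runs when the device has consumed the packet (the skb destructor),
 * whether or not the packet was later dropped. */
static void model_free_send(struct sock_model *sk, size_t size)
{
	sk->wmem_alloc -= size;
}
```

Because dropped packets still free their skb once the device consumes them, wmem_alloc keeps draining and the sender keeps kicking the virtqueue, which matches the high-loss behaviour in the last bullet.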
>
>Future work might include finding a way to reduce the virtqueue kick
>rate for datagram flows with high packet loss.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>---
> drivers/vhost/vsock.c | 27 ++++-
> include/linux/virtio_vsock.h | 5 +-
> include/net/af_vsock.h | 1 +
> include/uapi/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/af_vsock.c | 58 +++++++--
> net/vmw_vsock/virtio_transport.c | 23 +++-
> net/vmw_vsock/virtio_transport_common.c | 207 ++++++++++++++++++++++++--------
> net/vmw_vsock/vsock_loopback.c | 8 +-
> 8 files changed, 264 insertions(+), 66 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 8f0082da5e70..159c1a22c1a8 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -32,7 +32,8 @@
> enum {
> VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
>- (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
>+ (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
>+ (1ULL << VIRTIO_VSOCK_F_DGRAM)
> };
>
> enum {
>@@ -56,6 +57,7 @@ struct vhost_vsock {
> atomic_t queued_replies;
>
> u32 guest_cid;
>+ bool dgram_allow;
> bool seqpacket_allow;
> };
>
>@@ -394,6 +396,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> return val < vq->num;
> }
>
>+static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport vhost_transport = {
>@@ -410,10 +413,11 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
>- .dgram_allow = virtio_transport_dgram_allow,
>+ .dgram_allow = vhost_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
> .dgram_get_port = virtio_transport_dgram_get_port,
> .dgram_get_length = virtio_transport_dgram_get_length,
>+ .dgram_payload_offset = 0,
>
> .stream_enqueue = virtio_transport_stream_enqueue,
> .stream_dequeue = virtio_transport_stream_dequeue,
>@@ -446,6 +450,22 @@ static struct virtio_transport vhost_transport = {
> .send_pkt = vhost_transport_send_pkt,
> };
>
>+static bool vhost_transport_dgram_allow(u32 cid, u32 port)
>+{
>+ struct vhost_vsock *vsock;
>+ bool dgram_allow = false;
>+
>+ rcu_read_lock();
>+ vsock = vhost_vsock_get(cid);
>+
>+ if (vsock)
>+ dgram_allow = vsock->dgram_allow;
>+
>+ rcu_read_unlock();
>+
>+ return dgram_allow;
>+}
>+
> static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> {
> struct vhost_vsock *vsock;
>@@ -802,6 +822,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
>+ if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
>+ vsock->dgram_allow = true;
>+
> for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> vq = &vsock->vqs[i];
> mutex_lock(&vq->mutex);
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 73afa09f4585..237ca87a2ecd 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -216,7 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> bool virtio_transport_stream_allow(u32 cid, u32 port);
>-bool virtio_transport_dgram_allow(u32 cid, u32 port);
> int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
>@@ -247,4 +246,8 @@ void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
> void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
> int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list);
> int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t read_actor);
>+void virtio_transport_init_dgram_bind_tables(void);
>+int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
>+int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>+int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
> #endif /* _LINUX_VIRTIO_VSOCK_H */
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index 7bedb9ee7e3e..c115e655b4f5 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -225,6 +225,7 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
> void (*fn)(struct sock *sk));
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
>+struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr);
>
> /**** TAP ****/
>
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index 9c25f267bbc0..27b4b2b8bf13 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> enum virtio_vsock_type {
> VIRTIO_VSOCK_TYPE_STREAM = 1,
> VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
>+ VIRTIO_VSOCK_TYPE_DGRAM = 3,
> };
>
> enum virtio_vsock_op {
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 7a3ca4270446..b0b18e7f4299 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
I would split this patch in 2: one with the changes in af_vsock.c,
and one for the virtio changes.
>@@ -114,6 +114,7 @@
> static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> static void vsock_sk_destruct(struct sock *sk);
> static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
>+static bool sock_type_connectible(u16 type);
>
> /* Protocol family. */
> struct proto vsock_proto = {
>@@ -180,6 +181,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
> EXPORT_SYMBOL_GPL(vsock_connected_table);
> DEFINE_SPINLOCK(vsock_table_lock);
> EXPORT_SYMBOL_GPL(vsock_table_lock);
>+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE];
>+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
>
> /* Autobind this socket to the local address if necessary. */
> static int vsock_auto_bind(struct vsock_sock *vsk)
>@@ -202,6 +205,9 @@ static void vsock_init_tables(void)
>
> for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> INIT_LIST_HEAD(&vsock_connected_table[i]);
>+
>+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
>+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
> }
>
> static void __vsock_insert_bound(struct list_head *list,
>@@ -230,8 +236,8 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
>-struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>- struct list_head *bind_table)
>+static struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>+ struct list_head *bind_table)
> {
> struct vsock_sock *vsk;
>
>@@ -248,6 +254,23 @@ struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
> return NULL;
> }
>
>+struct sock *
>+vsock_find_bound_dgram_socket(struct sockaddr_vm *addr)
>+{
>+ struct sock *sk;
>+
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ sk = vsock_find_bound_socket_common(addr,
>+ &vsock_dgram_bind_table[VSOCK_HASH(addr)]);
>+ if (sk)
>+ sock_hold(sk);
>+
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+
>+ return sk;
>+}
>+EXPORT_SYMBOL_GPL(vsock_find_bound_dgram_socket);
>+
> static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> {
> return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
>@@ -287,6 +310,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_insert_connected);
>
>+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
>+{
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ if (__vsock_in_bound_table(vsk))
>+ __vsock_remove_bound(vsk);
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+}
>+
> void vsock_remove_bound(struct vsock_sock *vsk)
> {
> spin_lock_bh(&vsock_table_lock);
>@@ -338,7 +369,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>
> void vsock_remove_sock(struct vsock_sock *vsk)
> {
>- vsock_remove_bound(vsk);
>+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
>+ vsock_remove_bound(vsk);
>+ else
>+ vsock_remove_dgram_bound(vsk);
> vsock_remove_connected(vsk);
> }
> EXPORT_SYMBOL_GPL(vsock_remove_sock);
>@@ -720,11 +754,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
> }
>
>-static int __vsock_bind_dgram(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr)
>+static int vsock_bind_dgram(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr)
> {
>- if (!vsk->transport || !vsk->transport->dgram_bind)
>- return -EINVAL;
>+ if (!vsk->transport || !vsk->transport->dgram_bind) {
>+ int retval;
>+
>+ spin_lock_bh(&vsock_dgram_table_lock);
>+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
>+ VSOCK_HASH_SIZE);
>+ spin_unlock_bh(&vsock_dgram_table_lock);
>+
>+ return retval;
>+ }
>
> return vsk->transport->dgram_bind(vsk, addr);
> }
>@@ -755,7 +797,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
> break;
>
> case SOCK_DGRAM:
>- retval = __vsock_bind_dgram(vsk, addr);
>+ retval = vsock_bind_dgram(vsk, addr);
> break;
>
> default:
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 1b7843a7779a..7160a3104218 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -63,6 +63,7 @@ struct virtio_vsock {
>
> u32 guest_cid;
> bool seqpacket_allow;
>+ bool dgram_allow;
> };
>
> static u32 virtio_transport_get_local_cid(void)
>@@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> }
>
>+static bool virtio_transport_dgram_allow(u32 cid, u32 port);
> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
>@@ -465,6 +467,21 @@ static struct virtio_transport virtio_transport = {
> .send_pkt = virtio_transport_send_pkt,
> };
>
>+static bool virtio_transport_dgram_allow(u32 cid, u32 port)
>+{
>+ struct virtio_vsock *vsock;
>+ bool dgram_allow;
>+
>+ dgram_allow = false;
>+ rcu_read_lock();
>+ vsock = rcu_dereference(the_virtio_vsock);
>+ if (vsock)
>+ dgram_allow = vsock->dgram_allow;
>+ rcu_read_unlock();
>+
>+ return dgram_allow;
>+}
>+
> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
> {
> struct virtio_vsock *vsock;
>@@ -658,6 +675,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
> vsock->seqpacket_allow = true;
>
>+ if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
>+ vsock->dgram_allow = true;
>+
> vdev->priv = vsock;
>
> ret = virtio_vsock_vqs_init(vsock);
>@@ -750,7 +770,8 @@ static struct virtio_device_id id_table[] = {
> };
>
> static unsigned int features[] = {
>- VIRTIO_VSOCK_F_SEQPACKET
>+ VIRTIO_VSOCK_F_SEQPACKET,
>+ VIRTIO_VSOCK_F_DGRAM
> };
>
> static struct virtio_driver virtio_vsock_driver = {
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index d5a3c8efe84b..bc9d459723f5 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -37,6 +37,35 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
> return container_of(t, struct virtio_transport, transport);
> }
>
>+/* Requires info->msg and info->vsk */
>+static struct sk_buff *
>+virtio_transport_sock_alloc_send_skb(struct virtio_vsock_pkt_info *info, unsigned int size,
>+ gfp_t mask, int *err)
>+{
>+ struct sk_buff *skb;
>+ struct sock *sk;
>+ int noblock;
>+
>+ if (size < VIRTIO_VSOCK_SKB_HEADROOM) {
>+ *err = -EINVAL;
>+ return NULL;
>+ }
>+
>+ if (info->msg)
>+ noblock = info->msg->msg_flags & MSG_DONTWAIT;
>+ else
>+ noblock = 1;
>+
>+ sk = sk_vsock(info->vsk);
>+ sk->sk_allocation = mask;
>+ skb = sock_alloc_send_skb(sk, size, noblock, err);
>+ if (!skb)
>+ return NULL;
>+
>+ skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
>+ return skb;
>+}
>+
> /* Returns a new packet on success, otherwise returns NULL.
> *
> * If NULL is returned, errp is set to a negative errno.
^
So this comment was wrong before this change?
>@@ -47,7 +76,8 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
> u32 src_cid,
> u32 src_port,
> u32 dst_cid,
>- u32 dst_port)
>+ u32 dst_port,
>+ int *errp)
> {
> const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
> struct virtio_vsock_hdr *hdr;
>@@ -55,9 +85,21 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
> void *payload;
> int err;
>
>- skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>- if (!skb)
>+ /* dgrams do not use credits, self-throttle according to sk_sndbuf
>+ * using sock_alloc_send_skb. This helps avoid triggering the OOM.
>+ */
I'm wondering whether we should do something similar for the other types too...
>+ if (info->vsk && info->type == VIRTIO_VSOCK_TYPE_DGRAM) {
>+ skb = virtio_transport_sock_alloc_send_skb(info,
>+ skb_len, GFP_KERNEL, &err);
Why not use errp here?
>+ } else {
>+ skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
Maybe we can also pass errp to virtio_vsock_alloc_skb().
The rest LGTM.
Anyway, the implementation seems to work well, so I think we should now
discuss the virtio-spec changes, which with this approach should not be
big, right?
Thanks,
Stefano
* [PATCH v2] net: mana: Batch ringing RX queue doorbell on receiving packets
From: longli @ 2023-06-22 16:22 UTC
To: Jason Gunthorpe, Leon Romanovsky, Ajay Sharma, Dexuan Cui,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-rdma, linux-hyperv, netdev, linux-kernel, Long Li, stable
From: Long Li <longli@microsoft.com>
It's inefficient to ring the doorbell page every time a WQE is posted to
the receive queue.
Move the code that rings the doorbell page so that it runs once, after
all WQEs have been posted to the receive queue during a callback from
napi_poll().
Tests showed no regression in network latency benchmarks.
Cc: stable@vger.kernel.org
Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Long Li <longli@microsoft.com>
---
Change log:
v2:
Check for comp_read > 0 as it might be negative on completion error.
Set rq.wqe_cnt to 0 according to BNIC spec.
drivers/net/ethernet/microsoft/mana/gdma_main.c | 5 ++++-
drivers/net/ethernet/microsoft/mana/mana_en.c | 10 ++++++++--
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..ef11d09a3655 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -300,8 +300,11 @@ static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
void mana_gd_wq_ring_doorbell(struct gdma_context *gc, struct gdma_queue *queue)
{
+ /* BNIC Spec specifies that client should set 0 for rq.wqe_cnt
+ * This value is not used in sq
+ */
mana_gd_ring_doorbell(gc, queue->gdma_dev->doorbell, queue->type,
- queue->id, queue->head * GDMA_WQE_BU_SIZE, 1);
+ queue->id, queue->head * GDMA_WQE_BU_SIZE, 0);
}
void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index cd4d5ceb9f2d..1d8abe63fcb8 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1383,8 +1383,8 @@ static void mana_post_pkt_rxq(struct mana_rxq *rxq)
recv_buf_oob = &rxq->rx_oobs[curr_index];
- err = mana_gd_post_and_ring(rxq->gdma_rq, &recv_buf_oob->wqe_req,
- &recv_buf_oob->wqe_inf);
+ err = mana_gd_post_work_request(rxq->gdma_rq, &recv_buf_oob->wqe_req,
+ &recv_buf_oob->wqe_inf);
if (WARN_ON_ONCE(err))
return;
@@ -1654,6 +1654,12 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
mana_process_rx_cqe(rxq, cq, &comp[i]);
}
+ if (comp_read > 0) {
+ struct gdma_context *gc = rxq->gdma_rq->gdma_dev->gdma_context;
+
+ mana_gd_wq_ring_doorbell(gc, rxq->gdma_rq);
+ }
+
if (rxq->xdp_flush)
xdp_do_flush();
}
--
2.34.1
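The effect of the batching can be sketched with a toy RX queue that counts doorbell writes. This is a hypothetical model (rxq_model, poll_old() and poll_new() are not driver functions) that assumes one WQE is reposted per processed completion and that the doorbell publishes the current head.

```c
/* Hypothetical RX queue: posting a WQE only advances the producer index;
 * the device learns about new WQEs when the doorbell is written. */
struct rxq_model {
	unsigned int head;	/* producer index, in WQE units */
	unsigned int doorbells;	/* number of MMIO doorbell writes */
};

static void post_wqe(struct rxq_model *q) { q->head++; }
static void ring_doorbell(struct rxq_model *q) { q->doorbells++; }

/* Before the patch: post and ring once per completion. */
static void poll_old(struct rxq_model *q, int comp_read)
{
	for (int i = 0; i < comp_read; i++) {
		post_wqe(q);
		ring_doorbell(q);
	}
}

/* After the patch: post everything, then ring once, and only when
 * comp_read > 0 (it may be negative on a completion error). */
static void poll_new(struct rxq_model *q, int comp_read)
{
	for (int i = 0; i < comp_read; i++)
		post_wqe(q);
	if (comp_read > 0)
		ring_doorbell(q);
}
```

Both variants leave the same head value for the device; only the number of doorbell writes per poll changes.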
* Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams
From: Stefano Garzarella @ 2023-06-22 16:09 UTC
To: Arseniy Krasnov
Cc: Bobby Eshleman, Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang,
Xuan Zhuo, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers,
Dan Carpenter, Simon Horman, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf
In-Reply-To: <92b3a6df-ded3-6470-39d1-fe0939441abc@gmail.com>
On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote:
>Hello Bobby!
>
>On 10.06.2023 03:58, Bobby Eshleman wrote:
>> This commit adds support for datagrams over virtio/vsock.
>>
>> Message boundaries are preserved on a per-skb and per-vq entry basis.
>
>I'm a little bit confused about the following case: let's say vhost sends a 4097-byte
>datagram to the guest. The guest uses 4096-byte RX buffers in its virtio queue, and each
>buffer has an empty skb attached to it. Vhost places the first 4096 bytes in the first
>buffer of the guest's RX queue, and the last byte in the second buffer. Now, IIUC, the guest
>has two skbs in its RX queue, and the user in the guest wants to read the data: does it read
>4097 bytes, while the guest has two skbs of 4096 bytes and 1 byte? In seqpacket there is a
>special marker in the header which shows where a message ends; how does it work here?
I think the main difference is that DGRAM is not connection-oriented, so
we don't have a stream and we can't split the packet into 2 (maybe we
could, but we have no guarantee that, for example, the second one will
not be discarded because there is no space).
So I think it is acceptable as a restriction to keep it simple.
My only doubt is, should we make the RX buffer size configurable,
instead of always using 4k?
Thanks,
Stefano
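The 4097-byte case above reduces to simple arithmetic: a payload spans ceil(len / buf_size) fixed-size RX buffers, and a datagram without an end-of-message marker can only be delivered intact if that count is at most one. The helpers below are a hypothetical sketch that ignores the virtio_vsock_hdr headroom in each buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of fixed-size RX buffers a payload of len bytes would span
 * (integer ceiling division). */
static size_t rx_bufs_needed(size_t len, size_t buf_size)
{
	return (len + buf_size - 1) / buf_size;
}

/* Hypothetical rule: since a datagram may not straddle buffers, only
 * payloads that fit in a single RX buffer can be delivered intact. */
static bool dgram_fits(size_t len, size_t buf_size)
{
	return rx_bufs_needed(len, buf_size) <= 1;
}
```

Making the RX buffer size configurable, as asked above, would move the largest deliverable datagram along with it.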
* Re: [PATCH RFC net-next v4 5/8] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
From: Stefano Garzarella @ 2023-06-22 15:29 UTC
To: Bobby Eshleman
Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, VMware PV-Drivers Reviewers, Dan Carpenter,
Simon Horman, Krasnov Arseniy, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf, Jiang Wang
In-Reply-To: <20230413-b4-vsock-dgram-v4-5-0cebbb2ae899@bytedance.com>
On Sat, Jun 10, 2023 at 12:58:32AM +0000, Bobby Eshleman wrote:
>This commit adds a feature bit for virtio vsock to support datagrams.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>---
> include/uapi/linux/virtio_vsock.h | 1 +
> 1 file changed, 1 insertion(+)
LGTM, but I'll give the R-b when we merge the virtio-spec.
Stefano
>
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index 64738838bee5..9c25f267bbc0 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -40,6 +40,7 @@
>
> /* The feature bitmap for virtio vsock */
> #define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */
>+#define VIRTIO_VSOCK_F_DGRAM 3 /* SOCK_DGRAM supported */
>
> struct virtio_vsock_config {
> __le64 guest_cid;
>
>--
>2.30.2
>
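For reference, a negotiated feature bit is tested by masking the 64-bit feature word with 1ULL << bit, in the same way as the `features & (1ULL << VIRTIO_VSOCK_F_DGRAM)` check added to vhost_vsock_set_features() in patch 6/8. The bit values below are copied from this series; has_feature() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as defined in include/uapi/linux/virtio_vsock.h above. */
#define VIRTIO_VSOCK_F_SEQPACKET 1
#define VIRTIO_VSOCK_F_DGRAM 3

/* True if feature bit 'bit' is set in the negotiated feature word. */
static bool has_feature(uint64_t features, unsigned int bit)
{
	return features & (1ULL << bit);
}
```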
* Re: [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable
From: Stefano Garzarella @ 2023-06-22 15:25 UTC
To: Bobby Eshleman
Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, VMware PV-Drivers Reviewers, Dan Carpenter,
Simon Horman, Krasnov Arseniy, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf
In-Reply-To: <20230413-b4-vsock-dgram-v4-4-0cebbb2ae899@bytedance.com>
On Sat, Jun 10, 2023 at 12:58:31AM +0000, Bobby Eshleman wrote:
>This commit makes the bind table management functions in vsock usable
>for different bind tables, for use by datagrams in a future patch.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>---
> net/vmw_vsock/af_vsock.c | 33 ++++++++++++++++++++++++++-------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index ef86765f3765..7a3ca4270446 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -230,11 +230,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> sock_put(&vsk->sk);
> }
>
>-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr,
>+ struct list_head *bind_table)
> {
> struct vsock_sock *vsk;
>
>- list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
>+ list_for_each_entry(vsk, bind_table, bound_table) {
> if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> return sk_vsock(vsk);
>
>@@ -247,6 +248,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> return NULL;
> }
>
>+static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
>+{
>+ return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr));
>+}
>+
> static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> struct sockaddr_vm *dst)
> {
>@@ -646,12 +652,17 @@ static void vsock_pending_work(struct work_struct *work)
>
> /**** SOCKET OPERATIONS ****/
>
>-static int __vsock_bind_connectible(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr)
>+static int vsock_bind_common(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr,
>+ struct list_head *bind_table,
>+ size_t table_size)
> {
> static u32 port;
> struct sockaddr_vm new_addr;
>
>+ if (table_size < VSOCK_HASH_SIZE)
>+ return -1;
Why do we need this check now?
>+
> if (!port)
> port = get_random_u32_above(LAST_RESERVED_PORT);
>
>@@ -667,7 +678,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>
> new_addr.svm_port = port++;
>
>- if (!__vsock_find_bound_socket(&new_addr)) {
>+ if (!vsock_find_bound_socket_common(&new_addr,
>+ &bind_table[VSOCK_HASH(addr)])) {
> found = true;
> break;
> }
>@@ -684,7 +696,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> return -EACCES;
> }
>
>- if (__vsock_find_bound_socket(&new_addr))
>+ if (vsock_find_bound_socket_common(&new_addr,
>+ &bind_table[VSOCK_HASH(addr)]))
> return -EADDRINUSE;
> }
>
>@@ -696,11 +709,17 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> * by AF_UNIX.
> */
> __vsock_remove_bound(vsk);
>- __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk);
>+ __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk);
>
> return 0;
> }
>
>+static int __vsock_bind_connectible(struct vsock_sock *vsk,
>+ struct sockaddr_vm *addr)
>+{
>+ return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
>+}
>+
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> struct sockaddr_vm *addr)
> {
>
>--
>2.30.2
>
The rest seems okay to me, but I agree with Simon's suggestion.
Stefano
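The auto-bind loop in vsock_bind_common() can be modeled in userspace to show how candidate ports interact with the hash buckets. This sketch is hypothetical: the constants are assumed to match VSOCK_HASH_SIZE, LAST_RESERVED_PORT and MAX_PORT_RETRIES in af_vsock.c, each bucket is a small fixed array, and the explicit-port path (-EACCES/-EADDRINUSE) is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define HASH_SIZE 251		/* assumed VSOCK_HASH_SIZE */
#define LAST_RESERVED 1023	/* assumed LAST_RESERVED_PORT */
#define MAX_RETRIES 24		/* assumed MAX_PORT_RETRIES */
#define BUCKET_CAP 4		/* capacity of this toy bucket */

static uint32_t bucket[HASH_SIZE][BUCKET_CAP];
static unsigned int bucket_len[HASH_SIZE];

/* Look the candidate up in the bucket its own port hashes to. */
static bool port_bound(uint32_t port)
{
	unsigned int b = port % HASH_SIZE;

	for (unsigned int i = 0; i < bucket_len[b]; i++)
		if (bucket[b][i] == port)
			return true;
	return false;
}

static int insert_bound(uint32_t port)
{
	unsigned int b = port % HASH_SIZE;

	if (bucket_len[b] == BUCKET_CAP)
		return -ENOSPC;
	bucket[b][bucket_len[b]++] = port;
	return 0;
}

/* Try up to MAX_RETRIES consecutive ports above the reserved range.
 * *next_port persists across calls, like the static port in
 * vsock_bind_common(). Returns the bound port or a negative errno. */
static int64_t bind_any_port(uint32_t *next_port)
{
	for (unsigned int i = 0; i < MAX_RETRIES; i++) {
		uint32_t candidate;

		if (*next_port <= LAST_RESERVED)
			*next_port = LAST_RESERVED + 1;
		candidate = (*next_port)++;
		if (!port_bound(candidate))
			return insert_bound(candidate) ? -ENOSPC : candidate;
	}
	return -EADDRNOTAVAIL;
}
```

The lookup hashes the candidate port, since that is the bucket in which a conflicting socket would have been inserted.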
* Re: [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams
From: Stefano Garzarella @ 2023-06-22 15:19 UTC
To: Bobby Eshleman
Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, VMware PV-Drivers Reviewers, Dan Carpenter,
Simon Horman, Krasnov Arseniy, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf
In-Reply-To: <20230413-b4-vsock-dgram-v4-3-0cebbb2ae899@bytedance.com>
On Sat, Jun 10, 2023 at 12:58:30AM +0000, Bobby Eshleman wrote:
>This patch adds support for multi-transport datagrams.
>
>This includes:
>- Per-packet lookup of transports when using sendto(sockaddr_vm)
>- Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> sockaddr_vm
>
>To preserve backwards compatibility with VMCI, some important changes
>were made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
>be used for dgrams iff there is not yet a g2h or h2g transport that has
s/iff/if
>been registered that can transmit the packet. If there is a g2h/h2g
>transport for that remote address, then that transport will be used and
>not "transport_dgram". This essentially makes "transport_dgram" a
>fallback transport for when h2g/g2h has not yet gone online, which
>appears to be the exact use case for VMCI.
>
>This design makes sense, because there is no reason that the
>transport_{g2h,h2g} cannot also service datagrams, which makes the role
>of transport_dgram difficult to understand outside of the VMCI context.
>
>The logic around "transport_dgram" had to be retained to prevent
>breaking VMCI:
>
>1) VMCI datagrams appear to function outside of the h2g/g2h
> paradigm. When the vmci transport comes online, it registers itself
> with the DGRAM feature, but not H2G/G2H. Only later when the
> transport has more information about its environment does it register
> H2G or G2H. In the case that a datagram socket becomes active
> after DGRAM registration but before G2H/H2G registration, the
> "transport_dgram" transport needs to be used.
IIRC we did this, because at that time only VMCI supported DGRAM. Now
that there are more transports, maybe DGRAM can follow the h2g/g2h
paradigm.
>
>2) VMCI seems to require a special message to be sent by the transport when a
> datagram socket calls bind(). Under the h2g/g2h model, the transport
> is selected using the remote_addr which is set by connect(). At
> bind time there is no remote_addr because often no connect() has been
> called yet: the transport is null. Therefore, with a null transport
> there doesn't seem to be any good way for a datagram socket to tell the
> VMCI transport that it has just had bind() called upon it.
@Vishnu, @Bryan do you think we can avoid this in some way?
>
>Only transports with a special datagram fallback use-case such as VMCI
>need to register VSOCK_TRANSPORT_F_DGRAM.
Maybe we should rename it to VSOCK_TRANSPORT_F_DGRAM_FALLBACK or
something like that.
In any case, we definitely need to update the comment in
include/net/af_vsock.h above VSOCK_TRANSPORT_F_DGRAM to mention
this.
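The fallback rule described in this patch can be sketched as a small selection function. This is a hypothetical userspace model: it follows the h2g/g2h choice made by vsock_connectible_lookup_transport() as understood here, omits the local (loopback) transport, and uses plain pointers for the registered transports. VMADDR_CID_HOST and VMADDR_FLAG_TO_HOST take their values from <linux/vm_sockets.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define VMADDR_CID_HOST 2
#define VMADDR_FLAG_TO_HOST 0x01

struct transport { const char *name; };

/* Hypothetical registry standing in for transport_h2g, transport_g2h and
 * transport_dgram; NULL means "not registered yet". */
static const struct transport *t_h2g, *t_g2h, *t_dgram;

static const struct transport *dgram_lookup(uint32_t cid, uint8_t flags)
{
	const struct transport *t;

	/* Host CID, no h2g transport, or explicit TO_HOST: go guest->host. */
	if (cid <= VMADDR_CID_HOST || !t_h2g || (flags & VMADDR_FLAG_TO_HOST))
		t = t_g2h;
	else
		t = t_h2g;

	/* Fall back to the DGRAM-only transport (the VMCI case) when the
	 * h2g/g2h transport for this address has not come online yet. */
	return t ? t : t_dgram;
}
```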
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>---
> drivers/vhost/vsock.c | 1 -
> include/linux/virtio_vsock.h | 2 -
> net/vmw_vsock/af_vsock.c | 78 +++++++++++++++++++++++++--------
> net/vmw_vsock/hyperv_transport.c | 6 ---
> net/vmw_vsock/virtio_transport.c | 1 -
> net/vmw_vsock/virtio_transport_common.c | 7 ---
> net/vmw_vsock/vsock_loopback.c | 1 -
> 7 files changed, 60 insertions(+), 36 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index c8201c070b4b..8f0082da5e70 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> .cancel_pkt = vhost_transport_cancel_pkt,
>
> .dgram_enqueue = virtio_transport_dgram_enqueue,
>- .dgram_bind = virtio_transport_dgram_bind,
> .dgram_allow = virtio_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
> .dgram_get_port = virtio_transport_dgram_get_port,
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 23521a318cf0..73afa09f4585 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -216,8 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> bool virtio_transport_stream_allow(u32 cid, u32 port);
>-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr);
> bool virtio_transport_dgram_allow(u32 cid, u32 port);
> int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 74358f0b47fa..ef86765f3765 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -438,6 +438,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> return transport;
> }
>
>+static const struct vsock_transport *
>+vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
>+{
>+ const struct vsock_transport *transport;
>+
>+ transport = vsock_connectible_lookup_transport(cid, flags);
>+ if (transport)
>+ return transport;
>+
>+ return transport_dgram;
>+}
>+
> /* Assign a transport to a socket and call the .init transport callback.
> *
> * Note: for connection oriented socket this must be called when vsk->remote_addr
>@@ -474,7 +486,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
>
> switch (sk->sk_type) {
> case SOCK_DGRAM:
>- new_transport = transport_dgram;
>+ new_transport = vsock_dgram_lookup_transport(remote_cid,
>+ remote_flags);
> break;
> case SOCK_STREAM:
> case SOCK_SEQPACKET:
>@@ -691,6 +704,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> struct sockaddr_vm *addr)
> {
>+ if (!vsk->transport || !vsk->transport->dgram_bind)
>+ return -EINVAL;
>+
> return vsk->transport->dgram_bind(vsk, addr);
> }
>
>@@ -1172,19 +1188,24 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>
> lock_sock(sk);
>
>- transport = vsk->transport;
>-
>- err = vsock_auto_bind(vsk);
>- if (err)
>- goto out;
>-
>-
> /* If the provided message contains an address, use that. Otherwise
> * fall back on the socket's remote handle (if it has been connected).
> */
> if (msg->msg_name &&
> vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> &remote_addr) == 0) {
>+ transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
>+ remote_addr->svm_flags);
>+ if (!transport) {
>+ err = -EINVAL;
>+ goto out;
>+ }
>+
>+ if (!try_module_get(transport->module)) {
>+ err = -ENODEV;
>+ goto out;
>+ }
>+
> /* Ensure this address is of the right type and is a valid
> * destination.
> */
>@@ -1193,11 +1214,27 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> remote_addr->svm_cid = transport->get_local_cid();
>
From here ...
> if (!vsock_addr_bound(remote_addr)) {
>+ module_put(transport->module);
>+ err = -EINVAL;
>+ goto out;
>+ }
>+
>+ if (!transport->dgram_allow(remote_addr->svm_cid,
>+ remote_addr->svm_port)) {
>+ module_put(transport->module);
> err = -EINVAL;
> goto out;
> }
>+
>+ err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
... to here looks like duplicate code; can we move it out of the if
block?
>+ module_put(transport->module);
> } else if (sock->state == SS_CONNECTED) {
> remote_addr = &vsk->remote_addr;
>+ transport = vsk->transport;
>+
>+ err = vsock_auto_bind(vsk);
>+ if (err)
>+ goto out;
>
> if (remote_addr->svm_cid == VMADDR_CID_ANY)
> remote_addr->svm_cid = transport->get_local_cid();
>@@ -1205,23 +1242,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> /* XXX Should connect() or this function ensure remote_addr is
> * bound?
> */
>- if (!vsock_addr_bound(&vsk->remote_addr)) {
>+ if (!vsock_addr_bound(remote_addr)) {
> err = -EINVAL;
> goto out;
> }
>- } else {
>- err = -EINVAL;
>- goto out;
>- }
>
>- if (!transport->dgram_allow(remote_addr->svm_cid,
>- remote_addr->svm_port)) {
>+ if (!transport->dgram_allow(remote_addr->svm_cid,
>+ remote_addr->svm_port)) {
>+ err = -EINVAL;
>+ goto out;
>+ }
>+
>+ err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
>+ } else {
> err = -EINVAL;
> goto out;
> }
>
>- err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
>-
> out:
> release_sock(sk);
> return err;
>@@ -1255,13 +1292,18 @@ static int vsock_dgram_connect(struct socket *sock,
> if (err)
> goto out;
>
>+ memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
>+
>+ err = vsock_assign_transport(vsk, NULL);
>+ if (err)
>+ goto out;
>+
> if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> remote_addr->svm_port)) {
> err = -EINVAL;
> goto out;
> }
>
>- memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> sock->state = SS_CONNECTED;
>
> /* sock map disallows redirection of non-TCP sockets with sk_state !=
>diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>index ff6e87e25fa0..c00bc5da769a 100644
>--- a/net/vmw_vsock/hyperv_transport.c
>+++ b/net/vmw_vsock/hyperv_transport.c
>@@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> kfree(hvs);
> }
>
>-static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
>-{
>- return -EOPNOTSUPP;
>-}
>-
> static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> {
> return -EOPNOTSUPP;
>@@ -841,7 +836,6 @@ static struct vsock_transport hvs_transport = {
> .connect = hvs_connect,
> .shutdown = hvs_shutdown,
>
>- .dgram_bind = hvs_dgram_bind,
> .dgram_get_cid = hvs_dgram_get_cid,
> .dgram_get_port = hvs_dgram_get_port,
> .dgram_get_length = hvs_dgram_get_length,
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 5763cdf13804..1b7843a7779a 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> .shutdown = virtio_transport_shutdown,
> .cancel_pkt = virtio_transport_cancel_pkt,
>
>- .dgram_bind = virtio_transport_dgram_bind,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index e6903c719964..d5a3c8efe84b 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -790,13 +790,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>
>-int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>- struct sockaddr_vm *addr)
>-{
>- return -EOPNOTSUPP;
>-}
>-EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>-
> int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> {
> return -EOPNOTSUPP;
>diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>index 2f3cabc79ee5..e9de45a26fbd 100644
>--- a/net/vmw_vsock/vsock_loopback.c
>+++ b/net/vmw_vsock/vsock_loopback.c
>@@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> .shutdown = virtio_transport_shutdown,
> .cancel_pkt = vsock_loopback_cancel_pkt,
>
>- .dgram_bind = virtio_transport_dgram_bind,
> .dgram_enqueue = virtio_transport_dgram_enqueue,
> .dgram_allow = virtio_transport_dgram_allow,
> .dgram_get_cid = virtio_transport_dgram_get_cid,
>
>--
>2.30.2
>
The rest LGTM!
Stefano
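A rough sketch of the restructuring suggested above: the two branches only select the destination address and the transport, and the shared checks, the enqueue and the release of the module reference happen once after the if/else. All types and functions here (struct sock_model, the demo transport, the refs counter standing in for try_module_get()/module_put()) are hypothetical user-space stand-ins, and auto-binding is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define CID_ANY (-1U) /* VMADDR_CID_ANY */

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct addr {
	unsigned int cid, port;
	bool bound;
};

struct transport {
	int refs; /* stands in for the module refcount */
	unsigned int (*get_local_cid)(void);
	bool (*dgram_allow)(unsigned int cid, unsigned int port);
	int (*dgram_enqueue)(const struct addr *dst, size_t len);
};

struct sock_model {
	bool connected;
	struct addr remote;
	struct transport *transport;
};

static int dgram_sendmsg(struct sock_model *s, struct addr *msg_name,
			 struct transport *(*lookup)(const struct addr *),
			 size_t len)
{
	struct transport *t;
	struct addr *remote;
	bool put = false;
	int err;

	/* The branches only pick the destination and the transport. */
	if (msg_name) {
		remote = msg_name;
		t = lookup(remote);
		if (!t)
			return -EINVAL;
		t->refs++; /* try_module_get() */
		put = true;
	} else if (s->connected) {
		remote = &s->remote;
		t = s->transport;
	} else {
		return -EINVAL;
	}

	/* Shared validation and enqueue, written once. */
	if (remote->cid == CID_ANY)
		remote->cid = t->get_local_cid();
	if (!remote->bound || !t->dgram_allow(remote->cid, remote->port))
		err = -EINVAL;
	else
		err = t->dgram_enqueue(remote, len);

	if (put)
		t->refs--; /* module_put() */
	return err;
}

/* Demo transport used by the usage example. */
static unsigned int demo_local_cid(void) { return 3; }
static bool demo_allow(unsigned int cid, unsigned int port)
{
	(void)cid;
	return port != 0;
}
static int demo_enqueue(const struct addr *dst, size_t len)
{
	(void)dst;
	return (int)len;
}
static struct transport demo = { 0, demo_local_cid, demo_allow, demo_enqueue };
static struct transport *demo_lookup(const struct addr *a)
{
	(void)a;
	return &demo;
}
```

Because there is a single exit path after the if/else, the reference taken in the msg_name branch is dropped in exactly one place, on success and on failure alike.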
* Re: [PATCH RFC net-next v4 2/8] vsock: refactor transport lookup code
From: Stefano Garzarella @ 2023-06-22 14:57 UTC
To: Bobby Eshleman
Cc: Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
Vishnu Dasa, VMware PV-Drivers Reviewers, Dan Carpenter,
Simon Horman, Krasnov Arseniy, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf
In-Reply-To: <20230413-b4-vsock-dgram-v4-2-0cebbb2ae899@bytedance.com>
On Sat, Jun 10, 2023 at 12:58:29AM +0000, Bobby Eshleman wrote:
>Introduce new reusable function vsock_connectible_lookup_transport()
>that performs the transport lookup logic.
>
>No functional change intended.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>---
> net/vmw_vsock/af_vsock.c | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index ffb4dd8b6ea7..74358f0b47fa 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -422,6 +422,22 @@ static void vsock_deassign_transport(struct vsock_sock *vsk)
> vsk->transport = NULL;
> }
>
>+static const struct vsock_transport *
>+vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
>+{
>+ const struct vsock_transport *transport;
>+
>+ if (vsock_use_local_transport(cid))
>+ transport = transport_local;
>+ else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
>+ (flags & VMADDR_FLAG_TO_HOST))
>+ transport = transport_g2h;
>+ else
>+ transport = transport_h2g;
>+
>+ return transport;
>+}
>+
> /* Assign a transport to a socket and call the .init transport callback.
> *
> * Note: for connection oriented socket this must be called when vsk->remote_addr
>@@ -462,13 +478,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> break;
> case SOCK_STREAM:
> case SOCK_SEQPACKET:
>- if (vsock_use_local_transport(remote_cid))
>- new_transport = transport_local;
>- else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
>- (remote_flags & VMADDR_FLAG_TO_HOST))
>- new_transport = transport_g2h;
>- else
>- new_transport = transport_h2g;
>+ new_transport = vsock_connectible_lookup_transport(remote_cid,
>+ remote_flags);
> break;
> default:
> return -ESOCKTNOSUPPORT;
>
>--
>2.30.2
>
* Re: [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue
From: Stefano Garzarella @ 2023-06-22 14:51 UTC
To: Arseniy Krasnov
Cc: Bobby Eshleman, Stefan Hajnoczi, Michael S. Tsirkin, Jason Wang,
Xuan Zhuo, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers,
Dan Carpenter, Simon Horman, kvm, virtualization, netdev,
linux-kernel, linux-hyperv, bpf
In-Reply-To: <3eb6216b-a3d2-e1ef-270c-8a0032a4a8a5@gmail.com>
On Sun, Jun 11, 2023 at 11:43:15PM +0300, Arseniy Krasnov wrote:
>Hello Bobby! Thanks for this patchset! Small comment below:
>
>On 10.06.2023 03:58, Bobby Eshleman wrote:
>> This commit drops the transport->dgram_dequeue callback and makes
>> vsock_dgram_recvmsg() generic. It also adds additional transport
>> callbacks for use by the generic vsock_dgram_recvmsg(), such as for
>> parsing skbs for CID/port which vary in format per transport.
>>
>> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> ---
>> drivers/vhost/vsock.c | 4 +-
>> include/linux/virtio_vsock.h | 3 ++
>> include/net/af_vsock.h | 13 ++++++-
>> net/vmw_vsock/af_vsock.c | 51 ++++++++++++++++++++++++-
>> net/vmw_vsock/hyperv_transport.c | 17 +++++++--
>> net/vmw_vsock/virtio_transport.c | 4 +-
>> net/vmw_vsock/virtio_transport_common.c | 18 +++++++++
>> net/vmw_vsock/vmci_transport.c | 68 +++++++++++++--------------------
>> net/vmw_vsock/vsock_loopback.c | 4 +-
>> 9 files changed, 132 insertions(+), 50 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index 6578db78f0ae..c8201c070b4b 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = {
>> .cancel_pkt = vhost_transport_cancel_pkt,
>>
>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>> .dgram_bind = virtio_transport_dgram_bind,
>> .dgram_allow = virtio_transport_dgram_allow,
>> + .dgram_get_cid = virtio_transport_dgram_get_cid,
>> + .dgram_get_port = virtio_transport_dgram_get_port,
>> + .dgram_get_length = virtio_transport_dgram_get_length,
>>
>> .stream_enqueue = virtio_transport_stream_enqueue,
>> .stream_dequeue = virtio_transport_stream_dequeue,
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index c58453699ee9..23521a318cf0 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port);
>> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> struct sockaddr_vm *addr);
>> bool virtio_transport_dgram_allow(u32 cid, u32 port);
>> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
>> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
>> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len);
>>
>> int virtio_transport_connect(struct vsock_sock *vsk);
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index 0e7504a42925..7bedb9ee7e3e 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -120,11 +120,20 @@ struct vsock_transport {
>>
>> /* DGRAM. */
>> int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
>> - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>> - size_t len, int flags);
>> int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
>> struct msghdr *, size_t len);
>> bool (*dgram_allow)(u32 cid, u32 port);
>> + int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
>> + int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
>> + int (*dgram_get_length)(struct sk_buff *skb, size_t *length);
>> +
>> + /* The number of bytes into the buffer at which the payload starts, as
>> + * first seen by the receiving socket layer. For example, if the
>> + * transport presets the skb pointers using skb_pull(sizeof(header))
>> + * then this would be zero, otherwise it would be the size of the
>> + * header.
>> + */
>> + const size_t dgram_payload_offset;
>>
>> /* STREAM. */
>> /* TODO: stream_bind() */
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index efb8a0937a13..ffb4dd8b6ea7 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock,
>> int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>> size_t len, int flags)
>> {
>> + const struct vsock_transport *transport;
>> #ifdef CONFIG_BPF_SYSCALL
>> const struct proto *prot;
>> #endif
>> struct vsock_sock *vsk;
>> + struct sk_buff *skb;
>> + size_t payload_len;
>> struct sock *sk;
>> + int err;
>>
>> sk = sock->sk;
>> vsk = vsock_sk(sk);
>> @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
>> return prot->recvmsg(sk, msg, len, flags, NULL);
>> #endif
>>
>> - return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
>> + if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>> + return -EOPNOTSUPP;
>> +
>> + transport = vsk->transport;
>> +
>> + /* Retrieve the head sk_buff from the socket's receive queue. */
>> + err = 0;
>> + skb = skb_recv_datagram(sk_vsock(vsk), flags, &err);
>> + if (!skb)
>> + return err;
>> +
>> + err = transport->dgram_get_length(skb, &payload_len);
What about returning ssize_t here instead?
Or maybe a single callback that returns both length and offset:
.dgram_get_payload_info(skb, &payload_len, &payload_off)
>> + if (err)
>> + goto out;
>> +
>> + if (payload_len > len) {
>> + payload_len = len;
>> + msg->msg_flags |= MSG_TRUNC;
>> + }
>> +
>> + /* Place the datagram payload in the user's iovec. */
>> + err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len);
>> + if (err)
>> + goto out;
>> +
>> + if (msg->msg_name) {
>> + /* Provide the address of the sender. */
>> + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>> + unsigned int cid, port;
>> +
>> + err = transport->dgram_get_cid(skb, &cid);
>> + if (err)
>> + goto out;
>> +
>> + err = transport->dgram_get_port(skb, &port);
>> + if (err)
>> + goto out;
>
>Maybe we can merge 'dgram_get_cid' and 'dgram_get_port' into a single callback? As far as I can see,
>this is the only place where both are used (correct me if I'm wrong), and logically both operate on
>the address: CID and port. E.g. something like dgram_get_cid_n_port().
What about .dgram_addr_init(struct sk_buff *skb, struct sockaddr_vm *addr),
so the transport can set both cid and port?
>
>Moreover, I'm not sure this is a good tradeoff: we remove the transport-specific callback for dgram
>receive, where we already have a 'msghdr' with both the data buffer and the buffer for 'sockaddr_vm',
>and instead add several new callbacks to the transports, like dgram_get_cid() and dgram_get_port().
>I agree that each transport-specific callback would repeat the same copying logic (calling
>'skb_copy_datagram_msg()' and filling the address with 'vsock_addr_init()'), but in that case we
>don't need to change the transports much. For example, HyperV stays unchanged, as it does not
>support SOCK_DGRAM. For VMCI, you just need to add the 'vsock_addr_init()' logic to its dgram
>dequeue callback.
>
>What do you think?
Honestly, I'd rather avoid duplicate code than minimize changes in
transports that don't support dgram.
One thing I do agree on, though, is minimizing the number of callbacks
we call, to reduce the number of indirect calls (better performance?).
Thanks,
Stefano
>
>Thanks, Arseniy
>
>> +
>> + vsock_addr_init(vm_addr, cid, port);
>> + msg->msg_namelen = sizeof(*vm_addr);
>> + }
>> + err = payload_len;
>> +
>> +out:
>> + skb_free_datagram(&vsk->sk, skb);
>> + return err;
>> }
>> EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
>>
>> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>> index 7cb1a9d2cdb4..ff6e87e25fa0 100644
>> --- a/net/vmw_vsock/hyperv_transport.c
>> +++ b/net/vmw_vsock/hyperv_transport.c
>> @@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
>> return -EOPNOTSUPP;
>> }
>>
>> -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
>> - size_t len, int flags)
>> +static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
>> {
>> return -EOPNOTSUPP;
>> }
>> @@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = {
>> .shutdown = hvs_shutdown,
>>
>> .dgram_bind = hvs_dgram_bind,
>> - .dgram_dequeue = hvs_dgram_dequeue,
>> + .dgram_get_cid = hvs_dgram_get_cid,
>> + .dgram_get_port = hvs_dgram_get_port,
>> + .dgram_get_length = hvs_dgram_get_length,
>> .dgram_enqueue = hvs_dgram_enqueue,
>> .dgram_allow = hvs_dgram_allow,
>>
>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>> index e95df847176b..5763cdf13804 100644
>> --- a/net/vmw_vsock/virtio_transport.c
>> +++ b/net/vmw_vsock/virtio_transport.c
>> @@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = {
>> .cancel_pkt = virtio_transport_cancel_pkt,
>>
>> .dgram_bind = virtio_transport_dgram_bind,
>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>> .dgram_allow = virtio_transport_dgram_allow,
>> + .dgram_get_cid = virtio_transport_dgram_get_cid,
>> + .dgram_get_port = virtio_transport_dgram_get_port,
>> + .dgram_get_length = virtio_transport_dgram_get_length,
>>
>> .stream_dequeue = virtio_transport_stream_dequeue,
>> .stream_enqueue = virtio_transport_stream_enqueue,
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index b769fc258931..e6903c719964 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>> }
>> EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>>
>> +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid);
>> +
>> +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port);
>> +
>> +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length);
>> +
>> bool virtio_transport_dgram_allow(u32 cid, u32 port)
>> {
>> return false;
>> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
>> index b370070194fa..bbc63826bf48 100644
>> --- a/net/vmw_vsock/vmci_transport.c
>> +++ b/net/vmw_vsock/vmci_transport.c
>> @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
>> return err - sizeof(*dg);
>> }
>>
>> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
>> - struct msghdr *msg, size_t len,
>> - int flags)
>> +static int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>> {
>> - int err;
>> struct vmci_datagram *dg;
>> - size_t payload_len;
>> - struct sk_buff *skb;
>>
>> - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
>> - return -EOPNOTSUPP;
>> + dg = (struct vmci_datagram *)skb->data;
>> + if (!dg)
>> + return -EINVAL;
>>
>> - /* Retrieve the head sk_buff from the socket's receive queue. */
>> - err = 0;
>> - skb = skb_recv_datagram(&vsk->sk, flags, &err);
>> - if (!skb)
>> - return err;
>> + *cid = dg->src.context;
>> + return 0;
>> +}
>> +
>> +static int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
>> +{
>> + struct vmci_datagram *dg;
>>
>> dg = (struct vmci_datagram *)skb->data;
>> if (!dg)
>> - /* err is 0, meaning we read zero bytes. */
>> - goto out;
>> -
>> - payload_len = dg->payload_size;
>> - /* Ensure the sk_buff matches the payload size claimed in the packet. */
>> - if (payload_len != skb->len - sizeof(*dg)) {
>> - err = -EINVAL;
>> - goto out;
>> - }
>> + return -EINVAL;
>>
>> - if (payload_len > len) {
>> - payload_len = len;
>> - msg->msg_flags |= MSG_TRUNC;
>> - }
>> + *port = dg->src.resource;
>> + return 0;
>> +}
>>
>> - /* Place the datagram payload in the user's iovec. */
>> - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
>> - if (err)
>> - goto out;
>> +static int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
>> +{
>> + struct vmci_datagram *dg;
>>
>> - if (msg->msg_name) {
>> - /* Provide the address of the sender. */
>> - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>> - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
>> - msg->msg_namelen = sizeof(*vm_addr);
>> - }
>> - err = payload_len;
>> + dg = (struct vmci_datagram *)skb->data;
>> + if (!dg)
>> + return -EINVAL;
>>
>> -out:
>> - skb_free_datagram(&vsk->sk, skb);
>> - return err;
>> + *len = dg->payload_size;
>> + return 0;
>> }
>>
>> static bool vmci_transport_dgram_allow(u32 cid, u32 port)
>> @@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = {
>> .release = vmci_transport_release,
>> .connect = vmci_transport_connect,
>> .dgram_bind = vmci_transport_dgram_bind,
>> - .dgram_dequeue = vmci_transport_dgram_dequeue,
>> .dgram_enqueue = vmci_transport_dgram_enqueue,
>> .dgram_allow = vmci_transport_dgram_allow,
>> + .dgram_get_cid = vmci_transport_dgram_get_cid,
>> + .dgram_get_port = vmci_transport_dgram_get_port,
>> + .dgram_get_length = vmci_transport_dgram_get_length,
>> + .dgram_payload_offset = sizeof(struct vmci_datagram),
>> .stream_dequeue = vmci_transport_stream_dequeue,
>> .stream_enqueue = vmci_transport_stream_enqueue,
>> .stream_has_data = vmci_transport_stream_has_data,
>> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
>> index 5c6360df1f31..2f3cabc79ee5 100644
>> --- a/net/vmw_vsock/vsock_loopback.c
>> +++ b/net/vmw_vsock/vsock_loopback.c
>> @@ -62,9 +62,11 @@ static struct virtio_transport loopback_transport = {
>> .cancel_pkt = vsock_loopback_cancel_pkt,
>>
>> .dgram_bind = virtio_transport_dgram_bind,
>> - .dgram_dequeue = virtio_transport_dgram_dequeue,
>> .dgram_enqueue = virtio_transport_dgram_enqueue,
>> .dgram_allow = virtio_transport_dgram_allow,
>> + .dgram_get_cid = virtio_transport_dgram_get_cid,
>> + .dgram_get_port = virtio_transport_dgram_get_port,
>> + .dgram_get_length = virtio_transport_dgram_get_length,
>>
>> .stream_dequeue = virtio_transport_stream_dequeue,
>> .stream_enqueue = virtio_transport_stream_enqueue,
>>
>
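A sketch of the two merged callbacks floated in this thread (a single .dgram_get_payload_info() returning length and offset, and a single .dgram_addr_init() filling CID and port), assuming a made-up header layout loosely based on the VMCI fields used in the patch. Everything here is hypothetical user-space code, not the af_vsock.h interface. It also keeps the payload-size sanity check that the old vmci_transport_dgram_dequeue() performed and that vmci_transport_dgram_get_length() above no longer does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header: source context/resource and payload size. */
struct demo_hdr {
	uint32_t src_cid;
	uint32_t src_port;
	uint64_t payload_size;
};

struct demo_addr {
	unsigned int cid, port;
};

/* Hypothetical merged callbacks, as proposed in review. */
struct demo_dgram_ops {
	int (*get_payload_info)(const unsigned char *buf, size_t buflen,
				size_t *len, size_t *off);
	int (*addr_init)(const unsigned char *buf, size_t buflen,
			 struct demo_addr *addr);
};

static int demo_get_payload_info(const unsigned char *buf, size_t buflen,
				 size_t *len, size_t *off)
{
	struct demo_hdr h;

	if (buflen < sizeof(h))
		return -EINVAL;
	memcpy(&h, buf, sizeof(h));
	/* The claimed payload size must match what was actually received. */
	if (h.payload_size != buflen - sizeof(h))
		return -EINVAL;
	*len = (size_t)h.payload_size;
	*off = sizeof(h);
	return 0;
}

static int demo_addr_init(const unsigned char *buf, size_t buflen,
			  struct demo_addr *addr)
{
	struct demo_hdr h;

	if (buflen < sizeof(h))
		return -EINVAL;
	memcpy(&h, buf, sizeof(h));
	addr->cid = h.src_cid;
	addr->port = h.src_port;
	return 0;
}

static const struct demo_dgram_ops demo_ops = {
	.get_payload_info = demo_get_payload_info,
	.addr_init = demo_addr_init,
};

/* Generic receive with two indirect calls instead of three. Returns the
 * number of bytes copied; *trunc plays the role of MSG_TRUNC. */
static long demo_recv(const struct demo_dgram_ops *ops,
		      const unsigned char *buf, size_t buflen,
		      unsigned char *out, size_t outlen,
		      struct demo_addr *from, int *trunc)
{
	size_t len, off;
	int err;

	err = ops->get_payload_info(buf, buflen, &len, &off);
	if (err)
		return err;
	*trunc = len > outlen;
	if (*trunc)
		len = outlen;
	memcpy(out, buf + off, len);
	if (from) {
		err = ops->addr_init(buf, buflen, from);
		if (err)
			return err;
	}
	return (long)len;
}
```

The offset comes back from the same call as the length, so the constant dgram_payload_offset field would no longer be needed in this variant.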
* [PATCH v9 2/2] x86/tdx: Support vmalloc() for tdx_enc_status_changed()
From: Dexuan Cui @ 2023-06-21 19:13 UTC
To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu, x86,
mikelley
Cc: linux-kernel, Tianyu.Lan, rick.p.edgecombe, Dexuan Cui
In-Reply-To: <20230621191317.4129-1-decui@microsoft.com>
When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
allocates buffers using vzalloc(), and needs to share the buffers with the
host OS by calling set_memory_decrypted(), which does not work for
vmalloc() addresses yet: tdx_enc_status_changed() uses __pa(), which is
only valid for direct-mapped addresses. Add the support by handling the
pages one by one.
Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
arch/x86/coco/tdx/tdx.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
Changes in v2:
Changed tdx_enc_status_changed() in place.
Changes in v3:
No change since v2.
Changes in v4:
Added Kirill's Co-developed-by since Kirill helped to improve the
code by adding tdx_enc_status_changed_phys().
Thanks Kirill for the clarification on load_unaligned_zeropad()!
Changes in v5:
Added Kirill's Signed-off-by.
Added Michael's Reviewed-by.
Changes in v6: None.
Changes in v7: None.
Note: there was a race between set_memory_encrypted() and
load_unaligned_zeropad(), which has been fixed by the 3 patches of
Kirill in the x86/tdx branch of the tip tree.
Changes in v8:
Rebased to tip.git's master branch.
Changes in v9:
Added Kuppuswamy Sathyanarayanan's Reviewed-by.
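A user-space model of the per-page approach described in the commit message, assuming a toy address space: addresses at or above DEMO_VMALLOC_START are "vmalloc" pages translated through a small table (standing in for slow_virt_to_phys()), and everything below is an identity "direct map" (standing in for __pa()). All names are hypothetical. A direct-map range is converted with one call; a vmalloc range is converted one page at a time because its pages need not be physically contiguous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE     4096UL
#define DEMO_VMALLOC_START 0x100000UL

/* Toy "vmalloc" page table: page i of the vmalloc area -> physical page. */
static uint64_t demo_vmalloc_pt[4];

/* Ranges handed to the (fake) hypervisor, for inspection. */
static uint64_t calls[8][2];
static int ncalls;

static bool demo_is_vmalloc_addr(unsigned long va)
{
	return va >= DEMO_VMALLOC_START;
}

static uint64_t demo_pa(unsigned long va) /* __pa(): identity direct map */
{
	return va;
}

static uint64_t demo_slow_virt_to_phys(unsigned long va)
{
	unsigned long off = va - DEMO_VMALLOC_START;

	return demo_vmalloc_pt[off / DEMO_PAGE_SIZE] + off % DEMO_PAGE_SIZE;
}

/* Stand-in for tdx_enc_status_changed_phys(): just records the range. */
static bool demo_change_phys(uint64_t start, uint64_t end, bool enc)
{
	(void)enc;
	assert(ncalls < 8);
	calls[ncalls][0] = start;
	calls[ncalls][1] = end;
	ncalls++;
	return true;
}

static bool demo_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
{
	unsigned long start = vaddr;
	unsigned long end = start + numpages * DEMO_PAGE_SIZE;

	if (start % DEMO_PAGE_SIZE != 0) /* offset_in_page() */
		return false;

	/* Direct map: physically contiguous, one call for the whole range. */
	if (!demo_is_vmalloc_addr(start))
		return demo_change_phys(demo_pa(start), demo_pa(end), enc);

	/* vmalloc: translate and convert each page separately. */
	while (start < end) {
		uint64_t start_pa = demo_slow_virt_to_phys(start);

		if (!demo_change_phys(start_pa, start_pa + DEMO_PAGE_SIZE, enc))
			return false;
		start += DEMO_PAGE_SIZE;
	}
	return true;
}
```

For a three-page vmalloc buffer backed by scattered physical pages, the model issues three separate one-page conversions, while a two-page direct-map range yields a single call.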
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 746075d20cd2..c1a2423a8159 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
#include <linux/cpufeature.h>
#include <linux/export.h>
#include <linux/io.h>
+#include <linux/mm.h>
#include <asm/coco.h>
#include <asm/tdx.h>
#include <asm/vmx.h>
@@ -753,6 +754,19 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
return false;
}
+static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end,
+ bool enc)
+{
+ if (!tdx_map_gpa(start, end, enc))
+ return false;
+
+ /* shared->private conversion requires memory to be accepted before use */
+ if (enc)
+ return tdx_accept_memory(start, end);
+
+ return true;
+}
+
/*
* Inform the VMM of the guest's intent for this physical page: shared with
* the VMM or private to the guest. The VMM is expected to change its mapping
@@ -760,15 +774,24 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
*/
static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
{
- phys_addr_t start = __pa(vaddr);
- phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+ unsigned long start = vaddr;
+ unsigned long end = start + numpages * PAGE_SIZE;
- if (!tdx_map_gpa(start, end, enc))
+ if (offset_in_page(start) != 0)
return false;
- /* shared->private conversion requires memory to be accepted before use */
- if (enc)
- return tdx_accept_memory(start, end);
+ if (!is_vmalloc_addr((void *)start))
+ return tdx_enc_status_changed_phys(__pa(start), __pa(end), enc);
+
+ while (start < end) {
+ phys_addr_t start_pa = slow_virt_to_phys((void *)start);
+ phys_addr_t end_pa = start_pa + PAGE_SIZE;
+
+ if (!tdx_enc_status_changed_phys(start_pa, end_pa, enc))
+ return false;
+
+ start += PAGE_SIZE;
+ }
return true;
}
--
2.25.1
* [PATCH v9 1/2] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
From: Dexuan Cui @ 2023-06-21 19:13 UTC
To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu, x86,
mikelley
Cc: linux-kernel, Tianyu.Lan, rick.p.edgecombe, Dexuan Cui
In-Reply-To: <20230621191317.4129-1-decui@microsoft.com>
The GHCI spec for TDX 1.0 says that the MapGPA call may fail with the R10
error code TDG.VP.VMCALL_RETRY (1), in which case the guest must retry the
operation for the pages in the region starting at the GPA specified in
R11.
When a fully enlightened TDX guest runs on Hyper-V, Hyper-V can return
the retry error when set_memory_decrypted() is called to decrypt up to
1GB of swiotlb bounce buffers.
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
arch/x86/coco/tdx/tdx.c | 64 +++++++++++++++++++++++++------
arch/x86/include/asm/shared/tdx.h | 2 +
2 files changed, 54 insertions(+), 12 deletions(-)
Changes in v2:
Used __tdx_hypercall() directly in tdx_map_gpa().
Added a max_retry_cnt of 1000.
Renamed a few variables, e.g., r11 -> map_fail_paddr.
Changes in v3:
Changed max_retry_cnt from 1000 to 3.
Changes in v4:
__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT) -> __tdx_hypercall_ret()
Added Kirill's Acked-by.
Changes in v5:
Added Michael's Reviewed-by.
Changes in v6: None.
Changes in v7:
Addressed Dave's comments:
see https://lwn.net/ml/linux-kernel/SA1PR21MB1335736123C2BCBBFD7460C3BF46A@SA1PR21MB1335.namprd21.prod.outlook.com
Changes in v8:
Rebased to tip.git's master branch.
Changes in v9:
Added a comment before 'max_retries_per_page'.
Moved 'args', 'map_fail_paddr' and 'ret' into the loop.
Added Kuppuswamy Sathyanarayanan's Reviewed-by.
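The retry rule from the commit message can be exercised against a fake VMM, as a sketch: demo_map_gpa_call() is a hypothetical stand-in for the MapGPA hypercall that converts at most demo_chunk bytes per call and reports the resume GPA through r11. The loop mirrors tdx_map_gpa() in the diff below (range check on the untrusted R11, at most three retries without forward progress, counter reset on progress) but leaves out the cc_mkdec() shared-bit handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_STATUS_SUCCESS 0ULL
#define DEMO_STATUS_RETRY   1ULL /* TDG.VP.VMCALL_RETRY */

/* Fake VMM knobs (hypothetical). */
static uint64_t demo_chunk;   /* bytes converted per call */
static int demo_stall;        /* upcoming calls that make no progress */
static int demo_calls;        /* number of hypercalls issued */
static uint64_t demo_bad_r11; /* if nonzero, returned as a hostile R11 */

static uint64_t demo_map_gpa_call(uint64_t start, uint64_t size, uint64_t *r11)
{
	demo_calls++;
	if (demo_bad_r11) {
		*r11 = demo_bad_r11;
		return DEMO_STATUS_RETRY;
	}
	if (demo_stall > 0) {
		demo_stall--;
		*r11 = start;
		return DEMO_STATUS_RETRY;
	}
	if (size <= demo_chunk)
		return DEMO_STATUS_SUCCESS;
	*r11 = start + demo_chunk;
	return DEMO_STATUS_RETRY;
}

static bool demo_map_gpa(uint64_t start, uint64_t end)
{
	const int max_retries_per_page = 3;
	int retry_count = 0;

	while (retry_count < max_retries_per_page) {
		uint64_t r11 = 0;
		uint64_t ret = demo_map_gpa_call(start, end - start, &r11);

		if (ret != DEMO_STATUS_RETRY)
			return ret == DEMO_STATUS_SUCCESS;

		/* R11 comes from the untrusted VMM: it must lie in [start, end). */
		if (r11 < start || r11 >= end)
			return false;

		/* No forward progress: consume a retry. */
		if (r11 == start) {
			retry_count++;
			continue;
		}

		/* Progress: resume from R11 with a fresh retry budget. */
		start = r11;
		retry_count = 0;
	}
	return false;
}
```

A VMM that converts one page per call makes steady progress and never exhausts the budget, whereas three consecutive retries at the same GPA, or an R11 outside the requested range, make the conversion fail.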
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 1d6b863c42b0..746075d20cd2 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -703,14 +703,15 @@ static bool tdx_cache_flush_required(void)
}
/*
- * Inform the VMM of the guest's intent for this physical page: shared with
- * the VMM or private to the guest. The VMM is expected to change its mapping
- * of the page in response.
+ * Notify the VMM about page mapping conversion. More info about ABI
+ * can be found in TDX Guest-Host-Communication Interface (GHCI),
+ * section "TDG.VP.VMCALL<MapGPA>".
*/
-static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
{
- phys_addr_t start = __pa(vaddr);
- phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+ /* Retrying the hypercall a second time should succeed; use 3 just in case */
+ const int max_retries_per_page = 3;
+ int retry_count = 0;
if (!enc) {
/* Set the shared (decrypted) bits: */
@@ -718,12 +719,51 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
end |= cc_mkdec(0);
}
- /*
- * Notify the VMM about page mapping conversion. More info about ABI
- * can be found in TDX Guest-Host-Communication Interface (GHCI),
- * section "TDG.VP.VMCALL<MapGPA>"
- */
- if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
+ while (retry_count < max_retries_per_page) {
+ struct tdx_hypercall_args args = {
+ .r10 = TDX_HYPERCALL_STANDARD,
+ .r11 = TDVMCALL_MAP_GPA,
+ .r12 = start,
+ .r13 = end - start };
+
+ u64 map_fail_paddr;
+ u64 ret = __tdx_hypercall_ret(&args);
+
+ if (ret != TDVMCALL_STATUS_RETRY)
+ return !ret;
+ /*
+ * The guest must retry the operation for the pages in the
+ * region starting at the GPA specified in R11. R11 comes
+ * from the untrusted VMM. Sanity check it.
+ */
+ map_fail_paddr = args.r11;
+ if (map_fail_paddr < start || map_fail_paddr >= end)
+ return false;
+
+ /* "Consume" a retry without forward progress */
+ if (map_fail_paddr == start) {
+ retry_count++;
+ continue;
+ }
+
+ start = map_fail_paddr;
+ retry_count = 0;
+ }
+
+ return false;
+}
+
+/*
+ * Inform the VMM of the guest's intent for this physical page: shared with
+ * the VMM or private to the guest. The VMM is expected to change its mapping
+ * of the page in response.
+ */
+static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+{
+ phys_addr_t start = __pa(vaddr);
+ phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+
+ if (!tdx_map_gpa(start, end, enc))
return false;
/* shared->private conversion requires memory to be accepted before use */
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 90ea813c4b99..9db89a99ae5b 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -24,6 +24,8 @@
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_REPORT_FATAL_ERROR 0x10003
+#define TDVMCALL_STATUS_RETRY 1
+
#ifndef __ASSEMBLY__
/*
--
2.25.1
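The retry loop in tdx_map_gpa() above can be exercised outside the kernel. The following is a user-space sketch that reproduces the patch's control flow against a mock VMM; mock_vmm, mock_map_gpa() and the STATUS_* constants are hypothetical stand-ins for __tdx_hypercall_ret() and TDVMCALL_STATUS_RETRY, not kernel APIs. It shows why the retry budget is effectively per page: it resets whenever the VMM reports forward progress, and only repeated stalls at the same GPA consume it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUCCESS 0
#define STATUS_RETRY   1	/* TDG.VP.VMCALL_RETRY, per the GHCI text above */

/* Hypothetical mock of the untrusted VMM: converts at most `chunk` bytes
 * per call and reports where it stopped via *fail_addr (R11 in the ABI). */
struct mock_vmm {
	uint64_t chunk;
	int stalls_left;	/* number of calls that make no progress at all */
};

static uint64_t mock_map_gpa(struct mock_vmm *vmm, uint64_t start,
			     uint64_t len, uint64_t *fail_addr)
{
	if (vmm->stalls_left > 0) {
		vmm->stalls_left--;
		*fail_addr = start;	/* retry requested, no forward progress */
		return STATUS_RETRY;
	}
	if (len <= vmm->chunk)
		return STATUS_SUCCESS;
	*fail_addr = start + vmm->chunk;	/* partial progress */
	return STATUS_RETRY;
}

/* Same control flow as tdx_map_gpa() in the patch. */
static bool map_gpa(struct mock_vmm *vmm, uint64_t start, uint64_t end)
{
	const int max_retries_per_page = 3;
	int retry_count = 0;

	assert(start < end);
	while (retry_count < max_retries_per_page) {
		uint64_t fail_addr = 0;
		uint64_t ret = mock_map_gpa(vmm, start, end - start, &fail_addr);

		if (ret != STATUS_RETRY)
			return ret == STATUS_SUCCESS;
		/* fail_addr comes from the untrusted side: sanity check it */
		if (fail_addr < start || fail_addr >= end)
			return false;
		if (fail_addr == start) {	/* consume a retry, no progress */
			retry_count++;
			continue;
		}
		start = fail_addr;	/* progress: resume there, reset budget */
		retry_count = 0;
	}
	return false;
}
```

For example, a mock VMM that converts 0x1000 bytes per call maps the range 0x0-0x10000 successfully after 16 calls, while one that stalls three times in a row at the same GPA makes map_gpa() return false.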
* [PATCH v9 0/2] Support TDX guests on Hyper-V (the x86/tdx part)
From: Dexuan Cui @ 2023-06-21 19:13 UTC (permalink / raw)
To: ak, arnd, bp, brijesh.singh, dan.j.williams, dave.hansen,
dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu, x86,
mikelley
Cc: linux-kernel, Tianyu.Lan, rick.p.edgecombe, Dexuan Cui
The two patches are based on today's tip.git's master branch.
Note: the two patches don't apply to the current x86/tdx branch, which
doesn't have commit 75d090fd167a ("x86/tdx: Add unaccepted memory support").
As Dave suggested, I moved some local variables of tdx_map_gpa() into
the loop. I added Sathyanarayanan's Reviewed-by.
Please review.
FWIW, the old versions are here:
v8: https://lwn.net/ml/linux-kernel/20230620154830.25442-1-decui@microsoft.com/
v7: https://lwn.net/ml/linux-kernel/20230616044701.15888-1-decui%40microsoft.com/
v6: https://lwn.net/ml/linux-kernel/20230504225351.10765-1-decui@microsoft.com/
Dexuan Cui (2):
x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
x86/tdx: Support vmalloc() for tdx_enc_status_changed()
arch/x86/coco/tdx/tdx.c | 87 ++++++++++++++++++++++++++-----
arch/x86/include/asm/shared/tdx.h | 2 +
2 files changed, 77 insertions(+), 12 deletions(-)
--
2.25.1
* RE: [PATCH] net: mana: Fix MANA VF unload when host is unresponsive
From: Haiyang Zhang @ 2023-06-21 18:27 UTC (permalink / raw)
To: souradeep chakrabarti, KY Srinivasan, wei.liu@kernel.org,
Dexuan Cui, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, Long Li, Ajay Sharma,
leon@kernel.org, cai.huoqing@linux.dev,
ssengar@linux.microsoft.com, vkuznets@redhat.com,
tglx@linutronix.de, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org
In-Reply-To: <1687343341-10898-1-git-send-email-schakrabarti@linux.microsoft.com>
> -----Original Message-----
> From: souradeep chakrabarti <schakrabarti@linux.microsoft.com>
> Sent: Wednesday, June 21, 2023 6:29 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Long Li <longli@microsoft.com>; Ajay
> Sharma <sharmaajay@microsoft.com>; leon@kernel.org;
> cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
> tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> Cc: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> Subject: [PATCH] net: mana: Fix MANA VF unload when host is unresponsive
>
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
>
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.
>
> Also this patch adds a new attribute in mana_context, which gets set when
> mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> This flag then helps to avoid the timeouts in successive calls.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> .../net/ethernet/microsoft/mana/gdma_main.c | 4 +++-
> .../net/ethernet/microsoft/mana/hw_channel.c | 12 ++++++++++-
> drivers/net/ethernet/microsoft/mana/mana_en.c | 21 +++++++++++++++++--
> include/net/mana/mana.h | 2 ++
> 4 files changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 8f3f78b68592..5cc43ae78334 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev
> *gd)
> struct gdma_context *gc = gd->gdma_context;
> struct gdma_general_resp resp = {};
> struct gdma_general_req req = {};
> + struct mana_context *ac;
> int err;
>
> if (gd->pdid == INVALID_PDID)
> return -EINVAL;
> + ac = (struct mana_context *)gd->driver_data;
>
> mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE,
> sizeof(req),
> sizeof(resp));
> @@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> req.hdr.dev_id = gd->dev_id;
>
> err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> - if (err || resp.hdr.status) {
> + if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
> dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
> err, resp.hdr.status);
> if (!err)
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index 9d1507eba5b9..557b890ad0ae 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> @@ -1,8 +1,10 @@
> // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> /* Copyright (c) 2021, Microsoft Corporation. */
>
> +#include "asm-generic/errno.h"
> #include <net/mana/gdma.h>
> #include <net/mana/hw_channel.h>
> +#include <net/mana/mana.h>
>
> static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16
> *msg_id)
> {
> @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct
> hw_channel_context *hwc, u32 req_len,
> struct hwc_wq *txq = hwc->txq;
> struct gdma_req_hdr *req_msg;
> struct hwc_caller_ctx *ctx;
> + struct mana_context *ac;
> u32 dest_vrcq = 0;
> u32 dest_vrq = 0;
> u16 msg_id;
> int err;
>
> mana_hwc_get_msg_index(hwc, &msg_id);
> + ac = (struct mana_context *)hwc->gdma_dev->driver_data;
> + if (ac->vf_unload_timeout) {
> + dev_err(hwc->dev, "HWC: vport is already unloaded.\n");
> + err = -ETIMEDOUT;
> + goto out;
> + }
>
> tx_wr = &txq->msg_buf->reqs[msg_id];
>
> @@ -825,9 +834,10 @@ int mana_hwc_send_request(struct
> hw_channel_context *hwc, u32 req_len,
> goto out;
> }
>
> - if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
> + if (!wait_for_completion_timeout(&ctx->comp_event, 5 * HZ)) {
> dev_err(hwc->dev, "HWC: Request timed out!\n");
> err = -ETIMEDOUT;
> + ac->vf_unload_timeout = true;
> goto out;
> }
>
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d907727c7b7a..24f5508d2979 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2330,7 +2330,10 @@ static int mana_dealloc_queues(struct net_device
> *ndev)
> struct mana_port_context *apc = netdev_priv(ndev);
> struct gdma_dev *gd = apc->ac->gdma_dev;
> struct mana_txq *txq;
> + struct sk_buff *skb;
> + struct mana_cq *cq;
> int i, err;
> + unsigned long timeout;
>
> if (apc->port_is_up)
> return -EINVAL;
> @@ -2348,13 +2351,26 @@ static int mana_dealloc_queues(struct net_device
> *ndev)
> *
> * Drain all the in-flight TX packets
> */
> +
> + timeout = jiffies + 120 * HZ;
> for (i = 0; i < apc->num_queues; i++) {
> txq = &apc->tx_qp[i].txq;
> -
> - while (atomic_read(&txq->pending_sends) > 0)
> + while (atomic_read(&txq->pending_sends) > 0 &&
> + time_before(jiffies, timeout)) {
> usleep_range(1000, 2000);
> + }
> }
>
> + for (i = 0; i < apc->num_queues; i++) {
> + txq = &apc->tx_qp[i].txq;
> + cq = &apc->tx_qp[i].tx_cq;
> + while (atomic_read(&txq->pending_sends)) {
> + skb = skb_dequeue(&txq->pending_skbs);
> + mana_unmap_skb(skb, apc);
> + napi_consume_skb(skb, cq->budget);
> + atomic_sub(1, &txq->pending_sends);
> + }
> + }
> /* We're 100% sure the queues can no longer be woken up, because
> * we're sure now mana_poll_tx_cq() can't be running.
> */
> @@ -2605,6 +2621,7 @@ int mana_probe(struct gdma_dev *gd, bool
> resuming)
> }
> }
>
> + ac->vf_unload_timeout = false;
> err = add_adev(gd);
> out:
> if (err)
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 9eef19972845..34f5d8e06ede 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -361,6 +361,8 @@ struct mana_context {
> struct mana_eq *eqs;
>
> struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
> +
> + bool vf_unload_timeout;
> };
>
> struct mana_port_context {
> --
Please specify the "net" branch for fixes.
Also Cc: stable@vger.kernel.org so that it will be backported to stable trees.
Thanks,
- Haiyang
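The deadline-bounded drain added to mana_dealloc_queues() relies on time_before() staying correct when jiffies wraps around. The sketch below models that in user space: ticks_before() mirrors the signed-difference comparison used by the kernel's time_before(), while struct fake_txq and drain_with_deadline() are hypothetical stand-ins for the TX queue and the usleep_range() loop, with one loop iteration per tick. It is an illustration under those assumptions, not the driver code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe "a is before b" for an unsigned tick counter, using the
 * same signed-difference trick as time_before(). Correct as long as the
 * two values are less than half the counter range apart. */
static bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for a TX queue: completes_per_tick == 0 models a
 * host that never completes the in-flight sends. */
struct fake_txq {
	int pending_sends;
	int completes_per_tick;
};

/* Drain with a deadline; returns how many sends were still pending when
 * the loop stopped (0 means a clean drain). */
static int drain_with_deadline(struct fake_txq *q, unsigned long now,
			       unsigned long timeout_ticks)
{
	unsigned long deadline = now + timeout_ticks;	/* may wrap */

	assert(timeout_ticks <= (unsigned long)LONG_MAX);
	while (q->pending_sends > 0 && ticks_before(now, deadline)) {
		q->pending_sends -= q->completes_per_tick;	/* one "sleep" */
		if (q->pending_sends < 0)
			q->pending_sends = 0;
		now++;
	}
	return q->pending_sends;
}
```

With an unresponsive queue and a start time two ticks before the counter wraps, the loop still runs for exactly timeout_ticks iterations. A plain `now < deadline` comparison would instead exit immediately, because the wrapped deadline is numerically smaller than `now`.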
* [PATCH v2 3/3] tools: Get rid of IRQ_MOVE_CLEANUP_VECTOR from tools
From: Xin Li @ 2023-06-21 17:12 UTC (permalink / raw)
To: linux-kernel, platform-driver-x86, iommu, linux-hyperv,
linux-perf-users, x86
Cc: tglx, mingo, bp, dave.hansen, hpa, steve.wahl, mike.travis,
dimitri.sivanich, russ.anderson, dvhart, andy, joro,
suravee.suthikulpanit, will, robin.murphy, kys, haiyangz, wei.liu,
decui, dwmw2, baolu.lu, peterz, acme, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter,
xin3.li, seanjc, jiangshanlai, jgg, yangtiezhu
In-Reply-To: <20230621171248.6805-1-xin3.li@intel.com>
Get rid of IRQ_MOVE_CLEANUP_VECTOR from tools.
Signed-off-by: Xin Li <xin3.li@intel.com>
---
tools/arch/x86/include/asm/irq_vectors.h | 7 -------
tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh | 2 +-
2 files changed, 1 insertion(+), 8 deletions(-)
diff --git a/tools/arch/x86/include/asm/irq_vectors.h b/tools/arch/x86/include/asm/irq_vectors.h
index 43dcb9284208..3a19904c2db6 100644
--- a/tools/arch/x86/include/asm/irq_vectors.h
+++ b/tools/arch/x86/include/asm/irq_vectors.h
@@ -35,13 +35,6 @@
*/
#define FIRST_EXTERNAL_VECTOR 0x20
-/*
- * Reserve the lowest usable vector (and hence lowest priority) 0x20 for
- * triggering cleanup after irq migration. 0x21-0x2f will still be used
- * for device interrupts.
- */
-#define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR
-
#define IA32_SYSCALL_VECTOR 0x80
/*
diff --git a/tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh b/tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh
index eed9ce0fcbe6..87dc68c7de0c 100755
--- a/tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh
+++ b/tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh
@@ -12,7 +12,7 @@ x86_irq_vectors=${arch_x86_header_dir}/irq_vectors.h
# FIRST_EXTERNAL_VECTOR is not that useful, find what is its number
# and then replace whatever is using it and that is useful, which at
-# the time of writing of this script was: IRQ_MOVE_CLEANUP_VECTOR.
+# the time of writing of this script was: 0x20.
first_external_regex='^#define[[:space:]]+FIRST_EXTERNAL_VECTOR[[:space:]]+(0x[[:xdigit:]]+)$'
first_external_vector=$(grep -E ${first_external_regex} ${x86_irq_vectors} | sed -r "s/${first_external_regex}/\1/g")
--
2.34.1
* [PATCH v2 2/3] x86/vector: Replace IRQ_MOVE_CLEANUP_VECTOR with a timer callback
From: Xin Li @ 2023-06-21 17:12 UTC (permalink / raw)
To: linux-kernel, platform-driver-x86, iommu, linux-hyperv,
linux-perf-users, x86
Cc: tglx, mingo, bp, dave.hansen, hpa, steve.wahl, mike.travis,
dimitri.sivanich, russ.anderson, dvhart, andy, joro,
suravee.suthikulpanit, will, robin.murphy, kys, haiyangz, wei.liu,
decui, dwmw2, baolu.lu, peterz, acme, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter,
xin3.li, seanjc, jiangshanlai, jgg, yangtiezhu
In-Reply-To: <20230621171248.6805-1-xin3.li@intel.com>
From: Thomas Gleixner <tglx@linutronix.de>
Replace IRQ_MOVE_CLEANUP_VECTOR with a timer callback for cleaning
up the leftovers of a moved interrupt.
The only additional work required is to run the vector cleanup in
lapic_offline() in case the vector cleanup timer has not expired yet.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
Changes since v1:
* Add a lockdep_assert_held() statement to get rid of a bad comment
that claims __vector_cleanup() needs to be called with vector_lock
held. (Peter Zijlstra).
---
arch/x86/include/asm/idtentry.h | 1 -
arch/x86/include/asm/irq_vectors.h | 7 ---
arch/x86/kernel/apic/vector.c | 98 ++++++++++++++++++++++++------
arch/x86/kernel/idt.c | 1 -
4 files changed, 78 insertions(+), 29 deletions(-)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b241af4ce9b4..cd5c10a74071 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -648,7 +648,6 @@ DECLARE_IDTENTRY_SYSVEC(X86_PLATFORM_IPI_VECTOR, sysvec_x86_platform_ipi);
#ifdef CONFIG_SMP
DECLARE_IDTENTRY(RESCHEDULE_VECTOR, sysvec_reschedule_ipi);
-DECLARE_IDTENTRY_SYSVEC(IRQ_MOVE_CLEANUP_VECTOR, sysvec_irq_move_cleanup);
DECLARE_IDTENTRY_SYSVEC(REBOOT_VECTOR, sysvec_reboot);
DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_SINGLE_VECTOR, sysvec_call_function_single);
DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_VECTOR, sysvec_call_function);
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 43dcb9284208..3a19904c2db6 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -35,13 +35,6 @@
*/
#define FIRST_EXTERNAL_VECTOR 0x20
-/*
- * Reserve the lowest usable vector (and hence lowest priority) 0x20 for
- * triggering cleanup after irq migration. 0x21-0x2f will still be used
- * for device interrupts.
- */
-#define IRQ_MOVE_CLEANUP_VECTOR FIRST_EXTERNAL_VECTOR
-
#define IA32_SYSCALL_VECTOR 0x80
/*
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index aa370bd0d933..01c359a66b04 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -44,7 +44,18 @@ static cpumask_var_t vector_searchmask;
static struct irq_chip lapic_controller;
static struct irq_matrix *vector_matrix;
#ifdef CONFIG_SMP
-static DEFINE_PER_CPU(struct hlist_head, cleanup_list);
+
+static void vector_cleanup_callback(struct timer_list *tmr);
+
+struct vector_cleanup {
+ struct hlist_head head;
+ struct timer_list timer;
+};
+
+static DEFINE_PER_CPU(struct vector_cleanup, vector_cleanup) = {
+ .head = HLIST_HEAD_INIT,
+ .timer = __TIMER_INITIALIZER(vector_cleanup_callback, TIMER_PINNED),
+};
#endif
void lock_vector_lock(void)
@@ -841,10 +852,21 @@ void lapic_online(void)
this_cpu_write(vector_irq[vector], __setup_vector_irq(vector));
}
+static void __vector_cleanup(struct vector_cleanup *cl, bool check_irr);
+
void lapic_offline(void)
{
+ struct vector_cleanup *cl = this_cpu_ptr(&vector_cleanup);
+
lock_vector_lock();
+
+ /* In case the vector cleanup timer has not expired */
+ __vector_cleanup(cl, false);
+
irq_matrix_offline(vector_matrix);
+ WARN_ON_ONCE(try_to_del_timer_sync(&cl->timer) < 0);
+ WARN_ON_ONCE(!hlist_empty(&cl->head));
+
unlock_vector_lock();
}
@@ -934,49 +956,85 @@ static void free_moved_vector(struct apic_chip_data *apicd)
apicd->move_in_progress = 0;
}
-DEFINE_IDTENTRY_SYSVEC(sysvec_irq_move_cleanup)
+static void __vector_cleanup(struct vector_cleanup *cl, bool check_irr)
{
- struct hlist_head *clhead = this_cpu_ptr(&cleanup_list);
struct apic_chip_data *apicd;
struct hlist_node *tmp;
+ bool rearm = false;
- ack_APIC_irq();
- /* Prevent vectors vanishing under us */
- raw_spin_lock(&vector_lock);
+ lockdep_assert_held(&vector_lock);
- hlist_for_each_entry_safe(apicd, tmp, clhead, clist) {
+ hlist_for_each_entry_safe(apicd, tmp, &cl->head, clist) {
unsigned int irr, vector = apicd->prev_vector;
/*
* Paranoia: Check if the vector that needs to be cleaned
- * up is registered at the APICs IRR. If so, then this is
- * not the best time to clean it up. Clean it up in the
- * next attempt by sending another IRQ_MOVE_CLEANUP_VECTOR
- * to this CPU. IRQ_MOVE_CLEANUP_VECTOR is the lowest
- * priority external vector, so on return from this
- * interrupt the device interrupt will happen first.
+ * up is registered at the APICs IRR. That's clearly a
+ * hardware issue if the vector arrived on the old target
+ * _after_ interrupts were disabled above. Keep @apicd
+ * on the list and schedule the timer again to give the CPU
+ * a chance to handle the pending interrupt.
+ *
+ * Do not check IRR when called from lapic_offline(), because
+ * fixup_irqs() was just called to scan IRR for set bits and
+ * forward them to new destination CPUs via IPIs.
*/
- irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
+ irr = check_irr ? apic_read(APIC_IRR + (vector / 32 * 0x10)) : 0;
if (irr & (1U << (vector % 32))) {
- apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
+ pr_warn_once("Moved interrupt pending in old target APIC %u\n", apicd->irq);
+ rearm = true;
continue;
}
free_moved_vector(apicd);
}
- raw_spin_unlock(&vector_lock);
+ /*
+ * Must happen under vector_lock to make the timer_pending() check
+ * in __vector_schedule_cleanup() race free against the rearm here.
+ */
+ if (rearm)
+ mod_timer(&cl->timer, jiffies + 1);
+}
+
+static void vector_cleanup_callback(struct timer_list *tmr)
+{
+ struct vector_cleanup *cl = container_of(tmr, typeof(*cl), timer);
+
+ /* Prevent vectors vanishing under us */
+ raw_spin_lock_irq(&vector_lock);
+ __vector_cleanup(cl, true);
+ raw_spin_unlock_irq(&vector_lock);
}
static void __vector_schedule_cleanup(struct apic_chip_data *apicd)
{
- unsigned int cpu;
+ unsigned int cpu = apicd->prev_cpu;
raw_spin_lock(&vector_lock);
apicd->move_in_progress = 0;
- cpu = apicd->prev_cpu;
if (cpu_online(cpu)) {
- hlist_add_head(&apicd->clist, per_cpu_ptr(&cleanup_list, cpu));
- apic->send_IPI(cpu, IRQ_MOVE_CLEANUP_VECTOR);
+ struct vector_cleanup *cl = per_cpu_ptr(&vector_cleanup, cpu);
+
+ hlist_add_head(&apicd->clist, &cl->head);
+
+ /*
+ * The lockless timer_pending() check is safe here. If it
+ * returns true, then the callback will observe this new
+ * apic data in the hlist as everything is serialized by
+ * vector lock.
+ *
+ * If it returns false then the timer is either not armed
+ * or the other CPU executes the callback, which again
+ * would be blocked on vector lock. Rearming it in the
+ * latter case makes it fire for nothing.
+ *
+ * This is also safe against the callback rearming the timer
+ * because that's serialized via vector lock too.
+ */
+ if (!timer_pending(&cl->timer)) {
+ cl->timer.expires = jiffies + 1;
+ add_timer_on(&cl->timer, cpu);
+ }
} else {
apicd->prev_vector = 0;
}
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a58c6bc1cd68..f3958262c725 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -131,7 +131,6 @@ static const __initconst struct idt_data apic_idts[] = {
INTG(RESCHEDULE_VECTOR, asm_sysvec_reschedule_ipi),
INTG(CALL_FUNCTION_VECTOR, asm_sysvec_call_function),
INTG(CALL_FUNCTION_SINGLE_VECTOR, asm_sysvec_call_function_single),
- INTG(IRQ_MOVE_CLEANUP_VECTOR, asm_sysvec_irq_move_cleanup),
INTG(REBOOT_VECTOR, asm_sysvec_reboot),
#endif
--
2.34.1
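__vector_cleanup() locates a vector's bit in the IRR with apic_read(APIC_IRR + (vector / 32 * 0x10)) and then tests 1U << (vector % 32). The sketch below spells out that arithmetic: the 256 possible vectors map onto eight 32-bit IRR registers spaced 0x10 apart. APIC_IRR is 0x200 as in the kernel's apicdef.h; vector_pending() and its irr[] array are hypothetical, standing in for apic_read() so the lookup can run in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the first of the eight 32-bit IRR registers. */
#define APIC_IRR 0x200

/* Register offset holding the IRR bit for @vector. */
static unsigned int irr_reg_offset(unsigned int vector)
{
	assert(vector < 256);
	return APIC_IRR + (vector / 32 * 0x10);
}

/* Bit within that 32-bit register. */
static uint32_t irr_bit_mask(unsigned int vector)
{
	return 1U << (vector % 32);
}

/* Hypothetical software copy of the eight IRR words, indexed by register. */
static int vector_pending(const uint32_t irr[8], unsigned int vector)
{
	return (irr[vector / 32] & irr_bit_mask(vector)) != 0;
}
```

For example, vector 0x45 (69) lives in the register at offset 0x220, bit 5, while 0x20, the vector IRQ_MOVE_CLEANUP_VECTOR used to occupy, is bit 0 of the register at 0x210.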
* [PATCH v2 1/3] x86/vector: Rename send_cleanup_vector() to vector_schedule_cleanup()
From: Xin Li @ 2023-06-21 17:12 UTC (permalink / raw)
To: linux-kernel, platform-driver-x86, iommu, linux-hyperv,
linux-perf-users, x86
Cc: tglx, mingo, bp, dave.hansen, hpa, steve.wahl, mike.travis,
dimitri.sivanich, russ.anderson, dvhart, andy, joro,
suravee.suthikulpanit, will, robin.murphy, kys, haiyangz, wei.liu,
decui, dwmw2, baolu.lu, peterz, acme, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter,
xin3.li, seanjc, jiangshanlai, jgg, yangtiezhu
In-Reply-To: <20230621171248.6805-1-xin3.li@intel.com>
From: Thomas Gleixner <tglx@linutronix.de>
Rename send_cleanup_vector() to vector_schedule_cleanup() in preparation
for the next patch, which replaces the vector cleanup IPI with a timer
callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
arch/x86/include/asm/hw_irq.h | 4 ++--
arch/x86/kernel/apic/vector.c | 8 ++++----
arch/x86/platform/uv/uv_irq.c | 2 +-
drivers/iommu/amd/iommu.c | 2 +-
drivers/iommu/hyperv-iommu.c | 4 ++--
drivers/iommu/intel/irq_remapping.c | 2 +-
6 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index d465ece58151..551829884734 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -97,10 +97,10 @@ extern struct irq_cfg *irqd_cfg(struct irq_data *irq_data);
extern void lock_vector_lock(void);
extern void unlock_vector_lock(void);
#ifdef CONFIG_SMP
-extern void send_cleanup_vector(struct irq_cfg *);
+extern void vector_schedule_cleanup(struct irq_cfg *);
extern void irq_complete_move(struct irq_cfg *cfg);
#else
-static inline void send_cleanup_vector(struct irq_cfg *c) { }
+static inline void vector_schedule_cleanup(struct irq_cfg *c) { }
static inline void irq_complete_move(struct irq_cfg *c) { }
#endif
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index c1efebd27e6c..aa370bd0d933 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -967,7 +967,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_irq_move_cleanup)
raw_spin_unlock(&vector_lock);
}
-static void __send_cleanup_vector(struct apic_chip_data *apicd)
+static void __vector_schedule_cleanup(struct apic_chip_data *apicd)
{
unsigned int cpu;
@@ -983,13 +983,13 @@ static void __send_cleanup_vector(struct apic_chip_data *apicd)
raw_spin_unlock(&vector_lock);
}
-void send_cleanup_vector(struct irq_cfg *cfg)
+void vector_schedule_cleanup(struct irq_cfg *cfg)
{
struct apic_chip_data *apicd;
apicd = container_of(cfg, struct apic_chip_data, hw_irq_cfg);
if (apicd->move_in_progress)
- __send_cleanup_vector(apicd);
+ __vector_schedule_cleanup(apicd);
}
void irq_complete_move(struct irq_cfg *cfg)
@@ -1007,7 +1007,7 @@ void irq_complete_move(struct irq_cfg *cfg)
* on the same CPU.
*/
if (apicd->cpu == smp_processor_id())
- __send_cleanup_vector(apicd);
+ __vector_schedule_cleanup(apicd);
}
/*
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index ee21d6a36a80..4221259a5870 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -58,7 +58,7 @@ uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask,
ret = parent->chip->irq_set_affinity(parent, mask, force);
if (ret >= 0) {
uv_program_mmr(cfg, data->chip_data);
- send_cleanup_vector(cfg);
+ vector_schedule_cleanup(cfg);
}
return ret;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index dc1ec6849775..b5900e70de60 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3658,7 +3658,7 @@ static int amd_ir_set_affinity(struct irq_data *data,
* at the new destination. So, time to cleanup the previous
* vector allocation.
*/
- send_cleanup_vector(cfg);
+ vector_schedule_cleanup(cfg);
return IRQ_SET_MASK_OK_DONE;
}
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 8302db7f783e..8a5c17b97310 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -51,7 +51,7 @@ static int hyperv_ir_set_affinity(struct irq_data *data,
if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
return ret;
- send_cleanup_vector(cfg);
+ vector_schedule_cleanup(cfg);
return 0;
}
@@ -257,7 +257,7 @@ static int hyperv_root_ir_set_affinity(struct irq_data *data,
if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
return ret;
- send_cleanup_vector(cfg);
+ vector_schedule_cleanup(cfg);
return 0;
}
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index a1b987335b31..55d899f5a14b 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1180,7 +1180,7 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
* at the new destination. So, time to cleanup the previous
* vector allocation.
*/
- send_cleanup_vector(cfg);
+ vector_schedule_cleanup(cfg);
return IRQ_SET_MASK_OK_DONE;
}
--
2.34.1
* [PATCH v2 0/3] Do IRQ move cleanup with a timer instead of an IPI
From: Xin Li @ 2023-06-21 17:12 UTC (permalink / raw)
To: linux-kernel, platform-driver-x86, iommu, linux-hyperv,
linux-perf-users, x86
Cc: tglx, mingo, bp, dave.hansen, hpa, steve.wahl, mike.travis,
dimitri.sivanich, russ.anderson, dvhart, andy, joro,
suravee.suthikulpanit, will, robin.murphy, kys, haiyangz, wei.liu,
decui, dwmw2, baolu.lu, peterz, acme, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter,
xin3.li, seanjc, jiangshanlai, jgg, yangtiezhu
There is no point in wasting a vector on cleaning up the leftovers of a
moved interrupt. Aside from that, this vector must have the lowest
priority of all vectors, which makes FRED systems utilizing vectors
0x10-0x1f more complicated than necessary.
Schedule a timer instead.
Changes since v1:
* Add a lockdep_assert_held() statement to get rid of a bad comment
that claims __vector_cleanup() needs to be called with vector_lock
held. (Peter Zijlstra).
Thomas Gleixner (2):
x86/vector: Rename send_cleanup_vector() to vector_schedule_cleanup()
x86/vector: Replace IRQ_MOVE_CLEANUP_VECTOR with a timer callback
Xin Li (1):
tools: Get rid of IRQ_MOVE_CLEANUP_VECTOR from tools
arch/x86/include/asm/hw_irq.h | 4 +-
arch/x86/include/asm/idtentry.h | 1 -
arch/x86/include/asm/irq_vectors.h | 7 --
arch/x86/kernel/apic/vector.c | 106 ++++++++++++++----
arch/x86/kernel/idt.c | 1 -
arch/x86/platform/uv/uv_irq.c | 2 +-
drivers/iommu/amd/iommu.c | 2 +-
drivers/iommu/hyperv-iommu.c | 4 +-
drivers/iommu/intel/irq_remapping.c | 2 +-
tools/arch/x86/include/asm/irq_vectors.h | 7 --
.../beauty/tracepoints/x86_irq_vectors.sh | 2 +-
11 files changed, 90 insertions(+), 48 deletions(-)
--
2.34.1
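As the comments in __vector_schedule_cleanup() and __vector_cleanup() describe it, the timer scheme works like this: queue the moved vector on the old CPU's list, arm that CPU's timer only if it is not already pending, and have the callback free what it can and rearm itself for anything still pending in the IRR. What follows is a single-threaded user-space model of that flow. struct cleanup_model and its helpers are hypothetical, vector_lock is not modeled (every function assumes the caller holds it), and the still_pending callback stands in for the IRR check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 16

/* Hypothetical model of one CPU's struct vector_cleanup: moved vectors
 * awaiting cleanup plus a one-shot timer. Caller holds "vector_lock". */
struct cleanup_model {
	unsigned int vectors[MAX_ENTRIES];
	size_t n;
	bool timer_armed;
	int timer_arms;		/* how many times the timer was (re)armed */
};

/* Like __vector_schedule_cleanup(): queue the entry, and arm the timer
 * only if it is not pending, so a burst of moves costs one timer. */
static void schedule_cleanup(struct cleanup_model *cl, unsigned int vector)
{
	assert(cl->n < MAX_ENTRIES);
	cl->vectors[cl->n++] = vector;
	if (!cl->timer_armed) {
		cl->timer_armed = true;
		cl->timer_arms++;
	}
}

/* Like the timer callback: free entries whose vector is no longer pending,
 * keep the rest and rearm. Returns the number of entries freed. */
static size_t timer_fire(struct cleanup_model *cl,
			 bool (*still_pending)(unsigned int vector))
{
	size_t kept = 0, freed = 0;

	cl->timer_armed = false;	/* a fired one-shot timer is not pending */
	for (size_t i = 0; i < cl->n; i++) {
		if (still_pending(cl->vectors[i]))
			cl->vectors[kept++] = cl->vectors[i];
		else
			freed++;
	}
	cl->n = kept;
	if (kept > 0) {		/* like mod_timer(&cl->timer, jiffies + 1) */
		cl->timer_armed = true;
		cl->timer_arms++;
	}
	return freed;
}

/* Example predicates for a usage run. */
static bool never_pending(unsigned int vector) { (void)vector; return false; }
static bool pending_0x31(unsigned int vector) { return vector == 0x31; }
```

Queuing three vectors arms the timer once; if one of them is still pending when the timer fires, the other two are freed and the timer is rearmed, and the next expiry frees the remaining one.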
* Re: [PATCH] net: mana: Fix MANA VF unload when host is unresponsive
From: Simon Horman @ 2023-06-21 17:37 UTC (permalink / raw)
To: souradeep chakrabarti
Cc: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma
In-Reply-To: <1687343341-10898-1-git-send-email-schakrabarti@linux.microsoft.com>
On Wed, Jun 21, 2023 at 03:29:01AM -0700, souradeep chakrabarti wrote:
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
>
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.
>
> Also this patch adds a new attribute in mana_context, which gets set when
> mana_hwc_send_request() hits a timeout because of host unresponsiveness.
> This flag then helps to avoid the timeouts in successive calls.
>
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Hi Souradeep,
thanks for your patch.
Some minor feedback from my side.
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 8f3f78b68592..5cc43ae78334 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
> struct gdma_context *gc = gd->gdma_context;
> struct gdma_general_resp resp = {};
> struct gdma_general_req req = {};
> + struct mana_context *ac;
> int err;
>
> if (gd->pdid == INVALID_PDID)
> return -EINVAL;
> + ac = (struct mana_context *)gd->driver_data;
driver_data is a void *.
There is no need to cast it to another type of pointer.
...
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index 9d1507eba5b9..557b890ad0ae 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
...
> @@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
> struct hwc_wq *txq = hwc->txq;
> struct gdma_req_hdr *req_msg;
> struct hwc_caller_ctx *ctx;
> + struct mana_context *ac;
> u32 dest_vrcq = 0;
> u32 dest_vrq = 0;
> u16 msg_id;
> int err;
>
> mana_hwc_get_msg_index(hwc, &msg_id);
> + ac = (struct mana_context *)hwc->gdma_dev->driver_data;
Ditto.
...
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d907727c7b7a..24f5508d2979 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2330,7 +2330,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
> struct mana_port_context *apc = netdev_priv(ndev);
> struct gdma_dev *gd = apc->ac->gdma_dev;
> struct mana_txq *txq;
> + struct sk_buff *skb;
> + struct mana_cq *cq;
> int i, err;
> + unsigned long timeout;
Please use reverse xmas tree - longest line to shortest - for
local variable declarations in Networking code.
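Applied to the declarations quoted above, the rule only requires moving `unsigned long timeout;` (22 characters) up so that it directly follows the `struct gdma_dev *gd = ...;` line. The helper below is hypothetical and checks the ordering by plain string length, which is a simplification of how reviewers apply the rule by eye.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the declaration lines run from longest to
 * shortest ("reverse xmas tree"); equal lengths are allowed. */
static bool is_reverse_xmas_tree(const char *const decls[], size_t n)
{
	assert(n == 0 || decls != NULL);
	for (size_t i = 1; i < n; i++)
		if (strlen(decls[i]) > strlen(decls[i - 1]))
			return false;
	return true;
}

/* Declarations of mana_dealloc_queues() as posted in the patch... */
static const char *const decls_as_posted[] = {
	"struct mana_port_context *apc = netdev_priv(ndev);",
	"struct gdma_dev *gd = apc->ac->gdma_dev;",
	"struct mana_txq *txq;",
	"struct sk_buff *skb;",
	"struct mana_cq *cq;",
	"int i, err;",
	"unsigned long timeout;",
};

/* ...and the same declarations in reverse xmas tree order. */
static const char *const decls_reordered[] = {
	"struct mana_port_context *apc = netdev_priv(ndev);",
	"struct gdma_dev *gd = apc->ac->gdma_dev;",
	"unsigned long timeout;",
	"struct mana_txq *txq;",
	"struct sk_buff *skb;",
	"struct mana_cq *cq;",
	"int i, err;",
};
```

The reordering keeps `apc` ahead of `gd`, which matters here because the initializer of `gd` dereferences `apc`.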
...
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 9eef19972845..34f5d8e06ede 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -361,6 +361,8 @@ struct mana_context {
> struct mana_eq *eqs;
>
> struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
> +
> + bool vf_unload_timeout;
Perhaps it is not important, but on x86_64 there is a 6-byte hole in the first
cacheline after num_ports where this could go.
pahole reports:
struct mana_context {
struct gdma_dev * gdma_dev; /* 0 8 */
u16 num_ports; /* 8 2 */
/* XXX 6 bytes hole, try to pack */
struct mana_eq * eqs; /* 16 8 */
struct net_device * ports[256]; /* 24 2048 */
/* --- cacheline 32 boundary (2048 bytes) was 24 bytes ago --- */
bool vf_unload_timeout; /* 2072 1 */
/* size: 2080, cachelines: 33, members: 5 */
/* sum members: 2067, holes: 1, sum holes: 6 */
/* padding: 7 */
/* last cacheline: 32 bytes */
};
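The effect of the suggested move can be checked with offsetof() and sizeof(). This is a sketch using mirror structs, not the real definition. It makes two assumptions: void * stands in for the gdma_dev, mana_eq and net_device pointers, and MAX_PORTS_IN_MANA_DEV is 256, matching the ports[256] in the pahole output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS_IN_MANA_DEV 256	/* assumed from ports[256] in the pahole output */

/* Layout as posted: the bool is appended after the 2 KiB ports[] array. */
struct mana_context_as_posted {
	void *gdma_dev;			/* stand-in for struct gdma_dev * */
	uint16_t num_ports;
	void *eqs;			/* stand-in for struct mana_eq * */
	void *ports[MAX_PORTS_IN_MANA_DEV];	/* stand-in for struct net_device * */
	bool vf_unload_timeout;
};

/* Suggested layout: the bool fills part of the hole after num_ports. */
struct mana_context_packed {
	void *gdma_dev;
	uint16_t num_ports;
	bool vf_unload_timeout;
	void *eqs;
	void *ports[MAX_PORTS_IN_MANA_DEV];
};
```

On x86_64 the first struct is 2080 bytes, matching pahole, and the second is 2072. The saving is one pointer's worth because the trailing bool no longer needs a padded tail.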
--
pw-bot: changes-requested
* Re: [PATCH 09/11] sysctl: Remove the end element in sysctl table arrays
From: Jani Nikula @ 2023-06-21 11:16 UTC
To: Joel Granados, mcgrof, Russell King, Catalin Marinas, Will Deacon,
Michael Ellerman, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Gerald Schaefer, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Herbert Xu, David S. Miller, Russ Weight, Greg Kroah-Hartman,
Phillip Potter, Clemens Ladisch, Arnd Bergmann, Corey Minyard,
Theodore Ts'o, Jason A. Donenfeld, Joonas Lahtinen,
Rodrigo Vivi, Tvrtko Ursulin, David Airlie, Daniel Vetter,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Jason Gunthorpe, Leon Romanovsky, Benjamin Herrenschmidt,
Song Liu, Robin Holt, Steve Wahl, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Sudip Mukherjee, Mark Rutland,
James E.J. Bottomley, Martin K. Petersen, Doug Gilbert,
Jiri Slaby, Juergen Gross, Stefano Stabellini, Alexander Viro,
Christian Brauner, Benjamin LaHaise, David Howells, Jan Harkes,
coda, Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton,
Jan Kara, Anton Altaparmakov, Mark Fasheh, Joel Becker, Joseph Qi,
Kees Cook, Iurii Zaikin, Eric Biggers, Darrick J. Wong,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Balbir Singh, Eric Biederman, Naveen N. Rao, Anil S Keshavamurthy,
Masami Hiramatsu, Peter Zijlstra, Petr Mladek, Sergey Senozhatsky,
Juri Lelli, Vincent Guittot, John Stultz, Steven Rostedt,
Andrew Morton, Mike Kravetz, Muchun Song, Naoya Horiguchi,
Matthew Wilcox (Oracle), Joerg Reuter, Ralf Baechle,
Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
Roopa Prabhu, Nikolay Aleksandrov, Alexander Aring,
Stefan Schmidt, Miquel Raynal, Steffen Klassert, Matthieu Baerts,
Mat Martineau, Simon Horman, Julian Anastasov,
Remi Denis-Courmont, Santosh Shilimkar, Marc Dionne, Neil Horman,
Marcelo Ricardo Leitner, Xin Long, Karsten Graul, Wenjia Zhang,
Jan Karcher, Jon Maloy, Ying Xue, Martin Schiller, John Johansen,
Paul Moore, James Morris, Serge E. Hallyn, Jarkko Sakkinen
Cc: Joel Granados, Nicholas Piggin, Christophe Leroy,
Christian Borntraeger, Sven Schnelle, H. Peter Anvin,
Rafael J. Wysocki, Mike Travis, Oleksandr Tyshchenko,
Amir Goldstein, Matthew Bobrowski, John Fastabend,
Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Waiman Long, Boqun Feng, John Ogness,
Dietmar Eggemann, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, Valentin Schneider, Andy Lutomirski,
Will Drewry, Stephen Boyd, Miaohe Lin, linux-arm-kernel,
linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-crypto,
openipmi-developer, intel-gfx, dri-devel, linux-hyperv,
linux-rdma, linux-raid, netdev, linux-scsi, xen-devel,
linux-fsdevel, linux-aio, linux-cachefs, codalist, linux-mm,
linux-nfs, linux-ntfs-dev, ocfs2-devel, fsverity, linux-xfs, bpf,
kexec, linux-trace-kernel, linux-hams, netfilter-devel, coreteam,
bridge, dccp, linux-wpan, mptcp, lvs-devel, rds-devel, linux-afs,
linux-sctp, tipc-discussion, linux-x25, apparmor,
linux-security-module, keyrings
In-Reply-To: <20230621094817.433842-1-j.granados@samsung.com>
On Wed, 21 Jun 2023, Joel Granados <j.granados@samsung.com> wrote:
> Remove the empty end element from all the arrays that are passed to the
> register sysctl calls. In some files this means reducing the explicit
> array size by one. Also make sure that we are using the size in
> ctl_table_header instead of evaluating the .procname element.
Where's the harm in removing the end elements driver by driver? This is
an unwieldy patch to handle.
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index f43950219ffc..e4d7372afb10 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -4884,24 +4884,23 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
>
> static struct ctl_table oa_table[] = {
> {
> - .procname = "perf_stream_paranoid",
> - .data = &i915_perf_stream_paranoid,
> - .maxlen = sizeof(i915_perf_stream_paranoid),
> - .mode = 0644,
> - .proc_handler = proc_dointvec_minmax,
> - .extra1 = SYSCTL_ZERO,
> - .extra2 = SYSCTL_ONE,
> - },
> + .procname = "perf_stream_paranoid",
> + .data = &i915_perf_stream_paranoid,
> + .maxlen = sizeof(i915_perf_stream_paranoid),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ZERO,
> + .extra2 = SYSCTL_ONE,
> + },
> {
> - .procname = "oa_max_sample_rate",
> - .data = &i915_oa_max_sample_rate,
> - .maxlen = sizeof(i915_oa_max_sample_rate),
> - .mode = 0644,
> - .proc_handler = proc_dointvec_minmax,
> - .extra1 = SYSCTL_ZERO,
> - .extra2 = &oa_sample_rate_hard_limit,
> - },
> - {}
> + .procname = "oa_max_sample_rate",
> + .data = &i915_oa_max_sample_rate,
> + .maxlen = sizeof(i915_oa_max_sample_rate),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ZERO,
> + .extra2 = &oa_sample_rate_hard_limit,
> + }
> };
The existing indentation is off, but fixing it doesn't really belong in
this patch.
BR,
Jani.
--
Jani Nikula, Intel Open Source Graphics Center
* [PATCH] net: mana: Fix MANA VF unload when host is unresponsive
From: souradeep chakrabarti @ 2023-06-21 10:29 UTC
To: kys, haiyangz, wei.liu, decui, davem, edumazet, kuba, pabeni,
longli, sharmaajay, leon, cai.huoqing, ssengar, vkuznets, tglx,
linux-hyperv, netdev, linux-kernel, linux-rdma
Cc: Souradeep Chakrabarti
From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
This patch addresses the VF unload issue, where mana_dealloc_queues()
gets stuck in an infinite while loop when the host is unresponsive.
It fixes this by bounding the loop with a timeout; any TX skbs still
pending after the timeout are unmapped and freed.
This patch also adds a new attribute to mana_context, which is set when
mana_hwc_send_request() times out because the host is unresponsive.
Successive requests then fail immediately instead of each waiting for
its own timeout.
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
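The bounded drain in mana_dealloc_queues() relies on time_before(), which compares jiffies values correctly even when the counter wraps. Below is a user-space sketch under stated assumptions. demo_time_after() and demo_time_before() copy the arithmetic of the kernel macros in include/linux/jiffies.h, without the typecheck() wrappers. demo_drain() is a hypothetical model of the loop: a fake tick counter stands in for jiffies, and completions_per_tick stands in for the host finishing sends.

```c
#include <limits.h>	/* ULONG_MAX, handy for trying wraparound cases */
#include <stdbool.h>

/* Same arithmetic as the kernel's time_after(); typecheck() omitted. */
static inline bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static inline bool demo_time_before(unsigned long a, unsigned long b)
{
	return demo_time_after(b, a);
}

/*
 * Hypothetical model of the drain loop: poll until nothing is pending or
 * the deadline passes. 'now' advances one tick per poll, standing in for
 * usleep_range() letting jiffies move on. Returns how many sends are still
 * pending, i.e. what the caller would have to force-free afterwards.
 */
static int demo_drain(int pending, unsigned long now,
		      unsigned long timeout_ticks, int completions_per_tick)
{
	unsigned long deadline = now + timeout_ticks;	/* may wrap past ULONG_MAX */

	while (pending > 0 && demo_time_before(now, deadline)) {
		pending -= completions_per_tick;
		if (pending < 0)
			pending = 0;
		now++;
	}
	return pending;
}
```

With completions_per_tick set to 0 (an unresponsive host), demo_drain() still returns once the deadline passes, even when the deadline wraps past ULONG_MAX. A plain `now < deadline` comparison would end the wait immediately in that case.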
.../net/ethernet/microsoft/mana/gdma_main.c | 4 +++-
.../net/ethernet/microsoft/mana/hw_channel.c | 12 ++++++++++-
drivers/net/ethernet/microsoft/mana/mana_en.c | 21 +++++++++++++++++--
include/net/mana/mana.h | 2 ++
4 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 8f3f78b68592..5cc43ae78334 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -946,10 +946,12 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
struct gdma_context *gc = gd->gdma_context;
struct gdma_general_resp resp = {};
struct gdma_general_req req = {};
+ struct mana_context *ac;
int err;
if (gd->pdid == INVALID_PDID)
return -EINVAL;
+ ac = (struct mana_context *)gd->driver_data;
mana_gd_init_req_hdr(&req.hdr, GDMA_DEREGISTER_DEVICE, sizeof(req),
sizeof(resp));
@@ -957,7 +959,7 @@ int mana_gd_deregister_device(struct gdma_dev *gd)
req.hdr.dev_id = gd->dev_id;
err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
- if (err || resp.hdr.status) {
+ if ((err || resp.hdr.status) && !ac->vf_unload_timeout) {
dev_err(gc->dev, "Failed to deregister device: %d, 0x%x\n",
err, resp.hdr.status);
if (!err)
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index 9d1507eba5b9..557b890ad0ae 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -1,8 +1,10 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
/* Copyright (c) 2021, Microsoft Corporation. */
+#include "asm-generic/errno.h"
#include <net/mana/gdma.h>
#include <net/mana/hw_channel.h>
+#include <net/mana/mana.h>
static int mana_hwc_get_msg_index(struct hw_channel_context *hwc, u16 *msg_id)
{
@@ -786,12 +788,19 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
struct hwc_wq *txq = hwc->txq;
struct gdma_req_hdr *req_msg;
struct hwc_caller_ctx *ctx;
+ struct mana_context *ac;
u32 dest_vrcq = 0;
u32 dest_vrq = 0;
u16 msg_id;
int err;
mana_hwc_get_msg_index(hwc, &msg_id);
+ ac = (struct mana_context *)hwc->gdma_dev->driver_data;
+ if (ac->vf_unload_timeout) {
+ dev_err(hwc->dev, "HWC: vport is already unloaded.\n");
+ err = -ETIMEDOUT;
+ goto out;
+ }
tx_wr = &txq->msg_buf->reqs[msg_id];
@@ -825,9 +834,10 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
goto out;
}
- if (!wait_for_completion_timeout(&ctx->comp_event, 30 * HZ)) {
+ if (!wait_for_completion_timeout(&ctx->comp_event, 5 * HZ)) {
dev_err(hwc->dev, "HWC: Request timed out!\n");
err = -ETIMEDOUT;
+ ac->vf_unload_timeout = true;
goto out;
}
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d907727c7b7a..24f5508d2979 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2330,7 +2330,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
struct mana_port_context *apc = netdev_priv(ndev);
struct gdma_dev *gd = apc->ac->gdma_dev;
struct mana_txq *txq;
+ struct sk_buff *skb;
+ struct mana_cq *cq;
int i, err;
+ unsigned long timeout;
if (apc->port_is_up)
return -EINVAL;
@@ -2348,13 +2351,26 @@ static int mana_dealloc_queues(struct net_device *ndev)
*
* Drain all the in-flight TX packets
*/
+
+ timeout = jiffies + 120 * HZ;
for (i = 0; i < apc->num_queues; i++) {
txq = &apc->tx_qp[i].txq;
-
- while (atomic_read(&txq->pending_sends) > 0)
+ while (atomic_read(&txq->pending_sends) > 0 &&
+ time_before(jiffies, timeout)) {
usleep_range(1000, 2000);
+ }
}
+ for (i = 0; i < apc->num_queues; i++) {
+ txq = &apc->tx_qp[i].txq;
+ cq = &apc->tx_qp[i].tx_cq;
+ while (atomic_read(&txq->pending_sends)) {
+ skb = skb_dequeue(&txq->pending_skbs);
+ mana_unmap_skb(skb, apc);
+ napi_consume_skb(skb, cq->budget);
+ atomic_sub(1, &txq->pending_sends);
+ }
+ }
/* We're 100% sure the queues can no longer be woken up, because
* we're sure now mana_poll_tx_cq() can't be running.
*/
@@ -2605,6 +2621,7 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
}
}
+ ac->vf_unload_timeout = false;
err = add_adev(gd);
out:
if (err)
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 9eef19972845..34f5d8e06ede 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -361,6 +361,8 @@ struct mana_context {
struct mana_eq *eqs;
struct net_device *ports[MAX_PORTS_IN_MANA_DEV];
+
+ bool vf_unload_timeout;
};
struct mana_port_context {
--
2.34.1
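The fail-fast behaviour described in the commit message can be modelled in a few lines. This is a single-threaded sketch with hypothetical names (demo_ctx, demo_send_request(), demo_probe()). It does not model the real HWC queues or any concurrent access to vf_unload_timeout.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mana_context: only the new flag. */
struct demo_ctx {
	bool vf_unload_timeout;
	int requests_sent;	/* requests that actually reached the "host" */
};

/*
 * Models mana_hwc_send_request(): fail fast once an earlier request has
 * timed out. host_responds == false models wait_for_completion_timeout()
 * expiring.
 */
static int demo_send_request(struct demo_ctx *ac, bool host_responds)
{
	if (ac->vf_unload_timeout)
		return -ETIMEDOUT;	/* skip the wait entirely */

	ac->requests_sent++;
	if (!host_responds) {
		ac->vf_unload_timeout = true;	/* sticky until the next probe */
		return -ETIMEDOUT;
	}
	return 0;
}

/* Models the reset of the flag in mana_probe(). */
static void demo_probe(struct demo_ctx *ac)
{
	ac->vf_unload_timeout = false;
}
```

Once one request times out, later requests return -ETIMEDOUT without reaching the host until the next probe clears the flag. In this model, a teardown that issues several requests therefore waits for at most one timeout.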
* [PATCH 09/11] sysctl: Remove the end element in sysctl table arrays
From: Joel Granados @ 2023-06-21 9:48 UTC
To: mcgrof, Russell King, Catalin Marinas, Will Deacon,
Michael Ellerman, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Gerald Schaefer, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Herbert Xu, David S. Miller, Russ Weight, Greg Kroah-Hartman,
Phillip Potter, Clemens Ladisch, Arnd Bergmann, Corey Minyard,
Theodore Ts'o, Jason A. Donenfeld, Jani Nikula,
Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, David Airlie,
Daniel Vetter, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Jason Gunthorpe, Leon Romanovsky,
Benjamin Herrenschmidt, Song Liu, Robin Holt, Steve Wahl,
David Ahern, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Sudip Mukherjee, Mark Rutland, James E.J. Bottomley,
Martin K. Petersen, Doug Gilbert, Jiri Slaby, Juergen Gross,
Stefano Stabellini, Alexander Viro, Christian Brauner,
Benjamin LaHaise, David Howells, Jan Harkes, coda,
Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton,
Jan Kara, Anton Altaparmakov, Mark Fasheh, Joel Becker, Joseph Qi,
Kees Cook, Iurii Zaikin, Eric Biggers, Darrick J. Wong,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Balbir Singh, Eric Biederman, Naveen N. Rao, Anil S Keshavamurthy,
Masami Hiramatsu, Peter Zijlstra, Petr Mladek, Sergey Senozhatsky,
Juri Lelli, Vincent Guittot, John Stultz, Steven Rostedt,
Andrew Morton, Mike Kravetz, Muchun Song, Naoya Horiguchi,
Matthew Wilcox (Oracle), Joerg Reuter, Ralf Baechle,
Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
Roopa Prabhu, Nikolay Aleksandrov, Alexander Aring,
Stefan Schmidt, Miquel Raynal, Steffen Klassert, Matthieu Baerts,
Mat Martineau, Simon Horman, Julian Anastasov,
Remi Denis-Courmont, Santosh Shilimkar, Marc Dionne, Neil Horman,
Marcelo Ricardo Leitner, Xin Long, Karsten Graul, Wenjia Zhang,
Jan Karcher, Jon Maloy, Ying Xue, Martin Schiller, John Johansen,
Paul Moore, James Morris, Serge E. Hallyn, Jarkko Sakkinen
Cc: Joel Granados, Nicholas Piggin, Christophe Leroy,
Christian Borntraeger, Sven Schnelle, H. Peter Anvin,
Rafael J. Wysocki, Mike Travis, Oleksandr Tyshchenko,
Amir Goldstein, Matthew Bobrowski, John Fastabend,
Martin KaFai Lau, Yonghong Song, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Waiman Long, Boqun Feng, John Ogness,
Dietmar Eggemann, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, Valentin Schneider, Andy Lutomirski,
Will Drewry, Stephen Boyd, Miaohe Lin, linux-arm-kernel,
linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-crypto,
openipmi-developer, intel-gfx, dri-devel, linux-hyperv,
linux-rdma, linux-raid, netdev, linux-scsi, xen-devel,
linux-fsdevel, linux-aio, linux-cachefs, codalist, linux-mm,
linux-nfs, linux-ntfs-dev, ocfs2-devel, fsverity, linux-xfs, bpf,
kexec, linux-trace-kernel, linux-hams, netfilter-devel, coreteam,
bridge, dccp, linux-wpan, mptcp, lvs-devel, rds-devel, linux-afs,
linux-sctp, tipc-discussion, linux-x25, apparmor,
linux-security-module, keyrings
In-Reply-To: <20230621091000.424843-1-j.granados@samsung.com>
Remove the empty end element from all the arrays that are passed to the
register sysctl calls. In some files this means reducing the explicit
array size by one. Also make sure that we are using the size in
ctl_table_header instead of evaluating the .procname element.
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
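The sketch below contrasts the two ways of walking a table that the change above switches between. The old way stops at an empty terminating entry whose .procname is NULL; the new way iterates over an explicit element count. struct demo_ctl_entry and the helper functions are hypothetical stand-ins for struct ctl_table and the sysctl core. The entry names are borrowed from ctl_isa_vars in arch/arm/kernel/isa.c.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical trimmed-down ctl_table entry. */
struct demo_ctl_entry {
	const char *procname;
	int mode;
};

/* Old style: terminated by an empty entry whose procname is NULL. */
static const struct demo_ctl_entry with_sentinel[] = {
	{ .procname = "membase",   .mode = 0444 },
	{ .procname = "portbase",  .mode = 0444 },
	{ .procname = "portshift", .mode = 0444 },
	{ .procname = NULL },	/* sentinel */
};

/* New style: no sentinel; the element count travels with the table. */
static const struct demo_ctl_entry without_sentinel[] = {
	{ .procname = "membase",   .mode = 0444 },
	{ .procname = "portbase",  .mode = 0444 },
	{ .procname = "portshift", .mode = 0444 },
};

/* Walk until the NULL procname; returns the index of name, or -1. */
static int find_by_sentinel(const struct demo_ctl_entry *table, const char *name)
{
	for (int i = 0; table[i].procname; i++)
		if (strcmp(table[i].procname, name) == 0)
			return i;
	return -1;
}

/* Walk exactly 'size' entries; no terminator is read. */
static int find_by_size(const struct demo_ctl_entry *table, size_t size,
			const char *name)
{
	for (size_t i = 0; i < size; i++)
		if (strcmp(table[i].procname, name) == 0)
			return (int)i;
	return -1;
}
```

Without the sentinel the array shrinks by one entry, from 4 to 3 here, the same change as ctl_isa_vars[4] becoming ctl_isa_vars[]. Every walker must then be told the count. This is why the patch passes a count to register_sysctl() and relies on the size kept in ctl_table_header.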
arch/arm/kernel/isa.c | 4 +-
arch/arm64/kernel/armv8_deprecated.c | 8 ++--
arch/arm64/kernel/fpsimd.c | 6 +--
arch/arm64/kernel/process.c | 3 +-
arch/ia64/kernel/crash.c | 3 +-
arch/powerpc/kernel/idle.c | 3 +-
arch/powerpc/platforms/pseries/mobility.c | 3 +-
arch/s390/appldata/appldata_base.c | 7 ++--
arch/s390/kernel/debug.c | 3 +-
arch/s390/kernel/topology.c | 3 +-
arch/s390/mm/cmm.c | 3 +-
arch/s390/mm/pgalloc.c | 3 +-
arch/x86/entry/vdso/vdso32-setup.c | 3 +-
arch/x86/kernel/cpu/intel.c | 3 +-
arch/x86/kernel/itmt.c | 3 +-
crypto/fips.c | 3 +-
drivers/base/firmware_loader/fallback_table.c | 3 +-
drivers/cdrom/cdrom.c | 3 +-
drivers/char/hpet.c | 13 +++---
drivers/char/ipmi/ipmi_poweroff.c | 3 +-
drivers/char/random.c | 3 +-
drivers/gpu/drm/i915/i915_perf.c | 33 +++++++--------
drivers/hv/hv_common.c | 3 +-
drivers/infiniband/core/iwcm.c | 3 +-
drivers/infiniband/core/ucma.c | 3 +-
drivers/macintosh/mac_hid.c | 3 +-
drivers/md/md.c | 3 +-
drivers/misc/sgi-xp/xpc_main.c | 6 +--
drivers/net/vrf.c | 3 +-
drivers/parport/procfs.c | 42 ++++++++-----------
drivers/perf/arm_pmuv3.c | 3 +-
drivers/scsi/scsi_sysctl.c | 3 +-
drivers/scsi/sg.c | 3 +-
drivers/tty/tty_io.c | 3 +-
drivers/xen/balloon.c | 3 +-
fs/aio.c | 3 +-
fs/cachefiles/error_inject.c | 3 +-
fs/coda/sysctl.c | 3 +-
fs/coredump.c | 3 +-
fs/dcache.c | 3 +-
fs/devpts/inode.c | 3 +-
fs/eventpoll.c | 3 +-
fs/exec.c | 3 +-
fs/file_table.c | 3 +-
fs/inode.c | 3 +-
fs/lockd/svc.c | 3 +-
fs/locks.c | 3 +-
fs/namei.c | 3 +-
fs/namespace.c | 3 +-
fs/nfs/nfs4sysctl.c | 3 +-
fs/nfs/sysctl.c | 3 +-
fs/notify/dnotify/dnotify.c | 3 +-
fs/notify/fanotify/fanotify_user.c | 3 +-
fs/notify/inotify/inotify_user.c | 3 +-
fs/ntfs/sysctl.c | 3 +-
fs/ocfs2/stackglue.c | 3 +-
fs/pipe.c | 3 +-
fs/proc/proc_sysctl.c | 8 ++--
fs/quota/dquot.c | 3 +-
fs/sysctls.c | 3 +-
fs/userfaultfd.c | 3 +-
fs/verity/signature.c | 3 +-
fs/xfs/xfs_sysctl.c | 4 +-
init/do_mounts_initrd.c | 3 +-
ipc/ipc_sysctl.c | 3 +-
ipc/mq_sysctl.c | 3 +-
kernel/acct.c | 3 +-
kernel/bpf/syscall.c | 3 +-
kernel/delayacct.c | 3 +-
kernel/exit.c | 3 +-
kernel/hung_task.c | 3 +-
kernel/kexec_core.c | 3 +-
kernel/kprobes.c | 3 +-
kernel/latencytop.c | 3 +-
kernel/locking/lockdep.c | 3 +-
kernel/panic.c | 3 +-
kernel/pid_namespace.c | 3 +-
kernel/pid_sysctl.h | 3 +-
kernel/printk/sysctl.c | 3 +-
kernel/reboot.c | 3 +-
kernel/sched/autogroup.c | 3 +-
kernel/sched/core.c | 3 +-
kernel/sched/deadline.c | 3 +-
kernel/sched/fair.c | 3 +-
kernel/sched/rt.c | 3 +-
kernel/sched/topology.c | 3 +-
kernel/seccomp.c | 3 +-
kernel/signal.c | 3 +-
kernel/stackleak.c | 3 +-
kernel/sysctl.c | 6 +--
kernel/time/timer.c | 3 +-
kernel/trace/ftrace.c | 3 +-
kernel/trace/trace_events_user.c | 3 +-
kernel/ucount.c | 7 ++--
kernel/umh.c | 3 +-
kernel/utsname_sysctl.c | 3 +-
kernel/watchdog.c | 3 +-
lib/test_sysctl.c | 6 +--
mm/compaction.c | 3 +-
mm/hugetlb.c | 3 +-
mm/hugetlb_vmemmap.c | 3 +-
mm/memory-failure.c | 3 +-
mm/oom_kill.c | 3 +-
mm/page-writeback.c | 3 +-
net/appletalk/sysctl_net_atalk.c | 3 +-
net/ax25/sysctl_net_ax25.c | 5 +--
net/bridge/br_netfilter_hooks.c | 3 +-
net/core/neighbour.c | 14 +++----
net/core/sysctl_net_core.c | 6 +--
net/dccp/sysctl.c | 4 +-
net/ieee802154/6lowpan/reassembly.c | 6 +--
net/ipv4/devinet.c | 5 +--
net/ipv4/ip_fragment.c | 6 +--
net/ipv4/route.c | 6 +--
net/ipv4/sysctl_net_ipv4.c | 6 +--
net/ipv4/xfrm4_policy.c | 3 +-
net/ipv6/addrconf.c | 5 +--
net/ipv6/icmp.c | 3 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 3 +-
net/ipv6/reassembly.c | 6 +--
net/ipv6/route.c | 3 +-
net/ipv6/sysctl_net_ipv6.c | 6 +--
net/ipv6/xfrm6_policy.c | 3 +-
net/llc/sysctl_net_llc.c | 4 +-
net/mpls/af_mpls.c | 10 ++---
net/mptcp/ctrl.c | 3 +-
net/netfilter/ipvs/ip_vs_ctl.c | 3 +-
net/netfilter/ipvs/ip_vs_lblc.c | 3 +-
net/netfilter/ipvs/ip_vs_lblcr.c | 3 +-
net/netfilter/nf_conntrack_standalone.c | 10 ++---
net/netfilter/nf_log.c | 5 +--
net/netrom/sysctl_net_netrom.c | 3 +-
net/phonet/sysctl.c | 3 +-
net/rds/ib_sysctl.c | 3 +-
net/rds/sysctl.c | 3 +-
net/rds/tcp.c | 3 +-
net/rose/sysctl_net_rose.c | 3 +-
net/rxrpc/sysctl.c | 3 +-
net/sctp/sysctl.c | 10 ++---
net/smc/smc_sysctl.c | 3 +-
net/sunrpc/sysctl.c | 3 +-
net/sunrpc/xprtrdma/svc_rdma.c | 3 +-
net/sunrpc/xprtrdma/transport.c | 3 +-
net/sunrpc/xprtsock.c | 3 +-
net/tipc/sysctl.c | 3 +-
net/unix/sysctl_net_unix.c | 3 +-
net/x25/sysctl_net_x25.c | 3 +-
net/xfrm/xfrm_sysctl.c | 3 +-
security/apparmor/lsm.c | 4 +-
security/keys/sysctl.c | 7 ++--
security/loadpin/loadpin.c | 3 +-
security/yama/yama_lsm.c | 3 +-
152 files changed, 228 insertions(+), 407 deletions(-)
diff --git a/arch/arm/kernel/isa.c b/arch/arm/kernel/isa.c
index 561432e3c55a..72b1a0e63d21 100644
--- a/arch/arm/kernel/isa.c
+++ b/arch/arm/kernel/isa.c
@@ -16,7 +16,7 @@
static unsigned int isa_membase, isa_portbase, isa_portshift;
-static struct ctl_table ctl_isa_vars[4] = {
+static struct ctl_table ctl_isa_vars[] = {
{
.procname = "membase",
.data = &isa_membase,
@@ -35,7 +35,7 @@ static struct ctl_table ctl_isa_vars[4] = {
.maxlen = sizeof(isa_portshift),
.mode = 0444,
.proc_handler = proc_dointvec,
- }, {}
+ }
};
static struct ctl_table_header *isa_sysctl_header;
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 68ed60a521a6..43945a8bb8e0 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -52,10 +52,8 @@ struct insn_emulation {
int min;
int max;
- /*
- * sysctl for this emulation + a sentinal entry.
- */
- struct ctl_table sysctl[2];
+ /* sysctl for this emulation */
+ struct ctl_table sysctl;
};
#define ARM_OPCODE_CONDTEST_FAIL 0
@@ -558,7 +556,7 @@ static void __init register_insn_emulation(struct insn_emulation *insn)
update_insn_emulation_mode(insn, INSN_UNDEF);
if (insn->status != INSN_UNAVAILABLE) {
- sysctl = &insn->sysctl[0];
+ sysctl = &insn->sysctl;
sysctl->mode = 0644;
sysctl->maxlen = sizeof(int);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ecfb2ef6a036..37155b4ae893 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -588,8 +588,7 @@ static struct ctl_table sve_default_vl_table[] = {
.mode = 0644,
.proc_handler = vec_proc_do_default_vl,
.extra1 = &vl_info[ARM64_VEC_SVE],
- },
- { }
+ }
};
static int __init sve_sysctl_init(void)
@@ -613,8 +612,7 @@ static struct ctl_table sme_default_vl_table[] = {
.mode = 0644,
.proc_handler = vec_proc_do_default_vl,
.extra1 = &vl_info[ARM64_VEC_SME],
- },
- { }
+ }
};
static int __init sme_sysctl_init(void)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index cfe232960f2f..ae837130a265 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -723,8 +723,7 @@ static struct ctl_table tagged_addr_sysctl_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static int __init tagged_addr_init(void)
diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c
index 66917b879b2a..958ab3bbbdbc 100644
--- a/arch/ia64/kernel/crash.c
+++ b/arch/ia64/kernel/crash.c
@@ -231,8 +231,7 @@ static struct ctl_table kdump_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
#endif
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 3807169fc7e7..f98f7b00d3cf 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -104,8 +104,7 @@ static struct ctl_table powersave_nap_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- {}
+ }
};
static int __init
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 9fdbee8ee126..48337c9dd3a0 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -59,8 +59,7 @@ static struct ctl_table nmi_wd_lpm_factor_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_douintvec_minmax,
- },
- {}
+ }
};
static int __init register_nmi_wd_lpm_factor_sysctl(void)
diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
index 54d8ed1c4518..0e1136b3dc01 100644
--- a/arch/s390/appldata/appldata_base.c
+++ b/arch/s390/appldata/appldata_base.c
@@ -62,8 +62,7 @@ static struct ctl_table appldata_table[] = {
.procname = "interval",
.mode = S_IRUGO | S_IWUSR,
.proc_handler = appldata_interval_handler,
- },
- { },
+ }
};
/*
@@ -352,7 +351,7 @@ int appldata_register_ops(struct appldata_ops *ops)
return -EINVAL;
/* The last entry must be an empty one */
- ops->ctl_table = kcalloc(2, sizeof(struct ctl_table), GFP_KERNEL);
+ ops->ctl_table = kcalloc(1, sizeof(struct ctl_table), GFP_KERNEL);
if (!ops->ctl_table)
return -ENOMEM;
@@ -365,7 +364,7 @@ int appldata_register_ops(struct appldata_ops *ops)
ops->ctl_table[0].proc_handler = appldata_generic_handler;
ops->ctl_table[0].data = ops;
- ops->sysctl_header = register_sysctl(appldata_proc_name, ops->ctl_table);
+ ops->sysctl_header = register_sysctl(appldata_proc_name, ops->ctl_table, 1);
if (!ops->sysctl_header)
goto out;
return 0;
diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c
index 002f843e6523..24f33be6565d 100644
--- a/arch/s390/kernel/debug.c
+++ b/arch/s390/kernel/debug.c
@@ -977,8 +977,7 @@ static struct ctl_table s390dbf_table[] = {
.maxlen = sizeof(int),
.mode = S_IRUGO | S_IWUSR,
.proc_handler = s390dbf_procactive,
- },
- { }
+ }
};
static struct ctl_table_header *s390dbf_sysctl_header;
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 372d2c7c9a8e..931da71b8a4a 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -633,8 +633,7 @@ static struct ctl_table topology_ctl_table[] = {
.procname = "topology",
.mode = 0644,
.proc_handler = topology_ctl_handler,
- },
- { },
+ }
};
static int __init topology_init(void)
diff --git a/arch/s390/mm/cmm.c b/arch/s390/mm/cmm.c
index 918816dcb42a..1b304352a3e9 100644
--- a/arch/s390/mm/cmm.c
+++ b/arch/s390/mm/cmm.c
@@ -331,8 +331,7 @@ static struct ctl_table cmm_table[] = {
.procname = "cmm_timeout",
.mode = 0644,
.proc_handler = cmm_timeout_handler,
- },
- { }
+ }
};
#ifdef CONFIG_CMM_IUCV
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index a723f1a8236a..59444f580d0d 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -29,8 +29,7 @@ static struct ctl_table page_table_sysctl[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static int __init page_table_register_sysctl(void)
diff --git a/arch/x86/entry/vdso/vdso32-setup.c b/arch/x86/entry/vdso/vdso32-setup.c
index e28cdba83e0e..ab794f70a550 100644
--- a/arch/x86/entry/vdso/vdso32-setup.c
+++ b/arch/x86/entry/vdso/vdso32-setup.c
@@ -66,8 +66,7 @@ static struct ctl_table abi_table2[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- {}
+ }
};
static __init int ia32_binfmt_init(void)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index c77a3961443d..d446f2a0fbeb 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1189,8 +1189,7 @@ static struct ctl_table sld_sysctls[] = {
.proc_handler = proc_douintvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- {}
+ }
};
static int __init sld_mitigate_sysctl_init(void)
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 58ec95fce798..427093e4ef87 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -73,8 +73,7 @@ static struct ctl_table itmt_kern_table[] = {
.proc_handler = sched_itmt_update_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- {}
+ }
};
static struct ctl_table_header *itmt_sysctl_header;
diff --git a/crypto/fips.c b/crypto/fips.c
index 05a251680700..611a86bd2538 100644
--- a/crypto/fips.c
+++ b/crypto/fips.c
@@ -62,8 +62,7 @@ static struct ctl_table crypto_sysctl_table[] = {
.maxlen = 64,
.mode = 0444,
.proc_handler = proc_dostring
- },
- {}
+ }
};
static struct ctl_table_header *crypto_sysctls;
diff --git a/drivers/base/firmware_loader/fallback_table.c b/drivers/base/firmware_loader/fallback_table.c
index 7a2d584233bb..d7dedfb2f4d0 100644
--- a/drivers/base/firmware_loader/fallback_table.c
+++ b/drivers/base/firmware_loader/fallback_table.c
@@ -43,8 +43,7 @@ static struct ctl_table firmware_config_table[] = {
.proc_handler = proc_douintvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static struct ctl_table_header *firmware_config_sysct_table_header;
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 3855da76a16d..e1c352ebab16 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -3668,8 +3668,7 @@ static struct ctl_table cdrom_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = cdrom_sysctl_handler
- },
- { }
+ }
};
static struct ctl_table_header *cdrom_sysctl_header;
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index bb1eb801b20c..44eaec98f958 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -722,13 +722,12 @@ static int hpet_is_known(struct hpet_data *hdp)
static struct ctl_table hpet_table[] = {
{
- .procname = "max-user-freq",
- .data = &hpet_max_freq,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec,
- },
- {}
+ .procname = "max-user-freq",
+ .data = &hpet_max_freq,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ }
};
static struct ctl_table_header *sysctl_header;
diff --git a/drivers/char/ipmi/ipmi_poweroff.c b/drivers/char/ipmi/ipmi_poweroff.c
index 46b1ea866da9..40c43417a42e 100644
--- a/drivers/char/ipmi/ipmi_poweroff.c
+++ b/drivers/char/ipmi/ipmi_poweroff.c
@@ -655,8 +655,7 @@ static struct ctl_table ipmi_table[] = {
.data = &poweroff_powercycle,
.maxlen = sizeof(poweroff_powercycle),
.mode = 0644,
- .proc_handler = proc_dointvec },
- { }
+ .proc_handler = proc_dointvec }
};
static struct ctl_table_header *ipmi_table_header;
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 8db2ea9e3d66..e2998580afc6 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1682,8 +1682,7 @@ static struct ctl_table random_table[] = {
.procname = "uuid",
.mode = 0444,
.proc_handler = proc_do_uuid,
- },
- { }
+ }
};
/*
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f43950219ffc..e4d7372afb10 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -4884,24 +4884,23 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
static struct ctl_table oa_table[] = {
{
- .procname = "perf_stream_paranoid",
- .data = &i915_perf_stream_paranoid,
- .maxlen = sizeof(i915_perf_stream_paranoid),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
+ .procname = "perf_stream_paranoid",
+ .data = &i915_perf_stream_paranoid,
+ .maxlen = sizeof(i915_perf_stream_paranoid),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
{
- .procname = "oa_max_sample_rate",
- .data = &i915_oa_max_sample_rate,
- .maxlen = sizeof(i915_oa_max_sample_rate),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- .extra2 = &oa_sample_rate_hard_limit,
- },
- {}
+ .procname = "oa_max_sample_rate",
+ .data = &i915_oa_max_sample_rate,
+ .maxlen = sizeof(i915_oa_max_sample_rate),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = &oa_sample_rate_hard_limit,
+ }
};
static u32 num_perf_groups_per_gt(struct intel_gt *gt)
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index dd751c391cf7..0216ccd96496 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -146,8 +146,7 @@ static struct ctl_table hv_ctl_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE
- },
- {}
+ }
};
static int hv_die_panic_notify_crash(struct notifier_block *self,
diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 20627a894c89..0147aae8fe9b 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -110,8 +110,7 @@ static struct ctl_table iwcm_ctl_table[] = {
.maxlen = sizeof(default_backlog),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
/*
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index f737ab0de883..cbe1ebef2f2e 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -70,8 +70,7 @@ static struct ctl_table ucma_ctl_table[] = {
.maxlen = sizeof max_backlog,
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
struct ucma_file {
diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c
index 5d433ef430fa..822517cf4735 100644
--- a/drivers/macintosh/mac_hid.c
+++ b/drivers/macintosh/mac_hid.c
@@ -235,8 +235,7 @@ static struct ctl_table mac_hid_files[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static struct ctl_table_header *mac_hid_sysctl_header;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c10cc8ddd94d..3ad23a7bfbc5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -318,8 +318,7 @@ static struct ctl_table raid_table[] = {
.maxlen = sizeof(int),
.mode = S_IRUGO|S_IWUSR,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static int start_readonly;
diff --git a/drivers/misc/sgi-xp/xpc_main.c b/drivers/misc/sgi-xp/xpc_main.c
index 264b919d0610..3e6a598df22a 100644
--- a/drivers/misc/sgi-xp/xpc_main.c
+++ b/drivers/misc/sgi-xp/xpc_main.c
@@ -109,8 +109,7 @@ static struct ctl_table xpc_sys_xpc_hb[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &xpc_hb_check_min_interval,
- .extra2 = &xpc_hb_check_max_interval},
- {}
+ .extra2 = &xpc_hb_check_max_interval}
};
static struct ctl_table xpc_sys_xpc[] = {
{
@@ -120,8 +119,7 @@ static struct ctl_table xpc_sys_xpc[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &xpc_disengage_min_timelimit,
- .extra2 = &xpc_disengage_max_timelimit},
- {}
+ .extra2 = &xpc_disengage_max_timelimit}
};
static struct ctl_table_header *xpc_sysctl;
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index edd8f2ba5595..22dedd3671ec 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1964,8 +1964,7 @@ static const struct ctl_table vrf_table[] = {
.proc_handler = vrf_shared_table_handler,
/* set by the vrf_netns_init */
.extra1 = NULL,
- },
- { },
+ }
};
static int vrf_netns_init_sysctl(struct net *net, struct netns_vrf *nn_vrf)
diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 16cee52f035f..f6e0121f8904 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -259,8 +259,12 @@ PARPORT_MAX_SPINTIME_VALUE;
struct parport_sysctl_table {
struct ctl_table_header *port_header;
struct ctl_table_header *devices_header;
- struct ctl_table vars[12];
- struct ctl_table device_dir[2];
+#ifdef CONFIG_PARPORT_1284
+ struct ctl_table vars[10];
+#else
+ struct ctl_table vars[5];
+#endif /* IEEE 1284 support */
+ struct ctl_table device_dir[1];
};
static const struct parport_sysctl_table parport_sysctl_template = {
@@ -303,9 +307,9 @@ static const struct parport_sysctl_table parport_sysctl_template = {
.maxlen = 0,
.mode = 0444,
.proc_handler = do_hardware_modes
- },
+ }
#ifdef CONFIG_PARPORT_1284
- {
+ , {
.procname = "autoprobe",
.data = NULL,
.maxlen = 0,
@@ -339,9 +343,8 @@ static const struct parport_sysctl_table parport_sysctl_template = {
.maxlen = 0,
.mode = 0444,
.proc_handler = do_autoprobe
- },
+ }
#endif /* IEEE 1284 support */
- {}
},
{
{
@@ -350,20 +353,15 @@ static const struct parport_sysctl_table parport_sysctl_template = {
.maxlen = 0,
.mode = 0444,
.proc_handler = do_active_device
- },
- {}
+ }
},
};
struct parport_device_sysctl_table
{
struct ctl_table_header *sysctl_header;
- struct ctl_table vars[2];
- struct ctl_table device_dir[2];
- struct ctl_table devices_root_dir[2];
- struct ctl_table port_dir[2];
- struct ctl_table parport_dir[2];
- struct ctl_table dev_dir[2];
+ struct ctl_table vars[1];
+ struct ctl_table device_dir[1];
};
static const struct parport_device_sysctl_table
@@ -378,8 +376,7 @@ parport_device_sysctl_template = {
.proc_handler = proc_doulongvec_ms_jiffies_minmax,
.extra1 = (void*) &parport_min_timeslice_value,
.extra2 = (void*) &parport_max_timeslice_value
- },
- {}
+ }
},
{
{
@@ -387,18 +384,14 @@ parport_device_sysctl_template = {
.data = NULL,
.maxlen = 0,
.mode = 0555,
- },
- {}
+ }
}
};
struct parport_default_sysctl_table
{
struct ctl_table_header *sysctl_header;
- struct ctl_table vars[3];
- struct ctl_table default_dir[2];
- struct ctl_table parport_dir[2];
- struct ctl_table dev_dir[2];
+ struct ctl_table vars[2];
};
static struct parport_default_sysctl_table
@@ -422,8 +415,7 @@ parport_default_sysctl_table = {
.proc_handler = proc_dointvec_minmax,
.extra1 = (void*) &parport_min_spintime_value,
.extra2 = (void*) &parport_max_spintime_value
- },
- {}
+ }
}
};
@@ -443,7 +435,9 @@ int parport_proc_register(struct parport *port)
t->vars[0].data = &port->spintime;
for (i = 0; i < 5; i++) {
t->vars[i].extra1 = port;
+#ifdef CONFIG_PARPORT_1284
t->vars[5 + i].extra2 = &port->probe_info[i];
+#endif /* IEEE 1284 support */
}
port_name_len = strnlen(port->name, PARPORT_NAME_MAX_LEN);
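Without a sentinel, the size of parport's vars[] has to match the entries that are actually compiled in. That is why the array becomes 10 entries with CONFIG_PARPORT_1284 and 5 without, and why the vars[5 + i] store in parport_proc_register() now sits under the same #ifdef. Below is a sketch of that pattern. DEMO_1284, struct entry and struct demo_table are hypothetical stand-ins for CONFIG_PARPORT_1284 and the real parport structures, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical switch standing in for CONFIG_PARPORT_1284. */
#define DEMO_1284

/* Simplified stand-in for struct ctl_table. */
struct entry {
	const char *procname;
	void *extra1;
	void *extra2;
};

struct demo_table {
#ifdef DEMO_1284
	struct entry vars[10];	/* 5 base entries + 5 probe entries */
#else
	struct entry vars[5];	/* base entries only */
#endif
};

/*
 * Same shape as the loop in parport_proc_register(): vars[5 + i] is only
 * written when the probe entries are compiled in, so the loop cannot run
 * past the end of the 5-entry array.
 */
static void fill(struct demo_table *t, void *port, void *probe_info[5])
{
	for (size_t i = 0; i < 5; i++) {
		t->vars[i].extra1 = port;
#ifdef DEMO_1284
		t->vars[5 + i].extra2 = probe_info[i];
#endif
	}
}
```

If DEMO_1284 is left undefined, the struct shrinks to 5 entries and the probe store disappears along with it. No sentinel slot is reserved in either configuration.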
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 763f9c8acfbf..85285c85bd49 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1179,8 +1179,7 @@ static struct ctl_table armv8_pmu_sysctl_table[] = {
.proc_handler = armv8pmu_proc_user_access_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static void armv8_pmu_register_sysctl_table(void)
diff --git a/drivers/scsi/scsi_sysctl.c b/drivers/scsi/scsi_sysctl.c
index 0378bd63fea4..22c2d055821e 100644
--- a/drivers/scsi/scsi_sysctl.c
+++ b/drivers/scsi/scsi_sysctl.c
@@ -17,8 +17,7 @@ static struct ctl_table scsi_table[] = {
.data = &scsi_logging_level,
.maxlen = sizeof(scsi_logging_level),
.mode = 0644,
- .proc_handler = proc_dointvec },
- { }
+ .proc_handler = proc_dointvec }
};
static struct ctl_table_header *scsi_table_header;
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d12cdf875b50..102a3640c6c7 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1638,8 +1638,7 @@ static struct ctl_table sg_sysctls[] = {
.maxlen = sizeof(int),
.mode = 0444,
.proc_handler = proc_dointvec,
- },
- {}
+ }
};
static struct ctl_table_header *hdr;
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 63fb3c543b94..bd6b12394d76 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -3608,8 +3608,7 @@ static struct ctl_table tty_table[] = {
.proc_handler = proc_dointvec,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
/*
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index e4544262a429..bef75c236104 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -93,8 +93,7 @@ static struct ctl_table balloon_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
#else
diff --git a/fs/aio.c b/fs/aio.c
index b09abe7a14d3..f2ed0a34a5d4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -238,8 +238,7 @@ static struct ctl_table aio_sysctls[] = {
.maxlen = sizeof(aio_max_nr),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
- },
- {}
+ }
};
static void __init aio_sysctl_init(void)
diff --git a/fs/cachefiles/error_inject.c b/fs/cachefiles/error_inject.c
index ea6bcce4f6f1..4fa84880c0d1 100644
--- a/fs/cachefiles/error_inject.c
+++ b/fs/cachefiles/error_inject.c
@@ -18,8 +18,7 @@ static struct ctl_table cachefiles_sysctls[] = {
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_douintvec,
- },
- {}
+ }
};
int __init cachefiles_register_error_injection(void)
diff --git a/fs/coda/sysctl.c b/fs/coda/sysctl.c
index 16224a7c6691..e377e400bfed 100644
--- a/fs/coda/sysctl.c
+++ b/fs/coda/sysctl.c
@@ -35,8 +35,7 @@ static struct ctl_table coda_table[] = {
.maxlen = sizeof(int),
.mode = 0600,
.proc_handler = proc_dointvec
- },
- {}
+ }
};
void coda_sysctl_init(void)
diff --git a/fs/coredump.c b/fs/coredump.c
index 7e55428dce13..99142426a156 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -978,8 +978,7 @@ static struct ctl_table coredump_sysctls[] = {
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static int __init init_fs_coredump_sysctls(void)
diff --git a/fs/dcache.c b/fs/dcache.c
index f02bfd383e66..257450fd8f01 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -190,8 +190,7 @@ static struct ctl_table fs_dcache_sysctls[] = {
.maxlen = 6*sizeof(long),
.mode = 0444,
.proc_handler = proc_nr_dentry,
- },
- { }
+ }
};
static int __init init_fs_dcache_sysctls(void)
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index c17f971a8c4b..8d56add71e71 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -68,8 +68,7 @@ static struct ctl_table pty_table[] = {
.mode = 0444,
.data = &pty_count,
.proc_handler = proc_dointvec,
- },
- {}
+ }
};
struct pts_mount_opts {
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e1a0e6a6d3de..b0556c52685c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -321,8 +321,7 @@ static struct ctl_table epoll_table[] = {
.proc_handler = proc_doulongvec_minmax,
.extra1 = &long_zero,
.extra2 = &long_max,
- },
- { }
+ }
};
static void __init epoll_sysctls_init(void)
diff --git a/fs/exec.c b/fs/exec.c
index 5572d148738b..9458ef2b8028 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2164,8 +2164,7 @@ static struct ctl_table fs_exec_sysctls[] = {
.proc_handler = proc_dointvec_minmax_coredump,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO,
- },
- { }
+ }
};
static int __init init_fs_exec_sysctls(void)
diff --git a/fs/file_table.c b/fs/file_table.c
index 23a645521960..6fec4c691f0a 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -114,8 +114,7 @@ static struct ctl_table fs_stat_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &sysctl_nr_open_min,
.extra2 = &sysctl_nr_open_max,
- },
- { }
+ }
};
static int __init init_fs_stat_sysctls(void)
diff --git a/fs/inode.c b/fs/inode.c
index 0a0ad1a2a5d2..79c5916cade7 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -129,8 +129,7 @@ static struct ctl_table inodes_sysctls[] = {
.maxlen = 7*sizeof(long),
.mode = 0444,
.proc_handler = proc_nr_inodes,
- },
- { }
+ }
};
static int __init init_fs_inode_sysctls(void)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 84736267f4e1..082fcf6340d4 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -506,8 +506,7 @@ static struct ctl_table nlm_sysctls[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
#endif /* CONFIG_SYSCTL */
diff --git a/fs/locks.c b/fs/locks.c
index ce5733480aa6..9750076cdd8d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -109,9 +109,8 @@ static struct ctl_table locks_sysctls[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
+ }
#endif /* CONFIG_MMU */
- {}
};
static int __init init_fs_locks_sysctls(void)
diff --git a/fs/namei.c b/fs/namei.c
index 9b567af081af..b0f8d09c2111 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1067,8 +1067,7 @@ static struct ctl_table namei_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO,
- },
- { }
+ }
};
static int __init init_fs_namei_sysctls(void)
diff --git a/fs/namespace.c b/fs/namespace.c
index e7f251e40485..780e9292fa52 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4709,8 +4709,7 @@ static struct ctl_table fs_namespace_sysctls[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ONE,
- },
- { }
+ }
};
static int __init init_fs_namespace_sysctls(void)
diff --git a/fs/nfs/nfs4sysctl.c b/fs/nfs/nfs4sysctl.c
index 4a542ee11e68..5515c2e8afe7 100644
--- a/fs/nfs/nfs4sysctl.c
+++ b/fs/nfs/nfs4sysctl.c
@@ -33,8 +33,7 @@ static struct ctl_table nfs4_cb_sysctls[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
int nfs4_register_sysctl(void)
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index 9dafd44670e4..8a71d31e5dc3 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -28,8 +28,7 @@ static struct ctl_table nfs_cb_sysctls[] = {
.maxlen = sizeof(nfs_congestion_kb),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
int nfs_register_sysctl(void)
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 2c6fe98d6fe1..409ca0a9a048 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -28,8 +28,7 @@ static struct ctl_table dnotify_sysctls[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- {}
+ }
};
static void __init dnotify_sysctl_init(void)
{
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 78d3bf479f59..4c43eb38b9cf 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -85,8 +85,7 @@ static struct ctl_table fanotify_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO
- },
- { }
+ }
};
static void __init fanotify_sysctls_init(void)
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 0ce25c4ddfec..02b74d8b4e28 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -84,8 +84,7 @@ static struct ctl_table inotify_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO
- },
- { }
+ }
};
static void __init inotify_sysctls_init(void)
diff --git a/fs/ntfs/sysctl.c b/fs/ntfs/sysctl.c
index 2c48f48a0b80..fef88fb6a40f 100644
--- a/fs/ntfs/sysctl.c
+++ b/fs/ntfs/sysctl.c
@@ -27,8 +27,7 @@ static struct ctl_table ntfs_sysctls[] = {
.maxlen = sizeof(debug_msgs),
.mode = 0644, /* Mode, proc handler. */
.proc_handler = proc_dointvec
- },
- {}
+ }
};
/* Storage for the sysctls header. */
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index 9a653875d1c5..7be619f93960 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -657,8 +657,7 @@ static struct ctl_table ocfs2_nm_table[] = {
.maxlen = OCFS2_MAX_HB_CTL_PATH,
.mode = 0644,
.proc_handler = proc_dostring,
- },
- { }
+ }
};
static struct ctl_table_header *ocfs2_table_header;
diff --git a/fs/pipe.c b/fs/pipe.c
index 8a808fc25552..8fab91c2d546 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1491,8 +1491,7 @@ static struct ctl_table fs_pipe_sysctls[] = {
.maxlen = sizeof(pipe_user_pages_soft),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
- },
- { }
+ }
};
#endif
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 9670c5b7b5b2..1debd01209fc 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -19,8 +19,9 @@
#include <linux/kmemleak.h>
#include "internal.h"
-#define list_for_each_table_entry(entry, header) \
- for ((entry) = (header->ctl_table); (entry)->procname; (entry)++)
+#define list_for_each_table_entry(entry, header) \
+ entry = header->ctl_table; \
+ for (size_t i = 0 ; i < header->ctl_table_size ; ++i, entry++)
static const struct dentry_operations proc_sys_dentry_operations;
static const struct file_operations proc_sys_file_operations;
@@ -69,8 +70,7 @@ static struct ctl_table root_table[] = {
{
.procname = "",
.mode = S_IFDIR|S_IRUGO|S_IXUGO,
- },
- { }
+ }
};
static struct ctl_table_root sysctl_table_root = {
.default_set.dir.header = {
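The new list_for_each_table_entry() limits the walk to header->ctl_table_size entries instead of stopping at the first entry whose procname is NULL. That change is what allows every table in this series to drop its trailing {} sentinel. Below is a minimal userspace sketch of the two iteration styles. struct entry, struct header and for_each_entry() are hypothetical stand-ins, not the kernel's struct ctl_table or struct ctl_table_header.

The sketch makes one design choice that differs from the macro in the patch. It assigns the entry inside the loop condition, so for_each_entry() expands to a single for statement and still behaves correctly after an unbraced if. A macro that expands to `entry = ...; for (...)` does not have that property.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for struct ctl_table and struct ctl_table_header. */
struct entry {
	const char *procname;
};

struct header {
	const struct entry *table;
	size_t table_size;	/* number of real entries; no sentinel */
};

/* Old style: stop at the zeroed sentinel entry ({}). */
static size_t count_until_sentinel(const struct entry *table)
{
	size_t n = 0;

	for (const struct entry *e = table; e->procname; e++)
		n++;
	return n;
}

/*
 * New style: visit exactly table_size entries. The assignment lives in
 * the loop condition, so the macro expands to one for statement.
 */
#define for_each_entry(e, h)						\
	for (size_t fe_i_ = 0;						\
	     fe_i_ < (h)->table_size && ((e) = &(h)->table[fe_i_], 1);	\
	     fe_i_++)

static size_t count_by_size(const struct header *h)
{
	const struct entry *e;
	size_t n = 0;

	for_each_entry(e, h) {
		assert(e->procname != NULL);
		n++;
	}
	return n;
}
```

Both helpers count the same two real entries. Only the sentinel-based walk needs the extra zeroed element at the end of the array.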
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 7c07654e4253..1eb18c8bd639 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -2941,9 +2941,8 @@ static struct ctl_table fs_dqstats_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
+ }
#endif
- { },
};
static int __init dquot_init(void)
diff --git a/fs/sysctls.c b/fs/sysctls.c
index 944254dd92c0..d6ed656738ff 100644
--- a/fs/sysctls.c
+++ b/fs/sysctls.c
@@ -25,8 +25,7 @@ static struct ctl_table fs_shared_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_MAXOLDUID,
- },
- { }
+ }
};
static int __init init_fs_sysctls(void)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 4c3858769226..165b9c52e626 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -44,8 +44,7 @@ static struct ctl_table vm_userfaultfd_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
#endif
diff --git a/fs/verity/signature.c b/fs/verity/signature.c
index f617c6a1f16c..05585e93f32b 100644
--- a/fs/verity/signature.c
+++ b/fs/verity/signature.c
@@ -97,8 +97,7 @@ static struct ctl_table fsverity_sysctl_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static int __init fsverity_sysctl_init(void)
diff --git a/fs/xfs/xfs_sysctl.c b/fs/xfs/xfs_sysctl.c
index 61075e9c9e37..5c6337526070 100644
--- a/fs/xfs/xfs_sysctl.c
+++ b/fs/xfs/xfs_sysctl.c
@@ -204,10 +204,8 @@ static struct ctl_table xfs_table[] = {
.proc_handler = xfs_stats_clear_proc_handler,
.extra1 = &xfs_params.stats_clear.min,
.extra2 = &xfs_params.stats_clear.max
- },
+ }
#endif /* CONFIG_PROC_FS */
-
- {}
};
int
diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 2b10abb8c80e..a894519efdd3 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -28,8 +28,7 @@ static struct ctl_table kern_do_mounts_initrd_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static __init int kernel_do_mounts_initrd_sysctls_init(void)
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 8c62e443f78b..a46d15f5b476 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -175,9 +175,8 @@ static struct ctl_table ipc_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_INT_MAX,
- },
+ }
#endif
- {}
};
static struct ctl_table_set *set_lookup(struct ctl_table_root *root)
diff --git a/ipc/mq_sysctl.c b/ipc/mq_sysctl.c
index ebb5ed81c151..8191d03b39cb 100644
--- a/ipc/mq_sysctl.c
+++ b/ipc/mq_sysctl.c
@@ -62,8 +62,7 @@ static struct ctl_table mq_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &msg_maxsize_limit_min,
.extra2 = &msg_maxsize_limit_max,
- },
- {}
+ }
};
static struct ctl_table_set *set_lookup(struct ctl_table_root *root)
diff --git a/kernel/acct.c b/kernel/acct.c
index 67125b7c5ca2..93417042762b 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -83,8 +83,7 @@ static struct ctl_table kern_acct_table[] = {
.maxlen = 3*sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static __init int kernel_acct_sysctls_init(void)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a81b5122b16b..980ad104fff8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5400,8 +5400,7 @@ static struct ctl_table bpf_syscall_table[] = {
.data = &bpf_stats_enabled_key.key,
.mode = 0644,
.proc_handler = bpf_stats_handler,
- },
- { }
+ }
};
static int __init bpf_syscall_sysctl_init(void)
diff --git a/kernel/delayacct.c b/kernel/delayacct.c
index 4ef14cb5b5a0..539cab051d17 100644
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -73,8 +73,7 @@ static struct ctl_table kern_delayacct_table[] = {
.proc_handler = sysctl_delayacct,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static __init int kernel_delayacct_sysctls_init(void)
diff --git a/kernel/exit.c b/kernel/exit.c
index 633c7a52ef80..87cb53a33bbc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -89,8 +89,7 @@ static struct ctl_table kern_exit_table[] = {
.maxlen = sizeof(oops_limit),
.mode = 0644,
.proc_handler = proc_douintvec,
- },
- { }
+ }
};
static __init int kernel_exit_sysctls_init(void)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 816f133266c4..8d0659453421 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -312,8 +312,7 @@ static struct ctl_table hung_task_sysctls[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_NEG_ONE,
- },
- {}
+ }
};
static void __init hung_task_sysctl_init(void)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 63b04e710890..160779e0a503 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1001,8 +1001,7 @@ static struct ctl_table kexec_core_sysctls[] = {
.data = &load_limit_reboot,
.mode = 0644,
.proc_handler = kexec_limit_handler,
- },
- { }
+ }
};
static int __init kexec_core_sysctl_init(void)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 06a3ac7993f0..ae6b0f78ae6c 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -967,8 +967,7 @@ static struct ctl_table kprobe_sysctls[] = {
.proc_handler = proc_kprobes_optimization_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- {}
+ }
};
static void __init kprobe_sysctls_init(void)
diff --git a/kernel/latencytop.c b/kernel/latencytop.c
index 55050ae0e197..bb4dd0691b3c 100644
--- a/kernel/latencytop.c
+++ b/kernel/latencytop.c
@@ -84,8 +84,7 @@ static struct ctl_table latencytop_sysctl[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = sysctl_latencytop,
- },
- {}
+ }
};
#endif
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 1e29cec7e00c..0db8af590f87 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -95,9 +95,8 @@ static struct ctl_table kern_lockdep_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
+ }
#endif /* CONFIG_LOCK_STAT */
- { }
};
static __init int kernel_lockdep_sysctls_init(void)
diff --git a/kernel/panic.c b/kernel/panic.c
index 0008273d23fd..79786433efda 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -98,8 +98,7 @@ static struct ctl_table kern_panic_table[] = {
.maxlen = sizeof(warn_limit),
.mode = 0644,
.proc_handler = proc_douintvec,
- },
- { }
+ }
};
static __init int kernel_panic_sysctls_init(void)
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 7fd5e8adc2e8..dc7adbd2f412 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -311,8 +311,7 @@ static struct ctl_table pid_ns_ctl_table[] = {
.proc_handler = pid_ns_ctl_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = &pid_max,
- },
- { }
+ }
};
#endif /* CONFIG_CHECKPOINT_RESTORE */
diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
index 8b24744752cb..b9528766d2d8 100644
--- a/kernel/pid_sysctl.h
+++ b/kernel/pid_sysctl.h
@@ -43,8 +43,7 @@ static struct ctl_table pid_ns_ctl_table_vm[] = {
.proc_handler = pid_mfd_noexec_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO,
- },
- { }
+ }
};
static inline void register_pid_ns_sysctl_table_vm(void)
{
diff --git a/kernel/printk/sysctl.c b/kernel/printk/sysctl.c
index 28f37b86414e..d608832b4489 100644
--- a/kernel/printk/sysctl.c
+++ b/kernel/printk/sysctl.c
@@ -75,8 +75,7 @@ static struct ctl_table printk_sysctls[] = {
.proc_handler = proc_dointvec_minmax_sysadmin,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO,
- },
- {}
+ }
};
void __init printk_sysctl_init(void)
diff --git a/kernel/reboot.c b/kernel/reboot.c
index cf81d8bfb523..e29d415810c1 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -1271,8 +1271,7 @@ static struct ctl_table kern_reboot_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static void __init kernel_reboot_sysctls_init(void)
diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index 2b9ce82279a5..4c558f0de4f7 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -18,8 +18,7 @@ static struct ctl_table sched_autogroup_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- {}
+ }
};
static void __init sched_autogroup_sysctl_init(void)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8c7e01dd78a..f11ac1d3e315 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4671,9 +4671,8 @@ static struct ctl_table sched_core_sysctls[] = {
.proc_handler = sysctl_numa_balancing,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_FOUR,
- },
+ }
#endif /* CONFIG_NUMA_BALANCING */
- {}
};
static int __init sched_core_sysctl_init(void)
{
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 2aacf5ea2ff3..a6cbdf588590 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -40,8 +40,7 @@ static struct ctl_table sched_dl_sysctls[] = {
.mode = 0644,
.proc_handler = proc_douintvec_minmax,
.extra2 = (void *)&sysctl_sched_dl_period_max,
- },
- {}
+ }
};
static int __init sched_dl_sysctl_init(void)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index db09e56c2dd3..876f110e696d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -210,9 +210,8 @@ static struct ctl_table sched_fair_sysctls[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- },
+ }
#endif /* CONFIG_NUMA_BALANCING */
- {}
};
static int __init sched_fair_sysctl_init(void)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index aab9b900ed6f..2e2d49467dd9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -51,8 +51,7 @@ static struct ctl_table sched_rt_sysctls[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = sched_rr_handler,
- },
- {}
+ }
};
static int __init sched_rt_sysctl_init(void)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 46d7c3f3e830..cd3fffecbce3 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -249,8 +249,7 @@ static struct ctl_table sched_energy_aware_sysctls[] = {
.proc_handler = sched_energy_aware_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- {}
+ }
};
static int __init sched_energy_aware_sysctl_init(void)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 9683a9a4709d..1693f0935904 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -2380,8 +2380,7 @@ static struct ctl_table seccomp_sysctl_table[] = {
.procname = "actions_logged",
.mode = 0644,
.proc_handler = seccomp_actions_logged_handler,
- },
- { }
+ }
};
static int __init seccomp_sysctl_init(void)
diff --git a/kernel/signal.c b/kernel/signal.c
index 19791930f12a..4a87ba91491f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4781,9 +4781,8 @@ static struct ctl_table signal_debug_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec
- },
+ }
#endif
- { }
};
static int __init init_signal_sysctls(void)
diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index 123844341148..6a9a65ace05a 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -53,8 +53,7 @@ static struct ctl_table stackleak_sysctls[] = {
.proc_handler = stack_erasing_sysctl,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- {}
+ }
};
static int __init stackleak_sysctls_init(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2b9b0c8569ba..f1865c593666 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2041,9 +2041,8 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ONE,
.extra2 = SYSCTL_INT_MAX,
- },
+ }
#endif
- { }
};
static struct ctl_table vm_table[] = {
@@ -2314,9 +2313,8 @@ static struct ctl_table vm_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = (void *)&mmap_rnd_compat_bits_min,
.extra2 = (void *)&mmap_rnd_compat_bits_max,
- },
+ }
#endif
- { }
};
int __init sysctl_init_bases(void)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index de385b365a7a..b7594ac53c99 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -259,8 +259,7 @@ static struct ctl_table timer_sysctl[] = {
.proc_handler = timer_migration_handler,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- {}
+ }
};
static int __init timer_sysctl_init(void)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 84ef42111f78..7a5a607b6cc2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -8213,8 +8213,7 @@ static struct ctl_table ftrace_sysctls[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = ftrace_enable_sysctl,
- },
- {}
+ }
};
static int __init ftrace_sysctl_init(void)
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index ac019cb21b18..baf48c16d2c5 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -2530,8 +2530,7 @@ static struct ctl_table user_event_sysctls[] = {
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = set_max_user_events_sysctl,
- },
- {}
+ }
};
static int __init trace_events_user_init(void)
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 59bf6983f1cf..ce7ab90f7953 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -85,10 +85,9 @@ static struct ctl_table user_table[] = {
#endif
#ifdef CONFIG_FANOTIFY
UCOUNT_ENTRY("max_fanotify_groups"),
- UCOUNT_ENTRY("max_fanotify_marks"),
+ UCOUNT_ENTRY("max_fanotify_marks")
#endif
- { }
-};
+ };
#endif /* CONFIG_SYSCTL */
bool setup_userns_sysctls(struct user_namespace *ns)
@@ -96,7 +95,7 @@ bool setup_userns_sysctls(struct user_namespace *ns)
#ifdef CONFIG_SYSCTL
struct ctl_table *tbl;
- BUILD_BUG_ON(ARRAY_SIZE(user_table) != UCOUNT_COUNTS + 1);
+ BUILD_BUG_ON(ARRAY_SIZE(user_table) != UCOUNT_COUNTS);
setup_sysctl_set(&ns->set, &set_root, set_is_seen);
tbl = kmemdup(user_table, sizeof(user_table), GFP_KERNEL);
if (tbl) {
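With the sentinel gone, the compile-time check in setup_userns_sysctls() compares ARRAY_SIZE(user_table) directly against UCOUNT_COUNTS instead of UCOUNT_COUNTS + 1. Below is a userspace analogue. It uses C11 _Static_assert in place of BUILD_BUG_ON, and a hypothetical three-counter enum stands in for the real ucount types.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical counters standing in for the ucount types. */
enum counter { C_PIDS, C_FILES, C_MARKS, C_COUNTS };

struct entry {
	const char *procname;
};

static const struct entry table[] = {
	{ "max_pids" },
	{ "max_files" },
	{ "max_marks" },	/* no trailing {} sentinel */
};

/* Userspace analogue of BUILD_BUG_ON(ARRAY_SIZE(user_table) != UCOUNT_COUNTS). */
_Static_assert(ARRAY_SIZE(table) == C_COUNTS,
	       "exactly one table entry per counter");

/* Index the table by counter; valid only because sizes match exactly. */
static const char *counter_name(enum counter c)
{
	assert(c < C_COUNTS);
	return table[c].procname;
}
```

If a counter is added without a matching table entry (or the reverse), the build now fails. Without the check, every later entry would silently shift by one.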
diff --git a/kernel/umh.c b/kernel/umh.c
index 187a30ff8541..e1304be4823a 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -559,8 +559,7 @@ static struct ctl_table usermodehelper_table[] = {
.maxlen = 2 * sizeof(unsigned long),
.mode = 0600,
.proc_handler = proc_cap_handler,
- },
- { }
+ }
};
static int __init init_umh_sysctls(void)
diff --git a/kernel/utsname_sysctl.c b/kernel/utsname_sysctl.c
index 24527b155538..8776d45daf3a 100644
--- a/kernel/utsname_sysctl.c
+++ b/kernel/utsname_sysctl.c
@@ -119,8 +119,7 @@ static struct ctl_table uts_kern_table[] = {
.mode = 0644,
.proc_handler = proc_do_uts_string,
.poll = &domainname_poll,
- },
- {}
+ }
};
#ifdef CONFIG_PROC_SYSCTL
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index dd5a343fadde..b79e6cfc008c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -839,10 +839,9 @@ static struct ctl_table watchdog_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
+ }
#endif /* CONFIG_SMP */
#endif
- {}
};
static void __init watchdog_sysctl_init(void)
diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c
index 83d37a163836..5a9018787d71 100644
--- a/lib/test_sysctl.c
+++ b/lib/test_sysctl.c
@@ -129,8 +129,7 @@ static struct ctl_table test_table[] = {
.maxlen = SYSCTL_TEST_BITMAP_SIZE,
.mode = 0644,
.proc_handler = proc_do_large_bitmap,
- },
- { }
+ }
};
static void test_sysctl_calc_match_int_ok(void)
@@ -184,8 +183,7 @@ static struct ctl_table test_table_unregister[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
- },
- {}
+ }
};
static int test_sysctl_run_unregister_nested(void)
diff --git a/mm/compaction.c b/mm/compaction.c
index ca09cdd72bf3..5013f5b7b44b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3126,8 +3126,7 @@ static struct ctl_table vm_compaction[] = {
.proc_handler = proc_dointvec_minmax_warn_RT_change,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static int __init kcompactd_init(void)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7838b0c0b82b..5236805aee57 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4675,8 +4675,7 @@ static struct ctl_table hugetlb_table[] = {
.maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = hugetlb_overcommit_handler,
- },
- { }
+ }
};
static void hugetlb_sysctl_init(void)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 65885a06269b..b1a2a1089aa3 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -584,8 +584,7 @@ static struct ctl_table hugetlb_vmemmap_sysctls[] = {
.maxlen = sizeof(vmemmap_optimize_enabled),
.mode = 0644,
.proc_handler = proc_dobool,
- },
- { }
+ }
};
static int __init hugetlb_vmemmap_init(void)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 46aef76d8e91..9bf5dd7a394e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -142,8 +142,7 @@ static struct ctl_table memory_failure_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static int __init memory_failure_sysctl_init(void)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 500cf2ef9faa..a05416f798e7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -725,8 +725,7 @@ static struct ctl_table vm_oom_kill_table[] = {
.maxlen = sizeof(sysctl_oom_dump_tasks),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- {}
+ }
};
#endif
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9f997de8d12f..b75aaae6f77b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2290,8 +2290,7 @@ static struct ctl_table vm_page_writeback_sysctls[] = {
.maxlen = sizeof(laptop_mode),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- {}
+ }
};
#endif
diff --git a/net/appletalk/sysctl_net_atalk.c b/net/appletalk/sysctl_net_atalk.c
index 30dcbbb8aeff..3975c1fad48c 100644
--- a/net/appletalk/sysctl_net_atalk.c
+++ b/net/appletalk/sysctl_net_atalk.c
@@ -39,8 +39,7 @@ static struct ctl_table atalk_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { },
+ }
};
static struct ctl_table_header *atalk_table_header;
diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index 06afbc14b783..e7e81787e5de 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -139,10 +139,9 @@ static const struct ctl_table ax25_param_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &min_ds_timeout,
.extra2 = &max_ds_timeout
- },
+ }
#endif
-
- { } /* that's all, folks! */
+/* that's all, folks! */
};
int ax25_register_dev_sysctl(ax25_dev *ax25_dev)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index ebbaef748a48..dfc37ac00980 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -1100,8 +1100,7 @@ static struct ctl_table brnf_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = brnf_sysctl_call_tables,
- },
- { }
+ }
};
static inline void br_netfilter_sysctl_default(struct brnf_net *brnf)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index aa5ad1cfc9b1..096d86013300 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3716,7 +3716,7 @@ static int neigh_proc_base_reachable_time(struct ctl_table *ctl, int write,
static struct neigh_sysctl_table {
struct ctl_table_header *sysctl_header;
- struct ctl_table neigh_vars[NEIGH_VAR_MAX + 1];
+ struct ctl_table neigh_vars[NEIGH_VAR_MAX];
} neigh_sysctl_template __read_mostly = {
.neigh_vars = {
NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(MCAST_PROBES, "mcast_solicit"),
@@ -3766,9 +3766,8 @@ static struct neigh_sysctl_table {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_INT_MAX,
.proc_handler = proc_dointvec_minmax,
- },
- {},
- },
+ }
+ }
};
int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
@@ -3779,6 +3778,7 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
const char *dev_name_source;
char neigh_path[ sizeof("net//neigh/") + IFNAMSIZ + IFNAMSIZ ];
char *p_name;
+ size_t neigh_vars_size;
t = kmemdup(&neigh_sysctl_template, sizeof(*t), GFP_KERNEL_ACCOUNT);
if (!t)
@@ -3790,11 +3790,11 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
t->neigh_vars[i].extra2 = p;
}
+ neigh_vars_size = ARRAY_SIZE(t->neigh_vars);
if (dev) {
dev_name_source = dev->name;
/* Terminate the table early */
- memset(&t->neigh_vars[NEIGH_VAR_GC_INTERVAL], 0,
- sizeof(t->neigh_vars[NEIGH_VAR_GC_INTERVAL]));
+ neigh_vars_size = NEIGH_VAR_BASE_REACHABLE_TIME_MS + 1;
} else {
struct neigh_table *tbl = p->tbl;
dev_name_source = "default";
@@ -3843,7 +3843,7 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
p_name, dev_name_source);
t->sysctl_header =
register_net_sysctl(neigh_parms_net(p), neigh_path, t->neigh_vars,
- ARRAY_SIZE(t->neigh_vars));
+ neigh_vars_size);
if (!t->sysctl_header)
goto free;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index aa615f22507b..9acde2a110cd 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -652,8 +652,7 @@ static struct ctl_table net_core_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- },
- { }
+ }
};
static struct ctl_table netns_core_table[] = {
@@ -681,8 +680,7 @@ static struct ctl_table netns_core_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
.proc_handler = proc_dou8vec_minmax,
- },
- { }
+ }
};
static int __init fb_tunnels_only_for_init_net_sysctl_setup(char *str)
diff --git a/net/dccp/sysctl.c b/net/dccp/sysctl.c
index 1140748858b0..7a5ccae4fc10 100644
--- a/net/dccp/sysctl.c
+++ b/net/dccp/sysctl.c
@@ -89,9 +89,7 @@ static struct ctl_table dccp_default_table[] = {
.maxlen = sizeof(sysctl_dccp_sync_ratelimit),
.mode = 0644,
.proc_handler = proc_dointvec_ms_jiffies,
- },
-
- { }
+ }
};
static struct ctl_table_header *dccp_table_header;
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 7b717434368c..3d9f2fbb8ec0 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -337,8 +337,7 @@ static struct ctl_table lowpan_frags_ns_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { }
+ }
};
/* secret interval has been deprecated */
@@ -350,8 +349,7 @@ static struct ctl_table lowpan_frags_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { }
+ }
};
static int __net_init lowpan_frags_ns_sysctl_register(struct net *net)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 6360425dfcb2..eeb229b1ab78 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2516,7 +2516,7 @@ static int ipv4_doint_and_flush(struct ctl_table *ctl, int write,
static struct devinet_sysctl_table {
struct ctl_table_header *sysctl_header;
- struct ctl_table devinet_vars[__IPV4_DEVCONF_MAX];
+ struct ctl_table devinet_vars[IPV4_DEVCONF_MAX];
} devinet_sysctl = {
.devinet_vars = {
DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, "forwarding",
@@ -2653,8 +2653,7 @@ static struct ctl_table ctl_forward_entry[] = {
.proc_handler = devinet_sysctl_forward,
.extra1 = &ipv4_devconf,
.extra2 = &init_net,
- },
- { },
+ }
};
#endif
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 3d7a82a900b5..2f8a8ac058da 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -579,8 +579,7 @@ static struct ctl_table ip4_frags_ns_ctl_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &dist_min,
- },
- { }
+ }
};
/* secret interval has been deprecated */
@@ -592,8 +591,7 @@ static struct ctl_table ip4_frags_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { }
+ }
};
static int __net_init ip4_frags_ns_ctl_register(struct net *net)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 883f4f1ee056..de0c0f9078b5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3551,8 +3551,7 @@ static struct ctl_table ipv4_route_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static const char ipv4_route_flush_procname[] = "flush";
@@ -3585,8 +3584,7 @@ static struct ctl_table ipv4_route_netns_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { },
+ }
};
static __net_init int sysctl_route_net_init(struct net *net)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1821f403efc0..31306925a35d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -577,8 +577,7 @@ static struct ctl_table ipv4_table[] = {
.proc_handler = proc_douintvec_minmax,
.extra1 = &sysctl_fib_sync_mem_min,
.extra2 = &sysctl_fib_sync_mem_max,
- },
- { }
+ }
};
static struct ctl_table ipv4_net_table[] = {
@@ -1469,8 +1468,7 @@ static struct ctl_table ipv4_net_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = &tcp_plb_max_cong_thresh,
- },
- { }
+ }
};
static __net_init int ipv4_sysctl_init_net(struct net *net)
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index ec1d68dbffc3..40610bb3a75a 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -160,8 +160,7 @@ static struct ctl_table xfrm4_policy_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static __net_init int xfrm4_net_sysctl_init(struct net *net)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 68a2925c66a5..c72887ad79a3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -7055,9 +7055,6 @@ static const struct ctl_table addrconf_sysctl[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO,
- },
- {
- /* sentinel */
}
};
@@ -7072,7 +7069,7 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
if (!table)
goto out;
- for (i = 0; table[i].data; i++) {
+ for (i = 0; i < ARRAY_SIZE(addrconf_sysctl); i++) {
table[i].data += (char *)p - (char *)&ipv6_devconf;
/* If one of these is already set, then it is not safe to
* overwrite either of them: this makes proc_dointvec_minmax
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 4159662fa214..b57e2c447969 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -1204,8 +1204,7 @@ static struct ctl_table ipv6_icmp_table_template[] = {
.proc_handler = proc_dou8vec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { },
+ }
};
struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index dca8e0aabc51..18106042a3ed 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -61,8 +61,7 @@ static struct ctl_table nf_ct_frag6_sysctl_table[] = {
.maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
- },
- { }
+ }
};
static int nf_ct_frag6_sysctl_register(struct net *net)
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 0688261202de..b7d7dd6e1f75 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -435,8 +435,7 @@ static struct ctl_table ip6_frags_ns_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { }
+ }
};
/* secret interval has been deprecated */
@@ -448,8 +447,7 @@ static struct ctl_table ip6_frags_ctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { }
+ }
};
static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a35470576077..0b2a3afe620e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6417,8 +6417,7 @@ static struct ctl_table ipv6_route_table_template[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 29f121f513a6..0d45a8a32752 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -212,8 +212,7 @@ static struct ctl_table ipv6_table_template[] = {
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
.extra2 = &ioam6_id_wide_max,
- },
- { }
+ }
};
static struct ctl_table ipv6_rotable[] = {
@@ -246,9 +245,8 @@ static struct ctl_table ipv6_rotable[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
+ }
#endif /* CONFIG_NETLABEL */
- { }
};
static int __net_init ipv6_sysctl_net_init(struct net *net)
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 27efdb18a018..f3559ff33ff4 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -187,8 +187,7 @@ static struct ctl_table xfrm6_policy_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static int __net_init xfrm6_net_sysctl_init(struct net *net)
diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c
index 195296ba29f0..520fc52059b1 100644
--- a/net/llc/sysctl_net_llc.c
+++ b/net/llc/sysctl_net_llc.c
@@ -43,12 +43,10 @@ static struct ctl_table llc2_timeout_table[] = {
.maxlen = sizeof(sysctl_llc2_rej_timeout),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { },
+ }
};
static struct ctl_table llc_station_table[] = {
- { },
};
static struct ctl_table_header *llc2_timeout_header;
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 6f96aae76537..a78daceddf74 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1391,8 +1391,7 @@ static const struct ctl_table mpls_dev_table[] = {
.mode = 0644,
.proc_handler = mpls_conf_proc,
.data = MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
- },
- { }
+ }
};
static int mpls_platform_labels(struct ctl_table *table, int write,
@@ -1425,8 +1424,7 @@ static const struct ctl_table mpls_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ONE,
.extra2 = &ttl_max,
- },
- { }
+ }
};
static int mpls_dev_sysctl_register(struct net_device *dev,
@@ -1444,7 +1442,7 @@ static int mpls_dev_sysctl_register(struct net_device *dev,
/* Table data contains only offsets relative to the base of
* the mdev at this point, so make them absolute.
*/
- for (i = 0; i < ARRAY_SIZE(mpls_dev_table) - 1; i++) {
+ for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++) {
table[i].data = (char *)mdev + (uintptr_t)table[i].data;
table[i].extra1 = mdev;
table[i].extra2 = net;
@@ -2689,7 +2687,7 @@ static int mpls_net_init(struct net *net)
/* Table data contains only offsets relative to the base of
* the mdev at this point, so make them absolute.
*/
- for (i = 0; i < ARRAY_SIZE(mpls_table) - 1; i++)
+ for (i = 0; i < ARRAY_SIZE(mpls_table); i++)
table[i].data = (char *)net + (uintptr_t)table[i].data;
net->mpls.ctl = register_net_sysctl(net, "net/mpls", table,
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index 42dfc834e5c6..27fb556d2273 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -127,8 +127,7 @@ static struct ctl_table mptcp_sysctl_table[] = {
.proc_handler = proc_dou8vec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = &mptcp_pm_type_max
- },
- {}
+ }
};
static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index abbd30ee3ce0..fef7104ac33e 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2258,9 +2258,8 @@ static struct ctl_table vs_vars[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
+ }
#endif
- { }
};
#endif
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 254eb3b61e15..e6297bb6922b 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -122,8 +122,7 @@ static struct ctl_table vs_vars_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { }
+ }
};
#endif
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 0e39a4fd421f..f3056767818b 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -293,8 +293,7 @@ static struct ctl_table vs_vars_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { }
+ }
};
#endif
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index a3b2029ef098..26efc6f28b34 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -616,11 +616,9 @@ enum nf_ct_sysctl_index {
NF_SYSCTL_CT_LWTUNNEL,
#endif
- __NF_SYSCTL_CT_LAST_SYSCTL,
+ NF_SYSCTL_CT_LAST_SYSCTL,
};
-#define NF_SYSCTL_CT_LAST_SYSCTL (__NF_SYSCTL_CT_LAST_SYSCTL + 1)
-
static struct ctl_table nf_ct_sysctl_table[] = {
[NF_SYSCTL_CT_MAX] = {
.procname = "nf_conntrack_max",
@@ -955,9 +953,8 @@ static struct ctl_table nf_ct_sysctl_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = nf_hooks_lwtunnel_sysctl_handler,
- },
+ }
#endif
- {}
};
static struct ctl_table nf_ct_netfilter_table[] = {
@@ -967,8 +964,7 @@ static struct ctl_table nf_ct_netfilter_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static void nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 755f9cf570ce..686040b5e431 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -389,7 +389,7 @@ static const struct seq_operations nflog_seq_ops = {
#ifdef CONFIG_SYSCTL
static char nf_log_sysctl_fnames[NFPROTO_NUMPROTO-NFPROTO_UNSPEC][3];
-static struct ctl_table nf_log_sysctl_table[NFPROTO_NUMPROTO+1];
+static struct ctl_table nf_log_sysctl_table[NFPROTO_NUMPROTO];
static struct ctl_table_header *nf_log_sysctl_fhdr;
static struct ctl_table nf_log_sysctl_ftable[] = {
@@ -399,8 +399,7 @@ static struct ctl_table nf_log_sysctl_ftable[] = {
.maxlen = sizeof(sysctl_nf_log_all_netns),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
static int nf_log_proc_dostring(struct ctl_table *table, int write,
diff --git a/net/netrom/sysctl_net_netrom.c b/net/netrom/sysctl_net_netrom.c
index c02b93fd9d4f..133dccdc2201 100644
--- a/net/netrom/sysctl_net_netrom.c
+++ b/net/netrom/sysctl_net_netrom.c
@@ -139,8 +139,7 @@ static struct ctl_table nr_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &min_reset,
.extra2 = &max_reset
- },
- { }
+ }
};
int __init nr_register_sysctl(void)
diff --git a/net/phonet/sysctl.c b/net/phonet/sysctl.c
index 0fd0fcb00505..5385e980693e 100644
--- a/net/phonet/sysctl.c
+++ b/net/phonet/sysctl.c
@@ -80,8 +80,7 @@ static struct ctl_table phonet_table[] = {
.maxlen = sizeof(local_port_range),
.mode = 0644,
.proc_handler = proc_local_port_range,
- },
- { }
+ }
};
int __init phonet_sysctl_init(void)
diff --git a/net/rds/ib_sysctl.c b/net/rds/ib_sysctl.c
index 102fd4a18df7..ee9ec39d9b30 100644
--- a/net/rds/ib_sysctl.c
+++ b/net/rds/ib_sysctl.c
@@ -102,8 +102,7 @@ static struct ctl_table rds_ib_sysctl_table[] = {
.maxlen = sizeof(rds_ib_sysctl_flow_control),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
void rds_ib_sysctl_exit(void)
diff --git a/net/rds/sysctl.c b/net/rds/sysctl.c
index 5abd2730a1bc..17b325585bd9 100644
--- a/net/rds/sysctl.c
+++ b/net/rds/sysctl.c
@@ -88,8 +88,7 @@ static struct ctl_table rds_sysctl_rds_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { }
+ }
};
void rds_sysctl_exit(void)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 2e90a2570d3b..e4abe20c4d2d 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -85,8 +85,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = {
.mode = 0644,
.proc_handler = rds_tcp_skbuf_handler,
.extra1 = &rds_tcp_min_rcvbuf,
- },
- { }
+ }
};
u32 rds_tcp_write_seq(struct rds_tcp_connection *tc)
diff --git a/net/rose/sysctl_net_rose.c b/net/rose/sysctl_net_rose.c
index 4f5a1e8b6c54..1a244a4d0221 100644
--- a/net/rose/sysctl_net_rose.c
+++ b/net/rose/sysctl_net_rose.c
@@ -111,8 +111,7 @@ static struct ctl_table rose_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &min_window,
.extra2 = &max_window
- },
- { }
+ }
};
void __init rose_register_sysctl(void)
diff --git a/net/rxrpc/sysctl.c b/net/rxrpc/sysctl.c
index 2b5824416036..583306fad3ef 100644
--- a/net/rxrpc/sysctl.c
+++ b/net/rxrpc/sysctl.c
@@ -124,8 +124,7 @@ static struct ctl_table rxrpc_sysctl_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = (void *)SYSCTL_ONE,
.extra2 = (void *)&four,
- },
- { }
+ }
};
int __init rxrpc_sysctl_init(void)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 233f37f0fa28..93ea4decbb1b 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -79,9 +79,7 @@ static struct ctl_table sctp_table[] = {
.maxlen = sizeof(sysctl_sctp_wmem),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
-
- { /* sentinel */ }
+ }
};
/* The following index defines are used in sctp_sysctl_net_register().
@@ -383,9 +381,7 @@ static struct ctl_table sctp_net_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = &pf_expose_max,
- },
-
- { /* sentinel */ }
+ }
};
static int proc_sctp_do_hmac_alg(struct ctl_table *ctl, int write,
@@ -604,7 +600,7 @@ int sctp_sysctl_net_register(struct net *net)
if (!table)
return -ENOMEM;
- for (i = 0; table[i].data; i++)
+ for (i = 0; i < ARRAY_SIZE(sctp_net_table); i++)
table[i].data += (char *)(&net->sctp) - (char *)&init_net.sctp;
table[SCTP_RTO_MIN_IDX].extra2 = &net->sctp.rto_max;
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index 9404123883c0..89af1f1dbb58 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -61,8 +61,7 @@ static struct ctl_table smc_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = &min_rcvbuf,
- },
- { }
+ }
};
int __net_init smc_sysctl_net_init(struct net *net)
diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
index 61222addda7e..f4ac3376c25b 100644
--- a/net/sunrpc/sysctl.c
+++ b/net/sunrpc/sysctl.c
@@ -159,8 +159,7 @@ static struct ctl_table debug_table[] = {
.maxlen = 256,
.mode = 0444,
.proc_handler = proc_do_xprt,
- },
- { }
+ }
};
void
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index df7fb9c8b785..cbc75193bc70 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -208,8 +208,7 @@ static struct ctl_table svcrdma_parm_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = &zero,
.extra2 = &zero,
- },
- { },
+ }
};
static void svc_rdma_proc_cleanup(void)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index bf43e05044a3..75c789712cd9 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -136,8 +136,7 @@ static struct ctl_table xr_tunables_table[] = {
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { },
+ }
};
#endif
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7c3d5ed708be..c0c59cd3b31c 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -154,8 +154,7 @@ static struct ctl_table xs_tunables_table[] = {
.maxlen = sizeof(xs_tcp_fin_timeout),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
- },
- { },
+ }
};
/*
diff --git a/net/tipc/sysctl.c b/net/tipc/sysctl.c
index b9cbc3b359aa..e492d8c6c6f3 100644
--- a/net/tipc/sysctl.c
+++ b/net/tipc/sysctl.c
@@ -90,8 +90,7 @@ static struct ctl_table tipc_table[] = {
.maxlen = sizeof(sysctl_tipc_bc_retruni),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
- },
- {}
+ }
};
int tipc_register_sysctl(void)
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index 92f3bc3cd704..716dee11d9e3 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -18,8 +18,7 @@ static struct ctl_table unix_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec
- },
- { }
+ }
};
int __net_init unix_sysctl_register(struct net *net)
diff --git a/net/x25/sysctl_net_x25.c b/net/x25/sysctl_net_x25.c
index 4d7c2ee41943..1e76f96ba77f 100644
--- a/net/x25/sysctl_net_x25.c
+++ b/net/x25/sysctl_net_x25.c
@@ -70,8 +70,7 @@ static struct ctl_table x25_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
- },
- { },
+ }
};
int __init x25_register_sysctl(void)
diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index d04b25a47575..e2b2c3437fbc 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -37,8 +37,7 @@ static struct ctl_table xfrm_table[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec
- },
- {}
+ }
};
int __net_init xfrm_sysctl_init(struct net *net)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index b77344506cf3..67aa6e236e66 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1778,9 +1778,7 @@ static struct ctl_table apparmor_sysctl_table[] = {
.maxlen = sizeof(int),
.mode = 0600,
.proc_handler = apparmor_dointvec,
- },
-
- { }
+ }
};
static int __init apparmor_init_sysctl(void)
diff --git a/security/keys/sysctl.c b/security/keys/sysctl.c
index fa305f74f658..7c944ef5a58c 100644
--- a/security/keys/sysctl.c
+++ b/security/keys/sysctl.c
@@ -54,9 +54,9 @@ struct ctl_table key_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = (void *) SYSCTL_ZERO,
.extra2 = (void *) SYSCTL_INT_MAX,
- },
+ }
#ifdef CONFIG_PERSISTENT_KEYRINGS
- {
+ , {
.procname = "persistent_keyring_expiry",
.data = &persistent_keyring_expiry,
.maxlen = sizeof(unsigned),
@@ -64,9 +64,8 @@ struct ctl_table key_sysctls[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = (void *) SYSCTL_ZERO,
.extra2 = (void *) SYSCTL_INT_MAX,
- },
+ }
#endif
- { }
};
static int __init init_security_keys_sysctls(void)
diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index 6f2cc827df41..28b411adbf0b 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -61,8 +61,7 @@ static struct ctl_table loadpin_sysctl_table[] = {
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ONE,
.extra2 = SYSCTL_ONE,
- },
- { }
+ }
};
static void set_sysctl(bool is_writable)
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 7b8164a4b504..2d2700af9f6b 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -456,8 +456,7 @@ static struct ctl_table yama_sysctl_table[] = {
.proc_handler = yama_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = &max_scope,
- },
- { }
+ }
};
static void __init yama_init_sysctl(void)
{
--
2.30.2
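The pattern repeated through the hunks above can be sketched in plain C: drop the empty `{ }` terminator, bound loops by ARRAY_SIZE() instead of testing `table[i].data`, and shorten a table by passing a smaller count instead of memset()-ing a terminator into it. This is a minimal userspace sketch, not kernel code. `struct demo_ctl`, `struct demo_dev`, the `DEMO_*` indices and both helpers are hypothetical stand-ins for `struct ctl_table`, per-device state and the MPLS/neighbour code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-device state that the table entries point into. */
struct demo_dev {
	int input_enabled;
	int ttl;
	int gc_interval;
};

/* Hypothetical, simplified stand-in for struct ctl_table. */
struct demo_ctl {
	const char *procname;
	void *data;	/* offset into struct demo_dev until rebased */
};

/* Index enum in the style of NEIGH_VAR_* / NF_SYSCTL_CT_*:
 * the last enumerator doubles as the entry count. */
enum { DEMO_INPUT, DEMO_TTL, DEMO_GC_INTERVAL, DEMO_MAX };

/* Template with no empty { } terminator at the end. */
static const struct demo_ctl demo_template[DEMO_MAX] = {
	[DEMO_INPUT]       = { "input", (void *)offsetof(struct demo_dev, input_enabled) },
	[DEMO_TTL]         = { "ttl", (void *)offsetof(struct demo_dev, ttl) },
	[DEMO_GC_INTERVAL] = { "gc_interval", (void *)offsetof(struct demo_dev, gc_interval) },
};

/* Turn offsets into absolute pointers. The loop is bounded by the count,
 * so the first entry (offset 0, i.e. a NULL .data) is still rebased; a
 * sentinel walk such as `for (i = 0; table[i].data; i++)` would stop there. */
static void demo_rebase(struct demo_ctl *table, size_t n, struct demo_dev *dev)
{
	assert(n <= DEMO_MAX);
	for (size_t i = 0; i < n; i++)
		table[i].data = (char *)dev + (uintptr_t)table[i].data;
}

/* Per-device tables expose only a prefix. Instead of memset()-ing a
 * terminator into the copy, register fewer entries: the count is the index
 * of the first excluded entry (DEMO_TTL + 1), not the last included one. */
static size_t demo_nr_entries(int per_device)
{
	return per_device ? DEMO_GC_INTERVAL : ARRAY_SIZE(demo_template);
}
```

In this sketch the per-device count must equal "last kept index + 1"; using the last kept index itself would silently drop that entry, which is the usual off-by-one risk when a count replaces a terminator.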
* [PATCH 07/11] sysctl: Add size to register_sysctl
From: Joel Granados @ 2023-06-21 9:09 UTC (permalink / raw)
To: mcgrof, Russell King, Catalin Marinas, Will Deacon,
Michael Ellerman, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Gerald Schaefer, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Herbert Xu, David S. Miller, Russ Weight, Greg Kroah-Hartman,
Phillip Potter, Clemens Ladisch, Arnd Bergmann, Corey Minyard,
Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
David Airlie, Daniel Vetter, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Benjamin Herrenschmidt, Song Liu, Robin Holt,
Steve Wahl, Sudip Mukherjee, Mark Rutland, James E.J. Bottomley,
Martin K. Petersen, Doug Gilbert, David Howells, Jan Harkes, coda,
Alexander Viro, Christian Brauner, Chuck Lever, Jeff Layton,
Trond Myklebust, Anna Schumaker, Jan Kara, Anton Altaparmakov,
Mark Fasheh, Joel Becker, Joseph Qi, Kees Cook, Iurii Zaikin,
Eric Biggers, Theodore Y. Ts'o, Darrick J. Wong, John Stultz,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, John Johansen,
Paul Moore, James Morris, Serge E. Hallyn
Cc: Joel Granados, Nicholas Piggin, Christophe Leroy,
Christian Borntraeger, Sven Schnelle, H. Peter Anvin,
Rafael J. Wysocki, Mike Travis, Amir Goldstein, Matthew Bobrowski,
Stephen Boyd, linux-arm-kernel, linux-kernel, linux-ia64,
linuxppc-dev, linux-s390, linux-crypto, openipmi-developer,
intel-gfx, dri-devel, linux-hyperv, linux-raid, linux-scsi,
linux-cachefs, codalist, linux-fsdevel, linux-nfs, linux-ntfs-dev,
ocfs2-devel, fsverity, linux-xfs, netdev, apparmor,
linux-security-module
In-Reply-To: <20230621091000.424843-1-j.granados@samsung.com>
In order to remove the empty sentinel element that terminates the
ctl_table struct arrays, we explicitly define the size when registering
the targets.
We add a size argument to register_sysctl and change all the callers to
pass the ARRAY_SIZE of their table arg (or an explicit count where the
caller only holds a pointer to the entries).
Signed-off-by: Joel Granados <j.granados@samsung.com>
---
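A minimal sketch of the calling convention this patch moves to, assuming a simplified table type. `struct demo_entry`, `demo_register()`, `demo_register_array()` and `demo_register_one()` are hypothetical and only mirror the shape of `register_sysctl(path, table, size)`; they are not the kernel API. Arrays pass ARRAY_SIZE(); a pointer to a single entry passes an explicit count, as the armv8_deprecated.c hunk does with 1.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified entry type; not the kernel's struct ctl_table. */
struct demo_entry {
	const char *procname;
	int *data;
};

/* Hypothetical registration helper: walks exactly table_size entries
 * instead of stopping at an empty terminating element.
 * Returns the number of entries registered, or -1 on bad input. */
static int demo_register(const char *path, const struct demo_entry *table,
			 size_t table_size)
{
	size_t i;

	if (!path || (!table && table_size))
		return -1;
	for (i = 0; i < table_size; i++) {
		/* Every counted entry must be complete; in this sketch an
		 * empty { } inside the counted range is an error, not an end marker. */
		if (!table[i].procname || !table[i].data)
			return -1;
	}
	return (int)i;
}

/* Convenience wrapper for true arrays only: applied to a pointer, this
 * userspace ARRAY_SIZE computes sizeof(pointer) / sizeof(element). */
#define demo_register_array(path, arr) demo_register(path, arr, ARRAY_SIZE(arr))

static int demo_a, demo_b;

static const struct demo_entry demo_table[] = {
	{ .procname = "a", .data = &demo_a },
	{ .procname = "b", .data = &demo_b },
};

/* A single entry reached through a pointer gets an explicit count of 1. */
static int demo_register_one(const struct demo_entry *entry)
{
	return demo_register("abi", entry, 1);
}
```

Passing a pointer to the array-only wrapper compiles here but yields a wrong count; the kernel's own ARRAY_SIZE() adds a __must_be_array() check to turn that mistake into a build error.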
arch/arm/kernel/isa.c | 2 +-
arch/arm64/kernel/armv8_deprecated.c | 2 +-
arch/arm64/kernel/fpsimd.c | 6 +++--
arch/arm64/kernel/process.c | 3 ++-
arch/ia64/kernel/crash.c | 3 ++-
arch/powerpc/kernel/idle.c | 3 ++-
arch/powerpc/platforms/pseries/mobility.c | 3 ++-
arch/s390/appldata/appldata_base.c | 4 +++-
arch/s390/kernel/debug.c | 3 ++-
arch/s390/kernel/topology.c | 3 ++-
arch/s390/mm/cmm.c | 3 ++-
arch/s390/mm/pgalloc.c | 3 ++-
arch/x86/entry/vdso/vdso32-setup.c | 2 +-
arch/x86/kernel/itmt.c | 3 ++-
crypto/fips.c | 3 ++-
drivers/base/firmware_loader/fallback_table.c | 6 ++---
drivers/cdrom/cdrom.c | 3 ++-
drivers/char/hpet.c | 3 ++-
drivers/char/ipmi/ipmi_poweroff.c | 3 ++-
drivers/gpu/drm/i915/i915_perf.c | 3 ++-
drivers/hv/hv_common.c | 3 ++-
drivers/macintosh/mac_hid.c | 3 ++-
drivers/md/md.c | 3 ++-
drivers/misc/sgi-xp/xpc_main.c | 6 +++--
drivers/parport/procfs.c | 11 +++++----
drivers/perf/arm_pmuv3.c | 3 ++-
drivers/scsi/scsi_sysctl.c | 3 ++-
drivers/scsi/sg.c | 3 ++-
fs/cachefiles/error_inject.c | 3 ++-
fs/coda/sysctl.c | 3 ++-
fs/devpts/inode.c | 3 ++-
fs/eventpoll.c | 2 +-
fs/lockd/svc.c | 3 ++-
fs/nfs/nfs4sysctl.c | 3 ++-
fs/nfs/sysctl.c | 3 ++-
fs/notify/fanotify/fanotify_user.c | 3 ++-
fs/notify/inotify/inotify_user.c | 3 ++-
fs/ntfs/sysctl.c | 3 ++-
fs/ocfs2/stackglue.c | 3 ++-
fs/proc/proc_sysctl.c | 23 ++++++++++---------
fs/verity/signature.c | 4 +++-
fs/xfs/xfs_sysctl.c | 3 ++-
include/linux/sysctl.h | 6 +++--
kernel/pid_sysctl.h | 2 +-
kernel/time/timer.c | 2 +-
kernel/ucount.c | 2 +-
kernel/utsname_sysctl.c | 2 +-
lib/test_sysctl.c | 9 +++++---
net/sunrpc/sysctl.c | 3 ++-
net/sunrpc/xprtrdma/svc_rdma.c | 3 ++-
net/sunrpc/xprtrdma/transport.c | 4 +++-
net/sunrpc/xprtsock.c | 4 +++-
net/sysctl_net.c | 2 +-
security/apparmor/lsm.c | 3 ++-
security/loadpin/loadpin.c | 3 ++-
security/yama/yama_lsm.c | 3 ++-
56 files changed, 133 insertions(+), 76 deletions(-)
diff --git a/arch/arm/kernel/isa.c b/arch/arm/kernel/isa.c
index 20218876bef2..561432e3c55a 100644
--- a/arch/arm/kernel/isa.c
+++ b/arch/arm/kernel/isa.c
@@ -46,5 +46,5 @@ register_isa_ports(unsigned int membase, unsigned int portbase, unsigned int por
isa_membase = membase;
isa_portbase = portbase;
isa_portshift = portshift;
- isa_sysctl_header = register_sysctl("bus/isa", ctl_isa_vars);
+ isa_sysctl_header = register_sysctl("bus/isa", ctl_isa_vars, ARRAY_SIZE(ctl_isa_vars));
}
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 1febd412b4d2..68ed60a521a6 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -569,7 +569,7 @@ static void __init register_insn_emulation(struct insn_emulation *insn)
sysctl->extra2 = &insn->max;
sysctl->proc_handler = emulation_proc_handler;
- register_sysctl("abi", sysctl);
+ register_sysctl("abi", sysctl, 1);
}
}
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 2fbafa5cc7ac..ecfb2ef6a036 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -595,7 +595,8 @@ static struct ctl_table sve_default_vl_table[] = {
static int __init sve_sysctl_init(void)
{
if (system_supports_sve())
- if (!register_sysctl("abi", sve_default_vl_table))
+ if (!register_sysctl("abi", sve_default_vl_table,
+ ARRAY_SIZE(sve_default_vl_table)))
return -EINVAL;
return 0;
@@ -619,7 +620,8 @@ static struct ctl_table sme_default_vl_table[] = {
static int __init sme_sysctl_init(void)
{
if (system_supports_sme())
- if (!register_sysctl("abi", sme_default_vl_table))
+ if (!register_sysctl("abi", sme_default_vl_table,
+ ARRAY_SIZE(sme_default_vl_table)))
return -EINVAL;
return 0;
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 0fcc4eb1a7ab..cfe232960f2f 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -729,7 +729,8 @@ static struct ctl_table tagged_addr_sysctl_table[] = {
static int __init tagged_addr_init(void)
{
- if (!register_sysctl("abi", tagged_addr_sysctl_table))
+ if (!register_sysctl("abi", tagged_addr_sysctl_table,
+ ARRAY_SIZE(tagged_addr_sysctl_table)))
return -EINVAL;
return 0;
}
diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c
index 88b3ce3e66cd..66917b879b2a 100644
--- a/arch/ia64/kernel/crash.c
+++ b/arch/ia64/kernel/crash.c
@@ -248,7 +248,8 @@ machine_crash_setup(void)
if((ret = register_die_notifier(&kdump_init_notifier_nb)) != 0)
return ret;
#ifdef CONFIG_SYSCTL
- register_sysctl("kernel", kdump_ctl_table);
+ register_sysctl("kernel", kdump_ctl_table,
+ ARRAY_SIZE(kdump_ctl_table));
#endif
return 0;
}
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index b1c0418b25c8..3807169fc7e7 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -111,7 +111,8 @@ static struct ctl_table powersave_nap_ctl_table[] = {
static int __init
register_powersave_nap_sysctl(void)
{
- register_sysctl("kernel", powersave_nap_ctl_table);
+ register_sysctl("kernel", powersave_nap_ctl_table,
+ ARRAY_SIZE(powersave_nap_ctl_table));
return 0;
}
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 6f30113b5468..9fdbee8ee126 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -65,7 +65,8 @@ static struct ctl_table nmi_wd_lpm_factor_ctl_table[] = {
static int __init register_nmi_wd_lpm_factor_sysctl(void)
{
- register_sysctl("kernel", nmi_wd_lpm_factor_ctl_table);
+ register_sysctl("kernel", nmi_wd_lpm_factor_ctl_table,
+ ARRAY_SIZE(nmi_wd_lpm_factor_ctl_table));
return 0;
}
diff --git a/arch/s390/appldata/appldata_base.c b/arch/s390/appldata/appldata_base.c
index b07b0610950e..54d8ed1c4518 100644
--- a/arch/s390/appldata/appldata_base.c
+++ b/arch/s390/appldata/appldata_base.c
@@ -408,7 +408,9 @@ static int __init appldata_init(void)
appldata_wq = alloc_ordered_workqueue("appldata", 0);
if (!appldata_wq)
return -ENOMEM;
- appldata_sysctl_header = register_sysctl(appldata_proc_name, appldata_table);
+ appldata_sysctl_header = register_sysctl(appldata_proc_name,
+ appldata_table,
+ ARRAY_SIZE(appldata_table));
return 0;
}
diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c
index a85e0c3e7027..002f843e6523 100644
--- a/arch/s390/kernel/debug.c
+++ b/arch/s390/kernel/debug.c
@@ -1564,7 +1564,8 @@ static int debug_sprintf_format_fn(debug_info_t *id, struct debug_view *view,
*/
static int __init debug_init(void)
{
- s390dbf_sysctl_header = register_sysctl("s390dbf", s390dbf_table);
+ s390dbf_sysctl_header = register_sysctl("s390dbf", s390dbf_table,
+ ARRAY_SIZE(s390dbf_table));
mutex_lock(&debug_mutex);
debug_debugfs_root_entry = debugfs_create_dir(DEBUG_DIR_ROOT, NULL);
initialized = 1;
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 9fd19530c9a5..372d2c7c9a8e 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -647,7 +647,8 @@ static int __init topology_init(void)
set_topology_timer();
else
topology_update_polarization_simple();
- register_sysctl("s390", topology_ctl_table);
+ register_sysctl("s390", topology_ctl_table,
+ ARRAY_SIZE(topology_ctl_table));
dev_root = bus_get_dev_root(&cpu_subsys);
if (dev_root) {
diff --git a/arch/s390/mm/cmm.c b/arch/s390/mm/cmm.c
index 5300c6867d5e..918816dcb42a 100644
--- a/arch/s390/mm/cmm.c
+++ b/arch/s390/mm/cmm.c
@@ -379,7 +379,8 @@ static int __init cmm_init(void)
{
int rc = -ENOMEM;
- cmm_sysctl_header = register_sysctl("vm", cmm_table);
+ cmm_sysctl_header = register_sysctl("vm", cmm_table,
+ ARRAY_SIZE(cmm_table));
if (!cmm_sysctl_header)
goto out_sysctl;
#ifdef CONFIG_CMM_IUCV
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 66ab68db9842..a723f1a8236a 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -35,7 +35,8 @@ static struct ctl_table page_table_sysctl[] = {
static int __init page_table_register_sysctl(void)
{
- return register_sysctl("vm", page_table_sysctl) ? 0 : -ENOMEM;
+ return register_sysctl("vm", page_table_sysctl,
+ ARRAY_SIZE(page_table_sysctl)) ? 0 : -ENOMEM;
}
__initcall(page_table_register_sysctl);
diff --git a/arch/x86/entry/vdso/vdso32-setup.c b/arch/x86/entry/vdso/vdso32-setup.c
index f3b3cacbcbb0..e28cdba83e0e 100644
--- a/arch/x86/entry/vdso/vdso32-setup.c
+++ b/arch/x86/entry/vdso/vdso32-setup.c
@@ -72,7 +72,7 @@ static struct ctl_table abi_table2[] = {
static __init int ia32_binfmt_init(void)
{
- register_sysctl("abi", abi_table2);
+ register_sysctl("abi", abi_table2, ARRAY_SIZE(abi_table2));
return 0;
}
__initcall(ia32_binfmt_init);
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 670eb08b972a..58ec95fce798 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -105,7 +105,8 @@ int sched_set_itmt_support(void)
return 0;
}
- itmt_sysctl_header = register_sysctl("kernel", itmt_kern_table);
+ itmt_sysctl_header = register_sysctl("kernel", itmt_kern_table,
+ ARRAY_SIZE(itmt_kern_table));
if (!itmt_sysctl_header) {
mutex_unlock(&itmt_update_mutex);
return -ENOMEM;
diff --git a/crypto/fips.c b/crypto/fips.c
index 92fd506abb21..05a251680700 100644
--- a/crypto/fips.c
+++ b/crypto/fips.c
@@ -70,7 +70,8 @@ static struct ctl_table_header *crypto_sysctls;
static void crypto_proc_fips_init(void)
{
- crypto_sysctls = register_sysctl("crypto", crypto_sysctl_table);
+ crypto_sysctls = register_sysctl("crypto", crypto_sysctl_table,
+ ARRAY_SIZE(crypto_sysctl_table));
}
static void crypto_proc_fips_exit(void)
diff --git a/drivers/base/firmware_loader/fallback_table.c b/drivers/base/firmware_loader/fallback_table.c
index e5ac098d0742..7a2d584233bb 100644
--- a/drivers/base/firmware_loader/fallback_table.c
+++ b/drivers/base/firmware_loader/fallback_table.c
@@ -50,9 +50,9 @@ static struct ctl_table firmware_config_table[] = {
static struct ctl_table_header *firmware_config_sysct_table_header;
int register_firmware_config_sysctl(void)
{
- firmware_config_sysct_table_header =
- register_sysctl("kernel/firmware_config",
- firmware_config_table);
+ firmware_config_sysct_table_header = register_sysctl("kernel/firmware_config",
+ firmware_config_table,
+ ARRAY_SIZE(firmware_config_table));
if (!firmware_config_sysct_table_header)
return -ENOMEM;
return 0;
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 416f723a2dbb..3855da76a16d 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -3680,7 +3680,8 @@ static void cdrom_sysctl_register(void)
if (!atomic_add_unless(&initialized, 1, 1))
return;
- cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table);
+ cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table,
+ ARRAY_SIZE(cdrom_table));
/* set the defaults */
cdrom_sysctl_settings.autoclose = autoclose;
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index ee71376f174b..bb1eb801b20c 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -1027,7 +1027,8 @@ static int __init hpet_init(void)
if (result < 0)
return -ENODEV;
- sysctl_header = register_sysctl("dev/hpet", hpet_table);
+ sysctl_header = register_sysctl("dev/hpet", hpet_table,
+ ARRAY_SIZE(hpet_table));
result = acpi_bus_register_driver(&hpet_acpi_driver);
if (result < 0) {
diff --git a/drivers/char/ipmi/ipmi_poweroff.c b/drivers/char/ipmi/ipmi_poweroff.c
index 870659d91db2..46b1ea866da9 100644
--- a/drivers/char/ipmi/ipmi_poweroff.c
+++ b/drivers/char/ipmi/ipmi_poweroff.c
@@ -675,7 +675,8 @@ static int __init ipmi_poweroff_init(void)
pr_info("Power cycle is enabled\n");
#ifdef CONFIG_PROC_FS
- ipmi_table_header = register_sysctl("dev/ipmi", ipmi_table);
+ ipmi_table_header = register_sysctl("dev/ipmi", ipmi_table,
+ ARRAY_SIZE(ipmi_table));
if (!ipmi_table_header) {
pr_err("Unable to register powercycle sysctl\n");
rv = -ENOMEM;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 050b8ae7b8e7..f43950219ffc 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -5266,7 +5266,8 @@ static int destroy_config(int id, void *p, void *data)
int i915_perf_sysctl_register(void)
{
- sysctl_header = register_sysctl("dev/i915", oa_table);
+ sysctl_header = register_sysctl("dev/i915", oa_table,
+ ARRAY_SIZE(oa_table));
return 0;
}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 64f9ceca887b..dd751c391cf7 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -302,7 +302,8 @@ int __init hv_common_init(void)
* message recording won't be available in isolated
* guests should the following registration fail.
*/
- hv_ctl_table_hdr = register_sysctl("kernel", hv_ctl_table);
+ hv_ctl_table_hdr = register_sysctl("kernel", hv_ctl_table,
+ ARRAY_SIZE(hv_ctl_table));
if (!hv_ctl_table_hdr)
pr_err("Hyper-V: sysctl table register error");
diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c
index d8c4d5664145..5d433ef430fa 100644
--- a/drivers/macintosh/mac_hid.c
+++ b/drivers/macintosh/mac_hid.c
@@ -243,7 +243,8 @@ static struct ctl_table_header *mac_hid_sysctl_header;
static int __init mac_hid_init(void)
{
- mac_hid_sysctl_header = register_sysctl("dev/mac_hid", mac_hid_files);
+ mac_hid_sysctl_header = register_sysctl("dev/mac_hid", mac_hid_files,
+ ARRAY_SIZE(mac_hid_files));
if (!mac_hid_sysctl_header)
return -ENOMEM;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8e344b4b3444..c10cc8ddd94d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9633,7 +9633,8 @@ static int __init md_init(void)
mdp_major = ret;
register_reboot_notifier(&md_notifier);
- raid_table_header = register_sysctl("dev/raid", raid_table);
+ raid_table_header = register_sysctl("dev/raid", raid_table,
+ ARRAY_SIZE(raid_table));
md_geninit();
return 0;
diff --git a/drivers/misc/sgi-xp/xpc_main.c b/drivers/misc/sgi-xp/xpc_main.c
index 6da509d692bb..264b919d0610 100644
--- a/drivers/misc/sgi-xp/xpc_main.c
+++ b/drivers/misc/sgi-xp/xpc_main.c
@@ -1236,8 +1236,10 @@ xpc_init(void)
goto out_1;
}
- xpc_sysctl = register_sysctl("xpc", xpc_sys_xpc);
- xpc_sysctl_hb = register_sysctl("xpc/hb", xpc_sys_xpc_hb);
+ xpc_sysctl = register_sysctl("xpc", xpc_sys_xpc,
+ ARRAY_SIZE(xpc_sys_xpc));
+ xpc_sysctl_hb = register_sysctl("xpc/hb", xpc_sys_xpc_hb,
+ ARRAY_SIZE(xpc_sys_xpc_hb));
/*
* Fill the partition reserved page with the information needed by
diff --git a/drivers/parport/procfs.c b/drivers/parport/procfs.c
index 4e5b972c3e26..16cee52f035f 100644
--- a/drivers/parport/procfs.c
+++ b/drivers/parport/procfs.c
@@ -464,7 +464,8 @@ int parport_proc_register(struct parport *port)
err = -ENOENT;
goto exit_free_tmp_dir_path;
}
- t->devices_header = register_sysctl(tmp_dir_path, t->device_dir);
+ t->devices_header = register_sysctl(tmp_dir_path, t->device_dir,
+ ARRAY_SIZE(t->device_dir));
if (t->devices_header == NULL) {
err = -ENOENT;
goto exit_free_tmp_dir_path;
@@ -478,7 +479,8 @@ int parport_proc_register(struct parport *port)
goto unregister_devices_h;
}
- t->port_header = register_sysctl(tmp_dir_path, t->vars);
+ t->port_header = register_sysctl(tmp_dir_path, t->vars,
+ ARRAY_SIZE(t->vars));
if (t->port_header == NULL) {
err = -ENOENT;
goto unregister_devices_h;
@@ -544,7 +546,7 @@ int parport_device_proc_register(struct pardevice *device)
t->vars[0].data = &device->timeslice;
- t->sysctl_header = register_sysctl(tmp_dir_path, t->vars);
+ t->sysctl_header = register_sysctl(tmp_dir_path, t->vars, ARRAY_SIZE(t->vars));
if (t->sysctl_header == NULL) {
kfree(t);
t = NULL;
@@ -579,7 +581,8 @@ static int __init parport_default_proc_register(void)
int ret;
parport_default_sysctl_table.sysctl_header =
- register_sysctl("dev/parport/default", parport_default_sysctl_table.vars);
+ register_sysctl("dev/parport/default", parport_default_sysctl_table.vars,
+ ARRAY_SIZE(parport_default_sysctl_table.vars));
if (!parport_default_sysctl_table.sysctl_header)
return -ENOMEM;
ret = parport_bus_init();
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index c98e4039386d..763f9c8acfbf 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1188,7 +1188,8 @@ static void armv8_pmu_register_sysctl_table(void)
static u32 tbl_registered = 0;
if (!cmpxchg_relaxed(&tbl_registered, 0, 1))
- register_sysctl("kernel", armv8_pmu_sysctl_table);
+ register_sysctl("kernel", armv8_pmu_sysctl_table,
+ ARRAY_SIZE(armv8_pmu_sysctl_table));
}
static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
diff --git a/drivers/scsi/scsi_sysctl.c b/drivers/scsi/scsi_sysctl.c
index 7f0914ea168f..0378bd63fea4 100644
--- a/drivers/scsi/scsi_sysctl.c
+++ b/drivers/scsi/scsi_sysctl.c
@@ -25,7 +25,8 @@ static struct ctl_table_header *scsi_table_header;
int __init scsi_init_sysctl(void)
{
- scsi_table_header = register_sysctl("dev/scsi", scsi_table);
+ scsi_table_header = register_sysctl("dev/scsi", scsi_table,
+ ARRAY_SIZE(scsi_table));
if (!scsi_table_header)
return -ENOMEM;
return 0;
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 037f8c98a6d3..d12cdf875b50 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1646,7 +1646,8 @@ static struct ctl_table_header *hdr;
static void register_sg_sysctls(void)
{
if (!hdr)
- hdr = register_sysctl("kernel", sg_sysctls);
+ hdr = register_sysctl("kernel", sg_sysctls,
+ ARRAY_SIZE(sg_sysctls));
}
static void unregister_sg_sysctls(void)
diff --git a/fs/cachefiles/error_inject.c b/fs/cachefiles/error_inject.c
index 18de8a876b02..ea6bcce4f6f1 100644
--- a/fs/cachefiles/error_inject.c
+++ b/fs/cachefiles/error_inject.c
@@ -24,7 +24,8 @@ static struct ctl_table cachefiles_sysctls[] = {
int __init cachefiles_register_error_injection(void)
{
- cachefiles_sysctl = register_sysctl("cachefiles", cachefiles_sysctls);
+ cachefiles_sysctl = register_sysctl("cachefiles", cachefiles_sysctls,
+ ARRAY_SIZE(cachefiles_sysctls));
if (!cachefiles_sysctl)
return -ENOMEM;
return 0;
diff --git a/fs/coda/sysctl.c b/fs/coda/sysctl.c
index a247c14aaab7..16224a7c6691 100644
--- a/fs/coda/sysctl.c
+++ b/fs/coda/sysctl.c
@@ -42,7 +42,8 @@ static struct ctl_table coda_table[] = {
void coda_sysctl_init(void)
{
if ( !fs_table_header )
- fs_table_header = register_sysctl("coda", coda_table);
+ fs_table_header = register_sysctl("coda", coda_table,
+ ARRAY_SIZE(coda_table));
}
void coda_sysctl_clean(void)
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index fe3db0eda8e4..c17f971a8c4b 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -612,7 +612,8 @@ static int __init init_devpts_fs(void)
{
int err = register_filesystem(&devpts_fs_type);
if (!err) {
- register_sysctl("kernel/pty", pty_table);
+ register_sysctl("kernel/pty", pty_table,
+ ARRAY_SIZE(pty_table));
}
return err;
}
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 980483455cc0..e1a0e6a6d3de 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -327,7 +327,7 @@ static struct ctl_table epoll_table[] = {
static void __init epoll_sysctls_init(void)
{
- register_sysctl("fs/epoll", epoll_table);
+ register_sysctl("fs/epoll", epoll_table, ARRAY_SIZE(epoll_table));
}
#else
#define epoll_sysctls_init() do { } while (0)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index bb94949bc223..84736267f4e1 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -626,7 +626,8 @@ static int __init init_nlm(void)
#ifdef CONFIG_SYSCTL
err = -ENOMEM;
- nlm_sysctl_table = register_sysctl("fs/nfs", nlm_sysctls);
+ nlm_sysctl_table = register_sysctl("fs/nfs", nlm_sysctls,
+ ARRAY_SIZE(nlm_sysctls));
if (nlm_sysctl_table == NULL)
goto err_sysctl;
#endif
diff --git a/fs/nfs/nfs4sysctl.c b/fs/nfs/nfs4sysctl.c
index e776200e9a11..4a542ee11e68 100644
--- a/fs/nfs/nfs4sysctl.c
+++ b/fs/nfs/nfs4sysctl.c
@@ -40,7 +40,8 @@ static struct ctl_table nfs4_cb_sysctls[] = {
int nfs4_register_sysctl(void)
{
nfs4_callback_sysctl_table = register_sysctl("fs/nfs",
- nfs4_cb_sysctls);
+ nfs4_cb_sysctls,
+ ARRAY_SIZE(nfs4_cb_sysctls));
if (nfs4_callback_sysctl_table == NULL)
return -ENOMEM;
return 0;
diff --git a/fs/nfs/sysctl.c b/fs/nfs/sysctl.c
index f39e2089bc4c..9dafd44670e4 100644
--- a/fs/nfs/sysctl.c
+++ b/fs/nfs/sysctl.c
@@ -34,7 +34,8 @@ static struct ctl_table nfs_cb_sysctls[] = {
int nfs_register_sysctl(void)
{
- nfs_callback_sysctl_table = register_sysctl("fs/nfs", nfs_cb_sysctls);
+ nfs_callback_sysctl_table = register_sysctl("fs/nfs", nfs_cb_sysctls,
+ ARRAY_SIZE(nfs_cb_sysctls));
if (nfs_callback_sysctl_table == NULL)
return -ENOMEM;
return 0;
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 22fb1cf7e1fc..78d3bf479f59 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -91,7 +91,8 @@ static struct ctl_table fanotify_table[] = {
static void __init fanotify_sysctls_init(void)
{
- register_sysctl("fs/fanotify", fanotify_table);
+ register_sysctl("fs/fanotify", fanotify_table,
+ ARRAY_SIZE(fanotify_table));
}
#else
#define fanotify_sysctls_init() do { } while (0)
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 1c4bfdab008d..0ce25c4ddfec 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -90,7 +90,8 @@ static struct ctl_table inotify_table[] = {
static void __init inotify_sysctls_init(void)
{
- register_sysctl("fs/inotify", inotify_table);
+ register_sysctl("fs/inotify", inotify_table,
+ ARRAY_SIZE(inotify_table));
}
#else
diff --git a/fs/ntfs/sysctl.c b/fs/ntfs/sysctl.c
index 174fe536a1c0..2c48f48a0b80 100644
--- a/fs/ntfs/sysctl.c
+++ b/fs/ntfs/sysctl.c
@@ -44,7 +44,8 @@ int ntfs_sysctl(int add)
{
if (add) {
BUG_ON(sysctls_root_table);
- sysctls_root_table = register_sysctl("fs", ntfs_sysctls);
+ sysctls_root_table = register_sysctl("fs", ntfs_sysctls,
+ ARRAY_SIZE(ntfs_sysctls));
if (!sysctls_root_table)
return -ENOMEM;
} else {
diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
index a8d5ca98fa57..9a653875d1c5 100644
--- a/fs/ocfs2/stackglue.c
+++ b/fs/ocfs2/stackglue.c
@@ -673,7 +673,8 @@ static int __init ocfs2_stack_glue_init(void)
strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB);
- ocfs2_table_header = register_sysctl("fs/ocfs2/nm", ocfs2_nm_table);
+ ocfs2_table_header = register_sysctl("fs/ocfs2/nm", ocfs2_nm_table,
+ ARRAY_SIZE(ocfs2_nm_table));
if (!ocfs2_table_header) {
printk(KERN_ERR
"ocfs2 stack glue: unable to register sysctl\n");
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8c415048d540..66c9d7a07d2e 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -43,7 +43,7 @@ struct ctl_table sysctl_mount_point[] = {
*/
struct ctl_table_header *register_sysctl_mount_point(const char *path)
{
- return register_sysctl(path, sysctl_mount_point);
+ return register_sysctl(path, sysctl_mount_point, 0);
}
EXPORT_SYMBOL(register_sysctl_mount_point);
@@ -1414,17 +1414,11 @@ struct ctl_table_header *__register_sysctl_table(
*
* See __register_sysctl_table for more details.
*/
-struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table)
+struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table,
+ size_t table_size)
{
- int count = 0;
- struct ctl_table *entry;
- struct ctl_table_header t_hdr;
-
- t_hdr.ctl_table = table;
- list_for_each_table_entry(entry, (&t_hdr))
- count++;
return __register_sysctl_table(&sysctl_table_root.default_set,
- path, table, count);
+ path, table, table_size);
}
EXPORT_SYMBOL(register_sysctl);
@@ -1451,7 +1445,14 @@ EXPORT_SYMBOL(register_sysctl);
void __init __register_sysctl_init(const char *path, struct ctl_table *table,
const char *table_name)
{
- struct ctl_table_header *hdr = register_sysctl(path, table);
+ int count = 0;
+ struct ctl_table *entry;
+ struct ctl_table_header t_hdr, *hdr;
+
+ t_hdr.ctl_table = table;
+ list_for_each_table_entry(entry, (&t_hdr))
+ count++;
+ hdr = register_sysctl(path, table, count);
if (unlikely(!hdr)) {
pr_err("failed when register_sysctl %s to %s\n", table_name, path);
diff --git a/fs/verity/signature.c b/fs/verity/signature.c
index b8c51ad40d3a..f617c6a1f16c 100644
--- a/fs/verity/signature.c
+++ b/fs/verity/signature.c
@@ -103,7 +103,9 @@ static struct ctl_table fsverity_sysctl_table[] = {
static int __init fsverity_sysctl_init(void)
{
- fsverity_sysctl_header = register_sysctl("fs/verity", fsverity_sysctl_table);
+ fsverity_sysctl_header = register_sysctl("fs/verity",
+ fsverity_sysctl_table,
+ ARRAY_SIZE(fsverity_sysctl_table));
if (!fsverity_sysctl_header) {
pr_err("sysctl registration failed!\n");
return -ENOMEM;
diff --git a/fs/xfs/xfs_sysctl.c b/fs/xfs/xfs_sysctl.c
index fade33735393..61075e9c9e37 100644
--- a/fs/xfs/xfs_sysctl.c
+++ b/fs/xfs/xfs_sysctl.c
@@ -213,7 +213,8 @@ static struct ctl_table xfs_table[] = {
int
xfs_sysctl_register(void)
{
- xfs_table_header = register_sysctl("fs/xfs", xfs_table);
+ xfs_table_header = register_sysctl("fs/xfs", xfs_table,
+ ARRAY_SIZE(xfs_table));
if (!xfs_table_header)
return -ENOMEM;
return 0;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 0495c858989f..71d7935e50f0 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -227,7 +227,8 @@ extern void retire_sysctl_set(struct ctl_table_set *set);
struct ctl_table_header *__register_sysctl_table(
struct ctl_table_set *set,
const char *path, struct ctl_table *table, size_t table_size);
-struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table);
+struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table,
+ size_t table_size);
void unregister_sysctl_table(struct ctl_table_header * table);
extern int sysctl_init_bases(void);
@@ -262,7 +263,8 @@ static inline struct ctl_table_header *register_sysctl_mount_point(const char *p
return NULL;
}
-static inline struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table)
+static inline struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *table,
+ size_t table_size)
{
return NULL;
}
diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
index d67a4d45bb42..8b24744752cb 100644
--- a/kernel/pid_sysctl.h
+++ b/kernel/pid_sysctl.h
@@ -48,7 +48,7 @@ static struct ctl_table pid_ns_ctl_table_vm[] = {
};
static inline void register_pid_ns_sysctl_table_vm(void)
{
- register_sysctl("vm", pid_ns_ctl_table_vm);
+ register_sysctl("vm", pid_ns_ctl_table_vm, ARRAY_SIZE(pid_ns_ctl_table_vm));
}
#else
static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns) {}
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 63a8ce7177dd..de385b365a7a 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -265,7 +265,7 @@ static struct ctl_table timer_sysctl[] = {
static int __init timer_sysctl_init(void)
{
- register_sysctl("kernel", timer_sysctl);
+ register_sysctl("kernel", timer_sysctl, ARRAY_SIZE(timer_sysctl));
return 0;
}
device_initcall(timer_sysctl_init);
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 2b80264bb79f..59bf6983f1cf 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -365,7 +365,7 @@ static __init int user_namespace_sysctl_init(void)
* default set so that registrations in the child sets work
* properly.
*/
- user_header = register_sysctl("user", empty);
+ user_header = register_sysctl("user", empty, 0);
kmemleak_ignore(user_header);
BUG_ON(!user_header);
BUG_ON(!setup_userns_sysctls(&init_user_ns));
diff --git a/kernel/utsname_sysctl.c b/kernel/utsname_sysctl.c
index 019e3a1566cf..24527b155538 100644
--- a/kernel/utsname_sysctl.c
+++ b/kernel/utsname_sysctl.c
@@ -138,7 +138,7 @@ void uts_proc_notify(enum uts_proc proc)
static int __init utsname_sysctl_init(void)
{
- register_sysctl("kernel", uts_kern_table);
+ register_sysctl("kernel", uts_kern_table, ARRAY_SIZE(uts_kern_table));
return 0;
}
diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c
index 8036aa91a1cb..83d37a163836 100644
--- a/lib/test_sysctl.c
+++ b/lib/test_sysctl.c
@@ -166,7 +166,8 @@ static int test_sysctl_setup_node_tests(void)
test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL);
if (!test_data.bitmap_0001)
return -ENOMEM;
- sysctl_test_headers.test_h_setup_node = register_sysctl("debug/test_sysctl", test_table);
+ sysctl_test_headers.test_h_setup_node = register_sysctl("debug/test_sysctl", test_table,
+ ARRAY_SIZE(test_table));
if (!sysctl_test_headers.test_h_setup_node) {
kfree(test_data.bitmap_0001);
return -ENOMEM;
@@ -192,7 +193,8 @@ static int test_sysctl_run_unregister_nested(void)
struct ctl_table_header *unregister;
unregister = register_sysctl("debug/test_sysctl/unregister_error",
- test_table_unregister);
+ test_table_unregister,
+ ARRAY_SIZE(test_table_unregister));
if (!unregister)
return -ENOMEM;
@@ -209,7 +211,8 @@ static int test_sysctl_run_register_mount_point(void)
sysctl_test_headers.test_h_mnterror
= register_sysctl("debug/test_sysctl/mnt/mnt_error",
- test_table_unregister);
+ test_table_unregister,
+ ARRAY_SIZE(test_table_unregister));
/*
* Don't check the result.:
* If it fails (expected behavior), return 0.
diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
index 93941ab12549..61222addda7e 100644
--- a/net/sunrpc/sysctl.c
+++ b/net/sunrpc/sysctl.c
@@ -167,7 +167,8 @@ void
rpc_register_sysctl(void)
{
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl("sunrpc", debug_table);
+ sunrpc_table_header = register_sysctl("sunrpc", debug_table,
+ ARRAY_SIZE(debug_table));
}
void
diff --git a/net/sunrpc/xprtrdma/svc_rdma.c b/net/sunrpc/xprtrdma/svc_rdma.c
index f0d5eeed4c88..df7fb9c8b785 100644
--- a/net/sunrpc/xprtrdma/svc_rdma.c
+++ b/net/sunrpc/xprtrdma/svc_rdma.c
@@ -246,7 +246,8 @@ static int svc_rdma_proc_init(void)
goto out_err;
svcrdma_table_header = register_sysctl("sunrpc/svc_rdma",
- svcrdma_parm_table);
+ svcrdma_parm_table,
+ ARRAY_SIZE(svcrdma_parm_table));
return 0;
out_err:
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 29b0562d62e7..bf43e05044a3 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -790,7 +790,9 @@ int xprt_rdma_init(void)
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl("sunrpc", xr_tunables_table);
+ sunrpc_table_header = register_sysctl("sunrpc",
+ xr_tunables_table,
+ ARRAY_SIZE(xr_tunables_table));
#endif
return 0;
}
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 5f9030b81c9e..7c3d5ed708be 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -3169,7 +3169,9 @@ static struct xprt_class xs_bc_tcp_transport = {
int init_socket_xprt(void)
{
if (!sunrpc_table_header)
- sunrpc_table_header = register_sysctl("sunrpc", xs_tunables_table);
+ sunrpc_table_header = register_sysctl("sunrpc",
+ xs_tunables_table,
+ ARRAY_SIZE(xs_tunables_table));
xprt_register_transport(&xs_local_transport);
xprt_register_transport(&xs_udp_transport);
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 1757c18ea065..f96e6633fdd3 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -101,7 +101,7 @@ __init int net_sysctl_init(void)
* registering "/proc/sys/net" as an empty directory not in a
* network namespace.
*/
- net_header = register_sysctl("net", empty);
+ net_header = register_sysctl("net", empty, 0);
if (!net_header)
goto out;
ret = register_pernet_subsys(&sysctl_pernet_ops);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index f431251ffb91..b77344506cf3 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1785,7 +1785,8 @@ static struct ctl_table apparmor_sysctl_table[] = {
static int __init apparmor_init_sysctl(void)
{
- return register_sysctl("kernel", apparmor_sysctl_table) ? 0 : -ENOMEM;
+ return register_sysctl("kernel", apparmor_sysctl_table,
+ ARRAY_SIZE(apparmor_sysctl_table)) ? 0 : -ENOMEM;
}
#else
static inline int apparmor_init_sysctl(void)
diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index ebae964f7cc9..6f2cc827df41 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -256,7 +256,8 @@ static int __init loadpin_init(void)
enforce ? "" : "not ");
parse_exclude();
#ifdef CONFIG_SYSCTL
- if (!register_sysctl("kernel/loadpin", loadpin_sysctl_table))
+ if (!register_sysctl("kernel/loadpin", loadpin_sysctl_table,
+ ARRAY_SIZE(loadpin_sysctl_table)))
pr_notice("sysctl registration failed!\n");
#endif
security_add_hooks(loadpin_hooks, ARRAY_SIZE(loadpin_hooks), "loadpin");
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 2503cf153d4a..7b8164a4b504 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -461,7 +461,8 @@ static struct ctl_table yama_sysctl_table[] = {
};
static void __init yama_init_sysctl(void)
{
- if (!register_sysctl("kernel/yama", yama_sysctl_table))
+ if (!register_sysctl("kernel/yama", yama_sysctl_table,
+ ARRAY_SIZE(yama_sysctl_table)))
panic("Yama: sysctl registration failed.\n");
}
#else
--
2.30.2
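The patch above drops the sentinel walk from `register_sysctl()` and instead has every caller pass the table length, usually as `ARRAY_SIZE(table)`. Mount points and the `empty` placeholder tables pass `0`, and the sentinel walk survives only in `__register_sysctl_init()`. Below is a minimal userspace sketch of the two counting strategies. `struct entry`, `count_until_sentinel()`, `struct holder` and the local `ARRAY_SIZE` macro are hypothetical stand-ins, not the kernel's `struct ctl_table`, `list_for_each_table_entry` or `<linux/array_size.h>`. One detail is worth seeing concretely: when a table still ends in an empty sentinel, `ARRAY_SIZE()` counts that element too, so the two numbers differ by one. The sketch does not model how `__register_sysctl_table()` treats that trailing element.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE(); the kernel version
 * also rejects pointers at compile time, this one does not. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical, simplified stand-in for struct ctl_table. */
struct entry {
	const char *procname;
	int *data;
};

static int a, b, c;

/* Three real entries plus an empty sentinel (the kernel writes it as { }). */
static struct entry table[] = {
	{ .procname = "a", .data = &a },
	{ .procname = "b", .data = &b },
	{ .procname = "c", .data = &c },
	{ 0 }
};

/* Count entries up to the sentinel (procname == NULL), mirroring the loop
 * the patch moves into __register_sysctl_init(). */
static size_t count_until_sentinel(const struct entry *t)
{
	size_t n = 0;

	while (t[n].procname)
		n++;
	return n;
}

/* An array embedded in a struct keeps its array type, so ARRAY_SIZE()
 * still works on it, as with ARRAY_SIZE(t->vars) in drivers/parport. */
struct holder {
	struct entry vars[5];
};

/* The size a caller would now pass to register_sysctl() for `table`. */
static const size_t table_size = ARRAY_SIZE(table);
```

Here `count_until_sentinel(table)` yields 3 while `table_size` is 4, and `ARRAY_SIZE()` on the `vars` member yields 5. Applying `ARRAY_SIZE()` to a plain `struct entry *` instead would silently compute `sizeof(pointer) / sizeof(entry)`, which is why the kernel macro adds a compile-time array check.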
* RE: [PATCH v8 1/2] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
From: Dexuan Cui @ 2023-06-21 0:28 UTC
To: Dave Hansen, ak@linux.intel.com, arnd@arndb.de, bp@alien8.de,
brijesh.singh@amd.com, dan.j.williams@intel.com,
dave.hansen@linux.intel.com, Haiyang Zhang, hpa@zytor.com,
jane.chu@oracle.com, kirill.shutemov@linux.intel.com,
KY Srinivasan, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, luto@kernel.org, mingo@redhat.com,
peterz@infradead.org, rostedt@goodmis.org,
sathyanarayanan.kuppuswamy@linux.intel.com, seanjc@google.com,
tglx@linutronix.de, tony.luck@intel.com, wei.liu@kernel.org,
x86@kernel.org, Michael Kelley (LINUX)
Cc: linux-kernel@vger.kernel.org, Tianyu Lan,
rick.p.edgecombe@intel.com
In-Reply-To: <90ff7c36-9b2e-c791-dc26-3644b9ff20df@intel.com>
> From: Dave Hansen <dave.hansen@intel.com>
> Sent: Tuesday, June 20, 2023 4:34 PM
> > ...
> > - if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
> > + while (retry_count < max_retries_per_page) {
> > + memset(&args, 0, sizeof(args));
> > + args.r10 = TDX_HYPERCALL_STANDARD;
> > + args.r11 = TDVMCALL_MAP_GPA;
> > + args.r12 = start;
> > + args.r13 = end - start;
> > +
>
> What's wrong with:
>
> while (retry_count < max_retries_per_page) {
> struct tdx_hypercall_args args = {
> .r10 = TDX_HYPERCALL_STANDARD,
> .r11 = TDVMCALL_MAP_GPA,
> .r12 = start,
> .r13 = end - start };
>
> ?
>
> Or maybe with the brackets slightly differently arranged.
>
> Why'd you declare all the variables outside the while() loop anyway?
Thanks for the suggestion of making the code compact!
I'll apply the below diff, and post v9 tomorrow (trying to
not post too frequently...)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 08eac8f46c11..1cb7e9ee3a68 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -710,9 +710,8 @@ static bool tdx_cache_flush_required(void)
*/
static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
{
+ /* Retrying the hypercall a second time should succeed; use 3 just in case */
const int max_retries_per_page = 3;
- struct tdx_hypercall_args args;
- u64 map_fail_paddr, ret;
int retry_count = 0;
if (!enc) {
@@ -722,13 +721,14 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
}
while (retry_count < max_retries_per_page) {
- memset(&args, 0, sizeof(args));
- args.r10 = TDX_HYPERCALL_STANDARD;
- args.r11 = TDVMCALL_MAP_GPA;
- args.r12 = start;
- args.r13 = end - start;
-
- ret = __tdx_hypercall_ret(&args);
+ struct tdx_hypercall_args args = {
+ .r10 = TDX_HYPERCALL_STANDARD,
+ .r11 = TDVMCALL_MAP_GPA,
+ .r12 = start,
+ .r13 = end - start };
+
+ u64 map_fail_paddr;
+ u64 ret = __tdx_hypercall_ret(&args);
if (ret != TDVMCALL_STATUS_RETRY)
return !ret;
/*
The function now looks like this:
/*
 * Notify the VMM about page mapping conversion. More info about ABI
 * can be found in TDX Guest-Host-Communication Interface (GHCI),
 * section "TDG.VP.VMCALL<MapGPA>".
 */
static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
{
	/* Retrying the hypercall a second time should succeed; use 3 just in case */
	const int max_retries_per_page = 3;
	int retry_count = 0;

	if (!enc) {
		/* Set the shared (decrypted) bits: */
		start |= cc_mkdec(0);
		end |= cc_mkdec(0);
	}

	while (retry_count < max_retries_per_page) {
		struct tdx_hypercall_args args = {
			.r10 = TDX_HYPERCALL_STANDARD,
			.r11 = TDVMCALL_MAP_GPA,
			.r12 = start,
			.r13 = end - start };

		u64 map_fail_paddr;
		u64 ret = __tdx_hypercall_ret(&args);

		if (ret != TDVMCALL_STATUS_RETRY)
			return !ret;
		/*
		 * The guest must retry the operation for the pages in the
		 * region starting at the GPA specified in R11. R11 comes
		 * from the untrusted VMM. Sanity check it.
		 */
		map_fail_paddr = args.r11;
		if (map_fail_paddr < start || map_fail_paddr >= end)
			return false;

		/* "Consume" a retry without forward progress */
		if (map_fail_paddr == start) {
			retry_count++;
			continue;
		}

		start = map_fail_paddr;
		retry_count = 0;
	}

	return false;
}
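To see how the retry budget in `tdx_map_gpa()` behaves, its control flow can be exercised in userspace against a mocked VMM. This is a sketch only: `struct mock_vmm`, `mock_map_gpa()`, `map_range()` and the `MOCK_STATUS_*` codes are hypothetical stand-ins, not the TDX hypercall ABI. The mock maps at most `chunk` bytes per call, can be told to stall (return RETRY with no progress) a number of times, and reports the first unmapped address the way the VMM reports it in R11.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes standing in for the TDVMCALL_STATUS_* values. */
#define MOCK_STATUS_SUCCESS 0ULL
#define MOCK_STATUS_RETRY   1ULL

/* Hypothetical VMM model. */
struct mock_vmm {
	uint64_t chunk;	/* bytes mapped per call before asking for a retry */
	int stalls;	/* calls that return RETRY without any progress */
	int calls;	/* number of hypercalls observed */
};

/* Mocked MapGPA: on RETRY, *fail_addr plays the role of R11. */
static uint64_t mock_map_gpa(struct mock_vmm *vmm, uint64_t start,
			     uint64_t len, uint64_t *fail_addr)
{
	vmm->calls++;
	if (vmm->stalls > 0) {
		vmm->stalls--;
		*fail_addr = start;		/* no forward progress */
		return MOCK_STATUS_RETRY;
	}
	if (len <= vmm->chunk)
		return MOCK_STATUS_SUCCESS;	/* whole remainder mapped */
	*fail_addr = start + vmm->chunk;	/* partial progress */
	return MOCK_STATUS_RETRY;
}

/* Same control flow as tdx_map_gpa(), with the hypercall mocked out. */
static bool map_range(struct mock_vmm *vmm, uint64_t start, uint64_t end)
{
	const int max_retries_per_page = 3;
	int retry_count = 0;

	while (retry_count < max_retries_per_page) {
		uint64_t fail_addr = 0;
		uint64_t ret = mock_map_gpa(vmm, start, end - start, &fail_addr);

		if (ret != MOCK_STATUS_RETRY)
			return !ret;
		/* The address comes from the (untrusted) VMM: range-check it. */
		if (fail_addr < start || fail_addr >= end)
			return false;
		/* No forward progress: consume one retry. */
		if (fail_addr == start) {
			retry_count++;
			continue;
		}
		/* Progress was made: resume there with a fresh retry budget. */
		start = fail_addr;
		retry_count = 0;
	}
	return false;
}
```

A retry is consumed only when the VMM reports no forward progress. Each partial success resets `retry_count`, so mapping a large range in many chunks never exhausts the budget, while a VMM that stalls three times in a row makes the function give up.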
* Re: [PATCH v8 1/2] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
From: Dave Hansen @ 2023-06-20 23:34 UTC
To: Dexuan Cui, ak, arnd, bp, brijesh.singh, dan.j.williams,
dave.hansen, haiyangz, hpa, jane.chu, kirill.shutemov, kys,
linux-arch, linux-hyperv, luto, mingo, peterz, rostedt,
sathyanarayanan.kuppuswamy, seanjc, tglx, tony.luck, wei.liu, x86,
mikelley
Cc: linux-kernel, Tianyu.Lan, rick.p.edgecombe
In-Reply-To: <20230620154830.25442-2-decui@microsoft.com>
On 6/20/23 08:48, Dexuan Cui wrote:
> -static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> +static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
> {
> - phys_addr_t start = __pa(vaddr);
> - phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
> + const int max_retries_per_page = 3;
> + struct tdx_hypercall_args args;
> + u64 map_fail_paddr, ret;
> + int retry_count = 0;
>
> if (!enc) {
> /* Set the shared (decrypted) bits: */
> @@ -718,12 +720,49 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> end |= cc_mkdec(0);
> }
>
> - /*
> - * Notify the VMM about page mapping conversion. More info about ABI
> - * can be found in TDX Guest-Host-Communication Interface (GHCI),
> - * section "TDG.VP.VMCALL<MapGPA>"
> - */
> - if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
> + while (retry_count < max_retries_per_page) {
> + memset(&args, 0, sizeof(args));
> + args.r10 = TDX_HYPERCALL_STANDARD;
> + args.r11 = TDVMCALL_MAP_GPA;
> + args.r12 = start;
> + args.r13 = end - start;
> +
What's wrong with:
	while (retry_count < max_retries_per_page) {
		struct tdx_hypercall_args args = {
			.r10 = TDX_HYPERCALL_STANDARD,
			.r11 = TDVMCALL_MAP_GPA,
			.r12 = start,
			.r13 = end - start };
?
Or maybe with the brackets slightly differently arranged.
Why'd you declare all the variables outside the while() loop anyway?
* Re: [PATCH v8 1/2] x86/tdx: Retry TDVMCALL_MAP_GPA() when needed
From: Sathyanarayanan Kuppuswamy @ 2023-06-20 19:44 UTC (permalink / raw)
To: Dexuan Cui, ak@linux.intel.com, arnd@arndb.de, bp@alien8.de,
brijesh.singh@amd.com, dan.j.williams@intel.com,
dave.hansen@intel.com, dave.hansen@linux.intel.com, Haiyang Zhang,
hpa@zytor.com, jane.chu@oracle.com,
kirill.shutemov@linux.intel.com, KY Srinivasan,
linux-arch@vger.kernel.org, linux-hyperv@vger.kernel.org,
luto@kernel.org, mingo@redhat.com, peterz@infradead.org,
rostedt@goodmis.org, seanjc@google.com, tglx@linutronix.de,
tony.luck@intel.com, wei.liu@kernel.org, x86@kernel.org,
Michael Kelley (LINUX)
Cc: linux-kernel@vger.kernel.org, Tianyu Lan,
rick.p.edgecombe@intel.com
In-Reply-To: <SA1PR21MB13359EB1A88FC676C2269318BF5CA@SA1PR21MB1335.namprd21.prod.outlook.com>
Hi,
On 6/20/23 12:23 PM, Dexuan Cui wrote:
>> From: Sathyanarayanan Kuppuswamy
>> Sent: Tuesday, June 20, 2023 11:31 AM
>>> ...
>>> -static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
>>> +static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
>>> {
>>> - phys_addr_t start = __pa(vaddr);
>>> - phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
>>> + const int max_retries_per_page = 3;
>>
>> Add some details about why you chose 3? Maybe you can also use macro for it.
>
> It's a small number recommended by Kirill:
> https://lwn.net/ml/linux-kernel/20221208194800.n27ak4xj6pmyny46@box.shutemov.name/
>
> The spec doesn't define a max retry count. Normally I guess a max retry count
> of 2 should be enough, at least for Hyper-V according to my testing.
>
> Maybe we can add a comment like this:
>
> /* Retrying the hypercall a second time should succeed; use 3 just in case. */
>
> Does this look good to all?
Looks fine to me.
>
>>> + struct tdx_hypercall_args args;
>>> + u64 map_fail_paddr, ret;
>>> + int retry_count = 0;
>>>
>>> if (!enc) {
>>> /* Set the shared (decrypted) bits: */
>>> @@ -718,12 +720,49 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
>>> end |= cc_mkdec(0);
>>> }
>>>
>>> - /*
>>> - * Notify the VMM about page mapping conversion. More info about ABI
>>> - * can be found in TDX Guest-Host-Communication Interface (GHCI),
>>> - * section "TDG.VP.VMCALL<MapGPA>"
>>> - */
>>> - if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
>>> + while (retry_count < max_retries_per_page) {
>>> + memset(&args, 0, sizeof(args));
>>> + args.r10 = TDX_HYPERCALL_STANDARD;
>>> + args.r11 = TDVMCALL_MAP_GPA;
>>> + args.r12 = start;
>>> + args.r13 = end - start;
>>> +
>>> + ret = __tdx_hypercall_ret(&args);
>>> + if (ret != TDVMCALL_STATUS_RETRY)
>>> + return !ret;
>>> + /*
>>> + * The guest must retry the operation for the pages in the
>>> + * region starting at the GPA specified in R11. R11 comes
>>> + * from the untrusted VMM. Sanity check it.
>>> + */
>>> + map_fail_paddr = args.r11;
>>
>> Do you really need map_fail_paddr? Why not directly use args.r11?
>>
>>> + if (map_fail_paddr < start || map_fail_paddr >= end)
>>> + return false;
>
> Originally, I used r11.
>
> Dave says " 'r11' needs a real, logical name":
> https://lwn.net/ml/linux-kernel/6bb65614-d420-49d3-312f-316dc8ca4cc4@intel.com/
Got it.
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer