From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 91B3C1D302 for ; Tue, 25 Jul 2023 11:25:52 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0A53CC433C7; Tue, 25 Jul 2023 11:25:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1690284352; bh=jbU/E+nKIFSxau9+ObOakyHqpYw2tMLSD6dmiCqF+Jk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=IFuQrCwIoTF1kVsuBa3H135PZb89dMFXuqnU/5PyKbB5mXgs0YqOQ4frdnUryPgu3 qwCCngk4I/WAC6zN3XhnGZgJxA5KhaiMRG5Oy6wXBxvpkXLQGQCG9VyQ/g3Fi7zaiU A5un+ePqu0HvdqfCilCaShzyRHjn/KFeQj0Wv8SA= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Linus Torvalds , Manuel Leiner , "Jason A. Donenfeld" , "David S. Miller" Subject: [PATCH 5.10 326/509] wireguard: queueing: use saner cpu selection wrapping Date: Tue, 25 Jul 2023 12:44:25 +0200 Message-ID: <20230725104608.619577396@linuxfoundation.org> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230725104553.588743331@linuxfoundation.org> References: <20230725104553.588743331@linuxfoundation.org> User-Agent: quilt/0.67 Precedence: bulk X-Mailing-List: patches@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Jason A. Donenfeld commit 7387943fa35516f6f8017a3b0e9ce48a3bef9faa upstream. Using `% nr_cpumask_bits` is slow and complicated, and not totally robust toward dynamic changes to CPU topologies. Rather than storing the next CPU in the round-robin, just store the last one, and also return that value. This simplifies the loop drastically into a much more common pattern. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Reported-by: Linus Torvalds Tested-by: Manuel Leiner Signed-off-by: Jason A. Donenfeld Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireguard/queueing.c | 1 + drivers/net/wireguard/queueing.h | 25 +++++++++++-------------- drivers/net/wireguard/receive.c | 2 +- drivers/net/wireguard/send.c | 2 +- 4 files changed, 14 insertions(+), 16 deletions(-) --- a/drivers/net/wireguard/queueing.c +++ b/drivers/net/wireguard/queueing.c @@ -28,6 +28,7 @@ int wg_packet_queue_init(struct crypt_qu int ret; memset(queue, 0, sizeof(*queue)); + queue->last_cpu = -1; ret = ptr_ring_init(&queue->ring, len, GFP_KERNEL); if (ret) return ret; --- a/drivers/net/wireguard/queueing.h +++ b/drivers/net/wireguard/queueing.h @@ -119,20 +119,17 @@ static inline int wg_cpumask_choose_onli return cpu; } -/* This function is racy, in the sense that next is unlocked, so it could return - * the same CPU twice. A race-free version of this would be to instead store an - * atomic sequence number, do an increment-and-return, and then iterate through - * every possible CPU until we get to that index -- choose_cpu. However that's - * a bit slower, and it doesn't seem like this potential race actually - * introduces any performance loss, so we live with it. +/* This function is racy, in the sense that it's called while last_cpu is + * unlocked, so it could return the same CPU twice. Adding locking or using + * atomic sequence numbers is slower though, and the consequences of racing are + * harmless, so live with it. */ -static inline int wg_cpumask_next_online(int *next) +static inline int wg_cpumask_next_online(int *last_cpu) { - int cpu = *next; - - while (unlikely(!cpumask_test_cpu(cpu, cpu_online_mask))) - cpu = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits; - *next = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits; + int cpu = cpumask_next(*last_cpu, cpu_online_mask); + if (cpu >= nr_cpu_ids) + cpu = cpumask_first(cpu_online_mask); + *last_cpu = cpu; return cpu; } @@ -161,7 +158,7 @@ static inline void wg_prev_queue_drop_pe static inline int wg_queue_enqueue_per_device_and_peer( struct crypt_queue *device_queue, struct prev_queue *peer_queue, - struct sk_buff *skb, struct workqueue_struct *wq, int *next_cpu) + struct sk_buff *skb, struct workqueue_struct *wq) { int cpu; @@ -175,7 +172,7 @@ static inline int wg_queue_enqueue_per_d /* Then we queue it up in the device queue, which consumes the * packet as soon as it can. */ - cpu = wg_cpumask_next_online(next_cpu); + cpu = wg_cpumask_next_online(&device_queue->last_cpu); if (unlikely(ptr_ring_produce_bh(&device_queue->ring, skb))) return -EPIPE; queue_work_on(cpu, wq, &per_cpu_ptr(device_queue->worker, cpu)->work); --- a/drivers/net/wireguard/receive.c +++ b/drivers/net/wireguard/receive.c @@ -531,7 +531,7 @@ static void wg_packet_consume_data(struc goto err; ret = wg_queue_enqueue_per_device_and_peer(&wg->decrypt_queue, &peer->rx_queue, skb, - wg->packet_crypt_wq, &wg->decrypt_queue.last_cpu); + wg->packet_crypt_wq); if (unlikely(ret == -EPIPE)) wg_queue_enqueue_per_peer_rx(skb, PACKET_STATE_DEAD); if (likely(!ret || ret == -EPIPE)) { --- a/drivers/net/wireguard/send.c +++ b/drivers/net/wireguard/send.c @@ -318,7 +318,7 @@ static void wg_packet_create_data(struct goto err; ret = wg_queue_enqueue_per_device_and_peer(&wg->encrypt_queue, &peer->tx_queue, first, - wg->packet_crypt_wq, &wg->encrypt_queue.last_cpu); + wg->packet_crypt_wq); if (unlikely(ret == -EPIPE)) wg_queue_enqueue_per_peer_tx(first, PACKET_STATE_DEAD); err: