From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f172.google.com (mail-pf1-f172.google.com [209.85.210.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2B3732C0261 for ; Mon, 3 Nov 2025 08:41:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762159305; cv=none; b=aNGNUDw4WHuLnDI4jZNelfZ0SCLK2HeFwnDf2TXwLBf97CVbKyp5L729OlHp9SlgTChiun9eqPK0pSI0572PAEZI8g1RqMgM6+SkZYyXIvoyPek+RlsvBhJJohA+I7o4sSVhGI7tSetBWPyWH1ZZOI4kX9caLxIz6XiTpUolXis= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762159305; c=relaxed/simple; bh=S54nXXo/aFrvMC7QiuJUF89PwUrNEP2rWkjH7sjr3UQ=; h=Message-ID:Date:MIME-Version:From:Subject:To:Cc:References: In-Reply-To:Content-Type; b=ndeX+RfdTY1VvhscJBj+6N6ruZE4acefbyEtG9SnfNgK6EnOh11mwSvaF4t7AdygFib8TnSqEAeNsRYMHbLzsitjX3ArwjJ0aRvI7gqaJ3cHR6p4PHbF6v7Zm5xvWSt1t7BOqilR7wPZB1o8had10jCuZCL2lpnAvgvvOT3QXgg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gbPSugoH; arc=none smtp.client-ip=209.85.210.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gbPSugoH" Received: by mail-pf1-f172.google.com with SMTP id d2e1a72fcca58-78118e163e5so4202359b3a.0 for ; Mon, 03 Nov 2025 00:41:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1762159303; x=1762764103; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:subject:from:user-agent:mime-version:date:message-id:from:to :cc:subject:date:message-id:reply-to; bh=9IlYd8PBv158GdcOw6rn4rqU0pymAVgCcTAXsADiI14=; b=gbPSugoHdSCeMmpTd+jHuI8rJyR8vNuPTK6wYW4DH1RiQYNUFx0/znQpX9bAChVbWF roXpw9EHzdff6UPY0GqhwLYxeBDygVt5PxXv1sZGikscpiLXmEdllwUQfSbxMeJ8nhf4 6xa1CuGihra5zD+fuLEsdespU1jubzery4pgCIECTg7B064Oj6Y3vme8s/GmcxBj5CSU hVcQ1CcHO7VckrHXDDTfkGdqgoOq8hxU/QNsmUuXSp56J+XbIS3kTn7N/LQSgSHTSo5U d/GlF4nXVeGxkRxTx6GrkT+mNhs1Gn6WkWNPMFS1/DxoMaAkbjae10h5ORkj0D1043Hb zSUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762159303; x=1762764103; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:subject:from:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9IlYd8PBv158GdcOw6rn4rqU0pymAVgCcTAXsADiI14=; b=dS58AHznlgbHS1aoD4Y+ALI46mlrRaj2aY11L1SVA43fkBuv3+AG4BSzd/0knQvk6o iPn9QNKp3QiIuTs5BYKsuzQ/aRSCPfc8LsHUcYMmE+UanZivZPfZuy5jerabm/4fB3vy kzHMOyKgkGbajP8BsbgsdSxAf3/AVwsuewxqGK1atV5qPyjKmUYjgduT9J6R/PIq5Ney J1fnPIJ7yq+RuqbUPtODYSizrnwf5829p/iRHjhmWUZgug+I6+2qhabh0IxvWXDxTgXV bdkpMlTx4C1jGm0l3APd6X0drbRDsmwaVHd0yauxb95IoQtcoPdXPDz8nasWO3jh3LbH bTpA== X-Forwarded-Encrypted: i=1; AJvYcCVUZmh9nvqbxbUVCeySlrqTANszdPJGkc2V/zWyDr9NhKB0zwqq+HBmDmNQEbpc5CH/OcR8ory5QZO7j/A=@vger.kernel.org X-Gm-Message-State: AOJu0YwAi69D7kmoVCO0Mfk+IsftwXSSPQbMi5zoC2MWRaKJeB8mc7iu 1I4GILzQPhVIhw5U9brV67tfbLQfCU87lKS82R2bMU9ZqapHNfuuJHV1 X-Gm-Gg: ASbGnctKsHOmLf0M7lhLb36BTnX3i48IRjn9C3CXu0BkjFzUAwZZVF3+VgFCaL33pQA jWk9z/cEuZpuf8yKWHMHA+d7kPJ/Ac7Oy6v36E+TtcZWEopE2ta+0Xgano+GkoHxWueJQIiNvMj nA+njm/ZSozkSl2mBeL35ePA6ZHloxrPXrcXe0Pzn88LiOEMqSQ/oGaIit/yh865lX/x7338mbM V/iMpmqxBWPBNO0b4hljqdBpS3563hZAoMntSrLKESY0tKnnVuJlYEQxw8qvD5741M1NwJAffKE THwuut4rXYHbVUcDOV2H7Z3GI3Ul8rztK7UlR++nRihZgLsmHt0U4WCFxW77HFxlRazgdBLUC+U wRnsrAttwQHTEJK4sxLV+OH5mx4Le0c5nOrCyhcPQsajNlDD95GvbSpR848TOqq0JXVKL6ftmAG HrB22w36aXhsdLdRn3tvAJ6fzupi0nzAlIEfJNarDtMDFN1ihdEVq+w0Z5xhQ6An4/YC+tTS1JG ZJXdzG4fgg= X-Google-Smtp-Source: AGHT+IHVqhieNuVGLPgUB+hpjN6Vf9ZUu7T1hKfxCCQidwT8I97GOgul3IVD0bS7e1uzQqBSFCZNtw== X-Received: by 2002:a17:902:d4ce:b0:295:6b98:6d65 with SMTP id d9443c01a7336-2956b986f84mr76085715ad.22.1762159303320; Mon, 03 Nov 2025 00:41:43 -0800 (PST) Received: from [192.168.99.24] (i218-47-167-230.s42.a013.ap.plala.or.jp. [218.47.167.230]) by smtp.googlemail.com with ESMTPSA id d9443c01a7336-2952689f15csm110200325ad.29.2025.11.03.00.41.39 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 03 Nov 2025 00:41:42 -0800 (PST) Message-ID: <8b70ba1d-323b-4e76-be7f-9df45b8f53d5@gmail.com> Date: Mon, 3 Nov 2025 17:41:37 +0900 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird From: Toshiaki Makita Subject: Re: [PATCH net V2 2/2] veth: more robust handing of race to avoid txq getting stuck To: Jesper Dangaard Brouer Cc: Eric Dumazet , "David S. Miller" , Jakub Kicinski , Paolo Abeni , ihor.solodrai@linux.dev, "Michael S. Tsirkin" , makita.toshiaki@lab.ntt.co.jp, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kernel-team@cloudflare.com, netdev@vger.kernel.org, =?UTF-8?Q?Toke_H=C3=B8iland-J=C3=B8rgensen?= References: <176159549627.5396.15971398227283515867.stgit@firesoul> <176159553930.5396.4492315010562655785.stgit@firesoul> <27e74aeb-89f5-4547-8ecc-232570e2644c@kernel.org> <4aa74767-082c-4407-8677-70508eb53a5d@gmail.com> Content-Language: en-US In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 2025/10/31 4:06, Jesper Dangaard Brouer wrote: > On 29/10/2025 16.00, Toshiaki Makita wrote: >> On 2025/10/29 19:33, Jesper Dangaard Brouer wrote: >>> On 28/10/2025 15.56, Toshiaki Makita wrote: >>>> On 2025/10/28 5:05, Jesper Dangaard Brouer wrote: >>>>> (3) Finally, the NAPI completion check in veth_poll() is updated. If NAPI is >>>>> about to complete (napi_complete_done), it now also checks if the peer TXQ >>>>> is stopped. If the ring is empty but the peer TXQ is stopped, NAPI will >>>>> reschedule itself. This prevents a new race where the producer stops the >>>>> queue just as the consumer is finishing its poll, ensuring the wakeup is not >>>>> missed. >>>> ... >>>> >>>>> @@ -986,7 +979,8 @@ static int veth_poll(struct napi_struct *napi, int budget) >>>>>       if (done < budget && napi_complete_done(napi, done)) { >>>>>           /* Write rx_notify_masked before reading ptr_ring */ >>>>>           smp_store_mb(rq->rx_notify_masked, false); >>>>> -        if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) { >>>>> +        if (unlikely(!__ptr_ring_empty(&rq->xdp_ring) || >>>>> +                 (peer_txq && netif_tx_queue_stopped(peer_txq)))) { >>>> >>>> Not sure if this is necessary. >>> >>> How sure are you that this isn't necessary? >>> >>>>  From commitlog, your intention seems to be making sure to wake up the queue, >>>> but you wake up the queue immediately after this hunk in the same function, >>>> so isn't it guaranteed without scheduling another napi? >>>> >>> >>> The above code catches the case, where the ptr_ring is empty and the >>> tx_queue is stopped.  It feels wrong not to reach in this case, but you >>> *might* be right that it isn't strictly necessary, because below code >>> will also call netif_tx_wake_queue() which *should* have a SKB stored >>> that will *indirectly* trigger a restart of the NAPI. >> >> I'm a bit confused. >> Wrt (3), what you want is waking up the queue, right? >> Or, what you want is actually NAPI reschedule itself? > > I want NAPI to reschedule itself, the queue it woken up later close to > the exit of the function.  Maybe it is unnecessary to for NAPI to > reschedule itself here... and that is what you are objecting to? > >> My understanding was the former (wake up the queue). >> If it's correct, (3) seems not necessary because you have already woken up the >> queue in the same function. >> >> First NAPI >>   veth_poll() >>     // ptr_ring_empty() and queue_stopped() >>    __napi_schedule() ... schedule second NAPI >>    netif_tx_wake_queue() ... wake up the queue if queue_stopped() >> >> Second NAPI >>   veth_poll() >>    netif_tx_wake_queue() ... this is what you want, >>                              but the queue has been woken up in the first NAPI >>                              What's the point? >> > > So, yes I agree that there is a potential for restarting NAPI one time > too many.  But only *potential* because if NAPI is already/still running > then the producer will not actually start NAPI. > > I guess this is a kind of optimization, to avoid the time it takes to > restart NAPI. When we see that TXQ is stopped and ptr_ring is empty, > then we know that a packet will be sitting in the qdisc requeue queue, > and netif_tx_wake_queue() will very soon fill "produce" a packet into > ptr_ring (via calling ndo_start_xmit/veth_xmit). In some cases it may be an optimization but not in every case because it can prematurely start NAPI before tx side fills packets? > As this is a fixes patch I can drop this optimization. It seems both > Paolo and you thinks this isn't necessary. I think it's better to drop (3) as a fix. Toshiaki Makita