From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=codeconstruct.com.au header.i=@codeconstruct.com.au header.b="UNBw6HWu" Received: from codeconstruct.com.au (pi.codeconstruct.com.au [203.29.241.158]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE0D5196; Thu, 16 Nov 2023 23:29:10 -0800 (PST) Received: from pecola.lan (unknown [159.196.93.152]) by mail.codeconstruct.com.au (Postfix) with ESMTPSA id 5CD722012D; Fri, 17 Nov 2023 15:29:05 +0800 (AWST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=codeconstruct.com.au; s=2022a; t=1700206145; bh=7sEQcgSTINlPYHwYMq1CMcVwbUsEYi/P5XQkuSsDRCo=; h=Subject:From:To:Cc:Date:In-Reply-To:References; b=UNBw6HWul8s+67Ir3ekXoy2rLntGk1qPfoPfscIFFECmO4z1ujMe4+FykYoxi+JkT 6C0Naxi2l2oKkysuem0u8TxEXx75V5JlzTo4Bvi2THAwZixBgidrb8yAw1c11UQN/A 7msLVMGqUUvI2aIgRcPw20bZZl8wQShfDCXqq/Zx8yiRdzFHW4CJUtoyf22ocLprpw aVKUEsbY4z+/6XJDYGVE+FJrqXom7Hw/ccDurc5l/3Yt9pgifDgdI/FXmhl6FRb5e7 KI9BMP44lSMOwK0PQEpksG+t/m0ch3aNn8pbEOPKH8pFWfJdcYVGqGzF+LC/4LN/Zj AMSX+AbxubzHg== Message-ID: Subject: Re: [PATCH] mctp-i2c: increase the MCTP_I2C_TX_WORK_LEN to 500 From: Jeremy Kerr To: Jinliang Wang , Matt Johnston Cc: William Kennington , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Date: Fri, 17 Nov 2023 15:29:05 +0800 In-Reply-To: <20231117070457.1970786-1-jinliangw@google.com> References: <20231117070457.1970786-1-jinliangw@google.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.46.4-2 Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Hi Jinliang, > Tested: > Before the fix, we will see below message in kernel log when > concurrently sending namespace create commands to the 4 NVMe-MI > devices on the same i2c bus: > kernel: i2c i2c-6 mctpi2c6: BUG! Tx Ring full when queue awake! >=20 > After the fix, the error message is gone. Thanks for the report, but I don't think this is the correct fix: you should not hit that error even if > TX_WORK_LEN packets need to be sent. The net core should not be attempting to queue more skbs after netif_stop_queue(), which we do in the conditional below the warning: spin_lock_irqsave(&midev->tx_queue.lock, flags); if (skb_queue_len(&midev->tx_queue) >=3D MCTP_I2C_TX_WORK_LEN) { netif_stop_queue(dev); spin_unlock_irqrestore(&midev->tx_queue.lock, flags); netdev_err(dev, "BUG! Tx Ring full when queue awake!\n"); return NETDEV_TX_BUSY; } __skb_queue_tail(&midev->tx_queue, skb); if (skb_queue_len(&midev->tx_queue) =3D=3D MCTP_I2C_TX_WORK_LEN) netif_stop_queue(dev); spin_unlock_irqrestore(&midev->tx_queue.lock, flags); What looks like has happened here: 1) we have TX_WORK_LEN-1 packets queued 2) we release a flow, which queues the "marker" skb. the tx_queue now has TX_WORK_LEN items 3) we queue another packet, ending up with TX_WORK_LEN+1 in the queue 4) the =3D=3D TX_WORK_LEN test fails, so we dont do a netif_stop_queue() A couple of potential fixes: * We do the check and conditional netif_stop_queue() in (2) * We change the check there to be `>=3D MCTP_I2C_TX_WORK_LEN` Matt, any preferences? Cheers, Jeremy