Subject: Re: [PATCH net] sfc: use budget for TX completions
From: Edward Cree
To: Jakub Kicinski, Martin Habets
Cc: Íñigo Huguet, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, netdev@vger.kernel.org, linux-net-drivers@amd.com, Fei Liu
Date: Thu, 15 Jun 2023 01:36:17 +0100
In-Reply-To: <20230614102744.71c91f20@kernel.org>
References: <20230612144254.21039-1-ihuguet@redhat.com> <20230614102744.71c91f20@kernel.org>

On 14/06/2023 18:27, Jakub Kicinski wrote:
> The documentation is pretty recent. I haven't seen this lockup once
> in production or testing.
> Do multiple queues complete on the same CPU
> for SFC or something weird like that?

I think the key question here is whether one CPU can be using a TXQ to send while another CPU is in a NAPI poll on the same channel, and thus trying to clean the EVQ that the TXQ is using. If so, the NAPI poll could last forever; if not, it should never have more than 8k (or whatever the TX ring size is set to) events to process.
And even ignoring the affinity of the core TXQs, at the very least XDP TXQs can serve CPUs other than the one on which their EVQ (and hence their NAPI poll) lives. That means they can keep filling the EVQ as fast as the NAPI poll empties it, and thus keep ev_process looping forever.

In principle this can also happen with other kinds of events; e.g. if the MC goes crazy and generates an endless stream of MCDI events, the NAPI poll will spin on that CPU forever eating them. So maybe this limit needs to be broader than just TX events? A hard cap on the number of events (regardless of type) that can be consumed in a single efx_ef10_ev_process() invocation, perhaps?

-ed
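A rough userspace sketch of the hard cap suggested above, assuming a simplified single-producer event ring. Everything here (struct evq, ev_process_capped(), spam_mcdi(), the EV_* types) is hypothetical and is not the real efx_ef10_ev_process() code. It only illustrates that bounding the loop by a count of events of any type makes it terminate even when a producer refills the queue as fast as the poll drains it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds: illustrative only, not the sfc definitions. */
enum ev_type { EV_TX_COMPLETION, EV_RX, EV_MCDI, EV_TYPE_COUNT };

#define EVQ_SIZE 64u /* power of two, so "& (EVQ_SIZE - 1)" wraps the index */

struct evq {
	enum ev_type ring[EVQ_SIZE];
	unsigned int read_ptr;  /* advanced by the poll (consumer) */
	unsigned int write_ptr; /* advanced by the producer side */
	unsigned int handled[EV_TYPE_COUNT]; /* per-type counts, for illustration */
};

static bool evq_pending(const struct evq *q)
{
	return q->read_ptr != q->write_ptr;
}

static void evq_push(struct evq *q, enum ev_type t)
{
	assert(q->write_ptr - q->read_ptr < EVQ_SIZE); /* ring must not overflow */
	q->ring[q->write_ptr & (EVQ_SIZE - 1)] = t;
	q->write_ptr++;
}

/* Models a producer that never lets the queue drain (a misbehaving MC
 * spamming MCDI events, or an XDP TXQ used from another CPU): every time
 * the poll consumes one event, another one arrives.
 */
static void spam_mcdi(struct evq *q)
{
	if (q->write_ptr - q->read_ptr < EVQ_SIZE)
		evq_push(q, EV_MCDI);
}

/* Consume at most 'quota' events of ANY type and return how many were
 * consumed. Without the 'done < quota' test, a producer like spam_mcdi()
 * would keep this loop running forever. 'on_consume' (may be NULL) is
 * called after each event to simulate concurrent refilling.
 */
static int ev_process_capped(struct evq *q, int quota,
			     void (*on_consume)(struct evq *))
{
	int done = 0;

	while (done < quota && evq_pending(q)) {
		enum ev_type t = q->ring[q->read_ptr & (EVQ_SIZE - 1)];

		q->read_ptr++;
		q->handled[t]++; /* a real driver would dispatch on t here */
		done++;
		if (on_consume != NULL)
			on_consume(q);
	}
	return done;
}
```

Seeding a queue with 10 TX completions and calling ev_process_capped(&q, 64, spam_mcdi) returns 64 with events still pending, rather than spinning. Returning the whole quota to the caller is the same idea as a NAPI poll that uses its entire budget staying scheduled, so other work on that CPU gets a chance to run before polling resumes.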