From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jason Wang Subject: Re: [PATCH net-next] virtio_net: ethtool tx napi configuration Date: Fri, 14 Sep 2018 16:08:47 +0800 Message-ID: <83c595cb-b994-ef93-e9c2-9ad077d3b037@redhat.com> References: <20180909224449.203593-1-willemdebruijn.kernel@gmail.com> <2fd8d46f-7466-e507-af31-c587693beb6e@gmail.com> <4a3d69be-8651-e36d-bc14-5a3f1f23d155@redhat.com> <6db3c755-a1dc-dcc9-e110-bfc38143e83d@redhat.com> <0af0043d-c13c-e68e-795d-ef62901f57fc@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Cc: Florian Fainelli , Network Development , David Miller , caleb.raitto@gmail.com, "Michael S. Tsirkin" , "Jon Olson (Google Drive)" , Willem de Bruijn To: Willem de Bruijn Return-path: Received: from mx1.redhat.com ([209.132.183.28]:41934 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727709AbeINNWO (ORCPT ); Fri, 14 Sep 2018 09:22:14 -0400 In-Reply-To: Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 2018年09月14日 12:46, Willem de Bruijn wrote: > On Thu, Sep 13, 2018 at 11:53 PM Jason Wang wrote: >> >> >> On 2018年09月14日 11:40, Willem de Bruijn wrote: >>> On Thu, Sep 13, 2018 at 11:27 PM Jason Wang wrote: >>>> >>>> On 2018年09月13日 22:58, Willem de Bruijn wrote: >>>>> On Thu, Sep 13, 2018 at 5:02 AM Jason Wang wrote: >>>>>> On 2018年09月13日 07:27, Willem de Bruijn wrote: >>>>>>> On Wed, Sep 12, 2018 at 3:11 PM Willem de Bruijn >>>>>>> wrote: >>>>>>>> On Wed, Sep 12, 2018 at 2:16 PM Florian Fainelli wrote: >>>>>>>>> On 9/12/2018 11:07 AM, Willem de Bruijn wrote: >>>>>>>>>> On Wed, Sep 12, 2018 at 1:42 PM Florian Fainelli wrote: >>>>>>>>>>> On 9/9/2018 3:44 PM, Willem de Bruijn wrote: >>>>>>>>>>>> From: Willem de Bruijn >>>>>>>>>>>> >>>>>>>>>>>> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers. >>>>>>>>>>>> Interrupt moderation is currently not supported, so these accept and >>>>>>>>>>>> display the default settings of 0 usec and 1 frame. >>>>>>>>>>>> >>>>>>>>>>>> Toggle tx napi through a bit in tx-frames. So as to not interfere >>>>>>>>>>>> with possible future interrupt moderation, use bit 10, well outside >>>>>>>>>>>> the reasonable range of real interrupt moderation values. >>>>>>>>>>>> >>>>>>>>>>>> Changes are not atomic. The tx IRQ, napi BH and transmit path must >>>>>>>>>>>> be quiesced when switching modes. Only allow changing this setting >>>>>>>>>>>> when the device is down. >>>>>>>>>>> Humm, would not a private ethtool flag to switch TX NAPI on/off be more >>>>>>>>>>> appropriate rather than use the coalescing configuration API here? >>>>>>>>>> What do you mean by private ethtool flag? A new field in ethtool >>>>>>>>>> --features (-k)? >>>>>>>>> I meant using ethtool_drvinfo::n_priv_flags, ETH_SS_PRIV_FLAGS and then >>>>>>>>> ETHTOOL_GFPFLAGS and ETHTOOL_SPFLAGS to control the toggling of that >>>>>>>>> private flag. mlx5 has a number of privates flags for instance. >>>>>>>> Interesting, thanks! I was not at all aware of those ethtool flags. >>>>>>>> Am having a look. It definitely looks promising. >>>>>>> Okay, I made that change. That is indeed much cleaner, thanks. >>>>>>> Let me send the patch, initially as RFC. >>>>>>> >>>>>>> I've observed one issue where if we toggle the flag before bringing >>>>>>> up the device, it hits a kernel BUG at include/linux/netdevice.h:515 >>>>>>> >>>>>>> BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)); >>>>>> This reminds me that we need to check netif_running() before trying to >>>>>> enable and disable tx napi in ethtool_set_coalesce(). >>>>> The first iteration of my patch checked IFF_UP and effectively >>>>> only allowed the change when not running. What do you mean >>>>> by need to check? >>>> I mean if device is not up, there's no need to toggle napi state and tx >>>> lock. >>>> >>>>> And to respond to the other follow-up notes at once: >>>>> >>>>>> Consider we may have interrupt moderation in the future, I tend to use >>>>>> set_coalesce. Otherwise we may need two steps to enable moderation: >>>>>> >>>>>> - tx-napi on >>>>>> - set_coalesce >>>>> FWIW, I don't care strongly whether we do this through coalesce or priv_flags. >>>> Ok. >>> Since you prefer coalesce, let's go with that (and a revision of your >>> latest patch). >> Good to know this. >> >>>>>>> + if (!napi_weight) >>>>>>> + virtqueue_enable_cb(vi->sq[i].vq); >>>>>> I don't get why we need to disable enable cb here. >>>>> To avoid entering no-napi mode with too few descriptors to >>>>> make progress and no way to get out of that state. This is a >>>>> pretty crude attempt at handling that, admittedly. >>>> But in this case, we will call enable_cb_delayed() and we will finally >>>> get a interrupt? >>> Right. It's a bit of a roundabout way to ensure that >>> netif_tx_wake_queue and thus eventually free_old_xmit_skbs are called. >>> It might make more sense to just wake the device without going through >>> an interrupt. >> I'm not sure I get this. If we don't enable tx napi, we tend to delay TX >> interrupt if we found the ring is about to full to avoid interrupt >> storm, so we're probably ok in this case. > I'm only concerned about the transition state when converting from > napi to no-napi when the queue is stopped and tx interrupt disabled. > > With napi mode the interrupt is only disabled if napi is scheduled, > in which case it will eventually reenable the interrupt. But when > switching to no-napi mode in this state no progress will be made. > > But it seems this cannot happen. When converting to no-napi > mode, set_coalesce waits for napi to complete in napi_disable. > So the interrupt should always start enabled when transitioning > into no-napi mode. Yes, I see. Thanks