From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932798AbdIYG5o (ORCPT );
	Mon, 25 Sep 2017 02:57:44 -0400
Received: from mail-wm0-f65.google.com ([74.125.82.65]:33402 "EHLO
	mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932517AbdIYG5m (ORCPT );
	Mon, 25 Sep 2017 02:57:42 -0400
X-Google-Smtp-Source: AOwi7QAczXffF8rcgQ0ed4uHPAQcD2MJlkJFITEhTkWSOTIb70khabdUA/G0SiwECubG9dUiBN1iEQ==
Date: Mon, 25 Sep 2017 08:57:40 +0200
From: Jiri Pirko
To: Yunsheng Lin
Cc: "davem@davemloft.net" , huangdaode , "xuwei (O)" ,
	"Liguozhu (Kenneth)" , "Zhuangyuzeng (Yisen)" ,
	Gabriele Paoloni , John Garry , Linuxarm , Salil Mehta ,
	"lipeng (Y)" , "netdev@vger.kernel.org" ,
	"linux-kernel@vger.kernel.org"
Subject: Re: [PATCH net-next 10/10] net: hns3: Add mqprio support when
	interacting with network stack
Message-ID: <20170925065740.GB1899@nanopsycho>
References: <1505992913-107256-1-git-send-email-linyunsheng@huawei.com>
	<1505992913-107256-11-git-send-email-linyunsheng@huawei.com>
	<20170922125541.GA2005@nanopsycho.orion>
	<59c51a37.a1c4df0a.ac4e2.8df0SMTPIN_ADDED_BROKEN@mx.google.com>
	<20170922160322.GB2005@nanopsycho.orion>
	<20170924113724.GA2029@nanopsycho>
	<290b0679-bfc2-c23c-00ee-43768c1c2327@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <290b0679-bfc2-c23c-00ee-43768c1c2327@huawei.com>
User-Agent: Mutt/1.8.3 (2017-05-23)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Mon, Sep 25, 2017 at 02:45:08AM CEST, linyunsheng@huawei.com wrote:
>Hi, Jiri
>
>On 2017/9/24 19:37, Jiri Pirko wrote:
>> Sat, Sep 23, 2017 at 02:47:20AM CEST, linyunsheng@huawei.com wrote:
>>> Hi, Jiri
>>>
>>> On 2017/9/23 0:03, Jiri Pirko wrote:
>>>> Fri, Sep 22, 2017 at 04:11:51PM CEST, linyunsheng@huawei.com wrote:
>>>>> Hi, Jiri
>>>>>
>>>>>>> -	if (!tc) {
>>>>>>> +	if (if_running) {
>>>>>>> +		(void)hns3_nic_net_stop(netdev);
>>>>>>> +		msleep(100);
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	ret = (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
>>>>>>> +		kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
>>>>>
>>>>>> This is most odd. Why do you call dcb_ops from ndo_setup_tc callback?
>>>>>> Why are you mixing this together? prio->tc mapping can be done
>>>>>> directly in dcbnl
>>>>>
>>>>> Here is what we do in dcb_ops->setup_tc:
>>>>> Firstly, if the current tc num is different from the tc num
>>>>> that the user provides, then we set up the queues for each
>>>>> tc.
>>>>>
>>>>> Secondly, we tell the hardware the prio-to-tc mapping that
>>>>> the stack is using. In the rx direction, our hardware needs
>>>>> that mapping to put packets into the queues of the right tc
>>>>> according to the priority of the packet; rss then decides
>>>>> which specific queue in that tc the packet should go to.
>>>>>
>>>>> By mixing, I suppose you meant: why do we need the
>>>>> prio-to-tc information?
>>>>
>>>> By mixing, I mean what I wrote. You are calling the dcb_ops callback from
>>>> the ndo_setup_tc callback. So you are mixing the DCBNL subsystem and the TC
>>>> subsystem. Why? Why do you need sch_mqprio? Why is DCBNL not enough for
>>>> all of it?
>>>
>>> When using lldptool, dcbnl is involved.
>>>
>>> But when using tc qdisc, dcbnl is not involved. Below is the key
>>> call graph in the kernel when the tc qdisc cmd is executed.
>>>
>>> cmd:
>>> tc qdisc add dev eth0 root handle 1:0 mqprio num_tc 4 map 1 2 3 3 1 3 1 1 hw 1
>>>
>>> call graph:
>>> rtnetlink_rcv_msg -> tc_modify_qdisc -> qdisc_create -> mqprio_init ->
>>> hns3_nic_setup_tc
>>>
>>> When hns3_nic_setup_tc is called, we need to know the tc num and the
>>> prio_tc mapping from the tc_mqprio_qopt parameter of the ndo_setup_tc
>>> function, and dcb_ops is our hardware-specific method for setting the
>>> tc-related parameters in the hardware, so this is why we call the
>>> dcb_ops callback in the ndo_setup_tc callback.
>>>
>>> I hope this answers your question; thanks for your time.
>>
>> Okay. I understand that you have a use case for mqprio mapping offload
>> without lldptool being involved. Ok. I believe it is wrong to call dcb_ops
>> from the tc callback. You should have a generic layer inside the driver and
>> call it from both the dcb_ops and tc callbacks.
>
>Actually, dcb_ops is our generic layer inside the driver.
>Below is the high-level architecture:
>
>   [ tc qdisc ]     [ lldpad ]
>        |               |
>        |               |
>        |               |
>   [ hns3_enet ]   [ hns3_dcbnl ]
>         \             /
>          \           /
>           \         /
>         [ hclge_dcb ]
>          /         \
>         /           \
>        /             \
>   [ hclge_main ]  [ hclge_tm ]
>
>hns3_enet.c implements the ndo_setup_tc callback.
>hns3_dcbnl.c implements the dcbnl_rtnl_ops for the stack's DCBNL subsystem.
>hclge_dcb implements the dcb_ops.
>So we already have a generic layer that both tc and dcbnl call into.
>
>>
>> Also, what happens if I run lldptool concurrently with mqprio? Who wins
>> and gets to configure the mapping?
>
>Both the lldptool and tc qdisc cmds use the rtnl interface provided by the
>stack, so they are both protected by rtnl_lock, and we do not have to do
>the locking in the driver.

I was not asking about locking, which is obvious. I was asking about the
behaviour. For example: if I use tc to configure some mapping, and later on
I run lldptool and change the mapping, does the tc dump show the updated
values or the original ones?

>
>The locking is in rtnetlink_rcv_msg:
>
>	rtnl_lock();
>	handlers = rtnl_dereference(rtnl_msg_handlers[family]);
>	if (handlers) {
>		doit = READ_ONCE(handlers[type].doit);
>		if (doit)
>			err = doit(skb, nlh, extack);
>	}
>	rtnl_unlock();
>
>Thanks.
>
>>
>>
>>>
>>>>
>>>>
>>>>
>>>>> I hope I did not misunderstand your question; thanks
>>>>> for your time reviewing.
>>>>
>>>
>>