Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] net: ethernet: ti: davinci_cpdma: fix access to uninitialized variable in cpdma_chan_set_descs()
From: Ivan Khoronzhuk @ 2016-12-29  1:49 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap
In-Reply-To: <20161228234213.22166-1-grygorii.strashko@ti.com>

On Wed, Dec 28, 2016 at 05:42:13PM -0600, Grygorii Strashko wrote:
Grygorii,

> Now below code sequence causes "Unable to handle kernel NULL pointer
> dereference.." exception and system crash during CPSW CPDMA initialization:
> 
> cpsw_probe
> |-cpdma_chan_create (TX channel)
>   |-cpdma_chan_split_pool
>     |-cpdma_chan_set_descs(for TX channels)
>     |-cpdma_chan_set_descs(for RX channels) [1]
> 
> - and -
> static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
> 				 int rx, int desc_num,
> 				 int per_ch_desc)
> {
> 	struct cpdma_chan *chan, *most_chan = NULL;
> 
> ...
> 
> 	for (i = min; i < max; i++) {
> 		chan = ctlr->channels[i];
> 		if (!chan)
> 			continue;
> ...
> 
> 		if (most_dnum < chan->desc_num) {
> 			most_dnum = chan->desc_num;
> 			most_chan = chan;
> 		}
> 	}
> 	/* use remains */
> 	most_chan->desc_num += desc_cnt; [2]
> }
> 
> So, most_chan value will never be reassigned when cpdma_chan_set_descs() is
> called second time [1], because there are no RX channels yet and system
> will crash at [2].

How did you get this?
I just remember as I fixed it before sending patchset.

Maybe it was some experiment with it.
I just wonder and want to find actual reason what's happening.

Look bellow:

cpsw_probe
|-cpdma_chan_create (TX channel)
  |-cpdma_chan_split_pool
    |-cpdma_chan_set_descs(for TX channels)
    |-cpdma_chan_set_descs(for RX channels) [1]

|-cpdma_chan_set_descs(for RX channels) in case you'be described has to be
called with rx_desc_num = 0, because all descs are assigned already for tx
channel. And, if desc_num = 0, cpdma_chan_set_descs just exits and no issues.
So, could you please explain how you get this, in what circumstances.

> 
> Hence, fix the issue by checking most_chan for NULL before accessing it.
> 
> Fixes: 0fc6432cc78d ("net: ethernet: ti: davinci_cpdma: add weight function for channels")
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  drivers/net/ethernet/ti/davinci_cpdma.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> index 36518fc..b349d572 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -708,7 +708,8 @@ static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
>  		}
>  	}
>  	/* use remains */
> -	most_chan->desc_num += desc_cnt;
> +	if (most_chan)
> +		most_chan->desc_num += desc_cnt;
>  }
>  
>  /**
> -- 
> 2.10.1.dirty
> 

^ permalink raw reply

* Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Florian Fainelli @ 2016-12-29  0:40 UTC (permalink / raw)
  To: Kweh, Hock Leong, David Miller
  Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
	seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
	manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
	pavel@ucw.cz, Ong, Boon Leong, Voon, Weifeng,
	lars.persson@axis.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <F54AEECA5E2B9541821D670476DAE19C5A916EB3@PGSMSX102.gar.corp.intel.com>

On 12/28/2016 04:28 PM, Kweh, Hock Leong wrote:
>> Although this is required, we can't be doing it in all circumstances, we need to
>> mimic what stmmac_drv_remove() does.
>>
>> Let me submit an incremental fix which takes care of mdio bus unregistration.
>> --
>> Florian
> 
> Noted & Thanks. Will test it out once you submitted.

It's done:

https://www.spinics.net/lists/netdev/msg411934.html
-- 
Florian

^ permalink raw reply

* RE: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong @ 2016-12-29  0:28 UTC (permalink / raw)
  To: Florian Fainelli, David Miller
  Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
	seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
	manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
	pavel@ucw.cz, Ong, Boon Leong, Voon, Weifeng,
	lars.persson@axis.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <e88996a8-e010-c07f-7dee-d220a6dfc706@gmail.com>

> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Thursday, December 29, 2016 2:43 AM
> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>; David Miller
> <davem@davemloft.net>
> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> seraphin.bonnaffe@st.com; alexandre.torgue@gmail.com;
> manabian@gmail.com; niklas.cassel@axis.com; johan@kernel.org;
> pavel@ucw.cz; Ong, Boon Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> <weifeng.voon@intel.com>; lars.persson@axis.com; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and
> stmmac_dvr_probe
> 
> On 12/27/2016 09:49 PM, Kweh, Hock Leong wrote:
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Wednesday, December 28, 2016 12:34 AM
> >> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> >> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> >> seraphin.bonnaffe@st.com; f.fainelli@gmail.com;
> >> alexandre.torgue@gmail.com; manabian@gmail.com;
> >> niklas.cassel@axis.com; johan@kernel.org; pavel@ucw.cz; Ong, Boon
> >> Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> >> <weifeng.voon@intel.com>; lars.persson@axis.com;
> >> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize
> >> stmmac_open and stmmac_dvr_probe
> >>
> >> From: "Kweh, Hock Leong"	<hock.leong.kweh@intel.com>
> >> Date: Tue, 27 Dec 2016 22:42:36 +0800
> >>
> >>> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >>
> >> You are not the author of this change, do not take credit for it.
> >>
> >> You have copied Florian's patch character by character, therefore he
> >> is the author.
> >>
> >> You also didn't CC: the netdev mailing list properly.
> >
> > Hi David & Florian,
> >
> > Just to clarify that I do not copy exactly from Florian.
> > I have changed it to have proper handling on mdio unregister while
> > netdev_register() failed as showed below:
> >
> > 	return 0;
> >
> > -error_mdio_register:
> > -	unregister_netdev(ndev);
> >  error_netdev_register:
> > +	stmmac_mdio_unregister(ndev);
> 
> Although this is required, we can't be doing it in all circumstances, we need to
> mimic what stmmac_drv_remove() does.
> 
> Let me submit an incremental fix which takes care of mdio bus unregistration.
> --
> Florian

Noted & Thanks. Will test it out once you submitted.

Thanks & Regards,
Wilson


^ permalink raw reply

* RE: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong @ 2016-12-29  0:26 UTC (permalink / raw)
  To: Kishan Sandeep
  Cc: David Miller, f.fainelli@gmail.com, Joao.Pinto@synopsys.com,
	peppe.cavallaro@st.com, seraphin.bonnaffe@st.com,
	alexandre.torgue@gmail.com, manabian@gmail.com,
	niklas.cassel@axis.com, johan@kernel.org, pavel@ucw.cz,
	lars.persson@axis.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAJ3s=NAzfUSCnxxZph5j6Vk2T=pw2F9U3FLP7dOKRp21mmQvcg@mail.gmail.com>

> -----Original Message-----
> From: Kishan Sandeep [mailto:sandeepkishan108@gmail.com]
> Sent: Wednesday, December 28, 2016 7:56 PM
> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> Cc: David Miller <davem@davemloft.net>; f.fainelli@gmail.com;
> Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> seraphin.bonnaffe@st.com; alexandre.torgue@gmail.com;
> manabian@gmail.com; niklas.cassel@axis.com; johan@kernel.org;
> pavel@ucw.cz; lars.persson@axis.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and
> stmmac_dvr_probe
> 
> On Wed, Dec 28, 2016 at 7:10 AM, Kweh, Hock Leong
> <hock.leong.kweh@intel.com> wrote:
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Wednesday, December 28, 2016 12:34 AM
> >> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> >> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> >> seraphin.bonnaffe@st.com; f.fainelli@gmail.com;
> >> alexandre.torgue@gmail.com; manabian@gmail.com;
> >> niklas.cassel@axis.com; johan@kernel.org; pavel@ucw.cz; Ong, Boon
> >> Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> >> <weifeng.voon@intel.com>; lars.persson@axis.com;
> >> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize
> >> stmmac_open and stmmac_dvr_probe
> >>
> >> From: "Kweh, Hock Leong"      <hock.leong.kweh@intel.com>
> >> Date: Tue, 27 Dec 2016 22:42:36 +0800
> >>
> >> > From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >>
> >> You are not the author of this change, do not take credit for it.
> >>
> >> You have copied Florian's patch character by character, therefore he
> >> is the author.
> >>
> >> You also didn't CC: the netdev mailing list properly.
> >
> > Noted & Thanks.
> >
> > Hi Florian, could you submit this fix from your side so that you are the author.
> > I will help to test out.
> >
> > Thanks & Regards,
> > Wilson
> >
> I think you can give *--author* for giving author name. Try git commit -am
> "commit message" --author="Author_name <author_email>"

Oh ... I am not aware of that. Thanks for informing. 
:-)

Regards,
Wilson

^ permalink raw reply

* [PATCH net] net: stmmac: Fix error path after register_netdev move
From: Florian Fainelli @ 2016-12-28 23:44 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, pavel, Joao.Pinto, seraphin.bonnaffe,
	alexandre.torgue, manabian, niklas.cassel, johan, boon.leong.ong,
	weifeng.voon, lars.persson, linux-kernel, Giuseppe Cavallaro,
	Alexandre Torgue

Commit 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and
stmmac_open") re-ordered how the MDIO bus registration and the network
device are registered, but missed to unwind the MDIO bus registration in
case we fail to register the network device.

Fixes: 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 5910ea51f8f6..39eb7a65bb9f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3366,12 +3366,19 @@ int stmmac_dvr_probe(struct device *device,
 	}
 
 	ret = register_netdev(ndev);
-	if (ret)
+	if (ret) {
 		netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
 			   __func__, ret);
+		goto error_netdev_register;
+	}
 
 	return ret;
 
+error_netdev_register:
+	if (priv->hw->pcs != STMMAC_PCS_RGMII &&
+	    priv->hw->pcs != STMMAC_PCS_TBI &&
+	    priv->hw->pcs != STMMAC_PCS_RTBI)
+		stmmac_mdio_unregister(ndev);
 error_mdio_register:
 	netif_napi_del(&priv->napi);
 error_hw_init:
-- 
2.9.3

^ permalink raw reply related

* [PATCH] net: ethernet: ti: davinci_cpdma: fix access to uninitialized variable in cpdma_chan_set_descs()
From: Grygorii Strashko @ 2016-12-28 23:42 UTC (permalink / raw)
  To: David S. Miller, netdev, Mugunthan V N
  Cc: Sekhar Nori, linux-kernel, linux-omap, Ivan Khoronzhuk,
	Grygorii Strashko

Now below code sequence causes "Unable to handle kernel NULL pointer
dereference.." exception and system crash during CPSW CPDMA initialization:

cpsw_probe
|-cpdma_chan_create (TX channel)
  |-cpdma_chan_split_pool
    |-cpdma_chan_set_descs(for TX channels)
    |-cpdma_chan_set_descs(for RX channels) [1]

- and -
static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
				 int rx, int desc_num,
				 int per_ch_desc)
{
	struct cpdma_chan *chan, *most_chan = NULL;

...

	for (i = min; i < max; i++) {
		chan = ctlr->channels[i];
		if (!chan)
			continue;
...

		if (most_dnum < chan->desc_num) {
			most_dnum = chan->desc_num;
			most_chan = chan;
		}
	}
	/* use remains */
	most_chan->desc_num += desc_cnt; [2]
}

So, most_chan value will never be reassigned when cpdma_chan_set_descs() is
called second time [1], because there are no RX channels yet and system
will crash at [2].

Hence, fix the issue by checking most_chan for NULL before accessing it.

Fixes: 0fc6432cc78d ("net: ethernet: ti: davinci_cpdma: add weight function for channels")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 36518fc..b349d572 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -708,7 +708,8 @@ static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
 		}
 	}
 	/* use remains */
-	most_chan->desc_num += desc_cnt;
+	if (most_chan)
+		most_chan->desc_num += desc_cnt;
 }
 
 /**
-- 
2.10.1.dirty

^ permalink raw reply related

* (unknown), 
From: doctornina @ 2016-12-28 22:43 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 24488470886248_netdev.zip --]
[-- Type: application/zip, Size: 43724 bytes --]

^ permalink raw reply

* Re: [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
From: tndave @ 2016-12-28 21:55 UTC (permalink / raw)
  To: maowenan, jeffrey.t.kirsher@intel.com,
	intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, weiyongjun (A), Dingtianhong
In-Reply-To: <F95AC9340317A84688A5F0DF0246F3F201521164@szxeml504-mbs.china.huawei.com>



On 12/27/2016 04:40 PM, maowenan wrote:
>
>
>> -----Original Message-----
>> From: tndave [mailto:tushar.n.dave@oracle.com]
>> Sent: Wednesday, December 28, 2016 6:28 AM
>> To: maowenan; jeffrey.t.kirsher@intel.com; intel-wired-lan@lists.osuosl.org
>> Cc: netdev@vger.kernel.org; weiyongjun (A); Dingtianhong
>> Subject: Re: [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
>>
>>
>>
>> On 12/26/2016 03:39 AM, maowenan wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: netdev-owner@vger.kernel.org
>>>> [mailto:netdev-owner@vger.kernel.org]
>>>> On Behalf Of Tushar Dave
>>>> Sent: Tuesday, December 06, 2016 1:07 AM
>>>> To: jeffrey.t.kirsher@intel.com; intel-wired-lan@lists.osuosl.org
>>>> Cc: netdev@vger.kernel.org
>>>> Subject: [RFC PATCH] i40e: enable PCIe relax ordering for SPARC
>>>>
>>>> Unlike previous generation NIC (e.g. ixgbe) i40e doesn't seem to have
>>>> standard CSR where PCIe relaxed ordering can be set. Without PCIe
>>>> relax ordering enabled, i40e performance is significantly low on SPARC.
>>>>
>>> [Mao Wenan]Hi Tushar, you have referred to i40e doesn't seem to have
>>> standard CSR to set PCIe relaxed ordering, this CSR like TX&Rx DCA Control
>> Register in 82599, right?
>> Yes.
>> i40e datasheet mentions some CSR that can be used to enable/disable PCIe
>> relaxed ordering in device; however I don't see the exact definition of those
>> register in datasheet.
>> (https://www.mail-archive.com/netdev@vger.kernel.org/msg117219.html).
>>
>>> Is DMA_ATTR_WEAK_ORDERING the same as TX&RX control register in
>> 82599?
>> No.
>> DMA_ATTR_WEAK_ORDERING applies to the PCIe root complex of the system.
>>
>> -Tushar
>
> I understand that the PCIe Root Complex is the Host Bridge in the CPU that
> connects the CPU and memory to the PCIe architecture. So this attribute
> DMA_ATTR_WEAK_ORDERING is only applied on CPU side(the SPARC in you
> system), it can't apply on i40e, is it right?
Yes.
> And it is not the same as 82599 DCA control register's relax ordering bits.
It is not same as 82599 DCA control register's relax ordering bits.

-Tushar

> -Mao Wenan
>
>>>
>>> And to enable relax ordering mode in 82599 for SPARC using below codes:
>>> s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw) {
>>> 	u32 i;
>>>
>>> 	/* Clear the rate limiters */
>>> 	for (i = 0; i < hw->mac.max_tx_queues; i++) {
>>> 		IXGBE_WRITE_REG(hw, IXGBE_RTTDQSEL, i);
>>> 		IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRC, 0);
>>> 	}
>>> 	IXGBE_WRITE_FLUSH(hw);
>>>
>>> #ifndef CONFIG_SPARC
>>> 	/* Disable relaxed ordering */
>>> 	for (i = 0; i < hw->mac.max_tx_queues; i++) {
>>> 		u32 regval;
>>>
>>> 		regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
>>> 		regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
>>> 		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
>>> 	}
>>>
>>> 	for (i = 0; i < hw->mac.max_rx_queues; i++) {
>>> 		u32 regval;
>>>
>>> 		regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
>>> 		regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
>>> 			    IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
>>> 		IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
>>> 	}
>>> #endif
>>> 	return 0;
>>> }
>>>
>>>
>>>
>>>> This patch sets PCIe relax ordering for SPARC arch by setting dma
>>>> attr DMA_ATTR_WEAK_ORDERING for every tx and rx DMA map/unmap.
>>>> This has shown 10x increase in performance numbers.
>>>>
>>>> e.g.
>>>> iperf TCP test with 10 threads on SPARC S7
>>>>
>>>> Test 1: Without this patch
>>>>
>>>> [root@brm-snt1-03 net]# iperf -s
>>>> ------------------------------------------------------------
>>>> Server listening on TCP port 5001
>>>> TCP window size: 85.3 KByte (default)
>>>> ------------------------------------------------------------
>>>> [  4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 40926 [
>>>> 5] local
>>>> 16.0.0.7 port 5001 connected with 16.0.0.1 port 40934 [  6] local
>>>> 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 40930 [  7] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 40928 [  8] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 40922 [  9] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 40932 [ 10] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 40920 [ 11] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 40924 [ 14] local
>>>> 16.0.0.7 port 5001 connected with 16.0.0.1 port 40982 [ 12] local
>>>> 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 40980
>>>> [ ID] Interval       Transfer     Bandwidth
>>>> [  4]  0.0-20.0 sec   566 MBytes   237 Mbits/sec
>>>> [  5]  0.0-20.0 sec   532 MBytes   223 Mbits/sec
>>>> [  6]  0.0-20.0 sec   537 MBytes   225 Mbits/sec
>>>> [  8]  0.0-20.0 sec   546 MBytes   229 Mbits/sec
>>>> [ 11]  0.0-20.0 sec   592 MBytes   248 Mbits/sec
>>>> [  7]  0.0-20.0 sec   539 MBytes   226 Mbits/sec
>>>> [  9]  0.0-20.0 sec   572 MBytes   240 Mbits/sec
>>>> [ 10]  0.0-20.0 sec   604 MBytes   253 Mbits/sec
>>>> [ 14]  0.0-20.0 sec   567 MBytes   238 Mbits/sec
>>>> [ 12]  0.0-20.0 sec   511 MBytes   214 Mbits/sec
>>>> [SUM]  0.0-20.0 sec  5.44 GBytes  2.33 Gbits/sec
>>>>
>>>> Test 2: with this patch:
>>>>
>>>> [root@brm-snt1-03 net]# iperf -s
>>>> ------------------------------------------------------------
>>>> Server listening on TCP port 5001
>>>> TCP window size: 85.3 KByte (default)
>>>> ------------------------------------------------------------
>>>> TCP: request_sock_TCP: Possible SYN flooding on port 5001. Sending
>> cookies.
>>>> Check SNMP counters.
>>>> [  4] local 16.0.0.7 port 5001 connected with 16.0.0.1 port 46876 [
>>>> 5] local
>>>> 16.0.0.7 port 5001 connected with 16.0.0.1 port 46874 [  6] local
>>>> 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 46872 [  7] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 46880 [  8] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 46878 [  9] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 46884 [ 10] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 46886 [ 11] local 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 46890 [ 12] local
>>>> 16.0.0.7 port 5001 connected with 16.0.0.1 port 46888 [ 13] local
>>>> 16.0.0.7 port
>>>> 5001 connected with 16.0.0.1 port 46882
>>>> [ ID] Interval       Transfer     Bandwidth
>>>> [  4]  0.0-20.0 sec  7.45 GBytes  3.19 Gbits/sec [  5]  0.0-20.0 sec
>>>> 7.48 GBytes  3.21 Gbits/sec [  7]  0.0-20.0 sec  7.34 GBytes  3.15
>>>> Gbits/sec [  8]  0.0-20.0 sec  7.42 GBytes  3.18 Gbits/sec [  9]
>>>> 0.0-20.0 sec  7.24 GBytes  3.11 Gbits/sec [ 10]  0.0-20.0 sec  7.40
>>>> GBytes  3.17 Gbits/sec [ 12]  0.0-20.0 sec  7.49 GBytes  3.21
>>>> Gbits/sec [  6]  0.0-20.0 sec  7.30 GBytes  3.13 Gbits/sec [ 11]
>>>> 0.0-20.0 sec  7.44 GBytes  3.19 Gbits/sec [ 13]  0.0-20.0 sec  7.22
>>>> GBytes  3.10 Gbits/sec [SUM]  0.0-20.0 sec  73.8 GBytes  31.6
>>>> Gbits/sec
>>>>
>>>> NOTE: In my testing, this patch does _not_ show any harm to i40e
>>>> performance numbers on x86.
>>>>
>>>> Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
>>>> ---
>>>>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 69
>>>> ++++++++++++++++++++---------
>>>> ++++++++++++++++++++drivers/net/ethernet/intel/i40e/i40e_txrx.h |
>>>> 1 +
>>>>  2 files changed, 49 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>>> b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>>> index 6287bf6..800dca7 100644
>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>>>> @@ -551,15 +551,17 @@ static void
>>>> i40e_unmap_and_free_tx_resource(struct i40e_ring *ring,
>>>>  		else
>>>>  			dev_kfree_skb_any(tx_buffer->skb);
>>>>  		if (dma_unmap_len(tx_buffer, len))
>>>> -			dma_unmap_single(ring->dev,
>>>> -					 dma_unmap_addr(tx_buffer, dma),
>>>> -					 dma_unmap_len(tx_buffer, len),
>>>> -					 DMA_TO_DEVICE);
>>>> +			dma_unmap_single_attrs(ring->dev,
>>>> +					       dma_unmap_addr(tx_buffer, dma),
>>>> +					       dma_unmap_len(tx_buffer, len),
>>>> +					       DMA_TO_DEVICE,
>>>> +					       ring->dma_attrs);
>>>>  	} else if (dma_unmap_len(tx_buffer, len)) {
>>>> -		dma_unmap_page(ring->dev,
>>>> -			       dma_unmap_addr(tx_buffer, dma),
>>>> -			       dma_unmap_len(tx_buffer, len),
>>>> -			       DMA_TO_DEVICE);
>>>> +		dma_unmap_single_attrs(ring->dev,
>>>> +				       dma_unmap_addr(tx_buffer, dma),
>>>> +				       dma_unmap_len(tx_buffer, len),
>>>> +				       DMA_TO_DEVICE,
>>>> +				       ring->dma_attrs);
>>>>  	}
>>>>
>>>>  	tx_buffer->next_to_watch = NULL;
>>>> @@ -662,6 +664,8 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>>>  	struct i40e_tx_buffer *tx_buf;
>>>>  	struct i40e_tx_desc *tx_head;
>>>>  	struct i40e_tx_desc *tx_desc;
>>>> +	dma_addr_t addr;
>>>> +	size_t size;
>>>>  	unsigned int total_bytes = 0, total_packets = 0;
>>>>  	unsigned int budget = vsi->work_limit;
>>>>
>>>> @@ -696,10 +700,11 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>>>  		napi_consume_skb(tx_buf->skb, napi_budget);
>>>>
>>>>  		/* unmap skb header data */
>>>> -		dma_unmap_single(tx_ring->dev,
>>>> -				 dma_unmap_addr(tx_buf, dma),
>>>> -				 dma_unmap_len(tx_buf, len),
>>>> -				 DMA_TO_DEVICE);
>>>> +		dma_unmap_single_attrs(tx_ring->dev,
>>>> +				       dma_unmap_addr(tx_buf, dma),
>>>> +				       dma_unmap_len(tx_buf, len),
>>>> +				       DMA_TO_DEVICE,
>>>> +				       tx_ring->dma_attrs);
>>>>
>>>>  		/* clear tx_buffer data */
>>>>  		tx_buf->skb = NULL;
>>>> @@ -717,12 +722,15 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>>>>  				tx_desc = I40E_TX_DESC(tx_ring, 0);
>>>>  			}
>>>>
>>>> +			addr = dma_unmap_addr(tx_buf, dma);
>>>> +			size = dma_unmap_len(tx_buf, len);
>>>>  			/* unmap any remaining paged data */
>>>>  			if (dma_unmap_len(tx_buf, len)) {
>>>> -				dma_unmap_page(tx_ring->dev,
>>>> -					       dma_unmap_addr(tx_buf, dma),
>>>> -					       dma_unmap_len(tx_buf, len),
>>>> -					       DMA_TO_DEVICE);
>>>> +				dma_unmap_single_attrs(tx_ring->dev,
>>>> +						       addr,
>>>> +						       size,
>>>> +						       DMA_TO_DEVICE,
>>>> +						       tx_ring->dma_attrs);
>>>>  				dma_unmap_len_set(tx_buf, len, 0);
>>>>  			}
>>>>  		}
>>>> @@ -1010,6 +1018,11 @@ int i40e_setup_tx_descriptors(struct i40e_ring
>>>> *tx_ring)
>>>>  	 */
>>>>  	tx_ring->size += sizeof(u32);
>>>>  	tx_ring->size = ALIGN(tx_ring->size, 4096);
>>>> +#ifdef CONFIG_SPARC
>>>> +	tx_ring->dma_attrs = DMA_ATTR_WEAK_ORDERING; #else
>>>> +	tx_ring->dma_attrs = 0;
>>>> +#endif
>>>>  	tx_ring->desc = dma_alloc_coherent(dev, tx_ring->size,
>>>>  					   &tx_ring->dma, GFP_KERNEL);
>>>>  	if (!tx_ring->desc) {
>>>> @@ -1053,7 +1066,11 @@ void i40e_clean_rx_ring(struct i40e_ring
>> *rx_ring)
>>>>  		if (!rx_bi->page)
>>>>  			continue;
>>>>
>>>> -		dma_unmap_page(dev, rx_bi->dma, PAGE_SIZE,
>>>> DMA_FROM_DEVICE);
>>>> +		dma_unmap_single_attrs(dev,
>>>> +				       rx_bi->dma,
>>>> +				       PAGE_SIZE,
>>>> +				       DMA_FROM_DEVICE,
>>>> +				       rx_ring->dma_attrs);
>>>>  		__free_pages(rx_bi->page, 0);
>>>>
>>>>  		rx_bi->page = NULL;
>>>> @@ -1113,6 +1130,11 @@ int i40e_setup_rx_descriptors(struct i40e_ring
>>>> *rx_ring)
>>>>  	/* Round up to nearest 4K */
>>>>  	rx_ring->size = rx_ring->count * sizeof(union i40e_32byte_rx_desc);
>>>>  	rx_ring->size = ALIGN(rx_ring->size, 4096);
>>>> +#ifdef CONFIG_SPARC
>>>> +	rx_ring->dma_attrs = DMA_ATTR_WEAK_ORDERING; #else
>>>> +	rx_ring->dma_attrs = 0;
>>>> +#endif
>>>>  	rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
>>>>  					   &rx_ring->dma, GFP_KERNEL);
>>>>
>>>> @@ -1182,7 +1204,8 @@ static bool i40e_alloc_mapped_page(struct
>>>> i40e_ring *rx_ring,
>>>>  	}
>>>>
>>>>  	/* map page for use */
>>>> -	dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE,
>>>> DMA_FROM_DEVICE);
>>>> +	dma = dma_map_single_attrs(rx_ring->dev, page_address(page),
>>>> PAGE_SIZE,
>>>> +				   DMA_FROM_DEVICE, rx_ring->dma_attrs);
>>>>
>>>>  	/* if mapping failed free memory back to system since
>>>>  	 * there isn't much point in holding memory we can't use @@
>> -1695,8
>>>> +1718,11 @@ struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring
>>>> +*rx_ring,
>>>>  		rx_ring->rx_stats.page_reuse_count++;
>>>>  	} else {
>>>>  		/* we are not reusing the buffer so unmap it */
>>>> -		dma_unmap_page(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
>>>> -			       DMA_FROM_DEVICE);
>>>> +		dma_unmap_single_attrs(rx_ring->dev,
>>>> +				       rx_buffer->dma,
>>>> +				       PAGE_SIZE,
>>>> +				       DMA_FROM_DEVICE,
>>>> +				       rx_ring->dma_attrs);
>>>>  	}
>>>>
>>>>  	/* clear contents of buffer_info */ @@ -2737,7 +2763,8 @@ static
>>>> inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff
>>>> *skb,
>>>>  	first->skb = skb;
>>>>  	first->tx_flags = tx_flags;
>>>>
>>>> -	dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_TO_DEVICE);
>>>> +	dma = dma_map_single_attrs(tx_ring->dev, skb->data, size,
>>>> +				   DMA_TO_DEVICE, tx_ring->dma_attrs);
>>>>
>>>>  	tx_desc = I40E_TX_DESC(tx_ring, i);
>>>>  	tx_bi = first;
>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>>>> b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>>>> index 5088405..9a86212 100644
>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>>>> @@ -327,6 +327,7 @@ struct i40e_ring {
>>>>
>>>>  	unsigned int size;		/* length of descriptor ring in bytes */
>>>>  	dma_addr_t dma;			/* physical address of ring */
>>>> +	unsigned long dma_attrs;	/* DMA attributes */
>>>>
>>>>  	struct i40e_vsi *vsi;		/* Backreference to associated VSI */
>>>>  	struct i40e_q_vector *q_vector;	/* Backreference to associated
>> vector
>>>> */
>>>> --
>>>> 1.9.1
>>>
>>>
>

^ permalink raw reply

* [PATCH] Bluetooth: fix spelling mistake: "advetising" -> "advertising"
From: Colin King @ 2016-12-28 21:17 UTC (permalink / raw)
  To: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S . Miller,
	linux-bluetooth, netdev
  Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

trivial fix to spelling mistake in BT_ERR_RATELIMITED error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 net/bluetooth/hci_event.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index e17aacb..0b4dba0 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -4749,7 +4749,7 @@ static void process_adv_report(struct hci_dev *hdev, u8 type, bdaddr_t *bdaddr,
 	case LE_ADV_SCAN_RSP:
 		break;
 	default:
-		BT_ERR_RATELIMITED("Unknown advetising packet type: 0x%02x",
+		BT_ERR_RATELIMITED("Unknown advertising packet type: 0x%02x",
 				   type);
 		return;
 	}
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCHv2 net-next 00/16] net: mvpp2: add basic support for PPv2.2
From: Thomas Petazzoni @ 2016-12-28 21:08 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg, pawel.moll-5wv7dgnIgG8,
	mark.rutland-5wv7dgnIgG8, galak-sgV2jX0FEOL9JmXXK+q4OQ,
	jason-NLaQJdtUoK4Be96aLqz0jA, andrew-g2DYL2Zd6BY,
	sebastian.hesselbarth-Re5JQEeQqe8AvxtiuMwx3w,
	gregory.clement-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	nadavh-eYqpPyKDWXRBDgjK7y7TUQ, hannah-eYqpPyKDWXRBDgjK7y7TUQ,
	yehuday-eYqpPyKDWXRBDgjK7y7TUQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	stefanc-eYqpPyKDWXRBDgjK7y7TUQ, mw-nYOzD4b6Jr9Wk0Htik3J/w
In-Reply-To: <20161228.120644.1166014191192724301.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

Hello,

On Wed, 28 Dec 2016 12:06:44 -0500 (EST), David Miller wrote:

> > This series depends on the series named "net: mvpp2: misc improvements
> > and preparation patches".  
> 
> Please in the future only submit one patch series at a time.
> 
> If I've told you that a large patch series is hard to review and that
> therefore one should keep each submitted series small and to a
> reasonable size, that is completely undermined when you submit
> multiple series to work around that request.

Sure. I'll wait for the first patch series to be merged (potentially
after several iterations) before resending the second patch series.

Thanks for the feedback!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net: wan: slic_ds26522: fix spelling mistake: "configurated" -> "configured"
From: David Miller @ 2016-12-28 20:12 UTC (permalink / raw)
  To: colin.king; +Cc: javier, qiang.zhao, netdev, linux-kernel
In-Reply-To: <20161228164423.19070-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Wed, 28 Dec 2016 16:44:23 +0000

> From: Colin Ian King <colin.king@canonical.com>
> 
> trivial fix to spelling mistake in pr_info message
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: atm: Fix warnings in net/atm/lec.c when !CONFIG_PROC_FS
From: David Miller @ 2016-12-28 20:11 UTC (permalink / raw)
  To: augustocaringi
  Cc: netdev, felipe.balbi, keescook, mugunthanvnm, jarod, javier, fw,
	linux-kernel
In-Reply-To: <1482941006-28052-1-git-send-email-augustocaringi@gmail.com>

From: Augusto Mecking Caringi <augustocaringi@gmail.com>
Date: Wed, 28 Dec 2016 16:02:05 +0000

> This patch fixes the following warnings when CONFIG_PROC_FS is not set:
> 
> linux/net/atm/lec.c: In function ‘lane_module_cleanup’:
> linux/net/atm/lec.c:1062:27: error: ‘atm_proc_root’ undeclared (first
> use in this function)
> remove_proc_entry("lec", atm_proc_root);
>                            ^
> linux/net/atm/lec.c:1062:27: note: each undeclared identifier is
> reported only once for each function it appears in
> 
> Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH] net: avoid put_cmsg() possible copy longer data than input
From: David Miller @ 2016-12-28 19:48 UTC (permalink / raw)
  To: cugyly; +Cc: netdev, Linyu.Yuan
In-Reply-To: <1482935663-3428-1-git-send-email-cugyly@163.com>

From: yuan linyu <cugyly@163.com>
Date: Wed, 28 Dec 2016 22:34:23 +0800

> From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
> 
> if CMSG_ALIGN(sizeof(struct cmsghdr)) > sizeof(struct cmsghdr),
> original (cmlen - sizeof(struct cmsghdr)) may greater than
> input len.

You are doing a lot of unrelated cleanups in this change.  This
makes it hard to review.

The important parts of the fix seems to be the added checks to make
sure that we don't access the CMSG_DATA() unless we have more than
CMSG_ALIGN(sizeof(struct cmsghdr)) bytes.

I think you can fix that with a few one-line tests rather than
restructuring all of the CMSG_*() macros.

Also:

> @@ -223,7 +223,7 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
>  	if (MSG_CMSG_COMPAT & msg->msg_flags)
>  		return put_cmsg_compat(msg, level, type, len, data);
>  
> -	if (cm==NULL || msg->msg_controllen < sizeof(*cm)) {
> +	if (cm == NULL || msg->msg_controllen < sizeof(*cm)) {
>  		msg->msg_flags |= MSG_CTRUNC;
>  		return 0; /* XXX: return error? check spec. */
>  	}

This is a coding style fix unrelated to the purpose of this change.

Thanks.

^ permalink raw reply

* Re: [PATCH net 00/12] Mellanox 100G mlx5 fixes 28-12-2016
From: David Miller @ 2016-12-28 19:38 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <1482929922-32626-1-git-send-email-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Wed, 28 Dec 2016 14:58:30 +0200

> Some fixes for mlx5 core and ethernet driver.
> 
> for -stable:
>     net/mlx5: Check FW limitations on log_max_qp before setting it
>     net/mlx5: Cancel recovery work in remove flow
>     net/mlx5: Avoid shadowing numa_node
>     net/mlx5: Mask destination mac value in ethtool steering rules
>     net/mlx5: Prevent setting multicast macs for VFs
>     net/mlx5e: Don't sync netdev state when not registered
>     net/mlx5e: Disable netdev after close

Series applied, and the patches from the list above have been queued up
for -stable.

Thanks!

^ permalink raw reply

* Re: [PATCH net] net/sched: cls_flower: Fix missing addr_type in classify
From: David Miller @ 2016-12-28 19:29 UTC (permalink / raw)
  To: paulb; +Cc: netdev, jiri, hadarh, ogerlitz, roid
In-Reply-To: <1482929687-20159-1-git-send-email-paulb@mellanox.com>

From: Paul Blakey <paulb@mellanox.com>
Date: Wed, 28 Dec 2016 14:54:47 +0200

> Since we now use a non zero mask on addr_type, we are matching on its
> value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
> failed in SW/classify path since its value was zero.
> This patch sets the proper value of addr_type for encapsulated packets.
> 
> Fixes: 970bfcd09791 ('net/sched: cls_flower: Use mask for addr_type')
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next] sctp: add pr_debug for tracking asocs not found
From: David Miller @ 2016-12-28 19:26 UTC (permalink / raw)
  To: marcelo.leitner; +Cc: netdev, linux-sctp, vyasevich, nhorman, lucien.xin
In-Reply-To: <2e7482596bbf75efb56e313e33179f9e1c0e6996.1482924698.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 28 Dec 2016 09:51:56 -0200

> This pr_debug may help identify why the system is generating some
> Aborts. It's not something a sysadmin would be expected to use.
> 
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next RESEND 1/1] driver: ipvlan: Remove unnecessary ipvlan NULL check in ipvlan_count_rx
From: David Miller @ 2016-12-28 19:24 UTC (permalink / raw)
  To: fgao; +Cc: maheshb, edumazet, netdev, gfree.wind
In-Reply-To: <1482914862-2793-1-git-send-email-fgao@ikuai8.com>

From: fgao@ikuai8.com
Date: Wed, 28 Dec 2016 16:47:42 +0800

> From: Gao Feng <fgao@ikuai8.com>
> 
> There are three functions which would invoke the ipvlan_count_rx. They
> are ipvlan_process_multicast, ipvlan_rcv_frame, and ipvlan_nf_input.
> The former two functions already use the ipvlan directly before
> ipvlan_count_rx, and ipvlan_nf_input gets the ipvlan from
> ipvl_addr->master, it is not possible to be NULL too.
> So the ipvlan pointer check is unnecessary in ipvlan_count_rx.
> 
> Signed-off-by: Gao Feng <fgao@ikuai8.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next RESEND 1/1] driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address
From: David Miller @ 2016-12-28 19:24 UTC (permalink / raw)
  To: fgao; +Cc: maheshb, edumazet, netdev, gfree.wind
In-Reply-To: <1482914811-2700-1-git-send-email-fgao@ikuai8.com>

From: fgao@ikuai8.com
Date: Wed, 28 Dec 2016 16:46:51 +0800

> From: Gao Feng <fgao@ikuai8.com>
> 
> There are some duplicated codes in ipvlan_add_addr6/4 and
> ipvlan_del_addr6/4. Now define two common functions ipvlan_add_addr
> and ipvlan_del_addr to decrease the duplicated codes.
> It could be helful to maintain the codes.
> 
> Signed-off-by: Gao Feng <fgao@ikuai8.com>

Applied.

^ permalink raw reply

* Re: [PATCH v3] net: dev_weight: TX/RX orthogonality
From: David Miller @ 2016-12-28 19:17 UTC (permalink / raw)
  To: matthias.tafelmeier; +Cc: netdev, hagen, fw, edumazet, daniel
In-Reply-To: <1482918134-10483-1-git-send-email-matthias.tafelmeier@gmx.net>

From: Matthias Tafelmeier <matthias.tafelmeier@gmx.net>
Date: Wed, 28 Dec 2016 10:42:14 +0100

> @@ -3428,6 +3428,8 @@ EXPORT_SYMBOL(netdev_max_backlog);
>  int netdev_tstamp_prequeue __read_mostly = 1;
>  int netdev_budget __read_mostly = 300;
>  int weight_p __read_mostly = 64;            /* old backlog weight */
> +int dev_weight_rx_bias __read_mostly = 1;            /* bias for backlog weight */
> +int dev_weight_tx_bias __read_mostly = 1;            /* bias for output_queue quota */
>  
>  /* Called with irq disabled */
>  static inline void ____napi_schedule(struct softnet_data *sd,
> @@ -4833,7 +4835,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
>  		net_rps_action_and_irq_enable(sd);
>  	}
>  
> -	napi->weight = weight_p;
> +	napi->weight = weight_p * dev_weight_rx_bias;
>  	while (again) {
>  		struct sk_buff *skb;
>  
 ...
> @@ -247,7 +247,7 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
>  
>  void __qdisc_run(struct Qdisc *q)
>  {
> -	int quota = weight_p;
> +	int quota = weight_p * dev_weight_tx_bias;

Ok, this is a lot better than what you proposed initially.

However, being that this is the fast path for all packet processing,
introducing a multiply here doesn't sit well.

I think there are two possible ways to address this:

1) Make the bias instead be a "shift".

2) Precompute the dev_tx_weight and dev_rx_weight into two variables
   in net/core/dev.c  Install a special proc_dointvec handler for
   "dev_weight" that, upon proc_dointvec() success, updates both
   dev_tx_weight and dev_rx_weight based upon the bias settings.

^ permalink raw reply

* [PATCH net-next rfc 6/6] net-tc: convert tc_from to tc_from_ingress and tc_redirected
From: Willem de Bruijn @ 2016-12-28 19:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, fw, dborkman, jhs, alexei.starovoitov, Willem de Bruijn
In-Reply-To: <1482952410-50003-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

The tc_from field fulfills two roles. It encodes whether a packet was
redirected by an act_mirred device and, if so, whether act_mirred was
called on ingress or egress. Split it into separate fields.

The information is needed by the special IFB loop, where packets are
taken out of the normal path by act_mirred, forwarded to IFB, then
reinjected at their original location (ingress or egress) by IFB.

The IFB device cannot use skb->tc_at_ingress, because that may have
been overwritten as the packet travels from act_mirred to ifb_xmit,
when it passes through tc_classify on the IFB egress path. Cache this
value in skb->tc_from_ingress.

That field is valid only if a packet arriving at ifb_xmit came from
act_mirred. Other packets can be crafted and must be dropped. Set
tc_redirected on redirection and drop all other packets in ifb_xmit.

Both fields are set only on cloned skbs in tc actions, so original
packet sources do not have to clear the bit when reusing packets
(notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/ifb.c            | 13 +++++--------
 include/linux/skbuff.h       |  5 ++++-
 include/net/sch_generic.h    |  2 +-
 include/uapi/linux/pkt_cls.h |  6 ------
 net/sched/act_mirred.c       |  6 ++++--
 net/sched/sch_netem.c        |  2 +-
 6 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 90ad879..193af74 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -78,9 +78,7 @@ static void ifb_ri_tasklet(unsigned long _txp)
 	}
 
 	while ((skb = __skb_dequeue(&txp->tq)) != NULL) {
-		u32 from = skb->tc_from;
-
-		skb_reset_tc(skb);
+		skb->tc_redirected = 0;
 		skb->tc_skip_classify = 1;
 
 		u64_stats_update_begin(&txp->tsync);
@@ -101,13 +99,12 @@ static void ifb_ri_tasklet(unsigned long _txp)
 		rcu_read_unlock();
 		skb->skb_iif = txp->dev->ifindex;
 
-		if (from & AT_EGRESS) {
+		if (!skb->tc_from_ingress) {
 			dev_queue_xmit(skb);
-		} else if (from & AT_INGRESS) {
+		} else {
 			skb_pull(skb, skb->mac_len);
 			netif_receive_skb(skb);
-		} else
-			BUG();
+		}
 	}
 
 	if (__netif_tx_trylock(txq)) {
@@ -248,7 +245,7 @@ static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
 	txp->rx_bytes += skb->len;
 	u64_stats_update_end(&txp->rsync);
 
-	if (skb->tc_from == AT_STACK || !skb->skb_iif) {
+	if (!skb->tc_redirected || !skb->skb_iif) {
 		dev_kfree_skb(skb);
 		dev->stats.rx_dropped++;
 		return NETDEV_TX_OK;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fab3f87..3149a88 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -591,6 +591,8 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@ipvs_property: skbuff is owned by ipvs
  *	@tc_skip_classify: do not classify packet. set by IFB device
  *	@tc_at_ingress: used within tc_classify to distinguish in/egress
+ *	@tc_redirected: packet was redirected by a tc action
+ *	@tc_from_ingress: if tc_redirected, tc_at_ingress at time of redirect
  *	@peeked: this packet has been seen already, so stats have been
  *		done for it, don't do them again
  *	@nf_trace: netfilter packet trace flag
@@ -753,7 +755,8 @@ struct sk_buff {
 #ifdef CONFIG_NET_CLS_ACT
 	__u8			tc_skip_classify:1;
 	__u8			tc_at_ingress:1;
-	__u8			tc_from:2;
+	__u8			tc_redirected:1;
+	__u8			tc_from_ingress:1;
 #endif
 
 #ifdef CONFIG_NET_SCHED
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4bd6d53..e2f426f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -412,7 +412,7 @@ int skb_do_redirect(struct sk_buff *);
 static inline void skb_reset_tc(struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_CLS_ACT
-	skb->tc_from = 0;
+	skb->tc_redirected = 0;
 #endif
 }
 
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index cee753a..a081efb 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -4,12 +4,6 @@
 #include <linux/types.h>
 #include <linux/pkt_sched.h>
 
-#ifdef __KERNEL__
-#define AT_STACK	0x0
-#define AT_INGRESS	0x1
-#define AT_EGRESS	0x2
-#endif
-
 /* Action attributes */
 enum {
 	TCA_ACT_UNSPEC,
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index e832c62..84682f0 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -211,8 +211,10 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	}
 
 	/* mirror is always swallowed */
-	if (tcf_mirred_is_act_redirect(m_eaction))
-		skb2->tc_from = skb_at_tc_ingress(skb) ? AT_INGRESS : AT_EGRESS;
+	if (tcf_mirred_is_act_redirect(m_eaction)) {
+		skb2->tc_redirected = 1;
+		skb2->tc_from_ingress = skb2->tc_at_ingress;
+	}
 
 	skb2->skb_iif = skb->dev->ifindex;
 	skb2->dev = dev;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bb5c638..c8bb62a 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -626,7 +626,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 			 * If it's at ingress let's pretend the delay is
 			 * from the network (tstamp will be updated).
 			 */
-			if (skb->tc_from & AT_INGRESS)
+			if (skb->tc_redirected && skb->tc_from_ingress)
 				skb->tstamp = 0;
 #endif
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next rfc 5/6] net-tc: convert tc_at to tc_at_ingress
From: Willem de Bruijn @ 2016-12-28 19:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, fw, dborkman, jhs, alexei.starovoitov, Willem de Bruijn
In-Reply-To: <1482952410-50003-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Field tc_at is used only within tc actions to distinguish ingress from
egress processing. A single bit is sufficient for this purpose. Set it
within tc_classify to make the scope clear and to avoid the need to
clear it in skb_reset_tc.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h    |  3 ++-
 include/net/sch_generic.h |  3 +--
 net/core/dev.c            |  6 +-----
 net/sched/act_mirred.c    | 12 ++++++------
 net/sched/sch_api.c       |  1 +
 5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f738d09..fab3f87 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -590,6 +590,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@fclone: skbuff clone status
  *	@ipvs_property: skbuff is owned by ipvs
  *	@tc_skip_classify: do not classify packet. set by IFB device
+ *	@tc_at_ingress: used within tc_classify to distinguish in/egress
  *	@peeked: this packet has been seen already, so stats have been
  *		done for it, don't do them again
  *	@nf_trace: netfilter packet trace flag
@@ -751,7 +752,7 @@ struct sk_buff {
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	__u8			tc_skip_classify:1;
-	__u8			tc_at:2;
+	__u8			tc_at_ingress:1;
 	__u8			tc_from:2;
 #endif
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f80dba5..4bd6d53 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -412,7 +412,6 @@ int skb_do_redirect(struct sk_buff *);
 static inline void skb_reset_tc(struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_CLS_ACT
-	skb->tc_at = 0;
 	skb->tc_from = 0;
 #endif
 }
@@ -420,7 +419,7 @@ static inline void skb_reset_tc(struct sk_buff *skb)
 static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_CLS_ACT
-	return skb->tc_at & AT_INGRESS;
+	return skb->tc_at_ingress;
 #else
 	return false;
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 25620cf..90833fdf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3153,9 +3153,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	if (!cl)
 		return skb;
 
-	/* skb->tc_at and qdisc_skb_cb(skb)->pkt_len were already set
-	 * earlier by the caller.
-	 */
+	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	qdisc_bstats_cpu_update(cl->q, skb);
 
 	switch (tc_classify(skb, cl, &cl_res, false)) {
@@ -3320,7 +3318,6 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 
 	qdisc_pkt_len_init(skb);
 #ifdef CONFIG_NET_CLS_ACT
-	skb->tc_at = AT_EGRESS;
 # ifdef CONFIG_NET_EGRESS
 	if (static_key_false(&egress_needed)) {
 		skb = sch_handle_egress(skb, &rc, dev);
@@ -3916,7 +3913,6 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 	}
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
-	skb->tc_at = AT_INGRESS;
 	qdisc_bstats_cpu_update(cl->q, skb);
 
 	switch (tc_classify(skb, cl, &cl_res, false)) {
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 8543279..e832c62 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -39,15 +39,15 @@ static bool tcf_mirred_is_act_redirect(int action)
 	return action == TCA_EGRESS_REDIR || action == TCA_INGRESS_REDIR;
 }
 
-static u32 tcf_mirred_act_direction(int action)
+static bool tcf_mirred_act_wants_ingress(int action)
 {
 	switch (action) {
 	case TCA_EGRESS_REDIR:
 	case TCA_EGRESS_MIRROR:
-		return AT_EGRESS;
+		return false;
 	case TCA_INGRESS_REDIR:
 	case TCA_INGRESS_MIRROR:
-		return AT_INGRESS;
+		return true;
 	default:
 		BUG();
 	}
@@ -198,7 +198,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	 * and devices expect a mac header on xmit, then mac push/pull is
 	 * needed.
 	 */
-	if (skb->tc_at != tcf_mirred_act_direction(m_eaction) &&
+	if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) &&
 	    m_mac_header_xmit) {
 		if (!skb_at_tc_ingress(skb)) {
 			/* caught at egress, act ingress: pull mac */
@@ -212,11 +212,11 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 
 	/* mirror is always swallowed */
 	if (tcf_mirred_is_act_redirect(m_eaction))
-		skb2->tc_from = skb2->tc_at;
+		skb2->tc_from = skb_at_tc_ingress(skb) ? AT_INGRESS : AT_EGRESS;
 
 	skb2->skb_iif = skb->dev->ifindex;
 	skb2->dev = dev;
-	if (tcf_mirred_act_direction(m_eaction) & AT_EGRESS)
+	if (!tcf_mirred_act_wants_ingress(m_eaction))
 		err = dev_queue_xmit(skb2);
 	else
 		err = netif_receive_skb(skb2);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ef53ede..be4e18d 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1865,6 +1865,7 @@ int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	const struct tcf_proto *old_tp = tp;
 	int limit = 0;
 
+	skb->tc_at_ingress = !!(tp && tp->q->flags & TCQ_F_INGRESS);
 reclassify:
 #endif
 	for (; tp; tp = rcu_dereference_bh(tp->next)) {
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next rfc 4/6] net-tc: convert tc_verd to integer bitfields
From: Willem de Bruijn @ 2016-12-28 19:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, fw, dborkman, jhs, alexei.starovoitov, Willem de Bruijn
In-Reply-To: <1482952410-50003-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Extract the remaining two fields from tc_verd and remove the __u16
completely. TC_AT and TC_FROM are converted to equivalent two-bit
integer fields tc_at and tc_from. Where possible, use existing
helper skb_at_tc_ingress when reading tc_at. Introduce helper
skb_reset_tc to clear fields.

Not documenting tc_from and tc_at, because they will be replaced
with single bit fields in follow-on patches.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/ifb.c                    |  7 +++----
 drivers/staging/octeon/ethernet-tx.c |  5 ++---
 include/linux/skbuff.h               |  6 ++----
 include/net/sch_generic.h            | 10 +++++++++-
 include/uapi/linux/pkt_cls.h         | 31 -------------------------------
 net/core/dev.c                       | 10 ++++------
 net/core/pktgen.c                    |  4 +---
 net/core/skbuff.c                    |  3 ---
 net/sched/act_ife.c                  |  7 +++----
 net/sched/act_mirred.c               |  9 ++++-----
 net/sched/sch_netem.c                |  2 +-
 11 files changed, 29 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 4299ac1..90ad879 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -78,9 +78,9 @@ static void ifb_ri_tasklet(unsigned long _txp)
 	}
 
 	while ((skb = __skb_dequeue(&txp->tq)) != NULL) {
-		u32 from = G_TC_FROM(skb->tc_verd);
+		u32 from = skb->tc_from;
 
-		skb->tc_verd = 0;
+		skb_reset_tc(skb);
 		skb->tc_skip_classify = 1;
 
 		u64_stats_update_begin(&txp->tsync);
@@ -241,7 +241,6 @@ static void ifb_setup(struct net_device *dev)
 static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ifb_dev_private *dp = netdev_priv(dev);
-	u32 from = G_TC_FROM(skb->tc_verd);
 	struct ifb_q_private *txp = dp->tx_private + skb_get_queue_mapping(skb);
 
 	u64_stats_update_begin(&txp->rsync);
@@ -249,7 +248,7 @@ static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
 	txp->rx_bytes += skb->len;
 	u64_stats_update_end(&txp->rsync);
 
-	if (!(from & (AT_INGRESS|AT_EGRESS)) || !skb->skb_iif) {
+	if (skb->tc_from == AT_STACK || !skb->skb_iif) {
 		dev_kfree_skb(skb);
 		dev->stats.rx_dropped++;
 		return NETDEV_TX_OK;
diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c
index 6b4c208..0b80532 100644
--- a/drivers/staging/octeon/ethernet-tx.c
+++ b/drivers/staging/octeon/ethernet-tx.c
@@ -23,6 +23,7 @@
 #endif /* CONFIG_XFRM */
 
 #include <linux/atomic.h>
+#include <net/sch_generic.h>
 
 #include <asm/octeon/octeon.h>
 
@@ -369,9 +370,7 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev)
 
 #ifdef CONFIG_NET_SCHED
 	skb->tc_index = 0;
-#ifdef CONFIG_NET_CLS_ACT
-	skb->tc_verd = 0;
-#endif /* CONFIG_NET_CLS_ACT */
+	skb_reset_tc(skb);
 #endif /* CONFIG_NET_SCHED */
 #endif /* REUSE_SKBUFFS_WITHOUT_FREE */
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 570f60e..f738d09 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -599,7 +599,6 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@nf_bridge: Saved data about a bridged frame - see br_netfilter.c
  *	@skb_iif: ifindex of device we arrived on
  *	@tc_index: Traffic control index
- *	@tc_verd: traffic control verdict
  *	@hash: the packet hash
  *	@queue_mapping: Queue mapping for multiqueue devices
  *	@xmit_more: More SKBs are pending for this queue
@@ -752,13 +751,12 @@ struct sk_buff {
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	__u8			tc_skip_classify:1;
+	__u8			tc_at:2;
+	__u8			tc_from:2;
 #endif
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
-#ifdef CONFIG_NET_CLS_ACT
-	__u16			tc_verd;	/* traffic control verdict */
-#endif
 #endif
 
 	union {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 857356f..f80dba5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -409,10 +409,18 @@ bool tcf_destroy(struct tcf_proto *tp, bool force);
 void tcf_destroy_chain(struct tcf_proto __rcu **fl);
 int skb_do_redirect(struct sk_buff *);
 
+static inline void skb_reset_tc(struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	skb->tc_at = 0;
+	skb->tc_from = 0;
+#endif
+}
+
 static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_CLS_ACT
-	return G_TC_AT(skb->tc_verd) & AT_INGRESS;
+	return skb->tc_at & AT_INGRESS;
 #else
 	return false;
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 1eed5d7..cee753a 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -5,40 +5,9 @@
 #include <linux/pkt_sched.h>
 
 #ifdef __KERNEL__
-/* I think i could have done better macros ; for now this is stolen from
- * some arch/mips code - jhs
-*/
-#define _TC_MAKE32(x) ((x))
-
-#define _TC_MAKEMASK1(n) (_TC_MAKE32(1) << _TC_MAKE32(n))
-#define _TC_MAKEMASK(v,n) (_TC_MAKE32((_TC_MAKE32(1)<<(v))-1) << _TC_MAKE32(n))
-#define _TC_MAKEVALUE(v,n) (_TC_MAKE32(v) << _TC_MAKE32(n))
-#define _TC_GETVALUE(v,n,m) ((_TC_MAKE32(v) & _TC_MAKE32(m)) >> _TC_MAKE32(n))
-
-/* verdict bit breakdown 
- *
-bit 6,7: Where this packet was last seen 
-0: Above the transmit example at the socket level
-1: on the Ingress
-2: on the Egress
-
- *
- * */
-
-#define S_TC_FROM          _TC_MAKE32(6)
-#define M_TC_FROM          _TC_MAKEMASK(2,S_TC_FROM)
-#define G_TC_FROM(x)       _TC_GETVALUE(x,S_TC_FROM,M_TC_FROM)
-#define V_TC_FROM(x)       _TC_MAKEVALUE(x,S_TC_FROM)
-#define SET_TC_FROM(v,n)   ((V_TC_FROM(n)) | (v & ~M_TC_FROM))
 #define AT_STACK	0x0
 #define AT_INGRESS	0x1
 #define AT_EGRESS	0x2
-
-#define S_TC_AT          _TC_MAKE32(12)
-#define M_TC_AT          _TC_MAKEMASK(2,S_TC_AT)
-#define G_TC_AT(x)       _TC_GETVALUE(x,S_TC_AT,M_TC_AT)
-#define V_TC_AT(x)       _TC_MAKEVALUE(x,S_TC_AT)
-#define SET_TC_AT(v,n)   ((V_TC_AT(n)) | (v & ~M_TC_AT))
 #endif
 
 /* Action attributes */
diff --git a/net/core/dev.c b/net/core/dev.c
index 7b41d97..25620cf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3153,7 +3153,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	if (!cl)
 		return skb;
 
-	/* skb->tc_verd and qdisc_skb_cb(skb)->pkt_len were already set
+	/* skb->tc_at and qdisc_skb_cb(skb)->pkt_len were already set
 	 * earlier by the caller.
 	 */
 	qdisc_bstats_cpu_update(cl->q, skb);
@@ -3320,7 +3320,7 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 
 	qdisc_pkt_len_init(skb);
 #ifdef CONFIG_NET_CLS_ACT
-	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
+	skb->tc_at = AT_EGRESS;
 # ifdef CONFIG_NET_EGRESS
 	if (static_key_false(&egress_needed)) {
 		skb = sch_handle_egress(skb, &rc, dev);
@@ -3916,7 +3916,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 	}
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
-	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
+	skb->tc_at = AT_INGRESS;
 	qdisc_bstats_cpu_update(cl->q, skb);
 
 	switch (tc_classify(skb, cl, &cl_res, false)) {
@@ -4118,9 +4118,7 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
 			goto out;
 	}
 #endif
-#ifdef CONFIG_NET_CLS_ACT
-	skb->tc_verd = 0;
-#endif
+	skb_reset_tc(skb);
 skip_classify:
 	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
 		goto drop;
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 8e69ce4..96947f5 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3439,9 +3439,7 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 			/* skb was 'freed' by stack, so clean few
 			 * bits and reuse it
 			 */
-#ifdef CONFIG_NET_CLS_ACT
-			skb->tc_verd = 0; /* reset reclass/redir ttl */
-#endif
+			skb_reset_tc(skb);
 		} while (--burst > 0);
 		goto out; /* Skips xmit_mode M_START_XMIT */
 	} else if (pkt_dev->xmit_mode == M_QUEUE_XMIT) {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5a03730..adec4bf 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -878,9 +878,6 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 #endif
 #ifdef CONFIG_NET_SCHED
 	CHECK_SKB_FIELD(tc_index);
-#ifdef CONFIG_NET_CLS_ACT
-	CHECK_SKB_FIELD(tc_verd);
-#endif
 #endif
 
 }
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 80b848d..921fb20 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -736,12 +736,11 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 	u16 metalen = ife_get_sz(skb, ife);
 	int hdrm = metalen + skb->dev->hard_header_len + IFE_METAHDRLEN;
 	unsigned int skboff = skb->dev->hard_header_len;
-	u32 at = G_TC_AT(skb->tc_verd);
 	int new_len = skb->len + hdrm;
 	bool exceed_mtu = false;
 	int err;
 
-	if (at & AT_EGRESS) {
+	if (!skb_at_tc_ingress(skb)) {
 		if (new_len > skb->dev->mtu)
 			exceed_mtu = true;
 	}
@@ -773,7 +772,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 		return TC_ACT_SHOT;
 	}
 
-	if (!(at & AT_EGRESS))
+	if (skb_at_tc_ingress(skb))
 		skb_push(skb, skb->dev->hard_header_len);
 
 	iethh = (struct ethhdr *)skb->data;
@@ -816,7 +815,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 		ether_addr_copy(oethh->h_dest, iethh->h_dest);
 	oethh->h_proto = htons(ife->eth_type);
 
-	if (!(at & AT_EGRESS))
+	if (skb_at_tc_ingress(skb))
 		skb_pull(skb, skb->dev->hard_header_len);
 
 	spin_unlock(&ife->tcf_lock);
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 2d9fa6e..8543279 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -170,7 +170,6 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	int retval, err = 0;
 	int m_eaction;
 	int mac_len;
-	u32 at;
 
 	tcf_lastuse_update(&m->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb);
@@ -191,7 +190,6 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		goto out;
 	}
 
-	at = G_TC_AT(skb->tc_verd);
 	skb2 = skb_clone(skb, GFP_ATOMIC);
 	if (!skb2)
 		goto out;
@@ -200,8 +198,9 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	 * and devices expect a mac header on xmit, then mac push/pull is
 	 * needed.
 	 */
-	if (at != tcf_mirred_act_direction(m_eaction) && m_mac_header_xmit) {
-		if (at & AT_EGRESS) {
+	if (skb->tc_at != tcf_mirred_act_direction(m_eaction) &&
+	    m_mac_header_xmit) {
+		if (!skb_at_tc_ingress(skb)) {
 			/* caught at egress, act ingress: pull mac */
 			mac_len = skb_network_header(skb) - skb_mac_header(skb);
 			skb_pull_rcsum(skb2, mac_len);
@@ -213,7 +212,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 
 	/* mirror is always swallowed */
 	if (tcf_mirred_is_act_redirect(m_eaction))
-		skb2->tc_verd = SET_TC_FROM(skb2->tc_verd, at);
+		skb2->tc_from = skb2->tc_at;
 
 	skb2->skb_iif = skb->dev->ifindex;
 	skb2->dev = dev;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bcfadfd..bb5c638 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -626,7 +626,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 			 * If it's at ingress let's pretend the delay is
 			 * from the network (tstamp will be updated).
 			 */
-			if (G_TC_FROM(skb->tc_verd) & AT_INGRESS)
+			if (skb->tc_from & AT_INGRESS)
 				skb->tstamp = 0;
 #endif
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next rfc 3/6] net-tc: extract skip classify bit from tc_verd
From: Willem de Bruijn @ 2016-12-28 19:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, fw, dborkman, jhs, alexei.starovoitov, Willem de Bruijn
In-Reply-To: <1482952410-50003-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Packets sent by the IFB device skip subsequent tc classification.
A single bit governs this state. Move it out of tc_verd in
anticipation of removing that __u16 completely.

The new bitfield tc_skip_classify temporarily uses one bit of a
hole, until tc_verd is removed completely in a follow-up patch.

Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
With that many options, little value in documenting it.

Introduce a helper function to deduplicate the logic in the two
sites that check this bit.

The field tc_skip_classify is set only in IFB on skbs cloned in
act_mirred, so original packet sources do not have to clear the
bit when reusing packets (notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/ifb.c            |  2 +-
 include/linux/skbuff.h       |  5 ++++-
 include/net/sch_generic.h    | 11 +++++++++++
 include/uapi/linux/pkt_cls.h |  6 ------
 net/core/dev.c               | 10 +++-------
 net/sched/act_api.c          |  8 +++-----
 6 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 66c0eea..4299ac1 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -81,7 +81,7 @@ static void ifb_ri_tasklet(unsigned long _txp)
 		u32 from = G_TC_FROM(skb->tc_verd);
 
 		skb->tc_verd = 0;
-		skb->tc_verd = SET_TC_NCLS(skb->tc_verd);
+		skb->tc_skip_classify = 1;
 
 		u64_stats_update_begin(&txp->tsync);
 		txp->tx_packets++;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b53c0cf..570f60e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -589,6 +589,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@pkt_type: Packet class
  *	@fclone: skbuff clone status
  *	@ipvs_property: skbuff is owned by ipvs
+ *	@tc_skip_classify: do not classify packet. set by IFB device
  *	@peeked: this packet has been seen already, so stats have been
  *		done for it, don't do them again
  *	@nf_trace: netfilter packet trace flag
@@ -749,7 +750,9 @@ struct sk_buff {
 #ifdef CONFIG_NET_SWITCHDEV
 	__u8			offload_fwd_mark:1;
 #endif
-	/* 2, 4 or 5 bit hole */
+#ifdef CONFIG_NET_CLS_ACT
+	__u8			tc_skip_classify:1;
+#endif
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 498f81b..857356f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -418,6 +418,17 @@ static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
 #endif
 }
 
+static inline bool skb_skip_tc_classify(struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	if (skb->tc_skip_classify) {
+		skb->tc_skip_classify = 0;
+		return true;
+	}
+#endif
+	return false;
+}
+
 /* Reset all TX qdiscs greater then index of a device.  */
 static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
 {
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index bba23db..1eed5d7 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -22,8 +22,6 @@ bit 6,7: Where this packet was last seen
 1: on the Ingress
 2: on the Egress
 
-bit 8: when set --> Request not to classify on ingress. 
-
  *
  * */
 
@@ -36,10 +34,6 @@ bit 8: when set --> Request not to classify on ingress.
 #define AT_INGRESS	0x1
 #define AT_EGRESS	0x2
 
-#define TC_NCLS          _TC_MAKEMASK1(8)
-#define SET_TC_NCLS(v)   ( TC_NCLS | (v & ~TC_NCLS))
-#define CLR_TC_NCLS(v)   ( v & ~TC_NCLS)
-
 #define S_TC_AT          _TC_MAKE32(12)
 #define M_TC_AT          _TC_MAKEMASK(2,S_TC_AT)
 #define G_TC_AT(x)       _TC_GETVALUE(x,S_TC_AT,M_TC_AT)
diff --git a/net/core/dev.c b/net/core/dev.c
index 8db5a0b..7b41d97 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4089,12 +4089,8 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
 			goto out;
 	}
 
-#ifdef CONFIG_NET_CLS_ACT
-	if (skb->tc_verd & TC_NCLS) {
-		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
-		goto ncls;
-	}
-#endif
+	if (skb_skip_tc_classify(skb))
+		goto skip_classify;
 
 	if (pfmemalloc)
 		goto skip_taps;
@@ -4124,8 +4120,8 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	skb->tc_verd = 0;
-ncls:
 #endif
+skip_classify:
 	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
 		goto drop;
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 2095c83..7afaf8e 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -426,11 +426,9 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 {
 	int ret = -1, i;
 
-	if (skb->tc_verd & TC_NCLS) {
-		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
-		ret = TC_ACT_OK;
-		goto exec_done;
-	}
+	if (skb_skip_tc_classify(skb))
+		return TC_ACT_OK;
+
 	for (i = 0; i < nr_actions; i++) {
 		const struct tc_action *a = actions[i];
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next rfc 2/6] net-tc: make MAX_RECLASSIFY_LOOP local
From: Willem de Bruijn @ 2016-12-28 19:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, fw, dborkman, jhs, alexei.starovoitov, Willem de Bruijn
In-Reply-To: <1482952410-50003-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

This field is no longer kept in tc_verd. Remove it from the global
definition of that struct.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/pkt_cls.h | 5 -----
 net/sched/sch_api.c          | 3 ++-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index c769f71..bba23db 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -17,9 +17,6 @@
 
 /* verdict bit breakdown 
  *
-bit 2,3,4,5: Reclassify counter - sort of reverse TTL - if exceeded
-assume loop
-
 bit 6,7: Where this packet was last seen 
 0: Above the transmit example at the socket level
 1: on the Ingress
@@ -48,8 +45,6 @@ bit 8: when set --> Request not to classify on ingress.
 #define G_TC_AT(x)       _TC_GETVALUE(x,S_TC_AT,M_TC_AT)
 #define V_TC_AT(x)       _TC_MAKEVALUE(x,S_TC_AT)
 #define SET_TC_AT(v,n)   ((V_TC_AT(n)) | (v & ~M_TC_AT))
-
-#define MAX_REC_LOOP 4
 #endif
 
 /* Action attributes */
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index d7b9342..ef53ede 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1861,6 +1861,7 @@ int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 {
 	__be16 protocol = tc_skb_protocol(skb);
 #ifdef CONFIG_NET_CLS_ACT
+	const int max_reclassify_loop = 4;
 	const struct tcf_proto *old_tp = tp;
 	int limit = 0;
 
@@ -1885,7 +1886,7 @@ int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	return TC_ACT_UNSPEC; /* signal: continue lookup */
 #ifdef CONFIG_NET_CLS_ACT
 reset:
-	if (unlikely(limit++ >= MAX_REC_LOOP)) {
+	if (unlikely(limit++ >= max_reclassify_loop)) {
 		net_notice_ratelimited("%s: reclassify loop, rule prio %u, protocol %02x\n",
 				       tp->q->ops->id, tp->prio & 0xffff,
 				       ntohs(tp->protocol));
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next rfc 1/6] net-tc: remove unused tc_verd fields
From: Willem de Bruijn @ 2016-12-28 19:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, fw, dborkman, jhs, alexei.starovoitov, Willem de Bruijn
In-Reply-To: <1482952410-50003-1-git-send-email-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Remove the last reference to tc_verd's munge and redirect ttl bits.
These fields are no longer used.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/pkt_cls.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index cb4bcdc..c769f71 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -17,10 +17,6 @@
 
 /* verdict bit breakdown 
  *
-bit 0: when set -> this packet has been munged already
-
-bit 1: when set -> It is ok to munge this packet
-
 bit 2,3,4,5: Reclassify counter - sort of reverse TTL - if exceeded
 assume loop
 
@@ -31,8 +27,6 @@ bit 6,7: Where this packet was last seen
 
 bit 8: when set --> Request not to classify on ingress. 
 
-bits 9,10,11: redirect counter -  redirect TTL. Loop avoidance
-
  *
  * */
 
@@ -56,7 +50,6 @@ bits 9,10,11: redirect counter -  redirect TTL. Loop avoidance
 #define SET_TC_AT(v,n)   ((V_TC_AT(n)) | (v & ~M_TC_AT))
 
 #define MAX_REC_LOOP 4
-#define MAX_RED_LOOP 4
 #endif
 
 /* Action attributes */
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox