Netdev List

Netdev List
 help / color / mirror / Atom feed

* (unknown), 
From: doctornina @ 2016-12-28 22:43 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 24488470886248_netdev.zip --]
[-- Type: application/zip, Size: 43724 bytes --]

^ permalink raw reply

* [PATCH] net: ethernet: ti: davinci_cpdma: fix access to uninitialized variable in cpdma_chan_set_descs()
From: Grygorii Strashko @ 2016-12-28 23:42 UTC (permalink / raw)
  To: David S. Miller, netdev, Mugunthan V N
  Cc: Sekhar Nori, linux-kernel, linux-omap, Ivan Khoronzhuk,
	Grygorii Strashko

Now below code sequence causes "Unable to handle kernel NULL pointer
dereference.." exception and system crash during CPSW CPDMA initialization:

cpsw_probe
|-cpdma_chan_create (TX channel)
  |-cpdma_chan_split_pool
    |-cpdma_chan_set_descs(for TX channels)
    |-cpdma_chan_set_descs(for RX channels) [1]

- and -
static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
				 int rx, int desc_num,
				 int per_ch_desc)
{
	struct cpdma_chan *chan, *most_chan = NULL;

...

	for (i = min; i < max; i++) {
		chan = ctlr->channels[i];
		if (!chan)
			continue;
...

		if (most_dnum < chan->desc_num) {
			most_dnum = chan->desc_num;
			most_chan = chan;
		}
	}
	/* use remains */
	most_chan->desc_num += desc_cnt; [2]
}

So, most_chan value will never be reassigned when cpdma_chan_set_descs() is
called second time [1], because there are no RX channels yet and system
will crash at [2].

Hence, fix the issue by checking most_chan for NULL before accessing it.

Fixes: 0fc6432cc78d ("net: ethernet: ti: davinci_cpdma: add weight function for channels")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 36518fc..b349d572 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -708,7 +708,8 @@ static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
 		}
 	}
 	/* use remains */
-	most_chan->desc_num += desc_cnt;
+	if (most_chan)
+		most_chan->desc_num += desc_cnt;
 }
 
 /**
-- 
2.10.1.dirty

^ permalink raw reply related

* [PATCH net] net: stmmac: Fix error path after register_netdev move
From: Florian Fainelli @ 2016-12-28 23:44 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, pavel, Joao.Pinto, seraphin.bonnaffe,
	alexandre.torgue, manabian, niklas.cassel, johan, boon.leong.ong,
	weifeng.voon, lars.persson, linux-kernel, Giuseppe Cavallaro,
	Alexandre Torgue

Commit 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and
stmmac_open") re-ordered how the MDIO bus registration and the network
device are registered, but missed to unwind the MDIO bus registration in
case we fail to register the network device.

Fixes: 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 5910ea51f8f6..39eb7a65bb9f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3366,12 +3366,19 @@ int stmmac_dvr_probe(struct device *device,
 	}
 
 	ret = register_netdev(ndev);
-	if (ret)
+	if (ret) {
 		netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
 			   __func__, ret);
+		goto error_netdev_register;
+	}
 
 	return ret;
 
+error_netdev_register:
+	if (priv->hw->pcs != STMMAC_PCS_RGMII &&
+	    priv->hw->pcs != STMMAC_PCS_TBI &&
+	    priv->hw->pcs != STMMAC_PCS_RTBI)
+		stmmac_mdio_unregister(ndev);
 error_mdio_register:
 	netif_napi_del(&priv->napi);
 error_hw_init:
-- 
2.9.3

^ permalink raw reply related

* RE: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong @ 2016-12-29  0:26 UTC (permalink / raw)
  To: Kishan Sandeep
  Cc: David Miller, f.fainelli@gmail.com, Joao.Pinto@synopsys.com,
	peppe.cavallaro@st.com, seraphin.bonnaffe@st.com,
	alexandre.torgue@gmail.com, manabian@gmail.com,
	niklas.cassel@axis.com, johan@kernel.org, pavel@ucw.cz,
	lars.persson@axis.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAJ3s=NAzfUSCnxxZph5j6Vk2T=pw2F9U3FLP7dOKRp21mmQvcg@mail.gmail.com>

> -----Original Message-----
> From: Kishan Sandeep [mailto:sandeepkishan108@gmail.com]
> Sent: Wednesday, December 28, 2016 7:56 PM
> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> Cc: David Miller <davem@davemloft.net>; f.fainelli@gmail.com;
> Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> seraphin.bonnaffe@st.com; alexandre.torgue@gmail.com;
> manabian@gmail.com; niklas.cassel@axis.com; johan@kernel.org;
> pavel@ucw.cz; lars.persson@axis.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and
> stmmac_dvr_probe
> 
> On Wed, Dec 28, 2016 at 7:10 AM, Kweh, Hock Leong
> <hock.leong.kweh@intel.com> wrote:
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Wednesday, December 28, 2016 12:34 AM
> >> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> >> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> >> seraphin.bonnaffe@st.com; f.fainelli@gmail.com;
> >> alexandre.torgue@gmail.com; manabian@gmail.com;
> >> niklas.cassel@axis.com; johan@kernel.org; pavel@ucw.cz; Ong, Boon
> >> Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> >> <weifeng.voon@intel.com>; lars.persson@axis.com;
> >> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize
> >> stmmac_open and stmmac_dvr_probe
> >>
> >> From: "Kweh, Hock Leong"      <hock.leong.kweh@intel.com>
> >> Date: Tue, 27 Dec 2016 22:42:36 +0800
> >>
> >> > From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >>
> >> You are not the author of this change, do not take credit for it.
> >>
> >> You have copied Florian's patch character by character, therefore he
> >> is the author.
> >>
> >> You also didn't CC: the netdev mailing list properly.
> >
> > Noted & Thanks.
> >
> > Hi Florian, could you submit this fix from your side so that you are the author.
> > I will help to test out.
> >
> > Thanks & Regards,
> > Wilson
> >
> I think you can give *--author* for giving author name. Try git commit -am
> "commit message" --author="Author_name <author_email>"

Oh ... I am not aware of that. Thanks for informing. 
:-)

Regards,
Wilson

^ permalink raw reply

* RE: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong @ 2016-12-29  0:28 UTC (permalink / raw)
  To: Florian Fainelli, David Miller
  Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
	seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
	manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
	pavel@ucw.cz, Ong, Boon Leong, Voon, Weifeng,
	lars.persson@axis.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <e88996a8-e010-c07f-7dee-d220a6dfc706@gmail.com>

> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Thursday, December 29, 2016 2:43 AM
> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>; David Miller
> <davem@davemloft.net>
> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> seraphin.bonnaffe@st.com; alexandre.torgue@gmail.com;
> manabian@gmail.com; niklas.cassel@axis.com; johan@kernel.org;
> pavel@ucw.cz; Ong, Boon Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> <weifeng.voon@intel.com>; lars.persson@axis.com; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and
> stmmac_dvr_probe
> 
> On 12/27/2016 09:49 PM, Kweh, Hock Leong wrote:
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: Wednesday, December 28, 2016 12:34 AM
> >> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> >> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> >> seraphin.bonnaffe@st.com; f.fainelli@gmail.com;
> >> alexandre.torgue@gmail.com; manabian@gmail.com;
> >> niklas.cassel@axis.com; johan@kernel.org; pavel@ucw.cz; Ong, Boon
> >> Leong <boon.leong.ong@intel.com>; Voon, Weifeng
> >> <weifeng.voon@intel.com>; lars.persson@axis.com;
> >> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v2] net: stmmac: bug fix to synchronize
> >> stmmac_open and stmmac_dvr_probe
> >>
> >> From: "Kweh, Hock Leong"	<hock.leong.kweh@intel.com>
> >> Date: Tue, 27 Dec 2016 22:42:36 +0800
> >>
> >>> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >>
> >> You are not the author of this change, do not take credit for it.
> >>
> >> You have copied Florian's patch character by character, therefore he
> >> is the author.
> >>
> >> You also didn't CC: the netdev mailing list properly.
> >
> > Hi David & Florian,
> >
> > Just to clarify that I do not copy exactly from Florian.
> > I have changed it to have proper handling on mdio unregister while
> > netdev_register() failed as showed below:
> >
> > 	return 0;
> >
> > -error_mdio_register:
> > -	unregister_netdev(ndev);
> >  error_netdev_register:
> > +	stmmac_mdio_unregister(ndev);
> 
> Although this is required, we can't be doing it in all circumstances, we need to
> mimic what stmmac_drv_remove() does.
> 
> Let me submit an incremental fix which takes care of mdio bus unregistration.
> --
> Florian

Noted & Thanks. Will test it out once you submitted.

Thanks & Regards,
Wilson


^ permalink raw reply

* Re: [PATCH v2] net: stmmac: bug fix to synchronize stmmac_open and stmmac_dvr_probe
From: Florian Fainelli @ 2016-12-29  0:40 UTC (permalink / raw)
  To: Kweh, Hock Leong, David Miller
  Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
	seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
	manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
	pavel@ucw.cz, Ong, Boon Leong, Voon, Weifeng,
	lars.persson@axis.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <F54AEECA5E2B9541821D670476DAE19C5A916EB3@PGSMSX102.gar.corp.intel.com>

On 12/28/2016 04:28 PM, Kweh, Hock Leong wrote:
>> Although this is required, we can't be doing it in all circumstances, we need to
>> mimic what stmmac_drv_remove() does.
>>
>> Let me submit an incremental fix which takes care of mdio bus unregistration.
>> --
>> Florian
> 
> Noted & Thanks. Will test it out once you submitted.

It's done:

https://www.spinics.net/lists/netdev/msg411934.html
-- 
Florian

^ permalink raw reply

* Re: [PATCH] net: ethernet: ti: davinci_cpdma: fix access to uninitialized variable in cpdma_chan_set_descs()
From: Ivan Khoronzhuk @ 2016-12-29  1:49 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap
In-Reply-To: <20161228234213.22166-1-grygorii.strashko@ti.com>

On Wed, Dec 28, 2016 at 05:42:13PM -0600, Grygorii Strashko wrote:
Grygorii,

> Now below code sequence causes "Unable to handle kernel NULL pointer
> dereference.." exception and system crash during CPSW CPDMA initialization:
> 
> cpsw_probe
> |-cpdma_chan_create (TX channel)
>   |-cpdma_chan_split_pool
>     |-cpdma_chan_set_descs(for TX channels)
>     |-cpdma_chan_set_descs(for RX channels) [1]
> 
> - and -
> static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
> 				 int rx, int desc_num,
> 				 int per_ch_desc)
> {
> 	struct cpdma_chan *chan, *most_chan = NULL;
> 
> ...
> 
> 	for (i = min; i < max; i++) {
> 		chan = ctlr->channels[i];
> 		if (!chan)
> 			continue;
> ...
> 
> 		if (most_dnum < chan->desc_num) {
> 			most_dnum = chan->desc_num;
> 			most_chan = chan;
> 		}
> 	}
> 	/* use remains */
> 	most_chan->desc_num += desc_cnt; [2]
> }
> 
> So, most_chan value will never be reassigned when cpdma_chan_set_descs() is
> called second time [1], because there are no RX channels yet and system
> will crash at [2].

How did you get this?
I just remember as I fixed it before sending patchset.

Maybe it was some experiment with it.
I just wonder and want to find actual reason what's happening.

Look bellow:

cpsw_probe
|-cpdma_chan_create (TX channel)
  |-cpdma_chan_split_pool
    |-cpdma_chan_set_descs(for TX channels)
    |-cpdma_chan_set_descs(for RX channels) [1]

|-cpdma_chan_set_descs(for RX channels) in case you'be described has to be
called with rx_desc_num = 0, because all descs are assigned already for tx
channel. And, if desc_num = 0, cpdma_chan_set_descs just exits and no issues.
So, could you please explain how you get this, in what circumstances.

> 
> Hence, fix the issue by checking most_chan for NULL before accessing it.
> 
> Fixes: 0fc6432cc78d ("net: ethernet: ti: davinci_cpdma: add weight function for channels")
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  drivers/net/ethernet/ti/davinci_cpdma.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> index 36518fc..b349d572 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -708,7 +708,8 @@ static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
>  		}
>  	}
>  	/* use remains */
> -	most_chan->desc_num += desc_cnt;
> +	if (most_chan)
> +		most_chan->desc_num += desc_cnt;
>  }
>  
>  /**
> -- 
> 2.10.1.dirty
> 

^ permalink raw reply

* Re: [PATCH] net: fix incorrect original ingress device index in PKTINFO
From: David Ahern @ 2016-12-29  4:42 UTC (permalink / raw)
  To: David Miller, asuka.com
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <20161227.140313.1837464529059496066.davem@davemloft.net>

On 12/27/16 12:03 PM, David Miller wrote:
> From: Wei Zhang <asuka.com@163.com>
> Date: Tue, 27 Dec 2016 17:52:24 +0800
> 
>> When we send a packet for our own local address on a non-loopback
>> interface (e.g. eth0), due to the change had been introduced from
>> commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
>> original ingress device index would be set as the loopback interface.
>> However, the packet should be considered as if it is being arrived via the
>> sending interface (eth0), otherwise it would break the expectation of the
>> userspace application (e.g. the DHCPRELEASE message from dhcp_release
>> binary would be ignored by the dnsmasq daemon, since it come from lo which
>> is not the interface dnsmasq bind to)
>>

Add a Fixes line before the sign-off:

Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO")
>> Signed-off-by: Wei Zhang <asuka.com@163.com>
> 
> When you are fixing a problem introduced by another change, always CC:
> the author of that change as I have done so here.
> 
> David, please take a look at this, thanks.
> 
>> ---
>>  net/ipv4/ip_sockglue.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>> index b8a2d63..76d78a7 100644
>> --- a/net/ipv4/ip_sockglue.c
>> +++ b/net/ipv4/ip_sockglue.c
>> @@ -1202,8 +1202,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
>>  		 * which has interface index (iif) as the first member of the
>>  		 * underlying inet{6}_skb_parm struct. This code then overlays
>>  		 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
>> -		 * element so the iif is picked up from the prior IPCB
>> +		 * element so the iif is picked up from the prior IPCB except
>> +		 * iif is loopback interface which the packet should be
>> +		 * considered as if it is being arrived via the sending
>> +		 * interface

That comment change could use an adjustment (adjust to fit with in the 80 columns):

element so the iif is picked up from the prior IPCB. If iif
is the loopback interface, then return the sending interface
(e.g., process binds socket to eth0 for Tx which is redirected
to loopback in the rtable/dst).


>>  		 */
>> +		if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
>> +			pktinfo->ipi_ifindex = inet_iif(skb);
>> +
>>  		pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
>>  	} else {
>>  		pktinfo->ipi_ifindex = 0;


The actual change looks ok to me.

Acked-by: David Ahern <dsa@cumulusnetworks.com>

^ permalink raw reply

* [PATCH v2] net: fix incorrect original ingress device index in PKTINFO
From: Wei Zhang @ 2016-12-29  8:45 UTC (permalink / raw)
  To: davem, kuznet, jmorris, yoshfuji, kaber, dsa; +Cc: netdev, linux-kernel

When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO")
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Wei Zhang <asuka.com@163.com>
---
v2:
 - add the missing Fixes line
 - better comment come from David Ahern

 net/ipv4/ip_sockglue.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 57e1405..53ae0c6 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1225,8 +1225,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
 		 * which has interface index (iif) as the first member of the
 		 * underlying inet{6}_skb_parm struct. This code then overlays
 		 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
-		 * element so the iif is picked up from the prior IPCB
+		 * element so the iif is picked up from the prior IPCB. If iif
+		 * is the loopback interface, then return the sending interface
+		 * (e.g., process binds socket to eth0 for Tx which is
+		 * redirected to loopback in the rtable/dst).
 		 */
+		if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
+			pktinfo->ipi_ifindex = inet_iif(skb);
+
 		pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
 	} else {
 		pktinfo->ipi_ifindex = 0;
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 05/12] Support for NIC-specific code
From: David VomLehn @ 2016-12-29  9:35 UTC (permalink / raw)
  To: Rami Rosen
  Cc: Netdev, Simon Edelhaus, Dmitrii Tarakanov, Alexander Loktionov,
	Pavel Belous
In-Reply-To: <CAKoUArn+zfeU16SgJbPxh1f0ud+M8ERi2j6c34cqeO=+LGjuCw@mail.gmail.com>

Responses inline.

On 12/27/2016 09:21 PM, Rami Rosen wrote:
> Hi, David,
>
> Several nitpicks and comments, from a brief overview:
>
> The commented label //err_exit:  should be removed
>> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
>> @@ -0,0 +1,993 @@
>> +//err_exit:
>> +//err_exit:
> Shouldn't aq_nic_rss_init() be static? isn't it called only from
> aq_nic_cfg_init_defaults()?
> and it always returns 0, shouldn't it be void as well ? (+ remove
> checking the return code when invoking it in
> aq_nic_cfg_init_defaults())
Yes, thanks.

>> +int aq_nic_rss_init(struct aq_nic_s *self, unsigned int num_rss_queues)
>> +{
>> +       struct aq_nic_cfg_s *cfg = &self->aq_nic_cfg;
>> +       struct aq_receive_scale_parameters *rss_params = &cfg->aq_rss;
>> +       int i = 0;
>> +
> ...
>> +       return 0;
>> +}
>
> Shouldn't aq_nic_ndev_alloc() be static ? Isn't it invoked only from
> aq_nic_alloc_cold()?
Yes.
>
>> +struct net_device *aq_nic_ndev_alloc(void)
>> +{
> ...
>> +}
>
>
>> +
>> +static unsigned int aq_nic_map_skb_lso(struct aq_nic_s *self,
>> +                                      struct sk_buff *skb,
>> +                                      struct aq_ring_buff_s *dx)
>> +{
>> +       unsigned int ret = 0U;
>> +
>> +       dx->flags = 0U;
>> +       dx->len_pkt = skb->len;
>> +       dx->len_l2 = ETH_HLEN;
>> +       dx->len_l3 = ip_hdrlen(skb);
>> +       dx->len_l4 = tcp_hdrlen(skb);
>> +       dx->mss = skb_shinfo(skb)->gso_size;
>> +       dx->is_txc = 1U;
>> +       ret = 1U;
>> +
> Why not remove this "ret" variable, and simply return 1 ? the method
> always returns 1:
>
>> +       return ret;
>> +}
>> +
Yes, better.
>> +int aq_nic_xmit(struct aq_nic_s *self, struct sk_buff *skb)
>> +{
>> +       struct aq_ring_s *ring = NULL;
>> +       unsigned int frags = 0U;
>> +       unsigned int vec = skb->queue_mapping % self->aq_nic_cfg.vecs;
>> +       unsigned int tc = 0U;
>> +       int err = 0;
>> +       bool is_nic_in_bad_state;
>> +       bool is_locked = false;
>> +       bool is_busy = false;
>> +       struct aq_ring_buff_s buffers[AQ_CFG_SKB_FRAGS_MAX];
>> +
>> +       frags = skb_shinfo(skb)->nr_frags + 1;
>> +
>> +       ring = self->aq_ring_tx[AQ_NIC_TCVEC2RING(self, tc, vec)];
>> +
>> +       atomic_inc(&self->busy_count);
>> +       is_busy = true;
>> +
>> +       if (frags > AQ_CFG_SKB_FRAGS_MAX) {
>> +               dev_kfree_skb_any(skb);
>> +               goto err_exit;
>> +       }
>> +
>> +       is_nic_in_bad_state = AQ_OBJ_TST(self, AQ_NIC_FLAGS_IS_NOT_TX_READY) ||
>> +                               (aq_ring_avail_dx(ring) < AQ_CFG_SKB_FRAGS_MAX);
>> +
>> +       if (is_nic_in_bad_state) {
>> +               aq_nic_ndev_queue_stop(self, ring->idx);
>> +               err = NETDEV_TX_BUSY;
>> +               goto err_exit;
>> +       }
>> +
> Usage of this internal block is not common (unless it is under #ifdef,
> and also not very common also in that case). I suggest move "unsigned
> int trys" to the variables definitions in the beginning of the method
> and remove the opening and closing brackets of the following block:
>> +       {
>> +               unsigned int trys = AQ_CFG_LOCK_TRYS;
>> +
>> +               frags = aq_nic_map_skb(self, skb, &buffers[0]);
>> +
>> +               do {
>> +                       is_locked = spin_trylock(&ring->lock);
>> +               } while (--trys && !is_locked);
>> +               if (!(is_locked)) {
>> +                       err = NETDEV_TX_BUSY;
>> +                       goto err_exit;
>> +               }
>> +
Yes, this is better.
> Usually you don't let the mtu be less than 68, for example:
> http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/i40e/i40e_main.c#L2246
> See also RFV 791:
> https://tools.ietf.org/html/rfc791
>
>
>> +int aq_nic_set_mtu(struct aq_nic_s *self, int new_mtu)
>> +{
>> +       int err = 0;
>> +
>> +       if (new_mtu > self->aq_hw_caps.mtu) {
>> +               err = 0;
>> +               goto err_exit;
>> +       }
>> +       self->aq_nic_cfg.mtu = new_mtu;
>> +
>> +err_exit:
>> +       return err;
>> +}
Clearly a must--thanks!
>> +
>> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
>> new file mode 100644
>> index 0000000..89958e7
>> --- /dev/null
>> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
>> @@ -0,0 +1,111 @@
>> +/*
>> + * Aquantia Corporation Network Driver
>> + * Copyright (C) 2014-2016 Aquantia Corporation. All rights reserved
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + */
>> +
>> +/*
> Should be, of course, aq_nic.h:
>
>> + * File aq_nic.c: Declaration of common code for NIC.
>> + */
>> +
Good point. Better still, including the name of the file has little 
value and makes the comment incorrect if it gets renamed. So, thanks!
> Regards,
> Rami Rosen


-- 
David VL

^ permalink raw reply

* RE: [PATCH net] net: stmmac: Fix error path after register_netdev move
From: Kweh, Hock Leong @ 2016-12-29  9:36 UTC (permalink / raw)
  To: Florian Fainelli, David Miller
  Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
	seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
	manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
	pavel@ucw.cz, Ong, Boon Leong, Voon, Weifeng,
	lars.persson@axis.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Giuseppe Cavallaro,
	Alexandre Torgue
In-Reply-To: <D6759987A7968C4889FDA6FA91D5CBC801CA3364@PGSMSX103.gar.corp.intel.com>

> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Thursday, December 29, 2016 7:45 AM
> To: netdev@vger.kernel.org
> Cc: Florian Fainelli <f.fainelli@gmail.com>; pavel@ucw.cz;
> Joao.Pinto@synopsys.com; seraphin.bonnaffe@st.com;
> alexandre.torgue@gmail.com; manabian@gmail.com; niklas.cassel@axis.com;
> johan@kernel.org; Ong, Boon Leong <boon.leong.ong@intel.com>; Voon,
> Weifeng <weifeng.voon@intel.com>; lars.persson@axis.com; linux-
> kernel@vger.kernel.org; Giuseppe Cavallaro <peppe.cavallaro@st.com>;
> Alexandre Torgue <alexandre.torgue@st.com>
> Subject: [PATCH net] net: stmmac: Fix error path after register_netdev move
> 
> Commit 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and
> stmmac_open") re-ordered how the MDIO bus registration and the network
> device are registered, but missed to unwind the MDIO bus registration in case
> we fail to register the network device.
> 
> Fixes: 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and
> stmmac_open")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---

Acked-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>



>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 5910ea51f8f6..39eb7a65bb9f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3366,12 +3366,19 @@ int stmmac_dvr_probe(struct device *device,
>  	}
> 
>  	ret = register_netdev(ndev);
> -	if (ret)
> +	if (ret) {
>  		netdev_err(priv->dev, "%s: ERROR %i registering the device\n",
>  			   __func__, ret);
> +		goto error_netdev_register;
> +	}
> 
>  	return ret;
> 
> +error_netdev_register:
> +	if (priv->hw->pcs != STMMAC_PCS_RGMII &&
> +	    priv->hw->pcs != STMMAC_PCS_TBI &&
> +	    priv->hw->pcs != STMMAC_PCS_RTBI)
> +		stmmac_mdio_unregister(ndev);
>  error_mdio_register:
>  	netif_napi_del(&priv->napi);
>  error_hw_init:
> --
> 2.9.3

^ permalink raw reply

* [PATCH v4] net: dev_weight: TX/RX orthogonality
From: Matthias Tafelmeier @ 2016-12-29  9:58 UTC (permalink / raw)
  To: netdev; +Cc: hagen, fw, edumazet, daniel
In-Reply-To: <20161228.141751.81302085672323860.davem@davemloft.net>

Oftenly, introducing side effects on packet processing on the other half
of the stack by adjusting one of TX/RX via sysctl is not desirable.
There are cases of demand for asymmetric, orthogonal configurability.

This holds true especially for nodes where RPS for RFS usage on top is
configured and therefore use the 'old dev_weight'. This is quite a
common base configuration setup nowadays, even with NICs of superior processing
support (e.g. aRFS).

A good example use case are nodes acting as noSQL data bases with a
large number of tiny requests and rather fewer but large packets as responses.
It's affordable to have large budget and rx dev_weights for the
requests. But as a side effect having this large a number on TX
processed in one run can overwhelm drivers.

This patch therefore introduces an independent configurability via sysctl to
userland.
---
 Documentation/sysctl/net.txt | 21 +++++++++++++++++++++
 include/linux/netdevice.h    |  4 ++++
 net/core/dev.c               |  6 +++++-
 net/core/sysctl_net_core.c   | 31 ++++++++++++++++++++++++++++++-
 net/sched/sch_generic.c      |  2 +-
 5 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index f0480f7..53cef32 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -61,6 +61,27 @@ The maximum number of packets that kernel can handle on a NAPI interrupt,
 it's a Per-CPU variable.
 Default: 64
 
+dev_weight_rx_bias
+--------------
+
+RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
+of the driver for the per softirq cycle netdev_budget. This parameter influences
+the proportion of the configured netdev_budget that is spent on RPS based packet
+processing during RX softirq cycles. It is further meant for making current
+dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
+(see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
+on dev_weight and is calculated multiplicative (dev_weight * dev_weight_rx_bias).
+Default: 1
+
+dev_weight_tx_bias
+--------------
+
+Scales the maximum number of packets that can be processed during a TX softirq cycle.
+Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
+net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
+Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
+Default: 1
+
 default_qdisc
 --------------
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 994f742..ecd78b3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3795,6 +3795,10 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 extern int		netdev_max_backlog;
 extern int		netdev_tstamp_prequeue;
 extern int		weight_p;
+extern int		dev_weight_rx_bias;
+extern int		dev_weight_tx_bias;
+extern int		dev_rx_weight;
+extern int		dev_tx_weight;
 
 bool netdev_has_upper_dev(struct net_device *dev, struct net_device *upper_dev);
 struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index 8db5a0b..f2fe98b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3428,6 +3428,10 @@ EXPORT_SYMBOL(netdev_max_backlog);
 int netdev_tstamp_prequeue __read_mostly = 1;
 int netdev_budget __read_mostly = 300;
 int weight_p __read_mostly = 64;            /* old backlog weight */
+int dev_weight_rx_bias __read_mostly = 1;            /* bias for backlog weight */
+int dev_weight_tx_bias __read_mostly = 1;            /* bias for output_queue quota */
+int dev_rx_weight __read_mostly = weight_p;
+int dev_tx_weight __read_mostly = weight_p;
 
 /* Called with irq disabled */
 static inline void ____napi_schedule(struct softnet_data *sd,
@@ -4833,7 +4837,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		net_rps_action_and_irq_enable(sd);
 	}
 
-	napi->weight = weight_p;
+	napi->weight = dev_rx_weight;
 	while (again) {
 		struct sk_buff *skb;
 
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 2a46e40..698ddd7 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -222,6 +222,21 @@ static int set_default_qdisc(struct ctl_table *table, int write,
 }
 #endif
 
+static int proc_do_dev_weight(struct ctl_table *table, int write,
+			   void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (ret != 0)
+		return ret;
+
+	dev_rx_weight = weight_p * dev_weight_rx_bias;
+	dev_tx_weight = weight_p * dev_weight_tx_bias;
+
+	return ret;
+}
+
 static int proc_do_rss_key(struct ctl_table *table, int write,
 			   void __user *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -273,7 +288,21 @@ static struct ctl_table net_core_table[] = {
 		.data		= &weight_p,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_do_dev_weight,
+	},
+	{
+		.procname	= "dev_weight_rx_bias",
+		.data		= &dev_weight_rx_bias,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_do_dev_weight,
+	},
+	{
+		.procname	= "dev_weight_tx_bias",
+		.data		= &dev_weight_tx_bias,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_do_dev_weight,
 	},
 	{
 		.procname	= "netdev_max_backlog",
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6eb9c8e..b052b27 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -247,7 +247,7 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
 
 void __qdisc_run(struct Qdisc *q)
 {
-	int quota = weight_p;
+	int quota = dev_tx_weight;
 	int packets;
 
 	while (qdisc_restart(q, &packets)) {
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] vif queue counters from int to long
From: Wei Liu @ 2016-12-29 10:47 UTC (permalink / raw)
  To: Mart van Santen; +Cc: netdev, wei.liu2
In-Reply-To: <585D3E23.10509@greenhost.nl>

On Fri, Dec 23, 2016 at 04:09:23PM +0100, Mart van Santen wrote:
> 
> Hello,
> 
> This patch fixes an issue where counters in the queue have type int,
> while the counters of the vif itself are specified as long. This can
> cause incorrect reporting of tx/rx values of the vif interface.
> More extensively reported on xen-devel mailinglist.
> 

Hello,

Please also CC xen-devel@lists.xenproject.org for your future patch(es).
And please note that the most up to date maintainer information should
be used.

Wei.

> 
> 
> Signed-off-by: Mart van Santen <mart@greenhost.nl>
> --- a/drivers/net/xen-netback/common.h  2016-12-22 15:41:07.785535748 +0000
> +++ b/drivers/net/xen-netback/common.h  2016-12-23 13:08:18.123080064 +0000
> @@ -113,10 +113,10 @@ struct xenvif_stats {
>          * A subset of struct net_device_stats that contains only the
>          * fields that are updated in netback.c for each queue.
>          */
> -       unsigned int rx_bytes;
> -       unsigned int rx_packets;
> -       unsigned int tx_bytes;
> -       unsigned int tx_packets;
> +       unsigned long rx_bytes;
> +       unsigned long rx_packets;
> +       unsigned long tx_bytes;
> +       unsigned long tx_packets;
> 
>         /* Additional stats used by xenvif */
>         unsigned long rx_gso_checksum_fixup;
> 
> -- 
> Mart van Santen
> Greenhost
> E: mart@greenhost.nl
> T: +31 20 4890444
> W: https://greenhost.nl
> 
> A PGP signature can be attached to this e-mail,
> you need PGP software to verify it. 
> My public key is available in keyserver(s)
> see: http://tinyurl.com/openpgp-manual
> 
> PGP Fingerprint: CA85 EB11 2B70 042D AF66  B29A 6437 01A1 10A3 D3A5
> 
> 

^ permalink raw reply

* [PATCH] scm: fix possible control message header alignment issue
From: yuan linyu @ 2016-12-29 12:10 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller, yuan linyu

From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>

1. put_cmsg{_compat}() may copy data to user when buffer free space less than
   control message header alignment size.
2. scm_detach_fds{_compat}() may calc wrong fdmax if control message header
   have greater alignment size.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
---
 net/compat.c   | 10 ++++++++--
 net/core/scm.c |  8 +++++---
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 96c544b..fe1f41c 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -245,7 +245,9 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 
 	if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
 		return -EFAULT;
-	if (copy_to_user(CMSG_COMPAT_DATA(cm), data, cmlen - sizeof(struct compat_cmsghdr)))
+	if (cmlen > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) && 
+	    copy_to_user(CMSG_COMPAT_DATA(cm), data, 
+			cmlen - CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))))
 		return -EFAULT;
 	cmlen = CMSG_COMPAT_SPACE(len);
 	if (kmsg->msg_controllen < cmlen)
@@ -258,12 +260,16 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
 {
 	struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
-	int fdmax = (kmsg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
+	int fdmax = 0;
 	int fdnum = scm->fp->count;
 	struct file **fp = scm->fp->fp;
 	int __user *cmfptr;
 	int err = 0, i;
 
+	if (kmsg->msg_controllen > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)))
+		fdmax = (kmsg->msg_controllen - 
+		         CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))) / sizeof(int);
+
 	if (fdnum < fdmax)
 		fdmax = fdnum;
 
diff --git a/net/core/scm.c b/net/core/scm.c
index d882043..5d8ef4f 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -238,7 +238,9 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
 	err = -EFAULT;
 	if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
 		goto out;
-	if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr)))
+	if (cmlen > CMSG_ALIGN(sizeof(struct cmsghdr)) && 
+	    copy_to_user(CMSG_DATA(cm), data, 
+			cmlen - CMSG_ALIGN(sizeof(struct cmsghdr))))
 		goto out;
 	cmlen = CMSG_SPACE(len);
 	if (msg->msg_controllen < cmlen)
@@ -267,8 +269,8 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 		return;
 	}
 
-	if (msg->msg_controllen > sizeof(struct cmsghdr))
-		fdmax = ((msg->msg_controllen - sizeof(struct cmsghdr))
+	if (msg->msg_controllen > CMSG_ALIGN(sizeof(struct cmsghdr)))
+		fdmax = ((msg->msg_controllen - CMSG_ALIGN(sizeof(struct cmsghdr)))
 			 / sizeof(int));
 
 	if (fdnum < fdmax)
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2] scm: fix possible control message header alignment issue
From: yuan linyu @ 2016-12-29 12:39 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller, yuan linyu

From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>

1. put_cmsg{_compat}() may copy data to user when buffer free space less than
   control message header alignment size.
2. scm_detach_fds{_compat}() may calc wrong fdmax if control message header
   have greater alignment size.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
---
 net/compat.c   | 10 ++++++++--
 net/core/scm.c |  8 +++++---
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 96c544b..ffe7a04 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -245,7 +245,9 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 
 	if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
 		return -EFAULT;
-	if (copy_to_user(CMSG_COMPAT_DATA(cm), data, cmlen - sizeof(struct compat_cmsghdr)))
+	if (cmlen > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) &&
+	    copy_to_user(CMSG_COMPAT_DATA(cm), data,
+			    cmlen - CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))))
 		return -EFAULT;
 	cmlen = CMSG_COMPAT_SPACE(len);
 	if (kmsg->msg_controllen < cmlen)
@@ -258,12 +260,16 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
 {
 	struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
-	int fdmax = (kmsg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
+	int fdmax = 0;
 	int fdnum = scm->fp->count;
 	struct file **fp = scm->fp->fp;
 	int __user *cmfptr;
 	int err = 0, i;
 
+	if (kmsg->msg_controllen > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)))
+		fdmax = (kmsg->msg_controllen -
+		         CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr))) / sizeof(int);
+
 	if (fdnum < fdmax)
 		fdmax = fdnum;
 
diff --git a/net/core/scm.c b/net/core/scm.c
index d882043..b2e60fd 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -238,7 +238,9 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
 	err = -EFAULT;
 	if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
 		goto out;
-	if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr)))
+	if (cmlen > CMSG_ALIGN(sizeof(struct cmsghdr)) &&
+	    copy_to_user(CMSG_DATA(cm), data,
+			    cmlen - CMSG_ALIGN(sizeof(struct cmsghdr))))
 		goto out;
 	cmlen = CMSG_SPACE(len);
 	if (msg->msg_controllen < cmlen)
@@ -267,8 +269,8 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 		return;
 	}
 
-	if (msg->msg_controllen > sizeof(struct cmsghdr))
-		fdmax = ((msg->msg_controllen - sizeof(struct cmsghdr))
+	if (msg->msg_controllen > CMSG_ALIGN(sizeof(struct cmsghdr)))
+		fdmax = ((msg->msg_controllen - CMSG_ALIGN(sizeof(struct cmsghdr)))
 			 / sizeof(int));
 
 	if (fdnum < fdmax)
-- 
2.7.4

^ permalink raw reply related

* [PATCH] ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int
From: Pavel Tikhomirov @ 2016-12-29 14:35 UTC (permalink / raw)
  To: David S . Miller
  Cc: Eric Dumazet, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, linux-kernel, Konstantin Khorenko,
	Pavel Tikhomirov

> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648

but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
 net/ipv4/sysctl_net_ipv4.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 80bc36b..5361373 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -41,6 +41,7 @@ static int tcp_syn_retries_min = 1;
 static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
 static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
+static unsigned int uint_max = UINT_MAX;
 
 /* Update system visible IP port range */
 static void set_local_port_range(struct net *net, int range[2])
@@ -958,7 +959,9 @@ static struct ctl_table ipv4_net_table[] = {
 		.data		= &init_net.ipv4.sysctl_tcp_notsent_lowat,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_doulongvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &uint_max,
 	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net] sctp: do not loose window information if in rwnd_over
From: Neil Horman @ 2016-12-29 15:42 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netdev, linux-sctp, Vlad Yasevich
In-Reply-To: <14d1eef1c9da4fcebc53d428a59ecaefab439b2c.1482510284.git.marcelo.leitner@gmail.com>

On Fri, Dec 23, 2016 at 02:29:02PM -0200, Marcelo Ricardo Leitner wrote:
> It's possible that we receive a packet that is larger than current
> window. If it's the first packet in this way, it will cause it to
> increase rwnd_over. Then, if we receive another data chunk (specially as
> SCTP allows you to have one data chunk in flight even during 0 window),
> rwnd_over will be overwritten instead of added to.
> 
> In the long run, this could cause the window to grow bigger than its
> initial size, as rwnd_over would be charged only for the last received
> data chunk while the code will try open the window for all packets that
> were received and had its value in rwnd_over overwritten. This, then,
> can lead to the worsening of payload/buffer ratio and cause rwnd_press
> to kick in more often.
> 
> The fix is to sum it too, same as is done for rwnd_press, so that if we
> receive 3 chunks after closing the window, we still have to release that
> same amount before re-opening it.
> 
> Log snippet from sctp_test exhibiting the issue:
> [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
> rwnd decreased by 1 to (0, 1, 114221)
> [  146.209232] sctp: sctp_assoc_rwnd_decrease:
> association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
> [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
> rwnd decreased by 1 to (0, 1, 114221)
> [  146.209232] sctp: sctp_assoc_rwnd_decrease:
> association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
> [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
> rwnd decreased by 1 to (0, 1, 114221)
> [  146.209232] sctp: sctp_assoc_rwnd_decrease:
> association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
> [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
> rwnd decreased by 1 to (0, 1, 114221)
> 
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
>  net/sctp/associola.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 68428e1f71810fbe65b7f86c750c3ad61f0266ec..56ddcfaeb4f64235b54b0daa915fabf0cc0170a9 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1539,7 +1539,7 @@ void sctp_assoc_rwnd_decrease(struct sctp_association *asoc, unsigned int len)
>  			asoc->rwnd = 0;
>  		}
>  	} else {
> -		asoc->rwnd_over = len - asoc->rwnd;
> +		asoc->rwnd_over += len - asoc->rwnd;
>  		asoc->rwnd = 0;
>  	}
>  
> -- 
> 2.9.3
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* [PATCH] rtlwifi: fix spelling mistake: "contry" -> "country"
From: Colin King @ 2016-12-29 16:00 UTC (permalink / raw)
  To: Larry Finger, Chaoming Li, Kalle Valo, linux-wireless, netdev
  Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

trivial fix to spelling mistake in RT_TRACE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/realtek/rtlwifi/regd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/regd.c b/drivers/net/wireless/realtek/rtlwifi/regd.c
index 6ee6bf8..558c31b 100644
--- a/drivers/net/wireless/realtek/rtlwifi/regd.c
+++ b/drivers/net/wireless/realtek/rtlwifi/regd.c
@@ -440,7 +440,7 @@ int rtl_regd_init(struct ieee80211_hw *hw,
 
 	if (rtlpriv->regd.country_code >= COUNTRY_CODE_MAX) {
 		RT_TRACE(rtlpriv, COMP_REGD, DBG_DMESG,
-			 "rtl: EEPROM indicates invalid contry code, world wide 13 should be used\n");
+			 "rtl: EEPROM indicates invalid country code, world wide 13 should be used\n");
 
 		rtlpriv->regd.country_code = COUNTRY_CODE_WORLD_WIDE_13;
 	}
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH] net: ethernet: ti: davinci_cpdma: fix access to uninitialized variable in cpdma_chan_set_descs()
From: Grygorii Strashko @ 2016-12-29 16:04 UTC (permalink / raw)
  To: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap
In-Reply-To: <20161229014904.GA22718@khorivan>

Hi Ivan,

On 12/28/2016 07:49 PM, Ivan Khoronzhuk wrote:
> On Wed, Dec 28, 2016 at 05:42:13PM -0600, Grygorii Strashko wrote:
>> Now below code sequence causes "Unable to handle kernel NULL pointer
>> dereference.." exception and system crash during CPSW CPDMA initialization:
>>
>> cpsw_probe
>> |-cpdma_chan_create (TX channel)
>>   |-cpdma_chan_split_pool
>>     |-cpdma_chan_set_descs(for TX channels)
>>     |-cpdma_chan_set_descs(for RX channels) [1]
>>
>> - and -
>> static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
>> 				 int rx, int desc_num,
>> 				 int per_ch_desc)
>> {
>> 	struct cpdma_chan *chan, *most_chan = NULL;
>>
>> ...
>>
>> 	for (i = min; i < max; i++) {
>> 		chan = ctlr->channels[i];
>> 		if (!chan)
>> 			continue;
>> ...
>>
>> 		if (most_dnum < chan->desc_num) {
>> 			most_dnum = chan->desc_num;
>> 			most_chan = chan;
>> 		}
>> 	}
>> 	/* use remains */
>> 	most_chan->desc_num += desc_cnt; [2]
>> }
>>
>> So, most_chan value will never be reassigned when cpdma_chan_set_descs() is
>> called second time [1], because there are no RX channels yet and system
>> will crash at [2].
> 
> How did you get this?
> I just remember as I fixed it before sending patchset.
> 
> Maybe it was some experiment with it.
> I just wonder and want to find actual reason what's happening.
> 
> Look bellow:
> 
> cpsw_probe
> |-cpdma_chan_create (TX channel)
>   |-cpdma_chan_split_pool
>     |-cpdma_chan_set_descs(for TX channels)
>     |-cpdma_chan_set_descs(for RX channels) [1]
> 
> |-cpdma_chan_set_descs(for RX channels) in case you'be described has to be
> called with rx_desc_num = 0, because all descs are assigned already for tx
> channel. And, if desc_num = 0, cpdma_chan_set_descs just exits and no issues.
> So, could you please explain how you get this, in what circumstances.

You are right. I've hit this issue while working on other feature which allows
to split pool between RX and TX path and as part of it cpdma_chan_set_descs()
is called with different set of arguments. I probably will just squash it in my changes or 
or send as part of my series.


-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH net-next V2 3/3] tun: rx batching
From: David Miller @ 2016-12-29 16:35 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <1482912571-3157-4-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Wed, 28 Dec 2016 16:09:31 +0800

> +	spin_lock(&queue->lock);
> +	qlen = skb_queue_len(queue);
> +	if (qlen > rx_batched)
> +		goto drop;
> +	__skb_queue_tail(queue, skb);
> +	if (!more || qlen + 1 > rx_batched) {
> +		__skb_queue_head_init(&process_queue);
> +		skb_queue_splice_tail_init(queue, &process_queue);
> +		rcv = true;
> +	}
> +	spin_unlock(&queue->lock);

Since you always clear the 'queue' when you insert the skb that hits
the limit, I don't see how the "goto drop" path can be possibly taken.

^ permalink raw reply

* [PATCH net 2/5] net/mlx4_en: Fix bad WQE issue
From: Tariq Toukan @ 2016-12-29 16:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Eugenia Emantayev, Tariq Toukan
In-Reply-To: <1483029433-3624-1-git-send-email-tariqt@mellanox.com>

From: Eugenia Emantayev <eugenia@mellanox.com>

Single send WQE in RX buffer should be stamped with software
ownership in order to prevent the flow of QP in error in FW
once UPDATE_QP is called.


Fixes: 9f519f68cfff ('mlx4_en: Not using Shared Receive Queues')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 3c37e216bbf3..eac527e25ec9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -445,8 +445,14 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
 		ring->cqn = priv->rx_cq[ring_ind]->mcq.cqn;
 
 		ring->stride = stride;
-		if (ring->stride <= TXBB_SIZE)
+		if (ring->stride <= TXBB_SIZE) {
+			/* Stamp first unused send wqe */
+			__be32 *ptr = (__be32 *)ring->buf;
+			__be32 stamp = cpu_to_be32(1 << STAMP_SHIFT);
+			*ptr = stamp;
+			/* Move pointer to start of rx section */
 			ring->buf += TXBB_SIZE;
+		}
 
 		ring->log_stride = ffs(ring->stride) - 1;
 		ring->buf_size = ring->size * ring->stride;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 1/5] net/mlx4_core: Use-after-free causes a resource leak in flow-steering detach
From: Tariq Toukan @ 2016-12-29 16:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Jack Morgenstein, Tariq Toukan
In-Reply-To: <1483029433-3624-1-git-send-email-tariqt@mellanox.com>

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

mlx4_QP_FLOW_STEERING_DETACH_wrapper first removes the steering
rule (which results in freeing the rule structure), and then
references a field in this struct (the qp number) when releasing the
busy-status on the rule's qp.

Since this memory was freed, it could reallocated and changed.
Therefore, the qp number in the struct may be incorrect,
so that we are releasing the incorrect qp. This leaves the rule's qp
in the busy state (and could possibly release an incorrect qp as well).

Fix this by saving the qp number in a local variable, for use after
removing the steering rule.

Fixes: 2c473ae7e582 ("net/mlx4_core: Disallow releasing VF QPs which have steering rules")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index c548beaaf910..4b3e139e9c82 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -4473,6 +4473,7 @@ int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
 	struct res_qp *rqp;
 	struct res_fs_rule *rrule;
 	u64 mirr_reg_id;
+	int qpn;

 	if (dev->caps.steering_mode !=
 	    MLX4_STEERING_MODE_DEVICE_MANAGED)
@@ -4489,10 +4490,11 @@ int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
 	}
 	mirr_reg_id = rrule->mirr_rule_id;
 	kfree(rrule->mirr_mbox);
+	qpn = rrule->qpn;

 	/* Release the rule form busy state before removal */
 	put_res(dev, slave, vhcr->in_param, RES_FS_RULE);
-	err = get_res(dev, slave, rrule->qpn, RES_QP, &rqp);
+	err = get_res(dev, slave, qpn, RES_QP, &rqp);
 	if (err)
 		return err;

@@ -4517,7 +4519,7 @@ int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
 	if (!err)
 		atomic_dec(&rqp->ref_count);
 out:
-	put_res(dev, slave, rrule->qpn, RES_QP);
+	put_res(dev, slave, qpn, RES_QP);
 	return err;
 }

-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 3/5] net/mlx4: Remove BUG_ON from ICM allocation routine
From: Tariq Toukan @ 2016-12-29 16:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Leon Romanovsky, Tariq Toukan
In-Reply-To: <1483029433-3624-1-git-send-email-tariqt@mellanox.com>

From: Leon Romanovsky <leonro@mellanox.com>

This patch removes BUG_ON() macro from mlx4_alloc_icm_coherent()
by checking DMA address alignment in advance and performing proper
folding in case of error.

Fixes: 5b0bf5e25efe ("mlx4_core: Support ICM tables in coherent memory")
Reported-by: Ozgur Karatas <okaratas@member.fsf.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/icm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 2a9dd460a95f..e1f9e7cebf8f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -118,8 +118,13 @@ static int mlx4_alloc_icm_coherent(struct device *dev, struct scatterlist *mem,
 	if (!buf)
 		return -ENOMEM;
 
+	if (offset_in_page(buf)) {
+		dma_free_coherent(dev, PAGE_SIZE << order,
+				  buf, sg_dma_address(mem));
+		return -ENOMEM;
+	}
+
 	sg_set_buf(mem, buf, PAGE_SIZE << order);
-	BUG_ON(mem->offset);
 	sg_dma_len(mem) = PAGE_SIZE << order;
 	return 0;
 }
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 4/5] net/mlx4_en: Fix type mismatch for 32-bit systems
From: Tariq Toukan @ 2016-12-29 16:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Slava Shwartsman, Tariq Toukan
In-Reply-To: <1483029433-3624-1-git-send-email-tariqt@mellanox.com>

From: Slava Shwartsman <slavash@mellanox.com>

is_power_of_2 expects unsigned long and we pass u64 max_val_cycles,
this will be truncated on 32 bit systems, and the result is not what we
were expecting.
div_u64 expects u32 as a second argument and we pass
max_val_cycles_rounded which is u64 hence it will always be truncated.
Fix was tested on both 64 and 32 bit systems and got same results for
max_val_cycles and max_val_cycles_rounded.

Fixes: 4850cf458157 ("net/mlx4_en: Resolve dividing by zero in 32-bit system")
Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_clock.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 015198c14fa8..504461a464c5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -245,13 +245,9 @@ static u32 freq_to_shift(u16 freq)
 {
 	u32 freq_khz = freq * 1000;
 	u64 max_val_cycles = freq_khz * 1000 * MLX4_EN_WRAP_AROUND_SEC;
-	u64 tmp_rounded =
-		roundup_pow_of_two(max_val_cycles) > max_val_cycles ?
-		roundup_pow_of_two(max_val_cycles) - 1 : UINT_MAX;
-	u64 max_val_cycles_rounded = is_power_of_2(max_val_cycles + 1) ?
-		max_val_cycles : tmp_rounded;
+	u64 max_val_cycles_rounded = 1ULL << fls64(max_val_cycles - 1);
 	/* calculate max possible multiplier in order to fit in 64bit */
-	u64 max_mul = div_u64(0xffffffffffffffffULL, max_val_cycles_rounded);
+	u64 max_mul = div64_u64(ULLONG_MAX, max_val_cycles_rounded);

 	/* This comes from the reverse of clocksource_khz2mult */
 	return ilog2(div_u64(max_mul * freq_khz, 1000000));
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 0/5] mlx4 misc fixes
From: Tariq Toukan @ 2016-12-29 16:37 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan

Hi Dave,

This patchset contains several bug fixes from the team to the
mlx4 Eth and Core drivers.

Series generated against net commit:
60133867f1f1 'net: wan: slic_ds26522: fix spelling mistake: "configurated" -> "configured"'

Thanks,
Tariq.

Eugenia Emantayev (1):
  net/mlx4_en: Fix bad WQE issue

Jack Morgenstein (2):
  net/mlx4_core: Use-after-free causes a resource leak in flow-steering
    detach
  net/mlx4_core: Fix raw qp flow steering rules under SRIOV

Leon Romanovsky (1):
  net/mlx4: Remove BUG_ON from ICM allocation routine

Slava Shwartsman (1):
  net/mlx4_en: Fix type mismatch for 32-bit systems

 drivers/infiniband/hw/mlx4/main.c                  | 14 +++++++++--
 drivers/net/ethernet/mellanox/mlx4/en_clock.c      |  8 ++-----
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |  8 ++++++-
 drivers/net/ethernet/mellanox/mlx4/icm.c           |  7 +++++-
 drivers/net/ethernet/mellanox/mlx4/main.c          | 18 ++++++++++++++
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  | 28 ++++------------------
 include/linux/mlx4/device.h                        |  2 ++
 7 files changed, 52 insertions(+), 33 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox