Netdev List
 help / color / mirror / Atom feed
* [PATCH v3 0/3] dp83640: Get pin and master/slave configuration from DT
From: Stefan Sørensen @ 2014-02-13 14:35 UTC (permalink / raw)
  To: richardcochran, grant.likely, robh+dt, mark.rutland, devicetree,
	linux-kernel, netdev
  Cc: Stefan Sørensen

This patch series add DT configuration to the DP83640 PHY driver and makes
the configuration of periodic output pins dynamic.

Changes since v2:
 - Add patch to properly configure perout triggers 0+1
 - Keep extts and perout numbers separate
 - Shorten pr_err lines

Changes since v1:
 - Add bindings documentation
 - Keep module parameters
 - Rename gpio->pin
 - Split patch into DT part and dynamic periodic output support

Stefan Sørensen (3):
  net:phy:dp83640: Program pulsewidth2 values of perout triggers 0 and 1
  net:phy:dp83640: Support a configurable number of periodic outputs
  net:phy:dp83640: Get pin and master/slave configuration from DT

 Documentation/devicetree/bindings/net/dp83640.txt |  29 ++++
 drivers/net/phy/dp83640.c                         | 199 +++++++++++++++-------
 2 files changed, 170 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/dp83640.txt

-- 
1.8.5.3

^ permalink raw reply

* [PATCH v3 1/3] net:phy:dp83640: Program pulsewidth2 values of perout triggers 0 and 1
From: Stefan Sørensen @ 2014-02-13 14:35 UTC (permalink / raw)
  To: richardcochran, grant.likely, robh+dt, mark.rutland, devicetree,
	linux-kernel, netdev
  Cc: Stefan Sørensen
In-Reply-To: <1392302129-24947-1-git-send-email-stefan.sorensen@spectralink.com>

Periodic output triggers 0 and 1 of the dp83640 has a programmable
duty-cycle which is controlled by the Pulsewidth2 field of the trigger
data register.  This field is not documented in the datasheet, but it
is described in the "PHYTER Software Development Guide" section
3.1.4.1. Failing to set the field can result in unstable/no trigger
output.

This patch add programming of the Pulsewidth2 field, setting it to the
same value as the Pulsewidth field for a 50% duty cycle.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
---
 drivers/net/phy/dp83640.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 5ff221d..a370814 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -312,6 +312,11 @@ static void periodic_output(struct dp83640_clock *clock,
 	ext_write(0, phydev, PAGE4, PTP_TDR, sec >> 16);       /* sec[31:16] */
 	ext_write(0, phydev, PAGE4, PTP_TDR, period & 0xffff); /* ns[15:0] */
 	ext_write(0, phydev, PAGE4, PTP_TDR, period >> 16);    /* ns[31:16] */
+	/* Triggers 0 and 1 has programmable pulsewidth2 */
+	if (trigger == 0 || trigger == 1) {
+		ext_write(0, phydev, PAGE4, PTP_TDR, period & 0xffff);
+		ext_write(0, phydev, PAGE4, PTP_TDR, period >> 16);
+	}
 
 	/*enable trigger*/
 	val &= ~TRIG_LOAD;
-- 
1.8.5.3

^ permalink raw reply related

* Re: [PATCH next resend] tcp: use zero-window when free_space is low
From: Eric Dumazet @ 2014-02-13 14:58 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Neal Cardwell, Yuchung Cheng
In-Reply-To: <1392292350-28800-1-git-send-email-fw@strlen.de>

On Thu, 2014-02-13 at 12:52 +0100, Florian Westphal wrote:
> Currently the kernel tries to announce a zero window when free_space
> is below the current receiver mss estimate.
> 
> When a sender is transmitting small packets and reader consumes data
> slowly (or not at all), receiver might be unable to shrink the receive
> win because
> 
> a) we cannot withdraw already-commited receive window, and,
> b) we have to round the current rwin up to a multiple of the wscale
>    factor, else we would shrink the current window.
> 
> This causes the receive buffer to fill up until the rmem limit is hit.
> When this happens, we start dropping packets.
> 
> Moreover, tcp_clamp_window may continue to grow sk_rcvbuf towards rmem[2]
> even if socket is not being read from.
> 
> As we cannot avoid the "current_win is rounded up to multiple of mss"
> issue [we would violate a) above] at least try to prevent the receive buf
> growth towards tcp_rmem[2] limit by attempting to move to zero-window
> announcement when free_space becomes less than 1/16 of the current
> allowed receive buffer maximum.  If tcp_rmem[2] is large, this will
> increase our chances to get a zero-window announcement out in time.
> 
> Reproducer:
> On server:
> $ nc -l -p 12345
> <suspend it: CTRL-Z>
> 
> Client:
> #!/usr/bin/env python
> import socket
> import time
> 
> sock = socket.socket()
> sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
> sock.connect(("192.168.4.1", 12345));
> while True:
>    sock.send('A' * 23)
>    time.sleep(0.005)
> 
> 
> socket buffer on server-side will grow until tcp_rmem[2] is hit,
> at which point the client rexmits data until -EDTIMEOUT:
> 
> tcp_data_queue invokes tcp_try_rmem_schedule which will call
> tcp_prune_queue which calls tcp_clamp_window().  And that function will
> grow sk->sk_rcvbuf up until it eventually hits tcp_rmem[2].
> 
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  V1 of this patch was deferred, resending to get discussion going again.
>  Changes since v1:
>   - add reproducer to commit message
> 
>  Unfortunately I couldn't come up with something that has no magic
>  ('allowed >> 4') value.  I chose >>4 (1/16th) because it didn't cause
>  tput limitations in my 'full-mss-sized, steady state' netcat tests.
> 
>  Maybe someone has better idea?

Thanks a lot Florian looking at this.

Do we have one SNMP counter tracking number of time we took the decision
to send a 0 window ?

Would you mind waiting we run our packetdrill tests before acknowledging
this patch, because I suspect this might have some impact ?

Thanks !

^ permalink raw reply

* RE: [PATCH net,v3] hyperv: Fix the carrier status setting
From: Haiyang Zhang @ 2014-02-13 15:04 UTC (permalink / raw)
  To: Jason Wang, davem@davemloft.net, netdev@vger.kernel.org
  Cc: driverdev-devel@linuxdriverproject.org, olaf@aepfle.de,
	linux-kernel@vger.kernel.org
In-Reply-To: <52FC4147.7060803@redhat.com>



> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Wednesday, February 12, 2014 10:52 PM
> To: Haiyang Zhang; davem@davemloft.net; netdev@vger.kernel.org
> Cc: KY Srinivasan; olaf@aepfle.de; linux-kernel@vger.kernel.org; driverdev-
> devel@linuxdriverproject.org
> Subject: Re: [PATCH net,v3] hyperv: Fix the carrier status setting
> 
> On 02/13/2014 08:54 AM, Haiyang Zhang wrote:
> > Without this patch, the "cat /sys/class/net/ethN/operstate" shows
> > "unknown", and "ethtool ethN" shows "Link detected: yes", when VM
> > boots up with or without vNIC connected.
> >
> > This patch fixed the problem.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
> > ---
> >  drivers/net/hyperv/netvsc_drv.c |   53
> ++++++++++++++++++++++++++++-----------
> >  1 files changed, 38 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/net/hyperv/netvsc_drv.c
> > b/drivers/net/hyperv/netvsc_drv.c index 7756118..7141a19 100644
> > --- a/drivers/net/hyperv/netvsc_drv.c
> > +++ b/drivers/net/hyperv/netvsc_drv.c
> > @@ -88,8 +88,12 @@ static int netvsc_open(struct net_device *net)  {
> >  	struct net_device_context *net_device_ctx = netdev_priv(net);
> >  	struct hv_device *device_obj = net_device_ctx->device_ctx;
> > +	struct netvsc_device *nvdev;
> > +	struct rndis_device *rdev;
> >  	int ret = 0;
> >
> > +	netif_carrier_off(net);
> > +
> >  	/* Open up the device */
> >  	ret = rndis_filter_open(device_obj);
> >  	if (ret != 0) {
> > @@ -99,6 +103,11 @@ static int netvsc_open(struct net_device *net)
> >
> >  	netif_start_queue(net);
> >
> > +	nvdev = hv_get_drvdata(device_obj);
> > +	rdev = nvdev->extension;
> > +	if (!rdev->link_state)
> > +		netif_carrier_on(net);
> > +
> 
> Maybe you can just schedule the work here and then you can drop the
> rtnl_lock in netvsc_link_change() ?

The rtnl_lock will still be necessary in the netvsc_link_change(), because 
we want to prevent it getting wrong rdev pointer when netvsc_change_mtu
is removing/adding rndis device.

> > +
> > +	if (notify)
> > +		netdev_notify_peers(net);
> >  }
> >
> 
> Looks like this forces arp_notify here. Is it expected?

Yes, this is expected. It's required after live migration.

Thanks,
- Haiyang

^ permalink raw reply

* Re: [PATCH next resend] tcp: use zero-window when free_space is low
From: Florian Westphal @ 2014-02-13 15:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netdev, Neal Cardwell, Yuchung Cheng
In-Reply-To: <1392303499.1752.19.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Do we have one SNMP counter tracking number of time we took the decision
> to send a 0 window ?

No.

> Would you mind waiting we run our packetdrill tests before acknowledging
> this patch, because I suspect this might have some impact ?

Of course not.  I am very happy that you folks have these kinds of tests
and are willing to double-check.  Take all time you need, there is no
need to haste.

Many Thanks Eric.

Do you think it makes sense to add counters for this?

The caveat is that decision to send 0 window doesn't mean we end
up sending one, since we cannot shrink already offered window.

static u16 tcp_select_window(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
    u32 cur_win = tcp_receive_window(tp);
    u32 new_win = __tcp_select_window(sk);

    /* Never shrink the offered window */
    if (new_win < cur_win) {

Would you add SNMP counter for '__tcp_select_window() wants 0 window'
or for 'tcp_select_window() does pick 0 window' ?

[ or even different counters for both ? ]

Cheers,
Florian

^ permalink raw reply

* Re: RFC: bridge get fdb by bridge device
From: Jamal Hadi Salim @ 2014-02-13 15:37 UTC (permalink / raw)
  To: John Fastabend
  Cc: vyasevic, netdev@vger.kernel.org, Stephen Hemminger,
	Scott Feldman
In-Reply-To: <52FBC282.6020301@intel.com>

On 02/12/14 13:50, John Fastabend wrote:
> On 2/11/2014 1:04 PM, Jamal Hadi Salim wrote:
>
> Because it is not the same type of object as the software bridge.
> Most notably it doesn't do learning.

This kept nagging at me.
Learning is optional for a bridge. So is flooding.
I think we got this right in recent kernels.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH 08/14] net: axienet: Removed checkpatch errors/warnings
From: Joe Perches @ 2014-02-13 15:51 UTC (permalink / raw)
  To: Michal Simek
  Cc: netdev, Srikanth Thokala, Srikanth Thokala, Michal Simek,
	Anirudha Sarangi, John Linn, linux-arm-kernel, linux-kernel
In-Reply-To: <ff5b1f6a-a6a7-405c-9e83-59ad33852b93@TX2EHSMHS044.ehs.local>

On Thu, 2014-02-13 at 08:19 +0100, Michal Simek wrote:
> On 02/13/2014 01:31 AM, Joe Perches wrote:
> > On Wed, 2014-02-12 at 16:55 +0100, Michal Simek wrote:

Hi again Michal.

> >> +		netdev_warn(lp->ndev,
> >> +			 "Could not find clock ethernet controller property.");
> > 
> > here too. (and alignment)
> 
> This is problematic. I would like to keep 80 char limits and keeping
> this align just break it. That's why I was using tab alignment.
> Probably the solution is just to shorten message.

(overly long, tiresomely trivial stuff below)

Your choice.  I'm not an 80 column zealot but
please don't shorten the message just to fit
80 columns if it impacts intelligibility.

Generally, I'd write this something like:

		netdev_warn(lp->ndev,
			    "Could not find clock ethernet controller property\n");

(without the period) which is 83 columns.

checkpatch makes exceptions for 80 column line
length maximums for format strings.

I've no real issue if you indent it back one.

fyi: this is 77 columns

		netdev_warn(lp->ndev,
			    "No clock ethernet controller property found\n");

About the message itself.

You dropped the "axienet_mdio_setup" function name.

I believe the dmesg output will look something like:

xilinx_temac 0000:01:00.0 (unregistered net_device): Could not find clock ethernet controller property.
xilinx_temac 0000:01:00.0 (unregistered net_device): Setting MDIO clock divisor to default 29

Because these 2 messages are effectively linked,
my preference would be to emit them on a single line,

Something like:

xilinx_temac 0000:01:00.0 (unregistered net_device): of_get_property("clock-frequency") not found - setting MDIO clock divisor to default 29

or

		netdev_warn(lp->ndev,
			    "of_get_property(\"clock-frequency\") not found - setting MDIO clock divisor to default %u\n",
			    DEFAULT_CLOCK_DIVISOR);

^ permalink raw reply

* Re: [PATCH net-next] net: remove unnecessary return's
From: Stephen Hemminger @ 2014-02-13 16:02 UTC (permalink / raw)
  To: Joe Perches; +Cc: David Miller, netdev, Julia Lawall
In-Reply-To: <1392273125.2214.25.camel@joe-AO722>

On Wed, 12 Feb 2014 22:32:05 -0800
Joe Perches <joe@perches.com> wrote:

> On Wed, 2014-02-12 at 20:51 -0800, Stephen Hemminger wrote:
> > One of my pet coding style peeves is the practice of
> > adding extra return; at the end of function.
> > Kill several instances of this in network code.
> > I suppose some coccinelle wizardy could do this automatically.
> 
> Maybe, but grep version 2.5.4 will show most of them.
> 
> $ grep-2.5.4 -rP --include=*.[ch] "return;\n}" *
> [...]
> 
> Fixing them has to make sure that there's no
> label before the close brace.
> 
> gcc has to have a statement before the close brace
> of a void return after a label.
> 
> label:
> }
> 
> must be:
> 
> label:
> 	;
> }
> 
> to compile.
> 

My method was to use:
 find . -name '*.c' | xargs grep -Pzo '(?s)^(\s*)\Nreturn;.}'
Then ignore cases where it was done for final label and where return
was alone in stub function.

^ permalink raw reply

* Re: RFC: bridge get fdb by bridge device
From: John Fastabend @ 2014-02-13 16:03 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: vyasevic, netdev@vger.kernel.org, Stephen Hemminger,
	Scott Feldman
In-Reply-To: <52FCE69F.5080602@mojatatu.com>

On 2/13/2014 7:37 AM, Jamal Hadi Salim wrote:
> On 02/12/14 13:50, John Fastabend wrote:
>> On 2/11/2014 1:04 PM, Jamal Hadi Salim wrote:
>>
>> Because it is not the same type of object as the software bridge.
>> Most notably it doesn't do learning.
>
> This kept nagging at me.
> Learning is optional for a bridge. So is flooding.
> I think we got this right in recent kernels.
>
> cheers,
> jamal

Yeah I remember now Vlad added them. The real distinction
is macvlan devices only have _one_ uplink port and support
other forwarding modes, VEPA, etc.

^ permalink raw reply

* FROM WORLD BANK/UNITED, READ THE ATTACHMENT!!!!
From: WORLD BANK/UNITED NATIONS @ 2014-02-13 16:13 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 48 bytes --]

FROM WORLD BANK/UNITED, READ THE ATTACHMENT!!!!

[-- Attachment #2: WORLD BANK UNITED NATIONS.doc --]
[-- Type: application/msword, Size: 28672 bytes --]

^ permalink raw reply

* Re: [PATCH next resend] tcp: use zero-window when free_space is low
From: Eric Dumazet @ 2014-02-13 16:19 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Neal Cardwell, Yuchung Cheng
In-Reply-To: <20140213153441.GJ25153@breakpoint.cc>

On Thu, 2014-02-13 at 16:34 +0100, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > Would you mind waiting we run our packetdrill tests before acknowledging
> > this patch, because I suspect this might have some impact ?
> 
> Of course not.  I am very happy that you folks have these kinds of tests
> and are willing to double-check.  Take all time you need, there is no
> need to haste.
> 


We have today 348 different packetdrill tests, and this number keeps
increasing ;)

> Many Thanks Eric.
> 
> Do you think it makes sense to add counters for this?
> 
> The caveat is that decision to send 0 window doesn't mean we end
> up sending one, since we cannot shrink already offered window.
> 
> static u16 tcp_select_window(struct sock *sk)
> {
>     struct tcp_sock *tp = tcp_sk(sk);
>     u32 cur_win = tcp_receive_window(tp);
>     u32 new_win = __tcp_select_window(sk);
> 
>     /* Never shrink the offered window */
>     if (new_win < cur_win) {
> 
> Would you add SNMP counter for '__tcp_select_window() wants 0 window'
> or for 'tcp_select_window() does pick 0 window' ?
> 
> [ or even different counters for both ? ]

Ideally, having counters of transitions would be nice.

This would help us to spot regressions in TCP stacks or network drivers,
or wrong application tunings.

One counter tracking number of times a socket went from a non zero
window to a zero window (as you said, I am referring to effective window
being sent)

One counter tracking the reverse.

If it proves being too complex or expensive, a single counter tracking
"win 0" sent to the peers.

^ permalink raw reply

* Re: [PATCH v2 net-next] net: remove useless if check from register_netdevice()
From: Alexei Starovoitov @ 2014-02-13 16:23 UTC (permalink / raw)
  To: Denis Kirjanov; +Cc: David S. Miller, netdev
In-Reply-To: <1392266829-2365-1-git-send-email-kda@linux-powerpc.org>

On Wed, Feb 12, 2014 at 8:47 PM, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> remove useless if check from register_netdevice()
>
> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
> ---
> v1 -> v2: Fixed identation
> ---
>  net/core/dev.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4ad1b78..21a72ad 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5876,8 +5876,7 @@ int register_netdevice(struct net_device *dev)
>         if (dev->netdev_ops->ndo_init) {
>                 ret = dev->netdev_ops->ndo_init(dev);
>                 if (ret) {
> -                       if (ret > 0)
> -                               ret = -EIO;
> +                       ret = -EIO;
>                         goto out;

why is it a useless check?
seems perfectly valid to me.
most of the time ndo_init() returns negative error code like -ENOMEM
which we want
to propagate further down and want to override it to -EIO if it's > 0

^ permalink raw reply

* [PATCH iproute2 v2] iplink_bond: fix parameter value matching
From: Michal Kubecek @ 2014-02-13 16:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Ding Tianhong
In-Reply-To: <52F448AF.3060604@huawei.com>

Lookup function get_index() compares argument with table entries
only up to the length of the table entry so that if an entry
with lower index is a substring of a later one, earlier entry is
used even if the argument is equal to the other. For example,

  ip link set bond0 type bond xmit_hash_policy layer2+3

sets xmit_hash_policy to 0 (layer2) as this is found before
"layer2+3" can be checked.

Use strcmp() to compare whole strings instead.

v2: look for an exact match only

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 ip/iplink_bond.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iplink_bond.c b/ip/iplink_bond.c
index f22151e..7a950df 100644
--- a/ip/iplink_bond.c
+++ b/ip/iplink_bond.c
@@ -106,7 +106,7 @@ static int get_index(const char **tbl, char *name)
 				return i;
 
 	for (i = 0; tbl[i]; i++)
-		if (strncmp(tbl[i], name, strlen(tbl[i])) == 0)
+		if (strcmp(tbl[i], name) == 0)
 			return i;
 
 	return -1;
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH v2] usbnet: remove generic hard_header_len check
From: Emil Goode @ 2014-02-13 16:50 UTC (permalink / raw)
  To: Steve Glendinning, Oliver Neukum, Bjørn Mork,
	David S. Miller, Freddy Xin, Eric Dumazet, Ming Lei,
	Paul Gortmaker, Jeff Kirsher, Liu Junliang, Octavian Purdila
  Cc: linux-usb, netdev, linux-kernel, Emil Goode

This patch removes a generic hard_header_len check from the usbnet
module that is causing dropped packages under certain circumstances
for devices that send rx packets that cross urb boundaries.

One example is the AX88772B which occasionally send rx packets that
cross urb boundaries where the remaining partial packet is sent with
no hardware header. When the buffer with a partial packet is of less
number of octets than the value of hard_header_len the buffer is
discarded by the usbnet module.

With AX88772B this can be reproduced by using ping with a packet
size between 1965-1976.

The bug has been reported here:

https://bugzilla.kernel.org/show_bug.cgi?id=29082

This patch introduces the following changes:
- Removes the generic hard_header_len check in the rx_complete
  function in the usbnet module.
- Introduces a ETH_HLEN check for skbs that are not cloned from
  within a rx_fixup callback.
- For safety a hard_header_len check is added to each rx_fixup
  callback function that could be affected by this change.
  These extra checks could possibly be removed by someone
  who has the hardware to test.
- Removes a call to dev_kfree_skb_any() and instead utilizes the
  dev->done list to queue skbs for cleanup.

The changes place full responsibility on the rx_fixup callback
functions that clone skbs to only pass valid skbs to the
usbnet_skb_return function.

Signed-off-by: Emil Goode <emilgoode@gmail.com>
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
---
v2: Added a comment in the changelog about the removal of a
    call to dev_kfree_skb_any().

 drivers/net/usb/ax88179_178a.c |    4 ++++
 drivers/net/usb/gl620a.c       |    4 ++++
 drivers/net/usb/mcs7830.c      |    5 +++--
 drivers/net/usb/net1080.c      |    4 ++++
 drivers/net/usb/qmi_wwan.c     |    8 ++++----
 drivers/net/usb/rndis_host.c   |    4 ++++
 drivers/net/usb/smsc75xx.c     |    4 ++++
 drivers/net/usb/smsc95xx.c     |    4 ++++
 drivers/net/usb/sr9800.c       |    4 ++++
 drivers/net/usb/usbnet.c       |   25 ++++++++++---------------
 10 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index d6f64da..955df81 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1118,6 +1118,10 @@ static int ax88179_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 	u16 hdr_off;
 	u32 *pkt_hdr;
 
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len)
+		return 0;
+
 	skb_trim(skb, skb->len - 4);
 	memcpy(&rx_hdr, skb_tail_pointer(skb), 4);
 	le32_to_cpus(&rx_hdr);
diff --git a/drivers/net/usb/gl620a.c b/drivers/net/usb/gl620a.c
index e4a8a93..1cc24e6 100644
--- a/drivers/net/usb/gl620a.c
+++ b/drivers/net/usb/gl620a.c
@@ -84,6 +84,10 @@ static int genelink_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 	u32			size;
 	u32			count;
 
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len)
+		return 0;
+
 	header = (struct gl_header *) skb->data;
 
 	// get the packet count of the received skb
diff --git a/drivers/net/usb/mcs7830.c b/drivers/net/usb/mcs7830.c
index a305a7b..82d844a 100644
--- a/drivers/net/usb/mcs7830.c
+++ b/drivers/net/usb/mcs7830.c
@@ -526,8 +526,9 @@ static int mcs7830_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 {
 	u8 status;
 
-	if (skb->len == 0) {
-		dev_err(&dev->udev->dev, "unexpected empty rx frame\n");
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len) {
+		dev_err(&dev->udev->dev, "unexpected tiny rx frame\n");
 		return 0;
 	}
 
diff --git a/drivers/net/usb/net1080.c b/drivers/net/usb/net1080.c
index 0a85d92..4cbdb13 100644
--- a/drivers/net/usb/net1080.c
+++ b/drivers/net/usb/net1080.c
@@ -364,6 +364,10 @@ static int net1080_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 	struct nc_trailer	*trailer;
 	u16			hdr_len, packet_len;
 
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len)
+		return 0;
+
 	if (!(skb->len & 0x01)) {
 		netdev_dbg(dev->net, "rx framesize %d range %d..%d mtu %d\n",
 			   skb->len, dev->net->hard_header_len, dev->hard_mtu,
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index ff5c871..b2e2aee 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -80,10 +80,10 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 {
 	__be16 proto;
 
-	/* usbnet rx_complete guarantees that skb->len is at least
-	 * hard_header_len, so we can inspect the dest address without
-	 * checking skb->len
-	 */
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len)
+		return 0;
+
 	switch (skb->data[0] & 0xf0) {
 	case 0x40:
 		proto = htons(ETH_P_IP);
diff --git a/drivers/net/usb/rndis_host.c b/drivers/net/usb/rndis_host.c
index a48bc0f..524a47a 100644
--- a/drivers/net/usb/rndis_host.c
+++ b/drivers/net/usb/rndis_host.c
@@ -492,6 +492,10 @@ EXPORT_SYMBOL_GPL(rndis_unbind);
  */
 int rndis_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 {
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len)
+		return 0;
+
 	/* peripheral may have batched packets to us... */
 	while (likely(skb->len)) {
 		struct rndis_data_hdr	*hdr = (void *)skb->data;
diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index f17b9e0..d9e7892 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -2106,6 +2106,10 @@ static void smsc75xx_rx_csum_offload(struct usbnet *dev, struct sk_buff *skb,
 
 static int smsc75xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 {
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len)
+		return 0;
+
 	while (skb->len > 0) {
 		u32 rx_cmd_a, rx_cmd_b, align_count, size;
 		struct sk_buff *ax_skb;
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 8dd54a0..424db65e 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1723,6 +1723,10 @@ static void smsc95xx_rx_csum_offload(struct sk_buff *skb)
 
 static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 {
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len)
+		return 0;
+
 	while (skb->len > 0) {
 		u32 header, align_count;
 		struct sk_buff *ax_skb;
diff --git a/drivers/net/usb/sr9800.c b/drivers/net/usb/sr9800.c
index 4175eb9..575be80 100644
--- a/drivers/net/usb/sr9800.c
+++ b/drivers/net/usb/sr9800.c
@@ -63,6 +63,10 @@ static int sr_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 {
 	int offset = 0;
 
+	/* This check is no longer done by usbnet */
+	if (skb->len < dev->net->hard_header_len)
+		return 0;
+
 	while (offset + sizeof(u32) < skb->len) {
 		struct sk_buff *sr_skb;
 		u16 size;
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 4671da7..dd10d58 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -542,17 +542,19 @@ static inline void rx_process (struct usbnet *dev, struct sk_buff *skb)
 	}
 	// else network stack removes extra byte if we forced a short packet
 
-	if (skb->len) {
-		/* all data was already cloned from skb inside the driver */
-		if (dev->driver_info->flags & FLAG_MULTI_PACKET)
-			dev_kfree_skb_any(skb);
-		else
-			usbnet_skb_return(dev, skb);
+	/* all data was already cloned from skb inside the driver */
+	if (dev->driver_info->flags & FLAG_MULTI_PACKET)
+		goto done;
+
+	if (skb->len < ETH_HLEN) {
+		dev->net->stats.rx_errors++;
+		dev->net->stats.rx_length_errors++;
+		netif_dbg(dev, rx_err, dev->net, "rx length %d\n", skb->len);
+	} else {
+		usbnet_skb_return(dev, skb);
 		return;
 	}
 
-	netif_dbg(dev, rx_err, dev->net, "drop\n");
-	dev->net->stats.rx_errors++;
 done:
 	skb_queue_tail(&dev->done, skb);
 }
@@ -574,13 +576,6 @@ static void rx_complete (struct urb *urb)
 	switch (urb_status) {
 	/* success */
 	case 0:
-		if (skb->len < dev->net->hard_header_len) {
-			state = rx_cleanup;
-			dev->net->stats.rx_errors++;
-			dev->net->stats.rx_length_errors++;
-			netif_dbg(dev, rx_err, dev->net,
-				  "rx length %d\n", skb->len);
-		}
 		break;
 
 	/* stalls need manual reset. this is rare ... except that
-- 
1.7.10.4

^ permalink raw reply related

* RE: [PATCH v2] usbnet: remove generic hard_header_len check
From: David Laight @ 2014-02-13 16:56 UTC (permalink / raw)
  To: 'Emil Goode', Steve Glendinning, Oliver Neukum,
	Bjørn Mork, David S. Miller, Freddy Xin, Eric Dumazet,
	Ming Lei, Paul Gortmaker, Jeff Kirsher, Liu Junliang,
	Octavian Purdila
  Cc: linux-usb@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1392310219-5719-1-git-send-email-emilgoode@gmail.com>

From: Of Emil Goode
> This patch removes a generic hard_header_len check from the usbnet
> module that is causing dropped packages under certain circumstances
> for devices that send rx packets that cross urb boundaries.
> 
> One example is the AX88772B which occasionally send rx packets that
> cross urb boundaries where the remaining partial packet is sent with
> no hardware header. When the buffer with a partial packet is of less
> number of octets than the value of hard_header_len the buffer is
> discarded by the usbnet module.
...
> diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
> index d6f64da..955df81 100644
> --- a/drivers/net/usb/ax88179_178a.c
> +++ b/drivers/net/usb/ax88179_178a.c
> @@ -1118,6 +1118,10 @@ static int ax88179_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>  	u16 hdr_off;
>  	u32 *pkt_hdr;
> 
> +	/* This check is no longer done by usbnet */
> +	if (skb->len < dev->net->hard_header_len)
> +		return 0;
> +

The ax88179 driver can also receive ethernet frames that cross the
end of rx URB.
It should have the code to save the last fragment (of a urb) until
the next data arrives, and then correctly merge the fragments.

It is likely that any sub-driver that sets the receive urb length
to a multiple of 1k (rather than leaving it at hard_hdr+mtu) can defrag
rx data that crosses urb boundaries.

	David

^ permalink raw reply

* Re: [PATCH net-next v2 09/10] Documentation: add Device tree bindings for Broadcom GENET
From: Florian Fainelli @ 2014-02-13 16:57 UTC (permalink / raw)
  To: Mark Rutland
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	cernekee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20140213111328.GB30705-NuALmloUBlrZROr8t4l/smS4ubULX0JqMm0uRHvK7Nw@public.gmane.org>

Hi Mark,

2014-02-13 3:13 GMT-08:00 Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org>:
> On Thu, Feb 13, 2014 at 05:29:54AM +0000, Florian Fainelli wrote:
>> This patch adds the Device Tree bindings for the Broadcom GENET Gigabit
>> Ethernet controller. A bunch of examples are provided to illustrate the
>> versatile aspect of the hardare.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> ---
>> Changes since v1:
>> - rebased
>>
>>  .../devicetree/bindings/net/broadcom-bcmgenet.txt  | 111 +++++++++++++++++++++
>>  1 file changed, 111 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt b/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
>> new file mode 100644
>> index 0000000..93c58e9
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/broadcom-bcmgenet.txt
>> @@ -0,0 +1,111 @@
>> +* Broadcom BCM7xxx Ethernet Controller (GENET)
>> +
>> +Required properties:
>> +- compatible: should be "brcm,genet-v1", "brcm,genet-v2", "brcm,genet-v3",
>> +  "brcm,genet-v4".
>
> Presumably "should contain one of" is a better description than "should
> be"?
>
> Are the newer revisions compatible with older revisions?

Not entirely, the driver has internal macros: GENET_IS_V<N>() to help
figuring out which parts are different.

>
>> +- reg: address and length of the register set for the device.
>> +- interrupts: interrupt for the device
>> +- mdio bus node: this node should always be present regarless of the PHY
>> +  configuration of the GENET instance
>
> Nit: a node is not a property, list it after properties.
>
>> +- phy-mode: The interface between the SoC and the PHY (a string that
>> +  of_get_phy_mode() can understand).
>
> Do we not have a document under bindings listing these? I really don't
> like referring to code in bindings docs.

Sergei has been working on a patch that centralizes the Ethernet DT
binding, but I am targetting the net-next/master tree in which this is
not yet present. I could probably pro-actively mention it though.

>
>> +
>> +MDIO bus node required properties:
>> +
>> +- compatible: should be "brcm,genet-v<N>-mdio"
>
> Where N is? Could this not be an explicit list as above? It helps when
> searching for bindings.

Yes, this should match the genet-v<N> compatible string.

>
>> +- reg: address and length relative to the parent node base register address
>
> The parent node will require #address-cells and #size-cells too then.

Correct.

>
>> +- address-cells: address cell for MDIO bus addressing, should be 1
>> +- size-cells: size of the cells for MDIO bus addressing, should be 0
>> +
>> +Optional properties:
>> +- phy-handle: A phandle to a phy node defining the PHY address (as the reg
>> +  property, a single integer), used to describe configurations where a PHY
>> +  (internal or external) is used.
>
> Is there not a phy binding you could refer to instead?

This is ePAPR, but once again, Sergei's document centralizes that nicely.

>
>> +
>> +- fixed-link: When the GENET interface is connected to a MoCA hardware block
>> +  or when operating in a RGMII to RGMII type of connection, or when the
>> +  MDIO bus is voluntarily disabled, this property should be used to describe
>> +  the "fixed link", the property is described as follows:
>> +
>> +  fixed-link: <a b c d e> where a is emulated phy id - choose any,
>> +  but unique to the all specified fixed-links, b is duplex - 0 half,
>> +  1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no
>> +  pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause.
>
> Is this not documented elsewhere such that it can be referred to?

The Freescale FEC driver is the one holding most of the documentation
for fixed-link, I can refer to it until Sergei's patch which
centralizes the Ethernet DT bindings gets merged.

>
>> +
>> +Internal Gigabit PHY example:
>> +
>> +ethernet@f0b60000 {
>> +     phy-mode = "internal";
>> +     phy-handle = <&phy1>;
>> +     mac-address = [ 00 10 18 36 23 1a ];
>> +     compatible = "brcm,genet-v4";
>> +     #address-cells = <0x1>;
>> +     #size-cells = <0x1>;
>> +     device_type = "ethernet";
>
> What's this needed by? I can't see any other devices with this
> device_type, and I was under the impression that we didn't want new
> device_type properties cropping up.

This is just an oversight, and is not required per-se, I will get it removed.

>
>> +     reg = <0xf0b60000 0xfc4c>;
>> +     interrupts = <0x0 0x14 0x0 0x0 0x15 0x0>;
>
> How many? The binding implied only one, and I'm not away of any
> interrupt controller bindings with #interrupt-cells = <6>.

In fact, only two interrupts, cells, this is a bad copy-pasting from
the bootloader providing the DTB.

Thanks for the review!
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v2 07/10] net: bcmgenet: add MDIO routines
From: Florian Fainelli @ 2014-02-13 17:00 UTC (permalink / raw)
  To: Mark Rutland
  Cc: netdev@vger.kernel.org, davem@davemloft.net, cernekee@gmail.com,
	devicetree@vger.kernel.org
In-Reply-To: <20140213115008.GB5871@e106331-lin.cambridge.arm.com>

Hi Mark,

2014-02-13 3:50 GMT-08:00 Mark Rutland <mark.rutland@arm.com>:
> On Thu, Feb 13, 2014 at 05:29:52AM +0000, Florian Fainelli wrote:
>> This patch adds support for configuring the port multiplexer hardware
>> which resides in front of the GENET Ethernet MAC controller. This allows
>> us to support:
>>
>> - internal PHYs (using drivers/net/phy/bcm7xxx.c)
>> - MoCA PHYs which are an entirely separate hardware block not covered
>>   here
>> - external PHYs and switches
>>
>> Note that MoCA and switches are currently supported using the emulated
>> "fixed PHY" driver.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> Changes since v1:
>> - fixed MDIO crash/warning when Device Tree probing fails
>> - removed the use of priv->phy_type and use priv->phy_interface
>>   directly
>>
>>  drivers/net/ethernet/broadcom/genet/bcmmii.c | 481 +++++++++++++++++++++++++++
>>  1 file changed, 481 insertions(+)
>>  create mode 100644 drivers/net/ethernet/broadcom/genet/bcmmii.c
>
> [...]
>
>> +static int bcmgenet_mii_of_init(struct bcmgenet_priv *priv)
>> +{
>> +       struct device_node *dn = priv->pdev->dev.of_node;
>> +       struct device *kdev = &priv->pdev->dev;
>> +       struct device_node *mdio_dn;
>> +       const __be32 *fixed_link;
>
> This looks a bit odd. Could we not have a common parser for fixed-link
> properties?

Do you remember the fixed-link revamp that Thomas Petazzoni submitted
a while ago? I was planning on using it, but there were still some
disagreements on how to do it, so I ended up using the good old
"fixed-link" property as-is.

>
>> +       u32 propval;
>> +       int ret, sz;
>> +
>> +       mdio_dn = of_get_next_child(dn, NULL);
>> +       if (!mdio_dn) {
>> +               dev_err(kdev, "unable to find MDIO bus node\n");
>> +               return -ENODEV;
>> +       }
>
> Could you please check that this is the node you expect (by looking at
> the compatible string list).

Makes sense.

>
>> +
>> +       ret = of_mdiobus_register(priv->mii_bus, mdio_dn);
>> +       if (ret) {
>> +               dev_err(kdev, "failed to register MDIO bus\n");
>> +               return ret;
>> +       }
>> +
>> +       /* Check if we have an internal or external PHY */
>> +       priv->phy_dn = of_parse_phandle(dn, "phy-handle", 0);
>> +       if (priv->phy_dn) {
>> +               if (!of_property_read_u32(priv->phy_dn, "max-speed", &propval))
>> +                       priv->phy_speed = propval;
>
> Is there no way to find this out without reading values directly off of
> the PHY? It seems like something we should have an abstraction for.

In fact, this is a remnant from when I did not submit the patch doing
this in of_mdiobus_register(), so this is now useless.

>
>> +       } else {
>> +               /* Read the link speed from the fixed-link property */
>> +               fixed_link = of_get_property(dn, "fixed-link", &sz);
>> +               if (!fixed_link || sz < sizeof(*fixed_link)) {
>> +                       ret = -ENODEV;
>> +                       goto out;
>> +               }
>> +
>> +               priv->phy_speed = be32_to_cpu(fixed_link[2]);
>
> Similarly can we not have a common fixed-link parser? Or abstraction
> such that you query the phy regardless of what it is and how its binding
> represents this?

I do not think I need this line anymore. I will do some more testing
and will remove this parsing. I do agree that we need a common
fixed-link parser, but that is part of the discussion initiated by
Thomas Petazzoni.

Thanks for the review!
-- 
Florian

^ permalink raw reply

* ovs inconsistent lock state
From: Jiri Pirko @ 2014-02-13 17:13 UTC (permalink / raw)
  To: netdev; +Cc: jesse, dev

Hi.

On current net-next I'm getting following message once I add a dp:

[ 3014.523665] =================================
[ 3014.524118] [ INFO: inconsistent lock state ]
[ 3014.524118] 3.14.0-rc2+ #1 Not tainted
[ 3014.524118] ---------------------------------
[ 3014.524118] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 3014.524118] swapper/1/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
[ 3014.524118]  (&(&flow->stats.stat->lock)->rlock){+.?...}, at: [<ffffffffa021b6ef>] ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
[ 3014.524118] {SOFTIRQ-ON-W} state was registered at:
[ 3014.524118]   [<ffffffff810d6843>] __lock_acquire+0x5a3/0x1c30
[ 3014.524118]   [<ffffffff810d7f80>] lock_acquire+0xb0/0x150
[ 3014.524118]   [<ffffffff81705c99>] _raw_spin_lock+0x39/0x50
[ 3014.524118]   [<ffffffffa021b8d3>] ovs_flow_stats_get+0x163/0x1b0 [openvswitch]
[ 3014.524118]   [<ffffffffa0218ce5>] ovs_flow_cmd_fill_info+0x165/0x2b0 [openvswitch]
[ 3014.524118]   [<ffffffffa021904c>] ovs_flow_cmd_build_info.constprop.27+0x6c/0xa0 [openvswitch]
[ 3014.524118]   [<ffffffffa0219519>] ovs_flow_cmd_new_or_set+0x499/0x4f0 [openvswitch]
[ 3014.524118]   [<ffffffff8161bf09>] genl_family_rcv_msg+0x199/0x390
[ 3014.524118]   [<ffffffff8161c18e>] genl_rcv_msg+0x8e/0xd0
[ 3014.524118]   [<ffffffff8161a189>] netlink_rcv_skb+0xa9/0xc0
[ 3014.524118]   [<ffffffff8161a6a8>] genl_rcv+0x28/0x40
[ 3014.524118]   [<ffffffff816197d0>] netlink_unicast+0xf0/0x1b0
[ 3014.524118]   [<ffffffff81619b8f>] netlink_sendmsg+0x2ff/0x740
[ 3014.524118]   [<ffffffff815ce86b>] sock_sendmsg+0x8b/0xc0
[ 3014.524118]   [<ffffffff815cf369>] ___sys_sendmsg+0x389/0x3a0
[ 3014.524118]   [<ffffffff815cfa62>] __sys_sendmsg+0x42/0x80
[ 3014.524118]   [<ffffffff815cfab2>] SyS_sendmsg+0x12/0x20
[ 3014.524118]   [<ffffffff8170f829>] system_call_fastpath+0x16/0x1b
[ 3014.524118] irq event stamp: 6163288
[ 3014.524118] hardirqs last  enabled at (6163288): [<ffffffff8169f8a0>] ip6_finish_output2+0x380/0x670
[ 3014.524118] hardirqs last disabled at (6163287): [<ffffffff8169f85b>] ip6_finish_output2+0x33b/0x670
[ 3014.524118] softirqs last  enabled at (6163266): [<ffffffff810888a2>] _local_bh_enable+0x22/0x50
[ 3014.524118] softirqs last disabled at (6163267): [<ffffffff81089f15>] irq_exit+0xc5/0xd0
[ 3014.524118] 
other info that might help us debug this:
[ 3014.524118]  Possible unsafe locking scenario:

[ 3014.524118]        CPU0
[ 3014.524118]        ----
[ 3014.524118]   lock(&(&flow->stats.stat->lock)->rlock);
[ 3014.524118]   <Interrupt>
[ 3014.524118]     lock(&(&flow->stats.stat->lock)->rlock);
[ 3014.524118] 
 *** DEADLOCK ***

[ 3014.524118] 5 locks held by swapper/1/0:
[ 3014.524118]  #0:  (((&idev->mc_ifc_timer))){+.-...}, at: [<ffffffff81090095>] call_timer_fn+0x5/0x180
[ 3014.524118]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff816c5755>] mld_sendpack+0x5/0x330
[ 3014.524118]  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8169f579>] ip6_finish_output2+0x59/0x670
[ 3014.524118]  #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815eb585>] __dev_queue_xmit+0x5/0x6a0
[ 3014.524118]  #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0221995>] internal_dev_xmit+0x5/0x90 [openvswitch]
[ 3014.524118] 
stack backtrace:
[ 3014.524118] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc2+ #1
[ 3014.524118] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 3014.524118]  ffffffff82338940 ffff8800bb2838f0 ffffffff816fd1ac ffff8800b6b33570
[ 3014.524118]  ffff8800bb283940 ffffffff816f9144 0000000000000005 ffff880000000001
[ 3014.524118]  ffff880000000000 0000000000000006 ffff8800b6b33570 ffffffff810d4380
[ 3014.524118] Call Trace:
[ 3014.524118]  <IRQ>  [<ffffffff816fd1ac>] dump_stack+0x4d/0x66
[ 3014.524118]  [<ffffffff816f9144>] print_usage_bug+0x1f3/0x204
[ 3014.524118]  [<ffffffff810d4380>] ? check_usage_backwards+0x130/0x130
[ 3014.524118]  [<ffffffff810d4da0>] mark_lock+0x270/0x300
[ 3014.524118]  [<ffffffff810d6718>] __lock_acquire+0x478/0x1c30
[ 3014.524118]  [<ffffffff810d4ede>] ? mark_held_locks+0xae/0x130
[ 3014.524118]  [<ffffffffa0220047>] ? find_bucket.isra.1+0x67/0x70 [openvswitch]
[ 3014.524118]  [<ffffffffa0220228>] ? masked_flow_lookup+0x78/0x140 [openvswitch]
[ 3014.524118]  [<ffffffff810d669a>] ? __lock_acquire+0x3fa/0x1c30
[ 3014.524118]  [<ffffffff810d7f80>] lock_acquire+0xb0/0x150
[ 3014.524118]  [<ffffffffa021b6ef>] ? ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
[ 3014.524118]  [<ffffffff81705c99>] _raw_spin_lock+0x39/0x50
[ 3014.524118]  [<ffffffffa021b6ef>] ? ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
[ 3014.524118]  [<ffffffffa021b6ef>] ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
[ 3014.524118]  [<ffffffffa021ac75>] ovs_dp_process_received_packet+0x75/0x100 [openvswitch]
[ 3014.524118]  [<ffffffffa02213ea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[ 3014.524118]  [<ffffffffa02219e6>] internal_dev_xmit+0x56/0x90 [openvswitch]
[ 3014.524118]  [<ffffffffa0221995>] ? internal_dev_xmit+0x5/0x90 [openvswitch]
[ 3014.524118]  [<ffffffff815eb276>] dev_hard_start_xmit+0x316/0x620
[ 3014.524118]  [<ffffffff815eb8ff>] __dev_queue_xmit+0x37f/0x6a0
[ 3014.524118]  [<ffffffff815eb585>] ? __dev_queue_xmit+0x5/0x6a0
[ 3014.524118]  [<ffffffff8169f8a0>] ? ip6_finish_output2+0x380/0x670
[ 3014.524118]  [<ffffffff815ebc30>] dev_queue_xmit+0x10/0x20
[ 3014.524118]  [<ffffffff8169f900>] ip6_finish_output2+0x3e0/0x670
[ 3014.524118]  [<ffffffff816a31aa>] ? ip6_finish_output+0x9a/0x200
[ 3014.524118]  [<ffffffff816a31aa>] ip6_finish_output+0x9a/0x200
[ 3014.524118]  [<ffffffff816a335f>] ip6_output+0x4f/0x1d0
[ 3014.524118]  [<ffffffff816c58e6>] mld_sendpack+0x196/0x330
[ 3014.524118]  [<ffffffff816c5755>] ? mld_sendpack+0x5/0x330
[ 3014.524118]  [<ffffffff816c63cd>] mld_ifc_timer_expire+0x19d/0x2d0
[ 3014.524118]  [<ffffffff816c6230>] ? mld_dad_timer_expire+0x70/0x70
[ 3014.524118]  [<ffffffff8109010a>] call_timer_fn+0x7a/0x180
[ 3014.524118]  [<ffffffff81090095>] ? call_timer_fn+0x5/0x180
[ 3014.524118]  [<ffffffff816c6230>] ? mld_dad_timer_expire+0x70/0x70
[ 3014.524118]  [<ffffffff8109054c>] run_timer_softirq+0x20c/0x2c0
[ 3014.524118]  [<ffffffff810899fd>] __do_softirq+0x12d/0x310
[ 3014.524118]  [<ffffffff81089f15>] irq_exit+0xc5/0xd0
[ 3014.524118]  [<ffffffff81711b75>] smp_apic_timer_interrupt+0x45/0x60
[ 3014.524118]  [<ffffffff817104b2>] apic_timer_interrupt+0x72/0x80
[ 3014.524118]  <EOI>  [<ffffffff81052486>] ? native_safe_halt+0x6/0x10
[ 3014.524118]  [<ffffffff8101ec04>] default_idle+0x24/0xe0
[ 3014.524118]  [<ffffffff8101f54e>] arch_cpu_idle+0x2e/0x40
[ 3014.524118]  [<ffffffff810e96ae>] cpu_startup_entry+0x9e/0x280
[ 3014.524118]  [<ffffffff81044929>] start_secondary+0x1d9/0x280

^ permalink raw reply

* Re: [PATCH v2 net-next] net: remove useless if check from register_netdevice()
From: Denis Kirjanov @ 2014-02-13 17:15 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David S. Miller, netdev
In-Reply-To: <CAADnVQJwP+ThozpeKOEvzoJRkYQvZgWyM9U+ixyeKKH9nkNb6w@mail.gmail.com>

All of the users of ndo_init return negative error code on error path.
Only staging stuff tries to use positive codes like octeon ethernet
which does request_irq inside ndo_open...

On 2/13/14, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Wed, Feb 12, 2014 at 8:47 PM, Denis Kirjanov <kda@linux-powerpc.org>
> wrote:
>> remove useless if check from register_netdevice()
>>
>> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
>> ---
>> v1 -> v2: Fixed identation
>> ---
>>  net/core/dev.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 4ad1b78..21a72ad 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5876,8 +5876,7 @@ int register_netdevice(struct net_device *dev)
>>         if (dev->netdev_ops->ndo_init) {
>>                 ret = dev->netdev_ops->ndo_init(dev);
>>                 if (ret) {
>> -                       if (ret > 0)
>> -                               ret = -EIO;
>> +                       ret = -EIO;
>>                         goto out;
>
> why is it a useless check?
> seems perfectly valid to me.
> most of the time ndo_init() returns negative error code like -ENOMEM
> which we want
> to propagate further down and want to override it to -EIO if it's > 0
>

^ permalink raw reply

* Re: [PATCH next resend] tcp: use zero-window when free_space is low
From: Rick Jones @ 2014-02-13 17:18 UTC (permalink / raw)
  To: Florian Westphal, netdev; +Cc: Neal Cardwell, Eric Dumazet, Yuchung Cheng
In-Reply-To: <1392292350-28800-1-git-send-email-fw@strlen.de>

On 02/13/2014 03:52 AM, Florian Westphal wrote:
>   Maybe someone has better idea?

Does tcp_rmem have to be a "hard" limit, or could it allowed to be fuzzy?

rick jones

^ permalink raw reply

* [PATCH net] packet: check for ndo_select_queue during queue selection
From: Daniel Borkmann @ 2014-02-13 17:18 UTC (permalink / raw)
  To: davem; +Cc: mathias.kretschmer, netdev, Jesper Dangaard Brouer

Mathias reported that on an AMD Geode LX embedded board (ALiX)
with ath9k driver PACKET_QDISC_BYPASS triggers a WARN_ON()
coming from the driver itself via 066dae93bdf ("ath9k: rework
tx queue selection and fix queue stopping/waking"). The reason
why this happened is that ndo_select_queue() call is not
invoked from direct xmit path i.e. for ieee80211 subsystem
that sets queue and TID (similar to 802.1d tag) which is being
put into the frame through 802.11e (WMM, QoS). If that is not
set, pending frame counter for e.g. ath9k can get messed up.
So the WARN_ON() in ath9k is absolutely legitimate. Generally,
the hw queue selection in ieee80211 depends on the type of
traffic, and priorities are set according to ieee80211_ac_numbers
mapping; working in a similar way as DiffServ only on a lower
layer, so that the AP can favour frames that have "real-time"
requirements like voice or video data frames. We were evaluating
this and concluded that it's best for PACKET_QDISC_BYPASS to
just probe for existance of this function and invoke it if
available so that drivers that implement it can make an
informed decision to which hw queue the frame should be
dispatched to. Only few drivers implement this function, so
for the majority of drivers nothing changes. The original
pktgen scenario for which PACKET_QDISC_BYPASS was designed
for is still intact with this change anyway. We think
restricting this feature only to ethernet drivers would be an
artificial barrier where we would instead only need to call
ndo_select_queue(). Besides, checking for ARPHRD_ETHER as pktgen
does wouldn't be sufficient as also many ieee80211 drivers
use it as dev->type. If side-effects that we documented are
undesired or use case involves buffering, then people just go
with the normal dev_queue_xmit() that is on default enabled.

Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
---
 Intended for 3.14.

 include/linux/netdevice.h | 20 ++++++++++++++++++++
 net/core/flow_dissector.c | 13 +------------
 net/packet/af_packet.c    | 20 ++++++++++++++++----
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 440a02e..5f8c860 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2276,6 +2276,26 @@ static inline void netdev_reset_queue(struct net_device *dev_queue)
 }
 
 /**
+ * 	netdev_cap_txqueue - check if selected tx queue exceeds device queues
+ * 	@dev: network device
+ * 	@queue_index: given tx queue index
+ *
+ * 	Returns 0 if given tx queue index >= number of device tx queues,
+ * 	otherwise returns the originally passed tx queue index.
+ */
+static inline u16 netdev_cap_txqueue(struct net_device *dev, u16 queue_index)
+{
+	if (unlikely(queue_index >= dev->real_num_tx_queues)) {
+		net_warn_ratelimited("%s selects TX queue %d, but real number of TX queues is %d\n",
+				     dev->name, queue_index,
+				     dev->real_num_tx_queues);
+		return 0;
+	}
+
+	return queue_index;
+}
+
+/**
  *	netif_running - test if up
  *	@dev: network device
  *
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 87577d4..d0e1fb1 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -323,17 +323,6 @@ u32 __skb_get_poff(const struct sk_buff *skb)
 	return poff;
 }
 
-static inline u16 dev_cap_txqueue(struct net_device *dev, u16 queue_index)
-{
-	if (unlikely(queue_index >= dev->real_num_tx_queues)) {
-		net_warn_ratelimited("%s selects TX queue %d, but real number of TX queues is %d\n",
-				     dev->name, queue_index,
-				     dev->real_num_tx_queues);
-		return 0;
-	}
-	return queue_index;
-}
-
 static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_XPS
@@ -409,7 +398,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 			queue_index = __netdev_pick_tx(dev, skb);
 
 		if (!accel_priv)
-			queue_index = dev_cap_txqueue(dev, queue_index);
+			queue_index = netdev_cap_txqueue(dev, queue_index);
 	}
 
 	skb_set_queue_mapping(skb, queue_index);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 6a2bb37..562ced9 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -308,9 +308,19 @@ static bool packet_use_direct_xmit(const struct packet_sock *po)
 	return po->xmit == packet_direct_xmit;
 }
 
-static u16 packet_pick_tx_queue(struct net_device *dev)
+static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
 {
-	return (u16) raw_smp_processor_id() % dev->real_num_tx_queues;
+	const struct net_device_ops *ops = dev->netdev_ops;
+	u16 queue_index;
+
+	if (ops->ndo_select_queue) {
+		queue_index = ops->ndo_select_queue(dev, skb, NULL);
+		queue_index = netdev_cap_txqueue(dev, queue_index);
+	} else {
+		queue_index = raw_smp_processor_id() % dev->real_num_tx_queues;
+	}
+
+	skb_set_queue_mapping(skb, queue_index);
 }
 
 /* register_prot_hook must be invoked with the po->bind_lock held,
@@ -2285,7 +2295,8 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 			}
 		}
 
-		skb_set_queue_mapping(skb, packet_pick_tx_queue(dev));
+		packet_pick_tx_queue(dev, skb);
+
 		skb->destructor = tpacket_destruct_skb;
 		__packet_set_status(po, ph, TP_STATUS_SENDING);
 		packet_inc_pending(&po->tx_ring);
@@ -2499,7 +2510,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	skb->dev = dev;
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;
-	skb_set_queue_mapping(skb, packet_pick_tx_queue(dev));
+
+	packet_pick_tx_queue(dev, skb);
 
 	if (po->has_vnet_hdr) {
 		if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH v2] net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
From: Vlad Yasevich @ 2014-02-13 17:14 UTC (permalink / raw)
  To: Matija Glavinic Pecotic
  Cc: linux-sctp@vger.kernel.org, netdev@vger.kernel.org,
	Alexander Sverdlin
In-Reply-To: <52FA8073.2060607@nsn.com>

On 02/11/2014 02:56 PM, Matija Glavinic Pecotic wrote:
> Hello Vlad,
>
> On 02/11/2014 03:52 PM, ext Vlad Yasevich wrote:
>> Hi Matija
>>
>> On 02/09/2014 02:15 AM, Matija Glavinic Pecotic wrote:
>>>
>>> --- net-next.orig/net/sctp/ulpevent.c
>>> +++ net-next/net/sctp/ulpevent.c
>>> @@ -989,7 +989,7 @@ static void sctp_ulpevent_receive_data(s
>>>  	skb = sctp_event2skb(event);
>>>  	/* Set the owner and charge rwnd for bytes received.  */
>>>  	sctp_ulpevent_set_owner(event, asoc);
>>> -	sctp_assoc_rwnd_decrease(asoc, skb_headlen(skb));
>>> +	sctp_assoc_rwnd_update(asoc, false);
>>>
>>>  	if (!skb->data_len)
>>>  		return;
>>> @@ -1035,8 +1035,9 @@ static void sctp_ulpevent_release_data(s
>>>  	}
>>>
>>>  done:
>>> -	sctp_assoc_rwnd_increase(event->asoc, len);
>>> -	sctp_ulpevent_release_owner(event);
>>> +	atomic_sub(event->rmem_len, &event->asoc->rmem_alloc);
>>> +	sctp_assoc_rwnd_update(event->asoc, true);
>>> +	sctp_association_put(event->asoc)
>>
>> Can't we simply change the order of window update and release instead
>> of open coding it like this?
>
> that was the initial idea, but sctp_ulpevent_release_owner puts
> the association and calls sctp_association_destroy if its time to
> do so. IMHO, in the case if we would switch it, we would open a
> potential race condition.

On the tx side, I agree that there would be race.  One the recieve side,
I don't think you could ever be in the codition where the last reference
on the asoc is held by a data chunk you are feeing.

However, to be completely safe we could do:
    asoc = event->asoc;
    sctp_association_hold(asoc);
    sctp_ulpevent_release_owner(event);
    sctp_assoc_rwnd_update(asoc, true);
    sctp_assocition_put(asoc);


The reason I don't like to open-code release owner is that if it
ever changes, we'll have to rememeber to change this open-coded
implementation as well.

Thanks
-vlad

>
> I agree this doesn't look the best. But since we should call
> sctp_assoc_rwnd_update after accounting and before put, we have only
> option to move sctp_assoc_rwnd_update to _ulpevent_release_owner. As on
> this path we wish to update peer and generate sack, but we for sure do not
> want it on all paths where ulpevent_release_owner is used, I see no
> alternative but to add additional parameter to ulpevent_release_owner
> which would be just passed to rwnd_update - bool update_peer. On the
> other hand, I wonder whether ulpevent_release_owner would do more then
> it should in that case?
>
>>
>> -vlad
>>
>>>  }
>>>
>>>  static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent
*event)
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>

^ permalink raw reply

* Re: [PATCH v2] usbnet: remove generic hard_header_len check
From: Emil Goode @ 2014-02-13 17:29 UTC (permalink / raw)
  To: David Laight
  Cc: Steve Glendinning, Oliver Neukum, Bjørn Mork,
	David S. Miller, Freddy Xin, Eric Dumazet, Ming Lei,
	Paul Gortmaker, Jeff Kirsher, Liu Junliang, Octavian Purdila,
	linux-usb@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D0F6BEF3A@AcuExch.aculab.com>

On Thu, Feb 13, 2014 at 04:56:58PM +0000, David Laight wrote:
> From: Of Emil Goode
> > This patch removes a generic hard_header_len check from the usbnet
> > module that is causing dropped packages under certain circumstances
> > for devices that send rx packets that cross urb boundaries.
> > 
> > One example is the AX88772B which occasionally send rx packets that
> > cross urb boundaries where the remaining partial packet is sent with
> > no hardware header. When the buffer with a partial packet is of less
> > number of octets than the value of hard_header_len the buffer is
> > discarded by the usbnet module.
> ...
> > diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
> > index d6f64da..955df81 100644
> > --- a/drivers/net/usb/ax88179_178a.c
> > +++ b/drivers/net/usb/ax88179_178a.c
> > @@ -1118,6 +1118,10 @@ static int ax88179_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
> >  	u16 hdr_off;
> >  	u32 *pkt_hdr;
> > 
> > +	/* This check is no longer done by usbnet */
> > +	if (skb->len < dev->net->hard_header_len)
> > +		return 0;
> > +
> 
> The ax88179 driver can also receive ethernet frames that cross the
> end of rx URB.
> It should have the code to save the last fragment (of a urb) until
> the next data arrives, and then correctly merge the fragments.
> 
> It is likely that any sub-driver that sets the receive urb length
> to a multiple of 1k (rather than leaving it at hard_hdr+mtu) can defrag
> rx data that crosses urb boundaries.

Yes probably the check is unnecessary to many of the devices but
it should be tested by someone who has the hardware.
If you have a ax88179 device to test this patch but without the
hard_header_len check added to the ax88179_rx_fixup function that
would be great. Then we don't need to introduce that check.

Best regards,

Emil Goode

^ permalink raw reply

* Re: ovs inconsistent lock state
From: Pravin Shelar @ 2014-02-13 17:42 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, Jesse Gross, dev@openvswitch.org
In-Reply-To: <20140213171307.GC2829@minipsycho.orion>

On Thu, Feb 13, 2014 at 9:13 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Hi.
>
> On current net-next I'm getting following message once I add a dp:
>
> [ 3014.523665] =================================
> [ 3014.524118] [ INFO: inconsistent lock state ]
> [ 3014.524118] 3.14.0-rc2+ #1 Not tainted
> [ 3014.524118] ---------------------------------
> [ 3014.524118] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> [ 3014.524118] swapper/1/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
> [ 3014.524118]  (&(&flow->stats.stat->lock)->rlock){+.?...}, at: [<ffffffffa021b6ef>] ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
> [ 3014.524118] {SOFTIRQ-ON-W} state was registered at:
> [ 3014.524118]   [<ffffffff810d6843>] __lock_acquire+0x5a3/0x1c30
> [ 3014.524118]   [<ffffffff810d7f80>] lock_acquire+0xb0/0x150
> [ 3014.524118]   [<ffffffff81705c99>] _raw_spin_lock+0x39/0x50
> [ 3014.524118]   [<ffffffffa021b8d3>] ovs_flow_stats_get+0x163/0x1b0 [openvswitch]
> [ 3014.524118]   [<ffffffffa0218ce5>] ovs_flow_cmd_fill_info+0x165/0x2b0 [openvswitch]
> [ 3014.524118]   [<ffffffffa021904c>] ovs_flow_cmd_build_info.constprop.27+0x6c/0xa0 [openvswitch]
> [ 3014.524118]   [<ffffffffa0219519>] ovs_flow_cmd_new_or_set+0x499/0x4f0 [openvswitch]
> [ 3014.524118]   [<ffffffff8161bf09>] genl_family_rcv_msg+0x199/0x390
> [ 3014.524118]   [<ffffffff8161c18e>] genl_rcv_msg+0x8e/0xd0
> [ 3014.524118]   [<ffffffff8161a189>] netlink_rcv_skb+0xa9/0xc0
> [ 3014.524118]   [<ffffffff8161a6a8>] genl_rcv+0x28/0x40
> [ 3014.524118]   [<ffffffff816197d0>] netlink_unicast+0xf0/0x1b0
> [ 3014.524118]   [<ffffffff81619b8f>] netlink_sendmsg+0x2ff/0x740
> [ 3014.524118]   [<ffffffff815ce86b>] sock_sendmsg+0x8b/0xc0
> [ 3014.524118]   [<ffffffff815cf369>] ___sys_sendmsg+0x389/0x3a0
> [ 3014.524118]   [<ffffffff815cfa62>] __sys_sendmsg+0x42/0x80
> [ 3014.524118]   [<ffffffff815cfab2>] SyS_sendmsg+0x12/0x20
> [ 3014.524118]   [<ffffffff8170f829>] system_call_fastpath+0x16/0x1b
> [ 3014.524118] irq event stamp: 6163288
> [ 3014.524118] hardirqs last  enabled at (6163288): [<ffffffff8169f8a0>] ip6_finish_output2+0x380/0x670
> [ 3014.524118] hardirqs last disabled at (6163287): [<ffffffff8169f85b>] ip6_finish_output2+0x33b/0x670
> [ 3014.524118] softirqs last  enabled at (6163266): [<ffffffff810888a2>] _local_bh_enable+0x22/0x50
> [ 3014.524118] softirqs last disabled at (6163267): [<ffffffff81089f15>] irq_exit+0xc5/0xd0
> [ 3014.524118]
> other info that might help us debug this:
> [ 3014.524118]  Possible unsafe locking scenario:
>
> [ 3014.524118]        CPU0
> [ 3014.524118]        ----
> [ 3014.524118]   lock(&(&flow->stats.stat->lock)->rlock);
> [ 3014.524118]   <Interrupt>
> [ 3014.524118]     lock(&(&flow->stats.stat->lock)->rlock);
> [ 3014.524118]
>  *** DEADLOCK ***
>
> [ 3014.524118] 5 locks held by swapper/1/0:
> [ 3014.524118]  #0:  (((&idev->mc_ifc_timer))){+.-...}, at: [<ffffffff81090095>] call_timer_fn+0x5/0x180
> [ 3014.524118]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff816c5755>] mld_sendpack+0x5/0x330
> [ 3014.524118]  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8169f579>] ip6_finish_output2+0x59/0x670
> [ 3014.524118]  #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815eb585>] __dev_queue_xmit+0x5/0x6a0
> [ 3014.524118]  #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0221995>] internal_dev_xmit+0x5/0x90 [openvswitch]
> [ 3014.524118]
> stack backtrace:
> [ 3014.524118] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc2+ #1
> [ 3014.524118] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 3014.524118]  ffffffff82338940 ffff8800bb2838f0 ffffffff816fd1ac ffff8800b6b33570
> [ 3014.524118]  ffff8800bb283940 ffffffff816f9144 0000000000000005 ffff880000000001
> [ 3014.524118]  ffff880000000000 0000000000000006 ffff8800b6b33570 ffffffff810d4380
> [ 3014.524118] Call Trace:
> [ 3014.524118]  <IRQ>  [<ffffffff816fd1ac>] dump_stack+0x4d/0x66
> [ 3014.524118]  [<ffffffff816f9144>] print_usage_bug+0x1f3/0x204
> [ 3014.524118]  [<ffffffff810d4380>] ? check_usage_backwards+0x130/0x130
> [ 3014.524118]  [<ffffffff810d4da0>] mark_lock+0x270/0x300
> [ 3014.524118]  [<ffffffff810d6718>] __lock_acquire+0x478/0x1c30
> [ 3014.524118]  [<ffffffff810d4ede>] ? mark_held_locks+0xae/0x130
> [ 3014.524118]  [<ffffffffa0220047>] ? find_bucket.isra.1+0x67/0x70 [openvswitch]
> [ 3014.524118]  [<ffffffffa0220228>] ? masked_flow_lookup+0x78/0x140 [openvswitch]
> [ 3014.524118]  [<ffffffff810d669a>] ? __lock_acquire+0x3fa/0x1c30
> [ 3014.524118]  [<ffffffff810d7f80>] lock_acquire+0xb0/0x150
> [ 3014.524118]  [<ffffffffa021b6ef>] ? ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
> [ 3014.524118]  [<ffffffff81705c99>] _raw_spin_lock+0x39/0x50
> [ 3014.524118]  [<ffffffffa021b6ef>] ? ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
> [ 3014.524118]  [<ffffffffa021b6ef>] ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
> [ 3014.524118]  [<ffffffffa021ac75>] ovs_dp_process_received_packet+0x75/0x100 [openvswitch]
> [ 3014.524118]  [<ffffffffa02213ea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
> [ 3014.524118]  [<ffffffffa02219e6>] internal_dev_xmit+0x56/0x90 [openvswitch]
> [ 3014.524118]  [<ffffffffa0221995>] ? internal_dev_xmit+0x5/0x90 [openvswitch]
> [ 3014.524118]  [<ffffffff815eb276>] dev_hard_start_xmit+0x316/0x620
> [ 3014.524118]  [<ffffffff815eb8ff>] __dev_queue_xmit+0x37f/0x6a0
> [ 3014.524118]  [<ffffffff815eb585>] ? __dev_queue_xmit+0x5/0x6a0
> [ 3014.524118]  [<ffffffff8169f8a0>] ? ip6_finish_output2+0x380/0x670
> [ 3014.524118]  [<ffffffff815ebc30>] dev_queue_xmit+0x10/0x20
> [ 3014.524118]  [<ffffffff8169f900>] ip6_finish_output2+0x3e0/0x670
> [ 3014.524118]  [<ffffffff816a31aa>] ? ip6_finish_output+0x9a/0x200
> [ 3014.524118]  [<ffffffff816a31aa>] ip6_finish_output+0x9a/0x200
> [ 3014.524118]  [<ffffffff816a335f>] ip6_output+0x4f/0x1d0
> [ 3014.524118]  [<ffffffff816c58e6>] mld_sendpack+0x196/0x330
> [ 3014.524118]  [<ffffffff816c5755>] ? mld_sendpack+0x5/0x330
> [ 3014.524118]  [<ffffffff816c63cd>] mld_ifc_timer_expire+0x19d/0x2d0
> [ 3014.524118]  [<ffffffff816c6230>] ? mld_dad_timer_expire+0x70/0x70
> [ 3014.524118]  [<ffffffff8109010a>] call_timer_fn+0x7a/0x180
> [ 3014.524118]  [<ffffffff81090095>] ? call_timer_fn+0x5/0x180
> [ 3014.524118]  [<ffffffff816c6230>] ? mld_dad_timer_expire+0x70/0x70
> [ 3014.524118]  [<ffffffff8109054c>] run_timer_softirq+0x20c/0x2c0
> [ 3014.524118]  [<ffffffff810899fd>] __do_softirq+0x12d/0x310
> [ 3014.524118]  [<ffffffff81089f15>] irq_exit+0xc5/0xd0
> [ 3014.524118]  [<ffffffff81711b75>] smp_apic_timer_interrupt+0x45/0x60
> [ 3014.524118]  [<ffffffff817104b2>] apic_timer_interrupt+0x72/0x80
> [ 3014.524118]  <EOI>  [<ffffffff81052486>] ? native_safe_halt+0x6/0x10
> [ 3014.524118]  [<ffffffff8101ec04>] default_idle+0x24/0xe0
> [ 3014.524118]  [<ffffffff8101f54e>] arch_cpu_idle+0x2e/0x40
> [ 3014.524118]  [<ffffffff810e96ae>] cpu_startup_entry+0x9e/0x280
> [ 3014.524118]  [<ffffffff81044929>] start_secondary+0x1d9/0x280
> --

Since ovs disable bh while reading stats from local cpu, deadlock can
not occur. This is false positive.

> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Intermittent lockups creating/manipulating network namespaces (was Re: large degradation in ip netns add/exec performance in 3.13?)
From: Rick Jones @ 2014-02-13 18:02 UTC (permalink / raw)
  To: netdev
In-Reply-To: <52FAC6D1.50702@hp.com>

On 02/11/2014 04:56 PM, Rick Jones wrote:
> On 02/11/2014 02:26 PM, Rick Jones wrote:
>> I did have the system lockup once at 16 concurrent streams on the 3.13.0
>> kernel.  Didn't happen the next two times I tried.
>
> I've had it happen again with a slightly tweaked ("heavier") fake router
> creation.  I'm guessing the attached soft lockup messages are related
> and may help point somewhere.

The 16 concurrent streams lockup.  The "rest of the system" keeps going.

I've had the same hang happen with 3.14.0-rc2 as well.  I was able to 
take a crash dump via "echo c > /proc/sysrq-trigger" but I have no clue 
how to look at it.  More than willing to upload it to netperf.org.

rick jones

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox