Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
From: Jamal Hadi Salim @ 2014-11-10 12:56 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, jiri, netdev, nhorman, andy, tgraf, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460B090.7080106@redhat.com>

On 11/10/14 07:33, Daniel Borkmann wrote:

> $ git grep -n MAX_PHYS_PORT_ID_LEN
> include/linux/netdevice.h:756:#define MAX_PHYS_PORT_ID_LEN 32
> include/linux/netdevice.h:762:  unsigned char id[MAX_PHYS_PORT_ID_LEN];
> net/core/rtnetlink.c:871:              +
> nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
> net/core/rtnetlink.c:1199:      [IFLA_PHYS_PORT_ID]     = { .type =
> NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
>
> ... and based on commit 66cae9ed6bc4 ("rtnl: export physical port id
> via RT netlink") only exported as read-only.
>

I guess it is *not exported* if no user space code sees it.
If that is the case, I agree that my suggestion is unneeded.

cheers,
jamal

^ permalink raw reply

* Re: [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
From: Jamal Hadi Salim @ 2014-11-10 13:11 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-8-git-send-email-jiri@resnulli.us>

On 11/09/14 05:51, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> To notify switch driver of change in STP state of bridge port, add new
> .ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
> code then.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/linux/netdevice.h |  6 ++++++
>   include/net/switchdev.h   |  6 ++++++
>   net/bridge/br_netlink.c   |  2 ++
>   net/bridge/br_stp.c       |  4 ++++
>   net/bridge/br_stp_if.c    |  3 +++
>   net/bridge/br_stp_timer.c |  2 ++
>   net/switchdev/switchdev.c | 19 +++++++++++++++++++
>   7 files changed, 42 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 116a19d..35f21a95 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1033,6 +1033,10 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>    *			      const unsigned char *addr,
>    *			      u16 vid);
>    *	Called to delete a fdb from switch device port.
> + *
> + * int (*ndo_sw_port_stp_update)(struct net_device *dev, u8 state);
> + *	Called to notify switch device port of bridge port STP
> + *	state change.

You are unconditionally calling
netdev_sw_port_stp_update(p->dev, p->state);
Again issue is policy. Could you make this work the same
way the fdb_add e.g user intent of whether i want to turn
a port in hardware and/or software to disabled/learning/etc
is reflected?

btw: does _sw_ stand for switch? why not _hw_ ?
Could we have one ndo for all flags instead of individual ones.

I know the current user space code uses u8 as a bitflag; but
maybe we can introduce a new u32 flag bitmask that has all the
flags set for backward compat? I can count about a total of 10.

cheers,
jamal

^ permalink raw reply

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
From: Jiri Pirko @ 2014-11-10 13:12 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460AC90.9060607@mojatatu.com>

Mon, Nov 10, 2014 at 01:16:16PM CET, jhs@mojatatu.com wrote:
>On 11/10/14 02:23, Jiri Pirko wrote:
>
>>
>>Yes I looked over their patches. Roopas patche's are about new class of
>>device which, as I commented in the cover letter, I left out for now and
>>can be safely added later on.
>>
>>I went over the Ben's work very carefully as well. The patches are very
>>rough, mostly rtl-chip specific. But again, my patchset is a base on
>>which this patches can be build on. I see no issues in that.
>>
>>>At least please get their sign on - this  is such an important piece of
>>>new work that you should make sure you get consensus.
>>
>>Since I did not use their code now, I only put sign off of Scott.
>>
>
>Your last comment was "i am going to merge the patches" ;->
>At least send an email explaining your plan to people who have worked
>hard to cooperate with you or say it in the cover letter.

Well I had feedback only from Roopa and we discussed it in following
email thread. I never had any feedback from Ben. I only saw you pasting
link to his git.

There's really nothing else to merge at the moment. I would love to went
over patches and merge them into my tree if anyone sends them.

>
>>>Otherwise we are back to square one and everyone is going their way with
>>>their patches;
>>
>>I do think that we are in sync. I do not see any counter ways. As I
>>said, their work can be added on to the base made of this patchset.
>>
>
>Ok, I hope so. I spoke for myself - it is important for this patches
>you get their sign-on in my opinion.
>
>cheers,
>jamal
>

^ permalink raw reply

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
From: Jiri Pirko @ 2014-11-10 13:16 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460ACC0.3020805@mojatatu.com>

Mon, Nov 10, 2014 at 01:17:04PM CET, jhs@mojatatu.com wrote:
>On 11/10/14 02:43, Jiri Pirko wrote:
>>Mon, Nov 10, 2014 at 04:35:12AM CET, jhs@mojatatu.com wrote:
>
>>I don't see a reason why this would break kabi:
>>
>>-#define MAX_PHYS_PORT_ID_LEN 32
>>+#define MAX_PHYS_ITEM_ID_LEN 32
>>
>
>refer to my response to Dave. Just define MAX_PHYS_PORT_ID_LEN to
>MAX_PHYS_ITEM_ID_LEN so people dont have to change their code
>because a name change.

Jamal, please look at the patch & code. MAX_PHYS_PORT_ID_LEN is in
include/linux/netdevice.h which is not part of user exported api.

>
>cheers,
>jamal

^ permalink raw reply

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
From: Jamal Hadi Salim @ 2014-11-10 13:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <20141110131622.GC4256@nanopsycho.orion>

On 11/10/14 08:16, Jiri Pirko wrote:

> Jamal, please look at the patch & code. MAX_PHYS_PORT_ID_LEN is in
> include/linux/netdevice.h which is not part of user exported api.
>

I got it - the confusing part was rtnetlink.c was looking for it
as if it was expecting user space to send it.

cheers,
jamal

^ permalink raw reply

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
From: Jiri Pirko @ 2014-11-10 13:47 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460B3E5.7020502@mojatatu.com>

Mon, Nov 10, 2014 at 01:47:33PM CET, jhs@mojatatu.com wrote:
>On 11/10/14 03:15, Jiri Pirko wrote:
>>Mon, Nov 10, 2014 at 04:47:48AM CET, jhs@mojatatu.com wrote:
>>>On 11/09/14 05:51, Jiri Pirko wrote:
>>>>From: Scott Feldman <sfeldma@gmail.com>
>>>>
>
>>Jamal, I believe we discussed this already.
>
>I cant remember how that ended.
>
>>The thing is that current
>>fdb_add/del does not need vlanid and master/self flags, because it
>>already has that (struct nlattr *tb[]). Here is the whole list of
>>parameters to these functions:
>>         NDA_DST,
>>         NDA_LLADDR,
>>         NDA_CACHEINFO,
>>         NDA_PROBES,
>>         NDA_VLAN,
>>         NDA_PORT,
>>         NDA_VNI,
>>         NDA_IFINDEX,
>>         NDA_MASTER,
>>
>>There are few problems in re-using this. It is netlink based so for calling
>>it from bridge code, we would have to construct netlink message. But
>>that could be probably changed.
>
>Trying to understand.
>
>A netlink message for a bridge to add an fdb is targeted at the
>*bridge port*.
>That message has semantic which says "please add this entry
>to the software bridge and/or offloaded hardware".
>If something is targetted at the bridge port, ->ndo_fdb_add()
>is invoked with an internally chewed structure.
>Why would you have to construct a new netlink message to the driver?

Because now, If you would like to pass one of NDA_DST, NDA_LLADDR,
NDA_CACHEINFO, NDA_PROBES, NDA_VLAN, NDA_PORT, NDA_VNI, NDA_IFINDEX,
NDA_MASTER values via ndo_fdb_add/del to the driver, you have to
construct "struct nlattr *tb[]". Preprocessing this tb into struct might
be suitable for some use-case, for some it may not.


>
>
>>As you can see from the list of parameters, this is no longer about fdb (addr,
>>vlanid) but this has been extended to something else.
>
>I am still missing understanding that part.
>Or maybe are you saying that you dont want to pass netlink
>constructs to the driver?

What I try to say is that the naming ndo_fdb_add/del is not accurate
because it is now used for far more than fdb (addr, vlan). See vxlan
code for example.


>
>>See vxlan code for
>>what this is used for. I believe that fdb_add/del should be renamed to
>>something else, perhaps l2neigh_add/del or something like that.
>>The other problem is that fdb_add/del is currently used by various
>>drivers for different purpose (adding macs to unicast list).
>>
>
>Ok, now a small spark ignited in my brain. You did talk about renaming
>things to neighXXX in one of the exchanges. I think this is a separate
>issue from the question of why you cant refactor ndo_fdb_add/del

It can be probably refactored in a way so it fits our fdb offloading
needs. I'm not really sure we would want it. ndo_fdb_* use-case
is dirrerent from what we introduce with ndo_sw_port_fdb_*. The only
similarity is the "fdb" name which in case of ndo_fdb_* is no longer
correct I believe.


>
>The abuse of using this interface for unicast addresses is probably
>driven by the fact some of the hardware probably offloads vlanid 0 or
>something speacial like 4095 to point to the underlying hardware that
>"this belongs to host cpu".
>I am not a fan of it (and have posted in exchanges with Vlad in the
>past).
>
>cheers,
>jamal

^ permalink raw reply

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
From: Thomas Graf @ 2014-11-10 13:51 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, netdev, davem, nhorman, andy, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl@
In-Reply-To: <20141110081552.GD1850@nanopsycho.orion>

On 11/10/14 at 09:15am, Jiri Pirko wrote:
> There are few problems in re-using this. It is netlink based so for calling
> it from bridge code, we would have to construct netlink message. But
> that could be probably changed.
> As you can see from the list of parameters, this is no longer about fdb (addr,
> vlanid) but this has been extended to something else. See vxlan code for
> what this is used for. I believe that fdb_add/del should be renamed to
> something else, perhaps l2neigh_add/del or something like that.
> The other problem is that fdb_add/del is currently used by various
> drivers for different purpose (adding macs to unicast list).

Can you elaborate a bit on the intended semantic differences between
the existing ndo_fdb_add() and ndo_sw_port_fdb_add()? I'm not sure we
need the sw_ prefix for this specific ndo.

I completely agree that relying on Netlink is wrong because we'll have
in-kernel users of the API but I believe that existing ndo_fdb_add()
implementations in i40e, ixgbe, qlcnic and macvlan could use the new
API you propose.

How about we rename the existing ndo_fdb_add() to ndo_neigh_add() as
you propose and convert vxlan over to it and have all others which don't
even depend on the Netlink attributes being passed in (i40e, ixgbe,
qlcnic, macvlan) use ndo_fdb_add() which would have the behaviour of your
proposed ndo_sw_port_fdb_add()?

^ permalink raw reply

* Re: [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
From: Thomas Graf @ 2014-11-10 14:04 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460B989.8030404@mojatatu.com>

On 11/10/14 at 08:11am, Jamal Hadi Salim wrote:
> You are unconditionally calling
> netdev_sw_port_stp_update(p->dev, p->state);
> Again issue is policy. Could you make this work the same
> way the fdb_add e.g user intent of whether i want to turn
> a port in hardware and/or software to disabled/learning/etc
> is reflected?

Agreed. Can be added in a next series perhaps?

> btw: does _sw_ stand for switch? why not _hw_ ?
> Could we have one ndo for all flags instead of individual ones.
> 
> I know the current user space code uses u8 as a bitflag; but
> maybe we can introduce a new u32 flag bitmask that has all the
> flags set for backward compat? I can count about a total of 10.

I think we can just extend the size of IFLA_BRPORT_STATE, accept
both a u8 and u32, and return a u32 that that is compatible to
existing u8 readers.

^ permalink raw reply

* Re: [RFC PATCH net-next] net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited
From: Nicolas Dichtel @ 2014-11-10 14:27 UTC (permalink / raw)
  To: Joe Perches, David Miller; +Cc: netdev, linux-kernel, Remi Denis-Courmont
In-Reply-To: <1415560642.23530.43.camel@perches.com>

Le 09/11/2014 20:17, Joe Perches a écrit :
> Use the more common dynamic_debug capable net_dbg_ratelimited
> and remove the LIMIT_NETDEBUG macro.
>
> This may have some negative impact on messages that were
> emitted at KERN_INFO that are not not enabled at all unless
> DEBUG is defined or dynamic_debug is enabled.  Even so,
> these messages are now _not_ emitted by default.
>
> This eliminates the use of the net_msg_warn sysctl
> "/proc/sys/net/core/warnings".
>
> All messages are still ratelimited.
>
> Some KERN_LEVEL uses are changed to KERN_DEBUG.
>
> Miscellanea:
>
> o Update the sysctl documentation
> o Remove the embedded uses of pr_fmt
> o Coalesce format fragments
> o Realign arguments
>
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>
> Let me know if you want this consolidate patch broken up
> into multiple patches or any of the messages and the
> macro kept.
>
>   Documentation/sysctl/net.txt | 12 ++++++++----
>   include/net/sock.h           |  8 +-------
>   include/net/udplite.h        |  6 +++---
>   net/ipv4/icmp.c              |  8 ++++----
>   net/ipv4/inet_fragment.c     |  2 +-
>   net/ipv4/ip_fragment.c       |  3 +--
>   net/ipv4/tcp_input.c         |  8 ++++----
>   net/ipv4/tcp_timer.c         | 18 ++++++++++--------
>   net/ipv4/udp.c               | 30 +++++++++++++++---------------
>   net/ipv6/addrconf.c          |  6 ++----
>   net/ipv6/ah6.c               |  7 +++----
>   net/ipv6/datagram.c          |  4 ++--
>   net/ipv6/esp6.c              |  4 ++--
>   net/ipv6/exthdrs.c           | 18 +++++++++---------
>   net/ipv6/icmp.c              | 15 +++++++--------
>   net/ipv6/mip6.c              | 11 ++++++-----
>   net/ipv6/netfilter.c         |  2 +-
>   net/ipv6/udp.c               | 31 +++++++++++++------------------
>   net/phonet/af_phonet.c       |  9 +++++----
>   net/phonet/pep-gprs.c        |  3 +--
>   net/phonet/pep.c             | 12 ++++++------
>   21 files changed, 104 insertions(+), 113 deletions(-)
>
> diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> index 04892b8..46cd03d 100644
> --- a/Documentation/sysctl/net.txt
> +++ b/Documentation/sysctl/net.txt
> @@ -120,10 +120,14 @@ seconds.
>   warnings
>   --------
>
> -This controls console messages from the networking stack that can occur because
> -of problems on the network like duplicate address or bad checksums. Normally,
> -this should be enabled, but if the problem persists the messages can be
> -disabled.
> +This sysctl is now unused.
> +
> +This was used to control console messages from the networking stack that
> +occur because of problems on the network like duplicate address or bad
> +checksums.
> +
> +These messages are now emitted at KERN_DEBUG and can generally be enabled
> +and controlled by the dynamic_debug facility.
>
>   netdev_budget
>   -------------
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 6767d75..db363ad 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2276,13 +2276,7 @@ bool sk_ns_capable(const struct sock *sk,
>   bool sk_capable(const struct sock *sk, int cap);
>   bool sk_net_capable(const struct sock *sk, int cap);
>
> -/*
> - *	Enable debug/info messages
> - */
> -extern int net_msg_warn;
> -#define LIMIT_NETDEBUG(fmt, args...) \
> -	do { if (net_msg_warn && net_ratelimit()) printk(fmt,##args); } while(0)
> -
> +extern int net_msg_warn;	/* Unused, but still a sysctl */
Why not removing this variable from this header and from net/core/utils.c?
Just declaring a static variable in net/core/sysctl_net_core.c should be enough.
Am I missing something?

Nicolas

^ permalink raw reply

* Re: [PATCH] stmmac: split to core library and probe drivers
From: Giuseppe CAVALLARO @ 2014-11-10 14:28 UTC (permalink / raw)
  To: Andy Shevchenko, netdev, Kweh Hock Leong, David S . Miller,
	Vince Bridgers
In-Reply-To: <1415615939-6671-1-git-send-email-andriy.shevchenko@linux.intel.com>

Hi Andy

On 11/10/2014 11:38 AM, Andy Shevchenko wrote:
> Instead of registering the platform and PCI drivers in one module let's move
> necessary bits to where it belongs. During this procedure we convert the module
> registration part to use module_*_driver() macros which makes code simplier.
>
>>From now on the driver consists three parts: core library, PCI, and platform
> drivers.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

patch looks fine and net-next.

Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

> ---
>   drivers/net/ethernet/stmicro/stmmac/Kconfig        |  4 +-
>   drivers/net/ethernet/stmicro/stmmac/Makefile       | 15 +++---
>   drivers/net/ethernet/stmicro/stmmac/stmmac.h       | 61 ----------------------
>   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 36 ++-----------
>   drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c   |  4 +-
>   .../net/ethernet/stmicro/stmmac/stmmac_platform.c  | 13 ++---
>   6 files changed, 25 insertions(+), 108 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
> index 33b85ba..7d3af19 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
> +++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
> @@ -14,7 +14,7 @@ config STMMAC_ETH
>   if STMMAC_ETH
>
>   config STMMAC_PLATFORM
> -	bool "STMMAC Platform bus support"
> +	tristate "STMMAC Platform bus support"
>   	depends on STMMAC_ETH
>   	default y
>   	---help---
> @@ -27,7 +27,7 @@ config STMMAC_PLATFORM
>   	  If unsure, say N.
>
>   config STMMAC_PCI
> -	bool "STMMAC PCI bus support"
> +	tristate "STMMAC PCI bus support"
>   	depends on STMMAC_ETH && PCI
>   	---help---
>   	  This is to select the Synopsys DWMAC available on PCI devices,
> diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
> index 034da70..ac4d562 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/Makefile
> +++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
> @@ -1,9 +1,12 @@
>   obj-$(CONFIG_STMMAC_ETH) += stmmac.o
> -stmmac-$(CONFIG_STMMAC_PCI) += stmmac_pci.o
> -stmmac-$(CONFIG_STMMAC_PLATFORM) +=	stmmac_platform.o dwmac-meson.o \
> -					dwmac-sunxi.o dwmac-sti.o \
> -					dwmac-socfpga.o
>   stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o ring_mode.o	\
> -	      chain_mode.o dwmac_lib.o dwmac1000_core.o  dwmac1000_dma.o \
> -	      dwmac100_core.o dwmac100_dma.o enh_desc.o  norm_desc.o \
> +	      chain_mode.o dwmac_lib.o dwmac1000_core.o dwmac1000_dma.o	\
> +	      dwmac100_core.o dwmac100_dma.o enh_desc.o norm_desc.o	\
>   	      mmc_core.o stmmac_hwtstamp.o stmmac_ptp.o $(stmmac-y)
> +
> +obj-$(CONFIG_STMMAC_PLATFORM) += stmmac-platform.o
> +stmmac-platform-objs:= stmmac_platform.o dwmac-meson.o dwmac-sunxi.o	\
> +		       dwmac-sti.o dwmac-socfpga.o
> +
> +obj-$(CONFIG_STMMAC_PCI) += stmmac-pci.o
> +stmmac-pci-objs:= stmmac_pci.o
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> index bd75ee8..c0a3919 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> @@ -134,65 +134,4 @@ struct stmmac_priv *stmmac_dvr_probe(struct device *device,
>   void stmmac_disable_eee_mode(struct stmmac_priv *priv);
>   bool stmmac_eee_init(struct stmmac_priv *priv);
>
> -#ifdef CONFIG_STMMAC_PLATFORM
> -extern struct platform_driver stmmac_pltfr_driver;
> -
> -static inline int stmmac_register_platform(void)
> -{
> -	int err;
> -
> -	err = platform_driver_register(&stmmac_pltfr_driver);
> -	if (err)
> -		pr_err("stmmac: failed to register the platform driver\n");
> -
> -	return err;
> -}
> -
> -static inline void stmmac_unregister_platform(void)
> -{
> -	platform_driver_unregister(&stmmac_pltfr_driver);
> -}
> -#else
> -static inline int stmmac_register_platform(void)
> -{
> -	pr_debug("stmmac: do not register the platf driver\n");
> -
> -	return 0;
> -}
> -
> -static inline void stmmac_unregister_platform(void)
> -{
> -}
> -#endif /* CONFIG_STMMAC_PLATFORM */
> -
> -#ifdef CONFIG_STMMAC_PCI
> -extern struct pci_driver stmmac_pci_driver;
> -static inline int stmmac_register_pci(void)
> -{
> -	int err;
> -
> -	err = pci_register_driver(&stmmac_pci_driver);
> -	if (err)
> -		pr_err("stmmac: failed to register the PCI driver\n");
> -
> -	return err;
> -}
> -
> -static inline void stmmac_unregister_pci(void)
> -{
> -	pci_unregister_driver(&stmmac_pci_driver);
> -}
> -#else
> -static inline int stmmac_register_pci(void)
> -{
> -	pr_debug("stmmac: do not register the PCI driver\n");
> -
> -	return 0;
> -}
> -
> -static inline void stmmac_unregister_pci(void)
> -{
> -}
> -#endif /* CONFIG_STMMAC_PCI */
> -
>   #endif /* __STMMAC_H__ */
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 53db11b..0f1c146 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2885,6 +2885,7 @@ error_clk_get:
>
>   	return ERR_PTR(ret);
>   }
> +EXPORT_SYMBOL_GPL(stmmac_dvr_probe);
>
>   /**
>    * stmmac_dvr_remove
> @@ -2914,8 +2915,8 @@ int stmmac_dvr_remove(struct net_device *ndev)
>
>   	return 0;
>   }
> +EXPORT_SYMBOL_GPL(stmmac_dvr_remove);
>
> -#ifdef CONFIG_PM
>   int stmmac_suspend(struct net_device *ndev)
>   {
>   	struct stmmac_priv *priv = netdev_priv(ndev);
> @@ -2957,6 +2958,7 @@ int stmmac_suspend(struct net_device *ndev)
>   	priv->oldduplex = -1;
>   	return 0;
>   }
> +EXPORT_SYMBOL_GPL(stmmac_suspend);
>
>   int stmmac_resume(struct net_device *ndev)
>   {
> @@ -3003,37 +3005,7 @@ int stmmac_resume(struct net_device *ndev)
>
>   	return 0;
>   }
> -#endif /* CONFIG_PM */
> -
> -/* Driver can be configured w/ and w/ both PCI and Platf drivers
> - * depending on the configuration selected.
> - */
> -static int __init stmmac_init(void)
> -{
> -	int ret;
> -
> -	ret = stmmac_register_platform();
> -	if (ret)
> -		goto err;
> -	ret = stmmac_register_pci();
> -	if (ret)
> -		goto err_pci;
> -	return 0;
> -err_pci:
> -	stmmac_unregister_platform();
> -err:
> -	pr_err("stmmac: driver registration failed\n");
> -	return ret;
> -}
> -
> -static void __exit stmmac_exit(void)
> -{
> -	stmmac_unregister_platform();
> -	stmmac_unregister_pci();
> -}
> -
> -module_init(stmmac_init);
> -module_exit(stmmac_exit);
> +EXPORT_SYMBOL_GPL(stmmac_resume);
>
>   #ifndef MODULE
>   static int __init stmmac_cmdline_opt(char *str)
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> index 5084699..77a6d68 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> @@ -158,7 +158,7 @@ static const struct pci_device_id stmmac_id_table[] = {
>
>   MODULE_DEVICE_TABLE(pci, stmmac_id_table);
>
> -struct pci_driver stmmac_pci_driver = {
> +static struct pci_driver stmmac_pci_driver = {
>   	.name = STMMAC_RESOURCE_NAME,
>   	.id_table = stmmac_id_table,
>   	.probe = stmmac_pci_probe,
> @@ -168,6 +168,8 @@ struct pci_driver stmmac_pci_driver = {
>   	},
>   };
>
> +module_pci_driver(stmmac_pci_driver);
> +
>   MODULE_DESCRIPTION("STMMAC 10/100/1000 Ethernet PCI driver");
>   MODULE_AUTHOR("Rayagond Kokatanur <rayagond.kokatanur@vayavyalabs.com>");
>   MODULE_AUTHOR("Giuseppe Cavallaro <peppe.cavallaro@st.com>");
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> index 9f18401..e22a960 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> @@ -362,7 +362,7 @@ static int stmmac_pltfr_remove(struct platform_device *pdev)
>   	return ret;
>   }
>
> -#ifdef CONFIG_PM
> +#ifdef CONFIG_PM_SLEEP
>   static int stmmac_pltfr_suspend(struct device *dev)
>   {
>   	int ret;
> @@ -388,13 +388,12 @@ static int stmmac_pltfr_resume(struct device *dev)
>
>   	return stmmac_resume(ndev);
>   }
> -
> -#endif /* CONFIG_PM */
> +#endif /* CONFIG_PM_SLEEP */
>
>   static SIMPLE_DEV_PM_OPS(stmmac_pltfr_pm_ops,
> -			stmmac_pltfr_suspend, stmmac_pltfr_resume);
> +			 stmmac_pltfr_suspend, stmmac_pltfr_resume);
>
> -struct platform_driver stmmac_pltfr_driver = {
> +static struct platform_driver stmmac_pltfr_driver = {
>   	.probe = stmmac_pltfr_probe,
>   	.remove = stmmac_pltfr_remove,
>   	.driver = {
> @@ -402,9 +401,11 @@ struct platform_driver stmmac_pltfr_driver = {
>   		   .owner = THIS_MODULE,
>   		   .pm = &stmmac_pltfr_pm_ops,
>   		   .of_match_table = of_match_ptr(stmmac_dt_ids),
> -		   },
> +	},
>   };
>
> +module_platform_driver(stmmac_pltfr_driver);
> +
>   MODULE_DESCRIPTION("STMMAC 10/100/1000 Ethernet PLATFORM driver");
>   MODULE_AUTHOR("Giuseppe Cavallaro <peppe.cavallaro@st.com>");
>   MODULE_LICENSE("GPL");
>

^ permalink raw reply

* Re: BUG in xennet_make_frags with paged skb data
From: Seth Forshee @ 2014-11-10 14:35 UTC (permalink / raw)
  To: David Vrabel
  Cc: netdev, David S. Miller, Eric Dumazet, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Stefan Bader, Jay Vosburgh, linux-kernel,
	xen-devel, seth.forshee
In-Reply-To: <545CA27F.4070400@citrix.com>

On Fri, Nov 07, 2014 at 10:44:15AM +0000, David Vrabel wrote:
> On 06/11/14 21:49, Seth Forshee wrote:
> > We've had several reports of hitting the following BUG_ON in
> > xennet_make_frags with 3.2 and 3.13 kernels (I'm currently awaiting
> > results of testing with 3.17):
> > 
> >         /* Grant backend access to each skb fragment page. */
> >         for (i = 0; i < frags; i++) {
> >                 skb_frag_t *frag = skb_shinfo(skb)->frags + i;
> >                 struct page *page = skb_frag_page(frag);
> > 
> >                 len = skb_frag_size(frag);
> >                 offset = frag->page_offset;
> > 
> >                 /* Data must not cross a page boundary. */
> >                 BUG_ON(len + offset > PAGE_SIZE<<compound_order(page));
> > 
> > When this happens the page in question is a "middle" page in a compound
> > page (i.e. it's a tail page but not the last tail page), and the data is
> > fully contained within the compound page. The data does however cross
> > the hardware page boundary, and since compound_order evaluates to 0 for
> > tail pages the check fails.
> > 
> > In going over this I've been unable to determine whether the BUG_ON in
> > xennet_make_frags is incorrect or the paged skb data is wrong. I can't
> > find that it's documented anywhere, and the networking code itself is a
> > bit ambiguous when it comes to compound pages. On the one hand
> > __skb_fill_page_desc specifically handles adding tail pages as paged
> > data, but on the other hand skb_copy_bits kmaps frag->page.p which could
> > fail with data that extends into another page.
> 
> netfront will safely handle this case so you can remove this BUG_ON()
> (and the one later on).  But it would be better to find out were these
> funny-looking skbs are coming from and (if necessary) fixing the bug there.

There still seems to be disagreement about whether the "funny" skb is
valid though - you imply it isn't, but Eric says it is. I've been trying
to track down where these skbs originate, and so far I've determined
that they come from a socket spliced to a pipe spliced to a socket. It
looks like the particular page/offset/len tuple originates at least as
far back as the first socket, as the tuple is simply copied from an skb
into the pipe and from the pipe into the final skb.

Anyway, it looks like at minimum it should be safe to change the first
BUG_ON to assert that the data is fully within the compound page as
Stefan suggested, then remove the second BUG_ON entirely. This is the
path I plan to pursue unless someone objects.

Thanks,
Seth

^ permalink raw reply

* Re: BUG in xennet_make_frags with paged skb data
From: David Vrabel @ 2014-11-10 14:41 UTC (permalink / raw)
  To: netdev, David S. Miller, Eric Dumazet, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Stefan Bader, Jay Vosburgh, linux-kernel,
	xen-devel
In-Reply-To: <20141110143517.GA74005@ubuntu-hedt>

On 10/11/14 14:35, Seth Forshee wrote:
> On Fri, Nov 07, 2014 at 10:44:15AM +0000, David Vrabel wrote:
>> On 06/11/14 21:49, Seth Forshee wrote:
>>> We've had several reports of hitting the following BUG_ON in
>>> xennet_make_frags with 3.2 and 3.13 kernels (I'm currently awaiting
>>> results of testing with 3.17):
>>>
>>>         /* Grant backend access to each skb fragment page. */
>>>         for (i = 0; i < frags; i++) {
>>>                 skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>>>                 struct page *page = skb_frag_page(frag);
>>>
>>>                 len = skb_frag_size(frag);
>>>                 offset = frag->page_offset;
>>>
>>>                 /* Data must not cross a page boundary. */
>>>                 BUG_ON(len + offset > PAGE_SIZE<<compound_order(page));
>>>
>>> When this happens the page in question is a "middle" page in a compound
>>> page (i.e. it's a tail page but not the last tail page), and the data is
>>> fully contained within the compound page. The data does however cross
>>> the hardware page boundary, and since compound_order evaluates to 0 for
>>> tail pages the check fails.
>>>
>>> In going over this I've been unable to determine whether the BUG_ON in
>>> xennet_make_frags is incorrect or the paged skb data is wrong. I can't
>>> find that it's documented anywhere, and the networking code itself is a
>>> bit ambiguous when it comes to compound pages. On the one hand
>>> __skb_fill_page_desc specifically handles adding tail pages as paged
>>> data, but on the other hand skb_copy_bits kmaps frag->page.p which could
>>> fail with data that extends into another page.
>>
>> netfront will safely handle this case so you can remove this BUG_ON()
>> (and the one later on).  But it would be better to find out were these
>> funny-looking skbs are coming from and (if necessary) fixing the bug there.
> 
> There still seems to be disagreement about whether the "funny" skb is
> valid though - you imply it isn't, but Eric says it is. I've been trying
> to track down where these skbs originate, and so far I've determined
> that they come from a socket spliced to a pipe spliced to a socket. It
> looks like the particular page/offset/len tuple originates at least as
> far back as the first socket, as the tuple is simply copied from an skb
> into the pipe and from the pipe into the final skb.

Apologies for the lack of clarity.  I meant either: a) fix the producer
if these skbs are invalid; or b) remove the BUG_ON()s.  Since Eric says
these are actually valid skbs, please do option (b).

i.e., remove both BUG_ON()s.

David

^ permalink raw reply

* Re: [RFC PATCH net-next] net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited
From: Joe Perches @ 2014-11-10 15:26 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: David Miller, netdev, linux-kernel, Remi Denis-Courmont
In-Reply-To: <5460CB43.4050008@6wind.com>

On Mon, 2014-11-10 at 15:27 +0100, Nicolas Dichtel wrote:
> Le 09/11/2014 20:17, Joe Perches a écrit :
> > Use the more common dynamic_debug capable net_dbg_ratelimited
> > and remove the LIMIT_NETDEBUG macro.
> >
> > This may have some negative impact on messages that were
> > emitted at KERN_INFO that are not not enabled at all unless
> > DEBUG is defined or dynamic_debug is enabled.  Even so,
> > these messages are now _not_ emitted by default.
> >
> > This eliminates the use of the net_msg_warn sysctl
> > "/proc/sys/net/core/warnings".
[]
> > diff --git a/include/net/sock.h b/include/net/sock.h
[]
> > @@ -2276,13 +2276,7 @@ bool sk_ns_capable(const struct sock *sk,
> >   bool sk_capable(const struct sock *sk, int cap);
> >   bool sk_net_capable(const struct sock *sk, int cap);
> >
> > -/*
> > - *	Enable debug/info messages
> > - */
> > -extern int net_msg_warn;
> > -#define LIMIT_NETDEBUG(fmt, args...) \
> > -	do { if (net_msg_warn && net_ratelimit()) printk(fmt,##args); } while(0)
> > -
> > +extern int net_msg_warn;	/* Unused, but still a sysctl */
> Why not removing this variable from this header and from net/core/utils.c?
> Just declaring a static variable in net/core/sysctl_net_core.c should be enough.
> Am I missing something?

No.  It's reasonable to remove its EXPORT_SYMBOL use too.

First let's see if there are any objections to the removal.

^ permalink raw reply

* Re: [patch net-next v2 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
From: Roopa Prabhu @ 2014-11-10 15:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460B989.8030404@mojatatu.com>

On 11/10/14, 5:11 AM, Jamal Hadi Salim wrote:
> On 11/09/14 05:51, Jiri Pirko wrote:
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> To notify switch driver of change in STP state of bridge port, add new
>> .ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
>> code then.
>>
>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>   include/linux/netdevice.h |  6 ++++++
>>   include/net/switchdev.h   |  6 ++++++
>>   net/bridge/br_netlink.c   |  2 ++
>>   net/bridge/br_stp.c       |  4 ++++
>>   net/bridge/br_stp_if.c    |  3 +++
>>   net/bridge/br_stp_timer.c |  2 ++
>>   net/switchdev/switchdev.c | 19 +++++++++++++++++++
>>   7 files changed, 42 insertions(+)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 116a19d..35f21a95 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1033,6 +1033,10 @@ typedef u16 (*select_queue_fallback_t)(struct 
>> net_device *dev,
>>    *                  const unsigned char *addr,
>>    *                  u16 vid);
>>    *    Called to delete a fdb from switch device port.
>> + *
>> + * int (*ndo_sw_port_stp_update)(struct net_device *dev, u8 state);
>> + *    Called to notify switch device port of bridge port STP
>> + *    state change.
>
> You are unconditionally calling
> netdev_sw_port_stp_update(p->dev, p->state);
> Again issue is policy. Could you make this work the same
> way the fdb_add e.g user intent of whether i want to turn
> a port in hardware and/or software to disabled/learning/etc
> is reflected?
>
> btw: does _sw_ stand for switch? why not _hw_ ?
> Could we have one ndo for all flags instead of individual ones.

I agree. There is the bridge port state and a bunch of bridge port 
flags. A generic ndo will be good.
>
> I know the current user space code uses u8 as a bitflag; but
> maybe we can introduce a new u32 flag bitmask that has all the
> flags set for backward compat? I can count about a total of 10.

^ permalink raw reply

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
From: Roopa Prabhu @ 2014-11-10 16:12 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Scott Feldman, Jiri Pirko, Netdev, David S. Miller, nhorman,
	Andy Gospodarek, Thomas Graf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, Kirsher, Jeffrey T, vyasevic, xiyou.wangcong,
	Fastabend, John R, edumazet, Florian Fainelli, John Linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman
In-Reply-To: <5460AF22.2040701@mojatatu.com>

On 11/10/14, 4:27 AM, Jamal Hadi Salim wrote:
> On 11/10/14 03:46, Scott Feldman wrote:
>
>>
>> IFLA_BRPORT_LEARNING is u8 attr and we're only using lower bit to turn
>> learning on/off.  Maybe we can use another bit to indicate learning to
>> be done in sw or hw.  I don't think adding another bit would break
>> existing iproute2.
>>
>> LEARNING_ENABLED    (1 << 0)
>> LEARNING_HW              (1 << 1)
>>
>> Would this work?
>>
>
> Yes to making it a bit. But:
> This is not *learning*. You are doing a *sync*.
> Those are two different things.
>
> Learning on/off exists today. It signals to the L2 whether you
> should learn or not.
> I like the way fdb_add/del work with a flag which says
> it is the software and/or offloaded version. Please keep that
> semantic.
> What you are doing above is letting the hardware learn then
> syncing to software. You need a different flag there. something
> like:
>
> SYNC_HW_FDB (1<<1)
>
And in any case, It seems like this policy should be per bridge or per 
switch chip...or per fdb..
entry (like the original fdb_add/del) and not a "port" flag.. ?

^ permalink raw reply

* Re: [PATCH] brcmfmac: unlink URB when request timed out
From: Mathy Vanhoef @ 2014-11-10 16:08 UTC (permalink / raw)
  To: Arend van Spriel, brudley, frankyl, meuleman, linville, pieterpg,
	linux-wireless, brcm80211-dev-list, netdev, Oliver Neukum
In-Reply-To: <54609F09.6070807@broadcom.com>

On 11/10/2014 06:18 AM, Arend van Spriel wrote:
> On 09-11-14 19:10, Mathy Vanhoef wrote:
>> From: Mathy Vanhoef <vanhoefm@gmail.com>
>>
>> Unlink the submitted URB in brcmf_usb_dl_cmd if the request timed out. This
>> assures the URB is never submitted twice, preventing a driver crash.
> 
> Hi Mathy,
> 
> What driver crash are you referring to? The log only shows the WARNING
> ending in a USB disconnect but no actual crash. Does your patch get the
> driver running properly or does it only avoid the warning.

Hi Arend,

It shows a warning, after which the device doesn't work (but the computer is
still usable). But I've noticed that when *unplugging* the USB cable the OS may
freeze. This doesn't always happen though, sometimes unplugging works OK. The
patch both avoids the warning, and gets the device/driver running properly
(unplugging also works OK).

> 
> With that said, it seems there is some need for improvement, but I also
> notice you are running this on a virtual machine so could that affect
> the timeout to kick in before completion. Could you try to increase the
> timeout. Still when a timeout occurs this needs to be handled properly.
> Could you also try the following patch?

I did a few additional tests:

1. When increasing IOCTL_RESP_TIMEOUT to 20000 (ten times the normal value) the
   timeout and warning still occur. Device/driver doesn't work.
2. When increasing BRCMF_USB_RESET_GETVER_SPINWAIT to 1000 (ten timers the
   normal value) everything works. Device/driver works.
3. Quick test using backports-3.18-rc1-1 (aka unpatched driver) on a non-
   virtualized Linux install: In that case everything worked fine. So the bug
   may only be triggered in a virtualized environment / VMWare.
4. When applying your patch, the driver stops early during initialization of
   the device. I included a WARN_ONCE before returning EINVAL and got the
   output below.

Kind regards,
Mathy
---
[  220.955647] usb 1-1: new high-speed USB device number 3 using ehci-pci
[  221.487797] usb 1-1: New USB device found, idVendor=043e, idProduct=3004
[  221.487802] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  221.487804] usb 1-1: Product: Remote Download Wireless Adapter
[  221.487806] usb 1-1: Manufacturer: Broadcom
[  221.487808] usb 1-1: SerialNumber: 000000000001
[  221.490472] brcmfmac: brcmf_usb_probe Enter 0x043e:0x3004
[  221.490476] brcmfmac: brcmf_usb_probe Broadcom high speed USB WLAN interface detected
[  221.490477] brcmfmac: brcmf_usb_probe_cb Enter
[  221.490480] brcmfmac: brcmf_usb_attach Enter
[  221.490494] brcmfmac: brcmf_usb_dlneeded Enter
[  221.490495] ------------[ cut here ]------------
[  221.490503] WARNING: CPU: 0 PID: 100 at drivers/net/wireless/brcm80211/brcmfmac/usb.c:716 brcmf_usb_dl_cmd+0x75/0x1a0 [brcmfmac]()
[  221.490505] EINVAL devinfo=c0044000 ctl_rub=ef898380 completed=0
[  221.490506] Modules linked in: brcmfmac brcmutil vmw_pvscsi pcnet32 mptspi mptscsih mptbase
[  221.490514] CPU: 0 PID: 100 Comm: kworker/0:1 Not tainted 3.18.0-rc3-wl+ #2
[  221.490515] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[  221.490528] Workqueue: usb_hub_wq hub_event
[  221.490530]  00000000 00000000 eecffb58 c1711f4a eecffb98 eecffb88 c103edaf f11cbc58
[  221.490534]  eecffbb4 00000064 f11cbc84 000002cc f11c1595 f11c1595 c0044000 ffffffea
[  221.490537]  ef726000 eecffba0 c103ee4e 00000009 eecffb98 f11cbc58 eecffbb4 eecffbd0
[  221.490541] Call Trace:
[  221.490550]  [<c1711f4a>] dump_stack+0x41/0x52
[  221.490558]  [<c103edaf>] warn_slowpath_common+0x7f/0xa0
[  221.490563]  [<f11c1595>] ? brcmf_usb_dl_cmd+0x75/0x1a0 [brcmfmac]
[  221.490567]  [<f11c1595>] ? brcmf_usb_dl_cmd+0x75/0x1a0 [brcmfmac]
[  221.490570]  [<c103ee4e>] warn_slowpath_fmt+0x2e/0x30
[  221.490575]  [<f11c1595>] brcmf_usb_dl_cmd+0x75/0x1a0 [brcmfmac]
[  221.490580]  [<f11c2cd8>] brcmf_usb_probe+0x3c8/0x640 [brcmfmac]
[  221.490583]  [<c1717d53>] ? mutex_lock+0x13/0x32
[  221.490586]  [<c1493ae3>] usb_probe_interface+0xa3/0x180
[  221.490590]  [<c13f5690>] ? __driver_attach+0x90/0x90
[  221.490592]  [<c13f546e>] driver_probe_device+0x5e/0x1f0
[  221.490595]  [<c13f5690>] ? __driver_attach+0x90/0x90
[  221.490597]  [<c13f56c9>] __device_attach+0x39/0x50
[  221.490600]  [<c13f3d84>] bus_for_each_drv+0x34/0x70
[  221.490602]  [<c13f53db>] device_attach+0x7b/0x90
[  221.490604]  [<c13f5690>] ? __driver_attach+0x90/0x90
[  221.490607]  [<c13f4b8f>] bus_probe_device+0x6f/0x90
[  221.490609]  [<c13f3256>] device_add+0x426/0x520
[  221.490611]  [<c1491503>] ? usb_control_msg+0xb3/0xd0
[  221.490614]  [<c1717d53>] ? mutex_lock+0x13/0x32
[  221.490627]  [<c14922f8>] usb_set_configuration+0x3f8/0x700
[  221.490630]  [<c13f5690>] ? __driver_attach+0x90/0x90
[  221.490633]  [<c149ac7b>] generic_probe+0x2b/0x90
[  221.490637]  [<c1188bc0>] ? sysfs_create_link+0x20/0x40
[  221.490639]  [<c1492bec>] usb_probe_device+0xc/0x10
[  221.490641]  [<c13f546e>] driver_probe_device+0x5e/0x1f0
[  221.490644]  [<c13f5690>] ? __driver_attach+0x90/0x90
[  221.490646]  [<c13f56c9>] __device_attach+0x39/0x50
[  221.490649]  [<c13f3d84>] bus_for_each_drv+0x34/0x70
[  221.490651]  [<c13f53db>] device_attach+0x7b/0x90
[  221.490653]  [<c13f5690>] ? __driver_attach+0x90/0x90
[  221.490656]  [<c13f4b8f>] bus_probe_device+0x6f/0x90
[  221.490658]  [<c13f3256>] device_add+0x426/0x520
[  221.490661]  [<c148aa2e>] ? usb_new_device+0x16e/0x3a0
[  221.490663]  [<c148aad7>] usb_new_device+0x217/0x3a0
[  221.490666]  [<c148bff7>] hub_event+0xa17/0xda0
[  221.490668]  [<c1716918>] ? __schedule+0x2f8/0x710
[  221.490672]  [<c105127c>] ? pwq_dec_nr_in_flight+0x3c/0x90
[  221.490674]  [<c10513ee>] process_one_work+0x11e/0x360
[  221.490677]  [<c1051750>] worker_thread+0xf0/0x3c0
[  221.490680]  [<c106e14a>] ? __wake_up_locked+0x1a/0x20
[  221.490682]  [<c1051660>] ? process_scheduled_works+0x30/0x30
[  221.490685]  [<c1055b56>] kthread+0x96/0xb0
[  221.490687]  [<c1050000>] ? put_unbound_pool+0x110/0x170
[  221.490691]  [<c1719c81>] ret_from_kernel_thread+0x21/0x30
[  221.490693]  [<c1055ac0>] ? kthread_worker_fn+0x110/0x110
[  221.490695] ---[ end trace 9befd914693f3083 ]---
[  221.490697] brcmfmac: brcmf_usb_dlneeded chip 57005 rev 0xf11cfcec
[  221.490699] brcmfmac: brcmf_fw_get_firmwares enter: dev=1-1

diff --git a/drivers/net/wireless/brcm80211/brcmfmac/usb.c b/drivers/net/wireless/brcm80211/brcmfmac/usb.c
index 5265aa7..15b1aa7 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/usb.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/usb.c
@@ -709,8 +709,13 @@ static int brcmf_usb_dl_cmd(struct brcmf_usbdev_info *devinfo, u8 cmd,
 	char *tmpbuf;
 	u16 size;
 
-	if ((!devinfo) || (devinfo->ctl_urb == NULL))
+	if (!devinfo || !devinfo->ctl_urb || !devinfo->ctl_completed) {
+		WARN_ONCE(1, "EINVAL devinfo=%p ctl_rub=%p completed=%d\n",
+			devinfo,
+			devinfo ? devinfo->ctl_urb : NULL,
+			devinfo ? devinfo->ctl_completed : -1);
 		return -EINVAL;
+	}
 
 	tmpbuf = kmalloc(buflen, GFP_ATOMIC);
 	if (!tmpbuf)


> 
> Regards,
> Arend
> ---
>  drivers/net/wireless/brcm80211/brcmfmac/usb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/brcm80211/brcmfmac/usb.c
> b/drivers/net/wireles
> index dc13591..786c40b 100644
> --- a/drivers/net/wireless/brcm80211/brcmfmac/usb.c
> +++ b/drivers/net/wireless/brcm80211/brcmfmac/usb.c
> @@ -640,7 +640,7 @@ static int brcmf_usb_dl_cmd(struct brcmf_usbdev_info
> *devinf
>         char *tmpbuf;
>         u16 size;
> 
> -       if ((!devinfo) || (devinfo->ctl_urb == NULL))
> +       if (!devinfo || !devinfo->ctl_urb || !devinfo->ctl_completed)
>                 return -EINVAL;
> 
>         tmpbuf = kmalloc(buflen, GFP_ATOMIC);
> 
>> Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com>
>> ---
>> Currently brcmfmac may crash when a USB device is attached (tested with a LG
>> TWFM-B003D). In particular it fails on the second call to brcmf_usb_dl_cmd in
>> the while loop of brcmf_usb_resetcfg. The problem is that an URB is being
>> submitted twice:
>>
>> [  169.861800] brcmfmac: brcmf_usb_dl_writeimage Enter, fw f14db000, len 348160
>> [  171.787791] brcmfmac: brcmf_usb_dl_writeimage Exit, err=0
>> [  171.787797] brcmfmac: brcmf_usb_dlstart Exit, err=0
>> [  171.787799] brcmfmac: brcmf_usb_dlrun Enter
>> [  171.791794] brcmfmac: brcmf_usb_resetcfg Enter
>> [  173.988072] ------------[ cut here ]------------
>> [  173.988083] WARNING: CPU: 0 PID: 369 at drivers/usb/core/urb.c:339 usb_submit_urb+0x4e6/0x500()
>> [  173.988085] URB eaf45f00 submitted while active
>> [  173.988086] Modules linked in: brcmfmac brcmutil vmw_pvscsi pcnet32 mptspi mptscsih mptbase
>> [  173.988100] CPU: 0 PID: 369 Comm: kworker/0:2 Not tainted 3.18.0-rc3-wl #1
>> [  173.988102] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
>> [  173.988106] Workqueue: events request_firmware_work_func
>> [  173.988108]  00000000 00000000 ee747db8 c1711f4a ee747df8 ee747de8 c103edaf c18d1e10
>> [  173.988112]  ee747e14 00000171 c18a8b29 00000153 c1490556 c1490556 eaf45f00 eafdc660
>> [  173.988115]  f14b8fa0 ee747e00 c103ee4e 00000009 ee747df8 c18d1e10 ee747e14 ee747e50
>> [  173.988119] Call Trace:
>> [  173.988129]  [<c1711f4a>] dump_stack+0x41/0x52
>> [  173.988136]  [<c103edaf>] warn_slowpath_common+0x7f/0xa0
>> [  173.988139]  [<c1490556>] ? usb_submit_urb+0x4e6/0x500
>> [  173.988141]  [<c1490556>] ? usb_submit_urb+0x4e6/0x500
>> [  173.988147]  [<f14b8fa0>] ? brcmf_usb_ioctl_resp_wake+0x40/0x40 [brcmfmac]
>> [  173.988150]  [<c103ee4e>] warn_slowpath_fmt+0x2e/0x30
>> [  173.988152]  [<c1490556>] usb_submit_urb+0x4e6/0x500
>> [  173.988156]  [<c1123de1>] ? __kmalloc+0x21/0x140
>> [  173.988161]  [<f14b91c3>] ? brcmf_usb_dl_cmd+0x33/0x120 [brcmfmac]
>> [  173.988166]  [<f14b9243>] brcmf_usb_dl_cmd+0xb3/0x120 [brcmfmac]
>> [  173.988170]  [<f14ba6c4>] brcmf_usb_probe_phase2+0x4e4/0x640 [brcmfmac]
>> [  173.988176]  [<f14b4900>] brcmf_fw_request_code_done+0xd0/0xf0 [brcmfmac]
>> [  173.988178]  [<c1400876>] request_firmware_work_func+0x26/0x50
>> [  173.988182]  [<c10513ee>] process_one_work+0x11e/0x360
>> [  173.988184]  [<c1051750>] worker_thread+0xf0/0x3c0
>> [  173.988205]  [<c106e14a>] ? __wake_up_locked+0x1a/0x20
>> [  173.988208]  [<c1051660>] ? process_scheduled_works+0x30/0x30
>> [  173.988211]  [<c1055b56>] kthread+0x96/0xb0
>> [  173.988214]  [<c1719c81>] ret_from_kernel_thread+0x21/0x30
>> [  173.988217]  [<c1055ac0>] ? kthread_worker_fn+0x110/0x110
>> [  173.988219] ---[ end trace 0c88bf46801de083 ]---
>> [  173.988221] brcmf_usb_dl_cmd: usb_submit_urb failed -16
>> [  173.988396] brcmfmac: brcmf_usb_probe_phase2 failed: dev=1-1, err=-19
>> [  173.989503] brcmfmac: brcmf_usb_disconnect Enter
>>
>> This patch fixes the brcmf_usb_dl_cmd function to prevent an URB from being
>> submitted twice. Tested using a LG TWFM-B003D, which now works properly.
>>
>>
>>  drivers/net/wireless/brcm80211/brcmfmac/usb.c |    6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/wireless/brcm80211/brcmfmac/usb.c b/drivers/net/wireless/brcm80211/brcmfmac/usb.c
>> index 5265aa7..1bc7858 100644
>> --- a/drivers/net/wireless/brcm80211/brcmfmac/usb.c
>> +++ b/drivers/net/wireless/brcm80211/brcmfmac/usb.c
>> @@ -738,10 +738,12 @@ static int brcmf_usb_dl_cmd(struct brcmf_usbdev_info *devinfo, u8 cmd,
>>  		goto finalize;
>>  	}
>>  
>> -	if (!brcmf_usb_ioctl_resp_wait(devinfo))
>> +	if (!brcmf_usb_ioctl_resp_wait(devinfo)) {
>> +		usb_unlink_urb(devinfo->ctl_urb);
>>  		ret = -ETIMEDOUT;
>> -	else
>> +	} else {
>>  		memcpy(buffer, tmpbuf, buflen);
>> +	}
>>  
>>  finalize:
>>  	kfree(tmpbuf);
>>
> 

^ permalink raw reply related

* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: David Miller @ 2014-11-10 16:18 UTC (permalink / raw)
  To: viro; +Cc: herbert, netdev, linux-kernel, bcrl, mst
In-Reply-To: <20141110090955.GH7996@ZenIV.linux.org.uk>

From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Mon, 10 Nov 2014 09:09:55 +0000

> BTW, what's the usual regression suite used for net/* stuff?

There's a regression suite? :-)

^ permalink raw reply

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
From: David Miller @ 2014-11-10 16:28 UTC (permalink / raw)
  To: jhs
  Cc: jiri, netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460AA45.6000007@mojatatu.com>

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Mon, 10 Nov 2014 07:06:29 -0500

> It is a _user space visible rename_, how about:
> 
> #define MAX_PHYS_ITEM_ID_LEN 32
> #define MAX_PHYS_PORT_ID_LEN   MAX_PHYS_ITEM_ID_LEN
> 
> I did miss the fact that the size didnt change.

The user cannot see this macro Jamal, please really read this
code, instead of, once again, jumping to conclusions.

^ permalink raw reply

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
From: David Miller @ 2014-11-10 16:28 UTC (permalink / raw)
  To: jhs
  Cc: jiri, netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460ACC0.3020805@mojatatu.com>

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Mon, 10 Nov 2014 07:17:04 -0500

> On 11/10/14 02:43, Jiri Pirko wrote:
>> Mon, Nov 10, 2014 at 04:35:12AM CET, jhs@mojatatu.com wrote:
> 
>> I don't see a reason why this would break kabi:
>>
>> -#define MAX_PHYS_PORT_ID_LEN 32
>> +#define MAX_PHYS_ITEM_ID_LEN 32
>>
> 
> refer to my response to Dave. Just define MAX_PHYS_PORT_ID_LEN to
> MAX_PHYS_ITEM_ID_LEN so people dont have to change their code
> because a name change.

Again, nobody has to change anything.

This macro is not visible outside of the kernel.

Jamal, this is really driving me crazy, this is a non-issue.

^ permalink raw reply

* Re: [Xen-devel] BUG in xennet_make_frags with paged skb data
From: Zoltan Kiss @ 2014-11-10 16:39 UTC (permalink / raw)
  To: David Vrabel, netdev, David S. Miller, Eric Dumazet,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Stefan Bader,
	Jay Vosburgh, linux-kernel, xen-devel
In-Reply-To: <5460CEA5.3070201@citrix.com>



On 10/11/14 14:41, David Vrabel wrote:
> On 10/11/14 14:35, Seth Forshee wrote:
>> On Fri, Nov 07, 2014 at 10:44:15AM +0000, David Vrabel wrote:
>>> On 06/11/14 21:49, Seth Forshee wrote:
>>>> We've had several reports of hitting the following BUG_ON in
>>>> xennet_make_frags with 3.2 and 3.13 kernels (I'm currently awaiting
>>>> results of testing with 3.17):
>>>>
>>>>          /* Grant backend access to each skb fragment page. */
>>>>          for (i = 0; i < frags; i++) {
>>>>                  skb_frag_t *frag = skb_shinfo(skb)->frags + i;
>>>>                  struct page *page = skb_frag_page(frag);
>>>>
>>>>                  len = skb_frag_size(frag);
>>>>                  offset = frag->page_offset;
>>>>
>>>>                  /* Data must not cross a page boundary. */
>>>>                  BUG_ON(len + offset > PAGE_SIZE<<compound_order(page));
>>>>
>>>> When this happens the page in question is a "middle" page in a compound
>>>> page (i.e. it's a tail page but not the last tail page), and the data is
>>>> fully contained within the compound page. The data does however cross
>>>> the hardware page boundary, and since compound_order evaluates to 0 for
>>>> tail pages the check fails.
>>>>
>>>> In going over this I've been unable to determine whether the BUG_ON in
>>>> xennet_make_frags is incorrect or the paged skb data is wrong. I can't
>>>> find that it's documented anywhere, and the networking code itself is a
>>>> bit ambiguous when it comes to compound pages. On the one hand
>>>> __skb_fill_page_desc specifically handles adding tail pages as paged
>>>> data, but on the other hand skb_copy_bits kmaps frag->page.p which could
>>>> fail with data that extends into another page.
>>>
>>> netfront will safely handle this case so you can remove this BUG_ON()
>>> (and the one later on).  But it would be better to find out were these
>>> funny-looking skbs are coming from and (if necessary) fixing the bug there.
>>
>> There still seems to be disagreement about whether the "funny" skb is
>> valid though - you imply it isn't, but Eric says it is. I've been trying
>> to track down where these skbs originate, and so far I've determined
>> that they come from a socket spliced to a pipe spliced to a socket. It
>> looks like the particular page/offset/len tuple originates at least as
>> far back as the first socket, as the tuple is simply copied from an skb
>> into the pipe and from the pipe into the final skb.
>
> Apologies for the lack of clarity.  I meant either: a) fix the producer
> if these skbs are invalid; or b) remove the BUG_ON()s.  Since Eric says
> these are actually valid skbs, please do option (b).
>
> i.e., remove both BUG_ON()s.

The BUG_ON suggested by Stefan would be still reasonable:

BUG_ON(((page-compound_head(page))*PAGE_SIZE)+offset+len >
PAGE_SIZE<<compound_order(compound_head(page)));
>
> David
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

^ permalink raw reply

* Re: [Xen-devel] BUG in xennet_make_frags with paged skb data
From: David Vrabel @ 2014-11-10 16:42 UTC (permalink / raw)
  To: Zoltan Kiss, netdev, David S. Miller, Eric Dumazet,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Stefan Bader,
	Jay Vosburgh, linux-kernel, xen-devel
In-Reply-To: <5460EA45.2080202@linaro.org>

On 10/11/14 16:39, Zoltan Kiss wrote:
> 
> The BUG_ON suggested by Stefan would be still reasonable:
> 
> BUG_ON(((page-compound_head(page))*PAGE_SIZE)+offset+len >
> PAGE_SIZE<<compound_order(compound_head(page)));

Well, it wouldn't trigger but I don't think it is useful any more.

David

^ permalink raw reply

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
From: Thomas Graf @ 2014-11-10 16:48 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415530280-9190-1-git-send-email-jiri@resnulli.us>

On 11/09/14 at 11:51am, Jiri Pirko wrote:
> Hi all.
> 
> This patchset is just the first phase of switch and switch-ish device
> support api in kernel. Note that the api will extend (our complete work
> can be pulled from https://github.com/jpirko/net-next-rocker).

Despite my comment on ndo_fdb_add() which I believe can be reconsidered
later. I like this pach series a lot. Thanks for putting in so much work
on this topic.

Reviewed-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* [PATCH net] net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
From: Daniel Borkmann @ 2014-11-10 16:54 UTC (permalink / raw)
  To: davem; +Cc: linux-sctp, netdev, Vlad Yasevich

An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:

  ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.

So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().

The trace for the log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>]  [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
 <IRQ>
 [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
 [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
 [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]

A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.

Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
---
 net/sctp/sm_make_chunk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index ab734be..9f32741 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2609,6 +2609,9 @@ do_addr_param:
 		addr_param = param.v + sizeof(sctp_addip_param_t);
 
 		af = sctp_get_af_specific(param_type2af(param.p->type));
+		if (af == NULL)
+			break;
+
 		af->from_addr_param(&addr, addr_param,
 				    htons(asoc->peer.port), 0);
 
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH nf] netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse
From: Pablo Neira Ayuso @ 2014-11-10 16:54 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: programme110, netfilter-devel, Florian Westphal, netdev
In-Reply-To: <20141106133648.2534.1403.stgit@dragon>

On Thu, Nov 06, 2014 at 02:36:48PM +0100, Jesper Dangaard Brouer wrote:
> From: bill bonaparte <programme110@gmail.com>
> 
> After removal of the central spinlock nf_conntrack_lock, in
> commit 93bb0ceb75be2 ("netfilter: conntrack: remove central
> spinlock nf_conntrack_lock"), it is possible to race against
> get_next_corpse().
> 
> The race is against the get_next_corpse() cleanup on
> the "unconfirmed" list (a per-cpu list with seperate locking),
> which set the DYING bit.
> 
> Fix this race, in __nf_conntrack_confirm(), by removing the CT
> from unconfirmed list before checking the DYING bit.  In case
> race occured, re-add the CT to the dying list.

This seems correct to me, some side comments.

> Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
> Reported-by: bill bonaparte <programme110@gmail.com>
> Signed-off-by: bill bonaparte <programme110@gmail.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> 
>  net/netfilter/nf_conntrack_core.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 5016a69..1072650 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -611,12 +611,15 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>  	 */
>  	NF_CT_ASSERT(!nf_ct_is_confirmed(ct));
>  	pr_debug("Confirming conntrack %p\n", ct);
> -	/* We have to check the DYING flag inside the lock to prevent
> +
> +	/* We have to check the DYING flag after unlink to prevent
>  	   a race against nf_ct_get_next_corpse() possibly called from
>  	   user context, else we insert an already 'dead' hash, blocking
>  	   further use of that particular connection -JM */

While at this, I think it would be good to fix comment style to:

        /* We have ...
         * ...
         */

I can fix this here, no need to resend, just let me know.

> +	nf_ct_del_from_dying_or_unconfirmed_list(ct);
>  
>  	if (unlikely(nf_ct_is_dying(ct))) {
> +		nf_ct_add_to_dying_list(ct);
>  		nf_conntrack_double_unlock(hash, reply_hash);
>  		local_bh_enable();
>  		return NF_ACCEPT;

Not directly related to your patch, but I don't find a good reason why
we're accepting this packet.

If the conntrack from the unconfirmed list is dying, then the object
will be released by when the packet leaves the stack to its
destination. With stateful filtering depending in place, the follow up
packet in the reply direction will likely be considered invalid (if
tcp tracking is on). Fortunately for us, the origin will likely
retransmit the syn again, so the ct will be setup accordingly.

So, why should we allow this to go through?

This return verdict was introduced in: fc35077 ("netfilter:
nf_conntrack: fix a race in __nf_conntrack_confirm against
nf_ct_get_next_corpse()") btw.

^ permalink raw reply

* [PATCH net] net: sctp: fix memory leak in auth key management
From: Daniel Borkmann @ 2014-11-10 17:00 UTC (permalink / raw)
  To: davem; +Cc: linux-sctp, netdev, Vlad Yasevich

A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:

unreferenced object 0xffff8800837047c0 (size 16):
  comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
  hex dump (first 16 bytes):
    01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
    [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
    [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
    [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
    [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
    [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
    [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
    [<ffffffffffffffff>] 0xffffffffffffffff

This is bad because of two things, we can bring down a machine from
user space when auth_enable=1, but also we would leave security sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() does an additional refcount on it, and
thus leaving it around when we free the socket.

Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
---
 net/sctp/auth.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index 0e85291..fb7976a 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -862,8 +862,6 @@ int sctp_auth_set_key(struct sctp_endpoint *ep,
 		list_add(&cur_key->key_list, sh_keys);
 
 	cur_key->key = key;
-	sctp_auth_key_hold(key);
-
 	return 0;
 nomem:
 	if (!replace)
-- 
1.7.11.7

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox