Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: linux-next: Tree for Nov 10 (net/ipv4/ip_tunnel.c)
From: Tom Herbert @ 2014-11-10 20:29 UTC (permalink / raw)
  To: David Miller; +Cc: rdunlap, sfr, Linux-Next, LKML, Linux Netdev List
In-Reply-To: <20141110.142439.2292080248990157625.davem@davemloft.net>

I am looking at it.

On Mon, Nov 10, 2014 at 11:24 AM, David Miller <davem@davemloft.net> wrote:
> From: Randy Dunlap <rdunlap@infradead.org>
> Date: Mon, 10 Nov 2014 10:15:11 -0800
>
>> On 11/10/14 01:59, Stephen Rothwell wrote:
>>> Hi all,
>>>
>>> Changes since 20141106:
>>>
>>
>> on x86_64:
>> when CONFIG_NET_IP_TUNNEL=y and CONFIG_NET_FOU=m:
>>
>> net/built-in.o: In function `ip_tunnel_encap':
>> (.text+0xb04d8): undefined reference to `gue_build_header'
>> net/built-in.o: In function `ip_tunnel_encap':
>> (.text+0xb04ea): undefined reference to `fou_build_header'
>
> Tom, this has now been reported twice, please fix this.
>
> Thanks.

^ permalink raw reply

* Re: [PATCH 1/2] 8139too: Allow setting MTU larger than 1500
From: David Miller @ 2014-11-10 20:30 UTC (permalink / raw)
  To: albeu; +Cc: linux-kernel, netdev, bhelgaas, benoit.taine, ebiederm
In-Reply-To: <1415447316-12424-1-git-send-email-albeu@free.fr>

From: Alban Bedel <albeu@free.fr>
Date: Sat,  8 Nov 2014 12:48:35 +0100

> Replace the default ndo_change_mtu callback with one that allow
> setting MTU that the driver can handle.
> 
> Signed-off-by: Alban Bedel <albeu@free.fr>

Applied to net-next

^ permalink raw reply

* Re: [PATCH 2/2] 8139too: Allow using the largest possible MTU
From: David Miller @ 2014-11-10 20:30 UTC (permalink / raw)
  To: albeu; +Cc: linux-kernel, netdev, bhelgaas, benoit.taine, ebiederm
In-Reply-To: <1415447316-12424-2-git-send-email-albeu@free.fr>

From: Alban Bedel <albeu@free.fr>
Date: Sat,  8 Nov 2014 12:48:36 +0100

> This driver allows MTU up to 1518 bytes which is not enought to run
> batman-adv. Simply raise the maximum packet size up to the maximum
> allowed by the transmit descriptor, 1792 bytes, giving a maximum MTU
> of 1774 bytes.
> 
> Signed-off-by: Alban Bedel <albeu@free.fr>

Also applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH] drivers: atm: eni: Add pci_dma_mapping_error() call
From: David Miller @ 2014-11-10 20:32 UTC (permalink / raw)
  To: tinajohnson.1234
  Cc: chas, linux-atm-general, netdev, linux-kernel, julia.lawall
In-Reply-To: <1415463482-8013-1-git-send-email-tinajohnson.1234@gmail.com>

From: Tina Johnson <tinajohnson.1234@gmail.com>
Date: Sat,  8 Nov 2014 21:48:02 +0530

> Added a pci_dma_mapping_error() call to check for mapping errors before
> further using the dma handle. Unchecked dma handles were found using
> Coccinelle:
> 
> @rule1@
> expression e1;
> identifier x;
> @@
> 
> *x = pci_map_single(...);
>  ... when != pci_dma_mapping_error(e1,x)
> 
> Signed-off-by: Tina Johnson <tinajohnson.1234@gmail.com>
> Acked-by: Julia Lawall <julia.lawall@lip6.fr>

I really don't like quick fixes like this, exactly because of the
kind of new bug this change is introducing.

The 'trouble' label assumes that it is recovering and unwinding state
when an error occurs after the DMA buffer is successfully mapped.

It unconditionally does pci_unmap_single() if 'paddr' is non-zero
which it might be in the error case depending upon how DMA errors
are represented on a given platform.

^ permalink raw reply

* question about ethtool bits?
From: Jesse Brandeburg @ 2014-11-10 20:47 UTC (permalink / raw)
  To: netdev; +Cc: jesse.brandeburg, ben

Ben et al,

So, I was just looking at adding some more types/speeds to
ethtool.h and noticed we are about out of bits in ecmd->advertising
(and others), which are declared u32.

We currently have room for one more type (bit 31) and then all 32 bits
will be full.

Previous solutions have involved adding a new struct member to the end
of ecmd and friends and calling it u32 advertising2 or something.
Another option could be adding a u64, but there may be not fun results
of doing so.

...
#define SUPPORTED_56000baseSR4_Full     (1 << 29)
#define SUPPORTED_56000baseLR4_Full     (1 << 30)

...
#define ADVERTISED_56000baseSR4_Full    (1 << 29)
#define ADVERTISED_56000baseLR4_Full    (1 << 30)

in the case of speed there is already a speed_hi, so that gets us 32
bits of speed expressed in Mb/s

These fast networks create so many problems ;-)

There are two u32's reserved at the end of ethtool_cmd.

Suggestions?

^ permalink raw reply

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
From: Jamal Hadi Salim @ 2014-11-10 21:14 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Roopa Prabhu, Jiri Pirko, Netdev, David S. Miller, nhorman,
	Andy Gospodarek, Thomas Graf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, Kirsher, Jeffrey T, vyasevic, Cong Wang,
	Fastabend, John R, Eric Dumazet, Florian Fainelli, John Linville,
	jasowang, ebiederm, Nicolas Dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, Alexei Starovoitov
In-Reply-To: <CAE4R7bDQesnq8kDFsUqiY0+H1SnUHPSJACLMgFc9eX8chjoxKA@mail.gmail.com>

On 11/10/14 14:47, Scott Feldman wrote:
> On Mon, Nov 10, 2014 at 9:27 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:

>
> For swdev, I don't care for the model where each port has an fdb and
> the bridge has an fdb.  The bridge's fdb lookup/learning/fwding is
> what we're offloading to HW, so it makes more sense from the driver
> and to the user to use one fdb, the bridge's fdb.  So user types
> "bridge fdb show" and static fdbs installed on the bridge and learned
> fdbs synced from HW are represented.  One table.
>

side note:
I hope we'd be able to tell apart what is in hardware vs software
and what has been synced up from hardware (assuming policy says we
are allowed to sync things).
Yes, there is one fdb table per bridge - but each entry would say
which brport is involved. So the reference point being a bridge point
sounds reasonable to me. i.e
Any fdb entry whether in h/w or s/w would point to an egress
brport *always*. Are you thinking there is only one possible bridge?
Caveat: a piece of hardware could have multiple virtual bridges.
In what Ben showed on the $5 realtek, after boot up we just
know which brports exist and nothing more.
We can then create a bridge and attach brport to each. This gets
reflected into hardware on a per brport level (I think there was
a field called FID?).
Constraint: Each brport connects only to one bridge.

> I view the existing ndo_fdb_add/del ops useful for devices working
> standalone without the bridge driver that have some HW fwding
> capabilities and need to manage their own fdb.

As in offload said fdb?

>For devices under
> bridge, let's use the bridge's fdb, at least for swdev.
>

If the hardware can only do one bridge sure.

> Does this make sense?

not quiet for me but i may be missing something.

I hate to use a lot of "I"s in my sentences,
> but looks like I did exactly that in above, so take this as an
> opinion, within the scope of swdev.
>

"I" is useful for expressing an opinion or an expectation of course ;->
As an example:
_I_ believe we should be able to define how {learning, flooding etc} and 
where {hw vs sw} things are to be installed or learnt-from.
I have use cases where the controller makes all the decisions.
And i tried to provide my motivation in one of the meetings here:
https://linux.cumulusnetworks.com/offload-discussion-1/jamal-NFstatecaching.pdf

cheers,
jamal

^ permalink raw reply

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
From: John Fastabend @ 2014-11-10 21:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-2-git-send-email-jiri@resnulli.us>

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> So this can be reused for identification of other "items" as well.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---

Looks good to me. Just a simple code refactoring and doesn't
touch any user visible uapi files and makes the next patches a
bit cleaner.

Acked-by: John Fastabend <john.r.fastabend@intel.com>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
From: John Fastabend @ 2014-11-10 21:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-3-git-send-email-jiri@resnulli.us>

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>
> Note that user can use random port netdevice to access the switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>   MAINTAINERS                            |  7 ++++
>   include/linux/netdevice.h              | 10 ++++++
>   include/net/switchdev.h                | 30 +++++++++++++++++
>   net/Kconfig                            |  1 +
>   net/Makefile                           |  3 ++
>   net/switchdev/Kconfig                  | 13 ++++++++
>   net/switchdev/Makefile                 |  5 +++
>   net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>   9 files changed, 161 insertions(+)
>   create mode 100644 Documentation/networking/switchdev.txt
>   create mode 100644 include/net/switchdev.h
>   create mode 100644 net/switchdev/Kconfig
>   create mode 100644 net/switchdev/Makefile
>   create mode 100644 net/switchdev/switchdev.c
>
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> new file mode 100644
> index 0000000..98be76c
> --- /dev/null
> +++ b/Documentation/networking/switchdev.txt
> @@ -0,0 +1,59 @@
> +Switch (and switch-ish) device drivers HOWTO
> +===========================
> +
> +Please note that the word "switch" is here used in very generic meaning.
> +This include devices supporting L2/L3 but also various flow offloading chips,
> +including switches embedded into SR-IOV NICs.
> +
> +Lets describe a topology a bit. Imagine the following example:
> +
> +       +----------------------------+    +---------------+
> +       |     SOME switch chip       |    |      CPU      |
> +       +----------------------------+    +---------------+
> +       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
> +         |     |     |     |     |       +---------------+
> +        PHY   PHY    |     |     |         |  NIC0 NIC1
> +                     |     |     |         |   |    |
> +                     |     |     +- PCI-E -+   |    |
> +                     |     +------- MII -------+    |
> +                     +------------- MII ------------+
> +
> +In this example, there are two independent lines between the switch silicon
> +and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
> +separate from the switch driver. SOME switch chip is by managed by a driver
> +via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
> +connected to some other type of bus.
> +
> +Now, for the previous example show the representation in kernel:
> +
> +       +----------------------------+    +---------------+
> +       |     SOME switch chip       |    |      CPU      |
> +       +----------------------------+    +---------------+
> +       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
> +         |     |     |     |     |       +---------------+
> +        PHY   PHY    |     |     |         |  eth0 eth1
> +                     |     |     |         |   |    |
> +                     |     |     +- PCI-E -+   |    |
> +                     |     +------- MII -------+    |
> +                     +------------- MII ------------+
> +
> +Lets call the example switch driver for SOME switch chip "SOMEswitch". This
> +driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
> +created for each port of a switch. These netdevices are instances
> +of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
> +of the switch chip. eth0 and eth1 are instances of some other existing driver.
> +
> +The only difference of the switch-port netdevice from the ordinary netdevice
> +is that is implements couple more NDOs:
> +
> +	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
> +			       of the same physical switch chip. This is
> +			       mandatory to be implemented by all switch drivers
> +			       and serves the caller for recognition of a port
> +			       netdevice.

What is the connection between ndo_sw_parent_get_id and
ndo_get_phys_port_id(). I'm having a bit of trouble teasing
this out.

For example here is my ascii art for a SR-IOV NIC,

        eth0     eth1     eth2
         |         |        |
         |         |        |
         PF        VF       VF
    +----+---------+--------+----+
    |       embedded bridge      |
    +-------------+--------------+
                  |
                 port

that can do switching between the various uplinks and downlinks.
In IEEE 802.1Q language the embedded bridge acts like an edge
relay. At least that seems to be the current state of the art
for SR-IOV. Edge relay just means it has a single uplink port
to the network and multiple downlinks and also isn't required
to do learning and run loop detection protocols STP, et. al.

Also there are multi-function devices that look the same except
replace the VFs with PFs. It seems to be a common mode for NICs
that do the iSCSI offloads with storage functions.

When something is an embedded bridge vs a SOME switch chip is
not entirely clear.

My understanding is use ndo_sw_parent_get_id() when you have
multiple physical ports all connected to a single switch object.
When you have a single port connected to multiple PCIE functions
or queues representing a netdev (e.g. macvlan offload) use the
ndo_get_phys_port_id(). Just want to be sure we are on the
same page here.

Otherwise patch looks good. I think we can clear the above up
with an addition to the documentation. Could go in after the
initial set and be OK with me.

IMO this patch is needed otherwise user space is at a complete
loss on trying to figure out how netdevs map to switch silicon.
You could have reused ndo_get_phys_port_id() perhaps but then
I think user space may get confused by SR-IOV/VMDQ/etc ports
attached to a switch silicon. For .02$ having a new distinct
identifier is cleaner.


> +	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
> +			  chip itself (it can be though of as a "parent" of the
> +			  port, therefore the name). They are not port-specific.
> +			  Caller might use arbitrary port netdevice of the same
> +			  switch and it will make no difference.
> +	ndo_sw_port_* - Functions that serve for a port-specific manipulation.

[...]

Thanks,
John


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
From: John Fastabend @ 2014-11-10 22:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-4-git-send-email-jiri@resnulli.us>

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> The netdevice represents a port in a switch, it will expose
> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
> belong to one physical switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---

Yep, I need something like this for my management app to
learn how the switch is laid out. This becomes more relevant
if I have switch silicon mixed with NICs that are not connected
to the switch object in the same platform.

Acked-by: John Fastabend <john.r.fastabend@intel.com>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [patch net-next v2 04/10] net-sysfs: expose physical switch id for particular device
From: John Fastabend @ 2014-11-10 22:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-5-git-send-email-jiri@resnulli.us>

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---

Acked-by: John Fastabend <john.r.fastabend@intel.com>



-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
From: John Fastabend @ 2014-11-10 22:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-6-git-send-email-jiri@resnulli.us>

On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> This patch introduces the first driver to benefit from the switchdev
> infrastructure and to implement newly introduced switch ndos. This is a
> driver for emulated switch chip implemented in qemu:
> https://github.com/sfeldma/qemu-rocker/
>
> This patch is a result of joint work with Scott Feldman.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   MAINTAINERS                          |    7 +
>   drivers/net/ethernet/Kconfig         |    1 +
>   drivers/net/ethernet/Makefile        |    1 +
>   drivers/net/ethernet/rocker/Kconfig  |   27 +
>   drivers/net/ethernet/rocker/Makefile |    5 +
>   drivers/net/ethernet/rocker/rocker.c | 2060 ++++++++++++++++++++++++++++++++++
>   drivers/net/ethernet/rocker/rocker.h |  427 +++++++
>   7 files changed, 2528 insertions(+)
>   create mode 100644 drivers/net/ethernet/rocker/Kconfig
>   create mode 100644 drivers/net/ethernet/rocker/Makefile
>   create mode 100644 drivers/net/ethernet/rocker/rocker.c
>   create mode 100644 drivers/net/ethernet/rocker/rocker.h
>

[...]

> +
> +static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +	struct rocker *rocker = rocker_port->rocker;
> +	struct rocker_desc_info *desc_info;
> +	struct rocker_tlv *frags;
> +	int i;
> +	int err;
> +
> +	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
> +	if (unlikely(!desc_info)) {
> +		if (net_ratelimit())

Could you have a netif_stop_queue() here as well same as below? Not
that optimizing the xmit routine is the interesting part of this patch.
But I guess this is just some strange error path because I see you
check this case below.

> +			netdev_err(dev, "tx ring full when queue awake\n");
> +		return NETDEV_TX_BUSY;
> +	}
> +
> +	rocker_desc_cookie_ptr_set(desc_info, skb);
> +
> +	frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
> +	if (!frags)
> +		goto out;
> +	err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
> +					  skb->data, skb_headlen(skb));
> +	if (err)
> +		goto nest_cancel;
> +	if (skb_shinfo(skb)->nr_frags > ROCKER_TX_FRAGS_MAX)
> +		goto nest_cancel;
> +
> +	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> +
> +		err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
> +						  skb_frag_address(frag),
> +						  skb_frag_size(frag));
> +		if (err)
> +			goto unmap_frags;
> +	}
> +	rocker_tlv_nest_end(desc_info, frags);
> +
> +	rocker_desc_gen_clear(desc_info);
> +	rocker_desc_head_set(rocker, &rocker_port->tx_ring, desc_info);
> +
> +	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
> +	if (!desc_info)
> +		netif_stop_queue(dev);

I'm not entirely sure I followed the TLV usage here but OK. If it
works...

> +
> +	return NETDEV_TX_OK;
> +
> +unmap_frags:
> +	rocker_tx_desc_frags_unmap(rocker_port, desc_info);
> +nest_cancel:
> +	rocker_tlv_nest_cancel(desc_info, frags);
> +out:
> +	dev_kfree_skb(skb);
> +	return NETDEV_TX_OK;
> +}
> +
> +static int rocker_port_set_mac_address(struct net_device *dev, void *p)
> +{
> +	struct sockaddr *addr = p;
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +	int err;
> +
> +	if (!is_valid_ether_addr(addr->sa_data))
> +		return -EADDRNOTAVAIL;
> +
> +	err = rocker_cmd_set_port_settings_macaddr(rocker_port, addr->sa_data);
> +	if (err)
> +		return err;
> +	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
> +	return 0;
> +}
> +
> +static int rocker_port_sw_parent_id_get(struct net_device *dev,
> +					struct netdev_phys_item_id *psid)
> +{
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +	struct rocker *rocker = rocker_port->rocker;
> +

hmm looks like you read this out of a magic switch register :) but
my switch doesn't have this magic reg. I suposse the switch MAC address
should work.

> +	psid->id_len = sizeof(rocker->hw.id);
> +	memcpy(&psid->id, &rocker->hw.id, psid->id_len);
> +	return 0;
> +}
> +
> +static const struct net_device_ops rocker_port_netdev_ops = {
> +	.ndo_open			= rocker_port_open,
> +	.ndo_stop			= rocker_port_stop,
> +	.ndo_start_xmit			= rocker_port_xmit,
> +	.ndo_set_mac_address		= rocker_port_set_mac_address,
> +	.ndo_sw_parent_id_get		= rocker_port_sw_parent_id_get,
> +};
> +
> +/********************
> + * ethtool interface
> + ********************/
> +
> +static int rocker_port_get_settings(struct net_device *dev,
> +				    struct ethtool_cmd *ecmd)
> +{
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +
> +	return rocker_cmd_get_port_settings_ethtool(rocker_port, ecmd);
> +}
> +
> +static int rocker_port_set_settings(struct net_device *dev,
> +				    struct ethtool_cmd *ecmd)
> +{
> +	struct rocker_port *rocker_port = netdev_priv(dev);
> +
> +	return rocker_cmd_set_port_settings_ethtool(rocker_port, ecmd);
> +}
> +
> +static void rocker_port_get_drvinfo(struct net_device *dev,
> +				    struct ethtool_drvinfo *drvinfo)
> +{
> +	strlcpy(drvinfo->driver, rocker_driver_name, sizeof(drvinfo->driver));
> +	strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
> +}
> +
> +static const struct ethtool_ops rocker_port_ethtool_ops = {
> +	.get_settings		= rocker_port_get_settings,
> +	.set_settings		= rocker_port_set_settings,
> +	.get_drvinfo		= rocker_port_get_drvinfo,
> +	.get_link		= ethtool_op_get_link,
> +};
> +

[...]

Looks reasonable to me, although I mostly scanned it and looked
over the interface parts. My comments are just observations no
need to change anything for them.

Reviewed-by: John Fastabend <john.r.fastabend@intel.com>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* [PATCH net-next] mlx4: restore conditional call to napi_complete_done()
From: Eric Dumazet @ 2014-11-10 22:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, amirv, ogerlitz, willemb
In-Reply-To: <1415642488.9613.5.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up
calling napi_complete_done() in the case NAPI poll consumed all its
budget.

This added extra interrupt pressure, this patch restores proper
behavior.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1a28817282 ("mlx4: use napi_complete_done()")
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 46ee78326f1fecb14f42d1afcc113fa18087cfa6..5a193f40a14c2a89fecfbac5dce0771e15e69861 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -910,13 +910,14 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 		cpu_curr = smp_processor_id();
 		aff = irq_desc_get_irq_data(cq->irq_desc)->affinity;
 
-		if (unlikely(!cpumask_test_cpu(cpu_curr, aff))) {
-			/* Current cpu is not according to smp_irq_affinity -
-			 * probably affinity changed. need to stop this NAPI
-			 * poll, and restart it on the right CPU
-			 */
-			done = 0;
-		}
+		if (likely(cpumask_test_cpu(cpu_curr, aff)))
+			return budget;
+
+		/* Current cpu is not according to smp_irq_affinity -
+		 * probably affinity changed. need to stop this NAPI
+		 * poll, and restart it on the right CPU
+		 */
+		done = 0;
 	}
 	/* Done for now */
 	napi_complete_done(napi, done);

^ permalink raw reply related

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
From: Jiri Pirko @ 2014-11-10 22:14 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5460FCB0.1040409@cumulusnetworks.com>

Mon, Nov 10, 2014 at 06:58:08PM CET, roopa@cumulusnetworks.com wrote:
>On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>The netdevice represents a port in a switch, it will expose
>>IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
>>belong to one physical switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>>  include/uapi/linux/if_link.h |  1 +
>>  net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>
>>diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>index 7072d83..4163753 100644
>>--- a/include/uapi/linux/if_link.h
>>+++ b/include/uapi/linux/if_link.h
>>@@ -145,6 +145,7 @@ enum {
>>  	IFLA_CARRIER,
>>  	IFLA_PHYS_PORT_ID,
>>  	IFLA_CARRIER_CHANGES,
>>+	IFLA_PHYS_SWITCH_ID,
>
>Jiri, since we have not really converged on the switchdev class or having a
>separate switchdev instance,
>am thinking it is better if we dont expose any such switch_id to userspace
>yet until absolutely needed. Do you need it today ?
>There is no real in kernel hw switch driver that will use it today. And quite
>likely this will need to change when we introduce real hw switch drivers.

When and if the switchdev class is introduced, switch id can happily
live on. It is nothing against it. Userspace should use this id to group
the ports of physical switch.

>
>
>>  	__IFLA_MAX
>>  };
>>diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>>index 1087c6d..f839354 100644
>>--- a/net/core/rtnetlink.c
>>+++ b/net/core/rtnetlink.c
>>@@ -43,6 +43,7 @@
>>  #include <linux/inet.h>
>>  #include <linux/netdevice.h>
>>+#include <net/switchdev.h>
>>  #include <net/ip.h>
>>  #include <net/protocol.h>
>>  #include <net/arp.h>
>>@@ -868,7 +869,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>>  	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>>  	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>>  	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
>>-	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
>>+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
>>+	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_SWITCH_ID */
>>  }
>>  static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
>>@@ -967,6 +969,24 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
>>  	return 0;
>>  }
>>+static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
>>+{
>>+	int err;
>>+	struct netdev_phys_item_id psid;
>>+
>>+	err = netdev_sw_parent_id_get(dev, &psid);
>>+	if (err) {
>>+		if (err == -EOPNOTSUPP)
>>+			return 0;
>>+		return err;
>>+	}
>>+
>>+	if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
>>+		return -EMSGSIZE;
>>+
>>+	return 0;
>>+}
>>+
>>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>>  			    int type, u32 pid, u32 seq, u32 change,
>>  			    unsigned int flags, u32 ext_filter_mask)
>>@@ -1039,6 +1059,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>>  	if (rtnl_phys_port_id_fill(skb, dev))
>>  		goto nla_put_failure;
>>+	if (rtnl_phys_switch_id_fill(skb, dev))
>>+		goto nla_put_failure;
>>+
>>  	attr = nla_reserve(skb, IFLA_STATS,
>>  			sizeof(struct rtnl_link_stats));
>>  	if (attr == NULL)
>>@@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>>  	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
>>  	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>>  	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
>>+	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>>  };
>>  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>

^ permalink raw reply

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
From: John Fastabend @ 2014-11-10 22:23 UTC (permalink / raw)
  To: Simon Horman
  Cc: Jamal Hadi Salim, Jiri Pirko, netdev, davem, nhorman, andy, tgraf,
	dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, sfeldma, f.fainelli, roopa, linville, jasowang,
	ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, alexander.h.duyck,
	john.ronciak, mleitner, shrijeet, gospo
In-Reply-To: <20141110045830.GC17246@vergenet.net>

On 11/09/2014 08:58 PM, Simon Horman wrote:
> On Sun, Nov 09, 2014 at 11:03:40PM -0500, Jamal Hadi Salim wrote:
>> Hi Simon,
>>
>> On 11/09/14 22:46, Simon Horman wrote:
>>> Hi Jamal, Hi Jiri,
>>>
>>> On a somewhat related note I am also wondering what if any progress has
>>> been made regarding discussions of (and code for) the following:
>>>
>>> 1. Exposing flow tables to user-space
>>>     - I realise that this is Open vSwitch specific to some extent
>>>       but I am in no way implying that it should be done instead of
>>>       non-Open vSwitch specific work.
>>>     - Jiri, IIRC this was part ~v2 of your earlier offload patchset
>>>
>>
>> I dont know what Rocker crowd is doing; however, I know
>> John F. has been doing some work which i have stared at
>> and I was hoping to join in with Ben's effort and show tc flow
>> offload on the realtek chip in my infinite spare time unles.
>> (for both Linux bridge and ports).
>> The priority is to merge the obvious bits first.
>
> Merging the obvious bits first is quite fine my me.
>

+1

>>> 2. Describing Switch Hardware
>>>     - I see John Fastabend moving forwards on this in his git repository
>>>       https://github.com/jrfastab/flow-net-next
>>>
>>> The way that I see things is that both of the above could be exposed via
>>> netlink. And that the first at least could be backed by NDOs.  As such I
>>> see this work as complementary and perhaps applying on top of this
>>> patchset. If I am mistaken in this regards it would be good to know :)
>>>
>>
>> You are correct - I will let John speak on his work, but
>> that is the intent.
>> The challenge is there are many schools of thoughts and i am hoping
>> it is not an arms race.
>
> That is also my hope.

My intent is to submit the Flow API bits once the base rocker switch
gets committed. I've implemented the Flow API against ixgbe and a
sadly a proprietary SDK. I'll implement it against the rocker switch
as well.

>
>>> I am of course also interested to know if the above are moving forwards.
>>> To be clear I am very interested in being able to use these APIs to
>>> perform Open vSwitch offloads and I am very happy to help.
>>> (Jamal: I'm also interested in non-Open vSwitch offloads :)
>>>
>>

I think they are moving forward. I have some code cleanup to do on
the flow API, but its mostly in place. Then I want to implement
an example on Rocker switch so we could experiment with something
why we wait for a real hardware driver. From my side assuming I at
least got it close to correct should be doable in the next few week.

After that I want to work with Jesse/Jamal and look at integrating
with OVS and other stacks. I thought a bit about the OVS integration
path but I'll hold that discussion for the moment.

Simon, if your feeling adventurous any feedback on the repo link
would be great. I still need to smash the commit log into something
coherent though at the moment you can see all the errors and rewrites,
etc as I made them.

>> Hey, OVS should be able to use these APIs; i am just interested in making
>> sure they are not just for OVS or OF. Then we are all happy;->
>
> I think we are all happy :)

I'm happy :) Also my intent is the flow API is more general then
just OVS. My view is OVS should be one user of the API.

.John

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
From: John Fastabend @ 2014-11-10 22:31 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Roopa Prabhu, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo
In-Reply-To: <20141110221419.GA2061@nanopsycho.orion>

On 11/10/2014 02:14 PM, Jiri Pirko wrote:
> Mon, Nov 10, 2014 at 06:58:08PM CET, roopa@cumulusnetworks.com wrote:
>> On 11/9/14, 2:51 AM, Jiri Pirko wrote:
>>> The netdevice represents a port in a switch, it will expose
>>> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
>>> belong to one physical switch.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>   include/uapi/linux/if_link.h |  1 +
>>>   net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>>>   2 files changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index 7072d83..4163753 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -145,6 +145,7 @@ enum {
>>>   	IFLA_CARRIER,
>>>   	IFLA_PHYS_PORT_ID,
>>>   	IFLA_CARRIER_CHANGES,
>>> +	IFLA_PHYS_SWITCH_ID,
>>
>> Jiri, since we have not really converged on the switchdev class or having a
>> separate switchdev instance,
>> am thinking it is better if we dont expose any such switch_id to userspace
>> yet until absolutely needed. Do you need it today ?
>> There is no real in kernel hw switch driver that will use it today. And quite
>> likely this will need to change when we introduce real hw switch drivers.
>
> When and if the switchdev class is introduced, switch id can happily
> live on. It is nothing against it. Userspace should use this id to group
> the ports of physical switch.
>
>

+1 I think we need this otherwise how will userspace "know" how the
ports are related? If I have two switch silicon blocks in a single
platform or perhaps have a switch silicon with traditional host
nics on the same platform.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* [net-next PATCH] ixgbe: Remove IXGBE_FLAG_IN_NETPOLL since it doesn't do anything
From: Alexander Duyck @ 2014-11-10 22:50 UTC (permalink / raw)
  To: netdev; +Cc: donald.c.skidmore, davem, jeffrey.t.kirsher

This patch removes some dead code from the cleanup path for ixgbe.

Setting and clearing the flag doesn't do anything since all we are doing is
setting the flag, scheduling napi, clearing the flag and then letting
netpoll do the polling cleanup.  As such it doesn't make much sense to have
it there.

This patch also removes one minor white-space error.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   16 ++++------------
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5032a60..c8880a1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -615,7 +615,6 @@ struct ixgbe_adapter {
 #define IXGBE_FLAG_RX_1BUF_CAPABLE              (u32)(1 << 4)
 #define IXGBE_FLAG_RX_PS_CAPABLE                (u32)(1 << 5)
 #define IXGBE_FLAG_RX_PS_ENABLED                (u32)(1 << 6)
-#define IXGBE_FLAG_IN_NETPOLL                   (u32)(1 << 7)
 #define IXGBE_FLAG_DCA_ENABLED                  (u32)(1 << 8)
 #define IXGBE_FLAG_DCA_CAPABLE                  (u32)(1 << 9)
 #define IXGBE_FLAG_IMIR_ENABLED                 (u32)(1 << 10)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 8e4ba8c..ab14b6b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1590,10 +1590,8 @@ static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 
 	if (ixgbe_qv_busy_polling(q_vector))
 		netif_receive_skb(skb);
-	else if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
-		napi_gro_receive(&q_vector->napi, skb);
 	else
-		netif_rx(skb);
+		napi_gro_receive(&q_vector->napi, skb);
 }
 
 /**
@@ -6017,7 +6015,6 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
 
 	/* Cause software interrupt to ensure rings are cleaned */
 	ixgbe_irq_rearm_queues(adapter, eics);
-
 }
 
 /**
@@ -7340,14 +7337,9 @@ static void ixgbe_netpoll(struct net_device *netdev)
 	if (test_bit(__IXGBE_DOWN, &adapter->state))
 		return;
 
-	adapter->flags |= IXGBE_FLAG_IN_NETPOLL;
-	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
-		for (i = 0; i < adapter->num_q_vectors; i++)
-			ixgbe_msix_clean_rings(0, adapter->q_vector[i]);
-	} else {
-		ixgbe_intr(adapter->pdev->irq, netdev);
-	}
-	adapter->flags &= ~IXGBE_FLAG_IN_NETPOLL;
+	/* loop through and schedule all active queues */
+	for (i = 0; i < adapter->num_q_vectors; i++)
+		ixgbe_msix_clean_rings(0, adapter->q_vector[i]);
 }
 
 #endif

^ permalink raw reply related

* Re: How to make stack send broadcast ARP request when entry is STALE?
From: Brian Haley @ 2014-11-10 22:52 UTC (permalink / raw)
  To: Ulf samuelsson; +Cc: Netdev
In-Reply-To: <3A67CF46-0BC1-4BAA-818A-0B656C6B46B6@emagii.com>

On 11/07/2014 05:11 AM, Ulf samuelsson wrote:
> The HP router is configured by a customer, and they intentionally limit replies
> to broadcast, and that is how they want it.

So this is the crux of the problem - the customer has configured the router so
that it doesn't play well with most modern network stacks that try and use
unicast so they don't send unnecessary broadcast packets.  I don't know why I
thought this was something wrong with the router software.

Did you try this?

$ sudo sysctl net.ipv4.neigh.eth0.ucast_solicit=0

It works for me.

And they really should re-think their decision on that configuration setting.

-Brian


> In the previous version of the build system, the Interpeak stack was used
> and this would in PROBE state send unicast ARP request, and if that failed
> send broadcast ARP.
> 
> The native linux stack, when in PROBE state sends only unicast until it decides
> that it should enter FAILED state.
> 
> The 'mcast_probes' variable seems to be totally ignored, except the first  time,
> so I do not see why it is there.
> 
> Best Regards
> Ulf Samuelsson
> ulf@emagii.com
> +46  (722) 427 437
> 
> 
>> 7 nov 2014 kl. 10:54 skrev Brian Haley <brian.haley@hp.com>:
>>
>>> On 11/05/2014 07:48 AM, Ulf samuelsson wrote:
>>> Have a problem with an HP router at a certain location, which
>>> is configured to only answer to broadcast ARP requests.
>>> That cannot be changed.
>>
>> Sorry to hear about the problem, but my only suggestions would be to try the latest firmware and/or put a call in to support.  I don't happen work in the division that makes routers...
>>
>>> The first ARP request the kernel sends out, is a broadcast request,
>>> which is fine, but after the reply, the kernel sends unicast requests,
>>> which will not get any replies.
>>
>> You might be able to hack this by inserting an ebtables rule - check the dnat target section of the man page - don't know the exact syntax but it would probably end in '-j dnat --to-destination ff:ff:ff:ff:ff:ff'
>>
>> -Brian
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH] checkpatch: Add --strict preference for #defines using BIT(foo)
From: Andrew Morton @ 2014-11-10 23:36 UTC (permalink / raw)
  To: Joe Perches; +Cc: David Miller, jiri, netdev, LKML
In-Reply-To: <1415394939.23530.20.camel@perches.com>

On Fri, 07 Nov 2014 13:15:39 -0800 Joe Perches <joe@perches.com> wrote:

> Using BIT(foo) and BIT_ULL(bar) is more common now.
> Suggest using these macros over #defines with 1<<value.

urgh.  I'm counting eightish implementations of BIT(), an unknown
number of which are actually being used.  Many use 1<<n, some use
1UL<<N, another uses 1ULL<<n.  I'm a bit reluctant to recommend that
anyone should use BIT() until it has has some vigorous scrubbing :(

Is it actually an improvement?  If I see

#define X	(1U << 7)

then I know exactly what it does.  Whereas when I see

#define X	BIT(7)

I know neither the size or the signedness of X so I have to go look it
up.

I have no strong feelings either way, but I'm wondering what might have
inspired this change?

^ permalink raw reply

* Re: [PATCH net-next] net: introduce SO_INCOMING_CPU
From: Eric Dumazet @ 2014-11-10 23:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ycai, willemb, ncardwell
In-Reply-To: <20141110.150826.1917858055737674539.davem@davemloft.net>

On Mon, 2014-11-10 at 15:08 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 07 Nov 2014 12:51:12 -0800
> 
> > @@ -1455,6 +1455,7 @@ process:
> >  		goto discard_and_relse;
> >  
> >  	sk_mark_napi_id(sk, skb);
> > +	sk_incoming_cpu_update(sk);
> 
> Just make sk_mark_napi_id() call sk_incoming_cpu_update().
> 
> You've matched up the calls precisely in this patch, and I can't think
> of any situation where we'd add a sk_mark_napi_call() not not want to
> do an sk_incoming_cpu_update().

I believe this was a coincidence.

In fact some sk_mark_napi_id() calls are not at the right place.
It makes little sense to change sk->sk_napi_id for a listener socket.

sk_mark_napi_id() should better be done [1] at the same time we call
sock_rps_save_rxhash

But we need to store cpu before prequeue or backlog (as I did in my
patch)


[1]

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9c7d7621466b1241f404a5ca11de809dcff2d02a..f10438ac9c0a4013a8a812b64d94e1cf6dfbd83e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1429,6 +1429,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 		struct dst_entry *dst = sk->sk_rx_dst;
 
 		sock_rps_save_rxhash(sk, skb);
+		sk_mark_napi_id(sk, skb);
 		if (dst) {
 			if (inet_sk(sk)->rx_dst_ifindex != skb->skb_iif ||
 			    dst->ops->check(dst, 0) == NULL) {
@@ -1450,6 +1451,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 
 		if (nsk != sk) {
 			sock_rps_save_rxhash(nsk, skb);
+			sk_mark_napi_id(nsk, skb);
 			if (tcp_child_process(sk, nsk, skb)) {
 				rsk = nsk;
 				goto reset;
@@ -1661,7 +1663,6 @@ process:
 	if (sk_filter(sk, skb))
 		goto discard_and_relse;
 
-	sk_mark_napi_id(sk, skb);
 	skb->dev = NULL;
 
 	bh_lock_sock_nested(sk);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index ace29b60813cf8a1d7182ad2262cbcbd21810fa7..a83eaff0d936677ef71e8f9f9cd0509cb023b45d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1293,6 +1293,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		struct dst_entry *dst = sk->sk_rx_dst;
 
 		sock_rps_save_rxhash(sk, skb);
+		sk_mark_napi_id(sk, skb);
 		if (dst) {
 			if (inet_sk(sk)->rx_dst_ifindex != skb->skb_iif ||
 			    dst->ops->check(dst, np->rx_dst_cookie) == NULL) {
@@ -1322,6 +1323,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		 */
 		if (nsk != sk) {
 			sock_rps_save_rxhash(nsk, skb);
+			sk_mark_napi_id(nsk, skb);
 			if (tcp_child_process(sk, nsk, skb))
 				goto reset;
 			if (opt_skb)
@@ -1454,7 +1456,6 @@ process:
 	if (sk_filter(sk, skb))
 		goto discard_and_relse;
 
-	sk_mark_napi_id(sk, skb);
 	skb->dev = NULL;
 
 	bh_lock_sock_nested(sk);

^ permalink raw reply related

* Re: [PATCH] checkpatch: Add --strict preference for #defines using BIT(foo)
From: Joe Perches @ 2014-11-10 23:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Miller, jiri, netdev, LKML
In-Reply-To: <20141110153647.f98f7d60fa24bf3bf7cbc215@linux-foundation.org>

On Mon, 2014-11-10 at 15:36 -0800, Andrew Morton wrote:
> On Fri, 07 Nov 2014 13:15:39 -0800 Joe Perches <joe@perches.com> wrote:
> 
> > Using BIT(foo) and BIT_ULL(bar) is more common now.
> > Suggest using these macros over #defines with 1<<value.
> 
> urgh.  I'm counting eightish implementations of BIT(), an unknown
> number of which are actually being used.  Many use 1<<n, some use
> 1UL<<N, another uses 1ULL<<n.  I'm a bit reluctant to recommend that
> anyone should use BIT() until it has has some vigorous scrubbing :(
>
> Is it actually an improvement?  If I see
> 
> #define X	(1U << 7)
> 
> then I know exactly what it does.  Whereas when I see
> 
> #define X	BIT(7)
> 
> I know neither the size or the signedness of X so I have to go look it
> up.

I'm not sure the signedness or type of X matters much
as the non-64bit comparisons are done by promotion to
at least int or unsigned int anyway.

The BIT macro makes sure a single bit is set.

The 'good' one is in bitops.h.  It also has #define BIT_ULL

include/linux/bitops.h:#define BIT(nr)                  (1UL << (nr))
include/linux/bitops.h-#define BIT_ULL(nr)              (1ULL << (nr))

The ones in tools/ are independent and should not be changed.
Excluding tools/, the others should probably be removed

$ git grep -E "define\s+BIT\b" 
arch/arm/mach-davinci/sleep.S:#define BIT(nr)                   (1 << (nr))
drivers/staging/rtl8188eu/include/rtl8188e_spec.h:#define BIT(x)                (1 << (x))
drivers/staging/rtl8188eu/include/wifi.h:#define BIT(x) (1 << (x))
drivers/staging/rtl8192e/rtl8192e/rtl_core.h:#define BIT(_i)                            (1<<(_i))
drivers/staging/rtl8712/osdep_service.h:        #define BIT(x)  (1 << (x))
drivers/staging/rtl8712/wifi.h:#define BIT(x)   (1 << (x))

> I have no strong feelings either way, but I'm wondering what might have
> inspired this change?

David Miller commented on a netdev patch where 1<<foo was
being used in a #define and suggested using BIT.

http://article.gmane.org/gmane.linux.network/337535/match=bit

Using BIT _is_ more common in recent patches too.

^ permalink raw reply

* [Patch net-next v3] neigh: remove dynamic neigh table registration support
From: Cong Wang @ 2014-11-10 23:59 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang

Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
v3: fix neigh_table_clear()
v2: remove useless #ifdef's
    move the assignment to the end of neigh_table_init()

 include/net/neighbour.h |  11 ++-
 net/core/neighbour.c    | 247 ++++++++++++++++++++++--------------------------
 net/decnet/dn_neigh.c   |   4 +-
 net/ipv4/arp.c          |   2 +-
 net/ipv6/ndisc.c        |   4 +-
 5 files changed, 126 insertions(+), 142 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index dedfb18..eb070b3 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -220,6 +220,13 @@ struct neigh_table {
 	struct pneigh_entry	**phash_buckets;
 };
 
+enum {
+	NEIGH_ARP_TABLE = 0,
+	NEIGH_ND_TABLE = 1,
+	NEIGH_DN_TABLE = 2,
+	NEIGH_NR_TABLES,
+};
+
 static inline int neigh_parms_family(struct neigh_parms *p)
 {
 	return p->tbl->family;
@@ -240,8 +247,8 @@ static inline void *neighbour_priv(const struct neighbour *n)
 #define NEIGH_UPDATE_F_ISROUTER			0x40000000
 #define NEIGH_UPDATE_F_ADMIN			0x80000000
 
-void neigh_table_init(struct neigh_table *tbl);
-int neigh_table_clear(struct neigh_table *tbl);
+void neigh_table_init(int index, struct neigh_table *tbl);
+int neigh_table_clear(int index, struct neigh_table *tbl);
 struct neighbour *neigh_lookup(struct neigh_table *tbl, const void *pkey,
 			       struct net_device *dev);
 struct neighbour *neigh_lookup_nodev(struct neigh_table *tbl, struct net *net,
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index edd0411..8e38f17 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -56,7 +56,6 @@ static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 
-static struct neigh_table *neigh_tables;
 #ifdef CONFIG_PROC_FS
 static const struct file_operations neigh_stat_seq_fops;
 #endif
@@ -87,13 +86,8 @@ static const struct file_operations neigh_stat_seq_fops;
    the most complicated procedure, which we allow is dev->hard_header.
    It is supposed, that dev->hard_header is simplistic and does
    not make callbacks to neighbour tables.
-
-   The last lock is neigh_tbl_lock. It is pure SMP lock, protecting
-   list of neighbour tables. This list is used only in process context,
  */
 
-static DEFINE_RWLOCK(neigh_tbl_lock);
-
 static int neigh_blackhole(struct neighbour *neigh, struct sk_buff *skb)
 {
 	kfree_skb(skb);
@@ -1520,7 +1514,9 @@ static void neigh_parms_destroy(struct neigh_parms *parms)
 
 static struct lock_class_key neigh_table_proxy_queue_class;
 
-static void neigh_table_init_no_netlink(struct neigh_table *tbl)
+static struct neigh_table *neigh_tables[NEIGH_NR_TABLES] __read_mostly;
+
+void neigh_table_init(int index, struct neigh_table *tbl)
 {
 	unsigned long now = jiffies;
 	unsigned long phsize;
@@ -1566,34 +1562,14 @@ static void neigh_table_init_no_netlink(struct neigh_table *tbl)
 
 	tbl->last_flush = now;
 	tbl->last_rand	= now + tbl->parms.reachable_time * 20;
-}
-
-void neigh_table_init(struct neigh_table *tbl)
-{
-	struct neigh_table *tmp;
 
-	neigh_table_init_no_netlink(tbl);
-	write_lock(&neigh_tbl_lock);
-	for (tmp = neigh_tables; tmp; tmp = tmp->next) {
-		if (tmp->family == tbl->family)
-			break;
-	}
-	tbl->next	= neigh_tables;
-	neigh_tables	= tbl;
-	write_unlock(&neigh_tbl_lock);
-
-	if (unlikely(tmp)) {
-		pr_err("Registering multiple tables for family %d\n",
-		       tbl->family);
-		dump_stack();
-	}
+	neigh_tables[index] = tbl;
 }
 EXPORT_SYMBOL(neigh_table_init);
 
-int neigh_table_clear(struct neigh_table *tbl)
+int neigh_table_clear(int index, struct neigh_table *tbl)
 {
-	struct neigh_table **tp;
-
+	neigh_tables[index] = NULL;
 	/* It is not clean... Fix it to unload IPv6 module safely */
 	cancel_delayed_work_sync(&tbl->gc_work);
 	del_timer_sync(&tbl->proxy_timer);
@@ -1601,14 +1577,6 @@ int neigh_table_clear(struct neigh_table *tbl)
 	neigh_ifdown(tbl, NULL);
 	if (atomic_read(&tbl->entries))
 		pr_crit("neighbour leakage\n");
-	write_lock(&neigh_tbl_lock);
-	for (tp = &neigh_tables; *tp; tp = &(*tp)->next) {
-		if (*tp == tbl) {
-			*tp = tbl->next;
-			break;
-		}
-	}
-	write_unlock(&neigh_tbl_lock);
 
 	call_rcu(&rcu_dereference_protected(tbl->nht, 1)->rcu,
 		 neigh_hash_free_rcu);
@@ -1626,12 +1594,32 @@ int neigh_table_clear(struct neigh_table *tbl)
 }
 EXPORT_SYMBOL(neigh_table_clear);
 
+static struct neigh_table *neigh_find_table(int family)
+{
+	struct neigh_table *tbl = NULL;
+
+	switch (family) {
+	case AF_INET:
+		tbl = neigh_tables[NEIGH_ARP_TABLE];
+		break;
+	case AF_INET6:
+		tbl = neigh_tables[NEIGH_ND_TABLE];
+		break;
+	case AF_DECnet:
+		tbl = neigh_tables[NEIGH_DN_TABLE];
+		break;
+	}
+
+	return tbl;
+}
+
 static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ndmsg *ndm;
 	struct nlattr *dst_attr;
 	struct neigh_table *tbl;
+	struct neighbour *neigh;
 	struct net_device *dev = NULL;
 	int err = -EINVAL;
 
@@ -1652,39 +1640,31 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh)
 		}
 	}
 
-	read_lock(&neigh_tbl_lock);
-	for (tbl = neigh_tables; tbl; tbl = tbl->next) {
-		struct neighbour *neigh;
-
-		if (tbl->family != ndm->ndm_family)
-			continue;
-		read_unlock(&neigh_tbl_lock);
-
-		if (nla_len(dst_attr) < tbl->key_len)
-			goto out;
+	tbl = neigh_find_table(ndm->ndm_family);
+	if (tbl == NULL)
+		return -EAFNOSUPPORT;
 
-		if (ndm->ndm_flags & NTF_PROXY) {
-			err = pneigh_delete(tbl, net, nla_data(dst_attr), dev);
-			goto out;
-		}
+	if (nla_len(dst_attr) < tbl->key_len)
+		goto out;
 
-		if (dev == NULL)
-			goto out;
+	if (ndm->ndm_flags & NTF_PROXY) {
+		err = pneigh_delete(tbl, net, nla_data(dst_attr), dev);
+		goto out;
+	}
 
-		neigh = neigh_lookup(tbl, nla_data(dst_attr), dev);
-		if (neigh == NULL) {
-			err = -ENOENT;
-			goto out;
-		}
+	if (dev == NULL)
+		goto out;
 
-		err = neigh_update(neigh, NULL, NUD_FAILED,
-				   NEIGH_UPDATE_F_OVERRIDE |
-				   NEIGH_UPDATE_F_ADMIN);
-		neigh_release(neigh);
+	neigh = neigh_lookup(tbl, nla_data(dst_attr), dev);
+	if (neigh == NULL) {
+		err = -ENOENT;
 		goto out;
 	}
-	read_unlock(&neigh_tbl_lock);
-	err = -EAFNOSUPPORT;
+
+	err = neigh_update(neigh, NULL, NUD_FAILED,
+			   NEIGH_UPDATE_F_OVERRIDE |
+			   NEIGH_UPDATE_F_ADMIN);
+	neigh_release(neigh);
 
 out:
 	return err;
@@ -1692,11 +1672,14 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
+	int flags = NEIGH_UPDATE_F_ADMIN | NEIGH_UPDATE_F_OVERRIDE;
 	struct net *net = sock_net(skb->sk);
 	struct ndmsg *ndm;
 	struct nlattr *tb[NDA_MAX+1];
 	struct neigh_table *tbl;
 	struct net_device *dev = NULL;
+	struct neighbour *neigh;
+	void *dst, *lladdr;
 	int err;
 
 	ASSERT_RTNL();
@@ -1720,70 +1703,60 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh)
 			goto out;
 	}
 
-	read_lock(&neigh_tbl_lock);
-	for (tbl = neigh_tables; tbl; tbl = tbl->next) {
-		int flags = NEIGH_UPDATE_F_ADMIN | NEIGH_UPDATE_F_OVERRIDE;
-		struct neighbour *neigh;
-		void *dst, *lladdr;
+	tbl = neigh_find_table(ndm->ndm_family);
+	if (tbl == NULL)
+		return -EAFNOSUPPORT;
 
-		if (tbl->family != ndm->ndm_family)
-			continue;
-		read_unlock(&neigh_tbl_lock);
+	if (nla_len(tb[NDA_DST]) < tbl->key_len)
+		goto out;
+	dst = nla_data(tb[NDA_DST]);
+	lladdr = tb[NDA_LLADDR] ? nla_data(tb[NDA_LLADDR]) : NULL;
 
-		if (nla_len(tb[NDA_DST]) < tbl->key_len)
-			goto out;
-		dst = nla_data(tb[NDA_DST]);
-		lladdr = tb[NDA_LLADDR] ? nla_data(tb[NDA_LLADDR]) : NULL;
+	if (ndm->ndm_flags & NTF_PROXY) {
+		struct pneigh_entry *pn;
+
+		err = -ENOBUFS;
+		pn = pneigh_lookup(tbl, net, dst, dev, 1);
+		if (pn) {
+			pn->flags = ndm->ndm_flags;
+			err = 0;
+		}
+		goto out;
+	}
 
-		if (ndm->ndm_flags & NTF_PROXY) {
-			struct pneigh_entry *pn;
+	if (dev == NULL)
+		goto out;
 
-			err = -ENOBUFS;
-			pn = pneigh_lookup(tbl, net, dst, dev, 1);
-			if (pn) {
-				pn->flags = ndm->ndm_flags;
-				err = 0;
-			}
+	neigh = neigh_lookup(tbl, dst, dev);
+	if (neigh == NULL) {
+		if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
+			err = -ENOENT;
 			goto out;
 		}
 
-		if (dev == NULL)
+		neigh = __neigh_lookup_errno(tbl, dst, dev);
+		if (IS_ERR(neigh)) {
+			err = PTR_ERR(neigh);
+			goto out;
+		}
+	} else {
+		if (nlh->nlmsg_flags & NLM_F_EXCL) {
+			err = -EEXIST;
+			neigh_release(neigh);
 			goto out;
-
-		neigh = neigh_lookup(tbl, dst, dev);
-		if (neigh == NULL) {
-			if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
-				err = -ENOENT;
-				goto out;
-			}
-
-			neigh = __neigh_lookup_errno(tbl, dst, dev);
-			if (IS_ERR(neigh)) {
-				err = PTR_ERR(neigh);
-				goto out;
-			}
-		} else {
-			if (nlh->nlmsg_flags & NLM_F_EXCL) {
-				err = -EEXIST;
-				neigh_release(neigh);
-				goto out;
-			}
-
-			if (!(nlh->nlmsg_flags & NLM_F_REPLACE))
-				flags &= ~NEIGH_UPDATE_F_OVERRIDE;
 		}
 
-		if (ndm->ndm_flags & NTF_USE) {
-			neigh_event_send(neigh, NULL);
-			err = 0;
-		} else
-			err = neigh_update(neigh, lladdr, ndm->ndm_state, flags);
-		neigh_release(neigh);
-		goto out;
+		if (!(nlh->nlmsg_flags & NLM_F_REPLACE))
+			flags &= ~NEIGH_UPDATE_F_OVERRIDE;
 	}
 
-	read_unlock(&neigh_tbl_lock);
-	err = -EAFNOSUPPORT;
+	if (ndm->ndm_flags & NTF_USE) {
+		neigh_event_send(neigh, NULL);
+		err = 0;
+	} else
+		err = neigh_update(neigh, lladdr, ndm->ndm_state, flags);
+	neigh_release(neigh);
+
 out:
 	return err;
 }
@@ -1982,7 +1955,8 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct neigh_table *tbl;
 	struct ndtmsg *ndtmsg;
 	struct nlattr *tb[NDTA_MAX+1];
-	int err;
+	bool found = false;
+	int err, tidx;
 
 	err = nlmsg_parse(nlh, sizeof(*ndtmsg), tb, NDTA_MAX,
 			  nl_neightbl_policy);
@@ -1995,19 +1969,21 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh)
 	}
 
 	ndtmsg = nlmsg_data(nlh);
-	read_lock(&neigh_tbl_lock);
-	for (tbl = neigh_tables; tbl; tbl = tbl->next) {
+
+	for (tidx = 0; tidx < NEIGH_NR_TABLES; tidx++) {
+		tbl = neigh_tables[tidx];
+		if (!tbl)
+			continue;
 		if (ndtmsg->ndtm_family && tbl->family != ndtmsg->ndtm_family)
 			continue;
-
-		if (nla_strcmp(tb[NDTA_NAME], tbl->id) == 0)
+		if (nla_strcmp(tb[NDTA_NAME], tbl->id) == 0) {
+			found = true;
 			break;
+		}
 	}
 
-	if (tbl == NULL) {
-		err = -ENOENT;
-		goto errout_locked;
-	}
+	if (!found)
+		return -ENOENT;
 
 	/*
 	 * We acquire tbl->lock to be nice to the periodic timers and
@@ -2118,8 +2094,6 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 errout_tbl_lock:
 	write_unlock_bh(&tbl->lock);
-errout_locked:
-	read_unlock(&neigh_tbl_lock);
 errout:
 	return err;
 }
@@ -2134,10 +2108,13 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 
 	family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
 
-	read_lock(&neigh_tbl_lock);
-	for (tbl = neigh_tables, tidx = 0; tbl; tbl = tbl->next, tidx++) {
+	for (tidx = 0; tidx < NEIGH_NR_TABLES; tidx++) {
 		struct neigh_parms *p;
 
+		tbl = neigh_tables[tidx];
+		if (!tbl)
+			continue;
+
 		if (tidx < tbl_skip || (family && tbl->family != family))
 			continue;
 
@@ -2168,7 +2145,6 @@ static int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 		neigh_skip = 0;
 	}
 out:
-	read_unlock(&neigh_tbl_lock);
 	cb->args[0] = tidx;
 	cb->args[1] = nidx;
 
@@ -2351,7 +2327,6 @@ static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 	int proxy = 0;
 	int err;
 
-	read_lock(&neigh_tbl_lock);
 	family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
 
 	/* check for full ndmsg structure presence, family member is
@@ -2363,8 +2338,11 @@ static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 
 	s_t = cb->args[0];
 
-	for (tbl = neigh_tables, t = 0; tbl;
-	     tbl = tbl->next, t++) {
+	for (t = 0; t < NEIGH_NR_TABLES; t++) {
+		tbl = neigh_tables[t];
+
+		if (!tbl)
+			continue;
 		if (t < s_t || (family && tbl->family != family))
 			continue;
 		if (t > s_t)
@@ -2377,7 +2355,6 @@ static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 		if (err < 0)
 			break;
 	}
-	read_unlock(&neigh_tbl_lock);
 
 	cb->args[0] = t;
 	return skb->len;
diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
index c8121ce..7ca7c31 100644
--- a/net/decnet/dn_neigh.c
+++ b/net/decnet/dn_neigh.c
@@ -591,7 +591,7 @@ static const struct file_operations dn_neigh_seq_fops = {
 
 void __init dn_neigh_init(void)
 {
-	neigh_table_init(&dn_neigh_table);
+	neigh_table_init(NEIGH_DN_TABLE, &dn_neigh_table);
 	proc_create("decnet_neigh", S_IRUGO, init_net.proc_net,
 		    &dn_neigh_seq_fops);
 }
@@ -599,5 +599,5 @@ void __init dn_neigh_init(void)
 void __exit dn_neigh_cleanup(void)
 {
 	remove_proc_entry("decnet_neigh", init_net.proc_net);
-	neigh_table_clear(&dn_neigh_table);
+	neigh_table_clear(NEIGH_DN_TABLE, &dn_neigh_table);
 }
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 16acb59..205e147 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1292,7 +1292,7 @@ static int arp_proc_init(void);
 
 void __init arp_init(void)
 {
-	neigh_table_init(&arp_tbl);
+	neigh_table_init(NEIGH_ARP_TABLE, &arp_tbl);
 
 	dev_add_pack(&arp_packet_type);
 	arp_proc_init();
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 4cb45c1..2c9f6bf 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1763,7 +1763,7 @@ int __init ndisc_init(void)
 	/*
 	 * Initialize the neighbour table
 	 */
-	neigh_table_init(&nd_tbl);
+	neigh_table_init(NEIGH_ND_TABLE, &nd_tbl);
 
 #ifdef CONFIG_SYSCTL
 	err = neigh_sysctl_register(NULL, &nd_tbl.parms,
@@ -1796,6 +1796,6 @@ void ndisc_cleanup(void)
 #ifdef CONFIG_SYSCTL
 	neigh_sysctl_unregister(&nd_tbl.parms);
 #endif
-	neigh_table_clear(&nd_tbl);
+	neigh_table_clear(NEIGH_ND_TABLE, &nd_tbl);
 	unregister_pernet_subsys(&ndisc_net_ops);
 }

^ permalink raw reply related

* Re: [net-next PATCH 1/5] net: Add netdev Rx page allocation function
From: Cong Wang @ 2014-11-11  0:26 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	leedom-ut6Up61K2wZBDgjK7y7TUQ, hariprasad-ut6Up61K2wZBDgjK7y7TUQ,
	donald.c.skidmore-ral2JQCrhuEAvxtiuMwx3w,
	oliver-GvhC2dPhHPQdnm+yROfE0A, balbi-l0cyMroinI0,
	matthew.vick-ral2JQCrhuEAvxtiuMwx3w, mgorman-l3A5Bk7waGM,
	David Miller, jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20141110195158.16182.71937.stgit@ahduyck-vm-fedora20>

On Mon, Nov 10, 2014 at 11:51 AM, Alexander Duyck
<alexander.h.duyck-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> This patch implements __netdev_alloc_pages and __netdev_alloc_page.  These
> are meant to replace the __skb_alloc_pages and __skb_alloc_page functions.
> The reason for doing this is that it occurred to me that __skb_alloc_page is
> supposed to be passed an sk_buff pointer, but it is NULL in all cases where
> it is used.  Worse is that in the case of ixgbe it is passed NULL via the
> sk_buff pointer in the rx_buffer info structure which means the compiler is
> not correctly stripping it out.

These netdev_*() have nothing related with struct net_device, please
find a better prefix. Also, they are in skbuff.h, you perhaps want to move them
to netdevice.h.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH iproute2] utils: relax strtoX checking in get_time_rtt
From: Florian Westphal @ 2014-11-11  0:38 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

ip route change dev tap0 192.168.7.0/24 rto_min 1ms
Error: argument "1ms" is wrong: "rto_min" value is invalid

get_time_rtt() checks for 's' or 'msec' and converts to milliseconds
if needed.

Fixes: 697ac63905 (utils: fix range checking for get_u32/get_u64 et all)
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 lib/utils.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/utils.c b/lib/utils.c
index dc21567..987377b 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -144,8 +144,8 @@ int get_time_rtt(unsigned *val, const char *arg, int *raw)
 		if (t < 0.0)
 			return -1;
 
-		/* extra non-digits */
-		if (!p || p == arg || *p)
+		/* no digits? */
+		if (!p || p == arg)
 			return -1;
 
 		/* over/underflow */
@@ -154,8 +154,8 @@ int get_time_rtt(unsigned *val, const char *arg, int *raw)
 	} else {
 		res = strtoul(arg, &p, 0);
 
-		/* empty string or trailing non-digits */
-		if (!p || p == arg || *p)
+		/* empty string? */
+		if (!p || p == arg)
 			return -1;
 
 		/* overflow */
-- 
2.0.4

^ permalink raw reply related

* Re: [PATCH v4 10/25] virtio: add API to enable VQs early
From: Andy Grover @ 2014-11-11  0:45 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Rusty Russell, virtualization, linux-scsi, linux-s390,
	v9fs-developer, netdev, kvm, Amit Shah, Cornelia Huck,
	Christian Borntraeger, David S. Miller, Paolo Bonzini
In-Reply-To: <1413114332-626-11-git-send-email-mst-v4@redhat.com>

On 10/13/2014 12:50 AM, Michael S. Tsirkin wrote:
> virtio spec 0.9.X requires DRIVER_OK to be set before
> VQs are used, but some drivers use VQs before probe
> function returns.
> Since DRIVER_OK is set after probe, this violates the spec.
>
> Even though under virtio 1.0 transitional devices support this
> behaviour, we want to make it possible for those early callers to become
> spec compliant and eventually support non-transitional devices.
>
> Add API for drivers to call before using VQs.
>
> Sets DRIVER_OK internally.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> ---
>   include/linux/virtio_config.h | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
>
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index e8f8f71..e36403b 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -109,6 +109,23 @@ struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev,
>   	return vq;
>   }
>
> +/**
> + * virtio_device_ready - enable vq use in probe function
> + * @vdev: the device
> + *
> + * Driver must call this to use vqs in the probe function.
> + *
> + * Note: vqs are enabled automatically after probe returns.
> + */
> +static inline
> +void virtio_device_ready(struct virtio_device *dev)
> +{
> +	unsigned status = dev->config->get_status(dev);
> +
> +	BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> +	dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> +}

Getting a BUG when booting via KVM, host Fedora 20, guest Fedora 20.

my config is at:

https://fedorapeople.org/~grover/config-20141110



[    0.828494] ------------[ cut here ]------------
[    0.829039] kernel BUG at 
/home/agrover/git/kernel/include/linux/virtio_config.h:125!
[    0.831266] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[    0.831266] Modules linked in:
[    0.831266] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 3.18.0-rc4 #120
[    0.831266] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.831266] Workqueue: events control_work_handler
[    0.831266] task: ffff88003cd98000 ti: ffff88003cd94000 task.ti: 
ffff88003cd94000
[    0.831266] RIP: 0010:[<ffffffff81445004>]  [<ffffffff81445004>] 
add_port+0x264/0x410
[    0.831266] RSP: 0000:ffff88003cd97c78  EFLAGS: 00010202
[    0.831266] RAX: 0000000000000007 RBX: ffff88003c58c400 RCX: 
0000000000000001
[    0.831266] RDX: 000000000000c132 RSI: ffffffff81a955e9 RDI: 
000000000001c132
[    0.831266] RBP: ffff88003cd97cc8 R08: 0000000000000000 R09: 
0000000000000000
[    0.831266] R10: 0000000000000001 R11: 0000000000000000 R12: 
ffff88003c58be00
[    0.831266] R13: 0000000000000001 R14: ffff8800395ca800 R15: 
ffff88003c58c420
[    0.831266] FS:  0000000000000000(0000) GS:ffff88003fa00000(0000) 
knlGS:0000000000000000
[    0.831266] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.831266] CR2: 0000000000000000 CR3: 0000000001c11000 CR4: 
00000000000006e0
[    0.831266] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[    0.831266] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[    0.831266] Stack:
[    0.831266]  ffff880000000001 0000000000000292 0000000000000000 
0000000000000001
[    0.831266]  ffff88003cd97cc8 ffff88003dfa8a20 ffff88003c58beb8 
ffff88003c58be10
[    0.831266]  ffff8800395a2000 0000000000000000 ffff88003cd97d38 
ffffffff8144531a
[    0.831266] Call Trace:
[    0.831266]  [<ffffffff8144531a>] control_work_handler+0x16a/0x3c0
[    0.831266]  [<ffffffff8108b0c8>] ? process_one_work+0x208/0x500
[    0.831266]  [<ffffffff8108b16c>] process_one_work+0x2ac/0x500
[    0.831266]  [<ffffffff8108b0c8>] ? process_one_work+0x208/0x500
[    0.831266]  [<ffffffff8108b68e>] worker_thread+0x2ce/0x4e0
[    0.831266]  [<ffffffff8108b3c0>] ? process_one_work+0x500/0x500
[    0.831266]  [<ffffffff81090b28>] kthread+0xf8/0x100
[    0.831266]  [<ffffffff810bad7d>] ? trace_hardirqs_on+0xd/0x10
[    0.831266]  [<ffffffff81090a30>] ? kthread_stop+0x140/0x140
[    0.831266]  [<ffffffff816ea92c>] ret_from_fork+0x7c/0xb0
[    0.831266]  [<ffffffff81090a30>] ? kthread_stop+0x140/0x140
[    0.831266] Code: c7 c2 48 31 01 83 48 c7 c6 e9 55 a9 81 e8 55 b4 c6 
ff 4d 8b b4 24 58 01 00 00 49 8b 86 e8 04 00 00 4c 89 f7 ff 50 10 a8 04 
74 0c <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 49 8b 96 e8 04 00 00 83 c8
[    0.831266] RIP  [<ffffffff81445004>] add_port+0x264/0x410
[    0.831266]  RSP <ffff88003cd97c78>
[    0.878202] ---[ end trace f98fbb172cc7bbf4 ]---

Thanks -- Andy

^ permalink raw reply

* [PATCH net] ipv6: fix IPV6_PKTINFO with v4 mapped
From: Eric Dumazet @ 2014-11-11  1:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Hannes Frederic Sowa

From: Eric Dumazet <edumazet@google.com>

Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is
a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: c8e6ad0829a7 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg")
---
 net/ipv4/ip_sockglue.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 21894df662622bcb0f4bec01a2e612480276b92b..b7826575d2154dc67934d47adf90c7cd0fc40635 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -195,7 +195,7 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc,
 	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
 		if (!CMSG_OK(msg, cmsg))
 			return -EINVAL;
-#if defined(CONFIG_IPV6)
+#if IS_ENABLED(CONFIG_IPV6)
 		if (allow_ipv6 &&
 		    cmsg->cmsg_level == SOL_IPV6 &&
 		    cmsg->cmsg_type == IPV6_PKTINFO) {

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox