Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v1 2/2] bridge: export multicast database via netlink
From: Thomas Graf @ 2012-11-30 11:26 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bridge, Herbert Xu, Stephen Hemminger, David S. Miller,
	Jesper Dangaard Brouer
In-Reply-To: <1354269514-1863-2-git-send-email-amwang@redhat.com>

On 11/30/12 at 05:58pm, Cong Wang wrote:
> +enum {
> +	MDBA_UNSPEC,
> +	MDBA_MDB,
> +	MDBA_ROUTER,
> +	__MDBA_MAX,
> +};
> +#define MDBA_MAX (__MDBA_MAX - 1)
> +
> +enum {
> +	MDBA_MDB_UNSPEC,
> +	MDBA_MCADDR,
> +	MDBA_BRPORT_NO,
> +	__MDBA_MDB_MAX,
> +};
> +#define MDBA_MDB_MAX (__MDBA_MDB_MAX - 1)

So juse use these enums from iproute2 directly instead redefining
them.

> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> new file mode 100644
> index 0000000..3751f7d
> --- /dev/null
> +++ b/net/bridge/br_mdb.c
> @@ -0,0 +1,166 @@
> +#include <linux/err.h>
> +#include <linux/if_ether.h>
> +#include <linux/igmp.h>
> +#include <linux/kernel.h>
> +#include <linux/netdevice.h>
> +#include <linux/rculist.h>
> +#include <linux/skbuff.h>
> +#include <linux/slab.h>
> +#include <net/ip.h>
> +#if IS_ENABLED(CONFIG_IPV6)
> +#include <net/ipv6.h>
> +#include <net/mld.h>
> +#include <net/addrconf.h>
> +#include <net/ip6_checksum.h>
> +#endif
> +
> +#include "br_private.h"
> +
> +struct br_port_msg {
> +	int ifindex;
> +};

Make that __u32 ifindex and move the definition into if_bridge.h so
you can reuse it in iproute2.

> +static int br_rports_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
> +			       u32 seq, struct net_device *dev)
> +{
> +	struct net_bridge *br = netdev_priv(dev);
> +	struct net_bridge_port *p;
> +	struct hlist_node *n;
> +	struct nlattr *nest;
> +
> +	if (!br->multicast_router || hlist_empty(&br->router_list)) {
> +		printk(KERN_INFO "no router on bridge\n");
> +		return 0;
> +	}
> +
> +	nest = nla_nest_start(skb, MDBA_ROUTER);
> +	if (nest == NULL)
> +		return -EMSGSIZE;
> +
> +	hlist_for_each_entry_rcu(p, n, &br->router_list, rlist) {
> +		if (p && nla_put_u16(skb, MDBA_BRPORT_NO, p->port_no))
> +			goto fail;
> +	}

port_no 0 is reserved, right?

We can reduce message size here and make it easier to extend by
using p->port_no as the attribute id by doing something like this:

hlist_for_each_entry_rcu(p, n, &br->router_list, rlist)
	if (nla_put_flag(skb, p->port_no))
		goto fail;

This will result in an empty attribute body for now and if you ever
need to include more data you can simply start putting attributes
insde that empty body and old users will continue to function.

> +static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
> +			    u32 seq, struct net_device *dev)
> +{
> +	struct net_bridge *br = netdev_priv(dev);
> +	struct net_bridge_mdb_htable *mdb;
> +	struct nlattr *nest;
> +	unsigned int i;
> +	int s_idx;
> +
> +	if (br->multicast_disabled) {
> +		printk(KERN_INFO "multicast is disabled on bridge\n");
> +		return 0;
> +	}
> +
> +	mdb = rcu_dereference(br->mdb);
> +	if (!mdb) {
> +		printk(KERN_INFO "no mdb on bridge\n");
> +		return 0;
> +	}
> +
> +	cb->seq = mdb->seq;

I'm not sure how this is supposed to worl. cb->seq may not change
throughout the complete dump process or the dump will be interrupted
and the user is required to restart.

Each bridge will have its own mdb resulting in a differing seq.

> +	s_idx = cb->args[1];
> +	if (s_idx >= mdb->max)
> +		return 0;
> +
> +	nest = nla_nest_start(skb, MDBA_MDB);
> +	if (nest == NULL)
> +		return -EMSGSIZE;
> +
> +	for (i = s_idx; i < mdb->max; i++) {
> +		struct hlist_node *h;
> +		struct net_bridge_mdb_entry *mp;
> +		struct net_bridge_port_group *p, **pp;
> +		struct net_bridge_port *port;
> +
> +		hlist_for_each_entry_rcu(mp, h, &mdb->mhash[i], hlist[mdb->ver]) {
> +			for (pp = &mp->ports;
> +			     (p = rcu_dereference(*pp)) != NULL;
> +			      pp = &p->next) {
> +				port = p->port;
> +				if (port) {
> +					printk(KERN_INFO "port %u, mcaddr: %pI4\n", port->port_no, &mp->addr.u.ip4);
> +					if (nla_put_u16(skb, MDBA_BRPORT_NO, port->port_no) ||
> +					    nla_put_be32(skb, MDBA_MCADDR, p->addr.u.ip4))
> +						goto fail;
> +				}
> +			}
> +		}
> +	}
> +
> +	cb->args[1] = i;
> +	nla_nest_end(skb, nest);
> +	return 0;
> +fail:
> +	cb->args[1] = i;
> +	nla_nest_cancel(skb, nest);
> +	return -EMSGSIZE;

This still relies on the assumption that all of mdb->mhash[] will fit
into a single netlink message. Is that guaranteed? If so that would
simplify this a lot and you wouldn't have to worry about rehashes at
all.

As-is it looks broken, you store an offset into the hash but if you
run out of space you trim away the complete nested attribute again
and thus offseting will actually result in data loss because that
data has never been wired but you will skip over it next time.

You need something like this:
        for (i = 0; i < mdb->max; i++) {
                [...]
                hlist_for_each_entry_rcu(mp, h, &mdb->mhash[i], hlist[mdb->ver]) {
                        if (idx < s_idx)
                                goto skip;

                        hentry = nla_nest_start(...)
                        if (nla_put_u16(...) ||
                            nla_put_32(...)) {
                                nla_nest_cancel(skb, hentry);
				err = -EMSGSIZE;
                                goto out;
                        }
                        nla_nest_end(skb, hentry);
                skip:
                        idx++;
                }
        }

out:
	cb->args[1] = idx;
	nla_nest_end(skb, nest);
	return err;

> +}
> +
> +static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
> +{
> +	struct net_device *dev;
> +	struct net *net = sock_net(skb->sk);
> +	struct nlmsghdr *nlh;
> +	u32 seq = cb->nlh->nlmsg_seq;
> +	int idx = 0, s_idx;
> +
> +	s_idx = cb->args[0];
> +
> +	rcu_read_lock();
> +
> +	for_each_netdev_rcu(net, dev) {
> +		if (dev->priv_flags & IFF_EBRIDGE) {
> +			struct br_port_msg *bpm;
> +
> +			if (idx < s_idx)
> +				goto cont;
> +
> +			nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid,
> +					seq, RTM_GETMDB,
> +					sizeof(*bpm), NLM_F_MULTI);
> +			if (nlh == NULL)
> +				break;
> +
> +			bpm = nlmsg_data(nlh);
> +			bpm->ifindex = dev->ifindex;
> +			if (br_mdb_fill_info(skb, cb, seq, dev) < 0) {
> +				printk(KERN_INFO "br_mdb_fill_info failed\n");
> +				goto fail;
> +			}
> +			if (br_rports_fill_info(skb, cb, seq, dev) < 0) {
> +				printk(KERN_INFO "br_rports_fill_info failed\n");
> +				goto fail;
> +			}
> +
> +			nlmsg_end(skb, nlh);
> +cont:
> +			idx++;
> +		}
> +	}
> +
> +	rcu_read_unlock();
> +	cb->args[0] = idx;
> +	return skb->len;
> +
> +fail:
> +	rcu_read_unlock();
> +	nlmsg_cancel(skb, nlh);
> +	return skb->len;

Assuming you implement partial messages as proposed above you don't
want to nlmsg_cancel() here. Instead you just want to nlmsg_end() and
send out the message with a partial MDB_MDBA and continue where you
left off.

^ permalink raw reply

* [PATCH net-next] rtnelink: remove unused parameter from rtnl_create_link().
From: Rami Rosen @ 2012-11-30 11:08 UTC (permalink / raw)
  To: davem; +Cc: netdev, ebiederm, jiri, Rami Rosen

This patch removes an unused parameter (src_net) from rtnl_create_link() 
method and from the method single invocation, in veth. 
This parameter was used in the past when calling 
ops->get_tx_queues(src_net, tb) in rtnl_create_link().  
The get_tx_queues() member of rtnl_link_ops was replaced by two methods, 
get_num_tx_queues() and get_num_rx_queues(), which do not get any 
parameter. This was done in commit d40156aa5ecbd51fed932ed4813df82b56e5ff4d by 
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").

Signed-off-by: Rami Rosen <ramirose@gmail.com>
---
 drivers/net/veth.c      | 2 +-
 include/net/rtnetlink.h | 2 +-
 net/core/rtnetlink.c    | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 24f6b27..95814d9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -340,7 +340,7 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
-	peer = rtnl_create_link(src_net, net, ifname, &veth_link_ops, tbp);
+	peer = rtnl_create_link(net, ifname, &veth_link_ops, tbp);
 	if (IS_ERR(peer)) {
 		put_net(net);
 		return PTR_ERR(peer);
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 6b00c4f..5a15fab 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -125,7 +125,7 @@ extern void	rtnl_af_unregister(struct rtnl_af_ops *ops);
 
 
 extern struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[]);
-extern struct net_device *rtnl_create_link(struct net *src_net, struct net *net,
+extern struct net_device *rtnl_create_link(struct net *net,
 	char *ifname, const struct rtnl_link_ops *ops, struct nlattr *tb[]);
 extern int rtnl_configure_link(struct net_device *dev,
 			       const struct ifinfomsg *ifm);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 575a6ee..1868625 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1642,7 +1642,7 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm)
 }
 EXPORT_SYMBOL(rtnl_configure_link);
 
-struct net_device *rtnl_create_link(struct net *src_net, struct net *net,
+struct net_device *rtnl_create_link(struct net *net,
 	char *ifname, const struct rtnl_link_ops *ops, struct nlattr *tb[])
 {
 	int err;
@@ -1840,7 +1840,7 @@ replay:
 		if (IS_ERR(dest_net))
 			return PTR_ERR(dest_net);
 
-		dev = rtnl_create_link(net, dest_net, ifname, ops, tb);
+		dev = rtnl_create_link(dest_net, ifname, ops, tb);
 		if (IS_ERR(dev)) {
 			err = PTR_ERR(dev);
 			goto out;
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH iproute2 v1] Add mdb command to bridge
From: Thomas Graf @ 2012-11-30 10:54 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bridge, Herbert Xu, Stephen Hemminger, David S. Miller,
	Jesper Dangaard Brouer
In-Reply-To: <1354269514-1863-3-git-send-email-amwang@redhat.com>

On 11/30/12 at 05:58pm, Cong Wang wrote:
> diff --git a/bridge/br_common.h b/bridge/br_common.h
> index 718ecb9..83b2b9f 100644
> --- a/bridge/br_common.h
> +++ b/bridge/br_common.h
> @@ -5,6 +5,7 @@ extern int print_fdb(const struct sockaddr_nl *who,
>  		     struct nlmsghdr *n, void *arg);
>  
>  extern int do_fdb(int argc, char **argv);
> +extern int do_mdb(int argc, char **argv);
>  extern int do_monitor(int argc, char **argv);
>  
>  extern int preferred_family;
> @@ -12,3 +13,37 @@ extern int show_stats;
>  extern int show_detail;
>  extern int timestamp;
>  extern struct rtnl_handle rth;
> +
> +/* Bridge multicast database attributes
> + * [MDBA_MDB] = {
> + *    [MDBA_MCADDR]
> + *    [MDBA_BRPORT_NO]
> + * }
> + * [MDBA_ROUTER] = {
> + *    [MDBA_BRPORT_NO]
> + * }
> + */
> +enum {
> +	MDBA_UNSPEC,
> +	MDBA_MDB,
> +	MDBA_ROUTER,
> +	__MDBA_MAX,
> +};
> +#define MDBA_MAX (__MDBA_MAX - 1)
> +
> +enum {
> +	MDBA_MDB_UNSPEC,
> +	MDBA_MCADDR,
> +	MDBA_BRPORT_NO,
> +	__MDBA_MDB_MAX,
> +};
> +#define MDBA_MDB_MAX (__MDBA_MDB_MAX - 1)
> +
> +struct br_port_msg {
> +        int ifindex;
> +};
> +
> +#ifndef MDBA_RTA
> +#define MDBA_RTA(r) \
> +	((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct br_port_msg))))
> +#endif

You shouldn't need to duplicate the attribute ids and struct br_port_msg
in iproute2. Just put these definitions in a uapi kernel header and
include the header from iproute2.

^ permalink raw reply

* Re: [PATCH 00/17] ATM fixes for pppoatm/br2684
From: Krzysztof Mazur @ 2012-11-30 10:44 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, chas
In-Reply-To: <1354235736-26833-1-git-send-email-dwmw2@infradead.org>

On Fri, Nov 30, 2012 at 12:35:19AM +0000, David Woodhouse wrote:
> This is the result of pulling on the thread started by Krzysztof Mazur's
> original patch 'pppoatm: don't send frames to destroyed vcc'.
> 
> Various problems in the pppoatm and br2684 code are solved, some of which
> were easily triggered and would panic the kernel.
> 
> The patch series can be pulled from
> 	git://git.infradead.org/users/dwmw2/atm.git
> or viewed at 
> 	http://git.infradead.org/users/dwmw2/atm.git
> 
> DaveM, please wait for an ack from Krzysztof and Chas before pulling this.

looks good to me, except "pppoatm: drop frames to not-ready vcc"
that needs updated commit message update after we drop "1/7".

Thanks,

Krzysiek

^ permalink raw reply

* Re: [PATCH 05/17] pppoatm: drop frames to not-ready vcc
From: Krzysztof Mazur @ 2012-11-30 10:27 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, chas, David Woodhouse
In-Reply-To: <1354235736-26833-6-git-send-email-dwmw2@infradead.org>

On Fri, Nov 30, 2012 at 12:35:24AM +0000, David Woodhouse wrote:
> From: Krzysztof Mazur <krzysiek@podlesie.net>
> 
> Patches "atm: detach protocol before closing vcc"
> and "pppoatm: allow assign only on a connected socket" fixed
> common cases where the pppoatm_send() crashes while sending
> frame to not-ready vcc. However there are still some other cases
> where we can send frames to vcc, which is flagged as ATM_VF_CLOSE
> (for instance after vcc_release_async()) or it's opened but not
> ready yet.

We should update the paragraph above, with something like:

The vcc_destroy_socket() closes vcc before the protocol is detached
from vcc by calling vcc->push() with NULL skb. This leaves some time
window, where the protocol may call vcc->send() on closed vcc
and crash.

> 
> Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
> indicate that vcc is not ready. If the vcc is not ready we just
> drop frame. Queueing frames is much more complicated because we
> don't have callbacks that inform us about vcc flags changes.
> 
> Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

I'm sending the updated version. The only difference is the commit
message.

Krzysiek
-- >8 --
Subject: [PATCH 05/17] pppoatm: drop frames to not-ready vcc

The vcc_destroy_socket() closes vcc before the protocol is detached
from vcc by calling vcc->push() with NULL skb. This leaves some time
window, where the protocol may call vcc->send() on closed vcc
and crash.

Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
indicate that vcc is not ready. If the vcc is not ready we just
drop frame. Queueing frames is much more complicated because we
don't have callbacks that inform us about vcc flags changes.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
---
 net/atm/pppoatm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index c4a57bc..aeb726c 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -284,6 +284,13 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
 	bh_lock_sock(sk_atm(vcc));
 	if (sock_owned_by_user(sk_atm(vcc)))
 		goto nospace;
+	if (test_bit(ATM_VF_RELEASED, &vcc->flags) ||
+	    test_bit(ATM_VF_CLOSE, &vcc->flags) ||
+	    !test_bit(ATM_VF_READY, &vcc->flags)) {
+		bh_unlock_sock(sk_atm(vcc));
+		kfree_skb(skb);
+		return DROP_PACKET;
+	}
 
 	switch (pvcc->encaps) {		/* LLC encapsulation needed */
 	case e_llc:
-- 
1.8.0.411.g71a7da8

^ permalink raw reply related

* Re: [PATCH net-next v1 1/2] bridge: export port_no and port_id via IFA_INFO_DATA
From: Thomas Graf @ 2012-11-30 10:18 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, bridge, Herbert Xu, Stephen Hemminger, David S. Miller,
	Jesper Dangaard Brouer
In-Reply-To: <1354269514-1863-1-git-send-email-amwang@redhat.com>

On 11/30/12 at 05:58pm, Cong Wang wrote:
> @@ -168,6 +172,8 @@ static const struct nla_policy ifla_brport_policy[IFLA_BRPORT_MAX + 1] = {
>  	[IFLA_BRPORT_MODE]	= { .type = NLA_U8 },
>  	[IFLA_BRPORT_GUARD]	= { .type = NLA_U8 },
>  	[IFLA_BRPORT_PROTECT]	= { .type = NLA_U8 },
> +	[IFLA_BRPORT_NO]	= { .type = NLA_U16 },
> +	[IFLA_BRPORT_ID]	= { .type = NLA_U16 },
>  };
>  
>  /* Change the state of the port and notify spanning tree */

I missed this in the first round. Not much of a point in adding policy
entries if the attributes are read-only as in this case.

^ permalink raw reply

* [PATCH] batman-adv: use ETH_P_BATMAN
From: Antonio Quartulli @ 2012-11-30 10:14 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, Marek Lindner
In-Reply-To: <1354270450-25935-1-git-send-email-ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>

The ETH_P_BATMAN ethertype is now defined kernel-wide. Use it instead
of the private BATADV_ETH_P_BATMAN define.

Signed-off-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
Signed-off-by: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
---
 net/batman-adv/hard-interface.c | 3 ++-
 net/batman-adv/packet.h         | 2 --
 net/batman-adv/send.c           | 6 ++++--
 net/batman-adv/soft-interface.c | 9 +++++----
 4 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 365ed74..f1d37cd 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -30,6 +30,7 @@
 #include "bridge_loop_avoidance.h"
 
 #include <linux/if_arp.h>
+#include <linux/if_ether.h>
 
 void batadv_hardif_free_rcu(struct rcu_head *rcu)
 {
@@ -311,7 +312,7 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface,
 {
 	struct batadv_priv *bat_priv;
 	struct net_device *soft_iface;
-	__be16 ethertype = __constant_htons(BATADV_ETH_P_BATMAN);
+	__be16 ethertype = __constant_htons(ETH_P_BATMAN);
 	int ret;
 
 	if (hard_iface->if_status != BATADV_IF_NOT_IN_USE)
diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index 1c5454d..cb6405bf 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -20,8 +20,6 @@
 #ifndef _NET_BATMAN_ADV_PACKET_H_
 #define _NET_BATMAN_ADV_PACKET_H_
 
-#define BATADV_ETH_P_BATMAN  0x4305 /* unofficial/not registered Ethertype */
-
 enum batadv_packettype {
 	BATADV_IV_OGM		= 0x01,
 	BATADV_ICMP		= 0x02,
diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c
index c7f7023..4425af9 100644
--- a/net/batman-adv/send.c
+++ b/net/batman-adv/send.c
@@ -28,6 +28,8 @@
 #include "gateway_common.h"
 #include "originator.h"
 
+#include <linux/if_ether.h>
+
 static void batadv_send_outstanding_bcast_packet(struct work_struct *work);
 
 /* send out an already prepared packet to the given address via the
@@ -60,11 +62,11 @@ int batadv_send_skb_packet(struct sk_buff *skb,
 	ethhdr = (struct ethhdr *)skb_mac_header(skb);
 	memcpy(ethhdr->h_source, hard_iface->net_dev->dev_addr, ETH_ALEN);
 	memcpy(ethhdr->h_dest, dst_addr, ETH_ALEN);
-	ethhdr->h_proto = __constant_htons(BATADV_ETH_P_BATMAN);
+	ethhdr->h_proto = __constant_htons(ETH_P_BATMAN);
 
 	skb_set_network_header(skb, ETH_HLEN);
 	skb->priority = TC_PRIO_CONTROL;
-	skb->protocol = __constant_htons(BATADV_ETH_P_BATMAN);
+	skb->protocol = __constant_htons(ETH_P_BATMAN);
 
 	skb->dev = hard_iface->net_dev;
 
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 54800c7..6b548fd 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -34,6 +34,7 @@
 #include <linux/ethtool.h>
 #include <linux/etherdevice.h>
 #include <linux/if_vlan.h>
+#include <linux/if_ether.h>
 #include "unicast.h"
 #include "bridge_loop_avoidance.h"
 
@@ -146,7 +147,7 @@ static int batadv_interface_tx(struct sk_buff *skb,
 	struct batadv_hard_iface *primary_if = NULL;
 	struct batadv_bcast_packet *bcast_packet;
 	struct vlan_ethhdr *vhdr;
-	__be16 ethertype = __constant_htons(BATADV_ETH_P_BATMAN);
+	__be16 ethertype = __constant_htons(ETH_P_BATMAN);
 	static const uint8_t stp_addr[ETH_ALEN] = {0x01, 0x80, 0xC2, 0x00,
 						   0x00, 0x00};
 	static const uint8_t ectp_addr[ETH_ALEN] = {0xCF, 0x00, 0x00, 0x00,
@@ -172,7 +173,7 @@ static int batadv_interface_tx(struct sk_buff *skb,
 			break;
 
 		/* fall through */
-	case BATADV_ETH_P_BATMAN:
+	case ETH_P_BATMAN:
 		goto dropped;
 	}
 
@@ -302,7 +303,7 @@ void batadv_interface_rx(struct net_device *soft_iface,
 	struct vlan_ethhdr *vhdr;
 	struct batadv_header *batadv_header = (struct batadv_header *)skb->data;
 	short vid __maybe_unused = -1;
-	__be16 ethertype = __constant_htons(BATADV_ETH_P_BATMAN);
+	__be16 ethertype = __constant_htons(ETH_P_BATMAN);
 	bool is_bcast;
 
 	is_bcast = (batadv_header->packet_type == BATADV_BCAST);
@@ -325,7 +326,7 @@ void batadv_interface_rx(struct net_device *soft_iface,
 			break;
 
 		/* fall through */
-	case BATADV_ETH_P_BATMAN:
+	case ETH_P_BATMAN:
 		goto dropped;
 	}
 
-- 
1.8.0

^ permalink raw reply related

* pull request: batman-adv 2012-11-30
From: Antonio Quartulli @ 2012-11-30 10:14 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r

Here is a lonely patch intended for net-next/linux-3.8.
It it simply adapting the batman-adv code to use the new ETH_P_BATMAN define
recently introduced in if_ether.h.

Let me know if there is any problem!

Thank you,
	Antonio


The following changes since commit bb728820fe7c42fdb838ab2745fb5fe6b18b5ffa:

  core: make GRO methods static. (2012-11-29 13:18:32 -0500)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batman-adv-for-davem

for you to fetch changes up to af5d4f7737963f2112f148f97c5820425f050650:

  batman-adv: use ETH_P_BATMAN (2012-11-30 10:50:22 +0100)

----------------------------------------------------------------
Included changes:
- Use the new ETH_P_BATMAN define instead of the private BATADV_ETH_P_BATMAN

----------------------------------------------------------------
Antonio Quartulli (1):
      batman-adv: use ETH_P_BATMAN

 net/batman-adv/hard-interface.c | 3 ++-
 net/batman-adv/packet.h         | 2 --
 net/batman-adv/send.c           | 6 ++++--
 net/batman-adv/soft-interface.c | 9 +++++----
 4 files changed, 11 insertions(+), 9 deletions(-)

^ permalink raw reply

* Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
From: Jesper Dangaard Brouer @ 2012-11-30 10:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, fw, netdev, pablo, tgraf, amwang, kaber, paulmck,
	herbert
In-Reply-To: <1354230100.3299.40.camel@edumazet-glaptop>

On Thu, 2012-11-29 at 15:01 -0800, Eric Dumazet wrote:
> On Thu, 2012-11-29 at 23:17 +0100, Jesper Dangaard Brouer wrote:
> 
> > For example lets give a threshold of 2000 MBytes:
> > 
> > [root@dragon ~]# sysctl -w net/ipv4/ipfrag_high_thresh=$(((1024**2*2000)))
> > net.ipv4.ipfrag_high_thresh = 2097152000
> > 
> > [root@dragon ~]# sysctl -w net/ipv4/ipfrag_low_thresh=$(((1024**2*2000)-655350))
> > net.ipv4.ipfrag_low_thresh = 2096496650
> > 
> > 4x10 Netperf adjusted output:
> >  Socket  Message  Elapsed      Messages
> >  Size    Size     Time         Okay Errors   Throughput
> >  bytes   bytes    secs            #      #   10^6bits/sec
> > 
> >  229376   65507   20.00      298685      0    7826.35
> >  212992           20.00          27              0.71
> > 
> >  229376   65507   20.00      366668      0    9607.71
> >  212992           20.00          13              0.34
> > 
> >  229376   65507   20.00      254790      0    6676.20
> >  212992           20.00          14              0.37
> > 
> >  229376   65507   20.00      309293      0    8104.33
> >  212992           20.00          15              0.39
> > 
> > Can we agree that the current evictor strategy is broken?
> 
> Not really, you drop packets because of another limit.

Then tell me which limit?
And notice the result is the same for 200 MBytes threshold.

As I wrote *just* above the section you quoted:

On Thu, 2012-11-29 at 23:17 +0100, Jesper Dangaard Brouer wrote:
[...] Thus, we must drop packets, or else the NIC will do it for
> us... for fragments we need do this "dropping" more intelligent. 

So, I think it is the NIC dropping packets, in this case... what do you
claim?



I still claim the the current evictor strategy is broken!

We need to drop fragments more intelligently in software. As DaveM
correctly states, the code/algorithm needs some "probability
of fulfillment" taken into account.   Which is actually what my evictor
code implements (I don't claim its perfect, as it currently does have
fairness/fair-queue issues, I have a plan for fixing it, but lets not
clutter up this answer).


So, let me instead show, with tests, that the evictor strategy is
broken, while keeping the original default thresh settings:

# grep . /proc/sys/net/ipv4/ipfrag_*_thresh
/proc/sys/net/ipv4/ipfrag_high_thresh:262144
/proc/sys/net/ipv4/ipfrag_low_thresh:196608

Test purpose, I will on a single 10G link demonstrate, that starting
several "N" netperf UDP fragmentation flows, will hurt performance, and
then claim this is caused by the bad evictor strategy.

Test setup:
 - Disable Ethernet flow control
 - netperf packet size 65507
 - Run netserver on one NUMA node
 - Start netperf clients against a NIC on the other NUMA node
 - (The NUMA imbalance helps the effect occur at lower N) 

Result: N=1  8040 Mbit/s
Result: N=2  9584 Mbit/s (4739+4845)
Result: N=3  4055 Mbit/s (1436+1371+1248)
Result: N=4  2247 Mbit/s (1538+29+54+626)
Result: N=5   879 Mbit/s (78+152+226+125+298)
Result: N=6   293 Mbit/s (85+55+32+57+46+18)
Result: N=7   354 Mbit/s (70+47+33+80+20+72+32)

Can we, now, agree that the current evictor strategy is broken?!?


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings
From: Wang YanQing @ 2012-11-30 10:00 UTC (permalink / raw)
  To: Francois Romieu, nic_swsd, netdev, linux-kernel, Hayes Wang
In-Reply-To: <20121130090429.GA14439@udknight>

On Fri, Nov 30, 2012 at 05:04:29PM +0800, Wang YanQing wrote:
> On Fri, Nov 30, 2012 at 07:35:00AM +0100, Francois Romieu wrote:
> > Which kernel version is it ?
> I have done the test and debug  with mainline 
> e23739b4ade80a3a7f87198f008f6c44a7cbc9fd, v3.7-rc7-51-ge23739b
More exactly, I find the problem in 3.0 stable tree, and fix it
for 3.0 stable tree first, then re test and debug for mainline,
e23739b4ade80a3a7f87198f008f6c44a7cbc9fd, v3.7-rc7-51-ge23739b.

I found the mainline's r8169 works the same as realtek's driver,
the first time don't work, but it works after reboot, the reason
is the pci driver's shutdown don't call rtl_rar_set in 3.0 stable
tree but the mainline does.

Thanks.

^ permalink raw reply

* [PATCH net-next v1 1/2] bridge: export port_no and port_id via IFA_INFO_DATA
From: Cong Wang @ 2012-11-30  9:58 UTC (permalink / raw)
  To: netdev
  Cc: bridge, Cong Wang, Herbert Xu, Stephen Hemminger, David S. Miller,
	Thomas Graf, Jesper Dangaard Brouer

port_no will be used to get ifindex of a port in user-space,
export it togather with port_id.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/uapi/linux/if_link.h |    2 ++
 net/bridge/br_netlink.c      |    8 +++++++-
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index bb58aeb..9cd91a9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -218,6 +218,8 @@ enum {
 	IFLA_BRPORT_MODE,	/* mode (hairpin)          */
 	IFLA_BRPORT_GUARD,	/* bpdu guard              */
 	IFLA_BRPORT_PROTECT,	/* root port protection    */
+	IFLA_BRPORT_NO,		/* port no                 */
+	IFLA_BRPORT_ID,		/* port id                 */
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 65429b9..7b7414e 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -28,6 +28,8 @@ static inline size_t br_port_info_size(void)
 		+ nla_total_size(1)	/* IFLA_BRPORT_MODE */
 		+ nla_total_size(1)	/* IFLA_BRPORT_GUARD */
 		+ nla_total_size(1)	/* IFLA_BRPORT_PROTECT */
+		+ nla_total_size(2)	/* IFLA_BRPORT_NO */
+		+ nla_total_size(2)	/* IFLA_BRPORT_ID */
 		+ 0;
 }
 
@@ -53,7 +55,9 @@ static int br_port_fill_attrs(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_BRPORT_COST, p->path_cost) ||
 	    nla_put_u8(skb, IFLA_BRPORT_MODE, mode) ||
 	    nla_put_u8(skb, IFLA_BRPORT_GUARD, !!(p->flags & BR_BPDU_GUARD)) ||
-	    nla_put_u8(skb, IFLA_BRPORT_PROTECT, !!(p->flags & BR_ROOT_BLOCK)))
+	    nla_put_u8(skb, IFLA_BRPORT_PROTECT, !!(p->flags & BR_ROOT_BLOCK)) ||
+	    nla_put_u16(skb, IFLA_BRPORT_NO, p->port_no) ||
+	    nla_put_u16(skb, IFLA_BRPORT_ID, p->port_id))
 		return -EMSGSIZE;
 
 	return 0;
@@ -168,6 +172,8 @@ static const struct nla_policy ifla_brport_policy[IFLA_BRPORT_MAX + 1] = {
 	[IFLA_BRPORT_MODE]	= { .type = NLA_U8 },
 	[IFLA_BRPORT_GUARD]	= { .type = NLA_U8 },
 	[IFLA_BRPORT_PROTECT]	= { .type = NLA_U8 },
+	[IFLA_BRPORT_NO]	= { .type = NLA_U16 },
+	[IFLA_BRPORT_ID]	= { .type = NLA_U16 },
 };
 
 /* Change the state of the port and notify spanning tree */

^ permalink raw reply related

* [PATCH iproute2 v1] Add mdb command to bridge
From: Cong Wang @ 2012-11-30  9:58 UTC (permalink / raw)
  To: netdev
  Cc: Cong Wang, bridge, Herbert Xu, Jesper Dangaard Brouer,
	Thomas Graf, Stephen Hemminger, David S. Miller
In-Reply-To: <1354269514-1863-1-git-send-email-amwang@redhat.com>

Sample output:

	# ./bridge/bridge mdb
	bridge dev br0
	multicast database:
	port no 1, multicast addr 224.8.8.9
	port no 2, multicast addr 224.8.8.8
	router ports: 2

TODO: display device name too

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 bridge/Makefile           |    2 +-
 bridge/br_common.h        |   35 ++++++++++
 bridge/bridge.c           |    1 +
 bridge/mdb.c              |  157 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/rtnetlink.h |    3 +
 5 files changed, 197 insertions(+), 1 deletions(-)

diff --git a/bridge/Makefile b/bridge/Makefile
index 9a6743e..67aceb4 100644
--- a/bridge/Makefile
+++ b/bridge/Makefile
@@ -1,4 +1,4 @@
-BROBJ = bridge.o fdb.o monitor.o link.o
+BROBJ = bridge.o fdb.o monitor.o link.o mdb.o
 
 include ../Config
 
diff --git a/bridge/br_common.h b/bridge/br_common.h
index 718ecb9..83b2b9f 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -5,6 +5,7 @@ extern int print_fdb(const struct sockaddr_nl *who,
 		     struct nlmsghdr *n, void *arg);
 
 extern int do_fdb(int argc, char **argv);
+extern int do_mdb(int argc, char **argv);
 extern int do_monitor(int argc, char **argv);
 
 extern int preferred_family;
@@ -12,3 +13,37 @@ extern int show_stats;
 extern int show_detail;
 extern int timestamp;
 extern struct rtnl_handle rth;
+
+/* Bridge multicast database attributes
+ * [MDBA_MDB] = {
+ *    [MDBA_MCADDR]
+ *    [MDBA_BRPORT_NO]
+ * }
+ * [MDBA_ROUTER] = {
+ *    [MDBA_BRPORT_NO]
+ * }
+ */
+enum {
+	MDBA_UNSPEC,
+	MDBA_MDB,
+	MDBA_ROUTER,
+	__MDBA_MAX,
+};
+#define MDBA_MAX (__MDBA_MAX - 1)
+
+enum {
+	MDBA_MDB_UNSPEC,
+	MDBA_MCADDR,
+	MDBA_BRPORT_NO,
+	__MDBA_MDB_MAX,
+};
+#define MDBA_MDB_MAX (__MDBA_MDB_MAX - 1)
+
+struct br_port_msg {
+        int ifindex;
+};
+
+#ifndef MDBA_RTA
+#define MDBA_RTA(r) \
+	((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct br_port_msg))))
+#endif
diff --git a/bridge/bridge.c b/bridge/bridge.c
index e2c33b0..1fcd365 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -43,6 +43,7 @@ static const struct cmd {
 	int (*func)(int argc, char **argv);
 } cmds[] = {
 	{ "fdb", 	do_fdb },
+	{ "mdb", 	do_mdb },
 	{ "monitor",	do_monitor },
 	{ "help",	do_help },
 	{ 0 }
diff --git a/bridge/mdb.c b/bridge/mdb.c
new file mode 100644
index 0000000..f2f4ff2
--- /dev/null
+++ b/bridge/mdb.c
@@ -0,0 +1,157 @@
+/*
+ * Get mdb table with netlink
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <time.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <linux/if_bridge.h>
+#include <linux/if_ether.h>
+#include <linux/neighbour.h>
+#include <string.h>
+#include <arpa/inet.h>
+
+#include "libnetlink.h"
+#include "br_common.h"
+#include "rt_names.h"
+#include "utils.h"
+
+int filter_index;
+
+static void usage(void)
+{
+	fprintf(stderr, "       bridge mdb {show} [ dev DEV ]\n");
+	exit(-1);
+}
+
+static void br_print_router_ports(FILE *f, struct rtattr *attr)
+{
+	uint16_t *port_no;
+	struct rtattr *i;
+	int rem;
+
+	fprintf(f, "router ports: ");
+
+	rem = RTA_PAYLOAD(attr);
+	for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
+		port_no = RTA_DATA(i);
+		fprintf(f, "%u ", *port_no);
+	}
+	fprintf(f, "\n");
+}
+
+static void br_print_mdb_entry(FILE *f, struct rtattr *attr)
+{
+	uint32_t addr;
+	uint16_t port_no;
+	struct rtattr *i;
+	int rem;
+	SPRINT_BUF(abuf);
+
+	fprintf(f, "multicast database:\n");
+
+	rem = RTA_PAYLOAD(attr);
+	for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
+		port_no = rta_getattr_u16(i);
+		i = RTA_NEXT(i, rem);
+		addr = rta_getattr_u32(i);
+		fprintf(f, "port no %u, multicast addr %s\n", port_no, inet_ntop(AF_INET, &addr, abuf, sizeof(abuf)));
+	}
+}
+
+int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+{
+	FILE *fp = arg;
+	struct br_port_msg *r = NLMSG_DATA(n);
+	int len = n->nlmsg_len;
+	struct rtattr * tb[MDBA_MAX+1];
+
+	if (n->nlmsg_type != RTM_GETMDB) {
+		fprintf(stderr, "Not RTM_MDB: %08x %08x %08x\n",
+			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
+
+		return 0;
+	}
+
+	len -= NLMSG_LENGTH(sizeof(*r));
+	if (len < 0) {
+		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
+		return -1;
+	}
+
+	if (filter_index && filter_index != r->ifindex)
+		return 0;
+
+	if (!filter_index && r->ifindex)
+		fprintf(fp, "bridge dev %s\n", ll_index_to_name(r->ifindex));
+
+	parse_rtattr(tb, MDBA_MAX, MDBA_RTA(r), n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
+
+	if (tb[MDBA_MDB])
+		br_print_mdb_entry(fp, tb[MDBA_MDB]);
+
+	if (tb[MDBA_ROUTER])
+		br_print_router_ports(fp, tb[MDBA_ROUTER]);
+
+	return 0;
+}
+
+static int mdb_show(int argc, char **argv)
+{
+	char *filter_dev = NULL;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "dev") == 0) {
+			NEXT_ARG();
+			if (filter_dev)
+				duparg("dev", *argv);
+			filter_dev = *argv;
+		}
+		argc--; argv++;
+	}
+
+	if (filter_dev) {
+		filter_index = if_nametoindex(filter_dev);
+		if (filter_index == 0) {
+			fprintf(stderr, "Cannot find device \"%s\"\n",
+				filter_dev);
+			return -1;
+		}
+	}
+
+	if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETMDB) < 0) {
+		perror("Cannot send dump request");
+		exit(1);
+	}
+
+	if (rtnl_dump_filter(&rth, print_mdb, stdout) < 0) {
+		fprintf(stderr, "Dump terminated\n");
+		exit(1);
+	}
+
+	return 0;
+}
+
+int do_mdb(int argc, char **argv)
+{
+	ll_init_map(&rth);
+
+	if (argc > 0) {
+		if (matches(*argv, "show") == 0 ||
+		    matches(*argv, "lst") == 0 ||
+		    matches(*argv, "list") == 0)
+			return mdb_show(argc-1, argv+1);
+		if (matches(*argv, "help") == 0)
+			usage();
+	} else
+		return mdb_show(0, NULL);
+
+	fprintf(stderr, "Command \"%s\" is unknown, try \"bridge mdb help\".\n", *argv);
+	exit(-1);
+}
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 0e3e0c1..f400b5e 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -120,6 +120,9 @@ enum {
 	RTM_SETDCB,
 #define RTM_SETDCB RTM_SETDCB
 
+	RTM_GETMDB = 86,
+#define RTM_GETMDB RTM_GETMDB
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };

^ permalink raw reply related

* [PATCH net-next v1 2/2] bridge: export multicast database via netlink
From: Cong Wang @ 2012-11-30  9:58 UTC (permalink / raw)
  To: netdev
  Cc: Cong Wang, bridge, Herbert Xu, Jesper Dangaard Brouer,
	Thomas Graf, Stephen Hemminger, David S. Miller
In-Reply-To: <1354269514-1863-1-git-send-email-amwang@redhat.com>

This patch exports bridge multicast database via netlink
message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
    
---
 include/uapi/linux/if_bridge.h |   26 ++++++
 include/uapi/linux/rtnetlink.h |    3 +
 net/bridge/Makefile            |    2 +-
 net/bridge/br_mdb.c            |  166 ++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_multicast.c      |    2 +
 net/bridge/br_private.h        |    2 +
 6 files changed, 200 insertions(+), 1 deletions(-)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index b388579..c30c236 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -116,4 +116,30 @@ enum {
 	__IFLA_BRIDGE_MAX,
 };
 #define IFLA_BRIDGE_MAX (__IFLA_BRIDGE_MAX - 1)
+
+/* Bridge multicast database attributes
+ * [MDBA_MDB] = {
+ *     [MDBA_MCADDR]
+ *     [MDBA_BRPORT_NO]
+ * }
+ * [MDBA_ROUTER] = {
+ *    [MDBA_BRPORT_NO]
+ * }
+ */
+enum {
+	MDBA_UNSPEC,
+	MDBA_MDB,
+	MDBA_ROUTER,
+	__MDBA_MAX,
+};
+#define MDBA_MAX (__MDBA_MAX - 1)
+
+enum {
+	MDBA_MDB_UNSPEC,
+	MDBA_MCADDR,
+	MDBA_BRPORT_NO,
+	__MDBA_MDB_MAX,
+};
+#define MDBA_MDB_MAX (__MDBA_MDB_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_BRIDGE_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 3dee071..0df623f 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -125,6 +125,9 @@ enum {
 	RTM_GETNETCONF = 82,
 #define RTM_GETNETCONF RTM_GETNETCONF
 
+	RTM_GETMDB = 86,
+#define RTM_GETMDB RTM_GETMDB
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/net/bridge/Makefile b/net/bridge/Makefile
index d0359ea..e859098 100644
--- a/net/bridge/Makefile
+++ b/net/bridge/Makefile
@@ -12,6 +12,6 @@ bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o
 
 bridge-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o
 
-bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o
+bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o br_mdb.o
 
 obj-$(CONFIG_BRIDGE_NF_EBTABLES) += netfilter/
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
new file mode 100644
index 0000000..3751f7d
--- /dev/null
+++ b/net/bridge/br_mdb.c
@@ -0,0 +1,166 @@
+#include <linux/err.h>
+#include <linux/if_ether.h>
+#include <linux/igmp.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/rculist.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/ipv6.h>
+#include <net/mld.h>
+#include <net/addrconf.h>
+#include <net/ip6_checksum.h>
+#endif
+
+#include "br_private.h"
+
+struct br_port_msg {
+	int ifindex;
+};
+
+static int br_rports_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
+			       u32 seq, struct net_device *dev)
+{
+	struct net_bridge *br = netdev_priv(dev);
+	struct net_bridge_port *p;
+	struct hlist_node *n;
+	struct nlattr *nest;
+
+	if (!br->multicast_router || hlist_empty(&br->router_list)) {
+		printk(KERN_INFO "no router on bridge\n");
+		return 0;
+	}
+
+	nest = nla_nest_start(skb, MDBA_ROUTER);
+	if (nest == NULL)
+		return -EMSGSIZE;
+
+	hlist_for_each_entry_rcu(p, n, &br->router_list, rlist) {
+		if (p && nla_put_u16(skb, MDBA_BRPORT_NO, p->port_no))
+			goto fail;
+	}
+
+	nla_nest_end(skb, nest);
+	return 0;
+fail:
+	nla_nest_cancel(skb, nest);
+	return -EMSGSIZE;
+}
+
+static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
+			    u32 seq, struct net_device *dev)
+{
+	struct net_bridge *br = netdev_priv(dev);
+	struct net_bridge_mdb_htable *mdb;
+	struct nlattr *nest;
+	unsigned int i;
+	int s_idx;
+
+	if (br->multicast_disabled) {
+		printk(KERN_INFO "multicast is disabled on bridge\n");
+		return 0;
+	}
+
+	mdb = rcu_dereference(br->mdb);
+	if (!mdb) {
+		printk(KERN_INFO "no mdb on bridge\n");
+		return 0;
+	}
+
+	cb->seq = mdb->seq;
+	s_idx = cb->args[1];
+	if (s_idx >= mdb->max)
+		return 0;
+
+	nest = nla_nest_start(skb, MDBA_MDB);
+	if (nest == NULL)
+		return -EMSGSIZE;
+
+	for (i = s_idx; i < mdb->max; i++) {
+		struct hlist_node *h;
+		struct net_bridge_mdb_entry *mp;
+		struct net_bridge_port_group *p, **pp;
+		struct net_bridge_port *port;
+
+		hlist_for_each_entry_rcu(mp, h, &mdb->mhash[i], hlist[mdb->ver]) {
+			for (pp = &mp->ports;
+			     (p = rcu_dereference(*pp)) != NULL;
+			      pp = &p->next) {
+				port = p->port;
+				if (port) {
+					printk(KERN_INFO "port %u, mcaddr: %pI4\n", port->port_no, &mp->addr.u.ip4);
+					if (nla_put_u16(skb, MDBA_BRPORT_NO, port->port_no) ||
+					    nla_put_be32(skb, MDBA_MCADDR, p->addr.u.ip4))
+						goto fail;
+				}
+			}
+		}
+	}
+
+	cb->args[1] = i;
+	nla_nest_end(skb, nest);
+	return 0;
+fail:
+	cb->args[1] = i;
+	nla_nest_cancel(skb, nest);
+	return -EMSGSIZE;
+}
+
+static int br_mdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net_device *dev;
+	struct net *net = sock_net(skb->sk);
+	struct nlmsghdr *nlh;
+	u32 seq = cb->nlh->nlmsg_seq;
+	int idx = 0, s_idx;
+
+	s_idx = cb->args[0];
+
+	rcu_read_lock();
+
+	for_each_netdev_rcu(net, dev) {
+		if (dev->priv_flags & IFF_EBRIDGE) {
+			struct br_port_msg *bpm;
+
+			if (idx < s_idx)
+				goto cont;
+
+			nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+					seq, RTM_GETMDB,
+					sizeof(*bpm), NLM_F_MULTI);
+			if (nlh == NULL)
+				break;
+
+			bpm = nlmsg_data(nlh);
+			bpm->ifindex = dev->ifindex;
+			if (br_mdb_fill_info(skb, cb, seq, dev) < 0) {
+				printk(KERN_INFO "br_mdb_fill_info failed\n");
+				goto fail;
+			}
+			if (br_rports_fill_info(skb, cb, seq, dev) < 0) {
+				printk(KERN_INFO "br_rports_fill_info failed\n");
+				goto fail;
+			}
+
+			nlmsg_end(skb, nlh);
+cont:
+			idx++;
+		}
+	}
+
+	rcu_read_unlock();
+	cb->args[0] = idx;
+	return skb->len;
+
+fail:
+	rcu_read_unlock();
+	nlmsg_cancel(skb, nlh);
+	return skb->len;
+}
+
+void br_mdb_init(void)
+{
+	rtnl_register(PF_BRIDGE, RTM_GETMDB, NULL, br_mdb_dump, NULL);
+}
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2417434..d53e4f4 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -322,6 +322,7 @@ static int br_mdb_rehash(struct net_bridge_mdb_htable __rcu **mdbp, int max,
 
 	mdb->size = old ? old->size : 0;
 	mdb->ver = old ? old->ver ^ 1 : 0;
+	mdb->seq = old ? (old->seq + 1): 0;
 
 	if (!old || elasticity)
 		get_random_bytes(&mdb->secret, sizeof(mdb->secret));
@@ -1584,6 +1585,7 @@ void br_multicast_init(struct net_bridge *br)
 		    br_multicast_querier_expired, (unsigned long)br);
 	setup_timer(&br->multicast_query_timer, br_multicast_query_expired,
 		    (unsigned long)br);
+	br_mdb_init();
 }
 
 void br_multicast_open(struct net_bridge *br)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index eb9cd42..6484069 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -105,6 +105,7 @@ struct net_bridge_mdb_htable
 	u32				max;
 	u32				secret;
 	u32				ver;
+	u32				seq;
 };
 
 struct net_bridge_port
@@ -432,6 +433,7 @@ extern int br_multicast_set_port_router(struct net_bridge_port *p,
 extern int br_multicast_toggle(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_querier(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
+extern void br_mdb_init(void);
 
 static inline bool br_multicast_is_router(struct net_bridge *br)
 {

^ permalink raw reply related

* Re: [PATCH v2 3/3] pppoatm: protect against freeing of vcc
From: Krzysztof Mazur @ 2012-11-30  9:53 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Chas Williams (CONTRACTOR), David Laight, davem, netdev,
	linux-kernel, nathan
In-Reply-To: <1354263922.21562.270.camel@shinybook.infradead.org>

On Fri, Nov 30, 2012 at 08:25:22AM +0000, David Woodhouse wrote:
> On Fri, 2012-11-30 at 01:57 +0000, David Woodhouse wrote:
> > I think it's actually fixed for pppoatm by the bh_lock_sock() and the
> > sock_owned_by_user() check. As soon as vcc_release() calls lock_sock(),
> > pppoatm stops accepting packets.
> > 
> > It should be simple enough to do the same in br2684.
> 
> Um... but now I come to look at it... Krzysztof, doesn't your 'pppoatm:
> take ATM socket lock in pppoatm_send()' patch actually *break* the case
> of sending via vcc_sendmsg()?

no, in case of

	pppoatm_send()
			vcc_sendmsg()

the vcc_sendmsg() will just wait for releasing sk->sk_lock.slock.

When the vcc_sendmsg() gets lock first

			vcc_sendmsg()
	pppoatm_send()

The pppoatm_send() might spin for a while for sk->sk_lock.slock, but
after lock_sock() the vcc_sendmsg() releases that lock and
pppoatm_send() will acquire it and notice that locked is locked
(sock_owned_by_user() returns true) and will just block pppoatm,
and will be woken up in release_sock() (fix was fixed by your 10/17
patch).

> 
> Why did you include the sock_owned_by_user() check in there and not just
> use bh_lock_sock()?

because bh_lock_sock() will succeeds even with concurrent vcc_sendmsg()
and will have some races in that case.

> 
> With the sock_owned_by_user() check, it'll *always* drop packets
> submitted through vcc_sendmsg(), won't it?

No, sock_owned_by_user() is just in pppoatm_send() and instead of
dropping packets we block pppd.

> 
> Admittedly, for PPPoATM and BR2684 we never do want to have packets
> submitted directly from userspace that way; they should all come via the
> PPP channel or the netdev respectively. So we might want to keep the
> sock_owned_by_user() check because it fixes the close race, and
> explicitly document it.

It fixes also races with vcc_sendmsg(). If we really don't wont
vcc_sendmsg() with pppoatm and br2684 we must do some protection
than vcc_sendmsg() will fail instead of racing with pppoatm_send()
and crashing with some drivers that does not support concurent
->send().

> 
> But it doesn't necessarily work for other protocols, so we may need a
> better solution for the general case. Perhaps drop the
> sock_owned_by_user() check, and put bh_lock_sock() around the beginning
> of vcc_destroy_socket() where it clears ATM_VF_READY? That'll ensure
> that no ->push() is *currently* operating on a skb having seen that the
> VCC is still open.
> 
> Or maybe we just make the *devices* check the ATM_VF_CLOSE flag and
> refuse to send the skb? Put the entire thing into their domain. Although
> that may involve extra locking in the driver to synchronise send() and
> close() sufficiently.

We need some additional synchronizization with pppoatm_send(), now
we use:

	tasklet_kill(&pvcc->wakeup_tasklet);
	ppp_unregister_channel(&pvcc->chan);

In ppp_unregister_channel() we will synchronize with the function
calling pppoatm_send() using "downl" lock.

And this must be done in pppoatm.

> 
> I'm still reluctant to swap the order of the device/protocol close in
> vcc_destroy_socket(). I think that'll just swap one set of problems
> which is now fairly well-understood and mostly solved, for another set.
> In particular, I think the device needs to see the close first, because
> *it* can actually abort or flush any pending TX and RX (including
> synchronising with its tasklet as solos-pci does, etc.). Only then does
> the protocol tear its data structures down. But I suppose the new set of
> problems could be found and overcome, if Chas wants to propose an
> alternative patch set...
> 

I think that the current order is good, now we have:

	1. stop_sending_fames	to protocol
		now TX is shut down
		(currently done by
			set_bit(ATM_VF_CLOSE, &vcc->flags);
			clear_bit(ATM_VF_READY, &vcc->flags);
		)
	2. close_device		to device
		now RX is shut down
	3. device_was_closed	to protocol
		ugly push(NULL), but we can add some other callback.

we also can do:

	1. disable RX		to device
		now RX is shut down
	2. detach	to protocol
		now TX is shut down
		(now protocol can fully detach because RX is disabled)
	3. close_device		to device
		(device is not used anymore)

Krzysiek

^ permalink raw reply

* [PATCH v2]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings
From: Wang YanQing @ 2012-11-30  9:38 UTC (permalink / raw)
  To: nic_swsd; +Cc: romieu, netdev, linux-kernel

I get a board with 8168e-vl(10ec:8168 with RTL_GIGA_MAC_VER_34),
everything looks well first, I can use ifconfig to set ip, netmask,
etc. And the rx/tx statistics show by ifconfig looks good when I
ping another host or ping it from another host. But it don't work,
I can't get ICMP REPLAY from both sides, although the RX/TX statistics
seem good.

After add some debug code, I found this NIC only accept ethernet
broadcast package, it can't filter out the package send to its
MAC address, but it works good for sending.So ifconfig show the
RX/TX status means it can receive ARP package.(It don't know its
MAC address, so below)

I have try the driver provided by realtek's website, it have the
same problem at the first time. BUT IT WORK AFTER I REBOOT with
CRTL-ALT-DEL, the reason is that realtek's driver call rtl8168_rar_set
in the .shutdown function register with pci_register_driver. Yes,
the really reason to make it work is rtl8689_rar_set, this function
set extended GigaMAC registers, so after reboot without lost the power,
NIC keep the status before reboot.

I haven't see any code to set GigaMAC registers in kernel when boot,
so I guess BIOS or NIC's circuit make it, but of course one miss
the extended GigaMAC registers  in this problem. The probe code can
get MAC address right, so MAC{0,4} must had been setted, but some
guys forget the extended GigaMAC registers.

This patch fix it.

[ I don't known whether others' realtek's NIC with extended GigaMAC
reigisters have the same problem, I meet it in 8168e-vl with
RTL_GIGA_MAC_VER_34, so I make this patch just for it.]

Changes:
V1-V2:
I follow Francois Romieu 's below opinion to make this patch oneline:

I'd rather see the GigaMAC registers written through a call to
rtl_rar_set when the mac address is read in rtl_init_one instead of
duplicating most of rtl_rar_set in a quite different place.

Signed-off-by: Wang YanQing <udknight@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 927aa33..cb1dce4 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6903,6 +6903,7 @@ rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->dev_addr[i] = RTL_R8(MAC0 + i);
 	memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
 
+	rtl_rar_set(tp, dev->dev_addr);
 	SET_ETHTOOL_OPS(dev, &rtl8169_ethtool_ops);
 	dev->watchdog_timeo = RTL8169_TX_TIMEOUT;
 
-- 
1.7.11.1.116.g8228a23

^ permalink raw reply related

* [PATCH] sctp: fix CONFIG_SCTP_DBG_MSG=y null pointer dereference in sctp_v6_get_dst()
From: Tommi Rantala @ 2012-11-30  9:17 UTC (permalink / raw)
  To: linux-sctp, netdev
  Cc: Neil Horman, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
	Dave Jones, Tommi Rantala

Trinity (the syscall fuzzer) triggered the following BUG, reproducible
only when the kernel is configured with CONFIG_SCTP_DBG_MSG=y.

When CONFIG_SCTP_DBG_MSG is not set, the null pointer is never
dereferenced.

---[ end trace a4de0bfcb38a3642 ]---
BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
IP: [<ffffffff8136796e>] ip6_string+0x1e/0xa0
PGD 4eead067 PUD 4e472067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU 3
Pid: 21324, comm: trinity-child11 Tainted: G        W    3.7.0-rc7+ #61 ASUSTeK Computer INC. EB1012/EB1012
RIP: 0010:[<ffffffff8136796e>]  [<ffffffff8136796e>] ip6_string+0x1e/0xa0
RSP: 0018:ffff88004e4637a0  EFLAGS: 00010046
RAX: ffff88004e4637da RBX: ffff88004e4637da RCX: 0000000000000000
RDX: ffffffff8246e92a RSI: 0000000000000100 RDI: ffff88004e4637da
RBP: ffff88004e4637a8 R08: 000000000000ffff R09: 000000000000ffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8289d600
R13: ffffffff8289d230 R14: ffffffff8246e928 R15: ffffffff8289d600
FS:  00007fed95153700(0000) GS:ffff88005fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000100 CR3: 000000004eeac000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process trinity-child11 (pid: 21324, threadinfo ffff88004e462000, task ffff8800524b0000)
Stack:
 ffff88004e4637da ffff88004e463828 ffffffff81368eee 000000004e4637d8
 ffffffff0000ffff ffff88000000ffff 0000000000000000 000000004e4637f8
 ffffffff826285d8 ffff88004e4637f8 0000000000000000 ffff8800524b06b0
Call Trace:
 [<ffffffff81368eee>] ip6_addr_string.isra.11+0x3e/0xa0
 [<ffffffff81369183>] pointer.isra.12+0x233/0x2d0
 [<ffffffff810a413a>] ? vprintk_emit+0x1ba/0x450
 [<ffffffff8110953d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81369757>] vsnprintf+0x187/0x5d0
 [<ffffffff81369c62>] vscnprintf+0x12/0x30
 [<ffffffff810a4028>] vprintk_emit+0xa8/0x450
 [<ffffffff81e5cb00>] printk+0x49/0x4b
 [<ffffffff81d17221>] sctp_v6_get_dst+0x731/0x780
 [<ffffffff81d16e15>] ? sctp_v6_get_dst+0x325/0x780
 [<ffffffff81d00a96>] sctp_transport_route+0x46/0x120
 [<ffffffff81cff0f1>] sctp_assoc_add_peer+0x161/0x350
 [<ffffffff81d0fd8d>] sctp_sendmsg+0x6cd/0xcb0
 [<ffffffff81b55bf0>] ? inet_create+0x670/0x670
 [<ffffffff81b55cfb>] inet_sendmsg+0x10b/0x220
 [<ffffffff81b55bf0>] ? inet_create+0x670/0x670
 [<ffffffff81a72a64>] ? sock_update_classid+0xa4/0x2b0
 [<ffffffff81a72ab0>] ? sock_update_classid+0xf0/0x2b0
 [<ffffffff81a6ac1c>] sock_sendmsg+0xdc/0xf0
 [<ffffffff8118e9e5>] ? might_fault+0x85/0x90
 [<ffffffff8118e99c>] ? might_fault+0x3c/0x90
 [<ffffffff81a6e12a>] sys_sendto+0xfa/0x130
 [<ffffffff810a9887>] ? do_setitimer+0x197/0x380
 [<ffffffff81e960d5>] ? sysret_check+0x22/0x5d
 [<ffffffff81e960a9>] system_call_fastpath+0x16/0x1b
Code: 01 eb 89 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 f8 31 c9 48 89 e5 53 eb 12 0f 1f 40 00 48 83 c1 01 48 83 c0 04 48 83 f9 08 74 70 <0f> b6 3c 4e 89 fb 83 e7 0f c0 eb 04 41 89 d8 41 83 e0 0f 0f b6
RIP  [<ffffffff8136796e>] ip6_string+0x1e/0xa0
 RSP <ffff88004e4637a0>
CR2: 0000000000000100
---[ end trace a4de0bfcb38a3643 ]---

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
---
 net/sctp/ipv6.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index ea14cb4..f3f0f4d 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -345,7 +345,7 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 	}
 
 out:
-	if (!IS_ERR(dst)) {
+	if (!IS_ERR_OR_NULL(dst)) {
 		struct rt6_info *rt;
 		rt = (struct rt6_info *)dst;
 		t->dst = dst;
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings
From: Wang YanQing @ 2012-11-30  9:04 UTC (permalink / raw)
  To: Francois Romieu; +Cc: nic_swsd, netdev, linux-kernel, Hayes Wang
In-Reply-To: <20121130063500.GA9436@electric-eye.fr.zoreil.com>

On Fri, Nov 30, 2012 at 07:35:00AM +0100, Francois Romieu wrote:
> Which kernel version is it ?
I have done the test and debug  with mainline 
e23739b4ade80a3a7f87198f008f6c44a7cbc9fd, v3.7-rc7-51-ge23739b

> I'd rather see the GigaMAC registers written through a call to
> rtl_rar_set when the mac address is read in rtl_init_one instead
> of duplicating most of rtl_rar_set in a quite different place.

I have the same feeling with you, I will resend the patch follow
your opinion if everything pass test and work.

Thanks

^ permalink raw reply

* Re: [PATCH v2 3/3] pppoatm: protect against freeing of vcc
From: David Woodhouse @ 2012-11-30  8:25 UTC (permalink / raw)
  To: Chas Williams (CONTRACTOR)
  Cc: Krzysztof Mazur, David Laight, davem, netdev, linux-kernel,
	nathan
In-Reply-To: <1354240620.21562.256.camel@shinybook.infradead.org>

[-- Attachment #1: Type: text/plain, Size: 2212 bytes --]

On Fri, 2012-11-30 at 01:57 +0000, David Woodhouse wrote:
> I think it's actually fixed for pppoatm by the bh_lock_sock() and the
> sock_owned_by_user() check. As soon as vcc_release() calls lock_sock(),
> pppoatm stops accepting packets.
> 
> It should be simple enough to do the same in br2684.

Um... but now I come to look at it... Krzysztof, doesn't your 'pppoatm:
take ATM socket lock in pppoatm_send()' patch actually *break* the case
of sending via vcc_sendmsg()?

Why did you include the sock_owned_by_user() check in there and not just
use bh_lock_sock()?

With the sock_owned_by_user() check, it'll *always* drop packets
submitted through vcc_sendmsg(), won't it?

Admittedly, for PPPoATM and BR2684 we never do want to have packets
submitted directly from userspace that way; they should all come via the
PPP channel or the netdev respectively. So we might want to keep the
sock_owned_by_user() check because it fixes the close race, and
explicitly document it.

But it doesn't necessarily work for other protocols, so we may need a
better solution for the general case. Perhaps drop the
sock_owned_by_user() check, and put bh_lock_sock() around the beginning
of vcc_destroy_socket() where it clears ATM_VF_READY? That'll ensure
that no ->push() is *currently* operating on a skb having seen that the
VCC is still open.

Or maybe we just make the *devices* check the ATM_VF_CLOSE flag and
refuse to send the skb? Put the entire thing into their domain. Although
that may involve extra locking in the driver to synchronise send() and
close() sufficiently.

I'm still reluctant to swap the order of the device/protocol close in
vcc_destroy_socket(). I think that'll just swap one set of problems
which is now fairly well-understood and mostly solved, for another set.
In particular, I think the device needs to see the close first, because
*it* can actually abort or flush any pending TX and RX (including
synchronising with its tasklet as solos-pci does, etc.). Only then does
the protocol tear its data structures down. But I suppose the new set of
problems could be found and overcome, if Chas wants to propose an
alternative patch set...

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Re: [PATCH]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings
From: Francois Romieu @ 2012-11-30  6:35 UTC (permalink / raw)
  To: Wang YanQing; +Cc: nic_swsd, netdev, linux-kernel, Hayes Wang
In-Reply-To: <20121130050619.GA12862@udknight>

Wang YanQing <udknight@gmail.com> :
[...]
> After add some debug code, I found this NIC only accept ethernet
> broadcast package, it can't filter out the package send to its
> MAC address, but it works good for sending.So ifconfig show the
> RX/TX status means it can receive ARP package.(It don't its MAC
> address, so below)

Which kernel version is it ?

[...]
> I haven't see any code to set GigaMAC registers in kernel when boot,
> so I guess BIOS or NIC's circuit make it, but of course one miss

I'd appreciate to figure it out (and understand why I did not notice
it when testing).

> the extended GigaMAC registers  in this problem. The probe code can
> get MAC address right, so MAC{0,4} must had been setted, but some
> guys forget the extended GigaMAC registers.
> 
> This patch fix it.

It is a good analysis job.

I'd rather see the GigaMAC registers written through a call to
rtl_rar_set when the mac address is read in rtl_init_one instead
of duplicating most of rtl_rar_set in a quite different place.

Hayes, can you specify if it would work or if it may mess the registers
init sequence ordering ?

Thanks.

-- 
Ueimor

^ permalink raw reply

* Re: iputils: ping -I <iface>
From: Jan Synacek @ 2012-11-30  6:06 UTC (permalink / raw)
  To: Ben Greear; +Cc: YOSHIFUJI Hideaki, netdev
In-Reply-To: <50B7BC0F.6000709@candelatech.com>

On 11/29/2012 08:48 PM, Ben Greear wrote:
> On 11/29/2012 06:12 AM, Jan Synacek wrote:
>> Hello,
>>
>> There seems to be a bug(?) when calling ping with -I lo:
>>
>> $ ping -I lo kernel.org
>>
>> PING kernel.org (149.20.4.69) from 192.168.1.10 lo: 56(84) bytes of data.
>> ^C
>>
>> Note that 192.168.1.10 is my primary interface's address (em1). However, no
>> replies are coming back.
>>
>> $ ping -I em1 kernel.org
>>
>> PING kernel.org (149.20.4.69) from 192.168.1.10 em1: 56(84) bytes of data.
>> 64 bytes from pub2.kernel.org (149.20.4.69): icmp_seq=1 ttl=42 time=202 ms
>> 64 bytes from pub2.kernel.org (149.20.4.69): icmp_seq=2 ttl=42 time=187 ms
>> ^C
>>
>> Works as expected.
>>
>> I know that binding to loopback probably doesn't make much sense, but I think
>> that ping should be able to cope with that.
> 
> I think it would be wrong if ping worked as you suggest.  Binding to an
> interface means use that interface as the source of your packets, and having
> it bind hard helps when using systems with multiple NICs on same subnet
> (or possibly, same IP).

I just wanted to point out that if I call ping with -I lo, its 'from' address is
wrong (in my case 192.168.1.10) and nothing happens (that's, I guess, expected
if it really bound to loopback). If I call ping with the -I <the same address>
or -I em1 (the same address again), it works as expected. I'm sorry if I wasn't
clear enough.

> 
>> Also, it would be nice to mention the difference between -I <ip> and -I <iface>
>> in the manpage.
> 
> In my opinion, -I <iface> should use SO_BINDTODEVICE, but at least in
> older versions of ping it did not.

Ping does use SO_BINDTODEVICE.

> 
> Thanks,
> Ben
> 

Regards,
-- 
Jan Synacek
Software Engineer, BaseOS team Brno, Red Hat

^ permalink raw reply

* [PATCH]realtek:r8169: Bugfix or workaround for missing extended GigaMAC registers settings
From: Wang YanQing @ 2012-11-30  5:06 UTC (permalink / raw)
  To: nic_swsd; +Cc: romieu, netdev, linux-kernel

I get a board with 8168e-vl(10ec:8168 with RTL_GIGA_MAC_VER_34),
everything looks well first, I can use ifconfig to set ip, netmask,
etc. And the rx/tx statistics show by ifconfig looks good when I
ping another host or ping it from another host. But it don't work,
I can't get ICMP REPLAY from both sides, although the RX/TX statistics
seem good.

After add some debug code, I found this NIC only accept ethernet
broadcast package, it can't filter out the package send to its
MAC address, but it works good for sending.So ifconfig show the
RX/TX status means it can receive ARP package.(It don't its MAC
address, so below)

I have try the driver provided by realtek's website, it have the
same problem at the first time. BUT IT WORK AFTER I REBOOT with
CRTL-ALT-DEL, the reason is that realtek's driver call rtl8168_rar_set
in the .shutdown function register with pci_register_driver. Yes,
the really reason to make it work is rtl8689_rar_set, this function
set extended GigaMAC registers, so after reboot without lost the power,
NIC keep the status before reboot.

I haven't see any code to set GigaMAC registers in kernel when boot,
so I guess BIOS or NIC's circuit make it, but of course one miss
the extended GigaMAC registers  in this problem. The probe code can
get MAC address right, so MAC{0,4} must had been setted, but some
guys forget the extended GigaMAC registers.

This patch fix it.
[ I don't known whether others' realtek's NIC with extended GigaMAC
reigisters have the same problem, I meet it in 8168e-vl with
RTL_GIGA_MAC_VER_34, so I make this patch just for it.]

Signed-off-by: Wang YanQing <udknight@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 927aa33..e49c08d 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3095,7 +3095,30 @@ static void rtl8168e_1_hw_phy_config(struct rtl8169_private *tp)
 	rtl_writephy(tp, 0x0e, 0x0000);
 	rtl_writephy(tp, 0x0d, 0x0000);
 }
+static void rtl8168e_2_workaround(struct rtl8169_private *tp, u8 *addr)
+{
+	void __iomem *ioaddr = tp->mmio_addr;
+	u32 high;
+	u32 low;
+
+	low  = addr[0] | (addr[1] << 8) | (addr[2] << 16) | (addr[3] << 24);
+	high = addr[4] | (addr[5] << 8);
+
+
+	RTL_W8(Cfg9346, Cfg9346_Unlock);
+	if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+		const struct exgmac_reg e[] = {
+			{ .addr = 0xe0, ERIAR_MASK_1111, .val = low },
+			{ .addr = 0xe4, ERIAR_MASK_1111, .val = high },
+			{ .addr = 0xf0, ERIAR_MASK_1111, .val = low << 16 },
+			{ .addr = 0xf4, ERIAR_MASK_1111, .val = high << 16 |
+								low  >> 16 },
+		};
 
+		rtl_write_exgmac_batch(tp, e, ARRAY_SIZE(e));
+	}
+	RTL_W8(Cfg9346, Cfg9346_Lock);
+}
 static void rtl8168e_2_hw_phy_config(struct rtl8169_private *tp)
 {
 	static const struct phy_reg phy_reg_init[] = {
@@ -3178,6 +3201,7 @@ static void rtl8168e_2_hw_phy_config(struct rtl8169_private *tp)
 	rtl_w1w0_phy(tp, 0x19, 0x0000, 0x0001);
 	rtl_w1w0_phy(tp, 0x10, 0x0000, 0x0400);
 	rtl_writephy(tp, 0x1f, 0x0000);
+	rtl8168e_2_workaround(tp, tp->dev->dev_addr);
 }
 
 static void rtl8168f_hw_phy_config(struct rtl8169_private *tp)
-- 
1.7.11.1.116.g8228a23

^ permalink raw reply related

* [PATCH 2/2] mac802154: use kfree_skb() instead of dev_kfree_skb()
From: Alan Ott @ 2012-11-30  4:25 UTC (permalink / raw)
  To: Alexander Smirnov, Dmitry Eremin-Solenikov, David S. Miller,
	Eric Dumazet
  Cc: linux-zigbee-devel, netdev, linux-kernel, Alan Ott
In-Reply-To: <1354249511-8086-1-git-send-email-alan@signal11.us>

kfree_skb() indicates failure, which is where this is being used.

Signed-off-by: Alan Ott <alan@signal11.us>
---
 net/mac802154/tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index db63914..4e09d07 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -99,7 +99,7 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct sk_buff *skb,
 	}
 
 	if (skb_cow_head(skb, priv->hw.extra_tx_headroom)) {
-		dev_kfree_skb(skb);
+		kfree_skb(skb);
 		return NETDEV_TX_OK;
 	}
 
-- 
1.7.11.2

^ permalink raw reply related

* [PATCH 1/2] mac802154: fix memory leaks
From: Alan Ott @ 2012-11-30  4:25 UTC (permalink / raw)
  To: Alexander Smirnov, Dmitry Eremin-Solenikov, David S. Miller,
	Eric Dumazet
  Cc: linux-zigbee-devel, netdev, linux-kernel, Alan Ott
In-Reply-To: <504EA95C.9010003@signal11.us>

kfree_skb() was not getting called in the case of some failures.
This was pointed out by Eric Dumazet.

Signed-off-by: Alan Ott <alan@signal11.us>
---
 net/mac802154/tx.c   | 5 ++++-
 net/mac802154/wpan.c | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 1a4df39..db63914 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -85,6 +85,7 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct sk_buff *skb,
 
 	if (!(priv->phy->channels_supported[page] & (1 << chan))) {
 		WARN_ON(1);
+		kfree_skb(skb);
 		return NETDEV_TX_OK;
 	}
 
@@ -103,8 +104,10 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct sk_buff *skb,
 	}
 
 	work = kzalloc(sizeof(struct xmit_work), GFP_ATOMIC);
-	if (!work)
+	if (!work) {
+		kfree_skb(skb);
 		return NETDEV_TX_BUSY;
+	}
 
 	INIT_WORK(&work->work, mac802154_xmit_worker);
 	work->skb = skb;
diff --git a/net/mac802154/wpan.c b/net/mac802154/wpan.c
index f30f6d4..1191039 100644
--- a/net/mac802154/wpan.c
+++ b/net/mac802154/wpan.c
@@ -327,8 +327,10 @@ mac802154_wpan_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (chan == MAC802154_CHAN_NONE ||
 	    page >= WPAN_NUM_PAGES ||
-	    chan >= WPAN_NUM_CHANNELS)
+	    chan >= WPAN_NUM_CHANNELS) {
+		kfree_skb(skb);
 		return NETDEV_TX_OK;
+	}
 
 	skb->skb_iif = dev->ifindex;
 	dev->stats.tx_packets++;
-- 
1.7.11.2

^ permalink raw reply related

* Re: [PATCH] 6lowpan: consider checksum bytes in fragmentation threshold
From: Alan Ott @ 2012-11-30  1:58 UTC (permalink / raw)
  To: Alan Ott
  Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, David S. Miller,
	linux-zigbee-devel, netdev, linux-kernel
In-Reply-To: <1354240544-22214-1-git-send-email-alan@signal11.us>

On 11/29/2012 08:55 PM, Alan Ott wrote:
> Change the threshold for framentation of a lowpan packet from
> using the MTU size to now use the MTU size minus the checksum length,
> which is added by the hardware. For IEEE 802.15.4, this effectively
> changes it from 127 bytes to 125 bytes.
>

Sorry, this was put in the wrong thread. One day I'll get one of these
first-try. :(

^ permalink raw reply

* Re: [PATCH v2 3/3] pppoatm: protect against freeing of vcc
From: David Woodhouse @ 2012-11-30  1:57 UTC (permalink / raw)
  To: Chas Williams (CONTRACTOR)
  Cc: Krzysztof Mazur, David Laight, davem, netdev, linux-kernel,
	nathan
In-Reply-To: <201211300138.qAU1c8sE003388@thirdoffive.cmf.nrl.navy.mil>

[-- Attachment #1: Type: text/plain, Size: 732 bytes --]

On Thu, 2012-11-29 at 20:38 -0500, Chas Williams (CONTRACTOR) wrote:
> it isnt clear to me that fixes the race entirely either.
> vcc_destroy_socket() and any of the push()/sends()'s are not
> serialized.
> while you may clear the ATM_VF_READY flag, you might not clear it soon
> enough for any particular push() that is already running.  so it still
> seems like you are racing close() against push() at this point.  the
> window is greatly reduced, but it still exists.

I think it's actually fixed for pppoatm by the bh_lock_sock() and the
sock_owned_by_user() check. As soon as vcc_release() calls lock_sock(),
pppoatm stops accepting packets.

It should be simple enough to do the same in br2684.

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox