Netdev List
* sfc: an enumeration is not a bitmask
From: David Miller @ 2011-05-17 18:14 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev


Ben can you please get rid of "enum efx_fc_type"?

drivers/net/sfc/mcdi_mac.c: In function ‘efx_mcdi_set_mac’:
drivers/net/sfc/mcdi_mac.c:36:2: warning: case value ‘3’ not in enumerated type ‘enum efx_fc_type’

An enumeration is not a bitmask; it means that exactly one of the
enumerated values will be used.  This means that the warning
here about:

	switch (efx->wanted_fc) {
	case EFX_FC_RX | EFX_FC_TX:

is completely legitimate.

Thanks.


* Re: pull request: sfc-next-2.6 2011-05-17
From: David Miller @ 2011-05-17 18:00 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1305647988.2848.14.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 17 May 2011 16:59:48 +0100

> The following changes since commit 7be799a70ba3dd90a59e8d2c72bbe06020005b3f:
> 
>   ipv4: Remove rt->rt_dst reference from ip_forward_options(). (2011-05-13 17:31:02 -0400)
> 
> are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next-2.6.git master
> 
> Miscellaneous bug fixes.

Pulled, thanks Ben.


* Re: [PATCH] net: ping: fix build failure
From: David Miller @ 2011-05-17 18:00 UTC (permalink / raw)
  To: randy.dunlap; +Cc: segoon, sfr, netdev, linux-next, linux-kernel
In-Reply-To: <20110517084125.77b543c8.randy.dunlap@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Tue, 17 May 2011 08:41:25 -0700

> On Tue, 17 May 2011 14:16:56 +0400 Vasiliy Kulikov wrote:
> 
>> On Mon, May 16, 2011 at 15:38 -0400, David Miller wrote:
>> > From: Randy Dunlap <randy.dunlap@oracle.com>
>> > Date: Mon, 16 May 2011 12:35:34 -0700
>> > 
>> > > On Mon, 16 May 2011 15:10:19 +1000 Stephen Rothwell wrote:
>> > > when CONFIG_PROC_SYSCTL is not enabled:
>> > > 
>> > > ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'
>> > 
>> > Vasiliy, please fix this.
>> 
>> I wonder whether there is any way to test such unusual configurations?
>> Only randconfig or are there any (partly-)automated tools for it?
>> 
>> 
>> [PATCH] net: ping: fix build failure
>> 
>> If CONFIG_PROC_SYSCTL=n the building process fails:
>> 
>>     ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'
>> 
>> Moved inet_get_ping_group_range_net() to ping.c.
>> 
>> Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
>> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
> 
> Acked-by: Randy Dunlap <randy.dunlap@oracle.com>

Applied, thanks everyone.


* Re: [PATCH net-2.6] net: use hlist_del_rcu() in dev_change_name()
From: David Miller @ 2011-05-17 17:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1305649260.2850.106.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 17 May 2011 18:21:00 +0200

> Using plain hlist_del() in dev_change_name() is wrong since a
> concurrent reader can crash trying to dereference LIST_POISON1.
> 
> Bug introduced in commit 72c9528bab94 (net: Introduce
> dev_get_by_name_rcu())
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable, thanks.


* Re: [PATCH v3 1/1] igmp: call ip_mc_clear_src() only when we have no users of ip_mc_list
From: David Stevens @ 2011-05-17 17:42 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: David Miller, jmorris, kaber, kuznet, linux-kbuild, linux-kernel,
	mmarek, netdev, pekkas, yoshfuji
In-Reply-To: <20110517143756.GE30366@darkmag.usersys.redhat.com>

Veaceslav,
        It looks to me like this will leak the source filters if we are
called from ip_mc_destroy_dev(). Even with your previous patch, you're
assuming that we don't free the ip_mc_list and so we have the same one
when we up the device, but if there are no timers running, it looks like
refcnt can go to 0 and free it. If we can ever free the ip_mc_list when
users != 0 (or going to 0 immediately after the drop), we have to do the
ip_mc_clear_src() or leak the list. I haven't looked at this code in
years, so I'll need to refresh my memory.
        So, I'll look at that a bit more; at a minimum, I think you need
to do the clear_src also in the destroy case. We could lose the filters
and set the exclude count to users, instead of 1; but I like the idea of
keeping the source filters across a down/up, if we can be sure there are
no cases where we free the ip_mc_list without first freeing all the
filters.

                                                                +-DLS

Veaceslav Falico <vfalico@redhat.com> wrote on 05/17/2011 07:37:56 AM:

> From: Veaceslav Falico <vfalico@redhat.com>
> To: David Stevens/Beaverton/IBM@IBMUS
> Cc: David Miller <davem@davemloft.net>, jmorris@namei.org, 
> kaber@trash.net, kuznet@ms2.inr.ac.ru, linux-kbuild@vger.kernel.org,
> linux-kernel@vger.kernel.org, mmarek@suse.cz, 
> netdev@vger.kernel.org, pekkas@netcore.fi, yoshfuji@linux-ipv6.org
> Date: 05/17/2011 07:39 AM
> Subject: [PATCH v3 1/1] igmp: call ip_mc_clear_src() only when we 
> have no users of ip_mc_list
> 
> In igmp_group_dropped() we call ip_mc_clear_src(), which resets the number
> of source filters per multicast. However, igmp_group_dropped() is also
> called on NETDEV_DOWN, NETDEV_PRE_TYPE_CHANGE and NETDEV_UNREGISTER, which
> means that the group might get added back on NETDEV_UP, NETDEV_REGISTER and
> NETDEV_POST_TYPE_CHANGE respectively, leaving us with broken source
> filters.
> 
> To fix that, we must clear the source filters only when there are no users
> in the ip_mc_list, i.e. in ip_mc_dec_group().
> 
> Correct version of the patch.
> 
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index 1fd3d9c..142ca0d 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -1169,20 +1169,18 @@ static void igmp_group_dropped(struct ip_mc_list *im)
> 
>     if (!in_dev->dead) {
>        if (IGMP_V1_SEEN(in_dev))
> -         goto done;
> +         return;
>        if (IGMP_V2_SEEN(in_dev)) {
>           if (reporter)
>              igmp_send_report(in_dev, im, IGMP_HOST_LEAVE_MESSAGE);
> -         goto done;
> +         return;
>        }
>        /* IGMPv3 */
>        igmpv3_add_delrec(in_dev, im);
> 
>        igmp_ifc_event(in_dev);
>     }
> -done:
>  #endif
> -   ip_mc_clear_src(im);
>  }
> 
>  static void igmp_group_added(struct ip_mc_list *im)
> @@ -1319,6 +1317,7 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
>              *ip = i->next_rcu;
>              in_dev->mc_count--;
>              igmp_group_dropped(i);
> +            ip_mc_clear_src(i);
> 
>              if (!in_dev->dead)
>                 ip_rt_multicast_event(in_dev);


* RE: [RFC V3 PATCH] The message size allocated for rtnl info dumps was limited to a single
From: Rose, Gregory V @ 2011-05-17 17:16 UTC (permalink / raw)
  To: Rose, Gregory V, netdev@vger.kernel.org
  Cc: bhutchings@solarflare.com, davem@davemloft.net,
	eric.dumazet@gmail.com
In-Reply-To: <20110517171412.5481.71745.stgit@gitlad.jf.intel.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Greg Rose
> Sent: Tuesday, May 17, 2011 10:14 AM
> To: netdev@vger.kernel.org
> Cc: bhutchings@solarflare.com; davem@davemloft.net; eric.dumazet@gmail.com
> Subject: [RFC V3 PATCH] The message size allocated for rtnl info dumps was
> limited to a single
> 
Guh... screwed up the title...

Oh well, it's an RFC.

;^)



* Re: ath5k regression associating with APs in 2.6.38
From: Nick Kossifidis @ 2011-05-17 17:14 UTC (permalink / raw)
  To: Nick Kossifidis, John W. Linville, Jiri Slaby, Luis R. Rodriguez,
	Bob Copeland
In-Reply-To: <20110517165720.GB9258@thinkpad-t410>

2011/5/17 Seth Forshee <seth.forshee@canonical.com>:
> On Mon, May 09, 2011 at 09:02:30AM +0200, Seth Forshee wrote:
>> On Thu, May 05, 2011 at 05:30:42PM +0300, Nick Kossifidis wrote:
>> > Hmm I don't see any errors from reset/phy code, can you disable
>> > Network Manager/wpa-supplicant and test connection on an open network
>> > using iw ? It 'll give us a better picture...
>> >
>> > If iw doesn't return any scan results we are probably hitting a PHY/RF
>> > error specific to your device (not all vendors follow the reference
>> > design). Maybe we should follow a blacklist/whitelist approach for
>> > this feature.
>>
>> I got the results back from my tester. He was able to get scan results,
>> but it took multiple tries and the direct probe failures appear in the
>> log. He didn't enable ATH5K_DEBUG_RESET this time; let me know if you
>> need that and I'll request he retest with the extra debug logs enabled.
>
> I got some more feedback. Most of the time iw does not get scan results,
> but even when it does connecting to the AP isn't always successful. The
> tester did note that he doesn't seem to have any trouble if his machine
> is within a few feet of his AP. Let me know if you'd like something else
> tested.
>
> I noticed that bugzilla #31922 (ath5k: Decreased throughput in IBSS or
> 802.11n mode) is also fixed by reverting 8aec7af9. It seems like the
> synth-only channel changes are resulting in poor connection quality.
> Maybe that patch needs to be reverted?
>
> Thanks,
> Seth
>
>

http://www.kernel.org/pub/linux/kernel/people/mickflemm/01-fast-chan-switch-modparm

-- 
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick


* [RFC V3 PATCH] The message size allocated for rtnl info dumps was limited to a single
From: Greg Rose @ 2011-05-17 17:14 UTC (permalink / raw)
  To: netdev; +Cc: bhutchings, davem, eric.dumazet

The message size allocated for rtnl info dumps was limited to a single
page.  This is not enough for additional interface info available with
devices that support SR-IOV.  Calculate the amount of data required so
the dump can allocate enough data to satisfy the request.

V2 added the calcit function to the rtnl_register calls so that
dump functions could get the minimum dump allocation size if they
needed to.

V3 leverages the fact that the netdev register function ends up
calling if_nlmsg_size.  We collect the minimum dump allocation size
there and keep it in a module static variable so that the calcit
function doesn't have to search for the device on every info dump.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
---

 include/linux/netlink.h              |    6 ++-
 include/net/rtnetlink.h              |    5 ++-
 net/bridge/br_netlink.c              |   15 ++++++---
 net/core/fib_rules.c                 |    6 ++-
 net/core/neighbour.c                 |   10 +++---
 net/core/rtnetlink.c                 |   59 ++++++++++++++++++++++++++++------
 net/dcb/dcbnl.c                      |    4 +-
 net/decnet/dn_dev.c                  |    6 ++-
 net/decnet/dn_fib.c                  |    4 +-
 net/decnet/dn_route.c                |    5 ++-
 net/ipv4/devinet.c                   |    6 ++-
 net/ipv4/fib_frontend.c              |    6 ++-
 net/ipv4/inet_diag.c                 |    2 +
 net/ipv4/ipmr.c                      |    3 +-
 net/ipv4/route.c                     |    2 +
 net/ipv6/addrconf.c                  |   12 +++----
 net/ipv6/addrlabel.c                 |    6 ++-
 net/ipv6/ip6_fib.c                   |    2 +
 net/ipv6/ip6mr.c                     |    2 +
 net/ipv6/route.c                     |    6 ++-
 net/netfilter/ipset/ip_set_core.c    |    2 +
 net/netfilter/nf_conntrack_netlink.c |    4 +-
 net/netlink/af_netlink.c             |   17 ++++++----
 net/netlink/genetlink.c              |    2 +
 net/phonet/pn_netlink.c              |   12 +++----
 net/sched/act_api.c                  |    6 ++-
 net/sched/cls_api.c                  |    6 ++-
 net/sched/sch_api.c                  |   12 +++----
 net/xfrm/xfrm_user.c                 |    2 +
 29 files changed, 141 insertions(+), 89 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 4c4ac3f..8b8dfb8 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -220,7 +220,8 @@ struct netlink_callback {
 	int			(*dump)(struct sk_buff * skb,
 					struct netlink_callback *cb);
 	int			(*done)(struct netlink_callback *cb);
-	int			family;
+	u16			family;
+	u16			min_dump_alloc;
 	long			args[6];
 };
 
@@ -258,7 +259,8 @@ __nlmsg_put(struct sk_buff *skb, u32 pid, u32 seq, int type, int len, int flags)
 extern int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 			      const struct nlmsghdr *nlh,
 			      int (*dump)(struct sk_buff *skb, struct netlink_callback*),
-			      int (*done)(struct netlink_callback*));
+			      int (*done)(struct netlink_callback*),
+			      u16 min_dump_alloc);
 
 
 #define NL_NONROOT_RECV 0x1
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 4093ca7..d1ac642 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -6,11 +6,12 @@
 
 typedef int (*rtnl_doit_func)(struct sk_buff *, struct nlmsghdr *, void *);
 typedef int (*rtnl_dumpit_func)(struct sk_buff *, struct netlink_callback *);
+typedef u16 (*rtnl_calcit_func)(struct sk_buff *);
 
 extern int	__rtnl_register(int protocol, int msgtype,
-				rtnl_doit_func, rtnl_dumpit_func);
+				rtnl_doit_func, rtnl_dumpit_func, rtnl_calcit_func);
 extern void	rtnl_register(int protocol, int msgtype,
-			      rtnl_doit_func, rtnl_dumpit_func);
+			      rtnl_doit_func, rtnl_dumpit_func, rtnl_calcit_func);
 extern int	rtnl_unregister(int protocol, int msgtype);
 extern void	rtnl_unregister_all(int protocol);
 
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index ffb0dc4..6814083 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -218,19 +218,24 @@ int __init br_netlink_init(void)
 	if (err < 0)
 		goto err1;
 
-	err = __rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, br_dump_ifinfo);
+	err = __rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL,
+			      br_dump_ifinfo, NULL);
 	if (err)
 		goto err2;
-	err = __rtnl_register(PF_BRIDGE, RTM_SETLINK, br_rtm_setlink, NULL);
+	err = __rtnl_register(PF_BRIDGE, RTM_SETLINK,
+			      br_rtm_setlink, NULL, NULL);
 	if (err)
 		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, br_fdb_add, NULL);
+	err = __rtnl_register(PF_BRIDGE, RTM_NEWNEIGH,
+			      br_fdb_add, NULL, NULL);
 	if (err)
 		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_DELNEIGH, br_fdb_delete, NULL);
+	err = __rtnl_register(PF_BRIDGE, RTM_DELNEIGH,
+			      br_fdb_delete, NULL, NULL);
 	if (err)
 		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, br_fdb_dump);
+	err = __rtnl_register(PF_BRIDGE, RTM_GETNEIGH,
+			      NULL, br_fdb_dump, NULL);
 	if (err)
 		goto err3;
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 3911586..56e6fc8 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -739,9 +739,9 @@ static struct pernet_operations fib_rules_net_ops = {
 static int __init fib_rules_init(void)
 {
 	int err;
-	rtnl_register(PF_UNSPEC, RTM_NEWRULE, fib_nl_newrule, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELRULE, fib_nl_delrule, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule);
+	rtnl_register(PF_UNSPEC, RTM_NEWRULE, fib_nl_newrule, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELRULE, fib_nl_delrule, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETRULE, NULL, fib_nl_dumprule, NULL);
 
 	err = register_pernet_subsys(&fib_rules_net_ops);
 	if (err < 0)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 799f06e..a880b83 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2909,12 +2909,12 @@ EXPORT_SYMBOL(neigh_sysctl_unregister);
 
 static int __init neigh_init(void)
 {
-	rtnl_register(PF_UNSPEC, RTM_NEWNEIGH, neigh_add, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELNEIGH, neigh_delete, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETNEIGH, NULL, neigh_dump_info);
+	rtnl_register(PF_UNSPEC, RTM_NEWNEIGH, neigh_add, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELNEIGH, neigh_delete, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETNEIGH, NULL, neigh_dump_info, NULL);
 
-	rtnl_register(PF_UNSPEC, RTM_GETNEIGHTBL, NULL, neightbl_dump_info);
-	rtnl_register(PF_UNSPEC, RTM_SETNEIGHTBL, neightbl_set, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETNEIGHTBL, NULL, neightbl_dump_info, NULL);
+	rtnl_register(PF_UNSPEC, RTM_SETNEIGHTBL, neightbl_set, NULL, NULL);
 
 	return 0;
 }
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d2ba259..a59e595 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -56,9 +56,11 @@
 struct rtnl_link {
 	rtnl_doit_func		doit;
 	rtnl_dumpit_func	dumpit;
+	rtnl_calcit_func calcit;
 };
 
 static DEFINE_MUTEX(rtnl_mutex);
+static u16 min_ifinfo_dump_size;
 
 void rtnl_lock(void)
 {
@@ -144,12 +146,28 @@ static rtnl_dumpit_func rtnl_get_dumpit(int protocol, int msgindex)
 	return tab ? tab[msgindex].dumpit : NULL;
 }
 
+static rtnl_calcit_func rtnl_get_calcit(int protocol, int msgindex)
+{
+	struct rtnl_link *tab;
+
+	if (protocol <= RTNL_FAMILY_MAX)
+		tab = rtnl_msg_handlers[protocol];
+	else
+		tab = NULL;
+
+	if (tab == NULL || tab[msgindex].calcit == NULL)
+		tab = rtnl_msg_handlers[PF_UNSPEC];
+
+	return tab ? tab[msgindex].calcit : NULL;
+}
+
 /**
  * __rtnl_register - Register a rtnetlink message type
  * @protocol: Protocol family or PF_UNSPEC
  * @msgtype: rtnetlink message type
  * @doit: Function pointer called for each request message
  * @dumpit: Function pointer called for each dump request (NLM_F_DUMP) message
+ * @calcit: Function pointer to calc size of dump message
  *
  * Registers the specified function pointers (at least one of them has
  * to be non-NULL) to be called whenever a request message for the
@@ -162,7 +180,8 @@ static rtnl_dumpit_func rtnl_get_dumpit(int protocol, int msgindex)
  * Returns 0 on success or a negative error code.
  */
 int __rtnl_register(int protocol, int msgtype,
-		    rtnl_doit_func doit, rtnl_dumpit_func dumpit)
+		    rtnl_doit_func doit, rtnl_dumpit_func dumpit,
+		    rtnl_calcit_func calcit)
 {
 	struct rtnl_link *tab;
 	int msgindex;
@@ -185,6 +204,9 @@ int __rtnl_register(int protocol, int msgtype,
 	if (dumpit)
 		tab[msgindex].dumpit = dumpit;
 
+	if (calcit)
+		tab[msgindex].calcit = calcit;
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__rtnl_register);
@@ -199,9 +221,10 @@ EXPORT_SYMBOL_GPL(__rtnl_register);
  * of memory implies no sense in continuing.
  */
 void rtnl_register(int protocol, int msgtype,
-		   rtnl_doit_func doit, rtnl_dumpit_func dumpit)
+		   rtnl_doit_func doit, rtnl_dumpit_func dumpit,
+		   rtnl_calcit_func calcit)
 {
-	if (__rtnl_register(protocol, msgtype, doit, dumpit) < 0)
+	if (__rtnl_register(protocol, msgtype, doit, dumpit, calcit) < 0)
 		panic("Unable to register rtnetlink message handler, "
 		      "protocol = %d, message type = %d\n",
 		      protocol, msgtype);
@@ -1814,6 +1837,11 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	return err;
 }
 
+static u16 rtnl_calcit(struct sk_buff *skb)
+{
+	return min_ifinfo_dump_size;
+}
+
 static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	int idx;
@@ -1843,11 +1871,14 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
+	size_t if_info_size;
 
-	skb = nlmsg_new(if_nlmsg_size(dev), GFP_KERNEL);
+	skb = nlmsg_new((if_info_size = if_nlmsg_size(dev)), GFP_KERNEL);
 	if (skb == NULL)
 		goto errout;
 
+	min_ifinfo_dump_size = max_t(u16, if_info_size, min_ifinfo_dump_size);
+
 	err = rtnl_fill_ifinfo(skb, dev, type, 0, 0, change, 0);
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in if_nlmsg_size() */
@@ -1897,13 +1928,19 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
 		struct sock *rtnl;
 		rtnl_dumpit_func dumpit;
+		rtnl_calcit_func calcit;
+		u16 min_dump_alloc = 0;
 
 		dumpit = rtnl_get_dumpit(family, type);
 		if (dumpit == NULL)
 			return -EOPNOTSUPP;
+		calcit = rtnl_get_calcit(family, type);
+		if (calcit)
+			min_dump_alloc = calcit(skb);
 
 		rtnl = net->rtnl;
-		return netlink_dump_start(rtnl, skb, nlh, dumpit, NULL);
+		return netlink_dump_start(rtnl, skb, nlh, dumpit,
+					  NULL, min_dump_alloc);
 	}
 
 	memset(rta_buf, 0, (rtattr_max * sizeof(struct rtattr *)));
@@ -2009,12 +2046,12 @@ void __init rtnetlink_init(void)
 	netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV);
 	register_netdevice_notifier(&rtnetlink_dev_notifier);
 
-	rtnl_register(PF_UNSPEC, RTM_GETLINK, rtnl_getlink, rtnl_dump_ifinfo);
-	rtnl_register(PF_UNSPEC, RTM_SETLINK, rtnl_setlink, NULL);
-	rtnl_register(PF_UNSPEC, RTM_NEWLINK, rtnl_newlink, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELLINK, rtnl_dellink, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETLINK, rtnl_getlink, rtnl_dump_ifinfo, rtnl_calcit);
+	rtnl_register(PF_UNSPEC, RTM_SETLINK, rtnl_setlink, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_NEWLINK, rtnl_newlink, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELLINK, rtnl_dellink, NULL, NULL);
 
-	rtnl_register(PF_UNSPEC, RTM_GETADDR, NULL, rtnl_dump_all);
-	rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all);
+	rtnl_register(PF_UNSPEC, RTM_GETADDR, NULL, rtnl_dump_all, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, NULL);
 }
 
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 3609eac..ed1bb8c 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1819,8 +1819,8 @@ static int __init dcbnl_init(void)
 {
 	INIT_LIST_HEAD(&dcb_app_list);
 
-	rtnl_register(PF_UNSPEC, RTM_GETDCB, dcb_doit, NULL);
-	rtnl_register(PF_UNSPEC, RTM_SETDCB, dcb_doit, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETDCB, dcb_doit, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_SETDCB, dcb_doit, NULL, NULL);
 
 	return 0;
 }
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 404fa15..0011eba 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -1419,9 +1419,9 @@ void __init dn_dev_init(void)
 
 	dn_dev_devices_on();
 
-	rtnl_register(PF_DECnet, RTM_NEWADDR, dn_nl_newaddr, NULL);
-	rtnl_register(PF_DECnet, RTM_DELADDR, dn_nl_deladdr, NULL);
-	rtnl_register(PF_DECnet, RTM_GETADDR, NULL, dn_nl_dump_ifaddr);
+	rtnl_register(PF_DECnet, RTM_NEWADDR, dn_nl_newaddr, NULL, NULL);
+	rtnl_register(PF_DECnet, RTM_DELADDR, dn_nl_deladdr, NULL, NULL);
+	rtnl_register(PF_DECnet, RTM_GETADDR, NULL, dn_nl_dump_ifaddr, NULL);
 
 	proc_net_fops_create(&init_net, "decnet_dev", S_IRUGO, &dn_dev_seq_fops);
 
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index 1c74ed3..104324d 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -763,8 +763,8 @@ void __init dn_fib_init(void)
 
 	register_dnaddr_notifier(&dn_fib_dnaddr_notifier);
 
-	rtnl_register(PF_DECnet, RTM_NEWROUTE, dn_fib_rtm_newroute, NULL);
-	rtnl_register(PF_DECnet, RTM_DELROUTE, dn_fib_rtm_delroute, NULL);
+	rtnl_register(PF_DECnet, RTM_NEWROUTE, dn_fib_rtm_newroute, NULL, NULL);
+	rtnl_register(PF_DECnet, RTM_DELROUTE, dn_fib_rtm_delroute, NULL, NULL);
 }
 
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 74544bc..2949ca4 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1841,10 +1841,11 @@ void __init dn_route_init(void)
 	proc_net_fops_create(&init_net, "decnet_cache", S_IRUGO, &dn_rt_cache_seq_fops);
 
 #ifdef CONFIG_DECNET_ROUTER
-	rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute, dn_fib_dump);
+	rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute,
+		      dn_fib_dump, NULL);
 #else
 	rtnl_register(PF_DECnet, RTM_GETROUTE, dn_cache_getroute,
-		      dn_cache_dump);
+		      dn_cache_dump, NULL);
 #endif
 }
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 0d4a184..37b3c18 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1833,8 +1833,8 @@ void __init devinet_init(void)
 
 	rtnl_af_register(&inet_af_ops);
 
-	rtnl_register(PF_INET, RTM_NEWADDR, inet_rtm_newaddr, NULL);
-	rtnl_register(PF_INET, RTM_DELADDR, inet_rtm_deladdr, NULL);
-	rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr);
+	rtnl_register(PF_INET, RTM_NEWADDR, inet_rtm_newaddr, NULL, NULL);
+	rtnl_register(PF_INET, RTM_DELADDR, inet_rtm_deladdr, NULL, NULL);
+	rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr, NULL);
 }
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 2252471..92fc5f6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1124,9 +1124,9 @@ static struct pernet_operations fib_net_ops = {
 
 void __init ip_fib_init(void)
 {
-	rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL);
-	rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL);
-	rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib);
+	rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, NULL);
+	rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, NULL);
+	rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, NULL);
 
 	register_pernet_subsys(&fib_net_ops);
 	register_netdevice_notifier(&fib_netdev_notifier);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 6ffe94c..5ff4765 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -871,7 +871,7 @@ static int inet_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		}
 
 		return netlink_dump_start(idiagnl, skb, nlh,
-					  inet_diag_dump, NULL);
+					  inet_diag_dump, NULL, 0);
 	}
 
 	return inet_diag_get_exact(skb, nlh);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 30a7763..aae2bd8 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2544,7 +2544,8 @@ int __init ip_mr_init(void)
 		goto add_proto_fail;
 	}
 #endif
-	rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE, NULL, ipmr_rtm_dumproute);
+	rtnl_register(RTNL_FAMILY_IPMR, RTM_GETROUTE,
+		      NULL, ipmr_rtm_dumproute, NULL);
 	return 0;
 
 #ifdef CONFIG_IP_PIMSM_V2
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6a83840..eec0caa 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3312,7 +3312,7 @@ int __init ip_rt_init(void)
 	xfrm_init();
 	xfrm4_init(ip_rt_max_size);
 #endif
-	rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL);
+	rtnl_register(PF_INET, RTM_GETROUTE, inet_rtm_getroute, NULL, NULL);
 
 #ifdef CONFIG_SYSCTL
 	register_pernet_subsys(&sysctl_route_ops);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f2f9b2e..f013979 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4704,16 +4704,16 @@ int __init addrconf_init(void)
 	if (err < 0)
 		goto errout_af;
 
-	err = __rtnl_register(PF_INET6, RTM_GETLINK, NULL, inet6_dump_ifinfo);
+	err = __rtnl_register(PF_INET6, RTM_GETLINK, NULL, inet6_dump_ifinfo, NULL);
 	if (err < 0)
 		goto errout;
 
 	/* Only the first call to __rtnl_register can fail */
-	__rtnl_register(PF_INET6, RTM_NEWADDR, inet6_rtm_newaddr, NULL);
-	__rtnl_register(PF_INET6, RTM_DELADDR, inet6_rtm_deladdr, NULL);
-	__rtnl_register(PF_INET6, RTM_GETADDR, inet6_rtm_getaddr, inet6_dump_ifaddr);
-	__rtnl_register(PF_INET6, RTM_GETMULTICAST, NULL, inet6_dump_ifmcaddr);
-	__rtnl_register(PF_INET6, RTM_GETANYCAST, NULL, inet6_dump_ifacaddr);
+	__rtnl_register(PF_INET6, RTM_NEWADDR, inet6_rtm_newaddr, NULL, NULL);
+	__rtnl_register(PF_INET6, RTM_DELADDR, inet6_rtm_deladdr, NULL, NULL);
+	__rtnl_register(PF_INET6, RTM_GETADDR, inet6_rtm_getaddr, inet6_dump_ifaddr, NULL);
+	__rtnl_register(PF_INET6, RTM_GETMULTICAST, NULL, inet6_dump_ifmcaddr, NULL);
+	__rtnl_register(PF_INET6, RTM_GETANYCAST, NULL, inet6_dump_ifacaddr, NULL);
 
 	ipv6_addr_label_rtnl_register();
 
diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index c8993e5..f3aa749 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -592,8 +592,8 @@ out:
 
 void __init ipv6_addr_label_rtnl_register(void)
 {
-	__rtnl_register(PF_INET6, RTM_NEWADDRLABEL, ip6addrlbl_newdel, NULL);
-	__rtnl_register(PF_INET6, RTM_DELADDRLABEL, ip6addrlbl_newdel, NULL);
-	__rtnl_register(PF_INET6, RTM_GETADDRLABEL, ip6addrlbl_get, ip6addrlbl_dump);
+	__rtnl_register(PF_INET6, RTM_NEWADDRLABEL, ip6addrlbl_newdel, NULL, NULL);
+	__rtnl_register(PF_INET6, RTM_DELADDRLABEL, ip6addrlbl_newdel, NULL, NULL);
+	__rtnl_register(PF_INET6, RTM_GETADDRLABEL, ip6addrlbl_get, ip6addrlbl_dump, NULL);
 }
 
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 4076a0b..9b257da 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1586,7 +1586,7 @@ int __init fib6_init(void)
 	if (ret)
 		goto out_kmem_cache_create;
 
-	ret = __rtnl_register(PF_INET6, RTM_GETROUTE, NULL, inet6_dump_fib);
+	ret = __rtnl_register(PF_INET6, RTM_GETROUTE, NULL, inet6_dump_fib, NULL);
 	if (ret)
 		goto out_unregister_subsys;
 out:
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 82a8099..1edfcc9 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1354,7 +1354,7 @@ int __init ip6_mr_init(void)
 		goto add_proto_fail;
 	}
 #endif
-	rtnl_register(RTNL_FAMILY_IP6MR, RTM_GETROUTE, NULL, ip6mr_rtm_dumproute);
+	rtnl_register(RTNL_FAMILY_IP6MR, RTM_GETROUTE, NULL, ip6mr_rtm_dumproute, NULL);
 	return 0;
 #ifdef CONFIG_IPV6_PIMSM_V2
 add_proto_fail:
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f1be5c5..1c49165 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2924,9 +2924,9 @@ int __init ip6_route_init(void)
 		goto xfrm6_init;
 
 	ret = -ENOBUFS;
-	if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL) ||
-	    __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL) ||
-	    __rtnl_register(PF_INET6, RTM_GETROUTE, inet6_rtm_getroute, NULL))
+	if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL, NULL) ||
+	    __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL, NULL) ||
+	    __rtnl_register(PF_INET6, RTM_GETROUTE, inet6_rtm_getroute, NULL, NULL))
 		goto fib6_rules_init;
 
 	ret = register_netdevice_notifier(&ip6_route_dev_notifier);
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 72d1ac6..dc1528c 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1120,7 +1120,7 @@ ip_set_dump(struct sock *ctnl, struct sk_buff *skb,
 
 	return netlink_dump_start(ctnl, skb, nlh,
 				  ip_set_dump_start,
-				  ip_set_dump_done);
+				  ip_set_dump_done, 0);
 }
 
 /* Add, del and test */
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..7dec88a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -970,7 +970,7 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb,
 
 	if (nlh->nlmsg_flags & NLM_F_DUMP)
 		return netlink_dump_start(ctnl, skb, nlh, ctnetlink_dump_table,
-					  ctnetlink_done);
+					  ctnetlink_done, 0);
 
 	err = ctnetlink_parse_zone(cda[CTA_ZONE], &zone);
 	if (err < 0)
@@ -1840,7 +1840,7 @@ ctnetlink_get_expect(struct sock *ctnl, struct sk_buff *skb,
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
 		return netlink_dump_start(ctnl, skb, nlh,
 					  ctnetlink_exp_dump_table,
-					  ctnetlink_exp_done);
+					  ctnetlink_exp_done, 0);
 	}
 
 	err = ctnetlink_parse_zone(cda[CTA_EXPECT_ZONE], &zone);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index c8f35b5..063bee9 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1665,13 +1665,10 @@ static int netlink_dump(struct sock *sk)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
 	struct netlink_callback *cb;
-	struct sk_buff *skb;
+	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	int len, err = -ENOBUFS;
-
-	skb = sock_rmalloc(sk, NLMSG_GOODSIZE, 0, GFP_KERNEL);
-	if (!skb)
-		goto errout;
+	int alloc_size;
 
 	mutex_lock(nlk->cb_mutex);
 
@@ -1681,6 +1678,12 @@ static int netlink_dump(struct sock *sk)
 		goto errout_skb;
 	}
 
+	alloc_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
+
+	skb = sock_rmalloc(sk, alloc_size, 0, GFP_KERNEL);
+	if (!skb)
+		goto errout_skb;
+
 	len = cb->dump(skb, cb);
 
 	if (len > 0) {
@@ -1727,7 +1730,8 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 		       const struct nlmsghdr *nlh,
 		       int (*dump)(struct sk_buff *skb,
 				   struct netlink_callback *),
-		       int (*done)(struct netlink_callback *))
+		       int (*done)(struct netlink_callback *),
+		       u16 min_dump_alloc)
 {
 	struct netlink_callback *cb;
 	struct sock *sk;
@@ -1741,6 +1745,7 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	cb->dump = dump;
 	cb->done = done;
 	cb->nlh = nlh;
+	cb->min_dump_alloc = min_dump_alloc;
 	atomic_inc(&skb->users);
 	cb->skb = skb;
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 1781d99..482fa57 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -525,7 +525,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 		genl_unlock();
 		err = netlink_dump_start(net->genl_sock, skb, nlh,
-					 ops->dumpit, ops->done);
+					 ops->dumpit, ops->done, 0);
 		genl_lock();
 		return err;
 	}
diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c
index 438accb..4ad4bb9 100644
--- a/net/phonet/pn_netlink.c
+++ b/net/phonet/pn_netlink.c
@@ -289,15 +289,15 @@ out:
 
 int __init phonet_netlink_register(void)
 {
-	int err = __rtnl_register(PF_PHONET, RTM_NEWADDR, addr_doit, NULL);
+	int err = __rtnl_register(PF_PHONET, RTM_NEWADDR, addr_doit, NULL, NULL);
 	if (err)
 		return err;
 
 	/* Further __rtnl_register() cannot fail */
-	__rtnl_register(PF_PHONET, RTM_DELADDR, addr_doit, NULL);
-	__rtnl_register(PF_PHONET, RTM_GETADDR, NULL, getaddr_dumpit);
-	__rtnl_register(PF_PHONET, RTM_NEWROUTE, route_doit, NULL);
-	__rtnl_register(PF_PHONET, RTM_DELROUTE, route_doit, NULL);
-	__rtnl_register(PF_PHONET, RTM_GETROUTE, NULL, route_dumpit);
+	__rtnl_register(PF_PHONET, RTM_DELADDR, addr_doit, NULL, NULL);
+	__rtnl_register(PF_PHONET, RTM_GETADDR, NULL, getaddr_dumpit, NULL);
+	__rtnl_register(PF_PHONET, RTM_NEWROUTE, route_doit, NULL, NULL);
+	__rtnl_register(PF_PHONET, RTM_DELROUTE, route_doit, NULL, NULL);
+	__rtnl_register(PF_PHONET, RTM_GETROUTE, NULL, route_dumpit, NULL);
 	return 0;
 }
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 14b42f4..c857763 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1120,9 +1120,9 @@ nlmsg_failure:
 
 static int __init tc_action_init(void)
 {
-	rtnl_register(PF_UNSPEC, RTM_NEWACTION, tc_ctl_action, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELACTION, tc_ctl_action, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETACTION, tc_ctl_action, tc_dump_action);
+	rtnl_register(PF_UNSPEC, RTM_NEWACTION, tc_ctl_action, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELACTION, tc_ctl_action, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETACTION, tc_ctl_action, tc_dump_action, NULL);
 
 	return 0;
 }
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bb2c523..9563887 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -610,10 +610,10 @@ EXPORT_SYMBOL(tcf_exts_dump_stats);
 
 static int __init tc_filter_init(void)
 {
-	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL);
+	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL, NULL);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_ctl_tfilter,
-						 tc_dump_tfilter);
+		      tc_dump_tfilter, NULL);
 
 	return 0;
 }
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 7490f3f..7870a92 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1794,12 +1794,12 @@ static int __init pktsched_init(void)
 	register_qdisc(&pfifo_head_drop_qdisc_ops);
 	register_qdisc(&mq_qdisc_ops);
 
-	rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELQDISC, tc_get_qdisc, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETQDISC, tc_get_qdisc, tc_dump_qdisc);
-	rtnl_register(PF_UNSPEC, RTM_NEWTCLASS, tc_ctl_tclass, NULL);
-	rtnl_register(PF_UNSPEC, RTM_DELTCLASS, tc_ctl_tclass, NULL);
-	rtnl_register(PF_UNSPEC, RTM_GETTCLASS, tc_ctl_tclass, tc_dump_tclass);
+	rtnl_register(PF_UNSPEC, RTM_NEWQDISC, tc_modify_qdisc, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELQDISC, tc_get_qdisc, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETQDISC, tc_get_qdisc, tc_dump_qdisc, NULL);
+	rtnl_register(PF_UNSPEC, RTM_NEWTCLASS, tc_ctl_tclass, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_DELTCLASS, tc_ctl_tclass, NULL, NULL);
+	rtnl_register(PF_UNSPEC, RTM_GETTCLASS, tc_ctl_tclass, tc_dump_tclass, NULL);
 
 	return 0;
 }
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index c658cb3..8bd79c8 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2299,7 +2299,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		if (link->dump == NULL)
 			return -EINVAL;
 
-		return netlink_dump_start(net->xfrm.nlsk, skb, nlh, link->dump, link->done);
+		return netlink_dump_start(net->xfrm.nlsk, skb, nlh, link->dump, link->done, 0);
 	}
 
 	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs, XFRMA_MAX,
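
For illustration, a minimal userspace sketch of the sizing rule that the
netlink_dump() hunk above introduces: the dump skb is allocated with the
larger of the caller's min_dump_alloc and NLMSG_GOODSIZE, so every caller
converted here to pass 0 should keep the old size. The demo_ name and the
good-size value passed in are assumptions, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of:
 *	alloc_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
 * goodsize stands in for NLMSG_GOODSIZE, whose real value depends on the
 * page size; it is a parameter here because the value is an assumption.
 */
static int demo_dump_alloc_size(uint16_t min_dump_alloc, int goodsize)
{
	assert(goodsize > 0);
	return (int)min_dump_alloc > goodsize ? (int)min_dump_alloc : goodsize;
}
```

With a hypothetical good size of 3776 bytes, a caller passing 0 still gets
3776, while a caller asking for 8192 gets 8192.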


^ permalink raw reply related

* Re: ath5k regression associating with APs in 2.6.38
From: Seth Forshee @ 2011-05-17 16:57 UTC (permalink / raw)
  To: Nick Kossifidis
  Cc: John W. Linville, Jiri Slaby, Luis R. Rodriguez, Bob Copeland,
	linux-wireless, ath5k-devel, netdev, linux-kernel
In-Reply-To: <20110509070230.GA20458@thinkpad-t410>

On Mon, May 09, 2011 at 09:02:30AM +0200, Seth Forshee wrote:
> On Thu, May 05, 2011 at 05:30:42PM +0300, Nick Kossifidis wrote:
> > Hmm I don't see any errors from reset/phy code, can you disable
> > Network Manager/wpa-supplicant and test connection on an open network
> > using iw ? It 'll give us a better picture...
> > 
> > If iw doesn't return any scan results we are probably hitting a PHY/RF
> > error specific to your device (not all vendors follow the reference
> > design). Maybe we should follow a blacklist/whitelist approach for
> > this feature.
> 
> I got the results back from my tester. He was able to get scan results,
> but it took multiple tries and the direct probe failures appear in the
> log. He didn't enable ATH5K_DEBUG_RESET this time; let me know if you
> need that and I'll request he retest with the extra debug logs enabled.

I got some more feedback. Most of the time iw does not get scan results,
but even when it does, connecting to the AP isn't always successful. The
tester did note that he doesn't seem to have any trouble if his machine
is within a few feet of his AP. Let me know if you'd like something else
tested.

I noticed that bugzilla #31922 (ath5k: Decreased throughput in IBSS or
802.11n mode) is also fixed by reverting 8aec7af9. It seems like the
synth-only channel changes are resulting in poor connection quality.
Maybe that patch needs to be reverted?

Thanks,
Seth

^ permalink raw reply

* Re: [PATCH] 2.6.38 ENC28J60 works with half-duplex DMA
From: David Miller @ 2011-05-17 16:25 UTC (permalink / raw)
  To: elpa.rizzo; +Cc: netdev
In-Reply-To: <BANLkTikye2gir-RJjG1rHSdxDWD3VXBnEw@mail.gmail.com>

From: Davide Rizzo <elpa.rizzo@gmail.com>
Date: Tue, 17 May 2011 07:15:48 +0200

> Where can I see a message list to have a confirmation that it was received ?

All patches posted to the mailing list get queued up at patchwork:

http://patchwork.ozlabs.org/project/netdev/list/

^ permalink raw reply

* Re: [PATCH 09/18] virtio: use avail_event index
From: Tom Lendacky @ 2011-05-17 16:23 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, linux-kernel, Carsten Otte,
	Christian Borntraeger, linux390, Martin Schwidefsky,
	Heiko Carstens, Shirley Ma, lguest, virtualization, netdev,
	linux-s390, kvm, Krishna Kumar, steved, habanero
In-Reply-To: <87ei3zdsq2.fsf@rustcorp.com.au>


On Monday, May 16, 2011 02:12:21 AM Rusty Russell wrote:
> On Sun, 15 May 2011 16:55:41 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, May 09, 2011 at 02:03:26PM +0930, Rusty Russell wrote:
> > > On Wed, 4 May 2011 23:51:47 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > Use the new avail_event feature to reduce the number
> > > > of exits from the guest.
> > > 
> > > Figures here would be nice :)
> > 
> > You mean ASCII art in comments?
> 
> I mean benchmarks of some kind.

I'm working on getting some benchmark results for the patches.  I should 
hopefully have something in the next day or two.

Tom
> 
> > > > @@ -228,6 +237,12 @@ add_head:
> > > >  	 * new available array entries. */
> > > >  	
> > > >  	virtio_wmb();
> > > >  	vq->vring.avail->idx++;
> > > > 
> > > > +	/* If the driver never bothers to kick in a very long while,
> > > > +	 * avail index might wrap around. If that happens, invalidate
> > > > +	 * kicked_avail index we stored. TODO: make sure all drivers
> > > > +	 * kick at least once in 2^16 and remove this. */
> > > > +	if (unlikely(vq->vring.avail->idx == vq->kicked_avail))
> > > > +		vq->kicked_avail_valid = true;
> > > 
> > > If they don't, they're already buggy.  Simply do:
> > >         WARN_ON(vq->vring.avail->idx == vq->kicked_avail);
> > 
> > Hmm, but does it say that somewhere?
> 
> AFAICT it's a corollary of:
> 1) You have a finite ring of size <= 2^16.
> 2) You need to kick the other side once you've done some work.
> 
> > > > @@ -482,6 +517,8 @@ void vring_transport_features(struct virtio_device *vdev)
> > > > 
> > > >  			break;
> > > >  		
> > > >  		case VIRTIO_RING_F_USED_EVENT_IDX:
> > > >  			break;
> > > > 
> > > > +		case VIRTIO_RING_F_AVAIL_EVENT_IDX:
> > > > +			break;
> > > > 
> > > >  		default:
> > > >  			/* We don't understand this bit. */
> > > >  			clear_bit(i, vdev->features);
> > > 
> > > Does this belong in a prior patch?
> > > 
> > > Thanks,
> > > Rusty.
> > 
> > Well if we don't support the feature in the ring we should not
> > ack the feature, right?
> 
> Ah, you're right.
> 
> Thanks,
> Rusty.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-2.6] net: use hlist_del_rcu() in dev_change_name()
From: Eric Dumazet @ 2011-05-17 16:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Using plain hlist_del() in dev_change_name() is wrong since a
concurrent reader can crash trying to dereference LIST_POISON1.

Bug introduced in commit 72c9528bab94 (net: Introduce
dev_get_by_name_rcu())

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
Stable candidate, for 2.6.33+

 net/core/dev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b624fe4..30a4078 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1007,7 +1007,7 @@ rollback:
 	}
 
 	write_lock_bh(&dev_base_lock);
-	hlist_del(&dev->name_hlist);
+	hlist_del_rcu(&dev->name_hlist);
 	write_unlock_bh(&dev_base_lock);
 
 	synchronize_rcu();
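
To illustrate why the plain variant is unsafe here, a minimal
single-threaded userspace model of the two unlink helpers follows.
hlist_del() poisons both pointers of the removed node, so a lockless
reader still standing on that node follows ->next into LIST_POISON1;
hlist_del_rcu() leaves ->next intact so the reader can walk on until the
grace period ends. The demo_ names and the poison constants are stand-ins,
not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct demo_hnode {
	struct demo_hnode *next;
	struct demo_hnode **pprev;
};

/* Stand-ins for LIST_POISON1/LIST_POISON2; the values are assumptions. */
#define DEMO_POISON1 ((struct demo_hnode *)0x100)
#define DEMO_POISON2 ((struct demo_hnode **)0x200)

/* Unlink n from its chain without touching n's own pointers. */
static void demo_unlink(struct demo_hnode *n)
{
	struct demo_hnode *next = n->next;
	struct demo_hnode **pprev = n->pprev;

	assert(pprev != NULL);
	*pprev = next;
	if (next)
		next->pprev = pprev;
}

/* Models hlist_del(): unlink, then poison both pointers. */
static void demo_hlist_del(struct demo_hnode *n)
{
	demo_unlink(n);
	n->next = DEMO_POISON1;
	n->pprev = DEMO_POISON2;
}

/* Models hlist_del_rcu(): unlink, poison only pprev, keep next usable. */
static void demo_hlist_del_rcu(struct demo_hnode *n)
{
	demo_unlink(n);
	n->pprev = DEMO_POISON2;
}
```

A reader that has already fetched the removed node still finds a valid
successor after demo_hlist_del_rcu(), but finds the poison value after
demo_hlist_del().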



^ permalink raw reply related

* [PATCH net-next-2.6 3/3] sfc: Use netif_device_{detach,attach}() around reset and self-test
From: Ben Hutchings @ 2011-05-17 16:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1305647988.2848.14.camel@bwh-desktop>

We need to keep the TX queues stopped throughout a reset, without
triggering the TX watchdog and regardless of the link state.  The
proper way to do this is to use netif_device_{detach,attach}() just as
we do around suspend/resume, rather than the current bodge of faking
link-down.

Since we also need to do this during an offline self-test and we
perform a reset during that, add these function calls outside of
efx_reset_down() and efx_reset_up().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/efx.c        |    9 +++------
 drivers/net/sfc/net_driver.h |    4 +---
 drivers/net/sfc/selftest.c   |   11 +++++------
 drivers/net/sfc/tx.c         |    9 +++++----
 4 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 796c47e..05502b3 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -798,11 +798,6 @@ void efx_link_status_changed(struct efx_nic *efx)
 	if (!netif_running(efx->net_dev))
 		return;
 
-	if (efx->port_inhibited) {
-		netif_carrier_off(efx->net_dev);
-		return;
-	}
-
 	if (link_state->up != netif_carrier_ok(efx->net_dev)) {
 		efx->n_link_state_changes++;
 
@@ -1450,7 +1445,7 @@ static void efx_start_all(struct efx_nic *efx)
 	 * restart the transmit interface early so the watchdog timer stops */
 	efx_start_port(efx);
 
-	if (efx_dev_registered(efx) && !efx->port_inhibited)
+	if (efx_dev_registered(efx) && netif_device_present(efx->net_dev))
 		netif_tx_wake_all_queues(efx->net_dev);
 
 	efx_for_each_channel(channel, efx)
@@ -2114,6 +2109,7 @@ int efx_reset(struct efx_nic *efx, enum reset_type method)
 	netif_info(efx, drv, efx->net_dev, "resetting (%s)\n",
 		   RESET_TYPE(method));
 
+	netif_device_detach(efx->net_dev);
 	efx_reset_down(efx, method);
 
 	rc = efx->type->reset(efx, method);
@@ -2147,6 +2143,7 @@ out:
 		efx->state = STATE_DISABLED;
 	} else {
 		netif_dbg(efx, drv, efx->net_dev, "reset complete\n");
+		netif_device_attach(efx->net_dev);
 	}
 	return rc;
 }
diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index 5718260..ce9697b 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -670,13 +670,12 @@ struct efx_filter_state;
  * @mtd_list: List of MTDs attached to the NIC
  * @nic_data: Hardware dependent state
  * @mac_lock: MAC access lock. Protects @port_enabled, @phy_mode,
- *	@port_inhibited, efx_monitor() and efx_reconfigure_port()
+ *	efx_monitor() and efx_reconfigure_port()
  * @port_enabled: Port enabled indicator.
  *	Serialises efx_stop_all(), efx_start_all(), efx_monitor() and
  *	efx_mac_work() with kernel interfaces. Safe to read under any
  *	one of the rtnl_lock, mac_lock, or netif_tx_lock, but all three must
  *	be held to modify it.
- * @port_inhibited: If set, the netif_carrier is always off. Hold the mac_lock
  * @port_initialized: Port initialized?
  * @net_dev: Operating system network device. Consider holding the rtnl lock
  * @stats_buffer: DMA buffer for statistics
@@ -764,7 +763,6 @@ struct efx_nic {
 	struct mutex mac_lock;
 	struct work_struct mac_work;
 	bool port_enabled;
-	bool port_inhibited;
 
 	bool port_initialized;
 	struct net_device *net_dev;
diff --git a/drivers/net/sfc/selftest.c b/drivers/net/sfc/selftest.c
index 50ad3bc..822f6c2 100644
--- a/drivers/net/sfc/selftest.c
+++ b/drivers/net/sfc/selftest.c
@@ -695,12 +695,12 @@ int efx_selftest(struct efx_nic *efx, struct efx_self_tests *tests,
 	/* Offline (i.e. disruptive) testing
 	 * This checks MAC and PHY loopback on the specified port. */
 
-	/* force the carrier state off so the kernel doesn't transmit during
-	 * the loopback test, and the watchdog timeout doesn't fire. Also put
-	 * falcon into loopback for the register test.
+	/* Detach the device so the kernel doesn't transmit during the
+	 * loopback test and the watchdog timeout doesn't fire.
 	 */
+	netif_device_detach(efx->net_dev);
+
 	mutex_lock(&efx->mac_lock);
-	efx->port_inhibited = true;
 	if (efx->loopback_modes) {
 		/* We need the 312 clock from the PHY to test the XMAC
 		 * registers, so move into XGMII loopback if available */
@@ -750,12 +750,11 @@ int efx_selftest(struct efx_nic *efx, struct efx_self_tests *tests,
 	/* restore the PHY to the previous state */
 	mutex_lock(&efx->mac_lock);
 	efx->phy_mode = phy_mode;
-	efx->port_inhibited = false;
 	efx->loopback_mode = loopback_mode;
 	__efx_reconfigure_port(efx);
 	mutex_unlock(&efx->mac_lock);
 
-	netif_tx_wake_all_queues(efx->net_dev);
+	netif_device_attach(efx->net_dev);
 
 	return rc_test;
 }
diff --git a/drivers/net/sfc/tx.c b/drivers/net/sfc/tx.c
index d2c85df..84eb99e 100644
--- a/drivers/net/sfc/tx.c
+++ b/drivers/net/sfc/tx.c
@@ -205,7 +205,9 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 					goto unwind;
 				}
 				smp_mb();
-				netif_tx_start_queue(tx_queue->core_txq);
+				if (likely(!efx->loopback_selftest))
+					netif_tx_start_queue(
+						tx_queue->core_txq);
 			}
 
 			insert_ptr = tx_queue->insert_count & tx_queue->ptr_mask;
@@ -338,8 +340,7 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
 	struct efx_tx_queue *tx_queue;
 	unsigned index, type;
 
-	if (unlikely(efx->port_inhibited))
-		return NETDEV_TX_BUSY;
+	EFX_WARN_ON_PARANOID(!netif_device_present(net_dev));
 
 	index = skb_get_queue_mapping(skb);
 	type = skb->ip_summed == CHECKSUM_PARTIAL ? EFX_TXQ_TYPE_OFFLOAD : 0;
@@ -436,7 +437,7 @@ void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index)
 	smp_mb();
 	if (unlikely(netif_tx_queue_stopped(tx_queue->core_txq)) &&
 	    likely(efx->port_enabled) &&
-	    likely(!efx->port_inhibited)) {
+	    likely(netif_device_present(efx->net_dev))) {
 		fill_level = tx_queue->insert_count - tx_queue->read_count;
 		if (fill_level < EFX_TXQ_THRESHOLD(efx)) {
 			EFX_BUG_ON_PARANOID(!efx_dev_registered(efx));
-- 
1.7.4
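
As a rough illustration of the reasoning in the changelog, here is a
userspace model of the device-present flag. It assumes, as I understand
the core code, that detaching a running device stops its TX queues and
that the TX watchdog only acts on a device that is present, running and
has carrier. The demo_ names are hypothetical and nothing here is driver
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_netdev {
	bool running;		/* models netif_running() */
	bool present;		/* models netif_device_present() */
	bool carrier_ok;	/* models netif_carrier_ok() */
	bool tx_stopped;	/* all TX queues stopped */
};

/* Models netif_device_detach(): mark absent, stop queues if running. */
static void demo_device_detach(struct demo_netdev *dev)
{
	if (dev->present) {
		dev->present = false;
		if (dev->running)
			dev->tx_stopped = true;
	}
}

/* Models netif_device_attach(): mark present, wake queues if running. */
static void demo_device_attach(struct demo_netdev *dev)
{
	if (!dev->present) {
		dev->present = true;
		if (dev->running)
			dev->tx_stopped = false;
	}
}

/* The watchdog only complains about a present, running device with carrier. */
static bool demo_watchdog_would_fire(const struct demo_netdev *dev,
				     bool timed_out)
{
	assert(dev != NULL);
	return dev->present && dev->running && dev->carrier_ok &&
	       dev->tx_stopped && timed_out;
}
```

In this model a detached device never trips the watchdog whatever the
carrier state is, which is the property the reset and self-test paths
rely on.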


-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH net-next-2.6 2/3] sfc: Fix TX queue numbering when separate_tx_channels=1
From: Ben Hutchings @ 2011-05-17 16:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1305647988.2848.14.camel@bwh-desktop>

This option appears to have been broken by commit
8313aca38b3937947fffebca6e34bac8e24300c8 ('sfc: Allocate each channel
separately, along with its RX and TX queues').

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/efx.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 38a55e9..796c47e 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -1319,8 +1319,20 @@ static void efx_remove_interrupts(struct efx_nic *efx)
 
 static void efx_set_channels(struct efx_nic *efx)
 {
+	struct efx_channel *channel;
+	struct efx_tx_queue *tx_queue;
+
 	efx->tx_channel_offset =
 		separate_tx_channels ? efx->n_channels - efx->n_tx_channels : 0;
+
+	/* We need to adjust the TX queue numbers if we have separate
+	 * RX-only and TX-only channels.
+	 */
+	efx_for_each_channel(channel, efx) {
+		efx_for_each_channel_tx_queue(tx_queue, channel)
+			tx_queue->queue -= (efx->tx_channel_offset *
+					    EFX_TXQ_TYPES);
+	}
 }
 
 static int efx_probe_nic(struct efx_nic *efx)
-- 
1.7.4
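
A worked example of the renumbering may help. The sketch below assumes
that each TX queue was initially numbered channel * EFX_TXQ_TYPES + type
at allocation time and that EFX_TXQ_TYPES is 4; both are assumptions made
for the example, and the demo_ names are hypothetical.

```c
#include <assert.h>

#define DEMO_TXQ_TYPES 4	/* assumed value of EFX_TXQ_TYPES */

/* Queue number as assumed to be assigned when the channel is allocated. */
static unsigned demo_initial_queue(unsigned channel, unsigned type)
{
	assert(type < DEMO_TXQ_TYPES);
	return channel * DEMO_TXQ_TYPES + type;
}

/* Mirrors the tx_channel_offset expression in efx_set_channels(). */
static unsigned demo_tx_channel_offset(int separate_tx_channels,
				       unsigned n_channels,
				       unsigned n_tx_channels)
{
	return separate_tx_channels ? n_channels - n_tx_channels : 0;
}

/* Mirrors: tx_queue->queue -= tx_channel_offset * EFX_TXQ_TYPES */
static unsigned demo_adjusted_queue(unsigned queue, unsigned tx_channel_offset)
{
	assert(queue >= tx_channel_offset * DEMO_TXQ_TYPES);
	return queue - tx_channel_offset * DEMO_TXQ_TYPES;
}
```

With 4 channels of which 2 are TX-only, the offset is 2, so the first
queue of channel 2 moves from 8 down to 0. Without separate TX channels
the offset is 0 and the numbers are unchanged.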



-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH net-next-2.6 1/3] sfc: Fix return value from efx_ethtool_set_rx_ntuple()
From: Ben Hutchings @ 2011-05-17 16:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1305647988.2848.14.camel@bwh-desktop>

ethtool_ops::set_rx_ntuple is supposed to return 0 on success, but it
currently returns the filter ID when it inserts or modifies a filter.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/ethtool.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sfc/ethtool.c b/drivers/net/sfc/ethtool.c
index 8c5e005..348437a 100644
--- a/drivers/net/sfc/ethtool.c
+++ b/drivers/net/sfc/ethtool.c
@@ -955,8 +955,9 @@ static int efx_ethtool_set_rx_ntuple(struct net_device *net_dev,
 
 	if (ntuple->fs.action == ETHTOOL_RXNTUPLE_ACTION_CLEAR)
 		return efx_filter_remove_filter(efx, &filter);
-	else
-		return efx_filter_insert_filter(efx, &filter, true);
+
+	rc = efx_filter_insert_filter(efx, &filter, true);
+	return rc < 0 ? rc : 0;
 }
 
 static int efx_ethtool_get_rxfh_indir(struct net_device *net_dev,
-- 
1.7.4
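
The return-value convention being fixed can be sketched in isolation. The
helper below is hypothetical; it only models the final expression of the
patched function, assuming the insert routine returns a non-negative
filter ID on success and a negative errno on failure.

```c
#include <assert.h>
#include <errno.h>

/*
 * Collapse "filter ID or negative errno" into "0 or negative errno",
 * which is what the ethtool operation is expected to return.
 */
static int demo_ntuple_status(int insert_rc)
{
	return insert_rc < 0 ? insert_rc : 0;
}
```

A filter ID such as 17 becomes 0, while -EBUSY is passed through
unchanged.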



-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* pull request: sfc-next-2.6 2011-05-17
From: Ben Hutchings @ 2011-05-17 15:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

The following changes since commit 7be799a70ba3dd90a59e8d2c72bbe06020005b3f:

  ipv4: Remove rt->rt_dst reference from ip_forward_options(). (2011-05-13 17:31:02 -0400)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next-2.6.git master

Miscellaneous bug fixes.

Ben.

Ben Hutchings (3):
      sfc: Fix return value from efx_ethtool_set_rx_ntuple()
      sfc: Fix TX queue numbering when separate_tx_channels=1
      sfc: Use netif_device_{detach,attach}() around reset and self-test

 drivers/net/sfc/efx.c        |   21 +++++++++++++++------
 drivers/net/sfc/ethtool.c    |    5 +++--
 drivers/net/sfc/net_driver.h |    4 +---
 drivers/net/sfc/selftest.c   |   11 +++++------
 drivers/net/sfc/tx.c         |    9 +++++----
 5 files changed, 29 insertions(+), 21 deletions(-)

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH] net: ping: fix build failure
From: Randy Dunlap @ 2011-05-17 15:41 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: David Miller, randy.dunlap, sfr, netdev, linux-next, linux-kernel
In-Reply-To: <20110517101656.GA28685@albatros>

On Tue, 17 May 2011 14:16:56 +0400 Vasiliy Kulikov wrote:

> On Mon, May 16, 2011 at 15:38 -0400, David Miller wrote:
> > From: Randy Dunlap <randy.dunlap@oracle.com>
> > Date: Mon, 16 May 2011 12:35:34 -0700
> > 
> > > On Mon, 16 May 2011 15:10:19 +1000 Stephen Rothwell wrote:
> > > when CONFIG_PROC_SYSCTL is not enabled:
> > > 
> > > ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'
> > 
> > Vasiliy, please fix this.
> 
> I wonder whether there is any way to test such unusual configurations?
> Only randconfig or are there any (partly-)automated tools for it?
> 
> 
> [PATCH] net: ping: fix build failure
> 
> If CONFIG_PROC_SYSCTL=n the building process fails:
> 
>     ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'
> 
> Moved inet_get_ping_group_range_net() to ping.c.
> 
> Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>

Acked-by: Randy Dunlap <randy.dunlap@oracle.com>

Thanks.

> ---
>  include/net/ping.h         |    2 --
>  net/ipv4/ping.c            |   13 +++++++++++++
>  net/ipv4/sysctl_net_ipv4.c |   12 ------------
>  3 files changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/include/net/ping.h b/include/net/ping.h
> index 23062c3..682b5ae 100644
> --- a/include/net/ping.h
> +++ b/include/net/ping.h
> @@ -44,8 +44,6 @@ extern struct proto ping_prot;
>  extern void ping_rcv(struct sk_buff *);
>  extern void ping_err(struct sk_buff *, u32 info);
>  
> -extern void inet_get_ping_group_range_net(struct net *net, unsigned int *low, unsigned int *high);
> -
>  #ifdef CONFIG_PROC_FS
>  extern int __init ping_proc_init(void);
>  extern void ping_proc_exit(void);
> diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> index 7041d09..3635975 100644
> --- a/net/ipv4/ping.c
> +++ b/net/ipv4/ping.c
> @@ -188,6 +188,19 @@ exit:
>  	return sk;
>  }
>  
> +static void inet_get_ping_group_range_net(struct net *net, gid_t *low, gid_t *high)
> +{
> +	gid_t *data = net->ipv4.sysctl_ping_group_range;
> +	unsigned seq;
> +	do {
> +		seq = read_seqbegin(&sysctl_local_ports.lock);
> +
> +		*low = data[0];
> +		*high = data[1];
> +	} while (read_seqretry(&sysctl_local_ports.lock, seq));
> +}
> +
> +
>  static int ping_init_sock(struct sock *sk)
>  {
>  	struct net *net = sock_net(sk);
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index 28e8273..57d0752 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -73,18 +73,6 @@ static int ipv4_local_port_range(ctl_table *table, int write,
>  }
>  
>  
> -void inet_get_ping_group_range_net(struct net *net, gid_t *low, gid_t *high)
> -{
> -	gid_t *data = net->ipv4.sysctl_ping_group_range;
> -	unsigned seq;
> -	do {
> -		seq = read_seqbegin(&sysctl_local_ports.lock);
> -
> -		*low = data[0];
> -		*high = data[1];
> -	} while (read_seqretry(&sysctl_local_ports.lock, seq));
> -}
> -
>  void inet_get_ping_group_range_table(struct ctl_table *table, gid_t *low, gid_t *high)
>  {
>  	gid_t *data = table->data;
> -- 
> 1.7.0.4


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
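
The helper being moved reads both ends of the range inside a
read_seqbegin()/read_seqretry() loop. Below is a single-threaded
userspace model of that sequence-counter protocol, for illustration only:
it leaves out the memory barriers and the writer-side lock that a real
seqlock needs, the reader must not be called in the middle of a write,
and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_seqlock {
	unsigned seq;	/* even: idle, odd: write in progress */
};

static unsigned demo_read_seqbegin(const struct demo_seqlock *sl)
{
	return sl->seq;
}

/* Retry if a write was in progress at the start or has happened since. */
static bool demo_read_seqretry(const struct demo_seqlock *sl, unsigned start)
{
	return (start & 1) || sl->seq != start;
}

static void demo_write_begin(struct demo_seqlock *sl)
{
	sl->seq++;
}

static void demo_write_end(struct demo_seqlock *sl)
{
	sl->seq++;
}

/* Same shape as the loop in inet_get_ping_group_range_net(). */
static void demo_get_range(const struct demo_seqlock *sl,
			   const unsigned data[2],
			   unsigned *low, unsigned *high)
{
	unsigned seq;

	assert(low != 0 && high != 0);
	do {
		seq = demo_read_seqbegin(sl);
		*low = data[0];
		*high = data[1];
	} while (demo_read_seqretry(sl, seq));
}
```

The point of the loop is that low and high always come from the same
update: a reader that overlaps a write sees the counter change and reads
both values again.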

^ permalink raw reply

* Re: [PATCH 0/7] Network namespace manipulation with file descriptors
From: David Lamparter @ 2011-05-17 15:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Lamparter, Alex Bligh, linux-arch, netdev, linux-kernel,
	Linux Containers, linux-fsdevel
In-Reply-To: <m1mxil9z2p.fsf@fess.ebiederm.org>

On Tue, May 17, 2011 at 07:33:18AM -0700, Eric W. Biederman wrote:
> I went the round of keeping a daemon open, saw how much code that
> takes and how fragile that can be in the corner cases and decided to
> patch the kernel to make the interfaces better.

Yes, it is more fragile. I'm currently using it without setns-patched
kernels though, so it's all I have.

(More or less related, I was able to cause a kernel oops on doing a
"find /proc" on setns-patched kernels after bind-mounting /proc; I did
not report it because I'm running grsecurity-patched kernels which tend
to mess with /proc quite a bit... I'll try to reproduce it with your
current patches on an otherwise vanilla kernel.)

> > I also have patches for OpenVPN and pptpd floating around that make it
> > possible to 'cross' namespace boundaries, i.e. the VPN servers listen in
> > one namespace and have their devices in another.
> 
> For openvpn I have managed to get away with simply using an up script. 
> Mostly the script is:
> 
> ip netns add $NSNAME || true
> ip netns exec $NSNAME ip link set lo up
> ip link set $dev netns $NSNAME

Historic annotation: This line used to kernel panic around a year and a
half back - nowadays tap devices do get the netns move right...

> ip netns exec $NSNAME ip link set $dev up
> ip netns exec $NSNAME ifconfig $dev $ifconfig_local netmask $ifconfig_netmask broadcast $ifconfig_broadcast
> 
> With a few extra bits for dns options and routes.  If I had an openvpn
> built with the iproute option I expect I could get away by just wrapping
> iproute.  Not that I would mind a patched openvpn.

I didn't even try to handle DNS specially; I just put 127.0.0.1 and have
caches in each of the namespaces. Wrapping iproute2 would work; one of
the advantages of patching OpenVPN is that the OpenVPN daemon is in the
same namespace as the tap devices and can do all the configuration as
usual.

For pptp, er, well, reading up on how ppp devices behave if you splice
them across namespace boundaries would've taken more time than patching
up the UDP/GRE sockets.

> Personally I think using a vpn in a network namespace seems like a
> killer feature.

Yes, it very much is - it provides very nice and clean solutions to
problems that up to now were usually hacked around with policy routing &
co.


-David

^ permalink raw reply

* Re: [PATCH V5 4/6 net-next] vhost: vhost TX zero-copy support
From: Shirley Ma @ 2011-05-17 15:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <20110517152840.GA2389@redhat.com>

On Tue, 2011-05-17 at 18:28 +0300, Michael S. Tsirkin wrote:
> Which is the order the descriptors are put on avail ring.
> By design, guest should not depend on used ring entries
> being in order with avail ring (and btw with virtio block,
> they aren't). If it does, it's a bug I think.

OK, I thought the order should be maintained.

> > The original code has no problem, because it gets one head and pass that
> > head to vhost_add_used one by one once done the copy. So it's in
> > sequence.
> > 
> > This issue can easily recreate without zerocopy patch by simply changing
> > the order from "head return vhost_get_vq_desc" when passing to
> > vhost_add_used.
> > 
> > Thanks
> > Shirley
> 
> Ah, did you try that? Could you post this patch pls?
> This seems to imply a bug in guest virtio. 

I am creating the patch against net-next for you to test today.

Thanks
Shirley

^ permalink raw reply

* Re: [PATCH V5 4/6 net-next] vhost: vhost TX zero-copy support
From: Michael S. Tsirkin @ 2011-05-17 15:28 UTC (permalink / raw)
  To: Shirley Ma
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <1305645734.10756.14.camel@localhost.localdomain>

On Tue, May 17, 2011 at 08:22:14AM -0700, Shirley Ma wrote:
> On Tue, 2011-05-17 at 08:55 +0300, Michael S. Tsirkin wrote:
> > Something in your patch that overwrites the id in vhost
> > and makes it put the wrong id in the used ring?
> > 
> > By the way, need to keep in mind that a guest can
> > give us the same head twice, need to make sure this
> > at least does not corrupt host memory.
> 
> I think I didn't explain the problem very well here.
> 
> This patch doesn't overwrite the id. It just keeps the same coming
> sequence from "head return vhost_get_vq_desc()" to pass to
> vhost_add_used.
> 
> The same ids can be used many times once it passes to guest from
> vhost_add_used. There is no problem. The zero copy patch doesn't have
> any issue.
> 
> The problem is the order of head from return vhost_get_vq_desc should be
> in sequence when it passes to vhost_add_used.

Which is the order the descriptors are put on avail ring.
By design, guest should not depend on used ring entries
being in order with avail ring (and btw with virtio block,
they aren't). If it does, it's a bug I think.

> The original code has no problem, because it gets one head and pass that
> head to vhost_add_used one by one once done the copy. So it's in
> sequence.
> 
> This issue can easily recreate without zerocopy patch by simply changing
> the order from "head return vhost_get_vq_desc" when passing to
> vhost_add_used.
> 
> Thanks
> Shirley

Ah, did you try that? Could you post this patch pls?
This seems to imply a bug in guest virtio.

-- 
MST
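
To make the point about ordering concrete, here is a small userspace
sketch of a guest that completes buffers by looking up the head id carried
in each used entry, so it does not care in which order the host returns
them. The structure layout and the demo_ names are simplified assumptions
and not the virtio_ring implementation.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 8

/* Simplified used-ring element: head id plus bytes written. */
struct demo_used_elem {
	unsigned id;
	unsigned len;
};

struct demo_guest {
	void *token[DEMO_RING_SIZE];	/* per-head cookie, indexed by head id */
	unsigned completed;
};

/*
 * Complete one used entry. The cookie is found through e->id, never
 * through the position of the entry in the used ring.
 */
static void *demo_get_buf(struct demo_guest *g, const struct demo_used_elem *e)
{
	void *tok;

	assert(e->id < DEMO_RING_SIZE);
	tok = g->token[e->id];
	g->token[e->id] = NULL;
	if (tok)
		g->completed++;
	return tok;
}
```

If heads 0, 1 and 2 were made available in that order and come back as
2, 0, 1, each call still returns the cookie that belongs to its head.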

^ permalink raw reply

* Re: [PATCH V5 4/6 net-next] vhost: vhost TX zero-copy support
From: Shirley Ma @ 2011-05-17 15:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev,
	kvm, linux-kernel
In-Reply-To: <20110517055503.GA26989@redhat.com>

On Tue, 2011-05-17 at 08:55 +0300, Michael S. Tsirkin wrote:
> Something in your patch that overwrites the id in vhost
> and makes it put the wrong id in the used ring?
> 
> By the way, need to keep in mind that a guest can
> give us the same head twice, need to make sure this
> at least does not corrupt host memory.

I think I didn't explain the problem very well here.

This patch doesn't overwrite the id. It just passes the heads to
vhost_add_used in the same sequence in which vhost_get_vq_desc()
returned them.

The same ids can be reused many times once they have been passed back to
the guest through vhost_add_used. There is no problem there, and the
zero-copy patch itself doesn't have any issue.

The problem is that the heads returned by vhost_get_vq_desc need to be
passed to vhost_add_used in the same order.

The original code has no problem because it gets one head and passes
that head to vhost_add_used as soon as the copy is done, one by one. So
it's in sequence.

The issue can easily be reproduced without the zerocopy patch by simply
changing the order of the heads returned by vhost_get_vq_desc when
passing them to vhost_add_used.

Thanks
Shirley

^ permalink raw reply

* Re: [PATCH] net: tuntap: Fix tun_net_fix_features()
From: Michael S. Tsirkin @ 2011-05-17 15:11 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, Herbert Xu, Ben Hutchings, Shan Wei
In-Reply-To: <20110517150029.GA23179@rere.qmqm.pl>

On Tue, May 17, 2011 at 05:00:29PM +0200, Michał Mirosław wrote:
> On Tue, May 17, 2011 at 05:54:28PM +0300, Michael S. Tsirkin wrote:
> > On Tue, May 17, 2011 at 04:46:35PM +0200, Michał Mirosław wrote:
> > > On Tue, May 17, 2011 at 05:29:43PM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, May 17, 2011 at 10:19:54AM +0200, Michał Mirosław wrote:
> > > > > tun->set_features are meant to limit not force the features.
> [...]
> > > > One thing that this will do though: previously, if
> > > > ethtool disables offloads, then an application enables
> > > > them, the application will have the last say.
> > > > With this patch, the most conservative approach wins.
> > > > Right?
> > > 
> > > Exactly.
> > > 
> > > On device creation, wanted_features default to all offloads
> > > enabled, so unless an admin changes the flags, the application controls
> > > what is enabled. This matters only when using persistent tun/tap and
> > > admin and user are two different people. If the admin is using queues
> > > and doesn't want to handle e.g. TSO packets (I'm not sure if they are
> > > properly accounted in all queuing disciplines), then the feature should
> > > not be enabled by user.
> [...]
> > Yes, with virtualization admin and the app are two different people
> > usually.  The device doesn't have to be persistent though I think -
> > what limits this to persistent devices?
> 
> Hmm. Nothing really. I just forgot about the virtualization case. You
> usually will change the offloads just after device creation unless you're
> testing or debugging something.

That's true. kvm invokes a user script after creating the device
but just before configuring it. If there is a problem,
it's likely only because of something such a script might do
(which used to be harmless). My gut feeling is this
is unlikely.

> > I agree this behaviour seems more consistent, I just hope this change
> > does not break any setups.
> 
> The only effect would be some performance drop in cases where the admin
> turned off the offloads, which now stay off regardless of what the user
> part does.
> 
> Best Regards,
> Michał Mirosław

The performance drop is actually quite drastic :), but yes it
will keep going, which is a good thing.

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: tuntap: Fix tun_net_fix_features()
From: Michał Mirosław @ 2011-05-17 15:00 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, Herbert Xu, Ben Hutchings, Shan Wei
In-Reply-To: <20110517145428.GA1472@redhat.com>

On Tue, May 17, 2011 at 05:54:28PM +0300, Michael S. Tsirkin wrote:
> On Tue, May 17, 2011 at 04:46:35PM +0200, Michał Mirosław wrote:
> > On Tue, May 17, 2011 at 05:29:43PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, May 17, 2011 at 10:19:54AM +0200, Michał Mirosław wrote:
> > > > tun->set_features are meant to limit not force the features.
[...]
> > > One thing that this will do though: previously, if
> > > ethtool disables offloads, then an application enables
> > > them, the application will have the last say.
> > > With this patch, the most conservative approach wins.
> > > Right?
> > 
> > Exactly.
> > 
> > On device creation, wanted_features default to all offloads
> > enabled, so unless an admin changes the flags, the application controls
> > what is enabled. This matters only when using persistent tun/tap and
> > admin and user are two different people. If the admin is using queues
> > and doesn't want to handle e.g. TSO packets (I'm not sure if they are
> > properly accounted in all queuing disciplines), then the feature should
> > not be enabled by user.
[...]
> Yes, with virtualization admin and the app are two different people
> usually.  The device doesn't have to be persistent though I think -
> what limits this to persistent devices?

Hmm. Nothing really. I just forgot about the virtualization case. You
usually will change the offloads just after device creation unless you're
testing or debugging something.

> I agree this behaviour seems more consistent, I just hope this change
> does not break any setups.

The only effect would be some performance drop in cases where the admin
turned off the offloads, which now stay off regardless of what the user
part does.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH] net: ping: fix build failure
From: Randy Dunlap @ 2011-05-17 14:58 UTC (permalink / raw)
  To: Vasiliy Kulikov; +Cc: David Miller, sfr, netdev, linux-next, linux-kernel
In-Reply-To: <20110517101656.GA28685@albatros>

On 05/17/11 03:16, Vasiliy Kulikov wrote:
> On Mon, May 16, 2011 at 15:38 -0400, David Miller wrote:
>> From: Randy Dunlap <randy.dunlap@oracle.com>
>> Date: Mon, 16 May 2011 12:35:34 -0700
>>
>>> On Mon, 16 May 2011 15:10:19 +1000 Stephen Rothwell wrote:
>>> when CONFIG_PROC_SYSCTL is not enabled:
>>>
>>> ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'
>>
>> Vasiliy, please fix this.
> 
> I wonder whether there is any way to test such unusual configurations?
> Only randconfig or are there any (partly-)automated tools for it?

Hi,

I do automated (cron) randconfigs nightly and occasionally I run a script
that builds a kernel with each one of these options disabled (although
lately CONFIG_PM is not being disabled, but I'll fix that one day):

BLOCK HOTPLUG INET INPUT MAGIC_SYSRQ NET PCI PM PROC_FS SMP SYSFS

Hm, I should add SYSCTL (and/or PROC_SYSCTL) to that list.  It often has issues.

-- 
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* Re: [PATCH] net: tuntap: Fix tun_net_fix_features()
From: Michael S. Tsirkin @ 2011-05-17 14:54 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev, Herbert Xu, Ben Hutchings, Shan Wei
In-Reply-To: <20110517144635.GA22878@rere.qmqm.pl>

On Tue, May 17, 2011 at 04:46:35PM +0200, Michał Mirosław wrote:
> On Tue, May 17, 2011 at 05:29:43PM +0300, Michael S. Tsirkin wrote:
> > On Tue, May 17, 2011 at 10:19:54AM +0200, Michał Mirosław wrote:
> > > tun->set_features are meant to limit not force the features.
> > > 
> > > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > ---
> > >  drivers/net/tun.c |    2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 74e9405..f77c6d0 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -458,7 +458,7 @@ static u32 tun_net_fix_features(struct net_device *dev, u32 features)
> > >  {
> > >  	struct tun_struct *tun = netdev_priv(dev);
> > >  
> > > -	return (features & tun->set_features) | (features & ~TUN_USER_FEATURES);
> > > +	return features & (tun->set_features | ~TUN_USER_FEATURES);
> > >  }
> > >  
> > >  static const struct net_device_ops tun_netdev_ops = {
> > > -- 
> > > 1.7.2.5
> > 
> > One thing that this will do though: previously, if
> > ethtool disables offloads, then an application enables
> > them, the application will have the last say.
> > With this patch, the most conservative approach wins.
> > Right?
> 
> Exactly.
> 
> On device creation, wanted_features default to all offloads
> enabled, so unless an admin changes the flags, the application controls
> what is enabled. This matters only when using persistent tun/tap and
> admin and user are two different people. If the admin is using queues
> and doesn't want to handle e.g. TSO packets (I'm not sure if they are
> properly accounted in all queuing disciplines), then the feature should
> not be enabled by user.
> 
> > If we want to have the existing behaviour
> > I think the following would do this (untested). What do you think?
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 74e9405..1d6c7bc 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1199,6 +1199,8 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> >  		return -EINVAL;
> >  
> >  	tun->set_features = features;
> > +	tun->dev->features &= ~TUN_USER_FEATURES;
> > +	tun->dev->features |= (features & TUN_USER_FEATURES);
> >  	netdev_update_features(tun->dev);
> 
> tun->dev->features will be recalculated by netdev_update_features()
> anyway. For this to work as you described it would need to alter
> wanted_features. I don't like the idea that something other than one
> of ethtool_ops is changing this field, as it then becomes something
> else than what the admin wants (even if that is not what he gets).
> 
> Best Regards,
> Michał Mirosław

Yes, with virtualization admin and the app are two different people
usually.  The device doesn't have to be persistent though I think -
what limits this to persistent devices?
I agree this behaviour seems more consistent, I just hope this change
does not break any setups.

-- 
MST

^ permalink raw reply
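
The "most conservative approach wins" behaviour discussed above can be
checked with a small standalone sketch. The bit values and the names F_SG,
F_TSO, F_CSUM and USER_FEATURES are hypothetical stand-ins (the real
NETIF_F_* and TUN_USER_FEATURES values differ); only the two return
expressions are taken from the diff. One thing worth noting: as plain
bitwise expressions, the removed and the added line evaluate to the same
value, because AND distributes over OR, so the sketch checks both forms
against each other.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's feature bits. */
#define F_SG	(1u << 0)	/* a feature outside user control */
#define F_TSO	(1u << 1)	/* user-controllable offload */
#define F_CSUM	(1u << 2)	/* user-controllable offload */
#define USER_FEATURES	(F_TSO | F_CSUM)	/* role of TUN_USER_FEATURES */

/* Return expression from the removed ("-") line of the diff. */
static uint32_t fix_features_old(uint32_t features, uint32_t set_features)
{
	return (features & set_features) | (features & ~USER_FEATURES);
}

/* Return expression from the added ("+") line of the diff. */
static uint32_t fix_features_new(uint32_t features, uint32_t set_features)
{
	return features & (set_features | ~USER_FEATURES);
}

/* Compare both forms over every combination of the three toy bits. */
static void check_fix_features(void)
{
	uint32_t f, s;

	for (f = 0; f < 8; f++)
		for (s = 0; s < 8; s++)
			assert(fix_features_old(f, s) == fix_features_new(f, s));
}
```

In this model a user-controllable bit survives only if it is set both in
the requested features and in set_features, while bits outside
USER_FEATURES pass through unchanged. For example, requesting
F_SG | F_TSO | F_CSUM with set_features = F_TSO yields F_SG | F_TSO.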
