Netdev List


* Setting mac address of loopback device
From: Cong Wang @ 2014-01-23 19:53 UTC (permalink / raw)
  To: David Miller; +Cc: stephen, netdev

Hi,

I am wondering how much sense it makes to allow setting the mac
address of the loopback device?

We are trying to mirror the local traffic from lo to eth0, but
apparently the mac addresses in L2 header are all zero. Of course, one
solution is using pedit action to modify the mac addresses. But still,
if we could set the mac addr of lo, we don't need to modify the mac
header twice.

The patch itself is simple, probably just:

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c5011e0..a0ee030 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -160,6 +160,7 @@ static const struct net_device_ops loopback_ops = {
        .ndo_init      = loopback_dev_init,
        .ndo_start_xmit= loopback_xmit,
        .ndo_get_stats64 = loopback_get_stats64,
+       .ndo_set_mac_address = eth_mac_addr,
 };

 /*

BTW, how can I change the routing to redirect _local_ traffic (that is,
packets a host sends to itself) to a non-loopback device? I tried
modifying the local route table, but had no luck getting it to work.

Thanks!

^ permalink raw reply related

* [PATCH ebtables] arptables: long option "--set-counters" were missing
From: Jesper Dangaard Brouer @ 2014-01-23 20:09 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: netdev

The long option "--set-counters" was missing in option parsing.
And the corresponding short option "-c" was not mentioned in
the help usage text.

Also update the arptables man page with a description
of the parameter.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 userspace/arptables/arptables.8 |    8 ++++++++
 userspace/arptables/arptables.c |    3 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/userspace/arptables/arptables.8 b/userspace/arptables/arptables.8
index 0b6b62e..78b2c60 100644
--- a/userspace/arptables/arptables.8
+++ b/userspace/arptables/arptables.8
@@ -215,6 +215,14 @@ The target of the rule. This is one of the following values:
 a target extension (see
 .BR "TARGET EXTENSIONS" ")"
 or a user-defined chain name.
+.TP
+.BI "-c, --set-counters " "PKTS BYTES"
+This enables the administrator to initialize the packet and byte
+counters of a rule (during
+.B INSERT,
+.B APPEND,
+.B REPLACE
+operations).
 
 .SS RULE-SPECIFICATIONS
 The following command line arguments make up a rule specification (as used 
diff --git a/userspace/arptables/arptables.c b/userspace/arptables/arptables.c
index 4da6fea..3fb8ed5 100644
--- a/userspace/arptables/arptables.c
+++ b/userspace/arptables/arptables.c
@@ -152,6 +152,7 @@ static struct option original_opts[] = {
 	{ "help", 2, 0, 'h' },
 	{ "line-numbers", 0, 0, '0' },
 	{ "modprobe", 1, 0, 'M' },
+	{ "set-counters", 1, 0, 'c' },
 	{ 0 }
 };
 
@@ -529,7 +530,7 @@ exit_printhelp(void)
 "  --line-numbers		print line numbers when listing\n"
 "  --exact	-x		expand numbers (display exact values)\n"
 "  --modprobe=<command>		try to insert modules using this command\n"
-"  --set-counters PKTS BYTES	set the counter during insert/append\n"
+"  --set-counters -c PKTS BYTES	set the counter during insert/append\n"
 "[!] --version	-V		print package version.\n");
 	printf(" opcode strings: \n");
         for (i = 0; i < NUMOPCODES; i++)

^ permalink raw reply related

* [PATCH net-next v2 0/4] net: phy: neaten phy_print_status
From: Florian Fainelli @ 2014-01-23 20:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, joe, Florian Fainelli

David, Joe,

This patchset neatens phy_print_status based on earlier feedback from Joe.

Thanks!

Florian Fainelli (4):
  net: phy: use network device in phy_print_status
  net: phy: update phy_print_status to show pause settings
  net: phy: display human readable PHY speed settings
  net: phy: remove unneeded parenthesis

 drivers/net/phy/phy.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

-- 
1.8.3.2

^ permalink raw reply

* [PATCH net-next v2 3/4] net: phy: display human readable PHY speed settings
From: Florian Fainelli @ 2014-01-23 20:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, joe, Florian Fainelli
In-Reply-To: <1390508269-28769-1-git-send-email-f.fainelli@gmail.com>

Use a convenience function: phy_speed_to_str() which will display human
readable speeds.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Joe,

This is heavily inspired by the version you sent to me, although
I renamed it to phy_speed_to_str() for clarity and added the
unit suffix to each speed because your version would show:

Unknownmbps for instance.

 drivers/net/phy/phy.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 8ae2260..36fc6e1 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -38,6 +38,26 @@
 
 #include <asm/irq.h>
 
+static const char *phy_speed_to_str(int speed)
+{
+	switch (speed) {
+	case SPEED_10:
+		return "10Mbps";
+	case SPEED_100:
+		return "100Mbps";
+	case SPEED_1000:
+		return "1Gbps";
+	case SPEED_2500:
+		return "2.5Gbps";
+	case SPEED_10000:
+		return "10Gbps";
+	case SPEED_UNKNOWN:
+		return "Unknown";
+	default:
+		return "Unsupported (update phy.c)";
+	}
+}
+
 /**
  * phy_print_status - Convenience function to print out the current phy status
  * @phydev: the phy_device struct
@@ -46,8 +66,8 @@ void phy_print_status(struct phy_device *phydev)
 {
 	if (phydev->link) {
 		netdev_info(phydev->attached_dev,
-			"Link is Up - %d/%s - flow control %s\n",
-			phydev->speed,
+			"Link is Up - %s/%s - flow control %s\n",
+			phy_speed_to_str(phydev->speed),
 			DUPLEX_FULL == phydev->duplex ? "Full" : "Half",
 			phydev->pause ? "rx/tx" : "off");
 	} else	{
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next v2 4/4] net: phy: remove unneeded parenthesis
From: Florian Fainelli @ 2014-01-23 20:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, joe, Florian Fainelli
In-Reply-To: <1390508269-28769-1-git-send-email-f.fainelli@gmail.com>

Each branch of the if/else statement in phy_print_status() consists of
a single statement, so remove the braces.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/phy.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 36fc6e1..59aa85e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -64,15 +64,14 @@ static const char *phy_speed_to_str(int speed)
  */
 void phy_print_status(struct phy_device *phydev)
 {
-	if (phydev->link) {
+	if (phydev->link)
 		netdev_info(phydev->attached_dev,
 			"Link is Up - %s/%s - flow control %s\n",
 			phy_speed_to_str(phydev->speed),
 			DUPLEX_FULL == phydev->duplex ? "Full" : "Half",
 			phydev->pause ? "rx/tx" : "off");
-	} else	{
+	else
 		netdev_info(phydev->attached_dev, "Link is Down\n");
-	}
 }
 EXPORT_SYMBOL(phy_print_status);
 
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next v2 1/4] net: phy: use network device in phy_print_status
From: Florian Fainelli @ 2014-01-23 20:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, joe, Florian Fainelli
In-Reply-To: <1390508269-28769-1-git-send-email-f.fainelli@gmail.com>

phy_print_status() currently uses dev_name(&phydev->dev) which will
usually result in printing something along those lines for Device Tree
aware drivers:

libphy: f0b60000.etherne:0a - Link is Down
libphy: f0ba0000.etherne:00 - Link is Up - 1000/Full

This is not terribly useful for network administrators or users since we
expect a network interface name to be able to correlate link events with
interfaces. Update phy_print_status() to use netdev_info() with
phydev->attached_dev which is the backing network device for our PHY
device. The leading dash is removed since netdev_info() prefixes the
messages with "<interface>: " already.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/phy.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 19c9eca..c35b2e7 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -45,12 +45,11 @@
 void phy_print_status(struct phy_device *phydev)
 {
 	if (phydev->link) {
-		pr_info("%s - Link is Up - %d/%s\n",
-			dev_name(&phydev->dev),
+		netdev_info(phydev->attached_dev, "Link is Up - %d/%s\n",
 			phydev->speed,
 			DUPLEX_FULL == phydev->duplex ? "Full" : "Half");
 	} else	{
-		pr_info("%s - Link is Down\n", dev_name(&phydev->dev));
+		netdev_info(phydev->attached_dev, "Link is Down\n");
 	}
 }
 EXPORT_SYMBOL(phy_print_status);
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH net-next v2 2/4] net: phy: update phy_print_status to show pause settings
From: Florian Fainelli @ 2014-01-23 20:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, joe, Florian Fainelli
In-Reply-To: <1390508269-28769-1-git-send-email-f.fainelli@gmail.com>

Update phy_print_status() to also display the PHY device pause settings
(rx/tx or off).

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/phy.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c35b2e7..8ae2260 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -45,9 +45,11 @@
 void phy_print_status(struct phy_device *phydev)
 {
 	if (phydev->link) {
-		netdev_info(phydev->attached_dev, "Link is Up - %d/%s\n",
+		netdev_info(phydev->attached_dev,
+			"Link is Up - %d/%s - flow control %s\n",
 			phydev->speed,
-			DUPLEX_FULL == phydev->duplex ? "Full" : "Half");
+			DUPLEX_FULL == phydev->duplex ? "Full" : "Half",
+			phydev->pause ? "rx/tx" : "off");
 	} else	{
 		netdev_info(phydev->attached_dev, "Link is Down\n");
 	}
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH v2 1/2] can: Decrease default size of CAN_RAW socket send queue
From: David Miller @ 2014-01-23 20:41 UTC (permalink / raw)
  To: sojkam1; +Cc: linux-can, netdev, mkl
In-Reply-To: <1390379257-9040-1-git-send-email-sojkam1@fel.cvut.cz>

From: Michal Sojka <sojkam1@fel.cvut.cz>
Date: Wed, 22 Jan 2014 09:27:36 +0100

> Since the length of the qdisc queue was set by default to 10
> packets, this is exactly what was happening.

This is your bug, set the qdisc limit to something more reasonable.

Something large enough to absorb the traffic wrt. the speed at which
the CAN device can sink the data.

These two patches are something I am not willing to apply to my
tree, this is not how you solve this problem, sorry.

^ permalink raw reply

* Re: [PATCH net-next] bonding: Use do_div to divide 64 bit numbers
From: Nikolay Aleksandrov @ 2014-01-23 20:44 UTC (permalink / raw)
  To: Zoltan Kiss, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	netdev, linux-kernel
In-Reply-To: <1390502838-21581-1-git-send-email-zoltan.kiss@citrix.com>

On 01/23/2014 07:47 PM, Zoltan Kiss wrote:
> Nikolay Aleksandrov's recent bonding option API changes (25a9b54a and e4994612)
> introduced u64 as the type of downdelay and updelay. On 32 bit the division and
> modulo operations cause compile errors:
> 
> ERROR: "__udivdi3" [drivers/net/bonding/bonding.ko] undefined!
> ERROR: "__umoddi3" [drivers/net/bonding/bonding.ko] undefined!
> 
> This patch use the do_div macro, which guaranteed to do the right thing.
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
>  drivers/net/bonding/bond_options.c |   19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index 4cee04a..4f94907 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -18,6 +18,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/ctype.h>
>  #include <linux/inet.h>
> +#include <asm/div64.h>
>  #include "bonding.h"
>  
>  static struct bond_opt_value bond_mode_tbl[] = {
> @@ -727,19 +728,20 @@ int bond_option_miimon_set(struct bonding *bond, struct bond_opt_value *newval)
>  
>  int bond_option_updelay_set(struct bonding *bond, struct bond_opt_value *newval)
>  {
> +	u64 quotient = newval->value;
> +	u64 remainder = do_div(quotient, bond->params.miimon);
Hi Zoltan,
Thanks for fixing this, a few comments though:
bond->params.miimon can be 0 here, which is why there's a check
afterwards, so the division must not happen before that check.
Also, please separate the local variable definitions from the body
with a blank line.
The same applies to downdelay.

Nik
>  	if (!bond->params.miimon) {
>  		pr_err("%s: Unable to set up delay as MII monitoring is disabled\n",
>  		       bond->dev->name);
>  		return -EPERM;
>  	}
> -	if ((newval->value % bond->params.miimon) != 0) {
> +	if (remainder != 0) {
>  		pr_warn("%s: Warning: up delay (%llu) is not a multiple of miimon (%d), updelay rounded to %llu ms\n",
>  			bond->dev->name, newval->value,
>  			bond->params.miimon,
> -			(newval->value / bond->params.miimon) *
> -			bond->params.miimon);
> +			quotient * bond->params.miimon);
>  	}
> -	bond->params.updelay = newval->value / bond->params.miimon;
> +	bond->params.updelay = quotient;
>  	pr_info("%s: Setting up delay to %d.\n",
>  		bond->dev->name,
>  		bond->params.updelay * bond->params.miimon);
> @@ -750,19 +752,20 @@ int bond_option_updelay_set(struct bonding *bond, struct bond_opt_value *newval)
>  int bond_option_downdelay_set(struct bonding *bond,
>  			      struct bond_opt_value *newval)
>  {
> +	u64 quotient = newval->value;
> +	u64 remainder = do_div(quotient, bond->params.miimon);
>  	if (!bond->params.miimon) {
>  		pr_err("%s: Unable to set down delay as MII monitoring is disabled\n",
>  		       bond->dev->name);
>  		return -EPERM;
>  	}
> -	if ((newval->value % bond->params.miimon) != 0) {
> +	if (remainder != 0) {
>  		pr_warn("%s: Warning: down delay (%llu) is not a multiple of miimon (%d), delay rounded to %llu ms\n",
>  			bond->dev->name, newval->value,
>  			bond->params.miimon,
> -			(newval->value / bond->params.miimon) *
> -			bond->params.miimon);
> +			quotient * bond->params.miimon);
>  	}
> -	bond->params.downdelay = newval->value / bond->params.miimon;
> +	bond->params.downdelay = quotient;
>  	pr_info("%s: Setting down delay to %d.\n",
>  		bond->dev->name,
>  		bond->params.downdelay * bond->params.miimon);
> 
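
A minimal userspace sketch of the ordering Nik asks for: validate miimon
first, divide afterwards. div_u64_inplace() and set_updelay() are
hypothetical stand-ins for do_div() and the bonding setter; this is an
illustration under those assumptions, not the actual kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's do_div(): divides *n
 * in place by a 32-bit base and returns the remainder. */
static uint32_t div_u64_inplace(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);	/* callers must have checked the divisor */
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* Hypothetical model of the updelay setter. miimon is validated before
 * any division, so a zero divisor is never used. */
static int set_updelay(uint64_t value, uint32_t miimon, uint64_t *updelay)
{
	uint64_t quotient = value;
	uint32_t remainder;

	if (!miimon)
		return -EPERM;	/* MII monitoring is disabled */

	remainder = div_u64_inplace(&quotient, miimon);
	(void)remainder;	/* the real code warns when this is non-zero */
	*updelay = quotient;
	return 0;
}
```

With miimon == 0 the function returns -EPERM and leaves *updelay
untouched; with value 250 and miimon 100 it stores 2.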

^ permalink raw reply

* getifaddrs performance, sudo and Linux vs FreeBSD
From: Rick Jones @ 2014-01-23 20:37 UTC (permalink / raw)
  To: netdev; +Cc: Lee Schermerhorn

Some performance work I've been doing lately has led me down a path to 
the performance of sudo on a system with a Very Large Number (tm) of 
interfaces (four digits).  I noticed that the time it took for an "sudo 
sleep 1" (chosen as a nice, consequences-free something to pass to sudo) 
to complete could have hundreds of milliseconds added to it on a system 
with that large number of interfaces.  How much could vary - on the 
system on which I was looking initially, there were interfaces coming 
and going at the same time.  An strace showed that an AF_NETLINK sendto() 
followed by 1600 recvfrom's on that fd seemed to be the source of 
all the added time.

I went then to an inactive system and did some synthetic things and 
found that the added time for sudo was coming from its getifaddrs() 
call, which it makes at start-up to gather information which might be 
used later by one or more sudo plugins.

I brought this to the attention of the sudo-workers mailing list to see 
if there might be a way to make the getifaddrs() call "lazy" - 
http://www.sudo.ws/pipermail/sudo-workers/2014-January/000826.html and 
was told that other platforms don't take so much time in getifaddrs().

As I was using a 3.5 kernel on the bare-iron, and don't have flexibility 
to put a later kernel there, nor to run another OS there, for the rest 
of this I've spun-up two equal configuration VMs, one with a 3.11 kernel 
(via Ubuntu 13.10) and the other with FreeBSD 9.2.  Indeed, performance 
work in VMs is problematic, but this is just to give a demonstration.

On the FreeBSD VM I created 8000 additional interfaces via "ifconfig 
greN create" and on the 3.11 VM I did that via "ip tuntap add mode tun" 
.  I then brought sudo 1.8.9p4 source bits onto each and compiled them, 
with a couple of printf's bracketing the getifaddrs() call to make 
finding it in a system call trace later easier.

Under FreeBSD the times for sudo sleep 1 are:

# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs
         1.02 real         0.00 user         0.00 sys
# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs
         1.02 real         0.00 user         0.01 sys
# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs
         1.02 real         0.00 user         0.01 sys

Under 3.11 the times for sudo sleep 1 are:
root@ubuntu1310:~/sudo-1.8.9p4# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs

real	0m1.133s
user	0m0.049s
sys	0m0.083s
root@ubuntu1310:~/sudo-1.8.9p4# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs

real	0m1.108s
user	0m0.035s
sys	0m0.070s
root@ubuntu1310:~/sudo-1.8.9p4# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs

real	0m1.133s
user	0m0.033s
sys	0m0.099s

(not as dramatic as on the "doing real work" systems where I'd see the 
added time, with fewer added interfaces, be as much as 800 milliseconds, 
but it does give the flavor)

A truss -dD taken on FreeBSD shows:
0.015641137 0.000277260 write(1,"Calling getifaddrs\n",19) = 19 (0x13)
0.020622446 0.004852609 __sysctl(0x7fffffffd3d0,0x6,0x0,0x7fffffffd3e8,0x0,0x0) = 0 (0x0)
0.025353806 0.004647170 __sysctl(0x7fffffffd3d0,0x6,0x8010b7000,0x7fffffffd3e8,0x0,0x0) = 0 (0x0)
0.025864666 0.000200780 mmap(0x0,8388608,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34380709888 (0x801400000)
0.026154166 0.000052810 munmap(0x801800000,4194304) = 0 (0x0)
0.028974145 0.000661820 madvise(0x8010b7000,0x1b6000,0x5,0xb6,0x7fffffffca60,0x1) = 0 (0x0)
0.029470295 0.000196330 write(1,"Called getifaddrs\n",18) = 18 (0x12)

and strace -tttT under Linux 3.11 shows:
1390506086.363045 write(1, "Calling getifaddrs\n", 19) = 19 <0.000024>
1390506086.363088 socket(PF_NETLINK, SOCK_RAW, 0) = 3 <0.001267>
1390506086.364360 bind(3, {sa_family=AF_NETLINK, pid=0, 
groups=00000000}, 12) = 0 <0.000032>
1390506086.364392 getsockname(3, {sa_family=AF_NETLINK, pid=8957, 
groups=00000000}, [12]) = 0 <0.000020>
1390506086.364419 sendto(3, "\24\0\0\0\22\0\1\3fp\341R\0\0\0\0\0\0\0\0", 
20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20 <0.000000>
1390506086.364419 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=00000000}, 
msg_iov(1)=[{"(\4\0\0\20\0\2\0fp\341R\375\"\0\0\0\0\376\377\0\37\0\0\220\20\0\0\0\0\0\0"..., 
4096}], msg_controllen=0, msg_flags=0}, 0) = 3192 <0.000000>
1390506086.364419 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=00000000}, 
msg_iov(1)=[{"(\4\0\0\20\0\2\0fp\341R\375\"\0\0\0\0\376\377\0\34\0\0\220\20\0\0\0\0\0\0"..., 
4096}], msg_controllen=0, msg_flags=0}, 0) = 3192 <0.000000>
...
1390506086.557375 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=00000000}, 
msg_iov(1)=[{"@\0\0\0\24\0\2\0gp\341R\375\"\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 
4096}], msg_controllen=0, msg_flags=0}, 0) = 128 <0.000011>
1390506086.557421 recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, 
groups=00000000}, 
msg_iov(1)=[{"\24\0\0\0\3\0\2\0gp\341R\375\"\0\0\0\0\0\0", 4096}], 
msg_controllen=0, msg_flags=0}, 0) = 20 <0.000007>
1390506086.558258 mmap(NULL, 2211840, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7246eb4000 <0.000013>
1390506086.630268 close(3)              = 0 <0.000046>
1390506086.630344 write(1, "Called getifaddrs\n", 18) = 18 <0.000033>

There were something like 2700 recvmsg calls there.

Admittedly, one probably ought to minimize the use of sudo wherever she 
can, but there may be limits to that and sudo is not the only thing that 
might call getifaddrs().  Similarly, the number of times one puts four 
or even perhaps three digits worth of interfaces on a system is somewhat 
rare, but I suspect it isn't unheard of.  So, I thought I might post 
this to solicit input/ideas.

happy benchmarking,

rick jones

FWIW, here is a perf report for a single sudo sleep 1 from the vm, only 
those routines > 1% shown:
# Samples: 392  of event 'cpu-clock'
# Event count (approx.): 98000000
#
# Overhead      Shared Object                            Symbol
# ........  .................  ................................
#
     32.40%  libc-2.17.so       [.] 0x00000000000811c8
     17.60%  [kernel.kallsyms]  [k] snmp_fold_field
      6.63%  [kernel.kallsyms]  [k] inet6_fill_ifla6_attrs
      5.87%  [kernel.kallsyms]  [k] rtnl_fill_ifinfo
      4.85%  [kernel.kallsyms]  [k] memcpy
      4.08%  [kernel.kallsyms]  [k] find_next_bit
      2.30%  [kernel.kallsyms]  [k] netdev_master_upper_dev_get
      2.30%  [kernel.kallsyms]  [k] inet6_dump_addr
      2.04%  [kernel.kallsyms]  [k] inet_fill_link_af
      1.79%  [kernel.kallsyms]  [k] clear_page_c
      1.53%  [kernel.kallsyms]  [k] memset
      1.53%  [kernel.kallsyms]  [k] inet_dump_ifaddr
      1.28%  [kernel.kallsyms]  [k] _raw_read_lock_bh
      1.02%  [kernel.kallsyms]  [k] handle_mm_fault
      1.02%  [kernel.kallsyms]  [k] __nla_reserve
      1.02%  [kernel.kallsyms]  [k] __nla_put
      1.02%  [kernel.kallsyms]  [k] rtnl_dump_ifinfo
      1.02%  [kernel.kallsyms]  [k] __do_page_fault

For the curious, here are the times for when there aren't those 8000 
added interfaces.  FreeBSD:

# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs
         1.01 real         0.00 user         0.00 sys
# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs
         1.00 real         0.00 user         0.00 sys
# time src/sudo sleep 1
Calling getifaddrs
Called getifaddrs
         1.00 real         0.00 user         0.00 sys

and the Linux 3.11:

Oh, and if you've read this far, I've also noticed that perhaps not 
surprisingly, sysctl -a can take an exceedingly long time on a system 
with lots of added interfaces.

^ permalink raw reply

* Re: [patch net-next 6/6] rtnetlink: remove IFLA_SLAVES definition
From: David Miller @ 2014-01-23 20:47 UTC (permalink / raw)
  To: jiri
  Cc: netdev, fubar, vfalico, andy, sfeldma, stephen, vyasevic,
	nicolas.dichtel, john.r.fastabend
In-Reply-To: <1390387435-23918-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 22 Jan 2014 11:43:55 +0100

> Not used anymore, and never should be.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

First of all your Subject doesn't match the patch, the subject mentions
IFLA_SLAVES whereas the patch deletes IFLA_BOND_SLAVE.

Secondly, you can't remove this, otherwise the next IFLA_* value added
here will reuse that old number potentially breaking userland apps.

I'm not applying this, sorry.
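
The number-reuse problem can be shown with a small sketch. The enums
below are hypothetical and only mimic the layout of an IFLA_*-style
attribute list; they are not the real rtnetlink definitions.

```c
#include <assert.h>

/* Hypothetical attribute list as first shipped to userspace. */
enum attr_v1 {
	V1_ATTR_UNSPEC,
	V1_ATTR_MTU,
	V1_ATTR_OLD,		/* value 2, compiled into existing binaries */
	V1_ATTR_COUNT
};

/* Same list with the unused entry deleted and a new one appended. */
enum attr_v2_deleted {
	V2D_ATTR_UNSPEC,
	V2D_ATTR_MTU,
	V2D_ATTR_NEW,		/* silently takes over value 2 */
	V2D_ATTR_COUNT
};

/* Same change, but the retired slot is kept as a placeholder. */
enum attr_v2_reserved {
	V2R_ATTR_UNSPEC,
	V2R_ATTR_MTU,
	V2R_ATTR_OLD_UNUSED,	/* still occupies value 2 */
	V2R_ATTR_NEW,		/* gets a fresh value, 3 */
	V2R_ATTR_COUNT
};

/* Non-zero when the new attribute reuses the retired number. */
static int new_attr_collides(void)
{
	return (int)V2D_ATTR_NEW == (int)V1_ATTR_OLD;
}

/* Non-zero when keeping the placeholder avoids the collision. */
static int reserved_layout_is_safe(void)
{
	assert((int)V2R_ATTR_OLD_UNUSED == (int)V1_ATTR_OLD);
	return (int)V2R_ATTR_NEW != (int)V1_ATTR_OLD;
}
```

An old binary that still sends or parses attribute 2 would see it
reinterpreted under the "deleted" layout, which is the breakage described
above.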

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: metrics: Handle v6/v4-mapped sockets in tcp-metrics
From: David Miller @ 2014-01-23 20:49 UTC (permalink / raw)
  To: christoph.paasch; +Cc: netdev
In-Reply-To: <1390395524-27930-1-git-send-email-christoph.paasch@uclouvain.be>

From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Wed, 22 Jan 2014 13:58:44 +0100

> A socket may be v6/v4-mapped. In that case sk->sk_family is AF_INET6,
> but the IP being used is actually an IPv4-address.
> Current's tcp-metrics will thus represent it as an IPv6-address:
> 
> root@server:~# ip tcp_metrics
> ::ffff:10.1.1.2 age 22.920sec rtt 18750us rttvar 15000us cwnd 10
> 10.1.1.2 age 47.970sec rtt 16250us rttvar 10000us cwnd 10
> 
> This patch modifies the tcp-metrics so that they are able to handle the
> v6/v4-mapped sockets correctly.
> 
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>

Applied, but I guess you didn't look at the build warnings, nor thoroughly
test this:

> +	else if (sk->sk_family = AF_INET6) {

That's an assignment, not a straight test, and the compiler warns
about it.

I fixed it when I applied this, but please don't be so sloppy in the
future.
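
For illustration, here is what the assignment form does compared with
the intended comparison. The helper names are hypothetical, and the
buggy variant is written out explicitly so that it compiles without the
warning mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* What `if (family = AF_INET6)` amounts to: AF_INET6 is stored into the
 * variable and the stored (non-zero) value is then tested. */
static int buggy_is_inet6(int *family)
{
	assert(family != NULL);
	return (*family = AF_INET6) != 0;
}

/* The intended test: compare, do not modify. */
static int is_inet6(int family)
{
	return family == AF_INET6;
}
```

Starting from AF_INET, buggy_is_inet6() reports true and overwrites the
family with AF_INET6, while is_inet6() reports false and changes nothing.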

^ permalink raw reply

* Re: [PATCH net-next] net/udp_offload: Handle static checker complaints
From: David Miller @ 2014-01-23 20:59 UTC (permalink / raw)
  To: ogerlitz; +Cc: netdev, shlomop
In-Reply-To: <1390397009-28280-1-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 22 Jan 2014 15:23:29 +0200

> From: Shlomo Pongratz <shlomop@mellanox.com>
> 
> Fixed few issues around using __rcu prefix and rcu_assign_pointer, also
> fixed a warning print to use ntohs(port) and not htons(port).
> 
> net/ipv4/udp_offload.c:112:9: error: incompatible types in comparison expression (different address spaces)
> net/ipv4/udp_offload.c:113:9: error: incompatible types in comparison expression (different address spaces)
> net/ipv4/udp_offload.c:176:19: error: incompatible types in comparison expression (different address spaces)
> 
> Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

Applied, thanks.
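
As a small sketch of the ntohs() point: a port stored in network byte
order is converted with ntohs() before it is printed. format_port() is a
hypothetical helper, not the code from udp_offload.c. On common
platforms htons() and ntohs() perform the same operation, so the
distinction is mostly about intent and about the byte-order annotations
that static checkers verify.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a port held in network byte order for a log line.
 * Returns the number of characters written, as snprintf() does. */
static int format_port(char *buf, size_t len, uint16_t port_be)
{
	assert(buf != NULL);
	return snprintf(buf, len, "port %u", (unsigned)ntohs(port_be));
}
```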

^ permalink raw reply

* Re: [PATCH v2 1/2] can: Decrease default size of CAN_RAW socket send queue
From: Marc Kleine-Budde @ 2014-01-23 21:01 UTC (permalink / raw)
  To: David Miller, sojkam1; +Cc: linux-can, netdev
In-Reply-To: <20140123.124108.891860564230946746.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1263 bytes --]

On 01/23/2014 09:41 PM, David Miller wrote:
> From: Michal Sojka <sojkam1@fel.cvut.cz>
> Date: Wed, 22 Jan 2014 09:27:36 +0100
> 
>> Since the length of the qdisc queue was set by default to 10
>> packets, this is exactly what was happening.
> 
> This is your bug, set the qdisc limit to something more reasonable.
> 
> Something large enough to absorb the traffic wrt. the speed at which
> the CAN device can sink the data.

Hmmm, the problem is on i686 I have to increase the txqueuelen to 366
before the socket works as expected (writes are blocking). This means if
there are two processes one sending a stream of bulk data and the other
one occasionally more important data there will be a lot of frames in
front of the important ones.

With Michal's patch we can limit the number of frames in flight to a
reasonably low amount.

> These two patches are something I am not willing to apply to my
> tree, this is not how you solve this problem, sorry.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 242 bytes --]

^ permalink raw reply

* Re: [PATCH] ipv6: reallocate addrconf router for ipv6 address when lo device up
From: Hannes Frederic Sowa @ 2014-01-23 21:03 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, davem, Sabrina Dubroca, Weilong Chen
In-Reply-To: <1390460277-23451-1-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, Jan 23, 2014 at 02:57:57PM +0800, Gao feng wrote:
> commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f
> "net IPv6 : Fix broken IPv6 routing table after loopback down-up"
> allocates addrconf router for ipv6 address when lo device up.
> but commit a881ae1f625c599b460cc8f8a7fcb1c438f699ad
> "ipv6:don't call addrconf_dst_alloc again when enable lo" breaks
> this behavior.
> 
> Since the addrconf router is moved to the garbage list when
> lo device down, we should delete this router and rellocate
> a new one for ipv6 address when lo device up.
> 
> This patch solves bug 67951 on bugzilla
> https://bugzilla.kernel.org/show_bug.cgi?id=67951
> 
> CC: Sabrina Dubroca <sd@queasysnail.net>
> CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Reported-by: Weilong Chen <chenweilong@huawei.com>
> Signed-off-by: Weilong Chen <chenweilong@huawei.com>
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  net/ipv6/addrconf.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4b6b720..6eecd9d 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2611,8 +2611,17 @@ static void init_loopback(struct net_device *dev)
>  			if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
>  				continue;
>  
> -			if (sp_ifa->rt)
> -				continue;
> +			if (sp_ifa->rt) {
> +				/* This dst has been added to garbage list when
> +				 * lo device down, delete this obsolete dst and
> +				 * reallocate a new router for ifa.
> +				 */
> +				if (sp_ifa->rt->dst.obsolete > 0) {
> +					ip6_del_rt(sp_ifa->rt);

We should not delete dst.obsolete > 0 routes. I think an ip6_rt_put() is
just fine here, no?

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH v2] net: Correctly sync addresses from multiple sources to single device
From: David Miller @ 2014-01-23 21:07 UTC (permalink / raw)
  To: vyasevic; +Cc: netdev, andrey.dmitrov, Alexandra.Kossovsky, Konstantin.Ushakov
In-Reply-To: <1390413255-32223-1-git-send-email-vyasevic@redhat.com>

From: Vlad Yasevich <vyasevic@redhat.com>
Date: Wed, 22 Jan 2014 12:54:15 -0500

> When we have multiple devices attempting to sync the same address
> to a single destination, each device should be permitted to sync
> it once.  To accomplish this, pass the 'sync_cnt' of the source
> address when adding the addresss to the lower device.  'sync_cnt'
> tracks how many time a given address has been succefully synced.
> This way, we know that if the 'sync_cnt' passed in is 0, we should
> sync this address.
> 
> Also, turn 'synced' member back into the counter as was originally
> done in
>    commit 4543fbefe6e06a9e40d9f2b28d688393a299f079.
>    net: count hw_addr syncs so that unsync works properly.
> It tracks how many time a given address has been added via a
> 'sync' operation.  For every successfull 'sync' the counter is
> incremented, and for ever 'unsync', the counter is decremented.
> This makes sure that the address will be properly removed from
> the the lower device when all the upper devices have removed it.
> 
> Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
> CC: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
> CC: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
> CC: Konstantin Ushakov <Konstantin.Ushakov@oktetlabs.ru>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

This one compiles, great :-)

Applied, thanks Vlad.

^ permalink raw reply

* Re: [net-next PATCH v2 1/1] drivers: net: cpsw: enable promiscuous mode support
From: David Miller @ 2014-01-23 21:12 UTC (permalink / raw)
  To: mugunthanvnm; +Cc: netdev, linux-omap
In-Reply-To: <1390415592-13719-1-git-send-email-mugunthanvnm@ti.com>

From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Thu, 23 Jan 2014 00:03:12 +0530

> Enable promiscuous mode support for CPSW.
> 
> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] ipv6: reallocate addrconf router for ipv6 address when lo device up
From: Sergei Shtylyov @ 2014-01-23 21:13 UTC (permalink / raw)
  To: Gao feng, netdev
  Cc: davem, Sabrina Dubroca, Hannes Frederic Sowa, Weilong Chen
In-Reply-To: <1390460277-23451-1-git-send-email-gaofeng@cn.fujitsu.com>

Hello.

On 23-01-2014 10:57, Gao feng wrote:

> commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f
> "net IPv6 : Fix broken IPv6 routing table after loopback down-up"
> allocates the addrconf router for an ipv6 address when the lo device
> comes up, but commit a881ae1f625c599b460cc8f8a7fcb1c438f699ad
> "ipv6:don't call addrconf_dst_alloc again when enable lo" breaks
> this behavior.

> Since the addrconf router is moved to the garbage list when the
> lo device goes down, we should delete this router and reallocate
> a new one for the ipv6 address when the lo device comes up.

> This patch solves bug 67951 on bugzilla
> https://bugzilla.kernel.org/show_bug.cgi?id=67951

> CC: Sabrina Dubroca <sd@queasysnail.net>
> CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Reported-by: Weilong Chen <chenweilong@huawei.com>
> Signed-off-by: Weilong Chen <chenweilong@huawei.com>
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>   net/ipv6/addrconf.c | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)

> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4b6b720..6eecd9d 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2611,8 +2611,17 @@ static void init_loopback(struct net_device *dev)
>   			if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
>   				continue;
>
> -			if (sp_ifa->rt)
> -				continue;
> +			if (sp_ifa->rt) {
> +				/* This dst has been added to garbage list when
> +				 * lo device down, delete this obsolete dst and
> +				 * reallocate a new router for ifa.
> +				 */
> +				if (sp_ifa->rt->dst.obsolete > 0) {
> +					ip6_del_rt(sp_ifa->rt);
> +					sp_ifa->rt = NULL;
> +				} else
> +					continue;

    The *else* arm should have {} when the other arm of the *if* statement
has them; see Documentation/CodingStyle.
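
As an illustration of the brace rule, here is a small standalone sketch with braces on both arms. The function needs_realloc() and its integer parameters are hypothetical stand-ins for the sp_ifa->rt checks in the patch; it only models the control flow, not the route handling.

```c
/*
 * Hypothetical model of the control flow in the quoted hunk.
 * Returns 1 when a new route should be allocated, 0 when the
 * entry should be skipped (the 'continue' in the patch).
 */
static int needs_realloc(int has_rt, int obsolete)
{
	if (has_rt) {
		if (obsolete > 0) {
			/* stale entry: the patch deletes it and clears the pointer */
			has_rt = 0;
		} else {
			/* still valid: nothing to do for this address */
			return 0;
		}
	}
	return 1;
}
```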

WBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next] bonding: Use do_div to divide 64 bit numbers
From: Nikolay Aleksandrov @ 2014-01-23 21:18 UTC (permalink / raw)
  To: Zoltan Kiss, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	netdev, linux-kernel
In-Reply-To: <52E17F40.4040004@redhat.com>

On 01/23/2014 09:44 PM, Nikolay Aleksandrov wrote:
> On 01/23/2014 07:47 PM, Zoltan Kiss wrote:
>> Nikolay Aleksandrov's recent bonding option API changes (25a9b54a and e4994612)
>> introduced u64 as the type of downdelay and updelay. On 32 bit the division and
>> modulo operations cause compile errors:
>>
>> ERROR: "__udivdi3" [drivers/net/bonding/bonding.ko] undefined!
>> ERROR: "__umoddi3" [drivers/net/bonding/bonding.ko] undefined!
>>
>> This patch uses the do_div macro, which is guaranteed to do the right thing.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>> ---
>>  drivers/net/bonding/bond_options.c |   19 +++++++++++--------
>>  1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>> index 4cee04a..4f94907 100644
>> --- a/drivers/net/bonding/bond_options.c
>> +++ b/drivers/net/bonding/bond_options.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/rcupdate.h>
>>  #include <linux/ctype.h>
>>  #include <linux/inet.h>
>> +#include <asm/div64.h>
>>  #include "bonding.h"
>>  
>>  static struct bond_opt_value bond_mode_tbl[] = {
>> @@ -727,19 +728,20 @@ int bond_option_miimon_set(struct bonding *bond, struct bond_opt_value *newval)
>>  
>>  int bond_option_updelay_set(struct bonding *bond, struct bond_opt_value *newval)
>>  {
>> +	u64 quotient = newval->value;
>> +	u64 remainder = do_div(quotient, bond->params.miimon);
> Hi Zoltan,
> Thanks for fixing this, a few comments though:
> bond->params.miimon can be 0 here, which is why there's a check afterwards.
> Also, please separate the local variable definitions from the body with a
> new line.
> The same applies for downdelay.
> 
> Nik
In fact, since we don't need the u64 and newval->value is limited to
INT_MAX, can't we simply cast it to (int) and avoid the do_div entirely?
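
A minimal userspace sketch of the two review points, assuming the value really is bounded by INT_MAX as stated above: check the divisor before any division, then divide as plain int so no 64-bit division helper is needed on 32-bit builds. The function updelay_ticks() and its parameters are hypothetical, and the -EINVAL range check is an assumption added for the sketch, not something taken from the bonding driver.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical model of the updelay computation, not the driver code. */
static int updelay_ticks(uint64_t value_ms, int miimon_ms, int *ticks)
{
	if (!miimon_ms)
		return -EPERM;	/* reject before dividing: miimon may be 0 */
	if (value_ms > INT_MAX)
		return -EINVAL;	/* assumed range limit for this sketch */

	/* value fits in int, so an ordinary 32-bit division is enough */
	*ticks = (int)value_ms / miimon_ms;
	return 0;
}
```

Note that the quoted patch computes do_div() in the variable initializers, before the !miimon check, which is the ordering problem raised in the first comment.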

>>  	if (!bond->params.miimon) {
>>  		pr_err("%s: Unable to set up delay as MII monitoring is disabled\n",
>>  		       bond->dev->name);
>>  		return -EPERM;
>>  	}
>> -	if ((newval->value % bond->params.miimon) != 0) {
>> +	if (remainder != 0) {
>>  		pr_warn("%s: Warning: up delay (%llu) is not a multiple of miimon (%d), updelay rounded to %llu ms\n",
>>  			bond->dev->name, newval->value,
>>  			bond->params.miimon,
>> -			(newval->value / bond->params.miimon) *
>> -			bond->params.miimon);
>> +			quotient * bond->params.miimon);
>>  	}
>> -	bond->params.updelay = newval->value / bond->params.miimon;
>> +	bond->params.updelay = quotient;
>>  	pr_info("%s: Setting up delay to %d.\n",
>>  		bond->dev->name,
>>  		bond->params.updelay * bond->params.miimon);
>> @@ -750,19 +752,20 @@ int bond_option_updelay_set(struct bonding *bond, struct bond_opt_value *newval)
>>  int bond_option_downdelay_set(struct bonding *bond,
>>  			      struct bond_opt_value *newval)
>>  {
>> +	u64 quotient = newval->value;
>> +	u64 remainder = do_div(quotient, bond->params.miimon);
>>  	if (!bond->params.miimon) {
>>  		pr_err("%s: Unable to set down delay as MII monitoring is disabled\n",
>>  		       bond->dev->name);
>>  		return -EPERM;
>>  	}
>> -	if ((newval->value % bond->params.miimon) != 0) {
>> +	if (remainder != 0) {
>>  		pr_warn("%s: Warning: down delay (%llu) is not a multiple of miimon (%d), delay rounded to %llu ms\n",
>>  			bond->dev->name, newval->value,
>>  			bond->params.miimon,
>> -			(newval->value / bond->params.miimon) *
>> -			bond->params.miimon);
>> +			quotient * bond->params.miimon);
>>  	}
>> -	bond->params.downdelay = newval->value / bond->params.miimon;
>> +	bond->params.downdelay = quotient;
>>  	pr_info("%s: Setting down delay to %d.\n",
>>  		bond->dev->name,
>>  		bond->params.downdelay * bond->params.miimon);
>>
> 

^ permalink raw reply

* Re: [PATCH v2 1/2] net/cxgb4: Avoid disabling PCI device for towice
From: David Miller @ 2014-01-23 21:21 UTC (permalink / raw)
  To: shangw; +Cc: netdev, dm, ben
In-Reply-To: <1390451256-11486-1-git-send-email-shangw@linux.vnet.ibm.com>

From: Gavin Shan <shangw@linux.vnet.ibm.com>
Date: Thu, 23 Jan 2014 12:27:34 +0800

> If an EEH error happens on the adapter and we have to remove
> it from the system for some reason (e.g. more than 5 EEH errors
> detected from the device in the last hour), the adapter will be disabled
> twice, separately by eeh_err_detected() and remove_one(), which
> will incur the following unexpected backtrace. The patch tries to avoid
> it.
 ...
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2 2/2] net/cxgb4: Don't retrieve stats during recovery
From: David Miller @ 2014-01-23 21:21 UTC (permalink / raw)
  To: shangw; +Cc: netdev, dm, ben
In-Reply-To: <1390451256-11486-2-git-send-email-shangw@linux.vnet.ibm.com>

From: Gavin Shan <shangw@linux.vnet.ibm.com>
Date: Thu, 23 Jan 2014 12:27:35 +0800

> We possibly retrieve the adapter's statistics during EEH recovery
> and that should be disallowed. Otherwise, it could trigger a
> repeated EEH error, and EEH recovery would eventually fail.
> 
> The patch reuses statistics lock and checks net_device is attached
> before going to retrieve statistics, so that the problem can be
> avoided.
> 
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>

Applied.

^ permalink raw reply

* [PATCH v6] xen/grant-table: Avoid m2p_override during mapping
From: Zoltan Kiss @ 2014-01-23 21:23 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
	jonathan.davies
  Cc: Zoltan Kiss

The grant mapping API does m2p_override unnecessarily: only gntdev needs it;
for blkback and future netback patches it just causes lock contention, as
those pages never go to userspace. Therefore this series does the following:
- the original functions were renamed to __gnttab_[un]map_refs, with a new
  parameter m2p_override
- based on m2p_override either they follow the original behaviour, or just set
  the private flag and call set_phys_to_machine
- gnttab_[un]map_refs are now wrappers that call __gnttab_[un]map_refs with
  m2p_override false
- a new function gnttab_[un]map_refs_userspace provides the old behaviour

It also removes a stray space from page.h and changes ret to 0 if
XENFEAT_auto_translated_physmap, as that is the only possible return value
there.

v2:
- move the storing of the old mfn in page->index to gnttab_map_refs
- move the function header update to a separate patch

v3:
- a new approach to retain old behaviour where it is needed
- squash the patches into one

v4:
- move out the common bits from m2p* functions, and pass pfn/mfn as parameter
- clear page->private before doing anything with the page, so m2p_find_override
  won't race with this

v5:
- change return value handling in __gnttab_[un]map_refs
- remove a stray space in page.h
- add detail why ret = 0 now at some places

v6:
- don't pass pfn to m2p* functions, just get it locally

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Suggested-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/xen/page.h     |    5 +-
 arch/x86/xen/p2m.c                  |   17 +------
 drivers/block/xen-blkback/blkback.c |   15 +++---
 drivers/xen/gntdev.c                |   13 +++--
 drivers/xen/grant-table.c           |   89 ++++++++++++++++++++++++++++++-----
 include/xen/grant_table.h           |    8 +++-
 6 files changed, 101 insertions(+), 46 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index b913915..ce47243 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -52,7 +52,8 @@ extern unsigned long set_phys_range_identity(unsigned long pfn_s,
 extern int m2p_add_override(unsigned long mfn, struct page *page,
 			    struct gnttab_map_grant_ref *kmap_op);
 extern int m2p_remove_override(struct page *page,
-				struct gnttab_map_grant_ref *kmap_op);
+			       struct gnttab_map_grant_ref *kmap_op,
+			       unsigned long mfn);
 extern struct page *m2p_find_override(unsigned long mfn);
 extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
 
@@ -121,7 +122,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
 		pfn = m2p_find_override_pfn(mfn, ~0);
 	}
 
-	/* 
+	/*
 	 * pfn is ~0 if there are no entries in the m2p for mfn or if the
 	 * entry doesn't map back to the mfn and m2p_override doesn't have a
 	 * valid entry for it.
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 2ae8699..bd4724b 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -888,13 +888,6 @@ int m2p_add_override(unsigned long mfn, struct page *page,
 					"m2p_add_override: pfn %lx not mapped", pfn))
 			return -EINVAL;
 	}
-	WARN_ON(PagePrivate(page));
-	SetPagePrivate(page);
-	set_page_private(page, mfn);
-	page->index = pfn_to_mfn(pfn);
-
-	if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
-		return -ENOMEM;
 
 	if (kmap_op != NULL) {
 		if (!PageHighMem(page)) {
@@ -933,19 +926,16 @@ int m2p_add_override(unsigned long mfn, struct page *page,
 }
 EXPORT_SYMBOL_GPL(m2p_add_override);
 int m2p_remove_override(struct page *page,
-		struct gnttab_map_grant_ref *kmap_op)
+			struct gnttab_map_grant_ref *kmap_op,
+			unsigned long mfn)
 {
 	unsigned long flags;
-	unsigned long mfn;
 	unsigned long pfn;
 	unsigned long uninitialized_var(address);
 	unsigned level;
 	pte_t *ptep = NULL;
 
 	pfn = page_to_pfn(page);
-	mfn = get_phys_to_machine(pfn);
-	if (mfn == INVALID_P2M_ENTRY || !(mfn & FOREIGN_FRAME_BIT))
-		return -EINVAL;
 
 	if (!PageHighMem(page)) {
 		address = (unsigned long)__va(pfn << PAGE_SHIFT);
@@ -959,10 +949,7 @@ int m2p_remove_override(struct page *page,
 	spin_lock_irqsave(&m2p_override_lock, flags);
 	list_del(&page->lru);
 	spin_unlock_irqrestore(&m2p_override_lock, flags);
-	WARN_ON(!PagePrivate(page));
-	ClearPagePrivate(page);
 
-	set_phys_to_machine(pfn, page->index);
 	if (kmap_op != NULL) {
 		if (!PageHighMem(page)) {
 			struct multicall_space mcs;
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 6620b73..875025f 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -285,8 +285,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
 
 		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
 			!rb_next(&persistent_gnt->node)) {
-			ret = gnttab_unmap_refs(unmap, NULL, pages,
-				segs_to_unmap);
+			ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
 			BUG_ON(ret);
 			put_free_pages(blkif, pages, segs_to_unmap);
 			segs_to_unmap = 0;
@@ -321,8 +320,7 @@ static void unmap_purged_grants(struct work_struct *work)
 		pages[segs_to_unmap] = persistent_gnt->page;
 
 		if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
-			ret = gnttab_unmap_refs(unmap, NULL, pages,
-				segs_to_unmap);
+			ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
 			BUG_ON(ret);
 			put_free_pages(blkif, pages, segs_to_unmap);
 			segs_to_unmap = 0;
@@ -330,7 +328,7 @@ static void unmap_purged_grants(struct work_struct *work)
 		kfree(persistent_gnt);
 	}
 	if (segs_to_unmap > 0) {
-		ret = gnttab_unmap_refs(unmap, NULL, pages, segs_to_unmap);
+		ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
 		BUG_ON(ret);
 		put_free_pages(blkif, pages, segs_to_unmap);
 	}
@@ -670,15 +668,14 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
 				    GNTMAP_host_map, pages[i]->handle);
 		pages[i]->handle = BLKBACK_INVALID_HANDLE;
 		if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
-			ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
-			                        invcount);
+			ret = gnttab_unmap_refs(unmap, unmap_pages, invcount);
 			BUG_ON(ret);
 			put_free_pages(blkif, unmap_pages, invcount);
 			invcount = 0;
 		}
 	}
 	if (invcount) {
-		ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
+		ret = gnttab_unmap_refs(unmap, unmap_pages, invcount);
 		BUG_ON(ret);
 		put_free_pages(blkif, unmap_pages, invcount);
 	}
@@ -740,7 +737,7 @@ again:
 	}
 
 	if (segs_to_map) {
-		ret = gnttab_map_refs(map, NULL, pages_to_gnt, segs_to_map);
+		ret = gnttab_map_refs(map, pages_to_gnt, segs_to_map);
 		BUG_ON(ret);
 	}
 
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index e41c79c..e652c0e 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -284,8 +284,10 @@ static int map_grant_pages(struct grant_map *map)
 	}
 
 	pr_debug("map %d+%d\n", map->index, map->count);
-	err = gnttab_map_refs(map->map_ops, use_ptemod ? map->kmap_ops : NULL,
-			map->pages, map->count);
+	err = gnttab_map_refs_userspace(map->map_ops,
+					use_ptemod ? map->kmap_ops : NULL,
+					map->pages,
+					map->count);
 	if (err)
 		return err;
 
@@ -315,9 +317,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 		}
 	}
 
-	err = gnttab_unmap_refs(map->unmap_ops + offset,
-			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
-			pages);
+	err = gnttab_unmap_refs_userspace(map->unmap_ops + offset,
+					  use_ptemod ? map->kmap_ops + offset : NULL,
+					  map->pages + offset,
+					  pages);
 	if (err)
 		return err;
 
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index aa846a4..e4ddfeb 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -880,15 +880,17 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
 }
 EXPORT_SYMBOL_GPL(gnttab_batch_copy);
 
-int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
+int __gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		    struct gnttab_map_grant_ref *kmap_ops,
-		    struct page **pages, unsigned int count)
+		    struct page **pages, unsigned int count,
+		    bool m2p_override)
 {
 	int i, ret;
 	bool lazy = false;
 	pte_t *pte;
-	unsigned long mfn;
+	unsigned long mfn, pfn;
 
+	BUG_ON(kmap_ops && !m2p_override);
 	ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, map_ops, count);
 	if (ret)
 		return ret;
@@ -907,10 +909,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 			set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
 					map_ops[i].dev_bus_addr >> PAGE_SHIFT);
 		}
-		return ret;
+		return 0;
 	}
 
-	if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
+	if (m2p_override &&
+	    !in_interrupt() &&
+	    paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
 		arch_enter_lazy_mmu_mode();
 		lazy = true;
 	}
@@ -927,8 +931,20 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 		} else {
 			mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
 		}
-		ret = m2p_add_override(mfn, pages[i], kmap_ops ?
-				       &kmap_ops[i] : NULL);
+		pfn = page_to_pfn(pages[i]);
+
+		WARN_ON(PagePrivate(pages[i]));
+		SetPagePrivate(pages[i]);
+		set_page_private(pages[i], mfn);
+
+		pages[i]->index = pfn_to_mfn(pfn);
+		if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn)))) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		if (m2p_override)
+			ret = m2p_add_override(mfn, pages[i], kmap_ops ?
+					       &kmap_ops[i] : NULL);
 		if (ret)
 			goto out;
 	}
@@ -939,15 +955,32 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
 
 	return ret;
 }
+
+int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
+		    struct page **pages, unsigned int count)
+{
+	return __gnttab_map_refs(map_ops, NULL, pages, count, false);
+}
 EXPORT_SYMBOL_GPL(gnttab_map_refs);
 
-int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
+int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
+			      struct gnttab_map_grant_ref *kmap_ops,
+			      struct page **pages, unsigned int count)
+{
+	return __gnttab_map_refs(map_ops, kmap_ops, pages, count, true);
+}
+EXPORT_SYMBOL_GPL(gnttab_map_refs_userspace);
+
+int __gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 		      struct gnttab_map_grant_ref *kmap_ops,
-		      struct page **pages, unsigned int count)
+		      struct page **pages, unsigned int count,
+		      bool m2p_override)
 {
 	int i, ret;
 	bool lazy = false;
+	unsigned long pfn, mfn;
 
+	BUG_ON(kmap_ops && !m2p_override);
 	ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap_ops, count);
 	if (ret)
 		return ret;
@@ -958,17 +991,33 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 			set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
 					INVALID_P2M_ENTRY);
 		}
-		return ret;
+		return 0;
 	}
 
-	if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
+	if (m2p_override &&
+	    !in_interrupt() &&
+	    paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
 		arch_enter_lazy_mmu_mode();
 		lazy = true;
 	}
 
 	for (i = 0; i < count; i++) {
-		ret = m2p_remove_override(pages[i], kmap_ops ?
-				       &kmap_ops[i] : NULL);
+		pfn = page_to_pfn(pages[i]);
+		mfn = get_phys_to_machine(pfn);
+		if (mfn == INVALID_P2M_ENTRY || !(mfn & FOREIGN_FRAME_BIT)) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		set_page_private(pages[i], INVALID_P2M_ENTRY);
+		WARN_ON(!PagePrivate(pages[i]));
+		ClearPagePrivate(pages[i]);
+		set_phys_to_machine(pfn, pages[i]->index);
+		if (m2p_override)
+			ret = m2p_remove_override(pages[i],
+						  kmap_ops ?
+						   &kmap_ops[i] : NULL,
+						  mfn);
 		if (ret)
 			goto out;
 	}
@@ -979,8 +1028,22 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
 
 	return ret;
 }
+
+int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *map_ops,
+		    struct page **pages, unsigned int count)
+{
+	return __gnttab_unmap_refs(map_ops, NULL, pages, count, false);
+}
 EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
 
+int gnttab_unmap_refs_userspace(struct gnttab_unmap_grant_ref *map_ops,
+				struct gnttab_map_grant_ref *kmap_ops,
+				struct page **pages, unsigned int count)
+{
+	return __gnttab_unmap_refs(map_ops, kmap_ops, pages, count, true);
+}
+EXPORT_SYMBOL_GPL(gnttab_unmap_refs_userspace);
+
 static unsigned nr_status_frames(unsigned nr_grant_frames)
 {
 	BUG_ON(grefs_per_grant_frame == 0);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 694dcaf..9a919b1 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -184,11 +184,15 @@ unsigned int gnttab_max_grant_frames(void);
 #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
 
 int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
-		    struct gnttab_map_grant_ref *kmap_ops,
 		    struct page **pages, unsigned int count);
+int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
+			      struct gnttab_map_grant_ref *kmap_ops,
+			      struct page **pages, unsigned int count);
 int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
-		      struct gnttab_map_grant_ref *kunmap_ops,
 		      struct page **pages, unsigned int count);
+int gnttab_unmap_refs_userspace(struct gnttab_unmap_grant_ref *unmap_ops,
+				struct gnttab_map_grant_ref *kunmap_ops,
+				struct page **pages, unsigned int count);
 
 /* Perform a batch of grant map/copy operations. Retry every batch slot
  * for which the hypervisor returns GNTST_eagain. This is typically due

^ permalink raw reply related

* Re: [PATCH v2] ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
From: David Miller @ 2014-01-23 21:28 UTC (permalink / raw)
  To: duanj.fnst; +Cc: pshelar, daniel.petre, edumazet, netdev
In-Reply-To: <52E0AFF9.6050803@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Thu, 23 Jan 2014 14:00:25 +0800

> 
> commit a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach")
> cleared IPCB in ip_tunnel_xmit(), or else skb->cb[] may contain garbage from
> the GSO segmentation layer.
> 
> But commit 0e6fbc5b6c621("ip_tunnels: extend iptunnel_xmit()") refactored the
> code, and it now clears IPCB after the dst_link_failure() call.
> 
> So clear IPCB in ip_tunnel_xmit() just like commit a622260254ee48("ip_tunnel:
> fix kernel panic with icmp_dest_unreach").
> 
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [patch] tulip: cleanup by using ARRAY_SIZE()
From: David Miller @ 2014-01-23 21:29 UTC (permalink / raw)
  To: dan.carpenter; +Cc: grundler, netdev, kernel-janitors
In-Reply-To: <20140123082637.GC28688@elgon.mountain>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Thu, 23 Jan 2014 11:26:37 +0300

> In this situation ARRAY_SIZE() and sizeof() are the same, but we're
> really dealing with array indexes and not byte offsets so ARRAY_SIZE()
> is cleaner.
> 
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied, thanks Dan.
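
The distinction in the quoted commit message can be shown with a short standalone sketch. It assumes a simplified ARRAY_SIZE() (the kernel macro also performs a compile-time type check, omitted here); the arrays and sum_words() are hypothetical. For one-byte elements the element count and the byte size coincide, but for wider elements only ARRAY_SIZE() gives a valid index bound.

```c
#include <stddef.h>

/* Simplified version of the macro, without the kernel's array type check. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static unsigned char bytes[8];                    /* count == byte size */
static unsigned short words[4] = { 1, 2, 3, 4 }; /* count != byte size */

/* Iterate by element index, so the bound must be the element count. */
static unsigned int sum_words(void)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(words); i++)
		sum += words[i];
	return sum;
}
```

Using sizeof(words) as the loop bound here would read past the end of the array on any platform where unsigned short is wider than one byte.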

^ permalink raw reply

* Re: [PATCH V1 net-next] net/vxlan: Share RX skb de-marking and checksum checks with ovs
From: David Miller @ 2014-01-23 21:30 UTC (permalink / raw)
  To: ogerlitz; +Cc: netdev, joseph.gasparakis, pshelar
In-Reply-To: <1390469293-2234-1-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Thu, 23 Jan 2014 11:28:13 +0200

> Make sure the practice set by commit 0afb166 "vxlan: Add capability
> of Rx checksum offload for inner packet" is applied when the skb
> goes through the portion of the RX code which is shared between
> vxlan netdevices and ovs vxlan port instances.
> 
> Cc: Joseph Gasparakis <joseph.gasparakis@intel.com>
> Cc: Pravin B Shelar <pshelar@nicira.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply
