Netdev List


* Re: [PATCH net-next-2.6 2/6] sfc: Implement generic features interface
From: Ben Hutchings @ 2011-04-03 20:27 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: David Miller, netdev, linux-net-drivers
In-Reply-To: <20110403201322.GA13122@rere.qmqm.pl>

On Sun, 2011-04-03 at 22:13 +0200, Michał Mirosław wrote:
> On Sun, Apr 03, 2011 at 08:51:21PM +0100, Ben Hutchings wrote:
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > ---
> >  drivers/net/sfc/efx.c        |   17 ++++++++-
> >  drivers/net/sfc/ethtool.c    |   78 ------------------------------------------
> >  drivers/net/sfc/net_driver.h |    2 -
> >  drivers/net/sfc/rx.c         |    2 +-
> >  4 files changed, 16 insertions(+), 83 deletions(-)
> > 
> [cut patch]
> 
> Looks ok to me.
> 
> BTW, I noticed that TSO6 is not enabled in vlan_features. Is this intentional?

Well spotted.  It's not intentional.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [patch net-next-2.6] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Jesse Gross @ 2011-04-03 20:38 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Jiri Pirko, netdev, davem, shemminger, kaber, fubar, eric.dumazet,
	andy, xiaosuo, Eric W. Biederman
In-Reply-To: <4D989100.1090207@gmail.com>

On Sun, Apr 3, 2011 at 8:23 AM, Nicolas de Pesloüan
<nicolas.2p.debian@gmail.com> wrote:
> On 02/04/2011 12:26, Jiri Pirko wrote:
>>
>> Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
>> enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
>> vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.
>>
>> For non-rx-vlan-hw-accel however, tagged skb goes thru whole
>> __netif_receive_skb, it's untagged in ptype_base handler and reinjected.
>>
>> This inconsistency is fixed by this patch. Vlan untagging happens early in
>> __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers)
>> see the skb like it was untagged by hw.
>>
>> Signed-off-by: Jiri Pirko<jpirko@redhat.com>

You saw Eric B.'s recent patch trying to tackle the same issues, right?:
http://permalink.gmane.org/gmane.linux.network/190229

>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 3da9fb0..bfe9fce 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3130,6 +3130,12 @@ another_round:
>>
>>        __this_cpu_inc(softnet_data.processed);
>>
>> +       if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) {
>> +               skb = vlan_untag(skb);
>> +               if (unlikely(!skb))
>> +                       goto out;
>> +       }
>> +
>
> I like the general idea of this patch, but I don't like the idea of
> re-inserting specific code inside __netif_receive_skb.
>
> You did great work removing most - if not all - device specific parts
> from __netif_receive_skb, by introducing rx_handler.
>
> I think the above part (and vlan_untag) should be moved to a vlan_rx_handler
> that would be set on the net_devices that are the parent of a vlan
> net_device and are NOT hwaccel.
>
> vlan_rx_handler would return RX_HANDLER_ANOTHER if skb holds a tagged frame
> (skb->dev changed) and RX_HANDLER_PASS if skb holds an untagged frame
> (skb->dev unchanged).

It would be nice to merge all of this together.  One complication is
the interaction of bridging and vlan on the same device.  Some people
want to have a bridge for each vlan and a bridge for untagged packets.
 On older kernels with vlan accelerated hardware this was possible
because vlan devices would get packets before bridging and on current
kernels it is possible with ebtables rules.  If we use rx_handler for
both I believe we would need to extend it some to allow multiple
handlers.

>
> This would also cause protocol handlers to receive the untouched (tagged)
> frame, if no setup required the frame to be untagged, which I think is the
> right thing to do.

At the very least we need to make sure that these packets are marked
as PACKET_OTHERHOST because protocol handlers don't pay attention to
the vlan field.

>
>> @@ -3177,7 +3183,7 @@ ncls:
>>                       ret = deliver_skb(skb, pt_prev, orig_dev);
>>                       pt_prev = NULL;
>>               }
>> -             if (vlan_hwaccel_do_receive(&skb)) {
>> +             if (vlan_do_receive(&skb)) {
>>                       ret = __netif_receive_skb(skb);
>>                       goto out;
>>               } else if (unlikely(!skb))
>
> Why are you calling __netif_receive_skb here? Can't we simply goto
> another_round?

This code (other than the name change) predates the
another_round/rx_handler changes.

^ permalink raw reply
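
For illustration, the vlan_rx_handler idea discussed above can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct demo_frame, demo_vlan_rx_handler() and the DEMO_* names are hypothetical stand-ins, not kernel API, and the "device" is reduced to an integer. It shows the two outcomes Nicolas describes: a tagged frame is untagged and handed back for another round on the vlan device, an untagged frame passes through unchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN      6
#define DEMO_ETH_P_8021Q   0x8100  /* 802.1Q VLAN ethertype */
#define DEMO_VLAN_HLEN     4       /* TPID (2 bytes) + TCI (2 bytes) */
#define DEMO_VLAN_VID_MASK 0x0fff

/* Hypothetical stand-ins for the rx_handler return codes. */
enum demo_rx_result { DEMO_RX_PASS, DEMO_RX_ANOTHER };

/* Hypothetical, heavily simplified model of an skb. */
struct demo_frame {
	uint8_t  data[64];
	size_t   len;
	uint16_t vlan_tci;  /* filled in once the tag is stripped */
	int      dev;       /* 0 = parent device, otherwise the VLAN id */
};

static uint16_t demo_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Untag a tagged frame and ask for another round on the vlan device;
 * leave untagged frames alone.
 */
static enum demo_rx_result demo_vlan_rx_handler(struct demo_frame *f)
{
	/* The ethertype follows the destination and source MAC. */
	const size_t type_off = 2 * DEMO_ETH_ALEN;

	if (f->len < type_off + DEMO_VLAN_HLEN + 2 ||
	    demo_get_be16(f->data + type_off) != DEMO_ETH_P_8021Q)
		return DEMO_RX_PASS;  /* untagged: dev unchanged */

	f->vlan_tci = demo_get_be16(f->data + type_off + 2);

	/* Drop the 4-byte tag by sliding the rest of the frame down. */
	memmove(f->data + type_off, f->data + type_off + DEMO_VLAN_HLEN,
		f->len - type_off - DEMO_VLAN_HLEN);
	f->len -= DEMO_VLAN_HLEN;

	f->dev = f->vlan_tci & DEMO_VLAN_VID_MASK;  /* "skb->dev changed" */
	return DEMO_RX_ANOTHER;
}
```

A real handler would also have to deal with the points Jesse raises (bridging on the same device, marking unclaimed tagged packets as PACKET_OTHERHOST); the model ignores both.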

* Re: [PATCH net-next-2.6 2/6] sfc: Implement generic features interface
From: Michał Mirosław @ 2011-04-03 20:50 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, linux-net-drivers
In-Reply-To: <1301860281.2935.25.camel@localhost>

On Sun, Apr 03, 2011 at 08:51:21PM +0100, Ben Hutchings wrote:
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
>  drivers/net/sfc/efx.c        |   17 ++++++++-
>  drivers/net/sfc/ethtool.c    |   78 ------------------------------------------
>  drivers/net/sfc/net_driver.h |    2 -
>  drivers/net/sfc/rx.c         |    2 +-
>  4 files changed, 16 insertions(+), 83 deletions(-)

Noticed one more thing:

> diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
> index d890679..98da250 100644
> --- a/drivers/net/sfc/efx.c
> +++ b/drivers/net/sfc/efx.c
[...]
> @@ -2452,12 +2463,14 @@ static int __devinit efx_pci_probe(struct pci_dev *pci_dev,
>  		return -ENOMEM;
>  	net_dev->features |= (type->offload_features | NETIF_F_SG |
>  			      NETIF_F_HIGHDMA | NETIF_F_TSO |
> -			      NETIF_F_GRO);
> +			      NETIF_F_GRO | NETIF_F_RXCSUM);

NETIF_F_GRO is enabled in register_netdev() now, so it's not
needed here.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH net-next-2.6 4/6] ethtool: Fill out and update comment for struct ethtool_ops
From: Michał Mirosław @ 2011-04-03 21:25 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, linux-net-drivers
In-Reply-To: <1301860351.2935.27.camel@localhost>

2011/4/3 Ben Hutchings <bhutchings@solarflare.com>:
> Briefly document all operations (except get_rx_ntuple), including
> whether they may return an error code and whether they are deprecated.
> Also mention some things that should be handled by the ethtool core
> rather than by drivers.
[...]
> + * @set_pauseparam: Set pause parameters.  Returns a negative error code
> + *     or zero.
> + * @get_rx_csum: Deprecated in favour of the netdev feature %NETIF_F_RXCSUM.
> + *     Report whether receive checksums are turned on or off.
> + * @set_rx_csum: Deprecated in favour of the netdev op ndo_set_flags.  Turn
> + *     receive checksum on or off.  Returns a negative error code or zero.

Correct op is ndo_set_features and not ndo_set_flags. This should also
refer to hw_features field as that's more likely to be the thing
needed as the replacement.

Best Regards,
Michał Mirosław

^ permalink raw reply

* [PATCH] mlx4: fix kfree on error path in new_steering_entry()
From: Mariusz Kozlowski @ 2011-04-03 21:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: Yevgeny Petrilin, Roland Dreier, Aleksey Senin, netdev,
	linux-kernel, Mariusz Kozlowski

On error path kfree() should get pointer to memory allocated by
kmalloc() not the address of variable holding it (which is on stack).

Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>
---
 drivers/net/mlx4/mcg.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mcg.c b/drivers/net/mlx4/mcg.c
index 37150b2..c6d336a 100644
--- a/drivers/net/mlx4/mcg.c
+++ b/drivers/net/mlx4/mcg.c
@@ -111,7 +111,7 @@ static int new_steering_entry(struct mlx4_dev *dev, u8 vep_num, u8 port,
 	u32 members_count;
 	struct mlx4_steer_index *new_entry;
 	struct mlx4_promisc_qp *pqp;
-	struct mlx4_promisc_qp *dqp;
+	struct mlx4_promisc_qp *dqp = NULL;
 	u32 prot;
 	int err;
 	u8 pf_num;
@@ -184,7 +184,7 @@ out_mailbox:
 out_alloc:
 	if (dqp) {
 		list_del(&dqp->list);
-		kfree(&dqp);
+		kfree(dqp);
 	}
 	list_del(&new_entry->list);
 	kfree(new_entry);
-- 
1.7.0.4

^ permalink raw reply related
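
The bug fixed above is easy to reproduce in miniature. The following userspace sketch uses malloc()/free() in place of kmalloc()/kfree(); struct demo_qp and demo_error_path() are hypothetical names invented for the example. It shows the two halves of the fix: the pointer starts out NULL so the cleanup test is well defined, and the cleanup releases the pointer's value (the heap block) rather than the address of the pointer variable (a stack slot).

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver's promisc QP entry. */
struct demo_qp {
	int qpn;
};

/*
 * Model of an error path with an optional allocation.
 * Returns 1 if a heap block was released, 0 if there was nothing to
 * release, and -1 if the allocation itself failed.
 */
static int demo_error_path(int allocate)
{
	struct demo_qp *dqp = NULL;  /* NULL unless we really allocate */
	int freed = 0;

	if (allocate) {
		dqp = malloc(sizeof(*dqp));
		if (!dqp)
			return -1;
		dqp->qpn = 42;
	}

	/* ... imagine a later step failing and jumping to cleanup ... */

	if (dqp) {
		/* Pass the heap block, not &dqp, which is a stack address. */
		free(dqp);
		freed = 1;
	}
	return freed;
}
```

Without the NULL initialisation, the `if (dqp)` test would read an indeterminate value whenever the allocation branch was skipped.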

* Re: [PATCH net-next-2.6 4/6] ethtool: Fill out and update comment for struct ethtool_ops
From: Ben Hutchings @ 2011-04-03 21:36 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: David Miller, netdev, linux-net-drivers
In-Reply-To: <BANLkTik-+yCyrdSKhirfUSw7ecr0rg=PFA@mail.gmail.com>

On Sun, 2011-04-03 at 23:25 +0200, Michał Mirosław wrote:
> 2011/4/3 Ben Hutchings <bhutchings@solarflare.com>:
> > Briefly document all operations (except get_rx_ntuple), including
> > whether they may return an error code and whether they are deprecated.
> > Also mention some things that should be handled by the ethtool core
> > rather than by drivers.
> [...]
> > + * @set_pauseparam: Set pause parameters.  Returns a negative error code
> > + *     or zero.
> > + * @get_rx_csum: Deprecated in favour of the netdev feature %NETIF_F_RXCSUM.
> > + *     Report whether receive checksums are turned on or off.
> > + * @set_rx_csum: Deprecated in favour of the netdev op ndo_set_flags.  Turn
> > + *     receive checksum on or off.  Returns a negative error code or zero.
> 
> Correct op is ndo_set_features and not ndo_set_flags.

That's what I meant.

> This should also
> refer to hw_features field as that's more likely to be the thing
> needed as the replacement.

Agreed.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* mISDN: fix "persistant" typo
From: Jan Engelhardt @ 2011-04-03 23:31 UTC (permalink / raw)
  To: isdn; +Cc: Linux Networking Developer Mailing List

parent fdbba80c7a1638bb2041d6349db27762e951a074 (v2.6.39-rc1-187-gfdbba80)
commit 4d556433b96279a2ca5837a6e9314d9d7bf56a29
Author: Jan Engelhardt <jengelh@medozas.de>
Date:   Mon Apr 4 01:30:35 2011 +0200

mISDN: fix "persistant" typo

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 drivers/isdn/mISDN/layer2.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/isdn/mISDN/layer2.c b/drivers/isdn/mISDN/layer2.c
index 4ae7505..9e1610f 100644
--- a/drivers/isdn/mISDN/layer2.c
+++ b/drivers/isdn/mISDN/layer2.c
@@ -1640,7 +1640,7 @@ l2_tei_remove(struct FsmInst *fi, int event, void *arg)
 }
 
 static void
-l2_st14_persistant_da(struct FsmInst *fi, int event, void *arg)
+l2_st14_persistent_da(struct FsmInst *fi, int event, void *arg)
 {
 	struct layer2 *l2 = fi->userdata;
 	struct sk_buff *skb = arg;
@@ -1654,7 +1654,7 @@ l2_st14_persistant_da(struct FsmInst *fi, int event, void *arg)
 }
 
 static void
-l2_st5_persistant_da(struct FsmInst *fi, int event, void *arg)
+l2_st5_persistent_da(struct FsmInst *fi, int event, void *arg)
 {
 	struct layer2 *l2 = fi->userdata;
 	struct sk_buff *skb = arg;
@@ -1671,7 +1671,7 @@ l2_st5_persistant_da(struct FsmInst *fi, int event, void *arg)
 }
 
 static void
-l2_st6_persistant_da(struct FsmInst *fi, int event, void *arg)
+l2_st6_persistent_da(struct FsmInst *fi, int event, void *arg)
 {
 	struct layer2 *l2 = fi->userdata;
 	struct sk_buff *skb = arg;
@@ -1685,7 +1685,7 @@ l2_st6_persistant_da(struct FsmInst *fi, int event, void *arg)
 }
 
 static void
-l2_persistant_da(struct FsmInst *fi, int event, void *arg)
+l2_persistent_da(struct FsmInst *fi, int event, void *arg)
 {
 	struct layer2 *l2 = fi->userdata;
 	struct sk_buff *skb = arg;
@@ -1829,14 +1829,14 @@ static struct FsmNode L2FnList[] =
 	{ST_L2_6, EV_L2_FRAME_ERROR, l2_frame_error},
 	{ST_L2_7, EV_L2_FRAME_ERROR, l2_frame_error_reest},
 	{ST_L2_8, EV_L2_FRAME_ERROR, l2_frame_error_reest},
-	{ST_L2_1, EV_L1_DEACTIVATE, l2_st14_persistant_da},
+	{ST_L2_1, EV_L1_DEACTIVATE, l2_st14_persistent_da},
 	{ST_L2_2, EV_L1_DEACTIVATE, l2_st24_tei_remove},
 	{ST_L2_3, EV_L1_DEACTIVATE, l2_st3_tei_remove},
-	{ST_L2_4, EV_L1_DEACTIVATE, l2_st14_persistant_da},
-	{ST_L2_5, EV_L1_DEACTIVATE, l2_st5_persistant_da},
-	{ST_L2_6, EV_L1_DEACTIVATE, l2_st6_persistant_da},
-	{ST_L2_7, EV_L1_DEACTIVATE, l2_persistant_da},
-	{ST_L2_8, EV_L1_DEACTIVATE, l2_persistant_da},
+	{ST_L2_4, EV_L1_DEACTIVATE, l2_st14_persistent_da},
+	{ST_L2_5, EV_L1_DEACTIVATE, l2_st5_persistent_da},
+	{ST_L2_6, EV_L1_DEACTIVATE, l2_st6_persistent_da},
+	{ST_L2_7, EV_L1_DEACTIVATE, l2_persistent_da},
+	{ST_L2_8, EV_L1_DEACTIVATE, l2_persistent_da},
 };
 
 static int
-- 
# Created with git-export-patch


^ permalink raw reply related

* Re: [PATCH] xen: netfront: fix declaration order
From: David Miller @ 2011-04-04  0:24 UTC (permalink / raw)
  To: eric.dumazet
  Cc: mirq-linux, netdev, jeremy.fitzhardinge, konrad.wilk,
	Ian.Campbell, xen-devel, virtualization
In-Reply-To: <1301828839.2837.143.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 03 Apr 2011 13:07:19 +0200

> [PATCH] xen: netfront: fix declaration order
> 
> Must declare xennet_fix_features() and xennet_set_features() before
> using them.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Ugh, it makes no sense that XEN won't make it into the x86_32
allmodconfig build.  Those dependencies in arch/x86/xen/Kconfig
are terrible.

For if it did, I would have caught this immediately.

^ permalink raw reply

* Re: pull request: sfc-next-2.6 2011-04-03
From: David Miller @ 2011-04-04  0:50 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1301859889.2935.23.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Sun, 03 Apr 2011 20:44:49 +0100

> The following changes since commit 9b12c75bf4d58dd85c987ee7b6a4356fdc7c1222:
>   David S. Miller (1):
>         net: Order ports in same order as addresses in flow objects.
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next-2.6.git master
> 
> 1. Implementation of the generic features interface in sfc.
> 2. Update ethtool_ops documentation.
> 3. Reimplement ETHTOOL_PHYS_ID as discussed, dropping the RTNL lock.
> 
> Please allow some time for others to review before pulling.

Ok, it seems there has been some feedback and you'll need to respin
these changes.

^ permalink raw reply

* linux-next: build failure after merge of the net tree
From: Stephen Rothwell @ 2011-04-04  1:28 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, "Michał Mirosław"

Hi all,

After merging the net tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

drivers/net/xen-netfront.c:1151: error: 'xennet_fix_features' undeclared here (not in a function)
drivers/net/xen-netfront.c:1152: error: 'xennet_set_features' undeclared here (not in a function)

Caused by commit fb507934fd6f ("net: convert xen-netfront to hw_features").

I have used the net tree from next-20110401 for today.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

^ permalink raw reply

* Re: linux-next: build failure after merge of the net tree
From: David Miller @ 2011-04-04  2:43 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel, mirq-linux
In-Reply-To: <20110404112840.98dd98ec.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 4 Apr 2011 11:28:40 +1000

> Hi all,
> 
> After merging the net tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
> 
> drivers/net/xen-netfront.c:1151: error: 'xennet_fix_features' undeclared here (not in a function)
> drivers/net/xen-netfront.c:1152: error: 'xennet_set_features' undeclared here (not in a function)
> 
> Caused by commit fb507934fd6f ("net: convert xen-netfront to hw_features").

Just pushed a fix for this to net-next-2.6, sorry about that.

^ permalink raw reply

* [PATCH v3] net: Allow no-cache copy from user on transmit
From: Tom Herbert @ 2011-04-04  4:56 UTC (permalink / raw)
  To: davem, netdev

This patch uses __copy_from_user_nocache on transmit to bypass data
cache for a performance improvement.  skb_add_data_nocache and
skb_copy_to_page_nocache can be called by sendmsg functions to use
this feature, initial support is in tcp_sendmsg.  This functionality is
configurable per device using ethtool.

Presumably, this feature would only be useful when the driver does
not touch the data.  The feature is turned on by default if a device
indicates that it does some form of checksum offload; it is off by
default for devices that do no checksum offload or indicate no checksum
is necessary.  For the former case copy-checksum is probably done
anyway, in the latter case the device is likely loopback in which case
the no cache copy is probably not beneficial.

This patch was tested using 200 instances of netperf TCP_RR with
1400 byte request and one byte reply.  Platform is 16 core AMD x86.

No-cache copy disabled:
   672703 tps, 97.13% utilization
   50/90/99% latency 244.31 484.205 1028.41

No-cache copy enabled:
   702113 tps, 96.16% utilization
   50/90/99% latency 238.56 467.56 956.955

Using 14000 byte request and response sizes demonstrates the
effects more dramatically:

No-cache copy disabled:
   79571 tps, 34.34% utilization
   50/90/95% latency 1584.46 2319.59 5001.76

No-cache copy enabled:
   83856 tps, 34.81% utilization
   50/90/95% latency 2508.42 2622.62 2735.88

Note especially the effect on latency tail (95th percentile).

This seems to provide a nice performance improvement and is
consistent in the tests I ran.  Presumably, this would provide
the greatest benefits in the presence of an application workload
stressing the cache and a lot of transmit data happening.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/bonding/bond_main.c |    2 +-
 include/linux/netdevice.h       |    3 +-
 include/net/sock.h              |   55 +++++++++++++++++++++++++++++++++++++++
 net/core/dev.c                  |   15 ++++++++++
 net/core/ethtool.c              |    2 +-
 net/ipv4/tcp.c                  |    7 +++--
 6 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 16d6fe9..b51e021 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1407,7 +1407,7 @@ static int bond_compute_features(struct bonding *bond)
 	int i;
 
 	features &= ~(NETIF_F_ALL_CSUM | BOND_VLAN_FEATURES);
-	features |=  NETIF_F_GSO_MASK | NETIF_F_NO_CSUM;
+	features |=  NETIF_F_GSO_MASK | NETIF_F_NO_CSUM | NETIF_F_NOCACHE_COPY;
 
 	if (!bond->first_slave)
 		goto done;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 423a544..1828119 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1066,6 +1066,7 @@ struct net_device {
 #define NETIF_F_NTUPLE		(1 << 27) /* N-tuple filters supported */
 #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
+#define NETIF_F_NOCACHE_COPY	(1 << 30) /* Use no-cache copyfromuser */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -1081,7 +1082,7 @@ struct net_device {
 	/* = all defined minus driver/device-class-related */
 #define NETIF_F_NEVER_CHANGE	(NETIF_F_HIGHDMA | NETIF_F_VLAN_CHALLENGED | \
 				  NETIF_F_LLTX | NETIF_F_NETNS_LOCAL)
-#define NETIF_F_ETHTOOL_BITS	(0x3f3fffff & ~NETIF_F_NEVER_CHANGE)
+#define NETIF_F_ETHTOOL_BITS	(0x7f3fffff & ~NETIF_F_NEVER_CHANGE)
 
 	/* List of features with software fallbacks. */
 #define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
diff --git a/include/net/sock.h b/include/net/sock.h
index da0534d..91c81f5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -52,6 +52,7 @@
 #include <linux/mm.h>
 #include <linux/security.h>
 #include <linux/slab.h>
+#include <linux/uaccess.h>
 
 #include <linux/filter.h>
 #include <linux/rculist_nulls.h>
@@ -1389,6 +1390,60 @@ static inline void sk_nocaps_add(struct sock *sk, int flags)
 	sk->sk_route_caps &= ~flags;
 }
 
+static inline int skb_do_copy_data_nocache(struct sock *sk, struct sk_buff *skb,
+					   char __user *from, char *to,
+					   int copy)
+{
+	if (skb->ip_summed == CHECKSUM_NONE) {
+		int err = 0;
+		__wsum csum = csum_and_copy_from_user(from, to, copy, 0, &err);
+		if (err)
+			return err;
+		skb->csum = csum_block_add(skb->csum, csum, skb->len);
+#ifdef ARCH_HAS_NOCACHE_UACCESS
+	} else if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY) {
+		if (!access_ok(VERIFY_READ, from, copy) ||
+		    __copy_from_user_nocache(to, from, copy))
+			return -EFAULT;
+#endif
+	} else if (copy_from_user(to, from, copy))
+		return -EFAULT;
+
+	return 0;
+}
+
+static inline int skb_add_data_nocache(struct sock *sk, struct sk_buff *skb,
+				       char __user *from, int copy)
+{
+	int err;
+
+	err = skb_do_copy_data_nocache(sk, skb, from, skb_put(skb, copy), copy);
+	if (err)
+		__skb_trim(skb, skb->len);
+
+	return err;
+}
+
+static inline int skb_copy_to_page_nocache(struct sock *sk, char __user *from,
+					   struct sk_buff *skb,
+					   struct page *page,
+					   int off, int copy)
+{
+	int err;
+
+	err = skb_do_copy_data_nocache(sk, skb, from,
+				       page_address(page) + off, copy);
+	if (err)
+		return err;
+
+	skb->len	     += copy;
+	skb->data_len	     += copy;
+	skb->truesize	     += copy;
+	sk->sk_wmem_queued   += copy;
+	sk_mem_charge(sk, copy);
+	return 0;
+}
+
 static inline int skb_copy_to_page(struct sock *sk, char __user *from,
 				   struct sk_buff *skb, struct page *page,
 				   int off, int copy)
diff --git a/net/core/dev.c b/net/core/dev.c
index 02f5637..4c58a90 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5425,6 +5425,17 @@ int register_netdevice(struct net_device *dev)
 		dev->features &= ~NETIF_F_GSO;
 	}
 
+#ifdef ARCH_HAS_NOCACHE_UACCESS
+	dev->hw_features |= NETIF_F_NOCACHE_COPY;
+
+	/* Turn on no cache copy if HW is doing checksum */
+	if ((dev->features & NETIF_F_ALL_CSUM) &&
+	    !(dev->features & NETIF_F_NO_CSUM)) {
+		dev->wanted_features |= NETIF_F_NOCACHE_COPY;
+		dev->features |= NETIF_F_NOCACHE_COPY;
+	}
+#endif
+
 	/* Enable GRO and NETIF_F_HIGHDMA for vlans by default,
 	 * vlan_dev_init() will do the dev->features check, so these features
 	 * are enabled only if supported by underlying device.
@@ -6182,6 +6193,10 @@ u32 netdev_increment_features(u32 all, u32 one, u32 mask)
 		}
 	}
 
+	/* If device can't no cache copy, don't do for all */
+	if (!(one & NETIF_F_NOCACHE_COPY))
+		all &= ~NETIF_F_NOCACHE_COPY;
+
 	one |= NETIF_F_ALL_CSUM;
 
 	one |= all & NETIF_F_ONE_FOR_ALL;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 439e4b0..719670a 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -359,7 +359,7 @@ static const char netdev_features_strings[ETHTOOL_DEV_FEATURE_WORDS * 32][ETH_GS
 	/* NETIF_F_NTUPLE */          "rx-ntuple-filter",
 	/* NETIF_F_RXHASH */          "rx-hashing",
 	/* NETIF_F_RXCSUM */          "rx-checksum",
-	"",
+	/* NETIF_F_NOCACHE_COPY */    "tx-nocache-copy",
 	"",
 };
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b22d450..054a59d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -999,7 +999,8 @@ new_segment:
 				/* We have some space in skb head. Superb! */
 				if (copy > skb_tailroom(skb))
 					copy = skb_tailroom(skb);
-				if ((err = skb_add_data(skb, from, copy)) != 0)
+				err = skb_add_data_nocache(sk, skb, from, copy);
+				if (err)
 					goto do_fault;
 			} else {
 				int merge = 0;
@@ -1042,8 +1043,8 @@ new_segment:
 
 				/* Time to copy data. We are close to
 				 * the end! */
-				err = skb_copy_to_page(sk, from, skb, page,
-						       off, copy);
+				err = skb_copy_to_page_nocache(sk, from, skb,
+							       page, off, copy);
 				if (err) {
 					/* If this page was new, give it to the
 					 * socket so it does not get leaked.
-- 
1.7.3.1


^ permalink raw reply related
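
The default policy described in the commit message, and the aggregation rule added to netdev_increment_features(), can be checked in isolation with a small userspace model. This is a sketch under assumptions: only bit 30 for the no-cache flag is taken from the patch; the other DEMO_F_* values and both function names are illustrative and are not the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative bit values; only bit 30 comes from the patch. */
#define DEMO_F_IP_CSUM      (1u << 1)
#define DEMO_F_NO_CSUM      (1u << 2)
#define DEMO_F_HW_CSUM      (1u << 3)
#define DEMO_F_IPV6_CSUM    (1u << 4)
#define DEMO_F_ALL_CSUM     (DEMO_F_IP_CSUM | DEMO_F_NO_CSUM | \
			     DEMO_F_HW_CSUM | DEMO_F_IPV6_CSUM)
#define DEMO_F_NOCACHE_COPY (1u << 30)

/*
 * Default chosen at registration time: on for devices that offload
 * checksums, off for devices with no offload or with "no checksum
 * needed" (typically loopback).
 */
static uint32_t demo_default_features(uint32_t features)
{
	if ((features & DEMO_F_ALL_CSUM) && !(features & DEMO_F_NO_CSUM))
		features |= DEMO_F_NOCACHE_COPY;
	return features;
}

/*
 * Aggregation for a master device such as a bond: one slave without
 * the flag clears it for the whole set.
 */
static uint32_t demo_increment_features(uint32_t all, uint32_t one)
{
	if (!(one & DEMO_F_NOCACHE_COPY))
		all &= ~DEMO_F_NOCACHE_COPY;
	return all;
}
```

The NETIF_F_ETHTOOL_BITS change in the patch follows the same arithmetic: setting bit 30 in 0x3f3fffff gives 0x7f3fffff.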

* Re: [PATCH v3] net: Allow no-cache copy from user on transmit
From: David Miller @ 2011-04-04  5:03 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <alpine.DEB.2.00.1104032136540.5452@pokey.mtv.corp.google.com>

From: Tom Herbert <therbert@google.com>
Date: Sun, 3 Apr 2011 21:56:17 -0700 (PDT)

> This patch uses __copy_from_user_nocache on transmit to bypass data
> cache for a performance improvement.  skb_add_data_nocache and
> skb_copy_to_page_nocache can be called by sendmsg functions to use
> this feature, initial support is in tcp_sendmsg.  This functionality is
> configurable per device using ethtool.
 ...
> Signed-off-by: Tom Herbert <therbert@google.com>

Applied, thanks Tom.

^ permalink raw reply

* Re: mISDN: fix "persistant" typo
From: David Miller @ 2011-04-04  5:03 UTC (permalink / raw)
  To: jengelh; +Cc: isdn, netdev
In-Reply-To: <alpine.LNX.2.01.1104040130390.21333@obet.zrqbmnf.qr>

From: Jan Engelhardt <jengelh@medozas.de>
Date: Mon, 4 Apr 2011 01:31:06 +0200 (CEST)

> mISDN: fix "persistant" typo
> 
> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

Applied.

^ permalink raw reply

* Re: [PATCH] mlx4: fix kfree on error path in new_steering_entry()
From: David Miller @ 2011-04-04  5:04 UTC (permalink / raw)
  To: mk; +Cc: yevgenyp, rolandd, alekseys, netdev, linux-kernel
In-Reply-To: <1301865983-6584-1-git-send-email-mk@lab.zgora.pl>

From: Mariusz Kozlowski <mk@lab.zgora.pl>
Date: Sun,  3 Apr 2011 23:26:23 +0200

> On error path kfree() should get pointer to memory allocated by
> kmalloc() not the address of variable holding it (which is on stack).
> 
> Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2] net: filter: Just In Time compiler
From: David Miller @ 2011-04-04  5:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, acme, bhutchings, hagen
In-Reply-To: <1301838968.2837.200.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 03 Apr 2011 15:56:08 +0200

>  arch/x86/Kbuild              |    1 
>  arch/x86/Kconfig             |    1 
>  arch/x86/net/bpf_jit.S       |  142 +++++++
>  arch/x86/net/bpf_jit_comp.c  |  655 +++++++++++++++++++++++++++++++++

Is this missing arch/x86/net/Makefile?

Otherwise I can't see how the x86 bpf objects get built.

^ permalink raw reply

* Re: [PATCH v2] net: filter: Just In Time compiler
From: Eric Dumazet @ 2011-04-04  5:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, acme, bhutchings, hagen
In-Reply-To: <20110403.220745.173856758.davem@davemloft.net>

On Sunday 03 April 2011 at 22:07 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 03 Apr 2011 15:56:08 +0200
> 
> >  arch/x86/Kbuild              |    1 
> >  arch/x86/Kconfig             |    1 
> >  arch/x86/net/bpf_jit.S       |  142 +++++++
> >  arch/x86/net/bpf_jit_comp.c  |  655 +++++++++++++++++++++++++++++++++
> 
> Is this missing arch/x86/net/Makefile?
> 
> Otherwise I can't see how the x86 bpf objects get built.

Arg yes, sorry, I'll add it for V3 ;)

diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
new file mode 100644
index 0000000..53b46d1
--- /dev/null
+++ b/arch/x86/net/Makefile
@@ -0,0 +1,5 @@
+#
+# Arch-specific network modules
+#
+obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
+



^ permalink raw reply related

* Re: [PATCH v3] net: Allow no-cache copy from user on transmit
From: David Miller @ 2011-04-04  5:23 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <20110403.220305.71570981.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Sun, 03 Apr 2011 22:03:05 -0700 (PDT)

> From: Tom Herbert <therbert@google.com>
> Date: Sun, 3 Apr 2011 21:56:17 -0700 (PDT)
> 
>> This patch uses __copy_from_user_nocache on transmit to bypass data
>> cache for a performance improvement.  skb_add_data_nocache and
>> skb_copy_to_page_nocache can be called by sendmsg functions to use
>> this feature, initial support is in tcp_sendmsg.  This functionality is
>> configurable per device using ethtool.
>  ...
>> Signed-off-by: Tom Herbert <therbert@google.com>
> 
> Applied, thanks Tom.

Actually, I'm sorry, I have to kick this back to you again Tom.

The original problem is that "linux/uaccess.h" has not been included
in the spot where you try to invoke the nocache copies.

linux/uaccess.h, when ARCH_HAS_NOCACHE_UACCESS is not defined, provides
dummy routines.

So it's not correct to use ARCH_HAS_NOCACHE_UACCESS to conditionalize
things in the networking, just make sure linux/uaccess.h is included
at the call sites.

Thanks.

^ permalink raw reply
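
The pattern David describes, a header that supplies a fallback so that call sites need no #ifdef, looks roughly like this in a userspace sketch. DEMO_ARCH_HAS_NOCACHE_COPY, demo_copy_nocache() and demo_add_data() are hypothetical names chosen for the example; the fallback simply performs an ordinary copy.

```c
#include <errno.h>
#include <string.h>

/*
 * "Header" part. An architecture with a real cache-bypassing copy
 * would define DEMO_ARCH_HAS_NOCACHE_COPY and supply its own
 * demo_copy_nocache(). Everyone else gets this ordinary copy.
 */
#ifndef DEMO_ARCH_HAS_NOCACHE_COPY
static inline unsigned long demo_copy_nocache(void *to, const void *from,
					      unsigned long n)
{
	memcpy(to, from, n);
	return 0;  /* number of bytes NOT copied, as with copy_from_user() */
}
#endif

/*
 * "Call site" part. No #ifdef here: including the header is enough,
 * and the right variant is selected at compile time.
 */
static int demo_add_data(char *to, const char *from, unsigned long n)
{
	if (demo_copy_nocache(to, from, n))
		return -EFAULT;
	return 0;
}
```

With this arrangement the only decision left to the networking code is the runtime feature flag, not the architecture's capabilities.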

* Re: [PATCH 2/2] virtio_net: remove send completion interrupts and avoid TX queue overrun through packet drop
From: Rusty Russell @ 2011-04-04  6:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Shirley Ma, Herbert Xu, davem, kvm, netdev
In-Reply-To: <20110327075254.GA3776@redhat.com>

On Sun, 27 Mar 2011 09:52:54 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > Though IIRC, qemu's virtio barfs if the first descriptor isn't just the
> > hdr (barf...).
> 
> Maybe we can try fixing this before adding more flags,
> then e.g. publish used flag can be reused to also
> tell us layout is flexible. Or just add a feature flag for that.

We should probably do this at some stage, yes.

> > > 2. I didn't have time to work on virtio2 ideas presented
> > >    at the kvm forum yet, any takers?
> > 
> > I didn't even attend.
> 
> Hmm, right. But what was presented there was discussed on list as well:
> a single R/W descriptor ring with valid bit instead of 2 rings
> + a descriptor array.

I'll be happy when we reach the point that the extra cacheline is
hurting us :)

Then we should do direct descriptors w/ a cookie as the value to hand
back when finished.  That seems to be close to optimal.

> I agree absolutely that not all lessons have been learned,
> playing with different ring layouts would make at least
> an interesting paper IMO.

Yes, I'd like to see the results...

Thanks,
Rusty.

^ permalink raw reply

* Re: [patch net-next-2.6] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Nicolas de Pesloüan @ 2011-04-04  6:54 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Jiri Pirko, netdev, davem, shemminger, kaber, fubar, eric.dumazet,
	andy, xiaosuo, Eric W. Biederman
In-Reply-To: <BANLkTin-VVOLQgPA7+CrtBdntnO-0C+R+g@mail.gmail.com>

On 03/04/2011 22:38, Jesse Gross wrote:
> On Sun, Apr 3, 2011 at 8:23 AM, Nicolas de Pesloüan
> <nicolas.2p.debian@gmail.com>  wrote:
>> On 02/04/2011 12:26, Jiri Pirko wrote:
>>>
>>> Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
>>> enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
>>> vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.
>>>
>>> For non-rx-vlan-hw-accel however, tagged skb goes thru whole
>>> __netif_receive_skb, it's untagged in ptype_base handler and reinjected.
>>>
>>> This inconsistency is fixed by this patch. Vlan untagging happens early in
>>> __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers)
>>> see the skb like it was untagged by hw.
>>>
>>> Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>
> You saw Eric B.'s recent patch trying to tackle the same issues, right?:
> http://permalink.gmane.org/gmane.linux.network/190229

Yes, of course I saw it.

>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 3da9fb0..bfe9fce 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -3130,6 +3130,12 @@ another_round:
>>>
>>>         __this_cpu_inc(softnet_data.processed);
>>>
>>> +       if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) {
>>> +               skb = vlan_untag(skb);
>>> +               if (unlikely(!skb))
>>> +                       goto out;
>>> +       }
>>> +
>>
>> I like the general idea of this patch, but I don't like the idea of
>> re-inserting specific code inside __netif_receive_skb.
>>
>> You made a great work removing most - if not all - device specific parts
>> from __netif_receive_skb, by introducing rx_handler.
>>
>> I think the above part (and vlan_untag) should be moved to a vlan_rx_handler
>> that would be set on the net_devices that are the parent of a vlan
>> net_device and are NOT hwaccel.
>>
>> vlan_rx_handler would return RX_HANDLER_ANOTHER if skb holds a tagged frame
>> (skb->dev changed) and RX_HANDLER_PASS if skb holds an untagged frame
>> (skb->dev unchanged).
>
> It would be nice to merge all of this together.  One complication is
> the interaction of bridging and vlan on the same device.  Some people
> want to have a bridge for each vlan and a bridge for untagged packets.
>   On older kernels with vlan accelerated hardware this was possible
> because vlan devices would get packets before bridging and on current
> kernels it is possible with ebtables rules.  If we use rx_handler for
> both I believe we would need to extend it some to allow multiple
> handlers.

I totally agree.

Remember that Jiri's original proposal (last summer) was to have several
rx_handlers per net_device. I still think we need several of them, because
the network stack needs to be generic and allow for any complex stacking
setup. The rx_handler framework may need to be enhanced for that, but I
think it is the right tool for all those per-net_device features.
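
To make the "several rx_handlers per net_device" idea concrete, here is a
small user-space sketch. It is not kernel code and not the API of any
posted patch: struct pkt, struct dev_handlers, run_rx_handlers() and the
RX_* values are hypothetical stand-ins that only mirror the
RX_HANDLER_ANOTHER / RX_HANDLER_PASS semantics discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical results, modelled on the RX_HANDLER_* values. */
enum rx_result { RX_CONSUMED, RX_ANOTHER, RX_PASS };

/* Stand-in for an skb: receiving device index and a "tagged" flag. */
struct pkt {
	int dev;
	int vlan_tagged;
};

typedef enum rx_result (*rx_handler_fn)(struct pkt *p);

#define MAX_HANDLERS 4

/* Ordered list of handlers attached to one device. */
struct dev_handlers {
	rx_handler_fn fn[MAX_HANDLERS];
	size_t count;
};

/* Run the handlers in order; the first one that does not return
 * RX_PASS decides what happens to the packet. */
static enum rx_result run_rx_handlers(const struct dev_handlers *h,
				      struct pkt *p)
{
	size_t i;

	assert(h != NULL && p != NULL && h->count <= MAX_HANDLERS);
	for (i = 0; i < h->count; i++) {
		enum rx_result res = h->fn[i](p);

		if (res != RX_PASS)
			return res;
	}
	return RX_PASS;
}

/* Tagged frames are untagged and retargeted to a (made-up) vlan device
 * index; untagged frames are passed on to the next handler. */
static enum rx_result vlan_handler(struct pkt *p)
{
	if (!p->vlan_tagged)
		return RX_PASS;
	p->vlan_tagged = 0;
	p->dev += 100;
	return RX_ANOTHER;
}

/* A bridge for untagged traffic: takes whatever reaches it. */
static enum rx_result bridge_handler(struct pkt *p)
{
	(void)p;
	return RX_CONSUMED;
}
```

With vlan_handler registered before bridge_handler, tagged frames go for
another round on the vlan device while untagged frames reach the bridge,
which is the setup Jesse describes.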

>> This would also cause protocol handlers to receive the untouched (tagged)
>> frame, if no setup required the frame to be untagged, which I think is the
>> right thing to do.
>
> At the very least we need to make sure that these packets are marked
> as PACKET_OTHERHOST because protocol handlers don't pay attention to
> the vlan field.

Agreed.

>>> @@ -3177,7 +3183,7 @@ ncls:
>>>                        ret = deliver_skb(skb, pt_prev, orig_dev);
>>>                        pt_prev = NULL;
>>>                }
>>> -             if (vlan_hwaccel_do_receive(&skb)) {
>>> +             if (vlan_do_receive(&skb)) {
>>>                        ret = __netif_receive_skb(skb);
>>>                        goto out;
>>>                } else if (unlikely(!skb))
>>
>> Why are you calling __netif_receive_skb here? Can't we simply goto
>> another_round?
>
> This code (other than the name change) predates the
> another_round/rx_handler changes.

Yes, you are right. Let's keep this for a possible follow-up patch, to avoid skb reinjection when it 
is not strictly necessary.

	Nicolas.

^ permalink raw reply

* Re: [patch net-next-2.6] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Jiri Pirko @ 2011-04-04  7:14 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Jesse Gross, netdev, davem, shemminger, kaber, fubar,
	eric.dumazet, andy, xiaosuo, Eric W. Biederman
In-Reply-To: <4D996B30.3080408@gmail.com>

Mon, Apr 04, 2011 at 08:54:40AM CEST, nicolas.2p.debian@gmail.com wrote:
>Le 03/04/2011 22:38, Jesse Gross a écrit :
>>On Sun, Apr 3, 2011 at 8:23 AM, Nicolas de Pesloüan
>><nicolas.2p.debian@gmail.com>  wrote:
>>>Le 02/04/2011 12:26, Jiri Pirko a écrit :
>>>>
>>>>Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
>>>>enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
>>>>vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.
>>>>
>>>>For non-rx-vlan-hw-accel however, tagged skb goes thru whole
>>>>__netif_receive_skb, it's untagged in ptype_base hander and reinjected
>>>>
>>>>This incosistency is fixed by this patch. Vlan untagging happens early in
>>>>__netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers)
>>>>see the skb like it was untagged by hw.
>>>>
>>>>Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>
>>You saw Eric B.'s recent patch trying to tackle the same issues, right?:
>>http://permalink.gmane.org/gmane.linux.network/190229
>
>Yes, of course I saw it.

I did not. Interestingly enough, the patch looks pretty much the same as
mine. I posted an RFC of my patch a while ago, before the merge window.
Anyway, I think my patch is nicer :)

>
>>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>>index 3da9fb0..bfe9fce 100644
>>>>--- a/net/core/dev.c
>>>>+++ b/net/core/dev.c
>>>>@@ -3130,6 +3130,12 @@ another_round:
>>>>
>>>>        __this_cpu_inc(softnet_data.processed);
>>>>
>>>>+       if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) {
>>>>+               skb = vlan_untag(skb);
>>>>+               if (unlikely(!skb))
>>>>+                       goto out;
>>>>+       }
>>>>+
>>>
>>>I like the general idea of this patch, but I don't like the idea of
>>>re-inserting specific code inside __netif_receive_skb.
>>>
>>>You made a great work removing most - if not all - device specific parts
>>>from __netif_receive_skb, by introducing rx_handler.
>>>
>>>I think the above part (and vlan_untag) should be moved to a vlan_rx_handler
>>>that would be set on the net_devices that are the parent of a vlan
>>>net_device and are NOT hwaccel.
>>>
>>>vlan_rx_handler would return RX_HANDLER_ANOTHER if skb holds a tagged frame
>>>(skb->dev changed) and RX_HANDLER_PASS if skb holds an untagged frame
>>>(skb->dev unchanged).
>>
>>It would be nice to merge all of this together.  One complication is
>>the interaction of bridging and vlan on the same device.  Some people
>>want to have a bridge for each vlan and a bridge for untagged packets.
>>  On older kernels with vlan accelerated hardware this was possible
>>because vlan devices would get packets before bridging and on current
>>kernels it is possible with ebtables rules.  If we use rx_handler for
>>both I believe we would need to extend it some to allow multiple
>>handlers.
>
>I totally agree.

I do not. The reason I do vlan_untag early is that it actually emulates
hw acceleration. The goal is to make the rx paths of hwaccel and
non-hwaccel similar. If you move vlan untag to an rx_handler, this goal
wouldn't be achieved.
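
For readers following along, this is roughly what "untagging early so it
looks like hw acceleration" amounts to on the wire. The sketch below is
user-space C operating on a raw Ethernet frame, not the vlan_untag() from
the patch; untag_frame() and struct untagged are hypothetical names, and
the kernel version works on an skb and stores the TCI in skb->vlan_tci.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN     14      /* dst MAC + src MAC + ethertype */
#define VLAN_TAG_LEN    4       /* TPID + TCI */
#define ETHERTYPE_8021Q 0x8100

/* Result of untagging: where the frame now starts and the tag that was
 * stripped, much like a NIC reporting the TCI out of band. */
struct untagged {
	uint8_t *frame;
	size_t len;
	uint16_t tci;   /* priority (3 bits), CFI (1 bit), VLAN ID (12 bits) */
	int had_tag;
};

/* Strip one 802.1Q tag in place. Returns 0 on success (tagged or not),
 * -1 if the frame is too short. */
static int untag_frame(uint8_t *frame, size_t len, struct untagged *out)
{
	uint16_t ethertype;

	assert(frame != NULL && out != NULL);
	if (len < ETH_HDR_LEN)
		return -1;

	out->frame = frame;
	out->len = len;
	out->tci = 0;
	out->had_tag = 0;

	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype != ETHERTYPE_8021Q)
		return 0;               /* nothing to do */
	if (len < ETH_HDR_LEN + VLAN_TAG_LEN)
		return -1;

	out->tci = (uint16_t)((frame[14] << 8) | frame[15]);

	/* Slide both MAC addresses forward over the tag, so that the
	 * encapsulated ethertype directly follows them. */
	memmove(frame + VLAN_TAG_LEN, frame, 12);
	out->frame = frame + VLAN_TAG_LEN;
	out->len = len - VLAN_TAG_LEN;
	out->had_tag = 1;
	return 0;
}
```

After this step the frame is indistinguishable from one that the NIC
untagged itself, so everything that runs later sees the same thing in
both cases.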

>
>Remember that Jiri's original proposal (last summer) was to have
>several rx_handlers per net_device. I still think we need several of
>them, because the network stack need to be generic and allow for any
>complex stacking setup. The rx_handler framework may need to be
>enhanced for that, but I think it is the right tool to do all those
>per net_device specific features.
>
>>>This would also cause protocol handlers to receive the untouched (tagged)
>>>frame, if no setup required the frame to be untagged, which I think is the
>>>right thing to do.
>>
>>At the very least we need to make sure that these packets are marked
>>as PACKET_OTHERHOST because protocol handlers don't pay attention to
>>the vlan field.
>
>Agreed.
>
>>>>@@ -3177,7 +3183,7 @@ ncls:
>>>>                       ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>                       pt_prev = NULL;
>>>>               }
>>>>-             if (vlan_hwaccel_do_receive(&skb)) {
>>>>+             if (vlan_do_receive(&skb)) {
>>>>                       ret = __netif_receive_skb(skb);
>>>>                       goto out;
>>>>               } else if (unlikely(!skb))
>>>
>>>Why are you calling __netif_receive_skb here? Can't we simply goto
>>>another_round?
>>
>>This code (other than the name change) predates the
>>another_round/rx_handler changes.
>
>Yes, you are right. Let's keep this for a possible follow-up patch,
>to avoid skb reinjection when it is not strictly necessary.

Doing another round here is what I intended to do in a follow-up patch
(I'm still figuring out how to move this effectively into rx_handlers).

>
>	Nicolas.

^ permalink raw reply

* Re: [PATCH 2/3] igb: transform igb_{update,validate}_nvm_checksum into wrappers of their *_with_offset equivalents
From: Stefan Assmann @ 2011-04-04  7:23 UTC (permalink / raw)
  To: netdev; +Cc: e1000-devel, jeffrey.t.kirsher, alexander.h.duyck
In-Reply-To: <1301404186-20872-3-git-send-email-sassmann@kpanic.de>

On 29.03.2011 15:09, Stefan Assmann wrote:
> igb_update_nvm_checksum_with_offset and igb_update_nvm_checksum are similar
> except one additionally handles an offset.
> Move igb_update_nvm_checksum_with_offset to e1000_nvm.c and transform
> igb_update_nvm_checksum to a simple wrapper of
> igb_update_nvm_checksum_with_offset.
> 
> Exactly the same is done for igb_validate_nvm_checksum.
> 
> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>

Please ignore this for now, Intel will send a replacement patch.

  Stefan

^ permalink raw reply

* Re: ipv6: Add support for RTA_PREFSRC
From: Daniel Walter @ 2011-04-04  7:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110401.204613.246536923.davem@davemloft.net>

On Fri, 2011-04-01 at 20:46 -0700, David Miller wrote:
> You can't change the layout of "struct in6_rtmsg", as that structure
> is explicitly exported to user space and changing it will break every
> application out there.

Hi,

I've dropped support for setting the preferred source via ioctl,
to keep "struct in6_rtmsg" untouched.
This limits RTA_PREFSRC support to netlink only, unless
we break the struct.

Do you see any other way around this problem?

regards,
daniel

---

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index bc3cde0..98348d5 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -42,6 +42,7 @@ struct fib6_config {
 
 	struct in6_addr	fc_dst;
 	struct in6_addr	fc_src;
+	struct in6_addr	fc_prefsrc;
 	struct in6_addr	fc_gateway;
 
 	unsigned long	fc_expires;
@@ -107,6 +108,7 @@ struct rt6_info {
 	struct rt6key			rt6i_dst ____cacheline_aligned_in_smp;
 	u32				rt6i_flags;
 	struct rt6key			rt6i_src;
+	struct rt6key			rt6i_prefsrc;
 	u32				rt6i_metric;
 	u32				rt6i_peer_genid;
 
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index c850e5f..2b37c20 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -141,6 +141,7 @@ struct rt6_rtnl_dump_arg {
 extern int rt6_dump_route(struct rt6_info *rt, void *p_arg);
 extern void rt6_ifdown(struct net *net, struct net_device *dev);
 extern void rt6_mtu_change(struct net_device *dev, unsigned mtu);
+extern void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 
 
 /*
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3daaf3c..26f9e14 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -825,6 +825,8 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 		dst_release(&rt->dst);
 	}
 
+	/* clean up prefsrc entries */
+	rt6_remove_prefsrc(ifp);
 out:
 	in6_ifa_put(ifp);
 }
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 1820887..e2d8463 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -930,10 +930,14 @@ static int ip6_dst_lookup_tail(struct sock *sk,
 		goto out_err_release;
 
 	if (ipv6_addr_any(&fl6->saddr)) {
-		err = ipv6_dev_get_saddr(net, ip6_dst_idev(*dst)->dev,
-					 &fl6->daddr,
-					 sk ? inet6_sk(sk)->srcprefs : 0,
-					 &fl6->saddr);
+		struct rt6_info *rt = (struct rt6_info *) *dst;
+		if (rt->rt6i_prefsrc.plen)
+			ipv6_addr_copy(&fl6->saddr, &rt->rt6i_prefsrc.addr);
+		else
+			err = ipv6_dev_get_saddr(net, ip6_dst_idev(*dst)->dev,
+						 &fl6->daddr,
+						 sk ? inet6_sk(sk)->srcprefs : 0,
+						 &fl6->saddr);
 		if (err)
 			goto out_err_release;
 	}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 843406f..f59dbae 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1325,6 +1325,20 @@ int ip6_route_add(struct fib6_config *cfg)
 	if (dev == NULL)
 		goto out;
 
+	/* check if prefsrc is set */
+	if (!ipv6_addr_any(&cfg->fc_prefsrc)) {
+		struct in6_addr saddr_buf;
+		ipv6_addr_copy(&saddr_buf, &cfg->fc_prefsrc);
+		if (!ipv6_chk_addr(net, &saddr_buf, dev, 0)) {
+			printk(KERN_DEBUG "invalid pref_src\n");
+			err = -EINVAL;
+			goto out;
+		}
+		ipv6_addr_copy(&rt->rt6i_prefsrc.addr, &cfg->fc_prefsrc);
+		rt->rt6i_prefsrc.plen = 128;
+	} else
+		rt->rt6i_prefsrc.plen = 0;
+
 	if (cfg->fc_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) {
 		rt->rt6i_nexthop = __neigh_lookup_errno(&nd_tbl, &rt->rt6i_gateway, dev);
 		if (IS_ERR(rt->rt6i_nexthop)) {
@@ -2037,6 +2051,39 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 	return rt;
 }
 
+/* remove deleted ip from prefsrc entries */
+struct arg_dev_net_ip {
+	struct net_device *dev;
+	struct net *net;
+	struct in6_addr *addr;
+};
+
+static int fib6_remove_prefsrc(struct rt6_info *rt, void *arg)
+{
+	struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev;
+	struct net *net = ((struct arg_dev_net_ip *)arg)->net;
+	struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr;
+
+	if (((void *)rt->rt6i_dev == dev || dev == NULL) &&
+	    rt != net->ipv6.ip6_null_entry &&
+	    ipv6_addr_equal(addr, &rt->rt6i_prefsrc.addr)) {
+		/* remove prefsrc entry */
+		rt->rt6i_prefsrc.plen = 0;
+	}
+	return 0;
+}
+
+void rt6_remove_prefsrc(struct inet6_ifaddr *ifp)
+{
+	struct net *net = dev_net(ifp->idev->dev);
+	struct arg_dev_net_ip adni = {
+		.dev = ifp->idev->dev,
+		.net = net,
+		.addr = &ifp->addr,
+	};
+	fib6_clean_all(net, fib6_remove_prefsrc, 0, &adni);
+}
+
 struct arg_dev_net {
 	struct net_device *dev;
 	struct net *net;
@@ -2183,6 +2230,9 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 		nla_memcpy(&cfg->fc_src, tb[RTA_SRC], plen);
 	}
 
+	if (tb[RTA_PREFSRC])
+		nla_memcpy(&cfg->fc_prefsrc, tb[RTA_PREFSRC], 16);
+
 	if (tb[RTA_OIF])
 		cfg->fc_ifindex = nla_get_u32(tb[RTA_OIF]);
 
@@ -2325,11 +2375,22 @@ static int rt6_fill_node(struct net *net,
 #endif
 			NLA_PUT_U32(skb, RTA_IIF, iif);
 	} else if (dst) {
-		struct inet6_dev *idev = ip6_dst_idev(&rt->dst);
 		struct in6_addr saddr_buf;
-		if (ipv6_dev_get_saddr(net, idev ? idev->dev : NULL,
-				       dst, 0, &saddr_buf) == 0)
+		if (rt->rt6i_prefsrc.plen) {
+			ipv6_addr_copy(&saddr_buf, &rt->rt6i_prefsrc.addr);
 			NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf);
+		} else {
+			struct inet6_dev *idev = ip6_dst_idev(&rt->dst);
+			if (ipv6_dev_get_saddr(net, idev ? idev->dev : NULL,
+					       dst, 0, &saddr_buf) == 0)
+				NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf);
+		}
+	}
+
+	if (rt->rt6i_prefsrc.plen) {
+		struct in6_addr saddr_buf;
+		ipv6_addr_copy(&saddr_buf, &rt->rt6i_prefsrc.addr);
+		NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf);
 	}
 
 	if (rtnetlink_put_metrics(skb, dst_metrics_ptr(&rt->dst)) < 0)

Daniel Walter
Software Engineer

Barracuda Networks AG
Eduard-Bodem-Gasse 1
6020 Innsbruck
Austria

Phone: +43 (0) 508 100
Fax: +43 508 100 20
eMail: mailto:DWalter@barracuda.com
Web: www.barracudanetworks.com, www.phion.com

^ permalink raw reply related

* [PATCH] MAINTAINERS: add entry for Xen network backend
From: Ian Campbell @ 2011-04-04  8:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ian Campbell, netdev, xen-devel, Andrew Morton, Linus Torvalds

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 MAINTAINERS |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6b4b9cd..bb702f3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6916,6 +6916,12 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86.
 S:	Maintained
 F:	drivers/platform/x86
 
+XEN NETWORK BACKEND DRIVER
+M:	Ian Campbell <ian.campbell@citrix.com>
+L:	xen-devel@lists.xensource.com (moderated for non-subscribers)
+S:	Supported
+F:	drivers/net/xen-netback/*
+
 XEN PCI SUBSYSTEM
 M:	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 L:	xen-devel@lists.xensource.com (moderated for non-subscribers)
-- 
1.7.2.5

^ permalink raw reply related
