Netdev List

* Re: [PATCH net-next] iproute2: bridge: support vlan range
From: Scott Feldman @ 2015-01-18  1:35 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: Netdev, shemminger, vyasevic@redhat.com, Wilson Kok
In-Reply-To: <1421391147-35021-1-git-send-email-roopa@cumulusnetworks.com>

On Thu, Jan 15, 2015 at 10:52 PM,  <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> This patch adds vlan range support to bridge command
> using the newly added vinfo flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
> BRIDGE_VLAN_INFO_RANGE_END.
>
> $bridge vlan show
> port    vlan ids
> br0      1 PVID Egress Untagged
>
> dummy0   1 PVID Egress Untagged
>
> $bridge vlan add vid 10-15 dev dummy0
> port    vlan ids
> br0      1 PVID Egress Untagged
>
> dummy0   1 PVID Egress Untagged
>          10
>          11
>          12
>          13
>          14
>          15
>
> $bridge vlan del vid 14 dev dummy0
>
> $bridge vlan show
> port    vlan ids
> br0      1 PVID Egress Untagged
>
> dummy0   1 PVID Egress Untagged
>          10
>          11
>          12
>          13
>          15
>
> $bridge vlan del vid 10-15 dev dummy0
>
> $bridge vlan show
> port    vlan ids
> br0      1 PVID Egress Untagged
>
> dummy0   1 PVID Egress Untagged
>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
> ---
>  bridge/vlan.c             |   46 ++++++++++++++++++++++++++++++++++++++-------
>  include/linux/if_bridge.h |    2 ++
>  2 files changed, 41 insertions(+), 7 deletions(-)
>
> diff --git a/bridge/vlan.c b/bridge/vlan.c
> index 3bd7b0d..90b3b6b 100644
> --- a/bridge/vlan.c
> +++ b/bridge/vlan.c
> @@ -32,6 +32,7 @@ static int vlan_modify(int cmd, int argc, char **argv)
>         } req;
>         char *d = NULL;
>         short vid = -1;
> +       short vid_end = -1;
>         struct rtattr *afspec;
>         struct bridge_vlan_info vinfo;
>         unsigned short flags = 0;
> @@ -49,8 +50,18 @@ static int vlan_modify(int cmd, int argc, char **argv)
>                         NEXT_ARG();
>                         d = *argv;
>                 } else if (strcmp(*argv, "vid") == 0) {
> +                       char *p;
>                         NEXT_ARG();
> -                       vid = atoi(*argv);
> +                       p = strchr(*argv, '-');
> +                       if (p) {
> +                               *p = '\0';
> +                               p++;
> +                               vinfo.vid = atoi(*argv);
> +                               vid_end = atoi(p);

Is "vid 10-" same as "vid 10-0"?

Is "vid -15" same as "vid 0-15"?

What is "vid -"?

Does the "-" char mess up shells?  I don't know the answer; just asking.

> +                               vinfo.flags |= BRIDGE_VLAN_INFO_RANGE_BEGIN;
> +                       } else {
> +                               vinfo.vid = atoi(*argv);
> +                       }
>                 } else if (strcmp(*argv, "self") == 0) {
>                         flags |= BRIDGE_FLAGS_SELF;
>                 } else if (strcmp(*argv, "master") == 0) {
> @@ -67,7 +78,7 @@ static int vlan_modify(int cmd, int argc, char **argv)
>                 argc--; argv++;
>         }
>
> -       if (d == NULL || vid == -1) {
> +       if (d == NULL || vinfo.vid == -1) {

Where was vinfo.vid initialized to -1?  Maybe use vid rather than
vinfo.vid in the code above where parsing the arg, and continue using
vid and vid_end until final put of vinfo.
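A minimal sketch of how the range argument could be parsed so that the questions above have defined answers. It is not the code from the patch: parse_vid(), parse_vid_range() and VLAN_VID_MAX are hypothetical names, and rejecting "10-", "-15", "-" and descending or empty ranges is an assumed policy. Results go into local vid/vid_end variables, as suggested, so the caller can copy them into vinfo only after validation.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define VLAN_VID_MAX 4094	/* assumed limit: 0 and 4095 are reserved IDs */

/* Parse one VLAN ID in [1, VLAN_VID_MAX]; return -1 on malformed input. */
static int parse_vid(const char *s)
{
	char *end;
	long v;

	if (*s < '0' || *s > '9')	/* rejects "", "-15", "+3", " 7" */
		return -1;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || *end != '\0' || v < 1 || v > VLAN_VID_MAX)
		return -1;
	return (int)v;
}

/*
 * Parse "N" or "N-M" into *vid and *vid_end (*vid_end is -1 for a single ID).
 * The argument is copied, so the caller's argv string is left untouched.
 * Returns 0 on success and -1 on error; outputs are unspecified on error.
 */
static int parse_vid_range(const char *arg, int *vid, int *vid_end)
{
	char buf[32];
	char *dash;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	*vid_end = -1;
	dash = strchr(buf, '-');
	if (dash) {
		*dash++ = '\0';
		*vid_end = parse_vid(dash);
		if (*vid_end < 0)
			return -1;
	}
	*vid = parse_vid(buf);
	if (*vid < 0 || (dash && *vid_end <= *vid))
		return -1;
	return 0;
}
```

With this, "vid 10-15" yields vid = 10 and vid_end = 15, "vid 14" yields vid_end = -1, and the malformed forms are errors rather than silently becoming 0. On the shell question: a hyphen inside a word such as 10-15 has no special meaning to POSIX shells, so no quoting is needed.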

-scott

* Re: [net-next v2 00/17][pull request] Intel Wired LAN Driver Updates 2015-01-16
From: David Miller @ 2015-01-18  1:34 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <1421414946-22179-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 16 Jan 2015 05:28:49 -0800

> This series contains updates to i40e and i40evf.
...
> v2:
>  - Dropped patch 10 "i40e: clean up PTP log messages" based on feedback
>    from David Laight and David Miller
>  - Split up the original patch 13 "i40e: AQ API updates for new commands"
>    into 2 patches (now #12 & #13) based on feedback from Or Gerlitz

Pulled, thanks a lot Jeff.

* Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
From: Scott Feldman @ 2015-01-18  1:05 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: stephen@networkplumber.org, David S. Miller, Jamal Hadi Salim,
	Jiří Pírko, Arad, Ronen, Thomas Graf,
	john fastabend, vyasevic@redhat.com, Netdev, Wilson Kok,
	Andy Gospodarek
In-Reply-To: <1421479975-62049-1-git-send-email-roopa@cumulusnetworks.com>

On Fri, Jan 16, 2015 at 11:32 PM,  <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
> there is a need to not re-forward the frames that come up to the
> kernel in software.
>
> Typically these are broadcast or multicast packets forwarded by the
> hardware to multiple destination ports including sending a copy of
> the packet to the kernel (e.g. an arp broadcast).
> The bridge driver will try to forward the packet again, resulting in
> two copies of the same packet.
>
> These packets can also come up to the kernel for logging when they hit
> a LOG acl in hardware.
>
> This patch makes forwarding a flag on the port similar to
> learn and flood and drops the packet just before forwarding.
> (The forwarding disable on a bridge is tested to work on our boxes.
> The bridge port flag addition is only compile tested.
> This will need to be further refined to cover cases where a non-switch port
> is bridged to a switch port etc. We will submit more patches to cover
> all cases if we agree on this approach).

Good topic to bring up, thanks for proposing a patch.  There are indeed
duplicate pkts sent out in the case where both the bridge and the
offloaded device are flooding these non-unicast pkts, such as ARP
requests.  We do have per-port control today over unicast flooding
using BR_FLOOD (IFLA_BRPORT_UNICAST_FLOOD).

As you point out, this doesn't solve the case for non-offloaded ports
bridged with switch ports.  If this port setting is enabled on an
offloaded switch port, for example, the non-offloaded port can't get
an ARP request resolved, if the MAC is behind the offloaded switch
port.  But do we care?  Is there a use-case for this one, mixing
offloaded and non-offloaded ports in a bridge?

>
> Other ways to solve the same problem could be to:
> - use the offload feature flag on these switch ports to avoid the
> re-forward:
> https://www.marc.info/?l=linux-netdev&m=141820235010603&w=2
>
> - Or the switch driver can mark or set a flag in the skb, which the bridge
> driver can use to avoid a re-forward.
>
> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> ---
>  include/linux/if_bridge.h    |    3 ++-
>  include/uapi/linux/if_link.h |    1 +
>  net/bridge/br_forward.c      |   13 +++++++++++++
>  net/bridge/br_if.c           |    2 +-
>  net/bridge/br_netlink.c      |    4 +++-
>  net/bridge/br_sysfs_if.c     |    1 +
>  net/core/rtnetlink.c         |    4 +++-
>  7 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 0a8ce76..c79f4eb 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -40,10 +40,11 @@ struct br_ip_list {
>  #define BR_ADMIN_COST          BIT(4)
>  #define BR_LEARNING            BIT(5)
>  #define BR_FLOOD               BIT(6)
> -#define BR_AUTO_MASK           (BR_FLOOD | BR_LEARNING)
>  #define BR_PROMISC             BIT(7)
>  #define BR_PROXYARP            BIT(8)
>  #define BR_LEARNING_SYNC       BIT(9)
> +#define BR_FORWARD             BIT(10)

The name BR_FORWARD might confuse people thinking this is related to
STP FORWARDING state.  We have BR_FLOOD for unknown unicast flooding.
How about renaming BR_FLOOD to BR_FLOOD_UNICAST and add
BR_FLOOD_BROADCAST?  So you would have:

  IFLA_BRPORT_UNICAST_FLOOD    BR_FLOOD_UNICAST    /* flood unknown unicast traffic to port */
  IFLA_BRPORT_BROADCAST_FLOOD  BR_FLOOD_BROADCAST  /* flood bcast/mcast traffic to port */

> +#define BR_AUTO_MASK           (BR_FLOOD | BR_LEARNING | BR_FORWARD)
>
>  extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..d394625 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -246,6 +246,7 @@ enum {
>         IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
>         IFLA_BRPORT_PROXYARP,   /* proxy ARP */
>         IFLA_BRPORT_LEARNING_SYNC, /* mac learning sync from device */
> +       IFLA_BRPORT_FORWARD,    /* enable forwarding on a device */
>         __IFLA_BRPORT_MAX
>  };
>  #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index f96933a..98c41c8 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -81,10 +81,23 @@ static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
>                 br_forward_finish);
>  }
>
> +int br_hw_forward_finish(struct sk_buff *skb)
> +{
> +       kfree_skb(skb);
> +
> +       return 0;
> +}
> +
>  static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
>  {
>         struct net_device *indev;
>
> +       if (!(to->flags & BR_FORWARD)) {
> +               NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, skb->dev, to->dev,
> +                       br_hw_forward_finish);
> +               return;
> +       }
> +

Seems you should make the (flags & BR_FORWARD) check earlier, before
skb cloning, in br_flood(), alongside the (flags & BR_FLOOD) check.

Also, the above code is skipping some vlan checks (br_handle_vlan).
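To illustrate the first suggestion, here is a toy userspace model of a flood loop that tests the per-port flags before any copy is made. It is only a sketch under stated assumptions: struct ex_port, ex_flood() and the EX_BR_* flags are hypothetical stand-ins, not the kernel's br_flood() or its real flag values, and the copy counter merely stands in for an skb clone.

```c
#include <stddef.h>

/* Hypothetical stand-ins for BR_FLOOD and the proposed BR_FORWARD. */
#define EX_BR_FLOOD	(1u << 0)
#define EX_BR_FORWARD	(1u << 1)

struct ex_port {
	unsigned int flags;
	unsigned int tx_frames;	/* frames handed to this port */
};

/*
 * Flood one frame to every eligible port except the ingress port.
 * Returns the number of copies made; each copy stands in for an skb clone.
 */
static unsigned int ex_flood(struct ex_port *ports, size_t nports,
			     const struct ex_port *ingress)
{
	const unsigned int need = EX_BR_FLOOD | EX_BR_FORWARD;
	unsigned int copies = 0;
	size_t i;

	for (i = 0; i < nports; i++) {
		struct ex_port *p = &ports[i];

		if (p == ingress)
			continue;
		/* Both flag checks happen here, before any copy is made. */
		if ((p->flags & need) != need)
			continue;
		copies++;
		p->tx_frames++;
	}
	return copies;
}
```

In this model a port with forwarding disabled costs one flag test and no copy, whereas dropping in the forward path would pay for the copy first and free it afterwards.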

-scott

* Re: [PATCH net-next 1/2] udp: Do not require sock in udp_tunnel_xmit_skb
From: Jesse Gross @ 2015-01-17 23:45 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Thomas Graf, netdev
In-Reply-To: <1421518700-22460-2-git-send-email-therbert@google.com>

On Sat, Jan 17, 2015 at 10:18 AM, Tom Herbert <therbert@google.com> wrote:
> The UDP tunnel transmit functions udp_tunnel_xmit_skb and
> udp_tunnel6_xmit_skb include a socket argument. The socket being
> passed to the functions (from VXLAN) is a UDP socket created for the
> receive side. The only thing that the socket is used for in the
> transmit functions is to get the setting for checksum (enabled or
> zero). This patch removes the argument and adds a nocheck argument
> for checksum setting. This eliminates the unnecessary dependency
> on a UDP socket for UDP tunnel transmit.
>
> Signed-off-by: Tom Herbert <therbert@google.com>

I think you need to update Geneve as well:

net/ipv4/geneve.c:139:38: warning: incorrect type in argument 1 (different base types)
net/ipv4/geneve.c:139:38:    expected struct rtable *rt
net/ipv4/geneve.c:139:38:    got struct socket *sock
net/ipv4/geneve.c:139:46: warning: incorrect type in argument 2 (different base types)
net/ipv4/geneve.c:139:46:    expected struct sk_buff *skb
net/ipv4/geneve.c:139:46:    got struct rtable *rt
net/ipv4/geneve.c:139:50: warning: incorrect type in argument 3 (different base types)
net/ipv4/geneve.c:139:50:    expected restricted __be32 [usertype] src
net/ipv4/geneve.c:139:50:    got struct sk_buff *[assigned] skb
net/ipv4/geneve.c:139:60: warning: incorrect type in argument 5 (different base types)
net/ipv4/geneve.c:139:60:    expected unsigned char [unsigned] [usertype] tos
net/ipv4/geneve.c:139:60:    got restricted __be32 [usertype] dst
net/ipv4/geneve.c:140:41: warning: incorrect type in argument 7 (different base types)
net/ipv4/geneve.c:140:41:    expected restricted __be16 [usertype] df
net/ipv4/geneve.c:140:41:    got unsigned char [unsigned] [usertype] ttl
net/ipv4/geneve.c:140:60: warning: incorrect type in argument 10 (different base types)
net/ipv4/geneve.c:140:60:    expected bool [unsigned] [usertype] xnet
net/ipv4/geneve.c:140:60:    got restricted __be16 [usertype] dst_port

* RE: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
From: Arad, Ronen @ 2015-01-17 21:14 UTC (permalink / raw)
  To: roopa@cumulusnetworks.com, stephen@networkplumber.org,
	davem@davemloft.net, jhs@mojatatu.com, netdev@vger.kernel.org,
	sfeldma@gmail.com, jiri@resnulli.us, tgraf@suug.ch,
	john.fastabend@gmail.com, vyasevic@redhat.com
  Cc: wkok@cumulusnetworks.com, gospo@cumulusnetworks.com
In-Reply-To: <1421479975-62049-1-git-send-email-roopa@cumulusnetworks.com>



>-----Original Message-----
>From: roopa@cumulusnetworks.com [mailto:roopa@cumulusnetworks.com]
>Sent: Friday, January 16, 2015 11:33 PM
>To: stephen@networkplumber.org; davem@davemloft.net; jhs@mojatatu.com;
>sfeldma@gmail.com; jiri@resnulli.us; Arad, Ronen; tgraf@suug.ch;
>john.fastabend@gmail.com; vyasevic@redhat.com
>Cc: netdev@vger.kernel.org; wkok@cumulusnetworks.com;
>gospo@cumulusnetworks.com
>Subject: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
>
>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
>On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>there is a need to not re-forward the frames that come up to the
>kernel in software.
>
>Typically these are broadcast or multicast packets forwarded by the
>hardware to multiple destination ports including sending a copy of
>the packet to the kernel (e.g. an arp broadcast).
>The bridge driver will try to forward the packet again, resulting in
>two copies of the same packet.
>
>These packets can also come up to the kernel for logging when they hit
>a LOG acl in hardware.
>
>This patch makes forwarding a flag on the port similar to
>learn and flood and drops the packet just before forwarding.
>(The forwarding disable on a bridge is tested to work on our boxes.
>The bridge port flag addition is only compile tested.
>This will need to be further refined to cover cases where a non-switch port
>is bridged to a switch port etc. We will submit more patches to cover
>all cases if we agree on this approach).
>
>Other ways to solve the same problem could be to:
>- use the offload feature flag on these switch ports to avoid the
>re-forward:
>https://www.marc.info/?l=linux-netdev&m=141820235010603&w=2
>
>- Or the switch driver can mark or set a flag in the skb, which the bridge
>driver can use to avoid a re-forward.
>

The proposed patch does not fit with the offload feature flag.
The premise of the offload feature flag is that offloading is driven by the
switch port driver without user intervention. This patch requires different
settings for BR_FLOOD in the software bridge port and the switch port driver.
The alternatives suggested (offload flag or skb flag) are better.

The proposed patch avoids the re-forward, but not without cost. For example, in
the case of unicast flood with a local destination, the skb is cloned for each
port before the forward avoidance in __br_forward. Is that acceptable overhead?

[...]

* Re: [patch net-next 2/2] net: replace br_fdb_external_learn_* calls with switchdev notifier events
From: David Miller @ 2015-01-17 20:28 UTC (permalink / raw)
  To: jiri; +Cc: sfeldma, netdev, jhs, sfeldma, stephen, linus.luessing, tgraf
In-Reply-To: <20150117083112.GB1891@nanopsycho.orion>

From: Jiri Pirko <jiri@resnulli.us>
Date: Sat, 17 Jan 2015 09:31:12 +0100

>>> @@ -3026,11 +3026,17 @@ static void rocker_port_fdb_learn_work(struct work_struct *work)
>>>                 container_of(work, struct rocker_fdb_learn_work, work);
>>>         bool removing = (lw->flags & ROCKER_OP_FLAG_REMOVE);
>>>         bool learned = (lw->flags & ROCKER_OP_FLAG_LEARNED);
>>> +       struct netdev_switch_notifier_fdb_info info;
>>> +
>>> +       info.addr = lw->addr;
>>> +       info.vid = lw->vid;
>>
>>If you respin patch, use initializer to zero out other members, just
>>to future proof it.
> 
> There are no other members :)

That's why he said "future proof"; he too knows there are no other
members (currently).
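A minimal sketch of the initializer being suggested, assuming a simplified stand-in struct. struct fdb_info_example and its added_later member are hypothetical and are not the kernel definition of the notifier info struct. The point is standard C: members not named in a designated initializer are zero-initialized, so a member added later does not carry stack garbage.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the notifier info struct, not the kernel one. */
struct fdb_info_example {
	const unsigned char *addr;
	uint16_t vid;
	bool added_later;	/* imagined member added by a future patch */
};

static struct fdb_info_example make_info(const unsigned char *addr,
					 uint16_t vid)
{
	/* Members not named here are zero-initialized by the C standard. */
	struct fdb_info_example info = {
		.addr = addr,
		.vid = vid,
	};

	return info;
}
```

By contrast, declaring the struct without an initializer and assigning addr and vid one by one leaves any later-added member indeterminate.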

* [PATCH net-next 2/2] vxlan: Eliminate dependency on UDP socket in transmit path
From: Tom Herbert @ 2015-01-17 18:18 UTC (permalink / raw)
  To: davem, tgraf, netdev
In-Reply-To: <1421518700-22460-1-git-send-email-therbert@google.com>

In the vxlan transmit path there is no need to reference the tunnel's
socket, which is needed only for the receive side. We do, however,
need the vxlan_dev flags. This patch eliminates references
to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
vxlan_sock flags.
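The flag handling described above can be sketched in isolation as follows. This is an illustration under assumptions, not the patch itself: the EX_-prefixed names and helper functions are hypothetical, and only the values 0x400 (REMCSUM_RX) and 0x800 (GBP) appear in the diff context, so the other flag values are assumed and should be checked against include/net/vxlan.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_F_UDP_CSUM		0x40u	/* assumed value */
#define EX_F_UDP_ZERO_CSUM6_TX	0x80u	/* assumed value */
#define EX_F_UDP_ZERO_CSUM6_RX	0x100u	/* assumed value */
#define EX_F_REMCSUM_RX		0x400u	/* value shown in the diff context */
#define EX_F_GBP		0x800u	/* value shown in the diff context */

/* Receive-side flags: the only ones a shared socket has to agree on. */
#define EX_F_RCV_FLAGS	(EX_F_GBP | EX_F_UDP_ZERO_CSUM6_RX | EX_F_REMCSUM_RX)

/* Flags stored on the socket at creation time. */
static uint32_t ex_sock_flags(uint32_t dev_flags)
{
	return dev_flags & EX_F_RCV_FLAGS;
}

/* A device may share a socket only if its receive flags match exactly. */
static bool ex_sock_matches(uint32_t sock_flags, uint32_t dev_flags)
{
	return sock_flags == (dev_flags & EX_F_RCV_FLAGS);
}

/* Transmit checksum choice comes from the device flags, not a socket. */
static bool ex_nocheck_v4(uint32_t vxflags)
{
	return !(vxflags & EX_F_UDP_CSUM);
}

static bool ex_nocheck_v6(uint32_t vxflags)
{
	return !!(vxflags & EX_F_UDP_ZERO_CSUM6_TX);
}
```

Under this scheme two devices that differ only in transmit flags can share one receive socket, while a difference in any receive flag forces a separate socket.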

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/vxlan.c           | 60 ++++++++++++++++++++-----------------------
 include/net/vxlan.h           | 13 ++++++----
 net/openvswitch/vport-vxlan.c |  6 ++---
 3 files changed, 38 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4fb4205..fb7805b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -270,12 +270,13 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
 					  __be16 port, u32 flags)
 {
 	struct vxlan_sock *vs;
-	u32 match_flags = flags & VXLAN_F_UNSHAREABLE;
+
+	flags &= VXLAN_F_RCV_FLAGS;
 
 	hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
 		if (inet_sk(vs->sock->sk)->inet_sport == port &&
 		    inet_sk(vs->sock->sk)->sk.sk_family == family &&
-		    (vs->flags & VXLAN_F_UNSHAREABLE) == match_flags)
+		    vs->flags == flags)
 			return vs;
 	}
 	return NULL;
@@ -1668,7 +1669,7 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
-static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, struct vxlan_sock *vs,
+static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, u32 vxflags,
 				struct vxlan_metadata *md)
 {
 	struct vxlanhdr_gbp *gbp;
@@ -1686,21 +1687,20 @@ static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, struct vxlan_sock *vs,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static int vxlan6_xmit_skb(struct vxlan_sock *vs,
-			   struct dst_entry *dst, struct sk_buff *skb,
+static int vxlan6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb,
 			   struct net_device *dev, struct in6_addr *saddr,
 			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
 			   __be16 src_port, __be16 dst_port,
-			   struct vxlan_metadata *md, bool xnet)
+			   struct vxlan_metadata *md, bool xnet, u32 vxflags)
 {
 	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
-	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
+	bool udp_sum = !(vxflags & VXLAN_F_UDP_ZERO_CSUM6_TX);
 	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
 	u16 hdrlen = sizeof(struct vxlanhdr);
 
-	if ((vs->flags & VXLAN_F_REMCSUM_TX) &&
+	if ((vxflags & VXLAN_F_REMCSUM_TX) &&
 	    skb->ip_summed == CHECKSUM_PARTIAL) {
 		int csum_start = skb_checksum_start_offset(skb);
 
@@ -1758,14 +1758,14 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 		}
 	}
 
-	if (vs->flags & VXLAN_F_GBP)
-		vxlan_build_gbp_hdr(vxh, vs, md);
+	if (vxflags & VXLAN_F_GBP)
+		vxlan_build_gbp_hdr(vxh, vxflags, md);
 
 	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	udp_tunnel6_xmit_skb(dst, skb, dev, saddr, daddr, prio,
 			     ttl, src_port, dst_port,
-			     udp_get_no_check6_tx(vs->sock->sk));
+			     !!(vxflags & VXLAN_F_UDP_ZERO_CSUM6_TX));
 	return 0;
 err:
 	dst_release(dst);
@@ -1773,20 +1773,19 @@ err:
 }
 #endif
 
-int vxlan_xmit_skb(struct vxlan_sock *vs,
-		   struct rtable *rt, struct sk_buff *skb,
+int vxlan_xmit_skb(struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
 		   __be16 src_port, __be16 dst_port,
-		   struct vxlan_metadata *md, bool xnet)
+		   struct vxlan_metadata *md, bool xnet, u32 vxflags)
 {
 	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
-	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
+	bool udp_sum = !!(vxflags & VXLAN_F_UDP_CSUM);
 	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
 	u16 hdrlen = sizeof(struct vxlanhdr);
 
-	if ((vs->flags & VXLAN_F_REMCSUM_TX) &&
+	if ((vxflags & VXLAN_F_REMCSUM_TX) &&
 	    skb->ip_summed == CHECKSUM_PARTIAL) {
 		int csum_start = skb_checksum_start_offset(skb);
 
@@ -1838,14 +1837,14 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 		}
 	}
 
-	if (vs->flags & VXLAN_F_GBP)
-		vxlan_build_gbp_hdr(vxh, vs, md);
+	if (vxflags & VXLAN_F_GBP)
+		vxlan_build_gbp_hdr(vxh, vxflags, md);
 
 	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	return udp_tunnel_xmit_skb(rt, skb, src, dst, tos,
 				   ttl, df, src_port, dst_port, xnet,
-				   vs->sock->sk->sk_no_check_tx);
+				   !(vxflags & VXLAN_F_UDP_CSUM));
 }
 EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
 
@@ -1977,10 +1976,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		md.vni = htonl(vni << 8);
 		md.gbp = skb->mark;
 
-		err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
-				     fl4.saddr, dst->sin.sin_addr.s_addr,
-				     tos, ttl, df, src_port, dst_port, &md,
-				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
+		err = vxlan_xmit_skb(rt, skb, fl4.saddr,
+				     dst->sin.sin_addr.s_addr, tos, ttl, df,
+				     src_port, dst_port, &md,
+				     !net_eq(vxlan->net, dev_net(vxlan->dev)),
+				     vxlan->flags);
 		if (err < 0) {
 			/* skb is already freed. */
 			skb = NULL;
@@ -2036,10 +2036,10 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		md.vni = htonl(vni << 8);
 		md.gbp = skb->mark;
 
-		err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
-				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
-				      src_port, dst_port, &md,
-				      !net_eq(vxlan->net, dev_net(vxlan->dev)));
+		err = vxlan6_xmit_skb(ndst, skb, dev, &fl6.saddr, &fl6.daddr,
+				      0, ttl, src_port, dst_port, &md,
+				      !net_eq(vxlan->net, dev_net(vxlan->dev)),
+				      vxlan->flags);
 #endif
 	}
 
@@ -2511,15 +2511,11 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 
 	if (ipv6) {
 		udp_conf.family = AF_INET6;
-		udp_conf.use_udp6_tx_checksums =
-		    !(flags & VXLAN_F_UDP_ZERO_CSUM6_TX);
 		udp_conf.use_udp6_rx_checksums =
 		    !(flags & VXLAN_F_UDP_ZERO_CSUM6_RX);
 	} else {
 		udp_conf.family = AF_INET;
 		udp_conf.local_ip.s_addr = INADDR_ANY;
-		udp_conf.use_udp_checksums =
-		    !!(flags & VXLAN_F_UDP_CSUM);
 	}
 
 	udp_conf.local_udp_port = port;
@@ -2563,7 +2559,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	atomic_set(&vs->refcnt, 1);
 	vs->rcv = rcv;
 	vs->data = data;
-	vs->flags = flags;
+	vs->flags = (flags & VXLAN_F_RCV_FLAGS);
 
 	/* Initialize the vxlan udp offloads structure */
 	vs->udp_offloads.port = port;
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 7be8c34..2927d62 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -129,8 +129,12 @@ struct vxlan_sock {
 #define VXLAN_F_REMCSUM_RX		0x400
 #define VXLAN_F_GBP			0x800
 
-/* These flags must match in order for a socket to be shareable */
-#define VXLAN_F_UNSHAREABLE		VXLAN_F_GBP
+/* Flags that are used in the receive path. These flags must match in
+ * order for a socket to be shareable
+ */
+#define VXLAN_F_RCV_FLAGS		(VXLAN_F_GBP |			\
+					 VXLAN_F_UDP_ZERO_CSUM6_RX |	\
+					 VXLAN_F_REMCSUM_RX)
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 				  vxlan_rcv_t *rcv, void *data,
@@ -138,11 +142,10 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 
 void vxlan_sock_release(struct vxlan_sock *vs);
 
-int vxlan_xmit_skb(struct vxlan_sock *vs,
-		   struct rtable *rt, struct sk_buff *skb,
+int vxlan_xmit_skb(struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
 		   __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
-		   bool xnet);
+		   bool xnet, u32 vxflags);
 
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 						     netdev_features_t features)
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 8a2d54c..3cc983b 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -252,12 +252,10 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
 	md.gbp = vxlan_ext_gbp(skb);
 
-	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
-			     fl.saddr, tun_key->ipv4_dst,
+	err = vxlan_xmit_skb(rt, skb, fl.saddr, tun_key->ipv4_dst,
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
 			     src_port, dst_port,
-			     &md,
-			     false);
+			     &md, false, vxlan_port->exts);
 	if (err < 0)
 		ip_rt_put(rt);
 	return err;
-- 
2.2.0.rc0.207.ga3a616c

* [PATCH net-next 1/2] udp: Do not require sock in udp_tunnel_xmit_skb
From: Tom Herbert @ 2015-01-17 18:18 UTC (permalink / raw)
  To: davem, tgraf, netdev
In-Reply-To: <1421518700-22460-1-git-send-email-therbert@google.com>

The UDP tunnel transmit functions udp_tunnel_xmit_skb and
udp_tunnel6_xmit_skb include a socket argument. The socket being
passed to the functions (from VXLAN) is a UDP socket created for the
receive side. The only thing that the socket is used for in the
transmit functions is to get the setting for checksum (enabled or
zero). This patch removes the argument and adds a nocheck argument
for checksum setting. This eliminates the unnecessary dependency
on a UDP socket for UDP tunnel transmit.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/vxlan.c       | 10 ++++++----
 include/net/udp_tunnel.h  | 16 ++++++++--------
 net/ipv4/udp_tunnel.c     | 12 ++++++------
 net/ipv6/ip6_udp_tunnel.c | 12 ++++++------
 4 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6b6b456..4fb4205 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1763,8 +1763,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 
 	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
-	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
-			     ttl, src_port, dst_port);
+	udp_tunnel6_xmit_skb(dst, skb, dev, saddr, daddr, prio,
+			     ttl, src_port, dst_port,
+			     udp_get_no_check6_tx(vs->sock->sk));
 	return 0;
 err:
 	dst_release(dst);
@@ -1842,8 +1843,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 
 	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
-	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
-				   ttl, df, src_port, dst_port, xnet);
+	return udp_tunnel_xmit_skb(rt, skb, src, dst, tos,
+				   ttl, df, src_port, dst_port, xnet,
+				   vs->sock->sk->sk_no_check_tx);
 }
 EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
 
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index 2a50a70..1a20d33 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -77,17 +77,17 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
 			   struct udp_tunnel_sock_cfg *sock_cfg);
 
 /* Transmit the skb using UDP encapsulation. */
-int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt,
-			struct sk_buff *skb, __be32 src, __be32 dst,
-			__u8 tos, __u8 ttl, __be16 df, __be16 src_port,
-			__be16 dst_port, bool xnet);
+int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb,
+			__be32 src, __be32 dst, __u8 tos, __u8 ttl,
+			__be16 df, __be16 src_port, __be16 dst_port,
+			bool xnet, bool nocheck);
 
 #if IS_ENABLED(CONFIG_IPV6)
-int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst,
-			 struct sk_buff *skb, struct net_device *dev,
-			 struct in6_addr *saddr, struct in6_addr *daddr,
+int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb,
+			 struct net_device *dev, struct in6_addr *saddr,
+			 struct in6_addr *daddr,
 			 __u8 prio, __u8 ttl, __be16 src_port,
-			 __be16 dst_port);
+			 __be16 dst_port, bool nocheck);
 #endif
 
 void udp_tunnel_sock_release(struct socket *sock);
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 9996e63..c83b354 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -75,10 +75,10 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
 }
 EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock);
 
-int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt,
-			struct sk_buff *skb, __be32 src, __be32 dst,
-			__u8 tos, __u8 ttl, __be16 df, __be16 src_port,
-			__be16 dst_port, bool xnet)
+int udp_tunnel_xmit_skb(struct rtable *rt, struct sk_buff *skb,
+			__be32 src, __be32 dst, __u8 tos, __u8 ttl,
+			__be16 df, __be16 src_port, __be16 dst_port,
+			bool xnet, bool nocheck)
 {
 	struct udphdr *uh;
 
@@ -90,9 +90,9 @@ int udp_tunnel_xmit_skb(struct socket *sock, struct rtable *rt,
 	uh->source = src_port;
 	uh->len = htons(skb->len);
 
-	udp_set_csum(sock->sk->sk_no_check_tx, skb, src, dst, skb->len);
+	udp_set_csum(nocheck, skb, src, dst, skb->len);
 
-	return iptunnel_xmit(sock->sk, rt, skb, src, dst, IPPROTO_UDP,
+	return iptunnel_xmit(skb->sk, rt, skb, src, dst, IPPROTO_UDP,
 			     tos, ttl, df, xnet);
 }
 EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb);
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index 8db6c98..32d9b26 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -62,14 +62,14 @@ error:
 }
 EXPORT_SYMBOL_GPL(udp_sock_create6);
 
-int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst,
-			 struct sk_buff *skb, struct net_device *dev,
-			 struct in6_addr *saddr, struct in6_addr *daddr,
-			 __u8 prio, __u8 ttl, __be16 src_port, __be16 dst_port)
+int udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sk_buff *skb,
+			 struct net_device *dev, struct in6_addr *saddr,
+			 struct in6_addr *daddr,
+			 __u8 prio, __u8 ttl, __be16 src_port,
+			 __be16 dst_port, bool nocheck)
 {
 	struct udphdr *uh;
 	struct ipv6hdr *ip6h;
-	struct sock *sk = sock->sk;
 
 	__skb_push(skb, sizeof(*uh));
 	skb_reset_transport_header(skb);
@@ -85,7 +85,7 @@ int udp_tunnel6_xmit_skb(struct socket *sock, struct dst_entry *dst,
 			    | IPSKB_REROUTED);
 	skb_dst_set(skb, dst);
 
-	udp6_set_csum(udp_get_no_check6_tx(sk), skb, saddr, daddr, skb->len);
+	udp6_set_csum(nocheck, skb, saddr, daddr, skb->len);
 
 	__skb_push(skb, sizeof(*ip6h));
 	skb_reset_network_header(skb);
-- 
2.2.0.rc0.207.ga3a616c

* [PATCH net-next 0/2] vxlan: Don't use UDP socket for transmit
From: Tom Herbert @ 2015-01-17 18:18 UTC (permalink / raw)
  To: davem, tgraf, netdev

The UDP socket is not pertinent to transmit for UDP tunnels; checksum
enablement can be done without a socket. This patch set eliminates
references to a socket in the udp_tunnel_xmit functions and in VXLAN
transmit.

Also, make the GBP, RCO, and CSUM6_RX flags visible to the receive socket
and only match these for a shareable socket.

Tom Herbert (2):
  udp: Do not require sock in udp_tunnel_xmit_skb
  vxlan: Eliminate dependency on UDP socket in transmit path

 drivers/net/vxlan.c           | 66 +++++++++++++++++++++----------------------
 include/net/udp_tunnel.h      | 16 +++++------
 include/net/vxlan.h           | 13 +++++----
 net/ipv4/udp_tunnel.c         | 12 ++++----
 net/ipv6/ip6_udp_tunnel.c     | 12 ++++----
 net/openvswitch/vport-vxlan.c |  6 ++--
 6 files changed, 62 insertions(+), 63 deletions(-)

-- 
2.2.0.rc0.207.ga3a616c

* Re: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space
From: John Fastabend @ 2015-01-17 17:35 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, danny.zhou, nhorman, dborkman, john.ronciak, hannes,
	brouer
In-Reply-To: <20150114.153509.1264618607573705890.davem@davemloft.net>

On 01/14/2015 12:35 PM, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Mon, 12 Jan 2015 20:35:11 -0800
>
>> +		if ((region.direction != DMA_BIDIRECTIONAL) &&
>> +		    (region.direction != DMA_TO_DEVICE) &&
>> +		    (region.direction != DMA_FROM_DEVICE))
>> +			return -EFAULT;
>   ...
>> +		if ((umem->nmap == npages) &&
>> +		    (0 != dma_map_sg(dev->dev.parent, umem->sglist,
>> +				     umem->nmap, region.direction))) {
>> +			region.iova = sg_dma_address(umem->sglist) + offset;
>
> I am having trouble seeing how this can work.
>
> dma_map_{single,sg}() mappings need synchronization after a DMA
> transfer takes place.
>
> For example if the DMA occurs to the device, then that region can
> be cached in the PCI controller's internal caches and thus future
> cpu writes into that memory region will not be seen, until a
> dma_sync_*() is invoked.
>
> That isn't going to happen when the device transmit queue is
> being completely managed in userspace.
>
> And this takes us back to the issue of protection, I don't think
> it is addressed properly yet.
>
> CAP_NET_ADMIN privileges do not mean "can crap all over memory"
> yet with this feature that can still happen.
>
> If we are dealing with a device which cannot provide strict protection
> to only the process's locked local pages, you have to do something
> to implement that protection.
>
> And you have _exactly_ one option to do that, abstracting the page
> addresses and eating a system call to trigger the sends, so that you
> can read from the user's (fake) descriptors and write into the real
> descriptors (translating the DMA addresses along the way) and
> triggering the TX doorbell.

OK, I think this brings us back to some of the original designs/ideas
we were thinking about with Daniel/Neil. We are going to take a look
at this. At least on the RX side we can have the af_packet logic give
us a set of DMA addresses. I wonder if we can also make the busy
poll logic per queue and use it.
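
[Editor's sketch, not part of the posted patch.] The option David describes
above, reading the user's (fake) descriptors and writing real ones, could look
roughly like the following. This is a minimal user-space sketch under stated
assumptions: struct user_desc, struct hw_desc, struct region, translate_desc()
and SKETCH_PAGE_SIZE are all hypothetical names, and the real work of pinning
pages, taking the system call and ringing the TX doorbell is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* What user space writes: a page index into its registered region. */
struct user_desc {
	uint32_t page;		/* index into the pinned pages, not an address */
	uint32_t offset;	/* byte offset within that page */
	uint32_t len;		/* number of bytes to send */
};

/* What the device reads: a real DMA address. */
struct hw_desc {
	uint64_t dma_addr;
	uint32_t len;
};

/* Per-process table of DMA addresses for the pages the kernel pinned. */
struct region {
	const uint64_t *page_dma;
	uint32_t npages;
};

/*
 * Validate one user descriptor against the process's own region and, if it
 * is in bounds, fill in the real descriptor. User space never supplies or
 * sees a DMA address, so it cannot point the device at foreign memory.
 */
static int translate_desc(const struct region *r, const struct user_desc *u,
			  struct hw_desc *h)
{
	if (u->page >= r->npages)
		return -EFAULT;
	if (u->offset >= SKETCH_PAGE_SIZE || u->len == 0 ||
	    u->len > SKETCH_PAGE_SIZE - u->offset)
		return -EINVAL;

	h->dma_addr = r->page_dma[u->page] + u->offset;
	h->len = u->len;
	return 0;
}
```

The bounds checks keep every translated buffer inside a single pinned page;
a real implementation would also have to handle multi-page buffers and the
dma_sync_*() calls David mentions.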

>
> I am not going to consider seriously an implementation that says "yeah
> sometimes the user can crap onto other people's memory", this isn't
> MS-DOS, it's a system where proper memory protections are mandatory
> rather than optional.
>

More to sort out on our side. Thanks for looking at the patches.

.John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply
* Re: [PATCH] tipc: link: Remove unused function
From: Rickard Strandqvist @ 2015-01-17 17:18 UTC (permalink / raw)
  To: Chris Rorvick
  Cc: Jon Maloy, Allan Stephens, David S. Miller, Network Development,
	tipc-discussion, Linux Kernel Mailing List
In-Reply-To: <CAEUsAPaJcx-3uzH5e-wfYgicejdPt8R88hZ5X8K2gH4eXHVaVw@mail.gmail.com>

2015-01-17 17:55 GMT+01:00 Chris Rorvick <chris@rorvick.com>:
> On Sat, Jan 17, 2015 at 10:13 AM, Rickard Strandqvist
> <rickard_strandqvist@spectrumdigital.se> wrote:
>> Remove the function tipc_link_get_max_pkt() that is not used anywhere.
>
> This is already in the next tree:
>
> commit 54fef04ad05f15984082c225fe47ce6af8ea1c5c
> Author: Ying Xue <ying.xue@windriver.com>
> Date:   Fri Jan 9 15:27:03 2015 +0800
>
>     tipc: remove unused tipc_link_get_max_pkt routine
>
>     Signed-off-by: Ying Xue <ying.xue@windriver.com>
>     Tested-by: Tero Aho <Tero.Aho@coriant.com>
>     Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>


Hi

OK, sorry. I will update my linux-next clone.

Kind regards
Rickard Strandqvist

^ permalink raw reply
* Re: [PATCH] tipc: link: Remove unused function
From: Chris Rorvick @ 2015-01-17 16:55 UTC (permalink / raw)
  To: Rickard Strandqvist
  Cc: Jon Maloy, Allan Stephens, David S. Miller, netdev,
	tipc-discussion, linux-kernel
In-Reply-To: <1421511231-14482-1-git-send-email-rickard_strandqvist@spectrumdigital.se>

On Sat, Jan 17, 2015 at 10:13 AM, Rickard Strandqvist
<rickard_strandqvist@spectrumdigital.se> wrote:
> Remove the function tipc_link_get_max_pkt() that is not used anywhere.

This is already in the next tree:

commit 54fef04ad05f15984082c225fe47ce6af8ea1c5c
Author: Ying Xue <ying.xue@windriver.com>
Date:   Fri Jan 9 15:27:03 2015 +0800

    tipc: remove unused tipc_link_get_max_pkt routine

    Signed-off-by: Ying Xue <ying.xue@windriver.com>
    Tested-by: Tero Aho <Tero.Aho@coriant.com>
    Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply
* Re: Regression from "ipv4: Cache ip_error() routes even when not forwarding."
From: Francesco Ruggeri @ 2015-01-17 16:30 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Francesco Ruggeri, David Miller, netdev
In-Reply-To: <alpine.LFD.2.11.1501171023470.2088@ja.home.ssi.bg>

On Sat, Jan 17, 2015 at 12:25 AM, Julian Anastasov <ja@ssi.bg> wrote:
>
>         Hello,
>
> On Fri, 16 Jan 2015, Francesco Ruggeri wrote:
>
>> Commit 251da413("ipv4: Cache ip_error() routes even when not forwarding."),
>> later slightly modified by cd0f0b95("ipv4: distinguish EHOSTUNREACH from
>> the ENETUNREACH"), introduced a regression where an ip_error route is cached
>> when an ARP request is received on a non-forwarding non matching interface,
>> and it affects later legitimate packets for the same destination even if
>> coming over different interfaces.
>> Attached are two scripts that show the problem. The first one does basic
>> forwarding, and the second one does proxy arp.
>> In both cases a dummy interface is created for the sole purpose of receiving
>> an ARP request that results in the ip_error route to be cached. The offending
>> ARP request is generated by using a 'ping -c 1' (commented out in the scripts).
>> Verified in 3.16 build.
>
>         3.16? Just in case, can you check if this
> fix from 3.18 helps:

Thanks, I will.

Francesco

>
> commit fa19c2b050ab5254326f5fc07096dd3c6a8d5d58
> Author: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
> Date:   Thu Oct 30 10:09:53 2014 +0100
>
>     ipv4: Do not cache routing failures due to disabled forwarding.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply
* [PATCH] tipc: link: Remove unused function
From: Rickard Strandqvist @ 2015-01-17 16:13 UTC (permalink / raw)
  To: Jon Maloy, Allan Stephens
  Cc: Rickard Strandqvist, David S. Miller, netdev, tipc-discussion,
	linux-kernel

Remove the function tipc_link_get_max_pkt() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
---
 net/tipc/link.c |   27 ---------------------------
 net/tipc/link.h |    1 -
 2 files changed, 28 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 23bcc11..e92fcfb 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -2266,33 +2266,6 @@ struct sk_buff *tipc_link_cmd_show_stats(const void *req_tlv_area, int req_tlv_s
 	return buf;
 }
 
-/**
- * tipc_link_get_max_pkt - get maximum packet size to use when sending to destination
- * @dest: network address of destination node
- * @selector: used to select from set of active links
- *
- * If no active link can be found, uses default maximum packet size.
- */
-u32 tipc_link_get_max_pkt(u32 dest, u32 selector)
-{
-	struct tipc_node *n_ptr;
-	struct tipc_link *l_ptr;
-	u32 res = MAX_PKT_DEFAULT;
-
-	if (dest == tipc_own_addr)
-		return MAX_MSG_SIZE;
-
-	n_ptr = tipc_node_find(dest);
-	if (n_ptr) {
-		tipc_node_lock(n_ptr);
-		l_ptr = n_ptr->active_links[selector & 1];
-		if (l_ptr)
-			res = l_ptr->max_pkt;
-		tipc_node_unlock(n_ptr);
-	}
-	return res;
-}
-
 static void link_print(struct tipc_link *l_ptr, const char *str)
 {
 	struct tipc_bearer *b_ptr;
diff --git a/net/tipc/link.h b/net/tipc/link.h
index 55812e8..084bd4e 100644
--- a/net/tipc/link.h
+++ b/net/tipc/link.h
@@ -216,7 +216,6 @@ void tipc_link_reset_list(unsigned int bearer_id);
 int tipc_link_xmit_skb(struct sk_buff *skb, u32 dest, u32 selector);
 int tipc_link_xmit(struct sk_buff_head *list, u32 dest, u32 selector);
 int __tipc_link_xmit(struct tipc_link *link, struct sk_buff_head *list);
-u32 tipc_link_get_max_pkt(u32 dest, u32 selector);
 void tipc_link_bundle_rcv(struct sk_buff *buf);
 void tipc_link_proto_xmit(struct tipc_link *l_ptr, u32 msg_typ, int prob,
 			  u32 gap, u32 tolerance, u32 priority, u32 acked_mtu);
-- 
1.7.10.4

^ permalink raw reply related
* Re: [PATCH 7/9] rhashtable: Per bucket locks & deferred expansion/shrinking
From: Patrick McHardy @ 2015-01-17 11:56 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Thomas Graf, David Laight, davem@davemloft.net,
	netdev@vger.kernel.org, paulmck@linux.vnet.ibm.com,
	edumazet@google.com, john.r.fastabend@intel.com,
	josh@joshtriplett.org, netfilter-devel@vger.kernel.org
In-Reply-To: <20150117101309.GA19585@gondor.apana.org.au>

On 17.01, Herbert Xu wrote:
> On Sat, Jan 17, 2015 at 09:51:46AM +0000, Patrick McHardy wrote:
> > 
> > I agree, however at least in the case of nftables you can easily do the same thing by adding millions of rules.
> 
> I think that's a problem in itself.  If a single packet can kill
> the CPU through millions of rules, then namespaces would be a joke.
> There has to be a limit to the number of rules or the processing
> has to be deferred into thread context (thus subject to scheduler
> control) at some point.

I think that's a problem that's unrelated to netfilter. It's quite easy
to configure something that will make the network stack eat up all
the CPU, consider for instance:

bridge name	bridge id		STP enabled	interfaces
br0		8000.625dda62a3d4	no		veth0
							veth1

Now think of bridging, veth, TC actions, iptables, routing, all in
combination. Sure, single cases can be caught and it might be possible
to restrict them, but I don't believe that in the near term we will
be able to handle this properly.

And even if all loops etc are handled, what keeps the user from
creating a million veth devices and putting them into a long
chain?

This needs to be fixed on a different level in my opinion.

> > It doesn't make things worse.
> 
> So I don't think that's a valid justification for ignoring this
> hash table problem.

^ permalink raw reply
* Re: [PATCH 7/9] rhashtable: Per bucket locks & deferred expansion/shrinking
From: Herbert Xu @ 2015-01-17 10:13 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Thomas Graf, David Laight, davem@davemloft.net,
	netdev@vger.kernel.org, paulmck@linux.vnet.ibm.com,
	edumazet@google.com, john.r.fastabend@intel.com,
	josh@joshtriplett.org, netfilter-devel@vger.kernel.org
In-Reply-To: <7AE5EEE0-60C7-43B4-848A-2D952D3A6DEF@trash.net>

On Sat, Jan 17, 2015 at 09:51:46AM +0000, Patrick McHardy wrote:
> 
> I agree, however at least in the case of nftables you can easily do the same thing by adding millions of rules.

I think that's a problem in itself.  If a single packet can kill
the CPU through millions of rules, then namespaces would be a joke.
There has to be a limit to the number of rules or the processing
has to be deferred into thread context (thus subject to scheduler
control) at some point.

> It doesn't make things worse.

So I don't think that's a valid justification for ignoring this
hash table problem.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply
* Re: [PATCH 7/9] rhashtable: Per bucket locks & deferred expansion/shrinking
From: Patrick McHardy @ 2015-01-17  9:51 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Thomas Graf, David Laight, davem@davemloft.net,
	netdev@vger.kernel.org, paulmck@linux.vnet.ibm.com,
	edumazet@google.com, john.r.fastabend@intel.com,
	josh@joshtriplett.org, netfilter-devel@vger.kernel.org
In-Reply-To: <20150117093228.GA19137@gondor.apana.org.au>

On 17 January 2015 09:32:28 GMT+00:00, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>On Sat, Jan 17, 2015 at 08:06:21AM +0000, Patrick McHardy wrote:
>> 
>> Resizing might also fail because of memory allocation problems, but
>> I'd argue that its better to continue with a non-optimal sized table
>> and retry later than to completely fail, at least unless the API
>> user has explicitly requested this behaviour.
>> 
>> As for the element counter, yeah, it should prevent overflow. In that
>> case I agree that failing insertion is the easiest solution.
>
>Well you have to consider the security aspect.  These days root-
>only is no longer an acceptable excuse given things like namespaces.
>
>If you don't fail the insertions while the expansion is ongoing,
>and assuming a dump can postpone expansions, then you can essentially
>insert entries into the hash table at will which is an easy DoS
>attack.

I agree, however at least in the case of nftables you can easily do the same thing by adding millions of rules.

It doesn't make things worse.

>Note that you don't have to fail the insertion right away.  I
>think waiting until you reach max * 2 would be fine.




^ permalink raw reply
* Re: [PATCH 7/9] rhashtable: Per bucket locks & deferred expansion/shrinking
From: Herbert Xu @ 2015-01-17  9:32 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Thomas Graf, David Laight, davem@davemloft.net,
	netdev@vger.kernel.org, paulmck@linux.vnet.ibm.com,
	edumazet@google.com, john.r.fastabend@intel.com,
	josh@joshtriplett.org, netfilter-devel@vger.kernel.org
In-Reply-To: <20150117080621.GB3968@acer.localdomain>

On Sat, Jan 17, 2015 at 08:06:21AM +0000, Patrick McHardy wrote:
> 
> Resizing might also fail because of memory allocation problems, but
> I'd argue that its better to continue with a non-optimal sized table
> and retry later than to completely fail, at least unless the API
> user has explicitly requested this behaviour.
> 
> As for the element counter, yeah, it should prevent overflow. In that
> case I agree that failing insertion is the easiest solution.

Well you have to consider the security aspect.  These days root-
only is no longer an acceptable excuse given things like namespaces.

If you don't fail the insertions while the expansion is ongoing,
and assuming a dump can postpone expansions, then you can essentially
insert entries into the hash table at will which is an easy DoS
attack.

Note that you don't have to fail the insertion right away.  I
think waiting until you reach max * 2 would be fine.
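
[Editor's sketch, not code from the patch series.] One way to read the
"max * 2" suggestion: keep admitting insertions while an expansion is
pending, but refuse them once the element count reaches twice the intended
maximum. The struct and function names below are hypothetical and the
locking that the real rhashtable needs is omitted; "max_elems" is an
assumption about what "max" refers to.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified table state; not the rhashtable layout. */
struct table_sketch {
	size_t nelems;		/* elements currently stored */
	size_t max_elems;	/* count at which an expansion is wanted */
	int resize_pending;	/* nonzero once an expansion was requested */
};

/* Returns 0 if the insertion is admitted, -E2BIG if it is refused. */
static int table_admit_insert(struct table_sketch *t)
{
	/* Over the target: request an expansion but do not wait for it. */
	if (t->nelems >= t->max_elems)
		t->resize_pending = 1;

	/* Hard ceiling: an expansion that never runs cannot be abused. */
	if (t->nelems >= 2 * t->max_elems)
		return -E2BIG;

	t->nelems++;
	return 0;
}
```

With max_elems set to 4, the first eight insertions are admitted and the
ninth is refused, which bounds how far a postponed expansion can be
exploited.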

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply
* Re: NETDEV WATCHDOG:  internal(r8152): transmit queue 0 timed out
From: poma @ 2015-01-17  8:56 UTC (permalink / raw)
  To: Community support for Fedora users; +Cc: netdev, linux-usb
In-Reply-To: <m9c8hl$2sd$1@ger.gmane.org>

On 17.01.2015 00:57, sean darcy wrote:
> On 01/16/2015 07:09 AM, poma wrote:
>> On 16.01.2015 10:37, Hayes Wang wrote:
>>>   poma [mailto:pomidorabelisima@gmail.com]
>>>> Sent: Friday, January 16, 2015 4:25 PM
>>> [...]
>>>>> This looks like a USB problem. Is there a way to get usb (or
>>>>> NetworkManager) to reinitialize the driver when this happens?
>>>>
>>>> I would ask these people for advice, therefore.
>>>
>>> Our hw engineers need to analyse the behavior of the device.
>>> However, I don't think you have such instrument to provide
>>> the required information. If we don't know the reason, we
>>> couldn't give you the proper solution. Besides, your solution
>>> would work if and only if reloading the driver is helpful.
>>>
>>> The issue have to debug from the hardware, and I have no idea
>>> about what the software could do before analysing the hw. Maybe
>>> you could try the following driver first to check if it is useful.
>>>
>>> http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=2&PNid=13&PFid=56&Level=5&Conn=4&DownTypeID=3&GetDown=false
>>>
>>> Best Regards,
>>> Hayes
>>>
>>
>> Thanks for your response, Mr. Hayes.
>>
>> Mr. Sean, please download and check if "timeout" is still present with built RTL8153 module from REALTEK site, as Mr. Hayes proposed.
>> http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=2&PNid=13&PFid=56&Level=5&Conn=4&DownTypeID=3&GetDown=false#2
>> r8152.53-2.03.0.tar.bz2
>>
>> Procedure - should be equal for both, Fedora 21 & 20:
>>
>> $ uname -r
>> 3.17.8-300.fc21.x86_64
>>
>> $ su -c 'yum install kernel-devel'
>>
>> $ tar xf r8152.53-2.03.0.tar.bz2
>> $ cd r8152-2.03.0/
>> $ make
>> $ su
>>
>> # cp 50-usb-realtek-net.rules /etc/udev/rules.d/
>> # udevadm trigger --action=add
>>
>> # modprobe -rv r8152
>> # cp r8152.ko /lib/modules/$(uname -r)/updates/
>> # depmod
>> # modprobe -v r8152
>>
>>
>> poma
>>
> OK. Did all that. Now to see if I get the same problem over the next 
> couple of weeks.
> 
> I'd never heard about the updates subfolder in modules. Very slick.
> 
> But when I update the kernel, I get to do this again correct? How will I 

$ cd r8152-2.03.0/
$ make clean
$ make
$ su

# cp r8152.ko /lib/modules/$(uname -r)/updates/
# depmod
# modprobe -v r8152

is the part of the procedure that has to be repeated for a new (i.e. upgraded) kernel.


> know that this module has been incorporated in the running kernel. 
> modinfo doesn't give any version info.
> 

$ modinfo r8152 -n

will show the module considered for loading.


> BTW, I'm not sure what modprobe --dump-modversions is supposed to do, 
> but it doesn't:
> 
> #modprobe --dump-modversions r8152
> modprobe: FATAL: Module r8152 not found.
> # modprobe --dump-modversions r8152.ko
> modprobe: FATAL: Module r8152.ko not found.
> #lsmod | grep 8152
> r8152                  49646  0
> 

"--dump-modversions" will probably show the same error for any module.


> Thanks for all your help.
> 
> sean
> 

YW



^ permalink raw reply
* Re: [patch net-next 2/2] net: replace br_fdb_external_learn_* calls with switchdev notifier events
From: Jiri Pirko @ 2015-01-17  8:31 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, Jamal Hadi Salim, Scott Feldman,
	stephen@networkplumber.org, linus.luessing, Thomas Graf
In-Reply-To: <CAE4R7bAXvxFzuvuz5YPviTDsMLCYWnQxbkHkVJbjYq6+K574rw@mail.gmail.com>

Fri, Jan 16, 2015 at 08:32:34PM CET, sfeldma@gmail.com wrote:
>On Thu, Jan 15, 2015 at 2:49 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> This patch benefits from newly introduced switchdev notifier and uses it
>> to propagate fdb learn events from rocker driver to bridge. That avoids
>> direct function calls and possible use by other listeners (ovs).
>>
>> Suggested-by: Thomas Graf <tgraf@suug.ch>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>
>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>
>>  drivers/net/ethernet/rocker/rocker.c | 10 +++++--
>>  include/linux/if_bridge.h            | 18 -------------
>>  include/net/switchdev.h              | 11 ++++++++
>>  net/bridge/br.c                      | 52 +++++++++++++++++++++++++++++++++++-
>>  net/bridge/br_fdb.c                  | 38 +++-----------------------
>>  net/bridge/br_private.h              |  4 +++
>>  6 files changed, 78 insertions(+), 55 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>> index cad8cf9..964d719 100644
>> --- a/drivers/net/ethernet/rocker/rocker.c
>> +++ b/drivers/net/ethernet/rocker/rocker.c
>> @@ -3026,11 +3026,17 @@ static void rocker_port_fdb_learn_work(struct work_struct *work)
>>                 container_of(work, struct rocker_fdb_learn_work, work);
>>         bool removing = (lw->flags & ROCKER_OP_FLAG_REMOVE);
>>         bool learned = (lw->flags & ROCKER_OP_FLAG_LEARNED);
>> +       struct netdev_switch_notifier_fdb_info info;
>> +
>> +       info.addr = lw->addr;
>> +       info.vid = lw->vid;
>
>If you respin patch, use initializer to zero out other members, just
>to future proof it.

There are no other members :)
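
[Editor's sketch, not part of the patch.] For readers unfamiliar with the
suggestion being discussed: a designated initializer zero-initializes every
member that is not named, so a member added to the struct later would start
at zero without touching this code. The struct below is a hypothetical
stand-in with an invented third member, not the real
netdev_switch_notifier_fdb_info.

```c
#include <stdint.h>

/* Hypothetical stand-in for the notifier info struct. */
struct fdb_info_sketch {
	const unsigned char *addr;
	uint16_t vid;
	uint32_t future_flags;	/* invented member, "added later" */
};

/*
 * Only addr and vid are named; C guarantees that the remaining members of
 * an initialized aggregate are set to zero.
 */
static struct fdb_info_sketch make_info(const unsigned char *addr,
					uint16_t vid)
{
	struct fdb_info_sketch info = {
		.addr = addr,
		.vid = vid,
	};

	return info;
}
```

With member-by-member assignment, as in the hunk quoted above, a member
added later would hold whatever happened to be on the stack until someone
remembered to set it.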

^ permalink raw reply
* Re: [PATCH] net: rocker: Add basic netdev counters - v2
From: Jiri Pirko @ 2015-01-17  8:29 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, Scott Feldman
In-Reply-To: <1421443349-77718-1-git-send-email-dsahern@gmail.com>

Fri, Jan 16, 2015 at 10:22:29PM CET, dsahern@gmail.com wrote:
>Add packet and byte counters for RX and TX paths.
>
>$ ifconfig eth1
>eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>        inet6 fe80::5054:ff:fe12:3501  prefixlen 64  scopeid 0x20<link>
>        ether 52:54:00:12:35:01  txqueuelen 1000  (Ethernet)
>        RX packets 63  bytes 15813 (15.4 KiB)
>        RX errors 1  dropped 0  overruns 0  frame 0
>        TX packets 79  bytes 17991 (17.5 KiB)
>        TX errors 7  dropped 0 overruns 0  carrier 0  collisions 0
>
>Rx / Tx errors tested by injecting faults in qemu's hardware model for Rocker.
>
>v2:
>- moved counter locations to avoid potential use after free per Florian's comment
>
>Signed-off-by: David Ahern <dsahern@gmail.com>

Acked-by: Jiri Pirko <jiri@resnulli.us>

Thanks David.

^ permalink raw reply
* Re: Regression from "ipv4: Cache ip_error() routes even when not forwarding."
From: Julian Anastasov @ 2015-01-17  8:25 UTC (permalink / raw)
  To: Francesco Ruggeri; +Cc: fruggeri, davem, netdev
In-Reply-To: <20150117000755.CC87348A0C5@fruggeri-Arora18.sjc.aristanetworks.com>


	Hello,

On Fri, 16 Jan 2015, Francesco Ruggeri wrote:

> Commit 251da413("ipv4: Cache ip_error() routes even when not forwarding."),
> later slightly modified by cd0f0b95("ipv4: distinguish EHOSTUNREACH from
> the ENETUNREACH"), introduced a regression where an ip_error route is cached
> when an ARP request is received on a non-forwarding non matching interface,
> and it affects later legitimate packets for the same destination even if
> coming over different interfaces.
> Attached are two scripts that show the problem. The first one does basic
> forwarding, and the second one does proxy arp.
> In both cases a dummy interface is created for the sole purpose of receiving
> an ARP request that results in the ip_error route to be cached. The offending
> ARP request is generated by using a 'ping -c 1' (commented out in the scripts).
> Verified in 3.16 build.

	3.16? Just in case, can you check if this
fix from 3.18 helps:

commit fa19c2b050ab5254326f5fc07096dd3c6a8d5d58
Author: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Date:   Thu Oct 30 10:09:53 2014 +0100

    ipv4: Do not cache routing failures due to disabled forwarding.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply