Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/4] Add packet recirculation
From: Simon Horman @ 2013-04-09  7:50 UTC (permalink / raw)
  To: Jesse Gross
  Cc: dev@openvswitch.org, netdev, Ravi K, Isaku Yamahata, Ben Pfaff
In-Reply-To: <CAEP_g=9nj24hZcA-czzuz3xaweATxRxmibuFaqODTSoPmRmHsw@mail.gmail.com>

On Mon, Apr 08, 2013 at 06:46:29PM -0700, Jesse Gross wrote:
> On Sun, Apr 7, 2013 at 11:43 PM, Simon Horman <horms@verge.net.au> wrote:
> > diff --git a/datapath/actions.c b/datapath/actions.c
> > index e9634fe..7b0f022 100644
> > --- a/datapath/actions.c
> > +++ b/datapath/actions.c
> > @@ -617,6 +617,9 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
> >                 case OVS_ACTION_ATTR_SAMPLE:
> >                         err = sample(dp, skb, a);
> >                         break;
> > +
> > +               case OVS_ACTION_ATTR_RECIRCULATE:
> > +                       return 1;
> 
> I think that if we've had a previous output action with the port
> stored in prev_port then this will cause the packet to not actually be
> output.

I'm not so sure.

I see something like this occurring:

1. Iteration of for loop for output action

   switch (nla_type(a)) {
   case OVS_ACTION_ATTR_OUTPUT:
	prev_port = nla_get_u32(a);
	break;
	...
   }

2. Iteration of of for loop for next action, lets say its is recirculate

   i. Output packet

   if (prev_port != -1) {
	do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
	prev_port = -1;
   }

   ii. Return due to recirculate
   switch (nla_type(a)) {
   ...
   case OVS_ACTION_ATTR_RECIRCULATE:
           return 1;
   }


Am I missing something?

> > diff --git a/datapath/datapath.c b/datapath/datapath.c
> > index e8be795..ab39dd7 100644
> > --- a/datapath/datapath.c
> > +++ b/datapath/datapath.c
> >  void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
> [...]
> > +               if (IS_ERR_OR_NULL(skb)) {
> > +                       break;
> > +               } else if (unlikely(!limit--)) {
> 
> Should this be a predecrement?

I will make it so.

> > +                       kfree_skb(skb);
> 
> Should we log some kind of rate limited warning here?

Sure.

> > +                       return;
> 
> In the first case we use break to exit the loop and here we use
> return.  Both should have the same effect so it might be nice to make
> them the same.
> 
> > @@ -901,6 +913,9 @@ static int validate_and_copy_actions__(const struct nlattr *attr,
> >                         skip_copy = true;
> >                         break;
> >
> > +               case OVS_ACTION_ATTR_RECIRCULATE:
> > +                       break;
> 
> I think we might want to jump out the loop here to better model how
> the actions are actually executed.

Sure, perhaps something like this?

diff --git a/datapath/datapath.c b/datapath/datapath.c
index ab39dd7..721a52c 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -914,7 +914,7 @@ static int validate_and_copy_actions__(const struct nlattr *attr,
 			break;
 
 		case OVS_ACTION_ATTR_RECIRCULATE:
-			break;
+			goto out;
 
 		default:
 			return -EINVAL;
@@ -926,6 +926,7 @@ static int validate_and_copy_actions__(const struct nlattr *attr,
 		}
 	}
 
+out:
 	if (rem > 0)
 		return -EINVAL;
 

> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e4a2f75..31255f6 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> >  dp_netdev_port_input(struct dp_netdev *dp, struct dp_netdev_port *port,
> >                       struct ofpbuf *packet)
> [...]
> > +        } else {
> > +            dp->n_missed++;
> > +            dp_netdev_output_userspace(dp, packet, DPIF_UC_MISS, &key, NULL);
> > +            recirculate = false;
> > +        }
> > +    } while (recirculate && limit--);
> 
> I have the same question about predecrement here.

I will change this one too.

> > @@ -1163,6 +1190,7 @@ dp_netdev_sample(struct dp_netdev *dp,
> >      const struct nlattr *subactions = NULL;
> >      const struct nlattr *a;
> >      size_t left;
> > +    uint32_t skb_mark;
> 
> I don't think it's right to have a new (and uninitialized) copy of
> skb_mark here.  We should have the same one all the way through, like
> we do in the kernel.

Sure. I will pass it as an argument to dp_netdev_sample()

> > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> > index 47830c1..5129da1 100644
> > --- a/ofproto/ofproto-dpif.c
> > +++ b/ofproto/ofproto-dpif.c
> 
> I'm still working on more detailed comments for this.  However, I'm
> concerned about whether the behavior for revalidation and stats is
> correct.

I am a little concerned about that too.
Perhaps Ben could look over it?

^ permalink raw reply related

* Re: [PATCH 1/3] if.h: add IFF_BRIDGE_RESTRICTED flag
From: Antonio Quartulli @ 2013-04-09  7:56 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David S. Miller, bridge@lists.linux-foundation.org,
	netdev@vger.kernel.org
In-Reply-To: <CAOaVG17nHuoRsf2Kss6LnpgC-Aojcso_zLYyBa18Vn9CGGY+GA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3152 bytes --]

On Mon, Apr 08, 2013 at 11:58:48 -0700, Stephen Hemminger wrote:
> The standard way to do this is to use netfilter. Considering the
> additional device flags and skb flag changes, I am not sure that your
> method is better.

To make it a bit more clear:

1) the skb flag will be used on the "receiving end-point" by batman-adv to mark
received packets and so instruct the bridge to do not forward them to restricted
interfaces.

2) the IFF_ flag is used by batman-adv on the "sending side" to determine
whether a packet has been originated by a restricted interface and so instruct
the remote endpoint to mark the skb when received.

3) to make the bridge code general enough, I decided to let it mark packets
coming from restricted interfaces as well so that it can also apply the policy
at 1) locally, without any further setting. The logic described in 1) is
therefore applied by the bridge even for local packets (not passing through
batman-adv)



Point 3) is the only one where netfilter might help. But using two mechanism to
achieve one goal looked not sane to me and therefore I decided to to do it this
way. And actually the code allowing point 3 is only:

+       skb->bridge_restricted = !!(skb->dev->flags & IFF_BRIDGE_RESTRICTED);


I hope this summary did not create further confusion :)

Thanks,

> 
> On Mon, Apr 8, 2013 at 10:41 AM, Antonio Quartulli
> <antonio@open-mesh.com> wrote:
> > This new flag tells whether a network device has to be
> > considered as restricted in the new bridge forwarding logic.
> >
> > Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
> > ---
> >  include/uapi/linux/if.h | 1 +
> >  net/core/dev.c          | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
> > index 1ec407b..5c3a9bd 100644
> > --- a/include/uapi/linux/if.h
> > +++ b/include/uapi/linux/if.h
> > @@ -83,6 +83,7 @@
> >  #define IFF_SUPP_NOFCS 0x80000         /* device supports sending custom FCS */
> >  #define IFF_LIVE_ADDR_CHANGE 0x100000  /* device supports hardware address
> >                                          * change when it's running */
> > +#define IFF_BRIDGE_RESTRICTED 0x200000 /* device is bridge-restricted */
> >
> >
> >  #define IF_GET_IFACE   0x0001          /* for querying only */
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 3655ff9..49eafc8 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4627,7 +4627,7 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
> >
> >         dev->flags = (flags & (IFF_DEBUG | IFF_NOTRAILERS | IFF_NOARP |
> >                                IFF_DYNAMIC | IFF_MULTICAST | IFF_PORTSEL |
> > -                              IFF_AUTOMEDIA)) |
> > +                              IFF_AUTOMEDIA | IFF_BRIDGE_RESTRICTED)) |
> >                      (dev->flags & (IFF_UP | IFF_VOLATILE | IFF_PROMISC |
> >                                     IFF_ALLMULTI));
> >
> > --
> > 1.8.1.5
> >

-- 
Antonio Quartulli

..each of us alone is worth nothing..
Ernesto "Che" Guevara

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: be2net: GRO for non-inet protocols
From: Erik Hugne @ 2013-04-09  7:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: sathya.perla, subbu.seetharaman, ajit.khaparde, netdev
In-Reply-To: <1365436285.3887.16.camel@edumazet-glaptop>

On Mon, Apr 08, 2013 at 08:51:25AM -0700, Eric Dumazet wrote:
> On Mon, 2013-04-08 at 17:24 +0200, Erik Hugne wrote:
> > On Mon, Apr 08, 2013 at 08:40:10AM +0200, Erik Hugne wrote:
> > > Thanks Eric, it works as expected after applying this.
> > 
> > So, on to the next problem, now i'm getting corrupted packets
> > from the driver instead. Would be great to get some comments
> > from the Emulex guys regarding this.
> > 
> > Attaching a printk trace where i log the mac header and packet data of 
> > all 0x88CA (TIPC) packets in the gro_receive routine that have an 
> > erroneous TIPC header. This happens immediately when i register 
> > myself with the device.
> > 
> 
> 
> You could check using a GRE tunnel, that the GRO support I added in
> recent kernel is working or not. (including the be2net patch I sent)

I haven't found any issues with the GRO support itself. It's only on
the system with Emulex NIC's that i have problems.
Unfortunately, I'm stuck to kernel 3.0.61 on these machines, newer kernels
won't boot due to an unresolved driver issue with the hw raid controller :/

//E

^ permalink raw reply

* pull-request: can 2013-04-09
From: Marc Kleine-Budde @ 2013-04-09  8:13 UTC (permalink / raw)
  To: netdev; +Cc: linux-can, davem

Hello David,

here's a fix for the v3.9 release cycle, if not too late:

Wei Yongjun contributes a patch for the can-gw protocoll. The patch fixes the
memory allocated with kmem_cache_alloc(), is now freed using kmem_cache_free(),
not kfree().

regards,
Marc

---

The following changes since commit c802d759623acbd6e1ee9fbdabae89159a513913:

  netrom: fix invalid use of sizeof in nr_recvmsg() (2013-04-08 22:49:23 -0400)

are available in the git repository at:

  git://gitorious.org/linux-can/linux-can.git fixes-for-3.9

for you to fetch changes up to 3480a2125923e4b7a56d79efc76743089bf273fc:

  can: gw: use kmem_cache_free() instead of kfree() (2013-04-09 09:58:44 +0200)

----------------------------------------------------------------
Wei Yongjun (1):
      can: gw: use kmem_cache_free() instead of kfree()

 net/can/gw.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)



^ permalink raw reply

* [PATCH] can: gw: use kmem_cache_free() instead of kfree()
From: Marc Kleine-Budde @ 2013-04-09  8:13 UTC (permalink / raw)
  To: netdev; +Cc: linux-can, davem, Wei Yongjun, linux-stable, Marc Kleine-Budde
In-Reply-To: <1365495215-12081-1-git-send-email-mkl@pengutronix.de>

From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Cc: linux-stable <stable@vger.kernel.org> # >= v3.2
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 net/can/gw.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/can/gw.c b/net/can/gw.c
index 2d117dc..117814a 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -466,7 +466,7 @@ static int cgw_notifier(struct notifier_block *nb,
 			if (gwj->src.dev == dev || gwj->dst.dev == dev) {
 				hlist_del(&gwj->list);
 				cgw_unregister_filter(gwj);
-				kfree(gwj);
+				kmem_cache_free(cgw_cache, gwj);
 			}
 		}
 	}
@@ -864,7 +864,7 @@ static void cgw_remove_all_jobs(void)
 	hlist_for_each_entry_safe(gwj, nx, &cgw_list, list) {
 		hlist_del(&gwj->list);
 		cgw_unregister_filter(gwj);
-		kfree(gwj);
+		kmem_cache_free(cgw_cache, gwj);
 	}
 }
 
@@ -920,7 +920,7 @@ static int cgw_remove_job(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
 
 		hlist_del(&gwj->list);
 		cgw_unregister_filter(gwj);
-		kfree(gwj);
+		kmem_cache_free(cgw_cache, gwj);
 		err = 0;
 		break;
 	}
-- 
1.8.2.rc2

^ permalink raw reply related

* Re: [PATCH] can: gw: use kmem_cache_free() instead of kfree()
From: Marc Kleine-Budde @ 2013-04-09  8:20 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: socketcan, davem, yongjun_wei, linux-can, netdev
In-Reply-To: <CAPgLHd9eAfwJMJ9P5RqOKNP5kq87y61V7CTw2J--Ld8uHT0RrQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]

On 04/09/2013 08:16 AM, Wei Yongjun wrote:
> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> 
> memory allocated by kmem_cache_alloc() should be freed using
> kmem_cache_free(), not kfree().
> 
> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Thanks for catching this. Applied to can/master.
I've send a pull request to David, along with Oliver's Acked-by and
stable on Cc.

Thanks,
Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Modifying the exponential backoff on new connection SYN packets
From: Ed W @ 2013-04-09  9:06 UTC (permalink / raw)
  To: Linux Networking Developer Mailing List

Hi, I have an unusual situation in that I would like to cap the 
retransmit frequency on the initial SYN packets at some fairly short 
time interval, eg a max of 2-4 seconds, rather than the usual 
exponentially increasing interval.  I could use some help figuring out 
the exact point in the kernel to make such a change please?

The situation is that I am building a firewall which will be used with 
expensive satellite links (think $10-100/MB range). Some of the links 
are dialup links which take 20-40 seconds to bring up, and then we have 
PPP drop the link after 10 seconds of inactivity. However, with the 
default exponential backoff on new connections we are generally 
retransmitting with a 16sec or 32 sec interval by the time the dialup 
link is connected, the timout for inactivity kicks in and drops the link 
before the retransmit...

I believe the exponential backoff is intended to prevent amplification 
attacks? In this particular case we are accounting for traffic per user 
and the internet costs are extremely substantial, so I think it's not a 
problem

Could someone please help figure out the appropriate place to tweak the 
exponential backoff? Note this is not retransmit of in flight data, just 
the backoff for the initial syn (which doesn't seem to be configurable 
in user space?)

Note, we have an application proxy here, but I can't see a sensible way 
to fake it in user space without a lot of extra coding - any suggestions?

Thanks

Ed W

^ permalink raw reply

* [Patch net-next] vxlan: revert "vxlan: Bypass encapsulation if the destination is local"
From: Cong Wang @ 2013-04-09  9:57 UTC (permalink / raw)
  To: netdev; +Cc: Sridhar Samudrala, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

This reverts commit 9dcc71e1fdbb7aa10d92a3d35e8a201adc84abd0.
It apparently breaks my vxlan tests between different namespaces.

Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c |   59 +++++++++++++-------------------------------------
 1 files changed, 16 insertions(+), 43 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9a64715..62a4438 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -912,36 +912,6 @@ static int handle_offloads(struct sk_buff *skb)
 	return 0;
 }
 
-/* Bypass encapsulation if the destination is local */
-static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
-			       struct vxlan_dev *dst_vxlan)
-{
-	struct pcpu_tstats *tx_stats = this_cpu_ptr(src_vxlan->dev->tstats);
-	struct pcpu_tstats *rx_stats = this_cpu_ptr(dst_vxlan->dev->tstats);
-
-	skb->pkt_type = PACKET_HOST;
-	skb->encapsulation = 0;
-	skb->dev = dst_vxlan->dev;
-	__skb_pull(skb, skb_network_offset(skb));
-
-	if (dst_vxlan->flags & VXLAN_F_LEARN)
-		vxlan_snoop(skb->dev, INADDR_LOOPBACK, eth_hdr(skb)->h_source);
-
-	u64_stats_update_begin(&tx_stats->syncp);
-	tx_stats->tx_packets++;
-	tx_stats->tx_bytes += skb->len;
-	u64_stats_update_end(&tx_stats->syncp);
-
-	if (netif_rx(skb) == NET_RX_SUCCESS) {
-		u64_stats_update_begin(&rx_stats->syncp);
-		rx_stats->rx_packets++;
-		rx_stats->rx_bytes += skb->len;
-		u64_stats_update_end(&rx_stats->syncp);
-	} else {
-		skb->dev->stats.rx_dropped++;
-	}
-}
-
 static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				  struct vxlan_rdst *rdst, bool did_rsc)
 {
@@ -952,6 +922,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	struct vxlanhdr *vxh;
 	struct udphdr *uh;
 	struct flowi4 fl4;
+	unsigned int pkt_len = skb->len;
 	__be32 dst;
 	__u16 src_port, dst_port;
         u32 vni;
@@ -964,8 +935,22 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 	if (!dst) {
 		if (did_rsc) {
+			__skb_pull(skb, skb_network_offset(skb));
+			skb->ip_summed = CHECKSUM_NONE;
+			skb->pkt_type = PACKET_HOST;
+
 			/* short-circuited back to local bridge */
-			vxlan_encap_bypass(skb, vxlan, vxlan);
+			if (netif_rx(skb) == NET_RX_SUCCESS) {
+				struct pcpu_tstats *stats = this_cpu_ptr(dev->tstats);
+
+				u64_stats_update_begin(&stats->syncp);
+				stats->tx_packets++;
+				stats->tx_bytes += pkt_len;
+				u64_stats_update_end(&stats->syncp);
+			} else {
+				dev->stats.tx_errors++;
+				dev->stats.tx_aborted_errors++;
+			}
 			return NETDEV_TX_OK;
 		}
 		goto drop;
@@ -1012,18 +997,6 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		goto tx_error;
 	}
 
-	/* Bypass encapsulation if the destination is local */
-	if (rt->rt_flags & RTCF_LOCAL) {
-		struct vxlan_dev *dst_vxlan;
-
-		ip_rt_put(rt);
-		dst_vxlan = vxlan_find_vni(dev_net(dev), vni);
-		if (!dst_vxlan)
-			goto tx_error;
-		vxlan_encap_bypass(skb, vxlan, dst_vxlan);
-		return NETDEV_TX_OK;
-	}
-
 	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
 			      IPSKB_REROUTED);
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 0/3 net-next RFC] selftest: Introduce test abstraction for net
From: Alexandru Copot @ 2013-04-09 10:30 UTC (permalink / raw)
  To: netdev, davem; +Cc: willemb, dborkman, edumazet, Alexandru Copot, Daniel Baluta

This series adds a generic test abstraction that can make
writing testcases easier. A generic_test structure is
used to define a test and its methods.

The second patch updates the socket tests to use the
new framework, and the third patch creates new tests
for [set/get]sockopt with some IPV6_* options.

Signed-of by Alexandru Copot <alex.mihai.c@gmail.com>
Cc: Daniel Baluta <dbaluta@ixiacom.com>

Alexandru Copot (3):
  selftest: add abstractions for net selftests
  selftest: update socket tests to use the testing abstractions
  selftests: add socket options test with IPv6 testcases

 tools/testing/selftests/net/Makefile         |   7 +-
 tools/testing/selftests/net/run_netsocktests |  10 ++
 tools/testing/selftests/net/selftests.c      |  25 ++++
 tools/testing/selftests/net/selftests.h      |  42 +++++++
 tools/testing/selftests/net/socket.c         | 107 +++++++++++-----
 tools/testing/selftests/net/sockopt.c        | 179 +++++++++++++++++++++++++++
 6 files changed, 338 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/net/selftests.c
 create mode 100644 tools/testing/selftests/net/selftests.h
 create mode 100644 tools/testing/selftests/net/sockopt.c

-- 
1.8.2

^ permalink raw reply

* [PATCH 1/3 net-next RFC] selftest: add abstractions for net selftests
From: Alexandru Copot @ 2013-04-09 10:30 UTC (permalink / raw)
  To: netdev, davem; +Cc: willemb, dborkman, edumazet, Alexandru Copot, Daniel Baluta
In-Reply-To: <1365503461-26309-1-git-send-email-alex.mihai.c@gmail.com>

Signed-of by Alexandru Copot <alex.mihai.c@gmail.com>
Cc: Daniel Baluta <dbaluta@ixiacom.com>
---
 tools/testing/selftests/net/selftests.c | 30 +++++++++++++++++++++++
 tools/testing/selftests/net/selftests.h | 42 +++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)
 create mode 100644 tools/testing/selftests/net/selftests.c
 create mode 100644 tools/testing/selftests/net/selftests.h

diff --git a/tools/testing/selftests/net/selftests.c b/tools/testing/selftests/net/selftests.c
new file mode 100644
index 0000000..cd6e427
--- /dev/null
+++ b/tools/testing/selftests/net/selftests.c
@@ -0,0 +1,30 @@
+#include <stdio.h>
+
+#include "selftests.h"
+
+int run_all_tests(struct generic_test *test, void *param)
+{
+	int i;
+	int rc, allrc;
+	char *ptr;
+
+	rc = test->prepare ? test->prepare(param) : 0;
+	if (rc)
+		return rc;
+
+	allrc = 0;
+	printf("Testing: %s ", test->name);
+	ptr = test->testcases;
+	for (i = 0; i < test->testcase_count; i++) {
+		rc = test->run(ptr);
+		allrc |= rc;
+
+		if (test->abort_on_fail && rc) {
+			printf("Testcase %d failed, aborting\n", i);
+		}
+
+		ptr += test->testcase_size;
+	}
+	printf("\t\t%s\n", allrc ? "[FAIL]" : "[PASS]");
+	return allrc;
+}
diff --git a/tools/testing/selftests/net/selftests.h b/tools/testing/selftests/net/selftests.h
new file mode 100644
index 0000000..e289f03
--- /dev/null
+++ b/tools/testing/selftests/net/selftests.h
@@ -0,0 +1,42 @@
+#ifndef SELFTESTS_H
+#define SELFTESTS_H
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+
+#define ASSERT(cond) do {								\
+	if (!(cond))     {								\
+			fprintf(stderr, "%s:%d %s\n", __FILE__, __LINE__, #cond);	\
+			perror("");							\
+			exit(EXIT_FAILURE);						\
+		}									\
+} while (0)
+
+#define CHECK(cond,fmt,...)			        \
+	do {						\
+		if (!(cond)) {                          \
+			fprintf(stderr, "(%s, %d): " fmt,	\
+					__FILE__, __LINE__, ##__VA_ARGS__); \
+			perror("");              	\
+			return 1;			\
+		}					\
+	} while (0)
+
+struct generic_test {
+	const char *name;
+	void *testcases;
+	int testcase_size;
+	int testcase_count;
+
+	int abort_on_fail;
+
+	int (*prepare)(void *);
+	int (*run)(void *);
+	int (*cleanup)(void *);
+};
+
+int run_all_tests(struct generic_test *test, void *param);
+
+#endif
+
-- 
1.8.2

^ permalink raw reply related

* [PATCH 2/3 net-next RFC] selftest: Adapt socket test to new testing framework
From: Alexandru Copot @ 2013-04-09 10:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: willemb, dborkman, edumazet, Alexandru Copot, Daniel Baluta
In-Reply-To: <1365503461-26309-1-git-send-email-alex.mihai.c@gmail.com>

Signed-of by Alexandru Copot <alex.mihai.c@gmail.com>
Cc: Daniel Baluta <dbaluta@ixiacom.com>
---
 tools/testing/selftests/net/Makefile |   4 +-
 tools/testing/selftests/net/socket.c | 107 +++++++++++++++++++++++++----------
 2 files changed, 80 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 750512b..9de4ae6 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -11,9 +11,11 @@ all: $(NET_PROGS)
 %: %.c
 	$(CC) $(CFLAGS) -o $@ $^
 
+socket: selftests.o
+
 run_tests: all
 	@/bin/sh ./run_netsocktests || echo "sockettests: [FAIL]"
 	@/bin/sh ./run_afpackettests || echo "afpackettests: [FAIL]"
 
 clean:
-	$(RM) $(NET_PROGS)
+	$(RM) $(NET_PROGS) *.o
diff --git a/tools/testing/selftests/net/socket.c b/tools/testing/selftests/net/socket.c
index 0f227f2..89b1b1e 100644
--- a/tools/testing/selftests/net/socket.c
+++ b/tools/testing/selftests/net/socket.c
@@ -6,6 +6,8 @@
 #include <sys/socket.h>
 #include <netinet/in.h>
 
+#include "selftests.h"
+
 struct socket_testcase {
 	int	domain;
 	int	type;
@@ -22,7 +24,7 @@ struct socket_testcase {
 	int	nosupport_ok;
 };
 
-static struct socket_testcase tests[] = {
+static struct socket_testcase tests_inet[] = {
 	{ AF_MAX,  0,           0,           -EAFNOSUPPORT,    0 },
 	{ AF_INET, SOCK_STREAM, IPPROTO_TCP, 0,                1  },
 	{ AF_INET, SOCK_DGRAM,  IPPROTO_TCP, -EPROTONOSUPPORT, 1  },
@@ -30,58 +32,103 @@ static struct socket_testcase tests[] = {
 	{ AF_INET, SOCK_STREAM, IPPROTO_UDP, -EPROTONOSUPPORT, 1  },
 };
 
+static struct socket_testcase tests_inet6[] = {
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP, 0,                1  },
+	{ AF_INET6, SOCK_DGRAM,  IPPROTO_TCP, -EPROTONOSUPPORT, 1  },
+	{ AF_INET6, SOCK_DGRAM,  IPPROTO_UDP, 0,                1  },
+	{ AF_INET6, SOCK_STREAM, IPPROTO_UDP, -EPROTONOSUPPORT, 1  },
+};
+
+static struct socket_testcase tests_unix[] = {
+	{ AF_UNIX, SOCK_STREAM,    0,           0,                1  },
+	{ AF_UNIX, SOCK_DGRAM,     0,           0,                1  },
+	{ AF_UNIX, SOCK_SEQPACKET, 0,           0,                1  },
+	{ AF_UNIX, SOCK_STREAM,    PF_UNIX,     0,                1  },
+	{ AF_UNIX, SOCK_DGRAM,     PF_UNIX,     0,                1  },
+	{ AF_UNIX, SOCK_SEQPACKET, PF_UNIX,     0,                1  },
+	{ AF_UNIX, SOCK_STREAM,    IPPROTO_TCP, -EPROTONOSUPPORT, 1  },
+	{ AF_UNIX, SOCK_STREAM,    IPPROTO_UDP, -EPROTONOSUPPORT, 1  },
+	{ AF_UNIX, SOCK_DCCP,      0,           -ESOCKTNOSUPPORT, 1  },
+};
+
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
 #define ERR_STRING_SZ	64
 
-static int run_tests(void)
+static int my_run_testcase(void *arg)
 {
+	struct socket_testcase *s = (struct socket_testcase*)arg;
+	int fd;
 	char err_string1[ERR_STRING_SZ];
 	char err_string2[ERR_STRING_SZ];
-	int i, err;
-
-	err = 0;
-	for (i = 0; i < ARRAY_SIZE(tests); i++) {
-		struct socket_testcase *s = &tests[i];
-		int fd;
 
-		fd = socket(s->domain, s->type, s->protocol);
-		if (fd < 0) {
-			if (s->nosupport_ok &&
-			    errno == EAFNOSUPPORT)
-				continue;
+	fd = socket(s->domain, s->type, s->protocol);
+	if (fd < 0) {
+		if (s->nosupport_ok && errno == EAFNOSUPPORT)
+			return 0;
 
-			if (s->expect < 0 &&
-			    errno == -s->expect)
-				continue;
+		if (s->expect < 0 && errno == -s->expect)
+			return 0;
 
-			strerror_r(-s->expect, err_string1, ERR_STRING_SZ);
-			strerror_r(errno, err_string2, ERR_STRING_SZ);
+		strerror_r(-s->expect, err_string1, ERR_STRING_SZ);
+		strerror_r(errno, err_string2, ERR_STRING_SZ);
 
-			fprintf(stderr, "socket(%d, %d, %d) expected "
+		fprintf(stderr, "socket(%d, %d, %d) expected "
 				"err (%s) got (%s)\n",
 				s->domain, s->type, s->protocol,
 				err_string1, err_string2);
 
-			err = -1;
-			break;
-		} else {
-			close(fd);
+		return -1;
+	} else {
+		close(fd);
 
-			if (s->expect < 0) {
-				strerror_r(errno, err_string1, ERR_STRING_SZ);
+		if (s->expect < 0) {
+			strerror_r(errno, err_string1, ERR_STRING_SZ);
 
-				fprintf(stderr, "socket(%d, %d, %d) expected "
+			fprintf(stderr, "socket(%d, %d, %d) expected "
 					"success got err (%s)\n",
 					s->domain, s->type, s->protocol,
 					err_string1);
 
-				err = -1;
-				break;
-			}
+			return -1;
 		}
 	}
+	return 0;
+}
 
-	return err;
+static int run_tests(void)
+{
+	int rc;
+	struct generic_test test1 = {
+		.name = "socket AF_INET",
+		.prepare = NULL,
+		.run = my_run_testcase,
+		.testcases = tests_inet,
+		.testcase_size = sizeof(struct socket_testcase),
+		.testcase_count = ARRAY_SIZE(tests_inet),
+	};
+	struct generic_test test2 = {
+		.name = "socket AF_INET6",
+		.prepare = NULL,
+		.run = my_run_testcase,
+		.testcases = tests_inet6,
+		.testcase_size = sizeof(struct socket_testcase),
+		.testcase_count = ARRAY_SIZE(tests_inet6),
+	};
+	struct generic_test test3 = {
+		.name = "socket AF_UNIX",
+		.prepare = NULL,
+		.run = my_run_testcase,
+		.testcases = tests_unix,
+		.testcase_size = sizeof(struct socket_testcase),
+		.testcase_count = ARRAY_SIZE(tests_unix),
+	};
+
+	rc = 0;
+	rc |= run_all_tests(&test1, NULL);
+	rc |= run_all_tests(&test2, NULL);
+	rc |= run_all_tests(&test3, NULL);
+
+	return rc;
 }
 
 int main(void)
-- 
1.8.2

^ permalink raw reply related

* [PATCH net-next RFC] selftests: add socket options test with IPv6 testcases
From: Alexandru Copot @ 2013-04-09 10:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: willemb, dborkman, edumazet, Alexandru Copot, Daniel Baluta
In-Reply-To: <1365503461-26309-1-git-send-email-alex.mihai.c@gmail.com>

Only a part of the boolean socket options for IPv6 are tested.

Signed-of by Alexandru Copot <alex.mihai.c@gmail.com>
Cc: Daniel Baluta <dbaluta@ixiacom.com>
---
 tools/testing/selftests/net/Makefile         |   3 +-
 tools/testing/selftests/net/run_netsocktests |  10 ++
 tools/testing/selftests/net/sockopt.c        | 189 +++++++++++++++++++++++++++
 3 files changed, 201 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/sockopt.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 9de4ae6..da0e954 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -5,13 +5,14 @@ CFLAGS = -Wall -O2 -g
 
 CFLAGS += -I../../../../usr/include/
 
-NET_PROGS = socket psock_fanout psock_tpacket
+NET_PROGS = socket sockopt psock_fanout psock_tpacket
 
 all: $(NET_PROGS)
 %: %.c
 	$(CC) $(CFLAGS) -o $@ $^
 
 socket: selftests.o
+sockopt: selftests.o
 
 run_tests: all
 	@/bin/sh ./run_netsocktests || echo "sockettests: [FAIL]"
diff --git a/tools/testing/selftests/net/run_netsocktests b/tools/testing/selftests/net/run_netsocktests
index c09a682..7aa4b01 100644
--- a/tools/testing/selftests/net/run_netsocktests
+++ b/tools/testing/selftests/net/run_netsocktests
@@ -10,3 +10,13 @@ else
 	echo "[PASS]"
 fi
 
+echo "---------------------------"
+echo "running socket options test"
+echo "---------------------------"
+./sockopt
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+else
+	echo "[PASS]"
+fi
+
diff --git a/tools/testing/selftests/net/sockopt.c b/tools/testing/selftests/net/sockopt.c
new file mode 100644
index 0000000..4c130e3
--- /dev/null
+++ b/tools/testing/selftests/net/sockopt.c
@@ -0,0 +1,189 @@
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <linux/in.h>
+#include <linux/ipv6.h>
+
+#include "selftests.h"
+
+struct sockopt_testcase {
+	/* socket */
+	int	domain;
+	int	type;
+	int	protocol;
+
+	/* option */
+	int	level;
+	int	optname;
+	void	*value;
+	socklen_t size;
+
+	#define TYPE_INT   1
+	#define TYPE_DATA  2
+	int data_type;
+
+	int	expect_set;
+	int	expect_get;
+
+	int	nosupport_ok;
+};
+
+
+static struct sockopt_testcase tests_inet6[] = {
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, -1,                      (void*)0, sizeof(int),  TYPE_INT, -ENOPROTOOPT,  -ENOPROTOOPT,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_V6ONLY,             (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_V6ONLY,             (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVPKTINFO,        (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVPKTINFO,        (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292PKTINFO,        (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292PKTINFO,        (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+	
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVHOPLIMIT,       (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVHOPLIMIT,       (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292HOPLIMIT,       (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292HOPLIMIT,       (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+ 	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVRTHDR,          (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVRTHDR,          (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292RTHDR,          (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292RTHDR,          (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+	
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVHOPOPTS,        (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVHOPOPTS,        (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292HOPOPTS,        (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292HOPOPTS,        (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVDSTOPTS,        (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVDSTOPTS,        (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292DSTOPTS,        (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_2292DSTOPTS,        (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_TCLASS,             (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_TCLASS,             (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVTCLASS,         (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVTCLASS,         (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_FLOWINFO,           (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_FLOWINFO,           (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVPATHMTU,        (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVPATHMTU,        (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVORIGDSTADDR,    (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_RECVORIGDSTADDR,    (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_DONTFRAG,           (void*)1, sizeof(int),  TYPE_INT, 0,  0,  0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_DONTFRAG,           (void*)0, sizeof(int),  TYPE_INT, 0,  0,  0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_MTU,                (void*)512,          sizeof(int),  TYPE_INT, -EINVAL, -ENOTCONN, 0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_MTU,                (void*)IPV6_MIN_MTU, sizeof(int),  TYPE_INT,       0, -ENOTCONN, 0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_MTU_DISCOVER,       (void*)IP_PMTUDISC_DONT, sizeof(int),  TYPE_INT,       0,     0, 0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_MTU_DISCOVER,       (void*)IP_PMTUDISC_WANT, sizeof(int),  TYPE_INT,       0,     0, 0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_MTU_DISCOVER,       (void*)IP_PMTUDISC_DO,   sizeof(int),  TYPE_INT,       0,     0, 0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_MTU_DISCOVER,       (void*)IP_PMTUDISC_PROBE,sizeof(int),  TYPE_INT,       0,     0, 0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_UNICAST_IF,         (void*)0,                sizeof(int),  TYPE_INT,       0,     0, 0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_UNICAST_IF,         (void*)0x01000000,       sizeof(int),  TYPE_INT,       0,     0, 0},
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_UNICAST_IF,         (void*)-1,               sizeof(int),  TYPE_INT, -EADDRNOTAVAIL,     -EADDRNOTAVAIL, 0},
+
+	{ AF_INET6, SOCK_STREAM, IPPROTO_TCP,   IPPROTO_IPV6, IPV6_MULTICAST_IF,       (void*)0,                sizeof(int),  TYPE_INT, -ENOPROTOOPT, -ENOPROTOOPT, 0},
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_MULTICAST_IF,        (void*)0,                sizeof(int),  TYPE_INT,       0,     0, 0},
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_MULTICAST_IF,        (void*)-1,               sizeof(int),  TYPE_INT, -ENODEV, -ENODEV, 0},
+
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_MULTICAST_HOPS,      (void*)-1,               sizeof(int),  TYPE_INT,      0,     0, 0},
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_MULTICAST_HOPS,      (void*)0,                sizeof(int),  TYPE_INT,      0,     0, 0},
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_MULTICAST_HOPS,      (void*)255,              sizeof(int),  TYPE_INT,      0,     0, 0},
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_MULTICAST_HOPS,      (void*)256,              sizeof(int),  TYPE_INT, -EINVAL,    0, 0},
+
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_UNICAST_HOPS,        (void*)-1,               sizeof(int),  TYPE_INT,      0,     0, 0},
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_UNICAST_HOPS,        (void*)0,                sizeof(int),  TYPE_INT,      0,     0, 0},
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_UNICAST_HOPS,        (void*)255,              sizeof(int),  TYPE_INT,      0,     0, 0},
+	{ AF_INET6, SOCK_DGRAM, IPPROTO_UDP,   IPPROTO_IPV6, IPV6_UNICAST_HOPS,        (void*)256,              sizeof(int),  TYPE_INT, -EINVAL,    0, 0},
+
+
+};
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#define ERR_STRING_SZ	64
+
+static int my_run_testcase(void *arg)
+{
+	struct sockopt_testcase *s = (struct sockopt_testcase*)arg;
+	int fd, rc, val;
+	void *optdata;
+	socklen_t readsize;
+
+	fd = socket(s->domain, s->type, s->protocol);
+	ASSERT(fd > 0);
+
+	if (s->data_type == TYPE_INT) {
+		val = (long)s->value;
+		optdata = &val;
+	} else {
+		optdata = s->value;
+	}
+
+	rc = setsockopt(fd, s->level, s->optname, optdata, s->size);
+	CHECK(rc == 0 || errno == -s->expect_set, "setsockopt option %d\n", s->optname);
+
+	if (s->data_type == TYPE_INT) {
+		optdata = &val;
+		val = -1;
+	} else {
+		optdata = malloc(s->size);
+		ASSERT(optdata != NULL);
+	}
+
+	readsize = s->size;
+	rc = getsockopt(fd, s->level, s->optname, optdata, &readsize);
+	CHECK(rc == 0 || errno == -s->expect_get, "getsockopt option %d\n", s->optname);
+	ASSERT(readsize == s->size);
+
+	if (rc == 0 && errno == 0) {
+		if (s->data_type == TYPE_INT) {
+			CHECK(val == (long)s->value, "Read value different from written value\n");
+		} else {
+			CHECK(!memcmp(optdata, s->value, s->size), "Read value different from written value\n");
+			free(optdata);
+		}
+	}
+	ASSERT(close(fd) == 0);
+	return 0;
+}
+
+static int run_tests(void)
+{
+	int rc;
+	struct generic_test test1 = {
+		.name = "sockopt AF_INET6",
+		.prepare = NULL,
+		.run = my_run_testcase,
+		.testcases = tests_inet6,
+		.testcase_size = sizeof(struct sockopt_testcase),
+		.testcase_count = ARRAY_SIZE(tests_inet6),
+	};
+
+	rc = 0;
+	rc |= run_all_tests(&test1, NULL);
+
+	return rc;
+}
+
+int main(void)
+{
+	int err = run_tests();
+
+	return err;
+}
-- 
1.8.2

^ permalink raw reply related

* Re: [Patch net-next v3 3/4] vxlan: add ipv6 support
From: Cong Wang @ 2013-04-09 10:47 UTC (permalink / raw)
  To: David Stevens; +Cc: David S. Miller, netdev, Stephen Hemminger
In-Reply-To: <OF82703BAC.20223779-ON85257B47.0040E058-85257B47.0046C8A6@us.ibm.com>

On Mon, 2013-04-08 at 08:53 -0400, David Stevens wrote:
>         With your current definitions, "sin6" is just an in6_addr, but
> you are not checking the sin6_scope_id, which is not correct for IPv6
> link-local addresses. You can rely on "ifindex" in vxlan_rdst for
> fdb entries, but you'd at least need to make sure it is not 0 for LL
> scope, and you still need sin6_scope_id to match for calls in 
> vxlan_snoop()
> and vxlan_group_used(). The same sin6_addr with different
> sin6_scope_id
> for link-local addrs is not the same address in v6. 

It seems this is not very easy to do, at least for me. So I will send
another patch after this patcheset is merged, now let's not make this to
be a blocker for this patchset.

Thanks!

^ permalink raw reply

* Re: [RFC] net : add tx timestamp to packet mmap.
From: Paul Chavent @ 2013-04-09 10:42 UTC (permalink / raw)
  To: Richard Cochran; +Cc: davem, edumazet, daniel.borkmann, xemul, ebiederm, netdev
In-Reply-To: <20121213181733.GA2312@netboy.at.omicron.at>



On 12/13/2012 07:17 PM, Richard Cochran wrote:
> On Thu, Dec 13, 2012 at 05:13:56PM +0100, Paul Chavent wrote:
>>>
>>> In order for time stamps to appear, somebody has to call
>>> skb_tx_timestamp() ...
>> Yes. "Somebody" means "the hardware driver" after completing xmit.
>> That's true ?
>
> Yes, the MAC driver must call this helper function, but not many
> drivers do this yet. You didn't say which MAC driver you are using and
> whether it supports Tx SO_TIMESTAMPING or not.
>
>> Yes, it only sets some flags. I thought that those flags was
>> required by the skb_tx_timestamp() in order to make the appropriate
>> timestamping (hardware, software, etc).
>>
>> So in order to have tx timestamp that work, both calls are needed ?
>
> Yes.
>
>> Why sock_tx_timestamp is called in packet_fill_skb and
>> packet_sendmsg_spkt and not in tpacket_fill_skb ?
>> Why i can retrieve timestamps when i add this call ?
>
> Sorry, I don't know much about packet mmap. Last time I tried it, some
> years ago, it wasn't really working.
>
> Richard
>
Hi.

Would it be possible that the packet mmap maintainers give their opinion 
on this thread please ?

Regards.

Paul.

^ permalink raw reply

* Re: [PATCH 1/3 net-next RFC] selftest: add abstractions for net selftests
From: Daniel Borkmann @ 2013-04-09 11:13 UTC (permalink / raw)
  To: Alexandru Copot; +Cc: netdev, davem, willemb, edumazet, Daniel Baluta
In-Reply-To: <1365503461-26309-2-git-send-email-alex.mihai.c@gmail.com>

On 04/09/2013 12:30 PM, Alexandru Copot wrote:
> Signed-of by Alexandru Copot <alex.mihai.c@gmail.com>
> Cc: Daniel Baluta <dbaluta@ixiacom.com>
> ---
>   tools/testing/selftests/net/selftests.c | 30 +++++++++++++++++++++++
>   tools/testing/selftests/net/selftests.h | 42 +++++++++++++++++++++++++++++++++
>   2 files changed, 72 insertions(+)
>   create mode 100644 tools/testing/selftests/net/selftests.c
>   create mode 100644 tools/testing/selftests/net/selftests.h
>
> diff --git a/tools/testing/selftests/net/selftests.c b/tools/testing/selftests/net/selftests.c
> new file mode 100644
> index 0000000..cd6e427
> --- /dev/null
> +++ b/tools/testing/selftests/net/selftests.c
> @@ -0,0 +1,30 @@
> +#include <stdio.h>
> +
> +#include "selftests.h"
> +
> +int run_all_tests(struct generic_test *test, void *param)
> +{
> +	int i;
> +	int rc, allrc;
> +	char *ptr;
> +
> +	rc = test->prepare ? test->prepare(param) : 0;
> +	if (rc)
> +		return rc;
> +
> +	allrc = 0;

Nitpick: this could already have been initialized above.

> +	printf("Testing: %s ", test->name);
> +	ptr = test->testcases;

ditto

> +	for (i = 0; i < test->testcase_count; i++) {
> +		rc = test->run(ptr);
> +		allrc |= rc;
> +
> +		if (test->abort_on_fail && rc) {
> +			printf("Testcase %d failed, aborting\n", i);
> +		}

I think here you wanted to abort but didn't?

> +
> +		ptr += test->testcase_size;
> +	}
> +	printf("\t\t%s\n", allrc ? "[FAIL]" : "[PASS]");
> +	return allrc;
> +}
> diff --git a/tools/testing/selftests/net/selftests.h b/tools/testing/selftests/net/selftests.h
> new file mode 100644
> index 0000000..e289f03
> --- /dev/null
> +++ b/tools/testing/selftests/net/selftests.h
> @@ -0,0 +1,42 @@
> +#ifndef SELFTESTS_H
> +#define SELFTESTS_H
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <errno.h>
> +
> +#define ASSERT(cond) do {								\
> +	if (!(cond))     {								\
> +			fprintf(stderr, "%s:%d %s\n", __FILE__, __LINE__, #cond);	\
> +			perror("");							\
> +			exit(EXIT_FAILURE);						\
> +		}									\
> +} while (0)
> +
> +#define CHECK(cond,fmt,...)			        \
> +	do {						\
> +		if (!(cond)) {                          \
> +			fprintf(stderr, "(%s, %d): " fmt,	\
> +					__FILE__, __LINE__, ##__VA_ARGS__); \
> +			perror("");              	\
> +			return 1;			\
> +		}					\
> +	} while (0)

Isn't it a bit error-prone if in the middle of somewhere this check fails
and the function suddenly returns 1?

What if this is called from a function that was declared as void or to
return a pointer to a struct etc.?

> +struct generic_test {
> +	const char *name;
> +	void *testcases;
> +	int testcase_size;
> +	int testcase_count;
> +
> +	int abort_on_fail;
> +
> +	int (*prepare)(void *);
> +	int (*run)(void *);
> +	int (*cleanup)(void *);
> +};
> +
> +int run_all_tests(struct generic_test *test, void *param);
> +
> +#endif
> +
>

Nitpick: here's an extra newline at the end of the file.

^ permalink raw reply

* [PATCH 5/7] xen-netfront: reduce gso_max_size to account for ethernet header
From: Wei Liu @ 2013-04-09 11:07 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: ian.campbell, david.vrabel, konrad.wilk, annie.li, wdauchy,
	Wei Liu
In-Reply-To: <1365505655-8021-1-git-send-email-wei.liu2@citrix.com>

The maximum packet including ethernet header that can be handled by netfront /
netback wire format is 65535. Reduce gso_max_size accordingly.

Drop skb and print warning when skb->len > 65535. This can 1) save the effort
to send malformed packet to netback, 2) help spotting misconfiguration of
netfront in the future.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 1bb2e20..42ef4a8 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -547,6 +547,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
+	/* If skb->len is too big for wire format, drop skb and alert
+	 * user about misconfiguration.
+	 */
+	if (unlikely(skb->len > (typeof(tx->size))(~0))) {
+		net_alert_ratelimited(
+			"xennet: skb->len = %u, too big for wire format\n",
+			skb->len);
+		goto drop;
+	}
+
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
@@ -1058,7 +1068,7 @@ err:
 
 static int xennet_change_mtu(struct net_device *dev, int mtu)
 {
-	int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
+	int max = xennet_can_sg(dev) ? 65535 - VLAN_ETH_HLEN : ETH_DATA_LEN;
 
 	if (mtu > max)
 		return -EINVAL;
@@ -1362,6 +1372,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
 	SET_NETDEV_DEV(netdev, &dev->dev);
 
+	netif_set_gso_max_size(netdev, 65535 - VLAN_ETH_HLEN);
+
 	np->netdev = netdev;
 
 	netif_carrier_off(netdev);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 6/7] xen-netback: coalesce slots and fix regressions
From: Wei Liu @ 2013-04-09 11:07 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: ian.campbell, david.vrabel, konrad.wilk, annie.li, wdauchy,
	Wei Liu
In-Reply-To: <1365505655-8021-1-git-send-email-wei.liu2@citrix.com>

This patch tries to coalesce tx requests when constructing grant copy
structures. It enables netback to deal with situation when frontend's
MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.

With the help of coalescing, this patch tries to address two regressions and
avoid reopening the security hole in XSA-39.

Regression 1. The reduction of the number of supported ring entries (slots)
per packet (from 18 to 17). This regression has been around for some time but
remains unnoticed until XSA-39 security fix. This is fixed by coalescing
slots.

Regression 2. The XSA-39 security fix turning "too many frags" errors from
just dropping the packet to a fatal error and disabling the VIF. This is fixed
by coalescing slots (handling 18 slots when backend's MAX_SKB_FRAGS is 17)
which rules out false positive (using 18 slots is legit) and dropping packets
using 19 to `max_skb_slots` slots.

To avoid reopening security hole in XSA-39, frontend sending packet using more
than max_skb_slots is considered malicious.

The behavior of netback for packet is thus:

    1-18            slots: valid
   19-max_skb_slots slots: drop and respond with an error
   max_skb_slots+   slots: fatal error

max_skb_slots is configurable by admin, default value is 20.

Also change variable name from "frags" to "slots" in netbk_count_requests.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/netback.c |  270 ++++++++++++++++++++++++++++++-------
 include/xen/interface/io/netif.h  |   18 +++
 2 files changed, 238 insertions(+), 50 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 6e8e51a..3490b2c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -47,11 +47,47 @@
 #include <asm/xen/hypercall.h>
 #include <asm/xen/page.h>
 
+/*
+ * This is the maximum slots a skb can have. If a guest sends a skb
+ * which exceeds this limit it is considered malicious.
+ */
+#define MAX_SKB_SLOTS_DEFAULT 20
+static unsigned int max_skb_slots = MAX_SKB_SLOTS_DEFAULT;
+
+static int max_skb_slots_set(const char *val, const struct kernel_param *kp)
+{
+	int ret;
+	unsigned long param = 0;
+
+	ret = kstrtoul(val, 10, &param);
+
+	if (ret < 0 || param < XEN_NETIF_NR_SLOTS_MIN)
+		return -EINVAL;
+
+	max_skb_slots = param;
+
+	return 0;
+}
+
+static struct kernel_param_ops max_skb_slots_param_ops = {
+	.set = max_skb_slots_set,
+	.get = param_get_ulong,
+};
+
+module_param_cb(max_skb_slots, &max_skb_slots_param_ops,
+		&max_skb_slots, 0444);
+
+typedef unsigned int pending_ring_idx_t;
+#define INVALID_PENDING_RING_IDX (~0U)
+
 struct pending_tx_info {
-	struct xen_netif_tx_request req;
+	struct xen_netif_tx_request req; /* coalesced tx request  */
 	struct xenvif *vif;
+	pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX
+				  * if it is head of one or more tx
+				  * reqs
+				  */
 };
-typedef unsigned int pending_ring_idx_t;
 
 struct netbk_rx_meta {
 	int id;
@@ -102,7 +138,11 @@ struct xen_netbk {
 	atomic_t netfront_count;
 
 	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
-	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
+	/* Coalescing tx requests before copying makes number of grant
+	 * copy ops greater of equal to number of slots required. In
+	 * worst case a tx request consumes 2 gnttab_copy.
+	 */
+	struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
 
 	u16 pending_ring[MAX_PENDING_REQS];
 
@@ -251,7 +291,7 @@ static int max_required_rx_slots(struct xenvif *vif)
 	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
 
 	if (vif->can_sg || vif->gso || vif->gso_prefix)
-		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
+		max += XEN_NETIF_NR_SLOTS_MIN + 1; /* extra_info + frags */
 
 	return max;
 }
@@ -657,7 +697,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
 		__skb_queue_tail(&rxq, skb);
 
 		/* Filled the batch queue? */
-		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
+		if (count + XEN_NETIF_NR_SLOTS_MIN >= XEN_NETIF_RX_RING_SIZE)
 			break;
 	}
 
@@ -904,38 +944,56 @@ static void netbk_fatal_tx_err(struct xenvif *vif)
 
 static int netbk_count_requests(struct xenvif *vif,
 				struct xen_netif_tx_request *first,
+				RING_IDX first_idx,
 				struct xen_netif_tx_request *txp,
 				int work_to_do)
 {
 	RING_IDX cons = vif->tx.req_cons;
-	int frags = 0;
+	int slots = 0;
+	int drop_err = 0;
 
 	if (!(first->flags & XEN_NETTXF_more_data))
 		return 0;
 
 	do {
-		if (frags >= work_to_do) {
-			netdev_err(vif->dev, "Need more frags\n");
+		if (slots >= work_to_do) {
+			netdev_err(vif->dev, "Need more slots\n");
 			netbk_fatal_tx_err(vif);
 			return -ENODATA;
 		}
 
-		if (unlikely(frags >= MAX_SKB_FRAGS)) {
-			netdev_err(vif->dev, "Too many frags\n");
+		/* Xen network protocol had implicit dependency on
+		 * MAX_SKB_FRAGS. XEN_NETIF_NR_SLOTS_MIN is set to the
+		 * historical MAX_SKB_FRAGS value 18 to honor the same
+		 * behavior as before. Any packet using more than 18
+		 * slots but less than max_skb_slots slots is dropped
+		 */
+		if (!drop_err && slots >= XEN_NETIF_NR_SLOTS_MIN) {
+			if (net_ratelimit())
+				netdev_dbg(vif->dev, "Too many slots\n");
+			drop_err = -E2BIG;
+		}
+
+		/* This guest is really using too many slots and
+		 * considered malicious.
+		 */
+		if (unlikely(slots >= max_skb_slots)) {
+			netdev_err(vif->dev,
+				   "Malicious frontend using too many slots\n");
 			netbk_fatal_tx_err(vif);
 			return -E2BIG;
 		}
 
-		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
+		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
 		       sizeof(*txp));
 		if (txp->size > first->size) {
-			netdev_err(vif->dev, "Frag is bigger than frame.\n");
+			netdev_err(vif->dev, "Packet is bigger than frame.\n");
 			netbk_fatal_tx_err(vif);
 			return -EIO;
 		}
 
 		first->size -= txp->size;
-		frags++;
+		slots++;
 
 		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
 			netdev_err(vif->dev, "txp->offset: %x, size: %u\n",
@@ -944,7 +1002,13 @@ static int netbk_count_requests(struct xenvif *vif,
 			return -EINVAL;
 		}
 	} while ((txp++)->flags & XEN_NETTXF_more_data);
-	return frags;
+
+	if (drop_err) {
+		netbk_tx_err(vif, first, first_idx + slots);
+		return drop_err;
+	}
+
+	return slots;
 }
 
 static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
@@ -968,48 +1032,114 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
 	u16 pending_idx = *((u16 *)skb->data);
-	int i, start;
+	u16 head_idx = 0;
+	int slot, start;
+	struct page *page;
+	pending_ring_idx_t index, start_idx = 0;
+	uint16_t dst_offset;
+	unsigned int nr_slots;
+	struct pending_tx_info *first = NULL;
+
+	/* At this point shinfo->nr_frags is in fact the number of
+	 * slots, which can be as large as XEN_NETIF_NR_SLOTS_MIN.
+	 */
+	nr_slots = shinfo->nr_frags;
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
-	for (i = start; i < shinfo->nr_frags; i++, txp++) {
-		struct page *page;
-		pending_ring_idx_t index;
+	/* Coalesce tx requests, at this point the packet passed in
+	 * should be <= 64K. Any packets larger than 64K have been
+	 * handled in netbk_count_requests().
+	 */
+	for (shinfo->nr_frags = slot = start; slot < nr_slots;
+	     shinfo->nr_frags++) {
 		struct pending_tx_info *pending_tx_info =
 			netbk->pending_tx_info;
 
-		index = pending_index(netbk->pending_cons++);
-		pending_idx = netbk->pending_ring[index];
-		page = xen_netbk_alloc_page(netbk, pending_idx);
+		page = alloc_page(GFP_KERNEL|__GFP_COLD);
 		if (!page)
 			goto err;
 
-		gop->source.u.ref = txp->gref;
-		gop->source.domid = vif->domid;
-		gop->source.offset = txp->offset;
-
-		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-		gop->dest.domid = DOMID_SELF;
-		gop->dest.offset = txp->offset;
-
-		gop->len = txp->size;
-		gop->flags = GNTCOPY_source_gref;
+		dst_offset = 0;
+		first = NULL;
+		while (dst_offset < PAGE_SIZE && slot < nr_slots) {
+			gop->flags = GNTCOPY_source_gref;
+
+			gop->source.u.ref = txp->gref;
+			gop->source.domid = vif->domid;
+			gop->source.offset = txp->offset;
+
+			gop->dest.domid = DOMID_SELF;
+
+			gop->dest.offset = dst_offset;
+			gop->dest.u.gmfn = virt_to_mfn(page_address(page));
+
+			if (dst_offset + txp->size > PAGE_SIZE) {
+				/* This page can only merge a portion
+				 * of tx request. Do not increment any
+				 * pointer / counter here. The txp
+				 * will be dealt with in future
+				 * rounds, eventually hitting the
+				 * `else` branch.
+				 */
+				gop->len = PAGE_SIZE - dst_offset;
+				txp->offset += gop->len;
+				txp->size -= gop->len;
+				dst_offset += gop->len; /* quit loop */
+			} else {
+				/* This tx request can be merged in the page */
+				gop->len = txp->size;
+				dst_offset += gop->len;
+
+				index = pending_index(netbk->pending_cons++);
+
+				pending_idx = netbk->pending_ring[index];
+
+				memcpy(&pending_tx_info[pending_idx].req, txp,
+				       sizeof(*txp));
+				xenvif_get(vif);
+
+				pending_tx_info[pending_idx].vif = vif;
+
+				/* Poison these fields, corresponding
+				 * fields for head tx req will be set
+				 * to correct values after the loop.
+				 */
+				netbk->mmap_pages[pending_idx] = (void *)(~0UL);
+				pending_tx_info[pending_idx].head =
+					INVALID_PENDING_RING_IDX;
+
+				if (!first) {
+					first = &pending_tx_info[pending_idx];
+					start_idx = index;
+					head_idx = pending_idx;
+				}
+
+				txp++;
+				slot++;
+			}
 
-		gop++;
+			gop++;
+		}
 
-		memcpy(&pending_tx_info[pending_idx].req, txp, sizeof(*txp));
-		xenvif_get(vif);
-		pending_tx_info[pending_idx].vif = vif;
-		frag_set_pending_idx(&frags[i], pending_idx);
+		first->req.offset = 0;
+		first->req.size = dst_offset;
+		first->head = start_idx;
+		set_page_ext(page, netbk, head_idx);
+		netbk->mmap_pages[head_idx] = page;
+		frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
 	}
 
+	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
+
 	return gop;
 err:
 	/* Unwind, freeing all pages and sending error responses. */
-	while (i-- > start) {
-		xen_netbk_idx_release(netbk, frag_get_pending_idx(&frags[i]),
-				      XEN_NETIF_RSP_ERROR);
+	while (shinfo->nr_frags-- > start) {
+		xen_netbk_idx_release(netbk,
+				frag_get_pending_idx(&frags[shinfo->nr_frags]),
+				XEN_NETIF_RSP_ERROR);
 	}
 	/* The head too, if necessary. */
 	if (start)
@@ -1025,8 +1155,10 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 	struct gnttab_copy *gop = *gopp;
 	u16 pending_idx = *((u16 *)skb->data);
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	struct pending_tx_info *tx_info;
 	int nr_frags = shinfo->nr_frags;
 	int i, err, start;
+	u16 peek; /* peek into next tx request */
 
 	/* Check status of header. */
 	err = gop->status;
@@ -1038,11 +1170,21 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 
 	for (i = start; i < nr_frags; i++) {
 		int j, newerr;
+		pending_ring_idx_t head;
 
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
+		tx_info = &netbk->pending_tx_info[pending_idx];
+		head = tx_info->head;
 
 		/* Check error status: if okay then remember grant handle. */
-		newerr = (++gop)->status;
+		do {
+			newerr = (++gop)->status;
+			if (newerr)
+				break;
+			peek = netbk->pending_ring[pending_index(++head)];
+		} while (netbk->pending_tx_info[peek].head
+			 == INVALID_PENDING_RING_IDX);
+
 		if (likely(!newerr)) {
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err))
@@ -1267,11 +1409,12 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 	struct sk_buff *skb;
 	int ret;
 
-	while (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
+	while ((nr_pending_reqs(netbk) + XEN_NETIF_NR_SLOTS_MIN
+		 < MAX_PENDING_REQS) &&
 		!list_empty(&netbk->net_schedule_list)) {
 		struct xenvif *vif;
 		struct xen_netif_tx_request txreq;
-		struct xen_netif_tx_request txfrags[MAX_SKB_FRAGS];
+		struct xen_netif_tx_request txfrags[max_skb_slots];
 		struct page *page;
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
 		u16 pending_idx;
@@ -1332,7 +1475,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 				continue;
 		}
 
-		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do);
+		ret = netbk_count_requests(vif, &txreq, idx,
+					   txfrags, work_to_do);
 		if (unlikely(ret < 0))
 			continue;
 
@@ -1359,7 +1503,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		pending_idx = netbk->pending_ring[index];
 
 		data_len = (txreq.size > PKT_PROT_LEN &&
-			    ret < MAX_SKB_FRAGS) ?
+			    ret < XEN_NETIF_NR_SLOTS_MIN) ?
 			PKT_PROT_LEN : txreq.size;
 
 		skb = alloc_skb(data_len + NET_SKB_PAD + NET_IP_ALIGN,
@@ -1409,6 +1553,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		memcpy(&netbk->pending_tx_info[pending_idx].req,
 		       &txreq, sizeof(txreq));
 		netbk->pending_tx_info[pending_idx].vif = vif;
+		netbk->pending_tx_info[pending_idx].head = index;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1539,7 +1684,10 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
 {
 	struct xenvif *vif;
 	struct pending_tx_info *pending_tx_info;
-	pending_ring_idx_t index;
+	pending_ring_idx_t head;
+	u16 peek; /* peek into next tx request */
+
+	BUG_ON(netbk->mmap_pages[pending_idx] == (void *)(~0UL));
 
 	/* Already complete? */
 	if (netbk->mmap_pages[pending_idx] == NULL)
@@ -1548,13 +1696,34 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
 	pending_tx_info = &netbk->pending_tx_info[pending_idx];
 
 	vif = pending_tx_info->vif;
+	head = pending_tx_info->head;
 
-	make_tx_response(vif, &pending_tx_info->req, status);
+	BUG_ON(head == INVALID_PENDING_RING_IDX);
+	BUG_ON(netbk->pending_ring[pending_index(head)] != pending_idx);
 
-	index = pending_index(netbk->pending_prod++);
-	netbk->pending_ring[index] = pending_idx;
+	do {
+		pending_ring_idx_t index;
+		pending_ring_idx_t idx = pending_index(head);
+		u16 info_idx = netbk->pending_ring[idx];
 
-	xenvif_put(vif);
+		pending_tx_info = &netbk->pending_tx_info[info_idx];
+		make_tx_response(vif, &pending_tx_info->req, status);
+
+		/* Setting any number other than
+		 * INVALID_PENDING_RING_IDX indicates this slot is
+		 * starting a new packet / ending a previous packet.
+		 */
+		pending_tx_info->head = 0;
+
+		index = pending_index(netbk->pending_prod++);
+		netbk->pending_ring[index] = netbk->pending_ring[info_idx];
+
+		xenvif_put(vif);
+
+		peek = netbk->pending_ring[pending_index(++head)];
+
+	} while (netbk->pending_tx_info[peek].head
+		 == INVALID_PENDING_RING_IDX);
 
 	netbk->mmap_pages[pending_idx]->mapping = 0;
 	put_page(netbk->mmap_pages[pending_idx]);
@@ -1613,7 +1782,8 @@ static inline int rx_work_todo(struct xen_netbk *netbk)
 static inline int tx_work_todo(struct xen_netbk *netbk)
 {
 
-	if (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
+	if (((nr_pending_reqs(netbk) + XEN_NETIF_NR_SLOTS_MIN)
+	     < MAX_PENDING_REQS) &&
 			!list_empty(&netbk->net_schedule_list))
 		return 1;
 
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index 9dfc120..e829a09 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -13,6 +13,24 @@
 #include <xen/interface/grant_table.h>
 
 /*
+ * Older implementation of Xen network frontend / backend has an
+ * implicit dependency on the MAX_SKB_FRAGS as the maximum number of
+ * ring slots a skb can use. Netfront / netback may not work as
+ * expected when frontend and backend have different MAX_SKB_FRAGS.
+ *
+ * A better approach is to add mechanism for netfront / netback to
+ * negotiate this value. However we cannot fix all possible
+ * frontends, so we need to define a value which states the minimum
+ * slots backend must support.
+ *
+ * The minimum value derives from older Linux kernel's MAX_SKB_FRAGS
+ * (18), which is proved to work with most frontends. Any new backend
+ * which doesn't negotiate with frontend should expect frontend to
+ * send a valid packet using slots up to this value.
+ */
+#define XEN_NETIF_NR_SLOTS_MIN 18
+
+/*
  * Notifications after enqueuing any type of message should be conditional on
  * the appropriate req_event or rsp_event field in the shared ring.
  * If the client sends notification for rx requests then it should specify
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 4/7] xen-netfront: frags -> slots in log message
From: Wei Liu @ 2013-04-09 11:07 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: ian.campbell, david.vrabel, konrad.wilk, annie.li, wdauchy,
	Wei Liu
In-Reply-To: <1365505655-8021-1-git-send-email-wei.liu2@citrix.com>

Also fix a typo in comment.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d9097a7..1bb2e20 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -735,7 +735,7 @@ static int xennet_get_responses(struct netfront_info *np,
 		/*
 		 * This definitely indicates a bug, either in this driver or in
 		 * the backend driver. In future this should flag the bad
-		 * situation to the system controller to reboot the backed.
+		 * situation to the system controller to reboot the backend.
 		 */
 		if (ref == GRANT_INVALID_REF) {
 			if (net_ratelimit())
@@ -771,7 +771,7 @@ next:
 
 	if (unlikely(slots > max)) {
 		if (net_ratelimit())
-			dev_warn(dev, "Too many frags\n");
+			dev_warn(dev, "Too many slots\n");
 		err = -E2BIG;
 	}
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 7/7] xen-netback: don't disconnect frontend when seeing oversize packet
From: Wei Liu @ 2013-04-09 11:07 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: ian.campbell, david.vrabel, konrad.wilk, annie.li, wdauchy,
	Wei Liu
In-Reply-To: <1365505655-8021-1-git-send-email-wei.liu2@citrix.com>

Some frontend drivers are sending packets > 64 KiB in length. This length
overflows the length field in the first slot making the following slots have
an invalid length ("Packet is bigger than frame").

Turn this error back into a non-fatal error by dropping the packet. To avoid
having the following slots having fatal errors, consume all slots in the
packet.

This does not reopen the security hole in XSA-39 as if the packet as an
invalid number of slots it will still hit fatal error case.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/netback.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 3490b2c..acd057b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -986,10 +986,21 @@ static int netbk_count_requests(struct xenvif *vif,
 
 		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
 		       sizeof(*txp));
-		if (txp->size > first->size) {
-			netdev_err(vif->dev, "Packet is bigger than frame.\n");
-			netbk_fatal_tx_err(vif);
-			return -EIO;
+
+		/* If the guest submitted a frame >= 64 KiB then
+		 * first->size overflowed and following slots will
+		 * appear to be larger than the frame.
+		 *
+		 * This cannot be fatal error as there are buggy
+		 * frontends that do this.
+		 *
+		 * Consume all slots and drop the packet.
+		 */
+		if (!drop_err && txp->size > first->size) {
+			if (net_ratelimit())
+				netdev_dbg(vif->dev,
+					   "Packet is bigger than frame.\n");
+			drop_err = -EIO;
 		}
 
 		first->size -= txp->size;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/7] xen-netback: remove skb in xen_netbk_alloc_page
From: Wei Liu @ 2013-04-09 11:07 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: ian.campbell, david.vrabel, konrad.wilk, annie.li, wdauchy,
	Wei Liu
In-Reply-To: <1365505655-8021-1-git-send-email-wei.liu2@citrix.com>

This variable is never used.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index da726a3..6e8e51a 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -948,7 +948,6 @@ static int netbk_count_requests(struct xenvif *vif,
 }
 
 static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
-					 struct sk_buff *skb,
 					 u16 pending_idx)
 {
 	struct page *page;
@@ -982,7 +981,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 
 		index = pending_index(netbk->pending_cons++);
 		pending_idx = netbk->pending_ring[index];
-		page = xen_netbk_alloc_page(netbk, skb, pending_idx);
+		page = xen_netbk_alloc_page(netbk, pending_idx);
 		if (!page)
 			goto err;
 
@@ -1387,7 +1386,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		}
 
 		/* XXX could copy straight to head */
-		page = xen_netbk_alloc_page(netbk, skb, pending_idx);
+		page = xen_netbk_alloc_page(netbk, pending_idx);
 		if (!page) {
 			kfree_skb(skb);
 			netbk_tx_err(vif, &txreq, idx);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/7] xen-netfront: remove unused variable `extra'
From: Wei Liu @ 2013-04-09 11:07 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: ian.campbell, david.vrabel, konrad.wilk, annie.li, wdauchy,
	Wei Liu
In-Reply-To: <1365505655-8021-1-git-send-email-wei.liu2@citrix.com>

This variable is supposed to hold reference to the last extra_info in the
loop. However there is only type of extra info here and the loop to process
extra info is missing, so this variable is never used and causes confusion.

Remove it at the moment. We can add it back when necessary.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netfront.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 7ffa43b..5527663 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -537,7 +537,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netfront_info *np = netdev_priv(dev);
 	struct netfront_stats *stats = this_cpu_ptr(np->stats);
 	struct xen_netif_tx_request *tx;
-	struct xen_netif_extra_info *extra;
 	char *data = skb->data;
 	RING_IDX i;
 	grant_ref_t ref;
@@ -581,7 +580,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	tx->gref = np->grant_tx_ref[id] = ref;
 	tx->offset = offset;
 	tx->size = len;
-	extra = NULL;
 
 	tx->flags = 0;
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
@@ -597,10 +595,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		gso = (struct xen_netif_extra_info *)
 			RING_GET_REQUEST(&np->tx, ++i);
 
-		if (extra)
-			extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE;
-		else
-			tx->flags |= XEN_NETTXF_extra_info;
+		tx->flags |= XEN_NETTXF_extra_info;
 
 		gso->u.gso.size = skb_shinfo(skb)->gso_size;
 		gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
@@ -609,7 +604,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
 		gso->flags = 0;
-		extra = gso;
 	}
 
 	np->tx.req_prod_pvt = i + 1;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/7] xen-netfront: frags -> slots in xennet_get_responses
From: Wei Liu @ 2013-04-09 11:07 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: ian.campbell, david.vrabel, konrad.wilk, annie.li, wdauchy,
	Wei Liu
In-Reply-To: <1365505655-8021-1-git-send-email-wei.liu2@citrix.com>

This function is in fact counting the ring slots required for responses.
Separate the concepts of ring slots and skb frags make the code clearer, as
now netfront and netback can have different MAX_SKB_FRAGS, slot and frag are
not mapped 1:1 any more.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netfront.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 5527663..d9097a7 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -712,7 +712,7 @@ static int xennet_get_responses(struct netfront_info *np,
 	struct sk_buff *skb = xennet_get_rx_skb(np, cons);
 	grant_ref_t ref = xennet_get_rx_ref(np, cons);
 	int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
-	int frags = 1;
+	int slots = 1;
 	int err = 0;
 	unsigned long ret;
 
@@ -756,27 +756,27 @@ next:
 		if (!(rx->flags & XEN_NETRXF_more_data))
 			break;
 
-		if (cons + frags == rp) {
+		if (cons + slots == rp) {
 			if (net_ratelimit())
-				dev_warn(dev, "Need more frags\n");
+				dev_warn(dev, "Need more slots\n");
 			err = -ENOENT;
 			break;
 		}
 
-		rx = RING_GET_RESPONSE(&np->rx, cons + frags);
-		skb = xennet_get_rx_skb(np, cons + frags);
-		ref = xennet_get_rx_ref(np, cons + frags);
-		frags++;
+		rx = RING_GET_RESPONSE(&np->rx, cons + slots);
+		skb = xennet_get_rx_skb(np, cons + slots);
+		ref = xennet_get_rx_ref(np, cons + slots);
+		slots++;
 	}
 
-	if (unlikely(frags > max)) {
+	if (unlikely(slots > max)) {
 		if (net_ratelimit())
 			dev_warn(dev, "Too many frags\n");
 		err = -E2BIG;
 	}
 
 	if (unlikely(err))
-		np->rx.rsp_cons = cons + frags;
+		np->rx.rsp_cons = cons + slots;
 
 	return err;
 }
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V3 0/7] Bundle fixes for Xen netfront / netback
From: Wei Liu @ 2013-04-09 11:07 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: ian.campbell, david.vrabel, konrad.wilk, annie.li, wdauchy

1-3 have been applied, repost them for completeness.

4 is incremental patch on top of 2 to fix log message and comment.

5 now accounts for vlan header.

6 is rewritten, please see commit log for details.

7 is slightly adjusted, but the logic remains the same as last version.

^ permalink raw reply

* Re: [PATCH 0/3 net-next RFC] selftest: Introduce test abstraction for net
From: Daniel Borkmann @ 2013-04-09 11:22 UTC (permalink / raw)
  To: Alexandru Copot; +Cc: netdev, davem, willemb, edumazet, Daniel Baluta
In-Reply-To: <1365503461-26309-1-git-send-email-alex.mihai.c@gmail.com>

On 04/09/2013 12:30 PM, Alexandru Copot wrote:
> This series adds a generic test abstraction that can make
> writing testcases easier. A generic_test structure is
> used to define a test and its methods.

Probably it would make sense as well to include other subsystem
test case developers/authors from testing/selftests/* in discussion,
since not only testing/selftests/net/ might want to use it, not
sure.

I don't know how far we want to go here or what is planned, but
maybe also having a look at libtap or others might be quite useful:

   https://github.com/zorgnax/libtap/blob/master/tap.h
   https://github.com/zorgnax/libtap/blob/master/tap.c

If we want to introduce something generic, then this should maybe
go under testing/selftests/lib/ so that others can make use of
that as well and convert (at least) some of their selftests.

> The second patch updates the socket tests to use the
> new framework, and the third patch creates new tests
> for [set/get]sockopt with some IPV6_* options.
>
> Signed-of by Alexandru Copot <alex.mihai.c@gmail.com>
> Cc: Daniel Baluta <dbaluta@ixiacom.com>

^ permalink raw reply

* Re: [PATCH 1/3 net-next RFC] selftest: add abstractions for net selftests
From: Alexandru Copot @ 2013-04-09 11:24 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, davem, willemb, edumazet, Daniel Baluta
In-Reply-To: <5163F7E1.8040208@redhat.com>

On Tue, Apr 9, 2013 at 2:13 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> +       for (i = 0; i < test->testcase_count; i++) {
>> +               rc = test->run(ptr);
>> +               allrc |= rc;
>> +
>> +               if (test->abort_on_fail && rc) {
>> +                       printf("Testcase %d failed, aborting\n", i);
>> +               }
>
>
> I think here you wanted to abort but didn't?

Yes, I forgot to break;

>> +#define CHECK(cond,fmt,...)                            \
>> +       do {                                            \
>> +               if (!(cond)) {                          \
>> +                       fprintf(stderr, "(%s, %d): " fmt,       \
>> +                                       __FILE__, __LINE__,
>> ##__VA_ARGS__); \
>> +                       perror("");                     \
>> +                       return 1;                       \
>> +               }                                       \
>> +       } while (0)
>
>
> Isn't it a bit error-prone if in the middle of somewhere this check fails
> and the function suddenly returns 1?
>
> What if this is called from a function that was declared as void or to
> return a pointer to a struct etc.?

Well, I tought of using this only in your high-level testcase methods
(test->run()).
It is also easier to see what is actually being tested.

For anything else the user is free to use any other functions or
return conventions
as the test requires.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox