Netdev List

* Re: [net-next PATCH v2] net: fix smatch warnings inside datagram_poll
From: David Miller @ 2013-04-02 21:01 UTC
  To: jacob.e.keller; +Cc: netdev
In-Reply-To: <20130402205539.26083.7462.stgit@jekeller-hub.jf.intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>
Date: Tue,  2 Apr 2013 13:55:40 -0700

> Commit 7d4c04fc170087119727119074e72445f2bb192b ("net: add option to enable
> error queue packets waking select") has an issue due to operator precedence
> causing the bit-wise OR to bind to the sock_flags call instead of the result of
> the ternary conditional. This fixes the *_poll functions to work properly. The
> old code results in "mask |= POLLPRI" instead of what was intended, which is to
> only include POLLPRI when the socket option is enabled.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>

Much better, applied, thanks.
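
The precedence problem described in the commit message can be reproduced in a standalone program. This is a sketch, assuming the pre-fix expression had the shape `mask |= POLLERR | sock_flag(...) ? POLLPRI : 0`; the integer `err_queue_flag` is a hypothetical stand-in for `sock_flag(sk, SOCK_SELECT_ERR_QUEUE)`, and the functions are not kernel code.

```c
#include <poll.h>

/* Assumed pre-fix shape: '|' binds tighter than '?:', so this parses as
 * (POLLERR | err_queue_flag) ? POLLPRI : 0. The condition is always
 * nonzero, so the result is always POLLPRI and POLLERR itself is lost. */
static unsigned int buggy_mask(int err_queue_flag)
{
	unsigned int mask = 0;

	mask |= POLLERR | err_queue_flag ? POLLPRI : 0;
	return mask;
}

/* Intended behaviour (in the branch taken when an error is pending):
 * report POLLERR, and add POLLPRI only when the option is enabled. */
static unsigned int fixed_mask(int err_queue_flag)
{
	unsigned int mask = 0;

	mask |= POLLERR | (err_queue_flag ? POLLPRI : 0);
	return mask;
}
```

Compilers typically flag the first form with a "suggest parentheses" warning, which is one way such bugs get caught.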

* [GIT] Networking
From: David Miller @ 2013-04-02 21:11 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel

1) Fix VSOCK layer handling of context ID changes, from Reilly Grant.

2) Now that we have a synchronize_net() in netdev_rx_handler_unregister(),
   we can't let any call sites hold locks.  Unfortunately bonding does,
   so we have to drop the rwlock there a little bit earlier, fix from
   Veaceslav Falico.

3) MAC address setting loop exits one iteration too early in mlx4 driver,
   from Yan Burman.

4) Restore ipv6 routes properly upon ifdown/ifup of loopback, from
   Balakumaran Kannan.

Please pull, thanks a lot!

The following changes since commit 118c9a45fdacc6fe57910fa1d048e2d5bbc193f4:

  Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2013-04-02 08:35:03 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 990454b5a48babde44a23c0f22bae5523f4fdf13:

  VSOCK: Handle changes to the VMCI context ID. (2013-04-02 14:39:17 -0400)

----------------------------------------------------------------
Balakumaran Kannan (1):
      net IPv6 : Fix broken IPv6 routing table after loopback down-up

Reilly Grant (1):
      VSOCK: Handle changes to the VMCI context ID.

Vasily Averin (1):
      cbq: incorrect processing of high limits

Veaceslav Falico (1):
      bonding: get netdev_rx_handler_unregister out of locks

Yan Burman (1):
      net/mlx4_en: Fix setting initial MAC address

 drivers/net/bonding/bond_main.c                |  3 +--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  4 ++--
 net/ipv6/addrconf.c                            | 27 +++++++++++++++++++++++++++
 net/sched/sch_cbq.c                            |  5 ++++-
 net/vmw_vsock/af_vsock.c                       |  6 +++---
 net/vmw_vsock/vmci_transport.c                 | 31 ++++++++++++++++++++-----------
 net/vmw_vsock/vsock_addr.c                     | 10 ----------
 net/vmw_vsock/vsock_addr.h                     |  2 --
 8 files changed, 57 insertions(+), 31 deletions(-)

* Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Hannes Frederic Sowa @ 2013-04-02 21:15 UTC
  To: Huang, Xiong
  Cc: Ben Hutchings, Anders Boström, netdev@vger.kernel.org,
	565404@bugs.debian.org
In-Reply-To: <157393863283F442885425D2C45428564F20261B@nasanexd02f.na.qualcomm.com>

On Mon, Apr 01, 2013 at 02:51:56AM +0000, Huang, Xiong wrote:
> > >
> > > I checked the Windows driver; it does limit the max packet length for TSO:
> > > Windows XP: 32*1024 bytes (including MAC header and all MAC payload), no
> > > support for IP/TCP options.
> > > Windows 7: 15,000 bytes, IP/TCP options supported.
> > 
> > If TSO on these devices don't work properly with TCP options then you're
> > just going to have to disable it - Linux requires it to support at least the
> > timestamp option.  I'm not sure about IP options (this really ought to be
> > documented).
> > 
> > If there's a length limit lower than 64K, you'll need to set the limit using
> > netif_set_gso_max_size() before registering the net device.
> > 
> 
> Ben, thanks for your advice. 
> I have discussed this with the Windows driver developer and the hardware designer. The TSO limitation in the Windows
> driver exists only to simplify that driver, due to the buffer length limitation of the TX descriptor. The hardware itself
> has no limitation on TSO packet length.

The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
in the driver. MAX_TX_BUF_LEN seems to be arbitrarily set to 0x2000. I
can even raise it to 0x3000 and don't see any tcp retransmits. Do you
have any advice on how to size this value (e.g. should we switch to the
windows values)?

I also found some irregularities in the MTU update code: it differs from
the calculations in the init function (I will send a patch for that).

* Re: [PATCH 3/6] mac802154: Use netif flow control
From: Sergei Shtylyov @ 2013-04-02 21:21 UTC
  To: Alan Ott
  Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, David S. Miller,
	linux-zigbee-devel, netdev, linux-kernel
In-Reply-To: <1364928481-1813-4-git-send-email-alan@signal11.us>

Hello.

On 04/02/2013 10:47 PM, Alan Ott wrote:

> Use netif_stop_queue() and netif_wake_queue() to control the flow of
> packets to mac802154 devices.  Since many IEEE 802.15.4 devices have no
> output buffer, and since the mac802154 xmit() function is designed to
> block, netif_stop_queue() is called after each packet.
>
> Signed-off-by: Alan Ott <alan@signal11.us>
> ---
>   net/mac802154/tx.c | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
>
> diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
> index a248246..fe3e02c 100644
> --- a/net/mac802154/tx.c
> +++ b/net/mac802154/tx.c
[...]
> @@ -71,6 +73,12 @@ static void mac802154_xmit_worker(struct work_struct *work)
>   out:
>   	mutex_unlock(&xw->priv->phy->pib_lock);
>   
> +	/* Restart the netif queue on each sub_if_data object. */
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(sdata, &xw->priv->slaves, list) {
> +		netif_wake_queue(sdata->dev);
> +	}


    Are {} really necessary here?

> @@ -108,6 +117,13 @@ netdev_tx_t mac802154_tx(struct mac802154_priv *priv, struct sk_buff *skb,
>   		return NETDEV_TX_BUSY;
>   	}
>   
> +	/* Stop the netif queue on each sub_if_data object. */
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(sdata, &priv->slaves, list) {
> +		netif_stop_queue(sdata->dev);
> +	}

    And here?

WBR, Sergei
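
The flow control described in the patch (stop the queue after every packet because the device has no output buffer, then wake it from the worker once the blocking xmit() has returned) can be modelled in a few lines. This is a single-device userspace sketch with hypothetical names, not the mac802154 code; the boolean stands in for netif_stop_queue()/netif_wake_queue().

```c
#include <stdbool.h>

/* Hypothetical model of one device with no output buffer. */
struct model_dev {
	bool queue_stopped;  /* stands in for the netif queue state */
	int pending;         /* packets handed to the worker, not yet sent */
	int sent;            /* packets the worker has transmitted */
};

/* Transmit entry point: returns false when the stack must hold the
 * packet back because the queue is stopped. */
static bool model_tx(struct model_dev *d)
{
	if (d->queue_stopped)
		return false;
	d->queue_stopped = true;   /* stop after each packet */
	d->pending++;
	return true;
}

/* Worker: performs the (blocking) send, then wakes the queue. */
static void model_worker(struct model_dev *d)
{
	if (d->pending) {
		d->pending--;
		d->sent++;
	}
	d->queue_stopped = false;
}
```

With this scheme at most one packet is ever pending in the worker, which is why the workqueue stops acting as an unbounded packet buffer.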

* Re: [PATCH 1/6] mac802154: Immediately retry sending failed packets
From: Alan Ott @ 2013-04-02 21:28 UTC
  To: Alexander Smirnov
  Cc: Dmitry Eremin-Solenikov, David S. Miller, linux-zigbee-devel,
	netdev, linux-kernel
In-Reply-To: <515B3F78.2020301@signal11.us>

On 04/02/2013 04:28 PM, Alan Ott wrote:
> On 04/02/2013 03:11 PM, Alexander Smirnov wrote:
>> 2013/4/2 Alan Ott <alan@signal11.us>
>>
>>     When ops->xmit() fails, immediately retry. Previously the packet was
>>     sent to the back of the workqueue.
>>
>>     Signed-off-by: Alan Ott <alan@signal11.us>
>>     ---
>>      net/mac802154/tx.c | 17 ++++++++---------
>>      1 file changed, 8 insertions(+), 9 deletions(-)
>>
>>     diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
>>     index 4e09d07..fbf937c 100644
>>     --- a/net/mac802154/tx.c
>>     +++ b/net/mac802154/tx.c
>>     @@ -59,19 +59,18 @@ static void mac802154_xmit_worker(struct work_struct *work)
>>                     }
>>             }
>>
>>     -       res = xw->priv->ops->xmit(&xw->priv->hw, xw->skb);
>>     +       do {
>>     +               res = xw->priv->ops->xmit(&xw->priv->hw, xw->skb);
>>     +               if (res && ++xw->xmit_attempts >= MAC802154_MAX_XMIT_ATTEMPTS) {
>>     +                       pr_debug("transmission failed for %d times",
>>     +                                MAC802154_MAX_XMIT_ATTEMPTS);
>>     +                       break;
>>     +               }
>>     +       } while (res);
>>
>>  
>>
>> IIRC this 802.15.4 stack uses single-thread-work-queue and all RX/TX
>> are performed by using it.
> Hi Alexander,
>
> Yes, that is true. As is currently implemented, the driver xmit()
> functions are called from a workqueue and block until the packet is sent.
>
>
>> Doing TX retry in the way you proposed -
>> it's possible that you will block other packets pending in this
>> queue.
> Yes. Since sending data is a blocking operation, any time spent sending
> (or re-sending) is blocking.
>
> As it was before this patch series, with the buffer (workqueue) growing
> arbitrarily large, doing retry by putting a packet at the end of the
> workqueue was largely useless because by the time it came to retry it,
> any state associated with it (with respect to fragmentation/reassembly)
> had expired.
>
> Keep in mind that with the netif stop/wake code, putting retries at the
> end of the workqueue or doing them immediately is basically the same
> thing, since the workqueue is no longer the packet queue (and will
> ideally only have 0 or 1 packets in it). The workqueue (with these
> patches) only serves to lift the driver xmit() calls out of atomic
> context, allowing them to block.
>
> However, it is easy to envision one process clogging up the works with
> retries by sending many packets to an unavailable address.
>
> What do you recommend doing here instead?

According to 7.5.6.5 of IEEE 802.15.4-2003, if retransmission fails
more than aMaxFrameRetries (3) times, the transmission is considered to have failed.
Since some transceivers (and I would assume most if not all) do this in
hardware, it's now my opinion that we should _not_ try to retransmit at
all in mac802154/tx.c.

For a driver for a device which _doesn't_ do retransmission in hardware,
maybe it should be handled by that driver then.
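
For reference, the do/while loop in the quoted patch calls xmit() immediately again until it succeeds or until MAC802154_MAX_XMIT_ATTEMPTS failures have been counted, all inside the worker. A standalone sketch of that loop, with a hypothetical callback type and fake transmitter, makes the number of calls explicit (it is not the mac802154 code itself):

```c
/* Hypothetical transmit callback: returns 0 on success, nonzero on failure. */
typedef int (*xmit_fn)(void *ctx);

/* Restatement of the quoted loop: retry immediately until xmit()
 * succeeds or max_attempts failures have been counted.
 * *failures ends up holding the number of failed calls. */
static int xmit_with_retries(xmit_fn xmit, void *ctx, int max_attempts,
			     int *failures)
{
	int res;

	*failures = 0;
	do {
		res = xmit(ctx);
		if (res && ++*failures >= max_attempts)
			break;  /* the patch logs with pr_debug() at this point */
	} while (res);
	return res;
}

/* Hypothetical fake transmitter: fails fail_left times, then succeeds. */
struct fake_radio {
	int fail_left;
	int calls;
};

static int fake_xmit(void *ctx)
{
	struct fake_radio *r = ctx;

	r->calls++;
	if (r->fail_left > 0) {
		r->fail_left--;
		return -1;
	}
	return 0;
}
```

A frame that never succeeds costs max_attempts back-to-back calls while the worker is blocked, which is the latency concern raised in this thread; with hardware retries (aMaxFrameRetries) the loop would reduce to a single call.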

>
>> Although Linux is already a 'slow' system for providing
>> real-time behaviour for specific 802.15.4 features, I think it's not a good
>> idea to increase node communication latency.
> With the transmit buffer length increased (and actually being used),
> maybe the packets with realtime requirements can be given a higher
> priority to deal with these requirements.
>
> Alan.
>
>>
>>      out:
>>             mutex_unlock(&xw->priv->phy->pib_lock);
>>
>>     -       if (res) {
>>     -               if (xw->xmit_attempts++ < MAC802154_MAX_XMIT_ATTEMPTS) {
>>     -                       queue_work(xw->priv->dev_workqueue, &xw->work);
>>     -                       return;
>>     -               } else
>>     -                       pr_debug("transmission failed for %d times",
>>     -                                MAC802154_MAX_XMIT_ATTEMPTS);
>>     -       }
>>
>>             dev_kfree_skb(xw->skb);
>>
>>     --
>>     1.7.11.2
>>
>>

* Re: [PATCH v2] r8169: fix auto speed down issue
From: Francois Romieu @ 2013-04-02 21:37 UTC
  To: Hayes Wang; +Cc: netdev, linux-kernel, bowgotsai
In-Reply-To: <1364785324-1604-1-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> It would cause no link after suspending or shutting down when the
> NIC changes the speed to 10M and connects to a link partner which
> forces the speed to 100M.
> 
> Check the link partner ability to determine which speed to set.
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>

Acked-by: Francois Romieu <romieu@fr.zoreil.com>

I have given it a short sanity test with suspend or shutdown (no rpm)
on 8168evl and 8168e, and it did not seem to regress.

The driver does not do what it is asked to when the r81xx - not the
link partner - forces the link. Not that it ever did, but autonegotiation
will now always end up enabled and nothing will prevent a higher than
expected link speed after that.

It could be worth checking MII_BMCR for BMCR_ANENABLE before anything else
in rtl_speed_down.
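
The suggested early check, combined with the patch's idea of looking at the link partner ability, can be sketched as below. The helper names are hypothetical and this is not the r8169 code; the register bit values are the standard MII ones (as in <linux/mii.h>), redefined so the sketch builds without kernel headers.

```c
#include <stdbool.h>

/* Standard MII bit values, redefined under local names. */
#define SK_BMCR_ANENABLE      0x1000  /* BMCR: autonegotiation enabled */
#define SK_ADVERTISE_10HALF   0x0020
#define SK_ADVERTISE_10FULL   0x0040
#define SK_ADVERTISE_100HALF  0x0080
#define SK_ADVERTISE_100FULL  0x0100
#define SK_10_100_MASK (SK_ADVERTISE_10HALF | SK_ADVERTISE_10FULL | \
			SK_ADVERTISE_100HALF | SK_ADVERTISE_100FULL)

/* Hypothetical early-exit check: only rewrite the advertisement when the
 * NIC is autonegotiating, so a speed forced on the r81xx side is kept. */
static bool speed_down_allowed(unsigned int bmcr)
{
	return (bmcr & SK_BMCR_ANENABLE) != 0;
}

/* Hypothetical helper: 10/100 modes we advertise AND the link partner
 * reports (MII_ADVERTISE and MII_LPA share this bit layout). */
static unsigned int common_10_100(unsigned int adv, unsigned int lpa)
{
	return adv & lpa & SK_10_100_MASK;
}
```

An empty intersection (for example 10M-only advertisement against a 100M-only partner) is the no-link case the patch describes.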

[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 28fb50a..bdc03a9 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -3818,6 +3818,30 @@ static void rtl_init_mdio_ops(struct rtl8169_private *tp)
>  	}
>  }
>  
> +static void rtl_speed_down(struct rtl8169_private *tp)
> +{
> +	u32	adv;
> +	int	lpa;
           ^^^^^
Please use a single, true space (no tabs) next time.

-- 
Ueimor

* [PATCH] net: count hw_addr syncs so that unsync works properly.
From: Vlad Yasevich @ 2013-04-02 21:10 UTC
  To: netdev; +Cc: Vlad Yasevich

A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
---
 include/linux/netdevice.h | 2 +-
 net/core/dev_addr_lists.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8bfa956..6151e90 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -210,9 +210,9 @@ struct netdev_hw_addr {
 #define NETDEV_HW_ADDR_T_SLAVE		3
 #define NETDEV_HW_ADDR_T_UNICAST	4
 #define NETDEV_HW_ADDR_T_MULTICAST	5
-	bool			synced;
 	bool			global_use;
 	int			refcount;
+	int			synced;
 	struct rcu_head		rcu_head;
 };
 
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index bd2eb9d..abdc9e6 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -37,7 +37,7 @@ static int __hw_addr_create_ex(struct netdev_hw_addr_list *list,
 	ha->type = addr_type;
 	ha->refcount = 1;
 	ha->global_use = global;
-	ha->synced = false;
+	ha->synced = 0;
 	list_add_tail_rcu(&ha->list, &list->list);
 	list->count++;
 
@@ -165,7 +165,7 @@ int __hw_addr_sync(struct netdev_hw_addr_list *to_list,
 					    addr_len, ha->type);
 			if (err)
 				break;
-			ha->synced = true;
+			ha->synced++;
 			ha->refcount++;
 		} else if (ha->refcount == 1) {
 			__hw_addr_del(to_list, ha->addr, addr_len, ha->type);
@@ -186,7 +186,7 @@ void __hw_addr_unsync(struct netdev_hw_addr_list *to_list,
 		if (ha->synced) {
 			__hw_addr_del(to_list, ha->addr,
 				      addr_len, ha->type);
-			ha->synced = false;
+			ha->synced--;
 			__hw_addr_del(from_list, ha->addr,
 				      addr_len, ha->type);
 		}
-- 
1.8.1.4
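
A simplified model of the leak described in the commit message, assuming one upper-device address entry is synced down to two lower devices. This is not kernel code: each lower device is reduced to a bool saying whether it currently holds the address, and all names are hypothetical.

```c
#include <stdbool.h>

struct entry_bool  { bool synced; };  /* pre-patch: flag */
struct entry_count { int  synced; };  /* patched: counter */

/* Boolean flavour: "synced" only remembers that some sync happened. */
static void sync_bool(struct entry_bool *e, bool *lower)
{
	*lower = true;
	e->synced = true;
}

static void unsync_bool(struct entry_bool *e, bool *lower)
{
	if (e->synced) {          /* false after the first unsync ...    */
		*lower = false;
		e->synced = false;  /* ... so later lowers keep the address */
	}
}

/* Counting flavour: one increment per sync, one decrement per unsync. */
static void sync_count(struct entry_count *e, bool *lower)
{
	*lower = true;
	e->synced++;
}

static void unsync_count(struct entry_count *e, bool *lower)
{
	if (e->synced) {
		*lower = false;
		e->synced--;
	}
}
```

Syncing to two lowers and unsyncing both leaves the second lower still holding the address in the boolean version, while the counter lets both unsync calls take effect.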

* RE: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Huang, Xiong @ 2013-04-02 21:51 UTC
  To: Hannes Frederic Sowa
  Cc: Ben Hutchings, Anders Boström, netdev@vger.kernel.org,
	565404@bugs.debian.org
In-Reply-To: <20130402211524.GE4924@order.stressinduktion.org>

> The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN in
> the driver. MAX_TX_BUF_LEN seems to be arbitrarily set to 0x2000. I can even
> raise it to 0x3000 and don't see any tcp retransmits. Do you have any advice on
> how to size this value (e.g. should we switch to the windows values)?
> 

Would you try 0x4000? Because the buffer-length field in the TX descriptor is 14 bits, 0x4000 exceeds the max value.
Did you find any bug/issue in the code that calculates the length for each TX descriptor?

Thanks
Xiong
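
The arithmetic behind the 14-bit remark: the largest length one descriptor can describe is 0x3fff, so 0x2000 and 0x3000 fit while 0x4000 does not. The sketch below also shows the ceiling division that atl1e_cal_tdp_req() uses to count descriptors per fragment; the macro and function names are hypothetical, not driver code.

```c
#include <stdbool.h>

/* Per the mail above: the TX descriptor buffer-length field is 14 bits. */
#define TPD_LEN_BITS  14
#define TPD_LEN_MAX   ((1u << TPD_LEN_BITS) - 1)   /* 0x3fff */

/* Can a single descriptor describe a buffer of len bytes? */
static bool fits_in_one_tpd(unsigned int len)
{
	return len <= TPD_LEN_MAX;
}

/* Descriptors needed for len bytes when each carries at most buf_len
 * bytes: the same ceiling division as
 * (fg_size + MAX_TX_BUF_LEN - 1) >> MAX_TX_BUF_SHIFT, written with '/'
 * so buf_len need not be a power of two. */
static unsigned int tpds_needed(unsigned int len, unsigned int buf_len)
{
	return (len + buf_len - 1) / buf_len;
}
```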

* Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Eric Dumazet @ 2013-04-02 22:00 UTC
  To: Hannes Frederic Sowa
  Cc: Huang, Xiong, Ben Hutchings, Anders Boström,
	netdev@vger.kernel.org, 565404@bugs.debian.org
In-Reply-To: <20130402211524.GE4924@order.stressinduktion.org>

On Tue, 2013-04-02 at 23:15 +0200, Hannes Frederic Sowa wrote:

> The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
> in the driver. MAX_TX_BUF_LEN seems to be arbitrarily set to 0x2000. I
> can even raise it to 0x3000 and don't see any tcp retransmits. Do you
> have any advice on how to size this value (e.g. should we switch to the
> windows values)?

This looks like an overflow error...

diff --git a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
index 7e0a822..7965f89 100644
--- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
+++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
@@ -1569,18 +1569,17 @@ static u16 atl1e_cal_tdp_req(const struct sk_buff *skb)
 {
 	int i = 0;
 	u16 tpd_req = 1;
-	u16 fg_size = 0;
-	u16 proto_hdr_len = 0;
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		u32 fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+
 		tpd_req += ((fg_size + MAX_TX_BUF_LEN - 1) >> MAX_TX_BUF_SHIFT);
 	}
 
 	if (skb_is_gso(skb)) {
 		if (skb->protocol == htons(ETH_P_IP) ||
 		   (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV6)) {
-			proto_hdr_len = skb_transport_offset(skb) +
+			u32 proto_hdr_len = skb_transport_offset(skb) +
 					tcp_hdrlen(skb);
 			if (proto_hdr_len < skb_headlen(skb)) {
 				tpd_req += ((skb_headlen(skb) - proto_hdr_len +
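
The diff above widens the fg_size and proto_hdr_len locals from u16 to u32. The mail does not say which value overflows, so the following only illustrates the class of bug such a change guards against, assuming a size wider than 16 bits reaches a u16 local: the value wraps modulo 65536 and the descriptor count comes out too small. Function names are hypothetical.

```c
#include <stdint.h>

/* Pre-patch style: the fragment size is stored in a 16-bit local. */
static unsigned int tpds_for_frag_u16(uint32_t frag_size, unsigned int buf_len)
{
	uint16_t fg_size = (uint16_t)frag_size;  /* wraps modulo 65536 */

	return (fg_size + buf_len - 1) / buf_len;
}

/* Patched style: a 32-bit local keeps the full size. */
static unsigned int tpds_for_frag_u32(uint32_t frag_size, unsigned int buf_len)
{
	uint32_t fg_size = frag_size;

	return (fg_size + buf_len - 1) / buf_len;
}
```

A 65536-byte size truncates to 0 in the 16-bit version, so no descriptors are counted for it at all.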

* [PATCH net-next] selftests: net: add PF_PACKET TPACKET v1/v2/v3 selftests
From: Daniel Borkmann @ 2013-04-02 22:12 UTC
  To: davem; +Cc: netdev

This patch adds a simple test case that probes the packet socket's
TPACKET_V1, TPACKET_V2 and TPACKET_V3 behavior regarding mmap(2)'ed
I/O for a small burst of 100 packets. The test currently runs for ...

  TPACKET_V1: RX_RING, TX_RING
  TPACKET_V2: RX_RING, TX_RING
  TPACKET_V3: RX_RING

... and will output on success:

  test: TPACKET_V1 with PACKET_RX_RING .................... 100 pkts (9600 bytes)
  test: TPACKET_V1 with PACKET_TX_RING .................... 100 pkts (9600 bytes)
  test: TPACKET_V2 with PACKET_RX_RING .................... 100 pkts (9600 bytes)
  test: TPACKET_V2 with PACKET_TX_RING .................... 100 pkts (9600 bytes)
  test: TPACKET_V3 with PACKET_RX_RING .................... 100 pkts (9600 bytes)
  OK. All tests passed
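
The RX side of the mmap(2)'ed I/O relies on a per-frame ownership handshake: walk_v1_v2_rx() consumes frames whose status has TP_STATUS_USER set, writes TP_STATUS_KERNEL back, and advances cyclically. A minimal simulation of that loop without sockets, assuming a plain array stands in for the tp_status words of the ring (names are hypothetical; __sync_synchronize() is the GCC builtin the test also uses):

```c
#include <stdint.h>

/* Status values as in <linux/if_packet.h>, redefined for a standalone build. */
#define SK_TP_STATUS_KERNEL  0
#define SK_TP_STATUS_USER    1

/* status[] stands in for the tp_status word of each mmap'ed frame.
 * Consume every frame marked TP_STATUS_USER, hand it back to the
 * kernel, and advance cyclically. Returns the frames consumed. */
static unsigned int walk_ring(uint32_t *status, unsigned int nr_frames,
			      unsigned int *frame_num)
{
	unsigned int consumed = 0;

	while ((status[*frame_num] & SK_TP_STATUS_USER) == SK_TP_STATUS_USER) {
		/* a real reader would inspect the payload here */
		status[*frame_num] = SK_TP_STATUS_KERNEL;  /* give the frame back */
		__sync_synchronize();                      /* full memory barrier */
		consumed++;
		*frame_num = (*frame_num + 1) % nr_frames;
	}
	return consumed;
}
```

The walk stops at the first frame still owned by the kernel; in the test program that is where it calls poll() before trying again.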

Reusable parts of psock_fanout.c have been put into a psock_lib.h
file for common usage. Test case successfully tested on x86_64.
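
The shared pair_udp_setfilter() BPF program accepts a packet when it is at least DATA_LEN bytes long and the bytes at absolute offsets 80 and 81 both equal DATA_CHAR, and then keeps 0x60 (96) bytes, which is consistent with the 9600 bytes reported for 100 packets above. A plain-C restatement of that filter, as a sketch only (the function name is hypothetical; DATA_LEN and DATA_CHAR are the psock_lib.h values):

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_LEN	100
#define DATA_CHAR	'a'

/* Offsets are absolute from the start of the frame the packet socket
 * sees. Returns the snap length the filter would keep (0 = drop). */
static unsigned int udp_pair_filter(const uint8_t *pkt, size_t len)
{
	if (len < DATA_LEN)          /* LD pktlen; JGE DATA_LEN */
		return 0;
	if (pkt[80] != DATA_CHAR)    /* LD [80]; JEQ DATA_CHAR */
		return 0;
	if (pkt[81] != DATA_CHAR)    /* LD [81]; JEQ DATA_CHAR */
		return 0;
	return 0x60;                 /* RET match: keep 96 bytes */
}
```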

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 tools/testing/selftests/net/Makefile          |   4 +-
 tools/testing/selftests/net/psock_fanout.c    |  88 +--
 tools/testing/selftests/net/psock_lib.h       | 127 ++++
 tools/testing/selftests/net/psock_tpacket.c   | 824 ++++++++++++++++++++++++++
 tools/testing/selftests/net/run_afpackettests |  10 +
 5 files changed, 966 insertions(+), 87 deletions(-)
 create mode 100644 tools/testing/selftests/net/psock_lib.h
 create mode 100644 tools/testing/selftests/net/psock_tpacket.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index bd6e272..750512b 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -1,11 +1,11 @@
 # Makefile for net selftests
 
 CC = $(CROSS_COMPILE)gcc
-CFLAGS = -Wall
+CFLAGS = -Wall -O2 -g
 
 CFLAGS += -I../../../../usr/include/
 
-NET_PROGS = socket psock_fanout
+NET_PROGS = socket psock_fanout psock_tpacket
 
 all: $(NET_PROGS)
 %: %.c
diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
index 59bd636..57b9c2b 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -61,91 +61,9 @@
 #include <sys/types.h>
 #include <unistd.h>
 
-#define DATA_LEN			100
-#define DATA_CHAR			'a'
-#define RING_NUM_FRAMES			20
-#define PORT_BASE			8000
-
-static void pair_udp_open(int fds[], uint16_t port)
-{
-	struct sockaddr_in saddr, daddr;
-
-	fds[0] = socket(PF_INET, SOCK_DGRAM, 0);
-	fds[1] = socket(PF_INET, SOCK_DGRAM, 0);
-	if (fds[0] == -1 || fds[1] == -1) {
-		fprintf(stderr, "ERROR: socket dgram\n");
-		exit(1);
-	}
-
-	memset(&saddr, 0, sizeof(saddr));
-	saddr.sin_family = AF_INET;
-	saddr.sin_port = htons(port);
-	saddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
-
-	memset(&daddr, 0, sizeof(daddr));
-	daddr.sin_family = AF_INET;
-	daddr.sin_port = htons(port + 1);
-	daddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+#include "psock_lib.h"
 
-	/* must bind both to get consistent hash result */
-	if (bind(fds[1], (void *) &daddr, sizeof(daddr))) {
-		perror("bind");
-		exit(1);
-	}
-	if (bind(fds[0], (void *) &saddr, sizeof(saddr))) {
-		perror("bind");
-		exit(1);
-	}
-	if (connect(fds[0], (void *) &daddr, sizeof(daddr))) {
-		perror("connect");
-		exit(1);
-	}
-}
-
-static void pair_udp_send(int fds[], int num)
-{
-	char buf[DATA_LEN], rbuf[DATA_LEN];
-
-	memset(buf, DATA_CHAR, sizeof(buf));
-	while (num--) {
-		/* Should really handle EINTR and EAGAIN */
-		if (write(fds[0], buf, sizeof(buf)) != sizeof(buf)) {
-			fprintf(stderr, "ERROR: send failed left=%d\n", num);
-			exit(1);
-		}
-		if (read(fds[1], rbuf, sizeof(rbuf)) != sizeof(rbuf)) {
-			fprintf(stderr, "ERROR: recv failed left=%d\n", num);
-			exit(1);
-		}
-		if (memcmp(buf, rbuf, sizeof(buf))) {
-			fprintf(stderr, "ERROR: data failed left=%d\n", num);
-			exit(1);
-		}
-	}
-}
-
-static void sock_fanout_setfilter(int fd)
-{
-	struct sock_filter bpf_filter[] = {
-		{ 0x80, 0, 0, 0x00000000 },  /* LD  pktlen		      */
-		{ 0x35, 0, 5, DATA_LEN   },  /* JGE DATA_LEN  [f goto nomatch]*/
-		{ 0x30, 0, 0, 0x00000050 },  /* LD  ip[80]		      */
-		{ 0x15, 0, 3, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
-		{ 0x30, 0, 0, 0x00000051 },  /* LD  ip[81]		      */
-		{ 0x15, 0, 1, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
-		{ 0x6, 0, 0, 0x00000060  },  /* RET match	              */
-/* nomatch */	{ 0x6, 0, 0, 0x00000000  },  /* RET no match		      */
-	};
-	struct sock_fprog bpf_prog;
-
-	bpf_prog.filter = bpf_filter;
-	bpf_prog.len = sizeof(bpf_filter) / sizeof(struct sock_filter);
-	if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf_prog,
-		       sizeof(bpf_prog))) {
-		perror("setsockopt SO_ATTACH_FILTER");
-		exit(1);
-	}
-}
+#define RING_NUM_FRAMES			20
 
 /* Open a socket in a given fanout mode.
  * @return -1 if mode is bad, a valid socket otherwise */
@@ -169,7 +87,7 @@ static int sock_fanout_open(uint16_t typeflags, int num_packets)
 		return -1;
 	}
 
-	sock_fanout_setfilter(fd);
+	pair_udp_setfilter(fd);
 	return fd;
 }
 
diff --git a/tools/testing/selftests/net/psock_lib.h b/tools/testing/selftests/net/psock_lib.h
new file mode 100644
index 0000000..37da54a
--- /dev/null
+++ b/tools/testing/selftests/net/psock_lib.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright 2013 Google Inc.
+ * Author: Willem de Bruijn <willemb@google.com>
+ *         Daniel Borkmann <dborkman@redhat.com>
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef PSOCK_LIB_H
+#define PSOCK_LIB_H
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <arpa/inet.h>
+#include <unistd.h>
+
+#define DATA_LEN			100
+#define DATA_CHAR			'a'
+
+#define PORT_BASE			8000
+
+#ifndef __maybe_unused
+# define __maybe_unused		__attribute__ ((__unused__))
+#endif
+
+static __maybe_unused void pair_udp_setfilter(int fd)
+{
+	struct sock_filter bpf_filter[] = {
+		{ 0x80, 0, 0, 0x00000000 },  /* LD  pktlen		      */
+		{ 0x35, 0, 5, DATA_LEN   },  /* JGE DATA_LEN  [f goto nomatch]*/
+		{ 0x30, 0, 0, 0x00000050 },  /* LD  ip[80]		      */
+		{ 0x15, 0, 3, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
+		{ 0x30, 0, 0, 0x00000051 },  /* LD  ip[81]		      */
+		{ 0x15, 0, 1, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
+		{ 0x06, 0, 0, 0x00000060 },  /* RET match	              */
+		{ 0x06, 0, 0, 0x00000000 },  /* RET no match		      */
+	};
+	struct sock_fprog bpf_prog;
+
+	bpf_prog.filter = bpf_filter;
+	bpf_prog.len = sizeof(bpf_filter) / sizeof(struct sock_filter);
+	if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf_prog,
+		       sizeof(bpf_prog))) {
+		perror("setsockopt SO_ATTACH_FILTER");
+		exit(1);
+	}
+}
+
+static __maybe_unused void pair_udp_open(int fds[], uint16_t port)
+{
+	struct sockaddr_in saddr, daddr;
+
+	fds[0] = socket(PF_INET, SOCK_DGRAM, 0);
+	fds[1] = socket(PF_INET, SOCK_DGRAM, 0);
+	if (fds[0] == -1 || fds[1] == -1) {
+		fprintf(stderr, "ERROR: socket dgram\n");
+		exit(1);
+	}
+
+	memset(&saddr, 0, sizeof(saddr));
+	saddr.sin_family = AF_INET;
+	saddr.sin_port = htons(port);
+	saddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+	memset(&daddr, 0, sizeof(daddr));
+	daddr.sin_family = AF_INET;
+	daddr.sin_port = htons(port + 1);
+	daddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+	/* must bind both to get consistent hash result */
+	if (bind(fds[1], (void *) &daddr, sizeof(daddr))) {
+		perror("bind");
+		exit(1);
+	}
+	if (bind(fds[0], (void *) &saddr, sizeof(saddr))) {
+		perror("bind");
+		exit(1);
+	}
+	if (connect(fds[0], (void *) &daddr, sizeof(daddr))) {
+		perror("connect");
+		exit(1);
+	}
+}
+
+static __maybe_unused void pair_udp_send(int fds[], int num)
+{
+	char buf[DATA_LEN], rbuf[DATA_LEN];
+
+	memset(buf, DATA_CHAR, sizeof(buf));
+	while (num--) {
+		/* Should really handle EINTR and EAGAIN */
+		if (write(fds[0], buf, sizeof(buf)) != sizeof(buf)) {
+			fprintf(stderr, "ERROR: send failed left=%d\n", num);
+			exit(1);
+		}
+		if (read(fds[1], rbuf, sizeof(rbuf)) != sizeof(rbuf)) {
+			fprintf(stderr, "ERROR: recv failed left=%d\n", num);
+			exit(1);
+		}
+		if (memcmp(buf, rbuf, sizeof(buf))) {
+			fprintf(stderr, "ERROR: data failed left=%d\n", num);
+			exit(1);
+		}
+	}
+}
+
+static __maybe_unused void pair_udp_close(int fds[])
+{
+	close(fds[0]);
+	close(fds[1]);
+}
+
+#endif /* PSOCK_LIB_H */
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
new file mode 100644
index 0000000..a8d7ffa
--- /dev/null
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -0,0 +1,824 @@
+/*
+ * Copyright 2013 Red Hat, Inc.
+ * Author: Daniel Borkmann <dborkman@redhat.com>
+ *
+ * A basic test of packet socket's TPACKET_V1/TPACKET_V2/TPACKET_V3 behavior.
+ *
+ * Control:
+ *   Test the setup of the TPACKET socket with different patterns that are
+ *   known to fail (TODO) resp. succeed (OK).
+ *
+ * Datapath:
+ *   Open a pair of packet sockets and send resp. receive an a priori known
+ *   packet pattern accross the sockets and check if it was received resp.
+ *   sent correctly. Fanout in combination with RX_RING is currently not
+ *   tested here.
+ *
+ *   The test currently runs for
+ *   - TPACKET_V1: RX_RING, TX_RING
+ *   - TPACKET_V2: RX_RING, TX_RING
+ *   - TPACKET_V3: RX_RING
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <linux/if_packet.h>
+#include <linux/filter.h>
+#include <ctype.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <bits/wordsize.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <arpa/inet.h>
+#include <stdint.h>
+#include <string.h>
+#include <assert.h>
+#include <net/if.h>
+#include <inttypes.h>
+#include <poll.h>
+
+#include "psock_lib.h"
+
+#ifndef bug_on
+# define bug_on(cond)		assert(!(cond))
+#endif
+
+#ifndef __aligned_tpacket
+# define __aligned_tpacket	__attribute__((aligned(TPACKET_ALIGNMENT)))
+#endif
+
+#ifndef __align_tpacket
+# define __align_tpacket(x)	__attribute__((aligned(TPACKET_ALIGN(x))))
+#endif
+
+#define BLOCK_STATUS(x)		((x)->h1.block_status)
+#define BLOCK_NUM_PKTS(x)	((x)->h1.num_pkts)
+#define BLOCK_O2FP(x)		((x)->h1.offset_to_first_pkt)
+#define BLOCK_LEN(x)		((x)->h1.blk_len)
+#define BLOCK_SNUM(x)		((x)->h1.seq_num)
+#define BLOCK_O2PRIV(x)		((x)->offset_to_priv)
+#define BLOCK_PRIV(x)		((void *) ((uint8_t *) (x) + BLOCK_O2PRIV(x)))
+#define BLOCK_HDR_LEN		(ALIGN_8(sizeof(struct block_desc)))
+#define ALIGN_8(x)		(((x) + 8 - 1) & ~(8 - 1))
+#define BLOCK_PLUS_PRIV(sz_pri)	(BLOCK_HDR_LEN + ALIGN_8((sz_pri)))
+
+#define NUM_PACKETS		100
+
+struct ring {
+	struct iovec *rd;
+	uint8_t *mm_space;
+	size_t mm_len, rd_len;
+	struct sockaddr_ll ll;
+	void (*walk)(int sock, struct ring *ring);
+	int type, rd_num, flen, version;
+	union {
+		struct tpacket_req  req;
+		struct tpacket_req3 req3;
+	};
+};
+
+struct block_desc {
+	uint32_t version;
+	uint32_t offset_to_priv;
+	struct tpacket_hdr_v1 h1;
+};
+
+union frame_map {
+	struct {
+		struct tpacket_hdr tp_h __aligned_tpacket;
+		struct sockaddr_ll s_ll __align_tpacket(sizeof(struct tpacket_hdr));
+	} *v1;
+	struct {
+		struct tpacket2_hdr tp_h __aligned_tpacket;
+		struct sockaddr_ll s_ll __align_tpacket(sizeof(struct tpacket2_hdr));
+	} *v2;
+	void *raw;
+};
+
+static unsigned int total_packets, total_bytes;
+
+static int pfsocket(int ver)
+{
+	int ret, sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+	if (sock == -1) {
+		perror("socket");
+		exit(1);
+	}
+
+	ret = setsockopt(sock, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver));
+	if (ret == -1) {
+		perror("setsockopt");
+		exit(1);
+	}
+
+	return sock;
+}
+
+static void status_bar_update(void)
+{
+	if (total_packets % 10 == 0) {
+		fprintf(stderr, ".");
+		fflush(stderr);
+	}
+}
+
+static void test_payload(void *pay, size_t len)
+{
+	struct ethhdr *eth = pay;
+
+	if (len < sizeof(struct ethhdr)) {
+		fprintf(stderr, "test_payload: packet too "
+			"small: %zu bytes!\n", len);
+		exit(1);
+	}
+
+	if (eth->h_proto != htons(ETH_P_IP)) {
+		fprintf(stderr, "test_payload: wrong ethernet "
+			"type: 0x%x!\n", ntohs(eth->h_proto));
+		exit(1);
+	}
+}
+
+static void create_payload(void *pay, size_t *len)
+{
+	int i;
+	struct ethhdr *eth = pay;
+	struct iphdr *ip = pay + sizeof(*eth);
+
+	/* Lets create some broken crap, that still passes
+	 * our BPF filter.
+	 */
+
+	*len = DATA_LEN + 42;
+
+	memset(pay, 0xff, ETH_ALEN * 2);
+	eth->h_proto = htons(ETH_P_IP);
+
+	for (i = 0; i < sizeof(*ip); ++i)
+		((uint8_t *) pay)[i + sizeof(*eth)] = (uint8_t) rand();
+
+	ip->ihl = 5;
+	ip->version = 4;
+	ip->protocol = 0x11;
+	ip->frag_off = 0;
+	ip->ttl = 64;
+	ip->tot_len = htons((uint16_t) *len - sizeof(*eth));
+
+	ip->saddr = htonl(INADDR_LOOPBACK);
+	ip->daddr = htonl(INADDR_LOOPBACK);
+
+	memset(pay + sizeof(*eth) + sizeof(*ip),
+	       DATA_CHAR, DATA_LEN);
+}
+
+static inline int __v1_rx_kernel_ready(struct tpacket_hdr *hdr)
+{
+	return ((hdr->tp_status & TP_STATUS_USER) == TP_STATUS_USER);
+}
+
+static inline void __v1_rx_user_ready(struct tpacket_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_KERNEL;
+	__sync_synchronize();
+}
+
+static inline int __v2_rx_kernel_ready(struct tpacket2_hdr *hdr)
+{
+	return ((hdr->tp_status & TP_STATUS_USER) == TP_STATUS_USER);
+}
+
+static inline void __v2_rx_user_ready(struct tpacket2_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_KERNEL;
+	__sync_synchronize();
+}
+
+static inline int __v1_v2_rx_kernel_ready(void *base, int version)
+{
+	switch (version) {
+	case TPACKET_V1:
+		return __v1_rx_kernel_ready(base);
+	case TPACKET_V2:
+		return __v2_rx_kernel_ready(base);
+	default:
+		bug_on(1);
+		return 0;
+	}
+}
+
+static inline void __v1_v2_rx_user_ready(void *base, int version)
+{
+	switch (version) {
+	case TPACKET_V1:
+		__v1_rx_user_ready(base);
+		break;
+	case TPACKET_V2:
+		__v2_rx_user_ready(base);
+		break;
+	}
+}
+
+static void walk_v1_v2_rx(int sock, struct ring *ring)
+{
+	struct pollfd pfd;
+	int udp_sock[2];
+	union frame_map ppd;
+	unsigned int frame_num = 0;
+
+	bug_on(ring->type != PACKET_RX_RING);
+
+	pair_udp_open(udp_sock, PORT_BASE);
+	pair_udp_setfilter(sock);
+
+	memset(&pfd, 0, sizeof(pfd));
+	pfd.fd = sock;
+	pfd.events = POLLIN | POLLERR;
+	pfd.revents = 0;
+
+	pair_udp_send(udp_sock, NUM_PACKETS);
+
+	while (total_packets < NUM_PACKETS * 2) {
+		while (__v1_v2_rx_kernel_ready(ring->rd[frame_num].iov_base,
+					       ring->version)) {
+			ppd.raw = ring->rd[frame_num].iov_base;
+
+			switch (ring->version) {
+			case TPACKET_V1:
+				test_payload((uint8_t *) ppd.raw + ppd.v1->tp_h.tp_mac,
+					     ppd.v1->tp_h.tp_snaplen);
+				total_bytes += ppd.v1->tp_h.tp_snaplen;
+				break;
+
+			case TPACKET_V2:
+				test_payload((uint8_t *) ppd.raw + ppd.v2->tp_h.tp_mac,
+					     ppd.v2->tp_h.tp_snaplen);
+				total_bytes += ppd.v2->tp_h.tp_snaplen;
+				break;
+			}
+
+			status_bar_update();
+			total_packets++;
+
+			__v1_v2_rx_user_ready(ppd.raw, ring->version);
+
+			frame_num = (frame_num + 1) % ring->rd_num;
+		}
+
+		poll(&pfd, 1, 1);
+	}
+
+	pair_udp_close(udp_sock);
+
+	if (total_packets != 2 * NUM_PACKETS) {
+		fprintf(stderr, "walk_v%d_rx: received %u out of %u pkts\n",
+			ring->version + 1, total_packets, 2 * NUM_PACKETS);
+		exit(1);
+	}
+
+	fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, total_bytes >> 1);
+}
+
+static inline int __v1_tx_kernel_ready(struct tpacket_hdr *hdr)
+{
+	return ((hdr->tp_status & TP_STATUS_AVAILABLE) == TP_STATUS_AVAILABLE);
+}
+
+static inline void __v1_tx_user_ready(struct tpacket_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_SEND_REQUEST;
+	__sync_synchronize();
+}
+
+static inline int __v2_tx_kernel_ready(struct tpacket2_hdr *hdr)
+{
+	return ((hdr->tp_status & TP_STATUS_AVAILABLE) == TP_STATUS_AVAILABLE);
+}
+
+static inline void __v2_tx_user_ready(struct tpacket2_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_SEND_REQUEST;
+	__sync_synchronize();
+}
+
+static inline int __v1_v2_tx_kernel_ready(void *base, int version)
+{
+	switch (version) {
+	case TPACKET_V1:
+		return __v1_tx_kernel_ready(base);
+	case TPACKET_V2:
+		return __v2_tx_kernel_ready(base);
+	default:
+		bug_on(1);
+		return 0;
+	}
+}
+
+static inline void __v1_v2_tx_user_ready(void *base, int version)
+{
+	switch (version) {
+	case TPACKET_V1:
+		__v1_tx_user_ready(base);
+		break;
+	case TPACKET_V2:
+		__v2_tx_user_ready(base);
+		break;
+	}
+}
+
+static void __v1_v2_set_packet_loss_discard(int sock)
+{
+	int ret, discard = 1;
+
+	ret = setsockopt(sock, SOL_PACKET, PACKET_LOSS, (void *) &discard,
+			 sizeof(discard));
+	if (ret == -1) {
+		perror("setsockopt");
+		exit(1);
+	}
+}
+
+static void walk_v1_v2_tx(int sock, struct ring *ring)
+{
+	struct pollfd pfd;
+	int rcv_sock, ret;
+	size_t packet_len;
+	union frame_map ppd;
+	char packet[1024];
+	unsigned int frame_num = 0, got = 0;
+	struct sockaddr_ll ll = {
+		.sll_family = PF_PACKET,
+		.sll_halen = ETH_ALEN,
+	};
+
+	bug_on(ring->type != PACKET_TX_RING);
+	bug_on(ring->rd_num < NUM_PACKETS);
+
+	rcv_sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+	if (rcv_sock == -1) {
+		perror("socket");
+		exit(1);
+	}
+
+	pair_udp_setfilter(rcv_sock);
+
+	ll.sll_ifindex = if_nametoindex("lo");
+	ret = bind(rcv_sock, (struct sockaddr *) &ll, sizeof(ll));
+	if (ret == -1) {
+		perror("bind");
+		exit(1);
+	}
+
+	memset(&pfd, 0, sizeof(pfd));
+	pfd.fd = sock;
+	pfd.events = POLLOUT | POLLERR;
+	pfd.revents = 0;
+
+	total_packets = NUM_PACKETS;
+	create_payload(packet, &packet_len);
+
+	while (total_packets > 0) {
+		while (__v1_v2_tx_kernel_ready(ring->rd[frame_num].iov_base,
+					       ring->version) &&
+		       total_packets > 0) {
+			ppd.raw = ring->rd[frame_num].iov_base;
+
+			switch (ring->version) {
+			case TPACKET_V1:
+				ppd.v1->tp_h.tp_snaplen = packet_len;
+				ppd.v1->tp_h.tp_len = packet_len;
+
+				memcpy((uint8_t *) ppd.raw + TPACKET_HDRLEN -
+				       sizeof(struct sockaddr_ll), packet,
+				       packet_len);
+				total_bytes += ppd.v1->tp_h.tp_snaplen;
+				break;
+
+			case TPACKET_V2:
+				ppd.v2->tp_h.tp_snaplen = packet_len;
+				ppd.v2->tp_h.tp_len = packet_len;
+
+				memcpy((uint8_t *) ppd.raw + TPACKET2_HDRLEN -
+				       sizeof(struct sockaddr_ll), packet,
+				       packet_len);
+				total_bytes += ppd.v2->tp_h.tp_snaplen;
+				break;
+			}
+
+			status_bar_update();
+			total_packets--;
+
+			__v1_v2_tx_user_ready(ppd.raw, ring->version);
+
+			frame_num = (frame_num + 1) % ring->rd_num;
+		}
+
+		poll(&pfd, 1, 1);
+	}
+
+	bug_on(total_packets != 0);
+
+	ret = sendto(sock, NULL, 0, 0, NULL, 0);
+	if (ret == -1) {
+		perror("sendto");
+		exit(1);
+	}
+
+	while ((ret = recvfrom(rcv_sock, packet, sizeof(packet),
+			       0, NULL, NULL)) > 0 &&
+	       total_packets < NUM_PACKETS) {
+		got += ret;
+		test_payload(packet, ret);
+
+		status_bar_update();
+		total_packets++;
+	}
+
+	close(rcv_sock);
+
+	if (total_packets != NUM_PACKETS) {
+		fprintf(stderr, "walk_v%d_tx: received %u out of %u pkts\n",
+			ring->version + 1, total_packets, NUM_PACKETS);
+		exit(1);
+	}
+
+	fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, got);
+}
+
+static void walk_v1_v2(int sock, struct ring *ring)
+{
+	if (ring->type == PACKET_RX_RING)
+		walk_v1_v2_rx(sock, ring);
+	else
+		walk_v1_v2_tx(sock, ring);
+}
+
+static uint64_t __v3_prev_block_seq_num = 0;
+
+void __v3_test_block_seq_num(struct block_desc *pbd)
+{
+	if (__v3_prev_block_seq_num + 1 != BLOCK_SNUM(pbd)) {
+		fprintf(stderr, "\nprev_block_seq_num:%"PRIu64", expected "
+			"seq:%"PRIu64" != actual seq:%"PRIu64"\n",
+			__v3_prev_block_seq_num, __v3_prev_block_seq_num + 1,
+			(uint64_t) BLOCK_SNUM(pbd));
+		exit(1);
+	}
+
+	__v3_prev_block_seq_num = BLOCK_SNUM(pbd);
+}
+
+static void __v3_test_block_len(struct block_desc *pbd, uint32_t bytes, int block_num)
+{
+	if (BLOCK_NUM_PKTS(pbd)) {
+		if (bytes != BLOCK_LEN(pbd)) {
+			fprintf(stderr, "\nblock:%u with %u packets, expected "
+				"len:%u != actual len:%u\n", block_num,
+				BLOCK_NUM_PKTS(pbd), bytes, BLOCK_LEN(pbd));
+			exit(1);
+		}
+	} else {
+		if (BLOCK_LEN(pbd) != BLOCK_PLUS_PRIV(13)) {
+			fprintf(stderr, "\nblock:%u, expected len:%lu != "
+				"actual len:%u\n", block_num, BLOCK_HDR_LEN,
+				BLOCK_LEN(pbd));
+			exit(1);
+		}
+	}
+}
+
+static void __v3_test_block_header(struct block_desc *pbd, const int block_num)
+{
+	uint32_t block_status = BLOCK_STATUS(pbd);
+
+	if ((block_status & TP_STATUS_USER) == 0) {
+		fprintf(stderr, "\nblock %u: not in TP_STATUS_USER\n", block_num);
+		exit(1);
+	}
+
+	__v3_test_block_seq_num(pbd);
+}
+
+static void __v3_walk_block(struct block_desc *pbd, const int block_num)
+{
+	int num_pkts = BLOCK_NUM_PKTS(pbd), i;
+	unsigned long bytes = 0;
+	unsigned long bytes_with_padding = BLOCK_PLUS_PRIV(13);
+	struct tpacket3_hdr *ppd;
+
+	__v3_test_block_header(pbd, block_num);
+
+	ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd + BLOCK_O2FP(pbd));
+	for (i = 0; i < num_pkts; ++i) {
+		bytes += ppd->tp_snaplen;
+
+		if (ppd->tp_next_offset)
+			bytes_with_padding += ppd->tp_next_offset;
+		else
+			bytes_with_padding += ALIGN_8(ppd->tp_snaplen + ppd->tp_mac);
+
+		test_payload((uint8_t *) ppd + ppd->tp_mac, ppd->tp_snaplen);
+
+		status_bar_update();
+		total_packets++;
+
+		ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd + ppd->tp_next_offset);
+		__sync_synchronize();
+	}
+
+	__v3_test_block_len(pbd, bytes_with_padding, block_num);
+	total_bytes += bytes;
+}
+
+void __v3_flush_block(struct block_desc *pbd)
+{
+	BLOCK_STATUS(pbd) = TP_STATUS_KERNEL;
+	__sync_synchronize();
+}
+
+static void walk_v3_rx(int sock, struct ring *ring)
+{
+	unsigned int block_num = 0;
+	struct pollfd pfd;
+	struct block_desc *pbd;
+	int udp_sock[2];
+
+	bug_on(ring->type != PACKET_RX_RING);
+
+	pair_udp_open(udp_sock, PORT_BASE);
+	pair_udp_setfilter(sock);
+
+	memset(&pfd, 0, sizeof(pfd));
+	pfd.fd = sock;
+	pfd.events = POLLIN | POLLERR;
+	pfd.revents = 0;
+
+	pair_udp_send(udp_sock, NUM_PACKETS);
+
+	while (total_packets < NUM_PACKETS * 2) {
+		pbd = (struct block_desc *) ring->rd[block_num].iov_base;
+
+		while ((BLOCK_STATUS(pbd) & TP_STATUS_USER) == 0)
+			poll(&pfd, 1, 1);
+
+		__v3_walk_block(pbd, block_num);
+		__v3_flush_block(pbd);
+
+		block_num = (block_num + 1) % ring->rd_num;
+	}
+
+	pair_udp_close(udp_sock);
+
+	if (total_packets != 2 * NUM_PACKETS) {
+		fprintf(stderr, "walk_v3_rx: received %u out of %u pkts\n",
+			total_packets, 2 * NUM_PACKETS);
+		exit(1);
+	}
+
+	fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, total_bytes >> 1);
+}
+
+static void walk_v3(int sock, struct ring *ring)
+{
+	if (ring->type == PACKET_RX_RING)
+		walk_v3_rx(sock, ring);
+	else
+		bug_on(1);
+}
+
+static void __v1_v2_fill(struct ring *ring, unsigned int blocks)
+{
+	ring->req.tp_block_size = getpagesize() << 2;
+	ring->req.tp_frame_size = TPACKET_ALIGNMENT << 7;
+	ring->req.tp_block_nr = blocks;
+
+	ring->req.tp_frame_nr = ring->req.tp_block_size /
+				ring->req.tp_frame_size *
+				ring->req.tp_block_nr;
+
+	ring->mm_len = ring->req.tp_block_size * ring->req.tp_block_nr;
+	ring->walk = walk_v1_v2;
+	ring->rd_num = ring->req.tp_frame_nr;
+	ring->flen = ring->req.tp_frame_size;
+}
+
+static void __v3_fill(struct ring *ring, unsigned int blocks)
+{
+	ring->req3.tp_retire_blk_tov = 64;
+	ring->req3.tp_sizeof_priv = 13;
+	ring->req3.tp_feature_req_word |= TP_FT_REQ_FILL_RXHASH;
+
+	ring->req3.tp_block_size = getpagesize() << 2;
+	ring->req3.tp_frame_size = TPACKET_ALIGNMENT << 7;
+	ring->req3.tp_block_nr = blocks;
+
+	ring->req3.tp_frame_nr = ring->req3.tp_block_size /
+				 ring->req3.tp_frame_size *
+				 ring->req3.tp_block_nr;
+
+	ring->mm_len = ring->req3.tp_block_size * ring->req3.tp_block_nr;
+	ring->walk = walk_v3;
+	ring->rd_num = ring->req3.tp_block_nr;
+	ring->flen = ring->req3.tp_block_size;
+}
+
+static void setup_ring(int sock, struct ring *ring, int version, int type)
+{
+	int ret = 0;
+	unsigned int blocks = 256;
+
+	ring->type = type;
+	ring->version = version;
+
+	switch (version) {
+	case TPACKET_V1:
+	case TPACKET_V2:
+		if (type == PACKET_TX_RING)
+			__v1_v2_set_packet_loss_discard(sock);
+		__v1_v2_fill(ring, blocks);
+		ret = setsockopt(sock, SOL_PACKET, type, &ring->req,
+				 sizeof(ring->req));
+		break;
+
+	case TPACKET_V3:
+		__v3_fill(ring, blocks);
+		ret = setsockopt(sock, SOL_PACKET, type, &ring->req3,
+				 sizeof(ring->req3));
+		break;
+	}
+
+	if (ret == -1) {
+		perror("setsockopt");
+		exit(1);
+	}
+
+	ring->rd_len = ring->rd_num * sizeof(*ring->rd);
+	ring->rd = malloc(ring->rd_len);
+	if (ring->rd == NULL) {
+		perror("malloc");
+		exit(1);
+	}
+
+	total_packets = 0;
+	total_bytes = 0;
+}
+
+static void mmap_ring(int sock, struct ring *ring)
+{
+	int i;
+
+	ring->mm_space = mmap(0, ring->mm_len, PROT_READ | PROT_WRITE,
+			      MAP_SHARED | MAP_LOCKED | MAP_POPULATE, sock, 0);
+	if (ring->mm_space == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	memset(ring->rd, 0, ring->rd_len);
+	for (i = 0; i < ring->rd_num; ++i) {
+		ring->rd[i].iov_base = ring->mm_space + (i * ring->flen);
+		ring->rd[i].iov_len = ring->flen;
+	}
+}
+
+static void bind_ring(int sock, struct ring *ring)
+{
+	int ret;
+
+	ring->ll.sll_family = PF_PACKET;
+	ring->ll.sll_protocol = htons(ETH_P_ALL);
+	ring->ll.sll_ifindex = if_nametoindex("lo");
+	ring->ll.sll_hatype = 0;
+	ring->ll.sll_pkttype = 0;
+	ring->ll.sll_halen = 0;
+
+	ret = bind(sock, (struct sockaddr *) &ring->ll, sizeof(ring->ll));
+	if (ret == -1) {
+		perror("bind");
+		exit(1);
+	}
+}
+
+static void walk_ring(int sock, struct ring *ring)
+{
+	ring->walk(sock, ring);
+}
+
+static void unmap_ring(int sock, struct ring *ring)
+{
+	munmap(ring->mm_space, ring->mm_len);
+	free(ring->rd);
+}
+
+static int test_kernel_bit_width(void)
+{
+	char in[512], *ptr;
+	int num = 0, fd;
+	ssize_t ret;
+
+	fd = open("/proc/kallsyms", O_RDONLY);
+	if (fd == -1) {
+		perror("open");
+		exit(1);
+	}
+
+	ret = read(fd, in, sizeof(in));
+	if (ret <= 0) {
+		perror("read");
+		exit(1);
+	}
+
+	close(fd);
+
+	ptr = in;
+	while(!isspace(*ptr)) {
+		num++;
+		ptr++;
+	}
+
+	return num * 4;
+}
+
+static int test_user_bit_width(void)
+{
+	return __WORDSIZE;
+}
+
+static const char *tpacket_str[] = {
+	[TPACKET_V1] = "TPACKET_V1",
+	[TPACKET_V2] = "TPACKET_V2",
+	[TPACKET_V3] = "TPACKET_V3",
+};
+
+static const char *type_str[] = {
+	[PACKET_RX_RING] = "PACKET_RX_RING",
+	[PACKET_TX_RING] = "PACKET_TX_RING",
+};
+
+static int test_tpacket(int version, int type)
+{
+	int sock;
+	struct ring ring;
+
+	fprintf(stderr, "test: %s with %s ", tpacket_str[version],
+		type_str[type]);
+	fflush(stderr);
+
+	if (version == TPACKET_V1 &&
+	    test_kernel_bit_width() != test_user_bit_width()) {
+		fprintf(stderr, "test: skip %s %s since user and kernel "
+			"space have different bit width\n",
+			tpacket_str[version], type_str[type]);
+		return 0;
+	}
+
+	sock = pfsocket(version);
+	memset(&ring, 0, sizeof(ring));
+	setup_ring(sock, &ring, version, type);
+	mmap_ring(sock, &ring);
+	bind_ring(sock, &ring);
+	walk_ring(sock, &ring);
+	unmap_ring(sock, &ring);
+	close(sock);
+
+	fprintf(stderr, "\n");
+	return 0;
+}
+
+int main(void)
+{
+	int ret = 0;
+
+	ret |= test_tpacket(TPACKET_V1, PACKET_RX_RING);
+	ret |= test_tpacket(TPACKET_V1, PACKET_TX_RING);
+
+	ret |= test_tpacket(TPACKET_V2, PACKET_RX_RING);
+	ret |= test_tpacket(TPACKET_V2, PACKET_TX_RING);
+
+	ret |= test_tpacket(TPACKET_V3, PACKET_RX_RING);
+
+	if (ret)
+		return 1;
+
+	printf("OK. All tests passed\n");
+	return 0;
+}
diff --git a/tools/testing/selftests/net/run_afpackettests b/tools/testing/selftests/net/run_afpackettests
index 7907824..5246e78 100644
--- a/tools/testing/selftests/net/run_afpackettests
+++ b/tools/testing/selftests/net/run_afpackettests
@@ -14,3 +14,13 @@ if [ $? -ne 0 ]; then
 else
 	echo "[PASS]"
 fi
+
+echo "--------------------"
+echo "running psock_tpacket test"
+echo "--------------------"
+./psock_tpacket
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+else
+	echo "[PASS]"
+fi
-- 
1.7.11.7
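
The ring geometry chosen by __v1_v2_fill() and __v3_fill() in the test above can be checked with a little arithmetic. This is a minimal sketch that repeats the computation outside the kernel, assuming a 4096-byte page (the test itself calls getpagesize()) and the TPACKET_ALIGNMENT value of 16 from <linux/if_packet.h>; struct ring_geometry and compute_ring_geometry() are hypothetical names, not part of the test.

```c
#include <assert.h>

/* Hypothetical mirror of the tpacket_req fields the test fills in. */
struct ring_geometry {
	unsigned int block_size;  /* tp_block_size */
	unsigned int frame_size;  /* tp_frame_size */
	unsigned int block_nr;    /* tp_block_nr */
	unsigned int frame_nr;    /* tp_frame_nr */
	unsigned long mm_len;     /* bytes passed to mmap() */
};

/* Assumed value of TPACKET_ALIGNMENT from <linux/if_packet.h>. */
#define SKETCH_TPACKET_ALIGNMENT 16u

static struct ring_geometry compute_ring_geometry(unsigned int page_size,
						  unsigned int blocks)
{
	struct ring_geometry g;

	assert(page_size > 0 && blocks > 0);
	g.block_size = page_size << 2;                 /* 4 pages per block */
	g.frame_size = SKETCH_TPACKET_ALIGNMENT << 7;  /* 2048 bytes */
	g.block_nr = blocks;
	/* frames never straddle blocks, so count frames per block first */
	g.frame_nr = g.block_size / g.frame_size * g.block_nr;
	g.mm_len = (unsigned long)g.block_size * g.block_nr;
	return g;
}
```

With 4096-byte pages and the test's 256 blocks this gives 16384-byte blocks holding 8 frames of 2048 bytes each, i.e. 2048 frames in a 4 MiB mapping. V1/V2 walk those 2048 frame slots (rd_num = tp_frame_nr), while V3 walks the 256 blocks (rd_num = tp_block_nr).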

^ permalink raw reply related

* Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Hannes Frederic Sowa @ 2013-04-02 22:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Huang, Xiong, Ben Hutchings, Anders Boström,
	netdev@vger.kernel.org, 565404@bugs.debian.org
In-Reply-To: <1364940038.5113.187.camel@edumazet-glaptop>

On Tue, Apr 02, 2013 at 03:00:38PM -0700, Eric Dumazet wrote:
> On Tue, 2013-04-02 at 23:15 +0200, Hannes Frederic Sowa wrote:
> 
> > The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
> > in the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I
> > can even raise it to 0x3000 and don't see any tcp retransmits. Do you
> > have an advice on how to size this value (e.g. should we switch to the
> > windows values)?
> 
> This looks like an overflow error...

Thanks for your input, Eric.

I have limited time to work on this today but nonetheless just tested
your patch without any of my changes and count a lot of TcpRetransSegs
again. Either there is really some hardware limitation or another
overflow.

^ permalink raw reply

* Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Hannes Frederic Sowa @ 2013-04-02 22:19 UTC (permalink / raw)
  To: Huang, Xiong
  Cc: Ben Hutchings, Anders Boström, netdev@vger.kernel.org,
	565404@bugs.debian.org
In-Reply-To: <157393863283F442885425D2C45428564F202AD5@nasanexd02f.na.qualcomm.com>

On Tue, Apr 02, 2013 at 09:51:12PM +0000, Huang, Xiong wrote:
> > The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN in
> > the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I can even
> > raise it to 0x3000 and don't see any tcp retransmits. Do you have an advice on
> > how to size this value (e.g. should we switch to the windows values)?
> > 
> 
> Would you try 0x4000 ? because the buffer-length in TX descriptor is 14bits, 0x4000 exceeds max value.
> Do you find any bug/issue on the code that calculate the length for each TX descriptor ?

With MAX_TX_BUF_LEN set to 0x4000 I get:

[ 8949.833750] ATL1E 0000:04:00.0 p33p1: NIC Link is Up <100 Mbps Full Duplex>
[ 8949.833783] IPv6: ADDRCONF(NETDEV_CHANGE): p33p1: link becomes ready
[ 8960.861557] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
[ 8960.866879] ATL1E 0000:04:00.0 p33p1: NIC Link is Up <100 Mbps Full Duplex>
[ 8961.095266] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
[ 8961.100791] ATL1E 0000:04:00.0 p33p1: NIC Link is Up <100 Mbps Full Duplex>

I have not looked at the buffer calculations closely.

^ permalink raw reply

* RE: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Huang, Xiong @ 2013-04-02 22:23 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Ben Hutchings, Anders Boström, netdev@vger.kernel.org,
	565404@bugs.debian.org
In-Reply-To: <20130402221913.GG4924@order.stressinduktion.org>


> 
> On Tue, Apr 02, 2013 at 09:51:12PM +0000, Huang, Xiong wrote:
> > > The error vanishes as soon as I put a gso size limit of
> > > MAX_TX_BUF_LEN in the driver. MAX_TX_BUF_LEN seems to be arbitrary
> > > set to 0x2000. I can even raise it to 0x3000 and don't see any tcp
> > > retransmits. Do you have an advice on how to size this value (e.g. should
> > > we switch to the windows values)?
> > >
> >
> > Would you try 0x4000 ? because the buffer-length in TX descriptor is 14bits,
> > 0x4000 exceeds max value.
> > Do you find any bug/issue on the code that calculate the length for each TX
> > descriptor ?
> 
> Setting MAX_TX_BUF_LEN to 0x4000
> 
> [ 8949.833750] ATL1E 0000:04:00.0 p33p1: NIC Link is Up <100 Mbps Full Duplex>
> [ 8949.833783] IPv6: ADDRCONF(NETDEV_CHANGE): p33p1: link becomes ready
> [ 8960.861557] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
> [ 8960.866879] ATL1E 0000:04:00.0 p33p1: NIC Link is Up <100 Mbps Full Duplex>
> [ 8961.095266] ATL1E 0000:04:00.0 p33p1: PCIE DMA RW error (status = 0x5000400)
> [ 8961.100791] ATL1E 0000:04:00.0 p33p1: NIC Link is Up <100 Mbps Full Duplex>
> 
Hannes, thanks for your testing!

Simply raising MAX_TX_BUF_LEN to 0x4000 will result in an incorrect TX configuration.
What I meant is that you can try a GSO size limit of 0x4000 (or 0x5000).

Thanks
Xiong
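
Xiong's remark about the 14-bit buffer-length field can be made concrete with a little arithmetic. A minimal sketch, assuming only what the thread states (a 14-bit length field, chunks of at most MAX_TX_BUF_LEN bytes); the mask, encoded_len() and chunks_fit() are hypothetical names, not atl1e driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TPD_LEN_BITS 14u
#define SKETCH_TPD_LEN_MASK ((1u << SKETCH_TPD_LEN_BITS) - 1) /* 0x3fff */

/* What a 14-bit length field would hold if 'len' were written into it. */
static uint32_t encoded_len(uint32_t len)
{
	return len & SKETCH_TPD_LEN_MASK;
}

/* Split 'len' bytes into chunks of at most 'max_chunk' bytes (the role
 * MAX_TX_BUF_LEN plays in the driver), count them in *nchunks, and report
 * whether every chunk survives the 14-bit encoding unchanged. */
static bool chunks_fit(uint32_t len, uint32_t max_chunk, uint32_t *nchunks)
{
	bool ok = true;

	assert(max_chunk > 0);
	*nchunks = 0;
	while (len > 0) {
		uint32_t chunk = len < max_chunk ? len : max_chunk;

		if (encoded_len(chunk) != chunk)
			ok = false;
		len -= chunk;
		(*nchunks)++;
	}
	return ok;
}
```

For a 64 KiB buffer, chunks of 0x2000 or 0x3000 bytes all fit (8 and 6 chunks respectively), but a 0x4000-byte chunk would be encoded as length 0. That is one plausible reading of why raising MAX_TX_BUF_LEN itself to 0x4000 breaks the TX setup, while a GSO size limit of 0x4000 does not; the thread does not confirm that this produced the DMA errors above.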


^ permalink raw reply

* Re: [PATCH v2 net-next 6/8] r8169: add a new chip for RTL8111G
From: Francois Romieu @ 2013-04-02 22:27 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1364891022-3220-6-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> Add a new chip for RTL8111G series.

It does not need any of the workarounds from patch #5, right?

-- 
Ueimor

^ permalink raw reply

* [PATCH net-next] openvswitch: Provide OVS_DP_ATTR_UPCALL_PID in datapath messages
From: Thomas Graf @ 2013-04-02 22:28 UTC (permalink / raw)
  To: jesse; +Cc: dev, netdev

The upcall port configured when adding a new datapath is currently
only provided to user space as part of the vport message. Therefore
user space has to request two separate messages, which is prone to
race conditions.

Provide the upcall port of the local port (0) of a datapath in the
datapath message to gain symmetry between the SET and GET commands.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/openvswitch/datapath.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index d406503..e9b9469 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1240,6 +1240,7 @@ static size_t ovs_dp_cmd_msg_size(void)
 	size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
 
 	msgsize += nla_total_size(IFNAMSIZ);
+	msgsize += nla_total_size(4); /* OVS_DP_ATTR_UPCALL_PID */
 	msgsize += nla_total_size(sizeof(struct ovs_dp_stats));
 
 	return msgsize;
@@ -1250,6 +1251,7 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 {
 	struct ovs_header *ovs_header;
 	struct ovs_dp_stats dp_stats;
+	struct vport *local;
 	int err;
 
 	ovs_header = genlmsg_put(skb, portid, seq, &dp_datapath_genl_family,
@@ -1261,17 +1263,24 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 
 	rcu_read_lock();
 	err = nla_put_string(skb, OVS_DP_ATTR_NAME, ovs_dp_name(dp));
-	rcu_read_unlock();
 	if (err)
 		goto nla_put_failure;
 
+	local = ovs_vport_rcu(dp, OVSP_LOCAL);
+	if (local &&
+	    nla_put_u32(skb, OVS_DP_ATTR_UPCALL_PID, local->upcall_portid))
+		goto nla_put_failure;
+
 	get_dp_stats(dp, &dp_stats);
 	if (nla_put(skb, OVS_DP_ATTR_STATS, sizeof(struct ovs_dp_stats), &dp_stats))
 		goto nla_put_failure;
 
+	rcu_read_unlock();
+
 	return genlmsg_end(skb, ovs_header);
 
 nla_put_failure:
+	rcu_read_unlock();
 	genlmsg_cancel(skb, ovs_header);
 error:
 	return -EMSGSIZE;
-- 
1.7.11.7
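
The nla_total_size(4) added to ovs_dp_cmd_msg_size() above reserves room for the new u32 attribute. A minimal userspace sketch of that size arithmetic, assuming the 4-byte attribute header and 4-byte alignment defined in <linux/netlink.h> (NLA_HDRLEN, NLA_ALIGNTO); attr_total_size() is a hypothetical stand-in for the kernel's nla_total_size(), and the SKETCH_ macros are local copies rather than the real headers.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies, assumed to match NLA_ALIGNTO / NLA_ALIGN / NLA_HDRLEN. */
#define SKETCH_NLA_ALIGNTO 4u
#define SKETCH_NLA_ALIGN(len) \
	(((len) + SKETCH_NLA_ALIGNTO - 1) & ~(size_t)(SKETCH_NLA_ALIGNTO - 1))
#define SKETCH_NLA_HDRLEN SKETCH_NLA_ALIGN(4u) /* sizeof(struct nlattr) */

/* Bytes one attribute with a 'payload'-byte value occupies in a message:
 * header plus payload, padded to the 4-byte attribute alignment. */
static size_t attr_total_size(size_t payload)
{
	return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + payload);
}
```

Under these assumptions the u32 upcall PID costs 8 bytes, and the IFNAMSIZ (16) name reservation costs 20; an odd-sized payload such as 5 bytes is padded up to 12.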

^ permalink raw reply related

* [PATCH net-next] openvswitch: Don't insert empty OVS_VPORT_ATTR_OPTIONS attribute
From: Thomas Graf @ 2013-04-02 22:30 UTC (permalink / raw)
  To: jesse-l0M0P4e3n4LQT0dZR+AlfA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

The port-specific options are currently unused, resulting in an
empty OVS_VPORT_ATTR_OPTIONS nested attribute being inserted
into every OVS_VPORT_CMD_GET message.

Don't insert OVS_VPORT_ATTR_OPTIONS if no options are present.
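
The difference is easiest to see with a toy attribute writer. This is a minimal userspace sketch, not the kernel netlink API: struct sketch_attr, nest_start()/nest_end(), put_options() and the attribute type numbers are hypothetical stand-ins for struct nlattr, nla_nest_start()/nla_nest_end(), ovs_vport_get_options() and the OVS attribute IDs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header laid out like struct nlattr. */
struct sketch_attr {
	uint16_t len;   /* header + payload, in bytes */
	uint16_t type;
};

/* Hypothetical fixed-size message buffer. */
struct sketch_buf {
	unsigned char data[256];
	size_t used;
};

/* Optional per-port callback that appends options; NULL if unsupported. */
typedef int (*get_options_fn)(struct sketch_buf *buf);

/* Reserve a nested-attribute header and remember where it starts. */
static int nest_start(struct sketch_buf *buf, uint16_t type, size_t *off)
{
	struct sketch_attr hdr = { 0, type };

	if (buf->used + sizeof(hdr) > sizeof(buf->data))
		return -1;
	*off = buf->used;
	memcpy(buf->data + buf->used, &hdr, sizeof(hdr));
	buf->used += sizeof(hdr);
	return 0;
}

/* Fill in the nest's total length once its payload has been written. */
static void nest_end(struct sketch_buf *buf, size_t off)
{
	struct sketch_attr hdr;

	memcpy(&hdr, buf->data + off, sizeof(hdr));
	hdr.len = (uint16_t)(buf->used - off);
	memcpy(buf->data + off, &hdr, sizeof(hdr));
}

/* Same control flow as the patched ovs_vport_get_options(). */
static int put_options(struct sketch_buf *buf, get_options_fn get_options)
{
	size_t off;
	int err;

	if (!get_options)
		return 0;                 /* no callback: emit nothing at all */

	if (nest_start(buf, 1 /* hypothetical OPTIONS type */, &off))
		return -1;                /* stands in for -EMSGSIZE */

	err = get_options(buf);
	if (err) {
		buf->used = off;          /* like nla_nest_cancel() */
		return err;
	}

	nest_end(buf, off);
	return 0;
}

/* Example callback: one u32 option (4-byte header + 4-byte value). */
static int one_u32_option(struct sketch_buf *buf)
{
	struct sketch_attr hdr = { 8, 2 };
	uint32_t value = 42;

	if (buf->used + sizeof(hdr) + sizeof(value) > sizeof(buf->data))
		return -1;
	memcpy(buf->data + buf->used, &hdr, sizeof(hdr));
	memcpy(buf->data + buf->used + sizeof(hdr), &value, sizeof(value));
	buf->used += sizeof(hdr) + sizeof(value);
	return 0;
}
```

Without a callback nothing is written, whereas the old ordering would have left a bare 4-byte empty nest in the message; with one u32 option the nest occupies 12 bytes.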

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
---
 net/openvswitch/vport.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index f6b8132..71a2de8 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -301,17 +301,19 @@ void ovs_vport_get_stats(struct vport *vport, struct ovs_vport_stats *stats)
 int ovs_vport_get_options(const struct vport *vport, struct sk_buff *skb)
 {
 	struct nlattr *nla;
+	int err;
+
+	if (!vport->ops->get_options)
+		return 0;
 
 	nla = nla_nest_start(skb, OVS_VPORT_ATTR_OPTIONS);
 	if (!nla)
 		return -EMSGSIZE;
 
-	if (vport->ops->get_options) {
-		int err = vport->ops->get_options(vport, skb);
-		if (err) {
-			nla_nest_cancel(skb, nla);
-			return err;
-		}
+	err = vport->ops->get_options(vport, skb);
+	if (err) {
+		nla_nest_cancel(skb, nla);
+		return err;
 	}
 
 	nla_nest_end(skb, nla);
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH v2 net-next] vxlan: Bypass encapsulation if the destination is local
From: Sridhar Samudrala @ 2013-04-02 22:31 UTC (permalink / raw)
  To: shemminger, davem, dlstevens; +Cc: netdev

This patch bypasses vxlan encapsulation if the destination vxlan
endpoint is a local device.
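
The decision this patch adds to vxlan_xmit_one() can be sketched in plain C. The structure, enum and function names are hypothetical simplifications of struct vxlan_dev, vxlan_find_vni() and vxlan_encap_bypass(); routing is reduced to a flag standing in for RTCF_LOCAL, and the sketch assumes local delivery always succeeds (the patch counts rx_dropped when netif_rx() fails).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for struct vxlan_dev. */
struct sketch_vxlan {
	uint32_t vni;
	unsigned long tx_packets;
	unsigned long rx_packets;
};

enum sketch_verdict { SENT_ENCAPSULATED, DELIVERED_LOCALLY, TX_ERROR };

/* Linear search standing in for vxlan_find_vni(). */
static struct sketch_vxlan *find_vni(struct sketch_vxlan *devs, size_t n,
				     uint32_t vni)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].vni == vni)
			return &devs[i];
	return NULL;
}

/* A route flagged local skips encapsulation and hands the frame straight
 * to the device owning the VNI; a missing VNI is a transmit error. */
static enum sketch_verdict xmit(struct sketch_vxlan *src,
				struct sketch_vxlan *devs, size_t n,
				uint32_t vni, int route_is_local)
{
	struct sketch_vxlan *dst;

	if (!route_is_local) {
		src->tx_packets++;        /* normal UDP encapsulation path */
		return SENT_ENCAPSULATED;
	}

	dst = find_vni(devs, n, vni);
	if (!dst)
		return TX_ERROR;

	/* bypass: count TX on the sender and RX on the receiver */
	src->tx_packets++;
	dst->rx_packets++;
	return DELIVERED_LOCALLY;
}
```

Counting TX on the sending device and RX on the receiving one matches what vxlan_encap_bypass() does with the two per-cpu stats blocks.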

Changes since v1: added missing check for vxlan_find_vni() failure

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
---

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 62a4438..f3610cd 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -912,6 +912,36 @@ static int handle_offloads(struct sk_buff *skb)
 	return 0;
 }
 
+/* Bypass encapsulation if the destination is local */
+static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
+			       struct vxlan_dev *dst_vxlan)
+{
+	struct pcpu_tstats *tx_stats = this_cpu_ptr(src_vxlan->dev->tstats);
+	struct pcpu_tstats *rx_stats = this_cpu_ptr(dst_vxlan->dev->tstats);
+
+	skb->pkt_type = PACKET_HOST;
+	skb->encapsulation = 0;
+	skb->dev = dst_vxlan->dev;
+	__skb_pull(skb, skb_network_offset(skb));
+
+	if (dst_vxlan->flags & VXLAN_F_LEARN)
+		vxlan_snoop(skb->dev, INADDR_LOOPBACK, eth_hdr(skb)->h_source);
+
+	u64_stats_update_begin(&tx_stats->syncp);
+	tx_stats->tx_packets++;
+	tx_stats->tx_bytes += skb->len;
+	u64_stats_update_end(&tx_stats->syncp);
+
+	if (netif_rx(skb) == NET_RX_SUCCESS) {
+		u64_stats_update_begin(&rx_stats->syncp);
+		rx_stats->rx_packets++;
+		rx_stats->rx_bytes += skb->len;
+		u64_stats_update_end(&rx_stats->syncp);
+	} else {
+		skb->dev->stats.rx_dropped++;
+	}
+}
+
 static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				  struct vxlan_rdst *rdst, bool did_rsc)
 {
@@ -922,7 +952,6 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	struct vxlanhdr *vxh;
 	struct udphdr *uh;
 	struct flowi4 fl4;
-	unsigned int pkt_len = skb->len;
 	__be32 dst;
 	__u16 src_port, dst_port;
         u32 vni;
@@ -935,22 +964,8 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 	if (!dst) {
 		if (did_rsc) {
-			__skb_pull(skb, skb_network_offset(skb));
-			skb->ip_summed = CHECKSUM_NONE;
-			skb->pkt_type = PACKET_HOST;
-
 			/* short-circuited back to local bridge */
-			if (netif_rx(skb) == NET_RX_SUCCESS) {
-				struct pcpu_tstats *stats = this_cpu_ptr(dev->tstats);
-
-				u64_stats_update_begin(&stats->syncp);
-				stats->tx_packets++;
-				stats->tx_bytes += pkt_len;
-				u64_stats_update_end(&stats->syncp);
-			} else {
-				dev->stats.tx_errors++;
-				dev->stats.tx_aborted_errors++;
-			}
+			vxlan_encap_bypass(skb, vxlan, vxlan);
 			return NETDEV_TX_OK;
 		}
 		goto drop;
@@ -997,6 +1012,18 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		goto tx_error;
 	}
 
+	/* Bypass encapsulation if the destination is local */
+	if (rt->rt_flags & RTCF_LOCAL) {
+		struct vxlan_dev *dst_vxlan;
+
+		ip_rt_put(rt);
+		dst_vxlan = vxlan_find_vni(dev_net(dev), vni);
+		if (!dst_vxlan)
+			goto tx_error;	
+		vxlan_encap_bypass(skb, vxlan, dst_vxlan);
+		return NETDEV_TX_OK;
+	}
+
 	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
 			      IPSKB_REROUTED);

^ permalink raw reply related

* Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Eric Dumazet @ 2013-04-02 22:34 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Huang, Xiong, Ben Hutchings, Anders Boström,
	netdev@vger.kernel.org, 565404@bugs.debian.org
In-Reply-To: <20130402221520.GF4924@order.stressinduktion.org>

On Wed, 2013-04-03 at 00:15 +0200, Hannes Frederic Sowa wrote:
> On Tue, Apr 02, 2013 at 03:00:38PM -0700, Eric Dumazet wrote:
> > On Tue, 2013-04-02 at 23:15 +0200, Hannes Frederic Sowa wrote:
> > 
> > > The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
> > > in the driver. MAX_TX_BUF_LEN seems to be arbitrary set to 0x2000. I
> > > can even raise it to 0x3000 and don't see any tcp retransmits. Do you
> > > have an advice on how to size this value (e.g. should we switch to the
> > > windows values)?
> > 
> > This looks like an overflow error...
> 
> Thanks for your input, Eric.
> 
> I am limited in my time to work on this today but nonetheless just tested
> your patch without any of my changes and count a lot of TcpRetransSegs
> again. Either there is really some hardware limitation or another
> overflow.

Another overflow...

Really I don't understand why people use u16 instead of u32.

u16 is slower most of the time, and more prone to overflows.
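
A minimal userspace illustration of the class of overflow meant here: a length above 65535 stored in a 16-bit variable silently wraps, and a descriptor count derived from it comes out far too small. The function names are hypothetical, and MAX_TX_BUF_LEN = 0x2000 with a shift of 13 (0x2000 == 1 << 13) is assumed from the discussion; whether this exact path caused the retransmits is not established in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_TX_BUF_LEN   0x2000u /* as quoted in the thread */
#define SKETCH_MAX_TX_BUF_SHIFT 13u     /* assumed: 0x2000 == 1 << 13 */

/* Implicit conversion to a 16-bit unsigned type keeps only the value
 * modulo 65536. This is well defined in C and GCC's -Wall does not warn
 * about it (-Wconversion would), so the wrong length goes unnoticed. */
static uint16_t store_len_u16(uint32_t len)
{
	uint16_t v = len;

	return v;
}

/* The same length held in a 32-bit variable, as the patch does. */
static uint32_t store_len_u32(uint32_t len)
{
	uint32_t v = len;

	return v;
}

/* Descriptors needed for 'len' bytes: ceil(len / MAX_TX_BUF_LEN). */
static uint32_t tpds_needed(uint32_t len)
{
	return (len + SKETCH_MAX_TX_BUF_LEN - 1) >> SKETCH_MAX_TX_BUF_SHIFT;
}
```

A 65600-byte length kept in a u16 becomes 64, so the count drops from 9 descriptors to 1.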

diff --git a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
index 7e0a822..48ac487 100644
--- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
+++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
@@ -1569,18 +1569,17 @@ static u16 atl1e_cal_tdp_req(const struct sk_buff *skb)
 {
 	int i = 0;
 	u16 tpd_req = 1;
-	u16 fg_size = 0;
-	u16 proto_hdr_len = 0;
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		u32 fg_size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+
 		tpd_req += ((fg_size + MAX_TX_BUF_LEN - 1) >> MAX_TX_BUF_SHIFT);
 	}
 
 	if (skb_is_gso(skb)) {
 		if (skb->protocol == htons(ETH_P_IP) ||
 		   (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV6)) {
-			proto_hdr_len = skb_transport_offset(skb) +
+			u32 proto_hdr_len = skb_transport_offset(skb) +
 					tcp_hdrlen(skb);
 			if (proto_hdr_len < skb_headlen(skb)) {
 				tpd_req += ((skb_headlen(skb) - proto_hdr_len +
@@ -1670,7 +1669,7 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
 {
 	struct atl1e_tpd_desc *use_tpd = NULL;
 	struct atl1e_tx_buffer *tx_buffer = NULL;
-	u16 buf_len = skb_headlen(skb);
+	u32 buf_len = skb_headlen(skb);
 	u16 map_len = 0;
 	u16 mapped_len = 0;
 	u16 hdr_len = 0;

^ permalink raw reply related

* Re: [PATCH 1/6] mac802154: Immediately retry sending failed packets
From: Alan Ott @ 2013-04-02 22:35 UTC (permalink / raw)
  To: Alexander Smirnov
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, David S. Miller,
	linux-zigbee-devel, linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <515B4D79.40805-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>

On 04/02/2013 05:28 PM, Alan Ott wrote:

> According to 7.5.6.5 of IEEE 802.15.4-2003, if the retransmission fails
> more than aMaxFrameRetries (3) times, it is assumed that it has failed.
> Since some transceivers (and I would assume most if not all) do this in
> hardware, it's now my opinion that we should _not_ try to retransmit at
> all in mac802154/tx.c.
> 
> For a driver for a device which _doesn't_ do retransmission in hardware,
> maybe it should be handled by that driver then.


It's worth noting that the mrf24j40, the at86rf230, and the cc2420
support retransmission in hardware (the first two are the only two in
mainline).

Alan.
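
The proposal above (no software retransmission for transceivers that already retry in hardware, otherwise up to aMaxFrameRetries retransmissions after the first attempt) can be sketched as follows. This follows the proposal in this thread rather than existing mac802154 code: struct sketch_radio, the hw_retransmit flag, send_frame() and flaky_xmit() are all hypothetical, and MAX_FRAME_RETRIES uses the value of 3 quoted above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAME_RETRIES 3 /* aMaxFrameRetries, as quoted in the thread */

/* Hypothetical transmit callback: returns true if the frame was ACKed. */
typedef bool (*xmit_fn)(void *ctx);

/* Hypothetical radio description. */
struct sketch_radio {
	bool hw_retransmit; /* transceiver already retries in hardware */
	xmit_fn xmit;
	void *ctx;
};

/* Retry in software only when the hardware does not: one initial attempt
 * plus up to MAX_FRAME_RETRIES retransmissions. Returns the number of
 * software attempts needed, or -1 if the frame is considered failed. */
static int send_frame(const struct sketch_radio *r)
{
	int attempts = r->hw_retransmit ? 1 : 1 + MAX_FRAME_RETRIES;

	for (int i = 1; i <= attempts; i++)
		if (r->xmit(r->ctx))
			return i;
	return -1;
}

/* Example transceiver that fails a given number of times, then succeeds. */
static bool flaky_xmit(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return false;
	}
	return true;
}
```

With hardware retransmission, a failure reported by the transceiver is final, since the hardware has already used up its own retries.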



^ permalink raw reply

* [PATCH net-next] sctp: remove 'sridhar' from maintainers list
From: Sridhar Samudrala @ 2013-04-02 22:35 UTC (permalink / raw)
  To: vyasevich, nhorman, davem; +Cc: netdev

Update SCTP maintainers list.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
---

diff --git a/MAINTAINERS b/MAINTAINERS
index d3a888c..c8f792a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6953,7 +6953,6 @@ F:	drivers/scsi/st*
 
 SCTP PROTOCOL
 M:	Vlad Yasevich <vyasevich@gmail.com>
-M:	Sridhar Samudrala <sri@us.ibm.com>
 M:	Neil Horman <nhorman@tuxdriver.com>
 L:	linux-sctp@vger.kernel.org
 W:	http://lksctp.sourceforge.net

^ permalink raw reply related

* [PATCH net-next] selftests: net: add PF_PACKET TPACKET v1/v2/v3 selftests
From: Daniel Borkmann @ 2013-04-02 23:00 UTC (permalink / raw)
  To: davem; +Cc: netdev

This patch adds a simple test case that probes the packet socket's
TPACKET_V1, TPACKET_V2 and TPACKET_V3 behavior regarding mmap(2)'ed
I/O for a small burst of 100 packets. The test currently runs for ...

  TPACKET_V1: RX_RING, TX_RING
  TPACKET_V2: RX_RING, TX_RING
  TPACKET_V3: RX_RING

... and will output on success:

  test: TPACKET_V1 with PACKET_RX_RING .................... 100 pkts (9600 bytes)
  test: TPACKET_V1 with PACKET_TX_RING .................... 100 pkts (9600 bytes)
  test: TPACKET_V2 with PACKET_RX_RING .................... 100 pkts (9600 bytes)
  test: TPACKET_V2 with PACKET_TX_RING .................... 100 pkts (9600 bytes)
  test: TPACKET_V3 with PACKET_RX_RING .................... 100 pkts (9600 bytes)
  OK. All tests passed

Reusable parts of psock_fanout.c have been moved into a new psock_lib.h
header for common usage. The test case has been run successfully on
x86_64.
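
For the mmap(2)'ed rings, the request built by __v1_v2_fill() can be worked through by hand. This is a minimal sketch that only mirrors that arithmetic. It assumes a 4096-byte page and a TPACKET_ALIGNMENT of 16 (its value in <linux/if_packet.h>). struct ring_geometry and ring_geometry_v1_v2() are hypothetical names. As far as I know, packet_set_ring() rejects a request unless tp_frame_nr equals frames-per-block times tp_block_nr, which is why the frame count is derived this way.

```c
#include <assert.h>
#include <stddef.h>

/* Values that __v1_v2_fill() places into struct tpacket_req, plus the
 * resulting mmap(2) length. */
struct ring_geometry {
	size_t block_size;	/* tp_block_size */
	size_t frame_size;	/* tp_frame_size */
	size_t block_nr;	/* tp_block_nr */
	size_t frame_nr;	/* tp_frame_nr */
	size_t mm_len;		/* bytes to mmap(2) */
};

/* Page size and TPACKET_ALIGNMENT are parameters here instead of being
 * taken from getpagesize() and <linux/if_packet.h>. */
static struct ring_geometry ring_geometry_v1_v2(size_t page_size,
						size_t tpacket_alignment,
						size_t blocks)
{
	struct ring_geometry g;

	g.block_size = page_size << 2;		/* 4 pages per block */
	g.frame_size = tpacket_alignment << 7;	/* 128 * TPACKET_ALIGNMENT */
	g.block_nr = blocks;

	/* Frames never straddle blocks: at least one frame must fit in a
	 * block, and the frame count is frames-per-block times blocks. */
	assert(g.frame_size <= g.block_size);
	g.frame_nr = g.block_size / g.frame_size * g.block_nr;

	g.mm_len = g.block_size * g.block_nr;
	return g;
}
```

With the test's 256 blocks this gives 16384-byte blocks, 2048-byte frames, 2048 frames and a 4 MiB mapping. That is well above the 100 frames that walk_v1_v2_tx() requires via bug_on().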

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 tools/testing/selftests/net/Makefile          |   4 +-
 tools/testing/selftests/net/psock_fanout.c    |  88 +--
 tools/testing/selftests/net/psock_lib.h       | 127 ++++
 tools/testing/selftests/net/psock_tpacket.c   | 824 ++++++++++++++++++++++++++
 tools/testing/selftests/net/run_afpackettests |  10 +
 5 files changed, 966 insertions(+), 87 deletions(-)
 create mode 100644 tools/testing/selftests/net/psock_lib.h
 create mode 100644 tools/testing/selftests/net/psock_tpacket.c

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index bd6e272..750512b 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -1,11 +1,11 @@
 # Makefile for net selftests
 
 CC = $(CROSS_COMPILE)gcc
-CFLAGS = -Wall
+CFLAGS = -Wall -O2 -g
 
 CFLAGS += -I../../../../usr/include/
 
-NET_PROGS = socket psock_fanout
+NET_PROGS = socket psock_fanout psock_tpacket
 
 all: $(NET_PROGS)
 %: %.c
diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
index 59bd636..57b9c2b 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -61,91 +61,9 @@
 #include <sys/types.h>
 #include <unistd.h>
 
-#define DATA_LEN			100
-#define DATA_CHAR			'a'
-#define RING_NUM_FRAMES			20
-#define PORT_BASE			8000
-
-static void pair_udp_open(int fds[], uint16_t port)
-{
-	struct sockaddr_in saddr, daddr;
-
-	fds[0] = socket(PF_INET, SOCK_DGRAM, 0);
-	fds[1] = socket(PF_INET, SOCK_DGRAM, 0);
-	if (fds[0] == -1 || fds[1] == -1) {
-		fprintf(stderr, "ERROR: socket dgram\n");
-		exit(1);
-	}
-
-	memset(&saddr, 0, sizeof(saddr));
-	saddr.sin_family = AF_INET;
-	saddr.sin_port = htons(port);
-	saddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
-
-	memset(&daddr, 0, sizeof(daddr));
-	daddr.sin_family = AF_INET;
-	daddr.sin_port = htons(port + 1);
-	daddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+#include "psock_lib.h"
 
-	/* must bind both to get consistent hash result */
-	if (bind(fds[1], (void *) &daddr, sizeof(daddr))) {
-		perror("bind");
-		exit(1);
-	}
-	if (bind(fds[0], (void *) &saddr, sizeof(saddr))) {
-		perror("bind");
-		exit(1);
-	}
-	if (connect(fds[0], (void *) &daddr, sizeof(daddr))) {
-		perror("connect");
-		exit(1);
-	}
-}
-
-static void pair_udp_send(int fds[], int num)
-{
-	char buf[DATA_LEN], rbuf[DATA_LEN];
-
-	memset(buf, DATA_CHAR, sizeof(buf));
-	while (num--) {
-		/* Should really handle EINTR and EAGAIN */
-		if (write(fds[0], buf, sizeof(buf)) != sizeof(buf)) {
-			fprintf(stderr, "ERROR: send failed left=%d\n", num);
-			exit(1);
-		}
-		if (read(fds[1], rbuf, sizeof(rbuf)) != sizeof(rbuf)) {
-			fprintf(stderr, "ERROR: recv failed left=%d\n", num);
-			exit(1);
-		}
-		if (memcmp(buf, rbuf, sizeof(buf))) {
-			fprintf(stderr, "ERROR: data failed left=%d\n", num);
-			exit(1);
-		}
-	}
-}
-
-static void sock_fanout_setfilter(int fd)
-{
-	struct sock_filter bpf_filter[] = {
-		{ 0x80, 0, 0, 0x00000000 },  /* LD  pktlen		      */
-		{ 0x35, 0, 5, DATA_LEN   },  /* JGE DATA_LEN  [f goto nomatch]*/
-		{ 0x30, 0, 0, 0x00000050 },  /* LD  ip[80]		      */
-		{ 0x15, 0, 3, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
-		{ 0x30, 0, 0, 0x00000051 },  /* LD  ip[81]		      */
-		{ 0x15, 0, 1, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
-		{ 0x6, 0, 0, 0x00000060  },  /* RET match	              */
-/* nomatch */	{ 0x6, 0, 0, 0x00000000  },  /* RET no match		      */
-	};
-	struct sock_fprog bpf_prog;
-
-	bpf_prog.filter = bpf_filter;
-	bpf_prog.len = sizeof(bpf_filter) / sizeof(struct sock_filter);
-	if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf_prog,
-		       sizeof(bpf_prog))) {
-		perror("setsockopt SO_ATTACH_FILTER");
-		exit(1);
-	}
-}
+#define RING_NUM_FRAMES			20
 
 /* Open a socket in a given fanout mode.
  * @return -1 if mode is bad, a valid socket otherwise */
@@ -169,7 +87,7 @@ static int sock_fanout_open(uint16_t typeflags, int num_packets)
 		return -1;
 	}
 
-	sock_fanout_setfilter(fd);
+	pair_udp_setfilter(fd);
 	return fd;
 }
 
diff --git a/tools/testing/selftests/net/psock_lib.h b/tools/testing/selftests/net/psock_lib.h
new file mode 100644
index 0000000..37da54a
--- /dev/null
+++ b/tools/testing/selftests/net/psock_lib.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright 2013 Google Inc.
+ * Author: Willem de Bruijn <willemb@google.com>
+ *         Daniel Borkmann <dborkman@redhat.com>
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef PSOCK_LIB_H
+#define PSOCK_LIB_H
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <string.h>
+#include <arpa/inet.h>
+#include <unistd.h>
+
+#define DATA_LEN			100
+#define DATA_CHAR			'a'
+
+#define PORT_BASE			8000
+
+#ifndef __maybe_unused
+# define __maybe_unused		__attribute__ ((__unused__))
+#endif
+
+static __maybe_unused void pair_udp_setfilter(int fd)
+{
+	struct sock_filter bpf_filter[] = {
+		{ 0x80, 0, 0, 0x00000000 },  /* LD  pktlen		      */
+		{ 0x35, 0, 5, DATA_LEN   },  /* JGE DATA_LEN  [f goto nomatch]*/
+		{ 0x30, 0, 0, 0x00000050 },  /* LD  ip[80]		      */
+		{ 0x15, 0, 3, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
+		{ 0x30, 0, 0, 0x00000051 },  /* LD  ip[81]		      */
+		{ 0x15, 0, 1, DATA_CHAR  },  /* JEQ DATA_CHAR [f goto nomatch]*/
+		{ 0x06, 0, 0, 0x00000060 },  /* RET match	              */
+		{ 0x06, 0, 0, 0x00000000 },  /* RET no match		      */
+	};
+	struct sock_fprog bpf_prog;
+
+	bpf_prog.filter = bpf_filter;
+	bpf_prog.len = sizeof(bpf_filter) / sizeof(struct sock_filter);
+	if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf_prog,
+		       sizeof(bpf_prog))) {
+		perror("setsockopt SO_ATTACH_FILTER");
+		exit(1);
+	}
+}
+
+static __maybe_unused void pair_udp_open(int fds[], uint16_t port)
+{
+	struct sockaddr_in saddr, daddr;
+
+	fds[0] = socket(PF_INET, SOCK_DGRAM, 0);
+	fds[1] = socket(PF_INET, SOCK_DGRAM, 0);
+	if (fds[0] == -1 || fds[1] == -1) {
+		fprintf(stderr, "ERROR: socket dgram\n");
+		exit(1);
+	}
+
+	memset(&saddr, 0, sizeof(saddr));
+	saddr.sin_family = AF_INET;
+	saddr.sin_port = htons(port);
+	saddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+	memset(&daddr, 0, sizeof(daddr));
+	daddr.sin_family = AF_INET;
+	daddr.sin_port = htons(port + 1);
+	daddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+	/* must bind both to get consistent hash result */
+	if (bind(fds[1], (void *) &daddr, sizeof(daddr))) {
+		perror("bind");
+		exit(1);
+	}
+	if (bind(fds[0], (void *) &saddr, sizeof(saddr))) {
+		perror("bind");
+		exit(1);
+	}
+	if (connect(fds[0], (void *) &daddr, sizeof(daddr))) {
+		perror("connect");
+		exit(1);
+	}
+}
+
+static __maybe_unused void pair_udp_send(int fds[], int num)
+{
+	char buf[DATA_LEN], rbuf[DATA_LEN];
+
+	memset(buf, DATA_CHAR, sizeof(buf));
+	while (num--) {
+		/* Should really handle EINTR and EAGAIN */
+		if (write(fds[0], buf, sizeof(buf)) != sizeof(buf)) {
+			fprintf(stderr, "ERROR: send failed left=%d\n", num);
+			exit(1);
+		}
+		if (read(fds[1], rbuf, sizeof(rbuf)) != sizeof(rbuf)) {
+			fprintf(stderr, "ERROR: recv failed left=%d\n", num);
+			exit(1);
+		}
+		if (memcmp(buf, rbuf, sizeof(buf))) {
+			fprintf(stderr, "ERROR: data failed left=%d\n", num);
+			exit(1);
+		}
+	}
+}
+
+static __maybe_unused void pair_udp_close(int fds[])
+{
+	close(fds[0]);
+	close(fds[1]);
+}
+
+#endif /* PSOCK_LIB_H */
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
new file mode 100644
index 0000000..a8d7ffa
--- /dev/null
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -0,0 +1,824 @@
+/*
+ * Copyright 2013 Red Hat, Inc.
+ * Author: Daniel Borkmann <dborkman@redhat.com>
+ *
+ * A basic test of packet socket's TPACKET_V1/TPACKET_V2/TPACKET_V3 behavior.
+ *
+ * Control:
+ *   Test the setup of the TPACKET socket with different patterns that are
+ *   known to fail (TODO) resp. succeed (OK).
+ *
+ * Datapath:
+ *   Open a pair of packet sockets and send resp. receive an a priori known
+ *   packet pattern across the sockets and check if it was received resp.
+ *   sent correctly. Fanout in combination with RX_RING is currently not
+ *   tested here.
+ *
+ *   The test currently runs for
+ *   - TPACKET_V1: RX_RING, TX_RING
+ *   - TPACKET_V2: RX_RING, TX_RING
+ *   - TPACKET_V3: RX_RING
+ *
+ * License (GPLv2):
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <linux/if_packet.h>
+#include <linux/filter.h>
+#include <ctype.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <bits/wordsize.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <arpa/inet.h>
+#include <stdint.h>
+#include <string.h>
+#include <assert.h>
+#include <net/if.h>
+#include <inttypes.h>
+#include <poll.h>
+
+#include "psock_lib.h"
+
+#ifndef bug_on
+# define bug_on(cond)		assert(!(cond))
+#endif
+
+#ifndef __aligned_tpacket
+# define __aligned_tpacket	__attribute__((aligned(TPACKET_ALIGNMENT)))
+#endif
+
+#ifndef __align_tpacket
+# define __align_tpacket(x)	__attribute__((aligned(TPACKET_ALIGN(x))))
+#endif
+
+#define BLOCK_STATUS(x)		((x)->h1.block_status)
+#define BLOCK_NUM_PKTS(x)	((x)->h1.num_pkts)
+#define BLOCK_O2FP(x)		((x)->h1.offset_to_first_pkt)
+#define BLOCK_LEN(x)		((x)->h1.blk_len)
+#define BLOCK_SNUM(x)		((x)->h1.seq_num)
+#define BLOCK_O2PRIV(x)		((x)->offset_to_priv)
+#define BLOCK_PRIV(x)		((void *) ((uint8_t *) (x) + BLOCK_O2PRIV(x)))
+#define BLOCK_HDR_LEN		(ALIGN_8(sizeof(struct block_desc)))
+#define ALIGN_8(x)		(((x) + 8 - 1) & ~(8 - 1))
+#define BLOCK_PLUS_PRIV(sz_pri)	(BLOCK_HDR_LEN + ALIGN_8((sz_pri)))
+
+#define NUM_PACKETS		100
+
+struct ring {
+	struct iovec *rd;
+	uint8_t *mm_space;
+	size_t mm_len, rd_len;
+	struct sockaddr_ll ll;
+	void (*walk)(int sock, struct ring *ring);
+	int type, rd_num, flen, version;
+	union {
+		struct tpacket_req  req;
+		struct tpacket_req3 req3;
+	};
+};
+
+struct block_desc {
+	uint32_t version;
+	uint32_t offset_to_priv;
+	struct tpacket_hdr_v1 h1;
+};
+
+union frame_map {
+	struct {
+		struct tpacket_hdr tp_h __aligned_tpacket;
+		struct sockaddr_ll s_ll __align_tpacket(sizeof(struct tpacket_hdr));
+	} *v1;
+	struct {
+		struct tpacket2_hdr tp_h __aligned_tpacket;
+		struct sockaddr_ll s_ll __align_tpacket(sizeof(struct tpacket2_hdr));
+	} *v2;
+	void *raw;
+};
+
+static unsigned int total_packets, total_bytes;
+
+static int pfsocket(int ver)
+{
+	int ret, sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+	if (sock == -1) {
+		perror("socket");
+		exit(1);
+	}
+
+	ret = setsockopt(sock, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver));
+	if (ret == -1) {
+		perror("setsockopt");
+		exit(1);
+	}
+
+	return sock;
+}
+
+static void status_bar_update(void)
+{
+	if (total_packets % 10 == 0) {
+		fprintf(stderr, ".");
+		fflush(stderr);
+	}
+}
+
+static void test_payload(void *pay, size_t len)
+{
+	struct ethhdr *eth = pay;
+
+	if (len < sizeof(struct ethhdr)) {
+		fprintf(stderr, "test_payload: packet too "
+			"small: %zu bytes!\n", len);
+		exit(1);
+	}
+
+	if (eth->h_proto != htons(ETH_P_IP)) {
+		fprintf(stderr, "test_payload: wrong ethernet "
+			"type: 0x%x!\n", ntohs(eth->h_proto));
+		exit(1);
+	}
+}
+
+static void create_payload(void *pay, size_t *len)
+{
+	int i;
+	struct ethhdr *eth = pay;
+	struct iphdr *ip = pay + sizeof(*eth);
+
+	/* Let's create some broken crap that still passes
+	 * our BPF filter.
+	 */
+
+	*len = DATA_LEN + 42;
+
+	memset(pay, 0xff, ETH_ALEN * 2);
+	eth->h_proto = htons(ETH_P_IP);
+
+	for (i = 0; i < sizeof(*ip); ++i)
+		((uint8_t *) pay)[i + sizeof(*eth)] = (uint8_t) rand();
+
+	ip->ihl = 5;
+	ip->version = 4;
+	ip->protocol = 0x11;
+	ip->frag_off = 0;
+	ip->ttl = 64;
+	ip->tot_len = htons((uint16_t) *len - sizeof(*eth));
+
+	ip->saddr = htonl(INADDR_LOOPBACK);
+	ip->daddr = htonl(INADDR_LOOPBACK);
+
+	memset(pay + sizeof(*eth) + sizeof(*ip),
+	       DATA_CHAR, DATA_LEN);
+}
+
+static inline int __v1_rx_kernel_ready(struct tpacket_hdr *hdr)
+{
+	return ((hdr->tp_status & TP_STATUS_USER) == TP_STATUS_USER);
+}
+
+static inline void __v1_rx_user_ready(struct tpacket_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_KERNEL;
+	__sync_synchronize();
+}
+
+static inline int __v2_rx_kernel_ready(struct tpacket2_hdr *hdr)
+{
+	return ((hdr->tp_status & TP_STATUS_USER) == TP_STATUS_USER);
+}
+
+static inline void __v2_rx_user_ready(struct tpacket2_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_KERNEL;
+	__sync_synchronize();
+}
+
+static inline int __v1_v2_rx_kernel_ready(void *base, int version)
+{
+	switch (version) {
+	case TPACKET_V1:
+		return __v1_rx_kernel_ready(base);
+	case TPACKET_V2:
+		return __v2_rx_kernel_ready(base);
+	default:
+		bug_on(1);
+		return 0;
+	}
+}
+
+static inline void __v1_v2_rx_user_ready(void *base, int version)
+{
+	switch (version) {
+	case TPACKET_V1:
+		__v1_rx_user_ready(base);
+		break;
+	case TPACKET_V2:
+		__v2_rx_user_ready(base);
+		break;
+	}
+}
+
+static void walk_v1_v2_rx(int sock, struct ring *ring)
+{
+	struct pollfd pfd;
+	int udp_sock[2];
+	union frame_map ppd;
+	unsigned int frame_num = 0;
+
+	bug_on(ring->type != PACKET_RX_RING);
+
+	pair_udp_open(udp_sock, PORT_BASE);
+	pair_udp_setfilter(sock);
+
+	memset(&pfd, 0, sizeof(pfd));
+	pfd.fd = sock;
+	pfd.events = POLLIN | POLLERR;
+	pfd.revents = 0;
+
+	pair_udp_send(udp_sock, NUM_PACKETS);
+
+	while (total_packets < NUM_PACKETS * 2) {
+		while (__v1_v2_rx_kernel_ready(ring->rd[frame_num].iov_base,
+					       ring->version)) {
+			ppd.raw = ring->rd[frame_num].iov_base;
+
+			switch (ring->version) {
+			case TPACKET_V1:
+				test_payload((uint8_t *) ppd.raw + ppd.v1->tp_h.tp_mac,
+					     ppd.v1->tp_h.tp_snaplen);
+				total_bytes += ppd.v1->tp_h.tp_snaplen;
+				break;
+
+			case TPACKET_V2:
+				test_payload((uint8_t *) ppd.raw + ppd.v2->tp_h.tp_mac,
+					     ppd.v2->tp_h.tp_snaplen);
+				total_bytes += ppd.v2->tp_h.tp_snaplen;
+				break;
+			}
+
+			status_bar_update();
+			total_packets++;
+
+			__v1_v2_rx_user_ready(ppd.raw, ring->version);
+
+			frame_num = (frame_num + 1) % ring->rd_num;
+		}
+
+		poll(&pfd, 1, 1);
+	}
+
+	pair_udp_close(udp_sock);
+
+	if (total_packets != 2 * NUM_PACKETS) {
+		fprintf(stderr, "walk_v%d_rx: received %u out of %u pkts\n",
+			ring->version, total_packets, NUM_PACKETS);
+		exit(1);
+	}
+
+	fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, total_bytes >> 1);
+}
+
+static inline int __v1_tx_kernel_ready(struct tpacket_hdr *hdr)
+{
+	return ((hdr->tp_status & TP_STATUS_AVAILABLE) == TP_STATUS_AVAILABLE);
+}
+
+static inline void __v1_tx_user_ready(struct tpacket_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_SEND_REQUEST;
+	__sync_synchronize();
+}
+
+static inline int __v2_tx_kernel_ready(struct tpacket2_hdr *hdr)
+{
+	return ((hdr->tp_status & TP_STATUS_AVAILABLE) == TP_STATUS_AVAILABLE);
+}
+
+static inline void __v2_tx_user_ready(struct tpacket2_hdr *hdr)
+{
+	hdr->tp_status = TP_STATUS_SEND_REQUEST;
+	__sync_synchronize();
+}
+
+static inline int __v1_v2_tx_kernel_ready(void *base, int version)
+{
+	switch (version) {
+	case TPACKET_V1:
+		return __v1_tx_kernel_ready(base);
+	case TPACKET_V2:
+		return __v2_tx_kernel_ready(base);
+	default:
+		bug_on(1);
+		return 0;
+	}
+}
+
+static inline void __v1_v2_tx_user_ready(void *base, int version)
+{
+	switch (version) {
+	case TPACKET_V1:
+		__v1_tx_user_ready(base);
+		break;
+	case TPACKET_V2:
+		__v2_tx_user_ready(base);
+		break;
+	}
+}
+
+static void __v1_v2_set_packet_loss_discard(int sock)
+{
+	int ret, discard = 1;
+
+	ret = setsockopt(sock, SOL_PACKET, PACKET_LOSS, (void *) &discard,
+			 sizeof(discard));
+	if (ret == -1) {
+		perror("setsockopt");
+		exit(1);
+	}
+}
+
+static void walk_v1_v2_tx(int sock, struct ring *ring)
+{
+	struct pollfd pfd;
+	int rcv_sock, ret;
+	size_t packet_len;
+	union frame_map ppd;
+	char packet[1024];
+	unsigned int frame_num = 0, got = 0;
+	struct sockaddr_ll ll = {
+		.sll_family = PF_PACKET,
+		.sll_halen = ETH_ALEN,
+	};
+
+	bug_on(ring->type != PACKET_TX_RING);
+	bug_on(ring->rd_num < NUM_PACKETS);
+
+	rcv_sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+	if (rcv_sock == -1) {
+		perror("socket");
+		exit(1);
+	}
+
+	pair_udp_setfilter(rcv_sock);
+
+	ll.sll_ifindex = if_nametoindex("lo");
+	ret = bind(rcv_sock, (struct sockaddr *) &ll, sizeof(ll));
+	if (ret == -1) {
+		perror("bind");
+		exit(1);
+	}
+
+	memset(&pfd, 0, sizeof(pfd));
+	pfd.fd = sock;
+	pfd.events = POLLOUT | POLLERR;
+	pfd.revents = 0;
+
+	total_packets = NUM_PACKETS;
+	create_payload(packet, &packet_len);
+
+	while (total_packets > 0) {
+		while (__v1_v2_tx_kernel_ready(ring->rd[frame_num].iov_base,
+					       ring->version) &&
+		       total_packets > 0) {
+			ppd.raw = ring->rd[frame_num].iov_base;
+
+			switch (ring->version) {
+			case TPACKET_V1:
+				ppd.v1->tp_h.tp_snaplen = packet_len;
+				ppd.v1->tp_h.tp_len = packet_len;
+
+				memcpy((uint8_t *) ppd.raw + TPACKET_HDRLEN -
+				       sizeof(struct sockaddr_ll), packet,
+				       packet_len);
+				total_bytes += ppd.v1->tp_h.tp_snaplen;
+				break;
+
+			case TPACKET_V2:
+				ppd.v2->tp_h.tp_snaplen = packet_len;
+				ppd.v2->tp_h.tp_len = packet_len;
+
+				memcpy((uint8_t *) ppd.raw + TPACKET2_HDRLEN -
+				       sizeof(struct sockaddr_ll), packet,
+				       packet_len);
+				total_bytes += ppd.v2->tp_h.tp_snaplen;
+				break;
+			}
+
+			status_bar_update();
+			total_packets--;
+
+			__v1_v2_tx_user_ready(ppd.raw, ring->version);
+
+			frame_num = (frame_num + 1) % ring->rd_num;
+		}
+
+		poll(&pfd, 1, 1);
+	}
+
+	bug_on(total_packets != 0);
+
+	ret = sendto(sock, NULL, 0, 0, NULL, 0);
+	if (ret == -1) {
+		perror("sendto");
+		exit(1);
+	}
+
+	while (total_packets < NUM_PACKETS &&
+	       (ret = recvfrom(rcv_sock, packet, sizeof(packet),
+			       0, NULL, NULL)) > 0) {
+		got += ret;
+		test_payload(packet, ret);
+
+		status_bar_update();
+		total_packets++;
+	}
+
+	close(rcv_sock);
+
+	if (total_packets != NUM_PACKETS) {
+		fprintf(stderr, "walk_v%d_tx: received %u out of %u pkts\n",
+			ring->version, total_packets, NUM_PACKETS);
+		exit(1);
+	}
+
+	fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, got);
+}
+
+static void walk_v1_v2(int sock, struct ring *ring)
+{
+	if (ring->type == PACKET_RX_RING)
+		walk_v1_v2_rx(sock, ring);
+	else
+		walk_v1_v2_tx(sock, ring);
+}
+
+static uint64_t __v3_prev_block_seq_num = 0;
+
+void __v3_test_block_seq_num(struct block_desc *pbd)
+{
+	if (__v3_prev_block_seq_num + 1 != BLOCK_SNUM(pbd)) {
+		fprintf(stderr, "\nprev_block_seq_num:%"PRIu64", expected "
+			"seq:%"PRIu64" != actual seq:%"PRIu64"\n",
+			__v3_prev_block_seq_num, __v3_prev_block_seq_num + 1,
+			(uint64_t) BLOCK_SNUM(pbd));
+		exit(1);
+	}
+
+	__v3_prev_block_seq_num = BLOCK_SNUM(pbd);
+}
+
+static void __v3_test_block_len(struct block_desc *pbd, uint32_t bytes, int block_num)
+{
+	if (BLOCK_NUM_PKTS(pbd)) {
+		if (bytes != BLOCK_LEN(pbd)) {
+			fprintf(stderr, "\nblock:%u with %u packets, expected "
+				"len:%u != actual len:%u\n", block_num,
+				BLOCK_NUM_PKTS(pbd), bytes, BLOCK_LEN(pbd));
+			exit(1);
+		}
+	} else {
+		if (BLOCK_LEN(pbd) != BLOCK_PLUS_PRIV(13)) {
+			fprintf(stderr, "\nblock:%u, expected len:%lu != "
+				"actual len:%u\n", block_num,
+				(unsigned long) BLOCK_PLUS_PRIV(13), BLOCK_LEN(pbd));
+			exit(1);
+		}
+	}
+}
+
+static void __v3_test_block_header(struct block_desc *pbd, const int block_num)
+{
+	uint32_t block_status = BLOCK_STATUS(pbd);
+
+	if ((block_status & TP_STATUS_USER) == 0) {
+		fprintf(stderr, "\nblock %u: not in TP_STATUS_USER\n", block_num);
+		exit(1);
+	}
+
+	__v3_test_block_seq_num(pbd);
+}
+
+static void __v3_walk_block(struct block_desc *pbd, const int block_num)
+{
+	int num_pkts = BLOCK_NUM_PKTS(pbd), i;
+	unsigned long bytes = 0;
+	unsigned long bytes_with_padding = BLOCK_PLUS_PRIV(13);
+	struct tpacket3_hdr *ppd;
+
+	__v3_test_block_header(pbd, block_num);
+
+	ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd + BLOCK_O2FP(pbd));
+	for (i = 0; i < num_pkts; ++i) {
+		bytes += ppd->tp_snaplen;
+
+		if (ppd->tp_next_offset)
+			bytes_with_padding += ppd->tp_next_offset;
+		else
+			bytes_with_padding += ALIGN_8(ppd->tp_snaplen + ppd->tp_mac);
+
+		test_payload((uint8_t *) ppd + ppd->tp_mac, ppd->tp_snaplen);
+
+		status_bar_update();
+		total_packets++;
+
+		ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd + ppd->tp_next_offset);
+		__sync_synchronize();
+	}
+
+	__v3_test_block_len(pbd, bytes_with_padding, block_num);
+	total_bytes += bytes;
+}
+
+void __v3_flush_block(struct block_desc *pbd)
+{
+	BLOCK_STATUS(pbd) = TP_STATUS_KERNEL;
+	__sync_synchronize();
+}
+
+static void walk_v3_rx(int sock, struct ring *ring)
+{
+	unsigned int block_num = 0;
+	struct pollfd pfd;
+	struct block_desc *pbd;
+	int udp_sock[2];
+
+	bug_on(ring->type != PACKET_RX_RING);
+
+	pair_udp_open(udp_sock, PORT_BASE);
+	pair_udp_setfilter(sock);
+
+	memset(&pfd, 0, sizeof(pfd));
+	pfd.fd = sock;
+	pfd.events = POLLIN | POLLERR;
+	pfd.revents = 0;
+
+	pair_udp_send(udp_sock, NUM_PACKETS);
+
+	while (total_packets < NUM_PACKETS * 2) {
+		pbd = (struct block_desc *) ring->rd[block_num].iov_base;
+
+		while ((BLOCK_STATUS(pbd) & TP_STATUS_USER) == 0)
+			poll(&pfd, 1, 1);
+
+		__v3_walk_block(pbd, block_num);
+		__v3_flush_block(pbd);
+
+		block_num = (block_num + 1) % ring->rd_num;
+	}
+
+	pair_udp_close(udp_sock);
+
+	if (total_packets != 2 * NUM_PACKETS) {
+		fprintf(stderr, "walk_v3_rx: received %u out of %u pkts\n",
+			total_packets, NUM_PACKETS);
+		exit(1);
+	}
+
+	fprintf(stderr, " %u pkts (%u bytes)", NUM_PACKETS, total_bytes >> 1);
+}
+
+static void walk_v3(int sock, struct ring *ring)
+{
+	if (ring->type == PACKET_RX_RING)
+		walk_v3_rx(sock, ring);
+	else
+		bug_on(1);
+}
+
+static void __v1_v2_fill(struct ring *ring, unsigned int blocks)
+{
+	ring->req.tp_block_size = getpagesize() << 2;
+	ring->req.tp_frame_size = TPACKET_ALIGNMENT << 7;
+	ring->req.tp_block_nr = blocks;
+
+	ring->req.tp_frame_nr = ring->req.tp_block_size /
+				ring->req.tp_frame_size *
+				ring->req.tp_block_nr;
+
+	ring->mm_len = ring->req.tp_block_size * ring->req.tp_block_nr;
+	ring->walk = walk_v1_v2;
+	ring->rd_num = ring->req.tp_frame_nr;
+	ring->flen = ring->req.tp_frame_size;
+}
+
+static void __v3_fill(struct ring *ring, unsigned int blocks)
+{
+	ring->req3.tp_retire_blk_tov = 64;
+	ring->req3.tp_sizeof_priv = 13;
+	ring->req3.tp_feature_req_word |= TP_FT_REQ_FILL_RXHASH;
+
+	ring->req3.tp_block_size = getpagesize() << 2;
+	ring->req3.tp_frame_size = TPACKET_ALIGNMENT << 7;
+	ring->req3.tp_block_nr = blocks;
+
+	ring->req3.tp_frame_nr = ring->req3.tp_block_size /
+				 ring->req3.tp_frame_size *
+				 ring->req3.tp_block_nr;
+
+	ring->mm_len = ring->req3.tp_block_size * ring->req3.tp_block_nr;
+	ring->walk = walk_v3;
+	ring->rd_num = ring->req3.tp_block_nr;
+	ring->flen = ring->req3.tp_block_size;
+}
+
+static void setup_ring(int sock, struct ring *ring, int version, int type)
+{
+	int ret = 0;
+	unsigned int blocks = 256;
+
+	ring->type = type;
+	ring->version = version;
+
+	switch (version) {
+	case TPACKET_V1:
+	case TPACKET_V2:
+		if (type == PACKET_TX_RING)
+			__v1_v2_set_packet_loss_discard(sock);
+		__v1_v2_fill(ring, blocks);
+		ret = setsockopt(sock, SOL_PACKET, type, &ring->req,
+				 sizeof(ring->req));
+		break;
+
+	case TPACKET_V3:
+		__v3_fill(ring, blocks);
+		ret = setsockopt(sock, SOL_PACKET, type, &ring->req3,
+				 sizeof(ring->req3));
+		break;
+	}
+
+	if (ret == -1) {
+		perror("setsockopt");
+		exit(1);
+	}
+
+	ring->rd_len = ring->rd_num * sizeof(*ring->rd);
+	ring->rd = malloc(ring->rd_len);
+	if (ring->rd == NULL) {
+		perror("malloc");
+		exit(1);
+	}
+
+	total_packets = 0;
+	total_bytes = 0;
+}
+
+static void mmap_ring(int sock, struct ring *ring)
+{
+	int i;
+
+	ring->mm_space = mmap(0, ring->mm_len, PROT_READ | PROT_WRITE,
+			      MAP_SHARED | MAP_LOCKED | MAP_POPULATE, sock, 0);
+	if (ring->mm_space == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	memset(ring->rd, 0, ring->rd_len);
+	for (i = 0; i < ring->rd_num; ++i) {
+		ring->rd[i].iov_base = ring->mm_space + (i * ring->flen);
+		ring->rd[i].iov_len = ring->flen;
+	}
+}
+
+static void bind_ring(int sock, struct ring *ring)
+{
+	int ret;
+
+	ring->ll.sll_family = PF_PACKET;
+	ring->ll.sll_protocol = htons(ETH_P_ALL);
+	ring->ll.sll_ifindex = if_nametoindex("lo");
+	ring->ll.sll_hatype = 0;
+	ring->ll.sll_pkttype = 0;
+	ring->ll.sll_halen = 0;
+
+	ret = bind(sock, (struct sockaddr *) &ring->ll, sizeof(ring->ll));
+	if (ret == -1) {
+		perror("bind");
+		exit(1);
+	}
+}
+
+static void walk_ring(int sock, struct ring *ring)
+{
+	ring->walk(sock, ring);
+}
+
+static void unmap_ring(int sock, struct ring *ring)
+{
+	munmap(ring->mm_space, ring->mm_len);
+	free(ring->rd);
+}
+
+static int test_kernel_bit_width(void)
+{
+	char in[512], *ptr;
+	int num = 0, fd;
+	ssize_t ret;
+
+	fd = open("/proc/kallsyms", O_RDONLY);
+	if (fd == -1) {
+		perror("open");
+		exit(1);
+	}
+
+	ret = read(fd, in, sizeof(in));
+	if (ret <= 0) {
+		perror("read");
+		exit(1);
+	}
+
+	close(fd);
+
+	ptr = in;
+	while (!isspace(*ptr)) {
+		num++;
+		ptr++;
+	}
+
+	return num * 4;
+}
+
+static int test_user_bit_width(void)
+{
+	return __WORDSIZE;
+}
+
+static const char *tpacket_str[] = {
+	[TPACKET_V1] = "TPACKET_V1",
+	[TPACKET_V2] = "TPACKET_V2",
+	[TPACKET_V3] = "TPACKET_V3",
+};
+
+static const char *type_str[] = {
+	[PACKET_RX_RING] = "PACKET_RX_RING",
+	[PACKET_TX_RING] = "PACKET_TX_RING",
+};
+
+static int test_tpacket(int version, int type)
+{
+	int sock;
+	struct ring ring;
+
+	fprintf(stderr, "test: %s with %s ", tpacket_str[version],
+		type_str[type]);
+	fflush(stderr);
+
+	if (version == TPACKET_V1 &&
+	    test_kernel_bit_width() != test_user_bit_width()) {
+		fprintf(stderr, "test: skip %s %s since user and kernel "
+			"space have different bit width\n",
+			tpacket_str[version], type_str[type]);
+		return 0;
+	}
+
+	sock = pfsocket(version);
+	memset(&ring, 0, sizeof(ring));
+	setup_ring(sock, &ring, version, type);
+	mmap_ring(sock, &ring);
+	bind_ring(sock, &ring);
+	walk_ring(sock, &ring);
+	unmap_ring(sock, &ring);
+	close(sock);
+
+	fprintf(stderr, "\n");
+	return 0;
+}
+
+int main(void)
+{
+	int ret = 0;
+
+	ret |= test_tpacket(TPACKET_V1, PACKET_RX_RING);
+	ret |= test_tpacket(TPACKET_V1, PACKET_TX_RING);
+
+	ret |= test_tpacket(TPACKET_V2, PACKET_RX_RING);
+	ret |= test_tpacket(TPACKET_V2, PACKET_TX_RING);
+
+	ret |= test_tpacket(TPACKET_V3, PACKET_RX_RING);
+
+	if (ret)
+		return 1;
+
+	printf("OK. All tests passed\n");
+	return 0;
+}
diff --git a/tools/testing/selftests/net/run_afpackettests b/tools/testing/selftests/net/run_afpackettests
index 7907824..5246e78 100644
--- a/tools/testing/selftests/net/run_afpackettests
+++ b/tools/testing/selftests/net/run_afpackettests
@@ -14,3 +14,13 @@ if [ $? -ne 0 ]; then
 else
 	echo "[PASS]"
 fi
+
+echo "--------------------"
+echo "running psock_tpacket test"
+echo "--------------------"
+./psock_tpacket
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+else
+	echo "[PASS]"
+fi
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH 1/6] mac802154: Immediately retry sending failed packets
From: Werner Almesberger @ 2013-04-02 23:13 UTC (permalink / raw)
  To: Alan Ott
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, David S. Miller,
	linux-zigbee-devel, linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <515B4D79.40805-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>

Alan Ott wrote:
> it's now my opinion that we should _not_ try to retransmit at
> all in mac802154/tx.c.

I think the currently blocking workqueue design is ugly and
quite contrary to how most of the rest of the stack works. So
anything that kills it has my blessing :-)

I do wonder though why it was done like this in the first place.
Just for convenience?

If we want to move towards an asynchronous interface, it could
exist in parallel with the current one. That way, drivers could
be migrated one by one.

Having said that, the errors you get there may not be failed
single transmissions on the air but some form of congestion in
the driver or a problem with the device. But I don't think
that's a valid reason for retrying the transmission at that
level.

- Werner

^ permalink raw reply

* Re: [PATCH net-next 5/7] r8169: add a new chip for RTL8111G
From: Francois Romieu @ 2013-04-02 23:20 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, linux-kernel
In-Reply-To: <2808A1ABA9A5470C8CB9534309588984@realtek.com.tw>

hayeswang <hayeswang@realtek.com> :
> Francois Romieu [mailto:romieu@fr.zoreil.com] 
[...]
> > There is close to zero added value for this stuff in the kernel.
> > You may as well move it completely into the firmware.
> 
> Do you mean all of the phy settings ? I have checked these settings with
> our hw engineers. These are not firmware. 

Undocumented configuration data which is subject to change over time ?

No one outside of Realtek can make any sense of this opaque pile of data.
There is no point in me or anybody else rubber stamping it for inclusion.

-- 
Ueimor

^ permalink raw reply

* Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Hannes Frederic Sowa @ 2013-04-02 23:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Huang, Xiong, Ben Hutchings, Anders Boström,
	netdev@vger.kernel.org, 565404@bugs.debian.org
In-Reply-To: <1364942093.5113.189.camel@edumazet-glaptop>

On Tue, Apr 02, 2013 at 03:34:53PM -0700, Eric Dumazet wrote:
> On Wed, 2013-04-03 at 00:15 +0200, Hannes Frederic Sowa wrote:
> > On Tue, Apr 02, 2013 at 03:00:38PM -0700, Eric Dumazet wrote:
> > > On Tue, 2013-04-02 at 23:15 +0200, Hannes Frederic Sowa wrote:
> > > 
> > > > The error vanishes as soon as I put a gso size limit of MAX_TX_BUF_LEN
> > > > in the driver. MAX_TX_BUF_LEN seems to be arbitrarily set to 0x2000. I
> > > > can even raise it to 0x3000 and don't see any tcp retransmits. Do you
> > > > have an advice on how to size this value (e.g. should we switch to the
> > > > windows values)?
> > > 
> > > This looks like an overflow error...
> > 
> > Thanks for your input, Eric.
> > 
> > I am limited in my time to work on this today but nonetheless just tested
> > your patch without any of my changes and count a lot of TcpRetransSegs
> > again. Either there is really some hardware limitation or another
> > overflow.
> 
> Another overflow...
> 
> Really I don't understand why people use u16 instead of u32.
> 
> u16 is slower most of the time, and more prone to overflows.

I just gave your patch a test, and the TCP retransmitted segments counter
is still increasing quickly.

The maximum skb length hitting the device in my tests is 23234 bytes (as
reported by ftrace), so I actually think this is a device limitation.
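
Eric's remark that u16 is more prone to overflows is easy to demonstrate in isolation. This is a hypothetical sketch, not atl1e code, and sum_len_u16() and sum_len_u32() are illustrative names. It adds three lengths of 23234 bytes (the maximum skb length reported above) into a 16-bit counter and a 32-bit counter. The 16-bit total silently wraps modulo 65536. This thread does not establish whether atl1e has a counter like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator with a 16-bit field: each addition is
 * converted back to uint16_t, so the total wraps modulo 65536. */
static uint16_t sum_len_u16(const uint32_t *lens, int n)
{
	uint16_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += lens[i];
	return total;
}

/* Same accumulation with a 32-bit field. */
static uint32_t sum_len_u32(const uint32_t *lens, int n)
{
	uint32_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += lens[i];
	return total;
}
```

For three 23234-byte lengths the 32-bit sum is 69702, while the 16-bit sum wraps to 4166.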

^ permalink raw reply

* Re: [PATCH net-next 5/7] r8169: add a new chip for RTL8111G
From: David Miller @ 2013-04-02 23:26 UTC (permalink / raw)
  To: romieu; +Cc: hayeswang, netdev, linux-kernel
In-Reply-To: <20130402232008.GB16612@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 3 Apr 2013 01:20:08 +0200

> hayeswang <hayeswang@realtek.com> :
>> Francois Romieu [mailto:romieu@fr.zoreil.com] 
> [...]
>> > There is close to zero added value for this stuff in the kernel.
>> > You may as well move it completely into the firmware.
>> 
>> Do you mean all of the phy settings ? I have checked these settings with
>> our hw engineers. These are not firmware. 
> 
> Undocumented configuration data which is subject to change over time ?
> 
> No one outside of Realtek can make any sense of this opaque pile of data.
> There is no point in me or anybody else rubber stamping it for inclusion.

Right, so either document all of these indirect registers being programmed
in the PHY or move it to firmware.

^ permalink raw reply
