Netdev List
* Re: [PATCH v3 0/3] can: at91_can: fix for errata 50.2.6.3 & 50.3.5.3
From: Marc Kleine-Budde @ 2011-01-24 14:19 UTC (permalink / raw)
  To: David Miller
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1295878532-15769-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>


On 01/24/2011 03:15 PM, Marc Kleine-Budde wrote:
> Hello,
> 
> as promised I've implemented the proposed workaround for the errata
> 50.2.6.3 & 50.3.5.3:
> "Contents of Mailbox 0 can be sent Even if Mailbox is Disabled"
> 
> This means that under high bus load the contents of mailbox 0 can be sent
> to the bus. And it does happen, even with the mainline driver, where
> mailbox 0 is a receive mailbox. The erratum proposes not using mailbox 0
> and loading it with an unused can_id that will not disturb the bus.
> 
> The first patch cleans up the driver without any functional changes, so
> that the mailbox 0 can be disabled in the second patch. The third patch
> adds a sysfs parameter to the driver, so that the identifier of mailbox 0
> can be configured.
> 
> This series applies to net-2.6/master. It has been tested on a ronetix pm9263
> board against a PCI-SJA1000 card with the canfdtest utility and on custom
> at91 boards against each other.

I've updated the patch series in my git-repo, too.

The following changes since commit b30532515f0a62bfe17207ab00883dd262497006:

  bonding: Ensure that we unshare skbs prior to calling pskb_may_pull (2011-01-20 16:45:56 -0800)

are available in the git repository at:
  git://git.pengutronix.de/git/mkl/linux-2.6.git can/at91_can-for-net-2.6

Marc Kleine-Budde (3):
      can: at91_can: clean up usage of AT91_MB_RX_FIRST and AT91_MB_RX_NUM
      can: at91_can: don't use mailbox 0
      can: at91_can: make can_id of mailbox 0 configurable

 Documentation/ABI/testing/sysfs-platform-at91 |   25 +++++
 drivers/net/can/at91_can.c                    |  138 ++++++++++++++++++++-----
 2 files changed, 137 insertions(+), 26 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-platform-at91

regards, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



^ permalink raw reply

* Re: [PATCH v5] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Eric Dumazet @ 2011-01-24 15:07 UTC (permalink / raw)
  To: Po-Yu Chuang
  Cc: netdev, linux-kernel, bhutchings, joe, dilinger, mirqus,
	Po-Yu Chuang
In-Reply-To: <1295872799-1637-1-git-send-email-ratbert.chuang@gmail.com>

On Monday, 24 January 2011 at 20:39 +0800, Po-Yu Chuang wrote:
> From: Po-Yu Chuang <ratbert@faraday-tech.com>


> +static int ftmac100_xmit(struct ftmac100 *priv, struct sk_buff *skb,
> +			 dma_addr_t map)
> +{
> +	struct net_device *netdev = priv->netdev;
> +	struct ftmac100_txdes *txdes;
> +	unsigned int len = (skb->len < ETH_ZLEN) ? ETH_ZLEN : skb->len;
> +
> +	txdes = ftmac100_current_txdes(priv);
> +	ftmac100_tx_pointer_advance(priv);
> +
> +	/* setup TX descriptor */
> +
> +	spin_lock(&priv->tx_lock);
> +	ftmac100_txdes_set_skb(txdes, skb);
> +	ftmac100_txdes_set_dma_addr(txdes, map);
> +
> +	ftmac100_txdes_set_first_segment(txdes);
> +	ftmac100_txdes_set_last_segment(txdes);
> +	ftmac100_txdes_set_txint(txdes);
> +	ftmac100_txdes_set_buffer_size(txdes, len);
> +
> +	priv->tx_pending++;
> +	if (priv->tx_pending == TX_QUEUE_ENTRIES) {
> +		if (net_ratelimit())
> +			netdev_info(netdev, "tx queue full\n");

Hmm, I guess you didn't test your driver with a pktgen flood ;)

This 'netdev_info(netdev, "tx queue full\n");' is not necessary, since
it's a pretty normal condition for a driver (to fill its TX ring buffer).

> +
> +		netif_stop_queue(netdev);
> +	}
> +
> +	/* start transmit */
> +	ftmac100_txdes_set_dma_own(txdes);
> +	spin_unlock(&priv->tx_lock);
> +
> +	ftmac100_txdma_start_polling(priv);
> +
> +	return NETDEV_TX_OK;
> +}
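
The point about a full TX ring being a normal condition can be sketched with a
toy model. This is not the FTMAC100 driver: struct toy_tx_ring,
toy_tx_enqueue() and toy_tx_complete() are hypothetical names, and the ring
size of 4 is arbitrary. It only shows the queue being stopped silently when
the ring fills and woken again on completion, with no log message.

```c
#include <stdbool.h>

#define TOY_TX_QUEUE_ENTRIES 4	/* arbitrary size, for illustration only */

/* Hypothetical stand-in for a driver's private TX state. */
struct toy_tx_ring {
	unsigned int pending;	/* descriptors currently owned by hardware */
	bool stopped;		/* models netif_stop_queue() having been called */
};

/* Queue one packet; returns false if the queue was already stopped. */
static bool toy_tx_enqueue(struct toy_tx_ring *ring)
{
	if (ring->stopped)
		return false;

	ring->pending++;
	/* A full ring is a normal condition: stop the queue, do not log. */
	if (ring->pending == TOY_TX_QUEUE_ENTRIES)
		ring->stopped = true;
	return true;
}

/* Completion path: one descriptor is done, so there is room again. */
static void toy_tx_complete(struct toy_tx_ring *ring)
{
	if (ring->pending == 0)
		return;
	ring->pending--;
	ring->stopped = false;	/* models netif_wake_queue() */
}
```

Under a flood the ring fills almost immediately, so a per-event message would
only add noise even when rate limited.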

^ permalink raw reply

* Re: 2.6.37 regression: adding main interface to a bridge breaks vlan interface RX
From: Maciej Rutecki @ 2011-01-24 15:25 UTC (permalink / raw)
  To: Jesse Gross; +Cc: Simon Arlott, netdev, Linux Kernel Mailing List
In-Reply-To: <AANLkTimTz=kMmJ=YhyJv28WUrsbN=ygBt9e7dMJ3KGqB@mail.gmail.com>

On Sunday, 23 January 2011 at 22:29:02, Jesse Gross wrote:
> On Sun, Jan 23, 2011 at 9:45 AM, Maciej Rutecki
> 
> <maciej.rutecki@gmail.com> wrote:
> > I created a Bugzilla entry at
> > https://bugzilla.kernel.org/show_bug.cgi?id=27432
> > for your bug report, please add your address to the CC list in there,
> > thanks!
> 
> This isn't a bug - the change resolved behavior that varied depending
> on what NIC was in use.

Thanks for the update. Closing.

Regards
-- 
Maciej Rutecki
http://www.maciek.unixy.pl

^ permalink raw reply

* [PATCH net-next-2.6] veth: remove unneeded ifname code from veth_newlink()
From: Jiri Pirko @ 2011-01-24 15:45 UTC (permalink / raw)
  To: netdev; +Cc: davem, xemul

The code is not needed because tb[IFLA_IFNAME] is already
processed in rtnl_newlink(). Remove this redundancy.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index cc83fa7..105d7f0 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -403,17 +403,6 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	if (tb[IFLA_ADDRESS] == NULL)
 		random_ether_addr(dev->dev_addr);
 
-	if (tb[IFLA_IFNAME])
-		nla_strlcpy(dev->name, tb[IFLA_IFNAME], IFNAMSIZ);
-	else
-		snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d");
-
-	if (strchr(dev->name, '%')) {
-		err = dev_alloc_name(dev, dev->name);
-		if (err < 0)
-			goto err_alloc_name;
-	}
-
 	err = register_netdevice(dev);
 	if (err < 0)
 		goto err_register_dev;
@@ -433,7 +422,6 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 
 err_register_dev:
 	/* nothing to do */
-err_alloc_name:
 err_configure_peer:
 	unregister_netdevice(peer);
 	return err;

^ permalink raw reply related

* Re: [net-2.6 PATCH 1/2] net: dcbnl: remove redundant DCB_CAP_DCBX_STATIC bit
From: John Fastabend @ 2011-01-24 15:52 UTC (permalink / raw)
  To: Shmulik Ravid; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <1295882871.25104.20.camel@lb-tlvb-shmulik.il.broadcom.com>

On 1/24/2011 7:27 AM, Shmulik Ravid wrote:
> 
> On Sun, 2011-01-23 at 21:46 -0800, John Fastabend wrote:
>> On 1/23/2011 8:53 AM, Shmulik Ravid wrote:
>>>
>>> On Fri, 2011-01-21 at 18:52 -0800, John Fastabend wrote:
>>>> On 1/21/2011 6:35 PM, John Fastabend wrote:
>>>>> Remove redundant DCB_CAP_DCBX_STATIC bit in DCB capabilities
>>>>>
>>>>> Setting this bit indicates that no embedded DCBx engine is
>>>>> present and the hardware can not be configured. This is the
>>>>> same as having none of the DCB capability flags set or simply
>>>>> not implementing the dcbnl ops at all.
>>>>>
>>>>> This patch removes this bit. The bit has not made a stable
>>>>> release yet so removing it should not be an issue with
>>>>> existing apps.
>>>>>
>>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>>> CC: Shmulik Ravid <shmulikr@broadcom.com>
>>>>> ---
>>>>>
>>>>

Dave, please drop this patch. Sorry for the noise.

[...]

>> We have an advertise bit in userspace that can be set and cleared to
>> do something similar for host based agents. I think for pg and application
>> data you can get the same behavior by setting the device to not willing.
>>
> True, but this requires a proper DCBx peer. The STATIC option is a bit
> stronger.

At least in the PG case the CEE spec says the local configuration should be
used[1]. Application is a bit more vague in my opinion[2].

> 
>> However for PFC it could potentially be useful. But how would the
>> user set this mode? This is a capabilities bit indicating the device
>> supports this. Is there a way to subsequently put the device in this
>> mode?
> You can set this mode by specifying this attribute in the set_dcbx
> operation. The input to set_dcbx should be a subset of the advertised
> dcbx attributes.
> 

OK, this works for me. Shmulik, thanks for the explanation.

[1] 3.1.4. http://www.ieee802.org/1/files/public/docs2008/az-wadekar-dcbx-capability-exchange-discovery-protocol-1108-v1.01.pdf

[2] 3.3.2. http://www.ieee802.org/1/files/public/docs2008/az-wadekar-dcbx-capability-exchange-discovery-protocol-1108-v1.01.pdf

^ permalink raw reply

* Re: netfilter: marking IPv6 packets sends them to the wrong interface
From: Patrick McHardy @ 2011-01-24 16:10 UTC (permalink / raw)
  To: Mario 'BitKoenig' Holbe, netfilter-devel, linux-kernel,
	NetDev
In-Reply-To: <20110124143518.GA2616@darkside.kls.lan>

On 24.01.2011 15:35, Mario 'BitKoenig' Holbe wrote:
> On Mon, Jan 24, 2011 at 02:46:57PM +0100, Patrick McHardy wrote:
>> On 23.01.2011 13:21, Mario 'BitKoenig' Holbe wrote:
>>> Without marking everything runs as it should be.
>>> Marking eth0 packets results in all advertisements transmitted via eth1.
>>> The behaviour goes back to normal as soon as the marking disappears.
>>> I also tried marking with 0xff00 instead of 1 - same results.
>> That probably means that we're not using the correct keys
>> when rerouting in ip6_route_me_harder(). Just for testing,
>> please try to disable the ip6_route_me_harder() call in
>> net/ipv6/netfilter/ip6table_mangle.c::ip6t_mangle_out().
> 
> Yes, disabling the ip6_route_me_harder() call in ip6t_mangle_out()
> results in the advertisements being transmitted on the correct
> interfaces

Thanks. The problem appears to be that ip6_route_me_harder()
only uses the socket's oif for the route lookup when the
socket is bound to an interface, but radvd uses IPV6_PKTINFO
to specify the outgoing interface.

I guess netfilter shouldn't be overriding IPV6_PKTINFO, but
we unfortunately have neither an indication of this nor the
original route lookup keys available at the time the packet
is rerouted.

^ permalink raw reply

* RE: Using ethernet device as efficient small packet generator
From: Eric Dumazet @ 2011-01-24 16:34 UTC (permalink / raw)
  To: juice
  Cc: Brandeburg, Jesse, Loke, Chetan, Jon Zhou, Stephen Hemminger,
	netdev@vger.kernel.org
In-Reply-To: <30747065682effddc661b8cd235553d9.squirrel@www.liukuma.net>

On Monday, 24 January 2011 at 10:10 +0200, juice wrote:

> Result: OK: 12345544(c12344739+d804) nsec, 10000000 (60byte,0frags)
>   810008pps 388Mb/sec (388803840bps) errors: 0
> 

> 
> This is a bit better than the previous maximum of 750064pps / 360Mb/sec
> that I was able to achieve without tuning parameters with ethtool, but
> still not near the 1.1 Mpackets/s that should be doable with my card?

Please check what numbers you can get using the dummy0 device instead of
a real Ethernet driver.

Here (E5540 @ 2.53GHz), with clone = 1:

Result: OK: 34775941(c34775225+d716) nsec, 100000000 (60byte,0frags)
  2875551pps 1380Mb/sec (1380264480bps) errors: 0
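
The rates in the two pktgen results can be re-derived from the packet count
and elapsed time. A minimal sketch follows, with hypothetical helper names
toy_pps() and toy_bps(). One assumption is made loudly: although the output
labels the elapsed figure "nsec", the printed rates only work out if it is
treated as microseconds, so that is what the sketch does.

```c
#include <stdint.h>

/*
 * Packets per second, ASSUMING the elapsed figure is in microseconds
 * (an inference from the printed rates, not from the "nsec" label).
 */
static uint64_t toy_pps(uint64_t packets, uint64_t elapsed_usec)
{
	return packets * 1000000ULL / elapsed_usec;
}

/* Bits per second for a fixed packet size given in bytes. */
static uint64_t toy_bps(uint64_t pps, uint64_t pkt_bytes)
{
	return pps * pkt_bytes * 8;
}
```

With 60-byte packets, toy_pps(10000000, 12345544) gives 810008 and
toy_bps(810008, 60) gives 388803840, matching the first result; the dummy0
run gives 2875551 and 1380264480 the same way.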




^ permalink raw reply

* Re: netfilter: marking IPv6 packets sends them to the wrong interface
From: Mario 'BitKoenig' Holbe @ 2011-01-24 17:02 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, linux-kernel, NetDev
In-Reply-To: <4D3DA48A.2020605@trash.net>

On Mon, Jan 24, 2011 at 05:10:50PM +0100, Patrick McHardy wrote:
> > Yes, disabling the ip6_route_me_harder() call in ip6t_mangle_out()
> > results in the advertisements being transmitted on the correct
> > interfaces
> Thanks. The problem appears to be that ip6_route_me_harder()
> only uses the socket's oif for the route lookup when the
> socket is bound to an interface, but radvd uses IPV6_PKTINFO
> to specify the outgoing interface.
> 
> I guess netfilter shouldn't be overriding IPV6_PKTINFO, but
> we unfortunately have neither an indication of this nor the
> original route lookup keys available at the time the packet
> is rerouted.

Mh, I'm not sure, but I guess an indication of netfilter not overriding
IPV6_PKTINFO could be the fact that the source address does not
change...

From my 1st mail:
| # ip6tables -t mangle -A OUTPUT -o eth0 -j MARK --set-mark 1
| # /etc/init.d/radvd start
| -> eth0: <no traffic>
| -> eth1: fe80::2a0:c9ff:fee6:90ce > ff02::1: prefix 2001:6f8:90c:10::/64
| -> eth1: fe80::2d0:b7ff:fe06:6b36 > ff02::1: prefix 2001:6f8:90c:12::/64

fe80::2d0:b7ff:fe06:6b36 is the link-local address of eth0 set by radvd
in IPV6_PKTINFO as well. This, of course, is no guarantee for
ipi6_ifindex not being changed, but I believe if something would have
changed it, it would also have changed ipi6_addr.


Mario
-- 
Doing it right is no excuse for not meeting the schedule.
                                -- Plant Manager, Delphi Corporation


^ permalink raw reply

* Re: does intel X520-SR(ixgbe) support RSS on single VLAN?
From: Alexander Duyck @ 2011-01-24 17:09 UTC (permalink / raw)
  To: Rui; +Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net
In-Reply-To: <AANLkTinuozfPcAZStV-a=siqcLesqPHnGhVh=QitOnQs@mail.gmail.com>

On 1/24/2011 6:18 AM, Rui wrote:
> Hi,
> does the Intel X520-SR support RSS on a single VLAN?
>
> I tested with packets carrying 3 different VLAN IDs and priorities.
> What I saw is that all packets were always delivered to the same RxQ.
> It looks like these packets cannot get a different RSS index.
> Is any setting needed?

The X520 should have no problems hashing on a single VLAN-tagged frame.
However, the VLAN will not be part of the RSS hash. The only components
of the hash are the IPv4/IPv6 source and destination addresses and, if
the flow is TCP, the port numbers.

I would recommend testing with something like the "netperf -t TCP_CRR" 
test which should open a number of ports and spread traffic out between 
multiple queues.
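
The behaviour described above, where the VLAN tag does not influence queue
selection but the TCP ports do, can be sketched with a toy hash. This is not
the hash the hardware uses: FNV-1a is swapped in purely for illustration, and
struct toy_flow and toy_rx_queue() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_flow {
	uint32_t saddr, daddr;	/* IPv4 source and destination address */
	uint16_t sport, dport;	/* TCP source and destination port */
	uint16_t vlan_id;	/* present in the frame, but never hashed */
};

/* FNV-1a over a byte buffer; a stand-in for the NIC's real RSS hash. */
static uint32_t toy_fnv1a(const void *buf, size_t n, uint32_t h)
{
	const uint8_t *p = buf;

	while (n--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Pick an RX queue from the addresses and ports only. */
static unsigned int toy_rx_queue(const struct toy_flow *f, unsigned int nqueues)
{
	uint32_t h = 2166136261u;

	if (nqueues == 0)
		return 0;
	h = toy_fnv1a(&f->saddr, sizeof(f->saddr), h);
	h = toy_fnv1a(&f->daddr, sizeof(f->daddr), h);
	h = toy_fnv1a(&f->sport, sizeof(f->sport), h);
	h = toy_fnv1a(&f->dport, sizeof(f->dport), h);
	return h % nqueues;	/* vlan_id deliberately left out */
}
```

Two flows that differ only in vlan_id land on the same queue in this model,
while varying the source port spreads flows across queues.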

Thanks,

Alex

^ permalink raw reply

* [PATCH] GRO: fix merging a paged skb after non-paged skbs
From: Michal Schmidt @ 2011-01-24 17:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Herbert Xu, Ben Hutchings

Suppose that several linear skbs of the same flow were received by GRO. They
were thus merged into one skb with a frag_list. Then a new skb of the same flow
arrives, but it is a paged skb with data starting in its frags[].

Before adding the skb to the frag_list skb_gro_receive() will of course adjust
the skb to throw away the headers. It correctly modifies the page_offset and
size of the frag, but it leaves incorrect information in the skb:
 ->data_len is not decreased at all.
 ->len is decreased only by headlen, as if no change were done to the frag.
Later in a receiving process this causes skb_copy_datagram_iovec() to return
-EFAULT and this is seen in userspace as the result of the recv() syscall.

In practice the bug can be reproduced with the sfc driver. By default the
driver uses an adaptive scheme when it switches between using
napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
reproduced when under rx load with enough successful GRO merging the driver
decides to switch from the former to the latter.

Manual control is also possible, so reproducing this is easy with netcat:
 - on machine1 (with sfc): nc -l 12345 > /dev/null
 - on machine2: nc machine1 12345 < /dev/zero
 - on machine1:
   echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
   echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
 - See that nc has quit suddenly.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---
 net/core/skbuff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d31bb36..c231f5b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2746,7 +2746,7 @@ merge:
 	if (offset > headlen) {
 		skbinfo->frags[0].page_offset += offset - headlen;
 		skbinfo->frags[0].size -= offset - headlen;
-		offset = headlen;
+		skb->data_len -= offset - headlen;
 	}
 
 	__skb_pull(skb, offset);
-- 
1.7.1
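
The accounting problem described in the commit message can be modelled
outside the kernel. The sketch below is a toy, not the patch above and not
kernel code: struct toy_skb, its single frag, and both helper functions are
hypothetical. It contrasts the behaviour the commit message describes (frag
trimmed, counters left stale) with one way of keeping len, data_len and the
frag size in agreement.

```c
struct toy_skb {
	unsigned int len;	/* total bytes: linear part plus frags */
	unsigned int data_len;	/* bytes held in frags only */
	unsigned int frag0_size;/* size of the single frag in this model */
};

static unsigned int toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}

/* As described in the commit message: the frag shrinks, the counters do not. */
static void toy_pull_buggy(struct toy_skb *skb, unsigned int offset)
{
	unsigned int headlen = toy_headlen(skb);

	if (offset > headlen) {
		skb->frag0_size -= offset - headlen;
		offset = headlen;
	}
	skb->len -= offset;	/* models __skb_pull(skb, offset) */
}

/* One consistent alternative: bytes eaten from the frag leave both counters. */
static void toy_pull_consistent(struct toy_skb *skb, unsigned int offset)
{
	unsigned int headlen = toy_headlen(skb);

	if (offset > headlen) {
		unsigned int eat = offset - headlen;

		skb->frag0_size -= eat;
		skb->data_len -= eat;
		skb->len -= eat;
		offset = headlen;
	}
	skb->len -= offset;	/* never pulls more than the linear part */
}

/* The invariant a reader of the skb relies on in this model. */
static int toy_accounting_ok(const struct toy_skb *skb)
{
	return skb->data_len == skb->frag0_size && skb->len >= skb->data_len;
}
```

For a fully paged skb of 1000 bytes and a 54-byte header offset, the first
variant leaves len and data_len at 1000 while the frag holds 946 bytes; the
second leaves all three at 946.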


^ permalink raw reply related

* Re: netfilter: marking IPv6 packets sends them to the wrong interface
From: Patrick McHardy @ 2011-01-24 17:50 UTC (permalink / raw)
  To: Mario 'BitKoenig' Holbe, netfilter-devel, linux-kernel,
	NetDev
In-Reply-To: <20110124170213.GB2616@darkside.kls.lan>

On 24.01.2011 18:02, Mario 'BitKoenig' Holbe wrote:
> On Mon, Jan 24, 2011 at 05:10:50PM +0100, Patrick McHardy wrote:
>>> Yes, disabling the ip6_route_me_harder() call in ip6t_mangle_out()
>>> results in the advertisements being transmitted on the correct
>>> interfaces
>> Thanks. The problem appears to be that ip6_route_me_harder()
>> only uses the socket's oif for the route lookup when the
>> socket is bound to an interface, but radvd uses IPV6_PKTINFO
>> to specify the outgoing interface.
>>
>> I guess netfilter shouldn't be overriding IPV6_PKTINFO, but
>> we unfortunately have neither an indication of this nor the
>> original route lookup keys available at the time the packet
>> is rerouted.
> 
> Mh, I'm not sure, but I guess an indication of netfilter not overriding
> IPV6_PKTINFO could be the fact that the source address does not
> change...

No, ip6_route_me_harder() only attaches a new route to the packet,
the packet's contents are not changed.

> From my 1st mail:
> | # ip6tables -t mangle -A OUTPUT -o eth0 -j MARK --set-mark 1
> | # /etc/init.d/radvd start
> | -> eth0: <no traffic>
> | -> eth1: fe80::2a0:c9ff:fee6:90ce > ff02::1: prefix 2001:6f8:90c:10::/64
> | -> eth1: fe80::2d0:b7ff:fe06:6b36 > ff02::1: prefix 2001:6f8:90c:12::/64
> 
> fe80::2d0:b7ff:fe06:6b36 is the link-local address of eth0 set by radvd
> in IPV6_PKTINFO as well. This, of course, is no guarantee for
> ipi6_ifindex not being changed, but I believe if something would have
> changed it, it would also have changed ipi6_addr.

No, what is happening is that radvd sends the packet with a specified
ifindex using IPV6_PKTINFO. The mangle table notices that the mark
changes and calls ip6_route_me_harder(), which performs a new route
lookup without taking the specified oif into account. It therefore
chooses the first of your two routes and sends the packet out eth1.
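
The effect of dropping the oif from the lookup key can be shown with a toy
route table. This is only a sketch under stated assumptions: struct toy_route
and toy_route_lookup() are hypothetical, the interface indices are made up,
and a real lookup also considers destination, mark and more.

```c
#include <stddef.h>

struct toy_route {
	int oif;	/* outgoing interface index of this route */
};

/*
 * Return the first route that matches the requested oif.
 * oif == 0 means "no interface specified", so the first route wins.
 */
static const struct toy_route *toy_route_lookup(const struct toy_route *tbl,
						size_t n, int oif)
{
	for (size_t i = 0; i < n; i++)
		if (oif == 0 || tbl[i].oif == oif)
			return &tbl[i];
	return NULL;
}
```

With a table listing a route via index 3 first and one via index 2 second, a
lookup with oif 2 returns the second route, while a lookup with oif 0 returns
the first, which mirrors the packet leaving on the wrong interface.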

^ permalink raw reply

* Re: [PATCH] xen: netfront: Drop GSO SKBs which do not have csum_blank.
From: Jeremy Fitzhardinge @ 2011-01-24 17:55 UTC (permalink / raw)
  To: Ian Campbell; +Cc: netdev@vger.kernel.org, xen-devel@lists.xensource.com
In-Reply-To: <1295689392.3693.153.camel@localhost.localdomain>

On 01/22/2011 01:43 AM, Ian Campbell wrote:
> On Sat, 2011-01-22 at 00:58 +0000, Jeremy Fitzhardinge wrote: 
>> On 01/05/2011 05:23 AM, Ian Campbell wrote:
>>> The Linux network stack expects all GSO SKBs to have ip_summed ==
>>> CHECKSUM_PARTIAL (which implies that the frame contains a partial
>>> checksum) and the Xen network ring protocol similarly expects an SKB
>>> which has GSO set to also have NETRX_csum_blank (which also implies a
>>> partial checksum). Therefore drop such frames on receive otherwise
>>> they will trigger the warning in skb_gso_segment.
>>>
>>> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>>> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
>>> Cc: xen-devel@lists.xensource.com
>>> Cc: netdev@vger.kernel.org
>>> ---
>>>  drivers/net/xen-netfront.c |    5 +++++
>>>  1 files changed, 5 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>>> index cdbeec9..8b8c480 100644
>>> --- a/drivers/net/xen-netfront.c
>>> +++ b/drivers/net/xen-netfront.c
>>> @@ -836,6 +836,11 @@ static int handle_incoming_queue(struct net_device *dev,
>>>  				dev->stats.rx_errors++;
>>>  				continue;
>>>  			}
>>> +		} else if (skb_is_gso(skb)) {
>>> +			kfree_skb(skb);
>>> +			packets_dropped++;
>>> +			dev->stats.rx_errors++;
>>> +			continue;
>> This looks redundant; why not something like:
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 47e6a71..c1b8f64 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -852,13 +852,12 @@ static int handle_incoming_queue(struct net_device *dev,
>>  		/* Ethernet work: Delayed to here as it peeks the header. */
>>  		skb->protocol = eth_type_trans(skb, dev);
>>  
>> -		if (skb->ip_summed == CHECKSUM_PARTIAL) {
>> -			if (skb_checksum_setup(skb)) {
>> -				kfree_skb(skb);
>> -				packets_dropped++;
>> -				dev->stats.rx_errors++;
>> -				continue;
>> -			}
>> +		if (skb->ip_summed != CHECKSUM_PARTIAL ||
>> +		    skb_checksum_setup(skb)) {
> That drops non-partial skbs. However they are fine unless they also
> claim to be gso.
>
> Perhaps you meant "skb->ip_summed == CHECKSUM_PARTIAL && !
> skb_checksum_setup(skb)" which I think works but doesn't allow us to
> correctly chain the gso check onto the else.

No, I didn't mean to drop the skb_is_gso() test.  But still, the if()s
can be folded to share the same body.

    J
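
One possible reading of "folded to share the same body" is sketched below as
a pure predicate. It is an illustration only, not a proposed patch:
toy_should_drop() and its boolean parameters are hypothetical stand-ins for
the checks on ip_summed, skb_checksum_setup() and skb_is_gso().

```c
#include <stdbool.h>

/*
 * Drop decision with a single shared body:
 *  - CHECKSUM_PARTIAL frames are dropped only if checksum setup fails;
 *  - all other frames are dropped only if they claim to be GSO.
 */
static bool toy_should_drop(bool csum_partial, bool checksum_setup_failed,
			    bool is_gso)
{
	return csum_partial ? checksum_setup_failed : is_gso;
}
```

Non-partial, non-GSO frames still pass, which addresses the objection that
the earlier suggestion dropped them.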

^ permalink raw reply

* Re: does intel X520-SR(ixgbe) support RSS on single VLAN?
From: Rick Jones @ 2011-01-24 18:10 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org, Rui
In-Reply-To: <4D3DB248.5070802@intel.com>

Alexander Duyck wrote:
> I would recommend testing with something like the "netperf -t TCP_CRR" 
> test which should open a number of ports and spread traffic out between 
> multiple queues.

TCP_CRR - Connect Request Response - it will cycle through almost the entire 
port space as it goes, one at a time.  Any one four-tuple will be unlikely to 
have all that many packets - just the SYN exchange, the request/response 
exchange and then the FIN exchange, so unless there is a tool looking at the 
queues with microsecond granularity, it will appear like it is all happening at 
once :)

If you want to see one queue used for "a while" and then another, I would 
suggest a TCP_RR test with the confidence intervals set to, say, 30 iterations. 
That will exchange packets for the test duration (global -l option) and then the 
next iteration will have a four-tuple that differs in the client port number 
from the previous (the "server" port number remains fixed through the iterations 
of the TCP_RR test).

One can also run TCP_RR tests in turn, one after the other, but that consumes 
port numbers in twos on both sides.  I suppose that these days with port number 
randomization that's OK, but in "the old days" it tended to mean that the 
control and data ports marched in lock-step and one would always be even the 
other odd, which didn't always work that well with hashes...  The use of the 
confidence intervals with the TCP_RR test deals with that by having only the one 
netperf control connection and then successive data connections.

happy benchmarking,

rick jones

There is also always the full specification of the port numbers and IP's for the 
data connection, though it is a bit more cumbersome.


^ permalink raw reply

* [PATCH] TCP: fix a bug that triggers large number of TCP RST by mistake
From: H.K. Jerry Chu @ 2011-01-22 19:06 UTC (permalink / raw)
  To: netdev; +Cc: Jerry Chu

From: Jerry Chu <hkchu@google.com>

This patch fixes a bug that causes TCP RST packets to be generated
for otherwise correctly behaved applications (e.g., ones that leave no
unread data on close). To trigger the bug, at least two conditions must
be met:

1. The FIN flag is set on the last data packet, i.e., it's not on a
separate, FIN only packet.
2. The size of the last data chunk on the receive side matches
exactly with the size of buffer posted by the receiver, and the
receiver closes the socket without any further read attempt.

This bug was first noticed on our netperf based testbed for our IW10
proposal to IETF where a large number of RST packets were observed.
netperf's read-side code meets condition 2 above 100% of the time.

Before the fix, tcp_data_queue() will queue the last skb that meets
condition 1 to sk_receive_queue even though it has fully copied out
(skb_copy_datagram_iovec()) the data. Then if condition 2 is also met,
tcp_recvmsg() often returns all the copied out data successfully
without actually consuming the skb, due to a check
"if ((chunk = len - tp->ucopy.len) != 0) {"
and
"len -= chunk;"
after tcp_prequeue_process() that causes "len" to become 0 and an
early exit from the big while loop.

I don't see any reason not to free the skb whose data have been fully
consumed in tcp_data_queue(), regardless of the FIN flag.  We won't
get there if MSG_PEEK is on. Am I missing some arcane cases related
to urgent data?

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
---
 net/ipv4/tcp_input.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2549b29..eb7f82e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4399,7 +4399,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 			if (!skb_copy_datagram_iovec(skb, 0, tp->ucopy.iov, chunk)) {
 				tp->ucopy.len -= chunk;
 				tp->copied_seq += chunk;
-				eaten = (chunk == skb->len && !th->fin);
+				eaten = (chunk == skb->len);
 				tcp_rcv_space_adjust(sk);
 			}
 			local_bh_disable();
-- 
1.7.3.1


^ permalink raw reply related

* Re: Flow Control and Port Mirroring Revisited
From: Rick Jones @ 2011-01-24 18:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Simon Horman, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110123103902.GA28585@redhat.com>

> 
> Just to block netperf you can send it SIGSTOP :)
> 

Clever :)  One could I suppose achieve the same result by making the remote 
receive socket buffer size smaller than the UDP message size and then not worry 
about having to learn the netserver's PID to send it the SIGSTOP.  I *think* the 
semantics will be substantially the same?  Both will be drops at the socket 
buffer, albeit for different reasons.  The "too small socket buffer" version 
though doesn't require one remember to "wake" the netserver in time to have it 
send results back to netperf without netperf tossing-up an error and not 
reporting any statistics.

Also, netperf has a "no control connection" mode where you can, in effect cause 
it to send UDP datagrams out into the void - I put it there to allow folks to 
test against the likes of echo discard and chargen services but it may have a 
use here.  Requires that one specify the destination IP and port for the "data 
connection" explicitly via the test-specific options.  In that mode the only 
stats reported are those local to netperf rather than netserver.

happy benchmarking,

rick jones

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Michael S. Tsirkin @ 2011-01-24 18:36 UTC (permalink / raw)
  To: Rick Jones
  Cc: Simon Horman, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <4D3DC4AB.1000207@hp.com>

On Mon, Jan 24, 2011 at 10:27:55AM -0800, Rick Jones wrote:
> >
> >Just to block netperf you can send it SIGSTOP :)
> >
> 
> Clever :)  One could I suppose achieve the same result by making the
> remote receive socket buffer size smaller than the UDP message size
> and then not worry about having to learn the netserver's PID to send
> it the SIGSTOP.  I *think* the semantics will be substantially the
> same?

If you could set it, yes. But at least Linux ignores
any value substantially smaller than 1K, and then
multiplies that by 2:

        case SO_RCVBUF:
                /* Don't error on this BSD doesn't and if you think
                   about it this is right. Otherwise apps have to
                   play 'guess the biggest size' games. RCVBUF/SNDBUF
                   are treated in BSD as hints */

                if (val > sysctl_rmem_max)
                        val = sysctl_rmem_max;
set_rcvbuf:     
                sk->sk_userlocks |= SOCK_RCVBUF_LOCK;

                /*
                 * We double it on the way in to account for
                 * "struct sk_buff" etc. overhead.   Applications
                 * assume that the SO_RCVBUF setting they make will
                 * allow that much actual data to be received on that
                 * socket.
                 *
                 * Applications are unaware that "struct sk_buff" and
                 * other overheads allocate from the receive buffer
                 * during socket buffer allocation. 
                 *
                 * And after considering the possible alternatives,
                 * returning the value we actually used in getsockopt
                 * is the most desirable behavior.
                 */ 
                if ((val * 2) < SOCK_MIN_RCVBUF)
                        sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
                else
                        sk->sk_rcvbuf = val * 2;

and

/*                      
 * Since sk_rmem_alloc sums skb->truesize, even a small frame might need
 * sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak
 */             
#define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))
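
The clamping and doubling in the excerpt above can be written as a pure
function to see what a given request turns into. This is a sketch only:
toy_effective_rcvbuf() is a hypothetical name, and the limits are passed in as
parameters because the real values of sysctl_rmem_max and
sizeof(struct sk_buff) depend on the kernel and its configuration.

```c
/*
 * Value stored in sk_rcvbuf for a requested SO_RCVBUF of 'val',
 * following the logic quoted above. rmem_max and min_rcvbuf are
 * caller-supplied assumptions, not real kernel constants.
 */
static unsigned int toy_effective_rcvbuf(unsigned int val,
					 unsigned int rmem_max,
					 unsigned int min_rcvbuf)
{
	if (val > rmem_max)
		val = rmem_max;		/* cap at the sysctl limit */
	if (val * 2 < min_rcvbuf)
		return min_rcvbuf;	/* small requests are raised */
	return val * 2;			/* doubled to cover overhead */
}
```

Assuming, for illustration, a minimum of 2304 bytes (2048 plus a 256-byte
struct sk_buff), a request for 512 bytes ends up as 2304, so a receive buffer
smaller than one UDP message is hard to get this way.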


>  Both will be drops at the socket buffer, albeit for
> different reasons.  The "too small socket buffer" version though
> doesn't require one remember to "wake" the netserver in time to have
> it send results back to netperf without netperf tossing-up an error
> and not reporting any statistics.
> 
> Also, netperf has a "no control connection" mode where you can, in
> effect cause it to send UDP datagrams out into the void - I put it
> there to allow folks to test against the likes of echo discard and
> chargen services but it may have a use here.  Requires that one
> specify the destination IP and port for the "data connection"
> explicitly via the test-specific options.  In that mode the only
> stats reported are those local to netperf rather than netserver.

Ah, sounds perfect.

> happy benchmarking,
> 
> rick jones


^ permalink raw reply

* Re: [PATCH] GRO: fix merging a paged skb after non-paged skbs
From: Eric Dumazet @ 2011-01-24 18:44 UTC (permalink / raw)
  To: Michal Schmidt; +Cc: David Miller, netdev, Herbert Xu, Ben Hutchings
In-Reply-To: <20110124184752.1d0947dd@delilah>

On Monday, 24 January 2011 at 18:47 +0100, Michal Schmidt wrote:
> Suppose that several linear skbs of the same flow were received by GRO. They
> were thus merged into one skb with a frag_list. Then a new skb of the same flow
> arrives, but it is a paged skb with data starting in its frags[].
> 
> Before adding the skb to the frag_list skb_gro_receive() will of course adjust
> the skb to throw away the headers. It correctly modifies the page_offset and
> size of the frag, but it leaves incorrect information in the skb:
>  ->data_len is not decreased at all.
>  ->len is decreased only by headlen, as if no change were done to the frag.
> Later in a receiving process this causes skb_copy_datagram_iovec() to return
> -EFAULT and this is seen in userspace as the result of the recv() syscall.
> 
> In practice the bug can be reproduced with the sfc driver. By default the
> driver uses an adaptive scheme when it switches between using
> napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
> reproduced when under rx load with enough successful GRO merging the driver
> decides to switch from the former to the latter.
> 
> Manual control is also possible, so reproducing this is easy with netcat:
>  - on machine1 (with sfc): nc -l 12345 > /dev/null
>  - on machine2: nc machine1 12345 < /dev/zero
>  - on machine1:
>    echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
>    echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
>  - See that nc has quit suddenly.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> ---
>  net/core/skbuff.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d31bb36..c231f5b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -2746,7 +2746,7 @@ merge:
>  	if (offset > headlen) {
>  		skbinfo->frags[0].page_offset += offset - headlen;
>  		skbinfo->frags[0].size -= offset - headlen;
> -		offset = headlen;
> +		skb->data_len -= offset - headlen;
>  	}
>  
>  	__skb_pull(skb, offset);

Hi Michal

Hmm, I don't really understand how __skb_pull(skb, offset) can be ok if
offset > headlen.

skb->data might reach tail/end ?

Maybe I am too confused, this code is a bit complex :(

Thanks !

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d31bb36..7cd1bc8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2744,8 +2744,12 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 
 merge:
 	if (offset > headlen) {
-		skbinfo->frags[0].page_offset += offset - headlen;
-		skbinfo->frags[0].size -= offset - headlen;
+		unsigned int eat = offset - headlen;
+
+		skbinfo->frags[0].page_offset += eat;
+		skbinfo->frags[0].size -= eat;
+		skb->data_len -= eat;
+		skb->len -= eat;
 		offset = headlen;
 	}
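
The bookkeeping in the hunk above can be checked with a small userspace
model. This is only a sketch: struct toy_skb and toy_gro_pull() are
hypothetical names, the model keeps a single frag, and the final
subtraction merely stands in for __skb_pull(). It assumes headlen is
len - data_len, which is what skb_headlen() computes.

```c
#include <assert.h>

/* Toy stand-in for the few sk_buff fields the hunk touches (hypothetical). */
struct toy_skb {
	unsigned int len;       /* total bytes: linear area + frag */
	unsigned int data_len;  /* bytes held in the frag */
	unsigned int frag_off;  /* models frags[0].page_offset */
	unsigned int frag_size; /* models frags[0].size */
};

/* Bytes in the linear area, as skb_headlen() would report them. */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Drop 'offset' bytes of headers, eating into the frag when needed. */
static void toy_gro_pull(struct toy_skb *skb, unsigned int offset)
{
	unsigned int headlen = toy_headlen(skb);

	if (offset > headlen) {
		unsigned int eat = offset - headlen;

		assert(eat <= skb->frag_size); /* model limit: one frag only */
		skb->frag_off += eat;
		skb->frag_size -= eat;
		skb->data_len -= eat;
		skb->len -= eat;
		offset = headlen;
	}

	/* Models __skb_pull(): remove 'offset' bytes from the linear area. */
	skb->len -= offset;
}
```

With a 14-byte linear area, a 1500-byte frag and offset 54, the model
ends with len == data_len == frag_size, so the three counters agree and
no bytes remain in the linear area.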
 




* Re: Flow Control and Port Mirroring Revisited
From: Rick Jones @ 2011-01-24 19:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Simon Horman, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110124183626.GB29941@redhat.com>

Michael S. Tsirkin wrote:
> On Mon, Jan 24, 2011 at 10:27:55AM -0800, Rick Jones wrote:
> 
>>>Just to block netperf you can send it SIGSTOP :)
>>>
>>
>>Clever :)  One could I suppose achieve the same result by making the
>>remote receive socket buffer size smaller than the UDP message size
>>and then not worry about having to learn the netserver's PID to send
>>it the SIGSTOP.  I *think* the semantics will be substantially the
>>same?
> 
> 
> If you could set it, yes. But at least linux ignores
> any value substantially smaller than 1K, and then
> multiplies that by 2:
> 
>         case SO_RCVBUF:
>                 /* Don't error on this BSD doesn't and if you think
>                    about it this is right. Otherwise apps have to
>                    play 'guess the biggest size' games. RCVBUF/SNDBUF
>                    are treated in BSD as hints */
> 
>                 if (val > sysctl_rmem_max)
>                         val = sysctl_rmem_max;
> set_rcvbuf:     
>                 sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> 
>                 /*
>                  * We double it on the way in to account for
>                  * "struct sk_buff" etc. overhead.   Applications
>                  * assume that the SO_RCVBUF setting they make will
>                  * allow that much actual data to be received on that
>                  * socket.
>                  *
>                  * Applications are unaware that "struct sk_buff" and
>                  * other overheads allocate from the receive buffer
>                  * during socket buffer allocation. 
>                  *
>                  * And after considering the possible alternatives,
>                  * returning the value we actually used in getsockopt
>                  * is the most desirable behavior.
>                  */ 
>                 if ((val * 2) < SOCK_MIN_RCVBUF)
>                         sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
>                 else
>                         sk->sk_rcvbuf = val * 2;
> 
> and
> 
> /*                      
>  * Since sk_rmem_alloc sums skb->truesize, even a small frame might need
>  * sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak
>  */             
> #define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))

Pity - seems to work back on 2.6.26:

raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 1024
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

124928    1024   10.00     2882334      0    2361.17
    256           10.00           0              0.00

raj@tardy:~/netperf2_trunk$ uname -a
Linux tardy 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux

Still, even with that (or SIGSTOP) we don't really know where the packets were
dropped, right?  There is no guarantee they weren't dropped before they got to
the socket buffer.

happy benchmarking,
rick jones

PS - here is with a -S 1024 option:

raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1024 -m 1024
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

124928    1024   10.00     1679269      0    1375.64
   2048           10.00     1490662           1221.13

showing that there is a decent chance that many of the frames were dropped at 
the socket buffer, but not all - I suppose I could/should be checking netstat 
stats... :)

And just a little more, only because I was curious :)

raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1M -m 257
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

124928     257   10.00     1869134      0     384.29
262142           10.00     1869134            384.29

raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 257
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost 
(127.0.0.1) port 0 AF_INET : histogram
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

124928     257   10.00     3076363      0     632.49
    256           10.00           0              0.00
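
The remote socket sizes reported in these runs (256, 2048 and 262142)
are consistent with the quoted SO_RCVBUF logic if this 2.6.26 kernel
used a minimum of 256 bytes and an rmem_max of 131071. Both values are
inferred from the output, not stated in the thread. A minimal sketch,
with model_rcvbuf() as a hypothetical userspace stand-in for the kernel
code:

```c
#include <assert.h>
#include <limits.h>

/*
 * Userspace model of the quoted SO_RCVBUF handling (not kernel code):
 * clamp the request to rmem_max, double it, then enforce a minimum.
 * min_rcvbuf and rmem_max are parameters because their values on a
 * given kernel are assumptions here.
 */
static unsigned int model_rcvbuf(unsigned int val, unsigned int rmem_max,
				 unsigned int min_rcvbuf)
{
	assert(rmem_max <= UINT_MAX / 2); /* keep val * 2 from wrapping */

	if (val > rmem_max)
		val = rmem_max;

	if (val * 2 < min_rcvbuf)
		return min_rcvbuf;

	return val * 2;
}
```

Under those assumed values, a request of 1 maps to 256, a request of
1024 maps to 2048, and any request above rmem_max maps to 262142.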



* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy for single TCP session balancing
From: Oleg V. Ukhno @ 2011-01-24 19:32 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Jay Vosburgh, John Fastabend, netdev@vger.kernel.org
In-Reply-To: <4D3AD234.7080709@gmail.com>

On 01/22/2011 03:48 PM, Nicolas de Pesloüan wrote:
> On 21/01/2011 14:55, Oleg V. Ukhno wrote:
>> On 01/19/2011 11:12 PM, Nicolas de Pesloüan wrote:

>>>
>> Nicolas,
>> I've run similar tests for the VLAN tunneling scenario. Results are
>> identical, as I expected. The only significant difference is link failure
>> handling. 802.3ad mode allows almost painless load redistribution,
>> balance-rr causes packet loss.
>
> Oleg,
>
> Thanks for doing the tests.
>
> What link failure mode did you use for those tests ? miimon or arp
> monitoring ?
>
> Nicolas.
>
>

Nicolas,
  as for the tests:
  With MII link monitoring, a link failure kills the whole transfer. With
ARP monitoring it still works, but load striping across the bond slaves is
asymmetric (one slave is overloaded, the other two are at about 50-60%
bandwidth utilization).
In summary: balance-rr behaves like the patched 802.3ad when ARP monitoring
is used, but the load striping is quite asymmetric and it needs a rather
monstrous configuration on both the switch and the server side.



-- 
Best regards,
Oleg Ukhno




* Re: [PATCH] MAINTAINERS: add second list for IRDA
From: Wolfram Sang @ 2011-01-24 19:34 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: netdev, David Miller
In-Reply-To: <20101126120920.GK5520@sortiz-mobl>


On Fri, Nov 26, 2010 at 01:09:21PM +0100, Samuel Ortiz wrote:

> Hi Wolfgang,

Wolfram, please ;)

> On Tue, Nov 23, 2010 at 07:10:13PM +0100, Wolfram Sang wrote:
> > The irda-users-list is currently almost dead and subscribers-only. Adding
> > netdev increases the audience which might help to not overlook a bugreport
> > (again).
> Makes sense. I'll carry this patch, thanks.

Ping. I could also send this to trivial@ with your ack if that makes it easier?

> 
> Cheers,
> Samuel.
> 
> 
> > Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
> > Cc: Samuel Ortiz <sameo@linux.intel.com>
> > Cc: David Miller <davem@davemloft.net>
> > ---
> >  MAINTAINERS |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 8b6ca96..2596a78 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3261,6 +3261,7 @@ F:	net/ipx/
> >  IRDA SUBSYSTEM
> >  M:	Samuel Ortiz <samuel@sortiz.org>
> >  L:	irda-users@lists.sourceforge.net (subscribers-only)
> > +L:	netdev@vger.kernel.org
> >  W:	http://irda.sourceforge.net/
> >  S:	Maintained
> >  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/sameo/irda-2.6.git
> > -- 
> > 1.7.2.3
> > 
> 
> -- 
> Intel Open Source Technology Centre
> http://oss.intel.com/

-- 
Pengutronix e.K.                           | Wolfram Sang                |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |



* Re: Flow Control and Port Mirroring Revisited
From: Michael S. Tsirkin @ 2011-01-24 19:42 UTC (permalink / raw)
  To: Rick Jones
  Cc: Simon Horman, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <4D3DCC99.5050101@hp.com>

On Mon, Jan 24, 2011 at 11:01:45AM -0800, Rick Jones wrote:
> Michael S. Tsirkin wrote:
> >On Mon, Jan 24, 2011 at 10:27:55AM -0800, Rick Jones wrote:
> >
> >>>Just to block netperf you can send it SIGSTOP :)
> >>>
> >>
> >>Clever :)  One could I suppose achieve the same result by making the
> >>remote receive socket buffer size smaller than the UDP message size
> >>and then not worry about having to learn the netserver's PID to send
> >>it the SIGSTOP.  I *think* the semantics will be substantially the
> >>same?
> >
> >
> >If you could set it, yes. But at least linux ignores
> >any value substantially smaller than 1K, and then
> >multiplies that by 2:
> >
> >        case SO_RCVBUF:
> >                /* Don't error on this BSD doesn't and if you think
> >                   about it this is right. Otherwise apps have to
> >                   play 'guess the biggest size' games. RCVBUF/SNDBUF
> >                   are treated in BSD as hints */
> >
> >                if (val > sysctl_rmem_max)
> >                        val = sysctl_rmem_max;
> >set_rcvbuf:
> >                sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
> >
> >                /*
> >                 * We double it on the way in to account for
> >                 * "struct sk_buff" etc. overhead.   Applications
> >                 * assume that the SO_RCVBUF setting they make will
> >                 * allow that much actual data to be received on that
> >                 * socket.
> >                 *
> >                 * Applications are unaware that "struct sk_buff" and
> >                 * other overheads allocate from the receive buffer
> >                 * during socket buffer allocation.
> >                 *
> >                 * And after considering the possible alternatives,
> >                 * returning the value we actually used in getsockopt
> >                 * is the most desirable behavior.
> >                 */
> >                if ((val * 2) < SOCK_MIN_RCVBUF)
> >                        sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
> >                else
> >                        sk->sk_rcvbuf = val * 2;
> >
> >and
> >
> >/*
> > * Since sk_rmem_alloc sums skb->truesize, even a small frame might need
> > * sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak
> > */
> >#define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))
> 
> Pity - seems to work back on 2.6.26:

Hmm, that code is there at least as far back as 2.6.12.

> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 1024
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 124928    1024   10.00     2882334      0    2361.17
>    256           10.00           0              0.00
> 
> raj@tardy:~/netperf2_trunk$ uname -a
> Linux tardy 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux
> 
> Still, even with that (or SIGSTOP) we don't really know where the
> packets were dropped right?  There is no guarantee they weren't
> dropped before they got to the socket buffer
> 
> happy benchmarking,
> rick jones

Right. Better to send to a port with no socket listening there;
that would drop the packet at an early (if not the earliest
possible) opportunity.

> PS - here is with a -S 1024 option:
> 
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1024 -m 1024
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 124928    1024   10.00     1679269      0    1375.64
>   2048           10.00     1490662           1221.13
> 
> showing that there is a decent chance that many of the frames were
> dropped at the socket buffer, but not all - I suppose I could/should
> be checking netstat stats... :)
> 
> And just a little more, only because I was curious :)
> 
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1M -m 257
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 124928     257   10.00     1869134      0     384.29
> 262142           10.00     1869134            384.29
> 
> raj@tardy:~/netperf2_trunk$ src/netperf -t UDP_STREAM -- -S 1 -m 257
> MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> localhost (127.0.0.1) port 0 AF_INET : histogram
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 124928     257   10.00     3076363      0     632.49
>    256           10.00           0              0.00
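
What a given kernel does with a small SO_RCVBUF request can be checked
directly, since getsockopt() reports the value actually in use. A
minimal sketch using only the standard sockets API; probe_rcvbuf() is a
hypothetical helper name, and the value it returns depends on the
kernel and its sysctl settings:

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Ask for 'requested' bytes of receive buffer on a fresh UDP socket and
 * return what getsockopt() reports afterwards, or -1 on any failure.
 */
static int probe_rcvbuf(int requested)
{
	int actual = -1;
	socklen_t len = sizeof(actual);
	int fd;

	assert(requested >= 0);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
		       &requested, sizeof(requested)) < 0 ||
	    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
		actual = -1;

	close(fd);
	return actual;
}
```

Calling probe_rcvbuf(1) on the receiving host shows the smallest buffer
the kernel will accept, which is the figure that matters when trying to
force drops at the socket buffer.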


* Re: [PATCH v4] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Michał Mirosław @ 2011-01-24 20:22 UTC (permalink / raw)
  To: Po-Yu Chuang
  Cc: netdev, linux-kernel, bhutchings, eric.dumazet, joe, dilinger,
	Po-Yu Chuang
In-Reply-To: <AANLkTinsWVHj6qdSS-_ztztcNPHAkwZQnimPbjzvcHb2@mail.gmail.com>

On 24 January 2011 09:26, Po-Yu Chuang
<ratbert.chuang@gmail.com> wrote:
> 2011/1/21 Michał Mirosław <mirqus@gmail.com>:
>> 2011/1/21 Po-Yu Chuang <ratbert.chuang@gmail.com>:
>>> +static void ftmac100_free_buffers(struct ftmac100 *priv)
>>> +{
>>> +       int i;
>>> +
>>> +       for (i = 0; i < RX_QUEUE_ENTRIES; i += 2) {
>>> +               struct ftmac100_rxdes *rxdes = &priv->descs->rxdes[i];
>>> +               dma_addr_t d = ftmac100_rxdes_get_dma_addr(rxdes);
>>> +               void *page = ftmac100_rxdes_get_va(rxdes);
>>> +
>>> +               if (d)
>>> +                       dma_unmap_single(priv->dev, d, PAGE_SIZE,
>>> +                                        DMA_FROM_DEVICE);
>>> +
>>> +               if (page != NULL)
>>> +                       free_page((unsigned long)page);
>>> +       }
>>> +
>> [...]
>>
>>> +static int ftmac100_alloc_buffers(struct ftmac100 *priv)
>>> +{
>>> +       int i;
>>> +
>>> +       priv->descs = dma_alloc_coherent(priv->dev,
>>> +                                        sizeof(struct ftmac100_descs),
>>> +                                        &priv->descs_dma_addr,
>>> +                                        GFP_KERNEL | GFP_DMA);
>>> +       if (priv->descs == NULL)
>>> +               return -ENOMEM;
>>> +
>>> +       memset(priv->descs, 0, sizeof(struct ftmac100_descs));
>>> +
>>> +       /* initialize RX ring */
>>> +
>>> +       ftmac100_rxdes_set_end_of_ring(&priv->descs->rxdes[RX_QUEUE_ENTRIES - 1]);
>>> +
>>> +       for (i = 0; i < RX_QUEUE_ENTRIES; i += 2) {
>>> +               struct ftmac100_rxdes *rxdes = &priv->descs->rxdes[i];
>>> +               void *page;
>>> +               dma_addr_t d;
>>> +
>>> +               page = (void *)__get_free_page(GFP_KERNEL | GFP_DMA);
>>> +               if (page == NULL)
>>> +                       goto err;
>>> +
>>> +               d = dma_map_single(priv->dev, page, PAGE_SIZE, DMA_FROM_DEVICE);
>>> +               if (unlikely(dma_mapping_error(priv->dev, d))) {
>>> +                       free_page((unsigned long)page);
>>> +                       goto err;
>>> +               }
>>> +
>>> +               /*
>>> +                * The hardware enforces a sub-2K maximum packet size, so we
>>> +                * put two buffers on every hardware page.
>>> +                */
>>> +               ftmac100_rxdes_set_va(rxdes, page);
>>> +               ftmac100_rxdes_set_va(rxdes + 1, page + PAGE_SIZE / 2);
>>> +
>>> +               ftmac100_rxdes_set_dma_addr(rxdes, d);
>>> +               ftmac100_rxdes_set_dma_addr(rxdes + 1, d + PAGE_SIZE / 2);
>>> +
>>> +               ftmac100_rxdes_set_buffer_size(rxdes, RX_BUF_SIZE);
>>> +               ftmac100_rxdes_set_buffer_size(rxdes + 1, RX_BUF_SIZE);
>>> +
>>> +               ftmac100_rxdes_set_dma_own(rxdes);
>>> +               ftmac100_rxdes_set_dma_own(rxdes + 1);
>>> +       }
>> [...]
>>
>> Did you test this? This looks like it will result in double free after
>> packet RX, as you are giving the same page (referenced once) to two
>> distinct RX descriptors, that may be assigned different packets.
>
> Yes, this is tested.
>
>> Since you're not implementing any RX offloads, you might just allocate
>> fresh skb's with alloc_skb() and store skb pointer in rxdes3. Since
>
> rxdes3 does not store virtual address of an skb.
> It stores the address of the buffer allocated while open() and freed
> only when stop().
> The data in that buffer will be memcpy()ed to an skb allocated in
> ftmac100_rx_packet().
> No double free happens.

Ah, I blindly assumed that you're just appending the buffers to the
skb (using skb_fill_page_desc() and friends). Since you have to mark
descriptors for the device anyway, it might be faster to allocate new
skbs and map those as rx buffers (changing the descriptor's buffer
address after every RX) instead of keeping static buffer and copying
every time. (For small packets it wastes a lot of memory, though - so
the right choice depends on the expected workload.)

Best Regards,
Michał Mirosław
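
The buffer layout discussed in this thread (two half-page RX buffers per
page, with the free loop stepping by two so each page is released once)
can be sketched in userspace. This is only an illustration: the toy_*
names are hypothetical, malloc() stands in for __get_free_page(), and
DMA mapping is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE	4096
#define TOY_RX_ENTRIES	8	/* must be even: two buffers per page */

struct toy_rxdes {
	unsigned char *va;	/* buffer address held by the descriptor */
};

/* Release each page once by visiting only the even descriptors. */
static int toy_free_ring(struct toy_rxdes *ring)
{
	int freed = 0;

	for (int i = 0; i < TOY_RX_ENTRIES; i += 2) {
		if (ring[i].va != NULL) {
			free(ring[i].va);
			freed++;
		}
		ring[i].va = NULL;
		ring[i + 1].va = NULL;
	}
	return freed;
}

/* Give descriptors i and i + 1 the two halves of one page. */
static int toy_alloc_ring(struct toy_rxdes *ring)
{
	assert(TOY_RX_ENTRIES % 2 == 0);

	for (int i = 0; i < TOY_RX_ENTRIES; i++)
		ring[i].va = NULL;

	for (int i = 0; i < TOY_RX_ENTRIES; i += 2) {
		unsigned char *page = malloc(TOY_PAGE_SIZE);

		if (page == NULL) {
			toy_free_ring(ring);
			return -1;
		}
		ring[i].va = page;
		ring[i + 1].va = page + TOY_PAGE_SIZE / 2;
	}
	return 0;
}
```

Because received data is copied out of these buffers rather than handed
to the stack, the pages stay owned by the ring for its whole lifetime,
which is why the single free per pair is sufficient in this scheme.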


* RE: Using ethernet device as efficient small packet generator
From: juice @ 2011-01-24 20:51 UTC (permalink / raw)
  To: Eric Dumazet, Brandeburg, Jesse, Loke, Chetan, Jon Zhou,
	"Stephen Hemming
In-Reply-To: <1295886844.2755.36.camel@edumazet-laptop>



> Please check what numbers you can get using dummy0 device instead of
> real ethernet driver.
>
> Here : (E5540  @ 2.53GHz) clone = 1
>
> Result: OK: 34775941(c34775225+d716) nsec, 100000000 (60byte,0frags)
>   2875551pps 1380Mb/sec (1380264480bps) errors: 0
>

My result on the machine (W3503 @ 2.40GHz):

Params: count 10000000  min_pkt_size: 60  max_pkt_size: 60
     frags: 0  delay: 0  clone_skb: 0  ifname: dummy0
     flows: 0 flowlen: 0
     queue_map_min: 0  queue_map_max: 0
     dst_min: 10.10.11.2  dst_max:
     src_min:   src_max:
     src_mac: b6:b2:a2:f4:8e:dc dst_mac: 00:04:23:08:91:dc
     udp_src_min: 9  udp_src_max: 9  udp_dst_min: 9  udp_dst_max: 9
     src_mac_count: 0  dst_mac_count: 0
     Flags:
Current:
     pkts-sofar: 10000000  errors: 0
     started: 1295902048722173us  stopped: 1295902052312514us idle: 3664us
     seq_num: 10000001  cur_dst_mac_offset: 0  cur_src_mac_offset: 0
     cur_saddr: 0x0  cur_daddr: 0x20b0a0a
     cur_udp_dst: 9  cur_udp_src: 9
     cur_queue_map: 0
     flows: 0
Result: OK: 3590341(c3586677+d3664) usec, 10000000 (60byte,0frags)
  2785250pps 1336Mb/sec (1336920000bps) errors: 0


Yours, Jussi Ohenoja
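
The rate figures in the Result lines follow from the packet count and
the elapsed time. A small sketch of that arithmetic; toy_pps() and
toy_bps() are hypothetical helpers, and the formulas are inferred from
the printed numbers rather than taken from the pktgen source:

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second from a packet count and elapsed microseconds. */
static uint64_t toy_pps(uint64_t packets, uint64_t elapsed_usec)
{
	assert(elapsed_usec > 0);
	return packets * 1000000ULL / elapsed_usec;
}

/* Bits per second for fixed-size packets at the given packet rate. */
static uint64_t toy_bps(uint64_t pps, uint64_t pkt_bytes)
{
	return pps * 8 * pkt_bytes;
}
```

For the W3503 run, 10000000 packets in 3590341 usec gives 2785250 pps,
and 2785250 pps at 60 bytes gives 1336920000 bps, matching the output.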




* Re: [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2011-01-24 21:02 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips
In-Reply-To: <AANLkTikmt1BsuLmw+6Lv_trPfRzHYdjWY-61DPv_3T7y@mail.gmail.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 24 Jan 2011 00:19:30 -0800

> On Sat, Jan 22, 2011 at 03:56, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
>> The following series contains cleanups for e1000e and adds support
>> for the i340 adapter in igb.
>>
>> The following are changes since commit bb134d2298b49f50cf6d9388410fba96272905dc:
>>  net: netif_setup_tc() is static
>>
>> and are available in the git repository at:
>>  master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next-2.6 master
>>
>> Bruce Allan (2):
>>  e1000e: reduce scope of some variables, remove unnecessary ones
>>  e1000e: Use kmemdup rather than duplicating its implementation
>>
>> Carolyn Wyborny (1):
>>  igb: Add support for i340 Quad Port Fiber Adapter
>>
>> Jeff Kirsher (1):
>>  e1000e: convert to stats64
>>
> 
> I have updated my tree to include Flavio's Signed-off-by on the following patch:
> 
> e1000e: convert to stats64

Pulled, thanks a lot Jeff.


* Re: 2.6.38-rc1: arp triggers RTNL assertion
From: Eric Dumazet @ 2011-01-24 21:11 UTC (permalink / raw)
  To: David Miller; +Cc: jamie, linux-kernel, netdev
In-Reply-To: <20110121.130657.106806953.davem@davemloft.net>

On Friday, 21 January 2011 at 13:06 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 21 Jan 2011 19:52:56 +0100
> 
> > Here is how I fixed this, thanks again Jamie !
> > 
> > [PATCH] net: neighbour: pneigh_lookup() doesnt need RTNL
> 
> Eric, I don't think we can do this.
> 
> Fundamentally, any time a user operation changes the configuration
> of the networking, we must hold the RTNL.
> 
> Eliding the RTNL for lookups is fine, but for things that change
> state it is not.
> 
> I therefore think you'll need to rework the arp_ioctl() portions
> of the commit that introduced this regression.
> 

Here is a second try of the fix, thanks !

Note : Tested with CONFIG_PROVE_RCU=y

[PATCH] net: arp_ioctl() must hold RTNL

Commit 941666c2e3e0 "net: RCU conversion of dev_getbyhwaddr() and
arp_ioctl()" introduced a regression, reported by Jamie Heilman.
"arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert
in pneigh_lookup()

Removing RTNL requirement from arp_ioctl() was a mistake, just revert
that part.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |    3 ++-
 net/ipv4/arp.c |   11 +++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7c6a46f..24ea2d7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -749,7 +749,8 @@ EXPORT_SYMBOL(dev_get_by_index);
  *	@ha: hardware address
  *
  *	Search for an interface by MAC address. Returns NULL if the device
- *	is not found or a pointer to the device. The caller must hold RCU
+ *	is not found or a pointer to the device.
+ *	The caller must hold RCU or RTNL.
  *	The returned device has not had its ref count increased
  *	and the caller must therefore be careful about locking
  *
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 04c8b69..7927589 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1017,14 +1017,13 @@ static int arp_req_set_proxy(struct net *net, struct net_device *dev, int on)
 		IPV4_DEVCONF_ALL(net, PROXY_ARP) = on;
 		return 0;
 	}
-	if (__in_dev_get_rcu(dev)) {
-		IN_DEV_CONF_SET(__in_dev_get_rcu(dev), PROXY_ARP, on);
+	if (__in_dev_get_rtnl(dev)) {
+		IN_DEV_CONF_SET(__in_dev_get_rtnl(dev), PROXY_ARP, on);
 		return 0;
 	}
 	return -ENXIO;
 }
 
-/* must be called with rcu_read_lock() */
 static int arp_req_set_public(struct net *net, struct arpreq *r,
 		struct net_device *dev)
 {
@@ -1233,10 +1232,10 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	if (!(r.arp_flags & ATF_NETMASK))
 		((struct sockaddr_in *)&r.arp_netmask)->sin_addr.s_addr =
 							   htonl(0xFFFFFFFFUL);
-	rcu_read_lock();
+	rtnl_lock();
 	if (r.arp_dev[0]) {
 		err = -ENODEV;
-		dev = dev_get_by_name_rcu(net, r.arp_dev);
+		dev = __dev_get_by_name(net, r.arp_dev);
 		if (dev == NULL)
 			goto out;
 
@@ -1263,7 +1262,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 		break;
 	}
 out:
-	rcu_read_unlock();
+	rtnl_unlock();
 	if (cmd == SIOCGARP && !err && copy_to_user(arg, &r, sizeof(r)))
 		err = -EFAULT;
 	return err;




