Netdev List
* Re: [PATCH] GRO: fix merging a paged skb after non-paged skbs
From: Eric Dumazet @ 2011-01-24 18:44 UTC (permalink / raw)
  To: Michal Schmidt; +Cc: David Miller, netdev, Herbert Xu, Ben Hutchings
In-Reply-To: <20110124184752.1d0947dd@delilah>

On Monday, 24 January 2011 at 18:47 +0100, Michal Schmidt wrote:
> Suppose that several linear skbs of the same flow were received by GRO. They
> were thus merged into one skb with a frag_list. Then a new skb of the same flow
> arrives, but it is a paged skb with data starting in its frags[].
> 
> Before adding the skb to the frag_list skb_gro_receive() will of course adjust
> the skb to throw away the headers. It correctly modifies the page_offset and
> size of the frag, but it leaves incorrect information in the skb:
>  ->data_len is not decreased at all.
>  ->len is decreased only by headlen, as if no change were done to the frag.
> Later in a receiving process this causes skb_copy_datagram_iovec() to return
> -EFAULT and this is seen in userspace as the result of the recv() syscall.
> 
> In practice the bug can be reproduced with the sfc driver. By default the
> driver uses an adaptive scheme when it switches between using
> napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
> reproduced when under rx load with enough successful GRO merging the driver
> decides to switch from the former to the latter.
> 
> Manual control is also possible, so reproducing this is easy with netcat:
>  - on machine1 (with sfc): nc -l 12345 > /dev/null
>  - on machine2: nc machine1 12345 < /dev/zero
>  - on machine1:
>    echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
>    echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
>  - See that nc has quit suddenly.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> ---
>  net/core/skbuff.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d31bb36..c231f5b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -2746,7 +2746,7 @@ merge:
>  	if (offset > headlen) {
>  		skbinfo->frags[0].page_offset += offset - headlen;
>  		skbinfo->frags[0].size -= offset - headlen;
> -		offset = headlen;
> +		skb->data_len -= offset - headlen;
>  	}
>  
>  	__skb_pull(skb, offset);

Hi Michal

Hmm, I don't really understand how __skb_pull(skb, offset) can be OK if
offset > headlen.

skb->data might then go past tail/end?

Maybe I am too confused, this code is a bit complex :(

Thanks !

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d31bb36..7cd1bc8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2744,8 +2744,12 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 
 merge:
 	if (offset > headlen) {
-		skbinfo->frags[0].page_offset += offset - headlen;
-		skbinfo->frags[0].size -= offset - headlen;
+		unsigned int eat = offset - headlen;
+
+		skbinfo->frags[0].page_offset += eat;
+		skbinfo->frags[0].size -= eat;
+		skb->data_len -= eat;
+		skb->len -= eat;
 		offset = headlen;
 	}
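
The accounting discussed in this thread can be modelled in user space. The sketch below is only an illustration under stated assumptions: struct toy_skb and the toy_* helpers are hypothetical stand-ins for the few sk_buff fields involved, and the skb is assumed to carry a single frag. It contrasts the pre-patch behaviour with the variant from the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sk_buff fields that matter here. */
struct toy_skb {
	unsigned int len;       /* total bytes: linear head + frags */
	unsigned int data_len;  /* bytes held in frags only */
	unsigned int frag0_off; /* models frags[0].page_offset */
	unsigned int frag0_len; /* models frags[0].size */
};

/* Length of the linear part, as skb_headlen() would compute it. */
unsigned int toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}

/* With a single frag, data_len must equal that frag's size. */
bool toy_consistent(const struct toy_skb *skb)
{
	return skb->data_len == skb->frag0_len && skb->len >= skb->data_len;
}

/* Pre-patch behaviour: the frag is trimmed, but only headlen leaves len. */
void toy_strip_unpatched(struct toy_skb *skb, unsigned int offset)
{
	unsigned int headlen = toy_headlen(skb);

	if (offset > headlen) {
		skb->frag0_off += offset - headlen;
		skb->frag0_len -= offset - headlen;
		offset = headlen;
	}
	skb->len -= offset;	/* the effect of __skb_pull() on len */
}

/* Variant from the diff above: the eaten frag bytes leave len and data_len. */
void toy_strip_patched(struct toy_skb *skb, unsigned int offset)
{
	unsigned int headlen = toy_headlen(skb);

	if (offset > headlen) {
		unsigned int eat = offset - headlen;

		skb->frag0_off += eat;
		skb->frag0_len -= eat;
		skb->data_len -= eat;
		skb->len -= eat;
		offset = headlen;
	}
	skb->len -= offset;	/* pull never exceeds the linear part */
}
```

For a fully paged skb (headlen 0) with 54 bytes of headers in its frag, the unpatched path leaves data_len larger than the frag really is, which matches the mismatch the original report blames for the -EFAULT.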
 




* Re: Flow Control and Port Mirroring Revisited
From: Michael S. Tsirkin @ 2011-01-24 18:36 UTC (permalink / raw)
  To: Rick Jones
  Cc: Simon Horman, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <4D3DC4AB.1000207@hp.com>

On Mon, Jan 24, 2011 at 10:27:55AM -0800, Rick Jones wrote:
> >
> >Just to block netperf you can send it SIGSTOP :)
> >
> 
> Clever :)  One could I suppose achieve the same result by making the
> remote receive socket buffer size smaller than the UDP message size
> and then not worry about having to learn the netserver's PID to send
> it the SIGSTOP.  I *think* the semantics will be substantially the
> same?

If you could set it, yes. But at least Linux ignores
any value substantially smaller than 1K, and then
multiplies that by 2:

        case SO_RCVBUF:
                /* Don't error on this BSD doesn't and if you think
                   about it this is right. Otherwise apps have to
                   play 'guess the biggest size' games. RCVBUF/SNDBUF
                   are treated in BSD as hints */

                if (val > sysctl_rmem_max)
                        val = sysctl_rmem_max;
set_rcvbuf:     
                sk->sk_userlocks |= SOCK_RCVBUF_LOCK;

                /*
                 * We double it on the way in to account for
                 * "struct sk_buff" etc. overhead.   Applications
                 * assume that the SO_RCVBUF setting they make will
                 * allow that much actual data to be received on that
                 * socket.
                 *
                 * Applications are unaware that "struct sk_buff" and
                 * other overheads allocate from the receive buffer
                 * during socket buffer allocation. 
                 *
                 * And after considering the possible alternatives,
                 * returning the value we actually used in getsockopt
                 * is the most desirable behavior.
                 */ 
                if ((val * 2) < SOCK_MIN_RCVBUF)
                        sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
                else
                        sk->sk_rcvbuf = val * 2;

and

/*                      
 * Since sk_rmem_alloc sums skb->truesize, even a small frame might need
 * sizeof(sk_buff) + MTU + padding, unless net driver perform copybreak
 */             
#define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))
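
A rough model of the excerpt above, to show what a too-small request turns into. This is a sketch, not kernel code: toy_effective_rcvbuf() is a hypothetical helper, and rmem_max and min_rcvbuf are parameters standing in for sysctl_rmem_max and SOCK_MIN_RCVBUF, whose real values depend on the running kernel and architecture.

```c
#include <assert.h>

/*
 * Model of the SO_RCVBUF path quoted above: clamp the request to the
 * maximum, double it, and never go below the minimum.
 */
int toy_effective_rcvbuf(int val, int rmem_max, int min_rcvbuf)
{
	if (val > rmem_max)
		val = rmem_max;
	if (val * 2 < min_rcvbuf)
		return min_rcvbuf;
	return val * 2;
}
```

With an assumed minimum of 2280 bytes, a request for 512 bytes ends up as 2280, so the buffer cannot simply be shrunk below a small UDP message size.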


>  Both will be drops at the socket buffer, albeit for
> different reasons.  The "too small socket buffer" version though
> doesn't require one remember to "wake" the netserver in time to have
> it send results back to netperf without netperf tossing-up an error
> and not reporting any statistics.
> 
> Also, netperf has a "no control connection" mode where you can, in
> effect cause it to send UDP datagrams out into the void - I put it
> there to allow folks to test against the likes of echo discard and
> chargen services but it may have a use here.  Requires that one
> specify the destination IP and port for the "data connection"
> explicitly via the test-specific options.  In that mode the only
> stats reported are those local to netperf rather than netserver.

Ah, sounds perfect.

> happy benchmarking,
> 
> rick jones



* Re: Flow Control and Port Mirroring Revisited
From: Rick Jones @ 2011-01-24 18:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Simon Horman, Jesse Gross, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110123103902.GA28585@redhat.com>

> 
> Just to block netperf you can send it SIGSTOP :)
> 

Clever :)  One could I suppose achieve the same result by making the remote 
receive socket buffer size smaller than the UDP message size and then not worry 
about having to learn the netserver's PID to send it the SIGSTOP.  I *think* the 
semantics will be substantially the same?  Both will be drops at the socket 
buffer, albeit for different reasons.  The "too small socket buffer" version 
though doesn't require one remember to "wake" the netserver in time to have it 
send results back to netperf without netperf tossing-up an error and not 
reporting any statistics.

Also, netperf has a "no control connection" mode where you can, in effect cause 
it to send UDP datagrams out into the void - I put it there to allow folks to 
test against the likes of echo discard and chargen services but it may have a 
use here.  Requires that one specify the destination IP and port for the "data 
connection" explicitly via the test-specific options.  In that mode the only 
stats reported are those local to netperf rather than netserver.

happy benchmarking,

rick jones


* [PATCH] TCP: fix a bug that triggers large number of TCP RST by mistake
From: H.K. Jerry Chu @ 2011-01-22 19:06 UTC (permalink / raw)
  To: netdev; +Cc: Jerry Chu

From: Jerry Chu <hkchu@google.com>

This patch fixes a bug that causes TCP RST packets to be generated
for otherwise well-behaved applications (e.g., ones that leave no unread
data on close). To trigger the bug, at least two conditions must
be met:

1. The FIN flag is set on the last data packet, i.e., it's not on a
separate, FIN only packet.
2. The size of the last data chunk on the receive side matches
exactly with the size of buffer posted by the receiver, and the
receiver closes the socket without any further read attempt.

This bug was first noticed on our netperf based testbed for our IW10
proposal to IETF where a large number of RST packets were observed.
netperf's read side code meets condition 2 above 100% of the time.

Before the fix, tcp_data_queue() will queue the last skb that meets
condition 1 to sk_receive_queue even though it has fully copied out
(skb_copy_datagram_iovec()) the data. Then if condition 2 is also met,
tcp_recvmsg() often returns all the copied out data successfully
without actually consuming the skb, due to a check
"if ((chunk = len - tp->ucopy.len) != 0) {"
and
"len -= chunk;"
after tcp_prequeue_process() that causes "len" to become 0 and an
early exit from the big while loop.

I don't see any reason not to free the skb whose data have been fully
consumed in tcp_data_queue(), regardless of the FIN flag.  We won't
get there if MSG_PEEK is on. Am I missing some arcane cases related
to urgent data?

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
---
 net/ipv4/tcp_input.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2549b29..eb7f82e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4399,7 +4399,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 			if (!skb_copy_datagram_iovec(skb, 0, tp->ucopy.iov, chunk)) {
 				tp->ucopy.len -= chunk;
 				tp->copied_seq += chunk;
-				eaten = (chunk == skb->len && !th->fin);
+				eaten = (chunk == skb->len);
 				tcp_rcv_space_adjust(sk);
 			}
 			local_bh_disable();
-- 
1.7.3.1
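
The one-line change can be restated as two predicates. This is only a sketch: toy_eaten_before() and toy_eaten_after() are hypothetical names, and chunk, skb_len and fin stand in for the values used in tcp_data_queue().

```c
#include <assert.h>
#include <stdbool.h>

/* Before the patch: a fully copied skb that carries FIN is still queued. */
bool toy_eaten_before(unsigned int chunk, unsigned int skb_len, bool fin)
{
	return chunk == skb_len && !fin;
}

/* After the patch: a fully copied skb is consumed regardless of FIN. */
bool toy_eaten_after(unsigned int chunk, unsigned int skb_len, bool fin)
{
	(void)fin;	/* FIN no longer influences the decision */
	return chunk == skb_len;
}
```

The two predicates differ only when the whole skb was copied out and FIN is set, which is condition 1 from the description above.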



* Re: does intel X520-SR(ixgbe) support RSS on single VLAN?
From: Rick Jones @ 2011-01-24 18:10 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org, Rui
In-Reply-To: <4D3DB248.5070802@intel.com>

Alexander Duyck wrote:
> I would recommend testing with something like the "netperf -t TCP_CRR" 
> test which should open a number of ports and spread traffic out between 
> multiple queues.

TCP_CRR - Connect Request Response - it will cycle through almost the entire 
port space as it goes, one at a time.  Any one four-tuple will be unlikely to 
have all that many packets - just the SYN exchange, the request/response 
exchange and then the FIN exchange, so unless there is a tool looking at the 
queues with microsecond granularity, it will appear like it is all happening at 
once :)

If you want to see one queue used for "a while" and then another, I would 
suggest a TCP_RR test with the confidence intervals set to say 30 iterations. 
That will exchange packets for the test duration (global -l option) and then the 
next iteration will have a four-tuple that differs in the client port number 
from the previous (the "server" port number remains fixed through the iterations 
of the TCP_RR test).

One can also run TCP_RR tests in turn, one after the other, but that consumes 
port numbers in twos on both sides.  I suppose that these days with port number 
randomization that's OK, but in "the old days" it tended to mean that the 
control and data ports marched in lock-step and one would always be even the 
other odd, which didn't always work that well with hashes...  The use of the 
confidence intervals with the TCP_RR test deals with that by having only the one 
netperf control connection and then successive data connections.

happy benchmarking,

rick jones

There is also always the full specification of the port numbers and IP's for the 
data connection, though it is a bit more cumbersome.


* Re: [PATCH] xen: netfront: Drop GSO SKBs which do not have csum_blank.
From: Jeremy Fitzhardinge @ 2011-01-24 17:55 UTC (permalink / raw)
  To: Ian Campbell; +Cc: netdev@vger.kernel.org, xen-devel@lists.xensource.com
In-Reply-To: <1295689392.3693.153.camel@localhost.localdomain>

On 01/22/2011 01:43 AM, Ian Campbell wrote:
> On Sat, 2011-01-22 at 00:58 +0000, Jeremy Fitzhardinge wrote: 
>> On 01/05/2011 05:23 AM, Ian Campbell wrote:
>>> The Linux network stack expects all GSO SKBs to have ip_summed ==
>>> CHECKSUM_PARTIAL (which implies that the frame contains a partial
>>> checksum) and the Xen network ring protocol similarly expects an SKB
>>> which has GSO set to also have NETRX_csum_blank (which also implies a
>>> partial checksum). Therefore drop such frames on receive otherwise
>>> they will trigger the warning in skb_gso_segment.
>>>
>>> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>>> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
>>> Cc: xen-devel@lists.xensource.com
>>> Cc: netdev@vger.kernel.org
>>> ---
>>>  drivers/net/xen-netfront.c |    5 +++++
>>>  1 files changed, 5 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>>> index cdbeec9..8b8c480 100644
>>> --- a/drivers/net/xen-netfront.c
>>> +++ b/drivers/net/xen-netfront.c
>>> @@ -836,6 +836,11 @@ static int handle_incoming_queue(struct net_device *dev,
>>>  				dev->stats.rx_errors++;
>>>  				continue;
>>>  			}
>>> +		} else if (skb_is_gso(skb)) {
>>> +			kfree_skb(skb);
>>> +			packets_dropped++;
>>> +			dev->stats.rx_errors++;
>>> +			continue;
>> This looks redundant; why not something like:
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 47e6a71..c1b8f64 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -852,13 +852,12 @@ static int handle_incoming_queue(struct net_device *dev,
>>  		/* Ethernet work: Delayed to here as it peeks the header. */
>>  		skb->protocol = eth_type_trans(skb, dev);
>>  
>> -		if (skb->ip_summed == CHECKSUM_PARTIAL) {
>> -			if (skb_checksum_setup(skb)) {
>> -				kfree_skb(skb);
>> -				packets_dropped++;
>> -				dev->stats.rx_errors++;
>> -				continue;
>> -			}
>> +		if (skb->ip_summed != CHECKSUM_PARTIAL ||
>> +		    skb_checksum_setup(skb)) {
> That drops non-partial skbs. However they are fine unless they also
> claim to be gso.
>
> Perhaps you meant "skb->ip_summed == CHECKSUM_PARTIAL && !
> skb_checksum_setup(skb)" which I think works but doesn't allow us to
> correctly chain the gso check onto the else.

No, I didn't mean to drop the skb_is_gso() test.  But still, the if()s
can be folded to share the same body.

    J
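
One possible reading of "folded to share the same body", as a user-space sketch. It is an assumption about what the folded condition might look like, not code from either patch: toy_should_drop() is hypothetical, and setup_failed stands in for a non-zero return from skb_checksum_setup(), which the real code would only call for CHECKSUM_PARTIAL skbs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Drop decision with a single shared body:
 *  - CHECKSUM_PARTIAL skb: drop only if checksum setup failed;
 *  - any other skb: drop only if it still claims to be GSO.
 */
bool toy_should_drop(bool csum_partial, bool setup_failed, bool is_gso)
{
	return csum_partial ? setup_failed : is_gso;
}
```

Non-partial, non-GSO skbs still pass, which is the case Ian pointed out must not be dropped.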


* Re: netfilter: marking IPv6 packets sends them to the wrong interface
From: Patrick McHardy @ 2011-01-24 17:50 UTC (permalink / raw)
  To: Mario 'BitKoenig' Holbe, netfilter-devel, linux-kernel,
	NetDev
In-Reply-To: <20110124170213.GB2616@darkside.kls.lan>

On 24.01.2011 18:02, Mario 'BitKoenig' Holbe wrote:
> On Mon, Jan 24, 2011 at 05:10:50PM +0100, Patrick McHardy wrote:
>>> Yes, disabling the ip6_route_me_harder() call in ip6t_mangle_out()
>>> results in the advertisements being transmitted on the correct
>>> interfaces
>> Thanks. The problem appears to be that ip6_route_me_harder()
>> only uses the socket's oif for the route lookup when the
>> socket is bound to an interface, but radvd uses IPV6_PKTINFO
>> to specify the outgoing interface.
>>
>> I guess netfilter shouldn't be overriding IPV6_PKTINFO, but
>> we unfortunately have neither an indication of this nor the
>> original route lookup keys available at the time the packet
>> is rerouted.
> 
> Mh, I'm not sure, but I guess an indication of netfilter not overriding
> IPV6_PKTINFO could be the fact that the source address does not
> change...

No, ip6_route_me_harder() only attaches a new route to the packet,
the packet's contents are not changed.

> From my 1st mail:
> | # ip6tables -t mangle -A OUTPUT -o eth0 -j MARK --set-mark 1
> | # /etc/init.d/radvd start
> | -> eth0: <no traffic>
> | -> eth1: fe80::2a0:c9ff:fee6:90ce > ff02::1: prefix 2001:6f8:90c:10::/64
> | -> eth1: fe80::2d0:b7ff:fe06:6b36 > ff02::1: prefix 2001:6f8:90c:12::/64
> 
> fe80::2d0:b7ff:fe06:6b36 is the link-local address of eth0 set by radvd
> in IPV6_PKTINFO as well. This, of course, is no guarantee for
> ipi6_ifindex not being changed, but I believe if something would have
> changed it, it would also have changed ipi6_addr.

No, what is happening is that radvd sends the packet with a specified
ifindex using IPV6_PKTINFO. The mangle table notices that the mark
changes and calls ip6_route_me_harder(), which performs a new route
lookup without taking the specified oif into account. It therefore
chooses the first of your two routes and sends the packet out eth1.
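
The effect described here can be mimicked with a toy lookup. This is a sketch under assumptions, not the kernel's routing code: struct toy_route, toy_lookup() and the interface indexes are made up, and "first match wins" is a simplification of the real route selection.

```c
#include <assert.h>
#include <stddef.h>

struct toy_route {
	const char *dev;  /* outgoing interface name */
	int ifindex;      /* interface index (hypothetical values) */
};

/*
 * Return the first route in the table, or, when oif is non-zero, the first
 * route on interface oif. Returns NULL when nothing matches.
 */
const struct toy_route *toy_lookup(const struct toy_route *tbl, size_t n, int oif)
{
	for (size_t i = 0; i < n; i++)
		if (oif == 0 || tbl[i].ifindex == oif)
			return &tbl[i];
	return NULL;
}
```

With eth1 listed first, a lookup that passes the ifindex from IPV6_PKTINFO finds eth0, while a lookup with oif 0 (the reroute case) falls through to eth1.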


* [PATCH] GRO: fix merging a paged skb after non-paged skbs
From: Michal Schmidt @ 2011-01-24 17:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Herbert Xu, Ben Hutchings

Suppose that several linear skbs of the same flow were received by GRO. They
were thus merged into one skb with a frag_list. Then a new skb of the same flow
arrives, but it is a paged skb with data starting in its frags[].

Before adding the skb to the frag_list skb_gro_receive() will of course adjust
the skb to throw away the headers. It correctly modifies the page_offset and
size of the frag, but it leaves incorrect information in the skb:
 ->data_len is not decreased at all.
 ->len is decreased only by headlen, as if no change were done to the frag.
Later in a receiving process this causes skb_copy_datagram_iovec() to return
-EFAULT and this is seen in userspace as the result of the recv() syscall.

In practice the bug can be reproduced with the sfc driver. By default the
driver uses an adaptive scheme when it switches between using
napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
reproduced when under rx load with enough successful GRO merging the driver
decides to switch from the former to the latter.

Manual control is also possible, so reproducing this is easy with netcat:
 - on machine1 (with sfc): nc -l 12345 > /dev/null
 - on machine2: nc machine1 12345 < /dev/zero
 - on machine1:
   echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
   echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
 - See that nc has quit suddenly.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---
 net/core/skbuff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d31bb36..c231f5b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2746,7 +2746,7 @@ merge:
 	if (offset > headlen) {
 		skbinfo->frags[0].page_offset += offset - headlen;
 		skbinfo->frags[0].size -= offset - headlen;
-		offset = headlen;
+		skb->data_len -= offset - headlen;
 	}
 
 	__skb_pull(skb, offset);
-- 
1.7.1



* Re: does intel X520-SR(ixgbe) support RSS on single VLAN?
From: Alexander Duyck @ 2011-01-24 17:09 UTC (permalink / raw)
  To: Rui; +Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net
In-Reply-To: <AANLkTinuozfPcAZStV-a=siqcLesqPHnGhVh=QitOnQs@mail.gmail.com>

On 1/24/2011 6:18 AM, Rui wrote:
> hi
> does intel X520-SR support RSS on single VLAN?
>
> tested with 3 different vlan id and priority packets
> What I saw is that all packets were always delivered to the same RxQ.
> looks can not get a different RSS index for these packet?
> any setting needed?

The X520 should have no problems hashing on a single VLAN tagged frame.
However the VLAN will not be a part of the RSS hash.  The only
components of the hash are the IPv4/IPv6 source and destination
addresses, and if the flow is TCP then the port numbers.

I would recommend testing with something like the "netperf -t TCP_CRR" 
test which should open a number of ports and spread traffic out between 
multiple queues.
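
To illustrate why three VLAN IDs on one flow land on one queue, here is a toy queue selector. It is a sketch only: the hardware uses its own RSS hash, and FNV-1a is swapped in here purely as a stand-in. struct toy_flow and the toy_* functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct toy_flow {
	uint32_t saddr, daddr;  /* IPv4 addresses */
	uint16_t sport, dport;  /* TCP ports */
	uint16_t vlan_id;       /* carried in the frame, not hashed */
};

/* FNV-1a over one 32-bit word; a stand-in, not the NIC's RSS hash. */
static uint32_t toy_mix(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Pick an RX queue from the 4-tuple only; vlan_id is deliberately ignored. */
unsigned int toy_rx_queue(const struct toy_flow *f, unsigned int nqueues)
{
	uint32_t h = 2166136261u;

	h = toy_mix(h, f->saddr);
	h = toy_mix(h, f->daddr);
	h = toy_mix(h, ((uint32_t)f->sport << 16) | f->dport);
	return h % nqueues;
}
```

Changing only vlan_id never changes the result, whereas changing a port can, which is why a test that opens many connections is the better way to see several queues in use.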

Thanks,

Alex


* Re: netfilter: marking IPv6 packets sends them to the wrong interface
From: Mario 'BitKoenig' Holbe @ 2011-01-24 17:02 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, linux-kernel, NetDev
In-Reply-To: <4D3DA48A.2020605@trash.net>


On Mon, Jan 24, 2011 at 05:10:50PM +0100, Patrick McHardy wrote:
> > Yes, disabling the ip6_route_me_harder() call in ip6t_mangle_out()
> > results in the advertisements being transmitted on the correct
> > interfaces
> Thanks. The problem appears to be that ip6_route_me_harder()
> only uses the socket's oif for the route lookup when the
> socket is bound to an interface, but radvd uses IPV6_PKTINFO
> to specify the outgoing interface.
> 
> I guess netfilter shouldn't be overriding IPV6_PKTINFO, but
> we unfortunately have neither an indication of this nor the
> original route lookup keys available at the time the packet
> is rerouted.

Mh, I'm not sure, but I guess an indication of netfilter not overriding
IPV6_PKTINFO could be the fact that the source address does not
change...

From my 1st mail:
| # ip6tables -t mangle -A OUTPUT -o eth0 -j MARK --set-mark 1
| # /etc/init.d/radvd start
| -> eth0: <no traffic>
| -> eth1: fe80::2a0:c9ff:fee6:90ce > ff02::1: prefix 2001:6f8:90c:10::/64
| -> eth1: fe80::2d0:b7ff:fe06:6b36 > ff02::1: prefix 2001:6f8:90c:12::/64

fe80::2d0:b7ff:fe06:6b36 is the link-local address of eth0 set by radvd
in IPV6_PKTINFO as well. This, of course, is no guarantee for
ipi6_ifindex not being changed, but I believe if something would have
changed it, it would also have changed ipi6_addr.


Mario
-- 
Doing it right is no excuse for not meeting the schedule.
                                -- Plant Manager, Delphi Corporation


* RE: Using ethernet device as efficient small packet generator
From: Eric Dumazet @ 2011-01-24 16:34 UTC (permalink / raw)
  To: juice
  Cc: Brandeburg, Jesse, Loke, Chetan, Jon Zhou, Stephen Hemminger,
	netdev@vger.kernel.org
In-Reply-To: <30747065682effddc661b8cd235553d9.squirrel@www.liukuma.net>

On Monday, 24 January 2011 at 10:10 +0200, juice wrote:

> Result: OK: 12345544(c12344739+d804) nsec, 10000000 (60byte,0frags)
>   810008pps 388Mb/sec (388803840bps) errors: 0
> 

> 
> This is a bit better than the previous maximum of 750064pps / 360Mb/sec
> that I was able to achieve without tuning parameters with ethtool, but
> still not near the 1.1Mpackets/s that should be doable with my card?

Please check what numbers you can get using dummy0 device instead of
real ethernet driver.

Here : (E5540  @ 2.53GHz) clone = 1

Result: OK: 34775941(c34775225+d716) nsec, 100000000 (60byte,0frags)
  2875551pps 1380Mb/sec (1380264480bps) errors: 0
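
The figures in both Result lines can be cross-checked with a little arithmetic. The helpers below are hypothetical. Note also that the reported rates only come out when the elapsed figure is read as microseconds despite the "nsec" label; that reading is an inference from the numbers, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second, treating the elapsed figure as microseconds. */
uint64_t toy_pps(uint64_t packets, uint64_t elapsed_us)
{
	return packets * 1000000ull / elapsed_us;
}

/* Bits per second of packet data for fixed-size packets. */
uint64_t toy_bps(uint64_t pps, uint64_t pkt_bytes)
{
	return pps * pkt_bytes * 8;
}
```

For the first result, 10000000 packets over 12345544 gives 810008 pps, and 810008 * 60 * 8 gives 388803840 bps; the dummy0 run works out the same way to 2875551 pps and 1380264480 bps.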





* Re: netfilter: marking IPv6 packets sends them to the wrong interface
From: Patrick McHardy @ 2011-01-24 16:10 UTC (permalink / raw)
  To: Mario 'BitKoenig' Holbe, netfilter-devel, linux-kernel,
	NetDev
In-Reply-To: <20110124143518.GA2616@darkside.kls.lan>

On 24.01.2011 15:35, Mario 'BitKoenig' Holbe wrote:
> On Mon, Jan 24, 2011 at 02:46:57PM +0100, Patrick McHardy wrote:
>> On 23.01.2011 13:21, Mario 'BitKoenig' Holbe wrote:
>>> Without marking everything runs as it should be.
>>> Marking eth0 packets results in all advertisements transmitted via eth1.
>>> The behaviour goes back to normal as soon as the marking disappears.
>>> I also tried marking with 0xff00 instead of 1 - same results.
>> That probably means that we're not using the correct keys
>> when rerouting in ip6_route_me_harder(). Just for testing,
>> please try to disable the ip6_route_me_harder() call in
>> net/ipv6/netfilter/ip6table_mangle.c::ip6t_mangle_out().
> 
> Yes, disabling the ip6_route_me_harder() call in ip6t_mangle_out()
> results in the advertisements being transmitted on the correct
> interfaces

Thanks. The problem appears to be that ip6_route_me_harder()
only uses the socket's oif for the route lookup when the
socket is bound to an interface, but radvd uses IPV6_PKTINFO
to specify the outgoing interface.

I guess netfilter shouldn't be overriding IPV6_PKTINFO, but
we unfortunately have neither an indication of this nor the
original route lookup keys available at the time the packet
is rerouted.


* Re: [net-2.6 PATCH 1/2] net: dcbnl: remove redundant DCB_CAP_DCBX_STATIC bit
From: John Fastabend @ 2011-01-24 15:52 UTC (permalink / raw)
  To: Shmulik Ravid; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <1295882871.25104.20.camel@lb-tlvb-shmulik.il.broadcom.com>

On 1/24/2011 7:27 AM, Shmulik Ravid wrote:
> 
> On Sun, 2011-01-23 at 21:46 -0800, John Fastabend wrote:
>> On 1/23/2011 8:53 AM, Shmulik Ravid wrote:
>>>
>>> On Fri, 2011-01-21 at 18:52 -0800, John Fastabend wrote:
>>>> On 1/21/2011 6:35 PM, John Fastabend wrote:
>>>>> Remove redundant DCB_CAP_DCBX_STATIC bit in DCB capabilities
>>>>>
>>>>> Setting this bit indicates that no embedded DCBx engine is
>>>>> present and the hardware can not be configured. This is the
>>>>> same as having none of the DCB capability flags set or simply
>>>>> not implementing the dcbnl ops at all.
>>>>>
>>>>> This patch removes this bit. The bit has not made a stable
>>>>> release yet so removing it should not be an issue with
>>>>> existing apps.
>>>>>
>>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>>> CC: Shmulik Ravid <shmulikr@broadcom.com>
>>>>> ---
>>>>>
>>>>

Dave, please drop this patch; sorry for the noise.

[...]

>> We have an advertise bit in userspace that can be set and cleared to
>> do something similar for host based agents. I think for pg and application
>> data you can get the same behavior by setting the device to not willing.
>>
> True, but this requires a proper DCBx peer. The STATIC option is a bit
> stronger.

At least in the PG case the CEE spec says the local configuration should be
used[1]. Application is a bit more vague in my opinion[2].

> 
>> However for PFC it could potentially be useful. But how would the
>> user set this mode? This is a capabilities bit indicating the device
>> supports this. Is there a way to subsequently put the device in this
>> mode?
> You can set this mode by specifying this attribute in the set_dcbx
> operation. The input to set_dcbx should be a subset of the advertised
> dcbx attributes.
> 

OK, this works for me, Shmulik. Thanks for the explanation.

[1] 3.1.4. http://www.ieee802.org/1/files/public/docs2008/az-wadekar-dcbx-capability-exchange-discovery-protocol-1108-v1.01.pdf

[2] 3.3.2. http://www.ieee802.org/1/files/public/docs2008/az-wadekar-dcbx-capability-exchange-discovery-protocol-1108-v1.01.pdf


* [PATCH net-next-2.6] veth: remove unneeded ifname code from veth_newlink()
From: Jiri Pirko @ 2011-01-24 15:45 UTC (permalink / raw)
  To: netdev; +Cc: davem, xemul

The code is not needed because tb[IFLA_IFNAME] is already
processed in rtnl_newlink(). Remove this redundancy.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index cc83fa7..105d7f0 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -403,17 +403,6 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	if (tb[IFLA_ADDRESS] == NULL)
 		random_ether_addr(dev->dev_addr);
 
-	if (tb[IFLA_IFNAME])
-		nla_strlcpy(dev->name, tb[IFLA_IFNAME], IFNAMSIZ);
-	else
-		snprintf(dev->name, IFNAMSIZ, DRV_NAME "%%d");
-
-	if (strchr(dev->name, '%')) {
-		err = dev_alloc_name(dev, dev->name);
-		if (err < 0)
-			goto err_alloc_name;
-	}
-
 	err = register_netdevice(dev);
 	if (err < 0)
 		goto err_register_dev;
@@ -433,7 +422,6 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 
 err_register_dev:
 	/* nothing to do */
-err_alloc_name:
 err_configure_peer:
 	unregister_netdevice(peer);
 	return err;


* Re: 2.6.37 regression: adding main interface to a bridge breaks vlan interface RX
From: Maciej Rutecki @ 2011-01-24 15:25 UTC (permalink / raw)
  To: Jesse Gross; +Cc: Simon Arlott, netdev, Linux Kernel Mailing List
In-Reply-To: <AANLkTimTz=kMmJ=YhyJv28WUrsbN=ygBt9e7dMJ3KGqB@mail.gmail.com>

On Sunday, 23 January 2011 at 22:29:02, Jesse Gross wrote:
> On Sun, Jan 23, 2011 at 9:45 AM, Maciej Rutecki
> 
> <maciej.rutecki@gmail.com> wrote:
> > I created a Bugzilla entry at
> > https://bugzilla.kernel.org/show_bug.cgi?id=27432
> > for your bug report, please add your address to the CC list in there,
> > thanks!
> 
> This isn't a bug - the change resolved behavior that varied depending
> on what NIC was in use.

Thanks for the update. Closing.

Regards
-- 
Maciej Rutecki
http://www.maciek.unixy.pl


* Re: [PATCH v5] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Eric Dumazet @ 2011-01-24 15:07 UTC (permalink / raw)
  To: Po-Yu Chuang
  Cc: netdev, linux-kernel, bhutchings, joe, dilinger, mirqus,
	Po-Yu Chuang
In-Reply-To: <1295872799-1637-1-git-send-email-ratbert.chuang@gmail.com>

On Monday, 24 January 2011 at 20:39 +0800, Po-Yu Chuang wrote:
> From: Po-Yu Chuang <ratbert@faraday-tech.com>


> +static int ftmac100_xmit(struct ftmac100 *priv, struct sk_buff *skb,
> +			 dma_addr_t map)
> +{
> +	struct net_device *netdev = priv->netdev;
> +	struct ftmac100_txdes *txdes;
> +	unsigned int len = (skb->len < ETH_ZLEN) ? ETH_ZLEN : skb->len;
> +
> +	txdes = ftmac100_current_txdes(priv);
> +	ftmac100_tx_pointer_advance(priv);
> +
> +	/* setup TX descriptor */
> +
> +	spin_lock(&priv->tx_lock);
> +	ftmac100_txdes_set_skb(txdes, skb);
> +	ftmac100_txdes_set_dma_addr(txdes, map);
> +
> +	ftmac100_txdes_set_first_segment(txdes);
> +	ftmac100_txdes_set_last_segment(txdes);
> +	ftmac100_txdes_set_txint(txdes);
> +	ftmac100_txdes_set_buffer_size(txdes, len);
> +
> +	priv->tx_pending++;
> +	if (priv->tx_pending == TX_QUEUE_ENTRIES) {
> +		if (net_ratelimit())
> +			netdev_info(netdev, "tx queue full\n");

Hmm, I guess you didn't test your driver with a pktgen flood ;)

This 'netdev_info(netdev, "tx queue full\n");' is not necessary, since
it's a pretty normal condition for a driver (to fill its TX ring buffer)

> +
> +		netif_stop_queue(netdev);
> +	}
> +
> +	/* start transmit */
> +	ftmac100_txdes_set_dma_own(txdes);
> +	spin_unlock(&priv->tx_lock);
> +
> +	ftmac100_txdma_start_polling(priv);
> +
> +	return NETDEV_TX_OK;
> +}


* Re: [PATCH v3 0/3] can: at91_can: fix for errata 50.2.6.3 & 50.3.5.3
From: Marc Kleine-Budde @ 2011-01-24 14:19 UTC (permalink / raw)
  To: David Miller
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1295878532-15769-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>



On 01/24/2011 03:15 PM, Marc Kleine-Budde wrote:
> Hello,
> 
> as promised I've implemented the proposed workaround for the errata
> 50.2.6.3 & 50.3.5.3:
> "Contents of Mailbox 0 can be sent Even if Mailbox is Disabled"
> 
> This means under high bus load it can happen that the mailbox 0 is sent
> to the bus. And it does happen, even with the mainline driver where
> Mailbox 0 is a receive mailbox. The errata proposes not to use mailbox 0
> and load it with an unused can_id that will not disturb the bus.
> 
> The first patch cleans up the driver without any functional changes, so
> that the mailbox 0 can be disabled in the second patch. The third patch
> adds a sysfs parameter to the driver, so that the identifier of mailbox 0
> can be configured.
> 
> This series applies to net-2.6/master. It has been tested on a ronetix pm9263
> board against a PCI-SJA1000 card with the canfdtest utility and on custom
> at91 boards against each other.

I've updated the patch series in my git-repo, too.

The following changes since commit b30532515f0a62bfe17207ab00883dd262497006:

  bonding: Ensure that we unshare skbs prior to calling pskb_may_pull (2011-01-20 16:45:56 -0800)

are available in the git repository at:
  git://git.pengutronix.de/git/mkl/linux-2.6.git can/at91_can-for-net-2.6

Marc Kleine-Budde (3):
      can: at91_can: clean up usage of AT91_MB_RX_FIRST and AT91_MB_RX_NUM
      can: at91_can: don't use mailbox 0
      can: at91_can: make can_id of mailbox 0 configurable

 Documentation/ABI/testing/sysfs-platform-at91 |   25 +++++
 drivers/net/can/at91_can.c                    |  138 ++++++++++++++++++++-----
 2 files changed, 137 insertions(+), 26 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-platform-at91

regards, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


^ permalink raw reply

* does intel X520-SR(ixgbe) support RSS on single VLAN?
From: Rui @ 2011-01-24 14:18 UTC (permalink / raw)
  To: netdev

Hi,

Does the Intel X520-SR (ixgbe) support RSS on a single VLAN?

I tested with packets using 3 different VLAN IDs and priorities.
What I saw is that all packets were always delivered to the same RxQ.
It looks like these packets never get a different RSS index.
Is any setting needed?

^ permalink raw reply

* [PATCH v3 2/3] can: at91_can: don't use mailbox 0
From: Marc Kleine-Budde @ 2011-01-24 14:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Socketcan-core, Marc Kleine-Budde
In-Reply-To: <1295878532-15769-1-git-send-email-mkl@pengutronix.de>

Due to a chip bug (errata 50.2.6.3 & 50.3.5.3 in
"AT91SAM9263 Preliminary 6249H-ATARM-27-Jul-09") the contents of mailbox
0 may be sent under certain conditions (even if disabled or in rx mode).

The workaround in the errata suggests not to use the mailbox and load it
with an unused identifier.

This patch implements the first part of the workaround: it updates
AT91_MB_RX_NUM and AT91_MB_RX_FIRST (and the inline documentation)
so that mailbox 0 stays unused.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
---
 drivers/net/can/at91_can.c |   32 ++++++++++++++++++++------------
 1 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/net/can/at91_can.c b/drivers/net/can/at91_can.c
index 892c3d8..16e45a5 100644
--- a/drivers/net/can/at91_can.c
+++ b/drivers/net/can/at91_can.c
@@ -40,16 +40,16 @@
 
 #include <mach/board.h>
 
-#define AT91_NAPI_WEIGHT	12
+#define AT91_NAPI_WEIGHT	11
 
 /*
  * RX/TX Mailbox split
  * don't dare to touch
  */
-#define AT91_MB_RX_NUM		12
+#define AT91_MB_RX_NUM		11
 #define AT91_MB_TX_SHIFT	2
 
-#define AT91_MB_RX_FIRST	0
+#define AT91_MB_RX_FIRST	1
 #define AT91_MB_RX_LAST		(AT91_MB_RX_FIRST + AT91_MB_RX_NUM - 1)
 
 #define AT91_MB_RX_MASK(i)	((1 << (i)) - 1)
@@ -236,10 +236,14 @@ static void at91_setup_mailboxes(struct net_device *dev)
 	unsigned int i;
 
 	/*
-	 * The first 12 mailboxes are used as a reception FIFO. The
-	 * last mailbox is configured with overwrite option. The
-	 * overwrite flag indicates a FIFO overflow.
+	 * Due to a chip bug (errata 50.2.6.3 & 50.3.5.3) the first
+	 * mailbox is disabled. The next 11 mailboxes are used as a
+	 * reception FIFO. The last mailbox is configured with
+	 * overwrite option. The overwrite flag indicates a FIFO
+	 * overflow.
 	 */
+	for (i = 0; i < AT91_MB_RX_FIRST; i++)
+		set_mb_mode(priv, i, AT91_MB_MODE_DISABLED);
 	for (i = AT91_MB_RX_FIRST; i < AT91_MB_RX_LAST; i++)
 		set_mb_mode(priv, i, AT91_MB_MODE_RX);
 	set_mb_mode(priv, AT91_MB_RX_LAST, AT91_MB_MODE_RX_OVRWR);
@@ -541,27 +545,31 @@ static void at91_read_msg(struct net_device *dev, unsigned int mb)
  *
  * Theory of Operation:
  *
- * 12 of the 16 mailboxes on the chip are reserved for RX. we split
- * them into 2 groups. The lower group holds 8 and upper 4 mailboxes.
+ * 11 of the 16 mailboxes on the chip are reserved for RX. we split
+ * them into 2 groups. The lower group holds 7 and upper 4 mailboxes.
  *
  * Like it or not, but the chip always saves a received CAN message
  * into the first free mailbox it finds (starting with the
  * lowest). This makes it very difficult to read the messages in the
  * right order from the chip. This is how we work around that problem:
  *
- * The first message goes into mb nr. 0 and issues an interrupt. All
+ * The first message goes into mb nr. 1 and issues an interrupt. All
  * rx ints are disabled in the interrupt handler and a napi poll is
  * scheduled. We read the mailbox, but do _not_ reenable the mb (to
  * receive another message).
  *
  *    lower mbxs      upper
- *   ______^______    __^__
- *  /             \  /     \
+ *     ____^______    __^__
+ *    /           \  /     \
  * +-+-+-+-+-+-+-+-++-+-+-+-+
- * |x|x|x|x|x|x|x|x|| | | | |
+ * | |x|x|x|x|x|x|x|| | | | |
  * +-+-+-+-+-+-+-+-++-+-+-+-+
  *  0 0 0 0 0 0  0 0 0 0 1 1  \ mail
  *  0 1 2 3 4 5  6 7 8 9 0 1  / box
+ *  ^
+ *  |
+ *   \
+ *     unused, due to chip bug
  *
  * The variable priv->rx_next points to the next mailbox to read a
  * message from. As long we're in the lower mailboxes we just read the
-- 
1.7.2.3
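
The resulting mailbox split can be checked with a small standalone sketch.
The RX macros and AT91_MB_TX_SHIFT carry the values from the patch above;
AT91_MB_TX_NUM and AT91_MB_TX_FIRST follow the driver's definitions, while
AT91_MB_TX_LAST, the enum and mb_role() are assumptions added here only to
make the layout explicit:

```c
#define AT91_MB_RX_NUM		11
#define AT91_MB_TX_SHIFT	2

#define AT91_MB_RX_FIRST	1
#define AT91_MB_RX_LAST		(AT91_MB_RX_FIRST + AT91_MB_RX_NUM - 1)

#define AT91_MB_TX_NUM		(1 << AT91_MB_TX_SHIFT)
#define AT91_MB_TX_FIRST	(AT91_MB_RX_LAST + 1)
#define AT91_MB_TX_LAST		(AT91_MB_TX_FIRST + AT91_MB_TX_NUM - 1)

/* Hypothetical classification of a mailbox number, for illustration. */
enum mb_role { MB_DISABLED, MB_RX, MB_RX_OVRWR, MB_TX, MB_INVALID };

static enum mb_role mb_role(unsigned int mb)
{
	if (mb < AT91_MB_RX_FIRST)
		return MB_DISABLED;	/* mailbox 0: unused due to the chip bug */
	if (mb < AT91_MB_RX_LAST)
		return MB_RX;		/* plain reception FIFO entries */
	if (mb == AT91_MB_RX_LAST)
		return MB_RX_OVRWR;	/* last RX mailbox, overwrite mode */
	if (mb <= AT91_MB_TX_LAST)
		return MB_TX;
	return MB_INVALID;		/* the chip has only 16 mailboxes */
}
```

With these values mailbox 0 is disabled, mailboxes 1 to 11 receive, and
mailboxes 12 to 15 transmit, which still adds up to the 16 mailboxes of
the chip.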


^ permalink raw reply related

* [PATCH v3 0/3] can: at91_can: fix for errata 50.2.6.3 & 50.3.5.3
From: Marc Kleine-Budde @ 2011-01-24 14:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Socketcan-core

Hello,

as promised I've implemented the proposed workaround for the errata
50.2.6.3 & 50.3.5.3:
"Contents of Mailbox 0 can be sent Even if Mailbox is Disabled"

This means under high bus load it can happen that the mailbox 0 is sent
to the bus. And it does happen, even with the mainline driver where
Mailbox 0 is a receive mailbox. The errata proposes not to use mailbox 0
and load it with an unused can_id that will not disturb the bus.

The first patch cleans up the driver without any functional changes, so
that the mailbox 0 can be disabled in the second patch. The third patch
adds a sysfs parameter to the driver, so that the identifier of mailbox 0
can be configured.

This series applies to net-2.6/master. It has been tested on a ronetix pm9263
board against a PCI-SJA1000 card with the canfdtest utility and on custom
at91 boards against each other.

changes since v2:
- rebased to current net-2.6/master
- added Acked-by (Thanks to Kurt Van Dijck, Wolfgang Grandegger
  and Wolfram Sang)

regards, Marc




^ permalink raw reply

* [PATCH v3 3/3] can: at91_can: make can_id of mailbox 0 configurable
From: Marc Kleine-Budde @ 2011-01-24 14:15 UTC (permalink / raw)
  To: David Miller
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1295878532-15769-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

Due to a chip bug (errata 50.2.6.3 & 50.3.5.3 in
"AT91SAM9263 Preliminary 6249H-ATARM-27-Jul-09") the contents of mailbox
0 may be sent under certain conditions (even if disabled or in rx mode).

The workaround in the errata suggests not to use the mailbox and load it
with an unused identifier.

This patch implements the second part of the workaround. A sysfs entry
"mb0_id" is introduced. While the interface is down it can be used to
configure the can_id of mailbox 0. The default value is 0x7ff.

In order to use an extended can_id add the CAN_EFF_FLAG (0x80000000U)
to the can_id. Example:

- standard id 0x7ff:
echo 0x7ff      > /sys/class/net/can0/mb0_id

- extended id 0x1fffffff:
echo 0x9fffffff > /sys/class/net/can0/mb0_id

Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Acked-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
Acked-by: Kurt Van Dijck <kurt.van.dijck-/BeEPy95v10@public.gmane.org>
For the Documentation-part:
Acked-by: Wolfram Sang <w.sang-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
---
 Documentation/ABI/testing/sysfs-platform-at91 |   25 +++++++
 drivers/net/can/at91_can.c                    |   90 +++++++++++++++++++++++--
 2 files changed, 108 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-platform-at91

diff --git a/Documentation/ABI/testing/sysfs-platform-at91 b/Documentation/ABI/testing/sysfs-platform-at91
new file mode 100644
index 0000000..4cc6a86
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-platform-at91
@@ -0,0 +1,25 @@
+What:		/sys/devices/platform/at91_can/net/<iface>/mb0_id
+Date:		January 2011
+KernelVersion:	2.6.38
+Contact:	Marc Kleine-Budde <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
+Description:
+		Value representing the can_id of mailbox 0.
+
+		Default: 0x7ff (standard frame)
+
+		Due to a chip bug (errata 50.2.6.3 & 50.3.5.3 in
+		"AT91SAM9263 Preliminary 6249H-ATARM-27-Jul-09") the
+		contents of mailbox 0 may be send under certain
+		conditions (even if disabled or in rx mode).
+
+		The workaround in the errata suggests not to use the
+		mailbox and load it with an unused identifier.
+
+		In order to use an extended can_id add the
+		CAN_EFF_FLAG (0x80000000U) to the can_id. Example:
+
+		- standard id 0x7ff:
+		echo 0x7ff      > /sys/class/net/can0/mb0_id
+
+		- extended id 0x1fffffff:
+		echo 0x9fffffff > /sys/class/net/can0/mb0_id
diff --git a/drivers/net/can/at91_can.c b/drivers/net/can/at91_can.c
index 16e45a5..2532b96 100644
--- a/drivers/net/can/at91_can.c
+++ b/drivers/net/can/at91_can.c
@@ -30,6 +30,7 @@
 #include <linux/module.h>
 #include <linux/netdevice.h>
 #include <linux/platform_device.h>
+#include <linux/rtnetlink.h>
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
 #include <linux/string.h>
@@ -169,6 +170,8 @@ struct at91_priv {
 
 	struct clk		*clk;
 	struct at91_can_data	*pdata;
+
+	canid_t			mb0_id;
 };
 
 static struct can_bittiming_const at91_bittiming_const = {
@@ -221,6 +224,18 @@ static inline void set_mb_mode(const struct at91_priv *priv, unsigned int mb,
 	set_mb_mode_prio(priv, mb, mode, 0);
 }
 
+static inline u32 at91_can_id_to_reg_mid(canid_t can_id)
+{
+	u32 reg_mid;
+
+	if (can_id & CAN_EFF_FLAG)
+		reg_mid = (can_id & CAN_EFF_MASK) | AT91_MID_MIDE;
+	else
+		reg_mid = (can_id & CAN_SFF_MASK) << 18;
+
+	return reg_mid;
+}
+
 /*
  * Swtich transceiver on or off
  */
@@ -234,6 +249,7 @@ static void at91_setup_mailboxes(struct net_device *dev)
 {
 	struct at91_priv *priv = netdev_priv(dev);
 	unsigned int i;
+	u32 reg_mid;
 
 	/*
 	 * Due to a chip bug (errata 50.2.6.3 & 50.3.5.3) the first
@@ -242,8 +258,13 @@ static void at91_setup_mailboxes(struct net_device *dev)
 	 * overwrite option. The overwrite flag indicates a FIFO
 	 * overflow.
 	 */
-	for (i = 0; i < AT91_MB_RX_FIRST; i++)
+	reg_mid = at91_can_id_to_reg_mid(priv->mb0_id);
+	for (i = 0; i < AT91_MB_RX_FIRST; i++) {
 		set_mb_mode(priv, i, AT91_MB_MODE_DISABLED);
+		at91_write(priv, AT91_MID(i), reg_mid);
+		at91_write(priv, AT91_MCR(i), 0x0);	/* clear dlc */
+	}
+
 	for (i = AT91_MB_RX_FIRST; i < AT91_MB_RX_LAST; i++)
 		set_mb_mode(priv, i, AT91_MB_MODE_RX);
 	set_mb_mode(priv, AT91_MB_RX_LAST, AT91_MB_MODE_RX_OVRWR);
@@ -378,12 +399,7 @@ static netdev_tx_t at91_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		netdev_err(dev, "BUG! TX buffer full when queue awake!\n");
 		return NETDEV_TX_BUSY;
 	}
-
-	if (cf->can_id & CAN_EFF_FLAG)
-		reg_mid = (cf->can_id & CAN_EFF_MASK) | AT91_MID_MIDE;
-	else
-		reg_mid = (cf->can_id & CAN_SFF_MASK) << 18;
-
+	reg_mid = at91_can_id_to_reg_mid(cf->can_id);
 	reg_mcr = ((cf->can_id & CAN_RTR_FLAG) ? AT91_MCR_MRTR : 0) |
 		(cf->can_dlc << 16) | AT91_MCR_MTCR;
 
@@ -1047,6 +1063,64 @@ static const struct net_device_ops at91_netdev_ops = {
 	.ndo_start_xmit	= at91_start_xmit,
 };
 
+static ssize_t at91_sysfs_show_mb0_id(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct at91_priv *priv = netdev_priv(to_net_dev(dev));
+
+	if (priv->mb0_id & CAN_EFF_FLAG)
+		return snprintf(buf, PAGE_SIZE, "0x%08x\n", priv->mb0_id);
+	else
+		return snprintf(buf, PAGE_SIZE, "0x%03x\n", priv->mb0_id);
+}
+
+static ssize_t at91_sysfs_set_mb0_id(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct net_device *ndev = to_net_dev(dev);
+	struct at91_priv *priv = netdev_priv(ndev);
+	unsigned long can_id;
+	ssize_t ret;
+	int err;
+
+	rtnl_lock();
+
+	if (ndev->flags & IFF_UP) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	err = strict_strtoul(buf, 0, &can_id);
+	if (err) {
+		ret = err;
+		goto out;
+	}
+
+	if (can_id & CAN_EFF_FLAG)
+		can_id &= CAN_EFF_MASK | CAN_EFF_FLAG;
+	else
+		can_id &= CAN_SFF_MASK;
+
+	priv->mb0_id = can_id;
+	ret = count;
+
+ out:
+	rtnl_unlock();
+	return ret;
+}
+
+static DEVICE_ATTR(mb0_id, S_IWUGO | S_IRUGO,
+	at91_sysfs_show_mb0_id, at91_sysfs_set_mb0_id);
+
+static struct attribute *at91_sysfs_attrs[] = {
+	&dev_attr_mb0_id.attr,
+	NULL,
+};
+
+static struct attribute_group at91_sysfs_attr_group = {
+	.attrs = at91_sysfs_attrs,
+};
+
 static int __devinit at91_can_probe(struct platform_device *pdev)
 {
 	struct net_device *dev;
@@ -1092,6 +1166,7 @@ static int __devinit at91_can_probe(struct platform_device *pdev)
 	dev->netdev_ops	= &at91_netdev_ops;
 	dev->irq = irq;
 	dev->flags |= IFF_ECHO;
+	dev->sysfs_groups[0] = &at91_sysfs_attr_group;
 
 	priv = netdev_priv(dev);
 	priv->can.clock.freq = clk_get_rate(clk);
@@ -1103,6 +1178,7 @@ static int __devinit at91_can_probe(struct platform_device *pdev)
 	priv->dev = dev;
 	priv->clk = clk;
 	priv->pdata = pdev->dev.platform_data;
+	priv->mb0_id = 0x7ff;
 
 	netif_napi_add(dev, &priv->napi, at91_poll, AT91_NAPI_WEIGHT);
 
-- 
1.7.2.3
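
The two examples from the commit message can be traced with a userspace
sketch of the masking and the register conversion. The CAN_* values are
the ones from linux/can.h; the value used for AT91_MID_MIDE (bit 29) is an
assumption taken from memory of the driver and should be checked against
at91_can.c. sanitize_mb0_id() is a hypothetical name for the masking step
done in at91_sysfs_set_mb0_id():

```c
#include <stdint.h>

#define CAN_EFF_FLAG	0x80000000U	/* extended frame format flag */
#define CAN_SFF_MASK	0x000007FFU	/* 11 bit standard identifier */
#define CAN_EFF_MASK	0x1FFFFFFFU	/* 29 bit extended identifier */
#define AT91_MID_MIDE	(1U << 29)	/* assumed position of the MIDE bit */

/* Masking step of the sysfs store: drop every bit that is not part of the id. */
static uint32_t sanitize_mb0_id(unsigned long raw)
{
	uint32_t can_id = (uint32_t)raw;

	if (can_id & CAN_EFF_FLAG)
		return can_id & (CAN_EFF_MASK | CAN_EFF_FLAG);
	return can_id & CAN_SFF_MASK;
}

/* Same logic as at91_can_id_to_reg_mid() in the patch. */
static uint32_t can_id_to_reg_mid(uint32_t can_id)
{
	if (can_id & CAN_EFF_FLAG)
		return (can_id & CAN_EFF_MASK) | AT91_MID_MIDE;
	return (can_id & CAN_SFF_MASK) << 18;
}
```

Under these assumptions, writing 0x7ff yields a MID register value of
0x1ffc0000 (the standard id shifted up by 18 bits), and writing 0x9fffffff
yields 0x3fffffff (the 29 bit id plus the MIDE bit).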

^ permalink raw reply related

* [PATCH v3 1/3] can: at91_can: clean up usage of AT91_MB_RX_FIRST and AT91_MB_RX_NUM
From: Marc Kleine-Budde @ 2011-01-24 14:15 UTC (permalink / raw)
  To: David Miller
  Cc: Socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1295878532-15769-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

This patch cleans up the usage of two macros which specify the mailbox
usage. AT91_MB_RX_FIRST and AT91_MB_RX_NUM define the first and the
number of RX mailboxes. The current driver uses these macros in an
unclean way, assuming that AT91_MB_RX_FIRST is 0.

This patch cleans up the usage of these macros, no longer assuming
AT91_MB_RX_FIRST == 0.

Signed-off-by: Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Acked-by: Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
---
 drivers/net/can/at91_can.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/can/at91_can.c b/drivers/net/can/at91_can.c
index 7ef83d0..892c3d8 100644
--- a/drivers/net/can/at91_can.c
+++ b/drivers/net/can/at91_can.c
@@ -2,7 +2,7 @@
  * at91_can.c - CAN network driver for AT91 SoC CAN controller
  *
  * (C) 2007 by Hans J. Koch <hjk-vqZO0P4V72/QD6PfKP4TzA@public.gmane.org>
- * (C) 2008, 2009, 2010 by Marc Kleine-Budde <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
+ * (C) 2008, 2009, 2010, 2011 by Marc Kleine-Budde <kernel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  *
  * This software may be distributed under the terms of the GNU General
  * Public License ("GPL") version 2 as distributed in the 'COPYING'
@@ -55,7 +55,8 @@
 #define AT91_MB_RX_MASK(i)	((1 << (i)) - 1)
 #define AT91_MB_RX_SPLIT	8
 #define AT91_MB_RX_LOW_LAST	(AT91_MB_RX_SPLIT - 1)
-#define AT91_MB_RX_LOW_MASK	(AT91_MB_RX_MASK(AT91_MB_RX_SPLIT))
+#define AT91_MB_RX_LOW_MASK	(AT91_MB_RX_MASK(AT91_MB_RX_SPLIT) & \
+				 ~AT91_MB_RX_MASK(AT91_MB_RX_FIRST))
 
 #define AT91_MB_TX_NUM		(1 << AT91_MB_TX_SHIFT)
 #define AT91_MB_TX_FIRST	(AT91_MB_RX_LAST + 1)
@@ -254,7 +255,8 @@ static void at91_setup_mailboxes(struct net_device *dev)
 		set_mb_mode_prio(priv, i, AT91_MB_MODE_TX, 0);
 
 	/* Reset tx and rx helper pointers */
-	priv->tx_next = priv->tx_echo = priv->rx_next = 0;
+	priv->tx_next = priv->tx_echo = 0;
+	priv->rx_next = AT91_MB_RX_FIRST;
 }
 
 static int at91_set_bittiming(struct net_device *dev)
@@ -590,10 +592,10 @@ static int at91_poll_rx(struct net_device *dev, int quota)
 			"order of incoming frames cannot be guaranteed\n");
 
  again:
-	for (mb = find_next_bit(addr, AT91_MB_RX_NUM, priv->rx_next);
-	     mb < AT91_MB_RX_NUM && quota > 0;
+	for (mb = find_next_bit(addr, AT91_MB_RX_LAST + 1, priv->rx_next);
+	     mb < AT91_MB_RX_LAST + 1 && quota > 0;
 	     reg_sr = at91_read(priv, AT91_SR),
-	     mb = find_next_bit(addr, AT91_MB_RX_NUM, ++priv->rx_next)) {
+	     mb = find_next_bit(addr, AT91_MB_RX_LAST + 1, ++priv->rx_next)) {
 		at91_read_msg(dev, mb);
 
 		/* reactivate mailboxes */
@@ -610,8 +612,8 @@ static int at91_poll_rx(struct net_device *dev, int quota)
 
 	/* upper group completed, look again in lower */
 	if (priv->rx_next > AT91_MB_RX_LOW_LAST &&
-	    quota > 0 && mb >= AT91_MB_RX_NUM) {
-		priv->rx_next = 0;
+	    quota > 0 && mb > AT91_MB_RX_LAST) {
+		priv->rx_next = AT91_MB_RX_FIRST;
 		goto again;
 	}
 
-- 
1.7.2.3
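
Why this is a pure cleanup at this point in the series can be seen from a
small sketch of the two expressions that change. The helper names are
hypothetical; the mask macro mirrors AT91_MB_RX_MASK and AT91_MB_RX_SPLIT
from the driver:

```c
#define MB_RX_MASK(i)	((1U << (i)) - 1)	/* bits 0 .. i-1 set */
#define MB_RX_SPLIT	8			/* first mailbox of the upper group */

/* Lower-group mask: bits rx_first .. MB_RX_SPLIT-1. */
static unsigned int rx_low_mask(unsigned int rx_first)
{
	return MB_RX_MASK(MB_RX_SPLIT) & ~MB_RX_MASK(rx_first);
}

/* Upper bound for the RX scan: AT91_MB_RX_LAST + 1. */
static unsigned int rx_scan_limit(unsigned int rx_first, unsigned int rx_num)
{
	unsigned int rx_last = rx_first + rx_num - 1;

	return rx_last + 1;
}
```

With AT91_MB_RX_FIRST == 0 and AT91_MB_RX_NUM == 12 the mask is 0xff and
the scan limit is 12, exactly as before. Once the first RX mailbox moves
to 1 and the count drops to 11, the mask becomes 0xfe while the scan limit
stays at 12, which the old bound of AT91_MB_RX_NUM would no longer give.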

^ permalink raw reply related

* [PATCH 1/1] IPVS netns BUG, register sysctl for root ns
From: Hans Schillstrom @ 2011-01-24 14:14 UTC (permalink / raw)
  To: horms, ja, wensong, lvs-devel, netdev, netfilter-devel
  Cc: hans, Hans Schillstrom

The newly created table was not used when registering sysctl for a new
namespace, i.e. sysctl did not work for any namespace other than the root
namespace (init_net).

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 68b8033..98df59a 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3556,7 +3556,7 @@ int __net_init __ip_vs_control_init(struct net *net)
 
 
 	ipvs->sysctl_hdr = register_net_sysctl_table(net, net_vs_ctl_path,
-						  vs_vars);
+						     tbl);
 	if (ipvs->sysctl_hdr == NULL)
 		goto err_reg;
 	ip_vs_new_estimator(net, ipvs->tot_stats);
-- 
1.7.2.3
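
The effect of the one-line fix can be illustrated with a userspace model.
Everything below (struct ctl_entry, struct ns_model, the single knob) is
hypothetical and only mimics the pattern of copying a template table for a
non-root namespace and then registering the copy instead of the template:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one sysctl table entry. */
struct ctl_entry {
	const char *name;
	int *data;		/* where reads and writes of this knob land */
};

static int root_value;		/* storage used by the root namespace */

/* Template table, the analogue of vs_vars. */
static struct ctl_entry template_tbl[] = {
	{ "example_knob", &root_value },
	{ NULL, NULL },
};

/* Hypothetical per-namespace state. */
struct ns_model {
	int value;			/* per-namespace storage */
	struct ctl_entry *registered;	/* table handed to "registration" */
};

static int ns_model_init(struct ns_model *ns, int is_root)
{
	struct ctl_entry *tbl = template_tbl;

	if (!is_root) {
		/* Private copy whose data pointer targets this namespace. */
		tbl = malloc(sizeof(template_tbl));
		if (!tbl)
			return -1;
		memcpy(tbl, template_tbl, sizeof(template_tbl));
		tbl[0].data = &ns->value;
	}

	/* The fix: register tbl. Registering template_tbl here would make
	 * every namespace read and write the root namespace's storage. */
	ns->registered = tbl;
	return 0;
}
```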


^ permalink raw reply related

* [PATCH] net: add sysfs entry for device group
From: Vlad Dogaru @ 2011-01-24 13:37 UTC (permalink / raw)
  To: netdev; +Cc: Vlad Dogaru, Stephen Hemminger

The group of a network device can be queried or changed from userspace
using sysfs.

For example, considering sysfs mounted in /sys, one can change the group
that interface lo belongs to:
	echo 1 > /sys/class/net/lo/group

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
---
 net/core/net-sysfs.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index e23c01b..b726131 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -295,6 +295,20 @@ static ssize_t show_ifalias(struct device *dev,
 	return ret;
 }
 
+NETDEVICE_SHOW(group, fmt_dec);
+
+static int change_group(struct net_device *net, unsigned long new_group)
+{
+	dev_set_group(net, (int) new_group);
+	return 0;
+}
+
+static ssize_t store_group(struct device *dev, struct device_attribute *attr,
+			 const char *buf, size_t len)
+{
+	return netdev_store(dev, attr, buf, len, change_group);
+}
+
 static struct device_attribute net_class_attributes[] = {
 	__ATTR(addr_assign_type, S_IRUGO, show_addr_assign_type, NULL),
 	__ATTR(addr_len, S_IRUGO, show_addr_len, NULL),
@@ -316,6 +330,7 @@ static struct device_attribute net_class_attributes[] = {
 	__ATTR(flags, S_IRUGO | S_IWUSR, show_flags, store_flags),
 	__ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len,
 	       store_tx_queue_len),
+	__ATTR(group, S_IRUGO | S_IWUSR, show_group, store_group),
 	{}
 };
 
-- 
1.7.1
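
For completeness, the echo example from the commit message could also be
done from a C program. This is a sketch only: both helper names are
hypothetical, and the attribute path simply follows the
/sys/class/net/<iface>/group layout shown above. The write will fail
without sufficient privileges, since the attribute is created with
S_IWUSR:

```c
#include <stdio.h>

/* Build "/sys/class/net/<ifname>/group"; returns -1 if buf is too small. */
static int group_attr_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/group", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the group number in decimal to the attribute at path. */
static int write_group(const char *path, unsigned long group)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", group) > 0;
	if (fclose(f) != 0)	/* sysfs write errors can surface at close */
		ok = 0;
	return ok ? 0 : -1;
}
```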


^ permalink raw reply related

* Re: [net-2.6 PATCH 1/2] net: dcbnl: remove redundant DCB_CAP_DCBX_STATIC bit
From: Shmulik Ravid @ 2011-01-24 15:27 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <4D3D123F.40700@intel.com>


On Sun, 2011-01-23 at 21:46 -0800, John Fastabend wrote:
> On 1/23/2011 8:53 AM, Shmulik Ravid wrote:
> > 
> > On Fri, 2011-01-21 at 18:52 -0800, John Fastabend wrote:
> >> On 1/21/2011 6:35 PM, John Fastabend wrote:
> >>> Remove redundant DCB_CAP_DCBX_STATIC bit in DCB capabilities
> >>>
> >>> Setting this bit indicates that no embedded DCBx engine is
> >>> present and the hardware can not be configured. This is the
> >>> same as having none of the DCB capability flags set or simply
> >>> not implementing the dcbnl ops at all.
> >>>
> >>> This patch removes this bit. The bit has not made a stable
> >>> release yet so removing it should not be an issue with
> >>> existing apps.
> >>>
> >>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >>> CC: Shmulik Ravid <shmulikr@broadcom.com>
> >>> ---
> >>>
> >>
> >> Shmulik, could you ACK this because you added these bits? But
> >> I was adding support for this in lldpad and I see no reason that
> >> we need these?
> >>
> > DCB_CAP_DCBX_STATIC means that the embedded engine will turn the user
> > configuration into the operational configuration without performing the
> > actual negotiation, so it is not equivalent to not having an embedded
> > DCBx engine. This is mostly a debug and integration option as it allows
> > you to do DCB related or dependent testing and development without
> > having a proper DCBx peer.
> > 
> > On second thought, I'm not sure this option is justified although we
> > found it useful during our development. If you think it's not useful
> > enough (or not at all) then by all means remove it.
> 
> We have an advertise bit in userspace that can be set and cleared to
> do something similar for host based agents. I think for pg and application
> data you can get the same behavior by setting the device to not willing.
> 
True, but this requires a proper DCBx peer. The STATIC option is a bit
stronger.

> However for PFC it could potentially be useful. But how would the
> user set this mode? This is a capabilities bit indicating the device
> supports this. Is there a way to subsequently put the device in this
> mode?
You can set this mode by specifying this attribute in the set_dcbx
operation. The input to set_dcbx should be a subset of the advertised
dcbx attributes.

Shmulik 
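
The subset rule for set_dcbx described above can be sketched as a simple
bit test. The EX_ constants are illustrative values chosen for this
example, not necessarily the values in include/linux/dcbnl.h, and
dcbx_mode_allowed() is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative capability bits; see include/linux/dcbnl.h for the real ones. */
#define EX_DCBX_HOST		0x01
#define EX_DCBX_LLD_MANAGED	0x02
#define EX_DCBX_VER_CEE		0x04
#define EX_DCBX_VER_IEEE	0x08
#define EX_DCBX_STATIC		0x10

/* A requested mode is acceptable only if it is a subset of what the
 * device advertised in its capabilities. */
static bool dcbx_mode_allowed(uint8_t advertised, uint8_t requested)
{
	return (requested & (uint8_t)~advertised) == 0;
}
```

A device advertising the CEE and STATIC bits would accept a request for
STATIC alone, while a device advertising only CEE would reject it.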




^ permalink raw reply

