Netdev List
* Re: [PATCH update 2] firewire: net: throttle TX queue before running out of tlabels
From: Stefan Richter @ 2010-11-14  9:25 UTC
  To: Maxim Levitsky; +Cc: linux1394-devel, linux-kernel, netdev
In-Reply-To: <1289710228.8581.16.camel@maxim-laptop>

Maxim Levitsky wrote:
> In fact, after a lot of testing I see that the original patch,
> '[PATCH 4/4] firewire: net: throttle TX queue before running out of
> tlabels', works best here.
> With the AR fixes, I don't see even a single fwnet_write_complete error on
> either side.

Well, that version missed that the rx path opened up the tx queue again. I.e.
it did not work as intended.

> However the 'update 2' (maybe update 1 too, didn't test), lowers
> desktop->laptop throughput somewhat.
> (250 vs 227 Mbits/s). I tested this many times.
> 
> Actual raw throughput possible with a UDP stream and either no throttling
> or a higher packets-in-flight count (I tested 50/30) is 280 Mbits/s.

Good, I will test deeper queues with a few different controllers here.  As
long as we keep a margin to 64 so that other traffic besides IPover1394 still
has a chance to acquire transaction labels, it's OK.
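
For illustration, the stop/wake throttling discussed here can be modeled in
plain C.  This is only a user-space sketch under stated assumptions, not the
firewire-net code: struct txq, txq_submit(), txq_complete() and both constants
are hypothetical names, and the "stopped" flag stands in for what
netif_stop_queue()/netif_wake_queue() would do in a driver.

```c
#include <assert.h>
#include <stdbool.h>

#define TLABEL_POOL        64  /* transaction labels available (from the thread) */
#define TXQ_HIGH_WATERMARK  8  /* assumed queue depth, the value chosen in the patch */

/* Keep a margin so that other traffic can still acquire tlabels. */
_Static_assert(TXQ_HIGH_WATERMARK < TLABEL_POOL, "watermark must leave a margin");

struct txq {
	int in_flight;  /* datagrams submitted but not yet completed */
	bool stopped;   /* stands in for netif_queue_stopped() */
};

/* Returns false if the caller must not transmit right now. */
static bool txq_submit(struct txq *q)
{
	if (q->stopped)
		return false;
	q->in_flight++;
	if (q->in_flight >= TXQ_HIGH_WATERMARK)
		q->stopped = true;   /* netif_stop_queue() in a driver */
	return true;
}

/* Called from the write-completion path only, never from the rx path. */
static void txq_complete(struct txq *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
	if (q->stopped && q->in_flight < TXQ_HIGH_WATERMARK)
		q->stopped = false;  /* netif_wake_queue() in a driver */
}
```

A deeper queue would only mean raising TXQ_HIGH_WATERMARK in this model; the
static assertion keeps it below the size of the tlabel pool.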

> BTW, I still don't fully understand why my laptop sends only at 180
> Mbits/s pretty much always regardless of patches or TCP/UDP.

If it is not CPU bound, then it is because Ricoh did not optimize the AR DMA
unit as well as Texas Instruments did.
-- 
Stefan Richter
-=====-==-=- =-== -===-
http://arcgraph.de/sr/


* Re: Fwd: a Great Idea - include Kademlia networking protocol in kernel -- REVISITED
From: Marcos @ 2010-11-14  9:14 UTC
  To: Eric Dumazet; +Cc: netdev, Stephen Guerin
In-Reply-To: <1289724643.2743.58.camel@edumazet-laptop>

> I have no idea why and how kademlia would be added to "linux kernel"
>
> It's a protocol based on UDP, and probably already done in userland.
>
> What am I missing ?

The idea is to couple it tightly to the operating system to create a
sort of "super operating system" that is seamless to the application
layers above, just as memory stores are integrated so tightly as to be
unnoticeable...

marcos


* [PATCH] netfilter: don't use atomic bit operation
From: Changli Gao @ 2010-11-14  9:05 UTC
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev, Changli Gao

As we own ct, and others can't see it until we confirm it, we don't
need to use atomic bit operations on ct->status.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 include/net/netfilter/nf_nat_core.h |    4 ++--
 net/ipv4/netfilter/nf_nat_core.c    |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/net/netfilter/nf_nat_core.h b/include/net/netfilter/nf_nat_core.h
index 33602ab..52ac1d8 100644
--- a/include/net/netfilter/nf_nat_core.h
+++ b/include/net/netfilter/nf_nat_core.h
@@ -21,9 +21,9 @@ static inline int nf_nat_initialized(struct nf_conn *ct,
 				     enum nf_nat_manip_type manip)
 {
 	if (manip == IP_NAT_MANIP_SRC)
-		return test_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
+		return IPS_SRC_NAT_DONE & ct->status;
 	else
-		return test_bit(IPS_DST_NAT_DONE_BIT, &ct->status);
+		return IPS_DST_NAT_DONE & ct->status;
 }
 
 struct nlattr;
diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
index c04787c..ab877ac 100644
--- a/net/ipv4/netfilter/nf_nat_core.c
+++ b/net/ipv4/netfilter/nf_nat_core.c
@@ -323,9 +323,9 @@ nf_nat_setup_info(struct nf_conn *ct,
 
 	/* It's done. */
 	if (maniptype == IP_NAT_MANIP_DST)
-		set_bit(IPS_DST_NAT_DONE_BIT, &ct->status);
+		ct->status |= IPS_DST_NAT_DONE;
 	else
-		set_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
+		ct->status |= IPS_SRC_NAT_DONE;
 
 	return NF_ACCEPT;
 }
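
When moving from set_bit()/test_bit() to plain C operators, note that the
bit helpers take a bit number while | and & need a mask.  The following is a
minimal user-space sketch of that difference; the demo_* names and the enum
values are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for a bit-number / mask pair as found in
 * status enums: the mask is derived from the bit number. */
enum demo_status {
	DEMO_SRC_NAT_DONE_BIT = 7,
	DEMO_SRC_NAT_DONE = (1 << DEMO_SRC_NAT_DONE_BIT),
	DEMO_DST_NAT_DONE_BIT = 8,
	DEMO_DST_NAT_DONE = (1 << DEMO_DST_NAT_DONE_BIT),
};

/* Non-atomic set: a plain read-modify-write using the mask. */
static void demo_set(unsigned long *status, unsigned long mask)
{
	assert(mask && !(mask & (mask - 1)));  /* exactly one bit expected */
	*status |= mask;
}

/* Non-atomic test: AND with the mask, normalized to 0 or 1. */
static int demo_test(unsigned long status, unsigned long mask)
{
	return (status & mask) != 0;
}
```

With these definitions, ANDing the status word with DEMO_DST_NAT_DONE_BIT
(the number 8) would test bit 3 rather than bit 8, which is why the mask form
is the one to use with the plain operators.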


* Re: [PATCH/RFC] netfilter: nf_conntrack_sip: Handle quirky Cisco phones
From: Eric Dumazet @ 2010-11-14  8:59 UTC
  To: Kevin Cernekee
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	netfilter-devel, netfilter, coreteam, linux-kernel, netdev
In-Reply-To: <28d666269c390965f1a4edca42f93c12@localhost>

On Sunday, 14 November 2010 at 00:32 -0800, Kevin Cernekee wrote:
> Most SIP devices use a source port of 5060/udp on SIP requests, so the
> response automatically comes back to port 5060:
> 
> phone_ip:5060 -> proxy_ip:5060   REGISTER
> proxy_ip:5060 -> phone_ip:5060   100 Trying
> 
> The newer Cisco IP phones, however, use a randomly chosen high source
> port for the SIP request but expect the response on port 5060:
> 
> phone_ip:49173 -> proxy_ip:5060  REGISTER
> proxy_ip:5060 -> phone_ip:5060   100 Trying
> 
> Standard Linux NAT, with or without nf_nat_sip, will send the reply back
> to port 49173, not 5060:
> 
> phone_ip:49173 -> proxy_ip:5060  REGISTER
> proxy_ip:5060 -> phone_ip:49173  100 Trying
> 
> But the phone is not listening on 49173, so it will never see the reply.
> 
> This issue was seen on a Cisco CP-7965G, firmware 8-5(3).  It appears
> to be a well-known problem on 7941 and newer:
> 
> http://www.voip-info.org/wiki/view/Standalone+Cisco+7941%252F7961+without+a+local+PBX
> 
> Search for "Connecting to the outside world"
> 
> I contacted Cisco support and they were not amenable to changing the
> behavior.  It appears to be RFC3261-compliant, as the "Sent-by port"
> field in the request specifies 5060:
> 

There is a difference between being RFC compliant and being usable.

Most SIP software I know will break with such stupid Cisco behavior.



> 18.2.2 Sending Responses
> 
>    The server transport uses the value of the top Via header field in
>    order to determine where to send a response.  It MUST follow the
>    following process:
> 
> ...
> 
>       o  Otherwise (for unreliable unicast transports), if the top Via
>          has a "received" parameter, the response MUST be sent to the
>          address in the "received" parameter, using the port indicated
>          in the "sent-by" value, or using port 5060 if none is specified
>          explicitly.  If this fails, for example, elicits an ICMP "port
>          unreachable" response, the procedures of Section 5 of [4]
>          SHOULD be used to determine where to send the response.
> 
> This patch modifies nf_*_sip to work around this quirk, by rewriting
> the response port to 5060 when the following conditions are met:
> 
>  - User-Agent starts with "Cisco"
> 
>  - Incoming TTL was exactly 64 (meaning that our system is the phone's
>    local router, not an intermediate router)
> 

This seems like a hack to me, sorry. How many different vendors will switch
to the broken "Cisco" way, so that we have to patch over and over?

I would like to see an exact SIP exchange to make sure there is not
another way to handle this without adding a "Cisco" string somewhere...

Please provide a pcap or tcpdump -A output.

Thanks




* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
From: Eric Dumazet @ 2010-11-14  8:52 UTC
  To: Zhang Le
  Cc: netdev, linux-kernel, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy
In-Reply-To: <1289720156-30118-1-git-send-email-r0bertz@gentoo.org>

On Sunday, 14 November 2010 at 15:35 +0800, Zhang Le wrote:
> Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> req->ts_recent. In this case, theoretically the req should not be ignored.
> 
> But in fact, it could be ignored, if peer->tcp_ts is so small that the
> difference between these two numbers is larger than 2 to the power of 31.
> 
> I understand that in this situation, the timestamp does not make sense any more,
> because it actually comes from different machines. However, if anyone
> ever needs to do the same investigation that I have done, this will
> save them some time.
> 
> Signed-off-by: Zhang Le <r0bertz@gentoo.org>
> ---
>  net/ipv4/tcp_ipv4.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 8f8527d..1eb4974 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>  		    peer->v4daddr == saddr) {
>  			inet_peer_refcheck(peer);
>  			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> -			    (s32)(peer->tcp_ts - req->ts_recent) >
> -							TCP_PAWS_WINDOW) {
> +			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> +			     peer->tcp_ts > req->ts_recent)) {
>  				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
>  				goto drop_and_release;
>  			}

This seems very wrong to me.

Adding an 'if (peer->tcp_ts > req->ts_recent)' condition is _not_ going
to help. And it might break some working setups, because of wraparound.

Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT doesn't usually change TCP timestamps.
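
To make the wraparound concern concrete, here is a small user-space sketch.
The names ts_ahead(), ts_ahead_strict() and DEMO_PAWS_WINDOW are hypothetical,
and the window value is an assumption standing in for TCP_PAWS_WINDOW; only
the shape of the two comparisons is taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAWS_WINDOW 1  /* assumed value, stands in for TCP_PAWS_WINDOW */

/* Serial-number style check: true if 'cached' is ahead of 'seen' by more
 * than the window, even when the 32-bit counter has wrapped in between. */
static bool ts_ahead(uint32_t cached, uint32_t seen)
{
	return (int32_t)(cached - seen) > DEMO_PAWS_WINDOW;
}

/* The same check with the extra plain comparison added. */
static bool ts_ahead_strict(uint32_t cached, uint32_t seen)
{
	return ts_ahead(cached, seen) && cached > seen;
}
```

With seen = 0xFFFFFFF0 and cached = 0x10 (the counter wrapped in between),
ts_ahead() still reports that the cached value is newer, while
ts_ahead_strict() does not, so the extra condition changes behavior for a
single well-behaved peer whose timestamp wraps.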

What about the following patch instead?

[PATCH] doc: extend tcp_tw_recycle documentation

tcp_tw_recycle should not be used on a server if there is a chance that
clients are behind the same NAT. Document this fact before too many users
discover it too late.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 Documentation/networking/ip-sysctl.txt |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7165f4..406f0d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
 tcp_tw_recycle - BOOLEAN
 	Enable fast recycling TIME-WAIT sockets. Default value is 0.
 	It should not be changed without advice/request of technical
-	experts.
+	experts. If you set it to 1, make sure you don't miss connection
+	attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
+	In particular, this might break if several clients are behind
+	a common NAT device, since their TCP timestamps won't be changed
+	by the NAT. tcp_tw_recycle should be used with care, most
+	probably in private networks.
 
 tcp_tw_reuse - BOOLEAN
 	Allow to reuse TIME-WAIT sockets for new connections when it is




* Re: Fwd: a Great Idea - include Kademlia networking protocol in kernel -- REVISITED
From: Eric Dumazet @ 2010-11-14  8:50 UTC
  To: Marcos; +Cc: netdev, Stephen Guerin
In-Reply-To: <AANLkTinWUmQ91cCULC8ZXFLwSKz6SNt3BpszrBEhbgcu@mail.gmail.com>

On Sunday, 14 November 2010 at 00:21 -0700, Marcos wrote:
> [Fwd from [linux-kernel], thought I'd follow the suggestion to post
> this to netdev:]
> 
> After seeing some attention this idea generated in the linux press,
> I'd like to re-visit this suggestion.  I'm a nobody on this list, but
> do have some expertise in complex systems (i.e. complexity theory).
> 
> The Kademlia protocol is simple: it has four commands (and won't
> likely grow more): PING, STORE, FIND_NODE, FIND_VALUE.
> It is computationally effortless: it generates random node IDs and
> computes distance on a distributed hash table using a simple XOR
> function.
> It is (probably optimally) efficient:  O(log(n)) for n nodes.
> Ultimately, it could increase security: by creating a system for
> tracking trusted peers, a new topology of content-sharing can be
> generated.
> 
> [From the (kademlia) wikipedia article]: "The first generation peer-to-peer file
> sharing networks, such as Napster, relied on a central database to
> co-ordinate look ups on the network. Second generation peer-to-peer
> networks, such as Gnutella, used flooding to locate files, searching
> every node on the network. Third generation peer-to-peer networks use
> Distributed Hash Tables to look up files in the network. Distributed
> hash tables store resource locations throughout the network. A major
> criterion for these protocols is locating the desired nodes quickly."
> 
> Putting a simple, but robust p2p network layer in the kernel offers
> several novel and very interesting possibilities.
> 
> 1. Cutting-edge cool factor:  It would put linux way ahead of the
> net's general evolution to a full-fledged "Internet Operating
> System".  The world needs an open source solution over Google's,
> Microsoft's (or any other's) attempt to create such a solution.
> Dismiss any attempts to see such a request as warez-d00ds looking to
> make a more efficient pirating network.
> 
> 2. Lower maintenance:  Through unification, it would simplify the many
> (currently disparate) linux solutions for large-scale aggregation of
> computational and storage resources that are distributed across many
> machines.  Additionally, NFS (the networking protocol that *IS* in the
> kernel) is stale, has high administrative and operational overhead,
> and is not made to scale to millions of shared nodes in a graph
> topology.
> 
> 3. Excite a new wave of Linux development:  90% of linux machines are
> on the net, but don't utilize the real value of peer connectivity
> (which can grow profoundly faster than Metcalfe's N^2 "value of the
> network" law).  Putting p2p in kernel space communicates to every
> developer that linux is serious about creating a unified and complete
> solution for creating such an infrastructure.  Let the cloud
> applications and such be in user space, but keep the main
> connection-tracking in the kernel.   Such a move would make for many
> (unforeseeable) complex emergent behaviors and optimizations to arise
> -- see Wikipedia on Reed's Law for a sense of it (to wit: "even if the
> utility of groups available to be joined is very small on a peer-group
> basis, eventually the network effect of potential group membership ...
> dominate[s] the overall economics of the system").
> 
> Consider, for example, social networking: it is an inherently p2p
> structure and is lying in wait to explode the next wave of internet
> evolution and new-value generation.  There's no doubt that this is the
> trend of the future -- best that open source be there first.  Users
> are creating value on their machines *every day*, but there's little
> infrastructure to take advantage of it.  Currently, it's either lost
> or exploited.  Solution and vision trajectories:  Diaspora comes to
> mind, mash-up applications like Photosynth aggregating the millions of
> photos on people's computers (see the TED.com presentation), open
> currencies and meritocratic market systems using such a "meta-linux"
> as a backbone, etc. -- whole new governance models for sharing content
> would undoubtedly arise.  HTTP/HTML is too much of an all-or-nothing
> and coarse approach to organizing the world's content.  The net needs
> a backbone for sharing personal content and grouping it to create new
> abstractions and wealth.  See pangaia.sourceforge.net for some of the
> ideas I've personally been developing.
> 
> Anyway, I'm with hp_fk on this one.  Ignore at the peril and risk of
> the future...  :)
> 

I have no idea why and how kademlia would be added to "linux kernel"

It's a protocol based on UDP, and probably already done in userland.

What am I missing ?





* [PATCH/RFC] netfilter: nf_conntrack_sip: Handle quirky Cisco phones
From: Kevin Cernekee @ 2010-11-14  8:32 UTC
  To: Patrick McHardy, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris <jmorris@
  Cc: netfilter-devel, netfilter, coreteam, linux-kernel, netdev

Most SIP devices use a source port of 5060/udp on SIP requests, so the
response automatically comes back to port 5060:

phone_ip:5060 -> proxy_ip:5060   REGISTER
proxy_ip:5060 -> phone_ip:5060   100 Trying

The newer Cisco IP phones, however, use a randomly chosen high source
port for the SIP request but expect the response on port 5060:

phone_ip:49173 -> proxy_ip:5060  REGISTER
proxy_ip:5060 -> phone_ip:5060   100 Trying

Standard Linux NAT, with or without nf_nat_sip, will send the reply back
to port 49173, not 5060:

phone_ip:49173 -> proxy_ip:5060  REGISTER
proxy_ip:5060 -> phone_ip:49173  100 Trying

But the phone is not listening on 49173, so it will never see the reply.

This issue was seen on a Cisco CP-7965G, firmware 8-5(3).  It appears
to be a well-known problem on 7941 and newer:

http://www.voip-info.org/wiki/view/Standalone+Cisco+7941%252F7961+without+a+local+PBX

Search for "Connecting to the outside world"

I contacted Cisco support and they were not amenable to changing the
behavior.  It appears to be RFC3261-compliant, as the "Sent-by port"
field in the request specifies 5060:

18.2.2 Sending Responses

   The server transport uses the value of the top Via header field in
   order to determine where to send a response.  It MUST follow the
   following process:

...

      o  Otherwise (for unreliable unicast transports), if the top Via
         has a "received" parameter, the response MUST be sent to the
         address in the "received" parameter, using the port indicated
         in the "sent-by" value, or using port 5060 if none is specified
         explicitly.  If this fails, for example, elicits an ICMP "port
         unreachable" response, the procedures of Section 5 of [4]
         SHOULD be used to determine where to send the response.

This patch modifies nf_*_sip to work around this quirk, by rewriting
the response port to 5060 when the following conditions are met:

 - User-Agent starts with "Cisco"

 - Incoming TTL was exactly 64 (meaning that our system is the phone's
   local router, not an intermediate router)

Tested on Linus' latest 2.6.37-rc tree.

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
---
 include/linux/netfilter/nf_conntrack_sip.h |    2 ++
 net/ipv4/netfilter/nf_nat_sip.c            |   12 ++++++++++++
 net/netfilter/nf_conntrack_sip.c           |   25 +++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index 0ce91d5..a6ea620 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -8,6 +8,7 @@
 struct nf_ct_sip_master {
 	unsigned int	register_cseq;
 	unsigned int	invite_cseq;
+	unsigned int	cisco_port_mangle;
 };
 
 enum sip_expectation_classes {
@@ -90,6 +91,7 @@ enum sip_header_types {
 	SIP_HDR_EXPIRES,
 	SIP_HDR_CONTENT_LENGTH,
 	SIP_HDR_CALL_ID,
+	SIP_HDR_USER_AGENT,
 };
 
 enum sdp_header_types {
diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/ipv4/netfilter/nf_nat_sip.c
index e40cf78..4b9a46d 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/ipv4/netfilter/nf_nat_sip.c
@@ -121,6 +121,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_conn_help *help = nfct_help(ct);
 	unsigned int coff, matchoff, matchlen;
 	enum sip_header_types hdr;
 	union nf_inet_addr addr;
@@ -225,6 +226,17 @@ next:
 			return NF_DROP;
 	}
 
+	/* Mangle destination port for Cisco phones, then fix up checksums */
+	if (help->help.ct_sip_info.cisco_port_mangle) {
+		struct udphdr *uh;
+
+		uh = (struct udphdr *)(skb->data + ip_hdrlen(skb));
+		uh->dest = htons(SIP_PORT);
+
+		if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, 0, 0, NULL, 0))
+			return NF_DROP;
+	}
+
 	if (!map_sip_addr(skb, dataoff, dptr, datalen, SIP_HDR_FROM) ||
 	    !map_sip_addr(skb, dataoff, dptr, datalen, SIP_HDR_TO))
 		return NF_DROP;
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index bcf47eb..6042f66 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -18,6 +18,7 @@
 #include <linux/udp.h>
 #include <linux/tcp.h>
 #include <linux/netfilter.h>
+#include <linux/ip.h>
 
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_core.h>
@@ -338,6 +339,7 @@ static const struct sip_header ct_sip_hdrs[] = {
 	[SIP_HDR_EXPIRES]		= SIP_HDR("Expires", NULL, NULL, digits_len),
 	[SIP_HDR_CONTENT_LENGTH]	= SIP_HDR("Content-Length", "l", NULL, digits_len),
 	[SIP_HDR_CALL_ID]		= SIP_HDR("Call-Id", "i", NULL, callid_len),
+	[SIP_HDR_USER_AGENT]		= SIP_HDR("User-Agent", NULL, NULL, string_len),
 };
 
 static const char *sip_follow_continuation(const char *dptr, const char *limit)
@@ -1366,6 +1368,29 @@ static int process_sip_request(struct sk_buff *skb, unsigned int dataoff,
 	unsigned int matchoff, matchlen;
 	unsigned int cseq, i;
 
+	/* Many Cisco IP phones use a high source port for SIP requests, but
+	 * listen for the response on port 5060.  If we are the local
+	 * router for one of these phones, flag the connection here so that
+	 * responses will be redirected to the correct port.
+	 */
+	do {
+		static const char cisco[] = "Cisco";
+		struct iphdr *iph = ip_hdr(skb);
+		struct nf_conn_help *help = nfct_help(ct);
+
+		if (iph->ttl != 64)
+			break;
+		if (ct_sip_get_header(ct, *dptr, 0, *datalen,
+				SIP_HDR_USER_AGENT, &matchoff, &matchlen) <= 0)
+			break;
+		if (matchlen < strlen(cisco))
+			break;
+		if (strnicmp(*dptr + matchoff, cisco, strlen(cisco)) != 0)
+			break;
+
+		help->help.ct_sip_info.cisco_port_mangle = 1;
+	} while (0);
+
 	for (i = 0; i < ARRAY_SIZE(sip_handlers); i++) {
 		const struct sip_handler *handler;
 
-- 
1.7.0.4


* Re: [PATCH v2] drivers/net/tile/: on-chip network drivers for the tile architecture
From: Sam Ravnborg @ 2010-11-14  7:52 UTC
  To: Chris Metcalf; +Cc: linux-kernel, netdev, Stephen Hemminger, Eric Dumazet
In-Reply-To: <201011140335.oAE3Zuqg032279@farm-0027.internal.tilera.com>

> diff --git a/drivers/net/tile/Makefile b/drivers/net/tile/Makefile
> new file mode 100644
> index 0000000..f634f14
> --- /dev/null
> +++ b/drivers/net/tile/Makefile
> @@ -0,0 +1,10 @@
> +#
> +# Makefile for the TILE on-chip networking support.
> +#
> +
> +obj-$(CONFIG_TILE_NET) += tile_net.o
> +ifdef CONFIG_TILEGX
> +tile_net-objs := tilegx.o mpipe.o iorpc_mpipe.o dma_queue.o
> +else
> +tile_net-objs := tilepro.o
> +endif

The -objs syntax is deprecated these days.
Preferred syntax for the above is:

ifdef CONFIG_TILEGX
tile_net-y := tilegx.o mpipe.o iorpc_mpipe.o dma_queue.o
else
tile_net-y := tilepro.o
endif

We could make this even shorter like this (assuming TILEGX is a bool):
tile_net-y := tilepro.o
tile_net-$(CONFIG_TILEGX) := tilegx.o mpipe.o iorpc_mpipe.o dma_queue.o

But this is less readable - so the longer version is better.


	Sam


* [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
From: Zhang Le @ 2010-11-14  7:35 UTC
  To: netdev, linux-kernel
  Cc: Zhang Le, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
req->ts_recent. In this case, theoretically the req should not be ignored.

But in fact, it could be ignored, if peer->tcp_ts is so small that the
difference between these two numbers is larger than 2 to the power of 31.

I understand that in this situation, the timestamp does not make sense any more,
because it actually comes from different machines. However, if anyone
ever needs to do the same investigation that I have done, this will
save them some time.

Signed-off-by: Zhang Le <r0bertz@gentoo.org>
---
 net/ipv4/tcp_ipv4.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f8527d..1eb4974 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		    peer->v4daddr == saddr) {
 			inet_peer_refcheck(peer);
 			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
-			    (s32)(peer->tcp_ts - req->ts_recent) >
-							TCP_PAWS_WINDOW) {
+			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
+			     peer->tcp_ts > req->ts_recent)) {
 				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
 				goto drop_and_release;
 			}
-- 
1.7.3.2


* Fwd: a Great Idea - include Kademlia networking protocol in kernel -- REVISITED
From: Marcos @ 2010-11-14  7:21 UTC
  To: netdev; +Cc: Stephen Guerin
In-Reply-To: <AANLkTi=VgBQ5rwkjZeo2zsRx1dSBy+4PBXeLLPQ4LRES@mail.gmail.com>

[Fwd from [linux-kernel], thought I'd follow the suggestion to post
this to netdev:]

After seeing some attention this idea generated in the linux press,
I'd like to re-visit this suggestion.  I'm a nobody on this list, but
do have some expertise in complex systems (i.e. complexity theory).

The Kademlia protocol is simple: it has four commands (and won't
likely grow more): PING, STORE, FIND_NODE, FIND_VALUE.
It is computationally effortless: it generates random node IDs and
computes distance on a distributed hash table using a simple XOR
function.
It is (probably optimally) efficient:  O(log(n)) for n nodes.
Ultimately, it could increase security: by creating a system for
tracking trusted peers, a new topology of content-sharing can be
generated.
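
As a rough sketch of the XOR distance mentioned above (a toy under stated
assumptions, not a protocol implementation): node IDs are shortened to 64 bits
here, whereas Kademlia as published uses 160-bit IDs, and the names node_id,
xor_distance() and bucket_index() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 64-bit node IDs; the published protocol uses 160-bit IDs. */
typedef uint64_t node_id;

/* Kademlia's metric: the distance between two IDs is their bitwise XOR,
 * read as an unsigned integer. */
static uint64_t xor_distance(node_id a, node_id b)
{
	return a ^ b;
}

/* Index of the routing bucket for a peer: the position of the highest
 * differing bit, or -1 when the two IDs are identical. */
static int bucket_index(node_id self, node_id peer)
{
	uint64_t d = xor_distance(self, peer);
	int idx = -1;

	while (d) {
		d >>= 1;
		idx++;
	}
	return idx;
}
```

The metric is symmetric, and each lookup step that moves to a closer bucket
at least halves the remaining distance, which is where the O(log(n)) figure
comes from.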

[From the (kademlia) wikipedia article]: "The first generation peer-to-peer file
sharing networks, such as Napster, relied on a central database to
co-ordinate look ups on the network. Second generation peer-to-peer
networks, such as Gnutella, used flooding to locate files, searching
every node on the network. Third generation peer-to-peer networks use
Distributed Hash Tables to look up files in the network. Distributed
hash tables store resource locations throughout the network. A major
criterion for these protocols is locating the desired nodes quickly."

Putting a simple, but robust p2p network layer in the kernel offers
several novel and very interesting possibilities.

1. Cutting-edge cool factor:  It would put linux way ahead of the
net's general evolution to a full-fledged "Internet Operating
System".  The world needs an open source solution over Google's,
Microsoft's (or any other's) attempt to create such a solution.
Dismiss any attempts to see such a request as warez-d00ds looking to
make a more efficient pirating network.

2. Lower maintenance:  Through unification, it would simplify the many
(currently disparate) linux solutions for large-scale aggregation of
computational and storage resources that are distributed across many
machines.  Additionally, NFS (the networking protocol that *IS* in the
kernel) is stale, has high administrative and operational overhead,
and is not made to scale to millions of shared nodes in a graph
topology.

3. Excite a new wave of Linux development:  90% of linux machines are
on the net, but don't utilize the real value of peer connectivity
(which can grow profoundly faster than Metcalfe's N^2 "value of the
network" law).  Putting p2p in kernel space communicates to every
developer that linux is serious about creating a unified and complete
solution for creating such an infrastructure.  Let the cloud
applications and such be in user space, but keep the main
connection-tracking in the kernel.   Such a move would make for many
(unforeseeable) complex emergent behaviors and optimizations to arise
-- see Wikipedia on Reed's Law for a sense of it (to wit: "even if the
utility of groups available to be joined is very small on a peer-group
basis, eventually the network effect of potential group membership ...
dominate[s] the overall economics of the system").

Consider, for example, social networking: it is an inherently p2p
structure and is lying in wait to explode the next wave of internet
evolution and new-value generation.  There's no doubt that this is the
trend of the future -- best that open source be there first.  Users
are creating value on their machines *every day*, but there's little
infrastructure to take advantage of it.  Currently, it's either lost
or exploited.  Solution and vision trajectories:  Diaspora comes to
mind, mash-up applications like Photosynth aggregating the millions of
photos on people's computers (see the TED.com presentation), open
currencies and meritocratic market systems using such a "meta-linux"
as a backbone, etc. -- whole new governance models for sharing content
would undoubtedly arise.  HTTP/HTML is too much of an all-or-nothing
and coarse approach to organizing the world's content.  The net needs
a backbone for sharing personal content and grouping it to create new
abstractions and wealth.  See pangaia.sourceforge.net for some of the
ideas I've personally been developing.

Anyway, I'm with hp_fk on this one.  Ignore at the peril and risk of
the future...  :)

marcos


* [PATCH] netfilter: define ct_*_info as needed
From: Changli Gao @ 2010-11-14  6:40 UTC
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev, Changli Gao

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 include/net/netfilter/nf_conntrack.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index caf17db..25c34af 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -50,11 +50,24 @@ union nf_conntrack_expect_proto {
 /* per conntrack: application helper private data */
 union nf_conntrack_help {
 	/* insert conntrack helper private data (master) here */
+#if defined(CONFIG_NF_CONNTRACK_FTP) || defined(CONFIG_NF_CONNTRACK_FTP_MODULE)
 	struct nf_ct_ftp_master ct_ftp_info;
+#endif
+#if defined(CONFIG_NF_CONNTRACK_PPTP) || \
+    defined(CONFIG_NF_CONNTRACK_PPTP_MODULE)
 	struct nf_ct_pptp_master ct_pptp_info;
+#endif
+#if defined(CONFIG_NF_CONNTRACK_H323) || \
+    defined(CONFIG_NF_CONNTRACK_H323_MODULE)
 	struct nf_ct_h323_master ct_h323_info;
+#endif
+#if defined(CONFIG_NF_CONNTRACK_SANE) || \
+    defined(CONFIG_NF_CONNTRACK_SANE_MODULE)
 	struct nf_ct_sane_master ct_sane_info;
+#endif
+#if defined(CONFIG_NF_CONNTRACK_SIP) || defined(CONFIG_NF_CONNTRACK_SIP_MODULE)
 	struct nf_ct_sip_master ct_sip_info;
+#endif
 };
 
 #include <linux/types.h>


* [PATCH] netfilter: fix the wrong alloc_size
From: Changli Gao @ 2010-11-14  6:35 UTC
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev, Changli Gao

In function update_alloc_size(), sizeof(struct nf_ct_ext) is wrongly added
twice.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 net/netfilter/nf_conntrack_extend.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
index bd82450..920f924 100644
--- a/net/netfilter/nf_conntrack_extend.c
+++ b/net/netfilter/nf_conntrack_extend.c
@@ -144,9 +144,8 @@ static void update_alloc_size(struct nf_ct_ext_type *type)
 		if (!t1)
 			continue;
 
-		t1->alloc_size = sizeof(struct nf_ct_ext)
-				 + ALIGN(sizeof(struct nf_ct_ext), t1->align)
-				 + t1->len;
+		t1->alloc_size = ALIGN(sizeof(struct nf_ct_ext), t1->align) +
+				 t1->len;
 		for (j = 0; j < NF_CT_EXT_NUM; j++) {
 			t2 = nf_ct_ext_types[j];
 			if (t2 == NULL || t2 == t1 ||
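
The size arithmetic can be checked in user space with a small sketch.  The
DEMO_ALIGN macro and both function names are hypothetical; they only mirror
the shape of the old and new expressions in the diff above, with the header
size passed in as a plain number.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a (a must be a power of two),
 * in the spirit of the kernel's ALIGN() macro. */
#define DEMO_ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Size needed for a header of 'hdr' bytes followed by one extension of
 * 'len' bytes that must start on an 'align' boundary. */
static size_t demo_alloc_size(size_t hdr, size_t align, size_t len)
{
	assert(align && (align & (align - 1)) == 0);
	return DEMO_ALIGN(hdr, align) + len;
}

/* The old computation, which counts the header twice. */
static size_t demo_alloc_size_old(size_t hdr, size_t align, size_t len)
{
	return hdr + DEMO_ALIGN(hdr, align) + len;
}
```

For a 12-byte header, 8-byte alignment and a 20-byte extension, the new
expression gives 16 + 20 = 36 bytes, while the old one gives 12 more, which
is exactly one extra header.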


* Re: [PATCH update 2] firewire: net: throttle TX queue before running out of tlabels
From: Maxim Levitsky @ 2010-11-14  4:50 UTC
  To: Stefan Richter; +Cc: netdev, linux1394-devel, linux-kernel
In-Reply-To: <tkrat.83117f4523f0d928@s5r6.in-berlin.de>

On Sat, 2010-11-13 at 23:07 +0100, Stefan Richter wrote:
> This prevents firewire-net from submitting write requests in fast
> succession until failure because all 64 transaction labels were used up
> for unfinished split transactions.  The netif_stop/wake_queue API is
> used for this purpose.
> 
> Without this stop/wake mechanism, datagrams were simply lost whenever
> the tlabel pool was exhausted.  Plus, tlabel exhaustion by firewire-net
> also prevented other unrelated outbound transactions from being initiated.
> 
> The high watermark is set to considerably less than 64 (I chose 8)
> because peers which run current Linux firewire-ohci are still easily
> saturated by this (i.e. some datagrams are dropped with ack-busy-*
> events), depending on the hardware at transmitter and receiver side.
> 
> I did not see changes to resulting throughput that were discernible from
> the usual measuring noise.  To do:  Revisit the choice of queue depth
> once firewire-ohci's AR DMA was improved.
> 
> I wonder what a good net_device.tx_queue_len value is.  I just set it
> to the same value as the chosen watermark for now.
> 
> Note:  This removes some netif_wake_queue from reception code paths.
> They were apparently copy&paste artefacts from a nonsensical
> netif_wake_queue use in the older eth1394 driver.  This belongs only
> into the transmit path.
> 
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
> ---
> Update 2:  Maxim told me to de-obfuscate status tracking.  I realized
> that netif_queue_stopped can be used for that and thereby noticed bogus
> usages of it in the rx path.

In fact, after a lot of testing I see that the original patch,
'[PATCH 4/4] firewire: net: throttle TX queue before running out of
tlabels', works best here.
With the AR fixes, I don't see even a single fwnet_write_complete error on
either side.

However, 'update 2' (maybe update 1 too, I didn't test it) lowers
desktop->laptop throughput somewhat
(250 vs 227 Mbits/s). I tested this many times.

The actual raw throughput possible with a UDP stream and either no throttling
or a higher packets-in-flight count (I tested 50/30) is 280 Mbits/s.

BTW, I still don't fully understand why my laptop sends at only 180
Mbits/s pretty much always, regardless of patches or TCP/UDP.

I also tested performance impact of other patches, and it is too small
to see through the noise.

Not bad, eh? From a complete trainwreck, IP over 1394 has turned into a
very stable and fast connection that beats 100 Mbit ethernet a bit.

Next on my list are the POC (piece of cake) items.

I need to figure out why s2ram hoses the network connection.
In fact, firewire-ohci usually keeps working, and reloading firewire-net
restores the connection.

Also, I need to add all required bits to make firewire-net work with NM.

I need to make resets more robust. Currently, after the cable is plugged in,
it takes some time until the connection starts working.

So thanks again, especially to Clemens Ladisch for the hardest fixes
that made all this possible.

And of course feel free to not merge my AR rewrite, it is mostly done
as a proof of concept to see if my hardware is buggy or not.
I am sure these patches can be done better.

Best regards,
	Maxim Levitsky


^ permalink raw reply

* [PATCH  kernel 2.6.37-rc1]ipg.c: remove id [SUNDANCE, 0x1021]
From: Ken Kawasaki @ 2010-11-13 23:42 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20101107001124.7d8ef6c4.ken_kawasaki@spring.nifty.jp>


ipg.c:
  The id [SUNDANCE, 0x1021] (=[0x13f0, 0x1021]) is defined
  in both dl2k.h and ipg.c,
  but this device works better with the dl2k driver.

  This problem is similar to the one fixed by commit
  [25cca5352712561fba97bd37c495593d641c1d39
  ipg: Remove device claimed by dl2k from pci id table]
  on 11 Feb 2010.

Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>

---

--- linux-2.6.37-rc1/drivers/net/ipg.c.orig	2010-11-13 19:54:53.000000000 +0900
+++ linux-2.6.37-rc1/drivers/net/ipg.c	2010-11-13 19:57:03.000000000 +0900
@@ -88,16 +88,14 @@ static const char *ipg_brand_name[] = {
 	"IC PLUS IP1000 1000/100/10 based NIC",
 	"Sundance Technology ST2021 based NIC",
 	"Tamarack Microelectronics TC9020/9021 based NIC",
-	"Tamarack Microelectronics TC9020/9021 based NIC",
 	"D-Link NIC IP1000A"
 };
 
 static DEFINE_PCI_DEVICE_TABLE(ipg_pci_tbl) = {
 	{ PCI_VDEVICE(SUNDANCE,	0x1023), 0 },
 	{ PCI_VDEVICE(SUNDANCE,	0x2021), 1 },
-	{ PCI_VDEVICE(SUNDANCE,	0x1021), 2 },
-	{ PCI_VDEVICE(DLINK,	0x9021), 3 },
-	{ PCI_VDEVICE(DLINK,	0x4020), 4 },
+	{ PCI_VDEVICE(DLINK,	0x9021), 2 },
+	{ PCI_VDEVICE(DLINK,	0x4020), 3 },
 	{ 0, }
 };
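
The renumbering in the table above matters because the last field of each
entry is used as an index into the brand-name array.  A minimal userspace
sketch of that relationship follows; struct id_entry, lookup_name() and
table_is_consistent() are hypothetical names, and 0x1186 is assumed here to
be the D-Link vendor id.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Brand names after the patch: the duplicate Tamarack entry is gone. */
static const char *const brand_name[] = {
	"IC PLUS IP1000 1000/100/10 based NIC",
	"Sundance Technology ST2021 based NIC",
	"Tamarack Microelectronics TC9020/9021 based NIC",
	"D-Link NIC IP1000A",
};

/* Hypothetical simplified stand-in for a PCI id table entry. */
struct id_entry {
	unsigned short vendor, device;
	size_t name_idx;	/* index into brand_name[] */
};

static const struct id_entry id_tbl[] = {
	{ 0x13f0, 0x1023, 0 },	/* SUNDANCE */
	{ 0x13f0, 0x2021, 1 },	/* SUNDANCE */
	{ 0x1186, 0x9021, 2 },	/* DLINK (vendor id assumed) */
	{ 0x1186, 0x4020, 3 },	/* DLINK (vendor id assumed) */
};

/* Return the brand name for a device, or NULL if the table does not claim it. */
static const char *lookup_name(unsigned short vendor, unsigned short device)
{
	for (size_t i = 0; i < ARRAY_SIZE(id_tbl); i++)
		if (id_tbl[i].vendor == vendor && id_tbl[i].device == device)
			return brand_name[id_tbl[i].name_idx];
	return NULL;
}

/* Every index stored in the id table must stay inside brand_name[]. */
static int table_is_consistent(void)
{
	for (size_t i = 0; i < ARRAY_SIZE(id_tbl); i++)
		if (id_tbl[i].name_idx >= ARRAY_SIZE(brand_name))
			return 0;
	return 1;
}
```

In this sketch, removing a table entry without shifting the later indices
would either mislabel a device or index past the end of the name array.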
 

^ permalink raw reply

* bridge netpoll support: mismatch between net core and bridge headers
From: Mike Frysinger @ 2010-11-13 23:26 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

commit 91d2c34a4eed32876ca333b0ca44f3bc56645805 added this bit of code
to net/bridge/br_private.h:
struct net_bridge_port {
    ....
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   struct netpoll          *np;
+#endif
};
....
#ifdef CONFIG_NET_POLL_CONTROLLER
+static inline struct netpoll_info *br_netpoll_info(struct net_bridge *br)
+{
+   return br->dev->npinfo;
+}
....

unfortunately, this is not the define protection that is used in the
core net code (include/linux/netdevice.h):
#ifdef CONFIG_NETPOLL
    struct netpoll_info *npinfo;
#endif

so in my randconfig builds, i'm now seeing frequent failures along the lines of:

In file included from net/bridge/br.c:24:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member
named ‘npinfo’
make[2]: *** [net/bridge/br.o] Error 1
In file included from net/bridge/br_device.c:24:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member
named ‘npinfo’
In file included from net/bridge/br_fdb.c:27:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member
named ‘npinfo’
make[2]: *** [net/bridge/br_fdb.o] Error 1
make[2]: *** [net/bridge/br_device.o] Error 1
In file included from net/bridge/br_forward.c:23:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member
named ‘npinfo’
make[2]: *** [net/bridge/br_forward.o] Error 1

seems to be a regression introduced during the 2.6.36 cycle
-mike
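
To illustrate the mismatch, here is a minimal userspace sketch in which the
accessor is guarded by the same symbol as the member it reads.  This is only
one possible way to keep the two in step, not necessarily the fix that should
go in; struct net_device_model and dev_netpoll_info() are hypothetical names,
and CONFIG_NETPOLL is defined by hand since there is no Kconfig here.

```c
#include <stddef.h>

/* Hypothetical stand-in; in the kernel this comes from Kconfig. */
#define CONFIG_NETPOLL 1

struct netpoll_info {
	int rx_flags;
};

struct net_device_model {
	int ifindex;
#ifdef CONFIG_NETPOLL
	struct netpoll_info *npinfo;	/* exists only with CONFIG_NETPOLL */
#endif
};

/*
 * The accessor uses the same guard as the member.  Guarding it with a
 * different symbol breaks the build whenever that symbol is set while
 * CONFIG_NETPOLL is not.
 */
#ifdef CONFIG_NETPOLL
static struct netpoll_info *dev_netpoll_info(struct net_device_model *dev)
{
	return dev->npinfo;
}
#else
static struct netpoll_info *dev_netpoll_info(struct net_device_model *dev)
{
	(void)dev;
	return NULL;
}
#endif
```

With the define removed, the sketch still compiles because the fallback
accessor never touches the missing member.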

^ permalink raw reply

* Re: Kernel rwlock design, Multicore and IGMP
From: Chris Metcalf @ 2010-11-13 23:03 UTC (permalink / raw)
  To: Américo Wang; +Cc: Cypher Wu, Eric Dumazet, linux-kernel, netdev
In-Reply-To: <ZXmP8hjgLHA.4648@exchange1.tad.internal.tilera.com>

On 11/12/2010 2:13 AM, Américo Wang wrote:
> On Fri, Nov 12, 2010 at 11:32:59AM +0800, Cypher Wu wrote:
>> On Thu, Nov 11, 2010 at 11:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Le jeudi 11 novembre 2010 à 21:49 +0800, Cypher Wu a écrit :
>>>> I'm using TILEPro and its rwlock in kernel is a liitle different than
>>>> other platforms. It have a priority for write lock that when tried it
>>>> will block the following read lock even if read lock is hold by
>>>> others. Its code can be read in Linux Kernel 2.6.36 in
>>>> arch/tile/lib/spinlock_32.c.
>>>
>>> This seems a bug to me.
>>> [...]
>>>
>> It seems not a problem that read_lock() can be nested or not since
>> rwlock doesn't have 'owner', it's just that should we give
>> write_lock() a priority than read_lock() since if there have a lot
>> read_lock()s then they'll starve write_lock().
>> We should work out a well defined behavior so all the
>> platform-dependent raw_rwlock has to design under that principle.
> 
> It is a known weakness of rwlock, it is designed like that. :)

Exactly.  The tile rwlock correctly allows recursively reacquiring the read
lock.  But it does give priority to writers, for the (unfortunately
incorrect) reasons Cypher Wu outlined above, e.g.:

- Core A takes a read lock
- Core B tries for a write lock and blocks new read locks
- Core A tries for a (recursive) read lock and blocks

Core A and B are now deadlocked.

The solution is actually to simplify the tile rwlock implementation so that
both readers and writers contend fairly for the lock.
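
The three-step scenario above can be replayed with a small single-threaded
model.  This is only a sketch of the locking policy, not the tile
implementation: struct model_rwlock and its helpers are hypothetical, and the
"no writer priority" case simply assumes that a waiting writer does not block
new readers.

```c
#include <stdbool.h>

struct model_rwlock {
	int readers;		/* read locks currently held */
	bool writer_waiting;	/* a writer has announced itself */
	bool writer_priority;	/* policy: waiting writer blocks new readers */
};

/* Returns true if a read lock is granted right now. */
static bool model_read_trylock(struct model_rwlock *l)
{
	if (l->writer_waiting && l->writer_priority)
		return false;
	l->readers++;
	return true;
}

/* Returns true if the writer gets the lock; otherwise it starts waiting. */
static bool model_write_trylock(struct model_rwlock *l)
{
	if (l->readers == 0)
		return true;
	l->writer_waiting = true;
	return false;
}

/* Replay the scenario; returns true if both cores end up stuck. */
static bool scenario_deadlocks(bool writer_priority)
{
	struct model_rwlock l = { 0, false, writer_priority };

	bool a_first = model_read_trylock(&l);	/* Core A takes a read lock */
	bool b_write = model_write_trylock(&l);	/* Core B tries a write lock */
	bool a_again = model_read_trylock(&l);	/* Core A recursive read lock */

	/* B waits for A to release, while A waits behind B. */
	return a_first && !b_write && !a_again;
}
```

In this model the recursive read lock only succeeds when a waiting writer is
not allowed to hold back new readers.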

I'll post a patch in the next day or two for tile.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

^ permalink raw reply

* Re: Kernel rwlock design, Multicore and IGMP
From: Peter Zijlstra @ 2010-11-13 22:54 UTC (permalink / raw)
  To: Américo Wang; +Cc: Eric Dumazet, Cypher Wu, linux-kernel, netdev
In-Reply-To: <20101112081945.GA5949@cr0.nay.redhat.com>

On Fri, 2010-11-12 at 16:19 +0800, Américo Wang wrote:
> 
> Just for record, both Tile and X86 implement rwlock with a write-bias,
> this somewhat reduces the write-starvation problem. 

x86 does no such thing.

^ permalink raw reply

* Re: Kernel rwlock design, Multicore and IGMP
From: Peter Zijlstra @ 2010-11-13 22:53 UTC (permalink / raw)
  To: Cypher Wu; +Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <AANLkTik93QUQxSDmwd4Qj-gXQiWWPzd68JPAYAHBAsHR@mail.gmail.com>

On Fri, 2010-11-12 at 11:32 +0800, Cypher Wu wrote:
> It seems not a problem that read_lock() can be nested or not since
> rwlock doesn't have 'owner', 

You're mistaken.

> it's just that should we give
> write_lock() a priority than read_lock() since if there have a lot
> read_lock()s then they'll starve write_lock().

We rely on that behaviour. FWIW write preference locks will starve
readers.

> We should work out a well defined behavior so all the
> platform-dependent raw_rwlock has to design under that principle. 

We have that, all archs have read preference rwlock_t, they have to,
code relies on it.

^ permalink raw reply

* Re: [PATCH] atomic: add atomic_inc_not_zero_hint()
From: Paul E. McKenney @ 2010-11-13 22:26 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Andrew Morton, linux-kernel, David Miller, netdev,
	Arnaldo Carvalho de Melo, Ingo Molnar, Andi Kleen, Nick Piggin
In-Reply-To: <alpine.DEB.2.00.1011121313001.16754@router.home>

On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> 
> prefetchw() would be too much overhead?

No idea.  Where do you believe that prefetchw() should be added?

							Thanx, Paul

^ permalink raw reply

* [PATCH update 2] firewire: net: throttle TX queue before running out of tlabels
From: Stefan Richter @ 2010-11-13 22:07 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, netdev
In-Reply-To: <tkrat.eaac597cf54bb660@s5r6.in-berlin.de>

This prevents firewire-net from submitting write requests in fast
succession until they fail because all 64 transaction labels are used up
by unfinished split transactions.  The netif_stop/wake_queue API is
used for this purpose.

Without this stop/wake mechanism, datagrams were simply lost whenever
the tlabel pool was exhausted.  Plus, tlabel exhaustion by firewire-net
also prevented other unrelated outbound transactions from being initiated.

The high watermark is set to considerably less than 64 (I chose 8)
because peers which run current Linux firewire-ohci are still easily
saturated by this (i.e. some datagrams are dropped with ack-busy-*
events), depending on the hardware at transmitter and receiver side.

I did not see changes to resulting throughput that were discernible from
the usual measuring noise.  To do:  Revisit the choice of queue depth
once firewire-ohci's AR DMA has been improved.

I wonder what a good net_device.tx_queue_len value is.  I just set it
to the same value as the chosen watermark for now.

Note:  This removes some netif_wake_queue from reception code paths.
They were apparently copy&paste artefacts from a nonsensical
netif_wake_queue use in the older eth1394 driver.  This belongs only
in the transmit path.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Update 2:  Maxim told me to de-obfuscate status tracking.  I realized
that netif_queue_stopped can be used for that and thereby noticed bogus
usages of it in the rx path.

 drivers/firewire/net.c |   59 ++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 24 deletions(-)

Index: b/drivers/firewire/net.c
===================================================================
--- a/drivers/firewire/net.c
+++ b/drivers/firewire/net.c
@@ -28,8 +28,14 @@
 #include <asm/unaligned.h>
 #include <net/arp.h>
 
-#define FWNET_MAX_FRAGMENTS	25	/* arbitrary limit */
-#define FWNET_ISO_PAGE_COUNT	(PAGE_SIZE < 16 * 1024 ? 4 : 2)
+/* rx limits */
+#define FWNET_MAX_FRAGMENTS		25 /* arbitrary limit */
+#define FWNET_ISO_PAGE_COUNT		(PAGE_SIZE < 16*1024 ? 4 : 2)
+
+/* tx limits */
+#define FWNET_MAX_QUEUED_DATAGRAMS	8 /* should keep AT DMA busy enough */
+#define FWNET_MIN_QUEUED_DATAGRAMS	2
+#define FWNET_TX_QUEUE_LEN		FWNET_MAX_QUEUED_DATAGRAMS /* ? */
 
 #define IEEE1394_BROADCAST_CHANNEL	31
 #define IEEE1394_ALL_NODES		(0xffc0 | 0x003f)
@@ -641,8 +647,6 @@ static int fwnet_finish_incoming_packet(
 		net->stats.rx_packets++;
 		net->stats.rx_bytes += skb->len;
 	}
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
 
 	return 0;
 
@@ -651,8 +655,6 @@ static int fwnet_finish_incoming_packet(
 	net->stats.rx_dropped++;
 
 	dev_kfree_skb_any(skb);
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
 
 	return -ENOENT;
 }
@@ -784,15 +786,10 @@ static int fwnet_incoming_packet(struct 
 	 * Datagram is not complete, we're done for the
 	 * moment.
 	 */
-	spin_unlock_irqrestore(&dev->lock, flags);
-
-	return 0;
+	retval = 0;
  fail:
 	spin_unlock_irqrestore(&dev->lock, flags);
 
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
-
 	return retval;
 }
 
@@ -892,6 +889,13 @@ static void fwnet_free_ptask(struct fwne
 	kmem_cache_free(fwnet_packet_task_cache, ptask);
 }
 
+/* Caller must hold dev->lock. */
+static void dec_queued_datagrams(struct fwnet_device *dev)
+{
+	if (--dev->queued_datagrams == FWNET_MIN_QUEUED_DATAGRAMS)
+		netif_wake_queue(dev->netdev);
+}
+
 static int fwnet_send_packet(struct fwnet_packet_task *ptask);
 
 static void fwnet_transmit_packet_done(struct fwnet_packet_task *ptask)
@@ -908,7 +912,7 @@ static void fwnet_transmit_packet_done(s
 	/* Check whether we or the networking TX soft-IRQ is last user. */
 	free = (ptask->outstanding_pkts == 0 && ptask->enqueued);
 	if (free)
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	if (ptask->outstanding_pkts == 0) {
 		dev->netdev->stats.tx_packets++;
@@ -979,7 +983,7 @@ static void fwnet_transmit_packet_failed
 	/* Check whether we or the networking TX soft-IRQ is last user. */
 	free = ptask->enqueued;
 	if (free)
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	dev->netdev->stats.tx_dropped++;
 	dev->netdev->stats.tx_errors++;
@@ -1064,7 +1068,7 @@ static int fwnet_send_packet(struct fwne
 		if (!free)
 			ptask->enqueued = true;
 		else
-			dev->queued_datagrams--;
+			dec_queued_datagrams(dev);
 
 		spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1083,7 +1087,7 @@ static int fwnet_send_packet(struct fwne
 	if (!free)
 		ptask->enqueued = true;
 	else
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1249,6 +1253,15 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	struct fwnet_peer *peer;
 	unsigned long flags;
 
+	spin_lock_irqsave(&dev->lock, flags);
+
+	/* Can this happen? */
+	if (netif_queue_stopped(dev->netdev)) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+
+		return NETDEV_TX_BUSY;
+	}
+
 	ptask = kmem_cache_alloc(fwnet_packet_task_cache, GFP_ATOMIC);
 	if (ptask == NULL)
 		goto fail;
@@ -1267,9 +1280,6 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	proto = hdr_buf.h_proto;
 	dg_size = skb->len;
 
-	/* serialize access to peer, including peer->datagram_label */
-	spin_lock_irqsave(&dev->lock, flags);
-
 	/*
 	 * Set the transmission type for the packet.  ARP packets and IP
 	 * broadcast packets are sent via GASP.
@@ -1291,7 +1301,7 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 
 		peer = fwnet_peer_find_by_guid(dev, be64_to_cpu(guid));
 		if (!peer || peer->fifo == FWNET_NO_FIFO_ADDR)
-			goto fail_unlock;
+			goto fail;
 
 		generation         = peer->generation;
 		dest_node          = peer->node_id;
@@ -1345,7 +1355,8 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 		max_payload += RFC2374_FRAG_HDR_SIZE;
 	}
 
-	dev->queued_datagrams++;
+	if (++dev->queued_datagrams == FWNET_MAX_QUEUED_DATAGRAMS)
+		netif_stop_queue(dev->netdev);
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1356,9 +1367,9 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 
 	return NETDEV_TX_OK;
 
- fail_unlock:
-	spin_unlock_irqrestore(&dev->lock, flags);
  fail:
+	spin_unlock_irqrestore(&dev->lock, flags);
+
 	if (ptask)
 		kmem_cache_free(fwnet_packet_task_cache, ptask);
 
@@ -1415,7 +1426,7 @@ static void fwnet_init_dev(struct net_de
 	net->addr_len		= FWNET_ALEN;
 	net->hard_header_len	= FWNET_HLEN;
 	net->type		= ARPHRD_IEEE1394;
-	net->tx_queue_len	= 10;
+	net->tx_queue_len	= FWNET_TX_QUEUE_LEN;
 	SET_ETHTOOL_OPS(net, &fwnet_ethtool_ops);
 }
 

-- 
Stefan Richter
-=====-==-=- =-== -==-=
http://arcgraph.de/sr/
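
The stop/wake logic of the patch above can be tried out in isolation with a
minimal userspace model.  This is a sketch only: struct tx_model,
tx_enqueue() and tx_complete() are hypothetical names, the stopped flag
stands in for netif_stop_queue/netif_wake_queue, and locking is left out.
The watermarks 8 and 2 are the values from the patch.

```c
#include <stdbool.h>

enum {
	MAX_QUEUED = 8,	/* high watermark: stop the queue here */
	MIN_QUEUED = 2,	/* low watermark: wake the queue here */
};

struct tx_model {
	int queued;	/* datagrams handed to the hardware, not yet done */
	bool stopped;	/* stands in for the netif queue state */
};

/* Transmit path: refuse while stopped, stop when the high mark is hit. */
static bool tx_enqueue(struct tx_model *m)
{
	if (m->stopped)
		return false;	/* corresponds to NETDEV_TX_BUSY */
	if (++m->queued == MAX_QUEUED)
		m->stopped = true;
	return true;
}

/* Completion path: wake only once the backlog has drained to the low mark. */
static void tx_complete(struct tx_model *m)
{
	if (--m->queued == MIN_QUEUED)
		m->stopped = false;
}
```

The gap between the two watermarks gives hysteresis, so in this model the
queue is not stopped and woken again on every single completion.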


^ permalink raw reply

* Re: [PATCH net-next-2.6] bridge: add __rcu annotations
From: Eric Dumazet @ 2010-11-13 22:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20101113101320.4b1c9ba7@nehalam>

Le samedi 13 novembre 2010 à 10:13 -0800, Stephen Hemminger a écrit :
> On Sat, 13 Nov 2010 18:58:50 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Le samedi 13 novembre 2010 à 09:35 -0800, Stephen Hemminger a écrit :
> > > On Sat, 13 Nov 2010 09:15:28 +0100
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > 
> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > index 578debb..ffbd177 100644
> > > > --- a/include/linux/netdevice.h
> > > > +++ b/include/linux/netdevice.h
> > > > @@ -996,7 +996,10 @@ struct net_device {
> > > >  #endif
> > > >  
> > > >  	rx_handler_func_t	*rx_handler;
> > > > -	void			*rx_handler_data;
> > > > +	union {
> > > > +		void				*rx_handler_data;
> > > > +		struct net_bridge_port __rcu	*br_port_rcu;
> > > > +	};
> > > >  
> > > >  	struct netdev_queue __rcu *ingress_queue;
> > > 
> > > I don't like making the generic hook typed again.
> > > We don't do this for other callbacks, timers, workqueues, ...
> > > Why is it necessary for RCU notation.
> > > 
> > 
> > because rcu_dereference() needs the type for __CHECKER__/sparse checks
> > 
> > #define __rcu_dereference_check(p, c, space) \
> >         ({ \
> >                 typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
> >                 rcu_lockdep_assert(c); \
> >                 rcu_dereference_sparse(p, space); \
> >                 smp_read_barrier_depends(); \
> >                 ((typeof(*p) __force __kernel *)(_________p1)); \
> >         })
> > 
> > So using a "void *ptr" is not an option
> > 
> > Its also cleaner to use
> > 
> > rcu_dereference(dev->br_port_rcu)
> > 
> > instead of 
> > 
> > (struct net_bridge_port *)rcu_dereference(dev->rx_handler_data)
> 
> There must be a better way. What about use of that hook by macvlan and openvswitch?

macvlan and openvswitch (is it part of linux yet ???)

I honestly don't understand your point, Stephen. Maybe you could explain a
bit more what the problem is?

I use a union, like many others in the kernel. This is the first
time I hear that adding type safety is not good.

You can use either field at your convenience.

If you are talking about stacking hooks, that has nothing to do with
this (cleanup) rcu patch, but with the previous introduction of
rx_handler_data/rx_handler?

Please run sparse on an x86_64 machine and watch all the warnings in the
bridge code (with CONFIG_SPARSE_RCU_POINTER=y).

Me confused.



^ permalink raw reply

* [PATCH] net: more Kconfig whitespace cleanup
From: Philippe De Muyter @ 2010-11-13 18:43 UTC (permalink / raw)
  To: netdev; +Cc: Philippe De Muyter

The indentation of the TSI108_ETH entry was too deep.

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
---
 drivers/net/Kconfig |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 805cf5d..90431de 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2389,12 +2389,12 @@ config SPIDER_NET
 	  Cell Processor-Based Blades from IBM.
 
 config TSI108_ETH
-	   tristate "Tundra TSI108 gigabit Ethernet support"
-	   depends on TSI108_BRIDGE
-	   help
-	     This driver supports Tundra TSI108 gigabit Ethernet ports.
-	     To compile this driver as a module, choose M here: the module
-	     will be called tsi108_eth.
+	tristate "Tundra TSI108 gigabit Ethernet support"
+	depends on TSI108_BRIDGE
+	help
+	  This driver supports Tundra TSI108 gigabit Ethernet ports.
+	  To compile this driver as a module, choose M here: the module
+	  will be called tsi108_eth.
 
 config GELIC_NET
 	tristate "PS3 Gigabit Ethernet driver"
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH 4/10] Fix leaking of kernel heap addresses in net/
From: Dan Rosenberg @ 2010-11-13 18:42 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1289673008.3090.350.camel@Dan>

> Actually, this is not even a joke.
> 
> Take a look at how we track what sockets a user wants dumped via
> the inet_diag netlink facility, the socket pointer is used as
> the identification cookie.
> 
> I'm sure we'll now get some more security theatre about how we
> have to undo that too.
> 
> More and more I see this whole idea as extremely rediculious.
> 
> If I can write to or read kernel memory, I can look up the
> sockets, inodes, and whatever else we're currently exposing
> the addresses of.  Even without a symbol table, which is
> readily available, I can easily find the ksymtab and find the
> inode and socket hash table addresses there.
> 
> This whole exercise is closing the barn door after the horses have
> already escaped, and it's causing all kinds of inconveniences
> that we really have no need for.

Of course if you can both write and read to kernel memory, this is a
pointless exercise, but those are not the conditions I am trying to
defend against.  Proactive security assumes that individual bugs will
continue to be found, and takes steps to prevent exploitation of classes
of bugs on a broader scale.  In this case, the goal is for it to be
difficult to exploit arbitrary write bugs WITHOUT having a separate
arbitrary read bug.  Several other steps are being taken in this
direction - there is open discussion about the merits of hiding symbol
information, address space randomization, marking various likely targets
as read-only, and restricting access to debugging information.  However,
none of this really accomplishes much if addresses are still exposed
via /proc.

I'm sorry that you see all this as "security theatre".  It's clear that
there's too much resistance to this effort for it to ever succeed, so
I'm ceasing attempts to get this patch series through.

-Dan


^ permalink raw reply

* Re: [PATCH net-next-2.6] bridge: add __rcu annotations
From: Stephen Hemminger @ 2010-11-13 18:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1289671130.2743.28.camel@edumazet-laptop>

On Sat, 13 Nov 2010 18:58:50 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le samedi 13 novembre 2010 à 09:35 -0800, Stephen Hemminger a écrit :
> > On Sat, 13 Nov 2010 09:15:28 +0100
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 578debb..ffbd177 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -996,7 +996,10 @@ struct net_device {
> > >  #endif
> > >  
> > >  	rx_handler_func_t	*rx_handler;
> > > -	void			*rx_handler_data;
> > > +	union {
> > > +		void				*rx_handler_data;
> > > +		struct net_bridge_port __rcu	*br_port_rcu;
> > > +	};
> > >  
> > >  	struct netdev_queue __rcu *ingress_queue;
> > 
> > I don't like making the generic hook typed again.
> > We don't do this for other callbacks, timers, workqueues, ...
> > Why is it necessary for RCU notation.
> > 
> 
> because rcu_dereference() needs the type for __CHECKER__/sparse checks
> 
> #define __rcu_dereference_check(p, c, space) \
>         ({ \
>                 typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
>                 rcu_lockdep_assert(c); \
>                 rcu_dereference_sparse(p, space); \
>                 smp_read_barrier_depends(); \
>                 ((typeof(*p) __force __kernel *)(_________p1)); \
>         })
> 
> So using a "void *ptr" is not an option
> 
> Its also cleaner to use
> 
> rcu_dereference(dev->br_port_rcu)
> 
> instead of 
> 
> (struct net_bridge_port *)rcu_dereference(dev->rx_handler_data)

There must be a better way. What about use of that hook by macvlan and openvswitch?



-- 

^ permalink raw reply

* Re: [PATCH net-next-2.6] bridge: add __rcu annotations
From: Eric Dumazet @ 2010-11-13 17:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20101113093545.6fe9c077@nehalam>

Le samedi 13 novembre 2010 à 09:35 -0800, Stephen Hemminger a écrit :
> On Sat, 13 Nov 2010 09:15:28 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 578debb..ffbd177 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -996,7 +996,10 @@ struct net_device {
> >  #endif
> >  
> >  	rx_handler_func_t	*rx_handler;
> > -	void			*rx_handler_data;
> > +	union {
> > +		void				*rx_handler_data;
> > +		struct net_bridge_port __rcu	*br_port_rcu;
> > +	};
> >  
> >  	struct netdev_queue __rcu *ingress_queue;
> 
> I don't like making the generic hook typed again.
> We don't do this for other callbacks, timers, workqueues, ...
> Why is it necessary for RCU notation.
> 

because rcu_dereference() needs the type for __CHECKER__/sparse checks

#define __rcu_dereference_check(p, c, space) \
        ({ \
                typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
                rcu_lockdep_assert(c); \
                rcu_dereference_sparse(p, space); \
                smp_read_barrier_depends(); \
                ((typeof(*p) __force __kernel *)(_________p1)); \
        })

So using a "void *ptr" is not an option

Its also cleaner to use

rcu_dereference(dev->br_port_rcu)

instead of 

(struct net_bridge_port *)rcu_dereference(dev->rx_handler_data)
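
The effect of the union can be shown outside the kernel with a minimal
sketch.  It leaves out RCU and the sparse annotations entirely and only
demonstrates that both members name the same storage; struct
net_device_model and the two helpers are hypothetical, and the anonymous
union assumes a C11 (or GNU C) compiler.

```c
/* Hypothetical stand-in for the bridge port structure. */
struct net_bridge_port {
	int port_no;
};

struct net_device_model {
	union {
		void *rx_handler_data;			/* generic view */
		struct net_bridge_port *br_port;	/* typed view */
	};
};

/* Generic member: every user has to cast back to the real type. */
static int port_no_via_cast(const struct net_device_model *dev)
{
	return ((struct net_bridge_port *)dev->rx_handler_data)->port_no;
}

/* Typed member: the compiler (and a checker) can see the pointee type. */
static int port_no_typed(const struct net_device_model *dev)
{
	return dev->br_port->port_no;
}
```

Other users of the hook can keep storing into the void pointer member, while
bridge code reads the typed one without a cast.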




^ permalink raw reply

