Netdev List

* Re: PF_RING: Include in main line kernel?
From: David Miller @ 2009-10-14 20:15 UTC (permalink / raw)
  To: deri; +Cc: bcook, brad.doctor, netdev
In-Reply-To: <903D7FEC-34E8-4F70-ABC0-E56A3A5FBBE6@ntop.org>

From: Luca Deri <deri@ntop.org>
Date: Wed, 14 Oct 2009 21:54:26 +0200

> - packets to be filtered using both BPF and ACL-like filters

To me this is all that PF_RING adds, the ACL filter features.

And I see no reason why that can't be added to PF_PACKET, we've
extended PF_PACKET in many ways before (even the mmap ring
buffer layout can be modified trivially) so there is no reason
we can't add the ACL filter feature to PF_PACKET as well.

Adding all of PF_RING just for that is a joke.

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Willy Tarreau @ 2009-10-14 20:17 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev, Eric Dumazet
In-Reply-To: <Pine.LNX.4.58.0910140912420.2714@u.domain.uli>

Hello Julian,

On Wed, Oct 14, 2009 at 10:27:50AM +0300, Julian Anastasov wrote:
> 	There were attempts to switch the server socket to
> established state, no SYN-ACK retransmissions after client ACK
> and send FIN (or RST) to client on TCP_DEFER_ACCEPT expiration
> but due to some locking problems it was reverted:
> 
> http://marc.info/?t=121318919000001&r=1&w=2

Interesting read, thanks for the link.

> 	Current handling is as before: drop client ACKs before DATA,
> stay in SYN_RECV state, send RST on outdated ACKs.

OK, that's what I observed too.

(...)
> > > 	This works properly in 2.6.31.3, I set TCP_SYNCNT=1
> > > and TCP_DEFER_ACCEPT then only 2 SYN-ACKs are sent.
> > 
> > That's what I observe too, but the connection is silently dropped
> > afterwards and I'm clearly not sure this was the intended behaviour.
> 
> 	Yes, bad only for firewalls. Note that if
> TCP_SYNCNT/TCP_DEFER_ACCEPT is shorter the client still gets
> RST on next ACK. So, the problem is when client is silent
> after first ACK.

Not only that: we also have the problem that the application layer has
no way to be informed that someone connected and failed to send anything.

I have added the defer_accept feature in haproxy (HTTP load balancer),
just as an optimization to avoid an EAGAIN upon first recv() attempt
to read the HTTP request after an accept. But users rely on stats and
logs a lot to check if their site works. If clients really connect and
fail to send a request (most often because of PPPoE outgoing MTU issues),
they want to see that in the stats in order to monitor their quality of
service. Also, having the ability to return a "408 request timeout" to
the users is a lot cleaner and easier to debug than not responding
anything at all, especially when there are reverse-proxies between
both ends.

So this behaviour has visible effects for the end user that are not
expected at all when reading the doc.

> > So clearly this is in order to improve chances that the application
> > will receive the connection, no ?
> 
> 	Yes, it is not logical to configure
> TCP_DEFER_ACCEPT < TCP_SYNCNT with the idea TCP_DEFER_ACCEPT
> to reduce the SYN_RECV period. TCP_DEFER_ACCEPT does not
> reduce the SYN_RECV period (before ACK), maybe this is lacking
> in the man page. The idea to optionally give more time for DATA
> to come after ACK is more logical. If TCP_DEFER_ACCEPT
> switches state to ESTABLISHED, maybe then the TCP_DEFER_ACCEPT
> period should start after ACK, e.g. "wait for DATA 10 seconds after
> ACK" sounds logical, it will need a new name...

I don't even need to wait for 10 seconds after first ACK, being
able to drop only the first ACK is already excellent IMHO.

> Note that
> TCP_DEFER_ACCEPT is used also as flag for clients but you can do
> the same with TCP_QUICKACK=0 (to delay first ACK and to send
> DATA with first ACK).

Yes, and I already use that; it gives substantial performance
boosts on small HTTP requests. Unfortunately I have no control
over the end-user's browser!

> > Yes this is what happens right now, but reading the man again
> > does not imply to me that the connection will not be accepted
> > once we reach the retransmit limit.
> > 
> > Maybe we have different usages and different interpretations of
> > the man can satisfy either, but I don't see what this would be
> > useful to in case we silently drop instead of finally accepting.
> 
> 	Maybe we want the same, but currently TCP_DEFER_ACCEPT
> works as an ACK dropper, which is easier to implement.

That's what I like too. No timer, very light requirements, just drop
the first empty ACK I'm not interested in, and that's all. That's
really what I understood from the man page.

> The semantic 'TCP_DEFER_ACCEPT extends the period after ACK'
> is good, you can tune it together with TCP_SYNCNT, to
> extend or not to extend the period. What happens on
> TCP_DEFER_ACCEPT expiration after ACK - we all prefer to
> see FIN, so we have to wait for someone to come up with a new
> implementation.

Well, far too complicated for very little gain IMHO.

Regards,
Willy
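
For reference, the socket options discussed in this message can be
sketched in user space roughly as follows. This is only a minimal
sketch, not haproxy's code: the helper names make_deferred_listener()
and disable_quickack() are hypothetical, the listener binds to loopback
only, and it assumes a Linux system whose <netinet/tcp.h> defines
TCP_DEFER_ACCEPT, TCP_SYNCNT and TCP_QUICKACK.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a loopback TCP listener that asks the
 * kernel to defer accept() until data arrives (TCP_DEFER_ACCEPT, in
 * seconds) and limits retransmits via TCP_SYNCNT.
 * Returns the listening fd, or -1 on error.
 */
int make_deferred_listener(unsigned short port, int defer_secs, int syncnt)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port); /* 0 lets the kernel pick a port */

	if (setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT,
		       &syncnt, sizeof(syncnt)) < 0 ||
	    setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
		       &defer_secs, sizeof(defer_secs)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 128) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical client-side helper: turn quick ACKs off so that the
 * first ACK can be delayed and sent together with the request data.
 */
int disable_quickack(int fd)
{
	int off = 0;

	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &off, sizeof(off));
}
```

The value read back with getsockopt(TCP_DEFER_ACCEPT) may differ from
the one that was set, so a caller should not rely on an exact
round trip of the number of seconds.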

* Re: PF_RING: Include in main line kernel?
From: Luca Deri @ 2009-10-14 20:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Brad Doctor, netdev
In-Reply-To: <20091014123328.1ebe35ea@s6510>

Stephen,
many thanks for your feedback, I appreciate it.

The reason why I decided to patch dev.c is because I wanted PF_RING to  
decide whether the packet journey shall continue or not. In other  
words with my solution PF_RING applications can decide whether the  
received packets will also be delivered to upper layers (and to other  
kernel network components). This configurable 'early packet drop'  
allows the overall performance to be significantly increased as  
received packets are not supposed to be delivered to upper layers;  
this is a typical situation for many monitoring devices.

Another reason is that, with a hook in dev.c, device drivers can
pass PF_RING packets directly without going through the standard  
kernel mechanisms. For instance I have developed some drivers that if  
they detect the presence of PF_RING, pass received packets directly to  
PF_RING instead of going with NAPI.

These are the reasons behind my design choices. However if you believe
that my arguments are not enough, and it's better to use a different  
approach, I'm prepared to change the implementation. Please let me  
know what you want me to change in order to include PF_RING into the  
kernel source tree.

If you want to test it, for your convenience I have created a patch  
for kernel 2.6.31.3:
https://svn.ntop.org/svn/ntop/trunk/PF_RING/patches/linux-2.6.31.3-1-686-smp-PF_RING.patch.gz

Regards Luca


On Oct 14, 2009, at 9:33 PM, Stephen Hemminger wrote:

> On Wed, 14 Oct 2009 11:02:45 -0600
> Brad Doctor <brad.doctor@gmail.com> wrote:
>
>> Good point for sure. We will make the change and get back to you.
>> Thanks for the quick reply, we appreciate it very much!
>>
>> -brad
>>
>> On Wed, Oct 14, 2009 at 10:01 AM, Stephen Hemminger
>> <shemminger@vyatta.com> wrote:
>>> On Wed, 14 Oct 2009 08:33:08 -0600
>>> Brad Doctor <brad.doctor@gmail.com> wrote:
>>>
>>>> Greetings,
>>>>
>>>> On behalf of the users and developers of the PF_RING project, we would
>>>> like to ask consideration to include the PF_RING module in the main
>>>> line kernel.
>>>>
>>>> PF_RING (http://www.ntop.org/PF_RING.html) is a kernel module that
>>>> implements an mmap()-ed memory ring for accelerating packet capture
>>>> and for providing all the basic features a network monitoring
>>>> application needs. PF_RING includes several features such as packet
>>>> filtering, balancing across capture applications, packet reflection
>>>> (i.e. capture application can decide to bounce selected packets onto
>>>> an as-specified interface). Packets are filtered both using BPF and
>>>> using ACL-like rules (e.g. tcp and ports from 80 to 100). Using
>>>> PF_RING it is also possible to exploit multiple RX queues provided by
>>>> modern NIC adapters. PF_RING achieves a significant speedup by making
>>>> only one copy of the packet. Additionally, PF_RING is able to operate
>>>> in a capture-only installation, further increasing performance.
>>>>
>>>> PF_RING has been around since 2003 and is very mature with an active
>>>> contributing developer base. The developer and user community use a
>>>> mailing list (http://listgateway.unipi.it/pipermail/ntop-misc/) for
>>>> discussions and submissions. PF_RING is used in several projects,
>>>> ranging from distributions such as DD-WRT/OpenWrt to improving
>>>> performance of applications like Snort and Wireshark. Many commercial
>>>> companies around the world in the field of intrusion detection and
>>>> traffic analysis rely on PF_RING for accelerating their products and
>>>> operations.
>>>>
>>>> The PF_RING module relies on a small patch to net/core/dev.c that
>>>> intercepts when a packet is received/transmitted so that it can be
>>>> passed to the PF_RING module when present and with an active listener.
>>>> Other than these minor changes, all the PF_RING code is
>>>> self-contained, comprising just two files: ring.c and ring.h. PF_RING
>>>> is the result of many years of research and development specifically
>>>> into high-speed packet capture, and is homegrown. PF_RING uses the
>>>> stock GPL license.
>>>>
>>>> We feel that PF_RING is ready to be included with the mainline kernel.
>>>> We are ready and eager to support PF_RING for the long term.
>>>>
>>>> Thank you in advance for your consideration!
>>>
>>> I was going to wrap pfring up for the staging tree.
>>> The code you put in network receive path is not necessary;
>>> it would be cleaner to just use existing packet type all hook, and
>>> then PF_RING could be a loadable module without having to be  
>>> compiled in.
>>>
>>> --
>>>
>
> If you want, I'll try and do it. Are there any tests other
> than just ntop?


* Re: PF_RING: Include in main line kernel?
From: Luca Deri @ 2009-10-14 20:26 UTC (permalink / raw)
  To: David Miller; +Cc: bcook, brad.doctor, netdev
In-Reply-To: <20091014.131526.181354809.davem@davemloft.net>

David
I agree that some PF_RING features could be merged into PF_PACKET.
However PF_RING is not just about improving packet capture: it
implements facilities that can be used by many monitoring applications,
including packet balancing, reflection, layer-7 packet filtering and
pf_ring socket clustering, just to name a few. You can read about
PF_RING features in this tutorial: http://luca.ntop.org/IM2009_Tutorial.pdf

So if the decision is to include in PF_PACKET some unique features  
present in PF_RING, I will not be against that, even if I think that  
the PF_RING design is different from PF_PACKET, which IMHO is limited
to packet capture.

Regards Luca

On Oct 14, 2009, at 10:15 PM, David Miller wrote:

> From: Luca Deri <deri@ntop.org>
> Date: Wed, 14 Oct 2009 21:54:26 +0200
>
>> - packets to be filtered using both BPF and ACL-like filters
>
> To me this is all that PF_RING adds, the ACL filter features.
>
> And I see no reason why that can't be added to PF_PACKET, we've
> extended PF_PACKET in many ways before (even the mmap ring
> buffer layout can be modified trivially) so there is no reason
> we can't add the ACL filter feature to PF_PACKET as well.
>
> Adding all of PF_RING just for that is a joke.

* Re: PF_RING: Include in main line kernel?
From: David Miller @ 2009-10-14 20:27 UTC (permalink / raw)
  To: deri; +Cc: shemminger, brad.doctor, netdev
In-Reply-To: <8B385E10-4BE2-48A6-BDE0-0AA1A603275E@ntop.org>

From: Luca Deri <deri@ntop.org>
Date: Wed, 14 Oct 2009 22:17:30 +0200

> Another reason is that, with a hook in dev.c, device drivers can
> pass PF_RING packets directly without going through the standard
> kernel mechanisms. For instance I have developed some drivers that if
> they detect the presence of PF_RING, pass received packets directly to
> PF_RING instead of going with NAPI.

There is absolutely no reason to do this.

If the existing infrastructure isn't good or fast enough,
fix it, don't bypass it.

* Re: PF_RING: Include in main line kernel?
From: David Miller @ 2009-10-14 20:29 UTC (permalink / raw)
  To: deri; +Cc: bcook, brad.doctor, netdev
In-Reply-To: <A2EB721D-3AC9-4815-ADCC-EC0B311C5AE7@ntop.org>

From: Luca Deri <deri@ntop.org>
Date: Wed, 14 Oct 2009 22:26:25 +0200

> I agree that some PF_RING features could be merged into
> PF_PACKET. However PF_RING is not just about improving packet capture:
> it implements facilities that can be used by many monitoring
> applications, including packet balancing, reflection, layer-7 packet
> filtering and pf_ring socket clustering, just to name a few. You can read
> about PF_RING features in this tutorial:
> http://luca.ntop.org/IM2009_Tutorial.pdf

I've already researched several times what PF_RING does and
is capable of doing, and my position still stands that none of
it can't be added to existing facilities.

It's been an out of tree hack for years, and if it stays as
a separate facility it's likely to stay that way.

* Re: PF_RING: Include in main line kernel?
From: Luca Deri @ 2009-10-14 20:34 UTC (permalink / raw)
  To: David Miller; +Cc: bcook, brad.doctor, netdev
In-Reply-To: <20091014.132911.174712308.davem@davemloft.net>

David
so do you want me to start porting PF_RING facilities into PF_PACKET?  
As I have said, I'm not against this: my goal is to include this work
into the linux kernel, as it has been separate for too long.

Luca

On Oct 14, 2009, at 10:29 PM, David Miller wrote:

> From: Luca Deri <deri@ntop.org>
> Date: Wed, 14 Oct 2009 22:26:25 +0200
>
>> I agree that some PF_RING features could be merged into
>> PF_PACKET. However PF_RING is not just about improving packet capture:
>> it implements facilities that can be used by many monitoring
>> applications, including packet balancing, reflection, layer-7 packet
>> filtering and pf_ring socket clustering, just to name a few. You can read
>> about PF_RING features in this tutorial:
>> http://luca.ntop.org/IM2009_Tutorial.pdf
>
> I've already researched several times what PF_RING does and
> is capable of doing, and my position still stands that none of
> it can't be added to existing facilities.
>
> It's been an out of tree hack for years, and if it stays as
> a separate facility it's likely to stay that way.

* Re: PF_RING: Include in main line kernel?
From: Evgeniy Polyakov @ 2009-10-14 20:36 UTC (permalink / raw)
  To: Luca Deri; +Cc: Stephen Hemminger, Brad Doctor, netdev
In-Reply-To: <8B385E10-4BE2-48A6-BDE0-0AA1A603275E@ntop.org>

On Wed, Oct 14, 2009 at 10:17:30PM +0200, Luca Deri (deri@ntop.org) wrote:
> The reason why I decided to patch dev.c is because I wanted PF_RING to  
> decide whether the packet journey shall continue or not. In other  
> words with my solution PF_RING applications can decide whether the  
> received packets will also be delivered to upper layers (and to other  
> kernel network components). This configurable 'early packet drop'  
> allows the overall performance to be significantly increased as  
> received packets are not supposed to be delivered to upper layers;  
> this is a typical situation for many monitoring devices.

This is a feature many projects implement themselves indeed. What about
creating a special return value from the packet handler which will
indicate that the packet was already consumed and no further work should
be done on it?

-- 
	Evgeniy Polyakov
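
The idea of a "consumed" return value can be modelled in user space as
a handler chain that stops at the first handler claiming the packet.
The sketch below is not kernel code: struct packet, the two handlers
and deliver_chain() are hypothetical, and NET_RX_CONSUMED with the
value 6 is borrowed from the patch posted later in this thread.

```c
#include <stddef.h>

#define NET_RX_SUCCESS  0 /* keep going, packet accepted */
#define NET_RX_DROP     1 /* packet dropped */
#define NET_RX_CONSUMED 6 /* illustrative: handler took the packet, stop */

struct packet {
	unsigned short proto; /* EtherType, host byte order */
	unsigned int len;
};

typedef int (*pkt_handler)(const struct packet *pkt);

/* Monitoring tap: swallows IPv4 packets so nothing else sees them. */
int tap_consume_ipv4(const struct packet *pkt)
{
	return pkt->proto == 0x0800 ? NET_RX_CONSUMED : NET_RX_SUCCESS;
}

/* Stand-in for the regular protocol handlers. */
int upper_layer(const struct packet *pkt)
{
	(void)pkt;
	return NET_RX_SUCCESS;
}

/*
 * Offer pkt to each handler in order and stop as soon as one reports
 * NET_RX_CONSUMED. Returns how many handlers saw the packet and
 * stores the last return code in *last_ret.
 */
size_t deliver_chain(const struct packet *pkt, const pkt_handler *handlers,
		     size_t n, int *last_ret)
{
	size_t i;

	*last_ret = NET_RX_DROP; /* nobody has taken it yet */
	for (i = 0; i < n; i++) {
		*last_ret = handlers[i](pkt);
		if (*last_ret == NET_RX_CONSUMED)
			return i + 1;
	}
	return n;
}
```

With the chain { tap_consume_ipv4, upper_layer }, an IPv4 packet is
seen by the tap only, while any other packet reaches both handlers.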

* Re: query: tcpdump versus atomic?
From: William Allen Simpson @ 2009-10-14 20:36 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20091014085941.6897d9d9@nehalam>

Stephen Hemminger wrote:
> On Wed, 14 Oct 2009 11:20:12 -0400
> William Allen Simpson <william.allen.simpson@gmail.com> wrote:
> 
>> William Allen Simpson wrote:
>>> Anybody know what code path tcpdump changes to running atomic?
>>>
>>> Is there a function to test whether you're running atomic?
>>>
>> To partially answer my own question, after laboriously #if'ing compiling
>> section by section, it affects the tcp_minisockets.c code at
>> tcp_create_openreq_child().
>>
>> I've not found a function to test.  I've found sk->sk_allocation, but
>> that doesn't seem to be dynamically updated to reflect the current state.
>>
> Did you look at your Ethernet driver's code for turning on promiscuous
> mode? It could be leaving IRQs or bottom halves disabled.
> 
[    2.876485] eth0: RealTek RTL8139 at 0x2000, 00:40:2b:6b:61:36, IRQ 17
[    2.876490] eth0:  Identified 8139 chip type 'RTL-8101'

No idea on the driver file name, but that's a fairly popular chipset, so
the code should be fairly well reviewed.

Still, tcp shouldn't be processed with interrupts disabled.  That's *way*
too much code....

And we need a function that tells us whether we're atomic already....  And
better function header descriptions that mention the assumptions.

* Re: query: bnx2 and tg3 don't check tcp and/or ip header length validity?
From: William Allen Simpson @ 2009-10-14 20:46 UTC (permalink / raw)
  To: netdev
In-Reply-To: <4AD5F333.3040002@gmail.com>

William Allen Simpson wrote:
> My question is whether it would be OK to add a simple test, and set it to
> zero in case of bad values?
> 
Although both are compiled in my build, I've got no way to test them.  I'm
just going to do the easy thing and set to zero for now.  Somebody that
knows the code -- who should have done real error checking -- could actually
write better error checking and comments about the purpose of cramming the
length of the TCP option field into a tag....

-		tcp_opt_len = tcp_optlen(skb);
+		tcp_opt_len = tcp_option_len_th(tcp_hdr(skb));
+		if (tcp_opt_len < 0)
+			tcp_opt_len = 0;
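
The check in the hunk above can be illustrated in user space as
follows. This is a sketch under assumptions: it assumes the proposed
tcp_option_len_th() returns the TCP data offset in bytes minus the
20-byte fixed header, so it goes negative for a malformed offset below
5 words. The function names below are hypothetical stand-ins, not the
driver or kernel API.

```c
#define TCP_FIXED_HDR_WORDS 5 /* 20-byte TCP header without options */

/*
 * Option length in bytes for a given data offset (in 32-bit words).
 * Negative when the offset is smaller than the fixed header.
 */
int tcp_option_len_from_doff(unsigned int doff)
{
	return (int)(doff * 4) - TCP_FIXED_HDR_WORDS * 4;
}

/* Same, but clamp malformed offsets to zero as the hunk above does. */
int tcp_option_len_clamped(unsigned int doff)
{
	int len = tcp_option_len_from_doff(doff & 0xf); /* 4-bit field */

	return len < 0 ? 0 : len;
}
```

A data offset of 8 words gives 12 bytes of options, while an invalid
offset of 3 gives -8 before clamping and 0 after.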


* Re: PF_RING: Include in main line kernel?
From: Mark Smith @ 2009-10-14 20:50 UTC (permalink / raw)
  To: Luca Deri; +Cc: David Miller, bcook, brad.doctor, netdev
In-Reply-To: <C803B824-8249-44BA-ACCD-D9AE4AA21F92@ntop.org>

Hi Luca, David,

On Wed, 14 Oct 2009 22:34:17 +0200
Luca Deri <deri@ntop.org> wrote:

> David
> so do you want me to start porting PF_RING facilities into PF_PACKET?  
> As I have said, I'm not against this: my goal is to include this work
> into the linux kernel, as it has been separate for too long.
> 

I'd like to see that. One of my main uses for Linux is as a network
troubleshooting "pocket knife", so anything that can improve or make
more functional the packet capturing capabilities is really appreciated
by people like me.

Thanks,
Mark.

> Luca
> 
> On Oct 14, 2009, at 10:29 PM, David Miller wrote:
> 
> > From: Luca Deri <deri@ntop.org>
> > Date: Wed, 14 Oct 2009 22:26:25 +0200
> >
> >> I agree that some PF_RING features could be merged into
> >> PF_PACKET. However PF_RING is not just about improving packet capture:
> >> it implements facilities that can be used by many monitoring
> >> applications, including packet balancing, reflection, layer-7 packet
> >> filtering and pf_ring socket clustering, just to name a few. You can read
> >> about PF_RING features in this tutorial:
> >> http://luca.ntop.org/IM2009_Tutorial.pdf
> >
> > I've already researched several times what PF_RING does and
> > is capable of doing, and my position still stands that none of
> > it can't be added to existing facilities.
> >
> > It's been an out of tree hack for years, and if it stays as
> > a separate facility it's likely to stay that way.
> 

* Route Metric at Kernel Level
From: Tim Schneider @ 2009-10-14 20:49 UTC (permalink / raw)
  To: netdev; +Cc: Nils Aschenbruck, Christoph Fuchs

Hello,

I've been trying for a while now to read out the routing metric for a
given socket at kernel level. So far, I've always set the routing-table
metric by hand using the ifmetric command.
When I tried reading out the struct sock->sk_dst_cache->metrics
values, I always got a metric value of 0. I thought I was
reading out the wrong value, until I discovered that the metric
values that I put into the standard routing table are not copied into
the kernel routing cache which I was reading out. Now I am wondering
if it is supposed to be like that and, if so, why it was implemented
that way.

Also, since I need to read out the metric value from the routing
table, I am in need of some help, maybe just a push in the right
direction. I've been looking at fib_frontend.c and the other fib_*
files which, as I understand, implement the routing tables.
Unfortunately they don't seem to export their functions for my kernel
module, so I am unable to use those. Is there an entry point from where
I am able to use the functions given in the fib_* files, or are they
strictly closed off from the rest of the kernel?

Thank you for any help!

Greetings

Tim
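
As a user-space cross-check of the metric set with ifmetric, the IPv4
routing table exported in /proc/net/route carries a Metric column that
can be parsed as sketched below. This does not answer the in-kernel
question; parse_route_metric() is a hypothetical helper and the field
order assumed is Iface, Destination, Gateway, Flags, RefCnt, Use,
Metric.

```c
#include <stdio.h>

/*
 * Parse one data line of /proc/net/route and extract the interface
 * name and the route metric. Returns 0 on success, -1 if the line
 * does not have the expected fields (e.g. the header line).
 */
int parse_route_metric(const char *line, char *iface, size_t iface_len,
		       unsigned int *metric)
{
	char name[32];
	unsigned long dest, gateway;
	unsigned int flags, use, metric_val;
	int refcnt;

	if (sscanf(line, "%31s %lx %lx %x %d %u %u",
		   name, &dest, &gateway, &flags, &refcnt, &use,
		   &metric_val) != 7)
		return -1;

	snprintf(iface, iface_len, "%s", name);
	*metric = metric_val;
	return 0;
}
```

A caller would typically fopen("/proc/net/route", "r"), read it line by
line with fgets(), and skip every line for which the helper returns -1.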

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Olaf van der Spek @ 2009-10-14 21:12 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <20091014201706.GA24298@1wt.eu>

On Wed, Oct 14, 2009 at 10:17 PM, Willy Tarreau <w@1wt.eu> wrote:
> Yes, and I already use that; it gives substantial performance
> boosts on small HTTP requests. Unfortunately I have no control
> over the end-user's browser!

Most accept feature requests, don't they?

Olaf

* Re: PF_RING: Include in main line kernel?
From: Ben Greear @ 2009-10-14 21:27 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Luca Deri, Stephen Hemminger, Brad Doctor, netdev
In-Reply-To: <20091014203640.GB32317@ioremap.net>

[-- Attachment #1: Type: text/plain, Size: 1064 bytes --]

On 10/14/2009 01:36 PM, Evgeniy Polyakov wrote:
> On Wed, Oct 14, 2009 at 10:17:30PM +0200, Luca Deri (deri@ntop.org) wrote:
>> The reason why I decided to patch dev.c is because I wanted PF_RING to
>> decide whether the packet journey shall continue or not. In other
>> words with my solution PF_RING applications can decide whether the
>> received packets will also be delivered to upper layers (and to other
>> kernel network components). This configurable 'early packet drop'
>> allows the overall performance to be significantly increased as
>> received packets are not supposed to be delivered to upper layers;
>> this is a typical situation for many monitoring devices.
>
> This is a feature many projects implement themselves indeed. What about
> creating a special return value from the packet handler which will
> indicate that the packet was already consumed and no further work should
> be done on it?

Maybe something similar to the attached patch?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


[-- Attachment #2: patch0.patch --]
[-- Type: text/plain, Size: 2636 bytes --]

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 907d118..da78f0a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -78,6 +78,7 @@ struct wireless_dev;
 #define NET_RX_CN_MOD		3   /* Storm on its way! */
 #define NET_RX_CN_HIGH		4   /* The storm is here */
 #define NET_RX_BAD		5  /* packet dropped due to kernel error */
+#define NET_RX_CONSUMED       6 /* pkt is consumed, stop rx logic here. */
 
 /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It
  * indicates that the device will soon be dropping packets, or already drops
diff --git a/net/core/dev.c b/net/core/dev.c
index 0101178..d5024b9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2105,6 +2105,10 @@ static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	}
 
 	return br_handle_frame_hook(port, skb);
@@ -2128,6 +2132,10 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	}
 	return macvlan_handle_frame_hook(skb);
 }
@@ -2185,6 +2193,10 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	} else {
 		/* Huh? Why does turning on AF_PACKET affect this? */
 		skb->tc_verd = SET_TC_OK2MUNGE(skb->tc_verd);
@@ -2300,8 +2312,13 @@ int netif_receive_skb(struct sk_buff *skb)
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
 		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
 		    ptype->dev == orig_dev) {
-			if (pt_prev)
+			if (pt_prev) {
 				ret = deliver_skb(skb, pt_prev, orig_dev);
+				if (ret == NET_RX_CONSUMED) {
+					kfree_skb(skb); /* we made a copy in deliver_skb */
+					goto out;
+				}
+			}
 			pt_prev = ptype;
 		}
 	}
@@ -2336,8 +2353,13 @@ ncls:
 		if (ptype->type == type &&
 		    (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
 		     ptype->dev == orig_dev)) {
-			if (pt_prev)
+			if (pt_prev) {
 				ret = deliver_skb(skb, pt_prev, orig_dev);
+				if (ret == NET_RX_CONSUMED) {
+					kfree_skb(skb); /* we made a copy in deliver_skb */
+					goto out;
+				}
+			}
 			pt_prev = ptype;
 		}
 	}

* Re: query: bnx2 and tg3 don't check tcp and/or ip header length validity?
From: Michael Chan @ 2009-10-14 21:24 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev@vger.kernel.org
In-Reply-To: <4AD638AC.3000901@gmail.com>

On Wed, 2009-10-14 at 13:46 -0700, William Allen Simpson wrote:
> William Allen Simpson wrote:
> > My question is whether it would be OK to add a simple test, and set it to
> > zero in case of bad values?
> > 
> Although both are compiled in my build, I've got no way to test them.  I'm
> just going to do the easy thing and set to zero for now.  Somebody that
> knows the code -- who should have done real error checking -- could actually
> write better error checking and comments about the purpose of cramming the
> length of the TCP option field into a tag....

The option length is needed by the hardware to segment a TSO packet into
proper MTU-sized packets.  You'll get malformed packets if the TSO
header is bad.  Setting it to zero perhaps can make these bad packets
more deterministic, but I don't know for sure.

> 
> -		tcp_opt_len = tcp_optlen(skb);
> +		tcp_opt_len = tcp_option_len_th(tcp_hdr(skb));
> +		if (tcp_opt_len < 0)
> +			tcp_opt_len = 0;
> 

* Re: PF_RING: Include in main line kernel?
From: Luca Deri @ 2009-10-14 21:34 UTC (permalink / raw)
  To: Ben Greear; +Cc: Evgeniy Polyakov, Stephen Hemminger, Brad Doctor, netdev
In-Reply-To: <4AD64251.50903@candelatech.com>

Ben
the patch implements what I was looking for.

Thanks Luca

On Oct 14, 2009, at 11:27 PM, Ben Greear wrote:

> On 10/14/2009 01:36 PM, Evgeniy Polyakov wrote:
>> On Wed, Oct 14, 2009 at 10:17:30PM +0200, Luca Deri (deri@ntop.org) wrote:
>>> The reason why I decided to patch dev.c is because I wanted PF_RING to
>>> decide whether the packet journey shall continue or not. In other
>>> words with my solution PF_RING applications can decide whether the
>>> received packets will also be delivered to upper layers (and to other
>>> kernel network components). This configurable 'early packet drop'
>>> allows the overall performance to be significantly increased as
>>> received packets are not supposed to be delivered to upper layers;
>>> this is a typical situation for many monitoring devices.
>>
>> This is a feature many projects implement themselves indeed. What about
>> creating a special return value from the packet handler which will
>> indicate that the packet was already consumed and no further work should
>> be done on it?
>
> Maybe something similar to the attached patch?
>
> Thanks,
> Ben
>
> -- 
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>
> <patch0.patch>


* Re: PF_RING: Include in main line kernel?
From: David Miller @ 2009-10-14 21:49 UTC (permalink / raw)
  To: greearb; +Cc: zbr, deri, shemminger, brad.doctor, netdev
In-Reply-To: <4AD64251.50903@candelatech.com>

From: Ben Greear <greearb@candelatech.com>
Date: Wed, 14 Oct 2009 14:27:45 -0700

> Maybe something similar to the attached patch?

This is not something I'm interested in applying.

It makes implementing proprietary complete networking stacks
for Linux way too easy.

Instead I'd rather have a GPL exported function that allows indication
of consumption somehow.  That's why we have a special hook for
bonding, so it cannot be abused in proprietary modules.

* Re: [PATCH] net: Fix OF platform drivers coldplug/hotplug when compiled as modules
From: David Miller @ 2009-10-14 21:54 UTC (permalink / raw)
  To: avorontsov; +Cc: linuxppc-dev, netdev
In-Reply-To: <20091008121512.GA2390@oksana.dev.rtsoft.ru>

From: Anton Vorontsov <avorontsov@ru.mvista.com>
Date: Thu, 8 Oct 2009 16:15:12 +0400

> Some OF platform drivers are missing module device tables, so they won't
> load automatically on boot. This patch fixes the issue by adding proper
> MODULE_DEVICE_TABLE() macros to the drivers.
> 
> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>

Applied, thanks!

* [PATCH -mmotm] net: ks8851_mll uses mii interfaces
From: Randy Dunlap @ 2009-10-14 21:59 UTC (permalink / raw)
  To: akpm, netdev; +Cc: linux-kernel, David Choi
In-Reply-To: <200910140436.n9E4aF9G001644@imap1.linux-foundation.org>

From: Randy Dunlap <randy.dunlap@oracle.com>

ks8851_mll uses mii interfaces so it needs to select MII.

ks8851_mll.c:(.text+0xf95ac): undefined reference to `generic_mii_ioctl'
ks8851_mll.c:(.text+0xf96a0): undefined reference to `mii_ethtool_gset'
ks8851_mll.c:(.text+0xf96fa): undefined reference to `mii_ethtool_sset'
ks8851_mll.c:(.text+0xf9754): undefined reference to `mii_link_ok'
ks8851_mll.c:(.text+0xf97ae): undefined reference to `mii_nway_restart'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Choi <david.choi@micrel.com>
---
 drivers/net/Kconfig |    1 +
 1 file changed, 1 insertion(+)

--- mmotm-2009-1013-2113.orig/drivers/net/Kconfig
+++ mmotm-2009-1013-2113/drivers/net/Kconfig
@@ -1741,6 +1741,7 @@ config KS8851
 config KS8851_MLL
 	tristate "Micrel KS8851 MLL"
 	depends on HAS_IOMEM
+	select MII
 	help
 	  This platform driver is for Micrel KS8851 Address/data bus
 	  multiplexed network chip.

* Re: [RFC][PATCH] pkt_sched: skbedit add support for setting mark
From: David Miller @ 2009-10-14 22:08 UTC (permalink / raw)
  To: hadi; +Cc: denys, alexander.h.duyck, netdev
In-Reply-To: <1255523114.21940.14.camel@dogo.mojatatu.com>

From: jamal <hadi@cyberus.ca>
Date: Wed, 14 Oct 2009 08:25:14 -0400

> 
> Here are two patches, one for kernel and one for iproute2.
> I have tested it and it works fine. Unless there are objections, I
> would like to make a formal submission of this.

This patch doesn't apply.

You forgot to add the new files when you checked them into your
tree.

* Re: [PATCH net-next 0/5] Phonet: basic routing support
From: David Miller @ 2009-10-14 22:08 UTC (permalink / raw)
  To: remi; +Cc: netdev
In-Reply-To: <06dcdf33b2a0132b2a05c3220735a81e@chewa.net>

From: Rémi Denis-Courmont <remi@remlab.net>
Date: Wed, 14 Oct 2009 12:47:34 +0200

> 
> This patchset provides rudimentary support for routing Phonet packets.
> Configuration is done with the common rtnetlink infrastructure.
> 
> This is useful when there is more than one Phonet interface in the same
> namespace, e.g. a serial bus to a cellular modem and a USB gadget
> function to a PC.

Looks good, applied to net-next-2.6, thanks.

^ permalink raw reply

* Re: [net-next 0/8] bnx2x: Device Control Channel bug fixes
From: David Miller @ 2009-10-14 22:09 UTC (permalink / raw)
  To: eilong; +Cc: netdev
In-Reply-To: <1255532825.25030.35.camel@lb-tlvb-eilong>

From: "Eilon Greenstein" <eilong@broadcom.com>
Date: Wed, 14 Oct 2009 17:07:05 +0200

> This bnx2x patch series fixes some bugs related to the Device
> Control Channel (DCC) code. There are actually 3 different failures:
> 
> 1. When the fairness initial value was set to zero, the device could not
> be enabled. This happened because zero indicated that the mechanism was
> disabled, and the code (both FW and driver) was not ready to allow
> enabling it at run time. This patch requires replacing the FW - to allow
> easier review, it is split into 3 patches:
>         P1: adding the new FW
>         P2: the actual patch
>         P3: removing the old FW
>         
> 2. Races when loading/unloading the driver when DCC link enable/disable
> commands are received. There were 3 different races:
>         P4: The state of the driver which indicates if it is loaded or
>         not was also used to signal if its link is enabled/disabled by
>         DCC
>         P5: The FW commands to acknowledge the DCC command and
>         loading/unloading the driver run over each other
>         P6: Setting/clearing the MAC address and the FW filtering rules
>         
> 3. P7: Reporting the maximal BW as the link speed
> 
> Patch number 8 is the version update.
> 
> The patches were made based on net-next. Since those are bug fixes,
> please let me know if I should send them based on net-2.6 as well.
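
The race described for P4 comes from one state variable carrying two meanings. A minimal sketch of the general remedy, assuming the two facts are simply kept in separate fields (the actual bnx2x change is not shown in this thread, and every name below is hypothetical):

```c
#include <stdbool.h>

/* Hypothetical model: two independent facts tracked in two fields. */
struct nic_model {
	bool loaded;		/* driver load/unload state */
	bool dcc_link_enabled;	/* last DCC link enable/disable command */
};

static void nic_load(struct nic_model *n)   { n->loaded = true; }
static void nic_unload(struct nic_model *n) { n->loaded = false; }

/* A DCC command only records the link policy; it never touches 'loaded'. */
static void dcc_set_link(struct nic_model *n, bool enable)
{
	n->dcc_link_enabled = enable;
}

/* The link may come up only when both conditions hold. */
static bool nic_link_allowed(const struct nic_model *n)
{
	return n->loaded && n->dcc_link_enabled;
}
```

In this model a DCC disable command leaves the driver marked as loaded, and a DCC enable command that arrives after an unload does not make the driver look loaded again.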

Applied to net-next-2.6, which is where this likely belongs.

Thanks!

^ permalink raw reply

* Re: [PATCH v2] net/fec_mpc52xx: Fix kernel panic on FEC error
From: David Miller @ 2009-10-14 22:10 UTC (permalink / raw)
  To: grant.likely; +Cc: bones, netdev, linuxppc-dev
In-Reply-To: <20091014174224.29221.18830.stgit@angua>

From: Grant Likely <grant.likely@secretlab.ca>
Date: Wed, 14 Oct 2009 11:43:43 -0600

> From: John Bonesio <bones@secretlab.ca>
> 
> The MDIO bus cannot be accessed in interrupt context, but on an FEC
> error, the fec_mpc52xx driver reset function also tries to reset the
> PHY.  Since the error is detected in IRQ context, and the PHY functions
> try to sleep, the kernel ends up panicking.
> 
> Resetting the PHY on an FEC error isn't even necessary.  This patch
> solves the problem by removing the PHY reset entirely.
> 
> Signed-off-by: John Bonesio <bones@secretlab.ca>
> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
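
The failure mode in the changelog can be modelled in userspace. The sketch below is a toy model, not the fec_mpc52xx code: all names are hypothetical, and a counter stands in for the panic that a sleeping call would trigger in IRQ context. The `reset_phy` flag selects between the behaviour before the patch (PHY reset from the error path) and after it (no PHY reset).

```c
#include <stdbool.h>

static bool in_irq_model;	/* true while the modelled IRQ handler runs */
static int sleep_in_irq_count;	/* how often a sleeping call ran in IRQ context */

/* Stand-in for any function that may sleep, such as an MDIO bus access. */
static void may_sleep_model(void)
{
	if (in_irq_model)
		sleep_in_irq_count++;
}

static void phy_reset_model(void)
{
	may_sleep_model();
}

static void fec_reset_model(bool reset_phy)
{
	if (reset_phy)
		phy_reset_model();
	/* controller re-initialisation would follow here */
}

/* Error interrupt: the reset runs with the IRQ flag set. */
static void fec_error_irq_model(bool reset_phy)
{
	in_irq_model = true;
	fec_reset_model(reset_phy);
	in_irq_model = false;
}
```

Calling `fec_error_irq_model(true)` records one violation, while `fec_error_irq_model(false)` records none, which mirrors the effect of dropping the PHY reset from the error path.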

Applied, thanks.

^ permalink raw reply

* Re: [PATCH -mmotm] net: ks8851_mll uses mii interfaces
From: David Miller @ 2009-10-14 22:11 UTC (permalink / raw)
  To: randy.dunlap; +Cc: akpm, netdev, linux-kernel, david.choi
In-Reply-To: <20091014145905.219416bb.randy.dunlap@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Wed, 14 Oct 2009 14:59:05 -0700

> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> ks8851_mll uses mii interfaces so it needs to select MII.
> 
> ks8851_mll.c:(.text+0xf95ac): undefined reference to `generic_mii_ioctl'
> ks8851_mll.c:(.text+0xf96a0): undefined reference to `mii_ethtool_gset'
> ks8851_mll.c:(.text+0xf96fa): undefined reference to `mii_ethtool_sset'
> ks8851_mll.c:(.text+0xf9754): undefined reference to `mii_link_ok'
> ks8851_mll.c:(.text+0xf97ae): undefined reference to `mii_nway_restart'
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
> Cc: David Choi <david.choi@micrel.com>

Applied, thanks Randy.

^ permalink raw reply

* Re: [PATCH] net: add support for STMicroelectronics Ethernet controllers.
From: David Miller @ 2009-10-14 22:14 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: eric.dumazet, netdev
In-Reply-To: <4AD57693.1010604@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Wed, 14 Oct 2009 08:58:27 +0200

> From d370c0d34c2cfe20de94c05e243ab761e316ab4d Mon Sep 17 00:00:00 2001
> From: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Date: Mon, 12 Oct 2009 11:11:06 +0200
> Subject: [PATCH] net: add support for STMicroelectronics Ethernet controllers.
> 
> This is the driver for the ST MAC 10/100/1000 on-chip Ethernet
> controllers (Synopsys IP blocks).
> 
> Driver documentation:
>  o http://stlinux.com/drupal/kernel/network/stmmac
> Revisions:
>  o http://stlinux.com/drupal/kernel/network/stmmac-driver-revisions
> Performance:
>  o http://stlinux.com/drupal/benchmarks/networking/stmmac
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied, thanks.

^ permalink raw reply
