Netdev List

* Re: PF_RING: Include in main line kernel?
From: Ben Greear @ 2009-10-14 21:27 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Luca Deri, Stephen Hemminger, Brad Doctor, netdev
In-Reply-To: <20091014203640.GB32317@ioremap.net>

[-- Attachment #1: Type: text/plain, Size: 1064 bytes --]

On 10/14/2009 01:36 PM, Evgeniy Polyakov wrote:
> On Wed, Oct 14, 2009 at 10:17:30PM +0200, Luca Deri (deri@ntop.org) wrote:
>> The reason why I decided to patch dev.c is because I wanted PF_RING to
>> decide whether the packet journey shall continue or not. In other
>> words with my solution PF_RING applications can decide whether the
>> received packets will also be delivered to upper layers (and to other
>> kernel network components). This configurable 'early packet drop'
>> allows the overall performance to be significantly increased as
>> received packets are not supposed to be delivered to upper layers;
>> this is a typical situation for many monitoring devices.
>
> This is a feature many projects implement themselves indeed. What about
> creating a special return value from the packet handler which will
> indicate that the packet was already consumed and no further work should
> be done on it?

Maybe something similar to the attached patch?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


[-- Attachment #2: patch0.patch --]
[-- Type: text/plain, Size: 2636 bytes --]

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 907d118..da78f0a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -78,6 +78,7 @@ struct wireless_dev;
 #define NET_RX_CN_MOD		3   /* Storm on its way! */
 #define NET_RX_CN_HIGH		4   /* The storm is here */
 #define NET_RX_BAD		5  /* packet dropped due to kernel error */
+#define NET_RX_CONSUMED       6 /* pkt is consumed, stop rx logic here. */
 
 /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It
  * indicates that the device will soon be dropping packets, or already drops
diff --git a/net/core/dev.c b/net/core/dev.c
index 0101178..d5024b9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2105,6 +2105,10 @@ static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	}
 
 	return br_handle_frame_hook(port, skb);
@@ -2128,6 +2132,10 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	}
 	return macvlan_handle_frame_hook(skb);
 }
@@ -2185,6 +2193,10 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
+		if (*ret == NET_RX_CONSUMED) {
+			kfree_skb(skb); /* we made a copy in deliver_skb */
+			return NULL;
+		}
 	} else {
 		/* Huh? Why does turning on AF_PACKET affect this? */
 		skb->tc_verd = SET_TC_OK2MUNGE(skb->tc_verd);
@@ -2300,8 +2312,13 @@ int netif_receive_skb(struct sk_buff *skb)
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
 		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
 		    ptype->dev == orig_dev) {
-			if (pt_prev)
+			if (pt_prev) {
 				ret = deliver_skb(skb, pt_prev, orig_dev);
+				if (ret == NET_RX_CONSUMED) {
+					kfree_skb(skb); /* we made a copy in deliver_skb */
+					goto out;
+				}
+			}
 			pt_prev = ptype;
 		}
 	}
@@ -2336,8 +2353,13 @@ ncls:
 		if (ptype->type == type &&
 		    (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
 		     ptype->dev == orig_dev)) {
-			if (pt_prev)
+			if (pt_prev) {
 				ret = deliver_skb(skb, pt_prev, orig_dev);
+				if (ret == NET_RX_CONSUMED) {
+					kfree_skb(skb); /* we made a copy in deliver_skb */
+					goto out;
+				}
+			}
 			pt_prev = ptype;
 		}
 	}

^ permalink raw reply related

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Olaf van der Spek @ 2009-10-14 21:12 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <20091014201706.GA24298@1wt.eu>

On Wed, Oct 14, 2009 at 10:17 PM, Willy Tarreau <w@1wt.eu> wrote:
> Yes and I already use that, this gives substantial performance
> boosts on small HTTP requests ; unfortunately I have no control
> over the end-user's browser !

Most accept feature requests, don't they?

Olaf

^ permalink raw reply

* Route Metric at Kernel Level
From: Tim Schneider @ 2009-10-14 20:49 UTC (permalink / raw)
  To: netdev; +Cc: Nils Aschenbruck, Christoph Fuchs

Hello,

I've been trying for a while now to read out the routing metric from a
given socket at kernel level. So far, I've always set the routing-table
metric by hand using the ifmetric command.
When I tried reading the struct sock->sk_dst_cache->metrics values, I
always got a metric value of 0. I thought I was reading the wrong value,
until I discovered that the metric values I put into the standard routing
table are not copied into the kernel routing cache I was reading. Now I
am wondering whether it is supposed to be like that, and if so, why it
was implemented that way.

Also, since I need to read the metric value from the routing table, I am
in need of some help. Maybe just a push in the right direction. I've been
looking at fib_frontend.c and the other fib_* files which, as I
understand, implement the routing tables. Unfortunately they don't seem
to export their functions to my kernel module, so I am unable to use
them. Is there an entry point from which I can use the functions in the
fib_* files, or are they strictly closed off from the rest of the kernel?

Thank you for any help!

Greetings

Tim

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: Mark Smith @ 2009-10-14 20:50 UTC (permalink / raw)
  To: Luca Deri; +Cc: David Miller, bcook, brad.doctor, netdev
In-Reply-To: <C803B824-8249-44BA-ACCD-D9AE4AA21F92@ntop.org>

Hi Luca, David,

On Wed, 14 Oct 2009 22:34:17 +0200
Luca Deri <deri@ntop.org> wrote:

> David
> so do you want me to start porting PF_RING facilities into PF_PACKET?  
> As I have said I'm not against this: my goal is to include this work
> into the linux kernel, as it has been separate for too long.
> 

I'd like to see that. One of my main uses for Linux is as a network
troubleshooting "pocket knife", so anything that can improve or make
more functional the packet capturing capabilities is really appreciated
by people like me.

Thanks,
Mark.

> Luca
> 
> On Oct 14, 2009, at 10:29 PM, David Miller wrote:
> 
> > From: Luca Deri <deri@ntop.org>
> > Date: Wed, 14 Oct 2009 22:26:25 +0200
> >
> >> I agree that some PF_RING features could be merged into
> >> PF_PACKET. However PF_RING is not just about improving packet capture
> >> but it implements facilities that can be used by many monitoring
> >> applications including, packet balancing, reflection, layer-7 packet
> >> filtering, pf_ring socket clustering just to name a few. You can read
> >> about PF_RING features in this tutorial:
> >> http://luca.ntop.org/IM2009_Tutorial.pdf
> >
> > I've already researched several times what PF_RING does and
> > is capable of doing, and my position still stands that none of
> > it can't be added to existing facilities.
> >
> > It's been an out of tree hack for years, and if it stays as
> > a separate facility it's likely to stay that way.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: query: bnx2 and tg3 don't check tcp and/or ip header length validity?
From: William Allen Simpson @ 2009-10-14 20:46 UTC (permalink / raw)
  To: netdev
In-Reply-To: <4AD5F333.3040002@gmail.com>

William Allen Simpson wrote:
> My question is whether it would be OK to add a simple test, and set it to
> zero in case of bad values?
> 
Although both are compiled in my build, I've got no way to test them.  I'm
just going to do the easy thing and set to zero for now.  Somebody that
knows the code -- who should have done real error checking -- could actually
write better error checking and comments about the purpose of cramming the
length of the TCP option field into a tag....

-		tcp_opt_len = tcp_optlen(skb);
+		tcp_opt_len = tcp_option_len_th(tcp_hdr(skb));
+		if (tcp_opt_len < 0)
+			tcp_opt_len = 0;
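
The check in the snippet above can be illustrated in user space. This is a minimal sketch, assuming the helper returns a negative value when the TCP data-offset field is smaller than the fixed 20-byte header; tcp_option_len_from_doff() and tcp_option_len_or_zero() are hypothetical names, not the kernel helper used in the snippet.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_MIN_HDR_WORDS 5	/* fixed 20-byte header, in 32-bit words */
#define TCP_MAX_HDR_WORDS 15	/* largest value of the 4-bit data offset */

/*
 * Hypothetical stand-in for the helper called in the snippet above:
 * returns the option length in bytes, or -1 if the data offset is
 * smaller than the fixed header.
 */
static int tcp_option_len_from_doff(uint8_t doff)
{
	assert(doff <= TCP_MAX_HDR_WORDS);	/* caller passes the 4-bit field */
	if (doff < TCP_MIN_HDR_WORDS)
		return -1;
	return (doff - TCP_MIN_HDR_WORDS) * 4;
}

/* Mirrors the fallback in the snippet: a bad header counts as "no options". */
static int tcp_option_len_or_zero(uint8_t doff)
{
	int len = tcp_option_len_from_doff(doff);

	return len < 0 ? 0 : len;
}
```

Clamping to zero keeps the driver from using a negative length, but it also hides the malformed header from the caller, which is why the message asks for real error checking.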


^ permalink raw reply

* Re: query: tcpdump versus atomic?
From: William Allen Simpson @ 2009-10-14 20:36 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20091014085941.6897d9d9@nehalam>

Stephen Hemminger wrote:
> On Wed, 14 Oct 2009 11:20:12 -0400
> William Allen Simpson <william.allen.simpson@gmail.com> wrote:
> 
>> William Allen Simpson wrote:
>>> Anybody know what code path tcpdump changes to running atomic?
>>>
>>> Is there a function to test whether you're running atomic?
>>>
>> To partially answer my own question, after laboriously #if'ing out code
>> section by section, it affects the tcp_minisocks.c code at
>> tcp_create_openreq_child().
>>
>> I've not found a function to test.  I've found sk->sk_allocation, but
>> that doesn't seem to be dynamically updated to reflect the current state.
>>
> Did you look at your Ethernet driver's code for turning on promiscuous mode?
> It could be leaving IRQs or bottom halves disabled.
> 
[    2.876485] eth0: RealTek RTL8139 at 0x2000, 00:40:2b:6b:61:36, IRQ 17
[    2.876490] eth0:  Identified 8139 chip type 'RTL-8101'

No idea on the driver file name, but that's a fairly popular chipset, so
the code should be fairly well reviewed.

Still, tcp shouldn't be processed with interrupts disabled.  That's *way*
too much code....

And we need a function that tells us whether we're atomic already....  And
better function header descriptions that mention the assumptions.

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: Evgeniy Polyakov @ 2009-10-14 20:36 UTC (permalink / raw)
  To: Luca Deri; +Cc: Stephen Hemminger, Brad Doctor, netdev
In-Reply-To: <8B385E10-4BE2-48A6-BDE0-0AA1A603275E@ntop.org>

On Wed, Oct 14, 2009 at 10:17:30PM +0200, Luca Deri (deri@ntop.org) wrote:
> The reason why I decided to patch dev.c is because I wanted PF_RING to  
> decide whether the packet journey shall continue or not. In other  
> words with my solution PF_RING applications can decide whether the  
> received packets will also be delivered to upper layers (and to other  
> kernel network components). This configurable 'early packet drop'  
> allows the overall performance to be significantly increased as  
> received packets are not supposed to be delivered to upper layers;  
> this is a typical situation for many monitoring devices.

This is a feature many projects implement themselves indeed. What about
creating a special return value from the packet handler which will
indicate that the packet was already consumed and no further work should
be done on it?
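
The idea can be sketched in user space as a handler chain that stops at the first "consumed" verdict. This is an illustration only: the enum, struct and function names below are hypothetical and do not correspond to the kernel's packet_type machinery.

```c
#include <stddef.h>

/* Hypothetical verdicts for this sketch. */
enum rx_verdict {
	RX_CONTINUE = 0,	/* keep walking the handler list */
	RX_CONSUMED = 1		/* handler owns the packet, stop here */
};

struct packet {
	const unsigned char *data;
	size_t len;
};

typedef enum rx_verdict (*rx_handler_fn)(const struct packet *pkt);

/* Calls handlers in order and returns how many of them ran. */
static size_t deliver(const struct packet *pkt,
		      const rx_handler_fn *handlers, size_t count)
{
	size_t ran = 0;

	for (size_t i = 0; i < count; i++) {
		ran++;
		if (handlers[i](pkt) == RX_CONSUMED)
			break;	/* early drop: later handlers never see it */
	}
	return ran;
}

/* A passive tap that never consumes anything. */
static enum rx_verdict tap_handler(const struct packet *pkt)
{
	(void)pkt;
	return RX_CONTINUE;
}

/* A capture-only consumer: takes every non-empty packet. */
static enum rx_verdict capture_handler(const struct packet *pkt)
{
	return pkt->len > 0 ? RX_CONSUMED : RX_CONTINUE;
}

/* Stands in for the upper layers. */
static enum rx_verdict stack_handler(const struct packet *pkt)
{
	(void)pkt;
	return RX_CONTINUE;
}
```

With the chain { tap_handler, capture_handler, stack_handler }, a non-empty packet reaches only the first two handlers, which is the early-drop behaviour a capture-only box wants.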

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: Luca Deri @ 2009-10-14 20:34 UTC (permalink / raw)
  To: David Miller; +Cc: bcook, brad.doctor, netdev
In-Reply-To: <20091014.132911.174712308.davem@davemloft.net>

David
so do you want me to start porting PF_RING facilities into PF_PACKET?  
As I have said I'm not against this: my goal is to include this work
into the linux kernel, as it has been separate for too long.

Luca

On Oct 14, 2009, at 10:29 PM, David Miller wrote:

> From: Luca Deri <deri@ntop.org>
> Date: Wed, 14 Oct 2009 22:26:25 +0200
>
>> I agree that some PF_RING features could be merged into
>> PF_PACKET. However PF_RING is not just about improving packet capture
>> but it implements facilities that can be used by many monitoring
>> applications including, packet balancing, reflection, layer-7 packet
>> filtering, pf_ring socket clustering just to name a few. You can read
>> about PF_RING features in this tutorial:
>> http://luca.ntop.org/IM2009_Tutorial.pdf
>
> I've already researched several times what PF_RING does and
> is capable of doing, and my position still stands that none of
> it can't be added to existing facilities.
>
> It's been an out of tree hack for years, and if it stays as
> a separate facility it's likely to stay that way.

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: David Miller @ 2009-10-14 20:29 UTC (permalink / raw)
  To: deri; +Cc: bcook, brad.doctor, netdev
In-Reply-To: <A2EB721D-3AC9-4815-ADCC-EC0B311C5AE7@ntop.org>

From: Luca Deri <deri@ntop.org>
Date: Wed, 14 Oct 2009 22:26:25 +0200

> I agree that some PF_RING features could be merged into
> PF_PACKET. However PF_RING is not just about improving packet capture
> but it implements facilities that can be used by many monitoring
> applications including, packet balancing, reflection, layer-7 packet
> filtering, pf_ring socket clustering just to name a few. You can read
> about PF_RING features in this tutorial:
> http://luca.ntop.org/IM2009_Tutorial.pdf

I've already researched several times what PF_RING does and
is capable of doing, and my position still stands that none of
it can't be added to existing facilities.

It's been an out of tree hack for years, and if it stays as
a separate facility it's likely to stay that way.

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: David Miller @ 2009-10-14 20:27 UTC (permalink / raw)
  To: deri; +Cc: shemminger, brad.doctor, netdev
In-Reply-To: <8B385E10-4BE2-48A6-BDE0-0AA1A603275E@ntop.org>

From: Luca Deri <deri@ntop.org>
Date: Wed, 14 Oct 2009 22:17:30 +0200

> Another reason is that, with a hook in dev.c, device drivers can
> pass PF_RING packets directly without going through the standard
> kernel mechanisms. For instance I have developed some drivers that if
> they detect the presence of PF_RING, pass received packets directly to
> PF_RING instead of going with NAPI.

There is absolutely no reason to do this.

If the existing infrastructure isn't good or fast enough,
fix it, don't bypass it.

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: Luca Deri @ 2009-10-14 20:26 UTC (permalink / raw)
  To: David Miller; +Cc: bcook, brad.doctor, netdev
In-Reply-To: <20091014.131526.181354809.davem@davemloft.net>

David
I agree that some PF_RING features could be merged into PF_PACKET.  
However PF_RING is not just about improving packet capture but it  
implements facilities that can be used by many monitoring applications  
including, packet balancing, reflection, layer-7 packet filtering,  
pf_ring socket clustering, just to name a few. You can read about
PF_RING features in this tutorial: http://luca.ntop.org/IM2009_Tutorial.pdf

So if the decision is to include in PF_PACKET some unique features
present in PF_RING, I will not be against that, even though I think that
the PF_RING design is different from PF_PACKET, which IMHO is limited
to packet capture.

Regards Luca

On Oct 14, 2009, at 10:15 PM, David Miller wrote:

> From: Luca Deri <deri@ntop.org>
> Date: Wed, 14 Oct 2009 21:54:26 +0200
>
>> - packets to be filtered using both BPF and ACL-like filters
>
> To me this is all that PF_RING adds, the ACL filter features.
>
> And I see no reason why that can't be added to PF_PACKET, we've
> extended PF_PACKET in many ways before (even the mmap ring
> buffer layout can be modified trivially) so there is no reason
> we can't add the ACL filter feature to PF_PACKET as well.
>
> Adding all of PF_RING just for that is a joke.

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: Luca Deri @ 2009-10-14 20:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Brad Doctor, netdev
In-Reply-To: <20091014123328.1ebe35ea@s6510>

Stephen
many thanks for your feedback; I appreciate it.

The reason why I decided to patch dev.c is because I wanted PF_RING to  
decide whether the packet journey shall continue or not. In other  
words with my solution PF_RING applications can decide whether the  
received packets will also be delivered to upper layers (and to other  
kernel network components). This configurable 'early packet drop'  
allows the overall performance to be significantly increased as  
received packets are not supposed to be delivered to upper layers;  
this is a typical situation for many monitoring devices.

Another reason is that, with a hook in dev.c, device drivers can
pass PF_RING packets directly without going through the standard  
kernel mechanisms. For instance I have developed some drivers that if  
they detect the presence of PF_RING, pass received packets directly to  
PF_RING instead of going with NAPI.

These are the reasons behind my design choices. However if you believe
that my arguments are not enough, and it's better to use a different  
approach, I'm prepared to change the implementation. Please let me  
know what you want me to change in order to include PF_RING into the  
kernel source tree.

If you want to test it, for your convenience I have created a patch  
for kernel 2.6.31.3:
https://svn.ntop.org/svn/ntop/trunk/PF_RING/patches/linux-2.6.31.3-1-686-smp-PF_RING.patch.gz

Regards Luca


On Oct 14, 2009, at 9:33 PM, Stephen Hemminger wrote:

> On Wed, 14 Oct 2009 11:02:45 -0600
> Brad Doctor <brad.doctor@gmail.com> wrote:
>
>> Good point for sure. We will make the change and get back to you.
>> Thanks for the quick reply, we appreciate it very much!
>>
>> -brad
>>
>> On Wed, Oct 14, 2009 at 10:01 AM, Stephen Hemminger
>> <shemminger@vyatta.com> wrote:
>>> On Wed, 14 Oct 2009 08:33:08 -0600
>>> Brad Doctor <brad.doctor@gmail.com> wrote:
>>>
>>>> Greetings,
>>>>
>>>> On behalf of the users and developers of the PF_RING project, we  
>>>> would
>>>> like to ask consideration to include the PF_RING module in the main
>>>> line kernel.
>>>>
>>>> PF_RING (http://www.ntop.org/PF_RING.html) is a kernel module that
>>>> implements an mmap()-ed memory ring for accelerating packet capture
>>>> and for providing all the basic features a network monitoring
>>>> application needs. PF_RING includes several features such as packet
>>>> filtering, balancing across capture applications, packet reflection
>>>> (i.e. capture application can decide to bounce selected packets  
>>>> onto
>>>> an as-specified interface). Packets are filtered both using BPF and
>>>> using ACL-like rules (e.g. tcp and ports from 80 to 100). Using
>>>> PF_RING it is also possible to exploit multiple RX queues  
>>>> provided by
>>>> modern NIC adapters. PF_RING achieves a significant speedup by  
>>>> making
>>>> only one copy of the packet. Additionally, PF_RING is able to  
>>>> operate
>>>> in a capture-only installation, further increasing performance.
>>>>
>>>> PF_RING has been around since 2003 and is very mature with an  
>>>> active
>>>> contributing developer base. The developer and user community use a
>>>> mailing list (http://listgateway.unipi.it/pipermail/ntop-misc/) for
>>>> discussions and submissions. PF_RING is used in several projects,
>>>> ranging from distributions such as DD-WRT/OpenWrt to improving
>>>> performance of applications like Snort and Wireshark. Many  
>>>> commercial
>>>> companies around the world in the field of intrusion detection and
>>>> traffic analysis rely on PF_RING for accelerating their products  
>>>> and
>>>> operations.
>>>>
>>>> The PF_RING module relies on a small patch to net/core/dev.c that
>>>> intercepts when a packet is received/transmitted so that it can be
>>>> passed to the PF_RING module when present and with an active  
>>>> listener.
>>>> Other than these minor changes, all the PF_RING code is
>>>> self-contained, comprising just two files: ring.c and ring.h.
>>>> PF_RING
>>>> is the result of many years of research and development  
>>>> specifically
>>>> into high-speed packet capture, and is homegrown. PF_RING uses the
>>>> stock GPL license.
>>>>
>>>> We feel that PF_RING is ready to be included with the mainline  
>>>> kernel.
>>>> We are ready and eager to support PF_RING for the long term.
>>>>
>>>> Thank you in advance for your consideration!
>>>
>>> I was going to wrap pfring up for the staging tree.
>>> The code you put in network receive path is not necessary;
>>> it would be cleaner to just use existing packet type all hook, and
>>> then PF_RING could be a loadable module without having to be  
>>> compiled in.
>>>
>>> --
>>>
>
> If you want, I'll try and do it. Are there any tests other
> than just ntop?


^ permalink raw reply

* Re: TCP_DEFER_ACCEPT is missing counter update
From: Willy Tarreau @ 2009-10-14 20:17 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev, Eric Dumazet
In-Reply-To: <Pine.LNX.4.58.0910140912420.2714@u.domain.uli>

Hello Julian,

On Wed, Oct 14, 2009 at 10:27:50AM +0300, Julian Anastasov wrote:
> 	There were attempts to switch the server socket to
> established state, no SYN-ACK retransmissions after client ACK
> and send FIN (or RST) to client on TCP_DEFER_ACCEPT expiration
> but due to some locking problems it was reverted:
> 
> http://marc.info/?t=121318919000001&r=1&w=2

Interesting read, thanks for the link.

> 	Current handling is as before: drop client ACKs before DATA,
> stay in SYN_RECV state, send RST on outdated ACKs.

OK, that's what I observed too.

(...)
> > > 	This works properly in 2.6.31.3, I set TCP_SYNCNT=1
> > > and TCP_DEFER_ACCEPT then only 2 SYN-ACKs are sent.
> > 
> > That's what I observe too, but the connection is silently dropped
> > afterwards and I'm clearly not sure this was the intended behaviour.
> 
> 	Yes, bad only for firewalls. Note that if
> TCP_SYNCNT/TCP_DEFER_ACCEPT is shorter the client still gets
> RST on next ACK. So, the problem is when client is silent
> after first ACK.

Not only that: we also have the problem that the application layer has no way
to be informed that someone connected and failed to send anything.

I have added the defer_accept feature in haproxy (http load balancer),
just as an optimization to avoid an EAGAIN upon the first recv() attempt
to read the HTTP request after an accept. But users rely on stats and
logs a lot to check if their site works. If clients really connect and
fail to send a request (most often because of PPPoE outgoing MTU issues),
they want to see that in the stats in order to monitor their quality of
service. Also, having the ability to return a "408 request timeout" to
the users is a lot cleaner and easier to debug than not responding
anything at all, especially when there are reverse-proxies between
both ends.

So this behaviour has visible effects to the end user, that are not
expected at all when reading the doc.
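
For reference, the two socket options discussed here are set from user space roughly as follows. This is a minimal Linux-only sketch, not haproxy's code; make_deferred_socket() and get_tcp_int_option() are hypothetical helper names, and the values passed in are placeholders.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Creates a TCP socket and applies TCP_DEFER_ACCEPT (in seconds) and
 * TCP_SYNCNT (retransmit count) to it. Returns the descriptor, or -1
 * on failure. The caller still has to bind() and listen().
 */
static int make_deferred_socket(int defer_secs, int syn_retries)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
		       &defer_secs, sizeof(defer_secs)) < 0 ||
	    setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT,
		       &syn_retries, sizeof(syn_retries)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Reads back an integer TCP-level option, or returns -1 on failure. */
static int get_tcp_int_option(int fd, int optname)
{
	int value = 0;
	socklen_t len = sizeof(value);

	if (getsockopt(fd, IPPROTO_TCP, optname, &value, &len) < 0)
		return -1;
	return value;
}
```

Nothing in this sketch tells the application that a client connected and then stayed silent, which is the monitoring gap described above.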

> > So clearly this is in order to improve chances that the application
> > will receive the connection, no ?
> 
> 	Yes, it is not logical to configure
> TCP_DEFER_ACCEPT < TCP_SYNCNT with the idea TCP_DEFER_ACCEPT
> to reduce the SYN_RECV period. TCP_DEFER_ACCEPT does not
> reduce the SYN_RECV period (before ACK); maybe this is lacking
> in the man page. The idea to optionally give more time for DATA
> to come after ACK is more logical. If TCP_DEFER_ACCEPT
> switches state to ESTABLISHED, then maybe the TCP_DEFER_ACCEPT
> period should start after ACK, e.g. "wait for DATA 10 seconds after
> ACK" sounds logical; it will need a new name...

I don't even need to wait for 10 seconds after first ACK, being
able to drop only the first ACK is already excellent IMHO.

> Note that
> TCP_DEFER_ACCEPT is used also as flag for clients but you can do
> the same with TCP_QUICKACK=0 (to delay first ACK and to send
> DATA with first ACK).

Yes and I already use that, this gives substantial performance
boosts on small HTTP requests ; unfortunately I have no control
over the end-user's browser !

> > Yes this is what happens right now, but reading the man again
> > does not imply to me that the connection will not be accepted
> > once we reach the retransmit limit.
> > 
> > Maybe we have different usages, and different interpretations of
> > the man page can satisfy either, but I don't see what this would be
> > useful for if we silently drop instead of finally accepting.
> 
> 	Maybe we want the same, but currently TCP_DEFER_ACCEPT
> works as an ACK dropper, which is easier to implement.

That's what I like too. No timer, very light requirements, just drop
the first empty ACK I'm not interested in, and that's all. That's
really what I understood from the man page.

> The semantic 'TCP_DEFER_ACCEPT extends the period after ACK'
> is good, you can tune it together with TCP_SYNCNT, to
> extend or not to extend the period. What happens on
> TCP_DEFER_ACCEPT expiration after ACK - we all prefer to
> see FIN, so we have to wait for someone to come up with a new
> implementation.

Well, much too complicated for very little gain IMHO.

Regards,
Willy

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: David Miller @ 2009-10-14 20:15 UTC (permalink / raw)
  To: deri; +Cc: bcook, brad.doctor, netdev
In-Reply-To: <903D7FEC-34E8-4F70-ABC0-E56A3A5FBBE6@ntop.org>

From: Luca Deri <deri@ntop.org>
Date: Wed, 14 Oct 2009 21:54:26 +0200

> - packets to be filtered using both BPF and ACL-like filters

To me this is all that PF_RING adds, the ACL filter features.

And I see no reason why that can't be added to PF_PACKET, we've
extended PF_PACKET in many ways before (even the mmap ring
buffer layout can be modified trivially) so there is no reason
we can't add the ACL filter feature to PF_PACKET as well.

Adding all of PF_RING just for that is a joke.

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: Luca Deri @ 2009-10-14 19:54 UTC (permalink / raw)
  To: Brent Cook; +Cc: Brad Doctor, netdev
In-Reply-To: <200910141319.43884.bcook@bpointsys.com>

Brent
contrary to other socket types, PF_RING allows
- packets to be filtered using both BPF and ACL-like filters
- parsing information is returned as metadata with the packet (i.e.
you don't have to parse the packet again, as happens with BPF)
- ACL-like filters allow you to specify advanced features such as
port ranges or packet payload matching
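
As an illustration of what an ACL-like rule with a port range looks like, here is a minimal user-space sketch built around the "tcp and ports from 80 to 100" example from the original announcement. The struct layout and function names are hypothetical; PF_RING's actual rule structures are not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rule: one protocol plus an inclusive port range. */
struct acl_rule {
	uint8_t  ip_proto;	/* IP protocol number, 0 matches any */
	uint16_t port_low;	/* inclusive lower bound */
	uint16_t port_high;	/* inclusive upper bound */
};

/* Fields a parser would already have extracted from the packet. */
struct flow_key {
	uint8_t  ip_proto;
	uint16_t src_port;
	uint16_t dst_port;
};

static bool port_in_range(const struct acl_rule *rule, uint16_t port)
{
	return port >= rule->port_low && port <= rule->port_high;
}

/* True when the protocol matches and either port falls in the range. */
static bool acl_match(const struct acl_rule *rule, const struct flow_key *key)
{
	if (rule->ip_proto != 0 && rule->ip_proto != key->ip_proto)
		return false;
	return port_in_range(rule, key->src_port) ||
	       port_in_range(rule, key->dst_port);
}
```

A range check like this is a two-comparison test on already-parsed fields, whereas the equivalent classic BPF program has to re-parse the headers for every packet.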

I agree with you that PF_RING has some overlap with PACKET_RX/TX_RING,
but the main idea behind PF_RING is not just to accelerate packet
capture. For instance, in PF_RING you can attach actions to rules, or
extend PF_RING filtering/packet handling by means of plugins.
For instance, I have coded PF_RING plugins that parse and filter, inside
PF_RING, protocols such as SIP, RTP and RTCP, just to name a few, which
I would like to release after PF_RING kernel integration.

For all those who want to see what you can do with PF_RING, I suggest
reading this tutorial: http://luca.ntop.org/MulticorePacketCapture.pdf

Regards Luca

PF_RING allows you to specify multiple
On Oct 14, 2009, at 8:19 PM, Brent Cook wrote:

> On Wednesday 14 October 2009 09:33:08 Brad Doctor wrote:
>> Greetings,
>>
>> On behalf of the users and developers of the PF_RING project, we  
>> would
>> like to ask consideration to include the PF_RING module in the main
>> line kernel.
>>
>> PF_RING (http://www.ntop.org/PF_RING.html) is a kernel module that
>> implements an mmap()-ed memory ring for accelerating packet capture
>> and for providing all the basic features a network monitoring
>> application needs. PF_RING includes several features such as packet
>> filtering, balancing across capture applications, packet reflection
>> (i.e. capture application can decide to bounce selected packets onto
>> an as-specified interface). Packets are filtered both using BPF and
>> using ACL-like rules (e.g. tcp and ports from 80 to 100). Using
>> PF_RING it is also possible to exploit multiple RX queues provided by
>> modern NIC adapters. PF_RING achieves a significant speedup by making
>> only one copy of the packet. Additionally, PF_RING is able to operate
>> in a capture-only installation, further increasing performance.
>
> What is the difference between PF_RING and the existing
> PACKET_RX_RING support (which is now complemented by PACKET_TX_RING).
>
> ggaoed makes use of both of these, though it is one of the few open- 
> source projects I've found that do: http://code.google.com/p/ggaoed/
>
> I've used PACKET_RX_RING with SO_ATTACH_FILTER to implement  
> filtering via BPF code. You can also set PACKET_COPY_THRESH to  
> filter on size, etc. Has anyone done a PF_RING/PACKET_RX_RING speed  
> comparison? They seem feature-wise pretty similar.
>
>> PF_RING has been around since 2003 and is very mature with an active
>> contributing developer base. The developer and user community use a
>> mailing list (http://listgateway.unipi.it/pipermail/ntop-misc/) for
>> discussions and submissions. PF_RING is used in several projects,
>> ranging from distributions such as DD-WRT/OpenWrt to improving
>> performance of applications like Snort and Wireshark. Many commercial
>> companies around the world in the field of intrusion detection and
>> traffic analysis rely on PF_RING for accelerating their products and
>> operations.
>>
>> The PF_RING module relies on a small patch to net/core/dev.c that
>> intercepts when a packet is received/transmitted so that it can be
>> passed to the PF_RING module when present and with an active  
>> listener.
>> Other than these minor changes, all the PF_RING code is
>> self-contained, comprising just two files: ring.c and ring.h. PF_RING
>> is the result of many years of research and development specifically
>> into high-speed packet capture, and is homegrown. PF_RING uses the
>> stock GPL license.
>>
>> We feel that PF_RING is ready to be included with the mainline  
>> kernel.
>> We are ready and eager to support PF_RING for the long term.
>>
>> Thank you in advance for your consideration!
>>
>> -brad
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>


^ permalink raw reply

* Re: kernel mode pppoe ppp if + ifb + mirred redirect, ethernet packets in ifb?!
From: Jarek Poplawski @ 2009-10-14 19:59 UTC (permalink / raw)
  To: Denys Fedoryschenko; +Cc: hadi, netdev
In-Reply-To: <20091014191155.GA3471@del.dom.local>

On Wed, Oct 14, 2009 at 09:11:55PM +0200, Jarek Poplawski wrote:
> Denys Fedoryschenko wrote, On 10/13/2009 12:44 AM:
> ...
> > As I understand it, for the pppoe case, he can just skip the offset for the
> > ethernet and pppoe headers and filter by IP, or not?
> > The current way is maybe better, because someone who wants to count
> > everything with ethernet and pppoe headers can, and someone who wants to
> > count without them also can (by setting an offset; just a bit more difficult).
> > 
> > Like 
> > /sbin/tc filter add dev eth1 protocol 0x8864  parent 2:0 prio 1 u32 \
> > match u32 0x$IPREMOTE_HEX 0xffffffff at 24 flowid 2:$ID
> > (found in LARTC)
> 
> Maybe I'm missing something, but generally (for IP, TCP etc. matches) it
> should work "as usual". I think you and those other users you quoted
> were misled by that tcpdump on ifb. Probably in some configs you
> might have needed this 'protocol 0x8864' or 'protocol all'. You should
> see it on ppp's tcpdump then, like yours:

Hmm... Of course, like yours, where we can't see it ;-)
so 'protocol ip' is enough.

> 
> > PPPoE_146 ~ # tcpdump -ni ppp0 -e -vvv -s 1500 -c 4
> > tcpdump: WARNING:
> > tcpdump: listening on ppp0, link-type LINUX_SLL (Linux cooked), capture size 
> > 1500 bytes
> > 17:03:17.015598 Out ethertype IPv4 (0x0800), length 68: (tos 0x0, ttl 111, id 

BTW, let's note this 'protocol 0x8864' was used with dev eth1, so it's
a different case.
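
For the eth1 case, the "at 24" in the quoted filter can be derived from the header sizes. A small sketch of the arithmetic, assuming the u32 offset is counted from the first byte after the Ethernet header and the IPv4 header follows the PPPoE session and PPP protocol fields directly; the names are illustrative only.

```c
#include <stddef.h>

enum {
	PPPOE_SESSION_HDR_LEN = 6,	/* ver/type, code, session id, length */
	PPP_PROTO_FIELD_LEN   = 2,	/* PPP protocol number, 0x0021 for IPv4 */
	IPV4_SRC_ADDR_OFFSET  = 12,	/* source address within the IPv4 header */
	IPV4_DST_ADDR_OFFSET  = 16	/* destination address within the IPv4 header */
};

/* Offset of an IPv4 header field as seen from the start of the PPPoE header. */
static size_t pppoe_ipv4_field_offset(size_t ipv4_field_offset)
{
	return PPPOE_SESSION_HDR_LEN + PPP_PROTO_FIELD_LEN + ipv4_field_offset;
}
```

Under these assumptions, offset 24 is the IPv4 destination address and offset 20 would be the source address; on the ppp device itself neither header is present, so the usual IP matches apply.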

Jarek P.

^ permalink raw reply

* Re: PF_RING: Include in main line kernel?
From: Stephen Hemminger @ 2009-10-14 19:33 UTC (permalink / raw)
  To: Brad Doctor; +Cc: netdev, Luca Deri
In-Reply-To: <a07586b0910141002m2a9f4466j412245fdae4e5372@mail.gmail.com>

On Wed, 14 Oct 2009 11:02:45 -0600
Brad Doctor <brad.doctor@gmail.com> wrote:

> Good point for sure. We will make the change and get back to you.
> Thanks for the quick reply, we appreciate it very much!
> 
> -brad
> 
> On Wed, Oct 14, 2009 at 10:01 AM, Stephen Hemminger
> <shemminger@vyatta.com> wrote:
> > On Wed, 14 Oct 2009 08:33:08 -0600
> > Brad Doctor <brad.doctor@gmail.com> wrote:
> >
> >> Greetings,
> >>
> >> On behalf of the users and developers of the PF_RING project, we would
> >> like to ask you to consider including the PF_RING module in the main
> >> line kernel.
> >>
> >> PF_RING (http://www.ntop.org/PF_RING.html) is a kernel module that
> >> implements an mmap()-ed memory ring for accelerating packet capture
> >> and for providing all the basic features a network monitoring
> >> application needs. PF_RING includes several features such as packet
> >> filtering, balancing across capture applications, packet reflection
> >> (i.e. capture application can decide to bounce selected packets onto
> >> an as-specified interface). Packets are filtered both using BPF and
> >> using ACL-like rules (e.g. tcp and ports from 80 to 100). Using
> >> PF_RING it is also possible to exploit multiple RX queues provided by
> >> modern NIC adapters. PF_RING achieves a significant speedup by making
> >> only one copy of the packet. Additionally, PF_RING is able to operate
> >> in a capture-only installation, further increasing performance.
> >>
> >> PF_RING has been around since 2003 and is very mature with an active
> >> contributing developer base. The developer and user community use a
> >> mailing list (http://listgateway.unipi.it/pipermail/ntop-misc/) for
> >> discussions and submissions. PF_RING is used in several projects,
> >> ranging from distributions such as DD-WRT/OpenWrt to improving
> >> performance of applications like Snort and Wireshark. Many commercial
> >> companies around the world in the field of intrusion detection and
> >> traffic analysis rely on PF_RING for accelerating their products and
> >> operations.
> >>
> >> The PF_RING module relies on a small patch to net/core/dev.c that
> >> intercepts when a packet is received/transmitted so that it can be
> >> passed to the PF_RING module when present and with an active listener.
> >> Other than these minor changes, all the PF_RING code is
> >> self-contained, comprising just two files: ring.c and ring.h. PF_RING
> >> is the result of many years of research and development specifically
> >> into high-speed packet capture, and is homegrown. PF_RING uses the
> >> stock GPL license.
> >>
> >> We feel that PF_RING is ready to be included with the mainline kernel.
> >> We are ready and eager to support PF_RING for the long term.
> >>
> >> Thank you in advance for your consideration!
> >
> > I was going to wrap pfring up for the staging tree.
> > The code you put in the network receive path is not necessary;
> > it would be cleaner to just use the existing packet type 'all' hook, and
> > then PF_RING could be a loadable module without having to be compiled in.
> >
> > --
> >

If you want, I'll try and do it. Are there any tests other
than just ntop?
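
[Editorial sketch, not part of the original message.] The "packet type all hook" suggestion relies on a registration pattern: a module adds a handler to a list, and the receive path hands every packet to each registered handler. In the kernel this is done with a struct packet_type whose type is ETH_P_ALL, registered through dev_add_pack(). The code below is only a userspace toy model of that pattern; every name in it is hypothetical and it is not PF_RING or kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct packet {
	const unsigned char *data;
	size_t len;
};

/* One registered tap; loosely mirrors the role of a packet_type entry. */
struct tap {
	void (*func)(const struct packet *pkt, void *priv);
	void *priv;
	struct tap *next;
};

static struct tap *tap_list; /* every tap on this list sees every packet */

/* Register a tap (the module-load side of the pattern). */
static void tap_add(struct tap *t)
{
	t->next = tap_list;
	tap_list = t;
}

/* Unregister a tap (the module-unload side of the pattern). */
static void tap_remove(struct tap *t)
{
	struct tap **pp;

	for (pp = &tap_list; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			t->next = NULL;
			return;
		}
	}
}

/* Receive path: hand the packet to all taps, return how many saw it. */
static int deliver_to_taps(const struct packet *pkt)
{
	int n = 0;
	struct tap *t;

	for (t = tap_list; t; t = t->next) {
		t->func(pkt, t->priv);
		n++;
	}
	return n;
}

/* Example tap: a capture module that only counts bytes. */
static void count_bytes(const struct packet *pkt, void *priv)
{
	*(size_t *)priv += pkt->len;
}
```

Because the receive path only walks the list, a tap can be added and removed at run time without touching the receive code, which is the property that would let PF_RING be a loadable module.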

^ permalink raw reply

* Re: [PATCH v2] net/fec_mpc52xx: Fix kernel panic on FEC error
From: Grant Likely @ 2009-10-14 19:32 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: John Bonesio, netdev, linuxppc-dev, davem
In-Reply-To: <4AD62247.4020102@grandegger.com>

On Wed, Oct 14, 2009 at 1:11 PM, Wolfgang Grandegger <wg@grandegger.com> wrote:
> Grant Likely wrote:
>> From: John Bonesio <bones@secretlab.ca>
>>
>> The MDIO bus cannot be accessed at interrupt context, but on an FEC
>> error, the fec_mpc52xx driver reset function also tries to reset the
>> PHY.  Since the error is detected at IRQ context, and the PHY functions
>> try to sleep, the kernel ends up panicking.
>>
>> Resetting the PHY on an FEC error isn't even necessary.  This patch
>> solves the problem by removing the PHY reset entirely.
>
> There is also no need to free and re-allocate the RX buffers in
> mpc52xx_fec_reset().

Write and test a patch for me!  :-)

g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* Re: [Bug #14378] Problems with net/core/skbuff.c
From: Massimo Cetra @ 2009-10-14 19:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Massimo Cetra, David Miller, rjw, linux-kernel, kernel-testers,
	netdev
In-Reply-To: <4AD455E4.2070105@navynet.it>

Massimo Cetra wrote:
> Eric Dumazet wrote:
>> Could you please try the following patch?
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 8d00976..54bf091 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -454,7 +454,7 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
>>          vi->dev->stats.tx_bytes += skb->len;
>>          vi->dev->stats.tx_packets++;
>>          tot_sgs += skb_vnet_hdr(skb)->num_sg;
>> -        kfree_skb(skb);
>> +        dev_kfree_skb_any(skb);
>>      }
>>      return tot_sgs;
>>  }
>>
>>
>>   
> Thank you very much.
>
> Compiling.
> It will be in production in a few minutes.
> I'll let you know if the problem arises again.
> Give me a couple of days, because this problem is triggered randomly:
> sometimes it happens multiple times a day, sometimes not at all.
>
Eric,
thanks for the patch.
The problem hasn't come back, and I haven't seen any warning like that
on either of the servers where it was happening most frequently.

I would say that it's fixed; if it's not, I'll let you know as soon
as it happens again.

Thanks again,
Max
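
[Editorial sketch, not part of the original message.] The thread does not spell out why swapping kfree_skb() for dev_kfree_skb_any() helps. My understanding is that dev_kfree_skb_any() checks the calling context and, when called from hard-IRQ context or with interrupts disabled, queues the buffer to be freed later instead of freeing it on the spot. The code below is a userspace toy model of that "free now or defer" decision. All names are hypothetical, and the in_atomic_ctx flag merely stands in for the kernel's context checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct buf {
	struct buf *next;
};

static bool in_atomic_ctx;       /* stand-in for in_irq() || irqs_disabled() */
static struct buf *completion_q; /* buffers waiting for a safe context */
static int freed_now, freed_later;

/* Free immediately when that is safe, otherwise defer. */
static void buf_free_any(struct buf *b)
{
	if (in_atomic_ctx) {
		b->next = completion_q;
		completion_q = b;
	} else {
		free(b);
		freed_now++;
	}
}

/* Run later from a safe context to release the deferred buffers. */
static void drain_completion_q(void)
{
	while (completion_q) {
		struct buf *b = completion_q;

		completion_q = b->next;
		free(b);
		freed_later++;
	}
}
```

The caller does not need to know which context it is running in, which is convenient for a helper like free_old_xmit_skbs() that may be reached from more than one path.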

^ permalink raw reply

* Re: [PATCH 6/6] Staging: octeon-ethernet: Convert to use PHY Abstraction Layer.
From: Ralf Baechle @ 2009-10-14 19:23 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, netdev
In-Reply-To: <1255547082-7388-6-git-send-email-ddaney@caviumnetworks.com>

On Wed, Oct 14, 2009 at 12:04:42PM -0700, David Daney wrote:

Also queued for 2.6.33.

  Ralf

^ permalink raw reply

* Re: [PATCH 5/6] netdev: Add Ethernet driver for Octeon MGMT devices.
From: Ralf Baechle @ 2009-10-14 19:23 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, netdev
In-Reply-To: <1255547082-7388-5-git-send-email-ddaney@caviumnetworks.com>

On Wed, Oct 14, 2009 at 12:04:41PM -0700, David Daney wrote:

Thanks, queued for 2.6.32.

  Ralf

^ permalink raw reply

* Re: [PATCH 4/6] MIPS: Octeon: Add register definitions for MGMT Ethernet driver.
From: Ralf Baechle @ 2009-10-14 19:22 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, netdev
In-Reply-To: <1255547082-7388-4-git-send-email-ddaney@caviumnetworks.com>

On Wed, Oct 14, 2009 at 12:04:40PM -0700, David Daney wrote:

Thanks, queued for 2.6.32.

  Ralf

^ permalink raw reply

* Re: [PATCH 3/6] MIPS: Octeon: Add platform devices MGMT Ethernet ports.
From: Ralf Baechle @ 2009-10-14 19:22 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, netdev
In-Reply-To: <1255547082-7388-3-git-send-email-ddaney@caviumnetworks.com>

On Wed, Oct 14, 2009 at 12:04:39PM -0700, David Daney wrote:

Thanks, queued for 2.6.32.

  Ralf

^ permalink raw reply

* Re: [PATCH 2/6] net: Add driver for Octeon MDIO buses.
From: Ralf Baechle @ 2009-10-14 19:22 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, netdev
In-Reply-To: <1255547082-7388-2-git-send-email-ddaney@caviumnetworks.com>

On Wed, Oct 14, 2009 at 12:04:38PM -0700, David Daney wrote:

Thanks, queued for 2.6.32.

  Ralf

^ permalink raw reply

* Re: [PATCH 1/6] MIPS: Octeon: Add platform device for MDIO buses.
From: Ralf Baechle @ 2009-10-14 19:22 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, netdev
In-Reply-To: <1255547082-7388-1-git-send-email-ddaney@caviumnetworks.com>

On Wed, Oct 14, 2009 at 12:04:37PM -0700, David Daney wrote:

Thanks, queued.

  Ralf

^ permalink raw reply
