Netdev List
* [PATCH] net: Use __this_cpu_inc() in fast path
From: Eric Dumazet @ 2010-05-20  8:07 UTC
  To: David Miller; +Cc: netdev

This patch saves 224 bytes of text on my machine.

__this_cpu_inc() generates a single instruction, using no scratch
registers:

  65 ff 04 25 a8 30 01 00      incl   %gs:0x130a8

instead of:

  48 c7 c2 80 30 01 00         mov    $0x13080,%rdx
  65 48 8b 04 25 88 ea 00 00   mov    %gs:0xea88,%rax
  83 44 10 28 01               addl   $0x1,0x28(%rax,%rdx,1)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c   |    3 +--
 net/ipv4/route.c |    3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6c82065..76475f2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2826,8 +2826,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 			skb->dev = master;
 	}
 
-	__get_cpu_var(softnet_data).processed++;
-
+	__this_cpu_inc(softnet_data.processed);
 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
 	skb->mac_len = skb->network_header - skb->mac_header;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 560acc6..8495bce 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -253,8 +253,7 @@ static unsigned			rt_hash_mask __read_mostly;
 static unsigned int		rt_hash_log  __read_mostly;
 
 static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
-#define RT_CACHE_STAT_INC(field) \
-	(__raw_get_cpu_var(rt_cache_stat).field++)
+#define RT_CACHE_STAT_INC(field) __this_cpu_inc(rt_cache_stat.field)
 
 static inline unsigned int rt_hash(__be32 daddr, __be32 saddr, int idx,
 				   int genid)



* Re: tun: Use netif_receive_skb instead of netif_rx
From: Thomas Graf @ 2010-05-20  8:10 UTC
  To: Herbert Xu; +Cc: Eric Dumazet, David Miller, bmb, nhorman, nhorman, netdev
In-Reply-To: <20100520065242.GA8719@gondor.apana.org.au>

On Thu, 2010-05-20 at 16:52 +1000, Herbert Xu wrote: 
> The value is set at socket creation time.  So all sockets created
> via socket(2) automatically gain the ID of the thread creating it.
> Whenever another process touches the socket by either reading or
> writing to it, we will change the socket classid to that of the
> process if it has a valid (non-zero) classid.

There is a fundamental problem with this. The process needs to be
associated with the cgroup before any sockets get created. Sockets
are often created right after the application starts. This means that
the only viable option is to start each application in a wrapper which
assigns itself to the cgroup and then forks the application as its
child. If a task is associated with a cgroup after it has started, the
outcome may be unpredictable because only some of its sockets may end
up being classified.

This was the actual reason for the old method.


* Issue on RTL8169scd(l), can not transmit packet
From: steven @ 2010-05-20  8:21 UTC
  To: netdev; +Cc: romieu

Dear All,

I am porting openPOWERLINK to the Loongson/MIPS platform, so I first have
to write a driver for the RTL8169scl (RTL8169scd in the kernel).

My driver can now receive packets (confirmed by printing the contents of
the packets and comparing them with the data Wireshark captured).

But the RTL8169 doesn't send any packets (I use Wireshark to capture
them). I found that some bits in the TxDesc (Transmit Descriptor), such
as OWN, have been modified, which means the RTL8169 can read/write the
transmit descriptor buffer. Things seem OK, but I cannot capture any
packets.

I use pci_alloc_consistent to allocate the memory for the TxDesc/RxDesc
rings and the Tx/Rx buffers. The first field of my TxDesc is 0xb000003c.
The environment:
                 Target <===> HUB <===> PC_with_wireshark

The LED on the HUB (for the port the target connects to) doesn't flash
when the target keeps sending packets or when it receives packets.

I printed the contents of all the RTL8169 registers under the Linux
kernel driver and found that the configuration in the kernel and in my
driver is the same except for PHY Access, and all the PHY registers are
configured the same as in the kernel. No "system error interrupt"
occurs, but I am not sure whether something is still broken.

[14403.742503] 0x10---------0x9EAD4000
[14403.742991] 0x14---------0x00000000
[14403.743478] 0x18---------0x00000000
[14403.744799] 0x1C---------0x00000000
[14403.745309] 0x20---------0x9EB18000
[14403.745801] 0x24---------0x00000000
[14403.746289] 0x28---------0x05744700
[14403.746778] 0x2C---------0xB61C07FB
[14403.747265] 0x37---------0x0C
[14403.747702] 0x38---------0x00
[14403.749221] 0x3C---------0xC1FF
[14403.749692] 0x3E---------0x0000
[14403.750146] 0x40---------0x1B000600
[14403.750636] 0x44---------0x00000000
[14403.751123] 0x48---------0x09AF2BDA
[14403.751612] 0x50---------0x00
[14403.752973] 0x51---------0x05
[14403.753423] 0x52---------0xCD
[14403.753860] 0x53---------0x00
[14403.754295] 0x54---------0xA1
[14403.754731] 0x55---------0x00
[14403.755164] 0x56---------0x01
[14403.755600] 0x58---------0x00000000
[14403.757212] 0x60---------0x00000000
[14403.757722] 0x6c---------0x06
[14403.758160] 0xC4---------0x8EAD
[14403.758771] 0xC6---------0x7962
[14403.759230] 0xC8---------0x1020
[14403.759682] 0xCA---------0x45D6
[14403.761115] 0xCC---------0x6353
[14403.761589] 0xDA---------0x4000
[14403.762047] 0xE0---------0x0028
[14403.762500] 0xE4---------0x9E76C000
[14403.762987] 0xE8---------0x00000000
[14403.763475] 0xEC---------0x0000003F

Does anyone have some idea? Thank you very very much.

Steven


* [PATCH] net: remove zap_completion_queue
From: Eric Dumazet @ 2010-05-20  9:16 UTC
  To: David Miller; +Cc: netdev

netpoll does interesting work in zap_completion_queue(), but this code
was written before we started orphaning skbs before delivering packets
to the device.

It now makes sense to add a test in dev_kfree_skb_irq() so that an skb
that is already orphaned is not queued, and to remove netpoll's
zap_completion_queue() as a bonus.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c     |    4 +++-
 net/core/netpoll.c |   31 -------------------------------
 2 files changed, 3 insertions(+), 32 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6c82065..37f1641 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1578,7 +1578,9 @@ EXPORT_SYMBOL(__netif_schedule);
 
 void dev_kfree_skb_irq(struct sk_buff *skb)
 {
-	if (atomic_dec_and_test(&skb->users)) {
+	if (!skb->destructor)
+		dev_kfree_skb(skb);
+	else if (atomic_dec_and_test(&skb->users)) {
 		struct softnet_data *sd;
 		unsigned long flags;
 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 94825b1..e034342 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -49,7 +49,6 @@ static atomic_t trapped;
 		(MAX_UDP_CHUNK + sizeof(struct udphdr) + \
 				sizeof(struct iphdr) + sizeof(struct ethhdr))
 
-static void zap_completion_queue(void);
 static void arp_reply(struct sk_buff *skb);
 
 static unsigned int carrier_timeout = 4;
@@ -197,7 +196,6 @@ void netpoll_poll_dev(struct net_device *dev)
 
 	service_arp_queue(dev->npinfo);
 
-	zap_completion_queue();
 }
 
 void netpoll_poll(struct netpoll *np)
@@ -221,40 +219,11 @@ static void refill_skbs(void)
 	spin_unlock_irqrestore(&skb_pool.lock, flags);
 }
 
-static void zap_completion_queue(void)
-{
-	unsigned long flags;
-	struct softnet_data *sd = &get_cpu_var(softnet_data);
-
-	if (sd->completion_queue) {
-		struct sk_buff *clist;
-
-		local_irq_save(flags);
-		clist = sd->completion_queue;
-		sd->completion_queue = NULL;
-		local_irq_restore(flags);
-
-		while (clist != NULL) {
-			struct sk_buff *skb = clist;
-			clist = clist->next;
-			if (skb->destructor) {
-				atomic_inc(&skb->users);
-				dev_kfree_skb_any(skb); /* put this one back */
-			} else {
-				__kfree_skb(skb);
-			}
-		}
-	}
-
-	put_cpu_var(softnet_data);
-}
-
 static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
 {
 	int count = 0;
 	struct sk_buff *skb;
 
-	zap_completion_queue();
 	refill_skbs();
 repeat:
 



* Re: tun: Use netif_receive_skb instead of netif_rx
From: Thomas Graf @ 2010-05-20  9:40 UTC
  To: Herbert Xu; +Cc: Eric Dumazet, David Miller, bmb, nhorman, nhorman, netdev
In-Reply-To: <1274343043.23393.7.camel@lsx.localdomain>

On Thu, 2010-05-20 at 10:10 +0200, Thomas Graf wrote: 
> There is a fundamental problem with this. The process needs to be
> associated with the cgroup before any sockets get created. Sockets
> are often created right after the application starts. This means that
> the only viable option is to start each application in a wrapper which
> assigns itself to the cgroup and then forks the application as its
> child. If a task is associated with a cgroup after it has started, the
> outcome may be unpredictable because only some of its sockets may end
> up being classified.
> 
> This was the actual reason for the old method.

Disregard this. I didn't read your latest patch correctly.


* RFC: netfilter: synproxy iptables target
From: Changli Gao @ 2010-05-20  9:46 UTC
  To: Patrick McHardy; +Cc: Netfilter Developer Mailing List, Linux Netdev List

I have implemented a simple SYNPROXY iptables target. It is much like
the SYNPROXY implementation in OpenBSD's pf, but it holds no state
until the first connection (client to proxy) is established, with the
help of syncookies. The code is hosted on GitHub:

http://github.com/xiaosuo/xiaosuo/tree/master/synproxy/

Currently, it works both on a firewall and with local sockets.

It is at a very early stage, and ugly. I will add a --timeout
parameter to this target, similar to TCP_DEFER_ACCEPT, so that I can
do NAT based on the request data.

For example:

iptables -t nat -A OUTPUT -p tcp -m synproxy --http-url "*.jpg" -j DNAT --to-destination $image_http_server:80

And is there any chance to merge it into mainline?

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: losing IPMI-card by loading netconsole
From: Henning Fehrmann @ 2010-05-20 10:16 UTC
  To: Tejun Heo
  Cc: Ronciak, John, Kirsher, Jeffrey T, Brandeburg, Jesse,
	Allan, Bruce W, Waskiewicz Jr, Peter P, netdev@vger.kernel.org,
	Carsten Aulbert
In-Reply-To: <20100518131216.GA24750@localhost>



Hello,
> 
> 
> Let me re-describe the symptoms.
> I am not loading any ipmi related modules and not the netconsole
> module.
> When booting our current 2.6.32 kernel, we cannot access the IPMI
> remotely.
> 
> We had one case where the IPMI card was accessible while using this
> kernel but probably due to the fact that eth0 was removed. We do not
> consider this case anymore. 
> 
> This problem does not occur when using an older kernel. 
> 
> It likely has nothing to do with netconsole.
> 
> Here is the bisecting result:
> 
> The sha1 sum of the first bad commit is: 
> 6e50912a442947d5fafd296ca6fdcbeb36b163ff
> 
> Hence, the last good commit has:
> b2f8f7525c8aa1fdd8ad8c72c832dfb571d5f768

I 'reverse patched' the changes: 

diff --git a/drivers/net/e1000e/param.c b/drivers/net/e1000e/param.c
index e909f96..1342e0b 100644
--- a/drivers/net/e1000e/param.c
+++ b/drivers/net/e1000e/param.c
@@ -427,6 +427,8 @@ void __devinit e1000e_check_options(struct e1000_adapter *adapter)
                        e1000_validate_option(&crc_stripping, &opt, adapter);
                        if (crc_stripping == OPTION_ENABLED)
                                adapter->flags2 |= FLAG2_CRC_STRIPPING;
+               } else {
+                       adapter->flags2 |= FLAG2_CRC_STRIPPING;
                }
        }
        { /* Kumeran Lock Loss Workaround */


and compiled the kernel. This kernel works and the IPMI card is remotely accessible. 

Can we safely remove these two lines, or will we run into other problems?

Cheers,
Henning

* Re: [PATCH 06/11] netdev: bfin_mac: avoid tx skb overflows in the tx DMA ring
From: Sonic Zhang @ 2010-05-20 10:23 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20100520.030835.267363374.davem@davemloft.net>

On Thu, May 20, 2010 at 6:08 PM, David Miller <davem@davemloft.net> wrote:
> From: Sonic Zhang <sonic.adi@gmail.com>
> Date: Thu, 20 May 2010 17:38:07 +0800
>
>> On Thu, May 20, 2010 at 4:12 AM, David Miller <davem@davemloft.net> wrote:
>>> From: Sonic Zhang <sonic.adi@gmail.com>
>>> Date: Wed, 19 May 2010 17:23:16 +0800
>>>
>>>> No, this doesn't happen, because before ndo_start_xmit() returns, the
>>>> old TX buffers and skbs in the ring, which finished DMA operation, are
>>>> freed. The only difference is that the free operation of a skb is done
>>>> in next tx transfer.
>>>
>>> This is still illegal.
>>>
>>> What if TX activity stops right then, and there is no "next tx
>>> transfer"?
>>>
>> The skbs remaining in the TX ring will eventually be freed when
>> ndo_stop() is called to shut down the interface. So, this is not a problem.
>
> You really don't understand me, and I'm starting to get really
> frustrated.  You must free all packets in your TX ring in a very
> small, finite, amount of time.  This is not optional.  And this
> must happen regardless of what TX traffic occurs in the future,
> that means it must happen even if TX traffic suddenly stops.
>
> Your driver's behavior is absolutely not acceptable.
>
> Leaving the SKB in the TX ring like that means that potentially there
> is a socket in the system or other major resource that cannot be released
> and freed up.
>
> Please stop your driver from keeping packets in the TX ring indefinitely.
>

I forgot to CC the netdev mailing list in my last reply.
Trying again.

Sonic

* Re: [PATCH 06/11] netdev: bfin_mac: avoid tx skb overflows in the tx DMA ring
From: Sonic Zhang @ 2010-05-20 10:36 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20100520.030835.267363374.davem@davemloft.net>

On Thu, May 20, 2010 at 6:08 PM, David Miller <davem@davemloft.net> wrote:
> From: Sonic Zhang <sonic.adi@gmail.com>
> Date: Thu, 20 May 2010 17:38:07 +0800
>
>> On Thu, May 20, 2010 at 4:12 AM, David Miller <davem@davemloft.net> wrote:
>>> From: Sonic Zhang <sonic.adi@gmail.com>
>>> Date: Wed, 19 May 2010 17:23:16 +0800
>>>
>>>> No, this doesn't happen, because before ndo_start_xmit() returns, the
>>>> old TX buffers and skbs in the ring, which finished DMA operation, are
>>>> freed. The only difference is that the free operation of a skb is done
>>>> in next tx transfer.
>>>
>>> This is still illegal.
>>>
>>> What if TX activity stops right then, and there is no "next tx
>>> transfer"?
>>>
>> The skbs remaining in the TX ring will eventually be freed when
>> ndo_stop() is called to shut down the interface. So, this is not a problem.
>
> You really don't understand me, and I'm starting to get really
> frustrated.  You must free all packets in your TX ring in a very
> small, finite, amount of time.  This is not optional.  And this
> must happen regardless of what TX traffic occurs in the future,
> that means it must happen even if TX traffic suddenly stops.
>

OK. I didn't realize that the socket may not be released if its skbs
stay alive somewhere in the system for a long time. Thanks for your
explanation. I will send a new patch that enables the TX interrupt and
frees skbs from the TX ring immediately.


Sonic

> Your driver's behavior is absolutely not acceptable.
>
> Leaving the SKB In the TX ring like that means that potentially there
> is a socket in the system or other major resource that cannot be released
> and freed up.
>
> Please stop your driver from keeping packets in the TX ring indefinitely.
>

* Re: will 2 cpu simultaneously process packets which have same hash value on multiqueue nic?
From: Eilon Greenstein @ 2010-05-20 11:46 UTC
  To: Jon Zhou; +Cc: netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F2497F5B7D6@MILEXCH2.ds.jdsu.net>

On Wed, 2010-05-19 at 22:37 -0700, Jon Zhou wrote:
> Will 2 CPUs simultaneously process packets that have the same hash value on a multiqueue NIC?
> 
> Let's take the Broadcom 57711 driver (bnx2x_main.c) as an example:
> 
> #1 packet1->queue=1
> #2 packet2->queue=1
> 
> Will cpu1 and cpu2 execute bnx2x_rx_int() in parallel to receive packet1 and packet2?
Both packets will be handled by the same queue, and the queue processing
is serialized, so the packets will be handled one after the other.

Regards,
Eilon



* Re: will 2 cpu simultaneously process packets which have same hash value on multiqueue nic?
From: Eric Dumazet @ 2010-05-20 12:00 UTC
  To: eilong; +Cc: Jon Zhou, netdev@vger.kernel.org
In-Reply-To: <1274355967.17395.5.camel@lb-tlvb-eilong.il.broadcom.com>

On Thursday 20 May 2010 at 14:46 +0300, Eilon Greenstein wrote:
> On Wed, 2010-05-19 at 22:37 -0700, Jon Zhou wrote:
> > Will 2 CPUs simultaneously process packets that have the same hash value on a multiqueue NIC?
> > 
> > Let's take the Broadcom 57711 driver (bnx2x_main.c) as an example:
> > 
> > #1 packet1->queue=1
> > #2 packet2->queue=1
> > 
> > Will cpu1 and cpu2 execute bnx2x_rx_int() in parallel to receive packet1 and packet2?
> Both packets will be handled by the same queue, and the queue processing
> is serialized, so the packets will be handled one after the other.

I am scratching my head to understand both the question and your
answer...

If cpu1 is handling an interrupt, cpu2 cannot handle an interrupt for
the same queue at the same time. It must be for a different queue.

Therefore, packets will be handled in parallel.

Serialization might be done later, at the socket layer, to queue packets
in a receive queue for example, if both packets must be delivered to the
same socket.




* Re: will 2 cpu simultaneously process packets which have same hash value on multiqueue nic?
From: Eilon Greenstein @ 2010-05-20 12:20 UTC
  To: Eric Dumazet; +Cc: Jon Zhou, netdev@vger.kernel.org
In-Reply-To: <1274356826.4046.29.camel@edumazet-laptop>

On Thu, 2010-05-20 at 05:00 -0700, Eric Dumazet wrote:
> On Thursday 20 May 2010 at 14:46 +0300, Eilon Greenstein wrote:
> > On Wed, 2010-05-19 at 22:37 -0700, Jon Zhou wrote:
> > > Will 2 CPUs simultaneously process packets that have the same hash value on a multiqueue NIC?
> > > 
> > > Let's take the Broadcom 57711 driver (bnx2x_main.c) as an example:
> > > 
> > > #1 packet1->queue=1
> > > #2 packet2->queue=1
> > > 
> > > Will cpu1 and cpu2 execute bnx2x_rx_int() in parallel to receive packet1 and packet2?
> > Both packets will be handled by the same queue, and the queue processing
> > is serialized, so the packets will be handled one after the other.
> 
> I am scratching my head to understand both the question and your
> answer...
> 
I was trying to say that since both packets are handled on the same
queue, they are handled one after the other and not by two different
CPUs working on the same queue at once.

> If cpu1 is handling an interrupt, cpu2 cannot handle an interrupt for
> the same queue at the same time. It must be for a different queue.
> 
> Therefore, packets will be handled in parallel.
What do you mean by "in parallel"? As you wrote above, only one CPU is
handling the packets from that queue, and since the hash is the same,
even RPS will not allow different CPUs to handle those 2 packets.

> 
> Serialization might be done later, at the socket layer, to queue packets
> in a receive queue for example, if both packets must be delivered to the
> same socket.
Though this is always true, the case of receiving packets on the same
queue is even simpler, isn't it?




* Re: [PATCH] netfilter: fix description of expected checkentry return code on xt_target
From: Patrick McHardy @ 2010-05-20 13:59 UTC
  To: Luciano Coelho; +Cc: netfilter-devel, netdev
In-Reply-To: <1274126430-13744-1-git-send-email-luciano.coelho@nokia.com>

Luciano Coelho wrote:
> The text describing the return codes that are expected on calls to
> checkentry() was incorrect.  Instead of returning true or false, or an error
> code, it should return 0 or an error code.

Applied, thanks.

* [PATCH] sh_eth: Fix memleak in sh_mdio_release
From: Denis Kirjanov @ 2010-05-20 14:00 UTC
  To: davem; +Cc: shimoda.yoshihiro, iwamatsu, morimoto.kuninori, netdev

The memory allocated for the bus IRQ table should be freed when releasing the mii_bus.

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
---

 drivers/net/sh_eth.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sh_eth.c b/drivers/net/sh_eth.c
index 586ed09..501a55f 100644
--- a/drivers/net/sh_eth.c
+++ b/drivers/net/sh_eth.c
@@ -1294,6 +1294,9 @@ static int sh_mdio_release(struct net_device *ndev)
 	/* remove mdio bus info from net_device */
 	dev_set_drvdata(&ndev->dev, NULL);
 
+	/* free interrupts memory */
+	kfree(bus->irq);
+
 	/* free bitbang info */
 	free_mdio_bitbang(bus);
 

* Re: [PATCH iproute] ip: add support for multicast rules
From: Patrick McHardy @ 2010-05-20 14:01 UTC
  To: Stephen Hemminger; +Cc: Linux Netdev List
In-Reply-To: <20100519090354.6dc6d82b@nehalam>

Stephen Hemminger wrote:
> On Tue, 13 Apr 2010 17:06:24 +0200
> Patrick McHardy <kaber@trash.net> wrote:
> 
>> This patch adds support for a "ip mrule" command, which is used
>> to configure multicast routing rules.
>>
>> The corresponding kernel patches have been sent to Dave and
>> should (hopefully) appear in net-next soon.
> 
> The fib_rules.h file in iproute2 is kept in sync with the kernel
> headers. But I do not see the definitions of FIB_RULES_IPV4 etc
> in net-next kernel.  What happened to this?

Those got changed again during the addition of IPv6 support.
I'll send a new version shortly, including IPv6 support.

* Re: RFC: netfilter: synproxy iptables target
From: Patrick McHardy @ 2010-05-20 14:11 UTC
  To: Changli Gao; +Cc: Netfilter Developer Mailing List, Linux Netdev List
In-Reply-To: <AANLkTimp6S7MpwDAT8l-M0ZWjs2HIcUEfL5f8j9-QDZh@mail.gmail.com>

Changli Gao wrote:
> I have implemented a simple SYNPROXY iptables target. It is much like
> the SYNPROXY implementation in OpenBSD's pf, but it holds no state
> until the first connection (client to proxy) is established, with the
> help of syncookies. The code is hosted on GitHub:
> 
> http://github.com/xiaosuo/xiaosuo/tree/master/synproxy/
> 
> Currently, it works both on a firewall and with local sockets.
> 
> It is at a very early stage, and ugly. I will add a --timeout
> parameter to this target, similar to TCP_DEFER_ACCEPT, so that I can
> do NAT based on the request data.
> 
> For example:
> 
> iptables -t nat -A OUTPUT -p tcp -m synproxy --http-url "*.jpg" -j DNAT --to-destination $image_http_server:80
> 
> And is there any chance to merge it into mainline?

If you can state a good use case, sure. I don't know much about the
PF synproxy myself.

* Re: RFC: netfilter: synproxy iptables target
From: Changli Gao @ 2010-05-20 14:21 UTC
  To: Patrick McHardy; +Cc: Netfilter Developer Mailing List, Linux Netdev List
In-Reply-To: <4BF54310.6030004@trash.net>

On Thu, May 20, 2010 at 10:11 PM, Patrick McHardy <kaber@trash.net> wrote:
> Changli Gao wrote:
>> I have implemented a simple SYNPROXY iptables target. It is much like
>> the SYNPROXY implementation in OpenBSD's pf, but it holds no state
>> until the first connection (client to proxy) is established, with the
>> help of syncookies. The code is hosted on GitHub:
>>
>> http://github.com/xiaosuo/xiaosuo/tree/master/synproxy/
>>
>> Currently, it works both on a firewall and with local sockets.
>>
>> It is at a very early stage, and ugly. I will add a --timeout
>> parameter to this target, similar to TCP_DEFER_ACCEPT, so that I can
>> do NAT based on the request data.
>>
>> For example:
>>
>> iptables -t nat -A OUTPUT -p tcp -m synproxy --http-url "*.jpg" -j DNAT --to-destination $image_http_server:80
>>
>> And is there any chance to merge it into mainline?
>
> If you can state a good use case, sure. I don't know much about the
> PF synproxy myself.
>

Pure synproxy can be used on a firewall to protect internal servers
that support neither syncookies nor synproxy from SYN-flood attacks.

Synproxy with deferred connection relay acts as a layer 7 proxy, but
works entirely in kernel space, unlike TCP splicing, which needs
user-space applications to parse the requests and establish the
connections.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

* Re: [net-next-2.6 V9 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-20 14:24 UTC
  To: Scott Feldman; +Cc: davem, netdev, chrisw, arnd
In-Reply-To: <20100518054833.21787.45456.stgit@savbu-pc100.cisco.com>

Scott Feldman wrote:
> From: Scott Feldman <scofeldm@cisco.com>
> 
> Add new netdev ops ndo_{set|get}_vf_port to allow setting of
> port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
> RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
> (added to end of IFLA_cmd list).  These are both nested attributes
> using this layout:
> 
>...

Apologies for the delay; my mail server failed me once again :|

This version looks very good to me, just one question:

> +static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct nlattr *vf_ports;
> +	struct nlattr *vf_port;
> +	int vf;
> +	int err;
> +
> +	vf_ports = nla_nest_start(skb, IFLA_VF_PORTS);
> +	if (!vf_ports)
> +		return -EMSGSIZE;
> +
> +	for (vf = 0; vf < dev_num_vf(dev->dev.parent); vf++) {
> +		vf_port = nla_nest_start(skb, IFLA_VF_PORT);
> +		if (!vf_port) {
> +			nla_nest_cancel(skb, vf_ports);
> +			return -EMSGSIZE;
> +		}
> +		NLA_PUT_U32(skb, IFLA_PORT_VF, vf);
> +		err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
> +		if (err) {
> +nla_put_failure:
> +			nla_nest_cancel(skb, vf_port);
> +			continue;

Are you sure you want to continue here? During a full dump
this means that the last VF port fitting in the skb will most
likely be incomplete since the higher layer code won't
notice that the size was exceeded and the entry should be
dumped into another skb.

> +		}
> +		nla_nest_end(skb, vf_port);
> +	}
> +
> +	nla_nest_end(skb, vf_ports);
> +
> +	return 0;
> +}

* Re: RFC: netfilter: synproxy iptables target
From: Patrick McHardy @ 2010-05-20 14:25 UTC
  To: Changli Gao; +Cc: Netfilter Developer Mailing List, Linux Netdev List
In-Reply-To: <AANLkTimjY_0yKfMtAUxI7He-QH5R5AdDCR3V8eKSgbGq@mail.gmail.com>

Changli Gao wrote:
> On Thu, May 20, 2010 at 10:11 PM, Patrick McHardy <kaber@trash.net> wrote:
>> Changli Gao wrote:
>>> I have implemented a simple SYNPROXY iptables target. It is much like
>>> the SYNPROXY implementation in OpenBSD's pf, but it holds no state
>>> until the first connection (client to proxy) is established, with the
>>> help of syncookies. The code is hosted on GitHub:
>>>
>>> http://github.com/xiaosuo/xiaosuo/tree/master/synproxy/
>>>
>>> Currently, it works both on a firewall and with local sockets.
>>>
>>> It is at a very early stage, and ugly. I will add a --timeout
>>> parameter to this target, similar to TCP_DEFER_ACCEPT, so that I can
>>> do NAT based on the request data.
>>>
>>> For example:
>>>
>>> iptables -t nat -A OUTPUT -p tcp -m synproxy --http-url "*.jpg" -j DNAT --to-destination $image_http_server:80
>>>
>>> And is there any chance to merge it into mainline?
>> If you can state a good use case, sure. I don't know much about the
>> PF synproxy myself.
>>
> 
> Pure synproxy can be used on a firewall to protect internal servers
> that support neither syncookies nor synproxy from SYN-flood attacks.
> 
> Synproxy with deferred connection relay acts as a layer 7 proxy, but
> works entirely in kernel space, unlike TCP splicing, which needs
> user-space applications to parse the requests and establish the
> connections.

I can't say much before seeing any code, but no general objections
from my side.

* Re: RFC: netfilter: synproxy iptables target
From: Eric Dumazet @ 2010-05-20 14:32 UTC
  To: Changli Gao
  Cc: Patrick McHardy, Netfilter Developer Mailing List,
	Linux Netdev List
In-Reply-To: <AANLkTimjY_0yKfMtAUxI7He-QH5R5AdDCR3V8eKSgbGq@mail.gmail.com>

On Thursday 20 May 2010 at 22:21 +0800, Changli Gao wrote:

> 
> Pure synproxy can be used on a firewall to protect internal servers
> that support neither syncookies nor synproxy from SYN-flood attacks.
> 

Protecting servers using conntrack?

That seems very dangerous to me.

> Synproxy with deferred connection relay acts as a layer 7 proxy, but
> works entirely in kernel space, unlike TCP splicing, which needs
> user-space applications to parse the requests and establish the
> connections.
> 

In the example given, only non-persistent connections are handled...

These days, browsers and servers don't establish one socket per HTTP
request...



* Re: RFC: netfilter: synproxy iptables target
From: Changli Gao @ 2010-05-20 14:33 UTC
  To: Patrick McHardy; +Cc: Netfilter Developer Mailing List, Linux Netdev List
In-Reply-To: <4BF5464D.4090409@trash.net>

On Thu, May 20, 2010 at 10:25 PM, Patrick McHardy <kaber@trash.net> wrote:
>
> I can't say much before seeing any code, but no general objections
> from my side.
>

The code is on GitHub (http://github.com/xiaosuo/xiaosuo). As it isn't
in good shape yet and needs further modifications, I am leaving it there
for now.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

* Re: RFC: netfilter: synproxy iptables target
From: Changli Gao @ 2010-05-20 14:42 UTC
  To: Eric Dumazet
  Cc: Patrick McHardy, Netfilter Developer Mailing List,
	Linux Netdev List
In-Reply-To: <1274365963.4046.39.camel@edumazet-laptop>

On Thu, May 20, 2010 at 10:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 20 May 2010 at 22:21 +0800, Changli Gao wrote:
>
>>
>> Pure synproxy can be used on a firewall to protect internal servers
>> that support neither syncookies nor synproxy from SYN-flood attacks.
>>
>
> Protecting servers using conntrack?
>
> That seems very dangerous to me.

If NAT is needed, conntrack is needed anyway. The conntrack entry
won't be confirmed until the connection between the firewall and the
client is established.

>
>> Synproxy with deferred connection relay acts as a layer 7 proxy, but
>> works entirely in kernel space, unlike TCP splicing, which needs
>> user-space applications to parse the requests and establish the
>> connections.
>>
>
> In the example given, only non-persistent connections are handled...
>
> These days, browsers and servers don't establish one socket per HTTP
> request...
>
>

Yeah. But some users still use non-persistent connections, as they want
to fetch URLs in parallel.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

* RE: losing IPMI-card by loading netconsole
From: Allan, Bruce W @ 2010-05-20 15:01 UTC
  To: Henning Fehrmann, Tejun Heo
  Cc: Ronciak, John, Kirsher, Jeffrey T, Brandeburg, Jesse,
	Waskiewicz Jr, Peter P, netdev@vger.kernel.org, Carsten Aulbert
In-Reply-To: <20100520101601.GA26235@localhost>

Those lines should not be removed: the default behavior should be to strip the CRC. A bug in previous versions of the kernel/driver caused the default behavior to be the reverse of what was intended. Your particular BMC, however, appears to require the CRC, so you will need to use the CrcStripping=0 module parameter to override the default behavior.
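The option handling in Henning's reverse patch can be modeled as a small pure function. It shows why the two added lines make stripping the default while an explicit CrcStripping=0 is still honored. This is a simplified sketch, not driver code. `crc_flags()` is hypothetical, and the bit value of `FLAG2_CRC_STRIPPING` is assumed. `OPTION_ENABLED` is assumed to be 1. `option_given` stands in for the driver's per-board check of whether the parameter was supplied. Range validation of the value is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for this sketch; the real definitions live in the e1000e driver. */
#define FLAG2_CRC_STRIPPING (1u << 0)
#define OPTION_ENABLED      1

/*
 * Simplified model of the CrcStripping handling in e1000e_check_options():
 *   option_given  - whether CrcStripping was passed for this board
 *   crc_stripping - the value passed (only meaningful if option_given)
 * Returns the resulting flags2 bits.
 */
static unsigned int crc_flags(bool option_given, int crc_stripping)
{
    unsigned int flags2 = 0;

    if (option_given) {
        /* Explicit setting: strip only if the user enabled it. */
        if (crc_stripping == OPTION_ENABLED)
            flags2 |= FLAG2_CRC_STRIPPING;
    } else {
        /* No setting: the added else branch makes stripping the default. */
        flags2 |= FLAG2_CRC_STRIPPING;
    }
    return flags2;
}
```

In this model, `crc_flags(false, 0)` yields `FLAG2_CRC_STRIPPING`, the intended default. `crc_flags(true, 0)` is the equivalent of loading the module with CrcStripping=0 as suggested for this BMC, and it yields 0.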

-----Original Message-----
From: Henning Fehrmann [mailto:henning.fehrmann@aei.mpg.de] 
Sent: Thursday, May 20, 2010 3:16 AM
To: Tejun Heo
Cc: Ronciak, John; Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W; Waskiewicz Jr, Peter P; netdev@vger.kernel.org; Carsten Aulbert
Subject: Re: loosing IPMI-card by loading netconsole



Hello,
> 
> 
> Let me re-describe the symptoms.
> I am not loading any IPMI-related modules, nor the netconsole
> module.
> When booting our current 2.6.32 kernel, we cannot access the IPMI
> remotely.
> 
> We had one case where the IPMI card was accessible while using this
> kernel but probably due to the fact that eth0 was removed. We do not
> consider this case anymore. 
> 
> This problem does not occur when using an older kernel. 
> 
> It has likely nothing to do with netconsole.
> 
> Here is the bisecting result:
> 
> The sha1 sum of the first bad commit is: 
> 6e50912a442947d5fafd296ca6fdcbeb36b163ff
> 
> Hence, the last good commit has:
> b2f8f7525c8aa1fdd8ad8c72c832dfb571d5f768

I 'reverse patched' the changes: 

diff --git a/drivers/net/e1000e/param.c b/drivers/net/e1000e/param.c
index e909f96..1342e0b 100644
--- a/drivers/net/e1000e/param.c
+++ b/drivers/net/e1000e/param.c
@@ -427,6 +427,8 @@ void __devinit e1000e_check_options(struct e1000_adapter *adapter)
                        e1000_validate_option(&crc_stripping, &opt, adapter);
                        if (crc_stripping == OPTION_ENABLED)
                                adapter->flags2 |= FLAG2_CRC_STRIPPING;
+               } else {
+                       adapter->flags2 |= FLAG2_CRC_STRIPPING;
                }
        }
        { /* Kumeran Lock Loss Workaround */


and compiled the kernel. This kernel works and the IPMI card is remotely accessible. 

Can we safely remove these two lines, or will we run into other problems?

Cheers,
Henning

^ permalink raw reply related

* [ANNOUNCE]: Release of iptables-1.4.8
From: Patrick McHardy @ 2010-05-20 15:08 UTC (permalink / raw)
  To: Netfilter Development Mailinglist
  Cc: netfilter, netfilter-announce, Linux Netdev List, netfilter-core

[-- Attachment #1: Type: text/plain, Size: 656 bytes --]

The netfilter coreteam presents:

    iptables version 1.4.8

the iptables release for the 2.6.34 kernel. Changes include:

- support for the new CT target
- port parsing fixes in the REDIRECT and MASQUERADE targets
- iprange v0 parsing fixes
- removal of MARK target restriction to the mangle table
- documentation updates
- inclusion of the nfnl_osf program for OS fingerprinting support

See the Changelog for more details.

Version 1.4.8 can be obtained from:

http://www.netfilter.org/projects/iptables/downloads.html
ftp://ftp.netfilter.org/pub/iptables/
git://git.netfilter.org/iptables.git

On behalf of the Netfilter Core Team.
Happy firewalling!

[-- Attachment #2: changes-iptables-1.4.8.txt --]
[-- Type: text/plain, Size: 1035 bytes --]

Dmitry V. Levin (3):
      extensions: REDIRECT: fix --to-ports parser
      iptables: add noreturn attribute to exit_tryhelp()
      extensions: MASQUERADE: fix --to-ports parser

Jan Engelhardt (8):
      libxt_comment: avoid use of IPv4-specific examples
      libxt_CT: add a manpage
      iptables: correctly check for too-long chain/target/match names
      doc: libxt_MARK: no longer restricted to mangle table
      doc: remove claim that TCPMSS is limited to mangle
      libxt_recent: add a missing space in output
      doc: add manpage for libxt_osf
      libxt_osf: import nfnl_osf program

Karl Hiramoto (1):
      iptables: optionally disable largefile support

Pablo Neira Ayuso (1):
      CT: fix --ctevents parsing

Patrick McHardy (3):
      extensions: add CT extension
      libxt_CT: print conntrack zone in ->print/->save
      xtables: fix compilation when debugging is enabled

Simon Lodal (1):
      libxt_conntrack: document --ctstate UNTRACKED

Vincent Bernat (1):
      iprange: fix xt_iprange v0 parsing


^ permalink raw reply

* Re: [patch] IPVS: one-packet scheduling
From: Patrick McHardy @ 2010-05-20 15:17 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, lvs-devel, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Nick Chalk
In-Reply-To: <20100519031408.GC18534@verge.net.au>

Simon Horman wrote:
> From: Nick Chalk <nick@loadbalancer.org>
> 
> IPVS: one-packet scheduling
> 
> Allow one-packet scheduling for UDP connections. When a fwmark-based or
> normal virtual service is marked with the '-o' or '--ops' option, all
> connections are created only to schedule one packet. This is useful to
> schedule UDP packets from the same client port to different real servers.
> Recommended with RR or WRR schedulers (the connections are not visible with ipvsadm -L).

I'm afraid it's too late in this merge window for new features,
since Dave has already sent his merge request to Linus.

Please resend once the net-next (and nf-next) tree opens up.
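The behavior described in the quoted patch can be illustrated with a small C model of a UDP virtual service. Without one-packet scheduling, the first packet from a client address/port creates a connection entry that pins later packets to the same real server. With it, no entry outlives the packet, so every packet goes through the round-robin scheduler again. This is a sketch under simplifying assumptions: a fixed-size table, no timeouts and plain RR. `struct vservice`, `schedule_packet()` and the other names are hypothetical, not IPVS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSERVERS  3   /* number of real servers (assumed) */
#define MAX_CONNS 16  /* size of the toy connection table (assumed) */

struct conn {
    uint32_t caddr;   /* client address */
    uint16_t cport;   /* client port */
    int server;       /* real server this flow is pinned to */
    int used;
};

struct vservice {
    int ops;          /* nonzero: one-packet scheduling */
    int rr_next;      /* round-robin cursor */
    struct conn conns[MAX_CONNS];
};

/* Plain round-robin over NSERVERS real servers. */
static int rr_schedule(struct vservice *vs)
{
    int s = vs->rr_next;

    assert(s >= 0 && s < NSERVERS);
    vs->rr_next = (vs->rr_next + 1) % NSERVERS;
    return s;
}

/* Returns the real server index chosen for one incoming UDP packet. */
static int schedule_packet(struct vservice *vs, uint32_t caddr, uint16_t cport)
{
    if (vs->ops)
        return rr_schedule(vs);  /* no entry kept: each packet scheduled afresh */

    /* An existing entry pins the client address/port to one server. */
    for (size_t i = 0; i < MAX_CONNS; i++)
        if (vs->conns[i].used && vs->conns[i].caddr == caddr &&
            vs->conns[i].cport == cport)
            return vs->conns[i].server;

    int s = rr_schedule(vs);
    for (size_t i = 0; i < MAX_CONNS; i++)
        if (!vs->conns[i].used) {
            vs->conns[i] = (struct conn){ .caddr = caddr, .cport = cport,
                                          .server = s, .used = 1 };
            break;
        }
    return s;  /* if the table is full, the packet is scheduled but not remembered */
}
```

With `ops` set, three packets from the same client port land on servers 0, 1 and 2 in turn. With it cleared, all three go to server 0. In the model no entry survives the packet, which mirrors the note that such connections are not visible with `ipvsadm -L`.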

^ permalink raw reply

