Netdev List
 help / color / mirror / Atom feed
* RE: ixgbe: macvlan on PF/VF when SRIOV is enabled
From: Rose, Gregory V @ 2010-05-25 17:08 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, kvm@vger.kernel.org,
	netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net
In-Reply-To: <1274806289.18023.57.camel@localhost.localdomain>

>-----Original Message-----
>From: Shirley Ma [mailto:mashirle@us.ibm.com]
>Sent: Tuesday, May 25, 2010 9:51 AM
>To: Rose, Gregory V
>Cc: Kirsher, Jeffrey T; davem@davemloft.net; kvm@vger.kernel.org;
>netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net
>Subject: RE: ixgbe: macvlan on PF/VF when SRIOV is enabled
>
>On Tue, 2010-05-25 at 08:57 -0700, Rose, Gregory V wrote:
>> Just use ifconfig and vconfig utilities to set the MAC and VLAN for
>> each VF.  There shouldn't be any need for secondary addresses because
>> they're not like physical devices where each VF has a pre-programmed
>> HW MAC address.  The initial MAC address of each VF is generated on
>> the fly during the PF driver initialization. You can change it as you
>> see fit and then put the VF on a VLAN using vconfig.  After you do
>> that you have a macvlan filter for that VF.
>
>I run macvlan test not vlan. macvlan is used to give a second MAC
>address to a network adapter and see it as a new device at the higher
>levels. The command is used as follow:
>
>ip link add link eth4 address 54:52:00:35:e3:20 macvlan2 type macvlan
>
>It will create an interface name macvlan2 with above MAC address. In
>kernel, netdev eth4 maintains this secondary address.

Right, so what I'm saying is that if you load the VF driver it will create and <eth> interface for each VF you've created.  You can assign a MAC and VLAN to that <eth> interface and will get essentially the same behavior.

- Greg


^ permalink raw reply

* Re: [PATCH] mlx4_en: show device's port used
From: Roland Dreier @ 2010-05-25 16:55 UTC (permalink / raw)
  To: Eli Cohen; +Cc: davem, netdev, Linux RDMA list, yevgenyp
In-Reply-To: <20100525135548.GA12749@mtldesk030.lab.mtl.com>

 > Add a sysfs file under /sys/class/net/<ethx> to show the port number within the
 > device that this network interface is using. This is needed as ConnectX devices
 > have two ports and it is useful to know which port the ethernet devices uses.

How do other multi-port ethernet devices handle this?  Seems that the
cleanest way to handle this would be to add a place for drivers to set
the port number, and export it to userspace in generic code (so everyone
does it the same way).
-- 
Roland Dreier <rolandd@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

^ permalink raw reply

* [PATCH net-next] ixgbe: make macvlan on PF working when SRIOV is enabled
From: Shirley Ma @ 2010-05-25 16:55 UTC (permalink / raw)
  To: jeffrey.t.kirsher, gregory.v.rose
  Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	davem@davemloft.net, kvm@vger.kernel.org

When SRIOV is enabled, if we create macvlan on PF and load the VF driver in host 
domain, when macvlan/PF and VF interfaces are all up, the macvlan incoming network
traffics are directed to VF interface not PF interface. This patch has fixed this
issue.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
--- 
 drivers/net/ixgbe/ixgbe_common.c |   26 ++++++++++++++++----------
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_common.c b/drivers/net/ixgbe/ixgbe_common.c
index 1159d91..591dd19 100644
--- a/drivers/net/ixgbe/ixgbe_common.c
+++ b/drivers/net/ixgbe/ixgbe_common.c
@@ -1271,6 +1271,7 @@ s32 ixgbe_init_rx_addrs_generic(struct ixgbe_hw *hw)
 {
 	u32 i;
 	u32 rar_entries = hw->mac.num_rar_entries;
+	struct ixgbe_adapter *adapter = hw->back;
 
 	/*
 	 * If the current mac address is valid, assume it is a software override
@@ -1288,15 +1289,18 @@ s32 ixgbe_init_rx_addrs_generic(struct ixgbe_hw *hw)
 		hw_dbg(hw, "Overriding MAC Address in RAR[0]\n");
 		hw_dbg(hw, " New MAC Addr =%pM\n", hw->mac.addr);
 
-		hw->mac.ops.set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
+		hw->mac.ops.set_rar(hw, 0, hw->mac.addr, adapter->num_vfs,
+				    IXGBE_RAH_AV);
 	}
 	hw->addr_ctrl.overflow_promisc = 0;
 
-	hw->addr_ctrl.rar_used_count = 1;
+	/* reserve VFs receive address entries if any */
+	hw->addr_ctrl.rar_used_count = adapter->num_vfs + 1;
 
-	/* Zero out the other receive addresses. */
-	hw_dbg(hw, "Clearing RAR[1-%d]\n", rar_entries - 1);
-	for (i = 1; i < rar_entries; i++) {
+	/* Zero out the other receive addresses except VFs if any */
+	hw_dbg(hw, "Clearing RAR[%d-%d]\n",
+	       adapter->num_vfs + 1, rar_entries - 1);
+	for (i = adapter->num_vfs + 1; i < rar_entries; i++) {
 		IXGBE_WRITE_REG(hw, IXGBE_RAL(i), 0);
 		IXGBE_WRITE_REG(hw, IXGBE_RAH(i), 0);
 	}
@@ -1368,18 +1372,20 @@ s32 ixgbe_update_uc_addr_list_generic(struct ixgbe_hw *hw,
 	u32 uc_addr_in_use;
 	u32 fctrl;
 	struct netdev_hw_addr *ha;
+	struct ixgbe_adapter *adapter = hw->back;
 
 	/*
 	 * Clear accounting of old secondary address list,
 	 * don't count RAR[0]
 	 */
-	uc_addr_in_use = hw->addr_ctrl.rar_used_count - 1;
+	uc_addr_in_use = hw->addr_ctrl.rar_used_count - 1 - adapter->num_vfs;
 	hw->addr_ctrl.rar_used_count -= uc_addr_in_use;
 	hw->addr_ctrl.overflow_promisc = 0;
 
-	/* Zero out the other receive addresses */
-	hw_dbg(hw, "Clearing RAR[1-%d]\n", uc_addr_in_use + 1);
-	for (i = 0; i < uc_addr_in_use; i++) {
+	/* Zero out the other receive addresses except VFs if any */
+	hw_dbg(hw, "Clearing RAR[%d-%d]\n",
+	       adapter->num_vfs + 1, uc_addr_in_use + adapter->num_vfs + 1);
+	for (i = adapter->num_vfs; i < uc_addr_in_use + adapter->num_vfs; i++) {
 		IXGBE_WRITE_REG(hw, IXGBE_RAL(1+i), 0);
 		IXGBE_WRITE_REG(hw, IXGBE_RAH(1+i), 0);
 	}
@@ -1387,7 +1393,7 @@ s32 ixgbe_update_uc_addr_list_generic(struct ixgbe_hw *hw,
 	/* Add the new addresses */
 	netdev_for_each_uc_addr(ha, netdev) {
 		hw_dbg(hw, " Adding the secondary addresses:\n");
-		ixgbe_add_uc_addr(hw, ha->addr, 0);
+		ixgbe_add_uc_addr(hw, ha->addr, adapter->num_vfs);
 	}
 
 	if (hw->addr_ctrl.overflow_promisc) {




------------------------------------------------------------------------------

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply related

* Re: [PATCH 1/3] cgroups: Add an API to attach a task to current task's cgroup
From: Michael S. Tsirkin @ 2010-05-25 16:53 UTC (permalink / raw)
  To: Paul Menage
  Cc: Sridhar Samudrala, netdev, kvm@vger.kernel.org, lkml, containers,
	lizf
In-Reply-To: <AANLkTinsrFoLVKDFM5pcKcL_6MvAzhR6IzbNmWKh3BDh@mail.gmail.com>

On Thu, May 20, 2010 at 03:22:15PM -0700, Paul Menage wrote:
> On Tue, May 18, 2010 at 5:04 PM, Sridhar Samudrala
> <samudrala.sridhar@gmail.com> wrote:
> > Add a new kernel API to attach a task to current task's cgroup
> > in all the active hierarchies.
> >
> > Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> 
> Reviewed-by: Paul Menage <menage@google.com>
> 
> It would be more efficient to just attach directly to current->cgroups
> rather than potentially creating/destroying one css_set for each
> hierarchy until we've completely converged on current->cgroups - but
> that would require a bunch of refactoring of the guts of
> cgroup_attach_task() to ensure that the right can_attach()/attach()
> callbacks are made. That doesn't really seem worthwhile right now for
> the initial use, that I imagine isn't going to be
> performance-sensitive.
> 
> Paul

Is this patch suitable for 2.6.35?
It is needed to fix the case where vhost user might cause a kernel thread
to consume more CPU than allowed by the cgroup.
Should I just merge it through the vhost tree?
Ack for this?

Thanks,

-- 
MST

^ permalink raw reply

* RE: ixgbe: macvlan on PF/VF when SRIOV is enabled
From: Shirley Ma @ 2010-05-25 16:51 UTC (permalink / raw)
  To: Rose, Gregory V
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, kvm@vger.kernel.org,
	netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net
In-Reply-To: <43F901BD926A4E43B106BF17856F0755A7A58B5F@orsmsx508.amr.corp.intel.com>

On Tue, 2010-05-25 at 08:57 -0700, Rose, Gregory V wrote:
> Just use ifconfig and vconfig utilities to set the MAC and VLAN for
> each VF.  There shouldn't be any need for secondary addresses because
> they're not like physical devices where each VF has a pre-programmed
> HW MAC address.  The initial MAC address of each VF is generated on
> the fly during the PF driver initialization. You can change it as you
> see fit and then put the VF on a VLAN using vconfig.  After you do
> that you have a macvlan filter for that VF.

I run macvlan test not vlan. macvlan is used to give a second MAC
address to a network adapter and see it as a new device at the higher
levels. The command is used as follow:

ip link add link eth4 address 54:52:00:35:e3:20 macvlan2 type macvlan

It will create an interface name macvlan2 with above MAC address. In
kernel, netdev eth4 maintains this secondary address.

Thanks
Shirley



^ permalink raw reply

* [REGRESSION] [PATCH] ssd: fix NULL ptr deref when pcihost_wrapper is used
From: Christoph Fritz @ 2010-05-25 16:37 UTC (permalink / raw)
  To: Michael Buesch, Gary Zambrano
  Cc: John W. Linville, Rafał Miłecki, netdev, linux-kernel

Hi,

 with the attached patch my b44 ethernet works again.


Thanks,
 Christoph

---
Ethernet driver b44 does register ssb by it's pcihost_wrapper
and doesn't set ssb_chipcommon. A check on this value
introduced with commit d53cdbb94a52a920d5420ed64d986c3523a56743
and ea2db495f92ad2cf3301623e60cb95b4062bc484 triggers:

BUG: unable to handle kernel NULL pointer dereference at 00000010
IP: [<c1266c36>] ssb_is_sprom_available+0x16/0x30

Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
---
 drivers/ssb/pci.c   |    9 ++++++---
 drivers/ssb/sprom.c |    1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/ssb/pci.c b/drivers/ssb/pci.c
index 989e275..6dcda86 100644
--- a/drivers/ssb/pci.c
+++ b/drivers/ssb/pci.c
@@ -625,9 +625,12 @@ static int ssb_pci_sprom_get(struct ssb_bus *bus,
 		ssb_printk(KERN_ERR PFX "No SPROM available!\n");
 		return -ENODEV;
 	}
-
-	bus->sprom_offset = (bus->chipco.dev->id.revision < 31) ?
-		SSB_SPROM_BASE1 : SSB_SPROM_BASE31;
+	if (bus->chipco.dev) {	/* can be unavailible! */
+		bus->sprom_offset = (bus->chipco.dev->id.revision < 31) ?
+			SSB_SPROM_BASE1 : SSB_SPROM_BASE31;
+	} else {
+		bus->sprom_offset = SSB_SPROM_BASE1;
+	}
 
 	buf = kcalloc(SSB_SPROMSIZE_WORDS_R123, sizeof(u16), GFP_KERNEL);
 	if (!buf)
diff --git a/drivers/ssb/sprom.c b/drivers/ssb/sprom.c
index 007bc3a..4f7cc8d 100644
--- a/drivers/ssb/sprom.c
+++ b/drivers/ssb/sprom.c
@@ -185,6 +185,7 @@ bool ssb_is_sprom_available(struct ssb_bus *bus)
 	/* this routine differs from specs as we do not access SPROM directly
 	   on PCMCIA */
 	if (bus->bustype == SSB_BUSTYPE_PCI &&
+	    bus->chipco.dev &&	/* can be unavailible! */
 	    bus->chipco.dev->id.revision >= 31)
 		return bus->chipco.capabilities & SSB_CHIPCO_CAP_SPROM;
 
-- 
1.5.6.5
---
b44 0000:02:0e.0: PCI INT A -> Link[C0D8] -> GSI 11 (level, low) -> IRQ 11
ssb: Core 0 found: Fast Ethernet (cc 0x806, rev 0x07, vendor 0x4243)
ssb: Core 1 found: V90 (cc 0x807, rev 0x03, vendor 0x4243)
ssb: Core 2 found: PCI (cc 0x804, rev 0x0A, vendor 0x4243)
BUG: unable to handle kernel NULL pointer dereference at 00000010
IP: [<c1266c36>] ssb_is_sprom_available+0x16/0x30
*pde = 00000000 
Oops: 0000 [#1] PREEMPT 
last sysfs file: /sys/devices/pnp0/00:04/id
Modules linked in: video backlight b44(+) mii yenta_socket sg sr_mod cdrom

Pid: 2013, comm: modprobe Not tainted 2.6.34d #1 3088/HP Compaq nx6110 (EK201ET#ABD)
EIP: 0060:[<c1266c36>] EFLAGS: 00010246 CPU: 0
EIP is at ssb_is_sprom_available+0x16/0x30
EAX: 00000000 EBX: de4de000 ECX: 00000000 EDX: de4de000
ESI: de5bbdd0 EDI: de5bbdd0 EBP: de4de000 ESP: de5bbda8
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process modprobe (pid: 2013, ti=de5ba000 task=df05e700 task.ti=de5ba000)
Stack:
 c12675e0 00000000 de4de000 c12651cb de4de000 de5bbdd0 00000000 de4de2dc
<0> c1265eb3 c12675d0 00000000 00000000 00000000 00000000 00000000 00000000
<0> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<c12675e0>] ? ssb_pci_get_invariants+0x10/0x530
 [<c12651cb>] ? ssb_bus_powerup+0x3b/0x60
 [<c1265eb3>] ? ssb_bus_register+0x163/0x240
 [<c12675d0>] ? ssb_pci_get_invariants+0x0/0x530
 [<c1266004>] ? ssb_bus_pcibus_register+0x24/0x70
 [<c1268220>] ? ssb_pcihost_probe+0x90/0xe0
 [<c1166e0b>] ? local_pci_probe+0xb/0x10
 [<c1167d19>] ? pci_device_probe+0x69/0x90
 [<c11e5dec>] ? driver_probe_device+0x7c/0x1b0
 [<c11e5fa1>] ? __driver_attach+0x81/0x90
 [<c11e5674>] ? bus_for_each_dev+0x54/0x80
 [<c11e5c76>] ? driver_attach+0x16/0x20
 [<c11e5f20>] ? __driver_attach+0x0/0x90
 [<c11e4e67>] ? bus_add_driver+0xb7/0x280
 [<c1166e10>] ? pci_device_shutdown+0x0/0x20
 [<c1167c50>] ? pci_device_remove+0x0/0x40
 [<c11e6227>] ? driver_register+0x67/0x150
 [<c1167f66>] ? __pci_register_driver+0x36/0xa0
 [<e00e1000>] ? b44_init+0x0/0x56 [b44]
 [<e00e102b>] ? b44_init+0x2b/0x56 [b44]
 [<e00e1000>] ? b44_init+0x0/0x56 [b44]
 [<c100111e>] ? do_one_initcall+0x2e/0x190
 [<c104fbc2>] ? sys_init_module+0xb2/0x210
 [<c107f2ac>] ? sys_mmap_pgoff+0xfc/0x110
 [<c1328749>] ? syscall_call+0x7/0xb
Code: 90 90 90 90 90 a1 04 68 4b c1 c3 8d 76 00 8d bc 27 00 00 00 00 83 78 10 01 89 c2 74 08 b0 01 c3 90 8d 74 26 00 8b 80 3c 02 00 00 <80> 78 10 1e 76 ec 8b 82 40 02 00 00 c1 e8 1e 24 01 c3 90 8d b4 
EIP: [<c1266c36>] ssb_is_sprom_available+0x16/0x30 SS:ESP 0068:de5bbda8
CR2: 0000000000000010
---[ end trace c237bc4aec4c0537 ]---

^ permalink raw reply related

* Re: [stable-2.6.32 PATCH] ixgbe: backport bug fix for tx panic
From: Brandeburg, Jesse @ 2010-05-25 16:27 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: stable@kernel.org, greg@kroah.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, davem@davemloft.net, Brandon
In-Reply-To: <AANLkTinHm8ukQ3KlcrMKWvQ-QrlmBcWeVD-i99Q9Z7JL@mail.gmail.com>



On Tue, 25 May 2010, Jeff Kirsher wrote:

> On Mon, May 10, 2010 at 17:46, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >
> > backporting this commit:
> >
> > commit fdd3d631cddad20ad9d3e1eb7dbf26825a8a121f
> > Author: Krishna Kumar <krkumar2@in.ibm.com>
> > Date:   Wed Feb 3 13:13:10 2010 +0000
> >
> >    ixgbe: Fix return of invalid txq
> >
> >    a developer had complained of getting lots of warnings:
> >
> >    "eth16 selects TX queue 98, but real number of TX queues is 64"
> >
> >    http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02200.html
> >
> >    As there was no follow up on that bug, I am submitting this
> >    patch assuming that the other return points will not return
> >    invalid txq's, and also that this fixes the bug (not tested).
> >
> >    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> >    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >    Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> >    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> >    Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > CC: Brandon <brandon@ifup.org>
> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > ---
> >
> >  drivers/net/ixgbe/ixgbe_main.c |    8 ++++++--
> >  1 files changed, 6 insertions(+), 2 deletions(-)
> >
> 
> Greg - status?  Did you queue this patch for the stable release and I missed it?

Maybe we didn't say (and we should have) that this fixes a panic on 
machines with > 64 cores.  Please apply to -stable 32.

^ permalink raw reply

* RE: ixgbe: macvlan on PF/VF when SRIOV is enabled
From: Rose, Gregory V @ 2010-05-25 15:57 UTC (permalink / raw)
  To: Shirley Ma
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, kvm@vger.kernel.org,
	netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net
In-Reply-To: <1274780239.18023.48.camel@localhost.localdomain>

>-----Original Message-----
>From: Shirley Ma [mailto:mashirle@us.ibm.com]
>Sent: Tuesday, May 25, 2010 2:37 AM
>To: Rose, Gregory V
>Cc: Kirsher, Jeffrey T; davem@davemloft.net; kvm@vger.kernel.org;
>netdev@vger.kernel.org; e1000-devel@lists.sourceforge.net
>Subject: RE: ixgbe: macvlan on PF/VF when SRIOV is enabled
>
>On Mon, 2010-05-24 at 10:54 -0700, Rose, Gregory V wrote:
>> We look forward to it and will be happy to provide feedback.
>I have submitted the patch to make macvlan on PF works when SRIOV is
>enabled.
>
>> One thing you can do is allocate VFs and then load the VF driver in
>> your host domain and then assign each of them a macvlan filter.  You'd
>> get a similar effect.
>
>That's I am trying to make it work for macvlan on VFs in host domain. I
>need to add VF secondary addresses in address filter, right?

Just use ifconfig and vconfig utilities to set the MAC and VLAN for each VF.  There shouldn't be any need for secondary addresses because they're not like physical devices where each VF has a pre-programmed HW MAC address.  The initial MAC address of each VF is generated on the fly during the PF driver initialization. You can change it as you see fit and then put the VF on a VLAN using vconfig.  After you do that you have a macvlan filter for that VF.

>
>Do you have any aggregation performance comparison between multiple
>macvlans on PF and single macvlan per VF in host domain?

No, we haven't done any performance testing of that particular sort but I imagine you should be able to get close to line rates using multiple VFs with a single macvlan filter for each.

Regards,

- Greg



^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: Eric Dumazet @ 2010-05-25 15:27 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: netdev
In-Reply-To: <20100525115813.GA28063@kryten>

Le mardi 25 mai 2010 à 21:58 +1000, Anton Blanchard a écrit :
> Hi,
> 
> A build from last night (2.6.34-git9) is hitting the following WARN_ON on
> a ppc64 box:
> 
> Badness at net/ipv4/af_inet.c:154
> 
> NIP [c00000000062b034] .inet_sock_destruct+0x1b0/0x1f0
> LR [c0000000005b376c] .__sk_free+0x58/0x154
> Call Trace:
> [c000000000088c0c] .__local_bh_disable+0xdc/0xfc (unreliable)
> [c0000000005b376c] .__sk_free+0x58/0x154
> [c0000000005b3978] .sk_free+0x4c/0x60
> [c0000000005b3ab4] .sk_common_release+0x128/0x148
> [c00000000061da44] .udp_lib_close+0x28/0x40
> [c00000000062a990] .inet_release+0xd0/0xf8
> [c0000000005adbf8] .sock_release+0x60/0xec
> [c0000000005adcd0] .sock_close+0x4c/0x6c
> [c000000000186554] .__fput+0x184/0x270
> [c00000000018668c] .fput+0x4c/0x60
> [c000000000182248] .filp_close+0xc8/0xf4
> [c0000000000826d4] .put_files_struct+0xc8/0x158
> [c0000000000827d4] .exit_files+0x70/0x8c
> [c0000000000848cc] .do_exit+0x2c8/0x82c
> [c000000000084efc] .do_group_exit+0xcc/0x100
> [c000000000084f5c] .SyS_exit_group+0x2c/0x48
> [c000000000008554] syscall_exit+0x0/0x40
> 
> Which is:
> 
>         WARN_ON(sk->sk_forward_alloc);
> 

Yes, the infamous one :)

Is it reproductible ? What kind of workload is it ?
What is the NIC involved ?




^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Changli Gao @ 2010-05-25 14:42 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <alpine.DEB.2.00.1005251353370.20667@blackhole.kfki.hu>

On Tue, May 25, 2010 at 8:17 PM, Jozsef Kadlecsik
<kadlec@blackhole.kfki.hu> wrote:
> Hi,
>
> On Tue, 25 May 2010, Changli Gao wrote:
>
>> iptables target SYNPROXY.
>>
>> This patch implements an iptables target SYNPROXY, which works in the raw table
>> of the PREROUTING chain, before conntracking system. Syncookies is used, so no
>> new state is introduced into the conntracking system. In fact, until the first
>> connection is established, conntracking system doesn't see any packets. So when
>> there is a SYN-flood attack, conntracking system won't be busy on finding and
>> deleting the un-assured ct.
>
> My main problem with your target is that by using it, important and useful
> TCP options are lost: timestamp and SACK. That pushes back TCP by almost
> twenty years.

Yea. Only MSS option is  supported. But it is better than being DoSed.
And you can set a threshold for SYNPROXY with limit match, then there
isn't any difference if there isn't any SYN-flood attack.

>
> Here you reason for the target that it protects conntrack itself, but in
> the Kconfig text you write that it protects the servers behind the
> firewall. Both can be true, but if the real goal is to defend the servers
> then your target could simply send a faked ACK to complete the three way
> handshake and that way TCP would not be crippled (conntrack timeout
> should still be adjusted).
>

Yes, both can be true. You descried above is called SYNDefender by
Checkpoint, and it doesn't work as well as SYNPROXY.

http://www.usenix.org/events/sec01/invitedtalks/oliver.pdf

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] r8169: Add counters tx_bytes and rx_bytes for ethtool
From: Junchang Wang @ 2010-05-25 14:19 UTC (permalink / raw)
  To: romieu; +Cc: netdev

Traffic stats counters (rx_bytes and tx_bytes) in net_device are
"unsigned long". On 32-bit systems, they wrap around every few
minutes, giving out wrong answers to the amount of traffic. To get the
right message, another available approach is "ethtool -S". However,
r8169 didn't support those two counters so far.

Add traffic counters tx_bytes and rx_bytes with 64-bit width for
ethtool. On 32-bit systems, gcc treats each one as two 32-bit
variables, making the increment not "atomic". But there is no sync
issue since the updates to the counters are serialized by driver logic
in any case. Results provided by ethtool maybe slightly biased if the
read and update operations are interleaved. But the results are much
better than the original ones that always fall into the range from 0
to 4GiB.

Signed-off-by: Junchang Wang <junchangwang@gmail.com>
---
 drivers/net/r8169.c |   30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 217e709..19a2748 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -510,6 +510,8 @@ struct rtl8169_private {

 	struct mii_if_info mii;
 	struct rtl8169_counters counters;
+	u64 tx_bytes;
+	u64 rx_bytes;
 	u32 saved_wolopts;
 };

@@ -1162,6 +1164,8 @@ static void rtl8169_set_msglevel(struct
net_device *dev, u32 value)
 static const char rtl8169_gstrings[][ETH_GSTRING_LEN] = {
 	"tx_packets",
 	"rx_packets",
+	"tx_bytes",
+	"rx_bytes",
 	"tx_errors",
 	"rx_errors",
 	"rx_missed",
@@ -1236,17 +1240,19 @@ static void rtl8169_get_ethtool_stats(struct
net_device *dev,

 	data[0] = le64_to_cpu(tp->counters.tx_packets);
 	data[1] = le64_to_cpu(tp->counters.rx_packets);
-	data[2] = le64_to_cpu(tp->counters.tx_errors);
-	data[3] = le32_to_cpu(tp->counters.rx_errors);
-	data[4] = le16_to_cpu(tp->counters.rx_missed);
-	data[5] = le16_to_cpu(tp->counters.align_errors);
-	data[6] = le32_to_cpu(tp->counters.tx_one_collision);
-	data[7] = le32_to_cpu(tp->counters.tx_multi_collision);
-	data[8] = le64_to_cpu(tp->counters.rx_unicast);
-	data[9] = le64_to_cpu(tp->counters.rx_broadcast);
-	data[10] = le32_to_cpu(tp->counters.rx_multicast);
-	data[11] = le16_to_cpu(tp->counters.tx_aborted);
-	data[12] = le16_to_cpu(tp->counters.tx_underun);
+	data[2] = tp->tx_bytes;
+	data[3] = tp->rx_bytes;
+	data[4] = le64_to_cpu(tp->counters.tx_errors);
+	data[5] = le32_to_cpu(tp->counters.rx_errors);
+	data[6] = le16_to_cpu(tp->counters.rx_missed);
+	data[7] = le16_to_cpu(tp->counters.align_errors);
+	data[8] = le32_to_cpu(tp->counters.tx_one_collision);
+	data[9] = le32_to_cpu(tp->counters.tx_multi_collision);
+	data[10] = le64_to_cpu(tp->counters.rx_unicast);
+	data[11] = le64_to_cpu(tp->counters.rx_broadcast);
+	data[12] = le32_to_cpu(tp->counters.rx_multicast);
+	data[13] = le16_to_cpu(tp->counters.tx_aborted);
+	data[14] = le16_to_cpu(tp->counters.tx_underun);
 }

 static void rtl8169_get_strings(struct net_device *dev, u32
stringset, u8 *data)
@@ -4412,6 +4418,7 @@ static void rtl8169_tx_interrupt(struct net_device *dev,

 		dev->stats.tx_bytes += len;
 		dev->stats.tx_packets++;
+		tp->tx_bytes += len;

 		rtl8169_unmap_tx_skb(tp->pci_dev, tx_skb, tp->TxDescArray + entry);

@@ -4567,6 +4574,7 @@ static int rtl8169_rx_interrupt(struct net_device *dev,

 			dev->stats.rx_bytes += pkt_size;
 			dev->stats.rx_packets++;
+			tp->rx_bytes += pkt_size;
 		}

 		/* Work around for AMD plateform. */

--

^ permalink raw reply related

* [PATCH net] Phonet: fix potential use-after-free in pep_sock_close()
From: Rémi Denis-Courmont @ 2010-05-25 14:11 UTC (permalink / raw)
  To: netdev

From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

sk_common_release() might destroy our last reference to the socket.
So an extra temporary reference is needed during cleanup.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
---
 net/phonet/pep.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index af4d38b..7b048a3 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -626,6 +626,7 @@ static void pep_sock_close(struct sock *sk, long timeout)
 	struct pep_sock *pn = pep_sk(sk);
 	int ifindex = 0;
 
+	sock_hold(sk); /* keep a reference after sk_common_release() */
 	sk_common_release(sk);
 
 	lock_sock(sk);
@@ -644,6 +645,7 @@ static void pep_sock_close(struct sock *sk, long timeout)
 
 	if (ifindex)
 		gprs_detach(sk);
+	sock_put(sk);
 }
 
 static int pep_wait_connreq(struct sock *sk, int noblock)
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH] mlx4_en: show device's port used
From: Eli Cohen @ 2010-05-25 13:55 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux RDMA list, Roland Dreier,
	yevgenyp-VPRAkNaXOzVS1MOuV/RT9w

Add a sysfs file under /sys/class/net/<ethx> to show the port number within the
device that this network interface is using. This is needed as ConnectX devices
have two ports and it is useful to know which port the ethernet devices uses.

Signed-off-by: Eli Cohen <eli-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
---
 drivers/net/mlx4/en_netdev.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c
index 73c3d20..5d6f811 100644
--- a/drivers/net/mlx4/en_netdev.c
+++ b/drivers/net/mlx4/en_netdev.c
@@ -871,6 +871,16 @@ err:
 	return -ENOMEM;
 }
 
+static ssize_t show_port(struct device *d, struct device_attribute *attr,
+			 char *buf)
+{
+	struct mlx4_en_priv *priv = netdev_priv(to_net_dev(d));
+
+	return sprintf(buf, "%d\n", priv->port);
+	return 0;
+}
+
+static DEVICE_ATTR(port, S_IRUGO, show_port, NULL);
 
 void mlx4_en_destroy_netdev(struct net_device *dev)
 {
@@ -1067,6 +1077,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	en_warn(priv, "Using %d TX rings\n", prof->tx_ring_num);
 	en_warn(priv, "Using %d RX rings\n", prof->rx_ring_num);
 
+	err = device_create_file(&dev->dev, &dev_attr_port);
+	if (err) {
+		mlx4_err(mdev, "failed to create sysfs file\n");
+		goto out;
+	}
+
 	priv->registered = 1;
 	queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
 	return 0;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25 13:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100525124636.GA13161@gondor.apana.org.au>

On Tue, 2010-05-25 at 22:46 +1000, Herbert Xu wrote:

> That's not very surprising as you're not checking whether the
> skb is cloned in act_pedit.c:

I meant the test "if (skb_cloned(skb))" failed in such cases;-> So you
couldnt reliably use it. 
If it turns out it is unnecessary, what you describe is what i had in
mind as well.

> BTW, this error handling also looks a bit suss.  Shouldn't it
> drop the packet instead of continuing as if we have successfully
> modified it?

It is not the responsibility of the action to drop packets in a pipeline
rather the responsibility is that of the caller (ref: rule #3 in
Documentation/networking/tc-actions-env-rules.txt). What to do on a
failure such as above is programmable by the action user/admin.

Anyways, now i am really gone. Dont make me look in again Herbert ;->

cheers,
jamal


^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-05-25 12:46 UTC (permalink / raw)
  To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1274790038.3878.926.camel@bigi>

On Tue, May 25, 2010 at 08:20:38AM -0400, jamal wrote:
> On Tue, 2010-05-25 at 22:12 +1000, Herbert Xu wrote:
> 
> > It is guaranteed, it is illegal to hold onto the skb contents
> > without the reference that comes from cloning an skb.
> 
> As i recall this was an issue. And the example i described earlier
> would have tcpdump show edited contents coming out of eth0.

That's not very surprising as you're not checking whether the
skb is cloned in act_pedit.c:

	if (!(skb->tc_verd & TC_OK2MUNGE)) {
		/* should we set skb->cloned? */
		if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
			return p->tcf_action;
		}
	}

This code needs to be changed to something like

	if (skb_cloned(skb)) {
		if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
			return p->tcf_action;
		}
	}

BTW, this error handling also looks a bit suss.  Shouldn't it
drop the packet instead of continuing as if we have successfully
modified it?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH] hso: add support for new products
From: f.aben @ 2010-05-25 12:33 UTC (permalink / raw)
  To: davem, gregkh; +Cc: linux-usb, netdev, j.dumon


This patch adds a few new product id's for the hso driver.

Signed-off-by: Filip Aben <f.aben@option.com>
---
diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index be0cc99..8355f0e 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -475,6 +475,9 @@ static const struct usb_device_id hso_ids[] = {
 	{USB_DEVICE(0x0af0, 0x8302)},
 	{USB_DEVICE(0x0af0, 0x8304)},
 	{USB_DEVICE(0x0af0, 0x8400)},
+	{USB_DEVICE(0x0af0, 0x8600)},
+	{USB_DEVICE(0x0af0, 0x8800)},
+	{USB_DEVICE(0x0af0, 0x8900)},
 	{USB_DEVICE(0x0af0, 0xd035)},
 	{USB_DEVICE(0x0af0, 0xd055)},
 	{USB_DEVICE(0x0af0, 0xd155)},

^ permalink raw reply related

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25 12:20 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100525121202.GA12712@gondor.apana.org.au>

On Tue, 2010-05-25 at 22:12 +1000, Herbert Xu wrote:

> It is guaranteed, it is illegal to hold onto the skb contents
> without the reference that comes from cloning an skb.

As i recall this was an issue. And the example i described earlier
would have tcpdump show edited contents coming out of eth0.

> The DHCP client may keep the AF_PACKET open even if there is no
> ongoing dialogue.

Ok ;-> Not running that dhcp client without looking at how it behaves.

> For ingress, the packet may have looped back from a virtual
> interface.  It may have been cloned on its way out of the virtual
> interface.

and i think this is where you and i ended last time we talked.

> Looking at ptype instead of checking whether the skb is cloned
> simply doesn't work in this case.

I will test in a few days - or if Jiri has the bandwidth i think i can
describe what to test (I have the relevant tests logged somewhere).

I apologize there may be some large latency of about a day before my
next response.

cheers,
jamal


^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jozsef Kadlecsik @ 2010-05-25 12:17 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <1274771218-3204-1-git-send-email-xiaosuo@gmail.com>

Hi,

On Tue, 25 May 2010, Changli Gao wrote:

> iptables target SYNPROXY.
> 
> This patch implements an iptables target SYNPROXY, which works in the raw table
> of the PREROUTING chain, before conntracking system. Syncookies is used, so no
> new state is introduced into the conntracking system. In fact, until the first
> connection is established, conntracking system doesn't see any packets. So when
> there is a SYN-flood attack, conntracking system won't be busy on finding and
> deleting the un-assured ct.

My main problem with your target is that by using it, important and useful 
TCP options are lost: timestamp and SACK. That pushes back TCP by almost 
twenty years.

Here you reason for the target that it protects conntrack itself, but in 
the Kconfig text you write that it protects the servers behind the 
firewall. Both can be true, but if the real goal is to defend the servers 
then your target could simply send a faked ACK to complete the three way 
handshake and that way TCP would not be crippled (conntrack timeout 
should still be adjusted).

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-05-25 12:12 UTC (permalink / raw)
  To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1274789024.3878.919.camel@bigi>

On Tue, May 25, 2010 at 08:03:44AM -0400, jamal wrote:
> On Tue, 2010-05-25 at 20:26 +1000, Herbert Xu wrote:
> 
> > In that case you should be checking whether the skb is cloned.
> 
> That is the general rule used (and what i specify to do in the docs)..
> but i recall there were issues if the packet path emanated from ingress
> and included multiple netdevices (earlier ex with mirror applies). There
> may have been bugs then, eg I could not assume that it i had any ptype
> at all that the ptype will clone the packet. Does tcpdump guarantee
> skb->clone being set? I will try to test some scenarios when i am back
> +settled. 

It is guaranteed, it is illegal to hold onto the skb contents
without the reference that comes from cloning an skb.

> "Most" for people running serious firewalls or routers is not to run
> DHCP servers;-> They may client, but thats a short-lived session.

The DHCP client may keep the AF_PACKET open even if there is no
ongoing dialogue.

> > Also
> > the skb may still be cloned even if there is no AF_PACKET listener.
> > In that case your optimisation may be incorrect.
> 
> Did you mean that as long as there are other ptypes - which may or 
> not be doing af packet? 

For ingress, the packet may have looped back from a virtual
interface.  It may have been cloned on its way out of the virtual
interface.

Looking at ptype instead of checking whether the skb is cloned
simply doesn't work in this case.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-05-25 12:03 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100525102603.GA11494@gondor.apana.org.au>

On Tue, 2010-05-25 at 20:26 +1000, Herbert Xu wrote:

> In that case you should be checking whether the skb is cloned.

That is the general rule used (and what i specify to do in the docs)..
but i recall there were issues if the packet path emanated from ingress
and included multiple netdevices (earlier ex with mirror applies). There
may have been bugs then, eg I could not assume that it i had any ptype
at all that the ptype will clone the packet. Does tcpdump guarantee
skb->clone being set? I will try to test some scenarios when i am back
+settled. 

> After all, tcpdump might have simply filtered the packet out.

True - but i think thats an acceptable compromise.

> BTW, this is the case whenever you run a DHCP client/server.  So
> on most boxes your optimisation will never kick in as is.  

"Most" for people running serious firewalls or routers is not to run
DHCP servers;-> They may client, but thats a short-lived session.

> Also
> the skb may still be cloned even if there is no AF_PACKET listener.
> In that case your optimisation may be incorrect.

Did you mean that as long as there are other ptypes - which may or 
not be doing af packet? 

cheers,
jamal


^ permalink raw reply

* Warning in net/ipv4/af_inet.c:154
From: Anton Blanchard @ 2010-05-25 11:58 UTC (permalink / raw)
  To: netdev


Hi,

A build from last night (2.6.34-git9) is hitting the following WARN_ON on
a ppc64 box:

Badness at net/ipv4/af_inet.c:154

NIP [c00000000062b034] .inet_sock_destruct+0x1b0/0x1f0
LR [c0000000005b376c] .__sk_free+0x58/0x154
Call Trace:
[c000000000088c0c] .__local_bh_disable+0xdc/0xfc (unreliable)
[c0000000005b376c] .__sk_free+0x58/0x154
[c0000000005b3978] .sk_free+0x4c/0x60
[c0000000005b3ab4] .sk_common_release+0x128/0x148
[c00000000061da44] .udp_lib_close+0x28/0x40
[c00000000062a990] .inet_release+0xd0/0xf8
[c0000000005adbf8] .sock_release+0x60/0xec
[c0000000005adcd0] .sock_close+0x4c/0x6c
[c000000000186554] .__fput+0x184/0x270
[c00000000018668c] .fput+0x4c/0x60
[c000000000182248] .filp_close+0xc8/0xf4
[c0000000000826d4] .put_files_struct+0xc8/0x158
[c0000000000827d4] .exit_files+0x70/0x8c
[c0000000000848cc] .do_exit+0x2c8/0x82c
[c000000000084efc] .do_group_exit+0xcc/0x100
[c000000000084f5c] .SyS_exit_group+0x2c/0x48
[c000000000008554] syscall_exit+0x0/0x40

Which is:

        WARN_ON(sk->sk_forward_alloc);

Anton

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jan Engelhardt @ 2010-05-25 11:50 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <AANLkTim2DrXuBPUBK0TseqVJ5lxPRQjK0P_QJVsbtYRC@mail.gmail.com>


On Tuesday 2010-05-25 13:26, Changli Gao wrote:
>> On Tuesday 2010-05-25 09:06, Changli Gao wrote:
>>> net/ipv4/netfilter/Kconfig                  |   12
>>> net/ipv4/netfilter/Makefile                 |    1
>>> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
>>> net/ipv4/syncookies.c                       |   21
>>> net/ipv4/tcp_ipv4.c                         |    5
>>> net/netfilter/nf_conntrack_core.c           |   44 +
>>
>> Please make this an Xtables extension.
>> There is excellent documentation ("Writing Netfilter Modules" to google)
>> on the details if needed.
>
>Hmm. I don't know IPv6 well. So I leave it as an iptables extension,
>and hope sb. comes on with IPv6 support after it gets merged.

This should still be xt even if it does not do ipv6.

>>>+      th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>>>+      BUG_ON(th == NULL);
>>
>> I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
>> (with proper IPV4 header).
>
>I copied the code from file nf_conntrack_proto_tcp.c. In fact, BUG_ON
>isn't useful at all, as tcp_error is called before this function
>is called.

Ah I see it hard-depends on nf_conntrack_ipv4 (in Kconfig).
That too could be done instead by using nf_ct_get_l3proto at runtime.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Changli Gao @ 2010-05-25 11:26 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1005251207450.1725@obet.zrqbmnf.qr>

On Tue, May 25, 2010 at 6:36 PM, Jan Engelhardt <jengelh@medozas.de> wrote:
>
> On Tuesday 2010-05-25 09:06, Changli Gao wrote:
>> net/ipv4/netfilter/Kconfig                  |   12
>> net/ipv4/netfilter/Makefile                 |    1
>> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
>> net/ipv4/syncookies.c                       |   21
>> net/ipv4/tcp_ipv4.c                         |    5
>> net/netfilter/nf_conntrack_core.c           |   44 +
>
> Please make this an Xtables extension.
> There is excellent documentation ("Writing Netfilter Modules" to google)
> on the details if needed.

Hmm. I don't know IPv6 well. So I leave it as an iptables extension,
and hope sb. comes on with IPv6 support after it gets merged.

>
>>--- a/include/net/netfilter/nf_conntrack_core.h
>>+++ b/include/net/netfilter/nf_conntrack_core.h
>>@@ -63,8 +63,21 @@ static inline int nf_conntrack_confirm(struct sk_buff *skb)
>>       if (ct && ct != &nf_conntrack_untracked) {
>>               if (!nf_ct_is_confirmed(ct) && !nf_ct_is_dying(ct))
>>                       ret = __nf_conntrack_confirm(skb);
>>-              if (likely(ret == NF_ACCEPT))
>>+              if (likely(ret == NF_ACCEPT)) {
>>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>>+                      int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
>>+                                       enum ip_conntrack_info);
>>+#endif
>>+
>>                       nf_ct_deliver_cached_events(ct);
>>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>>+                      syn_proxy = rcu_dereference(syn_proxy_post_hook);
>>+                      if (syn_proxy)
>>+                              ret = syn_proxy(skb, ct, skb->nfctinfo);
>>+#endif
>>+              }
>>       }
>>       return ret;
>> }
>
> I'm sure this is possible with lesser macro intrusion - use a separate
> function. Same in nf_conntrack_core.c

I'm thinking about adding two helper functions into structure
nf_conntrack_l4proto. Then the l4 protocol sepcific code won't be in
nf_conntrack_core.c but in nf_conntrack_proto_tcp.c.

>
>>--- a/include/net/tcp.h
>>+++ b/include/net/tcp.h
>>@@ -460,8 +460,18 @@ extern int                        tcp_disconnect(struct sock *sk, int flags);
>> extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
>> extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>>                                   struct ip_options *opt);
>>-extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
>>-                                   __u16 *mss);
>>+extern __u32 __cookie_v4_init_sequence(__be32 saddr, __be32 daddr,
>>+                                     __be16 sport, __be16 dport, __u32 seq,
>>+                                     __u16 *mssp);
>>+static inline __u32 cookie_v4_init_sequence(const struct iphdr *iph,
>>+                                          const struct tcphdr *th,
>>+                                          __u16 *mssp)
>>+{
>>+      return __cookie_v4_init_sequence(iph->saddr, iph->daddr, th->source,
>>+                                       th->dest, ntohl(th->seq), mssp);
>>+}
>
> Since there is only one cookie_v4_init_sequence user AFAICS,
> moving the function there seems to make best sense.

OK. I wanted to keep cookie_v4_init_sequence() and
cookie_v4_check_sequence() symmetric.

>
>>+static int get_mss(u8 *data, int len)
>
> unsigned

I am afraid that I can't do that. As get_mss() may return a negative
value, if there is some invalid TCP option.

>
>>+      /* only support IPv4 now */
>
> Please fix :)

The same reasion as above.

>
>>+      th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>>+      BUG_ON(th == NULL);
>
> I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
> (with proper IPV4 header).

I copied the code from file nf_conntrack_proto_tcp.c. In fact, BUG_ON
isn't useful at all, as tcp_error is called before this function
is called.

>
>>+static int syn_proxy_mangle_pkt(struct sk_buff *skb, struct iphdr *iph,
>>+                              struct tcphdr *th, u32 seq_diff)
>>+{
>>+      __be32 new;
>>+      int olen;
>>+
>>+      if (skb->len < (iph->ihl + th->doff) * 4)
>>+              return NF_DROP;
>
> Verdicts are unsigned too. (Also in other functions.)

OK. Thanks.

>
>>+static struct xt_target synproxy_tg_reg __read_mostly = {
>>+      .name           = "SYNPROXY",
>>+      .family         = NFPROTO_IPV4,
>>+      .target         = synproxy_tg,
>>+      .table          = "raw",
>>+      .hooks          = (1 << NF_INET_PRE_ROUTING),
>
> No need for (),

Yea. Thanks.

>
>>+      .proto          = IPPROTO_TCP,
>>+      .me             = THIS_MODULE,
>>+};
>>+
>>+static int __init synproxy_tg_init(void)
>>+{
>>+      int err, cpu;
>>+
>>+      for_each_possible_cpu(cpu)
>>+              per_cpu(syn_proxy_state, cpu).seq_inited = 0;
>
> Static data starts out zeroed. That also applies to percpu data, doesn't it?
>

Yea, Eric has mentioned that.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 22/27] net/dccp: Use memdup_user
From: Gerrit Renker @ 2010-05-25 11:23 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Arnaldo Carvalho de Melo, David S. Miller, dccp, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005221025040.13021@ask.diku.dk>

| Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

| 
| ---
|  net/dccp/proto.c |   11 +++--------
|  1 file changed, 3 insertions(+), 8 deletions(-)
| 
| diff --git a/net/dccp/proto.c b/net/dccp/proto.c
| index b03ecf6..f79bcef 100644
| --- a/net/dccp/proto.c
| +++ b/net/dccp/proto.c
| @@ -473,14 +473,9 @@ static int dccp_setsockopt_ccid(struct sock *sk, int type,
|  	if (optlen < 1 || optlen > DCCP_FEAT_MAX_SP_VALS)
|  		return -EINVAL;
|  
| -	val = kmalloc(optlen, GFP_KERNEL);
| -	if (val == NULL)
| -		return -ENOMEM;
| -
| -	if (copy_from_user(val, optval, optlen)) {
| -		kfree(val);
| -		return -EFAULT;
| -	}
| +	val = memdup_user(optval, optlen);
| +	if (IS_ERR(val))
| +		return PTR_ERR(val);
|  
|  	lock_sock(sk);
|  	if (type == DCCP_SOCKOPT_TX_CCID || type == DCCP_SOCKOPT_CCID)

^ permalink raw reply

* Re: [PATCH RFC] netfilter: iptables target SYNPROXY
From: Jan Engelhardt @ 2010-05-25 10:36 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov, James Morris,
	netfilter-devel, netdev
In-Reply-To: <1274771218-3204-1-git-send-email-xiaosuo@gmail.com>


On Tuesday 2010-05-25 09:06, Changli Gao wrote:
> net/ipv4/netfilter/Kconfig                  |   12 
> net/ipv4/netfilter/Makefile                 |    1 
> net/ipv4/netfilter/ipt_SYNPROXY.c           |  658 ++++++++++++++++++++++++++++
> net/ipv4/syncookies.c                       |   21 
> net/ipv4/tcp_ipv4.c                         |    5 
> net/netfilter/nf_conntrack_core.c           |   44 +

Please make this an Xtables extension.
There is excellent documentation ("Writing Netfilter Modules" to google)
on the details if needed.

>--- a/include/net/netfilter/nf_conntrack_core.h
>+++ b/include/net/netfilter/nf_conntrack_core.h
>@@ -63,8 +63,21 @@ static inline int nf_conntrack_confirm(struct sk_buff *skb)
> 	if (ct && ct != &nf_conntrack_untracked) {
> 		if (!nf_ct_is_confirmed(ct) && !nf_ct_is_dying(ct))
> 			ret = __nf_conntrack_confirm(skb);
>-		if (likely(ret == NF_ACCEPT))
>+		if (likely(ret == NF_ACCEPT)) {
>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>+			int (*syn_proxy)(struct sk_buff *, struct nf_conn *,
>+					 enum ip_conntrack_info);
>+#endif
>+
> 			nf_ct_deliver_cached_events(ct);
>+#if defined(CONFIG_IP_NF_TARGET_SYNPROXY) || \
>+    defined(CONFIG_IP_NF_TARGET_SYNPROXY_MODULE)
>+			syn_proxy = rcu_dereference(syn_proxy_post_hook);
>+			if (syn_proxy)
>+				ret = syn_proxy(skb, ct, skb->nfctinfo);
>+#endif
>+		}
> 	}
> 	return ret;
> }

I'm sure this is possible with lesser macro intrusion - use a separate
function. Same in nf_conntrack_core.c

>--- a/include/net/tcp.h
>+++ b/include/net/tcp.h
>@@ -460,8 +460,18 @@ extern int			tcp_disconnect(struct sock *sk, int flags);
> extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
> extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, 
> 				    struct ip_options *opt);
>-extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, 
>-				     __u16 *mss);
>+extern __u32 __cookie_v4_init_sequence(__be32 saddr, __be32 daddr,
>+				       __be16 sport, __be16 dport, __u32 seq,
>+				       __u16 *mssp);
>+static inline __u32 cookie_v4_init_sequence(const struct iphdr *iph,
>+					    const struct tcphdr *th,
>+					    __u16 *mssp)
>+{
>+	return __cookie_v4_init_sequence(iph->saddr, iph->daddr, th->source,
>+					 th->dest, ntohl(th->seq), mssp);
>+}

Since there is only one cookie_v4_init_sequence user AFAICS,
moving the function there seems to make best sense.

>+static int get_mss(u8 *data, int len)

unsigned

>+	/* only support IPv4 now */

Please fix :)

>+	th = skb_header_pointer(skb, iph->ihl * 4, sizeof(_th), &_th);
>+	BUG_ON(th == NULL);

I wouldn't call that a bug. I think that can happen on an evil TCP tinygram
(with proper IPV4 header).

>+static int syn_proxy_mangle_pkt(struct sk_buff *skb, struct iphdr *iph,
>+				struct tcphdr *th, u32 seq_diff)
>+{
>+	__be32 new;
>+	int olen;
>+
>+	if (skb->len < (iph->ihl + th->doff) * 4)
>+		return NF_DROP;

Verdicts are unsigned too. (Also in other functions.)

>+static struct xt_target synproxy_tg_reg __read_mostly = {
>+	.name		= "SYNPROXY",
>+	.family		= NFPROTO_IPV4,
>+	.target		= synproxy_tg,
>+	.table		= "raw",
>+	.hooks		= (1 << NF_INET_PRE_ROUTING),

No need for (),

>+	.proto		= IPPROTO_TCP,
>+	.me		= THIS_MODULE,
>+};
>+
>+static int __init synproxy_tg_init(void)
>+{
>+	int err, cpu;
>+
>+	for_each_possible_cpu(cpu)
>+		per_cpu(syn_proxy_state, cpu).seq_inited = 0;

Static data starts out zeroed. That also applies to percpu data, doesn't it?


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox