Netdev List
 help / color / mirror / Atom feed
* RE: Micrel KSZ8031 - phy link missing
From: Bruno Thomsen @ 2014-10-10 11:32 UTC (permalink / raw)
  To: Angelo Dureghello, netdev@vger.kernel.org
In-Reply-To: <5437BACA.1000201@gmail.com>

Hey again,

> thanks for the prompt help,
> well i am setting up that clock pin in my board.c, as
> ..
>
> Probably it's me that i am doing something illegal, i explain:
>
> I am moving from a 3.5.1 to a 3.17 kernel, but have a special board.c startup file i cannot convert into DT easily.
> So i am booting using old way (board.c).
> Now, new kernels have "pinctrl" that probably jump over my gpio setting later in the boot, is it possible ?

Looks like you just enable clock output from DA850. You also need to setup PHY to accept RMII clock from MAC.
This can be done with something like this...

static int angelo_board_phy_fixup(struct phy_device *phy)
{
	phy->dev_flags |= MICREL_PHY_50MHZ_CLK;
	return 0;
}

static void __init angelo_board_init(void)
{
	phy_register_fixup_for_uid(PHY_ID_KSZ8031, MICREL_PHY_ID_MASK,
					   angelo_board_phy_fixup);

	// Old board init
}



Venlig hilsen / Best regards

Kamstrup A/S <http://www.kamstrup.dk> 
Bruno Thomsen
Development engineer
Technology

Kamstrup A/S
Industrivej 28
DK-8660 Skanderborg
Tel:	 +45 89 93 10 00	 
Fax:	 +45 89 93 10 01	 
Dir:	 +45 89 93 13 94	 
E-mail:	 bth@kamstrup.dk	 
Web:	 www.kamstrup.dk	 

^ permalink raw reply

* Re: [PATCH net 0/5] net: fix races accessing page->_count
From: Eric Dumazet @ 2014-10-10 11:37 UTC (permalink / raw)
  To: David Laight
  Cc: 'Eric Dumazet', David S. Miller, netdev@vger.kernel.org,
	Alexander Duyck, Andres Lagar-Cavilla, Greg Thelen, Hugh Dickins,
	David Rientjes
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D174C833D@AcuExch.aculab.com>

On Fri, 2014-10-10 at 10:02 +0000, David Laight wrote:
> From: Eric Dumazet
> > This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
> > the page. Other entities in the kernel need to use get_page_unless_zero()
> > to get a reference to the page before testing page properties, so we could
> > loose a refcount increment.
> 
> Plausibly the atomic_inc() could be avoided by allocating the page
> with multiple references, or by gabbing a lot of extra references
> and keeping a local count of when they are 'used' (by passing
> ownership onwards).

Well, this is the idea already in place.

I assume you read __netdev_alloc_frag() ?

Note this is a fix, and a fix should be minimal and obvious.

^ permalink raw reply

* Re: [PATCH net 2/5] igb: fix race accessing page->_count
From: Eric Dumazet @ 2014-10-10 11:47 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Eric Dumazet, David S. Miller, netdev, Alexander Duyck,
	Andres Lagar-Cavilla, Greg Thelen, Hugh Dickins, David Rientjes
In-Reply-To: <CAL3LdT5_uQ07kbB+N6RY1yMe3yfko6O2g+QsTaz1-TDk7A1CCQ@mail.gmail.com>

On Thu, 2014-10-09 at 22:55 -0700, Jeff Kirsher wrote:
> On Thu, Oct 9, 2014 at 10:24 PM, Eric Dumazet <edumazet@google.com> wrote:
> > This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
> > the page. Other entities in the kernel need to use get_page_unless_zero()
> > to get a reference to the page before testing page properties, so we could
> > loose a refcount increment.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> 
> Since this is apart of a series, if the changes to skbuff are ok, then
> the changes to the Intel drivers are ok.

Thanks Jeff, I am sending v2 including your Acked-by and ixgbe title
fix.

^ permalink raw reply

* [PATCH v2 net 0/5] net: fix races accessing page->_count
From: Eric Dumazet @ 2014-10-10 11:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexander Duyck, Jeff Kirsher, Andres Lagar-Cavilla,
	Greg Thelen, Hugh Dickins, David Rientjes, Eric Dumazet

This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.

The only case it is valid is when page->_count is 0, we can use this in
__netdev_alloc_frag()

Note that I never seen crashes caused by these races, the issue was reported
by Andres Lagar-Cavilla and Hugh Dickins.


Eric Dumazet (5):
  fm10k: fix race accessing page->_count
  igb: fix race accessing page->_count
  ixgbe: fix race accessing page->_count
  mlx4: fix race accessing page->_count
  net: fix races in page->_count manipulation

 drivers/net/ethernet/intel/fm10k/fm10k_main.c |  7 +++----
 drivers/net/ethernet/intel/igb/igb_main.c     |  7 +++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  8 +++-----
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  6 +++---
 net/core/skbuff.c                             | 25 ++++++++++++++++++-------
 5 files changed, 30 insertions(+), 23 deletions(-)

-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply

* [PATCH v2 net 1/5] fm10k: fix race accessing page->_count
From: Eric Dumazet @ 2014-10-10 11:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexander Duyck, Jeff Kirsher, Andres Lagar-Cavilla,
	Greg Thelen, Hugh Dickins, David Rientjes, Eric Dumazet
In-Reply-To: <1412941698-17502-1-git-send-email-edumazet@google.com>

This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 6c800a330d66..9d7118a0d67a 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -219,11 +219,10 @@ static bool fm10k_can_reuse_rx_page(struct fm10k_rx_buffer *rx_buffer,
 	/* flip page offset to other buffer */
 	rx_buffer->page_offset ^= FM10K_RX_BUFSZ;
 
-	/* since we are the only owner of the page and we need to
-	 * increment it, just set the value to 2 in order to avoid
-	 * an unnecessary locked operation
+	/* Even if we own the page, we are not allowed to use atomic_set()
+	 * This would break get_page_unless_zero() users.
 	 */
-	atomic_set(&page->_count, 2);
+	atomic_inc(&page->_count);
 #else
 	/* move offset up to the next cache line */
 	rx_buffer->page_offset += truesize;
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH v2 net 2/5] igb: fix race accessing page->_count
From: Eric Dumazet @ 2014-10-10 11:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexander Duyck, Jeff Kirsher, Andres Lagar-Cavilla,
	Greg Thelen, Hugh Dickins, David Rientjes, Eric Dumazet
In-Reply-To: <1412941698-17502-1-git-send-email-edumazet@google.com>

This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ae59c0b108c5..a21b14495ebd 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6545,11 +6545,10 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
 	/* flip page offset to other buffer */
 	rx_buffer->page_offset ^= IGB_RX_BUFSZ;
 
-	/* since we are the only owner of the page and we need to
-	 * increment it, just set the value to 2 in order to avoid
-	 * an unnecessary locked operation
+	/* Even if we own the page, we are not allowed to use atomic_set()
+	 * This would break get_page_unless_zero() users.
 	 */
-	atomic_set(&page->_count, 2);
+	atomic_inc(&page->_count);
 #else
 	/* move offset up to the next cache line */
 	rx_buffer->page_offset += truesize;
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH v2 net 3/5] ixgbe: fix race accessing page->_count
From: Eric Dumazet @ 2014-10-10 11:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexander Duyck, Jeff Kirsher, Andres Lagar-Cavilla,
	Greg Thelen, Hugh Dickins, David Rientjes, Eric Dumazet
In-Reply-To: <1412941698-17502-1-git-send-email-edumazet@google.com>

This is illegal to use atomic_set(&page->_count, 2) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d677b5a23b58..fec5212d4337 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1865,12 +1865,10 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
 	/* flip page offset to other buffer */
 	rx_buffer->page_offset ^= truesize;
 
-	/*
-	 * since we are the only owner of the page and we need to
-	 * increment it, just set the value to 2 in order to avoid
-	 * an unecessary locked operation
+	/* Even if we own the page, we are not allowed to use atomic_set()
+	 * This would break get_page_unless_zero() users.
 	 */
-	atomic_set(&page->_count, 2);
+	atomic_inc(&page->_count);
 #else
 	/* move offset up to the next cache line */
 	rx_buffer->page_offset += truesize;
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH v2 net 4/5] mlx4: fix race accessing page->_count
From: Eric Dumazet @ 2014-10-10 11:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexander Duyck, Jeff Kirsher, Andres Lagar-Cavilla,
	Greg Thelen, Hugh Dickins, David Rientjes, Eric Dumazet
In-Reply-To: <1412941698-17502-1-git-send-email-edumazet@google.com>

This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index a33048ee9621..01660c595f5c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -76,10 +76,10 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
 	page_alloc->dma = dma;
 	page_alloc->page_offset = frag_info->frag_align;
 	/* Not doing get_page() for each frag is a big win
-	 * on asymetric workloads.
+	 * on asymetric workloads. Note we can not use atomic_set().
 	 */
-	atomic_set(&page->_count,
-		   page_alloc->page_size / frag_info->frag_stride);
+	atomic_add(page_alloc->page_size / frag_info->frag_stride - 1,
+		   &page->_count);
 	return 0;
 }
 
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH v2 net 5/5] net: fix races in page->_count manipulation
From: Eric Dumazet @ 2014-10-10 11:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Alexander Duyck, Jeff Kirsher, Andres Lagar-Cavilla,
	Greg Thelen, Hugh Dickins, David Rientjes, Eric Dumazet
In-Reply-To: <1412941698-17502-1-git-send-email-edumazet@google.com>

This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.

The only case it is valid is when page->_count is 0

Fixes: 540eb7bf0bbed ("net: Update alloc frag to reduce get/put page usage and recycle pages")
Signed-off-by: Eric Dumaze <edumazet@google.com>
---
 net/core/skbuff.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a30d750647e7..829d013745ab 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -360,18 +360,29 @@ refill:
 				goto end;
 		}
 		nc->frag.size = PAGE_SIZE << order;
-recycle:
-		atomic_set(&nc->frag.page->_count, NETDEV_PAGECNT_MAX_BIAS);
+		/* Even if we own the page, we do not use atomic_set().
+		 * This would break get_page_unless_zero() users.
+		 */
+		atomic_add(NETDEV_PAGECNT_MAX_BIAS - 1,
+			   &nc->frag.page->_count);
 		nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
 		nc->frag.offset = 0;
 	}
 
 	if (nc->frag.offset + fragsz > nc->frag.size) {
-		/* avoid unnecessary locked operations if possible */
-		if ((atomic_read(&nc->frag.page->_count) == nc->pagecnt_bias) ||
-		    atomic_sub_and_test(nc->pagecnt_bias, &nc->frag.page->_count))
-			goto recycle;
-		goto refill;
+		if (atomic_read(&nc->frag.page->_count) != nc->pagecnt_bias) {
+			if (!atomic_sub_and_test(nc->pagecnt_bias,
+						 &nc->frag.page->_count))
+				goto refill;
+			/* OK, page count is 0, we can safely set it */
+			atomic_set(&nc->frag.page->_count,
+				   NETDEV_PAGECNT_MAX_BIAS);
+		} else {
+			atomic_add(NETDEV_PAGECNT_MAX_BIAS - nc->pagecnt_bias,
+				   &nc->frag.page->_count);
+		}
+		nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
+		nc->frag.offset = 0;
 	}
 
 	data = page_address(nc->frag.page) + nc->frag.offset;
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* Re: [PATCH v3] Add support for GPIOs for SMSC LAN95xx chips.
From: Bjørn Mork @ 2014-10-10 12:02 UTC (permalink / raw)
  To: Evgeny Boger
  Cc: Daniele Forsi, Steve Glendinning, David Miller, netdev, linux-usb
In-Reply-To: <1412806498-22556-1-git-send-email-boger@contactless.ru>

Evgeny Boger <boger@contactless.ru> writes:

> For LAN951x:
> GPIOs with offsets 0-7 are named "GPIO3" - "GPIO7",

The number of offsets does not match the number of names.



Bjørn

^ permalink raw reply

* Re: Micrel KSZ8031 - phy link missing
From: Angelo Dureghello @ 2014-10-10 12:54 UTC (permalink / raw)
  To: Bruno Thomsen, netdev@vger.kernel.org
In-Reply-To: <915054555B5659448ACF8A70E114824D0163C7A7F9@Exchange2010.kamstrup.dk>

Dear Bruno,

i added the fixup, nothing changes.
I am tracing flags at ksz8021_config_ini:

Starting network...
davinci_mdio davinci_mdio.0: resetting idled controller
micrel.c: ksz8021_config_init: flags: 0x00000203
net eth0: attached PHY driver [Micrel KSZ8031] 
(mii_bus:phy_addr=davinci_mdio-0:00, id=221556)
udhcpc (v1.20.2) started
Sending discover...
Sending discover...
Sending discover...
No lease, failing

So, MICREL_PHY_50MHZ_CLK seems to be set.
I have the suspect that the clock is not received from the phy. 
Unfortunately i can't check
with the scope, the micrel chip is on a hidden part of the board.

 From what i remember, when i was trying to init emac/phy by devicetree, 
it was working.

[root@barix ~]# cat /sys/class/net/eth0/carrier
0
[root@barix ~]# cat /sys/class/net/eth0/carrier_changes
1
[root@barix ~]#

I realize at this point is my own debugging / issue.
But if you have any idea how to debug this, i can put some traces here 
and there.

Still many thanks,
Angelo



On 10/10/2014 13:32, Bruno Thomsen wrote:
> Hey again,
> Looks like you just enable clock output from DA850. You also need to setup PHY to accept RMII clock from MAC.
> This can be done with something like this...
>
> ...
>   
>

^ permalink raw reply

* [PATCH] ipv6: notify userspace when we added or changed an ipv6 token
From: Lubomir Rintel @ 2014-10-10 14:08 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, David S. Miller, Lubomir Rintel,
	Hannes Frederic Sowa, Daniel Borkmann

NetworkManager might want to know that it changed when the router advertisement
arrives.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <dborkman@redhat.com>
---
 net/ipv6/addrconf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3e118df..3d11390 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4528,6 +4528,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 	}
 
 	write_unlock_bh(&idev->lock);
+	netdev_state_change(dev);
 	addrconf_verify_rtnl();
 	return 0;
 }
-- 
1.9.3

^ permalink raw reply related

* Re: hso. hso_serial_close oops.
From: Peter Hurley @ 2014-10-10 14:22 UTC (permalink / raw)
  To: christian.melki, Jan Dumon; +Cc: linux-usb, netdev@vger.kernel.org
In-Reply-To: <5437E6E7.7070107@t2data.com>

[ +Jan Dumon, netdev ]

Forwarding oops report to maintainer.

On 10/10/2014 10:02 AM, Christian Melki wrote:
> I'm using a ppc 8347 with a normal 3.16.1 kernel.
> The software opens the driver tty in question and then sets it as stdin, stdout for chat-program and pppd.
> When I yank the modem while running, the software detects this and tries to close the open socket with a kernel crash as a result.
> 
> Unable to handle kernel paging request for unknown fault
> Faulting instruction address: 0xc03a4420
> Oops: Kernel access of bad area, sig: 11 [#1]
> PREEMPT ASP8347E
> Modules linked in:
> CPU: 0 PID: 1536 Comm: pppd Not tainted 3.16.1 #1
> task: c31272e0 ti: c39e8000 task.ti: c39e8000
> NIP: c03a4420 LR: c020752c CTR: c02074e0
> REGS: c39e9d40 TRAP: 0600   Not tainted  (3.16.1)
> MSR: 00009032 <EE,ME,IR,DR,RI>  CR: 24004224  XER: 20000000
> DAR: 0000004d DSISR: 00000000
> GPR00: 00000000 c39e9df0 c31272e0 0000004d c3235460 00000000 c39c1934 00000000
> GPR08: 00000000 00000000 c39c1964 c39c1800 24004228 10047610 10040000 1003f6ec
> GPR16: 1003f6b4 1003f618 1003f6b0 1003f6bc 1003f700 1003f7b4 c39e9edc 1003f6c8
> GPR24: 1003f6dc c03bd1a8 00000004 c03bd2b4 00000000 c3235460 00000000 c38cca00
> NIP [c03a4420] mutex_lock+0x0/0x1c
> LR [c020752c] hso_serial_close+0x4c/0x11c
> Call Trace:
> [c39e9df0] [c3235460] 0xc3235460 (unreliable)
> [c39e9e00] [c0188944] tty_release+0x134/0x560
> [c39e9e90] [c00a1968] __fput+0x94/0x214
> [c39e9eb0] [c0032854] task_work_run+0xcc/0xf4
> [c39e9ed0] [c0019108] do_exit+0x208/0x874
> [c39e9f20] [c00198c0] do_group_exit+0x44/0xd8
> [c39e9f30] [c0019968] __wake_up_parent+0x0/0x34
> [c39e9f40] [c000e60c] ret_from_syscall+0x0/0x38
> --- Exception: c01 at 0xfebd4cc
>     LR = 0xff79a98
> Instruction dump:
> 409e0014 801f003c 70090004 41820008 4bffe6ad 80010034 bb410018 38210030
> 7c0803a6 4e800020 4bffe695 4bffffc4 <7c001828> 3000ffff 7c00192d 40a2fff4
> ---[ end trace bfebaf22f6f5795a ]---
> 
> Fixing recursive fault but reboot is needed!
> 
> I have simulated the same error with a simple userland program without using pppd.
> 
> 
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <stdlib.h>
> 
> int
> main(int argc, char *argv[]) {
>     int fd;
>     fd = open(argv[1], O_RDWR);
>     sleep(atoi(argv[2]));
>     close(fd);
> 
>     return 0;
> }
> 
> If I yank the modem while the program is sleeping, I get exactly the same kernel error as with pppd.
> I have looked at the hso.c (hso_serial_close) driver but can't figure it out. The structs might not be intact at that time, but those are tty structs.. Im not sure what is going on. I tried to check the integrity of the structs but still get a crash. The tty layer is a mystery to me.
> 
> regards,
> Christian

^ permalink raw reply

* Re: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: Alexander Duyck @ 2014-10-10 14:42 UTC (permalink / raw)
  To: David Miller, alexander.duyck; +Cc: netdev
In-Reply-To: <20141010.004708.1975127853551765914.davem@davemloft.net>

On 10/09/2014 09:47 PM, David Miller wrote:
> From: alexander.duyck@gmail.com
> Date: Thu, 09 Oct 2014 21:03:28 -0700
>
>> From: Alexander Duyck <alexander.h.duyck@redhat.com>
>>
>> This patch addresses a kernel unaligned access bug seen on a sparc64 system
>> with an igb adapter.  Specifically the __skb_flow_get_ports was returning a
>> be32 pointer which was then having the value directly returned.
>>
>> In order to keep the handling of the ports consistent with how we were
>> handling the IPv4 and IPv6 addresses I have instead replaced the assignment
>> with a memcpy to the flow key ports value.  This way it should stay a
>> memcpy on systems that cannot handle unaligned access, and will likely be
>> converted to a 32b assignment on the systems that can support it.
>>
>> Reported-by: David S. Miller <davem@davemloft.net>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> Guess what the compiler will output for the memcpy()....
>
> 	*(u32 *)dest = *(u32 *)src;
>
> Using memcpy() is never a valid way to avoid misaligned loads and
> stores.

If needed we could convert ports to a __be16 I suppose.

> If the types have a given alignment, the compiler can legitimately
> expand the memcpy() inline with suitably sized loads and stores.

That is what I get for trying to come up with a fix at the end of the 
day.  Although it does leave me scratching my head why the IPv4 and IPv6 
addresses were not causing unaligned accesses or were they triggering 
them as well?

> Please see my other reply in the original thread, we have to use
> the hardware when we can to manage this situation, by configuring
> it to output two garbage bytes before the packet contents when
> DMA'ing into power-of-2 aligned blocks of memory.
>
> Thanks.

The problem is the igb / ixgbe / fm10k hardware doesn't have a means of 
inserting padding from its side.  The whole point of the xxx_get_headlen 
functions was to determine the size needed in order to generate the 
correct memcpy so that we could generate an aligned header.  It sounds 
like we can't do that with this function now because there are multiple 
instances where the data will be accessed unaligned if it isn't aligned 
in the first place.

The way I see it now there are two possible solutions.  The first one 
being to update the flow hash functions to work with unaligned accesses, 
and the second being to split the get_headlen code flow off and for now 
use my original code which already had all the alignment issues 
resolved.  I'm open to suggestions either way, but for now I will try to 
just address the items you have pointed out here with the current patch 
and submit a v2.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH net 1/3] net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
From: Neil Horman @ 2014-10-10 14:48 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, linux-sctp, netdev, Vlad Yasevich
In-Reply-To: <1412888133-833-2-git-send-email-dborkman@redhat.com>

On Thu, Oct 09, 2014 at 10:55:31PM +0200, Daniel Borkmann wrote:
> Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for
> ASCONF chunk") added basic verification of ASCONF chunks, however,
> it is still possible to remotely crash a server by sending a
> special crafted ASCONF chunk, even up to pre 2.6.12 kernels:
> 
> skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
>  head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
>  end:0x440 dev:<NULL>
>  ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:129!
> [...]
> Call Trace:
>  <IRQ>
>  [<ffffffff8144fb1c>] skb_put+0x5c/0x70
>  [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
>  [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
>  [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
>  [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
>  [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
>  [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
>  [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
>  [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
>  [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
>  [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
>  [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
>  [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
>  [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
>  [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
>  [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
>  [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
>  [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
>  [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
>  [<ffffffff81496ac5>] ip_rcv+0x275/0x350
>  [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
>  [<ffffffff81460588>] netif_receive_skb+0x58/0x60
> 
> This can be triggered e.g., through a simple scripted nmap
> connection scan injecting the chunk after the handshake, for
> example, ...
> 
>   -------------- INIT[ASCONF; ASCONF_ACK] ------------->
>   <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
>   -------------------- COOKIE-ECHO -------------------->
>   <-------------------- COOKIE-ACK ---------------------
>   ------------------ ASCONF; UNKNOWN ------------------>
> 
> ... where ASCONF chunk of length 280 contains 2 parameters ...
> 
>   1) Add IP address parameter (param length: 16)
>   2) Add/del IP address parameter (param length: 255)
> 
> ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
> Address Parameter in the ASCONF chunk is even missing, too.
> This is just an example and similarly-crafted ASCONF chunks
> could be used just as well.
> 
> The ASCONF chunk passes through sctp_verify_asconf() as all
> parameters passed sanity checks, and after walking, we ended
> up successfully at the chunk end boundary, and thus may invoke
> sctp_process_asconf(). Parameter walking is done with
> WORD_ROUND() to take padding into account.
> 
> In sctp_process_asconf()'s TLV processing, we may fail in
> sctp_process_asconf_param() e.g., due to removal of the IP
> address that is also the source address of the packet containing
> the ASCONF chunk, and thus we need to add all TLVs after the
> failure to our ASCONF response to remote via helper function
> sctp_add_asconf_response(), which basically invokes a
> sctp_addto_chunk() adding the error parameters to the given
> skb.
> 
> When walking to the next parameter this time, we proceed
> with ...
> 
>   length = ntohs(asconf_param->param_hdr.length);
>   asconf_param = (void *)asconf_param + length;
> 
> ... instead of the WORD_ROUND()'ed length, thus resulting here
> in an off-by-one that leads to reading the follow-up garbage
> parameter length of 12336, and thus throwing an skb_over_panic
> for the reply when trying to sctp_addto_chunk() next time,
> which implicitly calls the skb_put() with that length.
> 
> Fix it by using sctp_walk_params() [ which is also used in
> INIT parameter processing ] macro in the verification *and*
> in ASCONF processing: it will make sure we don't spill over,
> that we walk parameters WORD_ROUND()'ed. Moreover, we're being
> more defensive and guard against unknown parameter types and
> missized addresses.
> 
> Joint work with Vlad Yasevich.
> 
> Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* RE: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: David Laight @ 2014-10-10 14:57 UTC (permalink / raw)
  To: 'alexander.h.duyck@redhat.com', David Miller,
	alexander.duyck@gmail.com
  Cc: netdev@vger.kernel.org
In-Reply-To: <5437F072.5060502@redhat.com>

From: Alexander Duyck
> On 10/09/2014 09:47 PM, David Miller wrote:
> > From: alexander.duyck@gmail.com
> > Date: Thu, 09 Oct 2014 21:03:28 -0700
> >
> >> From: Alexander Duyck <alexander.h.duyck@redhat.com>
> >>
> >> This patch addresses a kernel unaligned access bug seen on a sparc64 system
> >> with an igb adapter.  Specifically the __skb_flow_get_ports was returning a
> >> be32 pointer which was then having the value directly returned.
> >>
> >> In order to keep the handling of the ports consistent with how we were
> >> handling the IPv4 and IPv6 addresses I have instead replaced the assignment
> >> with a memcpy to the flow key ports value.  This way it should stay a
> >> memcpy on systems that cannot handle unaligned access, and will likely be
> >> converted to a 32b assignment on the systems that can support it.
> >>
> >> Reported-by: David S. Miller <davem@davemloft.net>
> >> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> > Guess what the compiler will output for the memcpy()....
> >
> > 	*(u32 *)dest = *(u32 *)src;
> >
> > Using memcpy() is never a valid way to avoid misaligned loads and
> > stores.
> 
> If needed we could convert ports to a __be16 I suppose.

You would have to cast it somewhere where the compiler cannot
tell what the original type was.
This usually means you have to call a non-static function, which
might have to be in a different compilation unit.

...
> That is what I get for trying to come up with a fix at the end of the
> day.  Although it does leave me scratching my head why the IPv4 and IPv6
> addresses were not causing unaligned accesses or were they triggering
> them as well?

I think there is code to copy the IP and TCP headers to aligned memory
before they are parsed.

...
> 
> The problem is the igb / ixgbe / fm10k hardware doesn't have a means of
> inserting padding from its side...

Shoot the hardware engineers.

You aren't going to get the performance you expect from a 10Ge card
unless the rx buffers are 'correctly' aligned.

	David

^ permalink raw reply

* [PATCH v2] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: alexander.duyck @ 2014-10-10 14:59 UTC (permalink / raw)
  To: netdev, davem
In-Reply-To: <20141009.201248.1210454965155680255.davem@davemloft.net>

From: Alexander Duyck <alexander.h.duyck@redhat.com>

This patch addresses a kernel unaligned access bug seen on a sparc64 system
with an igb adapter.  Specifically the __skb_flow_get_ports was returning a
be32 pointer which was then having the value directly returned.

In order to keep the handling of the ports consistent with how we were
handling the IPv4 and IPv6 addresses I have instead replaced the assignment
with a memcpy to the flow key ports value.  This way it should stay a
memcpy on systems that cannot handle unaligned access, and will likely be
converted to a 32b assignment on the systems that can support it.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---

v2: Changed ports to a __be16 * to prevent optimization for __be32 *

 drivers/net/bonding/bond_main.c |    2 +-
 include/net/flow_keys.h         |   10 ++++++----
 net/core/flow_dissector.c       |   22 ++++++++++++++--------
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c9ac06c..326eb3d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2993,7 +2993,7 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 		return false;
 	}
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
-		fk->ports = skb_flow_get_ports(skb, noff, proto);
+		skb_flow_get_ports(fk, skb, noff, proto);
 
 	return true;
 }
diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index 7ee2df0..66235f4 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -33,11 +33,13 @@ static inline bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys
 {
 	return __skb_flow_dissect(skb, flow, NULL, 0, 0, 0);
 }
-__be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
-			    void *data, int hlen_proto);
-static inline __be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto)
+void __skb_flow_get_ports(struct flow_keys *flow, const struct sk_buff *skb,
+			  int thoff, u8 ip_proto, void *data, int hlen_proto);
+static inline void skb_flow_get_ports(struct flow_keys *flow,
+				      const struct sk_buff *skb, int thoff,
+				      u8 ip_proto)
 {
-	return __skb_flow_get_ports(skb, thoff, ip_proto, NULL, 0);
+	__skb_flow_get_ports(flow, skb, thoff, ip_proto, NULL, 0);
 }
 u32 flow_hash_from_keys(struct flow_keys *keys);
 unsigned int flow_get_hlen(const unsigned char *data, unsigned int max_len,
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 8560dea..e789a39 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -28,6 +28,7 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
 
 /**
  * __skb_flow_get_ports - extract the upper layer ports and return them
+ * @flow: Flow key to place port informatin into
  * @skb: sk_buff to extract the ports from
  * @thoff: transport header offset
  * @ip_proto: protocol for which to get port offset
@@ -37,8 +38,8 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
  * The function will try to retrieve the ports at offset thoff + poff where poff
  * is the protocol port offset returned from proto_ports_offset
  */
-__be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
-			    void *data, int hlen)
+void __skb_flow_get_ports(struct flow_keys *flow, const struct sk_buff *skb,
+			  int thoff, u8 ip_proto, void *data, int hlen)
 {
 	int poff = proto_ports_offset(ip_proto);
 
@@ -48,15 +49,20 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
 	}
 
 	if (poff >= 0) {
-		__be32 *ports, _ports;
+		__be16 *ports;
 
 		ports = __skb_header_pointer(skb, thoff + poff,
-					     sizeof(_ports), data, hlen, &_ports);
-		if (ports)
-			return *ports;
+					     sizeof(flow->ports), data, hlen,
+					     &flow->ports);
+		if (ports) {
+			if (flow->port16 != ports)
+				memcpy(&flow->ports, ports,
+				       sizeof(flow->ports));
+			return;
+		}
 	}
 
-	return 0;
+	memset(&flow->ports, 0, sizeof(flow->ports));
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
@@ -231,7 +237,7 @@ ipv6:
 
 	flow->n_proto = proto;
 	flow->ip_proto = ip_proto;
-	flow->ports = __skb_flow_get_ports(skb, nhoff, ip_proto, data, hlen);
+	__skb_flow_get_ports(flow, skb, nhoff, ip_proto, data, hlen);
 	flow->thoff = (u16) nhoff;
 
 	return true;

^ permalink raw reply related

* Re: [PATCH 1/1] Checkpatch: coding style errors in Nvidia ethernet driver
From: Joe Perches @ 2014-10-10 15:03 UTC (permalink / raw)
  To: Akshay Sarode; +Cc: John Stultz, netdev, linux-kernel
In-Reply-To: <1412928102-1696-1-git-send-email-akshaysarode21@gmail.com>

On Fri, 2014-10-10 at 13:31 +0530, Akshay Sarode wrote:
> ERROR: "foo* bar" should be "foo *bar"
> ERROR: do not initialise statics to 0 or NULL
> CHECK: spinlock_t definition without comment
> Signed-off-by: Akshay Sarode <akshaysarode21@gmail.com>
[]
> diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
[]
> @@ -911,12 +913,18 @@ enum {
[]
>  /*
>   * Debug output control for tx_timeout
>   */
> -static bool debug_tx_timeout = false;
> +enum {
> +	NV_DEBUG_TX_TIMEOUT_DISABLED,
> +	NV_DEBUG_TX_TIMEOUT_ENABLED
> +};
> +
> +static bool debug_tx_timeout = NV_DEBUG_TX_TIMEOUT_DISABLED;

Adding this enum is not useful.

^ permalink raw reply

* Re: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: Alexander Duyck @ 2014-10-10 15:14 UTC (permalink / raw)
  To: David Laight, David Miller, alexander.duyck@gmail.com
  Cc: netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D174C858B@AcuExch.aculab.com>

On 10/10/2014 07:57 AM, David Laight wrote:
> From: Alexander Duyck
>> On 10/09/2014 09:47 PM, David Miller wrote:
>>> From: alexander.duyck@gmail.com
>>> Date: Thu, 09 Oct 2014 21:03:28 -0700
>>>
>>>> From: Alexander Duyck <alexander.h.duyck@redhat.com>
>>>>
>>>> This patch addresses a kernel unaligned access bug seen on a sparc64 system
>>>> with an igb adapter.  Specifically the __skb_flow_get_ports was returning a
>>>> be32 pointer which was then having the value directly returned.
>>>>
>>>> In order to keep the handling of the ports consistent with how we were
>>>> handling the IPv4 and IPv6 addresses I have instead replaced the assignment
>>>> with a memcpy to the flow key ports value.  This way it should stay a
>>>> memcpy on systems that cannot handle unaligned access, and will likely be
>>>> converted to a 32b assignment on the systems that can support it.
>>>>
>>>> Reported-by: David S. Miller <davem@davemloft.net>
>>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
>>> Guess what the compiler will output for the memcpy()....
>>>
>>> 	*(u32 *)dest = *(u32 *)src;
>>>
>>> Using memcpy() is never a valid way to avoid misaligned loads and
>>> stores.
>> If needed we could convert ports to a __be16 I suppose.
> You would have to cast it somewhere where the compiler cannot
> tell what the original type was.
> This usually means you have to call a non-static function, which
> might have to be in a different compilation unit.
>
> ...

The pointer itself gets assigned from skb->data + offset, since 
skb->data is an unsigned char I would think that it should be safe to 
cast it as a __be16 and have that stick.  It is only if we cast it as a 
__be32 that it should be an issue as we are casting a value that starts 
off only byte aligned.

>> That is what I get for trying to come up with a fix at the end of the
>> day.  Although it does leave me scratching my head why the IPv4 and IPv6
>> addresses were not causing unaligned accesses or were they triggering
>> them as well?
> I think there is code to copy the IP and TCP headers to aligned memory
> before they are parsed.
>
> ...

The code I am talking about is run before we actually get into the 
header parsing.  So for example if you take a look at 
iph_to_flow_copy_addrs that should be getting called before we even get 
to the code that is supposedly triggering this and it is doing 32b 
aligned accesses for the source and destination IP addresses.

>> The problem is the igb / ixgbe / fm10k hardware doesn't have a means of
>> inserting padding from its side...
> Shoot the hardware engineers.
>
> You aren't going to get the performance you expect from a 10Ge card
> unless the rx buffers are 'correctly' aligned.
>
> 	David
>

I think you might be coming to this a little late.  The igb and ixgbe 
drivers had been working this way for a long time.  We did a memcpy to 
move the headers from the page and into the skb->data at an aligned 
offset.  In order to determine the length to memcpy we had a function 
that could walk through the DMA aligned data to get the header length.  
The function for that was replaced with the __skb_flow_dissect as it was 
considered a duplication of code with the flow_dissection functions.  
However that is obviously not the case now that we are hitting these 
alignment issues.

The question I have in all this is do I push forward and make 
__skb_flow_dissect work with unaligned accesses, or do I back off and 
put something equivilent to igb/ixgbe_get_headlen functions in the 
kernel in order to deal with the unaligned accesses as they had no 
issues with them since they were only concerned with getting the header 
length and kept all accesses 16b aligned.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: Eric Dumazet @ 2014-10-10 15:29 UTC (permalink / raw)
  To: alexander.h.duyck
  Cc: David Laight, David Miller, alexander.duyck@gmail.com,
	netdev@vger.kernel.org
In-Reply-To: <5437F7CB.5010101@redhat.com>

On Fri, 2014-10-10 at 08:14 -0700, Alexander Duyck wrote:

> I think you might be coming to this a little late.  The igb and ixgbe 
> drivers had been working this way for a long time.  We did a memcpy to 
> move the headers from the page and into the skb->data at an aligned 
> offset.  In order to determine the length to memcpy we had a function 
> that could walk through the DMA aligned data to get the header length.  
> The function for that was replaced with the __skb_flow_dissect as it was 
> considered a duplication of code with the flow_dissection functions.  
> However that is obviously not the case now that we are hitting these 
> alignment issues.
> 
> The question I have in all this is do I push forward and make 
> __skb_flow_dissect work with unaligned accesses, or do I back off and 
> put something equivilent to igb/ixgbe_get_headlen functions in the 
> kernel in order to deal with the unaligned accesses as they had no 
> issues with them since they were only concerned with getting the header 
> length and kept all accesses 16b aligned.
> 

I see nothing wrong dealing with unaligned accesses, as these helpers
are nop on x86 or CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y arches.

^ permalink raw reply

* Re: [ovs-dev] [PATCH/RFC repost 2/8] netlink: Allow suppression of warnings for duplicate attributes
From: Ben Pfaff @ 2014-10-10 15:31 UTC (permalink / raw)
  To: Simon Horman; +Cc: dev, netdev
In-Reply-To: <20141009011831.GF9339@vergenet.net>

On Thu, Oct 09, 2014 at 10:18:32AM +0900, Simon Horman wrote:
> On Fri, Sep 26, 2014 at 04:55:42PM -0700, Ben Pfaff wrote:
> > On Thu, Sep 18, 2014 at 10:55:05AM +0900, Simon Horman wrote:
> > > Add a multiple field to struct nl_policy which if set suppresses
> > > warning of duplicate attributes in nl_parse_nested().
> > > 
> > > As is the case without this patch only the last occurrence of an
> > > attribute is stored in attrs by nl_parse_nested(). As such
> > > if the multiple field of struct nl_policy is set then it
> > > is up to the caller to parse the message to extract all the attributes.
> > > 
> > > This is in preparation for allowing multiple OVS_SELECT_GROUP_ATTR_BUCKET
> > > attributes in a nested OVS_ACTION_ATTR_SELECT_GROUP attribute.
> > > 
> > > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> > 
> > In the other case where we have duplicate attributes, it doesn't make
> > sense to process them with the policy functions, because we want to
> > see all of the instances of the duplicate attributes and policy
> > doesn't allow us to do that.  I'm a little surprised that the new
> > attributes work differently.  What's the idea?
> 
> My idea was to use the policy to obtain the attributes that
> may not be duplicated. And then custom code to pick up all the
> instances of attributes that may be duplicated.
> 
> I'm don't feel strongly about that approach and I'd be just has
> happy to drop this patch and rework things a little so that
> all the attributes are picked out by custom code. It sounds
> like that would match the approach taken elsewhere. Sorry for
> not noticing that earlier.

I see.

That's more like the approach used elsewhere, yes, but in those cases
there wasn't anything (if I recall correctly) that could be usefully
done with the policy parser, so it wasn't considered as an option.  If
the group buckets are different then maybe a different treatment is
warranted.

^ permalink raw reply

* Re: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: Eric Dumazet @ 2014-10-10 15:33 UTC (permalink / raw)
  To: David Laight
  Cc: 'alexander.h.duyck@redhat.com', David Miller,
	alexander.duyck@gmail.com, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D174C858B@AcuExch.aculab.com>

On Fri, 2014-10-10 at 14:57 +0000, David Laight wrote:

> I think there is code to copy the IP and TCP headers to aligned memory
> before they are parsed.

There is no such thing. You are here on netdev list, please read the
code before doing such claims.

> ...
> > 
> > The problem is the igb / ixgbe / fm10k hardware doesn't have a means of
> > inserting padding from its side...
> 
> Shoot the hardware engineers.
> 
> You aren't going to get the performance you expect from a 10Ge card
> unless the rx buffers are 'correctly' aligned.

That is simply not true on current x86 cpus. They simply dont care at
all.

You cannot blame Intel for other arches.

^ permalink raw reply

* Re: [PATCH v2] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: Eric Dumazet @ 2014-10-10 15:36 UTC (permalink / raw)
  To: alexander.duyck; +Cc: netdev, davem
In-Reply-To: <20141010145647.23006.54451.stgit@ahduyck-workstation.home>

On Fri, 2014-10-10 at 07:59 -0700, alexander.duyck@gmail.com wrote:
> From: Alexander Duyck <alexander.h.duyck@redhat.com>
> 
> This patch addresses a kernel unaligned access bug seen on a sparc64 system
> with an igb adapter.  Specifically the __skb_flow_get_ports was returning a
> be32 pointer which was then having the value directly returned.
> 
> In order to keep the handling of the ports consistent with how we were
> handling the IPv4 and IPv6 addresses I have instead replaced the assignment
> with a memcpy to the flow key ports value.  This way it should stay a
> memcpy on systems that cannot handle unaligned access, and will likely be
> converted to a 32b assignment on the systems that can support it.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> ---

I believe you also need to take care of calls to ipv6_addr_hash()

The IPv4 part also needs something in iph_to_flow_copy_addrs(),
otherwise compiler might assume  &iph->saddr is word aligned.

^ permalink raw reply

* Re: [PATCH net 2/3] net: sctp: fix panic on duplicate ASCONF chunks
From: Neil Horman @ 2014-10-10 15:39 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, linux-sctp, netdev, Vlad Yasevich
In-Reply-To: <1412888133-833-3-git-send-email-dborkman@redhat.com>

On Thu, Oct 09, 2014 at 10:55:32PM +0200, Daniel Borkmann wrote:
> When receiving a e.g. semi-good formed connection scan in the
> form of ...
> 
>   -------------- INIT[ASCONF; ASCONF_ACK] ------------->
>   <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
>   -------------------- COOKIE-ECHO -------------------->
>   <-------------------- COOKIE-ACK ---------------------
>   ---------------- ASCONF_a; ASCONF_b ----------------->
> 
> ... where ASCONF_a equals ASCONF_b chunk (at least both serials
> need to be equal), we panic an SCTP server!
> 
> The problem is that good-formed ASCONF chunks that we reply with
> ASCONF_ACK chunks are cached per serial. Thus, when we receive a
> same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
> not need to process them again on the server side (that was the
> idea, also proposed in the RFC). Instead, we know it was cached
> and we just resend the cached chunk instead. So far, so good.
> 
> Where things get nasty is in SCTP's side effect interpreter, that
> is, sctp_cmd_interpreter():
> 
> While incoming ASCONF_a (chunk = event_arg) is being marked
> !end_of_packet and !singleton, and we have an association context,
> we do not flush the outqueue the first time after processing the
> ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
> queued up, although we set local_cork to 1. Commit 2e3216cd54b1
> changed the precedence, so that as long as we get bundled, incoming
> chunks we try possible bundling on outgoing queue as well. Before
> this commit, we would just flush the output queue.
> 
> Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
> continue to process the same ASCONF_b chunk from the packet. As
> we have cached the previous ASCONF_ACK, we find it, grab it and
> do another SCTP_CMD_REPLY command on it. So, effectively, we rip
> the chunk->list pointers and requeue the same ASCONF_ACK chunk
> another time. Since we process ASCONF_b, it's correctly marked
> with end_of_packet and we enforce an uncork, and thus flush, thus
> crashing the kernel.
> 
> Fix it by testing if the ASCONF_ACK is currently pending and if
> that is the case, do not requeue it. When flushing the output
> queue we may relink the chunk for preparing an outgoing packet,
> but eventually unlink it when it's copied into the skb right
> before transmission.
> 
> Joint work with Vlad Yasevich.
> 
> Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
> ---
>  include/net/sctp/sctp.h | 5 +++++
>  net/sctp/associola.c    | 2 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index 9fbd856..856f01c 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -426,6 +426,11 @@ static inline void sctp_assoc_pending_pmtu(struct sock *sk, struct sctp_associat
>  	asoc->pmtu_pending = 0;
>  }
>  
> +static inline bool sctp_chunk_pending(const struct sctp_chunk *chunk)
> +{
> +	return !list_empty(&chunk->list);
> +}
> +
>  /* Walk through a list of TLV parameters.  Don't trust the
>   * individual parameter lengths and instead depend on
>   * the chunk length to indicate when to stop.  Make sure
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index a88b852..f791edd 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1668,6 +1668,8 @@ struct sctp_chunk *sctp_assoc_lookup_asconf_ack(
>  	 * ack chunk whose serial number matches that of the request.
>  	 */
>  	list_for_each_entry(ack, &asoc->asconf_ack_list, transmitted_list) {
> +		if (sctp_chunk_pending(ack))
> +			continue;
>  		if (ack->subh.addip_hdr->serial == serial) {
>  			sctp_chunk_hold(ack);
>  			return ack;
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Is it worth adding a WARN_ON, to indicate that two ASCONF chunks have been
received with duplicate serials?

Neil

^ permalink raw reply

* RE: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
From: David Laight @ 2014-10-10 16:30 UTC (permalink / raw)
  To: 'Eric Dumazet'
  Cc: 'alexander.h.duyck@redhat.com', David Miller,
	alexander.duyck@gmail.com, netdev@vger.kernel.org
In-Reply-To: <1412955219.9362.13.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: 10 October 2014 16:34
> To: David Laight
> Cc: 'alexander.h.duyck@redhat.com'; David Miller; alexander.duyck@gmail.com; netdev@vger.kernel.org
> Subject: Re: [PATCH] flow-dissector: Fix alignment issue in __skb_flow_get_ports
> 
> On Fri, 2014-10-10 at 14:57 +0000, David Laight wrote:
> 
> > I think there is code to copy the IP and TCP headers to aligned memory
> > before they are parsed.
> 
> There is no such thing. You are here on netdev list, please read the
> code before doing such claims.

I did say 'I think'...
I must be thinking of some similar code somewhere else.
Possibly just the code that ensures the header isn't fragmented.

> > > The problem is the igb / ixgbe / fm10k hardware doesn't have a means of
> > > inserting padding from its side...
> >
> > Shoot the hardware engineers.
> >
> > You aren't going to get the performance you expect from a 10Ge card
> > unless the rx buffers are 'correctly' aligned.
> 
> That is simply not true on current x86 cpus. They simply dont care at
> all.

I was referring to using them on sparc64, not x86.

I know that current intel x86 cpu have support for misaligned 'rep movsd',
but I thought there was still a small cost (maybe one clock) for
single word transfers.
So maybe they care 'just a little bit'.

> You cannot blame Intel for other arches.

True, but this does mean that you don't really want to use these adapters
on a system that can't to unaligned accesses.

	David


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox