Netdev List


* Re: [PATCH net-next v3] fast_hash: clobber registers correctly for inline function use
From: Jay Vosburgh @ 2014-11-14 17:57 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev, ogerlitz, pshelar, jesse, discuss
In-Reply-To: <6751d6af4301f283134a419385f65dfcf92a44ab.1415978153.git.hannes@stressinduktion.org>

Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

>In case the arch_fast_hash call gets inlined we need to tell gcc which
>registers are clobbered. rhashtable was fine, because it used
>arch_fast_hash via function pointer and thus the compiler took care of
>that. In case of openvswitch the call got inlined and arch_fast_hash
>touched registers which gcc didn't know about.
>
>Also don't use conditional compilation inside arguments, as this confuses
>sparse.
>
>Fixes: e5a2c899957659c ("fast_hash: avoid indirect function calls")
>Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
>Cc: Pravin Shelar <pshelar@nicira.com>
>Cc: Jesse Gross <jesse@nicira.com>
>Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>---
>
>
>v2)
>After studying gcc documentation again, it occurred to me that I need to
>specify all input operands in the clobber section, too. Otherwise gcc
>can expect that the inline assembler section won't modify the inputs,
>which is not true.
>
>v3)
>added Fixes tag

	This patch does not compile for me when applied to today's
net-next:

  CC [M]  fs/nfsd/nfs4state.o
In file included from ./arch/x86/include/asm/bitops.h:16:0,
                 from include/linux/bitops.h:33,
                 from include/linux/kernel.h:10,
                 from include/linux/list.h:8,
                 from include/linux/wait.h:6,
                 from include/linux/fs.h:6,
                 from fs/nfsd/nfs4state.c:36:
fs/nfsd/nfs4state.c: In function ‘nfsd4_cb_recall_prepare’:
./arch/x86/include/asm/alternative.h:185:2: error: ‘asm’ operand has impossible constraints
  asm volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
  ^
./arch/x86/include/asm/hash.h:27:2: note: in expansion of macro ‘alternative_call’
  alternative_call(__jhash, __intel_crc4_2_hash, X86_FEATURE_XMM4_2,
  ^
make[2]: *** [fs/nfsd/nfs4state.o] Error 1
make[1]: *** [fs/nfsd] Error 2
make: *** [fs] Error 2

	-J


>Bye,
>Hannes
>
> arch/x86/include/asm/hash.h | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
>diff --git a/arch/x86/include/asm/hash.h b/arch/x86/include/asm/hash.h
>index a881d78..a25c45a 100644
>--- a/arch/x86/include/asm/hash.h
>+++ b/arch/x86/include/asm/hash.h
>@@ -23,11 +23,15 @@ static inline u32 arch_fast_hash(const void *data, u32 len, u32 seed)
> {
> 	u32 hash;
> 
>-	alternative_call(__jhash, __intel_crc4_2_hash, X86_FEATURE_XMM4_2,
> #ifdef CONFIG_X86_64
>-			 "=a" (hash), "D" (data), "S" (len), "d" (seed));
>+	alternative_call(__jhash, __intel_crc4_2_hash, X86_FEATURE_XMM4_2,
>+			 "=a" (hash), "D" (data), "S" (len), "d" (seed)
>+			 : "rdi", "rsi", "rdx", "rcx", "r8", "r9", "r10", "r11",
>+			   "cc", "memory");
> #else
>-			 "=a" (hash), "a" (data), "d" (len), "c" (seed));
>+	alternative_call(__jhash, __intel_crc4_2_hash, X86_FEATURE_XMM4_2,
>+			 "=a" (hash), "a" (data), "d" (len), "c" (seed)
>+			 : "edx", "ecx", "cc", "memory");
> #endif
> 	return hash;
> }
>@@ -36,11 +40,15 @@ static inline u32 arch_fast_hash2(const u32 *data, u32 len, u32 seed)
> {
> 	u32 hash;
> 
>-	alternative_call(__jhash2, __intel_crc4_2_hash2, X86_FEATURE_XMM4_2,
> #ifdef CONFIG_X86_64
>-			 "=a" (hash), "D" (data), "S" (len), "d" (seed));
>+	alternative_call(__jhash2, __intel_crc4_2_hash2, X86_FEATURE_XMM4_2,
>+			 "=a" (hash), "D" (data), "S" (len), "d" (seed)
>+			 : "rdi", "rsi", "rdx", "rcx", "r8", "r9", "r10", "r11",
>+			   "cc", "memory");
> #else
>-			 "=a" (hash), "a" (data), "d" (len), "c" (seed));
>+	alternative_call(__jhash2, __intel_crc4_2_hash2, X86_FEATURE_XMM4_2,
>+			 "=a" (hash), "a" (data), "d" (len), "c" (seed)
>+			 : "edx", "ecx", "cc", "memory");
> #endif
> 	return hash;
> }
>-- 
>1.9.3

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
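
For reference, GCC's extended asm documentation states that clobber
descriptions may not overlap with input or output operands, which appears
consistent with the "impossible constraints" error in the build log above.
A register that the asm both reads and modifies is usually expressed as a
read-write operand with the "+" modifier instead. The following is a
minimal sketch of that idea only, not a fix for arch_fast_hash; the helper
name add_in_place is hypothetical, and the plain C fallback is an
assumption added so the snippet also builds on non-x86 targets.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: returns a + b while the
 * asm statement overwrites the register that holds `a`.
 */
static inline uint32_t add_in_place(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) || defined(__i386__)
	/*
	 * "+r" (a) declares the register as both read and written, so the
	 * compiler knows its old value is gone after the asm. Naming the
	 * same register in the clobber list instead would overlap with an
	 * operand, which the compiler rejects.
	 */
	__asm__ volatile("addl %1, %0"
			 : "+r" (a)	/* %0: input and output */
			 : "r" (b)	/* %1: input only, left unmodified */
			 : "cc");	/* addl changes the flags */
#else
	/* Fallback for other architectures: same result in plain C. */
	a += b;
#endif
	return a;
}
```

With this form the compiler is free to pick the registers itself; a fixed
register constraint such as "D" or "S" could be combined with "+" in the
same way when a calling convention dictates the register.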


* Re: [PATCH v4 3/8] net: can: c_can: Add RAMINIT register information to driver data
From: Marc Kleine-Budde @ 2014-11-14 17:55 UTC (permalink / raw)
  To: Roger Quadros, wg
  Cc: wsa, tony, tglx, mugunthanvnm, george.cherian, balbi, nsekhar, nm,
	sergei.shtylyov, linux-omap, linux-can, netdev
In-Reply-To: <1415371762-29885-4-git-send-email-rogerq@ti.com>

[-- Attachment #1: Type: text/plain, Size: 2146 bytes --]

On 11/07/2014 03:49 PM, Roger Quadros wrote:
> Some platforms (e.g. TI) need special RAMINIT register handling.
> Provide a way to store RAMINIT register description in driver data.
> 
> Signed-off-by: Roger Quadros <rogerq@ti.com>
> ---
>  drivers/net/can/c_can/c_can.h          | 6 ++++++
>  drivers/net/can/c_can/c_can_platform.c | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
> index 26c975d..3c305a1 100644
> --- a/drivers/net/can/c_can/c_can.h
> +++ b/drivers/net/can/c_can/c_can.h
> @@ -171,6 +171,12 @@ enum c_can_dev_id {
>  
>  struct c_can_driver_data {
>  	enum c_can_dev_id id;
> +
> +	/* RAMINIT register description. Optional. */
> +	u8 num_can;		/* Number of CAN instances on the SoC */
> +	u8 *raminit_start_bits;	/* Array of START bit positions */
> +	u8 *raminit_done_bits;	/* Array of DONE bit positions */

What do you think about making this a struct:

+struct raminit_bits {
+       u8 start;
+       u8 done;
+};

 struct c_can_driver_data {
        enum c_can_dev_id id;
+
+       /* RAMINIT register description. Optional. */
+       const struct raminit_bits *raminit_bits; /* Array of START/DONE bit positions */
+       u8 raminit_num;         /* Number of CAN instances on the SoC */
+       bool raminit_pulse;     /* If set, sets and clears START bit (pulse) */
 };

The driver data looks like this:

+static const struct raminit_bits dra7_raminit_bits[] = {
+       [0] = { .start = 3, .done = 1, },
+       [1] = { .start = 5, .done = 2, },
+};
+
+static const struct c_can_driver_data dra7_dcan_drvdata = {
+       .id = BOSCH_D_CAN,
+       .raminit_num = ARRAY_SIZE(dra7_raminit_bits),
+       .raminit_bits = dra7_raminit_bits,
+       .raminit_pulse = true,
+};
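
To illustrate how a consumer might read such a table, here is a small
userspace sketch. It assumes the struct layout proposed above; the names
c_can_driver_data_sketch, dra7_sketch and raminit_masks are hypothetical
and are not part of the driver, and kernel types are replaced by their
stdint equivalents.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct raminit_bits {
	uint8_t start;
	uint8_t done;
};

/* Hypothetical userspace stand-in for the proposed driver data. */
struct c_can_driver_data_sketch {
	const struct raminit_bits *raminit_bits; /* START/DONE bit positions */
	uint8_t raminit_num;                     /* number of CAN instances */
	bool raminit_pulse;                      /* pulse the START bit */
};

static const struct raminit_bits dra7_raminit_bits[] = {
	[0] = { .start = 3, .done = 1, },
	[1] = { .start = 5, .done = 2, },
};

static const struct c_can_driver_data_sketch dra7_sketch = {
	.raminit_num = sizeof(dra7_raminit_bits) / sizeof(dra7_raminit_bits[0]),
	.raminit_bits = dra7_raminit_bits,
	.raminit_pulse = true,
};

/*
 * Look up the START and DONE register masks for one CAN instance.
 * Returns 0 on success, -1 if the table is absent or the instance is
 * out of range.
 */
static int raminit_masks(const struct c_can_driver_data_sketch *d,
			 unsigned int instance,
			 uint32_t *start_mask, uint32_t *done_mask)
{
	if (!d->raminit_bits || instance >= d->raminit_num)
		return -1;

	*start_mask = UINT32_C(1) << d->raminit_bits[instance].start;
	*done_mask = UINT32_C(1) << d->raminit_bits[instance].done;
	return 0;
}
```

Keeping the two bit positions in one array element means a single bounds
check against raminit_num covers both lookups.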

I'll send an updated series.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [BUG] index is out of range for nfnl_group2type[]
From: Pablo Neira Ayuso @ 2014-11-14 17:44 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Patrick McHardy, Jozsef Kadlecsik, David S. Miller,
	netfilter-devel, coreteam, netdev@vger.kernel.org, linux-kernel
In-Reply-To: <5464733B.5060505@samsung.com>

[-- Attachment #1: Type: text/plain, Size: 1714 bytes --]

On Thu, Nov 13, 2014 at 12:00:43PM +0300, Andrey Ryabinin wrote:
> FYI I've spotted this:
> 
> [  180.202810] ================================================================================
> [  180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
> [  180.204249] index 9 is out of range for type 'int [9]'
> [  180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
> [  180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [  180.206498]  0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
> [  180.207220]  ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
> [  180.207887]  ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
> [  180.208639] Call Trace:
> [  180.208857] dump_stack (lib/dump_stack.c:52)
> [  180.209370] ubsan_epilogue (lib/ubsan.c:174)
> [  180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
> [  180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
> [  180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
> [  180.211495] SYSC_bind (net/socket.c:1541)
> [  180.211940] ? security_socket_setsockopt (security/security.c:1208)
> [  180.212541] ? SyS_setsockopt (net/socket.c:1920 net/socket.c:1900)
> [  180.213057] ? SyS_write (fs/read_write.c:276 fs/read_write.c:588 fs/read_write.c:577)
> [  180.213506] SyS_bind (net/socket.c:1527)
> [  180.213919] system_call_fastpath (arch/x86/kernel/entry_64.S:423)
> [  180.214479] ================================================================================

Thanks for reporting. I think the attached patch fixes this problem.

[-- Attachment #2: 0001-netfilter-nfnetlink-fix-insufficient-validation-in-n.patch --]
[-- Type: text/x-diff, Size: 2858 bytes --]

>From 289a727f1561b4e228078d60235f77e88b350f84 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri, 14 Nov 2014 18:14:33 +0100
Subject: [PATCH] netfilter: nfnetlink: fix insufficient validation in
 nfnetlink_bind

Make sure the netlink group exists, otherwise this triggers an out of
bounds array memory access from the netlink_bind() path. This splat
can only be triggered by superuser.

[  180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
[  180.204249] index 9 is out of range for type 'int [9]'
[  180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
[  180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[  180.206498]  0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
[  180.207220]  ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
[  180.207887]  ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
[  180.208639] Call Trace:
[  180.208857] dump_stack (lib/dump_stack.c:52)
[  180.209370] ubsan_epilogue (lib/ubsan.c:174)
[  180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
[  180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
[  180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
[  180.211495] SYSC_bind (net/socket.c:1541)

Moreover, define the missing nf_tables and nf_acct multicast groups
too to skip.

Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 6c5a915..13c2e17 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -47,6 +47,8 @@ static const int nfnl_group2type[NFNLGRP_MAX+1] = {
 	[NFNLGRP_CONNTRACK_EXP_NEW]	= NFNL_SUBSYS_CTNETLINK_EXP,
 	[NFNLGRP_CONNTRACK_EXP_UPDATE]	= NFNL_SUBSYS_CTNETLINK_EXP,
 	[NFNLGRP_CONNTRACK_EXP_DESTROY] = NFNL_SUBSYS_CTNETLINK_EXP,
+	[NFNLGRP_NFTABLES]		= NFNL_SUBSYS_NFTABLES,
+	[NFNLGRP_ACCT_QUOTA]		= NFNL_SUBSYS_ACCT,
 };
 
 void nfnl_lock(__u8 subsys_id)
@@ -464,7 +466,12 @@ static void nfnetlink_rcv(struct sk_buff *skb)
 static int nfnetlink_bind(int group)
 {
 	const struct nfnetlink_subsystem *ss;
-	int type = nfnl_group2type[group];
+	int type;
+
+	if (group <= NFNLGRP_NONE || group > NFNLGRP_MAX)
+		return -EINVAL;
+
+	type = nfnl_group2type[group];
 
 	rcu_read_lock();
 	ss = nfnetlink_get_subsys(type);
@@ -514,6 +521,9 @@ static int __init nfnetlink_init(void)
 {
 	int i;
 
+	for (i = NFNLGRP_NONE + 1; i <= NFNLGRP_MAX; i++)
+		BUG_ON(nfnl_group2type[i] == NFNL_SUBSYS_NONE);
+
 	for (i=0; i<NFNL_SUBSYS_COUNT; i++)
 		mutex_init(&table[i].mutex);
 
-- 
1.7.10.4
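
The pattern in the patch above, validating the index before using it on a
fixed-size lookup table, can be sketched in standalone C as follows. The
enum, the table contents and the function name group_to_type are
hypothetical stand-ins, not the real nfnetlink definitions.

```c
#include <errno.h>

/* Hypothetical groups; slot 0 is the "none" value and is never valid. */
enum sketch_group {
	GRP_NONE,
	GRP_CONNTRACK,
	GRP_NFTABLES,
	GRP_ACCT,
	GRP_COUNT
};
#define GRP_MAX (GRP_COUNT - 1)

/* Made-up subsystem ids, one per group. */
static const int group2type[GRP_MAX + 1] = {
	[GRP_CONNTRACK]	= 1,
	[GRP_NFTABLES]	= 10,
	[GRP_ACCT]	= 7,
};

/*
 * Map a group number supplied by the caller to a subsystem id.
 * The range check runs before the array access, so an out-of-range
 * group yields -EINVAL instead of an out-of-bounds read.
 */
static int group_to_type(int group)
{
	if (group <= GRP_NONE || group > GRP_MAX)
		return -EINVAL;

	return group2type[group];
}
```

Because the table has GRP_MAX + 1 entries, the largest valid index is
GRP_MAX; an index equal to the array size is already one past the end,
which matches the "index 9 is out of range for type 'int [9]'" report.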



* [PATCH net-next] tipc: allow one link per bearer to neighboring nodes
From: Holger Brunck @ 2014-11-14 17:33 UTC (permalink / raw)
  To: davem; +Cc: jon.maloy, Holger Brunck, Ying Xue, Erik Hugne, netdev

There is no reason to limit the number of possible links to a
neighboring node to 2. If we have more than two bearers we can also
establish more links.

Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Reviewed-By: Jon Maloy <jon.maloy@ericsson.com>
cc: Ying Xue <ying.xue@windriver.com>
cc: Erik Hugne <erik.hugne@ericsson.com>
cc: netdev@vger.kernel.org
---
 net/tipc/link.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 1db162a..7cf8004 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -224,9 +224,10 @@ struct tipc_link *tipc_link_create(struct tipc_node *n_ptr,
 	char addr_string[16];
 	u32 peer = n_ptr->addr;
 
-	if (n_ptr->link_cnt >= 2) {
+	if (n_ptr->link_cnt >= MAX_BEARERS) {
 		tipc_addr_string_fill(addr_string, n_ptr->addr);
-		pr_err("Attempt to establish third link to %s\n", addr_string);
+		pr_err("Attempt to establish %uth link to %s. Max %u allowed.\n",
+			n_ptr->link_cnt, addr_string, MAX_BEARERS);
 		return NULL;
 	}
 
-- 
2.1.2


* Re: [PATCH 3/3] fm10k/igb/ixgbe: Use load_acquire on Rx descriptor
From: Jeff Kirsher @ 2014-11-14 17:25 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-arch, netdev, linux-kernel, mikey, tony.luck,
	mathieu.desnoyers, donald.c.skidmore, peterz, benh,
	heiko.carstens, oleg, will.deacon, davem, michael, matthew.vick,
	nic_swsd, geert, fweisbec, schwidefsky, linux, paulmck, torvalds,
	mingo
In-Reply-To: <20141113192746.12579.91054.stgit@ahduyck-server>

[-- Attachment #1: Type: text/plain, Size: 1547 bytes --]

On Thu, 2014-11-13 at 11:27 -0800, Alexander Duyck wrote:
> This change makes it so that load_acquire is used when reading the Rx
> descriptor.  The advantage of load_acquire is that it allows for a much
> lower cost barrier on x86, ia64, powerpc, arm64, and s390 architectures
> than a traditional memory barrier when dealing with reads that only have
> to synchronize to system memory.
> 
> In addition I have updated the code so that it just checks to see if any
> bits have been set instead of just the DD bit since the DD bit will always
> be set as a part of a descriptor write-back so we just need to check for a
> non-zero value being present at that memory location rather than just
> checking for any specific bit.  This allows the code itself to appear much
> cleaner and allows the compiler more room to optimize.
> 
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: Matthew Vick <matthew.vick@intel.com>
> Cc: Don Skidmore <donald.c.skidmore@intel.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c |    8 +++-----
>  drivers/net/ethernet/intel/igb/igb_main.c     |    8 +++-----
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   11 ++++-------
>  3 files changed, 10 insertions(+), 17 deletions(-)

Based on the discussion on patch 01 of the series, it appears changes
are coming to the series, so I won't be picking up this patch.  I will
wait for Alex to re-spin the series with the suggested changes.
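
As a rough illustration of the acquire-load idea described in the quoted
commit message, here is a userspace sketch using C11 atomics. It is an
analogy only: in the drivers the descriptor is written back by the device,
not by another CPU thread, and the kernel primitive under discussion is
not the C11 one. The struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor: the payload field is published by `status`. */
struct rx_desc_sketch {
	uint32_t length;
	_Atomic uint32_t status;
};

/* Producer side: fill in the payload, then publish it with a release store. */
static void desc_write_back(struct rx_desc_sketch *d, uint32_t len,
			    uint32_t status)
{
	d->length = len;
	atomic_store_explicit(&d->status, status, memory_order_release);
}

/*
 * Consumer side: an acquire load of `status` orders the later read of
 * `length` after it. Any non-zero status counts as "written back", so no
 * specific bit needs to be tested.
 * Returns 1 and stores the length if the descriptor is ready, else 0.
 */
static int desc_poll(struct rx_desc_sketch *d, uint32_t *len)
{
	if (!atomic_load_explicit(&d->status, memory_order_acquire))
		return 0;

	*len = d->length;
	return 1;
}
```

The acquire load replaces the pattern of a plain load followed by a
separate read barrier; on some architectures it can map to a cheaper
instruction sequence than a full barrier.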

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [PATCH] e100: fix typo in MDI/MDI-X eeprom check in e100_phy_init
From: Jeff Kirsher @ 2014-11-14 17:17 UTC (permalink / raw)
  To: John W. Linville; +Cc: netdev, Dave Miller, Auke Kok, Malli Chilakala
In-Reply-To: <1415980770-5467-1-git-send-email-linville@tuxdriver.com>

On Fri, Nov 14, 2014 at 7:59 AM, John W. Linville
<linville@tuxdriver.com> wrote:
> Although it doesn't explicitly say so, commit 60ffa478759f39a2 ("e100:
> Fix MDIO/MDIO-X") appears to be intended to revert the earlier commit
> 648951451e6d2d53 ("e100: fixed e100 MDI/MDI-X issues").  However,
> careful examination reveals that the attempted revert actually
> _inverted_ the test for eeprom_mdix_enabled.  That is bound to program
> a few PHYs incorrectly...
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1156417
>
> Signed-off-by: John W. Linville <linville@tuxdriver.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: Auke Kok <auke-jan.h.kok@intel.com>
> Cc: Malli Chilakala <mallikarjuna.chilakala@intel.com>
> ---
> Wow, an 8 year old bug in e100 -- woohoo!! :-)
>
> This was causing some serious flakiness for a large cash register
> deployment in Europe.  Testing with this one-line (really,
> one-character) patch seems to have resolved the issue.
>
>  drivers/net/ethernet/intel/e100.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Weird, I did not get this mail.  Anyway, thanks John, I have added
your patch to my queue.


* Re: [PATCH net-next] net: introduce SO_INCOMING_CPU
From: Andy Lutomirski @ 2014-11-14 17:17 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Eric Dumazet, David Miller, netdev, Ying Cai, Willem de Bruijn,
	Neal Cardwell, Linux API
In-Reply-To: <CAHO5Pa0OGtgbUp4R287jbK2SFSpVUoXWCJybvHTFsGyCLynqLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Nov 14, 2014 at 12:05 AM, Michael Kerrisk
<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Eric,
>
> Since this is an API change (see Documentation/SubmitChecklist),
> linux-api@ should be CCed.
>
> Thanks,
>
> Michael
>
>
>
> On Fri, Nov 7, 2014 at 9:51 PM, Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> From: Eric Dumazet <edumazet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>>
>> An alternative to RPS/RFS is to use hardware support for multiqueue.
>>
>> Then split a set of millions of sockets into worker threads, each
>> one using epoll() to manage events on its own socket pool.
>>
>> Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
>> know after accept() or connect() on which queue/cpu a socket is managed.
>>
>> We normally use one cpu per RX queue (IRQ smp_affinity being properly
>> set), so remembering on socket structure which cpu delivered last packet
>> is enough to solve the problem.
>>
>> After accept(), connect(), or even file descriptor passing around
>> processes, applications can use :
>>
>>  int cpu;
>>  socklen_t len = sizeof(cpu);
>>
>>  getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
>>
>> And use this information to put the socket into the right silo
>> for optimal performance, as all networking stack should run
>> on the appropriate cpu, without need to send IPI (RPS/RFS).
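
The getsockopt() call quoted above can be wrapped with error handling as
in the following sketch. The wrapper name query_incoming_cpu is
hypothetical, and the fallback definition of SO_INCOMING_CPU uses the
asm-generic value 49 from the patch, which is an assumption that does not
hold on the architectures that use different numbers (parisc and sparc in
the diff below).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49	/* asm-generic value; assumed, see lead-in */
#endif

/*
 * Ask the kernel which CPU is recorded for this socket.
 * Returns 0 and stores the CPU number in *cpu on success, or a negative
 * errno value on failure (for example -ENOPROTOOPT on kernels that do
 * not know the option).
 */
static int query_incoming_cpu(int fd, int *cpu)
{
	socklen_t len = sizeof(*cpu);

	if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, cpu, &len) < 0)
		return -errno;

	return 0;
}
```

A worker could call this after accept() and hand the socket to the thread
pinned to the reported CPU, falling back to its normal placement when the
call returns an error.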

As a heavy user of RFS (and finder of bugs in it, too), here's my
question about this API:

How does an application tell whether the socket represents a
non-actively-steered flow?  If the flow is subject to RFS, then moving
the application handling to the socket's CPU seems problematic, as the
socket's CPU might move as well.  The current implementation in this
patch seems to tell me which CPU the most recent packet came in on,
which is not necessarily very useful.

Some possibilities:

1. Let SO_INCOMING_CPU fail if RFS or RPS are in play.

2. Change the interface a bit to report the socket's preferred CPU
(where it would go without RFS, for example) and then let the
application use setsockopt to tell the socket to stay put (i.e. turn
off RFS and RPS for that flow).

3. Report the preferred CPU as in (2) but let the application ask for
something different.

For example, I have flows for which I know which CPU I want.  A nice
API to put the flow there would be quite useful.


Also, it may be worth changing the naming to indicate that these are
about the rx cpu (they are, right?).  For some applications (sparse,
low-latency flows, for example), it can be useful to keep the tx
completion handling on a different CPU.

--Andy

>>
>> Signed-off-by: Eric Dumazet <edumazet-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> ---
>>  arch/alpha/include/uapi/asm/socket.h   |    2 ++
>>  arch/avr32/include/uapi/asm/socket.h   |    2 ++
>>  arch/cris/include/uapi/asm/socket.h    |    2 ++
>>  arch/frv/include/uapi/asm/socket.h     |    2 ++
>>  arch/ia64/include/uapi/asm/socket.h    |    2 ++
>>  arch/m32r/include/uapi/asm/socket.h    |    2 ++
>>  arch/mips/include/uapi/asm/socket.h    |    2 ++
>>  arch/mn10300/include/uapi/asm/socket.h |    2 ++
>>  arch/parisc/include/uapi/asm/socket.h  |    2 ++
>>  arch/powerpc/include/uapi/asm/socket.h |    2 ++
>>  arch/s390/include/uapi/asm/socket.h    |    2 ++
>>  arch/sparc/include/uapi/asm/socket.h   |    2 ++
>>  arch/xtensa/include/uapi/asm/socket.h  |    2 ++
>>  include/net/sock.h                     |   12 ++++++++++++
>>  include/uapi/asm-generic/socket.h      |    2 ++
>>  net/core/sock.c                        |    5 +++++
>>  net/ipv4/tcp_ipv4.c                    |    1 +
>>  net/ipv4/udp.c                         |    1 +
>>  net/ipv6/tcp_ipv6.c                    |    1 +
>>  net/ipv6/udp.c                         |    1 +
>>  net/sctp/ulpqueue.c                    |    5 +++--
>>  21 files changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>> index 3de1394bcab821984674e89a3ee022cc6dd5f0f2..e2fe0700b3b442bffc1f606b1b8b0bb7759aa157 100644
>> --- a/arch/alpha/include/uapi/asm/socket.h
>> +++ b/arch/alpha/include/uapi/asm/socket.h
>> @@ -87,4 +87,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _UAPI_ASM_SOCKET_H */
>> diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
>> index 6e6cd159924b1855aa5f1811ad4e4c60b403c431..92121b0f5b989a61c008e0be24030725bab88e36 100644
>> --- a/arch/avr32/include/uapi/asm/socket.h
>> +++ b/arch/avr32/include/uapi/asm/socket.h
>> @@ -80,4 +80,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _UAPI__ASM_AVR32_SOCKET_H */
>> diff --git a/arch/cris/include/uapi/asm/socket.h b/arch/cris/include/uapi/asm/socket.h
>> index ed94e5ed0a238c2750e677ccb806a6bc0a94041a..60f60f5b9b35bd219d7a9834fe5394e8ac5fdbab 100644
>> --- a/arch/cris/include/uapi/asm/socket.h
>> +++ b/arch/cris/include/uapi/asm/socket.h
>> @@ -82,6 +82,8 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _ASM_SOCKET_H */
>>
>>
>> diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
>> index ca2c6e6f31c6817780d31a246652adcc9847e373..2c6890209ea60c149bf097c2a1b369519cb8c301 100644
>> --- a/arch/frv/include/uapi/asm/socket.h
>> +++ b/arch/frv/include/uapi/asm/socket.h
>> @@ -80,5 +80,7 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _ASM_SOCKET_H */
>>
>> diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
>> index a1b49bac7951929127ed08db549218c2c16ccf89..09a93fb566f6c6c6fe29c10c95b931881843d1cd 100644
>> --- a/arch/ia64/include/uapi/asm/socket.h
>> +++ b/arch/ia64/include/uapi/asm/socket.h
>> @@ -89,4 +89,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _ASM_IA64_SOCKET_H */
>> diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
>> index 6c9a24b3aefa3a4f3048c17a7fa06d97b585ec14..e8589819c2743c6e112b15a245fc3ebd146e6313 100644
>> --- a/arch/m32r/include/uapi/asm/socket.h
>> +++ b/arch/m32r/include/uapi/asm/socket.h
>> @@ -80,4 +80,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _ASM_M32R_SOCKET_H */
>> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
>> index a14baa218c76f14de988ef106bdac5dadc48aceb..2e9ee8c55a103a0337d9f80f71fe9ef28be1154b 100644
>> --- a/arch/mips/include/uapi/asm/socket.h
>> +++ b/arch/mips/include/uapi/asm/socket.h
>> @@ -98,4 +98,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _UAPI_ASM_SOCKET_H */
>> diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
>> index 6aa3ce1854aa9523d46bc28851eddabd59edeb37..f3492e8c9f7009c33e07168df916f7337bef3929 100644
>> --- a/arch/mn10300/include/uapi/asm/socket.h
>> +++ b/arch/mn10300/include/uapi/asm/socket.h
>> @@ -80,4 +80,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _ASM_SOCKET_H */
>> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
>> index fe35ceacf0e72cad69a43d9b1ce7b8f5ec3da98a..7984a1cab3da980f1f810827967b4b67616eb89b 100644
>> --- a/arch/parisc/include/uapi/asm/socket.h
>> +++ b/arch/parisc/include/uapi/asm/socket.h
>> @@ -79,4 +79,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      0x4029
>>
>> +#define SO_INCOMING_CPU                0x402A
>> +
>>  #endif /* _UAPI_ASM_SOCKET_H */
>> diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
>> index a9c3e2e18c054a1e952fe33599401de57c6a6544..3474e4ef166df4a573773916b325d0fa9f3b45d0 100644
>> --- a/arch/powerpc/include/uapi/asm/socket.h
>> +++ b/arch/powerpc/include/uapi/asm/socket.h
>> @@ -87,4 +87,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _ASM_POWERPC_SOCKET_H */
>> diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
>> index e031332096d7c7b23b5953680289e8f3bcc3b378..8457636c33e1b67a9b7804daa05627839035a8fb 100644
>> --- a/arch/s390/include/uapi/asm/socket.h
>> +++ b/arch/s390/include/uapi/asm/socket.h
>> @@ -86,4 +86,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _ASM_SOCKET_H */
>> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
>> index 54d9608681b6947ae25dab008f808841d96125c0..4a8003a9416348006cfa85d5bcdf7553c8d23958 100644
>> --- a/arch/sparc/include/uapi/asm/socket.h
>> +++ b/arch/sparc/include/uapi/asm/socket.h
>> @@ -76,6 +76,8 @@
>>
>>  #define SO_BPF_EXTENSIONS      0x0032
>>
>> +#define SO_INCOMING_CPU                0x0033
>> +
>>  /* Security levels - as per NRL IPv6 - don't actually do anything */
>>  #define SO_SECURITY_AUTHENTICATION             0x5001
>>  #define SO_SECURITY_ENCRYPTION_TRANSPORT       0x5002
>> diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
>> index 39acec0cf0b1d500c1c40f9b523ef3a9a142c2f1..c46f6a696849c6f7f8a34b2cc522b48e04b17380 100644
>> --- a/arch/xtensa/include/uapi/asm/socket.h
>> +++ b/arch/xtensa/include/uapi/asm/socket.h
>> @@ -91,4 +91,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* _XTENSA_SOCKET_H */
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 6767d75ecb17693eb59a99b8218da4319854ccc0..7789b59c0c400eb99f65d1f0e03cd9773664cf93 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -273,6 +273,7 @@ struct cg_proto;
>>    *    @sk_rcvtimeo: %SO_RCVTIMEO setting
>>    *    @sk_sndtimeo: %SO_SNDTIMEO setting
>>    *    @sk_rxhash: flow hash received from netif layer
>> +  *    @sk_incoming_cpu: record cpu processing incoming packets
>>    *    @sk_txhash: computed flow hash for use on transmit
>>    *    @sk_filter: socket filtering instructions
>>    *    @sk_protinfo: private area, net family specific, when not using slab
>> @@ -350,6 +351,12 @@ struct sock {
>>  #ifdef CONFIG_RPS
>>         __u32                   sk_rxhash;
>>  #endif
>> +       u16                     sk_incoming_cpu;
>> +       /* 16bit hole
>> +        * Warned : sk_incoming_cpu can be set from softirq,
>> +        * Do not use this hole without fully understanding possible issues.
>> +        */
>> +
>>         __u32                   sk_txhash;
>>  #ifdef CONFIG_NET_RX_BUSY_POLL
>>         unsigned int            sk_napi_id;
>> @@ -833,6 +840,11 @@ static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
>>         return sk->sk_backlog_rcv(sk, skb);
>>  }
>>
>> +static inline void sk_incoming_cpu_update(struct sock *sk)
>> +{
>> +       sk->sk_incoming_cpu = raw_smp_processor_id();
>> +}
>> +
>>  static inline void sock_rps_record_flow_hash(__u32 hash)
>>  {
>>  #ifdef CONFIG_RPS
>> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
>> index ea0796bdcf88404ef0f127eb6e64ba00c16ea856..f541ccefd4acbeb4ad757be9dbf4b67f204bf21d 100644
>> --- a/include/uapi/asm-generic/socket.h
>> +++ b/include/uapi/asm-generic/socket.h
>> @@ -82,4 +82,6 @@
>>
>>  #define SO_BPF_EXTENSIONS      48
>>
>> +#define SO_INCOMING_CPU                49
>> +
>>  #endif /* __ASM_GENERIC_SOCKET_H */
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index ac56dd06c306f3712e57ce8e4724c79565589499..0725cf0cb685787b2122606437da53299fb24621 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -1213,6 +1213,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
>>                 v.val = sk->sk_max_pacing_rate;
>>                 break;
>>
>> +       case SO_INCOMING_CPU:
>> +               v.val = sk->sk_incoming_cpu;
>> +               break;
>> +
>>         default:
>>                 return -ENOPROTOOPT;
>>         }
>> @@ -1517,6 +1521,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
>>
>>                 newsk->sk_err      = 0;
>>                 newsk->sk_priority = 0;
>> +               newsk->sk_incoming_cpu = raw_smp_processor_id();
>>                 /*
>>                  * Before updating sk_refcnt, we must commit prior changes to memory
>>                  * (Documentation/RCU/rculist_nulls.txt for details)
>> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
>> index 9c7d7621466b1241f404a5ca11de809dcff2d02a..3893f51972f28271a6d27a763c05495c5c2554f7 100644
>> --- a/net/ipv4/tcp_ipv4.c
>> +++ b/net/ipv4/tcp_ipv4.c
>> @@ -1662,6 +1662,7 @@ process:
>>                 goto discard_and_relse;
>>
>>         sk_mark_napi_id(sk, skb);
>> +       sk_incoming_cpu_update(sk);
>>         skb->dev = NULL;
>>
>>         bh_lock_sock_nested(sk);
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index df19027f44f3d6fbe13dec78d3b085968dbf2329..f52b6081158e87caa5df32e8e5d27dbf314a01b1 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -1445,6 +1445,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>>         if (inet_sk(sk)->inet_daddr) {
>>                 sock_rps_save_rxhash(sk, skb);
>>                 sk_mark_napi_id(sk, skb);
>> +               sk_incoming_cpu_update(sk);
>>         }
>>
>>         rc = sock_queue_rcv_skb(sk, skb);
>> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
>> index ace29b60813cf8a1d7182ad2262cbcbd21810fa7..ac40d23204b5e55da5172c80dafd1d4854b370d5 100644
>> --- a/net/ipv6/tcp_ipv6.c
>> +++ b/net/ipv6/tcp_ipv6.c
>> @@ -1455,6 +1455,7 @@ process:
>>                 goto discard_and_relse;
>>
>>         sk_mark_napi_id(sk, skb);
>> +       sk_incoming_cpu_update(sk);
>>         skb->dev = NULL;
>>
>>         bh_lock_sock_nested(sk);
>> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
>> index 9b6809232b178c16d699ce3d152196b8c4cb096b..0125ca3daf47a4a3333e7462a11550d3e2f96875 100644
>> --- a/net/ipv6/udp.c
>> +++ b/net/ipv6/udp.c
>> @@ -577,6 +577,7 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>>         if (!ipv6_addr_any(&sk->sk_v6_daddr)) {
>>                 sock_rps_save_rxhash(sk, skb);
>>                 sk_mark_napi_id(sk, skb);
>> +               sk_incoming_cpu_update(sk);
>>         }
>>
>>         rc = sock_queue_rcv_skb(sk, skb);
>> diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
>> index d49dc2ed30adb97a809eb37902b9956c366a2862..ce469d648ffbe166f9ae1c5650f481256f31a7f8 100644
>> --- a/net/sctp/ulpqueue.c
>> +++ b/net/sctp/ulpqueue.c
>> @@ -205,9 +205,10 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
>>         if (sock_flag(sk, SOCK_DEAD) || (sk->sk_shutdown & RCV_SHUTDOWN))
>>                 goto out_free;
>>
>> -       if (!sctp_ulpevent_is_notification(event))
>> +       if (!sctp_ulpevent_is_notification(event)) {
>>                 sk_mark_napi_id(sk, skb);
>> -
>> +               sk_incoming_cpu_update(sk);
>> +       }
>>         /* Check if the user wishes to receive this event.  */
>>         if (!sctp_ulpevent_is_enabled(event, &sctp_sk(sk)->subscribe))
>>                 goto out_free;
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Michael Kerrisk Linux man-pages maintainer;
> http://www.kernel.org/doc/man-pages/
> Author of "The Linux Programming Interface", http://blog.man7.org/



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: Device Tree Binding for Marvell DSA Switch on imx28 board over Mdio Interface
From: Florian Fainelli @ 2014-11-14 17:09 UTC (permalink / raw)
  To: Oliver Graute; +Cc: netdev
In-Reply-To: <CA+KjHfa=Tqs=CWe6TT+rbmc9UaFZghqW1OtigPm9wOyXgN2AOQ@mail.gmail.com>

On 11/14/2014 06:52 AM, Oliver Graute wrote:
> On Fri, Nov 14, 2014 at 8:39 AM, Oliver Graute <oliver.graute@gmail.com> wrote:
>> On Thu, Nov 13, 2014 at 9:03 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> On 11/13/2014 07:15 AM, Oliver Graute wrote:
>>>> Hello Florian,
>>>>
>>>> On Wed, Nov 12, 2014 at 8:19 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>> On 11/12/2014 05:07 AM, Oliver Graute wrote:
>>>>>> Hello,
>>>>>>
>>>>>> how do I specify the DSA node and the MDIO node in the Device Tree
>>>>>> Binding to integrate a Marvell 88e6071 switch with a imx28 board?
>>>>>>
>>>>>> On my board the Marvell switch 88e6071 is connected via phy1 (on a
>>>>>> imx28 PCB) to phy5 on the Marvell switch (on a Switch PCB). All phys
>>>>>> are connected via the same MDIO Bus.
>>>>>>
>>>>>> I enabled the Marvell DSA Support Driver, Gianfar Ethernet Driver and
>>>>>> Freescale PQ MDIO Driver in the Kernel (I' am not sure if this is the
>>>>>> right choice for imx28 fec ethernet controller is it?)
>>>>>>
>>>>
>>>> I changed my DeviceTree according to your proposal. Now I got a ENODEV 19
>>>> in dsa_of_probe. Because  of_find_device_by_node(ethernet) is returning 0.
>>>> Is my ethernet setting still wrong?
>>>
>>> Is your ethernet driver also modular? If so, you will need it to be
>>> loaded *before* dsa. of_find_device_by_node() also needs the ethernet
>>> driver to be a platform_driver.
>>
>> No my Freescale FEC PHY driver is not a module. FEC is a imx28/arm
>> platform driver or not?
>>
>> I loaded the DSA as a Kernel module to make sure that the DSA probing
>> is happening when the switch is really on. I enable the SWITCH ON Pin
>> on bootup with a systemd started script. Then I write some registers
>> on the switch with a userspace mii tool. This manually writing of some
>> switch registers works fine via the MII Bus using ioctl(SIOCGMIIPHY).
>>
>> But i would like to integrate the switch with a full dsa driver.
>> currently its failing with dsa_of_probe returns=-19
>>
> 
> the dsa_core driver is probing the mii_bus before eth0 and eth1 are
> detected via the FEC Driver.

DSA is typically built into the kernel and few people have actually
tried to make modules work with it. You may have to play with
EPROBE_DEFER and similar to satisfy the ordering. I would start with
building these drivers in the kernel, make them work together, and then
see what is missing to make it work in a modular build configuration.

> 
> [   20.716253] !!!!!enter dsa_init_module!!!!!
> [   20.777046] !!!!Enter dsa Probe!!!!!
> [   20.803422] Distributed Switch Architecture driver version 0.1
> [   20.809295] !!!!!Enter dsa_of_probe!!!!!
> [   20.888268] !!!!!mdio->name=mdio mdio->type=mdio
> mdio->full_name=/mdio@800f0040 !!!!!
> [   20.999618] !!!!!np->name=dsa np->type=<NULL> np->full_name=/dsa@0 !!!!!
> [   21.097805] !!!!before of_mdio_find_bus!!!!!
> [   21.137278] !!!!!enter of_mdio_find_bus!!!!!
> [   21.190232] !!!!!enter of_mdio_bus_match!!!!!
> [   21.194635] !!!!!enter of_mdio_bus_match!!!!!
> [   21.199000] !!!!!enter of_mdio_bus_match!!!!!
> [   21.300627] !!!!Leave of_mdio_find_bus !!!!!
> [   21.304949] !!!!after of_mdio_find_bus mdio_bus=Freescale
> PowerQUICC MII Bus !!!!!
> [   21.456904] !!!!before of_parse_phandle dsa,ethernet!!!!!
> [   21.570569] !!!!before of find_device_by_node!!!!!
> [   21.575416] !!!!!ethernet->name=ethernet ethernet->type=<NULL>
> ethernet->full_name=/ahb@80080000/ethernet@800f4000 !!!!!
> [   21.860234] !!!!! enter of_find_device_by_node !!!!!
> [   21.865284] !!!!! Leave of_find_device_by_node dev=c790fe10 !!!!!
> [   21.970600] !!!!! dev->init_name=(null) !!!!!
> [   21.975001] before to_platform_device test->name=800f4000.ethernet
> [   22.088915] !!!!before of kzalloc!!!!!
> [   22.134753] !!!!before pd->netdev!!!!!
> [   22.138548] !!!!before dev_to_net_device!!!!!
> [   22.210241] !!!!dev_put(dev)!!!!!
> [   22.213600] !!!!kzalloc!!!!!
> [   22.216493] !!!!platform_set_drv_data!!!!!
> [   22.313247] !!!!!enter dev_to_mii_bus!!!!!
> [   22.317393] !!!!!enter dsa_switch_setup!!!!!
> [   22.394756] !!!!name=!!!!!
> [   22.397691] !!!!bus->name=Freescale PowerQUICC MII Bus!!!!!
> [   22.502050] !!!!pd->sw_addr=3!!!!!
> [   22.505489] !!!!Enter dsa_switch_probe!!!!!
> [   22.509685] !!!!Leave dsa_switch_probe!!!!!
> [   22.630239] eth1[0]: could not detect attached switch
> [   22.635337] eth1[0]: couldn't create dsa switch instance (error -22)
> [   22.740538] !!!!Leave dsa Probe!!!!!
> [   22.794305] !!!!!leave dsa_init_module!!!!!
> 
> [   65.954070] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [Micrel KSZ8041] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
> [   66.067135] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   66.532877] fec 800f4000.ethernet eth1: Freescale FEC PHY driver
> [Micrel KSZ8041] (mii_bus:phy_addr=800f0000.etherne:01, irq=-1)
> 
> if i manually rmmod and modprobe the dsa_core driver  after FEC PHY
> detection again i got a  EEXIST 17
> 
> modprobe dsa_core
> [  212.770578] !!!!!enter dsa_init_module!!!!!
> [  212.775121] !!!!Enter dsa Probe!!!!!
> [  212.778726] Distributed Switch Architecture driver version 0.1
> [  212.791071] !!!!!Enter dsa_of_probe!!!!!
> [  212.795191] !!!!!mdio->name=mdio mdio->type=mdio
> mdio->full_name=/mdio@800f0040 !!!!!
> [  212.805452] !!!!!np->name=dsa np->type=<NULL> np->full_name=/dsa@0 !!!!!
> [  212.813355] !!!!before of_mdio_find_bus!!!!!
> [  212.817669] !!!!!enter of_mdio_find_bus!!!!!
> [  212.823707] !!!!!enter of_mdio_bus_match!!!!!
> [  212.828111] !!!!!enter of_mdio_bus_match!!!!!
> [  212.834213] !!!!!enter of_mdio_bus_match!!!!!
> [  212.838620] !!!!Leave of_mdio_find_bus !!!!!
> [  212.844655] !!!!after of_mdio_find_bus mdio_bus=Freescale
> PowerQUICC MII Bus !!!!!
> [  212.853684] !!!!before of_parse_phandle dsa,ethernet!!!!!
> [  212.859179] !!!!before of find_device_by_node!!!!!
> [  212.866019] !!!!!ethernet->name=ethernet ethernet->type=<NULL>
> ethernet->full_name=/ahb@80080000/ethernet@800f4000 !!!!!
> [  212.878029] !!!!! enter of_find_device_by_node !!!!!
> [  212.884159] !!!!! Leave of_find_device_by_node dev=c790fe10 !!!!!
> [  212.891366] !!!!! dev->init_name=(null) !!!!!
> [  212.895769]
> [  212.895769] before to_platform_device test->name=800f4000.ethernet
> [  212.905738] !!!!before of kzalloc!!!!!
> [  212.909586] !!!!before pd->netdev!!!!!
> [  212.915402] !!!!before dev_to_net_device!!!!!
> [  212.919817] !!!!dev_put(dev)!!!!!
> [  212.925431] dsa: probe of dsa.5 failed with error -17
> [  212.936922] !!!!!leave dsa_init_module!!!!!
> 
> 
> best regards,
> 
> Oliver
> 

^ permalink raw reply

* Re: [PATCH net-next] net: introduce SO_INCOMING_CPU
From: Eric Dumazet @ 2014-11-14 17:00 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: David Miller, netdev, Ying Cai, Willem de Bruijn, Neal Cardwell,
	Linux API
In-Reply-To: <CAHO5Pa0OGtgbUp4R287jbK2SFSpVUoXWCJybvHTFsGyCLynqLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, 2014-11-14 at 09:05 +0100, Michael Kerrisk wrote:
> Hi Eric,
> 
> Since this is an API change ( Documentation/SubmitChecklist),
> linux-api@ should be CCed.

Right, I am sorry !

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] icmp: Remove some spurious dropped packet profile hits from the ICMP path
From: Eric Dumazet @ 2014-11-14 16:58 UTC (permalink / raw)
  To: Rick Jones; +Cc: Rick Jones, netdev, davem
In-Reply-To: <54662AF9.4050002@hp.com>

On Fri, 2014-11-14 at 08:16 -0800, Rick Jones wrote:

> I thought the point of the drop profiling was to show where the drops 
> were happening.  Leaving the kfree_skb() up in icmp_rcv() does not 
> improve showing where the drops happened.  That is why I've pushed it 
> down into the routines called by icmp_rcv().

OK, but we drop an icmp message, and that really should be enough.

The point is that most normal icmp messages won't be dropped, but
consumed.

I am not sure we want to bloat the kernel for such a minor problem.

^ permalink raw reply

* [PATCH 8/8] netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

From: bill bonaparte <programme110@gmail.com>

After removal of the central spinlock nf_conntrack_lock, in
commit 93bb0ceb75be2 ("netfilter: conntrack: remove central
spinlock nf_conntrack_lock"), it is possible to race against
get_next_corpse().

The race is against the get_next_corpse() cleanup on
the "unconfirmed" list (a per-cpu list with separate locking),
which sets the DYING bit.

Fix this race in __nf_conntrack_confirm() by removing the CT
from the unconfirmed list before checking the DYING bit.  If the
race occurred, re-add the CT to the dying list.

While at it, fix the coding style of the comment that has been
updated.

Fixes: 93bb0ceb75be2 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5016a69..2c69975 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -611,12 +611,16 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	 */
 	NF_CT_ASSERT(!nf_ct_is_confirmed(ct));
 	pr_debug("Confirming conntrack %p\n", ct);
-	/* We have to check the DYING flag inside the lock to prevent
-	   a race against nf_ct_get_next_corpse() possibly called from
-	   user context, else we insert an already 'dead' hash, blocking
-	   further use of that particular connection -JM */
+
+	/* We have to check the DYING flag after unlink to prevent
+	 * a race against nf_ct_get_next_corpse() possibly called from
+	 * user context, else we insert an already 'dead' hash, blocking
+	 * further use of that particular connection -JM.
+	 */
+	nf_ct_del_from_dying_or_unconfirmed_list(ct);
 
 	if (unlikely(nf_ct_is_dying(ct))) {
+		nf_ct_add_to_dying_list(ct);
 		nf_conntrack_double_unlock(hash, reply_hash);
 		local_bh_enable();
 		return NF_ACCEPT;
@@ -636,8 +640,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
 			goto out;
 
-	nf_ct_del_from_dying_or_unconfirmed_list(ct);
-
 	/* Timer relative to confirmation time, not original
 	   setting time, otherwise we'd get timer wrap in
 	   weird delay cases. */
-- 
1.7.10.4


^ permalink raw reply related

* [PATCH 6/8] netfilter: nft_compat: use the match->table to validate dependencies
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

Instead of the match->name, which is of course not relevant.

Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_compat.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 70dc965..265e190 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -346,7 +346,7 @@ nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 	union nft_entry e = {};
 	int ret;
 
-	ret = nft_compat_chain_validate_dependency(match->name, ctx->chain);
+	ret = nft_compat_chain_validate_dependency(match->table, ctx->chain);
 	if (ret < 0)
 		goto err;
 
@@ -420,7 +420,7 @@ static int nft_match_validate(const struct nft_ctx *ctx,
 		if (!(hook_mask & match->hooks))
 			return -EINVAL;
 
-		ret = nft_compat_chain_validate_dependency(match->name,
+		ret = nft_compat_chain_validate_dependency(match->table,
 							   ctx->chain);
 		if (ret < 0)
 			return ret;
-- 
1.7.10.4


^ permalink raw reply related

* [PATCH 5/8] netfilter: nft_compat: relax chain type validation
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

Check for nat chain dependency only, which is the one that can
actually crash the kernel. Don't care if mangle, filter and security
specific matches and targets are used out of their scope; they are
harmless.

This restores iptables-compat with mangle specific matches/targets when
used out of the OUTPUT chain. Those are actually emulated through filter
chains, which broke when performing strict validation.

Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_compat.c |   32 ++------------------------------
 1 file changed, 2 insertions(+), 30 deletions(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index b92f129..70dc965 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -21,45 +21,17 @@
 #include <linux/netfilter_ipv6/ip6_tables.h>
 #include <net/netfilter/nf_tables.h>
 
-static const struct {
-       const char	*name;
-       u8		type;
-} table_to_chaintype[] = {
-       { "filter",     NFT_CHAIN_T_DEFAULT },
-       { "raw",        NFT_CHAIN_T_DEFAULT },
-       { "security",   NFT_CHAIN_T_DEFAULT },
-       { "mangle",     NFT_CHAIN_T_ROUTE },
-       { "nat",        NFT_CHAIN_T_NAT },
-       { },
-};
-
-static int nft_compat_table_to_chaintype(const char *table)
-{
-	int i;
-
-	for (i = 0; table_to_chaintype[i].name != NULL; i++) {
-		if (strcmp(table_to_chaintype[i].name, table) == 0)
-			return table_to_chaintype[i].type;
-	}
-
-	return -1;
-}
-
 static int nft_compat_chain_validate_dependency(const char *tablename,
 						const struct nft_chain *chain)
 {
-	enum nft_chain_type type;
 	const struct nft_base_chain *basechain;
 
 	if (!tablename || !(chain->flags & NFT_BASE_CHAIN))
 		return 0;
 
-	type = nft_compat_table_to_chaintype(tablename);
-	if (type < 0)
-		return -EINVAL;
-
 	basechain = nft_base_chain(chain);
-	if (basechain->type->type != type)
+	if (strcmp(tablename, "nat") == 0 &&
+	    basechain->type->type != NFT_CHAIN_T_NAT)
 		return -EINVAL;
 
 	return 0;
-- 
1.7.10.4


^ permalink raw reply related
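
The relaxed check in patch 5/8 can be illustrated outside the kernel. The following is only a sketch under stated assumptions: `demo_chain_type`, `demo_validate_dependency()` and the `is_base_chain` parameter are hypothetical stand-ins for the nf_tables types and the NFT_BASE_CHAIN flag test, not the kernel API. It shows the shape of the logic: only a "nat" table name combined with a non-NAT base chain is rejected.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chain types mirroring the NFT_CHAIN_T_* names in the patch. */
enum demo_chain_type {
	DEMO_CHAIN_T_DEFAULT,
	DEMO_CHAIN_T_ROUTE,
	DEMO_CHAIN_T_NAT,
};

/* Only reject the one combination the patch treats as dangerous:
 * a "nat" table extension used on a base chain that is not of NAT type.
 * A NULL table name or a non-base chain is accepted, as in the patch.
 */
int demo_validate_dependency(const char *tablename, int is_base_chain,
			     enum demo_chain_type chain_type)
{
	if (!tablename || !is_base_chain)
		return 0;

	if (strcmp(tablename, "nat") == 0 && chain_type != DEMO_CHAIN_T_NAT)
		return -EINVAL;

	return 0;
}
```

With this shape, a "mangle" extension on a default (filter) chain passes, which is the iptables-compat case the commit message says strict validation had broken.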

* [PATCH 4/8] netfilter: nft_compat: use current net namespace
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

Instead of init_net when using xtables over nftables compat.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_compat.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 9d6d6f6..b92f129 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -117,7 +117,7 @@ nft_target_set_tgchk_param(struct xt_tgchk_param *par,
 			   struct xt_target *target, void *info,
 			   union nft_entry *entry, u8 proto, bool inv)
 {
-	par->net	= &init_net;
+	par->net	= ctx->net;
 	par->table	= ctx->table->name;
 	switch (ctx->afi->family) {
 	case AF_INET:
@@ -324,7 +324,7 @@ nft_match_set_mtchk_param(struct xt_mtchk_param *par, const struct nft_ctx *ctx,
 			  struct xt_match *match, void *info,
 			  union nft_entry *entry, u8 proto, bool inv)
 {
-	par->net	= &init_net;
+	par->net	= ctx->net;
 	par->table	= ctx->table->name;
 	switch (ctx->afi->family) {
 	case AF_INET:
-- 
1.7.10.4


^ permalink raw reply related

* [PATCH 2/8] netfilter: ipset: small potential read beyond the end of buffer
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

From: Dan Carpenter <dan.carpenter@oracle.com>

We could be reading 8 bytes from a 4 byte buffer here.  It seems
harmless but adding a check is the right thing to do and it silences a
static checker warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 86f9d76..d259da3 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1863,6 +1863,12 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
 	if (*op < IP_SET_OP_VERSION) {
 		/* Check the version at the beginning of operations */
 		struct ip_set_req_version *req_version = data;
+
+		if (*len < sizeof(struct ip_set_req_version)) {
+			ret = -EINVAL;
+			goto done;
+		}
+
 		if (req_version->version != IPSET_PROTOCOL) {
 			ret = -EPROTO;
 			goto done;
-- 
1.7.10.4


^ permalink raw reply related
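
The length check added in patch 2/8 is an instance of a general rule: verify that a caller-supplied buffer is large enough before interpreting it as a structure. Below is a minimal userspace sketch of that rule. `struct demo_req_version`, `DEMO_PROTOCOL` and `demo_check_version()` are hypothetical names chosen for illustration; they are not the ipset definitions, and the protocol value is arbitrary.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request header; stands in for struct ip_set_req_version. */
struct demo_req_version {
	unsigned int op;
	unsigned int version;
};

#define DEMO_PROTOCOL 6	/* illustrative value, not IPSET_PROTOCOL */

/* Validate a user-supplied buffer of 'len' bytes before reading the header.
 * Returns 0 on success, -EINVAL if the buffer is too short, and -EPROTO
 * if the version does not match.
 */
int demo_check_version(const void *data, size_t len)
{
	struct demo_req_version req;

	if (len < sizeof(req))
		return -EINVAL;

	/* Copy out instead of casting, so alignment is not a concern. */
	memcpy(&req, data, sizeof(req));
	if (req.version != DEMO_PROTOCOL)
		return -EPROTO;

	return 0;
}
```

The ordering matters: the size test comes first, so the version field is never read from a buffer that only holds the leading `op` word.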

* [PATCH 0/8] Netfilter/IPVS fixes for net
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains Netfilter updates for your net tree,
they are:

1) Fix missing initialization of the range structure (allocated in the
   stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.

2) Make sure the data we receive from userspace contains the req_version
   structure, otherwise return an error on truncated input.
   From Dan Carpenter.

3) Fix handling of skb->sk which may cause incorrect handling
   of connections from a local process. Via Simon Horman, patch from
   Calvin Owens.

4) Fix wrong netns in nft_compat when setting target and match params
   structure.

5) Relax chain type validation in nft_compat that was recently included,
   this broke the matches that need to be run from the route chain type.
   Now iptables-test.py automated regression tests report success again
   and we avoid the only possible problematic case, which is the use of
   nat targets out of nat chain type.

6) Use match->table to validate the tablename, instead of the match->name.
   Again patch for nft_compat.

7) Restore the synchronous release of objects from the commit and abort
   path in nf_tables. The deferred release was causing two major problems.
   First, splats when using nft_compat, given that matches and targets may
   sleep and call_rcu is invoked from softirq context. Second, Patrick
   reported possible event notification reordering when rules refer to
   anonymous sets.

8) Fix race condition between packets that are being confirmed by
   conntrack and the ctnetlink flush operation. This happens since the
   removal of the central spinlock. Thanks to Jesper D. Brouer for looking
   into this.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit d52fdbb735c36a209f36a628d40ca9185b349ba7:

  smc91x: retrieve IRQ and trigger flags in a modern way (2014-11-01 17:04:20 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to 5195c14c8b27cc0b18220ddbf0e5ad3328a04187:

  netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse (2014-11-14 17:43:05 +0100)

----------------------------------------------------------------
Calvin Owens (1):
      ipvs: Keep skb->sk when allocating headroom on tunnel xmit

Dan Carpenter (1):
      netfilter: ipset: small potential read beyond the end of buffer

Daniel Borkmann (1):
      netfilter: nft_masq: fix uninitialized range in nft_masq_{ipv4, ipv6}_eval

Pablo Neira Ayuso (4):
      netfilter: nft_compat: use current net namespace
      netfilter: nft_compat: relax chain type validation
      netfilter: nft_compat: use the match->table to validate dependencies
      netfilter: nf_tables: restore synchronous object release from commit/abort

bill bonaparte (1):
      netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse

 include/net/netfilter/nf_tables.h  |    2 --
 net/ipv4/netfilter/nft_masq_ipv4.c |    1 +
 net/ipv6/netfilter/nft_masq_ipv6.c |    1 +
 net/netfilter/ipset/ip_set_core.c  |    6 ++++++
 net/netfilter/ipvs/ip_vs_xmit.c    |    2 ++
 net/netfilter/nf_conntrack_core.c  |   14 +++++++------
 net/netfilter/nf_tables_api.c      |   24 ++++++++--------------
 net/netfilter/nft_compat.c         |   40 ++++++------------------------------
 8 files changed, 32 insertions(+), 58 deletions(-)

^ permalink raw reply

* Re: [PATCH 1/3] arch: Introduce load_acquire() and store_release()
From: Alexander Duyck @ 2014-11-14 16:58 UTC (permalink / raw)
  To: David Laight, linux-arch@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: mikey@neuling.org, tony.luck@intel.com,
	mathieu.desnoyers@polymtl.ca, donald.c.skidmore@intel.com,
	peterz@infradead.org, benh@kernel.crashing.org,
	heiko.carstens@de.ibm.com, oleg@redhat.com, will.deacon@arm.com,
	davem@davemloft.net, michael@ellerman.id.au,
	matthew.vick@intel.com, nic_swsd@realtek.com,
	geert@linux-m68k.org, jeffrey.t.kirsher@intel.com,
	fweisbec@gmail.com, schwidefsky@de.ibm.com,
	"linux@arm.linux.org.uk" <linux@
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9F0780@AcuExch.aculab.com>


On 11/14/2014 02:45 AM, David Laight wrote:
> From: Alexander Duyck
>> It is common for device drivers to make use of acquire/release semantics
>> when dealing with descriptors stored in device memory.  On reviewing the
>> documentation and code for smp_load_acquire() and smp_store_release() as
>> well as reviewing an IBM website that goes over the use of PowerPC barriers
>> at http://www.ibm.com/developerworks/systems/articles/powerpc.html it
>> occurred to me that the same code could likely be applied to device drivers.
>>
>> As a result this patch introduces load_acquire() and store_release().  The
>> load_acquire() function can be used in the place of situations where a test
>> for ownership must be followed by a memory barrier.  The below example is
>> from ixgbe:
>>
>> 	if (!rx_desc->wb.upper.status_error)
>> 		break;
>>
>> 	/* This memory barrier is needed to keep us from reading
>> 	 * any other fields out of the rx_desc until we know the
>> 	 * descriptor has been written back
>> 	 */
>> 	rmb();
>>
>> With load_acquire() this can be changed to:
>>
>> 	if (!load_acquire(&rx_desc->wb.upper.status_error))
>> 		break;
> If I'm quickly reading the 'new' code I need to look up yet another
> function, with the 'old' code I can easily see the logic.
>
> You've also added a memory barrier to the 'break' path - which isn't needed.
>
> The driver might also have additional code that can be added before the barrier
> so reducing the cost of the barrier.
>
> The driver may also be able to perform multiple actions before a barrier is needed.
>
> Hiding barriers isn't necessarily a good idea anyway.
> If you are writing a driver you need to understand when and where they are needed.
>
> Maybe you need a new (weaker) barrier to replace rmb() on some architectures.
>
> ...
>
>
> 	David

Yeah, I think I might explore creating some lightweight barriers. The 
load/acquire stuff is a bit overkill for what is needed.

Thanks,

Alex

^ permalink raw reply
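
David's objection about the barrier on the 'break' path can be illustrated with a userspace C11 analogy. This is only a sketch under assumptions: `struct demo_desc`, `demo_poll_acquire()` and `demo_poll_fence()` are hypothetical names, and C11 atomics model ordering between CPU threads only, not the device-memory ordering that rmb() addresses in a driver. The first variant orders every load, including the one that finds the descriptor not ready; the second issues the fence only after the ownership test has passed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor: 'status' is written last by the producer. */
struct demo_desc {
	uint32_t length;
	_Atomic uint32_t status;
};

/* Variant A: acquire load. The ordering applies on every call, including
 * the "not ready" path that returns early.
 */
bool demo_poll_acquire(struct demo_desc *d, uint32_t *len)
{
	if (!atomic_load_explicit(&d->status, memory_order_acquire))
		return false;
	*len = d->length;
	return true;
}

/* Variant B: relaxed load, then an acquire fence only once the descriptor
 * is known to be ready, so the early-return path pays for no barrier.
 */
bool demo_poll_fence(struct demo_desc *d, uint32_t *len)
{
	if (!atomic_load_explicit(&d->status, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_acquire);
	*len = d->length;
	return true;
}
```

Variant B corresponds to the existing "test, then rmb()" structure quoted from ixgbe, while variant A corresponds to the proposed load_acquire() form. Whether the difference is measurable depends on the architecture.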

* [PATCH 7/8] netfilter: nf_tables: restore synchronous object release from commit/abort
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, i.e. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.

Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.

So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slower, but it's simple and it resolves the
aforementioned problems.

This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |    2 --
 net/netfilter/nf_tables_api.c     |   24 ++++++++----------------
 2 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 845c596..3ae969e 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -396,14 +396,12 @@ struct nft_rule {
 /**
  *	struct nft_trans - nf_tables object update in transaction
  *
- *	@rcu_head: rcu head to defer release of transaction data
  *	@list: used internally
  *	@msg_type: message type
  *	@ctx: transaction context
  *	@data: internal information related to the transaction
  */
 struct nft_trans {
-	struct rcu_head			rcu_head;
 	struct list_head		list;
 	int				msg_type;
 	struct nft_ctx			ctx;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 11ab4b0..66e8425 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3484,13 +3484,8 @@ static void nft_chain_commit_update(struct nft_trans *trans)
 	}
 }
 
-/* Schedule objects for release via rcu to make sure no packets are accesing
- * removed rules.
- */
-static void nf_tables_commit_release_rcu(struct rcu_head *rt)
+static void nf_tables_commit_release(struct nft_trans *trans)
 {
-	struct nft_trans *trans = container_of(rt, struct nft_trans, rcu_head);
-
 	switch (trans->msg_type) {
 	case NFT_MSG_DELTABLE:
 		nf_tables_table_destroy(&trans->ctx);
@@ -3612,10 +3607,11 @@ static int nf_tables_commit(struct sk_buff *skb)
 		}
 	}
 
+	synchronize_rcu();
+
 	list_for_each_entry_safe(trans, next, &net->nft.commit_list, list) {
 		list_del(&trans->list);
-		trans->ctx.nla = NULL;
-		call_rcu(&trans->rcu_head, nf_tables_commit_release_rcu);
+		nf_tables_commit_release(trans);
 	}
 
 	nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN);
@@ -3623,13 +3619,8 @@ static int nf_tables_commit(struct sk_buff *skb)
 	return 0;
 }
 
-/* Schedule objects for release via rcu to make sure no packets are accesing
- * aborted rules.
- */
-static void nf_tables_abort_release_rcu(struct rcu_head *rt)
+static void nf_tables_abort_release(struct nft_trans *trans)
 {
-	struct nft_trans *trans = container_of(rt, struct nft_trans, rcu_head);
-
 	switch (trans->msg_type) {
 	case NFT_MSG_NEWTABLE:
 		nf_tables_table_destroy(&trans->ctx);
@@ -3725,11 +3716,12 @@ static int nf_tables_abort(struct sk_buff *skb)
 		}
 	}
 
+	synchronize_rcu();
+
 	list_for_each_entry_safe_reverse(trans, next,
 					 &net->nft.commit_list, list) {
 		list_del(&trans->list);
-		trans->ctx.nla = NULL;
-		call_rcu(&trans->rcu_head, nf_tables_abort_release_rcu);
+		nf_tables_abort_release(trans);
 	}
 
 	return 0;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/8] ipvs: Keep skb->sk when allocating headroom on tunnel xmit
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

From: Calvin Owens <calvinowens@fb.com>

ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
skb, either unconditionally setting ->sk to NULL or allowing
the uninitialized ->sk from a newly allocated skb to leak through
to the caller.

This patch properly copies ->sk and increments its reference count.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_xmit.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 437a366..bd90bf8 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -846,6 +846,8 @@ ip_vs_prepare_tunneled_skb(struct sk_buff *skb, int skb_af,
 		new_skb = skb_realloc_headroom(skb, max_headroom);
 		if (!new_skb)
 			goto error;
+		if (skb->sk)
+			skb_set_owner_w(new_skb, skb->sk);
 		consume_skb(skb);
 		skb = new_skb;
 	}
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/8] netfilter: nft_masq: fix uninitialized range in nft_masq_{ipv4, ipv6}_eval
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

From: Daniel Borkmann <dborkman@redhat.com>

When transferring from the original range in nf_nat_masquerade_{ipv4,ipv6}(),
we copy over uninitialized stack values from min_proto/max_proto, because the
range variable is left uninitialized in both nft_masq_{ipv4,ipv6}_eval
functions. As we only initialize flags at this time from the nft_masq struct,
just zero out the rest.

Fixes: 9ba1f726bec09 ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nft_masq_ipv4.c |    1 +
 net/ipv6/netfilter/nft_masq_ipv6.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/netfilter/nft_masq_ipv4.c b/net/ipv4/netfilter/nft_masq_ipv4.c
index c1023c4..665de06 100644
--- a/net/ipv4/netfilter/nft_masq_ipv4.c
+++ b/net/ipv4/netfilter/nft_masq_ipv4.c
@@ -24,6 +24,7 @@ static void nft_masq_ipv4_eval(const struct nft_expr *expr,
 	struct nf_nat_range range;
 	unsigned int verdict;
 
+	memset(&range, 0, sizeof(range));
 	range.flags = priv->flags;
 
 	verdict = nf_nat_masquerade_ipv4(pkt->skb, pkt->ops->hooknum,
diff --git a/net/ipv6/netfilter/nft_masq_ipv6.c b/net/ipv6/netfilter/nft_masq_ipv6.c
index 8a7ac68..529c119 100644
--- a/net/ipv6/netfilter/nft_masq_ipv6.c
+++ b/net/ipv6/netfilter/nft_masq_ipv6.c
@@ -25,6 +25,7 @@ static void nft_masq_ipv6_eval(const struct nft_expr *expr,
 	struct nf_nat_range range;
 	unsigned int verdict;
 
+	memset(&range, 0, sizeof(range));
 	range.flags = priv->flags;
 
 	verdict = nf_nat_masquerade_ipv6(pkt->skb, &range, pkt->out);
-- 
1.7.10.4

^ permalink raw reply related
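
The zero-then-assign pattern used by the nft_masq patch above can be sketched
outside the kernel as follows. This is only an illustration: struct demo_range
and demo_range_init() are hypothetical names, and the fields do not match the
real struct nf_nat_range layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct nf_nat_range; field names are
 * illustrative only and do not match the kernel layout. */
struct demo_range {
	unsigned int flags;
	unsigned int min_proto;
	unsigned int max_proto;
};

/* Zero the whole on-stack structure first, then set the one field we
 * actually have a value for, so no stale stack bytes leak through. */
static void demo_range_init(struct demo_range *range, unsigned int flags)
{
	assert(range != NULL);
	memset(range, 0, sizeof(*range));
	range->flags = flags;
}
```

A caller that later copies min_proto/max_proto out of the structure then sees
zeroes rather than whatever happened to be on the stack.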

* [PATCH 0/8] Netfilter/IPVS fixes for net
From: Pablo Neira Ayuso @ 2014-11-14 16:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1415984329-5569-1-git-send-email-pablo@netfilter.org>

Hi David,

The following patchset contains Netfilter updates for your net tree,
they are:

1) Fix missing initialization of the range structure (allocated in the
   stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.

2) Make sure the data we receive from userspace contains the req_version
   structure, otherwise return an error on truncated input.
   From Dan Carpenter.

3) Fix handling of skb->sk which may cause incorrect handling
   of connections from a local process. Via Simon Horman, patch from
   Calvin Owens.

4) Fix wrong netns in nft_compat when setting target and match params
   structure.

5) Relax chain type validation in nft_compat that was recently included,
   this broke the matches that need to be run from the route chain type.
   Now iptables-test.py automated regression tests report success again
   and we avoid the only possible problematic case, which is the use of
   nat targets out of nat chain type.

6) Use match->table to validate the tablename, instead of the match->name.
   Again patch for nft_compat.

7) Restore the synchronous release of objects from the commit and abort
   path in nf_tables. The asynchronous release is causing two major
   problems: splats when using nft_compat, given that matches and targets
   may sleep and call_rcu is invoked from softirq context. Moreover,
   Patrick reported possible event notification reordering when rules
   refer to anonymous sets.

8) Fix race condition between packets that are being confirmed by
   conntrack and the ctnetlink flush operation. This happens since the
   removal of the central spinlock. Thanks to Jesper D. Brouer for looking
   into this.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit d52fdbb735c36a209f36a628d40ca9185b349ba7:

  smc91x: retrieve IRQ and trigger flags in a modern way (2014-11-01 17:04:20 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to 5195c14c8b27cc0b18220ddbf0e5ad3328a04187:

  netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse (2014-11-14 17:43:05 +0100)

----------------------------------------------------------------
Calvin Owens (1):
      ipvs: Keep skb->sk when allocating headroom on tunnel xmit

Dan Carpenter (1):
      netfilter: ipset: small potential read beyond the end of buffer

Daniel Borkmann (1):
      netfilter: nft_masq: fix uninitialized range in nft_masq_{ipv4, ipv6}_eval

Pablo Neira Ayuso (4):
      netfilter: nft_compat: use current net namespace
      netfilter: nft_compat: relax chain type validation
      netfilter: nft_compat: use the match->table to validate dependencies
      netfilter: nf_tables: restore synchronous object release from commit/abort

bill bonaparte (1):
      netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse

 include/net/netfilter/nf_tables.h  |    2 --
 net/ipv4/netfilter/nft_masq_ipv4.c |    1 +
 net/ipv6/netfilter/nft_masq_ipv6.c |    1 +
 net/netfilter/ipset/ip_set_core.c  |    6 ++++++
 net/netfilter/ipvs/ip_vs_xmit.c    |    2 ++
 net/netfilter/nf_conntrack_core.c  |   14 +++++++------
 net/netfilter/nf_tables_api.c      |   24 ++++++++--------------
 net/netfilter/nft_compat.c         |   40 ++++++------------------------------
 8 files changed, 32 insertions(+), 58 deletions(-)

^ permalink raw reply
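
Item 2 in the list above (rejecting truncated input from userspace) comes down
to a length check before the header is read. A minimal sketch, assuming a
made-up header layout: struct demo_req_version and demo_parse_req() are
hypothetical, and returning -EINVAL is an assumption rather than something
taken from the ipset patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request header; not the real ipset layout. */
struct demo_req_version {
	unsigned int op;
	unsigned int version;
};

/* Copy the header out of a caller-supplied buffer, rejecting input that
 * is too short to contain it. Returns 0 on success, -EINVAL otherwise. */
static int demo_parse_req(const void *buf, size_t len,
			  struct demo_req_version *out)
{
	assert(out != NULL);
	if (buf == NULL || len < sizeof(*out))
		return -EINVAL;
	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

The point is that the size comparison happens before any byte of the buffer
is interpreted as the structure.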

* Re: [PATCH v5 4/8] net: can: c_can: Add syscon/regmap RAMINIT mechanism
From: Roger Quadros @ 2014-11-14 16:42 UTC (permalink / raw)
  To: Marc Kleine-Budde, wg
  Cc: wsa, tony, tglx, mugunthanvnm, george.cherian, balbi, nsekhar, nm,
	sergei.shtylyov, linux-omap, linux-can, netdev
In-Reply-To: <54662E9B.9030104@pengutronix.de>

On 11/14/2014 06:32 PM, Marc Kleine-Budde wrote:
> On 11/14/2014 04:37 PM, Roger Quadros wrote:
>> Some TI SoCs like DRA7 have a RAMINIT register specification
>> different from the other AMxx SoCs and as expected by the
>> existing driver.
>>
>> To add more insanity, this register is shared with other
>> IPs like DSS, PCIe and PWM.
>>
>> Provides a more generic mechanism to specify the RAMINIT
>> register location and START/DONE bit position and use the
>> syscon/regmap framework to access the register.
>>
>> Signed-off-by: Roger Quadros <rogerq@ti.com>
>> ---
>>  .../devicetree/bindings/net/can/c_can.txt          |   3 +
>>  drivers/net/can/c_can/c_can.h                      |  11 +-
>>  drivers/net/can/c_can/c_can_platform.c             | 113 ++++++++++++++-------
>>  3 files changed, 87 insertions(+), 40 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt
>> index 8f1ae81..a3ca3ee 100644
>> --- a/Documentation/devicetree/bindings/net/can/c_can.txt
>> +++ b/Documentation/devicetree/bindings/net/can/c_can.txt
>> @@ -12,6 +12,9 @@ Required properties:
>>  Optional properties:
>>  - ti,hwmods		: Must be "d_can<n>" or "c_can<n>", n being the
>>  			  instance number
>> +- syscon-raminit	: Handle to system control region that contains the
>> +			  RAMINIT register, register offset to the RAMINIT
>> +			  register and the CAN instance number (0 offset).
>>  
>>  Note: "ti,hwmods" field is used to fetch the base address and irq
>>  resources from TI, omap hwmod data base during device registration.
>> diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
>> index 3c305a1..0e17c7b 100644
>> --- a/drivers/net/can/c_can/c_can.h
>> +++ b/drivers/net/can/c_can/c_can.h
>> @@ -179,6 +179,14 @@ struct c_can_driver_data {
>>  	bool raminit_pulse;	/* If set, sets and clears START bit (pulse) */
>>  };
>>  
>> +/* Out of band RAMINIT register access via syscon regmap */
>> +struct c_can_raminit {
>> +	struct regmap *syscon;	/* for raminit ctrl. reg. access */
>> +	unsigned int reg;	/* register index within syscon */
>> +	u8 start_bit;
>> +	u8 done_bit;
>> +};
>> +
>>  /* c_can private data structure */
>>  struct c_can_priv {
>>  	struct can_priv can;	/* must be the first member */
>> @@ -196,8 +204,7 @@ struct c_can_priv {
>>  	const u16 *regs;
>>  	void *priv;		/* for board-specific data */
>>  	enum c_can_dev_id type;
>> -	u32 __iomem *raminit_ctrlreg;
>> -	int instance;
>> +	struct c_can_raminit raminit_sys;	/* RAMINIT via syscon regmap */
>>  	void (*raminit) (const struct c_can_priv *priv, bool enable);
>>  	u32 comm_rcv_high;
>>  	u32 rxmasked;
>> diff --git a/drivers/net/can/c_can/c_can_platform.c b/drivers/net/can/c_can/c_can_platform.c
>> index 1546c2b..89739a1 100644
>> --- a/drivers/net/can/c_can/c_can_platform.c
>> +++ b/drivers/net/can/c_can/c_can_platform.c
>> @@ -32,14 +32,13 @@
>>  #include <linux/clk.h>
>>  #include <linux/of.h>
>>  #include <linux/of_device.h>
>> +#include <linux/mfd/syscon.h>
>> +#include <linux/regmap.h>
>>  
>>  #include <linux/can/dev.h>
>>  
>>  #include "c_can.h"
>>  
>> -#define CAN_RAMINIT_START_MASK(i)	(0x001 << (i))
>> -#define CAN_RAMINIT_DONE_MASK(i)	(0x100 << (i))
>> -#define CAN_RAMINIT_ALL_MASK(i)		(0x101 << (i))
>>  #define DCAN_RAM_INIT_BIT		(1 << 3)
>>  static DEFINE_SPINLOCK(raminit_lock);
>>  /*
>> @@ -72,47 +71,61 @@ static void c_can_plat_write_reg_aligned_to_32bit(const struct c_can_priv *priv,
>>  	writew(val, priv->base + 2 * priv->regs[index]);
>>  }
>>  
>> -static void c_can_hw_raminit_wait_ti(const struct c_can_priv *priv, u32 mask,
>> -				  u32 val)
>> +static void c_can_hw_raminit_wait_syscon(const struct c_can_priv *priv,
>> +					 u32 mask, u32 val)
>>  {
>>  	int timeout = 0;
>> +	const struct c_can_raminit *raminit = &priv->raminit_sys;
>> +	u32 ctrl;
>> +
>>  	/* We look only at the bits of our instance. */
>>  	val &= mask;
>> -	while ((readl(priv->raminit_ctrlreg) & mask) != val) {
>> +	do {
>>  		udelay(1);
>>  		timeout++;
>>  
>> +		regmap_read(raminit->syscon, raminit->reg, &ctrl);
>>  		if (timeout == 1000) {
>>  			dev_err(&priv->dev->dev, "%s: time out\n", __func__);
>>  			break;
>>  		}
>> -	}
>> +	} while ((ctrl & mask) != val);
>>  }
>>  
>> -static void c_can_hw_raminit_ti(const struct c_can_priv *priv, bool enable)
>> +static void c_can_hw_raminit_syscon(const struct c_can_priv *priv, bool enable)
>>  {
>> -	u32 mask = CAN_RAMINIT_ALL_MASK(priv->instance);
>> +	u32 mask;
>>  	u32 ctrl;
>> +	const struct c_can_raminit *raminit = &priv->raminit_sys;
>> +	u8 start_bit, done_bit;
>> +
>> +	start_bit = raminit->start_bit;
>> +	done_bit = raminit->done_bit;
>>  
>>  	spin_lock(&raminit_lock);
>>  
>> -	ctrl = readl(priv->raminit_ctrlreg);
>> +	mask = 1 << start_bit | 1 << done_bit;
>> +	regmap_read(raminit->syscon, raminit->reg, &ctrl);
>> +
>>  	/* We clear the done and start bit first. The start bit is
>>  	 * looking at the 0 -> transition, but is not self clearing;
>>  	 * And we clear the init done bit as well.
>> +	 * NOTE: DONE must be written with 1 to clear it.
>>  	 */
>> -	ctrl &= ~CAN_RAMINIT_START_MASK(priv->instance);
>> -	ctrl |= CAN_RAMINIT_DONE_MASK(priv->instance);
>> -	writel(ctrl, priv->raminit_ctrlreg);
>> -	ctrl &= ~CAN_RAMINIT_DONE_MASK(priv->instance);
>> -	c_can_hw_raminit_wait_ti(priv, mask, ctrl);
>> +	ctrl &= ~(1 << start_bit);
>> +	ctrl |= 1 << done_bit;
>> +	regmap_write(raminit->syscon, raminit->reg, ctrl);
>> +
>> +	ctrl &= ~(1 << done_bit);
>> +	c_can_hw_raminit_wait_syscon(priv, mask, ctrl);
>>  
>>  	if (enable) {
>>  		/* Set start bit and wait for the done bit. */
>> -		ctrl |= CAN_RAMINIT_START_MASK(priv->instance);
>> -		writel(ctrl, priv->raminit_ctrlreg);
>> -		ctrl |= CAN_RAMINIT_DONE_MASK(priv->instance);
>> -		c_can_hw_raminit_wait_ti(priv, mask, ctrl);
>> +		ctrl |= 1 << start_bit;
>> +		regmap_write(raminit->syscon, raminit->reg, ctrl);
>> +
>> +		ctrl |= 1 << done_bit;
>> +		c_can_hw_raminit_wait_syscon(priv, mask, ctrl);
>>  	}
>>  	spin_unlock(&raminit_lock);
>>  }
> 
> My arm gcc-4.7.2 spits out these warnings; I'll initialize ctrl to 0.

My 4.7.3 doesn't. Initializing to 0 is fine as well.

cheers,
-roger

> 
>> drivers/net/can/c_can/c_can_platform.c: In function 'c_can_hw_raminit_wait_syscon':
>> drivers/net/can/c_can/c_can_platform.c:92:17: warning: 'ctrl' may be used uninitialized in this function [-Wuninitialized]
>> drivers/net/can/c_can/c_can_platform.c: In function 'c_can_hw_raminit_syscon':
>> drivers/net/can/c_can/c_can_platform.c:115:7: warning: 'ctrl' is used uninitialized in this function [-Wuninitialized]
> 
> Marc
> 


^ permalink raw reply
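
The bounded poll in the patch quoted above, together with the "initialize
ctrl to 0" fix, can be sketched in plain C. This is a userspace illustration
under stated assumptions: struct demo_reg and demo_reg_read() are hypothetical
stand-ins for the syscon regmap and regmap_read(), and the loop is reshaped
into a for loop that also checks the read's return value, which the patch
itself does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fake register: reads return 0 until 'ready_after' reads
 * have happened, then return 'final'. Stands in for regmap_read(). */
struct demo_reg {
	uint32_t final;
	int ready_after;
	int reads;
};

static int demo_reg_read(struct demo_reg *reg, uint32_t *val)
{
	reg->reads++;
	*val = (reg->reads >= reg->ready_after) ? reg->final : 0;
	return 0;
}

/* Poll until (register & mask) == (val & mask) or until max_tries reads
 * have been made. Returns true on match, false on timeout or read error. */
static bool demo_wait_bits(struct demo_reg *reg, uint32_t mask, uint32_t val,
			   int max_tries)
{
	uint32_t ctrl = 0;	/* initialized up front, as proposed above */
	int tries;

	assert(max_tries > 0);
	val &= mask;		/* look only at the bits of our instance */
	for (tries = 0; tries < max_tries; tries++) {
		if (demo_reg_read(reg, &ctrl) != 0)
			return false;
		if ((ctrl & mask) == val)
			return true;
	}
	return false;
}
```

Returning a status lets the caller distinguish a timeout from success instead
of only logging it.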

* Re: [PATCH nf] netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse
From: Pablo Neira Ayuso @ 2014-11-14 16:40 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: programme110, netfilter-devel, Florian Westphal, netdev,
	Patrick McHardy, Joerg Marx
In-Reply-To: <20141112083500.5404e5f4@redhat.com>

On Wed, Nov 12, 2014 at 08:35:00AM +0100, Jesper Dangaard Brouer wrote:
> > > -	/* We have to check the DYING flag inside the lock to prevent
> > > +
> > > +	/* We have to check the DYING flag after unlink to prevent
> > >  	   a race against nf_ct_get_next_corpse() possibly called from
> > >  	   user context, else we insert an already 'dead' hash, blocking
> > >  	   further use of that particular connection -JM */
> > 
> > While at this, I think it would be good to fix comment style to:
> > 
> >         /* We have ...
> >          * ...
> >          */
> > 
> > I can fix this here, no need to resend, just let me know.
> 
> Okay, I was just trying to keep the changes as minimal as possible, if
> this should go into a stable-kernel.  Your choice.

I'm going to take this patch including the comment style fix. I would
like to avoid separate patches that only fix coding style issues, and the
first line of this comment is being updated anyway. I think the patch will
still be small enough to fulfill -stable rules.

I'll send a follow-up patch to change the return verdict to NF_DROP so as
not to mix up different things.

Thanks!

^ permalink raw reply

* Re: [PATCH v5 4/8] net: can: c_can: Add syscon/regmap RAMINIT mechanism
From: Marc Kleine-Budde @ 2014-11-14 16:32 UTC (permalink / raw)
  To: Roger Quadros, wg
  Cc: wsa, tony, tglx, mugunthanvnm, george.cherian, balbi, nsekhar, nm,
	sergei.shtylyov, linux-omap, linux-can, netdev
In-Reply-To: <546621C3.3010804@ti.com>

On 11/14/2014 04:37 PM, Roger Quadros wrote:
> Some TI SoCs like DRA7 have a RAMINIT register specification
> different from the other AMxx SoCs and as expected by the
> existing driver.
> 
> To add more insanity, this register is shared with other
> IPs like DSS, PCIe and PWM.
> 
> Provides a more generic mechanism to specify the RAMINIT
> register location and START/DONE bit position and use the
> syscon/regmap framework to access the register.
> 
> Signed-off-by: Roger Quadros <rogerq@ti.com>
> ---
>  .../devicetree/bindings/net/can/c_can.txt          |   3 +
>  drivers/net/can/c_can/c_can.h                      |  11 +-
>  drivers/net/can/c_can/c_can_platform.c             | 113 ++++++++++++++-------
>  3 files changed, 87 insertions(+), 40 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/can/c_can.txt b/Documentation/devicetree/bindings/net/can/c_can.txt
> index 8f1ae81..a3ca3ee 100644
> --- a/Documentation/devicetree/bindings/net/can/c_can.txt
> +++ b/Documentation/devicetree/bindings/net/can/c_can.txt
> @@ -12,6 +12,9 @@ Required properties:
>  Optional properties:
>  - ti,hwmods		: Must be "d_can<n>" or "c_can<n>", n being the
>  			  instance number
> +- syscon-raminit	: Handle to system control region that contains the
> +			  RAMINIT register, register offset to the RAMINIT
> +			  register and the CAN instance number (0 offset).
>  
>  Note: "ti,hwmods" field is used to fetch the base address and irq
>  resources from TI, omap hwmod data base during device registration.
> diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
> index 3c305a1..0e17c7b 100644
> --- a/drivers/net/can/c_can/c_can.h
> +++ b/drivers/net/can/c_can/c_can.h
> @@ -179,6 +179,14 @@ struct c_can_driver_data {
>  	bool raminit_pulse;	/* If set, sets and clears START bit (pulse) */
>  };
>  
> +/* Out of band RAMINIT register access via syscon regmap */
> +struct c_can_raminit {
> +	struct regmap *syscon;	/* for raminit ctrl. reg. access */
> +	unsigned int reg;	/* register index within syscon */
> +	u8 start_bit;
> +	u8 done_bit;
> +};
> +
>  /* c_can private data structure */
>  struct c_can_priv {
>  	struct can_priv can;	/* must be the first member */
> @@ -196,8 +204,7 @@ struct c_can_priv {
>  	const u16 *regs;
>  	void *priv;		/* for board-specific data */
>  	enum c_can_dev_id type;
> -	u32 __iomem *raminit_ctrlreg;
> -	int instance;
> +	struct c_can_raminit raminit_sys;	/* RAMINIT via syscon regmap */
>  	void (*raminit) (const struct c_can_priv *priv, bool enable);
>  	u32 comm_rcv_high;
>  	u32 rxmasked;
> diff --git a/drivers/net/can/c_can/c_can_platform.c b/drivers/net/can/c_can/c_can_platform.c
> index 1546c2b..89739a1 100644
> --- a/drivers/net/can/c_can/c_can_platform.c
> +++ b/drivers/net/can/c_can/c_can_platform.c
> @@ -32,14 +32,13 @@
>  #include <linux/clk.h>
>  #include <linux/of.h>
>  #include <linux/of_device.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/regmap.h>
>  
>  #include <linux/can/dev.h>
>  
>  #include "c_can.h"
>  
> -#define CAN_RAMINIT_START_MASK(i)	(0x001 << (i))
> -#define CAN_RAMINIT_DONE_MASK(i)	(0x100 << (i))
> -#define CAN_RAMINIT_ALL_MASK(i)		(0x101 << (i))
>  #define DCAN_RAM_INIT_BIT		(1 << 3)
>  static DEFINE_SPINLOCK(raminit_lock);
>  /*
> @@ -72,47 +71,61 @@ static void c_can_plat_write_reg_aligned_to_32bit(const struct c_can_priv *priv,
>  	writew(val, priv->base + 2 * priv->regs[index]);
>  }
>  
> -static void c_can_hw_raminit_wait_ti(const struct c_can_priv *priv, u32 mask,
> -				  u32 val)
> +static void c_can_hw_raminit_wait_syscon(const struct c_can_priv *priv,
> +					 u32 mask, u32 val)
>  {
>  	int timeout = 0;
> +	const struct c_can_raminit *raminit = &priv->raminit_sys;
> +	u32 ctrl;
> +
>  	/* We look only at the bits of our instance. */
>  	val &= mask;
> -	while ((readl(priv->raminit_ctrlreg) & mask) != val) {
> +	do {
>  		udelay(1);
>  		timeout++;
>  
> +		regmap_read(raminit->syscon, raminit->reg, &ctrl);
>  		if (timeout == 1000) {
>  			dev_err(&priv->dev->dev, "%s: time out\n", __func__);
>  			break;
>  		}
> -	}
> +	} while ((ctrl & mask) != val);
>  }
>  
> -static void c_can_hw_raminit_ti(const struct c_can_priv *priv, bool enable)
> +static void c_can_hw_raminit_syscon(const struct c_can_priv *priv, bool enable)
>  {
> -	u32 mask = CAN_RAMINIT_ALL_MASK(priv->instance);
> +	u32 mask;
>  	u32 ctrl;
> +	const struct c_can_raminit *raminit = &priv->raminit_sys;
> +	u8 start_bit, done_bit;
> +
> +	start_bit = raminit->start_bit;
> +	done_bit = raminit->done_bit;
>  
>  	spin_lock(&raminit_lock);
>  
> -	ctrl = readl(priv->raminit_ctrlreg);
> +	mask = 1 << start_bit | 1 << done_bit;
> +	regmap_read(raminit->syscon, raminit->reg, &ctrl);
> +
>  	/* We clear the done and start bit first. The start bit is
>  	 * looking at the 0 -> transition, but is not self clearing;
>  	 * And we clear the init done bit as well.
> +	 * NOTE: DONE must be written with 1 to clear it.
>  	 */
> -	ctrl &= ~CAN_RAMINIT_START_MASK(priv->instance);
> -	ctrl |= CAN_RAMINIT_DONE_MASK(priv->instance);
> -	writel(ctrl, priv->raminit_ctrlreg);
> -	ctrl &= ~CAN_RAMINIT_DONE_MASK(priv->instance);
> -	c_can_hw_raminit_wait_ti(priv, mask, ctrl);
> +	ctrl &= ~(1 << start_bit);
> +	ctrl |= 1 << done_bit;
> +	regmap_write(raminit->syscon, raminit->reg, ctrl);
> +
> +	ctrl &= ~(1 << done_bit);
> +	c_can_hw_raminit_wait_syscon(priv, mask, ctrl);
>  
>  	if (enable) {
>  		/* Set start bit and wait for the done bit. */
> -		ctrl |= CAN_RAMINIT_START_MASK(priv->instance);
> -		writel(ctrl, priv->raminit_ctrlreg);
> -		ctrl |= CAN_RAMINIT_DONE_MASK(priv->instance);
> -		c_can_hw_raminit_wait_ti(priv, mask, ctrl);
> +		ctrl |= 1 << start_bit;
> +		regmap_write(raminit->syscon, raminit->reg, ctrl);
> +
> +		ctrl |= 1 << done_bit;
> +		c_can_hw_raminit_wait_syscon(priv, mask, ctrl);
>  	}
>  	spin_unlock(&raminit_lock);
>  }

My arm gcc-4.7.2 spits out these warnings; I'll initialize ctrl to 0.

> drivers/net/can/c_can/c_can_platform.c: In function 'c_can_hw_raminit_wait_syscon':
> drivers/net/can/c_can/c_can_platform.c:92:17: warning: 'ctrl' may be used uninitialized in this function [-Wuninitialized]
> drivers/net/can/c_can/c_can_platform.c: In function 'c_can_hw_raminit_syscon':
> drivers/net/can/c_can/c_can_platform.c:115:7: warning: 'ctrl' is used uninitialized in this function [-Wuninitialized]

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


^ permalink raw reply

* Re: [PATCH v2 net-next 1/7] bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command
From: Alexei Starovoitov @ 2014-11-14 16:31 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: David S. Miller, Ingo Molnar, Andy Lutomirski, Daniel Borkmann,
	Eric Dumazet, Linux API, Network Development, LKML
In-Reply-To: <1415981203.15154.45.camel@localhost>

On Fri, Nov 14, 2014 at 8:06 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Fr, 2014-11-14 at 07:33 -0800, Alexei Starovoitov wrote:
>> On Fri, Nov 14, 2014 at 4:11 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > On Do, 2014-11-13 at 17:36 -0800, Alexei Starovoitov wrote:
>> >> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
>> >> either update existing map element or create a new one.
>> >> Initially the plan was to add a new command to handle the case of
>> >> 'create new element if it didn't exist', but 'flags' style looks
>> >> cleaner and overall diff is much smaller (more code reused), so add 'flags'
>> >> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
>> >>  #define BPF_ANY      0 /* create new element or update existing */
>> >>  #define BPF_NOEXIST  1 /* create new element if it didn't exist */
>> >>  #define BPF_EXIST    2 /* update existing element */
>> >
>> > Would a cmpxchg-alike function be handy here?
>>
>> you mean cmpxchg command in addition to
>> update() command ?
>> Maybe... it will have an extra 'value' argument
>> (key, old_value, new_value)
>> I don't have a use case for it yet though.
>
> I don't either. ;)
>
> I just wanted to bring this up before the user space API becomes public
> and an additional argument might cause problems.

Adding a cmpxchg command obviously won't be a problem
(just another 'new_value' field in the existing struct inside the bpf_attr union).

^ permalink raw reply
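
The three flag values quoted in the thread above can be illustrated with a toy
map. This is a sketch, not the kernel implementation: struct demo_map and
demo_map_update() are hypothetical, the map is a fixed-size array keyed by
index, and the -EEXIST/-ENOENT/-EINVAL return values are assumptions made for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define BPF_ANY		0	/* create new element or update existing */
#define BPF_NOEXIST	1	/* create new element if it didn't exist */
#define BPF_EXIST	2	/* update existing element */

#define DEMO_MAP_SIZE	8

/* Hypothetical toy map: the key is an array index, and 'used' marks
 * which elements are currently present. */
struct demo_map {
	bool used[DEMO_MAP_SIZE];
	long value[DEMO_MAP_SIZE];
};

/* Apply an update according to 'flags'. Returns 0 on success or a
 * negative errno-style code when the flag's precondition fails. */
static int demo_map_update(struct demo_map *map, unsigned int key,
			   long value, unsigned long flags)
{
	assert(map != NULL);
	if (key >= DEMO_MAP_SIZE || flags > BPF_EXIST)
		return -EINVAL;
	if (flags == BPF_NOEXIST && map->used[key])
		return -EEXIST;	/* caller wanted create-only */
	if (flags == BPF_EXIST && !map->used[key])
		return -ENOENT;	/* caller wanted update-only */
	map->used[key] = true;
	map->value[key] = value;
	return 0;
}
```

A cmpxchg-style command as discussed would additionally compare the stored
value against an expected old value before writing, which is why it needs the
extra 'new_value' argument.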
