Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH nf-next 0/2] netfilter: conntrack: route cache for forwarded connections
From: Florian Westphal @ 2014-12-08 15:36 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, brouer

[ Pablo, in case you deem this too late for -next just let me know
and I will resend once its open again ]

This adds an optional forward routing cache extension for netfilter
connection tracking.

The memory cost is an additional 32 bytes per conntrack entry
on x86_64.

Unlike any other currently implemented connection tracking
extension the rtcache has no run-time tunables, it is always active.

Also, unlike other conntrack extensions, it can be built as a module,
in this case modprobe/rmmod are used to enable/disable the cache.

Forward test using netperf UDP_STREAM between two network namespaces
(connected via veth devices), tput:

With conntrack + reverse path filtering (rp_filter sysctl=1):
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.12.2 () port 0 AF_INET
Socket  Message  Elapsed      Messages
 Size    Size     Time         Okay Errors   Throughput
 bytes   bytes    secs            #      #   10^6bits/sec

  212992      64   120.00    26333996      0     112.36
  212992           120.00    26279399            112.13

same, but with rtcache (this patch series):
  212992      64   120.00    34508693      0     147.24
  212992           120.00    34507838            147.23

same but with rp_filter=0 and no conntrack modules active:
  212992      64   120.00    42288748      0     180.43
  212992           120.00    42283439            180.41

IOW, this is only useful if conntrack is used anyway.

^ permalink raw reply

* [PATCH nf-next 2/2] netfilter: use conntrack rtcache if available
From: Florian Westphal @ 2014-12-08 15:36 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, brouer, Florian Westphal
In-Reply-To: <1418052964-4632-1-git-send-email-fw@strlen.de>

skip the reverse lookup if the iif matches the cached one.
In this case we know that a previous rpfilter check did not result
in packet drop.

This shortcut only works if rtcache is available and rule is placed
in mangle table (raw table is too early; skb->nfct will not be set).

While it would be possible to enforce rtcache, it would a)
force a dependency on conntrack and b) break backwards compatibility
since we'd have to restrict it to mangle table.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_rtcache.h | 12 +++++++++++
 net/ipv4/netfilter/ipt_rpfilter.c            |  4 +++-
 net/ipv6/netfilter/ip6t_rpfilter.c           |  4 +++-
 net/netfilter/core.c                         | 30 ++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_rtcache.h b/include/net/netfilter/nf_conntrack_rtcache.h
index e2fb302..19265ad 100644
--- a/include/net/netfilter/nf_conntrack_rtcache.h
+++ b/include/net/netfilter/nf_conntrack_rtcache.h
@@ -27,6 +27,18 @@ struct nf_conn_rtcache *nf_ct_rtcache_find(const struct nf_conn *ct)
 #endif
 }
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_RTCACHE)
+bool nf_conn_rtcache_match_dev(const struct sk_buff *skb,
+			       const struct net_device *dev);
+#else
+static inline bool
+nf_conn_rtcache_match_dev(const struct sk_buff *skb,
+			  const struct net_device *dev)
+{
+	return false;
+}
+#endif
+
 static inline int nf_conn_rtcache_iif_get(const struct nf_conn_rtcache *rtc,
 					  enum ip_conntrack_dir dir)
 {
diff --git a/net/ipv4/netfilter/ipt_rpfilter.c b/net/ipv4/netfilter/ipt_rpfilter.c
index 4bfaedf..f39e934 100644
--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -19,6 +19,8 @@
 #include <linux/netfilter/xt_rpfilter.h>
 #include <linux/netfilter/x_tables.h>
 
+#include <net/netfilter/nf_conntrack_rtcache.h>
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Florian Westphal <fw@strlen.de>");
 MODULE_DESCRIPTION("iptables: ipv4 reverse path filter match");
@@ -82,7 +84,7 @@ static bool rpfilter_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	info = par->matchinfo;
 	invert = info->flags & XT_RPFILTER_INVERT;
 
-	if (rpfilter_is_local(skb))
+	if (rpfilter_is_local(skb) || nf_conn_rtcache_match_dev(skb, par->in))
 		return true ^ invert;
 
 	iph = ip_hdr(skb);
diff --git a/net/ipv6/netfilter/ip6t_rpfilter.c b/net/ipv6/netfilter/ip6t_rpfilter.c
index 790e0c6..8db9fe7 100644
--- a/net/ipv6/netfilter/ip6t_rpfilter.c
+++ b/net/ipv6/netfilter/ip6t_rpfilter.c
@@ -16,6 +16,8 @@
 #include <linux/netfilter/xt_rpfilter.h>
 #include <linux/netfilter/x_tables.h>
 
+#include <net/netfilter/nf_conntrack_rtcache.h>
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Florian Westphal <fw@strlen.de>");
 MODULE_DESCRIPTION("Xtables: IPv6 reverse path filter match");
@@ -85,7 +87,7 @@ static bool rpfilter_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	struct ipv6hdr *iph;
 	bool invert = info->flags & XT_RPFILTER_INVERT;
 
-	if (rpfilter_is_local(skb))
+	if (rpfilter_is_local(skb) || nf_conn_rtcache_match_dev(skb, par->in))
 		return true ^ invert;
 
 	iph = ipv6_hdr(skb);
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index fea9ef5..651b4c6 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -25,6 +25,8 @@
 #include <net/net_namespace.h>
 #include <net/sock.h>
 
+#include <net/netfilter/nf_conntrack_rtcache.h>
+
 #include "nf_internals.h"
 
 static DEFINE_MUTEX(afinfo_mutex);
@@ -267,6 +269,34 @@ EXPORT_SYMBOL_GPL(nfq_ct_hook);
 struct nfq_ct_nat_hook __rcu *nfq_ct_nat_hook __read_mostly;
 EXPORT_SYMBOL_GPL(nfq_ct_nat_hook);
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK_RTCACHE)
+/* returns true if dev matches the last recorded
+ * input interface of the conntrack attached to skb.
+ *
+ * This is not in conntrack to avoid module dependency.
+ */
+bool nf_conn_rtcache_match_dev(const struct sk_buff *skb,
+			       const struct net_device *dev)
+{
+	struct nf_conn_rtcache *rtc;
+	enum ip_conntrack_info ctinfo;
+	enum ip_conntrack_dir dir;
+	struct nf_conn *ct;
+	int iif;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	rtc = nf_ct_rtcache_find(ct);
+	if (!rtc)
+		return false;
+
+	dir = CTINFO2DIR(ctinfo);
+	iif = nf_conn_rtcache_iif_get(rtc, dir);
+
+	return iif == dev->ifindex;
+}
+EXPORT_SYMBOL_GPL(nf_conn_rtcache_match_dev);
+#endif
+
 #endif /* CONFIG_NF_CONNTRACK */
 
 #ifdef CONFIG_NF_NAT_NEEDED
-- 
2.0.4

^ permalink raw reply related

* Re: [PATCH net-next 2/3] netlink: IFLA_PHYS_SWITCH_ID to IFLA_PHYS_PARENT_ID
From: Andy Gospodarek @ 2014-12-08 15:37 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, sfeldma
In-Reply-To: <20141208151714.GG1885@nanopsycho.brq.redhat.com>

On Mon, Dec 08, 2014 at 04:17:14PM +0100, Jiri Pirko wrote:
> Fri, Dec 05, 2014 at 07:02:16PM CET, gospo@cumulusnetworks.com wrote:
> >There has been much discussion about proper nomenclature to use for this
> >and I would prefer parent rather than calling every forwarding element a
> >switch.
> 
> Andy, I must say I really do not like just plain "parent". It is really
> not clear what it means as it can mean 1000 things.
> 
> I know "switch" is not ideal but everytime anyone is talking about these
> kind of forwarding devices, they use word "switch" even if it is not
> accurate and everyone knows what they are talking about. Nobody uses
> "parent".

Well of course they are not going to use it until it's committed.  ;-)

> For me this is nack for this patchset.

Thanks for the review.  I am not big marketing person, so it was not
clear to me what was ideal.  Due to parent already being in the code
and having as a logical description of the relationship (parent
switch/router device and sibling network interfaces -- like sibling CPU
cores on the same socket).

I do really want to collectively come up with something other than
switch for everything.  Those L3 ops with 'switch' in the name will
feel really awkward....

> 
> Jiri
> 
> >
> >Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
> >---
> > include/uapi/linux/if_link.h | 2 +-
> > net/core/rtnetlink.c         | 4 ++--
> > 2 files changed, 3 insertions(+), 3 deletions(-)
> >
> >diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> >index f7d0d2d..3d8edd8 100644
> >--- a/include/uapi/linux/if_link.h
> >+++ b/include/uapi/linux/if_link.h
> >@@ -145,7 +145,7 @@ enum {
> > 	IFLA_CARRIER,
> > 	IFLA_PHYS_PORT_ID,
> > 	IFLA_CARRIER_CHANGES,
> >-	IFLA_PHYS_SWITCH_ID,
> >+	IFLA_PHYS_PARENT_ID,
> > 	__IFLA_MAX
> > };
> > 
> >diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> >index 61cb7e7..1fe0a16 100644
> >--- a/net/core/rtnetlink.c
> >+++ b/net/core/rtnetlink.c
> >@@ -982,7 +982,7 @@ static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
> > 		return err;
> > 	}
> > 
> >-	if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
> >+	if (nla_put(skb, IFLA_PHYS_PARENT_ID, psid.id_len, psid.id))
> > 		return -EMSGSIZE;
> > 
> > 	return 0;
> >@@ -1222,7 +1222,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> > 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
> > 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
> > 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
> >-	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
> >+	[IFLA_PHYS_PARENT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
> > };
> > 
> > static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> >-- 
> >1.9.3
> >

^ permalink raw reply

* Re: [PATCH net-next v3 2/2] rocker: remove swdev mode
From: Thomas Graf @ 2014-12-08 16:18 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jiri Pirko, roopa, sfeldma, jhs, bcrl, john.fastabend, stephen,
	linville, vyasevic, netdev, davem, shm, gospo
In-Reply-To: <54858E6E.7010707@redhat.com>

On 12/08/14 at 12:41pm, Daniel Borkmann wrote:
> On 12/08/2014 12:03 PM, Jiri Pirko wrote:
> >well, I see no problem in using u16. IFLA_BRIDGE_MODE attr is u16 so
> >mode should stay u16.
> >
> >But maybe better to add:
> >#define BRIDGE_MODE_UNDEF 0xFFFF
> 
> Yep, something along these lines seems better.

Using u16 is fine but then all occurences should use it as opposed
to mixed s16/u16 usage as in v3.

^ permalink raw reply

* Re: Where exactly will arch_fast_hash be used
From: George Spelvin @ 2014-12-08 16:19 UTC (permalink / raw)
  To: hannes, linux
  Cc: davem, dborkman, herbert, linux-kernel, netdev, tgraf, tytso
In-Reply-To: <1418037908.190744.200156353.5DD668E8@webmail.messagingengine.com>

>>> In case of openvswitch it shows a performance improvment. The seed
>>> parameter could be used as an initial biasing of the crc32 function, but
>>> in case of openvswitch it is only set to 0.

>> NACK. [...]

> Sorry for being unclear, I understood that and didn't bother patching
> that '0' with a random seed exactly because of this.

And I'm sorry for delivering a long lecture on a subject you already
understood perfectly well.

I'd just been thinking about it because of Herbert's comments, so it was
conveniently at hand. :-)

Out of curiousity, what *were* you referring to when you talked
about biasing the crc32 function?  "Biasing" is a good term becuase
it just applies an offset, but what do you gain from doing that?

There are nifty things one can do with the CRC32 instruction, however.
A lot of ciphers these days use an ARX (add, rotate, XOR) kernel.
A crc32 instruction, although linear, does some very powerful rotate &
xor operations, and could replace the XOR and rotate.

^ permalink raw reply

* Re: [PATCH] drivers:atm Remove two FIXMES in the function, top_off_fp for the file, firestream.c
From: chas williams - CONTRACTOR @ 2014-12-08 16:20 UTC (permalink / raw)
  To: Nicholas Krause; +Cc: linux-atm-general, netdev, linux-kernel
In-Reply-To: <1417923348-13807-1-git-send-email-xerofoify@gmail.com>

I don't see any reason to promote qe_tmp to a u64.  I think you can
just remove the comment.  Anyone trying to build this into a 64-bit
kernel will see errors from the virt_to_bus()/bus_to_virt() usage.

fp->n seems to only be manipulated in interrupt context (after driving
initialization) so it doesn't need locking or to be atomic.

On Sat,  6 Dec 2014 22:35:48 -0500
Nicholas Krause <xerofoify@gmail.com> wrote:

> Removes two FIXMES in the function.top_off_fp. The first being that of needing the variable, qe_tmp needing to be a
> u64 type and not u32 as encoding will not work if using a 32 bit register rather then 64 bit as stated in the first
> fix me comment. In addition the second being a no longer needed comment due to not needing to atomically increment
> the variable, n as passed to the function by the pointer of fp, as part of a structure of type,freepool.
> 
> Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
> ---
>  drivers/atm/firestream.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/atm/firestream.c b/drivers/atm/firestream.c
> index 82f2ae0..06c23f6 100644
> --- a/drivers/atm/firestream.c
> +++ b/drivers/atm/firestream.c
> @@ -1477,7 +1477,7 @@ static void top_off_fp (struct fs_dev *dev, struct freepool *fp,
>  	struct FS_BPENTRY *qe, *ne;
>  	struct sk_buff *skb;
>  	int n = 0;
> -	u32 qe_tmp;
> +	u64  qe_tmp;
>  
>  	fs_dprintk (FS_DEBUG_QUEUE, "Topping off queue at %x (%d-%d/%d)\n", 
>  		    fp->offset, read_fs (dev, FP_CNT (fp->offset)), fp->n, 
> @@ -1505,14 +1505,8 @@ static void top_off_fp (struct fs_dev *dev, struct freepool *fp,
>  		ne->skb = skb;
>  		ne->fp = fp;
>  
> -		/*
> -		 * FIXME: following code encodes and decodes
> -		 * machine pointers (could be 64-bit) into a
> -		 * 32-bit register.
> -		 */
> -
>  		qe_tmp = read_fs (dev, FP_EA(fp->offset));
> -		fs_dprintk (FS_DEBUG_QUEUE, "link at %x\n", qe_tmp);
> +		fs_dprintk(FS_DEBUG_QUEUE, "link at %llx\n", qe_tmp);
>  		if (qe_tmp) {
>  			qe = bus_to_virt ((long) qe_tmp);
>  			qe->next = virt_to_bus(ne);
> @@ -1521,7 +1515,7 @@ static void top_off_fp (struct fs_dev *dev, struct freepool *fp,
>  			write_fs (dev, FP_SA(fp->offset), virt_to_bus(ne));
>  
>  		write_fs (dev, FP_EA(fp->offset), virt_to_bus (ne));
> -		fp->n++;   /* XXX Atomic_inc? */
> +		fp->n++;
>  		write_fs (dev, FP_CTU(fp->offset), 1);
>  	}
>  

^ permalink raw reply

* Antw: Re: Q: need effective backlog for listen()
From: Ulrich Windl @ 2014-12-08 16:31 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1418051901.15618.46.camel@edumazet-glaptop2.roam.corp.google.com>

>>> Eric Dumazet <eric.dumazet@gmail.com> schrieb am 08.12.2014 um 16:18 in
Nachricht <1418051901.15618.46.camel@edumazet-glaptop2.roam.corp.google.com>:
> On Mon, 2014-12-08 at 13:51 +0100, Ulrich Windl wrote:
>> (not subscribed to the list, plese keep me on CC:)
>> 
>> Hi!
>> 
>> I have a problem I could not find the answer. I suspect the problem
>> arises from Linux derivating from standard functionality...
>> 
>> I have written a server that should accept n TCP connections at most.
>> I was expecting that the backlog parameter of listen will cause extra
>> connection requests either
>> 1) to be refused
>> or
>> 2) to time out eventually
>> 
>> (The standard seems to say that extra connections are refused)
>> 
>> However none of the above see ms true. Even if my server delays
>> accept()ing new connections, no client ever sees a "connection
>> refused" or "connection timed out". Is there any chance to signal the
>> client that no more connections are accepted at the moment?
> 
> This 'standard' makes no sense to me, in light of SYNFLOOD attacks.
> 
> It actually makes SYNFLOOD attacks very effective.
> 
> Have you tried to disable syncookies for a start ?

No, I tries to temporarily close the listening socket. My priority is not to overload the server with connections. At the sime time I _want_ that the clients see that the server resfuses connections. Before you wonder: The client will actually be a load balancer.

^ permalink raw reply

* Re: Where exactly will arch_fast_hash be used
From: Hannes Frederic Sowa @ 2014-12-08 16:32 UTC (permalink / raw)
  To: George Spelvin
  Cc: davem, dborkman, herbert, linux-kernel, netdev, tgraf, tytso
In-Reply-To: <20141208161959.8852.qmail@ns.horizon.com>



On Mon, Dec 8, 2014, at 17:19, George Spelvin wrote:
> >>> In case of openvswitch it shows a performance improvment. The seed
> >>> parameter could be used as an initial biasing of the crc32 function, but
> >>> in case of openvswitch it is only set to 0.
>  
> >> NACK. [...]
> 
> > Sorry for being unclear, I understood that and didn't bother patching
> > that '0' with a random seed exactly because of this.
> 
> And I'm sorry for delivering a long lecture on a subject you already
> understood perfectly well.

I learned something, so your time wasn't completely wasted. ;)

> I'd just been thinking about it because of Herbert's comments, so it was
> conveniently at hand. :-)
> 
> Out of curiousity, what *were* you referring to when you talked
> about biasing the crc32 function?  "Biasing" is a good term becuase
> it just applies an offset, but what do you gain from doing that?

Actually, I don't know why the seed parameter was added. I just wanted
to mention that there is a way to bias the crc32 function which fits
into the style of the other hashing functions, like jhash with its
initval parameter.

I just kept it around during the rewrite.

The only use case I can imagine would be if one would like to calculate
a crc32c over a non-contiguous array, thus feeding the result of one crc
operation into the next one.

> There are nifty things one can do with the CRC32 instruction, however.
> A lot of ciphers these days use an ARX (add, rotate, XOR) kernel.
> A crc32 instruction, although linear, does some very powerful rotate &
> xor operations, and could replace the XOR and rotate.

Yes, I have seen it being used in cityhash.

There is also a proposal by Intel, but the hash seems too weak, too:
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/hash-method-performance-paper.pdf

Bye,
Hannes

^ permalink raw reply

* [PATCH net] netlink: use jhash as hashfn for rhashtable
From: Daniel Borkmann @ 2014-12-08 16:30 UTC (permalink / raw)
  To: davem; +Cc: netdev, Herbert Xu, Thomas Graf, Hannes Frederic Sowa

For netlink, we shouldn't be using arch_fast_hash() as a hashing
discipline, but rather jhash() instead.

Since netlink sockets can be opened by any user, a local attacker
would be able to easily create collisions with the DPDK-derived
arch_fast_hash(), which trades off performance for security by
using crc32 CPU instructions on x86_64.

While it might have a legimite use case in other places, it should
be avoided in netlink context, though. As rhashtable's API is very
flexible, we could later on still decide on other hashing disciplines,
if legitimate.

Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
Fixes: e341694e3eb5 ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 0007b81..b6bf8e8 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -3130,7 +3130,7 @@ static int __init netlink_proto_init(void)
 		.head_offset = offsetof(struct netlink_sock, node),
 		.key_offset = offsetof(struct netlink_sock, portid),
 		.key_len = sizeof(u32), /* portid */
-		.hashfn = arch_fast_hash,
+		.hashfn = jhash,
 		.max_shift = 16, /* 64K */
 		.grow_decision = rht_grow_above_75,
 		.shrink_decision = rht_shrink_below_30,
-- 
1.7.11.7

^ permalink raw reply related

* Re: wl1251: NVS firmware data
From: Greg Kroah-Hartman @ 2014-12-08 16:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Pali Rohár, Pavel Machek, John W. Linville,
	Grazvydas Ignotas, linux-wireless@vger.kernel.org,
	Network Development, Linux Kernel Mailing List, Ivaylo Dimitrov,
	Aaro Koskinen, Kalle Valo, Sebastian Reichel, David Gnedt
In-Reply-To: <CACVXFVPOLfDuqc0nLb-zM8vH618DLXy0xtZbUOn5_XvdxRZSDw@mail.gmail.com>

On Mon, Dec 08, 2014 at 11:18:18PM +0800, Ming Lei wrote:
> On Sat, Dec 6, 2014 at 9:02 PM, Pali Rohár <pali.rohar@gmail.com> wrote:
> > On Saturday 06 December 2014 13:49:54 Pavel Machek wrote:
> 
> >
> >  /**
> > + * request_firmware_prefer_user: - prefer usermode helper for loading firmware
> > + * @firmware_p: pointer to firmware image
> > + * @name: name of firmware file
> > + * @device: device for which firmware is being loaded
> > + *
> > + * This function works pretty much like request_firmware(), but it prefer
> > + * usermode helper. If usermode helper fails then it fallback to direct access.
> > + * Usefull for dynamic or model specific firmware data.
> > + **/
> > +int request_firmware_prefer_user(const struct firmware **firmware_p,
> > +                           const char *name, struct device *device)
> > +{
> > +       int ret;
> > +       __module_get(THIS_MODULE);
> > +       ret = _request_firmware(firmware_p, name, device,
> > +                               FW_OPT_UEVENT | FW_OPT_PREFER_USER);
> > +       module_put(THIS_MODULE);
> > +       return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(request_firmware_prefer_user);
> 
> I'd like to introduce request_firmware_user() which only requests
> firmware from user space, and this way is simpler and more flexible
> since we have request_firmware_direct() already.

Why would a driver care about what program provides the firmware?  It
shouldn't at all, and we want to get rid of the userspace firmware
loader, not encourage drivers to use it "exclusively" at all.

So no, I don't want to see this, and I don't want drivers to worry about
this either.

greg k-h

^ permalink raw reply

* Re: [PATCH net] netlink: use jhash as hashfn for rhashtable
From: Thomas Graf @ 2014-12-08 16:38 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Herbert Xu, Hannes Frederic Sowa
In-Reply-To: <1418056230-8700-1-git-send-email-dborkman@redhat.com>

On 12/08/14 at 05:30pm, Daniel Borkmann wrote:
> For netlink, we shouldn't be using arch_fast_hash() as a hashing
> discipline, but rather jhash() instead.
> 
> Since netlink sockets can be opened by any user, a local attacker
> would be able to easily create collisions with the DPDK-derived
> arch_fast_hash(), which trades off performance for security by
> using crc32 CPU instructions on x86_64.
> 
> While it might have a legimite use case in other places, it should
> be avoided in netlink context, though. As rhashtable's API is very
> flexible, we could later on still decide on other hashing disciplines,
> if legitimate.
> 
> Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
> Fixes: e341694e3eb5 ("netlink: Convert netlink_lookup() to use RCU protected hash table")
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* Re: [PATCH net-next 2/3] netlink: IFLA_PHYS_SWITCH_ID to IFLA_PHYS_PARENT_ID
From: Jiri Pirko @ 2014-12-08 16:41 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, sfeldma
In-Reply-To: <20141208153747.GG797@gospo.home.greyhouse.net>

Mon, Dec 08, 2014 at 04:37:47PM CET, gospo@cumulusnetworks.com wrote:
>On Mon, Dec 08, 2014 at 04:17:14PM +0100, Jiri Pirko wrote:
>> Fri, Dec 05, 2014 at 07:02:16PM CET, gospo@cumulusnetworks.com wrote:
>> >There has been much discussion about proper nomenclature to use for this
>> >and I would prefer parent rather than calling every forwarding element a
>> >switch.
>> 
>> Andy, I must say I really do not like just plain "parent". It is really
>> not clear what it means as it can mean 1000 things.
>> 
>> I know "switch" is not ideal but everytime anyone is talking about these
>> kind of forwarding devices, they use word "switch" even if it is not
>> accurate and everyone knows what they are talking about. Nobody uses
>> "parent".
>
>Well of course they are not going to use it until it's committed.  ;-)


Do you seriously expect people talking about "parents" instead of
"switches". I doubt that...

>
>> For me this is nack for this patchset.
>
>Thanks for the review.  I am not big marketing person, so it was not
>clear to me what was ideal.  Due to parent already being in the code
>and having as a logical description of the relationship (parent
>switch/router device and sibling network interfaces -- like sibling CPU
>cores on the same socket).
>
>I do really want to collectively come up with something other than
>switch for everything.  Those L3 ops with 'switch' in the name will
>feel really awkward....


I say it is not optimal, but I did not see any better proposal...


>
>> 
>> Jiri
>> 
>> >
>> >Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
>> >---
>> > include/uapi/linux/if_link.h | 2 +-
>> > net/core/rtnetlink.c         | 4 ++--
>> > 2 files changed, 3 insertions(+), 3 deletions(-)
>> >
>> >diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>> >index f7d0d2d..3d8edd8 100644
>> >--- a/include/uapi/linux/if_link.h
>> >+++ b/include/uapi/linux/if_link.h
>> >@@ -145,7 +145,7 @@ enum {
>> > 	IFLA_CARRIER,
>> > 	IFLA_PHYS_PORT_ID,
>> > 	IFLA_CARRIER_CHANGES,
>> >-	IFLA_PHYS_SWITCH_ID,
>> >+	IFLA_PHYS_PARENT_ID,
>> > 	__IFLA_MAX
>> > };
>> > 
>> >diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>> >index 61cb7e7..1fe0a16 100644
>> >--- a/net/core/rtnetlink.c
>> >+++ b/net/core/rtnetlink.c
>> >@@ -982,7 +982,7 @@ static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device *dev)
>> > 		return err;
>> > 	}
>> > 
>> >-	if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
>> >+	if (nla_put(skb, IFLA_PHYS_PARENT_ID, psid.id_len, psid.id))
>> > 		return -EMSGSIZE;
>> > 
>> > 	return 0;
>> >@@ -1222,7 +1222,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>> > 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
>> > 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>> > 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
>> >-	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>> >+	[IFLA_PHYS_PARENT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>> > };
>> > 
>> > static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>> >-- 
>> >1.9.3
>> >
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Q: need effective backlog for listen()
From: Philippe Troin @ 2014-12-08 16:35 UTC (permalink / raw)
  To: Ulrich Windl; +Cc: netdev
In-Reply-To: <5485ACC9020000A100018394@gwsmtp1.uni-regensburg.de>

On Mon, 2014-12-08 at 13:51 +0100, Ulrich Windl wrote:
> (not subscribed to the list, plese keep me on CC:)
> 
> I have a problem I could not find the answer. I suspect the problem
> arises from Linux derivating from standard functionality...
> 
> I have written a server that should accept n TCP connections at most.
> I was expecting that the backlog parameter of listen will cause extra
> connection requests either
> 1) to be refused
> or
> 2) to time out eventually
> 
> (The standard seems to say that extra connections are refused)

The argument to listen() specifies how many connections the system is
allow to keep waiting to be accept()ed.
As soon as you accept() the connection, the count is decremented.
So that won't help for your use case.

> However none of the above see ms true. Even if my server delays
> accept()ing new connections, no client ever sees a "connection
> refused" or "connection timed out". Is there any chance to signal the
> client that no more connections are accepted at the moment?

Close the listening socket.  No new connections will be accepted.
When you reopen the socket for accepting new connections, you may have
to use SO_REUSEADDR before bind()ing to the port.

Phil.

^ permalink raw reply

* Re: [RFC][PATCHES] iov_iter.c rewrite
From: Kirill A. Shutemov @ 2014-12-08 16:46 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel, netdev
In-Reply-To: <20141204202011.GO29748@ZenIV.linux.org.uk>

On Thu, Dec 04, 2014 at 08:20:11PM +0000, Al Viro wrote:
> 	First of all, I want to apologize for the nastiness of preprocessor
> use in this series.  Seeing that the whole "macros that look like new kinds
> of C statements" thing (including list_for_each_...(), etc) is very much not
> to my liking, I really don't trust my taste on finer details and I'd very
> much like some feedback.
> 
> 	The reason for doing that kind of tricks is that iov_iter.c keeps
> growing more and more boilerplate code.  For iov_iter-net series we need
> 	* csum_and_copy_from_iter()
> 	* csum_and_copy_to_iter()
> 	* copy_from_iter_nocache()
> That's 3 new primitives, each in 2 variants (iovec and bvec).
> 	* ITER_KVEC handled without going through uaccess.h stuff (and
> independent of set_fs() state).
> And *that* means 3 variants intstead of 2 for most of the existing primitives.
> That's far too much, and the amount of copies of the same logics would pretty
> much guarantee that it will be a breeding ground for hard-to-kill bugs.
> 
> 	The following series (also in vfs.git#iov_iter) actually manages to
> do all of the above *and* shrink the damn thing quite a bit.  The generated
> code appears to be no worse than before.  The price is a couple of iterator
> macros - iterate_all_kinds() and iterate_and_advance().  They are given an
> iov_iter, size (i.e. the amount of data in iov_iter beginning we want to go
> through), name of the loop variable and 3 variants of loop body - for iovec,
> bvec and kvec resp.  Loop variable is declared *inside* the expansion of those
> suckers according to the kind of iov_iter - it's struct iovec, struct bio_vec
> or struct kvec, covering the current range to deal with.
> 	The difference between those two is that iterate_and_advance() will
> advance the iov_iter by the amount it has handled and iterate_all_kinds()
> will leave iov_iter unchanged.
> 
> 	Unless I hear anybody yelling, it goes into vfs.git#for-next today,
> so if you have objections, suggestions, etc., give those *now*.
> 
> Al Viro (13):
>       iov_iter.c: macros for iterating over iov_iter
>       iov_iter.c: iterate_and_advance
>       iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
>       iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
>       iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
>       iov_iter.c: convert iov_iter_zero() to iterate_and_advance
>       iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
>       iov_iter.c: convert copy_from_iter() to iterate_and_advance
>       iov_iter.c: convert copy_to_iter() to iterate_and_advance
>       iov_iter.c: handle ITER_KVEC directly
>       csum_and_copy_..._iter()
>       new helper: iov_iter_kvec()
>       copy_from_iter_nocache()

I guess this crash is related to the patchset.

[  102.337742] ------------[ cut here ]------------
[  102.338270] kernel BUG at /home/kas/git/public/linux-next/arch/x86/mm/physaddr.c:26!
[  102.339043] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  102.339622] Modules linked in:
[  102.339951] CPU: 2 PID: 6029 Comm: trinity-c23 Not tainted 3.18.0-next-20141208-00036-gc7edb4791544-dirty #269
[  102.340011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[  102.340011] task: ffff880041c51510 ti: ffff880049c70000 task.ti: ffff880049c70000
[  102.340011] RIP: 0010:[<ffffffff8104d8c0>]  [<ffffffff8104d8c0>] __phys_addr+0x40/0x60
[  102.340011] RSP: 0018:ffff880049c73838  EFLAGS: 00010206
[  102.340011] RAX: 00004100174b4000 RBX: ffff880049c73b08 RCX: 0000000000000028
[  102.340011] RDX: 0000000000000041 RSI: ffff88015dc980a8 RDI: ffffc900174b4000
[  102.340011] RBP: ffff880049c73838 R08: ffffc900174b4000 R09: 000000000000000c
[  102.340011] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88015dc980a8
[  102.340011] R13: ffffc900174f4000 R14: ffffea0000000000 R15: ffffc900174b4000
[  102.340011] FS:  00007fcdd37fb700(0000) GS:ffff88017aa00000(0000) knlGS:0000000000000000
[  102.340011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  102.340011] CR2: 00000000006246a8 CR3: 0000000068dbb000 CR4: 00000000001406e0
[  102.340011] Stack:
[  102.340011]  ffff880049c73888 ffffffff8117a96b 0000000000000002 0000000000040000
[  102.340011]  ffffffff81204b3f ffff88015dc98028 0000000000000000 ffff880049c73a78
[  102.340011]  ffff88015dc98000 ffff880049c73a10 ffff880049c73978 ffffffff81203ec9
[  102.340011] Call Trace:
[  102.340011]  [<ffffffff8117a96b>] iov_iter_get_pages+0x17b/0x390
[  102.340011]  [<ffffffff81204b3f>] ? __blockdev_direct_IO+0x15f/0x16a0
[  102.340011]  [<ffffffff81203ec9>] do_direct_IO+0x10a9/0x1bc0
[  102.340011]  [<ffffffff810a92d8>] ? lockdep_init_map+0x68/0x5c0
[  102.340011]  [<ffffffff81204d6c>] __blockdev_direct_IO+0x38c/0x16a0
[  102.340011]  [<ffffffff810a9d27>] ? __lock_acquire+0x4f7/0xd40
[  102.340011]  [<ffffffff810a73f9>] ? mark_held_locks+0x79/0xb0
[  102.340011]  [<ffffffff8133b510>] ? xfs_get_blocks+0x20/0x20
[  102.340011]  [<ffffffff81338ff0>] xfs_vm_direct_IO+0x120/0x140
[  102.340011]  [<ffffffff8133b510>] ? xfs_get_blocks+0x20/0x20
[  102.340011]  [<ffffffff810aa773>] ? lock_release_non_nested+0x203/0x3d0
[  102.340011]  [<ffffffff8135b9a7>] ? xfs_ilock+0x167/0x2e0
[  102.340011]  [<ffffffff8114d957>] generic_file_read_iter+0x517/0x6a0
[  102.340011]  [<ffffffff810a73f9>] ? mark_held_locks+0x79/0xb0
[  102.340011]  [<ffffffff8185e192>] ? __mutex_unlock_slowpath+0xb2/0x190
[  102.340011]  [<ffffffff8134bc2f>] xfs_file_read_iter+0x12f/0x460
[  102.340011]  [<ffffffff811c237e>] new_sync_read+0x7e/0xb0
[  102.340011]  [<ffffffff811c3528>] __vfs_read+0x18/0x50
[  102.340011]  [<ffffffff811c35ed>] vfs_read+0x8d/0x150
[  102.340011]  [<ffffffff811c89e8>] kernel_read+0x48/0x60
[  102.340011]  [<ffffffff810f4a92>] copy_module_from_fd.isra.51+0x112/0x170
[  102.340011]  [<ffffffff810f7646>] SyS_finit_module+0x56/0x80
[  102.340011]  [<ffffffff81861f92>] system_call_fastpath+0x12/0x17
[  102.340011] Code: 00 00 00 00 00 78 00 00 48 01 f8 48 39 c2 72 1b 0f b6 0d 9d 7a ec 00 48 89 c2 48 d3 ea 48 85 d2 75 09 5d c3 0f 1f 80 00 00 00 00 <0f> 0b 66 0f 1f 44 00 00 48 89 d0 48 03 05 3e 87 dc 00 48 81 fa 
[  102.340011] RIP  [<ffffffff8104d8c0>] __phys_addr+0x40/0x60
[  102.340011]  RSP <ffff880049c73838>
[  102.371939] ---[ end trace e07368268cd6b49c ]---


-- 
 Kirill A. Shutemov

^ permalink raw reply

* Re: wl1251: NVS firmware data
From: Pali Rohár @ 2014-12-08 16:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ming Lei, Pavel Machek, John W. Linville, Grazvydas Ignotas,
	linux-wireless@vger.kernel.org, Network Development,
	Linux Kernel Mailing List, Ivaylo Dimitrov, Aaro Koskinen,
	Kalle Valo, Sebastian Reichel, David Gnedt
In-Reply-To: <20141208163714.GA31169@kroah.com>

[-- Attachment #1: Type: Text/Plain, Size: 1991 bytes --]

On Monday 08 December 2014 17:37:14 Greg Kroah-Hartman wrote:
> On Mon, Dec 08, 2014 at 11:18:18PM +0800, Ming Lei wrote:
> > On Sat, Dec 6, 2014 at 9:02 PM, Pali Rohár 
<pali.rohar@gmail.com> wrote:
> > > On Saturday 06 December 2014 13:49:54 Pavel Machek wrote:
> > >  /**
> > > 
> > > + * request_firmware_prefer_user: - prefer usermode helper
> > > for loading firmware + * @firmware_p: pointer to firmware
> > > image
> > > + * @name: name of firmware file
> > > + * @device: device for which firmware is being loaded
> > > + *
> > > + * This function works pretty much like
> > > request_firmware(), but it prefer + * usermode helper. If
> > > usermode helper fails then it fallback to direct access.
> > > + * Usefull for dynamic or model specific firmware data.
> > > + **/
> > > +int request_firmware_prefer_user(const struct firmware
> > > **firmware_p, +                           const char
> > > *name, struct device *device) +{
> > > +       int ret;
> > > +       __module_get(THIS_MODULE);
> > > +       ret = _request_firmware(firmware_p, name, device,
> > > +                               FW_OPT_UEVENT |
> > > FW_OPT_PREFER_USER); +       module_put(THIS_MODULE);
> > > +       return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(request_firmware_prefer_user);
> > 
> > I'd like to introduce request_firmware_user() which only
> > requests firmware from user space, and this way is simpler
> > and more flexible since we have request_firmware_direct()
> > already.
> 
> Why would a driver care about what program provides the
> firmware?  It shouldn't at all, and we want to get rid of the
> userspace firmware loader, not encourage drivers to use it
> "exclusively" at all.
> 

Do not remove it! Without userspace firmware loader it is 
impossible to load dynamic firmware files.

> So no, I don't want to see this, and I don't want drivers to
> worry about this either.
> 
> greg k-h

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH][net-next] net: avoid to call skb_queue_len again
From: Sergei Shtylyov @ 2014-12-08 16:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Li RongQing, netdev
In-Reply-To: <1418051436.15618.43.camel@edumazet-glaptop2.roam.corp.google.com>

Hello.

On 12/08/2014 06:10 PM, Eric Dumazet wrote:

>>      I expect you to also refine the description, so that it's meaningful,
>> unlike now.

> It seems obvious to me Li is not a native English speaker.

    Me neither. :-)
    However, the good command of English language was a requirement when I was 
first hired to do the Linux development.

> I understood
> the patch very well, and the changelog seemed fine to me.

    Oh, if you say so...

> What about you provide this description instead, since you seem to care
> very much ?

    I mostly care for others; I don't suppose much people except you are able 
to understand the current variant. And I now have neither enough time nor 
enough understanding to write a proper description for this patch. I can only 
suggest that you refine the description for others if you can understand it so 
well.

> Thanks !

    Not at all.

WBR, Sergei

^ permalink raw reply

* Re: [PATCH net] netlink: use jhash as hashfn for rhashtable
From: Hannes Frederic Sowa @ 2014-12-08 16:56 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Herbert Xu, Thomas Graf
In-Reply-To: <1418056230-8700-1-git-send-email-dborkman@redhat.com>

On Mo, 2014-12-08 at 17:30 +0100, Daniel Borkmann wrote:
> For netlink, we shouldn't be using arch_fast_hash() as a hashing
> discipline, but rather jhash() instead.
> 
> Since netlink sockets can be opened by any user, a local attacker
> would be able to easily create collisions with the DPDK-derived
> arch_fast_hash(), which trades off performance for security by
> using crc32 CPU instructions on x86_64.
> 
> While it might have a legimite use case in other places, it should
> be avoided in netlink context, though. As rhashtable's API is very
> flexible, we could later on still decide on other hashing disciplines,
> if legitimate.
> 
> Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
> Fixes: e341694e3eb5 ("netlink: Convert netlink_lookup() to use RCU protected hash table")
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  net/netlink/af_netlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 0007b81..b6bf8e8 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -3130,7 +3130,7 @@ static int __init netlink_proto_init(void)
>  		.head_offset = offsetof(struct netlink_sock, node),
>  		.key_offset = offsetof(struct netlink_sock, portid),
>  		.key_len = sizeof(u32), /* portid */
> -		.hashfn = arch_fast_hash,
> +		.hashfn = jhash,
>  		.max_shift = 16, /* 64K */
>  		.grow_decision = rht_grow_above_75,
>  		.shrink_decision = rht_shrink_below_30,

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

In net-next, some time soon, we should try to let all function pointers
to jhash() use one non-inline version. The other arch_fast_hash patch
adds __jhash for x86-only, we can move it over to lib/.

Thanks,
Hannes

^ permalink raw reply

* Re: wl1251: NVS firmware data
From: Marcel Holtmann @ 2014-12-08 17:05 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Greg Kroah-Hartman, Ming Lei, Pavel Machek, John W. Linville,
	Grazvydas Ignotas, linux-wireless@vger.kernel.org,
	Network Development, Linux Kernel Mailing List, Ivaylo Dimitrov,
	Aaro Koskinen, Kalle Valo, Sebastian Reichel, David Gnedt
In-Reply-To: <201412081747.30965@pali>

Hi Pali,

>>>> On Saturday 06 December 2014 13:49:54 Pavel Machek wrote:
>>>> /**
>>>> 
>>>> + * request_firmware_prefer_user: - prefer usermode helper
>>>> for loading firmware + * @firmware_p: pointer to firmware
>>>> image
>>>> + * @name: name of firmware file
>>>> + * @device: device for which firmware is being loaded
>>>> + *
>>>> + * This function works pretty much like
>>>> request_firmware(), but it prefer + * usermode helper. If
>>>> usermode helper fails then it fallback to direct access.
>>>> + * Usefull for dynamic or model specific firmware data.
>>>> + **/
>>>> +int request_firmware_prefer_user(const struct firmware
>>>> **firmware_p, +                           const char
>>>> *name, struct device *device) +{
>>>> +       int ret;
>>>> +       __module_get(THIS_MODULE);
>>>> +       ret = _request_firmware(firmware_p, name, device,
>>>> +                               FW_OPT_UEVENT |
>>>> FW_OPT_PREFER_USER); +       module_put(THIS_MODULE);
>>>> +       return ret;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(request_firmware_prefer_user);
>>> 
>>> I'd like to introduce request_firmware_user() which only
>>> requests firmware from user space, and this way is simpler
>>> and more flexible since we have request_firmware_direct()
>>> already.
>> 
>> Why would a driver care about what program provides the
>> firmware?  It shouldn't at all, and we want to get rid of the
>> userspace firmware loader, not encourage drivers to use it
>> "exclusively" at all.
>> 
> 
> Do not remove it! Without userspace firmware loader it is 
> impossible to load dynamic firmware files.

why is this dynamic in the first place. It does not sound like dynamic data to me at all. This is like the WiFi MAC address(es) or Bluetooth BD_ADDR. They are all static information. The only difference is that they are on the host accessibly filesystem or storage and not on the device itself.

To be honest, for Bluetooth we solved this now. If the device is missing key information like the calibration data or BD_ADDR, then it comes up unconfigured. A userspace process can then go and load the right data into it and then the device becomes available as Bluetooth device.

Trying to use request_firmware to load some random data and insist on going through userspace helper for that sounds crazy to me. Especially since we are trying hard to get away from the userspace loader. Forcing to keep it for new stuff sounds backwards to me.

With the special Nokia partition in mind, why hasn't this been turned into a mountable filesystem or into a driver/subsystem that can access the data direct from the kernel. I advocated for this some time ago. Maybe there should be a special subsystem for access to these factory persistent information that drivers then just can access. I seem to remember that some systems provide these via ACPI. Why does the ARM platform has to be special here?

And the problem of getting Ethernet and WiFi MAC address and Bluetooth BD_ADDR comes up many many times. Why not have something generic here. And don't tell me request_firmware is that generic solution ;)

Regards

Marcel

^ permalink raw reply

* Re: wl1251: NVS firmware data
From: Pali Rohár @ 2014-12-08 17:11 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Greg Kroah-Hartman, Ming Lei, Pavel Machek, John W. Linville,
	Grazvydas Ignotas, linux-wireless@vger.kernel.org,
	Network Development, Linux Kernel Mailing List, Ivaylo Dimitrov,
	Aaro Koskinen, Kalle Valo, Sebastian Reichel, David Gnedt
In-Reply-To: <E5F3BB54-6365-4856-A231-A5FEDAEA217F@holtmann.org>

[-- Attachment #1: Type: Text/Plain, Size: 3623 bytes --]

On Monday 08 December 2014 18:05:37 Marcel Holtmann wrote:
> Hi Pali,
> 
> >>>> On Saturday 06 December 2014 13:49:54 Pavel Machek wrote:
> >>>> /**
> >>>> 
> >>>> + * request_firmware_prefer_user: - prefer usermode
> >>>> helper for loading firmware + * @firmware_p: pointer to
> >>>> firmware image
> >>>> + * @name: name of firmware file
> >>>> + * @device: device for which firmware is being loaded
> >>>> + *
> >>>> + * This function works pretty much like
> >>>> request_firmware(), but it prefer + * usermode helper. If
> >>>> usermode helper fails then it fallback to direct access.
> >>>> + * Usefull for dynamic or model specific firmware data.
> >>>> + **/
> >>>> +int request_firmware_prefer_user(const struct firmware
> >>>> **firmware_p, +                           const char
> >>>> *name, struct device *device) +{
> >>>> +       int ret;
> >>>> +       __module_get(THIS_MODULE);
> >>>> +       ret = _request_firmware(firmware_p, name, device,
> >>>> +                               FW_OPT_UEVENT |
> >>>> FW_OPT_PREFER_USER); +       module_put(THIS_MODULE);
> >>>> +       return ret;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(request_firmware_prefer_user);
> >>> 
> >>> I'd like to introduce request_firmware_user() which only
> >>> requests firmware from user space, and this way is simpler
> >>> and more flexible since we have request_firmware_direct()
> >>> already.
> >> 
> >> Why would a driver care about what program provides the
> >> firmware?  It shouldn't at all, and we want to get rid of
> >> the userspace firmware loader, not encourage drivers to
> >> use it "exclusively" at all.
> > 
> > Do not remove it! Without userspace firmware loader it is
> > impossible to load dynamic firmware files.
> 
> why is this dynamic in the first place. It does not sound like
> dynamic data to me at all. This is like the WiFi MAC
> address(es) or Bluetooth BD_ADDR. They are all static
> information. The only difference is that they are on the host
> accessibly filesystem or storage and not on the device
> itself.
> 
> To be honest, for Bluetooth we solved this now. If the device
> is missing key information like the calibration data or
> BD_ADDR, then it comes up unconfigured. A userspace process
> can then go and load the right data into it and then the
> device becomes available as Bluetooth device.
> 
> Trying to use request_firmware to load some random data and
> insist on going through userspace helper for that sounds
> crazy to me. Especially since we are trying hard to get away
> from the userspace loader. Forcing to keep it for new stuff
> sounds backwards to me.
> 
> With the special Nokia partition in mind, why hasn't this been
> turned into a mountable filesystem or into a driver/subsystem
> that can access the data direct from the kernel. I advocated
> for this some time ago. Maybe there should be a special
> subsystem for access to these factory persistent information
> that drivers then just can access. I seem to remember that
> some systems provide these via ACPI. Why does the ARM
> platform has to be special here?
> 
> And the problem of getting Ethernet and WiFi MAC address and
> Bluetooth BD_ADDR comes up many many times. Why not have
> something generic here. And don't tell me request_firmware is
> that generic solution ;)
> 
> Regards
> 
> Marcel

Hi Marcel. I think you did not understand this problem. This 
discussion is not about mac address. Please read email thread 
again and if there are some unclear pars, then ask. Thanks!

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [ovs-dev] OVS Kernel Datapath development
From: Thomas Graf @ 2014-12-08 17:15 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: netdev, dev@openvswitch.org
In-Reply-To: <CALnjE+rn_Giv+8TM1_E2faBftRkivXMYHcjn-4Eq4Gu15r=CiQ@mail.gmail.com>

On 12/07/14 at 08:47pm, Pravin Shelar wrote:
> Since the beginning OVS kernel datapath development is primarily done
> on external OVS repo. Now we have mostly synced upstream and external
> OVS. So we have decided to change this process. New process is as
> follows.
> 
> 1. OVS feature development that involves kernel datapath should be
> done on net-next tree datapath.
> 2. Such feature patch series should be posted on netdev and ovs-dev
> mailing list.
> 3. Once review is done for entire series, kernel and OVS userspace
> patches will be merged in respective repo.
> 4. After the merge developer is suppose to send patches for external
> kernel datapath along with old kernel compatibility code. So that we
> can keep external datapath insync.

+1

Just to be clear, by respective repo do you mean net-next/net or will
you maintain a net-next branch on git.kernel.org and continue doing
pull requests?

^ permalink raw reply

* Re: [PATCH net] netlink: use jhash as hashfn for rhashtable
From: Dave Taht @ 2014-12-08 17:20 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, davem@davemloft.net, netdev@vger.kernel.org,
	Herbert Xu, Thomas Graf
In-Reply-To: <1418057780.29477.12.camel@localhost>

On Mon, Dec 8, 2014 at 8:56 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Mo, 2014-12-08 at 17:30 +0100, Daniel Borkmann wrote:
>> For netlink, we shouldn't be using arch_fast_hash() as a hashing
>> discipline, but rather jhash() instead.

I am not particularly happy with the amount of entropy in

static inline u32 ipv6_addr_hash(const struct in6_addr *a)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
        const unsigned long *ul = (const unsigned long *)a;
        unsigned long x = ul[0] ^ ul[1];

        return (u32)(x ^ (x >> 32));
#else
        return (__force u32)(a->s6_addr32[0] ^ a->s6_addr32[1] ^
                             a->s6_addr32[2] ^ a->s6_addr32[3]);
#endif
}

is this worth improving somehow?

>> Since netlink sockets can be opened by any user, a local attacker
>> would be able to easily create collisions with the DPDK-derived
>> arch_fast_hash(), which trades off performance for security by
>> using crc32 CPU instructions on x86_64.
>>
>> While it might have a legimite use case in other places, it should
>> be avoided in netlink context, though. As rhashtable's API is very
>> flexible, we could later on still decide on other hashing disciplines,
>> if legitimate.
>>
>> Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
>> Fixes: e341694e3eb5 ("netlink: Convert netlink_lookup() to use RCU protected hash table")
>> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>> Cc: Thomas Graf <tgraf@suug.ch>
>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> ---
>>  net/netlink/af_netlink.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index 0007b81..b6bf8e8 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -3130,7 +3130,7 @@ static int __init netlink_proto_init(void)
>>               .head_offset = offsetof(struct netlink_sock, node),
>>               .key_offset = offsetof(struct netlink_sock, portid),
>>               .key_len = sizeof(u32), /* portid */
>> -             .hashfn = arch_fast_hash,
>> +             .hashfn = jhash,
>>               .max_shift = 16, /* 64K */
>>               .grow_decision = rht_grow_above_75,
>>               .shrink_decision = rht_shrink_below_30,
>
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>
> In net-next, some time soon, we should try to let all function pointers
> to jhash() use one non-inline version. The other arch_fast_hash patch
> adds __jhash for x86-only, we can move it over to lib/.
>
> Thanks,
> Hannes
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

^ permalink raw reply

* Re: [PATCH net] netlink: use jhash as hashfn for rhashtable
From: Hannes Frederic Sowa @ 2014-12-08 17:25 UTC (permalink / raw)
  To: Dave Taht; +Cc: Daniel Borkmann, davem, netdev, Herbert Xu, Thomas Graf
In-Reply-To: <CAA93jw5bAqxbpxyOHEHd6XSEn4-HK0a9qw3Cb_Vecb8Nz=xbfg@mail.gmail.com>



On Mon, Dec 8, 2014, at 18:20, Dave Taht wrote:
> On Mon, Dec 8, 2014 at 8:56 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Mo, 2014-12-08 at 17:30 +0100, Daniel Borkmann wrote:
> >> For netlink, we shouldn't be using arch_fast_hash() as a hashing
> >> discipline, but rather jhash() instead.
> 
> I am not particularly happy with the amount of entropy in
> 
> static inline u32 ipv6_addr_hash(const struct in6_addr *a)
> {
> #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG ==
> 64
>         const unsigned long *ul = (const unsigned long *)a;
>         unsigned long x = ul[0] ^ ul[1];
> 
>         return (u32)(x ^ (x >> 32));
> #else
>         return (__force u32)(a->s6_addr32[0] ^ a->s6_addr32[1] ^
>                              a->s6_addr32[2] ^ a->s6_addr32[3]);
> #endif
> }
> 
> is this worth improving somehow?
> 

That's e.g. the reason why we have

commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Fri Mar 15 11:32:30 2013 +0000

    inet: limit length of fragment queue hash table bucket lists

Note, __ipv6_addr_jhash (xoring the upper 32 bit before jhashing them)
has the same problem. I currently cannot spot any problematic users in
the kernel, flow dissector hashes are insecure by nature, local
addresses normally don't have problems with hash collisions. But maybe I
should redo an audit. :)

Bye,
Hannes

^ permalink raw reply

* Re: [ovs-discuss] kernel panic receiving flooded VXLAN traffic with OVS
From: Jesse Gross @ 2014-12-08 17:33 UTC (permalink / raw)
  To: Nicholas Bastin; +Cc: Jay Vosburgh, netdev, discuss@openvswitch.org
In-Reply-To: <CADmMkWdGhw3Rn3gMEtAV8ESZB-k+uAK=Ed=LwsShKOtyHdJS_g@mail.gmail.com>

On Sat, Dec 6, 2014 at 2:47 PM, Nicholas Bastin <nick.bastin@gmail.com> wrote:
> On Fri, Dec 5, 2014 at 4:51 PM, Jesse Gross <jesse@nicira.com> wrote:
>>
>> I don't think there is anything inherently wrong with aggregating TCP
>> segments in VXLAN that are not destined for the local host. This is
>> conceptually the same as doing aggregation for TCP packets where we
>> only perform L2 bridging - in theory we shouldn't look at the upper
>> layers but it is fine as long as we faithfully reconstruct it on the
>> way out.
>
>
> But you don't faithfully reconstruct what the user originally sent - in-path
> reassembly is always wrong, which is why hardware switches don't do it (by
> default, anyhow).  If you configure a middlebox to do some kind of
> assembly/translation/whatever work for you, that's fine, but something that
> advertises itself as a "switch" or "router" should definitely not do this by
> default.
>
> If you reassemble frames you completely obviate any kind of PMTU-D or
> configured MTU that your user is using, and this breaks a lot of paths.  We
> completely disable all GRO/TSO/etc., but if you are able to determine that a
> packet is not destined for the local host you should definitely not mutate
> it.

If you look at the implementation of GRO/TSO, I think you will see
that it does in fact faithfully reconstruct the original message and
path MTU discovery is preserved. On Linux systems, GRO is enabled by
default for all workloads - including those that do not result in
local termination such as bridging.

^ permalink raw reply

* Re: [PATCH net-next 2/3] netlink: IFLA_PHYS_SWITCH_ID to IFLA_PHYS_PARENT_ID
From: Andy Gospodarek @ 2014-12-08 17:49 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, sfeldma
In-Reply-To: <20141208164143.GI1885@nanopsycho.brq.redhat.com>

On Mon, Dec 08, 2014 at 05:41:43PM +0100, Jiri Pirko wrote:
> Mon, Dec 08, 2014 at 04:37:47PM CET, gospo@cumulusnetworks.com wrote:
> >On Mon, Dec 08, 2014 at 04:17:14PM +0100, Jiri Pirko wrote:
> >> Fri, Dec 05, 2014 at 07:02:16PM CET, gospo@cumulusnetworks.com wrote:
> >> >There has been much discussion about proper nomenclature to use for this
> >> >and I would prefer parent rather than calling every forwarding element a
> >> >switch.
> >> 
> >> Andy, I must say I really do not like just plain "parent". It is really
> >> not clear what it means as it can mean 1000 things.
> >> 
> >> I know "switch" is not ideal but everytime anyone is talking about these
> >> kind of forwarding devices, they use word "switch" even if it is not
> >> accurate and everyone knows what they are talking about. Nobody uses
> >> "parent".
> >
> >Well of course they are not going to use it until it's committed.  ;-)
> 
> 
> Do you seriously expect people talking about "parents" instead of
"Parent device" -- absolutely

> "switches". I doubt that...
Agree to disagree, I guess! 

^ permalink raw reply

* Re: [RFC][PATCHES] iov_iter.c rewrite
From: Al Viro @ 2014-12-08 17:58 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel, netdev
In-Reply-To: <20141208164650.GB29028@node.dhcp.inet.fi>

On Mon, Dec 08, 2014 at 06:46:50PM +0200, Kirill A. Shutemov wrote:

> I guess this crash is related to the patchset.

Might be.  Do you have a reproducer for it?

It looks like the second VIRTUAL_BUG_ON() in __phys_addr(), most likely
from __pa(), from virt_to_page(), from
                unsigned long addr = (unsigned long)v.iov_base, end;
                size_t len = v.iov_len + (*start = addr & (PAGE_SIZE - 1));

                if (len > maxpages * PAGE_SIZE)
                        len = maxpages * PAGE_SIZE;
                addr &= ~(PAGE_SIZE - 1);
                for (end = addr + len; addr < end; addr += PAGE_SIZE)
                        get_page(*pages++ = virt_to_page(addr));
                return len - *start;
in iov_iter_get_pages().  And that's ITER_KVEC case there...  Further
call chain looks like dio_refill_pages(), from dio_get_page(), from
do_direct_io(), eventually from kernel_read() and finit_module(),
Presumably called on O_DIRECT descriptor...

I'll try to reproduce it here, but if you have any reliable reproducer, it
would be very welcome (and would make a useful addition to LTP and/or
xfstests).

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox