Netdev List
* Re: [PATCH net-next 00/27] Remove VLAN CFI bit abuse
From: Michał Mirosław @ 2016-12-14  2:00 UTC
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20161213171626.76e7dced@xeon-e3>

On Tue, Dec 13, 2016 at 05:16:26PM -0800, Stephen Hemminger wrote:
> On Tue, 13 Dec 2016 01:12:32 +0100 (CET)
> Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
> > This series removes an abuse of VLAN CFI bit in Linux networking stack.
> > Currently Linux always clears the bit on outgoing traffic and presents
> > it cleared to userspace (even via AF_PACKET/tcpdump when hw-accelerated).
> > 
> > This uses a new vlan_present bit in struct sk_buff, and removes an assumption
> > that vlan_proto != 0 when VLAN tag is present.
> > 
> > As I can't test most of the driver changes, please look at them carefully.
> > 
> > The series is supposed to be bisect-friendly and that requires temporary
> > insertion of #define VLAN_TAG_PRESENT in BPF code to be able to split
> > JIT changes per architecture.
> 
> I wonder if CFI can ever validly be non-zero in the modern world, on Hyper-V.
> There are no token ring devices and that seems to be the only use case where CFI would
> be non-zero. Unless someone is planning to reuse it as a protocol bit, which seems
> like a really bad idea.
> 
> Maybe the right thing is to keep it hard-coded as zero and not start adding
> more untestable code conditions.
> 
> My recommendation would be to get rid of VLAN_TAG_PRESENT, but not to preserve
> the CFI bit.

According to the Wikipedia page [1] on 802.1Q, the CFI bit was already
renamed DEI (Drop Eligible Indicator) in the 2011 revision of the IEEE standard.

I can't verify this, though.

Best Regards,
Michał Mirosław

[1] https://en.wikipedia.org/wiki/IEEE_802.1Q#Frame_format
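For readers following the CFI/DEI discussion: a minimal userspace sketch of the 16-bit 802.1Q TCI (PCP, CFI/DEI, VID) and of the difference between reusing the CFI bit as a "tag present" marker and keeping a separate vlan_present flag, as the cover letter describes. The mask values mirror those in <linux/if_vlan.h> as we recall them; the structs and functions (old_skb, new_skb, old_put_tag, ...) are hypothetical mocks, not the real sk_buff API.

```c
#include <stdbool.h>
#include <stdint.h>

/* 802.1Q TCI: PCP (bits 15-13) | CFI/DEI (bit 12) | VID (bits 11-0) */
#define VLAN_PRIO_MASK  0xe000
#define VLAN_PRIO_SHIFT 13
#define VLAN_CFI_MASK   0x1000
#define VLAN_VID_MASK   0x0fff

/* Old scheme: the CFI bit doubles as the "tag present" marker. */
#define VLAN_TAG_PRESENT VLAN_CFI_MASK

/* Hypothetical mocks of the relevant sk_buff fields. */
struct old_skb { uint16_t vlan_tci; };
struct new_skb { uint16_t vlan_tci; bool vlan_present; };

static void old_put_tag(struct old_skb *skb, uint16_t tci)
{
	/* The frame's real CFI/DEI value is overwritten by the marker. */
	skb->vlan_tci = (uint16_t)(VLAN_TAG_PRESENT | tci);
}

static bool old_tag_present(const struct old_skb *skb)
{
	return skb->vlan_tci & VLAN_TAG_PRESENT;
}

static uint16_t old_tag_get(const struct old_skb *skb)
{
	/* Masking off the marker also clears whatever CFI/DEI the frame carried. */
	return (uint16_t)(skb->vlan_tci & ~VLAN_TAG_PRESENT);
}

static void new_put_tag(struct new_skb *skb, uint16_t tci)
{
	skb->vlan_tci = tci;		/* all 16 TCI bits kept intact */
	skb->vlan_present = true;	/* presence tracked separately */
}

static uint16_t new_tag_get(const struct new_skb *skb)
{
	return skb->vlan_tci;
}
```

Putting TCI 0xb064 (PCP 5, DEI 1, VID 100) through old_put_tag()/old_tag_get() yields 0xa064, i.e. the DEI bit is silently cleared, while the separate-flag variant returns 0xb064 unchanged.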


* Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
From: Michał Mirosław @ 2016-12-14  2:03 UTC
  To: Stephen Hemminger
  Cc: open list:OPENVSWITCH, netdev,
	moderated list:ETHERNET BRIDGE
In-Reply-To: <20161213172118.2f55c503@xeon-e3>

On Tue, Dec 13, 2016 at 05:21:18PM -0800, Stephen Hemminger wrote:
> On Sat,  3 Dec 2016 10:22:28 +0100 (CET)
> Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
> > This all-in-one patch removes the abuse of the VLAN CFI bit, so that it can be passed
> > intact through the Linux networking stack.
> > 
> > Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
> > ---
> > 
> > Dear NetDevs
> > 
> > I guess this needs to be split into the prep..convert[]..finish sequence,
> > but if you like it as is, then it's ready.
> > 
> > The biggest question is whether the modified interface and vlan_present
> > are the way to go. This can be changed to use vlan_proto != 0 instead
> > of an extra flag bit.
> > 
> > As I can't test most of the driver changes, please look at them carefully.
> > OVS and bridge eyes are especially welcome.
> > 
> > Best Regards,
> > Michał Mirosław
> Is the motivation to support 802.1ad Drop Eligible Indicator (DEI)?
> 
> If so then you need to be more verbose in the commit log, and lots more
> work is needed. You need to rename fields and validate every place a
> driver is using DEI bit to make sure it really does the right thing
> on that hardware. It is not just a mechanical change.

My main motivation is to be able to see the bit intact in tcpdump and be
able to pass it untouched through at least a veth pair. It would be great
if all devices didn't do something stupid with the bit, but it's not
something I am able to make happen.

Best Regards,
Michał Mirosław
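To illustrate what "seeing the bit intact in tcpdump" means on the wire, here is a small sketch that decodes the outer 802.1Q tag from raw Ethernet frame bytes. It assumes the 0x8100 TPID only (an 802.1ad S-tag would use 0x88a8) and ignores everything past the TCI; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100	/* 802.1Q customer tag; 802.1ad S-tags use 0x88a8 */

struct vlan_fields {
	uint8_t  pcp;	/* priority code point, 3 bits */
	bool     dei;	/* drop eligible indicator, formerly CFI, 1 bit */
	uint16_t vid;	/* VLAN identifier, 12 bits */
};

/* Decode the outer 802.1Q tag; returns false if the frame is untagged or too short. */
static bool parse_vlan_tag(const uint8_t *frame, size_t len, struct vlan_fields *out)
{
	uint16_t tpid, tci;

	if (len < 16)	/* dst MAC (6) + src MAC (6) + TPID (2) + TCI (2) */
		return false;
	tpid = (uint16_t)(frame[12] << 8 | frame[13]);	/* network byte order */
	if (tpid != TPID_8021Q)
		return false;
	tci = (uint16_t)(frame[14] << 8 | frame[15]);
	out->pcp = (uint8_t)(tci >> 13);
	out->dei = (tci >> 12) & 1;
	out->vid = tci & 0x0fff;
	return true;
}
```

A frame carrying TCI bytes b0 64 decodes as PCP 5, DEI 1, VID 100; if the sender's DEI was cleared somewhere along the path, the same frame would show DEI 0 here.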


* Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
From: Alexei Starovoitov @ 2016-12-14  2:21 UTC
  To: Michał Mirosław
  Cc: Stephen Hemminger, netdev,
	moderated list:ETHERNET BRIDGE, open list:OPENVSWITCH
In-Reply-To: <20161214020305.qck2bpxmfh6ltrw7@rere.qmqm.pl>

On Tue, Dec 13, 2016 at 6:03 PM, Michał Mirosław
<mirq-linux@rere.qmqm.pl> wrote:
> On Tue, Dec 13, 2016 at 05:21:18PM -0800, Stephen Hemminger wrote:
>> On Sat,  3 Dec 2016 10:22:28 +0100 (CET)
>> Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
>> > This all-in-one patch removes the abuse of the VLAN CFI bit, so that it can be passed
>> > intact through the Linux networking stack.
>> >
>> > Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
>> > ---
>> >
>> > Dear NetDevs
>> >
>> > I guess this needs to be split into the prep..convert[]..finish sequence,
>> > but if you like it as is, then it's ready.
>> >
>> > The biggest question is whether the modified interface and vlan_present
>> > are the way to go. This can be changed to use vlan_proto != 0 instead
>> > of an extra flag bit.
>> >
>> > As I can't test most of the driver changes, please look at them carefully.
>> > OVS and bridge eyes are especially welcome.
>> >
>> > Best Regards,
>> > Michał Mirosław
>> Is the motivation to support 802.1ad Drop Eligible Indicator (DEI)?
>>
>> If so then you need to be more verbose in the commit log, and lots more
>> work is needed. You need to rename fields and validate every place a
>> driver is using DEI bit to make sure it really does the right thing
>> on that hardware. It is not just a mechanical change.
>
> My main motivation is to be able to see the bit intact in tcpdump and be
> able to pass it untouched through at least a veth pair. It would be great
> if all devices didn't do something stupid with the bit, but it's not
> something I am able to make happen.

imo "be able to pass untouched through veth" is not good enough
justification for such invasive patches.
I'm still not sure that all of these changes don't affect user space.



* [PATCH 4/3] random: use siphash24 instead of md5 for get_random_int/long
From: Jason A. Donenfeld @ 2016-12-14  3:10 UTC
  To: Netdev, David Miller, Linus Torvalds,
	kernel-hardening@lists.openwall.com, LKML, George Spelvin,
	Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH, Eric Biggers,
	linux-crypto, Ted Tso
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson
In-Reply-To: <20161214001656.19388-1-Jason@zx2c4.com>

This duplicates the current algorithm for get_random_int/long, but uses
siphash24 instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
hashes the pid, entropy, and timestamp as fixed width fields, in order
to increase diffusion.

The previous md5 algorithm used a per-cpu md5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash24, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, the per-cpu md5 digest state (MD5_DIGEST_WORDS 32-bit words)
is replaced by a per-cpu copy of the previously returned value, which is
used for chaining.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
 drivers/char/random.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..25f96f074da5 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,7 +2043,7 @@ struct ctl_table random_table[] = {
 };
 #endif 	/* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+static u8 random_int_secret[SIPHASH24_KEY_LEN];
 
 int random_int_secret_init(void)
 {
@@ -2050,8 +2051,7 @@ int random_int_secret_init(void)
 	return 0;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-		__aligned(sizeof(unsigned long));
+static DEFINE_PER_CPU(u64, get_random_int_chaining);
 
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
@@ -2061,19 +2061,25 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-	__u32 *hash;
+	uint64_t *chaining;
 	unsigned int ret;
+	struct {
+		uint64_t chaining;
+		unsigned long ts;
+		unsigned long entropy;
+		pid_t pid;
+	} __packed combined;
 
 	if (arch_get_random_int(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = hash[0];
-	put_cpu_var(get_random_int_hash);
-
+	chaining = &get_cpu_var(get_random_int_chaining);
+	combined.chaining = *chaining;
+	combined.ts = jiffies;
+	combined.entropy = random_get_entropy();
+	combined.pid = current->pid;
+	ret = *chaining = siphash24((u8 *)&combined, sizeof(combined), random_int_secret);
+	put_cpu_var(chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_int);
@@ -2083,19 +2089,25 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-	__u32 *hash;
+	uint64_t *chaining;
 	unsigned long ret;
+	struct {
+		uint64_t chaining;
+		unsigned long ts;
+		unsigned long entropy;
+		pid_t pid;
+	} __packed combined;
 
 	if (arch_get_random_long(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = *(unsigned long *)hash;
-	put_cpu_var(get_random_int_hash);
-
+	chaining = &get_cpu_var(get_random_int_chaining);
+	combined.chaining = *chaining;
+	combined.ts = jiffies;
+	combined.entropy = random_get_entropy();
+	combined.pid = current->pid;
+	ret = *chaining = siphash24((u8 *)&combined, sizeof(combined), random_int_secret);
+	put_cpu_var(chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_long);
-- 
2.11.0


* [PATCH v2 1/4] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-14  3:59 UTC
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Daniel J . Bernstein,
	Linus Torvalds, Eric Biggers

SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Unlike HMAC, however, SipHash is
very fast, and it is meant to be used as a keyed hashtable lookup function.

SipHash isn't just some new trendy hash function. It's been around for a
while, and there really isn't anything that comes remotely close to
being useful in the way SipHash is. With that said, why do we need this?

There are a variety of attacks known as "hashtable poisoning" in which an
attacker crafts many inputs that all hash to the same value, and then
proceeds to fill a single hash bucket with them. This is a realistic and
well-known denial-of-service vector.
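A toy illustration of the poisoning problem, assuming a deliberately weak, unkeyed hash (a byte sum, not jhash): because anyone can compute the hash, an attacker can mass-produce distinct keys that all land in one bucket, turning lookups in that bucket into a linear scan. All names here are hypothetical.

```c
#include <stddef.h>

#define NBUCKETS 64

/* Deliberately weak, unkeyed hash: the sum of the bytes. Fully predictable. */
static unsigned toy_hash(const char *s)
{
	unsigned h = 0;

	while (*s)
		h += (unsigned char)*s++;
	return h % NBUCKETS;
}

/* Attacker side: build up to 26 distinct two-letter keys ("az", "by", "cx", ...)
 * whose byte sums all equal 'a' + 'z', so every key hashes to the same bucket. */
static size_t make_colliding_keys(char keys[][3], size_t n)
{
	size_t i;

	for (i = 0; i < n && i < 26; i++) {
		keys[i][0] = (char)('a' + i);
		keys[i][1] = (char)('z' - i);
		keys[i][2] = '\0';
	}
	return i;
}

/* Count how many keys a chained hash table would place in each bucket. */
static void bucket_histogram(char keys[][3], size_t n, size_t hist[NBUCKETS])
{
	for (size_t b = 0; b < NBUCKETS; b++)
		hist[b] = 0;
	for (size_t i = 0; i < n; i++)
		hist[toy_hash(keys[i])]++;
}
```

All 26 crafted keys end up in bucket 27 ('a' + 'z' = 219, and 219 mod 64 = 27). With a secret-keyed PRF in place of toy_hash(), the attacker can no longer predict the bucket without knowing the key.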

Linux developers already seem to be aware that this is an issue, and
various places that use hash tables in, say, a network context, use a
non-cryptographically secure function (usually jhash) and then try to
twiddle with the key on a time basis (or in many cases just do nothing
and hope that nobody notices). While this is an admirable attempt at
solving the problem, it doesn't actually fix it. SipHash fixes it.

(It fixes it in such a sound way that you could even build a stream
cipher out of SipHash that would resist modern cryptanalysis.)

There are a number of places in the kernel that are vulnerable to
hashtable poisoning attacks, via either userspace or network vectors,
and there is currently no reliable mechanism inside the kernel to fix
them. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

Dozens of languages already use it internally for their hash
tables, and some of the BSDs already use it in their kernels. SipHash is
a widely known high-speed solution to a widely known problem, and it's
time we caught up.
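For experimenting outside the kernel, here is a userspace transcription of SipHash-2-4 in portable C. It is a sketch that follows the same constants and round structure as lib/siphash.c in this patch, but it replaces get_unaligned_le64() and the word-at-a-time tail load with a byte-by-byte little-endian load, so it makes no alignment or endianness assumptions. The helper names rotl64() and load_le64() are our own.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, int b)
{
	return (x << b) | (x >> (64 - b));
}

/* Little-endian 64-bit load that works for any alignment and host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

#define SIPROUND do { \
	v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32); \
	v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2; \
	v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0; \
	v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32); \
} while (0)

/* SipHash-2-4: 2 compression rounds per 8-byte block, 4 finalization rounds. */
uint64_t siphash24(const uint8_t *data, size_t len, const uint8_t key[16])
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	uint64_t b = (uint64_t)len << 56;	/* message length in the top byte */
	size_t full = len - (len % 8);
	size_t i;

	for (i = 0; i < full; i += 8) {
		uint64_t m = load_le64(data + i);

		v3 ^= m;
		SIPROUND;
		SIPROUND;
		v0 ^= m;
	}
	for (i = 0; i < len % 8; i++)	/* 0-7 trailing bytes, little-endian */
		b |= (uint64_t)data[full + i] << (8 * i);

	v3 ^= b;
	SIPROUND;
	SIPROUND;
	v0 ^= b;
	v2 ^= 0xff;
	SIPROUND;
	SIPROUND;
	SIPROUND;
	SIPROUND;
	return v0 ^ v1 ^ v2 ^ v3;
}
```

With key bytes 00..0f and message bytes 00..len-1, this should reproduce the test_vectors table in lib/test_siphash.c, e.g. 0x726fdb47dd0e0e31 for the empty message and 0xa129ca6149be45e5 for a 15-byte message.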

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Daniel J. Bernstein <djb@cr.yp.to>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
---
Changes from v1->v2:

   - None in this patch, but see elsewhere in series.

 include/linux/siphash.h | 20 +++++++++++++
 lib/Kconfig.debug       |  6 ++--
 lib/Makefile            |  5 ++--
 lib/siphash.c           | 76 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/test_siphash.c      | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 176 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c

diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..6623b3090645
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,20 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+
+enum siphash_lengths {
+	SIPHASH24_KEY_LEN = 16
+};
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e6327d102184..32bbf689fc46 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1843,9 +1843,9 @@ config TEST_HASH
 	tristate "Perform selftest on hash functions"
 	default n
 	help
-	  Enable this option to test the kernel's integer (<linux/hash,h>)
-	  and string (<linux/stringhash.h>) hash functions on boot
-	  (or module load).
+	  Enable this option to test the kernel's integer (<linux/hash.h>),
+	  string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
+	  hash functions on boot (or module load).
 
 	  This is intended to help people writing architecture-specific
 	  optimized versions.  If unsure, say N.
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..71d398b04a74 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+	 earlycpio.o seq_buf.o siphash.o \
+	 nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..7b55ad3a7fe9
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,76 @@
+/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
+ * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <asm/unaligned.h>
+
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+#include <linux/dcache.h>
+#include <asm/word-at-a-time.h>
+#endif
+
+#define SIPROUND \
+	do { \
+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
+	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
+	} while(0)
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	u64 v0 = 0x736f6d6570736575ULL;
+	u64 v1 = 0x646f72616e646f6dULL;
+	u64 v2 = 0x6c7967656e657261ULL;
+	u64 v3 = 0x7465646279746573ULL;
+	u64 b = ((u64)len) << 56;
+	u64 k0 = get_unaligned_le64(key);
+	u64 k1 = get_unaligned_le64(key + sizeof(u64));
+	u64 m;
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(u64)) {
+		m = get_unaligned_le64(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+	if (left)
+		b |= le64_to_cpu(load_unaligned_zeropad(data) & bytemask_from_count(left));
+#else
+	switch (left) {
+	case 7: b |= ((u64)data[6]) << 48;
+	case 6: b |= ((u64)data[5]) << 40;
+	case 5: b |= ((u64)data[4]) << 32;
+	case 4: b |= get_unaligned_le32(data); break;
+	case 3: b |= ((u64)data[2]) << 16;
+	case 2: b |= get_unaligned_le16(data); break;
+	case 1: b |= data[0];
+	}
+#endif
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	return (v0 ^ v1) ^ (v2 ^ v3);
+}
+EXPORT_SYMBOL(siphash24);
diff --git a/lib/test_siphash.c b/lib/test_siphash.c
new file mode 100644
index 000000000000..336298aaa33b
--- /dev/null
+++ b/lib/test_siphash.c
@@ -0,0 +1,74 @@
+/* Test cases for siphash.c
+ *
+ * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+
+/* Test vectors taken from official reference source available at:
+ *     https://131002.net/siphash/siphash24.c
+ */
+static const u64 test_vectors[64] = {
+	0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
+	0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
+	0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
+	0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
+	0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
+	0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
+	0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
+	0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
+	0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
+	0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
+	0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
+	0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
+	0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
+	0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
+	0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
+	0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
+	0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
+	0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
+	0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
+	0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
+	0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
+	0x958a324ceb064572ULL
+};
+
+static int __init siphash_test_init(void)
+{
+	u8 in[64], k[16], i;
+	int ret = 0;
+
+	for (i = 0; i < 16; ++i)
+		k[i] = i;
+	for (i = 0; i < 64; ++i) {
+		in[i] = i;
+		if (siphash24(in, i, k) != test_vectors[i]) {
+			pr_info("self-test %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+	}
+	if (!ret)
+		pr_info("self-tests: pass\n");
+	return ret;
+}
+
+static void __exit siphash_test_exit(void)
+{
+}
+
+module_init(siphash_test_init);
+module_exit(siphash_test_exit);
+
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
2.11.0


* [PATCH v2 2/4] siphash: add convenience functions for jhash converts
From: Jason A. Donenfeld @ 2016-12-14  3:59 UTC
  To: Netdev, kernel-hardening, LKML, linux-crypto; +Cc: Jason A. Donenfeld
In-Reply-To: <20161214035927.30004-1-Jason@zx2c4.com>

Many jhash users currently rely on the jhash_Nwords functions
(jhash_1word, jhash_2words, jhash_3words). To make the transition to
siphash feel like something people already know, we provide analogous
functions here. This is also convenient for the networking stack, where
hashing 32-bit fields is common.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes from v1->v2:

  - None in this patch, but see elsewhere in series.

 include/linux/siphash.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/include/linux/siphash.h b/include/linux/siphash.h
index 6623b3090645..1391054c4c29 100644
--- a/include/linux/siphash.h
+++ b/include/linux/siphash.h
@@ -17,4 +17,37 @@ enum siphash_lengths {
 
 u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
 
+static inline u64 siphash24_1word(const u32 a, const u8 key[SIPHASH24_KEY_LEN])
+{
+	return siphash24((u8 *)&a, sizeof(a), key);
+}
+
+static inline u64 siphash24_2words(const u32 a, const u32 b, const u8 key[SIPHASH24_KEY_LEN])
+{
+	const struct {
+		u32 a;
+		u32 b;
+	} __packed combined = {
+		.a = a,
+		.b = b
+	};
+
+	return siphash24((const u8 *)&combined, sizeof(combined), key);
+}
+
+static inline u64 siphash24_3words(const u32 a, const u32 b, const u32 c, const u8 key[SIPHASH24_KEY_LEN])
+{
+	const struct {
+		u32 a;
+		u32 b;
+		u32 c;
+	} __packed combined = {
+		.a = a,
+		.b = b,
+		.c = c
+	};
+
+	return siphash24((const u8 *)&combined, sizeof(combined), key);
+}
+
 #endif /* _LINUX_SIPHASH_H */
-- 
2.11.0


* [PATCH v2 3/4] secure_seq: use siphash24 instead of md5_transform
From: Jason A. Donenfeld @ 2016-12-14  3:59 UTC
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Andi Kleen
In-Reply-To: <20161214035927.30004-1-Jason@zx2c4.com>

This gives a clear speed and security improvement: SipHash is both
faster and cryptographically stronger than the aging MD5.

Rather than manually filling MD5 buffers, we simply lay out the inputs
in an anonymous struct, for which gcc generates rather efficient code.
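Why the anonymous structs in this patch are marked __packed, sketched in userspace C under the assumption of a GCC/Clang-compatible compiler (the kernel's __packed expands to __attribute__((packed))): without it, the IPv4 ephemeral-port tuple would typically get two trailing padding bytes, whose contents are unspecified for an automatic variable and would still be fed into the hash. The struct and function names are hypothetical, and the 12-byte figure assumes a common ABI where uint32_t is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field order as the IPv4 ephemeral-port tuple. */
struct v4_tuple_padded {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t dport;
};				/* typically 12 bytes: 2 trailing padding bytes */

struct v4_tuple_packed {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t dport;
} __attribute__((packed));	/* 10 bytes: exactly the fields, no padding */

/* Portable alternative without a compiler extension: copy each field
 * to a fixed offset in a byte buffer. Returns the number of bytes written. */
static size_t serialize_v4_tuple(uint8_t out[10], uint32_t saddr,
				 uint32_t daddr, uint16_t dport)
{
	memcpy(out, &saddr, 4);
	memcpy(out + 4, &daddr, 4);
	memcpy(out + 8, &dport, 2);
	return 10;
}
```

Both the packed struct and the serialized buffer give the hash exactly the 10 bytes of the tuple, in host byte order for each field; the padded struct would hand it 12, two of which are not controlled by the caller.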

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
Changes from v1->v2:

  - Rebased on the latest 4.10, and now uses the top 32 bits of the
    siphash output for the optional ts value.
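A sketch of how the single 64-bit siphash output is divided in v2: the low 32 bits feed the initial sequence number and the top 32 bits become the optional timestamp offset. The clock term models seq_scale() in net/core/secure_seq.c, which, as far as we can tell from the 4.10 sources, adds ktime_get_real_ns() >> 6 (a 64 ns tick); here the time is passed in as now_ns so the sketch stays deterministic. The names split_hash and isn_parts are hypothetical.

```c
#include <stdint.h>

struct isn_parts {
	uint32_t seq;	/* initial sequence number */
	uint32_t tsoff;	/* TCP timestamp offset, 0 if timestamps are off */
};

static struct isn_parts split_hash(uint64_t hash, int timestamps_enabled,
				   uint64_t now_ns)
{
	struct isn_parts p;

	/* Low 32 bits plus a 64 ns-resolution clock; wraps modulo 2^32. */
	p.seq = (uint32_t)hash + (uint32_t)(now_ns >> 6);
	/* High 32 bits: disjoint from the bits used for seq. */
	p.tsoff = timestamps_enabled ? (uint32_t)(hash >> 32) : 0;
	return p;
}
```

For example, a hash of 0x1122334455667788 at now_ns = 64 gives seq 0x55667789 and tsoff 0x11223344.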

 net/core/secure_seq.c | 160 +++++++++++++++++++++++++-------------------------
 1 file changed, 79 insertions(+), 81 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..abadc79cd5d3 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,5 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. */
+
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/cryptohash.h>
@@ -8,14 +10,14 @@
 #include <linux/ktime.h>
 #include <linux/string.h>
 #include <linux/net.h>
-
+#include <linux/siphash.h>
 #include <net/secure_seq.h>
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include <linux/in6.h>
 #include <net/tcp.h>
-#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+static u8 net_secret[SIPHASH24_KEY_LEN];
 
 static __always_inline void net_secret_init(void)
 {
@@ -44,44 +46,39 @@ static u32 seq_scale(u32 seq)
 u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 				 __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash24((const u8 *)&combined, sizeof(combined), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
-	u32 i;
-
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32) daddr[i];
-	secret[4] = net_secret[4] + (__force u32)dport;
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	return hash[0];
+	return siphash24((const u8 *)&combined, sizeof(combined), net_secret);
 }
 EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
@@ -91,33 +88,37 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 			       __be16 sport, __be16 dport, u32 *tsoff)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 sport;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.sport = sport,
+		.dport = dport
+	};
+	u64 hash;
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	*tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-	return seq_scale(hash[0]);
+	hash = siphash24((const u8 *)&combined, sizeof(combined), net_secret);
+	*tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+	return seq_scale(hash);
 }
 
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
-
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.dport = dport
+	};
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = (__force u32)dport ^ net_secret[14];
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	return hash[0];
+	return seq_scale(siphash24((const u8 *)&combined, sizeof(combined), net_secret));
 }
 EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 #endif
@@ -126,21 +127,22 @@ EXPORT_SYMBOL_GPL(secure_ipv4_port_ephemeral);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport)
 {
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		__be32 saddr;
+		__be32 daddr;
+		__be16 sport;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = saddr,
+		.daddr = daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-
 	net_secret_init();
-	hash[0] = (__force u32)saddr;
-	hash[1] = (__force u32)daddr;
-	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
-	hash[3] = net_secret[15];
-
-	md5_transform(hash, net_secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash24((const u8 *)&combined, sizeof(combined), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccp_sequence_number);
@@ -149,26 +151,22 @@ EXPORT_SYMBOL(secure_dccp_sequence_number);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 				  __be16 sport, __be16 dport)
 {
-	u32 secret[MD5_MESSAGE_BYTES / 4];
-	u32 hash[MD5_DIGEST_WORDS];
+	const struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		__be16 sport;
+		__be16 dport;
+	} __packed combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.daddr = *(struct in6_addr *)daddr,
+		.sport = sport,
+		.dport = dport
+	};
 	u64 seq;
-	u32 i;
-
 	net_secret_init();
-	memcpy(hash, saddr, 16);
-	for (i = 0; i < 4; i++)
-		secret[i] = net_secret[i] + (__force u32)daddr[i];
-	secret[4] = net_secret[4] +
-		(((__force u16)sport << 16) + (__force u16)dport);
-	for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-		secret[i] = net_secret[i];
-
-	md5_transform(hash, secret);
-
-	seq = hash[0] | (((u64)hash[1]) << 32);
+	seq = siphash24((const u8 *)&combined, sizeof(combined), net_secret);
 	seq += ktime_get_real_ns();
 	seq &= (1ull << 48) - 1;
-
 	return seq;
 }
 EXPORT_SYMBOL(secure_dccpv6_sequence_number);
-- 
2.11.0


* [PATCH v2 4/4] random: use siphash24 instead of md5 for get_random_int/long
From: Jason A. Donenfeld @ 2016-12-14  3:59 UTC
  To: Netdev, kernel-hardening, LKML, linux-crypto
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Ted Tso
In-Reply-To: <20161214035927.30004-1-Jason@zx2c4.com>

This duplicates the current algorithm for get_random_int/long, but uses
siphash24 instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
hashes the pid, entropy, and timestamp as fixed width fields, in order
to increase diffusion.

The previous md5 algorithm used a per-cpu md5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash24, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, the per-cpu md5 digest state (MD5_DIGEST_WORDS 32-bit words)
is replaced by a per-cpu copy of the previously returned value, which is
used for chaining.
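A userspace sketch of the chaining construction described above. It carries its own portable SipHash-2-4 (same constants and rounds as lib/siphash.c) and models the per-cpu get_random_int_chaining variable as a single global, so it is single-threaded only. The combined_input layout assumes unsigned long is 64 bits, and the names chained_random and chaining_state are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, int b) { return (x << b) | (x >> (64 - b)); }

static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

#define SIPROUND do { \
	v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32); \
	v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2; \
	v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0; \
	v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32); \
} while (0)

static uint64_t siphash24(const uint8_t *data, size_t len, const uint8_t key[16])
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0, v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0, v3 = 0x7465646279746573ULL ^ k1;
	uint64_t b = (uint64_t)len << 56;
	size_t full = len - (len % 8), i;

	for (i = 0; i < full; i += 8) {
		uint64_t m = load_le64(data + i);
		v3 ^= m; SIPROUND; SIPROUND; v0 ^= m;
	}
	for (i = 0; i < len % 8; i++)
		b |= (uint64_t)data[full + i] << (8 * i);
	v3 ^= b; SIPROUND; SIPROUND; v0 ^= b;
	v2 ^= 0xff; SIPROUND; SIPROUND; SIPROUND; SIPROUND;
	return v0 ^ v1 ^ v2 ^ v3;
}

/* Hypothetical stand-in for the per-cpu get_random_int_chaining variable. */
static uint64_t chaining_state;

/* Fixed-width, padding-free input, mirroring the patch's "combined" struct. */
struct combined_input {
	uint64_t chaining;
	uint64_t ts;
	uint64_t entropy;
	int32_t pid;
} __attribute__((packed));

static uint64_t chained_random(uint64_t ts, uint64_t entropy, int32_t pid,
			       const uint8_t key[16])
{
	struct combined_input c = {
		.chaining = chaining_state,
		.ts = ts,
		.entropy = entropy,
		.pid = pid,
	};

	/* The output is stored and fed into the next call's input. */
	chaining_state = siphash24((const uint8_t *)&c, sizeof(c), key);
	return chaining_state;
}
```

Two back-to-back calls with identical ts, entropy and pid still return different values, because the first output becomes part of the second input; resetting the chaining value to its starting point reproduces the same sequence, which is why the secret key carries the security.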

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Ted Tso <tytso@mit.edu>
---
Changes from v1->v2:

  - Uses u64 instead of uint64_t
  - Uses get_cpu_ptr instead of get_cpu_var

 drivers/char/random.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..61c4b45427dc 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include <linux/syscalls.h>
 #include <linux/completion.h>
 #include <linux/uuid.h>
+#include <linux/siphash.h>
 #include <crypto/chacha20.h>
 
 #include <asm/processor.h>
@@ -2042,7 +2043,7 @@ struct ctl_table random_table[] = {
 };
 #endif 	/* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+static u8 random_int_secret[SIPHASH24_KEY_LEN];
 
 int random_int_secret_init(void)
 {
@@ -2050,8 +2051,7 @@ int random_int_secret_init(void)
 	return 0;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-		__aligned(sizeof(unsigned long));
+static DEFINE_PER_CPU(u64, get_random_int_chaining);
 
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
@@ -2061,19 +2061,25 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-	__u32 *hash;
 	unsigned int ret;
+	struct {
+		u64 chaining;
+		unsigned long ts;
+		unsigned long entropy;
+		pid_t pid;
+	} __packed combined;
+	u64 *chaining;
 
 	if (arch_get_random_int(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = hash[0];
-	put_cpu_var(get_random_int_hash);
-
+	chaining = get_cpu_ptr(&get_random_int_chaining);
+	combined.chaining = *chaining;
+	combined.ts = jiffies;
+	combined.entropy = random_get_entropy();
+	combined.pid = current->pid;
+	ret = *chaining = siphash24((u8 *)&combined, sizeof(combined), random_int_secret);
+	put_cpu_ptr(chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_int);
@@ -2083,19 +2089,25 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-	__u32 *hash;
 	unsigned long ret;
+	struct {
+		u64 chaining;
+		unsigned long ts;
+		unsigned long entropy;
+		pid_t pid;
+	} __packed combined;
+	u64 *chaining;
 
 	if (arch_get_random_long(&ret))
 		return ret;
 
-	hash = get_cpu_var(get_random_int_hash);
-
-	hash[0] += current->pid + jiffies + random_get_entropy();
-	md5_transform(hash, random_int_secret);
-	ret = *(unsigned long *)hash;
-	put_cpu_var(get_random_int_hash);
-
+	chaining = get_cpu_ptr(&get_random_int_chaining);
+	combined.chaining = *chaining;
+	combined.ts = jiffies;
+	combined.entropy = random_get_entropy();
+	combined.pid = current->pid;
+	ret = *chaining = siphash24((u8 *)&combined, sizeof(combined), random_int_secret);
+	put_cpu_ptr(chaining);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_long);
-- 
2.11.0

^ permalink raw reply related

* Re: [RFC PATCH v3] audit: use proper refcount locking on audit_sock
From: Richard Guy Briggs @ 2016-12-14  4:00 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, LKML, Eric Dumazet, linux-audit,
	Dmitry Vyukov
In-Reply-To: <CAM_iQpVFkNdEirvBDi8wV=iExt9BnCm3KU7+Q8oqhrJJtcnu9Q@mail.gmail.com>

On 2016-12-13 16:19, Cong Wang wrote:
> On Tue, Dec 13, 2016 at 7:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > @@ -1283,8 +1299,10 @@ static void __net_exit audit_net_exit(struct net *net)
> >  {
> >         struct audit_net *aunet = net_generic(net, audit_net_id);
> >         struct sock *sock = aunet->nlsk;
> > +       mutex_lock(&audit_cmd_mutex);
> >         if (sock == audit_sock)
> >                 auditd_reset();
> > +       mutex_unlock(&audit_cmd_mutex);
> 
> This still doesn't look correct to me, b/c here we release the audit_sock
> refcnt twice:
> 
> 1) inside audit_reset()

The audit_reset() refcount decrement corresponds to a setting of
audit_sock only if audit_sock is still non-NULL.

> 2) netlink_kernel_release()

This refcount decrement corresponds to netlink_kernel_create().
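The pairing described above can be checked with a toy model. It assumes a plain integer in place of the socket's real reference count and runs single-threaded, so `audit_cmd_mutex` is omitted. All names are hypothetical stand-ins for the audit and netlink functions mentioned in the thread. The key point is that the reset path only drops a reference while the published pointer is non-NULL, so calling it twice cannot double-release.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sock: only the refcount matters here. */
struct fake_sock {
	int refcnt;
};

static struct fake_sock *audit_sock_model;	/* models the global audit_sock */

static void hold(struct fake_sock *sk) { sk->refcnt++; }
static void put(struct fake_sock *sk)  { sk->refcnt--; }

/* Models netlink_kernel_create(): the creator owns one reference. */
static void model_create(struct fake_sock *sk) { sk->refcnt = 1; }

/* Models registering auditd: publishing the pointer takes its own ref. */
static void model_set_audit_sock(struct fake_sock *sk)
{
	hold(sk);
	audit_sock_model = sk;
}

/* Models auditd_reset(): drop the published ref only if one is set. */
static void model_reset(void)
{
	if (audit_sock_model != NULL) {
		put(audit_sock_model);
		audit_sock_model = NULL;
	}
}

/* Models netlink_kernel_release(): drops the creator's reference. */
static void model_release(struct fake_sock *sk) { put(sk); }
```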

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: stmmac driver...
From: Jie Deng @ 2016-12-14  4:05 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, David Miller, alexandre.torgue; +Cc: netdev
In-Reply-To: <35edb551-518e-f59f-7a9e-db108d8f42a7@st.com>

Hi Peppe,

On 2016/12/12 22:17, Giuseppe CAVALLARO wrote:
> Hi David
>
> On 12/7/2016 7:06 PM, David Miller wrote:
>>
>> Giuseppe and Alexandre,
>>
>> There are a lot of patches and discussions happening around the stammc
>> driver lately and both of you are listed as the maintainers.
>>
>> I really need prompt and conclusive reviews of these patch submissions
>> from you, and participation in all discussions about the driver.
>
> yes we are trying to do the best.
>
>> Otherwise I have only three things I can do: 1) let the patches rot in
>> patchwork for days 2) trust that the patches are sane and fit your
>> desires and goals and just apply them or 3) reject them since they
>> aren't being reviewed properly.
>
> at this stage, I think the best is: (3).
I think the patches David mentioned also included XLGMAC. He sent this email
before I explained that QoS and XLGMAC are different IPs. Do you mind if we do
XLGMAC development under drivers/net/ethernet/synopsys/? I don't think there is
a conflict, since we will keep QoS development in stmmac.
>
>>
>> Thanks in advance.
>>
> you are welcome
>
>
> Peppe

^ permalink raw reply

* Re: netlink: GPF in sock_sndtimeo
From: Richard Guy Briggs @ 2016-12-14  4:17 UTC (permalink / raw)
  To: Cong Wang
  Cc: Herbert Xu, Johannes Berg, netdev, Florian Westphal, LKML,
	Eric Dumazet, linux-audit, syzkaller, David Miller, Dmitry Vyukov
In-Reply-To: <CAM_iQpVPEJ2t29ENpT4qcBznwE83w_PEBOxStwyzDH27Si2Ppw@mail.gmail.com>

On 2016-12-13 16:17, Cong Wang wrote:
> On Tue, Dec 13, 2016 at 2:52 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > It is actually the audit_pid and audit_nlk_portid that I care about
> > more.  The audit daemon could vanish or close the socket while the
> > kernel sock to which it was attached is still quite valid.  Accessing
> > the set of three atomically is the urge.  I wonder if it makes more
> > sense to test for the presence of auditd using audit_sock rather than
> > audit_pid, but still keep audit_pid for our reporting and replacement
> > strategy.  Another idea would be to put the three in one struct.
> 
> Note, the process has audit_pid should hold a refcnt to the netns too,
> so the netns can't be gone until that process is gone.

I noted that.  I did wonder if there might be a problem if all the
processes were moved to another netns with the struct sock stuck in the
now process-void netns.

This is alluded to in 6f285b19d09f ("audit: Send replies in the proper
network namespace.").

> > Can someone explain how they think the original test was able to trigger
> > this GPF?  Network namespace shutdown while something pretended to set
> > up a new auditd?  That's impressive for a fuzzer if that's the case...
> > Is there an strace?  I guess it is all in test().
> 
> I am surprised you still don't get the race condition even when you
> are now working on v2...
> 
> The race happens in this scenarios :
> 
> 1) Create a new netns
> 
> 2) In the new netns, communicate with kauditd to set audit_sock
> 
> 3) Generate some audit messages, so kauditd will keep sending them
> via audit_sock
> 
> 4) exit the netns
> 
> 5) the previous audit_sock is now going away, but kaudit_sock could still
> access it in this small window.

Ah ok that fits...
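In the five steps quoted above, kauditd keeps using `audit_sock` while netns teardown drops what may be the last reference. A common way to close that kind of window is for the sender to pin the object with its own reference, taken under the same lock the teardown path uses. The sketch below illustrates that generic pattern; it is an assumption, not what the audit patches in this thread actually do. All names are hypothetical. A counter and a `freed` flag stand in for `sock_hold()`/`sock_put()` and the real free.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
	int refcnt;	/* protected by `lock` in this sketch */
	bool freed;	/* set instead of actually freeing */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *shared;	/* models audit_sock */

static void obj_put_locked(struct obj *o)
{
	if (--o->refcnt == 0)
		o->freed = true;	/* a real implementation frees here */
}

/* Sender side (models kauditd): pin the object before using it. */
static struct obj *sender_get(void)
{
	struct obj *o;

	pthread_mutex_lock(&lock);
	o = shared;
	if (o)
		o->refcnt++;
	pthread_mutex_unlock(&lock);
	return o;
}

static void sender_put(struct obj *o)
{
	pthread_mutex_lock(&lock);
	obj_put_locked(o);
	pthread_mutex_unlock(&lock);
}

/* Teardown side (models netns exit): unpublish, then drop its ref. */
static void teardown(void)
{
	pthread_mutex_lock(&lock);
	if (shared) {
		obj_put_locked(shared);
		shared = NULL;
	}
	pthread_mutex_unlock(&lock);
}
```

Because the sender's reference is taken before teardown can run, the object stays alive until the sender's own put, even if teardown happens in between.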

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Wei Xu @ 2016-12-14  4:24 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: jasowang, netdev, mst, nhorman, davem
In-Reply-To: <20161213194431.42vtozn6bs24vwda@thunk.org>

On 2016-12-14 03:44, Theodore Ts'o wrote:
> Jason's patch fixed the issue, so I think we have the proper fix, but
> to answer your questions:
>
> On Wed, Dec 14, 2016 at 01:46:44AM +0800, Wei Xu wrote:
>>
>> Q1:
>> Which distribution are you using for the GCE instance?
>
> The test appliance is based on Debian Jessie.
>
>> Q2:
>> Are you running xfs test as an embedded VM case, which means XFS test
>> appliance is also a VM inside the GCE instance? Or the kernel is built
>> for the instance itself?
>
> No, GCE currently doesn't support running nested VM's (e.g., running
> VM's inside GCE).  So the kernel is built for the instance itself.
> The way the test appliance works is that it initially boots using the
> Debian Jessie default kernel and then we kexec into the kernel under
> test.
>
>> Q3:
>> Can this bug be reproduced for kvm-xfstests case? I'm trying to set up
>> a local test bed if it makes sense.
>
> You definitely can't do it out of the box -- you need to build the
> image using "gen-image --networking", and then run "kvm-xfstests -N
> shell" as root.  But the bug doesn't reproduce on kvm-xfstests, using
> a 4.9 host kernel and linux-next guest kernel.
>

OK, thanks a lot.

BTW, although this is a guest issue, is there any way to view the GCE
host kernel or QEMU (if that is what it uses) version?

>
> Cheers,
>
> 					- Ted
>

^ permalink raw reply

* Re: [PATCH v3 net-next 1/3] openvswitch: Add a missing break statement.
From: Pravin Shelar @ 2016-12-14  5:07 UTC (permalink / raw)
  To: Jarno Rajahalme; +Cc: Linux Kernel Network Developers, Jiri Benc, Eric Garver
In-Reply-To: <1480462253-114713-1-git-send-email-jarno@ovn.org>

On Tue, Nov 29, 2016 at 3:30 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
> Add a break statement to prevent fall-through from
> OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
> actions setting ethernet addresses fail to validate with log messages
> complaining about invalid tunnel attributes.
>
> Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
> Acked-by: Pravin B Shelar <pshelar@ovn.org>
> Acked-by: Jiri Benc <jbenc@redhat.com>

Hi Jarno,
Since this is a straightforward patch, can you send it separately so
that we can get it merged soon?

Thanks,
Pravin.

^ permalink raw reply

* Re: [RFC PATCH v3] audit: use proper refcount locking on audit_sock
From: Cong Wang @ 2016-12-14  5:36 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Linux Kernel Network Developers, LKML, Eric Dumazet, linux-audit,
	Dmitry Vyukov
In-Reply-To: <20161214040005.GL22660@madcap2.tricolour.ca>

On Tue, Dec 13, 2016 at 8:00 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2016-12-13 16:19, Cong Wang wrote:
>> On Tue, Dec 13, 2016 at 7:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > @@ -1283,8 +1299,10 @@ static void __net_exit audit_net_exit(struct net *net)
>> >  {
>> >         struct audit_net *aunet = net_generic(net, audit_net_id);
>> >         struct sock *sock = aunet->nlsk;
>> > +       mutex_lock(&audit_cmd_mutex);
>> >         if (sock == audit_sock)
>> >                 auditd_reset();
>> > +       mutex_unlock(&audit_cmd_mutex);
>>
>> This still doesn't look correct to me, b/c here we release the audit_sock
>> refcnt twice:
>>
>> 1) inside audit_reset()
>
> The audit_reset() refcount decrement corresponds to a setting of
> audit_sock only if audit_sock is still non-NULL.
>

Hmm, thinking about it again, it looks like the sock == audit_sock
and audit_sock != NULL checks guarantee we are safe. So,

Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>

^ permalink raw reply

* [PATCH v3] net: macb: Added PCI wrapper for Platform Driver.
From: Bartosz Folta @ 2016-12-14  6:39 UTC (permalink / raw)
  To: Nicolas Ferre, David S. Miller, Niklas Cassel, Alexandre Torgue,
	Satanand Burla, Raghu Vatsavayi, Simon Horman,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
  Cc: Bartosz Folta, Rafal Ozieblo
In-Reply-To: <1481648560-25927-1-git-send-email-bfolta@cadence.com>

There are hardware PCI implementations of the Cadence GEM network
controller. This patch allows such hardware to be used by reusing the
existing platform driver.

Signed-off-by: Bartosz Folta <bfolta@cadence.com>
---
Changed in v3:
Fixed dependencies in Kconfig.
---
Changed in v2:
Respin to net-next. Changed patch formatting.
---
 drivers/net/ethernet/cadence/Kconfig    |   9 ++
 drivers/net/ethernet/cadence/Makefile   |   1 +
 drivers/net/ethernet/cadence/macb.c     |  31 +++++--
 drivers/net/ethernet/cadence/macb_pci.c | 153 ++++++++++++++++++++++++++++++++
 include/linux/platform_data/macb.h      |   6 ++
 5 files changed, 195 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/cadence/macb_pci.c

diff --git a/drivers/net/ethernet/cadence/Kconfig b/drivers/net/ethernet/cadence/Kconfig
index f0bcb15..608bea1 100644
--- a/drivers/net/ethernet/cadence/Kconfig
+++ b/drivers/net/ethernet/cadence/Kconfig
@@ -31,4 +31,13 @@ config MACB
 	  To compile this driver as a module, choose M here: the module
 	  will be called macb.
 
+config MACB_PCI
+	tristate "Cadence PCI MACB/GEM support"
+	depends on MACB && PCI && COMMON_CLK
+	---help---
+	  This is PCI wrapper for MACB driver.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called macb_pci.
+
 endif # NET_CADENCE
diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
index 91f79b1..4ba7559 100644
--- a/drivers/net/ethernet/cadence/Makefile
+++ b/drivers/net/ethernet/cadence/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_MACB) += macb.o
+obj-$(CONFIG_MACB_PCI) += macb_pci.o
diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 538544a..c0fb80a 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -404,6 +404,8 @@ static int macb_mii_probe(struct net_device *dev)
 			phy_irq = gpio_to_irq(pdata->phy_irq_pin);
 			phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
 		}
+	} else {
+		phydev->irq = PHY_POLL;
 	}
 
 	/* attach the mac to the phy */
@@ -482,6 +484,9 @@ static int macb_mii_init(struct macb *bp)
 				goto err_out_unregister_bus;
 		}
 	} else {
+		for (i = 0; i < PHY_MAX_ADDR; i++)
+			bp->mii_bus->irq[i] = PHY_POLL;
+
 		if (pdata)
 			bp->mii_bus->phy_mask = pdata->phy_mask;
 
@@ -2523,16 +2528,24 @@ static int macb_clk_init(struct platform_device *pdev, struct clk **pclk,
 			 struct clk **hclk, struct clk **tx_clk,
 			 struct clk **rx_clk)
 {
+	struct macb_platform_data *pdata;
 	int err;
 
-	*pclk = devm_clk_get(&pdev->dev, "pclk");
+	pdata = dev_get_platdata(&pdev->dev);
+	if (pdata) {
+		*pclk = pdata->pclk;
+		*hclk = pdata->hclk;
+	} else {
+		*pclk = devm_clk_get(&pdev->dev, "pclk");
+		*hclk = devm_clk_get(&pdev->dev, "hclk");
+	}
+
 	if (IS_ERR(*pclk)) {
 		err = PTR_ERR(*pclk);
 		dev_err(&pdev->dev, "failed to get macb_clk (%u)\n", err);
 		return err;
 	}
 
-	*hclk = devm_clk_get(&pdev->dev, "hclk");
 	if (IS_ERR(*hclk)) {
 		err = PTR_ERR(*hclk);
 		dev_err(&pdev->dev, "failed to get hclk (%u)\n", err);
@@ -3107,15 +3120,23 @@ static int at91ether_init(struct platform_device *pdev)
 MODULE_DEVICE_TABLE(of, macb_dt_ids);
 #endif /* CONFIG_OF */
 
+static const struct macb_config default_gem_config = {
+	.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_JUMBO,
+	.dma_burst_length = 16,
+	.clk_init = macb_clk_init,
+	.init = macb_init,
+	.jumbo_max_len = 10240,
+};
+
 static int macb_probe(struct platform_device *pdev)
 {
+	const struct macb_config *macb_config = &default_gem_config;
 	int (*clk_init)(struct platform_device *, struct clk **,
 			struct clk **, struct clk **,  struct clk **)
-					      = macb_clk_init;
-	int (*init)(struct platform_device *) = macb_init;
+					      = macb_config->clk_init;
+	int (*init)(struct platform_device *) = macb_config->init;
 	struct device_node *np = pdev->dev.of_node;
 	struct device_node *phy_node;
-	const struct macb_config *macb_config = NULL;
 	struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
 	unsigned int queue_mask, num_queues;
 	struct macb_platform_data *pdata;
diff --git a/drivers/net/ethernet/cadence/macb_pci.c b/drivers/net/ethernet/cadence/macb_pci.c
new file mode 100644
index 0000000..92be2cd
--- /dev/null
+++ b/drivers/net/ethernet/cadence/macb_pci.c
@@ -0,0 +1,153 @@
+/**
+ * macb_pci.c - Cadence GEM PCI wrapper.
+ *
+ * Copyright (C) 2016 Cadence Design Systems - http://www.cadence.com
+ *
+ * Authors: Rafal Ozieblo <rafalo@cadence.com>
+ *	    Bartosz Folta <bfolta@cadence.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2  of
+ * the License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/clk.h>
+#include <linux/clk-provider.h>
+#include <linux/etherdevice.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/platform_data/macb.h>
+#include <linux/platform_device.h>
+#include "macb.h"
+
+#define PCI_DRIVER_NAME "macb_pci"
+#define PLAT_DRIVER_NAME "macb"
+
+#define CDNS_VENDOR_ID 0x17cd
+#define CDNS_DEVICE_ID 0xe007
+
+#define GEM_PCLK_RATE 50000000
+#define GEM_HCLK_RATE 50000000
+
+static int macb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	int err;
+	struct platform_device *plat_dev;
+	struct platform_device_info plat_info;
+	struct macb_platform_data plat_data;
+	struct resource res[2];
+
+	/* sanity check */
+	if (!id)
+		return -EINVAL;
+
+	/* enable pci device */
+	err = pci_enable_device(pdev);
+	if (err < 0) {
+		dev_err(&pdev->dev, "Enabling PCI device has failed: 0x%04X",
+			err);
+		return -EACCES;
+	}
+
+	pci_set_master(pdev);
+
+	/* set up resources */
+	memset(res, 0x00, sizeof(struct resource) * ARRAY_SIZE(res));
+	res[0].start = pdev->resource[0].start;
+	res[0].end = pdev->resource[0].end;
+	res[0].name = PCI_DRIVER_NAME;
+	res[0].flags = IORESOURCE_MEM;
+	res[1].start = pdev->irq;
+	res[1].name = PCI_DRIVER_NAME;
+	res[1].flags = IORESOURCE_IRQ;
+
+	dev_info(&pdev->dev, "EMAC physical base addr = 0x%p\n",
+		 (void *)(uintptr_t)pci_resource_start(pdev, 0));
+
+	/* set up macb platform data */
+	memset(&plat_data, 0, sizeof(plat_data));
+
+	/* initialize clocks */
+	plat_data.pclk = clk_register_fixed_rate(&pdev->dev, "pclk", NULL, 0,
+						 GEM_PCLK_RATE);
+	if (IS_ERR(plat_data.pclk)) {
+		err = PTR_ERR(plat_data.pclk);
+		goto err_pclk_register;
+	}
+
+	plat_data.hclk = clk_register_fixed_rate(&pdev->dev, "hclk", NULL, 0,
+						 GEM_HCLK_RATE);
+	if (IS_ERR(plat_data.hclk)) {
+		err = PTR_ERR(plat_data.hclk);
+		goto err_hclk_register;
+	}
+
+	/* set up platform device info */
+	memset(&plat_info, 0, sizeof(plat_info));
+	plat_info.parent = &pdev->dev;
+	plat_info.fwnode = pdev->dev.fwnode;
+	plat_info.name = PLAT_DRIVER_NAME;
+	plat_info.id = pdev->devfn;
+	plat_info.res = res;
+	plat_info.num_res = ARRAY_SIZE(res);
+	plat_info.data = &plat_data;
+	plat_info.size_data = sizeof(plat_data);
+	plat_info.dma_mask = DMA_BIT_MASK(32);
+
+	/* register platform device */
+	plat_dev = platform_device_register_full(&plat_info);
+	if (IS_ERR(plat_dev)) {
+		err = PTR_ERR(plat_dev);
+		goto err_plat_dev_register;
+	}
+
+	pci_set_drvdata(pdev, plat_dev);
+
+	return 0;
+
+err_plat_dev_register:
+	clk_unregister(plat_data.hclk);
+
+err_hclk_register:
+	clk_unregister(plat_data.pclk);
+
+err_pclk_register:
+	pci_disable_device(pdev);
+	return err;
+}
+
+static void macb_remove(struct pci_dev *pdev)
+{
+	struct platform_device *plat_dev = pci_get_drvdata(pdev);
+	struct macb_platform_data *plat_data = dev_get_platdata(&plat_dev->dev);
+
+	platform_device_unregister(plat_dev);
+	pci_disable_device(pdev);
+	clk_unregister(plat_data->pclk);
+	clk_unregister(plat_data->hclk);
+}
+
+static struct pci_device_id dev_id_table[] = {
+	{ PCI_DEVICE(CDNS_VENDOR_ID, CDNS_DEVICE_ID), },
+	{ 0, }
+};
+
+static struct pci_driver macb_pci_driver = {
+	.name     = PCI_DRIVER_NAME,
+	.id_table = dev_id_table,
+	.probe    = macb_probe,
+	.remove	  = macb_remove,
+};
+
+module_pci_driver(macb_pci_driver);
+MODULE_DEVICE_TABLE(pci, dev_id_table);
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Cadence NIC PCI wrapper");
diff --git a/include/linux/platform_data/macb.h b/include/linux/platform_data/macb.h
index 21b15f6..7815d50 100644
--- a/include/linux/platform_data/macb.h
+++ b/include/linux/platform_data/macb.h
@@ -8,6 +8,8 @@
 #ifndef __MACB_PDATA_H__
 #define __MACB_PDATA_H__
 
+#include <linux/clk.h>
+
 /**
  * struct macb_platform_data - platform data for MACB Ethernet
  * @phy_mask:		phy mask passed when register the MDIO bus
@@ -15,12 +17,16 @@
  * @phy_irq_pin:	PHY IRQ
  * @is_rmii:		using RMII interface?
  * @rev_eth_addr:	reverse Ethernet address byte order
+ * @pclk:		platform clock
+ * @hclk:		AHB clock
  */
 struct macb_platform_data {
 	u32		phy_mask;
 	int		phy_irq_pin;
 	u8		is_rmii;
 	u8		rev_eth_addr;
+	struct clk	*pclk;
+	struct clk	*hclk;
 };
 
 #endif /* __MACB_PDATA_H__ */
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH iproute2 -net-next] lwt: BPF support for LWT
From: Thomas Graf @ 2016-12-14  7:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20161212154134.51638dae@xeon-e3>

On 13 December 2016 at 00:41, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> I went ahead and fixed these.

Thanks for fixing it up, Stephen.

^ permalink raw reply

* [Query] Delayed vxlan socket creation?
From: Du, Fan @ 2016-12-14  7:49 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: mrjana@gmail.com, Du, Fan

Hi

I'm interested in a Docker issue [1] that looks related to kernel vxlan socket creation,
as described in the thread. From my limited knowledge, socket creation is synchronous:
after the *socket* syscall returns, the sock handle is valid and ready for link-up.

However, I'm not sure about the detailed scenario here, or which commit fixed it and how.
Thanks!

Quoted analysis:
--------------------------------------------------------------------------
(Found in kernel 3.13)
The issue happens because in older kernels when a vxlan interface is created, 
the socket creation is queued up in a worker thread which actually creates 
the socket. But this needs to happen before we bring up the link on the vxlan interface. 
If for some chance, the worker thread hasn't completed the creation of the socket 
before we did link up then when we do link up the kernel checks if the socket was 
created and if not it will return ENOTCONN. This was a bug in the kernel which got fixed
in later kernels. That is why retrying with a timer fixes the issue.

[1]: https://github.com/docker/libnetwork/issues/1247
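The workaround mentioned in the analysis, retrying link-up on a timer, could look roughly like the sketch below. This is a hypothetical C illustration, not libnetwork's actual code (which is written in Go). `link_up_fn` is an assumed callback that brings the vxlan link up and returns `-ENOTCONN` while the socket is still being created. A retry only hides the window that, per the analysis, later kernels closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical operation: returns 0 on success or a negative errno. */
typedef int (*link_up_fn)(void *ctx);

/*
 * Retry link_up() while it fails with -ENOTCONN, doubling the delay
 * between attempts. Any other result (success or a different error)
 * is returned immediately. No sleep happens after the final attempt.
 */
static int link_up_with_retry(link_up_fn link_up, void *ctx,
			      int max_attempts, long initial_delay_ms)
{
	long delay_ms = initial_delay_ms;
	int err = -ENOTCONN;

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		err = link_up(ctx);
		if (err != -ENOTCONN || attempt == max_attempts)
			return err;

		struct timespec ts = {
			.tv_sec = delay_ms / 1000,
			.tv_nsec = (delay_ms % 1000) * 1000000L,
		};
		nanosleep(&ts, NULL);
		delay_ms *= 2;
	}
	return err;
}
```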

^ permalink raw reply

* [PATCH] vhost: introduce O(1) vq metadata cache
From: Jason Wang @ 2016-12-14  7:56 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel
  Cc: vkaplans, maxime.coquelin, wexu, peterx

When device IOTLB is enabled, all address translations are stored in an
interval tree. O(lgN) search time can be slow for virtqueue metadata
(avail, used and descriptors), since they are accessed much more often
than other addresses. So this patch introduces an O(1) array which
points to the interval tree nodes that store the translations of vq
metadata. The array is updated during vq IOTLB prefetching and reset
on each invalidation and TLB update. Each time we want to access vq
metadata, this small array is queried before the interval tree. This
is sufficient for static mappings but not for dynamic mappings; we
could do further optimizations on top.
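A simplified userspace model of the cache described above. It assumes a linear scan as the stand-in for the interval tree and fills the slots lazily. The patch works differently: it fills `meta_iotlb[]` during prefetch only when one node covers the whole ring, and then skips per-access bounds checks. This sketch bounds-checks every access instead. The names (`meta_cache`, `translate`, `meta_reset`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Slot indices, mirroring the patch's enum vhost_uaddr_type. */
enum addr_type { ADDR_DESC, ADDR_AVAIL, ADDR_USED, NUM_ADDRS };

/* One translation: IOVA range [start, start + size) -> host uaddr. */
struct mapping {
	uint64_t start, size, uaddr;
};

struct meta_cache {
	const struct mapping *slot[NUM_ADDRS];	/* O(1) fast path */
	const struct mapping *maps;		/* stand-in for the tree */
	size_t nmaps;
	unsigned long slow_lookups;		/* counts slow-path walks */
};

/* Slow path: linear scan standing in for the interval tree walk. */
static const struct mapping *slow_lookup(struct meta_cache *c, uint64_t addr)
{
	c->slow_lookups++;
	for (size_t i = 0; i < c->nmaps; i++)
		if (addr >= c->maps[i].start &&
		    addr - c->maps[i].start < c->maps[i].size)
			return &c->maps[i];
	return NULL;
}

/* Translate addr for a metadata type; returns 0 if it is unmapped. */
static uint64_t translate(struct meta_cache *c, uint64_t addr,
			  enum addr_type t)
{
	const struct mapping *m = c->slot[t];

	if (!m || addr < m->start || addr - m->start >= m->size) {
		m = slow_lookup(c, addr);
		if (!m)
			return 0;
		c->slot[t] = m;		/* remember it for the next access */
	}
	return m->uaddr + (addr - m->start);
}

/* Any IOTLB update or invalidation must drop every cached slot. */
static void meta_reset(struct meta_cache *c)
{
	for (int i = 0; i < NUM_ADDRS; i++)
		c->slot[i] = NULL;
}
```

Resetting every slot on each update or invalidation, as the patch does in `vhost_process_iotlb_msg()`, keeps the cache from pointing at a node that has been removed from the tree.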

Tests were done with l2fwd in the guest (2M hugepages):

   noiommu  | before        | after
tx 1.32Mpps | 1.06Mpps(82%) | 1.30Mpps(98%)
rx 2.33Mpps | 1.46Mpps(63%) | 2.29Mpps(98%)

We can almost reach the same performance as noiommu mode.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 136 ++++++++++++++++++++++++++++++++++++++++----------
 drivers/vhost/vhost.h |   8 +++
 2 files changed, 118 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c6f2d89..89e40b6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -282,6 +282,22 @@ void vhost_poll_queue(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
+static void __vhost_vq_meta_reset(struct vhost_virtqueue *vq)
+{
+	int j;
+
+	for (j = 0; j < VHOST_NUM_ADDRS; j++)
+		vq->meta_iotlb[j] = NULL;
+}
+
+static void vhost_vq_meta_reset(struct vhost_dev *d)
+{
+	int i;
+
+	for (i = 0; i < d->nvqs; ++i)
+		__vhost_vq_meta_reset(d->vqs[i]);
+}
+
 static void vhost_vq_reset(struct vhost_dev *dev,
 			   struct vhost_virtqueue *vq)
 {
@@ -311,6 +327,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	__vhost_vq_meta_reset(vq);
 }
 
 static int vhost_worker(void *data)
@@ -690,6 +707,18 @@ static int vq_memory_access_ok(void __user *log_base, struct vhost_umem *umem,
 	return 1;
 }
 
+static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
+					       u64 addr, unsigned int size,
+					       int type)
+{
+	const struct vhost_umem_node *node = vq->meta_iotlb[type];
+
+	if (!node)
+		return NULL;
+
+	return (void *)(node->userspace_addr + (u64)addr - node->start);
+}
+
 /* Can we switch to this memory table? */
 /* Caller should have device mutex but not vq mutex */
 static int memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
@@ -732,8 +761,14 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void *to,
 		 * could be access through iotlb. So -EAGAIN should
 		 * not happen in this case.
 		 */
-		/* TODO: more fast path */
 		struct iov_iter t;
+		void __user *uaddr = vhost_vq_meta_fetch(vq,
+				     (u64)(uintptr_t)to, size,
+				     VHOST_ADDR_DESC);
+
+		if (uaddr)
+			return __copy_to_user(uaddr, from, size);
+
 		ret = translate_desc(vq, (u64)(uintptr_t)to, size, vq->iotlb_iov,
 				     ARRAY_SIZE(vq->iotlb_iov),
 				     VHOST_ACCESS_WO);
@@ -761,8 +796,14 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
 		 * could be access through iotlb. So -EAGAIN should
 		 * not happen in this case.
 		 */
-		/* TODO: more fast path */
+		void __user *uaddr = vhost_vq_meta_fetch(vq,
+				     (u64)(uintptr_t)from, size,
+				     VHOST_ADDR_DESC);
 		struct iov_iter f;
+
+		if (uaddr)
+			return __copy_from_user(to, uaddr, size);
+
 		ret = translate_desc(vq, (u64)(uintptr_t)from, size, vq->iotlb_iov,
 				     ARRAY_SIZE(vq->iotlb_iov),
 				     VHOST_ACCESS_RO);
@@ -782,17 +823,12 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
 	return ret;
 }
 
-static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
-				     void *addr, unsigned size)
+static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
+					  void *addr, unsigned int size,
+					  int type)
 {
 	int ret;
 
-	/* This function should be called after iotlb
-	 * prefetch, which means we're sure that vq
-	 * could be access through iotlb. So -EAGAIN should
-	 * not happen in this case.
-	 */
-	/* TODO: more fast path */
 	ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov,
 			     ARRAY_SIZE(vq->iotlb_iov),
 			     VHOST_ACCESS_RO);
@@ -813,14 +849,32 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	return vq->iotlb_iov[0].iov_base;
 }
 
-#define vhost_put_user(vq, x, ptr) \
+/* This function should be called after iotlb
+ * prefetch, which means we're sure that vq
+ * could be access through iotlb. So -EAGAIN should
+ * not happen in this case.
+ */
+static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
+					    void *addr, unsigned int size,
+					    int type)
+{
+	void __user *uaddr = vhost_vq_meta_fetch(vq,
+			     (u64)(uintptr_t)addr, size, type);
+	if (uaddr)
+		return uaddr;
+
+	return __vhost_get_user_slow(vq, addr, size, type);
+}
+
+#define vhost_put_user(vq, x, ptr)		\
 ({ \
 	int ret = -EFAULT; \
 	if (!vq->iotlb) { \
 		ret = __put_user(x, ptr); \
 	} else { \
 		__typeof__(ptr) to = \
-			(__typeof__(ptr)) __vhost_get_user(vq, ptr, sizeof(*ptr)); \
+			(__typeof__(ptr)) __vhost_get_user(vq, ptr,	\
+					  sizeof(*ptr), VHOST_ADDR_USED); \
 		if (to != NULL) \
 			ret = __put_user(x, to); \
 		else \
@@ -829,14 +883,16 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	ret; \
 })
 
-#define vhost_get_user(vq, x, ptr) \
+#define vhost_get_user(vq, x, ptr, type)		\
 ({ \
 	int ret; \
 	if (!vq->iotlb) { \
 		ret = __get_user(x, ptr); \
 	} else { \
 		__typeof__(ptr) from = \
-			(__typeof__(ptr)) __vhost_get_user(vq, ptr, sizeof(*ptr)); \
+			(__typeof__(ptr)) __vhost_get_user(vq, ptr, \
+							   sizeof(*ptr), \
+							   type); \
 		if (from != NULL) \
 			ret = __get_user(x, from); \
 		else \
@@ -845,6 +901,12 @@ static void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	ret; \
 })
 
+#define vhost_get_avail(vq, x, ptr) \
+	vhost_get_user(vq, x, ptr, VHOST_ADDR_AVAIL)
+
+#define vhost_get_used(vq, x, ptr) \
+	vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
+
 static void vhost_dev_lock_vqs(struct vhost_dev *d)
 {
 	int i = 0;
@@ -950,6 +1012,7 @@ int vhost_process_iotlb_msg(struct vhost_dev *dev,
 			ret = -EFAULT;
 			break;
 		}
+		vhost_vq_meta_reset(dev);
 		if (vhost_new_umem_range(dev->iotlb, msg->iova, msg->size,
 					 msg->iova + msg->size - 1,
 					 msg->uaddr, msg->perm)) {
@@ -959,6 +1022,7 @@ int vhost_process_iotlb_msg(struct vhost_dev *dev,
 		vhost_iotlb_notify_vq(dev, msg);
 		break;
 	case VHOST_IOTLB_INVALIDATE:
+		vhost_vq_meta_reset(dev);
 		vhost_del_umem_range(dev->iotlb, msg->iova,
 				     msg->iova + msg->size - 1);
 		break;
@@ -1102,12 +1166,26 @@ static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
 			sizeof *used + num * sizeof *used->ring + s);
 }
 
+static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
+				 const struct vhost_umem_node *node,
+				 int type)
+{
+	int access = (type == VHOST_ADDR_USED) ?
+		     VHOST_ACCESS_WO : VHOST_ACCESS_RO;
+
+	if (likely(node->perm & access))
+		vq->meta_iotlb[type] = node;
+}
+
 static int iotlb_access_ok(struct vhost_virtqueue *vq,
-			   int access, u64 addr, u64 len)
+			   int access, u64 addr, u64 len, int type)
 {
 	const struct vhost_umem_node *node;
 	struct vhost_umem *umem = vq->iotlb;
-	u64 s = 0, size;
+	u64 s = 0, size, orig_addr = addr;
+
+	if (vhost_vq_meta_fetch(vq, addr, len, type))
+		return true;
 
 	while (len > s) {
 		node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
@@ -1124,6 +1202,10 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
 		}
 
 		size = node->size - addr + node->start;
+
+		if (orig_addr == addr && size >= len)
+			vhost_vq_meta_update(vq, node, type);
+
 		s += size;
 		addr += size;
 	}
@@ -1140,13 +1222,15 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
 		return 1;
 
 	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
-			       num * sizeof *vq->desc) &&
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
 	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
 			       sizeof *vq->avail +
-			       num * sizeof *vq->avail->ring + s) &&
+			       num * sizeof(*vq->avail->ring) + s,
+			       VHOST_ADDR_AVAIL) &&
 	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->used,
 			       sizeof *vq->used +
-			       num * sizeof *vq->used->ring + s);
+			       num * sizeof(*vq->used->ring) + s,
+			       VHOST_ADDR_USED);
 }
 EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
 
@@ -1729,7 +1813,7 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 		r = -EFAULT;
 		goto err;
 	}
-	r = vhost_get_user(vq, last_used_idx, &vq->used->idx);
+	r = vhost_get_used(vq, last_used_idx, &vq->used->idx);
 	if (r) {
 		vq_err(vq, "Can't access used idx at %p\n",
 		       &vq->used->idx);
@@ -1932,7 +2016,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
 	/* Check it isn't doing very strange things with descriptor numbers. */
 	last_avail_idx = vq->last_avail_idx;
-	if (unlikely(vhost_get_user(vq, avail_idx, &vq->avail->idx))) {
+	if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) {
 		vq_err(vq, "Failed to access avail idx at %p\n",
 		       &vq->avail->idx);
 		return -EFAULT;
@@ -1954,7 +2038,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
 	/* Grab the next descriptor number they're advertising, and increment
 	 * the index we've seen. */
-	if (unlikely(vhost_get_user(vq, ring_head,
+	if (unlikely(vhost_get_avail(vq, ring_head,
 		     &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
 		vq_err(vq, "Failed to read head: idx %d address %p\n",
 		       last_avail_idx,
@@ -2170,7 +2254,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
 		__virtio16 flags;
-		if (vhost_get_user(vq, flags, &vq->avail->flags)) {
+		if (vhost_get_avail(vq, flags, &vq->avail->flags)) {
 			vq_err(vq, "Failed to get flags");
 			return true;
 		}
@@ -2184,7 +2268,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	if (unlikely(!v))
 		return true;
 
-	if (vhost_get_user(vq, event, vhost_used_event(vq))) {
+	if (vhost_get_avail(vq, event, vhost_used_event(vq))) {
 		vq_err(vq, "Failed to get used event idx");
 		return true;
 	}
@@ -2226,7 +2310,7 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__virtio16 avail_idx;
 	int r;
 
-	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
+	r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
 	if (r)
 		return false;
 
@@ -2261,7 +2345,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	/* They could have slipped one in as we were doing that: make
 	 * sure it's written, then check again. */
 	smp_mb();
-	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
+	r = vhost_get_avail(vq, avail_idx, &vq->avail->idx);
 	if (r) {
 		vq_err(vq, "Failed to check avail idx at %p: %d\n",
 		       &vq->avail->idx, r);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 78f3c5f..034ea18 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -76,6 +76,13 @@ struct vhost_umem {
 	int numem;
 };
 
+enum vhost_uaddr_type {
+	VHOST_ADDR_DESC = 0,
+	VHOST_ADDR_AVAIL = 1,
+	VHOST_ADDR_USED = 2,
+	VHOST_NUM_ADDRS = 3,
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -86,6 +93,7 @@ struct vhost_virtqueue {
 	struct vring_desc __user *desc;
 	struct vring_avail __user *avail;
 	struct vring_used __user *used;
+	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
 	struct file *kick;
 	struct file *call;
 	struct file *error;
-- 
2.7.4
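The fast path added to iotlb_access_ok() can be modelled in a few lines of plain C. This is a minimal userspace sketch, not the patch itself. struct vq, struct umem_node, find_node() and iotlb_ok() are hypothetical stand-ins, and a linear scan replaces the IOTLB interval tree. As in the patch, a node is cached per ring part (descriptor, avail, used) only when one node covers the whole [addr, addr + len) range starting at the original address. Later calls for that part then succeed from the cache without searching or re-checking addr. The sketch assumes each ring's address is fixed and that the cache is cleared whenever the IOTLB mapping changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ring parts, mirroring enum vhost_uaddr_type. */
enum uaddr_type { ADDR_DESC, ADDR_AVAIL, ADDR_USED, NUM_ADDRS };

/* Hypothetical stand-in for struct vhost_umem_node. */
struct umem_node {
	uint64_t start;			/* first IOVA covered */
	uint64_t size;			/* bytes covered */
	uint64_t userspace_addr;	/* backing host address */
};

struct vq {
	const struct umem_node *nodes;	/* replaces the interval tree */
	size_t nnodes;
	const struct umem_node *meta[NUM_ADDRS];	/* one cached node per part */
	unsigned int slow_lookups;	/* counts searches, for illustration */
};

/* Slow path: find the node containing addr (linear scan in this sketch). */
static const struct umem_node *find_node(struct vq *vq, uint64_t addr)
{
	size_t i;

	vq->slow_lookups++;
	for (i = 0; i < vq->nnodes; i++) {
		const struct umem_node *n = &vq->nodes[i];

		if (addr >= n->start && addr - n->start < n->size)
			return n;
	}
	return NULL;
}

/* Return 1 if [addr, addr + len) is mapped. Cache the node when a single
 * node covers the whole range starting at the original address. */
static int iotlb_ok(struct vq *vq, uint64_t addr, uint64_t len,
		    enum uaddr_type type)
{
	uint64_t orig_addr = addr, s = 0;

	if (vq->meta[type])
		return 1;	/* cache hit: no search at all */

	while (len > s) {
		const struct umem_node *n = find_node(vq, addr);
		uint64_t size;

		if (!n)
			return 0;
		size = n->size - addr + n->start;	/* bytes left in node */
		if (orig_addr == addr && size >= len)
			vq->meta[type] = n;
		s += size;
		addr += size;
	}
	return 1;
}
```

A range that straddles two nodes is still validated by the loop but is never cached, so only the common single-node case gets the constant-time path.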

^ permalink raw reply related

* Re: [PATCH] vhost: introduce O(1) vq metadata cache
From: kbuild test robot @ 2016-12-14  8:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, mst, netdev, linux-kernel, peterx, virtualization,
	maxime.coquelin, kbuild-all, vkaplans, wexu
In-Reply-To: <1481702183-16088-1-git-send-email-jasowang@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

Hi Jason,

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on v4.9 next-20161214]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vhost-introduce-O-1-vq-metadata-cache/20161214-160153
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-randconfig-x005-201650 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/vhost/vhost.c: In function 'vhost_vq_meta_fetch':
>> drivers/vhost/vhost.c:719:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     return (void *)(node->userspace_addr + (u64)addr - node->start);
            ^

vim +719 drivers/vhost/vhost.c

   703							   node->start,
   704							   node->size))
   705				return 0;
   706		}
   707		return 1;
   708	}
   709	
   710	static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
   711						       u64 addr, unsigned int size,
   712						       int type)
   713	{
   714		const struct vhost_umem_node *node = vq->meta_iotlb[type];
   715	
   716		if (!node)
   717			return NULL;
   718	
 > 719		return (void *)(node->userspace_addr + (u64)addr - node->start);
   720	}
   721	
   722	/* Can we switch to this memory table? */
   723	/* Caller should have device mutex but not vq mutex */
   724	static int memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
   725				    int log_all)
   726	{
   727		int i;
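The warning comes from the i386 build, where a pointer is 32 bits wide but node->userspace_addr is a u64. A common way to silence it is to convert the value to uintptr_t before casting it to a pointer, which makes the narrowing explicit. Below is a minimal userspace sketch of that pattern. struct umem_node and meta_fetch() are hypothetical stand-ins for vhost_umem_node and vhost_vq_meta_fetch(), not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vhost_umem_node. */
struct umem_node {
	uint64_t start;
	uint64_t size;
	uint64_t userspace_addr;
};

/* Translate addr through a cached node, or return NULL if none is cached. */
static void *meta_fetch(const struct umem_node *node, uint64_t addr)
{
	if (!node)
		return NULL;
	/* (void *)u64 warns on 32-bit targets; going through uintptr_t does not. */
	return (void *)(uintptr_t)(node->userspace_addr + addr - node->start);
}
```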

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27829 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: stmmac driver...
From: Jie Deng @ 2016-12-14  8:26 UTC (permalink / raw)
  To: Jie Deng, David Miller, alexandre.torgue
  Cc: CARLOS.PALMINHA, netdev, Giuseppe CAVALLARO
In-Reply-To: <d8d00654-a07e-2992-a911-3b88f1d3d3ac@st.com>

Hi David,

>>>> Giuseppe and Alexandre,
>>>>
>>>> There are a lot of patches and discussions happening around the stmmac
>>>> driver lately and both of you are listed as the maintainers.
>>>>
>>>> I really need prompt and conclusive reviews of these patch submissions
>>>> from you, and participation in all discussions about the driver.
>>>
>>> yes we are trying to do the best.
>>>
>>>> Otherwise I have only three things I can do: 1) let the patches rot in
>>>> patchwork for days 2) trust that the patches are sane and fit your
>>>> desires and goals and just apply them or 3) reject them since they
>>>> aren't being reviewed properly.
>>>
>>> at this stage, I think the best is: (3).
>> I think the patches David mentioned also included XLGMAC. He sent this email
>> before I explained that QoS and XLGMAC are different IPs. Do you mind if we do XLGMAC
>> development under drivers/net/ethernet/synopsys/ ? I think we don't have
>> conflict since we will keep QoS development in stmmac.
>
> Great. Many thanks for the clarification :-)
>
> Regards
> Peppe
>
Do you agree that we should do future XLGMAC development under
drivers/net/ethernet/synopsys/?
There is no conflict of interest, since this is a new IP without an existing
driver. As you can see, there are already several drivers for QoS (GMAC) and
several for XGMAC. We want to avoid that situation for the new XLGMAC IP.

Regards,
Jie

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] phy: add phy fixup unregister functions
From: Dongpo Li @ 2016-12-14  8:39 UTC (permalink / raw)
  To: Woojung.Huh, davem, f.fainelli; +Cc: andrew, netdev, UNGLinuxDriver
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D4097999C@CHN-SV-EXMX02.mchp-main.com>

Hi all,

On 2016/12/8 4:26, Woojung.Huh@microchip.com wrote:
> From: Woojung Huh <woojung.huh@microchip.com>
> 
> Add functions to unregister phy fixup for modules.
> 
> int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
> 	Unregister phy fixup from phy_fixup_list per bus_id, phy_uid &
> 	phy_uid_mask
> 
> int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
> 	Unregister phy fixup from phy_fixup_list.
> 	Use it for fixup registered by phy_register_fixup_for_uid()
> 
> int phy_unregister_fixup_for_id(const char *bus_id)
> 	Unregister phy fixup from phy_fixup_list.
> 	Use it for fixup registered by phy_register_fixup_for_id()
> 
> Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
> ---
>  Documentation/networking/phy.txt |  9 ++++++++
>  drivers/net/phy/phy_device.c     | 47 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/phy.h              |  4 ++++
>  3 files changed, 60 insertions(+)
> 
> diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
> index e017d93..16f90d8 100644
> --- a/Documentation/networking/phy.txt
> +++ b/Documentation/networking/phy.txt
> @@ -407,6 +407,15 @@ Board Fixups
>   The stubs set one of the two matching criteria, and set the other one to
>   match anything.
>  
> + When phy_register_fixup() or *_for_uid()/*_for_id() is called from a module,
> + the fixup must be unregistered and its allocated memory freed.
> +
> + Call one of the following functions before unloading the module.
> +
> + int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
> + int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
> + int phy_unregister_fixup_for_id(const char *phy_id);
> +
>  Standards
>  
>   IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index aeaf1bc..32fa7c7 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -235,6 +235,53 @@ int phy_register_fixup_for_id(const char *bus_id,
>  }
>  EXPORT_SYMBOL(phy_register_fixup_for_id);
>  
> +/**
> + * phy_unregister_fixup - remove a phy_fixup from the list
> + * @bus_id: A string matches fixup->bus_id (or PHY_ANY_ID) in phy_fixup_list
> + * @phy_uid: A phy id matches fixup->phy_id (or PHY_ANY_UID) in phy_fixup_list
> + * @phy_uid_mask: Applied to phy_uid and fixup->phy_uid before comparison
> + */
> +int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
> +{
> +	struct list_head *pos, *n;
> +	struct phy_fixup *fixup;
> +	int ret;
> +
> +	ret = -ENODEV;
> +
> +	mutex_lock(&phy_fixup_lock);
> +	list_for_each_safe(pos, n, &phy_fixup_list) {
> +		fixup = list_entry(pos, struct phy_fixup, list);
> +
> +		if ((!strcmp(fixup->bus_id, bus_id)) &&
> +		    ((fixup->phy_uid & phy_uid_mask) ==
> +		     (phy_uid & phy_uid_mask))) {
> +			list_del(&fixup->list);
> +			kfree(fixup);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&phy_fixup_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(phy_unregister_fixup);
> +
I was about to submit an unregister patch myself when I found this one. Good job!
But I think this patch may miss something.
If one SoC has two MAC ports and each port uses a different network driver,
the two drivers may register fixups for the same PHY chip with different
"run" functions, because the PHY chip works in a different mode for each port.
In that case this patch does not take the "run" function into account, which
may cause problems: phy_unregister_fixup() deletes the first matching entry,
so removing the driver that registered its fixup last will remove the other
driver's fixup instead.
Should this case be considered and fixed?
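One way to address this is to match on the callback as well. Below is a minimal userspace sketch that assumes a hypothetical unregister variant taking the run pointer as a fourth argument. The singly linked list, register_fixup(), unregister_fixup(), fixup_a() and fixup_b() are illustrative stand-ins, not the phylib API. Fixups are appended in registration order, and removal deletes only the entry whose bus_id, masked UID and run callback all match.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct phy_device;	/* opaque in this sketch */

/* Simplified fixup record: a singly linked list stands in for list_head. */
struct fixup {
	char bus_id[32];
	uint32_t phy_uid;
	uint32_t phy_uid_mask;
	int (*run)(struct phy_device *);
	struct fixup *next;
};

static struct fixup *fixup_list;

/* Append a fixup so the list keeps registration order. */
static int register_fixup(const char *bus_id, uint32_t phy_uid,
			  uint32_t phy_uid_mask,
			  int (*run)(struct phy_device *))
{
	struct fixup **pp = &fixup_list;
	struct fixup *f = calloc(1, sizeof(*f));

	if (!f)
		return -ENOMEM;
	strncpy(f->bus_id, bus_id, sizeof(f->bus_id) - 1);
	f->phy_uid = phy_uid;
	f->phy_uid_mask = phy_uid_mask;
	f->run = run;
	while (*pp)
		pp = &(*pp)->next;
	*pp = f;
	return 0;
}

/* Remove the first fixup whose bus_id, masked UID and run callback all
 * match, so one driver cannot remove another driver's fixup for the
 * same PHY. */
static int unregister_fixup(const char *bus_id, uint32_t phy_uid,
			    uint32_t phy_uid_mask,
			    int (*run)(struct phy_device *))
{
	struct fixup **pp;

	for (pp = &fixup_list; *pp; pp = &(*pp)->next) {
		struct fixup *f = *pp;

		if (!strcmp(f->bus_id, bus_id) &&
		    (f->phy_uid & phy_uid_mask) == (phy_uid & phy_uid_mask) &&
		    f->run == run) {
			*pp = f->next;
			free(f);
			return 0;
		}
	}
	return -ENODEV;
}

/* Two hypothetical per-driver callbacks for the same PHY chip. */
static int fixup_a(struct phy_device *phydev) { (void)phydev; return 0; }
static int fixup_b(struct phy_device *phydev) { (void)phydev; return 1; }
```

With matching on bus_id and UID alone, unregistering driver B's fixup would delete driver A's entry, because A's entry comes first in the list. Comparing run as well removes the intended entry.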

> +/* Unregisters a fixup of any PHY with the UID in phy_uid */
> +int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
> +{
> +	return phy_unregister_fixup(PHY_ANY_ID, phy_uid, phy_uid_mask);
> +}
> +EXPORT_SYMBOL(phy_unregister_fixup_for_uid);
> +
> +/* Unregisters a fixup of the PHY with id string bus_id */
> +int phy_unregister_fixup_for_id(const char *bus_id)
> +{
> +	return phy_unregister_fixup(bus_id, PHY_ANY_UID, 0xffffffff);
> +}
> +EXPORT_SYMBOL(phy_unregister_fixup_for_id);
> +
>  /* Returns 1 if fixup matches phydev in bus_id and phy_uid.
>   * Fixups can be set to match any in one or more fields.
>   */
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index feb8a98..f7d95f6 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -860,6 +860,10 @@ int phy_register_fixup_for_id(const char *bus_id,
>  int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
>  			       int (*run)(struct phy_device *));
>  
> +int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask);
> +int phy_unregister_fixup_for_id(const char *bus_id);
> +int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
> +
>  int phy_init_eee(struct phy_device *phydev, bool clk_stop_enable);
>  int phy_get_eee_err(struct phy_device *phydev);
>  int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data);
> 


    Regards,
    Dongpo

.

^ permalink raw reply

* Re: stmmac driver...
From: Giuseppe CAVALLARO @ 2016-12-14  7:33 UTC (permalink / raw)
  To: Jie Deng, David Miller, alexandre.torgue; +Cc: netdev
In-Reply-To: <8d624fd3-8440-5b8a-ee8d-558a671eec60@synopsys.com>

Hello Jie

On 12/14/2016 5:05 AM, Jie Deng wrote:
> Hi Peppe,
>
> On 2016/12/12 22:17, Giuseppe CAVALLARO wrote:
>> Hi David
>>
>> On 12/7/2016 7:06 PM, David Miller wrote:
>>>
>>> Giuseppe and Alexandre,
>>>
>>> There are a lot of patches and discussions happening around the stmmac
>>> driver lately and both of you are listed as the maintainers.
>>>
>>> I really need prompt and conclusive reviews of these patch submissions
>>> from you, and participation in all discussions about the driver.
>>
>> yes we are trying to do the best.
>>
>>> Otherwise I have only three things I can do: 1) let the patches rot in
>>> patchwork for days 2) trust that the patches are sane and fit your
>>> desires and goals and just apply them or 3) reject them since they
>>> aren't being reviewed properly.
>>
>> at this stage, I think the best is: (3).
> I think the patches David mentioned also included XLGMAC. He sent this email
> before I explained that QoS and XLGMAC are different IPs. Do you mind if we do XLGMAC
> development under drivers/net/ethernet/synopsys/ ? I think we don't have
> conflict since we will keep QoS development in stmmac.

Great. Many thanks for the clarification :-)

Regards
Peppe

>>
>>>
>>> Thanks in advance.
>>>
>> you are welcome
>>
>>
>> Peppe
>
>

^ permalink raw reply

* [PATCH] net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes @ 2016-12-14  9:01 UTC (permalink / raw)
  To: davem, robert.jarzmik, mugunthanvnm, marcel, jarod, s.nawrocki,
	fw, harvey.hunt
  Cc: netdev, Philippe Reynes

The ethtool API {get|set}_settings is deprecated.
Move this driver to the new API {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
---
 drivers/net/ethernet/davicom/dm9000.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
index f1a81c5..008dc81 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -570,19 +570,21 @@ static void dm9000_set_msglevel(struct net_device *dev, u32 value)
 	dm->msg_enable = value;
 }
 
-static int dm9000_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int dm9000_get_link_ksettings(struct net_device *dev,
+				     struct ethtool_link_ksettings *cmd)
 {
 	struct board_info *dm = to_dm9000_board(dev);
 
-	mii_ethtool_gset(&dm->mii, cmd);
+	mii_ethtool_get_link_ksettings(&dm->mii, cmd);
 	return 0;
 }
 
-static int dm9000_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int dm9000_set_link_ksettings(struct net_device *dev,
+				     const struct ethtool_link_ksettings *cmd)
 {
 	struct board_info *dm = to_dm9000_board(dev);
 
-	return mii_ethtool_sset(&dm->mii, cmd);
+	return mii_ethtool_set_link_ksettings(&dm->mii, cmd);
 }
 
 static int dm9000_nway_reset(struct net_device *dev)
@@ -741,8 +743,6 @@ static int dm9000_set_wol(struct net_device *dev, struct ethtool_wolinfo *w)
 
 static const struct ethtool_ops dm9000_ethtool_ops = {
 	.get_drvinfo		= dm9000_get_drvinfo,
-	.get_settings		= dm9000_get_settings,
-	.set_settings		= dm9000_set_settings,
 	.get_msglevel		= dm9000_get_msglevel,
 	.set_msglevel		= dm9000_set_msglevel,
 	.nway_reset		= dm9000_nway_reset,
@@ -752,6 +752,8 @@ static int dm9000_set_wol(struct net_device *dev, struct ethtool_wolinfo *w)
 	.get_eeprom_len		= dm9000_get_eeprom_len,
 	.get_eeprom		= dm9000_get_eeprom,
 	.set_eeprom		= dm9000_set_eeprom,
+	.get_link_ksettings	= dm9000_get_link_ksettings,
+	.set_link_ksettings	= dm9000_set_link_ksettings,
 };
 
 static void dm9000_show_carrier(struct board_info *db,
-- 
1.7.4.4

^ permalink raw reply related

* Re: [Query] Delayed vxlan socket creation?
From: Jiri Benc @ 2016-12-14  9:29 UTC (permalink / raw)
  To: Du, Fan; +Cc: netdev@vger.kernel.org, mrjana@gmail.com
In-Reply-To: <5A90DA2E42F8AE43BC4A093BF06788481A9457F1@SHSMSX103.ccr.corp.intel.com>

On Wed, 14 Dec 2016 07:49:24 +0000, Du, Fan wrote:
> I'm interested in a Docker issue[1] that looks related to kernel vxlan socket creation,
> as described in the thread. From my limited knowledge, socket creation is synchronous,
> and after the *socket* syscall the sock handle is valid and ready to link up.
>
> I'm not sure about the detailed scenario here. Which commit could fix it, and how?

baf606d9c9b1^..56ef9c909b40

 Jiri

^ permalink raw reply

