Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] net: korina: Fix NAPI versus resources freeing
From: David Miller @ 2016-12-26 16:26 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, alex, phil
In-Reply-To: <20161224035656.11289-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Fri, 23 Dec 2016 19:56:56 -0800

> Commit beb0babfb77e ("korina: disable napi on close and restart")
> introduced calls to napi_disable() that were missing before,
> unfortunately this leaves a small window during which NAPI has a chance
> to run, yet we just freed resources since korina_free_ring() has been
> called:
> 
> Fix this by disabling NAPI first then freeing resource, and make sure
> that we also cancel the restart taks before doing the resource freeing.
> 
> Fixes: beb0babfb77e ("korina: disable napi on close and restart")
> Reported-by: Alexandros C. Couloumbis <alex@ozo.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 2/6] wl1251: Use request_firmware_prefer_user() for loading NVS calibration data
From: Pavel Machek @ 2016-12-26 16:35 UTC (permalink / raw)
  To: Arend Van Spriel
  Cc: Pali Rohár, Ming Lei, Luis R. Rodriguez, Greg Kroah-Hartman,
	Kalle Valo, David Gnedt, Michal Kazior, Daniel Wagner,
	Tony Lindgren, Sebastian Reichel, Ivaylo Dimitrov, Aaro Koskinen,
	Grazvydas Ignotas, linux-kernel, linux-wireless, netdev
In-Reply-To: <e728586e-80bf-c9e0-0063-71da4c4aba85@broadcom.com>

[-- Attachment #1: Type: text/plain, Size: 2144 bytes --]

On Sun 2016-12-25 21:15:40, Arend Van Spriel wrote:
> On 24-12-2016 17:52, Pali Rohár wrote:
> > NVS calibration data for wl1251 are model specific. Every one device with
> > wl1251 chip has different and calibrated in factory.
> > 
> > Not all wl1251 chips have own EEPROM where are calibration data stored. And
> > in that case there is no "standard" place. Every device has stored them on
> > different place (some in rootfs file, some in dedicated nand partition,
> > some in another proprietary structure).
> > 
> > Kernel wl1251 driver cannot support every one different storage decided by
> > device manufacture so it will use request_firmware_prefer_user() call for
> > loading NVS calibration data and userspace helper will be responsible to
> > prepare correct data.
> 
> Responding to this patch as it provides a lot of context to discuss. As
> you might have gathered from earlier discussions I am not a fan of using
> user-space helper. I can agree that the kernel driver, wl1251 in this
> case, should be agnostic to platform specific details regarding storage
> solutions and the firmware api should hide that. However, it seems your
> only solution is adding user-space to the mix and changing the api
> towards that. Can we solve it without user-space help?

Answer is no, due to licensing. But that's wrong question to ask.

Right question is "should we solve it without user-space help"?

Answer is no, too. Way too complex. Yes, it would be nice if hardware
was designed in such a way that getting calibration data from kernel
is easy, and if you design hardware, please design it like that. But
N900 is not designed like that and getting the calibration through
userspace looks like only reasonable solution.

Now... how exactly to do that is other question. (But this is looks
very reasonable. Maybe I'd add request_firmware_with_flags(, ...int
flags), but.. that's a tiny detail.). But userspace needs to be
involved.

Thanks,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* [PATCH net] openvswitch: upcall: Fix vlan handling.
From: Pravin B Shelar @ 2016-12-26 16:31 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar, Jarno Rajahalme, Jiri Benc

Networking stack accelerate vlan tag handling by
keeping topmost vlan header in skb. This works as
long as packet remains in OVS datapath. But during
OVS upcall vlan header is pushed on to the packet.
When such packet is sent back to OVS datapath, core
networking stack might not handle it correctly. Following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies datapath by
bringing uniform packet processing for packets from
all code paths.

Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
CC: Jarno Rajahalme <jarno@ovn.org>
CC: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
---
 net/openvswitch/datapath.c |  1 -
 net/openvswitch/flow.c     | 54 +++++++++++++++++++++++-----------------------
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 2d4c4d3..9c62b63 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -606,7 +606,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	rcu_assign_pointer(flow->sf_acts, acts);
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
-	packet->protocol = flow->key.eth.type;
 
 	rcu_read_lock();
 	dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 08aa926..2c0a00f 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -312,7 +312,8 @@ static bool icmp6hdr_ok(struct sk_buff *skb)
  * Returns 0 if it encounters a non-vlan or incomplete packet.
  * Returns 1 after successfully parsing vlan tag.
  */
-static int parse_vlan_tag(struct sk_buff *skb, struct vlan_head *key_vh)
+static int parse_vlan_tag(struct sk_buff *skb, struct vlan_head *key_vh,
+			  bool untag_vlan)
 {
 	struct vlan_head *vh = (struct vlan_head *)skb->data;
 
@@ -330,7 +331,20 @@ static int parse_vlan_tag(struct sk_buff *skb, struct vlan_head *key_vh)
 	key_vh->tci = vh->tci | htons(VLAN_TAG_PRESENT);
 	key_vh->tpid = vh->tpid;
 
-	__skb_pull(skb, sizeof(struct vlan_head));
+	if (unlikely(untag_vlan)) {
+		int offset = skb->data - skb_mac_header(skb);
+		u16 tci;
+		int err;
+
+		__skb_push(skb, offset);
+		err = __skb_vlan_pop(skb, &tci);
+		__skb_pull(skb, offset);
+		if (err)
+			return err;
+		__vlan_hwaccel_put_tag(skb, key_vh->tpid, tci);
+	} else {
+		__skb_pull(skb, sizeof(struct vlan_head));
+	}
 	return 1;
 }
 
@@ -351,13 +365,13 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 		key->eth.vlan.tpid = skb->vlan_proto;
 	} else {
 		/* Parse outer vlan tag in the non-accelerated case. */
-		res = parse_vlan_tag(skb, &key->eth.vlan);
+		res = parse_vlan_tag(skb, &key->eth.vlan, true);
 		if (res <= 0)
 			return res;
 	}
 
 	/* Parse inner vlan tag. */
-	res = parse_vlan_tag(skb, &key->eth.cvlan);
+	res = parse_vlan_tag(skb, &key->eth.cvlan, false);
 	if (res <= 0)
 		return res;
 
@@ -800,29 +814,15 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 	if (err)
 		return err;
 
-	if (ovs_key_mac_proto(key) == MAC_PROTO_NONE) {
-		/* key_extract assumes that skb->protocol is set-up for
-		 * layer 3 packets which is the case for other callers,
-		 * in particular packets recieved from the network stack.
-		 * Here the correct value can be set from the metadata
-		 * extracted above.
-		 */
-		skb->protocol = key->eth.type;
-	} else {
-		struct ethhdr *eth;
-
-		skb_reset_mac_header(skb);
-		eth = eth_hdr(skb);
-
-		/* Normally, setting the skb 'protocol' field would be
-		 * handled by a call to eth_type_trans(), but it assumes
-		 * there's a sending device, which we may not have.
-		 */
-		if (eth_proto_is_802_3(eth->h_proto))
-			skb->protocol = eth->h_proto;
-		else
-			skb->protocol = htons(ETH_P_802_2);
-	}
+	/* key_extract assumes that skb->protocol is set-up for
+	 * layer 3 packets which is the case for other callers,
+	 * in particular packets received from the network stack.
+	 * Here the correct value can be set from the metadata
+	 * extracted above.
+	 * For L2 packet key eth type would be zero. skb protocol
+	 * would be set to correct value later during key-extact.
+	 */
 
+	skb->protocol = key->eth.type;
 	return key_extract(skb, key);
 }
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH] vif queue counters from int to long,[PATCH] vif queue counters from int to long
From: David Miller @ 2016-12-26 16:35 UTC (permalink / raw)
  To: mart; +Cc: netdev, ian.campbell, wei.liu2
In-Reply-To: <585D3E23.10509@greenhost.nl>

From: Mart van Santen <mart@greenhost.nl>
Date: Fri, 23 Dec 2016 16:09:23 +0100

> This patch fixes an issue where counters in the queue have type int,
> while the counters of the vif itself are specified as long. This can
> cause incorrect reporting of tx/rx values of the vif interface.
> More extensively reported on xen-devel mailinglist.
> 
> Signed-off-by: Mart van Santen <mart@greenhost.nl>

Your subject line lacks a proper subsystem prefix, in this case
an appropriate one might be "xen-netback: "

Also, your patch was corrupted by your email client, transforming
TAB characters into spaces.  This makes the patch unusable.

Please read Documentation/email-clients.txt on how to fix this
and how to submit uncorrupted patches properly.

Email a test patch to yourself, and do not attempt to resend this
patch to the mailing list until you can successfully apply the test
patch you send to yourself.

Thanks.

^ permalink raw reply

* Re: [PATCH v1] net: dev_weight: TX/RX orthogonality,Re: [PATCH v1] net: dev_weight: TX/RX orthogonality
From: David Miller @ 2016-12-26 16:58 UTC (permalink / raw)
  To: matthias.tafelmeier; +Cc: netdev, hagen, fw, edumazet, daniel
In-Reply-To: <ae0712c3-61c6-432e-78d9-665d0c291c9f@gmx.net>

From: Matthias Tafelmeier <matthias.tafelmeier@gmx.net>
Date: Mon, 26 Dec 2016 17:43:08 +0100

> 
>> From: Matthias Tafelmeier <matthias.tafelmeier@gmx.net>
>> Date: Mon, 26 Dec 2016 10:49:23 +0100
>>
>>> @@ -269,13 +269,21 @@ static struct ctl_table net_core_table[] = {
>>>  		.extra1		= &min_rcvbuf,
>>>  	},
>>>  	{
>>> -		.procname	= "dev_weight",
>>> -		.data		= &weight_p,
>>> +		.procname	= "dev_weight_rx",
>>> +		.data		= &weight_p_rx,
>>  ...
>>>  	{
>>> +		.procname	= "dev_weight_tx",
>> Sysctls are user visible APIs.  You cannot change them without
>> breaking userspace.  You particularly cannot change the name of
>> the sysctl.
> 
> What about leaving *dev_weight* in place for TX side as is and newly
> introducing a sysctl param
> *dev_weight_rx*. Though, am open to a better naming for the latter.

This changes behavior for existing users, you cannot do this.

^ permalink raw reply

* Re: [PATCH] net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
From: David Miller @ 2016-12-26 17:38 UTC (permalink / raw)
  To: bernat; +Cc: mlichvar, netdev, decot
In-Reply-To: <87tw9s311z.fsf@luffy.cx>

From: Vincent Bernat <bernat@luffy.cx>
Date: Sun, 25 Dec 2016 08:44:40 +0100

>  ❦ 24 novembre 2016 10:55 +0100, Miroslav Lichvar <mlichvar@redhat.com> :
> 
>> The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
>> command and likewise it shouldn't require the CAP_NET_ADMIN
>> capability.
> 
> Could this patch be pushed to stable branches too?

Sure, queued up.

^ permalink raw reply

* Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
From: Ard Biesheuvel @ 2016-12-26 17:51 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Andy Lutomirski, Andy Lutomirski, Daniel Borkmann, Netdev, LKML,
	Linux Crypto Mailing List, Jason A. Donenfeld,
	Hannes Frederic Sowa, Alexei Starovoitov, Eric Dumazet,
	Eric Biggers, Tom Herbert, David S. Miller
In-Reply-To: <20161226075757.GA8916@gondor.apana.org.au>

On 26 December 2016 at 07:57, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
>>
>> I actually do use incremental hashing later on.   BPF currently
>> vmallocs() a big temporary buffer just so it can fill it and hash it.
>> I change it to hash as it goes.
>
> How much data is this supposed to hash on average? If it's a large
> amount then perhaps using the existing crypto API would be a better
> option than adding this.
>

This is a good point actually: you didn't explain *why* BPF shouldn't
depend on the crypto API.

^ permalink raw reply

* Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
From: Andy Lutomirski @ 2016-12-26 18:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Herbert Xu, Andy Lutomirski, Daniel Borkmann, Netdev, LKML,
	Linux Crypto Mailing List, Jason A. Donenfeld,
	Hannes Frederic Sowa, Alexei Starovoitov, Eric Dumazet,
	Eric Biggers, Tom Herbert, David S. Miller
In-Reply-To: <CAKv+Gu-NMNCEs61f4k7JSf9iSJ1A_Gy1r=kZRGqtbDsEDz7--Q@mail.gmail.com>

On Mon, Dec 26, 2016 at 9:51 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 26 December 2016 at 07:57, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
>>>
>>> I actually do use incremental hashing later on.   BPF currently
>>> vmallocs() a big temporary buffer just so it can fill it and hash it.
>>> I change it to hash as it goes.
>>
>> How much data is this supposed to hash on average? If it's a large
>> amount then perhaps using the existing crypto API would be a better
>> option than adding this.
>>
>
> This is a good point actually: you didn't explain *why* BPF shouldn't
> depend on the crypto API.

According to Daniel, the networking folks want to let embedded systems
include BPF without requiring the crypto core.

At some point, I'd also like to use modern hash functions for module
verification.  If doing so would require the crypto core to be
available when modules are loaded, then the crypto core couldn't be
modular.  (Although it occurs to me that my patches get that wrong --
if this change happens, I need to split the code so that the library
functions can be built in even if CRYPTO=m.)

Daniel, would you be okay with BPF selecting CRYPTO and CRYPTO_HASH?

Also, as a bikeshed thought: I could call the functions
sha256_init_direct(), etc.  Then there wouldn't be namespace
collisions and the fact that they bypass accelerated drivers would be
more obvious.

--Andy

^ permalink raw reply

* RE: Microsoft 36772 netdev
From: helga.brickl @ 2016-12-26 23:38 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 181628705052.zip --]
[-- Type: application/zip, Size: 41056 bytes --]

^ permalink raw reply

* Re: driver r8169 suddenly failed
From: Francois Romieu @ 2016-12-27  0:12 UTC (permalink / raw)
  To: Robert Grasso; +Cc: Realtek linux nic maintainers, netdev
In-Reply-To: <428aa40c-e313-99ae-d3ae-97b6a5ef40e4@modulonet.fr>

Robert Grasso <robert.grasso@modulonet.fr> :
[dhcp snafu]
> First of all, can you confirm that I am doing right in posting to you
> (addresses found in README.Debian) ?

It isn't completely wrong.

> If I do, can you help ? I am not very proficient with Ethernet, and I am not
> able to figure out what my provider changed : their hotline is
> underqualified, they are just able to tell that "the signal on the line is
> ok".

You're spoiled. It is more than decent for a company whose core business
used to sell cable TV.

> But if you want me to run various tests, try new versions, I would be
> glad to do so : I would appreciate if I could salvage this Shuttle.

Please try a recent stable vanilla kernel and send a complete dmesg
from boot. I need to identify the specific 816x chipset.

Then record some traffic:

# touch gonzo.pcap && tshark -w gonzo.pcap -i eth1

It should only exhibit small outgoing packets but, well, one never knows.

Check the leds activity to be sure that the network interfaces have not
been renumbered.

Use ethtool to check that tso is disabled. If it isn't, disable it.

-- 
Ueimor

^ permalink raw reply

* Re: [RFC PATCH 4.10 3/6] bpf: Use SHA256 instead of SHA1 for bpf digests
From: Alexei Starovoitov @ 2016-12-27  1:36 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andy Lutomirski, Netdev, LKML, Linux Crypto Mailing List,
	Jason A. Donenfeld, Hannes Frederic Sowa, Eric Dumazet,
	Eric Biggers, Tom Herbert, David S. Miller, Alexei Starovoitov
In-Reply-To: <585ED3B9.6020407@iogearbox.net>

On Sat, Dec 24, 2016 at 08:59:53PM +0100, Daniel Borkmann wrote:
> On 12/24/2016 03:22 AM, Andy Lutomirski wrote:
> >BPF digests are intended to be used to avoid reloading programs that
> >are already loaded.  For use cases (CRIU?) where untrusted programs
> >are involved, intentional hash collisions could cause the wrong BPF
> >program to execute.  Additionally, if BPF digests are ever used
> >in-kernel to skip verification, a hash collision could give privilege
> >escalation directly.
> 
> Just for the record, digests will never ever be used to skip the
> verification step, so I don't know why this idea even comes up
> here (?) or is part of the changelog? As this will never be done
> anyway, rather drop that part so we can avoid confusion on this?

+1 to what Daniel said above.

For the others let me explain what this patch set is actually
trying to accomplish.
Andy had an idea that sha256 of the program can somehow be used
to bypass kernel verifier during program loading. Furthemore
he thinks that such 'bypass' would be useful for criu of bpf programs,
hence see vigorously attacking existing prog_digest (sha1) because
it's not as secure as sha256 and hence cannot be used for such 'bypass'.

The problem with criu of bpf programs is same as criu of kernel modules.
For the main tracing and networking use cases, we cannot stop the kernel,
so criu is out of question already.
Even if we could stop all the events that trigger bpf program execution,
the sha256 or memcmp() of the full program is not enough to guarantee
that two programs are the same.
Ex. bpf_map_lookup() may be safe for one program and not for another
depending on how map was created. Two programs of different types
are not comparable either. Etc, etc.
Therefore the idea of using sha256 for such purpose is bogus,
and I have an obvious NACK for bpf related patches 3,4,5,6.

For the questions raised in other threads:
I'm not ok with making BPF depend on CRYPTO, since it's the same as
requiring CRYPTO to select BPF for no good reason.

And 0/6 commit log:
> Since there are plenty of uses for the new-in-4.10 BPF digest feature
> that would be problematic if malicious users could produce collisions,
> the BPF digest should be collision-resistant. 

This statement is also bogus. The only reason we added prog_digest is
to improve debuggability and introspection of bpf programs.
As I said in the previous thread "collisions are ok" and we could have
used jhash here to avoid patches like this ever appearing
and wasting everyones time.

sha1 is 20 bytes which is already a bit long to print and copy paste by humans.
whereas 4 byte jhash is a bit too short, since collisions are not that rare
and may lead to incorrect assumptions from the users that develop the programs.
I would prefer something in 6-10 byte range that prevents collisions most of
the time and short to print as hex, but I don't know of anything like this
in the existing kernel and inventing bpf specific hash is not great.
Another requirement for debugging (and prog_digest) that user space
should be able to produce the same hash without asking kernel, so
sha1 fits that as well, since it's well known and easy to put into library.

sha256 doesn't fit either of these requirements. 32-bytes are too long to print
and when we use it as a substitue for the prog name for jited ksym, looking
at long function names will screw up all tools like perf, which we don't
want. sha256 is equally not easy for user space app like iproute2,
so not an acceptable choice from that pov either.

^ permalink raw reply

* Re: [RFC PATCH 4.10 3/6] bpf: Use SHA256 instead of SHA1 for bpf digests
From: Andy Lutomirski @ 2016-12-27  2:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Andy Lutomirski, Netdev, LKML,
	Linux Crypto Mailing List, Jason A. Donenfeld,
	Hannes Frederic Sowa, Eric Dumazet, Eric Biggers, Tom Herbert,
	David S. Miller, Alexei Starovoitov
In-Reply-To: <20161227013644.GA96815@ast-mbp.thefacebook.com>

On Mon, Dec 26, 2016 at 5:36 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Sat, Dec 24, 2016 at 08:59:53PM +0100, Daniel Borkmann wrote:
>> On 12/24/2016 03:22 AM, Andy Lutomirski wrote:
>> >BPF digests are intended to be used to avoid reloading programs that
>> >are already loaded.  For use cases (CRIU?) where untrusted programs
>> >are involved, intentional hash collisions could cause the wrong BPF
>> >program to execute.  Additionally, if BPF digests are ever used
>> >in-kernel to skip verification, a hash collision could give privilege
>> >escalation directly.
>>
>> Just for the record, digests will never ever be used to skip the
>> verification step, so I don't know why this idea even comes up
>> here (?) or is part of the changelog? As this will never be done
>> anyway, rather drop that part so we can avoid confusion on this?
>
> +1 to what Daniel said above.
>
> For the others let me explain what this patch set is actually
> trying to accomplish.

The patch:

a) cleans up the code and

b) uses a cryptographic hash that is actually believed to satisfy the
definition of a cryptographic hash.

There's no excuse for not doing b.

> and I have an obvious NACK for bpf related patches 3,4,5,6.

Did you *read* the ones that were pure cleanups?

>
> sha1 is 20 bytes which is already a bit long to print and copy paste by humans.
> whereas 4 byte jhash is a bit too short, since collisions are not that rare
> and may lead to incorrect assumptions from the users that develop the programs.
> I would prefer something in 6-10 byte range that prevents collisions most of
> the time and short to print as hex, but I don't know of anything like this
> in the existing kernel and inventing bpf specific hash is not great.
> Another requirement for debugging (and prog_digest) that user space
> should be able to produce the same hash without asking kernel, so
> sha1 fits that as well, since it's well known and easy to put into library.

Then truncate them in user space.

^ permalink raw reply

* [PATCH net] net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
From: Jason Wang @ 2016-12-27  2:49 UTC (permalink / raw)
  To: davem, linux-kernel, netdev; +Cc: Jason Wang, Daniel Borkmann, John Fastabend

After commit 73b62bd085f4737679ea9afc7867fa5f99ba7d1b ("virtio-net:
remove the warning before XDP linearizing"), there's no users for
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for
commit f23bc46c30ca5ef58b8549434899fcbac41b2cfc.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/linux/filter.h | 1 -
 net/core/filter.c      | 6 ------
 2 files changed, 7 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7023142..a0934e6 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -610,7 +610,6 @@ bool bpf_helper_changes_pkt_data(void *func);
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 				       const struct bpf_insn *patch, u32 len);
 void bpf_warn_invalid_xdp_action(u32 act);
-void bpf_warn_invalid_xdp_buffer(void);
 
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
diff --git a/net/core/filter.c b/net/core/filter.c
index 7190bd6..b146170 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2972,12 +2972,6 @@ void bpf_warn_invalid_xdp_action(u32 act)
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
-void bpf_warn_invalid_xdp_buffer(void)
-{
-	WARN_ONCE(1, "Illegal XDP buffer encountered, expect throughput degradation\n");
-}
-EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_buffer);
-
 static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 					int src_reg, int ctx_off,
 					struct bpf_insn *insn_buf,
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net 1/9] virtio-net: remove the warning before XDP linearizing
From: Jason Wang @ 2016-12-27  3:08 UTC (permalink / raw)
  To: Daniel Borkmann, mst, virtualization, netdev, linux-kernel
  Cc: john.r.fastabend
In-Reply-To: <585D7BA9.50509@iogearbox.net>



On 2016年12月24日 03:31, Daniel Borkmann wrote:
> Hi Jason,
>
> On 12/23/2016 03:37 PM, Jason Wang wrote:
>> Since we use EWMA to estimate the size of rx buffer. When rx buffer
>> size is underestimated, it's usual to have a packet with more than one
>> buffers. Consider this is not a bug, remove the warning and correct
>> the comment before XDP linearizing.
>>
>> Cc: John Fastabend <john.r.fastabend@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/net/virtio_net.c | 8 +-------
>>   1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 08327e0..1067253 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -552,14 +552,8 @@ static struct sk_buff *receive_mergeable(struct 
>> net_device *dev,
>>           struct page *xdp_page;
>>           u32 act;
>>
>> -        /* No known backend devices should send packets with
>> -         * more than a single buffer when XDP conditions are
>> -         * met. However it is not strictly illegal so the case
>> -         * is handled as an exception and a warning is thrown.
>> -         */
>> +        /* This happens when rx buffer size is underestimated */
>>           if (unlikely(num_buf > 1)) {
>> -            bpf_warn_invalid_xdp_buffer();
>
> Could you also remove the bpf_warn_invalid_xdp_buffer(), which got added
> just for this?
>
> Thanks. 

Posted.

Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC PATCH 1/7] PF driver modified to enable HW filter support, changes works in backward compatibility mode Enable required things in Makefile Enable LZ4 dependecy inside config file
From: Sunil Kovvuri @ 2016-12-27  4:19 UTC (permalink / raw)
  To: Koteshwar Rao, Satha
  Cc: LKML, Goutham, Sunil, Robert Richter, David S. Miller,
	Daney, David, Vatsavayi, Raghu, Chickles, Derek, Romanov, Philip,
	Linux Netdev List, LAKML
In-Reply-To: <DM5PR07MB28428EE7B1679FAEC78103AD8D960@DM5PR07MB2842.namprd07.prod.outlook.com>

>>  #define NIC_MAX_RSS_HASH_BITS          8
>>  #define NIC_MAX_RSS_IDR_TBL_SIZE       (1 << NIC_MAX_RSS_HASH_BITS)
>> +#define NIC_TNS_RSS_IDR_TBL_SIZE       5
>
> So you want to use only 5 queues per VF when TNS is enabled, is it ??
> There are 4096 RSS indices in total, for each VF you can use max 32.
> I guess you wanted to set no of hash bits to 5 instead of table size.
>
> SATHA>>> We enabled 8 queues for VF. Yes Macro name misleads it has to be hash bits, will change this in next version

No, I am not referring to any discrepancy in naming the macro.
If you check your code

- hw->rss_ind_tbl_size = NIC_MAX_RSS_IDR_TBL_SIZE;
+ hw->rss_ind_tbl_size = veb_enabled ? NIC_TNS_RSS_IDR_TBL_SIZE :
+     NIC_MAX_RSS_IDR_TBL_SIZE;

You are setting RSS table size to 5, i.e RSS hash bits will be set to 2.
Hence only 4 queues (not even 5 as i mentioned earlier). Please check
'nicvf_rss_init' you will understand what i am saying.
Have you tested and observed pkts in all 8 queues ?

^ permalink raw reply

* Re: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe
From: David Miller @ 2016-12-27  4:54 UTC (permalink / raw)
  To: hock.leong.kweh
  Cc: Joao.Pinto, peppe.cavallaro, seraphin.bonnaffe, alexandre.torgue,
	manabian, niklas.cassel, johan, pavel, boon.leong.ong, netdev,
	linux-kernel, weifeng.voon, lars.persson
In-Reply-To: <1482839100-20612-1-git-send-email-hock.leong.kweh@intel.com>

From: "Kweh, Hock Leong"	<hock.leong.kweh@intel.com>
Date: Tue, 27 Dec 2016 19:44:59 +0800

> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> 
> If kernel module stmmac driver being loaded after OS booted, there is a
> race condition between stmmac_open() and stmmac_mdio_register(), which is
> invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
> PHY not found and stmmac_open() failed:
 ...
> The resolution used wait_for_completion_interruptible() to synchronize
> stmmac_open() and stmmac_dvr_probe() to prevent the race condition
> happening.
> 
> Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>

The proper thing to do is to make sure register_netdevice() is not
invoked until it is %100 safe to call stmmac_open().

^ permalink raw reply

* Re: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe
From: Florian Fainelli @ 2016-12-27  5:10 UTC (permalink / raw)
  To: Kweh, Hock Leong, David S. Miller, Joao Pinto, Giuseppe CAVALLARO,
	seraphin.bonnaffe
  Cc: Alexandre TORGUE, Joachim Eastwood, Niklas Cassel, Johan Hovold,
	pavel, Ong Boon Leong, netdev, LKML, weifeng.voon, Lars Persson
In-Reply-To: <1482839100-20612-1-git-send-email-hock.leong.kweh@intel.com>



On 12/27/2016 03:44 AM, Kweh, Hock Leong wrote:
> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> 
> If kernel module stmmac driver being loaded after OS booted, there is a
> race condition between stmmac_open() and stmmac_mdio_register(), which is
> invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
> PHY not found and stmmac_open() failed:
> [  473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
> 		stmmac_dvr_probe: warning: cannot get CSR clock
> [  473.919382] stmmaceth 0000:01:00.0: no reset control found
> [  473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
> [  473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
> [  473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
> [  473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
> [  473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
> 		Enable RX Mitigation via HW Watchdog Timer
> [  473.921395] libphy: PHY stmmac-1:00 not found
> [  473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
> [  473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
> 		PHY (error: -19)
> [  473.959710] libphy: stmmac: probed
> [  473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
> 		(stmmac-1:00) active
> [  473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
> 		(stmmac-1:01)
> [  473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
> 		(stmmac-1:02)
> [  473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
> 		(stmmac-1:03)
> 
> The resolution used wait_for_completion_interruptible() to synchronize
> stmmac_open() and stmmac_dvr_probe() to prevent the race condition
> happening.

The proper fix for this would be to have register_netdev() be the last
thing done in stmmac_drv_probe(), whereas right now, the last thing done
is stmmac_mdio_register(), leading the window you are seeing here, where
the network interface can be open prior to all resources being set up,
including, but not limited to MDIO devices.
-- 
Florian

^ permalink raw reply

* Re: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe
From: Florian Fainelli @ 2016-12-27  5:13 UTC (permalink / raw)
  To: Kweh, Hock Leong, David S. Miller, Joao Pinto, Giuseppe CAVALLARO,
	seraphin.bonnaffe
  Cc: Alexandre TORGUE, Joachim Eastwood, Niklas Cassel, Johan Hovold,
	pavel, Ong Boon Leong, netdev, LKML, weifeng.voon, Lars Persson
In-Reply-To: <461cd45a-70a5-1f08-816d-3c210d694083@gmail.com>



On 12/26/2016 09:10 PM, Florian Fainelli wrote:
> 
> 
> On 12/27/2016 03:44 AM, Kweh, Hock Leong wrote:
>> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
>>
>> If kernel module stmmac driver being loaded after OS booted, there is a
>> race condition between stmmac_open() and stmmac_mdio_register(), which is
>> invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
>> PHY not found and stmmac_open() failed:
>> [  473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
>> 		stmmac_dvr_probe: warning: cannot get CSR clock
>> [  473.919382] stmmaceth 0000:01:00.0: no reset control found
>> [  473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
>> [  473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported
>> [  473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
>> [  473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
>> [  473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
>> 		Enable RX Mitigation via HW Watchdog Timer
>> [  473.921395] libphy: PHY stmmac-1:00 not found
>> [  473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
>> [  473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to
>> 		PHY (error: -19)
>> [  473.959710] libphy: stmmac: probed
>> [  473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
>> 		(stmmac-1:00) active
>> [  473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
>> 		(stmmac-1:01)
>> [  473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
>> 		(stmmac-1:02)
>> [  473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
>> 		(stmmac-1:03)
>>
>> The resolution used wait_for_completion_interruptible() to synchronize
>> stmmac_open() and stmmac_dvr_probe() to prevent the race condition
>> happening.
> 
> The proper fix for this would be to have register_netdev() be the last
> thing done in stmmac_drv_probe(), whereas right now, the last thing done
> is stmmac_mdio_register(), leading the window you are seeing here, where
> the network interface can be open prior to all resources being set up,
> including, but not limited to MDIO devices.

Something like the following untested patch should plug this race:

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bb40382e205d..5910ea51f8f6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3339,13 +3339,6 @@ int stmmac_dvr_probe(struct device *device,

        spin_lock_init(&priv->lock);

-       ret = register_netdev(ndev);
-       if (ret) {
-               netdev_err(priv->dev, "%s: ERROR %i registering the
device\n",
-                          __func__, ret);
-               goto error_netdev_register;
-       }
-
        /* If a specific clk_csr value is passed from the platform
         * this means that the CSR Clock Range selection cannot be
         * changed at run-time and it is fixed. Viceversa the driver'll
try to
@@ -3372,11 +3365,14 @@ int stmmac_dvr_probe(struct device *device,
                }
        }

-       return 0;
+       ret = register_netdev(ndev);
+       if (ret)
+               netdev_err(priv->dev, "%s: ERROR %i registering the
device\n",
+                          __func__, ret);
+
+       return ret;

 error_mdio_register:
-       unregister_netdev(ndev);
-error_netdev_register:
        netif_napi_del(&priv->napi);
 error_hw_init:
        clk_disable_unprepare(priv->pclk);

-- 
Florian

^ permalink raw reply related

* RE: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong @ 2016-12-27  5:25 UTC (permalink / raw)
  To: Florian Fainelli, David S. Miller, Joao Pinto, Giuseppe CAVALLARO,
	seraphin.bonnaffe@st.com
  Cc: Alexandre TORGUE, Joachim Eastwood, Niklas Cassel, Johan Hovold,
	pavel@ucw.cz, Ong, Boon Leong, netdev, LKML, Voon, Weifeng,
	Lars Persson
In-Reply-To: <6df6425b-bb0c-1e74-b5e1-a221447b761f@gmail.com>

> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Tuesday, December 27, 2016 1:14 PM
> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>; David S. Miller
> <davem@davemloft.net>; Joao Pinto <Joao.Pinto@synopsys.com>; Giuseppe
> CAVALLARO <peppe.cavallaro@st.com>; seraphin.bonnaffe@st.com
> Cc: Alexandre TORGUE <alexandre.torgue@gmail.com>; Joachim Eastwood
> <manabian@gmail.com>; Niklas Cassel <niklas.cassel@axis.com>; Johan Hovold
> <johan@kernel.org>; pavel@ucw.cz; Ong, Boon Leong
> <boon.leong.ong@intel.com>; netdev <netdev@vger.kernel.org>; LKML <linux-
> kernel@vger.kernel.org>; Voon, Weifeng <weifeng.voon@intel.com>; Lars
> Persson <lars.persson@axis.com>
> Subject: Re: [PATCH] net: stmmac: synchronize stmmac_open and
> stmmac_dvr_probe
> 
> 
> 
> On 12/26/2016 09:10 PM, Florian Fainelli wrote:
> >
> >
> > On 12/27/2016 03:44 AM, Kweh, Hock Leong wrote:
> >> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >>
> >> If kernel module stmmac driver being loaded after OS booted, there is a
> >> race condition between stmmac_open() and stmmac_mdio_register(), which
> is
> >> invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
> >> PHY not found and stmmac_open() failed:
> >> [  473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
> >> 		stmmac_dvr_probe: warning: cannot get CSR clock
> >> [  473.919382] stmmaceth 0000:01:00.0: no reset control found
> >> [  473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42
> >> [  473.919429] stmmaceth 0000:01:00.0: DMA HW capability register
> supported
> >> [  473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine
> supported
> >> [  473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported
> >> [  473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized):
> >> 		Enable RX Mitigation via HW Watchdog Timer
> >> [  473.921395] libphy: PHY stmmac-1:00 not found
> >> [  473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY
> >> [  473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach
> to
> >> 		PHY (error: -19)
> >> [  473.959710] libphy: stmmac: probed
> >> [  473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL
> >> 		(stmmac-1:00) active
> >> [  473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL
> >> 		(stmmac-1:01)
> >> [  473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL
> >> 		(stmmac-1:02)
> >> [  473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL
> >> 		(stmmac-1:03)
> >>
> >> The resolution used wait_for_completion_interruptible() to synchronize
> >> stmmac_open() and stmmac_dvr_probe() to prevent the race condition
> >> happening.
> >
> > The proper fix for this would be to have register_netdev() be the last
> > thing done in stmmac_drv_probe(), whereas right now, the last thing done
> > is stmmac_mdio_register(), leading the window you are seeing here, where
> > the network interface can be open prior to all resources being set up,
> > including, but not limited to MDIO devices.
> 
> Something like the following untested patch should plug this race:
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index bb40382e205d..5910ea51f8f6 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3339,13 +3339,6 @@ int stmmac_dvr_probe(struct device *device,
> 
>         spin_lock_init(&priv->lock);
> 
> -       ret = register_netdev(ndev);
> -       if (ret) {
> -               netdev_err(priv->dev, "%s: ERROR %i registering the
> device\n",
> -                          __func__, ret);
> -               goto error_netdev_register;
> -       }
> -
>         /* If a specific clk_csr value is passed from the platform
>          * this means that the CSR Clock Range selection cannot be
>          * changed at run-time and it is fixed. Viceversa the driver'll
> try to
> @@ -3372,11 +3365,14 @@ int stmmac_dvr_probe(struct device *device,
>                 }
>         }
> 
> -       return 0;
> +       ret = register_netdev(ndev);
> +       if (ret)
> +               netdev_err(priv->dev, "%s: ERROR %i registering the
> device\n",
> +                          __func__, ret);
> +
> +       return ret;
> 
>  error_mdio_register:
> -       unregister_netdev(ndev);
> -error_netdev_register:
>         netif_napi_del(&priv->napi);
>  error_hw_init:
>         clk_disable_unprepare(priv->pclk);
> 
> --
> Florian

Thanks. Will try out to confirm.

Regards,
Wilson

^ permalink raw reply

* RE: [PATCH] net: stmmac: synchronize stmmac_open and stmmac_dvr_probe
From: Kweh, Hock Leong @ 2016-12-27  5:26 UTC (permalink / raw)
  To: David Miller
  Cc: Joao.Pinto@synopsys.com, peppe.cavallaro@st.com,
	seraphin.bonnaffe@st.com, alexandre.torgue@gmail.com,
	manabian@gmail.com, niklas.cassel@axis.com, johan@kernel.org,
	pavel@ucw.cz, Ong, Boon Leong, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Voon, Weifeng,
	lars.persson@axis.com
In-Reply-To: <20161226.235436.131087168899382517.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, December 27, 2016 12:55 PM
> To: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> Cc: Joao.Pinto@synopsys.com; peppe.cavallaro@st.com;
> seraphin.bonnaffe@st.com; alexandre.torgue@gmail.com;
> manabian@gmail.com; niklas.cassel@axis.com; johan@kernel.org;
> pavel@ucw.cz; Ong, Boon Leong <boon.leong.ong@intel.com>;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Voon, Weifeng
> <weifeng.voon@intel.com>; lars.persson@axis.com
> Subject: Re: [PATCH] net: stmmac: synchronize stmmac_open and
> stmmac_dvr_probe
> 
> From: "Kweh, Hock Leong"	<hock.leong.kweh@intel.com>
> Date: Tue, 27 Dec 2016 19:44:59 +0800
> 
> > From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
> >
> > If kernel module stmmac driver being loaded after OS booted, there is a
> > race condition between stmmac_open() and stmmac_mdio_register(), which is
> > invoked inside stmmac_dvr_probe(), and the error is showed in dmesg log as
> > PHY not found and stmmac_open() failed:
>  ...
> > The resolution used wait_for_completion_interruptible() to synchronize
> > stmmac_open() and stmmac_dvr_probe() to prevent the race condition
> > happening.
> >
> > Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
> 
> The proper thing to do is to make sure register_netdevice() is not
> invoked until it is %100 safe to call stmmac_open().

Noted & thanks. Will look into it.

Regards,
Wilson

^ permalink raw reply

* Re: driver r8169 suddenly failed
From: Robert Grasso @ 2016-12-27  7:33 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Realtek linux nic maintainers, netdev
In-Reply-To: <20161227001248.GA20172@electric-eye.fr.zoreil.com>

Thank you for your quick answer ! Allow me some days for this test :

1 - I am at work these days
2 - it has been a while since I have dealt with custom kernels, I am a 
bit rusty, it will be good to review this topic

-- 
Robert Grasso
@home
---
UNIX was not designed to stop you from doing stupid things, because
   that would also stop you from doing clever things. -- Doug Gwyn

On 27/12/2016 01:12, Francois Romieu wrote:
> Robert Grasso <robert.grasso@modulonet.fr> :
> [dhcp snafu]
>> First of all, can you confirm that I am doing right in posting to you
>> (addresses found in README.Debian) ?
> It isn't completely wrong.
>
>> If I do, can you help ? I am not very proficient with Ethernet, and I am not
>> able to figure out what my provider changed : their hotline is
>> underqualified, they are just able to tell that "the signal on the line is
>> ok".
> You're spoiled. It is more than decent for a company whose core business
> used to sell cable TV.
>
>> But if you want me to run various tests, try new versions, I would be
>> glad to do so : I would appreciate if I could salvage this Shuttle.
> Please try a recent stable vanilla kernel and send a complete dmesg
> from boot. I need to identify the specific 816x chipset.
>
> Then record some traffic:
>
> # touch gonzo.pcap && tshark -w gonzo.pcap -i eth1
>
> It should only exhibit small outgoing packets but, well, one never knows.
>
> Check the leds activity to be sure that the network interfaces have not
> been renumbered.
>
> Use ethtool to check that tso is disabled. If it isn't, disable it.
>

^ permalink raw reply

* [PATCH] net: fix incorrect original ingress device index in PKTINFO
From: Wei Zhang @ 2016-12-27  7:52 UTC (permalink / raw)
  To: davem, kuznet, jmorris, yoshfuji, kaber; +Cc: netdev, linux-kernel

When we send a packet for our own local address on a non-loopback interface
(e.g. eth0), due to the change had been introduced from commit 0b922b7a829c
("net: original ingress device index in PKTINFO"), the original ingress
device index would be set as the loopback interface. However, the packet
should be considered as if it is being arrived via the sending interface
(eth0), otherwise it would break the expectation of the userspace
application (e.g. the DHCPRELEASE message from dhcp_release binary would
be ignored by the dnsmasq daemon)

Signed-off-by: Wei Zhang <asuka.com@163.com>
---
 net/ipv4/ip_sockglue.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b8a2d63..b6a6d35 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
 		 * which has interface index (iif) as the first member of the
 		 * underlying inet{6}_skb_parm struct. This code then overlays
 		 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
-		 * element so the iif is picked up from the prior IPCB
+		 * element so the iif is picked up from the prior IPCB except
+		 * iif is loopback interface which the packet should be 
+		 * considered as if it is being arrived via the sending interface
 		 */
+		if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) {
+			pktinfo->ipi_ifindex = inet_iif(skb);
+		}
 		pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
 	} else {
 		pktinfo->ipi_ifindex = 0;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH] net: dev_weight: TX/RX orthogonality
From: Matthias Tafelmeier @ 2016-12-27  8:25 UTC (permalink / raw)
  To: netdev; +Cc: hagen, fw, edumazet, daniel
In-Reply-To: <20161226.115818.1691666598003246082.davem@davemloft.net>

Oftenly, introducing side effects on packet processing on the other half
of the stack by adjusting one of TX/RX via sysctl is not desirable.
There are cases of demand for asymmetric, orthogonal configurability.

This holds true especially for nodes where RPS for RFS usage on top is
configured and therefore use the 'old dev_weight'. This is quite a
common base configuration setup nowadays, even with NICs of superior processing
support (e.g. aRFS).

A good example use case are nodes acting as noSQL data bases with a
large number of tiny requests and rather fewer but large packets as responses.
It's affordable to have large budget and rx dev_weights for the
requests. But as a side effect having this large a number on TX
processed in one run can overwhelm drivers.

This patch therefore introduces an independent configurability via sysctl to
userland.
---
 include/linux/netdevice.h  |  2 ++
 net/core/dev.c             |  4 +++-
 net/core/sysctl_net_core.c | 14 ++++++++++++++
 net/sched/sch_generic.c    |  2 +-
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 994f742..bb331e0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3795,6 +3795,8 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 extern int		netdev_max_backlog;
 extern int		netdev_tstamp_prequeue;
 extern int		weight_p;
+extern int		dev_w_rx_bias;
+extern int		dev_w_tx_bias;
 
 bool netdev_has_upper_dev(struct net_device *dev, struct net_device *upper_dev);
 struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index 8db5a0b..0dcbd28 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3428,6 +3428,8 @@ EXPORT_SYMBOL(netdev_max_backlog);
 int netdev_tstamp_prequeue __read_mostly = 1;
 int netdev_budget __read_mostly = 300;
 int weight_p __read_mostly = 64;            /* old backlog weight */
+int dev_w_rx_bias __read_mostly = 1;            /* bias for backlog weight */
+int dev_w_tx_bias __read_mostly = 1;            /* bias for output_queue quota */
 
 /* Called with irq disabled */
 static inline void ____napi_schedule(struct softnet_data *sd,
@@ -4833,7 +4835,7 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		net_rps_action_and_irq_enable(sd);
 	}
 
-	napi->weight = weight_p;
+	napi->weight = weight_p * dev_w_rx_bias;
 	while (again) {
 		struct sk_buff *skb;
 
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 2a46e40..a2ab149 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -276,6 +276,20 @@ static struct ctl_table net_core_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "dev_w_rx_bias",
+		.data		= &dev_w_rx_bias,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "dev_w_tx_bias",
+		.data		= &dev_w_tx_bias,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
 		.procname	= "netdev_max_backlog",
 		.data		= &netdev_max_backlog,
 		.maxlen		= sizeof(int),
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6eb9c8e..4c07780 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -247,7 +247,7 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
 
 void __qdisc_run(struct Qdisc *q)
 {
-	int quota = weight_p;
+	int quota = weight_p * dev_w_tx_bias;
 	int packets;
 
 	while (qdisc_restart(q, &packets)) {
-- 
2.7.4

^ permalink raw reply related

* Re:[PATCH] net: fix incorrect original ingress device index in PKTINFO
From: wei zhang @ 2016-12-27  8:29 UTC (permalink / raw)
  To: kuznet, davem, jmorris, yoshfuji, kaber; +Cc: netdev, linux-kernel
In-Reply-To: <1482825138-2125-1-git-send-email-asuka.com@163.com>

At 2016-12-27 15:52:18, "Wei Zhang" <asuka.com@163.com> wrote:
>When we send a packet for our own local address on a non-loopback interface
>(e.g. eth0), due to the change had been introduced from commit 0b922b7a829c
>("net: original ingress device index in PKTINFO"), the original ingress
>device index would be set as the loopback interface. However, the packet
>should be considered as if it is being arrived via the sending interface
>(eth0), otherwise it would break the expectation of the userspace
>application (e.g. the DHCPRELEASE message from dhcp_release binary would
>be ignored by the dnsmasq daemon)
>
>Signed-off-by: Wei Zhang <asuka.com@163.com>
>---
> net/ipv4/ip_sockglue.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
>diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>index b8a2d63..b6a6d35 100644
>--- a/net/ipv4/ip_sockglue.c
>+++ b/net/ipv4/ip_sockglue.c
>@@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
> 		 * which has interface index (iif) as the first member of the
> 		 * underlying inet{6}_skb_parm struct. This code then overlays
> 		 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
>-		 * element so the iif is picked up from the prior IPCB
>+		 * element so the iif is picked up from the prior IPCB except
>+		 * iif is loopback interface which the packet should be 
>+		 * considered as if it is being arrived via the sending interface
> 		 */
>+		if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) {
>+			pktinfo->ipi_ifindex = inet_iif(skb);
>+		}
> 		pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
> 	} else {
> 		pktinfo->ipi_ifindex = 0;
>-- 
>1.8.3.1
>

  When I upgrade to the 4.9, the dhcp_release could not release the dhcp
lease, the dnsmasq ignored all the DHCPRELEASE message which it think come
from lo. I think this is due to the commit 0b922b7a829c, which set the
IPCB(skb)->iif = skb->skb_iif in the ip_rcv()!

  And I'm very sorry about forgetting checkpatch, I will resend the patch,
hope I'm not bothering you!

Thanks,
Wei Zhang

^ permalink raw reply

* [PATCH net-next] r8169: add support for RTL8168 series add-on card.
From: Chun-Hao Lin @ 2016-12-27  8:29 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, Chun-Hao Lin

This chip is the same as RTL8168, but its device id is 0x8161.

Signed-off-by: Chun-Hao Lin <hau@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index f9b97f5..44389c9 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -326,6 +326,7 @@ enum cfg_version {
 static const struct pci_device_id rtl8169_pci_tbl[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8129), 0, 0, RTL_CFG_0 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8136), 0, 0, RTL_CFG_2 },
+	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8161), 0, 0, RTL_CFG_1 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8167), 0, 0, RTL_CFG_0 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8168), 0, 0, RTL_CFG_1 },
 	{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK,	0x8169), 0, 0, RTL_CFG_0 },
-- 
2.7.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox