Netdev List
* RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: maowenan @ 2016-12-26  8:33 UTC (permalink / raw)
  To: maowenan, Alexander Duyck
  Cc: Jeff Kirsher, Stephen Hemminger, netdev@vger.kernel.org,
	weiyongjun (A), Dingtianhong, Wangzhou (B)
In-Reply-To: <CAKgT0Ud0=fQpADLHPnZBLrG8xLGHUB20TZt1mi+GBNYJQngDCA@mail.gmail.com>



> -----Original Message-----
> From: maowenan
> Sent: Saturday, December 24, 2016 4:30 PM
> To: 'Alexander Duyck'
> Cc: Jeff Kirsher; Stephen Hemminger; netdev@vger.kernel.org; weiyongjun (A);
> Dingtianhong; Wangzhou (B)
> Subject: RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
> 
> 
> 
> > -----Original Message-----
> > From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> > Sent: Friday, December 23, 2016 11:43 PM
> > To: maowenan
> > Cc: Jeff Kirsher; Stephen Hemminger; netdev@vger.kernel.org;
> > weiyongjun (A); Dingtianhong
> > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > ordering mode
> >
> > On Thu, Dec 22, 2016 at 10:14 PM, maowenan <maowenan@huawei.com>
> > wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Jeff Kirsher [mailto:jeffrey.t.kirsher@intel.com]
> > >> Sent: Friday, December 23, 2016 9:07 AM
> > >> To: maowenan; Alexander Duyck
> > >> Cc: Stephen Hemminger; netdev@vger.kernel.org; weiyongjun (A);
> > >> Dingtianhong
> > >> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > >> ordering mode
> > >>
> > >> On Fri, 2016-12-23 at 00:40 +0000, maowenan wrote:
> > >> > > -----Original Message-----
> > >> > > From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> > >> > > Sent: Thursday, December 22, 2016 11:54 PM
> > >> > > To: maowenan
> > >> > > Cc: Stephen Hemminger; netdev@vger.kernel.org;
> > >> > > jeffrey.t.kirsher@intel.com; weiyongjun (A); Dingtianhong
> > >> > > Subject: Re: [PATCH] ethtool: add one ethtool option to set
> > >> > > relax ordering mode
> > >> > >
> > >> > > On Wed, Dec 21, 2016 at 5:39 PM, maowenan <maowenan@huawei.com>
> > >> > > wrote:
> > >> > > >
> > >> > > >
> > >> > > > > -----Original Message-----
> > >> > > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > >> > > > > Sent: Thursday, December 22, 2016 9:28 AM
> > >> > > > > To: maowenan
> > >> > > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> > >> > > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set
> > >> > > > > relax ordering mode
> > >> > > > >
> > >> > > > > On Thu, 8 Dec 2016 14:51:38 +0800 Mao Wenan
> > >> > > > > <maowenan@huawei.com> wrote:
> > >> > > > >
> > >> > > > > > This patch provides a way to set/unset the IXGBE NIC TX and
> > >> > > > > > RX relaxed ordering mode via ethtool.
> > >> > > > > > Relaxed ordering is a mode of the 82599 NIC; enabling it
> > >> > > > > > can improve performance on some CPU architectures.
> > >> > > > >
> > >> > > > > Then it should be done by CPU architecture specific quirks
> > >> > > > > (preferably in PCI layer) so that all users get the option
> > >> > > > > without having to do manual intervention.
> > >> > > > >
> > >> > > > > > example:
> > >> > > > > > ethtool -s enp1s0f0 relaxorder off
> > >> > > > > > ethtool -s enp1s0f0 relaxorder on
> > >> > > > >
> > >> > > > > Doing it via ethtool is a developer API (for testing) not
> > >> > > > > something that makes sense in production.
> > >> > > >
> > >> > > >
> > >> > > > This feature is not mandatory for all users; actually, the
> > >> > > > default relaxed ordering configuration of the 82599 is
> > >> > > > 'disable'. So this patch gives one way to enable relaxed
> > >> > > > ordering where performance calls for it.
> > >> > >
> > >> > > That isn't quite true.  The default for Sparc systems is to
> > >> > > have it enabled.
> > >> > >
> > >> > > Really this is something that is platform specific.  I agree
> > >> > > with Stephen that it would work better if this was handled as a
> > >> > > series of platform specific quirks handled at something like
> > >> > > the PCI layer rather than be a switch the user can toggle on and off.
> > >> > >
> > >> > > With that being said there are changes being made that should
> > >> > > help to improve the situation.  Specifically I am looking at
> > >> > > adding support for the DMA_ATTR_WEAK_ORDERING which may also
> > >> > > allow us to identify cases where you might be able to specify
> > >> > > the DMA behavior via the DMA mapping instead of having to make
> > >> > > the final decision in the device itself.
> > >> > >
> > >> > > - Alex
> > >> >
> > >> > Yes, Sparc is a special case. From the NIC driver's point of view,
> > >> > there is no need for per-arch special handling and compile-time
> > >> > branches; ethtool is a flexible way for the user to decide whether
> > >> > to turn this feature on or off.
> > >> > I think Jeff, as maintainer of 82599, has some comments about this.
> > >>
> > >> My original comment/objection was that you attempted to do this
> > >> change as a module parameter to the ixgbe driver, where I directed
> > >> you to use ethtool so that other drivers could benefit from the
> > >> ability to enable/disable relaxed ordering.  As far as how it gets
> > >> implemented in ethtool or PCI layer, makes little difference to me,
> > >> I only had issues with the driver specific module parameter
> > >> implementation, which is not acceptable.
> > >
> > >
> > > Thank you Jeff and Alex.
> > > I have since gone through the mail thread "i40e: enable PCIe
> > > relax ordering for SPARC". It only works for SPARC; any other ARCH
> > > that wants to enable the DMA_ATTR_WEAK_ORDERING feature has to
> > > define a new macro and recompile the driver module.
> > >
> > > For the above reasons, we implemented this in ethtool to give the
> > > end user a convenient way to turn the feature on or off, with no
> > > need to define a new macro; it is easy to extend to new features
> > > and also benefits other drivers, as Jeff mentioned.
> > >
> >
> > I think the point is we shouldn't base the decision on user input.
> > The fact is the PCIe device control register should have a bit that
> > indicates if the device is allowed to enable relaxed ordering or not.
> > If we can guarantee that the bit is set in all the cases where it
> > should be set, and cleared in all the cases where it should not then
> > we could use something like that to determine if the device is
> > supposed to enable relaxed ordering instead of trying to make the
> > decision ourselves.
> >
> > - Alex
> 
> OK. We are focusing on the register.
> And yes, relaxed ordering for the 82599 is enabled by setting one or more
> bits of the Rx/Tx DCA Control Registers; these bits should be set on many
> CPU architectures, such as arm64, sparc, and so on, and cleared on others.
> By the way, how is the SPARC macro enabled? How and where is this
> compile-time macro defined when a user wants to enable relaxed ordering
> on a SPARC system?
> #ifndef CONFIG_SPARC
> 
> 


Hi, Alex,
Have you already sent out the patches for DMA_ATTR_WEAK_ORDERING?
We would like to see how you enable DMA_ATTR_WEAK_ORDERING through the
PCIe layer, so that we can refer to that.
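
For reference, the check Alex describes can be sketched in a few lines of C.
This is only an illustration under stated assumptions: it tests the Enable
Relaxed Ordering bit (bit 4, value 0x0010, which the Linux headers call
PCI_EXP_DEVCTL_RELAX_EN) in a raw PCIe Device Control register value. The
helper name relaxed_ordering_allowed() is hypothetical and is not part of
ixgbe or the PCI core; in a driver the value would be read with something
like pcie_capability_read_word(), which is left out so the sketch stays
self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of the PCIe Device Control register: Enable Relaxed Ordering.
 * Same value as PCI_EXP_DEVCTL_RELAX_EN in the Linux pci_regs.h header. */
#define DEVCTL_RELAX_EN 0x0010u

/* Hypothetical helper (not a kernel API): report whether the platform
 * has left relaxed ordering enabled in the given Device Control value. */
static bool relaxed_ordering_allowed(uint16_t devctl)
{
	return (devctl & DEVCTL_RELAX_EN) != 0;
}
```

With a check of this kind, a driver would follow what the platform
configured instead of asking the user, which is the direction suggested
in the quoted reply.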



* Re: [RFC PATCH 4.10 0/6] Switch BPF's digest to SHA256
From: Herbert Xu @ 2016-12-26  8:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: daniel, netdev, linux-kernel, linux-crypto, Jason, hannes,
	alexei.starovoitov, edumazet, ebiggers3, tom, davem, luto
In-Reply-To: <cover.1482545792.git.luto@kernel.org>

Andy Lutomirski <luto@kernel.org> wrote:
> Since there are plenty of uses for the new-in-4.10 BPF digest feature
> that would be problematic if malicious users could produce collisions,
> the BPF digest should be collision-resistant.  SHA-1 is no longer
> considered collision-resistant, so switch it to SHA-256.
> 
> The actual switchover is trivial.  Most of this series consists of
> cleanups to the SHA256 code to make it usable as a standalone library
> (since BPF should not depend on crypto).
> 
> The cleaned up library is much more user-friendly than the SHA-1 code,
> so this also significantly tidies up the BPF digest code.
> 
> This is intended for 4.10.  If this series misses 4.10 and nothing
> takes its place, then we'll have an unpleasant ABI stability
> situation.

Can you please explain why BPF needs to be able to use SHA directly
rather than through the crypto API?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* driver r8169 suddenly failed
From: Robert Grasso @ 2016-12-26  8:01 UTC (permalink / raw)
  To: Realtek linux nic maintainers, Francois Romieu; +Cc: netdev

Hello,

I am a senior Linux sysadmin.
At home, I am using a Shuttle DS47 barebones computer as my firewall; it
contains the following network devices:
01:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8188CE 
802.11b/g/n WiFi Adapter (rev 01)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)

The OS is Ubuntu 14.04.5 LTS with kernel 3.13.0-101-generic. The kernel 
has been using the driver r8169 successfully from the very first 
installation (2 or 3 years ago).

The first NIC is connected to the Netgear CBVG834G cable modem of my French
provider (Numericable).
The second NIC goes to my LAN.

Last Friday, the connection to the cable modem suddenly failed. Since
then, acquiring a DHCP address always fails; the request loops
endlessly:

DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3 (xid=0x7da87c44)
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3 (xid=0x7da87c44)
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5 (xid=0x7da87c44)
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6 (xid=0x7da87c44)
(...)

I tried to boot the Shuttle from a USB dongle with Ubuntu 16.04: the
connection failed in the same way.

My Ubuntu 16 laptop (a brand new Dell XPS) connects successfully to the
cable modem (through Ethernet; by choice, there is no wireless at my home).
Also, I am using a cheap Ethernet-to-USB converter connected to the
Shuttle and to the cable modem in order to get temporary Internet
access; this allows me to send this email.

I am using dhclient.

I just installed r8168-dkms_8.043.02-1_all.deb, but this does not fix 
the issue.

I assume that my ISP upgraded something in the cable modem.

First of all, can you confirm that I am right to post to you
(addresses found in README.Debian)?

If so, can you help? I am not very proficient with Ethernet, and I am
not able to figure out what my provider changed: their hotline is
underqualified and can only tell me that "the signal on the line
is ok". If you want me to run various tests or try new versions, I
would be glad to do so; I would appreciate being able to salvage this Shuttle.

Best regards

-- 
Robert Grasso
@home
---
UNIX was not designed to stop you from doing stupid things, because
   that would also stop you from doing clever things. -- Doug Gwyn


* Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
From: Herbert Xu @ 2016-12-26  7:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ard Biesheuvel, Andy Lutomirski, Daniel Borkmann, Netdev, LKML,
	Linux Crypto Mailing List, Jason A. Donenfeld,
	Hannes Frederic Sowa, Alexei Starovoitov, Eric Dumazet,
	Eric Biggers, Tom Herbert, David S. Miller
In-Reply-To: <CALCETrVbT_1=cdU1+a-+KbhoFqeT3XvbHjY0s_U7C5JVgiPx_Q@mail.gmail.com>

On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
> 
> I actually do use incremental hashing later on.   BPF currently
> vmallocs() a big temporary buffer just so it can fill it and hash it.
> I change it to hash as it goes.

How much data is this supposed to hash on average? If it's a large
amount then perhaps using the existing crypto API would be a better
option than adding this.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
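
To make the "hash as it goes" point concrete, here is a minimal sketch of an
init/update/final style hash. It is an assumption-laden illustration, not the
SHA-256 library from the series: FNV-1a (which is not collision-resistant) is
swapped in only because it fits in a few lines, and the fnv_* names are
hypothetical. What it shows is that feeding the data in pieces gives the same
digest as hashing one large buffer, so no temporary buffer is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Running state of the toy hash; a real digest would keep more state. */
struct fnv_state {
	uint64_t h;
};

/* Start a new hash with the 64-bit FNV-1a offset basis. */
static void fnv_init(struct fnv_state *s)
{
	s->h = 0xcbf29ce484222325ULL;
}

/* Mix in another chunk; may be called any number of times. */
static void fnv_update(struct fnv_state *s, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		s->h ^= p[i];
		s->h *= 0x100000001b3ULL;	/* 64-bit FNV prime */
	}
}

/* Return the digest of everything fed in so far. */
static uint64_t fnv_final(const struct fnv_state *s)
{
	return s->h;
}
```

Calling fnv_update() once with "hello world" or twice with "hello " and
"world" yields the same value, which is the property the incremental
approach relies on.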


* Re: 4.9.0-rc8: tg3 dead after resume
From: Siva Reddy Kallam @ 2016-12-26  6:27 UTC (permalink / raw)
  To: Billy Shuman; +Cc: Michael Chan, Netdev

On Mon, Dec 12, 2016 at 3:53 PM, Siva Reddy Kallam
<siva.kallam@broadcom.com> wrote:
> On Fri, Dec 9, 2016 at 7:59 PM, Billy Shuman <wshuman3@gmail.com> wrote:
>> On Thu, Dec 8, 2016 at 4:03 AM, Siva Reddy Kallam
>> <siva.kallam@broadcom.com> wrote:
>>> On Thu, Dec 8, 2016 at 12:14 AM, Billy Shuman <wshuman3@gmail.com> wrote:
>>>> On Wed, Dec 7, 2016 at 12:37 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>>>> On Wed, Dec 7, 2016 at 7:20 AM, Billy Shuman <wshuman3@gmail.com> wrote:
>>>>>> After resume on 4.9.0-rc8 tg3 is dead.
>>>>>>
>>>>>> In logs I see:
>>>>>> kernel: tg3 0000:44:00.0: phy probe failed, err -19
>>>>>> kernel: tg3 0000:44:00.0: Problem fetching invariants of chip, aborting
>>>>>
>>>>> -19 is -ENODEV which means tg3 cannot read the PHY ID.
>>>>>
>>>>> If it's a true suspend/resume operation, the driver does not have to
>>>>> go through probe during resume.  Please explain how you do
>>>>> suspend/resume.
>>>>>
>>>>
>>>> Sorry, my previous message was accidentally sent too early.
>>>>
>>>> I used systemd (systemctl suspend) to suspend.
>>>>
>>> We need more information to proceed further.
>>> Without suspend, Are you able to use the tg3 port?
>>
>> Yes the port works fine without suspend.
> OK
>>
>>> Which Broadcom card are you having in laptop?
>>
>> The nic is a NetXtreme BCM57762 Gigabit Ethernet PCIe in a thunderbolt3 dock.
>>
> OK
>>> Please provide complete tg3 specific logs in dmesg.
>>>
>>
>> [   32.084010] tg3.c:v3.137 (May 11, 2014)
>> [   32.124695] tg3 0000:44:00.0 eth0: Tigon3 [partno(BCM957762) rev
>> 57766001] (PCI Express) MAC address 98:e7:f4:8b:13:19
>> [   32.124698] tg3 0000:44:00.0 eth0: attached PHY is 57765
>> (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
>> [   32.124699] tg3 0000:44:00.0 eth0: RXcsums[1] LinkChgREG[0]
>> MIirq[0] ASF[0] TSOcap[1]
>> [   32.124700] tg3 0000:44:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
>> [   32.219764] tg3 0000:44:00.0 enp68s0: renamed from eth0
>> [   36.219245] tg3 0000:44:00.0 enp68s0: Link is up at 1000 Mbps, full duplex
>> [   36.219250] tg3 0000:44:00.0 enp68s0: Flow control is on for TX and on for RX
>> [   36.219251] tg3 0000:44:00.0 enp68s0: EEE is disabled
>>
>> after resume
>> [   92.292838] tg3 0000:44:00.0 enp68s0: No firmware running
>> [   93.521744] tg3 0000:44:00.0: tg3_abort_hw timed out,
>> TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
>> [  106.704655] tg3 0000:44:00.0 enp68s0: Link is down
>> [  108.370356] tg3 0000:44:00.0: tg3_abort_hw timed out,
>> TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
>>
>> after rmmod, modprobe
>> [  570.933636] tg3 0000:44:00.0: tg3_abort_hw timed out,
>> TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
>> [  604.847215] tg3.c:v3.137 (May 11, 2014)
>> [  605.010075] tg3 0000:44:00.0: phy probe failed, err -19
>> [  605.010077] tg3 0000:44:00.0: Problem fetching invariants of chip, aborting
>>
>>
>>
>>
> We will try to reproduce and update you on this.
We are unable to reproduce this issue with the Ubuntu 16.10 (K4.8.0-22) kernel.
We are in the process of verifying with the 4.9.0-rc8 kernel and will let
you know the results.
Can you please let us know the make/model of your laptop and the procedure
you followed to enable the tg3 driver?
>>>>> Did this work before?  There have been very few changes to tg3 recently.
>>>>>
>>>>
>>>> This is a new laptop for me, but the same behavior is seen on 4.4.36 and 4.8.12.
>>>>
>>>>>>
>>>>>> rmmod and modprobe do not fix the problem; only a reboot resolves the issue.
>>>>>>
>>>>>> Billy



* Re: [PATCH net 0/9] several fixups for virtio-net XDP
From: Jason Wang @ 2016-12-26  2:39 UTC (permalink / raw)
  To: John Fastabend, mst, virtualization, netdev, linux-kernel
  Cc: john.r.fastabend
In-Reply-To: <585D5A71.4010106@gmail.com>



On 2016-12-24 01:10, John Fastabend wrote:
> On 16-12-23 06:37 AM, Jason Wang wrote:
>> Merry Xmas and a Happy New year to all:
>>
>> This series tries to fix several issues for virtio-net XDP, which
>> can be categorized into several parts:
>>
>> - fix several issues during XDP linearizing
>> - allow csumed packets to work for XDP_PASS
>> - make EWMA rxbuf size estimation work for XDP
>> - forbid XDP when GUEST_UFO is supported
>> - remove big packet XDP support
>> - add XDP support for small buffers
>>
>> Please see individual patches for details.
>>
>> Thanks
>>
>> Jason Wang (9):
>>    virtio-net: remove the warning before XDP linearizing
>>    virtio-net: correctly xmit linearized page on XDP_TX
>>    virtio-net: fix page miscount during XDP linearizing
>>    virtio-net: correctly handle XDP_PASS for linearized packets
>>    virtio-net: unbreak csumed packets for XDP_PASS
>>    virtio-net: make rx buf size estimation works for XDP
>>    virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support
>>    virtio-net: remove big packet XDP codes
>>    virtio-net: XDP support for small buffers
>>
>>   drivers/net/virtio_net.c | 172 ++++++++++++++++++++++++++++-------------------
>>   1 file changed, 102 insertions(+), 70 deletions(-)
>>
> Thanks a lot Jason. The last piece needed to complete XDP support
> is to get the adjust_head part correct. I'll send out a patch in a
> bit but will need to merge it on top of this set.
>
> .John

Yes, glad to see your patch.

Thanks.


* Re: [PATCH net 7/9] virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support
From: Jason Wang @ 2016-12-26  2:38 UTC (permalink / raw)
  To: John Fastabend, mst, virtualization, netdev, linux-kernel
  Cc: john.r.fastabend
In-Reply-To: <585D4C5E.5050908@gmail.com>



On 2016-12-24 00:10, John Fastabend wrote:
> On 16-12-23 08:02 AM, John Fastabend wrote:
>> On 16-12-23 06:37 AM, Jason Wang wrote:
>>> When VIRTIO_NET_F_GUEST_UFO is negotiated, host could still send UFO
>>> packet that exceeds a single page which could not be handled
>>> correctly by XDP. So this patch forbids setting XDP when GUEST_UFO is
>>> supported. While at it, forbid XDP for ECN (which comes only from GRO)
>>> too to prevent user from misconfiguration.
>>>
> Is sending packets larger than a single page normal in this case, though?

Yes, when NETIF_F_UFO is enabled for tap, it won't segment UFO packets
and will send them directly to the guest. (This can be reproduced with
UDP_STREAM between two guests or from host to guest.)

Thanks

> I don't have any need to support big packet mode other than MST asked
> for it. And I wasn't seeing this in my tests. MTU is capped at 4k - hdr
> when XDP is enabled.
>
> .John
>
>>> Cc: John Fastabend <john.r.fastabend@intel.com>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>>   drivers/net/virtio_net.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 77ae358..c1f66d8 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -1684,7 +1684,9 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>>>   	int i, err;
>>>   
>>>   	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>> -	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6)) {
>>> +	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>> +	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>> +	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>   		netdev_warn(dev, "can't set XDP while host is implementing LRO, disable LRO first\n");
>>>   		return -EOPNOTSUPP;
>>>   	}
>>>
>> Acked-by: John Fastabend <john.r.fastabend@intel.com>
>>
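
The feature test in the quoted hunk can be restated as a small standalone
sketch. It assumes only the feature bit numbers from the virtio-net
specification (GUEST_TSO4 = 7, GUEST_TSO6 = 8, GUEST_ECN = 9, GUEST_UFO = 10);
the function xdp_allowed() and the plain 64-bit mask are hypothetical
stand-ins for the driver's virtio_has_feature() calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio-net specification. */
#define VIRTIO_NET_F_GUEST_TSO4	7
#define VIRTIO_NET_F_GUEST_TSO6	8
#define VIRTIO_NET_F_GUEST_ECN	9
#define VIRTIO_NET_F_GUEST_UFO	10

/* Hypothetical helper: XDP is refused while any guest offload that can
 * deliver packets larger than one page has been negotiated. */
static bool xdp_allowed(uint64_t features)
{
	const uint64_t blocking =
		(1ULL << VIRTIO_NET_F_GUEST_TSO4) |
		(1ULL << VIRTIO_NET_F_GUEST_TSO6) |
		(1ULL << VIRTIO_NET_F_GUEST_ECN) |
		(1ULL << VIRTIO_NET_F_GUEST_UFO);

	return (features & blocking) == 0;
}
```

Before this patch only the two TSO bits were checked; adding ECN and UFO
to the mask is the whole behavioural change.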

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: [PATCH net 4/9] virtio-net: correctly handle XDP_PASS for linearized packets
From: Jason Wang @ 2016-12-26  2:34 UTC (permalink / raw)
  To: John Fastabend, mst, virtualization, netdev, linux-kernel
  Cc: john.r.fastabend
In-Reply-To: <585D4966.9060605@gmail.com>



On 2016-12-23 23:57, John Fastabend wrote:
> On 16-12-23 06:37 AM, Jason Wang wrote:
>> When XDP_PASS is determined for linearized packets, we try to get
>> new buffers in the virtqueue and build skbs from them. This is wrong;
>> we should create skbs based on the existing buffers instead. Fix this by
>> creating the skb based on xdp_page.
>>
>> With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS.
>>
>> Cc: John Fastabend <john.r.fastabend@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/net/virtio_net.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 58ad40e..470293e 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -578,8 +578,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>   		act = do_xdp_prog(vi, rq, xdp_prog, xdp_page, offset, len);
>>   		switch (act) {
>>   		case XDP_PASS:
>> -			if (unlikely(xdp_page != page))
>> -				__free_pages(xdp_page, 0);
>> +			/* We can only create skb based on xdp_page. */
>> +			if (unlikely(xdp_page != page)) {
>> +				rcu_read_unlock();
>> +				put_page(page);
>> +				head_skb = page_to_skb(vi, rq, xdp_page,
>> +						       0, len, PAGE_SIZE);
>> +				return head_skb;
>> +			}
>>   			break;
>>   		case XDP_TX:
>>   			if (unlikely(xdp_page != page))
>>
> Great thanks. This was likely working before because of the memory
> leak fixed in 3/9.

It looks like not: without this and 3/9, the code will try to get buffers
and build an skb for a new packet instead of using the existing buffers.

Thanks

>
> Acked-by: John Fastabend <john.r.fastabend@intel.com>



* Re: [PATCH net 3/9] virtio-net: fix page miscount during XDP linearizing
From: Jason Wang @ 2016-12-26  2:30 UTC (permalink / raw)
  To: John Fastabend, mst, virtualization, netdev, linux-kernel
  Cc: john.r.fastabend
In-Reply-To: <585D48A7.3040701@gmail.com>



On 2016-12-23 23:54, John Fastabend wrote:
> On 16-12-23 06:37 AM, Jason Wang wrote:
>> We don't put the page during linearizing; this causes a leak when
>> we xmit through XDP_TX or when the packet exceeds PAGE_SIZE. Fix this
>> by putting the page accordingly. Also decrease the number of buffers
>> during linearizing to make sure the caller can free buffers correctly
>> when the packet exceeds PAGE_SIZE. With this patch, we won't get OOM
>> after linearizing a huge number of packets.
>>
>> Cc: John Fastabend <john.r.fastabend@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
> Thanks! looks good. By the way do you happen to have any actual
> configuration where this path is hit? I obviously didn't test this
> very long other than a quick test with my hacked vhost driver.
>
> Acked-by: John Fastabend <john.r.fastabend@intel.com>

Yes, I have. Just increase the MTU above 1500 for both virtio and tap
and generate traffic with sizes that lead to underestimation of the rxbuf size.

Thanks


* Re: [PATCH 2/6] wl1251: Use request_firmware_prefer_user() for loading NVS calibration data
From: Pali Rohár @ 2016-12-25 20:46 UTC (permalink / raw)
  To: Arend Van Spriel
  Cc: Ming Lei, Luis R. Rodriguez, Greg Kroah-Hartman, Kalle Valo,
	David Gnedt, Michal Kazior, Daniel Wagner, Tony Lindgren,
	Sebastian Reichel, Pavel Machek, Ivaylo Dimitrov, Aaro Koskinen,
	Grazvydas Ignotas, linux-kernel, linux-wireless, netdev
In-Reply-To: <e728586e-80bf-c9e0-0063-71da4c4aba85@broadcom.com>


On Sunday 25 December 2016 21:15:40 Arend Van Spriel wrote:
> On 24-12-2016 17:52, Pali Rohár wrote:
> > NVS calibration data for wl1251 are model specific. Every device
> > with a wl1251 chip has different data, calibrated at the factory.
> > 
> > Not all wl1251 chips have their own EEPROM where the calibration
> > data are stored, and in that case there is no "standard" place.
> > Every device stores them in a different place (some in a rootfs
> > file, some in a dedicated nand partition, some in another
> > proprietary structure).
> > 
> > The kernel wl1251 driver cannot support every different storage
> > chosen by a device manufacturer, so it will use the
> > request_firmware_prefer_user() call for loading NVS calibration
> > data, and a userspace helper will be responsible for preparing the
> > correct data.
> 
> Responding to this patch as it provides a lot of context to discuss.
> As you might have gathered from earlier discussions I am not a fan
> of using user-space helper. I can agree that the kernel driver,
> wl1251 in this case, should be agnostic to platform specific details
> regarding storage solutions and the firmware api should hide that.
> However, it seems your only solution is adding user-space to the mix
> and changing the api towards that. Can we solve it without
> user-space help?

Without a userspace helper, the helper's code would have to be
integrated into the kernel.

So what is userspace helper doing?

1) Read MAC address from CAL
2) Read NVS data from CAL
3) Modify MAC address in the in-memory NVS data (new for this patch series)
4) Modify the in-memory NVS data if we are in an FCC country
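
As a rough sketch of step 3 only: the function below overwrites six bytes of
an in-memory NVS blob with a MAC address. Everything here is an assumption
made for illustration. The function name nvs_patch_mac() is hypothetical, and
the offset is a caller-supplied parameter because the actual position of the
MAC address inside the wl1251 NVS data is not stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical helper: copy a MAC address into an in-memory NVS image.
 * mac_off is supplied by the caller; the real layout is device specific
 * and is NOT described here. Returns 0 on success, -1 on bad arguments. */
static int nvs_patch_mac(uint8_t *nvs, size_t nvs_len, size_t mac_off,
			 const uint8_t mac[MAC_LEN])
{
	if (nvs == NULL || mac == NULL)
		return -1;
	/* Reject offsets that would write past the end of the image. */
	if (mac_off > nvs_len || nvs_len - mac_off < MAC_LEN)
		return -1;

	memcpy(nvs + mac_off, mac, MAC_LEN);
	return 0;
}
```

Steps 1, 2 and 4 depend on the closed CAL library and a dbus call, so they
are not sketched.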

Checking for the country is done via a dbus call to either the Maemo
cellular daemon or alternatively via REGDOMAIN in /etc/default/crda. I
plan to use ofono (instead of the Maemo cellular daemon) too...

Currently we are using the closed, proprietary Nokia CAL library.

Steps 1) and 2) need the closed library; step 4) needs a dbus call.

In the current state I do not see a way to integrate it into the kernel.
And because wl1251 currently uses request_firmware() to load the NVS data,
I think it is still the best way to handle it...

And IIRC there was already a discussion about a Nokia CAL parser in the
kernel, and it was declined.

> The firmware_class already supports a number of path prefixes it
> traverses looking for the requested firmware. So I was thinking about
> adding a hashtable in which a platform driver can add firmware which
> are stored in the hashtable using the hashed firmware name. Upon a
> firmware request from the driver we could check the hashtable before
> traversing the path prefixes on VFS. The obvious problem is that the
> request may come before the firmware is added to the hashtable. Just
> wanted to pitch the idea first and hear what others think about it
> and maybe someone has a nice solution for this problem. Fingers
> crossed :-p
> 
> > In case userspace helper fails request_firmware_prefer_user() still
> > try to load data file directly from VFS as fallback mechanism.
> > 
> > On Nokia N900 device which has wl1251 chip, NVS calibration data
> > are stored in CAL nand partition. CAL is proprietary Nokia
> > key/value format for nand devices.
> 
> With the firmware hashtable api on N900 a platform driver could
> interpret the CAL data in the nand partition and provide it through
> the firmware_class.
> 
> > With this patch it is finally possible to load correct model
> > specific NVS calibration data for Nokia N900.
> 
> But on other devices that use wl1251, but for instance have no
> userspace helper the request to userspace will fail (after 60 sec?)
> and try VFS after that. Maybe not so nice.

Currently support for those devices is broken (like for N900) as without 
proper NVS data they do not work correctly...

> You should consider other device configurations. Not just N900.

I do not have any other wl1251 devices. I know that the Pandora has a
wl1251 too, but with an EEPROM where the NVS is stored, and in that case
request_firmware() is not used.

> Regards,
> Arend
> 
> > Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> > ---
> > 
> >  drivers/net/wireless/ti/wl1251/Kconfig |    1 +
> >  drivers/net/wireless/ti/wl1251/main.c  |    2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/wireless/ti/wl1251/Kconfig b/drivers/net/wireless/ti/wl1251/Kconfig
> > index 7142ccf..affe154 100644
> > --- a/drivers/net/wireless/ti/wl1251/Kconfig
> > +++ b/drivers/net/wireless/ti/wl1251/Kconfig
> > @@ -2,6 +2,7 @@ config WL1251
> >  	tristate "TI wl1251 driver support"
> >  	depends on MAC80211
> >  	select FW_LOADER
> > +	select FW_LOADER_USER_HELPER
> >  	select CRC7
> >  	---help---
> >  	  This will enable TI wl1251 driver support. The drivers make
> > 
> > diff --git a/drivers/net/wireless/ti/wl1251/main.c b/drivers/net/wireless/ti/wl1251/main.c
> > index 208f062..24f8866 100644
> > --- a/drivers/net/wireless/ti/wl1251/main.c
> > +++ b/drivers/net/wireless/ti/wl1251/main.c
> > @@ -110,7 +110,7 @@ static int wl1251_fetch_nvs(struct wl1251 *wl)
> >  	struct device *dev = wiphy_dev(wl->hw->wiphy);
> >  	int ret;
> >  
> > -	ret = request_firmware(&fw, WL1251_NVS_NAME, dev);
> > +	ret = request_firmware_prefer_user(&fw, WL1251_NVS_NAME, dev);
> >  
> >  	if (ret < 0) {
> >  		wl1251_error("could not get nvs file: %d", ret);

-- 
Pali Rohár
pali.rohar@gmail.com



* Re: [PATCH 2/6] wl1251: Use request_firmware_prefer_user() for loading NVS calibration data
From: Arend Van Spriel @ 2016-12-25 20:15 UTC (permalink / raw)
  To: Pali Rohár, Ming Lei, Luis R. Rodriguez, Greg Kroah-Hartman,
	Kalle Valo, David Gnedt, Michal Kazior, Daniel Wagner,
	Tony Lindgren, Sebastian Reichel, Pavel Machek, Ivaylo Dimitrov,
	Aaro Koskinen, Grazvydas Ignotas
  Cc: linux-kernel, linux-wireless, netdev
In-Reply-To: <1482598381-16513-3-git-send-email-pali.rohar@gmail.com>

On 24-12-2016 17:52, Pali Rohár wrote:
> NVS calibration data for wl1251 are model specific. Every device with a
> wl1251 chip has different data, calibrated at the factory.
> 
> Not all wl1251 chips have their own EEPROM where the calibration data are
> stored, and in that case there is no "standard" place. Every device stores
> them in a different place (some in a rootfs file, some in a dedicated nand
> partition, some in another proprietary structure).
> 
> The kernel wl1251 driver cannot support every different storage chosen by
> a device manufacturer, so it will use the request_firmware_prefer_user()
> call for loading NVS calibration data, and a userspace helper will be
> responsible for preparing the correct data.

Responding to this patch as it provides a lot of context to discuss. As
you might have gathered from earlier discussions I am not a fan of using
the user-space helper. I agree that the kernel driver, wl1251 in this
case, should be agnostic to platform-specific details regarding storage
solutions and that the firmware API should hide them. However, it seems
your only solution is adding user-space to the mix and changing the API
towards that. Can we solve it without user-space help?

The firmware_class already supports a number of path prefixes it
traverses looking for the requested firmware. So I was thinking about
adding a hashtable to which a platform driver can add firmware images,
keyed by the hashed firmware name. Upon a firmware
request from the driver we could check the hashtable before traversing
the path prefixes on VFS. The obvious problem is that the request may
come before the firmware is added to the hashtable. Just wanted to pitch
the idea first and hear what others think about it and maybe someone has
a nice solution for this problem. Fingers crossed :-p
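
To make the idea a bit more concrete, here is a minimal user-space sketch
of such a table. All names (fw_entry, fw_table_add, fw_table_lookup) are
hypothetical and not part of firmware_class; the hash function and bucket
count are arbitrary choices, and locking is left out entirely:

```c
#include <stddef.h>
#include <string.h>

#define FW_TABLE_BUCKETS 16	/* arbitrary size, assumption for the sketch */

/* Hypothetical entry: a firmware blob registered by a platform driver. */
struct fw_entry {
	const char *name;	/* firmware name used as the lookup key */
	const void *data;
	size_t size;
	struct fw_entry *next;	/* chaining within a bucket */
};

static struct fw_entry *fw_table[FW_TABLE_BUCKETS];

/* djb2 string hash, reduced to a bucket index. */
static unsigned int fw_hash(const char *name)
{
	unsigned long h = 5381;

	while (*name)
		h = h * 33 + (unsigned char)*name++;
	return h % FW_TABLE_BUCKETS;
}

/* Called by the (hypothetical) platform driver; caller owns the entry. */
void fw_table_add(struct fw_entry *e)
{
	unsigned int b = fw_hash(e->name);

	e->next = fw_table[b];
	fw_table[b] = e;
}

/*
 * Returns the registered entry, or NULL. On NULL the caller would fall
 * back to walking the path prefixes on VFS as it does today.
 */
const struct fw_entry *fw_table_lookup(const char *name)
{
	const struct fw_entry *e;

	for (e = fw_table[fw_hash(name)]; e; e = e->next)
		if (strcmp(e->name, name) == 0)
			return e;
	return NULL;
}
```

The ordering problem mentioned above shows up here as fw_table_lookup()
returning NULL when the driver asks before the platform code has called
fw_table_add().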

> In case the userspace helper fails, request_firmware_prefer_user() still tries
> to load the data file directly from VFS as a fallback mechanism.
> 
> On the Nokia N900, which has a wl1251 chip, NVS calibration data are stored
> in the CAL nand partition. CAL is a proprietary Nokia key/value format for nand
> devices.

With the firmware hashtable api on N900 a platform driver could
interpret the CAL data in the nand partition and provide it through the
firmware_class.

> With this patch it is finally possible to load correct model specific NVS
> calibration data for Nokia N900.

But on other devices that use wl1251 and, for instance, have no userspace
helper, the request to userspace will fail (after 60 sec?) and only then
fall back to VFS. Maybe not so nice. You should consider other device
configurations, not just the N900.

Regards,
Arend

> Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> ---
>  drivers/net/wireless/ti/wl1251/Kconfig |    1 +
>  drivers/net/wireless/ti/wl1251/main.c  |    2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/ti/wl1251/Kconfig b/drivers/net/wireless/ti/wl1251/Kconfig
> index 7142ccf..affe154 100644
> --- a/drivers/net/wireless/ti/wl1251/Kconfig
> +++ b/drivers/net/wireless/ti/wl1251/Kconfig
> @@ -2,6 +2,7 @@ config WL1251
>  	tristate "TI wl1251 driver support"
>  	depends on MAC80211
>  	select FW_LOADER
> +	select FW_LOADER_USER_HELPER
>  	select CRC7
>  	---help---
>  	  This will enable TI wl1251 driver support. The drivers make
> diff --git a/drivers/net/wireless/ti/wl1251/main.c b/drivers/net/wireless/ti/wl1251/main.c
> index 208f062..24f8866 100644
> --- a/drivers/net/wireless/ti/wl1251/main.c
> +++ b/drivers/net/wireless/ti/wl1251/main.c
> @@ -110,7 +110,7 @@ static int wl1251_fetch_nvs(struct wl1251 *wl)
>  	struct device *dev = wiphy_dev(wl->hw->wiphy);
>  	int ret;
>  
> -	ret = request_firmware(&fw, WL1251_NVS_NAME, dev);
> +	ret = request_firmware_prefer_user(&fw, WL1251_NVS_NAME, dev);
>  
>  	if (ret < 0) {
>  		wl1251_error("could not get nvs file: %d", ret);
> 

^ permalink raw reply

* RE: [PATCH v3 net-next] bnx2x: ethtool -x support for rss_key
From: Mintz, Yuval @ 2016-12-25 11:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1482342498.8944.62.camel@edumazet-glaptop3.roam.corp.google.com>

> +	if (key) {
> +		WARN_ON_ONCE(bnx2x_get_rxfh_key_size(dev) != T_ETH_RSS_KEY * 4);
> +		bnx2x_get_rss_key(&bp->rss_conf_obj, key);
> +	}

This doesn't work for VFs [the PF has the RSS configuration object in their
case; they don't have it], which is fine as 'key' should never be set for
them [since you're adding bnx2x_get_rxfh_key_size() to ethtool ops
only for PFs]. But this is probably still worth a comment.

> -	memcpy(rss.rss_key, rss_tlv->rss_key, sizeof(rss_tlv->rss_key));
> +	memcpy(&vf->rss_conf_obj.rss_key, rss_tlv->rss_key, sizeof(rss_tlv->rss_key));
>  	rss.rss_obj = &vf->rss_conf_obj;
>  	rss.rss_result_mask = rss_tlv->rss_result_mask;

The change you've applied in bnx2x_setup_rss() should affect here
as well, meaning the PF would copy the parameters into the PF's RSS
configuration object belonging to the VF from the parameter.

This change would cause the PF to configure the VF's RSS key
as all-zeros [as parameters were initially zeroed].

^ permalink raw reply

* RE: [PATCH iproute2 v3 4/4] ifstat: Add "sw only" extended statistics to ifstat
From: Nogah Frankel @ 2016-12-25 11:23 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev@vger.kernel.org, stephen@networkplumber.org,
	roszenrami@gmail.com, Or Gerlitz, Jiri Pirko, Elad Raz,
	Yotam Gigi, Ido Schimmel
In-Reply-To: <585C413B.9050905@cumulusnetworks.com>



> -----Original Message-----
> From: Roopa Prabhu [mailto:roopa@cumulusnetworks.com]
> Sent: Thursday, December 22, 2016 11:10 PM
> To: Nogah Frankel <nogahf@mellanox.com>
> Cc: netdev@vger.kernel.org; stephen@networkplumber.org; roszenrami@gmail.com; Or
> Gerlitz <ogerlitz@mellanox.com>; Jiri Pirko <jiri@mellanox.com>; Elad Raz
> <eladr@mellanox.com>; Yotam Gigi <yotamg@mellanox.com>; Ido Schimmel
> <idosch@mellanox.com>
> Subject: Re: [PATCH iproute2 v3 4/4] ifstat: Add "sw only" extended statistics to ifstat
> 
> On 12/22/16, 8:23 AM, Nogah Frankel wrote:
> > Add support for extended statistics of SW only type, for counting only the
> > packets that went via the cpu. (useful for systems with forward
> > offloading). It reads it from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS
> > and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT.
> >
> > It is under the name 'software'
> > (or any abbreviation of it, such as 'soft' or simply 's')
> >
> > For example:
> > ifstat -x s
> >
> >
> Nogah, can we keep the option names closer to the attribute names ?
> That would avoid some confusion and help with the follow-up stats.
> 
> ifstat -x offload cpu
> or
> ifstat -x cpu
> 
> for others it would be:
> 
> ifstat -x link [vlan|igmp]
> ifstat -x vlan
> ifstat -x igmp
> ifstat -x lacp
> 
> and so on...
> 
> thanks!

Sure, I will change it.

^ permalink raw reply

* RE: [PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat
From: Nogah Frankel @ 2016-12-25 10:25 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev@vger.kernel.org, roopa@cumulusnetworks.com,
	roszenrami@gmail.com, Or Gerlitz, Jiri Pirko, Elad Raz,
	Yotam Gigi, Ido Schimmel
In-Reply-To: <20161222105919.08b8738b@xeon-e3>

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, December 22, 2016 8:59 PM
> To: Nogah Frankel <nogahf@mellanox.com>
> Cc: netdev@vger.kernel.org; roopa@cumulusnetworks.com; roszenrami@gmail.com; Or
> Gerlitz <ogerlitz@mellanox.com>; Jiri Pirko <jiri@mellanox.com>; Elad Raz
> <eladr@mellanox.com>; Yotam Gigi <yotamg@mellanox.com>; Ido Schimmel
> <idosch@mellanox.com>
> Subject: Re: [PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat
> 
> On Thu, 22 Dec 2016 18:23:13 +0200
> Nogah Frankel <nogahf@mellanox.com> wrote:
> 
> >  }
> > @@ -691,18 +804,22 @@ static const struct option longopts[] = {
> >  	{ "interval", 1, 0, 't' },
> >  	{ "version", 0, 0, 'V' },
> >  	{ "zeros", 0, 0, 'z' },
> > +	{ "extended", 1, 0, 'x'},
> >  	{ 0 }
> >  };
> >
> > +
> >  int main(int argc, char *argv[])
> 
> You let extra whitespace changes creep in.
> 
> 
> > +		case 'x':
> > +			is_extended = true;
> > +			memset(stats_type, 0, 64);
> > +			strncpy(stats_type, optarg, 63);
> > +			break;
> 
> This seems like doing this either the paranoid or hard way.
> Why not:
> 	const char *stats_type = NULL;
> ...
> 
> 	case 'x':
> 		stats_type = optarg;
> 		break;
> ...
> 		if (stats_type)
> 			snprintf(hist_name, sizeof(hist_name),
> 				 "%s/.%s_ifstat.u%d", P_tmpdir, stats_type,
> 				 getuid());
> 		else
> 			snprintf(hist_name, sizeof(hist_name),
> 				 "%s/.ifstat.u%d", P_tmpdir, getuid());
> 
> 
> Since:
> 	1) optarg points to area in argv that is persistent (avoid copy)
> 	2) don't need is_extended flag value then
> 
> Please cleanup and resubmit.
> 
> 

I will.
Thank you.
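
For reference, the suggested cleanup could be sketched roughly as below.
This is only a standalone illustration, not the ifstat code: the helper
names parse_stats_type() and build_hist_name() are made up, and the temp
directory and uid are passed in as parameters instead of using P_tmpdir
and getuid() directly.

```c
#include <getopt.h>
#include <stdio.h>

static const struct option longopts[] = {
	{ "extended", required_argument, NULL, 'x' },
	{ NULL, 0, NULL, 0 }
};

/* Returns the -x argument, or NULL when -x was not given. */
const char *parse_stats_type(int argc, char *argv[])
{
	const char *stats_type = NULL;
	int ch;

	optind = 1;	/* restart parsing from the first argument */
	while ((ch = getopt_long(argc, argv, "x:", longopts, NULL)) != -1) {
		if (ch == 'x')
			stats_type = optarg;	/* points into argv, no copy */
	}
	return stats_type;
}

/*
 * Build the history file name. A NULL stats_type doubles as the
 * "not extended" case, so no separate is_extended flag is needed.
 */
int build_hist_name(char *buf, size_t len, const char *tmpdir,
		    const char *stats_type, unsigned int uid)
{
	if (stats_type)
		return snprintf(buf, len, "%s/.%s_ifstat.u%u",
				tmpdir, stats_type, uid);
	return snprintf(buf, len, "%s/.ifstat.u%u", tmpdir, uid);
}
```

Since optarg points into argv, which lives for the whole run of the
program, keeping the pointer is enough and the memset()/strncpy() pair
can go away.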



^ permalink raw reply

* Re: [PATCH v3 2/2] drivers: net: ethernet: 3com: fix return value
From: Sergei Shtylyov @ 2016-12-25 10:07 UTC (permalink / raw)
  To: Thomas Preisner
  Cc: dave, netdev, linux-kernel, linux-kernel, milan.stephan+linux
In-Reply-To: <1482625822-19658-3-git-send-email-thomas.preisner+linux@fau.de>

Hello!

On 12/25/2016 3:30 AM, Thomas Preisner wrote:

> In some cases the return value of a failing function is not being used
> and the function typhoon_init_one() returns another negative error
> code instead.
>
> Signed-off-by: Thomas Preisner <thomas.preisner+linux@fau.de>
> Signed-off-by: Milan Stephan <milan.stephan+linux@fau.de>
> ---
>  drivers/net/ethernet/3com/typhoon.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)

    In addition to what DaveM said, your choice of subject prefixes is too 
broad -- it would seem that you're fixing all 3com drivers, while you're only 
fixing typhoon. "typhoon:" alone would have been an appropriate prefix.

MBR, Sergei

^ permalink raw reply

* Re: [PATCH] net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
From: Vincent Bernat @ 2016-12-25  7:44 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: netdev, David Decotigny
In-Reply-To: <20161124095506.25791-1-mlichvar@redhat.com>

 ❦ 24 November 2016 10:55 +0100, Miroslav Lichvar <mlichvar@redhat.com>:

> The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
> command and likewise it shouldn't require the CAP_NET_ADMIN
> capability.

Could this patch be pushed to stable branches too?
-- 
Each module should do one thing well.
            - The Elements of Programming Style (Kernighan & Plauger)

^ permalink raw reply

* [PATCH v2] ipv4: Namespaceify tcp_tw_reuse knob
From: Haishuang Yan @ 2016-12-25  6:33 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris, Patrick McHardy,
	Nikolay Borisov
  Cc: netdev, linux-kernel, Haishuang Yan

Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

---
Changes in v2:
  - Make the commit message clearer.
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  1 -
 net/ipv4/sysctl_net_ipv4.c | 14 +++++++-------
 net/ipv4/tcp_ipv4.c        |  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index f0cf5a1..0378e88 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -110,6 +110,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_orphan_retries;
 	int sysctl_tcp_fin_timeout;
 	unsigned int sysctl_tcp_notsent_lowat;
+	int sysctl_tcp_tw_reuse;
 
 	int sysctl_igmp_max_memberships;
 	int sysctl_igmp_max_msf;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 207147b..6061963 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -252,7 +252,6 @@
 extern int sysctl_tcp_rmem[3];
 extern int sysctl_tcp_app_win;
 extern int sysctl_tcp_adv_win_scale;
-extern int sysctl_tcp_tw_reuse;
 extern int sysctl_tcp_frto;
 extern int sysctl_tcp_low_latency;
 extern int sysctl_tcp_nometrics_save;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 80bc36b..22cbd61 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -433,13 +433,6 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write,
 		.extra2		= &tcp_adv_win_scale_max,
 	},
 	{
-		.procname	= "tcp_tw_reuse",
-		.data		= &sysctl_tcp_tw_reuse,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec
-	},
-	{
 		.procname	= "tcp_frto",
 		.data		= &sysctl_tcp_frto,
 		.maxlen		= sizeof(int),
@@ -960,6 +953,13 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write,
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "tcp_tw_reuse",
+		.data		= &init_net.ipv4.sysctl_tcp_tw_reuse,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname	= "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 30d81f5..fe9da4f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -84,7 +84,6 @@
 #include <crypto/hash.h>
 #include <linux/scatterlist.h>
 
-int sysctl_tcp_tw_reuse __read_mostly;
 int sysctl_tcp_low_latency __read_mostly;
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -120,7 +119,7 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 	   and use initial timestamp retrieved from peer table.
 	 */
 	if (tcptw->tw_ts_recent_stamp &&
-	    (!twp || (sysctl_tcp_tw_reuse &&
+	    (!twp || (sock_net(sk)->ipv4.sysctl_tcp_tw_reuse &&
 			     get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {
 		tp->write_seq = tcptw->tw_snd_nxt + 65535 + 2;
 		if (tp->write_seq == 0)
@@ -2456,6 +2455,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_orphan_retries = 0;
 	net->ipv4.sysctl_tcp_fin_timeout = TCP_FIN_TIMEOUT;
 	net->ipv4.sysctl_tcp_notsent_lowat = UINT_MAX;
+	net->ipv4.sysctl_tcp_tw_reuse = 0;
 
 	return 0;
 fail:
-- 
1.8.3.1
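
As an illustration of what the conversion changes for readers of the
knob, here is a small user-space sketch. The demo_* types and helpers are
stand-ins invented for this example (they only mimic struct net,
sock_net() and tcp_sk_init()), and the check is simplified compared to
tcp_twsk_unique(), which also handles the !twp case:

```c
/* Stand-ins for struct netns_ipv4 / struct net / struct sock. */
struct demo_netns_ipv4 {
	int sysctl_tcp_tw_reuse;
};

struct demo_net {
	struct demo_netns_ipv4 ipv4;
};

struct demo_sock {
	struct demo_net *net;	/* namespace the socket lives in */
};

static struct demo_net *demo_sock_net(const struct demo_sock *sk)
{
	return sk->net;
}

/* Per-namespace init: every new namespace starts with reuse disabled. */
void demo_tcp_sk_init(struct demo_net *net)
{
	net->ipv4.sysctl_tcp_tw_reuse = 0;
}

/*
 * Simplified reuse check: the knob is read from the socket's own
 * namespace instead of from one global variable.
 */
int demo_tw_reuse_allowed(const struct demo_sock *sk, long secs_since_ts)
{
	return demo_sock_net(sk)->ipv4.sysctl_tcp_tw_reuse &&
	       secs_since_ts > 1;
}
```

With this layout a container can enable reuse for its own sockets while
the host namespace keeps the default of 0.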

^ permalink raw reply related

* Re: [PATCH] ipv4: Namespaceify tcp_tw_reuse knob
From: David Miller @ 2016-12-25  2:07 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, jmorris, kaber, netdev, linux-kernel
In-Reply-To: <1482583387-79777-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 24 Dec 2016 20:43:07 +0800

> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

You need to provide something more than an empty commit message.

Instead, the commit message must explain why this particular
sysctl should be considered for namespacification and what
the implications, both good and bad, are for such a change.

^ permalink raw reply

* Re: [PATCH v3 1/2] drivers: net: ethernet: 3com: fix return value
From: David Miller @ 2016-12-25  0:56 UTC (permalink / raw)
  To: thomas.preisner+linux
  Cc: sergei.shtylyov, dave, netdev, linux-kernel, linux-kernel,
	milan.stephan+linux
In-Reply-To: <1482625822-19658-2-git-send-email-thomas.preisner+linux@fau.de>


It is never, ever, appropriate to use the same exact Subject: line
text for two different changes.

Someone looking at "git shortlog" has no way to know what is different
between the two changes.

You must put care and time into constructing Subject: lines because
this text is critical for data mining and analysis done by both humans
and machines.

^ permalink raw reply

* [PATCH v3 2/2] drivers: net: ethernet: 3com: fix return value
From: Thomas Preisner @ 2016-12-25  0:30 UTC (permalink / raw)
  To: sergei.shtylyov
  Cc: dave, netdev, linux-kernel, linux-kernel, milan.stephan+linux,
	thomas.preisner+linux
In-Reply-To: <1482625822-19658-1-git-send-email-thomas.preisner+linux@fau.de>

In some cases the return value of a failing function is not being used
and the function typhoon_init_one() returns another negative error
code instead.

Signed-off-by: Thomas Preisner <thomas.preisner+linux@fau.de>
Signed-off-by: Milan Stephan <milan.stephan+linux@fau.de>
---
 drivers/net/ethernet/3com/typhoon.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/3com/typhoon.c b/drivers/net/ethernet/3com/typhoon.c
index c88b88a..8821a24 100644
--- a/drivers/net/ethernet/3com/typhoon.c
+++ b/drivers/net/ethernet/3com/typhoon.c
@@ -2370,9 +2370,9 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 * 4) Get the hardware address.
 	 * 5) Put the card to sleep.
 	 */
-	if (typhoon_reset(ioaddr, WaitSleep) < 0) {
+	err = typhoon_reset(ioaddr, WaitSleep);
+	if (err < 0) {
 		err_msg = "could not reset 3XP";
-		err = -EIO;
 		goto error_out_dma;
 	}
 
@@ -2386,16 +2386,16 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	typhoon_init_interface(tp);
 	typhoon_init_rings(tp);
 
-	if(typhoon_boot_3XP(tp, TYPHOON_STATUS_WAITING_FOR_HOST) < 0) {
+	err = typhoon_boot_3XP(tp, TYPHOON_STATUS_WAITING_FOR_HOST);
+	if (err < 0) {
 		err_msg = "cannot boot 3XP sleep image";
-		err = -EIO;
 		goto error_out_reset;
 	}
 
 	INIT_COMMAND_WITH_RESPONSE(&xp_cmd, TYPHOON_CMD_READ_MAC_ADDRESS);
-	if(typhoon_issue_command(tp, 1, &xp_cmd, 1, xp_resp) < 0) {
+	err = typhoon_issue_command(tp, 1, &xp_cmd, 1, xp_resp);
+	if (err < 0) {
 		err_msg = "cannot read MAC address";
-		err = -EIO;
 		goto error_out_reset;
 	}
 
@@ -2430,9 +2430,9 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if(xp_resp[0].numDesc != 0)
 		tp->capabilities |= TYPHOON_WAKEUP_NEEDS_RESET;
 
-	if(typhoon_sleep(tp, PCI_D3hot, 0) < 0) {
+	err = typhoon_sleep(tp, PCI_D3hot, 0);
+	if (err < 0) {
 		err_msg = "cannot put adapter to sleep";
-		err = -EIO;
 		goto error_out_reset;
 	}
 
-- 
2.7.4
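
The difference between the old and the new error handling can be shown
with a small standalone sketch. demo_step(), demo_init_old() and
demo_init_new() are hypothetical names used only for illustration; they
are not functions from typhoon.c:

```c
#include <errno.h>

/* Hypothetical callee: fails with -fail_with, or succeeds on 0. */
static int demo_step(int fail_with)
{
	return fail_with ? -fail_with : 0;
}

/* Old pattern: any failure of the callee is reported as -EIO. */
int demo_init_old(int fail_with)
{
	int err = 0;

	if (demo_step(fail_with) < 0) {
		err = -EIO;
		goto out;
	}
out:
	return err;
}

/* New pattern: the callee's own error code is propagated. */
int demo_init_new(int fail_with)
{
	int err;

	err = demo_step(fail_with);
	if (err < 0)
		goto out;
out:
	return err;
}
```

If the callee fails with, say, -ETIMEDOUT, the old variant hides that
behind -EIO while the new one returns -ETIMEDOUT to the caller.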

^ permalink raw reply related

* [PATCH v3 1/2] drivers: net: ethernet: 3com: fix return value
From: Thomas Preisner @ 2016-12-25  0:30 UTC (permalink / raw)
  To: sergei.shtylyov
  Cc: dave, netdev, linux-kernel, linux-kernel, milan.stephan+linux,
	thomas.preisner+linux
In-Reply-To: <1482625822-19658-1-git-send-email-thomas.preisner+linux@fau.de>

In a few cases the err-variable is not set to a negative error code if a
function call fails and thus 0 is returned instead.
It may be better to set err to the appropriate negative error code
before returning.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188841

Reported-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Thomas Preisner <thomas.preisner+linux@fau.de>
Signed-off-by: Milan Stephan <milan.stephan+linux@fau.de>
---
 drivers/net/ethernet/3com/typhoon.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/3com/typhoon.c b/drivers/net/ethernet/3com/typhoon.c
index a0cacbe..c88b88a 100644
--- a/drivers/net/ethernet/3com/typhoon.c
+++ b/drivers/net/ethernet/3com/typhoon.c
@@ -2404,6 +2404,7 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
-	if(!is_valid_ether_addr(dev->dev_addr)) {
+	if (!is_valid_ether_addr(dev->dev_addr)) {
 		err_msg = "Could not obtain valid ethernet address, aborting";
+		err = -EIO;
 		goto error_out_reset;
 	}
 
@@ -2411,7 +2412,8 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 * later when we print out the version reported.
 	 */
 	INIT_COMMAND_WITH_RESPONSE(&xp_cmd, TYPHOON_CMD_READ_VERSIONS);
-	if(typhoon_issue_command(tp, 1, &xp_cmd, 3, xp_resp) < 0) {
+	err = typhoon_issue_command(tp, 1, &xp_cmd, 3, xp_resp);
+	if (err < 0) {
 		err_msg = "Could not get Sleep Image version";
 		goto error_out_reset;
 	}
@@ -2453,7 +2455,8 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->features = dev->hw_features |
 		NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_RXCSUM;
 
-	if(register_netdev(dev) < 0) {
+	err = register_netdev(dev);
+	if (err < 0) {
 		err_msg = "unable to register netdev";
 		goto error_out_reset;
 	}
-- 
2.7.4

^ permalink raw reply related

* Re: Re: [PATCH v2 1/2] drivers: net: ethernet: 3com: fix return value
From: Thomas Preisner @ 2016-12-25  0:30 UTC (permalink / raw)
  To: sergei.shtylyov
  Cc: dave, netdev, linux-kernel, linux-kernel, milan.stephan+linux,
	thomas.preisner+linux
In-Reply-To: <4200be74-f7e7-db6b-a258-8fd178fef369@cogentembedded.com>

Hello.

On Sat, 2016-12-24 at 20:06 +0100, Sergei Shtylyov wrote:
>Hello!
>
>On 12/24/2016 03:02 PM, Thomas Preisner wrote:
>
>> In a few cases the err-variable is not set to a negative error code if a
>> function call fails and thus 0 is returned instead.
>> It may be better to set err to the appropriate negative error code
>> before returning.
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188841
>>
>> Reported-by: Pan Bian <bianpan2016@163.com>
>> Signed-off-by: Thomas Preisner <thomas.preisner+linux@fau.de>
>> Signed-off-by: Milan Stephan <milan.stephan+linux@fau.de>
>> ---
>>  drivers/net/ethernet/3com/typhoon.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/3com/typhoon.c b/drivers/net/ethernet/3com/typhoon.c
>> index a0cacbe..c88b88a 100644
>> --- a/drivers/net/ethernet/3com/typhoon.c
>> +++ b/drivers/net/ethernet/3com/typhoon.c
>[...]
>> @@ -2411,7 +2412,8 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>>  	 * later when we print out the version reported.
>>  	 */
>>  	INIT_COMMAND_WITH_RESPONSE(&xp_cmd, TYPHOON_CMD_READ_VERSIONS);
>> -	if(typhoon_issue_command(tp, 1, &xp_cmd, 3, xp_resp) < 0) {
>> +	err = typhoon_issue_command(tp, 1, &xp_cmd, 3, xp_resp);
>> +	if(err < 0) {
>
>    Need a space between *if* and (. Run your patches thru 
>scripts/checkpatch.pl before posting, please.

Those spaces were actually left out on purpose: the file in question (typhoon.c)
is missing those spaces between the statements (if, for, while) and the
following opening bracket pretty much always (except 2-3 times), and we figured
that it might be better to keep the coding style consistent since this might
as well have been intended by the original author.

>
>>  		err_msg = "Could not get Sleep Image version";
>>  		goto error_out_reset;
>>  	}
>> @@ -2453,7 +2455,8 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>>  	dev->features = dev->hw_features |
>>  		NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_RXCSUM;
>>
>> -	if(register_netdev(dev) < 0) {
>> +	err = register_netdev(dev);
>> +	if(err < 0) {
>
>    Same here.
>
>[...]
>
>MBR, Sergei

But of course we can provide you with a patchset including those spaces.

With Regards,
Milan and Thomas

^ permalink raw reply

* Re: [PATCHv2 1/5] sh_eth: add generic wake-on-lan support via magic packet
From: Sergei Shtylyov @ 2016-12-24 21:53 UTC (permalink / raw)
  To: Geert Uytterhoeven, Niklas Söderlund
  Cc: Simon Horman, netdev@vger.kernel.org, Linux-Renesas
In-Reply-To: <CAMuHMdXVVS_eLozvDnEz67_Fwg7jGDK0YEGh84rS+EAm4RFnag@mail.gmail.com>

Hello!

On 12/19/2016 08:11 PM, Geert Uytterhoeven wrote:

>>>> One quirk needed for WoL is that the module clock needs to be prevented
>>>> from being switched off by Runtime PM. To keep the clock alive the
>>>
>>>    I tried to find the code in question and failed, getting muddled in the
>>> RPM maze. Could you point at this code for my education? :-)
>>
>> In my investigation I observed this (simplified) call graph with regards
>> to clocks for suspend:
>>
>> pm_suspend

    There's a long list of the calls skipped here. :-)

>>   pm_clk_suspend
>>     clk_disable
>>       clk_core_disable
>>         cpg_mstp_clock_disable
>>
>> The interesting function here is clk_core_disable(). In that function an
>> 'enable_count' for each clock is decremented and the clock is only
>> turned off if the count reaches zero, hence cpg_mstp_clock_disable() is
>> only called if the counter reaches 0. At runtime the enable_count can be
>> displayed by examining /sys/kernel/debug/clk/clk_summary.

    Well, this is not new to me... it's more interesting how we get there... :-)
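
    For anybody following along, the counting described above boils down to
something like the user-space sketch below. The demo_clk type and its
helpers are made-up names, not the clk framework API, and the real code
of course also takes locks and touches hardware:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a gated module clock. */
struct demo_clk {
	unsigned int enable_count;	/* number of active users */
	bool hw_on;			/* whether the clock is ungated */
};

void demo_clk_enable(struct demo_clk *clk)
{
	if (clk->enable_count++ == 0)
		clk->hw_on = true;	/* first user ungates the clock */
}

void demo_clk_disable(struct demo_clk *clk)
{
	if (clk->enable_count == 0)
		return;			/* unbalanced disable, ignore */
	if (--clk->enable_count == 0)
		clk->hw_on = false;	/* last user gates the clock */
}
```

    An extra enable taken for WoL therefore keeps hw_on true across the
disable done on suspend, since the count only drops to 1 and not to 0.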

[...]
>>>> usage count of the clock. Then when Runtime PM decreases the clock usage
>>>> count it won't reach 0 and be switched off.
>>>
>>>    You mean it does this even though we don't call pm_runtime_put_sync()
>>> as done in sh_eth_close()?
>>
>> Yes.
>>
>> I had a look at the pm_runtime_* functions in include/linux/pm_runtime.h
>> and drivers/base/power/runtime.c and could not find any clock handling.
>> Maybe they only deal with power domains?
>
> There should be a generic way to prevent a device from being suspended.

    Indeed.

> This will make sure the module clock is not disabled, and the power domain
> (if applicable) is not powered down.

    I've just bumped into <linux/pm_wakeirq.h>, it looks promising...

[...]
> Gr{oetje,eeting}s,
>
>                         Geert

MBR, Sergei

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Daniel Borkmann @ 2016-12-24 21:03 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Shahar Klein, Or Gerlitz, Roi Dayan, Jiri Pirko,
	John Fastabend, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpX2X-WHbf1VxfQzh_-YUEqk=o6B+uYfYhj_45jJGaFSfQ@mail.gmail.com>

On 12/24/2016 08:34 AM, Cong Wang wrote:
> On Thu, Dec 22, 2016 at 4:26 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 12/22/2016 08:05 PM, Cong Wang wrote:
>>> On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann <daniel@iogearbox.net>
>>> wrote:
>>>>
>>>> Ok, you mean for net. In that case I prefer the smaller sized fix to be
>>>> honest. It also covers everything from the point where we fetch the chain
>>>> via cops->tcf_chain() to the end of the function, which is where most of
>>>> the complexity resides, and only the two mentioned commits do the relock,
>>>
>>> I really wish the problem is only about relocking, but look at the code,
>>> the deeper reason why we have this bug is the complexity of the logic
>>> inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
>>> idempotent; 2) the request logic itself is hard, because of tc filter
>>> design
>>> and implementation.
>>>
>>> This is why I worry more than just relocking.
>>
>> But do you have a concrete 2nd issue/bug you're seeing? It rather sounds to
>> me your argument is more about fear of complexity on tc framework itself.
>> I agree it's complex, and tc_ctl_tfilter() is quite big in itself, where it
>> would be good to reduce its complexity into smaller pieces. But it's not
>> really related to the fix itself, reducing complexity requires significantly
>> more and deeper work on the code. We can rework tc_ctl_tfilter() in net-next
>> to try to simplify it, sure, but I don't get why we have to discuss so much
>> on this matter in this context, really.
>
> Thanks for ignoring my point 1) above... You are dragging the discussion
> further.

I don't think so. The analysis and patch I proposed provide an explanation
of how we get into the observed endless loop and a logical fix for it. The
fix has been reviewed by others and tested extensively: the issue was easily
reproducible for the reporter, and after the fix it never occurred again.
The delta is absolutely simple and really low risk. Given that this function
has not changed much over time, distros with a much older base kernel than
the current stable ones could pick it up as well. The follow-up discussion
we're having here is dragging the focus away for everyone, and quite frankly
I'm getting tired of it. I have stated my preferences, you have stated yours,
and we're only going in circles, which isn't helpful in any way. The
discussion is no longer about some concrete bug in the logic to fix
(otherwise please name it). Hence my proposal that everything else can wait
and be done in net-next.

^ permalink raw reply

