Netdev List


* Re: [PATCH v2 1/4] net: hix5hd2_gmac: add generic compatible string
From: Dongpo Li @ 2016-12-13  2:02 UTC
  To: Rob Herring
  Cc: Mark Rutland, Michael Turquette, Stephen Boyd, Russell King,
	Zhangfei Gao, Yisen Zhuang, salil.mehta, David Miller,
	Arnd Bergmann, Andrew Lunn, Jiancheng Xue, benjamin.chenhao,
	caizhiyong, netdev, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAL_Jsq+d5xHsxMKxbWSYy3cDXnC+HQj3=wun1zxf5PZ_BYrD-Q@mail.gmail.com>



On 2016/12/12 22:21, Rob Herring wrote:
> On Mon, Dec 12, 2016 at 5:16 AM, Dongpo Li <lidongpo@hisilicon.com> wrote:
>> Hi Rob,
>>
>> On 2016/12/10 6:35, Rob Herring wrote:
>>> On Mon, Dec 05, 2016 at 09:27:58PM +0800, Dongpo Li wrote:
>>>> The "hix5hd2" is SoC name, add the generic ethernet driver name.
>>>> The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
>>>> the SG/TXCSUM/TSO/UFO features.
>>>>
>>>> Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
>>>> ---
>>>>  .../devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt    |  9 +++++++--
>>>>  drivers/net/ethernet/hisilicon/hix5hd2_gmac.c             | 15 +++++++++++----
>>>>  2 files changed, 18 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
>>>> index 75d398b..75920f0 100644
>>>> --- a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
>>>> +++ b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
>>>> @@ -1,7 +1,12 @@
>>>>  Hisilicon hix5hd2 gmac controller
>>>>
>>>>  Required properties:
>>>> -- compatible: should be "hisilicon,hix5hd2-gmac".
>>>> +- compatible: should contain one of the following SoC strings:
>>>> +    * "hisilicon,hix5hd2-gemac"
>>>> +    * "hisilicon,hi3798cv200-gemac"
>>>> +    and one of the following version string:
>>>> +    * "hisilicon,hisi-gemac-v1"
>>>> +    * "hisilicon,hisi-gemac-v2"
>>>
>>> What combinations are valid? I assume both chips don't have both v1 and
>>> v2. 2 SoCs and 2 versions so far, I don't think there is much point to
>>> have the v1 and v2 compatible strings.
>>>
>> The v1 and v2 are generic MAC compatible strings, many HiSilicon SoCs may
>> use the same MAC version. For example,
>> hix5hd2, hi3716cv200 SoCs use the v1 MAC version,
>> hi3798cv200, hi3516a SoCs use the v2 MAC version,
>> and there may be more SoCs added in future.
>> So I think the generic compatible strings are okay here.
>> Should I add the hi3716cv200, hi3516a SoCs compatible here?
> 
> Yes.
> 
>> Do you have any good advice?
>>
>>>>  - reg: specifies base physical address(s) and size of the device registers.
>>>>    The first region is the MAC register base and size.
>>>>    The second region is external interface control register.
>>>> @@ -20,7 +25,7 @@ Required properties:
>>>>
>>>>  Example:
>>>>      gmac0: ethernet@f9840000 {
>>>> -            compatible = "hisilicon,hix5hd2-gmac";
>>>> +            compatible = "hisilicon,hix5hd2-gemac", "hisilicon,hisi-gemac-v1";
>>>
>>> You can't just change compatible strings.
>>>
>> Okay, maybe I should name all the compatible string with the suffix "-gmac" instead of
>> "-gemac". This can keep the compatible strings with the same suffix. Is this okay?
>> Can I just add the generic compatible string without changing the SoCs compatible string?
>> Like following:
>>         gmac0: ethernet@f9840000 {
>>  -              compatible = "hisilicon,hix5hd2-gmac";
>>  +              compatible = "hisilicon,hix5hd2-gmac", "hisilicon,hisi-gmac-v1";
> 
> Yes, this is fine.
> 
Many thanks for your advice.
As the patch series has already been applied to the net-next branch,
how should I submit this compatible fix?
Should I send a new patch with a "Fixes: xxxx" tag?


    Regards,
    Dongpo

.
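
To illustrate why keeping the existing SoC string and appending a generic
one stays backward compatible, here is a minimal userspace sketch of
compatible-string matching. It is not the kernel's of_match_node()
implementation and not the hix5hd2_gmac driver's actual table:
match_entry, gmac_matches and match_compatible are hypothetical names, and
the "hisilicon,hisi-gmac-v2" and "hisilicon,hi3798cv200-gmac" strings are
assumed by analogy with the strings agreed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver-side table: compatible string -> MAC version. */
struct match_entry {
	const char *compatible;
	int hw_version;
};

static const struct match_entry gmac_matches[] = {
	{ "hisilicon,hisi-gmac-v1",     1 },
	{ "hisilicon,hisi-gmac-v2",     2 },	/* assumed name */
	{ "hisilicon,hix5hd2-gmac",     1 },	/* existing string, kept */
	{ "hisilicon,hi3798cv200-gmac", 2 },	/* assumed name */
	{ NULL, 0 }
};

/*
 * Walk the node's compatible list in order (most specific first) and
 * return the first table entry that matches, or NULL if none does.
 */
static const struct match_entry *
match_compatible(const char *const *node_compat, size_t count)
{
	for (size_t i = 0; i < count; i++)
		for (const struct match_entry *m = gmac_matches; m->compatible; m++)
			if (strcmp(node_compat[i], m->compatible) == 0)
				return m;
	return NULL;
}
```

With this table, an old device tree that lists only
"hisilicon,hix5hd2-gmac" still matches, and a new one that lists both the
SoC string and "hisilicon,hisi-gmac-v1" resolves to the same MAC version.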


* Re: [PATCH 0/6] USB support for Broadcom NSP SoC
From: Florian Fainelli @ 2016-12-13  2:20 UTC
  To: Yendapally Reddy Dhananjaya Reddy, Rob Herring, Mark Rutland,
	Russell King, Ray Jui, Scott Branden, Jon Mason, Florian Fainelli,
	Kishon Vijay Abraham I
  Cc: bcm-kernel-feedback-list, netdev, devicetree, linux-kernel,
	linux-arm-kernel
In-Reply-To: <1478683994-12008-1-git-send-email-yendapally.reddy@broadcom.com>

On 11/09/2016 01:33 AM, Yendapally Reddy Dhananjaya Reddy wrote:
> This patch set contains the usb support for Broadcom NSP SoC.
> The usb phy is connected through mdio interface. The mdio interface
> can be used to access either internal phys or external phys using a
> multiplexer.
> 
> The first patch provides the documentation details for mdio-mux and
> second patch provides the documentation details for usb3 phy. The third
> patch contains the mdio-mux support and fourth patch contains the
> changes to the mdio bus driver.
> 
> The fifth patch provides the phy driver and sixth patch provides the
> enable method for usb.
> 
> This patch series has been tested on NSP bcm958625HR board.
> This patch series is based on v4.9.0-rc1 and is available from github-
> repo: https://github.com/Broadcom/cygnus-linux.git
> branch:nsp-usb-v1

Can you resubmit this patch series with the feedback from Andrew, Rob
and Scott addressed?

Thanks!
-- 
Florian


* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Michael S. Tsirkin @ 2016-12-13  2:28 UTC
  To: Theodore Ts'o; +Cc: jasowang, netdev, nhorman, davem
In-Reply-To: <20161212233343.q5xlv55rc5npqaqp@thunk.org>

On Mon, Dec 12, 2016 at 06:33:43PM -0500, Theodore Ts'o wrote:
> Hi,
> 
> I was doing a last minute regression test of the ext4 tree before
> sending a pull request to Linus, which I do using gce-xfstests[1], and
> I found that using networking was broken on GCE on linux-next.  I was
> using next-20161209, and after bisecting things, I narrowed down the
> commit which causing things to break to commit 449000102901:
> "virtio-net: enable multiqueue by default".  Reverting this commit on
> top of next-20161209 fixed the problem.
> 
> [1] http://thunk.org/gce-xfstests
> 
> You can reproduce the problem for building the kernel for Google
> Compute Engine --- I use a config such as this [2], and then try to
> boot a kernel on a VM.  The way I do this involves booting a test
> appliance and then kexec'ing into the kernel to be tested[3], using a
> 2cpu configuration.  (GCE machine type: n1-standard-2)
> 
> [2] https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kernel-configs/ext4-x86_64-config-4.9
> [3] https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
> 
> You can then take a look at serial console using a command such as
> "gcloud compute instances get-serial-port-output <instance-name>", and
> you will get something like this (see attached).  The important bit is
> that the dhclient command is completely failing to be able to get a
> response from the network, from which I deduce that apparently that
> either networking send or receive or both seem to be badly affected by
> the commit in question.
> 
> Please let me know if there's anything I can do to help you debug this
> further.
> 
> Cheers,
> 
> 						- Ted

That's unfortunate, of course. It could be a hypervisor or
a guest kernel bug. ideas:
- does host have mq capability? how many queues?
- how about # of msix vectors?
- after you send something on tx queues,
  are interrupts arriving on rx queues?
- is problem rx or tx?
  set ip and arp manually and send a packet to known MAC,
  does it get there?

> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] Linux version 4.9.0-rc8-ext4-06387-g03e5cbd (tytso@tytso-ssd) (gcc version 4.9.2 (Debian 4.9.2-10) ) #9 SMP Mon Dec 12 04:50:16 UTC 2016
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] Command line: root=/dev/sda1 ro console=ttyS0,38400n8 elevator=noop console=ttyS0  fstestcfg=4k fstestset=-g,quick fstestexc= fstestopt=aex fstesttyp=ext4 fstestapi=1.3
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
> Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Load Kernel Modules.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Apply Kernel Variables...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounting Configuration File System...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounting FUSE Control File System...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounted FUSE Control File System.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounted Configuration File System.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Apply Kernel Variables.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Create Static Device Nodes in /dev.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting udev Kernel Device Manager...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Kernel Device Manager.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Coldplug all Devices.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting udev Wait for Complete Device Initialization...
> Dec 11 23:53:20 xfstests-201612120451 systemd-fsck[1659]: xfstests-root: clean, 56268/655360 files, 357439/2620928 blocks
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started File System Check on Root Device.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Remount Root and Kernel File Systems...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Remount Root and Kernel File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Various fixups to make systemd work better on Debian.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Load/Save Random Seed...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Local File Systems (Pre).
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Local File Systems (Pre).
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Load/Save Random Seed.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Wait for Complete Device Initialization.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Activation of LVM2 logical volumes...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Copy rules generated while the root was ro...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS0.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS1.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Copy rules generated while the root was ro.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS2.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS3.
> Dec 11 23:53:20 xfstests-201612120451 systemd-udevd[2568]: could not open moddep file '/lib/modules/4.9.0-rc8-ext4-06387-g03e5cbd/modules.dep.bin'
> Dec 11 23:53:20 xfstests-201612120451 lvm[2579]: No volume groups found
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Activation of LVM2 logical volumes.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Encrypted Volumes.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Encrypted Volumes.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Activation of LVM2 logical volumes...
> Dec 11 23:53:20 xfstests-201612120451 lvm[2625]: No volume groups found
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Activation of LVM2 logical volumes.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
> Dec 11 23:53:20 xfstests-201612120451 lvm[2627]: No volume groups found
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Local File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Local File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Remote File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Remote File Systems.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Create Volatile Files and Directories...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting LSB: Generate ssh host keys if they do not exist...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting LSB: Raise network interfaces....
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Create Volatile Files and Directories.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started LSB: Generate ssh host keys if they do not exist.
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Update UTMP about System Boot/Shutdown...
> Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Update UTMP about System Boot/Shutdown.
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Internet Systems Consortium DHCP Client 4.3.1
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Copyright 2004-2014 Internet Systems Consortium.
> Dec 11 23:53:20 xfstests-201612120451 dhclient: All rights reserved.
> Dec 11 23:53:20 xfstests-201612120451 dhclient: For info, please visit https://www.isc.org/software/dhcp/
> Dec 11 23:53:20 xfstests-201612120451 dhclient: 
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Configuring network interfaces...Internet Systems Consortium DHCP Client 4.3.1
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Copyright 2004-2014 Internet Systems Consortium.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: All rights reserved.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: For info, please visit https://www.isc.org/software/dhcp/
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Listening on LPF/eth0/42:01:0a:f0:00:03
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Sending on   LPF/eth0/42:01:0a:f0:00:03
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Sending on   Socket/fallback
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Listening on LPF/eth0/42:01:0a:f0:00:03
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Sending on   LPF/eth0/42:01:0a:f0:00:03
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Sending on   Socket/fallback
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCP[^[[32m  OK  ^[[0m] DISCOVER on eth0 to 255.255.255.255 port 67 interval 8
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 17
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 17
> Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
> Dec 11 23:53:20 xfstests-201612120451 dhclient: No DHCPOFFERS received.
> Dec 11 23:53:20 xfstests-201612120451 dhclient: Trying recorded lease 10.240.0.3
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: No DHCPOFFERS received.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Trying recorded lease 10.240.0.3
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: connect: Network is unreachable
> Dec 11 23:53:20 xfstests-201612120451 logger: /etc/dhcp/dhclient-exit-hooks returned non-zero exit status 2
> Dec 11 23:53:20 xfstests-201612120451 dhclient: bound: renewal in 38598 seconds.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: bound: renewal in 38598 seconds.
> Dec 11 23:53:20 xfstests-201612120451 networking[2633]: done.


* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Theodore Ts'o @ 2016-12-13  3:12 UTC
  To: Michael S. Tsirkin; +Cc: jasowang, netdev, nhorman, davem
In-Reply-To: <20161213042057-mutt-send-email-mst@kernel.org>

On Tue, Dec 13, 2016 at 04:28:17AM +0200, Michael S. Tsirkin wrote:
> 
> That's unfortunate, of course. It could be a hypervisor or
> a guest kernel bug. ideas:
> - does host have mq capability? how many queues?
> - how about # of msix vectors?
> - after you send something on tx queues,
>   are interrupts arriving on rx queues?
> - is problem rx or tx?
>   set ip and arp manually and send a packet to known MAC,
>   does it get there?

Sorry, I don't know how to debug virtio-net.  Given that it's in a
cloud environment, I also can't set ip addresses manually, since ip
addresses are set automatically.

If you can send me a patch, I'm happy to apply it and send you back
results.

I can say that I've had _zero_ problems using pretty much any kernel
from 3.10 to 4.9 using Google Compute Engine.  The commit I referenced
caused things to stop working.  So in terms of regression, this is
definitely a regression, and it's definitely caused by commit
449000102901.  Even if it is a hypervisor "bug", I'm pretty sure I
know what Linus will say if I ask him to revert it.  Linux kernels are
expected to work around hardware bugs, and breaking users just because
hardware is "broken" by some definition is generally not considered
friendly, especially when it has been working for years and years before
some commit "fixed" things.

I would very much like to work with you to fix it, but I will need
your help, since virtio-net doesn't seem to print any informational
messages during the boot sequence, and I don't know the best way to debug
it.

Cheers,

						- Ted


* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Michael S. Tsirkin @ 2016-12-13  3:30 UTC
  To: Theodore Ts'o; +Cc: jasowang, netdev, nhorman, davem
In-Reply-To: <20161213031243.avq5g5m5r5ylcnnk@thunk.org>

On Mon, Dec 12, 2016 at 10:12:43PM -0500, Theodore Ts'o wrote:
> On Tue, Dec 13, 2016 at 04:28:17AM +0200, Michael S. Tsirkin wrote:
> > 
> > That's unfortunate, of course. It could be a hypervisor or
> > a guest kernel bug. ideas:
> > - does host have mq capability? how many queues?
> > - how about # of msix vectors?
> > - after you send something on tx queues,
> >   are interrupts arriving on rx queues?
> > - is problem rx or tx?
> >   set ip and arp manually and send a packet to known MAC,
> >   does it get there?
> 
> Sorry, I don't know how to debug virtio-net.  Given that it's in a
> cloud environment, I also can't set ip addresses manually, since ip
> addresses are set manually.

OK, but you can send raw ethernet frames presumably?


> If you can send me a patch, I'm happy to apply it and send you back
> results.

Let's start with collecting stats from sysfs for this device.
pls get features bitmap from there,
pls get /proc/interrupts mappings,
and pls use lspci to dump pci config.
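
A minimal sketch of the first check, assuming the usual sysfs layout: the
features file under /sys/bus/virtio/devices/<dev>/ is taken to be a string
of '0' and '1' characters in which character i is feature bit i, with
VIRTIO_NET_F_CTRL_VQ at bit 17 and VIRTIO_NET_F_MQ at bit 22 as in the
virtio spec. The helper names and the device name virtio0 in the usage
note are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define VIRTIO_NET_F_CTRL_VQ 17	/* control virtqueue available */
#define VIRTIO_NET_F_MQ      22	/* device supports multiqueue */

/* bits is the sysfs string; character i is '1' when feature bit i is set. */
static bool virtio_feature_set(const char *bits, unsigned int bit)
{
	return bit < strlen(bits) && bits[bit] == '1';
}

/* Read the features string from path into buf; returns false on failure. */
static bool read_features(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;
	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	if (ok)
		buf[strcspn(buf, "\n")] = '\0';	/* drop the trailing newline */
	return ok;
}
```

For example, call read_features() on
/sys/bus/virtio/devices/virtio0/features and then test the result with
virtio_feature_set(buf, VIRTIO_NET_F_MQ) and
virtio_feature_set(buf, VIRTIO_NET_F_CTRL_VQ).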


> I can say that I've had _zero_ problems using pretty much any kernel
> from 3.10 to 4.9 using Google Compute Engine.  The commit I referenced
> caused things to stop working.  So in terms of regression, this is
> definitely a regression, and it's definitely caused by commit
> 449000102901.  Even if it is a hypervisor "bug", I'm pretty sure I
> know what Linus will say if I ask him to revert it.  Linux kernels are
> expected to work around hardware bugs, and breaking users just because
> hardware is "broken" by some definition is generally not considered
> friendly, especially when has been working for years and years before
> some commit "fixed" things.

I'm open to limiting new features to virtio 1 mode just to
avoid the hassle of dealing with legacy hypervisors.
But let's not argue about it until we know the root cause.

> 
> I would very much like to work with you to fix it, but I will need
> your help, since virtio-net doesn't seem to print any informational
> during the boot sequence, and I don't know how the best way to debug
> it.
> 
> Cheers,
> 
> 						- Ted


Let's start with debugging it like any PCI NIC.


-- 
MST


* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Jason Wang @ 2016-12-13  3:43 UTC
  To: Theodore Ts'o, Michael S. Tsirkin; +Cc: netdev, nhorman, davem
In-Reply-To: <20161213031243.avq5g5m5r5ylcnnk@thunk.org>



On 2016-12-13 11:12, Theodore Ts'o wrote:
> On Tue, Dec 13, 2016 at 04:28:17AM +0200, Michael S. Tsirkin wrote:
>> That's unfortunate, of course. It could be a hypervisor or
>> a guest kernel bug. ideas:
>> - does host have mq capability? how many queues?
>> - how about # of msix vectors?
>> - after you send something on tx queues,
>>    are interrupts arriving on rx queues?
>> - is problem rx or tx?
>>    set ip and arp manually and send a packet to known MAC,
>>    does it get there?
> Sorry, I don't know how to debug virtio-net.  Given that it's in a
> cloud environment, I also can't set ip addresses manually, since ip
> addresses are set manually.
>
> If you can send me a patch, I'm happy to apply it and send you back
> results.
>
> I can say that I've had _zero_ problems using pretty much any kernel
> from 3.10 to 4.9 using Google Compute Engine.  The commit I referenced
> caused things to stop working.  So in terms of regression, this is
> definitely a regression, and it's definitely caused by commit
> 449000102901.  Even if it is a hypervisor "bug", I'm pretty sure I
> know what Linus will say if I ask him to revert it.  Linux kernels are
> expected to work around hardware bugs, and breaking users just because
> hardware is "broken" by some definition is generally not considered
> friendly, especially when has been working for years and years before
> some commit "fixed" things.
>
> I would very much like to work with you to fix it, but I will need
> your help, since virtio-net doesn't seem to print any informational
> during the boot sequence, and I don't know how the best way to debug
> it.
>
> Cheers,
>
> 						- Ted

Thanks for reporting this issue. Looks like I blindly set the affinity 
instead of queues during probe. Could you please try the following patch 
to see if it works?

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b425fa1..fe9f772 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1930,7 +1930,9 @@ static int virtnet_probe(struct virtio_device *vdev)
                 goto free_unregister_netdev;
         }

-       virtnet_set_affinity(vi);
+       rtnl_lock();
+       virtnet_set_queues(vi, vi->curr_queue_pairs);
+       rtnl_unlock();

         /* Assume link up if device can't report link status,
            otherwise get link status from config. */


* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Selvin Xavier @ 2016-12-13  3:54 UTC
  To: Doug Ledford; +Cc: linux-rdma, netdev
In-Reply-To: <23e26353-4317-2836-9f94-d1fc3274a770@redhat.com>

On Tue, Dec 13, 2016 at 5:22 AM, Doug Ledford <dledford@redhat.com> wrote:
>
> There are outstanding review comments to be addressed still yet, and the
> v2 patchset doesn't compile for me in 0day testing.  I'm going to bounce
> this one to 4.11.

I will address all review comments and fix the 0day compilation error
and post a v3 soon.

Thanks,
Selvin Xavier


* Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Theodore Ts'o @ 2016-12-13  4:19 UTC
  To: Jason Wang; +Cc: Michael S. Tsirkin, netdev, nhorman, davem
In-Reply-To: <60cd312f-86f9-47e9-0c72-f4c2109e2f87@redhat.com>

On Tue, Dec 13, 2016 at 11:43:00AM +0800, Jason Wang wrote:
> Thanks for reporting this issue. Looks like I blindly set the affinity
> instead of queues during probe. Could you please try the following patch to
> see if it works?

This fixed things, thanks!!

						- Ted
						

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b425fa1..fe9f772 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1930,7 +1930,9 @@ static int virtnet_probe(struct virtio_device *vdev)
>                 goto free_unregister_netdev;
>         }
> 
> -       virtnet_set_affinity(vi);
> +       rtnl_lock();
> +       virtnet_set_queues(vi, vi->curr_queue_pairs);
> +       rtnl_unlock();
> 
>         /* Assume link up if device can't report link status,
>            otherwise get link status from config. */
> 
> 


* Re: [PATCH V2 for-next 00/11] Code improvements & fixes for HNS RoCE driver
From: Doug Ledford @ 2016-12-13  4:34 UTC
  To: Salil Mehta
  Cc: xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
	oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
	mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161115181053.399568-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>



On 11/15/2016 1:10 PM, Salil Mehta wrote:
> This patchset introduces some code improvements and fixes
> for the identified problems in the HNS RoCE driver.
> 
> Lijun Ou (4):
>   IB/hns: Add the interface for querying QP1
>   IB/hns: add self loopback for CM
>   IB/hns: Modify the condition of notifying hardware loopback
>   IB/hns: Fix the bug for qp state in hns_roce_v1_m_qp()
> 
> Salil Mehta (1):
>   IB/hns: Fix for Checkpatch.pl comment style errors
> 
> Shaobo Xu (1):
>   IB/hns: Implement the add_gid/del_gid and optimize the GIDs
>     management
> 
> Wei Hu (Xavier) (5):
>   IB/hns: Add code for refreshing CQ CI using TPTR
>   IB/hns: Optimize the logic of allocating memory using APIs
>   IB/hns: Modify the macro for the timeout when cmd process
>   IB/hns: Modify query info named port_num when querying RC QP
>   IB/hns: Change qpn allocation to round-robin mode.
> 
>  drivers/infiniband/hw/hns/hns_roce_alloc.c  |   11 +-
>  drivers/infiniband/hw/hns/hns_roce_cmd.c    |    8 +-
>  drivers/infiniband/hw/hns/hns_roce_cmd.h    |    7 +-
>  drivers/infiniband/hw/hns/hns_roce_common.h |    2 -
>  drivers/infiniband/hw/hns/hns_roce_cq.c     |   17 +-
>  drivers/infiniband/hw/hns/hns_roce_device.h |   45 ++--
>  drivers/infiniband/hw/hns/hns_roce_eq.c     |    6 +-
>  drivers/infiniband/hw/hns/hns_roce_hem.c    |    6 +-
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  267 +++++++++++++++++------
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   17 +-
>  drivers/infiniband/hw/hns/hns_roce_main.c   |  311 +++++++--------------------
>  drivers/infiniband/hw/hns/hns_roce_mr.c     |   21 +-
>  drivers/infiniband/hw/hns/hns_roce_pd.c     |    5 +-
>  drivers/infiniband/hw/hns/hns_roce_qp.c     |    2 +-
>  14 files changed, 363 insertions(+), 362 deletions(-)
> 

Series applied, thanks.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD




* Re: [PATCH v2] audit: use proper refcount locking on audit_sock
From: Richard Guy Briggs @ 2016-12-13  4:49 UTC
  To: Paul Moore
  Cc: netdev, linux-kernel, linux-audit, edumazet, xiyou.wangcong,
	dvyukov
In-Reply-To: <CAHC9VhTt5Pbw+LzVwRV1E==drwrH0ihUJvJkWwDgOA35OUFV2g@mail.gmail.com>

On 2016-12-12 12:10, Paul Moore wrote:
> On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > Resetting audit_sock appears to be racy.
> >
> > audit_sock was being copied and dereferenced without using a refcount on
> > the source sock.
> >
> > Bump the refcount on the underlying sock when we store a refrence in
> > audit_sock and release it when we reset audit_sock.  audit_sock
> > modification needs the audit_cmd_mutex.
> >
> > See: https://lkml.org/lkml/2016/11/26/232
> >
> > Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
> > <xiyou.wangcong@gmail.com> on ideas how to fix it.
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> > There has been a lot of change in the audit code that is about to go
> > upstream to address audit queue issues.  This patch is based on the
> > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > ---
> >  kernel/audit.c |   34 ++++++++++++++++++++++++++++------
> >  1 files changed, 28 insertions(+), 6 deletions(-)
> 
> This is coming in pretty late for the v4.10 merge window, much later
> than I would usually take things, but this is arguably important, and
> (at first glance) relatively low risk - what testing have you done on
> this?

At this point, compile and boot, and I'm able to compile and run the
supplied test code without any issues.
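
One detail of the errno test in the diff below can be checked with plain
arithmetic: errno codes are small integers rather than bit flags, and
because EPERM is 1, the expression (-ENOMEM|-EPERM|-ECONNREFUSED)
evaluates to -1, so "rc & mask" is nonzero for every nonzero rc. A minimal
userspace sketch, assuming Linux errno values; mask_test and
is_fatal_send_error are hypothetical helpers, not functions from
kernel/audit.c:

```c
#include <errno.h>
#include <stdbool.h>

/* The bitwise form used in the diff: nonzero for any nonzero rc. */
static int mask_test(int rc)
{
	return rc & (-ENOMEM | -EPERM | -ECONNREFUSED);
}

/* An explicit comparison against each error code of interest. */
static bool is_fatal_send_error(int rc)
{
	switch (rc) {
	case -ENOMEM:
	case -EPERM:
	case -ECONNREFUSED:
		return true;
	default:
		return false;
	}
}
```

With these helpers, mask_test(-EAGAIN) is nonzero even though -EAGAIN is
not one of the three listed codes, while is_fatal_send_error(-EAGAIN)
returns false.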

> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index f20eee0..439f7f3 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> >         struct sk_buff *skb;
> >
> >         /* break the connection */
> > +       sock_put(audit_sock);
> >         audit_pid = 0;
> > +       audit_nlk_portid = 0;
> >         audit_sock = NULL;
> >
> >         /* flush all of the retry queue to the hold queue */
> > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> >         if (rc >= 0) {
> >                 consume_skb(skb);
> >                 rc = 0;
> > +       } else {
> > +               if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> > +                       mutex_lock(&audit_cmd_mutex);
> > +                       auditd_reset();
> > +                       mutex_unlock(&audit_cmd_mutex);
> > +               }
> >         }
> >
> >         return rc;
> > @@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
> >
> >                                 auditd = 0;
> >                                 if (AUDITD_BAD(rc, reschedule)) {
> > +                                       mutex_lock(&audit_cmd_mutex);
> >                                         auditd_reset();
> > +                                       mutex_unlock(&audit_cmd_mutex);
> >                                         reschedule = 0;
> >                                 }
> >                         } else
> > @@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
> >                                 auditd = 0;
> >                                 if (AUDITD_BAD(rc, reschedule)) {
> >                                         kauditd_hold_skb(skb);
> > +                                       mutex_lock(&audit_cmd_mutex);
> >                                         auditd_reset();
> > +                                       mutex_unlock(&audit_cmd_mutex);
> >                                         reschedule = 0;
> >                                 } else
> >                                         /* temporary problem (we hope), queue
> > @@ -623,7 +635,9 @@ quick_loop:
> >                                 if (rc) {
> >                                         auditd = 0;
> >                                         if (AUDITD_BAD(rc, reschedule)) {
> > +                                               mutex_lock(&audit_cmd_mutex);
> >                                                 auditd_reset();
> > +                                               mutex_unlock(&audit_cmd_mutex);
> >                                                 reschedule = 0;
> >                                         }
> >
> > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                                 return -EACCES;
> >                         }
> >                         if (audit_pid && new_pid &&
> > -                           audit_replace(requesting_pid) != -ECONNREFUSED) {
> > +                           (audit_replace(requesting_pid) & (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> >                                 audit_log_config_change("audit_pid", new_pid, audit_pid, 0);
> >                                 return -EEXIST;
> >                         }
> >                         if (audit_enabled != AUDIT_OFF)
> >                                 audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
> > -                       audit_pid = new_pid;
> > -                       audit_nlk_portid = NETLINK_CB(skb).portid;
> > -                       audit_sock = skb->sk;
> > -                       if (!new_pid)
> > +                       if (new_pid) {
> > +                               if (audit_sock)
> > +                                       sock_put(audit_sock);
> > +                               audit_pid = new_pid;
> > +                               audit_nlk_portid = NETLINK_CB(skb).portid;
> > +                               sock_hold(skb->sk);
> > +                               audit_sock = skb->sk;
> > +                       } else {
> >                                 auditd_reset();
> > +                       }
> >                         wake_up_interruptible(&kauditd_wait);
> >                 }
> >                 if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
> > @@ -1283,8 +1302,11 @@ static void __net_exit audit_net_exit(struct net *net)
> >  {
> >         struct audit_net *aunet = net_generic(net, audit_net_id);
> >         struct sock *sock = aunet->nlsk;
> > -       if (sock == audit_sock)
> > +       if (sock == audit_sock) {
> > +               mutex_lock(&audit_cmd_mutex);
> >                 auditd_reset();
> > +               mutex_unlock(&audit_cmd_mutex);
> > +       }
> >
> >         RCU_INIT_POINTER(aunet->nlsk, NULL);
> >         synchronize_net();
> > --
> > 1.7.1
> >
> > --
> > Linux-audit mailing list
> > Linux-audit@redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-audit
> 
> 
> 
> -- 
> paul moore
> www.paul-moore.com

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Selvin Xavier @ 2016-12-13  4:52 UTC (permalink / raw)
  To: jtoppins-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <9cf03e2b-a16d-19ec-a8ce-14f24272bf6a-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Mon, Dec 12, 2016 at 10:24 PM, Jonathan Toppins <jtoppins-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> CHECK   drivers/infiniband/hw/bnxtre/bnxt_re_debugfs.c
>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_res.c
> drivers/infiniband/hw/bnxtre/bnxt_qplib_res.c:729:6: warning: symbol
> 'bnxt_qplib_cleanup_pkey_tbl' was not declared. Should it be static?
I will remove this warning in v3 patch set.

>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.c
>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c
> drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1015:22: warning: context
> imbalance in 'bnxt_qplib_lock_cqs' - wrong count at exit
> drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1030:28: warning: context
> imbalance in 'bnxt_qplib_unlock_cqs' - unexpected unlock
The above two are false positives, since locking and unlocking are
handled in two separate functions. This is a wrapper to lock/unlock
both SQ and RQ CQ locks. Functionally it is ok  since
bnxt_qplib_unlock_cqs is called just after the critical section and
both locks are freed in order. I think we can ignore this warning.


>   MODPOST 2 modules
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] audit: use proper refcount locking on audit_sock
From: Richard Guy Briggs @ 2016-12-13  5:10 UTC (permalink / raw)
  To: Paul Moore
  Cc: netdev, linux-kernel, linux-audit, edumazet, xiyou.wangcong,
	dvyukov
In-Reply-To: <CAHC9VhSjU+-KDSUcjHZAPEwCDbyLYGvkmXYcUqSfSsa1+=DsRw@mail.gmail.com>

On 2016-12-12 15:18, Paul Moore wrote:
> On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > Resetting audit_sock appears to be racy.
> >
> > audit_sock was being copied and dereferenced without using a refcount on
> > the source sock.
> >
> > Bump the refcount on the underlying sock when we store a reference in
> > audit_sock and release it when we reset audit_sock.  audit_sock
> > modification needs the audit_cmd_mutex.
> >
> > See: https://lkml.org/lkml/2016/11/26/232
> >
> > Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
> > <xiyou.wangcong@gmail.com> on ideas how to fix it.
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> > There has been a lot of change in the audit code that is about to go
> > upstream to address audit queue issues.  This patch is based on the
> > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > ---
> >  kernel/audit.c |   34 ++++++++++++++++++++++++++++------
> >  1 files changed, 28 insertions(+), 6 deletions(-)
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index f20eee0..439f7f3 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> >         struct sk_buff *skb;
> >
> >         /* break the connection */
> > +       sock_put(audit_sock);
> >         audit_pid = 0;
> > +       audit_nlk_portid = 0;
> >         audit_sock = NULL;
> >
> >         /* flush all of the retry queue to the hold queue */
> > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> >         if (rc >= 0) {
> >                 consume_skb(skb);
> >                 rc = 0;
> > +       } else {
> > +               if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> 
> I dislike the way you wrote this because instead of simply looking at
> this to see if it is correct I need to sort out all the bits and find out
> if there are other error codes that could run afoul of this check ...
> make it simple, e.g. (rc == -ENOMEM || rc == -EPERM || ...).
> Actually, since EPERM is 1, -EPERM (-1 in two's complement is
> 0xffffffff) is going to cause this to be true for pretty much any
> value of rc, yes?

Yes, you are correct.  What is needed there is a logical OR of the
results of comparing the return code against each error code, rather
than bitwise OR-ing the error codes together first to save a step.
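
To illustrate the point, here is a minimal userspace sketch (not the audit
code; the helper names fatal_bitwise() and fatal_compare() are made up for
this example). Because -EPERM is -1, the OR-ed mask is all ones on a
two's complement machine, so the bitwise form is true for any non-zero rc:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The check as posted: OR the negated codes together, then mask rc. */
static bool fatal_bitwise(int rc)
{
	/* -EPERM == -1, so the mask is all ones and this reduces to rc != 0. */
	return (rc & (-ENOMEM | -EPERM | -ECONNREFUSED)) != 0;
}

/* The suggested form: compare rc against each code individually. */
static bool fatal_compare(int rc)
{
	return rc == -ENOMEM || rc == -EPERM || rc == -ECONNREFUSED;
}
```

With these, fatal_bitwise(-EAGAIN) is true even though -EAGAIN is not in the
list, while fatal_compare(-EAGAIN) is false.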

> > +                       mutex_lock(&audit_cmd_mutex);
> > +                       auditd_reset();
> > +                       mutex_unlock(&audit_cmd_mutex);
> > +               }
> 
> The code in audit#next handles netlink_unicast() errors in
> kauditd_thread() and you are adding error handling code here in
> kauditd_send_unicast_skb() ... that's messy.  I don't care too much
> where the auditd_reset() call is made, but let's only do it in one
> function; FWIW, I originally put the error handling code in
> kauditd_thread() because there was other error handling code that
> needed to be done in that scope so it resulted in cleaner code.

Hmmm, I seem to remember it not returning the return code and I thought
I had changed it to do so, but I see now that it was already there.
Agreed, I needlessly duplicated that error handling.

> Related, I see you are now considering ENOMEM to be a fatal condition,
> that differs from the AUDITD_BAD macro in kauditd_thread(); this
> difference needs to be reconciled.

Also correct about -EPERM now that I check back to the intent of commit
32a1dbaece7e ("audit: try harder to send to auditd upon netlink
failure")

> Finally, you should update the comment header block for auditd_reset()
> that it needs to be called with the audit_cmd_mutex held.

Yup.

> > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                                 return -EACCES;
> >                         }
> >                         if (audit_pid && new_pid &&
> > -                           audit_replace(requesting_pid) != -ECONNREFUSED) {
> > +                           (audit_replace(requesting_pid) & (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> 
> Do we simply want to treat any error here as fatal, and not just
> ECONN/EPERM/ENOMEM?  If not, let's come up with a single macro to
> handle the fatal netlink_unicast() return codes so we have some chance
> to keep things consistent in the future.

I'll work through this before I post another patch...

> paul moore

- RGB

^ permalink raw reply

* Re: [PATCH net-next 00/27] Remove VLAN CFI bit abuse
From: Michał Mirosław @ 2016-12-13  5:18 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

On Tue, Dec 13, 2016 at 01:12:32AM +0100, Michał Mirosław wrote:
> Dear NetDevs
> 
> This series removes an abuse of VLAN CFI bit in Linux networking stack.
> Currently Linux always clears the bit on outgoing traffic and presents
> it cleared to userspace (even via AF_PACKET/tcpdump when hw-accelerated).
[...]

I just noticed net-next got closed a few days ago. I'll resend after it
opens again.  Nevertheless, review is appreciated.

Best Regards,
Michał Mirosław

^ permalink raw reply

* [PATCH iproute2 2/2] tc: tunnel_key: Add tc-tunnel_key man page to Makefile
From: Roi Dayan @ 2016-12-13  5:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Amir Vadai, Hadar Hen Zion, Roi Dayan
In-Reply-To: <1481607232-1342-1-git-send-email-roid@mellanox.com>

To be installed with the other man pages.

Fixes: d57639a475a9 ("tc/act_tunnel: Introduce ip tunnel action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
---
 man/man8/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/man/man8/Makefile b/man/man8/Makefile
index de6f249..d4cb01a 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -17,6 +17,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 rtpr.8 ss.
 	tc-tcindex.8 tc-u32.8 tc-matchall.8 \
 	tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8 \
 	tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8  tc-ife.8 tc-skbmod.8 \
+	tc-tunnel_key.8 \
 	devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8 devlink-sb.8
 
 all: $(TARGETS)
-- 
2.7.4

^ permalink raw reply related

* [PATCH iproute2 0/2] Man page fixes
From: Roi Dayan @ 2016-12-13  5:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Amir Vadai, Hadar Hen Zion, Roi Dayan

Hi,

These 2 patches are man page related only.
The first fixes a typo and the second adds a missing man page to the Makefile.

Thanks

Roi Dayan (2):
  tc: flower: Fix typo in the flower man page
  tc: tunnel_key: Add tc-tunnel_key man page to Makefile

 man/man8/Makefile    | 1 +
 man/man8/tc-flower.8 | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH iproute2 1/2] tc: flower: Fix typo in the flower man page
From: Roi Dayan @ 2016-12-13  5:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Amir Vadai, Hadar Hen Zion, Roi Dayan
In-Reply-To: <1481607232-1342-1-git-send-email-roid@mellanox.com>

Replace vlan_eth_type with vlan_ethtype.

Fixes: 745d91726006 ("tc: flower: Introduce vlan support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
---
 man/man8/tc-flower.8 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 90fdfba..1cea54d 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -27,7 +27,7 @@ flower \- flow based traffic control filter
 .IR VID " | "
 .B vlan_prio
 .IR PRIORITY " | "
-.BR vlan_eth_type " { " ipv4 " | " ipv6 " | "
+.BR vlan_ethtype " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
 .BR ip_proto " { " tcp " | " udp " | " sctp " | " icmp " | " icmpv6 " | "
 .IR IP_PROTO " } | { "
@@ -87,7 +87,7 @@ Match on vlan tag priority.
 .I PRIORITY
 is an unsigned 3bit value in decimal format.
 .TP
-.BI vlan_eth_type " VLAN_ETH_TYPE"
+.BI vlan_ethtype " VLAN_ETH_TYPE"
 Match on layer three protocol.
 .I ETH_TYPE
 may be either
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Selvin Xavier @ 2016-12-13  6:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161212170701.GA28387-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Mon, Dec 12, 2016 at 10:37 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Sat, Dec 10, 2016 at 11:06:58AM +0530, Selvin Xavier wrote:
>> On Fri, Dec 9, 2016 at 12:17 PM, Selvin Xavier
>> <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org> wrote:
>> > I am preparing a git repository with these changes as per Jason's
>> > comment and will share the details later today.
>>
>> Please use bnxt_re branch in this git repository.
>>
>> https://github.com/Broadcom/linux-rdma-nxt.git
>
> Why are you using __packed in bnxt_re_uverbs_abi.h ? that doesn't seem
> necessary. It is a good idea to make sure all those structures are a
> multiple of 64 bits (add explicit reserved fields), and make sure you
> test 32 bit verbs as well.

Will take care in v3.
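
A rough sketch of what Jason's suggestion could look like, assuming made-up
structures (the names and fields below are hypothetical and are not taken
from bnxt_re_uverbs_abi.h). Explicit reserved fields bring each structure
to a multiple of 64 bits, so 32-bit and 64-bit userspace see the same
layout without __packed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical response structure: three 32-bit fields plus padding. */
struct demo_cq_resp {
	uint32_t cqid;
	uint32_t tail;
	uint32_t phase;
	uint32_t rsvd;		/* explicit padding up to a multiple of 64 bits */
};

/* Hypothetical request structure mixing 64-bit and 32-bit members. */
struct demo_qp_req {
	uint64_t qpsva;
	uint64_t qprva;
	uint32_t flags;
	uint32_t rsvd;		/* keeps the size identical on 32-bit ABIs */
};

/* Compile-time checks that the layout is what the ABI expects (C11). */
static_assert(sizeof(struct demo_cq_resp) % 8 == 0, "not a multiple of 64 bits");
static_assert(sizeof(struct demo_qp_req) % 8 == 0, "not a multiple of 64 bits");
static_assert(offsetof(struct demo_qp_req, flags) == 16, "unexpected offset");
```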

>
> Why are you using debugfs just to export counters? Isn't the core code
> counter framework good enough?

I agree that some of the counters exported by this patch set (tx and
rx bytes/pkts etc.) can be exported through the core counters. I will
try adding this support in v3; if not, I will post it as a separate
patch. debugfs was introduced more for the future, in case any HW
specific data needs to be displayed. As of now, it tracks only the
count of resources (CQ/MR/QPs) active at any given point. So it's ok
to skip this patch from this series.

>
> Please try and avoid writing functions as defines (eg rdev_to_dev,
> to_bnxt_re, SQE_PG, RCFW_CMDQ_COOKIE, PTR_PG etc)
>
Sure, will take care in v3.
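
For illustration only, a sketch of the macro-versus-function difference
Jason refers to. The names and the page-size constant are hypothetical
stand-ins, not the driver's actual definitions of PTR_PG or to_bnxt_re:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PER_PG 64u	/* hypothetical number of entries per page */

/* Macro form: no type checking, the argument is substituted textually. */
#define DEMO_PTR_PG(x)	((x) / DEMO_MAX_PER_PG)

/* Function form: typed parameter, argument evaluated exactly once. */
static inline unsigned int demo_ptr_pg(unsigned int idx)
{
	return idx / DEMO_MAX_PER_PG;
}

struct demo_dev { int id; };
struct demo_re_dev { int flags; struct demo_dev ibdev; };

/* container_of()-style lookup written as a function instead of a define. */
static inline struct demo_re_dev *to_demo_re_dev(struct demo_dev *ibdev)
{
	return (struct demo_re_dev *)((char *)ibdev -
				      offsetof(struct demo_re_dev, ibdev));
}
```

The function versions give the compiler a chance to complain when the wrong
pointer type is passed, which the macro forms silently accept.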

> There is something wrong with the tabs and spaces (see
> https://github.com/Broadcom/linux-rdma-nxt/blob/03e23b087f7e86ea28656273994e065827210ce5/drivers/infiniband/hw/bnxtre/bnxt_re_hsi.h)
>
> FWIW, I really dislike the column alignment style, it is so hard to
> maintain..
This file contains the macro defines for the FW/HW structures and is
auto-generated. Some of these auto-generated defines are very long,
which makes the lines longer than 80 characters. I will fix whatever
is possible and include it in the v3 set.

>
> Jason

Thanks,
Selvin

^ permalink raw reply

* Re: [PATCH V2 13/22] bnxt_re: Support QP verbs
From: Selvin Xavier @ 2016-12-13  6:08 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Eddie Wai, Devesh Sharma,
	Somnath Kotur, Sriharsha Basavapatna
In-Reply-To: <20161212182737.GC8204-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>

On Mon, Dec 12, 2016 at 11:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> It can help to review if you break this function into smaller pieces and
> get rid of switch->switch->if construction.

Thanks Leon. I will address this and your previous comments in v3 patch set.

^ permalink raw reply

* [PATCH net] virtio-net: correctly enable multiqueue
From: Jason Wang @ 2016-12-13  6:23 UTC (permalink / raw)
  To: netdev, virtualization; +Cc: tytso, Neil Horman, Michael S . Tsirkin

Commit 4490001029012539937ff02778fe6180613fa949 ("virtio-net: enable
multiqueue by default") blindly set the affinity instead of queues
during probe which can cause a mismatch of #queues between guest and
host. This patch fixes it by setting queues.

Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Fixes: 49000102901 ("virtio-net: enable multiqueue by default")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b425fa1..fe9f772 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1930,7 +1930,9 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free_unregister_netdev;
 	}

-	virtnet_set_affinity(vi);
+	rtnl_lock();
+	virtnet_set_queues(vi, vi->curr_queue_pairs);
+	rtnl_unlock();

 	/* Assume link up if device can't report link status,
 	   otherwise get link status from config. */
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH V2 18/22] bnxt_re: Support for DCB
From: Selvin Xavier @ 2016-12-13  6:25 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, linux-rdma@vger.kernel.org, Linux Netdev List,
	Eddie Wai, Devesh Sharma, Somnath Kotur, Sriharsha Basavapatna
In-Reply-To: <CAJ3xEMjcKRZpBMEBwQMZO5OqbVzH2=Q3FOAyYUVAz+ouw1jTNQ@mail.gmail.com>

On Sat, Dec 10, 2016 at 7:20 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Fri, Dec 9, 2016 at 8:48 AM, Selvin Xavier
> <selvin.xavier@broadcom.com> wrote:
>> This patch queries the configured RoCE APP Priority on the host
>> using the dcbnl API and programs the RoCE FW with the corresponding
>> Traffic Class(es) for the priority.
>
>> +#define BNXT_RE_ROCE_V1_ETH_TYPE       0x8915
>> +#define BNXT_RE_ROCE_V2_PORT_NO                4791
>
> I believe these two are defined already, try # git grep on each under include

Thanks Or for your comments.
The V2 port number is defined in ib_verbs.h.  I will include this in the
next patch set.
The v1 eth_type is not defined. All vendor drivers have their own definition.

Thanks,
Selvin

^ permalink raw reply

* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Michael Chan @ 2016-12-13  6:41 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: jtoppins, Doug Ledford, linux-rdma, Netdev
In-Reply-To: <CA+sbYW2fF01N5ZQfaH6Uj7Q63SHwdo1gEwR=eoB6vB1ktQaiag@mail.gmail.com>

On Mon, Dec 12, 2016 at 8:52 PM, Selvin Xavier
<selvin.xavier@broadcom.com> wrote:

>>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_rcfw.c
>>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_sp.c
>>   CHECK   drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c
>> drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1015:22: warning: context
>> imbalance in 'bnxt_qplib_lock_cqs' - wrong count at exit
>> drivers/infiniband/hw/bnxtre/bnxt_qplib_fp.c:1030:28: warning: context
>> imbalance in 'bnxt_qplib_unlock_cqs' - unexpected unlock
> The above two are false positives, since locking and unlocking are
> handled in two separate functions. This is a wrapper to lock/unlock
> both SQ and RQ CQ locks. Functionally it is ok  since
> bnxt_qplib_unlock_cqs is called just after the critical section and
> both locks are freed in order. I think we can ignore this warning.
>
>
You can use __releases() and __acquires() macros to denote these cases
for sparse.
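
A minimal sketch of where such annotations would go, under stated
assumptions: the lock type and all demo_* names are toy stand-ins for the
driver's spinlocks and functions, and the macro definitions are modeled on
what the kernel headers provide when sparse runs (treat the exact attribute
spelling as an assumption). In a normal build they expand to nothing:

```c
#include <assert.h>

/* sparse defines __CHECKER__; for a normal compile the annotations vanish. */
#ifdef __CHECKER__
# define __acquires(x)	__attribute__((context(x, 0, 1)))
# define __releases(x)	__attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

/* Toy lock standing in for a spinlock; all names here are hypothetical. */
struct demo_lock { int held; };
struct demo_cq { struct demo_lock lock; };
struct demo_qp { struct demo_cq *scq; struct demo_cq *rcq; };

static void demo_lock_acquire(struct demo_lock *l) { l->held = 1; }
static void demo_lock_release(struct demo_lock *l) { l->held = 0; }

/* Returns with both CQ locks held; the annotations document that. */
static void demo_lock_cqs(struct demo_qp *qp)
	__acquires(&qp->scq->lock) __acquires(&qp->rcq->lock)
{
	demo_lock_acquire(&qp->scq->lock);
	demo_lock_acquire(&qp->rcq->lock);
}

/* Entered with both CQ locks held; releases them in reverse order. */
static void demo_unlock_cqs(struct demo_qp *qp)
	__releases(&qp->scq->lock) __releases(&qp->rcq->lock)
{
	demo_lock_release(&qp->rcq->lock);
	demo_lock_release(&qp->scq->lock);
}
```

This sketch assumes the send and receive CQs are distinct objects; a real
wrapper would also have to handle the case where both point to the same CQ.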

^ permalink raw reply

* Re: stmmac DT property snps,axi_all
From: Giuseppe CAVALLARO @ 2016-12-13  6:47 UTC (permalink / raw)
  To: Niklas Cassel, Alexandre Torgue; +Cc: netdev
In-Reply-To: <0546490f-8f8c-a22e-db7b-eac521c7ff27@st.com>

Hello Niklas, Alex,

My fault, I was a step behind... The current code is OK in how it
manages AAL: although the value is passed in through the axi
structure, it is always used, for all chip versions, to program the
writable bit inside the DMA_BUS_MODE register.
So I guess no extra patch is needed.

Regards
Peppe

On 12/12/2016 3:18 PM, Giuseppe CAVALLARO wrote:
> Please Niklas
>
> when you send the patch, add my
>
> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
>
>
> On 12/9/2016 10:53 AM, Niklas Cassel wrote:
>> On 12/09/2016 10:20 AM, Niklas Cassel wrote:
>>> On 12/08/2016 02:36 PM, Alexandre Torgue wrote:
>>>> Hi Niklas,
>>>>
>>>> On 12/05/2016 05:18 PM, Niklas Cassel wrote:
>>>>> Hello Giuseppe
>>>>>
>>>>>
>>>>> I'm trying to figure out what snps,axi_all is supposed to represent.
>>>>>
>>>>> It appears that the value is saved, but never used in the code.
>>>>>
>>>>> Looking at the register specification, I'm guessing that it represents
>>>>> Address-Aligned Beats, but there is already the property snps,aal
>>>>> for that.
>>>> IMO, it is not useful. Indeed AXI_AAL is a read only bit (in AXI bus
>>>> mode register) and reflects the aal bit in DMA bus register.
>>>> As you know we use "snps,aal" to set aal bit in DMA bus register.
>>>> So "snps,axi_all" entry seems useless. Let's see with Peppe.
>>> Ok, I see. GMAC and GMAC4 are different here.
>>>
>>> For GMAC4 AAL only exists in DMA_SYS_BUS_MODE.
>>> It's not reflected anywhere else.
>>>
>>> The code is correct in the driver.
>>>
>>> If snps,axi_all is just created for a read-only register,
>>> and it is currently never used in the code,
>>> while we have snps,aal, which is correct and works,
>>> I guess it should be ok to remove snps,axi_all.
>>>
>>> I can cook up a patch.
>>>
>>
>> Here we go :)
>>
>> I will send it as a real patch once net-next reopens.
>>
>>
>>> From defc01cb7c22611b89d9cf1fcae72544092bd62c Mon Sep 17 00:00:00 2001
>> From: Niklas Cassel <niklas.cassel@axis.com>
>> Date: Fri, 9 Dec 2016 10:27:00 +0100
>> Subject: [PATCH net-next] net: stmmac: remove unused duplicate property
>>  snps,axi_all
>>
>> For core revision 3.x Address-Aligned Beats is available in two
>> registers.
>> The DT property snps,aal was created for AAL in the DMA bus register,
>> which is a read/write bit.
>> The DT property snps,axi_all was created for AXI_AAL in the AXI bus mode
>> register, which is a read only bit that reflects the value of AAL in the
>> DMA bus register.
>>
>> Since the value of snps,axi_all is never used in the driver,
>> and since the property was created for a bit that is read only,
>> it should be safe to remove the property.
>>
>> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
>> ---
>>  Documentation/devicetree/bindings/net/stmmac.txt      | 1 -
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 -
>>  include/linux/stmmac.h                                | 1 -
>>  3 files changed, 3 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/stmmac.txt
>> b/Documentation/devicetree/bindings/net/stmmac.txt
>> index 128da752fec9..c3d2fd480a1b 100644
>> --- a/Documentation/devicetree/bindings/net/stmmac.txt
>> +++ b/Documentation/devicetree/bindings/net/stmmac.txt
>> @@ -65,7 +65,6 @@ Optional properties:
>>      - snps,wr_osr_lmt: max write outstanding req. limit
>>      - snps,rd_osr_lmt: max read outstanding req. limit
>>      - snps,kbbe: do not cross 1KiB boundary.
>> -    - snps,axi_all: align address
>>      - snps,blen: this is a vector of supported burst length.
>>      - snps,fb: fixed-burst
>>      - snps,mb: mixed-burst
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> index 082cd48db6a7..60ba8993c650 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
>> @@ -121,7 +121,6 @@ static struct stmmac_axi *stmmac_axi_setup(struct
>> platform_device *pdev)
>>      axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en");
>>      axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm");
>>      axi->axi_kbbe = of_property_read_bool(np, "snps,axi_kbbe");
>> -    axi->axi_axi_all = of_property_read_bool(np, "snps,axi_all");
>>      axi->axi_fb = of_property_read_bool(np, "snps,axi_fb");
>>      axi->axi_mb = of_property_read_bool(np, "snps,axi_mb");
>>      axi->axi_rb =  of_property_read_bool(np, "snps,axi_rb");
>> diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
>> index 266dab9ad782..889e0e9a3f1c 100644
>> --- a/include/linux/stmmac.h
>> +++ b/include/linux/stmmac.h
>> @@ -103,7 +103,6 @@ struct stmmac_axi {
>>      u32 axi_wr_osr_lmt;
>>      u32 axi_rd_osr_lmt;
>>      bool axi_kbbe;
>> -    bool axi_axi_all;
>>      u32 axi_blen[AXI_BLEN];
>>      bool axi_fb;
>>      bool axi_mb;
>>
>
>

^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Giuseppe CAVALLARO @ 2016-12-13  7:22 UTC (permalink / raw)
  To: Niklas Cassel, Joao Pinto, Florian Fainelli, Andy Shevchenko
  Cc: David Miller, larper, rabinv, netdev, CARLOS.PALMINHA, Jie.Deng1,
	Stephen Warren, pavel
In-Reply-To: <1d445ec1-deb8-6e36-39c4-6813c446095f@axis.com>

On 12/12/2016 5:25 PM, Niklas Cassel wrote:
>
>
> On 12/12/2016 11:19 AM, Joao Pinto wrote:
>> Hi,
>>
>> On 12/10/2016 at 1:44 AM, Florian Fainelli wrote:
>>> On 12/09/16 at 16:16, Andy Shevchenko wrote:
>>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>
>>>>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe)
>>>>> did
>>>>> actually pioneer the upstreaming effort, but it is good to see people
>>>>> from Synopsys willing to fix that in the future.
>>>> Wait, you would like to tell that we have more than 2 drivers for the
>>>> same (okay, same vendor) IP?!
>>>> It's better to unify them earlier, than have n+ copies.
>>> Unfortunately that is the case, see this email:
>>>
>>> https://www.mail-archive.com/netdev@vger.kernel.org/msg142796.html
>>>
>>> dwc_eth_qos and stmmac have some overlap. There seems to be work
>>> underway to unify these two to begin with.
>>>
>>>> P.S. Though, I don't see how sxgbe got in the list. First glance on
>>>> the code doesn't show similarities.
>>> Well samsung/sxgbe looks potentially similar to amd/xgbe, but that's
>>> just my cursory look at the code, it may very well be something entirely
>>> different. The descriptor formats just look suspiciously similar.
>>>
>> Thank you for your inputs! Renaming seems to be a hotspot. I agree that maybe
>> instead of renaming (breaking retro-compatibility as David and Florian
>> mentioned), the best is to move stmmac to synopsys/ after merging *qos* and
>> removing it. As Florian mentioned, git is capable of detecting folder restructuring.
>>
>> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
>> driver would it be possible for you to make an initial analysis of what has to
>> be merged into Stmmac? This way the development would speed-up.
>
> I can answer that question.
>
> I've sent out 12 patches to the stmmac driver
> (all patches are included in the current net-next tree),

OK, I have seen these patches applied. I had just a minor concern about
the failure when the DMA configuration is missing.
Over the years, I have noticed that, for this kind of HW, the default DMA
configuration is usually good enough to have a working driver. AHB and AXI
parameters can be provided for better tuning or to fix known issues
on some platforms. So IMO, we should relax the check to a warning.
Please consider that the stmmac also supports very old MAC10/100
versions where the DMA configuration was often never passed.

> with these patches the stmmac driver works properly on Axis hardware
> (we use Synopsys GMAC 4.10a synthesized with multiple TX queues).

perfect and thanks a lot for this effort.

> stmmac's DT binding has also been extended with properties that
> existed in DWC EQoS's DT binding, such as no-pbl-x8, txpbl, rxpbl.
>
> Since we have no problem updating the DTB together with the kernel,
> we will simply move to using the stmmac driver,
> with stmmac's DT binding.
>
> However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
> I don't know how easy it would be for them to switch to stmmac's DT binding.
> (Adding Stephen Warren to CC.)

ok

>
> The reset sequence that Lars Persson was worried about is not an issue
> with the stmmac driver.
>

thx for this check.

>
> There are some performance problems with the stmmac driver though:
>
> When running iperf3 with 3 streams:
> iperf3 -c 192.168.0.90 -P 3 -t 30
> iperf3 -c 192.168.0.90 -P 3 -t 30 -R
>
> I get really bad fairness between the streams.

Can you confirm you are using the 4.xxa version?

This doesn't match with Alex's experiments on ARM platforms.

> This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
> Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.

this doesn't match with what we had seen but I am happy and open to
review and accept new strategy.

> We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
> and we don't see the same problem.

please, if you have new patch add me on CC and we will review all
together.

> Also netperf TCP_RR and UDP_RR gives really bad results compared to the
> dwceqos driver (without IRQ coalescing).
> 2000 transactions/sec vs 9000 transactions/sec.
> Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
> gives the same performance. I guess it's a trade off, low CPU usage
> vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
>
> The best thing would be to get a good working TX IRQ coalesce
> implementation with HR timers in stmmac.

as said, welcome patches.

Basically, the default tuning of the coalescing parameters comes from
our experience on ST platforms. I mean, we tuned the driver to get good
performance while saving CPU on SH4 (UP) and ARM (SMP) systems.
Over the years these defaults were accepted but, if today we need
to change something, the effort is welcome. On my side, I can try to
run some benchmarks to see whether I have regressions or not.

> Perhaps it should also be investigated if the RX interrupt watchdog
> timeout should have a lower default value.

Do not expect much improvement from playing with the HW watchdog,
due to the poor granularity of the Receive Interrupt Watchdog Timer
Count.

Regards
Peppe

>
>
>
>>
>> Thanks to all.
>>
>> Joao
>
>

* Re: netlink: GPF in sock_sndtimeo
From: Richard Guy Briggs @ 2016-12-13  7:51 UTC (permalink / raw)
  To: Cong Wang
  Cc: Herbert Xu, Johannes Berg, netdev, Florian Westphal, LKML,
	Eric Dumazet, linux-audit, syzkaller, David Miller, Dmitry Vyukov
In-Reply-To: <CAM_iQpVcHGywXn90EpiSz-LsUDgKVqs-7BY-L7UBCu2VxkC31Q@mail.gmail.com>

On 2016-12-09 23:40, Cong Wang wrote:
> On Fri, Dec 9, 2016 at 8:13 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Fri, Dec 9, 2016 at 3:01 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> >> On 2016-12-08 22:57, Cong Wang wrote:
> >>> On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> >>> > I also tried to extend Cong Wang's idea to attempt to proactively respond to a
> >>> > NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
> >>> > stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> >>> > Eliminating the lock since the sock is dead anways eliminates the error.
> >>> >
> >>> > Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
> >>> > get the test case to compile.
> >>>
> >>> It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
> >>> are updated as a whole and race between audit_receive_msg() and
> >>> NETLINK_URELEASE.
> >>
> >> This is what I expected and why I originally added the mutex lock in the
> >> callback...  The dumps I got were bare with no wrapper identifying the
> >> process context or specific error, so I'm at a bit of a loss how to
> >> solve this (without thinking more about it) other than instinctively
> >> removing the mutex.
> >
> > Netlink notifier can safely be converted to blocking one, I will send
> > a patch.
> >
> > But I seriously doubt you really need NETLINK_URELEASE here,
> > it adds nothing but overhead, b/c the netlink notifier is called on
> > every netlink socket in the system, but for net exit path, that is
> > relatively a slow path.
> >
> > Also, kauditd_send_skb() needs audit_cmd_mutex too.
> 
> Please let me know what you think about the attached patch?
> 
> Thanks!

> commit a12b43ee814625933ff155c20dc863c59cfcf240
> Author: Cong Wang <xiyou.wangcong@gmail.com>
> Date:   Fri Dec 9 17:56:42 2016 -0800
> 
>     audit: close a race condition on audit_sock
>     
>     Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index f1ca116..ab947d8 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -423,6 +423,8 @@ static void kauditd_send_skb(struct sk_buff *skb)
>  				snprintf(s, sizeof(s), "audit_pid=%d reset", audit_pid);
>  				audit_log_lost(s);
>  				audit_pid = 0;
> +				audit_nlk_portid = 0;
> +				sock_put(audit_sock);
>  				audit_sock = NULL;
>  			} else {
>  				pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
> @@ -899,6 +901,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  				audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
>  			audit_pid = new_pid;
>  			audit_nlk_portid = NETLINK_CB(skb).portid;
> +			sock_hold(skb->sk);
> +			if (audit_sock)
> +				sock_put(audit_sock);
>  			audit_sock = skb->sk;
>  		}
>  		if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
> @@ -1167,10 +1172,6 @@ static void __net_exit audit_net_exit(struct net *net)
>  {
>  	struct audit_net *aunet = net_generic(net, audit_net_id);
>  	struct sock *sock = aunet->nlsk;
> -	if (sock == audit_sock) {
> -		audit_pid = 0;
> -		audit_sock = NULL;
> -	}

So how does this not leak memory leaving the sock refcount incremented
by the registered audit daemon when that daemon shuts down normally?


- RGB
