Netdev List


* [PATCH net-next] r8169: fix performance issue on RTL8168evl
From: Heiner Kallweit @ 2019-08-08 22:02 UTC
  To: Realtek linux nic maintainers, David Miller,
	Holger Hoffstätte
  Cc: netdev@vger.kernel.org

From: Holger Hoffstätte <holger@applied-asynchrony.com>
Disabling TSO but leaving SG active results in a significant
performance drop. Therefore also disable SG on RTL8168evl.
This restores the original performance.

Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index b2a275d85..912bd41ea 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -6898,9 +6898,9 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	/* RTL8168e-vl has a HW issue with TSO */
 	if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
-		dev->vlan_features &= ~NETIF_F_ALL_TSO;
-		dev->hw_features &= ~NETIF_F_ALL_TSO;
-		dev->features &= ~NETIF_F_ALL_TSO;
+		dev->vlan_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
+		dev->hw_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
+		dev->features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
 	}
 
 	dev->hw_features |= NETIF_F_RXALL;
-- 
2.22.0
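
The change above clears several feature bits from all three feature sets
with one combined mask. A minimal user-space sketch of that idiom follows.
The struct, the function and the bit values are hypothetical stand-ins
chosen for illustration; they are not the kernel's definitions of
NETIF_F_SG or NETIF_F_ALL_TSO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for netdev feature bits (not the kernel values). */
#define TOY_F_SG       (UINT64_C(1) << 0)
#define TOY_F_TSO      (UINT64_C(1) << 1)
#define TOY_F_TSO6     (UINT64_C(1) << 2)
#define TOY_F_HW_CSUM  (UINT64_C(1) << 3)
#define TOY_F_ALL_TSO  (TOY_F_TSO | TOY_F_TSO6)

/* Toy model of the three feature sets touched by the patch. */
struct toy_netdev {
	uint64_t features;      /* currently active features */
	uint64_t hw_features;   /* features the user may toggle */
	uint64_t vlan_features; /* features inherited by VLAN devices */
};

/* Clear TSO and SG together so SG is never left on without TSO. */
static void toy_disable_tso_and_sg(struct toy_netdev *dev)
{
	const uint64_t mask = TOY_F_ALL_TSO | TOY_F_SG;

	assert(dev != NULL);
	dev->vlan_features &= ~mask;
	dev->hw_features &= ~mask;
	dev->features &= ~mask;
}
```

Unrelated bits (here the checksum bit) are left untouched, because the
mask is inverted before it is ANDed in.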



* Re: KASAN: use-after-free Read in tomoyo_socket_sendmsg_permission
From: Tetsuo Handa @ 2019-08-08 22:07 UTC
  To: syzbot, syzkaller-bugs, Ralf Baechle, linux-hams; +Cc: linux-kernel, netdev
In-Reply-To: <000000000000a244b3058f9dc7d6@google.com>

On 2019/08/09 1:45, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    107e47cc vrf: make sure skb->data contains ip header to ma..
> git tree:       net
> console output: https://syzkaller.appspot.com/x/log.txt?x=139506d8600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4dba67bf8b8c9ad7
> dashboard link: https://syzkaller.appspot.com/bug?extid=b91501546ab4037f685f
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

This is not TOMOYO's bug. LSM modules expect that "struct sock" does not go away.

Also, another use-after-free (presumably on the same "struct sock") was concurrently
in flight in nr_insert_socket() in net/netrom/af_netrom.c. Thus, I suspect a netrom bug.

[  625.441058][    C0] ------------[ cut here ]------------
[  625.446837][    C0] refcount_t: increment on 0; use-after-free.
[  625.461518][    C0] WARNING: CPU: 0 PID: 0 at lib/refcount.c:156 refcount_inc_checked+0x61/0x70
[  625.479173][    C0] Kernel panic - not syncing: panic_on_warn set ...
[  625.746558][    C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #97
[  625.746575][    C0] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  625.755731][    C0] Call Trace:
[  625.770091][    C0]  <IRQ>
[  625.777543][    C0]  dump_stack+0x172/0x1f0
[  625.786005][    C0]  ? refcount_inc_not_zero_checked+0x1f0/0x200
[  625.794831][    C0]  panic+0x2dc/0x755
[  625.805217][    C0]  ? add_taint.cold+0x16/0x16
[  625.813697][    C0]  ? __kasan_check_write+0x14/0x20
[  625.822433][    C0]  ? __warn.cold+0x5/0x4c
[  625.832388][    C0]  ? __warn+0xe7/0x1e0
[  625.841820][    C0]  ? refcount_inc_checked+0x61/0x70
[  625.851148][    C0]  __warn.cold+0x20/0x4c
[  625.859701][    C0]  ? vprintk_emit+0x1ea/0x700
[  625.867208][    C0]  ? refcount_inc_checked+0x61/0x70
[  625.875413][    C0]  report_bug+0x263/0x2b0
[  625.884580][    C0]  do_error_trap+0x11b/0x200
[  625.893730][    C0]  do_invalid_op+0x37/0x50
[  625.902936][    C0]  ? refcount_inc_checked+0x61/0x70
[  625.911858][    C0]  invalid_op+0x14/0x20
[  625.920825][    C0] RIP: 0010:refcount_inc_checked+0x61/0x70
[  625.929407][    C0] Code: 1d 3f 6e 64 06 31 ff 89 de e8 cb d2 35 fe 84 db 75 dd e8 82 d1 35 fe 48 c7 c7 40 09 c6 87 c6 05 1f 6e 64 06 01 e8 77 39 07 fe <0f> 0b eb c1 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41
[  625.937608][    C0] RSP: 0018:ffff8880ae809bf0 EFLAGS: 00010282
[  625.948510][    C0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  625.957237][    C0] RDX: 0000000000000100 RSI: ffffffff815c3a26 RDI: ffffed1015d01370
[  625.967249][    C0] RBP: ffff8880ae809c00 R08: ffffffff88c7a1c0 R09: fffffbfff14a775b
[  625.991542][    C0] R10: fffffbfff14a775a R11: ffffffff8a53bad7 R12: ffff8880a066f480
[  626.002193][    C0] R13: ffff8880a066f468 R14: ffff88808d69ef48 R15: ffff88808d69ef20
[  626.014844][    C0]  ? vprintk_func+0x86/0x189
[  626.027298][    C0]  nr_insert_socket+0x2d/0xe0
[  626.041237][    C0]  nr_rx_frame+0x1605/0x1e73
[  626.051737][    C0]  nr_loopback_timer+0x7b/0x170
[  626.073842][    C0]  call_timer_fn+0x1ac/0x780
[  626.092970][    C0]  ? nr_process_rx_frame+0x1540/0x1540
[  626.108552][    C0]  ? msleep_interruptible+0x150/0x150
[  626.118574][    C0]  ? run_timer_softirq+0x685/0x17a0
[  626.131811][    C0]  ? trace_hardirqs_on+0x67/0x240
[  626.145424][    C0]  ? __kasan_check_read+0x11/0x20
[  626.156592][    C0]  ? nr_process_rx_frame+0x1540/0x1540
[  626.164362][    C0]  ? nr_process_rx_frame+0x1540/0x1540
[  626.175423][    C0]  run_timer_softirq+0x697/0x17a0
[  626.188804][    C0]  ? add_timer+0x930/0x930
[  626.202652][    C0]  ? kvm_clock_read+0x18/0x30
[  626.215813][    C0]  ? kvm_sched_clock_read+0x9/0x20
[  626.231378][    C0]  ? sched_clock+0x2e/0x50
[  626.231395][    C0]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[  626.231408][    C0]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[  626.231432][    C0]  __do_softirq+0x262/0x98c
[  626.244512][    C0]  ? sched_clock_cpu+0x1b/0x1b0
[  626.244531][    C0]  irq_exit+0x19b/0x1e0
[  626.244545][    C0]  smp_apic_timer_interrupt+0x1a3/0x610
[  626.244558][    C0]  apic_timer_interrupt+0xf/0x20
[  626.244563][    C0]  </IRQ>
[  626.244579][    C0] RIP: 0010:native_safe_halt+0xe/0x10
[  626.244606][    C0] Code: b8 94 73 fa eb 8a 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 34 25 4f 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 24 25 4f 00 fb f4 <c3> 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 0e 56 27 fa e8 c9
[  626.257081][    C0] RSP: 0018:ffffffff88c07ce8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
[  626.269812][    C0] RAX: 1ffffffff11a5e05 RBX: ffffffff88c7a1c0 RCX: 0000000000000000
[  626.281053][    C0] RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffffffff88c7aa4c
[  626.290913][    C0] RBP: ffffffff88c07d18 R08: ffffffff88c7a1c0 R09: 0000000000000000
[  626.303361][    C0] R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
[  626.314081][    C0] R13: ffffffff89a4f778 R14: 0000000000000000 R15: 0000000000000000
[  626.314116][    C0]  ? default_idle+0x4e/0x360
[  626.323075][    C0]  arch_cpu_idle+0xa/0x10
[  626.333543][    C0]  default_idle_call+0x84/0xb0
[  626.341839][    C0]  do_idle+0x413/0x760
[  626.370736][    C0]  ? retint_kernel+0x2b/0x2b
[  626.383044][    C0]  ? arch_cpu_idle_exit+0x80/0x80
[  626.400071][    C0]  ? do_idle+0x387/0x760
[  626.418085][    C0]  cpu_startup_entry+0x1b/0x20
[  626.431835][    C0]  rest_init+0x245/0x37b
[  626.459420][    C0]  arch_call_rest_init+0xe/0x1b
[  626.471993][    C0]  start_kernel+0x912/0x951
[  626.482387][    C0]  ? mem_encrypt_init+0xb/0xb
[  626.495105][    C0]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[  626.507125][    C0]  ? x86_family+0x41/0x50
[  626.519773][    C0]  ? __sanitizer_cov_trace_const_cmp1+0x1a/0x20
[  626.532837][    C0]  x86_64_start_reservations+0x29/0x2b
[  626.545019][    C0]  x86_64_start_kernel+0x77/0x7b
[  626.558711][    C0]  secondary_startup_64+0xa4/0xb0
[  626.897092][    C0] Kernel Offset: disabled
[  626.901428][    C0] Rebooting in 86400 seconds..
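
The "refcount_t: increment on 0" warning in the trace above fires when a
reference is taken on an object whose count has already dropped to zero.
The sketch below shows the general "take a reference only if still live"
pattern with C11 atomics. It is an illustration of the failure class only,
not the netrom fix, and toy_sock and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted object standing in for a socket. */
struct toy_sock {
	atomic_uint refcnt;
};

/* Take a reference only if the count is still non-zero. */
static bool toy_sock_hold_not_zero(struct toy_sock *sk)
{
	unsigned int old;

	assert(sk != NULL);
	old = atomic_load(&sk->refcnt);
	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&sk->refcnt, &old, old + 1))
			return true;
	}
	return false; /* object is already on its way to being freed */
}

/* Drop a reference; returns true when the last reference went away. */
static bool toy_sock_put(struct toy_sock *sk)
{
	assert(sk != NULL);
	return atomic_fetch_sub(&sk->refcnt, 1) == 1;
}
```

A caller that gets false from toy_sock_hold_not_zero() must not touch the
object at all; an unconditional increment at that point is the
use-after-free pattern the warning reports.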


* Re: [PATCH net-next v4 2/2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Heiner Kallweit @ 2019-08-08 22:11 UTC
  To: Tao Ren, Andrew Lunn, Florian Fainelli, David S . Miller,
	Arun Parameswaran, Justin Chen, Vladimir Oltean,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	openbmc@lists.ozlabs.org
In-Reply-To: <14c1591b-26e1-3a2f-f6c4-beb2c8978e41@fb.com>

On 08.08.2019 23:47, Tao Ren wrote:
> Hi Heiner,
> 
> On 8/7/19 9:24 PM, Tao Ren wrote:
>> Hi Heiner,
>>
>> On 8/7/19 12:18 PM, Heiner Kallweit wrote:
>>> On 06.08.2019 23:42, Tao Ren wrote:
>>>> Hi Andrew / Heiner / Vladimir,
>>>>
>>>> On 8/6/19 2:09 PM, Tao Ren wrote:
>>>>> The BCM54616S PHY cannot work properly in RGMII->1000Base-KX mode (for
>>>>> example, on Facebook CMM BMC platform), mainly because genphy functions
>>>>> are designed for copper links, and 1000Base-X (clause 37) auto negotiation
>>>>> needs to be handled differently.
>>>>>
>>>>> This patch enables 1000Base-X support for BCM54616S by customizing 3
>>>>> driver callbacks:
>>>>>
>>>>>   - probe: probe callback detects PHY's operation mode based on
>>>>>     INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX
>>>>>     Control register.
>>>>>
>>>>>   - config_aneg: bcm54616s_config_aneg_1000bx function is added for auto
>>>>>     negotiation in 1000Base-X mode.
>>>>>
>>>>>   - read_status: BCM54616S and BCM5482 PHY share the same read_status
>>>>>     callback which manually set link speed and duplex mode in 1000Base-X
>>>>>     mode.
>>>>>
>>>>> Signed-off-by: Tao Ren <taoren@fb.com>
>>>>
>>>> I customized config_aneg function for BCM54616S 1000Base-X mode and link-down issue is also fixed: the patch is tested on Facebook CMM and Minipack BMC and everything looks normal. Please kindly review when you have bandwidth and let me know if you have further suggestions.
>>>>
>>>> BTW, I would be happy to help if we decide to add a set of genphy functions for clause 37, although that may mean I need more help/guidance from you :-)
>>>
>>> You want to have standard clause 37 aneg and this should be generic in phylib.
>>> I hacked together a first version that is compile-tested only:
>>> https://patchwork.ozlabs.org/patch/1143631/
>>> It supports fixed mode too.
>>>
>>> It doesn't support half duplex mode because phylib doesn't know 1000BaseX HD yet.
>>> Not sure whether half duplex mode is used at all in reality.
>>>
>>> You could test the new core functions in your own config_aneg and read_status
>>> callback implementations.
>>
>> Thank you very much for the help! I'm planning to add these functions but I haven't started yet because I'm still going through clause 37 :-)
>>
>> Let me apply your patch and run some test on my platform. Will share you results tomorrow.
> 
> The patch "net: phy: add support for clause 37 auto-negotiation" works on my CMM platform, with just 1 minor change in phy.h (I guess it's typo?). Thanks again for the help!
> 
> -int genphy_c37_aneg_done(struct phy_device *phydev);
> +int genphy_c37_config_aneg(struct phy_device *phydev);
> 
Indeed, this was a typo. Thanks.

> BTW, shall I send out my patch v5 now (based on your patch)? Or I should wait till your patch is included in net-next and then send out my patch?
> 
Adding new functions to the core is typically only acceptable if in the
same patch series a user of the new functions is added. Therefore it's
best if you include my patch in your series (just remove the RFC tag and
set the From: properly).

> 
> Cheers,
> 
> Tao
> 
Heiner
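
For readers new to clause 37: in 1000Base-X the speed is fixed, so
auto-negotiation mainly has to settle duplex (and pause). A minimal
sketch of the duplex resolution is below. It is not the code from the
phylib patch discussed above; the function and enum names are
hypothetical, and the two bit values are assumed to match the
ADVERTISE_1000XFULL / ADVERTISE_1000XHALF definitions in linux/mii.h.

```c
#include <assert.h>
#include <stdint.h>

/* Clause 37 ability bits (assumed to mirror linux/mii.h). */
#define TOY_C37_1000XFULL 0x0020
#define TOY_C37_1000XHALF 0x0040

static_assert((TOY_C37_1000XFULL & TOY_C37_1000XHALF) == 0,
	      "ability bits must not overlap");

enum toy_duplex {
	TOY_DUPLEX_NONE, /* no common ability, link cannot be resolved */
	TOY_DUPLEX_HALF,
	TOY_DUPLEX_FULL,
};

/*
 * Resolve duplex from our advertisement (adv) and the link partner's
 * ability word (lpa): pick the best mode both sides support.
 */
static enum toy_duplex toy_c37_resolve_duplex(uint16_t adv, uint16_t lpa)
{
	uint16_t common = adv & lpa;

	if (common & TOY_C37_1000XFULL)
		return TOY_DUPLEX_FULL;
	if (common & TOY_C37_1000XHALF)
		return TOY_DUPLEX_HALF;
	return TOY_DUPLEX_NONE;
}
```

As noted in the thread, phylib has no 1000BaseX half-duplex link mode
yet, so a real read_status callback would only ever report the full
duplex case.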


* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Tao Ren @ 2019-08-08 22:26 UTC
  To: Andrew Lunn
  Cc: Jakub Kicinski, netdev@vger.kernel.org, openbmc@lists.ozlabs.org,
	linux-kernel@vger.kernel.org, Samuel Mendoza-Jonas,
	David S . Miller, William Kennington
In-Reply-To: <20190808211629.GQ27917@lunn.ch>

On 8/8/19 2:16 PM, Andrew Lunn wrote:
> On Thu, Aug 08, 2019 at 07:02:54PM +0000, Tao Ren wrote:
>> Hi Andrew,
>>
>> On 8/8/19 6:32 AM, Andrew Lunn wrote:
>>>> Let me prepare patch v2 using device tree. I'm not sure if standard
>>>> "mac-address" fits this situation because all we need is an offset
>>>> (integer) and BMC MAC is calculated by adding the offset to NIC's
>>>> MAC address. Anyways, let me work out v2 patch we can discuss more
>>>> then.
>>>
>>> Hi Tao
>>>
>>> I don't know BMC terminology. By NICs MAC address, you are referring
>>> to the hosts MAC address? The MAC address the big CPU is using for its
>>> interface?  Where does this NIC get its MAC address from? If the BMCs
>>> bootloader has access to it, it can set the mac-address property in
>>> the device tree.
>>
>> Sorry for the confusion and let me clarify more:
>>
> 
>> The NIC here refers to the Network controller which provide network
>> connectivity for both BMC (via NC-SI) and Host (for example, via
>> PCIe).
>>
> 
>> On Facebook Yamp BMC, BMC sends NCSI_OEM_GET_MAC command (as an
>> ethernet packet) to the Network Controller while bringing up eth0,
>> and the (Broadcom) Network Controller replies with the Base MAC
>> Address reserved for the platform. As for Yamp, Base-MAC and
>> Base-MAC+1 are used by Host (big CPU) and Base-MAC+2 are assigned to
>> BMC. In my opinion, Base MAC and MAC address assignments are
>> controlled by Network Controller, which is transparent to both BMC
>> and Host.
> 
> Hi Tao
> 
> I've not done any work in the BMC field, so thanks for explaining
> this.
> 
> In a typical embedded system, each network interface is assigned a MAC
> address by the vendor. But here, things are different. The BMC SoC
> network interface has not been assigned a MAC address, it needs to ask
> the network controller for its MAC address, and then do some magical
> transformation on the answer to derive a MAC address for
> itself. Correct?

Yes. It's correct.

> It seems like a better design would have been: the BMC sends a
> NCSI_OEM_GET_BMC_MAC and the answer it gets back is the MAC address
> the BMC should use. No magic involved. But I guess it is too late to
> do that now.

Some NCSI Network Controllers support such OEM command (Get Provisioned BMC MAC Address), but unfortunately it's not supported on Yamp.
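
To make the "Base-MAC + offset" arithmetic concrete, here is a minimal
sketch of deriving the BMC address from the base address returned by the
Network Controller. The helper name is hypothetical, and letting a carry
propagate across all six bytes (wrapping at 48 bits) is an assumption;
the thread does not say how overflow of the last byte should be handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_ALEN 6

/*
 * Add 'offset' to a 6-byte MAC address treated as a big-endian 48-bit
 * integer. Carries propagate into higher bytes (assumption).
 */
static void toy_mac_add_offset(const uint8_t base[TOY_ETH_ALEN],
			       uint32_t offset,
			       uint8_t out[TOY_ETH_ALEN])
{
	uint64_t v = 0;
	int i;

	assert(base != NULL && out != NULL);

	for (i = 0; i < TOY_ETH_ALEN; i++)
		v = (v << 8) | base[i];

	v = (v + offset) & UINT64_C(0xFFFFFFFFFFFF);

	for (i = TOY_ETH_ALEN - 1; i >= 0; i--) {
		out[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}
```

With an offset of 2, as on Yamp, a base address ending in :44:ff yields a
BMC address ending in :45:01 under this carry assumption.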

>> I'm not sure if I understand your suggestion correctly: do you mean
>> we should move the logic (GET_MAC from Network Controller, adding
>> offset and configuring BMC MAC) from kernel to boot loader?
> 
> In general, the kernel is generic. It probably boots on any ARM system
> for which it has the needed modules. The bootloader is often much more
> specific. It might not be fully platform specific, but it will be at
> least specific to the general family of BMC SoCs. If you consider the
> combination of the BMC bootloader and the device tree blob, you have
> something specific to the platform. This magical transformation of
> adding 2 seems to be very platform specific. So having this magic in
> the bootloader+DT seems like the best place to put it.

I understand your concern now. Thank you for the explanation.

> However, how you pass the resulting MAC address to the kernel should
> be as generic as possible. The DT "mac-address" property is very
> generic, many MAC drivers understand it. Using it also allows for
> vendors which actually assign a MAC address to the BMC to pass it to
> the BMC, avoiding all this NCSI_OEM_GET_MAC handshake. Having an API
> which just passes '2' is not generic at all.

After giving it more thought, I'm thinking about adding an ncsi DT node with the following structure (mac/ncsi similar to mac/mdio/phy):

&mac0 {
    /* MAC properties... */

    use-ncsi;
    ncsi {
        /* ncsi level properties if any */

        package@0 {
            /* package level properties if any */

            channel@0 {
                /* channel level properties if any */

                bmc-mac-offset = <2>;
            };

            channel@1 {
                /* channel #1 properties */
            };
        };

        /* package #1 properties start here.. */
    };
};

The reasons behind this are:

1) mac driver doesn't need to parse "mac-offset" stuff: these ncsi-network-controller specific settings should be parsed in ncsi stack.

2) get_bmc_mac_address command is a channel specific command, and technically people can configure different offset/formula for different channels.

Any concerns or suggestions?


Thanks,

Tao


* Re: [PATCH net-next v4 2/2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Tao Ren @ 2019-08-08 22:31 UTC
  To: Heiner Kallweit, Andrew Lunn, Florian Fainelli, David S . Miller,
	Arun Parameswaran, Justin Chen, Vladimir Oltean,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	openbmc@lists.ozlabs.org
In-Reply-To: <6d080f3e-48b9-a65d-b73e-576296e98738@gmail.com>

On 8/8/19 3:11 PM, Heiner Kallweit wrote:
> On 08.08.2019 23:47, Tao Ren wrote:
>> Hi Heiner,
>>
>> On 8/7/19 9:24 PM, Tao Ren wrote:
>>> Hi Heiner,
>>>
>>> On 8/7/19 12:18 PM, Heiner Kallweit wrote:
>>>> On 06.08.2019 23:42, Tao Ren wrote:
>>>>> Hi Andrew / Heiner / Vladimir,
>>>>>
>>>>> On 8/6/19 2:09 PM, Tao Ren wrote:
>>>>>> The BCM54616S PHY cannot work properly in RGMII->1000Base-KX mode (for
>>>>>> example, on Facebook CMM BMC platform), mainly because genphy functions
>>>>>> are designed for copper links, and 1000Base-X (clause 37) auto negotiation
>>>>>> needs to be handled differently.
>>>>>>
>>>>>> This patch enables 1000Base-X support for BCM54616S by customizing 3
>>>>>> driver callbacks:
>>>>>>
>>>>>>   - probe: probe callback detects PHY's operation mode based on
>>>>>>     INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX
>>>>>>     Control register.
>>>>>>
>>>>>>   - config_aneg: bcm54616s_config_aneg_1000bx function is added for auto
>>>>>>     negotiation in 1000Base-X mode.
>>>>>>
>>>>>>   - read_status: BCM54616S and BCM5482 PHY share the same read_status
>>>>>>     callback which manually set link speed and duplex mode in 1000Base-X
>>>>>>     mode.
>>>>>>
>>>>>> Signed-off-by: Tao Ren <taoren@fb.com>
>>>>>
>>>>> I customized config_aneg function for BCM54616S 1000Base-X mode and link-down issue is also fixed: the patch is tested on Facebook CMM and Minipack BMC and everything looks normal. Please kindly review when you have bandwidth and let me know if you have further suggestions.
>>>>>
>>>>> BTW, I would be happy to help if we decide to add a set of genphy functions for clause 37, although that may mean I need more help/guidance from you :-)
>>>>
>>>> You want to have standard clause 37 aneg and this should be generic in phylib.
>>>> I hacked together a first version that is compile-tested only:
>>>> https://patchwork.ozlabs.org/patch/1143631/
>>>> It supports fixed mode too.
>>>>
>>>> It doesn't support half duplex mode because phylib doesn't know 1000BaseX HD yet.
>>>> Not sure whether half duplex mode is used at all in reality.
>>>>
>>>> You could test the new core functions in your own config_aneg and read_status
>>>> callback implementations.
>>>
>>> Thank you very much for the help! I'm planning to add these functions but I haven't started yet because I'm still going through clause 37 :-)
>>>
>>> Let me apply your patch and run some test on my platform. Will share you results tomorrow.
>>
>> The patch "net: phy: add support for clause 37 auto-negotiation" works on my CMM platform, with just 1 minor change in phy.h (I guess it's typo?). Thanks again for the help!
>>
>> -int genphy_c37_aneg_done(struct phy_device *phydev);
>> +int genphy_c37_config_aneg(struct phy_device *phydev);
>>
> Indeed, this was a typo. Thanks.
> 
>> BTW, shall I send out my patch v5 now (based on your patch)? Or I should wait till your patch is included in net-next and then send out my patch?
>>
> Adding new functions to the core is typically only acceptable if in the
> same patch series a user of the new functions is added. Therefore it's
> best if you include my patch in your series (just remove the RFC tag and
> set the From: properly).

Got it. Let me play with it (especially the "From:" field) and I will send out the patch series soon.


Cheers,

Tao


* Re: [PATCH net] inet: frags: re-introduce skb coalescing for local delivery
From: David Miller @ 2019-08-08 22:55 UTC
  To: gnault; +Cc: netdev, fw, edumazet, posk, alex.aring
In-Reply-To: <22d8da10c97214edd0677e6478093ad9376180ef.1564758715.git.gnault@redhat.com>

From: Guillaume Nault <gnault@redhat.com>
Date: Fri, 2 Aug 2019 17:15:03 +0200

> Before commit d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6
> defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
> generating many fragments) and running over an IPsec tunnel, reported
> more than 6Gbps throughput. After that patch, the same test gets only
> 9Mbps when receiving on a be2net nic (driver can make a big difference
> here, for example, ixgbe doesn't seem to be affected).
> 
> By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
> (IPv4 fragment coalescing was dropped by commit 14fe22e33462 ("Revert
> "ipv4: use skb coalescing in defragmentation"")).
> 
> Without fragment coalescing, be2net runs out of Rx ring entries and
> starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
> the netperf traffic is only composed of UDP fragments, any lost packet
> prevents reassembly of the full datagram. Therefore, fragments which
> have no possibility to ever get reassembled pile up in the reassembly
> queue, until the memory accounting exceeds the threshold. At that point
> no fragment is accepted anymore, which effectively discards all
> netperf traffic.
> 
> When reassembly timeout expires, some stale fragments are removed from
> the reassembly queue, so a few packets can be received, reassembled
> and delivered to the netperf receiver. But the nic still drops frames
> and soon the reassembly queue gets filled again with stale fragments.
> These long time frames where no datagram can be received explain why
> the performance drop is so significant.
> 
> Re-introducing fragment coalescing is enough to get the initial
> performances again (6.6Gbps with be2net): driver doesn't drop frames
> anymore (no more rx_drops_no_frags errors) and the reassembly engine
> works at full speed.
> 
> This patch is quite conservative and only coalesces skbs for local
> IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
> forwarding). Coalescing could be extended in the future if need be, as
> more scenarios would probably benefit from it.
 ...
> Signed-off-by: Guillaume Nault <gnault@redhat.com>

Applied, thanks.
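
The stall described in the quoted commit message can be pictured with a
toy model of the reassembly queue's memory accounting: fragments are
accepted until the accounted memory exceeds a threshold, after which
everything is refused until stale fragments expire. This is a simplified
sketch under that reading of the description, not the kernel's inet_frag
code; all names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reassembly queue: only tracks accounted memory. */
struct toy_fragq {
	size_t mem;         /* bytes currently held by queued fragments */
	size_t high_thresh; /* refuse new fragments once mem exceeds this */
};

/* Try to queue a fragment of 'truesize' bytes. */
static bool toy_frag_accept(struct toy_fragq *q, size_t truesize)
{
	assert(q != NULL);
	if (q->mem > q->high_thresh)
		return false; /* over the limit: fragment is discarded */
	q->mem += truesize;
	return true;
}

/* Reassembly timeout: a stale fragment is dropped and its memory freed. */
static void toy_frag_expire(struct toy_fragq *q, size_t truesize)
{
	assert(q != NULL);
	q->mem -= (truesize <= q->mem) ? truesize : q->mem;
}
```

In this model, fragments that can never complete a datagram keep the
queue above the threshold, so new fragments are only admitted in the
short window after a timeout frees some memory.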


* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Andrew Lunn @ 2019-08-08 23:03 UTC
  To: Tao Ren
  Cc: Jakub Kicinski, netdev@vger.kernel.org, openbmc@lists.ozlabs.org,
	linux-kernel@vger.kernel.org, Samuel Mendoza-Jonas,
	David S . Miller, William Kennington
In-Reply-To: <ac22bbe0-36ca-b4b9-7ea7-7b1741c2070d@fb.com>

> After giving it more thought, I'm thinking about adding ncsi dt node
> with following structure (mac/ncsi similar to mac/mdio/phy):
> 
> &mac0 {
>     /* MAC properties... */
> 
>     use-ncsi;

This property seems to be specific to Faraday FTGMAC100. Are you going
to make it more generic? 

>     ncsi {
>         /* ncsi level properties if any */
> 
>         package@0 {

You should get Rob Herring involved. This is not really describing
hardware, so it might get rejected by the device tree maintainer.

> 1) mac driver doesn't need to parse "mac-offset" stuff: these
> ncsi-network-controller specific settings should be parsed in ncsi
> stack.

> 2) get_bmc_mac_address command is a channel specific command, and
> technically people can configure different offset/formula for
> different channels.

Does that mean the NCSI code puts the interface into promiscuous mode?
Or at least adds these unicast MAC addresses to the MAC receive
filter? Hmm, ftgmac100 only seems to support multicast address
filtering, not unicast filters, so it must be using promisc mode, if
you expect to receive frames using this MAC address.

	   Andrew


* Re: [PATCH v2 15/15] dt-bindings: net: add bindings for ADIN PHY driver
From: Rob Herring @ 2019-08-08 23:03 UTC
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel@vger.kernel.org, David Miller,
	Mark Rutland, Florian Fainelli, Heiner Kallweit, Andrew Lunn
In-Reply-To: <20190808123026.17382-16-alexandru.ardelean@analog.com>

On Thu, Aug 8, 2019 at 6:31 AM Alexandru Ardelean
<alexandru.ardelean@analog.com> wrote:
>
> This change adds bindings for the Analog Devices ADIN PHY driver, detailing
> all the properties implemented by the driver.
>
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
> ---
>  .../devicetree/bindings/net/adi,adin.yaml     | 76 +++++++++++++++++++
>  MAINTAINERS                                   |  1 +
>  2 files changed, 77 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/adi,adin.yaml
>
> diff --git a/Documentation/devicetree/bindings/net/adi,adin.yaml b/Documentation/devicetree/bindings/net/adi,adin.yaml
> new file mode 100644
> index 000000000000..86177c8fe23a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/adi,adin.yaml
> @@ -0,0 +1,76 @@
> +# SPDX-License-Identifier: GPL-2.0+
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/adi,adin.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Analog Devices ADIN1200/ADIN1300 PHY
> +
> +maintainers:
> +  - Alexandru Ardelean <alexandru.ardelean@analog.com>
> +
> +description: |
> +  Bindings for Analog Devices Industrial Ethernet PHYs
> +
> +allOf:
> +  - $ref: ethernet-phy.yaml#
> +
> +properties:
> +  adi,rx-internal-delay-ps:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: |
> +      RGMII RX Clock Delay used only when PHY operates in RGMII mode with
> +      internal delay (phy-mode is 'rgmii-id' or 'rgmii-rxid') in pico-seconds.
> +    enum: [ 1600, 1800, 2000, 2200, 2400 ]
> +    default: 2000

This doesn't actually do what you think. The '$ref' has to be under an
'allOf' to work. It's an oddity of json-schema. However, anything with
a standard unit suffix already has a schema to define the type, so you
don't need to here and can just drop $ref.

> +
> +  adi,tx-internal-delay-ps:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: |
> +      RGMII TX Clock Delay used only when PHY operates in RGMII mode with
> +      internal delay (phy-mode is 'rgmii-id' or 'rgmii-txid') in pico-seconds.
> +    enum: [ 1600, 1800, 2000, 2200, 2400 ]
> +    default: 2000
> +
> +  adi,fifo-depth-bits:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description: |
> +      When operating in RMII mode, this option configures the FIFO depth.
> +    enum: [ 4, 8, 12, 16, 20, 24 ]
> +    default: 8
> +
> +  adi,disable-energy-detect:
> +    description: |
> +      Disables Energy Detect Powerdown Mode (default disabled, i.e energy detect
> +      is enabled if this property is unspecified)
> +    type: boolean
> +
> +examples:
> +  - |
> +    ethernet {
> +        #address-cells = <1>;
> +        #size-cells = <0>;
> +
> +        phy-mode = "rgmii-id";
> +
> +        ethernet-phy@0 {
> +            reg = <0>;
> +
> +            adi,rx-internal-delay-ps = <1800>;
> +            adi,tx-internal-delay-ps = <2200>;
> +        };
> +    };
> +  - |
> +    ethernet {
> +        #address-cells = <1>;
> +        #size-cells = <0>;
> +
> +        phy-mode = "rmii";
> +
> +        ethernet-phy@1 {
> +            reg = <1>;
> +
> +            adi,fifo-depth-bits = <16>;
> +            adi,disable-energy-detect;
> +        };
> +    };
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e8aa8a667864..fd9ab61c2670 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -944,6 +944,7 @@ L:  netdev@vger.kernel.org
>  W:     http://ez.analog.com/community/linux-device-drivers
>  S:     Supported
>  F:     drivers/net/phy/adin.c
> +F:     Documentation/devicetree/bindings/net/adi,adin.yaml
>
>  ANALOG DEVICES INC ADIS DRIVER LIBRARY
>  M:     Alexandru Ardelean <alexandru.ardelean@analog.com>
> --
> 2.20.1
>


* [PATCH v2 net-next 0/2] net: mvpp2: Implement RXAUI Support
From: Matt Pelland @ 2019-08-08 23:06 UTC
  To: netdev; +Cc: Matt Pelland, davem, maxime.chevallier, antoine.tenart

This patch set implements support for configuring Marvell's mvpp2 hardware for
RXAUI operation. There are two other patches necessary for this to work
correctly that concern Marvell's cp110 comphy that were emailed to the general
linux-kernel mailing list earlier on. I can post them here if need be. This
patch set was successfully tested on both a Marvell Armada 7040 based platform
as well as an Armada 8040 based platform.

Changes since v1:

- Use reverse christmas tree formatting for all modified declaration blocks.
- Bump MVP22_MAX_COMPHYS to 4 to allow for XAUI operation.
- Implement comphy sanity checking.

Matt Pelland (2):
  net: mvpp2: implement RXAUI support
  net: mvpp2: support multiple comphy lanes

 drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |   8 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 129 ++++++++++++++----
 2 files changed, 110 insertions(+), 27 deletions(-)

-- 
2.21.0


* [PATCH v2 net-next 1/2] net: mvpp2: implement RXAUI support
From: Matt Pelland @ 2019-08-08 23:06 UTC
  To: netdev; +Cc: Matt Pelland, davem, maxime.chevallier, antoine.tenart
In-Reply-To: <20190808230606.7900-1-mpelland@starry.com>

Marvell's mvpp2 packet processor supports RXAUI on port zero in a
similar manner to the existing 10G protocols that have already been
implemented. This patch implements the miscellaneous extra configuration
steps required for RXAUI operation.

Signed-off-by: Matt Pelland <mpelland@starry.com>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |  1 +
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 32 +++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index 4d9564ba68f6..256e7c796631 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -481,6 +481,7 @@
 #define MVPP22_XLG_CTRL4_REG			0x184
 #define     MVPP22_XLG_CTRL4_FWD_FC		BIT(5)
 #define     MVPP22_XLG_CTRL4_FWD_PFC		BIT(6)
+#define     MVPP22_XLG_CTRL4_USE_XPCS		BIT(8)
 #define     MVPP22_XLG_CTRL4_MACMODSELECT_GMAC	BIT(12)
 #define     MVPP22_XLG_CTRL4_EN_IDLE_CHECK	BIT(14)
 
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 74fd9e171865..1a5037a398fc 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -980,6 +980,7 @@ mvpp2_shared_interrupt_mask_unmask(struct mvpp2_port *port, bool mask)
 static bool mvpp2_is_xlg(phy_interface_t interface)
 {
 	return interface == PHY_INTERFACE_MODE_10GKR ||
+	       interface == PHY_INTERFACE_MODE_RXAUI ||
 	       interface == PHY_INTERFACE_MODE_XAUI;
 }
 
@@ -1020,6 +1021,29 @@ static void mvpp22_gop_init_sgmii(struct mvpp2_port *port)
 	}
 }
 
+static void mvpp22_gop_init_rxaui(struct mvpp2_port *port)
+{
+	struct mvpp2 *priv = port->priv;
+	void __iomem *xpcs;
+	u32 val;
+
+	xpcs = priv->iface_base + MVPP22_XPCS_BASE(port->gop_id);
+
+	val = readl(xpcs + MVPP22_XPCS_CFG0);
+	val &= ~MVPP22_XPCS_CFG0_RESET_DIS;
+	writel(val, xpcs + MVPP22_XPCS_CFG0);
+
+	val = readl(xpcs + MVPP22_XPCS_CFG0);
+	val &= ~(MVPP22_XPCS_CFG0_PCS_MODE(0x3) |
+		 MVPP22_XPCS_CFG0_ACTIVE_LANE(0x3));
+	val |= MVPP22_XPCS_CFG0_ACTIVE_LANE(2);
+	writel(val, xpcs + MVPP22_XPCS_CFG0);
+
+	val = readl(xpcs + MVPP22_XPCS_CFG0);
+	val |= MVPP22_XPCS_CFG0_RESET_DIS;
+	writel(val, xpcs + MVPP22_XPCS_CFG0);
+}
+
 static void mvpp22_gop_init_10gkr(struct mvpp2_port *port)
 {
 	struct mvpp2 *priv = port->priv;
@@ -1065,6 +1089,9 @@ static int mvpp22_gop_init(struct mvpp2_port *port)
 	case PHY_INTERFACE_MODE_2500BASEX:
 		mvpp22_gop_init_sgmii(port);
 		break;
+	case PHY_INTERFACE_MODE_RXAUI:
+		mvpp22_gop_init_rxaui(port);
+		break;
 	case PHY_INTERFACE_MODE_10GKR:
 		if (port->gop_id != 0)
 			goto invalid_conf;
@@ -4567,6 +4594,7 @@ static void mvpp2_phylink_validate(struct phylink_config *config,
 	switch (state->interface) {
 	case PHY_INTERFACE_MODE_10GKR:
 	case PHY_INTERFACE_MODE_XAUI:
+	case PHY_INTERFACE_MODE_RXAUI:
 		if (port->gop_id != 0)
 			goto empty_set;
 		break;
@@ -4589,6 +4617,7 @@ static void mvpp2_phylink_validate(struct phylink_config *config,
 	switch (state->interface) {
 	case PHY_INTERFACE_MODE_10GKR:
 	case PHY_INTERFACE_MODE_XAUI:
+	case PHY_INTERFACE_MODE_RXAUI:
 	case PHY_INTERFACE_MODE_NA:
 		if (port->gop_id == 0) {
 			phylink_set(mask, 10000baseT_Full);
@@ -4741,6 +4770,9 @@ static void mvpp2_xlg_config(struct mvpp2_port *port, unsigned int mode,
 		   MVPP22_XLG_CTRL4_EN_IDLE_CHECK);
 	ctrl4 |= MVPP22_XLG_CTRL4_FWD_FC | MVPP22_XLG_CTRL4_FWD_PFC;
 
+	if (state->interface == PHY_INTERFACE_MODE_RXAUI)
+		ctrl4 |= MVPP22_XLG_CTRL4_USE_XPCS;
+
 	if (old_ctrl0 != ctrl0)
 		writel(ctrl0, port->base + MVPP22_XLG_CTRL0_REG);
 	if (old_ctrl4 != ctrl4)
-- 
2.21.0


^ permalink raw reply related
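
For readers following the RXAUI patch above, the XPCS setup is a three-step read-modify-write sequence: assert reset, reprogram the mode and lane fields, then release reset. The following is only a userspace sketch of that ordering. The register is replaced by a plain variable, and the macro names and bit positions (CFG0_RESET_DIS, CFG0_PCS_MODE, CFG0_ACTIVE_LANE) are assumptions made for illustration, not taken from the driver header.

```c
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this sketch. */
#define CFG0_RESET_DIS      (1u << 0)
#define CFG0_PCS_MODE(n)    ((uint32_t)(n) << 3)
#define CFG0_ACTIVE_LANE(n) ((uint32_t)(n) << 5)

/* Stand-in for the memory-mapped XPCS configuration register. */
static uint32_t fake_cfg0;

static uint32_t reg_read(void)
{
	return fake_cfg0;
}

static void reg_write(uint32_t val)
{
	fake_cfg0 = val;
}

static void xpcs_init_rxaui_sketch(void)
{
	uint32_t val;

	/* Step 1: clear the "reset disable" bit, i.e. put the block in reset. */
	val = reg_read();
	val &= ~CFG0_RESET_DIS;
	reg_write(val);

	/* Step 2: clear both fields, then program lane field value 2. */
	val = reg_read();
	val &= ~(CFG0_PCS_MODE(0x3) | CFG0_ACTIVE_LANE(0x3));
	val |= CFG0_ACTIVE_LANE(2);
	reg_write(val);

	/* Step 3: set "reset disable" again, i.e. take the block out of reset. */
	val = reg_read();
	val |= CFG0_RESET_DIS;
	reg_write(val);
}
```

Bits outside the two fields and the reset bit are preserved, which is the point of using read-modify-write rather than writing a constant.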

* [PATCH v2 net-next 2/2] net: mvpp2: support multiple comphy lanes
From: Matt Pelland @ 2019-08-08 23:06 UTC (permalink / raw)
  To: netdev; +Cc: Matt Pelland, davem, maxime.chevallier, antoine.tenart
In-Reply-To: <20190808230606.7900-1-mpelland@starry.com>

mvpp 2.2 supports RXAUI, which requires two serdes lanes, and XAUI, which
requires four serdes lanes, instead of the usual single lane required by other
interface modes. This patch expands the number of lanes that can be associated
with a port so that all relevant serdes lanes are correctly configured at the
appropriate times when either RXAUI or XAUI is in use.

Signed-off-by: Matt Pelland <mpelland@starry.com>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |  7 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 97 ++++++++++++++-----
 2 files changed, 77 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index 256e7c796631..d74f458ca099 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -655,6 +655,11 @@
 #define MVPP2_F_LOOPBACK		BIT(0)
 #define MVPP2_F_DT_COMPAT		BIT(1)
 
+/* MVPP22 supports RXAUI which requires two comphy lanes and XAUI which
+ * requires four comphy lanes. All other modes require one.
+ */
+#define MVPP22_MAX_COMPHYS		4
+
 /* Marvell tag types */
 enum mvpp2_tag_type {
 	MVPP2_TAG_TYPE_NONE = 0,
@@ -935,7 +940,7 @@ struct mvpp2_port {
 	phy_interface_t phy_interface;
 	struct phylink *phylink;
 	struct phylink_config phylink_config;
-	struct phy *comphy;
+	struct phy *comphys[MVPP22_MAX_COMPHYS];
 
 	struct mvpp2_bm_pool *pool_long;
 	struct mvpp2_bm_pool *pool_short;
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 1a5037a398fc..100972703f60 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -1200,17 +1200,40 @@ static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
  */
 static int mvpp22_comphy_init(struct mvpp2_port *port)
 {
-	int ret;
+	int i, ret;
 
-	if (!port->comphy)
-		return 0;
+	for (i = 0; i < ARRAY_SIZE(port->comphys); i++) {
+		if (!port->comphys[i])
+			return 0;
 
-	ret = phy_set_mode_ext(port->comphy, PHY_MODE_ETHERNET,
-			       port->phy_interface);
-	if (ret)
-		return ret;
+		ret = phy_set_mode_ext(port->comphys[i],
+				       PHY_MODE_ETHERNET,
+				       port->phy_interface);
+		if (ret)
+			return ret;
+
+		ret = phy_power_on(port->comphys[i]);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int mvpp22_comphy_deinit(struct mvpp2_port *port)
+{
+	int i, ret;
+
+	for (i = 0; i < ARRAY_SIZE(port->comphys); i++) {
+		if (!port->comphys[i])
+			return 0;
+
+		ret = phy_power_off(port->comphys[i]);
+		if (ret)
+			return ret;
+	}
 
-	return phy_power_on(port->comphy);
+	return 0;
 }
 
 static void mvpp2_port_enable(struct mvpp2_port *port)
@@ -3389,7 +3412,9 @@ static void mvpp2_stop_dev(struct mvpp2_port *port)
 
 	if (port->phylink)
 		phylink_stop(port->phylink);
-	phy_power_off(port->comphy);
+
+	if (port->priv->hw_version == MVPP22)
+		mvpp22_comphy_deinit(port);
 }
 
 static int mvpp2_check_ringparam_valid(struct net_device *dev,
@@ -4946,7 +4971,7 @@ static void mvpp2_mac_config(struct phylink_config *config, unsigned int mode,
 		port->phy_interface = state->interface;
 
 		/* Reconfigure the serdes lanes */
-		phy_power_off(port->comphy);
+		mvpp22_comphy_deinit(port);
 		mvpp22_mode_reconfigure(port);
 	}
 
@@ -5037,20 +5062,18 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 			    struct fwnode_handle *port_fwnode,
 			    struct mvpp2 *priv)
 {
-	struct phy *comphy = NULL;
-	struct mvpp2_port *port;
-	struct mvpp2_port_pcpu *port_pcpu;
+	unsigned int ntxqs, nrxqs, ncomphys, nrequired_comphys, thread;
 	struct device_node *port_node = to_of_node(port_fwnode);
+	struct mvpp2_port_pcpu *port_pcpu;
 	netdev_features_t features;
-	struct net_device *dev;
 	struct phylink *phylink;
-	char *mac_from = "";
-	unsigned int ntxqs, nrxqs, thread;
+	struct mvpp2_port *port;
 	unsigned long flags = 0;
+	struct net_device *dev;
+	int err, i, phy_mode;
+	char *mac_from = "";
 	bool has_tx_irqs;
 	u32 id;
-	int phy_mode;
-	int err, i;
 
 	has_tx_irqs = mvpp2_port_has_irqs(priv, port_node, &flags);
 	if (!has_tx_irqs && queue_mode == MVPP2_QDIST_MULTI_MODE) {
@@ -5084,14 +5107,38 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		goto err_free_netdev;
 	}
 
+	port = netdev_priv(dev);
+
 	if (port_node) {
-		comphy = devm_of_phy_get(&pdev->dev, port_node, NULL);
-		if (IS_ERR(comphy)) {
-			if (PTR_ERR(comphy) == -EPROBE_DEFER) {
-				err = -EPROBE_DEFER;
-				goto err_free_netdev;
+		for (i = 0, ncomphys = 0; i < ARRAY_SIZE(port->comphys); i++) {
+			port->comphys[i] = devm_of_phy_get_by_index(&pdev->dev,
+								    port_node,
+								    i);
+			if (IS_ERR(port->comphys[i])) {
+				err = PTR_ERR(port->comphys[i]);
+				port->comphys[i] = NULL;
+				if (err == -EPROBE_DEFER)
+					goto err_free_netdev;
+				err = 0;
+				break;
 			}
-			comphy = NULL;
+
+			++ncomphys;
+		}
+
+		if (phy_mode == PHY_INTERFACE_MODE_XAUI)
+			nrequired_comphys = 4;
+		else if (phy_mode == PHY_INTERFACE_MODE_RXAUI)
+			nrequired_comphys = 2;
+		else
+			nrequired_comphys = 1;
+
+		if (ncomphys < nrequired_comphys) {
+			dev_err(&pdev->dev,
+				"not enough comphys to support %s\n",
+				phy_modes(phy_mode));
+			err = -EINVAL;
+			goto err_free_netdev;
 		}
 	}
 
@@ -5106,7 +5153,6 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 	dev->netdev_ops = &mvpp2_netdev_ops;
 	dev->ethtool_ops = &mvpp2_eth_tool_ops;
 
-	port = netdev_priv(dev);
 	port->dev = dev;
 	port->fwnode = port_fwnode;
 	port->has_phy = !!of_find_property(port_node, "phy", NULL);
@@ -5143,7 +5189,6 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 
 	port->of_node = port_node;
 	port->phy_interface = phy_mode;
-	port->comphy = comphy;
 
 	if (priv->hw_version == MVPP21) {
 		port->base = devm_platform_ioremap_resource(pdev, 2 + id);
-- 
2.21.0


^ permalink raw reply related
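
The lane accounting in the patch above (count the comphys that were found, then compare against what the interface mode needs) can be sketched in isolation as follows. This is a standalone illustration, not driver code: enum iface_mode, struct lane, MAX_LANES and both helper names are hypothetical stand-ins for the kernel types.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES 4

/* Hypothetical stand-ins for phy_interface_t and struct phy. */
enum iface_mode { MODE_SGMII, MODE_10GKR, MODE_RXAUI, MODE_XAUI };
struct lane { int id; };

/* XAUI needs four lanes, RXAUI two, everything else one. */
static unsigned int lanes_required(enum iface_mode mode)
{
	switch (mode) {
	case MODE_XAUI:
		return 4;
	case MODE_RXAUI:
		return 2;
	default:
		return 1;
	}
}

/* Count leading non-NULL entries and compare with the requirement. */
static bool lanes_sufficient(struct lane *lanes[MAX_LANES],
			     enum iface_mode mode)
{
	unsigned int found = 0;
	size_t i;

	for (i = 0; i < MAX_LANES; i++) {
		if (!lanes[i])
			break;
		found++;
	}

	return found >= lanes_required(mode);
}
```

As in the patch, the scan stops at the first missing lane, so a gap in the array counts as the end of the list.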

* [PATCH net-next v5 2/3] net: phy: add support for clause 37 auto-negotiation
From: Tao Ren @ 2019-08-08 23:48 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Heiner Kallweit, David S . Miller,
	Arun Parameswaran, Justin Chen, Vladimir Oltean, netdev,
	linux-kernel, openbmc

From: Heiner Kallweit <hkallweit1@gmail.com>

This patch adds support for clause 37 1000Base-X auto-negotiation.
It's compile-tested only as I don't have fiber equipment.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 139 +++++++++++++++++++++++++++++++++++
 include/linux/phy.h          |   5 ++
 2 files changed, 144 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 252a712d1b2b..7c5315302937 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1617,6 +1617,40 @@ static int genphy_config_advert(struct phy_device *phydev)
 	return changed;
 }
 
+/**
+ * genphy_c37_config_advert - sanitize and advertise auto-negotiation parameters
+ * @phydev: target phy_device struct
+ *
+ * Description: Writes MII_ADVERTISE with the appropriate values,
+ *   after sanitizing the values to make sure we only advertise
+ *   what is supported.  Returns < 0 on error, 0 if the PHY's advertisement
+ *   hasn't changed, and > 0 if it has changed. This function is intended
+ *   for Clause 37 1000Base-X mode.
+ */
+static int genphy_c37_config_advert(struct phy_device *phydev)
+{
+	u16 adv = 0;
+
+	/* Only allow advertising what this PHY supports */
+	linkmode_and(phydev->advertising, phydev->advertising,
+		     phydev->supported);
+
+	if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
+			      phydev->advertising))
+		adv |= ADVERTISE_1000XFULL;
+	if (linkmode_test_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+			      phydev->advertising))
+		adv |= ADVERTISE_1000XPAUSE;
+	if (linkmode_test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+			      phydev->advertising))
+		adv |= ADVERTISE_1000XPSE_ASYM;
+
+	return phy_modify_changed(phydev, MII_ADVERTISE,
+				  ADVERTISE_1000XFULL | ADVERTISE_1000XPAUSE |
+				  ADVERTISE_1000XHALF | ADVERTISE_1000XPSE_ASYM,
+				  adv);
+}
+
 /**
  * genphy_config_eee_advert - disable unwanted eee mode advertisement
  * @phydev: target phy_device struct
@@ -1726,6 +1760,54 @@ int genphy_config_aneg(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(genphy_config_aneg);
 
+/**
+ * genphy_c37_config_aneg - restart auto-negotiation or write BMCR
+ * @phydev: target phy_device struct
+ *
+ * Description: If auto-negotiation is enabled, we configure the
+ *   advertising, and then restart auto-negotiation.  If it is not
+ *   enabled, then we write the BMCR. This function is intended
+ *   for use with Clause 37 1000Base-X mode.
+ */
+int genphy_c37_config_aneg(struct phy_device *phydev)
+{
+	int err, changed;
+
+	if (AUTONEG_ENABLE != phydev->autoneg)
+		return genphy_setup_forced(phydev);
+
+	err = phy_modify(phydev, MII_BMCR, BMCR_SPEED1000 | BMCR_SPEED100,
+			 BMCR_SPEED1000);
+	if (err)
+		return err;
+
+	changed = genphy_c37_config_advert(phydev);
+	if (changed < 0) /* error */
+		return changed;
+
+	if (!changed) {
+		/* Advertisement hasn't changed, but maybe aneg was never on to
+		 * begin with?  Or maybe phy was isolated?
+		 */
+		int ctl = phy_read(phydev, MII_BMCR);
+
+		if (ctl < 0)
+			return ctl;
+
+		if (!(ctl & BMCR_ANENABLE) || (ctl & BMCR_ISOLATE))
+			changed = 1; /* do restart aneg */
+	}
+
+	/* Only restart aneg if we are advertising something different
+	 * than we were before.
+	 */
+	if (changed > 0)
+		return genphy_restart_aneg(phydev);
+
+	return 0;
+}
+EXPORT_SYMBOL(genphy_c37_config_aneg);
+
 /**
  * genphy_aneg_done - return auto-negotiation status
  * @phydev: target phy_device struct
@@ -1864,6 +1946,63 @@ int genphy_read_status(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(genphy_read_status);
 
+/**
+ * genphy_c37_read_status - check the link status and update current link state
+ * @phydev: target phy_device struct
+ *
+ * Description: Check the link, then figure out the current state
+ *   by comparing what we advertise with what the link partner
+ *   advertises. This function is for Clause 37 1000Base-X mode.
+ */
+int genphy_c37_read_status(struct phy_device *phydev)
+{
+	int lpa, err, old_link = phydev->link;
+
+	/* Update the link, but return if there was an error */
+	err = genphy_update_link(phydev);
+	if (err)
+		return err;
+
+	/* why bother the PHY if nothing can have changed */
+	if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link)
+		return 0;
+
+	phydev->duplex = DUPLEX_UNKNOWN;
+	phydev->pause = 0;
+	phydev->asym_pause = 0;
+
+	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
+		lpa = phy_read(phydev, MII_LPA);
+		if (lpa < 0)
+			return lpa;
+
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+				 phydev->lp_advertising, lpa & LPA_LPACK);
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
+				 phydev->lp_advertising, lpa & LPA_1000XFULL);
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+				 phydev->lp_advertising, lpa & LPA_1000XPAUSE);
+		linkmode_mod_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+				 phydev->lp_advertising,
+				 lpa & LPA_1000XPAUSE_ASYM);
+
+		phy_resolve_aneg_linkmode(phydev);
+	} else if (phydev->autoneg == AUTONEG_DISABLE) {
+		int bmcr = phy_read(phydev, MII_BMCR);
+
+		if (bmcr < 0)
+			return bmcr;
+
+		if (bmcr & BMCR_FULLDPLX)
+			phydev->duplex = DUPLEX_FULL;
+		else
+			phydev->duplex = DUPLEX_HALF;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(genphy_c37_read_status);
+
 /**
  * genphy_soft_reset - software reset the PHY via BMCR_RESET bit
  * @phydev: target phy_device struct
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 462b90b73f93..81a2921512ee 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1077,6 +1077,11 @@ int genphy_suspend(struct phy_device *phydev);
 int genphy_resume(struct phy_device *phydev);
 int genphy_loopback(struct phy_device *phydev, bool enable);
 int genphy_soft_reset(struct phy_device *phydev);
+
+/* Clause 37 */
+int genphy_c37_config_aneg(struct phy_device *phydev);
+int genphy_c37_read_status(struct phy_device *phydev);
+
 static inline int genphy_no_soft_reset(struct phy_device *phydev)
 {
 	return 0;
-- 
2.17.1


^ permalink raw reply related
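
To illustrate what genphy_c37_config_advert() in the patch above computes, here is a userspace sketch of the same masking logic: build the 1000Base-X advertisement bits from the requested modes, replace only those bits in the old register value, and report whether anything changed. The bit values are believed to match the ADVERTISE_1000X* definitions in the kernel's mii.h, but treat them as assumptions here. struct c37_adv and c37_encode_advert() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's mii.h values. */
#define ADVERTISE_1000XFULL     0x0020
#define ADVERTISE_1000XHALF     0x0040
#define ADVERTISE_1000XPAUSE    0x0080
#define ADVERTISE_1000XPSE_ASYM 0x0100

/* Hypothetical summary of the requested link modes. */
struct c37_adv {
	bool full;
	bool pause;
	bool asym_pause;
};

/* Returns the new register value; *changed says whether it differs. */
static uint16_t c37_encode_advert(uint16_t old, struct c37_adv want,
				  bool *changed)
{
	const uint16_t mask = ADVERTISE_1000XFULL | ADVERTISE_1000XPAUSE |
			      ADVERTISE_1000XHALF | ADVERTISE_1000XPSE_ASYM;
	uint16_t adv = 0;
	uint16_t val;

	if (want.full)
		adv |= ADVERTISE_1000XFULL;
	if (want.pause)
		adv |= ADVERTISE_1000XPAUSE;
	if (want.asym_pause)
		adv |= ADVERTISE_1000XPSE_ASYM;

	/* Half duplex is in the mask but never set, so it is always cleared. */
	val = (uint16_t)((old & ~mask) | adv);
	*changed = (val != old);

	return val;
}
```

The "changed" result is what lets the caller skip restarting auto-negotiation when the advertisement is already up to date.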

* Re: [v3,2/4] tools: bpftool: add net detach command to detach XDP on interface
From: Jakub Kicinski @ 2019-08-09  0:05 UTC (permalink / raw)
  To: Y Song
  Cc: Maciej Fijalkowski, Daniel T. Lee, Daniel Borkmann,
	Alexei Starovoitov, netdev
In-Reply-To: <CAH3MdRWeD+9Lmz+mJt3EnNkX8kbcyCW4sNgRindCiObnzAj-yQ@mail.gmail.com>

On Thu, 8 Aug 2019 12:52:11 -0700, Y Song wrote:
> > Ah ok. In this scenario if driver has a native xdp support we would be invoking
> > its ndo_bpf even if there's no prog currently attached and it wouldn't return
> > error value.
> >
> > Looking at dev_xdp_uninstall, setting driver's prog to NULL is being done only
> > when prog is attached. Maybe we should consider querying the driver in
> > dev_change_xdp_fd regardless of passed fd value? E.g. don't query only when
> > prog >= 0.
> >
> > I don't recall whether this was brought up previously.  
> 
> Thanks for explanation. I think return an error is better in
> such error cases. Otherwise, people mistakenly write wrong
> device name and they may think xdp is detached and it is
> actually not.
> 
> But this probably should not prevent
> this patch as it is more like a kernel issue.

Agreed, we'd probably need a flag here, similar to IF_NOEXIST to keep
backward compat.

^ permalink raw reply

* Re: general protection fault in tls_tx_records
From: Jakub Kicinski @ 2019-08-09  0:19 UTC (permalink / raw)
  To: syzbot
  Cc: ast, aviadye, borisp, bpf, daniel, davejwatson, davem,
	john.fastabend, kafai, linux-kernel, netdev, songliubraving,
	syzkaller-bugs, yhs
In-Reply-To: <000000000000216779058f9dc40e@google.com>

On Thu, 08 Aug 2019 09:44:06 -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    ce96e791 Add linux-next specific files for 20190731
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13ce4fd0600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fca5b9d53db6585c
> dashboard link: https://syzkaller.appspot.com/bug?extid=97d0cf528b9c8e9be7f4
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Looks like this was an old tree here, so most likely:

#syz fix: net/tls: partially revert fix transition through disconnect with close

^ permalink raw reply

* Re: KASAN: use-after-free Read in tls_wait_data
From: Jakub Kicinski @ 2019-08-09  0:20 UTC (permalink / raw)
  To: syzbot
  Cc: ast, aviadye, borisp, bpf, daniel, davejwatson, davem,
	john.fastabend, kafai, linux-kernel, netdev, songliubraving,
	syzkaller-bugs, yhs
In-Reply-To: <000000000000262820058f9dc474@google.com>

On Thu, 08 Aug 2019 09:44:07 -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    7b4980e0 Add linux-next specific files for 20190802
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=14a749b4600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7e1348afd44b5e02
> dashboard link: https://syzkaller.appspot.com/bug?extid=30c791a76814a3c6c9f9
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+30c791a76814a3c6c9f9@syzkaller.appspotmail.com

Also old tree, pretty confidently I can say:

#syz fix: net/tls: partially revert fix transition through disconnect with close

^ permalink raw reply

* [PATCHv2 net 0/2] Add netdev_level_ratelimited to avoid netdev msg flush
From: Hangbin Liu @ 2019-08-09  0:29 UTC (permalink / raw)
  To: netdev; +Cc: Joe Perches, Thomas Falcon, David S . Miller, Hangbin Liu
In-Reply-To: <20190801090347.8258-1-liuhangbin@gmail.com>

This patch set adds netdev_level_ratelimited to avoid flooding the log with
netdev messages. The second patch fixes the ibmveth message flood that occurs
when adding lots of (e.g. 2000) group memberships to one group at the same time.

In my testing, the

ibmveth 30000003 env3: h_multicast_ctrl rc=4 when adding an entry to the filter table

error appears when adding more than 256 memberships to one multicast group. I
haven't seen this issue with other drivers. It looks like an IBM driver issue
and needs to be fixed separately.

v2: add netdev_level_ratelimited as Joe Perches suggested

Hangbin Liu (2):
  netdevice.h: add netdev_level_ratelimited for netdevice
  ibmveth: use netdev_err_ratelimited when set_multicast_list

 drivers/net/ethernet/ibm/ibmveth.c |  5 ++-
 include/linux/netdevice.h          | 53 ++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 3 deletions(-)

-- 
2.19.2

^ permalink raw reply

* [PATCHv2 net 1/2] netdevice.h: add netdev_level_ratelimited for netdevice
From: Hangbin Liu @ 2019-08-09  0:29 UTC (permalink / raw)
  To: netdev; +Cc: Joe Perches, Thomas Falcon, David S . Miller, Hangbin Liu
In-Reply-To: <20190809002941.15341-1-liuhangbin@gmail.com>

Add netdev_level_ratelimited so we can use it in the future.
The code is copied from device.h.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 include/linux/netdevice.h | 53 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88292953aa6f..4e37065c6717 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4737,6 +4737,59 @@ do {								\
 #define netdev_info_once(dev, fmt, ...) \
 	netdev_level_once(KERN_INFO, dev, fmt, ##__VA_ARGS__)
 
+#define netdev_level_ratelimited(netdev_level, dev, fmt, ...)		\
+do {									\
+	static DEFINE_RATELIMIT_STATE(_rs,				\
+				      DEFAULT_RATELIMIT_INTERVAL,	\
+				      DEFAULT_RATELIMIT_BURST);		\
+	if (__ratelimit(&_rs))						\
+		netdev_level(dev, fmt, ##__VA_ARGS__);			\
+} while (0)
+
+#define netdev_emerg_ratelimited(dev, fmt, ...)				\
+	netdev_level_ratelimited(netdev_emerg, dev, fmt, ##__VA_ARGS__)
+#define netdev_alert_ratelimited(dev, fmt, ...)				\
+	netdev_level_ratelimited(netdev_alert, dev, fmt, ##__VA_ARGS__)
+#define netdev_crit_ratelimited(dev, fmt, ...)				\
+	netdev_level_ratelimited(netdev_crit, dev, fmt, ##__VA_ARGS__)
+#define netdev_err_ratelimited(dev, fmt, ...)				\
+	netdev_level_ratelimited(netdev_err, dev, fmt, ##__VA_ARGS__)
+#define netdev_warn_ratelimited(dev, fmt, ...)				\
+	netdev_level_ratelimited(netdev_warn, dev, fmt, ##__VA_ARGS__)
+#define netdev_notice_ratelimited(dev, fmt, ...)			\
+	netdev_level_ratelimited(netdev_notice, dev, fmt, ##__VA_ARGS__)
+#define netdev_info_ratelimited(dev, fmt, ...)				\
+	netdev_level_ratelimited(netdev_info, dev, fmt, ##__VA_ARGS__)
+#if defined(CONFIG_DYNAMIC_DEBUG)
+/* descriptor check is first to prevent flooding with "callbacks suppressed" */
+#define netdev_dbg_ratelimited(dev, fmt, ...)				\
+do {									\
+	static DEFINE_RATELIMIT_STATE(_rs,				\
+				      DEFAULT_RATELIMIT_INTERVAL,	\
+				      DEFAULT_RATELIMIT_BURST);		\
+	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt);			\
+	if (DYNAMIC_DEBUG_BRANCH(descriptor) &&				\
+	    __ratelimit(&_rs))						\
+		__dynamic_netdev_dbg(&descriptor, dev, dev_fmt(fmt),	\
+				     ##__VA_ARGS__);			\
+} while (0)
+#elif defined(DEBUG)
+#define netdev_dbg_ratelimited(dev, fmt, ...)				\
+do {									\
+	static DEFINE_RATELIMIT_STATE(_rs,				\
+				      DEFAULT_RATELIMIT_INTERVAL,	\
+				      DEFAULT_RATELIMIT_BURST);		\
+	if (__ratelimit(&_rs))						\
+		netdev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \
+} while (0)
+#else
+#define netdev_dbg_ratelimited(dev, fmt, ...)				\
+do {									\
+	if (0)								\
+		netdev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \
+} while (0)
+#endif
+
 #define MODULE_ALIAS_NETDEV(device) \
 	MODULE_ALIAS("netdev-" device)
 
-- 
2.19.2


^ permalink raw reply related
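
For readers unfamiliar with the burst/interval scheme that the *_ratelimited macros above rely on, the following userspace sketch shows the general idea: allow up to "burst" messages per "interval", count the rest as missed, and start a fresh window once the interval has passed. It is a simplified model, not the kernel's __ratelimit() implementation. struct ratelimit_sketch and both helpers are hypothetical, and time is passed in explicitly instead of being read from jiffies.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a rate-limit state. */
struct ratelimit_sketch {
	long interval;	/* window length, in arbitrary time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages allowed in the current window */
	int missed;	/* messages suppressed in the current window */
	bool started;	/* false until the first call */
};

static bool ratelimit_allow(struct ratelimit_sketch *rs, long now)
{
	if (!rs->started) {
		rs->begin = now;
		rs->started = true;
	}

	/* Window expired: start a new one and forget the old counters. */
	if (now - rs->begin > rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}

/* Convenience helper: how many of "attempts" calls at time "now" pass. */
static int count_allowed(struct ratelimit_sketch *rs, long now, int attempts)
{
	int i, allowed = 0;

	for (i = 0; i < attempts; i++)
		if (ratelimit_allow(rs, now))
			allowed++;

	return allowed;
}
```

With an interval of 5 and a burst of 10, fifteen back-to-back messages yield ten printed and five suppressed; a message arriving after the window has expired is printed again.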

* [PATCHv2 net 2/2] ibmveth: use netdev_err_ratelimited when set_multicast_list
From: Hangbin Liu @ 2019-08-09  0:29 UTC (permalink / raw)
  To: netdev; +Cc: Joe Perches, Thomas Falcon, David S . Miller, Hangbin Liu
In-Reply-To: <20190809002941.15341-1-liuhangbin@gmail.com>

When adding lots of (e.g. 3000) memberships to one multicast group on
ibmveth, the following error message floods our console log file:

8507    [  901.478251] ibmveth 30000003 env3: h_multicast_ctrl rc=4 when adding an entry to the filter table
...
1718386 [ 5636.808658] ibmveth 30000003 env3: h_multicast_ctrl rc=4 when adding an entry to the filter table

We got 1.5 million lines of messages in 1.3h. Let's replace netdev_err() with
netdev_err_ratelimited() to avoid this issue.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/ethernet/ibm/ibmveth.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index d654c234aaf7..138523ee5e84 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1446,9 +1446,8 @@ static void ibmveth_set_multicast_list(struct net_device *netdev)
 						   IbmVethMcastAddFilter,
 						   mcast_addr);
 			if (lpar_rc != H_SUCCESS) {
-				netdev_err(netdev, "h_multicast_ctrl rc=%ld "
-					   "when adding an entry to the filter "
-					   "table\n", lpar_rc);
+				netdev_err_ratelimited(netdev, "h_multicast_ctrl rc=%ld when adding an entry to the filter table\n",
+						       lpar_rc);
 			}
 		}

-- 
2.19.2

^ permalink raw reply related

* [PATCH v3] tools: bpftool: fix reading from /proc/config.gz
From: Peter Wu @ 2019-08-09  0:39 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, Stanislav Fomichev, Jakub Kicinski, Quentin Monnet

/proc/config has never existed as far as I can see, but /proc/config.gz
is present on Arch Linux. Add support for decompressing config.gz using
zlib which is a mandatory dependency of libelf. Replace existing stdio
functions with gzFile operations since the latter transparently handles
uncompressed and gzip-compressed files.

Cc: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 v3: replace popen(gunzip) by linking directly to zlib. Reword commit
     message, remove "Fixes" line. (this patch)
 v2: fix style (reorder vars as reverse xmas tree, rename function,
     braces), fallback to /proc/config.gz if uname() fails.
     https://lkml.kernel.org/r/20190806010702.3303-1-peter@lekensteyn.nl
 v1: https://lkml.kernel.org/r/20190805001541.8096-1-peter@lekensteyn.nl

Hi,

Thanks to Jakub for observing that zlib is already used by libelf. This
simplifies the patch tremendously, as the same API can be used for both
compressed and uncompressed files. No special case exists anymore for
fclose/pclose.

According to configure.ac in elfutils, zlib is mandatory, so I just
assume it to be available. For simplicity I also silently assume lines
to be less than 4096 characters. If that is not the case, then lines
will appear truncated, but that should not be an issue for the
CONFIG_xyz lines that we are scanning for.

Jakub requested the handle leak fix to be posted separately against the
bpf tree, but since the whole code is rewritten I am not sure if it is
worth it. It is an unusual edge case: /boot/config-$(uname -r) could be
opened, but starts with unexpected data.

Kind regards,
Peter
---
 tools/bpf/bpftool/Makefile  |   2 +-
 tools/bpf/bpftool/feature.c | 105 ++++++++++++++++++------------------
 2 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index a7afea4dec47..078bd0dcfba5 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -52,7 +52,7 @@ ifneq ($(EXTRA_LDFLAGS),)
 LDFLAGS += $(EXTRA_LDFLAGS)
 endif
 
-LIBS = -lelf $(LIBBPF)
+LIBS = -lelf -lz $(LIBBPF)
 
 INSTALL ?= install
 RM ?= rm -f
diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c
index d672d9086fff..03bdc5b3ac49 100644
--- a/tools/bpf/bpftool/feature.c
+++ b/tools/bpf/bpftool/feature.c
@@ -14,6 +14,7 @@
 
 #include <bpf.h>
 #include <libbpf.h>
+#include <zlib.h>
 
 #include "main.h"
 
@@ -284,34 +285,32 @@ static void probe_jit_limit(void)
 	}
 }
 
-static char *get_kernel_config_option(FILE *fd, const char *option)
+static bool read_next_kernel_config_option(gzFile file, char *buf, size_t n,
+					   char **value)
 {
-	size_t line_n = 0, optlen = strlen(option);
-	char *res, *strval, *line = NULL;
-	ssize_t n;
+	char *sep;
 
-	rewind(fd);
-	while ((n = getline(&line, &line_n, fd)) > 0) {
-		if (strncmp(line, option, optlen))
+	while (gzgets(file, buf, n)) {
+		if (strncmp(buf, "CONFIG_", 7))
 			continue;
-		/* Check we have at least '=', value, and '\n' */
-		if (strlen(line) < optlen + 3)
-			continue;
-		if (*(line + optlen) != '=')
+
+		sep = strchr(buf, '=');
+		if (!sep)
 			continue;
 
 		/* Trim ending '\n' */
-		line[strlen(line) - 1] = '\0';
+		buf[strlen(buf) - 1] = '\0';
+
+		/* Split on '=' and ensure that a value is present. */
+		*sep = '\0';
+		if (!sep[1])
+			continue;
 
-		/* Copy and return config option value */
-		strval = line + optlen + 1;
-		res = strdup(strval);
-		free(line);
-		return res;
+		*value = sep + 1;
+		return true;
 	}
-	free(line);
 
-	return NULL;
+	return false;
 }
 
 static void probe_kernel_image_config(void)
@@ -386,59 +385,61 @@ static void probe_kernel_image_config(void)
 		/* test_bpf module for BPF tests */
 		"CONFIG_TEST_BPF",
 	};
-	char *value, *buf = NULL;
+	char *values[ARRAY_SIZE(options)] = { };
 	struct utsname utsn;
 	char path[PATH_MAX];
-	size_t i, n;
-	ssize_t ret;
-	FILE *fd;
+	gzFile file = NULL;
+	char buf[4096];
+	char *value;
+	size_t i;
 
-	if (uname(&utsn))
-		goto no_config;
+	if (!uname(&utsn)) {
+		snprintf(path, sizeof(path), "/boot/config-%s", utsn.release);
 
-	snprintf(path, sizeof(path), "/boot/config-%s", utsn.release);
+		/* gzopen also accepts uncompressed files. */
+		file = gzopen(path, "r");
+	}
 
-	fd = fopen(path, "r");
-	if (!fd && errno == ENOENT) {
-		/* Some distributions put the config file at /proc/config, give
-		 * it a try.
-		 * Sometimes it is also at /proc/config.gz but we do not try
-		 * this one for now, it would require linking against libz.
+	if (!file) {
+		/* Some distributions build with CONFIG_IKCONFIG=y and put the
+		 * config file at /proc/config.gz.
 		 */
-		fd = fopen("/proc/config", "r");
+		file = gzopen("/proc/config.gz", "r");
 	}
-	if (!fd) {
+	if (!file) {
 		p_info("skipping kernel config, can't open file: %s",
 		       strerror(errno));
-		goto no_config;
+		goto end_parse;
 	}
 	/* Sanity checks */
-	ret = getline(&buf, &n, fd);
-	ret = getline(&buf, &n, fd);
-	if (!buf || !ret) {
+	if (!gzgets(file, buf, sizeof(buf)) ||
+	    !gzgets(file, buf, sizeof(buf))) {
 		p_info("skipping kernel config, can't read from file: %s",
 		       strerror(errno));
-		free(buf);
-		goto no_config;
+		goto end_parse;
 	}
 	if (strcmp(buf, "# Automatically generated file; DO NOT EDIT.\n")) {
 		p_info("skipping kernel config, can't find correct file");
-		free(buf);
-		goto no_config;
+		goto end_parse;
 	}
-	free(buf);
 
-	for (i = 0; i < ARRAY_SIZE(options); i++) {
-		value = get_kernel_config_option(fd, options[i]);
-		print_kernel_option(options[i], value);
-		free(value);
+	while (read_next_kernel_config_option(file, buf, sizeof(buf), &value)) {
+		for (i = 0; i < ARRAY_SIZE(options); i++) {
+			if (values[i] || strcmp(buf, options[i]))
+				continue;
+
+			values[i] = strdup(value);
+		}
 	}
-	fclose(fd);
-	return;
 
-no_config:
-	for (i = 0; i < ARRAY_SIZE(options); i++)
-		print_kernel_option(options[i], NULL);
+end_parse:
+	if (file)
+		gzclose(file);
+
+	for (i = 0; i < ARRAY_SIZE(options); i++) {
+		print_kernel_option(options[i], values[i]);
+		free(values[i]);
+	}
 }
 
 static bool probe_bpf_syscall(const char *define_prefix)
-- 
2.22.0


^ permalink raw reply related
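
The per-line parsing done by read_next_kernel_config_option() in the patch above can be tried out without zlib. The sketch below handles a single line already held in a buffer: it accepts only lines starting with "CONFIG_", splits on '=', and rejects empty values. split_config_line() is a hypothetical name. Unlike the patch, it strips the trailing newline only if one is present, which is a deliberate difference for this sketch so that truncated lines keep their last character.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "CONFIG_FOO=value\n" in place. On success, line holds the option
 * name, *value points at the value, and true is returned.
 */
static bool split_config_line(char *line, char **value)
{
	char *sep;
	size_t len;

	/* Skip comments such as "# CONFIG_FOO is not set". */
	if (strncmp(line, "CONFIG_", 7))
		return false;

	sep = strchr(line, '=');
	if (!sep)
		return false;

	/* Trim the trailing newline, if any. */
	len = strlen(line);
	if (len && line[len - 1] == '\n')
		line[len - 1] = '\0';

	/* Split on '=' and make sure a value is present. */
	*sep = '\0';
	if (!sep[1])
		return false;

	*value = sep + 1;
	return true;
}
```

Because the split happens in place, the caller has to copy the value (the patch uses strdup()) before the buffer is reused for the next line.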

* Re: memory leak in internal_dev_create
From: Marcelo Ricardo Leitner @ 2019-08-09  0:45 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: Hillf Danton, syzbot, David S. Miller, ovs dev, linux-kernel,
	Linux Kernel Network Developers, syzkaller-bugs
In-Reply-To: <CAOrHB_BmuAxdch-nbaTS-1eXN-0goUb5UXtYDr==0KeM9vVsRw@mail.gmail.com>

On Wed, Aug 07, 2019 at 01:32:40PM -0700, Pravin Shelar wrote:
> On Tue, Aug 6, 2019 at 5:00 AM Hillf Danton <hdanton@sina.com> wrote:
> >
> >
> > On Tue, 06 Aug 2019 01:58:05 -0700
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> 
> ...
> > > BUG: memory leak
> > > unreferenced object 0xffff8881228ca500 (size 128):
> > >    comm "syz-executor032", pid 7015, jiffies 4294944622 (age 7.880s)
> > >    hex dump (first 32 bytes):
> > >      00 f0 27 18 81 88 ff ff 80 ac 8c 22 81 88 ff ff  ..'........"....
> > >      40 b7 23 17 81 88 ff ff 00 00 00 00 00 00 00 00  @.#.............
> > >    backtrace:
> > >      [<000000000eb78212>] kmemleak_alloc_recursive  include/linux/kmemleak.h:43 [inline]
> > >      [<000000000eb78212>] slab_post_alloc_hook mm/slab.h:522 [inline]
> > >      [<000000000eb78212>] slab_alloc mm/slab.c:3319 [inline]
> > >      [<000000000eb78212>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
> > >      [<00000000006ea6c6>] kmalloc include/linux/slab.h:552 [inline]
> > >      [<00000000006ea6c6>] kzalloc include/linux/slab.h:748 [inline]
> > >      [<00000000006ea6c6>] ovs_vport_alloc+0x37/0xf0  net/openvswitch/vport.c:130
> > >      [<00000000f9a04a7d>] internal_dev_create+0x24/0x1d0  net/openvswitch/vport-internal_dev.c:164
> > >      [<0000000056ee7c13>] ovs_vport_add+0x81/0x190  net/openvswitch/vport.c:199
> > >      [<000000005434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
> > >      [<00000000b7b253f1>] ovs_dp_cmd_new+0x22f/0x410  net/openvswitch/datapath.c:1614
> > >      [<00000000e0988518>] genl_family_rcv_msg+0x2ab/0x5b0  net/netlink/genetlink.c:629
> > >      [<00000000d0cc9347>] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
> > >      [<000000006694b647>] netlink_rcv_skb+0x61/0x170  net/netlink/af_netlink.c:2477
> > >      [<0000000088381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
> > >      [<00000000dad42a47>] netlink_unicast_kernel  net/netlink/af_netlink.c:1302 [inline]
> > >      [<00000000dad42a47>] netlink_unicast+0x1ec/0x2d0  net/netlink/af_netlink.c:1328
> > >      [<0000000067e6b079>] netlink_sendmsg+0x270/0x480  net/netlink/af_netlink.c:1917
> > >      [<00000000aab08a47>] sock_sendmsg_nosec net/socket.c:637 [inline]
> > >      [<00000000aab08a47>] sock_sendmsg+0x54/0x70 net/socket.c:657
> > >      [<000000004cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
> > >      [<00000000c4901c63>] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
> > >      [<00000000c10abb2d>] __do_sys_sendmsg net/socket.c:2365 [inline]
> > >      [<00000000c10abb2d>] __se_sys_sendmsg net/socket.c:2363 [inline]
> > >      [<00000000c10abb2d>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363
> >
> >
> > Always free vport manually unless register_netdevice() succeeds.
> >
> > --- a/net/openvswitch/vport-internal_dev.c
> > +++ b/net/openvswitch/vport-internal_dev.c
> > @@ -137,7 +137,7 @@ static void do_setup(struct net_device *
> >         netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_OPENVSWITCH |
> >                               IFF_NO_QUEUE;
> >         netdev->needs_free_netdev = true;
> > -       netdev->priv_destructor = internal_dev_destructor;
> > +       netdev->priv_destructor = NULL;
> >         netdev->ethtool_ops = &internal_dev_ethtool_ops;
> >         netdev->rtnl_link_ops = &internal_dev_link_ops;
> >
> > @@ -159,7 +159,6 @@ static struct vport *internal_dev_create
> >         struct internal_dev *internal_dev;
> >         struct net_device *dev;
> >         int err;
> > -       bool free_vport = true;
> >
> >         vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms);
> >         if (IS_ERR(vport)) {
> > @@ -190,10 +189,9 @@ static struct vport *internal_dev_create
> >
> >         rtnl_lock();
> >         err = register_netdevice(vport->dev);
> > -       if (err) {
> > -               free_vport = false;
> > +       if (err)
> >                 goto error_unlock;
> > -       }
> > +       vport->dev->priv_destructor = internal_dev_destructor;
> >
> I am not sure why you have moved this assignment out of do_setup().
> 
> Otherwise patch looks good to me.
> 
> Thanks.

Seems it's to avoid re-introducing the issue that was fixed by:

commit 309b66970ee2abf721ecd0876a48940fa0b99a35
Author: Taehee Yoo <ap420073@gmail.com>
Date:   Sun Jun 9 23:26:21 2019 +0900

    net: openvswitch: do not free vport if register_netdevice() is failed.

A "Fixes: 309b66970ee2a" tag would be welcome then.

> >         dev_set_promiscuity(vport->dev, 1);
> >         rtnl_unlock();
> > @@ -207,8 +205,7 @@ error_unlock:
> >  error_free_netdev:
> >         free_netdev(dev);
> >  error_free_vport:
> > -       if (free_vport)
> > -               ovs_vport_free(vport);
> > +       ovs_vport_free(vport);
> >  error:
> >         return ERR_PTR(err);
> >  }
> > --
> >
> 
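
For readers following the ownership question in this thread, here is a
minimal userspace model of the ordering in the quoted patch. It is only a
sketch: every model_* name is hypothetical, and it assumes that a failing
registration calls priv_destructor when one is set. That assumption is
inferred from the discussion above, not taken from the kernel source. The
free counter replaces a real free() so that a double free would trip the
assert instead of corrupting memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vport and struct net_device. */
struct model_vport {
	int free_count;
};

struct model_netdev {
	struct model_vport *vport;
	void (*priv_destructor)(struct model_netdev *dev);
};

/* Counts frees instead of releasing memory; asserts on a double free. */
static void model_vport_free(struct model_vport *vport)
{
	assert(vport->free_count == 0);
	vport->free_count++;
}

static void model_destructor(struct model_netdev *dev)
{
	model_vport_free(dev->vport);
}

/*
 * Assumption of this model: a failing registration runs priv_destructor
 * if one has been installed, then reports the error.
 */
static int model_register(struct model_netdev *dev, bool fail)
{
	if (fail) {
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		return -1;
	}
	return 0;
}

/*
 * Create path using the ordering from the quoted patch: the destructor is
 * installed only after registration succeeded, so the error path always
 * frees the vport itself. Returns how many times the vport was freed.
 */
static int model_create(bool fail)
{
	struct model_vport vport = { 0 };
	struct model_netdev dev = { &vport, NULL };

	if (model_register(&dev, fail)) {
		model_vport_free(&vport);
		return vport.free_count;
	}

	dev.priv_destructor = model_destructor;
	return vport.free_count;
}
```

In this model, installing the destructor before model_register() while
keeping the unconditional free in the error path would free the vport
twice on failure. Skipping the free instead would leak it whenever the
failure happens before the destructor is invoked.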

^ permalink raw reply

* [PATCH v2] tipc: initialise addr_trial_end when setting node addresses
From: Chris Packham @ 2019-08-09  0:54 UTC (permalink / raw)
  To: jon.maloy, ying.xue, davem
  Cc: netdev, tipc-discussion, linux-kernel, Chris Packham

Ensure addr_trial_end is set to jiffies when configuring the node
address. This ensures that we don't treat the initial value of 0 as
a wrapped jiffies value. This isn't a problem when using auto-generated
node addresses because addr_trial_end is updated for the duplicate
address detection phase.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
---
Changes in v2:
- move setting to tipc_set_node_addr() as suggested
- reword commit message

 net/tipc/addr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/addr.c b/net/tipc/addr.c
index b88d48d00913..0f1eaed1bd1b 100644
--- a/net/tipc/addr.c
+++ b/net/tipc/addr.c
@@ -75,6 +75,7 @@ void tipc_set_node_addr(struct net *net, u32 addr)
 		tipc_set_node_id(net, node_id);
 	}
 	tn->trial_addr = addr;
+	tn->addr_trial_end = jiffies;
 	pr_info("32-bit node address hash set to %x\n", addr);
 }
 
-- 
2.22.0
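
As an illustration of the wrap-around problem described in the commit
message, the sketch below models a wrap-safe "before" comparison on a
32-bit tick counter in userspace. It is an assumption that the TIPC code
compares jiffies against addr_trial_end in this way, and the model_* names
are hypothetical. The point is only that an end value left at 0 can look
like a deadline in the future when the counter is close to wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if tick value a lies before b, tolerating counter wrap-around.
 * The unsigned subtraction wraps modulo 2^32; the conversion to int32_t
 * relies on the usual two's-complement behaviour of common compilers.
 */
static bool model_ticks_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical check: is the trial period that ends at trial_end still running? */
static bool model_trial_pending(uint32_t now, uint32_t trial_end)
{
	return model_ticks_before(now, trial_end);
}

/* Self-check of the cases discussed in the text. */
static void model_check_examples(void)
{
	uint32_t near_wrap = 0xFFFFF000u;

	/* An uninitialised end of 0 looks like a future deadline near the wrap. */
	assert(model_trial_pending(near_wrap, 0u));

	/* Initialising the end to "now" means nothing is pending. */
	assert(!model_trial_pending(near_wrap, near_wrap));

	/* Far away from the wrap, an end of 0 is harmless. */
	assert(!model_trial_pending(100u, 0u));
}
```

Calling model_check_examples() exercises the three cases. The second one
corresponds to what the patch does by setting the end value to the current
tick count.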


^ permalink raw reply related

* [PATCH] net: phy: at803x: stop switching phy delay config needlessly
From: André Draszik @ 2019-08-09  0:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: André Draszik, Andrew Lunn, Florian Fainelli,
	Heiner Kallweit, David S. Miller, netdev

This driver does a funny dance disabling and re-enabling
RX and/or TX delays. In any of the RGMII-ID modes, it first
disables the delays, just to re-enable them again right
away. This looks like a needless exercise.

Just enable the respective delays when in any of the
relevant 'id' modes, and disable them otherwise.

Also, remove comments which don't add anything that can't be
seen by looking at the code.

Signed-off-by: André Draszik <git@andred.net>
CC: Andrew Lunn <andrew@lunn.ch>
CC: Florian Fainelli <f.fainelli@gmail.com>
CC: Heiner Kallweit <hkallweit1@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
---
 drivers/net/phy/at803x.c | 26 ++++++--------------------
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index 222ccd9ecfce..2ab51f552e92 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -257,35 +257,21 @@ static int at803x_config_init(struct phy_device *phydev)
 	 *   after HW reset: RX delay enabled and TX delay disabled
 	 *   after SW reset: RX delay enabled, while TX delay retains the
 	 *   value before reset.
-	 *
-	 * So let's first disable the RX and TX delays in PHY and enable
-	 * them based on the mode selected (this also takes care of RGMII
-	 * mode where we expect delays to be disabled)
 	 */
-
-	ret = at803x_disable_rx_delay(phydev);
-	if (ret < 0)
-		return ret;
-	ret = at803x_disable_tx_delay(phydev);
-	if (ret < 0)
-		return ret;
-
 	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID ||
 	    phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID) {
-		/* If RGMII_ID or RGMII_RXID are specified enable RX delay,
-		 * otherwise keep it disabled
-		 */
 		ret = at803x_enable_rx_delay(phydev);
-		if (ret < 0)
-			return ret;
+	} else {
+		ret = at803x_disable_rx_delay(phydev);
 	}
+	if (ret < 0)
+		return ret;
 
 	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID ||
 	    phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID) {
-		/* If RGMII_ID or RGMII_TXID are specified enable TX delay,
-		 * otherwise keep it disabled
-		 */
 		ret = at803x_enable_tx_delay(phydev);
+	} else {
+		ret = at803x_disable_tx_delay(phydev);
 	}
 
 	return ret;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2] dpaa_eth: Use refcount_t for refcount
From: David Miller @ 2019-08-09  0:59 UTC (permalink / raw)
  To: hslester96; +Cc: madalin.bucur, netdev, linux-kernel
In-Reply-To: <20190802164759.20135-1-hslester96@gmail.com>

From: Chuhong Yuan <hslester96@gmail.com>
Date: Sat,  3 Aug 2019 00:47:59 +0800

> refcount_t is better for reference counters since its
> implementation can prevent overflows.
> So convert atomic_t ref counters to refcount_t.
> 
> Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
> ---
> Changes in v2:
>   - Add #include in dpaa_eth.h.

Applied to net-next.
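
To illustrate the overflow argument in the quoted commit message, here is
a simplified, single-threaded model of a saturating reference counter.
It is not the kernel's refcount_t: the model_* names and the choice of
UINT_MAX as the saturation value are assumptions of this sketch, and
there are no atomic operations. The idea shown is that a counter which
would overflow is pinned instead, so the object is leaked rather than
freed while still in use.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Saturation value chosen for this model only. */
#define MODEL_REFCOUNT_SATURATED UINT_MAX

struct model_refcount {
	unsigned int refs;
};

/* Take a reference; once saturated the counter stays pinned. */
static void model_refcount_inc(struct model_refcount *r)
{
	if (r->refs == MODEL_REFCOUNT_SATURATED)
		return;
	r->refs++;
}

/*
 * Drop a reference. Returns true when the last reference was dropped.
 * A saturated counter never reaches zero, so the object is never freed.
 */
static bool model_refcount_dec_and_test(struct model_refcount *r)
{
	if (r->refs == MODEL_REFCOUNT_SATURATED)
		return false;

	assert(r->refs > 0); /* dropping a reference that was never held */
	r->refs--;
	return r->refs == 0;
}
```

A plain wrapping counter would go back to a small value after enough
increments and could then reach zero while references are still held.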

^ permalink raw reply

* Re: [PATCH v2] mkiss: Use refcount_t for refcount
From: David Miller @ 2019-08-09  0:59 UTC (permalink / raw)
  To: hslester96; +Cc: netdev, linux-kernel
In-Reply-To: <20190802164821.20189-1-hslester96@gmail.com>

From: Chuhong Yuan <hslester96@gmail.com>
Date: Sat,  3 Aug 2019 00:48:21 +0800

> refcount_t is better for reference counters since its
> implementation can prevent overflows.
> So convert atomic_t ref counters to refcount_t.
> 
> Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
> ---
> Changes in v2:
>   - Add #include.

Applied to net-next.

^ permalink raw reply

* Re: [PATCH] xen/netback: Reset nr_frags before freeing skb
From: David Miller @ 2019-08-09  1:02 UTC (permalink / raw)
  To: ross.lagerwall; +Cc: netdev, xen-devel, paul.durrant, wei.liu
In-Reply-To: <20190805153434.12144-1-ross.lagerwall@citrix.com>

From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Mon, 5 Aug 2019 16:34:34 +0100

> At this point nr_frags has been incremented but the frag does not yet
> have a page assigned so freeing the skb results in a crash. Reset
> nr_frags before freeing the skb to prevent this.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Applied and queued up for -stable.

^ permalink raw reply
