Netdev List
* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-22  2:32 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Cornelia Huck, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <CADGSJ22B39JiS14L0yCYQ720-fZcwyUt1wCX1YO-6Y9FhmDKZg@mail.gmail.com>

On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> > On Wed, 20 Jun 2018 22:48:58 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
> >>
> >> It's mostly so we can have e.g. multiple devices with same MAC
> >> (which some people seem to want in order to then use
> >> them with different containers).
> >>
> >> But it is also handy for when you assign a PF, since then you
> >> can't set the MAC.
> >>
> >
> > OK, so what about the following:
> >
> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >   that we have a new uuid field in the virtio-net config space
> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >   offer VIRTIO_NET_F_STANDBY_UUID if set
> > - when configuring, set the property to the group UUID of the vfio-pci
> >   device
> 
> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
> to still expose UUID in the config space on virtio-pci?


Yes but guest is not supposed to read it.

> I'm not even sure if it's sane to expose group UUID on the PCI bridge
> where the corresponding vfio-pci device is attached for a guest which
> doesn't support the feature (legacy).
> 
> -Siwei

Yes but you won't add the primary behind such a bridge.

> 
> > - in the guest, use the uuid from the virtio-net device's config space
> >   if applicable; else, fall back to matching by MAC as done today
> >
> > That should work for all virtio transports.
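
The guest-side matching rule proposed above could be sketched as follows. This is only an illustrative user-space model in Python, not kernel or QEMU code: the `Device` record, its field names, and `find_primary` are hypothetical. The only behaviour taken from the thread is "use the UUID from the config space when VIRTIO_NET_F_STANDBY_UUID was negotiated, otherwise match by MAC".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Device:
    """Hypothetical view of a NIC as seen by the guest."""
    name: str
    mac: str
    uuid: Optional[str] = None  # group UUID, if one is exposed


def find_primary(standby: Device, uuid_negotiated: bool,
                 candidates: Sequence[Device]) -> Optional[Device]:
    """Pick the primary device to pair with a virtio-net standby."""
    if uuid_negotiated and standby.uuid is not None:
        # Feature bit negotiated: pair strictly by UUID.
        for dev in candidates:
            if dev.uuid == standby.uuid:
                return dev
        return None
    # Feature not negotiated: the UUID field is not read, fall back to MAC.
    for dev in candidates:
        if dev.mac.lower() == standby.mac.lower():
            return dev
    return None
```

With UUID matching, two candidates that share a MAC can still be told apart, which is the "multiple devices with same MAC" case mentioned earlier in the thread.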


* Re: [PATCH net] net: sungem: fix rx checksum support
From: Eric Dumazet @ 2018-06-22  4:20 UTC (permalink / raw)
  To: David Miller, edumazet; +Cc: netdev, mroos, malat, schwab, eric.dumazet
In-Reply-To: <20180620.143050.313454768369559179.davem@davemloft.net>



On 06/19/2018 10:30 PM, David Miller wrote:
> Tested-by: Andreas Schwab <schwab@linux-m68k.org>
> 
> Applied and queued up for -stable, thanks Eric.
> 

BTW, removing the FCS also means GRO is going to work, finally on this NIC ;)

GRO does not like packets with padding.


* Re: [PATCH 0/7] Assorted rhashtables cleanups.
From: David Miller @ 2018-06-22  4:43 UTC (permalink / raw)
  To: neilb; +Cc: tgraf, herbert, netdev, linux-kernel
In-Reply-To: <152929034948.23173.8671757672560065344.stgit@noble>

From: NeilBrown <neilb@suse.com>
Date: Mon, 18 Jun 2018 12:52:50 +1000

> Following 7 patches are selections from a recent RFC series I posted
> that have all received suitable Acks.
> 
> The most visible changes are that rhashtable-types.h is now preferred
> for inclusion in include/linux/*.h rather than rhashtable.h, and
> that the full hash is used - no bits are reserved for a NULLS pointer.

Series applied to net-next, thanks Neil.


* Re: [PATCH net-next] tcp: ignore rcv_rtt sample with old ts ecr value
From: David Miller @ 2018-06-22  4:45 UTC (permalink / raw)
  To: weiwan; +Cc: netdev, edumazet, ncardwell
In-Reply-To: <20180620044250.33943-1-tracywwnj@gmail.com>

From: Wei Wang <weiwan@google.com>
Date: Tue, 19 Jun 2018 21:42:50 -0700

> From: Wei Wang <weiwan@google.com>
> 
> When receiving multiple packets with the same ts ecr value, only try
> to compute rcv_rtt sample with the earliest received packet.
> This is because the rcv_rtt calculated by later received packets
> could possibly include long idle time or other types of delay.
> For example:
> (1) server sends last packet of reply with TS val V1
> (2) client ACKs last packet of reply with TS ecr V1
> (3) long idle time passes
> (4) client sends next request data packet with TS ecr V1 (again!)
> At this time, the rcv_rtt computed on server with TS ecr V1 will be
> inflated with the idle time and should get ignored.
> 
> Signed-off-by: Wei Wang <weiwan@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
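
The filtering rule described in the commit message can be modelled roughly as below. This is a simplified user-space sketch in Python, not the kernel patch itself: the class name, the field `last_tsecr`, and the shared millisecond clock are assumptions made for illustration, and timestamp wraparound is ignored.

```python
from typing import Optional


class RcvRttSampler:
    """Toy model: take at most one receiver-side RTT sample per TS ecr value."""

    def __init__(self) -> None:
        # Last echoed timestamp that produced a sample (hypothetical field).
        self.last_tsecr: Optional[int] = None

    def on_packet(self, now_ms: int, tsecr: int) -> Optional[int]:
        """Return an RTT sample in ms, or None if the packet is ignored."""
        if tsecr == self.last_tsecr:
            # Same TS ecr seen before: a later packet may include idle time.
            return None
        self.last_tsecr = tsecr
        # Assumes timestamps tick in ms on the same clock as now_ms.
        return now_ms - tsecr
```

In the scenario from the commit message, the ACK carrying TS ecr V1 yields a sample, while the request sent after the idle period with the same TS ecr V1 yields none.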


* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: David Miller @ 2018-06-22  4:49 UTC (permalink / raw)
  To: shannon.nelson; +Cc: netdev, anders.roxell
In-Reply-To: <1529473363-4036-1-git-send-email-shannon.nelson@oracle.com>

From: Shannon Nelson <shannon.nelson@oracle.com>
Date: Tue, 19 Jun 2018 22:42:41 -0700

> A couple of bad behaviors in the ipsec selftest were pointed out
> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
> 
> Shannon Nelson (2):
>   selftests: rtnetlink: hide complaint from terminated monitor
>   selftests: rtnetlink: use a local IP address for IPsec tests

Series applied, but I wonder about patch #2.

The idea is that we don't make modifications to the actual system
networking configuration and therefore make changes that can't
possibly disrupt connectivity for the system under test.

Using a configured local IP address seems to subvert that.


* Re: ISDN: use irqsave() in URB completion + usb_fill_int_urb
From: David Miller @ 2018-06-22  4:54 UTC (permalink / raw)
  To: bigeasy; +Cc: netdev, isdn, linux-usb, tglx
In-Reply-To: <20180620104028.18283-1-bigeasy@linutronix.de>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Wed, 20 Jun 2018 12:40:24 +0200

> This series is mostly about using _irqsave() primitives in the
> completion callback in order to get rid of local_irq_save() in
> __usb_hcd_giveback_urb(). While at it, I also tried to move drivers to
> use usb_fill_int_urb(); otherwise it is hard to find users of a certain API.

Series applied, thanks Sebastian.


* Re: [PATCH v2] ucc_geth: Add BQL support
From: David Miller @ 2018-06-22  4:56 UTC (permalink / raw)
  To: joakim.tjernlund; +Cc: leoyang.li, netdev
In-Reply-To: <20180620162918.3609-1-joakim.tjernlund@infinera.com>

From: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Date: Wed, 20 Jun 2018 18:29:18 +0200

> Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
> ---
> 
>  v2 - Reorder variables according to Dave
>       Add call to netdev_reset_queue(dev) open/close

Applied.


* Re: net/usb: Use irqsave in USB's complete callback
From: David Miller @ 2018-06-22  4:58 UTC (permalink / raw)
  To: bigeasy; +Cc: netdev, linux-usb, tglx
In-Reply-To: <20180620193121.24967-1-bigeasy@linutronix.de>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Wed, 20 Jun 2018 21:31:16 +0200

> This is about using _irqsave() primitives in the completion callback in
> order to get rid of local_irq_save() in __usb_hcd_giveback_urb().

Series applied.


* Re: [PATCH net-next] tcp_bbr: fix bbr pacing rate for internal pacing
From: David Miller @ 2018-06-22  5:04 UTC (permalink / raw)
  To: yyd; +Cc: netdev, edumazet
In-Reply-To: <20180620200735.82085-1-yyd@google.com>

From: Kevin Yang <yyd@google.com>
Date: Wed, 20 Jun 2018 16:07:35 -0400

> From: Eric Dumazet <edumazet@google.com>
> 
> This commit makes BBR use only the MSS (without any headers) to
> calculate pacing rates when internal TCP-layer pacing is used.
> 
> This is necessary to achieve the correct pacing behavior in this case,
> since tcp_internal_pacing() uses only the payload length to calculate
> pacing delays.
> 
> Signed-off-by: Kevin Yang <yyd@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Neal Cardwell <ncardwell@google.com>

Applied, thank you.
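
To see why the header bytes matter, consider the arithmetic below. It is a hedged back-of-the-envelope model in Python, not the BBR code: the function names and the example sizes (a 1448-byte MSS with 52 bytes of headers) are assumptions chosen for illustration. The only property taken from the commit message is that internal pacing derives delays from the payload length alone.

```python
def pacing_rate(bw_pkts_per_sec: float, bytes_per_pkt: int) -> float:
    """Pacing rate in bytes/s for a target packet rate."""
    return bw_pkts_per_sec * bytes_per_pkt


def internal_pacing_delay(payload_len: int, rate_bytes_per_sec: float) -> float:
    """Delay in seconds before the next packet, computed from payload only."""
    return payload_len / rate_bytes_per_sec


MSS = 1448          # assumed payload bytes per packet
HEADERS = 52        # assumed TCP/IP header bytes per packet
TARGET_PPS = 1000.0  # packet rate the sender intends to achieve

# Rate computed with headers included: the delay is too short,
# so packets leave faster than intended.
with_headers = 1.0 / internal_pacing_delay(
    MSS, pacing_rate(TARGET_PPS, MSS + HEADERS))

# Rate computed from the MSS alone: the delay matches the target packet rate.
mss_only = 1.0 / internal_pacing_delay(MSS, pacing_rate(TARGET_PPS, MSS))
```

Under these assumed sizes, counting headers in the rate overshoots the target packet rate by the ratio (MSS + HEADERS) / MSS, while the MSS-only rate reproduces it.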


* bnx2x: kernel panic in the bnx2x driver
From: Vishwanath Pai @ 2018-06-22  5:07 UTC (permalink / raw)
  To: ariel.elior, everest-linux-l2; +Cc: davem, netdev, dbanerje, pai.vishwain

Hi,

We recently noticed a kernel panic in the bnx2x driver when trying to set
rx-flow-hash parameters via ethtool during if-pre-up.d. I am running kernel
v4.17.2 from ubuntu-mainline-ppa. I have added the stack trace below:

[   18.280209] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[   18.280212] PGD 8000000407a79067 P4D 8000000407a79067 PUD 40ce8a067 PMD 0
[   18.280214] Oops: 0010 [#1] SMP PTI
[   18.280215] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel joydev input_leds kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc hid_generic aesni_intel gpio_ich aes_x86_64 usbhid lpc_ich crypto_simd ie31200_edac cryptd glue_helper intel_cstate mac_hid intel_rapl_perf bnx2x mdio tcp_bbr netconsole ipmi_devintf ipmi_msghandler i2c_i801 coretemp autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear sha256_mb mcryptd sha256_ssse3 hid ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt mpt3sas fb_sys_fops drm raid_class scsi_transport_sas ahci libahci shpchp video
[   18.280241] CPU: 6 PID: 1081 Comm: ethtool Not tainted 4.17.2-041702-generic #201806160433
[   18.280242] Hardware name: Foxconn CangJie/CangJie, BIOS CC1F108D 02/26/2014
[   18.280243] RIP: 0010:          (null)
[   18.280243] RSP: 0018:ffffb84bc260b9c0 EFLAGS: 00010246
[   18.280244] RAX: 0000000000000000 RBX: ffff92f987f020f0 RCX: 0000000000000000
[   18.280245] RDX: 0000000000000000 RSI: ffffb84bc260b9f8 RDI: ffff92f987f020f0
[   18.280245] RBP: ffffb84bc260b9e8 R08: 0000000000000001 R09: 0000000000000000
[   18.280246] R10: ffffb84bc260bd20 R11: 0000000000000000 R12: ffffb84bc260b9f8
[   18.280246] R13: ffff92f987f008c0 R14: 00007ffdb75bec40 R15: 0000000000000000
[   18.280247] FS:  00007fc0e8798700(0000) GS:ffff92f99fd80000(0000) knlGS:0000000000000000
[   18.280248] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.280249] CR2: 0000000000000000 CR3: 0000000409b4c003 CR4: 00000000001606e0
[   18.280249] Call Trace:
[   18.280263]  ? bnx2x_config_rss+0x2f/0xd0 [bnx2x]
[   18.280270]  bnx2x_rss+0x1d9/0x210 [bnx2x]
[   18.280276]  bnx2x_set_rxnfc+0x17d/0x380 [bnx2x]
[   18.280279]  ethtool_set_rxnfc+0x9b/0x110
[   18.280281]  ? __do_page_cache_readahead+0x1da/0x2c0
[   18.280283]  ? security_capable+0x3c/0x60
[   18.280284]  dev_ethtool+0x350/0x2610
[   18.280286]  ? page_cache_async_readahead+0x71/0x80
[   18.280288]  ? page_add_file_rmap+0x5d/0x220
[   18.280290]  ? inet_ioctl+0x182/0x1a0
[   18.280291]  dev_ioctl+0x203/0x3f0
[   18.280293]  ? dev_ioctl+0x203/0x3f0
[   18.280294]  sock_do_ioctl+0xae/0x150
[   18.280296]  sock_ioctl+0x1e2/0x330
[   18.280296]  ? sock_ioctl+0x1e2/0x330
[   18.280299]  do_vfs_ioctl+0xa8/0x620
[   18.280300]  ? dlci_ioctl_set+0x30/0x30
[   18.280301]  ? do_vfs_ioctl+0xa8/0x620
[   18.280302]  ? handle_mm_fault+0xe3/0x220
[   18.280304]  ksys_ioctl+0x75/0x80
[   18.280305]  __x64_sys_ioctl+0x1a/0x20
[   18.280307]  do_syscall_64+0x5a/0x120
[   18.280309]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   18.280310] RIP: 0033:0x7fc0e7fba107
[   18.280311] RSP: 002b:00007ffdb75beb78 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[   18.280312] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc0e7fba107
[   18.280312] RDX: 00007ffdb75bed60 RSI: 0000000000008946 RDI: 0000000000000003
[   18.280313] RBP: 00007ffdb75bed50 R08: 00007ffdb75bed60 R09: 0000000000000001
[   18.280313] R10: 0000000000000541 R11: 0000000000000206 R12: 00007ffdb75beed0
[   18.280314] R13: 0000000000421020 R14: 000000000041fe28 R15: 0000000000000003
[   18.280315] Code:  Bad RIP value.
[   18.280317] RIP:           (null) RSP: ffffb84bc260b9c0
[   18.280318] CR2: 0000000000000000
[   18.280319] ---[ end trace 5f361db3fb9059f1 ]---

To reproduce this I created a bash script in "/etc/network/if-pre-up.d/" with
these two lines:
ethtool -N $IFACE rx-flow-hash udp4 "sdfn"
ethtool -N $IFACE rx-flow-hash udp6 "sdfn"

The problem here is that rss_obj in bnx2x struct for the device hasn't been
initialized yet, which causes an exception in bnx2x_config_rss() when calling
"r->set_pending(r)" because r->set_pending is NULL. It looks like a lot of
things haven't been initialized at this point, most of that happens in this
function: "bnx2x_init_bp_objs()" which isn't called until ifup. Any thoughts on
how this can be fixed? Would it be possible to safely move bnx2x_init_bp_objs()
to maybe bnx2x_init_one() which runs much before ifup?

Thanks,
Vishwanath


* Re: [PATCH net-next v4 1/2] r8169: Don't disable ASPM in the driver
From: David Miller @ 2018-06-22  5:08 UTC (permalink / raw)
  To: kai.heng.feng
  Cc: ryankao, hayeswang, hau, hkallweit1, romieu, bhelgaas, acelan.kao,
	netdev, linux-pci, linux-kernel
In-Reply-To: <20180621083039.22545-1-kai.heng.feng@canonical.com>

From: Kai-Heng Feng <kai.heng.feng@canonical.com>
Date: Thu, 21 Jun 2018 16:30:38 +0800

> Enabling or disabling ASPM should be done in the PCI core instead of in the
> device driver.
> 
> Commit ba04c7c93bbc ("r8169: disable ASPM") uses
> pci_disable_link_state() to disable ASPM, but it's not the best way to
> do it. If the device really wants to disable ASPM, we can use a quirk in
> PCI core to prevent the PCI core from setting ASPM before probe.
> 
> Let's remove pci_disable_link_state() for now. Use PCI core quirks if
> any regression happens.
> 
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

Applied.


* Re: [PATCH net-next v4 2/2] r8169: Reinstate ASPM Support
From: David Miller @ 2018-06-22  5:08 UTC (permalink / raw)
  To: kai.heng.feng
  Cc: ryankao, hayeswang, hau, hkallweit1, romieu, bhelgaas, acelan.kao,
	netdev, linux-pci, linux-kernel
In-Reply-To: <20180621083039.22545-2-kai.heng.feng@canonical.com>

From: Kai-Heng Feng <kai.heng.feng@canonical.com>
Date: Thu, 21 Jun 2018 16:30:39 +0800

> On Intel platforms (Skylake and newer), ASPM support in r8169 is the
> last missing puzzle piece that lets the CPU's Package C-State reach PC8.  Without
> ASPM support, the CPU cannot reach beyond PC3. PC8 can save additional
> ~3W in comparison with PC3 on a Coffee Lake platform, Dell G3 3779.
> 
> This is based on the work from Chunhao Lin <hau@realtek.com>.
> 
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

Also applied.

Thank you for being so persistent.


* Re: KMSAN: uninit-value in ip_vs_lblc_check_expire
From: syzbot @ 2018-06-22  5:52 UTC (permalink / raw)
  To: coreteam, davem, fw, horms, ja, kadlec, linux-kernel, lvs-devel,
	netdev, netfilter-devel, pablo, syzkaller-bugs, wensong
In-Reply-To: <000000000000c66d94056a84011d@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    123906095e30 kmsan: introduce kmsan_interrupt_enter()/kmsa..
git tree:       https://github.com/google/kmsan.git/master
console output: https://syzkaller.appspot.com/x/log.txt?x=134ad890400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=848e40757852af3e
dashboard link: https://syzkaller.appspot.com/bug?extid=3e9695f147fb529aa9bc
compiler:       clang version 7.0.0 (trunk 334104)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=11505218400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1168a490400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3e9695f147fb529aa9bc@syzkaller.appspotmail.com

==================================================================
BUG: KMSAN: uninit-value in ip_vs_lblc_check_expire+0xe62/0xf10 net/netfilter/ipvs/ip_vs_lblc.c:315
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x185/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
  __msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620
  ip_vs_lblc_check_expire+0xe62/0xf10 net/netfilter/ipvs/ip_vs_lblc.c:315
  call_timer_fn+0x280/0x5d0 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0xd96/0x11b0 kernel/time/timer.c:1666
  run_timer_softirq+0x43/0x70 kernel/time/timer.c:1692
  __do_softirq+0x592/0x979 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x202/0x240 kernel/softirq.c:405
  exiting_irq+0xe/0x10 arch/x86/include/asm/apic.h:525
  smp_apic_timer_interrupt+0x64/0x90 arch/x86/kernel/apic/apic.c:1055
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:866
  </IRQ>
RIP: 0010:native_safe_halt arch/x86/include/asm/irqflags.h:55 [inline]
RIP: 0010:arch_safe_halt arch/x86/include/asm/irqflags.h:97 [inline]
RIP: 0010:default_idle+0x20b/0x3e0 arch/x86/kernel/process.c:500
RSP: 0018:ffff8801d8e5fdf0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff8801fd432f18 RBX: 0000000000000000 RCX: ffff880000000000
RDX: ffff8801fd032f18 RSI: aaaaaaaaaaaab000 RDI: ffffea0000000000
RBP: ffff8801d8e5fe28 R08: 0000000001080020 R09: 0000000000000002
R10: 00000030de3d75c0 R11: ffffffff89fef830 R12: ffff8801d8e5fe8f
R13: ffff8801d8da57c0 R14: ffff8801d8e5fe8c R15: ffff8801d8da6098
  arch_cpu_idle+0x26/0x30 arch/x86/kernel/process.c:491
  default_idle_call kernel/sched/idle.c:93 [inline]
  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
  do_idle+0x36d/0x830 kernel/sched/idle.c:262
  cpu_startup_entry+0x45/0x50 kernel/sched/idle.c:368
  start_secondary+0x3c6/0x490 arch/x86/kernel/smpboot.c:272
  secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:282 [inline]
  kmsan_alloc_meta_for_pages+0x161/0x3a0 mm/kmsan/kmsan.c:819
  kmsan_alloc_page+0x82/0xe0 mm/kmsan/kmsan.c:889
  __alloc_pages_nodemask+0xf7b/0x5cc0 mm/page_alloc.c:4402
  alloc_pages_current+0x6b1/0x970 mm/mempolicy.c:2093
  alloc_pages include/linux/gfp.h:494 [inline]
  kmalloc_order mm/slab_common.c:1148 [inline]
  kmalloc_order_trace+0xbb/0x390 mm/slab_common.c:1159
  kmalloc_large include/linux/slab.h:446 [inline]
  __kmalloc+0x335/0x350 mm/slub.c:3805
  kmalloc include/linux/slab.h:517 [inline]
  ip_vs_lblc_init_svc+0x57/0x310 net/netfilter/ipvs/ip_vs_lblc.c:355
  ip_vs_bind_scheduler+0xa9/0x1f0 net/netfilter/ipvs/ip_vs_sched.c:51
  ip_vs_add_service+0xa9d/0x1d90 net/netfilter/ipvs/ip_vs_ctl.c:1265
  do_ip_vs_set_ctl+0x2aa9/0x2cd0 net/netfilter/ipvs/ip_vs_ctl.c:2462
  nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
  nf_setsockopt+0x47c/0x4e0 net/netfilter/nf_sockopt.c:115
  ip_setsockopt+0x24b/0x2b0 net/ipv4/ip_sockglue.c:1251
  udp_setsockopt+0x108/0x1b0 net/ipv4/udp.c:2416
  ipv6_setsockopt+0x311/0x350 net/ipv6/ipv6_sockglue.c:917
  tcp_setsockopt+0x1c0/0x1f0 net/ipv4/tcp.c:2891
  sock_common_setsockopt+0x13b/0x170 net/core/sock.c:3039
  __sys_setsockopt+0x496/0x540 net/socket.c:1903
  __do_sys_setsockopt net/socket.c:1914 [inline]
  __se_sys_setsockopt net/socket.c:1911 [inline]
  __x64_sys_setsockopt+0x15c/0x1c0 net/socket.c:1911
  do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
==================================================================


* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Shannon Nelson @ 2018-06-22  6:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, anders.roxell
In-Reply-To: <20180622.134913.2049651808540037401.davem@davemloft.net>

On 6/21/2018 9:49 PM, David Miller wrote:
> From: Shannon Nelson <shannon.nelson@oracle.com>
> Date: Tue, 19 Jun 2018 22:42:41 -0700
> 
>> A couple of bad behaviors in the ipsec selftest were pointed out
>> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
>>
>> Shannon Nelson (2):
>>    selftests: rtnetlink: hide complaint from terminated monitor
>>    selftests: rtnetlink: use a local IP address for IPsec tests
> 
> Series applied, but I wonder about patch #2.
> 
> The idea is that we don't make modifications to the actual system
> networking configuration and therefore make changes that can't
> possibly disrupt connectivity for the system under test.
> 
> Using a configured local IP address seems to subvert that.

Yeah, I'm not so thrilled with it either.  I've got a couple more 
changes coming Real Soon Now that extend netdevsim and add a couple of 
tests for ipsec-hw-offload, so while I finish those up I can change this 
again and make use of netdevsim to leave existing devices alone.

For that matter, if you want to cut down on patch thrash, just drop patch 2.

sln


* Re: [PATCH v2] net: ethernet: stmmac: dwmac-rk: Add GMAC support for PX30
From: David Wu @ 2018-06-22  7:22 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: davem, robh+dt, mark.rutland, huangtao, netdev, linux-arm-kernel,
	linux-rockchip, linux-kernel
In-Reply-To: <2582999.2hZx6CH9S6@diego>

Hi Heiko,

On 2018-06-14 16:30, Heiko Stübner wrote:
> And someone could convert the driver to use the new clk-bulk APIs [0],
> so the large number of clk_prepare_enable calls would be a bit
> trimmed down.

Some clocks need special treatment in particular cases, and we may not know
which index we need in the clk_bulk_data struct.
1. In rmii mode we need to use mac_ref and mac_refout, but in rgmii mode they
are not needed.
2. In rgmii mode, rx comes in from an external source and there is no gate;
in rmii mode it comes from mac_ref_clk and there is a gate.
3. clk_mac needs to be configured to a rate of 50M or 125M.
4. mac_clk_speed needs to be configured on the PX30 SoC and on later SoCs.

It looks like using clk-bulk would not be more flexible, so we would still
keep the present approach. What do you think?


* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: David Miller @ 2018-06-22  7:27 UTC (permalink / raw)
  To: shannon.nelson; +Cc: netdev, anders.roxell
In-Reply-To: <20059528-5d1d-cb7c-26f8-f9055bd807c2@oracle.com>

From: Shannon Nelson <shannon.nelson@oracle.com>
Date: Thu, 21 Jun 2018 23:50:36 -0700

> For that matter, if you want to cut down on patch thrash, just drop
> patch 2.

Too late, already in my tree :)

Don't worry about it for now.


* Re: [PATCH v2] net: ethernet: stmmac: dwmac-rk: Add GMAC support for PX30
From: Heiko Stuebner @ 2018-06-22  7:28 UTC (permalink / raw)
  To: David Wu
  Cc: davem, robh+dt, mark.rutland, huangtao, netdev, linux-arm-kernel,
	linux-rockchip, linux-kernel
In-Reply-To: <7173b45f-17d3-2356-fede-28bdd5c658f2@rock-chips.com>

Hi David,

On Friday, 22 June 2018, 09:22:35 CEST, David Wu wrote:
> On 2018-06-14 16:30, Heiko Stübner wrote:
> > And someone could convert the driver to use the new clk-bulk APIs [0],
> > so the large number of clk_prepare_enable calls would be a bit
> > trimmed down.
> 
> Some clocks need special treatment in particular cases, and we may not know
> which index we need in the clk_bulk_data struct.
> 1. In rmii mode we need to use mac_ref and mac_refout, but in rgmii mode they
> are not needed.
> 2. In rgmii mode, rx comes in from an external source and there is no gate;
> in rmii mode it comes from mac_ref_clk and there is a gate.
> 3. clk_mac needs to be configured to a rate of 50M or 125M.
> 4. mac_clk_speed needs to be configured on the PX30 SoC and on later SoCs.
> 
> It looks like using clk-bulk would not be more flexible, so we would still
> keep the present approach. What do you think?

yeah, you're probably right. I just saw all these clk_prepare_enable calls
and didn't think enough about the config side ;-) .


Heiko


* Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs
From: Andrew Lunn @ 2018-06-22  7:28 UTC (permalink / raw)
  To: Don Bollinger; +Cc: netdev, Florian Fainelli
In-Reply-To: <20180621202809.enukvcdzahdon3lx@thebollingers.org>

> Got it.  I'm targeting a different market, with a different
> architecture.  In this architecture it makes more sense to separate the
> EEPROM access from the IO pins control.

The fact you are targeting a different architecture is why you are
getting no traction. The closer you stick to the kernel architecture,
the more acceptable your changes are going to be.

> > have a network interface per port. So they cannot use ethtool
> > --module-info, which IS the defined API to get access to SFP
> > data. Adding another API is probably not going to get accepted.
> 
> Got it.  I don't think I'm adding another API.  Note that Cumulus is
> using the same architecture as optoe, and providing all the expected
> linux services, including ethtool --module-info.

Please point me to the in kernel code.

Cumulus has a lot of code out of mainline, which does not follow the
mainline way of doing things. Cumulus is between a rock and a hard
place. There are some switch vendors who simply won't do things the
netdev way. So they have to make use of the vendor SDKs. But they also
work with vendors like Mellanox, which fully do things the netdev way. So
you cannot use Cumulus as a reference, without pointing to actual in
kernel code.

> I'm offering an improvement to sfp.c.  The improvement is access to more
> pages of SFP EEPROM, and access to QSFP EEPROMs.  It comes in the form of
> a specialized EEPROM driver custom built for {Q}SFP devices.  I'm also
> offering to help integrate that driver into sfp.c.  I can modify optoe
> to accommodate sfp.c, I can recommend how to instantiate and call it. I am
> not going to be able to spend the time and money required to modify and
> test sfp.c.  I'm pretty sure you can do it MUCH faster, and MUCH better
> than I can.

You are however going the wrong way. We want ethtool --module-info to
show QSFP contents, and that is the only API the kernel needs for the
raw data. The core of optoe for accessing the EEPROM could be merged
into the SFP code, but the way optoe exposes it via sysfs is
unlikely to be accepted.

> > the SFP code. It has been on my TODO list to add HWMON support for the
> > temperature sensors, etc.
> 
> Huh.  Just read Documentation/hwmon/sysfs-interface.  Looks like a good
> way to deliver that EEPROM data.  Wish I'd found that two years ago when
> there were a few more dimes available.

Well, it is not all the EEPROM data. Just sensors. But that is the
kernel way of exporting sensors.

       Andrew


* Re: [PATCH v2] net: ethernet: stmmac: dwmac-rk: Add GMAC support for PX30
From: Heiko Stuebner @ 2018-06-22  7:30 UTC (permalink / raw)
  To: David Wu
  Cc: davem, robh+dt, mark.rutland, huangtao, netdev, linux-arm-kernel,
	linux-rockchip, linux-kernel, 张晴
In-Reply-To: <157ecfc9-d0e6-7782-1cbc-d0fb76c81edb@rock-chips.com>

Hi David,

On Wednesday, 20 June 2018, 04:40:35 CEST, David Wu wrote:
> On 2018-06-14 16:30, Heiko Stübner wrote:
> > On Thursday, 14 June 2018, 10:14:31 CEST, David Wu wrote:
> >> Hi Heiko,
> >>
> >> On 2018-06-14 15:54, Heiko Stübner wrote:
> >>> I don't see that new clock documented in the dt-binding.
> >>> Also, which clock from the clock-controller does this connect to?
> >>
> >> The clock is the "SCLK_GMAC_RMII" at the clock-controller, whose rate
> >> can be set according to the link speed.
> > 
> > Hmm, while these huge number of clocks are somewhat strange,
> > shouldn't it be named something with _rmii instead of _speed then?
> 
> Okay, it is better to be named _speed.
> 
> > 
> > Also, I don't see any clk_enable action for that new clock, so you could
> > end up with it being off?
> 
> The new speed clock is the parent of clk_tx_rx, so enabling/disabling
> clk_tx_rx also enables/disables the new clock.

Still it is nicer to really enable it, so that the clock-framework can keep
track of usage counts.

After all, nothing stops the chip designer from putting a gate in
between in one of the next SoCs ;-)


Heiko
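
The point about usage counts can be illustrated with a toy reference-counting model. This is a hypothetical Python sketch, not the common clock framework API: the `Clock` class and its methods are invented for illustration, and the clock names are borrowed loosely from the discussion. It assumes only that enabling a clock also enables its parent and that each enable bumps a count.

```python
from typing import Optional


class Clock:
    """Toy clock with an enable count; enabling a child enables its parent."""

    def __init__(self, name: str, parent: Optional["Clock"] = None) -> None:
        self.name = name
        self.parent = parent
        self.enable_count = 0

    def enable(self) -> None:
        if self.enable_count == 0 and self.parent is not None:
            self.parent.enable()  # first user turns the parent on too
        self.enable_count += 1

    def disable(self) -> None:
        if self.enable_count == 0:
            raise RuntimeError(f"{self.name}: unbalanced disable")
        self.enable_count -= 1
        if self.enable_count == 0 and self.parent is not None:
            self.parent.disable()  # last user releases the parent

    @property
    def running(self) -> bool:
        return self.enable_count > 0
```

In this model, a driver that enables only the child keeps the parent running indirectly. A driver that also enables the parent explicitly holds its own reference, so the parent stays on even if the child is disabled or the clock tree is later rearranged.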


* Re: WARNING: refcount bug in smap_release_sock
From: syzbot @ 2018-06-22  7:35 UTC (permalink / raw)
  To: ast, daniel, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <0000000000004f02b3056e895cce@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    f0dc7f9c6dd9 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree:       bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1609ed08400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fa9c20c48788d1c1
dashboard link: https://syzkaller.appspot.com/bug?extid=d464d2c20c717ef5a6a8
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=10a53fbf800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11d27aa0400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
device lo entered promiscuous mode
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 4505 at lib/refcount.c:187 refcount_sub_and_test+0x2d3/0x330 lib/refcount.c:187
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 4505 Comm: syz-executor540 Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  panic+0x22f/0x4de kernel/panic.c:184
  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
  report_bug+0x252/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:refcount_sub_and_test+0x2d3/0x330 lib/refcount.c:187
Code: 89 de e8 40 7e 21 fe 84 db 74 07 31 db e9 52 ff ff ff e8 60 7d 21 fe 48 c7 c7 20 4b 1a 88 c6 05 78 64 40 06 01 e8 8d 97 ed fd <0f> 0b 31 db e9 31 ff ff ff 48 8b bd 28 ff ff ff 89 85 34 ff ff ff
RSP: 0018:ffff8801b18b7800 EFLAGS: 00010282
RAX: 0000000000000026 RBX: 0000000000000000 RCX: ffffffff8161907a
RDX: 0000000000000000 RSI: ffffffff8161f371 RDI: ffff8801b18b74d8
RBP: ffff8801b18b78e8 R08: ffff8801b24923c0 R09: 0000000000000006
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
R13: ffff8801b18b78c0 R14: 0000000000000001 R15: ffff8801b318f040
  refcount_dec_and_test+0x1a/0x20 lib/refcount.c:212
  smap_release_sock+0x6e/0x2f0 kernel/bpf/sockmap.c:1358
  sock_hash_ctx_update_elem.isra.24+0x896/0x1560 kernel/bpf/sockmap.c:2281
  sock_hash_update_elem+0x14f/0x2d0 kernel/bpf/sockmap.c:2303
  map_update_elem+0x5c4/0xc90 kernel/bpf/syscall.c:765
  __do_sys_bpf kernel/bpf/syscall.c:2357 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
  __x64_sys_bpf+0x32d/0x510 kernel/bpf/syscall.c:2328
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x445a69
Code: e8 3c b6 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db 51 00 00 c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f90f7ac8db8 EFLAGS: 00000293 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00000000006dac94 RCX: 0000000000445a69
RDX: 0000000000000020 RSI: 0000000020000180 RDI: 0000000000000002
RBP: 00000000006dac90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 00007ffc108bd18f R14: 00007f90f7ac99c0 R15: 0000000000000001
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


* Re: [RFC v2, net-next, PATCH 4/4] net/cpsw_switchdev: add switchdev mode of operation on cpsw driver
From: Ilias Apalodimas @ 2018-06-22  7:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ivan Vecera, Florian Fainelli, Andrew Lunn, Networking,
	Grygorii Strashko, ivan.khoronzhuk, Sekhar Nori,
	Jiří Pírko, Francois Ozog, yogeshs, spatton,
	Jose.Abreu
In-Reply-To: <CAK8P3a0wAE+8kvyuF-y3oaz+3Req3Jrv3jr-x2c0LWZ39ztVXg@mail.gmail.com>

On Thu, Jun 21, 2018 at 05:31:31PM +0200, Arnd Bergmann wrote:
> On Thu, Jun 21, 2018 at 2:45 PM, Ilias Apalodimas
> <ilias.apalodimas@linaro.org> wrote:
> > On Thu, Jun 21, 2018 at 02:19:55PM +0200, Ivan Vecera wrote:
> 
> > The driver is currently widely used and that's the reason we tried to avoid
> > rewriting it. The current driver uses a DTS option to distinguish between two
> > existing modes. This patch just adds a third one. So to my understanding we
> > have the following options:
> > 1. The driver already uses DTS to configure the hardware mode. Although this is
> > technically wrong, we can add a third mode on DTS called 'switchdev;', get rid
> > of the .config option and keep the configuration method common (although not
> > optimal).
> > 2. Keep the .config option which overrides the 2 existing modes.
> > 3. Introduce a devlink option. If this is applied for all 3 modes, it will break
> > backwards compatibility, so it's not an option. If it's introduced for
> > configuring 'switchdev' mode only, we fall into the same pitfall as option 2),
> > we have something that overrides our current config, slightly better though
> > since it's not a compile time option.
> > 4. rewrite the driver
> 
> As I understand it, the switchdev support can also be added without
> becoming incompatible with the existing behavior, this is how I would
> suggest it gets added in a way that keeps the existing DT binding and
> user view while adding switchdev:
> 
> * In non-"dual-emac" mode, show only one network device that is
>   configured as a transparent switch as today. Any users that today
>   add TI's third-party ioctl interface with a non-upstreamable patch
>   can keep using this mode and try to forward-port that patch.
Correct
> * In "dual-emac" mode (as selected through DT), the hardware is
>    configured to be viewed as two separate network devices as before,
>    regardless of kernel configuration. Users can add the two device
>    to a bridge device as before, and enabling switchdev support in
>    the kernel configuration (based on your patch series) would change
>    nothing else than using hardware support in the background to
>    reconfigure the HW VLAN settings.
> 
> This does not require using devlink, adding a third mode, or changing
> the DT binding or the user-visible behavior when switchdev is enabled,
> but should get all the features you want.
> 
Correct again. This is doable and the changes on the current patchset are
somewhat trivial (detecting a bridge and making the configuration changes
on the fly).
> > If it was a brand new driver, I'd definitely pick 4. Since it's a pre-existing
> > driver though, I can't rule out the rest of the options.
> 
> I think the suggestion was to have a new driver with a new binding
> so that the DT could choose between the two drivers, one with
> somewhat obscure behavior and the other with proper behavior.
> 
> However, from what I can tell, the only requirement to get a somewhat
> reasonable behavior is that you enable "dual-emac" mode in DT
> to get switchdev support. It would be trivial to add a new compatible
> value that only allows that mode along with supporting switchdev,
> but I don't think that's necessary here.
> 
> Writing a new driver might also be a good idea (depending on the
> quality of the existing one, I haven't looked in detail), but again
> I would see no reason for the new driver to be incompatible with
> the existing binding, so a gradual cleanup seems like a better
> approach.
Agree
> 
>        Arnd

If people like this idea, I can send a v3 with these changes.

Thanks
Ilias

* Re: [PATCH 3/6] arcnet: com20020: Add com20020 io mapped version
From: kbuild test robot @ 2018-06-22  8:11 UTC (permalink / raw)
  To: Andrea Greco
  Cc: kbuild-all, davem, tobin, Andrea Greco, Michael Grzeschik,
	linux-kernel, netdev
In-Reply-To: <20180611142635.20712-1-andrea.greco.gapmilano@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 992 bytes --]

Hi Andrea,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.18-rc1 next-20180622]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Andrea-Greco/arcnet-leds-Removed-leds-dependecy/20180611-222941
config: x86_64-randconfig-s4-06220549 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "com20020_found" [drivers/net/arcnet/com20020-io.ko] undefined!
>> ERROR: "com20020_check" [drivers/net/arcnet/com20020-io.ko] undefined!
>> ERROR: "com20020_netdev_ops" [drivers/net/arcnet/com20020-io.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32221 bytes --]

* Re: [PATCH v2] net: ethernet: stmmac: dwmac-rk: Add GMAC support for PX30
From: David Wu @ 2018-06-22  8:13 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: davem, robh+dt, mark.rutland, huangtao, netdev, linux-arm-kernel,
	linux-rockchip, linux-kernel, 张晴
In-Reply-To: <18221590.FEDROxemCD@phil>

Hi Heiko,

On 2018-06-22 15:30, Heiko Stuebner wrote:
> Hi David,
> 
> On Wednesday, 20 June 2018, 04:40:35 CEST, David Wu wrote:
>> On 2018-06-14 16:30, Heiko Stübner wrote:
>>> On Thursday, 14 June 2018, 10:14:31 CEST, David Wu wrote:
>>>> Hi Heiko,
>>>>
>>>> On 2018-06-14 15:54, Heiko Stübner wrote:
>>>>> I don't see that new clock documented in the dt-binding.
>>>>> Also, which clock from the clock-controller does this connect to?
>>>>
>>>> The clock is "SCLK_GMAC_RMII" in the clock controller; its rate can
>>>> be set according to the link speed.
>>>
>>> Hmm, while this huge number of clocks is somewhat strange,
>>> shouldn't it be named something with _rmii instead of _speed then?
>>
>> Okay, it is better to be named _speed.
>>
>>>
>>> Also, I don't see any clk_enable action for that new clock, so you could
>>> end up with it being off?
>>
>> The new speed clock is the parent of clk_tx_rx, so enabling/disabling
>> clk_tx_rx also enables/disables the new clock.
> 
> Still it is nicer to really enable it, so that the clock-framework can keep
> track of usage counts.
> 
> Also, nothing stops a chip designer from putting a gate in
> between in one of the next SoCs ;-)
> 

Okay, I will add the enable/disable for clk_mac_speed.
  ;-)
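
The usage-count argument above can be pictured with a toy model in C. This is only a sketch under stated assumptions: `toy_clk`, `toy_clk_enable` and `toy_clk_disable` are hypothetical names and not the kernel clock API. The one assumption is that enabling a clock also takes a reference on each of its parents, which is the behaviour David describes for clk_tx_rx and the new speed clock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy clock: a parent pointer and an enable count. */
struct toy_clk {
	struct toy_clk *parent;		/* NULL for a root clock */
	unsigned int enable_count;
};

/* Enabling a clock takes a reference on the clock and on every ancestor. */
static void toy_clk_enable(struct toy_clk *clk)
{
	for (; clk != NULL; clk = clk->parent)
		clk->enable_count++;
}

/* Disabling drops the references taken by the matching enable. */
static void toy_clk_disable(struct toy_clk *clk)
{
	for (; clk != NULL; clk = clk->parent) {
		assert(clk->enable_count > 0);
		clk->enable_count--;
	}
}
```

In this model, enabling only the child leaves the parent with a count of 1 that belongs to the child. An explicit enable on the parent adds a second reference, so the parent stays accounted for even after the child is disabled.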

> 
> Heiko

* [PATCH net] net: mvneta: fix the Rx desc DMA address in the Rx path
From: Antoine Tenart @ 2018-06-22  8:15 UTC (permalink / raw)
  To: davem, gregory.clement, mw
  Cc: Antoine Tenart, netdev, linux-kernel, thomas.petazzoni,
	maxime.chevallier, miquel.raynal, nadavh, stefanc, ymarkman

When using s/w buffer management, buffers are allocated and DMA mapped.
When doing so on an arm64 platform, an offset correction is applied on
the DMA address, before storing it in an Rx descriptor. The issue is
this DMA address is then used later in the Rx path without removing the
offset correction. Thus the DMA address is wrong, which can lead to
various issues.

This patch fixes this by removing the offset correction from the DMA
address retrieved from the Rx descriptor before using it in the Rx path.

Fixes: 8d5047cf9ca2 ("net: mvneta: Convert to be 64 bits compatible")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
---
 drivers/net/ethernet/marvell/mvneta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 17a904cc6a5e..0ad2f3f7da85 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1932,7 +1932,7 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
 		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
 		index = rx_desc - rxq->descs;
 		data = rxq->buf_virt_addr[index];
-		phys_addr = rx_desc->buf_phys_addr;
+		phys_addr = rx_desc->buf_phys_addr - pp->rx_offset_correction;
 
 		if (!mvneta_rxq_desc_is_first_last(rx_status) ||
 		    (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
-- 
2.17.1
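
As a rough illustration of the bug class described in the commit message, the standalone C sketch below models a descriptor that stores a DMA address with an offset correction added, and a reader that must subtract the same correction before using the address. Only the names `buf_phys_addr` and `rx_offset_correction` come from the patch. The struct and helpers (`toy_rx_desc`, `toy_desc_store`, `toy_desc_load`) are hypothetical, and the sketch assumes the correction is a plain addition, as the subtraction in the patch suggests. It is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an Rx descriptor; only the field name
 * buf_phys_addr is taken from the patch. */
struct toy_rx_desc {
	uint64_t buf_phys_addr;
};

/* Refill side: the address written to the descriptor has the
 * correction added (assumption, see the lead-in). */
static void toy_desc_store(struct toy_rx_desc *desc, uint64_t dma_addr,
			   uint64_t rx_offset_correction)
{
	desc->buf_phys_addr = dma_addr + rx_offset_correction;
}

/* Rx side: subtract the same correction to recover the address that
 * was originally DMA mapped. */
static uint64_t toy_desc_load(const struct toy_rx_desc *desc,
			      uint64_t rx_offset_correction)
{
	assert(desc->buf_phys_addr >= rx_offset_correction);
	return desc->buf_phys_addr - rx_offset_correction;
}
```

Reading `desc->buf_phys_addr` directly, without going through something like `toy_desc_load()`, yields an address that is off by the correction, which is the situation the patch addresses.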

* Re: [PATCH] net: Fix device name resolving crash in default_device_exit()
From: Kirill Tkhai @ 2018-06-22  8:36 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, daniel, jakub.kicinski, ast, linux, john.fastabend, brouer
In-Reply-To: <5e4b9ecd-e7f5-87e4-2b45-2aaaca125442@gmail.com>

On 21.06.2018 18:28, David Ahern wrote:
> On 6/21/18 4:03 AM, Kirill Tkhai wrote:
>>> This patch does not remove the BUG, so does not really solve the
>>> problem. ie., it is fairly trivial to write a script (32k dev%d named
>>> devices in init_net) that triggers it again, so your commit subject and
>>> commit log are not correct with the references to 'fixing the problem'.
>>
>> 1) I don't agree with you, and I don't think removing the BUG() is a good idea.
>> This function is called from a place where it must not fail. But it can
>> fail, and the name problem is not the only reason that can happen.
>> We can't continue with further pernet_operations if a problem occurred
>> in default_device_exit(), and we can't remove the BUG() until this function
>> becomes of void type. But we are not going to make it of void type. So
>> we can't remove the BUG().
> 
> You missed my point: that the function can still fail means you are not
> "fixing" the problem, only delaying it.

As long as the function is of int type and can fail, we can't remove the BUG().
And this is not connected with name resolution.

>>
>> 2) If the script is trivial, can't you just post it here to show
>> what type of devices you mean? Is there a real problem, or is this
>> purely theoretical?
> 
> Current code:
> 
> # ip li sh dev eth2
> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> mode DEFAULT group default qlen 1000
>     link/ether 02:e0:f9:46:64:80 brd ff:ff:ff:ff:ff:ff
> # ip netns add fubar
> # ip li set eth2 netns fubar
> # ip li add eth2 type dummy
> # ip li add dev4 type dummy
> # ip netns del fubar
> --> BUG
> kernel:[78079.127748] default_device_exit: failed to move eth2 to
> init_net: -17
> 
> 
> With your patch:
> 
> # ip li sh dev eth2
> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> mode DEFAULT group default qlen 1000
>     link/ether 02:e0:f9:46:64:80 brd ff:ff:ff:ff:ff:ff
> # ip netns add fubar
> # ip li set eth2 netns fubar
> # ip li add eth2 type dummy
> # for n in $(seq 0 $((32*1024))); do
>   echo "li add dev${n} type dummy"
>   done > ip.batch
> # ip -batch ip.batch
> # ip netns del fubar
> --> BUG
> kernel:[   25.800024] default_device_exit: failed to move eth2 to
> init_net: -17

Yeah, this makes sense.

>>
>> All virtual devices I see have rtnl_link_ops, so they are simply destroyed
>> in default_device_exit_batch(). As for physical devices, it's difficult
>> to imagine a node with 32k physical devices, and anyone who tried to deploy
>> that many would likely hit problems in other places too.
> 
> Nothing says it has to be a physical device. It is only checking for a name.
> 
>>
>>> The change does provide more variability in naming and reduces the
>>> likelihood of not being able to push a device back to init_net.
>>
>> No, it provides. With the patch one may move a real device to a container
>> and allow anything to be done with the device there, including changing the
>> device index. Then the destruction of the container does not result in a
>> kernel panic just because two devices have the same index.
