Netdev List


* WARNING in cgroup_rstat_updated
From: syzbot @ 2019-08-07  3:18 UTC
  To: linux-kernel, linux-mm, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    31cc088a Merge tag 'drm-next-2019-07-19' of git://anongit...
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=102db48c600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4dba67bf8b8c9ad7
dashboard link: https://syzkaller.appspot.com/bug?extid=370e4739fa489334a4ef
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dd57dc600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+370e4739fa489334a4ef@syzkaller.appspotmail.com

8021q: adding VLAN 0 to HW filter on device batadv0
WARNING: CPU: 1 PID: 9095 at mm/page_counter.c:62 page_counter_cancel  
mm/page_counter.c:62 [inline]
WARNING: CPU: 1 PID: 9095 at mm/page_counter.c:62  
page_counter_cancel+0x5a/0x70 mm/page_counter.c:55
Kernel panic - not syncing: panic_on_warn set ...
Shutting down cpus with NMI
Kernel Offset: disabled

======================================================
WARNING: possible circular locking dependency detected
5.2.0+ #67 Not tainted
------------------------------------------------------
syz-executor.2/9306 is trying to acquire lock:
00000000e4252251 ((console_sem).lock){-.-.}, at: down_trylock+0x13/0x70  
kernel/locking/semaphore.c:135

but task is already holding lock:
000000000fdb8781 (per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)){-...}, at:  
cgroup_rstat_updated+0x115/0x2f0 kernel/cgroup/rstat.c:49

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)){-...}:
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
        cgroup_rstat_updated+0x115/0x2f0 kernel/cgroup/rstat.c:49
        cgroup_base_stat_cputime_account_end.isra.0+0x1d/0x60  
kernel/cgroup/rstat.c:361
        __cgroup_account_cputime+0x9e/0xd0 kernel/cgroup/rstat.c:371
        cgroup_account_cputime include/linux/cgroup.h:782 [inline]
        update_curr+0x3c8/0x8d0 kernel/sched/fair.c:862
        dequeue_entity+0x1e/0x1100 kernel/sched/fair.c:4014
        dequeue_task_fair+0x65/0x870 kernel/sched/fair.c:5306
        dequeue_task+0x77/0x2e0 kernel/sched/core.c:1195
        sched_move_task+0x1fb/0x350 kernel/sched/core.c:6847
        cpu_cgroup_attach+0x6d/0xb0 kernel/sched/core.c:6970
        cgroup_migrate_execute+0xc56/0x1350 kernel/cgroup/cgroup.c:2524
        cgroup_migrate+0x14f/0x1f0 kernel/cgroup/cgroup.c:2780
        cgroup_attach_task+0x57f/0x860 kernel/cgroup/cgroup.c:2817
        cgroup_procs_write+0x340/0x400 kernel/cgroup/cgroup.c:4777
        cgroup_file_write+0x241/0x790 kernel/cgroup/cgroup.c:3754
        kernfs_fop_write+0x2b8/0x480 fs/kernfs/file.c:315
        __vfs_write+0x8a/0x110 fs/read_write.c:494
        vfs_write+0x268/0x5d0 fs/read_write.c:558
        ksys_write+0x14f/0x290 fs/read_write.c:611
        __do_sys_write fs/read_write.c:623 [inline]
        __se_sys_write fs/read_write.c:620 [inline]
        __x64_sys_write+0x73/0xb0 fs/read_write.c:620
        do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #2 (&rq->lock){-.-.}:
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
        rq_lock kernel/sched/sched.h:1207 [inline]
        task_fork_fair+0x6a/0x520 kernel/sched/fair.c:9940
        sched_fork+0x3af/0x900 kernel/sched/core.c:2783
        copy_process+0x1b04/0x6b00 kernel/fork.c:1987
        _do_fork+0x146/0xfa0 kernel/fork.c:2369
        kernel_thread+0xbb/0xf0 kernel/fork.c:2456
        rest_init+0x28/0x37b init/main.c:417
        arch_call_rest_init+0xe/0x1b
        start_kernel+0x912/0x951 init/main.c:785
        x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:472
        x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:453
        secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

-> #1 (&p->pi_lock){-.-.}:
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
        try_to_wake_up+0xb0/0x1aa0 kernel/sched/core.c:2432
        wake_up_process+0x10/0x20 kernel/sched/core.c:2548
        __up.isra.0+0x136/0x1a0 kernel/locking/semaphore.c:261
        up+0x9c/0xe0 kernel/locking/semaphore.c:186
        __up_console_sem+0xb7/0x1c0 kernel/printk/printk.c:244
        console_unlock+0x695/0xf10 kernel/printk/printk.c:2481
        vprintk_emit+0x2a0/0x700 kernel/printk/printk.c:1986
        vprintk_default+0x28/0x30 kernel/printk/printk.c:2013
        vprintk_func+0x7e/0x189 kernel/printk/printk_safe.c:386
        printk+0xba/0xed kernel/printk/printk.c:2046
        check_stack_usage kernel/exit.c:765 [inline]
        do_exit.cold+0x18b/0x314 kernel/exit.c:927
        do_group_exit+0x135/0x360 kernel/exit.c:981
        __do_sys_exit_group kernel/exit.c:992 [inline]
        __se_sys_exit_group kernel/exit.c:990 [inline]
        __x64_sys_exit_group+0x44/0x50 kernel/exit.c:990
        do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 ((console_sem).lock){-.-.}:
        check_prev_add kernel/locking/lockdep.c:2405 [inline]
        check_prevs_add kernel/locking/lockdep.c:2507 [inline]
        validate_chain kernel/locking/lockdep.c:2897 [inline]
        __lock_acquire+0x25a9/0x4c30 kernel/locking/lockdep.c:3880
        lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4413
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
        down_trylock+0x13/0x70 kernel/locking/semaphore.c:135
        __down_trylock_console_sem+0xa8/0x210 kernel/printk/printk.c:227
        console_trylock+0x15/0xa0 kernel/printk/printk.c:2297
        console_trylock_spinning kernel/printk/printk.c:1706 [inline]
        vprintk_emit+0x283/0x700 kernel/printk/printk.c:1985
        vprintk_default+0x28/0x30 kernel/printk/printk.c:2013
        vprintk_func+0x7e/0x189 kernel/printk/printk_safe.c:386
        printk+0xba/0xed kernel/printk/printk.c:2046
        kasan_die_handler arch/x86/mm/kasan_init_64.c:254 [inline]
        kasan_die_handler.cold+0x11/0x23 arch/x86/mm/kasan_init_64.c:249
        notifier_call_chain+0xc2/0x230 kernel/notifier.c:95
        __atomic_notifier_call_chain+0xa6/0x1a0 kernel/notifier.c:185
        atomic_notifier_call_chain kernel/notifier.c:195 [inline]
        notify_die+0xfb/0x180 kernel/notifier.c:551
        do_general_protection+0x13d/0x300 arch/x86/kernel/traps.c:558
        general_protection+0x1e/0x30 arch/x86/entry/entry_64.S:1181
        cgroup_rstat_updated+0x174/0x2f0 kernel/cgroup/rstat.c:64
        cgroup_base_stat_cputime_account_end.isra.0+0x1d/0x60  
kernel/cgroup/rstat.c:361
        __cgroup_account_cputime_field+0xd3/0x130 kernel/cgroup/rstat.c:395
        cgroup_account_cputime_field include/linux/cgroup.h:797 [inline]
        task_group_account_field kernel/sched/cputime.c:109 [inline]
        account_system_index_time+0x1f7/0x390 kernel/sched/cputime.c:172
        irqtime_account_process_tick.isra.0+0x386/0x490  
kernel/sched/cputime.c:389
        account_process_tick+0x27f/0x350 kernel/sched/cputime.c:484
        update_process_times+0x25/0x80 kernel/time/timer.c:1637
        tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:167
        tick_sched_timer+0x53/0x140 kernel/time/tick-sched.c:1296
        __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
        __hrtimer_run_queues+0x364/0xe40 kernel/time/hrtimer.c:1451
        hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
        local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1068 [inline]
        smp_apic_timer_interrupt+0x160/0x610 arch/x86/kernel/apic/apic.c:1093
        apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:828

other info that might help us debug this:

Chain exists of:
   (console_sem).lock --> &rq->lock --> per_cpu_ptr(&cgroup_rstat_cpu_lock,  
cpu)

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
                                lock(&rq->lock);
                                lock(per_cpu_ptr(&cgroup_rstat_cpu_lock,  
cpu));
   lock((console_sem).lock);

  *** DEADLOCK ***

6 locks held by syz-executor.2/9306:
  #0: 0000000032d2cedf (&sb->s_type->i_mutex_key#12){+.+.}, at: inode_lock  
include/linux/fs.h:778 [inline]
  #0: 0000000032d2cedf (&sb->s_type->i_mutex_key#12){+.+.}, at:  
__sock_release+0x89/0x280 net/socket.c:589
  #1: 000000002033d24d (sk_lock-AF_INET6){+.+.}, at: lock_sock  
include/net/sock.h:1522 [inline]
  #1: 000000002033d24d (sk_lock-AF_INET6){+.+.}, at: tcp_close+0x27/0x10e0  
net/ipv4/tcp.c:2329
  #2: 0000000067f2fc6a (rcu_read_lock){....}, at: tcp_bpf_unhash+0x0/0x390  
net/ipv4/tcp_bpf.c:480
  #3: 0000000067f2fc6a (rcu_read_lock){....}, at: arch_atomic64_add  
arch/x86/include/asm/atomic64_64.h:46 [inline]
  #3: 0000000067f2fc6a (rcu_read_lock){....}, at: atomic64_add  
include/asm-generic/atomic-instrumented.h:873 [inline]
  #3: 0000000067f2fc6a (rcu_read_lock){....}, at: account_group_system_time  
include/linux/sched/cputime.h:154 [inline]
  #3: 0000000067f2fc6a (rcu_read_lock){....}, at:  
account_system_index_time+0xf7/0x390 kernel/sched/cputime.c:169
  #4: 000000000fdb8781 (per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)){-...}, at:  
cgroup_rstat_updated+0x115/0x2f0 kernel/cgroup/rstat.c:49
  #5: 0000000067f2fc6a (rcu_read_lock){....}, at:  
__atomic_notifier_call_chain+0x0/0x1a0 kernel/notifier.c:404

stack backtrace:
CPU: 0 PID: 9306 Comm: syz-executor.2 Not tainted 5.2.0+ #67
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  print_circular_bug.cold+0x163/0x172 kernel/locking/lockdep.c:1617
  check_noncircular+0x345/0x3e0 kernel/locking/lockdep.c:1741
  check_prev_add kernel/locking/lockdep.c:2405 [inline]
  check_prevs_add kernel/locking/lockdep.c:2507 [inline]
  validate_chain kernel/locking/lockdep.c:2897 [inline]
  __lock_acquire+0x25a9/0x4c30 kernel/locking/lockdep.c:3880
  lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4413
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
  down_trylock+0x13/0x70 kernel/locking/semaphore.c:135
  __down_trylock_console_sem+0xa8/0x210 kernel/printk/printk.c:227
  console_trylock+0x15/0xa0 kernel/printk/printk.c:2297
  console_trylock_spinning kernel/printk/printk.c:1706 [inline]
  vprintk_emit+0x283/0x700 kernel/printk/printk.c:1985
  vprintk_default+0x28/0x30 kernel/printk/printk.c:2013
  vprintk_func+0x7e/0x189 kernel/printk/printk_safe.c:386
  printk+0xba/0xed kernel/printk/printk.c:2046
  kasan_die_handler arch/x86/mm/kasan_init_64.c:254 [inline]
  kasan_die_handler.cold+0x11/0x23 arch/x86/mm/kasan_init_64.c:249
  notifier_call_chain+0xc2/0x230 kernel/notifier.c:95
  __atomic_notifier_call_chain+0xa6/0x1a0 kernel/notifier.c:185
  atomic_notifier_call_chain kernel/notifier.c:195 [inline]
  notify_die+0xfb/0x180 kernel/notifier.c:551
  do_general_protection+0x13d/0x300 arch/x86/kernel/traps.c:558
  general_protection+0x1e/0x30 arch/x86/entry/entry_64.S:1181
RIP: 0010:cgroup_rstat_updated+0x174/0x2f0 kernel/cgroup/rstat.c:64
Code: 00 fc ff df 48 8b 45 c0 48 c1 e8 03 4c 01 f8 48 89 45 c8 eb 60 e8 6c  
e1 05 00 49 8d 7c 24 30 48 8b 55 d0 49 89 f9 49 c1 e9 03 <43> 80 3c 39 00  
0f 85 00 01 00 00 49 8b 7c 24 30 48 89 7a 38 49 8d
RSP: 0018:ffff8880ae809c08 EFLAGS: 00010006
RAX: ffff88809378a480 RBX: 0000000000000000 RCX: ffffffff8159c5ca
RDX: ffff8880ae800000 RSI: ffffffff816ca374 RDI: 47ff8883313e8861
RBP: ffff8880ae809c58 R08: 0000000000000004 R09: 08fff1106627d10c
R10: ffffed1015d0136d R11: 0000000000000003 R12: 47ff8883313e8831
R13: ffff88807b60a280 R14: ffffffff8626cbf5 R15: dffffc0000000000
  cgroup_base_stat_cputime_account_end.isra.0+0x1d/0x60  
kernel/cgroup/rstat.c:361
  __cgroup_account_cputime_field+0xd3/0x130 kernel/cgroup/rstat.c:395
  cgroup_account_cputime_field include/linux/cgroup.h:797 [inline]
  task_group_account_field kernel/sched/cputime.c:109 [inline]
  account_system_index_time+0x1f7/0x390 kernel/sched/cputime.c:172
  irqtime_account_process_tick.isra.0+0x386/0x490 kernel/sched/cputime.c:389
  account_process_tick+0x27f/0x350 kernel/sched/cputime.c:484
  update_process_times+0x25/0x80 kernel/time/timer.c:1637
  tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:167
  tick_sched_timer+0x53/0x140 kernel/time/tick-sched.c:1296
  __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
  __hrtimer_run_queues+0x364/0xe40 kernel/time/hrtimer.c:1451
  hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1068 [inline]
  smp_apic_timer_interrupt+0x160/0x610 arch/x86/kernel/apic/apic.c:1093
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:828
  </IRQ>
Rebooting in 86400 seconds..
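For readers less familiar with lockdep output, the circular dependency reported above can be modelled as a small directed graph. The following is a minimal userspace sketch, not lockdep's implementation: the enum names, `record_dependency()` and `would_close_cycle()` are hypothetical helpers, and only the four lock classes named in the report are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes named after the report; the numbering is arbitrary. */
enum lock_class { CONSOLE_SEM, PI_LOCK, RQ_LOCK, RSTAT_CPU_LOCK, NR_LOCKS };

/* dep[a][b] is true if lock b was ever acquired while a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

void record_dependency(enum lock_class held, enum lock_class taken)
{
	assert(held != taken);	/* recursive locking is out of scope here */
	dep[held][taken] = true;
}

/* Depth-first search: is there a recorded path from 'from' to 'to'? */
static bool reaches(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (dep[from][next] && !seen[next] &&
		    reaches((enum lock_class)next, to, seen))
			return true;
	}
	return false;
}

/*
 * Acquiring 'wanted' while holding 'held' adds the edge held -> wanted.
 * That closes a cycle if 'held' is already reachable from 'wanted'.
 */
bool would_close_cycle(enum lock_class held, enum lock_class wanted)
{
	bool seen[NR_LOCKS] = { false };

	return reaches(wanted, held, seen);
}

/* Load the three existing dependencies (#1 to #3 in the report). */
void load_reported_chain(void)
{
	memset(dep, 0, sizeof(dep));
	record_dependency(CONSOLE_SEM, PI_LOCK);	/* #1: printk wakes a waiter */
	record_dependency(PI_LOCK, RQ_LOCK);		/* #2: scheduler fork path */
	record_dependency(RQ_LOCK, RSTAT_CPU_LOCK);	/* #3: cputime accounting */
}
```

After `load_reported_chain()`, asking `would_close_cycle(RSTAT_CPU_LOCK, CONSOLE_SEM)` corresponds to entry #0 of the report: printk() being called from the GPF handler while the per-cpu rstat lock is held.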


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: [PATCH v3] mlx5: Use refcount_t for refcount
From: Leon Romanovsky @ 2019-08-07  3:17 UTC
  To: Saeed Mahameed
  Cc: hslester96@gmail.com, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	davem@davemloft.net, jgg@ziepe.ca, dledford@redhat.com
In-Reply-To: <cbea99e74a1f70b1a67357aaf2afdb55655cd2bd.camel@mellanox.com>

On Tue, Aug 06, 2019 at 08:40:11PM +0000, Saeed Mahameed wrote:
> On Tue, 2019-08-06 at 09:59 +0800, Chuhong Yuan wrote:
> > Reference counters should use refcount_t rather than atomic_t,
> > because the refcount_t implementation can prevent overflows and
> > detect possible use-after-free.
> > So convert the atomic_t reference counters to refcount_t.
> >
> > Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
> > ---
> > Changes in v3:
> >   - Merge v2 patches together.
> >
> >  drivers/infiniband/hw/mlx5/srq_cmd.c         | 6 +++---
> >  drivers/net/ethernet/mellanox/mlx5/core/qp.c | 6 +++---
> >  include/linux/mlx5/driver.h                  | 3 ++-
> >  3 files changed, 8 insertions(+), 7 deletions(-)
> >
>
> LGTM, Leon, let me know if you are happy with this version,
> this should go to mlx5-next.

Thanks,
Acked-by: Leon Romanovsky <leonro@mellanox.com>
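As background to the conversion acked here, the two properties named in the commit message (overflow prevention and use-after-free detection) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct sketch_ref` and its helpers are hypothetical, single-threaded, and deliberately simpler than the kernel's refcount_t, whose exact saturation and warning behaviour differs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in; not the kernel's refcount_t. */
struct sketch_ref {
	unsigned int count;
	bool warned;		/* set where the kernel would WARN */
};

#define SKETCH_REF_SATURATED UINT_MAX

void sketch_ref_set(struct sketch_ref *r, unsigned int n)
{
	assert(r);
	r->count = n;
	r->warned = false;
}

/* Take a reference; refuse to resurrect a dead object or to overflow. */
void sketch_ref_inc(struct sketch_ref *r)
{
	if (r->count == 0) {
		/* The object was already released: likely use-after-free. */
		r->warned = true;
		return;
	}
	if (r->count == SKETCH_REF_SATURATED) {
		/* Stay pinned at the maximum instead of wrapping to zero. */
		r->warned = true;
		return;
	}
	r->count++;
}

/* Drop a reference; returns true when the caller must free the object. */
bool sketch_ref_dec_and_test(struct sketch_ref *r)
{
	if (r->count == 0 || r->count == SKETCH_REF_SATURATED) {
		/* Underflow, or a saturated counter that is leaked on purpose. */
		r->warned = true;
		return false;
	}
	return --r->count == 0;
}
```

A plain atomic_t counter would silently wrap in both cases; the point of the sketch is that the refcount-style helpers turn those cases into a visible warning and a leak rather than a premature free.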


* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: David Ahern @ 2019-08-07  3:10 UTC
  To: Andrew Lunn
  Cc: Jiri Pirko, netdev, davem, mlxsw, jakub.kicinski, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <20190807025933.GF20422@lunn.ch>

On 8/6/19 8:59 PM, Andrew Lunn wrote:
> However, zoom out a bit, from networking to the whole kernel. In
> general, across the kernel as a whole, resource management is done
> with cgroups. cgroups is the consistent operational model across the
> kernel as a whole.
> 
> So I think you need a second leg to your argument. You have said why
> devlink is the right way to do this. But you should also be able to
> say to Tejun Heo why cgroups is the wrong way to do this, going
> against the kernel as a whole model. Why is networking special?
> 

So you are saying mlxsw should be using a cgroups based API for its
resources? netdevsim is for testing kernel APIs sans hardware. Is that
not what the fib controller netdevsim is doing? It is from my perspective.

I am not the one arguing to change code and functionality that has
existed for 16 months. I am arguing that the existing resource
controller satisfies all existing goals (testing in-kernel APIs) and
even satisfies additional ones - like a consistent user experience
managing networking resources. I.e., I see no reason to change what exists.


* [PATCH] team: Add vlan tx offload to hw_enc_features
From: YueHaibing @ 2019-08-07  2:38 UTC
  To: j.vosburgh, vfalico, andy, davem, jiri, jay.vosburgh
  Cc: linux-kernel, netdev, YueHaibing

We should also enable team's vlan tx offload in hw_enc_features, so
that vlan packets are passed to the slave devices with the vlan tci
and the slaves can handle the vlan tunneling offload themselves.

Fixes: 3268e5cb494d ("team: Advertise tunneling offload features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/team/team.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index abfa0da..e8089de 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1004,6 +1004,8 @@ static void __team_compute_features(struct team *team)
 
 	team->dev->vlan_features = vlan_features;
 	team->dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
+				     NETIF_F_HW_VLAN_CTAG_TX |
+				     NETIF_F_HW_VLAN_STAG_TX |
 				     NETIF_F_GSO_UDP_L4;
 	team->dev->hard_header_len = max_hard_header_len;
 
-- 
2.7.4




* [net-next v3] tipc: add loopback device tracking
From: john.rutherford @ 2019-08-07  2:52 UTC
  To: davem, netdev, tipc-discussion; +Cc: John Rutherford

From: John Rutherford <john.rutherford@dektech.com.au>

Since node internal messages are passed directly to the socket, it is not
possible to observe those messages via tcpdump or wireshark.

We now remedy this by making it possible to clone such messages and send
the clones to the loopback interface.  The clones are dropped at reception
and have no functional role except making the traffic visible.

The feature is enabled if network taps are active for the loopback device.
pcap filtering restrictions require the messages to be presented to the
receiving side of the loopback device.

v3 - Function dev_nit_active used to check for network taps.
   - Procedure netif_rx_ni used to send cloned messages to loopback device.

Signed-off-by: John Rutherford <john.rutherford@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
---
 net/tipc/bcast.c  |  4 +++-
 net/tipc/bearer.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/tipc/bearer.h | 10 +++++++++
 net/tipc/core.c   |  5 +++++
 net/tipc/core.h   |  3 +++
 net/tipc/node.c   |  1 +
 net/tipc/topsrv.c |  2 ++
 7 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 6c997d4..235331d 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -406,8 +406,10 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts,
 			rc = tipc_bcast_xmit(net, pkts, cong_link_cnt);
 	}
 
-	if (dests->local)
+	if (dests->local) {
+		tipc_loopback_trace(net, &localq);
 		tipc_sk_mcast_rcv(net, &localq, &inputq);
+	}
 exit:
 	/* This queue should normally be empty by now */
 	__skb_queue_purge(pkts);
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 2bed658..93c9616 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -389,6 +389,11 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
 		dev_put(dev);
 		return -EINVAL;
 	}
+	if (dev == net->loopback_dev) {
+		dev_put(dev);
+		pr_info("Enabling <%s> not permitted\n", b->name);
+		return -EINVAL;
+	}
 
 	/* Autoconfigure own node identity if needed */
 	if (!tipc_own_id(net) && hwaddr_len <= NODE_ID_LEN) {
@@ -674,6 +679,65 @@ void tipc_bearer_stop(struct net *net)
 	}
 }
 
+void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts)
+{
+	struct net_device *dev = net->loopback_dev;
+	struct sk_buff *skb, *_skb;
+	int exp;
+
+	skb_queue_walk(pkts, _skb) {
+		skb = pskb_copy(_skb, GFP_ATOMIC);
+		if (!skb)
+			continue;
+
+		exp = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb));
+		if (exp > 0 && pskb_expand_head(skb, exp, 0, GFP_ATOMIC)) {
+			kfree_skb(skb);
+			continue;
+		}
+
+		skb_reset_network_header(skb);
+		dev_hard_header(skb, dev, ETH_P_TIPC, dev->dev_addr,
+				dev->dev_addr, skb->len);
+		skb->dev = dev;
+		skb->pkt_type = PACKET_HOST;
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		skb->protocol = eth_type_trans(skb, dev);
+		netif_rx_ni(skb);
+	}
+}
+
+static int tipc_loopback_rcv_pkt(struct sk_buff *skb, struct net_device *dev,
+				 struct packet_type *pt, struct net_device *od)
+{
+	consume_skb(skb);
+	return NET_RX_SUCCESS;
+}
+
+int tipc_attach_loopback(struct net *net)
+{
+	struct net_device *dev = net->loopback_dev;
+	struct tipc_net *tn = tipc_net(net);
+
+	if (!dev)
+		return -ENODEV;
+
+	dev_hold(dev);
+	tn->loopback_pt.dev = dev;
+	tn->loopback_pt.type = htons(ETH_P_TIPC);
+	tn->loopback_pt.func = tipc_loopback_rcv_pkt;
+	dev_add_pack(&tn->loopback_pt);
+	return 0;
+}
+
+void tipc_detach_loopback(struct net *net)
+{
+	struct tipc_net *tn = tipc_net(net);
+
+	dev_remove_pack(&tn->loopback_pt);
+	dev_put(net->loopback_dev);
+}
+
 /* Caller should hold rtnl_lock to protect the bearer */
 static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
 				struct tipc_bearer *bearer, int nlflags)
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 7f4c569..ea0f3c4 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -232,6 +232,16 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id,
 		      struct tipc_media_addr *dst);
 void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id,
 			 struct sk_buff_head *xmitq);
+void tipc_clone_to_loopback(struct net *net, struct sk_buff_head *pkts);
+int tipc_attach_loopback(struct net *net);
+void tipc_detach_loopback(struct net *net);
+
+static inline void tipc_loopback_trace(struct net *net,
+				       struct sk_buff_head *pkts)
+{
+	if (unlikely(dev_nit_active(net->loopback_dev)))
+		tipc_clone_to_loopback(net, pkts);
+}
 
 /* check if device MTU is too low for tipc headers */
 static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int reserve)
diff --git a/net/tipc/core.c b/net/tipc/core.c
index c837072..23cb379 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -82,6 +82,10 @@ static int __net_init tipc_init_net(struct net *net)
 	if (err)
 		goto out_bclink;
 
+	err = tipc_attach_loopback(net);
+	if (err)
+		goto out_bclink;
+
 	return 0;
 
 out_bclink:
@@ -94,6 +98,7 @@ static int __net_init tipc_init_net(struct net *net)
 
 static void __net_exit tipc_exit_net(struct net *net)
 {
+	tipc_detach_loopback(net);
 	tipc_net_stop(net);
 	tipc_bcast_stop(net);
 	tipc_nametbl_stop(net);
diff --git a/net/tipc/core.h b/net/tipc/core.h
index 7a68e1b..60d8295 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -125,6 +125,9 @@ struct tipc_net {
 
 	/* Cluster capabilities */
 	u16 capabilities;
+
+	/* Tracing of node internal messages */
+	struct packet_type loopback_pt;
 };
 
 static inline struct tipc_net *tipc_net(struct net *net)
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 550581d..16d251b 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1443,6 +1443,7 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list,
 	int rc;
 
 	if (in_own_node(net, dnode)) {
+		tipc_loopback_trace(net, list);
 		tipc_sk_rcv(net, list);
 		return 0;
 	}
diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c
index f345662..e3a6ba1 100644
--- a/net/tipc/topsrv.c
+++ b/net/tipc/topsrv.c
@@ -40,6 +40,7 @@
 #include "socket.h"
 #include "addr.h"
 #include "msg.h"
+#include "bearer.h"
 #include <net/sock.h>
 #include <linux/module.h>
 
@@ -608,6 +609,7 @@ static void tipc_topsrv_kern_evt(struct net *net, struct tipc_event *evt)
 	memcpy(msg_data(buf_msg(skb)), evt, sizeof(*evt));
 	skb_queue_head_init(&evtq);
 	__skb_queue_tail(&evtq, skb);
+	tipc_loopback_trace(net, &evtq);
 	tipc_sk_rcv(net, &evtq);
 }
 
-- 
2.11.0
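The idea described in the commit message (clone node-internal messages to a capture point only when someone is listening, and leave normal delivery untouched) can be sketched in userspace as follows. This is an illustration only: `struct sketch_queue`, `struct sketch_tap` and the `sketch_*` functions are hypothetical stand-ins for the skb queue, the loopback device's taps and the patch's helpers, and strings stand in for packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_MSGS 8

/* Hypothetical stand-in for a queue of node-internal messages. */
struct sketch_queue {
	const char *msg[SKETCH_MAX_MSGS];
	int len;
};

/* Hypothetical stand-in for a capture tap on the loopback device. */
struct sketch_tap {
	bool active;			/* plays the role of dev_nit_active() */
	char *seen[SKETCH_MAX_MSGS];	/* copies a sniffer would observe */
	int nseen;
};

/* Copy every queued message to the tap; the originals are left untouched. */
void sketch_clone_to_tap(struct sketch_tap *tap, const struct sketch_queue *q)
{
	for (int i = 0; i < q->len && tap->nseen < SKETCH_MAX_MSGS; i++) {
		size_t n = strlen(q->msg[i]) + 1;
		char *copy = malloc(n);

		if (!copy)
			continue;	/* best effort: skip this message */
		memcpy(copy, q->msg[i], n);
		tap->seen[tap->nseen++] = copy;
	}
}

/* Cheap check on the fast path; cloning only happens when someone listens. */
void sketch_loopback_trace(struct sketch_tap *tap, const struct sketch_queue *q)
{
	if (tap->active)
		sketch_clone_to_tap(tap, q);
}

/* Deliver node-internal messages; returns how many reached the receiver. */
int sketch_node_xmit(struct sketch_tap *tap, const struct sketch_queue *q)
{
	assert(q->len <= SKETCH_MAX_MSGS);
	sketch_loopback_trace(tap, q);
	return q->len;			/* delivery itself is unchanged */
}

/* Drop the captured copies, as the clones have no functional role. */
void sketch_tap_release(struct sketch_tap *tap)
{
	while (tap->nseen > 0)
		free(tap->seen[--tap->nseen]);
}
```

With `tap.active` false, `sketch_node_xmit()` does no copying at all, which mirrors why the patch guards the clone path with an `unlikely()` check.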



* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Andrew Lunn @ 2019-08-07  2:59 UTC
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, jakub.kicinski, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet
In-Reply-To: <e0047c07-11a0-423c-9560-3806328a0d76@gmail.com>

On Tue, Aug 06, 2019 at 08:33:47PM -0600, David Ahern wrote:
> Some time back support was added for devlink 'resources'. The idea is
> that hardware (mlxsw) has limited resources (e.g., memory) that can be
> allocated in certain ways (e.g., kvd for mlxsw) thus implementing
> restrictions on the number of programmable entries (e.g., routes,
> neighbors) by userspace.
> 
> I contend:
> 
> 1. The kernel is an analogy to the hardware: it is programmed by
> userspace, has limited resources (e.g., memory), and that users want to
> control (e.g., limit) the number of networking entities that can be
> programmed - routes, rules, nexthop objects etc and by address family
> (ipv4, ipv6).
> 
> 2. A consistent operational model across use cases - s/w forwarding, XDP
> forwarding and hardware forwarding - is good for users deploying systems
> based on the Linux networking stack. This aligns with my basic point at
> LPC last November about better integration of XDP and kernel tables.

Hi David

Nice arguments.

However, zoom out a bit, from networking to the whole kernel. In
general, across the kernel as a whole, resource management is done
with cgroups. cgroups is the consistent operational model across the
kernel as a whole.

So I think you need a second leg to your argument. You have said why
devlink is the right way to do this. But you should also be able to
say to Tejun Heo why cgroups is the wrong way to do this, going
against the kernel as a whole model. Why is networking special?

      Andrew


* RE: Slowness forming TIPC cluster with explicit node addresses
From: Jon Maloy @ 2019-08-07  2:55 UTC
  To: Chris Packham, tipc-discussion@lists.sourceforge.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1564959879.27215.18.camel@alliedtelesis.co.nz>



> -----Original Message-----
> From: Chris Packham <Chris.Packham@alliedtelesis.co.nz>
> Sent: 4-Aug-19 19:05
> To: Jon Maloy <jon.maloy@ericsson.com>; tipc-
> discussion@lists.sourceforge.net
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: Slowness forming TIPC cluster with explicit node addresses
> 
> On Sun, 2019-08-04 at 21:53 +0000, Jon Maloy wrote:
> >
> > >
> > > -----Original Message-----
> > > From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org>
> On
> > > Behalf Of Chris Packham
> > > Sent: 2-Aug-19 01:11
> > > To: Jon Maloy <jon.maloy@ericsson.com>; tipc-
> > > discussion@lists.sourceforge.net
> > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> > > Subject: Re: Slowness forming TIPC cluster with explicit node
> > > addresses
> > >
> > > On Mon, 2019-07-29 at 09:04 +1200, Chris Packham wrote:
> > > >
> > > > On Fri, 2019-07-26 at 13:31 +0000, Jon Maloy wrote:
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: netdev-owner@vger.kernel.org <netdev-
> > > owner@vger.kernel.org>
> > > >
> > > > >
> > > > > >
> > > > > > On Behalf Of Chris Packham
> > > > > > Sent: 25-Jul-19 19:37
> > > > > > To: tipc-discussion@lists.sourceforge.net
> > > > > > Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> > > > > > Subject: Slowness forming TIPC cluster with explicit node
> > > > > > addresses
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm having problems forming a TIPC cluster between 2 nodes.
> > > > > >
> > > > > > > These are the basic steps I'm going through on each node.
> > > > > >
> > > > > > modprobe tipc
> > > > > > ip link set eth2 up
> > > > > > > tipc node set addr 1.1.5 # or 1.1.6
> > > > > > > tipc bearer enable media eth dev eth0
> > > > > eth2, I assume...
> > > > >
> > > > > Yes sorry I keep switching between Ethernet ports for testing
> > > > > so I hand edited the email.
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Then to confirm if the cluster is formed I use tipc link list
> > > > > >
> > > > > > [root@node-5 ~]# tipc link list
> > > > > > broadcast-link: up
> > > > > > ...
> > > > > >
> > > > > > Looking at tcpdump the two nodes are sending packets
> > > > > >
> > > > > > 22:30:05.782320 TIPC v2.0 1.1.5 > 0.0.0, headerlength 60
> > > > > > bytes,
> > > > > > MessageSize
> > > > > > 76 bytes, Neighbor Detection Protocol internal, messageType
> > > > > > Link
> > > > > > request
> > > > > > 22:30:05.863555 TIPC v2.0 1.1.6 > 0.0.0, headerlength 60
> > > > > > bytes,
> > > > > > MessageSize
> > > > > > 76 bytes, Neighbor Detection Protocol internal, messageType
> > > > > > Link
> > > > > > request
> > > > > >
> > > > > > Eventually (after a few minutes) the link does come up
> > > > > >
> > > > > > [root@node-6 ~]# tipc link list
> > > > > > broadcast-link: up
> > > > > > 1001006:eth2-1001005:eth2: up
> > > > > >
> > > > > > [root@node-5 ~]# tipc link list
> > > > > > broadcast-link: up
> > > > > > 1001005:eth2-1001006:eth2: up
> > > > > >
> > > > > > When I remove the "tipc node set addr" things seem to kick
> > > > > > into
> > > > > > life straight away
> > > > > >
> > > > > > [root@node-5 ~]# tipc link list
> > > > > > broadcast-link: up
> > > > > > 0050b61bd2aa:eth2-0050b61e6dfa:eth2: up
> > > > > >
> > > > > > So there appears to be some difference in behaviour between
> > > > > > having
> > > > > > an explicit node address and using the default. Unfortunately
> > > > > > our
> > > > > > application relies on setting the node addresses.
> > > > > I do this many times a day, without any problems. If there
> > > > > would be
> > > > > any time difference, I would expect the 'auto configurable'
> > > > > version
> > > > > to be slower, because it involves a DAD step.
> > > > > Are you sure you don't have any other nodes running in your
> > > > > system?
> > > > >
> > > > > ///jon
> > > > >
> > > > Nope the two nodes are connected back to back. Does the number of
> > > > Ethernet interfaces make a difference? As you can see I've got 3
> > > > on
> > > > each node. One is completely disconnected, one is for booting
> > > > over
> > > > TFTP
> > > >  (only used by U-boot) and the other is the USB Ethernet I'm
> > > > using for
> > > > testing.
> > > >
> > > So I can still reproduce this on nodes that only have one network
> > > interface and
> > > are the only things connected.
> > >
> > > I did find one thing that helps
> > >
> > > diff --git a/net/tipc/discover.c b/net/tipc/discover.c
> > > index c138d68e8a69..49921dad404a 100644
> > > --- a/net/tipc/discover.c
> > > +++ b/net/tipc/discover.c
> > > @@ -358,10 +358,10 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
> > >         tipc_disc_init_msg(net, d->skb, DSC_REQ_MSG, b);
> > >
> > >         /* Do we need an address trial period first ? */
> > > -       if (!tipc_own_addr(net)) {
> > > +//     if (!tipc_own_addr(net)) {
> > >                 tn->addr_trial_end = jiffies + msecs_to_jiffies(1000);
> > >                 msg_set_type(buf_msg(d->skb), DSC_TRIAL_MSG);
> > > -       }
> > > +//     }
> > >         memcpy(&d->dest, dest, sizeof(*dest));
> > >         d->net = net;
> > >         d->bearer_id = b->identity;
> > >
> > > I think that because duplicate address detection is skipped with
> > > pre-configured addresses, the shorter init phase is skipped as well.
> > > Would it make sense to unconditionally do the trial step? Or is there
> > > some better way to get things to transition with pre-assigned
> > > addresses?
> >
> > I am on vacation until the end of next-week, so I can't give you any
> > good analysis right now.
> 
> Thanks for taking the time to respond.
> 
> > Doing the trial step doesn't make much sense to me; it would only
> > delay the setup unnecessarily (though by only 1 second).
> > Can you check the initial value of addr_trial_end when there is a
> > pre-configured address?
> 
> I had the same thought. For both my devices 'addr_trial_end = 0' so I
> think tipc_disc_addr_trial_msg should end up with trial == false

I suggest you try initializing it to jiffies and see what happens.

///jon
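
For reference, a minimal sketch of why a zero addr_trial_end could matter.
It assumes a 32-bit jiffies counter that starts shortly before wraparound
and a trial check written in the style of the kernel's time_before(); the
helper names below are illustrative and are not taken from net/tipc.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-safe "a is before b" test on a 32-bit tick counter,
 * modelled on the time_before() idiom (signed difference).
 */
bool tick_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Hypothetical stand-in for an "are we still in the trial period?"
 * check; the real check in the TIPC code may differ.
 */
bool in_trial_period(uint32_t now, uint32_t addr_trial_end)
{
	return tick_before(now, addr_trial_end);
}
```

With now just below the wrap point, in_trial_period(now, 0) is true because
0 looks like a time in the near future, whereas in_trial_period(now, now)
is false. Under these assumptions, initializing the end time to the current
tick count avoids an unintended trial period.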

> 
> >
> > ///jon
> >

^ permalink raw reply

* [PATCH v2 1/1] ixgbe: sync the first fragment unconditionally
From: Firo Yang @ 2019-08-07  2:49 UTC (permalink / raw)
  To: davem@davemloft.net
  Cc: alexander.h.duyck@linux.intel.com, jeffrey.t.kirsher@intel.com,
	netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-kernel@vger.kernel.org, Firo Yang

In a Xen environment, if Xen-swiotlb is enabled, the ixgbe driver
may allocate a page (the DMA memory buffer) for the first fragment
that is not suitable for Xen-swiotlb to do DMA operations on.
Xen-swiotlb then has to internally allocate another page for doing DMA
operations, which requires syncing between those two pages. However,
since commit f3213d932173 ("ixgbe: Update driver to make use of DMA
attributes in Rx path"), the unmap operation is performed with
DMA_ATTR_SKIP_CPU_SYNC. As a result, the sync is not performed.

To fix this problem, always sync before possibly performing a page
unmap operation.

Fixes: f3213d932173 ("ixgbe: Update driver to make use of DMA
attributes in Rx path")
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Firo Yang <firo.yang@suse.com>
---

Changes from v1:
 * Improved the patch description.
 * Added Reviewed-by: and Fixes: as suggested by Alexander Duyck
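
The ordering problem described above can be sketched with a toy model of a
bounced mapping. This is only an illustration under stated assumptions:
every name below is hypothetical, and the "bounce" buffer stands in for the
extra page that Xen-swiotlb allocates internally; none of it is the real
DMA API.

```c
#include <stdbool.h>
#include <string.h>

#define BUF_LEN 16

/* Toy bounced mapping: the device writes 'bounce', the driver reads 'orig'. */
struct toy_mapping {
	char orig[BUF_LEN];
	char bounce[BUF_LEN];
	bool mapped;
};

/* The device delivers data into the bounce buffer. */
void toy_device_write(struct toy_mapping *m, const char *data)
{
	strncpy(m->bounce, data, BUF_LEN - 1);
	m->bounce[BUF_LEN - 1] = '\0';
}

/* Models a sync-for-CPU: copy the bounce buffer back to the original. */
void toy_sync_for_cpu(struct toy_mapping *m)
{
	memcpy(m->orig, m->bounce, BUF_LEN);
}

/* Models an unmap; with skip_cpu_sync set, nothing is copied back. */
void toy_unmap(struct toy_mapping *m, bool skip_cpu_sync)
{
	if (!skip_cpu_sync)
		toy_sync_for_cpu(m);
	m->mapped = false;
}

/* Old order: a released page is unmapped without any sync. */
void toy_rx_old(struct toy_mapping *m, bool page_released)
{
	if (page_released)
		toy_unmap(m, true);
	else
		toy_sync_for_cpu(m);
}

/* Order used by this patch: always sync first, then unmap if released. */
void toy_rx_fixed(struct toy_mapping *m, bool page_released)
{
	toy_sync_for_cpu(m);
	if (page_released)
		toy_unmap(m, true);
}
```

In this model, toy_rx_old() with page_released set leaves 'orig' empty,
while toy_rx_fixed() leaves the received data visible to the driver.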

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cbaf712d6529..200de9838096 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1825,13 +1825,7 @@ static void ixgbe_pull_tail(struct ixgbe_ring *rx_ring,
 static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
 				struct sk_buff *skb)
 {
-	/* if the page was released unmap it, else just sync our portion */
-	if (unlikely(IXGBE_CB(skb)->page_released)) {
-		dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
-				     ixgbe_rx_pg_size(rx_ring),
-				     DMA_FROM_DEVICE,
-				     IXGBE_RX_DMA_ATTR);
-	} else if (ring_uses_build_skb(rx_ring)) {
+	if (ring_uses_build_skb(rx_ring)) {
 		unsigned long offset = (unsigned long)(skb->data) & ~PAGE_MASK;
 
 		dma_sync_single_range_for_cpu(rx_ring->dev,
@@ -1848,6 +1842,14 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
 					      skb_frag_size(frag),
 					      DMA_FROM_DEVICE);
 	}
+
+	/* If the page was released, just unmap it. */
+	if (unlikely(IXGBE_CB(skb)->page_released)) {
+		dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
+				     ixgbe_rx_pg_size(rx_ring),
+				     DMA_FROM_DEVICE,
+				     IXGBE_RX_DMA_ATTR);
+	}
 }
 
 /**
-- 
2.16.4


^ permalink raw reply related

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: David Ahern @ 2019-08-07  2:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jiri Pirko, netdev, davem, mlxsw, jakub.kicinski, f.fainelli,
	vivien.didelot, mkubecek, stephen, daniel, brouer, eric.dumazet,
	Jakub Kicinski
In-Reply-To: <20190806180346.GD17072@lunn.ch>

Some time back support was added for devlink 'resources'. The idea is
that hardware (mlxsw) has limited resources (e.g., memory) that can be
allocated in certain ways (e.g., kvd for mlxsw) thus implementing
restrictions on the number of programmable entries (e.g., routes,
neighbors) by userspace.

I contend:

1. The kernel is an analogy to the hardware: it is programmed by
userspace, has limited resources (e.g., memory), and that users want to
control (e.g., limit) the number of networking entities that can be
programmed - routes, rules, nexthop objects etc and by address family
(ipv4, ipv6).

2. A consistent operational model across use cases - s/w forwarding, XDP
forwarding and hardware forwarding - is good for users deploying systems
based on the Linux networking stack. This aligns with my basic point at
LPC last November about better integration of XDP and kernel tables.

The existing devlink API is the right one for all use cases. Most
notably, the kernel can mimic the hardware from a resource management
perspective. Trying to say 'use cgroups for s/w forwarding and devlink
for h/w forwarding' is complicating the lives of users. It is just a
model and models can apply to more than some rigid definition.

As for the namespace piece of this, the kernel's tables for networking
are *per namespace*, and so the resource controller must be per
namespace. This aligns with another consistent theme I have promoted
over the years - the ability to divide up a single ASIC into multiple,
virtual switches which are managed per namespace. This is a very popular
feature from a certain legacy vendor and one that would be good for open
networking to achieve. This is the basis of my response last week about
the devlink instance per namespace, and I thought Jiri was moving in
that direction until our chat today. Jiri's intention is something
different; we can discuss that on the next version of his patches.

###

As for the current controller put into netdevsim...

When I started down this road 18-20 months ago, I was copying a lot of
netdevsim code to create a fake device from which I could have a devlink
instance to implement the devlink resources. At some point it was silly
to keep duplicating the code - just make it part of netdevsim. After all
it really mirrors mlxsw and the resource limits for fib notifier
handling, it allows testing of the userspace APIs and in kernel notifier
APIs which allow an entity to veto a change. This is all consistent with
the intent of netdevsim - s/w based implementation for testing of APIs
that otherwise require hardware.
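
To make the "veto a change" idea concrete, here is a rough sketch of a
per-namespace resource counter that rejects additions once a limit is hit.
It is purely illustrative: the struct and function names are made up for
this example and are not the devlink or fib notifier APIs.

```c
#include <errno.h>

/* Hypothetical per-namespace resource accounting. */
struct ns_resource {
	unsigned int limit;     /* maximum number of entries allowed */
	unsigned int occupancy; /* entries currently programmed */
};

/*
 * Called before an entry (e.g. a route) is added in this namespace.
 * A non-zero return vetoes the change.
 */
int ns_resource_try_add(struct ns_resource *res)
{
	if (res->occupancy >= res->limit)
		return -ENOSPC;
	res->occupancy++;
	return 0;
}

/* Called when an entry is removed from this namespace. */
void ns_resource_del(struct ns_resource *res)
{
	if (res->occupancy > 0)
		res->occupancy--;
}
```

One such counter per resource and per namespace would give userspace the
same "limit and occupancy" view whether the table lives in hardware or in
the kernel.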

^ permalink raw reply

* RE: Realtek r8822be wireless card fails to work with new rtw88 kernel module
From: Tony Chuang @ 2019-08-07  2:33 UTC (permalink / raw)
  To: Brian Norris, 고준
  Cc: linux-wireless, <netdev@vger.kernel.org>, Linux Kernel
In-Reply-To: <CA+ASDXM6Jz7YY9XUj6QKv5VJCED-BnQ5K1UZHNApB9p6qTWtgg@mail.gmail.com>

> + yhchuang
> 
> On Tue, Aug 6, 2019 at 7:32 AM 고준 <gojun077@gmail.com> wrote:
> >
> > Hello,
> >
> > I recently reported a bug to Ubuntu regarding a regression in wireless
> > driver support for the Realtek r8822be wireless chipset. The issue
> > link on launchpad is:
> >
> > https://bugs.launchpad.net/bugs/1838133
> >
> > After Canonical developers triaged the bug they determined that the
> > problem lies upstream, and instructed me to send mails to the relevant
> > kernel module maintainers at Realtek and to the general kernel.org
> > mailing list.
> >
> > I built kernel 5.3.0-rc1+ with the latest realtek drivers from
> > wireless-drivers-next but my Realtek r8822be doesn't work with
> > rtw88/rtwpci kernel modules.
> >
> > Please let me know if there is any additional information I can
> > provide that would help in debugging this issue.
> 
> Any chance this would help you?
> 
> https://patchwork.kernel.org/patch/11065631/
> 
> Somebody else was complaining about 8822be regressions that were fixed
> with that.
> 

I hope it fixes it.

And since "r8822be" was dropped, it is preferred to use "rtw88" instead.
I have received reports of two kinds of failures that cause the driver to
stop working. One is that MSI interrupts need to be enabled on certain
platforms. The other is related to the RFE type of the card. Could you
send me more dmesg output?

Yan-Hsuan



^ permalink raw reply

* [v3,4/4] tools: bpftool: add documentation for net attach/detach
From: Daniel T. Lee @ 2019-08-07  2:25 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20190807022509.4214-1-danieltimlee@gmail.com>

Since the new sub-command 'net attach/detach' has been added for
attaching XDP programs to an interface, this commit documents the
usage and sample output of `net attach/detach`.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
 .../bpf/bpftool/Documentation/bpftool-net.rst | 51 +++++++++++++++++--
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-net.rst b/tools/bpf/bpftool/Documentation/bpftool-net.rst
index d8e5237a2085..4ad1a380e186 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-net.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-net.rst
@@ -15,17 +15,22 @@ SYNOPSIS
 	*OPTIONS* := { [{ **-j** | **--json** }] [{ **-p** | **--pretty** }] }
 
 	*COMMANDS* :=
-	{ **show** | **list** } [ **dev** name ] | **help**
+	{ **show** | **list** | **attach** | **detach** | **help** }
 
 NET COMMANDS
 ============
 
-|	**bpftool** **net { show | list } [ dev name ]**
+|	**bpftool** **net { show | list }** [ **dev** *name* ]
+|	**bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *name* [ **overwrite** ]
+|	**bpftool** **net detach** *ATTACH_TYPE* **dev** *name*
 |	**bpftool** **net help**
+|
+|	*PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
+|	*ATTACH_TYPE* := { **xdp** | **xdpgeneric** | **xdpdrv** | **xdpoffload** }
 
 DESCRIPTION
 ===========
-	**bpftool net { show | list } [ dev name ]**
+	**bpftool net { show | list }** [ **dev** *name* ]
                   List bpf program attachments in the kernel networking subsystem.
 
                   Currently, only device driver xdp attachments and tc filter
@@ -47,6 +52,18 @@ DESCRIPTION
                   all bpf programs attached to non clsact qdiscs, and finally all
                   bpf programs attached to root and clsact qdisc.
 
+	**bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *name* [ **overwrite** ]
+                  Attach bpf program *PROG* to network interface *name* with
+                  type specified by *ATTACH_TYPE*. Previously attached bpf program
+                  can be replaced by the command used with **overwrite** option.
+                  Currently, *ATTACH_TYPE* only contains XDP programs.
+
+	**bpftool** **net detach** *ATTACH_TYPE* **dev** *name*
+                  Detach bpf program attached to network interface *name* with
+                  type specified by *ATTACH_TYPE*. To detach bpf program, same
+                  *ATTACH_TYPE* previously used for attach must be specified.
+                  Currently, *ATTACH_TYPE* only contains XDP programs.
+
 	**bpftool net help**
 		  Print short help message.
 
@@ -137,6 +154,34 @@ EXAMPLES
         }
     ]
 
+|
+| **# bpftool net attach xdpdrv id 16 dev enp6s0np0**
+| **# bpftool net**
+
+::
+
+      xdp:
+      enp6s0np0(4) driver id 16
+
+|
+| **# bpftool net attach xdpdrv id 16 dev enp6s0np0**
+| **# bpftool net attach xdpdrv id 20 dev enp6s0np0 overwrite**
+| **# bpftool net**
+
+::
+
+      xdp:
+      enp6s0np0(4) driver id 20
+
+|
+| **# bpftool net attach xdpdrv id 16 dev enp6s0np0**
+| **# bpftool net detach xdpdrv dev enp6s0np0**
+| **# bpftool net**
+
+::
+
+      xdp:
+
 
 SEE ALSO
 ========
-- 
2.20.1


^ permalink raw reply related

* [v3,3/4] tools: bpftool: add bash-completion for net attach/detach
From: Daniel T. Lee @ 2019-08-07  2:25 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20190807022509.4214-1-danieltimlee@gmail.com>

This commit adds bash-completion for the new "net attach/detach"
subcommand for attaching an XDP program to an interface.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
 tools/bpf/bpftool/bash-completion/bpftool | 64 +++++++++++++++++++----
 1 file changed, 55 insertions(+), 9 deletions(-)

diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index c8f42e1fcbc9..1d81cb09d478 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -201,6 +201,10 @@ _bpftool()
             _bpftool_get_prog_tags
             return 0
             ;;
+        dev)
+            _sysfs_get_netdevs
+            return 0
+            ;;
         file|pinned)
             _filedir
             return 0
@@ -399,10 +403,6 @@ _bpftool()
                             _filedir
                             return 0
                             ;;
-                        dev)
-                            _sysfs_get_netdevs
-                            return 0
-                            ;;
                         *)
                             COMPREPLY=( $( compgen -W "map" -- "$cur" ) )
                             _bpftool_once_attr 'type'
@@ -498,10 +498,6 @@ _bpftool()
                         key|value|flags|name|entries)
                             return 0
                             ;;
-                        dev)
-                            _sysfs_get_netdevs
-                            return 0
-                            ;;
                         *)
                             _bpftool_once_attr 'type'
                             _bpftool_once_attr 'key'
@@ -775,11 +771,61 @@ _bpftool()
             esac
             ;;
         net)
+            local PROG_TYPE='id pinned tag'
+            local ATTACH_TYPES='xdp xdpgeneric xdpdrv xdpoffload'
             case $command in
+                show|list)
+                    [[ $prev != "$command" ]] && return 0
+                    COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
+                    return 0
+                    ;;
+                attach)
+                    case $cword in
+                        3)
+                            COMPREPLY=( $( compgen -W "$ATTACH_TYPES" -- "$cur" ) )
+                            return 0
+                            ;;
+                        4)
+                            COMPREPLY=( $( compgen -W "$PROG_TYPE" -- "$cur" ) )
+                            return 0
+                            ;;
+                        5)
+                            case $prev in
+                                id)
+                                    _bpftool_get_prog_ids
+                                    ;;
+                                pinned)
+                                    _filedir
+                                    ;;
+                            esac
+                            return 0
+                            ;;
+                        6)
+                            COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
+                            return 0
+                            ;;
+                        8)
+                            _bpftool_once_attr 'overwrite'
+                            return 0
+                            ;;
+                    esac
+                    ;;
+                detach)
+                    case $cword in
+                        3)
+                            COMPREPLY=( $( compgen -W "$ATTACH_TYPES" -- "$cur" ) )
+                            return 0
+                            ;;
+                        4)
+                            COMPREPLY=( $( compgen -W 'dev' -- "$cur" ) )
+                            return 0
+                            ;;
+                    esac
+                    ;;
                 *)
                     [[ $prev == $object ]] && \
                         COMPREPLY=( $( compgen -W 'help \
-                            show list' -- "$cur" ) )
+                            show list attach detach' -- "$cur" ) )
                     ;;
             esac
             ;;
-- 
2.20.1


^ permalink raw reply related

* [v3,2/4] tools: bpftool: add net detach command to detach XDP on interface
From: Daniel T. Lee @ 2019-08-07  2:25 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20190807022509.4214-1-danieltimlee@gmail.com>

With this commit, an attached XDP prog can be detached using
`bpftool net detach`. Detaching the BPF prog is done through libbpf's
'bpf_set_link_xdp_fd' with the progfd set to -1.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
 tools/bpf/bpftool/net.c | 42 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index c05a3fac5cac..7be96acb08e0 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -343,6 +343,43 @@ static int do_attach(int argc, char **argv)
 	return 0;
 }
 
+static int do_detach(int argc, char **argv)
+{
+	enum net_attach_type attach_type;
+	int progfd, ifindex, err = 0;
+
+	/* parse detach args */
+	if (!REQ_ARGS(3))
+		return -EINVAL;
+
+	attach_type = parse_attach_type(*argv);
+	if (attach_type == max_net_attach_type) {
+		p_err("invalid net attach/detach type");
+		return -EINVAL;
+	}
+
+	NEXT_ARG();
+	ifindex = net_parse_dev(&argc, &argv);
+	if (ifindex < 1)
+		return -EINVAL;
+
+	/* detach xdp prog */
+	progfd = -1;
+	if (is_prefix("xdp", attach_type_strings[attach_type]))
+		err = do_attach_detach_xdp(progfd, attach_type, ifindex, NULL);
+
+	if (err < 0) {
+		p_err("interface %s detach failed",
+		      attach_type_strings[attach_type]);
+		return err;
+	}
+
+	if (json_output)
+		jsonw_null(json_wtr);
+
+	return 0;
+}
+
 static int do_show(int argc, char **argv)
 {
 	struct bpf_attach_info attach_info = {};
@@ -419,6 +456,7 @@ static int do_help(int argc, char **argv)
 	fprintf(stderr,
 		"Usage: %s %s { show | list } [dev <devname>]\n"
 		"       %s %s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
+		"       %s %s detach ATTACH_TYPE dev <devname>\n"
 		"       %s %s help\n"
 		"\n"
 		"       " HELP_SPEC_PROGRAM "\n"
@@ -429,7 +467,8 @@ static int do_help(int argc, char **argv)
 		"      to dump program attachments. For program types\n"
 		"      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
 		"      consult iproute2.\n",
-		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2]);
 
 	return 0;
 }
@@ -438,6 +477,7 @@ static const struct cmd cmds[] = {
 	{ "show",	do_show },
 	{ "list",	do_show },
 	{ "attach",	do_attach },
+	{ "detach",	do_detach },
 	{ "help",	do_help },
 	{ 0 }
 };
-- 
2.20.1


^ permalink raw reply related

* [v3,1/4] tools: bpftool: add net attach command to attach XDP on interface
From: Daniel T. Lee @ 2019-08-07  2:25 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20190807022509.4214-1-danieltimlee@gmail.com>

With this commit, a user can attach an XDP prog to an interface using
`bpftool net attach`. A new enum, 'net_attach_type', has been added; as
stated in the cover letter, 'attach' here means that the prog is attached
to an interface.

With the 'overwrite' option given as an argument, an already attached XDP
program can be replaced. A new helper, 'net_parse_dev', parses the network
device given as an argument.

The BPF prog is attached through libbpf's 'bpf_set_link_xdp_fd'.
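
As a rough illustration of how the attach type and the 'overwrite' option
turn into XDP flags, here is a standalone sketch. The SK_ prefixed names
are made up for this example; the flag values are meant to mirror the
XDP_FLAGS_* definitions in the uapi header linux/if_link.h and are
redefined only so that the sketch builds without kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the XDP flag bits (assumed to match linux/if_link.h). */
#define SK_XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
#define SK_XDP_FLAGS_SKB_MODE		(1U << 1)
#define SK_XDP_FLAGS_DRV_MODE		(1U << 2)
#define SK_XDP_FLAGS_HW_MODE		(1U << 3)

/* Illustrative counterpart of the attach types in this patch. */
enum sk_attach_type {
	SK_XDP,
	SK_XDP_GENERIC,
	SK_XDP_DRIVER,
	SK_XDP_OFFLOAD,
};

/* Compute the flags that would be passed along with the prog fd. */
uint32_t sk_xdp_flags(enum sk_attach_type type, bool overwrite)
{
	uint32_t flags = 0;

	/* Without 'overwrite', refuse to replace an existing program. */
	if (!overwrite)
		flags = SK_XDP_FLAGS_UPDATE_IF_NOEXIST;
	if (type == SK_XDP_GENERIC)
		flags |= SK_XDP_FLAGS_SKB_MODE;
	if (type == SK_XDP_DRIVER)
		flags |= SK_XDP_FLAGS_DRV_MODE;
	if (type == SK_XDP_OFFLOAD)
		flags |= SK_XDP_FLAGS_HW_MODE;

	return flags;
}
```

For example, 'xdpdrv' without 'overwrite' yields the driver-mode bit plus
the update-if-noexist bit, while adding 'overwrite' leaves only the
driver-mode bit set.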

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
 tools/bpf/bpftool/net.c | 141 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 130 insertions(+), 11 deletions(-)

diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index 67e99c56bc88..c05a3fac5cac 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -55,6 +55,35 @@ struct bpf_attach_info {
 	__u32 flow_dissector_id;
 };
 
+enum net_attach_type {
+	NET_ATTACH_TYPE_XDP,
+	NET_ATTACH_TYPE_XDP_GENERIC,
+	NET_ATTACH_TYPE_XDP_DRIVER,
+	NET_ATTACH_TYPE_XDP_OFFLOAD,
+};
+
+static const char * const attach_type_strings[] = {
+	[NET_ATTACH_TYPE_XDP]		= "xdp",
+	[NET_ATTACH_TYPE_XDP_GENERIC]	= "xdpgeneric",
+	[NET_ATTACH_TYPE_XDP_DRIVER]	= "xdpdrv",
+	[NET_ATTACH_TYPE_XDP_OFFLOAD]	= "xdpoffload",
+};
+
+const size_t max_net_attach_type = ARRAY_SIZE(attach_type_strings);
+
+static enum net_attach_type parse_attach_type(const char *str)
+{
+	enum net_attach_type type;
+
+	for (type = 0; type < max_net_attach_type; type++) {
+		if (attach_type_strings[type] &&
+		   is_prefix(str, attach_type_strings[type]))
+			return type;
+	}
+
+	return max_net_attach_type;
+}
+
 static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
 {
 	struct bpf_netdev_t *netinfo = cookie;
@@ -223,6 +252,97 @@ static int query_flow_dissector(struct bpf_attach_info *attach_info)
 	return 0;
 }
 
+static int net_parse_dev(int *argc, char ***argv)
+{
+	int ifindex;
+
+	if (is_prefix(**argv, "dev")) {
+		NEXT_ARGP();
+
+		ifindex = if_nametoindex(**argv);
+		if (!ifindex)
+			p_err("invalid devname %s", **argv);
+
+		NEXT_ARGP();
+	} else {
+		p_err("expected 'dev', got: '%s'?", **argv);
+		return -1;
+	}
+
+	return ifindex;
+}
+
+static int do_attach_detach_xdp(int progfd, enum net_attach_type attach_type,
+				int ifindex, bool overwrite)
+{
+	__u32 flags = 0;
+
+	if (!overwrite)
+		flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	if (attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
+		flags |= XDP_FLAGS_SKB_MODE;
+	if (attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
+		flags |= XDP_FLAGS_DRV_MODE;
+	if (attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
+		flags |= XDP_FLAGS_HW_MODE;
+
+	return bpf_set_link_xdp_fd(ifindex, progfd, flags);
+}
+
+static int do_attach(int argc, char **argv)
+{
+	enum net_attach_type attach_type;
+	int progfd, ifindex, err = 0;
+	bool overwrite = false;
+
+	/* parse attach args */
+	if (!REQ_ARGS(5))
+		return -EINVAL;
+
+	attach_type = parse_attach_type(*argv);
+	if (attach_type == max_net_attach_type) {
+		p_err("invalid net attach/detach type");
+		return -EINVAL;
+	}
+
+	NEXT_ARG();
+	progfd = prog_parse_fd(&argc, &argv);
+	if (progfd < 0)
+		return -EINVAL;
+
+	ifindex = net_parse_dev(&argc, &argv);
+	if (ifindex < 1) {
+		close(progfd);
+		return -EINVAL;
+	}
+
+	if (argc) {
+		if (is_prefix(*argv, "overwrite")) {
+			overwrite = true;
+		} else {
+			p_err("expected 'overwrite', got: '%s'?", *argv);
+			close(progfd);
+			return -EINVAL;
+		}
+	}
+
+	/* attach xdp prog */
+	if (is_prefix("xdp", attach_type_strings[attach_type]))
+		err = do_attach_detach_xdp(progfd, attach_type, ifindex,
+					   overwrite);
+
+	if (err < 0) {
+		p_err("interface %s attach failed",
+		      attach_type_strings[attach_type]);
+		return err;
+	}
+
+	if (json_output)
+		jsonw_null(json_wtr);
+
+	return 0;
+}
+
 static int do_show(int argc, char **argv)
 {
 	struct bpf_attach_info attach_info = {};
@@ -231,17 +351,10 @@ static int do_show(int argc, char **argv)
 	unsigned int nl_pid;
 	char err_buf[256];
 
-	if (argc == 2) {
-		if (strcmp(argv[0], "dev") != 0)
-			usage();
-		filter_idx = if_nametoindex(argv[1]);
-		if (filter_idx == 0) {
-			fprintf(stderr, "invalid dev name %s\n", argv[1]);
-			return -1;
-		}
-	} else if (argc != 0) {
+	if (argc == 2)
+		filter_idx = net_parse_dev(&argc, &argv);
+	else if (argc != 0)
 		usage();
-	}
 
 	ret = query_flow_dissector(&attach_info);
 	if (ret)
@@ -305,13 +418,18 @@ static int do_help(int argc, char **argv)
 
 	fprintf(stderr,
 		"Usage: %s %s { show | list } [dev <devname>]\n"
+		"       %s %s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
 		"       %s %s help\n"
+		"\n"
+		"       " HELP_SPEC_PROGRAM "\n"
+		"       ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
+		"\n"
 		"Note: Only xdp and tc attachments are supported now.\n"
 		"      For progs attached to cgroups, use \"bpftool cgroup\"\n"
 		"      to dump program attachments. For program types\n"
 		"      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
 		"      consult iproute2.\n",
-		bin_name, argv[-2], bin_name, argv[-2]);
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
 
 	return 0;
 }
@@ -319,6 +437,7 @@ static int do_help(int argc, char **argv)
 static const struct cmd cmds[] = {
 	{ "show",	do_show },
 	{ "list",	do_show },
+	{ "attach",	do_attach },
 	{ "help",	do_help },
 	{ 0 }
 };
-- 
2.20.1


^ permalink raw reply related

* [v3,0/4] tools: bpftool: add net attach/detach command to attach XDP prog
From: Daniel T. Lee @ 2019-08-07  2:25 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev

Currently, bpftool net only supports dumping progs attached to an
interface. To attach an XDP prog to an interface, the user must use
another tool (e.g. iproute2). With this patch set, the user can
attach/detach an XDP prog on an interface with `bpftool net attach/detach`.

    # bpftool prog
    16: xdp  name xdp_prog1  tag 539ec6ce11b52f98  gpl
            loaded_at 2019-08-07T08:30:17+0900  uid 0
    ...
    20: xdp  name xdp_fwd_prog  tag b9cb69f121e4a274  gpl
            loaded_at 2019-08-07T08:30:17+0900  uid 0

    # bpftool net attach xdpdrv id 16 dev enp6s0np0
    # bpftool net
    xdp:
    enp6s0np0(4) driver id 16

    # bpftool net attach xdpdrv id 20 dev enp6s0np0 overwrite
    # bpftool net
    xdp:
    enp6s0np0(4) driver id 20

    # bpftool net detach xdpdrv dev enp6s0np0
    # bpftool net
    xdp:
While this patch only contains support for XDP, through `net
attach/detach`, bpftool can further support other prog attach types.

XDP attach/detach tested on Mellanox ConnectX-4 and Netronome Agilio.

---
Changes in v3:
  - added 'overwrite' option for replacing previously attached XDP prog
  - command argument order has been changed ('ATTACH_TYPE' comes first)
  - add 'dev' keyword in front of <devname>
  - added bash-completion and documentation

Changes in v2:
  - command 'load/unload' changed to 'attach/detach' for the consistency

Daniel T. Lee (4):
  tools: bpftool: add net attach command to attach XDP on interface
  tools: bpftool: add net detach command to detach XDP on interface
  tools: bpftool: add bash-completion for net attach/detach
  tools: bpftool: add documentation for net attach/detach

 .../bpf/bpftool/Documentation/bpftool-net.rst |  51 ++++-
 tools/bpf/bpftool/bash-completion/bpftool     |  64 ++++++-
 tools/bpf/bpftool/net.c                       | 181 ++++++++++++++++--
 3 files changed, 273 insertions(+), 23 deletions(-)

-- 
2.20.1

^ permalink raw reply

* [PATCH v2] bonding: Add vlan tx offload to hw_enc_features
From: YueHaibing @ 2019-08-07  2:19 UTC (permalink / raw)
  To: j.vosburgh, vfalico, andy, davem, jiri, jay.vosburgh
  Cc: linux-kernel, netdev, YueHaibing

As commit 30d8177e8ac7 ("bonding: Always enable vlan tx offload")
said, we should always enable bonding's vlan tx offload and pass the
vlan packets to the slave devices with the vlan tci, letting them handle
the vlan implementation.

Now if an encapsulation protocol like VXLAN is used, skb->encapsulation
may be set, and the packet is then passed to a vlan device which is based
on the bonding device. However, in netif_skb_features(), the check of
hw_enc_features:

	 if (skb->encapsulation)
                 features &= dev->hw_enc_features;

clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
in the same issue as in commit 30d8177e8ac7, like this:

vlan_dev_hard_start_xmit
  -->dev_queue_xmit
    -->validate_xmit_skb
      -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
      -->validate_xmit_vlan
        -->__vlan_hwaccel_push_inside //skb->tci is cleared
...
 --> bond_start_xmit
   --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
     --> __skb_flow_dissect // nhoff point to IP header
        -->  case htons(ETH_P_8021Q)
             // skb_vlan_tag_present is false, so
             vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
             //vlan point to ip header wrongly

Fixes: b2a103e6d0af ("bonding: convert to ndo_fix_features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
---
v2: fix a log typo, add Fixes tag
---
 drivers/net/bonding/bond_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 02fd782..931d9d9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1126,6 +1126,8 @@ static void bond_compute_features(struct bonding *bond)
 done:
 	bond_dev->vlan_features = vlan_features;
 	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
+				    NETIF_F_HW_VLAN_CTAG_TX |
+				    NETIF_F_HW_VLAN_STAG_TX |
 				    NETIF_F_GSO_UDP_L4;
 	bond_dev->mpls_features = mpls_features;
 	bond_dev->gso_max_segs = gso_max_segs;
-- 
2.7.4



^ permalink raw reply related

* Re: linux-next: Signed-off-by missing for commit in the net-next tree
From: Yifeng Sun @ 2019-08-07  2:02 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Linux Next Mailing List,
	Linux Kernel Mailing List
In-Reply-To: <20190807115436.5f02155c@canb.auug.org.au>

Hi Stephen,

Sure, thanks!
Yifeng

On Tue, Aug 6, 2019 at 6:54 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi Yifeng,
>
> On Tue, 6 Aug 2019 16:37:26 -0700 Yifeng Sun <pkusunyifeng@gmail.com> wrote:
> >
> > My apologies, thanks for the email. Please add the signed-off if you can.
>
> Dave does not rebase his trees, so that is not possible.  Just remember
> for next time, thanks :-)
>
> --
> Cheers,
> Stephen Rothwell

^ permalink raw reply

* Re: linux-next: Signed-off-by missing for commit in the net-next tree
From: Stephen Rothwell @ 2019-08-07  1:54 UTC (permalink / raw)
  To: Yifeng Sun
  Cc: David Miller, Networking, Linux Next Mailing List,
	Linux Kernel Mailing List
In-Reply-To: <CAEYOeXMV1DbTsy7U1-Fu0eztVGpw-+ZEJTK0Hzm8xbqCL7fabw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 308 bytes --]

Hi Yifeng,

On Tue, 6 Aug 2019 16:37:26 -0700 Yifeng Sun <pkusunyifeng@gmail.com> wrote:
>
> My apologies, thanks for the email. Please add the signed-off if you can.

Dave does not rebase his trees, so that is not possible.  Just remember
for next time, thanks :-)

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v3 00/39] put_user_pages(): miscellaneous call sites
From: John Hubbard @ 2019-08-07  1:49 UTC (permalink / raw)
  To: john.hubbard, Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel
In-Reply-To: <20190807013340.9706-1-jhubbard@nvidia.com>

On 8/6/19 6:32 PM, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> ...
> 
> John Hubbard (38):
>   mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
...
>  54 files changed, 191 insertions(+), 323 deletions(-)
> 
ahem, yes, apparently this is what happens if I add a few patches while editing
the cover letter... :) 

The subject line should read "00/41", and the list of files affected here is
therefore under-reported in this cover letter. However, the patch series itself is 
intact and ready for submission.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH] bonding: Add vlan tx offload to hw_enc_features
From: Yuehaibing @ 2019-08-07  1:46 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: vfalico, andy, davem, jiri, linux-kernel, netdev
In-Reply-To: <4281.1565098884@nyx>

On 2019/8/6 21:41, Jay Vosburgh wrote:
> YueHaibing <yuehaibing@huawei.com> wrote:
> 
>> As commit 30d8177e8ac7 ("bonding: Always enable vlan tx offload")
>> said, we should always enable bonding's vlan tx offload, pass the
>> vlan packets to the slave devices with vlan tci, let them to handle
>> vlan implementation.
>>
>> Now if encapsulation protocols like VXLAN is used, skb->encapsulation
>> may be set, then the packet is passed to vlan devicec which based on
> 
> 	Typo: "devicec"

oh, yes, thanks!

> 
>> bonding device. However in netif_skb_features(), the check of
>> hw_enc_features:
>>
>> 	 if (skb->encapsulation)
>>                 features &= dev->hw_enc_features;
>>
>> clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
>> in same issue in commit 30d8177e8ac7 like this:
>>
>> vlan_dev_hard_start_xmit
>>  -->dev_queue_xmit
>>    -->validate_xmit_skb
>>      -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
>>      -->validate_xmit_vlan
>>        -->__vlan_hwaccel_push_inside //skb->tci is cleared
>> ...
>> --> bond_start_xmit
>>   --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
>>     --> __skb_flow_dissect // nhoff point to IP header
>>        -->  case htons(ETH_P_8021Q)
>>             // skb_vlan_tag_present is false, so
>>             vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
>>             //vlan point to ip header wrongly
>>
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> 
> 	Looks good to me; should this be tagged with
> 
> Fixes: 278339a42a1b ("bonding: propogate vlan_features to bonding master")
> 
> 	as 30d8177e8ac7 was?  If not, is there an appropriate commit id?

It seems the commit was:

Fixes: b2a103e6d0af ("bonding: convert to ndo_fix_features")

> 
> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>


Thanks, will send v2.

> 
> 	-J
> 
>> ---
>> drivers/net/bonding/bond_main.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 02fd782..931d9d9 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1126,6 +1126,8 @@ static void bond_compute_features(struct bonding *bond)
>> done:
>> 	bond_dev->vlan_features = vlan_features;
>> 	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
>> +				    NETIF_F_HW_VLAN_CTAG_TX |
>> +				    NETIF_F_HW_VLAN_STAG_TX |
>> 				    NETIF_F_GSO_UDP_L4;
>> 	bond_dev->mpls_features = mpls_features;
>> 	bond_dev->gso_max_segs = gso_max_segs;
>> -- 
>> 2.7.4
> 
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
> 
> .
> 
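
The masking step in the quoted commit message can be reproduced in userspace. Below is a minimal sketch, not kernel code: the MOCK_F_* bits, struct mock_dev and struct mock_skb are hypothetical stand-ins for netdev_features_t, struct net_device and struct sk_buff, and only the hw_enc_features check from netif_skb_features() is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's NETIF_F_* feature bits. */
#define MOCK_F_GSO_ENCAP    (1u << 0)
#define MOCK_F_VLAN_CTAG_TX (1u << 1)
#define MOCK_F_VLAN_STAG_TX (1u << 2)

/* Hypothetical, heavily reduced versions of net_device and sk_buff. */
struct mock_dev {
	uint32_t features;
	uint32_t hw_enc_features;
};

struct mock_skb {
	int encapsulation;
};

/* Models only the hw_enc_features check quoted in the commit message. */
static uint32_t mock_skb_features(const struct mock_skb *skb,
				  const struct mock_dev *dev)
{
	uint32_t features = dev->features;

	if (skb->encapsulation)
		features &= dev->hw_enc_features;
	return features;
}

/* Nonzero if CTAG insertion is still left to the device after masking. */
static int mock_vlan_tx_offloaded(const struct mock_skb *skb,
				  const struct mock_dev *dev)
{
	return (mock_skb_features(skb, dev) & MOCK_F_VLAN_CTAG_TX) != 0;
}
```

With hw_enc_features holding only the GSO bit, an encapsulated skb loses the VLAN TX bits in this model; once both VLAN bits are OR-ed into hw_enc_features, as the patch does for the bond device, they survive the mask.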


^ permalink raw reply

* [PATCH v3 02/41] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: john.hubbard @ 2019-08-07  1:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, John Hubbard, David Airlie, Daniel Vetter
In-Reply-To: <20190807013340.9706-1-jhubbard@nvidia.com>

From: John Hubbard <jhubbard@nvidia.com>

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Also reverse the order of a comparison, in order to placate
checkpatch.pl.

Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/drm/via/via_dmablit.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
index 062067438f1d..b5b5bf0ba65e 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -171,7 +171,6 @@ via_map_blit_for_device(struct pci_dev *pdev,
 static void
 via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 {
-	struct page *page;
 	int i;
 
 	switch (vsg->state) {
@@ -186,13 +185,8 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 		kfree(vsg->desc_pages);
 		/* fall through */
 	case dr_via_pages_locked:
-		for (i = 0; i < vsg->num_pages; ++i) {
-			if (NULL != (page = vsg->pages[i])) {
-				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
-					SetPageDirty(page);
-				put_page(page);
-			}
-		}
+		put_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
+					  (vsg->direction == DMA_FROM_DEVICE));
 		/* fall through */
 	case dr_via_pages_alloc:
 		vfree(vsg->pages);
-- 
2.22.0


^ permalink raw reply related

* [PATCH v3 04/41] net/rds: convert put_page() to put_user_page*()
From: john.hubbard @ 2019-08-07  1:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, John Hubbard, Santosh Shilimkar, David S . Miller
In-Reply-To: <20190807013340.9706-1-jhubbard@nvidia.com>

From: John Hubbard <jhubbard@nvidia.com>

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: rds-devel@oss.oracle.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
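The partial-pin cleanup in the rds_pin_pages() hunk of this patch can be modelled in userspace. Below is a minimal sketch under stated assumptions: struct mock_page and the mock_* helpers are hypothetical stand-ins for struct page, put_user_page() and put_user_pages(), and 'ret' plays the role of the pin call's return value. It is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-in for struct page: only a pin count. */
struct mock_page {
	int pins;
};

/* Stand-in for put_user_page(): drop one pin. */
static void mock_put_user_page(struct mock_page *page)
{
	assert(page->pins > 0);
	page->pins--;
}

/* Stand-in for put_user_pages(): drop one pin on the first npages entries. */
static void mock_put_user_pages(struct mock_page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		mock_put_user_page(pages[i]);
}

/*
 * Models the tail of rds_pin_pages(): a short pin (0 <= ret < nr_pages)
 * is undone for exactly the 'ret' pages that were pinned, then reported
 * as -EFAULT. Errors and complete pins pass through unchanged.
 */
static int mock_finish_pin(struct mock_page **pages, int ret, int nr_pages)
{
	if (ret >= 0 && ret < nr_pages) {
		mock_put_user_pages(pages, (unsigned long)ret);
		ret = -EFAULT;
	}
	return ret;
}
```

The single call replaces the open-coded "while (ret--)" loop; in both forms only the pages that were actually pinned are released.
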
 net/rds/info.c    |  5 ++---
 net/rds/message.c |  2 +-
 net/rds/rdma.c    | 15 +++++++--------
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/rds/info.c b/net/rds/info.c
index 03f6fd56d237..ca6af2889adf 100644
--- a/net/rds/info.c
+++ b/net/rds/info.c
@@ -162,7 +162,6 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 	struct rds_info_lengths lens;
 	unsigned long nr_pages = 0;
 	unsigned long start;
-	unsigned long i;
 	rds_info_func func;
 	struct page **pages = NULL;
 	int ret;
@@ -235,8 +234,8 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 		ret = -EFAULT;
 
 out:
-	for (i = 0; pages && i < nr_pages; i++)
-		put_page(pages[i]);
+	if (pages)
+		put_user_pages(pages, nr_pages);
 	kfree(pages);
 
 	return ret;
diff --git a/net/rds/message.c b/net/rds/message.c
index 50f13f1d4ae0..d7b0d266c437 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -404,7 +404,7 @@ static int rds_message_zcopy_from_user(struct rds_message *rm, struct iov_iter *
 			int i;
 
 			for (i = 0; i < rm->data.op_nents; i++)
-				put_page(sg_page(&rm->data.op_sg[i]));
+				put_user_page(sg_page(&rm->data.op_sg[i]));
 			mmp = &rm->data.op_mmp_znotifier->z_mmp;
 			mm_unaccount_pinned_pages(mmp);
 			ret = -EFAULT;
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 916f5ec373d8..6762e8696b99 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -162,8 +162,7 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
 				  pages);
 
 	if (ret >= 0 && ret < nr_pages) {
-		while (ret--)
-			put_page(pages[ret]);
+		put_user_pages(pages, ret);
 		ret = -EFAULT;
 	}
 
@@ -276,7 +275,7 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args,
 
 	if (IS_ERR(trans_private)) {
 		for (i = 0 ; i < nents; i++)
-			put_page(sg_page(&sg[i]));
+			put_user_page(sg_page(&sg[i]));
 		kfree(sg);
 		ret = PTR_ERR(trans_private);
 		goto out;
@@ -464,9 +463,10 @@ void rds_rdma_free_op(struct rm_rdma_op *ro)
 		 * to local memory */
 		if (!ro->op_write) {
 			WARN_ON(!page->mapping && irqs_disabled());
-			set_page_dirty(page);
+			put_user_pages_dirty_lock(&page, 1, true);
+		} else {
+			put_user_page(page);
 		}
-		put_page(page);
 	}
 
 	kfree(ro->op_notifier);
@@ -481,8 +481,7 @@ void rds_atomic_free_op(struct rm_atomic_op *ao)
 	/* Mark page dirty if it was possibly modified, which
 	 * is the case for a RDMA_READ which copies from remote
 	 * to local memory */
-	set_page_dirty(page);
-	put_page(page);
+	put_user_pages_dirty_lock(&page, 1, true);
 
 	kfree(ao->op_notifier);
 	ao->op_notifier = NULL;
@@ -867,7 +866,7 @@ int rds_cmsg_atomic(struct rds_sock *rs, struct rds_message *rm,
 	return ret;
 err:
 	if (page)
-		put_page(page);
+		put_user_page(page);
 	rm->atomic.op_active = 0;
 	kfree(rm->atomic.op_notifier);
 
-- 
2.22.0


^ permalink raw reply related

* [PATCH v3 05/41] net/ceph: convert put_page() to put_user_page*()
From: john.hubbard @ 2019-08-07  1:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, John Hubbard, Jeff Layton, Ilya Dryomov, Sage Weil,
	David S . Miller
In-Reply-To: <20190807013340.9706-1-jhubbard@nvidia.com>

From: John Hubbard <jhubbard@nvidia.com>

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Acked-by: Jeff Layton <jlayton@kernel.org>

Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Sage Weil <sage@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
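For reviewers comparing the two forms, here is a minimal userspace sketch of the conversion in ceph_put_page_vector(). Everything in it is a hypothetical model: struct mock_page stands in for struct page, and mock_put_user_pages_dirty_lock() encodes only the behaviour assumed from the helper's name and its make_dirty argument (optionally mark each page dirty, then drop its pin), not the real mm/gup implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page. */
struct mock_page {
	int pins;
	bool dirty;
};

/* Stand-in for put_user_page(): drop one pin. */
static void mock_put_user_page(struct mock_page *page)
{
	assert(page->pins > 0);
	page->pins--;
}

/* Assumed semantics: dirty each page if asked, then release it. */
static void mock_put_user_pages_dirty_lock(struct mock_page **pages,
					   unsigned long npages,
					   bool make_dirty)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		if (make_dirty)
			pages[i]->dirty = true;
		mock_put_user_page(pages[i]);
	}
}

/* The open-coded loop removed by this patch, replayed on mock pages. */
static void mock_open_coded_release(struct mock_page **pages, int num_pages,
				    bool dirty)
{
	int i;

	for (i = 0; i < num_pages; i++) {
		if (dirty)
			pages[i]->dirty = true;	/* was set_page_dirty_lock() */
		pages[i]->pins--;		/* was put_page() */
	}
}
```

Under these assumptions both functions leave every page with the same pin count and dirty state, which is why the seven-line loop can collapse into one call.
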
 net/ceph/pagevec.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/ceph/pagevec.c b/net/ceph/pagevec.c
index 64305e7056a1..c88fff2ab9bd 100644
--- a/net/ceph/pagevec.c
+++ b/net/ceph/pagevec.c
@@ -12,13 +12,7 @@
 
 void ceph_put_page_vector(struct page **pages, int num_pages, bool dirty)
 {
-	int i;
-
-	for (i = 0; i < num_pages; i++) {
-		if (dirty)
-			set_page_dirty_lock(pages[i]);
-		put_page(pages[i]);
-	}
+	put_user_pages_dirty_lock(pages, num_pages, dirty);
 	kvfree(pages);
 }
 EXPORT_SYMBOL(ceph_put_page_vector);
-- 
2.22.0


^ permalink raw reply related

* [PATCH v3 07/41] drm/etnaviv: convert release_pages() to put_user_pages()
From: john.hubbard @ 2019-08-07  1:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, John Hubbard, Joerg Roedel, Paolo Bonzini,
	Radim Krčmář, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin
In-Reply-To: <20190807013340.9706-1-jhubbard@nvidia.com>

From: John Hubbard <jhubbard@nvidia.com>

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index e8778ebb72e6..a0144a5ee325 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -686,7 +686,7 @@ static int etnaviv_gem_userptr_get_pages(struct etnaviv_gem_object *etnaviv_obj)
 		ret = get_user_pages_fast(ptr, num_pages,
 					  !userptr->ro ? FOLL_WRITE : 0, pages);
 		if (ret < 0) {
-			release_pages(pvec, pinned);
+			put_user_pages(pvec, pinned);
 			kvfree(pvec);
 			return ret;
 		}
@@ -710,7 +710,7 @@ static void etnaviv_gem_userptr_release(struct etnaviv_gem_object *etnaviv_obj)
 	if (etnaviv_obj->pages) {
 		int npages = etnaviv_obj->base.size >> PAGE_SHIFT;
 
-		release_pages(etnaviv_obj->pages, npages);
+		put_user_pages(etnaviv_obj->pages, npages);
 		kvfree(etnaviv_obj->pages);
 	}
 }
-- 
2.22.0


^ permalink raw reply related

* [PATCH v3 08/41] drm/i915: convert put_page() to put_user_page*()
From: john.hubbard @ 2019-08-07  1:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, John Hubbard, Rodrigo Vivi, Jani Nikula,
	Joonas Lahtinen, David Airlie
In-Reply-To: <20190807013340.9706-1-jhubbard@nvidia.com>

From: John Hubbard <jhubbard@nvidia.com>

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

This is a merge-able version of the fix, because it restricts
itself to put_user_page() and put_user_pages(), both of which
have not changed their APIs. Later, i915_gem_userptr_put_pages()
can be simplified to use put_user_pages_dirty_lock().

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 2caa594322bc..76dda2923cf1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -527,7 +527,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 	}
 	mutex_unlock(&obj->mm.lock);
 
-	release_pages(pvec, pinned);
+	put_user_pages(pvec, pinned);
 	kvfree(pvec);
 
 	i915_gem_object_put(obj);
@@ -640,7 +640,7 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
 		__i915_gem_userptr_set_active(obj, true);
 
 	if (IS_ERR(pages))
-		release_pages(pvec, pinned);
+		put_user_pages(pvec, pinned);
 	kvfree(pvec);
 
 	return PTR_ERR_OR_ZERO(pages);
@@ -675,7 +675,7 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
 			set_page_dirty_lock(page);
 
 		mark_page_accessed(page);
-		put_page(page);
+		put_user_page(page);
 	}
 	obj->mm.dirty = false;
 
-- 
2.22.0


^ permalink raw reply related
