Netdev List
 help / color / mirror / Atom feed
* general protection fault in dev_map_hash_update_elem
From: syzbot @ 2019-09-05 20:08 UTC (permalink / raw)
  To: ast, bpf, daniel, davem, hawk, jakub.kicinski, john.fastabend,
	kafai, linux-kernel, netdev, songliubraving, syzkaller-bugs, yhs

Hello,

syzbot found the following crash on:

HEAD commit:    6d028043 Add linux-next specific files for 20190830
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=135c1a92600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=82a6bec43ab0cb69
dashboard link: https://syzkaller.appspot.com/bug?extid=4e7a85b1432052e8d6f8
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=109124e1600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 10235 Comm: syz-executor.0 Not tainted 5.3.0-rc6-next-20190830  
#75
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__write_once_size include/linux/compiler.h:203 [inline]
RIP: 0010:__hlist_del include/linux/list.h:795 [inline]
RIP: 0010:hlist_del_rcu include/linux/rculist.h:475 [inline]
RIP: 0010:__dev_map_hash_update_elem kernel/bpf/devmap.c:668 [inline]
RIP: 0010:dev_map_hash_update_elem+0x3c8/0x6e0 kernel/bpf/devmap.c:691
Code: 48 89 f1 48 89 75 c8 48 c1 e9 03 80 3c 11 00 0f 85 d3 02 00 00 48 b9  
00 00 00 00 00 fc ff df 48 8b 53 10 48 89 d6 48 c1 ee 03 <80> 3c 0e 00 0f  
85 97 02 00 00 48 85 c0 48 89 02 74 38 48 89 55 b8
RSP: 0018:ffff88808d607c30 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8880a7f14580 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a7f14588
RBP: ffff88808d607c78 R08: 0000000000000004 R09: ffffed1011ac0f73
R10: ffffed1011ac0f72 R11: 0000000000000003 R12: ffff88809f4e9400
R13: ffff88809b06ba00 R14: 0000000000000000 R15: ffff88809f4e9528
FS:  00007f3a3d50c700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb3fcd0000 CR3: 00000000986b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  map_update_elem+0xc82/0x10b0 kernel/bpf/syscall.c:966
  __do_sys_bpf+0x8b5/0x3350 kernel/bpf/syscall.c:2854
  __se_sys_bpf kernel/bpf/syscall.c:2825 [inline]
  __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:2825
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459879
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3a3d50bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000459879
RDX: 0000000000000020 RSI: 0000000020000040 RDI: 0000000000000002
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a3d50c6d4
R13: 00000000004bfc86 R14: 00000000004d1960 R15: 00000000ffffffff
Modules linked in:
---[ end trace 083223e21dbd0ae5 ]---
RIP: 0010:__write_once_size include/linux/compiler.h:203 [inline]
RIP: 0010:__hlist_del include/linux/list.h:795 [inline]
RIP: 0010:hlist_del_rcu include/linux/rculist.h:475 [inline]
RIP: 0010:__dev_map_hash_update_elem kernel/bpf/devmap.c:668 [inline]
RIP: 0010:dev_map_hash_update_elem+0x3c8/0x6e0 kernel/bpf/devmap.c:691
Code: 48 89 f1 48 89 75 c8 48 c1 e9 03 80 3c 11 00 0f 85 d3 02 00 00 48 b9  
00 00 00 00 00 fc ff df 48 8b 53 10 48 89 d6 48 c1 ee 03 <80> 3c 0e 00 0f  
85 97 02 00 00 48 85 c0 48 89 02 74 38 48 89 55 b8
RSP: 0018:ffff88808d607c30 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8880a7f14580 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a7f14588
RBP: ffff88808d607c78 R08: 0000000000000004 R09: ffffed1011ac0f73
R10: ffffed1011ac0f72 R11: 0000000000000003 R12: ffff88809f4e9400
R13: ffff88809b06ba00 R14: 0000000000000000 R15: ffff88809f4e9528
FS:  00007f3a3d50c700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb3fcd0000 CR3: 00000000986b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5: DR, Fix error return code in dr_domain_init_resources()
From: Saeed Mahameed @ 2019-09-05 19:37 UTC (permalink / raw)
  To: Erez Shitrit, weiyongjun1@huawei.com, Mark Bloch, leon@kernel.org,
	Alex Vesker
  Cc: kernel-janitors@vger.kernel.org, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
In-Reply-To: <20190905095600.127371-1-weiyongjun1@huawei.com>

On Thu, 2019-09-05 at 09:56 +0000, Wei Yongjun wrote:
> Fix to return negative error code -ENOMEM from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Fixes: 4ec9e7b02697 ("net/mlx5: DR, Expose steering domain
> functionality")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> 

Applied to net-next-mlx5.
Thanks !


^ permalink raw reply

* Re: [PATCH net-next] net/mlx5: DR, Remove useless set memory to zero use memset()
From: Saeed Mahameed @ 2019-09-05 19:37 UTC (permalink / raw)
  To: Erez Shitrit, weiyongjun1@huawei.com, Mark Bloch, leon@kernel.org,
	Alex Vesker
  Cc: kernel-janitors@vger.kernel.org, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
In-Reply-To: <20190905095326.127277-1-weiyongjun1@huawei.com>

On Thu, 2019-09-05 at 09:53 +0000, Wei Yongjun wrote:
> The memory return by kzalloc() has already be set to zero, so
> remove useless memset(0).
> 
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Applied to net-next-mlx5.
Thanks !

^ permalink raw reply

* Zdravstvujte! Vas interesujut klientskie bazy dannyh?
From: netdev @ 2019-09-05 17:56 UTC (permalink / raw)
  To: netdev

Zdravstvujte! Vas interesujut klientskie bazy dannyh?

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Andrii Nakryiko @ 2019-09-05 19:26 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Stephen Rothwell, David Miller, Networking,
	Linux Next Mailing List, Linux Kernel Mailing List,
	Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov
In-Reply-To: <CAK7LNAQEU6uu-Z=VeR2KNa8ezCLA7VHtpvM2tvAKsWtUTi6Eug@mail.gmail.com>

On Tue, Sep 3, 2019 at 11:20 PM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> On Wed, Sep 4, 2019 at 3:00 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > Hi all,
> >
> > After merging the net-next tree, today's linux-next build (arm
> > multi_v7_defconfig) failed like this:
> >
> > scripts/link-vmlinux.sh: 74: Bad substitution
> >
> > Caused by commit
> >
> >   341dfcf8d78e ("btf: expose BTF info through sysfs")
> >
> > interacting with commit
> >
> >   1267f9d3047d ("kbuild: add $(BASH) to run scripts with bash-extension")
> >
> > from the kbuild tree.
>
>
> I knew that they were using bash-extension
> in the #!/bin/sh script.  :-D
>
> In fact, I wrote my patch in order to break their code
> and  make btf people realize that they were doing wrong.

Was there a specific reason to wait until this would break during
Stephen's merge, instead of giving me a heads up (or just replying on
original patch) and letting me fix it and save everyone's time and
efforts?

Either way, I've fixed the issue in
https://patchwork.ozlabs.org/patch/1158620/ and will pay way more
attention to BASH-specific features going forward (I found it pretty
hard to verify stuff like this, unfortunately). But again, code review
process is the best place to catch this and I really hope in the
future we can keep this process productive. Thanks!

>
>
>
> > The change in the net-next tree turned link-vmlinux.sh into a bash script
> > (I think).
> >
> > I have applied the following patch for today:
>
>
> But, this is a temporary fix only for linux-next.
>
> scripts/link-vmlinux.sh does not need to use the
> bash-extension ${@:2} in the first place.
>
> I hope btf people will write the correct code.

I replaced ${@:2} with shift and ${@}, I hope that's a correct fix,
but if you think it's not, please reply on the patch and let me know.


>
> Thanks.
>
>
>
>
> > From: Stephen Rothwell <sfr@canb.auug.org.au>
> > Date: Wed, 4 Sep 2019 15:43:41 +1000
> > Subject: [PATCH] link-vmlinux.sh is now a bash script
> >
> > Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> > ---
> >  Makefile                | 4 ++--
> >  scripts/link-vmlinux.sh | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index ac97fb282d99..523d12c5cebe 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1087,7 +1087,7 @@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
> >
> >  # Final link of vmlinux with optional arch pass after final link
> >  cmd_link-vmlinux =                                                 \
> > -       $(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
> > +       $(BASH) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
> >         $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
> >
> >  vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
> > @@ -1403,7 +1403,7 @@ clean: rm-files := $(CLEAN_FILES)
> >  PHONY += archclean vmlinuxclean
> >
> >  vmlinuxclean:
> > -       $(Q)$(CONFIG_SHELL) $(srctree)/scripts/link-vmlinux.sh clean
> > +       $(Q)$(BASH) $(srctree)/scripts/link-vmlinux.sh clean
> >         $(Q)$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) clean)
> >
> >  clean: archclean vmlinuxclean
> > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > index f7edb75f9806..ea1f8673869d 100755
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
> > @@ -1,4 +1,4 @@
> > -#!/bin/sh
> > +#!/bin/bash
> >  # SPDX-License-Identifier: GPL-2.0
> >  #
> >  # link vmlinux
> > --
> > 2.23.0.rc1
> >
> > --
> > Cheers,
> > Stephen Rothwell
>
>
>
> --
> Best Regards
> Masahiro Yamada

^ permalink raw reply

* Re: [PATCH v2 00/10] Add definition for the number of standard PCI BARs
From: Denis Efremov @ 2019-09-05 19:02 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Bjorn Helgaas, linux-kernel, linux-pci, Sebastian Ott,
	Gerald Schaefer, H. Peter Anvin, Giuseppe Cavallaro,
	Alexandre Torgue, Matt Porter, Alexandre Bounine, Peter Jones,
	Bartlomiej Zolnierkiewicz, Cornelia Huck, Alex Williamson,
	Jose Abreu, kvm, linux-fbdev, netdev, x86, linux-s390
In-Reply-To: <20190816105128.GD14111@e119886-lin.cambridge.arm.com>



On 16.08.2019 13:51, Andrew Murray wrote:
> On Fri, Aug 16, 2019 at 12:24:27PM +0300, Denis Efremov wrote:
>> Code that iterates over all standard PCI BARs typically uses
>> PCI_STD_RESOURCE_END, but this is error-prone because it requires
>> "i <= PCI_STD_RESOURCE_END" rather than something like
>> "i < PCI_STD_NUM_BARS". We could add such a definition and use it the same
>> way PCI_SRIOV_NUM_BARS is used. There is already the definition
>> PCI_BAR_COUNT for s390 only. Thus, this patchset introduces it globally.
>>
>> Changes in v2:
>>   - Reverse checks in pci_iomap_range,pci_iomap_wc_range.
>>   - Refactor loops in vfio_pci to keep PCI_STD_RESOURCES.
>>   - Add 2 new patches to replace the magic constant with new define.
>>   - Split net patch in v1 to separate stmmac and dwc-xlgmac patches.
>>
>> Denis Efremov (10):
>>   PCI: Add define for the number of standard PCI BARs
>>   s390/pci: Loop using PCI_STD_NUM_BARS
>>   x86/PCI: Loop using PCI_STD_NUM_BARS
>>   stmmac: pci: Loop using PCI_STD_NUM_BARS
>>   net: dwc-xlgmac: Loop using PCI_STD_NUM_BARS
>>   rapidio/tsi721: Loop using PCI_STD_NUM_BARS
>>   efifb: Loop using PCI_STD_NUM_BARS
>>   vfio_pci: Loop using PCI_STD_NUM_BARS
>>   PCI: hv: Use PCI_STD_NUM_BARS
>>   PCI: Use PCI_STD_NUM_BARS
>>
>>  arch/s390/include/asm/pci.h                      |  5 +----
>>  arch/s390/include/asm/pci_clp.h                  |  6 +++---
>>  arch/s390/pci/pci.c                              | 16 ++++++++--------
>>  arch/s390/pci/pci_clp.c                          |  6 +++---
>>  arch/x86/pci/common.c                            |  2 +-
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c |  4 ++--
>>  drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c   |  2 +-
>>  drivers/pci/controller/pci-hyperv.c              | 10 +++++-----
>>  drivers/pci/pci.c                                | 11 ++++++-----
>>  drivers/pci/quirks.c                             |  4 ++--
>>  drivers/rapidio/devices/tsi721.c                 |  2 +-
>>  drivers/vfio/pci/vfio_pci.c                      | 11 +++++++----
>>  drivers/vfio/pci/vfio_pci_config.c               | 10 ++++++----
>>  drivers/vfio/pci/vfio_pci_private.h              |  4 ++--
>>  drivers/video/fbdev/efifb.c                      |  2 +-
>>  include/linux/pci.h                              |  2 +-
>>  include/uapi/linux/pci_regs.h                    |  1 +
>>  17 files changed, 51 insertions(+), 47 deletions(-)
> 
> I've come across a few more places where this change can be made. There
> may be multiple instances in the same file, but only the first is shown
> below:
> 
> drivers/misc/pci_endpoint_test.c:       for (bar = BAR_0; bar <= BAR_5; bar++) {
> drivers/net/ethernet/intel/e1000/e1000_main.c:          for (i = BAR_1; i <= BAR_5; i++) {
> drivers/net/ethernet/intel/ixgb/ixgb_main.c:    for (i = BAR_1; i <= BAR_5; i++) {
> drivers/pci/controller/dwc/pci-dra7xx.c:        for (bar = BAR_0; bar <= BAR_5; bar++)
> drivers/pci/controller/dwc/pci-layerscape-ep.c: for (bar = BAR_0; bar <= BAR_5; bar++)
> drivers/pci/controller/dwc/pcie-artpec6.c:      for (bar = BAR_0; bar <= BAR_5; bar++)
> drivers/pci/controller/dwc/pcie-designware-plat.c:      for (bar = BAR_0; bar <= BAR_5; bar++)
> drivers/pci/endpoint/functions/pci-epf-test.c:  for (bar = BAR_0; bar <= BAR_5; bar++) {
> include/linux/pci-epc.h:        u64     bar_fixed_size[BAR_5 + 1];
> drivers/scsi/pm8001/pm8001_hwi.c:       for (bar = 0; bar < 6; bar++) {
> drivers/scsi/pm8001/pm8001_init.c:      for (bar = 0; bar < 6; bar++) {
> drivers/ata/sata_nv.c:  for (bar = 0; bar < 6; bar++)
> drivers/video/fbdev/core/fbmem.c:       for (idx = 0, bar = 0; bar < PCI_ROM_RESOURCE; bar++) {
> drivers/staging/gasket/gasket_core.c:   for (i = 0; i < GASKET_NUM_BARS; i++) {
> drivers/tty/serial/8250/8250_pci.c:     for (i = 0; i < PCI_NUM_BAR_RESOURCES; i++) { <-----------
> 
> It looks like BARs are often iterated with PCI_NUM_BAR_RESOURCES, there
> are a load of these too found with:
> 
> git grep PCI_ROM_RESOURCE | grep "< "
> 
> I'm happy to share patches if preferred.
> 

I'm not sure about lib/devres.c
265:#define PCIM_IOMAP_MAX      PCI_ROM_RESOURCE
268:    void __iomem *table[PCIM_IOMAP_MAX];
277:    for (i = 0; i < PCIM_IOMAP_MAX; i++)
324:    BUG_ON(bar >= PCIM_IOMAP_MAX);
352:    for (i = 0; i < PCIM_IOMAP_MAX; i++)
455:    for (i = 0; i < PCIM_IOMAP_MAX; i++) {

Is it worth changing?
#define PCIM_IOMAP_MAX  PCI_STD_NUM_BARS

Thanks,
Denis

^ permalink raw reply

* Re: [PATCH v2] net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)
From: Eric Dumazet @ 2019-09-05 18:49 UTC (permalink / raw)
  To: Maciej Żenczykowski, Maciej Żenczykowski,
	David S . Miller
  Cc: netdev, David Ahern, Lorenzo Colitti
In-Reply-To: <20190902162336.240405-1-zenczykowski@gmail.com>



On 9/2/19 6:23 PM, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
> 
> There is a subtle change in behaviour introduced by:
>   commit c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43
>   'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create'
> 
> Before that patch /proc/net/ipv6_route includes:
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
> 
> Afterwards /proc/net/ipv6_route includes:
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo
> 
> ie. the above commit causes the ::1/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000).
> 
> AFAICT, this is incorrect since these routes are *not* coming from RA's.
> 
> As such, this patch restores the old behaviour.
> 
> Fixes: c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> ---
>  net/ipv6/route.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 558c6c68855f..516b2e568dae 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4365,13 +4365,14 @@ struct fib6_info *addrconf_f6i_alloc(struct net *net,
>  	struct fib6_config cfg = {
>  		.fc_table = l3mdev_fib_table(idev->dev) ? : RT6_TABLE_LOCAL,
>  		.fc_ifindex = idev->dev->ifindex,
> -		.fc_flags = RTF_UP | RTF_ADDRCONF | RTF_NONEXTHOP,
> +		.fc_flags = RTF_UP | RTF_NONEXTHOP,
>  		.fc_dst = *addr,
>  		.fc_dst_len = 128,
>  		.fc_protocol = RTPROT_KERNEL,
>  		.fc_nlinfo.nl_net = net,
>  		.fc_ignore_dev_down = true,
>  	};
> +	struct fib6_info *f6i;
>  
>  	if (anycast) {
>  		cfg.fc_type = RTN_ANYCAST;
> @@ -4381,7 +4382,10 @@ struct fib6_info *addrconf_f6i_alloc(struct net *net,
>  		cfg.fc_flags |= RTF_LOCAL;
>  	}
>  
> -	return ip6_route_info_create(&cfg, gfp_flags, NULL);
> +	f6i = ip6_route_info_create(&cfg, gfp_flags, NULL);
> +	if (f6i)
> +		f6i->dst_nocount = true;

Shouldn't it use 

	if (!IS_ERR(f6i))
		f6i->dst_nocount = true;

???


> +	return f6i;
>  }
>  
>  /* remove deleted ip from prefsrc entries */
> 

^ permalink raw reply

* [PATCH net] net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list
From: Shmulik Ladkani @ 2019-09-05 18:36 UTC (permalink / raw)
  To: Daniel Borkmann, Eric Dumazet, Willem de Bruijn
  Cc: eyal, netdev, Shmulik Ladkani, Alexander Duyck

Historically, support for frag_list packets entering skb_segment() was
limited to frag_list members terminating on exact same gso_size
boundaries. This is verified with a BUG_ON since commit 89319d3801d1
("net: Add frag_list support to skb_segment"), quote:

    As such we require all frag_list members terminate on exact MSS
    boundaries.  This is checked using BUG_ON.
    As there should only be one producer in the kernel of such packets,
    namely GRO, this requirement should not be difficult to maintain.

However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"),
the "exact MSS boundaries" assumption no longer holds:
An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
leaves the frag_list members as originally merged by GRO with the
original 'gso_size'. Example of such programs are bpf-based NAT46 or
NAT64.

This lead to a kernel BUG_ON for flows involving:
 - GRO generating a frag_list skb
 - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
 - skb_segment() of the skb

See example BUG_ON reports in [0].

In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"),
skb_segment() was modified to support the "gso_size mangling" case of
a frag_list GRO'ed skb, but *only* for frag_list members having
head_frag==true (having a page-fragment head).

Alas, GRO packets having frag_list members with a linear kmalloced head
(head_frag==false) still hit the BUG_ON.

This commit adds support to skb_segment() for a 'head_skb' packet having
a frag_list whose members are *non* head_frag, with gso_size mangled, by
disabling SG and thus falling-back to copying the data from the given
'head_skb' into the generated segmented skbs - as suggested by Willem de
Bruijn [1].

Since this approach involves the penalty of skb_copy_and_csum_bits()
when building the segments, care was taken in order to enable this
solution only when required:
 - untrusted gso_size, by testing SKB_GSO_DODGY is set
   (SKB_GSO_DODGY is set by any gso_size mangling functions in
    net/core/filter.c)
 - the frag_list is non empty, its item is a non head_frag, *and* the
   headlen of the given 'head_skb' does not match the gso_size.

[0]
https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/

[1]
https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/

Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 net/core/skbuff.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ea8e8d332d85..c4bd1881acff 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3678,6 +3678,24 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 	sg = !!(features & NETIF_F_SG);
 	csum = !!can_checksum_protocol(features, proto);
 
+	if (mss != GSO_BY_FRAGS &&
+	    (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY)) {
+		/* gso_size is untrusted.
+		 *
+		 * If head_skb has a frag_list with a linear non head_frag
+		 * item, and head_skb's headlen does not fit requested
+		 * gso_size, fall back to copying the skbs - by disabling sg.
+		 *
+		 * We assume checking the first frag suffices, i.e if either of
+		 * the frags have non head_frag data, then the first frag is
+		 * too.
+		 */
+		if (list_skb && skb_headlen(list_skb) && !list_skb->head_frag &&
+		    (mss != skb_headlen(head_skb) - doffset)) {
+			sg = false;
+		}
+	}
+
 	if (sg && csum && (mss != GSO_BY_FRAGS))  {
 		if (!(features & NETIF_F_GSO_PARTIAL)) {
 			struct sk_buff *iter;
-- 
2.19.1


^ permalink raw reply related

* Re: rtnl_lock() question
From: Rustad, Mark D @ 2019-09-05 18:07 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: jonathan.lemon@gmail.com, eric.dumazet@gmail.com,
	netdev@vger.kernel.org
In-Reply-To: <867cf373f204715aec3b2e04ef9f65454cf25a2e.camel@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

On Sep 4, 2019, at 4:23 PM, Saeed Mahameed <saeedm@mellanox.com> wrote:

> some allocations require parameters that should remain valid and
> constant across the whole reconfiguration procedure such
> params.num_channels, so they must be done inside the lock.

You could always check if those parameters have changed once under the lock  
and, if they did, drop the lock, reallocate and try again. Since such  
changes should be very infrequent, this is something that really should not  
loop multiple times.

--
Mark Rustad, Networking Division, Intel Corporation

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply

* [patch net-next] net: fib_notifier: move fib_notifier_ops from struct net into per-net struct
From: Jiri Pirko @ 2019-09-05 18:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, dsahern, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

No need for fib_notifier_ops to be in struct net. It is used only by
fib_notifier as a private data. Use net_generic to introduce per-net
fib_notifier struct and move fib_notifier_ops there.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/net_namespace.h |  3 ---
 net/core/fib_notifier.c     | 29 +++++++++++++++++++++++------
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index ab40d7afdc54..64bcb589a610 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -103,9 +103,6 @@ struct net {
 	/* core fib_rules */
 	struct list_head	rules_ops;
 
-	struct list_head	fib_notifier_ops;  /* Populated by
-						    * register_pernet_subsys()
-						    */
 	struct net_device       *loopback_dev;          /* The loopback */
 	struct netns_core	core;
 	struct netns_mib	mib;
diff --git a/net/core/fib_notifier.c b/net/core/fib_notifier.c
index 13a40b831d6d..470a606d5e8d 100644
--- a/net/core/fib_notifier.c
+++ b/net/core/fib_notifier.c
@@ -5,8 +5,15 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <net/net_namespace.h>
+#include <net/netns/generic.h>
 #include <net/fib_notifier.h>
 
+static unsigned int fib_notifier_net_id;
+
+struct fib_notifier_net {
+	struct list_head fib_notifier_ops;
+};
+
 static ATOMIC_NOTIFIER_HEAD(fib_chain);
 
 int call_fib_notifier(struct notifier_block *nb, struct net *net,
@@ -34,6 +41,7 @@ EXPORT_SYMBOL(call_fib_notifiers);
 
 static unsigned int fib_seq_sum(void)
 {
+	struct fib_notifier_net *fn_net;
 	struct fib_notifier_ops *ops;
 	unsigned int fib_seq = 0;
 	struct net *net;
@@ -41,8 +49,9 @@ static unsigned int fib_seq_sum(void)
 	rtnl_lock();
 	down_read(&net_rwsem);
 	for_each_net(net) {
+		fn_net = net_generic(net, fib_notifier_net_id);
 		rcu_read_lock();
-		list_for_each_entry_rcu(ops, &net->fib_notifier_ops, list) {
+		list_for_each_entry_rcu(ops, &fn_net->fib_notifier_ops, list) {
 			if (!try_module_get(ops->owner))
 				continue;
 			fib_seq += ops->fib_seq_read(net);
@@ -58,9 +67,10 @@ static unsigned int fib_seq_sum(void)
 
 static int fib_net_dump(struct net *net, struct notifier_block *nb)
 {
+	struct fib_notifier_net *fn_net = net_generic(net, fib_notifier_net_id);
 	struct fib_notifier_ops *ops;
 
-	list_for_each_entry_rcu(ops, &net->fib_notifier_ops, list) {
+	list_for_each_entry_rcu(ops, &fn_net->fib_notifier_ops, list) {
 		int err;
 
 		if (!try_module_get(ops->owner))
@@ -127,12 +137,13 @@ EXPORT_SYMBOL(unregister_fib_notifier);
 static int __fib_notifier_ops_register(struct fib_notifier_ops *ops,
 				       struct net *net)
 {
+	struct fib_notifier_net *fn_net = net_generic(net, fib_notifier_net_id);
 	struct fib_notifier_ops *o;
 
-	list_for_each_entry(o, &net->fib_notifier_ops, list)
+	list_for_each_entry(o, &fn_net->fib_notifier_ops, list)
 		if (ops->family == o->family)
 			return -EEXIST;
-	list_add_tail_rcu(&ops->list, &net->fib_notifier_ops);
+	list_add_tail_rcu(&ops->list, &fn_net->fib_notifier_ops);
 	return 0;
 }
 
@@ -167,18 +178,24 @@ EXPORT_SYMBOL(fib_notifier_ops_unregister);
 
 static int __net_init fib_notifier_net_init(struct net *net)
 {
-	INIT_LIST_HEAD(&net->fib_notifier_ops);
+	struct fib_notifier_net *fn_net = net_generic(net, fib_notifier_net_id);
+
+	INIT_LIST_HEAD(&fn_net->fib_notifier_ops);
 	return 0;
 }
 
 static void __net_exit fib_notifier_net_exit(struct net *net)
 {
-	WARN_ON_ONCE(!list_empty(&net->fib_notifier_ops));
+	struct fib_notifier_net *fn_net = net_generic(net, fib_notifier_net_id);
+
+	WARN_ON_ONCE(!list_empty(&fn_net->fib_notifier_ops));
 }
 
 static struct pernet_operations fib_notifier_net_ops = {
 	.init = fib_notifier_net_init,
 	.exit = fib_notifier_net_exit,
+	.id = &fib_notifier_net_id,
+	.size = sizeof(struct fib_notifier_net),
 };
 
 static int __init fib_notifier_init(void)
-- 
2.21.0


^ permalink raw reply related

* [PATCH bpf-next] kbuild: replace BASH-specific ${@:2} with shift and ${@}
From: Andrii Nakryiko @ 2019-09-05 17:59 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Stephen Rothwell,
	Masahiro Yamada

${@:2} is BASH-specific extension, which makes link-vmlinux.sh rely on
BASH. Use shift and ${@} instead to fix this issue.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 scripts/link-vmlinux.sh | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 0d8f41db8cd6..8c59970a09dc 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -57,12 +57,16 @@ modpost_link()
 
 # Link of vmlinux
 # ${1} - output file
-# ${@:2} - optional extra .o files
+# ${2}, ${3}, ... - optional extra .o files
 vmlinux_link()
 {
 	local lds="${objtree}/${KBUILD_LDS}"
+	local output=${1}
 	local objects
 
+	# skip output file argument
+	shift
+
 	if [ "${SRCARCH}" != "um" ]; then
 		objects="--whole-archive			\
 			${KBUILD_VMLINUX_OBJS}			\
@@ -70,9 +74,10 @@ vmlinux_link()
 			--start-group				\
 			${KBUILD_VMLINUX_LIBS}			\
 			--end-group				\
-			${@:2}"
+			${@}"
 
-		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${1}	\
+		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
+			-o ${output}				\
 			-T ${lds} ${objects}
 	else
 		objects="-Wl,--whole-archive			\
@@ -81,9 +86,10 @@ vmlinux_link()
 			-Wl,--start-group			\
 			${KBUILD_VMLINUX_LIBS}			\
 			-Wl,--end-group				\
-			${@:2}"
+			${@}"
 
-		${CC} ${CFLAGS_vmlinux} -o ${1}			\
+		${CC} ${CFLAGS_vmlinux}				\
+			-o ${output}				\
 			-Wl,-T,${lds}				\
 			${objects}				\
 			-lutil -lrt -lpthread
-- 
2.21.0


^ permalink raw reply related

* Applied "spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages" to the spi tree
From: Mark Brown @ 2019-09-05 17:39 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: andrew, broonie, f.fainelli, h.feurstein, linux-spi, Mark Brown,
	mlichvar, netdev, richardcochran
In-Reply-To: <20190905010114.26718-2-olteanv@gmail.com>

The patch

   spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages

has been applied to the spi tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-5.4

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From d1c44c9342c17e3314371325d9272684a075b65c Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Thu, 5 Sep 2019 04:01:11 +0300
Subject: [PATCH] spi: Use an abbreviated pointer to ctlr->cur_msg in
 __spi_pump_messages

This helps a bit with line fitting now (the list_first_entry call) as
well as during the next patch which needs to iterate through all
transfers of ctlr->cur_msg so it timestamps them.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190905010114.26718-2-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 drivers/spi/spi.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index aef55acb5ccd..b2890923d256 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1265,8 +1265,9 @@ EXPORT_SYMBOL_GPL(spi_finalize_current_transfer);
  */
 static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 {
-	unsigned long flags;
+	struct spi_message *msg;
 	bool was_busy = false;
+	unsigned long flags;
 	int ret;
 
 	/* Lock queue */
@@ -1325,10 +1326,10 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 	}
 
 	/* Extract head of queue */
-	ctlr->cur_msg =
-		list_first_entry(&ctlr->queue, struct spi_message, queue);
+	msg = list_first_entry(&ctlr->queue, struct spi_message, queue);
+	ctlr->cur_msg = msg;
 
-	list_del_init(&ctlr->cur_msg->queue);
+	list_del_init(&msg->queue);
 	if (ctlr->busy)
 		was_busy = true;
 	else
@@ -1361,7 +1362,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 			if (ctlr->auto_runtime_pm)
 				pm_runtime_put(ctlr->dev.parent);
 
-			ctlr->cur_msg->status = ret;
+			msg->status = ret;
 			spi_finalize_current_message(ctlr);
 
 			mutex_unlock(&ctlr->io_mutex);
@@ -1369,28 +1370,28 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 		}
 	}
 
-	trace_spi_message_start(ctlr->cur_msg);
+	trace_spi_message_start(msg);
 
 	if (ctlr->prepare_message) {
-		ret = ctlr->prepare_message(ctlr, ctlr->cur_msg);
+		ret = ctlr->prepare_message(ctlr, msg);
 		if (ret) {
 			dev_err(&ctlr->dev, "failed to prepare message: %d\n",
 				ret);
-			ctlr->cur_msg->status = ret;
+			msg->status = ret;
 			spi_finalize_current_message(ctlr);
 			goto out;
 		}
 		ctlr->cur_msg_prepared = true;
 	}
 
-	ret = spi_map_msg(ctlr, ctlr->cur_msg);
+	ret = spi_map_msg(ctlr, msg);
 	if (ret) {
-		ctlr->cur_msg->status = ret;
+		msg->status = ret;
 		spi_finalize_current_message(ctlr);
 		goto out;
 	}
 
-	ret = ctlr->transfer_one_message(ctlr, ctlr->cur_msg);
+	ret = ctlr->transfer_one_message(ctlr, msg);
 	if (ret) {
 		dev_err(&ctlr->dev,
 			"failed to transfer one message from queue\n");
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 1/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Florian Fainelli @ 2019-09-05 17:23 UTC (permalink / raw)
  To: Ardelean, Alexandru, andrew@lunn.ch
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, hkallweit1@gmail.com
In-Reply-To: <361eb94a4da73d1fa21893e8e294639f0fc0bcd2.camel@analog.com>

On 9/4/19 11:25 PM, Ardelean, Alexandru wrote:
> On Wed, 2019-09-04 at 21:53 +0200, Andrew Lunn wrote:
>> [External]
>>
>> On Wed, Sep 04, 2019 at 07:23:21PM +0300, Alexandru Ardelean wrote:
>>
>> Hi Alexandru
>>
>> Somewhere we need a comment stating what EDPD means. Here would be a
>> good place.
> 
> ack
> 
>>
>>> +#define ETHTOOL_PHY_EDPD_DFLT_TX_INTERVAL	0x7fff
>>> +#define ETHTOOL_PHY_EDPD_NO_TX			0x8000
>>> +#define ETHTOOL_PHY_EDPD_DISABLE		0
>>
>> I think you are passing a u16. So why not 0xfffe and 0xffff?  We also
>> need to make it clear what the units are for interval. This file
> 
> I initially thought about keeping this u8 and going with 0xff & 0xfe.
> But 254 or 253 could be too small to specify the value of an interval.
> 
> Also (maybe due ti all the coding-patterns that I saw over the course of some time), make me feel that I should add a
> flag somewhere.
> 
> Bottom line is: 0xfffe and 0xffff also work from my side, if it is acceptable (by the community).
> 
> Another approach I considered, was to maybe have this EDPD just do enable & disable (which is sufficient for the `adin`
> PHY & `micrel` as well).
> That would mean that if we would ever want to configure the TX interval (in the future), we would need an extra PHY-
> tunable parameter just for that; because changing the enable/disable behavior would be dangerous.
> And also, deferring the TX-interval configuration, does not sound like good design/pattern, since it can allow for tons
> of PHY-tunable parameters for every little knob.

It seems to me that the interval is a better way to deal with that, if
you specify a non zero interval, you enable EDPD, even if your PHY can
only act on an enable/disable bit. For PHYs that do support setting a TX
internal, the non-zero interval can be translated into whatever
appropriate unit. In all cases, a 0 interval means disable.

Andrew, does that work  for you?
-- 
Florian

^ permalink raw reply

* Re: [PATCH] net/skbuff: silence warnings under memory pressure
From: Steven Rostedt @ 2019-09-05 17:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Qian Cai, Petr Mladek, Sergey Senozhatsky, Michal Hocko,
	Eric Dumazet, davem, netdev, linux-mm, linux-kernel
In-Reply-To: <20190905113208.GA521@jagdpanzerIV>

On Thu, 5 Sep 2019 20:32:08 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> I think we can queue significantly much less irq_work-s from printk().
> 
> Petr, Steven, what do you think?

What if we just rate limit the wake ups of klogd? I mean, really, do we
need to keep calling wake up if it probably never even executed?

-- Steve

^ permalink raw reply

* Re: [PATCH 1/2] net: phy: dp83867: Add documentation for SGMII mode type
From: Trent Piepho @ 2019-09-05 17:06 UTC (permalink / raw)
  To: vitaly.gaiduk@cloudbear.ru, davem@davemloft.net,
	f.fainelli@gmail.com, robh+dt@kernel.org
  Cc: mark.rutland@arm.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, andrew@lunn.ch,
	devicetree@vger.kernel.org
In-Reply-To: <1567700761-14195-2-git-send-email-vitaly.gaiduk@cloudbear.ru>

On Thu, 2019-09-05 at 19:26 +0300, Vitaly Gaiduk wrote:
> Add documentation of ti,sgmii-type which can be used to select
> SGMII mode type (4 or 6-wire).
> 
> Signed-off-by: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
> ---
>  Documentation/devicetree/bindings/net/ti,dp83867.txt | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/ti,dp83867.txt b/Documentation/devicetree/bindings/net/ti,dp83867.txt
> index db6aa3f2215b..18e7fd52897f 100644
> --- a/Documentation/devicetree/bindings/net/ti,dp83867.txt
> +++ b/Documentation/devicetree/bindings/net/ti,dp83867.txt
> @@ -37,6 +37,7 @@ Optional property:
>  			      for applicable values.  The CLK_OUT pin can also
>  			      be disabled by this property.  When omitted, the
>  			      PHY's default will be left as is.
> +	- ti,sgmii-type - This denotes the fact which SGMII mode is used (4 or 6-wire).

Really should explain what kind of value it is and what the values
mean.  I.e., should this be ti,sgimii-type = <4> to select 4 wire?

Maybe a boolean, "sgmii-clock", to indicate the presence of sgmii rx
clock lines, would make more sense?

I also wonder if phy-mode = "sgmii-clk" or "sgmii-6wire", vs the
existing phy-mode = "sgmii", might also be a better way to describe
this instead of a new property.

^ permalink raw reply

* Re: [PATCH] net/skbuff: silence warnings under memory pressure
From: Steven Rostedt @ 2019-09-05 17:14 UTC (permalink / raw)
  To: Qian Cai
  Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, Michal Hocko,
	Eric Dumazet, davem, netdev, linux-mm, linux-kernel
In-Reply-To: <1567699393.5576.96.camel@lca.pw>

On Thu, 05 Sep 2019 12:03:13 -0400
Qian Cai <cai@lca.pw> wrote:

> > > and could deal with console hardware that involve irq_exit() anyway.  
> > 
> > printk->console_driver->write() does not involve irq.  
> 
> Hmm, from the article,
> 
> https://en.wikipedia.org/wiki/Universal_asynchronous_receiver-transmitter
> 
> "Since transmission of a single or multiple characters may take a long time
> relative to CPU speeds, a UART maintains a flag showing busy status so that the
> host system knows if there is at least one character in the transmit buffer or
> shift register; "ready for next character(s)" may also be signaled with an
> interrupt."

I'm pretty sure all serial consoles do a busy loop on the UART and not
use interrupts to notify when it's available. That would require an
asynchronous implementation of printk() which would be quite complex to
implement.

-- Steve

^ permalink raw reply

* Re: [v3] net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate
From: Cong Wang @ 2019-09-05 17:04 UTC (permalink / raw)
  To: David Dai
  Cc: Jamal Hadi Salim, Jiri Pirko, David Miller,
	Linux Kernel Network Developers, LKML, zdai
In-Reply-To: <1567609423-26826-1-git-send-email-zdai@linux.vnet.ibm.com>

On Wed, Sep 4, 2019 at 8:03 AM David Dai <zdai@linux.vnet.ibm.com> wrote:
>
> For high speed adapter like Mellanox CX-5 card, it can reach upto
> 100 Gbits per second bandwidth. Currently htb already supports 64bit rate
> in tc utility. However police action rate and peakrate are still limited
> to 32bit value (upto 32 Gbits per second). Add 2 new attributes
> TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support
> so that tc utility can use them for 64bit rate and peakrate value to
> break the 32bit limit, and still keep the backward binary compatibility.
>
> Tested-by: David Dai <zdai@linux.vnet.ibm.com>
> Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

Thanks.

^ permalink raw reply

* Re: [PATCH] net: phylink: Fix flow control resolution
From: Russell King - ARM Linux admin @ 2019-09-05 16:59 UTC (permalink / raw)
  To: stefanc
  Cc: davem, andrew, netdev, linux-kernel, shaulb, nadavh, ymarkman,
	marcin
In-Reply-To: <1567701978-16056-1-git-send-email-stefanc@marvell.com>

On Thu, Sep 05, 2019 at 07:46:18PM +0300, stefanc@marvell.com wrote:
> From: Stefan Chulski <stefanc@marvell.com>
> 
> Regarding to IEEE 802.3-2015 standard section 2
> 28B.3 Priority resolution - Table 28-3 - Pause resolution
> 
> In case of Local device Pause=1 AsymDir=0, Link partner
> Pause=1 AsymDir=1, Local device resolution should be enable PAUSE
> transmit, disable PAUSE receive.
> And in case of Local device Pause=1 AsymDir=1, Link partner
> Pause=1 AsymDir=0, Local device resolution should be enable PAUSE
> receive, disable PAUSE transmit.
> 
> Signed-off-by: Stefan Chulski <stefanc@marvell.com>
> Reported-by: Shaul Ben-Mayor <shaulb@marvell.com>

Good catch, thanks for the patch.

Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Fixes: 9525ae83959b ("phylink: add phylink infrastructure")

> ---
>  drivers/net/phy/phylink.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index a45c5de..a5a57ca 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -376,8 +376,8 @@ static void phylink_get_fixed_state(struct phylink *pl, struct phylink_link_stat
>   *  Local device  Link partner
>   *  Pause AsymDir Pause AsymDir Result
>   *    1     X       1     X     TX+RX
> - *    0     1       1     1     RX
> - *    1     1       0     1     TX
> + *    0     1       1     1     TX
> + *    1     1       0     1     RX
>   */
>  static void phylink_resolve_flow(struct phylink *pl,
>  				 struct phylink_link_state *state)
> @@ -398,7 +398,7 @@ static void phylink_resolve_flow(struct phylink *pl,
>  			new_pause = MLO_PAUSE_TX | MLO_PAUSE_RX;
>  		else if (pause & MLO_PAUSE_ASYM)
>  			new_pause = state->pause & MLO_PAUSE_SYM ?
> -				 MLO_PAUSE_RX : MLO_PAUSE_TX;
> +				 MLO_PAUSE_TX : MLO_PAUSE_RX;
>  	} else {
>  		new_pause = pl->link_config.pause & MLO_PAUSE_TXRX_MASK;
>  	}
> -- 
> 1.9.1
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* [PATCH] net: phylink: Fix flow control resolution
From: stefanc @ 2019-09-05 16:46 UTC (permalink / raw)
  To: davem
  Cc: linux, andrew, netdev, linux-kernel, stefanc, shaulb, nadavh,
	ymarkman, marcin

From: Stefan Chulski <stefanc@marvell.com>

Regarding to IEEE 802.3-2015 standard section 2
28B.3 Priority resolution - Table 28-3 - Pause resolution

In case of Local device Pause=1 AsymDir=0, Link partner
Pause=1 AsymDir=1, Local device resolution should be enable PAUSE
transmit, disable PAUSE receive.
And in case of Local device Pause=1 AsymDir=1, Link partner
Pause=1 AsymDir=0, Local device resolution should be enable PAUSE
receive, disable PAUSE transmit.

Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Reported-by: Shaul Ben-Mayor <shaulb@marvell.com>
---
 drivers/net/phy/phylink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index a45c5de..a5a57ca 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -376,8 +376,8 @@ static void phylink_get_fixed_state(struct phylink *pl, struct phylink_link_stat
  *  Local device  Link partner
  *  Pause AsymDir Pause AsymDir Result
  *    1     X       1     X     TX+RX
- *    0     1       1     1     RX
- *    1     1       0     1     TX
+ *    0     1       1     1     TX
+ *    1     1       0     1     RX
  */
 static void phylink_resolve_flow(struct phylink *pl,
 				 struct phylink_link_state *state)
@@ -398,7 +398,7 @@ static void phylink_resolve_flow(struct phylink *pl,
 			new_pause = MLO_PAUSE_TX | MLO_PAUSE_RX;
 		else if (pause & MLO_PAUSE_ASYM)
 			new_pause = state->pause & MLO_PAUSE_SYM ?
-				 MLO_PAUSE_RX : MLO_PAUSE_TX;
+				 MLO_PAUSE_TX : MLO_PAUSE_RX;
 	} else {
 		new_pause = pl->link_config.pause & MLO_PAUSE_TXRX_MASK;
 	}
-- 
1.9.1


^ permalink raw reply related

* [PATCH 1/2] net: phy: dp83867: Add documentation for SGMII mode type
From: Vitaly Gaiduk @ 2019-09-05 16:26 UTC (permalink / raw)
  To: davem, robh+dt, f.fainelli
  Cc: Vitaly Gaiduk, Mark Rutland, Trent Piepho, Andrew Lunn, netdev,
	devicetree, linux-kernel
In-Reply-To: <1567700761-14195-1-git-send-email-vitaly.gaiduk@cloudbear.ru>

Add documentation of ti,sgmii-type which can be used to select
SGMII mode type (4 or 6-wire).

Signed-off-by: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
---
 Documentation/devicetree/bindings/net/ti,dp83867.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/ti,dp83867.txt b/Documentation/devicetree/bindings/net/ti,dp83867.txt
index db6aa3f2215b..18e7fd52897f 100644
--- a/Documentation/devicetree/bindings/net/ti,dp83867.txt
+++ b/Documentation/devicetree/bindings/net/ti,dp83867.txt
@@ -37,6 +37,7 @@ Optional property:
 			      for applicable values.  The CLK_OUT pin can also
 			      be disabled by this property.  When omitted, the
 			      PHY's default will be left as is.
+	- ti,sgmii-type - This denotes the fact which SGMII mode is used (4 or 6-wire).
 
 Note: ti,min-output-impedance and ti,max-output-impedance are mutually
       exclusive. When both properties are present ti,max-output-impedance
-- 
2.16.4


^ permalink raw reply related

* [PATCH 2/2] net: phy: dp83867: Add SGMII mode type switching
From: Vitaly Gaiduk @ 2019-09-05 16:25 UTC (permalink / raw)
  To: davem, robh+dt, f.fainelli
  Cc: Vitaly Gaiduk, Mark Rutland, Andrew Lunn, Heiner Kallweit,
	Trent Piepho, netdev, devicetree, linux-kernel

This patch adds ability to switch beetween two PHY SGMII modes.
Some hardware, for example, FPGA IP designs may use 6-wire mode
which enables differential SGMII clock to MAC.

Signed-off-by: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
---
 drivers/net/phy/dp83867.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 1f1ecee0ee2f..3efb33e7523f 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -37,6 +37,7 @@
 #define DP83867_STRAP_STS2	0x006f
 #define DP83867_RGMIIDCTL	0x0086
 #define DP83867_IO_MUX_CFG	0x0170
+#define DP83867_SGMIICTL	0x00D3
 #define DP83867_10M_SGMII_CFG   0x016F
 #define DP83867_10M_SGMII_RATE_ADAPT_MASK BIT(7)
 
@@ -61,6 +62,9 @@
 #define DP83867_RGMII_TX_CLK_DELAY_EN		BIT(1)
 #define DP83867_RGMII_RX_CLK_DELAY_EN		BIT(0)
 
+/* SGMIICTL bits */
+#define DP83867_SGMII_TYPE		BIT(14)
+
 /* STRAP_STS1 bits */
 #define DP83867_STRAP_STS1_RESERVED		BIT(11)
 
@@ -109,6 +113,7 @@ struct dp83867_private {
 	bool rxctrl_strap_quirk;
 	bool set_clk_output;
 	u32 clk_output_sel;
+	bool sgmii_type;
 };
 
 static int dp83867_ack_interrupt(struct phy_device *phydev)
@@ -197,6 +202,8 @@ static int dp83867_of_init(struct phy_device *phydev)
 	dp83867->rxctrl_strap_quirk = of_property_read_bool(of_node,
 					"ti,dp83867-rxctrl-strap-quirk");
 
+	dp83867->sgmii_type = of_property_read_bool(of_node, "ti,sgmii-type");
+
 	/* Existing behavior was to use default pin strapping delay in rgmii
 	 * mode, but rgmii should have meant no delay.  Warn existing users.
 	 */
@@ -389,6 +396,14 @@ static int dp83867_config_init(struct phy_device *phydev)
 
 		if (ret)
 			return ret;
+
+		/* SGMII type is set to 4-wire mode by default */
+		if (dp83867->sgmii_type) {
+			/* Switch-on 6-wire mode */
+			val = phy_read_mmd(phydev, DP83867_DEVADDR, DP83867_SGMIICTL);
+			val |= DP83867_SGMII_TYPE;
+			phy_write_mmd(phydev, DP83867_DEVADDR, DP83867_SGMIICTL, val);
+		}
 	}
 
 	/* Enable Interrupt output INT_OE in CFG3 register */
-- 
2.16.4


^ permalink raw reply related

* [PATCH] brcmsmac: Use DIV_ROUND_CLOSEST directly to make it readable
From: zhong jiang @ 2019-09-05 16:24 UTC (permalink / raw)
  To: kvalo; +Cc: davem, zhongjiang, linux-wireless, netdev, linux-kernel

The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d
but is perhaps more readable.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 .../net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c   | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
index 07f61d6..3bf152d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
@@ -17748,7 +17748,7 @@ static void wlc_phy_txpwrctrl_pwr_setup_nphy(struct brcms_phy *pi)
 			num = 8 *
 			      (16 * b0[tbl_id - 26] + b1[tbl_id - 26] * idx);
 			den = 32768 + a1[tbl_id - 26] * idx;
-			pwr_est = max(((4 * num + den / 2) / den), -8);
+			pwr_est = max(DIV_ROUND_CLOSEST(4 * num, den), -8);
 			if (NREV_LT(pi->pubpi.phy_rev, 3)) {
 				if (idx <=
 				    (uint) (31 - idle_tssi[tbl_id - 26] + 1))
@@ -26990,8 +26990,8 @@ static void wlc_phy_rxcal_phycleanup_nphy(struct brcms_phy *pi, u8 rx_core)
 				     NPHY_RXCAL_TONEAMP, 0, cal_type, false);
 
 		wlc_phy_rx_iq_est_nphy(pi, est, num_samps, 32, 0);
-		i_pwr = (est[rx_core].i_pwr + num_samps / 2) / num_samps;
-		q_pwr = (est[rx_core].q_pwr + num_samps / 2) / num_samps;
+		i_pwr = DIV_ROUND_CLOSEST(est[rx_core].i_pwr, num_samps);
+		q_pwr = DIV_ROUND_CLOSEST(est[rx_core].q_pwr, num_samps);
 		curr_pwr = i_pwr + q_pwr;
 
 		switch (gainctrl_dirn) {
@@ -27673,10 +27673,10 @@ static int wlc_phy_cal_rxiq_nphy_rev3(struct brcms_phy *pi,
 					wlc_phy_rx_iq_est_nphy(pi, est,
 							       num_samps, 32,
 							       0);
-					i_pwr =	(est[rx_core].i_pwr +
-						 num_samps / 2) / num_samps;
-					q_pwr =	(est[rx_core].q_pwr +
-						 num_samps / 2) / num_samps;
+					i_pwr = DIV_ROUND_CLOSEST(est[rx_core].i_pwr,
+									 num_samps);
+					q_pwr = DIV_ROUND_CLOSEST(est[rx_core].q_pwr,
+									 num_samps);
 					tot_pwr[gain_pass] = i_pwr + q_pwr;
 				} else {
 
-- 
1.7.12.4


^ permalink raw reply related

* Re: [BACKPORT 4.14.y v2 5/6] ppp: mppe: Revert "ppp: mppe: Add softdep to arc4"
From: Eric Biggers @ 2019-09-05 16:16 UTC (permalink / raw)
  To: Baolin Wang
  Cc: stable, paulus, linux-ppp, netdev, arnd, orsonzhai,
	vincent.guittot, linux-kernel
In-Reply-To: <c24710bae9098ba971a2778a1a44627d5fa3ddc0.1567649729.git.baolin.wang@linaro.org>

On Thu, Sep 05, 2019 at 11:10:45AM +0800, Baolin Wang wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> [Upstream commit 25a09ce79639a8775244808c17282c491cff89cf]
> 
> Commit 0e5a610b5ca5 ("ppp: mppe: switch to RC4 library interface"),
> which was merged through the crypto tree for v5.3, changed ppp_mppe.c to
> use the new arc4_crypt() library function rather than access RC4 through
> the dynamic crypto_skcipher API.
> 
> Meanwhile commit aad1dcc4f011 ("ppp: mppe: Add softdep to arc4") was
> merged through the net tree and added a module soft-dependency on "arc4".
> 
> The latter commit no longer makes sense because the code now uses the
> "libarc4" module rather than "arc4", and also due to the direct use of
> arc4_crypt(), no module soft-dependency is required.
> 
> So revert the latter commit.
> 
> Cc: Takashi Iwai <tiwai@suse.de>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> ---
>  drivers/net/ppp/ppp_mppe.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
> index d9eda7c..6c7fd98 100644
> --- a/drivers/net/ppp/ppp_mppe.c
> +++ b/drivers/net/ppp/ppp_mppe.c
> @@ -63,7 +63,6 @@
>  MODULE_DESCRIPTION("Point-to-Point Protocol Microsoft Point-to-Point Encryption support");
>  MODULE_LICENSE("Dual BSD/GPL");
>  MODULE_ALIAS("ppp-compress-" __stringify(CI_MPPE));
> -MODULE_SOFTDEP("pre: arc4");

Why is this being backported?  This revert was only needed because of a
different patch that was merged in v5.3, as I explained in the commit message.

- Eric

^ permalink raw reply

* Re: [PATCH v3 bpf-next 2/3] bpf: implement CAP_BPF
From: Song Liu @ 2019-09-05 16:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, David S . Miller, Daniel Borkmann,
	Peter Zijlstra, Andy Lutomirski, Networking, bpf, Kernel Team,
	linux-api@vger.kernel.org
In-Reply-To: <20190905031518.behyq7olkh6fjsoe@ast-mbp.dhcp.thefacebook.com>



> On Sep 4, 2019, at 8:15 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Thu, Sep 05, 2019 at 02:51:51AM +0000, Song Liu wrote:
>> 
>> 
>>> On Sep 4, 2019, at 6:32 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>> 
>>> On Thu, Sep 05, 2019 at 12:34:36AM +0000, Song Liu wrote:
>>>> 
>>>> 
>>>>> On Sep 4, 2019, at 11:43 AM, Alexei Starovoitov <ast@kernel.org> wrote:
>>>>> 
>>>>> Implement permissions as stated in uapi/linux/capability.h
>>>>> 
>>>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>>>>> 
>>>> 
>>>> [...]
>>>> 
>>>>> @@ -1648,11 +1648,11 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
>>>>> 	is_gpl = license_is_gpl_compatible(license);
>>>>> 
>>>>> 	if (attr->insn_cnt == 0 ||
>>>>> -	    attr->insn_cnt > (capable(CAP_SYS_ADMIN) ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
>>>>> +	    attr->insn_cnt > (capable_bpf() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
>>>>> 		return -E2BIG;
>>>>> 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
>>>>> 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
>>>>> -	    !capable(CAP_SYS_ADMIN))
>>>>> +	    !capable_bpf())
>>>>> 		return -EPERM;
>>>> 
>>>> Do we allow load BPF_PROG_TYPE_SOCKET_FILTER and BPF_PROG_TYPE_CGROUP_SKB
>>>> without CAP_BPF? If so, maybe highlight in the header?
>>> 
>>> of course. there is no change in behavior.
>>> 'highlight in the header'?
>>> you mean in commit log?
>>> I think it's a bit weird to describe things in commit that patch
>>> is _not_ changing vs things that patch does actually change.
>>> This type of comment would be great in a doc though.
>>> The doc will be coming separately in the follow up assuming
>>> the whole thing lands. I'll remember to note that bit.
>> 
>> I meant capability.h:
>> 
>> + * CAP_BPF allows the following BPF operations:
>> + * - Loading all types of BPF programs
>> 
>> But CAP_BPF is not required to load all types of programs. 
> 
> yes, but above statement is still correct, right?
> 
> And right below it says:
> * CAP_BPF allows the following BPF operations:
> * - Loading all types of BPF programs
> * - Creating all types of BPF maps except:
> *    - stackmap that needs CAP_TRACING
> *    - devmap that needs CAP_NET_ADMIN
> *    - cpumap that needs CAP_SYS_ADMIN
> which is also correct, but CAP_BPF is not required
> for array, hash, prog_array, percpu, map-in-map ...
> except their lru variants...
> and except if they contain bpf_spin_lock...
> and if they need BTF it currently can be loaded with cap_sys_admin only...
> 
> If we say something about socket_filter, cg_skb progs in capability.h
> we should clarify maps as well, but then it will become too big for .h
> The comments in capability.h already look too long to me.
> All that info and a lot more belongs in the doc.

Agreed. We cannot put all these details in capability.h. Doc/wikipages 
would be better fit for these information. 

Thanks,
Song



^ permalink raw reply

* [PATCH 2/8] netfilter: nft_meta: support for time matching
From: Pablo Neira Ayuso @ 2019-09-05 16:03 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190905160400.25399-1-pablo@netfilter.org>

From: Ander Juaristi <a@juaristi.eus>

This patch introduces meta matches in the kernel for time (a UNIX timestamp),
day (a day of week, represented as an integer between 0-6), and
hour (an hour in the current day, or: number of seconds since midnight).

All values are taken as unsigned 64-bit integers.

The 'time' keyword is internally converted to nanoseconds by nft in
userspace, and hence the timestamp is taken in nanoseconds as well.

Signed-off-by: Ander Juaristi <a@juaristi.eus>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h |  6 +++++
 net/netfilter/nft_meta.c                 | 46 ++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 82abaa183fc3..b83b62eb4b01 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -799,6 +799,9 @@ enum nft_exthdr_attributes {
  * @NFT_META_OIFKIND: packet output interface kind name (dev->rtnl_link_ops->kind)
  * @NFT_META_BRI_IIFPVID: packet input bridge port pvid
  * @NFT_META_BRI_IIFVPROTO: packet input bridge vlan proto
+ * @NFT_META_TIME_NS: time since epoch (in nanoseconds)
+ * @NFT_META_TIME_DAY: day of week (from 0 = Sunday to 6 = Saturday)
+ * @NFT_META_TIME_HOUR: hour of day (in seconds)
  */
 enum nft_meta_keys {
 	NFT_META_LEN,
@@ -831,6 +834,9 @@ enum nft_meta_keys {
 	NFT_META_OIFKIND,
 	NFT_META_BRI_IIFPVID,
 	NFT_META_BRI_IIFVPROTO,
+	NFT_META_TIME_NS,
+	NFT_META_TIME_DAY,
+	NFT_META_TIME_HOUR,
 };
 
 /**
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index f69afb9ff3cb..317e3a9e8c5b 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -26,8 +26,36 @@
 
 #include <uapi/linux/netfilter_bridge.h> /* NF_BR_PRE_ROUTING */
 
+#define NFT_META_SECS_PER_MINUTE	60
+#define NFT_META_SECS_PER_HOUR		3600
+#define NFT_META_SECS_PER_DAY		86400
+#define NFT_META_DAYS_PER_WEEK		7
+
 static DEFINE_PER_CPU(struct rnd_state, nft_prandom_state);
 
+static u8 nft_meta_weekday(unsigned long secs)
+{
+	unsigned int dse;
+	u8 wday;
+
+	secs -= NFT_META_SECS_PER_MINUTE * sys_tz.tz_minuteswest;
+	dse = secs / NFT_META_SECS_PER_DAY;
+	wday = (4 + dse) % NFT_META_DAYS_PER_WEEK;
+
+	return wday;
+}
+
+static u32 nft_meta_hour(unsigned long secs)
+{
+	struct tm tm;
+
+	time64_to_tm(secs, 0, &tm);
+
+	return tm.tm_hour * NFT_META_SECS_PER_HOUR
+		+ tm.tm_min * NFT_META_SECS_PER_MINUTE
+		+ tm.tm_sec;
+}
+
 void nft_meta_get_eval(const struct nft_expr *expr,
 		       struct nft_regs *regs,
 		       const struct nft_pktinfo *pkt)
@@ -218,6 +246,15 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 			goto err;
 		strncpy((char *)dest, out->rtnl_link_ops->kind, IFNAMSIZ);
 		break;
+	case NFT_META_TIME_NS:
+		nft_reg_store64(dest, ktime_get_real_ns());
+		break;
+	case NFT_META_TIME_DAY:
+		nft_reg_store8(dest, nft_meta_weekday(get_seconds()));
+		break;
+	case NFT_META_TIME_HOUR:
+		*dest = nft_meta_hour(get_seconds());
+		break;
 	default:
 		WARN_ON(1);
 		goto err;
@@ -330,6 +367,15 @@ int nft_meta_get_init(const struct nft_ctx *ctx,
 		len = sizeof(u8);
 		break;
 #endif
+	case NFT_META_TIME_NS:
+		len = sizeof(u64);
+		break;
+	case NFT_META_TIME_DAY:
+		len = sizeof(u8);
+		break;
+	case NFT_META_TIME_HOUR:
+		len = sizeof(u32);
+		break;
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.11.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox