Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v4 net-next 19/19] ionic: Add basic devlink interface
From: Shannon Nelson @ 2019-07-23 22:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20190723.144055.138556918172139772.davem@davemloft.net>

On 7/23/19 2:40 PM, David Miller wrote:
> From: Shannon Nelson <snelson@pensando.io>
> Date: Mon, 22 Jul 2019 14:40:23 -0700
>
>> +struct ionic *ionic_devlink_alloc(struct device *dev)
>> +{
>> +	struct devlink *dl;
>> +	struct ionic *ionic;
> Reverse christmas tree please.

Yep, I missed this one.

Thanks for your review time.
sln



^ permalink raw reply

* Re: [PATCH v4 net-next 11/19] ionic: Add Rx filter and rx_mode ndo support
From: Saeed Mahameed @ 2019-07-23 22:51 UTC (permalink / raw)
  To: snelson@pensando.io, davem@davemloft.net; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190723.143326.197667027838462669.davem@davemloft.net>

On Tue, 2019-07-23 at 14:33 -0700, David Miller wrote:
> From: Shannon Nelson <snelson@pensando.io>
> Date: Mon, 22 Jul 2019 14:40:15 -0700
> 
> > +	if (in_interrupt()) {
> > +		work = kzalloc(sizeof(*work), GFP_ATOMIC);
> > +		if (!work) {
> > +			netdev_err(lif->netdev, "%s OOM\n", __func__);
> > +			return -ENOMEM;
> > +		}
> > +		work->type = add ? DW_TYPE_RX_ADDR_ADD :
> > DW_TYPE_RX_ADDR_DEL;
> > +		memcpy(work->addr, addr, ETH_ALEN);
> > +		netdev_dbg(lif->netdev, "deferred: rx_filter %s %pM\n",
> > +			   add ? "add" : "del", addr);
> > +		ionic_lif_deferred_enqueue(&lif->deferred, work);
> > +	} else {
> > +		netdev_dbg(lif->netdev, "rx_filter %s %pM\n",
> > +			   add ? "add" : "del", addr);
> > +		if (add)
> > +			return ionic_lif_addr_add(lif, addr);
> > +		else
> > +			return ionic_lif_addr_del(lif, addr);
> > +	}
> 
> I don't know about this.
> 
> Generally interface address changes are expected to be synchronous.

Well, drivers can't sleep on set_rx_mode ndo, dev_set_rx_mode is
holding netif_addr_lock_bh.

Some drivers need to wait for firmware/device completion to program the
new address and they have to do this asynchronously to avoid atomic
context. I assume this driver is having the same issue due to
"ionic_adminq_post_wait()" .. 


^ permalink raw reply

* Re: [PATCH TEST] net: bridge: mcast: fix possible uses of stale pointers
From: kbuild test robot @ 2019-07-23 22:54 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: kbuild-all, Martin Weinelt, bridge, Roopa Prabhu, netdev
In-Reply-To: <908e9e90-70cc-7bbe-f83f-0810c9ef3925@cumulusnetworks.com>

Hi Nikolay,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net/master]
[cannot apply to v5.3-rc1 next-20190723]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nikolay-Aleksandrov/net-bridge-mcast-fix-possible-uses-of-stale-pointers/20190702-083354

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


coccinelle warnings: (new ones prefixed by >>)

>> net/bridge/br_multicast.c:999:8-14: ERROR: application of sizeof to pointer

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* [PATCH] fix noderef.cocci warnings
From: kbuild test robot @ 2019-07-23 22:54 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: kbuild-all, Martin Weinelt, bridge, Roopa Prabhu, netdev
In-Reply-To: <908e9e90-70cc-7bbe-f83f-0810c9ef3925@cumulusnetworks.com>

From: kbuild test robot <lkp@intel.com>

net/bridge/br_multicast.c:999:8-14: ERROR: application of sizeof to pointer

 sizeof when applied to a pointer typed expression gives the size of
 the pointer

Generated by: scripts/coccinelle/misc/noderef.cocci

Fixes: 17c91348ed8b ("Use-after-free in br_multicast_rcv")
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
---

url:    https://github.com/0day-ci/linux/commits/Nikolay-Aleksandrov/net-bridge-mcast-fix-possible-uses-of-stale-pointers/20190702-083354

 br_multicast.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -996,7 +996,7 @@ static int br_ip6_multicast_mld2_report(
 			return -EINVAL;
 
 		_nsrcs = skb_header_pointer(skb, nsrcs_offset,
-					   sizeof(_nsrcs), &__nsrcs);
+					   sizeof(*_nsrcs), &__nsrcs);
 		if (!_nsrcs)
 			return -EINVAL;
 

^ permalink raw reply

* Re: [PATCH v2] drivers: net: xgene: Remove acpi_has_method() calls
From: Bjorn Helgaas @ 2019-07-23 22:56 UTC (permalink / raw)
  To: Kelsey Skunberg
  Cc: iyappan, keyur, quan, David Miller, netdev, linux-kernel,
	Bjorn Helgaas, Shuah Khan, linux-kernel-mentees
In-Reply-To: <20190723185811.8548-1-skunberg.kelsey@gmail.com>

On Tue, Jul 23, 2019 at 1:59 PM Kelsey Skunberg
<skunberg.kelsey@gmail.com> wrote:
>
> acpi_evaluate_object will already return an error if the needed method
> does not exist. Remove unnecessary acpi_has_method() calls and check the
> returned acpi_status for failure instead.

> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> index 6453fc2ebb1f..5d637b46b2bf 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> @@ -437,6 +437,7 @@ static void xgene_sgmac_tx_disable(struct xgene_enet_pdata *p)
>  static int xgene_enet_reset(struct xgene_enet_pdata *p)
>  {
>         struct device *dev = &p->pdev->dev;
> +       acpi_status status;
>
>         if (!xgene_ring_mgr_init(p))
>                 return -ENODEV;
> @@ -460,14 +461,13 @@ static int xgene_enet_reset(struct xgene_enet_pdata *p)
>                 }
>         } else {
>  #ifdef CONFIG_ACPI
> -               if (acpi_has_method(ACPI_HANDLE(&p->pdev->dev), "_RST"))
> -                       acpi_evaluate_object(ACPI_HANDLE(&p->pdev->dev),
> -                                            "_RST", NULL, NULL);
> -               else if (acpi_has_method(ACPI_HANDLE(&p->pdev->dev), "_INI"))
> +               status = acpi_evaluate_object(ACPI_HANDLE(&p->pdev->dev),
> +                                             "_RST", NULL, NULL);
> +               if (ACPI_FAILURE(status)) {
>                         acpi_evaluate_object(ACPI_HANDLE(&p->pdev->dev),
>                                              "_INI", NULL, NULL);
> +               }
>  #endif
> -       }

Oops, I don't think you intended to remove that brace.

If you haven't found it already, CONFIG_COMPILE_TEST is useful for
building things that wouldn't normally be buildable on your arch.

>         if (!p->port_id) {
>                 xgene_enet_ecc_init(p);

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Song Liu @ 2019-07-23 22:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, linux-security@vger.kernel.org, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API
In-Reply-To: <CALCETrX2bMnwC6_t4b_G-hzJSfMPrkK4YKs5ebcecv2LJ0rt3w@mail.gmail.com>



> On Jul 23, 2019, at 8:11 AM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Mon, Jul 22, 2019 at 1:54 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> Hi Andy, Lorenz, and all,
>> 
>>> On Jul 2, 2019, at 2:32 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>> 
>>> On Tue, Jul 2, 2019 at 2:04 PM Kees Cook <keescook@chromium.org> wrote:
>>>> 
>>>> On Mon, Jul 01, 2019 at 06:59:13PM -0700, Andy Lutomirski wrote:
>>>>> I think I'm understanding your motivation.  You're not trying to make
>>>>> bpf() generically usable without privilege -- you're trying to create
>>>>> a way to allow certain users to access dangerous bpf functionality
>>>>> within some limits.
>>>>> 
>>>>> That's a perfectly fine goal, but I think you're reinventing the
>>>>> wheel, and the wheel you're reinventing is quite complicated and
>>>>> already exists.  I think you should teach bpftool to be secure when
>>>>> installed setuid root or with fscaps enabled and put your policy in
>>>>> bpftool.  If you want to harden this a little bit, it would seem
>>>>> entirely reasonable to add a new CAP_BPF_ADMIN and change some, but
>>>>> not all, of the capable() checks to check CAP_BPF_ADMIN instead of the
>>>>> capabilities that they currently check.
>>>> 
>>>> If finer grained controls are wanted, it does seem like the /dev/bpf
>>>> path makes the most sense. open, request abilities, use fd. The open can
>>>> be mediated by DAC and LSM. The request can be mediated by LSM. This
>>>> provides a way to add policy at the LSM level and at the tool level.
>>>> (i.e. For tool-level controls: leave LSM wide open, make /dev/bpf owned
>>>> by "bpfadmin" and bpftool becomes setuid "bpfadmin". For fine-grained
>>>> controls, leave /dev/bpf wide open and add policy to SELinux, etc.)
>>>> 
>>>> With only a new CAP, you don't get the fine-grained controls. (The
>>>> "request abilities" part is the key there.)
>>> 
>>> Sure you do: the effective set.  It has somewhat bizarre defaults, but
>>> I don't think that's a real problem.  Also, this wouldn't be like
>>> CAP_DAC_READ_SEARCH -- you can't accidentally use your BPF caps.
>>> 
>>> I think that a /dev capability-like object isn't totally nuts, but I
>>> think we should do it well, and this patch doesn't really achieve
>>> that.  But I don't think bpf wants fine-grained controls like this at
>>> all -- as I pointed upthread, a fine-grained solution really wants
>>> different treatment for the different capable() checks, and a bunch of
>>> them won't resemble capabilities or /dev/bpf at all.
>> 
>> With 5.3-rc1 out, I am back on this. :)
>> 
>> How about we modify the set as:
>>  1. Introduce sys_bpf_with_cap() that takes fd of /dev/bpf.
> 
> I'm fine with this in principle, but:
> 
>>  2. Better handling of capable() calls through bpf code. I guess the
>>     biggest problem here is is_priv in verifier.c:bpf_check().
> 
> I think it would be good to understand exactly what /dev/bpf will
> enable one to do.  Without some care, it would just become the next
> CAP_SYS_ADMIN: if you can open it, sure, you're not root, but you can
> intercept network traffic, modify cgroup behavior, and do plenty of
> other things, any of which can probably be used to completely take
> over the system.

Well, yes. sys_bpf() is pretty powerful. 

The goal of /dev/bpf is to enable special users to call sys_bpf(). In 
the meanwhile, such users should not take down the whole system easily
by accident, e.g., with rm -rf /.

It is similar to CAP_BPF_ADMIN, without really adding the CAP_.  

I think adding new CAP_ requires much more effort. 

> 
> It would also be nice to understand why you can't do what you need to
> do entirely in user code using setuid or fscaps.

It is not very easy to achieve the same control: only certain users can
run certain tools (bpftool, etc.). 

The closest approach I can find is:
  1. use libcap (pam_cap) to give CAP_SETUID to certain users;
  2. add setuid(0) to bpftool.

The difference between this approach and /dev/bpf is that certain users
would be able to run other tools that call setuid(). Though I am not 
sure how many tools call setuid(), and how risky they are. 

> 
> Finally, at risk of rehashing some old arguments, I'll point out that
> the bpf() syscall is an unusual design to begin with.  As an example,
> consider bpf_prog_attach().  Outside of bpf(), if I want to change the
> behavior of a cgroup, I would write to a file in
> /sys/kernel/cgroup/unified/whatever/, and normal DAC and MAC rules
> apply.  With bpf(), however, I just call bpf() to attach a program to
> the cgroup.  bpf() says "oh, you are capable(CAP_NET_ADMIN) -- go for
> it!".  Unless I missed something major, and I just re-read the code,
> there is no check that the caller has write or LSM permission to
> anything at all in cgroupfs, and the existing API would make it very
> awkward to impose any kind of DAC rules here.
> 
> So I think it might actually be time to repay some techincal debt and
> come up with a real fix.  As a less intrusive approach, you could see
> about requiring ownership of the cgroup directory instead of
> CAP_NET_ADMIN.  As a more intrusive but perhaps better approach, you
> could invert the logic to to make it work like everything outside of
> cgroup: add pseudo-files like bpf.inet_ingress to the cgroup
> directories, and require a writable fd to *that* to a new improved
> attach API.  If a user could do:
> 
> int fd = open("/sys/fs/cgroup/.../bpf.inet_attach", O_RDWR);  /* usual
> DAC and MAC policy applies */
> int bpf_fd = setup the bpf stuff;  /* no privilege required, unless
> the program is huge or needs is_priv */
> bpf(BPF_IMPROVED_ATTACH, target = fd, program = bpf_fd);
> 
> there would be no capabilities or global privilege at all required for
> this.  It would just work with cgroup delegation, containers, etc.
> 
> I think you could even pull off this type of API change with only
> libbpf changes.  In particular, there's this code:
> 
> int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
>                    unsigned int flags)
> {
>        union bpf_attr attr;
> 
>        memset(&attr, 0, sizeof(attr));
>        attr.target_fd     = target_fd;
>        attr.attach_bpf_fd = prog_fd;
>        attr.attach_type   = type;
>        attr.attach_flags  = flags;
> 
>        return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
> }
> 
> This would instead do something like:
> 
> int specific_target_fd = openat(target_fd, bpf_type_to_target[type], O_RDWR);
> attr.target_fd = specific_target_fd;
> ...
> 
> return sys_bpf(BPF_PROG_IMPROVED_ATTACH, &attr, sizeof(attr));
> 
> Would this solve your problem without needing /dev/bpf at all?

This gives fine grain access control. I think it solves the problem. 
But it also requires a lot of rework to sys_bpf(). And it may also 
break backward/forward compatibility?

Personally, I think it is an overkill for the original motivation: 
call sys_bpf() with special user instead of root. 

Alexei, Daniel: what do you think about this? 

Thanks,
Song

^ permalink raw reply

* Re: [PATCH v2 bpf-next] libbpf: provide more helpful message on uninitialized global var
From: Alexei Starovoitov @ 2019-07-23 23:00 UTC (permalink / raw)
  To: Song Liu, Andrii Nakryiko
  Cc: bpf, netdev@vger.kernel.org, daniel@iogearbox.net,
	andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <08DD65ED-34B4-47C0-B5ED-9A354CF5BA35@fb.com>

On 7/23/19 3:03 PM, Song Liu wrote:
>> On Jul 23, 2019, at 2:11 PM, Andrii Nakryiko<andriin@fb.com>  wrote:
>>
>> When BPF program defines uninitialized global variable, it's put into
>> a special COMMON section. Libbpf will reject such programs, but will
>> provide very unhelpful message with garbage-looking section index.
>>
>> This patch detects special section cases and gives more explicit error
>> message.
>>
>> Signed-off-by: Andrii Nakryiko<andriin@fb.com>
> Acked-by: Song Liu<songliubraving@fb.com>
> 

Applied. Thanks

^ permalink raw reply

* Re: [PATCH v4 net-next 02/19] ionic: Add hardware init and device commands
From: David Miller @ 2019-07-23 23:05 UTC (permalink / raw)
  To: snelson; +Cc: netdev
In-Reply-To: <59e45fd2-3c62-58cf-cf63-935d17703d2c@pensando.io>

From: Shannon Nelson <snelson@pensando.io>
Date: Tue, 23 Jul 2019 15:50:22 -0700

> On 7/23/19 2:18 PM, David Miller wrote:
>> From: Shannon Nelson <snelson@pensando.io>
>> Date: Mon, 22 Jul 2019 14:40:06 -0700
>>
>>> +void ionic_init_devinfo(struct ionic_dev *idev)
>>> +{
>>> + idev->dev_info.asic_type = ioread8(&idev->dev_info_regs->asic_type);
>>> + idev->dev_info.asic_rev = ioread8(&idev->dev_info_regs->asic_rev);
>>> +
>>> +	memcpy_fromio(idev->dev_info.fw_version,
>>> +		      idev->dev_info_regs->fw_version,
>>> +		      IONIC_DEVINFO_FWVERS_BUFLEN);
>>> +
>>> +	memcpy_fromio(idev->dev_info.serial_num,
>>> +		      idev->dev_info_regs->serial_num,
>>> +		      IONIC_DEVINFO_SERIAL_BUFLEN);
>>   ...
>>> +	sig = ioread32(&idev->dev_info_regs->signature);
>> I think if you are going to use the io{read,write}{8,16,32,64}()
>> interfaces then you should use io{read,write}{8,16,32,64}_rep()
>> instead of memcpy_{to,from}io().
>>
> Sure.

Note, I could be wrong.  Please test.

I think the operation of the two things might be different.

^ permalink raw reply

* Re: [PATCH v4 net-next 11/19] ionic: Add Rx filter and rx_mode ndo support
From: David Miller @ 2019-07-23 23:06 UTC (permalink / raw)
  To: snelson; +Cc: netdev
In-Reply-To: <e0c8417c-75bf-837f-01b5-60df302dafa7@pensando.io>

From: Shannon Nelson <snelson@pensando.io>
Date: Tue, 23 Jul 2019 15:50:43 -0700

> On 7/23/19 2:33 PM, David Miller wrote:
>> Generally interface address changes are expected to be synchronous.
> Yeah, this bothers me a bit as well, but the address change calls come
> in under spin_lock_bh(), and I'm reluctant to make an AdminQ call
> under the _bh that could block for a few seconds.

So it's not about memory allocation but rather the fact that the device
might take a while to complete?

Can you start the operation synchronously yet complete it async?

^ permalink raw reply

* Re: [PATCH v2 bpf-next 0/5] switch samples and tests to libbpf perf buffer API
From: Alexei Starovoitov @ 2019-07-23 23:09 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf@vger.kernel.org, netdev@vger.kernel.org,
	daniel@iogearbox.net, Song Liu
  Cc: andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <20190723213445.1732339-1-andriin@fb.com>

On 7/23/19 2:34 PM, Andrii Nakryiko wrote:
> There were few more tests and samples that were using custom perf buffer setup
> code from trace_helpers.h. This patch set gets rid of all the usages of those
> and removes helpers themselves. Libbpf provides nicer, but equally powerful
> set of APIs to work with perf ring buffers, so let's have all the samples use
> 
> v1->v2:
> - make logging message one long line instead of two (Song).

Applied. Thanks

^ permalink raw reply

* Re: [PATCH v2] drivers: net: xgene: Remove acpi_has_method() calls
From: Kelsey Skunberg @ 2019-07-23 23:17 UTC (permalink / raw)
  To: bjorn
  Cc: iyappan, keyur, quan, David Miller, netdev, linux-kernel,
	Shuah Khan, linux-kernel-mentees
In-Reply-To: <CABhMZUVAcJwJpN8BKZTTU7jUW6881KdBtoYs_3kSn+tDtOVqNw@mail.gmail.com>

On Tue, Jul 23, 2019 at 05:56:04PM -0500, Bjorn Helgaas wrote:
> On Tue, Jul 23, 2019 at 1:59 PM Kelsey Skunberg
> <skunberg.kelsey@gmail.com> wrote:
> >
> > acpi_evaluate_object will already return an error if the needed method
> > does not exist. Remove unnecessary acpi_has_method() calls and check the
> > returned acpi_status for failure instead.
> 
> > diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> > index 6453fc2ebb1f..5d637b46b2bf 100644
> > --- a/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> > +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c
> > @@ -437,6 +437,7 @@ static void xgene_sgmac_tx_disable(struct xgene_enet_pdata *p)
> >  static int xgene_enet_reset(struct xgene_enet_pdata *p)
> >  {
> >         struct device *dev = &p->pdev->dev;
> > +       acpi_status status;
> >
> >         if (!xgene_ring_mgr_init(p))
> >                 return -ENODEV;
> > @@ -460,14 +461,13 @@ static int xgene_enet_reset(struct xgene_enet_pdata *p)
> >                 }
> >         } else {
> >  #ifdef CONFIG_ACPI
> > -               if (acpi_has_method(ACPI_HANDLE(&p->pdev->dev), "_RST"))
> > -                       acpi_evaluate_object(ACPI_HANDLE(&p->pdev->dev),
> > -                                            "_RST", NULL, NULL);
> > -               else if (acpi_has_method(ACPI_HANDLE(&p->pdev->dev), "_INI"))
> > +               status = acpi_evaluate_object(ACPI_HANDLE(&p->pdev->dev),
> > +                                             "_RST", NULL, NULL);
> > +               if (ACPI_FAILURE(status)) {
> >                         acpi_evaluate_object(ACPI_HANDLE(&p->pdev->dev),
> >                                              "_INI", NULL, NULL);
> > +               }
> >  #endif
> > -       }
> 
> Oops, I don't think you intended to remove that brace.
> 
> If you haven't found it already, CONFIG_COMPILE_TEST is useful for
> building things that wouldn't normally be buildable on your arch.

Thank you very much for catching that. I did not know about
CONFIG_COMPILE_TEST yet and that will be very useful. It's clear why my
build test wasn't coming up with the same errors now. I know for future
patches now and will certainly get this one fixed.
Thank you again.

-Kelsey

^ permalink raw reply

* Re: [PATCH v4 net-next 11/19] ionic: Add Rx filter and rx_mode ndo support
From: Saeed Mahameed @ 2019-07-23 23:20 UTC (permalink / raw)
  To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190722214023.9513-12-snelson@pensando.io>

On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> Add the Rx filtering and rx_mode NDO callbacks.  Also add
> the deferred work thread handling needed to manage the filter
> requests otuside of the netif_addr_lock spinlock.
> 
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  .../net/ethernet/pensando/ionic/ionic_lif.c   | 389
> +++++++++++++++++-
>  .../net/ethernet/pensando/ionic/ionic_lif.h   |  29 ++
>  2 files changed, 412 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> index 72ac0fd56b69..efcda1337f91 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> @@ -12,9 +12,55 @@
>  #include "ionic_lif.h"
>  #include "ionic_debugfs.h"
>  
> +static void ionic_lif_rx_mode(struct lif *lif, unsigned int
> rx_mode);
> +static int ionic_lif_addr_add(struct lif *lif, const u8 *addr);
> +static int ionic_lif_addr_del(struct lif *lif, const u8 *addr);
> +
>  static int ionic_set_nic_features(struct lif *lif, netdev_features_t
> features);
>  static int ionic_notifyq_clean(struct lif *lif, int budget);
>  
> +static void ionic_lif_deferred_work(struct work_struct *work)
> +{
> +	struct lif *lif = container_of(work, struct lif,
> deferred.work);
> +	struct ionic_deferred *def = &lif->deferred;
> +	struct ionic_deferred_work *w = NULL;
> +
> +	spin_lock_bh(&def->lock);
> +	if (!list_empty(&def->list)) {
> +		w = list_first_entry(&def->list,
> +				     struct ionic_deferred_work, list);
> +		list_del(&w->list);
> +	}
> +	spin_unlock_bh(&def->lock);
> +
> +	if (w) {
> +		switch (w->type) {
> +		case DW_TYPE_RX_MODE:
> +			ionic_lif_rx_mode(lif, w->rx_mode);
> +			break;
> +		case DW_TYPE_RX_ADDR_ADD:
> +			ionic_lif_addr_add(lif, w->addr);
> +			break;
> +		case DW_TYPE_RX_ADDR_DEL:
> +			ionic_lif_addr_del(lif, w->addr);
> +			break;
> +		default:
> +			break;
> +		}
> +		kfree(w);
> +		schedule_work(&def->work);
> +	}
> +}
> +
> +static void ionic_lif_deferred_enqueue(struct ionic_deferred *def,
> +				       struct ionic_deferred_work
> *work)
> +{
> +	spin_lock_bh(&def->lock);
> +	list_add_tail(&work->list, &def->list);
> +	spin_unlock_bh(&def->lock);
> +	schedule_work(&def->work);
> +}
> +
>  int ionic_open(struct net_device *netdev)
>  {
>  	struct lif *lif = netdev_priv(netdev);
> @@ -180,6 +226,235 @@ static int ionic_notifyq_clean(struct lif *lif,
> int budget)
>  	return work_done;
>  }
>  
> +static int ionic_lif_addr_add(struct lif *lif, const u8 *addr)
> +{
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.rx_filter_add = {
> +			.opcode = CMD_OPCODE_RX_FILTER_ADD,
> +			.lif_index = cpu_to_le16(lif->index),
> +			.match = cpu_to_le16(RX_FILTER_MATCH_MAC),
> +		},
> +	};
> +	struct rx_filter *f;
> +	int err;
> +
> +	/* don't bother if we already have it */
> +	spin_lock_bh(&lif->rx_filters.lock);
> +	f = ionic_rx_filter_by_addr(lif, addr);
> +	spin_unlock_bh(&lif->rx_filters.lock);
> +	if (f)
> +		return 0;
> +
> +	netdev_dbg(lif->netdev, "rx_filter add ADDR %pM (id %d)\n",
> addr,
> +		   ctx.comp.rx_filter_add.filter_id);
> +
> +	memcpy(ctx.cmd.rx_filter_add.mac.addr, addr, ETH_ALEN);
> +	err = ionic_adminq_post_wait(lif, &ctx);
> +	if (err)
> +		return err;
> +
> +	return ionic_rx_filter_save(lif, 0, RXQ_INDEX_ANY, 0, &ctx);
> +}
> +
> +static int ionic_lif_addr_del(struct lif *lif, const u8 *addr)
> +{
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.rx_filter_del = {
> +			.opcode = CMD_OPCODE_RX_FILTER_DEL,
> +			.lif_index = cpu_to_le16(lif->index),
> +		},
> +	};
> +	struct rx_filter *f;
> +	int err;
> +
> +	spin_lock_bh(&lif->rx_filters.lock);
> +	f = ionic_rx_filter_by_addr(lif, addr);
> +	if (!f) {
> +		spin_unlock_bh(&lif->rx_filters.lock);
> +		return -ENOENT;
> +	}
> +
> +	ctx.cmd.rx_filter_del.filter_id = cpu_to_le32(f->filter_id);
> +	ionic_rx_filter_free(lif, f);
> +	spin_unlock_bh(&lif->rx_filters.lock);
> +
> +	err = ionic_adminq_post_wait(lif, &ctx);
> +	if (err)
> +		return err;
> +
> +	netdev_dbg(lif->netdev, "rx_filter del ADDR %pM (id %d)\n",
> addr,
> +		   ctx.cmd.rx_filter_del.filter_id);
> +
> +	return 0;
> +}
> +
> +static int ionic_lif_addr(struct lif *lif, const u8 *addr, bool add)
> +{
> +	struct ionic *ionic = lif->ionic;
> +	struct ionic_deferred_work *work;
> +	unsigned int nmfilters;
> +	unsigned int nufilters;
> +
> +	if (add) {
> +		/* Do we have space for this filter?  We test the
> counters
> +		 * here before checking the need for deferral so that
> we
> +		 * can return an overflow error to the stack.
> +		 */
> +		nmfilters = le32_to_cpu(ionic-
> >ident.lif.eth.max_mcast_filters);
> +		nufilters = le32_to_cpu(ionic-
> >ident.lif.eth.max_ucast_filters);
> +
> +		if ((is_multicast_ether_addr(addr) && lif->nmcast <
> nmfilters))
> +			lif->nmcast++;
> +		else if (!is_multicast_ether_addr(addr) &&
> +			 lif->nucast < nufilters)
> +			lif->nucast++;
> +		else
> +			return -ENOSPC;
> +	} else {
> +		if (is_multicast_ether_addr(addr) && lif->nmcast)
> +			lif->nmcast--;
> +		else if (!is_multicast_ether_addr(addr) && lif->nucast)
> +			lif->nucast--;
> +	}
> +
> +	if (in_interrupt()) {
> +		work = kzalloc(sizeof(*work), GFP_ATOMIC);
> +		if (!work) {
> +			netdev_err(lif->netdev, "%s OOM\n", __func__);
> +			return -ENOMEM;
> +		}
> +		work->type = add ? DW_TYPE_RX_ADDR_ADD :
> DW_TYPE_RX_ADDR_DEL;
> +		memcpy(work->addr, addr, ETH_ALEN);
> +		netdev_dbg(lif->netdev, "deferred: rx_filter %s %pM\n",
> +			   add ? "add" : "del", addr);
> +		ionic_lif_deferred_enqueue(&lif->deferred, work);
> +	} else {
> +		netdev_dbg(lif->netdev, "rx_filter %s %pM\n",
> +			   add ? "add" : "del", addr);
> +		if (add)
> +			return ionic_lif_addr_add(lif, addr);
> +		else
> +			return ionic_lif_addr_del(lif, addr);
> +	}
> +
> +	return 0;
> +}
> +
> +static int ionic_addr_add(struct net_device *netdev, const u8 *addr)
> +{
> +	return ionic_lif_addr(netdev_priv(netdev), addr, true);
> +}
> +
> +static int ionic_addr_del(struct net_device *netdev, const u8 *addr)
> +{
> +	return ionic_lif_addr(netdev_priv(netdev), addr, false);
> +}
> +
> +static void ionic_lif_rx_mode(struct lif *lif, unsigned int rx_mode)
> +{
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.rx_mode_set = {
> +			.opcode = CMD_OPCODE_RX_MODE_SET,
> +			.lif_index = cpu_to_le16(lif->index),
> +			.rx_mode = cpu_to_le16(rx_mode),
> +		},
> +	};
> +	char buf[128];
> +	int err;
> +	int i;
> +#define REMAIN(__x) (sizeof(buf) - (__x))
> +
> +	i = snprintf(buf, sizeof(buf), "rx_mode 0x%04x -> 0x%04x:",
> +		     lif->rx_mode, rx_mode);
> +	if (rx_mode & RX_MODE_F_UNICAST)
> +		i += snprintf(&buf[i], REMAIN(i), "
> RX_MODE_F_UNICAST");
> +	if (rx_mode & RX_MODE_F_MULTICAST)
> +		i += snprintf(&buf[i], REMAIN(i), "
> RX_MODE_F_MULTICAST");
> +	if (rx_mode & RX_MODE_F_BROADCAST)
> +		i += snprintf(&buf[i], REMAIN(i), "
> RX_MODE_F_BROADCAST");
> +	if (rx_mode & RX_MODE_F_PROMISC)
> +		i += snprintf(&buf[i], REMAIN(i), "
> RX_MODE_F_PROMISC");
> +	if (rx_mode & RX_MODE_F_ALLMULTI)
> +		i += snprintf(&buf[i], REMAIN(i), "
> RX_MODE_F_ALLMULTI");
> +	netdev_dbg(lif->netdev, "lif%d %s\n", lif->index, buf);
> +
> +	err = ionic_adminq_post_wait(lif, &ctx);
> +	if (err)
> +		netdev_warn(lif->netdev, "set rx_mode 0x%04x failed:
> %d\n",
> +			    rx_mode, err);
> +	else
> +		lif->rx_mode = rx_mode;
> +}
> +
> +static void _ionic_lif_rx_mode(struct lif *lif, unsigned int
> rx_mode)
> +{
> +	struct ionic_deferred_work *work;
> +
> +	if (in_interrupt()) {
> +		work = kzalloc(sizeof(*work), GFP_ATOMIC);
> +		if (!work) {
> +			netdev_err(lif->netdev, "%s OOM\n", __func__);
> +			return;
> +		}
> +		work->type = DW_TYPE_RX_MODE;
> +		work->rx_mode = rx_mode;
> +		netdev_dbg(lif->netdev, "deferred: rx_mode\n");
> +		ionic_lif_deferred_enqueue(&lif->deferred, work);
> +	} else {
> +		ionic_lif_rx_mode(lif, rx_mode);
> +	}
> +}
> +
> +static void ionic_set_rx_mode(struct net_device *netdev)
> +{
> +	struct lif *lif = netdev_priv(netdev);
> +	struct identity *ident = &lif->ionic->ident;
> +	unsigned int nfilters;
> +	unsigned int rx_mode;
> +
> +	rx_mode = RX_MODE_F_UNICAST;
> +	rx_mode |= (netdev->flags & IFF_MULTICAST) ?
> RX_MODE_F_MULTICAST : 0;
> +	rx_mode |= (netdev->flags & IFF_BROADCAST) ?
> RX_MODE_F_BROADCAST : 0;
> +	rx_mode |= (netdev->flags & IFF_PROMISC) ? RX_MODE_F_PROMISC :
> 0;
> +	rx_mode |= (netdev->flags & IFF_ALLMULTI) ? RX_MODE_F_ALLMULTI
> : 0;
> +
> +	/* sync unicast addresses
> +	 * next check to see if we're in an overflow state
> +	 *    if so, we track that we overflowed and enable NIC PROMISC
> +	 *    else if the overflow is set and not needed
> +	 *       we remove our overflow flag and check the netdev flags
> +	 *       to see if we can disable NIC PROMISC
> +	 */
> +	__dev_uc_sync(netdev, ionic_addr_add, ionic_addr_del);
> +	nfilters = le32_to_cpu(ident->lif.eth.max_ucast_filters);
> +	if (netdev_uc_count(netdev) + 1 > nfilters) {
> +		rx_mode |= RX_MODE_F_PROMISC;
> +		lif->uc_overflow = true;
> +	} else if (lif->uc_overflow) {
> +		lif->uc_overflow = false;
> +		if (!(netdev->flags & IFF_PROMISC))
> +			rx_mode &= ~RX_MODE_F_PROMISC;
> +	}
> +
> +	/* same for multicast */
> +	__dev_mc_sync(netdev, ionic_addr_add, ionic_addr_del);
> +	nfilters = le32_to_cpu(ident->lif.eth.max_mcast_filters);
> +	if (netdev_mc_count(netdev) > nfilters) {
> +		rx_mode |= RX_MODE_F_ALLMULTI;
> +		lif->mc_overflow = true;
> +	} else if (lif->mc_overflow) {
> +		lif->mc_overflow = false;
> +		if (!(netdev->flags & IFF_ALLMULTI))
> +			rx_mode &= ~RX_MODE_F_ALLMULTI;
> +	}
> +
> +	if (lif->rx_mode != rx_mode)
> +		_ionic_lif_rx_mode(lif, rx_mode);
> +}
> +
>  static int ionic_set_features(struct net_device *netdev,
>  			      netdev_features_t features)
>  {
> @@ -196,8 +471,26 @@ static int ionic_set_features(struct net_device
> *netdev,
>  
>  static int ionic_set_mac_address(struct net_device *netdev, void
> *sa)
>  {
> -	netdev_info(netdev, "%s: stubbed\n", __func__);
> -	return 0;
> +	struct sockaddr *addr = sa;
> +	u8 *mac;
> +
> +	mac = (u8 *)addr->sa_data;
> +	if (ether_addr_equal(netdev->dev_addr, mac))
> +		return 0;
> +
> +	if (!is_valid_ether_addr(mac))
> +		return -EADDRNOTAVAIL;
> +
> +	if (!is_zero_ether_addr(netdev->dev_addr)) {
> +		netdev_info(netdev, "deleting mac addr %pM\n",
> +			    netdev->dev_addr);
> +		ionic_addr_del(netdev, netdev->dev_addr);
> +	}
> +
> +	memcpy(netdev->dev_addr, mac, netdev->addr_len);
> +	netdev_info(netdev, "updating mac addr %pM\n", mac);
> +
> +	return ionic_addr_add(netdev, mac);
>  }
>  
>  static int ionic_change_mtu(struct net_device *netdev, int new_mtu)
> @@ -232,20 +525,63 @@ static void ionic_tx_timeout(struct net_device
> *netdev)
>  static int ionic_vlan_rx_add_vid(struct net_device *netdev, __be16
> proto,
>  				 u16 vid)
>  {
> -	netdev_info(netdev, "%s: stubbed\n", __func__);
> -	return 0;
> +	struct lif *lif = netdev_priv(netdev);
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.rx_filter_add = {
> +			.opcode = CMD_OPCODE_RX_FILTER_ADD,
> +			.lif_index = cpu_to_le16(lif->index),
> +			.match = cpu_to_le16(RX_FILTER_MATCH_VLAN),
> +			.vlan.vlan = cpu_to_le16(vid),
> +		},
> +	};
> +	int err;
> +
> +	err = ionic_adminq_post_wait(lif, &ctx);
> +	if (err)
> +		return err;
> +
> +	netdev_dbg(netdev, "rx_filter add VLAN %d (id %d)\n", vid,
> +		   ctx.comp.rx_filter_add.filter_id);
> +
> +	return ionic_rx_filter_save(lif, 0, RXQ_INDEX_ANY, 0, &ctx);
>  }
>  
>  static int ionic_vlan_rx_kill_vid(struct net_device *netdev, __be16
> proto,
>  				  u16 vid)
>  {
> -	netdev_info(netdev, "%s: stubbed\n", __func__);
> -	return 0;
> +	struct lif *lif = netdev_priv(netdev);
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.rx_filter_del = {
> +			.opcode = CMD_OPCODE_RX_FILTER_DEL,
> +			.lif_index = cpu_to_le16(lif->index),
> +		},
> +	};
> +	struct rx_filter *f;
> +
> +	spin_lock_bh(&lif->rx_filters.lock);
> +
> +	f = ionic_rx_filter_by_vlan(lif, vid);
> +	if (!f) {
> +		spin_unlock_bh(&lif->rx_filters.lock);
> +		return -ENOENT;
> +	}
> +
> +	netdev_dbg(netdev, "rx_filter del VLAN %d (id %d)\n", vid,
> +		   le32_to_cpu(ctx.cmd.rx_filter_del.filter_id));
> +
> +	ctx.cmd.rx_filter_del.filter_id = cpu_to_le32(f->filter_id);
> +	ionic_rx_filter_free(lif, f);
> +	spin_unlock_bh(&lif->rx_filters.lock);
> +
> +	return ionic_adminq_post_wait(lif, &ctx);
>  }
>  
>  static const struct net_device_ops ionic_netdev_ops = {
>  	.ndo_open               = ionic_open,
>  	.ndo_stop               = ionic_stop,
> +	.ndo_set_rx_mode	= ionic_set_rx_mode,
>  	.ndo_set_features	= ionic_set_features,
>  	.ndo_set_mac_address	= ionic_set_mac_address,
>  	.ndo_validate_addr	= eth_validate_addr,
> @@ -546,6 +882,10 @@ static struct lif *ionic_lif_alloc(struct ionic
> *ionic, unsigned int index)
>  
>  	spin_lock_init(&lif->adminq_lock);
>  
> +	spin_lock_init(&lif->deferred.lock);
> +	INIT_LIST_HEAD(&lif->deferred.list);
> +	INIT_WORK(&lif->deferred.work, ionic_lif_deferred_work);
> +
>  	/* allocate lif info */
>  	lif->info_sz = ALIGN(sizeof(*lif->info), PAGE_SIZE);
>  	lif->info = dma_alloc_coherent(dev, lif->info_sz,
> @@ -607,6 +947,8 @@ static void ionic_lif_free(struct lif *lif)
>  	ionic_qcqs_free(lif);
>  	ionic_lif_reset(lif);
>  

I don't think you want deferred.work running while reset is executing..
cancel_work_sync should happen as early as you close the netdevice.

I assume ionic_lif_reset will flush all configurations and you don't
need to cleanup anything manually?  or any data structure stored in
driver ?

> +	cancel_work_sync(&lif->deferred.work);
> +
>  	/* free lif info */
>  	dma_free_coherent(dev, lif->info_sz, lif->info, lif->info_pa);
>  	lif->info = NULL;
> @@ -975,6 +1317,37 @@ static int ionic_init_nic_features(struct lif
> *lif)
>  	return 0;
>  }
>  
> +static int ionic_station_set(struct lif *lif)
> +{
> +	struct net_device *netdev = lif->netdev;
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.lif_getattr = {
> +			.opcode = CMD_OPCODE_LIF_GETATTR,
> +			.index = cpu_to_le16(lif->index),
> +			.attr = IONIC_LIF_ATTR_MAC,
> +		},
> +	};
> +	int err;
> +
> +	err = ionic_adminq_post_wait(lif, &ctx);
> +	if (err)
> +		return err;
> +
> +	if (!is_zero_ether_addr(netdev->dev_addr)) {
> +		netdev_dbg(lif->netdev, "deleting station MAC addr
> %pM\n",
> +			   netdev->dev_addr);
> +		ionic_lif_addr(lif, netdev->dev_addr, false);
> +	}
> +	memcpy(netdev->dev_addr, ctx.comp.lif_getattr.mac,
> +	       netdev->addr_len);
> +	netdev_dbg(lif->netdev, "adding station MAC addr %pM\n",
> +		   netdev->dev_addr);
> +	ionic_lif_addr(lif, netdev->dev_addr, true);
> +
> +	return 0;
> +}
> +
>  static int ionic_lif_init(struct lif *lif)
>  {
>  	struct ionic_dev *idev = &lif->ionic->idev;
> @@ -1039,6 +1412,10 @@ static int ionic_lif_init(struct lif *lif)
>  	if (err)
>  		goto err_out_notifyq_deinit;
>  
> +	err = ionic_station_set(lif);
> +	if (err)
> +		goto err_out_notifyq_deinit;
> +
>  	set_bit(LIF_INITED, lif->state);
>  
>  	return 0;
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> index 9f112aa69033..20b4fa573f77 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> @@ -60,6 +60,29 @@ struct qcq {
>  #define napi_to_qcq(napi)	container_of(napi, struct qcq, napi)
>  #define napi_to_cq(napi)	(&napi_to_qcq(napi)->cq)
>  
> +enum ionic_deferred_work_type {
> +	DW_TYPE_RX_MODE,
> +	DW_TYPE_RX_ADDR_ADD,
> +	DW_TYPE_RX_ADDR_DEL,
> +	DW_TYPE_LINK_STATUS,
> +	DW_TYPE_LIF_RESET,
> +};
> +
> +struct ionic_deferred_work {
> +	struct list_head list;
> +	enum ionic_deferred_work_type type;
> +	union {
> +		unsigned int rx_mode;
> +		u8 addr[ETH_ALEN];
> +	};
> +};
> +
> +struct ionic_deferred {
> +	spinlock_t lock;		/* lock for deferred work list */
> +	struct list_head list;
> +	struct work_struct work;
> +};
> +
>  enum lif_state_flags {
>  	LIF_INITED,
>  	LIF_UP,
> @@ -87,13 +110,19 @@ struct lif {
>  	u64 last_eid;
>  	unsigned int neqs;
>  	unsigned int nxqs;
> +	unsigned int rx_mode;
>  	u64 hw_features;
> +	bool mc_overflow;
> +	unsigned int nmcast;
> +	bool uc_overflow;
> +	unsigned int nucast;
>  
>  	struct lif_info *info;
>  	dma_addr_t info_pa;
>  	u32 info_sz;
>  
>  	struct rx_filters rx_filters;
> +	struct ionic_deferred deferred;
>  	unsigned long *dbid_inuse;
>  	unsigned int dbid_count;
>  	struct dentry *dentry;

^ permalink raw reply

* Re: [PATCH] fix noderef.cocci warnings
From: Nikolay Aleksandrov @ 2019-07-23 23:21 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Martin Weinelt, bridge, Roopa Prabhu, netdev
In-Reply-To: <20190723225458.GA3376@lkp-kbuild04>

On 7/24/19 1:54 AM, kbuild test robot wrote:
> From: kbuild test robot <lkp@intel.com>
> 
> net/bridge/br_multicast.c:999:8-14: ERROR: application of sizeof to pointer
> 
>  sizeof when applied to a pointer typed expression gives the size of
>  the pointer
> 
> Generated by: scripts/coccinelle/misc/noderef.cocci
> 
> Fixes: 17c91348ed8b ("Use-after-free in br_multicast_rcv")
> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Signed-off-by: kbuild test robot <lkp@intel.com>
> ---
> 
> url:    https://github.com/0day-ci/linux/commits/Nikolay-Aleksandrov/net-bridge-mcast-fix-possible-uses-of-stale-pointers/20190702-083354
> 
>  br_multicast.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -996,7 +996,7 @@ static int br_ip6_multicast_mld2_report(
>  			return -EINVAL;
>  
>  		_nsrcs = skb_header_pointer(skb, nsrcs_offset,
> -					   sizeof(_nsrcs), &__nsrcs);
> +					   sizeof(*_nsrcs), &__nsrcs);
>  		if (!_nsrcs)
>  			return -EINVAL;
>  
> 

This must be quite old, I already sent a proper patch without this error.
This one was sent just for testing, hence the TEST in $subject.

 [PATCH TEST] net: bridge: mcast: fix possible uses of stale pointers

Thanks,
 Nik


^ permalink raw reply

* Re: [PATCH v4 net-next 02/19] ionic: Add hardware init and device commands
From: Shannon Nelson @ 2019-07-23 23:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20190723.160538.2065000079755912945.davem@davemloft.net>

On 7/23/19 4:05 PM, David Miller wrote:
> From: Shannon Nelson <snelson@pensando.io>
> Date: Tue, 23 Jul 2019 15:50:22 -0700
>
>> On 7/23/19 2:18 PM, David Miller wrote:
>>> From: Shannon Nelson <snelson@pensando.io>
>>> Date: Mon, 22 Jul 2019 14:40:06 -0700
>>>
>>>> +void ionic_init_devinfo(struct ionic_dev *idev)
>>>> +{
>>>> + idev->dev_info.asic_type = ioread8(&idev->dev_info_regs->asic_type);
>>>> + idev->dev_info.asic_rev = ioread8(&idev->dev_info_regs->asic_rev);
>>>> +
>>>> +	memcpy_fromio(idev->dev_info.fw_version,
>>>> +		      idev->dev_info_regs->fw_version,
>>>> +		      IONIC_DEVINFO_FWVERS_BUFLEN);
>>>> +
>>>> +	memcpy_fromio(idev->dev_info.serial_num,
>>>> +		      idev->dev_info_regs->serial_num,
>>>> +		      IONIC_DEVINFO_SERIAL_BUFLEN);
>>>    ...
>>>> +	sig = ioread32(&idev->dev_info_regs->signature);
>>> I think if you are going to use the io{read,write}{8,16,32,64}()
>>> interfaces then you should use io{read,write}{8,16,32,64}_rep()
>>> instead of memcpy_{to,from}io().
>>>
>> Sure.
> Note, I could be wrong.  Please test.
>
> I think the operation of the two things might be different.

It seems to me that memcpy() usually just does the right thing in most 
cases, so that's what I went with.  Looking into some of the 
definitions, and at how I used memcpy_...(), I think there are some 
appropriate ways to use ioread32_rep() in a couple of my cases, and 
another case or two where the memcpy variant may not make much 
difference with ioread8_rep().  It's also possible that sparse may have 
an opinion.  I'll look at them.

sln


^ permalink raw reply

* Re: [PATCH iproute2] etf: make printing of variable JSON friendly
From: Stephen Hemminger @ 2019-07-23 23:23 UTC (permalink / raw)
  To: Patel, Vedang
  Cc: David Ahern, netdev@vger.kernel.org, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Gomes, Vinicius, Dorileo, Leandro
In-Reply-To: <8BC34CA3-C500-4188-BDBA-4B2B7E9F1EE2@intel.com>

On Tue, 23 Jul 2019 21:34:46 +0000
"Patel, Vedang" <vedang.patel@intel.com> wrote:

> > On Jul 22, 2019, at 5:11 PM, David Ahern <dsahern@gmail.com> wrote:
> > 
> > On 7/22/19 1:11 PM, Patel, Vedang wrote:  
> >> 
> >>   
> >>> On Jul 22, 2019, at 11:21 AM, David Ahern <dsahern@gmail.com> wrote:
> >>> 
> >>> On 7/19/19 3:40 PM, Vedang Patel wrote:  
> >>>> In iproute2 txtime-assist series, it was pointed out that print_bool()
> >>>> should be used to print binary values. This is to make it JSON friendly.
> >>>> 
> >>>> So, make the corresponding changes in ETF.
> >>>> 
> >>>> Fixes: 8ccd49383cdc ("etf: Add skip_sock_check")
> >>>> Reported-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>> Signed-off-by: Vedang Patel <vedang.patel@intel.com>
> >>>> ---
> >>>> tc/q_etf.c | 12 ++++++------
> >>>> 1 file changed, 6 insertions(+), 6 deletions(-)
> >>>> 
> >>>> diff --git a/tc/q_etf.c b/tc/q_etf.c
> >>>> index c2090589bc64..307c50eed48b 100644
> >>>> --- a/tc/q_etf.c
> >>>> +++ b/tc/q_etf.c
> >>>> @@ -176,12 +176,12 @@ static int etf_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
> >>>> 		     get_clock_name(qopt->clockid));
> >>>> 
> >>>> 	print_uint(PRINT_ANY, "delta", "delta %d ", qopt->delta);
> >>>> -	print_string(PRINT_ANY, "offload", "offload %s ",
> >>>> -				(qopt->flags & TC_ETF_OFFLOAD_ON) ? "on" : "off");
> >>>> -	print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s ",
> >>>> -				(qopt->flags & TC_ETF_DEADLINE_MODE_ON) ? "on" : "off");
> >>>> -	print_string(PRINT_ANY, "skip_sock_check", "skip_sock_check %s",
> >>>> -				(qopt->flags & TC_ETF_SKIP_SOCK_CHECK) ? "on" : "off");
> >>>> +	if (qopt->flags & TC_ETF_OFFLOAD_ON)
> >>>> +		print_bool(PRINT_ANY, "offload", "offload ", true);
> >>>> +	if (qopt->flags & TC_ETF_DEADLINE_MODE_ON)
> >>>> +		print_bool(PRINT_ANY, "deadline_mode", "deadline_mode ", true);
> >>>> +	if (qopt->flags & TC_ETF_SKIP_SOCK_CHECK)
> >>>> +		print_bool(PRINT_ANY, "skip_sock_check", "skip_sock_check", true);
> >>>> 
> >>>> 	return 0;
> >>>> }
> >>>>   
> >>> 
> >>> This changes existing output for TC_ETF_OFFLOAD_ON and
> >>> TC_ETF_DEADLINE_MODE_ON which were added a year ago.  
> >> Yes, this is a good point. I missed that. 
> >> 
> >> Another idea is to use is_json_context() and call print_bool() there. But, that will still change values corresponding to the json output for the above flags from “on”/“off” to “true”/“false”. I am not sure if this is a big issue. 
> >> 
> >> My suggestion is to keep the code as is. what do you think?
> >>   
> > 
> > I think we need automated checkers for new code. ;-)
> > 
> > The first 2 should not change for backward compatibility - unless there
> > is agreement that this feature is too new and long term it is better to
> > print as above.
> > 
> > Then the new one should follow context of the other 2 - consistency IMHO
> > takes precedence.  
> Thanks for the inputs. 
> 
> Let’s keep whatever is currently present upstream and you can ignore this patch.
> 
> Thanks,
> Vedang
Agreed. At this point consistency is better.
Maybe at some future point, all the JSON will be reviewed and fixed (yes it would be a breaking flag day).
But for now inconsistent usage across ip, tc, and devlink is a fact of life.

^ permalink raw reply

* Re: [PATCH 3/3] net/xdp: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-07-23 23:24 UTC (permalink / raw)
  To: Ira Weiny
  Cc: john.hubbard, Andrew Morton, Alexander Viro,
	Björn Töpel, Boaz Harrosh, Christoph Hellwig,
	Daniel Vetter, Dan Williams, Dave Chinner, David Airlie,
	David S . Miller, Ilya Dryomov, Jan Kara, Jason Gunthorpe,
	Jens Axboe, Jérôme Glisse, Johannes Thumshirn,
	Magnus Karlsson, Matthew Wilcox, Miklos Szeredi, Ming Lei,
	Sage Weil, Santosh Shilimkar, Yan Zheng, netdev, dri-devel,
	linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <20190723180612.GB29729@iweiny-DESK2.sc.intel.com>

On 7/23/19 11:06 AM, Ira Weiny wrote:
> On Mon, Jul 22, 2019 at 09:41:34PM -0700, John Hubbard wrote:
>> On 7/22/19 5:25 PM, Ira Weiny wrote:
>>> On Mon, Jul 22, 2019 at 03:34:15PM -0700, john.hubbard@gmail.com wrote:
...
>> Obviously, this stuff is all subject to a certain amount of opinion, but I
>> think I'm on really solid ground as far as precedent goes. So I'm pushing
>> back on the NAK... :)
> 
> Fair enough...  However, we have discussed in the past how GUP can be a
> confusing interface to use.
> 
> So I'd like to see it be more directed.  Only using the __put_user_pages()
> version allows us to ID callers easier through a grep of PUP_FLAGS_DIRTY_LOCK
> in addition to directing users to use that interface rather than having to read
> the GUP code to figure out that the 2 calls above are equal.  It is not a huge
> deal but...
> 

OK, combining all the feedback to date, which is:

* the leading double underscore is unloved,

* set_page_dirty() is under investigation, but likely guilty of incitement
  to cause bugs,


...we end up with this:

void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
			       bool make_dirty)

...which I have a v2 patchset for, ready to send out. It makes IB all pretty 
too. :)


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: memory leak in rds_send_probe
From: Eric Biggers @ 2019-07-23 23:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: syzbot, catalin.marinas, davem, dvyukov, jack, kirill.shutemov,
	koct9i, linux-kernel, linux-mm, linux-rdma, neilb, netdev,
	rds-devel, ross.zwisler, santosh.shilimkar, syzkaller-bugs,
	torvalds, willy
In-Reply-To: <20190723152336.29ed51551d8c9600bb316b52@linux-foundation.org>

On Tue, Jul 23, 2019 at 03:23:36PM -0700, Andrew Morton wrote:
> On Tue, 23 Jul 2019 15:17:00 -0700 syzbot <syzbot+5134cdf021c4ed5aaa5f@syzkaller.appspotmail.com> wrote:
> 
> > syzbot has bisected this bug to:
> > 
> > commit af49a63e101eb62376cc1d6bd25b97eb8c691d54
> > Author: Matthew Wilcox <willy@linux.intel.com>
> > Date:   Sat May 21 00:03:33 2016 +0000
> > 
> >      radix-tree: change naming conventions in radix_tree_shrink
> > 
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=176528c8600000
> > start commit:   c6dd78fc Merge branch 'x86-urgent-for-linus' of git://git...
> > git tree:       upstream
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=14e528c8600000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=10e528c8600000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=8de7d700ea5ac607
> > dashboard link: https://syzkaller.appspot.com/bug?extid=5134cdf021c4ed5aaa5f
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=145df0c8600000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=170001f4600000
> > 
> > Reported-by: syzbot+5134cdf021c4ed5aaa5f@syzkaller.appspotmail.com
> > Fixes: af49a63e101e ("radix-tree: change naming conventions in  
> > radix_tree_shrink")
> > 
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> 
> That's rather hard to believe.  af49a63e101eb6237 simply renames a
> couple of local variables.
> 

It's been known for months (basically ever since bisection was added) that about
50% of syzbot bisection results are obviously incorrect, often a commit selected
at random.  Unfortunately, the people actually funded to work on syzbot
apparently don't consider fixing this to be high priority issue, so we have to
live with it for now.  Or until someone volunteers to fix it themselves; source
is at https://github.com/google/syzkaller.

So for now, please don't waste much time on bisection results that look wonky.

But please do pay attention to any bisection results in reminders I've been
sending like "Reminder: 10 open syzbot bugs in foo subsystem", since I've
manually reviewed those to exclude the obviously wrong results...

- Eric

^ permalink raw reply

* [PATCH] fsl/fman: Remove comment referring to non-existent function
From: Chris Packham @ 2019-07-23 23:35 UTC (permalink / raw)
  To: madalin.bucur, davem; +Cc: netdev, linux-kernel, Chris Packham

fm_set_max_frm() existed in the Freescale SDK as a callback for an
early_param. When this code was ported to the upstream kernel the
early_param was converted to a module_param making the reference to the
function incorrect. The rest of the comment already does a good job of
explaining the parameter so removing the reference to the non-existent
function seems like the best thing to do.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
---
 drivers/net/ethernet/freescale/fman/fman.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index e80fedb27cee..210749bf1eac 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -2439,9 +2439,6 @@ MODULE_PARM_DESC(fsl_fm_rx_extra_headroom, "Extra headroom for Rx buffers");
  * buffers when not using jumbo frames.
  * Must be large enough to accommodate the network MTU, but small enough
  * to avoid wasting skb memory.
- *
- * Could be overridden once, at boot-time, via the
- * fm_set_max_frm() callback.
  */
 static int fsl_fm_max_frm = FSL_FM_MAX_FRAME_SIZE;
 module_param(fsl_fm_max_frm, int, 0);
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH v4 net-next 02/19] ionic: Add hardware init and device commands
From: Saeed Mahameed @ 2019-07-23 23:47 UTC (permalink / raw)
  To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190722214023.9513-3-snelson@pensando.io>

On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> The ionic device has a small set of PCI registers, including a
> device control and data space, and a large set of message
> commands.
> 
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  drivers/net/ethernet/pensando/ionic/Makefile  |    2 +-
>  drivers/net/ethernet/pensando/ionic/ionic.h   |   20 +
>  .../net/ethernet/pensando/ionic/ionic_bus.h   |    1 +
>  .../ethernet/pensando/ionic/ionic_bus_pci.c   |  140 +-
>  .../ethernet/pensando/ionic/ionic_debugfs.c   |   67 +
>  .../ethernet/pensando/ionic/ionic_debugfs.h   |   28 +
>  .../net/ethernet/pensando/ionic/ionic_dev.c   |  132 +
>  .../net/ethernet/pensando/ionic/ionic_dev.h   |  144 +
>  .../net/ethernet/pensando/ionic/ionic_if.h    | 2552
> +++++++++++++++++
>  .../net/ethernet/pensando/ionic/ionic_main.c  |  296 ++
>  .../net/ethernet/pensando/ionic/ionic_regs.h  |  133 +
>  11 files changed, 3512 insertions(+), 3 deletions(-)
>  create mode 100644
> drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
>  create mode 100644
> drivers/net/ethernet/pensando/ionic/ionic_debugfs.h
>  create mode 100644 drivers/net/ethernet/pensando/ionic/ionic_dev.c
>  create mode 100644 drivers/net/ethernet/pensando/ionic/ionic_dev.h
>  create mode 100644 drivers/net/ethernet/pensando/ionic/ionic_if.h
>  create mode 100644 drivers/net/ethernet/pensando/ionic/ionic_regs.h
> 

[...]
 
>  static void ionic_remove(struct pci_dev *pdev)
>  {
>  	struct ionic *ionic = pci_get_drvdata(pdev);
>  
> -	devm_kfree(&pdev->dev, ionic);
> +	if (ionic) {

nit, in case you are doing another re-spin  maybe early return here:
if (!ionic) 
     return;
//do stuff 

> +		ionic_reset(ionic);
> +		ionic_dev_teardown(ionic);
> +		ionic_unmap_bars(ionic);
> +		pci_release_regions(pdev);
> +		pci_clear_master(pdev);
> +		pci_disable_sriov(pdev);
> +		pci_disable_device(pdev);
> +		ionic_debugfs_del_dev(ionic);
> +		mutex_destroy(&ionic->dev_cmd_lock);
> +
> +		devm_kfree(&pdev->dev, ionic);
> +	}
>  }
> 

[...]

>  
> +
> +/* Devcmd Interface */
> +u8 ionic_dev_cmd_status(struct ionic_dev *idev)
> +{
> +	return ioread8(&idev->dev_cmd_regs->comp.comp.status);
> +}
> +
> +bool ionic_dev_cmd_done(struct ionic_dev *idev)
> +{
> +	return ioread32(&idev->dev_cmd_regs->done) & DEV_CMD_DONE;
> +}
> +
> +void ionic_dev_cmd_comp(struct ionic_dev *idev, union dev_cmd_comp
> *comp)
> +{
> +	memcpy_fromio(comp, &idev->dev_cmd_regs->comp, sizeof(*comp));
> +}
> +
> +void ionic_dev_cmd_go(struct ionic_dev *idev, union dev_cmd *cmd)
> +{
> +	memcpy_toio(&idev->dev_cmd_regs->cmd, cmd, sizeof(*cmd));
> +	iowrite32(0, &idev->dev_cmd_regs->done);
> +	iowrite32(1, &idev->dev_cmd_regs->doorbell);
> +}
> +
> +/* Device commands */
> +void ionic_dev_cmd_identify(struct ionic_dev *idev, u8 ver)
> +{
> +	union dev_cmd cmd = {
> +		.identify.opcode = CMD_OPCODE_IDENTIFY,
> +		.identify.ver = ver,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_init(struct ionic_dev *idev)
> +{
> +	union dev_cmd cmd = {
> +		.init.opcode = CMD_OPCODE_INIT,
> +		.init.type = 0,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}
> +
> +void ionic_dev_cmd_reset(struct ionic_dev *idev)
> +{
> +	union dev_cmd cmd = {
> +		.reset.opcode = CMD_OPCODE_RESET,
> +	};
> +
> +	ionic_dev_cmd_go(idev, &cmd);
> +}

[...]

> +int ionic_dev_cmd_wait(struct ionic *ionic, unsigned long
> max_seconds)
> +{
> +	struct ionic_dev *idev = &ionic->idev;
> +	unsigned long max_wait, start_time, duration;
> +	int opcode;
> +	int done;
> +	int err;
> +
> +	WARN_ON(in_interrupt());
> +
> +	/* Wait for dev cmd to complete, retrying if we get EAGAIN,
> +	 * but don't wait any longer than max_seconds.
> +	 */
> +	max_wait = jiffies + (max_seconds * HZ);
> +try_again:
> +	start_time = jiffies;
> +	do {
> +		done = ionic_dev_cmd_done(idev);

READ_ONCE required here ? to read from coherent memory modified
by the device and read by the driver ?

> +		if (done)
> +			break;
> +		msleep(20);
> +	} while (!done && time_before(jiffies, max_wait));

so your command interface is busy polling based, i am relating here to
Dave's comment regarding async command completion, is it possible to
have interrupt (MSIX?) based command completion in this hw ? 

> +	duration = jiffies - start_time;
> +
> +	opcode = idev->dev_cmd_regs->cmd.cmd.opcode;
> +	dev_dbg(ionic->dev, "DEVCMD %s (%d) done=%d took %ld secs (%ld
> jiffies)\n",
> +		ionic_opcode_to_str(opcode), opcode,
> +		done, duration / HZ, duration);
> +
> +	if (!done && !time_before(jiffies, max_wait)) {
> +		dev_warn(ionic->dev, "DEVCMD %s (%d) timeout after %ld
> secs\n",
> +			 ionic_opcode_to_str(opcode), opcode,
> max_seconds);
> +		return -ETIMEDOUT;
> +	}
> +
> +	err = ionic_dev_cmd_status(&ionic->idev);
> +	if (err) {
> +		if (err == IONIC_RC_EAGAIN && !time_after(jiffies,
> max_wait)) {
> +			dev_err(ionic->dev, "DEV_CMD %s (%d) error, %s
> (%d) retrying...\n",
> +				ionic_opcode_to_str(opcode), opcode,
> +				ionic_error_to_str(err), err);
> +
> +			msleep(1000);
> +			iowrite32(0, &idev->dev_cmd_regs->done);
> +			iowrite32(1, &idev->dev_cmd_regs->doorbell);
> +			goto try_again;
> +		}
> +
> +		dev_err(ionic->dev, "DEV_CMD %s (%d) error, %s (%d)
> failed\n",
> +			ionic_opcode_to_str(opcode), opcode,
> +			ionic_error_to_str(err), err);
> +
> +		return ionic_error_to_errno(err);
> +	}
> +
> +	return 0;
> +}
> +
> 

[...]

^ permalink raw reply

* Re: [PATCH v4 net-next 11/19] ionic: Add Rx filter and rx_mode ndo support
From: Saeed Mahameed @ 2019-07-23 23:54 UTC (permalink / raw)
  To: snelson@pensando.io, davem@davemloft.net; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190723.160628.20093803405793483.davem@davemloft.net>

On Tue, 2019-07-23 at 16:06 -0700, David Miller wrote:
> From: Shannon Nelson <snelson@pensando.io>
> Date: Tue, 23 Jul 2019 15:50:43 -0700
> 
> > On 7/23/19 2:33 PM, David Miller wrote:
> > > Generally interface address changes are expected to be
> > > synchronous.
> > Yeah, this bothers me a bit as well, but the address change calls
> > come
> > in under spin_lock_bh(), and I'm reluctant to make an AdminQ call
> > under the _bh that could block for a few seconds.
> 
> So it's not about memory allocation but rather the fact that the
> device
> might take a while to complete?
> 
> Can you start the operation synchronously yet complete it async?

The driver is doing busy polling on command completion, doing only busy
polling on completion status in the deferred work will not be much
different than what they have now.. 

async completion will only make since if the hardware supports
interrupt based (MSI-x) command completion.

^ permalink raw reply

* [PATCH bpf-next v10 0/2] bpf: Allow bpf_skb_event_output for more prog types
From: Allan Zhang @ 2019-07-24  0:07 UTC (permalink / raw)
  To: netdev, bpf, songliubraving, daniel, andrii.nakryiko; +Cc: ast, Allan Zhang

Software event output is only enabled by a few prog types right now (TC,
LWT out, XDP, sockops). Many other skb based prog types need
bpf_skb_event_output to produce software event.

More prog types are enabled to access bpf_skb_event_output in this
patch.

v10 changes:
Resubmit (v9 is submitted when bpf branch is closed).

v9 changes:
add "Acked-by" field.

v8 changes:
No actual change, just cc to netdev@vger.kernel.org and
bpf@vger.kernel.org.
v7 patches are acked by Song Liu.

v7 changes:
Reformat from hints by scripts/checkpatch.pl, including Song's comment
on signed-off-by name to captical case in cover letter.
3 of hints are ignored:
1. new file mode.
2. SPDX-License-Identifier for event_output.c since all files under
   this dir have no such line.
3. "Macros ... enclosed in parentheses" for macro in event_output.c
   due to code's nature.

Change patch 02 subject "bpf:..." to "selftests/bpf:..."

v6 changes:
Fix Signed-off-by, fix fixup map creation.

v5 changes:
Fix typos, reformat comments in event_output.c, move revision history to
cover letter.

v4 changes:
Reformating log message.

v3 changes:
Reformating log message.

v2 changes:
Reformating log message.

Allan Zhang (2):
  bpf: Allow bpf_skb_event_output for a few prog types
  selftests/bpf: Add selftests for bpf_perf_event_output

 net/core/filter.c                             |  6 ++
 tools/testing/selftests/bpf/test_verifier.c   | 12 ++-
 .../selftests/bpf/verifier/event_output.c     | 94 +++++++++++++++++++
 3 files changed, 111 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/event_output.c

-- 
2.22.0.410.gd8fdbe21b5-goog

^ permalink raw reply

* [PATCH bpf-next v10 1/2] bpf: Allow bpf_skb_event_output for a few prog types
From: Allan Zhang @ 2019-07-24  0:07 UTC (permalink / raw)
  To: netdev, bpf, songliubraving, daniel, andrii.nakryiko; +Cc: ast, Allan Zhang
In-Reply-To: <20190724000725.15634-1-allanzhang@google.com>

Software event output is only enabled by a few prog types right now (TC,
LWT out, XDP, sockops). Many other skb based prog types need
bpf_skb_event_output to produce software event.

Added socket_filter, cg_skb, sk_skb prog types to generate sw event.

Test bpf code is generated from code snippet:

struct TMP {
    uint64_t tmp;
} tt;
tt.tmp = 5;
bpf_perf_event_output(skb, &connection_tracking_event_map, 0,
                      &tt, sizeof(tt));
return 1;

the bpf assembly from llvm is:
       0:       b7 02 00 00 05 00 00 00         r2 = 5
       1:       7b 2a f8 ff 00 00 00 00         *(u64 *)(r10 - 8) = r2
       2:       bf a4 00 00 00 00 00 00         r4 = r10
       3:       07 04 00 00 f8 ff ff ff         r4 += -8
       4:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00    r2 = 0ll
       6:       b7 03 00 00 00 00 00 00         r3 = 0
       7:       b7 05 00 00 08 00 00 00         r5 = 8
       8:       85 00 00 00 19 00 00 00         call 25
       9:       b7 00 00 00 01 00 00 00         r0 = 1
      10:       95 00 00 00 00 00 00 00         exit

Signed-off-by: Allan Zhang <allanzhang@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
 net/core/filter.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 4e2a79b2fd77..3961437ccc50 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5999,6 +5999,8 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_cookie_proto;
 	case BPF_FUNC_get_socket_uid:
 		return &bpf_get_socket_uid_proto;
+	case BPF_FUNC_perf_event_output:
+		return &bpf_skb_event_output_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -6019,6 +6021,8 @@ cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sk_storage_get_proto;
 	case BPF_FUNC_sk_storage_delete:
 		return &bpf_sk_storage_delete_proto;
+	case BPF_FUNC_perf_event_output:
+		return &bpf_skb_event_output_proto;
 #ifdef CONFIG_SOCK_CGROUP_DATA
 	case BPF_FUNC_skb_cgroup_id:
 		return &bpf_skb_cgroup_id_proto;
@@ -6267,6 +6271,8 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sk_redirect_map_proto;
 	case BPF_FUNC_sk_redirect_hash:
 		return &bpf_sk_redirect_hash_proto;
+	case BPF_FUNC_perf_event_output:
+		return &bpf_skb_event_output_proto;
 #ifdef CONFIG_INET
 	case BPF_FUNC_sk_lookup_tcp:
 		return &bpf_sk_lookup_tcp_proto;
-- 
2.22.0.709.g102302147b-goog


^ permalink raw reply related

* [PATCH bpf-next v10 2/2] selftests/bpf: Add selftests for bpf_perf_event_output
From: Allan Zhang @ 2019-07-24  0:07 UTC (permalink / raw)
  To: netdev, bpf, songliubraving, daniel, andrii.nakryiko; +Cc: ast, Allan Zhang
In-Reply-To: <20190724000725.15634-1-allanzhang@google.com>

Software event output is only enabled by a few prog types.
This test is to ensure that all supported types are enabled for
bpf_perf_event_output successfully.

Signed-off-by: Allan Zhang <allanzhang@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
 tools/testing/selftests/bpf/test_verifier.c   | 12 ++-
 .../selftests/bpf/verifier/event_output.c     | 94 +++++++++++++++++++
 2 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/event_output.c

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 84135d5f4b35..44e2d640b088 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -50,7 +50,7 @@
 #define MAX_INSNS	BPF_MAXINSNS
 #define MAX_TEST_INSNS	1000000
 #define MAX_FIXUPS	8
-#define MAX_NR_MAPS	18
+#define MAX_NR_MAPS	19
 #define MAX_TEST_RUNS	8
 #define POINTER_VALUE	0xcafe4all
 #define TEST_DATA_LEN	64
@@ -84,6 +84,7 @@ struct bpf_test {
 	int fixup_map_array_wo[MAX_FIXUPS];
 	int fixup_map_array_small[MAX_FIXUPS];
 	int fixup_sk_storage_map[MAX_FIXUPS];
+	int fixup_map_event_output[MAX_FIXUPS];
 	const char *errstr;
 	const char *errstr_unpriv;
 	uint32_t insn_processed;
@@ -632,6 +633,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	int *fixup_map_array_wo = test->fixup_map_array_wo;
 	int *fixup_map_array_small = test->fixup_map_array_small;
 	int *fixup_sk_storage_map = test->fixup_sk_storage_map;
+	int *fixup_map_event_output = test->fixup_map_event_output;
 
 	if (test->fill_helper) {
 		test->fill_insns = calloc(MAX_TEST_INSNS, sizeof(struct bpf_insn));
@@ -793,6 +795,14 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 			fixup_sk_storage_map++;
 		} while (*fixup_sk_storage_map);
 	}
+	if (*fixup_map_event_output) {
+		map_fds[18] = __create_map(BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+					   sizeof(int), sizeof(int), 1, 0);
+		do {
+			prog[*fixup_map_event_output].imm = map_fds[18];
+			fixup_map_event_output++;
+		} while (*fixup_map_event_output);
+	}
 }
 
 static int set_admin(bool admin)
diff --git a/tools/testing/selftests/bpf/verifier/event_output.c b/tools/testing/selftests/bpf/verifier/event_output.c
new file mode 100644
index 000000000000..130553e19eca
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/event_output.c
@@ -0,0 +1,94 @@
+/* instructions used to output a skb based software event, produced
+ * from code snippet:
+ * struct TMP {
+ *  uint64_t tmp;
+ * } tt;
+ * tt.tmp = 5;
+ * bpf_perf_event_output(skb, &connection_tracking_event_map, 0,
+ *			 &tt, sizeof(tt));
+ * return 1;
+ *
+ * the bpf assembly from llvm is:
+ *        0:       b7 02 00 00 05 00 00 00         r2 = 5
+ *        1:       7b 2a f8 ff 00 00 00 00         *(u64 *)(r10 - 8) = r2
+ *        2:       bf a4 00 00 00 00 00 00         r4 = r10
+ *        3:       07 04 00 00 f8 ff ff ff         r4 += -8
+ *        4:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00    r2 = 0ll
+ *        6:       b7 03 00 00 00 00 00 00         r3 = 0
+ *        7:       b7 05 00 00 08 00 00 00         r5 = 8
+ *        8:       85 00 00 00 19 00 00 00         call 25
+ *        9:       b7 00 00 00 01 00 00 00         r0 = 1
+ *       10:       95 00 00 00 00 00 00 00         exit
+ *
+ *     The reason I put the code here instead of fill_helpers is that map fixup
+ *     is against the insns, instead of filled prog.
+ */
+
+#define __PERF_EVENT_INSNS__					\
+	BPF_MOV64_IMM(BPF_REG_2, 5),				\
+	BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -8),		\
+	BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),			\
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),			\
+	BPF_LD_MAP_FD(BPF_REG_2, 0),				\
+	BPF_MOV64_IMM(BPF_REG_3, 0),				\
+	BPF_MOV64_IMM(BPF_REG_5, 8),				\
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,		\
+		     BPF_FUNC_perf_event_output),		\
+	BPF_MOV64_IMM(BPF_REG_0, 1),				\
+	BPF_EXIT_INSN(),
+{
+	"perfevent for sockops",
+	.insns = { __PERF_EVENT_INSNS__ },
+	.prog_type = BPF_PROG_TYPE_SOCK_OPS,
+	.fixup_map_event_output = { 4 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"perfevent for tc",
+	.insns =  { __PERF_EVENT_INSNS__ },
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_event_output = { 4 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"perfevent for lwt out",
+	.insns =  { __PERF_EVENT_INSNS__ },
+	.prog_type = BPF_PROG_TYPE_LWT_OUT,
+	.fixup_map_event_output = { 4 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"perfevent for xdp",
+	.insns =  { __PERF_EVENT_INSNS__ },
+	.prog_type = BPF_PROG_TYPE_XDP,
+	.fixup_map_event_output = { 4 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"perfevent for socket filter",
+	.insns =  { __PERF_EVENT_INSNS__ },
+	.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
+	.fixup_map_event_output = { 4 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"perfevent for sk_skb",
+	.insns =  { __PERF_EVENT_INSNS__ },
+	.prog_type = BPF_PROG_TYPE_SK_SKB,
+	.fixup_map_event_output = { 4 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"perfevent for cgroup skb",
+	.insns =  { __PERF_EVENT_INSNS__ },
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	.fixup_map_event_output = { 4 },
+	.result = ACCEPT,
+	.retval = 1,
+},
-- 
2.22.0.709.g102302147b-goog


^ permalink raw reply related

* Re: [PATCH v4 net-next 11/19] ionic: Add Rx filter and rx_mode ndo support
From: Shannon Nelson @ 2019-07-24  0:07 UTC (permalink / raw)
  To: Saeed Mahameed, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <1ce8b25ccb5632a33be6d18714aadfdabd4105ce.camel@mellanox.com>

On 7/23/19 4:20 PM, Saeed Mahameed wrote:
> On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
>>

>> @@ -607,6 +947,8 @@ static void ionic_lif_free(struct lif *lif)
>>   	ionic_qcqs_free(lif);
>>   	ionic_lif_reset(lif);
>>   
> I don't think you want deferred.work running while reset is executing..
> cancel_work_sync should happen as early as you close the netdevice.

Given the current implementation, it doesn't actually hurt anything, but 
yes it makes sense to move it up in the sequence.

> I assume ionic_lif_reset will flush all configurations and you don't
> need to cleanup anything manually?  or any data structure stored in
> driver ?

Most of the driver data structure cleaning has happened in 
ionic_lif_deinit() before getting here.

sln


^ permalink raw reply

* Re: [bpf-next 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Petar Penkov @ 2019-07-24  0:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Petar Penkov, Networking, bpf, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, lmb,
	Stanislav Fomichev
In-Reply-To: <8736ix3p8h.fsf@toke.dk>

On Tue, Jul 23, 2019 at 5:33 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Petar Penkov <ppenkov.kernel@gmail.com> writes:
>
> > From: Petar Penkov <ppenkov@google.com>
> >
> > This helper function allows BPF programs to try to generate SYN
> > cookies, given a reference to a listener socket. The function works
> > from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
> > socket in both cases.
> >
> > Signed-off-by: Petar Penkov <ppenkov@google.com>
> > Suggested-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  include/uapi/linux/bpf.h | 30 ++++++++++++++++-
> >  net/core/filter.c        | 73 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 102 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 6f68438aa4ed..20baee7b2219 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2713,6 +2713,33 @@ union bpf_attr {
> >   *           **-EPERM** if no permission to send the *sig*.
> >   *
> >   *           **-EAGAIN** if bpf program can try again.
> > + *
> > + * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
> > + *   Description
> > + *           Try to issue a SYN cookie for the packet with corresponding
> > + *           IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
> > + *
> > + *           *iph* points to the start of the IPv4 or IPv6 header, while
> > + *           *iph_len* contains **sizeof**\ (**struct iphdr**) or
> > + *           **sizeof**\ (**struct ip6hdr**).
> > + *
> > + *           *th* points to the start of the TCP header, while *th_len*
> > + *           contains the length of the TCP header.
> > + *
> > + *   Return
> > + *           On success, lower 32 bits hold the generated SYN cookie in
> > + *           followed by 16 bits which hold the MSS value for that cookie,
> > + *           and the top 16 bits are unused.
> > + *
> > + *           On failure, the returned value is one of the following:
> > + *
> > + *           **-EINVAL** SYN cookie cannot be issued due to error
> > + *
> > + *           **-ENOENT** SYN cookie should not be issued (no SYN flood)
> > + *
> > + *           **-ENOTSUPP** kernel configuration does not enable SYN
> > cookies
>
> nit: This should be EOPNOTSUPP - the other one is for NFS...
Will correct this in a v2, thanks for catching that!

>
> > + *
> > + *           **-EPROTONOSUPPORT** IP packet version is not 4 or 6
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)                \
> >       FN(unspec),                     \
> > @@ -2824,7 +2851,8 @@ union bpf_attr {
> >       FN(strtoul),                    \
> >       FN(sk_storage_get),             \
> >       FN(sk_storage_delete),          \
> > -     FN(send_signal),
> > +     FN(send_signal),                \
> > +     FN(tcp_gen_syncookie),
> >
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >   * function eBPF program intends to call
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 47f6386fb17a..92114271eff6 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5850,6 +5850,75 @@ static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
> >       .arg5_type      = ARG_CONST_SIZE,
> >  };
> >
> > +BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
> > +        struct tcphdr *, th, u32, th_len)
> > +{
> > +#ifdef CONFIG_SYN_COOKIES
> > +     u32 cookie;
> > +     u16 mss;
> > +
> > +     if (unlikely(th_len < sizeof(*th) || th_len != th->doff * 4))
> > +             return -EINVAL;
> > +
> > +     if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
> > +             return -EINVAL;
> > +
> > +     if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
> > +             return -ENOENT;
> > +
> > +     if (!th->syn || th->ack || th->fin || th->rst)
> > +             return -EINVAL;
> > +
> > +     if (unlikely(iph_len < sizeof(struct iphdr)))
> > +             return -EINVAL;
> > +
> > +     /* Both struct iphdr and struct ipv6hdr have the version field at the
> > +      * same offset so we can cast to the shorter header (struct iphdr).
> > +      */
> > +     switch (((struct iphdr *)iph)->version) {
> > +     case 4:
> > +             if (sk->sk_family == AF_INET6 && sk->sk_ipv6only)
> > +                     return -EINVAL;
> > +
> > +             mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
> > +             break;
> > +
> > +#if IS_BUILTIN(CONFIG_IPV6)
> > +     case 6:
> > +             if (unlikely(iph_len < sizeof(struct ipv6hdr)))
> > +                     return -EINVAL;
> > +
> > +             if (sk->sk_family != AF_INET6)
> > +                     return -EINVAL;
> > +
> > +             mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
> > +             break;
> > +#endif /* CONFIG_IPV6 */
> > +
> > +     default:
> > +             return -EPROTONOSUPPORT;
> > +     }
> > +     if (mss <= 0)
> > +             return -ENOENT;
> > +
> > +     return cookie | ((u64)mss << 32);
> > +#else
> > +     return -ENOTSUPP;
>
> See above
>
> > +#endif /* CONFIG_SYN_COOKIES */
> > +}
> > +
> > +static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
> > +     .func           = bpf_tcp_gen_syncookie,
> > +     .gpl_only       = true, /* __cookie_v*_init_sequence() is GPL */
> > +     .pkt_access     = true,
> > +     .ret_type       = RET_INTEGER,
> > +     .arg1_type      = ARG_PTR_TO_SOCK_COMMON,
> > +     .arg2_type      = ARG_PTR_TO_MEM,
> > +     .arg3_type      = ARG_CONST_SIZE,
> > +     .arg4_type      = ARG_PTR_TO_MEM,
> > +     .arg5_type      = ARG_CONST_SIZE,
> > +};
> > +
> >  #endif /* CONFIG_INET */
> >
> >  bool bpf_helper_changes_pkt_data(void *func)
> > @@ -6135,6 +6204,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >               return &bpf_tcp_check_syncookie_proto;
> >       case BPF_FUNC_skb_ecn_set_ce:
> >               return &bpf_skb_ecn_set_ce_proto;
> > +     case BPF_FUNC_tcp_gen_syncookie:
> > +             return &bpf_tcp_gen_syncookie_proto;
> >  #endif
> >       default:
> >               return bpf_base_func_proto(func_id);
> > @@ -6174,6 +6245,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >               return &bpf_xdp_skc_lookup_tcp_proto;
> >       case BPF_FUNC_tcp_check_syncookie:
> >               return &bpf_tcp_check_syncookie_proto;
> > +     case BPF_FUNC_tcp_gen_syncookie:
> > +             return &bpf_tcp_gen_syncookie_proto;
> >  #endif
> >       default:
> >               return bpf_base_func_proto(func_id);
> > --
> > 2.22.0.657.g960e92d24f-goog

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox