Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists
From: Eric Dumazet @ 2022-04-26 16:13 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev
In-Reply-To: <b4df9653b93b9b0bdc8a91f5560ec027848200a9.camel@redhat.com>

On Tue, Apr 26, 2022 at 8:28 AM Paolo Abeni <pabeni@redhat.com> wrote:
>

> I'm unsure I explained my doubt in a clear way: what I fear is that the
> compiler could emit a single read instruction, corresponding to the
> READ_ONCE() outside the lock, so that the spin-locked section will
> operate on "old" defer_list.
>
> If that happens we could end-up with 'defer_count' underestimating the
> list lenght. It looks like that is tolerable, as we will still be
> protected vs defer_list growing too much.

defer_count is always read/written under the protection of the spinlock.
It must be very precise, unless I am mistaken.

>
> Acked-by: Paolo Abeni <pabeni@redhat.com>
>
>

Thanks !

^ permalink raw reply

* Re: [PATCH bpf-next 1/4] libbpf: Define DEFAULT_BPFFS
From: Yafang Shao @ 2022-04-26 16:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh, Networking,
	bpf
In-Reply-To: <CAEf4Bza_8d_K22DFRzGHYAQdz_y1+9b_bfSc0t0EkdM4nyy7Hw@mail.gmail.com>

On Tue, Apr 26, 2022 at 2:45 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Apr 23, 2022 at 7:01 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Let's use a macro DEFAULT_BPFFS instead of the hard-coded "/sys/fs/bpf".
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  tools/lib/bpf/bpf_helpers.h | 2 +-
> >  tools/lib/bpf/libbpf.c      | 2 +-
> >  tools/lib/bpf/libbpf.h      | 6 ++++--
> >  3 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> > index 44df982d2a5c..9161ebcd3466 100644
> > --- a/tools/lib/bpf/bpf_helpers.h
> > +++ b/tools/lib/bpf/bpf_helpers.h
> > @@ -137,7 +137,7 @@ struct bpf_map_def {
> >
> >  enum libbpf_pin_type {
> >         LIBBPF_PIN_NONE,
> > -       /* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
> > +       /* PIN_BY_NAME: pin maps by name (in DEFAULT_BPFFS by default) */
>
> how is this improving things? now I need to grep some more to find out
> what's the value of DEFAULT_BPFFS is
>

The new added one also uses the "/sys/fs/bpf", so I defined a macro
for them, then they can be kept the same.
I won't change it if you object to it.

[snip]

-- 
Regards
Yafang

^ permalink raw reply

* Re: [PATCH bpf-next 0/4] bpf: Generate helpers for pinning through bpf object skeleton
From: Yafang Shao @ 2022-04-26 16:09 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh, Networking,
	bpf
In-Reply-To: <CAEf4BzbhODOBrE=unLOUpo40uUYz72BJX-+uJobiwhF9VFSizQ@mail.gmail.com>

On Tue, Apr 26, 2022 at 2:45 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Apr 23, 2022 at 7:01 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Currently there're helpers for allowing to open/load/attach BPF object
> > through BPF object skeleton. Let's also add helpers for pinning through
> > BPF object skeleton. It could simplify BPF userspace code which wants to
> > pin the progs into bpffs.
> >
> > After this change, with command 'bpftool gen skeleton XXX.bpf.o', the
> > helpers for pinning BPF prog will be generated in BPF object skeleton.
> >
> > The new helpers are named with __{pin, unpin}_prog, because it only pins
> > bpf progs. If the user also wants to pin bpf maps, he can use
> > LIBBPF_PIN_BY_NAME.
>
> API says it's pinning programs, but really it's trying to pin links.

Actually it should be bpf_object__pin_skeleton_link().

> But those links might not even be created for non-auto-attachable
> programs, and for others users might or might not set
> <skel>.links.<prog_name> links.
>
> There are lots of questions about this new functionality... But the
> main one is why do we need it? What does it bring that's hard to do
> otherwise?
>

See also my replyment to Daniel[1].
For the FD-based bpf objects, the userspace code is similar, so we can
abstract the userspace code into a common code, and then the developer
doesn't need to write the userspace code any more (if he doesn't have
some special userspace logical.).


[1]. https://lore.kernel.org/bpf/CAEf4BzbhODOBrE=unLOUpo40uUYz72BJX-+uJobiwhF9VFSizQ@mail.gmail.com/T/#m32dfc6343f2b4fba980c62686b245cb6e0133c2f


> >
> > Yafang Shao (4):
> >   libbpf: Define DEFAULT_BPFFS
> >   libbpf: Add helpers for pinning bpf prog through bpf object skeleton
> >   bpftool: Fix incorrect return in generated detach helper
> >   bpftool: Generate helpers for pinning prog through bpf object skeleton
> >
> >  tools/bpf/bpftool/gen.c     | 18 ++++++++++-
> >  tools/lib/bpf/bpf_helpers.h |  2 +-
> >  tools/lib/bpf/libbpf.c      | 61 ++++++++++++++++++++++++++++++++++++-
> >  tools/lib/bpf/libbpf.h      | 10 ++++--
> >  tools/lib/bpf/libbpf.map    |  2 ++
> >  5 files changed, 88 insertions(+), 5 deletions(-)
> >
> > --
> > 2.17.1
> >



-- 
Regards
Yafang

^ permalink raw reply

* Re: [PATCHv2 bpf-next 1/4] kallsyms: Add kallsyms_lookup_names function
From: Masami Hiramatsu @ 2022-04-26 16:03 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	netdev, bpf, lkml, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh
In-Reply-To: <YmflGEbjkp8mynxK@krava>

On Tue, 26 Apr 2022 14:27:04 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> On Tue, Apr 26, 2022 at 07:01:08PM +0900, Masami Hiramatsu wrote:
> > Hi Jiri,
> > 
> > Sorry for replying late.
> > 
> > On Fri, 22 Apr 2022 08:47:13 +0200
> > Jiri Olsa <olsajiri@gmail.com> wrote:
> > 
> > > On Tue, Apr 19, 2022 at 10:26:05AM +0200, Jiri Olsa wrote:
> > > 
> > > SNIP
> > > 
> > > > > > +static int kallsyms_callback(void *data, const char *name,
> > > > > > +			     struct module *mod, unsigned long addr)
> > > > > > +{
> > > > > > +	struct kallsyms_data *args = data;
> > > > > > +
> > > > > > +	if (!bsearch(&name, args->syms, args->cnt, sizeof(*args->syms), symbols_cmp))
> > > > > > +		return 0;
> > > > > > +
> > > > > > +	addr = ftrace_location(addr);
> > > > > > +	if (!addr)
> > > > > > +		return 0;
> > > > > 
> > > > > Ooops, wait. Did you do this last version? I missed this point.
> > > > > This changes the meanings of the kernel function.
> > > > 
> > > > yes, it was there before ;-) and you're right.. so some archs can
> > > > return different address, I did not realize that
> > > > 
> > > > > 
> > > > > > +
> > > > > > +	args->addrs[args->found++] = addr;
> > > > > > +	return args->found == args->cnt ? 1 : 0;
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * kallsyms_lookup_names - Lookup addresses for array of symbols
> > > > > 
> > > > > More correctly "Lookup 'ftraced' addresses for array of sorted symbols", right?
> > > > > 
> > > > > I'm not sure, we can call it as a 'kallsyms' API, since this is using
> > > > > kallsyms but doesn't return symbol address, but ftrace address.
> > > > > I think this name misleads user to expect returning symbol address.
> > > > > 
> > > > > > + *
> > > > > > + * @syms: array of symbols pointers symbols to resolve, must be
> > > > > > + * alphabetically sorted
> > > > > > + * @cnt: number of symbols/addresses in @syms/@addrs arrays
> > > > > > + * @addrs: array for storing resulting addresses
> > > > > > + *
> > > > > > + * This function looks up addresses for array of symbols provided in
> > > > > > + * @syms array (must be alphabetically sorted) and stores them in
> > > > > > + * @addrs array, which needs to be big enough to store at least @cnt
> > > > > > + * addresses.
> > > > > 
> > > > > Hmm, sorry I changed my mind. I rather like to expose kallsyms_on_each_symbol()
> > > > > and provide this API from fprobe or ftrace, because this returns ftrace address
> > > > > and thus this is only used from fprobe.
> > > > 
> > > > ok, so how about:
> > > > 
> > > >   int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs);
> > > 
> > > quick question.. is it ok if it stays in kalsyms.c object?
> > 
> > I think if this is for the ftrace API, I think it should be in the ftrace.c, and
> > it can remove unneeded #ifdefs in C code.
> > 
> > > 
> > > so we don't need to expose kallsyms_on_each_symbol,
> > > and it stays in 'kalsyms' place
> > 
> > We don't need to expose it to modules, but just make it into a global scope.
> > I don't think that doesn't cause a problem.

Oops, I meant "I don't think that cause any problem."

> 
> np, will move it to ftrace

Thank you!

> 
> thanks,
> jirka


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply

* Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
From: Igor Russkikh @ 2022-04-26 16:00 UTC (permalink / raw)
  To: Grant Grundler, Dmitry Bezrukov
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi
In-Reply-To: <CANEJEGtaFCRhVBaVtHrQiJvwsuBk3f_4RNTg87CWERHt+453KA@mail.gmail.com>

Hi Grant,

Sorry for the delay, I was on vacation.
Thanks for working on this.

I'm adding here Dmitrii, to help me review the patches.
Dmitrii, here is a full series:

https://patchwork.kernel.org/project/netdevbpf/cover/20220418231746.2464800-1-grundler@chromium.org/

Grant, I've reviewed and also quite OK with patches 1-4.

For patch 5 - why do you think we need these extra comparisons with software head/tail?
From what I see in logic, only the size limiting check is enough there..

Other extra checks are tricky and non intuitive..

Regards,
  Igor

On 4/21/2022 9:53 PM, Grant Grundler wrote:
> External Email
> 
> ----------------------------------------------------------------------
> Igor,
> Will you have a chance to comment on this in the near future?
> Should someone else review/integrate these patches?
> 
> I'm asking since I've seen no comments in the past three days.
> 
> cheers,
> grant
> 
> 
> On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler <grundler@chromium.org> 
> wrote:
>>
>> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic driver
>> in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters
>> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7raOpOlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=620jqeSvQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=
>>
>> It essentially describes four problems:
>> 1) validate rxd_wb->next_desc_ptr before populating buff->next
>> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
>> 3) limit iterations handling fragments in aq_ring_rx_clean()
>> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
>>
>> I've added one "clean up" contribution:
>>     "net: atlantic: reduce scope of is_rsc_complete"
>>
>> I tested the "original" patches using chromeos-v5.4 kernel branch:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__chromium-2Dreview.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28status-3Aopen-2520OR-2520status-3Amerged-29&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=1a1YwJqrY-be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=
>>
>> The fuzzing team will retest using the chromeos-v5.4 patches and the b0 
>> HW.
>>
>> I've forward ported those patches to 5.18-rc2 and compiled them but am
>> currently unable to test them on 5.18-rc2 kernel (logistics problems).
>>
>> I'm confident in all but the last patch:
>>    "net: atlantic: verify hw_head_ is reasonable"
>>
>> Please verify I'm not confusing how ring->sw_head and ring->sw_tail
>> are used in hw_atl_b0_hw_ring_tx_head_update().
>>
>> Credit largely goes to Chrome OS Fuzzing team members:
>>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
>>
>> cheers,
>> grant

^ permalink raw reply

* Re: [PATCH bpf-next 4/4] bpftool: Generate helpers for pinning prog through bpf object skeleton
From: Yafang Shao @ 2022-04-26 16:00 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrii Nakryiko, Martin Lau, Song Liu,
	Yonghong Song, john fastabend, KP Singh, netdev, bpf
In-Reply-To: <33e5314f-8546-3945-c73b-25ee13d1b368@iogearbox.net>

On Mon, Apr 25, 2022 at 10:03 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/23/22 4:00 PM, Yafang Shao wrote:
> > After this change, with command 'bpftool gen skeleton XXX.bpf.o', the
> > helpers for pinning BPF prog will be generated in BPF object skeleton. It
> > could simplify userspace code which wants to pin bpf progs in bpffs.
> >
> > The new helpers are named with __{pin, unpin}_prog, because it only pins
> > bpf progs. If the user also wants to pin bpf maps in bpffs, he can use
> > LIBBPF_PIN_BY_NAME.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >   tools/bpf/bpftool/gen.c | 16 ++++++++++++++++
> >   1 file changed, 16 insertions(+)
> >
> > diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
> > index 8f76d8d9996c..1d06ebde723b 100644
> > --- a/tools/bpf/bpftool/gen.c
> > +++ b/tools/bpf/bpftool/gen.c
> > @@ -1087,6 +1087,8 @@ static int do_skeleton(int argc, char **argv)
> >                       static inline int load(struct %1$s *skel);          \n\
> >                       static inline int attach(struct %1$s *skel);        \n\
> >                       static inline void detach(struct %1$s *skel);       \n\
> > +                     static inline int pin_prog(struct %1$s *skel, const char *path);\n\
> > +                     static inline void unpin_prog(struct %1$s *skel);   \n\
> >                       static inline void destroy(struct %1$s *skel);      \n\
> >                       static inline const void *elf_bytes(size_t *sz);    \n\
> >               #endif /* __cplusplus */                                    \n\
> > @@ -1172,6 +1174,18 @@ static int do_skeleton(int argc, char **argv)
> >               %1$s__detach(struct %1$s *obj)                              \n\
> >               {                                                           \n\
> >                       bpf_object__detach_skeleton(obj->skeleton);         \n\
> > +             }                                                           \n\
> > +                                                                         \n\
> > +             static inline int                                           \n\
> > +             %1$s__pin_prog(struct %1$s *obj, const char *path)          \n\
> > +             {                                                           \n\
> > +                     return bpf_object__pin_skeleton_prog(obj->skeleton, path);\n\
> > +             }                                                           \n\
> > +                                                                         \n\
> > +             static inline void                                          \n\
> > +             %1$s__unpin_prog(struct %1$s *obj)                          \n\
> > +             {                                                           \n\
> > +                     bpf_object__unpin_skeleton_prog(obj->skeleton);     \n\
> >               }                                                           \n\
>
> (This should also have BPF selftest code as in-tree user.)
>

Will do it, thanks.

[snip]

-- 
Regards
Yafang

^ permalink raw reply

* [PATCH net v3] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
From: Duoming Zhou @ 2022-04-26 15:59 UTC (permalink / raw)
  To: krzysztof.kozlowski, pabeni, linux-kernel
  Cc: davem, kuba, netdev, gregkh, alexander.deucher, broonie, akpm,
	linma, Duoming Zhou

There are destructive operations such as nfcmrvl_fw_dnld_abort and
gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
gpio and so on could be destructed while the upper layer functions such as
nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
to double-free, use-after-free and null-ptr-deref bugs.

There are three situations that could lead to double-free bugs.

The first situation is shown below:

   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |  nfcmrvl_nci_unregister_dev
 release_firmware()           |   nfcmrvl_fw_dnld_abort
  kfree(fw) //(1)             |    fw_dnld_over
                              |     release_firmware
  ...                         |      kfree(fw) //(2)
                              |     ...

The second situation is shown below:

   (Thread 1)                 |      (Thread 2)
nfcmrvl_fw_dnld_start         |
 ...                          |
 mod_timer                    |
 (wait a time)                |
 fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
   fw_dnld_over               |   nfcmrvl_fw_dnld_abort
    release_firmware          |    fw_dnld_over
     kfree(fw) //(1)          |     release_firmware
     ...                      |      kfree(fw) //(2)

The third situation is shown below:

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_recv_frame          |
 if(..->fw_download_in_progress)|
  nfcmrvl_fw_dnld_recv_frame    |
   queue_work                   |
                                |
fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
 fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
  release_firmware              |   fw_dnld_over
   kfree(fw) //(1)              |    release_firmware
                                |     kfree(fw) //(2)

The firmware struct is deallocated in position (1) and deallocated
in position (2) again.

The crash trace triggered by POC is like below:

[  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
[  122.640457] Call Trace:
[  122.640457]  <TASK>
[  122.640457]  kfree+0xb0/0x330
[  122.640457]  fw_dnld_over+0x28/0xf0
[  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
[  122.640457]  nci_uart_tty_close+0x87/0xd0
[  122.640457]  tty_ldisc_kill+0x3e/0x80
[  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
[  122.640457]  __tty_hangup.part.0+0x316/0x520
[  122.640457]  tty_release+0x200/0x670
[  122.640457]  __fput+0x110/0x410
[  122.640457]  task_work_run+0x86/0xd0
[  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
[  122.640457]  syscall_exit_to_user_mode+0x19/0x50
[  122.640457]  do_syscall_64+0x48/0x90
[  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  122.640457] RIP: 0033:0x7f68433f6beb

What's more, there are also use-after-free and null-ptr-deref bugs
in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
then, we dereference firmware, gpio or the members of priv->fw_dnld in
nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.

This patch reorders destructive operations after nci_unregister_device
and adds bool variable protected by device_lock to synchronize between
cleanup routine and firmware download routine. The process is shown below.

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_unregister_dev      |
  nci_unregister_device         |
    nfc_unregister_device       | nfc_fw_download
      device_lock()             |
      ...                       |
      nfc_download = false;     |   ...
      device_unlock()           |
  ...                           |   device_lock()
  (destructive operations)      |   if(.. || !nfc_download)
                                |     goto error;
                                | error:
                                |   device_unlock()

If the device is detaching, the download function will goto error.
If the download function is executing, nci_unregister_device will
wait until download function is finished.

Fixes: 3194c6870158 ("NFC: nfcmrvl: add firmware download support")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
---
Changes in v3:
  - Add bool variable to synchronize.
  - Make commit message clearer.

 drivers/nfc/nfcmrvl/main.c | 2 +-
 net/nfc/core.c             | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 {
 	struct nci_dev *ndev = priv->ndev;
 
+	nci_unregister_device(ndev);
 	if (priv->ndev->nfc_dev->fw_download_in_progress)
 		nfcmrvl_fw_dnld_abort(priv);
 
@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 	if (gpio_is_valid(priv->config.reset_n_io))
 		gpio_free(priv->config.reset_n_io);
 
-	nci_unregister_device(ndev);
 	nci_free_device(ndev);
 	kfree(priv);
 }
diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..da8199f67d4 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -25,6 +25,7 @@
 #define NFC_CHECK_PRES_FREQ_MS	2000
 
 int nfc_devlist_generation;
+bool nfc_download;
 DEFINE_MUTEX(nfc_devlist_mutex);
 
 /* NFC device ID bitmap */
@@ -38,7 +39,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
 
 	device_lock(&dev->dev);
 
-	if (!device_is_registered(&dev->dev)) {
+	if (!device_is_registered(&dev->dev) || !nfc_download) {
 		rc = -ENODEV;
 		goto error;
 	}
@@ -1134,6 +1135,7 @@ int nfc_register_device(struct nfc_dev *dev)
 			dev->rfkill = NULL;
 		}
 	}
+	nfc_download = true;
 	device_unlock(&dev->dev);
 
 	rc = nfc_genl_device_added(dev);
@@ -1166,6 +1168,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
 		rfkill_unregister(dev->rfkill);
 		rfkill_destroy(dev->rfkill);
 	}
+	nfc_download = false;
 	device_unlock(&dev->dev);
 
 	if (dev->ops->check_presence) {
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH bpf-next 2/4] libbpf: Add helpers for pinning bpf prog through bpf object skeleton
From: Yafang Shao @ 2022-04-26 15:58 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrii Nakryiko, Martin Lau, Song Liu,
	Yonghong Song, john fastabend, KP Singh, netdev, bpf
In-Reply-To: <29b077a7-1e99-9436-bd5a-4277651e09db@iogearbox.net>

On Mon, Apr 25, 2022 at 9:57 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/23/22 4:00 PM, Yafang Shao wrote:
> > Currently there're helpers for allowing to open/load/attach BPF object
> > through BPF object skeleton. Let's also add helpers for pinning through
> > BPF object skeleton. It could simplify BPF userspace code which wants to
> > pin the progs into bpffs.
>
> Please elaborate some more on your use case/rationale for the commit message,
> do you have orchestration code that will rely on these specifically?
>

We have a bpf manager on our production environment to maintain the
bpf programs, some of which need to be pinned in bpffs, for example
tracing bpf programs, perf_event programs and other bpf hooks added by
ourselves for performance tuning.  These bpf programs don't need a
user agent, while they really work like a kernel module, that is why
we pin them. For these kinds of bpf programs, the bpf manager can help
to simplify the development and deployment.  Take the improvement on
development for example,  the user doesn't need to write userspace
code while he focuses on the kernel side only, and then bpf manager
will do all the other things. Below is a simple example,
   Step1, gen the skeleton for the user provided bpf object file,
              $ bpftool gen skeleton  test.bpf.o > simple.skel.h
   Step2, Compile the bpf object file into a runnable binary
              #include "simple.skel.h"

              #define SIMPLE_BPF_PIN(name, path)  \
              ({                                                              \
                  struct name##_bpf *obj;                      \
                  int err = 0;                                            \
                                                                              \
                  obj = name##_bpf__open();                \
                   if (!obj) {                                              \
                       err = -errno;                                    \
                       goto cleanup;                                 \
                    }                                                         \
                                                                              \
                    err = name##_bpf__load(obj);           \
                    if (err)                                                 \
                        goto cleanup;                                 \
                                                                               \
                     err = name##_bpf__attach(obj);       \
                     if (err)                                                \
                         goto cleanup;                                \
                                                                               \
                     err = name##_bpf__pin_prog(obj, path);      \
                     if (err)                                                \
                         goto cleanup;                                \
                                                                              \
                      goto end;                                         \
                                                                              \
                  cleanup:                                              \
                      name##_bpf__destroy(obj);            \
                  end:                                                     \
                      err;                                                  \
                   })

                   SIMPLE_BPF_PIN(test, "/sys/fs/bpf");

               As the userspace code of FD-based bpf objects are all
the same,  so we can abstract them as above.  The pathset means to add
the non-exist "name##_bpf__pin_prog(obj, path)" for it.

> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >   tools/lib/bpf/libbpf.c   | 59 ++++++++++++++++++++++++++++++++++++++++
> >   tools/lib/bpf/libbpf.h   |  4 +++
> >   tools/lib/bpf/libbpf.map |  2 ++
> >   3 files changed, 65 insertions(+)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 13fcf91e9e0e..e7ed6c53c525 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -12726,6 +12726,65 @@ void bpf_object__detach_skeleton(struct bpf_object_skeleton *s)
> >       }
> >   }
> >
> > +int bpf_object__pin_skeleton_prog(struct bpf_object_skeleton *s,
> > +                               const char *path)
>
> These pin the link, not the prog itself, so the func name is a bit misleading? Also,
> what if is this needs to be more customized in future? It doesn't seem very generic.
>

Ah, it should be bpf_object__pin_skeleton_link().
I'm not sure if it can be extended to work for a non-auto-detachable
bpf program, but I know it is hard for pinning tc-bpf programs, which
is also running on our production environment, in this way.

> > +{
> > +     struct bpf_link *link;
> > +     int err;
> > +     int i;
> > +
> > +     if (!s->prog_cnt)
> > +             return libbpf_err(-EINVAL);
> > +
> > +     if (!path)
> > +             path = DEFAULT_BPFFS;
> > +
> > +     for (i = 0; i < s->prog_cnt; i++) {
> > +             char buf[PATH_MAX];
> > +             int len;
> > +
> > +             len = snprintf(buf, PATH_MAX, "%s/%s", path, s->progs[i].name);
> > +             if (len < 0) {
> > +                     err = -EINVAL;
> > +                     goto err_unpin_prog;
> > +             } else if (len >= PATH_MAX) {
> > +                     err = -ENAMETOOLONG;
> > +                     goto err_unpin_prog;
> > +             }
> > +
> > +             link = *s->progs[i].link;
> > +             if (!link) {
> > +                     err = -EINVAL;
> > +                     goto err_unpin_prog;
> > +             }
> > +
> > +             err = bpf_link__pin(link, buf);
> > +             if (err)
> > +                     goto err_unpin_prog;
> > +     }
> > +
> > +     return 0;
> > +
> > +err_unpin_prog:
> > +     bpf_object__unpin_skeleton_prog(s);
> > +
> > +     return libbpf_err(err);
> > +}
> > +
> > +void bpf_object__unpin_skeleton_prog(struct bpf_object_skeleton *s)
> > +{
> > +     struct bpf_link *link;
> > +     int i;
> > +
> > +     for (i = 0; i < s->prog_cnt; i++) {
> > +             link = *s->progs[i].link;
> > +             if (!link || !link->pin_path)
> > +                     continue;
> > +
> > +             bpf_link__unpin(link);
> > +     }
> > +}
> > +
> >   void bpf_object__destroy_skeleton(struct bpf_object_skeleton *s)
> >   {
> >       if (!s)
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index 3784867811a4..af44b0968cca 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -1427,6 +1427,10 @@ bpf_object__open_skeleton(struct bpf_object_skeleton *s,
> >   LIBBPF_API int bpf_object__load_skeleton(struct bpf_object_skeleton *s);
> >   LIBBPF_API int bpf_object__attach_skeleton(struct bpf_object_skeleton *s);
> >   LIBBPF_API void bpf_object__detach_skeleton(struct bpf_object_skeleton *s);
> > +LIBBPF_API int
> > +bpf_object__pin_skeleton_prog(struct bpf_object_skeleton *s, const char *path);
> > +LIBBPF_API void
> > +bpf_object__unpin_skeleton_prog(struct bpf_object_skeleton *s);
> >   LIBBPF_API void bpf_object__destroy_skeleton(struct bpf_object_skeleton *s);
>
> Please also add API documentation.
>

Sure.

> >   struct bpf_var_skeleton {
> > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> > index 82f6d62176dd..4e3e37b84b3a 100644
> > --- a/tools/lib/bpf/libbpf.map
> > +++ b/tools/lib/bpf/libbpf.map
> > @@ -55,6 +55,8 @@ LIBBPF_0.0.1 {
> >               bpf_object__unload;
> >               bpf_object__unpin_maps;
> >               bpf_object__unpin_programs;
> > +             bpf_object__pin_skeleton_prog;
> > +             bpf_object__unpin_skeleton_prog;
>
> This would have to go under LIBBPF_0.8.0 if so.
>

Thanks, I will change it.

> >               bpf_perf_event_read_simple;
> >               bpf_prog_attach;
> >               bpf_prog_detach;
> >
>

-- 
Regards
Yafang

^ permalink raw reply

* Re: [PATCH RFC 4/5] net/tls: Add support for PF_TLSH (a TLS handshake listener)
From: Chuck Lever III @ 2022-04-26 15:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Linux NFS Mailing List, linux-nvme@lists.infradead.org,
	linux-cifs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	ak@tempesta-tech.com, borisp@nvidia.com, simo@redhat.com
In-Reply-To: <20220426075504.18be4ee2@kernel.org>



> On Apr 26, 2022, at 10:55 AM, Jakub Kicinski <kuba@kernel.org> wrote:
> 
> On Tue, 26 Apr 2022 13:48:20 +0000 Chuck Lever III wrote:
>>> Create the socket in user space, do all the handshakes you need there
>>> and then pass it to the kernel.  This is how NBD + TLS works.  Scales
>>> better and requires much less kernel code.  
>> 
>> The RPC-with-TLS standard allows unencrypted RPC traffic on the connection
>> before sending ClientHello. I think we'd like to stick with creating the
>> socket in the kernel, for this reason and for the reasons Hannes mentions
>> in his reply.
> 
> Umpf, I presume that's reviewed by security people in IETF so I guess
> it's done right this time (tm).

> Your wording seems careful not to imply that you actually need that,
> tho. Am I over-interpreting?

RPC-with-TLS requires one RPC as a "starttls" token. That could be
done in user space as part of the handshake, but it is currently
done in the kernel to enable the user agent to be shared with other
kernel consumers of TLS. Keep in mind that we already have two
real consumers: NVMe and RPC-with-TLS; and possibly QUIC.

You asserted earlier that creating sockets in user space "scales
better" but did not provide any data. Can we see some? How well
does it need to scale for storage protocols that use long-lived
connections?

Also, why has no-one mentioned the NBD on TLS implementation to
us before? I will try to review that code soon.


> This set does not even have selftests.

I can include unit tests with the prototype. Someone needs to
educate me on what is the preferred unit test paradigm for this
type of subsystem. Examples in the current kernel code base would
help too.


> Plus there are more protocols being actively worked on (QUIC, PSP etc.)
> Having per ULP special sauce to invoke a user space helper is not the
> paradigm we chose, and the time as inopportune as ever to change that.

When we started discussing TLS handshake requirements with some
community members several years ago, creating the socket in
kernel and passing it up to a user agent was the suggested design.
Has that recommendation changed since then?

I'd prefer an in-kernel handshake implementation over a user
space one (even one that is sharable amongst transports and ULPs
as my proposal is intended to be). However, so far we've been told
that an in-kernel handshake implementation is a non-starter.

But in the abstract, we agree that having a single TLS handshake
mechanism for kernel consumers is preferable.


--
Chuck Lever




^ permalink raw reply

* Re: [PATCH RFC 4/5] net/tls: Add support for PF_TLSH (a TLS handshake listener)
From: Hannes Reinecke @ 2022-04-26 15:58 UTC (permalink / raw)
  To: Jakub Kicinski, Sagi Grimberg
  Cc: Chuck Lever, netdev, linux-nfs, linux-nvme, linux-cifs,
	linux-fsdevel, ak, borisp, simo
In-Reply-To: <20220426080247.19bbb64e@kernel.org>

On 4/26/22 17:02, Jakub Kicinski wrote:
> On Tue, 26 Apr 2022 17:29:03 +0300 Sagi Grimberg wrote:
>>>> Create the socket in user space, do all the handshakes you need there
>>>> and then pass it to the kernel.  This is how NBD + TLS works.  Scales
>>>> better and requires much less kernel code.
>>>>   
>>> But we can't, as the existing mechanisms (at least for NVMe) creates the
>>> socket in-kernel.
>>> Having to create the socket in userspace would require a completely new
>>> interface for nvme and will not be backwards compatible.
>>
>> And we will still need the upcall anyways when we reconnect
>> (re-establish the socket)
> 
> That totally flew over my head, I have zero familiarity with in-kernel
> storage network users :S
> 
Count yourself lucky.

> In all honesty the tls code in the kernel is a bit of a dumping ground.
> People come, dump a bunch of code and disappear. Nobody seems to care
> that the result is still (years in) not ready for production use :/
> Until a month ago it'd break connections even under moderate memory
> pressure. This set does not even have selftests.
> 
Well, I'd been surprised that it worked, too.
And even more so that Boris Piskenny @ Nvidia is actively working on it.
(Thanks, Sagi!)

> Plus there are more protocols being actively worked on (QUIC, PSP etc.)
> Having per ULP special sauce to invoke a user space helper is not the
> paradigm we chose, and the time as inopportune as ever to change that.

Which is precisely what we hope to discuss at LSF.
(Yes, I know, probably not the best venue to discuss network stuff ...)

Each approach has its drawbacks:

- Establishing sockets from userspace will cause issues during 
reconnection, as then someone (aka the kernel) will have to inform 
userspace that a new connection will need to be established.
(And that has to happen while the root filesystem is potentially 
inaccessible, so you can't just call arbitrary commands here)
(Especially call_usermodehelper() is out of the game)
- Having ULP helpers (as with this design) mitigates that problem 
somewhat in the sense that you can mlock() that daemon and having it 
polling on an intermediate socket; that solves the notification problem.
But you have to have ULP special sauce here to make it work.
- Moving everything in kernel is ... possible. But then you have yet 
another security-relevant piece of code in the kernel which needs to be 
audited, CVEd etc. Not to mention the usual policy discussion whether it 
really belongs into the kernel.

So I don't really see any obvious way to go; best we can do is to pick 
the least ugly :-(

Cheers,

Hannes

^ permalink raw reply

* Re: [PATCH perf/core 1/5] libbpf: Add bpf_program__set_insns function
From: Andrii Nakryiko @ 2022-04-26 15:58 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Daniel Borkmann, Jiri Olsa, Arnaldo Carvalho de Melo,
	Alexei Starovoitov, Andrii Nakryiko, linux-perf-use., Networking,
	bpf, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Ian Rogers
In-Reply-To: <YmeXx0mfy4Nr5jEB@krava>

On Mon, Apr 25, 2022 at 11:57 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Apr 25, 2022 at 11:19:09PM -0700, Andrii Nakryiko wrote:
> > On Mon, Apr 25, 2022 at 9:22 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > > On 4/22/22 12:00 PM, Jiri Olsa wrote:
> > > > Adding bpf_program__set_insns that allows to set new
> > > > instructions for program.
> > > >
> > > > Also moving bpf_program__attach_kprobe_multi_opts on
> > > > the proper name sorted place in map file.
> >
> > would make sense to fix it as a separate patch, it has nothing to do
> > with bpf_program__set_insns() API itself
>
> np
>
> >
> > > >
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > >   tools/lib/bpf/libbpf.c   |  8 ++++++++
> > > >   tools/lib/bpf/libbpf.h   | 12 ++++++++++++
> > > >   tools/lib/bpf/libbpf.map |  3 ++-
> > > >   3 files changed, 22 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > > index 809fe209cdcc..284790d81c1b 100644
> > > > --- a/tools/lib/bpf/libbpf.c
> > > > +++ b/tools/lib/bpf/libbpf.c
> > > > @@ -8457,6 +8457,14 @@ size_t bpf_program__insn_cnt(const struct bpf_program *prog)
> > > >       return prog->insns_cnt;
> > > >   }
> > > >
> > > > +void bpf_program__set_insns(struct bpf_program *prog,
> > > > +                         struct bpf_insn *insns, size_t insns_cnt)
> > > > +{
> > > > +     free(prog->insns);
> > > > +     prog->insns = insns;
> > > > +     prog->insns_cnt = insns_cnt;
> >
> > let's not store user-provided pointer here. Please realloc prog->insns
> > as necessary and copy over insns into it.
> >
> > Also let's at least add the check for prog->loaded and return -EBUSY
> > in such a case. And of course this API should return int, not void.
>
> ok, will change
>
> >
> > > > +}
> > > > +
> > > >   int bpf_program__set_prep(struct bpf_program *prog, int nr_instances,
> > > >                         bpf_program_prep_t prep)
> > > >   {
> > > > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > > > index 05dde85e19a6..b31ad58d335f 100644
> > > > --- a/tools/lib/bpf/libbpf.h
> > > > +++ b/tools/lib/bpf/libbpf.h
> > > > @@ -323,6 +323,18 @@ struct bpf_insn;
> > > >    * different.
> > > >    */
> > > >   LIBBPF_API const struct bpf_insn *bpf_program__insns(const struct bpf_program *prog);
> > > > +
> > > > +/**
> > > > + * @brief **bpf_program__set_insns()** can set BPF program's underlying
> > > > + * BPF instructions.
> > > > + * @param prog BPF program for which to return instructions
> > > > + * @param insn a pointer to an array of BPF instructions
> > > > + * @param insns_cnt number of `struct bpf_insn`'s that form
> > > > + * specified BPF program
> > > > + */
> >
> > This API makes me want to cry... but I can't come up with anything
> > better for perf's use case.
> >

So thinking about this some more. If we make libbpf not to close maps
and prog FDs on BPF program load failure automatically and instead
doing it in bpf_object__close(), which would seem to be a totally fine
semantics and won't break any reasonable application as they always
have to call bpf_object__close() anyways to clean up all the
resources; we wouldn't need this horror of bpf_program__set_insns().
Your BPF program would fail to load, but you'll get its fully prepared
instructions with bpf_program__insns(), then you can just append
correct preamble. Meanwhile, all the maps will be created (they are
always created before the first program load), so all the FDs will be
correct.

This is certainly advanced knowledge of libbpf behavior, but the use
case you are trying to solve is also very unique and advanced (and I
wouldn't recommend anyone trying to do this anyways). WDYT? Would that
work?

> > But it can only more or less safely and sanely be used from the
> > prog_prepare_load_fn callback, so please add a big warning here saying
> > that this is a very advanced libbpf API and the user needs to know
> > what they are doing and this should be used from prog_prepare_load_fn
> > callback only. If bpf_program__set_insns() is called before
> > prog_prepare_load_fn any map/subprog/etc relocation will most probably
> > fail or corrupt BPF program code.
>
> will add the warnings
>
> >
> > > > +LIBBPF_API void bpf_program__set_insns(struct bpf_program *prog,
> > > > +                                    struct bpf_insn *insns, size_t insns_cnt);
> >
> > s/insns_cnt/insn_cnt/
> >
> > > > +
> > >
> > > Iiuc, patch 2 should be squashed into this one given they logically belong to the
> > > same change?
> > >
> > > Fwiw, I think the API description should be elaborated a bit more, in particular that
> > > the passed-in insns need to be from allocated dynamic memory which is later on passed
> > > to free(), and maybe also constraints at which point in time bpf_program__set_insns()
> > > may be called.. (as well as high-level description on potential use cases e.g. around
> > > patch 4).
> >
> > Yep, patch #1 is kind of broken without patch #2, so let's combine them.
>
> ok
>
> thanks,
> jirka

^ permalink raw reply

* Re: [PATCH iproute2 0/3] Fix some typos in doc and man pages
From: Andrea Claudi @ 2022-04-26 15:57 UTC (permalink / raw)
  To: stephen; +Cc: netdev, dsahern
In-Reply-To: <cover.1647955885.git.aclaudi@redhat.com>

On Tue, Mar 22, 2022 at 03:32:02PM +0100, Andrea Claudi wrote:
> As per description, this series contains some typo fixes on doc an man
> pages.
> 
> - patch 1/3 fixes a typo in a devlink example
> - patch 2/3 fixes typos on some man pages
> - patch 3/3 fixes a typo in the tc actions doc
> 
> Andrea Claudi (3):
>   man: devlink-region: fix typo in example
>   man: fix some typos
>   doc: fix 'infact' --> 'in fact' typo
> 
>  doc/actions/actions-general | 2 +-
>  man/man8/dcb-app.8          | 2 +-
>  man/man8/dcb-dcbx.8         | 2 +-
>  man/man8/devlink-dev.8      | 2 +-
>  man/man8/devlink-region.8   | 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)
> 
> -- 
> 2.35.1
>

Hi Stephen, is there any issue with this series? Should I resend it?

Thanks,
Andrea


^ permalink raw reply

* Re: Aw: Re: [RFC v1 2/3] net: dsa: mt753x: make CPU-Port dynamic
From: Florian Fainelli @ 2022-04-26 15:56 UTC (permalink / raw)
  To: Frank Wunderlich
  Cc: Frank Wunderlich, linux-mediatek, linux-rockchip, Rob Herring,
	Krzysztof Kozlowski, Heiko Stuebner, Sean Wang, Landen Chao,
	DENG Qingfang, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	Peter Geis, devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <trinity-5fd6da8c-15f6-488d-a332-0ce7625f41e0-1650988498781@3c-app-gmx-bs69>

On 4/26/22 08:54, Frank Wunderlich wrote:
> Hi
> 
> thanks for fast review.
> 
>> Gesendet: Dienstag, 26. April 2022 um 17:45 Uhr
>> Von: "Florian Fainelli" <f.fainelli@gmail.com>
>> On 4/26/22 06:49, Frank Wunderlich wrote:
>>> From: Frank Wunderlich <frank-w@public-files.de>
> 
>>> @@ -1190,8 +1191,8 @@ mt7530_port_bridge_join(struct dsa_switch *ds, int port,
>>>    			struct netlink_ext_ack *extack)
>>>    {
>>>    	struct dsa_port *dp = dsa_to_port(ds, port), *other_dp;
>>> -	u32 port_bitmap = BIT(MT7530_CPU_PORT);
>>>    	struct mt7530_priv *priv = ds->priv;
>>> +	u32 port_bitmap = BIT(priv->cpu_port);
>>
>> No need to re-order these two lines.
> 
> imho it is needed as i now access priv-struct now ;) which is not available at the "old" position

My bad, yes, I was concerned with preserving the reserve christmas tree 
style, never mind.
-- 
Florian

^ permalink raw reply

* Re: Re: [PATCH net V2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
From: duoming @ 2022-04-26 15:55 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: krzysztof.kozlowski, linux-kernel, davem, gregkh,
	alexander.deucher, broonie, akpm, netdev, linma
In-Reply-To: <57eae113432e286b7e279102220c21fcf0bd1306.camel@redhat.com>

hello,

On Mon, Tue, 26 Apr 2022 13:17:21 +0200 Paolo wrote:

> > There are destructive operations such as nfcmrvl_fw_dnld_abort and
> > gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> > gpio and so on could be destructed while the upper layer functions such as
> > nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> > to double-free, use-after-free and null-ptr-deref bugs.
> > 
> > There are three situations that could lead to double-free bugs.
> > 
> > The first situation is shown below:
> > 
> >    (Thread 1)                 |      (Thread 2)
> > nfcmrvl_fw_dnld_start         |
> >  ...                          |  nfcmrvl_nci_unregister_dev
> >  release_firmware()           |   nfcmrvl_fw_dnld_abort
> >   kfree(fw) //(1)             |    fw_dnld_over
> >                               |     release_firmware
> >   ...                         |      kfree(fw) //(2)
> >                               |     ...
> > 
> > The second situation is shown below:
> > 
> >    (Thread 1)                 |      (Thread 2)
> > nfcmrvl_fw_dnld_start         |
> >  ...                          |
> >  mod_timer                    |
> >  (wait a time)                |
> >  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
> >    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
> >     release_firmware          |    fw_dnld_over
> >      kfree(fw) //(1)          |     release_firmware
> >      ...                      |      kfree(fw) //(2)
> > 
> > The third situation is shown below:
> > 
> >        (Thread 1)               |       (Thread 2)
> > nfcmrvl_nci_recv_frame          |
> >  if(..->fw_download_in_progress)|
> >   nfcmrvl_fw_dnld_recv_frame    |
> >    queue_work                   |
> >                                 |
> > fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
> >  fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
> >   release_firmware              |   fw_dnld_over
> >    kfree(fw) //(1)              |    release_firmware
> >                                 |     kfree(fw) //(2)
> > 
> > The firmware struct is deallocated in position (1) and deallocated
> > in position (2) again.
> > 
> > The crash trace triggered by POC is like below:
> > 
> > [  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
> > [  122.640457] Call Trace:
> > [  122.640457]  <TASK>
> > [  122.640457]  kfree+0xb0/0x330
> > [  122.640457]  fw_dnld_over+0x28/0xf0
> > [  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
> > [  122.640457]  nci_uart_tty_close+0x87/0xd0
> > [  122.640457]  tty_ldisc_kill+0x3e/0x80
> > [  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
> > [  122.640457]  __tty_hangup.part.0+0x316/0x520
> > [  122.640457]  tty_release+0x200/0x670
> > [  122.640457]  __fput+0x110/0x410
> > [  122.640457]  task_work_run+0x86/0xd0
> > [  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
> > [  122.640457]  syscall_exit_to_user_mode+0x19/0x50
> > [  122.640457]  do_syscall_64+0x48/0x90
> > [  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  122.640457] RIP: 0033:0x7f68433f6beb
> > 
> > What's more, there are also use-after-free and null-ptr-deref bugs
> > in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
> > set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
> > then, we dereference firmware, gpio or the members of priv->fw_dnld in
> > nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
> > 
> > This patch reorders destructive operations after nci_unregister_device
> > to avoid the double-free, UAF and NPD bugs, as nci_unregister_device
> > is well synchronized and won't return if there is a running routine.
> > This was mentioned in commit 3e3b5dfcd16a ("NFC: reorder the logic in
> > nfc_{un,}register_device").

> It looks like the above is not enough to close all the possible races,
> specifically it looks like fw_dnld_timeout() and fw_dnld_rx_work() may
> still race one vs another. 

The fw_dnld_timeout() and fw_dnld_rx_work() could not race with each other.
Because fw_dnld_rx_work() is used to receive packets from firmware download
process, the release_firmware() in fw_dnld_rx_work() will not execute unless it
has received firmware packets, the code is shown below. 

fw_dnld_rx_work                                   
  while ((skb = skb_dequeue(&fw_dnld->rx_q))) {
    ...  
    fw_dnld_over
      release_firmware
  }

The release_firmware() is only called when firmware download process failed, and
fw_dnld_rx_work() will not receive any packets from firmware download process.
As a result, the race is non-existent. The process is shown below:

fw_dnld_rx_work          | nfcmrvl_fw_dnld_start
  ...                    |   ...
                         |   release_firmware
  (wait for skb)         |   (the following will not send any skb)
  while (skb = ...) {    |   
    ...                  | 
    fw_dnld_over         |
      release_firmware   |
  }                      |
  (receive no packet)    |

> I *think* that the approach you already suggested here:
> 
> https://lore.kernel.org/netdev/1d34425a0ea8a553a66dcf4f22ca55cc920dbb42.1649913521.git.duoming@zju.edu.cn/
> 
> should be safer - but you have to protect with the same spinlock even
> every fw_dnld->fw modification.

The spinlock should not be used to protect release_firmware(), because there is vfree() in release_firmware().
The vfree() is a function that may sleep. If we use spinlock, the sleep in atomic bug will happen. The process
is shown below:

spin_lock()
release_firmware()
  firmware_free_data()
    vfree()  //sleep in atomic
spin_unlock()

We reorders destructive operations after nci_unregister_device(), 
using nci_unregister_device() to synchonize with firmware download routine.
But I found this is not enough, when nci_unregister_device() is return, there 
are still firmware download routine is running.

[   65.835462] BUG: KASAN: use-after-free in nci_fw_download+0x26/0x60
[   65.840236] Read of size 8 at addr ffff88800c2f5008
...
[   65.845755] Call Trace:
[   65.856235]  nci_fw_download+0x26/0x60
[   65.856235]  nfc_fw_download+0x99/0xe0
[   65.856235]  nfc_genl_fw_download+0x10b/0x170
...

In order to solve this problem, I propose the following solutions:

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 {
        struct nci_dev *ndev = priv->ndev;
 
+       nci_unregister_device(ndev);
        if (priv->ndev->nfc_dev->fw_download_in_progress)
                nfcmrvl_fw_dnld_abort(priv);
 
@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
        if (gpio_is_valid(priv->config.reset_n_io))
                gpio_free(priv->config.reset_n_io);
 
-       nci_unregister_device(ndev);
        nci_free_device(ndev);
        kfree(priv);
 }

diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..da8199f67d4 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -25,6 +25,7 @@
 #define NFC_CHECK_PRES_FREQ_MS 2000
 
 int nfc_devlist_generation;
+bool nfc_download;
 DEFINE_MUTEX(nfc_devlist_mutex);
 
 /* NFC device ID bitmap */
@@ -38,7 +39,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
 
        device_lock(&dev->dev);
 
-       if (!device_is_registered(&dev->dev)) {
+       if (!device_is_registered(&dev->dev) || !nfc_download) {
                rc = -ENODEV;
                goto error;
        }
@@ -1134,6 +1135,7 @@ int nfc_register_device(struct nfc_dev *dev)
                        dev->rfkill = NULL;
                }
        }
+       nfc_download = true;
        device_unlock(&dev->dev);
 
        rc = nfc_genl_device_added(dev);
@@ -1166,6 +1168,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
                rfkill_unregister(dev->rfkill);
                rfkill_destroy(dev->rfkill);
        }
+       nfc_download = false;
        device_unlock(&dev->dev);
 
        if (dev->ops->check_presence) {
-- 

The new solution not only reorders destructive operations after nci_unregister_device,
but also adds bool variable protected by device_lock to synchronize between
cleanup routine and firmware download routine. The process is shown below.

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_unregister_dev      |
  nci_unregister_device         |
    nfc_unregister_device       | nfc_fw_download
      device_lock()             |
      ...                       |
      nfc_download = false;//(1)|   ...
      device_unlock()           |
  ...                           |   device_lock()
  (destructive operations)      |   if(.. || !nfc_download)
                                |     goto error; //(2)
                                |   rc = dev->ops->fw_download();
                                | error:
                                |   device_unlock()

If the device is detaching, nfc_download will be set to false in position (1) and 
the download function will goto error.

       (Thread 1)                  |       (Thread 2)
nfcmrvl_nci_unregister_dev         |
  nci_unregister_device            |
    nfc_unregister_device          | nfc_fw_download
                                   |   device_lock()
                                   |   rc = dev->ops->fw_download();
                                   |   dev->fw_download_in_progress = false;
                                   |   device_unlock()
      device_lock()                |   ...
      ...                          |   
      nfc_download = false;        |   ...
      device_unlock()              |
  ...                              |
  if (..->fw_download_in_progress) |
    nfcmrvl_fw_dnld_abort          |
      fw_dnld_over                 |
        release_firmware           |

If the download function is executing, nci_unregister_device() will wait until
firmware download routine is finished. What`s more, when the firmware download
routine is finished, the fw_download_in_progress() in nfc_dev struct will be set
to false. the release_firmware() in nfcmrvl_nci_unregister_dev() will not execute,
which could synchonize with fw_dnld_rx_work() and fw_dnld_timeout().

I think the new solution is good, it has been tested enough. I sent "[PATCH net v3]
nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs"
just now.

Best regards,
Duoming Zhou

^ permalink raw reply related

* Aw: Re: [RFC v1 2/3] net: dsa: mt753x: make CPU-Port dynamic
From: Frank Wunderlich @ 2022-04-26 15:54 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Frank Wunderlich, linux-mediatek, linux-rockchip, Rob Herring,
	Krzysztof Kozlowski, Heiko Stuebner, Sean Wang, Landen Chao,
	DENG Qingfang, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Matthias Brugger,
	Peter Geis, devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <046a334b-d972-6ab9-5127-f845cef72751@gmail.com>

Hi

thanks for fast review.

> Gesendet: Dienstag, 26. April 2022 um 17:45 Uhr
> Von: "Florian Fainelli" <f.fainelli@gmail.com>
> On 4/26/22 06:49, Frank Wunderlich wrote:
> > From: Frank Wunderlich <frank-w@public-files.de>

> > @@ -1190,8 +1191,8 @@ mt7530_port_bridge_join(struct dsa_switch *ds, int port,
> >   			struct netlink_ext_ack *extack)
> >   {
> >   	struct dsa_port *dp = dsa_to_port(ds, port), *other_dp;
> > -	u32 port_bitmap = BIT(MT7530_CPU_PORT);
> >   	struct mt7530_priv *priv = ds->priv;
> > +	u32 port_bitmap = BIT(priv->cpu_port);
>
> No need to re-order these two lines.

imho it is needed as i now access priv-struct now ;) which is not available at the "old" position

> >
> >   	mutex_lock(&priv->reg_mutex);

> > @@ -1503,6 +1504,7 @@ static int
> >   mt7530_port_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering,
> >   			   struct netlink_ext_ack *extack)
> >   {
> > +	struct mt7530_priv *priv = ds->priv;
>
> Add a space to separate declaration from code.

OK

> > --- a/drivers/net/dsa/mt7530.h
> > +++ b/drivers/net/dsa/mt7530.h
> > @@ -8,7 +8,6 @@
> >
> >   #define MT7530_NUM_PORTS		7
> >   #define MT7530_NUM_PHYS			5
> > -#define MT7530_CPU_PORT			6
>
> We could have kept this define and rename it MT7530_DEFAULT_CPU_PORT or
> something and in m7530_probe() use that newly renamed constant to
> illustrate that we have a default value assigned, just in case.

i do

> >   #define MT7530_NUM_FDB_RECORDS		2048
> >   #define MT7530_ALL_MEMBERS		0xff
> >
> > @@ -823,6 +822,7 @@ struct mt7530_priv {
> >   	u8			mirror_tx;
> >
> >   	struct mt7530_port	ports[MT7530_NUM_PORTS];
> > +	int			cpu_port;
>
> This can be an unsigned integer since you do not assign negative values.
> With that fixes, this looks good to me.

ok, i change to unsigned int

regards Frank


^ permalink raw reply

* [PATCH net v2] tls: Skip tls_append_frag on zero copy size
From: Maxim Mikityanskiy @ 2022-04-26 15:49 UTC (permalink / raw)
  To: Boris Pismenny, John Fastabend, Daniel Borkmann, Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Tariq Toukan, Aviad Yehezkel,
	netdev, Maxim Mikityanskiy

Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.

If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.

This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/tls/tls_device.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 12f7b56771d9..af875ad4a822 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
 		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
 		copy = min_t(size_t, copy, (max_open_record_len - record->len));
 
-		rc = tls_device_copy_data(page_address(pfrag->page) +
-					  pfrag->offset, copy, msg_iter);
-		if (rc)
-			goto handle_error;
-		tls_append_frag(record, pfrag, copy);
+		if (copy) {
+			rc = tls_device_copy_data(page_address(pfrag->page) +
+						  pfrag->offset, copy, msg_iter);
+			if (rc)
+				goto handle_error;
+			tls_append_frag(record, pfrag, copy);
+		}
 
 		size -= copy;
 		if (!size) {
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH net] tls: Skip tls_append_frag on zero copy size
From: Maxim Mikityanskiy @ 2022-04-26 15:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Boris Pismenny, John Fastabend, Daniel Borkmann, David S. Miller,
	Paolo Abeni, Tariq Toukan, Aviad Yehezkel, Ilya Lesokhin, netdev
In-Reply-To: <20220422075502.27532722@kernel.org>

On 2022-04-22 17:55, Jakub Kicinski wrote:
> On Thu, 21 Apr 2022 12:47:18 +0300 Maxim Mikityanskiy wrote:
>> On 2022-04-18 17:56, Maxim Mikityanskiy wrote:
>>> On 2022-04-14 13:28, Jakub Kicinski wrote:
>>>> I appreciate you're likely trying to keep the fix minimal but Greg
>>>> always says "fix it right, worry about backports later".
>>>>
>>>> I think we should skip more, we can reorder the mins and if
>>>> min(size, rec space) == 0 then we can skip the allocation as well.
>>>
>>> Sorry, I didn't get the idea. Could you elaborate?
>>>
>>> Reordering the mins:
>>>
>>> copy = min_t(size_t, size, max_open_record_len - record->len);
>>> copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
>>>
>>> I assume by skipping the allocation you mean skipping
>>> tls_do_allocation(), right? Do you suggest to skip it if the result of
>>> the first min_t() is 0?
>>>
>>> record->len used in the first min_t() comes from ctx->open_record, which
>>> either exists or is allocated by tls_do_allocation(). If we move the
>>> copy == 0 check above the tls_do_allocation() call, first we'll have to
>>> check whether ctx->open_record is NULL, which is currently checked by
>>> tls_do_allocation() itself.
>>>
>>> If open_record is not NULL, there isn't much to skip in
>>> tls_do_allocation on copy == 0, the main part is already skipped,
>>> regardless of the value of copy. If open_record is NULL, we can't skip
>>> tls_do_allocation, and copy won't be 0 afterwards.
>>>
>>> To compare, before (pseudocode):
>>>
>>> tls_do_allocation {
>>>       if (!ctx->open_record)
>>>           ALLOCATE RECORD
>>>           Now ctx->open_record is not NULL
>>>       if (!sk_page_frag_refill(sk, pfrag))
>>>           return -ENOMEM
>>> }
>>> handle errors from tls_do_allocation
>>> copy = min(size, pfrag->size - pfrag->offset)
>>> copy = min(copy, max_open_record_len - ctx->open_record->len)
>>> if (copy)
>>>       copy data and append frag
>>>
>>> After:
>>>
>>> if (ctx->open_record) {
>>>       copy = min(size, max_open_record_len - ctx->open_record->len)
>>>       if (copy) {
>>>           // You want to put this part of tls_do_allocation under if (copy)?
>>>           if (!sk_page_frag_refill(sk, pfrag))
>>>               handle errors
>>>           copy = min(copy, pfrag->size - pfrag->offset)
>>>           if (copy)
>>>               copy data and append frag
>>>       }
>>> } else {
>>>       ALLOCATE RECORD
>>>       if (!sk_page_frag_refill(sk, pfrag))
>>>           handle errors
>>>       // Have to do this after the allocation anyway.
>>>       copy = min(size, max_open_record_len - ctx->open_record->len)
>>>       copy = min(copy, pfrag->size - pfrag->offset)
>>>       if (copy)
>>>           copy data and append frag
>>> }
>>>
>>> Either I totally don't get what you suggested, or it doesn't make sense
>>> to me, because we have +1 branch in the common path when a record is
>>> open and copy is not 0, no changes when there is no record, and more
>>> repeating code hard to compress.
>>>
>>> If I missed your idea, please explain in more details.
>>
>> Jakub, is your comment still relevant after my response? If not, can the
>> patch be merged?
> 
> I'd prefer if you refactored the code so tls_push_data() looks more
> natural.

I would be happy to improve the code, but I honestly didn't understand 
your idea. My attempt to understand it only made the code worse.

> But the patch is correct so if you don't want to you can
> repost.

OK, I'm resubmitting as is, but in case you find time to elaborate on 
your refactoring idea, I'm still open to suggestions.

Thanks.

> Sorry for the delay.


^ permalink raw reply

* Re: [RFC v1 2/3] net: dsa: mt753x: make CPU-Port dynamic
From: Florian Fainelli @ 2022-04-26 15:45 UTC (permalink / raw)
  To: Frank Wunderlich, linux-mediatek, linux-rockchip
  Cc: Frank Wunderlich, Rob Herring, Krzysztof Kozlowski,
	Heiko Stuebner, Sean Wang, Landen Chao, DENG Qingfang,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Matthias Brugger, Peter Geis,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <20220426134924.30372-3-linux@fw-web.de>

On 4/26/22 06:49, Frank Wunderlich wrote:
> From: Frank Wunderlich <frank-w@public-files.de>
> 
> Currently CPU-Port is hardcoded to Port 6.
> 
> On BPI-R2-Pro board this port is not connected and only Port 5 is
> connected to gmac of SoC.
> 
> Replace this hardcoded CPU-Port with a member in mt7530_priv struct
> which is set in mt753x_cpu_port_enable to the right port.
> 
> I defined a default in probe (in case no CPU-Port will be setup) and
> if both cpu-port were setup port 6 will be used like the const prior
> this patch.
> 
> In mt7531_setup first access is before we know which port should be used
> (mt753x_cpu_port_enable) so section "BPDU to CPU port" needs to be moved
> down.
> 
> Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
> ---
>   drivers/net/dsa/mt7530.c | 46 ++++++++++++++++++++++------------------
>   drivers/net/dsa/mt7530.h |  2 +-
>   2 files changed, 26 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
> index ccf4cb944167..4789105b8137 100644
> --- a/drivers/net/dsa/mt7530.c
> +++ b/drivers/net/dsa/mt7530.c
> @@ -1004,6 +1004,7 @@ mt753x_cpu_port_enable(struct dsa_switch *ds, int port)
>   			return ret;
>   	}
>   
> +	priv->cpu_port = port;
>   	/* Enable Mediatek header mode on the cpu port */
>   	mt7530_write(priv, MT7530_PVC_P(port),
>   		     PORT_SPEC_TAG);
> @@ -1041,7 +1042,7 @@ mt7530_port_enable(struct dsa_switch *ds, int port,
>   	 * restore the port matrix if the port is the member of a certain
>   	 * bridge.
>   	 */
> -	priv->ports[port].pm |= PCR_MATRIX(BIT(MT7530_CPU_PORT));
> +	priv->ports[port].pm |= PCR_MATRIX(BIT(priv->cpu_port));
>   	priv->ports[port].enable = true;
>   	mt7530_rmw(priv, MT7530_PCR_P(port), PCR_MATRIX_MASK,
>   		   priv->ports[port].pm);
> @@ -1190,8 +1191,8 @@ mt7530_port_bridge_join(struct dsa_switch *ds, int port,
>   			struct netlink_ext_ack *extack)
>   {
>   	struct dsa_port *dp = dsa_to_port(ds, port), *other_dp;
> -	u32 port_bitmap = BIT(MT7530_CPU_PORT);
>   	struct mt7530_priv *priv = ds->priv;
> +	u32 port_bitmap = BIT(priv->cpu_port);

No need to re-order these two lines.

>   
>   	mutex_lock(&priv->reg_mutex);
>   
> @@ -1267,9 +1268,9 @@ mt7530_port_set_vlan_unaware(struct dsa_switch *ds, int port)
>   	 * the CPU port get out of VLAN filtering mode.
>   	 */
>   	if (all_user_ports_removed) {
> -		mt7530_write(priv, MT7530_PCR_P(MT7530_CPU_PORT),
> +		mt7530_write(priv, MT7530_PCR_P(priv->cpu_port),
>   			     PCR_MATRIX(dsa_user_ports(priv->ds)));
> -		mt7530_write(priv, MT7530_PVC_P(MT7530_CPU_PORT), PORT_SPEC_TAG
> +		mt7530_write(priv, MT7530_PVC_P(priv->cpu_port), PORT_SPEC_TAG
>   			     | PVC_EG_TAG(MT7530_VLAN_EG_CONSISTENT));
>   	}
>   }
> @@ -1335,8 +1336,8 @@ mt7530_port_bridge_leave(struct dsa_switch *ds, int port,
>   	 */
>   	if (priv->ports[port].enable)
>   		mt7530_rmw(priv, MT7530_PCR_P(port), PCR_MATRIX_MASK,
> -			   PCR_MATRIX(BIT(MT7530_CPU_PORT)));
> -	priv->ports[port].pm = PCR_MATRIX(BIT(MT7530_CPU_PORT));
> +			   PCR_MATRIX(BIT(priv->cpu_port)));
> +	priv->ports[port].pm = PCR_MATRIX(BIT(priv->cpu_port));
>   
>   	/* When a port is removed from the bridge, the port would be set up
>   	 * back to the default as is at initial boot which is a VLAN-unaware
> @@ -1503,6 +1504,7 @@ static int
>   mt7530_port_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering,
>   			   struct netlink_ext_ack *extack)
>   {
> +	struct mt7530_priv *priv = ds->priv;

Add a space to separate declaration from code.

>   	if (vlan_filtering) {
>   		/* The port is being kept as VLAN-unaware port when bridge is
>   		 * set up with vlan_filtering not being set, Otherwise, the
> @@ -1510,7 +1512,7 @@ mt7530_port_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering,
>   		 * for becoming a VLAN-aware port.
>   		 */
>   		mt7530_port_set_vlan_aware(ds, port);
> -		mt7530_port_set_vlan_aware(ds, MT7530_CPU_PORT);
> +		mt7530_port_set_vlan_aware(ds, priv->cpu_port);
>   	} else {
>   		mt7530_port_set_vlan_unaware(ds, port);
>   	}
> @@ -1526,7 +1528,7 @@ mt7530_hw_vlan_add(struct mt7530_priv *priv,
>   	u32 val;
>   
>   	new_members = entry->old_members | BIT(entry->port) |
> -		      BIT(MT7530_CPU_PORT);
> +		      BIT(priv->cpu_port);
>   
>   	/* Validate the entry with independent learning, create egress tag per
>   	 * VLAN and joining the port as one of the port members.
> @@ -1550,8 +1552,8 @@ mt7530_hw_vlan_add(struct mt7530_priv *priv,
>   	 * DSA tag.
>   	 */
>   	mt7530_rmw(priv, MT7530_VAWD2,
> -		   ETAG_CTRL_P_MASK(MT7530_CPU_PORT),
> -		   ETAG_CTRL_P(MT7530_CPU_PORT,
> +		   ETAG_CTRL_P_MASK(priv->cpu_port),
> +		   ETAG_CTRL_P(priv->cpu_port,
>   			       MT7530_VLAN_EGRESS_STACK));
>   }
>   
> @@ -1575,7 +1577,7 @@ mt7530_hw_vlan_del(struct mt7530_priv *priv,
>   	 * the entry would be kept valid. Otherwise, the entry is got to be
>   	 * disabled.
>   	 */
> -	if (new_members && new_members != BIT(MT7530_CPU_PORT)) {
> +	if (new_members && new_members != BIT(priv->cpu_port)) {
>   		val = IVL_MAC | VTAG_EN | PORT_MEM(new_members) |
>   		      VLAN_VALID;
>   		mt7530_write(priv, MT7530_VAWD1, val);
> @@ -2105,7 +2107,7 @@ mt7530_setup(struct dsa_switch *ds)
>   	 * controller also is the container for two GMACs nodes representing
>   	 * as two netdev instances.
>   	 */
> -	dn = dsa_to_port(ds, MT7530_CPU_PORT)->master->dev.of_node->parent;
> +	dn = dsa_to_port(ds, priv->cpu_port)->master->dev.of_node->parent;
>   	ds->assisted_learning_on_cpu_port = true;
>   	ds->mtu_enforcement_ingress = true;
>   
> @@ -2337,15 +2339,6 @@ mt7531_setup(struct dsa_switch *ds)
>   	mt7531_ind_c45_phy_write(priv, MT753X_CTRL_PHY_ADDR, MDIO_MMD_VEND2,
>   				 CORE_PLL_GROUP4, val);
>   
> -	/* BPDU to CPU port */
> -	mt7530_rmw(priv, MT7531_CFC, MT7531_CPU_PMAP_MASK,
> -		   BIT(MT7530_CPU_PORT));
> -	mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK,
> -		   MT753X_BPDU_CPU_ONLY);
> -
> -	/* Enable and reset MIB counters */
> -	mt7530_mib_reset(ds);
> -
>   	for (i = 0; i < MT7530_NUM_PORTS; i++) {
>   		/* Disable forwarding by default on all ports */
>   		mt7530_rmw(priv, MT7530_PCR_P(i), PCR_MATRIX_MASK,
> @@ -2373,6 +2366,15 @@ mt7531_setup(struct dsa_switch *ds)
>   			   PVC_EG_TAG(MT7530_VLAN_EG_CONSISTENT));
>   	}
>   
> +	/* BPDU to CPU port */
> +	mt7530_rmw(priv, MT7531_CFC, MT7531_CPU_PMAP_MASK,
> +		   BIT(priv->cpu_port));
> +	mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK,
> +		   MT753X_BPDU_CPU_ONLY);
> +
> +	/* Enable and reset MIB counters */
> +	mt7530_mib_reset(ds);
> +
>   	/* Setup VLAN ID 0 for VLAN-unaware bridges */
>   	ret = mt7530_setup_vlan0(priv);
>   	if (ret)
> @@ -3213,6 +3215,8 @@ mt7530_probe(struct mdio_device *mdiodev)
>   	if (!priv)
>   		return -ENOMEM;
>   
> +	priv->cpu_port = 6;
> +
>   	priv->ds = devm_kzalloc(&mdiodev->dev, sizeof(*priv->ds), GFP_KERNEL);
>   	if (!priv->ds)
>   		return -ENOMEM;
> diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
> index 91508e2feef9..62df8d10f6d4 100644
> --- a/drivers/net/dsa/mt7530.h
> +++ b/drivers/net/dsa/mt7530.h
> @@ -8,7 +8,6 @@
>   
>   #define MT7530_NUM_PORTS		7
>   #define MT7530_NUM_PHYS			5
> -#define MT7530_CPU_PORT			6

We could have kept this define and rename it MT7530_DEFAULT_CPU_PORT or 
something and in m7530_probe() use that newly renamed constant to 
illustrate that we have a default value assigned, just in case.

>   #define MT7530_NUM_FDB_RECORDS		2048
>   #define MT7530_ALL_MEMBERS		0xff
>   
> @@ -823,6 +822,7 @@ struct mt7530_priv {
>   	u8			mirror_tx;
>   
>   	struct mt7530_port	ports[MT7530_NUM_PORTS];
> +	int			cpu_port;

This can be an unsigned integer since you do not assign negative values. 
With that fixes, this looks good to me.
-- 
Florian

^ permalink raw reply

* Re: [RFC v1 1/3] net: dsa: mt753x: make reset optional
From: Florian Fainelli @ 2022-04-26 15:42 UTC (permalink / raw)
  To: Frank Wunderlich, linux-mediatek, linux-rockchip
  Cc: Frank Wunderlich, Rob Herring, Krzysztof Kozlowski,
	Heiko Stuebner, Sean Wang, Landen Chao, DENG Qingfang,
	Andrew Lunn, Vivien Didelot, Vladimir Oltean, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Matthias Brugger, Peter Geis,
	devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <20220426134924.30372-2-linux@fw-web.de>

On 4/26/22 06:49, Frank Wunderlich wrote:
> From: Frank Wunderlich <frank-w@public-files.de>
> 
> Currently a reset line is required, but on BPI-R2-Pro board
> this reset is shared with the gmac and prevents the switch to
> be initialized because mdio is not ready fast enough after
> the reset.
> 
> So make the reset optional to allow shared reset lines.
> 
> Signed-off-by: Frank Wunderlich <frank-w@public-files.de>

This looks fine however 
Documentation/devicetree/bindings/net/dsa/mt7530.txt still has some 
verbiage that suggests that the 'reset' property is mandatory, so you 
might need to update the binding (and as a separate patch we should make 
it YAML).
-- 
Florian

^ permalink raw reply

* [PATCH net] bnx2x: fix napi API usage sequence
From: Manish Chopra @ 2022-04-26 15:39 UTC (permalink / raw)
  To: kuba; +Cc: drc, netdev, aelior, davem

While handling PCI errors (AER flow) driver tries to
disable NAPI [napi_disable()] after NAPI is deleted
[__netif_napi_del()] which causes unexpected system
hang/crash.

System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor: 168e14e4 [ 3222.583557] EEH: PCI cmd/status register: 00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20: 00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
--> driver unload

Uninterruptible tasks
=====================
crash> ps | grep UN
     213      2  11  c000000004c89e00  UN   0.0       0      0  [eehd]
     215      2   0  c000000004c80000  UN   0.0       0      0
[kworker/0:2]
    2196      1  28  c000000004504f00  UN   0.1   15936  11136  wickedd
    4287      1   9  c00000020d076800  UN   0.0    4032   3008  agetty
    4289      1  20  c00000020d056680  UN   0.0    7232   3840  agetty
   32423      2  26  c00000020038c580  UN   0.0       0      0
[kworker/26:3]
   32871   4241  27  c0000002609ddd00  UN   0.1   18624  11648  sshd
   32920  10130  16  c00000027284a100  UN   0.1   48512  12608  sendmail
   33092  32987   0  c000000205218b00  UN   0.1   48512  12608  sendmail
   33154   4567  16  c000000260e51780  UN   0.1   48832  12864  pickup
   33209   4241  36  c000000270cb6500  UN   0.1   18624  11712  sshd
   33473  33283   0  c000000205211480  UN   0.1   48512  12672  sendmail
   33531   4241  37  c00000023c902780  UN   0.1   18624  11648  sshd

EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213    TASK: c000000004c89e00  CPU: 11  COMMAND: "eehd"
  #0 [c000000004d477e0] __schedule at c000000000c70808
  #1 [c000000004d478b0] schedule at c000000000c70ee0
  #2 [c000000004d478e0] schedule_timeout at c000000000c76dec
  #3 [c000000004d479c0] msleep at c0000000002120cc
  #4 [c000000004d479f0] napi_disable at c000000000a06448
                                        ^^^^^^^^^^^^^^^^
  #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
  #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
  #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
  #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
  #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64

And the sleeping source code
============================
crash> dis -ls c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702

   6697  {
   6698          might_sleep();
   6699          set_bit(NAPI_STATE_DISABLE, &n->state);
   6700
   6701          while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702                  msleep(1);
   6703          while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
   6704                  msleep(1);
   6705
   6706          hrtimer_cancel(&n->timer);
   6707
   6708          clear_bit(NAPI_STATE_DISABLE, &n->state);
   6709  }

EEH calls into bnx2x twice based on the system log above, first through
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
the following call chains:

bnx2x_io_error_detected()
  +-> bnx2x_eeh_nic_unload()
       +-> bnx2x_del_all_napi()
            +-> __netif_napi_del()

bnx2x_io_slot_reset()
  +-> bnx2x_netif_stop()
       +-> bnx2x_napi_disable()
            +->napi_disable()

Fix this by correcting the sequence of NAPI APIs usage,
that is delete the NAPI after disabling it.

Fixes: 7fa6f34081f1 ("bnx2x: AER revised")
Reported-by: David Christensen <drc@linux.vnet.ibm.com>
Tested-by: David Christensen <drc@linux.vnet.ibm.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index c19b072f3a23..962253db25b8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14153,10 +14153,6 @@ static int bnx2x_eeh_nic_unload(struct bnx2x *bp)
 
 	/* Stop Tx */
 	bnx2x_tx_disable(bp);
-	/* Delete all NAPI objects */
-	bnx2x_del_all_napi(bp);
-	if (CNIC_LOADED(bp))
-		bnx2x_del_all_napi_cnic(bp);
 	netdev_reset_tc(bp->dev);
 
 	del_timer_sync(&bp->timer);
@@ -14261,6 +14257,11 @@ static pci_ers_result_t bnx2x_io_slot_reset(struct pci_dev *pdev)
 		bnx2x_drain_tx_queues(bp);
 		bnx2x_send_unload_req(bp, UNLOAD_RECOVERY);
 		bnx2x_netif_stop(bp, 1);
+		bnx2x_del_all_napi(bp);
+
+		if (CNIC_LOADED(bp))
+			bnx2x_del_all_napi_cnic(bp);
+
 		bnx2x_free_irq(bp);
 
 		/* Report UNLOAD_DONE to MCP */
-- 
2.35.1.273.ge6ebfd0


^ permalink raw reply related

* Re: 9p EBADF with cache enabled (Was: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie detected))
From: Christian Schoenebeck @ 2022-04-26 15:38 UTC (permalink / raw)
  To: David Howells
  Cc: asmadeus, David Kahurani, davem, ericvh, kuba, linux-kernel,
	lucho, netdev, v9fs-developer, Greg Kurz
In-Reply-To: <3174158.1650895816@warthog.procyon.org.uk>

On Montag, 25. April 2022 16:10:16 CEST David Howells wrote:
> There may be a quick and dirty workaround.  I think the problem is that
> unless the O_APPEND read starts at the beginning of a page, netfs is going
> to enforce a read.  Does the attached patch fix the problem?  (note that
> it's untested)

Patch doesn't apply for me on master:

checking file fs/9p/vfs_addr.c
Hunk #1 FAILED at 291.
1 out of 1 hunk FAILED
checking file fs/netfs/buffered_read.c
Hunk #1 FAILED at 364.
1 out of 1 hunk FAILED
checking file fs/netfs/internal.h
checking file fs/netfs/stats.c
Hunk #2 FAILED at 38.
1 out of 2 hunks FAILED

commit d615b5416f8a1afeb82d13b238f8152c572d59c0 (HEAD -> master, origin/master, origin/HEAD)
Merge: 0fc74d820a01 4d8ec9120819
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Apr 25 10:53:56 2022 -0700

What was is based on?

> Also, can you get the contents of /proc/fs/fscache/stats from after
> reproducing the problem?

FS-Cache statistics
Cookies: n=684 v=1 vcol=0 voom=0
Acquire: n=689 ok=689 oom=0
LRU    : n=0 exp=0 rmv=0 drp=0 at=0
Invals : n=0
Updates: n=2095 rsz=0 rsn=0
Relinqs: n=5 rtr=0 drop=5
NoSpace: nwr=0 ncr=0 cull=0
IO     : rd=0 wr=0
RdHelp : RA=974 RP=0 WB=13323 WBZ=2072 rr=0 sr=0
RdHelp : ZR=13854 sh=0 sk=2072
RdHelp : DL=14297 ds=14297 df=13322 di=0
RdHelp : RD=0 rs=0 rf=0
RdHelp : WR=0 ws=0 wf=0

Best regards,
Christian Schoenebeck



^ permalink raw reply

* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Andrew Lunn @ 2022-04-26 15:36 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Ido Schimmel, Jakub Kicinski, Ido Schimmel, netdev, davem, pabeni,
	jiri, petrm, dsahern, mlxsw
In-Reply-To: <Ymf8N19bQYcKJJ1g@nanopsycho>

On Tue, Apr 26, 2022 at 04:05:43PM +0200, Jiri Pirko wrote:
> Tue, Apr 26, 2022 at 03:45:48PM CEST, andrew@lunn.ch wrote:
> >> Well, I got your point. If the HW would be designed in the way the
> >> building blocks are exposed to the host, that would work. However, that
> >> is not the case here, unfortunatelly.
> >
> >I'm with Jakub. It is the uAPI which matters here. It should look the
> >same for a SoC style enterprise router and your discombobulated TOR
> >router. How you talk to the different building blocks is an
> >implementation detail.
> 
> It's not that simple. Take the gearbox for example. You say bunch of
> MDIO registers. ASIC FW has a custom SDK internally that is used to
> talk to the gearbox.
> 
> The flash, you say expose by MTD, but there is no access to it directly
> from host. Can't be done. There are HW design limitations that are
> blocking your concept.

The MTD API and your SDK API are abstractions. You give it a blob of
data and ask it to write it to the storage. Somehow that happens. Does
user space need to know MTD or an SDK is being used? Does user space
care? I would expect the same uAPI for both, here is a firmware blob,
write it to storage. The driver knows if it needs to use the MTD API
or the SDK API, it is all abstracted away in the driver.

	Andrew


^ permalink raw reply

* Re: [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists
From: Paolo Abeni @ 2022-04-26 15:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev
In-Reply-To: <CANn89iLuqGdbHkyUcTZd+Ww6vUxqNg0L4eC5Xt8bqLMDmDM18w@mail.gmail.com>

On Tue, 2022-04-26 at 06:11 -0700, Eric Dumazet wrote:
> On Tue, Apr 26, 2022 at 12:38 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Fri, 2022-04-22 at 13:12 -0700, Eric Dumazet wrote:
> > [...]
> > > @@ -6571,6 +6577,28 @@ static int napi_threaded_poll(void *data)
> > >       return 0;
> > >  }
> > > 
> > > +static void skb_defer_free_flush(struct softnet_data *sd)
> > > +{
> > > +     struct sk_buff *skb, *next;
> > > +     unsigned long flags;
> > > +
> > > +     /* Paired with WRITE_ONCE() in skb_attempt_defer_free() */
> > > +     if (!READ_ONCE(sd->defer_list))
> > > +             return;
> > > +
> > > +     spin_lock_irqsave(&sd->defer_lock, flags);
> > > +     skb = sd->defer_list;
> > 
> > I *think* that this read can possibly be fused with the previous one,
> > and another READ_ONCE() should avoid that.
> 
> Only the lockless read needs READ_ONCE()
> 
> For the one after spin_lock_irqsave(&sd->defer_lock, flags),
> there is no need for any additional barrier.
> 
> If the compiler really wants to use multiple one-byte-at-a-time loads,
> we are not going to fight, there might be good reasons for that.

I'm unsure I explained my doubt in a clear way: what I fear is that the
compiler could emit a single read instruction, corresponding to the
READ_ONCE() outside the lock, so that the spin-locked section will
operate on "old" defer_list. 

If that happens we could end-up with 'defer_count' underestimating the
list lenght. It looks like that is tolerable, as we will still be
protected vs defer_list growing too much.

Acked-by: Paolo Abeni <pabeni@redhat.com>



^ permalink raw reply

* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Andrew Lunn @ 2022-04-26 15:24 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Ido Schimmel, Ido Schimmel, netdev, davem, pabeni,
	jiri, petrm, dsahern, mlxsw
In-Reply-To: <Ymf66h5dMNOLun8k@nanopsycho>

> Does not make sense to me at all. Line cards are detachable PHY sets in
> essence, very basic functionality. They does not have buffers, health
> and params, I don't think so. 

Ido recently added support to ethtool to upgrade the flash in SFPs.
They are far from simple devices. Some of the GPON ones have linux
running in them, that you can log in to.

The real question is, can you do everything you need to do via
existing uAPIs like ethtool and hwmon.

	Andrew

^ permalink raw reply

* Re: [PATCH iproute2-next v3 0/2] f_flower: match on the number of vlan tags
From: Stephen Hemminger @ 2022-04-26 15:11 UTC (permalink / raw)
  To: Boris Sukholitko; +Cc: netdev, David Ahern, Ilya Lifshits
In-Reply-To: <20220426091417.7153-1-boris.sukholitko@broadcom.com>

On Tue, 26 Apr 2022 12:14:15 +0300
Boris Sukholitko <boris.sukholitko@broadcom.com> wrote:

> Hi,
> 
> Our customers in the fiber telecom world have network configurations
> where they would like to control their traffic according to the number
> of tags appearing in the packet.
> 
> For example, TR247 GPON conformance test suite specification mostly
> talks about untagged, single, double tagged packets and gives lax
> guidelines on the vlan protocol vs. number of vlan tags.
> 
> This is different from the common IT networks where 802.1Q and 802.1ad
> protocols are usually describe single and double tagged packet. GPON
> configurations that we work with have arbitrary mix the above protocols
> and number of vlan tags in the packet.
> 
> The following patch series implement number of vlans flower filter. They
> add num_of_vlans flower filter as an alternative to vlan ethtype protocol
> matching. The end result is that the following command becomes possible:
> 
> tc filter add dev eth1 ingress flower \
>   num_of_vlans 1 vlan_prio 5 action drop
> 
> Also, from our logs, we have redirect rules such that:
> 
> tc filter add dev $GPON ingress flower num_of_vlans $N \
>      action mirred egress redirect dev $DEV
> 
> where N can range from 0 to 3 and $DEV is the function of $N.
> 
> Also there are rules setting skb mark based on the number of vlans:
> 
> tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \
>     $P action skbedit mark $M
> 
> Thanks,
> Boris.
> 
> - v3: rebased to the latest iproute2-next
> - v2: add missing f_flower subject prefix
> 
> Boris Sukholitko (2):
>   f_flower: Add num of vlans parameter
>   f_flower: Check args with num_of_vlans
> 
>  tc/f_flower.c | 57 ++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 16 deletions(-)

Can you do this with BPF? instead of kernel change?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox