Netdev List
 help / color / mirror / Atom feed
* [PATCH bpf-next 01/15] bpf: Remove unneeded memset in queue_stack_map creation
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao
In-Reply-To: <20220810151840.16394-1-laoar.shao@gmail.com>

__GFP_ZERO will clear the memory, so we don't need to memset it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/queue_stack_maps.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index a1c0794..8a5e060 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -78,8 +78,6 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
-	memset(qs, 0, sizeof(*qs));
-
 	bpf_map_init_from_attr(&qs->map, attr);
 
 	qs->size = size;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH bpf-next 00/15] bpf: Introduce selectable memcg for bpf map
From: Yafang Shao @ 2022-08-10 15:18 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

On our production environment, we may load, run and pin bpf programs and
maps in containers. For example, some of our networking bpf programs and
maps are loaded and pinned by a process running in a container on our
k8s environment. In this container, there're also running some other
user applications which watch the networking configurations from remote
servers and update them on this local host, log the error events, monitor
the traffic, and do some other stuffs. Sometimes we may need to update 
these user applications to a new release, and in this update process we
will destroy the old container and then start a new genration. In order not
to interrupt the bpf programs in the update process, we will pin the bpf
programs and maps in bpffs. That is the background and use case on our
production environment. 

After switching to memcg-based bpf memory accounting to limit the bpf
memory, some unexpected issues jumped out at us.
1. The memory usage is not consistent between the first generation and
new generations.
2. After the first generation is destroyed, the bpf memory can't be
limited if the bpf maps are not preallocated, because they will be
reparented.

This patchset tries to resolve these issues by introducing an
independent memcg to limit the bpf memory.

In the bpf map creation, we can assign a specific memcg instead of using
the current memcg.  That makes it flexible in containized environment.
For example, if we want to limit the pinned bpf maps, we can use below
hierarchy,

    Shared resources              Private resources 
                                    
     bpf-memcg                      k8s-memcg
     /        \                     /             
bpf-bar-memcg bpf-foo-memcg   srv-foo-memcg        
                  |               /        \
               (charged)     (not charged) (charged)                 
                  |           /              \
                  |          /                \
          bpf-foo-{progs, maps}              srv-foo

srv-foo loads and pins bpf-foo-{progs, maps}, but they are charged to an
independent memcg (bpf-foo-memcg) instead of srv-foo's memcg
(srv-foo-memcg).

Pls. note that there may be no process in bpf-foo-memcg, that means it
can be rmdir-ed by root user currently. Meanwhile we don't forcefully
destroy a memcg if it doesn't have any residents. So this hierarchy is
acceptible. 

In order to make the memcg of bpf maps seletectable, this patchset
introduces some memory allocation wrappers to allocate map related
memory. In these wrappers, it will get the memcg from the map and then
charge the allocated pages or objs.  

Currenly it only supports for bpf map, and we can extend it to bpf prog
as well. It only supports for cgroup2 now, but we can make an additional
change in cgroup_get_from_fd() to support it for cgroup1. 

The observebility can also be supported in the next step, for example,
showing the bpf map's memcg by 'bpftool map show' or even showing which
maps are charged to a specific memcg by 'bpftool cgroup show'.
Furthermore, we may also show an accurate memory size of a bpf map
instead of an estimated memory size in 'bpftool map show' in the future. 

RFC->v1:
- get rid of bpf_map container wrapper (Alexei)
- add the new field into the end of struct (Alexei)
- get rid of BPF_F_SELECTABLE_MEMCG (Alexei)
- save memcg in bpf_map_init_from_attr
- introduce bpf_ringbuf_pages_{alloc,free} and keep them inside
  kernel/bpf/ringbuf.c  (Andrii)

Yafang Shao (15):
  bpf: Remove unneeded memset in queue_stack_map creation
  bpf: Use bpf_map_area_free instread of kvfree
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Fix incorrect mem_cgroup_put
  bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM
  bpf: Call bpf_map_init_from_attr() immediately after map creation
  bpf: Save memcg in bpf_map_init_from_attr()
  bpf: Use scoped-based charge in bpf_map_area_alloc
  bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free}
  bpf: Use bpf_map_kzalloc in arraymap
  bpf: Use bpf_map_kvcalloc in bpf_local_storage
  mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  bpf: Add return value for bpf_map_init_from_attr
  bpf: Introduce selectable memcg for bpf map

 include/linux/bpf.h            |  43 ++++++++++++-
 include/linux/memcontrol.h     |  11 ++++
 include/uapi/linux/bpf.h       |   1 +
 kernel/bpf/arraymap.c          |  34 ++++++-----
 kernel/bpf/bloom_filter.c      |  11 +++-
 kernel/bpf/bpf_local_storage.c |  17 ++++--
 kernel/bpf/bpf_struct_ops.c    |  19 +++---
 kernel/bpf/cpumap.c            |  17 ++++--
 kernel/bpf/devmap.c            |  30 ++++++----
 kernel/bpf/hashtab.c           |  26 ++++----
 kernel/bpf/local_storage.c     |  12 ++--
 kernel/bpf/lpm_trie.c          |  12 +++-
 kernel/bpf/offload.c           |  12 ++--
 kernel/bpf/queue_stack_maps.c  |  13 ++--
 kernel/bpf/reuseport_array.c   |  11 +++-
 kernel/bpf/ringbuf.c           | 104 ++++++++++++++++++++++----------
 kernel/bpf/stackmap.c          |  13 ++--
 kernel/bpf/syscall.c           | 133 ++++++++++++++++++++++++++++-------------
 mm/memcontrol.c                |  41 +++++++++++++
 net/core/sock_map.c            |  30 ++++++----
 net/xdp/xskmap.c               |  12 +++-
 tools/include/uapi/linux/bpf.h |   1 +
 tools/lib/bpf/bpf.c            |   3 +-
 tools/lib/bpf/bpf.h            |   3 +-
 tools/lib/bpf/gen_loader.c     |   2 +-
 tools/lib/bpf/libbpf.c         |   2 +
 tools/lib/bpf/skel_internal.h  |   2 +-
 27 files changed, 436 insertions(+), 179 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* Re: ethernet<n> dt aliases implications in U-Boot and Linux
From: Andrew Lunn @ 2022-08-10 15:17 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Tim Harvey, Pali Rohár, Sean Anderson, Stephen Hemminger,
	netdev, u-boot, Device Tree Mailing List
In-Reply-To: <20220810071603.GR17705@kitsune.suse.cz>

> > I guess you are new to the netdev list :-)
> > 
> > This is one of those FAQ sort of things, discussed every
> > year. Anything like this is always NACKed. I don't see why this time
> > should be any different.
> > 
> > DSA is somewhat special because it is very old. It comes from before
> > the times of DT. Its DT binding was proposed relatively earl in DT
> > times, and would be rejected in modern days. But the rules of ABI mean
> > the label property will be valid forever. But i very much doubt it
> > will spread to interfaces in general.
> 
> And if this is a FAQ maybe you can point to a summary (perhaps in
> previous mail discusssion) that explains how to provide stable interface
> names for Ethernet devices on a DT based platform?

As far so the kernel is concerned, interface names are unstable. They
have never been truly stable, but they have got more unstable in the
past decade with multiple busses being probed in parallel, which did
not happen before so much.

> On x86 there is a name derived from the device location in the bus
> topology

This is nothing to do with x86. That is userspace, e.g. systemd,
renaming the interfaces. This applies for any architecture for which
systemd runs on.

> which may be somewhat stable but it is not clear that it
> cannot change, and there is an optional BIOS provided table that can
> asssign meaningful names to the interfaces.

I doubt the kernel is looking at ACPI tables. It is user space which
does that.

The kernel provides udev with a bunch of information about the
interface, its bus location, MAC address, etc. Userspace can then
decide what it wants to call it, and what its alternative names are,
etc.

Also, this is not just a network interface name problem. Any device
with a number/letter in it is unstable. I2C bus devices: i2c0,
i2c1... SPI bus deviceS: spi0, spi1..., Block devices, sda, sdb, sdc,
TPM device, tpm0, tpm1. Nothing is stable in the kernel.

       Andrew

^ permalink raw reply

* Re: [PATCH bpf-next v4] selftests: xsk: Update poll test cases
From: Daniel Borkmann @ 2022-08-10 15:17 UTC (permalink / raw)
  To: Maciej Fijalkowski, Shibin Koikkara Reeny
  Cc: bpf, ast, netdev, magnus.karlsson, bjorn, kuba, andrii,
	ciara.loftus
In-Reply-To: <YvOtvgdSnOhUd9Po@boxer>

On 8/10/22 3:08 PM, Maciej Fijalkowski wrote:
> On Wed, Aug 03, 2022 at 02:43:54PM +0000, Shibin Koikkara Reeny wrote:
>> Poll test case was not testing all the functionality
>> of the poll feature in the testsuite. This patch
>> update the poll test case which contain 2 testcases to
> 
> updates, contains, test cases
> 
>> test the RX and the TX poll functionality and additional
>> 2 more testcases to check the timeout features of the
> 
> feature
> 
>> poll event.
>>
>> Poll testsuite have 4 test cases:
> 
> test suite, has
> 
>>
>> 1. TEST_TYPE_RX_POLL:
>> Check if RX path POLLIN function work as expect. TX path
> 
> works
> 
>> can use any method to sent the traffic.
> 
> send
> 
>>
>> 2. TEST_TYPE_TX_POLL:
>> Check if TX path POLLOUT function work as expect. RX path
>> can use any method to receive the traffic.
>>
>> 3. TEST_TYPE_POLL_RXQ_EMPTY:
>> Call poll function with parameter POLLIN on empty rx queue
>> will cause timeout.If return timeout then test case is pass.
> 
> space after dot
> 
>>
>> 4. TEST_TYPE_POLL_TXQ_FULL:
>> When txq is filled and packets are not cleaned by the kernel
>> then if we invoke the poll function with POLLOUT then it
>> should trigger timeout.
>>
>> v1: https://lore.kernel.org/bpf/20220718095712.588513-1-shibin.koikkara.reeny@intel.com/
>> v2: https://lore.kernel.org/bpf/20220726101723.250746-1-shibin.koikkara.reeny@intel.com/
>> v3: https://lore.kernel.org/bpf/20220729132337.211443-1-shibin.koikkara.reeny@intel.com/
>>
>> Changes in v2:
>>   * Updated the commit message
>>   * fixed the while loop flow in receive_pkts function.
>> Changes in v3:
>>   * Introduced single thread validation function.
>>   * Removed pkt_stream_invalid().
>>   * Updated TEST_TYPE_POLL_TXQ_FULL testcase to create invalid frame.
>>   * Removed timer from send_pkts().
>>   * Removed boolean variable skip_rx and skip_tx.
>> Change in v4:
>>   * Added is_umem_valid()
> 
> for single patches, I believe that it's concerned a better practice to
> include the versioning below the '---' line?
> 
>>
>> Signed-off-by: Shibin Koikkara Reeny <shibin.koikkara.reeny@intel.com>
>> ---
>>   tools/testing/selftests/bpf/xskxceiver.c | 166 +++++++++++++++++------
>>   tools/testing/selftests/bpf/xskxceiver.h |   8 +-
>>   2 files changed, 134 insertions(+), 40 deletions(-)
> 
> I don't think these grammar suggestions require a new revision, so:
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

I cleaned these up while applying. Shibin, please take care of this before sending
out next time, thanks guys!

^ permalink raw reply

* [PATCH 02/15] bpf: Use bpf_map_area_free instread of kvfree
From: Yafang Shao @ 2022-08-10 15:13 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao
In-Reply-To: <20220810151322.16163-1-laoar.shao@gmail.com>

bpf_map_area_alloc() should be paired with bpf_map_area_free().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/ringbuf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index ded4fae..3fb54fe 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -116,7 +116,7 @@ static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
 err_free_pages:
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	kvfree(pages);
+	bpf_map_area_free(pages);
 	return NULL;
 }
 
@@ -190,7 +190,7 @@ static void bpf_ringbuf_free(struct bpf_ringbuf *rb)
 	vunmap(rb);
 	for (i = 0; i < nr_pages; i++)
 		__free_page(pages[i]);
-	kvfree(pages);
+	bpf_map_area_free(pages);
 }
 
 static void ringbuf_map_free(struct bpf_map *map)
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 01/15] bpf: Remove unneeded memset in queue_stack_map creation
From: Yafang Shao @ 2022-08-10 15:13 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao
In-Reply-To: <20220810151322.16163-1-laoar.shao@gmail.com>

__GFP_ZERO will clear the memory, so we don't need to memset it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/queue_stack_maps.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index a1c0794..8a5e060 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -78,8 +78,6 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
 	if (!qs)
 		return ERR_PTR(-ENOMEM);
 
-	memset(qs, 0, sizeof(*qs));
-
 	bpf_map_init_from_attr(&qs->map, attr);
 
 	qs->size = size;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 00/15] bpf: Introduce selectable memcg for bpf map
From: Yafang Shao @ 2022-08-10 15:13 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, hannes, mhocko, roman.gushchin,
	shakeelb, songmuchun, akpm
  Cc: netdev, bpf, linux-mm, Yafang Shao

On our production environment, we may load, run and pin bpf programs and
maps in containers. For example, some of our networking bpf programs and
maps are loaded and pinned by a process running in a container on our
k8s environment. In this container, there're also running some other
user applications which watch the networking configurations from remote
servers and update them on this local host, log the error events, monitor
the traffic, and do some other stuffs. Sometimes we may need to update 
these user applications to a new release, and in this update process we
will destroy the old container and then start a new genration. In order not
to interrupt the bpf programs in the update process, we will pin the bpf
programs and maps in bpffs. That is the background and use case on our
production environment. 

After switching to memcg-based bpf memory accounting to limit the bpf
memory, some unexpected issues jumped out at us.
1. The memory usage is not consistent between the first generation and
new generations.
2. After the first generation is destroyed, the bpf memory can't be
limited if the bpf maps are not preallocated, because they will be
reparented.

This patchset tries to resolve these issues by introducing an
independent memcg to limit the bpf memory.

In the bpf map creation, we can assign a specific memcg instead of using
the current memcg.  That makes it flexible in containized environment.
For example, if we want to limit the pinned bpf maps, we can use below
hierarchy,

    Shared resources              Private resources 
                                    
     bpf-memcg                      k8s-memcg
     /        \                     /             
bpf-bar-memcg bpf-foo-memcg   srv-foo-memcg        
                  |               /        \
               (charged)     (not charged) (charged)                 
                  |           /              \
                  |          /                \
          bpf-foo-{progs, maps}              srv-foo

srv-foo loads and pins bpf-foo-{progs, maps}, but they are charged to an
independent memcg (bpf-foo-memcg) instead of srv-foo's memcg
(srv-foo-memcg).

Pls. note that there may be no process in bpf-foo-memcg, that means it
can be rmdir-ed by root user currently. Meanwhile we don't forcefully
destroy a memcg if it doesn't have any residents. So this hierarchy is
acceptible. 

In order to make the memcg of bpf maps seletectable, this patchset
introduces some memory allocation wrappers to allocate map related
memory. In these wrappers, it will get the memcg from the map and then
charge the allocated pages or objs.  

Currenly it only supports for bpf map, and we can extend it to bpf prog
as well. It only supports for cgroup2 now, but we can make an additional
change in cgroup_get_from_fd() to support it for cgroup1. 

The observebility can also be supported in the next step, for example,
showing the bpf map's memcg by 'bpftool map show' or even showing which
maps are charged to a specific memcg by 'bpftool cgroup show'.
Furthermore, we may also show an accurate memory size of a bpf map
instead of an estimated memory size in 'bpftool map show' in the future. 

RFC->v1:
- get rid of bpf_map container wrapper (Alexei)
- add the new field into the end of struct (Alexei)
- get rid of BPF_F_SELECTABLE_MEMCG (Alexei)
- save memcg in bpf_map_init_from_attr
- introduce bpf_ringbuf_pages_{alloc,free} and keep them inside
  kernel/bpf/ringbuf.c  (Andrii)

Yafang Shao (15):
  bpf: Remove unneeded memset in queue_stack_map creation
  bpf: Use bpf_map_area_free instread of kvfree
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Fix incorrect mem_cgroup_put
  bpf: Define bpf_map_{get,put}_memcg for !CONFIG_MEMCG_KMEM
  bpf: Call bpf_map_init_from_attr() immediately after map creation
  bpf: Save memcg in bpf_map_init_from_attr()
  bpf: Use scoped-based charge in bpf_map_area_alloc
  bpf: Introduce new helpers bpf_ringbuf_pages_{alloc,free}
  bpf: Use bpf_map_kzalloc in arraymap
  bpf: Use bpf_map_kvcalloc in bpf_local_storage
  mm, memcg: Add new helper get_obj_cgroup_from_cgroup
  bpf: Add return value for bpf_map_init_from_attr
  bpf: Introduce selectable memcg for bpf map

 include/linux/bpf.h            |  43 ++++++++++++-
 include/linux/memcontrol.h     |  11 ++++
 include/uapi/linux/bpf.h       |   1 +
 kernel/bpf/arraymap.c          |  34 ++++++-----
 kernel/bpf/bloom_filter.c      |  11 +++-
 kernel/bpf/bpf_local_storage.c |  17 ++++--
 kernel/bpf/bpf_struct_ops.c    |  19 +++---
 kernel/bpf/cpumap.c            |  17 ++++--
 kernel/bpf/devmap.c            |  30 ++++++----
 kernel/bpf/hashtab.c           |  26 ++++----
 kernel/bpf/local_storage.c     |  12 ++--
 kernel/bpf/lpm_trie.c          |  12 +++-
 kernel/bpf/offload.c           |  12 ++--
 kernel/bpf/queue_stack_maps.c  |  13 ++--
 kernel/bpf/reuseport_array.c   |  11 +++-
 kernel/bpf/ringbuf.c           | 104 ++++++++++++++++++++++----------
 kernel/bpf/stackmap.c          |  13 ++--
 kernel/bpf/syscall.c           | 133 ++++++++++++++++++++++++++++-------------
 mm/memcontrol.c                |  41 +++++++++++++
 net/core/sock_map.c            |  30 ++++++----
 net/xdp/xskmap.c               |  12 +++-
 tools/include/uapi/linux/bpf.h |   1 +
 tools/lib/bpf/bpf.c            |   3 +-
 tools/lib/bpf/bpf.h            |   3 +-
 tools/lib/bpf/gen_loader.c     |   2 +-
 tools/lib/bpf/libbpf.c         |   2 +
 tools/lib/bpf/skel_internal.h  |   2 +-
 27 files changed, 436 insertions(+), 179 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] net: vlan: fix bridge binding behavior and add selftests
From: Nikolay Aleksandrov @ 2022-08-10 15:10 UTC (permalink / raw)
  To: Sevinj Aghayeva
  Cc: netdev, aroulin, sbrivio, roopa, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, bridge
In-Reply-To: <CAMWRUK4J6Dp7Cff=pN9iw6OwDN8g61dd4S=OVKQ75vBch-PxXQ@mail.gmail.com>

On 10/08/2022 18:00, Sevinj Aghayeva wrote:
> On Wed, Aug 10, 2022 at 10:50 AM Nikolay Aleksandrov
> <razor@blackwall.org> wrote:
>>
>> On 10/08/2022 17:42, Sevinj Aghayeva wrote:
>>>
>>>
>>> On Wed, Aug 10, 2022 at 4:54 AM Nikolay Aleksandrov <razor@blackwall.org <mailto:razor@blackwall.org>> wrote:
>>>
>>>     On 10/08/2022 06:11, Sevinj Aghayeva wrote:
>>>     > When bridge binding is enabled for a vlan interface, it is expected
>>>     > that the link state of the vlan interface will track the subset of the
>>>     > ports that are also members of the corresponding vlan, rather than
>>>     > that of all ports.
>>>     >
>>>     > Currently, this feature works as expected when a vlan interface is
>>>     > created with bridge binding enabled:
>>>     >
>>>     >   ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
>>>     >         bridge_binding on
>>>     >
>>>     > However, the feature does not work when a vlan interface is created
>>>     > with bridge binding disabled, and then enabled later:
>>>     >
>>>     >   ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
>>>     >         bridge_binding off
>>>     >   ip link set vlan10 type vlan bridge_binding on
>>>     >
>>>     > After these two commands, the link state of the vlan interface
>>>     > continues to track that of all ports, which is inconsistent and
>>>     > confusing to users. This series fixes this bug and introduces two
>>>     > tests for the valid behavior.
>>>     >
>>>     > Sevinj Aghayeva (3):
>>>     >   net: core: export call_netdevice_notifiers_info
>>>     >   net: 8021q: fix bridge binding behavior for vlan interfaces
>>>     >   selftests: net: tests for bridge binding behavior
>>>     >
>>>     >  include/linux/netdevice.h                     |   2 +
>>>     >  net/8021q/vlan.h                              |   2 +-
>>>     >  net/8021q/vlan_dev.c                          |  25 ++-
>>>     >  net/core/dev.c                                |   7 +-
>>>     >  tools/testing/selftests/net/Makefile          |   1 +
>>>     >  .../selftests/net/bridge_vlan_binding_test.sh | 143 ++++++++++++++++++
>>>     >  6 files changed, 172 insertions(+), 8 deletions(-)
>>>     >  create mode 100755 tools/testing/selftests/net/bridge_vlan_binding_test.sh
>>>     >
>>>
>>>     Hi,
>>>     NETDEV_CHANGE event is already propagated when the vlan changes flags,
>>>
>>>
>>> I'm not sure if NETDEV_CHANGE is actually propagated when the vlan changes flags. The two functions in the bridge module that handle NETDEV_CHANGE are br_vlan_port_event  and br_vlan_bridge_event. I've installed probes for both, and when I'm changing flags using "sudo ip link set vlan10 type vlan bridge_binding on", I don't see any of those functions getting called, although I do see vlan_dev_change_flags getting called. I think there may be a bug in core/dev.c:__dev_notify_flags.
>>
>> are both vlan and bridge interfaces up?
>> what exactly are you probing for?
> 
> 
> I first run the attached pre.sh script that sets up the environment
> and creates a vlan interface with bridge binding off. I then start
> recording with perf, and here's the list of probes:
> 
> $ sudo ./k/linux/tools/perf/perf probe -l
>   probe:br_vlan_bridge_event (on br_vlan_bridge_event in bridge with event dev)
>   probe:br_vlan_port_event (on br_vlan_port_event in bridge with event)
>   probe:br_vlan_set_vlan_dev_state (on br_vlan_set_vlan_dev_state in
> bridge with br vlan_dev)
>   probe:register_vlan_dev (on register_vlan_dev in 8021q with dev)
>   probe:vlan_changelink (on vlan_changelink in 8021q with dev)
>   probe:vlan_dev_change_flags (on vlan_dev_change_flags in 8021q with dev)
>   probe:vlan_dev_fix_features (on vlan_dev_fix_features in 8021q with dev)
>   probe:vlan_dev_init  (on vlan_dev_init in 8021q with dev)
>   probe:vlan_dev_ioctl (on vlan_dev_ioctl in 8021q with dev)
>   probe:vlan_dev_open  (on vlan_dev_open in 8021q with dev)
>   probe:vlan_dev_stop  (on vlan_dev_stop in 8021q with dev)
>   probe:vlan_dev_uninit (on vlan_dev_uninit in 8021q with dev)
>   probe:vlan_newlink   (on vlan_newlink in 8021q with dev)
> 
> I then run the following command to turn the bridge binding flag on:
> 
> $ sudo ip link set vlan10 type vlan bridge_binding on
> 
> Then I stop the recording and print out the events, and I see this. I
> don't see br_vlan_port_event or br_vlan_bridge_event getting called.
> 
>               ip  5933 [003]  2204.722470:
> probe:vlan_changelink: (ffffffffc1042b50) dev="vlan10"
>               ip  5933 [003]  2204.722476:
> probe:vlan_dev_change_flags: (ffffffffc1042600) dev="vlan10"
> 
> Am I doing something wrong?
> 
> Thanks
> 
> 

You can't expect to see br_vlan_bridge_event() called because the notification
target device is vlan10 and not the bridge. See br_device_event():
...
        if (netif_is_bridge_master(dev)) {
                err = br_vlan_bridge_event(dev, event, ptr);
                if (err)
                        return notifier_from_errno(err);
...


Try probing for br_device_event(), you'll see it gets called every time you change the flag.


^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] net: vlan: fix bridge binding behavior and add selftests
From: Sevinj Aghayeva @ 2022-08-10 15:00 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, aroulin, sbrivio, roopa, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, bridge
In-Reply-To: <49f933c3-7430-a133-9add-ed76c395023b@blackwall.org>

[-- Attachment #1: Type: text/plain, Size: 4778 bytes --]

On Wed, Aug 10, 2022 at 10:50 AM Nikolay Aleksandrov
<razor@blackwall.org> wrote:
>
> On 10/08/2022 17:42, Sevinj Aghayeva wrote:
> >
> >
> > On Wed, Aug 10, 2022 at 4:54 AM Nikolay Aleksandrov <razor@blackwall.org <mailto:razor@blackwall.org>> wrote:
> >
> >     On 10/08/2022 06:11, Sevinj Aghayeva wrote:
> >     > When bridge binding is enabled for a vlan interface, it is expected
> >     > that the link state of the vlan interface will track the subset of the
> >     > ports that are also members of the corresponding vlan, rather than
> >     > that of all ports.
> >     >
> >     > Currently, this feature works as expected when a vlan interface is
> >     > created with bridge binding enabled:
> >     >
> >     >   ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
> >     >         bridge_binding on
> >     >
> >     > However, the feature does not work when a vlan interface is created
> >     > with bridge binding disabled, and then enabled later:
> >     >
> >     >   ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
> >     >         bridge_binding off
> >     >   ip link set vlan10 type vlan bridge_binding on
> >     >
> >     > After these two commands, the link state of the vlan interface
> >     > continues to track that of all ports, which is inconsistent and
> >     > confusing to users. This series fixes this bug and introduces two
> >     > tests for the valid behavior.
> >     >
> >     > Sevinj Aghayeva (3):
> >     >   net: core: export call_netdevice_notifiers_info
> >     >   net: 8021q: fix bridge binding behavior for vlan interfaces
> >     >   selftests: net: tests for bridge binding behavior
> >     >
> >     >  include/linux/netdevice.h                     |   2 +
> >     >  net/8021q/vlan.h                              |   2 +-
> >     >  net/8021q/vlan_dev.c                          |  25 ++-
> >     >  net/core/dev.c                                |   7 +-
> >     >  tools/testing/selftests/net/Makefile          |   1 +
> >     >  .../selftests/net/bridge_vlan_binding_test.sh | 143 ++++++++++++++++++
> >     >  6 files changed, 172 insertions(+), 8 deletions(-)
> >     >  create mode 100755 tools/testing/selftests/net/bridge_vlan_binding_test.sh
> >     >
> >
> >     Hi,
> >     NETDEV_CHANGE event is already propagated when the vlan changes flags,
> >
> >
> > I'm not sure if NETDEV_CHANGE is actually propagated when the vlan changes flags. The two functions in the bridge module that handle NETDEV_CHANGE are br_vlan_port_event  and br_vlan_bridge_event. I've installed probes for both, and when I'm changing flags using "sudo ip link set vlan10 type vlan bridge_binding on", I don't see any of those functions getting called, although I do see vlan_dev_change_flags getting called. I think there may be a bug in core/dev.c:__dev_notify_flags.
>
> are both vlan and bridge interfaces up?
> what exactly are you probing for?


I first run the attached pre.sh script that sets up the environment
and creates a vlan interface with bridge binding off. I then start
recording with perf, and here's the list of probes:

$ sudo ./k/linux/tools/perf/perf probe -l
  probe:br_vlan_bridge_event (on br_vlan_bridge_event in bridge with event dev)
  probe:br_vlan_port_event (on br_vlan_port_event in bridge with event)
  probe:br_vlan_set_vlan_dev_state (on br_vlan_set_vlan_dev_state in
bridge with br vlan_dev)
  probe:register_vlan_dev (on register_vlan_dev in 8021q with dev)
  probe:vlan_changelink (on vlan_changelink in 8021q with dev)
  probe:vlan_dev_change_flags (on vlan_dev_change_flags in 8021q with dev)
  probe:vlan_dev_fix_features (on vlan_dev_fix_features in 8021q with dev)
  probe:vlan_dev_init  (on vlan_dev_init in 8021q with dev)
  probe:vlan_dev_ioctl (on vlan_dev_ioctl in 8021q with dev)
  probe:vlan_dev_open  (on vlan_dev_open in 8021q with dev)
  probe:vlan_dev_stop  (on vlan_dev_stop in 8021q with dev)
  probe:vlan_dev_uninit (on vlan_dev_uninit in 8021q with dev)
  probe:vlan_newlink   (on vlan_newlink in 8021q with dev)

I then run the following command to turn the bridge binding flag on:

$ sudo ip link set vlan10 type vlan bridge_binding on

Then I stop the recording and print out the events, and I see this. I
don't see br_vlan_port_event or br_vlan_bridge_event getting called.

              ip  5933 [003]  2204.722470:
probe:vlan_changelink: (ffffffffc1042b50) dev="vlan10"
              ip  5933 [003]  2204.722476:
probe:vlan_dev_change_flags: (ffffffffc1042600) dev="vlan10"

Am I doing something wrong?

Thanks



>
>
> I can see the NETDEV_CHANGE event go through when changing the loose binding.
>
>
>
>


-- 

Sevinj.Aghayeva

[-- Attachment #2: pre.sh --]
[-- Type: application/x-sh, Size: 1128 bytes --]

^ permalink raw reply

* Re: [PATCH RFC net-next 0/3] net: vlan: fix bridge binding behavior and add selftests
From: Nikolay Aleksandrov @ 2022-08-10 14:50 UTC (permalink / raw)
  To: Sevinj Aghayeva
  Cc: netdev, aroulin, sbrivio, roopa, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, bridge
In-Reply-To: <CAMWRUK45nbZS3PeSLR1X=Ko6oavrjKj2AWeh2F1wckMPrz_dEg@mail.gmail.com>

On 10/08/2022 17:42, Sevinj Aghayeva wrote:
> 
> 
> On Wed, Aug 10, 2022 at 4:54 AM Nikolay Aleksandrov <razor@blackwall.org <mailto:razor@blackwall.org>> wrote:
> 
>     On 10/08/2022 06:11, Sevinj Aghayeva wrote:
>     > When bridge binding is enabled for a vlan interface, it is expected
>     > that the link state of the vlan interface will track the subset of the
>     > ports that are also members of the corresponding vlan, rather than
>     > that of all ports.
>     >
>     > Currently, this feature works as expected when a vlan interface is
>     > created with bridge binding enabled:
>     >
>     >   ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
>     >         bridge_binding on
>     >
>     > However, the feature does not work when a vlan interface is created
>     > with bridge binding disabled, and then enabled later:
>     >
>     >   ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
>     >         bridge_binding off
>     >   ip link set vlan10 type vlan bridge_binding on
>     >
>     > After these two commands, the link state of the vlan interface
>     > continues to track that of all ports, which is inconsistent and
>     > confusing to users. This series fixes this bug and introduces two
>     > tests for the valid behavior.
>     >
>     > Sevinj Aghayeva (3):
>     >   net: core: export call_netdevice_notifiers_info
>     >   net: 8021q: fix bridge binding behavior for vlan interfaces
>     >   selftests: net: tests for bridge binding behavior
>     >
>     >  include/linux/netdevice.h                     |   2 +
>     >  net/8021q/vlan.h                              |   2 +-
>     >  net/8021q/vlan_dev.c                          |  25 ++-
>     >  net/core/dev.c                                |   7 +-
>     >  tools/testing/selftests/net/Makefile          |   1 +
>     >  .../selftests/net/bridge_vlan_binding_test.sh | 143 ++++++++++++++++++
>     >  6 files changed, 172 insertions(+), 8 deletions(-)
>     >  create mode 100755 tools/testing/selftests/net/bridge_vlan_binding_test.sh
>     >
> 
>     Hi,
>     NETDEV_CHANGE event is already propagated when the vlan changes flags,
> 
> 
> I'm not sure if NETDEV_CHANGE is actually propagated when the vlan changes flags. The two functions in the bridge module that handle NETDEV_CHANGE are br_vlan_port_event  and br_vlan_bridge_event. I've installed probes for both, and when I'm changing flags using "sudo ip link set vlan10 type vlan bridge_binding on", I don't see any of those functions getting called, although I do see vlan_dev_change_flags getting called. I think there may be a bug in core/dev.c:__dev_notify_flags.

are both vlan and bridge interfaces up?
what exactly are you probing for?

I can see the NETDEV_CHANGE event go through when changing the loose binding.





^ permalink raw reply

* [PATCH] can: rx-offload: Break loop on queue full
From: Uwe Kleine-König @ 2022-08-10 14:45 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde; +Cc: linux-can, netdev, kernel

The following happend on an i.MX25 using flexcan with many packets on
the bus:

The rx-offload queue reached a length more than skb_queue_len_max. So in
can_rx_offload_offload_one() the drop variable was set to true which
made the call to .mailbox_read() (here: flexcan_mailbox_read()) just
return ERR_PTR(-ENOBUFS) (plus some irrelevant hardware interaction) and
so can_rx_offload_offload_one() returned ERR_PTR(-ENOBUFS), too.

Now can_rx_offload_irq_offload_fifo() looks as follows:

	while (1) {
		skb = can_rx_offload_offload_one(offload, 0);
		if (IS_ERR(skb))
			continue;
		...
	}

As the i.MX25 is a single core CPU while the rx-offload processing is
active there is no thread to process packets from the offload queue and
so it doesn't get shorter.

The result is a tight loop: can_rx_offload_offload_one() does nothing
relevant and returns an error code and so
can_rx_offload_irq_offload_fifo() calls can_rx_offload_offload_one()
again.

To break that loop don't continue calling can_rx_offload_offload_one()
after it reported an error.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
Hello,

this patch just implements the obvious change to break the loop. I'm not
100% certain that there is no corner case where the break is wrong. The
problem exists at least since v5.6, didn't go back further to check.

This fixes a hard hang on said i.MX25.

Best regards
Uwe

 drivers/net/can/dev/rx-offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/dev/rx-offload.c b/drivers/net/can/dev/rx-offload.c
index a32a01c172d4..d5d33692bb6a 100644
--- a/drivers/net/can/dev/rx-offload.c
+++ b/drivers/net/can/dev/rx-offload.c
@@ -207,7 +207,7 @@ int can_rx_offload_irq_offload_fifo(struct can_rx_offload *offload)
 	while (1) {
 		skb = can_rx_offload_offload_one(offload, 0);
 		if (IS_ERR(skb))
-			continue;
+			break;
 		if (!skb)
 			break;
 
-- 
2.36.1


^ permalink raw reply related

* Re: [RFCv7 PATCH net-next 23/36] net: adjust the build check for net_gso_ok()
From: shenjian (K) @ 2022-08-10 14:41 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: davem, kuba, andrew, ecree.xilinx, hkallweit1, saeed, leon,
	netdev, linuxarm
In-Reply-To: <20220810110917.1307697-1-alexandr.lobakin@intel.com>



在 2022/8/10 19:09, Alexander Lobakin 写道:
> From: Jian Shen <shenjian15@huawei.com>
> Date: Wed, 10 Aug 2022 11:06:11 +0800
>
>> Introduce macro GSO_INDEX(x) to replace the NETIF_F_XXX
>> feature shift check, for all the macroes NETIF_F_XXX will
>> be remove later.
>>
>> Signed-off-by: Jian Shen <shenjian15@huawei.com>
>> ---
>>   include/linux/netdevice.h | 40 ++++++++++++++++++++-------------------
>>   1 file changed, 21 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 1bd5dcbc884d..b01af2a3838d 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -4886,28 +4886,30 @@ netdev_features_t netif_skb_features(struct sk_buff *skb);
>>   
>>   static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>>   {
>> +#define GSO_INDEX(x)	((1ULL << (x)) >> NETIF_F_GSO_SHIFT)
> What if we get a new GSO offload which's corresponding bit will be
> higher than 64?
> You could instead do
>
> #define __SKB_GSO_FLAG(x)	(1ULL << (x))
>
> enum {
> 	SKB_GSO_TCPV4_BIT	= 0,
> 	SKB_GSO_DODGY_BIT	= 1,
> 	...,
> };
> enum {
> 	SKB_GSO_TCPV4		= __SKB_GSO_FLAG(TCPV4),
> 	SKB_GSO_DODGY		= __SKB_GSO_FLAG(DODGY),
> 	...,
> };
>
> and then just
>
> #define ASSERT_GSO_TYPE(fl, feat)	\
> 	static_assert((fl) == (feat) - NETIF_F_GSO_SHIFT)
>
> 	...
> 	ASSERT_GSO_TYPE(SKB_GSO_TCPV4_BIT, NETIF_F_TSO_BIT);
> 	ASSERT_GSO_TYPE(SKB_GSO_DODGY, NETIF_F_GSO_ROBUST_BIT);
> 	...
Yes, it may be misused for new GSO offload bit higher than 64, the macro
__SKB_GSO_FLAG(x) is better.

>> +
>>   	netdev_features_t feature = (netdev_features_t)gso_type << NETIF_F_GSO_SHIFT;
>>   
>>   	/* check flags correspondence */
>> -	BUILD_BUG_ON(SKB_GSO_TCPV4   != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_DODGY   != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_TCPV6   != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_FCOE    != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_GRE     != (NETIF_F_GSO_GRE >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_GRE_CSUM != (NETIF_F_GSO_GRE_CSUM >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_IPXIP4  != (NETIF_F_GSO_IPXIP4 >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_IPXIP6  != (NETIF_F_GSO_IPXIP6 >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != (NETIF_F_GSO_UDP_TUNNEL >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_PARTIAL != (NETIF_F_GSO_PARTIAL >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != (NETIF_F_GSO_TUNNEL_REMCSUM >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_SCTP    != (NETIF_F_GSO_SCTP >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT));
>> +	BUILD_BUG_ON(SKB_GSO_TCPV4   != GSO_INDEX(NETIF_F_TSO_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_DODGY   != GSO_INDEX(NETIF_F_GSO_ROBUST_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_TCP_ECN != GSO_INDEX(NETIF_F_TSO_ECN_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != GSO_INDEX(NETIF_F_TSO_MANGLEID_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_TCPV6   != GSO_INDEX(NETIF_F_TSO6_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_FCOE    != GSO_INDEX(NETIF_F_FSO_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_GRE     != GSO_INDEX(NETIF_F_GSO_GRE_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_GRE_CSUM != GSO_INDEX(NETIF_F_GSO_GRE_CSUM_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_IPXIP4  != GSO_INDEX(NETIF_F_GSO_IPXIP4_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_IPXIP6  != GSO_INDEX(NETIF_F_GSO_IPXIP6_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != GSO_INDEX(NETIF_F_GSO_UDP_TUNNEL_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != GSO_INDEX(NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_PARTIAL != GSO_INDEX(NETIF_F_GSO_PARTIAL_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != GSO_INDEX(NETIF_F_GSO_TUNNEL_REMCSUM_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_SCTP    != GSO_INDEX(NETIF_F_GSO_SCTP_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_ESP != GSO_INDEX(NETIF_F_GSO_ESP_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_UDP != GSO_INDEX(NETIF_F_GSO_UDP_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_UDP_L4 != GSO_INDEX(NETIF_F_GSO_UDP_L4_BIT));
>> +	BUILD_BUG_ON(SKB_GSO_FRAGLIST != GSO_INDEX(NETIF_F_GSO_FRAGLIST_BIT));
>>   
>>   	return (features & feature) == feature;
>>   }
>> -- 
>> 2.33.0
> Thanks,
> Olek
> .
>


^ permalink raw reply

* Re: [PATCH v2] net:bonding:support balance-alb interface with vlan to bridge
From: Jay Vosburgh @ 2022-08-10 14:32 UTC (permalink / raw)
  To: Sun Shouxin
  Cc: vfalico, andy, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel, razor, huyd12
In-Reply-To: <20220809062103.31213-1-sunshouxin@chinatelecom.cn>

Sun Shouxin <sunshouxin@chinatelecom.cn> wrote:

>In my test, balance-alb bonding with two slaves eth0 and eth1,
>and then Bond0.150 is created with vlan id attached bond0.
>After adding bond0.150 into one linux bridge, I noted that Bond0,
>bond0.150 and  bridge were assigned to the same MAC as eth0.
>Once bond0.150 receives a packet whose dest IP is bridge's
>and dest MAC is eth1's, the linux bridge will not match
>eth1's MAC entry in FDB, and not handle it as expected.
>The patch fix the issue, and diagram as below:
>
>eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac)
>                      |
>                   bond0.150(mac:eth0_mac)
>                      |
>                   bridge(ip:br_ip, mac:eth0_mac)--other port
>
>Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
>Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>

	As Nik suggested, please add some additional explanation here.
You can cut and paste my description from the original discussion if
you'd like.

>---
>
>changelog:
>v1->v2:
>  -declare variabls in reverse xmas tree order
>  -delete {}
>  -add explanation in commit message
>---
> drivers/net/bonding/bond_alb.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 007d43e46dcb..60cb9a0225aa 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -653,6 +653,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb,
> static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
> {
> 	struct slave *tx_slave = NULL;
>+	struct net_device *dev;
> 	struct arp_pkt *arp;
> 
> 	if (!pskb_network_may_pull(skb, sizeof(*arp)))
>@@ -665,6 +666,12 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
> 	if (!bond_slave_has_mac_rx(bond, arp->mac_src))
> 		return NULL;
> 
>+	dev = ip_dev_find(dev_net(bond->dev), arp->ip_src);
>+	if (dev) {
>+		if (netif_is_bridge_master(dev))
>+			return NULL;

	Stylistically, the "if dev" and "if netif_is_bridge_master"
could be one line, e.g., "if dev && netif_is_bridge_master".

	Functionally, ip_dev_find acquires a reference to dev, and this
code will need to release (dev_put) that reference.

	I'm also wondering if testing bond->dev for netif_if_bridge_port
before ip_dev_find would help here (as an optimization); I think so, for
the case where the bond is directly in the bridge without a VLAN in the
middle.

	-J


>+	}
>+
> 	if (arp->op_code == htons(ARPOP_REPLY)) {
> 		/* the arp must be sent on the selected rx channel */
> 		tx_slave = rlb_choose_channel(skb, bond, arp);
>-- 
>2.27.0
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* Re: [RFCv7 PATCH net-next 17/36] treewide: adjust features initialization
From: shenjian (K) @ 2022-08-10 14:06 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: davem, kuba, andrew, ecree.xilinx, hkallweit1, saeed, leon,
	netdev, linuxarm
In-Reply-To: <20220810105831.1307150-1-alexandr.lobakin@intel.com>



在 2022/8/10 18:58, Alexander Lobakin 写道:
> From: Jian Shen <shenjian15@huawei.com>
> Date: Wed, 10 Aug 2022 11:06:05 +0800
>
>> There are many direclty single bit assignment to netdev features.
>> Adjust these expressions, so can use netdev features helpers later.
>>
>> Signed-off-by: Jian Shen <shenjian15@huawei.com>
>> ---
>>   arch/um/drivers/vector_kern.c                       | 5 ++++-
>>   drivers/firewire/net.c                              | 4 +++-
>>   drivers/infiniband/hw/hfi1/vnic_main.c              | 4 +++-
>>   drivers/misc/sgi-xp/xpnet.c                         | 3 ++-
>>   drivers/net/can/dev/dev.c                           | 4 +++-
>>   drivers/net/ethernet/alacritech/slicoss.c           | 4 +++-
>>   drivers/net/ethernet/amd/xgbe/xgbe-drv.c            | 4 +++-
>>   drivers/net/ethernet/aquantia/atlantic/aq_nic.c     | 3 ++-
>>   drivers/net/ethernet/atheros/atlx/atl2.c            | 4 +++-
>>   drivers/net/ethernet/cadence/macb_main.c            | 4 +++-
>>   drivers/net/ethernet/davicom/dm9000.c               | 4 +++-
>>   drivers/net/ethernet/engleder/tsnep_main.c          | 4 +++-
>>   drivers/net/ethernet/ibm/ibmveth.c                  | 3 ++-
>>   drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 4 +++-
>>   drivers/net/ethernet/mellanox/mlx4/en_netdev.c      | 3 ++-
>>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c   | 4 +++-
>>   drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 7 +++++--
>>   drivers/net/ethernet/netronome/nfp/nfp_net_repr.c   | 3 ++-
>>   drivers/net/ethernet/ni/nixge.c                     | 4 +++-
>>   drivers/net/ethernet/renesas/sh_eth.c               | 6 ++++--
>>   drivers/net/ethernet/sun/sunhme.c                   | 7 +++++--
>>   drivers/net/ethernet/toshiba/ps3_gelic_net.c        | 6 ++++--
>>   drivers/net/ethernet/toshiba/spider_net.c           | 3 ++-
>>   drivers/net/ethernet/tundra/tsi108_eth.c            | 3 ++-
>>   drivers/net/ethernet/xilinx/ll_temac_main.c         | 4 +++-
>>   drivers/net/ethernet/xilinx/xilinx_axienet_main.c   | 4 +++-
>>   drivers/net/hamradio/bpqether.c                     | 4 +++-
>>   drivers/net/hyperv/netvsc_drv.c                     | 3 ++-
>>   drivers/net/ipa/ipa_modem.c                         | 4 +++-
>>   drivers/net/ntb_netdev.c                            | 4 +++-
>>   drivers/net/rionet.c                                | 4 +++-
>>   drivers/net/tap.c                                   | 2 +-
>>   drivers/net/thunderbolt.c                           | 3 ++-
>>   drivers/net/usb/smsc95xx.c                          | 4 +++-
>>   drivers/net/virtio_net.c                            | 4 +++-
>>   drivers/net/wireless/ath/ath10k/mac.c               | 7 +++++--
>>   drivers/net/wireless/ath/ath11k/mac.c               | 4 +++-
>>   drivers/net/wireless/intel/iwlwifi/dvm/mac80211.c   | 4 +++-
>>   drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c   | 4 +++-
>>   drivers/net/wireless/mediatek/mt76/mt7615/init.c    | 4 +++-
>>   drivers/net/wireless/mediatek/mt76/mt7915/init.c    | 4 +++-
>>   drivers/net/wireless/mediatek/mt76/mt7921/init.c    | 4 +++-
>>   drivers/net/wwan/t7xx/t7xx_netdev.c                 | 4 +++-
>>   drivers/s390/net/qeth_core_main.c                   | 7 +++++--
>>   include/net/udp.h                                   | 4 +++-
>>   net/phonet/pep-gprs.c                               | 4 +++-
>>   46 files changed, 138 insertions(+), 52 deletions(-)
>>
>> diff --git a/arch/um/drivers/vector_kern.c b/arch/um/drivers/vector_kern.c
>> index 1d59522a50d8..d797758850e1 100644
>> --- a/arch/um/drivers/vector_kern.c
>> +++ b/arch/um/drivers/vector_kern.c
>> @@ -1628,7 +1628,10 @@ static void vector_eth_configure(
>>   		.bpf			= NULL
>>   	});
>>   
>> -	dev->features = dev->hw_features = (NETIF_F_SG | NETIF_F_FRAGLIST);
>> +	netdev_active_features_zero(dev);
>> +	dev->features |= NETIF_F_SG;
>> +	dev->features |= NETIF_F_FRAGLIST;
>> +	dev->features = dev->hw_features;
> I think a new helper can be useful there and in a couple other
> places, which would set or clear an array of bits taking them as
> varargs:
>
> #define __netdev_features_set_set(feat, uniq, ...) ({	\
> 	DECLARE_NETDEV_FEATURE_SET(uniq, __VA_ARGS__);	\
> 	netdev_features_set_array(feat, &(uniq));	\
> })
> #define netdev_features_set_set(feat, ...)		\
> 	__smth(feat, __UNIQUE_ID(feat_set), __VA_ARGS__)
>
> (name is a placeholder)
>
> so that you can do
>
> 	netdev_active_features_zero(dev);
> 	netdev_features_set_set(dev->features, NETIF_F_SG, NETIF_F_FRAGLIST);
>
> in one take. I think it looks elegant, doesn't it?
good idea. I will try it, thanks!


>>   	INIT_WORK(&vp->reset_tx, vector_reset_tx);
>>   
>>   	timer_setup(&vp->tl, vector_timer_expire, 0);
> [...]
>
>> -- 
>> 2.33.0
> Thanks,
> Olek
>
> .
>


^ permalink raw reply

* [PATCH] fec: Restart PPS after link state change
From: Csókás Bence @ 2022-08-10 14:00 UTC (permalink / raw)
  To: netdev; +Cc: Richard Cochran, Csókás Bence

On link state change, the controller gets reset,
causing PPS to drop out. So we restart it if needed.

Signed-off-by: Csókás Bence <csokas.bence@prolan.hu>
---
 drivers/net/ethernet/freescale/fec_main.c | 32 +++++++++++++++++++++--
 drivers/net/ethernet/freescale/fec_ptp.c  |  3 +++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 366c52b62d4b..546a152df4f4 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -257,6 +257,9 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 #define FEC_MMFR_TA		(2 << 16)
 #define FEC_MMFR_DATA(v)	(v & 0xffff)
 /* FEC ECR bits definition */
+#define FEC_ECR_RESET   	(1 << 0)
+#define FEC_ECR_ETHEREN 	(1 << 1)
+#define FEC_ECR_EN1588  	(1 << 4)
 #define FEC_ECR_MAGICEN		(1 << 2)
 #define FEC_ECR_SLEEP		(1 << 3)
 
@@ -955,6 +958,7 @@ fec_restart(struct net_device *ndev)
 	u32 temp_mac[2];
 	u32 rcntl = OPT_FRAME_SIZE | 0x04;
 	u32 ecntl = 0x2; /* ETHEREN */
+	struct ptp_clock_request ptp_rq = { .type = PTP_CLK_REQ_PPS };
 
 	/* Whack a reset.  We should wait for this.
 	 * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
@@ -1106,7 +1110,7 @@ fec_restart(struct net_device *ndev)
 	}
 
 	if (fep->bufdesc_ex)
-		ecntl |= (1 << 4);
+		ecntl |= FEC_ECR_EN1588;
 
 #ifndef CONFIG_M5272
 	/* Enable the MIB statistic event counters */
@@ -1120,6 +1124,13 @@ fec_restart(struct net_device *ndev)
 	if (fep->bufdesc_ex)
 		fec_ptp_start_cyclecounter(ndev);
 
+	/* Restart PPS if needed */
+	if (fep->pps_enable) {
+		/* Clear flag so fec_ptp_enable_pps() doesn't return immediately */
+		fep->pps_enable = 0;
+		fep->ptp_caps.enable(&fep->ptp_caps, &ptp_rq, 1);
+	}
+
 	/* Enable interrupts we wish to service */
 	if (fep->link)
 		writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
@@ -1155,6 +1166,8 @@ fec_stop(struct net_device *ndev)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	u32 rmii_mode = readl(fep->hwp + FEC_R_CNTRL) & (1 << 8);
 	u32 val;
+	struct ptp_clock_request ptp_rq = { .type = PTP_CLK_REQ_PPS };
+	u32 ecntl = 0;
 
 	/* We cannot expect a graceful transmit stop without link !!! */
 	if (fep->link) {
@@ -1185,12 +1198,27 @@ fec_stop(struct net_device *ndev)
 	}
 	writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
 
+	if (fep->bufdesc_ex)
+		ecntl |= FEC_ECR_EN1588;
+
 	/* We have to keep ENET enabled to have MII interrupt stay working */
 	if (fep->quirks & FEC_QUIRK_ENET_MAC &&
 		!(fep->wol_flag & FEC_WOL_FLAG_SLEEP_ON)) {
-		writel(2, fep->hwp + FEC_ECNTRL);
+		ecntl |= FEC_ECR_ETHEREN;
 		writel(rmii_mode, fep->hwp + FEC_R_CNTRL);
 	}
+
+	writel(ecntl, fep->hwp + FEC_ECNTRL);
+
+	if (fep->bufdesc_ex)
+		fec_ptp_start_cyclecounter(ndev);
+
+	/* Restart PPS if needed */
+	if (fep->pps_enable) {
+		/* Clear flag so fec_ptp_enable_pps() doesn't return immediately */
+		fep->pps_enable = 0;
+		fep->ptp_caps.enable(&fep->ptp_caps, &ptp_rq, 1);
+	}
 }
 
 
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index a5077eff305b..869d149efc53 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -628,6 +628,9 @@ void fec_ptp_stop(struct platform_device *pdev)
 	struct net_device *ndev = platform_get_drvdata(pdev);
 	struct fec_enet_private *fep = netdev_priv(ndev);
 
+	if (fep->pps_enable)
+		fec_ptp_enable_pps(fep, 0);
+
 	cancel_delayed_work_sync(&fep->time_keep);
 	if (fep->ptp_clock)
 		ptp_clock_unregister(fep->ptp_clock);
-- 
2.25.1


^ permalink raw reply related

* [PATCH] skfp/h: fix repeated words in comments
From: Jilin Yuan @ 2022-08-10 13:59 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni; +Cc: netdev, linux-kernel, Jilin Yuan

 Delete the redundant word 'the'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
---
 drivers/net/fddi/skfp/h/hwmtm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/fddi/skfp/h/hwmtm.h b/drivers/net/fddi/skfp/h/hwmtm.h
index 76c4a709d73d..e97db826cdd4 100644
--- a/drivers/net/fddi/skfp/h/hwmtm.h
+++ b/drivers/net/fddi/skfp/h/hwmtm.h
@@ -348,7 +348,7 @@ do {									\
  *		This macro is invoked by the OS-specific before it left the
  *		function mac_drv_rx_complete. This macro calls mac_drv_fill_rxd
  *		if the number of used RxDs is equal or lower than the
- *		the given low water mark.
+ *		given low water mark.
  *
  * para	low_water	low water mark of used RxD's
  *
-- 
2.36.1


^ permalink raw reply related

* Re: [PATCH] net: dsa: mv88e6060: report max mtu 1536
From: Vladimir Oltean @ 2022-08-10 13:35 UTC (permalink / raw)
  To: Sergei Antonov; +Cc: netdev, Florian Fainelli
In-Reply-To: <CABikg9zb7z8p7tE0H+fpmB_NSK3YVS-Sy4sqWbihziFdPBoL+Q@mail.gmail.com>

On Wed, Aug 10, 2022 at 03:00:20PM +0300, Sergei Antonov wrote:
> > >       val = addr[0] << 8 | addr[1];
> > >
> > >       /* The multicast bit is always transmitted as a zero, so the switch uses
> > > @@ -212,6 +211,11 @@ static int mv88e6060_setup(struct dsa_switch *ds)
> > >       return 0;
> > >  }
> > >
> > > +static int mv88e6060_port_max_mtu(struct dsa_switch *ds, int port)
> > > +{
> > > +     return MV88E6060_MAX_MTU;
> > > +}
> >
> > Does this solve any problem? It's ok for the hardware MTU to be higher
> > than advertised. The problem is when the hardware doesn't accept what
> > the stack thinks it should.
> 
> I need some time to reconstruct the problem. IIRC there was an attempt
> to set MTU 1504 (1500 + a switch overhead), but can not reproduce it
> at the moment.

What kernel are you using? According to Documentation/process/maintainer-netdev.rst,
you should test the patches you submit against the master branch from one of
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
or
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
depending on whether it's a new feature or if it fixes a problem.

Currently, both net and net-next contain the same thing (we are in a
merge window so net-next will not progress until kernel 6.0-rc1 is cut),
which is that dsa_slave_change_mtu() will not do anything because of
this:

	if (!ds->ops->port_change_mtu)
		return -EOPNOTSUPP;

(which mv88e6060 does not implement)

So I am slightly doubtful that anyone attempts an MTU change for this
switch, as you say.

The DSA master (host port, not switch), on the other hand, is a
different story. Its MTU is updated to 1504 by dsa_master_setup().

> > You're the first person to submit a patch on mv88e6060 that I see.
> > Is there a board with this switch available somewhere? Does the driver
> > still work?
> 
> Very nice to get your feedback. Because, yes, I am working with a
> device which has mv88e6060, it is called MOXA NPort 6610.
> 
> The driver works now. There was one problem which I had to workaround.
> Inside my device only ports 2 and 5 are used, so I initially wrote in
> .dts:
>         switch@0 {
>                 compatible = "marvell,mv88e6060";
>                 reg = <16>;

reg = <16> for switch@0? Something is wrong, probably switch@0.

> 
>                 ports {
>                         #address-cells = <1>;
>                         #size-cells = <0>;
> 
>                         port@2 {
>                                 reg = <2>;
>                                 label = "lan2";
>                         };
> 
>                         port@5 {
>                                 reg = <5>;
>                                 label = "cpu";
>                                 ethernet = <&mac1>;
>                         };
>                 };
>         };
> and the driver crashed in mv88e6060_setup_port() on a null pointer.
> Two workarounds are possible:
> 1. Describe ports 0, 1, 3, 4 in .dts too.

No, it should work with just port@2 and port@5 defined.

> 2. Insert this code at the beginning of mv88e6060_setup_port():
> if(!dsa_is_cpu_port(priv->ds, p) && !dsa_to_port(priv->ds, p)->cpu_dp)
>     return 0;
> 'cpu_dp' was the null pointer the driver crashed at.

You mean here:

			(dsa_is_cpu_port(priv->ds, p) ?
			 dsa_user_ports(priv->ds) :
			 BIT(dsa_to_port(priv->ds, p)->cpu_dp->index)));

Yes, this is a limitation that has been made worse by blind code
conversions (nobody seems to have the hardware or to know someone who
does; I've been tempted to delete the driver a few times or at least to
move it to staging, because of the unrealistically long delays until
someone chirps that something is broken for it, even when it obviously is).
The driver assumes that if the port isn't a CPU port, it's a user port.
That's clearly false.

You can probably put this at the beginning of mv88e6060_setup_port():

	if (dsa_is_unused_port(priv->ds, p))
		return 0;

The bug seems to have been introduced by commit 0abfd494deef ("net: dsa:
use dedicated CPU port"), because, although before we'd be uselessly
programming the port VLAN for a disabled port, now in doing so, we
dereference a NULL pointer.

FWIW, in case there is ever a need to backport, the vintage-correct fix
would be to use something like this:

	if (!dsa_port_is_valid(priv->ds->ports[p]))
		return 0;

but in that case the process is:
- send patch against current "net" tree
- wait until patch is queued up for "linux-stable" and backported as far
  as possible
- email will be sent that patch failed to apply to the still-maintained
  LTS branches as far as the Fixes: tag required (this is why it is
  important to populate the Fixes: tag correctly)
- reply to that email with a manually backported patch, just for that
  stable tree (linux-4.14.y etc)

> 
> One more observation. Generating and setting a random MAC in
> mv88e6060_setup_addr() is not necessary - the switch works without it
> (at least in my case).

The GLOBAL_MAC address that the switch uses there will be used as MAC SA
in PAUSE frames (802.3 flow control). Not clear if you were aware of
that fact when saying that the switch "works without it". In other words,
if you make a change in that area, I expect that flow control is what
you test, and not, say, ping.

It's true that some other switches use a MAC SA of 00:00:00:00:00:00 for
PAUSE frames (ocelot_init_port) and this hasn't caused a problem for them.
I don't know if the 6060 supports this mode. If it does, it's worth a shot.

^ permalink raw reply

* Re: [PATCH 1/8] perf arm64: Send pointer auth masks to ring buffer
From: Arnaldo Carvalho de Melo @ 2022-08-10 13:23 UTC (permalink / raw)
  To: Andrew Kilroy
  Cc: linux-kernel, linux-perf-users, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Tom Rix,
	linux-arm-kernel, netdev, bpf, llvm
In-Reply-To: <20220704145333.22557-2-andrew.kilroy@arm.com>

Em Mon, Jul 04, 2022 at 03:53:25PM +0100, Andrew Kilroy escreveu:
> Perf report cannot produce callgraphs using dwarf on arm64 where pointer
> authentication is enabled.  This is because libunwind and libdw cannot
> unmangle instruction pointers that have a pointer authentication code
> (PAC) embedded in them.
> 
> libunwind and libdw need to be given an instruction mask which they can
> use to arrive at the correct return address that does not contain the
> PAC.
> 
> The bits in the return address that contain the PAC can differ by
> process, so this patch adds a new sample field PERF_SAMPLE_ARCH_1
> to allow the kernel to send the masks up to userspace perf.
> 
> This field can be used in a architecture specific fashion, but on
> aarch64, it contains the ptrauth mask information.

I'm not seeing this kernel patch applied to tip/master or
torvalds/master, what is the status of that part? Then I can look at the
tooling part.

- Arnaldo
 
> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
> ---
>  arch/arm64/include/asm/arch_sample_data.h | 38 +++++++++++++++++++++++
>  arch/arm64/kernel/Makefile                |  2 +-
>  arch/arm64/kernel/arch_sample_data.c      | 37 ++++++++++++++++++++++
>  include/linux/perf_event.h                | 24 ++++++++++++++
>  include/uapi/linux/perf_event.h           |  5 ++-
>  kernel/events/core.c                      | 35 +++++++++++++++++++++
>  6 files changed, 139 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/include/asm/arch_sample_data.h
>  create mode 100644 arch/arm64/kernel/arch_sample_data.c
> 
> diff --git a/arch/arm64/include/asm/arch_sample_data.h b/arch/arm64/include/asm/arch_sample_data.h
> new file mode 100644
> index 000000000000..83fda293b1fc
> --- /dev/null
> +++ b/arch/arm64/include/asm/arch_sample_data.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _ASM_ARCH_SAMPLE_DATA_H
> +#define _ASM_ARCH_SAMPLE_DATA_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * Structure holding masks to help userspace stack unwinding
> + * in the presence of arm64 pointer authentication.
> + */
> +struct ptrauth_info {
> +	/*
> +	 * Bits 0, 1, 2, 3, 4 may be set to on, to indicate which keys are being used
> +	 * The APIAKEY, APIBKEY, APDAKEY, APDBKEY, or the APGAKEY respectively.
> +	 * Where all bits are off, pointer authentication is not in use for the
> +	 * process.
> +	 */
> +	u64 enabled_keys;
> +
> +	/*
> +	 * The on bits represent which bits in an instruction pointer
> +	 * constitute the pointer authentication code.
> +	 */
> +	u64 insn_mask;
> +
> +	/*
> +	 * The on bits represent which bits in a data pointer constitute the
> +	 * pointer authentication code.
> +	 */
> +	u64 data_mask;
> +};
> +
> +struct arch_sample_data {
> +	struct ptrauth_info ptrauth;
> +};
> +
> +#endif
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index fa7981d0d917..843c6e0e2393 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -44,7 +44,7 @@ obj-$(CONFIG_KUSER_HELPERS)		+= kuser32.o
>  obj-$(CONFIG_FUNCTION_TRACER)		+= ftrace.o entry-ftrace.o
>  obj-$(CONFIG_MODULES)			+= module.o
>  obj-$(CONFIG_ARM64_MODULE_PLTS)		+= module-plts.o
> -obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
> +obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o arch_sample_data.o
>  obj-$(CONFIG_HW_PERF_EVENTS)		+= perf_event.o
>  obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
>  obj-$(CONFIG_CPU_PM)			+= sleep.o suspend.o
> diff --git a/arch/arm64/kernel/arch_sample_data.c b/arch/arm64/kernel/arch_sample_data.c
> new file mode 100644
> index 000000000000..2d47e8db0dbe
> --- /dev/null
> +++ b/arch/arm64/kernel/arch_sample_data.c
> @@ -0,0 +1,37 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <asm/arch_sample_data.h>
> +#include <linux/perf_event.h>
> +
> +inline void perf_output_sample_arch_1(struct perf_output_handle *handle,
> +				      struct perf_event_header *header,
> +				      struct perf_sample_data *data,
> +				      struct perf_event *event)
> +{
> +	perf_output_put(handle, data->arch.ptrauth.enabled_keys);
> +	perf_output_put(handle, data->arch.ptrauth.insn_mask);
> +	perf_output_put(handle, data->arch.ptrauth.data_mask);
> +}
> +
> +inline void perf_prepare_sample_arch_1(struct perf_event_header *header,
> +				       struct perf_sample_data *data,
> +				       struct perf_event *event,
> +				       struct pt_regs *regs)
> +{
> +	struct task_struct *task = current;
> +	int keys_result = ptrauth_get_enabled_keys(task);
> +	u64 user_pac_mask = keys_result > 0 ? ptrauth_user_pac_mask() : 0;
> +
> +	data->arch.ptrauth.enabled_keys = keys_result > 0 ? keys_result : 0;
> +	data->arch.ptrauth.insn_mask = user_pac_mask;
> +	data->arch.ptrauth.data_mask = user_pac_mask;
> +
> +	header->size += (3 * sizeof(u64));
> +}
> +
> +inline int perf_event_open_request_arch_1(void)
> +{
> +	return 0;
> +}
> +
> +
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index da759560eec5..8a99942989ce 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -999,6 +999,29 @@ int perf_event_read_local(struct perf_event *event, u64 *value,
>  extern u64 perf_event_read_value(struct perf_event *event,
>  				 u64 *enabled, u64 *running);
>  
> +void perf_output_sample_arch_1(struct perf_output_handle *handle,
> +			       struct perf_event_header *header,
> +			       struct perf_sample_data *data,
> +			       struct perf_event *event);
> +
> +void perf_prepare_sample_arch_1(struct perf_event_header *header,
> +				struct perf_sample_data *data,
> +				struct perf_event *event,
> +				struct pt_regs *regs);
> +
> +int perf_event_open_request_arch_1(void);
> +
> +#if IS_ENABLED(CONFIG_ARM64)
> +
> +#define HAS_ARCH_SAMPLE_DATA
> +#include <asm/arch_sample_data.h>
> +
> +#endif
> +
> +#ifndef HAS_ARCH_SAMPLE_DATA
> +struct arch_sample_data {
> +};
> +#endif
>  
>  struct perf_sample_data {
>  	/*
> @@ -1041,6 +1064,7 @@ struct perf_sample_data {
>  	u64				cgroup;
>  	u64				data_page_size;
>  	u64				code_page_size;
> +	struct arch_sample_data		arch;
>  } ____cacheline_aligned;
>  
>  /* default value for data source */
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d37629dbad72..821bf5ff6a19 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -162,12 +162,15 @@ enum perf_event_sample_format {
>  	PERF_SAMPLE_DATA_PAGE_SIZE		= 1U << 22,
>  	PERF_SAMPLE_CODE_PAGE_SIZE		= 1U << 23,
>  	PERF_SAMPLE_WEIGHT_STRUCT		= 1U << 24,
> +	PERF_SAMPLE_ARCH_1			= 1U << 25,
>  
> -	PERF_SAMPLE_MAX = 1U << 25,		/* non-ABI */
> +	PERF_SAMPLE_MAX = 1U << 26,		/* non-ABI */
>  
>  	__PERF_SAMPLE_CALLCHAIN_EARLY		= 1ULL << 63, /* non-ABI; internal use */
>  };
>  
> +#define PERF_SAMPLE_ARM64_PTRAUTH PERF_SAMPLE_ARCH_1
> +
>  #define PERF_SAMPLE_WEIGHT_TYPE	(PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)
>  /*
>   * values to program into branch_sample_type when PERF_SAMPLE_BRANCH is set
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 80782cddb1da..89ab8120f4f0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6957,6 +6957,29 @@ static inline bool perf_sample_save_hw_index(struct perf_event *event)
>  	return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX;
>  }
>  
> +#ifndef HAS_ARCH_SAMPLE_DATA
> +
> +inline void perf_output_sample_arch_1(struct perf_output_handle *handle __maybe_unused,
> +				      struct perf_event_header *header __maybe_unused,
> +				      struct perf_sample_data *data __maybe_unused,
> +				      struct perf_event *event __maybe_unused)
> +{
> +}
> +
> +inline void perf_prepare_sample_arch_1(struct perf_event_header *header __maybe_unused,
> +				       struct perf_sample_data *data __maybe_unused,
> +				       struct perf_event *event __maybe_unused,
> +				       struct pt_regs *regs __maybe_unused)
> +{
> +}
> +
> +inline int perf_event_open_request_arch_1(void)
> +{
> +	return -EINVAL;
> +}
> +
> +#endif
> +
>  void perf_output_sample(struct perf_output_handle *handle,
>  			struct perf_event_header *header,
>  			struct perf_sample_data *data,
> @@ -7125,6 +7148,9 @@ void perf_output_sample(struct perf_output_handle *handle,
>  			perf_aux_sample_output(event, handle, data);
>  	}
>  
> +	if (sample_type & PERF_SAMPLE_ARCH_1)
> +		perf_output_sample_arch_1(handle, header, data, event);
> +
>  	if (!event->attr.watermark) {
>  		int wakeup_events = event->attr.wakeup_events;
>  
> @@ -7427,6 +7453,9 @@ void perf_prepare_sample(struct perf_event_header *header,
>  	if (sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)
>  		data->code_page_size = perf_get_page_size(data->ip);
>  
> +	if (sample_type & PERF_SAMPLE_ARCH_1)
> +		perf_prepare_sample_arch_1(header, data, event, regs);
> +
>  	if (sample_type & PERF_SAMPLE_AUX) {
>  		u64 size;
>  
> @@ -12074,6 +12103,12 @@ SYSCALL_DEFINE5(perf_event_open,
>  			return err;
>  	}
>  
> +	if (attr.sample_type & PERF_SAMPLE_ARCH_1) {
> +		err = perf_event_open_request_arch_1();
> +		if (err)
> +			return err;
> +	}
> +
>  	/*
>  	 * In cgroup mode, the pid argument is used to pass the fd
>  	 * opened to the cgroup directory in cgroupfs. The cpu argument
> -- 
> 2.17.1

-- 

- Arnaldo

^ permalink raw reply

* Re: [PATCH net] genetlink: correct uAPI defines
From: patchwork-bot+netdevbpf @ 2022-08-10 13:10 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, edumazet, pabeni, johannes.berg, johannes
In-Reply-To: <20220809232740.405668-1-kuba@kernel.org>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue,  9 Aug 2022 16:27:40 -0700 you wrote:
> Commit 50a896cf2d6f ("genetlink: properly support per-op policy dumping")
> seems to have copy'n'pasted things a little incorrectly.
> 
> The #define CTRL_ATTR_MCAST_GRP_MAX should have stayed right
> after the previous enum. The new CTRL_ATTR_POLICY_* needs
> its own define for MAX and that max should not contain the
> superfluous _DUMP in the name.
> 
> [...]

Here is the summary with links:
  - [net] genetlink: correct uAPI defines
    https://git.kernel.org/netdev/net/c/f329a0ebeaba

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH bpf-next v4] selftests: xsk: Update poll test cases
From: Maciej Fijalkowski @ 2022-08-10 13:08 UTC (permalink / raw)
  To: Shibin Koikkara Reeny
  Cc: bpf, ast, daniel, netdev, magnus.karlsson, bjorn, kuba, andrii,
	ciara.loftus
In-Reply-To: <20220803144354.98122-1-shibin.koikkara.reeny@intel.com>

On Wed, Aug 03, 2022 at 02:43:54PM +0000, Shibin Koikkara Reeny wrote:
> Poll test case was not testing all the functionality
> of the poll feature in the testsuite. This patch
> update the poll test case which contain 2 testcases to

updates, contains, test cases

> test the RX and the TX poll functionality and additional
> 2 more testcases to check the timeout features of the

feature

> poll event.
> 
> Poll testsuite have 4 test cases:

test suite, has

> 
> 1. TEST_TYPE_RX_POLL:
> Check if RX path POLLIN function work as expect. TX path

works

> can use any method to sent the traffic.

send

> 
> 2. TEST_TYPE_TX_POLL:
> Check if TX path POLLOUT function work as expect. RX path
> can use any method to receive the traffic.
> 
> 3. TEST_TYPE_POLL_RXQ_EMPTY:
> Call poll function with parameter POLLIN on empty rx queue
> will cause timeout.If return timeout then test case is pass.

space after dot

> 
> 4. TEST_TYPE_POLL_TXQ_FULL:
> When txq is filled and packets are not cleaned by the kernel
> then if we invoke the poll function with POLLOUT then it
> should trigger timeout.
> 
> v1: https://lore.kernel.org/bpf/20220718095712.588513-1-shibin.koikkara.reeny@intel.com/
> v2: https://lore.kernel.org/bpf/20220726101723.250746-1-shibin.koikkara.reeny@intel.com/
> v3: https://lore.kernel.org/bpf/20220729132337.211443-1-shibin.koikkara.reeny@intel.com/
> 
> Changes in v2:
>  * Updated the commit message
>  * fixed the while loop flow in receive_pkts function.
> Changes in v3:
>  * Introduced single thread validation function.
>  * Removed pkt_stream_invalid().
>  * Updated TEST_TYPE_POLL_TXQ_FULL testcase to create invalid frame.
>  * Removed timer from send_pkts().
>  * Removed boolean variable skip_rx and skip_tx.
> Change in v4:
>  * Added is_umem_valid()

for single patches, I believe that it's concerned a better practice to
include the versioning below the '---' line?

> 
> Signed-off-by: Shibin Koikkara Reeny <shibin.koikkara.reeny@intel.com>
> ---
>  tools/testing/selftests/bpf/xskxceiver.c | 166 +++++++++++++++++------
>  tools/testing/selftests/bpf/xskxceiver.h |   8 +-
>  2 files changed, 134 insertions(+), 40 deletions(-)
> 

I don't think these grammar suggestions require a new revision, so:
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>


^ permalink raw reply

* [PATCH] rds: add missing barrier to release_refill
From: Mikulas Patocka @ 2022-08-10 13:00 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, linux-rdma, rds-devel

The functions clear_bit and set_bit do not imply a memory barrier, thus it
may be possible that the waitqueue_active function (which does not take
any locks) is moved before clear_bit and it could miss a wakeup event.

Fix this bug by adding a memory barrier after clear_bit.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

Index: linux-2.6/net/rds/ib_recv.c
===================================================================
--- linux-2.6.orig/net/rds/ib_recv.c
+++ linux-2.6/net/rds/ib_recv.c
@@ -363,6 +363,7 @@ static int acquire_refill(struct rds_con
 static void release_refill(struct rds_connection *conn)
 {
 	clear_bit(RDS_RECV_REFILL, &conn->c_flags);
+	smp_mb__after_atomic();
 
 	/* We don't use wait_on_bit()/wake_up_bit() because our waking is in a
 	 * hot path and finding waiters is very rare.  We don't want to walk


^ permalink raw reply

* Re: [PATCH v2 5/5] dt-bindings: Drop Dan Murphy and Ricardo Rivera-Matos
From: Sebastian Reichel @ 2022-08-10 12:57 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Michael Hennerich, Jean Delvare, Guenter Roeck, Rob Herring,
	Krzysztof Kozlowski, Jonathan Cameron, Lars-Peter Clausen,
	Pavel Machek, Tim Harvey, Lee Jones, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Liam Girdwood,
	Mark Brown, Andrew Davis, linux-hwmon, devicetree, linux-kernel,
	linux-iio, linux-fbdev, linux-leds, netdev, linux-pm, alsa-devel
In-Reply-To: <20220809162752.10186-6-krzysztof.kozlowski@linaro.org>

[-- Attachment #1: Type: text/plain, Size: 2766 bytes --]

Hi,

On Tue, Aug 09, 2022 at 07:27:52PM +0300, Krzysztof Kozlowski wrote:
> Emails to Dan Murphy and Ricardo Rivera-Matos bounce ("550 Invalid
> recipient").  Andrew Davis agreed to take over the bindings.
> 
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> 
> ---
> 
> Changes since v1:
> 1. Add Andrew Davis instead.
> 2. Not adding accumulated ack due to change above.
> ---
...
>  Documentation/devicetree/bindings/power/supply/bq2515x.yaml    | 3 +--
>  Documentation/devicetree/bindings/power/supply/bq256xx.yaml    | 2 +-
>  Documentation/devicetree/bindings/power/supply/bq25980.yaml    | 3 +--

Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>

-- Sebastian

> [...]
> diff --git a/Documentation/devicetree/bindings/power/supply/bq2515x.yaml b/Documentation/devicetree/bindings/power/supply/bq2515x.yaml
> index 27db38577822..1a1b240034ef 100644
> --- a/Documentation/devicetree/bindings/power/supply/bq2515x.yaml
> +++ b/Documentation/devicetree/bindings/power/supply/bq2515x.yaml
> @@ -8,8 +8,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
>  title: TI bq2515x 500-mA Linear charger family
>  
>  maintainers:
> -  - Dan Murphy <dmurphy@ti.com>
> -  - Ricardo Rivera-Matos <r-rivera-matos@ti.com>
> +  - Andrew Davis <afd@ti.com>
>  
>  description: |
>    The BQ2515x family is a highly integrated battery charge management IC that
> diff --git a/Documentation/devicetree/bindings/power/supply/bq256xx.yaml b/Documentation/devicetree/bindings/power/supply/bq256xx.yaml
> index 91abe5733c41..82f382a7ffb3 100644
> --- a/Documentation/devicetree/bindings/power/supply/bq256xx.yaml
> +++ b/Documentation/devicetree/bindings/power/supply/bq256xx.yaml
> @@ -8,7 +8,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
>  title: TI bq256xx Switch Mode Buck Charger
>  
>  maintainers:
> -  - Ricardo Rivera-Matos <r-rivera-matos@ti.com>
> +  - Andrew Davis <afd@ti.com>
>  
>  description: |
>    The bq256xx devices are a family of highly-integrated battery charge
> diff --git a/Documentation/devicetree/bindings/power/supply/bq25980.yaml b/Documentation/devicetree/bindings/power/supply/bq25980.yaml
> index 4883527ab5c7..b687b8bcd705 100644
> --- a/Documentation/devicetree/bindings/power/supply/bq25980.yaml
> +++ b/Documentation/devicetree/bindings/power/supply/bq25980.yaml
> @@ -8,8 +8,7 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
>  title: TI BQ25980 Flash Charger
>  
>  maintainers:
> -  - Dan Murphy <dmurphy@ti.com>
> -  - Ricardo Rivera-Matos <r-rivera-matos@ti.com>
> +  - Andrew Davis <afd@ti.com>
>  
>  description: |
>    The BQ25980, BQ25975, and BQ25960 are a series of flash chargers intended
> [...]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 1/2] sched/topology: Introduce sched_numa_hop_mask()
From: Tariq Toukan @ 2022-08-10 12:57 UTC (permalink / raw)
  To: Valentin Schneider, netdev, linux-kernel
  Cc: Tariq Toukan, David S. Miller, Saeed Mahameed, Jakub Kicinski,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Eric Dumazet,
	Paolo Abeni, Gal Pressman, Vincent Guittot
In-Reply-To: <db20e6fe-4368-15ec-65c5-ead28fc7981b@gmail.com>



On 8/10/2022 3:42 PM, Tariq Toukan wrote:
> 
> 
> On 8/10/2022 1:51 PM, Valentin Schneider wrote:
>> Tariq has pointed out that drivers allocating IRQ vectors would benefit
>> from having smarter NUMA-awareness - cpumask_local_spread() only knows
>> about the local node and everything outside is in the same bucket.
>>
>> sched_domains_numa_masks is pretty much what we want to hand out (a 
>> cpumask
>> of CPUs reachable within a given distance budget), introduce
>> sched_numa_hop_mask() to export those cpumasks. Add in an iteration 
>> helper
>> to iterate over CPUs at an incremental distance from a given node.
>>
>> Link: http://lore.kernel.org/r/20220728191203.4055-1-tariqt@nvidia.com
>> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
>> ---
>>   include/linux/topology.h | 12 ++++++++++++
>>   kernel/sched/topology.c  | 28 ++++++++++++++++++++++++++++
>>   2 files changed, 40 insertions(+)
>>
>> diff --git a/include/linux/topology.h b/include/linux/topology.h
>> index 4564faafd0e1..d66e3cf40823 100644
>> --- a/include/linux/topology.h
>> +++ b/include/linux/topology.h
>> @@ -245,5 +245,17 @@ static inline const struct cpumask 
>> *cpu_cpu_mask(int cpu)
>>       return cpumask_of_node(cpu_to_node(cpu));
>>   }
>> +#ifdef CONFIG_NUMA
>> +extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
>> +#else
>> +static inline const struct cpumask *sched_numa_hop_mask(int node, int 
>> hops)
>> +{
>> +    return -ENOTSUPP;
> 
> missing ERR_PTR()
> 
>> +}
>> +#endif    /* CONFIG_NUMA */
>> +
>> +#define for_each_numa_hop_mask(node, hops, mask)            \
>> +    for (mask = sched_numa_hop_mask(node, hops); 
>> !IS_ERR_OR_NULL(mask); \
>> +         mask = sched_numa_hop_mask(node, ++hops))
>>   #endif /* _LINUX_TOPOLOGY_H */
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 8739c2a5a54e..f0236a0ae65c 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -2067,6 +2067,34 @@ int sched_numa_find_closest(const struct 
>> cpumask *cpus, int cpu)
>>       return found;
>>   }
>> +/**
>> + * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops 
>> away.
>> + * @node: The node to count hops from.
>> + * @hops: Include CPUs up to that many hops away. 0 means local node.

AFAIU, here you work with a specific level/num of hops, description is 
not accurate.


>> + *
>> + * Requires rcu_lock to be held. Returned cpumask is only valid 
>> within that
>> + * read-side section, copy it if required beyond that.
>> + *
>> + * Note that not all hops are equal in size; see sched_init_numa() 
>> for how
>> + * distances and masks are handled.
>> + *
>> + * Also note that this is a reflection of sched_domains_numa_masks, 
>> which may change
>> + * during the lifetime of the system (offline nodes are taken out of 
>> the masks).
>> + */
>> +const struct cpumask *sched_numa_hop_mask(int node, int hops)
>> +{
>> +    struct cpumask ***masks = rcu_dereference(sched_domains_numa_masks);
>> +
>> +    if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
>> +        return ERR_PTR(-EINVAL);
>> +
>> +    if (!masks)
>> +        return NULL;
>> +
>> +    return masks[hops][node];
>> +}
>> +EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
>> +
>>   #endif /* CONFIG_NUMA */
>>   static int __sdt_alloc(const struct cpumask *cpu_map)

^ permalink raw reply

* Re: [PATCH 2/2] net/mlx5e: Leverage sched_numa_hop_mask()
From: Tariq Toukan @ 2022-08-10 12:57 UTC (permalink / raw)
  To: Valentin Schneider, netdev, linux-kernel, Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Saeed Mahameed, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Eric Dumazet, Paolo Abeni,
	Gal Pressman, Vincent Guittot
In-Reply-To: <20220810105119.2684079-2-vschneid@redhat.com>



On 8/10/2022 1:51 PM, Valentin Schneider wrote:
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/eq.c | 16 ++++++++++++++--
>   1 file changed, 14 insertions(+), 2 deletions(-)
> 

Missing description.

I had a very detailed description with performance numbers and an 
affinity hints example with before/after tables. I don't want to get 
them lost.


> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 229728c80233..2eb4ffd96a95 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -809,9 +809,12 @@ static void comp_irqs_release(struct mlx5_core_dev *dev)
>   static int comp_irqs_request(struct mlx5_core_dev *dev)
>   {
>   	struct mlx5_eq_table *table = dev->priv.eq_table;
> +	const struct cpumask *mask;
>   	int ncomp_eqs = table->num_comp_eqs;
> +	int hops = 0;
>   	u16 *cpus;
>   	int ret;
> +	int cpu;
>   	int i;
>   
>   	ncomp_eqs = table->num_comp_eqs;
> @@ -830,8 +833,17 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>   		ret = -ENOMEM;
>   		goto free_irqs;
>   	}
> -	for (i = 0; i < ncomp_eqs; i++)
> -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> +	rcu_read_lock();
> +	for_each_numa_hop_mask(dev->priv.numa_node, hops, mask) {

We don't really use this 'hops' iterator. We always pass 0 (not a 
valuable input...), and we do not care about its final value. Probably 
it's best to hide it from the user into the macro.

> +		for_each_cpu(cpu, mask) {
> +			cpus[i] = cpu;
> +			if (++i == ncomp_eqs)
> +				goto spread_done;
> +		}
> +	}
> +spread_done:
> +	rcu_read_unlock();
>   	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
>   	kfree(cpus);
>   	if (ret < 0)

This logic is typical. Other drivers would also want to use it.
It must be introduced as a service/API function, if not by the sched 
topology, then at least by the networking subsystem.
Jakub, WDYT?

^ permalink raw reply

* Re: [PATCH v2 03/13] firmware: google: Test spinlock on panic path to avoid lockups
From: Greg Kroah-Hartman @ 2022-08-10 12:54 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Evan Green, linux-efi, LKML, Ard Biesheuvel, Andrew Morton, bhe,
	Petr Mladek, kexec, linux-hyperv, netdev, x86, kernel-dev, kernel,
	halves, fabiomirmar, alejandro.j.jimenez, Andy Shevchenko,
	Arnd Bergmann, Borislav Petkov, Jonathan Corbet, d.hatayama,
	dave.hansen, dyoung, feng.tang, mikelley, hidehiro.kawai.ez,
	jgross, john.ogness, Kees Cook, luto, mhiramat, mingo, paulmck,
	peterz, rostedt, senozhatsky, Alan Stern, Thomas Gleixner, vgoyal,
	vkuznets, Will Deacon, David Gow, Julius Werner
In-Reply-To: <55a074a0-ca3a-8afc-4336-e40cff757394@igalia.com>

On Mon, Aug 08, 2022 at 12:37:46PM -0300, Guilherme G. Piccoli wrote:
> Let me clarify / ask something: this series, for example, is composed as
> a bunch of patches "centered" around the same idea, panic notifiers
> improvements/fixes. But its patches belong to completely different
> subsystems, like EFI/misc, architectures (alpha, parisc, arm), core
> kernel code, etc.
> 
> What is the best way of getting this merged?
> (a) Re-send individual patches with the respective Review/ACK tags to
> the proper subsystem, or;

Yes.

> (b) Wait until the whole series is ACKed/Reviewed, and a single
> maintainer (like you or Andrew, for example) would pick the whole series
> and apply at once, even if it spans across multiple parts of the kernel?

No, only do this after a kernel release cycle happens and there are
straggler patches that did not get picked up by the relevant subsystem
maintainers.

thanks,

greg k-h

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox