Netdev List
* Re: [PATCH] checkpatch: Add kmap and kmap_atomic to the deprecated list
From: Chaitanya Kulkarni @ 2022-08-14  5:25 UTC
  To: ira.weiny@intel.com, Andy Whitcroft, Joe Perches
  Cc: nvdimm@lists.linux.dev, kvm@vger.kernel.org,
	linux-sh@vger.kernel.org, kgdb-bugreport@lists.sourceforge.net,
	dri-devel@lists.freedesktop.org, linux-mips@vger.kernel.org,
	linux-ide@vger.kernel.org, dm-devel@redhat.com,
	keyrings@vger.kernel.org, linux-mtd@lists.infradead.org,
	sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux1394-devel@lists.sourceforge.net, linux-scsi@vger.kernel.org,
	linux-rdma@vger.kernel.org, x86@kernel.org,
	linux-csky@vger.kernel.org, iommu@lists.linux.dev,
	linux-snps-arc@lists.infradead.org, Fabio M. De Francesco,
	linux-media@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	linux-um@lists.infradead.org, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, loongarch@lists.linux.dev,
	Thomas Gleixner, virtualization@lists.linux-foundation.org,
	bpf@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-edac@vger.kernel.org, linux-raid@vger.kernel.org,
	netdev@vger.kernel.org, linux-mmc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20220813220034.806698-1-ira.weiny@intel.com>

On 8/13/22 15:00, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> kmap() and kmap_atomic() are being deprecated in favor of
> kmap_local_page().
> 
> There are two main problems with kmap(): (1) It comes with an overhead
> as mapping space is restricted and protected by a global lock for
> synchronization and (2) it also requires global TLB invalidation when
> the kmap’s pool wraps and it might block when the mapping space is fully
> utilized until a slot becomes available.
> 
> kmap_local_page() is safe from any context and is therefore redundant
> with kmap_atomic() with the exception of any pagefault or preemption
> disable requirements.  However, using kmap_atomic() for these side
> effects makes the code less clear.  So any requirement for pagefault or
> preemption disable should be made explicitly.
> 
> With kmap_local_page() the mappings are per thread, CPU local, can take
> page faults, and can be called from any context (including interrupts).
> It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore,
> the tasks can be preempted and, when they are scheduled to run again,
> the kernel virtual addresses are restored.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Suggested by credits.
> 	Thomas: Idea to keep from growing more kmap/kmap_atomic calls.
> 	Fabio: Stole some of his boiler plate commit message.
> 
> Notes on tree-wide conversions:
> 
> I've cc'ed mailing lists for subsystems which currently contain either kmap()
> or kmap_atomic() calls.  As some of you already know, Fabio and I have been
> working through converting kmap() calls to kmap_local_page().  But there is a
> lot more work to be done.  Help from the community is always welcome,
> especially with kmap_atomic() conversions.  To keep from stepping on each
> other's toes, I've created a spreadsheet of the current calls[1].  Please let me
> or Fabio know if you plan on tackling one of the conversions so we can mark it
> off the list.
> 
> [1] https://docs.google.com/spreadsheets/d/1i_ckZ10p90bH_CkxD2bYNi05S2Qz84E2OFPv8zq__0w/edit#gid=1679714357
> 

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
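The commit message above describes kmap_local_page() mappings as per thread. The kernel's highmem documentation adds that nested local mappings must be released in strictly reverse order, because the implementation is stack based. A minimal user-space sketch of that rule follows; it is a model, not kernel code, and the names (model_kmap_local(), model_kunmap_local()) and the 16-slot limit are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; KMAP_MODEL_SLOTS is an assumed limit. */
#define KMAP_MODEL_SLOTS 16

struct kmap_model_stack {
	const void *slot[KMAP_MODEL_SLOTS]; /* live mappings, oldest first */
	int top;                            /* number of live mappings */
};

/* "Map" a page by pushing it onto this thread's stack. Returns NULL when
 * every slot is in use. The address is returned unchanged (identity map). */
static const void *model_kmap_local(struct kmap_model_stack *s,
				    const void *page)
{
	if (s->top == KMAP_MODEL_SLOTS)
		return NULL;
	s->slot[s->top++] = page;
	return page;
}

/* "Unmap": only the most recent mapping may be released. Returns false
 * for an out-of-order release, which would be a bug in real code. */
static bool model_kunmap_local(struct kmap_model_stack *s, const void *addr)
{
	if (s->top == 0 || s->slot[s->top - 1] != addr)
		return false;
	s->top--;
	return true;
}
```

In kernel code the corresponding pair is kmap_local_page() and kunmap_local(); with two nested mappings, the second one is unmapped first.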




* RE: [PATCH net 1/1] net: macsec: Fix XPN properties passing to macsec offload
From: Emeel Hakim @ 2022-08-14  7:32 UTC
  To: Jakub Kicinski
  Cc: edumazet@google.com, mayflowerera@gmail.com, pabeni@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Raed Salem
In-Reply-To: <20220811092543.696a5ef2@kernel.org>



> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, 11 August 2022 19:26
> To: Emeel Hakim <ehakim@nvidia.com>
> Cc: edumazet@google.com; mayflowerera@gmail.com; pabeni@redhat.com;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Raed Salem
> <raeds@nvidia.com>
> Subject: Re: [PATCH net 1/1] net: macsec: Fix XPN properties passing to macsec
> offload
> 
> 
> On Tue, 9 Aug 2022 13:29:05 +0300 Emeel Hakim wrote:
> > Currently macsec invokes HW offload path before reading extended
> > packet number (XPN) related user properties i.e. salt and short secure
> > channel identifier (ssci), hence preventing macsec XPN HW offload.
> >
> > Fix by moving macsec XPN properties reading prior to HW offload path.
> >
> > Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites")
> > Reviewed-by: Raed Salem <raeds@nvidia.com>
> > Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
> 
> Is there a driver in the tree which uses those values today?
> I can't grep out any rx_sa->key accesses in the drivers at all :S
> 
> If there is none it's not really a fix.

Thanks for the review, agreed.
I will repost it to net-next, with the commit adjusted, as part
of a macsec offload series.
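The ordering problem described in the patch can be sketched as follows. This is a toy model with hypothetical names (model_add_rxsa(), model_offload_add_rxsa()), not the net/macsec.c code; it only illustrates that the user-supplied XPN properties (the SSCI and the 12-byte salt) have to be stored in the SA before the offload callback runs, otherwise the driver sees zeroes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy SA; not the kernel's struct macsec_rx_sa. */
struct model_rx_sa {
	uint32_t ssci;    /* short secure channel identifier */
	uint8_t salt[12]; /* XPN salt (96 bits) */
	bool xpn;
};

/* Stand-in for a driver's offload callback: records what it saw. */
static uint32_t offload_seen_ssci;

static int model_offload_add_rxsa(const struct model_rx_sa *sa)
{
	offload_seen_ssci = sa->ssci;
	return 0;
}

/* Fixed ordering: copy the XPN properties into the SA first, then invoke
 * the offload path, so the driver can program them into hardware. */
static int model_add_rxsa(struct model_rx_sa *sa, uint32_t ssci,
			  const uint8_t salt[12], bool offload)
{
	sa->xpn = true;
	sa->ssci = ssci;
	memcpy(sa->salt, salt, sizeof(sa->salt));

	if (offload)
		return model_offload_add_rxsa(sa);
	return 0;
}
```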


* Re: [PATCH RFC net-next 0/3] net: vlan: fix bridge binding behavior and add selftests
From: Nikolay Aleksandrov @ 2022-08-14  7:38 UTC
  To: Sevinj Aghayeva
  Cc: netdev, aroulin, sbrivio, roopa, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel, bridge
In-Reply-To: <CAMWRUK4Mo2KHfa-6Z4Ka+ZLx8TtmzSvq9CLmMmEwE5S7Yp7-Kw@mail.gmail.com>

On 12/08/2022 18:30, Sevinj Aghayeva wrote:
> On Wed, Aug 10, 2022 at 4:54 AM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>
>> On 10/08/2022 06:11, Sevinj Aghayeva wrote:
>>> When bridge binding is enabled for a vlan interface, it is expected
>>> that the link state of the vlan interface will track the subset of the
>>> ports that are also members of the corresponding vlan, rather than
>>> that of all ports.
>>>
>>> Currently, this feature works as expected when a vlan interface is
>>> created with bridge binding enabled:
>>>
>>>   ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
>>>         bridge_binding on
>>>
>>> However, the feature does not work when a vlan interface is created
>>> with bridge binding disabled, and then enabled later:
>>>
>>>   ip link add link br name vlan10 type vlan id 10 protocol 802.1q \
>>>         bridge_binding off
>>>   ip link set vlan10 type vlan bridge_binding on
>>>
>>> After these two commands, the link state of the vlan interface
>>> continues to track that of all ports, which is inconsistent and
>>> confusing to users. This series fixes this bug and introduces two
>>> tests for the valid behavior.
>>>
>>> Sevinj Aghayeva (3):
>>>   net: core: export call_netdevice_notifiers_info
>>>   net: 8021q: fix bridge binding behavior for vlan interfaces
>>>   selftests: net: tests for bridge binding behavior
>>>
>>>  include/linux/netdevice.h                     |   2 +
>>>  net/8021q/vlan.h                              |   2 +-
>>>  net/8021q/vlan_dev.c                          |  25 ++-
>>>  net/core/dev.c                                |   7 +-
>>>  tools/testing/selftests/net/Makefile          |   1 +
>>>  .../selftests/net/bridge_vlan_binding_test.sh | 143 ++++++++++++++++++
>>>  6 files changed, 172 insertions(+), 8 deletions(-)
>>>  create mode 100755 tools/testing/selftests/net/bridge_vlan_binding_test.sh
>>>
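The expected bridge_binding semantics from the cover letter can be modeled in a few lines. This is a simplified sketch with hypothetical names: it treats the bridge's carrier as "any port up", ignoring STP states and other details. It shows why toggling the flag must trigger a re-evaluation: for the same port states, the two settings can give different answers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a bridge port; names are hypothetical. */
struct model_port {
	bool up;              /* port has carrier */
	const uint16_t *vids; /* VLANs the port is a member of */
	size_t nvids;
};

static bool port_in_vlan(const struct model_port *p, uint16_t vid)
{
	for (size_t i = 0; i < p->nvids; i++)
		if (p->vids[i] == vid)
			return true;
	return false;
}

/* Carrier of a VLAN upper such as vlan10 on br: with bridge_binding on,
 * only up ports that are members of @vid count; with it off, any up
 * port counts (simplified stand-in for the bridge's own carrier). */
static bool vlan_upper_carrier(const struct model_port *ports, size_t n,
			       uint16_t vid, bool bridge_binding)
{
	for (size_t i = 0; i < n; i++) {
		if (!ports[i].up)
			continue;
		if (!bridge_binding || port_in_vlan(&ports[i], vid))
			return true;
	}
	return false;
}
```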
>>
>> Hi,
>> The NETDEV_CHANGE event is already propagated when the vlan changes flags;
>> NETDEV_CHANGEUPPER is used when the devices' relationship changes, not their flags.
>> The only problem you have to figure out is that the flag has changed. The fix itself
>> must be done within the bridge, not 8021q. You can figure it out based on current bridge
>> loose binding state and the vlan's changed state, again in the bridge's NETDEV_CHANGE
>> handler. Unfortunately the proper fix is much more involved and will need new
>> infra, you'll have to track the loose binding vlans in the bridge. To do that you should
>> add logic that reflects the current vlans' loose binding state *only* for vlans that also
>> exist in the bridge, the rest which are upper should be carrier off if they have the loose
>> binding flag set.
>>
>> Alternatively you can add a new NETDEV_ notifier (using something similar to struct netdev_notifier_pre_changeaddr_info)
>> and add link type-specific space (e.g. union of link type-specific structs) in the struct which will contain
>> what changed for 8021q and will be properly interpreted by the bridge. The downside is that we'll generate
>> 2 notifications when changing the loose binding flag, but on the bright side won't have to track anything
>> in the bridge, just handle the new notifier type. This might be the easiest path, the fix is still in
>> the bridge though, the 8021q module just needs to fill in the new struct and emit the notification on
>> any loose binding changes, the bridge must decide if it should process it (i.e. based on upper/lower
>> relationship). Such notifier can be also re-used by other link types to propagate link-type specific
>> changes.

Hi,

> 
> Hi Nik,
> 
> Can you please clarify the following?
> 
> 1) should the new NETDEV_ notifier be about the vlan device and not
> the bridge? That is, should I handle it in br_device_event?

Yes, it should be about the vlan device (i.e. the target device that changes its state).

> 2) is it still okay to export call_netdevice_notifiers_info or should
> i write a new function for this?
> 

If you need it, export it. But if you do it similar to netdev_notifier_pre_changeaddr_info
then you don't have to, more below.

> The answers to the above weren't clear to me, but I came up with the
> following patch anyway, so perhaps you can also comment on it. I'm
> pasting it inline; this is against 5.19.
> 

A few comments inline below,

> Thanks!
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2563d30736e9..c63205eb1f72 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2762,6 +2762,7 @@ enum netdev_cmd {
>  	NETDEV_UNREGISTER,
>  	NETDEV_CHANGEMTU,	/* notify after mtu change happened */
>  	NETDEV_CHANGEADDR,	/* notify after the address change */
> +	NETDEV_CHANGEUPPERFLAGS,

Please don't use CHANGEUPPER, that is about a device changing its
upper device. Also make it more generic, NETDEV_CHANGEFLAGS is too
specific. For example today we have NETDEV_CHANGEINFODATA which TBH
sounds good, but is tied to bonding in a few places, e.g.:
        case NETDEV_CHANGEINFODATA:
                rtnl_event_type = IFLA_EVENT_BONDING_OPTIONS;

which is very unfortunate. We really need a generic notifier that can pass
link-type specific information alongside the device. As I mentioned please
see how netdev_notifier_pre_changeaddr_info is handled, we need something
generic that extends netdev_notifier_info and the various link types can add
their own structures in a union which is to be interpreted based on the link
type. For example if the new notifier is called NETDEV_CHANGE_DETAILS then
in the bridge we'll check if the target device is a vlan and interpret the
structure's union as the vlan change information. It'd be nice to get more
feedback about this from others as well.

Also note that this notifier is for internal use for the time being so it's not necessary
to export these notifications to user-space yet.

I would've opted for extending NETDEV_CHANGE itself, but that would be quite the
adventure. :)
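A user-space sketch of the generic notifier layout suggested above: a base info struct that must come first, followed by a union of link-type-specific structs. All names are hypothetical (NETDEV_CHANGE_DETAILS was only floated as an example name in this thread). The emitter fills the struct with designated initializers, similar to how dev_pre_changeaddr_notify() builds its netdev_notifier_pre_changeaddr_info as far as I can tell, so unset members such as extack are zeroed; the bridge side interprets the union only when the target is a VLAN on top of a bridge.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_link_kind { MODEL_LINK_VLAN, MODEL_LINK_OTHER };

/* Hypothetical toy device. */
struct model_netdev {
	enum model_link_kind kind;
	struct model_netdev *real_dev; /* lower device, for VLANs */
	bool is_bridge;
};

/* Hypothetical stand-in for struct netdev_notifier_info. */
struct model_notifier_info {
	struct model_netdev *dev; /* the device whose state changed */
	void *extack;             /* zeroed by the initializer below */
};

struct model_change_details_info {
	struct model_notifier_info info; /* must be first */
	union {
		struct {
			bool bridge_binding;
		} vlan;
		/* other link types would add their own structs here */
	};
};

/* Bridge side: only interpret the union for a VLAN whose lower device is
 * a bridge. Returns true if the event was consumed. */
static bool model_bridge_handle(const struct model_notifier_info *ptr,
				bool *binding_out)
{
	/* Valid because info is the first member of the outer struct. */
	const struct model_change_details_info *d =
		(const struct model_change_details_info *)ptr;

	if (ptr->dev->kind != MODEL_LINK_VLAN || !ptr->dev->real_dev ||
	    !ptr->dev->real_dev->is_bridge)
		return false;
	*binding_out = d->vlan.bridge_binding;
	return true;
}

/* 8021q side: fill the struct and "emit" it to the (single) listener. */
static bool model_vlan_notify(struct model_netdev *vlan, bool binding,
			      bool *binding_out)
{
	struct model_change_details_info d = {
		.info.dev = vlan,
		.vlan.bridge_binding = binding,
	};

	return model_bridge_handle(&d.info, binding_out);
}
```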

>  	NETDEV_PRE_CHANGEADDR,	/* notify before the address change */
>  	NETDEV_GOING_DOWN,
>  	NETDEV_CHANGENAME,
> @@ -2837,6 +2838,12 @@ struct netdev_notifier_changelowerstate_info {
>  	void *lower_state_info; /* is lower dev state */
>  };
> 
> +struct netdev_notifier_changeupperflags_info {
> +	struct netdev_notifier_info info; /* must be first */
> +	struct net_device *upper_dev;

just dev, not upper
we should be able to use this construct for any link type and actually
we don't need the device here, we already have it in info.dev

> +	bool vlan_bridge_binding;

add this into a vlan-specific structure that should be in a union here so
other link types can add their own later

> +};
> +
>  struct netdev_notifier_pre_changeaddr_info {
>  	struct netdev_notifier_info info; /* must be first */
>  	const unsigned char *dev_addr;
> @@ -2898,6 +2905,8 @@ netdev_notifier_info_to_extack(const struct netdev_notifier_info *info)
>  }
> 
>  int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
> +int call_netdevice_notifiers_info(unsigned long val,
> +				  struct netdev_notifier_info *info);

No need for this if you handle notifications similar to dev_pre_changeaddr_notify()
with netdev_notifier_pre_changeaddr_info

> 
> 
>  extern rwlock_t dev_base_lock; /* Device list lock */
> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
> index 5eaf38875554..71947cdcfaaa 100644
> --- a/net/8021q/vlan.h
> +++ b/net/8021q/vlan.h
> @@ -130,7 +130,7 @@ void vlan_dev_set_ingress_priority(const struct net_device *dev,
>  int vlan_dev_set_egress_priority(const struct net_device *dev,
>  				 u32 skb_prio, u16 vlan_prio);
>  void vlan_dev_free_egress_priority(const struct net_device *dev);
> -int vlan_dev_change_flags(const struct net_device *dev, u32 flag, u32 mask);
> +int vlan_dev_change_flags(struct net_device *dev, u32 flag, u32 mask);
>  void vlan_dev_get_realdev_name(const struct net_device *dev, char *result,
>  			       size_t size);
> 
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index 839f2020b015..68da3901dfb0 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -208,11 +208,18 @@ int vlan_dev_set_egress_priority(const struct net_device *dev,
>  	return 0;
>  }
> 
> +static inline bool netif_is_bridge(const struct net_device *dev)

no inline in .c files, let the compiler decide

> +{
> +	return dev->rtnl_link_ops &&
> +	       !strcmp(dev->rtnl_link_ops->kind, "bridge");
> +}
> +

there is already netif_is_bridge_master()

>  /* Flags are defined in the vlan_flags enum in
>   * include/uapi/linux/if_vlan.h file.
>   */
> -int vlan_dev_change_flags(const struct net_device *dev, u32 flags, u32 mask)
> +int vlan_dev_change_flags(struct net_device *dev, u32 flags, u32 mask)
>  {
> +	struct netdev_notifier_changeupperflags_info info = {};
>  	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
>  	u32 old_flags = vlan->flags;
> 
> @@ -223,19 +230,33 @@ int vlan_dev_change_flags(const struct net_device *dev, u32 flags, u32 mask)
> 
>  	vlan->flags = (old_flags & ~mask) | (flags & mask);
> 
> -	if (netif_running(dev) && (vlan->flags ^ old_flags) & VLAN_FLAG_GVRP) {
> +	if (!netif_running(dev))
> +		return 0;
> +
> +	if ((vlan->flags ^ old_flags) & VLAN_FLAG_GVRP) {
>  		if (vlan->flags & VLAN_FLAG_GVRP)
>  			vlan_gvrp_request_join(dev);
>  		else
>  			vlan_gvrp_request_leave(dev);
>  	}
> 
> -	if (netif_running(dev) && (vlan->flags ^ old_flags) & VLAN_FLAG_MVRP) {
> +	if ((vlan->flags ^ old_flags) & VLAN_FLAG_MVRP) {
>  		if (vlan->flags & VLAN_FLAG_MVRP)
>  			vlan_mvrp_request_join(dev);
>  		else
>  			vlan_mvrp_request_leave(dev);
>  	}
> +
> +	if ((vlan->flags ^ old_flags) & VLAN_FLAG_BRIDGE_BINDING &&
> +	    netif_is_bridge(vlan->real_dev)) {
> +		info.info.dev = vlan->real_dev;
> +		info.upper_dev = dev;
> +		info.vlan_bridge_binding =
> +			!!(vlan->flags & VLAN_FLAG_BRIDGE_BINDING);
> +		call_netdevice_notifiers_info(NETDEV_CHANGEUPPERFLAGS,
> +					      &info.info);
> +	}
> +
>  	return 0;
> 
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 0f5e75ccac79..cbcb0877d4a4 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -1718,6 +1718,7 @@ static void nbp_vlan_set_vlan_dev_state(struct net_bridge_port *p, u16 vid)
>  /* Must be protected by RTNL. */
>  int br_vlan_bridge_event(struct net_device *dev, unsigned long event, void *ptr)
>  {
> +	struct netdev_notifier_changeupperflags_info *flags_info;
>  	struct netdev_notifier_changeupper_info *info;
>  	struct net_bridge *br = netdev_priv(dev);
>  	int vlcmd = 0, ret = 0;
> @@ -1739,7 +1740,11 @@ int br_vlan_bridge_event(struct net_device *dev, unsigned long event, void *ptr)
>  		info = ptr;
>  		br_vlan_upper_change(dev, info->upper_dev, info->linking);
>  		break;
> -
> +	case NETDEV_CHANGEUPPERFLAGS:
> +		flags_info = ptr;
> +		br_vlan_upper_change(dev, flags_info->upper_dev,
> +				     flags_info->vlan_bridge_binding);
> +		break;
>  	case NETDEV_CHANGE:
>  	case NETDEV_UP:
>  		if (!br_opt_get(br, BROPT_VLAN_BRIDGE_BINDING))
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 30a1603a7225..bc8640d77d83 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -160,8 +160,6 @@ struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
>  struct list_head ptype_all __read_mostly; /* Taps */
> 
>  static int netif_rx_internal(struct sk_buff *skb);
> -static int call_netdevice_notifiers_info(unsigned long val,
> -					 struct netdev_notifier_info *info);
>  static int call_netdevice_notifiers_extack(unsigned long val,
>  					   struct net_device *dev,
>  					   struct netlink_ext_ack *extack);
> @@ -1624,7 +1622,7 @@ const char *netdev_cmd_to_name(enum netdev_cmd cmd)
>  	N(POST_INIT) N(RELEASE) N(NOTIFY_PEERS) N(JOIN) N(CHANGEUPPER)
>  	N(RESEND_IGMP) N(PRECHANGEMTU) N(CHANGEINFODATA) N(BONDING_INFO)
>  	N(PRECHANGEUPPER) N(CHANGELOWERSTATE) N(UDP_TUNNEL_PUSH_INFO)
> -	N(UDP_TUNNEL_DROP_INFO) N(CHANGE_TX_QUEUE_LEN)
> +	N(UDP_TUNNEL_DROP_INFO) N(CHANGE_TX_QUEUE_LEN) N(CHANGEUPPERFLAGS)
>  	N(CVLAN_FILTER_PUSH_INFO) N(CVLAN_FILTER_DROP_INFO)
>  	N(SVLAN_FILTER_PUSH_INFO) N(SVLAN_FILTER_DROP_INFO)
>  	N(PRE_CHANGEADDR) N(OFFLOAD_XSTATS_ENABLE) N(OFFLOAD_XSTATS_DISABLE)
> @@ -1927,8 +1925,8 @@ static void move_netdevice_notifiers_dev_net(struct net_device *dev,
>   * are as for raw_notifier_call_chain().
>   */
> 
> -static int call_netdevice_notifiers_info(unsigned long val,
> -					 struct netdev_notifier_info *info)
> +int call_netdevice_notifiers_info(unsigned long val,
> +				  struct netdev_notifier_info *info)
>  {
>  	struct net *net = dev_net(info->dev);
>  	int ret;
> @@ -1944,6 +1942,7 @@ static int call_netdevice_notifiers_info(unsigned long val,
>  		return ret;
>  	return raw_notifier_call_chain(&netdev_chain, val, info);
>  }
> +EXPORT_SYMBOL(call_netdevice_notifiers_info);
> 
>  /**
>   * call_netdevice_notifiers_info_robust - call per-netns notifier blocks
> 
> 
>>
>> Both of these avoid any direct dependencies between the bridge and 8021q. Any other suggestions that
>> are simpler, avoid direct dependencies and solve the issue in a generic way would be appreciated.
>>
>> Just be careful about introducing too much unnecessary processing because we
>> can have lots of vlan devices in a system.
>>
>> Cheers,
>>  Nik
> 
> 
> 



* RE: [PATCH 2/2] Revert "mlxsw: core: Add the hottest thermal zone detection"
From: Vadim Pasternak @ 2022-08-14  7:42 UTC
  To: Daniel Lezcano, rafael@kernel.org
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Ido Schimmel, Petr Machata,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
In-Reply-To: <6cf66002-f13d-a1ee-7fa6-dfa78d6be427@linaro.org>



> -----Original Message-----
> From: Daniel Lezcano <daniel.lezcano@linaro.org>
> Sent: Friday, August 5, 2022 7:07 PM
> To: Vadim Pasternak <vadimp@nvidia.com>; rafael@kernel.org
> Cc: davem@davemloft.net; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> Ido Schimmel <idosch@nvidia.com>; Petr Machata <petrm@nvidia.com>;
> Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>;
> Paolo Abeni <pabeni@redhat.com>
> Subject: Re: [PATCH 2/2] Revert "mlxsw: core: Add the hottest thermal zone
> detection"
> 
> 
> Hi Vadim,
> 
> 
> On 04/08/2022 14:21, Vadim Pasternak wrote:
> >
> >
> >> -----Original Message-----
> >> From: Daniel Lezcano <daniel.lezcano@linaro.org>
> >> Sent: Monday, August 1, 2022 12:56 PM
> >> To: daniel.lezcano@linaro.org; rafael@kernel.org
> >> Cc: Vadim Pasternak <vadimp@nvidia.com>; davem@davemloft.net;
> >> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Ido Schimmel
> >> <idosch@nvidia.com>; Petr Machata <petrm@nvidia.com>; Eric Dumazet
> >> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> >> <pabeni@redhat.com>
> >> Subject: [PATCH 2/2] Revert "mlxsw: core: Add the hottest thermal
> >> zone detection"
> >>
> >> This reverts commit 6f73862fabd93213de157d9cc6ef76084311c628.
> >>
> >> As discussed in the thread:
> >>
> >> https://lore.kernel.org/all/f3c62ebe-7d59-c537-a010-bff366c8aeba@linaro.org/
> >>
> >> the feature provided by commits 2dc2f760052da and 6f73862fabd93 is
> >> actually already handled by the thermal framework via the cooling
> >> device state aggregation, thus all this code is pointless.
> >>
> >> The revert conflicts with the following changes:
> >>   - 7f4957be0d5b8: thermal: Use mode helpers in drivers
> >>   - 6a79507cfe94c: mlxsw: core: Extend thermal module with per QSFP module thermal zones
> >>
> >> These conflicts were fixed and the resulting changes are in this patch.
> >>
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Tested-by: Vadim Pasternak <vadimp@nvidia.com>
> 
> Thanks for testing
> 
> > Daniel,
> > Could you please rebase the patch on top of net-next, as Jakub mentioned?
> > Or do you want me to do it?
> 
> It is fine, I can do it. The conflict is trivial.
> 
> However, I would have preferred to have the patch in my tree so I can
> continue the consolidation work.
> 
> Is it OK if I pick the patch? The conflict being simple, it can be handled
> at merge time, no?

Hi Daniel,

Sorry for the delay.

Yes, it is OK. There are no plans to make changes in the 'core_thermal.c'
module in the current cycle, so it should be fine.

Thanks,
Vadim.
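The revert rationale relies on the thermal core already aggregating requests on a shared cooling device. To my understanding, the core applies the highest target state requested by any bound thermal instance (see thermal_cdev_update()), which is what makes a driver-side "hottest zone" selection redundant. A minimal model of that max-aggregation, with a hypothetical function name:

```c
#include <stddef.h>

/* Toy model (hypothetical name): the state applied to one cooling device
 * is the highest target requested by any thermal zone bound to it. */
static unsigned long model_cdev_aggregate(const unsigned long *targets,
					  size_t n)
{
	unsigned long state = 0;

	for (size_t i = 0; i < n; i++)
		if (targets[i] > state)
			state = targets[i];
	return state;
}
```

For example, if three zones request states 2, 5 and 3, the device is driven to 5 regardless of which zone is currently hottest.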

> 
> > There is also a redundant blank line in this patch:
> >
> > 	&mlxsw_thermal_module_ops,
> > +
> > 	&mlxsw_thermal_params,
> 
> Yeah, thanks.


* Re: [Patch net] net: dsa: microchip: ksz9477: fix fdb_dump last invalid entry
From: Vladimir Oltean @ 2022-08-14  7:56 UTC
  To: Arun Ramadoss
  Cc: linux-kernel, netdev, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Russell King, Tristram Ha
In-Reply-To: <20220812093411.5879-1-arun.ramadoss@microchip.com>

On Fri, Aug 12, 2022 at 03:04:11PM +0530, Arun Ramadoss wrote:
> The ksz9477_fdb_dump() function reads the ALU control register and exits
> the timeout loop if there is a valid entry or the search is complete.
> After exiting the loop, it reads the ALU entry and reports it to user
> space irrespective of whether the entry is valid. This works for valid
> entries, but if the loop exited because the search is complete, the ALU
> table read returns all ones and that is reported to user space, so
> "bridge fdb show" lists ff:ff:ff:ff:ff:ff as the last entry for every
> port. To fix it, after exiting the loop, report the entry only if it is
> valid.
> 
> Fixes: c2e866911e25 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")

I think this should be:
Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477")
since that's when ksz9477_port_fdb_dump() was introduced, with identical
logic.

> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
>  drivers/net/dsa/microchip/ksz9477.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
> index 4b14d80d27ed..aa961dc03ddf 100644
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -613,15 +613,17 @@ int ksz9477_fdb_dump(struct ksz_device *dev, int port,
>  			goto exit;
>  		}
>  
> -		/* read ALU table */
> -		ksz9477_read_table(dev, alu_table);
> +		if (ksz_data & ALU_VALID) {

I wonder if you could avoid increasing the indentation level using:

		if (!(ksz_data & ALU_VALID))
			continue;

> +			/* read ALU table */
> +			ksz9477_read_table(dev, alu_table);
>  
> -		ksz9477_convert_alu(&alu, alu_table);
> +			ksz9477_convert_alu(&alu, alu_table);
>  
> -		if (alu.port_forward & BIT(port)) {
> -			ret = cb(alu.mac, alu.fid, alu.is_static, data);
> -			if (ret)
> -				goto exit;
> +			if (alu.port_forward & BIT(port)) {
> +				ret = cb(alu.mac, alu.fid, alu.is_static, data);
> +				if (ret)
> +					goto exit;
> +			}
>  		}
>  	} while (ksz_data & ALU_START);
>  
> 
> base-commit: f86d1fbbe7858884d6754534a0afbb74fc30bc26
> -- 
> 2.36.1
> 
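A user-space model of the fixed dump loop, written with the early "continue" suggested above. The bit values, the model_* names and the scripted "hardware" are assumptions for illustration, and the inner timeout polling is collapsed into a single read. Note that "continue" in a do/while jumps to the loop condition, so a final invalid read with ALU_START cleared ends the loop without reporting the all-ones entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions for the model only. */
#define MODEL_ALU_START (1u << 7)
#define MODEL_ALU_VALID (1u << 2)

/* Scripted "hardware": successive control-register values and the last
 * MAC byte the table read would return at each step. */
struct model_alu_hw {
	const uint32_t *ctrl;
	const uint8_t *last;
	size_t pos;
};

/* Returns the number of entries reported; the table is only "read" for
 * valid entries. */
static size_t model_fdb_dump(struct model_alu_hw *hw, uint8_t *out,
			     size_t max)
{
	size_t n = 0;
	uint32_t ctrl;

	do {
		size_t i = hw->pos++;

		ctrl = hw->ctrl[i]; /* poll the ALU control register */
		if (!(ctrl & MODEL_ALU_VALID))
			continue;   /* skip the read; re-check ALU_START */

		if (n < max)
			out[n++] = hw->last[i]; /* "read ALU table" */
	} while (ctrl & MODEL_ALU_START);

	return n;
}
```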



* Re: [PATCH net-next v1 07/10] net: dsa: microchip: warn about not supported synclko properties on KSZ9893 chips
From: Vladimir Oltean @ 2022-08-14  8:04 UTC
  To: Oleksij Rempel
  Cc: Andrew Lunn, Woojung Huh, Florian Fainelli, David S. Miller,
	netdev, linux-kernel, UNGLinuxDriver, Eric Dumazet, kernel,
	Jakub Kicinski, Paolo Abeni, Vivien Didelot, Rob Herring,
	devicetree
In-Reply-To: <20220814042608.GC12534@pengutronix.de>

On Sun, Aug 14, 2022 at 06:26:08AM +0200, Oleksij Rempel wrote:
> Heh :) Currently with "unevaluatedProperties: false" restrictions do not
> work at all. At least for me. For example with this change I have no
> warnings:
> diff --git a/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml b/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
> index 1e26d876d1463..da38ad98a152f 100644
> --- a/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
> +++ b/Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
> @@ -120,6 +120,7 @@ examples:
>              ethernet-switch@1 {
>                      reg = <0x1>;
>                      compatible = "nxp,sja1105t";
> +                    something-random-here;
>  
>                      ethernet-ports {
>                              #address-cells = <1>;
> 
> make dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml
> 
> So the main question is: is it broken for everyone or just for me? If it
> is just me, what am I doing wrong?

Might it be due to the additionalProperties: true from spi-peripheral-props.yaml?


* Re: [PATCH 1/2] sched/topology: Introduce sched_numa_hop_mask()
From: Tariq Toukan @ 2022-08-14  8:19 UTC
  To: Valentin Schneider, netdev, linux-kernel
  Cc: Tariq Toukan, David S. Miller, Saeed Mahameed, Jakub Kicinski,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Eric Dumazet,
	Paolo Abeni, Gal Pressman, Vincent Guittot
In-Reply-To: <xhsmhmtcac0up.mognet@vschneid.remote.csb>



On 8/11/2022 5:26 PM, Valentin Schneider wrote:
> On 10/08/22 15:57, Tariq Toukan wrote:
>> On 8/10/2022 3:42 PM, Tariq Toukan wrote:
>>>
>>>
>>> On 8/10/2022 1:51 PM, Valentin Schneider wrote:
>>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>>> index 8739c2a5a54e..f0236a0ae65c 100644
>>>> --- a/kernel/sched/topology.c
>>>> +++ b/kernel/sched/topology.c
>>>> @@ -2067,6 +2067,34 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
>>>>        return found;
>>>>    }
>>>> +/**
>>>> + * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away.
>>>> + * @node: The node to count hops from.
>>>> + * @hops: Include CPUs up to that many hops away. 0 means local node.
>>
>> AFAIU, here you work with a specific level/num of hops, description is
>> not accurate.
>>
> 
> Hmph, unfortunately it's the other way around - the masks do include CPUs
> *up to* a number of hops, but in my mlx5 example I've used it as if it only
> included CPUs a specific distance away :/
> 

Aha, got it. It makes it more challenging :)

> As things stand we'd need a temporary cpumask to account for which CPUs we
> have visited (which is what you had in your original submission), but with
> a for_each_cpu_andnot() we don't need any of that.
> 
> Below is what I ended up with. I've tested it on a range of NUMA topologies
> and it behaves as I'd expect, and on the plus side the code required in the
> driver side is even simpler than before.
> 
> If you don't have major gripes with it, I'll shape that into a proper
> series and will let you handle the mlx5/enic bits.
> 

The API is indeed easy to use, and the driver part looks straightforward.

I appreciate the tricks you used to make it work!
However, the implementation is relatively complicated, not easy to read
or understand, and touches several files. I do understand what you did
here, but I guess not all respective maintainers will like it. Let's see.

One alternative to consider, which would simplify things, is switching
back to returning an array of CPUs, ordered by their distance, up to a
provided argument 'ncpus'.
This way, you would iterate over sched_numa_hop_mask() internally, easily
maintaining the cpumask diffs between two hops, without having to compute
them on the fly as part of an exposed for-loop macro.
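The alternative above (an array of CPUs ordered by distance, filled from the per-hop masks) can be sketched in user space with 64-bit masks. This assumes, as discussed in this thread, that each hop mask is cumulative (all CPUs at most h hops away), so the CPUs first reached at hop h are mask[h] & ~mask[h - 1], which is the set for_each_cpu_andnot() would iterate over. The function name and the 64-CPU limit are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: up to 64 CPUs per uint64_t. hop_masks[h] is cumulative
 * (every CPU at most h hops from the node; h = 0 is the local node).
 * Fills @cpus with at most @ncpus CPU ids, nearest first. */
static size_t model_cpus_by_distance(const uint64_t *hop_masks, size_t nhops,
				     unsigned int *cpus, size_t ncpus)
{
	uint64_t prev = 0;
	size_t n = 0;

	for (size_t h = 0; h < nhops && n < ncpus; h++) {
		/* "andnot": CPUs first reached at exactly this hop */
		uint64_t fresh = hop_masks[h] & ~prev;

		for (unsigned int cpu = 0; cpu < 64 && n < ncpus; cpu++)
			if (fresh & (UINT64_C(1) << cpu))
				cpus[n++] = cpu;
		prev = hop_masks[h];
	}
	return n;
}
```

With hop masks 0x0C, 0x0F and 0xFF and ncpus = 6, the CPUs come out as 2, 3, 0, 1, 4, 5: local CPUs first, then each further ring in id order.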

> ---
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 229728c80233..0a5432903edd 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -812,6 +812,7 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>   	int ncomp_eqs = table->num_comp_eqs;
>   	u16 *cpus;
>   	int ret;
> +	int cpu;
>   	int i;
>   
>   	ncomp_eqs = table->num_comp_eqs;
> @@ -830,8 +831,16 @@ static int comp_irqs_request(struct mlx5_core_dev *dev)
>   		ret = -ENOMEM;
>   		goto free_irqs;
>   	}
> -	for (i = 0; i < ncomp_eqs; i++)
> -		cpus[i] = cpumask_local_spread(i, dev->priv.numa_node);
> +
> +	i = 0;
> +	rcu_read_lock();
> +	for_each_numa_hop_cpus(cpu, dev->priv.numa_node) {
> +		cpus[i] = cpu;
> +		if (++i == ncomp_eqs)
> +			goto spread_done;
> +	}
> +spread_done:
> +	rcu_read_unlock();
>   	ret = mlx5_irqs_request_vectors(dev, cpus, ncomp_eqs, table->comp_irqs);
>   	kfree(cpus);
>   	if (ret < 0)
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index fe29ac7cc469..ccd5d71aefef 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -157,6 +157,13 @@ static inline unsigned int cpumask_next_and(int n,
>   	return n+1;
>   }
>   
> +static inline unsigned int cpumask_next_andnot(int n,
> +					    const struct cpumask *srcp,
> +					    const struct cpumask *andp)
> +{
> +	return n+1;
> +}
> +
>   static inline unsigned int cpumask_next_wrap(int n, const struct cpumask *mask,
>   					     int start, bool wrap)
>   {
> @@ -194,6 +201,8 @@ static inline int cpumask_any_distribute(const struct cpumask *srcp)
>   	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)(start))
>   #define for_each_cpu_and(cpu, mask1, mask2)	\
>   	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask1, (void)mask2)
> +#define for_each_cpu_andnot(cpu, mask1, mask2)	\
> +	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask1, (void)mask2)
>   #else
>   /**
>    * cpumask_first - get the first cpu in a cpumask
> @@ -259,6 +268,7 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
>   }
>   
>   int __pure cpumask_next_and(int n, const struct cpumask *, const struct cpumask *);
> +int __pure cpumask_next_andnot(int n, const struct cpumask *, const struct cpumask *);
>   int __pure cpumask_any_but(const struct cpumask *mask, unsigned int cpu);
>   unsigned int cpumask_local_spread(unsigned int i, int node);
>   int cpumask_any_and_distribute(const struct cpumask *src1p,
> @@ -324,6 +334,26 @@ extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool
>   	for ((cpu) = -1;						\
>   		(cpu) = cpumask_next_and((cpu), (mask1), (mask2)),	\
>   		(cpu) < nr_cpu_ids;)
> +
> +/**
> + * for_each_cpu_andnot - iterate over every cpu in one mask but not in another
> + * @cpu: the (optionally unsigned) integer iterator
> + * @mask1: the first cpumask pointer
> + * @mask2: the second cpumask pointer
> + *
> + * This saves a temporary CPU mask in many places.  It is equivalent to:
> + *	struct cpumask tmp;
> + *	cpumask_andnot(&tmp, &mask1, &mask2);
> + *	for_each_cpu(cpu, &tmp)
> + *		...
> + *
> + * After the loop, cpu is >= nr_cpu_ids.
> + */
> +#define for_each_cpu_andnot(cpu, mask1, mask2)				\
> +	for ((cpu) = -1;						\
> +		(cpu) = cpumask_next_andnot((cpu), (mask1), (mask2)),	\
> +		(cpu) < nr_cpu_ids;)
> +
>   #endif /* SMP */
>   
>   #define CPU_BITS_NONE						\
> diff --git a/include/linux/find.h b/include/linux/find.h
> index 424ef67d4a42..454cde69b30b 100644
> --- a/include/linux/find.h
> +++ b/include/linux/find.h
> @@ -10,7 +10,8 @@
>   
>   extern unsigned long _find_next_bit(const unsigned long *addr1,
>   		const unsigned long *addr2, unsigned long nbits,
> -		unsigned long start, unsigned long invert, unsigned long le);
> +		unsigned long start, unsigned long invert, unsigned long le,
> +		bool negate);
>   extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
>   extern unsigned long _find_first_and_bit(const unsigned long *addr1,
>   					 const unsigned long *addr2, unsigned long size);
> @@ -41,7 +42,7 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
>   		return val ? __ffs(val) : size;
>   	}
>   
> -	return _find_next_bit(addr, NULL, size, offset, 0UL, 0);
> +	return _find_next_bit(addr, NULL, size, offset, 0UL, 0, 0);
>   }
>   #endif
>   
> @@ -71,7 +72,38 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
>   		return val ? __ffs(val) : size;
>   	}
>   
> -	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0);
> +	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0, 0);
> +}
> +#endif
> +
> +#ifndef find_next_andnot_bit
> +/**
> + * find_next_andnot_bit - find the next set bit in one memory region
> + *                        but not in the other
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @size: The bitmap size in bits
> + * @offset: The bitnumber to start searching at
> + *
> + * Returns the bit number for the next set bit
> + * If no bits are set, returns @size.
> + */
> +static inline
> +unsigned long find_next_andnot_bit(const unsigned long *addr1,
> +		const unsigned long *addr2, unsigned long size,
> +		unsigned long offset)
> +{
> +	if (small_const_nbits(size)) {
> +		unsigned long val;
> +
> +		if (unlikely(offset >= size))
> +			return size;
> +
> +		val = *addr1 & ~*addr2 & GENMASK(size - 1, offset);
> +		return val ? __ffs(val) : size;
> +	}
> +
> +	return _find_next_bit(addr1, addr2, size, offset, 0UL, 0, 1);
>   }
>   #endif
>   
> @@ -99,7 +131,7 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
>   		return val == ~0UL ? size : ffz(val);
>   	}
>   
> -	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0);
> +	return _find_next_bit(addr, NULL, size, offset, ~0UL, 0, 0);
>   }
>   #endif
>   
> @@ -247,7 +279,7 @@ unsigned long find_next_zero_bit_le(const void *addr, unsigned
>   		return val == ~0UL ? size : ffz(val);
>   	}
>   
> -	return _find_next_bit(addr, NULL, size, offset, ~0UL, 1);
> +	return _find_next_bit(addr, NULL, size, offset, ~0UL, 1, 0);
>   }
>   #endif
>   
> @@ -266,7 +298,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
>   		return val ? __ffs(val) : size;
>   	}
>   
> -	return _find_next_bit(addr, NULL, size, offset, 0UL, 1);
> +	return _find_next_bit(addr, NULL, size, offset, 0UL, 1, 0);
>   }
>   #endif
>   
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 4564faafd0e1..41bed4b883d3 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -245,5 +245,50 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
>   	return cpumask_of_node(cpu_to_node(cpu));
>   }
>   
> +#ifdef CONFIG_NUMA
> +extern const struct cpumask *sched_numa_hop_mask(int node, int hops);
> +#else
> +static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
> +{
> +	return ERR_PTR(-ENOTSUPP);
> +}
> +#endif	/* CONFIG_NUMA */
> +
> +/**
> + * for_each_numa_hop_cpu - iterate over CPUs by increasing NUMA distance,
> + *                         starting from a given node.
> + * @cpu: the iteration variable.
> + * @node: the NUMA node to start the search from.
> + *
> + * Requires rcu_lock to be held.
> + * Careful: this is a double loop, 'break' won't work as expected.
> + *
> + *
> + * Implementation notes:
> + *
> + * Provided it is valid, the mask returned by
> + *  sched_numa_hop_mask(node, hops+1)
> + * is a superset of the one returned by
> + *   sched_numa_hop_mask(node, hops)
> + * which may not be that useful for drivers that try to spread things out and
> + * want to visit each CPU at most once.
> + *
> + * To accommodate that, we use for_each_cpu_andnot() to iterate over the CPUs
> + * of sched_numa_hop_mask(node, hops+1) with the CPUs of
> + * sched_numa_hop_mask(node, hops) removed, IOW we only iterate over CPUs
> + * a given distance away (rather than *up to* a given distance).
> + *
> + * h=0 forces us to play silly games and pass cpu_none_mask to
> + * for_each_cpu_andnot(), which turns it into for_each_cpu().
> + */
> +#define for_each_numa_hop_cpu(cpu, node)				       \
> +	for (struct { const struct cpumask *mask; int hops; } __v__ =	       \
> +		     { sched_numa_hop_mask(node, 0), 0 };		       \
> +	     !IS_ERR_OR_NULL(__v__.mask);				       \
> +	     __v__.hops++, __v__.mask = sched_numa_hop_mask(node, __v__.hops)) \
> +		for_each_cpu_andnot(cpu, __v__.mask,			       \
> +				    __v__.hops ?			       \
> +				    sched_numa_hop_mask(node, __v__.hops - 1) :\
> +				    cpu_none_mask)
>   
>   #endif /* _LINUX_TOPOLOGY_H */
> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
> index 976092b7bd45..9182101f2c4f 100644
> --- a/kernel/sched/Makefile
> +++ b/kernel/sched/Makefile
> @@ -29,6 +29,6 @@ endif
>   # build parallelizes well and finishes roughly at once:
>   #
>   obj-y += core.o
> -obj-y += fair.o
> +obj-y += fair.o yolo.o
>   obj-y += build_policy.o
>   obj-y += build_utility.o
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 8739c2a5a54e..f0236a0ae65c 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2067,6 +2067,34 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
>   	return found;
>   }
>   
> +/**
> + * sched_numa_hop_mask() - Get the cpumask of CPUs at most @hops hops away.
> + * @node: The node to count hops from.
> + * @hops: Include CPUs up to that many hops away. 0 means local node.
> + *
> + * Requires rcu_lock to be held. Returned cpumask is only valid within that
> + * read-side section, copy it if required beyond that.
> + *
> + * Note that not all hops are equal in size; see sched_init_numa() for how
> + * distances and masks are handled.
> + *
> + * Also note that this is a reflection of sched_domains_numa_masks, which may change
> + * during the lifetime of the system (offline nodes are taken out of the masks).
> + */
> +const struct cpumask *sched_numa_hop_mask(int node, int hops)
> +{
> +	struct cpumask ***masks = rcu_dereference(sched_domains_numa_masks);
> +
> +	if (node >= nr_node_ids || hops >= sched_domains_numa_levels)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (!masks)
> +		return NULL;
> +
> +	return masks[hops][node];
> +}
> +EXPORT_SYMBOL_GPL(sched_numa_hop_mask);
> +
>   #endif /* CONFIG_NUMA */
>   
>   static int __sdt_alloc(const struct cpumask *cpu_map)
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index a971a82d2f43..8bcf7e919193 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -42,6 +42,25 @@ int cpumask_next_and(int n, const struct cpumask *src1p,
>   }
>   EXPORT_SYMBOL(cpumask_next_and);
>   
> +/**
> + * cpumask_next_andnot - get the next cpu in *src1p & ~*src2p
> + * @n: the cpu prior to the place to search (ie. return will be > @n)
> + * @src1p: the first cpumask pointer
> + * @src2p: the second cpumask pointer
> + *
> + * Returns >= nr_cpu_ids if no further cpus are set in *src1p but not in *src2p.
> + */
> +int cpumask_next_andnot(int n, const struct cpumask *src1p,
> +		     const struct cpumask *src2p)
> +{
> +	/* -1 is a legal arg here. */
> +	if (n != -1)
> +		cpumask_check(n);
> +	return find_next_andnot_bit(cpumask_bits(src1p), cpumask_bits(src2p),
> +		nr_cpumask_bits, n + 1);
> +}
> +EXPORT_SYMBOL(cpumask_next_andnot);
> +
>   /**
>    * cpumask_any_but - return a "random" in a cpumask, but not this one.
>    * @mask: the cpumask to search
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index 1b8e4b2a9cba..6e5f42c621a9 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -21,17 +21,19 @@
>   
>   #if !defined(find_next_bit) || !defined(find_next_zero_bit) ||			\
>   	!defined(find_next_bit_le) || !defined(find_next_zero_bit_le) ||	\
> -	!defined(find_next_and_bit)
> +	!defined(find_next_and_bit) || !defined(find_next_andnot_bit)
>   /*
>    * This is a common helper function for find_next_bit, find_next_zero_bit, and
>    * find_next_and_bit. The differences are:
>    *  - The "invert" argument, which is XORed with each fetched word before
>    *    searching it for one bits.
> - *  - The optional "addr2", which is anded with "addr1" if present.
> + *  - The optional "addr2", negated if "negate" and ANDed with "addr1" if
> + *    present.
>    */
>   unsigned long _find_next_bit(const unsigned long *addr1,
>   		const unsigned long *addr2, unsigned long nbits,
> -		unsigned long start, unsigned long invert, unsigned long le)
> +		unsigned long start, unsigned long invert, unsigned long le,
> +		bool negate)
>   {
>   	unsigned long tmp, mask;
>   
> @@ -40,7 +42,9 @@ unsigned long _find_next_bit(const unsigned long *addr1,
>   
>   	tmp = addr1[start / BITS_PER_LONG];
>   	if (addr2)
> -		tmp &= addr2[start / BITS_PER_LONG];
> +		tmp &= negate ?
> +		       ~addr2[start / BITS_PER_LONG] :
> +			addr2[start / BITS_PER_LONG];
>   	tmp ^= invert;
>   
>   	/* Handle 1st word. */
> @@ -59,7 +63,9 @@ unsigned long _find_next_bit(const unsigned long *addr1,
>   
>   		tmp = addr1[start / BITS_PER_LONG];
>   		if (addr2)
> -			tmp &= addr2[start / BITS_PER_LONG];
> +			tmp &= negate ?
> +			       ~addr2[start / BITS_PER_LONG] :
> +				addr2[start / BITS_PER_LONG];
>   		tmp ^= invert;
>   	}
>   
> 
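
To make the semantics of the new find_next_andnot_bit() helper concrete, here is a minimal userspace sketch (not kernel code) of a word-at-a-time search for the next bit that is set in addr1 but clear in addr2. The names find_next_andnot_bit_sketch(), lowest_set_bit(), set_bit_sketch(), for_each_andnot_bit_sketch() and WORD_BITS are hypothetical stand-ins for the kernel's helpers, __ffs() and BITS_PER_LONG.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in one bitmap word (BITS_PER_LONG in the kernel). */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Index of the lowest set bit; stands in for the kernel's __ffs(). */
static unsigned long lowest_set_bit(unsigned long val)
{
	unsigned long n = 0;

	assert(val != 0);	/* like __ffs(), undefined for 0 */
	while (!(val & 1UL)) {
		val >>= 1;
		n++;
	}
	return n;
}

/* Set bit @nr in @addr (non-atomic; only used to build test bitmaps). */
static void set_bit_sketch(unsigned long nr, unsigned long *addr)
{
	addr[nr / WORD_BITS] |= 1UL << (nr % WORD_BITS);
}

/*
 * Next bit >= @start that is set in @addr1 but clear in @addr2, or @nbits
 * if there is none. Same shape as _find_next_bit() with negate == true.
 */
static unsigned long find_next_andnot_bit_sketch(const unsigned long *addr1,
						 const unsigned long *addr2,
						 unsigned long nbits,
						 unsigned long start)
{
	unsigned long tmp;

	if (start >= nbits)
		return nbits;

	/* First word: combine, then clear the bits below @start. */
	tmp = addr1[start / WORD_BITS] & ~addr2[start / WORD_BITS];
	tmp &= ~0UL << (start % WORD_BITS);
	start -= start % WORD_BITS;

	/* Skip whole words that contain no candidate bits. */
	while (!tmp) {
		start += WORD_BITS;
		if (start >= nbits)
			return nbits;
		tmp = addr1[start / WORD_BITS] & ~addr2[start / WORD_BITS];
	}

	start += lowest_set_bit(tmp);
	return start < nbits ? start : nbits;	/* ignore bits past @nbits */
}

/* Iterate @bit over addr1 & ~addr2, in the spirit of for_each_cpu_andnot(). */
#define for_each_andnot_bit_sketch(bit, addr1, addr2, nbits)			\
	for ((bit) = find_next_andnot_bit_sketch((addr1), (addr2), (nbits), 0);	\
	     (bit) < (nbits);							\
	     (bit) = find_next_andnot_bit_sketch((addr1), (addr2), (nbits), (bit) + 1))
```

The negate flag in the patch only changes how the second word is folded in (~addr2[...] instead of addr2[...]); the rest of the search is shared with find_next_and_bit().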

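The hop-by-hop walk that for_each_numa_hop_cpu() performs, together with the early exit the mlx5 hunk needs, can be sketched in plain C with 64-bit masks. The topology, hop_masks[], hop_mask() and spread_cpus() are all hypothetical; the point is only that removing the previous (smaller) hop mask visits each CPU exactly once, and that a goto is needed to leave both loops at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8

/*
 * Hypothetical topology seen from one node: hop_masks[h] holds every CPU
 * at most h hops away, so each mask is a superset of the previous one.
 */
static const uint64_t hop_masks[] = {
	0x0c,	/* hop 0: CPUs 2-3 (local node) */
	0xcc,	/* hop 1: adds CPUs 6-7 */
	0xff,	/* hop 2: adds CPUs 0-1 and 4-5 */
};

/* Stand-in for sched_numa_hop_mask(): NULL once @hops is past the last level. */
static const uint64_t *hop_mask(size_t hops)
{
	return hops < sizeof(hop_masks) / sizeof(hop_masks[0]) ?
	       &hop_masks[hops] : NULL;
}

/*
 * Fill @cpus with up to @n CPUs by increasing hop distance, visiting each
 * CPU once: at hop h only mask[h] & ~mask[h-1] is new (an empty mask at
 * h == 0, like cpu_none_mask). Returns the number of CPUs written.
 */
static size_t spread_cpus(int *cpus, size_t n)
{
	size_t i = 0;

	if (n == 0)
		return 0;

	for (size_t h = 0; hop_mask(h); h++) {
		uint64_t prev = h ? *hop_mask(h - 1) : 0;
		uint64_t fresh = *hop_mask(h) & ~prev;

		assert((*hop_mask(h) & prev) == prev);	/* superset property */

		for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
			if (!(fresh & (UINT64_C(1) << cpu)))
				continue;
			cpus[i] = cpu;
			if (++i == n)
				goto done;	/* 'break' would only leave the inner loop */
		}
	}
done:
	return i;
}
```

In the kernel the masks would come from sched_numa_hop_mask() under rcu_read_lock(), which this userspace sketch does not model.
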
^ permalink raw reply

* Re: [PATCH 1/2] sched/topology: Introduce sched_numa_hop_mask()
From: Tariq Toukan @ 2022-08-14  8:26 UTC (permalink / raw)
  To: Valentin Schneider, netdev, linux-kernel
  Cc: Tariq Toukan, David S. Miller, Saeed Mahameed, Jakub Kicinski,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Eric Dumazet,
	Paolo Abeni, Gal Pressman, Vincent Guittot
In-Reply-To: <6a2dae6d-cbac-84ba-8852-dadd183fb77d@gmail.com>


>> If you don't have major gripes with it, I'll shape that into a proper
>> series and will let you handle the mlx5/enic bits.
>>

Sure, I can take the drivers/networking parts. In order to submit the API
and its usage combined in one patchset, I can send you these parts
privately and you can combine them into your submitted series, or the other
way around if you want me to do the submission.
Both work for me.
I will do it once we converge on the API.

Important note: I'll be out of office, with very limited access to
email, until Sep 1st. I doubt I can make much progress before then.

^ permalink raw reply

* Re: [GIT PULL] virtio: fatures, fixes
From: Michael S. Tsirkin @ 2022-08-14  8:59 UTC (permalink / raw)
  To: Andres Freund
  Cc: Xuan Zhuo, Linus Torvalds, kvm, virtualization, netdev,
	linux-kernel, alvaro.karsz, colin.i.king, colin.king,
	dan.carpenter, david, elic, eperezma, gautam.dawar, gshan,
	hdegoede, hulkci, jasowang, jiaming, kangjie.xu, lingshan.zhu,
	liubo03, michael.christie, pankaj.gupta, peng.fan, quic_mingxue,
	robin.murphy, sgarzare, suwan.kim027, syoshida, xieyongji,
	xuqiang36
In-Reply-To: <20220814043906.xkmhmnp23bqjzz4s@awork3.anarazel.de>

On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > Is there specific information you'd like from the VM? I just recreated the
> > problem and can extract.
> 
> Actually, after reproducing I seem to now hit a likely different issue. I
> guess I should have checked exactly the revision I had a problem with earlier,
> rather than doing a git pull (up to aea23e7c464b)

Looks like there's generic memory corruption, so it crashes
in random places. Would a bisect be possible for you?

-- 
MST


^ permalink raw reply

* [PATCH] net/ppp: fix repeated words in comments
From: Jilin Yuan @ 2022-08-14  9:22 UTC (permalink / raw)
  To: paulus, davem, edumazet, kuba, pabeni
  Cc: linux-ppp, netdev, linux-kernel, Jilin Yuan

Delete the redundant word 'the'.

Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
---
 drivers/net/ppp/ppp_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 4a365f15533e..942c7e7372d9 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2969,7 +2969,7 @@ ppp_unregister_channel(struct ppp_channel *chan)
 
 	/*
 	 * This ensures that we have returned from any calls into the
-	 * the channel's start_xmit or ioctl routine before we proceed.
+	 * channel's start_xmit or ioctl routine before we proceed.
 	 */
 	down_write(&pch->chan_sem);
 	spin_lock_bh(&pch->downl);
-- 
2.36.1


^ permalink raw reply related

* [syzbot] WARNING in tls_strp_done
From: syzbot @ 2022-08-14 10:15 UTC (permalink / raw)
  To: borisp, davem, edumazet, john.fastabend, kuba, linux-kernel,
	netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    7ebfc85e2cd7 Merge tag 'net-6.0-rc1' of git://git.kernel.o..
git tree:       upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=10545c6b080000
kernel config:  https://syzkaller.appspot.com/x/.config?x=924833c12349a8c0
dashboard link: https://syzkaller.appspot.com/bug?extid=abd45eb849b05194b1b6
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=164c98cb080000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15497dc3080000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+abd45eb849b05194b1b6@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 3611 at kernel/workqueue.c:3066 __flush_work+0x926/0xb10 kernel/workqueue.c:3066
Modules linked in:
CPU: 0 PID: 3611 Comm: syz-executor165 Not tainted 5.19.0-syzkaller-13930-g7ebfc85e2cd7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
RIP: 0010:__flush_work+0x926/0xb10 kernel/workqueue.c:3066
Code: 00 48 c7 c6 0b 12 4f 81 48 c7 c7 40 92 f8 8b e8 30 61 10 00 e9 66 fc ff ff e8 d6 f4 2c 00 0f 0b e9 5a fc ff ff e8 ca f4 2c 00 <0f> 0b 45 31 f6 e9 4b fc ff ff e8 0b 4c 79 00 e9 3a fb ff ff e8 b1
RSP: 0018:ffffc900038bf948 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888020f0a8f0 RCX: 0000000000000000
RDX: ffff88801bd50000 RSI: ffffffff814f1246 RDI: 0000000000000001
RBP: ffffc900038bfae0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000
R13: 1ffff92000717f5f R14: 0000000000000001 R15: ffff888020f0a908
FS:  0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555559cb5d0 CR3: 000000000bc8e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3162
 tls_strp_done+0x66/0x230 net/tls/tls_strp.c:478
 tls_sk_proto_close+0x40d/0xaf0 net/tls/tls_main.c:328
 inet_release+0x12e/0x270 net/ipv4/af_inet.c:428
 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:482
 __sock_release+0xcd/0x280 net/socket.c:650
 sock_close+0x18/0x20 net/socket.c:1365
 __fput+0x277/0x9d0 fs/file_table.c:320
 task_work_run+0xdd/0x1a0 kernel/task_work.c:177
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0xad5/0x29b0 kernel/exit.c:795
 do_group_exit+0xd2/0x2f0 kernel/exit.c:925
 __do_sys_exit_group kernel/exit.c:936 [inline]
 __se_sys_exit_group kernel/exit.c:934 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:934
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f0b52a2fdf9
Code: Unable to access opcode bytes at RIP 0x7f0b52a2fdcf.
RSP: 002b:00007fffd86d4c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f0b52aa43f0 RCX: 00007f0b52a2fdf9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000001
R10: 00000000200004c0 R11: 0000000000000246 R12: 00007f0b52aa43f0
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* [PATCH net 1/1] net_sched: cls_route: disallow handle of 0
From: Jamal Hadi Salim @ 2022-08-14 11:27 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni
  Cc: netdev, xiyou.wangcong, jiri, kuznet, cascardo, linux-distros,
	security, stephen, dsahern, gregkh, Jamal Hadi Salim

Follows up on:
https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/

A handle of 0 implies from/to of the universe realm, which is not very
sensible.

Let's see what this patch does:
$ sudo tc qdisc add dev $DEV root handle 1:0 prio

// let's manufacture a way to insert a handle of 0
$ sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \
route to 0 from 0 classid 1:10 action ok

// gets rejected...
Error: handle of 0 is not valid.
We have an error talking to the kernel, -1

// let's create a legit entry
$ sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \
classid 1:10 action ok

// what did the kernel insert?
$ sudo tc filter ls dev $DEV parent 1:0
filter protocol ip pref 100 route chain 0
filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10
	action order 1: gact action pass
	 random type none pass val 0
	 index 1 ref 1 bind 1

// let's try to replace that legit entry with a handle of 0
$ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \
handle 0x000a8000 route to 0 from 0 classid 1:10 action drop

Error: Replacing with handle of 0 is invalid.
We have an error talking to the kernel, -1

And lastly, let's run Cascardo's PoC:
$ ./poc
0
0
-22
-22
-22

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/cls_route.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 3f935cbbaff6..48712bc51bda 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -424,6 +424,11 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 			return -EINVAL;
 	}
 
+	if (!nhandle) {
+		NL_SET_ERR_MSG(extack, "Replacing with handle of 0 is invalid");
+		return -EINVAL;
+	}
+
 	h1 = to_hash(nhandle);
 	b = rtnl_dereference(head->table[h1]);
 	if (!b) {
@@ -477,6 +482,11 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 	int err;
 	bool new = true;
 
+	if (!handle) {
+		NL_SET_ERR_MSG(extack, "Creating with handle of 0 is invalid");
+		return -EINVAL;
+	}
+
 	if (opt == NULL)
 		return handle ? -EINVAL : 0;
 
-- 
2.25.1

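A rough model of why "route to 0 from 0" yields a zero handle: judging from the fh 0x000a8000 printed above for "from 10" with no "to", the handle appears to carry the "to" realm (or 0x8000 when absent) in the low 16 bits and the "from" realm (or 0xFFFF when absent) in the high 16 bits. The sketch below assumes that layout and ignores the "iif" selector; route4_handle_sketch() and route4_check_handle_sketch() are hypothetical helpers, not the classifier's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed handle layout: low 16 bits = "to" realm (0x8000 if absent),
 * high 16 bits = "from" realm (0xFFFF if absent). Realms must be <= 0xFF.
 */
static uint32_t route4_handle_sketch(bool has_to, uint32_t to,
				     bool has_from, uint32_t from)
{
	uint32_t handle;

	assert(to <= 0xFF && from <= 0xFF);	/* realm range precondition */
	handle = has_to ? to : 0x8000;
	handle |= (has_from ? from : 0xFFFFu) << 16;
	return handle;
}

/* The rule the patch adds on both the create and replace paths. */
static int route4_check_handle_sketch(uint32_t handle)
{
	return handle ? 0 : -EINVAL;
}
```

With both realms explicitly set to 0 nothing in the model sets a bit, which is why the check has to reject the zero handle explicitly rather than relying on the 0x8000/0xFFFF "absent" markers.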

^ permalink raw reply related

* Re: [PATCH v5] Bluetooth: hci_sync: Remove redundant func definition
From: kernel test robot @ 2022-08-14 12:18 UTC (permalink / raw)
  To: Zijun Hu, marcel, johan.hedberg, luiz.dentz, davem, edumazet,
	kuba, pabeni, luiz.von.dentz
  Cc: kbuild-all, linux-kernel, linux-bluetooth, netdev
In-Reply-To: <1658488552-24691-1-git-send-email-quic_zijuhu@quicinc.com>

Hi Zijun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bluetooth/master]
[also build test WARNING on net-next/master net/master linus/master v5.19]
[cannot apply to bluetooth-next/master next-20220812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Zijun-Hu/Bluetooth-hci_sync-Remove-redundant-func-definition/20220722-191804
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git master
config: arm-defconfig (https://download.01.org/0day-ci/archive/20220814/202208142033.Kav1wBRp-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/01ff3d2230c220a1387940ed594eccda09dc51fb
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Zijun-Hu/Bluetooth-hci_sync-Remove-redundant-func-definition/20220722-191804
        git checkout 01ff3d2230c220a1387940ed594eccda09dc51fb
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash net/bluetooth/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/bluetooth/hci_sync.c:2398:6: warning: no previous prototype for 'disconnected_accept_list_entries' [-Wmissing-prototypes]
    2398 | bool disconnected_accept_list_entries(struct hci_dev *hdev)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/disconnected_accept_list_entries +2398 net/bluetooth/hci_sync.c

  2397	
> 2398	bool disconnected_accept_list_entries(struct hci_dev *hdev)
  2399	{
  2400		struct bdaddr_list *b;
  2401	
  2402		list_for_each_entry(b, &hdev->accept_list, list) {
  2403			struct hci_conn *conn;
  2404	
  2405			conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &b->bdaddr);
  2406			if (!conn)
  2407				return true;
  2408	
  2409			if (conn->state != BT_CONNECTED && conn->state != BT_CONFIG)
  2410				return true;
  2411		}
  2412	
  2413		return false;
  2414	}
  2415	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply

* Re: [PATCH v5] Bluetooth: hci_sync: Remove redundant func definition
From: kernel test robot @ 2022-08-14 12:18 UTC (permalink / raw)
  To: Zijun Hu, marcel, johan.hedberg, luiz.dentz, davem, edumazet,
	kuba, pabeni, luiz.von.dentz
  Cc: kbuild-all, linux-kernel, linux-bluetooth, netdev
In-Reply-To: <1658488552-24691-1-git-send-email-quic_zijuhu@quicinc.com>

Hi Zijun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bluetooth/master]
[also build test WARNING on net-next/master net/master linus/master v5.19]
[cannot apply to bluetooth-next/master next-20220812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Zijun-Hu/Bluetooth-hci_sync-Remove-redundant-func-definition/20220722-191804
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git master
config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20220814/202208142029.Y9YOiT4V-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/01ff3d2230c220a1387940ed594eccda09dc51fb
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Zijun-Hu/Bluetooth-hci_sync-Remove-redundant-func-definition/20220722-191804
        git checkout 01ff3d2230c220a1387940ed594eccda09dc51fb
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/bluetooth/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/bluetooth/hci_sync.c:2398:6: warning: no previous prototype for 'disconnected_accept_list_entries' [-Wmissing-prototypes]
    2398 | bool disconnected_accept_list_entries(struct hci_dev *hdev)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/disconnected_accept_list_entries +2398 net/bluetooth/hci_sync.c

  2397	
> 2398	bool disconnected_accept_list_entries(struct hci_dev *hdev)
  2399	{
  2400		struct bdaddr_list *b;
  2401	
  2402		list_for_each_entry(b, &hdev->accept_list, list) {
  2403			struct hci_conn *conn;
  2404	
  2405			conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &b->bdaddr);
  2406			if (!conn)
  2407				return true;
  2408	
  2409			if (conn->state != BT_CONNECTED && conn->state != BT_CONFIG)
  2410				return true;
  2411		}
  2412	
  2413		return false;
  2414	}
  2415	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply

* Re: [RFC net-next 4/4] ynl: add a sample user for ethtool
From: Ido Schimmel @ 2022-08-14 12:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, edumazet, pabeni, sdf, jacob.e.keller, vadfed,
	johannes, jiri, dsahern, stephen, fw, linux-doc
In-Reply-To: <20220811022304.583300-5-kuba@kernel.org>

On Wed, Aug 10, 2022 at 07:23:04PM -0700, Jakub Kicinski wrote:
> @@ -0,0 +1,115 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +
> +name: ethtool
> +
> +description: |
> +  Ethernet device configuration interface.
> +
> +attr-cnt-suffix: CNT
> +
> +attribute-spaces:
> +  -
> +    name: header
> +    name-prefix: ETHTOOL_A_HEADER_
> +    attributes:
> +      -
> +        name: dev_index
> +        val: 1
> +        type: u32
> +      -
> +        name: dev_name
> +        type: nul-string
> +        len: ALTIFNAMSIZ - 1
> +      -
> +        name: flags
> +        type: u32
> +  -
> +    name: channels
> +    name-prefix: ETHTOOL_A_CHANNELS_
> +    attributes:
> +      -
> +        name: header
> +        val: 1
> +        type: nest
> +        nested-attributes: header
> +      -
> +        name: rx_max
> +        type: u32
> +      -
> +        name: tx_max
> +        type: u32
> +      -
> +        name: other_max
> +        type: u32
> +      -
> +        name: combined_max
> +        type: u32
> +      -
> +        name: rx_count
> +        type: u32
> +      -
> +        name: tx_count
> +        type: u32
> +      -
> +        name: other_count
> +        type: u32
> +      -
> +        name: combined_count
> +        type: u32

Another interesting use case for the schema can be automatic generation
of syzkaller descriptions. These are the corresponding descriptions for
syzkaller:

https://github.com/google/syzkaller/blob/master/sys/linux/socket_netlink_generic_ethtool.txt#L125

Last I checked, these descriptions had to be written by hand, which is
why they are generally out of date, leading to sub-optimal fuzzing. If
schemas are sent along with the kernel code and syzkaller/syzbot
automatically derives descriptions from them, then we should be able to
get meaningful fuzzing as soon as a feature lands in net-next.

> +
> +headers:
> +  user: linux/if.h
> +  uapi: linux/ethtool_netlink.h
> +
> +operations:
> +  name-prefix: ETHTOOL_MSG_
> +  async-prefix: ETHTOOL_MSG_
> +  list:
> +    -
> +      name: channels_get
> +      val: 17
> +      description: Get current and max supported number of channels.
> +      attribute-space: channels
> +      do:
> +        request:
> +          attributes:
> +            - header
> +        reply: &channel_reply
> +          attributes:
> +            - header
> +            - rx_max
> +            - tx_max
> +            - other_max
> +            - combined_max
> +            - rx_count
> +            - tx_count
> +            - other_count
> +            - combined_count
> +      dump:
> +        reply: *channel_reply
> +
> +    -
> +      name: channels_ntf
> +      description: Notification for device changing its number of channels.
> +      notify: channels_get
> +      mcgrp: monitor
> +
> +    -
> +      name: channels_set
> +      description: Set number of channels.
> +      attribute-space: channels
> +      do:
> +        request:
> +          attributes:
> +            - header
> +            - rx_count
> +            - tx_count
> +            - other_count
> +            - combined_count
> +
> +mcast-groups:
> +  name-prefix: ETHTOOL_MCGRP_
> +  name-suffix: _NAME
> +  list:
> +    -
> +      name: monitor

^ permalink raw reply

* Re: [PATCH v4 net-next 3/6] drivers: net: dsa: add locked fdb entry flag to drivers
From: Ido Schimmel @ 2022-08-14 14:55 UTC (permalink / raw)
  To: netdev
  Cc: Vladimir Oltean, davem, kuba, netdev, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Eric Dumazet, Paolo Abeni, Jiri Pirko,
	Ivan Vecera, Roopa Prabhu, Nikolay Aleksandrov, Shuah Khan,
	Daniel Borkmann, linux-kernel, bridge, linux-kselftest
In-Reply-To: <5a4cfc6246f621d006af69d4d1f61ed1@kapio-technology.com>

On Fri, Aug 12, 2022 at 02:29:48PM +0200, netdev@kapio-technology.com wrote:
> On 2022-08-11 13:28, Ido Schimmel wrote:
> 
> > > > I'm talking about roaming, not forwarding. Let's say you have a locked
> > > > entry with MAC X pointing to port Y. Now you get a packet with SMAC X
> > > > from port Z which is unlocked. Will the FDB entry roam to port Z? I
> > > > think it should, but at least in current implementation it seems that
> > > > the "locked" flag will not be reset and having locked entries pointing
> > > > to an unlocked port looks like a bug.
> > > >
> > > 
> 
> In general I have been thinking that this setup is a network
> configuration error, as I argued in an earlier conversation with
> Vladimir. In this setup we must remember that SMAC X becomes DMAC X in the
> return traffic on the open port. But the question that arises for me is why
> MAC X would be behind the locked port without being authed while also being
> behind an open port.
> In a real life setup, I don't think you would want random hosts behind a
> locked port in the MAB case, but only the hosts you will let through. Other
> hosts should be regarded as intruders.
> 
> If we are talking about a station move, then the locked entry will age out
> and MAC X will function normally on the open port after the timeout, which
> was a case that was taken up in earlier discussions.
> 
> But I will do some testing with this 'edge case' anyway (MAC X being behind
> both a locked and an unlocked port, if I may call it that), and make sure
> that the offloaded and non-offloaded cases behave the same and work
> satisfactorily.

It would be best to implement these as additional test cases in the
current selftest. Then you can easily test with both veth pairs and
loopbacks and see that the hardware and software data paths behave the
same.

> 
> I think it will be good to have a flag to enable the mac-auth/MAB feature,
> and I suggest just calling the flag 'mab', as it is short.

Fine by me, but I'm not sure everyone agrees.

> 
> Otherwise I don't see any major issues with the whole feature as it is.

Will review and test the next version.

Thanks


* Re: [PATCH net 1/1] net_sched: cls_route: disallow handle of 0
From: Stephen Hemminger @ 2022-08-14 15:00 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: davem, edumazet, kuba, pabeni, netdev, xiyou.wangcong, jiri,
	kuznet, cascardo, linux-distros, security, dsahern, gregkh
In-Reply-To: <20220814112758.3088655-1-jhs@mojatatu.com>

On Sun, 14 Aug 2022 11:27:58 +0000
Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> Follows up on:
> https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/
> 
> A handle of 0 implies from/to of the universe realm, which is not very
> sensible.
> 
> Let's see what this patch will do:
> $sudo tc qdisc add dev $DEV root handle 1:0 prio
> 
> //let's manufacture a way to insert a handle of 0
> $sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \
> route to 0 from 0 classid 1:10 action ok
> 
> //gets rejected...
> Error: handle of 0 is not valid.
> We have an error talking to the kernel, -1
> 
> //let's create a legit entry..
> $sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \
> classid 1:10 action ok
> 
> //what did the kernel insert?
> $sudo tc filter ls dev $DEV parent 1:0
> filter protocol ip pref 100 route chain 0
> filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10
> 	action order 1: gact action pass
> 	 random type none pass val 0
> 	 index 1 ref 1 bind 1
> 
> //Let's try to replace that legit entry with a handle of 0
> $ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \
> handle 0x000a8000 route to 0 from 0 classid 1:10 action drop
> 
> Error: Replacing with handle of 0 is invalid.
> We have an error talking to the kernel, -1
> 
> And last, let's run Cascardo's POC:
> $ ./poc
> 0
> 0
> -22
> -22
> -22
> 
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

Acked-by: Stephen Hemminger <stephen@networkplumber.org>


* [PATCH] wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected
From: Siddh Raman Pant @ 2022-08-14 15:15 UTC (permalink / raw)
  To: Johannes Berg, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: linux-wireless, netdev, linux-kernel, linux-kernel-mentees,
	syzbot+b6c9fe29aefe68e4ad34, stable

When we are not connected to a channel, sending a channel "switch"
announcement doesn't make any sense.

The BSS list is empty in that case. This causes the for loop in
cfg80211_get_bss() to be bypassed, so the function returns NULL
(check line 1424 of net/wireless/scan.c), causing the WARN_ON()
in ieee80211_ibss_csa_beacon() to get triggered (check line 500
of net/mac80211/ibss.c), which was consequently reported on the
syzkaller dashboard.

Thus, check if we have an existing connection before generating
the CSA beacon in ieee80211_ibss_finish_csa().

Fixes: cd7760e62c2a ("mac80211: add support for CSA in IBSS mode")
Bug report: https://syzkaller.appspot.com/bug?id=05603ef4ae8926761b678d2939a3b2ad28ab9ca6
Reported-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org

Signed-off-by: Siddh Raman Pant <code@siddh.me>
---
The commit in the Fixes: tag is old, and syzkaller shows the problem also
exists on 4.19 and 4.14, so I CC'd the stable list.

 net/mac80211/ibss.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/mac80211/ibss.c b/net/mac80211/ibss.c
index d56890e3fabb..9b283bbc7bb4 100644
--- a/net/mac80211/ibss.c
+++ b/net/mac80211/ibss.c
@@ -530,6 +530,10 @@ int ieee80211_ibss_finish_csa(struct ieee80211_sub_if_data *sdata)
 
 	sdata_assert_lock(sdata);
 
+	/* When not connected/joined, sending CSA doesn't make sense. */
+	if (ifibss->state != IEEE80211_IBSS_MLME_JOINED)
+		return -ENOLINK;
+
 	/* update cfg80211 bss information with the new channel */
 	if (!is_zero_ether_addr(ifibss->bssid)) {
 		cbss = cfg80211_get_bss(sdata->local->hw.wiphy,
-- 
2.35.1




* Re: [PATCH 4/6] net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction
From: kernel test robot @ 2022-08-14 15:49 UTC (permalink / raw)
  To: Dmitry Safonov, linux-kernel
  Cc: kbuild-all, Dmitry Safonov, Andy Lutomirski, Ard Biesheuvel,
	David Ahern, Eric Biggers, Eric Dumazet, Francesco Ruggeri,
	Herbert Xu, Hideaki YOSHIFUJI, Jakub Kicinski, Leonard Crestez,
	Paolo Abeni, Salam Noureddine, netdev, linux-crypto
In-Reply-To: <20220726201600.1715505-5-dima@arista.com>

Hi Dmitry,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 058affafc65a74cf54499fb578b66ad0b18f939b]

url:    https://github.com/intel-lab-lkp/linux/commits/Dmitry-Safonov/net-crypto-Introduce-crypto_pool/20220727-041830
base:   058affafc65a74cf54499fb578b66ad0b18f939b
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20220814/202208142332.WUqM9sfv-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/a4ee3ecdaada036ed6747ed86eaf7270d3f27bab
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dmitry-Safonov/net-crypto-Introduce-crypto_pool/20220727-041830
        git checkout a4ee3ecdaada036ed6747ed86eaf7270d3f27bab
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/ipv4/

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/ipv4/tcp_ipv4.c:1174:5: warning: no previous prototype for '__tcp_md5_do_add' [-Wmissing-prototypes]
    1174 | int __tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
         |     ^~~~~~~~~~~~~~~~


vim +/__tcp_md5_do_add +1174 net/ipv4/tcp_ipv4.c

  1172	
  1173	/* This can be called on a newly created socket, from other files */
> 1174	int __tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
  1175			     int family, u8 prefixlen, int l3index, u8 flags,
  1176			     const u8 *newkey, u8 newkeylen, gfp_t gfp)
  1177	{
  1178		/* Add Key to the list */
  1179		struct tcp_md5sig_key *key;
  1180		struct tcp_sock *tp = tcp_sk(sk);
  1181		struct tcp_md5sig_info *md5sig;
  1182	
  1183		key = tcp_md5_do_lookup_exact(sk, addr, family, prefixlen, l3index, flags);
  1184		if (key) {
  1185			/* Pre-existing entry - just update that one.
  1186			 * Note that the key might be used concurrently.
  1187			 * data_race() is telling kcsan that we do not care of
  1188			 * key mismatches, since changing MD5 key on live flows
  1189			 * can lead to packet drops.
  1190			 */
  1191			data_race(memcpy(key->key, newkey, newkeylen));
  1192	
  1193			/* Pairs with READ_ONCE() in tcp_md5_hash_key().
  1194			 * Also note that a reader could catch new key->keylen value
  1195			 * but old key->key[], this is the reason we use __GFP_ZERO
  1196			 * at sock_kmalloc() time below these lines.
  1197			 */
  1198			WRITE_ONCE(key->keylen, newkeylen);
  1199	
  1200			return 0;
  1201		}
  1202	
  1203		md5sig = rcu_dereference_protected(tp->md5sig_info,
  1204						   lockdep_sock_is_held(sk));
  1205	
  1206		key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO);
  1207		if (!key)
  1208			return -ENOMEM;
  1209		if (!tcp_alloc_md5sig_pool()) {
  1210			sock_kfree_s(sk, key, sizeof(*key));
  1211			return -ENOMEM;
  1212		}
  1213	
  1214		memcpy(key->key, newkey, newkeylen);
  1215		key->keylen = newkeylen;
  1216		key->family = family;
  1217		key->prefixlen = prefixlen;
  1218		key->l3index = l3index;
  1219		key->flags = flags;
  1220		memcpy(&key->addr, addr,
  1221		       (IS_ENABLED(CONFIG_IPV6) && family == AF_INET6) ? sizeof(struct in6_addr) :
  1222									 sizeof(struct in_addr));
  1223		hlist_add_head_rcu(&key->node, &md5sig->head);
  1224		return 0;
  1225	}
  1226	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH 4/6] net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destruction
From: kernel test robot @ 2022-08-14 15:49 UTC (permalink / raw)
  To: Dmitry Safonov, linux-kernel
  Cc: kbuild-all, Dmitry Safonov, Andy Lutomirski, Ard Biesheuvel,
	David Ahern, Eric Biggers, Eric Dumazet, Francesco Ruggeri,
	Herbert Xu, Hideaki YOSHIFUJI, Jakub Kicinski, Leonard Crestez,
	Paolo Abeni, Salam Noureddine, netdev, linux-crypto
In-Reply-To: <20220726201600.1715505-5-dima@arista.com>

Hi Dmitry,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 058affafc65a74cf54499fb578b66ad0b18f939b]

url:    https://github.com/intel-lab-lkp/linux/commits/Dmitry-Safonov/net-crypto-Introduce-crypto_pool/20220727-041830
base:   058affafc65a74cf54499fb578b66ad0b18f939b
config: i386-randconfig-a005 (https://download.01.org/0day-ci/archive/20220814/202208142357.bDGLpecB-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/a4ee3ecdaada036ed6747ed86eaf7270d3f27bab
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dmitry-Safonov/net-crypto-Introduce-crypto_pool/20220727-041830
        git checkout a4ee3ecdaada036ed6747ed86eaf7270d3f27bab
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash net/ipv4/

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/ipv4/tcp_ipv4.c:1174:5: warning: no previous prototype for '__tcp_md5_do_add' [-Wmissing-prototypes]
    1174 | int __tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
         |     ^~~~~~~~~~~~~~~~


vim +/__tcp_md5_do_add +1174 net/ipv4/tcp_ipv4.c

  1172	
  1173	/* This can be called on a newly created socket, from other files */
> 1174	int __tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
  1175			     int family, u8 prefixlen, int l3index, u8 flags,
  1176			     const u8 *newkey, u8 newkeylen, gfp_t gfp)
  1177	{
  1178		/* Add Key to the list */
  1179		struct tcp_md5sig_key *key;
  1180		struct tcp_sock *tp = tcp_sk(sk);
  1181		struct tcp_md5sig_info *md5sig;
  1182	
  1183		key = tcp_md5_do_lookup_exact(sk, addr, family, prefixlen, l3index, flags);
  1184		if (key) {
  1185			/* Pre-existing entry - just update that one.
  1186			 * Note that the key might be used concurrently.
  1187			 * data_race() is telling kcsan that we do not care of
  1188			 * key mismatches, since changing MD5 key on live flows
  1189			 * can lead to packet drops.
  1190			 */
  1191			data_race(memcpy(key->key, newkey, newkeylen));
  1192	
  1193			/* Pairs with READ_ONCE() in tcp_md5_hash_key().
  1194			 * Also note that a reader could catch new key->keylen value
  1195			 * but old key->key[], this is the reason we use __GFP_ZERO
  1196			 * at sock_kmalloc() time below these lines.
  1197			 */
  1198			WRITE_ONCE(key->keylen, newkeylen);
  1199	
  1200			return 0;
  1201		}
  1202	
  1203		md5sig = rcu_dereference_protected(tp->md5sig_info,
  1204						   lockdep_sock_is_held(sk));
  1205	
  1206		key = sock_kmalloc(sk, sizeof(*key), gfp | __GFP_ZERO);
  1207		if (!key)
  1208			return -ENOMEM;
  1209		if (!tcp_alloc_md5sig_pool()) {
  1210			sock_kfree_s(sk, key, sizeof(*key));
  1211			return -ENOMEM;
  1212		}
  1213	
  1214		memcpy(key->key, newkey, newkeylen);
  1215		key->keylen = newkeylen;
  1216		key->family = family;
  1217		key->prefixlen = prefixlen;
  1218		key->l3index = l3index;
  1219		key->flags = flags;
  1220		memcpy(&key->addr, addr,
  1221		       (IS_ENABLED(CONFIG_IPV6) && family == AF_INET6) ? sizeof(struct in6_addr) :
  1222									 sizeof(struct in_addr));
  1223		hlist_add_head_rcu(&key->node, &md5sig->head);
  1224		return 0;
  1225	}
  1226	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH v4 0/4] Introduce security_create_user_ns()
From: Serge E. Hallyn @ 2022-08-14 15:55 UTC (permalink / raw)
  To: Paul Moore
  Cc: Eric W. Biederman, Frederick Lawler, kpsingh, revest, jackmanb,
	ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	jmorris, serge, stephen.smalley.work, eparis, shuah, brauner,
	casey, bpf, linux-security-module, selinux, linux-kselftest,
	linux-kernel, netdev, kernel-team, cgzones, karl
In-Reply-To: <CAHC9VhS3udhEecVYVvHm=tuqiPGh034-xPqXYtFjBk23+p-Szg@mail.gmail.com>

On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
> On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Paul Moore <paul@paul-moore.com> writes:
> > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > >> Frederick Lawler <fred@cloudflare.com> writes:
> > >>
> > >> > While creating a LSM BPF MAC policy to block user namespace creation, we
> > >> > used the LSM cred_prepare hook because that is the closest hook to prevent
> > >> > a call to create_user_ns().
> > >>
> > >> Re-nack for all of the same reasons.
> > >> AKA This can only break the users of the user namespace.
> > >>
> > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > >>
> > >> You aren't fixing your problem; you are papering over it by denying
> > >> access to the user namespace.
> > >>
> > >> Nack Nack Nack.
> > >>
> > >> Stop.
> > >>
> > >> Go back to the drawing board.
> > >>
> > >> Do not pass go.
> > >>
> > >> Do not collect $200.
> > >
> > > If you want us to take your comments seriously Eric, you need to
> > > provide the list with some constructive feedback that would allow
> > > Frederick to move forward with a solution to the use case that has
> > > been proposed.  Your response above may be many things, but it is
> > > certainly not that.
> >
> > I did provide constructive feedback.  My feedback to his problem
> > was to address the real problem of bugs in the kernel.
> 
> We've heard from several people who have use cases which require
> adding LSM-level access controls and observability to user namespace
> creation.  This is the problem we are trying to solve here; if you do
> not like the approach proposed in this patchset please suggest another
> implementation that allows LSMs visibility into user namespace
> creation.

Regarding the observability - can someone concisely lay out why just
auditing userns creation would not suffice?  Userspace could decide
what to report based on whether the creating user_ns == /proc/1/ns/user...

Regarding limiting the tweaking of otherwise-privileged code by
unprivileged users, I wonder whether we could instead add smarts to
ns_capable().  Point being, uid mapping would still work, but we'd
break the "privileged against resources you own" part of user
namespaces.  I would want it to default to allow, but then when a
0-day is found which requires reaching ns_capable() code, admins
could easily prevent exploitation until they can reboot into a fixed
kernel.


* Re: [patch iproute2-next] devlink: expose nested devlink for a line card object
From: patchwork-bot+netdevbpf @ 2022-08-14 17:40 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, sthemmin, dsahern, mlxsw, idosch
In-Reply-To: <20220809131730.2677759-1-jiri@resnulli.us>

Hello:

This patch was applied to iproute2/iproute2-next.git (main)
by David Ahern <dsahern@kernel.org>:

On Tue,  9 Aug 2022 15:17:30 +0200 you wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> If line card object contains a nested devlink, expose it.
> 
> Example:
> 
> $ devlink lc show pci/0000:01:00.0 lc 1
> pci/0000:01:00.0:
>   lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
>     supported_types:
>       16x100G
> $ devlink dev show auxiliary/mlxsw_core.lc.0
> auxiliary/mlxsw_core.lc.0
> 
> [...]

Here is the summary with links:
  - [iproute2-next] devlink: expose nested devlink for a line card object
    https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=700a8991f05e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [GIT PULL] virtio: features, fixes
From: Andres Freund @ 2022-08-14 19:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, Linus Torvalds, kvm, virtualization, netdev,
	linux-kernel, alvaro.karsz, colin.i.king, colin.king,
	dan.carpenter, david, elic, eperezma, gautam.dawar, gshan,
	hdegoede, hulkci, jasowang, jiaming, kangjie.xu, lingshan.zhu,
	liubo03, michael.christie, pankaj.gupta, peng.fan, quic_mingxue,
	robin.murphy, sgarzare, suwan.kim027, syoshida, xieyongji,
	xuqiang36
In-Reply-To: <20220814045853-mutt-send-email-mst@kernel.org>

Hi,

On 2022-08-14 04:59:48 -0400, Michael S. Tsirkin wrote:
> On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> > Hi,
> > 
> > On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > > Is there specific information you'd like from the VM? I just recreated the
> > > problem and can extract.
> > 
> > Actually, after reproducing, I now seem to be hitting what is likely a
> > different issue. I guess I should have checked exactly the revision I had
> > a problem with earlier, rather than doing a git pull (up to aea23e7c464b).
> 
> Looks like there's a generic memory corruption so it crashes
> in random places.

Either a generic memory corruption, or something wrong with IO.

> Would bisect be possible for you?

I'll give it a go.

Greetings,

Andres Freund


* [PATCH] af_unix: Add ioctl(SIOCUNIXGRABFDS) to grab files of receive queue skbs
From: Kirill Tkhai @ 2022-08-14 20:53 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Kirill Tkhai

When an fd holds a reference to some critical resource, say, a mount,
it is impossible to unmount that mount and disconnect the related block
device. That fd may be sitting in an skb in some unix socket's receive
queue.

Although we have an interface for detecting such socket queues
(/proc/[PID]/fdinfo/[fd] shows a non-zero scm_fds count in that case),
and it is possible to kill the process to release the reference, the
problem is that there may be several such processes, and killing each of
them is not a good option.

This patch adds a simple interface to grab files from the receive queue,
so the caller may analyze them, even recursively if a grabbed file is
itself a unix socket. Used together with pidfd_getfd(), this ioctl()
solves the problem described above.

Note that the existing recvmsg(,,MSG_PEEK) is not suitable for this
purpose, since it modifies the peek offset of the socket, which causes
problems if the examined process uses the peek offset itself. Freezing
the task with ptrace and additionally adjusting SO_PEEK_OFF via
setsockopt() won't help either, since the socket may be shared by several
tasks, and there is no reliable, race-free way to detect that. Also, if
the caller of such a trick dies, the examined task remains frozen
forever. The suggested ioctl(SIOCUNIXGRABFDS) does not have these
problems.

The implementation of ioctl(SIOCUNIXGRABFDS) is pretty simple. The only
interesting part is the protocol with userspace. First, we tell userspace
the total number of files in the receive queue skbs (out.nr_all). Then we
receive fds one by one, starting from the requested offset (in.nr_skip).
If at least one fd was received successfully, we return the number of
received fds; this number may be smaller than requested in case of an
error or if fewer fds are available. Userspace may detect these situations
by comparing the returned value with the smaller of in.nr_grab and
out.nr_all minus in.nr_skip. Of the variants I considered, this one looks
best to me (I also considered returning an error when an error occurs
even though some fd has already been received, and returning the number
of received files as an additional member of struct unix_ioc_grab_fds).

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
 include/uapi/linux/un.h |   12 ++++++++
 net/unix/af_unix.c      |   70 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/include/uapi/linux/un.h b/include/uapi/linux/un.h
index 0ad59dc8b686..995b358263dd 100644
--- a/include/uapi/linux/un.h
+++ b/include/uapi/linux/un.h
@@ -11,6 +11,18 @@ struct sockaddr_un {
 	char sun_path[UNIX_PATH_MAX];	/* pathname */
 };
 
+struct unix_ioc_grab_fds {
+	struct {
+		int nr_grab;
+		int nr_skip;
+		int *fds;
+	} in;
+	struct {
+		int nr_all;
+	} out;
+};
+
 #define SIOCUNIXFILE (SIOCPROTOPRIVATE + 0) /* open a socket file with O_PATH */
+#define SIOCUNIXGRABFDS (SIOCPROTOPRIVATE + 1) /* grab files from recv queue */
 
 #endif /* _LINUX_UN_H */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index bf338b782fc4..3c7e8049eba1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3079,6 +3079,73 @@ static int unix_open_file(struct sock *sk)
 	return fd;
 }
 
+static int unix_ioc_grab_fds(struct sock *sk, struct unix_ioc_grab_fds __user *uarg)
+{
+	int i, todo, skip, count, all, err, done = 0;
+	struct unix_sock *u = unix_sk(sk);
+	struct unix_ioc_grab_fds arg;
+	struct sk_buff *skb = NULL;
+	struct scm_fp_list *fp;
+
+	if (copy_from_user(&arg, uarg, sizeof(arg)))
+		return -EFAULT;
+
+	skip = arg.in.nr_skip;
+	todo = arg.in.nr_grab;
+
+	if (skip < 0 || todo <= 0)
+		return -EINVAL;
+	if (mutex_lock_interruptible(&u->iolock))
+		return -EINTR;
+
+	all = atomic_read(&u->scm_stat.nr_fds);
+	err = -EFAULT;
+	/* Set uarg->out.nr_all before the first file is received. */
+	if (put_user(all, &uarg->out.nr_all))
+		goto unlock;
+	err = 0;
+	if (all <= skip)
+		goto unlock;
+	if (all - skip < todo)
+		todo = all - skip;
+	while (todo) {
+		spin_lock(&sk->sk_receive_queue.lock);
+		if (!skb)
+			skb = skb_peek(&sk->sk_receive_queue);
+		else
+			skb = skb_peek_next(skb, &sk->sk_receive_queue);
+		spin_unlock(&sk->sk_receive_queue.lock);
+
+		if (!skb)
+			goto unlock;
+
+		fp = UNIXCB(skb).fp;
+		count = fp->count;
+		if (skip >= count) {
+			skip -= count;
+			continue;
+		}
+
+		for (i = skip; i < count && todo; i++) {
+			err = receive_fd_user(fp->fp[i], &arg.in.fds[done], 0);
+			if (err < 0)
+				goto unlock;
+			done++;
+			todo--;
+		}
+		skip = 0;
+	}
+unlock:
+	mutex_unlock(&u->iolock);
+
+	/* Return number of fds (non-error) if there is a received file. */
+	if (done)
+		return done;
+	if (err < 0)
+		return err;
+	return 0;
+}
+
 static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
 	struct sock *sk = sock->sk;
@@ -3113,6 +3180,9 @@ static int unix_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 		}
 		break;
 #endif
+	case SIOCUNIXGRABFDS:
+		err = unix_ioc_grab_fds(sk, (struct unix_ioc_grab_fds __user *)arg);
+		break;
 	default:
 		err = -ENOIOCTLCMD;
 		break;




* Re: [PATCH] af_unix: Add ioctl(SIOCUNIXGRABFDS) to grab files of receive queue skbs
From: kernel test robot @ 2022-08-14 23:59 UTC (permalink / raw)
  To: Kirill Tkhai, netdev
  Cc: llvm, kbuild-all, Eric Dumazet, Paolo Abeni, Kirill Tkhai
In-Reply-To: <9293c7ee-6fb7-7142-66fe-051548ffb65c@ya.ru>

Hi Kirill,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]
[also build test ERROR on net-next/master linus/master v5.19 next-20220812]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Kirill-Tkhai/af_unix-Add-ioctl-SIOCUNIXGRABFDS-to-grab-files-of-receive-queue-skbs/20220815-045608
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 777885673122b78b2abd2f1e428730961a786ff2
config: s390-randconfig-r044-20220815 (https://download.01.org/0day-ci/archive/20220815/202208150743.t05nZxqC-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 3329cec2f79185bafd678f310fafadba2a8c76d2)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/0b4bc309fb3cdc6e470ee5c28e33f2909bfb8266
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Kirill-Tkhai/af_unix-Add-ioctl-SIOCUNIXGRABFDS-to-grab-files-of-receive-queue-skbs/20220815-045608
        git checkout 0b4bc309fb3cdc6e470ee5c28e33f2909bfb8266
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

ERROR: modpost: "devm_ioremap" [drivers/net/ethernet/altera/altera_tse.ko] undefined!
>> ERROR: modpost: "__receive_fd" [net/unix/unix.ko] undefined!

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


