Netdev List
 help / color / mirror / Atom feed
* [PATCH] net:ipvs: add rcu read lock in some parts
From: sunsuwan @ 2022-08-12  9:34 UTC (permalink / raw)
  To: horms, ja, pablo, kadlec, netdev, lvs-devel
  Cc: chenzhen126, yanan, liaichun, caowangbao, sunsuwan3

We founf a possible UAF if rmmod pe_sid or schedule,
when packages in hook and get pe or sched.

Signed-off-by: sunsuwan <sunsuwan3@huawei.com>
Signed-off-by: chenzhen <chenzhen126@huawei.com>
---
 net/netfilter/ipvs/ip_vs_core.c | 6 ++++++
 net/netfilter/ipvs/ip_vs_ctl.c  | 3 +++
 net/netfilter/ipvs/ip_vs_dh.c   | 2 ++
 3 files changed, 11 insertions(+)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 51ad557a525b..d289f184d5c1 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -235,7 +235,9 @@ ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc,
 {
 	ip_vs_conn_fill_param(svc->ipvs, svc->af, protocol, caddr, cport, vaddr,
 			      vport, p);
+	rcu_read_lock();
 	p->pe = rcu_dereference(svc->pe);
+	rcu_read_unlock();
 	if (p->pe && p->pe->fill_param)
 		return p->pe->fill_param(p, skb);
 
@@ -346,7 +348,9 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 		 * template is not available.
 		 * return *ignored=0 i.e. ICMP and NF_DROP
 		 */
+		rcu_read_lock();
 		sched = rcu_dereference(svc->scheduler);
+		rcu_read_unlock();
 		if (sched) {
 			/* read svc->sched_data after svc->scheduler */
 			smp_rmb();
@@ -521,7 +525,9 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 		return NULL;
 	}
 
+	rcu_read_lock();
 	sched = rcu_dereference(svc->scheduler);
+	rcu_read_unlock();
 	if (sched) {
 		/* read svc->sched_data after svc->scheduler */
 		smp_rmb();
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index efab2b06d373..91e568028001 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -580,6 +580,7 @@ bool ip_vs_has_real_service(struct netns_ipvs *ipvs, int af, __u16 protocol,
 	/* Check for "full" addressed entries */
 	hash = ip_vs_rs_hashkey(af, daddr, dport);
 
+	rcu_read_lock();
 	hlist_for_each_entry_rcu(dest, &ipvs->rs_table[hash], d_list) {
 		if (dest->port == dport &&
 		    dest->af == af &&
@@ -587,9 +588,11 @@ bool ip_vs_has_real_service(struct netns_ipvs *ipvs, int af, __u16 protocol,
 		    (dest->protocol == protocol || dest->vfwmark) &&
 		    IP_VS_DFWD_METHOD(dest) == IP_VS_CONN_F_MASQ) {
 			/* HIT */
+			rcu_read_unlock();
 			return true;
 		}
 	}
+	rcu_read_unlock();
 
 	return false;
 }
diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 5e6ec32aff2b..3e4b9607172b 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -219,7 +219,9 @@ ip_vs_dh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
 	IP_VS_DBG(6, "%s(): Scheduling...\n", __func__);
 
 	s = (struct ip_vs_dh_state *) svc->sched_data;
+	rcu_read_lock();
 	dest = ip_vs_dh_get(svc->af, s, &iph->daddr);
+	rcu_read_unlock();
 	if (!dest
 	    || !(dest->flags & IP_VS_DEST_F_AVAILABLE)
 	    || atomic_read(&dest->weight) <= 0
-- 
2.30.0


^ permalink raw reply related

* Re: [PATCH] rds: add missing barrier to release_refill
From: patchwork-bot+netdevbpf @ 2022-08-12  9:50 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: santosh.shilimkar, netdev, linux-rdma, rds-devel
In-Reply-To: <alpine.LRH.2.02.2208100858130.19047@file01.intranet.prod.int.rdu2.redhat.com>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 10 Aug 2022 09:00:42 -0400 (EDT) you wrote:
> The functions clear_bit and set_bit do not imply a memory barrier, thus it
> may be possible that the waitqueue_active function (which does not take
> any locks) is moved before clear_bit and it could miss a wakeup event.
> 
> Fix this bug by adding a memory barrier after clear_bit.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org

Here is the summary with links:
  - rds: add missing barrier to release_refill
    https://git.kernel.org/netdev/net/c/9f414eb409da

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2] wifi: cfg80211: Fix UAF in ieee80211_scan_rx()
From: Siddh Raman Pant @ 2022-08-12  9:51 UTC (permalink / raw)
  To: Johannes Berg, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: linux-wireless, netdev, linux-kernel-mentees,
	syzbot+f9acff9bf08a845f225d, syzbot+6cb476b7c69916a0caca,
	syzbot+9250865a55539d384347, linux-kernel
In-Reply-To: <20220726123921.29664-1-code@siddh.me>

On Tue, 26 Jul 2022 18:09:21 +0530  Siddh Raman Pant  wrote:
> ieee80211_scan_rx() tries to access scan_req->flags after a null check
> (see line 303 of mac80211/scan.c), but ___cfg80211_scan_done() uses
> kfree() on the scan_req (see line 991 of wireless/scan.c).
> 
> This results in a UAF.
> 
> ieee80211_scan_rx() is called inside a RCU read-critical section
> initiated by ieee80211_rx_napi() (see line 5044 of mac80211/rx.c).
> 
> Thus, add an rcu_head to the scan_req struct, so that we can use
> kfree_rcu() instead of kfree() and thus not free during the critical
> section.
> 
> We can clear the pointer before freeing here, since scan_req is
> accessed using rcu_dereference().
> 
> Bug report (3): https://syzkaller.appspot.com/bug?extid=f9acff9bf08a845f225d
> Reported-by: syzbot+f9acff9bf08a845f225d@syzkaller.appspotmail.com
> Reported-by: syzbot+6cb476b7c69916a0caca@syzkaller.appspotmail.com
> Reported-by: syzbot+9250865a55539d384347@syzkaller.appspotmail.com
> 
> Signed-off-by: Siddh Raman Pant code@siddh.me>
> ---
> Changes since v1 as requested:
> - Fixed commit heading and better commit message.
> - Clear pointer before freeing.
> 
>  include/net/cfg80211.h | 2 ++
>  net/wireless/scan.c    | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> index 80f41446b1f0..7e0b448c4cdb 100644
> --- a/include/net/cfg80211.h
> +++ b/include/net/cfg80211.h
> @@ -2368,6 +2368,7 @@ struct cfg80211_scan_6ghz_params {
>   * @n_6ghz_params: number of 6 GHz params
>   * @scan_6ghz_params: 6 GHz params
>   * @bssid: BSSID to scan for (most commonly, the wildcard BSSID)
> + * @rcu_head: (internal) RCU head to use for freeing
>   */
>  struct cfg80211_scan_request {
>  	struct cfg80211_ssid *ssids;
> @@ -2397,6 +2398,7 @@ struct cfg80211_scan_request {
>  	bool scan_6ghz;
>  	u32 n_6ghz_params;
>  	struct cfg80211_scan_6ghz_params *scan_6ghz_params;
> +	struct rcu_head rcu_head;
>  
>  	/* keep last */
>  	struct ieee80211_channel *channels[];
> diff --git a/net/wireless/scan.c b/net/wireless/scan.c
> index 6d82bd9eaf8c..6cf58fe6dea0 100644
> --- a/net/wireless/scan.c
> +++ b/net/wireless/scan.c
> @@ -988,8 +988,8 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev,
>  	kfree(rdev->int_scan_req);
>  	rdev->int_scan_req = NULL;
>  
> -	kfree(rdev->scan_req);
>  	rdev->scan_req = NULL;
> +	kfree_rcu(rdev_req, rcu_head);
>  
>  	if (!send_message)
>  		rdev->scan_msg = msg;
> -- 
> 2.35.1
> 

Hello,

Probably the above quoted patch was missed, which can be found on
https://lore.kernel.org/linux-wireless/20220726123921.29664-1-code@siddh.me/

This patch was posted more than 2 weeks ago, with changes as requested.

With the merge window almost ending, may I request for another look at
this patch?

Thanks,
Siddh

^ permalink raw reply

* Re: [PATCH net] l2tp: Serialize access to sk_user_data with sock lock
From: Jakub Sitnicki @ 2022-08-12  9:54 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, kernel-team, van fantasy
In-Reply-To: <20220811102310.3577136d@kernel.org>

On Thu, Aug 11, 2022 at 10:23 AM -07, Jakub Kicinski wrote:
> On Wed, 10 Aug 2022 12:28:48 +0200 Jakub Sitnicki wrote:
>> Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
>
> That tag immediately sets off red flags. Please find the commit where
> to code originates, not where it was last moved.

The code move happened in v2.6.35. There's no point in digging further, IMHO.

>
>> Reported-by: van fantasy <g1042620637@gmail.com>
>> Tested-by: van fantasy <g1042620637@gmail.com>
>
> Can we get real names? Otherwise let's just drop those tags.
> I know that the legal name requirement is only for S-o-b tags,
> technically, but it feels silly.

I don't make the rules. There is already a precendent in the git log:

commit 5c835bb142d4013c2ab24bff5ae9f6709a39cbcf
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Jul 8 16:36:09 2022 -0700

    mptcp: fix subflow traversal at disconnect time

    At disconnect time the MPTCP protocol traverse the subflows
    list closing each of them. In some circumstances - MPJ subflow,
    passive MPTCP socket, the latter operation can remove the
    subflow from the list, invalidating the current iterator.

    Address the issue using the safe list traversing helper
    variant.

    Reported-by: van fantasy <g1042620637@gmail.com>
    Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
    Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net] iavf: Fix deadlock in initialization
From: Jankowski, Konrad0 @ 2022-08-12  9:59 UTC (permalink / raw)
  To: ivecera, netdev@vger.kernel.org
  Cc: moderated list:INTEL ETHERNET DRIVERS, Brandeburg, Jesse,
	open list, Stefan Assmann, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David S. Miller
In-Reply-To: <20220808175845.484968-1-ivecera@redhat.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Ivan Vecera
> Sent: Monday, August 8, 2022 7:59 PM
> To: netdev@vger.kernel.org
> Cc: moderated list:INTEL ETHERNET DRIVERS <intel-wired-
> lan@lists.osuosl.org>; Brandeburg, Jesse <jesse.brandeburg@intel.com>;
> open list <linux-kernel@vger.kernel.org>; Stefan Assmann
> <sassmann@kpanic.de>; Eric Dumazet <edumazet@google.com>; Jakub
> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S.
> Miller <davem@davemloft.net>
> Subject: [Intel-wired-lan] [PATCH net] iavf: Fix deadlock in initialization
> 
> Fix deadlock that occurs when iavf interface is a part of failover configuration.
> 
> 1. Mutex crit_lock is taken at the beginning of iavf_watchdog_task() 2.
> Function iavf_init_config_adapter() is called when adapter
>    state is __IAVF_INIT_CONFIG_ADAPTER
> 3. iavf_init_config_adapter() calls register_netdevice() that emits
>    NETDEV_REGISTER event
> 4. Notifier function failover_event() then calls
>    net_failover_slave_register() that calls dev_open() 5. dev_open() calls
> iavf_open() that tries to take crit_lock in
>    end-less loop
> 
> Stack trace:
> ...
> [  790.251876]  usleep_range_state+0x5b/0x80 [  790.252547]
> iavf_open+0x37/0x1d0 [iavf] [  790.253139]  __dev_open+0xcd/0x160 [
> 790.253699]  dev_open+0x47/0x90 [  790.254323]
> net_failover_slave_register+0x122/0x220 [net_failover] [  790.255213]
> failover_slave_register.part.7+0xd2/0x180 [failover] [  790.256050]
> failover_event+0x122/0x1ab [failover] [  790.256821]
> notifier_call_chain+0x47/0x70 [  790.257510]
> register_netdevice+0x20f/0x550 [  790.258263]
> iavf_watchdog_task+0x7c8/0xea0 [iavf] [  790.259009]
> process_one_work+0x1a7/0x360 [  790.259705]  worker_thread+0x30/0x390
> 
> To fix the situation we should check the current adapter state after first
> unsuccessful mutex_trylock() and return with -EBUSY if it is
> __IAVF_INIT_CONFIG_ADAPTER.
> 
> Fixes: 226d528512cf ("iavf: fix locking of critical sections")
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_main.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c
> b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 45d097a164ad..f9dcaadc7ea0 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -4085,8 +4085,17 @@ static int iavf_open(struct net_device *netdev)

Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>

^ permalink raw reply

* Re: [PATCH] net: cxgb3: Fix comment typo
From: patchwork-bot+netdevbpf @ 2022-08-12 10:30 UTC (permalink / raw)
  To: Jason Wang; +Cc: edumazet, rajur, davem, kuba, pabeni, netdev, linux-kernel
In-Reply-To: <20220811115701.4578-1-wangborong@cdjrlc.com>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 11 Aug 2022 19:57:01 +0800 you wrote:
> The double `the' is duplicated in the comment, remove one.
> 
> Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
> ---
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Here is the summary with links:
  - net: cxgb3: Fix comment typo
    https://git.kernel.org/netdev/net/c/75d8620d46f0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] bnx2x: Fix comment typo
From: patchwork-bot+netdevbpf @ 2022-08-12 10:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: edumazet, aelior, skalluru, manishc, davem, kuba, pabeni, netdev,
	linux-kernel
In-Reply-To: <20220811115620.3596-1-wangborong@cdjrlc.com>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 11 Aug 2022 19:56:20 +0800 you wrote:
> The double `the' is duplicated in the comment, remove one.
> 
> Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Here is the summary with links:
  - bnx2x: Fix comment typo
    https://git.kernel.org/netdev/net/c/0619d0fa6ced

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] skfp/h: fix repeated words in comments
From: patchwork-bot+netdevbpf @ 2022-08-12 10:30 UTC (permalink / raw)
  To: Jilin Yuan; +Cc: davem, edumazet, kuba, pabeni, netdev, linux-kernel
In-Reply-To: <20220810135901.17400-1-yuanjilin@cdjrlc.com>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 10 Aug 2022 21:59:01 +0800 you wrote:
> Delete the redundant word 'the'.
> 
> Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com>
> ---
>  drivers/net/fddi/skfp/h/hwmtm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Here is the summary with links:
  - skfp/h: fix repeated words in comments
    https://git.kernel.org/netdev/net/c/86d2155e48f6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] net/sunrpc: fix potential memory leaks in rpc_sysfs_xprt_state_change()
From: patchwork-bot+netdevbpf @ 2022-08-12 10:30 UTC (permalink / raw)
  To: Xin Xiong
  Cc: trond.myklebust, anna, chuck.lever, jlayton, davem, edumazet,
	kuba, pabeni, kolga, linux-nfs, netdev, linux-kernel, yuanxzhang,
	tanxin.ctf
In-Reply-To: <20220810152909.25149-1-xiongx18@fudan.edu.cn>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 10 Aug 2022 23:29:13 +0800 you wrote:
> The issue happens on some error handling paths. When the function
> fails to grab the object `xprt`, it simply returns 0, forgetting to
> decrease the reference count of another object `xps`, which is
> increased by rpc_sysfs_xprt_kobj_get_xprt_switch(), causing refcount
> leaks. Also, the function forgets to check whether `xps` is valid
> before using it, which may result in NULL-dereferencing issues.
> 
> [...]

Here is the summary with links:
  - net/sunrpc: fix potential memory leaks in rpc_sysfs_xprt_state_change()
    https://git.kernel.org/netdev/net/c/bfc48f1b0505

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] net: ipa: Fix comment typo
From: patchwork-bot+netdevbpf @ 2022-08-12 10:30 UTC (permalink / raw)
  To: Jason Wang; +Cc: edumazet, elder, davem, kuba, pabeni, netdev, linux-kernel
In-Reply-To: <20220811115259.64225-1-wangborong@cdjrlc.com>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 11 Aug 2022 19:52:59 +0800 you wrote:
> The double `is' is duplicated in the comment, remove one.
> 
> Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
> ---
>  drivers/net/ipa/ipa_reg.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Here is the summary with links:
  - net: ipa: Fix comment typo
    https://git.kernel.org/netdev/net/c/9221b2898a58

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] net: lan966x: fix checking for return value of platform_get_irq_byname()
From: patchwork-bot+netdevbpf @ 2022-08-12 10:40 UTC (permalink / raw)
  To: Li Qiong
  Cc: horatiu.vultur, UNGLinuxDriver, davem, edumazet, kuba, pabeni,
	netdev, linux-kernel, yuzhe, renyu, jiaming, kernel-janitors
In-Reply-To: <20220812030954.24050-1-liqiong@nfschina.com>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 12 Aug 2022 11:09:54 +0800 you wrote:
> The platform_get_irq_byname() returns non-zero IRQ number
> or negative error number. "if (irq)" always true, chang it
> to "if (irq > 0)"
> 
> Signed-off-by: Li Qiong <liqiong@nfschina.com>
> ---
>  drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Here is the summary with links:
  - net: lan966x: fix checking for return value of platform_get_irq_byname()
    https://git.kernel.org/netdev/net/c/40b4ac880e21

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [PATCH V5 0/6] ifcvf/vDPA: support query device config space through netlink
From: Zhu Lingshan @ 2022-08-12 10:44 UTC (permalink / raw)
  To: jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar,
	Zhu Lingshan

This series allows userspace to query device config space of vDPA
devices and the management devices through netlink,
to get multi-queue, feature bits and etc.

This series has introduced a new netlink attr
VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES, this should be used to query
features of vDPA  devices than the management device.

Please help review.

Thanks!
Zhu Lingshan

Changes rom V4:
(1) Read MAC, MTU, MQ conditionally (Michael)
(2) If VIRTIO_NET_F_MAC not set, don't report MAC to userspace
(3) If VIRTIO_NET_F_MTU not set, report 1500 to userspace
(4) Add comments to the new attr
VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES(Michael)
(5) Add comments for reporting the device status as LE(Michael)

Changes from V3:
(1)drop the fixes tags(Parva)
(2)better commit log for patch 1/6(Michael)
(3)assign num_queues to max_supported_vqs than max_vq_pairs(Jason)
(4)initialize virtio pci capabilities in the probe() function.

Changes from V2:
Add fixes tags(Parva)

Changes from V1:
(1) Use __virito16_to_cpu(true, xxx) for the le16 casting(Jason)
(2) Add a comment in ifcvf_get_config_size(), to explain
why we should return the minimum value of
sizeof(struct virtio_net_config) and the onboard
cap size(Jason)
(3) Introduced a new attr VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES
(4) Show the changes of iproute2 output before and after 5/6 patch(Jason)
(5) Fix cast warning in vdpa_fill_stats_rec() 

Zhu Lingshan (6):
  vDPA/ifcvf: get_config_size should return a value no greater than dev
    implementation
  vDPA/ifcvf: support userspace to query features and MQ of a management
    device
  vDPA: allow userspace to query features of a vDPA device
  vDPA: !FEATURES_OK should not block querying device config space
  vDPA: Conditionally read fields in virtio-net dev config space
  fix 'cast to restricted le16' warnings in vdpa.c

 drivers/vdpa/ifcvf/ifcvf_base.c |  13 ++-
 drivers/vdpa/ifcvf/ifcvf_base.h |   2 +
 drivers/vdpa/ifcvf/ifcvf_main.c | 142 +++++++++++++++++---------------
 drivers/vdpa/vdpa.c             |  82 ++++++++++++------
 include/uapi/linux/vdpa.h       |   3 +
 5 files changed, 149 insertions(+), 93 deletions(-)

-- 
2.31.1


^ permalink raw reply

* [PATCH V5 1/6] vDPA/ifcvf: get_config_size should return a value no greater than dev implementation
From: Zhu Lingshan @ 2022-08-12 10:44 UTC (permalink / raw)
  To: jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar,
	Zhu Lingshan
In-Reply-To: <20220812104500.163625-1-lingshan.zhu@intel.com>

Drivers must not access a BAR outside the capability length,
and for a virtio device, ifcvf driver should not report any non-standard
capability contents to the upper layers.

Function ifcvf_get_config_size() is introduced here to return a safe value
of the device config capability size.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 drivers/vdpa/ifcvf/ifcvf_base.c | 13 +++++++++++--
 drivers/vdpa/ifcvf/ifcvf_base.h |  2 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c
index 48c4dadb0c7c..85611be5ccb4 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.c
+++ b/drivers/vdpa/ifcvf/ifcvf_base.c
@@ -128,6 +128,7 @@ int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *pdev)
 			break;
 		case VIRTIO_PCI_CAP_DEVICE_CFG:
 			hw->dev_cfg = get_cap_addr(hw, &cap);
+			hw->cap_dev_config_size = le32_to_cpu(cap.length);
 			IFCVF_DBG(pdev, "hw->dev_cfg = %p\n", hw->dev_cfg);
 			break;
 		}
@@ -233,15 +234,23 @@ int ifcvf_verify_min_features(struct ifcvf_hw *hw, u64 features)
 u32 ifcvf_get_config_size(struct ifcvf_hw *hw)
 {
 	struct ifcvf_adapter *adapter;
+	u32 net_config_size = sizeof(struct virtio_net_config);
+	u32 blk_config_size = sizeof(struct virtio_blk_config);
+	u32 cap_size = hw->cap_dev_config_size;
 	u32 config_size;
 
 	adapter = vf_to_adapter(hw);
+	/* If the onboard device config space size is greater than
+	 * the size of struct virtio_net/blk_config, only the spec
+	 * implementing contents size is returned, this is very
+	 * unlikely, defensive programming.
+	 */
 	switch (hw->dev_type) {
 	case VIRTIO_ID_NET:
-		config_size = sizeof(struct virtio_net_config);
+		config_size = min(cap_size, net_config_size);
 		break;
 	case VIRTIO_ID_BLOCK:
-		config_size = sizeof(struct virtio_blk_config);
+		config_size = min(cap_size, blk_config_size);
 		break;
 	default:
 		config_size = 0;
diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index 115b61f4924b..f5563f665cc6 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -87,6 +87,8 @@ struct ifcvf_hw {
 	int config_irq;
 	int vqs_reused_irq;
 	u16 nr_vring;
+	/* VIRTIO_PCI_CAP_DEVICE_CFG size */
+	u32 cap_dev_config_size;
 };
 
 struct ifcvf_adapter {
-- 
2.31.1


^ permalink raw reply related

* [PATCH V5 2/6] vDPA/ifcvf: support userspace to query features and MQ of a management device
From: Zhu Lingshan @ 2022-08-12 10:44 UTC (permalink / raw)
  To: jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar,
	Zhu Lingshan
In-Reply-To: <20220812104500.163625-1-lingshan.zhu@intel.com>

Adapting to current netlink interfaces, this commit allows userspace
to query feature bits and MQ capability of a management device.

Currently both the vDPA device and the management device are the VF itself,
thus this ifcvf should initialize the virtio capabilities in probe() before
setting up the struct vdpa_mgmt_dev.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 drivers/vdpa/ifcvf/ifcvf_main.c | 142 +++++++++++++++++---------------
 1 file changed, 76 insertions(+), 66 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 0a5670729412..3fd0267873f8 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -752,59 +752,36 @@ static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 {
 	struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
 	struct ifcvf_adapter *adapter;
+	struct vdpa_device *vdpa_dev;
 	struct pci_dev *pdev;
 	struct ifcvf_hw *vf;
-	struct device *dev;
-	int ret, i;
+	int ret;
 
 	ifcvf_mgmt_dev = container_of(mdev, struct ifcvf_vdpa_mgmt_dev, mdev);
-	if (ifcvf_mgmt_dev->adapter)
+	if (!ifcvf_mgmt_dev->adapter)
 		return -EOPNOTSUPP;
 
-	pdev = ifcvf_mgmt_dev->pdev;
-	dev = &pdev->dev;
-	adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
-				    dev, &ifc_vdpa_ops, 1, 1, name, false);
-	if (IS_ERR(adapter)) {
-		IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
-		return PTR_ERR(adapter);
-	}
-
-	ifcvf_mgmt_dev->adapter = adapter;
-
+	adapter = ifcvf_mgmt_dev->adapter;
 	vf = &adapter->vf;
-	vf->dev_type = get_dev_type(pdev);
-	vf->base = pcim_iomap_table(pdev);
+	pdev = adapter->pdev;
+	vdpa_dev = &adapter->vdpa;
 
-	adapter->pdev = pdev;
-	adapter->vdpa.dma_dev = &pdev->dev;
-
-	ret = ifcvf_init_hw(vf, pdev);
-	if (ret) {
-		IFCVF_ERR(pdev, "Failed to init IFCVF hw\n");
-		goto err;
-	}
-
-	for (i = 0; i < vf->nr_vring; i++)
-		vf->vring[i].irq = -EINVAL;
-
-	vf->hw_features = ifcvf_get_hw_features(vf);
-	vf->config_size = ifcvf_get_config_size(vf);
+	if (name)
+		ret = dev_set_name(&vdpa_dev->dev, "%s", name);
+	else
+		ret = dev_set_name(&vdpa_dev->dev, "vdpa%u", vdpa_dev->index);
 
-	adapter->vdpa.mdev = &ifcvf_mgmt_dev->mdev;
 	ret = _vdpa_register_device(&adapter->vdpa, vf->nr_vring);
 	if (ret) {
+		put_device(&adapter->vdpa.dev);
 		IFCVF_ERR(pdev, "Failed to register to vDPA bus");
-		goto err;
+		return ret;
 	}
 
 	return 0;
-
-err:
-	put_device(&adapter->vdpa.dev);
-	return ret;
 }
 
+
 static void ifcvf_vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
 {
 	struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
@@ -823,61 +800,94 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
 	struct device *dev = &pdev->dev;
+	struct ifcvf_adapter *adapter;
+	struct ifcvf_hw *vf;
 	u32 dev_type;
-	int ret;
-
-	ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL);
-	if (!ifcvf_mgmt_dev) {
-		IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n");
-		return -ENOMEM;
-	}
-
-	dev_type = get_dev_type(pdev);
-	switch (dev_type) {
-	case VIRTIO_ID_NET:
-		ifcvf_mgmt_dev->mdev.id_table = id_table_net;
-		break;
-	case VIRTIO_ID_BLOCK:
-		ifcvf_mgmt_dev->mdev.id_table = id_table_blk;
-		break;
-	default:
-		IFCVF_ERR(pdev, "VIRTIO ID %u not supported\n", dev_type);
-		ret = -EOPNOTSUPP;
-		goto err;
-	}
-
-	ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops;
-	ifcvf_mgmt_dev->mdev.device = dev;
-	ifcvf_mgmt_dev->pdev = pdev;
+	int ret, i;
 
 	ret = pcim_enable_device(pdev);
 	if (ret) {
 		IFCVF_ERR(pdev, "Failed to enable device\n");
-		goto err;
+		return ret;
 	}
-
 	ret = pcim_iomap_regions(pdev, BIT(0) | BIT(2) | BIT(4),
 				 IFCVF_DRIVER_NAME);
 	if (ret) {
 		IFCVF_ERR(pdev, "Failed to request MMIO region\n");
-		goto err;
+		return ret;
 	}
 
 	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
 	if (ret) {
 		IFCVF_ERR(pdev, "No usable DMA configuration\n");
-		goto err;
+		return ret;
 	}
 
 	ret = devm_add_action_or_reset(dev, ifcvf_free_irq_vectors, pdev);
 	if (ret) {
 		IFCVF_ERR(pdev,
 			  "Failed for adding devres for freeing irq vectors\n");
-		goto err;
+		return ret;
 	}
 
 	pci_set_master(pdev);
 
+	adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
+				    dev, &ifc_vdpa_ops, 1, 1, NULL, false);
+	if (IS_ERR(adapter)) {
+		IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
+		return PTR_ERR(adapter);
+	}
+
+	vf = &adapter->vf;
+	vf->dev_type = get_dev_type(pdev);
+	vf->base = pcim_iomap_table(pdev);
+
+	adapter->pdev = pdev;
+	adapter->vdpa.dma_dev = &pdev->dev;
+
+	ret = ifcvf_init_hw(vf, pdev);
+	if (ret) {
+		IFCVF_ERR(pdev, "Failed to init IFCVF hw\n");
+		return ret;
+	}
+
+	for (i = 0; i < vf->nr_vring; i++)
+		vf->vring[i].irq = -EINVAL;
+
+	vf->hw_features = ifcvf_get_hw_features(vf);
+	vf->config_size = ifcvf_get_config_size(vf);
+
+	ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL);
+	if (!ifcvf_mgmt_dev) {
+		IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n");
+		return -ENOMEM;
+	}
+
+	ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops;
+	ifcvf_mgmt_dev->mdev.device = dev;
+	ifcvf_mgmt_dev->adapter = adapter;
+
+	dev_type = get_dev_type(pdev);
+	switch (dev_type) {
+	case VIRTIO_ID_NET:
+		ifcvf_mgmt_dev->mdev.id_table = id_table_net;
+		break;
+	case VIRTIO_ID_BLOCK:
+		ifcvf_mgmt_dev->mdev.id_table = id_table_blk;
+		break;
+	default:
+		IFCVF_ERR(pdev, "VIRTIO ID %u not supported\n", dev_type);
+		ret = -EOPNOTSUPP;
+		goto err;
+	}
+
+	ifcvf_mgmt_dev->mdev.max_supported_vqs = vf->nr_vring;
+	ifcvf_mgmt_dev->mdev.supported_features = vf->hw_features;
+
+	adapter->vdpa.mdev = &ifcvf_mgmt_dev->mdev;
+
+
 	ret = vdpa_mgmtdev_register(&ifcvf_mgmt_dev->mdev);
 	if (ret) {
 		IFCVF_ERR(pdev,
-- 
2.31.1


^ permalink raw reply related

* [PATCH V5 3/6] vDPA: allow userspace to query features of a vDPA device
From: Zhu Lingshan @ 2022-08-12 10:44 UTC (permalink / raw)
  To: jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar,
	Zhu Lingshan
In-Reply-To: <20220812104500.163625-1-lingshan.zhu@intel.com>

This commit adds a new vDPA netlink attribution
VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
features of vDPA devices through this new attr.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 drivers/vdpa/vdpa.c       | 17 +++++++++++++----
 include/uapi/linux/vdpa.h |  3 +++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index ebf2f363fbe7..6eb3d972d802 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -491,6 +491,8 @@ static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *m
 		err = -EMSGSIZE;
 		goto msg_err;
 	}
+
+	/* report features of a vDPA management device through VDPA_ATTR_DEV_SUPPORTED_FEATURES */
 	if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_SUPPORTED_FEATURES,
 			      mdev->supported_features, VDPA_ATTR_PAD)) {
 		err = -EMSGSIZE;
@@ -815,7 +817,7 @@ static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
 static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg)
 {
 	struct virtio_net_config config = {};
-	u64 features;
+	u64 features_device, features_driver;
 	u16 val_u16;
 
 	vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
@@ -832,12 +834,19 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
 	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
 		return -EMSGSIZE;
 
-	features = vdev->config->get_driver_features(vdev);
-	if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features,
+	features_driver = vdev->config->get_driver_features(vdev);
+	if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
+			      VDPA_ATTR_PAD))
+		return -EMSGSIZE;
+
+	features_device = vdev->config->get_device_features(vdev);
+
+	/* report features of a vDPA device through VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES */
+	if (nla_put_u64_64bit(msg, VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES, features_device,
 			      VDPA_ATTR_PAD))
 		return -EMSGSIZE;
 
-	return vdpa_dev_net_mq_config_fill(vdev, msg, features, &config);
+	return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver, &config);
 }
 
 static int
diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
index 25c55cab3d7c..d171b92ef522 100644
--- a/include/uapi/linux/vdpa.h
+++ b/include/uapi/linux/vdpa.h
@@ -46,7 +46,10 @@ enum vdpa_attr {
 
 	VDPA_ATTR_DEV_NEGOTIATED_FEATURES,	/* u64 */
 	VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,		/* u32 */
+	/* features of a vDPA management device */
 	VDPA_ATTR_DEV_SUPPORTED_FEATURES,	/* u64 */
+	/* features of a vDPA device, e.g., /dev/vhost-vdpa0 */
+	VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,	/* u64 */
 
 	VDPA_ATTR_DEV_QUEUE_INDEX,              /* u32 */
 	VDPA_ATTR_DEV_VENDOR_ATTR_NAME,		/* string */
-- 
2.31.1


^ permalink raw reply related

* [PATCH V5 4/6] vDPA: !FEATURES_OK should not block querying device config space
From: Zhu Lingshan @ 2022-08-12 10:44 UTC (permalink / raw)
  To: jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar,
	Zhu Lingshan
In-Reply-To: <20220812104500.163625-1-lingshan.zhu@intel.com>

Users may want to query the config space of a vDPA device,
to choose a appropriate one for a certain guest. This means the
users need to read the config space before FEATURES_OK, and
the existence of config space contents does not depend on
FEATURES_OK.

The spec says:
The device MUST allow reading of any device-specific configuration
field before FEATURES_OK is set by the driver. This includes
fields which are conditional on feature bits, as long as those
feature bits are offered by the device.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 drivers/vdpa/vdpa.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 6eb3d972d802..bf312d9c59ab 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -855,17 +855,9 @@ vdpa_dev_config_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid,
 {
 	u32 device_id;
 	void *hdr;
-	u8 status;
 	int err;
 
 	down_read(&vdev->cf_lock);
-	status = vdev->config->get_status(vdev);
-	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
-		NL_SET_ERR_MSG_MOD(extack, "Features negotiation not completed");
-		err = -EAGAIN;
-		goto out;
-	}
-
 	hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags,
 			  VDPA_CMD_DEV_CONFIG_GET);
 	if (!hdr) {
-- 
2.31.1


^ permalink raw reply related

* [PATCH V5 5/6] vDPA: Conditionally read fields in virtio-net dev config space
From: Zhu Lingshan @ 2022-08-12 10:44 UTC (permalink / raw)
  To: jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar,
	Zhu Lingshan
In-Reply-To: <20220812104500.163625-1-lingshan.zhu@intel.com>

Some fields of virtio-net device config space are
conditional on the feature bits, the spec says:

"The mac address field always exists
(though is only valid if VIRTIO_NET_F_MAC is set)"

"max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ
or VIRTIO_NET_F_RSS is set"

"mtu only exists if VIRTIO_NET_F_MTU is set"

so we should read MTU, MAC and MQ in the device config
space only when these feature bits are offered.

For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are
not set, the virtio device should have
one queue pair as default value, so when userspace querying queue pair numbers,
it should return mq=1 than zero.

For MTU, if VIRTIO_NET_F_MTU is not set, we should not read
MTU from the device config sapce.
RFC894 <A Standard for the Transmission of IP Datagrams over Ethernet Networks>
says:"The minimum length of the data field of a packet sent over an
Ethernet is 1500 octets, thus the maximum length of an IP datagram
sent over an Ethernet is 1500 octets.  Implementations are encouraged
to support full-length packets"

virtio spec says:"The virtio network device is a virtual ethernet card",
so the default MTU value should be 1500 for virtio-net.

For MAC, the spec says:"If the VIRTIO_NET_F_MAC feature bit is set,
the configuration space mac entry indicates the “physical” address
of the network card, otherwise the driver would typically
generate a random local MAC address." So there is no
default MAC address if VIRTIO_NET_F_MAC not set.

This commits introduces functions vdpa_dev_net_mtu_config_fill()
and vdpa_dev_net_mac_config_fill() to fill MTU and MAC.
It also fixes vdpa_dev_net_mq_config_fill() to report correct
MQ when _F_MQ is not present.

These functions should check devices features than driver
features, and struct vdpa_device is not needed as a parameter

The test & userspace tool output:

Feature bit VIRTIO_NET_F_MTU, VIRTIO_NET_F_RSS, VIRTIO_NET_F_MQ
and VIRTIO_NET_F_MAC can be mask out by hardcode.

However, it is challenging to "disable" the related fields
in the HW device config space, so let's just assume the values
are meaningless if the feature bits are not set.

Before this change, when feature bits for RSS, MQ, MTU and MAC
are not set, iproute2 output:
$vdpa vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false mtu 1500
  negotiated_features

without this commit, function vdpa_dev_net_config_fill()
reads all config space fields unconditionally, so let's
assume the MAC and MTU are meaningless, and it checks
MQ with driver_features, so we don't see max_vq_pairs.

After applying this commit, when feature bits for
MQ, RSS, MAC and MTU are not set 0,iproute2 output:
$vdpa dev config show vdpa0
vdpa0: link up link_announce false max_vq_pairs 1 mtu 1500
  negotiated_features

As explained above:
Here is no MAC, because VIRTIO_NET_F_MAC is not set,
and there is no default value for MAC. It shows
max_vq_paris = 1 because even without MQ feature,
a functional virtio-net must have one queue pair.
mtu = 1500 is the default value as ethernet
required.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 drivers/vdpa/vdpa.c | 51 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index bf312d9c59ab..f6172c8dd262 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -801,19 +801,44 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, struct netlink_callba
 	return msg->len;
 }
 
-static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
-				       struct sk_buff *msg, u64 features,
+static int vdpa_dev_net_mq_config_fill(struct sk_buff *msg, u64 features,
 				       const struct virtio_net_config *config)
 {
 	u16 val_u16;
 
-	if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
-		return 0;
+	if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0 &&
+	    (features & BIT_ULL(VIRTIO_NET_F_RSS)) == 0)
+		val_u16 = 1;
+	else
+		val_u16 = __virtio16_to_cpu(true, config->max_virtqueue_pairs);
 
-	val_u16 = le16_to_cpu(config->max_virtqueue_pairs);
 	return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, val_u16);
 }
 
+static int vdpa_dev_net_mtu_config_fill(struct sk_buff *msg, u64 features,
+					const struct virtio_net_config *config)
+{
+	u16 val_u16;
+
+	if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
+		val_u16 = 1500;
+	else
+		val_u16 = __virtio16_to_cpu(true, config->mtu);
+
+	return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16);
+}
+
+static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64 features,
+					const struct virtio_net_config *config)
+{
+	if ((features & BIT_ULL(VIRTIO_NET_F_MAC)) == 0)
+		return 0;
+	else
+		return  nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
+				sizeof(config->mac), config->mac);
+}
+
+
 static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg)
 {
 	struct virtio_net_config config = {};
@@ -822,18 +847,10 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
 
 	vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
 
-	if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, sizeof(config.mac),
-		    config.mac))
-		return -EMSGSIZE;
-
 	val_u16 = le16_to_cpu(config.status);
 	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
 		return -EMSGSIZE;
 
-	val_u16 = le16_to_cpu(config.mtu);
-	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
-		return -EMSGSIZE;
-
 	features_driver = vdev->config->get_driver_features(vdev);
 	if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
 			      VDPA_ATTR_PAD))
@@ -846,7 +863,13 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
 			      VDPA_ATTR_PAD))
 		return -EMSGSIZE;
 
-	return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver, &config);
+	if (vdpa_dev_net_mac_config_fill(msg, features_device, &config))
+		return -EMSGSIZE;
+
+	if (vdpa_dev_net_mtu_config_fill(msg, features_device, &config))
+		return -EMSGSIZE;
+
+	return vdpa_dev_net_mq_config_fill(msg, features_device, &config);
 }
 
 static int
-- 
2.31.1


^ permalink raw reply related

* [PATCH V5 6/6] fix 'cast to restricted le16' warnings in vdpa.c
From: Zhu Lingshan @ 2022-08-12 10:45 UTC (permalink / raw)
  To: jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar,
	Zhu Lingshan
In-Reply-To: <20220812104500.163625-1-lingshan.zhu@intel.com>

This commit fixes spars warnings: cast to restricted __le16
in function vdpa_dev_net_config_fill() and
vdpa_fill_stats_rec()

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
---
 drivers/vdpa/vdpa.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index f6172c8dd262..0e5fa97dfcc3 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -847,7 +847,11 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
 
 	vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
 
-	val_u16 = le16_to_cpu(config.status);
+	/*
+	 * Assume little endian for now, userspace can tweak this for
+	 * legacy guest support.
+	 */
+	val_u16 = __virtio16_to_cpu(true, config.status);
 	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
 		return -EMSGSIZE;
 
@@ -937,7 +941,11 @@ static int vdpa_fill_stats_rec(struct vdpa_device *vdev, struct sk_buff *msg,
 	}
 	vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
 
-	max_vqp = le16_to_cpu(config.max_virtqueue_pairs);
+	/*
+	 * Assume little endian for now, userspace can tweak this for
+	 * legacy guest support.
+	 */
+	max_vqp = __virtio16_to_cpu(true, config.max_virtqueue_pairs);
 	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp))
 		return -EMSGSIZE;
 
-- 
2.31.1


^ permalink raw reply related

* Re: [RFCv7 PATCH net-next 02/36] net: replace general features macroes with global netdev_features variables
From: shenjian (K) @ 2022-08-12 10:58 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: davem, kuba, andrew, ecree.xilinx, hkallweit1, saeed, leon,
	netdev, linuxarm
In-Reply-To: <20220811110529.4866-1-alexandr.lobakin@intel.com>



在 2022/8/11 19:05, Alexander Lobakin 写道:
> From: "shenjian (K)" <shenjian15@huawei.com>
> Date: Wed, 10 Aug 2022 20:01:15 +0800
>
>> 在 2022/8/10 17:58, Alexander Lobakin 写道:
>> > From: Jian Shen <shenjian15@huawei.com>
>> > Date: Wed, 10 Aug 2022 11:05:50 +0800
>> >
>> >> There are many netdev_features bits group used in kernel. The 
>> definition
>> >> will be illegal when using feature bit more than 64. Replace these 
>> macroes
>> >> with global netdev_features variables, initialize them when netdev 
>> module
>> >> init.
>> >>
>> >> Signed-off-by: Jian Shen <shenjian15@huawei.com>
>> >> ---
>
> [...]
>
>> >> @@ -11362,6 +11363,86 @@ static struct pernet_operations 
>> __net_initdata default_device_ops = {
>> >>       .exit_batch = default_device_exit_batch,
>> >>   };
>> >>   >> +static void __init netdev_features_init(void)
>> > Given that you're creating a new file dedicated to netdev features,
>> > I'd place that initializer there. You can then declare its proto in
>> > net/core/dev.h.
>> I want to make sure it cann't be called outside net/core/dev.c, for some
>> drivers include net/core/dev.h, then they can see it.
>
> net/core/dev.h is internal, nobody outside net/core/ uses it and
> this was its purpose.
>
All right, will move it netdev_features.c
>>
>> >> +{
>> >> +    netdev_features_t features;
>> >> +
>> >> + netdev_features_set_array(&netif_f_ip_csum_feature_set,
>> >> +                  &netdev_ip_csum_features);
>> >> + netdev_features_set_array(&netif_f_csum_feature_set_mask,
>> >> +                  &netdev_csum_features_mask);
>> >> +
>> >> + netdev_features_set_array(&netif_f_gso_feature_set_mask,
>> >> +                  &netdev_gso_features_mask);
>> >> + netdev_features_set_array(&netif_f_general_tso_feature_set,
>> >> +                  &netdev_general_tso_features);
>> >> + netdev_features_set_array(&netif_f_all_tso_feature_set,
>> >> +                  &netdev_all_tso_features);
>> >> + netdev_features_set_array(&netif_f_tso_ecn_feature_set,
>> >> +                  &netdev_tso_ecn_features);
>> >> + netdev_features_set_array(&netif_f_all_fcoe_feature_set,
>> >> +                  &netdev_all_fcoe_features);
>> >> + netdev_features_set_array(&netif_f_gso_soft_feature_set,
>> >> +                  &netdev_gso_software_features);
>> >> + netdev_features_set_array(&netif_f_gso_encap_feature_set,
>> >> +                  &netdev_gso_encap_all_features);
>> >> +
>> >> +    netdev_csum_gso_features_mask =
>> >> +        netdev_features_or(netdev_gso_features_mask,
>> >> +                   netdev_csum_features_mask);
>> > (I forgot to mention this in 01/36 ._.)
>> >
>> > As you're converting to bitmaps, you should probably avoid direct
>> > assignments. All the bitmap_*() modification functions take a pointer
>> > to the destination as a first argument. So it should be
>> >
>> > netdev_features_or(netdev_features_t *dst, const netdev_features_t 
>> *src1,
>> >            const netdev_features_t *src1);
>> The netdev_features_t will be convert to a structure which only 
>> contained
>> a feature bitmap. So assginement is ok.
>
> Yeah I realized it later, probably a good idea.
>
>>
>>
>> >> +
>> >> + netdev_features_set_array(&netif_f_one_for_all_feature_set,
>> >> +                  &netdev_one_for_all_features);
>> > Does it make sense to prefix features and the corresponding sets
>> > differently? Why not just 'netdev_' for both of them?
>> For all the feature bits are named "NETFI_F_XXX_BIT",
>
> Right, but then why are netdev_*_features prefixed with 'netdev',
> not 'netif_f'? :D Those sets are tied tightly with the feature
> structures, so I think they should have the same prefix. I'd go
> with 'netdev' for both.
>
ok, will prefix with 'netdev'

>>
>>
>> >> + netdev_features_set_array(&netif_f_all_for_all_feature_set,
>> >> +                  &netdev_all_for_all_features);
>
> [...]
>
>> >> -- >> 2.33.0
>> > Thanks,
>> > Olek
>> >
>> > .
>>
>> Thank,
>> Jian
>
> Thanks,
> Olek
>
> .
>
Thanks,
Jian

^ permalink raw reply

* Re: [PATCH V5 0/6] ifcvf/vDPA: support query device config space through netlink
From: Michael S. Tsirkin @ 2022-08-12 11:14 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, virtualization, netdev, kvm, parav, xieyongji,
	gautam.dawar
In-Reply-To: <20220812104500.163625-1-lingshan.zhu@intel.com>

On Fri, Aug 12, 2022 at 06:44:54PM +0800, Zhu Lingshan wrote:
> This series allows userspace to query device config space of vDPA
> devices and the management devices through netlink,
> to get multi-queue, feature bits and etc.
> 
> This series has introduced a new netlink attr
> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES, this should be used to query
> features of vDPA  devices than the management device.
> 
> Please help review.

I can't merge this for this merge window.
Am I right when I say that the new thing here is patch 5/6 + new
comments?
If yes I can queue it out of the window, on top.

> Thanks!
> Zhu Lingshan
> 
> Changes rom V4:
> (1) Read MAC, MTU, MQ conditionally (Michael)
> (2) If VIRTIO_NET_F_MAC not set, don't report MAC to userspace
> (3) If VIRTIO_NET_F_MTU not set, report 1500 to userspace
> (4) Add comments to the new attr
> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES(Michael)
> (5) Add comments for reporting the device status as LE(Michael)
> 
> Changes from V3:
> (1)drop the fixes tags(Parva)
> (2)better commit log for patch 1/6(Michael)
> (3)assign num_queues to max_supported_vqs than max_vq_pairs(Jason)
> (4)initialize virtio pci capabilities in the probe() function.
> 
> Changes from V2:
> Add fixes tags(Parva)
> 
> Changes from V1:
> (1) Use __virito16_to_cpu(true, xxx) for the le16 casting(Jason)
> (2) Add a comment in ifcvf_get_config_size(), to explain
> why we should return the minimum value of
> sizeof(struct virtio_net_config) and the onboard
> cap size(Jason)
> (3) Introduced a new attr VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES
> (4) Show the changes of iproute2 output before and after 5/6 patch(Jason)
> (5) Fix cast warning in vdpa_fill_stats_rec() 
> 
> Zhu Lingshan (6):
>   vDPA/ifcvf: get_config_size should return a value no greater than dev
>     implementation
>   vDPA/ifcvf: support userspace to query features and MQ of a management
>     device
>   vDPA: allow userspace to query features of a vDPA device
>   vDPA: !FEATURES_OK should not block querying device config space
>   vDPA: Conditionally read fields in virtio-net dev config space
>   fix 'cast to restricted le16' warnings in vdpa.c
> 
>  drivers/vdpa/ifcvf/ifcvf_base.c |  13 ++-
>  drivers/vdpa/ifcvf/ifcvf_base.h |   2 +
>  drivers/vdpa/ifcvf/ifcvf_main.c | 142 +++++++++++++++++---------------
>  drivers/vdpa/vdpa.c             |  82 ++++++++++++------
>  include/uapi/linux/vdpa.h       |   3 +
>  5 files changed, 149 insertions(+), 93 deletions(-)
> 
> -- 
> 2.31.1


^ permalink raw reply

* Re: [PATCH V5 0/6] ifcvf/vDPA: support query device config space through netlink
From: Michael S. Tsirkin @ 2022-08-12 11:17 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, virtualization, netdev, kvm, parav, xieyongji,
	gautam.dawar
In-Reply-To: <20220812071251-mutt-send-email-mst@kernel.org>

On Fri, Aug 12, 2022 at 07:14:39AM -0400, Michael S. Tsirkin wrote:
> On Fri, Aug 12, 2022 at 06:44:54PM +0800, Zhu Lingshan wrote:
> > This series allows userspace to query device config space of vDPA
> > devices and the management devices through netlink,
> > to get multi-queue, feature bits and etc.
> > 
> > This series has introduced a new netlink attr
> > VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES, this should be used to query
> > features of vDPA  devices than the management device.
> > 
> > Please help review.
> 
> I can't merge this for this merge window.
> Am I right when I say that the new thing here is patch 5/6 + new
> comments?
> If yes I can queue it out of the window, on top.

So at this point, can you please send patches on top of the vhost
tree? I think these are just patches 3 and 5 but please confirm.


> > Thanks!
> > Zhu Lingshan
> > 
> > Changes rom V4:
> > (1) Read MAC, MTU, MQ conditionally (Michael)
> > (2) If VIRTIO_NET_F_MAC not set, don't report MAC to userspace
> > (3) If VIRTIO_NET_F_MTU not set, report 1500 to userspace
> > (4) Add comments to the new attr
> > VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES(Michael)
> > (5) Add comments for reporting the device status as LE(Michael)
> > 
> > Changes from V3:
> > (1)drop the fixes tags(Parva)
> > (2)better commit log for patch 1/6(Michael)
> > (3)assign num_queues to max_supported_vqs than max_vq_pairs(Jason)
> > (4)initialize virtio pci capabilities in the probe() function.
> > 
> > Changes from V2:
> > Add fixes tags(Parva)
> > 
> > Changes from V1:
> > (1) Use __virito16_to_cpu(true, xxx) for the le16 casting(Jason)
> > (2) Add a comment in ifcvf_get_config_size(), to explain
> > why we should return the minimum value of
> > sizeof(struct virtio_net_config) and the onboard
> > cap size(Jason)
> > (3) Introduced a new attr VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES
> > (4) Show the changes of iproute2 output before and after 5/6 patch(Jason)
> > (5) Fix cast warning in vdpa_fill_stats_rec() 
> > 
> > Zhu Lingshan (6):
> >   vDPA/ifcvf: get_config_size should return a value no greater than dev
> >     implementation
> >   vDPA/ifcvf: support userspace to query features and MQ of a management
> >     device
> >   vDPA: allow userspace to query features of a vDPA device
> >   vDPA: !FEATURES_OK should not block querying device config space
> >   vDPA: Conditionally read fields in virtio-net dev config space
> >   fix 'cast to restricted le16' warnings in vdpa.c
> > 
> >  drivers/vdpa/ifcvf/ifcvf_base.c |  13 ++-
> >  drivers/vdpa/ifcvf/ifcvf_base.h |   2 +
> >  drivers/vdpa/ifcvf/ifcvf_main.c | 142 +++++++++++++++++---------------
> >  drivers/vdpa/vdpa.c             |  82 ++++++++++++------
> >  include/uapi/linux/vdpa.h       |   3 +
> >  5 files changed, 149 insertions(+), 93 deletions(-)
> > 
> > -- 
> > 2.31.1


^ permalink raw reply

* Re: [PATCH net] net: phy: Warn about incorrect mdio_bus_phy_resume() state
From: Marek Szyprowski @ 2022-08-12 11:19 UTC (permalink / raw)
  To: Florian Fainelli, netdev, Steve Glendinning
  Cc: opendmb, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	open list
In-Reply-To: <20220801233403.258871-1-f.fainelli@gmail.com>

Hi All,

On 02.08.2022 01:34, Florian Fainelli wrote:
> Calling mdio_bus_phy_resume() with neither the PHY state machine set to
> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
> that we can produce a race condition looking like this:
>
> CPU0						CPU1
> bcmgenet_resume
>   -> phy_resume
>     -> phy_init_hw
>   -> phy_start
>     -> phy_resume
>                                                  phy_start_aneg()
> mdio_bus_phy_resume
>   -> phy_resume
>      -> phy_write(..., BMCR_RESET)
>       -> usleep()                                  -> phy_read()
>
> with the phy_resume() function triggering a PHY behavior that might have
> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix
> brcm_fet_config_init()") for instance) that ultimately leads to an error
> reading from the PHY.
>
> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

This patch, as probably intended, triggers a warning during system 
suspend/resume cycle in the SMSC911x driver. I've observed it on ARM 
Juno R1 board on the kernel compiled from next-202208010:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323 
mdio_bus_phy_resume+0x34/0xc8
  Modules linked in: smsc911x cpufreq_powersave cpufreq_conservative 
crct10dif_ce ip_tables x_tables ipv6 [last unloaded: smsc911x]
  CPU: 1 PID: 398 Comm: rtcwake Not tainted 5.19.0+ #940
  Hardware name: ARM Juno development board (r1) (DT)
  pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : mdio_bus_phy_resume+0x34/0xc8
  lr : dpm_run_callback+0x74/0x350
  ...
  Call trace:
   mdio_bus_phy_resume+0x34/0xc8
   dpm_run_callback+0x74/0x350
   device_resume+0xb8/0x258
   dpm_resume+0x120/0x4a8
   dpm_resume_end+0x14/0x28
   suspend_devices_and_enter+0x164/0xa60
   pm_suspend+0x25c/0x3a8
   state_store+0x84/0x108
   kobj_attr_store+0x14/0x28
   sysfs_kf_write+0x60/0x70
   kernfs_fop_write_iter+0x124/0x1a8
   new_sync_write+0xd0/0x190
   vfs_write+0x208/0x478
   ksys_write+0x64/0xf0
   __arm64_sys_write+0x14/0x20
   invoke_syscall+0x40/0xf8
   el0_svc_common.constprop.3+0x8c/0x120
   do_el0_svc+0x28/0xc8
   el0_svc+0x48/0xd0
   el0t_64_sync_handler+0x94/0xb8
   el0t_64_sync+0x15c/0x160
  irq event stamp: 24406
  hardirqs last  enabled at (24405): [<ffff8000090c4734>] 
_raw_spin_unlock_irqrestore+0x8c/0x90
  hardirqs last disabled at (24406): [<ffff8000090b3164>] el1_dbg+0x24/0x88
  softirqs last  enabled at (24144): [<ffff800008010488>] _stext+0x488/0x5cc
  softirqs last disabled at (24139): [<ffff80000809bf98>] 
irq_exit_rcu+0x168/0x1a8
  ---[ end trace 0000000000000000 ]---

I hope the above information will help fixing the driver.

> ---
>   drivers/net/phy/phy_device.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 46acddd865a7..608de5a94165 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -316,6 +316,12 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev)
>   
>   	phydev->suspended_by_mdio_bus = 0;
>   
> +	/* If we managed to get here with the PHY state machine in a state other
> +	 * than PHY_HALTED this is an indication that something went wrong and
> +	 * we should most likely be using MAC managed PM and we are not.
> +	 */
> +	WARN_ON(phydev->state != PHY_HALTED && !phydev->mac_managed_pm);
> +
>   	ret = phy_init_hw(phydev);
>   	if (ret < 0)
>   		return ret;

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply

* Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
From: Krzysztof Kozlowski @ 2022-08-12 11:25 UTC (permalink / raw)
  To: Wei Fang, andrew@lunn.ch, hkallweit1@gmail.com,
	linux@armlinux.org.uk, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, robh+dt@kernel.org,
	krzysztof.kozlowski+dt@linaro.org, f.fainelli@gmail.com,
	netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <DB9PR04MB81060AF4890DEA9E2378940288679@DB9PR04MB8106.eurprd04.prod.outlook.com>

On 12/08/2022 12:02, Wei Fang wrote:
> 
> 
>> -----Original Message-----
>> From: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>> Sent: 2022年8月12日 15:28
>> To: Wei Fang <wei.fang@nxp.com>; andrew@lunn.ch; hkallweit1@gmail.com;
>> linux@armlinux.org.uk; davem@davemloft.net; edumazet@google.com;
>> kuba@kernel.org; pabeni@redhat.com; robh+dt@kernel.org;
>> krzysztof.kozlowski+dt@linaro.org; f.fainelli@gmail.com;
>> netdev@vger.kernel.org; devicetree@vger.kernel.org;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation
>> property
>>
>> On 12/08/2022 17:50, wei.fang@nxp.com wrote:
>>> From: Wei Fang <wei.fang@nxp.com>
>>>
>>
>> Please use subject prefix matching subsystem.
>>
> Ok, I'll add the subject prefix.
> 
>>> The hibernation mode of Atheros AR803x PHYs is default enabled.
>>> When the cable is unplugged, the PHY will enter hibernation mode and
>>> the PHY clock does down. For some MACs, it needs the clock to support
>>> it's logic. For instance, stmmac needs the PHY inputs clock is present
>>> for software reset completion. Therefore, It is reasonable to add a DT
>>> property to disable hibernation mode.
>>>
>>> Signed-off-by: Wei Fang <wei.fang@nxp.com>
>>> ---
>>>  Documentation/devicetree/bindings/net/qca,ar803x.yaml | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/net/qca,ar803x.yaml
>>> b/Documentation/devicetree/bindings/net/qca,ar803x.yaml
>>> index b3d4013b7ca6..d08431d79b83 100644
>>> --- a/Documentation/devicetree/bindings/net/qca,ar803x.yaml
>>> +++ b/Documentation/devicetree/bindings/net/qca,ar803x.yaml
>>> @@ -40,6 +40,12 @@ properties:
>>>        Only supported on the AR8031.
>>>      type: boolean
>>>
>>> +  qca,disable-hibernation:
>>> +    description: |
>>> +    If set, the PHY will not enter hibernation mode when the cable is
>>> +    unplugged.
>>
>> Wrong indentation. Did you test the bindings?
>>
> Sorry, I just checked the patch and forgot to check the dt-bindings.
> 
>> Unfortunately the property describes driver behavior not hardware, so it is not
>> suitable for DT. Instead describe the hardware
>> characteristics/features/bugs/constraints. Not driver behavior. Both in property
>> name and property description.
>>
> Thanks for your review and feedback. Actually, the hibernation mode is a feature of hardware, I will modify the property name and description to be more in line with the requirements of the DT property. 

hibernation is a feature, but 'disable-hibernation' is not. DTS
describes the hardware, not policy or driver bejhvior. Why disabling
hibernation is a property of hardware? How you described, it's not,
therefore either property is not for DT or it has to be phrased
correctly to describe the hardware.

Best regards,
Krzysztof

^ permalink raw reply

* mainline build failure due to 332f1795ca20 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression")
From: Sudip Mukherjee (Codethink) @ 2022-08-12 11:25 UTC (permalink / raw)
  To: torvalds, Jakub Kicinski
  Cc: Marcel Holtmann, Johan Hedberg, Luiz Augusto von Dentz,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-bluetooth, netdev, linux-kernel

Hi All,

The latest mainline kernel branch fails to build csky and mips allmodconfig
with gcc-12.

mips error is:

In function 'memcmp',
    inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9,
    inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2003:15:
./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
   44 | #define __underlying_memcmp     __builtin_memcmp
      |                                 ^
./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp'
  420 |         return __underlying_memcmp(p, q, size);
      |                ^~~~~~~~~~~~~~~~~~~
In function 'memcmp',
    inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9,
    inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2004:15:
./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
   44 | #define __underlying_memcmp     __builtin_memcmp
      |                                 ^
./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp'
  420 |         return __underlying_memcmp(p, q, size);
      |                ^~~~~~~~~~~~~~~~~~~


csky error is:

In file included from net/bluetooth/l2cap_core.c:37:
In function 'bacmp',
    inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2003:15:
./include/net/bluetooth/bluetooth.h:347:16: error: 'memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
  347 |         return memcmp(ba1, ba2, sizeof(bdaddr_t));
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function 'bacmp',
    inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2004:15:
./include/net/bluetooth/bluetooth.h:347:16: error: 'memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
  347 |         return memcmp(ba1, ba2, sizeof(bdaddr_t));
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


git bisect pointed to 332f1795ca20 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression").
And, reverting that commit has fixed the build failure.

Already reported at https://lore.kernel.org/lkml/YvVQEDs75pxSgxjM@debian/
and Jacub is looking at a fix, but this is just my usual build failure
mail of mainline branch for Linus's information.


--
Regards
Sudip

^ permalink raw reply

* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
From: Yafang Shao @ 2022-08-12 11:25 UTC (permalink / raw)
  To: Muchun Song
  Cc: Shakeel Butt, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin Lau, Song Liu, Yonghong Song,
	john fastabend, KP Singh, Stanislav Fomichev, Hao Luo, jolsa,
	Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, netdev, bpf, Linux MM
In-Reply-To: <870C70CA-C760-40A5-8A04-F0962EFDF507@linux.dev>

On Fri, Aug 12, 2022 at 1:34 PM Muchun Song <muchun.song@linux.dev> wrote:
>
>
>
> > On Aug 12, 2022, at 08:27, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Aug 11, 2022 at 11:47 PM Shakeel Butt <shakeelb@google.com> wrote:
> >>
> >> On Thu, Aug 11, 2022 at 10:49:13AM +0800, Yafang Shao wrote:
> >>> On Thu, Aug 11, 2022 at 1:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> >>>>
> >>>> On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> >>>>> The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
> >>>>
> >>>> No, it is ok to put root_mem_cgroup. css_put already handles the root
> >>>> cgroups.
> >>>>
> >>>
> >>> Ah, this commit log doesn't describe the issue clearly. I should improve it.
> >>> The issue is that in bpf_map_get_memcg() it doesn't get the objcg if
> >>> map->objcg is NULL (that can happen if the map belongs to the root
> >>> memcg), so we shouldn't put the objcg if map->objcg is NULL neither in
> >>> bpf_map_put_memcg().
> >>
> >> Sorry I am still not understanding. We are not 'getting' objcg in
> >> bpf_map_get_memcg(). We are 'getting' memcg from map->objcg and if that
> >> is NULL the function is returning root memcg and putting root memcg is
> >> totally fine.
> >
> > When the map belongs to root_mem_cgroup, the map->objcg is NULL, right ?
> > See also bpf_map_save_memcg() and it describes clearly in the comment -
> >
> > static void bpf_map_save_memcg(struct bpf_map *map)
> > {
> >        /* Currently if a map is created by a process belonging to the root
> >         * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> >         * So we have to check map->objcg for being NULL each time it's
> >         * being used.
> >         */
> >        map->objcg = get_obj_cgroup_from_current();
> > }
> >
> > So for the root_mem_cgroup case, bpf_map_get_memcg() will return
> > root_mem_cgroup directly without getting its css, right ? See below,
> >
> > static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
> > {
> >
> >        if (map->objcg)
> >                return get_mem_cgroup_from_objcg(map->objcg);
> >
> >        return root_mem_cgroup;   // without css_get(&memcg->css);
> > }
> >
> > But it will put the css unconditionally.  See below,
> >
> > memcg = bpf_map_get_memcg(map);
> > ...
> > mem_cgroup_put(memcg);
> >
> > So we should put it *conditionally* as well.
>
> Hi,
>
> No. We could put root_mem_cgroup unconditionally since the root css
> is treated as no reference css. See css_put().
>
> static inline void css_put(struct cgroup_subsys_state *css)
> {
>         // The root memcg’s css has been set with CSS_NO_REF.
>         if (!(css->flags & CSS_NO_REF))
>                 percpu_ref_put(&css->refcnt);
> }

Indeed. I missed that.
The call stack is so deep that I didn't look into it :(
Thanks for the information.

-- 
Regards
Yafang

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox