Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH 3/3] Enable ptp_kvm for arm64
From: Marc Zyngier @ 2019-08-29 10:32 UTC (permalink / raw)
  To: Jianyong Wu, netdev, pbonzini, sean.j.christopherson,
	richardcochran, Mark.Rutland, Will.Deacon, suzuki.poulose
  Cc: linux-kernel, Steve.Capper, Kaly.Xin, justin.he
In-Reply-To: <20190829063952.18470-4-jianyong.wu@arm.com>

On 29/08/2019 07:39, Jianyong Wu wrote:
> Currently in arm64 virtualization environment, there is no mechanism to
> keep time sync between guest and host. Time in guest will drift compared
> with host after boot up as they may both use third party time sources
> to correct their time respectively. The time deviation will be in order
> of milliseconds but some scenarios ask for higher time precision, like
> in cloud envirenment, we want all the VMs running in the host aquire the
> same level accuracy from host clock.
> 
> Use of kvm ptp clock, which choose the host clock source clock as a
> reference clock to sync time clock between guest and host has been adopted
> by x86 which makes the time sync order from milliseconds to nanoseconds.
> 
> This patch enable kvm ptp on arm64 and we get the similar clock drift as
> found with x86 with kvm ptp.
> 
> Test result comparison between with kvm ptp and without it in arm64 are
> as follows. This test derived from the result of command 'chronyc
> sources'. we should take more cure of the last sample column which shows
> the offset between the local clock and the source at the last measurement.
> 
> no kvm ptp in guest:
> MS Name/IP address   Stratum Poll Reach LastRx Last sample
> ========================================================================
> ^* dns1.synet.edu.cn      2   6   377    13  +1040us[+1581us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377    21  +1040us[+1581us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377    29  +1040us[+1581us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377    37  +1040us[+1581us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377    45  +1040us[+1581us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377    53  +1040us[+1581us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377    61  +1040us[+1581us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377     4   -130us[ +796us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377    12   -130us[ +796us] +/-   21ms
> ^* dns1.synet.edu.cn      2   6   377    20   -130us[ +796us] +/-   21ms
> 
> in host:
> MS Name/IP address   Stratum Poll Reach LastRx Last sample
> ========================================================================
> ^* 120.25.115.20          2   7   377    72   -470us[ -603us] +/-   18ms
> ^* 120.25.115.20          2   7   377    92   -470us[ -603us] +/-   18ms
> ^* 120.25.115.20          2   7   377   112   -470us[ -603us] +/-   18ms
> ^* 120.25.115.20          2   7   377     2   +872ns[-6808ns] +/-   17ms
> ^* 120.25.115.20          2   7   377    22   +872ns[-6808ns] +/-   17ms
> ^* 120.25.115.20          2   7   377    43   +872ns[-6808ns] +/-   17ms
> ^* 120.25.115.20          2   7   377    63   +872ns[-6808ns] +/-   17ms
> ^* 120.25.115.20          2   7   377    83   +872ns[-6808ns] +/-   17ms
> ^* 120.25.115.20          2   7   377   103   +872ns[-6808ns] +/-   17ms
> ^* 120.25.115.20          2   7   377   123   +872ns[-6808ns] +/-   17ms
> 
> The dns1.synet.edu.cn is the network reference clock for guest and
> 120.25.115.20 is the network reference clock for host. we can't get the
> clock error between guest and host directly, but a roughly estimated value
> will be in order of hundreds of us to ms.
> 
> with kvm ptp in guest:
> chrony has been disabled in host to remove the disturb by network clock.

Is that a realistic use case? Why should the host not use NTP?

> 
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ========================================================================
> * PHC0                    0   3   377     8     -7ns[   +1ns] +/-    3ns
> * PHC0                    0   3   377     8     +1ns[  +16ns] +/-    3ns
> * PHC0                    0   3   377     6     -4ns[   -0ns] +/-    6ns
> * PHC0                    0   3   377     6     -8ns[  -12ns] +/-    5ns
> * PHC0                    0   3   377     5     +2ns[   +4ns] +/-    4ns
> * PHC0                    0   3   377    13     +2ns[   +4ns] +/-    4ns
> * PHC0                    0   3   377    12     -4ns[   -6ns] +/-    4ns
> * PHC0                    0   3   377    11     -8ns[  -11ns] +/-    6ns
> * PHC0                    0   3   377    10    -14ns[  -20ns] +/-    4ns
> * PHC0                    0   3   377     8     +4ns[   +5ns] +/-    4ns
> 
> The PHC0 is the ptp clock which choose the host clock as its source
> clock. So we can be sure to say that the clock error between host and guest
> is in order of ns.
> 
> Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> ---
>  arch/arm64/include/asm/arch_timer.h  |  3 ++
>  arch/arm64/kvm/arch_ptp_kvm.c        | 76 ++++++++++++++++++++++++++++
>  drivers/clocksource/arm_arch_timer.c |  6 ++-
>  drivers/ptp/Kconfig                  |  2 +-
>  include/linux/arm-smccc.h            | 14 +++++
>  virt/kvm/arm/psci.c                  | 17 +++++++
>  6 files changed, 115 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm64/kvm/arch_ptp_kvm.c

Please split this patch into two parts: the hypervisor code in a patch
and the guest code in another patch. Having both of them together is
confusing.

> 
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 6756178c27db..880576a814b6 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -229,4 +229,7 @@ static inline int arch_timer_arch_init(void)
>  	return 0;
>  }
>  
> +extern struct clocksource clocksource_counter;
> +extern u64 arch_counter_read(struct clocksource *cs);

I'm definitely not keen on exposing the internals of the arch_timer
driver to random subsystems. Furthermore, you seem to expect that the
guest kernel will only use the arch timer as a clocksource, and nothing
really guarantees that (in which case get_device_system_crosststamp will
fail).

It looks to me that we'd be better off exposing a core timekeeping API
that populates a struct system_counterval_t based on the *current*
timekeeper monotonic clocksource. This would simplify the split between
generic and arch-specific code.

Whether or not tglx will be happy with the idea is another problem, but
I'm certainly not taking any change to the arch timer code based on this.

> +
>  #endif
> diff --git a/arch/arm64/kvm/arch_ptp_kvm.c b/arch/arm64/kvm/arch_ptp_kvm.c

We don't put non-hypervisor in arch/arm64/kvm. Please move it back to
drivers/ptp (as well as its x86 counterpart), and just link the two
parts there. This should also allow this to be enabled for 32bit guests.

> new file mode 100644
> index 000000000000..6b2165ebce62
> --- /dev/null
> +++ b/arch/arm64/kvm/arch_ptp_kvm.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + *  Virtual PTP 1588 clock for use with KVM guests
> + *  Copyright (C) 2019 ARM Ltd.
> + *  All Rights Reserved
> + */
> +
> +#include <asm/hypervisor.h>
> +#include <linux/module.h>
> +#include <linux/psci.h>
> +#include <linux/arm-smccc.h>
> +#include <linux/timecounter.h>
> +#include <linux/sched/clock.h>
> +#include <asm/arch_timer.h>
> +
> +/*
> + * as trap call cause delay, this function will return the delay in nanosecond
> + */
> +static u64 arm_smccc_1_1_invoke_delay(u32 id, struct arm_smccc_res *res)
> +{
> +	u64 ns, t1, t2;
> +
> +	t1 = sched_clock();
> +	arm_smccc_1_1_invoke(id, res);
> +	t2 = sched_clock();
> +	t2 -= t1;
> +	ns = t2;
> +	return ns;

I think you can get rid of the ns variable here...

> +}
> +
> +int kvm_arch_ptp_init(void)
> +{
> +	return 0;
> +}
> +
> +int kvm_arch_ptp_get_clock(struct timespec64 *ts)
> +{
> +	u64 ns;
> +	struct arm_smccc_res hvc_res;
> +
> +	if (!kvm_arm_hyp_service_available(
> +			ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID)) {
> +		return -EOPNOTSUPP;
> +	}
> +	ns = arm_smccc_1_1_invoke_delay(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
> +					&hvc_res);
> +	ts->tv_sec = hvc_res.a0;
> +	ts->tv_nsec = hvc_res.a1;
> +	timespec64_add_ns(ts, ns);
> +	return 0;
> +}
> +
> +int kvm_arch_ptp_get_clock_fn(long *cycle, struct timespec64 *ts,
> +			      struct clocksource **cs)
> +{
> +	u64 ns;
> +	struct arm_smccc_res hvc_res;
> +
> +	if (!kvm_arm_hyp_service_available(
> +			ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID)) {
> +		return -EOPNOTSUPP;
> +	}
> +	ns = arm_smccc_1_1_invoke_delay(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
> +					&hvc_res);
> +	ts->tv_sec = hvc_res.a0;
> +	ts->tv_nsec = hvc_res.a1;
> +	timespec64_add_ns(ts, ns);
> +	*cycle = hvc_res.a2;
> +	*cs = &clocksource_counter;
> +
> +	return 0;
> +}

Why do we have two functions doing almost the same thing? Why do you
call kvm_arm_hyp_service_available on each and every time? Isn't it
enough to check in kvm_arch_ptp_init()?

> +
> +MODULE_AUTHOR("Marcelo Tosatti <mtosatti@redhat.com>");
> +MODULE_DESCRIPTION("PTP clock using KVMCLOCK");
> +MODULE_LICENSE("GPL");

This should only exist in the generic code.

> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 07e57a49d1e8..021e3f69364c 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -175,23 +175,25 @@ static notrace u64 arch_counter_get_cntvct(void)
>  u64 (*arch_timer_read_counter)(void) = arch_counter_get_cntvct;
>  EXPORT_SYMBOL_GPL(arch_timer_read_counter);
>  
> -static u64 arch_counter_read(struct clocksource *cs)
> +u64 arch_counter_read(struct clocksource *cs)
>  {
>  	return arch_timer_read_counter();
>  }
> +EXPORT_SYMBOL(arch_counter_read);
>  
>  static u64 arch_counter_read_cc(const struct cyclecounter *cc)
>  {
>  	return arch_timer_read_counter();
>  }
>  
> -static struct clocksource clocksource_counter = {
> +struct clocksource clocksource_counter = {
>  	.name	= "arch_sys_counter",
>  	.rating	= 400,
>  	.read	= arch_counter_read,
>  	.mask	= CLOCKSOURCE_MASK(56),
>  	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
>  };
> +EXPORT_SYMBOL(clocksource_counter);

I've said what I thought about this. Not happening.

>  
>  static struct cyclecounter cyclecounter __ro_after_init = {
>  	.read	= arch_counter_read_cc,
> diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
> index 9b8fee5178e8..e032fafdafa7 100644
> --- a/drivers/ptp/Kconfig
> +++ b/drivers/ptp/Kconfig
> @@ -110,7 +110,7 @@ config PTP_1588_CLOCK_PCH
>  config PTP_1588_CLOCK_KVM
>  	tristate "KVM virtual PTP clock"
>  	depends on PTP_1588_CLOCK
> -	depends on KVM_GUEST && X86
> +	depends on KVM_GUEST && X86 || ARM64
>  	default y
>  	help
>  	  This driver adds support for using kvm infrastructure as a PTP
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index a6e4d3e3d10a..2a222a1a8594 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -94,6 +94,7 @@
>  
>  /* KVM "vendor specific" services */
>  #define ARM_SMCCC_KVM_FUNC_FEATURES		0
> +#define ARM_SMCCC_KVM_PTP			1
>  #define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
>  #define ARM_SMCCC_KVM_NUM_FUNCS			128
>  
> @@ -102,6 +103,16 @@
>  			   ARM_SMCCC_SMC_32,				\
>  			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
>  			   ARM_SMCCC_KVM_FUNC_FEATURES)
> +/*
> + * This ID used for virtual ptp kvm clock and it will pass second value
> + * and nanosecond value of host real time and system counter by vcpu
> + * register to guest.
> + */
> +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID				\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
> +			   ARM_SMCCC_SMC_32,				\
> +			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
> +			   ARM_SMCCC_KVM_PTP)
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -373,5 +384,8 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
>  		method;							\
>  	})
>  
> +#include <linux/psci.h>
> +#include <linux/clocksource.h>
> +
>  #endif /*__ASSEMBLY__*/
>  #endif /*__LINUX_ARM_SMCCC_H*/
> diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
> index 0debf49bf259..7fffdb25d32c 100644
> --- a/virt/kvm/arm/psci.c
> +++ b/virt/kvm/arm/psci.c
> @@ -392,6 +392,8 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  	u32 func_id = smccc_get_function(vcpu);
>  	u32 val[4] = {};
>  	u32 option;
> +	struct timespec *ts;
> +	u64 cnt;
>  
>  	val[0] = SMCCC_RET_NOT_SUPPORTED;
>  
> @@ -431,6 +433,21 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
>  		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
>  		break;
> +	/*
> +	 * This will used for virtual ptp kvm clock. three
> +	 * values will be passed back.
> +	 * reg0 stores seconds of host real time;
> +	 * reg1 stores nanoseconds of host real time;
> +	 * reg2 stotes system counter cycle value.

stores

> +	 */
> +	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> +		getnstimeofday(ts);
> +		cnt = arch_timer_read_counter();
> +		val[0] = ts->tv_sec;
> +		val[1] = ts->tv_nsec;
> +		val[2] = cnt;

Can you explain what the purpose of exposing this counter is? The guest
should have access to the physical counter already.

> +		val[3] = 0;
> +		break;

This will probably conflict with Steven's stolen time series. Not a big
deal though.

>  	default:
>  		return kvm_psci_call(vcpu);
>  	}
> 

Other questions: how does this works with VM migration? Specially when
moving from a hypervisor that supports the feature to one that doesn't?

Thanks,

	M.
-- 
Jazz is not dead, it just smells funny...

^ permalink raw reply

* [PATCH AUTOSEL 4.19 09/29] selftests: fib_rule_tests: use pre-defined DEV_ADDR
From: Sasha Levin @ 2019-08-29 10:49 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hangbin Liu, David Ahern, David S . Miller, Sasha Levin, netdev,
	linux-api
In-Reply-To: <20190829105009.2265-1-sashal@kernel.org>

From: Hangbin Liu <liuhangbin@gmail.com>

[ Upstream commit 34632975cafdd07ce80e85c2eda4e9c16b5f2faa ]

DEV_ADDR is defined but not used. Use it in address setting.
Do the same with IPv6 for consistency.

Reported-by: David Ahern <dsahern@gmail.com>
Fixes: fc82d93e57e3 ("selftests: fib_rule_tests: fix local IPv4 address typo")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 tools/testing/selftests/net/fib_rule_tests.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
index 1ba069967fa2b..ba2d9fab28d0f 100755
--- a/tools/testing/selftests/net/fib_rule_tests.sh
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -15,6 +15,7 @@ GW_IP6=2001:db8:1::2
 SRC_IP6=2001:db8:1::3
 
 DEV_ADDR=192.51.100.1
+DEV_ADDR6=2001:db8:1::1
 DEV=dummy0
 
 log_test()
@@ -55,8 +56,8 @@ setup()
 
 	$IP link add dummy0 type dummy
 	$IP link set dev dummy0 up
-	$IP address add 192.51.100.1/24 dev dummy0
-	$IP -6 address add 2001:db8:1::1/64 dev dummy0
+	$IP address add $DEV_ADDR/24 dev dummy0
+	$IP -6 address add $DEV_ADDR6/64 dev dummy0
 
 	set +e
 }
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.4 3/6] mac80211: fix possible sta leak
From: Sasha Levin @ 2019-08-29 10:51 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johannes Berg, Sasha Levin, linux-wireless, netdev
In-Reply-To: <20190829105110.2748-1-sashal@kernel.org>

From: Johannes Berg <johannes.berg@intel.com>

[ Upstream commit 5fd2f91ad483baffdbe798f8a08f1b41442d1e24 ]

If TDLS station addition is rejected, the sta memory is leaked.
Avoid this by moving the check before the allocation.

Cc: stable@vger.kernel.org
Fixes: 7ed5285396c2 ("mac80211: don't initiate TDLS connection if station is not associated to AP")
Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/mac80211/cfg.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 7349bf26ae7b3..1999a7eaa6920 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1211,6 +1211,11 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (is_multicast_ether_addr(mac))
 		return -EINVAL;
 
+	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER) &&
+	    sdata->vif.type == NL80211_IFTYPE_STATION &&
+	    !sdata->u.mgd.associated)
+		return -EINVAL;
+
 	sta = sta_info_alloc(sdata, mac, GFP_KERNEL);
 	if (!sta)
 		return -ENOMEM;
@@ -1228,10 +1233,6 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER))
 		sta->sta.tdls = true;
 
-	if (sta->sta.tdls && sdata->vif.type == NL80211_IFTYPE_STATION &&
-	    !sdata->u.mgd.associated)
-		return -EINVAL;
-
 	err = sta_apply_parameters(local, sta, params);
 	if (err) {
 		sta_info_free(local, sta);
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.9 5/8] mac80211: fix possible sta leak
From: Sasha Levin @ 2019-08-29 10:50 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johannes Berg, Sasha Levin, linux-wireless, netdev
In-Reply-To: <20190829105100.2649-1-sashal@kernel.org>

From: Johannes Berg <johannes.berg@intel.com>

[ Upstream commit 5fd2f91ad483baffdbe798f8a08f1b41442d1e24 ]

If TDLS station addition is rejected, the sta memory is leaked.
Avoid this by moving the check before the allocation.

Cc: stable@vger.kernel.org
Fixes: 7ed5285396c2 ("mac80211: don't initiate TDLS connection if station is not associated to AP")
Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/mac80211/cfg.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 954315e1661df..3b2c4692d966f 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1418,6 +1418,11 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (is_multicast_ether_addr(mac))
 		return -EINVAL;
 
+	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER) &&
+	    sdata->vif.type == NL80211_IFTYPE_STATION &&
+	    !sdata->u.mgd.associated)
+		return -EINVAL;
+
 	sta = sta_info_alloc(sdata, mac, GFP_KERNEL);
 	if (!sta)
 		return -ENOMEM;
@@ -1425,10 +1430,6 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER))
 		sta->sta.tdls = true;
 
-	if (sta->sta.tdls && sdata->vif.type == NL80211_IFTYPE_STATION &&
-	    !sdata->u.mgd.associated)
-		return -EINVAL;
-
 	err = sta_apply_parameters(local, sta, params);
 	if (err) {
 		sta_info_free(local, sta);
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.14 10/14] mac80211: fix possible sta leak
From: Sasha Levin @ 2019-08-29 10:50 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johannes Berg, Sasha Levin, linux-wireless, netdev
In-Reply-To: <20190829105043.2508-1-sashal@kernel.org>

From: Johannes Berg <johannes.berg@intel.com>

[ Upstream commit 5fd2f91ad483baffdbe798f8a08f1b41442d1e24 ]

If TDLS station addition is rejected, the sta memory is leaked.
Avoid this by moving the check before the allocation.

Cc: stable@vger.kernel.org
Fixes: 7ed5285396c2 ("mac80211: don't initiate TDLS connection if station is not associated to AP")
Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/mac80211/cfg.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 150dd2160cefb..8168c667d91d9 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1459,6 +1459,11 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (is_multicast_ether_addr(mac))
 		return -EINVAL;
 
+	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER) &&
+	    sdata->vif.type == NL80211_IFTYPE_STATION &&
+	    !sdata->u.mgd.associated)
+		return -EINVAL;
+
 	sta = sta_info_alloc(sdata, mac, GFP_KERNEL);
 	if (!sta)
 		return -ENOMEM;
@@ -1466,10 +1471,6 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER))
 		sta->sta.tdls = true;
 
-	if (sta->sta.tdls && sdata->vif.type == NL80211_IFTYPE_STATION &&
-	    !sdata->u.mgd.associated)
-		return -EINVAL;
-
 	err = sta_apply_parameters(local, sta, params);
 	if (err) {
 		sta_info_free(local, sta);
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.14 01/14] hv_sock: Fix hang when a connection is closed
From: Sasha Levin @ 2019-08-29 10:50 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dexuan Cui, Sunil Muthuswamy, David S . Miller, Sasha Levin,
	netdev, bcm-kernel-feedback-list

From: Dexuan Cui <decui@microsoft.com>

[ Upstream commit 685703b497bacea8765bb409d6b73455b73c540e ]

There is a race condition for an established connection that is being closed
by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
'remove_sock' is false):

1 for the initial value;
1 for the sk being in the bound list;
1 for the sk being in the connected list;
1 for the delayed close_work.

After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
decrease the refcnt to 3.

Concurrently, hvs_close_connection() runs in another thread:
  calls vsock_remove_sock() to decrease the refcnt by 2;
  call sock_put() to decrease the refcnt to 0, and free the sk;
  next, the "release_sock(sk)" may hang due to use-after-free.

In the above, after hvs_release() finishes, if hvs_close_connection() runs
faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
because at the beginning of hvs_close_connection(), the refcnt is still 4.

The issue can be resolved if an extra reference is taken when the
connection is established.

Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/vmw_vsock/hyperv_transport.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 52ac3e49c7efd..ec72a5edaa1b8 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -320,6 +320,11 @@ static void hvs_close_connection(struct vmbus_channel *chan)
 	lock_sock(sk);
 	hvs_do_close_lock_held(vsock_sk(sk), true);
 	release_sock(sk);
+
+	/* Release the refcnt for the channel that's opened in
+	 * hvs_open_connection().
+	 */
+	sock_put(sk);
 }
 
 static void hvs_open_connection(struct vmbus_channel *chan)
@@ -389,6 +394,9 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 	}
 
 	set_per_channel_state(chan, conn_from_host ? new : sk);
+
+	/* This reference will be dropped by hvs_close_connection(). */
+	sock_hold(conn_from_host ? new : sk);
 	vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
 
 	/* Set the pending send size to max packet size to always get
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 21/29] netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too
From: Sasha Levin @ 2019-08-29 10:50 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stefano Brivio, Chen Yi, Jozsef Kadlecsik, Sasha Levin,
	netfilter-devel, coreteam, netdev
In-Reply-To: <20190829105009.2265-1-sashal@kernel.org>

From: Stefano Brivio <sbrivio@redhat.com>

[ Upstream commit b89d15480d0cacacae1a0fe0b3da01b529f2914f ]

In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
KADT check that prevents matching on destination MAC addresses for
hash:mac sets, but forgot to remove the same check for hash:ip,mac set.

Drop this check: functionality is now commented in man pages and there's
no reason to restrict to source MAC address matching anymore.

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/netfilter/ipset/ip_set_hash_ipmac.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c
index fd87de3ed55b3..75c21c8b76514 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmac.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -95,10 +95,6 @@ hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb,
 	struct hash_ipmac4_elem e = { .ip = 0, { .foo[0] = 0, .foo[1] = 0 } };
 	struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
 
-	 /* MAC can be src only */
-	if (!(opt->flags & IPSET_DIM_TWO_SRC))
-		return 0;
-
 	if (skb_mac_header(skb) < skb->head ||
 	    (skb_mac_header(skb) + ETH_HLEN) > skb->data)
 		return -EINVAL;
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 22/29] netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets
From: Sasha Levin @ 2019-08-29 10:50 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stefano Brivio, Chen Yi, Jozsef Kadlecsik, Sasha Levin,
	netfilter-devel, coreteam, netdev
In-Reply-To: <20190829105009.2265-1-sashal@kernel.org>

From: Stefano Brivio <sbrivio@redhat.com>

[ Upstream commit 1b4a75108d5bc153daf965d334e77e8e94534f96 ]

In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.

This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.

Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.

This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:

  ip netns add A
  ip link add veth1 type veth peer name veth2 netns A
  ip addr add 192.0.2.1/24 dev veth1
  ip -net A addr add 192.0.2.2/24 dev veth2
  ip link set veth1 up
  ip -net A link set veth2 up

  dst=$(ip netns exec A cat /sys/class/net/veth2/address)

  ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
  ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
  ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP

  ip netns exec A ipset create test_hash hash:ip,mac
  ip netns exec A ipset add test_hash 192.0.2.1,${dst}
  ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP

ipset correctly matches a test packet:

  # ping -c1 192.0.2.2 >/dev/null
  # echo $?
  0

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/netfilter/ipset/ip_set_bitmap_ipmac.c | 2 +-
 net/netfilter/ipset/ip_set_hash_ipmac.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
index 13ade5782847b..4f01321e793ce 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
@@ -230,7 +230,7 @@ bitmap_ipmac_kadt(struct ip_set *set, const struct sk_buff *skb,

 	e.id = ip_to_id(map, ip);

-	if (opt->flags & IPSET_DIM_ONE_SRC)
+	if (opt->flags & IPSET_DIM_TWO_SRC)
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_source);
 	else
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_dest);
diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c
index 75c21c8b76514..16ec822e40447 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmac.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -99,7 +99,7 @@ hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb,
 	    (skb_mac_header(skb) + ETH_HLEN) > skb->data)
 		return -EINVAL;

-	if (opt->flags & IPSET_DIM_ONE_SRC)
+	if (opt->flags & IPSET_DIM_TWO_SRC)
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_source);
 	else
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_dest);
-- 
2.20.1

^ permalink raw reply related

* [PATCH AUTOSEL 4.19 18/29] mac80211: fix possible sta leak
From: Sasha Levin @ 2019-08-29 10:49 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Johannes Berg, Sasha Levin, linux-wireless, netdev
In-Reply-To: <20190829105009.2265-1-sashal@kernel.org>

From: Johannes Berg <johannes.berg@intel.com>

[ Upstream commit 5fd2f91ad483baffdbe798f8a08f1b41442d1e24 ]

If TDLS station addition is rejected, the sta memory is leaked.
Avoid this by moving the check before the allocation.

Cc: stable@vger.kernel.org
Fixes: 7ed5285396c2 ("mac80211: don't initiate TDLS connection if station is not associated to AP")
Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/mac80211/cfg.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 40c5102234679..a48e83b19cfa7 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1471,6 +1471,11 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (is_multicast_ether_addr(mac))
 		return -EINVAL;
 
+	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER) &&
+	    sdata->vif.type == NL80211_IFTYPE_STATION &&
+	    !sdata->u.mgd.associated)
+		return -EINVAL;
+
 	sta = sta_info_alloc(sdata, mac, GFP_KERNEL);
 	if (!sta)
 		return -ENOMEM;
@@ -1478,10 +1483,6 @@ static int ieee80211_add_station(struct wiphy *wiphy, struct net_device *dev,
 	if (params->sta_flags_set & BIT(NL80211_STA_FLAG_TDLS_PEER))
 		sta->sta.tdls = true;
 
-	if (sta->sta.tdls && sdata->vif.type == NL80211_IFTYPE_STATION &&
-	    !sdata->u.mgd.associated)
-		return -EINVAL;
-
 	err = sta_apply_parameters(local, sta, params);
 	if (err) {
 		sta_info_free(local, sta);
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 15/29] netfilter: nf_tables: use-after-free in failing rule with bound set
From: Sasha Levin @ 2019-08-29 10:49 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Pablo Neira Ayuso, Sasha Levin, netfilter-devel, coreteam, netdev
In-Reply-To: <20190829105009.2265-1-sashal@kernel.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

[ Upstream commit 6a0a8d10a3661a036b55af695542a714c429ab7c ]

If a rule that has already a bound anonymous set fails to be added, the
preparation phase releases the rule and the bound set. However, the
transaction object from the abort path still has a reference to the set
object that is stale, leading to a use-after-free when checking for the
set->bound field. Add a new field to the transaction that specifies if
the set is bound, so the abort path can skip releasing it since the rule
command owns it and it takes care of releasing it. After this update,
the set->bound field is removed.

[   24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
[   24.657858] Mem abort info:
[   24.660686]   ESR = 0x96000004
[   24.663769]   Exception class = DABT (current EL), IL = 32 bits
[   24.669725]   SET = 0, FnV = 0
[   24.672804]   EA = 0, S1PTW = 0
[   24.675975] Data abort info:
[   24.678880]   ISV = 0, ISS = 0x00000004
[   24.682743]   CM = 0, WnR = 0
[   24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
[   24.692207] [0000000000040434] pgd=0000000000000000
[   24.697119] Internal error: Oops: 96000004 [#1] SMP
[...]
[   24.889414] Call trace:
[   24.891870]  __nf_tables_abort+0x3f0/0x7a0
[   24.895984]  nf_tables_abort+0x20/0x40
[   24.899750]  nfnetlink_rcv_batch+0x17c/0x588
[   24.904037]  nfnetlink_rcv+0x13c/0x190
[   24.907803]  netlink_unicast+0x18c/0x208
[   24.911742]  netlink_sendmsg+0x1b0/0x350
[   24.915682]  sock_sendmsg+0x4c/0x68
[   24.919185]  ___sys_sendmsg+0x288/0x2c8
[   24.923037]  __sys_sendmsg+0x7c/0xd0
[   24.926628]  __arm64_sys_sendmsg+0x2c/0x38
[   24.930744]  el0_svc_common.constprop.0+0x94/0x158
[   24.935556]  el0_svc_handler+0x34/0x90
[   24.939322]  el0_svc+0x8/0xc
[   24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
[   24.948336] ---[ end trace cebbb9dcbed3b56f ]---

Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/netfilter/nf_tables.h |  9 +++++++--
 net/netfilter/nf_tables_api.c     | 15 ++++++++++-----
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index f2be5d041ba3a..7685cbda9f28b 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -418,8 +418,7 @@ struct nft_set {
 	unsigned char			*udata;
 	/* runtime data below here */
 	const struct nft_set_ops	*ops ____cacheline_aligned;
-	u16				flags:13,
-					bound:1,
+	u16				flags:14,
 					genmask:2;
 	u8				klen;
 	u8				dlen;
@@ -1337,12 +1336,15 @@ struct nft_trans_rule {
 struct nft_trans_set {
 	struct nft_set			*set;
 	u32				set_id;
+	bool				bound;
 };
 
 #define nft_trans_set(trans)	\
 	(((struct nft_trans_set *)trans->data)->set)
 #define nft_trans_set_id(trans)	\
 	(((struct nft_trans_set *)trans->data)->set_id)
+#define nft_trans_set_bound(trans)	\
+	(((struct nft_trans_set *)trans->data)->bound)
 
 struct nft_trans_chain {
 	bool				update;
@@ -1373,12 +1375,15 @@ struct nft_trans_table {
 struct nft_trans_elem {
 	struct nft_set			*set;
 	struct nft_set_elem		elem;
+	bool				bound;
 };
 
 #define nft_trans_elem_set(trans)	\
 	(((struct nft_trans_elem *)trans->data)->set)
 #define nft_trans_elem(trans)	\
 	(((struct nft_trans_elem *)trans->data)->elem)
+#define nft_trans_elem_set_bound(trans)	\
+	(((struct nft_trans_elem *)trans->data)->bound)
 
 struct nft_trans_obj {
 	struct nft_object		*obj;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 29ff59dd99ace..2145581d7b3dc 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -121,9 +121,14 @@ static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set)
 		return;
 
 	list_for_each_entry_reverse(trans, &net->nft.commit_list, list) {
-		if (trans->msg_type == NFT_MSG_NEWSET &&
-		    nft_trans_set(trans) == set) {
-			set->bound = true;
+		switch (trans->msg_type) {
+		case NFT_MSG_NEWSET:
+			if (nft_trans_set(trans) == set)
+				nft_trans_set_bound(trans) = true;
+			break;
+		case NFT_MSG_NEWSETELEM:
+			if (nft_trans_elem_set(trans) == set)
+				nft_trans_elem_set_bound(trans) = true;
 			break;
 		}
 	}
@@ -6656,7 +6661,7 @@ static int __nf_tables_abort(struct net *net)
 			break;
 		case NFT_MSG_NEWSET:
 			trans->ctx.table->use--;
-			if (nft_trans_set(trans)->bound) {
+			if (nft_trans_set_bound(trans)) {
 				nft_trans_destroy(trans);
 				break;
 			}
@@ -6668,7 +6673,7 @@ static int __nf_tables_abort(struct net *net)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_NEWSETELEM:
-			if (nft_trans_elem_set(trans)->bound) {
+			if (nft_trans_elem_set_bound(trans)) {
 				nft_trans_destroy(trans);
 				break;
 			}
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 07/29] {nl,mac}80211: fix interface combinations on crypto controlled devices
From: Sasha Levin @ 2019-08-29 10:49 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Manikanta Pubbisetty, Johannes Berg, Sasha Levin, linux-wireless,
	netdev
In-Reply-To: <20190829105009.2265-1-sashal@kernel.org>

From: Manikanta Pubbisetty <mpubbise@codeaurora.org>

[ Upstream commit e6f4051123fd33901e9655a675b22aefcdc5d277 ]

Commit 33d915d9e8ce ("{nl,mac}80211: allow 4addr AP operation on
crypto controlled devices") has introduced a change which allows
4addr operation on crypto controlled devices (ex: ath10k). This
change has inadvertently impacted the interface combinations logic
on such devices.

General rule is that software interfaces like AP/VLAN should not be
listed under supported interface combinations and should not be
considered during validation of these combinations; because of the
aforementioned change, AP/VLAN interfaces(if present) will be checked
against interfaces supported by the device and blocks valid interface
combinations.

Consider a case where an AP and AP/VLAN are up and running; when a
second AP device is brought up on the same physical device, this AP
will be checked against the AP/VLAN interface (which will not be
part of supported interface combinations of the device) and blocks
second AP to come up.

Add a new API cfg80211_iftype_allowed() to fix the problem, this
API works for all devices with/without SW crypto control.

Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Fixes: 33d915d9e8ce ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices")
Link: https://lore.kernel.org/r/1563779690-9716-1-git-send-email-mpubbise@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/cfg80211.h | 15 +++++++++++++++
 net/mac80211/util.c    |  7 +++----
 net/wireless/core.c    |  6 ++----
 net/wireless/nl80211.c |  4 +---
 net/wireless/util.c    | 27 +++++++++++++++++++++++++--
 5 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 67e0a990144a6..468deae5d603e 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -6562,6 +6562,21 @@ int cfg80211_external_auth_request(struct net_device *netdev,
 				   struct cfg80211_external_auth_params *params,
 				   gfp_t gfp);
 
+/**
+ * cfg80211_iftype_allowed - check whether the interface can be allowed
+ * @wiphy: the wiphy
+ * @iftype: interface type
+ * @is_4addr: use_4addr flag, must be '0' when check_swif is '1'
+ * @check_swif: check iftype against software interfaces
+ *
+ * Check whether the interface is allowed to operate; additionally, this API
+ * can be used to check iftype against the software interfaces when
+ * check_swif is '1'.
+ */
+bool cfg80211_iftype_allowed(struct wiphy *wiphy, enum nl80211_iftype iftype,
+			     bool is_4addr, u8 check_swif);
+
+
 /* Logging, debugging and troubleshooting/diagnostic helpers. */
 
 /* wiphy_printk helpers, similar to dev_printk */
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index c59638574cf8b..f101a6460b44b 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -3527,9 +3527,7 @@ int ieee80211_check_combinations(struct ieee80211_sub_if_data *sdata,
 	}
 
 	/* Always allow software iftypes */
-	if (local->hw.wiphy->software_iftypes & BIT(iftype) ||
-	    (iftype == NL80211_IFTYPE_AP_VLAN &&
-	     local->hw.wiphy->flags & WIPHY_FLAG_4ADDR_AP)) {
+	if (cfg80211_iftype_allowed(local->hw.wiphy, iftype, 0, 1)) {
 		if (radar_detect)
 			return -EINVAL;
 		return 0;
@@ -3564,7 +3562,8 @@ int ieee80211_check_combinations(struct ieee80211_sub_if_data *sdata,
 
 		if (sdata_iter == sdata ||
 		    !ieee80211_sdata_running(sdata_iter) ||
-		    local->hw.wiphy->software_iftypes & BIT(wdev_iter->iftype))
+		    cfg80211_iftype_allowed(local->hw.wiphy,
+					    wdev_iter->iftype, 0, 1))
 			continue;
 
 		params.iftype_num[wdev_iter->iftype]++;
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 2a46ec3cb72c1..68660781aa51f 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1335,10 +1335,8 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
 		}
 		break;
 	case NETDEV_PRE_UP:
-		if (!(wdev->wiphy->interface_modes & BIT(wdev->iftype)) &&
-		    !(wdev->iftype == NL80211_IFTYPE_AP_VLAN &&
-		      rdev->wiphy.flags & WIPHY_FLAG_4ADDR_AP &&
-		      wdev->use_4addr))
+		if (!cfg80211_iftype_allowed(wdev->wiphy, wdev->iftype,
+					     wdev->use_4addr, 0))
 			return notifier_from_errno(-EOPNOTSUPP);
 
 		if (rfkill_blocked(rdev->rfkill))
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 8e2f03ab4cc9f..2a85bff6a8f35 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -3210,9 +3210,7 @@ static int nl80211_new_interface(struct sk_buff *skb, struct genl_info *info)
 			return err;
 	}
 
-	if (!(rdev->wiphy.interface_modes & (1 << type)) &&
-	    !(type == NL80211_IFTYPE_AP_VLAN && params.use_4addr &&
-	      rdev->wiphy.flags & WIPHY_FLAG_4ADDR_AP))
+	if (!cfg80211_iftype_allowed(&rdev->wiphy, type, params.use_4addr, 0))
 		return -EOPNOTSUPP;
 
 	err = nl80211_parse_mon_options(rdev, type, info, &params);
diff --git a/net/wireless/util.c b/net/wireless/util.c
index d57e2f679a3e4..c14e8f6e5e198 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -1670,7 +1670,7 @@ int cfg80211_iter_combinations(struct wiphy *wiphy,
 	for (iftype = 0; iftype < NUM_NL80211_IFTYPES; iftype++) {
 		num_interfaces += params->iftype_num[iftype];
 		if (params->iftype_num[iftype] > 0 &&
-		    !(wiphy->software_iftypes & BIT(iftype)))
+		    !cfg80211_iftype_allowed(wiphy, iftype, 0, 1))
 			used_iftypes |= BIT(iftype);
 	}
 
@@ -1692,7 +1692,7 @@ int cfg80211_iter_combinations(struct wiphy *wiphy,
 			return -ENOMEM;
 
 		for (iftype = 0; iftype < NUM_NL80211_IFTYPES; iftype++) {
-			if (wiphy->software_iftypes & BIT(iftype))
+			if (cfg80211_iftype_allowed(wiphy, iftype, 0, 1))
 				continue;
 			for (j = 0; j < c->n_limits; j++) {
 				all_iftypes |= limits[j].types;
@@ -1895,3 +1895,26 @@ EXPORT_SYMBOL(rfc1042_header);
 const unsigned char bridge_tunnel_header[] __aligned(2) =
 	{ 0xaa, 0xaa, 0x03, 0x00, 0x00, 0xf8 };
 EXPORT_SYMBOL(bridge_tunnel_header);
+
+bool cfg80211_iftype_allowed(struct wiphy *wiphy, enum nl80211_iftype iftype,
+			     bool is_4addr, u8 check_swif)
+
+{
+	bool is_vlan = iftype == NL80211_IFTYPE_AP_VLAN;
+
+	switch (check_swif) {
+	case 0:
+		if (is_vlan && is_4addr)
+			return wiphy->flags & WIPHY_FLAG_4ADDR_AP;
+		return wiphy->interface_modes & BIT(iftype);
+	case 1:
+		if (!(wiphy->software_iftypes & BIT(iftype)) && is_vlan)
+			return wiphy->flags & WIPHY_FLAG_4ADDR_AP;
+		return wiphy->software_iftypes & BIT(iftype);
+	default:
+		break;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL(cfg80211_iftype_allowed);
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 01/29] hv_sock: Fix hang when a connection is closed
From: Sasha Levin @ 2019-08-29 10:49 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dexuan Cui, Sunil Muthuswamy, David S . Miller, Sasha Levin,
	netdev, bcm-kernel-feedback-list

From: Dexuan Cui <decui@microsoft.com>

[ Upstream commit 685703b497bacea8765bb409d6b73455b73c540e ]

There is a race condition for an established connection that is being closed
by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
'remove_sock' is false):

1 for the initial value;
1 for the sk being in the bound list;
1 for the sk being in the connected list;
1 for the delayed close_work.

After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
decrease the refcnt to 3.

Concurrently, hvs_close_connection() runs in another thread:
  calls vsock_remove_sock() to decrease the refcnt by 2;
  call sock_put() to decrease the refcnt to 0, and free the sk;
  next, the "release_sock(sk)" may hang due to use-after-free.

In the above, after hvs_release() finishes, if hvs_close_connection() runs
faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
because at the beginning of hvs_close_connection(), the refcnt is still 4.

The issue can be resolved if an extra reference is taken when the
connection is established.

Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/vmw_vsock/hyperv_transport.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 9c7da811d130f..98f193fd5315e 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -320,6 +320,11 @@ static void hvs_close_connection(struct vmbus_channel *chan)
 	lock_sock(sk);
 	hvs_do_close_lock_held(vsock_sk(sk), true);
 	release_sock(sk);
+
+	/* Release the refcnt for the channel that's opened in
+	 * hvs_open_connection().
+	 */
+	sock_put(sk);
 }
 
 static void hvs_open_connection(struct vmbus_channel *chan)
@@ -388,6 +393,9 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 	}
 
 	set_per_channel_state(chan, conn_from_host ? new : sk);
+
+	/* This reference will be dropped by hvs_close_connection(). */
+	sock_hold(conn_from_host ? new : sk);
 	vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
 
 	/* Set the pending send size to max packet size to always get
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Horatiu Vultur @ 2019-08-29 10:56 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: alexandre.belloni, UNGLinuxDriver, davem, andrew, allan.nielsen,
	ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829095100.GH2312@nanopsycho>

The 08/29/2019 11:51, Jiri Pirko wrote:
> External E-Mail
> 
> 
> Thu, Aug 29, 2019 at 11:22:28AM CEST, horatiu.vultur@microchip.com wrote:
> >Add the SWITCHDEV_ATTR_ID_PORT_PROMISCUITY switchdev notification type,
> >used to indicate whenever the dev promiscuity counter is changed.
> >
> >The notification doesn't use any switchdev_attr attribute because in the
> >notifier callbacks is it possible to get the dev and read directly
> >the promiscuity value.
> >
> >Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> >---
> > include/net/switchdev.h | 1 +
> > net/core/dev.c          | 9 +++++++++
> > 2 files changed, 10 insertions(+)
> >
> >diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> >index aee86a1..14b1617 100644
> >--- a/include/net/switchdev.h
> >+++ b/include/net/switchdev.h
> >@@ -40,6 +40,7 @@ enum switchdev_attr_id {
> > 	SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
> > 	SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
> > 	SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
> >+	SWITCHDEV_ATTR_ID_PORT_PROMISCUITY,
> > };
> > 
> > struct switchdev_attr {
> >diff --git a/net/core/dev.c b/net/core/dev.c
> >index 49589ed..40c74f2 100644
> >--- a/net/core/dev.c
> >+++ b/net/core/dev.c
> >@@ -142,6 +142,7 @@
> > #include <linux/net_namespace.h>
> > #include <linux/indirect_call_wrapper.h>
> > #include <net/devlink.h>
> >+#include <net/switchdev.h>
> > 
> > #include "net-sysfs.h"
> > 
> >@@ -7377,6 +7378,11 @@ static void dev_change_rx_flags(struct net_device *dev, int flags)
> > static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify)
> > {
> > 	unsigned int old_flags = dev->flags;
> >+	struct switchdev_attr attr = {
> >+		.orig_dev = dev,
> >+		.id = SWITCHDEV_ATTR_ID_PORT_PROMISCUITY,
> >+		.flags = SWITCHDEV_F_DEFER,
> 

Hi Jiri,

> NACK
> 
> This is invalid usecase for switchdev infra. Switchdev is there for
> bridge offload purposes only.
> 
> For promiscuity changes, the infrastructure is already present in the
> code. See __dev_notify_flags(). it calls:
> call_netdevice_notifiers_info(NETDEV_CHANGE, &change_info.info)
> and you can actually see the changed flag in ".flags_changed".
Yes, you are right. But in case the port is part of a bridge and then
enable promisc mode by a user space application(tpcdump) then the drivers
will not be notified. The reason is that the dev->flags will still be the
same(because IFF_PROMISC was already set) and only promiscuity will be
changed.

One fix could be to call __dev_notify_flags() no matter when the
promisc is enable or disabled.

> 
> You just have to register netdev notifier block in your driver. Grep
> for: register_netdevice_notifier
> 
> 
> >+	};
> > 	kuid_t uid;
> > 	kgid_t gid;
> > 
> >@@ -7419,6 +7425,9 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify)
> > 	}
> > 	if (notify)
> > 		__dev_notify_flags(dev, old_flags, IFF_PROMISC);
> >+
> >+	switchdev_port_attr_set(dev, &attr);
> >+
> > 	return 0;
> > }
> > 
> >-- 
> >2.7.4
> >
> 

-- 
/Horatiu

^ permalink raw reply

* [PATCH bpf-next 0/3] tools: bpftool: improve bpftool build experience
From: Quentin Monnet @ 2019-08-29 10:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet, Lorenz Bauer,
	Ilya Leoshkevich, Jakub Kicinski

Hi,
This set attempts to make it easier to build bpftool, in particular when
passing a specific output directory. This is a follow-up to the
conversation held last month by Lorenz, Ilya and Jakub [0].

The first patch is a minor fix to bpftool's Makefile, regarding the
retrieval of kernel version (which currently prints a non-relevant make
warning on some invocations).

Second patch improves the Makefile commands to support more "make"
invocations, or to fix building with custom output directory. On Jakub's
suggestion, a script is also added to BPF selftests in order to keep track
of the supported build variants.

At last, third patch is a sligthly modified version of Ilya's fix regarding
libbpf.a appearing twice on the linking command for bpftool.

[0] https://lore.kernel.org/bpf/CACAyw9-CWRHVH3TJ=Tke2x8YiLsH47sLCijdp=V+5M836R9aAA@mail.gmail.com/

Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>

Quentin Monnet (3):
  tools: bpftool: ignore make built-in rules for getting kernel version
  tools: bpftool: improve and check builds for different make
    invocations
  tools: bpftool: do not link twice against libbpf.a in Makefile

 tools/bpf/bpftool/Makefile                    |  18 ++-
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/test_bpftool_build.sh       | 137 ++++++++++++++++++
 3 files changed, 149 insertions(+), 9 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/test_bpftool_build.sh

-- 
2.17.1

^ permalink raw reply

* [PATCH bpf-next 1/3] tools: bpftool: ignore make built-in rules for getting kernel version
From: Quentin Monnet @ 2019-08-29 10:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet, Lorenz Bauer,
	Ilya Leoshkevich, Jakub Kicinski
In-Reply-To: <20190829105645.12285-1-quentin.monnet@netronome.com>

Bpftool calls the toplevel Makefile to get the kernel version for the
sources it is built from. But when the utility is built from the top of
the kernel repository, it may dump the following error message for
certain architectures (including x86):

    $ make tools/bpf
    [...]
    make[3]: *** [checkbin] Error 1
    [...]

This does not prevent bpftool compilation, but may feel disconcerting.
The "checkbin" arch-dependent target is not supposed to be called for
target "kernelversion", which is a simple "echo" of the version number.

It turns out this is caused by the make invocation in tools/bpf/bpftool,
which attempts to find implicit rules to apply. Extract from debug
output:

    Reading makefiles...
    Reading makefile 'Makefile'...
    Reading makefile 'scripts/Kbuild.include' (search path) (no ~ expansion)...
    Reading makefile 'scripts/subarch.include' (search path) (no ~ expansion)...
    Reading makefile 'arch/x86/Makefile' (search path) (no ~ expansion)...
    Reading makefile 'scripts/Makefile.kcov' (search path) (no ~ expansion)...
    Reading makefile 'scripts/Makefile.gcc-plugins' (search path) (no ~ expansion)...
    Reading makefile 'scripts/Makefile.kasan' (search path) (no ~ expansion)...
    Reading makefile 'scripts/Makefile.extrawarn' (search path) (no ~ expansion)...
    Reading makefile 'scripts/Makefile.ubsan' (search path) (no ~ expansion)...
    Updating makefiles....
     Considering target file 'scripts/Makefile.ubsan'.
      Looking for an implicit rule for 'scripts/Makefile.ubsan'.
      Trying pattern rule with stem 'Makefile.ubsan'.
    [...]
      Trying pattern rule with stem 'Makefile.ubsan'.
      Trying implicit prerequisite 'scripts/Makefile.ubsan.o'.
      Looking for a rule with intermediate file 'scripts/Makefile.ubsan.o'.
       Avoiding implicit rule recursion.
       Trying pattern rule with stem 'Makefile.ubsan'.
       Trying rule prerequisite 'prepare'.
       Trying rule prerequisite 'FORCE'.
      Found an implicit rule for 'scripts/Makefile.ubsan'.
        Considering target file 'prepare'.
         File 'prepare' does not exist.
          Considering target file 'prepare0'.
           File 'prepare0' does not exist.
            Considering target file 'archprepare'.
             File 'archprepare' does not exist.
              Considering target file 'archheaders'.
               File 'archheaders' does not exist.
               Finished prerequisites of target file 'archheaders'.
              Must remake target 'archheaders'.
    Putting child 0x55976f4f6980 (archheaders) PID 31743 on the chain.

To avoid that, pass the -r and -R flags to eliminate the use of make
built-in rules (and while at it, built-in variables) when running
command "make kernelversion" from bpftool's Makefile.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/bpf/bpftool/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index f284c207765a..cd0fc05464e7 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -24,7 +24,7 @@ endif
 
 LIBBPF = $(BPF_PATH)libbpf.a
 
-BPFTOOL_VERSION := $(shell make --no-print-directory -sC ../../.. kernelversion)
+BPFTOOL_VERSION := $(shell make -rR --no-print-directory -sC ../../.. kernelversion)
 
 $(LIBBPF): FORCE
 	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next 2/3] tools: bpftool: improve and check builds for different make invocations
From: Quentin Monnet @ 2019-08-29 10:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet, Lorenz Bauer,
	Ilya Leoshkevich, Jakub Kicinski
In-Reply-To: <20190829105645.12285-1-quentin.monnet@netronome.com>

There are a number of alternative "make" invocations that can be used to
compile bpftool. The following invocations are expected to work:

  - through the kbuild system, from the top of the repository
    (make tools/bpf)
  - by telling make to change to the bpftool directory
    (make -C tools/bpf/bpftool)
  - by building the BPF tools from tools/
    (cd tools && make bpf)
  - by running make from bpftool directory
    (cd tools/bpf/bpftool && make)

Additionally, setting the O or OUTPUT variables should tell the build
system to use a custom output path, for each of these alternatives.

The following patch fixes the following invocations:

  $ make tools/bpf
  $ make tools/bpf O=<dir>
  $ make -C tools/bpf/bpftool OUTPUT=<dir>
  $ make -C tools/bpf/bpftool O=<dir>
  $ cd tools/ && make bpf O=<dir>
  $ cd tools/bpf/bpftool && make OUTPUT=<dir>
  $ cd tools/bpf/bpftool && make O=<dir>

After this commit, the build still fails for two variants when passing
the OUTPUT variable:

  $ make tools/bpf OUTPUT=<dir>
  $ cd tools/ && make bpf OUTPUT=<dir>

In order to remember and check what make invocations are supposed to
work, and to document the ones which do not, a new script is added to
the BPF selftests. Note that some invocations require the kernel to be
configured, so the script skips them if no .config file is found.

Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/bpf/bpftool/Makefile                    |  12 +-
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/test_bpftool_build.sh       | 137 ++++++++++++++++++
 3 files changed, 146 insertions(+), 6 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/test_bpftool_build.sh

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index cd0fc05464e7..3fc82ff9b52c 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -17,21 +17,23 @@ endif
 BPF_DIR = $(srctree)/tools/lib/bpf/
 
 ifneq ($(OUTPUT),)
-  BPF_PATH = $(OUTPUT)
+  LIBBPF_OUTPUT = $(OUTPUT)/libbpf/
+  LIBBPF_PATH = $(LIBBPF_OUTPUT)
 else
-  BPF_PATH = $(BPF_DIR)
+  LIBBPF_PATH = $(BPF_DIR)
 endif
 
-LIBBPF = $(BPF_PATH)libbpf.a
+LIBBPF = $(LIBBPF_PATH)libbpf.a
 
 BPFTOOL_VERSION := $(shell make -rR --no-print-directory -sC ../../.. kernelversion)
 
 $(LIBBPF): FORCE
-	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a
+	$(if $(LIBBPF_OUTPUT),@mkdir -p $(LIBBPF_OUTPUT))
+	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(LIBBPF_OUTPUT) $(LIBBPF_OUTPUT)libbpf.a
 
 $(LIBBPF)-clean:
 	$(call QUIET_CLEAN, libbpf)
-	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) clean >/dev/null
+	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(LIBBPF_OUTPUT) clean >/dev/null
 
 prefix ?= /usr/local
 bash_compdir ?= /usr/share/bash-completion/completions
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1faad0c3c3c9..c7595b4ed55d 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -63,7 +63,8 @@ TEST_PROGS := test_kmod.sh \
 	test_tcp_check_syncookie.sh \
 	test_tc_tunnel.sh \
 	test_tc_edt.sh \
-	test_xdping.sh
+	test_xdping.sh \
+	test_bpftool_build.sh
 
 TEST_PROGS_EXTENDED := with_addr.sh \
 	with_tunnels.sh \
diff --git a/tools/testing/selftests/bpf/test_bpftool_build.sh b/tools/testing/selftests/bpf/test_bpftool_build.sh
new file mode 100755
index 000000000000..15dbea809a38
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_bpftool_build.sh
@@ -0,0 +1,137 @@
+#!/bin/bash
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+ERROR=0
+TMPDIR=
+
+# If one build fails, continue but return non-0 on exit.
+return_value() {
+	if [ -d "$TMPDIR" ] ; then
+		rm -rf -- $TMPDIR
+	fi
+	exit $ERROR
+}
+trap return_value EXIT
+
+case $1 in
+	-h|--help)
+		echo -e "$0 [-j <n>]"
+		echo -e "\tTest the different ways of building bpftool."
+		echo -e ""
+		echo -e "\tOptions:"
+		echo -e "\t\t-j <n>:\tPass -j flag to 'make'."
+		exit
+		;;
+esac
+
+J=$*
+
+# Assume script is located under tools/testing/selftests/bpf/. We want to start
+# build attempts from the top of kernel repository.
+SCRIPT_REL_PATH=$(realpath --relative-to=$PWD $0)
+SCRIPT_REL_DIR=$(dirname $SCRIPT_REL_PATH)
+KDIR_ROOT_DIR=$(realpath $PWD/$SCRIPT_REL_DIR/../../../../)
+cd $KDIR_ROOT_DIR
+
+check() {
+	local dir=$(realpath $1)
+
+	echo -n "binary:  "
+	# Returns non-null if file is found (and "false" is run)
+	find $dir -type f -executable -name bpftool -print -exec false {} + && \
+		ERROR=1 && printf "FAILURE: Did not find bpftool\n"
+}
+
+make_and_clean() {
+	echo -e "\$PWD:    $PWD"
+	echo -e "command: make -s $* >/dev/null"
+	make $J -s $* >/dev/null
+	if [ $# -ge 1 ] ; then
+		check ${@: -1}
+	else
+		check .
+	fi
+	(
+		if [ $# -eq 1 ] ; then
+			cd ${@: -1}/bpftool
+		fi
+		make -s clean
+	)
+	echo
+}
+
+make_with_tmpdir() {
+	local ARGS
+
+	TMPDIR=$(mktemp -d)
+	if [ $# -ge 2 ] ; then
+		ARGS=${@:1:(($# - 1))}
+	fi
+	echo -e "\$PWD:    $PWD"
+	echo -e "command: make -s $ARGS ${@: -1}=$TMPDIR/ >/dev/null"
+	make $J -s $ARGS ${@: -1}=$TMPDIR/ >/dev/null
+	check $TMPDIR
+	rm -rf -- $TMPDIR
+	echo
+}
+
+echo "Trying to build bpftool"
+echo -e "... through kbuild\n"
+
+if [ -f ".config" ] ; then
+	make_and_clean tools/bpf
+
+	## $OUTPUT is overwritten in kbuild Makefile, and thus cannot be passed
+	## down from toplevel Makefile to bpftool's Makefile.
+
+	# make_with_tmpdir tools/bpf OUTPUT
+	echo -e "skip:    make tools/bpf OUTPUT=<dir> (not supported)\n"
+
+	make_with_tmpdir tools/bpf O
+else
+	echo -e "skip:    make tools/bpf (no .config found)\n"
+	echo -e "skip:    make tools/bpf OUTPUT=<dir> (not supported)\n"
+	echo -e "skip:    make tools/bpf O=<dir> (no .config found)\n"
+fi
+
+echo -e "... from kernel source tree\n"
+
+make_and_clean -C tools/bpf/bpftool
+
+make_with_tmpdir -C tools/bpf/bpftool OUTPUT
+
+make_with_tmpdir -C tools/bpf/bpftool O
+
+echo -e "... from tools/\n"
+cd tools/
+
+make_and_clean bpf
+
+## In tools/bpf/Makefile, function "descend" is called and passes $(O) and
+## $(OUTPUT). We would like $(OUTPUT) to have "bpf/bpftool/" appended before
+## calling bpftool's Makefile, but this is not the case as the "descend"
+## function focuses on $(O)/$(subdir). However, in the present case, updating
+## $(O) to have $(OUTPUT) recomputed from it in bpftool's Makefile does not
+## work, because $(O) is not defined from command line and $(OUTPUT) is not
+## updated in tools/scripts/Makefile.include.
+##
+## Workarounds would require to a) edit "descend" or use an alternative way to
+## call bpftool's Makefile, b) modify the conditions to update $(OUTPUT) and
+## other variables in tools/scripts/Makefile.include (at the risk of breaking
+## the build of other tools), or c) append manually the "bpf/bpftool" suffix to
+## $(OUTPUT) in bpf's Makefile, which may break if targets for other directories
+## use "descend" in the future.
+
+# make_with_tmpdir bpf OUTPUT
+echo -e "skip:    make bpf OUTPUT=<dir> (not supported)\n"
+
+make_with_tmpdir bpf O
+
+echo -e "... from bpftool's dir\n"
+cd bpf/bpftool
+
+make_and_clean
+
+make_with_tmpdir OUTPUT
+
+make_with_tmpdir O
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next 3/3] tools: bpftool: do not link twice against libbpf.a in Makefile
From: Quentin Monnet @ 2019-08-29 10:56 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet, Lorenz Bauer,
	Ilya Leoshkevich, Jakub Kicinski
In-Reply-To: <20190829105645.12285-1-quentin.monnet@netronome.com>

In bpftool's Makefile, $(LIBS) includes $(LIBBPF), therefore the library
is used twice in the linking command. No need to have $(LIBBPF) (from
$^) on that command, let's do with "$(OBJS) $(LIBS)" (but move $(LIBBPF)
_before_ the -l flags in $(LIBS)).

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/bpf/bpftool/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 3fc82ff9b52c..22b5a8f2c53d 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -55,7 +55,7 @@ ifneq ($(EXTRA_LDFLAGS),)
 LDFLAGS += $(EXTRA_LDFLAGS)
 endif
 
-LIBS = -lelf -lz $(LIBBPF)
+LIBS = $(LIBBPF) -lelf -lz
 
 INSTALL ?= install
 RM ?= rm -f
@@ -117,7 +117,7 @@ $(OUTPUT)disasm.o: $(srctree)/kernel/bpf/disasm.c
 $(OUTPUT)feature.o: | zdep
 
 $(OUTPUT)bpftool: $(OBJS) $(LIBBPF)
-	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^ $(LIBS)
+	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJS) $(LIBS)
 
 $(OUTPUT)%.o: %.c
 	$(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 5/6] mdev: Update sysfs documentation
From: Parav Pandit @ 2019-08-29 11:19 UTC (permalink / raw)
  To: alex.williamson, jiri, kwankhede, cohuck, davem
  Cc: kvm, linux-kernel, netdev, Parav Pandit
In-Reply-To: <20190829111904.16042-1-parav@mellanox.com>

Updated documentation for optional read only sysfs attribute.

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 Documentation/driver-api/vfio-mediated-device.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 25eb7d5b834b..0ab03d3f5629 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -270,6 +270,7 @@ Directories and Files Under the sysfs for Each mdev Device
          |--- remove
          |--- mdev_type {link to its type}
          |--- vendor-specific-attributes [optional]
+         |--- alias [optional]
 
 * remove (write only)
 
@@ -281,6 +282,10 @@ Example::
 
 	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
 
+* alias (read only)
+Whenever a parent requested to generate an alias, each mdev is assigned a unique
+alias by the mdev core. This file shows the alias of the mdev device.
+
 Mediated device Hot plug
 ------------------------
 
-- 
2.19.2


^ permalink raw reply related

* [PATCH v2 6/6] mtty: Optionally support mtty alias
From: Parav Pandit @ 2019-08-29 11:19 UTC (permalink / raw)
  To: alex.williamson, jiri, kwankhede, cohuck, davem
  Cc: kvm, linux-kernel, netdev, Parav Pandit
In-Reply-To: <20190829111904.16042-1-parav@mellanox.com>

Provide a module parameter to set alias length to optionally generate
mdev alias.

Example to request mdev alias.
$ modprobe mtty alias_length=12

Make use of mtty_alias() API when alias_length module parameter is set.

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
Changelog:
v1->v2:
 - Added mdev_alias() usage sample
---
 samples/vfio-mdev/mtty.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index 92e770a06ea2..075d65440bc0 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -150,6 +150,10 @@ static const struct file_operations vd_fops = {
 	.owner          = THIS_MODULE,
 };
 
+static unsigned int mtty_alias_length;
+module_param_named(alias_length, mtty_alias_length, uint, 0444);
+MODULE_PARM_DESC(alias_length, "mdev alias length; default=0");
+
 /* function prototypes */
 
 static int mtty_trigger_interrupt(const guid_t *uuid);
@@ -770,6 +774,9 @@ static int mtty_create(struct kobject *kobj, struct mdev_device *mdev)
 	list_add(&mdev_state->next, &mdev_devices_list);
 	mutex_unlock(&mdev_list_lock);
 
+	if (mtty_alias_length)
+		dev_dbg(mdev_dev(mdev), "alias is %s\n", mdev_alias(mdev));
+
 	return 0;
 }
 
@@ -1410,6 +1417,11 @@ static struct attribute_group *mdev_type_groups[] = {
 	NULL,
 };
 
+static unsigned int mtty_get_alias_length(void)
+{
+	return mtty_alias_length;
+}
+
 static const struct mdev_parent_ops mdev_fops = {
 	.owner                  = THIS_MODULE,
 	.dev_attr_groups        = mtty_dev_groups,
@@ -1422,6 +1434,7 @@ static const struct mdev_parent_ops mdev_fops = {
 	.read                   = mtty_read,
 	.write                  = mtty_write,
 	.ioctl		        = mtty_ioctl,
+	.get_alias_length	= mtty_get_alias_length
 };
 
 static void mtty_device_release(struct device *dev)
-- 
2.19.2


^ permalink raw reply related

* [PATCH v2 4/6] mdev: Introduce an API mdev_alias
From: Parav Pandit @ 2019-08-29 11:19 UTC (permalink / raw)
  To: alex.williamson, jiri, kwankhede, cohuck, davem
  Cc: kvm, linux-kernel, netdev, Parav Pandit
In-Reply-To: <20190829111904.16042-1-parav@mellanox.com>

Introduce an API mdev_alias() to provide access to optionally generated
alias.

Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/vfio/mdev/mdev_core.c | 12 ++++++++++++
 include/linux/mdev.h          |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index c9bf2ac362b9..5399ed6f1612 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -517,6 +517,18 @@ struct device *mdev_get_iommu_device(struct device *dev)
 }
 EXPORT_SYMBOL(mdev_get_iommu_device);
 
+/**
+ * mdev_alias: Return alias string of a mdev device
+ * @mdev:	Pointer to the mdev device
+ * mdev_alias() returns alias string of a mdev device if alias is present,
+ * returns NULL otherwise.
+ */
+const char *mdev_alias(struct mdev_device *mdev)
+{
+	return mdev->alias;
+}
+EXPORT_SYMBOL(mdev_alias);
+
 static int __init mdev_init(void)
 {
 	int ret;
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index f036fe9854ee..6da82213bc4e 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -148,5 +148,6 @@ void mdev_unregister_driver(struct mdev_driver *drv);
 struct device *mdev_parent_dev(struct mdev_device *mdev);
 struct device *mdev_dev(struct mdev_device *mdev);
 struct mdev_device *mdev_from_dev(struct device *dev);
+const char *mdev_alias(struct mdev_device *mdev);
 
 #endif /* MDEV_H */
-- 
2.19.2


^ permalink raw reply related

* [PATCH v2 3/6] mdev: Expose mdev alias in sysfs tree
From: Parav Pandit @ 2019-08-29 11:19 UTC (permalink / raw)
  To: alex.williamson, jiri, kwankhede, cohuck, davem
  Cc: kvm, linux-kernel, netdev, Parav Pandit
In-Reply-To: <20190829111904.16042-1-parav@mellanox.com>

Expose the optional alias for an mdev device as a sysfs attribute.
This way, userspace tools such as udev may make use of the alias, for
example to create a netdevice name for the mdev.

Signed-off-by: Parav Pandit <parav@mellanox.com>

---
Changelog:
v0->v1:
 - Addressed comments from Cornelia Huck
 - Updated commit description
---
 drivers/vfio/mdev/mdev_sysfs.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 43afe0e80b76..59f4e3cc5233 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -246,7 +246,20 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR_WO(remove);
 
+static ssize_t alias_show(struct device *device,
+			  struct device_attribute *attr, char *buf)
+{
+	struct mdev_device *dev = mdev_from_dev(device);
+
+	if (!dev->alias)
+		return -EOPNOTSUPP;
+
+	return sprintf(buf, "%s\n", dev->alias);
+}
+static DEVICE_ATTR_RO(alias);
+
 static const struct attribute *mdev_device_attrs[] = {
+	&dev_attr_alias.attr,
 	&dev_attr_remove.attr,
 	NULL,
 };
-- 
2.19.2


^ permalink raw reply related

* [PATCH v2 1/6] mdev: Introduce sha1 based mdev alias
From: Parav Pandit @ 2019-08-29 11:18 UTC (permalink / raw)
  To: alex.williamson, jiri, kwankhede, cohuck, davem
  Cc: kvm, linux-kernel, netdev, Parav Pandit
In-Reply-To: <20190829111904.16042-1-parav@mellanox.com>

Some vendor drivers want an identifier for an mdev device that is
shorter than the UUID, due to length restrictions in the consumers of
that identifier.

Add a callback that allows a vendor driver to request an alias of a
specified length to be generated for an mdev device. If generated,
that alias is checked for collisions.

It is an optional attribute.
mdev alias is generated using sha1 from the mdev name.

Signed-off-by: Parav Pandit <parav@mellanox.com>

---
Changelog:
v1->v2:
 - Kept mdev_device naturally aligned
 - Added error checking for crypt_*() calls
 - Corrected a typo from 'and' to 'an'
 - Changed return type of generate_alias() from int to char*
v0->v1:
 - Moved alias length check outside of the parent lock
 - Moved alias and digest allocation from kvzalloc to kzalloc
 - &alias[0] changed to alias
 - alias_length check is nested under get_alias_length callback check
 - Changed comments to start with an empty line
 - Fixed cleaunup of hash if mdev_bus_register() fails
 - Added comment where alias memory ownership is handed over to mdev device
 - Updated commit log to indicate motivation for this feature
---
 drivers/vfio/mdev/mdev_core.c    | 123 ++++++++++++++++++++++++++++++-
 drivers/vfio/mdev/mdev_private.h |   5 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  13 ++--
 include/linux/mdev.h             |   4 +
 4 files changed, 135 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..3bdff0469607 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -10,9 +10,11 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 #include <linux/uuid.h>
 #include <linux/sysfs.h>
 #include <linux/mdev.h>
+#include <crypto/hash.h>
 
 #include "mdev_private.h"
 
@@ -27,6 +29,8 @@ static struct class_compat *mdev_bus_compat_class;
 static LIST_HEAD(mdev_list);
 static DEFINE_MUTEX(mdev_list_lock);
 
+static struct crypto_shash *alias_hash;
+
 struct device *mdev_parent_dev(struct mdev_device *mdev)
 {
 	return mdev->parent->dev;
@@ -150,6 +154,16 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
 		return -EINVAL;
 
+	if (ops->get_alias_length) {
+		unsigned int digest_size;
+		unsigned int aligned_len;
+
+		aligned_len = roundup(ops->get_alias_length(), 2);
+		digest_size = crypto_shash_digestsize(alias_hash);
+		if (aligned_len / 2 > digest_size)
+			return -EINVAL;
+	}
+
 	dev = get_device(dev);
 	if (!dev)
 		return -EINVAL;
@@ -259,6 +273,7 @@ static void mdev_device_free(struct mdev_device *mdev)
 	mutex_unlock(&mdev_list_lock);
 
 	dev_dbg(&mdev->dev, "MDEV: destroying\n");
+	kfree(mdev->alias);
 	kfree(mdev);
 }
 
@@ -269,18 +284,101 @@ static void mdev_device_release(struct device *dev)
 	mdev_device_free(mdev);
 }
 
-int mdev_device_create(struct kobject *kobj,
-		       struct device *dev, const guid_t *uuid)
+static const char *
+generate_alias(const char *uuid, unsigned int max_alias_len)
+{
+	struct shash_desc *hash_desc;
+	unsigned int digest_size;
+	unsigned char *digest;
+	unsigned int alias_len;
+	char *alias;
+	int ret;
+
+	/*
+	 * Align to multiple of 2 as bin2hex will generate
+	 * even number of bytes.
+	 */
+	alias_len = roundup(max_alias_len, 2);
+	alias = kzalloc(alias_len + 1, GFP_KERNEL);
+	if (!alias)
+		return ERR_PTR(-ENOMEM);
+
+	/* Allocate and init descriptor */
+	hash_desc = kvzalloc(sizeof(*hash_desc) +
+			     crypto_shash_descsize(alias_hash),
+			     GFP_KERNEL);
+	if (!hash_desc) {
+		ret = -ENOMEM;
+		goto desc_err;
+	}
+
+	hash_desc->tfm = alias_hash;
+
+	digest_size = crypto_shash_digestsize(alias_hash);
+
+	digest = kzalloc(digest_size, GFP_KERNEL);
+	if (!digest) {
+		ret = -ENOMEM;
+		goto digest_err;
+	}
+	ret = crypto_shash_init(hash_desc);
+	if (ret)
+		goto hash_err;
+
+	ret = crypto_shash_update(hash_desc, uuid, UUID_STRING_LEN);
+	if (ret)
+		goto hash_err;
+
+	ret = crypto_shash_final(hash_desc, digest);
+	if (ret)
+		goto hash_err;
+
+	bin2hex(alias, digest, min_t(unsigned int, digest_size, alias_len / 2));
+	/*
+	 * When alias length is odd, zero out an additional last byte
+	 * that bin2hex has copied.
+	 */
+	if (max_alias_len % 2)
+		alias[max_alias_len] = 0;
+
+	kfree(digest);
+	kvfree(hash_desc);
+	return alias;
+
+hash_err:
+	kfree(digest);
+digest_err:
+	kvfree(hash_desc);
+desc_err:
+	kfree(alias);
+	return ERR_PTR(ret);
+}
+
+int mdev_device_create(struct kobject *kobj, struct device *dev,
+		       const char *uuid_str, const guid_t *uuid)
 {
 	int ret;
 	struct mdev_device *mdev, *tmp;
 	struct mdev_parent *parent;
 	struct mdev_type *type = to_mdev_type(kobj);
+	const char *alias = NULL;
 
 	parent = mdev_get_parent(type->parent);
 	if (!parent)
 		return -EINVAL;
 
+	if (parent->ops->get_alias_length) {
+		unsigned int alias_len;
+
+		alias_len = parent->ops->get_alias_length();
+		if (alias_len) {
+			alias = generate_alias(uuid_str, alias_len);
+			if (IS_ERR(alias)) {
+				ret = PTR_ERR(alias);
+				goto alias_fail;
+			}
+		}
+	}
 	mutex_lock(&mdev_list_lock);
 
 	/* Check for duplicate */
@@ -300,6 +398,12 @@ int mdev_device_create(struct kobject *kobj,
 	}
 
 	guid_copy(&mdev->uuid, uuid);
+	mdev->alias = alias;
+	/*
+	 * At this point alias memory is owned by the mdev.
+	 * Mark it NULL, so that only mdev can free it.
+	 */
+	alias = NULL;
 	list_add(&mdev->next, &mdev_list);
 	mutex_unlock(&mdev_list_lock);
 
@@ -346,6 +450,8 @@ int mdev_device_create(struct kobject *kobj,
 	up_read(&parent->unreg_sem);
 	put_device(&mdev->dev);
 mdev_fail:
+	kfree(alias);
+alias_fail:
 	mdev_put_parent(parent);
 	return ret;
 }
@@ -406,7 +512,17 @@ EXPORT_SYMBOL(mdev_get_iommu_device);
 
 static int __init mdev_init(void)
 {
-	return mdev_bus_register();
+	int ret;
+
+	alias_hash = crypto_alloc_shash("sha1", 0, 0);
+	if (!alias_hash)
+		return -ENOMEM;
+
+	ret = mdev_bus_register();
+	if (ret)
+		crypto_free_shash(alias_hash);
+
+	return ret;
 }
 
 static void __exit mdev_exit(void)
@@ -415,6 +531,7 @@ static void __exit mdev_exit(void)
 		class_compat_unregister(mdev_bus_compat_class);
 
 	mdev_bus_unregister();
+	crypto_free_shash(alias_hash);
 }
 
 module_init(mdev_init)
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 7d922950caaf..078fdaf7836e 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -32,6 +32,7 @@ struct mdev_device {
 	struct list_head next;
 	struct kobject *type_kobj;
 	struct device *iommu_device;
+	const char *alias;
 	bool active;
 };
 
@@ -57,8 +58,8 @@ void parent_remove_sysfs_files(struct mdev_parent *parent);
 int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type);
 void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type);
 
-int  mdev_device_create(struct kobject *kobj,
-			struct device *dev, const guid_t *uuid);
+int mdev_device_create(struct kobject *kobj, struct device *dev,
+		       const char *uuid_str, const guid_t *uuid);
 int  mdev_device_remove(struct device *dev);
 
 #endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 7570c7602ab4..43afe0e80b76 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -63,15 +63,18 @@ static ssize_t create_store(struct kobject *kobj, struct device *dev,
 		return -ENOMEM;
 
 	ret = guid_parse(str, &uuid);
-	kfree(str);
 	if (ret)
-		return ret;
+		goto err;
 
-	ret = mdev_device_create(kobj, dev, &uuid);
+	ret = mdev_device_create(kobj, dev, str, &uuid);
 	if (ret)
-		return ret;
+		goto err;
 
-	return count;
+	ret = count;
+
+err:
+	kfree(str);
+	return ret;
 }
 
 MDEV_TYPE_ATTR_WO(create);
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0ce30ca78db0..f036fe9854ee 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -72,6 +72,9 @@ struct device *mdev_get_iommu_device(struct device *dev);
  * @mmap:		mmap callback
  *			@mdev: mediated device structure
  *			@vma: vma structure
+ * @get_alias_length:	Generate alias for the mdevs of this parent based on the
+ *			mdev device name when it returns non zero alias length.
+ *			It is optional.
  * Parent device that support mediated device should be registered with mdev
  * module with mdev_parent_ops structure.
  **/
@@ -92,6 +95,7 @@ struct mdev_parent_ops {
 	long	(*ioctl)(struct mdev_device *mdev, unsigned int cmd,
 			 unsigned long arg);
 	int	(*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
+	unsigned int (*get_alias_length)(void);
 };
 
 /* interface for exporting mdev supported type attributes */
-- 
2.19.2


^ permalink raw reply related

* [PATCH v2 2/6] mdev: Make mdev alias unique among all mdevs
From: Parav Pandit @ 2019-08-29 11:19 UTC (permalink / raw)
  To: alex.williamson, jiri, kwankhede, cohuck, davem
  Cc: kvm, linux-kernel, netdev, Parav Pandit
In-Reply-To: <20190829111904.16042-1-parav@mellanox.com>

Mdev alias should be unique among all the mdevs, so that when such alias
is used by the mdev users to derive other objects, there is no
collision in a given system.

Signed-off-by: Parav Pandit <parav@mellanox.com>

---
Changelog:
v1->v2:
 - Moved alias NULL check at beginning
v0->v1:
 - Fixed inclusiong of alias for NULL check
 - Added ratelimited debug print for sha1 hash collision error
---
 drivers/vfio/mdev/mdev_core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 3bdff0469607..c9bf2ac362b9 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -388,6 +388,13 @@ int mdev_device_create(struct kobject *kobj, struct device *dev,
 			ret = -EEXIST;
 			goto mdev_fail;
 		}
+		if (alias && tmp->alias && strcmp(alias, tmp->alias) == 0) {
+			mutex_unlock(&mdev_list_lock);
+			ret = -EEXIST;
+			dev_dbg_ratelimited(dev, "Hash collision in alias creation for UUID %pUl\n",
+					    uuid);
+			goto mdev_fail;
+		}
 	}
 
 	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
-- 
2.19.2


^ permalink raw reply related

* [PATCH v2 0/6] Introduce variable length mdev alias
From: Parav Pandit @ 2019-08-29 11:18 UTC (permalink / raw)
  To: alex.williamson, jiri, kwankhede, cohuck, davem
  Cc: kvm, linux-kernel, netdev, Parav Pandit
In-Reply-To: <20190826204119.54386-1-parav@mellanox.com>

To have consistent naming for the netdevice of a mdev and to have
consistent naming of the devlink port [1] of a mdev, which is formed using
phys_port_name of the devlink port, current UUID is not usable because
UUID is too long.

UUID in string format is 36-characters long and in binary 128-bit.
Both formats are not able to fit within 15 characters limit of netdev
name.

It is desired to have mdev device naming consistent using UUID.
So that widely used user space framework such as ovs [2] can make use
of mdev representor in similar way as PCIe SR-IOV VF and PF representors.

Hence,
(a) mdev alias is created which is derived using sha1 from the mdev name.
(b) Vendor driver describes how long an alias should be for the child mdev
created for a given parent.
(c) Mdev aliases are unique at system level.
(d) alias is created optionally whenever parent requested.
This ensures that non networking mdev parents can function without alias
creation overhead.

This design is discussed at [3].

An example systemd/udev extension will have,

1. netdev name created using mdev alias available in sysfs.

mdev UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
mdev 12 character alias=cd5b146a80a5

netdev name of this mdev = enmcd5b146a80a5
Here en = Ethernet link
m = mediated device

2. devlink port phys_port_name created using mdev alias.
devlink phys_port_name=pcd5b146a80a5

This patchset enables mdev core to maintain unique alias for a mdev.

Patch-1 Introduces mdev alias using sha1.
Patch-2 Ensures that mdev alias is unique in a system.
Patch-3 Exposes mdev alias in a sysfs hirerchy.
Patch-4 Introduces mdev_alias() API.
Patch-5 Updated sysfs documentation
Patch-6 Extends mtty driver to optionally provide alias generation.
This also enables to test UUID based sha1 collision and trigger
error handling for duplicate sha1 results.

[1] http://man7.org/linux/man-pages/man8/devlink-port.8.html
[2] https://docs.openstack.org/os-vif/latest/user/plugins/ovs.html
[3] https://patchwork.kernel.org/cover/11084231/

---
Changelog:
v1->v2:
 - Corrected a typo from 'and' to 'an'
 - Addressed comments from Alex Williamson
 - Kept mdev_device naturally aligned
 - Added error checking for crypt_*() calls
 - Moved alias NULL check at beginning
 - Added mdev_alias() API
 - Updated mtty driver to show example mdev_alias() usage
 - Changed return type of generate_alias() from int to char*
v0->v1:
 - Addressed comments from Alex Williamson, Cornelia Hunk and Mark Bloch
 - Moved alias length check outside of the parent lock
 - Moved alias and digest allocation from kvzalloc to kzalloc
 - &alias[0] changed to alias
 - alias_length check is nested under get_alias_length callback check
 - Changed comments to start with an empty line
 - Added comment where alias memory ownership is handed over to mdev device
 - Fixed cleaunup of hash if mdev_bus_register() fails
 - Updated documentation for new sysfs alias file
 - Improved commit logs to make description more clear
 - Fixed inclusiong of alias for NULL check
 - Added ratelimited debug print for sha1 hash collision error

Parav Pandit (6):
  mdev: Introduce sha1 based mdev alias
  mdev: Make mdev alias unique among all mdevs
  mdev: Expose mdev alias in sysfs tree
  mdev: Introduce an API mdev_alias
  mdev: Update sysfs documentation
  mtty: Optionally support mtty alias

 .../driver-api/vfio-mediated-device.rst       |   5 +
 drivers/vfio/mdev/mdev_core.c                 | 142 +++++++++++++++++-
 drivers/vfio/mdev/mdev_private.h              |   5 +-
 drivers/vfio/mdev/mdev_sysfs.c                |  26 +++-
 include/linux/mdev.h                          |   5 +
 samples/vfio-mdev/mtty.c                      |  13 ++
 6 files changed, 186 insertions(+), 10 deletions(-)

-- 
2.19.2

^ permalink raw reply

* RE: [PATCH] powerpc/kmcent2: update the ethernet devices' phy properties
From: Madalin-cristian Bucur @ 2019-08-29 11:25 UTC (permalink / raw)
  To: Scott Wood, Valentin Longchamp
  Cc: linuxppc-dev@lists.ozlabs.org, galak@kernel.crashing.org,
	netdev@vger.kernel.org
In-Reply-To: <b535dcfc3d6b06ca583dc26703d4adc958eacdd8.camel@buserror.net>

> -----Original Message-----
> From: Scott Wood <oss@buserror.net>
> Sent: Wednesday, August 28, 2019 7:19 AM
> To: Valentin Longchamp <valentin@longchamp.me>; Madalin-cristian Bucur
> <madalin.bucur@nxp.com>
> Cc: linuxppc-dev@lists.ozlabs.org; galak@kernel.crashing.org;
> netdev@vger.kernel.org
> Subject: Re: [PATCH] powerpc/kmcent2: update the ethernet devices' phy
> properties
> 
> On Thu, 2019-08-08 at 23:09 +0200, Valentin Longchamp wrote:
> > Le mar. 30 juil. 2019 à 11:44, Madalin-cristian Bucur
> > <madalin.bucur@nxp.com> a écrit :
> > >
> > > > -----Original Message-----
> > > >
> > > > > Le dim. 14 juil. 2019 à 22:05, Valentin Longchamp
> > > > > <valentin@longchamp.me> a écrit :
> > > > > >
> > > > > > Change all phy-connection-type properties to phy-mode that are
> > > > > > better
> > > > > > supported by the fman driver.
> > > > > >
> > > > > > Use the more readable fixed-link node for the 2 sgmii links.
> > > > > >
> > > > > > Change the RGMII link to rgmii-id as the clock delays are added
> by
> > > > > > the
> > > > > > phy.
> > > > > >
> > > > > > Signed-off-by: Valentin Longchamp <valentin@longchamp.me>
> > > >
> > > > I don't see any other uses of phy-mode in arch/powerpc/boot/dts/fsl,
> and
> > > > I see
> > > > lots of phy-connection-type with fman.  Madalin, does this patch
> look
> > > > OK?
> > > >
> > > > -Scott
> > >
> > > Hi,
> > >
> > > we are using "phy-connection-type" not "phy-mode" for the NXP (former
> > > Freescale)
> > > DPAA platforms. While the two seem to be interchangeable ("phy-mode"
> seems
> > > to be
> > > more recent, looking at the device tree bindings), the driver code in
> > > Linux seems
> > > to use one or the other, not both so one should stick with the variant
> the
> > > driver
> > > is using. To make things more complex, there may be dependencies in
> > > bootloaders,
> > > I see code in u-boot using only "phy-connection-type" or only "phy-
> mode".
> > >
> > > I'd leave "phy-connection-type" as is.
> >
> > So I have finally had time to have a look and now I understand what
> > happens. You are right, there are bootloader dependencies: u-boot
> > calls fdt_fixup_phy_connection() that somehow in our case adds (or
> > changes if already in the device tree) the phy-connection-type
> > property to a wrong value ! By having a phy-mode in the device tree,
> > that is not changed by u-boot and by chance picked up by the kernel
> > fman driver (of_get_phy_mode() ) over phy-connection-mode, the below
> > patch fixes it for us.
> >
> > I agree with you, it's not correct to have both phy-connection-type
> > and phy-mode. Ideally, u-boot on the board should be reworked so that
> > it does not perform the above wrong fixup. However, in an "unfixed"
> > .dtb (I have disabled fdt_fixup_phy_connection), the device tree in
> > the end only has either phy-connection-type or phy-mode, according to
> > what was chosen in the .dts file. And the fman driver works well with
> > both (thanks to the call to of_get_phy_mode() ). I would therefore
> > argue that even if all other DPAA platforms use phy-connection-type,
> > phy-mode is valid as well. (Furthermore we already have hundreds of
> > such boards in the field and we don't really support "remote" u-boot
> > update, so the u-boot fix is going to be difficult for us to pull).
> >
> > Valentin
> 
> Madalin, are you OK with the patch given this explanation?
> 
> -Scott
> 

Yes, I understand that it's the only option they have, given the inability
to upgrade u-boot (this may prove to be an issue in the future, in other
situations).

Acked-by: Madalin Bucur <madalin.bucur@nxp.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox