Netdev List
 help / color / mirror / Atom feed
* [PATCH 1/1] net: ethernet: neterion: vxge: fix spelling mistake
From: Flavio Suligoi @ 2020-06-19  9:02 UTC (permalink / raw)
  To: Jon Mason, David S . Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Zheng Wei
  Cc: netdev, linux-kernel, bpf, Flavio Suligoi

Fix typo: "trigered" --> "triggered"

Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
---
 drivers/net/ethernet/neterion/vxge/vxge-config.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-config.h b/drivers/net/ethernet/neterion/vxge/vxge-config.h
index 628fa9b2f741..373165119850 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-config.h
+++ b/drivers/net/ethernet/neterion/vxge/vxge-config.h
@@ -297,7 +297,7 @@ struct vxge_hw_fifo_config {
  * @greedy_return: If Set it forces the device to return absolutely all RxD
  *             that are consumed and still on board when a timer interrupt
  *             triggers. If Clear, then if the device has already returned
- *             RxD before current timer interrupt trigerred and after the
+ *             RxD before current timer interrupt triggered and after the
  *             previous timer interrupt triggered, then the device is not
  *             forced to returned the rest of the consumed RxD that it has
  *             on board which account for a byte count less than the one
-- 
2.17.1


^ permalink raw reply related

* [PATCH 1/1] net: ethernet: oki-semi: pch_gbe: fix spelling mistake
From: Flavio Suligoi @ 2020-06-19  9:18 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski; +Cc: netdev, linux-kernel, Flavio Suligoi

Fix typo: "Triger" --> "Trigger"

Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
---
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
index 32b9d7705404..55cef5b16aa5 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
@@ -147,7 +147,7 @@ struct pch_gbe_regs {
 #define PCH_GBE_RH_ALM_FULL_8   0x00001000      /* 8 words */
 #define PCH_GBE_RH_ALM_FULL_16  0x00002000      /* 16 words */
 #define PCH_GBE_RH_ALM_FULL_32  0x00003000      /* 32 words */
-/* RX FIFO Read Triger Threshold */
+/* RX FIFO Read Trigger Threshold */
 #define PCH_GBE_RH_RD_TRG_4     0x00000000      /* 4 words */
 #define PCH_GBE_RH_RD_TRG_8     0x00000200      /* 8 words */
 #define PCH_GBE_RH_RD_TRG_16    0x00000400      /* 16 words */
-- 
2.17.1


^ permalink raw reply related

* Re: [RFC PATCH] net/sched: add indirect call wrapper hint.
From: Paolo Abeni @ 2020-06-19  9:24 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: David S. Miller, Eric Dumazet
In-Reply-To: <0240746c-1dd1-4822-261c-03ff13854db2@gmail.com>

On Thu, 2020-06-18 at 12:00 -0700, Eric Dumazet wrote:
> 
> On 6/18/20 10:31 AM, Paolo Abeni wrote:
> > The sched layer can use several indirect calls per
> > packet, with not work-conservative qdisc being
> > more affected due to the lack of the BYPASS path.
> > 
> > This change tries to improve the situation using
> > the indirect call wrappers infrastructure for the
> > qdisc enqueue end dequeue indirect calls.
> > 
> > To cope with non-trivial scenarios, a compile-time know is
> > introduced, so that the qdisc used by ICW can be different
> > from the default one.
> > 
> > Tested with pktgen over qdisc, with CONFIG_HINT_FQ_CODEL=y:
> > 
> > qdisc		threads vanilla	patched delta
> > 		nr	Kpps	Kpps	%
> > pfifo_fast	1	3300	3700	12
> > pfifo_fast	2	3940	4070	3
> > fq_codel	1	3840	4110	7
> > fq_codel	2	1920	2260	17
> > fq		1	2230	2210	-1
> > fq		2	1530	1540	1
> 
> Hi Paolo
> 
> This test is a bit misleading, pktgen has a way to bypass the qdisc.

The above figures were collected using the 
pktgen_bench_xmit_mode_queue_xmit.sh script, which in turn uses
'xmit_mode queue_xmit': packets traverse the qdisc layer via the usual
dev_queue_xmit()

> Real numbers for more typical workloads would be more appealing,
> before we consider a quite invasive patch ?

I'll add figures for netperf UDP single threaded/many threads...

> What is the status of static_call infrastructure ?

... unless you prefer waiting for the above. AFAICS, that is under
discussion. v4 was posted in May[1] and collected quite a bit of
feedback.

> >  
> > +#ifndef CODEL_SCOPE
> > +#define CODEL_SCOPE static
> > +#endif
> 
> This looks additional burden, just remove the static attribute,
> if a function might be called directly.

Yep, that will slim down the patch a bit, I will do. 

Thanks for the feedback,

Paolo

[1] https://lwn.net/ml/linux-kernel/20200501202849.647891881@infradead.org/


^ permalink raw reply

* [RFC PATCH v13 0/9] Enable ptp_kvm for arm64
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd

Currently, we offen use ntp (sync time with remote network clock)
to sync time in VM. But the precision of ntp is subject to network delay
so it's difficult to sync time in a high precision.

kvm virtual ptp clock (ptp_kvm) offers another way to sync time in VM,
as the remote clock locates in the host instead of remote network clock.
It targets to sync time between guest and host in virtualization
environment and in this way, we can keep the time of all the VMs running
in the same host in sync. In general, the delay of communication between
host and guest is quiet small, so ptp_kvm can offer time sync precision
up to in order of nanosecond. Please keep in mind that ptp_kvm just
limits itself to be a channel which transmit the remote clock from
host to guest and leaves the time sync jobs to an application, eg. chrony,
in usersapce in VM.

How ptp_kvm works:
After ptp_kvm initialized, there will be a new device node under
/dev called ptp%d. A guest userspace service, like chrony, can use this
device to get host walltime, sometimes also counter cycle, which depends
on the service it calls. Then this guest userspace service can use those
data to do the time sync for guest.
here is a rough sketch to show how kvm ptp clock works.

|----------------------------|              |--------------------------|
|       guest userspace      |              |          host            |
|ioctl -> /dev/ptp%d         |              |                          |
|       ^   |                |              |                          |
|----------------------------|              |                          |
|       |   | guest kernel   |              |                          |
|       |   V      (get host walltime/counter cycle)                   |
|      ptp_kvm -> hypercall - - - - - - - - - - ->hypercall service    |
|                         <- - - - - - - - - - - -                     |
|----------------------------|              |--------------------------|

1. time sync service in guest userspace call ptp device through /dev/ptp%d.
2. ptp_kvm module in guest recive this request then invoke hypercall to
route into host kernel to request host walltime/counter cycle.
3. ptp_kvm hypercall service in host response to the request and send data
back.
4. ptp (not ptp_kvm) in guest copy the data to userspace.

This ptp_kvm implementation focuses itself to step 2 and 3 and step 2 works
in guest comparing step 3 works in host kernel.

change log:
from v12 to v13:
        (1) rebase code on 5.8-rc1.
        (2) this patch set base on 2 patches of 1/8 and 2/8 from Will Decon.
        (3) remove the change to ptp device code of extend getcrosststamp.
        (4) remove the mechanism of letting user choose the counter type in
ptp_kvm for arm64.
        (5) add virtual counter option in ptp_kvm service to let user choose
the specific counter explicitly.

from v11 to v12:
        (1) rebase code on 5.7-rc6 and rebase 2 patches from Will Decon
including 1/11 and 2/11. as these patches introduce discover mechanism of
vendor smccc service.
        (2) rebase ptp_kvm hypercall service from standard smccc to vendor
smccc and add ptp_kvm to vendor smccc service discover mechanism.
        (3) add detail of why we need ptp_kvm and how ptp_kvm works in cover
letter.

from v10 to v11:
        (1) rebase code on 5.7-rc2.
        (2) remove support for arm32, as kvm support for arm32 will be
removed [1]
        (3) add error report in ptp_kvm initialization.

from v9 to v10:
        (1) change code base to v5.5.
        (2) enable ptp_kvm both for arm32 and arm64.
        (3) let user choose which of virtual counter or physical counter
should return when using crosstimestamp mode of ptp_kvm for arm/arm64.
        (4) extend input argument for getcrosstimestamp API.

from v8 to v9:
        (1) move ptp_kvm.h to driver/ptp/
        (2) replace license declaration of ptp_kvm.h the same with other
header files in the same directory.

from v7 to v8:
        (1) separate adding clocksource id for arm_arch_counter as a
single patch.
        (2) update commit message for patch 4/8.
        (3) refine patch 7/8 and patch 8/8 to make them more independent.

from v6 to v7:
        (1) include the omitted clocksource_id.h in last version.
        (2) reorder the header file in patch.
        (3) refine some words in commit message to make it more impersonal.

from v5 to v6:
        (1) apply Mark's patch[4] to get SMCCC conduit.
        (2) add mechanism to recognize current clocksource by add
clocksouce_id value into struct clocksource instead of method in patch-v5.
        (3) rename kvm_arch_ptp_get_clock_fn into
kvm_arch_ptp_get_crosststamp.

from v3 to v4:
        (1) fix clocksource of ptp_kvm to arch_sys_counter.
        (2) move kvm_arch_ptp_get_clock_fn into arm_arch_timer.c
        (3) subtract cntvoff before return cycles from host.
        (4) use ktime_get_snapshot instead of getnstimeofday and
get_current_counterval to return time and counter value.
        (5) split ktime and counter into two 32-bit block respectively
to avoid Y2038-safe issue.
        (6) set time compensation to device time as half of the delay of
hvc call.
        (7) add ARM_ARCH_TIMER as dependency of ptp_kvm for
arm64.

from v2 to v3:
        (1) fix some issues in commit log.
        (2) add some receivers in send list.

from v1 to v2:
        (1) move arch-specific code from arch/ to driver/ptp/
        (2) offer mechanism to inform userspace if ptp_kvm service is
available.
        (3) separate ptp_kvm code for arm64 into hypervisor part and
guest part.
        (4) add API to expose monotonic clock and counter value.
        (5) refine code: remove no necessary part and reconsitution.

[1] https://patchwork.kernel.org/cover/11373351/

Jianyong Wu (7):
  arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
  smccc: export smccc conduit get helper.
  ptp: Reorganize ptp_kvm modules to make it arch-independent.
  clocksource: Add clocksource id for arm arch counter
  arm64/kvm: Add hypercall service for kvm ptp.
  ptp: arm64: Enable ptp_kvm for arm64
  arm64: Add kvm capability check extension for ptp_kvm

Thomas Gleixner (1):
  time: Add mechanism to recognize clocksource in time_get_snapshot

Will Deacon (1):
  arm64: Probe for the presence of KVM hypervisor services during boot

 arch/arm64/include/asm/hypervisor.h         | 11 +++
 arch/arm64/kernel/setup.c                   | 36 +++++++++
 arch/arm64/kvm/arm.c                        |  4 +
 arch/arm64/kvm/hypercalls.c                 | 79 +++++++++++++++---
 drivers/clocksource/arm_arch_timer.c        | 26 ++++++
 drivers/firmware/smccc/smccc.c              |  1 +
 drivers/ptp/Kconfig                         |  2 +-
 drivers/ptp/Makefile                        |  1 +
 drivers/ptp/ptp_kvm.h                       | 11 +++
 drivers/ptp/ptp_kvm_arm64.c                 | 53 ++++++++++++
 drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 80 +++++-------------
 drivers/ptp/ptp_kvm_x86.c                   | 89 +++++++++++++++++++++
 include/linux/arm-smccc.h                   | 56 +++++++++++++
 include/linux/clocksource.h                 |  6 ++
 include/linux/clocksource_ids.h             | 12 +++
 include/linux/timekeeping.h                 | 12 +--
 include/uapi/linux/kvm.h                    |  1 +
 kernel/time/clocksource.c                   |  3 +
 kernel/time/timekeeping.c                   |  1 +
 virt/kvm/Kconfig                            |  4 +
 20 files changed, 413 insertions(+), 75 deletions(-)
 create mode 100644 drivers/ptp/ptp_kvm.h
 create mode 100644 drivers/ptp/ptp_kvm_arm64.c
 rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (63%)
 create mode 100644 drivers/ptp/ptp_kvm_x86.c
 create mode 100644 include/linux/clocksource_ids.h

-- 
2.17.1


^ permalink raw reply

* [RFC PATCH v13 1/9] arm64: Probe for the presence of KVM hypervisor services during boot
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

From: Will Deacon <will@kernel.org>

Although the SMCCC specification provides some limited functionality for
describing the presence of hypervisor and firmware services, this is
generally applicable only to functions designated as "Arm Architecture
Service Functions" and no portable discovery mechanism is provided for
standard hypervisor services, despite having a designated range of
function identifiers reserved by the specification.

In an attempt to avoid the need for additional firmware changes every
time a new function is added, introduce a UID to identify the service
provider as being compatible with KVM. Once this has been established,
additional services can be discovered via a feature bitmap.

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 arch/arm64/include/asm/hypervisor.h | 11 +++++++++
 arch/arm64/kernel/setup.c           | 36 +++++++++++++++++++++++++++++
 include/linux/arm-smccc.h           | 26 +++++++++++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index f9cc1d021791..91e4bd890819 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -2,6 +2,17 @@
 #ifndef _ASM_ARM64_HYPERVISOR_H
 #define _ASM_ARM64_HYPERVISOR_H
 
+#include <linux/arm-smccc.h>
 #include <asm/xen/hypervisor.h>
 
+static inline bool kvm_arm_hyp_service_available(u32 func_id)
+{
+	extern DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS);
+
+	if (func_id >= ARM_SMCCC_KVM_NUM_FUNCS)
+		return -EINVAL;
+
+	return test_bit(func_id, __kvm_arm_hyp_services);
+}
+
 #endif
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 93b3844cf442..92f8762a7b80 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -7,6 +7,7 @@
  */
 
 #include <linux/acpi.h>
+#include <linux/arm-smccc.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/stddef.h>
@@ -276,6 +277,40 @@ arch_initcall(reserve_memblock_reserved_regions);
 
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
 
+DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS) = { };
+
+static void __init kvm_init_hyp_services(void)
+{
+	int i;
+	struct arm_smccc_res res;
+
+	if (arm_smccc_get_version() == ARM_SMCCC_VERSION_1_0)
+		return;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
+	if (res.a0 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0 ||
+	    res.a1 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1 ||
+	    res.a2 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2 ||
+	    res.a3 != ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3)
+		return;
+
+	memset(&res, 0, sizeof(res));
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID, &res);
+	for (i = 0; i < 32; ++i) {
+		if (res.a0 & (i))
+			set_bit(i + (32 * 0), __kvm_arm_hyp_services);
+		if (res.a1 & (i))
+			set_bit(i + (32 * 1), __kvm_arm_hyp_services);
+		if (res.a2 & (i))
+			set_bit(i + (32 * 2), __kvm_arm_hyp_services);
+		if (res.a3 & (i))
+			set_bit(i + (32 * 3), __kvm_arm_hyp_services);
+	}
+
+	pr_info("KVM hypervisor services detected (0x%08lx 0x%08lx 0x%08lx 0x%08lx)\n",
+		res.a3, res.a2, res.a1, res.a0);
+}
+
 void __init setup_arch(char **cmdline_p)
 {
 	init_mm.start_code = (unsigned long) _text;
@@ -348,6 +383,7 @@ void __init setup_arch(char **cmdline_p)
 	else
 		psci_acpi_init();
 
+	kvm_init_hyp_services();
 	init_bootcpu_ops();
 	smp_init_cpus();
 	smp_build_mpidr_hash();
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 56d6a5c6e353..86ff30131e7b 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -49,11 +49,14 @@
 #define ARM_SMCCC_OWNER_OEM		3
 #define ARM_SMCCC_OWNER_STANDARD	4
 #define ARM_SMCCC_OWNER_STANDARD_HYP	5
+#define ARM_SMCCC_OWNER_VENDOR_HYP	6
 #define ARM_SMCCC_OWNER_TRUSTED_APP	48
 #define ARM_SMCCC_OWNER_TRUSTED_APP_END	49
 #define ARM_SMCCC_OWNER_TRUSTED_OS	50
 #define ARM_SMCCC_OWNER_TRUSTED_OS_END	63
 
+#define ARM_SMCCC_FUNC_QUERY_CALL_UID	0xff01
+
 #define ARM_SMCCC_QUIRK_NONE		0
 #define ARM_SMCCC_QUIRK_QCOM_A6		1 /* Save/restore register a6 */
 
@@ -81,6 +84,29 @@
 			   ARM_SMCCC_SMC_32,				\
 			   0, 0x7fff)
 
+#define ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_32,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_FUNC_QUERY_CALL_UID)
+
+/* KVM UID value: 28b46fb6-2ec5-11e9-a9ca-4b564d003a74 */
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0	0xb66fb428U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1	0xe911c52eU
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2	0x564bcaa9U
+#define ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3	0x743a004dU
+
+/* KVM "vendor specific" services */
+#define ARM_SMCCC_KVM_FUNC_FEATURES		0
+#define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
+#define ARM_SMCCC_KVM_NUM_FUNCS			128
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID			\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_32,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_FEATURES)
+
 #ifndef __ASSEMBLY__
 
 #include <linux/linkage.h>
-- 
2.17.1


^ permalink raw reply related

* [RFC PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

From: Will Deacon <will@kernel.org>

We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 arch/arm64/kvm/hypercalls.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 550dfa3e53cd..db6dce3d0e23 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -12,13 +12,13 @@
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
 	u32 func_id = smccc_get_function(vcpu);
-	long val = SMCCC_RET_NOT_SUPPORTED;
+	u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
 	u32 feature;
 	gpa_t gpa;
 
 	switch (func_id) {
 	case ARM_SMCCC_VERSION_FUNC_ID:
-		val = ARM_SMCCC_VERSION_1_1;
+		val[0] = ARM_SMCCC_VERSION_1_1;
 		break;
 	case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
 		feature = smccc_get_arg1(vcpu);
@@ -28,10 +28,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 			case KVM_BP_HARDEN_UNKNOWN:
 				break;
 			case KVM_BP_HARDEN_WA_NEEDED:
-				val = SMCCC_RET_SUCCESS;
+				val[0] = SMCCC_RET_SUCCESS;
 				break;
 			case KVM_BP_HARDEN_NOT_REQUIRED:
-				val = SMCCC_RET_NOT_REQUIRED;
+				val[0] = SMCCC_RET_NOT_REQUIRED;
 				break;
 			}
 			break;
@@ -41,31 +41,40 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 			case KVM_SSBD_UNKNOWN:
 				break;
 			case KVM_SSBD_KERNEL:
-				val = SMCCC_RET_SUCCESS;
+				val[0] = SMCCC_RET_SUCCESS;
 				break;
 			case KVM_SSBD_FORCE_ENABLE:
 			case KVM_SSBD_MITIGATED:
-				val = SMCCC_RET_NOT_REQUIRED;
+				val[0] = SMCCC_RET_NOT_REQUIRED;
 				break;
 			}
 			break;
 		case ARM_SMCCC_HV_PV_TIME_FEATURES:
-			val = SMCCC_RET_SUCCESS;
+			val[0] = SMCCC_RET_SUCCESS;
 			break;
 		}
 		break;
 	case ARM_SMCCC_HV_PV_TIME_FEATURES:
-		val = kvm_hypercall_pv_features(vcpu);
+		val[0] = kvm_hypercall_pv_features(vcpu);
 		break;
 	case ARM_SMCCC_HV_PV_TIME_ST:
 		gpa = kvm_init_stolen_time(vcpu);
 		if (gpa != GPA_INVALID)
-			val = gpa;
+			val[0] = gpa;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
+		val[0] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_0;
+		val[1] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_1;
+		val[2] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_2;
+		val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
 		break;
 	default:
 		return kvm_psci_call(vcpu);
 	}
 
-	smccc_set_retval(vcpu, val, 0, 0, 0);
+	smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
 	return 1;
 }
-- 
2.17.1


^ permalink raw reply related

* [PATCH v13 3/9] smccc: Export smccc conduit get helper.
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

Export arm_smccc_1_1_get_conduit then modules can use smccc helper which
adopts it.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 drivers/firmware/smccc/smccc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
index 4e80921ee212..b855fe7b5c90 100644
--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -24,6 +24,7 @@ enum arm_smccc_conduit arm_smccc_1_1_get_conduit(void)
 
 	return smccc_conduit;
 }
+EXPORT_SYMBOL(arm_smccc_1_1_get_conduit);
 
 u32 arm_smccc_get_version(void)
 {
-- 
2.17.1


^ permalink raw reply related

* [PATCH 1/1] net: wireless: intersil: orinoco: fix spelling mistake
From: Flavio Suligoi @ 2020-06-19  9:31 UTC (permalink / raw)
  To: Kalle Valo, David S . Miller, Jakub Kicinski, Johan Hovold,
	Aditya Pakki, Thomas Gleixner, Gustavo A . R . Silva
  Cc: linux-wireless, netdev, linux-kernel, Flavio Suligoi

Fix typo: "EZUSB_REQUEST_TRIGER" --> "EZUSB_REQUEST_TRIGGER"

Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
index 651c676b5506..11fa38fedd87 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -158,7 +158,7 @@ MODULE_FIRMWARE("orinoco_ezusb_fw");
 
 
 #define EZUSB_REQUEST_FW_TRANS		0xA0
-#define EZUSB_REQUEST_TRIGER		0xAA
+#define EZUSB_REQUEST_TRIGGER		0xAA
 #define EZUSB_REQUEST_TRIG_AC		0xAC
 #define EZUSB_CPUCS_REG			0x7F92
 
@@ -1318,12 +1318,12 @@ static int ezusb_hard_reset(struct orinoco_private *priv)
 	netdev_dbg(upriv->dev, "sending control message\n");
 	retval = usb_control_msg(upriv->udev,
 				 usb_sndctrlpipe(upriv->udev, 0),
-				 EZUSB_REQUEST_TRIGER,
+				 EZUSB_REQUEST_TRIGGER,
 				 USB_TYPE_VENDOR | USB_RECIP_DEVICE |
 				 USB_DIR_OUT, 0x0, 0x0, NULL, 0,
 				 DEF_TIMEOUT);
 	if (retval < 0) {
-		err("EZUSB_REQUEST_TRIGER failed retval %d", retval);
+		err("EZUSB_REQUEST_TRIGGER failed retval %d", retval);
 		return retval;
 	}
 #if 0
-- 
2.17.1


^ permalink raw reply related

* [RFC PATCH v13 5/9] time: Add mechanism to recognize clocksource in time_get_snapshot
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

From: Thomas Gleixner <tglx@linutronix.de>

System time snapshots are not conveying information about the current
clocksource which was used, but callers like the PTP KVM guest
implementation have the requirement to evaluate the clocksource type to
select the appropriate mechanism.

Introduce a clocksource id field in struct clocksource which is by default
set to CSID_GENERIC (0). Clocksource implementations can set that field to
a value which allows to identify the clocksource.

Store the clocksource id of the current clocksource in the
system_time_snapshot so callers can evaluate which clocksource was used to
take the snapshot and act accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 include/linux/clocksource.h     |  6 ++++++
 include/linux/clocksource_ids.h | 11 +++++++++++
 include/linux/timekeeping.h     | 12 +++++++-----
 kernel/time/clocksource.c       |  3 +++
 kernel/time/timekeeping.c       |  1 +
 5 files changed, 28 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/clocksource_ids.h

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143db6523..1290d0dce840 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -17,6 +17,7 @@
 #include <linux/timer.h>
 #include <linux/init.h>
 #include <linux/of.h>
+#include <linux/clocksource_ids.h>
 #include <asm/div64.h>
 #include <asm/io.h>
 
@@ -62,6 +63,10 @@ struct module;
  *			400-499: Perfect
  *				The ideal clocksource. A must-use where
  *				available.
+ * @id:			Defaults to CSID_GENERIC. The id value is captured
+ *			in certain snapshot functions to allow callers to
+ *			validate the clocksource from which the snapshot was
+ *			taken.
  * @flags:		Flags describing special properties
  * @enable:		Optional function to enable the clocksource
  * @disable:		Optional function to disable the clocksource
@@ -100,6 +105,7 @@ struct clocksource {
 	const char		*name;
 	struct list_head	list;
 	int			rating;
+	enum clocksource_ids	id;
 	enum vdso_clock_mode	vdso_clock_mode;
 	unsigned long		flags;
 
diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
new file mode 100644
index 000000000000..4d8e19e05328
--- /dev/null
+++ b/include/linux/clocksource_ids.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CLOCKSOURCE_IDS_H
+#define _LINUX_CLOCKSOURCE_IDS_H
+
+/* Enum to give clocksources a unique identifier */
+enum clocksource_ids {
+	CSID_GENERIC		= 0,
+	CSID_MAX,
+};
+
+#endif
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index b27e2ffa96c1..70e771862d20 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -3,6 +3,7 @@
 #define _LINUX_TIMEKEEPING_H
 
 #include <linux/errno.h>
+#include <linux/clocksource_ids.h>
 
 /* Included from linux/ktime.h */
 
@@ -232,11 +233,12 @@ extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
  * @cs_was_changed_seq:	The sequence number of clocksource change events
  */
 struct system_time_snapshot {
-	u64		cycles;
-	ktime_t		real;
-	ktime_t		raw;
-	unsigned int	clock_was_set_seq;
-	u8		cs_was_changed_seq;
+	u64			cycles;
+	ktime_t			real;
+	ktime_t			raw;
+	enum clocksource_ids	cs_id;
+	unsigned int		clock_was_set_seq;
+	u8			cs_was_changed_seq;
 };
 
 /*
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 02441ead3c3b..9fe148734fb3 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -928,6 +928,9 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
 
 	clocksource_arch_init(cs);
 
+	if (WARN_ON_ONCE((unsigned int)cs->id >= CSID_MAX))
+		cs->id = CSID_GENERIC;
+
 	if (cs->vdso_clock_mode < 0 ||
 	    cs->vdso_clock_mode >= VDSO_CLOCKMODE_MAX) {
 		pr_warn("clocksource %s registered with invalid VDSO mode %d. Disabling VDSO support.\n",
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d20d489841c8..2cf85d81e4ed 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -979,6 +979,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		now = tk_clock_read(&tk->tkr_mono);
+		systime_snapshot->cs_id = tk->tkr_mono.clock->id;
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
-- 
2.17.1


^ permalink raw reply related

* [RFC PATCH v13 9/9] arm64: Add kvm capability check extension for ptp_kvm
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

Let userspace check if there is kvm ptp service in host.
Before VMs migrate to another host, VMM may check if this
cap is available to determine the next behavior.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arm.c     | 4 ++++
 include/uapi/linux/kvm.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 90cb90561446..f41e84346468 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -194,6 +194,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
 	case KVM_CAP_ARM_NISV_TO_USER:
 	case KVM_CAP_ARM_INJECT_EXT_DABT:
+
+#ifdef CONFIG_ARM64_KVM_PTP_HOST
+	case KVM_CAP_ARM_KVM_PTP:
+#endif
 		r = 1;
 		break;
 	case KVM_CAP_ARM_SET_DEVICE_ADDR:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4fdf30316582..f3d4b00dac57 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SECURE_GUEST 181
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
+#define KVM_CAP_ARM_KVM_PTP 184
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.17.1


^ permalink raw reply related

* [RFC PATCH v13 8/9] ptp: arm64: Enable ptp_kvm for arm64
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

Currently, there is no mechanism to keep time sync between guest and host
in arm64 virtualization environment. Time in guest will drift compared
with host after boot up as they may both use third party time sources
to correct their time respectively. The time deviation will be in order
of milliseconds. But in some scenarios,like in cloud envirenment, we ask
for higher time precision.

kvm ptp clock, which choose the host clock source as a reference
clock to sync time between guest and host, has been adopted by x86
which makes the time sync order from milliseconds to nanoseconds.

This patch enables kvm ptp clock for arm64 and improve clock sync precison
significantly.

Test result comparisons between with kvm ptp clock and without it in arm64
are as follows. This test derived from the result of command 'chronyc
sources'. we should take more care of the last sample column which shows
the offset between the local clock and the source at the last measurement.

no kvm ptp in guest:
MS Name/IP address   Stratum Poll Reach LastRx Last sample
========================================================================
^* dns1.synet.edu.cn      2   6   377    13  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377    21  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377    29  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377    37  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377    45  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377    53  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377    61  +1040us[+1581us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377     4   -130us[ +796us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377    12   -130us[ +796us] +/-   21ms
^* dns1.synet.edu.cn      2   6   377    20   -130us[ +796us] +/-   21ms

in host:
MS Name/IP address   Stratum Poll Reach LastRx Last sample
========================================================================
^* 120.25.115.20          2   7   377    72   -470us[ -603us] +/-   18ms
^* 120.25.115.20          2   7   377    92   -470us[ -603us] +/-   18ms
^* 120.25.115.20          2   7   377   112   -470us[ -603us] +/-   18ms
^* 120.25.115.20          2   7   377     2   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20          2   7   377    22   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20          2   7   377    43   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20          2   7   377    63   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20          2   7   377    83   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20          2   7   377   103   +872ns[-6808ns] +/-   17ms
^* 120.25.115.20          2   7   377   123   +872ns[-6808ns] +/-   17ms

The dns1.synet.edu.cn is the network reference clock for guest and
120.25.115.20 is the network reference clock for host. we can't get the
clock error between guest and host directly, but a roughly estimated value
will be in order of hundreds of us to ms.

with kvm ptp in guest:
chrony has been disabled in host to remove the disturb by network clock.

MS Name/IP address         Stratum Poll Reach LastRx Last sample
========================================================================
* PHC0                    0   3   377     8     -7ns[   +1ns] +/-    3ns
* PHC0                    0   3   377     8     +1ns[  +16ns] +/-    3ns
* PHC0                    0   3   377     6     -4ns[   -0ns] +/-    6ns
* PHC0                    0   3   377     6     -8ns[  -12ns] +/-    5ns
* PHC0                    0   3   377     5     +2ns[   +4ns] +/-    4ns
* PHC0                    0   3   377    13     +2ns[   +4ns] +/-    4ns
* PHC0                    0   3   377    12     -4ns[   -6ns] +/-    4ns
* PHC0                    0   3   377    11     -8ns[  -11ns] +/-    6ns
* PHC0                    0   3   377    10    -14ns[  -20ns] +/-    4ns
* PHC0                    0   3   377     8     +4ns[   +5ns] +/-    4ns

The PHC0 is the ptp clock which choose the host clock as its source
clock. So we can see that the clock difference between host and guest
is in order of ns.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 drivers/clocksource/arm_arch_timer.c | 24 +++++++++++++
 drivers/ptp/Kconfig                  |  2 +-
 drivers/ptp/ptp_kvm_arm64.c          | 53 ++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ptp/ptp_kvm_arm64.c

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index edc5be0ae324..69d30b407cd9 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -1639,3 +1639,27 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
 }
 TIMER_ACPI_DECLARE(arch_timer, ACPI_SIG_GTDT, arch_timer_acpi_init);
 #endif
+
+#if IS_ENABLED(CONFIG_PTP_1588_CLOCK_KVM)
+#include <linux/arm-smccc.h>
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *ts,
+			      struct clocksource **cs)
+{
+	struct arm_smccc_res hvc_res;
+	ktime_t ktime_overall;
+
+	/* Currently, our guest will always use the virtual counter */
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+			     ARM_PTP_VIRT_COUNTER, &hvc_res);
+	if ((int)(hvc_res.a0) < 0)
+		return -EOPNOTSUPP;
+
+	ktime_overall = (long long)hvc_res.a0 << 32 | hvc_res.a1;
+	*ts = ktime_to_timespec64(ktime_overall);
+	*cycle = (long long)hvc_res.a2 << 32 | hvc_res.a3;
+	*cs = &clocksource_counter;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_arch_ptp_get_crosststamp);
+#endif
diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 942f72d8151d..127e96f14f89 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -106,7 +106,7 @@ config PTP_1588_CLOCK_PCH
 config PTP_1588_CLOCK_KVM
 	tristate "KVM virtual PTP clock"
 	depends on PTP_1588_CLOCK
-	depends on KVM_GUEST && X86
+	depends on KVM_GUEST && X86 || ARM64 && ARM_ARCH_TIMER && ARM_PSCI_FW
 	default y
 	help
 	  This driver adds support for using kvm infrastructure as a PTP
diff --git a/drivers/ptp/ptp_kvm_arm64.c b/drivers/ptp/ptp_kvm_arm64.c
new file mode 100644
index 000000000000..704a7252b119
--- /dev/null
+++ b/drivers/ptp/ptp_kvm_arm64.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Virtual PTP 1588 clock for use with KVM guests
+ *  Copyright (C) 2019 ARM Ltd.
+ *  All Rights Reserved
+ */
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <asm/hypervisor.h>
+#include <linux/module.h>
+#include <linux/psci.h>
+#include <linux/arm-smccc.h>
+#include <linux/timecounter.h>
+#include <linux/sched/clock.h>
+#include <asm/arch_timer.h>
+
+int kvm_arch_ptp_init(void)
+{
+	struct arm_smccc_res hvc_res;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
+			     &hvc_res);
+	if (!(hvc_res.a0 | BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP)))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock_generic(struct timespec64 *ts,
+				   struct arm_smccc_res *hvc_res)
+{
+	ktime_t ktime_overall;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+			     hvc_res);
+	if ((int)(hvc_res->a0) < 0)
+		return -EOPNOTSUPP;
+
+	ktime_overall = (long long)hvc_res->a0 << 32 | hvc_res->a1;
+	*ts = ktime_to_timespec64(ktime_overall);
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
+{
+	struct arm_smccc_res hvc_res;
+
+	kvm_arch_ptp_get_clock_generic(ts, &hvc_res);
+
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related

* [RFC PATCH v13 7/9] arm64/kvm: Add hypercall service for kvm ptp.
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

ptp_kvm will get this service through smccc call.
The service offers wall time and counter cycle of host for guest.
caller must explicitly determines which cycle of virtual counter or
physical counter to return if it needs counter cycle.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 arch/arm64/kvm/Kconfig      |  6 +++++
 arch/arm64/kvm/hypercalls.c | 50 +++++++++++++++++++++++++++++++++++++
 include/linux/arm-smccc.h   | 30 ++++++++++++++++++++++
 3 files changed, 86 insertions(+)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 13489aff4440..79091f6e5e7a 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -60,6 +60,12 @@ config KVM_ARM_PMU
 config KVM_INDIRECT_VECTORS
 	def_bool HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS
 
+config ARM64_KVM_PTP_HOST
+	bool "KVM PTP host service for arm64"
+	default y
+	help
+	  virtual kvm ptp clock hypercall service for arm64
+
 endif # KVM
 
 endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index db6dce3d0e23..366b0646c360 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -3,6 +3,7 @@
 
 #include <linux/arm-smccc.h>
 #include <linux/kvm_host.h>
+#include <linux/clocksource_ids.h>
 
 #include <asm/kvm_emulate.h>
 
@@ -11,6 +12,10 @@
 
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
+#ifdef CONFIG_ARM64_KVM_PTP_HOST
+	struct system_time_snapshot systime_snapshot;
+	u64 cycles = 0;
+#endif
 	u32 func_id = smccc_get_function(vcpu);
 	u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
 	u32 feature;
@@ -70,7 +75,52 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
 		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
+
+#ifdef CONFIG_ARM64_KVM_PTP_HOST
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
+#endif
+		break;
+
+#ifdef CONFIG_ARM64_KVM_PTP_HOST
+	/*
+	 * This serves virtual kvm_ptp.
+	 * Four values will be passed back.
+	 * reg0 stores high 32-bit host ktime;
+	 * reg1 stores low 32-bit host ktime;
+	 * reg2 stores high 32-bit difference of host cycles and cntvoff;
+	 * reg3 stores low 32-bit difference of host cycles and cntvoff.
+	 */
+	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
+		/*
+		 * system time and counter value must captured in the same
+		 * time to keep consistency and precision.
+		 */
+		ktime_get_snapshot(&systime_snapshot);
+		if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
+			break;
+		val[0] = upper_32_bits(systime_snapshot.real);
+		val[1] = lower_32_bits(systime_snapshot.real);
+		/*
+		 * which of virtual counter or physical counter being
+		 * asked for is decided by the first argument of smccc
+		 * call. If no first argument or invalid argument, zero
+		 * counter value will return;
+		 */
+		feature = smccc_get_arg1(vcpu);
+		switch (feature) {
+		case ARM_PTP_VIRT_COUNTER:
+			cycles = systime_snapshot.cycles -
+			vcpu_vtimer(vcpu)->cntvoff;
+			break;
+		case ARM_PTP_PHY_COUNTER:
+			cycles = systime_snapshot.cycles;
+			break;
+		}
+		val[2] = upper_32_bits(cycles);
+		val[3] = lower_32_bits(cycles);
 		break;
+#endif
+
 	default:
 		return kvm_psci_call(vcpu);
 	}
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 86ff30131e7b..e593ec515f82 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -98,6 +98,9 @@
 
 /* KVM "vendor specific" services */
 #define ARM_SMCCC_KVM_FUNC_FEATURES		0
+#define ARM_SMCCC_KVM_FUNC_KVM_PTP		1
+#define ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY		2
+#define ARM_SMCCC_KVM_FUNC_KVM_PTP_VIRT		3
 #define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
 #define ARM_SMCCC_KVM_NUM_FUNCS			128
 
@@ -107,6 +110,33 @@
 			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
 			   ARM_SMCCC_KVM_FUNC_FEATURES)
 
+/*
+ * kvm_ptp is a feature used for time sync between vm and host.
+ * kvm_ptp module in guest kernel will get service from host using
+ * this hypercall ID.
+ */
+#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_32,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_KVM_PTP)
+
+/*
+ * kvm_ptp may get counter cycle from host and should ask for which of
+ * physical counter or virtual counter by using ARM_PTP_PHY_COUNTER and
+ * ARM_PTP_VIRT_COUNTER explicitly.
+ */
+#define ARM_PTP_PHY_COUNTER						\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_32,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY)
+
+#define ARM_PTP_VIRT_COUNTER						\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
+			   ARM_SMCCC_SMC_32,				\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
+			   ARM_SMCCC_KVM_FUNC_KVM_PTP_VIRT)
 #ifndef __ASSEMBLY__
 
 #include <linux/linkage.h>
-- 
2.17.1


^ permalink raw reply related

* [RFC PATCH v13 6/9] clocksource: Add clocksource id for arm arch counter
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

Add clocksource id for arm arch counter to let it be identified easily and
elegantly in ptp_kvm implementation for arm.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 drivers/clocksource/arm_arch_timer.c | 2 ++
 include/linux/clocksource_ids.h      | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index ecf7b7db2d05..edc5be0ae324 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -16,6 +16,7 @@
 #include <linux/cpu_pm.h>
 #include <linux/clockchips.h>
 #include <linux/clocksource.h>
+#include <linux/clocksource_ids.h>
 #include <linux/interrupt.h>
 #include <linux/of_irq.h>
 #include <linux/of_address.h>
@@ -191,6 +192,7 @@ static u64 arch_counter_read_cc(const struct cyclecounter *cc)
 
 static struct clocksource clocksource_counter = {
 	.name	= "arch_sys_counter",
+	.id	= CSID_ARM_ARCH_COUNTER,
 	.rating	= 400,
 	.read	= arch_counter_read,
 	.mask	= CLOCKSOURCE_MASK(56),
diff --git a/include/linux/clocksource_ids.h b/include/linux/clocksource_ids.h
index 4d8e19e05328..16775d7d8f8d 100644
--- a/include/linux/clocksource_ids.h
+++ b/include/linux/clocksource_ids.h
@@ -5,6 +5,7 @@
 /* Enum to give clocksources a unique identifier */
 enum clocksource_ids {
 	CSID_GENERIC		= 0,
+	CSID_ARM_ARCH_COUNTER,
 	CSID_MAX,
 };
 
-- 
2.17.1


^ permalink raw reply related

* [RFC PATCH v13 4/9] ptp: Reorganize ptp_kvm module to make it arch-independent.
From: Jianyong Wu @ 2020-06-19  9:30 UTC (permalink / raw)
  To: netdev, yangbo.lu, john.stultz, tglx, pbonzini,
	sean.j.christopherson, maz, richardcochran, Mark.Rutland, will,
	suzuki.poulose, steven.price
  Cc: linux-kernel, linux-arm-kernel, kvmarm, kvm, Steve.Capper,
	Kaly.Xin, justin.he, Wei.Chen, jianyong.wu, nd
In-Reply-To: <20200619093033.58344-1-jianyong.wu@arm.com>

Currently, ptp_kvm modules implementation is only for x86 which includs
large part of arch-specific code.  This patch move all of those code
into new arch related file in the same directory.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
---
 drivers/ptp/Makefile                        |  1 +
 drivers/ptp/ptp_kvm.h                       | 11 +++
 drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} | 80 +++++-------------
 drivers/ptp/ptp_kvm_x86.c                   | 89 +++++++++++++++++++++
 4 files changed, 122 insertions(+), 59 deletions(-)
 create mode 100644 drivers/ptp/ptp_kvm.h
 rename drivers/ptp/{ptp_kvm.c => ptp_kvm_common.c} (63%)
 create mode 100644 drivers/ptp/ptp_kvm_x86.c

diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile
index 7aff75f745dc..baac6f5b243b 100644
--- a/drivers/ptp/Makefile
+++ b/drivers/ptp/Makefile
@@ -4,6 +4,7 @@
 #
 
 ptp-y					:= ptp_clock.o ptp_chardev.o ptp_sysfs.o
+ptp_kvm-y				:= ptp_kvm_$(ARCH).o ptp_kvm_common.o
 obj-$(CONFIG_PTP_1588_CLOCK)		+= ptp.o
 obj-$(CONFIG_PTP_1588_CLOCK_DTE)	+= ptp_dte.o
 obj-$(CONFIG_PTP_1588_CLOCK_INES)	+= ptp_ines.o
diff --git a/drivers/ptp/ptp_kvm.h b/drivers/ptp/ptp_kvm.h
new file mode 100644
index 000000000000..4bf1802bbeb8
--- /dev/null
+++ b/drivers/ptp/ptp_kvm.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Virtual PTP 1588 clock for use with KVM guests
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ */
+
+int kvm_arch_ptp_init(void);
+int kvm_arch_ptp_get_clock(struct timespec64 *ts);
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle,
+		struct timespec64 *tspec, void *cs);
diff --git a/drivers/ptp/ptp_kvm.c b/drivers/ptp/ptp_kvm_common.c
similarity index 63%
rename from drivers/ptp/ptp_kvm.c
rename to drivers/ptp/ptp_kvm_common.c
index 658d33fc3195..8d8a9bcd1d22 100644
--- a/drivers/ptp/ptp_kvm.c
+++ b/drivers/ptp/ptp_kvm_common.c
@@ -8,15 +8,16 @@
 #include <linux/err.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
+#include <linux/slab.h>
 #include <linux/module.h>
 #include <uapi/linux/kvm_para.h>
 #include <asm/kvm_para.h>
-#include <asm/pvclock.h>
-#include <asm/kvmclock.h>
 #include <uapi/asm/kvm_para.h>
 
 #include <linux/ptp_clock_kernel.h>
 
+#include "ptp_kvm.h"
+
 struct kvm_ptp_clock {
 	struct ptp_clock *ptp_clock;
 	struct ptp_clock_info caps;
@@ -24,56 +25,29 @@ struct kvm_ptp_clock {
 
 static DEFINE_SPINLOCK(kvm_ptp_lock);
 
-static struct pvclock_vsyscall_time_info *hv_clock;
-
-static struct kvm_clock_pairing clock_pair;
-static phys_addr_t clock_pair_gpa;
-
 static int ptp_kvm_get_time_fn(ktime_t *device_time,
 			       struct system_counterval_t *system_counter,
 			       void *ctx)
 {
-	unsigned long ret;
+	unsigned long ret, cycle;
 	struct timespec64 tspec;
-	unsigned version;
-	int cpu;
-	struct pvclock_vcpu_time_info *src;
+	struct clocksource *cs;
 
 	spin_lock(&kvm_ptp_lock);
 
 	preempt_disable_notrace();
-	cpu = smp_processor_id();
-	src = &hv_clock[cpu].pvti;
-
-	do {
-		/*
-		 * We are using a TSC value read in the hosts
-		 * kvm_hc_clock_pairing handling.
-		 * So any changes to tsc_to_system_mul
-		 * and tsc_shift or any other pvclock
-		 * data invalidate that measurement.
-		 */
-		version = pvclock_read_begin(src);
-
-		ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
-				     clock_pair_gpa,
-				     KVM_CLOCK_PAIRING_WALLCLOCK);
-		if (ret != 0) {
-			pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
-			spin_unlock(&kvm_ptp_lock);
-			preempt_enable_notrace();
-			return -EOPNOTSUPP;
-		}
-
-		tspec.tv_sec = clock_pair.sec;
-		tspec.tv_nsec = clock_pair.nsec;
-		ret = __pvclock_read_cycles(src, clock_pair.tsc);
-	} while (pvclock_read_retry(src, version));
+	ret = kvm_arch_ptp_get_crosststamp(&cycle, &tspec, &cs);
+	if (ret != 0) {
+		pr_err_ratelimited("clock pairing hypercall ret %lu\n", ret);
+		spin_unlock(&kvm_ptp_lock);
+		preempt_enable_notrace();
+		return -EOPNOTSUPP;
+	}
 
 	preempt_enable_notrace();
 
-	system_counter->cycles = ret;
-	system_counter->cs = &kvm_clock;
+	system_counter->cycles = cycle;
+	system_counter->cs = cs;
 
 	*device_time = timespec64_to_ktime(tspec);
 
@@ -116,17 +90,13 @@ static int ptp_kvm_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
 
 	spin_lock(&kvm_ptp_lock);
 
-	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
-			     clock_pair_gpa,
-			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	ret = kvm_arch_ptp_get_clock(&tspec);
 	if (ret != 0) {
 		pr_err_ratelimited("clock offset hypercall ret %lu\n", ret);
 		spin_unlock(&kvm_ptp_lock);
 		return -EOPNOTSUPP;
 	}
 
-	tspec.tv_sec = clock_pair.sec;
-	tspec.tv_nsec = clock_pair.nsec;
 	spin_unlock(&kvm_ptp_lock);
 
 	memcpy(ts, &tspec, sizeof(struct timespec64));
@@ -166,21 +136,13 @@ static void __exit ptp_kvm_exit(void)
 
 static int __init ptp_kvm_init(void)
 {
-	long ret;
-
-	if (!kvm_para_available())
-		return -ENODEV;
-
-	clock_pair_gpa = slow_virt_to_phys(&clock_pair);
-	hv_clock = pvclock_get_pvti_cpu0_va();
+	int ret;
 
-	if (!hv_clock)
-		return -ENODEV;
-
-	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
-			KVM_CLOCK_PAIRING_WALLCLOCK);
-	if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
-		return -ENODEV;
+	ret = kvm_arch_ptp_init();
+	if (ret) {
+		pr_err("fail to initialize ptp_kvm");
+		return -EOPNOTSUPP;
+	}
 
 	kvm_ptp_clock.caps = ptp_kvm_caps;
 
diff --git a/drivers/ptp/ptp_kvm_x86.c b/drivers/ptp/ptp_kvm_x86.c
new file mode 100644
index 000000000000..aabed1b08a0d
--- /dev/null
+++ b/drivers/ptp/ptp_kvm_x86.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtual PTP 1588 clock for use with KVM guests
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ */
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <asm/pvclock.h>
+#include <asm/kvmclock.h>
+#include <linux/module.h>
+#include <uapi/asm/kvm_para.h>
+#include <uapi/linux/kvm_para.h>
+#include <linux/ptp_clock_kernel.h>
+
+phys_addr_t clock_pair_gpa;
+struct kvm_clock_pairing clock_pair;
+struct pvclock_vsyscall_time_info *hv_clock;
+
+int kvm_arch_ptp_init(void)
+{
+	int ret;
+
+	if (!kvm_para_available())
+		return -ENODEV;
+
+	clock_pair_gpa = slow_virt_to_phys(&clock_pair);
+	hv_clock = pvclock_get_pvti_cpu0_va();
+	if (!hv_clock)
+		return -ENODEV;
+
+	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING, clock_pair_gpa,
+			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	if (ret == -KVM_ENOSYS || ret == -KVM_EOPNOTSUPP)
+		return -ENODEV;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_clock(struct timespec64 *ts)
+{
+	long ret;
+
+	ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+			     clock_pair_gpa,
+			     KVM_CLOCK_PAIRING_WALLCLOCK);
+	if (ret != 0)
+		return -EOPNOTSUPP;
+
+	ts->tv_sec = clock_pair.sec;
+	ts->tv_nsec = clock_pair.nsec;
+
+	return 0;
+}
+
+int kvm_arch_ptp_get_crosststamp(unsigned long *cycle, struct timespec64 *tspec,
+			      struct clocksource **cs)
+{
+	unsigned long ret;
+	unsigned int version;
+	int cpu;
+	struct pvclock_vcpu_time_info *src;
+
+	cpu = smp_processor_id();
+	src = &hv_clock[cpu].pvti;
+
+	do {
+		/*
+		 * We are using a TSC value read in the hosts
+		 * kvm_hc_clock_pairing handling.
+		 * So any changes to tsc_to_system_mul
+		 * and tsc_shift or any other pvclock
+		 * data invalidate that measurement.
+		 */
+		version = pvclock_read_begin(src);
+
+		ret = kvm_hypercall2(KVM_HC_CLOCK_PAIRING,
+				     clock_pair_gpa,
+				     KVM_CLOCK_PAIRING_WALLCLOCK);
+		tspec->tv_sec = clock_pair.sec;
+		tspec->tv_nsec = clock_pair.nsec;
+		*cycle = __pvclock_read_cycles(src, clock_pair.tsc);
+	} while (pvclock_read_retry(src, version));
+
+	*cs = &kvm_clock;
+
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 0/3] cls_flower: Offload unmasked key
From: dsatish @ 2020-06-19  9:41 UTC (permalink / raw)
  To: davem
  Cc: jhs, xiyou.wangcong, jiri, kuba, netdev, simon.horman, kesavac,
	satish.d, prathibha.nagooru, intiyaz.basha, jai.rana

This series of patches add support for offloading unmasked key along
with the masked key to the hardware. This enables hardware to manage
its own tables better based on the hardware features/capabilities.

It is possible that hardware as part of its limitations/optimizations
remove certain flows. To handle such cases where the hardware lost
the flows, tc can offload to hardware if it receives a flow for
which there exists an entry in its flow table. To handle such cases
TC when it returns EEXIST error, also programs the flow in hardware,
if hardware offload is enabled.

This also covers the uses case where addr_type type should be set
only if mask for ipv4/v6 is non-zero, as while classifying packet,
address type is set based on mask dissector of IPv4 and IPV6 keys,
hence while inserting flow also addr type should be set based on
the mask availability.

Satish Dhote (3):
  cls_flower: Set addr_type when ip mask is non-zero
  cls_flower: Pass the unmasked key to the hardware
  cls_flower: Allow flow offloading though masked key exist.

 include/net/flow_offload.h |  45 ++++++++++
 net/core/flow_offload.c    | 171 +++++++++++++++++++++++++++++++++++++
 net/sched/cls_flower.c     | 132 +++++++++++++++++++++-------
 3 files changed, 316 insertions(+), 32 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH net-next 1/3] cls_flower: Set addr_type when ip mask is non-zero
From: dsatish @ 2020-06-19  9:41 UTC (permalink / raw)
  To: davem
  Cc: jhs, xiyou.wangcong, jiri, kuba, netdev, simon.horman, kesavac,
	satish.d, prathibha.nagooru, intiyaz.basha, jai.rana
In-Reply-To: <20200619094156.31184-1-satish.d@oneconvergence.com>

Set the address type in the flower key and mask structure only if
the mask is non-zero for IPv4 and IPv6 fields.

During classifying packet, address type is set based on mask dissector
of IPv4 and IPV6 keys, hence while inserting flow also addr type should
be set based on the mask availability.

This is required for the upcoming patch in OVS where OVS offloads all
the fields that are part of the flower key irrespective of whether the
mask for those fields are zero or nonzero.

Signed-off-by: Chandra Kesava <kesavac@gmail.com>
Signed-off-by: Prathibha Nagooru <prathibha.nagooru@oneconvergence.com>
Signed-off-by: Satish Dhote <satish.d@oneconvergence.com>
---
 net/sched/cls_flower.c | 66 ++++++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b2da37286082..64b70d396397 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1422,6 +1422,26 @@ static int fl_set_key_ct(struct nlattr **tb,
 	return 0;
 }
 
+#define FL_KEY_MEMBER_OFFSET(member) offsetof(struct fl_flow_key, member)
+#define FL_KEY_MEMBER_SIZE(member) sizeof_field(struct fl_flow_key, member)
+
+#define FL_KEY_IS_MASKED(mask, member)					\
+	memchr_inv(((char *)mask) + FL_KEY_MEMBER_OFFSET(member),	\
+		   0, FL_KEY_MEMBER_SIZE(member))			\
+
+#define FL_KEY_SET(keys, cnt, id, member)				\
+	do {								\
+		keys[cnt].key_id = id;					\
+		keys[cnt].offset = FL_KEY_MEMBER_OFFSET(member);	\
+		cnt++;							\
+	} while (0)
+
+#define FL_KEY_SET_IF_MASKED(mask, keys, cnt, id, member)		\
+	do {								\
+		if (FL_KEY_IS_MASKED(mask, member))			\
+			FL_KEY_SET(keys, cnt, id, member);		\
+	} while (0)
+
 static int fl_set_key(struct net *net, struct nlattr **tb,
 		      struct fl_flow_key *key, struct fl_flow_key *mask,
 		      struct netlink_ext_ack *extack)
@@ -1484,23 +1504,27 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 	}
 
 	if (tb[TCA_FLOWER_KEY_IPV4_SRC] || tb[TCA_FLOWER_KEY_IPV4_DST]) {
-		key->control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-		mask->control.addr_type = ~0;
 		fl_set_key_val(tb, &key->ipv4.src, TCA_FLOWER_KEY_IPV4_SRC,
 			       &mask->ipv4.src, TCA_FLOWER_KEY_IPV4_SRC_MASK,
 			       sizeof(key->ipv4.src));
 		fl_set_key_val(tb, &key->ipv4.dst, TCA_FLOWER_KEY_IPV4_DST,
 			       &mask->ipv4.dst, TCA_FLOWER_KEY_IPV4_DST_MASK,
 			       sizeof(key->ipv4.dst));
+		if (FL_KEY_IS_MASKED(mask, ipv4)) {
+			key->control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+			mask->control.addr_type = ~0;
+		}
 	} else if (tb[TCA_FLOWER_KEY_IPV6_SRC] || tb[TCA_FLOWER_KEY_IPV6_DST]) {
-		key->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
-		mask->control.addr_type = ~0;
 		fl_set_key_val(tb, &key->ipv6.src, TCA_FLOWER_KEY_IPV6_SRC,
 			       &mask->ipv6.src, TCA_FLOWER_KEY_IPV6_SRC_MASK,
 			       sizeof(key->ipv6.src));
 		fl_set_key_val(tb, &key->ipv6.dst, TCA_FLOWER_KEY_IPV6_DST,
 			       &mask->ipv6.dst, TCA_FLOWER_KEY_IPV6_DST_MASK,
 			       sizeof(key->ipv6.dst));
+		if (FL_KEY_IS_MASKED(mask, ipv6)) {
+			key->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+			mask->control.addr_type = ~0;
+		}
 	}
 
 	if (key->basic.ip_proto == IPPROTO_TCP) {
@@ -1581,8 +1605,6 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 
 	if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
 	    tb[TCA_FLOWER_KEY_ENC_IPV4_DST]) {
-		key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-		mask->enc_control.addr_type = ~0;
 		fl_set_key_val(tb, &key->enc_ipv4.src,
 			       TCA_FLOWER_KEY_ENC_IPV4_SRC,
 			       &mask->enc_ipv4.src,
@@ -1593,12 +1615,15 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 			       &mask->enc_ipv4.dst,
 			       TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
 			       sizeof(key->enc_ipv4.dst));
+		if (FL_KEY_IS_MASKED(mask, enc_ipv4)) {
+			key->enc_control.addr_type =
+				FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+			mask->enc_control.addr_type = ~0;
+		}
 	}
 
 	if (tb[TCA_FLOWER_KEY_ENC_IPV6_SRC] ||
 	    tb[TCA_FLOWER_KEY_ENC_IPV6_DST]) {
-		key->enc_control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
-		mask->enc_control.addr_type = ~0;
 		fl_set_key_val(tb, &key->enc_ipv6.src,
 			       TCA_FLOWER_KEY_ENC_IPV6_SRC,
 			       &mask->enc_ipv6.src,
@@ -1609,6 +1634,11 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 			       &mask->enc_ipv6.dst,
 			       TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
 			       sizeof(key->enc_ipv6.dst));
+		if (FL_KEY_IS_MASKED(mask, enc_ipv6)) {
+			key->enc_control.addr_type =
+				FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+			mask->enc_control.addr_type = ~0;
+		}
 	}
 
 	fl_set_key_val(tb, &key->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
@@ -1667,26 +1697,6 @@ static int fl_init_mask_hashtable(struct fl_flow_mask *mask)
 	return rhashtable_init(&mask->ht, &mask->filter_ht_params);
 }
 
-#define FL_KEY_MEMBER_OFFSET(member) offsetof(struct fl_flow_key, member)
-#define FL_KEY_MEMBER_SIZE(member) sizeof_field(struct fl_flow_key, member)
-
-#define FL_KEY_IS_MASKED(mask, member)						\
-	memchr_inv(((char *)mask) + FL_KEY_MEMBER_OFFSET(member),		\
-		   0, FL_KEY_MEMBER_SIZE(member))				\
-
-#define FL_KEY_SET(keys, cnt, id, member)					\
-	do {									\
-		keys[cnt].key_id = id;						\
-		keys[cnt].offset = FL_KEY_MEMBER_OFFSET(member);		\
-		cnt++;								\
-	} while(0);
-
-#define FL_KEY_SET_IF_MASKED(mask, keys, cnt, id, member)			\
-	do {									\
-		if (FL_KEY_IS_MASKED(mask, member))				\
-			FL_KEY_SET(keys, cnt, id, member);			\
-	} while(0);
-
 static void fl_init_dissector(struct flow_dissector *dissector,
 			      struct fl_flow_key *mask)
 {
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 2/3] cls_flower: Pass the unmasked key to hw
From: dsatish @ 2020-06-19  9:41 UTC (permalink / raw)
  To: davem
  Cc: jhs, xiyou.wangcong, jiri, kuba, netdev, simon.horman, kesavac,
	satish.d, prathibha.nagooru, intiyaz.basha, jai.rana
In-Reply-To: <20200619094156.31184-1-satish.d@oneconvergence.com>

Pass the unmasked key along with the masked key to the hardware.
This enables hardware to manage its own tables better based on the
hardware features/capabilities.

Signed-off-by: Chandra Kesava <kesavac@gmail.com>
Signed-off-by: Prathibha Nagooru <prathibha.nagooru@oneconvergence.com>
Signed-off-by: Satish Dhote <satish.d@oneconvergence.com>
---
 include/net/flow_offload.h |  45 ++++++++++
 net/core/flow_offload.c    | 171 +++++++++++++++++++++++++++++++++++++
 net/sched/cls_flower.c     |  43 ++++++++++
 3 files changed, 259 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index f2c8311a0433..26c6bd6bdb98 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -11,6 +11,8 @@ struct flow_match {
 	struct flow_dissector	*dissector;
 	void			*mask;
 	void			*key;
+	void			*unmasked_key;
+	struct flow_dissector	*unmasked_key_dissector;
 };
 
 struct flow_match_meta {
@@ -118,6 +120,49 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 void flow_rule_match_ct(const struct flow_rule *rule,
 			struct flow_match_ct *out);
 
+void flow_rule_match_unmasked_key_meta(const struct flow_rule *rule,
+				       struct flow_match_meta *out);
+void flow_rule_match_unmasked_key_basic(const struct flow_rule *rule,
+					struct flow_match_basic *out);
+void flow_rule_match_unmasked_key_control(const struct flow_rule *rule,
+					  struct flow_match_control *out);
+void flow_rule_match_unmasked_key_eth_addrs(const struct flow_rule *rule,
+					    struct flow_match_eth_addrs *out);
+void flow_rule_match_unmasked_key_vlan(const struct flow_rule *rule,
+				       struct flow_match_vlan *out);
+void flow_rule_match_unmasked_key_cvlan(const struct flow_rule *rule,
+					struct flow_match_vlan *out);
+void flow_rule_match_unmasked_key_ipv4_addrs(const struct flow_rule *rule,
+					     struct flow_match_ipv4_addrs *out);
+void flow_rule_match_unmasked_key_ipv6_addrs(const struct flow_rule *rule,
+					     struct flow_match_ipv6_addrs *out);
+void flow_rule_match_unmasked_key_ip(const struct flow_rule *rule,
+				     struct flow_match_ip *out);
+void flow_rule_match_unmasked_key_ports(const struct flow_rule *rule,
+					struct flow_match_ports *out);
+void flow_rule_match_unmasked_key_tcp(const struct flow_rule *rule,
+				      struct flow_match_tcp *out);
+void flow_rule_match_unmasked_key_icmp(const struct flow_rule *rule,
+				       struct flow_match_icmp *out);
+void flow_rule_match_unmasked_key_mpls(const struct flow_rule *rule,
+				       struct flow_match_mpls *out);
+void flow_rule_match_unmasked_key_enc_control(const struct flow_rule *rule,
+					      struct flow_match_control *out);
+void flow_rule_match_unmasked_key_enc_ipv4_addrs(const struct flow_rule *rule,
+					    struct flow_match_ipv4_addrs *out);
+void flow_rule_match_unmasked_key_enc_ipv6_addrs(const struct flow_rule *rule,
+					    struct flow_match_ipv6_addrs *out);
+void flow_rule_match_unmasked_key_enc_ip(const struct flow_rule *rule,
+					 struct flow_match_ip *out);
+void flow_rule_match_unmasked_key_enc_ports(const struct flow_rule *rule,
+					    struct flow_match_ports *out);
+void flow_rule_match_unmasked_key_enc_keyid(const struct flow_rule *rule,
+					    struct flow_match_enc_keyid *out);
+void flow_rule_match_unmasked_key_enc_opts(const struct flow_rule *rule,
+					   struct flow_match_enc_opts *out);
+void flow_rule_match_unmasked_key_ct(const struct flow_rule *rule,
+				     struct flow_match_ct *out);
+
 enum flow_action_id {
 	FLOW_ACTION_ACCEPT		= 0,
 	FLOW_ACTION_DROP,
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 0cfc35e6be28..a98c31e864b1 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -173,6 +173,177 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
 
+#define FLOW_UNMASKED_KEY_DISSECTOR_MATCH(__rule, __type, __out)	\
+	const struct flow_match *__m = &(__rule)->match;		\
+	struct flow_dissector *__d = (__m)->unmasked_key_dissector;	\
+									\
+	(__out)->key = skb_flow_dissector_target(__d, __type,		\
+						 (__m)->unmasked_key);	\
+	(__out)->mask = skb_flow_dissector_target(__d, __type,		\
+						  (__m)->mask)		\
+
+void flow_rule_match_unmasked_key_meta(const struct flow_rule *rule,
+				       struct flow_match_meta *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_META, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_meta);
+
+void
+flow_rule_match_unmasked_key_basic(const struct flow_rule *rule,
+				   struct flow_match_basic *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_BASIC, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_basic);
+
+void flow_rule_match_unmasked_key_control(const struct flow_rule *rule,
+					  struct flow_match_control *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_CONTROL,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_control);
+
+void flow_rule_match_unmasked_key_eth_addrs(const struct flow_rule *rule,
+					    struct flow_match_eth_addrs *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_eth_addrs);
+
+void flow_rule_match_unmasked_key_vlan(const struct flow_rule *rule,
+				       struct flow_match_vlan *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_VLAN, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_vlan);
+
+void flow_rule_match_unmasked_key_cvlan(const struct flow_rule *rule,
+					struct flow_match_vlan *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_CVLAN, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_cvlan);
+
+void flow_rule_match_unmasked_key_ipv4_addrs(const struct flow_rule *rule,
+					     struct flow_match_ipv4_addrs *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_IPV4_ADDRS,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_ipv4_addrs);
+
+void flow_rule_match_unmasked_key_ipv6_addrs(const struct flow_rule *rule,
+					     struct flow_match_ipv6_addrs *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_IPV6_ADDRS,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_ipv6_addrs);
+
+void flow_rule_match_unmasked_key_ip(const struct flow_rule *rule,
+				     struct flow_match_ip *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_IP, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_ip);
+
+void flow_rule_match_unmasked_key_ports(const struct flow_rule *rule,
+					struct flow_match_ports *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_PORTS, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_ports);
+
+void flow_rule_match_unmasked_key_tcp(const struct flow_rule *rule,
+				      struct flow_match_tcp *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_TCP, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_tcp);
+
+void flow_rule_match_unmasked_key_icmp(const struct flow_rule *rule,
+				       struct flow_match_icmp *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_ICMP, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_icmp);
+
+void flow_rule_match_unmasked_key_mpls(const struct flow_rule *rule,
+				       struct flow_match_mpls *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_MPLS, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_mpls);
+
+void flow_rule_match_unmasked_key_enc_control(const struct flow_rule *rule,
+					      struct flow_match_control *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_ENC_CONTROL,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_enc_control);
+
+void
+flow_rule_match_unmasked_key_enc_ipv4_addrs(const struct flow_rule *rule,
+					    struct flow_match_ipv4_addrs *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule,
+					  FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_enc_ipv4_addrs);
+
+void
+flow_rule_match_unmasked_key_enc_ipv6_addrs(const struct flow_rule *rule,
+					    struct flow_match_ipv6_addrs *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule,
+					  FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_enc_ipv6_addrs);
+
+void flow_rule_match_unmasked_key_enc_ip(const struct flow_rule *rule,
+					 struct flow_match_ip *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_ENC_IP, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_enc_ip);
+
+void flow_rule_match_unmasked_key_enc_ports(const struct flow_rule *rule,
+					    struct flow_match_ports *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_ENC_PORTS,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_enc_ports);
+
+void flow_rule_match_unmasked_key_enc_keyid(const struct flow_rule *rule,
+					    struct flow_match_enc_keyid *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_ENC_KEYID,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_enc_keyid);
+
+void flow_rule_match_unmasked_key_enc_opts(const struct flow_rule *rule,
+					   struct flow_match_enc_opts *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_ENC_OPTS,
+					  out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_enc_opts);
+
+void flow_rule_match_unmasked_key_ct(const struct flow_rule *rule,
+				     struct flow_match_ct *out)
+{
+	FLOW_UNMASKED_KEY_DISSECTOR_MATCH(rule, FLOW_DISSECTOR_KEY_CT, out);
+}
+EXPORT_SYMBOL(flow_rule_match_unmasked_key_ct);
+
 struct flow_action_cookie *flow_action_cookie_create(void *data,
 						     unsigned int len,
 						     gfp_t gfp)
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 64b70d396397..f1a5352cbb04 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -121,6 +121,7 @@ struct cls_fl_filter {
 	 */
 	refcount_t refcnt;
 	bool deleted;
+	struct flow_dissector unmasked_key_dissector;
 };
 
 static const struct rhashtable_params mask_ht_params = {
@@ -449,6 +450,13 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	cls_flower.rule->match.key = &f->mkey;
 	cls_flower.classid = f->res.classid;
 
+	/* Pass unmasked key and corresponding dissector also to the driver,
+	 * hardware may optimize its flow table based on its capabilities.
+	 */
+	cls_flower.rule->match.unmasked_key = &f->key;
+	cls_flower.rule->match.unmasked_key_dissector =
+						&f->unmasked_key_dissector;
+
 	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
 	if (err) {
 		kfree(cls_flower.rule);
@@ -1753,6 +1761,39 @@ static void fl_init_dissector(struct flow_dissector *dissector,
 	skb_flow_dissector_init(dissector, keys, cnt);
 }
 
+/* Initialize dissector for unmasked key. */
+static void fl_init_unmasked_key_dissector(struct flow_dissector *dissector)
+{
+	struct flow_dissector_key keys[FLOW_DISSECTOR_KEY_MAX];
+	size_t cnt = 0;
+
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_META, meta);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_CONTROL, control);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_BASIC, basic);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ETH_ADDRS, eth);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS_RANGE, tp_range);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_IP, ip);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_TCP, tcp);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ICMP, icmp);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ARP, arp);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_MPLS, mpls);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_VLAN, vlan);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_CVLAN, cvlan);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_KEYID, enc_key_id);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS, enc_ipv4);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS, enc_ipv6);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_CONTROL, enc_control);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_PORTS, enc_tp);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_IP, enc_ip);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
+	FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_CT, ct);
+
+	skb_flow_dissector_init(dissector, keys, cnt);
+}
+
 static struct fl_flow_mask *fl_create_new_mask(struct cls_fl_head *head,
 					       struct fl_flow_mask *mask)
 {
@@ -1980,6 +2021,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (err)
 		goto errout;
 
+	fl_init_unmasked_key_dissector(&fnew->unmasked_key_dissector);
+
 	err = fl_ht_insert_unique(fnew, fold, &in_ht);
 	if (err)
 		goto errout_mask;
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 3/3] cls_flower: Allow flow offloading though masked key exist.
From: dsatish @ 2020-06-19  9:41 UTC (permalink / raw)
  To: davem
  Cc: jhs, xiyou.wangcong, jiri, kuba, netdev, simon.horman, kesavac,
	satish.d, prathibha.nagooru, intiyaz.basha, jai.rana
In-Reply-To: <20200619094156.31184-1-satish.d@oneconvergence.com>

A packet reaches OVS user space, only if, either there is no rule in
datapath/hardware or there is race condition that the flow is added
to hardware but before it is processed another packet arrives.

It is possible hardware as part of its limitations/optimizations
remove certain flows. To handle such cases where the hardware lost
the flows, tc can offload to hardware if it receives a flow for which
there exists an entry in its flow table. To handle such cases TC when
it returns EEXIST error, also programs the flow in hardware, if
hardware offload is enabled.

Signed-off-by: Chandra Kesava <kesavac@gmail.com>
Signed-off-by: Prathibha Nagooru <prathibha.nagooru@oneconvergence.com>
Signed-off-by: Satish Dhote <satish.d@oneconvergence.com>
---
 net/sched/cls_flower.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index f1a5352cbb04..d718233cd5b9 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -431,7 +431,8 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 
 static int fl_hw_replace_filter(struct tcf_proto *tp,
 				struct cls_fl_filter *f, bool rtnl_held,
-				struct netlink_ext_ack *extack)
+				struct netlink_ext_ack *extack,
+				unsigned long cookie)
 {
 	struct tcf_block *block = tp->chain->block;
 	struct flow_cls_offload cls_flower = {};
@@ -444,7 +445,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
 	cls_flower.command = FLOW_CLS_REPLACE;
-	cls_flower.cookie = (unsigned long) f;
+	cls_flower.cookie = cookie;
 	cls_flower.rule->match.dissector = &f->mask->dissector;
 	cls_flower.rule->match.mask = &f->mask->key;
 	cls_flower.rule->match.key = &f->mkey;
@@ -2024,11 +2025,25 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	fl_init_unmasked_key_dissector(&fnew->unmasked_key_dissector);
 
 	err = fl_ht_insert_unique(fnew, fold, &in_ht);
-	if (err)
+	if (err) {
+		/* It is possible Hardware lost the flow even though TC has it,
+		 * and flow miss in hardware causes controller to offload flow again.
+		 */
+		if (err == -EEXIST && !tc_skip_hw(fnew->flags)) {
+			struct cls_fl_filter *f =
+				__fl_lookup(fnew->mask, &fnew->mkey);
+
+			if (f)
+				fl_hw_replace_filter(tp, fnew, rtnl_held,
+						     extack,
+						     (unsigned long)(f));
+		}
 		goto errout_mask;
+	}
 
 	if (!tc_skip_hw(fnew->flags)) {
-		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
+		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack,
+					   (unsigned long)fnew);
 		if (err)
 			goto errout_ht;
 	}
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v3 1/1] eventfd: implementation of EFD_MASK flag
From: Paul Elder @ 2020-06-19 10:16 UTC (permalink / raw)
  To: Damian Hobson-Garcia
  Cc: viro, sustrik, linux-kernel, linux-fsdevel, linux-api, netdev,
	David.Laight, laurent.pinchart
In-Reply-To: <1444873328-8466-2-git-send-email-dhobsong@igel.co.jp>

Hello Damian, Martin, and all,

I came across this (quite old by now) patch to extend eventfd's polling
functionality. I was wondering what happened to it (why it hasn't been
merged yet) and if we could, or what is needed to, move it forward.

I was thinking to use it for V4L2 events support via polling in the V4L2
compatibility layer for libcamera [1]. We can signal V4L2 buffer
availability POLLOUT via write(), but there is no way to signal V4L2
events, as they are signaled via POLLPRI.


Thank you,

Paul

[1] https://libcamera.org/docs.html#id1

On Thu, Oct 15, 2015 at 10:42:08AM +0900, Damian Hobson-Garcia wrote:
> From: Martin Sustrik <sustrik@250bpm.com>
> 
> When implementing network protocols in user space, one has to implement
> fake file descriptors to represent the sockets for the protocol.
> 
> Polling on such fake file descriptors is a problem (poll/select/epoll
> accept only true file descriptors) and forces protocol implementers to use
> various workarounds resulting in complex, non-standard and convoluted APIs.
> 
> More generally, ability to create full-blown file descriptors for
> userspace-to-userspace signalling is missing. While eventfd(2) goes half
> the way towards this goal it has follwoing shorcomings:
> 
> I.  There's no way to signal POLLPRI, POLLHUP etc.
> II. There's no way to signal arbitrary combination of POLL* flags. Most
>     notably, simultaneous !POLLIN and !POLLOUT, which is a perfectly valid
>     combination for a network protocol (rx buffer is empty and tx buffer is
>     full), cannot be signaled using eventfd.
> 
> This patch implements new EFD_MASK flag which solves the above problems.
> 
> The semantics of EFD_MASK are as follows:
> 
> eventfd(2):
> 
> If eventfd is created with EFD_MASK flag set, it is initialised in such a
> way as to signal no events on the file descriptor when it is polled on.
> The 'initval' argument is ignored.
> 
> write(2):
> 
> User is allowed to write only buffers containing a 32-bit value
> representing any combination of event flags as defined by the poll(2)
> function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.). Specified events
> will be signaled when polling (select, poll, epoll) on the eventfd is
> done later on.
> 
> read(2):
> 
> read is not supported and will fail with EINVAL.
> 
> select(2), poll(2) and similar:
> 
> When polling on the eventfd marked by EFD_MASK flag, all the events
> specified in last written event flags shall be signaled.
> 
> Signed-off-by: Martin Sustrik <sustrik@250bpm.com>
> 
> [dhobsong@igel.co.jp: Rebased, and resubmitted for Linux 4.3]
> Signed-off-by: Damian Hobson-Garcia <dhobsong@igel.co.jp>
> ---
>  fs/eventfd.c                 | 102 ++++++++++++++++++++++++++++++++++++++-----
>  include/linux/eventfd.h      |  16 +------
>  include/uapi/linux/eventfd.h |  33 ++++++++++++++
>  3 files changed, 126 insertions(+), 25 deletions(-)
>  create mode 100644 include/uapi/linux/eventfd.h
> 
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 8d0c0df..1310779 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -2,6 +2,7 @@
>   *  fs/eventfd.c
>   *
>   *  Copyright (C) 2007  Davide Libenzi <davidel@xmailserver.org>
> + *  Copyright (C) 2013  Martin Sustrik <sustrik@250bpm.com>
>   *
>   */
>  
> @@ -22,18 +23,31 @@
>  #include <linux/proc_fs.h>
>  #include <linux/seq_file.h>
>  
> +#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
> +#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_MASK)
> +#define EFD_MASK_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP)
> +
>  struct eventfd_ctx {
>  	struct kref kref;
>  	wait_queue_head_t wqh;
> -	/*
> -	 * Every time that a write(2) is performed on an eventfd, the
> -	 * value of the __u64 being written is added to "count" and a
> -	 * wakeup is performed on "wqh". A read(2) will return the "count"
> -	 * value to userspace, and will reset "count" to zero. The kernel
> -	 * side eventfd_signal() also, adds to the "count" counter and
> -	 * issue a wakeup.
> -	 */
> -	__u64 count;
> +	union {
> +		/*
> +		 * Every time that a write(2) is performed on an eventfd, the
> +		 * value of the __u64 being written is added to "count" and a
> +		 * wakeup is performed on "wqh". A read(2) will return the
> +		 * "count" value to userspace, and will reset "count" to zero.
> +		 * The kernel side eventfd_signal() also, adds to the "count"
> +		 * counter and issue a wakeup.
> +		 */
> +		__u64 count;
> +
> +		/*
> +		 * When using eventfd in EFD_MASK mode this stracture stores the
> +		 * current events to be signaled on the eventfd (events member)
> +		 * along with opaque user-defined data (data member).
> +		 */
> +		__u32 events;
> +	};
>  	unsigned int flags;
>  };
>  
> @@ -134,6 +148,14 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
>  	return events;
>  }
>  
> +static unsigned int eventfd_mask_poll(struct file *file, poll_table *wait)
> +{
> +	struct eventfd_ctx *ctx = file->private_data;
> +
> +	poll_wait(file, &ctx->wqh, wait);
> +	return ctx->events;
> +}
> +
>  static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
>  {
>  	*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
> @@ -239,6 +261,14 @@ static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
>  	return put_user(cnt, (__u64 __user *) buf) ? -EFAULT : sizeof(cnt);
>  }
>  
> +static ssize_t eventfd_mask_read(struct file *file, char __user *buf,
> +			    size_t count,
> +			    loff_t *ppos)
> +{
> +	return -EINVAL;
> +}
> +
> +
>  static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
>  			     loff_t *ppos)
>  {
> @@ -286,6 +316,28 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>  	return res;
>  }
>  
> +static ssize_t eventfd_mask_write(struct file *file, const char __user *buf,
> +			     size_t count,
> +			     loff_t *ppos)
> +{
> +	struct eventfd_ctx *ctx = file->private_data;
> +	__u32 events;
> +
> +	if (count < sizeof(events))
> +		return -EINVAL;
> +	if (copy_from_user(&events, buf, sizeof(events)))
> +		return -EFAULT;
> +	if (events & ~EFD_MASK_VALID_EVENTS)
> +		return -EINVAL;
> +	spin_lock_irq(&ctx->wqh.lock);
> +	memcpy(&ctx->events, &events, sizeof(ctx->events));
> +	if (waitqueue_active(&ctx->wqh))
> +		wake_up_locked_poll(&ctx->wqh,
> +			(unsigned long)ctx->events);
> +	spin_unlock_irq(&ctx->wqh.lock);
> +	return sizeof(ctx->events);
> +}
> +
>  #ifdef CONFIG_PROC_FS
>  static void eventfd_show_fdinfo(struct seq_file *m, struct file *f)
>  {
> @@ -296,6 +348,16 @@ static void eventfd_show_fdinfo(struct seq_file *m, struct file *f)
>  		   (unsigned long long)ctx->count);
>  	spin_unlock_irq(&ctx->wqh.lock);
>  }
> +
> +static void eventfd_mask_show_fdinfo(struct seq_file *m, struct file *f)
> +{
> +	struct eventfd_ctx *ctx = f->private_data;
> +
> +	spin_lock_irq(&ctx->wqh.lock);
> +	seq_printf(m, "eventfd-mask: %x\n",
> +		ctx->events);
> +	spin_unlock_irq(&ctx->wqh.lock);
> +}
>  #endif
>  
>  static const struct file_operations eventfd_fops = {
> @@ -309,6 +371,17 @@ static const struct file_operations eventfd_fops = {
>  	.llseek		= noop_llseek,
>  };
>  
> +static const struct file_operations eventfd_mask_fops = {
> +#ifdef CONFIG_PROC_FS
> +	.show_fdinfo	= eventfd_mask_show_fdinfo,
> +#endif
> +	.release	= eventfd_release,
> +	.poll		= eventfd_mask_poll,
> +	.read		= eventfd_mask_read,
> +	.write		= eventfd_mask_write,
> +	.llseek		= noop_llseek,
> +};
> +
>  /**
>   * eventfd_fget - Acquire a reference of an eventfd file descriptor.
>   * @fd: [in] Eventfd file descriptor.
> @@ -392,6 +465,7 @@ struct file *eventfd_file_create(unsigned int count, int flags)
>  {
>  	struct file *file;
>  	struct eventfd_ctx *ctx;
> +	const struct file_operations *fops;
>  
>  	/* Check the EFD_* constants for consistency.  */
>  	BUILD_BUG_ON(EFD_CLOEXEC != O_CLOEXEC);
> @@ -406,10 +480,16 @@ struct file *eventfd_file_create(unsigned int count, int flags)
>  
>  	kref_init(&ctx->kref);
>  	init_waitqueue_head(&ctx->wqh);
> -	ctx->count = count;
> +	if (flags & EFD_MASK) {
> +		ctx->events = 0;
> +		fops = &eventfd_mask_fops;
> +	} else {
> +		ctx->count = count;
> +		fops = &eventfd_fops;
> +	}
>  	ctx->flags = flags;
>  
> -	file = anon_inode_getfile("[eventfd]", &eventfd_fops, ctx,
> +	file = anon_inode_getfile("[eventfd]", fops, ctx,
>  				  O_RDWR | (flags & EFD_SHARED_FCNTL_FLAGS));
>  	if (IS_ERR(file))
>  		eventfd_free_ctx(ctx);
> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
> index ff0b981..87de343 100644
> --- a/include/linux/eventfd.h
> +++ b/include/linux/eventfd.h
> @@ -8,23 +8,11 @@
>  #ifndef _LINUX_EVENTFD_H
>  #define _LINUX_EVENTFD_H
>  
> +#include <uapi/linux/eventfd.h>
> +
>  #include <linux/fcntl.h>
>  #include <linux/wait.h>
>  
> -/*
> - * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
> - * new flags, since they might collide with O_* ones. We want
> - * to re-use O_* flags that couldn't possibly have a meaning
> - * from eventfd, in order to leave a free define-space for
> - * shared O_* flags.
> - */
> -#define EFD_SEMAPHORE (1 << 0)
> -#define EFD_CLOEXEC O_CLOEXEC
> -#define EFD_NONBLOCK O_NONBLOCK
> -
> -#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
> -#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
> -
>  struct file;
>  
>  #ifdef CONFIG_EVENTFD
> diff --git a/include/uapi/linux/eventfd.h b/include/uapi/linux/eventfd.h
> new file mode 100644
> index 0000000..097dcad
> --- /dev/null
> +++ b/include/uapi/linux/eventfd.h
> @@ -0,0 +1,33 @@
> +/*
> + *  Copyright (C) 2013 Martin Sustrik <sustrik@250bpm.com>
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + */
> +
> +#ifndef _UAPI_LINUX_EVENTFD_H
> +#define _UAPI_LINUX_EVENTFD_H
> +
> +/* For O_CLOEXEC */
> +#include <linux/fcntl.h>
> +#include <linux/types.h>
> +
> +/*
> + * CAREFUL: Check include/asm-generic/fcntl.h when defining
> + * new flags, since they might collide with O_* ones. We want
> + * to re-use O_* flags that couldn't possibly have a meaning
> + * from eventfd, in order to leave a free define-space for
> + * shared O_* flags.
> + */
> +
> +/* Provide semaphore-like semantics for reads from the eventfd. */
> +#define EFD_SEMAPHORE (1 << 0)
> +/* Provide event mask semantics for the eventfd. */
> +#define EFD_MASK (1 << 1)
> +/*  Set the close-on-exec (FD_CLOEXEC) flag on the eventfd. */
> +#define EFD_CLOEXEC O_CLOEXEC
> +/*  Create the eventfd in non-blocking mode. */
> +#define EFD_NONBLOCK O_NONBLOCK
> +#endif /* _UAPI_LINUX_EVENTFD_H */
> -- 
> 1.9.1

^ permalink raw reply

* Re: [PATCH] linux++, this: rename "struct notifier_block *this"
From: Jan Engelhardt @ 2020-06-19 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexey Dobriyan, torvalds, linux-kernel, netdev, linux-arch,
	netfilter-devel
In-Reply-To: <20200619074631.GA1427@infradead.org>

On Friday 2020-06-19 09:46, Christoph Hellwig wrote:

>On Fri, Jun 19, 2020 at 12:06:45AM +0300, Alexey Dobriyan wrote:
>> Rename
>> 	struct notifier_block *this
>> to
>> 	struct notifier_block *nb
>> 
>> "nb" is arguably a better name for notifier block.
>
>But not enough better to cause tons of pointless churn.  Feel free
>to use better naming in new code you write or do significant changes
>to, but stop these pointless renames.

Well, judging from the mention of "linux++" in the subject, I figure 
this is the old discussion of someone trying to make Linux code, or 
parts thereof, work in a C++ environment. Since the patch does not just 
touch headers but .c files, I deduce that there seems to be a project 
trying to build Linux, or a subset thereof, as a C++ program for the fun 
of it. UML could come to mind.

Is it a hot potato? Definitely. But so was IPv6 NAT, and now we have it 
anyway.

^ permalink raw reply

* Re: [RFC PATCH v13 7/9] arm64/kvm: Add hypercall service for kvm ptp.
From: Steven Price @ 2020-06-19 10:44 UTC (permalink / raw)
  To: Jianyong Wu, netdev@vger.kernel.org, yangbo.lu@nxp.com,
	john.stultz@linaro.org, tglx@linutronix.de, pbonzini@redhat.com,
	sean.j.christopherson@intel.com, maz@kernel.org,
	richardcochran@gmail.com, Mark Rutland, will@kernel.org,
	Suzuki Poulose
  Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, Steve Capper,
	Kaly Xin, Justin He, Wei Chen, nd
In-Reply-To: <20200619093033.58344-8-jianyong.wu@arm.com>

On 19/06/2020 10:30, Jianyong Wu wrote:
> ptp_kvm will get this service through smccc call.
> The service offers wall time and counter cycle of host for guest.
> caller must explicitly determines which cycle of virtual counter or
> physical counter to return if it needs counter cycle.
> 
> Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
> ---
>   arch/arm64/kvm/Kconfig      |  6 +++++
>   arch/arm64/kvm/hypercalls.c | 50 +++++++++++++++++++++++++++++++++++++
>   include/linux/arm-smccc.h   | 30 ++++++++++++++++++++++
>   3 files changed, 86 insertions(+)
> 
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 13489aff4440..79091f6e5e7a 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -60,6 +60,12 @@ config KVM_ARM_PMU
>   config KVM_INDIRECT_VECTORS
>   	def_bool HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS
>   
> +config ARM64_KVM_PTP_HOST
> +	bool "KVM PTP host service for arm64"
> +	default y
> +	help
> +	  virtual kvm ptp clock hypercall service for arm64
> +
>   endif # KVM
>   
>   endif # VIRTUALIZATION
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index db6dce3d0e23..366b0646c360 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -3,6 +3,7 @@
>   
>   #include <linux/arm-smccc.h>
>   #include <linux/kvm_host.h>
> +#include <linux/clocksource_ids.h>
>   
>   #include <asm/kvm_emulate.h>
>   
> @@ -11,6 +12,10 @@
>   
>   int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>   {
> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> +	struct system_time_snapshot systime_snapshot;
> +	u64 cycles = 0;
> +#endif
>   	u32 func_id = smccc_get_function(vcpu);
>   	u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
>   	u32 feature;
> @@ -70,7 +75,52 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>   		break;
>   	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
>   		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> +
> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> +		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_KVM_PTP);
> +#endif
> +		break;
> +
> +#ifdef CONFIG_ARM64_KVM_PTP_HOST
> +	/*
> +	 * This serves virtual kvm_ptp.
> +	 * Four values will be passed back.
> +	 * reg0 stores high 32-bit host ktime;
> +	 * reg1 stores low 32-bit host ktime;
> +	 * reg2 stores high 32-bit difference of host cycles and cntvoff;
> +	 * reg3 stores low 32-bit difference of host cycles and cntvoff.
> +	 */
> +	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> +		/*
> +		 * system time and counter value must captured in the same
> +		 * time to keep consistency and precision.
> +		 */
> +		ktime_get_snapshot(&systime_snapshot);
> +		if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER)
> +			break;
> +		val[0] = upper_32_bits(systime_snapshot.real);
> +		val[1] = lower_32_bits(systime_snapshot.real);
> +		/*
> +		 * which of virtual counter or physical counter being
> +		 * asked for is decided by the first argument of smccc
> +		 * call. If no first argument or invalid argument, zero
> +		 * counter value will return;
> +		 */

It's not actually possible to have "no first argument" - there's no 
argument count, so whatever is in the register during the call with be 
passed. I'd also caution that "first argument" is ambigious: r0 could be 
the 'first' but is also the function number, here you mean r1.

There's also a subtle cast to 32 bits here (feature is u32), which might 
be worth a comment before someone 'optimises' by removing the 'feature' 
variable.

Finally I'm not sure if zero counter value is best - would it not be 
possible for this to be a valid counter value?

> +		feature = smccc_get_arg1(vcpu);
> +		switch (feature) {
> +		case ARM_PTP_VIRT_COUNTER:
> +			cycles = systime_snapshot.cycles -
> +			vcpu_vtimer(vcpu)->cntvoff;

Please indent the continuation line so that it's obvious.

> +			break;
> +		case ARM_PTP_PHY_COUNTER:
> +			cycles = systime_snapshot.cycles;
> +			break;
> +		}
> +		val[2] = upper_32_bits(cycles);
> +		val[3] = lower_32_bits(cycles);
>   		break;
> +#endif
> +
>   	default:
>   		return kvm_psci_call(vcpu);
>   	}
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index 86ff30131e7b..e593ec515f82 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -98,6 +98,9 @@
>   
>   /* KVM "vendor specific" services */
>   #define ARM_SMCCC_KVM_FUNC_FEATURES		0
> +#define ARM_SMCCC_KVM_FUNC_KVM_PTP		1
> +#define ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY		2
> +#define ARM_SMCCC_KVM_FUNC_KVM_PTP_VIRT		3
>   #define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
>   #define ARM_SMCCC_KVM_NUM_FUNCS			128
>   
> @@ -107,6 +110,33 @@
>   			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
>   			   ARM_SMCCC_KVM_FUNC_FEATURES)
>   
> +/*
> + * kvm_ptp is a feature used for time sync between vm and host.
> + * kvm_ptp module in guest kernel will get service from host using
> + * this hypercall ID.
> + */
> +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID				\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
> +			   ARM_SMCCC_SMC_32,				\
> +			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
> +			   ARM_SMCCC_KVM_FUNC_KVM_PTP)
> +
> +/*
> + * kvm_ptp may get counter cycle from host and should ask for which of
> + * physical counter or virtual counter by using ARM_PTP_PHY_COUNTER and
> + * ARM_PTP_VIRT_COUNTER explicitly.
> + */
> +#define ARM_PTP_PHY_COUNTER						\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
> +			   ARM_SMCCC_SMC_32,				\
> +			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
> +			   ARM_SMCCC_KVM_FUNC_KVM_PTP_PHY)
> +
> +#define ARM_PTP_VIRT_COUNTER						\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,				\
> +			   ARM_SMCCC_SMC_32,				\
> +			   ARM_SMCCC_OWNER_VENDOR_HYP,			\
> +			   ARM_SMCCC_KVM_FUNC_KVM_PTP_VIRT)

These two are not SMCCC calls themselves (just parameters to an SMCCC), 
so they really shouldn't be defined using ARM_SMCCC_CALL_VAL (it's just 
confusing and unnecessary). Can we not just pick small integers (e.g. 0 
and 1) for these?

We also need some documentation of these SMCCC calls somewhere which 
would make this sort of review easier. For instance for paravirtualised 
stolen time there is Documentation/virt/kvm/arm/pvtime.rst (which also 
links to the published document from Arm).

Steve

>   #ifndef __ASSEMBLY__
>   
>   #include <linux/linkage.h>
> 


^ permalink raw reply

* [PATCH iproute2] tc: flower: support multiple MPLS LSE match
From: Guillaume Nault @ 2020-06-19 10:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Add the new "mpls" keyword that can be used to match MPLS fields in
arbitrary Label Stack Entries.
LSEs are introduced by the "lse" keyword and followed by LSE options:
"depth", "label", "tc", "bos" and "ttl". The depth is manadtory, the
other options are optionals.

For example, the following filter drops MPLS packets having two labels,
where the first label is 21 and has TTL 64 and the second label is 22:

$ tc filter add dev ethX ingress proto mpls_uc flower mpls \
    lse depth 1 label 21 ttl 64 \
    lse depth 2 label 22 bos 1 \
    action drop

Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 man/man8/tc-flower.8 |  73 +++++++++++++-
 tc/f_flower.c        | 221 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 292 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 4d32ff1b..693be571 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -46,6 +46,8 @@ flower \- flow based traffic control filter
 .IR PRIORITY " | "
 .BR cvlan_ethtype " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
+.B mpls
+.IR LSE_LIST " | "
 .B mpls_label
 .IR LABEL " | "
 .B mpls_tc
@@ -96,7 +98,24 @@ flower \- flow based traffic control filter
 }
 .IR OPTIONS " | "
 .BR ip_flags
-.IR IP_FLAGS
+.IR IP_FLAGS " }"
+
+.ti -8
+.IR LSE_LIST " := [ " LSE_LIST " ] " LSE
+
+.ti -8
+.IR LSE " := "
+.B lse depth
+.IR DEPTH " { "
+.B label
+.IR LABEL " | "
+.B tc
+.IR TC " | "
+.B bos
+.IR BOS " | "
+.B ttl
+.IR TTL " }"
+
 .SH DESCRIPTION
 The
 .B flower
@@ -182,6 +201,56 @@ Match on QinQ layer three protocol.
 may be either
 .BR ipv4 ", " ipv6
 or an unsigned 16bit value in hexadecimal format.
+
+.TP
+.BI mpls " LSE_LIST"
+Match on the MPLS label stack.
+.I LSE_LIST
+is a list of Label Stack Entries, each introduced by the
+.BR lse " keyword."
+This option can't be used together with the standalone
+.BR mpls_label ", " mpls_tc ", " mpls_bos " and " mpls_ttl " options."
+.RS
+.TP
+.BI lse " LSE_OPTIONS"
+Match on an MPLS Label Stack Entry.
+.I LSE_OPTIONS
+is a list of options that describe the properties of the LSE to match.
+.RS
+.TP
+.BI depth " DEPTH"
+The depth of the Label Stack Entry to consider. Depth starts at 1 (the
+outermost Label Stack Entry). The maximum usable depth may be limitted by the
+kernel. This option is mandatory.
+.I DEPTH
+is an unsigned 8 bit value in decimal format.
+.TP
+.BI label " LABEL"
+Match on the MPLS Label field at the specified
+.BR depth .
+.I LABEL
+is an unsigned 20 bit value in decimal format.
+.TP
+.BI tc " TC"
+Match on the MPLS Traffic Class field at the specified
+.BR depth .
+.I TC
+is an unsigned 3 bit value in decimal format.
+.TP
+.BI bos " BOS"
+Match on the MPLS Bottom Of Stack field at the specified
+.BR depth .
+.I BOS
+is a 1 bit value in decimal format.
+.TP
+.BI ttl " TTL"
+Match on the MPLS Time To Live field at the specified
+.BR depth .
+.I TTL
+is an unsigned 8 bit value in decimal format.
+.RE
+.RE
+
 .TP
 .BI mpls_label " LABEL"
 Match the label id in the outermost MPLS label stack entry.
@@ -393,7 +462,7 @@ on the matches of the next lower layer. Precisely, layer one and two matches
 (\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
 have no dependency,
 MPLS and layer three matches
-(\fBmpls_label\fR, \fBmpls_tc\fR, \fBmpls_bos\fR, \fBmpls_ttl\fR,
+(\fBmpls\fR, \fBmpls_label\fR, \fBmpls_tc\fR, \fBmpls_bos\fR, \fBmpls_ttl\fR,
 \fBip_proto\fR, \fBdst_ip\fR, \fBsrc_ip\fR, \fBarp_tip\fR, \fBarp_sip\fR,
 \fBarp_op\fR, \fBarp_tha\fR, \fBarp_sha\fR and \fBip_flags\fR)
 depend on the
diff --git a/tc/f_flower.c b/tc/f_flower.c
index fc136911..00c919fd 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -59,6 +59,7 @@ static void explain(void)
 		"			ip_proto [tcp | udp | sctp | icmp | icmpv6 | IP-PROTO ] |\n"
 		"			ip_tos MASKED-IP_TOS |\n"
 		"			ip_ttl MASKED-IP_TTL |\n"
+		"			mpls LSE-LIST |\n"
 		"			mpls_label LABEL |\n"
 		"			mpls_tc TC |\n"
 		"			mpls_bos BOS |\n"
@@ -89,6 +90,8 @@ static void explain(void)
 		"			ct_label MASKED_CT_LABEL |\n"
 		"			ct_mark MASKED_CT_MARK |\n"
 		"			ct_zone MASKED_CT_ZONE }\n"
+		"	LSE-LIST := [ LSE-LIST ] LSE\n"
+		"	LSE := lse depth DEPTH { label LABEL | tc TC | bos BOS | ttl TTL }\n"
 		"	FILTERID := X:Y:Z\n"
 		"	MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS }\n"
 		"	MASKED_CT_STATE := combination of {+|-} and flags trk,est,new\n"
@@ -1199,11 +1202,127 @@ static int flower_parse_enc_opts_erspan(char *str, struct nlmsghdr *n)
 	return 0;
 }
 
+static int flower_parse_mpls_lse(int *argc_p, char ***argv_p,
+				 struct nlmsghdr *nlh)
+{
+	struct rtattr *lse_attr;
+	char **argv = *argv_p;
+	int argc = *argc_p;
+	__u8 depth = 0;
+	int ret;
+
+	lse_attr = addattr_nest(nlh, MAX_MSG,
+				TCA_FLOWER_KEY_MPLS_OPTS_LSE | NLA_F_NESTED);
+
+	while (argc > 0) {
+		if (matches(*argv, "depth") == 0) {
+			NEXT_ARG();
+			ret = get_u8(&depth, *argv, 10);
+			if (ret < 0 || depth < 1) {
+				fprintf(stderr, "Illegal \"depth\"\n");
+				return -1;
+			}
+			addattr8(nlh, MAX_MSG,
+				 TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH, depth);
+		} else if (matches(*argv, "label") == 0) {
+			__u32 label;
+
+			NEXT_ARG();
+			ret = get_u32(&label, *argv, 10);
+			if (ret < 0 ||
+			    label & ~(MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)) {
+				fprintf(stderr, "Illegal \"label\"\n");
+				return -1;
+			}
+			addattr32(nlh, MAX_MSG,
+				  TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL, label);
+		} else if (matches(*argv, "tc") == 0) {
+			__u8 tc;
+
+			NEXT_ARG();
+			ret = get_u8(&tc, *argv, 10);
+			if (ret < 0 ||
+			    tc & ~(MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)) {
+				fprintf(stderr, "Illegal \"tc\"\n");
+				return -1;
+			}
+			addattr8(nlh, MAX_MSG, TCA_FLOWER_KEY_MPLS_OPT_LSE_TC,
+				 tc);
+		} else if (matches(*argv, "bos") == 0) {
+			__u8 bos;
+
+			NEXT_ARG();
+			ret = get_u8(&bos, *argv, 10);
+			if (ret < 0 || bos & ~(MPLS_LS_S_MASK >> MPLS_LS_S_SHIFT)) {
+				fprintf(stderr, "Illegal \"bos\"\n");
+				return -1;
+			}
+			addattr8(nlh, MAX_MSG, TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS,
+				 bos);
+		} else if (matches(*argv, "ttl") == 0) {
+			__u8 ttl;
+
+			NEXT_ARG();
+			ret = get_u8(&ttl, *argv, 10);
+			if (ret < 0 || ttl & ~(MPLS_LS_TTL_MASK >> MPLS_LS_TTL_SHIFT)) {
+				fprintf(stderr, "Illegal \"ttl\"\n");
+				return -1;
+			}
+			addattr8(nlh, MAX_MSG, TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL,
+				 ttl);
+		} else {
+			break;
+		}
+		argc--; argv++;
+	}
+
+	if (!depth) {
+		missarg("depth");
+		return -1;
+	}
+
+	addattr_nest_end(nlh, lse_attr);
+
+	*argc_p = argc;
+	*argv_p = argv;
+
+	return 0;
+}
+
+static int flower_parse_mpls(int *argc_p, char ***argv_p, struct nlmsghdr *nlh)
+{
+	struct rtattr *mpls_attr;
+	char **argv = *argv_p;
+	int argc = *argc_p;
+
+	mpls_attr = addattr_nest(nlh, MAX_MSG,
+				 TCA_FLOWER_KEY_MPLS_OPTS | NLA_F_NESTED);
+
+	while (argc > 0) {
+		if (matches(*argv, "lse") == 0) {
+			NEXT_ARG();
+			if (flower_parse_mpls_lse(&argc, &argv, nlh) < 0)
+				return -1;
+		} else {
+			break;
+		}
+	}
+
+	addattr_nest_end(nlh, mpls_attr);
+
+	*argc_p = argc;
+	*argv_p = argv;
+
+	return 0;
+}
+
 static int flower_parse_opt(struct filter_util *qu, char *handle,
 			    int argc, char **argv, struct nlmsghdr *n)
 {
 	int ret;
 	struct tcmsg *t = NLMSG_DATA(n);
+	bool mpls_format_old = false;
+	bool mpls_format_new = false;
 	struct rtattr *tail;
 	__be16 eth_type = TC_H_MIN(t->tcm_info);
 	__be16 vlan_ethtype = 0;
@@ -1381,6 +1500,23 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 						 &cvlan_ethtype, n);
 			if (ret < 0)
 				return -1;
+		} else if (matches(*argv, "mpls") == 0) {
+			NEXT_ARG();
+			if (eth_type != htons(ETH_P_MPLS_UC) &&
+			    eth_type != htons(ETH_P_MPLS_MC)) {
+				fprintf(stderr,
+					"Can't set \"mpls\" if ethertype isn't MPLS\n");
+				return -1;
+			}
+			if (mpls_format_old) {
+				fprintf(stderr,
+					"Can't set \"mpls\" if \"mpls_label\", \"mpls_tc\", \"mpls_bos\" or \"mpls_ttl\" is set\n");
+				return -1;
+			}
+			mpls_format_new = true;
+			if (flower_parse_mpls(&argc, &argv, n) < 0)
+				return -1;
+			continue;
 		} else if (matches(*argv, "mpls_label") == 0) {
 			__u32 label;
 
@@ -1391,6 +1527,12 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 					"Can't set \"mpls_label\" if ethertype isn't MPLS\n");
 				return -1;
 			}
+			if (mpls_format_new) {
+				fprintf(stderr,
+					"Can't set \"mpls_label\" if \"mpls\" is set\n");
+				return -1;
+			}
+			mpls_format_old = true;
 			ret = get_u32(&label, *argv, 10);
 			if (ret < 0 || label & ~(MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)) {
 				fprintf(stderr, "Illegal \"mpls_label\"\n");
@@ -1407,6 +1549,12 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 					"Can't set \"mpls_tc\" if ethertype isn't MPLS\n");
 				return -1;
 			}
+			if (mpls_format_new) {
+				fprintf(stderr,
+					"Can't set \"mpls_tc\" if \"mpls\" is set\n");
+				return -1;
+			}
+			mpls_format_old = true;
 			ret = get_u8(&tc, *argv, 10);
 			if (ret < 0 || tc & ~(MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)) {
 				fprintf(stderr, "Illegal \"mpls_tc\"\n");
@@ -1423,6 +1571,12 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 					"Can't set \"mpls_bos\" if ethertype isn't MPLS\n");
 				return -1;
 			}
+			if (mpls_format_new) {
+				fprintf(stderr,
+					"Can't set \"mpls_bos\" if \"mpls\" is set\n");
+				return -1;
+			}
+			mpls_format_old = true;
 			ret = get_u8(&bos, *argv, 10);
 			if (ret < 0 || bos & ~(MPLS_LS_S_MASK >> MPLS_LS_S_SHIFT)) {
 				fprintf(stderr, "Illegal \"mpls_bos\"\n");
@@ -1439,6 +1593,12 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 					"Can't set \"mpls_ttl\" if ethertype isn't MPLS\n");
 				return -1;
 			}
+			if (mpls_format_new) {
+				fprintf(stderr,
+					"Can't set \"mpls_ttl\" if \"mpls\" is set\n");
+				return -1;
+			}
+			mpls_format_old = true;
 			ret = get_u8(&ttl, *argv, 10);
 			if (ret < 0 || ttl & ~(MPLS_LS_TTL_MASK >> MPLS_LS_TTL_SHIFT)) {
 				fprintf(stderr, "Illegal \"mpls_ttl\"\n");
@@ -2316,6 +2476,66 @@ static void flower_print_u32(const char *name, struct rtattr *attr)
 	print_uint(PRINT_ANY, name, namefrm, rta_getattr_u32(attr));
 }
 
+static void flower_print_mpls_opt_lse(const char *name, struct rtattr *lse)
+{
+	struct rtattr *tb[TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX + 1];
+	struct rtattr *attr;
+
+	if (lse->rta_type != (TCA_FLOWER_KEY_MPLS_OPTS_LSE | NLA_F_NESTED)) {
+		fprintf(stderr, "rta_type 0x%x, expecting 0x%x (0x%x & 0x%x)\n",
+		       lse->rta_type,
+		       TCA_FLOWER_KEY_MPLS_OPTS_LSE & NLA_F_NESTED,
+		       TCA_FLOWER_KEY_MPLS_OPTS_LSE, NLA_F_NESTED);
+		return;
+	}
+
+	parse_rtattr(tb, TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX, RTA_DATA(lse),
+		     RTA_PAYLOAD(lse));
+
+	print_nl();
+	open_json_array(PRINT_ANY, name);
+	attr = tb[TCA_FLOWER_KEY_MPLS_OPT_LSE_DEPTH];
+	if (attr)
+		print_hhu(PRINT_ANY, "depth", " depth %u",
+			  rta_getattr_u8(attr));
+	attr = tb[TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL];
+	if (attr)
+		print_uint(PRINT_ANY, "label", " label %u",
+			   rta_getattr_u32(attr));
+	attr = tb[TCA_FLOWER_KEY_MPLS_OPT_LSE_TC];
+	if (attr)
+		print_hhu(PRINT_ANY, "tc", " tc %u", rta_getattr_u8(attr));
+	attr = tb[TCA_FLOWER_KEY_MPLS_OPT_LSE_BOS];
+	if (attr)
+		print_hhu(PRINT_ANY, "bos", " bos %u", rta_getattr_u8(attr));
+	attr = tb[TCA_FLOWER_KEY_MPLS_OPT_LSE_TTL];
+	if (attr)
+		print_hhu(PRINT_ANY, "ttl", " ttl %u", rta_getattr_u8(attr));
+	close_json_array(PRINT_JSON, NULL);
+}
+
+static void flower_print_mpls_opts(const char *name, struct rtattr *attr)
+{
+	struct rtattr *lse;
+	int rem;
+
+	if (!attr || !(attr->rta_type & NLA_F_NESTED))
+		return;
+
+	print_nl();
+	open_json_array(PRINT_ANY, name);
+	rem = RTA_PAYLOAD(attr);
+	lse = RTA_DATA(attr);
+	while (RTA_OK(lse, rem)) {
+		flower_print_mpls_opt_lse("    lse", lse);
+		lse = RTA_NEXT(lse, rem);
+	};
+	if (rem)
+		fprintf(stderr, "!!!Deficit %d, rta_len=%d\n",
+			rem, lse->rta_len);
+	close_json_array(PRINT_JSON, NULL);
+}
+
 static void flower_print_arp_op(const char *name,
 				struct rtattr *op_attr,
 				struct rtattr *mask_attr)
@@ -2430,6 +2650,7 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
 	flower_print_ip_attr("ip_ttl", tb[TCA_FLOWER_KEY_IP_TTL],
 			    tb[TCA_FLOWER_KEY_IP_TTL_MASK]);
 
+	flower_print_mpls_opts("  mpls", tb[TCA_FLOWER_KEY_MPLS_OPTS]);
 	flower_print_u32("mpls_label", tb[TCA_FLOWER_KEY_MPLS_LABEL]);
 	flower_print_u8("mpls_tc", tb[TCA_FLOWER_KEY_MPLS_TC]);
 	flower_print_u8("mpls_bos", tb[TCA_FLOWER_KEY_MPLS_BOS]);
-- 
2.21.3


^ permalink raw reply related

* sock diag uAPI and MPTCP
From: Paolo Abeni @ 2020-06-19 10:54 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, mptcp

Hi,

IPPROTO_MPTCP value (0x106) can't be represented using the current sock
diag uAPI, as the 'sdiag_protocol' field is 8 bits wide.

To implement diag support for MPTCP socket, we will likely need a
'inet_diag_req_v3' with a wider 'sdiag_protocol'
field. inet_diag_handler_cmd() could detect the version of
the inet_diag_req_v* provided by user-space checking nlmsg_len() and
convert _v2 reqs to _v3.

This change will be a bit invasive, as all in kernel diag users will
then operate only on 'inet_diag_req_v3' (many functions' signature
change required), but the code-related changes will be encapsulated
by inet_diag_handler_cmd().

Would the above be acceptable?

Thanks for any feedback!

Paolo


^ permalink raw reply

* [PATCH net] openvswitch: take into account de-fragmentation in execute_check_pkt_len
From: Lorenzo Bianconi @ 2020-06-19 11:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, nusiddiq, gvrose8192, pshelar, lorenzo.bianconi, dev

ovs connection tracking module performs de-fragmentation on incoming
fragmented traffic. Take info account if traffic has been de-fragmented
in execute_check_pkt_len action otherwise we will perform the wrong
nested action considering the original packet size. This issue typically
occurs if ovs-vswitchd adds a rule in the pipeline that requires connection
tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action.

Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/openvswitch/actions.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index fc0efd8833c8..9f4dd64e53bb 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1169,9 +1169,10 @@ static int execute_check_pkt_len(struct datapath *dp, struct sk_buff *skb,
 				 struct sw_flow_key *key,
 				 const struct nlattr *attr, bool last)
 {
+	struct ovs_skb_cb *ovs_cb = OVS_CB(skb);
 	const struct nlattr *actions, *cpl_arg;
 	const struct check_pkt_len_arg *arg;
-	int rem = nla_len(attr);
+	int len, rem = nla_len(attr);
 	bool clone_flow_key;
 
 	/* The first netlink attribute in 'attr' is always
@@ -1180,7 +1181,8 @@ static int execute_check_pkt_len(struct datapath *dp, struct sk_buff *skb,
 	cpl_arg = nla_data(attr);
 	arg = nla_data(cpl_arg);
 
-	if (skb->len <= arg->pkt_len) {
+	len = ovs_cb->mru ? ovs_cb->mru : skb->len;
+	if (len <= arg->pkt_len) {
 		/* Second netlink attribute in 'attr' is always
 		 * 'OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL'.
 		 */
-- 
2.26.2


^ permalink raw reply related

* [PATCH iproute2-next] devlink: add 'disk' to 'fw_load_policy' string validation
From: louis.peens @ 2020-06-19 11:50 UTC (permalink / raw)
  To: netdev; +Cc: dsahem, oss-drivers, Louis Peens

From: Louis Peens <louis.peens@netronome.com>

The 'fw_load_policy' devlink parameter supports the 'disk' value
since kernel v5.4, seems like there was some oversight in adding
this to iproute, fixed by this patch.

Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 devlink/devlink.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 507972c3..a0a7770a 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -2349,6 +2349,11 @@ static const struct param_val_conv param_val_conv[] = {
 		.vstr = "flash",
 		.vuint = DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH,
 	},
+	{
+		.name = "fw_load_policy",
+		.vstr = "disk",
+		.vuint = DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DISK,
+	},
 	{
 		.name = "reset_dev_on_drv_probe",
 		.vstr = "unknown",
-- 
2.17.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox