Linux-HyperV List
* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-06-01 13:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230531174857.GDZHeIib57h5lT5Vh1@fat_crate.local>



On 31.05.23 19:48, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 04:20:08PM +0200, Juergen Gross wrote:
>> One other note: why does mtrr_cleanup() think that using 8 instead of 6
>> variable MTRRs would be an "optimal setting"?
> 
> Maybe the more extensive debug output below would help answer that...
> 
>> IMO it should replace the original setup only in case it is using _less_
>> MTRRs than before.
> 
> Right.

The attached patch will do that.


Juergen


[-- Attachment #1.1.2: v7-0001-x86-mtrr-Let-mtrr_cleanup-not-increase-number-of-.patch --]
[-- Type: text/x-patch, Size: 2367 bytes --]

From 7989ef9822115a708fc2ba3f7740888a350cb40f Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 1 Jun 2023 14:40:58 +0200
Subject: [PATCH v7] x86/mtrr: Let mtrr_cleanup() not increase number of used
 MTRRs

Today mtrr_cleanup() will always use the best found alternative MTRR
setting, even if this setting is using more variable MTRRs than the
BIOS provided setup.

Add a check so that only settings with fewer variable MTRRs are used.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V7:
- new patch
---
 arch/x86/kernel/cpu/mtrr/cleanup.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index a7cb5d32d03d..a5d331722092 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -567,7 +567,7 @@ static int __init mtrr_need_cleanup(void)
 	    num_var_ranges - num[MTRR_NUM_TYPES])
 		return 0;
 
-	return 1;
+	return num_var_ranges - num[MTRR_NUM_TYPES];
 }
 
 static unsigned long __initdata range_sums;
@@ -673,6 +673,7 @@ int __init mtrr_cleanup(void)
 	u64 chunk_size, gran_size;
 	mtrr_type type;
 	int index_good;
+	int num_used;
 	int i;
 
 	if (!cpu_feature_enabled(X86_FEATURE_MTRR) || enable_mtrr_cleanup < 1)
@@ -693,7 +694,8 @@ int __init mtrr_cleanup(void)
 	}
 
 	/* Check if we need handle it and can handle it: */
-	if (!mtrr_need_cleanup())
+	num_used = mtrr_need_cleanup();
+	if (!num_used)
 		return 0;
 
 	/* Print original var MTRRs at first, for debugging: */
@@ -728,6 +730,10 @@ int __init mtrr_cleanup(void)
 		mtrr_print_out_one_result(i);
 
 		if (!result[i].bad) {
+			if (result[i].num_reg >= num_used) {
+				Dprintk("BIOS provided MTRR setting is better than found one\n");
+				return 0;
+			}
 			set_var_mtrr_all();
 			Dprintk("New variable MTRRs\n");
 			print_out_mtrr_range_state();
@@ -762,8 +768,12 @@ int __init mtrr_cleanup(void)
 	index_good = mtrr_search_optimal_index();
 
 	if (index_good != -1) {
-		pr_info("Found optimal setting for mtrr clean up\n");
 		i = index_good;
+		if (result[i].num_reg >= num_used) {
+			Dprintk("BIOS provided MTRR setting is better than found one\n");
+			return 0;
+		}
+		pr_info("Found optimal setting for mtrr clean up\n");
 		mtrr_print_out_one_result(i);
 
 		/* Convert ranges to var ranges state: */
-- 
2.35.3
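
The rule this patch adds can be sketched outside the kernel as follows. This
is a minimal illustration, not the kernel code: struct candidate and
pick_candidate() are hypothetical names, and num_used plays the role of the
value returned by mtrr_need_cleanup(). Unlike the patch, which gives up as
soon as the chosen result is not an improvement, the sketch simply scans all
candidates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the kernel's result[] table. */
struct candidate {
	int num_reg;	/* variable MTRRs this layout would need */
	bool bad;	/* layout was rejected by the search */
};

/*
 * Return the index of the first usable candidate that needs strictly
 * fewer registers than the firmware setup, or -1 to keep that setup.
 */
static int pick_candidate(const struct candidate *c, size_t n, int num_used)
{
	assert(num_used >= 0);

	for (size_t i = 0; i < n; i++) {
		if (c[i].bad)
			continue;
		if (c[i].num_reg < num_used)
			return (int)i;
	}
	return -1;
}
```

With a BIOS setup that uses 6 registers, a candidate needing 8 is skipped and
one needing 5 is accepted, which is the behaviour asked for in the thread.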



* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-06-01 12:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230601124844.GBZHiTrDQk+F3lbzGO@fat_crate.local>



On 01.06.23 14:48, Borislav Petkov wrote:
> On Thu, Jun 01, 2023 at 08:39:17AM +0200, Juergen Gross wrote:
>> Does this translate to: "we should remove that cleanup crap"? I'd be
>> positive to that. :-)
> 
> Why, what's wrong with that thing?
> 

Why do you need it if you don't think adding MTRRs dynamically is
important?

Having a sub-optimal MTRR setup doesn't matter unless you are running
out of MTRRs to use. When you are not adding MTRRs, you can't run out
of them.

This in turn means you don't need mtrr_cleanup().


Juergen


* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Borislav Petkov @ 2023-06-01 12:48 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <6700dc14-98fa-232d-5f8c-68a418849671@suse.com>

On Thu, Jun 01, 2023 at 08:39:17AM +0200, Juergen Gross wrote:
> Does this translate to: "we should remove that cleanup crap"? I'd be
> positive to that. :-)

Why, what's wrong with that thing?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-06-01  8:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230531174857.GDZHeIib57h5lT5Vh1@fat_crate.local>



On 31.05.23 19:48, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 04:20:08PM +0200, Juergen Gross wrote:
>> One other note: why does mtrr_cleanup() think that using 8 instead of 6
>> variable MTRRs would be an "optimal setting"?
> 
> Maybe the more extensive debug output below would help answer that...

Patch 2 wants this diff on top:

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 59b48bd8380c..ce254ca89c62 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -655,7 +655,7 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)
         bool changed = false;

         rdmsr(MTRRphysBase_MSR(index), lo, hi);
-       if ((vr->base_lo & MTRR_PHYSBASE_RSVD) != (lo & MTRR_PHYSBASE_RSVD)
+       if ((vr->base_lo & ~MTRR_PHYSBASE_RSVD) != (lo & ~MTRR_PHYSBASE_RSVD)
             || (vr->base_hi & ~phys_hi_rsvd) != (hi & ~phys_hi_rsvd)) {

                 mtrr_wrmsr(MTRRphysBase_MSR(index), vr->base_lo, vr->base_hi);
@@ -664,7 +664,7 @@ static bool set_mtrr_var_ranges(unsigned int index, struct mtrr_var_range *vr)

         rdmsr(MTRRphysMask_MSR(index), lo, hi);

-       if ((vr->mask_lo & MTRR_PHYSMASK_RSVD) != (lo & MTRR_PHYSMASK_RSVD)
+       if ((vr->mask_lo & ~MTRR_PHYSMASK_RSVD) != (lo & ~MTRR_PHYSMASK_RSVD)
             || (vr->mask_hi & ~phys_hi_rsvd) != (hi & ~phys_hi_rsvd)) {
                 mtrr_wrmsr(MTRRphysMask_MSR(index), vr->mask_lo, vr->mask_hi);
                 changed = true;



Juergen
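
Why the missing ~ matters can be shown with a small userspace sketch. The
reserved mask value below is an assumption chosen for illustration (bits
11:8), and both helper names are hypothetical; only the masking pattern is
taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: bits 11:8 of the low word are reserved. */
#define PHYSBASE_RSVD_EXAMPLE 0x00000f00u

/* The example mask must not overlap the low type byte. */
static_assert((PHYSBASE_RSVD_EXAMPLE & 0xffu) == 0, "mask overlaps type field");

/* Without '~': only the reserved bits are compared, real changes are missed. */
static bool differs_rsvd_only(uint32_t want, uint32_t have)
{
	return (want & PHYSBASE_RSVD_EXAMPLE) != (have & PHYSBASE_RSVD_EXAMPLE);
}

/* With '~': the reserved bits are ignored and everything else is compared. */
static bool differs_ignoring_rsvd(uint32_t want, uint32_t have)
{
	return (want & ~PHYSBASE_RSVD_EXAMPLE) != (have & ~PHYSBASE_RSVD_EXAMPLE);
}
```

Two values that differ only in their base address bits compare as equal in
the first helper and as different in the second, so only the second variant
would trigger the MSR write.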


* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-06-01  6:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230531174857.GDZHeIib57h5lT5Vh1@fat_crate.local>



On 31.05.23 19:48, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 04:20:08PM +0200, Juergen Gross wrote:
>> One other note: why does mtrr_cleanup() think that using 8 instead of 6
>> variable MTRRs would be an "optimal setting"?
> 
> Maybe the more extensive debug output below would help answer that...

The question above isn't answered, but the output at least tells me that the
plan was to write MTRR values as seen on the original kernel.

Looking into the issue with that information in mind.

> 
>> IMO it should replace the original setup only in case it is using _less_
>> MTRRs than before.
> 
> Right.

I'll look into that later, unless my question below is answered with
"yes".

> 
>> Additionally I believe mtrr_cleanup() would make much more sense if it
>> wouldn't be __init, but being usable when trying to add additional MTRRs
>> in the running system in case we run out of MTRRs.
>>
>> It should probably be based on the new MTRR map anyway...
> 
> So I'm not really sure we really care about adding additional MTRRs.

Does this translate to: "we should remove that cleanup crap"? I'd be
positive to that. :-)

> There probably is a use case which does that but I haven't seen one yet
> - MTRRs are all legacy crap to me.

I think there are still a few drivers using them. No idea how often
those drivers are in use, though.

> 
> Btw, one more patch ontop:
> 
> ---
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Wed, 31 May 2023 19:23:34 +0200
> Subject: [PATCH] x86/mtrr: Unify debugging printing
> 
> Put all the debugging output behind "mtrr=debug" and get rid of
> "mtrr_cleanup_debug" which wasn't even documented anywhere.
> 
> No functional changes.
> 
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

> ---
>   arch/x86/kernel/cpu/mtrr/cleanup.c | 59 ++++++++++++------------------
>   arch/x86/kernel/cpu/mtrr/generic.c |  2 +-
>   arch/x86/kernel/cpu/mtrr/mtrr.c    |  5 +--
>   arch/x86/kernel/cpu/mtrr/mtrr.h    |  3 ++
>   4 files changed, 29 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
> index ed5f84c20ac2..18cf79d6e2c5 100644
> --- a/arch/x86/kernel/cpu/mtrr/cleanup.c
> +++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
> @@ -55,9 +55,6 @@ static int __initdata				nr_range;
>   
>   static struct var_mtrr_range_state __initdata	range_state[RANGE_NUM];
>   
> -static int __initdata debug_print;
> -#define Dprintk(x...) do { if (debug_print) pr_debug(x); } while (0)
> -
>   #define BIOS_BUG_MSG \
>   	"WARNING: BIOS bug: VAR MTRR %d contains strange UC entry under 1M, check with your system vendor!\n"
>   
> @@ -79,12 +76,11 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
>   		nr_range = add_range_with_merge(range, RANGE_NUM, nr_range,
>   						base, base + size);
>   	}
> -	if (debug_print) {
> -		pr_debug("After WB checking\n");
> -		for (i = 0; i < nr_range; i++)
> -			pr_debug("MTRR MAP PFN: %016llx - %016llx\n",
> -				 range[i].start, range[i].end);
> -	}
> +
> +	Dprintk("After WB checking\n");
> +	for (i = 0; i < nr_range; i++)
> +		Dprintk("MTRR MAP PFN: %016llx - %016llx\n",
> +			 range[i].start, range[i].end);
>   
>   	/* Take out UC ranges: */
>   	for (i = 0; i < num_var_ranges; i++) {
> @@ -112,24 +108,22 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
>   		subtract_range(range, RANGE_NUM, extra_remove_base,
>   				 extra_remove_base + extra_remove_size);
>   
> -	if  (debug_print) {
> -		pr_debug("After UC checking\n");
> -		for (i = 0; i < RANGE_NUM; i++) {
> -			if (!range[i].end)
> -				continue;
> -			pr_debug("MTRR MAP PFN: %016llx - %016llx\n",
> -				 range[i].start, range[i].end);
> -		}
> +	Dprintk("After UC checking\n");
> +	for (i = 0; i < RANGE_NUM; i++) {
> +		if (!range[i].end)
> +			continue;
> +
> +		Dprintk("MTRR MAP PFN: %016llx - %016llx\n",
> +			 range[i].start, range[i].end);
>   	}
>   
>   	/* sort the ranges */
>   	nr_range = clean_sort_range(range, RANGE_NUM);
> -	if  (debug_print) {
> -		pr_debug("After sorting\n");
> -		for (i = 0; i < nr_range; i++)
> -			pr_debug("MTRR MAP PFN: %016llx - %016llx\n",
> -				 range[i].start, range[i].end);
> -	}
> +
> +	Dprintk("After sorting\n");
> +	for (i = 0; i < nr_range; i++)
> +		Dprintk("MTRR MAP PFN: %016llx - %016llx\n",
> +			range[i].start, range[i].end);
>   
>   	return nr_range;
>   }
> @@ -164,13 +158,6 @@ static int __init enable_mtrr_cleanup_setup(char *str)
>   }
>   early_param("enable_mtrr_cleanup", enable_mtrr_cleanup_setup);
>   
> -static int __init mtrr_cleanup_debug_setup(char *str)
> -{
> -	debug_print = 1;
> -	return 0;
> -}
> -early_param("mtrr_cleanup_debug", mtrr_cleanup_debug_setup);
> -
>   static void __init
>   set_var_mtrr(unsigned int reg, unsigned long basek, unsigned long sizek,
>   	     unsigned char type)
> @@ -267,7 +254,7 @@ range_to_mtrr(unsigned int reg, unsigned long range_startk,
>   			align = max_align;
>   
>   		sizek = 1UL << align;
> -		if (debug_print) {
> +		if (mtrr_debug) {
>   			char start_factor = 'K', size_factor = 'K';
>   			unsigned long start_base, size_base;
>   
> @@ -542,7 +529,7 @@ static void __init print_out_mtrr_range_state(void)
>   		start_base = to_size_factor(start_base, &start_factor);
>   		type = range_state[i].type;
>   
> -		pr_debug("reg %d, base: %ld%cB, range: %ld%cB, type %s\n",
> +		Dprintk("reg %d, base: %ld%cB, range: %ld%cB, type %s\n",
>   			i, start_base, start_factor,
>   			size_base, size_factor,
>   			(type == MTRR_TYPE_UNCACHABLE) ? "UC" :
> @@ -714,7 +701,7 @@ int __init mtrr_cleanup(void)
>   		return 0;
>   
>   	/* Print original var MTRRs at first, for debugging: */
> -	pr_debug("original variable MTRRs\n");
> +	Dprintk("original variable MTRRs\n");
>   	print_out_mtrr_range_state();
>   
>   	memset(range, 0, sizeof(range));
> @@ -746,7 +733,7 @@ int __init mtrr_cleanup(void)
>   
>   		if (!result[i].bad) {
>   			set_var_mtrr_all();
> -			pr_debug("New variable MTRRs\n");
> +			Dprintk("New variable MTRRs\n");
>   			print_out_mtrr_range_state();
>   			return 1;
>   		}
> @@ -766,7 +753,7 @@ int __init mtrr_cleanup(void)
>   
>   			mtrr_calc_range_state(chunk_size, gran_size,
>   				      x_remove_base, x_remove_size, i);
> -			if (debug_print) {
> +			if (mtrr_debug) {
>   				mtrr_print_out_one_result(i);
>   				pr_info("\n");
>   			}
> @@ -790,7 +777,7 @@ int __init mtrr_cleanup(void)
>   		gran_size <<= 10;
>   		x86_setup_var_mtrrs(range, nr_range, chunk_size, gran_size);
>   		set_var_mtrr_all();
> -		pr_debug("New variable MTRRs\n");
> +		Dprintk("New variable MTRRs\n");
>   		print_out_mtrr_range_state();
>   		return 1;
>   	} else {
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index e5c5192d8a28..58a3848435c4 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -41,7 +41,7 @@ struct cache_map {
>   	u64 fixed:1;
>   };
>   
> -static bool mtrr_debug;
> +bool mtrr_debug;
>   
>   static int __init mtrr_param_setup(char *str)
>   {
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index ec8670bb5d88..767bf1c71aad 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -332,7 +332,7 @@ static int mtrr_check(unsigned long base, unsigned long size)
>   {
>   	if ((base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1))) {
>   		pr_warn("size and base must be multiples of 4 kiB\n");
> -		pr_debug("size: 0x%lx  base: 0x%lx\n", size, base);
> +		Dprintk("size: 0x%lx  base: 0x%lx\n", size, base);
>   		dump_stack();
>   		return -1;
>   	}
> @@ -423,8 +423,7 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
>   			}
>   		}
>   		if (reg < 0) {
> -			pr_debug("no MTRR for %lx000,%lx000 found\n",
> -				 base, size);
> +			Dprintk("no MTRR for %lx000,%lx000 found\n", base, size);
>   			goto out;
>   		}
>   	}
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
> index 8385d7d3a865..5655f253d929 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.h
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
> @@ -10,6 +10,9 @@
>   #define MTRR_CHANGE_MASK_VARIABLE  0x02
>   #define MTRR_CHANGE_MASK_DEFTYPE   0x04
>   
> +extern bool mtrr_debug;
> +#define Dprintk(x...) do { if (mtrr_debug) pr_info(x); } while (0)
> +
>   extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
>   
>   struct mtrr_ops {
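
The gating pattern used by the Dprintk() macro in the quoted patch can be
tried in userspace. This is a sketch under assumptions: debug_enabled stands
in for mtrr_debug, emit() for pr_info(), and DBG(), demo() and lines_printed
are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins (assumptions): a global flag and a counting printer. */
static bool debug_enabled;
static int lines_printed;

static void emit(const char *msg)
{
	assert(msg != NULL);
	lines_printed++;
	fputs(msg, stderr);
}

/* do { } while (0) lets the macro act as one statement after if/else. */
#define DBG(msg) do { if (debug_enabled) emit(msg); } while (0)

/* Returns how many lines were emitted for the given flag setting. */
static int demo(bool enable)
{
	debug_enabled = enable;
	lines_printed = 0;

	if (enable)
		DBG("debugging on\n");
	else
		DBG("never printed\n");

	return lines_printed;
}
```

Because the check lives inside the macro, call sites no longer need their own
"if (debug_print)" blocks, which is where most of the deleted lines in the
diffstat come from.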



* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
From: Thomas Gleixner @ 2023-05-31 22:46 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm
In-Reply-To: <20230519102715.704767397@infradead.org>

On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
> to take wrapping into account, but per all the above, we don't
> actually wrap on u64 anymore.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>


* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
From: Thomas Gleixner @ 2023-05-31 22:46 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm
In-Reply-To: <87r0qwfrm0.ffs@tglx>

On Wed, May 31 2023 at 17:27, Thomas Gleixner wrote:
> On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
>> to take wrapping into account, but per all the above, we don't
>> actually wrap on u64 anymore.
>
> Indeed. The rationale was that you need ~146 years uptime with a 4GHz
> TSC or ~584 years with 1GHz to actually reach the wrap around point.
>
> Though I can see your point to make sure that silly BIOSes or VMMs
> cannot cause havoc by accident or malice.
>
> Did anyone ever validate that wrap around on TSC including TSC deadline
> timer works correctly?
>
> I have faint memories of TSC_ADJUST, which I prefer not to bring back to
> main memory :)

It seems my fears have been unjustified.

At least in a quick test which sets the TSC to ~ -8min @ 2.1GHz, the machine
seems to survive without the colourful explosions I expected due to my
early exposure to TSC_ADJUST and TSC_DEADLINE_TIMER :)

Thanks,

        tglx
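
The figures quoted above follow from dividing 2^64 by the TSC frequency. A
small sketch of that arithmetic, plus one possible way to compute a start
value a few minutes before the wrap point. Both helper names are
hypothetical, and a year is assumed to be 365.25 days.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_YEAR (365.25 * 24 * 60 * 60)

/* Years until a 64-bit counter starting at zero wraps at the given rate. */
static double years_to_wrap(double hz)
{
	assert(hz > 0.0);
	return 18446744073709551616.0 /* 2^64 */ / hz / SECONDS_PER_YEAR;
}

/* Counter value that sits `seconds` before the wrap point at `hz`. */
static uint64_t tsc_before_wrap(uint64_t hz, uint64_t seconds)
{
	/* Unsigned arithmetic is modulo 2^64, so this is 2^64 - hz * seconds. */
	return (uint64_t)0 - hz * seconds;
}
```

Under these assumptions years_to_wrap(4e9) is roughly 146 and
years_to_wrap(1e9) roughly 584, in line with the numbers in the quoted mail.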


* Re: [PATCH RFC net-next v3 6/8] virtio/vsock: support dgrams
From: Dan Carpenter @ 2023-05-31 18:13 UTC (permalink / raw)
  To: Simon Horman
  Cc: Bobby Eshleman, Stefan Hajnoczi, Stefano Garzarella,
	Michael S. Tsirkin, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa,
	VMware PV-Drivers Reviewers, kvm, virtualization, netdev,
	linux-kernel, linux-hyperv
In-Reply-To: <ZHdxJxjXDkkO03L4@corigine.com>

On Wed, May 31, 2023 at 06:09:11PM +0200, Simon Horman wrote:
> > @@ -102,6 +144,7 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
> 
> Smatch warns that err may not be initialised at the out label below.
> 
> Just above this context the following appears:
> 
> 	if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
> 		WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
> 		goto out;
> 	}
> 
> So I wonder if in that case err may not be initialised.
> 

Yep, exactly right.  I commented out the goto and it silenced the
warning.  I also initialized err to zero at the start hoping that it
would trigger a different warning but it didn't.  :(

regards,
dan carpenter


> >  	return skb;
> >  
> >  out:
> > +	*errp = err;
> >  	kfree_skb(skb);
> >  	return NULL;
> >  }



* Re: [PATCH v4] hv_netvsc: Allocate rx indirection table size dynamically
From: Simon Horman @ 2023-05-31 16:33 UTC (permalink / raw)
  To: Shradha Gupta
  Cc: linux-kernel, linux-hyperv, netdev, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Long Li, Michael Kelley, David S. Miller, Steen Hegelund
In-Reply-To: <1685502893-29311-1-git-send-email-shradhagupta@linux.microsoft.com>

On Tue, May 30, 2023 at 08:14:53PM -0700, Shradha Gupta wrote:
> Allocate the size of rx indirection table dynamically in netvsc
> from the value of size provided by OID_GEN_RECEIVE_SCALE_CAPABILITIES
> query instead of using a constant value of ITAB_NUM.
> 
> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Tested-on: Ubuntu22 (azure VM, SKU size: Standard_F72s_v2)
> Testcases:
> 1. ethtool -x eth0 output
> 2. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic

...

> @@ -1596,11 +1609,17 @@ void rndis_filter_device_remove(struct hv_device *dev,
>  				struct netvsc_device *net_dev)
>  {
>  	struct rndis_device *rndis_dev = net_dev->extension;
> +	struct net_device *net = hv_get_drvdata(dev);
> +	struct net_device_context *ndc = netdev_priv(net);

nit: I know this file doesn't follow the scheme very closely,
     but I'd prefer it if it moved towards it.

     Please use reverse xmas tree - longest line to shortest -
     for local variable declarations in networking code.

	struct rndis_device *rndis_dev = net_dev->extension;
	struct net_device *net = hv_get_drvdata(dev);
	struct net_device_context *ndc;

	ndc = netdev_priv(net);

>  
>  	/* Halt and release the rndis device */
>  	rndis_filter_halt_device(net_dev, rndis_dev);
>  
>  	netvsc_device_remove(dev);
> +
> +	ndc->rx_table_sz = 0;
> +	kfree(ndc->rx_table);
> +	ndc->rx_table = NULL;
>  }

-- 
pw-bot: cr



* Re: [PATCH RFC net-next v3 7/8] vsock: Add lockless sendmsg() support
From: Simon Horman @ 2023-05-31 16:22 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers, kvm,
	virtualization, netdev, linux-kernel, linux-hyperv
In-Reply-To: <20230413-b4-vsock-dgram-v3-7-c2414413ef6a@bytedance.com>

On Wed, May 31, 2023 at 12:35:11AM +0000, Bobby Eshleman wrote:

...

Hi Bobby,

some more feedback from my side.

> Throughput metrics for single-threaded SOCK_DGRAM and
> single/multi-threaded SOCK_STREAM showed no statistically signficant

nit: s/signficant/significant/

> throughput changes (lowest p-value reaching 0.27), with the range of the
> mean difference ranging between -5% to +1%.
> 
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>

...

> @@ -120,8 +125,8 @@ struct vsock_transport {
>  
>  	/* DGRAM. */
>  	int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
> -	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> -			     struct msghdr *, size_t len);
> +	int (*dgram_enqueue)(const struct vsock_transport *, struct vsock_sock *,
> +			     struct sockaddr_vm *, struct msghdr *, size_t len);

Perhaps just a personal preference, but the arguments for these callbacks
could have names.

>  	bool (*dgram_allow)(u32 cid, u32 port);
>  	int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
>  	int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
> @@ -196,6 +201,17 @@ void vsock_core_unregister(const struct vsock_transport *t);
>  /* The transport may downcast this to access transport-specific functions */
>  const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk);
>  
> +static inline struct vsock_remote_info *
> +vsock_core_get_remote_info(struct vsock_sock *vsk)
> +{
> +

nit: no blank line here

> +	/* vsk->remote_info may be accessed if the rcu read lock is held OR the
> +	 * socket lock is held
> +	 */
> +	return rcu_dereference_check(vsk->remote_info,
> +				     lockdep_sock_is_held(sk_vsock(vsk)));
> +}
> +
>  /**** UTILS ****/
>  
>  /* vsock_table_lock must be held */

...

> @@ -300,17 +449,36 @@ static void vsock_insert_unbound(struct vsock_sock *vsk)
>  	spin_unlock_bh(&vsock_table_lock);
>  }
>  
> -void vsock_insert_connected(struct vsock_sock *vsk)
> +int vsock_insert_connected(struct vsock_sock *vsk)
>  {
> -	struct list_head *list = vsock_connected_sockets(
> -		&vsk->remote_addr, &vsk->local_addr);
> +	struct list_head *list;
> +	struct vsock_remote_info *remote_info;

nit: I know that this file doesn't follow the reverse xmas tree
     scheme - longest line to shortest - for local variable declarations.
     But as this is networking code, I think it would be good to move
     towards that scheme as code is changed.

	struct vsock_remote_info *remote_info;
	struct list_head *list;

> +
> +	rcu_read_lock();
> +	remote_info = vsock_core_get_remote_info(vsk);
> +	if (!remote_info) {
> +		rcu_read_unlock();
> +		return -EINVAL;
> +	}
> +	list = vsock_connected_sockets(&remote_info->addr, &vsk->local_addr);
> +	rcu_read_unlock();
>  
>  	spin_lock_bh(&vsock_table_lock);
>  	__vsock_insert_connected(list, vsk);
>  	spin_unlock_bh(&vsock_table_lock);
> +
> +	return 0;
>  }

...

> @@ -1120,7 +1122,9 @@ virtio_transport_recv_connecting(struct sock *sk,
>  	case VIRTIO_VSOCK_OP_RESPONSE:
>  		sk->sk_state = TCP_ESTABLISHED;
>  		sk->sk_socket->state = SS_CONNECTED;
> -		vsock_insert_connected(vsk);
> +		err = vsock_insert_connected(vsk);
> +		if (err)
> +			goto destroy;

The destroy label uses skerr, but it is uninitialised here.

Building with W=1 or C=1 will probably tell you this.

>  		sk->sk_state_change(sk);
>  		break;
>  	case VIRTIO_VSOCK_OP_INVALID:

...


* Re: [PATCH RFC net-next v3 6/8] virtio/vsock: support dgrams
From: Simon Horman @ 2023-05-31 16:09 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers, kvm,
	virtualization, netdev, linux-kernel, linux-hyperv, Dan Carpenter
In-Reply-To: <20230413-b4-vsock-dgram-v3-6-c2414413ef6a@bytedance.com>

+ Dan Carpenter

On Wed, May 31, 2023 at 12:35:10AM +0000, Bobby Eshleman wrote:
> This commit adds support for datagrams over virtio/vsock.
> 
> Message boundaries are preserved on a per-skb and per-vq entry basis.
> Messages are copied in whole from the user to an SKB, which in turn is
> added to the scatterlist for the virtqueue in whole for the device.
> Messages do not straddle skbs and they do not straddle packets.
> Messages may be truncated by the receiving user if their buffer is
> shorter than the message.
> 
> Other properties of vsock datagrams:
> - Datagrams self-throttle at the per-socket sk_sndbuf threshold.
> - The same virtqueue is used as is used for streams and seqpacket flows
> - Credits are not used for datagrams
> - Packets are dropped silently by the device, which means the virtqueue
>   will still get kicked even during high packet loss, so long as the
>   socket does not exceed sk_sndbuf.
> 
> Future work might include finding a way to reduce the virtqueue kick
> rate for datagram flows with high packet loss.
> 
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>

...

Hi Bobby,

some feedback from my side.

> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c

...

> @@ -730,11 +754,18 @@ int vsock_bind_stream(struct vsock_sock *vsk,
>  }
>  EXPORT_SYMBOL(vsock_bind_stream);
>  
> -static int __vsock_bind_dgram(struct vsock_sock *vsk,
> -			      struct sockaddr_vm *addr)
> +static int vsock_bind_dgram(struct vsock_sock *vsk,
> +			    struct sockaddr_vm *addr)
>  {
> -	if (!vsk->transport || !vsk->transport->dgram_bind)
> -		return -EINVAL;
> +	if (!vsk->transport || !vsk->transport->dgram_bind) {
> +		int retval;

nit: blank line here

> +		spin_lock_bh(&vsock_dgram_table_lock);
> +		retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
> +					   VSOCK_HASH_SIZE);
> +		spin_unlock_bh(&vsock_dgram_table_lock);
> +
> +		return retval;
> +	}
>  
>  	return vsk->transport->dgram_bind(vsk, addr);
>  }

...

> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c

...

> @@ -47,7 +76,8 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
>  			   u32 src_cid,
>  			   u32 src_port,
>  			   u32 dst_cid,
> -			   u32 dst_port)
> +			   u32 dst_port,
> +			   int *errp)
>  {
>  	const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
>  	struct virtio_vsock_hdr *hdr;
> @@ -55,9 +85,21 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
>  	void *payload;
>  	int err;
>  
> -	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
> -	if (!skb)
> +	/* dgrams do not use credits, self-throttle according to sk_sndbuf
> +	 * using sock_alloc_send_skb. This helps avoid triggering the OOM.
> +	 */
> +	if (info->vsk && info->type == VIRTIO_VSOCK_TYPE_DGRAM) {
> +		skb = virtio_transport_sock_alloc_send_skb(info, skb_len, GFP_KERNEL, &err);
> +	} else {
> +		skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
> +		if (!skb)
> +			err = -ENOMEM;
> +	}
> +
> +	if (!skb) {
> +		*errp = err;
>  		return NULL;
> +	}
>  
>  	hdr = virtio_vsock_hdr(skb);
>  	hdr->type	= cpu_to_le16(info->type);
> @@ -102,6 +144,7 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,

Smatch warns that err may not be initialised at the out label below.

Just above this context the following appears:

	if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
		WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
		goto out;
	}

So I wonder if in that case err may not be initialised.

>  	return skb;
>  
>  out:
> +	*errp = err;
>  	kfree_skb(skb);
>  	return NULL;
>  }
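
One way to avoid the uninitialised read is to assign err on every path that
jumps to the out label. A self-contained sketch of that pattern, using
hypothetical names (alloc_buf(), owner_ok) and an assumed error code of
-EINVAL, since the thread does not say which value the real code should
report:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical allocator: `owner_ok` stands in for the
 * skb_set_owner_sk_safe() check in the code quoted above.
 */
static void *alloc_buf(size_t len, int owner_ok, int *errp)
{
	void *buf;
	int err;

	assert(errp != NULL);

	buf = malloc(len);
	if (!buf) {
		*errp = -ENOMEM;
		return NULL;
	}

	if (!owner_ok) {
		err = -EINVAL;	/* assumed code; set before every goto out */
		goto out;
	}

	return buf;

out:
	*errp = err;	/* err is now defined on all paths reaching here */
	free(buf);
	return NULL;
}
```

Initialising err to zero at its declaration would also silence the warning,
but it would report success through errp while returning NULL, so setting an
explicit error at the goto site is the safer choice in this sketch.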

...


* RE: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
From: Michael Kelley (LINUX) @ 2023-05-31 15:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tianyu Lan, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, seanjc@google.com, pbonzini@redhat.com,
	jgross@suse.com, Tianyu Lan, kirill@shutemov.name,
	jiangshan.ljs@antgroup.com, ashish.kalra@amd.com,
	srutherford@google.com, akpm@linux-foundation.org,
	anshuman.khandual@arm.com, pawan.kumar.gupta@linux.intel.com,
	adrian.hunter@intel.com, daniel.sneddon@linux.intel.com,
	alexander.shishkin@linux.intel.com, sandipan.das@amd.com,
	ray.huang@amd.com, brijesh.singh@amd.com, michael.roth@amd.com,
	thomas.lendacky@amd.com, venu.busireddy@oracle.com,
	sterritt@google.com, tony.luck@intel.com, samitolvanen@google.com,
	fenghua.yu@intel.com, pangupta@amd.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-arch@vger.kernel.org
In-Reply-To: <20230531154832.GA428966@hirez.programming.kicks-ass.net>

From: Peter Zijlstra <peterz@infradead.org> Sent: Wednesday, May 31, 2023 8:49 AM
> 
> On Wed, May 31, 2023 at 02:50:50PM +0000, Michael Kelley (LINUX) wrote:
> 
> > I'm jumping in to answer some of the basic questions here.  Yesterday,
> > there was a discussion about nested #HV exceptions, so maybe some of
> > this is already understood, but let me recap at a higher level, provide some
> > references, and suggest the path forward.
> 
> > 2) For the Restricted Interrupt Injection code, Tianyu will look at
> > how to absolutely minimize the impact in the hot code paths,
> > particularly when SEV-SNP is not active.  Hopefully the impact can
> > be a couple of instructions at most, or even less with the use of
> > other existing kernel techniques.  He'll look at the other things you've
> > commented on and get the code into a better state.  I'll work with
> > him on writing commit messages and comments that explain what's
> > going on.
> 
> So, from what I understand of all this SEV-SNP/#HV muck, it is
> near impossible to get right without ucode/hw changes. Hence my request
> to Tom to look into that.
> 
> The feature as specified in the AMD documentation seems fundamentally
> buggered.
> 
> Specifically #HV needs to be IST because hypervisor can inject at any
> moment, irrespective of IF or anything else -- even #HV itself. This
> means also in the syscall gap.
> 
> Since it is IST, a nested #HV is instant stack corruption -- #HV can
> attempt to play stack games as per the copied #VC crap (which I'm not at
> all convinced about being correct itself), but this doesn't actually fix
> anything, all you need is a single instruction window to wreck things.
> 
> Because as stated, the whole premise is that the hypervisor is out to
> get you, you must not leave it room to wiggle. As is, this is security
> through prayer, and we don't do that.
> 
> In short; I really want a solid proof that what you propose to implement
> is correct and not wishful thinking.

Fair enough.  We will be sync'ing with the AMD folks to make sure that
one way or another this really will work.

Michael


* Re: [PATCH RFC net-next v3 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue
From: Simon Horman @ 2023-05-31 15:56 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers, kvm,
	virtualization, netdev, linux-kernel, linux-hyperv
In-Reply-To: <20230413-b4-vsock-dgram-v3-1-c2414413ef6a@bytedance.com>

On Wed, May 31, 2023 at 12:35:05AM +0000, Bobby Eshleman wrote:
> This commit drops the transport->dgram_dequeue callback and makes
> vsock_dgram_recvmsg() generic. It also adds additional transport
> callbacks for use by the generic vsock_dgram_recvmsg(), such as for
> parsing skbs for CID/port which vary in format per transport.
> 
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>

...

> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index b370070194fa..b6a51afb74b8 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue(
>  	return err - sizeof(*dg);
>  }
>  
> -static int vmci_transport_dgram_dequeue(struct vsock_sock *vsk,
> -					struct msghdr *msg, size_t len,
> -					int flags)
> +int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
>  {
> -	int err;
>  	struct vmci_datagram *dg;
> -	size_t payload_len;
> -	struct sk_buff *skb;
>  
> -	if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> -		return -EOPNOTSUPP;
> +	dg = (struct vmci_datagram *)skb->data;
> +	if (!dg)
> +		return -EINVAL;
>  
> -	/* Retrieve the head sk_buff from the socket's receive queue. */
> -	err = 0;
> -	skb = skb_recv_datagram(&vsk->sk, flags, &err);
> -	if (!skb)
> -		return err;
> +	*cid = dg->src.context;
> +	return 0;
> +}

Hi Bobby,

clang-16 with W=1 seems a bit unhappy about this.

  net/vmw_vsock/vmci_transport.c:1734:5: warning: no previous prototype for function 'vmci_transport_dgram_get_cid' [-Wmissing-prototypes]
  int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
      ^
  net/vmw_vsock/vmci_transport.c:1734:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
  int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
  ^
  static 
  net/vmw_vsock/vmci_transport.c:1746:5: warning: no previous prototype for function 'vmci_transport_dgram_get_port' [-Wmissing-prototypes]
  int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
      ^
  net/vmw_vsock/vmci_transport.c:1746:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
  int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
  ^
  static 
  net/vmw_vsock/vmci_transport.c:1758:5: warning: no previous prototype for function 'vmci_transport_dgram_get_length' [-Wmissing-prototypes]
  int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
      ^
  net/vmw_vsock/vmci_transport.c:1758:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
  int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
  ^

I see similar warnings for net/vmw_vsock/af_vsock.c in patch 4/8.
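
For reference, a minimal standalone sketch of the two usual ways to silence
-Wmissing-prototypes. struct dgram_hdr and both function names are
hypothetical and are not the vmci_transport.c code. Which option fits depends
on whether the helpers are only referenced from the same file (for example
through an ops table) or are called from other files.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical datagram header, standing in for struct vmci_datagram. */
struct dgram_hdr {
	unsigned int src_context;
	unsigned int src_resource;
};

/* Option 1: the helper is only used in this file (for example via a
 * function pointer in an ops struct), so give it internal linkage.
 */
static int dgram_get_cid(const struct dgram_hdr *dg, unsigned int *cid)
{
	if (!dg)
		return -EINVAL;

	*cid = dg->src_context;
	return 0;
}

/* Option 2: the helper is used from other files, so a prototype must be
 * visible before the definition. Normally this lives in a shared header.
 */
int dgram_get_port(const struct dgram_hdr *dg, unsigned int *port);

int dgram_get_port(const struct dgram_hdr *dg, unsigned int *port)
{
	if (!dg)
		return -EINVAL;

	*port = dg->src_resource;
	return 0;
}
```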

> +
> +int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port)
> +{
> +	struct vmci_datagram *dg;
>  
>  	dg = (struct vmci_datagram *)skb->data;
>  	if (!dg)
> -		/* err is 0, meaning we read zero bytes. */
> -		goto out;
> -
> -	payload_len = dg->payload_size;
> -	/* Ensure the sk_buff matches the payload size claimed in the packet. */
> -	if (payload_len != skb->len - sizeof(*dg)) {
> -		err = -EINVAL;
> -		goto out;
> -	}
> +		return -EINVAL;
>  
> -	if (payload_len > len) {
> -		payload_len = len;
> -		msg->msg_flags |= MSG_TRUNC;
> -	}
> +	*port = dg->src.resource;
> +	return 0;
> +}
>  
> -	/* Place the datagram payload in the user's iovec. */
> -	err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len);
> -	if (err)
> -		goto out;
> +int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len)
> +{
> +	struct vmci_datagram *dg;
>  
> -	if (msg->msg_name) {
> -		/* Provide the address of the sender. */
> -		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> -		vsock_addr_init(vm_addr, dg->src.context, dg->src.resource);
> -		msg->msg_namelen = sizeof(*vm_addr);
> -	}
> -	err = payload_len;
> +	dg = (struct vmci_datagram *)skb->data;
> +	if (!dg)
> +		return -EINVAL;
>  
> -out:
> -	skb_free_datagram(&vsk->sk, skb);
> -	return err;
> +	*len = dg->payload_size;
> +	return 0;
>  }
>  
>  static bool vmci_transport_dgram_allow(u32 cid, u32 port)

...

* Re: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
From: Peter Zijlstra @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Michael Kelley (LINUX)
  Cc: Tianyu Lan, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, seanjc@google.com, pbonzini@redhat.com,
	jgross@suse.com, Tianyu Lan, kirill@shutemov.name,
	jiangshan.ljs@antgroup.com, ashish.kalra@amd.com,
	srutherford@google.com, akpm@linux-foundation.org,
	anshuman.khandual@arm.com, pawan.kumar.gupta@linux.intel.com,
	adrian.hunter@intel.com, daniel.sneddon@linux.intel.com,
	alexander.shishkin@linux.intel.com, sandipan.das@amd.com,
	ray.huang@amd.com, brijesh.singh@amd.com, michael.roth@amd.com,
	thomas.lendacky@amd.com, venu.busireddy@oracle.com,
	sterritt@google.com, tony.luck@intel.com, samitolvanen@google.com,
	fenghua.yu@intel.com, pangupta@amd.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-arch@vger.kernel.org
In-Reply-To: <BYAPR21MB16887196D3DFFCB52EAC546AD748A@BYAPR21MB1688.namprd21.prod.outlook.com>

On Wed, May 31, 2023 at 02:50:50PM +0000, Michael Kelley (LINUX) wrote:

> I'm jumping in to answer some of the basic questions here.  Yesterday,
> there was a discussion about nested #HV exceptions, so maybe some of
> this is already understood, but let me recap at a higher level, provide some
> references, and suggest the path forward.

> 2) For the Restricted Interrupt Injection code, Tianyu will look at
> how to absolutely minimize the impact in the hot code paths,
> particularly when SEV-SNP is not active.  Hopefully the impact can
> be a couple of instructions at most, or even less with the use of
> other existing kernel techniques.  He'll look at the other things you've
> commented on and get the code into a better state.  I'll work with
> him on writing commit messages and comments that explain what's
> going on.

So, from what I understand of all this SEV-SNP/#HV muck, it is
near impossible to get right without ucode/hw changes. Hence my request
to Tom to look into that.

The feature as specified in the AMD documentation seems fundamentally
buggered.

Specifically #HV needs to be IST because hypervisor can inject at any
moment, irrespective of IF or anything else -- even #HV itself. This
means also in the syscall gap.

Since it is IST, a nested #HV is instant stack corruption -- #HV can
attempt to play stack games as per the copied #VC crap (which I'm not at
all convinced about being correct itself), but this doesn't actually fix
anything, all you need is a single instruction window to wreck things.

Because as stated, the whole premise is that the hypervisor is out to
get you, you must not leave it room to wiggle. As is, this is security
through prayer, and we don't do that.

In short; I really want a solid proof that what you propose to implement
is correct and not wishful thinking.


* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
From: Thomas Gleixner @ 2023-05-31 15:27 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm
In-Reply-To: <20230519102715.704767397@infradead.org>

On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
> Because of how the virtual clocks use U64_MAX as an exception value
> instead of a valid time, the clocks can no longer be assumed to wrap
> cleanly. This is then compounded by arch_vdso_cycles_ok() rejecting
> everything with the MSB/Sign-bit set.
>
> Therefore, the effective mask becomes S64_MAX, and the comment with
> vdso_calc_delta() that states the mask is U64_MAX and isn't optimized
> out is just plain silly.
>
> Now, the code has a negative filter -- to deal with TSC wobbles:
>
> 	if (cycles > last)
>
> which is just plain wrong, because it should've been written as:
>
> 	if ((s64)(cycles - last) > 0)
>
> to take wrapping into account, but per all the above, we don't
> actually wrap on u64 anymore.

Indeed. The rationale was that you need ~146 years uptime with a 4GHz
TSC or ~584 years with 1GHz to actually reach the wrap around point.
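
The arithmetic behind those figures, and the difference between the two
comparisons quoted above, can be checked with a small userspace sketch. The
function names are made up for illustration and a year is assumed to be
365.25 days; this is not the vDSO code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Years until a free-running 64-bit counter at freq_hz wraps. */
static double years_to_wrap_u64(double freq_hz)
{
	const double seconds_per_year = 365.25 * 24 * 60 * 60;

	/* 18446744073709551616.0 is 2^64, exactly representable as double. */
	return 18446744073709551616.0 / freq_hz / seconds_per_year;
}

/* Naive filter from the quoted patch description: breaks across a wrap. */
static bool cycles_after_naive(uint64_t cycles, uint64_t last)
{
	return cycles > last;
}

/* Wrap-aware variant: compare the signed distance instead. */
static bool cycles_after_wrapsafe(uint64_t cycles, uint64_t last)
{
	return (int64_t)(cycles - last) > 0;
}
```

With last just below U64_MAX and cycles just past zero, the naive form
reports "not after" while the signed-distance form still sees forward
progress.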

Though I can see your point to make sure that silly BIOSes or VMMs
cannot cause havoc by accident or malice.

Did anyone ever validate that wrap around on TSC including TSC deadline
timer works correctly?

I have faint memories of TSC_ADJUST, which I prefer not to bring back to
main memory :)

Thanks,

        tglx

* RE: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
From: Michael Kelley (LINUX) @ 2023-05-31 14:50 UTC (permalink / raw)
  To: Peter Zijlstra, Tianyu Lan
  Cc: luto@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, seanjc@google.com, pbonzini@redhat.com,
	jgross@suse.com, Tianyu Lan, kirill@shutemov.name,
	jiangshan.ljs@antgroup.com, ashish.kalra@amd.com,
	srutherford@google.com, akpm@linux-foundation.org,
	anshuman.khandual@arm.com, pawan.kumar.gupta@linux.intel.com,
	adrian.hunter@intel.com, daniel.sneddon@linux.intel.com,
	alexander.shishkin@linux.intel.com, sandipan.das@amd.com,
	ray.huang@amd.com, brijesh.singh@amd.com, michael.roth@amd.com,
	thomas.lendacky@amd.com, venu.busireddy@oracle.com,
	sterritt@google.com, tony.luck@intel.com, samitolvanen@google.com,
	fenghua.yu@intel.com, pangupta@amd.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-arch@vger.kernel.org
In-Reply-To: <20230517130943.GE2665450@hirez.programming.kicks-ass.net>

From: Peter Zijlstra <peterz@infradead.org> Sent: Wednesday, May 17, 2023 6:10 AM
> 
> On Wed, May 17, 2023 at 05:55:45PM +0800, Tianyu Lan wrote:
> > On 5/16/2023 5:32 PM, Peter Zijlstra wrote:
> > > > --- a/arch/x86/entry/entry_64.S
> > > > +++ b/arch/x86/entry/entry_64.S
> > > > @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
> > > >    * R15 - old SPEC_CTRL
> > > >    */
> > > >   SYM_CODE_START_LOCAL(paranoid_exit)
> > > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > > +	/*
> > > > +	 * If a #HV was delivered during execution and interrupts were
> > > > +	 * disabled, then check if it can be handled before the iret
> > > > +	 * (which may re-enable interrupts).
> > > > +	 */
> > > > +	mov     %rsp, %rdi
> > > > +	call    check_hv_pending
> > > > +#endif
> > > >   	UNWIND_HINT_REGS
> > > >   	/*
> > > > @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
> > > >   SYM_CODE_END(error_entry)
> > > >   SYM_CODE_START_LOCAL(error_return)
> > > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > > +	/*
> > > > +	 * If a #HV was delivered during execution and interrupts were
> > > > +	 * disabled, then check if it can be handled before the iret
> > > > +	 * (which may re-enable interrupts).
> > > > +	 */
> > > > +	mov     %rsp, %rdi
> > > > +	call    check_hv_pending
> > > > +#endif
> > > >   	UNWIND_HINT_REGS
> > > >   	DEBUG_ENTRY_ASSERT_IRQS_OFF
> > > >   	testb	$3, CS(%rsp)
> > > Oh hell no... so now you're adding unconditional calls to every single
> > > interrupt and nmi exit path, with the grand total of 0 justification.
> > >
> >
> > Sorry, the check is currently done inside check_hv_pending(). I will move
> > it to before the call to check_hv_pending() in the next version. Thanks.
> 
> You will also explain, in the Changelog, in excruciating detail, *WHY*
> any of this is required.
> 
> Any additional code in these paths that are only required for some
> random hypervisor had better proof that they are absolutely required and
> no alternative solution exists and have no performance impact on normal
> users.
> 
> If this is due to Hyper-V design idiocies over something fundamentally
> required by the hardware design you'll get a NAK.

I'm jumping in to answer some of the basic questions here.  Yesterday,
there was a discussion about nested #HV exceptions, so maybe some of
this is already understood, but let me recap at a higher level, provide some
references, and suggest the path forward.

This code and some of the other patches in this series are for handling the
#HV exception that is introduced by the Restricted Interrupt Injection
feature of the SEV-SNP architecture.  See Section 15.36.16 of [1], and
Section 5 of [2].   There's also an AMD presentation from LPC last fall [3].

Hyper-V requires that the guest implement Restricted Interrupt Injection
to handle the case of a compromised hypervisor injecting an exception
(and forcing the running of that exception handler), even when it should
be disallowed by guest state. For example, the hypervisor could inject an
interrupt while the guest has interrupts disabled.  In time, presumably other
hypervisors like KVM will at least have an option where they expect SEV-SNP
guests to implement Restricted Interrupt Injection functionality, so it's
not Hyper-V specific.

Naming the new exception as #HV and use of "hv" as the Linux prefix
for related functions and variable names is a bit unfortunate.  It
conflicts with the existing use of the "hv" prefix to denote Hyper-V
specific code in the Linux kernel, and at first glance makes this code
look like it is Hyper-V specific code. Maybe we can choose a different
prefix ("hvex"?) for this #HV exception related code to avoid that
"first glance" confusion.

I've talked with Tianyu offline, and he will do the following:

1) Split this patch set into two patch sets.  The first patch set is Hyper-V
specific code for managing communication pages that must be shared
between the guest and Hyper-V, for starting APs, etc.  The second patch
set will be only the Restricted Interrupt Injection and #HV code.

2) For the Restricted Interrupt Injection code, Tianyu will look at
how to absolutely minimize the impact in the hot code paths,
particularly when SEV-SNP is not active.  Hopefully the impact can
be a couple of instructions at most, or even less with the use of
other existing kernel techniques.  He'll look at the other things you've
commented on and get the code into a better state.  I'll work with
him on writing commit messages and comments that explain what's
going on.

Michael

[1] https://www.amd.com/system/files/TechDocs/24593.pdf 
[2] https://www.amd.com/system/files/TechDocs/56421-guest-hypervisor-communication-block-standardization.pdf
[3] https://lpc.events/event/16/contributions/1321/attachments/965/1886/SNP_Interrupt_Security.pptx 

* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-05-31 14:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230531083508.GAZHcGvB68PUAH7f+a@fat_crate.local>


On 31.05.23 10:35, Borislav Petkov wrote:
> [    0.018357] MTRR default type: uncachable
> [    0.022347] MTRR fixed ranges enabled:
> [    0.026085]   00000-9FFFF write-back
> [    0.029650]   A0000-BFFFF uncachable
> [    0.033214]   C0000-FFFFF write-protect
> [    0.037039] MTRR variable ranges enabled:
> [    0.041038]   0 base 000000000000000 mask 0003FFC00000000 write-back
> [    0.047383]   1 base 000000400000000 mask 0003FFFC0000000 write-back
> [    0.053730]   2 base 000000440000000 mask 0003FFFF0000000 write-back
> [    0.060076]   3 base 0000000AE000000 mask 0003FFFFE000000 uncachable
> [    0.066421]   4 base 0000000B0000000 mask 0003FFFF0000000 uncachable
> [    0.072768]   5 base 0000000C0000000 mask 0003FFFC0000000 uncachable
> [    0.079114]   6 disabled
> [    0.081635]   7 disabled
> [    0.084156]   8 disabled
> [    0.086677]   9 disabled
> [    0.089203] total RAM covered: 16352M
> [    0.093023] Found optimal setting for mtrr clean up
> [    0.097734]  gran_size: 64K 	chunk_size: 64M 	num_reg: 8  	lose cover RAM: 0G

One other note: why does mtrr_cleanup() think that using 8 instead of 6
variable MTRRs would be an "optimal setting"?

IMO it should replace the original setup only in case it is using _less_
MTRRs than before.
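
A minimal sketch of that rule, as standalone C rather than the actual
mtrr_cleanup() code. struct var_mtrr, count_used() and
cleanup_is_improvement() are hypothetical names, and a size of 0 is assumed
to mark a disabled register.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one variable MTRR. */
struct var_mtrr {
	unsigned long long base;
	unsigned long long size;	/* 0 means the register is disabled */
};

/* Count the variable MTRRs that are currently in use. */
static unsigned int count_used(const struct var_mtrr *regs, size_t n)
{
	unsigned int used = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (regs[i].size)
			used++;

	return used;
}

/* Accept the cleaned-up layout only if it frees up registers. */
static bool cleanup_is_improvement(unsigned int used_before,
				   unsigned int used_after)
{
	return used_after < used_before;
}
```

With the numbers from the log above (6 registers used by the BIOS setup,
num_reg: 8 proposed), cleanup_is_improvement(6, 8) is false and the
original setup would be kept.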

Additionally, I believe mtrr_cleanup() would make much more sense if it
weren't __init, but could be used when trying to add additional MTRRs
in the running system in case we run out of MTRRs.

It should probably be based on the new MTRR map anyway...


Juergen

* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Borislav Petkov @ 2023-05-31  9:58 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <7e824a95-6676-9553-4158-d434f617fcbb@suse.com>

On Wed, May 31, 2023 at 11:31:37AM +0200, Juergen Gross wrote:
> What it did would have been printed if pr_debug() had been
> active. :-(

Lemme turn those into pr_info(). pr_debug() is nuts.

> Did you check whether CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT was the same in both
> kernels you've tested?

Yes, it is enabled.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-05-31  9:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230531083508.GAZHcGvB68PUAH7f+a@fat_crate.local>


On 31.05.23 10:35, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 09:28:57AM +0200, Juergen Gross wrote:
>> Can you please boot the system with the MTRR patches and specify "mtrr=debug"
>> on the command line? I'd be interested in the raw register values being read
>> and the resulting memory type map.
> 
> This is exactly why I wanted this option. And you're already putting it
> to good use. :-P
> 
> Full dmesg below.
> 
> [    0.012878] last_pfn = 0x450000 max_arch_pfn = 0x400000000
> [    0.018357] MTRR default type: uncachable
> [    0.022347] MTRR fixed ranges enabled:
> [    0.026085]   00000-9FFFF write-back
> [    0.029650]   A0000-BFFFF uncachable
> [    0.033214]   C0000-FFFFF write-protect
> [    0.037039] MTRR variable ranges enabled:
> [    0.041038]   0 base 000000000000000 mask 0003FFC00000000 write-back

16 GB WB at address 0.

> [    0.047383]   1 base 000000400000000 mask 0003FFFC0000000 write-back

1 GB WB at address 16GB.

> [    0.053730]   2 base 000000440000000 mask 0003FFFF0000000 write-back

256MB WB at address 17GB.

This means per default 0-44fffffff are WB.

> [    0.060076]   3 base 0000000AE000000 mask 0003FFFFE000000 uncachable

32MB UC at AE000000

> [    0.066421]   4 base 0000000B0000000 mask 0003FFFF0000000 uncachable

256MB UC at B0000000

> [    0.072768]   5 base 0000000C0000000 mask 0003FFFC0000000 uncachable

1GB UC at C0000000

So a UC hole at AE000000-FFFFFFFF.
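
The sizes can be derived mechanically from the printed masks: for a
contiguous mask, the range size equals the lowest set bit. A minimal sketch,
assuming the base and mask values are byte addresses exactly as printed in
the log (the helper names are made up for illustration):

```c
#include <stdint.h>

/* For a contiguous MTRR mask, the range size equals the lowest set bit. */
static uint64_t mtrr_mask_to_size(uint64_t mask)
{
	return mask & (~mask + 1);
}

/* Last byte covered by a variable range with the given base and mask. */
static uint64_t mtrr_range_end(uint64_t base, uint64_t mask)
{
	return base + mtrr_mask_to_size(mask) - 1;
}
```

For example, mask 0003FFFC0000000 decodes to 0x40000000 (1 GiB), so the
range at base C0000000 ends at FFFFFFFF, and mask 0003FFFF0000000 at base
440000000 ends at 44FFFFFFF.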

> [    0.079114]   6 disabled
> [    0.081635]   7 disabled
> [    0.084156]   8 disabled
> [    0.086677]   9 disabled
> [    0.089203] total RAM covered: 16352M
> [    0.093023] Found optimal setting for mtrr clean up

It seems as if mtrr_cleanup() did change the MTRR settings.

What it did would have been printed if pr_debug() had been
active. :-(

> [    0.097734]  gran_size: 64K 	chunk_size: 64M 	num_reg: 8  	lose cover RAM: 0G
> [    0.104864] MTRR map: 6 entries (3 fixed + 3 variable; max 23), built from 10 variable MTRRs
> [    0.113294]   0: 0000000000000000-000000000009ffff write-back
> [    0.119033]   1: 00000000000a0000-00000000000bffff uncachable
> [    0.124771]   2: 00000000000c0000-00000000000fffff write-protect
> [    0.130769]   3: 0000000000100000-00000000adffffff write-back
> [    0.136508]   4: 00000000ae000000-00000000afffffff uncachable
> [    0.142246]   5: 0000000100000000-000000044fffffff write-back

The MTRR map seems to be fine assuming the MTRR values before the "clean up".

> [    0.147992] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
> [    0.155122] e820: update [mem 0xae000000-0xafffffff] usable ==> reserved
> [    0.161663] e820: update [mem 0xc0000000-0xffffffff] usable ==> reserved
> [    0.168358] e820: update [mem 0x110000000-0x1ffffffff] usable ==> reserved
> [    0.175227] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 3840MB of RAM.

Clean up messed with the settings, resulting in loss of RAM.

Did you check whether CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT was the same in both
kernels you've tested?


Juergen

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
From: Peter Zijlstra @ 2023-05-31  9:14 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Gupta, Pankaj, Tianyu Lan, luto, tglx, mingo, bp, dave.hansen,
	x86, hpa, seanjc, pbonzini, jgross, tiala, kirill, jiangshan.ljs,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, venu.busireddy, sterritt, tony.luck, samitolvanen,
	fenghua.yu, pangupta, linux-kernel, kvm, linux-hyperv, linux-arch
In-Reply-To: <20230530185232.GA211927@hirez.programming.kicks-ass.net>

On Tue, May 30, 2023 at 08:52:32PM +0200, Peter Zijlstra wrote:

> > That should really say that a nested #HV should never be raised by the
> > hypervisor, but if it is, then the guest should detect that and
> > self-terminate knowing that the hypervisor is possibly being malicious.
> 
> I've yet to see code that can do that reliably.

Tom; could you please investigate if this can be enforced in ucode?

Ideally #HV would have an internal latch such that a recursive #HV will
terminate the guest (much like double #MC and triple-fault).

But unlike the #MC trainwreck, can we please not leave a glaring hole in
this latch and use a spare bit in the IRET frame please?

So have #HV delivery:
 - check internal latch; if set, terminate machine
 - set latch
 - write IRET frame with magic bit set

have IRET:
 - check magic bit and reset #HV latch


* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Borislav Petkov @ 2023-05-31  8:35 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <888f860d-4307-54eb-01da-11f9adf65559@suse.com>

On Wed, May 31, 2023 at 09:28:57AM +0200, Juergen Gross wrote:
> Can you please boot the system with the MTRR patches and specify "mtrr=debug"
> on the command line? I'd be interested in the raw register values being read
> and the resulting memory type map.

This is exactly why I wanted this option. And you're already putting it
to good use. :-P

Full dmesg below.

[    0.000000] microcode: updated early: 0x710 -> 0x718, date = 2019-05-21
[    0.000000] Linux version 6.4.0-rc1+ (boris@zn) (gcc (Debian 12.2.0-9) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Tue May 30 15:54:17 CEST 2023
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.4.0-rc1+ root=/dev/sda7 ro earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 ras=cec_disable root=/dev/sda7 log_buf_len=10M resume=/dev/sda5 no_console_suspend ignore_loglevel mtrr=debug
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[    0.000000] signal: max sigframe size: 1776
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000018ebafff] usable
[    0.000000] BIOS-e820: [mem 0x0000000018ebb000-0x0000000018fe7fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000018fe8000-0x0000000018fe8fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000018fe9000-0x0000000018ffffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000019000000-0x000000001dffcfff] usable
[    0.000000] BIOS-e820: [mem 0x000000001dffd000-0x000000001dffffff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000001e000000-0x00000000ac77cfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000ac77d000-0x00000000ac77ffff] type 20
[    0.000000] BIOS-e820: [mem 0x00000000ac780000-0x00000000ac780fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ac781000-0x00000000ac782fff] type 20
[    0.000000] BIOS-e820: [mem 0x00000000ac783000-0x00000000ac7d9fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ac7da000-0x00000000ac7dafff] type 20
[    0.000000] BIOS-e820: [mem 0x00000000ac7db000-0x00000000ac7dcfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ac7dd000-0x00000000ac7e7fff] type 20
[    0.000000] BIOS-e820: [mem 0x00000000ac7e8000-0x00000000ac7f1fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ac7f2000-0x00000000ac7f5fff] type 20
[    0.000000] BIOS-e820: [mem 0x00000000ac7f6000-0x00000000ac7f9fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ac7fa000-0x00000000ac7fafff] type 20
[    0.000000] BIOS-e820: [mem 0x00000000ac7fb000-0x00000000ac803fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ac804000-0x00000000ac810fff] type 20
[    0.000000] BIOS-e820: [mem 0x00000000ac811000-0x00000000ac813fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ac814000-0x00000000ad7fffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000b0000000-0x00000000b3ffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed20000-0x00000000fed3ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed50000-0x00000000fed8ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffa00000-0x00000000ffa3ffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000044fffffff] usable
[    0.000000] printk: bootconsole [earlyser0] enabled
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] efi: EFI v2.0 by American Megatrends
[    0.000000] efi: ACPI 2.0=0x1dffff98 SMBIOS=0xac811018 
[    0.000000] efi: Remove mem57: MMIO range=[0xb0000000-0xb3ffffff] (64MB) from e820 map
[    0.000000] e820: remove [mem 0xb0000000-0xb3ffffff] reserved
[    0.000000] efi: Not removing mem58: MMIO range=[0xfed20000-0xfed3ffff] (128KB) from e820 map
[    0.000000] efi: Remove mem59: MMIO range=[0xfed50000-0xfed8ffff] (0MB) from e820 map
[    0.000000] e820: remove [mem 0xfed50000-0xfed8ffff] reserved
[    0.000000] efi: Remove mem60: MMIO range=[0xffa00000-0xffa3ffff] (0MB) from e820 map
[    0.000000] e820: remove [mem 0xffa00000-0xffa3ffff] reserved
[    0.000000] SMBIOS 2.6 present.
[    0.000000] DMI: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 3591.377 MHz processor
[    0.000767] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.007307] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.012878] last_pfn = 0x450000 max_arch_pfn = 0x400000000
[    0.018357] MTRR default type: uncachable
[    0.022347] MTRR fixed ranges enabled:
[    0.026085]   00000-9FFFF write-back
[    0.029650]   A0000-BFFFF uncachable
[    0.033214]   C0000-FFFFF write-protect
[    0.037039] MTRR variable ranges enabled:
[    0.041038]   0 base 000000000000000 mask 0003FFC00000000 write-back
[    0.047383]   1 base 000000400000000 mask 0003FFFC0000000 write-back
[    0.053730]   2 base 000000440000000 mask 0003FFFF0000000 write-back
[    0.060076]   3 base 0000000AE000000 mask 0003FFFFE000000 uncachable
[    0.066421]   4 base 0000000B0000000 mask 0003FFFF0000000 uncachable
[    0.072768]   5 base 0000000C0000000 mask 0003FFFC0000000 uncachable
[    0.079114]   6 disabled
[    0.081635]   7 disabled
[    0.084156]   8 disabled
[    0.086677]   9 disabled
[    0.089203] total RAM covered: 16352M
[    0.093023] Found optimal setting for mtrr clean up
[    0.097734]  gran_size: 64K 	chunk_size: 64M 	num_reg: 8  	lose cover RAM: 0G
[    0.104864] MTRR map: 6 entries (3 fixed + 3 variable; max 23), built from 10 variable MTRRs
[    0.113294]   0: 0000000000000000-000000000009ffff write-back
[    0.119033]   1: 00000000000a0000-00000000000bffff uncachable
[    0.124771]   2: 00000000000c0000-00000000000fffff write-protect
[    0.130769]   3: 0000000000100000-00000000adffffff write-back
[    0.136508]   4: 00000000ae000000-00000000afffffff uncachable
[    0.142246]   5: 0000000100000000-000000044fffffff write-back
[    0.147992] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
[    0.155122] e820: update [mem 0xae000000-0xafffffff] usable ==> reserved
[    0.161663] e820: update [mem 0xc0000000-0xffffffff] usable ==> reserved
[    0.168358] e820: update [mem 0x110000000-0x1ffffffff] usable ==> reserved
[    0.175227] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 3840MB of RAM.
[    0.183397] update e820 for mtrr
[    0.186621] modified physical RAM map:
[    0.190351] modified: [mem 0x0000000000000000-0x0000000000000fff] reserved
[    0.197219] modified: [mem 0x0000000000001000-0x000000000009ffff] usable
[    0.203914] modified: [mem 0x0000000000100000-0x0000000018ebafff] usable
[    0.210608] modified: [mem 0x0000000018ebb000-0x0000000018fe7fff] ACPI NVS
[    0.217475] modified: [mem 0x0000000018fe8000-0x0000000018fe8fff] usable
[    0.224170] modified: [mem 0x0000000018fe9000-0x0000000018ffffff] ACPI NVS
[    0.231037] modified: [mem 0x0000000019000000-0x000000001dffcfff] usable
[    0.237732] modified: [mem 0x000000001dffd000-0x000000001dffffff] ACPI data
[    0.244687] modified: [mem 0x000000001e000000-0x00000000ac77cfff] usable
[    0.251381] modified: [mem 0x00000000ac77d000-0x00000000ac77ffff] type 20
[    0.258162] modified: [mem 0x00000000ac780000-0x00000000ac780fff] reserved
[    0.265031] modified: [mem 0x00000000ac781000-0x00000000ac782fff] type 20
[    0.271812] modified: [mem 0x00000000ac783000-0x00000000ac7d9fff] reserved
[    0.278679] modified: [mem 0x00000000ac7da000-0x00000000ac7dafff] type 20
[    0.285460] modified: [mem 0x00000000ac7db000-0x00000000ac7dcfff] reserved
[    0.292329] modified: [mem 0x00000000ac7dd000-0x00000000ac7e7fff] type 20
[    0.299109] modified: [mem 0x00000000ac7e8000-0x00000000ac7f1fff] reserved
[    0.305977] modified: [mem 0x00000000ac7f2000-0x00000000ac7f5fff] type 20
[    0.312757] modified: [mem 0x00000000ac7f6000-0x00000000ac7f9fff] reserved
[    0.319627] modified: [mem 0x00000000ac7fa000-0x00000000ac7fafff] type 20
[    0.326408] modified: [mem 0x00000000ac7fb000-0x00000000ac803fff] reserved
[    0.333275] modified: [mem 0x00000000ac804000-0x00000000ac810fff] type 20
[    0.340058] modified: [mem 0x00000000ac811000-0x00000000ac813fff] reserved
[    0.346927] modified: [mem 0x00000000ac814000-0x00000000ad7fffff] usable
[    0.353620] modified: [mem 0x00000000fed20000-0x00000000fed3ffff] reserved
[    0.360489] modified: [mem 0x0000000100000000-0x000000010fffffff] usable
[    0.367183] modified: [mem 0x0000000110000000-0x00000001ffffffff] reserved
[    0.374051] modified: [mem 0x0000000200000000-0x000000044fffffff] usable
[    0.380745] last_pfn = 0x450000 max_arch_pfn = 0x400000000
[    0.386223] last_pfn = 0xad800 max_arch_pfn = 0x400000000
[    0.393245] found SMP MP-table at [mem 0x000f1dd0-0x000f1ddf]
[    0.398838] Using GB pages for direct mapping
[    0.415353] printk: log_buf_len: 16777216 bytes
[    0.419724] printk: early log buf free: 253832(96%)
[    0.424592] Secure boot could not be determined
[    0.429112] RAMDISK: [mem 0x372c7000-0x3795afff]
[    0.433723] ACPI: Early table checksum verification disabled
[    0.439377] ACPI: RSDP 0x000000001DFFFF98 000024 (v02 DELL  )
[    0.445112] ACPI: XSDT 0x000000001DFFEE18 00006C (v01 DELL   CBX3     06222004 MSFT 00010013)
[    0.453632] ACPI: FACP 0x0000000018FF0C18 0000F4 (v04 DELL   CBX3     06222004 MSFT 00010013)
[    0.462153] ACPI: DSDT 0x0000000018FA9018 006373 (v01 DELL   CBX3     00000000 INTL 20091112)
[    0.470671] ACPI: FACS 0x0000000018FFDF40 000040
[    0.475278] ACPI: FACS 0x0000000018FF1F40 000040
[    0.479887] ACPI: APIC 0x000000001DFFDC18 000158 (v02 DELL   CBX3     06222004 MSFT 00010013)
[    0.488406] ACPI: MCFG 0x0000000018FFED18 00003C (v01 A M I  OEMMCFG. 06222004 MSFT 00000097)
[    0.496927] ACPI: TCPA 0x0000000018FFEC98 000032 (v02                 00000000      00000000)
[    0.505447] ACPI: SSDT 0x0000000018FEFA98 000306 (v01 DELLTP TPM      00003000 INTL 20091112)
[    0.513967] ACPI: HPET 0x0000000018FFEC18 000038 (v01 A M I   PCHHPET 06222004 AMI. 00000003)
[    0.522487] ACPI: BOOT 0x0000000018FFEB98 000028 (v01 DELL   CBX3     06222004 AMI  00010013)
[    0.531008] ACPI: SSDT 0x0000000018FB0018 037106 (v02 INTEL  CpuPm    00004000 INTL 20091112)
[    0.539526] ACPI: SLIC 0x0000000018FEEC18 000176 (v03 DELL   CBX3     06222004 MSFT 00010013)
[    0.548046] ACPI: Reserving FACP table memory at [mem 0x18ff0c18-0x18ff0d0b]
[    0.555088] ACPI: Reserving DSDT table memory at [mem 0x18fa9018-0x18faf38a]
[    0.562130] ACPI: Reserving FACS table memory at [mem 0x18ffdf40-0x18ffdf7f]
[    0.569172] ACPI: Reserving FACS table memory at [mem 0x18ff1f40-0x18ff1f7f]
[    0.576213] ACPI: Reserving APIC table memory at [mem 0x1dffdc18-0x1dffdd6f]
[    0.583254] ACPI: Reserving MCFG table memory at [mem 0x18ffed18-0x18ffed53]
[    0.590295] ACPI: Reserving TCPA table memory at [mem 0x18ffec98-0x18ffecc9]
[    0.597336] ACPI: Reserving SSDT table memory at [mem 0x18fefa98-0x18fefd9d]
[    0.604378] ACPI: Reserving HPET table memory at [mem 0x18ffec18-0x18ffec4f]
[    0.611418] ACPI: Reserving BOOT table memory at [mem 0x18ffeb98-0x18ffebbf]
[    0.618665] ACPI: Reserving SSDT table memory at [mem 0x18fb0018-0x18fe711d]
[    0.625706] ACPI: Reserving SLIC table memory at [mem 0x18feec18-0x18feed8d]
[    0.632792] No NUMA configuration found
[    0.636572] Faking a node at [mem 0x0000000000000000-0x000000044fffffff]
[    0.643268] NODE_DATA(0) allocated [mem 0x44b7f8000-0x44b7fbfff]
[    0.649305] Zone ranges:
[    0.651786]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.657959]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.664132]   Normal   [mem 0x0000000100000000-0x000000044fffffff]
[    0.670304] Movable zone start for each node
[    0.674564] Early memory node ranges
[    0.678128]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
[    0.684388]   node   0: [mem 0x0000000000100000-0x0000000018ebafff]
[    0.690647]   node   0: [mem 0x0000000018fe8000-0x0000000018fe8fff]
[    0.696908]   node   0: [mem 0x0000000019000000-0x000000001dffcfff]
[    0.703168]   node   0: [mem 0x000000001e000000-0x00000000ac77cfff]
[    0.709427]   node   0: [mem 0x00000000ac814000-0x00000000ad7fffff]
[    0.715686]   node   0: [mem 0x0000000100000000-0x000000010fffffff]
[    0.721946]   node   0: [mem 0x0000000200000000-0x000000044fffffff]
[    0.728206] Initmem setup node 0 [mem 0x0000000000001000-0x000000044fffffff]
[    0.735250] On node 0, zone DMA: 1 pages in unavailable ranges
[    0.735274] On node 0, zone DMA: 96 pages in unavailable ranges
[    0.741596] On node 0, zone DMA32: 301 pages in unavailable ranges
[    0.747459] On node 0, zone DMA32: 23 pages in unavailable ranges
[    0.756635] On node 0, zone DMA32: 3 pages in unavailable ranges
[    0.762596] On node 0, zone DMA32: 151 pages in unavailable ranges
[    0.768986] On node 0, zone Normal: 10240 pages in unavailable ranges
[    0.788009] ACPI: PM-Timer IO Port: 0x408
[    0.798317] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[    0.805164] IOAPIC[1]: apic_id 2, version 32, address 0xfec3f000, GSI 24-47
[    0.812116] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.818461] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.825071] ACPI: Using ACPI (MADT) for SMP configuration information
[    0.831501] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    0.836633] TSC deadline timer available
[    0.840544] smpboot: Allowing 32 CPUs, 24 hotplug CPUs
[    0.845700] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.853236] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[    0.860800] PM: hibernation: Registered nosave memory: [mem 0x18ebb000-0x18fe7fff]
[    0.868363] PM: hibernation: Registered nosave memory: [mem 0x18fe9000-0x18ffffff]
[    0.875926] PM: hibernation: Registered nosave memory: [mem 0x1dffd000-0x1dffffff]
[    0.883490] PM: hibernation: Registered nosave memory: [mem 0xac77d000-0xac77ffff]
[    0.891051] PM: hibernation: Registered nosave memory: [mem 0xac780000-0xac780fff]
[    0.898614] PM: hibernation: Registered nosave memory: [mem 0xac781000-0xac782fff]
[    0.906178] PM: hibernation: Registered nosave memory: [mem 0xac783000-0xac7d9fff]
[    0.913742] PM: hibernation: Registered nosave memory: [mem 0xac7da000-0xac7dafff]
[    0.921306] PM: hibernation: Registered nosave memory: [mem 0xac7db000-0xac7dcfff]
[    0.928869] PM: hibernation: Registered nosave memory: [mem 0xac7dd000-0xac7e7fff]
[    0.936433] PM: hibernation: Registered nosave memory: [mem 0xac7e8000-0xac7f1fff]
[    0.943996] PM: hibernation: Registered nosave memory: [mem 0xac7f2000-0xac7f5fff]
[    0.951559] PM: hibernation: Registered nosave memory: [mem 0xac7f6000-0xac7f9fff]
[    0.959122] PM: hibernation: Registered nosave memory: [mem 0xac7fa000-0xac7fafff]
[    0.966686] PM: hibernation: Registered nosave memory: [mem 0xac7fb000-0xac803fff]
[    0.974248] PM: hibernation: Registered nosave memory: [mem 0xac804000-0xac810fff]
[    0.981809] PM: hibernation: Registered nosave memory: [mem 0xac811000-0xac813fff]
[    0.989372] PM: hibernation: Registered nosave memory: [mem 0xad800000-0xfed1ffff]
[    0.996933] PM: hibernation: Registered nosave memory: [mem 0xfed20000-0xfed3ffff]
[    1.004495] PM: hibernation: Registered nosave memory: [mem 0xfed40000-0xffffffff]
[    1.012060] PM: hibernation: Registered nosave memory: [mem 0x110000000-0x1ffffffff]
[    1.019797] [mem 0xad800000-0xfed1ffff] available for PCI devices
[    1.025880] Booting paravirtualized kernel on bare hardware
[    1.031444] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    1.046295] setup_percpu: NR_CPUS:256 nr_cpumask_bits:32 nr_cpu_ids:32 nr_node_ids:1
[    1.056271] percpu: Embedded 78 pages/cpu s282624 r8192 d28672 u524288
[    1.062650] pcpu-alloc: s282624 r8192 d28672 u524288 alloc=1*2097152
[    1.068988] pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07 
[    1.074292] pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15 
[    1.079594] pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23 
[    1.084897] pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31 
[    1.090219] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.4.0-rc1+ root=/dev/sda7 ro earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 ras=cec_disable root=/dev/sda7 log_buf_len=10M resume=/dev/sda5 no_console_suspend ignore_loglevel mtrr=debug
[    1.112884] Unknown kernel command line parameters "BOOT_IMAGE=/boot/vmlinuz-6.4.0-rc1+", will be passed to user space.
[    1.125058] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
[    1.133712] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    1.141770] Fallback order for Node 0: 0 
[    1.141776] Built 1 zonelists, mobility grouping on.  Total pages: 3150281
[    1.152485] Policy zone: Normal
[    1.155624] mem auto-init: stack:off, heap alloc:off, heap free:off
[    1.161881] software IO TLB: area num 32.
[    1.201697] Memory: 12308624K/12801796K available (14336K kernel code, 2459K rwdata, 5712K rodata, 3044K init, 14704K bss, 492916K reserved, 0K cma-reserved)
[    1.215760] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=1
[    1.222260] Kernel/User page tables isolation: enabled
[    1.227461] ftrace: allocating 40092 entries in 157 pages
[    1.238827] ftrace: allocated 157 pages with 5 groups
[    1.243844] Dynamic Preempt: full
[    1.247217] Running RCU self tests
[    1.250451] Running RCU synchronous self tests
[    1.254895] rcu: Preemptible hierarchical RCU implementation.
[    1.260621] rcu: 	RCU lockdep checking is enabled.
[    1.265402] rcu: 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=32.
[    1.272184] 	Trampoline variant of Tasks RCU enabled.
[    1.277225] 	Rude variant of Tasks RCU enabled.
[    1.281746] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[    1.289395] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=32
[    1.296244] Running RCU synchronous self tests
[    1.303525] NR_IRQS: 16640, nr_irqs: 1088, preallocated irqs: 16
[    1.309579] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    1.316463] Console: colour dummy device 80x25
[    1.320753] printk: console [tty0] enabled
[    1.324829] printk: bootconsole [earlyser0] disabled
[    1.329828] printk: console [ttyS0] enabled
[    2.969459] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    2.977231] ... MAX_LOCKDEP_SUBCLASSES:  8
[    2.981340] ... MAX_LOCK_DEPTH:          48
[    2.985536] ... MAX_LOCKDEP_KEYS:        8192
[    2.989908] ... CLASSHASH_SIZE:          4096
[    2.994279] ... MAX_LOCKDEP_ENTRIES:     32768
[    2.998738] ... MAX_LOCKDEP_CHAINS:      65536
[    3.003195] ... CHAINHASH_SIZE:          32768
[    3.007653]  memory used by lock dependency info: 6365 kB
[    3.013071]  memory used for stack traces: 4224 kB
[    3.017878]  per task-struct memory footprint: 1920 bytes
[    3.023329] ACPI: Core revision 20230331
[    3.027502] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
[    3.036706] APIC: Switch to symmetric I/O mode setup
[    3.041907] x2apic: IRQ remapping doesn't support X2APIC mode
[    3.047731] Switched APIC routing to physical flat.
[    3.053202] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    3.063699] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x33c4821c4fd, max_idle_ns: 440795387422 ns
[    3.074284] Calibrating delay loop (skipped), value calculated using timer frequency.. 7182.75 BogoMIPS (lpj=3591377)
[    3.075272] pid_max: default: 32768 minimum: 301
[    3.083375] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    3.084297] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    3.086699] CPU0: Thermal monitoring enabled (TM1)
[    3.087321] process: using mwait in idle threads
[    3.088279] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[    3.089272] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[    3.090276] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    3.091273] Spectre V2 : Mitigation: Retpolines
[    3.092272] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    3.093271] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT
[    3.094271] Spectre V2 : Enabling Restricted Speculation for firmware calls
[    3.095274] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    3.096272] Spectre V2 : User space: Mitigation: STIBP via prctl
[    3.097273] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
[    3.098275] MDS: Mitigation: Clear CPU buffers
[    3.099272] MMIO Stale Data: Unknown: No mitigations
[    3.113896] Freeing SMP alternatives memory: 36K
[    3.114674] Running RCU synchronous self tests
[    3.115277] Running RCU synchronous self tests
[    3.116462] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz (family: 0x6, model: 0x2d, stepping: 0x7)
[    3.117725] cblist_init_generic: Setting adjustable number of callback queues.
[    3.118272] cblist_init_generic: Setting shift to 5 and lim to 1.
[    3.119344] cblist_init_generic: Setting shift to 5 and lim to 1.
[    3.120328] Running RCU-tasks wait API self tests
[    3.232390] Performance Events: PEBS fmt1+, SandyBridge events, 16-deep LBR, full-width counters, Intel PMU driver.
[    3.233292] ... version:                3
[    3.234272] ... bit width:              48
[    3.235276] ... generic registers:      4
[    3.236280] ... value mask:             0000ffffffffffff
[    3.237279] ... max period:             00007fffffffffff
[    3.238277] ... fixed-purpose events:   3
[    3.239277] ... event mask:             000000070000000f
[    3.240594] Estimated ratio of average max frequency by base frequency (times 1024): 1052
[    3.241365] rcu: Hierarchical SRCU implementation.
[    3.242273] rcu: 	Max phase no-delay instances is 400.
[    3.245869] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[    3.247411] smp: Bringing up secondary CPUs ...
[    3.248655] x86: Booting SMP configuration:
[    3.249274] .... node  #0, CPUs:        #1  #2  #3  #4
[    3.260747] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    3.262583]   #5  #6  #7
[    3.268545] smp: Brought up 1 node, 8 CPUs
[    3.270285] smpboot: Max logical packages: 4
[    3.271315] smpboot: Total of 8 processors activated (57462.03 BogoMIPS)
[    3.274775] devtmpfs: initialized
[    3.276302] ACPI: PM: Registering ACPI NVS region [mem 0x18ebb000-0x18fe7fff] (1232896 bytes)
[    3.277409] ACPI: PM: Registering ACPI NVS region [mem 0x18fe9000-0x18ffffff] (94208 bytes)
[    3.278460] Running RCU synchronous self tests
[    3.279304] Running RCU synchronous self tests
[    3.280402] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    3.281300] futex hash table entries: 8192 (order: 8, 1048576 bytes, linear)
[    3.283791] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    3.284713] DMA: preallocated 2048 KiB GFP_KERNEL pool for atomic allocations
[    3.285286] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    3.286284] DMA: preallocated 2048 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    3.287578] thermal_sys: Registered thermal governor 'step_wise'
[    3.287581] thermal_sys: Registered thermal governor 'user_space'
[    3.288347] cpuidle: using governor ladder
[    3.290314] cpuidle: using governor menu
[    3.291323] Simple Boot Flag at 0xf3 set to 0x1
[    3.292331] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[    3.293435] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xb0000000-0xb3ffffff] (base 0xb0000000)
[    3.294276] PCI: not using MMCONFIG
[    3.295279] PCI: Using configuration type 1 for base access
[    3.296327] core: PMU erratum BJ122, BV98, HSD29 worked around, HT is on
[    3.298427] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
[    3.300295] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[    3.301276] HugeTLB: 16380 KiB vmemmap can be freed for a 1.00 GiB page
[    3.302276] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[    3.303279] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[    3.305532] ACPI: Added _OSI(Module Device)
[    3.306274] ACPI: Added _OSI(Processor Device)
[    3.307273] ACPI: Added _OSI(3.0 _SCP Extensions)
[    3.308273] ACPI: Added _OSI(Processor Aggregator Device)
[    3.344417] Callback from call_rcu_tasks_rude() invoked.
[    3.439738] ACPI: 3 ACPI AML tables successfully acquired and loaded
[    3.468367] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[    3.475332] ACPI: Interpreter enabled
[    3.476350] ACPI: PM: (supports S0 S1 S3 S4 S5)
[    3.477289] ACPI: Using IOAPIC for interrupt routing
[    3.478339] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xb0000000-0xb3ffffff] (base 0xb0000000)
[    3.482798] [Firmware Info]: PCI: MMCONFIG at [mem 0xb0000000-0xb3ffffff] not reserved in ACPI motherboard resources
[    3.483287] PCI: MMCONFIG at [mem 0xb0000000-0xb3ffffff] reserved as EfiMemoryMappedIO
[    3.484290] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    3.485272] PCI: Using E820 reservations for host bridge windows
[    3.486876] ACPI: Enabled 7 GPEs in block 00 to 3F
[    3.526310] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-1f])
[    3.527292] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[    3.528506] acpi PNP0A08:00: _OSC: platform does not support [AER PCIeCapability LTR]
[    3.529483] acpi PNP0A08:00: _OSC: not requesting control; platform does not support [PCIeCapability]
[    3.530280] acpi PNP0A08:00: _OSC: OS requested [PME AER PCIeCapability LTR]
[    3.531279] acpi PNP0A08:00: _OSC: platform willing to grant [PME]
[    3.532279] acpi PNP0A08:00: _OSC: platform retains control of PCIe features (AE_SUPPORT)
[    3.534553] PCI host bridge to bus 0000:00
[    3.535281] pci_bus 0000:00: root bus resource [io  0x0000-0x03af window]
[    3.536273] pci_bus 0000:00: root bus resource [io  0x03e0-0x0cf7 window]
[    3.537273] pci_bus 0000:00: root bus resource [io  0x03b0-0x03df window]
[    3.538273] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    3.539277] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000dffff window]
[    3.540279] pci_bus 0000:00: root bus resource [mem 0xb0000000-0xfbffffff window]
[    3.541285] pci_bus 0000:00: root bus resource [bus 00-1f]
[    3.542375] pci 0000:00:00.0: [8086:3c00] type 00 class 0x060000
[    3.543395] pci 0000:00:00.0: PME# supported from D0 D3hot D3cold
[    3.544527] pci 0000:00:01.0: [8086:3c02] type 01 class 0x060400
[    3.545399] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[    3.546742] pci 0000:00:01.1: [8086:3c03] type 01 class 0x060400
[    3.547387] pci 0000:00:01.1: PME# supported from D0 D3hot D3cold
[    3.548643] pci 0000:00:02.0: [8086:3c04] type 01 class 0x060400
[    3.549387] pci 0000:00:02.0: PME# supported from D0 D3hot D3cold
[    3.550636] pci 0000:00:03.0: [8086:3c08] type 01 class 0x060400
[    3.551318] pci 0000:00:03.0: enabling Extended Tags
[    3.552355] pci 0000:00:03.0: PME# supported from D0 D3hot D3cold
[    3.553603] pci 0000:00:05.0: [8086:3c28] type 00 class 0x088000
[    3.554490] pci 0000:00:05.2: [8086:3c2a] type 00 class 0x088000
[    3.555481] pci 0000:00:05.4: [8086:3c2c] type 00 class 0x080020
[    3.556287] pci 0000:00:05.4: reg 0x10: [mem 0xf332d000-0xf332dfff]
[    3.557488] pci 0000:00:11.0: [8086:1d3e] type 01 class 0x060400
[    3.558410] pci 0000:00:11.0: PME# supported from D0 D3hot D3cold
[    3.559592] pci 0000:00:16.0: [8086:1d3a] type 00 class 0x078000
[    3.560300] pci 0000:00:16.0: reg 0x10: [mem 0xf332c000-0xf332c00f 64bit]
[    3.561367] pci 0000:00:16.0: PME# supported from D0 D3hot D3cold
[    3.562423] pci 0000:00:16.3: [8086:1d3d] type 00 class 0x070002
[    3.563293] pci 0000:00:16.3: reg 0x10: [io  0xf0a0-0xf0a7]
[    3.564279] pci 0000:00:16.3: reg 0x14: [mem 0xf332a000-0xf332afff]
[    3.565486] pci 0000:00:19.0: [8086:1502] type 00 class 0x020000
[    3.566287] pci 0000:00:19.0: reg 0x10: [mem 0xf3300000-0xf331ffff]
[    3.567280] pci 0000:00:19.0: reg 0x14: [mem 0xf3329000-0xf3329fff]
[    3.568280] pci 0000:00:19.0: reg 0x18: [io  0xf040-0xf05f]
[    3.569344] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[    3.570532] pci 0000:00:1a.0: [8086:1d2d] type 00 class 0x0c0320
[    3.571295] pci 0000:00:1a.0: reg 0x10: [mem 0xf3328000-0xf33283ff]
[    3.572371] pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold
[    3.573569] pci 0000:00:1b.0: [8086:1d20] type 00 class 0x040300
[    3.574298] pci 0000:00:1b.0: reg 0x10: [mem 0xf3320000-0xf3323fff 64bit]
[    3.575381] pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
[    3.576621] pci 0000:00:1c.0: [8086:1d16] type 01 class 0x060400
[    3.577387] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
[    3.578560] pci 0000:00:1c.2: [8086:1d14] type 01 class 0x060400
[    3.579387] pci 0000:00:1c.2: PME# supported from D0 D3hot D3cold
[    3.580565] pci 0000:00:1d.0: [8086:1d26] type 00 class 0x0c0320
[    3.581295] pci 0000:00:1d.0: reg 0x10: [mem 0xf3327000-0xf33273ff]
[    3.582378] pci 0000:00:1d.0: PME# supported from D0 D3hot D3cold
[    3.583567] pci 0000:00:1e.0: [8086:244e] type 01 class 0x060401
[    3.584597] pci 0000:00:1f.0: [8086:1d41] type 00 class 0x060100
[    3.585702] pci 0000:00:1f.2: [8086:1d02] type 00 class 0x010601
[    3.586292] pci 0000:00:1f.2: reg 0x10: [io  0xf090-0xf097]
[    3.587279] pci 0000:00:1f.2: reg 0x14: [io  0xf080-0xf083]
[    3.588282] pci 0000:00:1f.2: reg 0x18: [io  0xf070-0xf077]
[    3.589279] pci 0000:00:1f.2: reg 0x1c: [io  0xf060-0xf063]
[    3.590279] pci 0000:00:1f.2: reg 0x20: [io  0xf020-0xf03f]
[    3.591283] pci 0000:00:1f.2: reg 0x24: [mem 0xf3326000-0xf33267ff]
[    3.592323] pci 0000:00:1f.2: PME# supported from D3hot
[    3.593521] pci 0000:00:1f.3: [8086:1d22] type 00 class 0x0c0500
[    3.594295] pci 0000:00:1f.3: reg 0x10: [mem 0xf3325000-0xf33250ff 64bit]
[    3.595288] pci 0000:00:1f.3: reg 0x20: [io  0xf000-0xf01f]
[    3.596568] pci 0000:00:01.0: PCI bridge to [bus 01]
[    3.597374] pci 0000:00:01.1: PCI bridge to [bus 02]
[    3.598375] pci 0000:03:00.0: [10de:10d8] type 00 class 0x030000
[    3.599283] pci 0000:03:00.0: reg 0x10: [mem 0xf2000000-0xf2ffffff]
[    3.600281] pci 0000:03:00.0: reg 0x14: [mem 0xf4000000-0xf7ffffff 64bit pref]
[    3.601281] pci 0000:03:00.0: reg 0x1c: [mem 0xf8000000-0xf9ffffff 64bit pref]
[    3.602283] pci 0000:03:00.0: reg 0x24: [io  0xe000-0xe07f]
[    3.603282] pci 0000:03:00.0: reg 0x30: [mem 0xf3000000-0xf307ffff pref]
[    3.604288] pci 0000:03:00.0: enabling Extended Tags
[    3.605310] pci 0000:03:00.0: BAR 3: assigned to efifb
[    3.606290] pci 0000:03:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    3.607460] pci 0000:03:00.1: [10de:0be3] type 00 class 0x040300
[    3.608281] pci 0000:03:00.1: reg 0x10: [mem 0xf3080000-0xf3083fff]
[    3.609300] pci 0000:03:00.1: enabling Extended Tags
[    3.610484] pci 0000:00:02.0: PCI bridge to [bus 03]
[    3.611281] pci 0000:00:02.0:   bridge window [io  0xe000-0xefff]
[    3.612274] pci 0000:00:02.0:   bridge window [mem 0xf2000000-0xf30fffff]
[    3.613276] pci 0000:00:02.0:   bridge window [mem 0xf4000000-0xf9ffffff 64bit pref]
[    3.614357] pci 0000:00:03.0: PCI bridge to [bus 04]
[    3.615394] pci 0000:05:00.0: [8086:1d6b] type 00 class 0x010700
[    3.616296] pci 0000:05:00.0: reg 0x10: [mem 0xfa800000-0xfa803fff 64bit pref]
[    3.617287] pci 0000:05:00.0: reg 0x18: [mem 0xfa400000-0xfa7fffff 64bit pref]
[    3.618282] pci 0000:05:00.0: reg 0x20: [io  0xd000-0xd0ff]
[    3.619304] pci 0000:05:00.0: enabling Extended Tags
[    3.620400] pci 0000:05:00.0: reg 0x164: [mem 0x00000000-0x00003fff 64bit pref]
[    3.621273] pci 0000:05:00.0: VF(n) BAR0 space: [mem 0x00000000-0x0007bfff 64bit pref] (contains BAR0 for 31 VFs)
[    3.622629] pci 0000:00:11.0: PCI bridge to [bus 05]
[    3.623276] pci 0000:00:11.0:   bridge window [io  0xd000-0xdfff]
[    3.624275] pci 0000:00:11.0:   bridge window [mem 0xf3200000-0xf32fffff]
[    3.625283] pci 0000:00:11.0:   bridge window [mem 0xfa400000-0xfa8fffff 64bit pref]
[    3.626365] pci 0000:00:1c.0: PCI bridge to [bus 06]
[    3.627396] pci 0000:07:00.0: [1033:0194] type 00 class 0x0c0330
[    3.628301] pci 0000:07:00.0: reg 0x10: [mem 0xf3100000-0xf3101fff 64bit]
[    3.629439] pci 0000:07:00.0: PME# supported from D0 D3hot D3cold
[    3.630630] pci 0000:00:1c.2: PCI bridge to [bus 07]
[    3.631284] pci 0000:00:1c.2:   bridge window [mem 0xf3100000-0xf31fffff]
[    3.632304] pci_bus 0000:08: extended config space not accessible
[    3.633369] pci 0000:00:1e.0: PCI bridge to [bus 08] (subtractive decode)
[    3.634288] pci 0000:00:1e.0:   bridge window [io  0x0000-0x03af window] (subtractive decode)
[    3.635273] pci 0000:00:1e.0:   bridge window [io  0x03e0-0x0cf7 window] (subtractive decode)
[    3.636279] pci 0000:00:1e.0:   bridge window [io  0x03b0-0x03df window] (subtractive decode)
[    3.637273] pci 0000:00:1e.0:   bridge window [io  0x0d00-0xffff window] (subtractive decode)
[    3.638280] pci 0000:00:1e.0:   bridge window [mem 0x000a0000-0x000dffff window] (subtractive decode)
[    3.639278] pci 0000:00:1e.0:   bridge window [mem 0xb0000000-0xfbffffff window] (subtractive decode)
[    3.641847] ACPI: PCI: Interrupt link LNKA configured for IRQ 11
[    3.642428] ACPI: PCI: Interrupt link LNKB configured for IRQ 10
[    3.643427] ACPI: PCI: Interrupt link LNKC configured for IRQ 5
[    3.644416] ACPI: PCI: Interrupt link LNKD configured for IRQ 10
[    3.645416] ACPI: PCI: Interrupt link LNKE configured for IRQ 3
[    3.646415] ACPI: PCI: Interrupt link LNKF configured for IRQ 0
[    3.647280] ACPI: PCI: Interrupt link LNKF disabled
[    3.648415] ACPI: PCI: Interrupt link LNKG configured for IRQ 11
[    3.649433] ACPI: PCI: Interrupt link LNKH configured for IRQ 0
[    3.650273] ACPI: PCI: Interrupt link LNKH disabled
[    3.651665] ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 20-ff])
[    3.652276] acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[    3.653490] acpi PNP0A08:01: _OSC: platform does not support [AER PCIeCapability LTR]
[    3.654475] acpi PNP0A08:01: _OSC: not requesting control; platform does not support [PCIeCapability]
[    3.655273] acpi PNP0A08:01: _OSC: OS requested [PME AER PCIeCapability LTR]
[    3.656272] acpi PNP0A08:01: _OSC: platform willing to grant [PME]
[    3.657272] acpi PNP0A08:01: _OSC: platform retains control of PCIe features (AE_SUPPORT)
[    3.658290] acpi PNP0A08:01: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge
[    3.659754] PCI host bridge to bus 0000:20
[    3.660274] pci_bus 0000:20: root bus resource [io  0x03b0-0x03df window]
[    3.661273] pci_bus 0000:20: root bus resource [mem 0x000a0000-0x000bffff window]
[    3.662273] pci_bus 0000:20: root bus resource [bus 20-ff]
[    3.663806] iommu: Default domain type: Translated 
[    3.664273] iommu: DMA domain TLB invalidation policy: lazy mode 
[    3.665578] SCSI subsystem initialized
[    3.666344] libata version 3.00 loaded.
[    3.667293] ACPI: bus type USB registered
[    3.668322] usbcore: registered new interface driver usbfs
[    3.669311] usbcore: registered new interface driver hub
[    3.670327] usbcore: registered new device driver usb
[    3.671325] pps_core: LinuxPPS API ver. 1 registered
[    3.672272] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    3.673290] PTP clock support registered
[    3.675286] efivars: Registered efivars operations
[    3.676804] PCI: Using ACPI for IRQ routing
[    3.679485] PCI: Discovered peer bus 3f
[    3.680280] PCI: root bus 3f: using default resources
[    3.681273] PCI: Probing PCI hardware (bus 3f)
[    3.682350] PCI host bridge to bus 0000:3f
[    3.683273] pci_bus 0000:3f: root bus resource [io  0x0000-0xffff]
[    3.684273] pci_bus 0000:3f: root bus resource [mem 0x00000000-0x3fffffffffff]
[    3.685287] pci_bus 0000:3f: No busn resource found for root bus, will use [bus 3f-ff]
[    3.686280] pci_bus 0000:3f: busn_res: can not insert [bus 3f-ff] under domain [bus 00-ff] (conflicts with (null) [bus 20-ff])
[    3.687305] pci 0000:3f:08.0: [8086:3c80] type 00 class 0x088000
[    3.688417] pci 0000:3f:08.3: [8086:3c83] type 00 class 0x088000
[    3.689415] pci 0000:3f:08.4: [8086:3c84] type 00 class 0x088000
[    3.690437] pci 0000:3f:09.0: [8086:3c90] type 00 class 0x088000
[    3.691403] pci 0000:3f:09.3: [8086:3c93] type 00 class 0x088000
[    3.692412] pci 0000:3f:09.4: [8086:3c94] type 00 class 0x088000
[    3.693418] pci 0000:3f:0a.0: [8086:3cc0] type 00 class 0x088000
[    3.694377] pci 0000:3f:0a.1: [8086:3cc1] type 00 class 0x088000
[    3.695382] pci 0000:3f:0a.2: [8086:3cc2] type 00 class 0x088000
[    3.696376] pci 0000:3f:0a.3: [8086:3cd0] type 00 class 0x088000
[    3.697386] pci 0000:3f:0b.0: [8086:3ce0] type 00 class 0x088000
[    3.698383] pci 0000:3f:0b.3: [8086:3ce3] type 00 class 0x088000
[    3.699376] pci 0000:3f:0c.0: [8086:3ce8] type 00 class 0x088000
[    3.700374] pci 0000:3f:0c.1: [8086:3ce8] type 00 class 0x088000
[    3.701381] pci 0000:3f:0c.6: [8086:3cf4] type 00 class 0x088000
[    3.702376] pci 0000:3f:0c.7: [8086:3cf6] type 00 class 0x088000
[    3.703374] pci 0000:3f:0d.0: [8086:3ce8] type 00 class 0x088000
[    3.704385] pci 0000:3f:0d.1: [8086:3ce8] type 00 class 0x088000
[    3.705383] pci 0000:3f:0d.6: [8086:3cf5] type 00 class 0x088000
[    3.706382] pci 0000:3f:0e.0: [8086:3ca0] type 00 class 0x088000
[    3.707388] pci 0000:3f:0e.1: [8086:3c46] type 00 class 0x110100
[    3.708398] pci 0000:3f:0f.0: [8086:3ca8] type 00 class 0x088000
[    3.709417] pci 0000:3f:0f.1: [8086:3c71] type 00 class 0x088000
[    3.710410] pci 0000:3f:0f.2: [8086:3caa] type 00 class 0x088000
[    3.711409] pci 0000:3f:0f.3: [8086:3cab] type 00 class 0x088000
[    3.712410] pci 0000:3f:0f.4: [8086:3cac] type 00 class 0x088000
[    3.713409] pci 0000:3f:0f.5: [8086:3cad] type 00 class 0x088000
[    3.714411] pci 0000:3f:0f.6: [8086:3cae] type 00 class 0x088000
[    3.715383] pci 0000:3f:10.0: [8086:3cb0] type 00 class 0x088000
[    3.716410] pci 0000:3f:10.1: [8086:3cb1] type 00 class 0x088000
[    3.717411] pci 0000:3f:10.2: [8086:3cb2] type 00 class 0x088000
[    3.718409] pci 0000:3f:10.3: [8086:3cb3] type 00 class 0x088000
[    3.719414] pci 0000:3f:10.4: [8086:3cb4] type 00 class 0x088000
[    3.720411] pci 0000:3f:10.5: [8086:3cb5] type 00 class 0x088000
[    3.721409] pci 0000:3f:10.6: [8086:3cb6] type 00 class 0x088000
[    3.722411] pci 0000:3f:10.7: [8086:3cb7] type 00 class 0x088000
[    3.723406] pci 0000:3f:11.0: [8086:3cb8] type 00 class 0x088000
[    3.724394] pci 0000:3f:13.0: [8086:3ce4] type 00 class 0x088000
[    3.725382] pci 0000:3f:13.1: [8086:3c43] type 00 class 0x110100
[    3.726393] pci 0000:3f:13.4: [8086:3ce6] type 00 class 0x110100
[    3.727381] pci 0000:3f:13.5: [8086:3c44] type 00 class 0x110100
[    3.728383] pci 0000:3f:13.6: [8086:3c45] type 00 class 0x088000
[    3.729389] pci_bus 0000:3f: busn_res: [bus 3f-ff] end is updated to 3f
[    3.730276] pci_bus 0000:3f: busn_res: can not insert [bus 3f] under domain [bus 00-ff] (conflicts with (null) [bus 20-ff])
[    3.731350] PCI: pci_cache_line_size set to 64 bytes
[    3.732395] e820: reserve RAM buffer [mem 0x18ebb000-0x1bffffff]
[    3.733287] e820: reserve RAM buffer [mem 0x18fe9000-0x1bffffff]
[    3.734272] e820: reserve RAM buffer [mem 0x1dffd000-0x1fffffff]
[    3.735278] e820: reserve RAM buffer [mem 0xac77d000-0xafffffff]
[    3.736287] e820: reserve RAM buffer [mem 0xad800000-0xafffffff]
[    3.737589] pci 0000:03:00.0: vgaarb: setting as boot VGA device
[    3.738270] pci 0000:03:00.0: vgaarb: bridge control possible
[    3.738270] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    3.738301] vgaarb: loaded
[    3.739432] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
[    3.740276] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[    3.743436] clocksource: Switched to clocksource tsc-early
[    3.749474] pnp: PnP ACPI init
[    3.752794] system 00:00: [mem 0xfc000000-0xfcffffff] has been reserved
[    3.759457] system 00:00: [mem 0xfd000000-0xfdffffff] has been reserved
[    3.766110] system 00:00: [mem 0xfe000000-0xfeafffff] has been reserved
[    3.772764] system 00:00: [mem 0xfeb00000-0xfebfffff] has been reserved
[    3.779416] system 00:00: [mem 0xfed00400-0xfed3ffff] could not be reserved
[    3.786420] system 00:00: [mem 0xfed45000-0xfedfffff] has been reserved
[    3.793487] system 00:01: [io  0x0680-0x069f] has been reserved
[    3.799454] system 00:01: [io  0x0800-0x080f] has been reserved
[    3.799467] Callback from call_rcu_tasks() invoked.
[    3.799478] system 00:01: [io  0xffff] has been reserved
[    3.815677] system 00:01: [io  0xffff] has been reserved
[    3.821028] system 00:01: [io  0x0400-0x0453] has been reserved
[    3.826980] system 00:01: [io  0x0458-0x047f] has been reserved
[    3.832934] system 00:01: [io  0x0500-0x057f] has been reserved
[    3.838897] system 00:01: [io  0x164e-0x164f] has been reserved
[    3.845146] system 00:03: [io  0x0454-0x0457] has been reserved
[    3.852337] pnp: PnP ACPI: found 8 devices
[    3.867387] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    3.876377] NET: Registered PF_INET protocol family
[    3.881552] IP idents hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    3.892583] tcp_listen_portaddr_hash hash table entries: 8192 (order: 7, 589824 bytes, linear)
[    3.901469] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    3.909262] TCP established hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    3.917531] TCP bind hash table entries: 65536 (order: 11, 9437184 bytes, vmalloc hugepage)
[    3.928450] TCP: Hash tables configured (established 131072 bind 65536)
[    3.935325] UDP hash table entries: 8192 (order: 8, 1310720 bytes, linear)
[    3.942619] UDP-Lite hash table entries: 8192 (order: 8, 1310720 bytes, linear)
[    3.950426] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    3.956166] pci 0000:00:01.0: PCI bridge to [bus 01]
[    3.961170] pci 0000:00:01.1: PCI bridge to [bus 02]
[    3.966168] pci 0000:00:02.0: PCI bridge to [bus 03]
[    3.971157] pci 0000:00:02.0:   bridge window [io  0xe000-0xefff]
[    3.977286] pci 0000:00:02.0:   bridge window [mem 0xf2000000-0xf30fffff]
[    3.984123] pci 0000:00:02.0:   bridge window [mem 0xf4000000-0xf9ffffff 64bit pref]
[    3.991912] pci 0000:00:03.0: PCI bridge to [bus 04]
[    3.996914] pci 0000:05:00.0: BAR 7: assigned [mem 0xfa804000-0xfa87ffff 64bit pref]
[    4.004701] pci 0000:00:11.0: PCI bridge to [bus 05]
[    4.009696] pci 0000:00:11.0:   bridge window [io  0xd000-0xdfff]
[    4.015824] pci 0000:00:11.0:   bridge window [mem 0xf3200000-0xf32fffff]
[    4.022656] pci 0000:00:11.0:   bridge window [mem 0xfa400000-0xfa8fffff 64bit pref]
[    4.030448] pci 0000:00:1c.0: PCI bridge to [bus 06]
[    4.035468] pci 0000:00:1c.2: PCI bridge to [bus 07]
[    4.040459] pci 0000:00:1c.2:   bridge window [mem 0xf3100000-0xf31fffff]
[    4.047287] pci 0000:00:1e.0: PCI bridge to [bus 08]
[    4.052297] pci_bus 0000:00: resource 4 [io  0x0000-0x03af window]
[    4.058507] pci_bus 0000:00: resource 5 [io  0x03e0-0x0cf7 window]
[    4.064715] pci_bus 0000:00: resource 6 [io  0x03b0-0x03df window]
[    4.070942] pci_bus 0000:00: resource 7 [io  0x0d00-0xffff window]
[    4.077158] pci_bus 0000:00: resource 8 [mem 0x000a0000-0x000dffff window]
[    4.084066] pci_bus 0000:00: resource 9 [mem 0xb0000000-0xfbffffff window]
[    4.090976] pci_bus 0000:03: resource 0 [io  0xe000-0xefff]
[    4.096579] pci_bus 0000:03: resource 1 [mem 0xf2000000-0xf30fffff]
[    4.102879] pci_bus 0000:03: resource 2 [mem 0xf4000000-0xf9ffffff 64bit pref]
[    4.110143] pci_bus 0000:05: resource 0 [io  0xd000-0xdfff]
[    4.115744] pci_bus 0000:05: resource 1 [mem 0xf3200000-0xf32fffff]
[    4.122045] pci_bus 0000:05: resource 2 [mem 0xfa400000-0xfa8fffff 64bit pref]
[    4.129313] pci_bus 0000:07: resource 1 [mem 0xf3100000-0xf31fffff]
[    4.135611] pci_bus 0000:08: resource 4 [io  0x0000-0x03af window]
[    4.141821] pci_bus 0000:08: resource 5 [io  0x03e0-0x0cf7 window]
[    4.148030] pci_bus 0000:08: resource 6 [io  0x03b0-0x03df window]
[    4.154249] pci_bus 0000:08: resource 7 [io  0x0d00-0xffff window]
[    4.160462] pci_bus 0000:08: resource 8 [mem 0x000a0000-0x000dffff window]
[    4.167369] pci_bus 0000:08: resource 9 [mem 0xb0000000-0xfbffffff window]
[    4.174436] pci_bus 0000:20: resource 4 [io  0x03b0-0x03df window]
[    4.180661] pci_bus 0000:20: resource 5 [mem 0x000a0000-0x000bffff window]
[    4.187630] pci_bus 0000:3f: resource 4 [io  0x0000-0xffff]
[    4.193232] pci_bus 0000:3f: resource 5 [mem 0x00000000-0x3fffffffffff]
[    4.199918] pci 0000:00:05.0: disabled boot interrupts on device [8086:3c28]
[    4.208571] pci 0000:03:00.1: extending delay after power-on from D3hot to 20 msec
[    4.216333] pci 0000:03:00.1: D0 power state depends on 0000:03:00.0
[    4.223191] pci 0000:07:00.0: enabling device (0000 -> 0002)
[    4.229117] PCI: CLS 64 bytes, default 64
[    4.233195] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    4.233433] Unpacking initramfs...
[    4.239672] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
[    4.252983] Initialise system trusted keyrings
[    4.257679] workingset: timestamp_bits=56 max_order=22 bucket_order=0
[    4.264459] ntfs: driver 2.1.32 [Flags: R/W].
[    4.268854] fuse: init (API version 7.38)
[    4.273139] 9p: Installing v9fs 9p2000 file system support
[    4.278836] Key type asymmetric registered
[    4.282980] Asymmetric key parser 'x509' registered
[    4.287925] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[    4.316498] ACPI: \_PR_.CP00: Found 4 idle states
[    4.322022] ACPI: \_PR_.CP01: Found 4 idle states
[    4.326883] ACPI: \_PR_.CP02: Found 4 idle states
[    4.331741] ACPI: \_PR_.CP03: Found 4 idle states
[    4.336594] ACPI: \_PR_.CP04: Found 4 idle states
[    4.341447] ACPI: \_PR_.CP05: Found 4 idle states
[    4.346299] ACPI: \_PR_.CP06: Found 4 idle states
[    4.351151] ACPI: \_PR_.CP07: Found 4 idle states
[    4.358021] Freeing initrd memory: 6736K
[    4.500661] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    4.507336] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    4.517352] serial 0000:00:16.3: enabling device (0000 -> 0003)
[    4.525626] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
[    4.534736] Linux agpgart interface v0.103
[    4.539182] ACPI: bus type drm_connector registered
[    4.545067] nouveau 0000:03:00.0: vgaarb: deactivate vga console
[    4.551292] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
[    4.673972] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
[    4.684664] nouveau 0000:03:00.0: fb: 512 MiB DDR3
[    4.889076] tsc: Refined TSC clocksource calibration: 3591.345 MHz
[    4.895354] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
[    4.905501] clocksource: Switched to clocksource tsc
[    4.999816] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
[    5.004901] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
[    5.010322] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
[    5.016157] nouveau 0000:03:00.0: DRM: DCB version 4.0
[    5.021383] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
[    5.027926] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
[    5.034476] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
[    5.041016] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
[    5.047544] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
[    5.054071] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
[    5.060599] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
[    5.066341] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
[    5.077323] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
[    5.084095] stackdepot: allocating hash table of 1048576 entries via kvcalloc
[    5.100189] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
[    5.155486] fbcon: nouveaudrmfb (fb0) is primary device
[    5.299223] Console: switching to colour frame buffer device 210x65
[    5.318062] nouveau 0000:03:00.0: [drm] fb0: nouveaudrmfb frame buffer device
[    5.361998] megasas: 07.725.01.00-rc1
[    5.366287] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[    5.373365] ahci 0000:00:1f.2: version 3.0
[    5.379420] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x3 impl SATA mode
[    5.387923] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst 
[    5.404110] scsi host0: ahci
[    5.408666] scsi host1: ahci
[    5.413155] scsi host2: ahci
[    5.417474] scsi host3: ahci
[    5.421656] scsi host4: ahci
[    5.425773] scsi host5: ahci
[    5.429388] ata1: SATA max UDMA/133 abar m2048@0xf3326000 port 0xf3326100 irq 32
[    5.437202] ata2: SATA max UDMA/133 abar m2048@0xf3326000 port 0xf3326180 irq 32
[    5.445018] ata3: DUMMY
[    5.447726] ata4: DUMMY
[    5.450467] ata5: DUMMY
[    5.453194] ata6: DUMMY
[    5.456324] e1000e: Intel(R) PRO/1000 Network Driver
[    5.461640] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    5.469230] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    5.563007] e1000e 0000:00:19.0 0000:00:19.0 (uninitialized): registered PHC clock
[    5.656045] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 90:b1:1c:7b:da:e7
[    5.664463] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    5.671760] e1000e 0000:00:19.0 eth0: MAC: 10, PHY: 11, PBA No: 7041FF-0FF
[    5.679792] xhci_hcd 0000:07:00.0: xHCI Host Controller
[    5.685750] xhci_hcd 0000:07:00.0: new USB bus registered, assigned bus number 1
[    5.693916] xhci_hcd 0000:07:00.0: hcc params 0x014042cb hci version 0x96 quirks 0x0000000000000004
[    5.704373] xhci_hcd 0000:07:00.0: xHCI Host Controller
[    5.704410] ehci-pci 0000:00:1a.0: EHCI Host Controller
[    5.709698] xhci_hcd 0000:07:00.0: new USB bus registered, assigned bus number 2
[    5.709775] ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 3
[    5.715032] xhci_hcd 0000:07:00.0: Host supports USB 3.0 SuperSpeed
[    5.715758] ehci-pci 0000:00:1a.0: debug port 2
[    5.723137] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.04
[    5.734874] ehci-pci 0000:00:1a.0: irq 16, io mem 0xf3328000
[    5.736537] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    5.744079] ehci-pci 0000:00:1a.0: USB 2.0 started, EHCI 1.00
[    5.749532] usb usb1: Product: xHCI Host Controller
[    5.766884] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    5.769664] usb usb1: Manufacturer: Linux 6.4.0-rc1+ xhci-hcd
[    5.769674] usb usb1: SerialNumber: 0000:07:00.0
[    5.771453] hub 1-0:1.0: USB hub found
[    5.776217] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    5.777062] hub 1-0:1.0: 2 ports detected
[    5.778266] ata1.00: ATA-8: ST2000DM001-1CH164, CC24, max UDMA/133
[    5.779735] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    5.785413] ata2.00: ATAPI: PLDS DVD+/-RW DS-8A9SH, ED11, max UDMA/100
[    5.787281] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.04
[    5.792950] ata1.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    5.794080] ata1.00: configured for UDMA/133
[    5.794996] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    5.800420] scsi 0:0:0:0: Direct-Access     ATA      ST2000DM001-1CH1 CC24 PQ: 0 ANSI: 5
[    5.802303] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    5.802517] sd 0:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    5.802524] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    5.802565] sd 0:0:0:0: [sda] Write Protect is off
[    5.802585] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    5.802639] ata2.00: configured for UDMA/100
[    5.802735] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.802909] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
[    5.804515] usb usb2: Product: xHCI Host Controller
[    5.810877] scsi 1:0:0:0: CD-ROM            PLDS     DVD+-RW DS-8A9SH ED11 PQ: 0 ANSI: 5
[    5.813110] usb usb2: Manufacturer: Linux 6.4.0-rc1+ xhci-hcd
[    5.813118] usb usb2: SerialNumber: 0000:07:00.0
[    5.814218] hub 2-0:1.0: USB hub found
[    5.930498]  sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15 >
[    5.937639] hub 2-0:1.0: 2 ports detected
[    5.947027] sd 0:0:0:0: [sda] Attached SCSI disk
[    5.952925] usbcore: registered new interface driver usb-storage
[    5.953320] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.04
[    5.953346] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    5.953358] usb usb3: Product: EHCI Host Controller
[    5.953368] usb usb3: Manufacturer: Linux 6.4.0-rc1+ ehci_hcd
[    5.953377] usb usb3: SerialNumber: 0000:00:1a.0
[    5.955657] hub 3-0:1.0: USB hub found
[    5.963290] usbcore: registered new interface driver usbserial_generic
[    5.966351] hub 3-0:1.0: 3 ports detected
[    5.968987] ehci-pci 0000:00:1d.0: EHCI Host Controller
[    5.972243] usbserial: USB Serial support registered for generic
[    5.975991] ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 4
[    5.978016] i8042: PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[    5.978781] ehci-pci 0000:00:1d.0: debug port 2
[    5.979252] i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
[    5.979563] i8042: Warning: Keylock active
[    5.984165] ehci-pci 0000:00:1d.0: irq 17, io mem 0xf3327000
[    5.984694] serio: i8042 KBD port at 0x60,0x64 irq 1
[    5.991990] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
[    5.996780] mousedev: PS/2 mouse device common for all mice
[    5.997064] usb usb4: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.04
[    5.997764] usbcore: registered new interface driver synaptics_usb
[    5.998371] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    5.999033] input: PC Speaker as /devices/platform/pcspkr/input/input1
[    5.999865] usb usb4: Product: EHCI Host Controller
[    6.000732] rtc_cmos 00:02: RTC can wake from S4
[    6.001380] usb usb4: Manufacturer: Linux 6.4.0-rc1+ ehci_hcd
[    6.002438] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[    6.002682] rtc_cmos 00:02: registered as rtc0
[    6.002754] rtc_cmos 00:02: setting system clock to 2023-05-31T08:13:04 UTC (1685520784)
[    6.002869] rtc_cmos 00:02: alarms up to one year, y3k, 242 bytes nvram, hpet irqs
[    6.002890] usb usb4: SerialNumber: 0000:00:1d.0
[    6.002897] fail to initialize ptp_kvm
[    6.003504] cdrom: Uniform CD-ROM driver Revision: 3.20
[    6.005377] hub 4-0:1.0: USB hub found
[    6.009843] intel_pstate: Intel P-state driver initializing
[    6.020196] hub 4-0:1.0: 3 ports detected
[    6.038068] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    6.042372] hid: raw HID events driver (C) Jiri Kosina
[    6.256352] NET: Registered PF_INET6 protocol family
[    6.263200] Segment Routing with IPv6
[    6.268012] In-situ OAM (IOAM) with IPv6
[    6.268465] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    6.268573] mip6: Mobile IPv6
[    6.269349] sr 1:0:0:0: Attached scsi generic sg1 type 5
[    6.269870] NET: Registered PF_PACKET protocol family
[    6.282986] usb 4-1: new high-speed USB device number 2 using ehci-pci
[    6.288314] 9pnet: Installing 9P2000 support
[    6.309283] microcode: Microcode Update Driver: v2.2.
[    6.309291] IPI shorthand broadcast: enabled
[    6.327011] sched_clock: Marking stable (4579003425, 1747946202)->(7838879238, -1511929611)
[    6.337407] registered taskstats version 1
[    6.342047] Loading compiled-in X.509 certificates
[    6.368624] printk: console [netcon0] enabled
[    6.373971] netconsole: network logging started
[    6.375966] usb 3-1: new high-speed USB device number 2 using ehci-pci
[    6.380413] clk: Disabling unused clocks
[    6.395067] Freeing unused decrypted memory: 2036K
[    6.401430] Freeing unused kernel image (initmem) memory: 3044K
[    6.414063] Write protecting the kernel read-only data: 20480k
[    6.417477] usb 4-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00
[    6.429238] usb 4-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    6.437113] Freeing unused kernel image (rodata/data gap) memory: 432K
[    6.442848] hub 4-1:1.0: USB hub found
[    6.448521] Run /init as init process
[    6.452699]   with arguments:
[    6.456182]     /init
[    6.458958]   with environment:
[    6.460973] hub 4-1:1.0: 8 ports detected
[    6.462586]     HOME=/
[    6.470011]     TERM=linux
[    6.473991]     BOOT_IMAGE=/boot/vmlinuz-6.4.0-rc1+
[    6.520754] usb 3-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00
[    6.529566] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    6.543836] hub 3-1:1.0: USB hub found
[    6.551163] hub 3-1:1.0: 6 ports detected
[    7.716407] random: crng init done
[    8.320622] process '/usr/bin/fstype' started with executable stack
[    8.678656] EXT4-fs (sda7): mounted filesystem 6aef0462-c7e4-45ca-9a68-4e435300595e with ordered data mode. Quota mode: disabled.
[   10.589450] acpi-cpufreq: probe of acpi-cpufreq failed with error -17
[   10.591550] i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
[   10.592674] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2
[   10.593618] ACPI: button: Power Button [PWRB]
[   10.593840] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[   10.599604] ACPI: button: Power Button [PWRF]
[   10.605121] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[   10.647504] i2c i2c-14: 4/4 memory slots populated (from DMI)
[   10.684712] iTCO_vendor_support: vendor-support=0
[   10.715708] RAPL PMU: API unit is 2^-32 Joules, 2 fixed counters, 163840 ms ovfl timer
[   10.724357] RAPL PMU: hw unit of domain pp0-core 2^-16 Joules
[   10.730682] RAPL PMU: hw unit of domain package 2^-16 Joules
[   10.731713] iTCO_wdt iTCO_wdt.1.auto: Found a Patsburg TCO device (Version=2, TCOBASE=0x0460)
[   10.747468] iTCO_wdt iTCO_wdt.1.auto: initialized. heartbeat=30 sec (nowayout=0)
[   10.866628] cryptd: max_cpu_qlen set to 1000
[   10.970050] AVX version of gcm_enc/dec engaged.
[   10.975735] AES CTR mode by8 optimization enabled
[   12.063715] EDAC MC: Ver: 3.0.0
[   12.069807] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[   12.129012] EDAC DEBUG: sbridge_init: 
[   12.134337] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[   12.142034] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
[   12.149709] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[   12.156866] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[   12.163530] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
[   12.171528] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[   12.178310] EDAC sbridge: Seeking for: PCI ID 8086:3c71
[   12.185032] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3c71
[   12.191963] EDAC sbridge: Seeking for: PCI ID 8086:3c71
[   12.197977] EDAC sbridge: Seeking for: PCI ID 8086:3caa
[   12.203914] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3caa
[   12.210999] EDAC sbridge: Seeking for: PCI ID 8086:3caa
[   12.218000] EDAC sbridge: Seeking for: PCI ID 8086:3cab
[   12.224070] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3cab
[   12.230697] EDAC sbridge: Seeking for: PCI ID 8086:3cab
[   12.233955] raid6: sse2x4   gen() 14825 MB/s
[   12.236475] EDAC sbridge: Seeking for: PCI ID 8086:3cac
[   12.247017] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3cac
[   12.253817] EDAC sbridge: Seeking for: PCI ID 8086:3cac
[   12.253955] raid6: sse2x2   gen() 17600 MB/s
[   12.254629] EDAC sbridge: Seeking for: PCI ID 8086:3cad
[   12.270653] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3cad
[   12.271953] raid6: sse2x1   gen() 13406 MB/s
[   12.277273] EDAC sbridge: Seeking for: PCI ID 8086:3cad
[   12.277736] raid6: using algorithm sse2x2 gen() 17600 MB/s
[   12.278521] EDAC sbridge: Seeking for: PCI ID 8086:3cb8
[   12.295956] raid6: .... xor() 9295 MB/s, rmw enabled
[   12.300040] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3cb8
[   12.300970] raid6: using ssse3x2 recovery algorithm
[   12.301022] EDAC sbridge: Seeking for: PCI ID 8086:3cb8
[   12.323539] EDAC sbridge: Seeking for: PCI ID 8086:3cf4
[   12.329993] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3cf4
[   12.336849] EDAC sbridge: Seeking for: PCI ID 8086:3cf4
[   12.342617] EDAC sbridge: Seeking for: PCI ID 8086:3cf6
[   12.348749] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3cf6
[   12.355352] EDAC sbridge: Seeking for: PCI ID 8086:3cf6
[   12.361335] EDAC sbridge: Seeking for: PCI ID 8086:3cf5
[   12.367035] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3cf5
[   12.373588] EDAC sbridge: Seeking for: PCI ID 8086:3cf5
[   12.379462] EDAC DEBUG: sbridge_probe: Registering MC#0 (1 of 1)
[   12.386249] EDAC DEBUG: sbridge_register_mci: MC: mci = 000000009a9d559b, dev = 00000000fe166939
[   12.396975] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3ca0, bus 63 with dev = 0000000083b9c14f
[   12.407225] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3ca8, bus 63 with dev = 000000007daf49c8
[   12.417462] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3c71, bus 63 with dev = 0000000095b74c5a
[   12.427993] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3caa, bus 63 with dev = 00000000ed519f97
[   12.428196] xor: automatically using best checksumming function   avx       
[   12.428973] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3cab, bus 63 with dev = 000000001a05e474
[   12.456556] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3cac, bus 63 with dev = 0000000052d1f1e6
[   12.466979] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3cad, bus 63 with dev = 00000000a771d86a
[   12.477455] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3cb8, bus 63 with dev = 0000000045fb76d7
[   12.487676] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3cf4, bus 63 with dev = 00000000f39a9d0f
[   12.498143] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3cf6, bus 63 with dev = 00000000046233b4
[   12.508373] EDAC DEBUG: sbridge_mci_bind_devs: Associated PCI 8086:3cf5, bus 63 with dev = 00000000cc484fec
[   12.518869] EDAC DEBUG: get_dimm_config: mc#0: Node ID: 0, source ID: 0
[   12.525991] EDAC DEBUG: get_dimm_config: Memory mirroring is disabled
[   12.532939] EDAC DEBUG: get_dimm_config: Lockstep is disabled
[   12.539965] EDAC DEBUG: get_dimm_config: address map is on open page mode
[   12.547182] EDAC DEBUG: __populate_dimms: Memory is registered
[   12.553568] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 0, dimm 0, 4096 MiB (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[   12.566753] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 1, dimm 0, 4096 MiB (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[   12.581000] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 2, dimm 0, 4096 MiB (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[   12.593896] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 3, dimm 0, 4096 MiB (1048576 pages) bank: 8, rank: 2, row: 0x8000, col: 0x400
[   12.606793] EDAC DEBUG: get_memory_layout: TOLM: 2.812 GB (0x00000000b3ffffff)
[   12.614478] EDAC DEBUG: get_memory_layout: TOHM: 17.312 GB (0x0000000453ffffff)
[   12.622828] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 17.250 GB (0x0000000450000000) Interleave: [8:6] reg=0x000044c3
[   12.634683] EDAC DEBUG: get_memory_layout: SAD#0, interleave #0: 0
[   12.641935] EDAC DEBUG: get_memory_layout: TAD#0: up to 2.750 GB (0x00000000b0000000), socket interleave 1, memory interleave 4, TGT: 0, 1, 2, 3, reg=0x0002b3e4
[   12.656832] EDAC DEBUG: get_memory_layout: TAD#1: up to 17.250 GB (0x0000000450000000), socket interleave 1, memory interleave 4, TGT: 0, 1, 2, 3, reg=0x001133e4
[   12.671799] EDAC DEBUG: get_memory_layout: TAD CH#0, offset #0: 0.000 GB (0x0000000000000000), reg=0x00000000
[   12.682510] EDAC DEBUG: get_memory_layout: TAD CH#0, offset #1: 1.250 GB (0x0000000050000000), reg=0x00000500
[   12.692916] EDAC DEBUG: get_memory_layout: TAD CH#1, offset #0: 0.000 GB (0x0000000000000000), reg=0x00000000
[   12.703355] EDAC DEBUG: get_memory_layout: TAD CH#1, offset #1: 1.250 GB (0x0000000050000000), reg=0x00000500
[   12.713775] EDAC DEBUG: get_memory_layout: TAD CH#2, offset #0: 0.000 GB (0x0000000000000000), reg=0x00000000
[   12.724532] EDAC DEBUG: get_memory_layout: TAD CH#2, offset #1: 1.250 GB (0x0000000050000000), reg=0x00000500
[   12.735918] EDAC DEBUG: get_memory_layout: TAD CH#3, offset #0: 0.000 GB (0x0000000000000000), reg=0x00000000
[   12.746340] EDAC DEBUG: get_memory_layout: TAD CH#3, offset #1: 1.250 GB (0x0000000050000000), reg=0x00000500
[   12.756756] EDAC DEBUG: get_memory_layout: CH#0 RIR#0, limit: 3.999 GB (0x00000000fff00000), way: 2, reg=0x9000000e
[   12.767958] EDAC DEBUG: get_memory_layout: CH#0 RIR#0 INTL#0, offset 0.000 GB (0x0000000000000000), tgt: 0, reg=0x00000000
[   12.780009] EDAC DEBUG: get_memory_layout: CH#0 RIR#0 INTL#1, offset 0.000 GB (0x0000000000000000), tgt: 1, reg=0x00010000
[   12.791549] EDAC DEBUG: get_memory_layout: CH#1 RIR#0, limit: 3.999 GB (0x00000000fff00000), way: 2, reg=0x9000000e
[   12.802491] EDAC DEBUG: get_memory_layout: CH#1 RIR#0 INTL#0, offset 0.000 GB (0x0000000000000000), tgt: 0, reg=0x00000000
[   12.814420] EDAC DEBUG: get_memory_layout: CH#1 RIR#0 INTL#1, offset 0.000 GB (0x0000000000000000), tgt: 1, reg=0x00010000
[   12.826866] EDAC DEBUG: get_memory_layout: CH#2 RIR#0, limit: 3.999 GB (0x00000000fff00000), way: 2, reg=0x9000000e
[   12.837904] EDAC DEBUG: get_memory_layout: CH#2 RIR#0 INTL#0, offset 0.000 GB (0x0000000000000000), tgt: 0, reg=0x00000000
[   12.849484] EDAC DEBUG: get_memory_layout: CH#2 RIR#0 INTL#1, offset 0.000 GB (0x0000000000000000), tgt: 1, reg=0x00010000
[   12.861348] EDAC DEBUG: get_memory_layout: CH#3 RIR#0, limit: 3.999 GB (0x00000000fff00000), way: 2, reg=0x9000000e
[   12.872320] EDAC DEBUG: get_memory_layout: CH#3 RIR#0 INTL#0, offset 0.000 GB (0x0000000000000000), tgt: 0, reg=0x00000000
[   12.883887] EDAC DEBUG: get_memory_layout: CH#3 RIR#0 INTL#1, offset 0.000 GB (0x0000000000000000), tgt: 1, reg=0x00010000
[   12.895731] EDAC DEBUG: edac_mc_add_mc_with_groups: 
[   12.901723] EDAC DEBUG: edac_create_sysfs_mci_device: device mc0 created
[   12.910049] EDAC DEBUG: edac_create_dimm_object: device dimm0 created at location channel 0 slot 0 
[   12.919929] EDAC DEBUG: edac_create_dimm_object: device dimm3 created at location channel 1 slot 0 
[   12.931048] EDAC DEBUG: edac_create_dimm_object: device dimm6 created at location channel 2 slot 0 
[   12.940648] EDAC DEBUG: edac_create_dimm_object: device dimm9 created at location channel 3 slot 0 
[   12.952077] EDAC DEBUG: edac_create_csrow_object: device csrow0 created
[   12.959596] EDAC MC0: Giving out device to module sb_edac controller Sandy Bridge SrcID#0_Ha#0: DEV 0000:3f:0e.0 (INTERRUPT)
[   12.972975] EDAC sbridge:  Ver: 1.1.2 
[   14.009312] Adding 33554428k swap on /dev/sda5.  Priority:-2 extents:1 across:33554428k 
[   14.123544] EXT4-fs (sda7): re-mounted 6aef0462-c7e4-45ca-9a68-4e435300595e. Quota mode: disabled.
[   14.842033] e1000e 0000:00:19.0 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   14.855626] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   15.284697] EXT4-fs (sda6): mounting ext3 file system using the ext4 subsystem
[   15.571525] EXT4-fs (sda6): mounted filesystem 6b02369b-7362-4920-b703-8ba36125139f with ordered data mode. Quota mode: disabled.
[   15.650183] FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   15.679894] BTRFS info (device sda9): using crc32c (crc32c-intel) checksum algorithm
[   15.689918] BTRFS info (device sda9): disk space caching is enabled
[   16.222303] SGI XFS with ACLs, security attributes, quota, no debug enabled
[   16.260915] XFS (sda10): Deprecated V4 format (crc=0) will not be supported after September 2030.
[   16.281460] XFS (sda10): Mounting V4 Filesystem b62c870e-d204-498e-999b-5a0ea7c560cd
[   16.415636] XFS (sda10): Ending clean mount
[   16.423369] xfs filesystem being mounted at /mnt/kernel supports timestamps until 2038-01-19 (0x7fffffff)
[   16.623738] EXT4-fs (sda11): mounted filesystem a1428eb4-29da-4a1f-bbde-e9dc1081fb27 with ordered data mode. Quota mode: disabled.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR
From: Juergen Gross @ 2023-05-31  7:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, linux-hyperv, linux-doc, mikelley,
	Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Boris Ostrovsky, xen-devel, Jonathan Corbet, Andy Lutomirski,
	Peter Zijlstra
In-Reply-To: <20230530152825.GAZHYWGXAp8PHgN/w0@fat_crate.local>


[-- Attachment #1.1.1: Type: text/plain, Size: 1337 bytes --]

On 30.05.23 17:28, Borislav Petkov wrote:
> On Mon, May 22, 2023 at 04:17:50PM +0200, Juergen Gross wrote:
>> The attached diff is for patch 13.
> 
> Merged and pushed out into same branch.
> 
> Next issue. Diffing /proc/mtrr shows:
> 
> --- proc-mtrr.6.3	2023-05-30 17:00:13.215999483 +0200
> +++ proc-mtrr.after	2023-05-30 16:01:38.281997816 +0200
> @@ -1,8 +1,8 @@
>   reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
> -reg01: base=0x080000000 ( 2048MB), size=  512MB, count=1: write-back
> +reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
>   reg02: base=0x0a0000000 ( 2560MB), size=  256MB, count=1: write-back
>   reg03: base=0x0ae000000 ( 2784MB), size=   32MB, count=1: uncachable
> -reg04: base=0x100000000 ( 4096MB), size= 4096MB, count=1: write-back
> +reg04: base=0x100000000 ( 4096MB), size=  256MB, count=1: write-back
>   reg05: base=0x200000000 ( 8192MB), size= 8192MB, count=1: write-back
>   reg06: base=0x400000000 (16384MB), size= 1024MB, count=1: write-back
>   reg07: base=0x440000000 (17408MB), size=  256MB, count=1: write-back
> 

Weird.

Can you please boot the system with the MTRR patches and specify "mtrr=debug"
on the command line? I'd be interested in the raw register values being read
and the resulting memory type map.
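
To make the oddity concrete, here is a minimal sketch in plain userspace
C (not kernel code; the struct and function names are made up for
illustration) that computes where a listed range ends and whether two
ranges overlap. Taking the listing at face value, the old reg01 ends at
2560MB, exactly where reg02 starts, while the new reg01 would end at
3072MB and overlap reg02. Likewise the new reg04 would end at 4352MB
instead of 8192MB, leaving a hole in front of reg05.

```c
#include <stdbool.h>
#include <stdint.h>

/* One variable-range entry as printed in /proc/mtrr, in units of MB. */
struct mtrr_range {
	uint64_t base_mb;
	uint64_t size_mb;
};

/* First MB past the end of the range, i.e. the range is [base, end). */
static uint64_t mtrr_end_mb(const struct mtrr_range *r)
{
	return r->base_mb + r->size_mb;
}

/* True if the two half-open ranges share at least one MB. */
static bool mtrr_overlaps(const struct mtrr_range *a,
			  const struct mtrr_range *b)
{
	return a->base_mb < mtrr_end_mb(b) && b->base_mb < mtrr_end_mb(a);
}
```

Feeding in { 2048, 1024 } for the new reg01 and { 2560, 256 } for reg02
makes mtrr_overlaps() return true; with the old size of 512 it returns
false.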


Juergen

^ permalink raw reply

* [PATCH v4] hv_netvsc: Allocate rx indirection table size dynamically
From: Shradha Gupta @ 2023-05-31  3:14 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv, netdev
  Cc: Shradha Gupta, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Michael Kelley, David S. Miller, Steen Hegelund, Simon Horman

Allocate the rx indirection table dynamically in netvsc, sizing it from
the value returned by the OID_GEN_RECEIVE_SCALE_CAPABILITIES query
instead of using the constant ITAB_NUM. If the reported size is zero or
exceeds ITAB_NUM_MAX, fall back to ITAB_NUM.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Tested-on: Ubuntu22 (azure VM, SKU size: Standard_F72s_v2)
Testcases:
1. ethtool -x eth0 output
2. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic
3. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV

---
Changes in v4:
 * set the right error code if rx table allocation fails
 * fixed unnecessary line break
 * removed extra newline
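
For readers skimming the thread, the sizing rule applied in
rndis_filter_device_add() can be sketched in userspace C as follows.
This is only an illustration under stated assumptions: struct rx_indir
and the three helper names are hypothetical, and calloc()/free() stand
in for kcalloc()/kfree(). Only ITAB_NUM and ITAB_NUM_MAX are taken from
the patch.

```c
#include <stdint.h>
#include <stdlib.h>

#define ITAB_NUM     128	/* default entry count, as in the patch */
#define ITAB_NUM_MAX 256	/* largest reported size that is accepted */

/* Hypothetical stand-in for the two net_device_context fields. */
struct rx_indir {
	uint16_t *table;
	uint32_t size;
};

/* Use the reported size if it is non-zero and in bounds, else the default. */
static uint32_t rx_table_size(uint32_t reported)
{
	if (reported && reported <= ITAB_NUM_MAX)
		return reported;
	return ITAB_NUM;
}

/* Allocate a zeroed table; returns 0 on success, -1 if allocation fails. */
static int rx_indir_alloc(struct rx_indir *ri, uint32_t reported)
{
	ri->size = rx_table_size(reported);
	ri->table = calloc(ri->size, sizeof(*ri->table));
	if (!ri->table) {
		ri->size = 0;
		return -1;
	}
	return 0;
}

/* Free the table and reset the bookkeeping, as the remove path does. */
static void rx_indir_free(struct rx_indir *ri)
{
	free(ri->table);
	ri->table = NULL;
	ri->size = 0;
}
```

Resetting the size together with the pointer keeps loops bounded by the
size from touching a freed table after removal.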
---
 drivers/net/hyperv/hyperv_net.h   |  5 ++++-
 drivers/net/hyperv/netvsc_drv.c   | 10 ++++++----
 drivers/net/hyperv/rndis_filter.c | 27 +++++++++++++++++++++++----
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index dd5919ec408b..c40868f287a9 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -74,6 +74,7 @@ struct ndis_recv_scale_cap { /* NDIS_RECEIVE_SCALE_CAPABILITIES */
 #define NDIS_RSS_HASH_SECRET_KEY_MAX_SIZE_REVISION_2   40
 
 #define ITAB_NUM 128
+#define ITAB_NUM_MAX 256
 
 struct ndis_recv_scale_param { /* NDIS_RECEIVE_SCALE_PARAMETERS */
 	struct ndis_obj_header hdr;
@@ -1034,7 +1035,9 @@ struct net_device_context {
 
 	u32 tx_table[VRSS_SEND_TAB_SIZE];
 
-	u16 rx_table[ITAB_NUM];
+	u16 *rx_table;
+
+	u32 rx_table_sz;
 
 	/* Ethtool settings */
 	u8 duplex;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 0103ff914024..3ba3c8fb28a5 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1747,7 +1747,9 @@ static u32 netvsc_get_rxfh_key_size(struct net_device *dev)
 
 static u32 netvsc_rss_indir_size(struct net_device *dev)
 {
-	return ITAB_NUM;
+	struct net_device_context *ndc = netdev_priv(dev);
+
+	return ndc->rx_table_sz;
 }
 
 static int netvsc_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
@@ -1766,7 +1768,7 @@ static int netvsc_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
 
 	rndis_dev = ndev->extension;
 	if (indir) {
-		for (i = 0; i < ITAB_NUM; i++)
+		for (i = 0; i < ndc->rx_table_sz; i++)
 			indir[i] = ndc->rx_table[i];
 	}
 
@@ -1792,11 +1794,11 @@ static int netvsc_set_rxfh(struct net_device *dev, const u32 *indir,
 
 	rndis_dev = ndev->extension;
 	if (indir) {
-		for (i = 0; i < ITAB_NUM; i++)
+		for (i = 0; i < ndc->rx_table_sz; i++)
 			if (indir[i] >= ndev->num_chn)
 				return -EINVAL;
 
-		for (i = 0; i < ITAB_NUM; i++)
+		for (i = 0; i < ndc->rx_table_sz; i++)
 			ndc->rx_table[i] = indir[i];
 	}
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index eea777ec2541..95869a3c3d6e 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -21,6 +21,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/ucs2_string.h>
 #include <linux/string.h>
+#include <linux/slab.h>
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
@@ -927,7 +928,7 @@ static int rndis_set_rss_param_msg(struct rndis_device *rdev,
 	struct rndis_set_request *set;
 	struct rndis_set_complete *set_complete;
 	u32 extlen = sizeof(struct ndis_recv_scale_param) +
-		     4 * ITAB_NUM + NETVSC_HASH_KEYLEN;
+		     4 * ndc->rx_table_sz + NETVSC_HASH_KEYLEN;
 	struct ndis_recv_scale_param *rssp;
 	u32 *itab;
 	u8 *keyp;
@@ -953,7 +954,7 @@ static int rndis_set_rss_param_msg(struct rndis_device *rdev,
 	rssp->hashinfo = NDIS_HASH_FUNC_TOEPLITZ | NDIS_HASH_IPV4 |
 			 NDIS_HASH_TCP_IPV4 | NDIS_HASH_IPV6 |
 			 NDIS_HASH_TCP_IPV6;
-	rssp->indirect_tabsize = 4*ITAB_NUM;
+	rssp->indirect_tabsize = 4 * ndc->rx_table_sz;
 	rssp->indirect_taboffset = sizeof(struct ndis_recv_scale_param);
 	rssp->hashkey_size = NETVSC_HASH_KEYLEN;
 	rssp->hashkey_offset = rssp->indirect_taboffset +
@@ -961,7 +962,7 @@ static int rndis_set_rss_param_msg(struct rndis_device *rdev,
 
 	/* Set indirection table entries */
 	itab = (u32 *)(rssp + 1);
-	for (i = 0; i < ITAB_NUM; i++)
+	for (i = 0; i < ndc->rx_table_sz; i++)
 		itab[i] = ndc->rx_table[i];
 
 	/* Set hask key values */
@@ -1548,6 +1549,18 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
 	if (ret || rsscap.num_recv_que < 2)
 		goto out;
 
+	if (rsscap.num_indirect_tabent &&
+	    rsscap.num_indirect_tabent <= ITAB_NUM_MAX)
+		ndc->rx_table_sz = rsscap.num_indirect_tabent;
+	else
+		ndc->rx_table_sz = ITAB_NUM;
+
+	ndc->rx_table = kcalloc(ndc->rx_table_sz, sizeof(u16), GFP_KERNEL);
+	if (!ndc->rx_table) {
+		ret = -ENOMEM;
+		goto err_dev_remv;
+	}
+
 	/* This guarantees that num_possible_rss_qs <= num_online_cpus */
 	num_possible_rss_qs = min_t(u32, num_online_cpus(),
 				    rsscap.num_recv_que);
@@ -1558,7 +1571,7 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
 	net_device->num_chn = min(net_device->max_chn, device_info->num_chn);
 
 	if (!netif_is_rxfh_configured(net)) {
-		for (i = 0; i < ITAB_NUM; i++)
+		for (i = 0; i < ndc->rx_table_sz; i++)
 			ndc->rx_table[i] = ethtool_rxfh_indir_default(
 						i, net_device->num_chn);
 	}
@@ -1596,11 +1609,17 @@ void rndis_filter_device_remove(struct hv_device *dev,
 				struct netvsc_device *net_dev)
 {
 	struct rndis_device *rndis_dev = net_dev->extension;
+	struct net_device *net = hv_get_drvdata(dev);
+	struct net_device_context *ndc = netdev_priv(net);
 
 	/* Halt and release the rndis device */
 	rndis_filter_halt_device(net_dev, rndis_dev);
 
 	netvsc_device_remove(dev);
+
+	ndc->rx_table_sz = 0;
+	kfree(ndc->rx_table);
+	ndc->rx_table = NULL;
 }
 
 int rndis_filter_open(struct netvsc_device *nvdev)
-- 
2.34.1


^ permalink raw reply related

* [PATCH RFC net-next v3 8/8] tests: add vsock dgram tests
From: Bobby Eshleman @ 2023-05-31  0:35 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers
  Cc: kvm, virtualization, netdev, linux-kernel, linux-hyperv,
	Bobby Eshleman, Jiang Wang
In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com>

From: Jiang Wang <jiang.wang@bytedance.com>

Add tests for vsock datagram sockets, covering sendto(), connect()
followed by send(), and multiple sending sockets.
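
The helpers in this patch all follow the same one-byte pattern: issue
the call, retry while it fails with EINTR, then check the result. A
minimal sketch of that pattern is shown below. Note the assumptions: it
uses an AF_UNIX SOCK_DGRAM socketpair instead of AF_VSOCK so that it
runs without vsock datagram support, it omits the timeout handling, and
the function names are hypothetical rather than the ones in util.c.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the byte 'A', retrying if a signal interrupts the call. */
static ssize_t send_one_byte(int fd)
{
	const uint8_t byte = 'A';
	ssize_t n;

	do {
		n = send(fd, &byte, sizeof(byte), 0);
	} while (n < 0 && errno == EINTR);

	return n;
}

/* Receive one byte into *byte, retrying if a signal interrupts the call. */
static ssize_t recv_one_byte(int fd, uint8_t *byte)
{
	ssize_t n;

	do {
		n = recv(fd, byte, sizeof(*byte), 0);
	} while (n < 0 && errno == EINTR);

	return n;
}

/* Round trip over a local datagram pair; 0 on success, -1 on failure. */
static int dgram_roundtrip(void)
{
	int fds[2];
	uint8_t byte = 0;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
		return -1;

	if (send_one_byte(fds[0]) == 1 &&
	    recv_one_byte(fds[1], &byte) == 1 && byte == 'A')
		ret = 0;

	close(fds[0]);
	close(fds[1]);
	return ret;
}
```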

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 tools/testing/vsock/util.c       | 105 +++++++++++++++++++++
 tools/testing/vsock/util.h       |   4 +
 tools/testing/vsock/vsock_test.c | 193 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 302 insertions(+)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 01b636d3039a..45e35da48b40 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -260,6 +260,57 @@ void send_byte(int fd, int expected_ret, int flags)
 	}
 }
 
+/* Transmit one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+		 int flags)
+{
+	const uint8_t byte = 'A';
+	ssize_t nwritten;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
+				  len);
+		timeout_check("write");
+	} while (nwritten < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nwritten != -1) {
+			fprintf(stderr, "bogus sendto(2) return value %zd\n",
+				nwritten);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("write");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nwritten < 0) {
+		perror("write");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while sending byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten != sizeof(byte)) {
+		fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Receive one byte and check the return value.
  *
  * expected_ret:
@@ -313,6 +364,60 @@ void recv_byte(int fd, int expected_ret, int flags)
 	}
 }
 
+/* Receive one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+		   int expected_ret, int flags)
+{
+	uint8_t byte;
+	ssize_t nread;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
+		timeout_check("read");
+	} while (nread < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nread != -1) {
+			fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
+				nread);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("read");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nread < 0) {
+		perror("read");
+		exit(EXIT_FAILURE);
+	}
+	if (nread == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while receiving byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nread != sizeof(byte)) {
+		fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
+		exit(EXIT_FAILURE);
+	}
+	if (byte != 'A') {
+		fprintf(stderr, "unexpected byte read %c\n", byte);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Run test cases.  The program terminates if a failure occurs. */
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts)
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index fb99208a95ea..6e5cd610bf05 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -43,7 +43,11 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
 			   struct sockaddr_vm *clientaddrp);
 void vsock_wait_remote_close(int fd);
 void send_byte(int fd, int expected_ret, int flags);
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+		 int flags);
 void recv_byte(int fd, int expected_ret, int flags);
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+		   int expected_ret, int flags);
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts);
 void list_tests(const struct test_case *test_cases);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index ac1bd3ac1533..851c3d65178d 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -202,6 +202,113 @@ static void test_stream_server_close_server(const struct test_opts *opts)
 	close(fd);
 }
 
+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_sendto_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+	int ret;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = connect(fd, &addr.sa, sizeof(addr.svm));
+	if (ret < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	send_byte(fd, 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_server(const struct test_opts *opts)
+{
+	test_dgram_sendto_server(opts);
+}
+
 /* With the standard socket sizes, VMCI is able to support about 100
  * concurrent stream connections.
  */
@@ -255,6 +362,77 @@ static void test_stream_multiconn_server(const struct test_opts *opts)
 		close(fds[i]);
 }
 
+static void test_dgram_multiconn_client(const struct test_opts *opts)
+{
+	int fds[MULTICONN_NFDS];
+	int i;
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++) {
+		fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
+		if (fds[i] < 0) {
+			perror("socket");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		close(fds[i]);
+}
+
+static void test_dgram_multiconn_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+	int i;
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
 static void test_stream_msg_peek_client(const struct test_opts *opts)
 {
 	int fd;
@@ -1128,6 +1306,21 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_virtio_skb_merge_client,
 		.run_server = test_stream_virtio_skb_merge_server,
 	},
+	{
+		.name = "SOCK_DGRAM client sendto",
+		.run_client = test_dgram_sendto_client,
+		.run_server = test_dgram_sendto_server,
+	},
+	{
+		.name = "SOCK_DGRAM client connect",
+		.run_client = test_dgram_connect_client,
+		.run_server = test_dgram_connect_server,
+	},
+	{
+		.name = "SOCK_DGRAM multiple connections",
+		.run_client = test_dgram_multiconn_client,
+		.run_server = test_dgram_multiconn_server,
+	},
 	{},
 };
 

-- 
2.30.2


^ permalink raw reply related

* [PATCH RFC net-next v3 7/8] vsock: Add lockless sendmsg() support
From: Bobby Eshleman @ 2023-05-31  0:35 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers
  Cc: kvm, virtualization, netdev, linux-kernel, linux-hyperv,
	Bobby Eshleman
In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com>

Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock,
it does not scale when many senders share a socket.

Prior to this patch the socket lock is used to protect both reads and
writes to the local_addr, remote_addr, transport, and buffer size
variables of a vsock socket. What follows are the new protection schemes
for these fields that ensure a race-free and usually lock-free
multi-sender sendmsg() path for vsock dgrams.

- local_addr
    local_addr changes as a result of binding a socket. The write path
    for local_addr is bind() and various vsock_auto_bind() call sites.
    After a socket has been bound via vsock_auto_bind() or bind(), subsequent
    calls to bind()/vsock_auto_bind() do not write to local_addr again. bind()
    rejects the user request and vsock_auto_bind() early exits.
    Therefore, local_addr cannot change while a parallel thread is
    in sendmsg() and lock-free reads of local_addr in sendmsg() are safe.
    Change: only acquire lock for auto-binding as-needed in sendmsg().

- buffer size variables
    Not used by dgram, so they do not need protection. No change.

- remote_addr and transport
    A remote_addr update may result in a changed transport, yet we would
    like to read these two fields lock-free and coherently in the vsock
    send path. This patch therefore packages the two fields into a new
    struct vsock_remote_info that is referenced by an RCU-protected pointer.

    Writes are synchronized as usual by the socket lock. Reads only take
    place in RCU read-side critical sections. When remote_addr or transport
    is updated, a new remote info is allocated. Old readers still see the
    old coherent remote_addr/transport pair, and new readers will refer to
    the new coherent pair. The coherency between remote_addr and transport
    previously provided by the socket lock alone is now also preserved by
    RCU, but with a highly scalable lock-free read side.
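
The idea of packaging both fields behind a single pointer can be
illustrated with a small userspace sketch. It is not RCU and not the
code in this patch: it uses C11 atomics, all names are hypothetical, an
int stands in for the transport pointer, and there is no grace-period
handling, so the caller must make sure no reader still holds the old
snapshot before freeing it (the job kfree_rcu() does in the kernel).
Writers are assumed to be serialized externally, as the socket lock
does here.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical analogue of struct vsock_remote_info. */
struct remote_info {
	uint32_t cid;
	uint32_t port;
	int transport_id;	/* stands in for the transport pointer */
};

struct sock_state {
	_Atomic(struct remote_info *) remote;
};

/* Reader: a single acquire load yields one coherent snapshot (or NULL). */
static const struct remote_info *remote_get(struct sock_state *s)
{
	return atomic_load_explicit(&s->remote, memory_order_acquire);
}

/*
 * Writer: build a complete new snapshot, publish it with one exchange and
 * hand the previous snapshot back through *old. Returns 0 on success and
 * -1 if allocation fails, in which case nothing is published.
 */
static int remote_replace(struct sock_state *s, uint32_t cid, uint32_t port,
			  int transport_id, struct remote_info **old)
{
	struct remote_info *new_info = malloc(sizeof(*new_info));

	if (!new_info)
		return -1;

	new_info->cid = cid;
	new_info->port = port;
	new_info->transport_id = transport_id;

	*old = atomic_exchange_explicit(&s->remote, new_info,
					memory_order_acq_rel);
	return 0;
}
```

A reader that fetched the pointer before an update keeps seeing the old
address together with the old transport, never a mix of old and new.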

Helpers are introduced for accessing and updating the new pointer.

The new structure contains an rcu_head so that kfree_rcu() can be
used. This removes the need for writers to call synchronize_rcu() before
freeing old structures, which is more efficient and reduces code
churn where remote_addr/transport are already being updated inside RCU
read-side sections.

Only virtio has been tested, but updates were necessary to the VMCI and
hyperv code. Unfortunately the author does not have access to
VMCI/hyperv systems so those changes are untested.

Perf Tests (results from patch v2)
vCPUS: 16
Threads: 16
Payload: 4KB
Test Runs: 5
Type: SOCK_DGRAM

Before: 245.2 MB/s
After: 509.2 MB/s (+107%)

Notably, on the same test system, vsock dgram even outperforms
multi-threaded UDP over virtio-net with vhost and MQ support enabled.

Throughput metrics for single-threaded SOCK_DGRAM and
single/multi-threaded SOCK_STREAM showed no statistically significant
throughput changes (lowest p-value reaching 0.27), with the mean
difference ranging from -5% to +1%.

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
---
 drivers/vhost/vsock.c                   |  12 +-
 include/linux/virtio_vsock.h            |   3 +-
 include/net/af_vsock.h                  |  39 ++-
 net/vmw_vsock/af_vsock.c                | 451 +++++++++++++++++++++++++-------
 net/vmw_vsock/diag.c                    |  10 +-
 net/vmw_vsock/hyperv_transport.c        |  27 +-
 net/vmw_vsock/virtio_transport_common.c |  32 ++-
 net/vmw_vsock/vmci_transport.c          |  84 ++++--
 net/vmw_vsock/vsock_bpf.c               |  10 +-
 9 files changed, 518 insertions(+), 150 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 159c1a22c1a8..b027a780d333 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -297,13 +297,17 @@ static int
 vhost_transport_cancel_pkt(struct vsock_sock *vsk)
 {
 	struct vhost_vsock *vsock;
+	unsigned int cid;
 	int cnt = 0;
 	int ret = -ENODEV;
 
 	rcu_read_lock();
+	ret = vsock_remote_addr_cid(vsk, &cid);
+	if (ret < 0)
+		goto out;
 
 	/* Find the vhost_vsock according to guest context id  */
-	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+	vsock = vhost_vsock_get(cid);
 	if (!vsock)
 		goto out;
 
@@ -706,6 +710,10 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
 static void vhost_vsock_reset_orphans(struct sock *sk)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	unsigned int cid;
+
+	if (vsock_remote_addr_cid(vsk, &cid) < 0)
+		return;
 
 	/* vmci_transport.c doesn't take sk_lock here either.  At least we're
 	 * under vsock_table_lock so the sock cannot disappear while we're
@@ -713,7 +721,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 	 */
 
 	/* If the peer is still valid, no need to reset connection */
-	if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+	if (vhost_vsock_get(cid))
 		return;
 
 	/* If the close timeout is pending, let it expire.  This avoids races
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 237ca87a2ecd..97656e83606f 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -231,7 +231,8 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
 				struct msghdr *msg,
 				size_t len);
 int
-virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
+virtio_transport_dgram_enqueue(const struct vsock_transport *transport,
+			       struct vsock_sock *vsk,
 			       struct sockaddr_vm *remote_addr,
 			       struct msghdr *msg,
 			       size_t len);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index c115e655b4f5..84f2a9700ebd 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -25,12 +25,17 @@ extern spinlock_t vsock_table_lock;
 #define vsock_sk(__sk)    ((struct vsock_sock *)__sk)
 #define sk_vsock(__vsk)   (&(__vsk)->sk)
 
+struct vsock_remote_info {
+	struct sockaddr_vm addr;
+	struct rcu_head rcu;
+	const struct vsock_transport *transport;
+};
+
 struct vsock_sock {
 	/* sk must be the first member. */
 	struct sock sk;
-	const struct vsock_transport *transport;
 	struct sockaddr_vm local_addr;
-	struct sockaddr_vm remote_addr;
+	struct vsock_remote_info __rcu *remote_info;
 	/* Links for the global tables of bound and connected sockets. */
 	struct list_head bound_table;
 	struct list_head connected_table;
@@ -120,8 +125,8 @@ struct vsock_transport {
 
 	/* DGRAM. */
 	int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *);
-	int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
-			     struct msghdr *, size_t len);
+	int (*dgram_enqueue)(const struct vsock_transport *, struct vsock_sock *,
+			     struct sockaddr_vm *, struct msghdr *, size_t len);
 	bool (*dgram_allow)(u32 cid, u32 port);
 	int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid);
 	int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port);
@@ -196,6 +201,17 @@ void vsock_core_unregister(const struct vsock_transport *t);
 /* The transport may downcast this to access transport-specific functions */
 const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk);
 
+static inline struct vsock_remote_info *
+vsock_core_get_remote_info(struct vsock_sock *vsk)
+{
+
+	/* vsk->remote_info may be accessed if the rcu read lock is held OR the
+	 * socket lock is held
+	 */
+	return rcu_dereference_check(vsk->remote_info,
+				     lockdep_sock_is_held(sk_vsock(vsk)));
+}
+
 /**** UTILS ****/
 
 /* vsock_table_lock must be held */
@@ -214,7 +230,7 @@ void vsock_release_pending(struct sock *pending);
 void vsock_add_pending(struct sock *listener, struct sock *pending);
 void vsock_remove_pending(struct sock *listener, struct sock *pending);
 void vsock_enqueue_accept(struct sock *listener, struct sock *connected);
-void vsock_insert_connected(struct vsock_sock *vsk);
+int vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
@@ -223,7 +239,8 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(struct vsock_transport *transport,
 				     void (*fn)(struct sock *sk));
-int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
+int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk,
+			   struct sockaddr_vm *remote_addr);
 bool vsock_find_cid(unsigned int cid);
 struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr);
 
@@ -253,4 +270,14 @@ static inline void __init vsock_bpf_build_proto(void)
 {}
 #endif
 
+/* RCU-protected remote addr helpers */
+int vsock_remote_addr_cid(struct vsock_sock *vsk, unsigned int *cid);
+int vsock_remote_addr_port(struct vsock_sock *vsk, unsigned int *port);
+int vsock_remote_addr_cid_port(struct vsock_sock *vsk, unsigned int *cid,
+			       unsigned int *port);
+int vsock_remote_addr_copy(struct vsock_sock *vsk, struct sockaddr_vm *dest);
+bool vsock_remote_addr_bound(struct vsock_sock *vsk);
+bool vsock_remote_addr_equals(struct vsock_sock *vsk, struct sockaddr_vm *other);
+int vsock_remote_addr_update_cid_port(struct vsock_sock *vsk, u32 cid, u32 port);
+
 #endif /* __AF_VSOCK_H__ */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index e8c70069d77d..0520228d2a68 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -114,6 +114,8 @@
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
 static void vsock_sk_destruct(struct sock *sk);
 static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+static bool vsock_use_local_transport(unsigned int remote_cid);
+static bool sock_type_connectible(u16 type);
 
 /* Protocol family. */
 struct proto vsock_proto = {
@@ -145,6 +147,147 @@ static const struct vsock_transport *transport_local;
 static DEFINE_MUTEX(vsock_register_mutex);
 
 /**** UTILS ****/
+bool vsock_remote_addr_bound(struct vsock_sock *vsk)
+{
+	struct vsock_remote_info *remote_info;
+	bool ret;
+
+	rcu_read_lock();
+	remote_info = vsock_core_get_remote_info(vsk);
+	if (!remote_info) {
+		rcu_read_unlock();
+		return false;
+	}
+
+	ret = vsock_addr_bound(&remote_info->addr);
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_bound);
+
+int vsock_remote_addr_copy(struct vsock_sock *vsk, struct sockaddr_vm *dest)
+{
+	struct vsock_remote_info *remote_info;
+
+	rcu_read_lock();
+	remote_info = vsock_core_get_remote_info(vsk);
+	if (!remote_info) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+	memcpy(dest, &remote_info->addr, sizeof(*dest));
+	rcu_read_unlock();
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_copy);
+
+int vsock_remote_addr_cid(struct vsock_sock *vsk, unsigned int *cid)
+{
+	return vsock_remote_addr_cid_port(vsk, cid, NULL);
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_cid);
+
+int vsock_remote_addr_port(struct vsock_sock *vsk, unsigned int *port)
+{
+	return vsock_remote_addr_cid_port(vsk, NULL, port);
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_port);
+
+int vsock_remote_addr_cid_port(struct vsock_sock *vsk, unsigned int *cid,
+			       unsigned int *port)
+{
+	struct vsock_remote_info *remote_info;
+
+	rcu_read_lock();
+	remote_info = vsock_core_get_remote_info(vsk);
+	if (!remote_info) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+
+	if (cid)
+		*cid = remote_info->addr.svm_cid;
+	if (port)
+		*port = remote_info->addr.svm_port;
+
+	rcu_read_unlock();
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_cid_port);
+
+/* The socket lock must be held by the caller */
+int vsock_set_remote_info(struct vsock_sock *vsk,
+			  const struct vsock_transport *transport,
+			  struct sockaddr_vm *addr)
+{
+	struct vsock_remote_info *old, *new;
+
+	if (addr || transport) {
+		new = kzalloc(sizeof(*new), GFP_KERNEL);
+		if (!new)
+			return -ENOMEM;
+
+		if (addr)
+			memcpy(&new->addr, addr, sizeof(new->addr));
+
+		if (transport)
+			new->transport = transport;
+	} else {
+		new = NULL;
+	}
+
+	old = rcu_replace_pointer(vsk->remote_info, new, lockdep_sock_is_held(sk_vsock(vsk)));
+	kfree_rcu(old, rcu);
+
+	return 0;
+}
+
+static const struct vsock_transport *
+vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
+{
+	const struct vsock_transport *transport;
+
+	if (vsock_use_local_transport(cid))
+		transport = transport_local;
+	else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
+		 (flags & VMADDR_FLAG_TO_HOST))
+		transport = transport_g2h;
+	else
+		transport = transport_h2g;
+
+	return transport;
+}
+
+static const struct vsock_transport *
+vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
+{
+	if (transport_dgram)
+		return transport_dgram;
+
+	return vsock_connectible_lookup_transport(cid, flags);
+}
+
+bool vsock_remote_addr_equals(struct vsock_sock *vsk,
+			      struct sockaddr_vm *other)
+{
+	struct vsock_remote_info *remote_info;
+	bool equals;
+
+	rcu_read_lock();
+	remote_info = vsock_core_get_remote_info(vsk);
+	if (!remote_info) {
+		rcu_read_unlock();
+		return false;
+	}
+
+	equals = vsock_addr_equals_addr(&remote_info->addr, other);
+	rcu_read_unlock();
+
+	return equals;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_equals);
 
 /* Each bound VSocket is stored in the bind hash table and each connected
  * VSocket is stored in the connected hash table.
@@ -284,10 +427,16 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
 
 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
 			    connected_table) {
-		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
+		struct vsock_remote_info *remote_info;
+
+		rcu_read_lock();
+		remote_info = vsock_core_get_remote_info(vsk);
+		if (remote_info && vsock_addr_equals_addr(src, &remote_info->addr) &&
 		    dst->svm_port == vsk->local_addr.svm_port) {
+			rcu_read_unlock();
 			return sk_vsock(vsk);
 		}
+		rcu_read_unlock();
 	}
 
 	return NULL;
@@ -300,17 +449,36 @@ static void vsock_insert_unbound(struct vsock_sock *vsk)
 	spin_unlock_bh(&vsock_table_lock);
 }
 
-void vsock_insert_connected(struct vsock_sock *vsk)
+int vsock_insert_connected(struct vsock_sock *vsk)
 {
-	struct list_head *list = vsock_connected_sockets(
-		&vsk->remote_addr, &vsk->local_addr);
+	struct list_head *list;
+	struct vsock_remote_info *remote_info;
+
+	rcu_read_lock();
+	remote_info = vsock_core_get_remote_info(vsk);
+	if (!remote_info) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+	list = vsock_connected_sockets(&remote_info->addr, &vsk->local_addr);
+	rcu_read_unlock();
 
 	spin_lock_bh(&vsock_table_lock);
 	__vsock_insert_connected(list, vsk);
 	spin_unlock_bh(&vsock_table_lock);
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(vsock_insert_connected);
 
+void vsock_remove_dgram_bound(struct vsock_sock *vsk)
+{
+	spin_lock_bh(&vsock_dgram_table_lock);
+	if (__vsock_in_bound_table(vsk))
+		__vsock_remove_bound(vsk);
+	spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
 void vsock_remove_bound(struct vsock_sock *vsk)
 {
 	spin_lock_bh(&vsock_table_lock);
@@ -362,7 +530,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
 
 void vsock_remove_sock(struct vsock_sock *vsk)
 {
-	vsock_remove_bound(vsk);
+	if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+		vsock_remove_bound(vsk);
+	else
+		vsock_remove_dgram_bound(vsk);
 	vsock_remove_connected(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_remove_sock);
@@ -378,7 +549,7 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport,
 		struct vsock_sock *vsk;
 		list_for_each_entry(vsk, &vsock_connected_table[i],
 				    connected_table) {
-			if (vsk->transport != transport)
+			if (vsock_core_get_transport(vsk) != transport)
 				continue;
 
 			fn(sk_vsock(vsk));
@@ -444,59 +615,39 @@ static bool vsock_use_local_transport(unsigned int remote_cid)
 
 static void vsock_deassign_transport(struct vsock_sock *vsk)
 {
-	if (!vsk->transport)
-		return;
-
-	vsk->transport->destruct(vsk);
-	module_put(vsk->transport->module);
-	vsk->transport = NULL;
-}
-
-static const struct vsock_transport *
-vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
-{
-	const struct vsock_transport *transport;
+	struct vsock_remote_info *remote_info;
 
-	if (vsock_use_local_transport(cid))
-		transport = transport_local;
-	else if (cid <= VMADDR_CID_HOST || !transport_h2g ||
-		 (flags & VMADDR_FLAG_TO_HOST))
-		transport = transport_g2h;
-	else
-		transport = transport_h2g;
-
-	return transport;
-}
-
-static const struct vsock_transport *
-vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
-{
-	if (transport_dgram)
-		return transport_dgram;
+	remote_info = rcu_replace_pointer(vsk->remote_info, NULL,
+					  lockdep_sock_is_held(sk_vsock(vsk)));
+	if (!remote_info)
+		return;
 
-	return vsock_connectible_lookup_transport(cid, flags);
+	remote_info->transport->destruct(vsk);
+	module_put(remote_info->transport->module);
+	kfree_rcu(remote_info, rcu);
 }
 
 /* Assign a transport to a socket and call the .init transport callback.
  *
- * Note: for connection oriented socket this must be called when vsk->remote_addr
- * is set (e.g. during the connect() or when a connection request on a listener
- * socket is received).
- * The vsk->remote_addr is used to decide which transport to use:
+ * The remote_addr is used to decide which transport to use:
  *  - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
  *    g2h is not loaded, will use local transport;
  *  - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field
  *    includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport;
  *  - remote CID > VMADDR_CID_HOST will use host->guest transport;
  */
-int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
+int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk,
+			   struct sockaddr_vm *remote_addr)
 {
 	const struct vsock_transport *new_transport;
+	struct vsock_remote_info *old_info;
 	struct sock *sk = sk_vsock(vsk);
-	unsigned int remote_cid = vsk->remote_addr.svm_cid;
+	unsigned int remote_cid;
 	__u8 remote_flags;
 	int ret;
 
+	remote_cid = remote_addr->svm_cid;
+
 	/* If the packet is coming with the source and destination CIDs higher
 	 * than VMADDR_CID_HOST, then a vsock channel where all the packets are
 	 * forwarded to the host should be established. Then the host will
@@ -506,10 +657,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 	 * the connect path the flag can be set by the user space application.
 	 */
 	if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST &&
-	    vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
-		vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
+	    remote_cid > VMADDR_CID_HOST)
+		remote_addr->svm_flags |= VMADDR_FLAG_TO_HOST;
 
-	remote_flags = vsk->remote_addr.svm_flags;
+	remote_flags = remote_addr->svm_flags;
 
 	switch (sk->sk_type) {
 	case SOCK_DGRAM:
@@ -525,8 +676,9 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		return -ESOCKTNOSUPPORT;
 	}
 
-	if (vsk->transport) {
-		if (vsk->transport == new_transport)
+	old_info = vsock_core_get_remote_info(vsk);
+	if (old_info && old_info->transport) {
+		if (old_info->transport == new_transport)
 			return 0;
 
 		/* transport->release() must be called with sock lock acquired.
@@ -535,7 +687,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		 * function is called on a new socket which is not assigned to
 		 * any transport.
 		 */
-		vsk->transport->release(vsk);
+		old_info->transport->release(vsk);
 		vsock_deassign_transport(vsk);
 	}
 
@@ -553,13 +705,18 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 		}
 	}
 
-	ret = new_transport->init(vsk, psk);
+	ret = vsock_set_remote_info(vsk, new_transport, remote_addr);
 	if (ret) {
 		module_put(new_transport->module);
 		return ret;
 	}
 
-	vsk->transport = new_transport;
+	ret = new_transport->init(vsk, psk);
+	if (ret) {
+		vsock_set_remote_info(vsk, NULL, NULL);
+		module_put(new_transport->module);
+		return ret;
+	}
 
 	return 0;
 }
@@ -616,12 +773,14 @@ static bool vsock_is_pending(struct sock *sk)
 
 static int vsock_send_shutdown(struct sock *sk, int mode)
 {
+	const struct vsock_transport *transport;
 	struct vsock_sock *vsk = vsock_sk(sk);
 
-	if (!vsk->transport)
+	transport = vsock_core_get_transport(vsk);
+	if (!transport)
 		return -ENODEV;
 
-	return vsk->transport->shutdown(vsk, mode);
+	return transport->shutdown(vsk, mode);
 }
 
 static void vsock_pending_work(struct work_struct *work)
@@ -757,7 +916,10 @@ EXPORT_SYMBOL(vsock_bind_stream);
 static int vsock_bind_dgram(struct vsock_sock *vsk,
 			    struct sockaddr_vm *addr)
 {
-	if (!vsk->transport || !vsk->transport->dgram_bind) {
+	const struct vsock_transport *transport;
+
+	transport = vsock_core_get_transport(vsk);
+	if (!transport || !transport->dgram_bind) {
 		int retval;
 		spin_lock_bh(&vsock_dgram_table_lock);
 		retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
@@ -767,7 +929,7 @@ static int vsock_bind_dgram(struct vsock_sock *vsk,
 		return retval;
 	}
 
-	return vsk->transport->dgram_bind(vsk, addr);
+	return transport->dgram_bind(vsk, addr);
 }
 
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
@@ -816,6 +978,7 @@ static struct sock *__vsock_create(struct net *net,
 				   unsigned short type,
 				   int kern)
 {
+	struct vsock_remote_info *remote_info;
 	struct sock *sk;
 	struct vsock_sock *psk;
 	struct vsock_sock *vsk;
@@ -835,7 +998,14 @@ static struct sock *__vsock_create(struct net *net,
 
 	vsk = vsock_sk(sk);
 	vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
-	vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+
+	remote_info = kzalloc(sizeof(*remote_info), GFP_KERNEL);
+	if (!remote_info) {
+		sk_free(sk);
+		return NULL;
+	}
+	vsock_addr_init(&remote_info->addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+	rcu_assign_pointer(vsk->remote_info, remote_info);
 
 	sk->sk_destruct = vsock_sk_destruct;
 	sk->sk_backlog_rcv = vsock_queue_rcv_skb;
@@ -882,6 +1052,7 @@ static bool sock_type_connectible(u16 type)
 static void __vsock_release(struct sock *sk, int level)
 {
 	if (sk) {
+		const struct vsock_transport *transport;
 		struct sock *pending;
 		struct vsock_sock *vsk;
 
@@ -895,8 +1066,9 @@ static void __vsock_release(struct sock *sk, int level)
 		 */
 		lock_sock_nested(sk, level);
 
-		if (vsk->transport)
-			vsk->transport->release(vsk);
+		transport = vsock_core_get_transport(vsk);
+		if (transport)
+			transport->release(vsk);
 		else if (sock_type_connectible(sk->sk_type))
 			vsock_remove_sock(vsk);
 
@@ -926,8 +1098,6 @@ static void vsock_sk_destruct(struct sock *sk)
 	 * possibly register the address family with the kernel.
 	 */
 	vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
-	vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
-
 	put_cred(vsk->owner);
 }
 
@@ -951,16 +1121,22 @@ EXPORT_SYMBOL_GPL(vsock_create_connected);
 
 s64 vsock_stream_has_data(struct vsock_sock *vsk)
 {
-	return vsk->transport->stream_has_data(vsk);
+	const struct vsock_transport *transport;
+
+	transport = vsock_core_get_transport(vsk);
+
+	return transport->stream_has_data(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_stream_has_data);
 
 s64 vsock_connectible_has_data(struct vsock_sock *vsk)
 {
+	const struct vsock_transport *transport;
 	struct sock *sk = sk_vsock(vsk);
 
+	transport = vsock_core_get_transport(vsk);
 	if (sk->sk_type == SOCK_SEQPACKET)
-		return vsk->transport->seqpacket_has_data(vsk);
+		return transport->seqpacket_has_data(vsk);
 	else
 		return vsock_stream_has_data(vsk);
 }
@@ -968,7 +1144,10 @@ EXPORT_SYMBOL_GPL(vsock_connectible_has_data);
 
 s64 vsock_stream_has_space(struct vsock_sock *vsk)
 {
-	return vsk->transport->stream_has_space(vsk);
+	const struct vsock_transport *transport;
+
+	transport = vsock_core_get_transport(vsk);
+	return transport->stream_has_space(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_stream_has_space);
 
@@ -1017,6 +1196,7 @@ static int vsock_getname(struct socket *sock,
 	struct sock *sk;
 	struct vsock_sock *vsk;
 	struct sockaddr_vm *vm_addr;
+	struct vsock_remote_info *rcu_ptr;
 
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
@@ -1025,11 +1205,17 @@ static int vsock_getname(struct socket *sock,
 	lock_sock(sk);
 
 	if (peer) {
+		rcu_read_lock();
 		if (sock->state != SS_CONNECTED) {
 			err = -ENOTCONN;
 			goto out;
 		}
-		vm_addr = &vsk->remote_addr;
+		rcu_ptr = vsock_core_get_remote_info(vsk);
+		if (!rcu_ptr) {
+			err = -EINVAL;
+			goto out;
+		}
+		vm_addr = &rcu_ptr->addr;
 	} else {
 		vm_addr = &vsk->local_addr;
 	}
@@ -1049,6 +1235,8 @@ static int vsock_getname(struct socket *sock,
 	err = sizeof(*vm_addr);
 
 out:
+	if (peer)
+		rcu_read_unlock();
 	release_sock(sk);
 	return err;
 }
@@ -1153,7 +1341,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
 
 		lock_sock(sk);
 
-		transport = vsk->transport;
+		transport = vsock_core_get_transport(vsk);
 
 		/* Listening sockets that have connections in their accept
 		 * queue can be read.
@@ -1224,9 +1412,11 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
 
 static int vsock_read_skb(struct sock *sk, skb_read_actor_t read_actor)
 {
+	const struct vsock_transport *transport;
 	struct vsock_sock *vsk = vsock_sk(sk);
 
-	return vsk->transport->read_skb(vsk, read_actor);
+	transport = vsock_core_get_transport(vsk);
+	return transport->read_skb(vsk, read_actor);
 }
 
 static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
@@ -1235,7 +1425,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	int err;
 	struct sock *sk;
 	struct vsock_sock *vsk;
-	struct sockaddr_vm *remote_addr;
+	struct sockaddr_vm stack_addr, *remote_addr;
 	const struct vsock_transport *transport;
 
 	if (msg->msg_flags & MSG_OOB)
@@ -1246,7 +1436,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
 
-	lock_sock(sk);
+	/* If auto-binding is required, acquire the socket lock to avoid
+	 * potential race conditions. Otherwise, do not acquire the lock.
+	 *
+	 * We know that the first check of local_addr is racy (indicated by
+	 * data_race()). By acquiring the lock and then subsequently checking
+	 * again if local_addr is bound (inside vsock_auto_bind()), we can
+	 * ensure there are no real data races.
+	 *
+	 * This technique is borrowed from inet_send_prepare().
+	 */
+	if (data_race(!vsock_addr_bound(&vsk->local_addr))) {
+		lock_sock(sk);
+		err = vsock_auto_bind(vsk);
+		release_sock(sk);
+		if (err)
+			return err;
+	}
 
 	/* If the provided message contains an address, use that.  Otherwise
 	 * fall back on the socket's remote handle (if it has been connected).
@@ -1256,6 +1462,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 			    &remote_addr) == 0) {
 		transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
 							 remote_addr->svm_flags);
+
 		if (!transport) {
 			err = -EINVAL;
 			goto out;
@@ -1286,18 +1493,39 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 			goto out;
 		}
 
-		err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
+		err = transport->dgram_enqueue(transport, vsk, remote_addr, msg, len);
 		module_put(transport->module);
 	} else if (sock->state == SS_CONNECTED) {
-		remote_addr = &vsk->remote_addr;
-		transport = vsk->transport;
+		struct vsock_remote_info *remote_info;
+		const struct vsock_transport *transport;
 
-		err = vsock_auto_bind(vsk);
-		if (err)
+		rcu_read_lock();
+		remote_info = vsock_core_get_remote_info(vsk);
+		if (!remote_info) {
+			err = -EINVAL;
+			rcu_read_unlock();
 			goto out;
+		}
 
-		if (remote_addr->svm_cid == VMADDR_CID_ANY)
+		transport = remote_info->transport;
+		memcpy(&stack_addr, &remote_info->addr, sizeof(stack_addr));
+		rcu_read_unlock();
+
+		remote_addr = &stack_addr;
+
+		if (remote_addr->svm_cid == VMADDR_CID_ANY) {
 			remote_addr->svm_cid = transport->get_local_cid();
+			lock_sock(sk_vsock(vsk));
+			/* Even though the CID has changed, we do not have to
+			 * look up the transport again because the local CID
+			 * will never resolve to a different transport.
+			 */
+			err = vsock_set_remote_info(vsk, transport, remote_addr);
+			release_sock(sk_vsock(vsk));
+
+			if (err)
+				goto out;
+		}
 
 		/* XXX Should connect() or this function ensure remote_addr is
 		 * bound?
@@ -1313,14 +1541,13 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 			goto out;
 		}
 
-		err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
+		err = transport->dgram_enqueue(transport, vsk, &stack_addr, msg, len);
 	} else {
 		err = -EINVAL;
 		goto out;
 	}
 
 out:
-	release_sock(sk);
 	return err;
 }
 
@@ -1331,18 +1558,22 @@ static int vsock_dgram_connect(struct socket *sock,
 	struct sock *sk;
 	struct vsock_sock *vsk;
 	struct sockaddr_vm *remote_addr;
+	const struct vsock_transport *transport;
 
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
 
 	err = vsock_addr_cast(addr, addr_len, &remote_addr);
 	if (err == -EAFNOSUPPORT && remote_addr->svm_family == AF_UNSPEC) {
+		struct sockaddr_vm addr_any;
+
 		lock_sock(sk);
-		vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY,
-				VMADDR_PORT_ANY);
+		vsock_addr_init(&addr_any, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+		err = vsock_set_remote_info(vsk, vsock_core_get_transport(vsk),
+					    &addr_any);
 		sock->state = SS_UNCONNECTED;
 		release_sock(sk);
-		return 0;
+		return err;
 	} else if (err != 0)
 		return -EINVAL;
 
@@ -1352,14 +1583,13 @@ static int vsock_dgram_connect(struct socket *sock,
 	if (err)
 		goto out;
 
-	memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
-
-	err = vsock_assign_transport(vsk, NULL);
+	err = vsock_assign_transport(vsk, NULL, remote_addr);
 	if (err)
 		goto out;
 
-	if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
-					 remote_addr->svm_port)) {
+	transport = vsock_core_get_transport(vsk);
+	if (!transport->dgram_allow(remote_addr->svm_cid,
+				    remote_addr->svm_port)) {
 		err = -EINVAL;
 		goto out;
 	}
@@ -1406,7 +1636,9 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 	if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
 		return -EOPNOTSUPP;
 
-	transport = vsk->transport;
+	rcu_read_lock();
+	transport = vsock_core_get_transport(vsk);
+	rcu_read_unlock();
 
 	/* Retrieve the head sk_buff from the socket's receive queue. */
 	err = 0;
@@ -1474,7 +1706,7 @@ static const struct proto_ops vsock_dgram_ops = {
 
 static int vsock_transport_cancel_pkt(struct vsock_sock *vsk)
 {
-	const struct vsock_transport *transport = vsk->transport;
+	const struct vsock_transport *transport = vsock_core_get_transport(vsk);
 
 	if (!transport || !transport->cancel_pkt)
 		return -EOPNOTSUPP;
@@ -1511,6 +1743,7 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
 	struct sock *sk;
 	struct vsock_sock *vsk;
 	const struct vsock_transport *transport;
+	struct vsock_remote_info *remote_info;
 	struct sockaddr_vm *remote_addr;
 	long timeout;
 	DEFINE_WAIT(wait);
@@ -1548,14 +1781,20 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
 		}
 
 		/* Set the remote address that we are connecting to. */
-		memcpy(&vsk->remote_addr, remote_addr,
-		       sizeof(vsk->remote_addr));
-
-		err = vsock_assign_transport(vsk, NULL);
+		err = vsock_assign_transport(vsk, NULL, remote_addr);
 		if (err)
 			goto out;
 
-		transport = vsk->transport;
+		rcu_read_lock();
+		remote_info = vsock_core_get_remote_info(vsk);
+		if (!remote_info) {
+			err = -EINVAL;
+			rcu_read_unlock();
+			goto out;
+		}
+
+		transport = remote_info->transport;
+		rcu_read_unlock();
 
 		/* The hypervisor and well-known contexts do not have socket
 		 * endpoints.
@@ -1819,7 +2058,7 @@ static int vsock_connectible_setsockopt(struct socket *sock,
 
 	lock_sock(sk);
 
-	transport = vsk->transport;
+	transport = vsock_core_get_transport(vsk);
 
 	switch (optname) {
 	case SO_VM_SOCKETS_BUFFER_SIZE:
@@ -1957,7 +2196,7 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg,
 
 	lock_sock(sk);
 
-	transport = vsk->transport;
+	transport = vsock_core_get_transport(vsk);
 
 	/* Callers should not provide a destination with connection oriented
 	 * sockets.
@@ -1980,7 +2219,7 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out;
 	}
 
-	if (!vsock_addr_bound(&vsk->remote_addr)) {
+	if (!vsock_remote_addr_bound(vsk)) {
 		err = -EDESTADDRREQ;
 		goto out;
 	}
@@ -2101,7 +2340,7 @@ static int vsock_connectible_wait_data(struct sock *sk,
 
 	vsk = vsock_sk(sk);
 	err = 0;
-	transport = vsk->transport;
+	transport = vsock_core_get_transport(vsk);
 
 	while (1) {
 		prepare_to_wait(sk_sleep(sk), wait, TASK_INTERRUPTIBLE);
@@ -2169,7 +2408,7 @@ static int __vsock_stream_recvmsg(struct sock *sk, struct msghdr *msg,
 	DEFINE_WAIT(wait);
 
 	vsk = vsock_sk(sk);
-	transport = vsk->transport;
+	transport = vsock_core_get_transport(vsk);
 
 	/* We must not copy less than target bytes into the user's buffer
 	 * before returning successfully, so we wait for the consume queue to
@@ -2245,7 +2484,7 @@ static int __vsock_seqpacket_recvmsg(struct sock *sk, struct msghdr *msg,
 	DEFINE_WAIT(wait);
 
 	vsk = vsock_sk(sk);
-	transport = vsk->transport;
+	transport = vsock_core_get_transport(vsk);
 
 	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
 
@@ -2302,7 +2541,7 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	lock_sock(sk);
 
-	transport = vsk->transport;
+	transport = vsock_core_get_transport(vsk);
 
 	if (!transport || sk->sk_state != TCP_ESTABLISHED) {
 		/* Recvmsg is supposed to return 0 if a peer performs an
@@ -2369,7 +2608,7 @@ static int vsock_set_rcvlowat(struct sock *sk, int val)
 	if (val > vsk->buffer_size)
 		return -EINVAL;
 
-	transport = vsk->transport;
+	transport = vsock_core_get_transport(vsk);
 
 	if (transport && transport->set_rcvlowat)
 		return transport->set_rcvlowat(vsk, val);
@@ -2459,7 +2698,10 @@ static int vsock_create(struct net *net, struct socket *sock,
 	vsk = vsock_sk(sk);
 
 	if (sock->type == SOCK_DGRAM) {
-		ret = vsock_assign_transport(vsk, NULL);
+		struct sockaddr_vm remote_addr;
+
+		vsock_addr_init(&remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+		ret = vsock_assign_transport(vsk, NULL, &remote_addr);
 		if (ret < 0) {
 			sock_put(sk);
 			return ret;
@@ -2581,7 +2823,18 @@ static void __exit vsock_exit(void)
 
 const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk)
 {
-	return vsk->transport;
+	const struct vsock_transport *transport;
+	struct vsock_remote_info *remote_info;
+
+	rcu_read_lock();
+	remote_info = vsock_core_get_remote_info(vsk);
+	if (!remote_info) {
+		rcu_read_unlock();
+		return NULL;
+	}
+	transport = remote_info->transport;
+	rcu_read_unlock();
+	return transport;
 }
 EXPORT_SYMBOL_GPL(vsock_core_get_transport);
 
diff --git a/net/vmw_vsock/diag.c b/net/vmw_vsock/diag.c
index a2823b1c5e28..f843bae86b32 100644
--- a/net/vmw_vsock/diag.c
+++ b/net/vmw_vsock/diag.c
@@ -15,8 +15,14 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
 			u32 portid, u32 seq, u32 flags)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	struct sockaddr_vm remote_addr;
 	struct vsock_diag_msg *rep;
 	struct nlmsghdr *nlh;
+	int err;
+
+	err = vsock_remote_addr_copy(vsk, &remote_addr);
+	if (err < 0)
+		return err;
 
 	nlh = nlmsg_put(skb, portid, seq, SOCK_DIAG_BY_FAMILY, sizeof(*rep),
 			flags);
@@ -36,8 +42,8 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
 	rep->vdiag_shutdown = sk->sk_shutdown;
 	rep->vdiag_src_cid = vsk->local_addr.svm_cid;
 	rep->vdiag_src_port = vsk->local_addr.svm_port;
-	rep->vdiag_dst_cid = vsk->remote_addr.svm_cid;
-	rep->vdiag_dst_port = vsk->remote_addr.svm_port;
+	rep->vdiag_dst_cid = remote_addr.svm_cid;
+	rep->vdiag_dst_port = remote_addr.svm_port;
 	rep->vdiag_ino = sock_i_ino(sk);
 
 	sock_diag_save_cookie(sk, rep->vdiag_cookie);
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index c00bc5da769a..84e8c64b3365 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -323,6 +323,8 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		goto out;
 
 	if (conn_from_host) {
+		struct sockaddr_vm remote_addr;
+
 		if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog)
 			goto out;
 
@@ -336,10 +338,9 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		hvs_addr_init(&vnew->local_addr, if_type);
 
 		/* Remote peer is always the host */
-		vsock_addr_init(&vnew->remote_addr,
-				VMADDR_CID_HOST, VMADDR_PORT_ANY);
-		vnew->remote_addr.svm_port = get_port_by_srv_id(if_instance);
-		ret = vsock_assign_transport(vnew, vsock_sk(sk));
+		vsock_addr_init(&remote_addr, VMADDR_CID_HOST, get_port_by_srv_id(if_instance));
+
+		ret = vsock_assign_transport(vnew, vsock_sk(sk), &remote_addr);
 		/* Transport assigned (looking at remote_addr) must be the
 		 * same where we received the request.
 		 */
@@ -459,13 +460,18 @@ static int hvs_connect(struct vsock_sock *vsk)
 {
 	union hvs_service_id vm, host;
 	struct hvsock *h = vsk->trans;
+	int err;
 
 	vm.srv_id = srv_id_template;
 	vm.svm_port = vsk->local_addr.svm_port;
 	h->vm_srv_id = vm.srv_id;
 
 	host.srv_id = srv_id_template;
-	host.svm_port = vsk->remote_addr.svm_port;
+
+	err = vsock_remote_addr_port(vsk, &host.svm_port);
+	if (err < 0)
+		return err;
+
 	h->host_srv_id = host.srv_id;
 
 	return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id);
@@ -566,7 +572,8 @@ static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len)
 	return -EOPNOTSUPP;
 }
 
-static int hvs_dgram_enqueue(struct vsock_sock *vsk,
+static int hvs_dgram_enqueue(const struct vsock_transport *transport,
+			     struct vsock_sock *vsk,
 			     struct sockaddr_vm *remote, struct msghdr *msg,
 			     size_t dgram_len)
 {
@@ -866,7 +873,13 @@ static struct vsock_transport hvs_transport = {
 
 static bool hvs_check_transport(struct vsock_sock *vsk)
 {
-	return vsk->transport == &hvs_transport;
+	bool ret;
+
+	rcu_read_lock();
+	ret = vsock_core_get_transport(vsk) == &hvs_transport;
+	rcu_read_unlock();
+
+	return ret;
 }
 
 static int hvs_probe(struct hv_device *hdev,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ab4af21c4f3f..09d35c488902 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -258,8 +258,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	src_cid = t_ops->transport.get_local_cid();
 	src_port = vsk->local_addr.svm_port;
 	if (!info->remote_cid) {
-		dst_cid	= vsk->remote_addr.svm_cid;
-		dst_port = vsk->remote_addr.svm_port;
+		ret = vsock_remote_addr_cid_port(vsk, &dst_cid, &dst_port);
+		if (ret < 0)
+			return ret;
 	} else {
 		dst_cid = info->remote_cid;
 		dst_port = info->remote_port;
@@ -877,12 +878,14 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
 EXPORT_SYMBOL_GPL(virtio_transport_shutdown);
 
 int
-virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
+virtio_transport_dgram_enqueue(const struct vsock_transport *transport,
+			       struct vsock_sock *vsk,
 			       struct sockaddr_vm *remote_addr,
 			       struct msghdr *msg,
 			       size_t dgram_len)
 {
-	const struct virtio_transport *t_ops;
+	const struct virtio_transport *t_ops =
+		(const struct virtio_transport *)transport;
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RW,
 		.msg = msg,
@@ -896,7 +899,6 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
 	if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
 		return -EMSGSIZE;
 
-	t_ops = virtio_transport_get_ops(vsk);
 	src_cid = t_ops->transport.get_local_cid();
 	src_port = vsk->local_addr.svm_port;
 
@@ -1120,7 +1122,9 @@ virtio_transport_recv_connecting(struct sock *sk,
 	case VIRTIO_VSOCK_OP_RESPONSE:
 		sk->sk_state = TCP_ESTABLISHED;
 		sk->sk_socket->state = SS_CONNECTED;
-		vsock_insert_connected(vsk);
+		err = vsock_insert_connected(vsk);
+		if (err)
+			goto destroy;
 		sk->sk_state_change(sk);
 		break;
 	case VIRTIO_VSOCK_OP_INVALID:
@@ -1326,6 +1330,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
 	struct vsock_sock *vsk = vsock_sk(sk);
 	struct vsock_sock *vchild;
+	struct sockaddr_vm child_remote;
 	struct sock *child;
 	int ret;
 
@@ -1354,14 +1359,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	vchild = vsock_sk(child);
 	vsock_addr_init(&vchild->local_addr, le64_to_cpu(hdr->dst_cid),
 			le32_to_cpu(hdr->dst_port));
-	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(hdr->src_cid),
+	vsock_addr_init(&child_remote, le64_to_cpu(hdr->src_cid),
 			le32_to_cpu(hdr->src_port));
-
-	ret = vsock_assign_transport(vchild, vsk);
+	ret = vsock_assign_transport(vchild, vsk, &child_remote);
 	/* Transport assigned (looking at remote_addr) must be the same
 	 * where we received the request.
 	 */
-	if (ret || vchild->transport != &t->transport) {
+	if (ret || vsock_core_get_transport(vchild) != &t->transport) {
 		release_sock(child);
 		virtio_transport_reset_no_sock(t, skb);
 		sock_put(child);
@@ -1371,7 +1375,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	if (virtio_transport_space_update(child, skb))
 		child->sk_write_space(child);
 
-	vsock_insert_connected(vchild);
+	ret = vsock_insert_connected(vchild);
+	if (ret) {
+		release_sock(child);
+		virtio_transport_reset_no_sock(t, skb);
+		sock_put(child);
+		return ret;
+	}
 	vsock_enqueue_accept(sk, child);
 	virtio_transport_send_response(vchild, skb);
 
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index b6a51afb74b8..b9ba6209e8fc 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -283,18 +283,25 @@ vmci_transport_send_control_pkt(struct sock *sk,
 				u16 proto,
 				struct vmci_handle handle)
 {
+	struct sockaddr_vm addr_stack;
+	struct sockaddr_vm *remote_addr = &addr_stack;
 	struct vsock_sock *vsk;
+	int err;
 
 	vsk = vsock_sk(sk);
 
 	if (!vsock_addr_bound(&vsk->local_addr))
 		return -EINVAL;
 
-	if (!vsock_addr_bound(&vsk->remote_addr))
+	if (!vsock_remote_addr_bound(vsk))
 		return -EINVAL;
 
+	err = vsock_remote_addr_copy(vsk, remote_addr);
+	if (err < 0)
+		return err;
+
 	return vmci_transport_alloc_send_control_pkt(&vsk->local_addr,
-						     &vsk->remote_addr,
+						     remote_addr,
 						     type, size, mode,
 						     wait, proto, handle);
 }
@@ -317,6 +324,7 @@ static int vmci_transport_send_reset(struct sock *sk,
 	struct sockaddr_vm *dst_ptr;
 	struct sockaddr_vm dst;
 	struct vsock_sock *vsk;
+	int err;
 
 	if (pkt->type == VMCI_TRANSPORT_PACKET_TYPE_RST)
 		return 0;
@@ -326,13 +334,16 @@ static int vmci_transport_send_reset(struct sock *sk,
 	if (!vsock_addr_bound(&vsk->local_addr))
 		return -EINVAL;
 
-	if (vsock_addr_bound(&vsk->remote_addr)) {
-		dst_ptr = &vsk->remote_addr;
+	if (vsock_remote_addr_bound(vsk)) {
+		err = vsock_remote_addr_copy(vsk, &dst);
+		if (err < 0)
+			return err;
 	} else {
 		vsock_addr_init(&dst, pkt->dg.src.context,
 				pkt->src_port);
-		dst_ptr = &dst;
 	}
+	dst_ptr = &dst;
+
 	return vmci_transport_alloc_send_control_pkt(&vsk->local_addr, dst_ptr,
 					     VMCI_TRANSPORT_PACKET_TYPE_RST,
 					     0, 0, NULL, VSOCK_PROTO_INVALID,
@@ -490,7 +501,7 @@ static struct sock *vmci_transport_get_pending(
 
 	list_for_each_entry(vpending, &vlistener->pending_links,
 			    pending_links) {
-		if (vsock_addr_equals_addr(&src, &vpending->remote_addr) &&
+		if (vsock_remote_addr_equals(vpending, &src) &&
 		    pkt->dst_port == vpending->local_addr.svm_port) {
 			pending = sk_vsock(vpending);
 			sock_hold(pending);
@@ -940,6 +951,7 @@ static void vmci_transport_recv_pkt_work(struct work_struct *work)
 static int vmci_transport_recv_listen(struct sock *sk,
 				      struct vmci_transport_packet *pkt)
 {
+	struct sockaddr_vm remote_addr;
 	struct sock *pending;
 	struct vsock_sock *vpending;
 	int err;
@@ -1015,10 +1027,10 @@ static int vmci_transport_recv_listen(struct sock *sk,
 
 	vsock_addr_init(&vpending->local_addr, pkt->dg.dst.context,
 			pkt->dst_port);
-	vsock_addr_init(&vpending->remote_addr, pkt->dg.src.context,
-			pkt->src_port);
 
-	err = vsock_assign_transport(vpending, vsock_sk(sk));
+	vsock_addr_init(&remote_addr, pkt->dg.src.context, pkt->src_port);
+
+	err = vsock_assign_transport(vpending, vsock_sk(sk), &remote_addr);
 	/* Transport assigned (looking at remote_addr) must be the same
 	 * where we received the request.
 	 */
@@ -1133,6 +1145,7 @@ vmci_transport_recv_connecting_server(struct sock *listener,
 {
 	struct vsock_sock *vpending;
 	struct vmci_handle handle;
+	unsigned int vpending_remote_cid;
 	struct vmci_qp *qpair;
 	bool is_local;
 	u32 flags;
@@ -1189,8 +1202,13 @@ vmci_transport_recv_connecting_server(struct sock *listener,
 	/* vpending->local_addr always has a context id so we do not need to
 	 * worry about VMADDR_CID_ANY in this case.
 	 */
-	is_local =
-	    vpending->remote_addr.svm_cid == vpending->local_addr.svm_cid;
+	err = vsock_remote_addr_cid(vpending, &vpending_remote_cid);
+	if (err < 0) {
+		skerr = EPROTO;
+		goto destroy;
+	}
+
+	is_local = vpending_remote_cid == vpending->local_addr.svm_cid;
 	flags = VMCI_QPFLAG_ATTACH_ONLY;
 	flags |= is_local ? VMCI_QPFLAG_LOCAL : 0;
 
@@ -1203,7 +1221,7 @@ vmci_transport_recv_connecting_server(struct sock *listener,
 					flags,
 					vmci_transport_is_trusted(
 						vpending,
-						vpending->remote_addr.svm_cid));
+						vpending_remote_cid));
 	if (err < 0) {
 		vmci_transport_send_reset(pending, pkt);
 		skerr = -err;
@@ -1277,6 +1295,8 @@ static int
 vmci_transport_recv_connecting_client(struct sock *sk,
 				      struct vmci_transport_packet *pkt)
 {
+	struct vsock_remote_info *remote_info;
+	struct sockaddr_vm *remote_addr;
 	struct vsock_sock *vsk;
 	int err;
 	int skerr;
@@ -1306,9 +1326,20 @@ vmci_transport_recv_connecting_client(struct sock *sk,
 		break;
 	case VMCI_TRANSPORT_PACKET_TYPE_NEGOTIATE:
 	case VMCI_TRANSPORT_PACKET_TYPE_NEGOTIATE2:
+		rcu_read_lock();
+		remote_info = vsock_core_get_remote_info(vsk);
+		if (!remote_info) {
+			skerr = EPROTO;
+			err = -EINVAL;
+			rcu_read_unlock();
+			goto destroy;
+		}
+
+		remote_addr = &remote_info->addr;
+
 		if (pkt->u.size == 0
-		    || pkt->dg.src.context != vsk->remote_addr.svm_cid
-		    || pkt->src_port != vsk->remote_addr.svm_port
+		    || pkt->dg.src.context != remote_addr->svm_cid
+		    || pkt->src_port != remote_addr->svm_port
 		    || !vmci_handle_is_invalid(vmci_trans(vsk)->qp_handle)
 		    || vmci_trans(vsk)->qpair
 		    || vmci_trans(vsk)->produce_size != 0
@@ -1316,9 +1347,10 @@ vmci_transport_recv_connecting_client(struct sock *sk,
 		    || vmci_trans(vsk)->detach_sub_id != VMCI_INVALID_ID) {
 			skerr = EPROTO;
 			err = -EINVAL;
-
+			rcu_read_unlock();
 			goto destroy;
 		}
+		rcu_read_unlock();
 
 		err = vmci_transport_recv_connecting_client_negotiate(sk, pkt);
 		if (err) {
@@ -1379,6 +1411,7 @@ static int vmci_transport_recv_connecting_client_negotiate(
 	int err;
 	struct vsock_sock *vsk;
 	struct vmci_handle handle;
+	unsigned int remote_cid;
 	struct vmci_qp *qpair;
 	u32 detach_sub_id;
 	bool is_local;
@@ -1449,19 +1482,23 @@ static int vmci_transport_recv_connecting_client_negotiate(
 
 	/* Make VMCI select the handle for us. */
 	handle = VMCI_INVALID_HANDLE;
-	is_local = vsk->remote_addr.svm_cid == vsk->local_addr.svm_cid;
+
+	err = vsock_remote_addr_cid(vsk, &remote_cid);
+	if (err < 0)
+		goto destroy;
+
+	is_local = remote_cid == vsk->local_addr.svm_cid;
 	flags = is_local ? VMCI_QPFLAG_LOCAL : 0;
 
 	err = vmci_transport_queue_pair_alloc(&qpair,
 					      &handle,
 					      pkt->u.size,
 					      pkt->u.size,
-					      vsk->remote_addr.svm_cid,
+					      remote_cid,
 					      flags,
 					      vmci_transport_is_trusted(
 						  vsk,
-						  vsk->
-						  remote_addr.svm_cid));
+						  remote_cid));
 	if (err < 0)
 		goto destroy;
 
@@ -1692,6 +1729,7 @@ static int vmci_transport_dgram_bind(struct vsock_sock *vsk,
 }
 
 static int vmci_transport_dgram_enqueue(
+	const struct vsock_transport *transport,
 	struct vsock_sock *vsk,
 	struct sockaddr_vm *remote_addr,
 	struct msghdr *msg,
@@ -2052,7 +2090,13 @@ static struct vsock_transport vmci_transport = {
 
 static bool vmci_check_transport(struct vsock_sock *vsk)
 {
-	return vsk->transport == &vmci_transport;
+	bool retval;
+
+	rcu_read_lock();
+	retval = vsock_core_get_transport(vsk) == &vmci_transport;
+	rcu_read_unlock();
+
+	return retval;
 }
 
 static void vmci_vsock_transport_cb(bool is_host)
diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c
index a3c97546ab84..4d811c9cdf6e 100644
--- a/net/vmw_vsock/vsock_bpf.c
+++ b/net/vmw_vsock/vsock_bpf.c
@@ -148,6 +148,7 @@ static void vsock_bpf_check_needs_rebuild(struct proto *ops)
 
 int vsock_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 {
+	const struct vsock_transport *transport;
 	struct vsock_sock *vsk;
 
 	if (restore) {
@@ -157,10 +158,15 @@ int vsock_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore
 	}
 
 	vsk = vsock_sk(sk);
-	if (!vsk->transport)
+
+	rcu_read_lock();
+	transport = vsock_core_get_transport(vsk);
+	rcu_read_unlock();
+
+	if (!transport)
 		return -ENODEV;
 
-	if (!vsk->transport->read_skb)
+	if (!transport->read_skb)
 		return -EOPNOTSUPP;
 
 	vsock_bpf_check_needs_rebuild(psock->sk_proto);

-- 
2.30.2

