From: Quentin Perret <qperret@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Jia He <justin.he@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH] KVM: arm64: Fix unaligned addr case in mmu walking
Date: Wed, 3 Mar 2021 11:08:56 +0000
Message-ID: <YD9uSPQtlP2WTObe@google.com>
In-Reply-To: <87sg5czhny.wl-maz@kernel.org>

On Wednesday 03 Mar 2021 at 09:54:25 (+0000), Marc Zyngier wrote:
> Hi Jia,
> 
> On Wed, 03 Mar 2021 02:42:25 +0000,
> Jia He <justin.he@arm.com> wrote:
> > 
> > If the start addr is not aligned with the granule size of that level,
> > the loop step size should be adjusted to the boundary instead of a simple
> > kvm_granule_size(level) increment. Otherwise, some mmu entries might miss
> > the chance to be walked through.
> > E.g. Assume the unmap range [data->addr, data->end] is
> > [0xff00ab2000,0xff00cb2000] in level 2 walking and NOT block mapping.
> 
> When does this occur? Upgrade from page mappings to block? Swap out?
> 
> > And the 1st part of that pmd entry is [0xff00ab2000,0xff00c00000]. The
> > pmd value is 0x83fbd2c1002 (not valid entry). In this case, data->addr
> > should be adjusted to 0xff00c00000 instead of 0xff00cb2000.
> 
> Let me see if I understand this. Assuming 4k pages, the region
> described above spans *two* 2M entries:
> 
> (a) ff00ab2000-ff00c00000, part of ff00a00000-ff00c00000
> (b) ff00c00000-ff00cb2000, part of ff00c00000-ff00e00000
> 
> (a) has no valid mapping, but (b) does. Because we fail to correctly
> align on a block boundary when skipping (a), we also skip (b), which
> is then left mapped.
> 
> Did I get it right? If so, yes, this is... annoying.
> 
> Understanding the circumstances this triggers in would be most
> interesting. This current code seems to assume that we get ranges
> aligned to mapping boundaries, but I seem to remember that the old
> code did use the stage2_*_addr_end() helpers to deal with this case.
> 
> Will: I don't think things have changed in that respect, right?

Indeed we should still use stage2_*_addr_end(), especially in the unmap
path that is mentioned here, so it would be helpful to have a little bit
more context.
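
For reference, a minimal sketch of the boundary-clamping step that the
*_addr_end() helpers perform, modelled on the generic pgd_addr_end()
pattern. The name addr_end() and its size parameter are illustrative
assumptions, not the actual stage2 helpers:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): return the next size-aligned
 * boundary after addr, clamped to end. size must be a power of two.
 */
static uint64_t addr_end(uint64_t addr, uint64_t end, uint64_t size)
{
	uint64_t boundary = (addr + size) & ~(size - 1);

	/* The "- 1" on both sides keeps the comparison valid if a value wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}
```

With a 2M size, a walk starting at 0xff00ab2000 and ending at
0xff00cb2000 would step to 0xff00c00000 first and then to the end of
the range, so the second entry is still visited.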

> > Without this fix, userspace "segmentation fault" errors can be easily
> > triggered by running simple gVisor runsc cases on an Ampere Altra
> > server:
> >     docker run --runtime=runsc -it --rm  ubuntu /bin/bash
> > 
> > In container:
> >     for i in `seq 1 100`;do ls;done
> 
> The workload on its own isn't that interesting. What I'd like to
> understand is what happens on the host during that time.
> 
> > 
> > Reported-by: Howard Zhang <Howard.Zhang@arm.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index bdf8e55ed308..4d99d07c610c 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -225,6 +225,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> >  		goto out;
> >  
> >  	if (!table) {
> > +		data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
> >  		data->addr += kvm_granule_size(level);
> >  		goto out;
> >  	}
> 
> It otherwise looks good to me. Quentin, Will: unless you object to
> this, I plan to take it in the next round of fixes with

Though I'm still unsure how we hit that today, the change makes sense on
its own I think, so no objection from me.
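
To make the arithmetic concrete, here is a small standalone sketch of
the step computation with and without the ALIGN_DOWN() from the patch.
GRANULE_2M, align_down() and the next_addr_*() names are stand-ins
assumed for illustration (4k pages, level 2); they are not taken from
pgtable.c:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for kvm_granule_size(level): 2M at level 2 with 4k pages. */
#define GRANULE_2M (UINT64_C(1) << 21)

/* Round x down to a multiple of a; a must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/* Step without the patch: plain granule increment from an unaligned addr. */
static uint64_t next_addr_old(uint64_t addr, uint64_t granule)
{
	return addr + granule;
}

/* Step with the patch: align down first, then advance by one granule. */
static uint64_t next_addr_patched(uint64_t addr, uint64_t granule)
{
	return align_down(addr, granule) + granule;
}
```

Starting from 0xff00ab2000, next_addr_old() lands on 0xff00cb2000 (the
end of the range in the example), while next_addr_patched() lands on
0xff00c00000, the start of the second 2M entry.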

Thanks,
Quentin
