Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number


From: Paolo Bonzini <pbonzini@redhat.com>
To: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Cc: gleb@kernel.org, avi.kivity@gmail.com, mtosatti@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	stable@vger.kernel.org, David Matlack <dmatlack@google.com>
Subject: Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Date: Mon, 18 Aug 2014 20:47:37 +0200
Message-ID: <53F24A49.2010807@redhat.com>
In-Reply-To: <9AD43423-2FF3-422D-A5AD-61CAE6339CCC@linux.vnet.ibm.com>

On 18/08/2014 18:35, Xiao Guangrong wrote:
> 
> Hi Paolo,
> 
> Thank you for reviewing the patch!
> 
> On Aug 18, 2014, at 9:57 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 14/08/2014 09:01, Xiao Guangrong wrote:
>>> -	update_memslots(slots, new, kvm->memslots->generation);
>>> +	/* ensure generation number is always increased. */
>>> +	slots->generation = old_memslots->generation;
>>> +	update_memslots(slots, new);
>>> 	rcu_assign_pointer(kvm->memslots, slots);
>>> 	synchronize_srcu_expedited(&kvm->srcu);
>>> +	slots->generation++;
>>
>> I don't trust my brain enough to review this patch.
> 
> Sorry for the confusion. I should have explained it more clearly.

Don't worry, it's not your fault. :)

>> kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
>> patch I trust myself reviewing would change a bunch of functions in
>> kvm_main.c to take a memslots struct.  This would make it easy to
>> respect the hard and fast rule of not dereferencing the same pointer
>> twice.  But it would be a tedious change.
> 
> kvm_set_memory_region is the only place that updates memslots, and
> kvm_current_mmio_generation accesses memslots via rcu_dereference, so
> I do not see why other places need to be taken into account.

The race occurs because gfn_to_pfn_many_atomic or some other function
has already used kvm_memslots().  Calling kvm_memslots() twice is the
root cause of the bug.
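A minimal user-space sketch (not KVM code) of why dereferencing the memslots pointer twice is race-prone. `struct memslots`, `struct mmio_cache`, `published`, `concurrent_update`, `lookup_twice()` and `lookup_once()` are hypothetical names; the concurrent ioctl is modelled as a hook called between the two reads rather than as a real second thread, and no RCU/SRCU is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_memslots: a generation number
 * plus whether some slot covers the faulting gfn. */
struct memslots {
	unsigned long generation;
	int gfn_is_mapped;	/* 0: no slot, so the access looks like MMIO */
};

/* What a page fault would cache: the lookup result and its generation. */
struct mmio_cache {
	int gfn_is_mapped;
	unsigned long generation;
};

/* Hypothetical equivalent of kvm->memslots, the published pointer. */
static struct memslots *published;

/* Hook standing in for a concurrent ioctl publishing new slots. */
static void (*concurrent_update)(void);

static struct memslots old_slots = { .generation = 4, .gfn_is_mapped = 0 };
static struct memslots new_slots = { .generation = 6, .gfn_is_mapped = 1 };

static void publish_new_slots(void)
{
	published = &new_slots;
}

/* Race-prone pattern: the published pointer is dereferenced twice. */
static struct mmio_cache lookup_twice(void)
{
	struct mmio_cache c;

	c.gfn_is_mapped = published->gfn_is_mapped;	/* first kvm_memslots() */
	if (concurrent_update)
		concurrent_update();
	c.generation = published->generation;		/* second kvm_memslots() */
	return c;
}

/* Safer pattern: dereference once and derive everything from that snapshot. */
static struct mmio_cache lookup_once(void)
{
	struct memslots *slots = published;		/* single dereference */
	struct mmio_cache c;

	assert(slots != NULL);
	if (concurrent_update)
		concurrent_update();
	c.gfn_is_mapped = slots->gfn_is_mapped;
	c.generation = slots->generation;
	return c;
}
```

In `lookup_twice()` the cached entry pairs the old "no slot" answer with the new generation, so it would later look current; `lookup_once()` keeps both values from the same snapshot.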

> I think this patch is auditable: the page fault path always holds the
> srcu lock, so a page fault can't straddle synchronize_srcu_expedited.
> Only these cases can happen:
> 
> 1) The page fault occurs before synchronize_srcu_expedited.
>    In this case, the vcpu will generate an mmio-exit for the memslot being
>    registered by the ioctl. That's ok since the ioctl has not finished.
> 
> 2) The page fault occurs after synchronize_srcu_expedited and while the
>    generation number is being increased.
>    In this case, userspace may get a wrong mmio-exit (this happens if
>    handling the page fault is slower than the ioctl); that's ok too, since
>    userspace needs to do the check anyway, as I said above.
> 
> 3) The page fault occurs after the generation number update.
>    That's definitely correct. :)
> 
>> Another alternative could be to use the low bit to mark an in-progress
>> change, and skip the caching if the low bit is set.  Similar to a
>> seqcount (except if read_seqcount_retry fails, we just punt and do not
>> retry anything), you could use it even though the memory barriers
>> provided by write_seqcount_begin/end are not too useful in this case.
> 
> I do not see how the bit would work: a page fault may cache the memslot
> before the bit is set and cache the generation number after the bit is set.
> 
> Maybe I missed your idea; could you please explain it in more detail?

Something like this:

-	update_memslots(slots, new, kvm->memslots->generation);
+	/* ensure generation number is always increased. */
+	slots->generation = old_memslots->generation + 1;
+	update_memslots(slots, new);
	rcu_assign_pointer(kvm->memslots, slots);
	synchronize_srcu_expedited(&kvm->srcu);
+	slots->generation++;

Then cases 1 and 2 will just have a cache miss.
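A user-space sketch of the update ordering in the diff above, assuming a page fault that runs between rcu_assign_pointer() and the final increment. `install_memslots()`, `fault_records_generation()` and `cache_hit()` are hypothetical helpers, and synchronize_srcu_expedited() is only marked by a comment, not modelled.

```c
#include <assert.h>

struct memslots {
	unsigned long generation;
};

/* Hypothetical equivalent of kvm->memslots. */
static struct memslots *published;

/* Generation recorded by a page fault racing with the update. */
static unsigned long fault_generation_during_update;

/* Stand-in for a vCPU page fault that records the current generation. */
static void fault_records_generation(void)
{
	fault_generation_during_update = published->generation;
}

/* Sketch of the ordering in the diff; not the real kvm_main.c code. */
static void install_memslots(struct memslots *old, struct memslots *slots,
			     void (*racing_fault)(void))
{
	assert(published == old);	/* caller passes the current slots */
	slots->generation = old->generation + 1;	/* in-progress value */
	/* update_memslots(slots, new) would go here */
	published = slots;				/* rcu_assign_pointer() */
	racing_fault();					/* fault inside the window */
	/* synchronize_srcu_expedited(&kvm->srcu) */
	slots->generation++;				/* final value: old + 2 */
}

/* A cached entry is only used if its generation matches the current one. */
static int cache_hit(unsigned long cached_gen)
{
	return cached_gen == published->generation;
}
```

Whatever generation such a fault records (the old value or the in-progress one), it differs from the final value, so the cached entry is simply never hit.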

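The "low bit" is really just a consequence of each slot update doing 2
generation increases: if the generation starts out even, it is odd while
an update is in progress and even again once the update completes.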

Paolo
