Re: Reset problem vs. MMIO emulation, hypercalls, etc...

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: David Gibson <dwg@au1.ibm.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	Avi Kivity <avi@redhat.com>,
	kvm@vger.kernel.org, Alexander Graf <agraf@suse.de>,
	Paul Mackerras <paulus@au1.ibm.com>,
	kvm-ppc@vger.kernel.org
Subject: Re: Reset problem vs. MMIO emulation, hypercalls, etc...
Date: Mon, 6 Aug 2012 13:13:44 +1000	[thread overview]
Message-ID: <20120806031344.GG16664@truffala.fritz.box> (raw)
In-Reply-To: <1344033008.24037.67.camel@pasglop>

On Sat, Aug 04, 2012 at 08:30:08AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2012-08-03 at 14:41 -0300, Marcelo Tosatti wrote:
> 
> > > Hrm, except that doing KVM_RUN with a signal is very cumbersome to do
> > > and I couldn't quite find the logic in qemu to do it ... but I might
> > > just have missed it. I can see indeed that in the migration case you
> > > want to actually complete the operation rather than just "abort it".
> > > 
> > > Any chance you can point me to the code that performs that trick qemu
> > > side for migration ?
> > 
> > kvm-all.c:
> > 
> >         kvm_arch_pre_run(env, run);
> >         if (env->exit_request) {
> >             DPRINTF("interrupt exit requested\n");
> >             /*
> >              * KVM requires us to reenter the kernel after IO exits to
> >              * complete
> >              * instruction emulation. This self-signal will ensure that
> >              * we
> >              * leave ASAP again.
> >              */
> >             qemu_cpu_kick_self();
> >         }
> > 
> > 
> > > Anthony seems to think that for reset we can just abort the operation
> > > state in the kernel when the MP state changes.
> 
> Ok, I see now, this is scary really... set a flag somewhere, pray that
> we are in the right part of the loop on elsewhere... oh well.
> 
> In fact there's some fun (through harmless) bits to be found, look at
> x86 for example:
> 
>         if (env->exception_injected == EXCP08_DBLE) {
>             /* this means triple fault */
>             qemu_system_reset_request();
>             env->exit_request = 1;
>             return 0;
>         }
> 
> Well, qemu_system_reset_request() calls cpu_stop_current() which calls
> cpu_exit() which also does:
> 
>    env->exit_request = 1;
>  
> So obviously it will be well set by that time :-)
> 
> Now I can see how having it set during kvm_arch_process_async_events()
> will work as this is called right before the run loop. Similarily,
> EXIT_MMIO and EXIT_IO would work so they are a non issue.
> 
> Our problem I suspect with PAPR doing resets via hcalls is that
> our kvm_arch_handle_exit() does:
> 
>     case KVM_EXIT_PAPR_HCALL:
>         dprintf("handle PAPR hypercall\n");
>         run->papr_hcall.ret = spapr_hypercall(env, run->papr_hcall.nr,
>                                               run->papr_hcall.args);
>         ret = 1;
>         break;
> 
> See the ret = 1 here ? That means that the caller (kvm_cpu_exec.c main
> loop) will exit immediately upon return. If we had instead returned 0,
> it would have looped, seen the "exit_requested" set by
> qemu_system_reset_request(), which would have then done the signal + run
> trick, completed the hypercall, and returned this time in a clean state.
> 
> So it seems (I don't have the machine at hand to test right now) that by
> just changing that ret = 1 to ret = 0, we might be fixing our problem,
> including the case where another vcpu is taking a hypercall at the time
> of the reset (this other CPU will also need to complete the operation
> before stopping).
> 
> David, is there any reason why you did ret = 1 to begin with ? For
> things like reset etc... the exit will happen as a result of
> env->exit_requested being set by cpu_exit().

Erm, there might have been, but I don't remember what it was.

> Are there other cases where you wish the hcall to go back to the main
> loop ? In that case, shouldn't it set env->exit_requested rather than
> returning 1 at that point ? (H_CEDE for example ?)

So, I'm still trying to nut out the implications for H_CEDE, and think
if there are any other hypercalls that might want to block the guest
for a time.  We were considering blocking H_PUT_TCE if qemu devices
had active dma maps on the previously mapped iovas.  I'm not sure if
the discussions that led to the inclusion of the qemu IOMMU code
decided that was wholly unnnecessary or just not necessary for the
time being.

> Paul, David, I don't have time to test that before Tuesday (I'm away on
> monday) but if you have a chance, revert the kernel change we did and
> try this qemu patch for reset:
> 
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -766,7 +766,7 @@ int kvm_arch_handle_exit(CPUPPCState *env, struct
> kvm_run *r
>          dprintf("handle PAPR hypercall\n");
>          run->papr_hcall.ret = spapr_hypercall(env, run->papr_hcall.nr,
>                                                run->papr_hcall.args);
> -        ret = 1;
> +        ret = 0;
>          break;
>  #endif
>      default:

So, usually I think ret = 0 is correct, it lets us handle the
hypercall and get back to the guest without further fuss.  I think
right now, ret = 0 is always correct, since H_CEDE is always handled
in kernel.  But if we ever have qemu implemented hypercalls which want
to "block" then we might need a facility to do something else.  Maybe,
like I say, I haven't convinved myself I've thought of all the
implications yet.

> It might also need something like:
> 
> diff --git a/hw/spapr_hcall.c b/hw/spapr_hcall.c
> index a5990a9..d4d7fb0 100644
> --- a/hw/spapr_hcall.c
> +++ b/hw/spapr_hcall.c
> @@ -545,6 +545,7 @@ static target_ulong h_cede(CPUPPCState *env,
> sPAPREnvironmen
>      hreg_compute_hflags(env);
>      if (!cpu_has_work(env)) {
>          env->halted = 1;
> +        env->exit_request = 1;
>      }
>      return H_SUCCESS;
>  }
> 
> Though I don't think H_CEDE ever goes down to qemu, does it ?

No, though it might be worth doing that anyway, just in case we ever
have some whacky case where the H_CEDE does get handed on from the
kernel.

Ok, I'll send a patch and we can worry about the blocking qemu
hypercall case later, since it doesn't exist now.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

next prev parent reply	other threads:[~2012-08-06  3:13 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-01  3:17 Reset problem vs. MMIO emulation, hypercalls, etc Benjamin Herrenschmidt
2012-08-02 10:46 ` Alexander Graf
2012-08-02 12:22   ` Benjamin Herrenschmidt
2012-08-02 12:35 ` Avi Kivity
2012-08-02 12:59   ` Alexander Graf
2012-08-02 13:05     ` Avi Kivity
2012-08-02 20:29       ` Benjamin Herrenschmidt
2012-08-05  8:55         ` Avi Kivity
2012-08-05 20:45           ` Benjamin Herrenschmidt
2012-08-02 20:20   ` Benjamin Herrenschmidt
2012-08-03 17:41     ` Marcelo Tosatti
2012-08-03 18:05       ` Marcelo Tosatti
2012-08-03 22:32         ` Benjamin Herrenschmidt
2012-08-05  9:00           ` Avi Kivity
2012-08-06 20:25             ` Scott Wood
2012-08-07  8:44               ` Avi Kivity
2012-08-06 20:54             ` Benjamin Herrenschmidt
2012-08-03 22:30       ` Benjamin Herrenschmidt
2012-08-06  3:13         ` David Gibson [this message]
2012-08-06 20:57           ` Benjamin Herrenschmidt
2012-08-07  1:32             ` David Gibson
2012-08-07  8:46               ` Avi Kivity
2012-08-07 12:14                 ` David Gibson
2012-08-07 13:13                   ` Avi Kivity
2012-08-07 21:09                     ` Benjamin Herrenschmidt
2012-08-08  8:52                       ` Avi Kivity
2012-08-08  9:27                         ` Benjamin Herrenschmidt
2012-08-08  0:49                     ` David Gibson
2012-08-08  8:58                       ` Avi Kivity
2012-08-08 11:59                         ` David Gibson
2012-08-08 12:42                           ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120806031344.GG16664@truffala.fritz.box \
    --to=dwg@au1.ibm.com \
    --cc=agraf@suse.de \
    --cc=avi@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=paulus@au1.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).