Linux virtualization list

* Re: [PATCH v3 2/3] hvc_init(): Enforce one-time initialization.
From: Miche Baker-Harvey @ 2011-12-06 17:05 UTC
  To: Amit Shah
  Cc: Stephen Rothwell, xen-devel, Konrad Rzeszutek Wilk,
	Benjamin Herrenschmidt, linux-kernel, virtualization,
	Anton Blanchard, Mike Waychison, ppc-dev, Greg Kroah-Hartman,
	Eric Northrup
In-Reply-To: <20111205105452.GB27683@amit-x200.redhat.com>

Amit,

Ah, indeed.  I am not using MSI-X, so virtio_pci::vp_try_to_find_vqs()
calls vp_request_intx() and sets up an interrupt callback.  From
there, when an interrupt occurs, the stack looks something like this:

virtio_pci::vp_interrupt()
  virtio_pci::vp_vring_interrupt()
    virtio_ring::vring_interrupt()
      vq->vq.callback()  <-- in this case, that's virtio_console::control_intr()
        workqueue::schedule_work()
          workqueue::queue_work()
            queue_work_on(get_cpu())  <-- queues the work on the current CPU.

I'm not doing anything to keep multiple control messages from being
sent concurrently to the guest, and we will take those interrupts on
any CPU. I've confirmed that the two instances of
handle_control_message() are occurring on different CPUs.
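
To make the sequence above concrete, here is a toy, single-threaded model
of it in plain C.  It is a sketch under stated assumptions, not kernel
code: it assumes the work item's "pending" flag is cleared just before the
handler starts, so a second interrupt taken on another CPU while the
handler is still running can queue the same item again.  All toy_* names
and the non_reentrant switch are hypothetical and exist only for this
illustration.

```c
#include <stdbool.h>

/* Toy model of one work item shared by all CPUs (NOT kernel code). */
struct toy_work {
    bool pending;      /* set when queued, cleared when a worker picks it up */
    int  queued_cpu;   /* CPU whose queue currently holds the pending item */
    int  running;      /* handler instances executing right now */
    int  running_cpu;  /* CPU of the most recently started instance */
    int  max_running;  /* high-water mark of 'running' */
};

/* Models the interrupt path queueing the work on the current CPU. */
static bool toy_schedule_work(struct toy_work *w, int cpu, bool non_reentrant)
{
    if (w->pending)
        return false;              /* already queued, nothing to do */
    /* Hypothetical guard: if the handler is running, queue on that CPU. */
    if (non_reentrant && w->running > 0)
        cpu = w->running_cpu;
    w->pending = true;
    w->queued_cpu = cpu;
    return true;
}

/* Models the worker on 'cpu' picking up the item and entering the handler. */
static bool toy_worker_start(struct toy_work *w, int cpu)
{
    if (!w->pending || w->queued_cpu != cpu)
        return false;              /* nothing queued for this CPU */
    if (w->running > 0 && w->running_cpu == cpu)
        return false;              /* this CPU's worker is still busy */
    w->pending = false;            /* cleared before the handler body runs */
    w->running++;
    w->running_cpu = cpu;
    if (w->running > w->max_running)
        w->max_running = w->running;
    return true;
}

/* Models one handler instance returning. */
static void toy_worker_finish(struct toy_work *w)
{
    w->running--;
}

/* Replays: irq on CPU 0, handler starts, irq on CPU 1 while it still runs. */
static int toy_max_concurrency(bool non_reentrant)
{
    struct toy_work w = {0};
    bool second;

    toy_schedule_work(&w, 0, non_reentrant);
    toy_worker_start(&w, 0);
    toy_schedule_work(&w, 1, non_reentrant);
    second = toy_worker_start(&w, 1);  /* overlaps the first instance? */
    toy_worker_finish(&w);             /* one instance returns */
    if (second) {
        toy_worker_finish(&w);         /* the overlapping instance returns */
    } else {
        toy_worker_start(&w, 0);       /* deferred item runs afterwards */
        toy_worker_finish(&w);
    }
    return w.max_running;
}
```

In this model, toy_max_concurrency(false) reports two overlapping handler
instances, which matches the trace quoted below, while
toy_max_concurrency(true) reports one.  Whether the real fix belongs in the
workqueue configuration or in a lock inside the handler is a separate
question that the model does not answer.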

Should this work?  I don't see anywhere that QEMU is serializing the
sending of data to the control queue in the guest, and there's no
serialization in control_intr().  I don't understand why you are not
seeing concurrent execution of handle_control_message().  Are you
taking all your interrupts on a single CPU, maybe?  Or is there some
other serialization in user space?

Miche

On Mon, Dec 5, 2011 at 2:54 AM, Amit Shah <amit.shah@redhat.com> wrote:
> On (Tue) 29 Nov 2011 [09:50:41], Miche Baker-Harvey wrote:
>> Good grief!  Sorry for the spacing mess-up!  Here's a resend with reformatting.
>>
>> Amit,
>> We aren't using either QEMU or kvmtool, but we are using KVM.  All
>
> So it's a different userspace?  Any chance this different userspace is
> causing these problems to appear?  Esp. since I couldn't reproduce
> with qemu.
>
>> the issues we are seeing happen when we try to establish multiple
>> virtioconsoles at boot time.  The command line isn't relevant, but I
>> can tell you the protocol that's passing between the host (kvm) and
>> the guest (see the end of this message).
>>
>> We do go through the control_work_handler(), but it's not
>> providing synchronization.  Here's a trace of the
>> control_work_handler() and handle_control_message() calls; note that
>> there are two concurrent calls to control_work_handler().
>
> Ah; how does that happen?  control_work_handler() should just be
> invoked once, and if there are any more pending work items to be
> consumed, they should be done within the loop inside
> control_work_handler().
>
>> I decorated control_work_handler() with a "lifetime" marker, and
>> passed this value to handle_control_message(), so we can see which
>> control messages are being handled from which instance of
>> the control_work_handler() thread.
>>
>> Notice that we enter control_work_handler() a second time before
>> the handling of the second PORT_ADD message is complete. The
>> first CONSOLE_PORT message is handled by the second
>> control_work_handler() call, but the second is handled by the first
>> control_work_handler() call.
>>
>> root@myubuntu:~# dmesg | grep MBH
>> [3371055.808738] control_work_handler #1
>> [3371055.809372] + #1 handle_control_message PORT_ADD
>> [3371055.810169] - handle_control_message PORT_ADD
>> [3371055.810170] + #1 handle_control_message PORT_ADD
>> [3371055.810244]  control_work_handler #2
>> [3371055.810245] + #2 handle_control_message CONSOLE_PORT
>> [3371055.810246]  got hvc_ports_mutex
>> [3371055.810578] - handle_control_message PORT_ADD
>> [3371055.810579] + #1 handle_control_message CONSOLE_PORT
>> [3371055.810580]  trylock of hvc_ports_mutex failed
>> [3371055.811352]  got hvc_ports_mutex
>> [3371055.811370] - handle_control_message CONSOLE_PORT
>> [3371055.816609] - handle_control_message CONSOLE_PORT
>>
>> So, I'm guessing the bug is that there shouldn't be two instances of
>> control_work_handler() running simultaneously?
>
> Yep, I assumed we did that but apparently not.  Do you plan to chase
> this one down?
>
>                Amit
>
