qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: Jan Kiszka <jan.kiszka@web.de>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	liu ping fan <qemulist@gmail.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [RFC] use little granularity lock to substitue qemu_mutex_lock_iothread
Date: Fri, 22 Jun 2012 17:56:30 -0500	[thread overview]
Message-ID: <4FE4F81E.2070007@codemonkey.ws> (raw)
In-Reply-To: <4FE4F163.6060805@web.de>

On 06/22/2012 05:27 PM, Jan Kiszka wrote:
> On 2012-06-22 23:44, Anthony Liguori wrote:
>> 1) unlock iothread before entering the do {} look in kvm_cpu_exec()
>>     a) reacquire the lock after the loop
>>     b) reacquire the lock in kvm_handle_io()
>>     c) introduce an unlocked memory accessor that for now, just requires
>> the iothread lock() and calls cpu_physical_memory_rw()
>
> Right, that's what we have here as well. The latter is modeled as a so
> called "I/O pathway", a thread-based execution context for
> frontend/backend pairs with some logic to transfer certain I/O requests
> asynchronously to the pathway thread.

Interesting, so the VCPU threads always hold the iothread mutex but some 
requests are routed to other threads?

I hadn't considered a design like that.  I've been thinking about a long term 
architecture that's a bit more invasive.

What we think of the I/O thread today wouldn't be special.  It would be one of N 
I/O threads all running separate copies of the main loop.  All of the functions 
that defer dispatch to a main loop would take a context as an argument and 
devices would essentially have a "vector" array of main loops as input.

So virtio-net probably would have two main loop "vectors" since it would like to 
schedule tx and rx independently.  There's nothing that says that you can't pass 
the same main loop context for each vector but that's a configuration choice.

Dispatch from VCPU context would behave the same it does today but obviously 
per-device locking is needed.

> The tricky part was to get nested requests right, i.e. when a requests
> triggers another one from within the device model. This is where things
> get ugly. In theory, you can end up with a vm deadlock if you just apply
> per-device locking. I'm currently trying to rebase our patches, review
> and document the logic behind it.

I really think the only way to solve this is to separate map()'d DMA access 
(where the device really wants to deal with RAM only) and copy-based access 
(where devices map DMA to other devices).

For copy-based access, we really ought to move to a callback based API.  It adds 
quite a bit of complexity but it's really the only way to solve the problem 
robustly.

>> 2) focus initially on killing the lock in kvm_handle_io()
>>     a) the ioport table is pretty simplistic so adding fine grain locking
>> won't be hard.
>>     b) reacquire lock right before ioport dispatch
>>
>> 3) allow for register ioport handlers w/o the dispatch function carrying
>> a iothread
>>     a) this is mostly memory API plumbing
>
> We skipped this as our NICs didn't do PIO, but you clearly need it for
> virtio.

Right.

>> 4) focus on going back and adding fine grain locking to the
>> cpu_physical_memory_rw() accessor
>
> In the end, PIO and MMIO should use the same patterns - and will face
> the same challenges. Ideally, we model things very similar right from
> the start.

Yes.

> And then there is also
>
> 5) provide direct IRQ delivery from the device model to the IRQ chip.
> That's much like what we need for VFIO and KVM device assignment. But
> here we won't be able to cheat and ignore correct generation of vmstates
> of the bypassed PCI host bridges etc... Which leads me to that other
> thread about how to handle this for PCI device pass-through.
> Contributions to that discussion are welcome as well.

I think you mean to the in-kernel IRQ chip.  I'm thinking about this still so I 
don't have a plan yet that I'm ready to share.  I have some ideas though.

>
>>
>> Note that whenever possible, we should be using rwlocks instead of a
>> normal mutex.  In particular, for the ioport data structures, a rwlock
>> seems pretty obvious.
>
> I think we should mostly be fine with a "big hammer" rwlock: unlocked
> read access from VCPUs and iothreads, and vmstop/resume around
> modifications of fast path data structures (like the memory region
> hierarchy or the PIO table).

Ack.

> Where that's not sufficient, RCU will be
> needed. Sleeping rwlocks have horrible semantics (specifically when
> thread priorities come into play) and are performance-wise inferior. We
> should avoid them completely.

Yes, I think RCU is inevitable here but I think starting with rwlocks will help 
with the big refactoring.

>>
>> To be clear, I'm not advocating introducing cpu_lock.  We should do
>> whatever makes the most sense to not have to hold iothread lock while
>> processing an exit from KVM.
>
> Good that we agree. :)
>
>>
>> Note that this is an RFC, the purpose of this series is to have this
>> discussion :-)
>
> Yep, I think we have it now ;). Hope I can contribute some code bits to
> it soon, though I didn't schedule this task for the next week.

Great!  If you have something you can share, I'd be eager to look at it 
regardless of the condition of the code.

Regards,

Anthony Liguori

>
> Jan
>

  reply	other threads:[~2012-06-22 22:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1340290158-11036-1-git-send-email-qemulist@gmail.com>
     [not found] ` <1340290158-11036-3-git-send-email-qemulist@gmail.com>
2012-06-21 15:02   ` [Qemu-devel] [PATCH 2/2] kvm: use per-cpu lock to free vcpu thread out of the big lock Jan Kiszka
2012-06-22 20:06   ` Anthony Liguori
2012-06-23  9:32     ` liu ping fan
2012-06-24 14:09       ` liu ping fan
2012-06-21 15:23 ` [Qemu-devel] [RFC] use little granularity lock to substitue qemu_mutex_lock_iothread Jan Kiszka
2012-06-22 10:24   ` liu ping fan
2012-06-22 10:37     ` Jan Kiszka
2012-06-22 20:11       ` Anthony Liguori
2012-06-22 21:14         ` Jan Kiszka
2012-06-22 21:44           ` Anthony Liguori
2012-06-22 22:27             ` Jan Kiszka
2012-06-22 22:56               ` Anthony Liguori [this message]
2012-06-23  9:10                 ` Jan Kiszka
     [not found] ` <1340290158-11036-2-git-send-email-qemulist@gmail.com>
2012-06-22 20:01   ` [Qemu-devel] [PATCH 1/2] CPUArchState: introduce per-cpu lock Anthony Liguori
2012-06-21 15:06 [Qemu-devel] [RFC] use little granularity lock to substitue qemu_mutex_lock_iothread Liu Ping Fan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FE4F81E.2070007@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=jan.kiszka@web.de \
    --cc=qemu-devel@nongnu.org \
    --cc=qemulist@gmail.com \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).