From: Avi Kivity <avi@redhat.com>
To: liu ping fan <qemulist@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Anthony Liguori <anthony@codemonkey.ws>,
Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [Qemu-devel] [PATCH V3 10/11] vcpu: introduce lockmap
Date: Wed, 19 Sep 2012 11:01:53 +0300 [thread overview]
Message-ID: <50597BF1.9070703@redhat.com> (raw)
In-Reply-To: <CAJnKYQmaKQXW4LEnciAMG0g=X-zSJtmJZfiW4BWpRMD87zLF5A@mail.gmail.com>
On 09/17/2012 05:24 AM, liu ping fan wrote:
> On Thu, Sep 13, 2012 at 4:19 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 09/13/2012 09:55 AM, liu ping fan wrote:
>>> On Tue, Sep 11, 2012 at 8:41 PM, Avi Kivity <avi@redhat.com> wrote:
>>>> On 09/11/2012 03:24 PM, Avi Kivity wrote:
>>>>> On 09/11/2012 12:57 PM, Jan Kiszka wrote:
>>>>>> On 2012-09-11 11:44, liu ping fan wrote:
>>>>>>> On Tue, Sep 11, 2012 at 4:35 PM, Avi Kivity <avi@redhat.com> wrote:
>>>>>>>> On 09/11/2012 10:51 AM, Liu Ping Fan wrote:
>>>>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>>>>>
>>>>>>>>> The func call chain can suffer from recursively hold
>>>>>>>>> qemu_mutex_lock_iothread. We introduce lockmap to record the
>>>>>>>>> lock depth.
>>>>>>>>
>>>>>>>> What is the root cause? io handlers initiating I/O?
>>>>>>>>
>>>>>>> cpu_physical_memory_rw() can be called nested, and when called, it can
>>>>>>> be protected by no-lock/device lock/ big-lock.
>>>>>>> I think without big lock, io-dispatcher should face the same issue.
>>>>>>> As to main-loop, have not carefully consider, but at least, dma-helper
>>>>>>> will call cpu_physical_memory_rw() with big-lock.
>>>>>>
>>>>>> That is our core problem: inconsistent invocation of existing services
>>>>>> /wrt locking. For portio, I was lucky that there is no nesting and I was
>>>>>> able to drop the big lock around all (x86) call sites. But MMIO is way
>>>>>> more tricky due to DMA nesting.
>>>>>
>>>>> Maybe we need to switch to a continuation style. Instead of expecting
>>>>> cpu_physical_memory_rw() to complete synchronously, it becomes an
>>>>> asynchronous call and you provide it with a completion. That means
>>>>> devices which use it are forced to drop the lock in between. Block and
>>>>> network clients will be easy to convert since they already use APIs that
>>>>> drop the lock (except for accessing the descriptors).
>>>>>
>>>>>> We could try to introduce a different version of cpu_physical_memory_rw,
>>>>>> cpu_physical_memory_rw_unlocked. But the problem remains that an MMIO
>>>>>> request can trigger the very same access in a nested fashion, and we
>>>>>> will have to detect this to avoid locking up QEMU (locking up the guest
>>>>>> might be OK).
>>>>>
>>>>> An async version of c_p_m_rw() will just cause a completion to bounce
>>>>> around, consuming cpu but not deadlocking anything. If we can keep a
>>>>> count of the bounces, we might be able to stall it indefinitely or at
>>>>> least ratelimit it.
>>>>>
>>>>
>>>> Another option is to require all users of c_p_m_rw() and related to use
>>>> a coroutine or thread. That makes the programming easier (but still
>>>> required a revalidation after the dropped lock).
>>>>
>>> For the nested cpu_physical_memory_rw(), we change it internal but
>>> keep it sync API as it is. (Wrapper current cpu_physical_memory_rw()
>>> into cpu_physical_memory_rw_internal() )
>>>
>>>
>>> LOCK() // can be device or big lock or both, depend on caller.
>>> ..............
>>> cpu_physical_memory_rw()
>>> {
>>> UNLOCK() //unlock all the locks
>>> queue_work_on_thread(cpu_physical_memory_rw_internal, completion);
>>> // cpu_physical_memory_rw_internal can take lock(device,biglock) again
>>> wait_for_completion(completion)
>>> LOCK()
>>> }
>>> ..................
>>> UNLOCK()
>>>
>>
>> This is dangerous. The caller expects to hold the lock across the call,
>> and will not re-validate its state.
>>
>>> Although cpu_physical_memory_rw_internal() can be freed to use lock,
>>> but with precondition. -- We still need to trace lock stack taken by
>>> cpu_physical_memory_rw(), so that it can return to caller correctly.
>>> Is that OK?
>>
>> I'm convinced that we need a recursive lock if we don't convert
>> everything at once.
>>
>> So how about:
>>
>> - make bql a recursive lock (use pthreads, don't invent our own)
>> - change c_p_m_rw() calling convention to "caller may hold the BQL, but
>> no device lock"
>>
>> this allows devices to DMA each other. Unconverted device models will
>> work as they used to. Converted device models will need to drop the
>> device lock, re-acquire it, and revalidate any state. That will cause
>
> I think we are cornered by devices to DMA each other, and it raises
> the unavoidable nested lock. Also to avoid deadlock caused by device's
> lock sequence, we should drop the current device lock to acquire
> another one. The only little diverge is about the "revalidate", do
> we need to rollback?
It basically means you can't hold contents of device state in local
variables. You need to read everything again from the device. That
includes things like DMA enable bits.
(btw: to we respect PCI_COMMAND_MASTER? it seems that we do not. This
is a trivial example of an iommu, we should get that going).
> Or for converted device, we can just tag it a
> busy flag, that is check&set busy flag at the entry of device's
> mmio-dispatch. So when re-acquire device's lock, the device's state
> is intact.
The state can be changed by a parallel access to another register, which
is valid.
>
> Has anybody other suggestion?
--
error compiling committee.c: too many arguments to function
next prev parent reply other threads:[~2012-09-19 8:02 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-11 7:51 [Qemu-devel] [PATCH V3 0/10] prepare unplug out of protection of global lock Liu Ping Fan
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 01/11] atomic: introduce atomic operations Liu Ping Fan
2012-09-11 8:04 ` Avi Kivity
2012-09-13 6:54 ` liu ping fan
2012-09-13 8:14 ` Avi Kivity
2012-09-13 8:19 ` Paolo Bonzini
2012-09-13 8:23 ` Avi Kivity
2012-09-13 8:29 ` Paolo Bonzini
2012-09-13 8:45 ` liu ping fan
2012-09-19 13:16 ` Jamie Lokier
2012-09-19 13:32 ` Jamie Lokier
2012-09-19 14:12 ` Peter Maydell
2012-09-19 15:53 ` Jamie Lokier
2012-09-11 8:15 ` Peter Maydell
2012-09-13 6:53 ` liu ping fan
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 02/11] qom: apply atomic on object's refcount Liu Ping Fan
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 03/11] hotplug: introduce qdev_unplug_complete() to remove device from views Liu Ping Fan
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 04/11] pci: remove pci device from mem view when unplug Liu Ping Fan
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 05/11] memory: introduce ref, unref interface for MemoryRegionOps Liu Ping Fan
2012-09-11 8:08 ` Avi Kivity
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 06/11] memory: make mmio dispatch able to be out of biglock Liu Ping Fan
2012-09-11 8:25 ` Avi Kivity
2012-09-11 8:47 ` Avi Kivity
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 07/11] memory: implement e1000's MemoryRegionOps ref/unref Liu Ping Fan
2012-09-11 8:37 ` Avi Kivity
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 08/11] qom: introduce reclaimer to release obj in async Liu Ping Fan
2012-09-11 8:32 ` Avi Kivity
2012-09-11 9:32 ` liu ping fan
2012-09-11 9:37 ` Avi Kivity
2012-09-13 6:54 ` liu ping fan
2012-09-13 8:45 ` Avi Kivity
2012-09-13 9:59 ` liu ping fan
2012-09-13 10:09 ` Avi Kivity
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 09/11] vcpu: make QemuThread as tls to store thread-self info Liu Ping Fan
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 10/11] vcpu: introduce lockmap Liu Ping Fan
2012-09-11 8:35 ` Avi Kivity
2012-09-11 9:44 ` liu ping fan
2012-09-11 9:54 ` Avi Kivity
2012-09-11 10:04 ` Jan Kiszka
2012-09-11 11:03 ` Avi Kivity
2012-09-11 11:08 ` Jan Kiszka
2012-09-11 12:20 ` Avi Kivity
2012-09-11 12:25 ` Jan Kiszka
2012-09-11 12:30 ` Avi Kivity
2012-09-11 12:35 ` Jan Kiszka
2012-09-11 12:39 ` Avi Kivity
2012-09-19 4:25 ` Peter Crosthwaite
2012-09-19 4:32 ` Edgar E. Iglesias
2012-09-19 4:40 ` Peter Crosthwaite
2012-09-19 7:55 ` Avi Kivity
2012-09-19 11:46 ` Edgar E. Iglesias
2012-09-19 12:12 ` Avi Kivity
2012-09-19 12:17 ` Edgar E. Iglesias
2012-09-19 13:01 ` Igor Mitsyanko
2012-09-19 13:03 ` Avi Kivity
2012-09-19 7:57 ` Jan Kiszka
2012-09-19 13:07 ` Igor Mitsyanko
2012-09-11 9:57 ` Jan Kiszka
2012-09-11 12:24 ` Avi Kivity
2012-09-11 12:41 ` Avi Kivity
2012-09-11 14:54 ` Marcelo Tosatti
2012-09-13 6:55 ` liu ping fan
2012-09-13 6:55 ` liu ping fan
2012-09-13 8:19 ` Avi Kivity
2012-09-17 2:24 ` liu ping fan
2012-09-19 8:01 ` Avi Kivity [this message]
2012-09-19 8:36 ` liu ping fan
2012-09-19 9:05 ` Avi Kivity
2012-09-20 7:51 ` liu ping fan
2012-09-20 9:15 ` Avi Kivity
2012-09-21 7:27 ` liu ping fan
2012-09-11 7:51 ` [Qemu-devel] [PATCH V3 11/11] vcpu: push mmio dispatcher out of big lock Liu Ping Fan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50597BF1.9070703@redhat.com \
--to=avi@redhat.com \
--cc=anthony@codemonkey.ws \
--cc=jan.kiszka@siemens.com \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemulist@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).