From: Jason Wang <jasowang@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Fam Zheng <famz@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
qemu-devel@nongnu.org,
Alex Williamson <alex.williamson@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Jintack Lim <jintack@cs.columbia.edu>,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH 03/10] intel-iommu: add iommu lock
Date: Sat, 28 Apr 2018 11:11:53 +0800
Message-ID: <cd0a28a9-2082-9c77-0b72-f2451764bd9e@redhat.com>
In-Reply-To: <20180428030601.GJ13269@xz-mi>
On 2018-04-28 11:06, Peter Xu wrote:
> On Sat, Apr 28, 2018 at 10:42:11AM +0800, Jason Wang wrote:
>>
>> On 2018-04-28 10:24, Peter Xu wrote:
>>> On Sat, Apr 28, 2018 at 09:43:54AM +0800, Jason Wang wrote:
>>>> On 2018-04-27 14:26, Peter Xu wrote:
>>>>> On Fri, Apr 27, 2018 at 01:13:02PM +0800, Jason Wang wrote:
>>>>>> On 2018-04-25 12:51, Peter Xu wrote:
>>>>>>> Add a per-iommu big lock to protect IOMMU status. Currently the only
>>>>>>> thing to be protected is the IOTLB cache, since that can be accessed
>>>>>>> even without BQL, e.g., in IO dataplane.
>>>>>>>
>>>>>>> Note that device page tables should not need any protection. The safety
>>>>>>> of that should be provided by guest OS. E.g., when a page entry is
>>>>>>> freed, the guest OS should be responsible to make sure that no device
>>>>>>> will be using that page any more.
>>>>>>>
>>>>>>> Reported-by: Fam Zheng <famz@redhat.com>
>>>>>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>>>>>> ---
>>>>>>>   include/hw/i386/intel_iommu.h |  8 ++++++++
>>>>>>>   hw/i386/intel_iommu.c         | 31 +++++++++++++++++++++++++++++--
>>>>>>>   2 files changed, 37 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
>>>>>>> index 220697253f..1a8ba8e415 100644
>>>>>>> --- a/include/hw/i386/intel_iommu.h
>>>>>>> +++ b/include/hw/i386/intel_iommu.h
>>>>>>> @@ -262,6 +262,14 @@ struct IntelIOMMUState {
>>>>>>>      uint8_t w1cmask[DMAR_REG_SIZE]; /* RW1C(Write 1 to Clear) bytes */
>>>>>>>      uint8_t womask[DMAR_REG_SIZE];  /* WO (write only - read returns 0) */
>>>>>>>      uint32_t version;
>>>>>>> +    /*
>>>>>>> +     * Protects IOMMU states in general. Normally we don't need to
>>>>>>> +     * take this lock when we hold the BQL. However we have code
>>>>>>> +     * paths that may run even without the BQL. In those cases, we
>>>>>>> +     * need to take the lock when we access IOMMU state
>>>>>>> +     * information, e.g., the IOTLB.
>>>>>>> +     */
>>>>>>> +    QemuMutex iommu_lock;
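
(For context, a per-IOMMU lock like this would typically be wrapped in small
helpers and taken around every IOTLB access that can run outside the BQL.
A minimal sketch, assuming the existing vtd_lookup_iotlb() helper; the
lock/unlock wrapper names below are illustrative, not necessarily what the
patch itself adds:

    static inline void vtd_iommu_lock(IntelIOMMUState *s)
    {
        qemu_mutex_lock(&s->iommu_lock);
    }

    static inline void vtd_iommu_unlock(IntelIOMMUState *s)
    {
        qemu_mutex_unlock(&s->iommu_lock);
    }

    /* e.g. in the translate path, which may run without the BQL: */
    vtd_iommu_lock(s);
    iotlb_entry = vtd_lookup_iotlb(s, source_id, addr);
    /* ... read or update the IOTLB cache ... */
    vtd_iommu_unlock(s);
)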
>>>>>> Some questions:
>>>>>>
>>>>>> 1) Do we need to protect context cache too?
>>>>> IMHO the context cache entry should work even without the lock. That's
>>>>> a bit tricky since there are two cases in which this cache is updated:
>>>>>
>>>>> (1) first translation of the address space of a device
>>>>> (2) invalidation of context entries
>>>>>
>>>>> For (2) IMHO we don't need to worry, since the guest OS should be
>>>>> controlling that part; say, a device should not be doing any translation
>>>>> (DMA operations) while its context entry is being invalidated.
>>>>>
>>>>> For (1) the worst case is that the context entry cache gets updated
>>>>> multiple times with the same value by multiple threads. IMHO that'll
>>>>> be fine too.
>>>>>
>>>>> But yes for sure we can protect that too with the iommu lock.
>>>>>
>>>>>> 2) Can we just reuse qemu BQL here?
>>>>> I would prefer not to. As I mentioned, I at least have spent too much
>>>>> time fighting the BQL already. I really hope we can start to use
>>>>> isolated locks where possible. The BQL is always the worst choice to me.
>>>> Just a thought: using the BQL may actually greatly simplify the code
>>>> (considering we don't plan to remove the BQL now).
>>> Frankly speaking I don't understand why using the BQL would greatly
>>> simplify the code... :( IMHO the lock here is really not a complicated one.
>>>
>>> Note that IMO BQL is mostly helpful when we really want something to
>>> be run sequentially with some other things _already_ protected by BQL.
>> Except for the translate path from the dataplane, I believe all the other
>> code paths are already protected by the BQL.
>>
>>> In this case, all the stuff is inside VT-d code itself (or other
>>> IOMMUs), why bother taking the BQL to make our life harder?
>> It looks to me like it's as simple as:
>>
>> @@ -494,6 +494,7 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
>>      IOMMUMemoryRegionClass *imrc;
>>      hwaddr page_mask = (hwaddr)(-1);
>>      hwaddr plen = (hwaddr)(-1);
>> +    bool locked = false;
>>
>>      if (plen_out) {
>>          plen = *plen_out;
>> @@ -510,8 +511,15 @@ static MemoryRegionSection flatview_do_translate(FlatView *fv,
>>      }
>>      imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
>>
>> +    if (!qemu_mutex_iothread_locked()) {
>> +        locked = true;
>> +        qemu_mutex_lock_iothread();
>> +    }
>>      iotlb = imrc->translate(iommu_mr, addr, is_write ?
>>                              IOMMU_WO : IOMMU_RO);
>> +    if (locked) {
>> +        qemu_mutex_unlock_iothread();
>> +    }
>>      addr = ((iotlb.translated_addr & ~iotlb.addr_mask)
>>              | (addr & iotlb.addr_mask));
>>      page_mask &= iotlb.addr_mask;
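
(Note the conditional take/release above: translate() can be reached both
from a vCPU thread that already holds the BQL and from a dataplane thread
that does not, so the lock is only taken when it isn't held yet, which
avoids acquiring the BQL recursively.)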
> We'll need to add the flag thing too. How do we mark the existing
> thread-safe IOMMUs?
We can let thread-safe IOMMU code choose to set a flag somewhere, as
sketched below.
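
A rough sketch only - the translate_is_thread_safe field is a made-up name
here, assuming we extend IOMMUMemoryRegionClass with it:

    /* Hypothetical field in IOMMUMemoryRegionClass, set by IOMMUs whose
     * translate() callback needs no external locking: */
    bool translate_is_thread_safe;

    /* Then in flatview_do_translate(), only fall back to the BQL for
     * IOMMUs that did not declare themselves thread safe: */
    if (!imrc->translate_is_thread_safe &&
        !qemu_mutex_iothread_locked()) {
        locked = true;
        qemu_mutex_lock_iothread();
    }

An IOMMU that does its own locking (like VT-d with the iommu_lock above)
would then just set the flag in its class init and never touch the BQL on
this path.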
>
>>
>>> So, even if we want to provide a general lock for the translation
>>> procedure, I would prefer we add a per-AddressSpace lock rather than the BQL.
>> It could be, but it needs more work in each specific IOMMU's code.
>>
>>> However, that will still need some extra flag showing whether we
>>> need the protection or not. For example, we may need to explicitly
>>> turn it off for Power and s390. Would that really be worth it?
>> It would cost just a few lines of code; is anything wrong with that?
> It's not about anything wrong; it's just about preference.
>
> I never said BQL won't work here. It will work. But if you have
> spent tens of hours working on BQL-related problems maybe you'll have
> the same preference as me... :)
>
> IMHO the point is to decide which might be simpler and more efficient
> in general, really.
So I'm not against your approach. It could go on top of the BQL patch, I
think.
>
>>> So my final preference is still the current patch - we solve thread-safety
>>> problems in the VT-d and IOMMU code. Again, we really should make sure
>>> all IOMMUs work with multiple threads.
>>>
>>>>>> 3) I think the issue is common to all other kinds of IOMMU, so can we simply
>>>>>> synchronize before calling ->translate() in memory.c? That seems a more
>>>>>> general solution.
>>>>> I suspect Power and s390 live well with that. I think it means at
>>>>> least these platforms won't have problems with concurrency. I'm adding
>>>>> DavidG to the loop in case there are further comments. IMHO we should
>>>>> just make sure the IOMMU code is thread safe, and fix problems as they come.
>>>>>
>>>>> Thanks,
>>>>>
>>>> Yes, it needs some investigation, but we have other IOMMUs like AMD's, and
>>>> we could have a flag to bypass the BQL if an IOMMU can synchronize by itself.
>>> The AMD IOMMU is still experimental only. If we really want to use it in
>>> production, IMHO it'll need more testing and tuning, not only on
>>> thread safety but on other things too. So again, we can just fix things
>>> when needed. I still don't see a reason to depend on the BQL here.
>> Well, it's not about the BQL specifically; it's about whether we have or
>> need a generic thread-safety solution for all IOMMUs.
>>
>> We have more IOMMUs than just AMD, s390 and ppc:
>>
>> # git grep imrc-\>translate\ =
>> hw/alpha/typhoon.c: imrc->translate = typhoon_translate_iommu;
>> hw/dma/rc4030.c: imrc->translate = rc4030_dma_translate;
>> hw/i386/amd_iommu.c: imrc->translate = amdvi_translate;
>> hw/i386/intel_iommu.c: imrc->translate = vtd_iommu_translate;
>> hw/ppc/spapr_iommu.c: imrc->translate = spapr_tce_translate_iommu;
>> hw/s390x/s390-pci-bus.c: imrc->translate = s390_translate_iommu;
>> hw/sparc/sun4m_iommu.c: imrc->translate = sun4m_translate_iommu;
>> hw/sparc64/sun4u_iommu.c: imrc->translate = sun4u_translate_iommu;
>>
>> And we know there will be more in the near future.
> Again - here I would suggest we consider thread safety when implementing
> new ones. I suppose it should not be hard to achieve.
>
> I don't have more new input here since I have given some in previous
> posts already. If this is still under discussion before the next
> post, I'll pick this patch out of the series, since it is not
> related to the other patches at all and can be dealt with in isolation.
>
> Thanks,
>
I fully understand your motivation; I just want to see if we can do
something simple for all the other IOMMUs. I think this series can go
alone without worrying about the other IOMMUs, for sure.
Thanks