virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Si-Wei Liu <si-wei.liu@oracle.com>
To: Eugenio Perez Martin <eperezma@redhat.com>,
	Jason Wang <jasowang@redhat.com>
Cc: xuanzhuo@linux.alibaba.com,
	virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, mst@redhat.com
Subject: Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release
Date: Thu, 19 Oct 2023 15:28:15 -0700	[thread overview]
Message-ID: <07174d80-afad-4017-9088-02f2133e64b5@oracle.com> (raw)
In-Reply-To: <CAJaqyWdMPVUd_zjd0nkQKvDmG2HPe5DBS-w5=mx4qSPCqtDJwg@mail.gmail.com>



On 10/19/2023 7:39 AM, Eugenio Perez Martin wrote:
> On Thu, Oct 19, 2023 at 10:27 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Thu, Oct 19, 2023 at 2:47 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>
>>>
>>> On 10/18/2023 7:53 PM, Jason Wang wrote:
>>>> On Wed, Oct 18, 2023 at 4:49 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>
>>>>> On 10/18/2023 12:00 AM, Jason Wang wrote:
>>>>>>> Unfortunately, it's a must to stick to ABI. I agree it's a mess but we
>>>>>>> don't have a better choice. Or we can fail the probe if userspace
>>>>>>> doesn't ack this feature.
>>>>>> Antoher idea we can just do the following in vhost_vdpa reset?
>>>>>>
>>>>>> config->reset()
>>>>>> if (IOTLB_PERSIST is not set) {
>>>>>>        config->reset_map()
>>>>>> }
>>>>>>
>>>>>> Then we don't have the burden to maintain them in the parent?
>>>>>>
>>>>>> Thanks
>>>>> Please see my earlier response in the other email, thanks.
>>>>>
>>>>> ----------------%<----------------%<----------------
>>>>>
>>>>> First, the ideal fix would be to leave this reset_vendor_mappings()
>>>>> emulation code on the individual driver itself, which already has the
>>>>> broken behavior.
>>>> So the point is, not about whether the existing behavior is "broken"
>>>> or not.
>>> Hold on, I thought earlier we all agreed upon that the existing behavior
>>> of vendor driver self-clearing maps during .reset violates the vhost
>>> iotlb abstraction and also breaks the .set_map/.dma_map API. This is
>>> 100% buggy driver implementation itself that we should discourage or
>>> eliminate as much as possible (that's part of the goal for this series),
>> I'm not saying it's not an issue, what I'm saying is, if the fix
>> breaks another userspace, it's a new bug in the kernel. See what Linus
>> said in [1]
>>
>> "If a change results in user programs breaking, it's a bug in the kernel."
>>
>>> but here you seem to go existentialism and suggests the very opposite
>>> that every .set_map/.dma_map driver implementation, regardless being the
>>> current or the new/upcoming, should unconditionally try to emulate the
>>> broken reset behavior for the sake of not breaking older userspace.
>> Such "emulation" is not done at the parent level. New parents just
>> need to implement reset_map() or not. everything could be done inside
>> vhost-vDPA as pseudo code that is shown above.
>>
>>> Set
>>> aside the criteria and definition for how userspace can be broken, can
>>> we step back to the original question why we think it's broken, and what
>>> we can do to promote good driver implementation instead of discuss the
>>> implementation details?
>> I'm not sure I get the point of this question. I'm not saying we don't
>> need to fix, what I am saying is that such a fix must be done in a
>> negotiable way. And it's better if parents won't get any burden. It
>> can just decide to implement reset_map() or not.
>>
>>> Reading the below response I found my major
>>> points are not heard even if written for quite a few times.
>> I try my best to not ignore any important things, but I can't promise
>> I will not miss any. I hope the above clarifies my points.
>>
>>> It's not
>>> that I don't understand the importance of not breaking old userspace, I
>>> appreciate your questions and extra patience, however I do feel the
>>> "broken" part is very relevant to our discussion here.
>>> If it's broken (in the sense of vhost IOTLB API) that you agree, I think
>>> we should at least allow good driver implementations; and when you think
>>> about the possibility of those valid good driver cases
>>> (.set_map/.dma_map implementations that do not clear maps in .reset),
>>> you might be able to see why it's coded the way as it is now.
>>>
>>>>    It's about whether we could stick to the old behaviour without
>>>> too much cost. And I believe we could.
>>>>
>>>> And just to clarify here, reset_vendor_mappings() = config->reset_map()
>>>>
>>>>> But today there's no backend feature negotiation
>>>>> between vhost-vdpa and the parent driver. Do we want to send down the
>>>>> acked_backend_features to parent drivers?
>>>> There's no need to do that with the above code, or anything I missed here?
>>>>
>>>> config->reset()
>>>> if (IOTLB_PERSIST is not set) {
>>>>         config->reset_map()
>>>> }
>>> Implementation issue: this implies reset_map() has to be there for every
>>> .set_map implementations, but vendor driver implementation for custom
>>> IOMMU could well implement DMA ops by itself instead of .reset_map. This
>>> won't work for every set_map driver (think about the vduse case).
>> Well let me do it once again, reset_map() is not mandated:
>>
>> config->reset()
>> if (IOTLB_PERSIST is not set) {
>>      if (config->reset_map)
>>            config->reset_map()
> To avoid new parent drivers
I am afraid it's not just new parent drivers, but any well behaved 
driver today may well break userspace if go with this forced emulation 
code, if they have to implement reset_map for some reason (e.g. restored 
to 1:1 passthrough mapping or other default state in mapping). For new 
userspace and user driver we can guard against it using the 
IOTLB_PERSIST flag, but the above code would get a big chance to break 
setup with good driver and older userspace in practice.

And .reset_map implementation doesn't necessarily need to clear maps. 
For e.g. IOMMU API compliant driver that only needs simple DMA model for 
passthrough, all .reset_map has to do is toggle to 1:1 mapping mode to 
the default/initial state without taking care of maps, as 
vhost_vdpa_unmap(0, -1ULL) earlier should have done the map cleaning job 
already.


>   to have this behavior if they need to
> implement reset_map,
>
> What if we add a new callback like "config->buggy_virtio_reset_map",
> different from regular reset_map callback at cleanup?
Right, separating out the need for old behavior emulation from 
.reset_map is much cleaner, and this is what individual broken driver 
has to maintain without penalizing other good drivers. Good to see what 
I said earlier is heard.

> Only mlx5 and
> vdpa_sim need to implement it, with a big warning, and new parent
> drivers can trust they'll never have the old bad behavior.
Let's see what Jason will say about it and try to converge on this 
first, I think he seemed to imply that this is part of ABI that every 
driver has to make compromise for. I'd better get it ack'ed before 
proceeding to the rest.

Thanks,
-Siwei

>
>> }
>>
>> Did you see any issue with VDUSE in this case?
>>
>>> But this is not the the point I was making. I think if you agree this is
>>> purely buggy driver implementation of its own, we should try to isolate
>>> this buggy behavior to individual driver rather than overload vhost-vdpa
>>> or vdpa core's role to help implement the emulation of broken driver
>>> behavior.
>> As I pointed out, if it is not noticeable in the userspace, that's
>> fine but it's not.
>>
>>> I don't get why .reset is special here, the abuse of .reset to
>>> manipulate mapping could also happen in other IOMMU unrelated driver
>>> entries like in .suspend, or in queue_reset.
>> Who can abuse reset here? It is totally under the control of
>> vhost-vDPA and it's not visible to uAPI. And we can fully control the
>> behaviour of vhost-vDPA.
>>
>>> If someday userspace is
>>> found coded around similar buggy driver implementation in other driver
>>> ops, do we want to follow and duplicate the same emulation in vdpa core
>>> as the precedent is already set here around .reset?
>> I think so, have you seen the links I give you? If you want to go
>> through the one from Linus thread[1], you can see the one that unbreak
>> virtio-IOMMU[2]:
>>
>> 1) Someday, we spot invalidate with size 0 is a bug
>> 2) We fix this bug by not allowing this
>> 3) But virtio-IOMMU userspace find that size 0 actually clean all the
>> IOTLB so it depends on the behaviour
>> 4) So the virtio-IOMMU userspace find it can't work after 2)
>> 5) Then we recover the behaviour before 2) via [2]
>>
>> Another example is the IOTLB_MSG_V2, V1 suffers from in-stable ABI in
>> 32bit archs, most of the userspace survives since it never runs on
>> 32bit archs. The fix is to introduce a V2 but we will stick to V1 by
>> default if V2 is not acknowledged by the userspace.
>>
>> I think the above 2 examples are sufficient for us to understand the
>> case. If not, I can help to clarify more since I'm involved in those 2
>> fixes.
>>
>>> The buggy driver can fail in a lot of other ways indefinitely during
>>> reset, if there's a buggy driver that's already broken the way as how it
>>> is and happens to survive with all userspace apps, we just don't care
>>> and let it be.
>> Without IOTLB_PRESIST it doesn't break. With IOTLB_PERSIST and if the
>> reset_map() is done unconditionally, it can break. That's my point.
>>
>>> There's no way we can enumerate all those buggy behaviors
>>> in .reset_map itself, it's overloading that driver API too much.
>> If it is not noticeable by userspace, we can do any fix at will. But
>> it is not, we don't have another choice. Especially considering the
>> cost is rather low.
>>
>>>>> Second, IOTLB_PERSIST is needed but not sufficient. Due to lack of
>>>>> backend feature negotiation in parent driver, if vhost-vdpa has to
>>>>> provide the old-behaviour emulation for compatibility on driver's
>>>>> behalf, it needs to be done per-driver basis. There could be good
>>>>> on-chip or vendor IOMMU implementation which doesn't clear the IOTLB in
>>>>> .reset, and vendor specific IOMMU doesn't have to provide .reset_map,
>>>> Then we just don't offer IOTLB_PRESIST, isn't this by design?
>>> Think about the vduse case, it can work with DMA ops directly so doesn't
>>> have to implement .reset_map, unless for some specific good reason.
>>> Because it's a conforming and valid/good driver implementation, we may
>>> still allow it to advertise IOTLB_PERSIST to userspace.
>> I would like to know why this can't work in this case:
>>
>> config->reset()
>> if (IOTLB_PERSIST is not set) {
>>      if (config->reset_map)
>>            config->reset_map()
>> }
>>
>>> Which belongs to
>>> the 3rd bullet below:
>>>
>>> https://lore.kernel.org/virtualization/1696928580-7520-4-git-send-email-si-wei.liu@oracle.com/
>>>
>>> There are 3 cases that backend may claim this feature bit on:
>>>
>>> - parent device that has to work with platform IOMMU
>>> - parent device with on-chip IOMMU that has the expected
>>>     .reset_map support in driver
>>> - parent device with vendor specific IOMMU implementation
>>>     that explicitly declares the specific backend feature
>>>
>>>>> we
>>>>> should allow these good driver implementations rather than
>>>>> unconditionally stick to some specific problematic behavior for every
>>>>> other good driver.
>>>> Then you can force reset_map() with set_map() that is what I suggest
>>>> in another thread, no?
>>> This is exactly what I was afraid of that broken behavior emulation may
>>> become a dangerous slippery slope - in principle we should encourage
>>> good driver implementation, as they can work totally fine with older
>>> userspace. Why do they have to bother emulating broken behavior just
>>> because some other driver's misbehaving?
>> Please read the link [1], Linus has explained it.
>>
>>> And what's the boundary for
>>> this hack, do drivers backed by platform IOMMU even have to emulate (if
>>> not why not, and is there substantial difference in between)?
>> The boundary is whether the behaviour change could be noticed but
>> userspace. And I've shown you it's not a burden with the pseudo codes.
>> If not, please explain why.
>>
>>> After
>>> getting through all of this, do you still believe everything is just as
>>> easy and simple as what thought to be?
>> The truth is that bugs exist everywhere. We can't promise there's no
>> bug when developing an uAPI or subsystem. For kernel code, the bug
>> that touches uAPI might be fixed in a way that doesn't break existing
>> userspace. If you look at how downstream to maintain kABI, you will be
>> supersized furtherly.
>>
>>> Btw, I thought I was expecting but still haven't got the clear answers
>>> to what was the goal to do all this, we spent a lot of time trying to
>>> unbreak userspace,
>> The code is pretty simple. But yes, the time spent on justifying it
>> might take some time. That's the community. People need time to
>> understand each other's points.
>>
>>> but looks to me as if we were trying every possible
>>> way to break userspace
>> How could my suggestions break a userspace?
>>
>>> or try to approximate to the same brokenness
>>> mlx5_vdpa may have caused to the userspace. What we will get eventually
>>> from these lengthy discussions?
>> Siwei, I'd really suggest you read the link I gave you. You may get
>> the answer. What's more, It doesn't cost too much then we know for
>> sure there would not be any issue, why not choose the hard way?
>>
>>> On the other hand, if you think it from
>>> vhost-vdpa user perspective, you'll clearly see there's just a couple of
>>> ways to unbreak userspace from the internal broken map which is out of
>>> sync with vhost-vdpa iotlb after device reset.
>> Patches are more than welcomed.
>>
>>> If this brokenness was
>>> something universally done from the vhost-vdpa layer itself, I'd feel
>>> it's more of a shared problem, but this is not the case I see it here.
>>> While the long standing mlx5_vdpa/vdpa_sim issue is 100% misuse of
>>> .reset op in a wrong way per IOMMU API definition. Why leaving this
>>> discrepancy to the individual driver is not even an option, I'm still
>>> not sure?
>> Sorry? I start with a switch in the driver, and then I try to avoid
>> that. And it seems you don't want a burden on the driver as well.
>> Where did you see I say we can't do that in the driver? What I
>> disagree with is to use a module parameter.
>>
>> Even if I fail, it doesn't mean we can't do that in the driver code.
>> If you read the link[1] you can see the offending commit is a change
>> in uvcvideo driver.
>>
>> Thanks
>>
>>>
>>> Thanks,
>>> -Siwei
>>>
>>>>> Then we need a set of device flags (backend_features
>>>>> bit again?) to indicate the specific driver needs upper layer's help on
>>>>> old-behaviour emulation.
>>>>>
>>>>> Last but not least, I'm not sure how to properly emulate
>>>>> reset_vendor_mappings() from vhost-vdpa layer. If a vendor driver has no
>>>>> .reset_map op implemented, or if .reset_map has a slightly different
>>>>> implementation than what it used to reset the iotlb in the .reset op,
>>>> See above, for reset_vendor_mappings() I meant config->reset_map() exactly.
>>>>
>>>> Thanks
>>>>
>>>>> then this either becomes effectively dead code if no one ends up using,
>>>>> or the vhost-vdpa emulation is helpless and limited in scope, unable to
>>>>> cover all the cases.
>>>>>
>>>>> ----------------%<----------------%<----------------
>>>>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  parent reply	other threads:[~2023-10-19 22:28 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-10  9:02 [PATCH 0/4] vdpa: decouple reset of iotlb mapping from device reset Si-Wei Liu
2023-10-10  9:02 ` [PATCH 1/4] vdpa: introduce .reset_map operation callback Si-Wei Liu
2023-10-13  2:49   ` Jason Wang
2023-10-13  7:36     ` Si-Wei Liu
2023-10-16  5:30       ` Jason Wang
2023-10-10  9:02 ` [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release Si-Wei Liu
     [not found]   ` <CAJaqyWe=RwotkcNKFuStvX=HxQh6sdtfsH23jhf994eXi3-2Og@mail.gmail.com>
2023-10-12  6:18     ` Si-Wei Liu
2023-10-13  3:01   ` Jason Wang
2023-10-13  7:35     ` Si-Wei Liu
2023-10-16  6:32       ` Jason Wang
2023-10-16 20:10         ` Si-Wei Liu
     [not found]         ` <CAJaqyWf0AhsS6kaGUMVCosDjuRoeCAqO3OTVC=veqjV3jCqUjQ@mail.gmail.com>
2023-10-16 20:30           ` Si-Wei Liu
2023-10-17  2:35             ` Jason Wang
2023-10-18  4:35               ` Si-Wei Liu
2023-10-18  5:27                 ` Jason Wang
2023-10-18  7:00                   ` Jason Wang
2023-10-18  8:49                     ` Si-Wei Liu
2023-10-19  2:53                       ` Jason Wang
2023-10-19  6:46                         ` Si-Wei Liu
2023-10-19  8:27                           ` Jason Wang
     [not found]                             ` <CAJaqyWdMPVUd_zjd0nkQKvDmG2HPe5DBS-w5=mx4qSPCqtDJwg@mail.gmail.com>
2023-10-19 22:28                               ` Si-Wei Liu [this message]
2023-10-20  4:11                                 ` Jason Wang
2023-10-20  5:57                                   ` Si-Wei Liu
2023-10-18  8:44                   ` Si-Wei Liu
     [not found]                     ` <CAJaqyWc01_YgkhLRs961a-K1P+Zj4P+6qGN1t=eOFFwGvQ001A@mail.gmail.com>
2023-10-18 23:21                       ` Si-Wei Liu
2023-10-19  2:48                         ` Jason Wang
2023-10-19 22:57                   ` Si-Wei Liu
2023-10-10  9:02 ` [PATCH 3/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit Si-Wei Liu
2023-10-10  9:03 ` [PATCH 4/4] vdpa/mlx5: implement .reset_map driver op Si-Wei Liu
2023-10-13  3:04   ` Jason Wang
2023-10-13  7:55     ` Si-Wei Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=07174d80-afad-4017-9088-02f2133e64b5@oracle.com \
    --to=si-wei.liu@oracle.com \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).