From: Cornelia Huck <cohuck@redhat.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Xie Yongji <xieyongji@bytedance.com>,
virtualization@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, markver@us.ibm.com,
Christian Borntraeger <borntraeger@de.ibm.com>,
linux-s390@vger.kernel.org, Halil Pasic <pasic@linux.ibm.com>
Subject: Re: [RFC PATCH 1/1] virtio: write back features before verify
Date: Fri, 01 Oct 2021 17:18:46 +0200
Message-ID: <87v92g3h9l.fsf@redhat.com>
In-Reply-To: <20211001162213.18d7375e.pasic@linux.ibm.com>
On Fri, Oct 01 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
> On Thu, 30 Sep 2021 13:31:04 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
>
>> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
>>
>> > On Thu, 30 Sep 2021 11:28:23 +0200
>> > Cornelia Huck <cohuck@redhat.com> wrote:
>> >
>> >> On Thu, Sep 30 2021, Halil Pasic <pasic@linux.ibm.com> wrote:
>> >> > @@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
>> >> >  		if (device_features & (1ULL << i))
>> >> >  			__virtio_set_bit(dev, i);
>> >> >  
>> >> > +	/* Write back features before validate to know endianness */
>> >> > +	if (device_features & (1ULL << VIRTIO_F_VERSION_1))
>> >> > +		dev->config->finalize_features(dev);
>> >>
>> >> This really looks like a mess :(
>> >>
>> >> We end up calling ->finalize_features twice: once before ->validate, and
>> >> once after, that time with the complete song and dance. The first time,
>> >> we operate on one feature set; after validation, we operate on another,
>> >> and there might be interdependencies between the two (for example, a bit
>> >> is cleared because of another bit, which would not happen if validate
>> >> had a chance to clear that bit before).
>> >
>> > Basically the second set is a subset of the first set.
>>
>> I don't think that's clear.
>
> Validate can only remove features, right? So I guess after validate
> is a subset of before validate.
I was thinking about (more-or-less hypothetical) interdependencies (see
above). But that's not terribly important.
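
To illustrate the kind of interdependency meant above, here is a minimal
sketch. Everything in it is made up for illustration: F_A and F_B are
hypothetical feature bits, finalize() stands in for a transport step that
clears one bit because of another, and validate() stands in for a driver
check that rejects a bit after looking at the config space. None of this
is the actual kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define F_A (1ULL << 0)	/* a bit that validate() will reject */
#define F_B (1ULL << 1)	/* a bit that finalize() drops if F_A is set */

/* Hypothetical finalize step: F_B is cleared because of F_A. */
static uint64_t finalize(uint64_t features)
{
	if ((features & F_A) && (features & F_B))
		features &= ~F_B;
	return features;
}

/* Hypothetical validate step: the config space rules out F_A. */
static uint64_t validate(uint64_t features)
{
	return features & ~F_A;
}

/* Order used when features are written back before validation. */
uint64_t finalize_then_validate(uint64_t offered)
{
	return validate(finalize(offered));
}

/* Order used when validation gets to clear bits first. */
uint64_t validate_then_finalize(uint64_t offered)
{
	return finalize(validate(offered));
}

/* Both results are subsets of the offered set, yet they differ. */
void self_check(void)
{
	assert(finalize_then_validate(F_A | F_B) == 0);
	assert(validate_then_finalize(F_A | F_B) == F_B);
}
```

With both bits offered, finalizing first loses F_B for good, while
validating first keeps it. So "subset of the first set" holds in both
orders, but the two orders need not end up with the same subset.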
>
>
>>
>> >
>> >>
>> >> I'm not sure whether that is even a problem in the spec: while the
>> >> driver may read the config before finally accepting features
>> >
>> > I'm not sure I'm following you. Let me please quote the specification:
>> > """
>> > 4. Read device feature bits, and write the subset of feature bits
>> > understood by the OS and driver to the device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration fields to check that it can support the device before accepting it.
>> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
>> > """
>> > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-930001
>>
>> Yes, exactly, it MAY read before accepting features. How does the device
>> know whether the config space is little-endian or not?
>>
>
> Well that is what we are talking about. One can try to infer things from
> the spec. This reset dance I called ugly is probably the cleanest,
> because the spec says that re-nego should work.
>
>> >
>> >> , it does
>> >> not really make sense to do so before a feature bit as basic as
>> >> VERSION_1 which determines the endianness has been negotiated.
>> >
>> > Are you suggesting that ->verify() should be after
>> > virtio_finalize_features()?
>>
>> No, that would defeat the entire purpose of verify. After
>> virtio_finalize_features(), we are done with feature negotiation.
>>
>
> Exactly!
It seems we are in violent agreement :)
>
>> > Wouldn't
>> > that mean that verify() can't reject feature bits? But that is the whole
>> > point of commit 82e89ea077b9 ("virtio-blk: Add validation for block size
>> > in config space"). Do you think that the commit in question is
>> > conceptually flawed? My understanding of the verify is, that it is supposed
>> > to fence features and feature bits we can't support, e.g. because of
>> > config space things, but I may be wrong.
>>
>> No, that commit is not really flawed on its own, I think the whole
>> procedure may be problematic.
>>
>
> I agree! But that regression really hurts us. Maybe the best band-aid is
> to conditional-compile it (not compile the check if s390).
It's probably most likely to hit on s390 (big-endian, and devices with a
blocksize != 512 in common use); but I'd like to make that band-aid more
generic than "exclude for s390". A hack for honouring VERSION_1 before
negotiation has finished is probably better as a stop-gap before we
manage to figure out how to deal with this properly.
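
As a rough illustration of why the byte order matters for a block-size
check, the sketch below decodes the same four config-space bytes both
ways. The helper names and the 512..4096 bounds in blk_size_plausible()
are assumptions made for this example; they are not taken from the
virtio-blk driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode four config-space bytes as little-endian (VERSION_1 layout). */
uint32_t cfg_le32(const uint8_t *raw)
{
	assert(raw != NULL);
	return (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
	       ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
}

/* Decode the same bytes as big-endian (native order on a BE guest). */
uint32_t cfg_be32(const uint8_t *raw)
{
	assert(raw != NULL);
	return ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
	       ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];
}

/* Hypothetical range check standing in for a block-size validation. */
int blk_size_plausible(uint32_t blk_size)
{
	return blk_size >= 512 && blk_size <= 4096;
}
```

For the bytes 00 10 00 00, cfg_le32() gives 4096, which passes the
check, while cfg_be32() gives 1048576, which does not. A big-endian
guest that reads a little-endian field in native order would therefore
reject a perfectly valid block size.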
>
>> >
>> > The trouble is, feature bits are not negotiated one by one, but basically all
>> > at once. I suppose, I did the next best thing to first negotiating
>> > VERSION_1.
>>
>> We probably need to special-case VERSION_1 to move at least forward;
>> i.e. proceed as if we accepted it when reading the config space.
>>
>> The problem is that we do not know what the device assumes when we read
>> the config space prior to setting FEATURES_OK. It may assume
>> little-endian if it offered VERSION_1, or it may not. The spec does not
>> really say what happens before feature negotiation has finished.
>>
> No it does not, but I hope, the implementations we care the most about do
> little endian if VERSION_1 is set but FEATURES_OK is not yet done. A
> transitional device would have to act upon a feature that is set,
> because for legacy there is no FEATURES_OK. Where we can run into
> trouble is minimum required feature set, e.g. mandatory features.
All ugly :(
>
> I will do some testing.
>
>> >
>> >
>> >> For
>> >> VERSION_1, we can probably go ahead and just assume that we will accept
>> >> it if offered, but what about other (future) bits?
>> >
>> > I don't quite understand.
>>
>> There might be other bits in the future that change how the config space
>> works. We cannot assume that any of those bits will be accepted if
>> offered; i.e. we need a special hack for VERSION_1.
>
> I tend to agree. What I didn't consider in this patch is that, setting
> bits does not only set bits, but may also change the device in a way,
> that clearing the bit would not change it back.
>
>>
>> >
>> > Anyway, how do you think we should solve this problem?
>>
>> This is a mess. For starters, we need to think about whether we should do
>> something in the spec, and if yes, what. Then, we can probably think
>> about how to implement that properly.
>>
>
> I agree.
>
>
>> As we have an error right now that is basically a regression, we
>> probably need a band-aid to keep going. Not sure if your patch is the
>> right approach, maybe we really need to special-case VERSION_1 (the
>> "assume we accepted it" hack mentioned above.) This will likely fix the
>> reported problem (I assume that is s390x on QEMU); do we know about
>> other VMMs? Any other big-endian architectures?
>
> I didn't quite get it. Would this hack take place in QEMU or in the guest
> kernel?
I'd say we need a hack here so that we assume little-endian config space
if VERSION_1 has been offered; if your patch here works, I assume QEMU
does what we expect (assuming little-endian as well.) I'm mostly
wondering what happens if you use a different VMM; can we expect it to
work similarly to QEMU? Even if it helps for s390, we should double-check
what happens for other architectures.
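
A minimal sketch of what such a stop-gap could look like, assuming the
byte order is chosen from the offered feature bits rather than from the
negotiated ones. struct sketch_dev, config_is_le() and config_read16()
are hypothetical names for this example and do not exist in the kernel;
only the bit number of VIRTIO_F_VERSION_1 (32) comes from the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32	/* feature bit number from the virtio spec */

/* Hypothetical device state, reduced to what the sketch needs. */
struct sketch_dev {
	uint64_t offered;	/* feature bits read from the device */
	uint64_t negotiated;	/* bits accepted so far, unused by the hack */
	bool guest_big_endian;	/* native byte order of the guest */
};

/*
 * Stop-gap: treat VERSION_1 as accepted as soon as it is offered, so
 * the config space is read as little-endian before FEATURES_OK.
 * Without VERSION_1 the legacy rule applies: guest-native byte order.
 */
bool config_is_le(const struct sketch_dev *dev)
{
	assert(dev != NULL);
	if (dev->offered & (1ULL << VIRTIO_F_VERSION_1))
		return true;
	return !dev->guest_big_endian;
}

/* Read a 16-bit config field from raw bytes using the chosen order. */
uint16_t config_read16(const struct sketch_dev *dev, const uint8_t raw[2])
{
	if (config_is_le(dev))
		return (uint16_t)(raw[0] | (raw[1] << 8));
	return (uint16_t)((raw[0] << 8) | raw[1]);
}
```

Whether this matches what the device does before FEATURES_OK is exactly
the open question: the sketch only encodes the guest-side assumption and
says nothing about how a given VMM behaves.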
>
>>
>> Anyone have any better suggestions?
>>
>
> There is the conditional compile as an option, but I would not say it is
> better.
Yes, I agree.
Anyone else have an idea? This is a nasty regression; we could revert the
patch, which would remove the symptoms and give us some time, but that
doesn't really feel right, I'd do that only as a last resort.