* [virtio-dev] packed ring layout proposal v3
@ 2017-09-10 5:06 Michael S. Tsirkin
2017-09-12 16:20 ` Willem de Bruijn
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2017-09-10 5:06 UTC (permalink / raw)
To: virtio-dev; +Cc: virtualization
This is an update from v2 version.
Changes:
- update event suppression mechanism
- add wrap counter: DESC_WRAP flag in addition to
DESC_DRIVER flag used for validity so device does not have to
write out all used descriptors.
- more options especially helpful for hardware implementations
- in-order processing option due to popular demand
- list of TODO items to consider as a follow-up, only two are
open questions we need to descide now, marked as blocker,
others are future enhancement ideas.
---
Performance analysis of this is in my kvm forum 2016 presentation.
The idea is to have a r/w descriptor in a ring structure,
replacing the used and available ring, index and descriptor
buffer.
Note: the following mode of operation will actually work
without changes when descriptor rings do not overlap, with driver
writing out available entries in a read-only driver descriptor ring,
device writing used entries in a write-only device descriptor ring.
TODO: does this have any value for some? E.g. as a security feature?
* Descriptor ring:
Driver writes descriptors with unique index values and DESC_DRIVER set in flags.
Descriptors are written in a ring order: from start to end of ring,
wrapping around to the beginning.
Device writes used descriptors with correct len, index, and DESC_HW clear.
Again descriptors are written in ring order. This might not be the same
order of driver descriptors, and not all descriptors have to be written out.
Driver and device are expected to maintain (internally) a wrap-around
bit, starting at 0 and changing value each time they start writing out
descriptors at the beginning of the ring. This bit is passed as
DESC_WRAP bit in the flags field.
Flags are always set/cleared last.
Note that driver can write descriptors out in any order, but device
will not execute descriptor X+1 until descriptor X has been
read as valid.
Driver operation:
Driver makes descriptors available to device by writing out descriptors
in the descriptor ring. Once ring is full, driver waits for device to
use some descriptors before making more available.
Descriptors can be used by device in any order, but must be read from
ring in-order, and must be read completely before starting use. Thus,
once a descriptor is used, driver can over-write both this descriptor
and any descriptors which preceded it in the ring.
Driver can detect use of descriptor either by device specific means
(e.g. detect a buffer data change by device) or in a generic way
by detecting that a used buffer has been written out by device.
Driver writes out available scatter/gather descriptor entries in guest
descriptor format:
#define DESC_WRAP 0x0040
#define DESC_DRIVER 0x0080
struct desc {
__le64 addr;
__le32 len;
__le16 index;
__le16 flags;
};
Fields:
addr - physical address of a s/g entry
len - length of an entry
index - unique index. The low $\lceil log(N) \rceil - 1$
bits of the index is a driver-supplied value which can have any value
(under driver control). The high bits are reserved and should be
set to 0.
flags - descriptor flags.
Descriptors written by driver have DESC_DRIVER set.
Writing out this field makes the descriptor available for the device to use,
so all other fields must be valid when it is written.
DESC_WRAP - device uses this field to detect descriptor change by driver.
Driver can use 1 bit to set direction
/* This marks a descriptor as write-only (otherwise read-only). */
#define VRING_DESC_F_WRITE 2
Device operation (using descriptors):
Device is looking for descriptors in ring order. After detecting that
the flags value has changed with DESC_DRIVER set and DESC_WRAP matching
the wrap-around counter, it can start using the descriptors.
Descriptors can be used in any order, but must be read from ring
in-order. In other words device must not read descriptor X after it
started using descriptor X+1. Further, all buffer descriptors must be
read completely before device starts using the buffer.
This because once a descriptor is used, driver can over-write both this
descriptor and any preceeding descriptors in ring.
To help driver detect use of descriptors and to pass extra meta-data
to driver, device writes out used entries in device descriptor format:
#define DESC_WRAP 0x0040
#define DESC_DRIVER 0x0080
struct desc {
__le64 reserved;
__le32 len;
__le16 index;
__le16 flags;
};
Fields:
reserved - can be any value, ignored by driver
len - length written by device. only valid if VRING_DESC_F_WRITE is set
len bytes of data from beginning of buffer are assumed to have been updated
index - unique index copied from the driver descriptor that has been used.
flags - descriptor flags.
Descriptors written by device have DESC_DRIVER clear.
Writing out this field notifies the driver that it can re-use the
descriptor id. It is also a signal that driver can over-write the
relevant descriptor (with the supplied id), and any other
DESC_WRAP - driver uses this field to detect descriptor change by device.
If device has used a buffer containing a write descriptor, it sets this bit:
#define VRING_DESC_F_WRITE 2
* Driver scatter/gather support
Driver can use 1 bit to chain s/g entries in a request, similar to virtio 1.0:
/* This marks a buffer as continuing in the next ring entry. */
#define VRING_DESC_F_NEXT 1
When driver descriptors are chained in this way, multiple
descriptors are treated as a part of a single transaction
containing an optional write buffer followed by an optional read buffer.
All descriptors in the chain must have the same ID.
Unlike virtio 1.0, use of this flag will be an optional feature
so both devices and drivers can opt out of it.
If they do, they can either negotiate indirect descriptors or use
single-descriptor entries exclusively for buffers.
Device might detect a partial descriptor chain (VRING_DESC_F_NEXT
set but next descriptor not valid). In that case it must not
use any parts of the chain - it will later be completed by driver,
but device is allowed to store the valid parts of the chain as
driver is not allowed to change them anymore.
Two options are available:
Device can write out the same number of descriptors for the chain,
setting VRING_DESC_F_NEXT for all but the last descriptor.
Driver will ignore all used descriptors with VRING_DESC_F_NEXT bit set.
Device only writes out a single descriptor for the whole chain.
However, to keep device and driver in sync, it then skips a number of
descriptors corresponding to the length of the chain before writing out
the next used descriptor.
After detecting a used descriptor driver must find out the length of the
chain that it built in order to know where to look for the next
device descriptor.
* Indirect buffers
Indirect buffer descriptors is an optional feature.
These are always written by driver, not the device.
Indirect buffers have a special flag bit set - like in virtio 1.0:
/* This means the buffer contains a table of buffer descriptors. */
#define VRING_DESC_F_INDIRECT 4
VRING_DESC_F_WRITE and VRING_DESC_F_NEXT are always clear.
len specifies the length of the indirect descriptor buffer in bytes
and must be a multiple of 16.
Unlike virtio 1.0, the buffer pointed to is a table, not a list:
struct indirect_descriptor_table {
/* The actual descriptors (16 bytes each) */
struct indirect_desc desc[len / 16];
};
The first descriptor is located at start of the indirect descriptor
table, additional indirect descriptors come immediately afterwards.
struct indirect_desc {
__le64 addr;
__le32 len;
__le16 reserved;
__le16 flags;
};
DESC_F_WRITE is the only valid flag for descriptors in the indirect
table. Others should be set to 0 and are ignored. reserved field is
also set to 0 and should be ignored.
TODO (blocker): virtio 1.0 allows a s/g entry followed by
an indirect descriptor. Is this useful?
This support would be an optional feature, same as in virtio 1.0
* Batching descriptors:
virtio 1.0 allows passing a batch of descriptors in both directions, by
incrementing the used/avail index by values > 1.
At the moment only batching of used descriptors is used.
We can support this by chaining a list of device descriptors through
VRING_DESC_F_MORE flag. Device sets this bit to signal
driver that this is part of a batch of used descriptors
which are all part of a single transaction.
Driver might detect a partial descriptor chain (VRING_DESC_F_MORE
set but next descriptor not valid). In that case it must not
use any parts of the chain - it will later be completed by device,
but driver is allowed to store the valid parts of the chain as
device is not allowed to change them anymore.
Descriptor should not have both VRING_DESC_F_MORE and
VRING_DESC_F_NEXT set.
* Using descriptors in order
Some devices can guarantee that descriptors are used in
the order in which they were made available.
This allows driver optimizations and can be negotiated through
a feature bit.
* Per ring flags
It is useful to support features for some rings but not others.
E.g. it's reasonable to use single buffers for RX rings but
sg or indirect for TX rings of the network device.
Generic configuration space will be extended so features can
be negotiated per ring.
* Selective use of descriptors
As described above, descriptors with NEXT bit set are part of a
scatter/gather chain and so do not have to cause device to write a used
descriptor out.
Similarly, driver can set a flag VRING_DESC_F_MORE in the descriptor to
signal to device that it does not have to write out the used descriptor
as it is part of a batch of descriptors. Device has two options (similar
to VRING_DESC_F_NEXT):
Device can write out the same number of descriptors for the batch,
setting VRING_DESC_F_MORE for all but the last descriptor.
Driver will ignore all used descriptors with VRING_DESC_F_MORE bit set.
Device only writes out a single descriptor for the whole batch.
However, to keep device and driver in sync, it then skips a number of
descriptors corresponding to the length of the batch before writing out
the next used descriptor.
After detecting a used descriptor driver must find out the length of the
batch that it built in order to know where to look for the next
device descriptor.
TODO (blocker): skipping descriptors for selective and
scatter/gather seem to be only requested with in-order right now. Let's
require in-order for this skipping? This will simplify the accounting
by driver.
* Interrupt/event suppression
virtio 1.0 has two mechanisms for suppression but only
one can be used at a time. we pack them together
in a structure - one for interrupts, one for notifications:
struct event {
__le16 idx;
__le16 flags;
}
Both fields would be optional, with a feature bit:
VIRTIO_F_EVENT_IDX
VIRTIO_F_EVENT_FLAGS
Flags can be used like in virtio 1.0, by storing a special
value there:
#define VRING_F_EVENT_ENABLE 0x0
#define VRING_F_EVENT_DISABLE 0x1
Event index includes the index of the descriptor
which should trigger the event, and the wrap counter
in the high bit.
In that case, interrupt triggers when descriptor is written at a given
location in the ring (or skipped in case of NEXT/MORE).
If both features are negotiated, a special flags value
can be used to switch to event idx:
#define VRING_F_EVENT_IDX 0x2
* Available notification
Driver currently writes out the queue number to device to
kick off ring processing.
As queue number is currently 16 bit, we can extend that
to additionally include the offset within ring of the descriptor
which triggered the kick event in bits 16 to 30,
and the wrap counter in the high bit (31).
Device is allowed to pre-fetch descriptors beyond the specified
offset but is not required to do so.
* TODO: interrupt coalescing
Does it make sense just for networking or for all devices?
If later should we make it a per ring or a global feature?
* TODO: event index/flags in device memory?
Should we move the event index/flags to device memory?
Might be helpful for hardware configuration so they do not
need to do DMA reads to check whether interrupt is needed.
OTOH maybe interrupt coalescing is sufficient for this.
* TODO: Device specific descriptor flags
We have a lot of unused space in the descriptor. This can be put to
good use by reserving some flag bits for device use.
For example, network device can set a bit to request
that header in the descriptor is suppressed
(in case it's all 0s anyway). This reduces cache utilization.
Note: this feature can be supported in virtio 1.0 as well,
as we have unused bits in both descriptor and used ring there.
* TODO: Descriptor length in device descriptors
Some devices use identically-sized buffers in all descriptors.
Ignoring length for driver descriptors there could be an option too.
* TODO: Writing at an offset
Some devices might want to write into some descriptors
at an offset, the offset would be in reserved field in the descriptor,
possibly a descriptor flag could indicate this:
#define VRING_DESC_F_OFFSET 0x0020
How exactly to use the offset would be device specific,
for example it can be used to align beginning of packet
in the 1st buffer for mergeable buffers to cache line boundary
while also aligning rest of buffers.
* TODO: Non power-of-2 ring sizes
As the ring simply wraps around, there's no reason to
require ring size to be power of two.
It can be made a separate feature though.
TODO: limits on buffer alignment/size
Seems to be useful for RX for networking.
Is there a need to negotiate above in a generic way
or is this a networking specific optimization?
TODO: expose wrap counters to device for debugging
TODO: expose last avail/used offsets to device/driver for debugging
TODO: ability to reset individual rings
---
Note: should this proposal be accepted and approved, one or more
claims disclosed to the TC admin and listed on the Virtio TC
IPR page https://www.oasis-open.org/committees/virtio/ipr.php
might become Essential Claims.
Note: the page above is unfortunately out of date and out of
my hands. I'm in the process of updating ipr disclosures
in github instead. Will make sure all is in place before
this proposal is put to vote. As usual this TC operates under the
Non-Assertion Mode of the OASIS IPR Policy, which protects
anyone implementing the virtio spec.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
2017-09-10 5:06 Michael S. Tsirkin
@ 2017-09-12 16:20 ` Willem de Bruijn
2017-09-20 9:11 ` Liang, Cunming
2017-09-21 13:36 ` Liang, Cunming
2 siblings, 0 replies; 14+ messages in thread
From: Willem de Bruijn @ 2017-09-12 16:20 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtio-dev, virtualization
On Sun, Sep 10, 2017 at 1:06 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> This is an update from v2 version.
> Changes:
> - update event suppression mechanism
> - add wrap counter: DESC_WRAP flag in addition to
> DESC_DRIVER flag used for validity so device does not have to
> write out all used descriptors.
> - more options especially helpful for hardware implementations
> - in-order processing option due to popular demand
> - list of TODO items to consider as a follow-up, only two are
> open questions we need to descide now, marked as blocker,
> others are future enhancement ideas.
Perhaps this would make a good topic for a BoF session at the upcoming
netdev. A new ring structure can be useful elsewhere, too, such as for
af_packet v4.
>
> ---
>
> Performance analysis of this is in my kvm forum 2016 presentation.
> The idea is to have a r/w descriptor in a ring structure,
> replacing the used and available ring, index and descriptor
> buffer.
>
> Note: the following mode of operation will actually work
> without changes when descriptor rings do not overlap, with driver
> writing out available entries in a read-only driver descriptor ring,
> device writing used entries in a write-only device descriptor ring.
The ring is always read-write, as the consumer has to toggle the
DESC_DRIVER flag, right? Which mode are you referring to?
> TODO: does this have any value for some? E.g. as a security feature?
>
>
> * Descriptor ring:
>
> Driver writes descriptors with unique index values and DESC_DRIVER set in flags.
> Descriptors are written in a ring order: from start to end of ring,
> wrapping around to the beginning.
> Device writes used descriptors with correct len, index, and DESC_HW clear.
> Again descriptors are written in ring order. This might not be the same
> order of driver descriptors, and not all descriptors have to be written out.
Virtio rings are used in both directions between guest device driver and
host device, as well as in increasingly many situations outside vm i/o.
I suggest using producer and consumer instead of driver and device
when describing ring operations.
When working on the virtio-net tx path, having to invert all the documentation
in my head, because it is written from an rx point of view was a bit tricky ;-)
I would then also convert DESC_DRIVER to DESC_VALID or so.
> Driver and device are expected to maintain (internally) a wrap-around
> bit, starting at 0 and changing value each time they start writing out
> descriptors at the beginning of the ring. This bit is passed as
> DESC_WRAP bit in the flags field.
So, the flag effectively doubles the namespace of the id from 16 bit to
17 bit? Instead, how about using a larger identifier. Such as 32 bit.
This also future proofs the design for cases where the ring may grow
to exceed 65536 entries. Doing so is not a short term change, but it
ould avoid the need for indirect descriptors and give greated room
for out of order acknowledgement.
> Flags are always set/cleared last.
>
> Note that driver can write descriptors out in any order, but device
> will not execute descriptor X+1 until descriptor X has been
> read as valid.
Why this constraint on the ring?
> Driver operation:
>
> Driver makes descriptors available to device by writing out descriptors
> in the descriptor ring. Once ring is full, driver waits for device to
> use some descriptors before making more available.
>
> Descriptors can be used by device in any order, but must be read from
> ring in-order, and must be read completely before starting use. Thus,
> once a descriptor is used, driver can over-write both this descriptor
> and any descriptors which preceded it in the ring.
Does this mean that completing a descriptor by the consumer implicitly
completes all descriptors that precede it in the ring?
> Driver can detect use of descriptor either by device specific means
> (e.g. detect a buffer data change by device) or in a generic way
> by detecting that a used buffer has been written out by device.
I don't quite follow this.
> Driver writes out available scatter/gather descriptor entries in guest
> descriptor format:
>
>
> #define DESC_WRAP 0x0040
> #define DESC_DRIVER 0x0080
>
> struct desc {
> __le64 addr;
> __le32 len;
> __le16 index;
> __le16 flags;
> };
>
> Fields:
>
> addr - physical address of a s/g entry
> len - length of an entry
Is this ever larger than 16-bit? If not, then reducing to 16-bit allows
growing index to 32-bit.
> index - unique index. The low $\lceil log(N) \rceil - 1$
> bits of the index is a driver-supplied value which can have any value
> (under driver control). The high bits are reserved and should be
> set to 0.
>
> flags - descriptor flags.
>
> Descriptors written by driver have DESC_DRIVER set.
>
> Writing out this field makes the descriptor available for the device to use,
> so all other fields must be valid when it is written.
>
> DESC_WRAP - device uses this field to detect descriptor change by driver.
>
> Driver can use 1 bit to set direction
> /* This marks a descriptor as write-only (otherwise read-only). */
> #define VRING_DESC_F_WRITE 2
This is a per-ring flag, as opposed to the per-descriptor flags DESC_*.
Please make that explicit and state in which structure it is set.
>
>
> Device operation (using descriptors):
>
> Device is looking for descriptors in ring order. After detecting that
> the flags value has changed with DESC_DRIVER set and DESC_WRAP matching
> the wrap-around counter, it can start using the descriptors.
> Descriptors can be used in any order, but must be read from ring
> in-order. In other words device must not read descriptor X after it
> started using descriptor X+1.
Why? This is the same question as above, really. This seems like a
device constraint, not necessarily a constraint to impose on the ring
format.
> Further, all buffer descriptors must be
> read completely before device starts using the buffer.
>
> This because once a descriptor is used, driver can over-write both this
> descriptor and any preceeding descriptors in ring.
This does explain the above constraint. I guess that I just do not
understand the reason for this behavior.
> To help driver detect use of descriptors and to pass extra meta-data
> to driver, device writes out used entries in device descriptor format:
>
>
> #define DESC_WRAP 0x0040
> #define DESC_DRIVER 0x0080
>
> struct desc {
> __le64 reserved;
> __le32 len;
> __le16 index;
> __le16 flags;
> };
>
> Fields:
>
> reserved - can be any value, ignored by driver
> len - length written by device. only valid if VRING_DESC_F_WRITE is set
> len bytes of data from beginning of buffer are assumed to have been updated
> index - unique index copied from the driver descriptor that has been used.
> flags - descriptor flags.
>
> Descriptors written by device have DESC_DRIVER clear.
>
> Writing out this field notifies the driver that it can re-use the
> descriptor id. It is also a signal that driver can over-write the
> relevant descriptor (with the supplied id), and any other
>
> DESC_WRAP - driver uses this field to detect descriptor change by device.
>
> If device has used a buffer containing a write descriptor, it sets this bit:
> #define VRING_DESC_F_WRITE 2
>
> * Driver scatter/gather support
>
> Driver can use 1 bit to chain s/g entries in a request, similar to virtio 1.0:
>
> /* This marks a buffer as continuing in the next ring entry. */
> #define VRING_DESC_F_NEXT 1
Isn't this a descriptor flag, so DESC_NEXT?
>
> When driver descriptors are chained in this way, multiple
> descriptors are treated as a part of a single transaction
> containing an optional write buffer followed by an optional read buffer.
Can you elaborate on the last part about optional write and read buffer?
> All descriptors in the chain must have the same ID.
If so, then the explicit flag is not needed?
> Unlike virtio 1.0, use of this flag will be an optional feature
> so both devices and drivers can opt out of it.
> If they do, they can either negotiate indirect descriptors or use
> single-descriptor entries exclusively for buffers.
>
> Device might detect a partial descriptor chain (VRING_DESC_F_NEXT
> set but next descriptor not valid).
This can be forbidden, by requiring the producer to set the
DESC_DRIVER bit on the first descriptor only after the entire
chain has been written.
Do chains have to consist of consecutive descriptors?
> In that case it must not
> use any parts of the chain - it will later be completed by driver,
> but device is allowed to store the valid parts of the chain as
> driver is not allowed to change them anymore.
>
> Two options are available:
>
> Device can write out the same number of descriptors for the chain,
> setting VRING_DESC_F_NEXT for all but the last descriptor.
> Driver will ignore all used descriptors with VRING_DESC_F_NEXT bit set.
>
> Device only writes out a single descriptor for the whole chain.
> However, to keep device and driver in sync, it then skips a number of
> descriptors corresponding to the length of the chain before writing out
> the next used descriptor.
> After detecting a used descriptor driver must find out the length of the
> chain that it built in order to know where to look for the next
> device descriptor.
>
> * Indirect buffers
>
> Indirect buffer descriptors is an optional feature.
> These are always written by driver, not the device.
> Indirect buffers have a special flag bit set - like in virtio 1.0:
>
> /* This means the buffer contains a table of buffer descriptors. */
> #define VRING_DESC_F_INDIRECT 4
>
> VRING_DESC_F_WRITE and VRING_DESC_F_NEXT are always clear.
>
> len specifies the length of the indirect descriptor buffer in bytes
> and must be a multiple of 16.
Multiple of sizeof(struct indirect_desc).
Also, struct indirect_desc is identical to struct desc. No need for a
separate struct definition?
>
> Unlike virtio 1.0, the buffer pointed to is a table, not a list:
This is just a linear array, right?
> struct indirect_descriptor_table {
> /* The actual descriptors (16 bytes each) */
> struct indirect_desc desc[len / 16];
> };
>
> The first descriptor is located at start of the indirect descriptor
> table, additional indirect descriptors come immediately afterwards.
>
> struct indirect_desc {
> __le64 addr;
> __le32 len;
> __le16 reserved;
> __le16 flags;
> };
>
>
> DESC_F_WRITE is the only valid flag for descriptors in the indirect
> table. Others should be set to 0 and are ignored. reserved field is
> also set to 0 and should be ignored.
>
> TODO (blocker): virtio 1.0 allows a s/g entry followed by
> an indirect descriptor. Is this useful?
Sounds like unnecessary complexity.
> This support would be an optional feature, same as in virtio 1.0
>
> * Batching descriptors:
>
> virtio 1.0 allows passing a batch of descriptors in both directions, by
> incrementing the used/avail index by values > 1.
> At the moment only batching of used descriptors is used.
>
> We can support this by chaining a list of device descriptors through
> VRING_DESC_F_MORE flag. Device sets this bit to signal
> driver that this is part of a batch of used descriptors
> which are all part of a single transaction.
>
> Driver might detect a partial descriptor chain (VRING_DESC_F_MORE
> set but next descriptor not valid). In that case it must not
> use any parts of the chain - it will later be completed by device,
> but driver is allowed to store the valid parts of the chain as
> device is not allowed to change them anymore.
>
> Descriptor should not have both VRING_DESC_F_MORE and
> VRING_DESC_F_NEXT set.
>
> * Using descriptors in order
>
> Some devices can guarantee that descriptors are used in
> the order in which they were made available.
> This allows driver optimizations and can be negotiated through
> a feature bit.
>
> * Per ring flags
>
> It is useful to support features for some rings but not others.
> E.g. it's reasonable to use single buffers for RX rings but
> sg or indirect for TX rings of the network device.
> Generic configuration space will be extended so features can
> be negotiated per ring.
>
> * Selective use of descriptors
>
> As described above, descriptors with NEXT bit set are part of a
> scatter/gather chain and so do not have to cause device to write a used
> descriptor out.
>
> Similarly, driver can set a flag VRING_DESC_F_MORE in the descriptor to
> signal to device that it does not have to write out the used descriptor
> as it is part of a batch of descriptors. Device has two options (similar
> to VRING_DESC_F_NEXT):
>
> Device can write out the same number of descriptors for the batch,
> setting VRING_DESC_F_MORE for all but the last descriptor.
> Driver will ignore all used descriptors with VRING_DESC_F_MORE bit set.
>
> Device only writes out a single descriptor for the whole batch.
> However, to keep device and driver in sync, it then skips a number of
> descriptors corresponding to the length of the batch before writing out
> the next used descriptor.
> After detecting a used descriptor driver must find out the length of the
> batch that it built in order to know where to look for the next
> device descriptor.
>
>
> TODO (blocker): skipping descriptors for selective and
> scatter/gather seem to be only requested with in-order right now. Let's
> require in-order for this skipping? This will simplify the accounting
> by driver.
>
>
> * Interrupt/event suppression
>
> virtio 1.0 has two mechanisms for suppression but only
> one can be used at a time. we pack them together
> in a structure - one for interrupts, one for notifications:
>
> struct event {
> __le16 idx;
> __le16 flags;
> }
>
> Both fields would be optional, with a feature bit:
> VIRTIO_F_EVENT_IDX
> VIRTIO_F_EVENT_FLAGS
>
> Flags can be used like in virtio 1.0, by storing a special
> value there:
>
> #define VRING_F_EVENT_ENABLE 0x0
>
> #define VRING_F_EVENT_DISABLE 0x1
>
> Event index includes the index of the descriptor
> which should trigger the event, and the wrap counter
> in the high bit.
>
> In that case, interrupt triggers when descriptor is written at a given
> location in the ring (or skipped in case of NEXT/MORE).
>
> If both features are negotiated, a special flags value
> can be used to switch to event idx:
>
> #define VRING_F_EVENT_IDX 0x2
>
> * Available notification
>
> Driver currently writes out the queue number to device to
> kick off ring processing.
>
> As queue number is currently 16 bit, we can extend that
> to additionally include the offset within ring of the descriptor
> which triggered the kick event in bits 16 to 30,
> and the wrap counter in the high bit (31).
>
> Device is allowed to pre-fetch descriptors beyond the specified
> offset but is not required to do so.
>
>
>
> * TODO: interrupt coalescing
>
> Does it make sense just for networking or for all devices?
> If later should we make it a per ring or a global feature?
>
>
> * TODO: event index/flags in device memory?
>
> Should we move the event index/flags to device memory?
> Might be helpful for hardware configuration so they do not
> need to do DMA reads to check whether interrupt is needed.
Agreed. This also resembles physical devices more closely.
> OTOH maybe interrupt coalescing is sufficient for this.
>
>
> * TODO: Device specific descriptor flags
>
> We have a lot of unused space in the descriptor. This can be put to
> good use by reserving some flag bits for device use.
> For example, network device can set a bit to request
> that header in the descriptor is suppressed
> (in case it's all 0s anyway). This reduces cache utilization.
>
> Note: this feature can be supported in virtio 1.0 as well,
> as we have unused bits in both descriptor and used ring there.
>
> * TODO: Descriptor length in device descriptors
>
> Some devices use identically-sized buffers in all descriptors.
> Ignoring length for driver descriptors there could be an option too.
>
> * TODO: Writing at an offset
>
> Some devices might want to write into some descriptors
> at an offset, the offset would be in reserved field in the descriptor,
> possibly a descriptor flag could indicate this:
>
> #define VRING_DESC_F_OFFSET 0x0020
>
> How exactly to use the offset would be device specific,
> for example it can be used to align beginning of packet
> in the 1st buffer for mergeable buffers to cache line boundary
> while also aligning rest of buffers.
>
> * TODO: Non power-of-2 ring sizes
>
> As the ring simply wraps around, there's no reason to
> require ring size to be power of two.
> It can be made a separate feature though.
>
>
> TODO: limits on buffer alignment/size
>
> Seems to be useful for RX for networking.
> Is there a need to negotiate above in a generic way
> or is this a networking specific optimization?
>
> TODO: expose wrap counters to device for debugging
>
> TODO: expose last avail/used offsets to device/driver for debugging
>
> TODO: ability to reset individual rings
>
> ---
>
> Note: should this proposal be accepted and approved, one or more
> claims disclosed to the TC admin and listed on the Virtio TC
> IPR page https://www.oasis-open.org/committees/virtio/ipr.php
> might become Essential Claims.
> Note: the page above is unfortunately out of date and out of
> my hands. I'm in the process of updating ipr disclosures
> in github instead. Will make sure all is in place before
> this proposal is put to vote. As usual this TC operates under the
> Non-Assertion Mode of the OASIS IPR Policy, which protects
> anyone implementing the virtio spec.
>
> --
> MST
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [virtio-dev] packed ring layout proposal v3
2017-09-10 5:06 Michael S. Tsirkin
2017-09-12 16:20 ` Willem de Bruijn
@ 2017-09-20 9:11 ` Liang, Cunming
2017-09-25 22:24 ` Michael S. Tsirkin
2017-09-21 13:36 ` Liang, Cunming
2 siblings, 1 reply; 14+ messages in thread
From: Liang, Cunming @ 2017-09-20 9:11 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-dev@lists.oasis-open.org
Cc: virtualization@lists.linux-foundation.org
Hi Michael,
> -----Original Message-----
> From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org]
> On Behalf Of Michael S. Tsirkin
> Sent: Sunday, September 10, 2017 1:06 PM
> To: virtio-dev@lists.oasis-open.org
> Cc: virtualization@lists.linux-foundation.org
> Subject: [virtio-dev] packed ring layout proposal v3
>
[...]
> * Descriptor ring:
>
> Driver writes descriptors with unique index values and DESC_DRIVER set in
> flags.
> Descriptors are written in a ring order: from start to end of ring, wrapping
> around to the beginning.
> Device writes used descriptors with correct len, index, and DESC_HW clear.
> Again descriptors are written in ring order. This might not be the same order
> of driver descriptors, and not all descriptors have to be written out.
>
> Driver and device are expected to maintain (internally) a wrap-around bit,
> starting at 0 and changing value each time they start writing out descriptors
> at the beginning of the ring. This bit is passed as DESC_WRAP bit in the flags
> field.
One simple question there, trying to understand the usage of DESC_WRAP flag.
DESC_WRAP bit is a new flag since v2. It's used to address 'non power-of-2 ring sizes' mentioned in v2?
Being confused by the statement of wrap-around bit here, it's an internal wrap-around counter represented by single bit (0/1)?
DESC_WRAP can appear on any descriptor entry in the ring, why it highlights changing value at the beginning of the ring?
>
> Flags are always set/cleared last.
>
> Note that driver can write descriptors out in any order, but device will not
> execute descriptor X+1 until descriptor X has been read as valid.
>
> Driver operation:
>
[...]
>
> DESC_WRAP - device uses this field to detect descriptor change by driver.
Device uses this field to detect change of wrap-around boundary by driver?
[...]
>
> Device operation (using descriptors):
>
[...]
>
> DESC_WRAP - driver uses this field to detect descriptor change by device.
Driver uses this field to detect change of wrap-around boundary by device?
By using this, driver doesn't need to maintain any internal wrap-around count, but being aware of wrap-around by DESC_WRAP flag.
Thanks,
Steve
>
[...]
>
> ---
>
> Note: should this proposal be accepted and approved, one or more
> claims disclosed to the TC admin and listed on the Virtio TC
> IPR page https://www.oasis-open.org/committees/virtio/ipr.php
> might become Essential Claims.
> Note: the page above is unfortunately out of date and out of
> my hands. I'm in the process of updating ipr disclosures
> in github instead. Will make sure all is in place before
> this proposal is put to vote. As usual this TC operates under the
> Non-Assertion Mode of the OASIS IPR Policy, which protects
> anyone implementing the virtio spec.
>
> --
> MST
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [virtio-dev] packed ring layout proposal v3
2017-09-10 5:06 Michael S. Tsirkin
2017-09-12 16:20 ` Willem de Bruijn
2017-09-20 9:11 ` Liang, Cunming
@ 2017-09-21 13:36 ` Liang, Cunming
2017-09-28 21:27 ` Michael S. Tsirkin
2 siblings, 1 reply; 14+ messages in thread
From: Liang, Cunming @ 2017-09-21 13:36 UTC (permalink / raw)
To: Michael S. Tsirkin, virtio-dev@lists.oasis-open.org
Cc: virtualization@lists.linux-foundation.org
Hi,
> -----Original Message-----
> From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org] On
> Behalf Of Michael S. Tsirkin
> Sent: Sunday, September 10, 2017 1:06 PM
> To: virtio-dev@lists.oasis-open.org
> Cc: virtualization@lists.linux-foundation.org
> Subject: [virtio-dev] packed ring layout proposal v3
>
[...]
> * Batching descriptors:
>
> virtio 1.0 allows passing a batch of descriptors in both directions, by incrementing
> the used/avail index by values > 1.
> At the moment only batching of used descriptors is used.
>
> We can support this by chaining a list of device descriptors through
> VRING_DESC_F_MORE flag. Device sets this bit to signal driver that this is part of
> a batch of used descriptors which are all part of a single transaction.
It supposes each s/g chain represents for a packet, while each descriptor among batching chain represents for a packet. There're a few thoughts of batching chain(by VRING_DESC_F_MORE) and s/g chain(by VRING_DESC_F_NEXT).
- batching chain: It's up to device to coalesce the write-out of a batched used descriptors. As the batching size is variable, driver has to detect validity of each descriptor unless the number of incoming valid descriptor is predictable, being curious on the benefits of driver from VRING_DESC_F_MORE to take batching descriptors as a single transaction. On device perspective, it's great to write out one descriptor for the whole chain. However, it assumes no other useful fields in each descriptor of chain needs to write. TX buffer reclaiming can be the candidate, while RX side has to update 'len' at least. Even for this purpose, instead of writing out VRING_DESC_F_MORE on a few descriptors to suppress device writing back, it's cheaper to set flag (e.g. VRING_DESC_F_WB) on single descriptor of chain to hint the expected position for device to write back.
- s/g chain: It makes sense to indicate the packet boundary. Considering in-order descriptor ring without VRING_DESC_F_INDIRECT, the next descriptor always belongs to the same s/g chain until end of packet indicators occur. So one alternative approach is only to set a flag (e.g. VRING_DESC_F_EOP) on the last descriptor of the chain.
>
> Driver might detect a partial descriptor chain (VRING_DESC_F_MORE set but next
> descriptor not valid). In that case it must not use any parts of the chain - it will
> later be completed by device, but driver is allowed to store the valid parts of the
> chain as device is not allowed to change them anymore.
As each descriptor represent for a whole packet(otherwise it's s/g chain), wondering why it must not use any parts of the chain.
>
> Descriptor should not have both VRING_DESC_F_MORE and
> VRING_DESC_F_NEXT set.
>
[...]
>
> * Selective use of descriptors
>
> As described above, descriptors with NEXT bit set are part of a scatter/gather
> chain and so do not have to cause device to write a used descriptor out.
>
> Similarly, driver can set a flag VRING_DESC_F_MORE in the descriptor to signal to
> device that it does not have to write out the used descriptor as it is part of a batch
> of descriptors. Device has two options (similar to VRING_DESC_F_NEXT):
>
> Device can write out the same number of descriptors for the batch, setting
> VRING_DESC_F_MORE for all but the last descriptor.
> Driver will ignore all used descriptors with VRING_DESC_F_MORE bit set.
It will write out last descriptor without VRING_DESC_F_MORE anyway, so the following statement seems not like another option.
>
> Device only writes out a single descriptor for the whole batch.
> However, to keep device and driver in sync, it then skips a number of descriptors
> corresponding to the length of the batch before writing out the next used
> descriptor.
> After detecting a used descriptor driver must find out the length of the batch that
> it built in order to know where to look for the next device descriptor.
It would be good to keep it simple on device side, and to have the driver control the expectation.
>
>
> TODO (blocker): skipping descriptors for selective and scatter/gather seem to be
> only requested with in-order right now. Let's require in-order for this skipping?
> This will simplify the accounting by driver.
>
>
Thanks,
Steve
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
2017-09-20 9:11 ` Liang, Cunming
@ 2017-09-25 22:24 ` Michael S. Tsirkin
2017-09-26 23:38 ` Steven Luong (sluong)
0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2017-09-25 22:24 UTC (permalink / raw)
To: Liang, Cunming
Cc: virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
On Wed, Sep 20, 2017 at 09:11:57AM +0000, Liang, Cunming wrote:
> Hi Michael,
>
> > -----Original Message-----
> > From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org]
> > On Behalf Of Michael S. Tsirkin
> > Sent: Sunday, September 10, 2017 1:06 PM
> > To: virtio-dev@lists.oasis-open.org
> > Cc: virtualization@lists.linux-foundation.org
> > Subject: [virtio-dev] packed ring layout proposal v3
> >
> [...]
> > * Descriptor ring:
> >
> > Driver writes descriptors with unique index values and DESC_DRIVER set in
> > flags.
> > Descriptors are written in a ring order: from start to end of ring, wrapping
> > around to the beginning.
> > Device writes used descriptors with correct len, index, and DESC_HW clear.
> > Again descriptors are written in ring order. This might not be the same order
> > of driver descriptors, and not all descriptors have to be written out.
> >
> > Driver and device are expected to maintain (internally) a wrap-around bit,
> > starting at 0 and changing value each time they start writing out descriptors
> > at the beginning of the ring. This bit is passed as DESC_WRAP bit in the flags
> > field.
>
> One simple question there, trying to understand the usage of DESC_WRAP flag.
>
> DESC_WRAP bit is a new flag since v2. It's used to address 'non power-of-2 ring sizes' mentioned in v2?
>
> Being confused by the statement of wrap-around bit here, it's an internal wrap-around counter represented by single bit (0/1)?
> DESC_WRAP can appear on any descriptor entry in the ring, why it highlights changing value at the beginning of the ring?
No, this is necessary if not all descriptors are overwritten by device
after they are used.
Each time driver overwrites a descriptor, the value in DESC_WRAP changes
which makes it possible for device to detect that there's a new
descriptor.
> >
> > Flags are always set/cleared last.
> >
> > Note that driver can write descriptors out in any order, but device will not
> > execute descriptor X+1 until descriptor X has been read as valid.
> >
> > Driver operation:
> >
> [...]
> >
> > DESC_WRAP - device uses this field to detect descriptor change by driver.
>
> Device uses this field to detect change of wrap-around boundary by driver?
>
> [...]
> >
> > Device operation (using descriptors):
> >
> [...]
> >
> > DESC_WRAP - driver uses this field to detect descriptor change by device.
>
> Driver uses this field to detect change of wrap-around boundary by device?
>
> By using this, driver doesn't need to maintain any internal wrap-around count, but being aware of wrap-around by DESC_WRAP flag.
>
>
> Thanks,
> Steve
So v2 simply said descriptor has a single bit: driver writes 1 there,
device writes 0.
This requires device to overwrite each descriptor and people asked
for a way to communicate where some descriptors are not overwritten.
This new bit helps device distinguish new and old descriptors written by driver.
> >
> [...]
> >
> > ---
> >
> > Note: should this proposal be accepted and approved, one or more
> > claims disclosed to the TC admin and listed on the Virtio TC
> > IPR page https://www.oasis-open.org/committees/virtio/ipr.php
> > might become Essential Claims.
> > Note: the page above is unfortunately out of date and out of
> > my hands. I'm in the process of updating ipr disclosures
> > in github instead. Will make sure all is in place before
> > this proposal is put to vote. As usual this TC operates under the
> > Non-Assertion Mode of the OASIS IPR Policy, which protects
> > anyone implementing the virtio spec.
> >
> > --
> > MST
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
2017-09-25 22:24 ` Michael S. Tsirkin
@ 2017-09-26 23:38 ` Steven Luong (sluong)
2017-09-27 23:49 ` Michael S. Tsirkin
0 siblings, 1 reply; 14+ messages in thread
From: Steven Luong (sluong) @ 2017-09-26 23:38 UTC (permalink / raw)
To: Michael S. Tsirkin, Liang, Cunming
Cc: virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
Michael,
Would you please give an example or two how these two flags DESC_DRIVER and DESC_WRAP are used together? Like others, I am confused by the description and still don’t quite grok it.
Steven
On 9/25/17, 3:24 PM, "virtio-dev@lists.oasis-open.org on behalf of Michael S. Tsirkin" <virtio-dev@lists.oasis-open.org on behalf of mst@redhat.com> wrote:
On Wed, Sep 20, 2017 at 09:11:57AM +0000, Liang, Cunming wrote:
> Hi Michael,
>
> > -----Original Message-----
> > From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org]
> > On Behalf Of Michael S. Tsirkin
> > Sent: Sunday, September 10, 2017 1:06 PM
> > To: virtio-dev@lists.oasis-open.org
> > Cc: virtualization@lists.linux-foundation.org
> > Subject: [virtio-dev] packed ring layout proposal v3
> >
> [...]
> > * Descriptor ring:
> >
> > Driver writes descriptors with unique index values and DESC_DRIVER set in
> > flags.
> > Descriptors are written in a ring order: from start to end of ring, wrapping
> > around to the beginning.
> > Device writes used descriptors with correct len, index, and DESC_HW clear.
> > Again descriptors are written in ring order. This might not be the same order
> > of driver descriptors, and not all descriptors have to be written out.
> >
> > Driver and device are expected to maintain (internally) a wrap-around bit,
> > starting at 0 and changing value each time they start writing out descriptors
> > at the beginning of the ring. This bit is passed as DESC_WRAP bit in the flags
> > field.
>
> One simple question there, trying to understand the usage of DESC_WRAP flag.
>
> DESC_WRAP bit is a new flag since v2. It's used to address 'non power-of-2 ring sizes' mentioned in v2?
>
> Being confused by the statement of wrap-around bit here, it's an internal wrap-around counter represented by single bit (0/1)?
> DESC_WRAP can appear on any descriptor entry in the ring, why it highlights changing value at the beginning of the ring?
No, this is necessary if not all descriptors are overwritten by device
after they are used.
Each time driver overwrites a descriptor, the value in DESC_WRAP changes
which makes it possible for device to detect that there's a new
descriptor.
> >
> > Flags are always set/cleared last.
> >
> > Note that driver can write descriptors out in any order, but device will not
> > execute descriptor X+1 until descriptor X has been read as valid.
> >
> > Driver operation:
> >
> [...]
> >
> > DESC_WRAP - device uses this field to detect descriptor change by driver.
>
> Device uses this field to detect change of wrap-around boundary by driver?
>
> [...]
> >
> > Device operation (using descriptors):
> >
> [...]
> >
> > DESC_WRAP - driver uses this field to detect descriptor change by device.
>
> Driver uses this field to detect change of wrap-around boundary by device?
>
> By using this, driver doesn't need to maintain any internal wrap-around count, but being aware of wrap-around by DESC_WRAP flag.
>
>
> Thanks,
> Steve
So v2 simply said descriptor has a single bit: driver writes 1 there,
device writes 0.
This requires device to overwrite each descriptor and people asked
for a way to communicate where some descriptors are not overwritten.
This new bit helps device distinguish new and old descriptors written by driver.
> >
> [...]
> >
> > ---
> >
> > Note: should this proposal be accepted and approved, one or more
> > claims disclosed to the TC admin and listed on the Virtio TC
> > IPR page https://www.oasis-open.org/committees/virtio/ipr.php
> > might become Essential Claims.
> > Note: the page above is unfortunately out of date and out of
> > my hands. I'm in the process of updating ipr disclosures
> > in github instead. Will make sure all is in place before
> > this proposal is put to vote. As usual this TC operates under the
> > Non-Assertion Mode of the OASIS IPR Policy, which protects
> > anyone implementing the virtio spec.
> >
> > --
> > MST
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
2017-09-26 23:38 ` Steven Luong (sluong)
@ 2017-09-27 23:49 ` Michael S. Tsirkin
2017-09-28 9:44 ` Liang, Cunming
2017-09-28 21:13 ` Michael S. Tsirkin
0 siblings, 2 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2017-09-27 23:49 UTC (permalink / raw)
To: Steven Luong (sluong)
Cc: Liang, Cunming, virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
On Tue, Sep 26, 2017 at 11:38:18PM +0000, Steven Luong (sluong) wrote:
> Michael,
>
> Would you please give an example or two how these two flags DESC_DRIVER and DESC_WRAP are used together? Like others, I am confused by the description and still don’t quite grok it.
>
> Steven
My bad, I will need to work on it. Here is an example:
Let's assume device promised to consume packets in order
ring size = 2
Ring is 0 initialized.
Device initially polls DESC[0].flags for WRAP bit to change.
driver adds:
DESC[0].addr = 1234
DESC[0].id = 0
DESC[0].flags = DESC_DRIVER | DESC_NEXT | DESC_WRAP
and
DESC[0].addr = 5678
DESC[1].id = 1
DESC[1].flags = DESC_DRIVER | DESC_WRAP
it now starts polling DESC[0] flags.
Device reads 1234, executes it, does not use it.
Device reads 5678, executes it, and uses it:
DESC[0].id = 1
DESC[0].flags = 0
Device now polls DESC[0].flags for WRAP bit to change.
Now driver sees that DRIVER bit has been cleared, so it nows that id is
valid. I sees id 1, therefore id 0 and 1 has been read and are safe to
overwrite.
So it writes it out. It wrapped around to beginning of ring,
so it flips the WRAP bit to 0 on all descriptors now:
DESC[0].addr = 9ABC
DESC[0].id = 0
DESC[0].flags = DESC_DRIVER | DESC_NEXT
DESC[0].addr = DEF0
DESC[0].id = 1
DESC[0].flags = DESC_DRIVER
Next round wrap will be 1 again.
To summarise:
DRIVER bit is used by driver to detect device has used one or more
descriptors. WRAP is is used by device to detect driver has made a
new descriptor available.
--
MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [virtio-dev] packed ring layout proposal v3
2017-09-27 23:49 ` Michael S. Tsirkin
@ 2017-09-28 9:44 ` Liang, Cunming
2017-10-01 4:08 ` Michael S. Tsirkin
2017-09-28 21:13 ` Michael S. Tsirkin
1 sibling, 1 reply; 14+ messages in thread
From: Liang, Cunming @ 2017-09-28 9:44 UTC (permalink / raw)
To: Michael S. Tsirkin, Steven Luong (sluong)
Cc: virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
Get it now. Please correct me if I missing something.
Flags status hints,
- DESC_DRIVER only: driver owns the descriptor w/o available info ready for device to use
- DESC_DRIVER | DESC_WRAP: driver has prepared an available descriptor, device hasn't used it yet
- None: device has used the descriptor, and write descriptor out
- DESC_WRAP only: shall not happen, device make sure to clear it
Polling behavior is,
- Device monitor DESC_WRAP bit set or not; If set, go to use descriptor and clear DESC_DRIVER bit in the end (note: always need to clear DESC_WRAP)
- Driver monitor DESC_DRIVER bit cleared or not; If cleared, reclaim descriptor(set DESC_DRIVER) and set DESC_WRAP once new available descriptor get ready to go
--
Steve
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Thursday, September 28, 2017 7:49 AM
> To: Steven Luong (sluong)
> Cc: Liang, Cunming; virtio-dev@lists.oasis-open.org;
> virtualization@lists.linux-foundation.org
> Subject: Re: [virtio-dev] packed ring layout proposal v3
>
> On Tue, Sep 26, 2017 at 11:38:18PM +0000, Steven Luong (sluong) wrote:
> > Michael,
> >
> > Would you please give an example or two how these two flags
> DESC_DRIVER and DESC_WRAP are used together? Like others, I am
> confused by the description and still don’t quite grok it.
> >
> > Steven
>
> My bad, I will need to work on it. Here is an example:
>
> Let's assume device promised to consume packets in order
>
> ring size = 2
>
> Ring is 0 initialized.
>
> Device initially polls DESC[0].flags for WRAP bit to change.
>
> driver adds:
>
> DESC[0].addr = 1234
> DESC[0].id = 0
> DESC[0].flags = DESC_DRIVER | DESC_NEXT | DESC_WRAP
>
> and
>
> DESC[0].addr = 5678
> DESC[1].id = 1
> DESC[1].flags = DESC_DRIVER | DESC_WRAP
>
>
> it now starts polling DESC[0] flags.
>
>
> Device reads 1234, executes it, does not use it.
>
> Device reads 5678, executes it, and uses it:
>
> DESC[0].id = 1
> DESC[0].flags = 0
>
> Device now polls DESC[0].flags for WRAP bit to change.
>
> Now driver sees that DRIVER bit has been cleared, so it nows that id is valid. I
> sees id 1, therefore id 0 and 1 has been read and are safe to overwrite.
>
> So it writes it out. It wrapped around to beginning of ring, so it flips the
> WRAP bit to 0 on all descriptors now:
>
> DESC[0].addr = 9ABC
> DESC[0].id = 0
> DESC[0].flags = DESC_DRIVER | DESC_NEXT
>
>
> DESC[0].addr = DEF0
> DESC[0].id = 1
> DESC[0].flags = DESC_DRIVER
>
>
> Next round wrap will be 1 again.
>
>
> To summarise:
>
> DRIVER bit is used by driver to detect device has used one or more
> descriptors. WRAP is is used by device to detect driver has made a new
> descriptor available.
>
>
> --
> MST
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
2017-09-27 23:49 ` Michael S. Tsirkin
2017-09-28 9:44 ` Liang, Cunming
@ 2017-09-28 21:13 ` Michael S. Tsirkin
1 sibling, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2017-09-28 21:13 UTC (permalink / raw)
To: Steven Luong (sluong)
Cc: Liang, Cunming, virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
On Thu, Sep 28, 2017 at 02:49:15AM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 26, 2017 at 11:38:18PM +0000, Steven Luong (sluong) wrote:
> > Michael,
> >
> > Would you please give an example or two how these two flags DESC_DRIVER and DESC_WRAP are used together? Like others, I am confused by the description and still don’t quite grok it.
> >
> > Steven
Note: I made a mistake in the email. Instead of DESC_NEXT it should read
DESC_MORE everywhere. I corrected the quoted text below for simplicity.
> My bad, I will need to work on it. Here is an example:
>
> Let's assume device promised to consume packets in order
>
> ring size = 2
>
> Ring is 0 initialized.
>
> Device initially polls DESC[0].flags for WRAP bit to change.
>
> driver adds:
>
> DESC[0].addr = 1234
> DESC[0].id = 0
> DESC[0].flags = DESC_DRIVER | DESC_MORE | DESC_WRAP
>
> and
>
> DESC[0].addr = 5678
> DESC[1].id = 1
> DESC[1].flags = DESC_DRIVER | DESC_WRAP
>
>
> it now starts polling DESC[0] flags.
>
>
> Device reads 1234, executes it, does not use it.
>
> Device reads 5678, executes it, and uses it:
>
> DESC[0].id = 1
> DESC[0].flags = 0
>
> Device now polls DESC[0].flags for WRAP bit to change.
>
> Now driver sees that DRIVER bit has been cleared, so it nows that id is
> valid. I sees id 1, therefore id 0 and 1 has been read and are safe to
> overwrite.
>
> So it writes it out. It wrapped around to beginning of ring,
> so it flips the WRAP bit to 0 on all descriptors now:
>
> DESC[0].addr = 9ABC
> DESC[0].id = 0
> DESC[0].flags = DESC_DRIVER | DESC_MORE
>
>
> DESC[0].addr = DEF0
> DESC[0].id = 1
> DESC[0].flags = DESC_DRIVER
>
>
> Next round wrap will be 1 again.
>
>
> To summarise:
>
> DRIVER bit is used by driver to detect device has used one or more
> descriptors. WRAP is is used by device to detect driver has made a
> new descriptor available.
>
>
> --
> MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
2017-09-21 13:36 ` Liang, Cunming
@ 2017-09-28 21:27 ` Michael S. Tsirkin
0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2017-09-28 21:27 UTC (permalink / raw)
To: Liang, Cunming
Cc: virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
On Thu, Sep 21, 2017 at 01:36:37PM +0000, Liang, Cunming wrote:
> Hi,
>
> > -----Original Message-----
> > From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org] On
> > Behalf Of Michael S. Tsirkin
> > Sent: Sunday, September 10, 2017 1:06 PM
> > To: virtio-dev@lists.oasis-open.org
> > Cc: virtualization@lists.linux-foundation.org
> > Subject: [virtio-dev] packed ring layout proposal v3
> >
> [...]
> > * Batching descriptors:
> >
> > virtio 1.0 allows passing a batch of descriptors in both directions, by incrementing
> > the used/avail index by values > 1.
> > At the moment only batching of used descriptors is used.
> >
> > We can support this by chaining a list of device descriptors through
> > VRING_DESC_F_MORE flag. Device sets this bit to signal driver that this is part of
> > a batch of used descriptors which are all part of a single transaction.
>
> It supposes each s/g chain represents for a packet, while each descriptor among batching chain represents for a packet. There're a few thoughts of batching chain(by VRING_DESC_F_MORE) and s/g chain(by VRING_DESC_F_NEXT).
>
> - batching chain: It's up to device to coalesce the write-out of a batched used descriptors. As the batching size is variable, driver has to detect validity of each descriptor unless the number of incoming valid descriptor is predictable, being curious on the benefits of driver from VRING_DESC_F_MORE to take batching descriptors as a single transaction. On device perspective, it's great to write out one descriptor for the whole chain. However, it assumes no other useful fields in each descriptor of chain needs to write. TX buffer reclaiming can be the candidate, while RX side has to update 'len' at least. Even for this purpose, instead of writing out VRING_DESC_F_MORE on a few descriptors to suppress device writing back, it's cheaper to set flag (e.g. VRING_DESC_F_WB) on single descriptor of chain to hint the expected position for device to write back.
But driver does not really benefit from batching and does not know how
many to batch, this depends on device. E.g. a software device does not
need batching at all, a pci express device would want batches to be
multiples of what fits in a pci express transaction, etc. We would have
to provide that info from device to driver.
> - s/g chain: It makes sense to indicate the packet boundary. Considering in-order descriptor ring without VRING_DESC_F_INDIRECT, the next descriptor always belongs to the same s/g chain until end of packet indicators occur. So one alternative approach is only to set a flag (e.g. VRING_DESC_F_EOP) on the last descriptor of the chain.
EOP would be the reverse of NEXT then? I think it does not matter much,
but NEXT matches what is there in virtio 1.0 right now. It also means that
simple implementations with short buffers can have flags set to 0 which
seems cleaner.
> >
> > Driver might detect a partial descriptor chain (VRING_DESC_F_MORE set but next
> > descriptor not valid). In that case it must not use any parts of the chain - it will
> > later be completed by device, but driver is allowed to store the valid parts of the
> > chain as device is not allowed to change them anymore.
> As each descriptor represent for a whole packet(otherwise it's s/g chain),
For RX mergeable buffers, a packet is composed of multiple s/g chains.
> wondering why it must not use any parts of the chain.
This is to match what is there in virtio 1.0 right now: driver
does not touch any used descriptors until the used index is updated.
> >
> > Descriptor should not have both VRING_DESC_F_MORE and
> > VRING_DESC_F_NEXT set.
> >
> [...]
> >
> > * Selective use of descriptors
> >
> > As described above, descriptors with NEXT bit set are part of a scatter/gather
> > chain and so do not have to cause device to write a used descriptor out.
> >
> > Similarly, driver can set a flag VRING_DESC_F_MORE in the descriptor to signal to
> > device that it does not have to write out the used descriptor as it is part of a batch
> > of descriptors. Device has two options (similar to VRING_DESC_F_NEXT):
> >
> > Device can write out the same number of descriptors for the batch, setting
> > VRING_DESC_F_MORE for all but the last descriptor.
> > Driver will ignore all used descriptors with VRING_DESC_F_MORE bit set.
> It will write out last descriptor without VRING_DESC_F_MORE anyway, so the following statement seems not like another option.
I don't understand this statement. All I said is that it's up to device
whether to write out the descriptors with VRING_DESC_F_MORE, or to skip
the write out.
> >
> > Device only writes out a single descriptor for the whole batch.
> > However, to keep device and driver in sync, it then skips a number of descriptors
> > corresponding to the length of the batch before writing out the next used
> > descriptor.
> > After detecting a used descriptor driver must find out the length of the batch that
> > it built in order to know where to look for the next device descriptor.
> It would be good to keep it simple on device side, and to have the driver control the expectation.
I'm not sure what above means either.
That is exactly what above proposal says: device simply writes out a single
descriptor. Driver has to keep track and know where the next one will be
written.
Example
Driver writes two pairs chained with MORE: 0 + 1, 2 + 3
Device writes: 0 and 3
> >
> >
> > TODO (blocker): skipping descriptors for selective and scatter/gather seem to be
> > only requested with in-order right now. Let's require in-order for this skipping?
> > This will simplify the accounting by driver.
> >
> >
>
> Thanks,
> Steve
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
2017-09-28 9:44 ` Liang, Cunming
@ 2017-10-01 4:08 ` Michael S. Tsirkin
[not found] ` <20171004123901.oomueufqg52uhas4@localhost.localdomain>
0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2017-10-01 4:08 UTC (permalink / raw)
To: Liang, Cunming
Cc: Steven Luong (sluong), virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
On Thu, Sep 28, 2017 at 09:44:35AM +0000, Liang, Cunming wrote:
>
> Get it now. Please correct me if I missing something.
>
>
> Flags status hints,
>
> - DESC_DRIVER only: driver owns the descriptor w/o available info ready for device to use
>
> - DESC_DRIVER | DESC_WRAP: driver has prepared an available descriptor, device hasn't used it yet
>
> - None: device has used the descriptor, and write descriptor out
>
> - DESC_WRAP only: shall not happen, device make sure to clear it
>
>
> Polling behavior is,
>
> - Device monitor DESC_WRAP bit set or not; If set, go to use descriptor and clear DESC_DRIVER bit in the end (note: always need to clear DESC_WRAP)
>
> - Driver monitor DESC_DRIVER bit cleared or not; If cleared, reclaim descriptor(set DESC_DRIVER) and set DESC_WRAP once new available descriptor get ready to go
>
>
> --
> Steve
Hmm no, not what I had in mind.
DESC_DRIVER: used by driver to poll. Driver sets it when writing a
descriptor. Device clears it when overwriting a descriptor.
Thus driver uses DESC_DRIVER to detect that device data in
descriptor is valid.
DESC_WRAP: used by device to poll. Driver sets it to a *different*
value every time it overwrites a descriptor. How to achieve it?
since descriptors are written out in ring order,
simply maintain the current value internally (start value 1) and flip it
every time you overwrite the first descriptor.
Device leaves it intact when overwriting a descriptor.
After writing down this explanation, I think the names aren't
great.
Let me try an alternative explanation.
---------------
A two-bit field, DRIVER_OWNER, signals the buffer ownership.
It has 4 possible values:
values 0x1, 0x11 are written by driver
values 0x0, 0x10 are written by device
each time driver writes out a descriptor, it must make sure
that the high bit in OWNER changes.
each time device writes out a descriptor, it must make sure
that the high bit in OWNER does not change.
this is exactly the same functionally, DRIVER is high bit and
WRAP is the low bit. Does this make things clearer?
---------------
Maybe the difference between device and driver
is confusing. We can fix that by changing the values.
Here is an alternative. Let me know if you like it better -
I need to think a bit more to make sure it works,
but to give you an idea:
---------------
A two-bit field, DRIVER_OWNER, signals the buffer ownership.
It has 4 possible values:
values 0x1, 0x10 are written by driver
values 0x0, 0x11 are written by device
each time driver writes out a descriptor, it must make sure
that the high bit in OWNER changes. Thus first time
it writes 0x10, next time 0x1, then 0x10 again.
each time device writes out a descriptor, it must make sure
that the low bit in OWNER changes. Thus first time
it writes 0x11, next time 0x0, then 0x11 again.
---------------
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Thursday, September 28, 2017 7:49 AM
> > To: Steven Luong (sluong)
> > Cc: Liang, Cunming; virtio-dev@lists.oasis-open.org;
> > virtualization@lists.linux-foundation.org
> > Subject: Re: [virtio-dev] packed ring layout proposal v3
> >
> > On Tue, Sep 26, 2017 at 11:38:18PM +0000, Steven Luong (sluong) wrote:
> > > Michael,
> > >
> > > Would you please give an example or two how these two flags
> > DESC_DRIVER and DESC_WRAP are used together? Like others, I am
> > confused by the description and still don’t quite grok it.
> > >
> > > Steven
> >
> > My bad, I will need to work on it. Here is an example:
> >
> > Let's assume device promised to consume packets in order
> >
> > ring size = 2
> >
> > Ring is 0 initialized.
> >
> > Device initially polls DESC[0].flags for WRAP bit to change.
> >
> > driver adds:
> >
> > DESC[0].addr = 1234
> > DESC[0].id = 0
> > DESC[0].flags = DESC_DRIVER | DESC_NEXT | DESC_WRAP
> >
> > and
> >
> > DESC[0].addr = 5678
> > DESC[1].id = 1
> > DESC[1].flags = DESC_DRIVER | DESC_WRAP
> >
> >
> > it now starts polling DESC[0] flags.
> >
> >
> > Device reads 1234, executes it, does not use it.
> >
> > Device reads 5678, executes it, and uses it:
> >
> > DESC[0].id = 1
> > DESC[0].flags = 0
> >
> > Device now polls DESC[0].flags for WRAP bit to change.
> >
> > Now driver sees that DRIVER bit has been cleared, so it nows that id is valid. I
> > sees id 1, therefore id 0 and 1 has been read and are safe to overwrite.
> >
> > So it writes it out. It wrapped around to beginning of ring, so it flips the
> > WRAP bit to 0 on all descriptors now:
> >
> > DESC[0].addr = 9ABC
> > DESC[0].id = 0
> > DESC[0].flags = DESC_DRIVER | DESC_NEXT
> >
> >
> > DESC[0].addr = DEF0
> > DESC[0].id = 1
> > DESC[0].flags = DESC_DRIVER
> >
> >
> > Next round wrap will be 1 again.
> >
> >
> > To summarise:
> >
> > DRIVER bit is used by driver to detect device has used one or more
> > descriptors. WRAP is is used by device to detect driver has made a new
> > descriptor available.
> >
> >
> > --
> > MST
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
[not found] ` <20171004123901.oomueufqg52uhas4@localhost.localdomain>
@ 2017-10-04 12:58 ` Michael S. Tsirkin
2017-10-10 9:56 ` Liang, Cunming
0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2017-10-04 12:58 UTC (permalink / raw)
To: Jens Freimann
Cc: Liang, Cunming, Steven Luong (sluong),
virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
On Wed, Oct 04, 2017 at 02:39:01PM +0200, Jens Freimann wrote:
> On Sun, Oct 01, 2017 at 04:08:29AM +0000, Michael S. Tsirkin wrote:
> > On Thu, Sep 28, 2017 at 09:44:35AM +0000, Liang, Cunming wrote:
> > >
> > > Get it now. Please correct me if I missing something.
> > >
> > >
> > > Flags status hints,
> > >
> > > - DESC_DRIVER only: driver owns the descriptor w/o available info ready for device to use
> > >
> > > - DESC_DRIVER | DESC_WRAP: driver has prepared an available descriptor, device hasn't used it yet
> > >
> > > - None: device has used the descriptor, and write descriptor out
> > >
> > > - DESC_WRAP only: shall not happen, device make sure to clear it
> > >
> > >
> > > Polling behavior is,
> > >
> > > - Device monitor DESC_WRAP bit set or not; If set, go to use descriptor and clear DESC_DRIVER bit in the end (note: always need to clear DESC_WRAP)
> > >
> > > - Driver monitor DESC_DRIVER bit cleared or not; If cleared, reclaim descriptor(set DESC_DRIVER) and set DESC_WRAP once new available descriptor get ready to go
> > >
> > >
> > > --
> > > Steve
> >
> >
> > Hmm no, not what I had in mind.
> >
> > DESC_DRIVER: used by driver to poll. Driver sets it when writing a
> > descriptor. Device clears it when overwriting a descriptor.
> > Thus driver uses DESC_DRIVER to detect that device data in
> > descriptor is valid.
>
> Basically DESC_HW from v2 split in two?
Yes in order to avoid overwriting all descriptors.
> >
> > DESC_WRAP: used by device to poll. Driver sets it to a *different*
> > value every time it overwrites a descriptor. How to achieve it?
> > since descriptors are written out in ring order,
> > simply maintain the current value internally (start value 1) and flip it
> > every time you overwrite the first descriptor.
> > Device leaves it intact when overwriting a descriptor.
>
> This is confusing me a bit.
>
> My understanding is: 1. the internally kept wrap value only flipped when the
> first
> descriptor is overwritten
>
> 2. the moment the first descriptor is written the internal wrap value
> is flipped 0->1 or 1->0 and this value is written to every descriptor
> DESC_WRAP until we reach the first descriptor again
Yes this is what I tried to say. Can you suggest a better wording then?
> >
> > After writing down this explanation, I think the names aren't
> > great.
> >
> > Let me try an alternative explanation.
> >
> > ---------------
> > A two-bit field, DRIVER_OWNER, signals the buffer ownership.
> > It has 4 possible values:
> > values 0x1, 0x11 are written by driver
> > values 0x0, 0x10 are written by device
>
> The 0x prefix might add to the confusion here. It is really just two
> bits, no?
Ouch. Yes I meant 0b. Thanks!
> > each time driver writes out a descriptor, it must make sure
> > that the high bit in OWNER changes.
> >
> > each time device writes out a descriptor, it must make sure
> > that the high bit in OWNER does not change.
> >
> > this is exactly the same functionally, DRIVER is high bit and
> > WRAP is the low bit. Does this make things clearer?
>
> So far it makes sense to me.
> > ---------------
> >
> >
> >
> > Maybe the difference between device and driver
> > is confusing. We can fix that by changing the values.
> > Here is an alternative. Let me know if you like it better -
> > I need to think a bit more to make sure it works,
> > but to give you an idea:
> >
> >
> > ---------------
> > A two-bit field, DRIVER_OWNER, signals the buffer ownership.
> > It has 4 possible values:
> > values 0x1, 0x10 are written by driver
> > values 0x0, 0x11 are written by device
> >
> > each time driver writes out a descriptor, it must make sure
> > that the high bit in OWNER changes. Thus first time
> > it writes 0x10, next time 0x1, then 0x10 again.
> >
> > each time device writes out a descriptor, it must make sure
> > that the low bit in OWNER changes. Thus first time
> > it writes 0x11, next time 0x0, then 0x11 again.
>
> DESC_WRAP is changed by the device now, so this would work differently
> than in the scenario from above. This would mean we don't need the
> internally kept wrap value, right?
I think no but I did not complete simulating this yet so I don't
yet know if it even works.
>
> regards,
> Jens
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [virtio-dev] packed ring layout proposal v3
2017-10-04 12:58 ` Michael S. Tsirkin
@ 2017-10-10 9:56 ` Liang, Cunming
0 siblings, 0 replies; 14+ messages in thread
From: Liang, Cunming @ 2017-10-10 9:56 UTC (permalink / raw)
To: Michael S. Tsirkin, Jens Freimann
Cc: Steven Luong (sluong), virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
> -----Original Message-----
> From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org]
> On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, October 4, 2017 8:58 PM
> To: Jens Freimann <jfreimann@redhat.com>
> Cc: Liang, Cunming <cunming.liang@intel.com>; Steven Luong (sluong)
> <sluong@cisco.com>; virtio-dev@lists.oasis-open.org;
> virtualization@lists.linux-foundation.org
> Subject: Re: [virtio-dev] packed ring layout proposal v3
>
> On Wed, Oct 04, 2017 at 02:39:01PM +0200, Jens Freimann wrote:
> > On Sun, Oct 01, 2017 at 04:08:29AM +0000, Michael S. Tsirkin wrote:
> > > On Thu, Sep 28, 2017 at 09:44:35AM +0000, Liang, Cunming wrote:
> > > >
> > > > Get it now. Please correct me if I missing something.
> > > >
> > > >
> > > > Flags status hints,
> > > >
> > > > - DESC_DRIVER only: driver owns the descriptor w/o available info
> > > > ready for device to use
> > > >
> > > > - DESC_DRIVER | DESC_WRAP: driver has prepared an available
> > > > descriptor, device hasn't used it yet
> > > >
> > > > - None: device has used the descriptor, and write descriptor out
> > > >
> > > > - DESC_WRAP only: shall not happen, device make sure to clear it
> > > >
> > > >
> > > > Polling behavior is,
> > > >
> > > > - Device monitor DESC_WRAP bit set or not; If set, go to use
> > > > descriptor and clear DESC_DRIVER bit in the end (note: always need
> > > > to clear DESC_WRAP)
> > > >
> > > > - Driver monitor DESC_DRIVER bit cleared or not; If cleared,
> > > > reclaim descriptor(set DESC_DRIVER) and set DESC_WRAP once new
> > > > available descriptor get ready to go
> > > >
> > > >
> > > > --
> > > > Steve
> > >
> > >
> > > Hmm no, not what I had in mind.
> > >
> > > DESC_DRIVER: used by driver to poll. Driver sets it when writing a
> > > descriptor. Device clears it when overwriting a descriptor.
> > > Thus driver uses DESC_DRIVER to detect that device data in
> > > descriptor is valid.
> >
> > Basically DESC_HW from v2 split in two?
>
> Yes in order to avoid overwriting all descriptors.
>
> > >
> > > DESC_WRAP: used by device to poll. Driver sets it to a *different*
> > > value every time it overwrites a descriptor. How to achieve it?
> > > since descriptors are written out in ring order, simply maintain the
> > > current value internally (start value 1) and flip it every time you
> > > overwrite the first descriptor.
> > > Device leaves it intact when overwriting a descriptor.
Ok, get it now.
> >
> > This is confusing me a bit.
> >
> > My understanding is: 1. the internally kept wrap value only flipped
> > when the first descriptor is overwritten
> >
> > 2. the moment the first descriptor is written the internal wrap value
> > is flipped 0->1 or 1->0 and this value is written to every descriptor
> > DESC_WRAP until we reach the first descriptor again
That's right, it's also my take.
DESC_WRAP is only used by driver, device does nothing with that flag.
>
> Yes this is what I tried to say. Can you suggest a better wording then?
The term of DESC_WRAP is fine to me.
>
> > >
> > > After writing down this explanation, I think the names aren't great.
> > >
> > > Let me try an alternative explanation.
> > >
> > > ---------------
> > > A two-bit field, DRIVER_OWNER, signals the buffer ownership.
> > > It has 4 possible values:
> > > values 0x1, 0x11 are written by driver values 0x0, 0x10 are written
> > > by device
> >
> > The 0x prefix might add to the confusion here. It is really just two
> > bits, no?
>
> Ouch. Yes I meant 0b. Thanks!
0b00, 0b10 are written by device?
I suppose device can only clear high bit, can keep low bit no change.
Then the value written by device can be either 0b01 or 0b00, but 0b10 means to set high bit, no?
>
> > > each time driver writes out a descriptor, it must make sure that the
> > > high bit in OWNER changes.
> > >
> > > each time device writes out a descriptor, it must make sure that the
> > > high bit in OWNER does not change.
Typo here? It should be "..., it must make sure that the low bit in OWNER does not change."?
For high bit in OWNER, each time devices writes out a descriptor, it must make sure to clear high bit in OWNER.
> > >
> > > this is exactly the same functionally, DRIVER is high bit and WRAP
> > > is the low bit. Does this make things clearer?
> >
> > So far it makes sense to me.
It sounds good.
> > > ---------------
> > >
> > >
> > >
> > > Maybe the difference between device and driver is confusing. We can
> > > fix that by changing the values.
> > > Here is an alternative. Let me know if you like it better - I need
> > > to think a bit more to make sure it works, but to give you an idea:
> > >
> > >
> > > ---------------
> > > A two-bit field, DRIVER_OWNER, signals the buffer ownership.
> > > It has 4 possible values:
> > > values 0x1, 0x10 are written by driver values 0x0, 0x11 are written
> > > by device
> > >
> > > each time driver writes out a descriptor, it must make sure that the
> > > high bit in OWNER changes. Thus first time it writes 0x10, next time
> > > 0x1, then 0x10 again.
> > >
> > > each time device writes out a descriptor, it must make sure that the
> > > low bit in OWNER changes. Thus first time it writes 0x11, next time
> > > 0x0, then 0x11 again.
> >
> > DESC_WRAP is changed by the device now, so this would work differently
> > than in the scenario from above. This would mean we don't need the
> > internally kept wrap value, right?
>
> I think no but I did not complete simulating this yet so I don't yet know if it
> even works.
>
> >
> > regards,
> > Jens
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [virtio-dev] packed ring layout proposal v3
@ 2017-10-20 9:50 Lars Ganrot
0 siblings, 0 replies; 14+ messages in thread
From: Lars Ganrot @ 2017-10-20 9:50 UTC (permalink / raw)
To: virtio-dev@lists.oasis-open.org
Hi Michael,
I'm trying to understand your Sep 28 13:32 example (text copied below). I've in-lined my questions [#lga#].
>
> Let's assume device promised to consume packets in order
>
> ring size = 2
>
> Ring is 0 initialized.
>
> Device initially polls DESC[0].flags for WRAP bit to change.
>
> driver adds:
>
> DESC[0].addr = 1234
> DESC[0].id = 0
> DESC[0].flags = DESC_DRIVER | DESC_MORE | DESC_WRAP
>
> and
>
> DESC[0].addr = 5678
[#lga#] Is index 0 above a typo (makes more sense if it is 1)?
> DESC[1].id = 1
> DESC[1].flags = DESC_DRIVER | DESC_WRAP
>
> it now starts polling DESC[0] flags.
>
> Device reads 1234, executes it, does not use it.
>
> Device reads 5678, executes it, and uses it:
>
> DESC[0].id = 1
> DESC[0].flags = 0
[#lga#] Should above line be: "DESC[0].flags = DESC[1].flags & DESC_WRAP"?
>
> Device now polls DESC[0].flags for WRAP bit to change.
>
> Now driver sees that DRIVER bit has been cleared, so it nows that id
> is valid. I sees id 1, therefore id 0 and 1 has been read and are safe to overwrite.
[#lga#] At this point, has the device returned both buffers *(1234) and *(5678) to the driver?
[#lga#] If so, how would out-of-order completion avoid head of line blocking?
[#lga#] E.g. the device is done with id=1 and *(5678), but not id=0 and *(1234).
>
> So it writes it out. It wrapped around to beginning of ring, so it
> flips the WRAP bit to 0 on all descriptors now:
>
> DESC[0].addr = 9ABC
> DESC[0].id = 0
> DESC[0].flags = DESC_DRIVER | DESC_MORE
>
> DESC[0].addr = DEF0
> DESC[0].id = 1
> DESC[0].flags = DESC_DRIVER
[#lga#] Index typo in all 3 lines above? DESC[1] makes more sense.
>
> Next round wrap will be 1 again.
>
> To summarise:
>
> DRIVER bit is used by driver to detect device has used one or more
> descriptors. WRAP is is used by device to detect driver has made a
> new descriptor available.
Best Regards,
-Lars
Disclaimer: This email and any files transmitted with it may contain confidential information intended for the addressee(s) only. The information is not to be surrendered or copied to unauthorized persons. If you have received this communication in error, please notify the sender immediately and delete this e-mail from your system.
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-10-20 9:50 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-20 9:50 [virtio-dev] packed ring layout proposal v3 Lars Ganrot
-- strict thread matches above, loose matches on Subject: below --
2017-09-10 5:06 Michael S. Tsirkin
2017-09-12 16:20 ` Willem de Bruijn
2017-09-20 9:11 ` Liang, Cunming
2017-09-25 22:24 ` Michael S. Tsirkin
2017-09-26 23:38 ` Steven Luong (sluong)
2017-09-27 23:49 ` Michael S. Tsirkin
2017-09-28 9:44 ` Liang, Cunming
2017-10-01 4:08 ` Michael S. Tsirkin
[not found] ` <20171004123901.oomueufqg52uhas4@localhost.localdomain>
2017-10-04 12:58 ` Michael S. Tsirkin
2017-10-10 9:56 ` Liang, Cunming
2017-09-28 21:13 ` Michael S. Tsirkin
2017-09-21 13:36 ` Liang, Cunming
2017-09-28 21:27 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox