* Re: [PATCH V2 net-next 5/6] macvlan/macvtap: Add support for SCTP checksum offload.
From: Michael S. Tsirkin @ 2018-05-02 14:17 UTC (permalink / raw)
To: Vlad Yasevich
Cc: virtio-dev, marcelo.leitner, nhorman, netdev, virtualization,
linux-sctp, Vladislav Yasevich
In-Reply-To: <cd94e936-f24c-de0d-7254-069054e33268@redhat.com>
On Wed, May 02, 2018 at 10:00:14AM -0400, Vlad Yasevich wrote:
> On 05/02/2018 09:46 AM, Michael S. Tsirkin wrote:
> > On Wed, May 02, 2018 at 09:27:00AM -0400, Vlad Yasevich wrote:
> >> On 05/01/2018 11:24 PM, Michael S. Tsirkin wrote:
> >>> On Tue, May 01, 2018 at 10:07:38PM -0400, Vladislav Yasevich wrote:
> >>>> Since we now have support for software CRC32c offload, turn it on
> >>>> for macvlan and macvtap devices so that guests can take advantage
> >>>> of offload SCTP checksums to the host or host hardware.
> >>>>
> >>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> >>>> ---
> >>>> drivers/net/macvlan.c | 5 +++--
> >>>> drivers/net/tap.c | 8 +++++---
> >>>> 2 files changed, 8 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> >>>> index 725f4b4..646b730 100644
> >>>> --- a/drivers/net/macvlan.c
> >>>> +++ b/drivers/net/macvlan.c
> >>>> @@ -834,7 +834,7 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
> >>>>
> >>>> #define ALWAYS_ON_OFFLOADS \
> >>>> (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE | \
> >>>> - NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL)
> >>>> + NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL | NETIF_F_SCTP_CRC)
> >>>>
> >>>> #define ALWAYS_ON_FEATURES (ALWAYS_ON_OFFLOADS | NETIF_F_LLTX)
> >>>>
> >>>> @@ -842,7 +842,8 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
> >>>> (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
> >>>> NETIF_F_GSO | NETIF_F_TSO | NETIF_F_LRO | \
> >>>> NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GRO | NETIF_F_RXCSUM | \
> >>>> - NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER)
> >>>> + NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER | \
> >>>> + NETIF_F_SCTP_CRC)
> >>>>
> >>>> #define MACVLAN_STATE_MASK \
> >>>> ((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))
> >>>> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
> >>>> index 9b6cb78..2c8512b 100644
> >>>> --- a/drivers/net/tap.c
> >>>> +++ b/drivers/net/tap.c
> >>>> @@ -369,8 +369,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
> >>>> * check, we either support them all or none.
> >>>> */
> >>>> if (skb->ip_summed == CHECKSUM_PARTIAL &&
> >>>> - !(features & NETIF_F_CSUM_MASK) &&
> >>>> - skb_checksum_help(skb))
> >>>> + skb_csum_hwoffload_help(skb, features))
> >>>> goto drop;
> >>>> if (ptr_ring_produce(&q->ring, skb))
> >>>> goto drop;
> >>>> @@ -945,6 +944,9 @@ static int set_offload(struct tap_queue *q, unsigned long arg)
> >>>> }
> >>>> }
> >>>>
> >>>> + if (arg & TUN_F_SCTP_CSUM)
> >>>> + feature_mask |= NETIF_F_SCTP_CRC;
> >>>> +
> >>>
> >>> so this still affects TX, shouldn't this affect RX instead?
> >>
> >> There is no bit to set on the RX path just like there is no bit to set on the RX patch
> >> for TUN_F_CSUM.
> >>
> >> We only invert TSO offloads, not checksum offloads as the comment below states.
> >> For checksum, macvtap has to compute the checksum itself in tap_handle_frame() above.
> >> It uses tx feature bits to see if needs do to the checksum.
> >>
> >> If you think we need another flag to macvtap to control RXCSUM, that would need to be
> >> separate and cover standard TCP checksum as well.
> >>
> >> -vlad
> >
> > Confused. What is the meaning of TUN_F_SCTP_CSUM? I assume this is
> > a way for userspace to tell tun device: "I can handle
> > packets without SCTP checksum, pls send them my way".
>
> Yes, just as TUN_F_CSUM means that tun device can handle packets with
> partial tcp/udp checksum.
>
> >
> > Now what is the implication for macvtap?
>
> The implication is exactly the same as for TUN_F_CSUM. If the
> flag is set on the macvtap device, the TX checksum feature is
> turned on.
I guess I will have to go back and re-read that code - I do not remember
what does TUN_F_CSUM does by now. Here is a quick question that might
help me:
Let's assume userspace does not set TUN_F_SCTP_CSUM and device
does not calculate the checksums either. I would expect that with
macvtap this then behaves exactly like now with no overhead.
>
> > And why are
> > you setting NETIF_F_SCTP_CRC which is a flag
> > that affects packets sent by guest to host?
>
> Mainly its because we are using just 1 flag to control checksum
> offloading and we need to be able control both tx and rx paths.
Well that's not really the case I think. What we have is controls for tx
offloads for tun. That's TUN_F_CSUM.
there are no rx offloads - userspace can send what it wants.
These are supposed to translate to rx offloads for macvtap.
tx offloads shouldn't be affected at all.
Maybe that's not the case - as I said need to go back and check.
Will try to find the time in the next couple of days.
> What you are suggesting that we either invert what TUN_F_CSUM
> is doing in macvtap case, or have another flag that lets us control
> TX and RX paths separately.
>
> Either case, that would be separate work.
> -vlad
So assuming TUN_F_CSUM affects tx for macvtap do you
agree it's a bug? And should we add to it with
TUN_F_SCTP_CSUM doing the same?
> >
> >
> >>>
> >>>
> >>>> /* tun/tap driver inverts the usage for TSO offloads, where
> >>>> * setting the TSO bit means that the userspace wants to
> >>>> * accept TSO frames and turning it off means that user space
> >>>> @@ -1077,7 +1079,7 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
> >>>> case TUNSETOFFLOAD:
> >>>> /* let the user check for future flags */
> >>>> if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> >>>> - TUN_F_TSO_ECN | TUN_F_UFO))
> >>>> + TUN_F_TSO_ECN | TUN_F_UFO | TUN_F_SCTP_CSUM))
> >>>> return -EINVAL;
> >>>>
> >>>> rtnl_lock();
> >>>> --
> >>>> 2.9.5
^ permalink raw reply
* [PATCH] Revert "vhost: make msg padding explicit"
From: Michael S. Tsirkin @ 2018-05-02 14:19 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, kvm, virtualization
This reverts commit 93c0d549c4c5a7382ad70de6b86610b7aae57406.
Unfortunately the padding will break 32 bit userspace.
Ouch. Need to add some compat code, revert for now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/uapi/linux/vhost.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 5a8ad06..c51f8e5 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -68,7 +68,6 @@ struct vhost_iotlb_msg {
struct vhost_msg {
int type;
- int padding0;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
--
MST
^ permalink raw reply related
* Re: [PATCH] vhost: make msg padding explicit
From: Michael S. Tsirkin @ 2018-05-02 14:19 UTC (permalink / raw)
To: David Miller; +Cc: kevin, kvm, netdev, linux-kernel, virtualization
In-Reply-To: <20180502.100446.106883595468297057.davem@davemloft.net>
On Wed, May 02, 2018 at 10:04:46AM -0400, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Wed, 2 May 2018 16:36:37 +0300
>
> > Ouch. True - and in particular the 32 bit ABI on 64 bit kernels doesn't
> > work at all. Hmm. It's relatively new and maybe there aren't any 32 bit
> > users yet. Thoughts?
>
> If it's been in a released kernel version, we really aren't at liberty
> to play "maybe nobody uses this" UAPI changing games.
>
> Please send me a revert.
Sent. Looking at compat mess now.
--
MST
^ permalink raw reply
* Re: [PATCH V2 net-next 1/6] virtio: Add support for SCTP checksum offloading
From: Michael S. Tsirkin @ 2018-05-02 14:21 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: virtio-dev, nhorman, netdev, virtualization, linux-sctp,
Vladislav Yasevich
In-Reply-To: <20180502141413.GD5105@localhost.localdomain>
On Wed, May 02, 2018 at 11:14:13AM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, May 02, 2018 at 06:16:45AM +0300, Michael S. Tsirkin wrote:
> > On Tue, May 01, 2018 at 10:07:34PM -0400, Vladislav Yasevich wrote:
> > > To support SCTP checksum offloading, we need to add a new feature
> > > to virtio_net, so we can negotiate support between the hypervisor
> > > and the guest.
> > > The HOST feature bit signifies offloading support for transmit and
> > > enables device offload features.
> > > The GUEST feature bit signifies offloading support of recieve and
> > > is currently only used by the driver in case of xdp.
> > >
> > > That patch also adds an addition virtio_net header flag which
> > > mirrors the skb->csum_not_inet flag. This flags is used to indicate
> > > that is this an SCTP packet that needs its checksum computed by the
> > > lower layer. In this case, the lower layer is the host hypervisor or
> > > possibly HW nic that supporst CRC32c offload.
> > >
> > > In the case that GUEST feature bit is flag, it will be possible to
> > > receive a virtio_net header with this bit set, which will set the
> > > corresponding skb bit. SCTP protocol will be updated to correctly
> > > deal with it.
> > >
> > > Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> > > ---
> > > drivers/net/virtio_net.c | 14 +++++++++++++-
> > > include/linux/virtio_net.h | 6 ++++++
> > > include/uapi/linux/virtio_net.h | 5 +++++
> > > 3 files changed, 24 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 7b187ec..34af280 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -2148,6 +2148,8 @@ static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
> > >
> > > if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_SCTP_CSUM))
> > > + offloads |= 1ULL << VIRTIO_NET_F_GUEST_SCTP_CSUM;
> > >
> > > return virtnet_set_guest_offloads(vi, offloads);
> > > }
> > > @@ -2160,6 +2162,8 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> > > return 0;
> > > if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> > > + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_SCTP_CSUM))
> > > + offloads |= 1ULL << VIRTIO_NET_F_GUEST_SCTP_CSUM;
> > >
> > > return virtnet_set_guest_offloads(vi, offloads);
> > > }
> > > @@ -2724,6 +2728,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > /* Do we support "hardware" checksums? */
> > > if (virtio_has_feature(vdev, VIRTIO_NET_F_CSUM)) {
> > > /* This opens up the world of extra features. */
> > > +
> > > dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_SG;
> > > if (csum)
> > > dev->features |= NETIF_F_HW_CSUM | NETIF_F_SG;
> > > @@ -2746,9 +2751,15 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
> > > /* (!csum && gso) case will be fixed by register_netdev() */
> > > }
> > > +
> > > if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > dev->features |= NETIF_F_RXCSUM;
> > >
> > > + if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_SCTP_CSUM)) {
> > > + dev->hw_features |= NETIF_F_SCTP_CRC;
> > > + dev->features |= NETIF_F_SCTP_CRC;
> > > + }
> > > +
> > > dev->vlan_features = dev->features;
> > >
> > > /* MTU range: 68 - 65535 */
> > > @@ -2962,7 +2973,8 @@ static struct virtio_device_id id_table[] = {
> > > VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> > > VIRTIO_NET_F_CTRL_MAC_ADDR, \
> > > VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> > > - VIRTIO_NET_F_SPEED_DUPLEX
> > > + VIRTIO_NET_F_SPEED_DUPLEX, \
> > > + VIRTIO_NET_F_HOST_SCTP_CSUM, VIRTIO_NET_F_GUEST_SCTP_CSUM
> > >
> > > static unsigned int features[] = {
> > > VIRTNET_FEATURES,
> > > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> > > index f144216..28fffdc 100644
> > > --- a/include/linux/virtio_net.h
> > > +++ b/include/linux/virtio_net.h
> > > @@ -39,6 +39,9 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> > >
> > > if (!skb_partial_csum_set(skb, start, off))
> > > return -EINVAL;
> > > +
> > > + if (hdr->flags & VIRTIO_NET_HDR_F_CSUM_NOT_INET)
> > > + skb->csum_not_inet = 1;
> > > }
> > >
> > > if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> > > @@ -91,6 +94,9 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
> > > skb_checksum_start_offset(skb));
> > > hdr->csum_offset = __cpu_to_virtio16(little_endian,
> > > skb->csum_offset);
> > > +
> > > + if (skb->csum_not_inet)
> > > + hdr->flags |= VIRTIO_NET_HDR_F_CSUM_NOT_INET;
> > > } else if (has_data_valid &&
> > > skb->ip_summed == CHECKSUM_UNNECESSARY) {
> > > hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
> > > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > > index 5de6ed3..9dfca1a 100644
> > > --- a/include/uapi/linux/virtio_net.h
> > > +++ b/include/uapi/linux/virtio_net.h
> > > @@ -57,6 +57,10 @@
> > > * Steering */
> > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > >
> > > +#define VIRTIO_NET_F_GUEST_SCTP_CSUM 61 /* Guest handles SCTP pks w/ partial
> > > + * csum */
> > > +#define VIRTIO_NET_F_HOST_SCTP_CSUM 62 /* HOST handles SCTP pkts w/ partial
> > > + * csum */
> > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
> > >
> > > #ifndef VIRTIO_NET_NO_LEGACY
> > > @@ -101,6 +105,7 @@ struct virtio_net_config {
> > > struct virtio_net_hdr_v1 {
> > > #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */
> > > #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */
> > > +#define VIRTIO_NET_HDR_F_CSUM_NOT_INET 4 /* Checksum is not inet */
> >
> > Both comment and name are not very informative.
> > How about just saying CRC32c ?
>
> csum_not_inet is following the nomenclature used in sk_buff. Initially
> Davide had named the sk_buff field after crc32c but then it was argued
> that maybe another check method may be introduced later and the name
> would be bogus, thus why the "csum_not_inet".
That's sk_buff internals, since Linux can change these
at any time without breaking anything.
virtio defines an ABI so we won't be able to make it
mean something else, need to document it fully.
"Go read linux net core code" is not a good answer
to the natural question "what does this bit in the
interface do".
> >
> > > __u8 flags;
> > > #define VIRTIO_NET_HDR_GSO_NONE 0 /* Not a GSO frame */
> > > #define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
> > > --
> > > 2.9.5
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
^ permalink raw reply
* Re: [PATCH V2 net-next 5/6] macvlan/macvtap: Add support for SCTP checksum offload.
From: Vlad Yasevich @ 2018-05-02 14:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtio-dev, marcelo.leitner, nhorman, netdev, virtualization,
linux-sctp, Vladislav Yasevich
In-Reply-To: <20180502170424-mutt-send-email-mst@kernel.org>
On 05/02/2018 10:17 AM, Michael S. Tsirkin wrote:
> On Wed, May 02, 2018 at 10:00:14AM -0400, Vlad Yasevich wrote:
>> On 05/02/2018 09:46 AM, Michael S. Tsirkin wrote:
>>> On Wed, May 02, 2018 at 09:27:00AM -0400, Vlad Yasevich wrote:
>>>> On 05/01/2018 11:24 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, May 01, 2018 at 10:07:38PM -0400, Vladislav Yasevich wrote:
>>>>>> Since we now have support for software CRC32c offload, turn it on
>>>>>> for macvlan and macvtap devices so that guests can take advantage
>>>>>> of offload SCTP checksums to the host or host hardware.
>>>>>>
>>>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>>>> ---
>>>>>> drivers/net/macvlan.c | 5 +++--
>>>>>> drivers/net/tap.c | 8 +++++---
>>>>>> 2 files changed, 8 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
>>>>>> index 725f4b4..646b730 100644
>>>>>> --- a/drivers/net/macvlan.c
>>>>>> +++ b/drivers/net/macvlan.c
>>>>>> @@ -834,7 +834,7 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
>>>>>>
>>>>>> #define ALWAYS_ON_OFFLOADS \
>>>>>> (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE | \
>>>>>> - NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL)
>>>>>> + NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL | NETIF_F_SCTP_CRC)
>>>>>>
>>>>>> #define ALWAYS_ON_FEATURES (ALWAYS_ON_OFFLOADS | NETIF_F_LLTX)
>>>>>>
>>>>>> @@ -842,7 +842,8 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
>>>>>> (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
>>>>>> NETIF_F_GSO | NETIF_F_TSO | NETIF_F_LRO | \
>>>>>> NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GRO | NETIF_F_RXCSUM | \
>>>>>> - NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER)
>>>>>> + NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER | \
>>>>>> + NETIF_F_SCTP_CRC)
>>>>>>
>>>>>> #define MACVLAN_STATE_MASK \
>>>>>> ((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))
>>>>>> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
>>>>>> index 9b6cb78..2c8512b 100644
>>>>>> --- a/drivers/net/tap.c
>>>>>> +++ b/drivers/net/tap.c
>>>>>> @@ -369,8 +369,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
>>>>>> * check, we either support them all or none.
>>>>>> */
>>>>>> if (skb->ip_summed == CHECKSUM_PARTIAL &&
>>>>>> - !(features & NETIF_F_CSUM_MASK) &&
>>>>>> - skb_checksum_help(skb))
>>>>>> + skb_csum_hwoffload_help(skb, features))
>>>>>> goto drop;
>>>>>> if (ptr_ring_produce(&q->ring, skb))
>>>>>> goto drop;
>>>>>> @@ -945,6 +944,9 @@ static int set_offload(struct tap_queue *q, unsigned long arg)
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> + if (arg & TUN_F_SCTP_CSUM)
>>>>>> + feature_mask |= NETIF_F_SCTP_CRC;
>>>>>> +
>>>>>
>>>>> so this still affects TX, shouldn't this affect RX instead?
>>>>
>>>> There is no bit to set on the RX path just like there is no bit to set on the RX patch
>>>> for TUN_F_CSUM.
>>>>
>>>> We only invert TSO offloads, not checksum offloads as the comment below states.
>>>> For checksum, macvtap has to compute the checksum itself in tap_handle_frame() above.
>>>> It uses tx feature bits to see if needs do to the checksum.
>>>>
>>>> If you think we need another flag to macvtap to control RXCSUM, that would need to be
>>>> separate and cover standard TCP checksum as well.
>>>>
>>>> -vlad
>>>
>>> Confused. What is the meaning of TUN_F_SCTP_CSUM? I assume this is
>>> a way for userspace to tell tun device: "I can handle
>>> packets without SCTP checksum, pls send them my way".
>>
>> Yes, just as TUN_F_CSUM means that tun device can handle packets with
>> partial tcp/udp checksum.
>>
>>>
>>> Now what is the implication for macvtap?
>>
>> The implication is exactly the same as for TUN_F_CSUM. If the
>> flag is set on the macvtap device, the TX checksum feature is
>> turned on.
>
> I guess I will have to go back and re-read that code - I do not remember
> what does TUN_F_CSUM does by now. Here is a quick question that might
> help me:
>
> Let's assume userspace does not set TUN_F_SCTP_CSUM and device
> does not calculate the checksums either. I would expect that with
> macvtap this then behaves exactly like now with no overhead.
correct.
>
>>
>>> And why are
>>> you setting NETIF_F_SCTP_CRC which is a flag
>>> that affects packets sent by guest to host?
>>
>> Mainly its because we are using just 1 flag to control checksum
>> offloading and we need to be able control both tx and rx paths.
>
> Well that's not really the case I think. What we have is controls for tx
> offloads for tun. That's TUN_F_CSUM.
> there are no rx offloads - userspace can send what it wants.
>
> These are supposed to translate to rx offloads for macvtap.
> tx offloads shouldn't be affected at all.
Ok. This is how it works for TSO, but looks like there was a disconnect
on the checksum. In both tun and macvtap case, TUN_F_CSUM controls
the tx checksum feature bit.
>
> Maybe that's not the case - as I said need to go back and check.
> Will try to find the time in the next couple of days.
>
>> What you are suggesting that we either invert what TUN_F_CSUM
>> is doing in macvtap case, or have another flag that lets us control
>> TX and RX paths separately.
>>
>> Either case, that would be separate work.
>> -vlad
>
> So assuming TUN_F_CSUM affects tx for macvtap do you
> agree it's a bug? And should we add to it with
> TUN_F_SCTP_CSUM doing the same?
let me think about what it would look like if TUN_F_CSUM
controlled the rx bits...
-vlad
>
>>>
>>>
>>>>>
>>>>>
>>>>>> /* tun/tap driver inverts the usage for TSO offloads, where
>>>>>> * setting the TSO bit means that the userspace wants to
>>>>>> * accept TSO frames and turning it off means that user space
>>>>>> @@ -1077,7 +1079,7 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
>>>>>> case TUNSETOFFLOAD:
>>>>>> /* let the user check for future flags */
>>>>>> if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
>>>>>> - TUN_F_TSO_ECN | TUN_F_UFO))
>>>>>> + TUN_F_TSO_ECN | TUN_F_UFO | TUN_F_SCTP_CSUM))
>>>>>> return -EINVAL;
>>>>>>
>>>>>> rtnl_lock();
>>>>>> --
>>>>>> 2.9.5
^ permalink raw reply
* Re: [PATCH] Revert "vhost: make msg padding explicit"
From: David Miller @ 2018-05-02 14:27 UTC (permalink / raw)
To: mst; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1525270740-460646-1-git-send-email-mst@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Wed, 2 May 2018 17:19:05 +0300
> This reverts commit 93c0d549c4c5a7382ad70de6b86610b7aae57406.
>
> Unfortunately the padding will break 32 bit userspace.
> Ouch. Need to add some compat code, revert for now.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Applied, thanks Michael.
^ permalink raw reply
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-02 15:12 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180502164828-mutt-send-email-mst@kernel.org>
On Wed, May 02, 2018 at 04:51:01PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 02, 2018 at 03:28:19PM +0800, Tiwei Bie wrote:
> > On Wed, May 02, 2018 at 10:51:06AM +0800, Jason Wang wrote:
> > > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > > This commit introduces the event idx support in packed
> > > > ring. This feature is temporarily disabled, because the
> > > > implementation in this patch may not work as expected,
> > > > and some further discussions on the implementation are
> > > > needed, e.g. do we have to check the wrap counter when
> > > > checking whether a kick is needed?
> > > >
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > ---
> > > > drivers/virtio/virtio_ring.c | 53 ++++++++++++++++++++++++++++++++++++++++----
> > > > 1 file changed, 49 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 0181e93897be..b1039c2985b9 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > > static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > {
> > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > - u16 flags;
> > > > + u16 new, old, off_wrap, flags;
> > > > bool needs_kick;
> > > > u32 snapshot;
> > > > @@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > * suppressions. */
> > > > virtio_mb(vq->weak_barriers);
> > > > + old = vq->next_avail_idx - vq->num_added;
> > > > + new = vq->next_avail_idx;
> > > > + vq->num_added = 0;
> > > > +
> > > > snapshot = *(u32 *)vq->vring_packed.device;
> > > > + off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
> > > > flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
> > > > #ifdef DEBUG
> > > > @@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > vq->last_add_time_valid = false;
> > > > #endif
> > > > - needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > + if (flags == VRING_EVENT_F_DESC)
> > > > + needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
> > >
> > > I wonder whether or not the math is correct. Both new and event are in the
> > > unit of descriptor ring size, but old looks not.
> >
> > What vring_need_event() cares is the distance between
> > `new` and `old`, i.e. vq->num_added. So I think there
> > is nothing wrong with `old`. But the calculation of the
> > distance between `new` and `event_idx` isn't right when
> > `new` wraps. How do you think about the below code:
> >
> > wrap_counter = off_wrap >> 15;
> > event_idx = off_wrap & ~(1<<15);
> > if (wrap_counter != vq->wrap_counter)
> > event_idx -= vq->vring_packed.num;
> >
> > needs_kick = vring_need_event(event_idx, new, old);
>
> I suspect this hack won't work for non power of 2 ring.
Above code doesn't require the ring size to be a power of 2.
For (__u16)(new_idx - old), what we want to get is vq->num_added.
old = vq->next_avail_idx - vq->num_added;
new = vq->next_avail_idx;
When vq->next_avail_idx >= vq->num_added, it's obvious that,
(__u16)(new_idx - old) is vq->num_added.
And when vq->next_avail_idx < vq->num_added, new will be smaller
than old (old will be a big unsigned number), but (__u16)(new_idx
- old) is still vq->num_added.
For (__u16)(new_idx - event_idx - 1), when new wraps and event_idx
doesn't wrap, the most straightforward way to calculate it is:
(new + vq->vring_packed.num) - event_idx - 1.
But we can also calculate it in this way:
event_idx -= vq->vring_packed.num;
(event_idx will be a big unsigned number)
Then (__u16)(new_idx - event_idx - 1) will be the value we want.
Best regards,
Tiwei Bie
>
>
> > Best regards,
> > Tiwei Bie
> >
> >
> > >
> > > Thanks
> > >
> > > > + else
> > > > + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > END_USE(vq);
> > > > return needs_kick;
> > > > }
> > > > @@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > > > if (vq->last_used_idx >= vq->vring_packed.num)
> > > > vq->last_used_idx -= vq->vring_packed.num;
> > > > + /* If we expect an interrupt for the next entry, tell host
> > > > + * by writing event index and flush out the write before
> > > > + * the read in the next get_buf call. */
> > > > + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> > > > + virtio_store_mb(vq->weak_barriers,
> > > > + &vq->vring_packed.driver->off_wrap,
> > > > + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> > > > + (vq->wrap_counter << 15)));
> > > > +
> > > > #ifdef DEBUG
> > > > vq->last_add_time_valid = false;
> > > > #endif
> > > > @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > > > /* We optimistically turn back on interrupts, then check if there was
> > > > * more to do. */
> > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > + * either clear the flags bit or point the event index at the next
> > > > + * entry. Always update the event index to keep code simple. */
> > > > +
> > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > + vq->last_used_idx | (vq->wrap_counter << 15));
> > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > virtio_wmb(vq->weak_barriers);
> > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > + VRING_EVENT_F_ENABLE;
> > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > vq->event_flags_shadow);
> > > > }
> > > > @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> > > > static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > {
> > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > + u16 bufs, used_idx, wrap_counter;
> > > > START_USE(vq);
> > > > /* We optimistically turn back on interrupts, then check if there was
> > > > * more to do. */
> > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > + * either clear the flags bit or point the event index at the next
> > > > + * entry. Always update the event index to keep code simple. */
> > > > +
> > > > + /* TODO: tune this threshold */
> > > > + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> > > > +
> > > > + used_idx = vq->last_used_idx + bufs;
> > > > + wrap_counter = vq->wrap_counter;
> > > > +
> > > > + if (used_idx >= vq->vring_packed.num) {
> > > > + used_idx -= vq->vring_packed.num;
> > > > + wrap_counter ^= 1;
> > > > + }
> > > > +
> > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > + used_idx | (wrap_counter << 15));
> > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > virtio_wmb(vq->weak_barriers);
> > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > + VRING_EVENT_F_ENABLE;
> > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > vq->event_flags_shadow);
> > > > }
> > > > @@ -1822,8 +1865,10 @@ void vring_transport_features(struct virtio_device *vdev)
> > > > switch (i) {
> > > > case VIRTIO_RING_F_INDIRECT_DESC:
> > > > break;
> > > > +#if 0
> > > > case VIRTIO_RING_F_EVENT_IDX:
> > > > break;
> > > > +#endif
> > > > case VIRTIO_F_VERSION_1:
> > > > break;
> > > > case VIRTIO_F_IOMMU_PLATFORM:
> > >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* CFP WEBIST 2018 - 14th Int.l Conf. on Web Information Systems and Technologies (Seville/Spain)
From: webist @ 2018-05-02 15:27 UTC (permalink / raw)
To: virtualization
SUBMISSION DEADLINE
14th International Conference on Web Information Systems and Technologies
Submission Deadline: May 24, 2018
http://www.webist.org/
September 18 - 20, 2018
Seville, Spain.
WEBIST is organized in 5 major tracks:
- Internet Technology
- Mobile and NLP Information Systems
- Service Based Information Systems, Platforms and Eco-Systems
- Web Intelligence
- Web Interfaces
With the presence of internationally distinguished keynote speakers:
Antonia Bertolino, Italian National Research Council - CNR, Italy
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
Should you have any question please don’t hesitate contacting me.
Kind regards,
WEBIST Secretariat
Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 184
Fax: +351 265 520 186
Web: http://www.webist.org/
e-mail: webist.secretariat@insticc.org
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* CFP PhyCS 2018 - 5th Int.l Conf. on Physiological Computing Systems (Seville/Spain)
From: phycs @ 2018-05-02 15:27 UTC (permalink / raw)
To: virtualization
SUBMISSION DEADLINE
5th International Conference on Physiological Computing Systems
Submission Deadline: May 24, 2018
http://www.phycs.org/
September 19 - 21, 2018
Seville, Spain.
PhyCS is organized in 4 major tracks:
- Devices
- Methodologies and Methods
- Human Factors
- Applications
With the presence of internationally distinguished keynote speakers:
Eduardo Rocon, Consejo Superior de Investigaciones Científicas, Spain
Lucas Noldus, Noldus Information Technology bv, Netherlands
Georgios N. Yannakakis, University of Malta, Malta
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
Should you have any question please don’t hesitate contacting me.
Kind regards,
PhyCS Secretariat
Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 184
Fax: +351 265 520 186
Web: http://www.phycs.org/
e-mail: phycs.secretariat@insticc.org
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-05-02 15:34 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180502075021.GC19250@nanopsycho>
On 5/2/2018 12:50 AM, Jiri Pirko wrote:
> Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
>> On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>>>>> Now I try to change mac of the failover master:
>>>>> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>>>>> RTNETLINK answers: Operation not supported
>>>>>
>>>>> That I did expect to work. I would expect this would change the mac of
>>>>> the master and both standby and primary slaves.
>>>> If a VF is untrusted, a VM will not able to change its MAC and moreover
>>> Note that at this point, I have no VF. So I'm not sure why you mention
>>> that.
>>>
>>>
>>>> in this mode we are assuming that the hypervisor has assigned the MAC and
>>>> guest is not expected to change the MAC.
>>> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
>>> mac and all works fine. How is this different? Change mac on "failover
>>> instance" should work and should propagate the mac down to its slaves.
>>>
>>>
>>>> For the initial implementation, i would propose not allowing the guest to
>>>> change the MAC of failover or standby dev.
>>> I see no reason for such restriction.
>>>
>> It is true that a VM user can change mac address of a normal virtio-net interface,
>> however when it is in STANDBY mode i think we should not allow this change specifically
>> because we are creating a failover instance based on a MAC that is assigned by the
>> hypervisor.
>>
>> Moreover, in a cloud environment i would think that PF/hypervisor assigns a MAC to
>> the VF and it cannot be changed by the guest.
> So that is easy. You allow the change of the mac and in the "failover"
> mac change implementation you propagate the change down to slaves. If
> one slave does not support the change, you bail out. And since VF does
> not allow it as you say, once it will be enslaved, the mac change could
> not be done. Seems like a correct behavior to me and is in-sync with how
> bond/team behaves.
If we allow the mac to be changed when the VF is not yet plugged in
and the guest changes the mac, then VF cannot join the failover when
it is hot plugged with the original mac assigned by the hypervisor.
Specifically there could be timing window during the live migration where
VF would be unplugged at the source and if we allow the guest to change the
failover mac, then VF would not be able to register with the failover after
the VM is migrated to the destination.
>
>> So for the initial implementation, do you see any issues with having this restriction
>> in STANDBY mode.
>>
>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Michael S. Tsirkin @ 2018-05-02 15:42 UTC (permalink / raw)
To: Tiwei Bie; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180502151255.h3x6rhszxa3euinl@debian>
On Wed, May 02, 2018 at 11:12:55PM +0800, Tiwei Bie wrote:
> On Wed, May 02, 2018 at 04:51:01PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 02, 2018 at 03:28:19PM +0800, Tiwei Bie wrote:
> > > On Wed, May 02, 2018 at 10:51:06AM +0800, Jason Wang wrote:
> > > > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > > > This commit introduces the event idx support in packed
> > > > > ring. This feature is temporarily disabled, because the
> > > > > implementation in this patch may not work as expected,
> > > > > and some further discussions on the implementation are
> > > > > needed, e.g. do we have to check the wrap counter when
> > > > > checking whether a kick is needed?
> > > > >
> > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > ---
> > > > > drivers/virtio/virtio_ring.c | 53 ++++++++++++++++++++++++++++++++++++++++----
> > > > > 1 file changed, 49 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 0181e93897be..b1039c2985b9 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > > > static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > {
> > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > - u16 flags;
> > > > > + u16 new, old, off_wrap, flags;
> > > > > bool needs_kick;
> > > > > u32 snapshot;
> > > > > @@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > * suppressions. */
> > > > > virtio_mb(vq->weak_barriers);
> > > > > + old = vq->next_avail_idx - vq->num_added;
> > > > > + new = vq->next_avail_idx;
> > > > > + vq->num_added = 0;
> > > > > +
> > > > > snapshot = *(u32 *)vq->vring_packed.device;
> > > > > + off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
> > > > > flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
> > > > > #ifdef DEBUG
> > > > > @@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > vq->last_add_time_valid = false;
> > > > > #endif
> > > > > - needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > > + if (flags == VRING_EVENT_F_DESC)
> > > > > + needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
> > > >
> > > > I wonder whether or not the math is correct. Both new and event are in the
> > > > unit of descriptor ring size, but old looks not.
> > >
> > > What vring_need_event() cares is the distance between
> > > `new` and `old`, i.e. vq->num_added. So I think there
> > > is nothing wrong with `old`. But the calculation of the
> > > distance between `new` and `event_idx` isn't right when
> > > `new` wraps. How do you think about the below code:
> > >
> > > wrap_counter = off_wrap >> 15;
> > > event_idx = off_wrap & ~(1<<15);
> > > if (wrap_counter != vq->wrap_counter)
> > > event_idx -= vq->vring_packed.num;
> > >
> > > needs_kick = vring_need_event(event_idx, new, old);
> >
> > I suspect this hack won't work for non power of 2 ring.
>
> Above code doesn't require the ring size to be a power of 2.
>
> For (__u16)(new_idx - old), what we want to get is vq->num_added.
>
> old = vq->next_avail_idx - vq->num_added;
> new = vq->next_avail_idx;
>
> When vq->next_avail_idx >= vq->num_added, it's obvious that,
> (__u16)(new_idx - old) is vq->num_added.
>
> And when vq->next_avail_idx < vq->num_added, new will be smaller
> than old (old will be a big unsigned number), but (__u16)(new_idx
> - old) is still vq->num_added.
>
> For (__u16)(new_idx - event_idx - 1), when new wraps and event_idx
> doesn't wrap, the most straightforward way to calculate it is:
> (new + vq->vring_packed.num) - event_idx - 1.
So how about we use the straightforward way then?
> But we can also calculate it in this way:
>
> event_idx -= vq->vring_packed.num;
> (event_idx will be a big unsigned number)
>
> Then (__u16)(new_idx - event_idx - 1) will be the value we want.
>
> Best regards,
> Tiwei Bie
> >
> >
> > > Best regards,
> > > Tiwei Bie
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > > + else
> > > > > + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > > END_USE(vq);
> > > > > return needs_kick;
> > > > > }
> > > > > @@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > > > > if (vq->last_used_idx >= vq->vring_packed.num)
> > > > > vq->last_used_idx -= vq->vring_packed.num;
> > > > > + /* If we expect an interrupt for the next entry, tell host
> > > > > + * by writing event index and flush out the write before
> > > > > + * the read in the next get_buf call. */
> > > > > + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> > > > > + virtio_store_mb(vq->weak_barriers,
> > > > > + &vq->vring_packed.driver->off_wrap,
> > > > > + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> > > > > + (vq->wrap_counter << 15)));
> > > > > +
> > > > > #ifdef DEBUG
> > > > > vq->last_add_time_valid = false;
> > > > > #endif
> > > > > @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > > > > /* We optimistically turn back on interrupts, then check if there was
> > > > > * more to do. */
> > > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > > + * either clear the flags bit or point the event index at the next
> > > > > + * entry. Always update the event index to keep code simple. */
> > > > > +
> > > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > > + vq->last_used_idx | (vq->wrap_counter << 15));
> > > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > > virtio_wmb(vq->weak_barriers);
> > > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > > + VRING_EVENT_F_ENABLE;
> > > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > > vq->event_flags_shadow);
> > > > > }
> > > > > @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> > > > > static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > > {
> > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > + u16 bufs, used_idx, wrap_counter;
> > > > > START_USE(vq);
> > > > > /* We optimistically turn back on interrupts, then check if there was
> > > > > * more to do. */
> > > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > > + * either clear the flags bit or point the event index at the next
> > > > > + * entry. Always update the event index to keep code simple. */
> > > > > +
> > > > > + /* TODO: tune this threshold */
> > > > > + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> > > > > +
> > > > > + used_idx = vq->last_used_idx + bufs;
> > > > > + wrap_counter = vq->wrap_counter;
> > > > > +
> > > > > + if (used_idx >= vq->vring_packed.num) {
> > > > > + used_idx -= vq->vring_packed.num;
> > > > > + wrap_counter ^= 1;
> > > > > + }
> > > > > +
> > > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > > + used_idx | (wrap_counter << 15));
> > > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > > virtio_wmb(vq->weak_barriers);
> > > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > > + VRING_EVENT_F_ENABLE;
> > > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > > vq->event_flags_shadow);
> > > > > }
> > > > > @@ -1822,8 +1865,10 @@ void vring_transport_features(struct virtio_device *vdev)
> > > > > switch (i) {
> > > > > case VIRTIO_RING_F_INDIRECT_DESC:
> > > > > break;
> > > > > +#if 0
> > > > > case VIRTIO_RING_F_EVENT_IDX:
> > > > > break;
> > > > > +#endif
> > > > > case VIRTIO_F_VERSION_1:
> > > > > break;
> > > > > case VIRTIO_F_IOMMU_PLATFORM:
> > > >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Michael S. Tsirkin @ 2018-05-02 15:47 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Samudrala, Sridhar,
virtualization, loseweigh, netdev, aaron.f.brown, davem
In-Reply-To: <20180502075021.GC19250@nanopsycho>
On Wed, May 02, 2018 at 09:50:21AM +0200, Jiri Pirko wrote:
> Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
> >On 4/30/2018 12:20 AM, Jiri Pirko wrote:
> >>
> >> > > Now I try to change mac of the failover master:
> >> > > [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
> >> > > RTNETLINK answers: Operation not supported
> >> > >
> >> > > That I did expect to work. I would expect this would change the mac of
> >> > > the master and both standby and primary slaves.
> >> > If a VF is untrusted, a VM will not able to change its MAC and moreover
> >> Note that at this point, I have no VF. So I'm not sure why you mention
> >> that.
> >>
> >>
> >> > in this mode we are assuming that the hypervisor has assigned the MAC and
> >> > guest is not expected to change the MAC.
> >> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
> >> mac and all works fine. How is this different? Change mac on "failover
> >> instance" should work and should propagate the mac down to its slaves.
> >>
> >>
> >> > For the initial implementation, i would propose not allowing the guest to
> >> > change the MAC of failover or standby dev.
> >> I see no reason for such restriction.
> >>
> >
> >It is true that a VM user can change mac address of a normal virtio-net interface,
> >however when it is in STANDBY mode i think we should not allow this change specifically
> >because we are creating a failover instance based on a MAC that is assigned by the
> >hypervisor.
> >
> >Moreover, in a cloud environment i would think that PF/hypervisor assigns a MAC to
> >the VF and it cannot be changed by the guest.
>
> So that is easy. You allow the change of the mac and in the "failover"
> mac change implementation you propagate the change down to slaves. If
> one slave does not support the change, you bail out. And since VF does
I wish people would say primary/standby and not "VF" :)
> not allow it as you say, once it will be enslaved, the mac change could
> not be done. Seems like a correct behavior to me
what if primary does not allow mac changes and is attached after
mac is changed on standy?
> and is in-sync with how
> bond/team behaves.
I think in the end virtio can just block MAC changes for simplicity.
I personally don't see softmac as a must have feature in v1,
we can add it later.
What's the situation with init scripts and whether it's
possible to make them work well would be a better question.
>
> >
> >So for the initial implementation, do you see any issues with having this restriction
> >in STANDBY mode.
> >
> >
^ permalink raw reply
* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-05-02 16:04 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, Samudrala, Sridhar,
virtualization, loseweigh, netdev, aaron.f.brown, davem
In-Reply-To: <20180502184326-mutt-send-email-mst@kernel.org>
Wed, May 02, 2018 at 05:47:27PM CEST, mst@redhat.com wrote:
>On Wed, May 02, 2018 at 09:50:21AM +0200, Jiri Pirko wrote:
>> Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
>> >On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>> >>
>> >> > > Now I try to change mac of the failover master:
>> >> > > [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>> >> > > RTNETLINK answers: Operation not supported
>> >> > >
>> >> > > That I did expect to work. I would expect this would change the mac of
>> >> > > the master and both standby and primary slaves.
>> >> > If a VF is untrusted, a VM will not able to change its MAC and moreover
>> >> Note that at this point, I have no VF. So I'm not sure why you mention
>> >> that.
>> >>
>> >>
>> >> > in this mode we are assuming that the hypervisor has assigned the MAC and
>> >> > guest is not expected to change the MAC.
>> >> Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
>> >> mac and all works fine. How is this different? Change mac on "failover
>> >> instance" should work and should propagate the mac down to its slaves.
>> >>
>> >>
>> >> > For the initial implementation, i would propose not allowing the guest to
>> >> > change the MAC of failover or standby dev.
>> >> I see no reason for such restriction.
>> >>
>> >
>> >It is true that a VM user can change mac address of a normal virtio-net interface,
>> >however when it is in STANDBY mode i think we should not allow this change specifically
>> >because we are creating a failover instance based on a MAC that is assigned by the
>> >hypervisor.
>> >
>> >Moreover, in a cloud environment i would think that PF/hypervisor assigns a MAC to
>> >the VF and it cannot be changed by the guest.
>>
>> So that is easy. You allow the change of the mac and in the "failover"
>> mac change implementation you propagate the change down to slaves. If
>> one slave does not support the change, you bail out. And since VF does
>
>I wish people would say primary/standby and not "VF" :)
Sure, sorry.
>
>> not allow it as you say, once it will be enslaved, the mac change could
>> not be done. Seems like a correct behavior to me
>
>
>what if primary does not allow mac changes and is attached after
>mac is changed on standy?
Mac should be changed on failover. In that case, the primary would have
a different mac and therefore it won't get enslaved.
>
>
>> and is in-sync with how
>> bond/team behaves.
>
>I think in the end virtio can just block MAC changes for simplicity.
>
>I personally don't see softmac as a must have feature in v1,
>we can add it later.
Okay.
>
>What's the situation with init scripts and whether it's
>possible to make them work well would be a better question.
>
>>
>> >
>> >So for the initial implementation, do you see any issues with having this restriction
>> >in STANDBY mode.
>> >
>> >
^ permalink raw reply
* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-05-02 16:05 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <c94ce6a9-6f36-675c-cf5b-414454fc2244@intel.com>
Wed, May 02, 2018 at 05:34:44PM CEST, sridhar.samudrala@intel.com wrote:
>On 5/2/2018 12:50 AM, Jiri Pirko wrote:
>> Wed, May 02, 2018 at 02:20:26AM CEST, sridhar.samudrala@intel.com wrote:
>> > On 4/30/2018 12:20 AM, Jiri Pirko wrote:
>> > > > > Now I try to change mac of the failover master:
>> > > > > [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
>> > > > > RTNETLINK answers: Operation not supported
>> > > > >
>> > > > > That I did expect to work. I would expect this would change the mac of
>> > > > > the master and both standby and primary slaves.
>> > > > If a VF is untrusted, a VM will not able to change its MAC and moreover
>> > > Note that at this point, I have no VF. So I'm not sure why you mention
>> > > that.
>> > >
>> > >
>> > > > in this mode we are assuming that the hypervisor has assigned the MAC and
>> > > > guest is not expected to change the MAC.
>> > > Wait, for ordinary old-fashioned virtio_net, as a VM user, I can change
>> > > mac and all works fine. How is this different? Change mac on "failover
>> > > instance" should work and should propagate the mac down to its slaves.
>> > >
>> > >
>> > > > For the initial implementation, i would propose not allowing the guest to
>> > > > change the MAC of failover or standby dev.
>> > > I see no reason for such restriction.
>> > >
>> > It is true that a VM user can change mac address of a normal virtio-net interface,
>> > however when it is in STANDBY mode i think we should not allow this change specifically
>> > because we are creating a failover instance based on a MAC that is assigned by the
>> > hypervisor.
>> >
>> > Moreover, in a cloud environment i would think that PF/hypervisor assigns a MAC to
>> > the VF and it cannot be changed by the guest.
>> So that is easy. You allow the change of the mac and in the "failover"
>> mac change implementation you propagate the change down to slaves. If
>> one slave does not support the change, you bail out. And since VF does
>> not allow it as you say, once it will be enslaved, the mac change could
>> not be done. Seems like a correct behavior to me and is in-sync with how
>> bond/team behaves.
>
>If we allow the mac to be changed when the VF is not yet plugged in
>and the guest changes the mac, then VF cannot join the failover when
>it is hot plugged with the original mac assigned by the hypervisor.
>Specifically there could be timing window during the live migration where
>VF would be unplugged at the source and if we allow the guest to change the
>failover mac, then VF would not be able to register with the failover after
>the VM is migrated to the destination.
Okay. So as suggested by mst, just forbid the mac changes all together.
>
>>
>> > So for the initial implementation, do you see any issues with having this restriction
>> > in STANDBY mode.
>> >
>> >
>
^ permalink raw reply
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Jiri Pirko @ 2018-05-02 16:15 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180428090601.GL5632@nanopsycho.orion>
Sat, Apr 28, 2018 at 11:06:01AM CEST, jiri@resnulli.us wrote:
>Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
[...]
>>+
>>+ err = netdev_rx_handler_register(slave_dev, net_failover_handle_frame,
>>+ failover_dev);
>>+ if (err) {
>>+ netdev_err(slave_dev, "can not register failover rx handler (err = %d)\n",
>>+ err);
>>+ goto err_handler_register;
>>+ }
>>+
>>+ err = netdev_upper_dev_link(slave_dev, failover_dev, NULL);
>
>Please use netdev_master_upper_dev_link().
Don't forget to fillup struct netdev_lag_upper_info - NETDEV_LAG_TX_TYPE_ACTIVEBACKUP
Also, please call netdev_lower_state_changed() when the active slave
device changes from primary->backup of backup->primary and whenever link
state of a slave changes
[...]
^ permalink raw reply
* Re: [PATCH V2 net-next 4/6] tun: Add support for SCTP checksum offload
From: Vlad Yasevich @ 2018-05-02 17:17 UTC (permalink / raw)
To: Marcelo Ricardo Leitner, Vladislav Yasevich
Cc: virtio-dev, nhorman, mst, netdev, virtualization, linux-sctp
In-Reply-To: <20180502145607.GH5105@localhost.localdomain>
On 05/02/2018 10:56 AM, Marcelo Ricardo Leitner wrote:
> On Wed, May 02, 2018 at 11:53:47AM -0300, Marcelo Ricardo Leitner wrote:
>> On Tue, May 01, 2018 at 10:07:37PM -0400, Vladislav Yasevich wrote:
>>> Adds a new tun offload flag to allow for SCTP checksum offload.
>>> The flag has to be set by the user and defaults to "no offload".
>>
>> I'm confused here:
>>
>>> +++ b/drivers/net/tun.c
>>> @@ -216,7 +216,7 @@ struct tun_struct {
>>> struct net_device *dev;
>>> netdev_features_t set_features;
>>> #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
>>> - NETIF_F_TSO6)
>>> + NETIF_F_TSO6|NETIF_F_SCTP_CRC)
>>
>> Doesn't adding it here mean it defaults to "offload", instead?
>>
>> later on, it does:
>> dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
>> TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
>> NETIF_F_HW_VLAN_STAG_TX;
>
> Missed to paste the next line too:
> dev->features = dev->hw_features | NETIF_F_LLTX;
>
Yes, as a software device, we can default it to on. However, qemu will 0-out
the features and then set them up based on the guest (just like regular checksum).
-vlad
^ permalink raw reply
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-02 17:51 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180502161542.GI19250@nanopsycho>
On 5/2/2018 9:15 AM, Jiri Pirko wrote:
> Sat, Apr 28, 2018 at 11:06:01AM CEST, jiri@resnulli.us wrote:
>> Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
> [...]
>
>
>>> +
>>> + err = netdev_rx_handler_register(slave_dev, net_failover_handle_frame,
>>> + failover_dev);
>>> + if (err) {
>>> + netdev_err(slave_dev, "can not register failover rx handler (err = %d)\n",
>>> + err);
>>> + goto err_handler_register;
>>> + }
>>> +
>>> + err = netdev_upper_dev_link(slave_dev, failover_dev, NULL);
>> Please use netdev_master_upper_dev_link().
> Don't forget to fillup struct netdev_lag_upper_info - NETDEV_LAG_TX_TYPE_ACTIVEBACKUP
>
>
> Also, please call netdev_lower_state_changed() when the active slave
> device changes from primary->backup of backup->primary and whenever link
> state of a slave changes
>
Sure. will look into it. Do you think this will help with the issue
you saw with having to change mac on standy twice to get the init scripts
working? We are now going to block changing the mac on both standby and
failover.
Also, i was wondering if we should set dev->flags to IFF_MASTER on failover
and IFF_SLAVE on primary and standby. netvsc does this.
Does this help with the init scripts and network manager to skip slave
devices for dhcp requests?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Michael S. Tsirkin @ 2018-05-02 20:30 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, Jiri Pirko, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <052e2c60-dc4c-d6a0-0159-fc1821cb4365@intel.com>
On Wed, May 02, 2018 at 10:51:12AM -0700, Samudrala, Sridhar wrote:
>
>
> On 5/2/2018 9:15 AM, Jiri Pirko wrote:
> > Sat, Apr 28, 2018 at 11:06:01AM CEST, jiri@resnulli.us wrote:
> > > Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
> > [...]
> >
> >
> > > > +
> > > > + err = netdev_rx_handler_register(slave_dev, net_failover_handle_frame,
> > > > + failover_dev);
> > > > + if (err) {
> > > > + netdev_err(slave_dev, "can not register failover rx handler (err = %d)\n",
> > > > + err);
> > > > + goto err_handler_register;
> > > > + }
> > > > +
> > > > + err = netdev_upper_dev_link(slave_dev, failover_dev, NULL);
> > > Please use netdev_master_upper_dev_link().
> > Don't forget to fillup struct netdev_lag_upper_info - NETDEV_LAG_TX_TYPE_ACTIVEBACKUP
> >
> >
> > Also, please call netdev_lower_state_changed() when the active slave
> > device changes from primary->backup of backup->primary and whenever link
> > state of a slave changes
> >
> Sure. will look into it. Do you think this will help with the issue
> you saw with having to change mac on standy twice to get the init scripts
> working? We are now going to block changing the mac on both standby and
> failover.
>
> Also, i was wondering if we should set dev->flags to IFF_MASTER on failover
> and IFF_SLAVE on primary and standby.
We do need a way to find things out, that's for sure.
How does userspace know it's a failover
config and find the failover device right now?
> netvsc does this.
> Does this help with the init scripts and network manager to skip slave
> devices for dhcp requests?
Try it?
^ permalink raw reply
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-02 21:36 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, Jiri Pirko, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180502232907-mutt-send-email-mst@kernel.org>
On 5/2/2018 1:30 PM, Michael S. Tsirkin wrote:
> On Wed, May 02, 2018 at 10:51:12AM -0700, Samudrala, Sridhar wrote:
>>
>> On 5/2/2018 9:15 AM, Jiri Pirko wrote:
>>> Sat, Apr 28, 2018 at 11:06:01AM CEST, jiri@resnulli.us wrote:
>>>> Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
>>> [...]
>>>
>>>
>>>>> +
>>>>> + err = netdev_rx_handler_register(slave_dev, net_failover_handle_frame,
>>>>> + failover_dev);
>>>>> + if (err) {
>>>>> + netdev_err(slave_dev, "can not register failover rx handler (err = %d)\n",
>>>>> + err);
>>>>> + goto err_handler_register;
>>>>> + }
>>>>> +
>>>>> + err = netdev_upper_dev_link(slave_dev, failover_dev, NULL);
>>>> Please use netdev_master_upper_dev_link().
>>> Don't forget to fillup struct netdev_lag_upper_info - NETDEV_LAG_TX_TYPE_ACTIVEBACKUP
>>>
>>>
>>> Also, please call netdev_lower_state_changed() when the active slave
>>> device changes from primary->backup of backup->primary and whenever link
>>> state of a slave changes
>>>
>> Sure. will look into it. Do you think this will help with the issue
>> you saw with having to change mac on standy twice to get the init scripts
>> working? We are now going to block changing the mac on both standby and
>> failover.
>>
>> Also, i was wondering if we should set dev->flags to IFF_MASTER on failover
>> and IFF_SLAVE on primary and standby.
> We do need a way to find things out, that's for sure.
> How does userspace know it's a failover
> config and find the failover device right now?
# ethtool -i ens12|grep driver
driver: failover
# ethtool -i ens12n_sby|grep driver
driver: virtio_net
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Jiri Pirko @ 2018-05-02 21:39 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <052e2c60-dc4c-d6a0-0159-fc1821cb4365@intel.com>
Wed, May 02, 2018 at 07:51:12PM CEST, sridhar.samudrala@intel.com wrote:
>
>
>On 5/2/2018 9:15 AM, Jiri Pirko wrote:
>> Sat, Apr 28, 2018 at 11:06:01AM CEST, jiri@resnulli.us wrote:
>> > Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
>> [...]
>>
>>
>> > > +
>> > > + err = netdev_rx_handler_register(slave_dev, net_failover_handle_frame,
>> > > + failover_dev);
>> > > + if (err) {
>> > > + netdev_err(slave_dev, "can not register failover rx handler (err = %d)\n",
>> > > + err);
>> > > + goto err_handler_register;
>> > > + }
>> > > +
>> > > + err = netdev_upper_dev_link(slave_dev, failover_dev, NULL);
>> > Please use netdev_master_upper_dev_link().
>> Don't forget to fillup struct netdev_lag_upper_info - NETDEV_LAG_TX_TYPE_ACTIVEBACKUP
>>
>>
>> Also, please call netdev_lower_state_changed() when the active slave
>> device changes from primary->backup of backup->primary and whenever link
>> state of a slave changes
>>
>Sure. will look into it. Do you think this will help with the issue
>you saw with having to change mac on standy twice to get the init scripts
>working? We are now going to block changing the mac on both standby and
>failover.
>
>Also, i was wondering if we should set dev->flags to IFF_MASTER on failover
>and IFF_SLAVE on primary and standby. netvsc does this.
No. Don't set it. It is wrong.
>Does this help with the init scripts and network manager to skip slave
>devices for dhcp requests?
>
^ permalink raw reply
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Jiri Pirko @ 2018-05-02 21:39 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <052e2c60-dc4c-d6a0-0159-fc1821cb4365@intel.com>
Wed, May 02, 2018 at 07:51:12PM CEST, sridhar.samudrala@intel.com wrote:
>
>
>On 5/2/2018 9:15 AM, Jiri Pirko wrote:
>> Sat, Apr 28, 2018 at 11:06:01AM CEST, jiri@resnulli.us wrote:
>> > Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
>> [...]
>>
>>
>> > > +
>> > > + err = netdev_rx_handler_register(slave_dev, net_failover_handle_frame,
>> > > + failover_dev);
>> > > + if (err) {
>> > > + netdev_err(slave_dev, "can not register failover rx handler (err = %d)\n",
>> > > + err);
>> > > + goto err_handler_register;
>> > > + }
>> > > +
>> > > + err = netdev_upper_dev_link(slave_dev, failover_dev, NULL);
>> > Please use netdev_master_upper_dev_link().
>> Don't forget to fillup struct netdev_lag_upper_info - NETDEV_LAG_TX_TYPE_ACTIVEBACKUP
>>
>>
>> Also, please call netdev_lower_state_changed() when the active slave
>> device changes from primary->backup of backup->primary and whenever link
>> state of a slave changes
>>
>Sure. will look into it. Do you think this will help with the issue
>you saw with having to change mac on standy twice to get the init scripts
>working? We are now going to block changing the mac on both standby and
>failover.
I don't see any relation to that.
^ permalink raw reply
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-03 1:11 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180502184015-mutt-send-email-mst@kernel.org>
On Wed, May 02, 2018 at 06:42:57PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 02, 2018 at 11:12:55PM +0800, Tiwei Bie wrote:
> > On Wed, May 02, 2018 at 04:51:01PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 02, 2018 at 03:28:19PM +0800, Tiwei Bie wrote:
> > > > On Wed, May 02, 2018 at 10:51:06AM +0800, Jason Wang wrote:
> > > > > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > > > > This commit introduces the event idx support in packed
> > > > > > ring. This feature is temporarily disabled, because the
> > > > > > implementation in this patch may not work as expected,
> > > > > > and some further discussions on the implementation are
> > > > > > needed, e.g. do we have to check the wrap counter when
> > > > > > checking whether a kick is needed?
> > > > > >
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > ---
> > > > > > drivers/virtio/virtio_ring.c | 53 ++++++++++++++++++++++++++++++++++++++++----
> > > > > > 1 file changed, 49 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index 0181e93897be..b1039c2985b9 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > > > > static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > {
> > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > - u16 flags;
> > > > > > + u16 new, old, off_wrap, flags;
> > > > > > bool needs_kick;
> > > > > > u32 snapshot;
> > > > > > @@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > * suppressions. */
> > > > > > virtio_mb(vq->weak_barriers);
> > > > > > + old = vq->next_avail_idx - vq->num_added;
> > > > > > + new = vq->next_avail_idx;
> > > > > > + vq->num_added = 0;
> > > > > > +
> > > > > > snapshot = *(u32 *)vq->vring_packed.device;
> > > > > > + off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
> > > > > > flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
> > > > > > #ifdef DEBUG
> > > > > > @@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > vq->last_add_time_valid = false;
> > > > > > #endif
> > > > > > - needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > > > + if (flags == VRING_EVENT_F_DESC)
> > > > > > + needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
> > > > >
> > > > > I wonder whether or not the math is correct. Both new and event are in the
> > > > > unit of descriptor ring size, but old looks not.
> > > >
> > > > What vring_need_event() cares is the distance between
> > > > `new` and `old`, i.e. vq->num_added. So I think there
> > > > is nothing wrong with `old`. But the calculation of the
> > > > distance between `new` and `event_idx` isn't right when
> > > > `new` wraps. How do you think about the below code:
> > > >
> > > > wrap_counter = off_wrap >> 15;
> > > > event_idx = off_wrap & ~(1<<15);
> > > > if (wrap_counter != vq->wrap_counter)
> > > > event_idx -= vq->vring_packed.num;
> > > >
> > > > needs_kick = vring_need_event(event_idx, new, old);
> > >
> > > I suspect this hack won't work for non power of 2 ring.
> >
> > Above code doesn't require the ring size to be a power of 2.
> >
> > For (__u16)(new_idx - old), what we want to get is vq->num_added.
> >
> > old = vq->next_avail_idx - vq->num_added;
> > new = vq->next_avail_idx;
> >
> > When vq->next_avail_idx >= vq->num_added, it's obvious that,
> > (__u16)(new_idx - old) is vq->num_added.
> >
> > And when vq->next_avail_idx < vq->num_added, new will be smaller
> > than old (old will be a big unsigned number), but (__u16)(new_idx
> > - old) is still vq->num_added.
> >
> > For (__u16)(new_idx - event_idx - 1), when new wraps and event_idx
> > doesn't wrap, the most straightforward way to calculate it is:
> > (new + vq->vring_packed.num) - event_idx - 1.
>
> So how about we use the straightforward way then?
You mean we do new += vq->vring_packed.num instead
of event_idx -= vq->vring_packed.num before calling
vring_need_event()?
The problem is that, the second param (new_idx) of
vring_need_event() will be used for:
(__u16)(new_idx - event_idx - 1)
(__u16)(new_idx - old)
So if we change new, we will need to change old too.
And that would be an ugly hack..
Best regards,
Tiwei Bie
>
> > But we can also calculate it in this way:
> >
> > event_idx -= vq->vring_packed.num;
> > (event_idx will be a big unsigned number)
> >
> > Then (__u16)(new_idx - event_idx - 1) will be the value we want.
> >
> > Best regards,
> > Tiwei Bie
>
>
> > >
> > >
> > > > Best regards,
> > > > Tiwei Bie
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > > + else
> > > > > > + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > > > END_USE(vq);
> > > > > > return needs_kick;
> > > > > > }
> > > > > > @@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > > > > > if (vq->last_used_idx >= vq->vring_packed.num)
> > > > > > vq->last_used_idx -= vq->vring_packed.num;
> > > > > > + /* If we expect an interrupt for the next entry, tell host
> > > > > > + * by writing event index and flush out the write before
> > > > > > + * the read in the next get_buf call. */
> > > > > > + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> > > > > > + virtio_store_mb(vq->weak_barriers,
> > > > > > + &vq->vring_packed.driver->off_wrap,
> > > > > > + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> > > > > > + (vq->wrap_counter << 15)));
> > > > > > +
> > > > > > #ifdef DEBUG
> > > > > > vq->last_add_time_valid = false;
> > > > > > #endif
> > > > > > @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > > > > > /* We optimistically turn back on interrupts, then check if there was
> > > > > > * more to do. */
> > > > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > > > + * either clear the flags bit or point the event index at the next
> > > > > > + * entry. Always update the event index to keep code simple. */
> > > > > > +
> > > > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > > > + vq->last_used_idx | (vq->wrap_counter << 15));
> > > > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > > > virtio_wmb(vq->weak_barriers);
> > > > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > > > + VRING_EVENT_F_ENABLE;
> > > > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > > > vq->event_flags_shadow);
> > > > > > }
> > > > > > @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> > > > > > static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > > > {
> > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > + u16 bufs, used_idx, wrap_counter;
> > > > > > START_USE(vq);
> > > > > > /* We optimistically turn back on interrupts, then check if there was
> > > > > > * more to do. */
> > > > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > > > + * either clear the flags bit or point the event index at the next
> > > > > > + * entry. Always update the event index to keep code simple. */
> > > > > > +
> > > > > > + /* TODO: tune this threshold */
> > > > > > + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> > > > > > +
> > > > > > + used_idx = vq->last_used_idx + bufs;
> > > > > > + wrap_counter = vq->wrap_counter;
> > > > > > +
> > > > > > + if (used_idx >= vq->vring_packed.num) {
> > > > > > + used_idx -= vq->vring_packed.num;
> > > > > > + wrap_counter ^= 1;
> > > > > > + }
> > > > > > +
> > > > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > > > + used_idx | (wrap_counter << 15));
> > > > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > > > virtio_wmb(vq->weak_barriers);
> > > > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > > > + VRING_EVENT_F_ENABLE;
> > > > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > > > vq->event_flags_shadow);
> > > > > > }
> > > > > > @@ -1822,8 +1865,10 @@ void vring_transport_features(struct virtio_device *vdev)
> > > > > > switch (i) {
> > > > > > case VIRTIO_RING_F_INDIRECT_DESC:
> > > > > > break;
> > > > > > +#if 0
> > > > > > case VIRTIO_RING_F_EVENT_IDX:
> > > > > > break;
> > > > > > +#endif
> > > > > > case VIRTIO_F_VERSION_1:
> > > > > > break;
> > > > > > case VIRTIO_F_IOMMU_PLATFORM:
> > > > >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Michael S. Tsirkin @ 2018-05-03 1:44 UTC (permalink / raw)
To: Tiwei Bie; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180503011116.qvoyblcpklinrk26@debian>
On Thu, May 03, 2018 at 09:11:16AM +0800, Tiwei Bie wrote:
> On Wed, May 02, 2018 at 06:42:57PM +0300, Michael S. Tsirkin wrote:
> > On Wed, May 02, 2018 at 11:12:55PM +0800, Tiwei Bie wrote:
> > > On Wed, May 02, 2018 at 04:51:01PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, May 02, 2018 at 03:28:19PM +0800, Tiwei Bie wrote:
> > > > > On Wed, May 02, 2018 at 10:51:06AM +0800, Jason Wang wrote:
> > > > > > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > > > > > This commit introduces the event idx support in packed
> > > > > > > ring. This feature is temporarily disabled, because the
> > > > > > > implementation in this patch may not work as expected,
> > > > > > > and some further discussions on the implementation are
> > > > > > > needed, e.g. do we have to check the wrap counter when
> > > > > > > checking whether a kick is needed?
> > > > > > >
> > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > ---
> > > > > > > drivers/virtio/virtio_ring.c | 53 ++++++++++++++++++++++++++++++++++++++++----
> > > > > > > 1 file changed, 49 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > index 0181e93897be..b1039c2985b9 100644
> > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > @@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > > > > > static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > > {
> > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > - u16 flags;
> > > > > > > + u16 new, old, off_wrap, flags;
> > > > > > > bool needs_kick;
> > > > > > > u32 snapshot;
> > > > > > > @@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > > * suppressions. */
> > > > > > > virtio_mb(vq->weak_barriers);
> > > > > > > + old = vq->next_avail_idx - vq->num_added;
> > > > > > > + new = vq->next_avail_idx;
> > > > > > > + vq->num_added = 0;
> > > > > > > +
> > > > > > > snapshot = *(u32 *)vq->vring_packed.device;
> > > > > > > + off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
> > > > > > > flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
> > > > > > > #ifdef DEBUG
> > > > > > > @@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > > vq->last_add_time_valid = false;
> > > > > > > #endif
> > > > > > > - needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > > > > + if (flags == VRING_EVENT_F_DESC)
> > > > > > > + needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
> > > > > >
> > > > > > I wonder whether or not the math is correct. Both new and event are in the
> > > > > > unit of descriptor ring size, but old looks not.
> > > > >
> > > > > What vring_need_event() cares is the distance between
> > > > > `new` and `old`, i.e. vq->num_added. So I think there
> > > > > is nothing wrong with `old`. But the calculation of the
> > > > > distance between `new` and `event_idx` isn't right when
> > > > > `new` wraps. How do you think about the below code:
> > > > >
> > > > > wrap_counter = off_wrap >> 15;
> > > > > event_idx = off_wrap & ~(1<<15);
> > > > > if (wrap_counter != vq->wrap_counter)
> > > > > event_idx -= vq->vring_packed.num;
> > > > >
> > > > > needs_kick = vring_need_event(event_idx, new, old);
> > > >
> > > > I suspect this hack won't work for non power of 2 ring.
> > >
> > > Above code doesn't require the ring size to be a power of 2.
> > >
> > > For (__u16)(new_idx - old), what we want to get is vq->num_added.
> > >
> > > old = vq->next_avail_idx - vq->num_added;
> > > new = vq->next_avail_idx;
> > >
> > > When vq->next_avail_idx >= vq->num_added, it's obvious that,
> > > (__u16)(new_idx - old) is vq->num_added.
> > >
> > > And when vq->next_avail_idx < vq->num_added, new will be smaller
> > > than old (old will be a big unsigned number), but (__u16)(new_idx
> > > - old) is still vq->num_added.
> > >
> > > For (__u16)(new_idx - event_idx - 1), when new wraps and event_idx
> > > doesn't wrap, the most straightforward way to calculate it is:
> > > (new + vq->vring_packed.num) - event_idx - 1.
> >
> > So how about we use the straightforward way then?
>
> You mean we do new += vq->vring_packed.num instead
> of event_idx -= vq->vring_packed.num before calling
> vring_need_event()?
>
> The problem is that, the second param (new_idx) of
> vring_need_event() will be used for:
>
> (__u16)(new_idx - event_idx - 1)
> (__u16)(new_idx - old)
>
> So if we change new, we will need to change old too.
I think that since we have a branch there anyway,
we are better off just special-casing if (wrap_counter != vq->wrap_counter).
Treat is differenty and avoid casts.
> And that would be an ugly hack..
>
> Best regards,
> Tiwei Bie
I consider casts and huge numbers with two's complement
games even uglier.
> >
> > > But we can also calculate it in this way:
> > >
> > > event_idx -= vq->vring_packed.num;
> > > (event_idx will be a big unsigned number)
> > >
> > > Then (__u16)(new_idx - event_idx - 1) will be the value we want.
> > >
> > > Best regards,
> > > Tiwei Bie
> >
> >
> > > >
> > > >
> > > > > Best regards,
> > > > > Tiwei Bie
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > > + else
> > > > > > > + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > > > > END_USE(vq);
> > > > > > > return needs_kick;
> > > > > > > }
> > > > > > > @@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > > > > > > if (vq->last_used_idx >= vq->vring_packed.num)
> > > > > > > vq->last_used_idx -= vq->vring_packed.num;
> > > > > > > + /* If we expect an interrupt for the next entry, tell host
> > > > > > > + * by writing event index and flush out the write before
> > > > > > > + * the read in the next get_buf call. */
> > > > > > > + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> > > > > > > + virtio_store_mb(vq->weak_barriers,
> > > > > > > + &vq->vring_packed.driver->off_wrap,
> > > > > > > + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> > > > > > > + (vq->wrap_counter << 15)));
> > > > > > > +
> > > > > > > #ifdef DEBUG
> > > > > > > vq->last_add_time_valid = false;
> > > > > > > #endif
> > > > > > > @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > > > > > > /* We optimistically turn back on interrupts, then check if there was
> > > > > > > * more to do. */
> > > > > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > > > > + * either clear the flags bit or point the event index at the next
> > > > > > > + * entry. Always update the event index to keep code simple. */
> > > > > > > +
> > > > > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > > > > + vq->last_used_idx | (vq->wrap_counter << 15));
> > > > > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > > > > virtio_wmb(vq->weak_barriers);
> > > > > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > > > > + VRING_EVENT_F_ENABLE;
> > > > > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > > > > vq->event_flags_shadow);
> > > > > > > }
> > > > > > > @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> > > > > > > static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > > > > {
> > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > + u16 bufs, used_idx, wrap_counter;
> > > > > > > START_USE(vq);
> > > > > > > /* We optimistically turn back on interrupts, then check if there was
> > > > > > > * more to do. */
> > > > > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > > > > + * either clear the flags bit or point the event index at the next
> > > > > > > + * entry. Always update the event index to keep code simple. */
> > > > > > > +
> > > > > > > + /* TODO: tune this threshold */
> > > > > > > + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> > > > > > > +
> > > > > > > + used_idx = vq->last_used_idx + bufs;
> > > > > > > + wrap_counter = vq->wrap_counter;
> > > > > > > +
> > > > > > > + if (used_idx >= vq->vring_packed.num) {
> > > > > > > + used_idx -= vq->vring_packed.num;
> > > > > > > + wrap_counter ^= 1;
> > > > > > > + }
> > > > > > > +
> > > > > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > > > > + used_idx | (wrap_counter << 15));
> > > > > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > > > > virtio_wmb(vq->weak_barriers);
> > > > > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > > > > + VRING_EVENT_F_ENABLE;
> > > > > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > > > > vq->event_flags_shadow);
> > > > > > > }
> > > > > > > @@ -1822,8 +1865,10 @@ void vring_transport_features(struct virtio_device *vdev)
> > > > > > > switch (i) {
> > > > > > > case VIRTIO_RING_F_INDIRECT_DESC:
> > > > > > > break;
> > > > > > > +#if 0
> > > > > > > case VIRTIO_RING_F_EVENT_IDX:
> > > > > > > break;
> > > > > > > +#endif
> > > > > > > case VIRTIO_F_VERSION_1:
> > > > > > > break;
> > > > > > > case VIRTIO_F_IOMMU_PLATFORM:
> > > > > >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-03 2:09 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180503044218-mutt-send-email-mst@kernel.org>
On Thu, May 03, 2018 at 04:44:39AM +0300, Michael S. Tsirkin wrote:
> On Thu, May 03, 2018 at 09:11:16AM +0800, Tiwei Bie wrote:
> > On Wed, May 02, 2018 at 06:42:57PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, May 02, 2018 at 11:12:55PM +0800, Tiwei Bie wrote:
> > > > On Wed, May 02, 2018 at 04:51:01PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, May 02, 2018 at 03:28:19PM +0800, Tiwei Bie wrote:
> > > > > > On Wed, May 02, 2018 at 10:51:06AM +0800, Jason Wang wrote:
> > > > > > > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > > > > > > This commit introduces the event idx support in packed
> > > > > > > > ring. This feature is temporarily disabled, because the
> > > > > > > > implementation in this patch may not work as expected,
> > > > > > > > and some further discussions on the implementation are
> > > > > > > > needed, e.g. do we have to check the wrap counter when
> > > > > > > > checking whether a kick is needed?
> > > > > > > >
> > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > > ---
> > > > > > > > drivers/virtio/virtio_ring.c | 53 ++++++++++++++++++++++++++++++++++++++++----
> > > > > > > > 1 file changed, 49 insertions(+), 4 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > index 0181e93897be..b1039c2985b9 100644
> > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > @@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > > > > > > static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > > > {
> > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > - u16 flags;
> > > > > > > > + u16 new, old, off_wrap, flags;
> > > > > > > > bool needs_kick;
> > > > > > > > u32 snapshot;
> > > > > > > > @@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > > > * suppressions. */
> > > > > > > > virtio_mb(vq->weak_barriers);
> > > > > > > > + old = vq->next_avail_idx - vq->num_added;
> > > > > > > > + new = vq->next_avail_idx;
> > > > > > > > + vq->num_added = 0;
> > > > > > > > +
> > > > > > > > snapshot = *(u32 *)vq->vring_packed.device;
> > > > > > > > + off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
> > > > > > > > flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
> > > > > > > > #ifdef DEBUG
> > > > > > > > @@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> > > > > > > > vq->last_add_time_valid = false;
> > > > > > > > #endif
> > > > > > > > - needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > > > > > + if (flags == VRING_EVENT_F_DESC)
> > > > > > > > + needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
> > > > > > >
> > > > > > > I wonder whether or not the math is correct. Both new and event are in the
> > > > > > > unit of descriptor ring size, but old looks not.
> > > > > >
> > > > > > What vring_need_event() cares is the distance between
> > > > > > `new` and `old`, i.e. vq->num_added. So I think there
> > > > > > is nothing wrong with `old`. But the calculation of the
> > > > > > distance between `new` and `event_idx` isn't right when
> > > > > > `new` wraps. How do you think about the below code:
> > > > > >
> > > > > > wrap_counter = off_wrap >> 15;
> > > > > > event_idx = off_wrap & ~(1<<15);
> > > > > > if (wrap_counter != vq->wrap_counter)
> > > > > > event_idx -= vq->vring_packed.num;
> > > > > >
> > > > > > needs_kick = vring_need_event(event_idx, new, old);
> > > > >
> > > > > I suspect this hack won't work for non power of 2 ring.
> > > >
> > > > Above code doesn't require the ring size to be a power of 2.
> > > >
> > > > For (__u16)(new_idx - old), what we want to get is vq->num_added.
> > > >
> > > > old = vq->next_avail_idx - vq->num_added;
> > > > new = vq->next_avail_idx;
> > > >
> > > > When vq->next_avail_idx >= vq->num_added, it's obvious that,
> > > > (__u16)(new_idx - old) is vq->num_added.
> > > >
> > > > And when vq->next_avail_idx < vq->num_added, new will be smaller
> > > > than old (old will be a big unsigned number), but (__u16)(new_idx
> > > > - old) is still vq->num_added.
> > > >
> > > > For (__u16)(new_idx - event_idx - 1), when new wraps and event_idx
> > > > doesn't wrap, the most straightforward way to calculate it is:
> > > > (new + vq->vring_packed.num) - event_idx - 1.
> > >
> > > So how about we use the straightforward way then?
> >
> > You mean we do new += vq->vring_packed.num instead
> > of event_idx -= vq->vring_packed.num before calling
> > vring_need_event()?
> >
> > The problem is that, the second param (new_idx) of
> > vring_need_event() will be used for:
> >
> > (__u16)(new_idx - event_idx - 1)
> > (__u16)(new_idx - old)
> >
> > So if we change new, we will need to change old too.
>
> I think that since we have a branch there anyway,
> we are better off just special-casing if (wrap_counter != vq->wrap_counter).
> Treat is differenty and avoid casts.
>
> > And that would be an ugly hack..
> >
> > Best regards,
> > Tiwei Bie
>
> I consider casts and huge numbers with two's complement
> games even uglier.
The dependency on two's complement game is introduced
since the split ring.
In packed ring, old is calculated via:
old = vq->next_avail_idx - vq->num_added;
In split ring, old is calculated via:
old = vq->avail_idx_shadow - vq->num_added;
In both cases, when vq->num_added is bigger, old will
be a big number.
Best regards,
Tiwei Bie
>
> > >
> > > > But we can also calculate it in this way:
> > > >
> > > > event_idx -= vq->vring_packed.num;
> > > > (event_idx will be a big unsigned number)
> > > >
> > > > Then (__u16)(new_idx - event_idx - 1) will be the value we want.
> > > >
> > > > Best regards,
> > > > Tiwei Bie
> > >
> > >
> > > > >
> > > > >
> > > > > > Best regards,
> > > > > > Tiwei Bie
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > + else
> > > > > > > > + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> > > > > > > > END_USE(vq);
> > > > > > > > return needs_kick;
> > > > > > > > }
> > > > > > > > @@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > > > > > > > if (vq->last_used_idx >= vq->vring_packed.num)
> > > > > > > > vq->last_used_idx -= vq->vring_packed.num;
> > > > > > > > + /* If we expect an interrupt for the next entry, tell host
> > > > > > > > + * by writing event index and flush out the write before
> > > > > > > > + * the read in the next get_buf call. */
> > > > > > > > + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> > > > > > > > + virtio_store_mb(vq->weak_barriers,
> > > > > > > > + &vq->vring_packed.driver->off_wrap,
> > > > > > > > + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> > > > > > > > + (vq->wrap_counter << 15)));
> > > > > > > > +
> > > > > > > > #ifdef DEBUG
> > > > > > > > vq->last_add_time_valid = false;
> > > > > > > > #endif
> > > > > > > > @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > > > > > > > /* We optimistically turn back on interrupts, then check if there was
> > > > > > > > * more to do. */
> > > > > > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > > > > > + * either clear the flags bit or point the event index at the next
> > > > > > > > + * entry. Always update the event index to keep code simple. */
> > > > > > > > +
> > > > > > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > > > > > + vq->last_used_idx | (vq->wrap_counter << 15));
> > > > > > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > > > > > virtio_wmb(vq->weak_barriers);
> > > > > > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > > > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > > > > > + VRING_EVENT_F_ENABLE;
> > > > > > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > > > > > vq->event_flags_shadow);
> > > > > > > > }
> > > > > > > > @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> > > > > > > > static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > > > > > {
> > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > + u16 bufs, used_idx, wrap_counter;
> > > > > > > > START_USE(vq);
> > > > > > > > /* We optimistically turn back on interrupts, then check if there was
> > > > > > > > * more to do. */
> > > > > > > > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > > > > > > > + * either clear the flags bit or point the event index at the next
> > > > > > > > + * entry. Always update the event index to keep code simple. */
> > > > > > > > +
> > > > > > > > + /* TODO: tune this threshold */
> > > > > > > > + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> > > > > > > > +
> > > > > > > > + used_idx = vq->last_used_idx + bufs;
> > > > > > > > + wrap_counter = vq->wrap_counter;
> > > > > > > > +
> > > > > > > > + if (used_idx >= vq->vring_packed.num) {
> > > > > > > > + used_idx -= vq->vring_packed.num;
> > > > > > > > + wrap_counter ^= 1;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > > > > > > > + used_idx | (wrap_counter << 15));
> > > > > > > > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > > > > > > > virtio_wmb(vq->weak_barriers);
> > > > > > > > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > > > > > > > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > > > > > > > + VRING_EVENT_F_ENABLE;
> > > > > > > > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > > > > > > > vq->event_flags_shadow);
> > > > > > > > }
> > > > > > > > @@ -1822,8 +1865,10 @@ void vring_transport_features(struct virtio_device *vdev)
> > > > > > > > switch (i) {
> > > > > > > > case VIRTIO_RING_F_INDIRECT_DESC:
> > > > > > > > break;
> > > > > > > > +#if 0
> > > > > > > > case VIRTIO_RING_F_EVENT_IDX:
> > > > > > > > break;
> > > > > > > > +#endif
> > > > > > > > case VIRTIO_F_VERSION_1:
> > > > > > > > break;
> > > > > > > > case VIRTIO_F_IOMMU_PLATFORM:
> > > > > > >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [RFC] virtio: support VIRTIO_F_IO_BARRIER
From: Tiwei Bie @ 2018-05-03 2:59 UTC (permalink / raw)
To: mst, jasowang, pbonzini, stefanha, virtualization, linux-kernel
Cc: zhihong.wang
This patch introduces the support for VIRTIO_F_IO_BARRIER.
When this feature is negotiated, driver will use the barriers
suitable for hardware devices.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 5 +++++
include/uapi/linux/virtio_config.h | 8 +++++++-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 21d464a29cf8..edb565643bf4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -996,6 +996,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+ if (virtio_has_feature(vdev, VIRTIO_F_IO_BARRIER))
+ vq->weak_barriers = false;
+
/* No callback? Tell other side not to bother us. */
if (!callback) {
vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
@@ -1164,6 +1167,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_IOMMU_PLATFORM:
break;
+ case VIRTIO_F_IO_BARRIER:
+ break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e2096291f..6ca8d24bf468 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
* transport being used (eg. virtio_ring), the rest are per-device feature
* bits. */
#define VIRTIO_TRANSPORT_F_START 28
-#define VIRTIO_TRANSPORT_F_END 34
+#define VIRTIO_TRANSPORT_F_END 38
#ifndef VIRTIO_CONFIG_NO_LEGACY
/* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,10 @@
* this is for compatibility with legacy systems.
*/
#define VIRTIO_F_IOMMU_PLATFORM 33
+
+/*
+ * If clear - driver may use barriers suitable for CPU cores.
+ * If set - driver must use barriers suitable for hardware devices.
+ */
+#define VIRTIO_F_IO_BARRIER 37
#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
--
2.11.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox