* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 16:52 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522154501.GL2149@nanopsycho>
On Tue, May 22, 2018 at 05:45:01PM +0200, Jiri Pirko wrote:
> Tue, May 22, 2018 at 05:32:30PM CEST, mst@redhat.com wrote:
> >On Tue, May 22, 2018 at 05:13:43PM +0200, Jiri Pirko wrote:
> >> Tue, May 22, 2018 at 03:39:33PM CEST, mst@redhat.com wrote:
> >> >On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
> >> >> Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
> >> >> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
> >> >> >> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
> >> >> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
> >> >> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
> >> >> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
> >> >> >> >> >>Use the registration/notification framework supported by the generic
> >> >> >> >> >>failover infrastructure.
> >> >> >> >> >>
> >> >> >> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> >> >> >
> >> >> >> >> >In previous patchset versions, the common code did
> >> >> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
> >> >> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
> >> >> >> >> >
> >> >> >> >> >This should be part of the common "failover" code.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> Also note that in the current patchset you use IFF_FAILOVER flag for
> >> >> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
> >> >> >> >> IFF_FAILOVER_SLAVE should be used.
> >> >> >> >
> >> >> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
> >> >> >>
> >> >> >> No. IFF_SLAVE is for bonding.
> >> >> >
> >> >> >What breaks if we reuse it for failover?
> >> >>
> >> >> This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
> >> >> And failover slave is not a bonding slave.
> >> >
> >> >That does not really answer the question. I'd claim it's sufficiently
> >> >like a bond slave for IFF_SLAVE to make sense.
> >> >
> >> >In fact you will find that netvsc already sets IFF_SLAVE, and so
> >>
> >> netvsc does the whole failover thing in a wrong way. This patchset is
> >> trying to fix it.
> >
> >Maybe, but we don't need gratuitous changes either, especially if they
> >break userspace.
>
> What do you mean by the "break"? It was a mistake to reuse IFF_SLAVE at
> the first place, lets fix it. If some userspace depends on that flag, it
> is broken anyway.
>
>
> >
> >> >does e.g. the eql driver.
> >> >
> >> >The advantage of using IFF_SLAVE is that userspace knows to skip it. If
> >>
> >> The userspace should know how to skip other types of slaves - team,
> >> bridge, ovs, etc.
> >> The "master link" should be the one to look at.
> >>
> >
> >How should existing userspace know which ones to skip and which one is
> >the master? Right now userspace seems to assume whatever does not have
> >IFF_SLAVE should be looked at. Are you saying that's not the right thing
>
> Why do you say so? What do you mean by "looked at"? Certainly not.
> IFLA_MASTER is the attribute that should be looked at, nothing else.
>
>
> >to do and userspace should be fixed? What should userspace do in
> >your opinion that will be forward compatible with future kernels?
> >
> >>
> >> >we don't set IFF_SLAVE existing userspace tries to use the lowerdev.
> >>
> >> Each master type has a IFF_ master flag and IFF_ slave flag.
> >
> >Could you give some examples please?
>
> enum netdev_priv_flags {
> IFF_EBRIDGE = 1<<1,
> IFF_BRIDGE_PORT = 1<<9,
> IFF_OPENVSWITCH = 1<<20,
> IFF_OVS_DATAPATH = 1<<10,
> IFF_L3MDEV_MASTER = 1<<18,
> IFF_L3MDEV_SLAVE = 1<<21,
> IFF_TEAM = 1<<22,
> IFF_TEAM_PORT = 1<<13,
> };
That's not in uapi, is it? the comment above that says:
These flags are invisible to userspace
>
> >
> >> In private
> >> flag. I don't see no reason to break this pattern here.
> >
> >Other masters are setup from userspace, this one is set up automatically
> >by kernel. So the bar is higher, we need an interface that existing
> >userspace knows about. We can't just say "oh if userspace set this up
> >it should know to skip lowerdevs".
> >
> >Otherwise multiple interfaces with same mac tend to confuse userspace.
>
> No difference, really.
> Regardless who does the setup, and independent userspace deamon should
> react accordingly.
If the deamon does the setup itself, it's reasonable to require that it
learns about new flags each time we add a new driver. If it doesn't,
then I think it's less reasonable.
--
MST
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 16:12 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, Samudrala, Sridhar,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522184112-mutt-send-email-mst@kernel.org>
Fixing the subj, sorry about that.
Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>> >
>> >On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> >> > Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> >> > > Use the registration/notification framework supported by the generic
>> >> > > failover infrastructure.
>> >> > >
>> >> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> > In previous patchset versions, the common code did
>> >> > netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> >> > (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >> >
>> >> > This should be part of the common "failover" code.
>> >
>> >Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>> >netvsc and only commonize the notifier and the main event handler routine.
>> >Another complication is that netvsc does part of registration in a delayed workqueue.
>>
>> :( This kind of degrades the whole efford of having single solution
>> in "failover" module. I think that common parts, as
>> netdev_rx_handler_register() and others certainly is should be inside
>> the common module. This is not a good time to minimize changes. Let's do
>> the thing properly and fix the netvsc mess now.
>>
>>
>> >
>> >It should be possible to move some of the code from net_failover.c to generic
>> >failover.c in future if Stephen is ok with it.
>> >
>> >
>> >> >
>> >> Also note that in the current patchset you use IFF_FAILOVER flag for
>> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> >> IFF_FAILOVER_SLAVE should be used.
>> >
>> >Not sure which code you are referring to. I only set IFF_FAILOVER_SLAVE
>> >in patch 3.
>>
>> The existing netvsc driver.
>
>We really can't change netvsc's flags now, even if it's interface is
>messy, it's being used in the field. We can add a flag that makes netvsc
>behave differently, and if this flag also allows enhanced functionality
>userspace will gradually switch.
Okay, although in this case, it really does not make much sense, so be
it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
now. (This once-wrong-forever-wrong policy is flustrating me).
But since this patchset introduces private flag IFF_FAILOVER and
IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
netdevice to get at least some consistency between virtio_net and
netvsc.
>
>Anything breaking userspace I fully expect Stephen to nack and
>IMO with good reason.
>
>--
>MST
^ permalink raw reply
* Re: Shepherd request (P83): Multipath TCP: Present Use Cases and an Upstream Future
From: Michael S. Tsirkin @ 2018-05-22 15:46 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Samudrala, Sridhar,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522153614.GK2149@nanopsycho>
On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
> >
> >On 5/22/2018 2:08 AM, Jiri Pirko wrote:
> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
> >> > Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
> >> > > Use the registration/notification framework supported by the generic
> >> > > failover infrastructure.
> >> > >
> >> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> > In previous patchset versions, the common code did
> >> > netdev_rx_handler_register() and netdev_upper_dev_link() etc
> >> > (netvsc_vf_join()). Now, this is still done in netvsc. Why?
> >> >
> >> > This should be part of the common "failover" code.
> >
> >Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
> >netvsc and only commonize the notifier and the main event handler routine.
> >Another complication is that netvsc does part of registration in a delayed workqueue.
>
> :( This kind of degrades the whole efford of having single solution
> in "failover" module. I think that common parts, as
> netdev_rx_handler_register() and others certainly is should be inside
> the common module. This is not a good time to minimize changes. Let's do
> the thing properly and fix the netvsc mess now.
>
>
> >
> >It should be possible to move some of the code from net_failover.c to generic
> >failover.c in future if Stephen is ok with it.
> >
> >
> >> >
> >> Also note that in the current patchset you use IFF_FAILOVER flag for
> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
> >> IFF_FAILOVER_SLAVE should be used.
> >
> >Not sure which code you are referring to. I only set IFF_FAILOVER_SLAVE
> >in patch 3.
>
> The existing netvsc driver.
We really can't change netvsc's flags now, even if it's interface is
messy, it's being used in the field. We can add a flag that makes netvsc
behave differently, and if this flag also allows enhanced functionality
userspace will gradually switch.
Anything breaking userspace I fully expect Stephen to nack and
IMO with good reason.
--
MST
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 15:45 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522181804-mutt-send-email-mst@kernel.org>
Tue, May 22, 2018 at 05:32:30PM CEST, mst@redhat.com wrote:
>On Tue, May 22, 2018 at 05:13:43PM +0200, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 03:39:33PM CEST, mst@redhat.com wrote:
>> >On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
>> >> Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
>> >> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
>> >> >> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
>> >> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
>> >> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> >> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> >> >> >> >>Use the registration/notification framework supported by the generic
>> >> >> >> >>failover infrastructure.
>> >> >> >> >>
>> >> >> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> >> >> >
>> >> >> >> >In previous patchset versions, the common code did
>> >> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> >> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >> >> >> >
>> >> >> >> >This should be part of the common "failover" code.
>> >> >> >> >
>> >> >> >>
>> >> >> >> Also note that in the current patchset you use IFF_FAILOVER flag for
>> >> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> >> >> >> IFF_FAILOVER_SLAVE should be used.
>> >> >> >
>> >> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
>> >> >>
>> >> >> No. IFF_SLAVE is for bonding.
>> >> >
>> >> >What breaks if we reuse it for failover?
>> >>
>> >> This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
>> >> And failover slave is not a bonding slave.
>> >
>> >That does not really answer the question. I'd claim it's sufficiently
>> >like a bond slave for IFF_SLAVE to make sense.
>> >
>> >In fact you will find that netvsc already sets IFF_SLAVE, and so
>>
>> netvsc does the whole failover thing in a wrong way. This patchset is
>> trying to fix it.
>
>Maybe, but we don't need gratuitous changes either, especially if they
>break userspace.
What do you mean by the "break"? It was a mistake to reuse IFF_SLAVE at
the first place, lets fix it. If some userspace depends on that flag, it
is broken anyway.
>
>> >does e.g. the eql driver.
>> >
>> >The advantage of using IFF_SLAVE is that userspace knows to skip it. If
>>
>> The userspace should know how to skip other types of slaves - team,
>> bridge, ovs, etc.
>> The "master link" should be the one to look at.
>>
>
>How should existing userspace know which ones to skip and which one is
>the master? Right now userspace seems to assume whatever does not have
>IFF_SLAVE should be looked at. Are you saying that's not the right thing
Why do you say so? What do you mean by "looked at"? Certainly not.
IFLA_MASTER is the attribute that should be looked at, nothing else.
>to do and userspace should be fixed? What should userspace do in
>your opinion that will be forward compatible with future kernels?
>
>>
>> >we don't set IFF_SLAVE existing userspace tries to use the lowerdev.
>>
>> Each master type has a IFF_ master flag and IFF_ slave flag.
>
>Could you give some examples please?
enum netdev_priv_flags {
IFF_EBRIDGE = 1<<1,
IFF_BRIDGE_PORT = 1<<9,
IFF_OPENVSWITCH = 1<<20,
IFF_OVS_DATAPATH = 1<<10,
IFF_L3MDEV_MASTER = 1<<18,
IFF_L3MDEV_SLAVE = 1<<21,
IFF_TEAM = 1<<22,
IFF_TEAM_PORT = 1<<13,
};
>
>> In private
>> flag. I don't see no reason to break this pattern here.
>
>Other masters are setup from userspace, this one is set up automatically
>by kernel. So the bar is higher, we need an interface that existing
>userspace knows about. We can't just say "oh if userspace set this up
>it should know to skip lowerdevs".
>
>Otherwise multiple interfaces with same mac tend to confuse userspace.
No difference, really.
Regardless who does the setup, and independent userspace deamon should
react accordingly.
^ permalink raw reply
* Re: Shepherd request (P83): Multipath TCP: Present Use Cases and an Upstream Future
From: Jiri Pirko @ 2018-05-22 15:36 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <39081bce-3913-5b07-3d07-0c476fca5e78@intel.com>
Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>
>On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> > Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> > > Use the registration/notification framework supported by the generic
>> > > failover infrastructure.
>> > >
>> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > In previous patchset versions, the common code did
>> > netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> > (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >
>> > This should be part of the common "failover" code.
>
>Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>netvsc and only commonize the notifier and the main event handler routine.
>Another complication is that netvsc does part of registration in a delayed workqueue.
:( This kind of degrades the whole efford of having single solution
in "failover" module. I think that common parts, as
netdev_rx_handler_register() and others certainly is should be inside
the common module. This is not a good time to minimize changes. Let's do
the thing properly and fix the netvsc mess now.
>
>It should be possible to move some of the code from net_failover.c to generic
>failover.c in future if Stephen is ok with it.
>
>
>> >
>> Also note that in the current patchset you use IFF_FAILOVER flag for
>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> IFF_FAILOVER_SLAVE should be used.
>
>Not sure which code you are referring to. I only set IFF_FAILOVER_SLAVE
>in patch 3.
The existing netvsc driver.
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 15:32 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522151343.GJ2149@nanopsycho>
On Tue, May 22, 2018 at 05:13:43PM +0200, Jiri Pirko wrote:
> Tue, May 22, 2018 at 03:39:33PM CEST, mst@redhat.com wrote:
> >On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
> >> Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
> >> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
> >> >> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
> >> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
> >> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
> >> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
> >> >> >> >>Use the registration/notification framework supported by the generic
> >> >> >> >>failover infrastructure.
> >> >> >> >>
> >> >> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> >> >
> >> >> >> >In previous patchset versions, the common code did
> >> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
> >> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
> >> >> >> >
> >> >> >> >This should be part of the common "failover" code.
> >> >> >> >
> >> >> >>
> >> >> >> Also note that in the current patchset you use IFF_FAILOVER flag for
> >> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
> >> >> >> IFF_FAILOVER_SLAVE should be used.
> >> >> >
> >> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
> >> >>
> >> >> No. IFF_SLAVE is for bonding.
> >> >
> >> >What breaks if we reuse it for failover?
> >>
> >> This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
> >> And failover slave is not a bonding slave.
> >
> >That does not really answer the question. I'd claim it's sufficiently
> >like a bond slave for IFF_SLAVE to make sense.
> >
> >In fact you will find that netvsc already sets IFF_SLAVE, and so
>
> netvsc does the whole failover thing in a wrong way. This patchset is
> trying to fix it.
Maybe, but we don't need gratuitous changes either, especially if they
break userspace.
> >does e.g. the eql driver.
> >
> >The advantage of using IFF_SLAVE is that userspace knows to skip it. If
>
> The userspace should know how to skip other types of slaves - team,
> bridge, ovs, etc.
> The "master link" should be the one to look at.
>
How should existing userspace know which ones to skip and which one is
the master? Right now userspace seems to assume whatever does not have
IFF_SLAVE should be looked at. Are you saying that's not the right thing
to do and userspace should be fixed? What should userspace do in
your opinion that will be forward compatible with future kernels?
>
> >we don't set IFF_SLAVE existing userspace tries to use the lowerdev.
>
> Each master type has a IFF_ master flag and IFF_ slave flag.
Could you give some examples please?
> In private
> flag. I don't see no reason to break this pattern here.
Other masters are setup from userspace, this one is set up automatically
by kernel. So the bar is higher, we need an interface that existing
userspace knows about. We can't just say "oh if userspace set this up
it should know to skip lowerdevs".
Otherwise multiple interfaces with same mac tend to confuse userspace.
--
MST
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-22 15:28 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180522090853.GF2149@nanopsycho>
On 5/22/2018 2:08 AM, Jiri Pirko wrote:
> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>>> Use the registration/notification framework supported by the generic
>>> failover infrastructure.
>>>
>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> In previous patchset versions, the common code did
>> netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>>
>> This should be part of the common "failover" code.
Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
netvsc and only commonize the notifier and the main event handler routine.
Another complication is that netvsc does part of registration in a delayed workqueue.
It should be possible to move some of the code from net_failover.c to generic
failover.c in future if Stephen is ok with it.
>>
> Also note that in the current patchset you use IFF_FAILOVER flag for
> master, yet for the slave you use IFF_SLAVE. That is wrong.
> IFF_FAILOVER_SLAVE should be used.
Not sure which code you are referring to. I only set IFF_FAILOVER_SLAVE
in patch 3.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 15:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522163502-mutt-send-email-mst@kernel.org>
Tue, May 22, 2018 at 03:39:33PM CEST, mst@redhat.com wrote:
>On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
>> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
>> >> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
>> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
>> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> >> >> >>Use the registration/notification framework supported by the generic
>> >> >> >>failover infrastructure.
>> >> >> >>
>> >> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> >> >
>> >> >> >In previous patchset versions, the common code did
>> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >> >> >
>> >> >> >This should be part of the common "failover" code.
>> >> >> >
>> >> >>
>> >> >> Also note that in the current patchset you use IFF_FAILOVER flag for
>> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> >> >> IFF_FAILOVER_SLAVE should be used.
>> >> >
>> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
>> >>
>> >> No. IFF_SLAVE is for bonding.
>> >
>> >What breaks if we reuse it for failover?
>>
>> This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
>> And failover slave is not a bonding slave.
>
>That does not really answer the question. I'd claim it's sufficiently
>like a bond slave for IFF_SLAVE to make sense.
>
>In fact you will find that netvsc already sets IFF_SLAVE, and so
netvsc does the whole failover thing in a wrong way. This patchset is
trying to fix it.
>does e.g. the eql driver.
>
>The advantage of using IFF_SLAVE is that userspace knows to skip it. If
The userspace should know how to skip other types of slaves - team,
bridge, ovs, etc. The "master link" should be the one to look at.
>we don't set IFF_SLAVE existing userspace tries to use the lowerdev.
Each master type has a IFF_ master flag and IFF_ slave flag. In private
flag. I don't see no reason to break this pattern here.
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 13:39 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522132626.GH2149@nanopsycho>
On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
> Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
> >> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
> >> >> >>Use the registration/notification framework supported by the generic
> >> >> >>failover infrastructure.
> >> >> >>
> >> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> >
> >> >> >In previous patchset versions, the common code did
> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
> >> >> >
> >> >> >This should be part of the common "failover" code.
> >> >> >
> >> >>
> >> >> Also note that in the current patchset you use IFF_FAILOVER flag for
> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
> >> >> IFF_FAILOVER_SLAVE should be used.
> >> >
> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
> >>
> >> No. IFF_SLAVE is for bonding.
> >
> >What breaks if we reuse it for failover?
>
> This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
> And failover slave is not a bonding slave.
That does not really answer the question. I'd claim it's sufficiently
like a bond slave for IFF_SLAVE to make sense.
In fact you will find that netvsc already sets IFF_SLAVE, and so
does e.g. the eql driver.
The advantage of using IFF_SLAVE is that userspace knows to skip it. If
we don't set IFF_SLAVE existing userspace tries to use the lowerdev.
--
MST
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 13:26 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522161509-mutt-send-email-mst@kernel.org>
Tue, May 22, 2018 at 03:17:37PM CEST, mst@redhat.com wrote:
>On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
>> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
>> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> >> >>Use the registration/notification framework supported by the generic
>> >> >>failover infrastructure.
>> >> >>
>> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> >
>> >> >In previous patchset versions, the common code did
>> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >> >
>> >> >This should be part of the common "failover" code.
>> >> >
>> >>
>> >> Also note that in the current patchset you use IFF_FAILOVER flag for
>> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> >> IFF_FAILOVER_SLAVE should be used.
>> >
>> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
>>
>> No. IFF_SLAVE is for bonding.
>
>What breaks if we reuse it for failover?
This is exposed to userspace. IFF_SLAVE is expected for bonding slaves.
And failover slave is not a bonding slave.
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 13:17 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522131422.GG2149@nanopsycho>
On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
> Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
> >> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
> >> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
> >> >>Use the registration/notification framework supported by the generic
> >> >>failover infrastructure.
> >> >>
> >> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >
> >> >In previous patchset versions, the common code did
> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
> >> >
> >> >This should be part of the common "failover" code.
> >> >
> >>
> >> Also note that in the current patchset you use IFF_FAILOVER flag for
> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
> >> IFF_FAILOVER_SLAVE should be used.
> >
> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
>
> No. IFF_SLAVE is for bonding.
What breaks if we reuse it for failover?
--
MST
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 13:14 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522161007-mutt-send-email-mst@kernel.org>
Tue, May 22, 2018 at 03:12:40PM CEST, mst@redhat.com wrote:
>On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>> >>Use the registration/notification framework supported by the generic
>> >>failover infrastructure.
>> >>
>> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >
>> >In previous patchset versions, the common code did
>> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
>> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
>> >
>> >This should be part of the common "failover" code.
>> >
>>
>> Also note that in the current patchset you use IFF_FAILOVER flag for
>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>> IFF_FAILOVER_SLAVE should be used.
>
>Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
No. IFF_SLAVE is for bonding.
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 13:12 UTC (permalink / raw)
To: Jiri Pirko
Cc: alexander.h.duyck, virtio-dev, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, anjali.singhai, aaron.f.brown,
davem
In-Reply-To: <20180522090853.GF2149@nanopsycho>
On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
> >Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
> >>Use the registration/notification framework supported by the generic
> >>failover infrastructure.
> >>
> >>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >
> >In previous patchset versions, the common code did
> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
> >
> >This should be part of the common "failover" code.
> >
>
> Also note that in the current patchset you use IFF_FAILOVER flag for
> master, yet for the slave you use IFF_SLAVE. That is wrong.
> IFF_FAILOVER_SLAVE should be used.
Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and IFF_SLAVE?
--
MST
^ permalink raw reply
* Re: [RFC PATCH net-next 10/12] vhost_net: build xdp buff
From: Jason Wang @ 2018-05-22 12:41 UTC (permalink / raw)
To: Jesse Brandeburg; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20180521095611.00005caa@intel.com>
On 2018年05月22日 00:56, Jesse Brandeburg wrote:
> On Mon, 21 May 2018 17:04:31 +0800 Jason wrote:
>> This patch implement build XDP buffers in vhost_net. The idea is do
>> userspace copy in vhost_net and build XDP buff based on the
>> page. Vhost_net can then submit one or an array of XDP buffs to
>> underlayer socket (e.g TUN). TUN can choose to do XDP or call
>> build_skb() to build skb. To support build skb, vnet header were also
>> stored into the header of the XDP buff.
>>
>> This userspace copy and XDP buffs building is key to achieve XDP
>> batching in TUN, since TUN does not need to care about userspace copy
>> and then can disable premmption for several XDP buffs to achieve
>> batching from XDP.
>>
>> TODO: reserve headroom based on the TUN XDP.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/net.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 74 insertions(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index f0639d7..1209e84 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -492,6 +492,80 @@ static bool vhost_has_more_pkts(struct vhost_net *net,
>> likely(!vhost_exceeds_maxpend(net));
>> }
>>
>> +#define VHOST_NET_HEADROOM 256
>> +#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
>> +
>> +static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
>> + struct iov_iter *from,
>> + struct xdp_buff *xdp)
>> +{
>> + struct vhost_virtqueue *vq = &nvq->vq;
>> + struct page_frag *alloc_frag = ¤t->task_frag;
>> + struct virtio_net_hdr *gso;
>> + size_t len = iov_iter_count(from);
>> + int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> + int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + VHOST_NET_HEADROOM
>> + + nvq->sock_hlen);
>> + int sock_hlen = nvq->sock_hlen;
>> + void *buf;
>> + int copied;
>> +
>> + if (len < nvq->sock_hlen)
>> + return -EFAULT;
>> +
>> + if (SKB_DATA_ALIGN(len + pad) +
>> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
>> + return -ENOSPC;
>> +
>> + buflen += SKB_DATA_ALIGN(len + pad);
> maybe store the result of SKB_DATA_ALIGN in a local instead of doing
> the work twice?
Ok.
>
>> + alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
>> + if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
>> + return -ENOMEM;
>> +
>> + buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>> +
>> + /* We store two kinds of metadata in the header which will be
>> + * used for XDP_PASS to do build_skb():
>> + * offset 0: buflen
>> + * offset sizeof(int): vnet header
>> + */
>> + copied = copy_page_from_iter(alloc_frag->page,
>> + alloc_frag->offset + sizeof(int), sock_hlen, from);
>> + if (copied != sock_hlen)
>> + return -EFAULT;
>> +
>> + gso = (struct virtio_net_hdr *)(buf + sizeof(int));
>> +
>> + if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
>> + vhost16_to_cpu(vq, gso->csum_start) +
>> + vhost16_to_cpu(vq, gso->csum_offset) + 2 >
>> + vhost16_to_cpu(vq, gso->hdr_len)) {
>> + gso->hdr_len = cpu_to_vhost16(vq,
>> + vhost16_to_cpu(vq, gso->csum_start) +
>> + vhost16_to_cpu(vq, gso->csum_offset) + 2);
>> +
>> + if (vhost16_to_cpu(vq, gso->hdr_len) > len)
>> + return -EINVAL;
>> + }
>> +
>> + len -= sock_hlen;
>> + copied = copy_page_from_iter(alloc_frag->page,
>> + alloc_frag->offset + pad,
>> + len, from);
>> + if (copied != len)
>> + return -EFAULT;
>> +
>> + xdp->data_hard_start = buf;
>> + xdp->data = buf + pad;
>> + xdp->data_end = xdp->data + len;
>> + *(int *)(xdp->data_hard_start)= buflen;
> space before =
Yes.
Thanks
>
>> +
>> + get_page(alloc_frag->page);
>> + alloc_frag->offset += buflen;
>> +
>> + return 0;
>> +}
>> +
>> static void handle_tx_copy(struct vhost_net *net)
>> {
>> struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC PATCH net-next 04/12] vhost_net: split out datacopy logic
From: Jason Wang @ 2018-05-22 12:39 UTC (permalink / raw)
To: Jesse Brandeburg; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20180521094646.00002dee@intel.com>
On 2018年05月22日 00:46, Jesse Brandeburg wrote:
> On Mon, 21 May 2018 17:04:25 +0800 Jason wrote:
>> Instead of mixing zerocopy and datacopy logics, this patch tries to
>> split datacopy logic out. This results for a more compact code and
>> specific optimization could be done on top more easily.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/net.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 102 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 4ebac76..4682fcc 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -492,9 +492,95 @@ static bool vhost_has_more_pkts(struct vhost_net *net,
>> likely(!vhost_exceeds_maxpend(net));
>> }
>>
>> +static void handle_tx_copy(struct vhost_net *net)
>> +{
>> + struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>> + struct vhost_virtqueue *vq = &nvq->vq;
>> + unsigned out, in;
> move "out" and "in" down to inside the for loop as well.
>
>> + int head;
> move this "head" declaration inside for-loop below.
Will do these in next version, basically this function were just copied
from handle_tx_zerocopy and remove unnecessary parts. So I'm thinking an
independent patch from the beginning to avoid changes in two places.
>
>> + struct msghdr msg = {
>> + .msg_name = NULL,
>> + .msg_namelen = 0,
>> + .msg_control = NULL,
>> + .msg_controllen = 0,
>> + .msg_flags = MSG_DONTWAIT,
>> + };
>> + size_t len, total_len = 0;
>> + int err;
>> + size_t hdr_size;
>> + struct socket *sock;
>> + struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
>> + int sent_pkts = 0;
> why do we need so many locals?
So it looks to me ubufs is unnecessary but all other is really needed.
>
>> +
>> + mutex_lock(&vq->mutex);
>> + sock = vq->private_data;
>> + if (!sock)
>> + goto out;
>> +
>> + if (!vq_iotlb_prefetch(vq))
>> + goto out;
>> +
>> + vhost_disable_notify(&net->dev, vq);
>> + vhost_net_disable_vq(net, vq);
>> +
>> + hdr_size = nvq->vhost_hlen;
>> +
>> + for (;;) {
>> + head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
>> + ARRAY_SIZE(vq->iov),
>> + &out, &in);
>> + /* On error, stop handling until the next kick. */
>> + if (unlikely(head < 0))
>> + break;
>> + /* Nothing new? Wait for eventfd to tell us they refilled. */
>> + if (head == vq->num) {
>> + if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>> + vhost_disable_notify(&net->dev, vq);
>> + continue;
>> + }
>> + break;
>> + }
>> + if (in) {
>> + vq_err(vq, "Unexpected descriptor format for TX: "
>> + "out %d, int %d\n", out, in);
> don't break strings, keep all the bits between " " together, even if it
> is longer than 80 chars.
Ok.
>
>> + break;
>> + }
>> +
>> + len = init_iov_iter(vq, &msg.msg_iter, hdr_size, out);
>> + if (len < 0)
>> + break;
> same comment as previous patch, len is a size_t which is unsigned.
Yes.
>
>> +
>> + total_len += len;
>> + if (total_len < VHOST_NET_WEIGHT &&
>> + vhost_has_more_pkts(net, vq)) {
>> + msg.msg_flags |= MSG_MORE;
>> + } else {
>> + msg.msg_flags &= ~MSG_MORE;
>> + }
> don't need { } here.
Right.
>> +
>> + /* TODO: Check specific error and bomb out unless ENOBUFS? */
>> + err = sock->ops->sendmsg(sock, &msg, len);
>> + if (unlikely(err < 0)) {
>> + vhost_discard_vq_desc(vq, 1);
>> + vhost_net_enable_vq(net, vq);
>> + break;
>> + }
>> + if (err != len)
>> + pr_debug("Truncated TX packet: "
>> + " len %d != %zd\n", err, len);
> single line " " string please
>
Will fix.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC PATCH net-next 03/12] vhost_net: introduce vhost_has_more_pkts()
From: Jason Wang @ 2018-05-22 12:31 UTC (permalink / raw)
To: Jesse Brandeburg; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20180521093908.00006747@intel.com>
On 2018年05月22日 00:39, Jesse Brandeburg wrote:
> On Mon, 21 May 2018 17:04:24 +0800 Jason wrote:
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/net.c | 12 +++++++++---
>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index de544ee..4ebac76 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -485,6 +485,13 @@ static bool vhost_exceeds_weight(int pkts, int total_len)
>> unlikely(pkts >= VHOST_NET_PKT_WEIGHT);
>> }
>>
>> +static bool vhost_has_more_pkts(struct vhost_net *net,
>> + struct vhost_virtqueue *vq)
>> +{
>> + return !vhost_vq_avail_empty(&net->dev, vq) &&
>> + likely(!vhost_exceeds_maxpend(net));
> This really seems like mis-use of likely/unlikely, in the middle of a
> sequence of operations that will always be run when this function is
> called. I think you should remove the likely from this helper,
> especially, and control the branch from the branch point.
Yes, so I'm consider to make it a macro in next version.
>
>
>> +}
>> +
>> /* Expects to be always run from workqueue - which acts as
>> * read-size critical section for our kind of RCU. */
>> static void handle_tx(struct vhost_net *net)
>> @@ -578,8 +585,7 @@ static void handle_tx(struct vhost_net *net)
>> }
>> total_len += len;
>> if (total_len < VHOST_NET_WEIGHT &&
>> - !vhost_vq_avail_empty(&net->dev, vq) &&
>> - likely(!vhost_exceeds_maxpend(net))) {
>> + vhost_has_more_pkts(net, vq)) {
> Yes, I know it came from here, but likely/unlikely are for branch
> control, so they should encapsulate everything inside the if, unless
> I'm mistaken.
Ok.
>
>> msg.msg_flags |= MSG_MORE;
>> } else {
>> msg.msg_flags &= ~MSG_MORE;
>> @@ -605,7 +611,7 @@ static void handle_tx(struct vhost_net *net)
>> else
>> vhost_zerocopy_signal_used(net, vq);
>> vhost_net_tx_packet(net);
>> - if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) {
>> + if (vhost_exceeds_weight(++sent_pkts, total_len)) {
> You should have kept the unlikely here, and not had it inside the
> helper (as per the previous patch. Also, why wasn't this change part
> of the previous patch?
Yes, will squash the above into previous one.
Thanks
>
>> vhost_poll_queue(&vq->poll);
>> break;
>> }
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC PATCH net-next 02/12] vhost_net: introduce vhost_exceeds_weight()
From: Jason Wang @ 2018-05-22 12:27 UTC (permalink / raw)
To: Jesse Brandeburg; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20180521092923.00005cec@intel.com>
On 2018年05月22日 00:29, Jesse Brandeburg wrote:
> On Mon, 21 May 2018 17:04:23 +0800 Jason wrote:
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/net.c | 13 ++++++++-----
>> 1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 15d191a..de544ee 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -479,6 +479,12 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
>> return len;
>> }
>>
>> +static bool vhost_exceeds_weight(int pkts, int total_len)
>> +{
>> + return unlikely(total_len >= VHOST_NET_WEIGHT) ||
>> + unlikely(pkts >= VHOST_NET_PKT_WEIGHT);
> I was going to say just one unlikely, but then the caller of this
> function also says unlikely(vhost_exceeds...), so I think you should
> just drop the unlikely statements here (both of them)
Ok.
>
>> +}
>> +
>> /* Expects to be always run from workqueue - which acts as
>> * read-size critical section for our kind of RCU. */
>> static void handle_tx(struct vhost_net *net)
>> @@ -570,7 +576,6 @@ static void handle_tx(struct vhost_net *net)
>> msg.msg_control = NULL;
>> ubufs = NULL;
>> }
>> -
> unrelated whitespace changes?
Yes.
Thanks
>
>> total_len += len;
>> if (total_len < VHOST_NET_WEIGHT &&
>> !vhost_vq_avail_empty(&net->dev, vq) &&
>> @@ -600,8 +605,7 @@ static void handle_tx(struct vhost_net *net)
>> else
>> vhost_zerocopy_signal_used(net, vq);
>> vhost_net_tx_packet(net);
>> - if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
>> - unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT)) {
>> + if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) {
>> vhost_poll_queue(&vq->poll);
>> break;
>> }
>> @@ -887,8 +891,7 @@ static void handle_rx(struct vhost_net *net)
>> if (unlikely(vq_log))
>> vhost_log_write(vq, vq_log, log, vhost_len);
>> total_len += vhost_len;
>> - if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
>> - unlikely(++recv_pkts >= VHOST_NET_PKT_WEIGHT)) {
>> + if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
>> vhost_poll_queue(&vq->poll);
>> goto out;
>> }
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC PATCH net-next 01/12] vhost_net: introduce helper to initialize tx iov iter
From: Jason Wang @ 2018-05-22 12:26 UTC (permalink / raw)
To: Jesse Brandeburg; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <20180521092400.00004c68@intel.com>
On 2018年05月22日 00:24, Jesse Brandeburg wrote:
> Hi Jason, a few nits.
>
> On Mon, 21 May 2018 17:04:22 +0800 Jason wrote:
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/net.c | 34 +++++++++++++++++++++++-----------
>> 1 file changed, 23 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index c4b49fc..15d191a 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -459,6 +459,26 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
>> min_t(unsigned int, VHOST_MAX_PEND, vq->num >> 2);
>> }
>>
>> +static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
>> + size_t hdr_size, int out)
>> +{
>> + /* Skip header. TODO: support TSO. */
>> + size_t len = iov_length(vq->iov, out);
>> +
>> + iov_iter_init(iter, WRITE, vq->iov, out, len);
>> + iov_iter_advance(iter, hdr_size);
>> + /* Sanity check */
>> + if (!iov_iter_count(iter)) {
>> + vq_err(vq, "Unexpected header len for TX: "
>> + "%zd expected %zd\n",
>> + len, hdr_size);
> ok, it was like this before, but please unwrap the string in " ", there
> should be no line breaks in string declarations and they are allowed to
> go over 80 characters.
Ok.
>
>> + return -EFAULT;
>> + }
>> + len = iov_iter_count(iter);
>> +
>> + return len;
>> +}
>> +
>> /* Expects to be always run from workqueue - which acts as
>> * read-size critical section for our kind of RCU. */
>> static void handle_tx(struct vhost_net *net)
>> @@ -521,18 +541,10 @@ static void handle_tx(struct vhost_net *net)
>> "out %d, int %d\n", out, in);
>> break;
>> }
>> - /* Skip header. TODO: support TSO. */
>> - len = iov_length(vq->iov, out);
>> - iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
>> - iov_iter_advance(&msg.msg_iter, hdr_size);
>> - /* Sanity check */
>> - if (!msg_data_left(&msg)) {
>> - vq_err(vq, "Unexpected header len for TX: "
>> - "%zd expected %zd\n",
>> - len, hdr_size);
>> +
>> + len = init_iov_iter(vq, &msg.msg_iter, hdr_size, out);
>> + if (len < 0)
> len is declared as size_t, which is unsigned, and can never be
> negative. I'm pretty sure this is a bug.
Yes, let me fix it in next version.
Thanks
>
>
>> break;
>> - }
>> - len = msg_data_left(&msg);
>>
>> zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
>> && !vhost_exceeds_maxpend(net)
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH net] vhost: synchronize IOTLB message with dev cleanup
From: Jason Wang @ 2018-05-22 11:58 UTC (permalink / raw)
To: mst, jasowang; +Cc: netdev, linux-kernel, kvm, virtualization
DaeRyong Jeong reports a race between vhost_dev_cleanup() and
vhost_process_iotlb_msg():
Thread interleaving:
CPU0 (vhost_process_iotlb_msg) CPU1 (vhost_dev_cleanup)
(In the case of both VHOST_IOTLB_UPDATE and
VHOST_IOTLB_INVALIDATE)
===== =====
vhost_umem_clean(dev->iotlb);
if (!dev->iotlb) {
ret = -EFAULT;
break;
}
dev->iotlb = NULL;
The reason is we don't synchronize between them, fixing by protecting
vhost_process_iotlb_msg() with dev mutex.
Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/vhost.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f3bd8e9..f0be5f3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -981,6 +981,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
{
int ret = 0;
+ mutex_lock(&dev->mutex);
vhost_dev_lock_vqs(dev);
switch (msg->type) {
case VHOST_IOTLB_UPDATE:
@@ -1016,6 +1017,8 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
}
vhost_dev_unlock_vqs(dev);
+ mutex_unlock(&dev->mutex);
+
return ret;
}
ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
--
2.7.4
^ permalink raw reply related
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 9:08 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180522090637.GE2149@nanopsycho>
Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>>Use the registration/notification framework supported by the generic
>>failover infrastructure.
>>
>>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>
>In previous patchset versions, the common code did
>netdev_rx_handler_register() and netdev_upper_dev_link() etc
>(netvsc_vf_join()). Now, this is still done in netvsc. Why?
>
>This should be part of the common "failover" code.
>
Also note that in the current patchset you use IFF_FAILOVER flag for
master, yet for the slave you use IFF_SLAVE. That is wrong.
IFF_FAILOVER_SLAVE should be used.
^ permalink raw reply
* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-05-22 9:06 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <1526954781-35359-3-git-send-email-sridhar.samudrala@intel.com>
Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>Use the registration/notification framework supported by the generic
>failover infrastructure.
>
>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
In previous patchset versions, the common code did
netdev_rx_handler_register() and netdev_upper_dev_link() etc
(netvsc_vf_join()). Now, this is still done in netvsc. Why?
This should be part of the common "failover" code.
^ permalink raw reply
* Re: [PATCH net-next v11 3/5] net: Introduce net_failover driver
From: Jiri Pirko @ 2018-05-22 8:59 UTC (permalink / raw)
To: Sridhar Samudrala
Cc: alexander.h.duyck, virtio-dev, mst, kubakici, netdev,
virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <1526954781-35359-4-git-send-email-sridhar.samudrala@intel.com>
Tue, May 22, 2018 at 04:06:19AM CEST, sridhar.samudrala@intel.com wrote:
>The net_failover driver provides an automated failover mechanism via APIs
>to create and destroy a failover master netdev and mananges a primary and
>standby slave netdevs that get registered via the generic failover
>infrastructure.
>
>The failover netdev acts a master device and controls 2 slave devices. The
>original paravirtual interface gets registered as 'standby' slave netdev and
>a passthru/vf device with the same MAC gets registered as 'primary' slave
>netdev. Both 'standby' and 'failover' netdevs are associated with the same
>'pci' device. The user accesses the network interface via 'failover' netdev.
>The 'failover' netdev chooses 'primary' netdev as default for transmits when
>it is available with link up and running.
>
>This can be used by paravirtual drivers to enable an alternate low latency
>datapath. It also enables hypervisor controlled live migration of a VM with
>direct attached VF by failing over to the paravirtual datapath when the VF
>is unplugged.
>
>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
[...]
>+
>+The net_failover driver provides an automated failover mechanism via APIs
>+to create and destroy a failover master netdev and mananges a primary and
s/mananges/manages/
^ permalink raw reply
* Re: KASAN: use-after-free Read in vhost_chr_write_iter
From: Jason Wang @ 2018-05-22 8:42 UTC (permalink / raw)
To: DaeRyong Jeong
Cc: bammanag, kt0755, kvm, mst, netdev, linux-kernel, virtualization,
byoungyoung
In-Reply-To: <20180522083842.GA10604@dragonet.kaist.ac.kr>
On 2018年05月22日 16:38, DaeRyong Jeong wrote:
> On Mon, May 21, 2018 at 10:38:10AM +0800, Jason Wang wrote:
>> On 2018年05月18日 17:24, Jason Wang wrote:
>>> On 2018年05月17日 21:45, DaeRyong Jeong wrote:
>>>> We report the crash: KASAN: use-after-free Read in vhost_chr_write_iter
>>>>
>>>> This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
>>>> version of Syzkaller), which we describe more at the end of this
>>>> report. Our analysis shows that the race occurs when invoking two
>>>> syscalls concurrently, write$vnet and ioctl$VHOST_RESET_OWNER.
>>>>
>>>>
>>>> Analysis:
>>>> We think the concurrent execution of vhost_process_iotlb_msg() and
>>>> vhost_dev_cleanup() causes the crash.
>>>> Both of functions can run concurrently (please see call sequence below),
>>>> and possibly, there is a race on dev->iotlb.
>>>> If the switch occurs right after vhost_dev_cleanup() frees
>>>> dev->iotlb, vhost_process_iotlb_msg() still sees the non-null value
>>>> and it
>>>> keep executing without returning -EFAULT. Consequently, use-after-free
>>>> occures
>>>>
>>>>
>>>> Thread interleaving:
>>>> CPU0 (vhost_process_iotlb_msg) CPU1 (vhost_dev_cleanup)
>>>> (In the case of both VHOST_IOTLB_UPDATE and
>>>> VHOST_IOTLB_INVALIDATE)
>>>> ===== =====
>>>> vhost_umem_clean(dev->iotlb);
>>>> if (!dev->iotlb) {
>>>> ret = -EFAULT;
>>>> break;
>>>> }
>>>> dev->iotlb = NULL;
>>>>
>>>>
>>>> Call Sequence:
>>>> CPU0
>>>> =====
>>>> vhost_net_chr_write_iter
>>>> vhost_chr_write_iter
>>>> vhost_process_iotlb_msg
>>>>
>>>> CPU1
>>>> =====
>>>> vhost_net_ioctl
>>>> vhost_net_reset_owner
>>>> vhost_dev_reset_owner
>>>> vhost_dev_cleanup
>>> Thanks a lot for the analysis.
>>>
>>> This could be addressed by simply protect it with dev mutex.
>>>
>>> Will post a patch.
>>>
>> Could you please help to test the attached patch? I've done some smoking
>> test.
>>
>> Thanks
> Sorry to say this, but we don't have a reproducer for this bug since our
> reproducer is being implemented.
>
> This crash had occrued a few times in our fuzzer, so I inspected the code
> manually.
>
> It seems the patch is good for me, but we can't test the patch for now.
> Sorry.
>
No problem.
I'm trying to craft a reproducer, looks not hard.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [RFC v5 5/5] virtio_ring: enable packed ring
From: Tiwei Bie @ 2018-05-22 8:16 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180522081648.14768-1-tiwei.bie@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1ee52a89cb04..355dfef4f1eb 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1956,6 +1956,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_IOMMU_PLATFORM:
break;
+ case VIRTIO_F_RING_PACKED:
+ break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
--
2.17.0
^ permalink raw reply related
* [RFC v5 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-22 8:16 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180522081648.14768-1-tiwei.bie@intel.com>
This commit introduces the EVENT_IDX support in packed ring.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 67 ++++++++++++++++++++++++++++++++++--
1 file changed, 64 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eb9fd5207a68..1ee52a89cb04 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1043,7 +1043,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 flags;
+ u16 new, old, off_wrap, flags, wrap_counter, event_idx;
bool needs_kick;
u32 snapshot;
@@ -1052,9 +1052,19 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
* suppressions. */
virtio_mb(vq->weak_barriers);
+ old = vq->next_avail_idx - vq->num_added;
+ new = vq->next_avail_idx;
+ vq->num_added = 0;
+
snapshot = *(u32 *)vq->vring_packed.device;
+ off_wrap = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot & 0xffff));
flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
+ wrap_counter = off_wrap >> 15;
+ event_idx = off_wrap & ~(1<<15);
+ if (wrap_counter != vq->avail_wrap_counter)
+ event_idx -= vq->vring_packed.num;
+
#ifdef DEBUG
if (vq->last_add_time_valid) {
WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
@@ -1063,7 +1073,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
vq->last_add_time_valid = false;
#endif
- needs_kick = (flags != VRING_EVENT_F_DISABLE);
+ if (flags == VRING_EVENT_F_DESC)
+ needs_kick = vring_need_event(event_idx, new, old);
+ else
+ needs_kick = (flags != VRING_EVENT_F_DISABLE);
END_USE(vq);
return needs_kick;
}
@@ -1170,6 +1183,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
ret = vq->desc_state_packed[id].data;
detach_buf_packed(vq, id, ctx);
+ /* If we expect an interrupt for the next entry, tell host
+ * by writing event index and flush out the write before
+ * the read in the next get_buf call. */
+ if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
+ virtio_store_mb(vq->weak_barriers,
+ &vq->vring_packed.driver->off_wrap,
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
+ ((u16)vq->used_wrap_counter << 15)));
+
#ifdef DEBUG
vq->last_add_time_valid = false;
#endif
@@ -1197,10 +1219,26 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
+ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always update the event index to keep code simple. */
+
+ vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+ vq->last_used_idx |
+ ((u16)vq->used_wrap_counter << 15));
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
virtio_wmb(vq->weak_barriers);
+ // XXX XXX XXX
+ // Setting VRING_EVENT_F_DESC in this function
+ // will break netperf test for now.
+ // Will need to look into this.
+#if 0
+ vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+ VRING_EVENT_F_ENABLE;
+#else
vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+#endif
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
}
@@ -1227,15 +1265,38 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 bufs, used_idx, wrap_counter;
START_USE(vq);
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
+ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always update the event index to keep code simple. */
+
+ /* TODO: tune this threshold */
+ if (vq->next_avail_idx < vq->last_used_idx)
+ bufs = (vq->vring_packed.num + vq->next_avail_idx -
+ vq->last_used_idx) * 3 / 4;
+ else
+ bufs = (vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
+
+ wrap_counter = vq->used_wrap_counter;
+
+ used_idx = vq->last_used_idx + bufs;
+ if (used_idx >= vq->vring_packed.num) {
+ used_idx -= vq->vring_packed.num;
+ wrap_counter ^= 1;
+ }
+
+ vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+ used_idx | (wrap_counter << 15));
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
virtio_wmb(vq->weak_barriers);
- vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+ VRING_EVENT_F_ENABLE;
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
}
--
2.17.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox