Linux virtualization list
 help / color / mirror / Atom feed
* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-25 23:29 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <00d34f67-f26f-0b20-af3f-2add24ae8a5c@intel.com>

On Fri, 25 May 2018 16:06:58 -0700
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:

> On 5/25/2018 3:38 PM, Stephen Hemminger wrote:
> > On Thu, 24 May 2018 09:55:13 -0700
> > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >  
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index 03ed492c4e14..0f4ba52b641d 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -1421,6 +1421,8 @@ struct net_device_ops {
> >>    *	entity (i.e. the master device for bridged veth)
> >>    * @IFF_MACSEC: device is a MACsec device
> >>    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> >> + * @IFF_FAILOVER: device is a failover master device
> >> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
> >>    */
> >>   enum netdev_priv_flags {
> >>   	IFF_802_1Q_VLAN			= 1<<0,
> >> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
> >>   	IFF_PHONY_HEADROOM		= 1<<24,
> >>   	IFF_MACSEC			= 1<<25,
> >>   	IFF_NO_RX_HANDLER		= 1<<26,
> >> +	IFF_FAILOVER			= 1<<27,
> >> +	IFF_FAILOVER_SLAVE		= 1<<28,
> >>   };  
> > Why is FAILOVER any different than other master/slave relationships.
> > I don't think you need to take up precious netdev flag bits for this.  
> 
> These are netdev priv flags.
> Jiri says that IFF_MASTER/IFF_SLAVE are bonding specific flags and cannot be used
> with other failover mechanisms. Team also doesn't use this flags and it has its own
> priv_flags.
> 

They are already used by bonding and team.
I don't see why this can't reuse them.

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-05-25 23:28 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <c0be8e44-34f8-0e46-3a51-7ce869a5d351@intel.com>

On Fri, 25 May 2018 16:11:47 -0700
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:

> On 5/25/2018 3:34 PM, Stephen Hemminger wrote:
> > On Thu, 24 May 2018 09:55:14 -0700
> > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >  
> >> --- a/drivers/net/hyperv/Kconfig
> >> +++ b/drivers/net/hyperv/Kconfig
> >> @@ -2,5 +2,6 @@ config HYPERV_NET
> >>   	tristate "Microsoft Hyper-V virtual network driver"
> >>   	depends on HYPERV
> >>   	select UCS2_STRING
> >> +	select FAILOVER  
> > When I take a working kernel config, add the patches then do
> > make oldconfig
> >
> > It is not autoselecting FAILOVER, it prompts me for it. This means
> > if user says no then a non-working netvsc device is made.  
> 
> I see
>     Generic failover module (FAILOVER) [M/y/?] (NEW)
> 
> So the user is given an option to either build as a Module or part of the
> kernel. 'n' is not an option.

With most libraries there is no prompt at all.

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-25 23:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180525153456.28b44c7c@xeon-e3>


On 5/25/2018 3:34 PM, Stephen Hemminger wrote:
> On Thu, 24 May 2018 09:55:14 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> --- a/drivers/net/hyperv/Kconfig
>> +++ b/drivers/net/hyperv/Kconfig
>> @@ -2,5 +2,6 @@ config HYPERV_NET
>>   	tristate "Microsoft Hyper-V virtual network driver"
>>   	depends on HYPERV
>>   	select UCS2_STRING
>> +	select FAILOVER
> When I take a working kernel config, add the patches then do
> make oldconfig
>
> It is not autoselecting FAILOVER, it prompts me for it. This means
> if user says no then a non-working netvsc device is made.

I see
    Generic failover module (FAILOVER) [M/y/?] (NEW)

So the user is given an option to either build as a Module or part of the
kernel. 'n' is not an option.

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-25 23:06 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180525153843.503ee052@xeon-e3>


On 5/25/2018 3:38 PM, Stephen Hemminger wrote:
> On Thu, 24 May 2018 09:55:13 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 03ed492c4e14..0f4ba52b641d 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1421,6 +1421,8 @@ struct net_device_ops {
>>    *	entity (i.e. the master device for bridged veth)
>>    * @IFF_MACSEC: device is a MACsec device
>>    * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
>> + * @IFF_FAILOVER: device is a failover master device
>> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>>    */
>>   enum netdev_priv_flags {
>>   	IFF_802_1Q_VLAN			= 1<<0,
>> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
>>   	IFF_PHONY_HEADROOM		= 1<<24,
>>   	IFF_MACSEC			= 1<<25,
>>   	IFF_NO_RX_HANDLER		= 1<<26,
>> +	IFF_FAILOVER			= 1<<27,
>> +	IFF_FAILOVER_SLAVE		= 1<<28,
>>   };
> Why is FAILOVER any different than other master/slave relationships.
> I don't think you need to take up precious netdev flag bits for this.

These are netdev priv flags.
Jiri says that IFF_MASTER/IFF_SLAVE are bonding specific flags and cannot be used
with other failover mechanisms. Team also doesn't use this flags and it has its own
priv_flags.

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-25 23:04 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <20180525153744.2b53c449@xeon-e3>


[-- Attachment #1.1: Type: text/plain, Size: 967 bytes --]



On 5/25/2018 3:37 PM, Stephen Hemminger wrote:
> On Thu, 24 May 2018 09:55:13 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>
>> +	spin_lock(&failover_lock);
> Since register is not in fast path, this should be a mutex?

This is Jiri's comment which made me to switch to spinlock from mutex

   >> Why mutex? Apparently you don't need to sleep while holding a lock.
   >> Simple spinlock would do.

>
>
>> +int failover_slave_unregister(struct net_device *slave_dev)
>> +{
>> +	struct net_device *failover_dev;
>> +	struct failover_ops *fops;
>> +
>> +	if (!netif_is_failover_slave(slave_dev))
>> +		goto done;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	failover_dev = failover_get_bymac(slave_dev->perm_addr, &fops);
>> +	if (!failover_dev)
>> +		goto done;
> Since the slave device must have a master device set already, why not use
> that instead of searching by MAC address on unregister or link change.
>
We also need to get the fops(failover_ops)


[-- Attachment #1.2: Type: text/html, Size: 36732 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-25 22:38 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <1527180917-39737-2-git-send-email-sridhar.samudrala@intel.com>

On Thu, 24 May 2018 09:55:13 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 03ed492c4e14..0f4ba52b641d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1421,6 +1421,8 @@ struct net_device_ops {
>   *	entity (i.e. the master device for bridged veth)
>   * @IFF_MACSEC: device is a MACsec device
>   * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
> + * @IFF_FAILOVER: device is a failover master device
> + * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>   */
>  enum netdev_priv_flags {
>  	IFF_802_1Q_VLAN			= 1<<0,
> @@ -1450,6 +1452,8 @@ enum netdev_priv_flags {
>  	IFF_PHONY_HEADROOM		= 1<<24,
>  	IFF_MACSEC			= 1<<25,
>  	IFF_NO_RX_HANDLER		= 1<<26,
> +	IFF_FAILOVER			= 1<<27,
> +	IFF_FAILOVER_SLAVE		= 1<<28,
>  };

Why is FAILOVER any different than other master/slave relationships.
I don't think you need to take up precious netdev flag bits for this.

^ permalink raw reply

* Re: [PATCH net-next v12 1/5] net: Introduce generic failover module
From: Stephen Hemminger @ 2018-05-25 22:37 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <1527180917-39737-2-git-send-email-sridhar.samudrala@intel.com>

On Thu, 24 May 2018 09:55:13 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:


> +	spin_lock(&failover_lock);

Since register is not in fast path, this should be a mutex?


> +int failover_slave_unregister(struct net_device *slave_dev)
> +{
> +	struct net_device *failover_dev;
> +	struct failover_ops *fops;
> +
> +	if (!netif_is_failover_slave(slave_dev))
> +		goto done;
> +
> +	ASSERT_RTNL();
> +
> +	failover_dev = failover_get_bymac(slave_dev->perm_addr, &fops);
> +	if (!failover_dev)
> +		goto done;

Since the slave device must have a master device set already, why not use
that instead of searching by MAC address on unregister or link change.

^ permalink raw reply

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-05-25 22:34 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <1527180917-39737-3-git-send-email-sridhar.samudrala@intel.com>

On Thu, 24 May 2018 09:55:14 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> --- a/drivers/net/hyperv/Kconfig
> +++ b/drivers/net/hyperv/Kconfig
> @@ -2,5 +2,6 @@ config HYPERV_NET
>  	tristate "Microsoft Hyper-V virtual network driver"
>  	depends on HYPERV
>  	select UCS2_STRING
> +	select FAILOVER

When I take a working kernel config, add the patches then do
make oldconfig

It is not autoselecting FAILOVER, it prompts me for it. This means
if user says no then a non-working netvsc device is made.

^ permalink raw reply

* [PATCH v2 11/13] drm/virtio: Stop updating plane->crtc
From: Ville Syrjala @ 2018-05-25 18:50 UTC (permalink / raw)
  To: dri-devel; +Cc: David Airlie, intel-gfx, virtualization
In-Reply-To: <20180525185045.29689-1-ville.syrjala@linux.intel.com>

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

We want to get rid of plane->crtc on atomic drivers. Stop setting it.

v2: s/fb/crtc/ in the commit message (Gerd)

Cc: David Airlie <airlied@linux.ie>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/virtio/virtgpu_display.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index d6dd769a7ad3..ff9933e79416 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -282,8 +282,6 @@ static int vgdev_output_init(struct virtio_gpu_device *vgdev, int index)
 	drm_crtc_init_with_planes(dev, crtc, primary, cursor,
 				  &virtio_gpu_crtc_funcs, NULL);
 	drm_crtc_helper_add(crtc, &virtio_gpu_crtc_helper_funcs);
-	primary->crtc = crtc;
-	cursor->crtc = crtc;
 
 	drm_connector_init(dev, connector, &virtio_gpu_connector_funcs,
 			   DRM_MODE_CONNECTOR_VIRTUAL);
-- 
2.16.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related

* [PATCH v2 00/13] drm: Eliminate plane->fb/crtc usage for atomic drivers
From: Ville Syrjala @ 2018-05-25 18:50 UTC (permalink / raw)
  To: dri-devel
  Cc: David Airlie, Daniel Vetter, virtualization, Eric Anholt,
	David (ChunMing) Zhou, Thomas Hellstrom, Joonyoung Shim,
	Sinclair Yeh, Kyungmin Park, amd-gfx, VMware Graphics,
	Harry Wentland, linux-arm-msm, intel-gfx, Inki Dae, Deepak Rawat,
	Seung-Woo Kim, Rob Clark, Alex Deucher, freedreno,
	Christian König

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Here are again the last (?) bits of eliminating the plane->fb/crtc
usage for atomic drivers. I've pushed everything else (thanks to
everyone who reviewed them). 

Deepak said he'd tested the vmwgfx stuff, so I think it should be
safe to land. Just missing a bit of review...

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Deepak Rawat <drawat@vmware.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: freedreno@lists.freedesktop.org
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: Rob Clark <robdclark@gmail.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: virtualization@lists.linux-foundation.org
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>

Ville Syrjälä (13):
  drm/vmwgfx: Stop using plane->fb in vmw_kms_atomic_check_modeset()
  drm/vmwgfx: Stop using plane->fb in vmw_kms_helper_dirty()
  drm/vmwgfx: Stop using plane->fb in vmw_kms_update_implicit_fb()
  drm/vmwgfx: Stop updating plane->fb
  drm/vmwgfx: Stop using plane->fb in atomic_enable()
  drm/vmwgfx: Stop messing about with plane->fb/old_fb/crtc
  drm/amdgpu/dc: Stop updating plane->fb
  drm/i915: Stop updating plane->fb/crtc
  drm/exynos: Stop updating plane->crtc
  drm/msm: Stop updating plane->fb/crtc
  drm/virtio: Stop updating plane->crtc
  drm/vc4: Stop updating plane->fb/crtc
  drm: Stop updating plane->crtc/fb/old_fb on atomic drivers

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 -
 drivers/gpu/drm/drm_atomic.c                      | 55 +++--------------------
 drivers/gpu/drm/drm_atomic_helper.c               | 15 +------
 drivers/gpu/drm/drm_crtc.c                        |  8 +++-
 drivers/gpu/drm/drm_fb_helper.c                   |  7 ---
 drivers/gpu/drm/drm_framebuffer.c                 |  5 ---
 drivers/gpu/drm/drm_plane.c                       | 14 +++---
 drivers/gpu/drm/drm_plane_helper.c                |  4 +-
 drivers/gpu/drm/exynos/exynos_drm_plane.c         |  2 -
 drivers/gpu/drm/i915/intel_atomic_plane.c         | 12 -----
 drivers/gpu/drm/i915/intel_display.c              |  7 ++-
 drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c         |  1 -
 drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c        |  2 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c         |  1 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c        |  2 -
 drivers/gpu/drm/vc4/vc4_crtc.c                    |  3 --
 drivers/gpu/drm/virtio/virtgpu_display.c          |  2 -
 drivers/gpu/drm/vmwgfx/vmwgfx_fb.c                | 24 ----------
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c               | 24 +++++++---
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c              |  2 -
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c              |  5 +--
 include/drm/drm_atomic.h                          |  3 --
 22 files changed, 46 insertions(+), 154 deletions(-)

-- 
2.16.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC PATCH net-next 00/12] XDP batching for TUN/vhost_net
From: Michael S. Tsirkin @ 2018-05-25 17:53 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com>

On Mon, May 21, 2018 at 05:04:21PM +0800, Jason Wang wrote:
> Hi all:
> 
> We do not support XDP batching for TUN since it can only receive one
> packet a time from vhost_net. This series tries to remove this
> limitation by:
> 
> - introduce a TUN specific msg_control that can hold a pointer to an
>   array of XDP buffs
> - try copy and build XDP buff in vhost_net
> - store XDP buffs in an array and submit them once for every N packets
>   from vhost_net
> - since TUN can only do native XDP for datacopy packet, to simplify
>   the logic, split datacopy out logic and only do batching for
>   datacopy.

I like how this rework looks. Pls go ahead and repost as
non-RFC.

> With this series, TX PPS can improve about 34% from 2.9Mpps to
> 3.9Mpps when doing xdp_redirect_map between TAP and ixgbe.
> 
> Thanks
> 
> Jason Wang (12):
>   vhost_net: introduce helper to initialize tx iov iter
>   vhost_net: introduce vhost_exceeds_weight()
>   vhost_net: introduce vhost_has_more_pkts()
>   vhost_net: split out datacopy logic
>   vhost_net: batch update used ring for datacopy TX
>   tuntap: enable premmption early
>   tuntap: simplify error handling in tun_build_skb()
>   tuntap: tweak on the path of non-xdp case in tun_build_skb()
>   tuntap: split out XDP logic
>   vhost_net: build xdp buff
>   vhost_net: passing raw xdp buff to tun
>   vhost_net: batch submitting XDP buffers to underlayer sockets
> 
>  drivers/net/tun.c      | 226 +++++++++++++++++++++++++++----------
>  drivers/vhost/net.c    | 297 ++++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/if_tun.h |   7 ++
>  3 files changed, 444 insertions(+), 86 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply

* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-05-25 17:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: robh, mpe, linux-kernel, virtualization, hch, joe, david,
	linuxppc-dev, elfring, Anshuman Khandual
In-Reply-To: <f77638ddef3af52dd71341083707c9e3745dd505.camel@kernel.crashing.org>

On Thu, May 24, 2018 at 08:27:04AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-05-23 at 21:50 +0300, Michael S. Tsirkin wrote:
> 
> > I re-read that discussion and I'm still unclear on the
> > original question, since I got several apparently
> > conflicting answers.
> > 
> > I asked:
> > 
> > 	Why isn't setting VIRTIO_F_IOMMU_PLATFORM on the
> > 	hypervisor side sufficient?
> 
> I thought I had replied to this...
> 
> There are a couple of reasons:
> 
> - First qemu doesn't know that the guest will switch to "secure mode"
> in advance. There is no difference between a normal and a secure
> partition until the partition does the magic UV call to "enter secure
> mode" and qemu doesn't see any of it. So who can set the flag here ?

Not sure I understand. Just set the flag e.g. on qemu command line.
I might be wrong, but these secure mode things usually
a. require hypervisor side tricks anyway

> - Second, when using VIRTIO_F_IOMMU_PLATFORM, we also make qemu (or
> vhost) go through the emulated MMIO for every access to the guest,
> which adds additional overhead.
> 
> Cheers,
> Ben.

Well it's not supposed to be much slower for the static case.

vhost has a cache so should be fine.

A while ago Paolo implemented a translation cache which should be
perfect for this case - most of the code got merged but
never enabled because of stability issues.

If all else fails, we could teach QEMU to handle the no-iommu case
as if VIRTIO_F_IOMMU_PLATFORM was off.



> > 
> > 
> > >  arch/powerpc/include/asm/dma-mapping.h |  6 ++++++
> > >  arch/powerpc/platforms/pseries/iommu.c | 11 +++++++++++
> > >  drivers/virtio/virtio_ring.c           | 10 ++++++++++
> > >  3 files changed, 27 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> > > index 8fa3945..056e578 100644
> > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device *dev);
> > >  #define ARCH_HAS_DMA_MMAP_COHERENT
> > >  
> > >  #endif /* __KERNEL__ */
> > > +
> > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > +
> > > +struct virtio_device;
> > > +
> > > +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> > >  #endif	/* _ASM_DMA_MAPPING_H */
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > index 06f0296..a2ec15a 100644
> > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > @@ -38,6 +38,7 @@
> > >  #include <linux/of.h>
> > >  #include <linux/iommu.h>
> > >  #include <linux/rculist.h>
> > > +#include <linux/virtio.h>
> > >  #include <asm/io.h>
> > >  #include <asm/prom.h>
> > >  #include <asm/rtas.h>
> > > @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> > >  __setup("multitce=", disable_multitce);
> > >  
> > >  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > > +
> > > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > +{
> > > +	/*
> > > +	 * On protected guest platforms, force virtio core to use DMA
> > > +	 * MAP API for all virtio devices. But there can also be some
> > > +	 * exceptions for individual devices like virtio balloon.
> > > +	 */
> > > +	return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> > > +}
> > 
> > Isn't this kind of slow?  vring_use_dma_api is on
> > data path and supposed to be very fast.
> > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 21d464a..47ea6c3 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -141,8 +141,18 @@ struct vring_virtqueue {
> > >   * unconditionally on data path.
> > >   */
> > >  
> > > +#ifndef platform_forces_virtio_dma
> > > +static inline bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > +{
> > > +	return false;
> > > +}
> > > +#endif
> > > +
> > >  static bool vring_use_dma_api(struct virtio_device *vdev)
> > >  {
> > > +	if (platform_forces_virtio_dma(vdev))
> > > +		return true;
> > > +
> > >  	if (!virtio_has_iommu_quirk(vdev))
> > >  		return true;
> > >  
> > > -- 
> > > 2.9.3

^ permalink raw reply

* Re: [PATCH net-next v12 0/5] Enable virtio_net to act as a standby for a passthru device
From: Michael S. Tsirkin @ 2018-05-25 17:19 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: alexander.h.duyck, virtio-dev, jiri, kubakici, netdev,
	virtualization, loseweigh, anjali.singhai, aaron.f.brown, davem
In-Reply-To: <1527180917-39737-1-git-send-email-sridhar.samudrala@intel.com>

On Thu, May 24, 2018 at 09:55:12AM -0700, Sridhar Samudrala wrote:
> The main motivation for this patch is to enable cloud service providers
> to provide an accelerated datapath to virtio-net enabled VMs in a 
> transparent manner with no/minimal guest userspace changes. This also
> enables hypervisor controlled live migration to be supported with VMs that
> have direct attached SR-IOV VF devices.
> 
> Patch 1 introduces a failover module that provides a generic interface for 
> paravirtual drivers to listen for netdev register/unregister/link change
> events from pci ethernet devices with the same MAC and takeover their
> datapath. The notifier and event handling code is based on the existing
> netvsc implementation. 
> 
> Patch 2 refactors netvsc to use the registration/notification framework
> introduced by failover module.
> 
> Patch 3 introduces a net_failover driver that provides an automated
> failover mechanism to paravirtual drivers via APIs to create and destroy
> a failover master netdev and mananges a primary and standby slave netdevs
> that get registered via the generic failover infrastructure.
> 
> Patch 4 introduces a new feature bit VIRTIO_NET_F_STANDBY to virtio-net
> that can be used by hypervisor to indicate that virtio_net interface
> should act as a standby for another device with the same MAC address.
> 
> Patch 5 extends virtio_net to use alternate datapath when available and
> registered. When STANDBY feature is enabled, virtio_net driver uese the
> net_failover API to create an additional 'failover' netdev that acts as
> a master device and controls 2 slave devices.  The original virtio_net
> netdev is registered as 'standby' netdev and a passthru/vf device with
> the same MAC gets registered as 'primary' netdev. Both 'standby' and
> 'failover' netdevs are associated with the same 'pci' device.  The user
> accesses the network interface via 'failover' netdev. The 'failover'
> netdev chooses 'primary' netdev as default for transmits when it is
> available with link up and running.
> 
> As this patch series is initially focusing on usecases where hypervisor 
> fully controls the VM networking and the guest is not expected to directly 
> configure any hardware settings, it doesn't expose all the ndo/ethtool ops
> that are supported by virtio_net at this time. To support additional usecases,
> it should be possible to enable additional ops later by caching the state
> in failover netdev and replaying when the 'primary' netdev gets registered. 
>  
> At the time of live migration, the hypervisor needs to unplug the VF device
> from the guest on the source host and reset the MAC filter of the VF to
> initiate failover of datapath to virtio before starting the migration. After
> the migration is completed, the destination hypervisor sets the MAC filter
> on the VF and plugs it back to the guest to switch over to VF datapath.
> 
> This patch is based on the discussion initiated by Jesse on this thread.
> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2

Series:

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> v12:
> - Tested live migration with virtio-net/AVF(i40evf) configured in failover
>   mode while running iperf in background. Tried static ip and dhcp
>   configurations using 'network' scripts and Network Manager.
> - Build tested netvsc module.
> Updates:
> - Extended generic failover module to do common functions like setting
>   FAILOVER_SLAVE flag, registering rx-handler and linking to upper dev in
>   the generic register/unregister handlers.
>   This required adding 3 additional failover ops pre_register, pre_unregister
>   and handle_frame.  netvsc and net_failover drivers are updated to support
>   these ops.
> 
> v11:
> - Split net_failover module into 2 components.
>   1. 'failover' module that provides generic failover infrastructure
>   to register a failover instance and listen for slave events.
>   2. 'net_failover' driver that provides APIs to create/destroy upper
>   netdev and supports 3-netdev model used by virtio-net.
> - Added documentation
> 
> v10:
> - fix net_failover_open() to update failover CARRIER correctly based on
>   standby and primary states. 
> - fix net_failover_handle_frame() to handle frames received on standby
>   when primary is present.
> - replace netdev_upper_dev_link with netdev_master_upper_dev_link and
>   handle lower dev state changes.
> - fix net_failver_create() and net_failover_register() interfaces to
>   use ERR_PTR and avoid arg **
> - disable setting mac address when virtio-net in STANDBY mode
> - document exported symbols
> - added entry to MAINTAINERS file
> 
> v9:
> Select NET_FAILOVER automatically when VIRTIO_NET/HYPERV_NET 
> are enabled. (stephen)
> 
> v8:
> - Made the failover managment routines more robust by updating the feature 
>   bits/other fields in the failover netdev when slave netdevs are 
>   registered/unregistered. (mst)
> - added support for handling vlans.
> - Limited the changes in netvsc to only use the notifier/event/lookups
>   from the failover module. The slave register/unregister/link-change 
>   handlers are only updated to use the getbymac routine to get the 
>   upper netdev. There is no change in their functionality. (stephen)
> - renamed structs/function/file names to use net_failover prefix. (mst)
> 
> v7
> - Rename 'bypass/active/backup' terminology with 'failover/primary/standy'
>   (jiri, mst)
> - re-arranged dev_open() and dev_set_mtu() calls in the register routines
>   so that they don't get called for 2-netdev model. (stephen)
> - fixed select_queue() routine to do queue selection based on VF if it is
>   registered as primary. (stephen)
> -  minor bugfixes
> 
> v6 RFC:
>   Simplified virtio_net changes by moving all the ndo_ops of the 
>   bypass_netdev and create/destroy of bypass_netdev to 'bypass' module.
>   avoided 2 phase registration(driver + instances).
>   introduced IFF_BYPASS/IFF_BYPASS_SLAVE dev->priv_flags 
>   replaced mutex with a spinlock
> 
> v5 RFC:
>   Based on Jiri's comments, moved the common functionality to a 'bypass'
>   module so that the same notifier and event handlers to handle child
>   register/unregister/link change events can be shared between virtio_net
>   and netvsc.
>   Improved error handling based on Siwei's comments.
> v4:
> - Based on the review comments on the v3 version of the RFC patch and
>   Jakub's suggestion for the naming issue with 3 netdev solution,
>   proposed 3 netdev in-driver bonding solution for virtio-net.
> v3 RFC:
> - Introduced 3 netdev model and pointed out a couple of issues with
>   that model and proposed 2 netdev model to avoid these issues.
> - Removed broadcast/multicast optimization and only use virtio as
>   backup path when VF is unplugged.
> v2 RFC:
> - Changed VIRTIO_NET_F_MASTER to VIRTIO_NET_F_BACKUP (mst)
> - made a small change to the virtio-net xmit path to only use VF datapath
>   for unicasts. Broadcasts/multicasts use virtio datapath. This avoids
>   east-west broadcasts to go over the PCI link.
> - added suppport for the feature bit in qemu
> 
> Sridhar Samudrala (5):
>   net: Introduce generic failover module
>   netvsc: refactor notifier/event handling code to use the failover
>     framework
>   net: Introduce net_failover driver
>   virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
>   virtio_net: Extend virtio to use VF datapath when available
> 
>  Documentation/networking/failover.rst     |  18 +
>  Documentation/networking/net_failover.rst | 116 +++++
>  MAINTAINERS                               |  16 +
>  drivers/net/Kconfig                       |  13 +
>  drivers/net/Makefile                      |   1 +
>  drivers/net/hyperv/Kconfig                |   1 +
>  drivers/net/hyperv/hyperv_net.h           |   2 +
>  drivers/net/hyperv/netvsc_drv.c           | 222 ++------
>  drivers/net/net_failover.c                | 836 ++++++++++++++++++++++++++++++
>  drivers/net/virtio_net.c                  |  40 +-
>  include/linux/netdevice.h                 |  16 +
>  include/net/failover.h                    |  36 ++
>  include/net/net_failover.h                |  40 ++
>  include/uapi/linux/virtio_net.h           |   3 +
>  net/Kconfig                               |  13 +
>  net/core/Makefile                         |   1 +
>  net/core/failover.c                       | 315 +++++++++++
>  17 files changed, 1522 insertions(+), 167 deletions(-)
>  create mode 100644 Documentation/networking/failover.rst
>  create mode 100644 Documentation/networking/net_failover.rst
>  create mode 100644 drivers/net/net_failover.c
>  create mode 100644 include/net/failover.h
>  create mode 100644 include/net/net_failover.h
>  create mode 100644 net/core/failover.c
> 
> -- 
> 2.14.3

^ permalink raw reply

* Re: [PATCH v3 09/27] x86/acpi: Adapt assembly for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-25 17:00 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Kate Stewart, Nicolas Pitre, the arch/x86 maintainers,
	Sergey Senozhatsky, Petr Mladek, Len Brown, Peter Zijlstra,
	Yonghong Song, Christopher Li, Dave Hansen, Dominik Brodowski,
	LKML, Masahiro Yamada, Jan Beulich, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Alok Kataria,
	Linux Doc Mailing List, linux-arch, Jonathan Corbet, Herbert Xu
In-Reply-To: <20180525091447.GC9666@amd>

On Fri, May 25, 2018 at 2:14 AM Pavel Machek <pavel@ucw.cz> wrote:

> On Thu 2018-05-24 09:35:42, Thomas Garnier wrote:
> > On Thu, May 24, 2018 at 4:03 AM Pavel Machek <pavel@ucw.cz> wrote:
> >
> > > On Wed 2018-05-23 12:54:03, Thomas Garnier wrote:
> > > > Change the assembly code to use only relative references of symbols
for
> > the
> > > > kernel to be PIE compatible.
> > > >
> > > > Position Independent Executable (PIE) support will allow to
extended the
> > > > KASLR randomization range below the -2G memory limit.
> >
> > > What testing did this get?
> >
> > Tested boot, hibernation and performance on qemu and dedicated machine.

> Well, this is suspend, not hibernation code.

> So "sudo pm-suspend" or "echo mem > /sys/power/state" would be good
> way to test this.

Thanks, it worked. I added this to the testsuite I use for KASLR.


> Thanks,
>                                                          Pavel

> > > > diff --git a/arch/x86/kernel/acpi/wakeup_64.S
> > b/arch/x86/kernel/acpi/wakeup_64.S
> > > > index 50b8ed0317a3..472659c0f811 100644
> > > > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > > > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > > > @@ -14,7 +14,7 @@
> > > >        * Hooray, we are in Long 64-bit mode (but still running in
low
> > memory)
> > > >        */
> > > >  ENTRY(wakeup_long64)
> > > > -     movq    saved_magic, %rax
> > > > +     movq    saved_magic(%rip), %rax
> > > >       movq    $0x123456789abcdef0, %rdx
> > > >       cmpq    %rdx, %rax
> > > >       jne     bogus_64_magic
> >
> > > Because, as comment says, this is rather tricky code.
> >
> > I agree, I think maintainers feedback is very important for this
patchset.


> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



-- 
Thomas

^ permalink raw reply

* Re: [PATCH v3 09/27] x86/acpi: Adapt assembly for PIE support
From: Pavel Machek @ 2018-05-25  9:14 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Kate Stewart, Nicolas Pitre, the arch/x86 maintainers,
	Sergey Senozhatsky, Petr Mladek, Len Brown, Peter Zijlstra,
	Yonghong Song, Christopher Li, Dave Hansen, Dominik Brodowski,
	LKML, Masahiro Yamada, Jan Beulich, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Alok Kataria,
	Linux Doc Mailing List, linux-arch, Jonathan Corbet, Herbert Xu
In-Reply-To: <CAJcbSZFJ84+VC5xDQZGHctupdqwmMBgqzLzFRqCTBpi5t-2Gvw@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1606 bytes --]

On Thu 2018-05-24 09:35:42, Thomas Garnier wrote:
> On Thu, May 24, 2018 at 4:03 AM Pavel Machek <pavel@ucw.cz> wrote:
> 
> > On Wed 2018-05-23 12:54:03, Thomas Garnier wrote:
> > > Change the assembly code to use only relative references of symbols for
> the
> > > kernel to be PIE compatible.
> > >
> > > Position Independent Executable (PIE) support will allow to extended the
> > > KASLR randomization range below the -2G memory limit.
> 
> > What testing did this get?
> 
> Tested boot, hibernation and performance on qemu and dedicated machine.

Well, this is suspend, not hibernation code.

So "sudo pm-suspend" or "echo mem > /sys/power/state" would be good
way to test this.

Thanks,
							Pavel

> > > diff --git a/arch/x86/kernel/acpi/wakeup_64.S
> b/arch/x86/kernel/acpi/wakeup_64.S
> > > index 50b8ed0317a3..472659c0f811 100644
> > > --- a/arch/x86/kernel/acpi/wakeup_64.S
> > > +++ b/arch/x86/kernel/acpi/wakeup_64.S
> > > @@ -14,7 +14,7 @@
> > >        * Hooray, we are in Long 64-bit mode (but still running in low
> memory)
> > >        */
> > >  ENTRY(wakeup_long64)
> > > -     movq    saved_magic, %rax
> > > +     movq    saved_magic(%rip), %rax
> > >       movq    $0x123456789abcdef0, %rdx
> > >       cmpq    %rdx, %rax
> > >       jne     bogus_64_magic
> 
> > Because, as comment says, this is rather tricky code.
> 
> I agree, I think maintainers feedback is very important for this patchset.


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 11/27] x86/power/64: Adapt assembly for PIE support
From: Pavel Machek @ 2018-05-25  9:10 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Kate Stewart, Nicolas Pitre, the arch/x86 maintainers,
	Sergey Senozhatsky, Petr Mladek, Len Brown, Peter Zijlstra,
	Yonghong Song, Christopher Li, Dave Hansen, Dominik Brodowski,
	LKML, Masahiro Yamada, Jan Beulich, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Alok Kataria,
	Linux Doc Mailing List, linux-arch, Jonathan Corbet, Herbert Xu
In-Reply-To: <CAJcbSZEnRhFDvYCG4ORH71LrcH2bAzOU5yF-oa9KkLpKo5UuSQ@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 962 bytes --]

On Thu 2018-05-24 09:37:20, Thomas Garnier wrote:
> On Thu, May 24, 2018 at 4:04 AM Pavel Machek <pavel@ucw.cz> wrote:
> 
> > On Wed 2018-05-23 12:54:05, Thomas Garnier wrote:
> > > Change the assembly code to use only relative references of symbols for
> the
> > > kernel to be PIE compatible.
> > >
> > > Position Independent Executable (PIE) support will allow to extended the
> > > KASLR randomization range below the -2G memory limit.
> > >
> > > Signed-off-by: Thomas Garnier <thgarnie@google.com>
> 
> > Again, was this tested?
> 
> Hibernation was tested as much as I can with qemu and my dedicated
>machine.

Ok, good.

Acked-by: Pavel Machek <pavel@ucw.cz>

> Any specific test you think I should use?

Hibernation working should be good enough test for this.

Thanks,
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC v5 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-05-25  3:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <b19e0888-61dc-c067-0b05-fde3b5eb0902@redhat.com>

On Fri, May 25, 2018 at 10:31:26AM +0800, Jason Wang wrote:
> On 2018年05月22日 16:16, Tiwei Bie wrote:
> > Hello everyone,
> > 
> > This RFC implements packed ring support in virtio driver.
> > 
> > Some simple functional tests have been done with Jason's
> > packed ring implementation in vhost (RFC v4):
> > 
> > https://lkml.org/lkml/2018/5/16/501
> > 
> > Both of ping and netperf worked as expected w/ EVENT_IDX
> > disabled. Ping worked as expected w/ EVENT_IDX enabled,
> > but netperf didn't (A hack has been added in the driver
> > to make netperf test pass in this case. The hack can be
> > found by `grep -rw XXX` in the code).
> 
> Looks like this is because a bug in vhost which wrongly track signalled_used
> and may miss an interrupt. After fixing that, both side works like a charm.
> 
> I'm testing vIOMMU and zerocopy, and will post a new version shortly. (Hope
> it would be the last RFC version).

Great to know that. Thanks a lot! :)

Best regards,
Tiwei Bie
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC v5 0/5] virtio: support packed ring
From: Jason Wang @ 2018-05-25  2:31 UTC (permalink / raw)
  To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180522081648.14768-1-tiwei.bie@intel.com>



On 2018年05月22日 16:16, Tiwei Bie wrote:
> Hello everyone,
>
> This RFC implements packed ring support in virtio driver.
>
> Some simple functional tests have been done with Jason's
> packed ring implementation in vhost (RFC v4):
>
> https://lkml.org/lkml/2018/5/16/501
>
> Both of ping and netperf worked as expected w/ EVENT_IDX
> disabled. Ping worked as expected w/ EVENT_IDX enabled,
> but netperf didn't (A hack has been added in the driver
> to make netperf test pass in this case. The hack can be
> found by `grep -rw XXX` in the code).

Looks like this is because a bug in vhost which wrongly track 
signalled_used and may miss an interrupt. After fixing that, both side 
works like a charm.

I'm testing vIOMMU and zerocopy, and will post a new version shortly. 
(Hope it would be the last RFC version).

Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC V4 PATCH 8/8] vhost: event suppression for packed ring
From: Jason Wang @ 2018-05-25  2:28 UTC (permalink / raw)
  To: mst; +Cc: kvm, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <1526473941-16199-9-git-send-email-jasowang@redhat.com>



On 2018年05月16日 20:32, Jason Wang wrote:
> +static bool vhost_notify_packed(struct vhost_dev *dev,
> +				struct vhost_virtqueue *vq)
> +{
> +	__virtio16 event_off_wrap, event_flags;
> +	__u16 old, new, off_wrap;
> +	bool v;
> +
> +	/* Flush out used descriptors updates. This is paired
> +	 * with the barrier that the Guest executes when enabling
> +	 * interrupts.
> +	 */
> +	smp_mb();
> +
> +	if (vhost_get_avail(vq, event_flags,
> +			   &vq->driver_event->flags) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_flags");
> +		return true;
> +	}
> +
> +	if (event_flags == cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE))
> +		return false;
> +	else if (event_flags == cpu_to_vhost16(vq, RING_EVENT_FLAGS_ENABLE))
> +		return true;
> +
> +	/* Read desc event flags before event_off and event_wrap */
> +	smp_rmb();
> +
> +	if (vhost_get_avail(vq, event_off_wrap,
> +			    &vq->driver_event->off_warp) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_off/wrap");
> +		return true;
> +	}
> +
> +	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
> +
> +	old = vq->signalled_used;
> +	v = vq->signalled_used_valid;
> +	new = vq->signalled_used = vq->last_used_idx;
> +	vq->signalled_used_valid = true;

We should move those idx tracking before checking event_flags. Otherwise 
we may lose interrupts because of a wrong signalled_used value.

Thanks

> +
> +	if (unlikely(!v))
> +		return true;
> +
> +	return vhost_vring_packed_need_event(vq, off_wrap, new, old);
> +}

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net] vhost: synchronize IOTLB message with dev cleanup
From: David Miller @ 2018-05-25  2:10 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <1526990337-24892-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Tue, 22 May 2018 19:58:57 +0800

> DaeRyong Jeong reports a race between vhost_dev_cleanup() and
> vhost_process_iotlb_msg():
> 
> Thread interleaving:
> CPU0 (vhost_process_iotlb_msg)			CPU1 (vhost_dev_cleanup)
> (In the case of both VHOST_IOTLB_UPDATE and
> VHOST_IOTLB_INVALIDATE)
> =====						=====
> 						vhost_umem_clean(dev->iotlb);
> if (!dev->iotlb) {
> 	        ret = -EFAULT;
> 		        break;
> }
> 						dev->iotlb = NULL;
> 
> The reason is we don't synchronize between them, fixing by protecting
> vhost_process_iotlb_msg() with dev mutex.
> 
> Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
> Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] vhost: synchronize IOTLB message with dev cleanup
From: Michael S. Tsirkin @ 2018-05-25  0:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <1526990337-24892-1-git-send-email-jasowang@redhat.com>

On Tue, May 22, 2018 at 07:58:57PM +0800, Jason Wang wrote:
> DaeRyong Jeong reports a race between vhost_dev_cleanup() and
> vhost_process_iotlb_msg():
> 
> Thread interleaving:
> CPU0 (vhost_process_iotlb_msg)			CPU1 (vhost_dev_cleanup)
> (In the case of both VHOST_IOTLB_UPDATE and
> VHOST_IOTLB_INVALIDATE)
> =====						=====
> 						vhost_umem_clean(dev->iotlb);
> if (!dev->iotlb) {
> 	        ret = -EFAULT;
> 		        break;
> }
> 						dev->iotlb = NULL;
> 
> The reason is we don't synchronize between them, fixing by protecting
> vhost_process_iotlb_msg() with dev mutex.
> 
> Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
> Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

We should think of a way to have a per-vq lock here, but for now:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/vhost/vhost.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f3bd8e9..f0be5f3 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -981,6 +981,7 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
>  {
>  	int ret = 0;
>  
> +	mutex_lock(&dev->mutex);
>  	vhost_dev_lock_vqs(dev);
>  	switch (msg->type) {
>  	case VHOST_IOTLB_UPDATE:
> @@ -1016,6 +1017,8 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
>  	}
>  
>  	vhost_dev_unlock_vqs(dev);
> +	mutex_unlock(&dev->mutex);
> +
>  	return ret;
>  }
>  ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH v3 21/27] x86/ftrace: Adapt function tracing for PIE support
From: Thomas Garnier via Virtualization @ 2018-05-24 20:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Kate Stewart, Nicolas Pitre, the arch/x86 maintainers,
	Sergey Senozhatsky, Len Brown, Tom Lendacky, Peter Zijlstra,
	Yonghong Song, Christopher Li, Dave Hansen, Dominik Brodowski,
	LKML, Masahiro Yamada, Jan Beulich, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Alok Kataria,
	Linux Doc Mailing List, linux-arch, Jonathan Corbet <corbet@
In-Reply-To: <20180524161622.4202319f@gandalf.local.home>

On Thu, May 24, 2018 at 1:16 PM Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 24 May 2018 13:40:24 +0200
> Petr Mladek <pmladek@suse.com> wrote:

> > On Wed 2018-05-23 12:54:15, Thomas Garnier wrote:
> > > When using -fPIE/PIC with function tracing, the compiler generates a
> > > call through the GOT (call *__fentry__@GOTPCREL). This instruction
> > > takes 6 bytes instead of 5 on the usual relative call.
> > >
> > > If PIE is enabled, replace the 6th byte of the GOT call by a 1-byte
nop
> > > so ftrace can handle the previous 5-bytes as before.
> > >
> > > Position Independent Executable (PIE) support will allow to extended
the
> > > KASLR randomization range below the -2G memory limit.
> > >
> > > Signed-off-by: Thomas Garnier <thgarnie@google.com>
> > > ---
> > >  arch/x86/include/asm/ftrace.h   |  6 +++--
> > >  arch/x86/include/asm/sections.h |  4 ++++
> > >  arch/x86/kernel/ftrace.c        | 42
+++++++++++++++++++++++++++++++--
> > >  3 files changed, 48 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/ftrace.h
b/arch/x86/include/asm/ftrace.h
> > > index c18ed65287d5..8f2decce38d8 100644
> > > --- a/arch/x86/include/asm/ftrace.h
> > > +++ b/arch/x86/include/asm/ftrace.h
> > > @@ -25,9 +25,11 @@ extern void __fentry__(void);
> > >  static inline unsigned long ftrace_call_adjust(unsigned long addr)
> > >  {
> > >     /*
> > > -    * addr is the address of the mcount call instruction.
> > > -    * recordmcount does the necessary offset calculation.
> > > +    * addr is the address of the mcount call instruction. PIE has
always a
> > > +    * byte added to the start of the function.
> > >      */
> > > +   if (IS_ENABLED(CONFIG_X86_PIE))
> > > +           addr -= 1;
> >
> > This seems to modify the address even for modules that are _not_
compiled with
> > PIE, see below.

> Can one load a module not compiled for PIE in a kernel with PIE?

> >
> > >     return addr;
> > >  }
> > >
> > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > > index 01ebcb6f263e..73b3c30cb7a3 100644
> > > --- a/arch/x86/kernel/ftrace.c
> > > +++ b/arch/x86/kernel/ftrace.c
> > > @@ -135,6 +135,44 @@ ftrace_modify_code_direct(unsigned long ip,
unsigned const char *old_code,
> > >     return 0;
> > >  }
> > >
> > > +/* Bytes before call GOT offset */
> > > +const unsigned char got_call_preinsn[] = { 0xff, 0x15 };
> > > +
> > > +static int
> > > +ftrace_modify_initial_code(unsigned long ip, unsigned const char
*old_code,
> > > +                      unsigned const char *new_code)
> > > +{
> > > +   unsigned char replaced[MCOUNT_INSN_SIZE + 1];
> > > +
> > > +   ftrace_expected = old_code;
> > > +
> > > +   /*
> > > +    * If PIE is not enabled or no GOT call was found, default to the
> > > +    * original approach to code modification.
> > > +    */
> > > +   if (!IS_ENABLED(CONFIG_X86_PIE) ||
> > > +       probe_kernel_read(replaced, (void *)ip, sizeof(replaced)) ||
> > > +       memcmp(replaced, got_call_preinsn, sizeof(got_call_preinsn)))
> > > +           return ftrace_modify_code_direct(ip, old_code, new_code);
> >
> > And this looks like an attempt to handle modules compiled without
> > PIE. Does it works with the right ip in that case?

> I'm guessing the || is for the "or no GOT call was found", but it
> doesn't explain why no GOT would be found.

Yes, maybe I could have made it work by using text_ip_addr earlier.


> >
> > I wonder if a better solution would be to update
> > scripts/recordmcount.c to store the incremented location into the
module.

I will look into it.


> If recordmcount.c can handle this, then I think that's the preferred
> approach. Thanks!

> -- Steve

> >
> > IMPORTANT: I have only vague picture about how this all works. It is
> > possible that I am completely wrong. The code might be correct,
> > especially if you tested this situation.
> >
> > Best Regards,
> > Petr



-- 
Thomas

^ permalink raw reply

* Re: [PATCH v3 21/27] x86/ftrace: Adapt function tracing for PIE support
From: Steven Rostedt @ 2018-05-24 20:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Kate Stewart, Nicolas Pitre, x86, Sergey Senozhatsky, kvm,
	Len Brown, Peter Zijlstra, Yonghong Song, Christopher Li,
	Dave Hansen, Dominik Brodowski, linux-kernel, Masahiro Yamada,
	Jan Beulich, Pavel Machek, H . Peter Anvin, kernel-hardening,
	Christoph Lameter, Alok Kataria, linux-doc, linux-arch,
	Jonathan Corbet, Herbert Xu, Baoquan He, David Woodhouse
In-Reply-To: <20180524114024.pa67zjipy5qcg4tm@pathway.suse.cz>

On Thu, 24 May 2018 13:40:24 +0200
Petr Mladek <pmladek@suse.com> wrote:

> On Wed 2018-05-23 12:54:15, Thomas Garnier wrote:
> > When using -fPIE/PIC with function tracing, the compiler generates a
> > call through the GOT (call *__fentry__@GOTPCREL). This instruction
> > takes 6 bytes instead of 5 on the usual relative call.
> > 
> > If PIE is enabled, replace the 6th byte of the GOT call by a 1-byte nop
> > so ftrace can handle the previous 5-bytes as before.
> > 
> > Position Independent Executable (PIE) support will allow to extended the
> > KASLR randomization range below the -2G memory limit.
> > 
> > Signed-off-by: Thomas Garnier <thgarnie@google.com>
> > ---
> >  arch/x86/include/asm/ftrace.h   |  6 +++--
> >  arch/x86/include/asm/sections.h |  4 ++++
> >  arch/x86/kernel/ftrace.c        | 42 +++++++++++++++++++++++++++++++--
> >  3 files changed, 48 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> > index c18ed65287d5..8f2decce38d8 100644
> > --- a/arch/x86/include/asm/ftrace.h
> > +++ b/arch/x86/include/asm/ftrace.h
> > @@ -25,9 +25,11 @@ extern void __fentry__(void);
> >  static inline unsigned long ftrace_call_adjust(unsigned long addr)
> >  {
> >  	/*
> > -	 * addr is the address of the mcount call instruction.
> > -	 * recordmcount does the necessary offset calculation.
> > +	 * addr is the address of the mcount call instruction. PIE has always a
> > +	 * byte added to the start of the function.
> >  	 */
> > +	if (IS_ENABLED(CONFIG_X86_PIE))
> > +		addr -= 1;  
> 
> This seems to modify the address even for modules that are _not_ compiled with
> PIE, see below.

Can one load a module not compiled for PIE in a kernel with PIE?

> 
> >  	return addr;
> >  }
> >  
> > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > index 01ebcb6f263e..73b3c30cb7a3 100644
> > --- a/arch/x86/kernel/ftrace.c
> > +++ b/arch/x86/kernel/ftrace.c
> > @@ -135,6 +135,44 @@ ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
> >  	return 0;
> >  }
> >  
> > +/* Bytes before call GOT offset */
> > +const unsigned char got_call_preinsn[] = { 0xff, 0x15 };
> > +
> > +static int
> > +ftrace_modify_initial_code(unsigned long ip, unsigned const char *old_code,
> > +			   unsigned const char *new_code)
> > +{
> > +	unsigned char replaced[MCOUNT_INSN_SIZE + 1];
> > +
> > +	ftrace_expected = old_code;
> > +
> > +	/*
> > +	 * If PIE is not enabled or no GOT call was found, default to the
> > +	 * original approach to code modification.
> > +	 */
> > +	if (!IS_ENABLED(CONFIG_X86_PIE) ||
> > +	    probe_kernel_read(replaced, (void *)ip, sizeof(replaced)) ||
> > +	    memcmp(replaced, got_call_preinsn, sizeof(got_call_preinsn)))
> > +		return ftrace_modify_code_direct(ip, old_code, new_code);  
> 
> And this looks like an attempt to handle modules compiled without
> PIE. Does it works with the right ip in that case?

I'm guessing the || is for the "or no GOT call was found", but it
doesn't explain why no GOT would be found.

> 
> I wonder if a better solution would be to update
> scripts/recordmcount.c to store the incremented location into the module.

If recordmcount.c can handle this, then I think that's the preferred
approach. Thanks!

-- Steve

> 
> IMPORTANT: I have only vague picture about how this all works. It is
> possible that I am completely wrong. The code might be correct,
> especially if you tested this situation.
> 
> Best Regards,
> Petr

^ permalink raw reply

* Re: [PATCH] block drivers/block: Use octal not symbolic permissions
From: Jens Axboe @ 2018-05-24 19:40 UTC (permalink / raw)
  To: Joe Perches, Ed L. Cashin, Philipp Reisner, Lars Ellenberg,
	Jiri Kosina, Josef Bacik, linux-block, Ilya Dryomov, Sage Weil,
	Alex Elder, Philip Kelleher, Michael S. Tsirkin, Jason Wang,
	Konrad Rzeszutek Wilk, Roger Pau Monné, Boris Ostrovsky,
	Juergen Gross
  Cc: linux-kernel, nbd, xen-devel, ceph-devel, virtualization,
	drbd-dev
In-Reply-To: <e64c2846b1b9102a52eba7b242f746e7bfff1464.camel@perches.com>

On 5/24/18 7:01 AM, Joe Perches wrote:
> On Thu, 2018-05-24 at 06:47 -0600, Jens Axboe wrote:
>> On 5/23/18 4:35 PM, Joe Perches wrote:
>>> On Wed, 2018-05-23 at 15:27 -0600, Jens Axboe wrote:
>>>> On 5/23/18 2:05 PM, Joe Perches wrote:
>>>>> Convert the S_<FOO> symbolic permissions to their octal equivalents as
>>>>> using octal and not symbolic permissions is preferred by many as more
>>>>> readable.
>>>>>
>>>>> see: https://lkml.org/lkml/2016/8/2/1945
>>>>>
>>>>> Done with automated conversion via:
>>>>> $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>
>>>>>
>>>>> Miscellanea:
>>>>>
>>>>> o Wrapped modified multi-line calls to a single line where appropriate
>>>>> o Realign modified multi-line calls to open parenthesis
>>>>
>>>> Honestly, I see this as pretty needless churn.
>>>
>>> btw:
>>>
>>> There is currently a mixture of symbolic and octal
>>> permissions uses in block and drivers/block
>>>
>>> ie: 94 octal and 146 symbolic uses.
>>>
>>> If this is applied, all would become octal.
>>
>> That does help justify the change. My main worry here is creating
>> unnecessary conflicts, which is always annoying. But it's even more
>> annoying when the change creating the conflict isn't really that
>> important at all. Case in point, the patch doesn't apply to the
>> for-4.18/block branch that it should go into...
> 
> Done against most recent -next as it's basically impossible
> to do anything against multiple private trees.
> 
> Also, the script that generated the patch is in the changelog
> so it's simple to rerun.

Alright, applied, thanks.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH net] vhost: synchronize IOTLB message with dev cleanup
From: David Miller @ 2018-05-24 19:35 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, virtualization, linux-kernel, kvm, mst
In-Reply-To: <1526990337-24892-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Tue, 22 May 2018 19:58:57 +0800

> DaeRyong Jeong reports a race between vhost_dev_cleanup() and
> vhost_process_iotlb_msg():
> 
> Thread interleaving:
> CPU0 (vhost_process_iotlb_msg)			CPU1 (vhost_dev_cleanup)
> (In the case of both VHOST_IOTLB_UPDATE and
> VHOST_IOTLB_INVALIDATE)
> =====						=====
> 						vhost_umem_clean(dev->iotlb);
> if (!dev->iotlb) {
> 	        ret = -EFAULT;
> 		        break;
> }
> 						dev->iotlb = NULL;
> 
> The reason is we don't synchronize between them, fixing by protecting
> vhost_process_iotlb_msg() with dev mutex.
> 
> Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
> Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Michael, please review.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox