virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC PATCH 1/2] virtio/s390: avoid race on vcdev->config
       [not found] ` <20180912140202.12292-2-pasic@linux.ibm.com>
@ 2018-09-18 18:29   ` Cornelia Huck
       [not found]     ` <2f27c41d-4465-0fce-bbbb-b7b22a179eae@linux.ibm.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Cornelia Huck @ 2018-09-18 18:29 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization

On Wed, 12 Sep 2018 16:02:01 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> Currently we have a race on vcdev->config in virtio_ccw_get_config() and
> in virtio_ccw_set_config().
> 
> This normally does not cause problems, as these are usually infrequent
> operations. But occasionally sysfs attributes are directly dependent on
> pieces of virio config and trigger a get on each read. This gives us at
> least one trigger.

So, the problem is that you might get unexpected/inconsistent config
information?

> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> ---
>  drivers/s390/virtio/virtio_ccw.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> index 8f5c1d7f751a..a5e8530a3391 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -828,6 +828,7 @@ static void virtio_ccw_get_config(struct virtio_device *vdev,
>  	int ret;
>  	struct ccw1 *ccw;
>  	void *config_area;
> +	unsigned long flags;
>  
>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
>  	if (!ccw)
> @@ -846,11 +847,13 @@ static void virtio_ccw_get_config(struct virtio_device *vdev,
>  	if (ret)
>  		goto out_free;
>  
> +	spin_lock_irqsave(&vcdev->lock, flags);

I'm not sure that vcdev->lock is the right lock to use for this, but
I'm a bit unsure about what you're guarding against here.

>  	memcpy(vcdev->config, config_area, offset + len);
> -	if (buf)
> -		memcpy(buf, &vcdev->config[offset], len);
>  	if (vcdev->config_ready < offset + len)
>  		vcdev->config_ready = offset + len;
> +	spin_unlock_irqrestore(&vcdev->lock, flags);
> +	if (buf)
> +		memcpy(buf, config_area + offset, len);
>  
>  out_free:
>  	kfree(config_area);
> @@ -864,6 +867,7 @@ static void virtio_ccw_set_config(struct virtio_device *vdev,
>  	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
>  	struct ccw1 *ccw;
>  	void *config_area;
> +	unsigned long flags;
>  
>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
>  	if (!ccw)
> @@ -876,9 +880,11 @@ static void virtio_ccw_set_config(struct virtio_device *vdev,
>  	/* Make sure we don't overwrite fields. */
>  	if (vcdev->config_ready < offset)
>  		virtio_ccw_get_config(vdev, 0, NULL, offset);
> +	spin_lock_irqsave(&vcdev->lock, flags);
>  	memcpy(&vcdev->config[offset], buf, len);
>  	/* Write the config area to the host. */
>  	memcpy(config_area, vcdev->config, sizeof(vcdev->config));
> +	spin_unlock_irqrestore(&vcdev->lock, flags);
>  	ccw->cmd_code = CCW_CMD_WRITE_CONF;
>  	ccw->flags = 0;
>  	ccw->count = offset + len;

One thing that might be a problem here is how reading/writing the
config stuff works for virtio-ccw: As channel devices don't have a
config space like e.g. PCI devices, I had to come up with a way to
implement something like that via dedicated channel commands. I did not
want to go via a payload that would provide offset/length, but went
with reading/writing everything before the value to be read/written as
well. That means we need to call read before write to make sure we
don't overwrite things (as the comment states), and how this is done
might be problematic.

I'm thinking what we may need is a way to make "read and then write" a
single operation and make sure that it does not run concurrently with
the simple read operation. Factor out the proper invocation of the read
command and the status update, have get_config call this with a reader
lock and have set_config call the read-then-write combo with a write
lock, maybe?

I might not understand the problem correctly, though (or I might have
spent too much time reading email today already :)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 2/2] virtio/s390: fix race in ccw_io_helper()
       [not found] ` <20180912140202.12292-3-pasic@linux.ibm.com>
@ 2018-09-18 18:45   ` Cornelia Huck
       [not found]     ` <c0148f86-b7a5-0c66-146d-f2dbcd678436@linux.ibm.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Cornelia Huck @ 2018-09-18 18:45 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization

On Wed, 12 Sep 2018 16:02:02 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> While ccw_io_helper() seems like intended to be exclusive in a sense that
> it is supposed to facilitate I/O for at most one thread at any given
> time, there is actually nothing ensuring that threads won't pile up at
> vcdev->wait_q. If they all threads get woken up and see the status that
> belongs to some other request as their own. This can lead to bugs. For an
> example see :
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1788432
> 
> This normally does not cause problems, as these are usually infrequent
> operations that happen in a well defined sequence and normally do not
> fail. But occasionally sysfs attributes are directly dependent
> on pieces of virio config and trigger a get on each read.  This gives us
> at least one method to trigger races.

Yes, the idea behind ccw_io_helper() was to provide a simple way to use
the inherently asynchronous channel I/O operations in a synchronous
way, as that's what the virtio callbacks expect. I did not consider
multiple callbacks for a device running at the same time; but if the
interface allows that, we obviously need to be able to handle it.

Has this only been observed for the config get/set commands? (The
read-before-write thing?)

> 
> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> Reported-by: Colin Ian King <colin.king@canonical.com>
> ---
> 
> This is a big hammer -- mutex on virtio_ccw device level would more than
> suffice. But I don't think it hurts, and maybe there is a better way e.g.
> one using some common ccw/cio mechanisms to address this. That's why this
> is an RFC.

I'm for using more delicate tools, if possible :)

We basically have two options:
- Have a way to queue I/O operations and then handle them in sequence.
  Creates complexity, and is likely overkill. (We already have a kind
  of serialization because we re-submit the channel program until the
  hypervisor accepts it; the problem comes from the wait queue usage.)
- Add serialization around the submit/wait procedure (as you did), but
  with a per-device mutex. That looks like the easiest solution.

> ---
>  drivers/s390/virtio/virtio_ccw.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> index a5e8530a3391..36252f344660 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -289,6 +289,8 @@ static int doing_io(struct virtio_ccw_device *vcdev, __u32 flag)
>  	return ret;
>  }
>  
> +DEFINE_MUTEX(vcio_mtx);
> +
>  static int ccw_io_helper(struct virtio_ccw_device *vcdev,
>  			 struct ccw1 *ccw, __u32 intparm)
>  {
> @@ -296,6 +298,7 @@ static int ccw_io_helper(struct virtio_ccw_device *vcdev,
>  	unsigned long flags;
>  	int flag = intparm & VIRTIO_CCW_INTPARM_MASK;
>  
> +	mutex_lock(&vcio_mtx);
>  	do {
>  		spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
>  		ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0);
> @@ -308,7 +311,9 @@ static int ccw_io_helper(struct virtio_ccw_device *vcdev,
>  		cpu_relax();
>  	} while (ret == -EBUSY);

We probably still want to keep this while loop to be on the safe side
(unsolicited status from the hypervisor, for example.)

>  	wait_event(vcdev->wait_q, doing_io(vcdev, flag) == 0);
> -	return ret ? ret : vcdev->err;
> +	ret = ret ? ret : vcdev->err;
> +	mutex_unlock(&vcio_mtx);
> +	return ret;
>  }
>  
>  static void virtio_ccw_drop_indicator(struct virtio_ccw_device *vcdev,

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/2] virtio/s390: avoid race on vcdev->config
       [not found]     ` <2f27c41d-4465-0fce-bbbb-b7b22a179eae@linux.ibm.com>
@ 2018-09-19 11:28       ` Cornelia Huck
  0 siblings, 0 replies; 5+ messages in thread
From: Cornelia Huck @ 2018-09-19 11:28 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization

On Wed, 19 Sep 2018 00:52:14 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 09/18/2018 08:29 PM, Cornelia Huck wrote:
> > On Wed, 12 Sep 2018 16:02:01 +0200
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> >> Currently we have a race on vcdev->config in virtio_ccw_get_config() and
> >> in virtio_ccw_set_config().
> >>
> >> This normally does not cause problems, as these are usually infrequent
> >> operations. But occasionally sysfs attributes are directly dependent on
> >> pieces of virio config and trigger a get on each read. This gives us at
> >> least one trigger.  
> > 
> > So, the problem is that you might get unexpected/inconsistent config
> > information?
> >   
> >>
> >> Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
> >> ---
> >>  drivers/s390/virtio/virtio_ccw.c | 10 ++++++++--
> >>  1 file changed, 8 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> >> index 8f5c1d7f751a..a5e8530a3391 100644
> >> --- a/drivers/s390/virtio/virtio_ccw.c
> >> +++ b/drivers/s390/virtio/virtio_ccw.c
> >> @@ -828,6 +828,7 @@ static void virtio_ccw_get_config(struct virtio_device *vdev,
> >>  	int ret;
> >>  	struct ccw1 *ccw;
> >>  	void *config_area;
> >> +	unsigned long flags;
> >>  
> >>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
> >>  	if (!ccw)
> >> @@ -846,11 +847,13 @@ static void virtio_ccw_get_config(struct virtio_device *vdev,
> >>  	if (ret)
> >>  		goto out_free;
> >>  
> >> +	spin_lock_irqsave(&vcdev->lock, flags);  
> > 
> > I'm not sure that vcdev->lock is the right lock to use for this, but
> > I'm a bit unsure about what you're guarding against here.>  
> 
> I'm guarding against multiple threads using the shared state that is
> the config member of struct virtio_ccw_device so that at least one
> writes. I will continue with an example below.

Ok.

>  
> >>  	memcpy(vcdev->config, config_area, offset + len);
> >> -	if (buf)
> >> -		memcpy(buf, &vcdev->config[offset], len);
> >>  	if (vcdev->config_ready < offset + len)
> >>  		vcdev->config_ready = offset + len;
> >> +	spin_unlock_irqrestore(&vcdev->lock, flags);
> >> +	if (buf)
> >> +		memcpy(buf, config_area + offset, len);
> >>  
> >>  out_free:
> >>  	kfree(config_area);
> >> @@ -864,6 +867,7 @@ static void virtio_ccw_set_config(struct virtio_device *vdev,
> >>  	struct virtio_ccw_device *vcdev = to_vc_device(vdev);
> >>  	struct ccw1 *ccw;
> >>  	void *config_area;
> >> +	unsigned long flags;
> >>  
> >>  	ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL);
> >>  	if (!ccw)
> >> @@ -876,9 +880,11 @@ static void virtio_ccw_set_config(struct virtio_device *vdev,
> >>  	/* Make sure we don't overwrite fields. */
> >>  	if (vcdev->config_ready < offset)
> >>  		virtio_ccw_get_config(vdev, 0, NULL, offset);
> >> +	spin_lock_irqsave(&vcdev->lock, flags);
> >>  	memcpy(&vcdev->config[offset], buf, len);
> >>  	/* Write the config area to the host. */
> >>  	memcpy(config_area, vcdev->config, sizeof(vcdev->config));  
> 
> While in this section which is critical now, we could have raced
> with another thread that is writing the vcdev->config (critical section in get)
> that we are reading. That could result in something that could never happen if the
> operations are serialized.

So, the race is basically on the memcpy?

> 
> >> +	spin_unlock_irqrestore(&vcdev->lock, flags);
> >>  	ccw->cmd_code = CCW_CMD_WRITE_CONF;
> >>  	ccw->flags = 0;
> >>  	ccw->count = offset + len;  
> > 
> > One thing that might be a problem here is how reading/writing the
> > config stuff works for virtio-ccw: As channel devices don't have a
> > config space like e.g. PCI devices, I had to come up with a way to
> > implement something like that via dedicated channel commands. I did not
> > want to go via a payload that would provide offset/length, but went
> > with reading/writing everything before the value to be read/written as
> > well. That means we need to call read before write to make sure we
> > don't overwrite things (as the comment states), and how this is done
> > might be problematic.
> >   
> 
> Nod.
> 
> > I'm thinking what we may need is a way to make "read and then write" a
> > single operation and make sure that it does not run concurrently with
> > the simple read operation. Factor out the proper invocation of the read
> > command and the status update, have get_config call this with a reader
> > lock and have set_config call the read-then-write combo with a write
> > lock, maybe?
> >   
> 
> I'm inclined to say. The other tread doing the get may only get us more
> recent results, and that should at least as good. Our get is guaranteed
> to finish, so we won't write complete garbage.
> 
> AFAIR the config can change form the other side too. In that sense making
> the read and the write one operation on the other side would help make us
> completely sane. 

Yes, config fields may be writeable both by the host and by the guest.

> But at the moment I don't think that what you propose
> here is giving us an edge over this patch.

It is probably a bit cleaner concept wise, but I'm not wedded to it.

Another thing that comes to mind is that 'config generation' thing,
which is used on e.g. PCI to make sure that values longer than 32 bits
are consistent. I wonder whether we should explore that direction as
well.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 2/2] virtio/s390: fix race in ccw_io_helper()
       [not found]     ` <c0148f86-b7a5-0c66-146d-f2dbcd678436@linux.ibm.com>
@ 2018-09-19 14:07       ` Cornelia Huck
       [not found]         ` <592b9fd1-0ab0-aa9f-31d7-a717610bd95c@linux.ibm.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Cornelia Huck @ 2018-09-19 14:07 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization

On Wed, 19 Sep 2018 15:17:28 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 09/18/2018 08:45 PM, Cornelia Huck wrote:

> > We basically have two options:
> > - Have a way to queue I/O operations and then handle them in sequence.
> >   Creates complexity, and is likely overkill. (We already have a kind
> >   of serialization because we re-submit the channel program until the
> >   hypervisor accepts it; the problem comes from the wait queue usage.)  
> 
> I secretly hoped we already have something like this somewhere. Getting
> some kind of requests processed and wanting to know if each of these worked
> or not seemed like fairly common. I agree, implementing this just for
> virtio-ccw would be an overkill, I agree.

I've encountered that pattern so far mostly for driver-internal I/O
(setting some stuff up via channel commands etc.) Other usages (like
e.g. the dasd driver processing block layer requests) are asynchronous,
and the common I/O layer uses a full-fledged fsm. Much of the trouble
comes from implementing a synchronous interface via asynchronous
commands, and I'd really like to keep that as simple as possible
(especially as this is not the hot path).

> 
> > - Add serialization around the submit/wait procedure (as you did), but
> >   with a per-device mutex. That looks like the easiest solution.
> >   
> 
> Yep, I'm for doing something like this first. We can think about doing
> something more elaborate later. I will send a non-RFC with an extra
> per-device mutex. Unless you object.

No, that sounds good to me.

> 
> Another thought that crossed my head was making the transport ops
> mutex on each virtio-ccw device -- see our conversation on get/set
> config. I don't think it would make a big difference, since the
> ccw stuff is mutex already, so we only have parallelism for the
> preparation and for post-processing the results of the ccw io.

Do you spot any other places where we may need to care about concurrent
processing (like for the ->config area in the previous patch)?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 2/2] virtio/s390: fix race in ccw_io_helper()
       [not found]         ` <592b9fd1-0ab0-aa9f-31d7-a717610bd95c@linux.ibm.com>
@ 2018-09-20 10:15           ` Cornelia Huck
  0 siblings, 0 replies; 5+ messages in thread
From: Cornelia Huck @ 2018-09-20 10:15 UTC (permalink / raw)
  To: Halil Pasic; +Cc: linux-s390, Colin Ian King, kvm, virtualization

On Wed, 19 Sep 2018 18:56:45 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 09/19/2018 04:07 PM, Cornelia Huck wrote:

> > Do you spot any other places where we may need to care about concurrent
> > processing (like for the ->config area in the previous patch)?
> >   
> 
> It is hard to tell, because:
> * Synchronization external to the transport could make things work
> out just fine.
> * virtio_config_ops does not document these requirements if any.
> * So it's up to the devices to use the stuff without shooting
>   themselves in the foot.
> * virtio-pci does not seem to do more to avoid such problems that
>   we do.
> 
> Back then when learning vritio-ccw I did ask myself such questions
> and based on vrito-pci and I was like looks similar, should be
> good enough.

Yep, I agree. If there's nothing obvious, I think we should just leave
it as it is now.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-09-20 10:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20180912140202.12292-1-pasic@linux.ibm.com>
     [not found] ` <20180912140202.12292-2-pasic@linux.ibm.com>
2018-09-18 18:29   ` [RFC PATCH 1/2] virtio/s390: avoid race on vcdev->config Cornelia Huck
     [not found]     ` <2f27c41d-4465-0fce-bbbb-b7b22a179eae@linux.ibm.com>
2018-09-19 11:28       ` Cornelia Huck
     [not found] ` <20180912140202.12292-3-pasic@linux.ibm.com>
2018-09-18 18:45   ` [RFC PATCH 2/2] virtio/s390: fix race in ccw_io_helper() Cornelia Huck
     [not found]     ` <c0148f86-b7a5-0c66-146d-f2dbcd678436@linux.ibm.com>
2018-09-19 14:07       ` Cornelia Huck
     [not found]         ` <592b9fd1-0ab0-aa9f-31d7-a717610bd95c@linux.ibm.com>
2018-09-20 10:15           ` Cornelia Huck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).