All of lore.kernel.org
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>, David Airlie <airlied@linux.ie>,
	Tony Krowiak <akrowiak@linux.ibm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	dri-devel@lists.freedesktop.org,
	Eric Farman <farman@linux.ibm.com>,
	Harald Freudenberger <freude@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	intel-gfx@lists.freedesktop.org,
	intel-gvt-dev@lists.freedesktop.org,
	Jani Nikula <jani.nikula@linux.intel.com>,
	Jason Herne <jjherne@linux.ibm.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	kvm@vger.kernel.org, Kirti Wankhede <kwankhede@nvidia.com>,
	linux-s390@vger.kernel.org,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Peter Oberparleiter <oberpar@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Vineeth Vijayan <vneethv@linux.ibm.com>,
	Zhenyu Wang <zhenyuw@linux.intel.com>,
	Zhi Wang <zhi.a.wang@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Subject: Re: [Intel-gfx] [PATCH v2 4/9] vfio/ccw: Make the FSM complete and synchronize it to the mdev
Date: Mon, 20 Sep 2021 14:19:18 +0200	[thread overview]
Message-ID: <87zgs7fni1.fsf@redhat.com> (raw)
In-Reply-To: <4-v2-7d3a384024cf+2060-ccw_mdev_jgg@nvidia.com>

On Thu, Sep 09 2021, Jason Gunthorpe <jgg@nvidia.com> wrote:

> The subchannel should be left in a quiescent state unless the VFIO device
> FD is opened. When the FD is opened bring the chanel to active and allow
> the VFIO device to operate. When the device FD is closed then quiesce the
> channel.
>
> To make this work the FSM needs to handle the transitions to/from open and
> closed so everything is sequenced. Rename state NOT_OPER to BROKEN and use
> it wheneven the driver has malfunctioned. STANDBY becomes CLOSED. The
> normal case FSM looks like:
>     CLOSED -> IDLE -> PROCESS/PENDING* -> IDLE -> CLOSED
>
> With a possible branch off to BROKEN from any state. Once the device is in
> BROKEN it cannot be recovered other than be reloading the driver.

Hm, not sure whether it is a good idea to conflate "something went
wrong" and "device is not operational". In the latter case, we will
eventually get a removal of the css device when the common I/O layer has
processed the channel report for the subchannel; while the former case
could mean all kind of things, but the subchannel will likely stay
around. I think NOT_OPER was always meant to be a transitional state.

>
> Delete the triply redundant calls to
> vfio_ccw_sch_quiesce(). vfio_ccw_mdev_close_device() always leaves the
> subchannel quiescent. vfio_ccw_mdev_remove() cannot return until
> vfio_ccw_mdev_close_device() completes and vfio_ccw_sch_remove() cannot
> return until vfio_ccw_mdev_remove() completes. Have the FSM code take care
> of calling cp_free() when appropriate.

I remember some serialization issues wrt cp_free() etc. coming up every
now and than; that might need extra care (I'm taking a look.)

>
> Device reset becomes a CLOSE/OPEN sequence which now properly handles the
> situation if the device becomes BROKEN.
>
> Machine shutdown via vfio_ccw_sch_shutdown() now simply tries to close and
> leaves the device BROKEN (though arguably the bus should take care to
> quiet down the subchannel HW during shutdown, not the drivers)

The problem is that there is not really a uniform thing that can be done
for shutdown; e.g. we can quiesce and then disable I/O and EADM
subchannels, but CHSC subchannels cannot be quiesced.


WARNING: multiple messages have this Message-ID (diff)
From: Cornelia Huck <cohuck@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>, David Airlie <airlied@linux.ie>,
	Tony Krowiak <akrowiak@linux.ibm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	dri-devel@lists.freedesktop.org,
	Eric Farman <farman@linux.ibm.com>,
	Harald Freudenberger <freude@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	intel-gfx@lists.freedesktop.org,
	intel-gvt-dev@lists.freedesktop.org,
	Jani Nikula <jani.nikula@linux.intel.com>,
	Jason Herne <jjherne@linux.ibm.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	kvm@vger.kernel.org, Kirti Wankhede <kwankhede@nvidia.com>,
	linux-s390@vger.kernel.org,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Peter Oberparleiter <oberpar@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Vineeth Vijayan <vneethv@linux.ibm.com>,
	Zhenyu Wang <zhenyuw@linux.intel.com>,
	Zhi Wang <zhi.a.wang@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v2 4/9] vfio/ccw: Make the FSM complete and synchronize it to the mdev
Date: Mon, 20 Sep 2021 14:19:18 +0200	[thread overview]
Message-ID: <87zgs7fni1.fsf@redhat.com> (raw)
In-Reply-To: <4-v2-7d3a384024cf+2060-ccw_mdev_jgg@nvidia.com>

On Thu, Sep 09 2021, Jason Gunthorpe <jgg@nvidia.com> wrote:

> The subchannel should be left in a quiescent state unless the VFIO device
> FD is opened. When the FD is opened bring the chanel to active and allow
> the VFIO device to operate. When the device FD is closed then quiesce the
> channel.
>
> To make this work the FSM needs to handle the transitions to/from open and
> closed so everything is sequenced. Rename state NOT_OPER to BROKEN and use
> it wheneven the driver has malfunctioned. STANDBY becomes CLOSED. The
> normal case FSM looks like:
>     CLOSED -> IDLE -> PROCESS/PENDING* -> IDLE -> CLOSED
>
> With a possible branch off to BROKEN from any state. Once the device is in
> BROKEN it cannot be recovered other than be reloading the driver.

Hm, not sure whether it is a good idea to conflate "something went
wrong" and "device is not operational". In the latter case, we will
eventually get a removal of the css device when the common I/O layer has
processed the channel report for the subchannel; while the former case
could mean all kind of things, but the subchannel will likely stay
around. I think NOT_OPER was always meant to be a transitional state.

>
> Delete the triply redundant calls to
> vfio_ccw_sch_quiesce(). vfio_ccw_mdev_close_device() always leaves the
> subchannel quiescent. vfio_ccw_mdev_remove() cannot return until
> vfio_ccw_mdev_close_device() completes and vfio_ccw_sch_remove() cannot
> return until vfio_ccw_mdev_remove() completes. Have the FSM code take care
> of calling cp_free() when appropriate.

I remember some serialization issues wrt cp_free() etc. coming up every
now and than; that might need extra care (I'm taking a look.)

>
> Device reset becomes a CLOSE/OPEN sequence which now properly handles the
> situation if the device becomes BROKEN.
>
> Machine shutdown via vfio_ccw_sch_shutdown() now simply tries to close and
> leaves the device BROKEN (though arguably the bus should take care to
> quiet down the subchannel HW during shutdown, not the drivers)

The problem is that there is not really a uniform thing that can be done
for shutdown; e.g. we can quiesce and then disable I/O and EADM
subchannels, but CHSC subchannels cannot be quiesced.


  reply	other threads:[~2021-09-20 12:19 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-09 19:38 [Intel-gfx] [PATCH v2 0/9] Move vfio_ccw to the new mdev API Jason Gunthorpe
2021-09-09 19:38 ` Jason Gunthorpe
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 1/9] vfio/ccw: Use functions for alloc/free of the vfio_ccw_private Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-10 11:27   ` [Intel-gfx] " Christoph Hellwig
2021-09-10 11:27     ` Christoph Hellwig
2021-09-14 15:50     ` [Intel-gfx] " Cornelia Huck
2021-09-14 15:50       ` Cornelia Huck
2021-09-14 18:03       ` [Intel-gfx] " Jason Gunthorpe
2021-09-14 18:03         ` Jason Gunthorpe
2021-09-24  2:53   ` [Intel-gfx] " Eric Farman
2021-09-24  2:53     ` Eric Farman
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 2/9] vfio/ccw: Pass vfio_ccw_private not mdev_device to various functions Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-10 11:32   ` [Intel-gfx] " Christoph Hellwig
2021-09-10 11:32     ` Christoph Hellwig
2021-09-20 11:12   ` [Intel-gfx] " Cornelia Huck
2021-09-20 11:12     ` Cornelia Huck
2021-09-24  2:53   ` [Intel-gfx] " Eric Farman
2021-09-24  2:53     ` Eric Farman
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 3/9] vfio/ccw: Convert to use vfio_register_group_dev() Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-24 20:37   ` [Intel-gfx] " Eric Farman
2021-09-24 20:37     ` Eric Farman
2021-09-27 12:17     ` [Intel-gfx] " Jason Gunthorpe
2021-09-27 12:17       ` Jason Gunthorpe
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 4/9] vfio/ccw: Make the FSM complete and synchronize it to the mdev Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-20 12:19   ` Cornelia Huck [this message]
2021-09-20 12:19     ` Cornelia Huck
2021-09-20 12:30     ` [Intel-gfx] " Jason Gunthorpe
2021-09-20 12:30       ` Jason Gunthorpe
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 5/9] vfio/mdev: Consolidate all the device_api sysfs into the core code Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-10 12:10   ` [Intel-gfx] " Christoph Hellwig
2021-09-10 12:10     ` Christoph Hellwig
2021-09-10 13:38     ` [Intel-gfx] " Jason Gunthorpe
2021-09-10 13:38       ` Jason Gunthorpe
2021-09-10 16:09       ` [Intel-gfx] " Alex Williamson
2021-09-10 16:09         ` Alex Williamson
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 6/9] vfio/mdev: Add mdev available instance checking to the core Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-10 12:25   ` [Intel-gfx] " Christoph Hellwig
2021-09-10 12:25     ` Christoph Hellwig
2021-09-20 18:02   ` [Intel-gfx] " Cornelia Huck
2021-09-20 18:02     ` Cornelia Huck
2021-09-21 13:19     ` [Intel-gfx] " Jason Gunthorpe
2021-09-21 13:19       ` Jason Gunthorpe
2021-09-24  2:54       ` [Intel-gfx] " Eric Farman
2021-09-24  2:54         ` Eric Farman
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 7/9] vfio/ccw: Remove private->mdev Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-24 20:45   ` [Intel-gfx] " Eric Farman
2021-09-24 20:45     ` Eric Farman
2021-09-27 12:32     ` [Intel-gfx] " Jason Gunthorpe
2021-09-27 12:32       ` Jason Gunthorpe
2021-09-27 20:45       ` [Intel-gfx] " Eric Farman
2021-09-27 20:45         ` Eric Farman
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 8/9] vfio: Export vfio_device_try_get() Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-09 19:38 ` [Intel-gfx] [PATCH v2 9/9] vfio/ccw: Move the lifecycle of the struct vfio_ccw_private to the mdev Jason Gunthorpe
2021-09-09 19:38   ` Jason Gunthorpe
2021-09-09 19:54 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Move vfio_ccw to the new mdev API Patchwork
2021-09-13 17:40 ` [Intel-gfx] [PATCH v2 0/9] " Eric Farman
2021-09-13 17:40   ` Eric Farman
2021-09-13 19:24   ` [Intel-gfx] " Jason Gunthorpe
2021-09-13 19:24     ` Jason Gunthorpe
2021-09-13 20:31     ` [Intel-gfx] " Eric Farman
2021-09-13 20:31       ` Eric Farman
2021-09-14 13:36       ` [Intel-gfx] " Jason Gunthorpe
2021-09-14 13:36         ` Jason Gunthorpe
2021-09-17 11:59         ` [Intel-gfx] " Cornelia Huck
2021-09-17 11:59           ` Cornelia Huck
2021-09-17 12:51           ` [Intel-gfx] " Jason Gunthorpe
2021-09-17 12:51             ` Jason Gunthorpe
2021-09-17 14:37             ` [Intel-gfx] " Cornelia Huck
2021-09-17 14:37               ` Cornelia Huck
2021-09-14 13:46 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Move vfio_ccw to the new mdev API (rev2) Patchwork
2021-09-14 18:19 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Move vfio_ccw to the new mdev API (rev3) Patchwork
2021-09-21 15:11 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Move vfio_ccw to the new mdev API (rev4) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zgs7fni1.fsf@redhat.com \
    --to=cohuck@redhat.com \
    --cc=airlied@linux.ie \
    --cc=akrowiak@linux.ibm.com \
    --cc=alex.williamson@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=farman@linux.ibm.com \
    --cc=freude@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hch@lst.de \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-gvt-dev@lists.freedesktop.org \
    --cc=jani.nikula@linux.intel.com \
    --cc=jgg@nvidia.com \
    --cc=jjherne@linux.ibm.com \
    --cc=joonas.lahtinen@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=oberpar@linux.ibm.com \
    --cc=pasic@linux.ibm.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=vneethv@linux.ibm.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhi.a.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.