From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34324) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cg500-0000vd-QQ for qemu-devel@nongnu.org; Tue, 21 Feb 2017 02:36:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cg4zx-00049h-M9 for qemu-devel@nongnu.org; Tue, 21 Feb 2017 02:36:36 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:48134 helo=mx0a-001b2d01.pphosted.com) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cg4zx-00049Z-FU for qemu-devel@nongnu.org; Tue, 21 Feb 2017 02:36:33 -0500 Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v1L7NobZ141388 for ; Tue, 21 Feb 2017 02:36:32 -0500 Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.149]) by mx0b-001b2d01.pphosted.com with ESMTP id 28r2pnykgn-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 21 Feb 2017 02:36:31 -0500 Received: from localhost by e31.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 21 Feb 2017 00:36:31 -0700 Date: Tue, 21 Feb 2017 15:36:23 +0800 From: Dong Jia Shi References: <20170217082939.33208-1-bjsdjshi@linux.vnet.ibm.com> <20170217082939.33208-5-bjsdjshi@linux.vnet.ibm.com> <20170220193113.3ba5af2c.cornelia.huck@de.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170220193113.3ba5af2c.cornelia.huck@de.ibm.com> Message-Id: <20170221073623.GJ562@bjsdjshi@linux.vnet.ibm.com> Subject: Re: [Qemu-devel] [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Cornelia Huck Cc: Dong Jia Shi , kvm@vger.kernel.org, linux-s390@vger.kernel.org, qemu-devel@nongnu.org, renxiaof@linux.vnet.ibm.com, borntraeger@de.ibm.com, agraf@suse.com, alex.williamson@redhat.com, pmorel@linux.vnet.ibm.com, pasic@linux.vnet.ibm.com, wkywang@linux.vnet.ibm.com * Cornelia Huck [2017-02-20 19:31:13 +0100]: > On Fri, 17 Feb 2017 09:29:28 +0100 > Dong Jia Shi wrote: > > > To make vfio support subchannel devices, we need a css driver for > > the vfio subchannels. This patch adds a basic vfio-ccw subchannel > > driver for this purpose. > > > > To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option > > and configure VFIO as required. > > > > Signed-off-by: Dong Jia Shi > > Acked-by: Pierre Morel > > --- > > arch/s390/Kconfig | 10 ++ > > arch/s390/include/asm/isc.h | 1 + > > drivers/iommu/Kconfig | 8 ++ > > drivers/s390/cio/Makefile | 3 + > > drivers/s390/cio/vfio_ccw_drv.c | 262 ++++++++++++++++++++++++++++++++++++ > > drivers/s390/cio/vfio_ccw_private.h | 25 ++++ > > 6 files changed, 309 insertions(+) > > create mode 100644 drivers/s390/cio/vfio_ccw_drv.c > > create mode 100644 drivers/s390/cio/vfio_ccw_private.h > > > > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig > > index c6722112..b920df8 100644 > > --- a/arch/s390/Kconfig > > +++ b/arch/s390/Kconfig > > @@ -670,6 +670,16 @@ config EADM_SCH > > To compile this driver as a module, choose M here: the > > module will be called eadm_sch. > > > > +config VFIO_CCW > > + def_tristate n > > + prompt "Support for VFIO-CCW subchannels" > > + depends on S390_CCW_IOMMU && VFIO > > + help > > + This driver allows usage of VFIO-CCW subchannels. > > Hm... > > "This driver allows usage of I/O subchannels via VFIO-CCW." > > ? > This is better. Will change. > > + > > + To compile this driver as a module, choose M here: the > > + module will be called vfio_ccw. > > + > > endmenu > > > > menu "Dump support" > > diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h > > index 68d7d68..8a0b721 100644 > > --- a/arch/s390/include/asm/isc.h > > +++ b/arch/s390/include/asm/isc.h > > @@ -16,6 +16,7 @@ > > #define CONSOLE_ISC 1 /* console I/O subchannel */ > > #define EADM_SCH_ISC 4 /* EADM subchannels */ > > #define CHSC_SCH_ISC 7 /* CHSC subchannels */ > > +#define VFIO_CCW_ISC IO_SCH_ISC /* VFIO-CCW I/O subchannels */ > > This is OK for now, I guess; but do we want to have the isc > configurable in the long run? I.e., if a host wants to run its own I/O > devices at a different priority than the devices it passes to a guest? > I think we can keep this as the default value, and provide a driver param to customize the ISC value in the future once we need this. I put this on my LATER list, or I do it in next version? > > /* Adapter interrupts. */ > > #define QDIO_AIRQ_ISC IO_SCH_ISC /* I/O subchannel in qdio mode */ > > #define PCI_ISC 2 /* PCI I/O subchannels */ > > (...) > > > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c > > new file mode 100644 > > index 0000000..b068207 > > --- /dev/null > > +++ b/drivers/s390/cio/vfio_ccw_drv.c > > @@ -0,0 +1,262 @@ > > +/* > > + * VFIO based Physical Subchannel device driver > > + * > > + * Copyright IBM Corp. 2017 > > + * > > + * Author(s): Dong Jia Shi > > + * Xiao Feng Ren > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > + > > +#include "vfio_ccw_private.h" > > + > > +/* > > + * Helpers > > + */ > > +static int vfio_ccw_sch_quiesce(struct subchannel *sch) > > +{ > > + struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev); > > + DECLARE_COMPLETION_ONSTACK(completion); > > + int iretry, ret = 0; > > + > > + spin_lock_irq(sch->lock); > > + if (!sch->schib.pmcw.ena) > > + goto out_unlock; > > + ret = cio_disable_subchannel(sch); > > + if (ret != -EBUSY) > > + goto out_unlock; > > + > > + do { > > + iretry = 255; > > + > > + ret = cio_cancel_halt_clear(sch, &iretry); > > + while (ret == -EBUSY) { > > + /* > > + * Flushing all I/O and wait the > > "Flush all I/O and wait for..." > Ok. > > + * cancel/halt/clear completion. > > + */ > > + private->completion = &completion; > > + spin_unlock_irq(sch->lock); > > + > > + wait_for_completion(&completion); > > What happens for cancel? It won't generate an interrupt. > Right! How about using: wait_for_completion_timeout(&completion, 3*HZ); (I stole '3*HZ' from ccw_device_kill_io.) > > + > > + spin_lock_irq(sch->lock); > > + private->completion = NULL; > > + ret = cio_cancel_halt_clear(sch, &iretry); > > + }; > > + > > + ret = cio_disable_subchannel(sch); > > + } while (ret == -EBUSY); > > + > > +out_unlock: > > + spin_unlock_irq(sch->lock); > > + return ret; > > +} > > + > > +/* > > + * Sysfs interfaces > > + */ > > +static ssize_t chpids_show(struct device *dev, > > + struct device_attribute *attr, > > + char *buf) > > +{ > > + struct subchannel *sch = to_subchannel(dev); > > + struct chsc_ssd_info *ssd = &sch->ssd_info; > > + ssize_t ret = 0; > > + int chp; > > + int mask; > > + > > + for (chp = 0; chp < 8; chp++) { > > + mask = 0x80 >> chp; > > + if (ssd->path_mask & mask) > > + ret += sprintf(buf + ret, "%02x ", ssd->chpid[chp].id); > > + else > > + ret += sprintf(buf + ret, "00 "); > > + } > > + ret += sprintf(buf+ret, "\n"); > > + return ret; > > +} > > + > > +static ssize_t pimpampom_show(struct device *dev, > > + struct device_attribute *attr, > > + char *buf) > > +{ > > + struct subchannel *sch = to_subchannel(dev); > > + struct pmcw *pmcw = &sch->schib.pmcw; > > + > > + return sprintf(buf, "%02x %02x %02x\n", > > + pmcw->pim, pmcw->pam, pmcw->pom); > > +} > > + > > +static DEVICE_ATTR(chpids, 0444, chpids_show, NULL); > > +static DEVICE_ATTR(pimpampom, 0444, pimpampom_show, NULL); > > Quick question: You need to duplicate these so that lscss shows a sane > output for vfio-ccw subchannels? > Yes, and also for retrieving these values in QEMU. > > + > > +static struct attribute *vfio_subchannel_attrs[] = { > > + &dev_attr_chpids.attr, > > + &dev_attr_pimpampom.attr, > > + NULL, > > +}; > > + > > +static struct attribute_group vfio_subchannel_attr_group = { > > + .attrs = vfio_subchannel_attrs, > > +}; > > + > > +/* > > + * Css driver callbacks > > + */ > > +static void vfio_ccw_sch_irq(struct subchannel *sch) > > +{ > > + struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev); > > + > > + inc_irq_stat(IRQIO_CIO); > > + > > + if (!private) > > + return; > > + > > + if (private->completion) > > + complete(private->completion); > > +} > > + > > +static int vfio_ccw_sch_probe(struct subchannel *sch) > > +{ > > + struct pmcw *pmcw = &sch->schib.pmcw; > > + struct vfio_ccw_private *private; > > + int ret; > > + > > + if (pmcw->qf) { > > + dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n", > > s/do/does/ > Ok. > > + dev_name(&sch->dev)); > > + return -ENOTTY; > > Is -ENOTTY the right return code here? -EINVAL? > Ok. Think it again. -EINVAL makes more sense. It's like: "hey, I know it's an I/O subchannel, but not the kind we support". > > + } > > + > > + private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA); > > + if (!private) > > + return -ENOMEM; > > + private->sch = sch; > > + dev_set_drvdata(&sch->dev, private); > > + > > + spin_lock_irq(sch->lock); > > + sch->isc = VFIO_CCW_ISC; > > + ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch); > > + spin_unlock_irq(sch->lock); > > + if (ret) > > + goto out_free; > > + > > + ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group); > > + if (ret) > > + goto out_disable; > > + > > + return 0; > > + > > +out_disable: > > + cio_disable_subchannel(sch); > > +out_free: > > + dev_set_drvdata(&sch->dev, NULL); > > + kfree(private); > > + return ret; > > +} > > + > > (...) > > > +/** > > + * vfio_ccw_sch_event - process subchannel event > > + * @sch: subchannel > > + * @process: non-zero if function is called in process context > > + * > > + * An unspecified event occurred for this subchannel. Adjust data according > > + * to the current operational state of the subchannel. Return zero when the > > + * event has been handled sufficiently or -EAGAIN when this function should > > + * be called again in process context. > > + */ > > +static int vfio_ccw_sch_event(struct subchannel *sch, int process) > > +{ > > + unsigned long flags; > > + > > + spin_lock_irqsave(sch->lock, flags); > > + if (!device_is_registered(&sch->dev)) > > + goto out_unlock; > > + > > + if (work_pending(&sch->todo_work)) > > + goto out_unlock; > > + > > + if (cio_update_schib(sch)) { > > + /* Not operational. */ > > + css_sched_sch_todo(sch, SCH_TODO_UNREG); > > + > > + /* > > + * TODO: > > + * Probably we should send the machine check to the guest. > > Yes, we should do that later on. Will user space notice that the device > is gone? (I think crw injection should be done by user space.) > Currently we lack this mechanism. I think there are many todos here. I will investigate latter. > > + */ > > + goto out_unlock; > > + } > > + > > +out_unlock: > > + spin_unlock_irqrestore(sch->lock, flags); > > + > > + return 0; > > +} > > (...) > > Looks sane in general from my POV. Thanks for the comments. -- Dong Jia