References: <20170830163609.50260-1-pasic@linux.vnet.ibm.com>
 <20170830163609.50260-3-pasic@linux.vnet.ibm.com>
 <20170831111953.242ddc28.cohuck@redhat.com>
From: Halil Pasic
Date: Thu, 31 Aug 2017 12:41:05 +0200
MIME-Version: 1.0
In-Reply-To: <20170831111953.242ddc28.cohuck@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
To: Cornelia Huck
Cc: Dong Jia Shi, Pierre Morel, qemu-devel@nongnu.org

On 08/31/2017 11:19 AM, Cornelia Huck wrote:
> On Wed, 30 Aug 2017 18:36:02 +0200
> Halil Pasic wrote:
>
>> According to the POP a start subchannel instruction (SSCH) returning with
>> cc 1 implies that the subchannel was status pending when SSCH executed.
>>
>> Due to a somewhat confusing error handling, where error codes are mapped
>> to cc value, sane looking error codes result in non AR compliant
>> behavior.
>>
>> Let's fix this! Instead of cc 1 we use cc 3 which means device not
>> operational, and is much closer to the truth in the given cases.
>>
>> Signed-off-by: Halil Pasic
>> Acked-by: Pierre Morel
>> ---
>>
>> This patch turned out quite controversial. We did not reach a consensus
>> during the internal review.
>>
>> The most of the discussion revolved around the ORB flag which
>> architecturally must be supported, but are currently not supported by
>> vfio-ccw (not yet, or can't be). The idea showing the most promise for
>> consensus was to handle this via device status (along the lines better a
>> strange acting device than a non-conform machine) but since it's a
>> radical change we decided to first discuss upstream and then do whatever
>> needs to be done.
>> ---
>>  hw/s390x/css.c      | 15 ++++++---------
>>  hw/s390x/s390-ccw.c |  2 +-
>>  2 files changed, 7 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>> index a50fb0727e..0822538cde 100644
>> --- a/hw/s390x/css.c
>> +++ b/hw/s390x/css.c
>> @@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>       */
>>      if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>>          !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
>> -        return -EINVAL;
>> +        return -ENODEV;
>
> This feels wrong. If we don't support this yet, doing something like a
> channel-program check or an operand exception feels closer to the
> architecture than indicating a gone device.

I disagree: a channel-program check, an operand exception, or cc 1 (the
current solution) would each make the machine obviously non-conforming.
My train of thought was that, architecturally, you can lose the
connection to the device at any time (you can't prohibit admins from
pulling cables or smashing equipment with a 10kg hammer). Also, from the
guest OS perspective, I think saying device not operational could
provoke a proper reaction from the guest OS: that is, just give up on
the device.

The things you propose would, in my opinion, put the blame on the guest
OS driver (for making non-conforming requests), so in that case it would
make sense to give up on the driver (but the same driver could work
wonderfully with, let's say, a fully emulated device).

As I have stated in the cover letter of this patch, I would find setting
device status even better, but I wanted to discuss first before going
from setting cc to something else. Setting cc was not my idea in the
first place (AFAIK the -EINVAL here effectively triggers cc 1).

>
>>      }
>>
>>      ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
>> @@ -1046,16 +1046,13 @@ static int sch_handle_start_func_passthrough(SubchDev *sch)
>>          break;
>>      case -ENODEV:
>>          break;
>> +    case -EFAULT:
>> +        break;
>>      case -EACCES:
>>          /* Let's reflect an inaccessible host device by cc 3. */
>> -        ret = -ENODEV;
>> -        break;
>>      default:
>> -        /*
>> -         * All other return codes will trigger a program check,
>> -         * or set cc to 1.
>> -         */
>> -        break;
>> +        /* Let's make all other return codes map to cc 3. */
>> +        ret = -ENODEV;
>
> Why? This feels wrong. For those cases where we want to signal an error
> but cc 1 is conceptually wrong, either an operand exception (for very
> few cases) or a channel-program check feels more in line with the
> architecture.

Do you mean that the original code feels wrong? I keep the program check
for -EFAULT (that's why it's added) and just change cc 1 to cc 3 for the
error codes that are not explicitly handled (reason stated in the commit
message).

>
> That's a general problem with doing stuff in the hypervisor: We have
> sets of internal problems that obviously don't show up in the
> architecture, and we can either handle them internally or use what the
> architecture offers for problem signaling. z/VM has probably faced the
> same problems :)

I agree.

>
>>      };
>>
>>      return ret;
>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>>      if (sch->do_subchannel_work) {
>>          return sch->do_subchannel_work(sch);
>>      } else {
>> -        return -EINVAL;
>> +        return -ENODEV;
>
> This rather seems like a job for an assert? If we don't have a function
> for the 'asynchronous' handling of the various functions assigned for a
> subchannel, that looks like an internal error.
>

IMHO it depends. Aborting QEMU is heavy-handed, and as a user I would
not be happy about it. But it certainly is an assert situation. We can
look for an even better solution, but I think this is an improvement.
The logic behind it is that the device is broken and can't be talked to
properly.

>>      }
>>  }
>>
>> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
>> index 8614dda6f8..2b0741741c 100644
>> --- a/hw/s390x/s390-ccw.c
>> +++ b/hw/s390x/s390-ccw.c
>> @@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
>>      if (cdc->handle_request) {
>>          return cdc->handle_request(orb, scsw, data);
>>      } else {
>> -        return -ENOSYS;
>> +        return -ENODEV;
>
> If we get here, it means that we called a request handler (which is
> only done for the passthrough variety) without having assigned a
> request handler beforehand. This also looks like an internal error to
> me...
>

Certainly. Again, I was not the one who wrote or accepted the original
code. My previous comment about whether or not to assert applies here as
well.

>>      }
>>  }
>>
>
>
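
To make the mapping discussed in this thread concrete, here is a minimal,
self-contained sketch. It is not the QEMU code. The names SschOutcome,
ssch_outcome_for() and normalize_passthrough_ret() are hypothetical, and
only the return codes named in the thread are modeled (0, -ENODEV,
-EFAULT, -EACCES, and "everything else"). Cases hidden by the hunk
context are not covered.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome type: a condition code or a program check. */
typedef enum {
    OUTCOME_CC0 = 0,            /* function started */
    OUTCOME_CC1 = 1,            /* status pending, per the commit message */
    OUTCOME_CC3 = 3,            /* device not operational */
    OUTCOME_PROGRAM_CHECK = 100 /* not a cc; reported as a program check */
} SschOutcome;

/*
 * Assumed caller-side mapping, as described in the thread: -ENODEV gives
 * cc 3, -EFAULT gives a program check, and any other error code ends up
 * as cc 1.
 */
static SschOutcome ssch_outcome_for(int ret)
{
    assert(ret <= 0); /* return codes are 0 or a negative errno value */

    switch (ret) {
    case 0:
        return OUTCOME_CC0;
    case -ENODEV:
        return OUTCOME_CC3;
    case -EFAULT:
        return OUTCOME_PROGRAM_CHECK;
    default:
        return OUTCOME_CC1;
    }
}

/*
 * Sketch of what the patched switch does to the request result:
 * -EFAULT and -ENODEV pass through, -EACCES and every code that is not
 * explicitly handled are folded into -ENODEV.
 */
static int normalize_passthrough_ret(int ret)
{
    switch (ret) {
    case 0:
    case -ENODEV:
    case -EFAULT:
        return ret;
    case -EACCES:
    default:
        return -ENODEV;
    }
}
```

With this sketch, ssch_outcome_for(-EINVAL) yields cc 1, which is the
behavior the patch objects to, while
ssch_outcome_for(normalize_passthrough_ret(-EINVAL)) yields cc 3. The
-EFAULT path still ends in a program check either way.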