From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:36722 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
        by vger.kernel.org with ESMTP id S1726428AbfGIV1w (ORCPT );
        Tue, 9 Jul 2019 17:27:52 -0400
Subject: Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a channel program
References: <1405df8415d3bff446c22753d0e9b91ff246eb0f.1562616169.git.alifm@linux.ibm.com>
        <20190709121613.6a3554fa.cohuck@redhat.com>
        <45ad7230-3674-2601-af5b-d9beef9312be@linux.ibm.com>
        <20190709162142.789dd605.pasic@linux.ibm.com>
From: Farhan Ali
Message-ID: <87f7a37f-cc34-36fb-3a33-309e33bbbdde@linux.ibm.com>
Date: Tue, 9 Jul 2019 17:27:47 -0400
MIME-Version: 1.0
In-Reply-To: <20190709162142.789dd605.pasic@linux.ibm.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-s390-owner@vger.kernel.org
List-ID:
To: Halil Pasic
Cc: Cornelia Huck, farman@linux.ibm.com, linux-s390@vger.kernel.org, kvm@vger.kernel.org

On 07/09/2019 10:21 AM, Halil Pasic wrote:
> On Tue, 9 Jul 2019 09:46:51 -0400
> Farhan Ali wrote:
>
>>
>>
>> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
>>> On Mon, 8 Jul 2019 16:10:37 -0400
>>> Farhan Ali wrote:
>>>
>>>> There is a small window where it's possible that we could be working
>>>> on an interrupt (queued in the workqueue) and setting up a channel
>>>> program (i.e allocating memory, pinning pages, translating address).
>>>> This can lead to allocating and freeing the channel program at the
>>>> same time and can cause memory corruption.
>>>>
>>>> Let's not call cp_free if we are currently processing a channel program.
>>>> The only way we know for sure that we don't have a thread setting
>>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
>>>
>>> Can we pinpoint a commit that introduced this bug, or has it been there
>>> since the beginning?
>>>
>>
>> I think the problem was always there.
>>
>
> I think it became relevant with the async stuff. Because after the async
> stuff was added we start getting solicited interrupts that are not about
> channel program is done. At least this is how I remember the discussion.
>
>>>>
>>>> Signed-off-by: Farhan Ali
>>>> ---
>>>> drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>>>> index 4e3a903..0357165 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>>> (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>> if (scsw_is_solicited(&irb->scsw)) {
>>>> cp_update_scsw(&private->cp, &irb->scsw);
>>>> - if (is_final)
>>>> + if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
>
> Ain't private->state potentially used by multiple threads of execution?

yes

One of the paths I can think of is a machine check from the host which
will ultimately call vfio_ccw_sch_event callback which could set state
to NOT_OPER or IDLE.

> Do we need to use atomic operations or external synchronization to avoid
> this being another gamble? Or am I missing something?

I think we probably should think about atomic operations for
synchronizing the state (and it could be a separate add on patch?).

But for preventing 2 threads from stomping on the cp the check should be
enough, unless I am missing something?

>
>>>> cp_free(&private->cp);
>>>> }
>>>> mutex_lock(&private->io_mutex);
>>>
>>> Reviewed-by: Cornelia Huck
>>>
>>>
>> Thanks for reviewing.
>>
>> Thanks
>> Farhan
>
>
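
For illustration only, here is a minimal userspace sketch of what an atomic
read of the state in the interrupt path might look like. It assumes C11
<stdatomic.h> as a stand-in for whatever kernel primitive would actually be
chosen, and every sketch_* name below is hypothetical and not part of
vfio-ccw. It only makes the individual loads and stores well defined; it
does not close the window between checking the state and freeing the cp.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver states; not the kernel definitions. */
enum sketch_state {
	SKETCH_STATE_NOT_OPER,
	SKETCH_STATE_IDLE,
	SKETCH_STATE_CP_PROCESSING,
	SKETCH_STATE_CP_PENDING,
};

/* Hypothetical model of the per-subchannel private data. */
struct sketch_private {
	_Atomic int state;
};

/*
 * Interrupt (workqueue) path: report whether the channel program may be
 * freed. Mirrors the patched condition, but reads the state with an
 * atomic load instead of a plain access.
 */
static bool sketch_may_free_cp(struct sketch_private *p, bool is_final)
{
	return is_final &&
	       atomic_load(&p->state) == SKETCH_STATE_CP_PENDING;
}

/*
 * Event path (e.g. after a machine check): publish NOT_OPER or IDLE with
 * an atomic store so a concurrent reader sees either the old or the new
 * value, never a torn one.
 */
static void sketch_sch_event(struct sketch_private *p, bool operational)
{
	atomic_store(&p->state,
		     operational ? SKETCH_STATE_IDLE : SKETCH_STATE_NOT_OPER);
}
```

With this model, a final interrupt seen while the state is CP_PROCESSING
leaves the cp alone, and only a final interrupt in CP_PENDING reports that
it may be freed.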