From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-s390-owner@vger.kernel.org>
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:36722 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
        by vger.kernel.org with ESMTP id S1726428AbfGIV1w (ORCPT
        <rfc822;linux-s390@vger.kernel.org>); Tue, 9 Jul 2019 17:27:52 -0400
Subject: Re: [RFC v2 4/5] vfio-ccw: Don't call cp_free if we are processing a
 channel program
References: <cover.1562616169.git.alifm@linux.ibm.com>
 <1405df8415d3bff446c22753d0e9b91ff246eb0f.1562616169.git.alifm@linux.ibm.com>
 <20190709121613.6a3554fa.cohuck@redhat.com>
 <45ad7230-3674-2601-af5b-d9beef9312be@linux.ibm.com>
 <20190709162142.789dd605.pasic@linux.ibm.com>
From: Farhan Ali <alifm@linux.ibm.com>
Message-ID: <87f7a37f-cc34-36fb-3a33-309e33bbbdde@linux.ibm.com>
Date: Tue, 9 Jul 2019 17:27:47 -0400
MIME-Version: 1.0
In-Reply-To: <20190709162142.789dd605.pasic@linux.ibm.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-s390-owner@vger.kernel.org
List-ID: <linux-s390.vger.kernel.org>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>, farman@linux.ibm.com, linux-s390@vger.kernel.org, kvm@vger.kernel.org


On 07/09/2019 10:21 AM, Halil Pasic wrote:
> On Tue, 9 Jul 2019 09:46:51 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>>
>>
>> On 07/09/2019 06:16 AM, Cornelia Huck wrote:
>>> On Mon,  8 Jul 2019 16:10:37 -0400
>>> Farhan Ali <alifm@linux.ibm.com> wrote:
>>>
>>>> There is a small window where it's possible that we could be working
>>>> on an interrupt (queued in the workqueue) and setting up a channel
>>>> program (i.e. allocating memory, pinning pages, translating addresses).
>>>> This can lead to allocating and freeing the channel program at the
>>>> same time and can cause memory corruption.
>>>>
>>>> Let's not call cp_free if we are currently processing a channel program.
>>>> The only way we know for sure that we don't have a thread setting
>>>> up a channel program is when the state is set to VFIO_CCW_STATE_CP_PENDING.
>>>
>>> Can we pinpoint a commit that introduced this bug, or has it been there
>>> since the beginning?
>>>
>>
>> I think the problem was always there.
>>
> 
> I think it became relevant with the async stuff, because after the async
> stuff was added we started getting solicited interrupts that do not
> indicate that a channel program is done. At least this is how I remember
> the discussion.
> 
>>>>
>>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>>>> index 4e3a903..0357165 100644
>>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>>> @@ -92,7 +92,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>>>    		     (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>>    	if (scsw_is_solicited(&irb->scsw)) {
>>>>    		cp_update_scsw(&private->cp, &irb->scsw);
>>>> -		if (is_final)
>>>> +		if (is_final && private->state == VFIO_CCW_STATE_CP_PENDING)
> 
> Ain't private->state potentially used by multiple threads of execution?

Yes.

One path I can think of is a machine check from the host, which will 
ultimately call the vfio_ccw_sch_event callback, which could set the 
state to NOT_OPER or IDLE.

> Do we need to use atomic operations or external synchronization to avoid
> this being another gamble? Or am I missing something?

I think we should probably consider atomic operations for synchronizing 
the state (could that be a separate add-on patch?).

But for preventing two threads from stomping on the cp, the check should 
be enough, unless I am missing something?
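
To make the atomic-operations idea concrete, here is a minimal userspace
sketch using C11 atomics. It is not the driver's code: the sketch_* names,
the extra CP_FREEING state and the cp_initialized flag are all assumptions
made for illustration, and kernel code would use its own primitives rather
than <stdatomic.h>. The idea is that the interrupt side frees the channel
program only after it atomically claims the CP_PENDING state, and the
submitting side can only start from IDLE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's states; not the kernel definitions. */
enum sketch_state {
	SKETCH_STATE_NOT_OPER,
	SKETCH_STATE_IDLE,
	SKETCH_STATE_CP_PROCESSING,
	SKETCH_STATE_CP_PENDING,
	SKETCH_STATE_CP_FREEING,	/* assumed transient state, not in the driver */
};

struct sketch_private {
	_Atomic int state;
	bool cp_initialized;	/* stands in for the channel program */
};

/* Stand-in for cp_free(): release the channel program. */
static void sketch_cp_free(struct sketch_private *p)
{
	p->cp_initialized = false;
}

/*
 * Submitting side: only one thread can move IDLE -> CP_PROCESSING,
 * so only one thread at a time sets up a channel program.
 */
static bool sketch_try_start(struct sketch_private *p)
{
	int expected = SKETCH_STATE_IDLE;

	assert(p != NULL);
	return atomic_compare_exchange_strong(&p->state, &expected,
					      SKETCH_STATE_CP_PROCESSING);
}

/*
 * Interrupt side: free the channel program only if we can atomically
 * move CP_PENDING -> CP_FREEING. If the state is anything else (for
 * example CP_PROCESSING or NOT_OPER), the exchange fails and the
 * channel program is left alone. IDLE is published only after the free.
 */
static bool sketch_io_todo(struct sketch_private *p, bool is_final)
{
	int expected = SKETCH_STATE_CP_PENDING;

	assert(p != NULL);
	if (!is_final)
		return false;
	if (!atomic_compare_exchange_strong(&p->state, &expected,
					    SKETCH_STATE_CP_FREEING))
		return false;
	sketch_cp_free(p);
	atomic_store(&p->state, SKETCH_STATE_IDLE);
	return true;
}
```

With this shape the check and the state change are a single step, so a
concurrent state update (such as the machine check path above) either
happens before the exchange and makes it fail, or finds the state already
claimed. Whether the real state machine can absorb an extra transient
state is a separate question.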

> 
>>>>    			cp_free(&private->cp);
>>>>    	}
>>>>    	mutex_lock(&private->io_mutex);
>>>
>>> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
>>>
>>>
>> Thanks for reviewing.
>>
>> Thanks
>> Farhan
> 
>