public inbox for kvm@vger.kernel.org
From: Eric Farman <farman@linux.ibm.com>
To: Farhan Ali <alifm@linux.ibm.com>, cohuck@redhat.com
Cc: pasic@linux.ibm.com, linux-s390@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [RFC v1 1/1] vfio-ccw: Don't call cp_free if we are processing a channel program
Date: Fri, 21 Jun 2019 13:40:58 -0400
Message-ID: <2d9c04ba-ee50-2f9b-343a-5109274ff52d@linux.ibm.com>
In-Reply-To: <581d756d-7418-cd67-e0e8-f9e4fe10b22d@linux.ibm.com>



On 6/21/19 10:17 AM, Farhan Ali wrote:
> 
> 
> On 06/20/2019 04:27 PM, Eric Farman wrote:
>>
>>
>> On 6/20/19 3:40 PM, Farhan Ali wrote:
>>> There is a small window where it's possible that an interrupt can
>>> arrive and can call cp_free, while we are still processing a channel
>>> program (i.e allocating memory, pinnging pages, translating
>>
>> s/pinnging/pinning/
>>
>>> addresses etc). This can lead to allocating and freeing at the same
>>> time and can cause memory corruption.
>>>
>>> Let's not call cp_free if we are currently processing a channel program.
>>
>> The check around this cp_free() call is for a solicited interrupt, so
>> it's presumably in response to a SSCH we issued.  But if we're still
>> processing a CP, then we hadn't issued the SSCH to the hardware yet.  So
>> what is this interrupt for?  Do the contents of irb.cpa provide any
>> clues, perhaps if it's in the current cp or for someone else?
>>
> 
> I don't think the interrupt is in response to an ssch but rather due to
> a csch/hsch.
> 
>>>
>>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>>> ---
>>>
>>> I have been running my test overnight with this patch and I haven't
>>> seen the stack traces that I mentioned earlier. I would like
>>> to get some reviews on this, and also to know whether this is the
>>> right thing to do.
>>>
>>> Thanks
>>> Farhan
>>>
>>>   drivers/s390/cio/vfio_ccw_drv.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
>>> index 66a66ac..61ece3f 100644
>>> --- a/drivers/s390/cio/vfio_ccw_drv.c
>>> +++ b/drivers/s390/cio/vfio_ccw_drv.c
>>> @@ -88,7 +88,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
>>>                (SCSW_ACTL_DEVACT | SCSW_ACTL_SCHACT));
>>>       if (scsw_is_solicited(&irb->scsw)) {
>>>           cp_update_scsw(&private->cp, &irb->scsw);
>>
>> As I alluded to earlier, do we know this irb is for this cp?  If not,
>> what does this function end up putting in the scsw?
>>
>>> -        if (is_final)
>>> +        if (is_final && private->state != VFIO_CCW_STATE_CP_PROCESSING)
>>
>> In looking at how we set this state, and how we exit it, I see we do:
>>
>> if SSCH got CC0, CP_PROCESSING -> CP_PENDING
>> if SSCH got !CC0, CP_PROCESSING -> IDLE
>>
>> While the first scenario happens immediately after the SSCH instruction,
>> I guess it could be just tiny enough, like the io_trigger FSM patch I
>> sent a few weeks ago.
>>
>> Meanwhile, the latter happens way after we return from the jump table.
>> So that scenario leaves considerable time for such an interrupt to
>> occur, though I don't understand why it would if we got a CC(1-3) on the
>> SSCH.
>>
>> And anyway, the return from fsm_io_helper() in that case will also call
>> cp_free().  So why does the cp->initialized check provide protection
>> from a double-free in that direction, but not here?  I'm confused.
> 
> I have a theory where I think it's possible to have 2 different threads
> executing cp_free
> 
> If we start with private->state == IDLE and the guest issues a
> clear/halt and then an ssch
> 
> - clear/halt will be issued to hardware, and if succeeds we will return
> cc=0 to guest
> 
> - the guest can then issue ssch

It can issue whatever it wants, but shouldn't the SSCH get a CC2 until
the halt/clear pending state is cleared?  Hrm, fsm_io_request() doesn't
check for that.  Rather, it (via fsm_io_helper()) relies on the CC2
being signalled by the SSCH issued to the device.  A lot happens before
we get to that point.

I'm wondering if there's a way we could/should return the SSCH to the
guest here, before we do any processing.  After all, until the interrupt
on the workqueue is processed, we are busy.
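
To make the idea concrete, here is a minimal standalone sketch of an
early busy check.  Everything in it is hypothetical (the model_* names,
the irq_queued flag, the use of -EBUSY to stand in for a CC2); it is a
toy model of the FSM, not the driver's code and not a proposed patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the FSM states discussed in this thread. */
enum model_state { MODEL_IDLE, MODEL_CP_PROCESSING, MODEL_CP_PENDING };

struct model_private {
	enum model_state state;
	bool irq_queued;	/* assumed flag: an interrupt sits on the workqueue */
};

/* An interrupt (e.g. for a csch/hsch) arrived and was queued. */
void model_irq_queued(struct model_private *p)
{
	assert(p != NULL);
	p->irq_queued = true;
}

/* The workqueue finished handling the interrupt. */
void model_irq_handled(struct model_private *p)
{
	assert(p != NULL);
	p->irq_queued = false;
	p->state = MODEL_IDLE;
}

/*
 * Reject a start request before any channel program work is done.
 * Returns 0 when the request may proceed, -EBUSY (standing in for
 * a CC2 to the guest) when we are still busy.
 */
int model_io_request(struct model_private *p)
{
	assert(p != NULL);
	if (p->state != MODEL_IDLE || p->irq_queued)
		return -EBUSY;
	p->state = MODEL_CP_PROCESSING;
	return 0;
}
```

In this model a second request, or a request made while an interrupt is
still queued, is turned away before anything like cp_init() would run.
The model is single-threaded; real code would still need the flag and
state to be updated under whatever serialization the driver already has.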

> 
> - we get an interrupt for csch/hsch and we queue the interrupt in the
> workqueue
> 
> - we start processing the ssch and then at the same time another cpu
> could be working on the interrupt
>
> 
> Thread 1                                        Thread 2
> --------                                        --------
> 
> fsm_io_request                                  vfio_ccw_sch_io_todo
>     cp_init                                         cp_free
>     cp_prefetch
>     fsm_io_helper
>         cp_free
> 
> 
> 
> The test that I am trying is with a guest running an fio workload, while
> at the same time stressing the error recovery path in the guest. So
> there is a lot of ssch and lot of csch.
> 
> Of course I don't think my patch completely solves the problem, I think
> it just makes the window narrower. I just wanted to get a discussion
> started :)
> 
> 
> Now that I am thinking more about it, I think we might have to protect
> cp with its own mutex.

That seems like a big hammer, and I wonder if the existing SCHIB/FSM/CP
state data doesn't already provide that information to us.  But I need
to wander around some code before I can say.
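
As a rough illustration of leaning on state data instead of a new lock,
the sketch below frees the cp from the interrupt path only when the FSM
says an SSCH is actually outstanding.  The sketch_* names and the choice
of CP_PENDING as the condition are assumptions for the example; I have
not verified that this is sufficient in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the driver's real types. */
enum sketch_state { SKETCH_IDLE, SKETCH_CP_PROCESSING, SKETCH_CP_PENDING };

struct sketch_cp {
	bool initialized;
	int frees;		/* counts real frees, for the example only */
};

/* Mirrors the idea of the cp->initialized guard: a second free is a no-op. */
void sketch_cp_free(struct sketch_cp *cp)
{
	assert(cp != NULL);
	if (!cp->initialized)
		return;
	cp->initialized = false;
	cp->frees++;
}

/*
 * Interrupt path: only touch the cp if a channel program was really
 * started (assumed here to mean state == CP_PENDING).
 */
void sketch_irq_todo(enum sketch_state state, bool is_final,
		     struct sketch_cp *cp)
{
	assert(cp != NULL);
	if (is_final && state == SKETCH_CP_PENDING)
		sketch_cp_free(cp);
}
```

With this shape, a final interrupt that arrives while the state is still
CP_PROCESSING leaves the cp alone.  It only models the decision; it says
nothing about whether reading the state from the workqueue is race-free
without further ordering, which is the part I still need to look at.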

> 
> Thanks
> Farhan
> 
> 
>>
>>>               cp_free(&private->cp);
>>>       }
>>>       mutex_lock(&private->io_mutex);
>>>
>>
> 

