public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Eric Farman <farman@linux.ibm.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: Jared Rossi <jrossi@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [RFC PATCH v3 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR
Date: Tue, 30 Jun 2020 15:10:01 -0400	[thread overview]
Message-ID: <02b20850-55cd-331a-8fb5-e9bec3386c2a@linux.ibm.com> (raw)
In-Reply-To: <20200629165629.24f21585.cohuck@redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 3894 bytes --]



On 6/29/20 10:56 AM, Cornelia Huck wrote:
> On Wed, 17 Jun 2020 07:24:17 -0400
> Eric Farman <farman@linux.ibm.com> wrote:
> 
>> On 6/16/20 3:50 PM, Eric Farman wrote:
>>> Let's continue our discussion of the handling of vfio-ccw interrupts.
>>>
>>> The initial fix [1] relied upon the interrupt path's examination of the
>>> FSM state, and freeing all resources if it were CP_PENDING. But the
>>> interface used by HALT/CLEAR SUBCHANNEL doesn't affect the FSM state.
>>> Consider this sequence:
>>>
>>>     CPU 1                           CPU 2
>>>     CLEAR (state=IDLE/no change)
>>>                                     START [2]
>>>     INTERRUPT (set state=IDLE)
>>>                                     INTERRUPT (set state=IDLE)
>>>
>>> This translates to a couple of possible scenarios:
>>>
>>>  A) The START gets a cc2 because of the outstanding CLEAR, -EBUSY is
>>>     returned, resources are freed, and state remains IDLE
>>>  B) The START gets a cc0 because the CLEAR has already presented an
>>>     interrupt, and state is set to CP_PENDING
>>>
>>> If the START gets a cc0 before the CLEAR INTERRUPT (stacked onto a
>>> workqueue by the IRQ context) gets a chance to run, then the INTERRUPT
>>> will release the channel program memory prematurely. If the two
>>> operations run concurrently, then the FSM state set to CP_PROCESSING
>>> will prevent the cp_free() from being invoked. But the io_mutex
>>> boundary on that path will pause itself until the START completes,
>>> and then allow the FSM to be reset to IDLE without considering the
>>> outstanding START. Neither scenario would be considered good.
>>>
>>> Having said all of that, in v2 Conny suggested [3] the following:
>>>   
>>>> - Detach the cp from the subchannel (or better, remove the 1:1
>>>>   relationship). By that I mean building the cp as a separately
>>>>   allocated structure (maybe embedding a kref, but that might not be
>>>>   needed), and appending it to a list after SSCH with cc=0. Discard it
>>>>   if cc!=0.
>>>> - Remove the CP_PENDING state. The state is either IDLE after any
>>>>   successful SSCH/HSCH/CSCH, or a new state in that case. But no
>>>>   special state for SSCH.
>>>> - A successful CSCH removes the first queued request, if any.
>>>> - A final interrupt removes the first queued request, if any.  
>>>
>>> What I have implemented here is basically this, with a few changes:
>>>
>>>  - I don't queue cp's. Since there should only be one START in process
>>>    at a time, and HALT/CLEAR doesn't build a cp, I didn't see a pressing
>>>    need to introduce that complexity.
>>>  - Furthermore, while I initially made a separately allocated cp, adding
>>>    an alloc for a cp on each I/O AND moving the guest_cp alloc from the
>>>    probe path to the I/O path seems excessive. So I implemented a
>>>    "started" flag to the cp, set after a cc0 from the START, and examine
>>>    that on the interrupt path to determine whether cp_free() is needed.  
>>
>> FYI... After a day or two of running, I sprung a kernel debug oops for
>> list corruption in ccwchain_free(). I'm going to blame this piece, since
>> it was the last thing I changed and I hadn't come across any such damage
>> since v2. So either "started" is a bad idea, or a broken one. Or both. :)
> 
> Have you come to any conclusion wrt 'started'? Not wanting to generate
> stress, just asking :)
> 

I've talked myself out of it, and gone back to your original proposal of
a separately allocated cp. (Still no queuing.) Too early to pass
judgement though.

Yesterday, when running with a cp_free() call after a CSCH, I was
getting all sorts of errors very early on, so at the moment I've pulled
that back out again. If it looks good in this form, I'll put that as a
separate patch and write up some doc for a discussion on that point.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

  reply	other threads:[~2020-06-30 19:10 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-16 19:50 [RFC PATCH v3 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2020-06-16 19:50 ` [RFC PATCH v3 1/3] vfio-ccw: Indicate if a channel_program is started Eric Farman
2020-06-17 23:11   ` Halil Pasic
2020-06-18 11:47     ` Eric Farman
2020-06-16 19:50 ` [RFC PATCH v3 2/3] vfio-ccw: Remove the CP_PENDING FSM state Eric Farman
2020-06-16 19:50 ` [RFC PATCH v3 3/3] vfio-ccw: Check workqueue before doing START Eric Farman
2020-06-19 11:40   ` Cornelia Huck
2020-06-17 11:24 ` [RFC PATCH v3 0/3] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2020-06-29 14:56   ` Cornelia Huck
2020-06-30 19:10     ` Eric Farman [this message]
2020-06-19 11:21 ` Cornelia Huck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=02b20850-55cd-331a-8fb5-e9bec3386c2a@linux.ibm.com \
    --to=farman@linux.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=jrossi@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pasic@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox