All of lore.kernel.org
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: Eric Farman <farman@linux.ibm.com>
Cc: Jared Rossi <jrossi@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [RFC PATCH v2 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR
Date: Tue, 19 May 2020 13:23:06 +0200	[thread overview]
Message-ID: <20200519132306.0a3335ed.cohuck@redhat.com> (raw)
In-Reply-To: <33909b2e-2939-9345-175b-960697d05b4e@linux.ibm.com>

On Mon, 18 May 2020 17:57:39 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> On 5/18/20 12:09 PM, Cornelia Huck wrote:
> > On Wed, 13 May 2020 16:29:30 +0200
> > Eric Farman <farman@linux.ibm.com> wrote:
> >   
> >> Hi Conny,
> >>
> >> Back in January, I suggested a small patch [1] to try to clean up
> >> the handling of HSCH/CSCH interrupts, especially as it relates to
> >> concurrent SSCH interrupts. Here is a new attempt to address this.
> >>
> >> There was some suggestion earlier about locking the FSM, but I'm not
> >> seeing any problems with that. Rather, what I'm noticing is that the
> >> flow between a synchronous START and asynchronous HALT/CLEAR have
> >> different impacts on the FSM state. Consider:
> >>
> >>     CPU 1                           CPU 2
> >>
> >>     SSCH (set state=CP_PENDING)
> >>     INTERRUPT (set state=IDLE)
> >>     CSCH (no change in state)
> >>                                     SSCH (set state=CP_PENDING)  
> > 
> > This is the transition I do not understand. When we get a request via
> > the I/O area, we go to CP_PROCESSING and start doing translations.
> > However, we only transition to CP_PENDING if we actually do a SSCH with
> > cc 0 -- which shouldn't be possible in the flow you outline... unless
> > it really is something that can be taken care of with locking (state
> > machine transitioning due to an interrupt without locking, so we go to
> > IDLE without other parts noticing.)  
> 
> I'm only going by what the (existing and my temporary) tea leaves in
> s390dbf are telling us. :)

/me makes a note to try tea leaves :)

> 
> >   
> >>     INTERRUPT (set state=IDLE)  
> 
> Part of the problem is that this is actually comprised of these elements:
> 
> if (irb_is_final && state == CP_PENDING)
> 	cp_free()
> 
> lock io_mutex
> copy irb to io_region
> unlock io_mutex
> 
> if (irb_is_final)
> 	state = IDLE
> 
> The CP_PENDING check will protect us if a SSCH is still being built at
> the time we execute this code. But if we got to CP_PENDING first
> (between fsm_irq() stacking to the workqueue and us unstacking
> vfio_ccw_sch_io_todo()), we would free an unrelated operation. (This was
> the scenario in the first version of my fix back in January.)
> 
> We can't add a CP_PENDING check after the io_mutex barrier, because if a
> second SSCH is being processed, we will hang on the lock acquisition and
> will DEFINITELY be in CP_PENDING state when we come back. But by that
> point, we will have skipped freeing the (now active) CP but are back in
> an IDLE state.

That's all very ugly :(

> 
> 
> >>                                     INTERRUPT (set state=IDLE)  
> > 
> > But taking a step back (and ignoring your series and the discussion,
> > sorry about that):  
> 
> No apologies necessary.
> 
> > 
> > We need to do something (creating a local translation of the guest's
> > channel program) that does not have any relation to the process in the
> > architecture at all, but is only something that needs to be done
> > because of what vfio-ccw is trying to do (issuing a channel program on
> > behalf of another entity.) Trying to sort that out by poking at actl
> > and fctl bits does not seem like the best way; especially as keeping
> > the bits up-to-date via STSCH is an exercise in futility.  
> 
> I am coming to strongly agree with this sentiment.

Thank you for making me feel like I'm not completely out in the weeds :)

> 
> > 
> > What about the following (and yes, I had suggested something vaguely in
> > that direction before):
> > 
> > - Detach the cp from the subchannel (or better, remove the 1:1
> >   relationship). By that I mean building the cp as a separately
> >   allocated structure (maybe embedding a kref, but that might not be
> >   needed), and appending it to a list after SSCH with cc=0. Discard it
> >   if cc!=0.
> > - Remove the CP_PENDING state. The state is either IDLE after any
> >   successful SSCH/HSCH/CSCH, or a new state in that case. But no
> >   special state for SSCH.
> > - A successful CSCH removes the first queued request, if any.
> > - A final interrupt removes the first queued request, if any.
> > 
> > Thoughts?
> >   
> 
> I'm cautiously optimistic, for exactly the reason I mention above. If we
> always expect to be in IDLE state once an interrupt arrives, we can just
> rely on determining if the interrupt is in relation to an actual
> operation we're waiting on. I'll give this a try and report back.

Great, good luck!

  reply	other threads:[~2020-05-19 11:23 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-13 14:29 [RFC PATCH v2 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR Eric Farman
2020-05-13 14:29 ` [RFC PATCH v2 1/4] vfio-ccw: Do not reset FSM state for unsolicited interrupts Eric Farman
2020-05-13 14:29 ` [RFC PATCH v2 2/4] vfio-ccw: Utilize scsw actl to serialize start operations Eric Farman
2020-05-13 14:29 ` [RFC PATCH v2 3/4] vfio-ccw: Expand SCSW usage to HALT and CLEAR Eric Farman
2020-05-13 14:29 ` [RFC PATCH v2 4/4] vfio-ccw: Clean up how to react to a failed START Eric Farman
2020-05-14 13:46 ` [RFC PATCH v2 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR Halil Pasic
2020-05-15 13:09   ` Eric Farman
2020-05-15 14:55     ` Halil Pasic
2020-05-15 15:58       ` Cornelia Huck
2020-05-15 17:41         ` Halil Pasic
2020-05-15 18:19           ` Eric Farman
2020-05-15 18:12       ` Eric Farman
2020-05-15 18:37         ` Halil Pasic
2020-05-18 22:01           ` Eric Farman
2020-05-15 19:35         ` Halil Pasic
2020-05-18 16:09 ` Cornelia Huck
2020-05-18 21:57   ` Eric Farman
2020-05-19 11:23     ` Cornelia Huck [this message]
2020-05-18 22:09   ` Halil Pasic
2020-05-19 11:36     ` Cornelia Huck
2020-05-19 12:10       ` Halil Pasic
2020-05-26  9:55 ` Cornelia Huck
2020-05-26 11:08   ` Eric Farman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200519132306.0a3335ed.cohuck@redhat.com \
    --to=cohuck@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=jrossi@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pasic@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.