From: Cornelia Huck <cohuck@redhat.com>
To: Farhan Ali <alifm@linux.ibm.com>
Cc: kvm@vger.kernel.org, linux-s390@vger.kernel.org,
	farman@linux.ibm.com, pasic@linux.ibm.com, pmorel@linux.ibm.com
Subject: Re: [RFC v2 2/3] vfio-ccw: Prevent quiesce function going into an infinite loop
Date: Mon, 15 Apr 2019 10:13:32 +0200
Message-ID: <20190415101332.7ebbe5ad.cohuck@redhat.com>
In-Reply-To: <396cde69-5c1d-b9e5-aaa2-248cf91e6f60@linux.ibm.com>

On Fri, 12 Apr 2019 10:38:50 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 04/12/2019 04:10 AM, Cornelia Huck wrote:
> > On Thu, 11 Apr 2019 16:30:44 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >   
> >> On 04/11/2019 12:24 PM, Cornelia Huck wrote:  
> >>> On Mon,  8 Apr 2019 17:05:32 -0400
> >>> Farhan Ali <alifm@linux.ibm.com> wrote:

> >>> Looking at the possible return codes:
> >>> * -ENODEV -> device is not operational anyway, in theory you should
> >>>      not even need to bother with disabling the subchannel
> >>> * -EIO -> we've run out of retries, and the subchannel still is not
> >>>     idle; I'm not sure if we could do anything here, as disable is
> >>>     unlikely to work, either  

(...)

> Thinking a little bit more about EIO, if the return code is EIO then it 
> means we have exhausted all our options with cancel_halt_clear and the 
> subchannel/device is still status pending, right?

Yes.

> 
> I think we should still continue to try and disable the subchannel,
> because otherwise the subchannel/device could at some point come back
> and bite us. So we really should protect the system from this
> behavior.

I think trying to disable the subchannel does not really hurt, but I
fear it won't succeed in that case...

> 
> I think for EIO we should log an error message, but still try to 
> continue with disabling the subchannel. What do you or others think?

Logging an error may be useful (it's really fouled up at that time), but...

>
> >>  
> >>>> +		flush_workqueue(vfio_ccw_work_q);
> >>>> +		spin_lock_irq(sch->lock);
> >>>>    		ret = cio_disable_subchannel(sch);

...there's a good chance that we'd get -EBUSY here, which would keep us
in the loop. We probably need to break out after we got -EIO from
cancel_halt_clear, regardless of which return code we get from the
disable.

(It will be "interesting" to see what happens with such a stuck
subchannel in the calling code; but I don't really see many options.
Panic seems way too strong; maybe mark the subchannel as "broken; no
idea how to fix"? But that would be a follow-on patch; I think if we
avoid the endless loop here, this patch is a real improvement and
should just go in.)
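
To make the suggested control flow concrete, here is a rough userspace
sketch of the loop shape described above. It is not the actual
vfio-ccw code: struct fake_subchannel, fake_cancel_halt_clear(),
fake_disable_subchannel() and quiesce_sketch() are hypothetical
stand-ins for the real subchannel and cio helpers, and the completion
wait and flush_workqueue() are left out. Returning the disable's
return code after the break is also just an assumption; the calling
code may want something else.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the subchannel; not the kernel's struct. */
struct fake_subchannel {
	int chc_ret;       /* value the fake cancel_halt_clear returns */
	int disable_busy;  /* -EBUSY count for disable; negative = forever */
	int disable_calls; /* how often disable was attempted */
};

/* Stub: pretends to be cio_cancel_halt_clear(). */
static int fake_cancel_halt_clear(struct fake_subchannel *sch)
{
	return sch->chc_ret;
}

/* Stub: pretends to be cio_disable_subchannel(). */
static int fake_disable_subchannel(struct fake_subchannel *sch)
{
	sch->disable_calls++;
	if (sch->disable_busy < 0)
		return -EBUSY;
	if (sch->disable_busy > 0) {
		sch->disable_busy--;
		return -EBUSY;
	}
	return 0;
}

/* Simplified quiesce loop: give up once cancel_halt_clear says -EIO. */
static int quiesce_sketch(struct fake_subchannel *sch)
{
	int chc, ret;

	assert(sch != NULL);

	ret = fake_disable_subchannel(sch);
	if (ret != -EBUSY)
		return ret;

	do {
		chc = fake_cancel_halt_clear(sch);

		/* Still try the disable; it does not hurt. */
		ret = fake_disable_subchannel(sch);

		/*
		 * Out of retries: leave the loop no matter what the
		 * disable returned, so -EBUSY cannot keep us here.
		 */
		if (chc == -EIO)
			break;
	} while (ret == -EBUSY);

	return ret;
}
```

With a subchannel that always reports -EBUSY on disable and -EIO on
cancel_halt_clear, the sketch attempts the disable twice (once before
the loop, once inside it) and then returns instead of spinning.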

> >>>>    	} while (ret == -EBUSY);
> >>>>    out_unlock:  
> >>>
> >>>      
> >>  
> > 
> >   
> 
