linux-s390.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: Halil Pasic <pasic@linux.vnet.ibm.com>
Cc: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, borntraeger@de.ibm.com,
	pmorel@linux.vnet.ibm.com
Subject: Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
Date: Thu, 29 Mar 2018 14:32:55 +0200	[thread overview]
Message-ID: <20180329143255.5ce349f1.cohuck@redhat.com> (raw)
In-Reply-To: <493b19b7-fa86-0220-7427-be519f1b40ad@linux.vnet.ibm.com>

On Tue, 27 Mar 2018 17:27:54 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:

> On 03/27/2018 12:07 PM, Cornelia Huck wrote:
> > On Tue, 27 Mar 2018 15:51:14 +0800
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> >   
> >> * Cornelia Huck <cohuck@redhat.com> [2018-03-26 15:59:02 +0200]:
> >>
> >> [...]
> >>  
> >>>> @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>  
> >>>>  		io_region->ret_code = cp_prefetch(&private->cp);
> >>>>  		if (io_region->ret_code) {
> >>>> +			trace_vfio_ccw_cp_prefetch_failed(get_schid(private),
> >>>> +							  io_region->ret_code);
> >>>>  			cp_free(&private->cp);
> >>>>  			goto err_out;
> >>>>  		}
> >>>> @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>  		/* Start channel program and wait for I/O interrupt. */
> >>>>  		io_region->ret_code = fsm_io_helper(private);
> >>>>  		if (io_region->ret_code) {
> >>>> +			trace_vfio_ccw_ssch_failed(get_schid(private),
> >>>> +						   io_region->ret_code);
> >>>>  			cp_free(&private->cp);
> >>>>  			goto err_out;
> >>>>  		}
> >>>> @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>  	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> >>>>  		/* XXX: Handle halt. */
> >>>>  		io_region->ret_code = -EOPNOTSUPP;
> >>>> +		trace_vfio_ccw_halt(get_schid(private));
> >>>>  		goto err_out;
> >>>>  	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
> >>>>  		/* XXX: Handle clear. */
> >>>>  		io_region->ret_code = -EOPNOTSUPP;
> >>>> +		trace_vfio_ccw_clear(get_schid(private));
> >>>>  		goto err_out;    
> >>>
> >>> Hmmm.... perhaps better to just trace the function (start/halt/clear)
> >>> in any case?
> >>>     
> >> I agree trace the function in any case is good. @Halil, opinion?  
> 
> See below. I don't really understand the question. Trace the function
> means, trace when it was requested on a subch, or trace the outcome
> of the request? Seems the question got amended though.

Both tracing the functions per se and tracing their outcome has its
benefits IME.

> 
> >>
> >> But the traces for cp_prefetch() and fsm_io_helper() should also be
> >> kept, since they are helpful to debug problem. So I tend to trace the
> >> following in any case:
> >> - cp_prefetch()
> >> - fsm_io_helper()
> >> - start
> >> - halt
> >> - clear  
> > 
> > OK, I was unclear :) I'd argue to keep the others, just replace the
> > halt/clear tracing with tracing the function.  
> 
> I'm a bit confused.
> 
> My idea was the following: Prior to this patch we had a kind of OK
> possibility to trace what we consider the expected and good scenario
> using the function tracer and the normal cio stuff. But what I wanted is
> to verify that my fix works (problem occurs but is handled more
> appropriately) and I've found it difficult to trace this. So the idea was
> to introduce trace points which tell us what went wrong. The idea is to
> benefit diagnostic of unrecoverable failures and get an idea how often
> are we doing extra work recovering recoverable failures.
> 
> In this sense halt and clear is something that does not work currently.
> When we get proper halt and clear, these trace points were supposed to
> become obsolete and get removed. I guess the implementation will
> eventually issue csch() and hsch() for the underlying subchannel and and
> we should be able to trace those (see drivers/s390/cio/ioasm.c)

Tracing what userspace expects us to do has its benefits as well (i.e.
do we get mainly ssch? unexpected amounts of csch? etc.). I find it
useful to be able to get this information from the vfio-ccw layer
already.

> 
> Now this is the tricky part. I've read some used to see trace points as
> part of the kernel ABI.  See e.g. https://lwn.net/Articles/705270/. AFAIU
> this is not a concern any more -- but my confidence on this is pretty
> low.

I'd not worry about that.

FWIW, for kvm/s390 I tried to do the following:
- one set of tracepoints for things that are mandated by the
  architecture and therefore expected to be stable
- and another set of tracepoints for implementation details that have a
  good change of changing

> 
> So IMHO we have two questions to answer:
> 
> * Do we want static trace points (events) for undesirable or concerning
> stuff happened (e.g. translation failed, a function we hope we can live
> without was supposed to be executed)?
> 
> * Do we want static trace points (events) coverage for the normal path
> (that is beyond what cio and the function tracer already give us)? What
> benefit do we expect if we do want these? (E.g. performance evaluation,
> better debugging especially when multiple virtualized subchannels used.)

Whatever you think is useful. Of course, we can go there step by step
(and I'd really advise to do so; getting some useful info right now and
not holding things up until we have a more complete understand what
would be useful for e.g. ffdc).

You can always have userspace enable tracing of some things by default
and leave the rest off until wanted.

> >>> Just tracing the function is useful now and will stay useful in the
> >>> future.    
> >> If we agree with ideas given above, we could:
> >> 1. DECLARE_EVENT_CLASS as vfio_ccw_schid_errno
> >> 2. DEFINE_EVENT:
> >>    vfio_ccw_fam_io_helper
> >>    vfio_ccw_cp_prefetch
> >>    vfio_ccw_io_start
> >>    vfio_ccw_io_clear
> >>    vfio_ccw_io_halt  
> > 
> > Use a vfio_ccw_io_fctl tracepoint instead?
> >   
> 
> That would trace what? A request to perform a basic I/O function
> on the virtualized subchannel by issuing the corresponding I/O
> instruction to the underlying subchannel?

Basically, what userspace requests us to do.

> 
> If that's the case, I think using the trace events from
> drivers/s390/cio/ioasm.c (tracing the instructions) may suffice for
> the 'good case'.
> 
> >> 3. add trace points in coresponding places
> >>  
> >>>
> >>> Another idea: Trace the fsm state transitions. Probably something for
> >>> an additional patch.    
> >> Considering Pierre is refactoring the fsm, we can add trace points in
> >> that series (or as following on patch).  
> > 
> > Yes, while poking around I also wondered whether we should tweak the
> > fsm in places. So adding tracepoints there looks like a good idea.
> >   
> 
> 
> How does this relate to normal cio? I don't think cio has such trace 
> points capturing the state machine transitions. I wonder why? Spontaneously
> I would say sounds like something useful.

The cio fsm simply dates back to the 2.5 era.

  reply	other threads:[~2018-03-29 12:32 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-21  2:08 [PATCH 0/4] vfio: ccw: error handling fixes and improvements Dong Jia Shi
2018-03-21  2:08 ` [PATCH 1/4] vfio: ccw: fix cleanup if cp_prefetch fails Dong Jia Shi
2018-03-21 12:49   ` Halil Pasic
     [not found]     ` <20180322022248.GB12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-22  9:37       ` Pierre Morel
2018-03-22 10:10         ` Halil Pasic
2018-03-26 12:28         ` Cornelia Huck
     [not found]           ` <20180327014200.GH12194@bjsdjshi@linux.vnet.ibm.com>
2018-04-20 10:54             ` Halil Pasic
2018-04-20 11:36               ` Cornelia Huck
2018-04-20 11:55                 ` Halil Pasic
2018-03-21  2:08 ` [PATCH 2/4] vfio: ccw: refactor and improve pfn_array_alloc_pin() Dong Jia Shi
2018-03-26 13:28   ` Cornelia Huck
     [not found]     ` <20180327030026.GI12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-27 10:01       ` Cornelia Huck
     [not found]         ` <20180328023638.GL12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-28  7:58           ` Cornelia Huck
2018-03-21  2:08 ` [PATCH 3/4] vfio: ccw: set ccw->cda to NULL defensively Dong Jia Shi
2018-03-26 13:47   ` Cornelia Huck
     [not found]     ` <20180327030809.GJ12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-27 10:03       ` Cornelia Huck
2018-03-21  2:08 ` [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths Dong Jia Shi
2018-03-26 13:59   ` Cornelia Huck
     [not found]     ` <20180327075114.GK12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-27 10:07       ` Cornelia Huck
2018-03-27 15:27         ` Halil Pasic
2018-03-29 12:32           ` Cornelia Huck [this message]
     [not found]         ` <20180410021639.GN5428@bjsdjshi@linux.vnet.ibm.com>
2018-04-10  8:55           ` Cornelia Huck
2018-04-10 10:48             ` Halil Pasic
2018-03-26  9:02 ` [PATCH 0/4] vfio: ccw: error handling fixes and improvements Cornelia Huck
2018-03-26 11:25   ` Halil Pasic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180329143255.5ce349f1.cohuck@redhat.com \
    --to=cohuck@redhat.com \
    --cc=bjsdjshi@linux.vnet.ibm.com \
    --cc=borntraeger@de.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pasic@linux.vnet.ibm.com \
    --cc=pmorel@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).