From: Cornelia Huck <cohuck@redhat.com>
To: Halil Pasic <pasic@linux.vnet.ibm.com>
Cc: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
kvm@vger.kernel.org, borntraeger@de.ibm.com,
pmorel@linux.vnet.ibm.com
Subject: Re: [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths
Date: Thu, 29 Mar 2018 14:32:55 +0200 [thread overview]
Message-ID: <20180329143255.5ce349f1.cohuck@redhat.com> (raw)
In-Reply-To: <493b19b7-fa86-0220-7427-be519f1b40ad@linux.vnet.ibm.com>
On Tue, 27 Mar 2018 17:27:54 +0200
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> On 03/27/2018 12:07 PM, Cornelia Huck wrote:
> > On Tue, 27 Mar 2018 15:51:14 +0800
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> >
> >> * Cornelia Huck <cohuck@redhat.com> [2018-03-26 15:59:02 +0200]:
> >>
> >> [...]
> >>
> >>>> @@ -131,6 +138,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>
> >>>> io_region->ret_code = cp_prefetch(&private->cp);
> >>>> if (io_region->ret_code) {
> >>>> + trace_vfio_ccw_cp_prefetch_failed(get_schid(private),
> >>>> + io_region->ret_code);
> >>>> cp_free(&private->cp);
> >>>> goto err_out;
> >>>> }
> >>>> @@ -138,6 +147,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>> /* Start channel program and wait for I/O interrupt. */
> >>>> io_region->ret_code = fsm_io_helper(private);
> >>>> if (io_region->ret_code) {
> >>>> + trace_vfio_ccw_ssch_failed(get_schid(private),
> >>>> + io_region->ret_code);
> >>>> cp_free(&private->cp);
> >>>> goto err_out;
> >>>> }
> >>>> @@ -145,10 +156,12 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> >>>> /* XXX: Handle halt. */
> >>>> io_region->ret_code = -EOPNOTSUPP;
> >>>> + trace_vfio_ccw_halt(get_schid(private));
> >>>> goto err_out;
> >>>> } else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
> >>>> /* XXX: Handle clear. */
> >>>> io_region->ret_code = -EOPNOTSUPP;
> >>>> + trace_vfio_ccw_clear(get_schid(private));
> >>>> goto err_out;
> >>>
> >>> Hmmm.... perhaps better to just trace the function (start/halt/clear)
> >>> in any case?
> >>>
> >> I agree trace the function in any case is good. @Halil, opinion?
>
> See below. I don't really understand the question. Trace the function
> means, trace when it was requested on a subch, or trace the outcome
> of the request? Seems the question got amended though.
Both tracing the functions per se and tracing their outcome has its
benefits IME.
>
> >>
> >> But the traces for cp_prefetch() and fsm_io_helper() should also be
> >> kept, since they are helpful to debug problem. So I tend to trace the
> >> following in any case:
> >> - cp_prefetch()
> >> - fsm_io_helper()
> >> - start
> >> - halt
> >> - clear
> >
> > OK, I was unclear :) I'd argue to keep the others, just replace the
> > halt/clear tracing with tracing the function.
>
> I'm a bit confused.
>
> My idea was the following: Prior to this patch we had a kind of OK
> possibility to trace what we consider the expected and good scenario
> using the function tracer and the normal cio stuff. But what I wanted is
> to verify that my fix works (problem occurs but is handled more
> appropriately) and I've found it difficult to trace this. So the idea was
> to introduce trace points which tell us what went wrong. The idea is to
> benefit diagnostic of unrecoverable failures and get an idea how often
> are we doing extra work recovering recoverable failures.
>
> In this sense halt and clear is something that does not work currently.
> When we get proper halt and clear, these trace points were supposed to
> become obsolete and get removed. I guess the implementation will
> eventually issue csch() and hsch() for the underlying subchannel and and
> we should be able to trace those (see drivers/s390/cio/ioasm.c)
Tracing what userspace expects us to do has its benefits as well (i.e.
do we get mainly ssch? unexpected amounts of csch? etc.). I find it
useful to be able to get this information from the vfio-ccw layer
already.
>
> Now this is the tricky part. I've read some used to see trace points as
> part of the kernel ABI. See e.g. https://lwn.net/Articles/705270/. AFAIU
> this is not a concern any more -- but my confidence on this is pretty
> low.
I'd not worry about that.
FWIW, for kvm/s390 I tried to do the following:
- one set of tracepoints for things that are mandated by the
architecture and therefore expected to be stable
- and another set of tracepoints for implementation details that have a
good change of changing
>
> So IMHO we have two questions to answer:
>
> * Do we want static trace points (events) for undesirable or concerning
> stuff happened (e.g. translation failed, a function we hope we can live
> without was supposed to be executed)?
>
> * Do we want static trace points (events) coverage for the normal path
> (that is beyond what cio and the function tracer already give us)? What
> benefit do we expect if we do want these? (E.g. performance evaluation,
> better debugging especially when multiple virtualized subchannels used.)
Whatever you think is useful. Of course, we can go there step by step
(and I'd really advise to do so; getting some useful info right now and
not holding things up until we have a more complete understand what
would be useful for e.g. ffdc).
You can always have userspace enable tracing of some things by default
and leave the rest off until wanted.
> >>> Just tracing the function is useful now and will stay useful in the
> >>> future.
> >> If we agree with ideas given above, we could:
> >> 1. DECLARE_EVENT_CLASS as vfio_ccw_schid_errno
> >> 2. DEFINE_EVENT:
> >> vfio_ccw_fam_io_helper
> >> vfio_ccw_cp_prefetch
> >> vfio_ccw_io_start
> >> vfio_ccw_io_clear
> >> vfio_ccw_io_halt
> >
> > Use a vfio_ccw_io_fctl tracepoint instead?
> >
>
> That would trace what? A request to perform a basic I/O function
> on the virtualized subchannel by issuing the corresponding I/O
> instruction to the underlying subchannel?
Basically, what userspace requests us to do.
>
> If that's the case, I think using the trace events from
> drivers/s390/cio/ioasm.c (tracing the instructions) may suffice for
> the 'good case'.
>
> >> 3. add trace points in coresponding places
> >>
> >>>
> >>> Another idea: Trace the fsm state transitions. Probably something for
> >>> an additional patch.
> >> Considering Pierre is refactoring the fsm, we can add trace points in
> >> that series (or as following on patch).
> >
> > Yes, while poking around I also wondered whether we should tweak the
> > fsm in places. So adding tracepoints there looks like a good idea.
> >
>
>
> How does this relate to normal cio? I don't think cio has such trace
> points capturing the state machine transitions. I wonder why? Spontaneously
> I would say sounds like something useful.
The cio fsm simply dates back to the 2.5 era.
next prev parent reply other threads:[~2018-03-29 12:32 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-21 2:08 [PATCH 0/4] vfio: ccw: error handling fixes and improvements Dong Jia Shi
2018-03-21 2:08 ` [PATCH 1/4] vfio: ccw: fix cleanup if cp_prefetch fails Dong Jia Shi
2018-03-21 12:49 ` Halil Pasic
[not found] ` <20180322022248.GB12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-22 9:37 ` Pierre Morel
2018-03-22 10:10 ` Halil Pasic
2018-03-26 12:28 ` Cornelia Huck
[not found] ` <20180327014200.GH12194@bjsdjshi@linux.vnet.ibm.com>
2018-04-20 10:54 ` Halil Pasic
2018-04-20 11:36 ` Cornelia Huck
2018-04-20 11:55 ` Halil Pasic
2018-03-21 2:08 ` [PATCH 2/4] vfio: ccw: refactor and improve pfn_array_alloc_pin() Dong Jia Shi
2018-03-26 13:28 ` Cornelia Huck
[not found] ` <20180327030026.GI12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-27 10:01 ` Cornelia Huck
[not found] ` <20180328023638.GL12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-28 7:58 ` Cornelia Huck
2018-03-21 2:08 ` [PATCH 3/4] vfio: ccw: set ccw->cda to NULL defensively Dong Jia Shi
2018-03-26 13:47 ` Cornelia Huck
[not found] ` <20180327030809.GJ12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-27 10:03 ` Cornelia Huck
2018-03-21 2:08 ` [PATCH 4/4] vfio: ccw: add traceponits for interesting error paths Dong Jia Shi
2018-03-26 13:59 ` Cornelia Huck
[not found] ` <20180327075114.GK12194@bjsdjshi@linux.vnet.ibm.com>
2018-03-27 10:07 ` Cornelia Huck
2018-03-27 15:27 ` Halil Pasic
2018-03-29 12:32 ` Cornelia Huck [this message]
[not found] ` <20180410021639.GN5428@bjsdjshi@linux.vnet.ibm.com>
2018-04-10 8:55 ` Cornelia Huck
2018-04-10 10:48 ` Halil Pasic
2018-03-26 9:02 ` [PATCH 0/4] vfio: ccw: error handling fixes and improvements Cornelia Huck
2018-03-26 11:25 ` Halil Pasic
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180329143255.5ce349f1.cohuck@redhat.com \
--to=cohuck@redhat.com \
--cc=bjsdjshi@linux.vnet.ibm.com \
--cc=borntraeger@de.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=pasic@linux.vnet.ibm.com \
--cc=pmorel@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).