From: Cornelia Huck <cohuck@redhat.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>,
Halil Pasic <pasic@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
kvm@vger.kernel.org, borntraeger@de.ibm.com,
bjsdjshi@linux.ibm.com, pmorel@linux.ibm.com
Subject: Re: [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths
Date: Mon, 30 Apr 2018 17:03:58 +0200 [thread overview]
Message-ID: <20180430170358.0ee6fe6a.cohuck@redhat.com> (raw)
In-Reply-To: <ca160353-696e-86f6-8a37-dd6a2f7fae8d@linux.ibm.com>
On Mon, 30 Apr 2018 16:14:21 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:
> On 04/30/2018 01:51 PM, Cornelia Huck wrote:
> > On Sat, 28 Apr 2018 13:50:23 +0800
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> >
> >> * Cornelia Huck <cohuck@redhat.com> [2018-04-27 12:13:53 +0200]:
> >>
> >>> On Mon, 23 Apr 2018 13:01:13 +0200
> >>> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> >>>
> >>> typo in subject: s/traceponits/tracepoints/
> >>>
> >>>> From: Halil Pasic <pasic@linux.vnet.ibm.com>
> >>>>
> >>>> Add some tracepoints so we can inspect what is not working as is should.
> >>>>
> >>>> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
> >>>> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> >>>> ---
> >>>> drivers/s390/cio/Makefile | 1 +
> >>>> drivers/s390/cio/vfio_ccw_fsm.c | 16 +++++++-
> >>>> drivers/s390/cio/vfio_ccw_trace.h | 77 +++++++++++++++++++++++++++++++++++++++
> >>>> 3 files changed, 93 insertions(+), 1 deletion(-)
> >>>> create mode 100644 drivers/s390/cio/vfio_ccw_trace.h
> >>>
> >>>
> >>>> @@ -135,6 +142,8 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>> goto err_out;
> >>>>
> >>>> io_region->ret_code = cp_prefetch(&private->cp);
> >>>> + trace_vfio_ccw_cp_prefetch(get_schid(private),
> >>>> + io_region->ret_code);
> >>>> if (io_region->ret_code) {
> >>>> cp_free(&private->cp);
> >>>> goto err_out;
> >>>> @@ -142,11 +151,13 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>
> >>>> /* Start channel program and wait for I/O interrupt. */
> >>>> io_region->ret_code = fsm_io_helper(private);
> >>>> + trace_vfio_ccw_fsm_io_helper(get_schid(private),
> >>>> + io_region->ret_code);
> >>>> if (io_region->ret_code) {
> >>>> cp_free(&private->cp);
> >>>> goto err_out;
> >>>> }
> >>>> - return;
> >>>> + goto out;
> >>>> } else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
> >>>> /* XXX: Handle halt. */
> >>>> io_region->ret_code = -EOPNOTSUPP;
> >>>> @@ -159,6 +170,9 @@ static void fsm_io_request(struct vfio_ccw_private *private,
> >>>>
> >>>> err_out:
> >>>> private->state = VFIO_CCW_STATE_IDLE;
> >>>> +out:
> >>>> + trace_vfio_ccw_io_fctl(scsw->cmd.fctl, get_schid(private),
> >>>> + io_region->ret_code);
> >>>> }
> >>>>
> >>>> /*
> >>>
> >>> I really don't want to bikeshed, especially as some tracepoints are
> >>> better than no tracepoints, but...
> >>>
> >>> We now trace fctl/schid/ret_code unconditionally (good).
> >>>
> >>> We trace the outcome of cp_prefetch() and fsm_io_helper()
> >>> unconditionally. We don't, however, trace all things that may go wrong.
> >>> We have the tracepoint at the end, but it cannot tell us where the
> >>> error came from. Should we have tracepoints in every place (in this
> >>> function) that may generate an error? Only if there is an actual error?
> >>> Are the two enough for common debug scenarios?
> >> Trace actual error sounds like a better idea than trace unconditionally
> >> of these two functions.
> >> These two are not enough for common debug scenarios. For example, we
> >> cann't tell if a -EOPNOTSUPP is a orb->tm.b problem, or error code
> >> returned by cp_init().
> >>
> >> Idea to improve:
> >> 1. Trace actual error.
> >> 2. Define a trace event and add error trace for cp_init().
> >
> > Hm. Going from what I have done in the past when doing printk debugging:
> >
> > - stick in a message that is always hit, with some information about
> > parameters, if it makes sense
> > - stick in a message "foo happened!" in the error branches
> > - or, alternatively, trace the called functions
> >
> > So tracing on failure only might be more useful? Have all failure paths
> > under a common knob to turn on/off?
> >
> >>> Opinions? We can just go ahead with this and improve things later
> >>> on, I guess.
> >>>
> >> I think it's also fine to do this - better something than nothing. We
> >> could at least have a code base to be improved to make everybody
> >> happier in future.
> >
> > Maybe keep the patch as it is now, except trace the errors only
> > (keeping the fctl trace point)?
>
> What do you mean by this sentence. Get rid of vfio_ccw_io_fctl or get
> rid of vfio_ccw_cp_prefetch and vfio_ccw_fsm_io_helper, or get don't
> get rid of any, but make some conditional (!errno)?
The third option.
>
> >
> > Halil, as you wrote the patch (and I presume you found it helpful):
> > What is your opinion?
> >
>
> I'm in favor of this patch (as previously stated here
> https://patchwork.kernel.org/patch/10298305/). And regarding the
> questions under discussion I'm mostly fine either way.
OK.
>
> I think the naming of this fctl thing is a bit cryptic,
> but if we don't see this as ABI I'm fine with it -- can be improved.
> What would be a better name? I was thinking along the lines accept_request.
> (Bad error code would mean that the request did not get accepted. Good
> code does not mean the requested function was performed successfully.)
I think fctl is fine (if you don't understand what 'fctl' is, you're
unlikely to understand it even if it were named differently.)
>
> Also I think vfio_ccw_io_fctl with no zero error code would make sense
> as dev_warn. If I were an admin looking into a problem I would very much
> appreciate seeing something in the messages log (and not having to enable
> tracing first). This point seems to be a good one for high level 'request gone
> bad' kind of report. Opinions?
I'd also exclude -EOPNOTSUPP (as this also might happen with e.g. a halt/clear enabled user space, which probes availability of halt/clear support by giving it a try once (yes, I really want to post my patches this week.))
next prev parent reply other threads:[~2018-04-30 15:03 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-23 11:01 [PATCH v2 0/5] vfio: ccw: error handling fixes and improvements Dong Jia Shi
2018-04-23 11:01 ` [PATCH v2 1/5] vfio: ccw: fix cleanup if cp_prefetch fails Dong Jia Shi
2018-04-23 11:38 ` Halil Pasic
2018-04-23 11:40 ` Cornelia Huck
2018-04-23 12:00 ` Halil Pasic
2018-04-24 9:31 ` Cornelia Huck
[not found] ` <20180425024341.GY12194@bjsdjshi@linux.vnet.ibm.com>
2018-04-25 11:19 ` Halil Pasic
2018-04-23 11:01 ` [PATCH v2 2/5] vfio: ccw: shorten kernel doc description for pfn_array_pin() Dong Jia Shi
2018-04-23 11:44 ` Cornelia Huck
2018-04-23 11:01 ` [PATCH v2 3/5] vfio: ccw: refactor and improve pfn_array_alloc_pin() Dong Jia Shi
2018-04-23 11:01 ` [PATCH v2 4/5] vfio: ccw: set ccw->cda to NULL defensively Dong Jia Shi
2018-04-23 11:01 ` [PATCH v2 5/5] vfio: ccw: add traceponits for interesting error paths Dong Jia Shi
2018-04-27 10:13 ` Cornelia Huck
[not found] ` <20180428055023.GS5428@bjsdjshi@linux.vnet.ibm.com>
2018-04-30 11:51 ` Cornelia Huck
2018-04-30 14:14 ` Halil Pasic
2018-04-30 15:03 ` Cornelia Huck [this message]
2018-04-30 16:51 ` Halil Pasic
[not found] ` <20180502022330.GT5428@bjsdjshi@linux.vnet.ibm.com>
[not found] ` <20180516065355.GB6363@bjsdjshi@linux.ibm.com>
2018-05-22 12:55 ` Cornelia Huck
2018-04-23 11:32 ` [PATCH v2 0/5] vfio: ccw: error handling fixes and improvements Cornelia Huck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180430170358.0ee6fe6a.cohuck@redhat.com \
--to=cohuck@redhat.com \
--cc=bjsdjshi@linux.ibm.com \
--cc=bjsdjshi@linux.vnet.ibm.com \
--cc=borntraeger@de.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=pasic@linux.ibm.com \
--cc=pasic@linux.vnet.ibm.com \
--cc=pmorel@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.