All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Goel, Akash" <akash.goel@intel.com>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v4] tools/intel_guc_logger: Utility for capturing GuC firmware logs in a file
Date: Wed, 12 Oct 2016 14:32:17 +0100	[thread overview]
Message-ID: <0fcf3442-1697-e83a-55c1-f57c19fe9c22@linux.intel.com> (raw)
In-Reply-To: <080dacb9-dff0-4963-af40-c2d47b497c8c@intel.com>


On 10/10/2016 15:31, Goel, Akash wrote:
>
>
> On 10/10/2016 7:22 PM, Tvrtko Ursulin wrote:
>>
>> On 10/10/2016 11:59, akash.goel@intel.com wrote:
>>> From: Akash Goel <akash.goel@intel.com>
>>>
>>> This patch provides a test utility which helps capture GuC firmware
>>> logs and
>>> then dump them to file.
>>> The logs are pulled from a debugfs file
>>> '/sys/kernel/debug/dri/guc_log' and
>>> by default stored into a file 'guc_log_dump.dat'. The name, 
>>> including the
>>> location, of the output file can be changed through a command line
>>> argument.
>>>
>>> The utility goes into an infinite loop where it waits for the arrival
>>> of new
>>> logs and as soon as new set of logs are produced it captures them in
>>> its local
>>> buffer which is then flushed out to the file on disk.
>>> Any time when logging needs to be ended, User can stop this utility
>>> (CTRL+C).
>>>
>>> Before entering into a loop, it first discards whatever logs are
>>> present in
>>> the debugfs file.
>>> This way User can first launch this utility and then start a
>>> workload/activity
>>> for which GuC firmware logs are to be actually captured and keep
>>> running the
>>> utility for as long as its needed, like once the workload is over this
>>> utility
>>> can be forcefully stopped.
>>>
>>> If the logging wasn't enabled on GuC side by the Driver at boot time,
>>> utility
>>> will first enable the logging and later on when it is stopped (CTRL+C)
>>> it will
>>> also pause the logging on GuC side.
>>>
>>> v2:
>>> - Use combination of alarm system call & SIGALRM signal to run the
>>> utility
>>>    for required duration. (Tvrtko)
>>> - Fix inconsistencies, do minor cleanup and refactoring. (Tvrtko)
>>>
>>> v3:
>>> - Fix discrepancy for the output file command line option and update 
>>> the
>>>    Usage/help string.
>>>
>>> v4:
>>> - Update the exit condition for flusher thread, now will exit only 
>>> after
>>>    the capture loop is over and not when the flag to stop logging is 
>>> set.
>>>    This handles a corner case, due to which the dump of last captured
>>> buffer
>>>    was getting missed.
>>> - Add a newline character at the end of assert messages.
>>> - Avoid the assert for the case, which occurs very rarely, when there
>>> are no
>>>    bytes read from the relay file.
>>>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v3)
>>> ---
>>>   tools/Makefile.sources   |   1 +
>>>   tools/intel_guc_logger.c | 438
>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 439 insertions(+)
>>>   create mode 100644 tools/intel_guc_logger.c
>>>
>>> diff --git a/tools/Makefile.sources b/tools/Makefile.sources
>>> index 2bb6c8e..be58871 100644
>>> --- a/tools/Makefile.sources
>>> +++ b/tools/Makefile.sources
>>> @@ -19,6 +19,7 @@ tools_prog_lists =        \
>>>       intel_gpu_time        \
>>>       intel_gpu_top        \
>>>       intel_gtt        \
>>> +    intel_guc_logger        \
>>>       intel_infoframes    \
>>>       intel_l3_parity        \
>>>       intel_lid        \
>>> diff --git a/tools/intel_guc_logger.c b/tools/intel_guc_logger.c
>>> new file mode 100644
>>> index 0000000..159a54e
>>> --- /dev/null
>>> +++ b/tools/intel_guc_logger.c
>>> @@ -0,0 +1,438 @@
>>> +
>>> +#define _GNU_SOURCE  /* For using O_DIRECT */
>>> +#include <inttypes.h>
>>> +#include <stdio.h>
>>> +#include <fcntl.h>
>>> +#include <sched.h>
>>> +#include <sys/ioctl.h>
>>> +#include <string.h>
>>> +#include <errno.h>
>>> +#include <signal.h>
>>> +#include <stdlib.h>
>>> +#include <unistd.h>
>>> +#include <sys/poll.h>
>>> +#include <sys/wait.h>
>>> +#include <sys/types.h>
>>> +#include <sys/syscall.h>
>>> +#include <sys/mman.h>
>>> +#include <assert.h>
>>> +#include <pthread.h>
>>> +
>>> +#include "igt.h"
>>> +
>>> +#define MB(x) ((uint64_t)(x) * 1024 * 1024)
>>> +#ifndef PAGE_SIZE
>>> +  #define PAGE_SIZE 4096
>>> +#endif
>>> +/* Currently the size of GuC log buffer is 19 pages & so is the size
>>> of relay
>>> + * subbuffer. If the size changes in future, then this define also
>>> needs to be
>>> + * updated accordingly.
>>> + */
>>> +#define SUBBUF_SIZE (19*PAGE_SIZE)
>>> +/* Need large buffering from logger side to hide the DISK IO latency,
>>> Driver
>>> + * can only store 8 snapshots of GuC log buffer in relay.
>>> + */
>>> +#define NUM_SUBBUFS 100
>>> +
>>> +#define RELAY_FILE_NAME  "guc_log"
>>> +#define DEFAULT_OUTPUT_FILE_NAME  "guc_log_dump.dat"
>>> +#define CONTROL_FILE_NAME "i915_guc_log_control"
>>> +
>>> +char *read_buffer;
>>> +char *out_filename;
>>> +int poll_timeout = 2; /* by default 2ms timeout */
>>> +pthread_mutex_t mutex;
>>> +pthread_t flush_thread;
>>> +int verbosity_level = 3; /* by default capture logs at max 
>>> verbosity */
>>> +uint32_t produced, consumed;
>>> +uint64_t total_bytes_written;
>>> +int num_buffers = NUM_SUBBUFS;
>>> +int relay_fd, outfile_fd = -1;
>>> +uint32_t test_duration, max_filesize;
>>> +pthread_cond_t underflow_cond, overflow_cond;
>>> +bool stop_logging, discard_oldlogs, capturing_stopped;
>>> +
>>> +static void guc_log_control(bool enable_logging)
>>> +{
>>> +    int control_fd;
>>> +    char data[19];
>>> +    uint64_t val;
>>> +    int ret;
>>> +
>>> +    control_fd = igt_debugfs_open(CONTROL_FILE_NAME, O_WRONLY);
>>> +    igt_assert_f(control_fd >= 0, "couldn't open the guc log control
>>> file\n");
>>> +
>>> +    val = enable_logging ? ((verbosity_level << 4) | 0x1) : 0;
>>> +
>>> +    ret = snprintf(data, sizeof(data), "0x%" PRIx64, val);
>>> +    igt_assert(ret > 2 && ret < sizeof(data));
>>> +
>>> +    ret = write(control_fd, data, ret);
>>> +    igt_assert_f(ret > 0, "couldn't write to the log control file\n");
>>> +
>>> +    close(control_fd);
>>> +}
>>> +
>>> +static void int_sig_handler(int sig)
>>> +{
>>> +    igt_info("received signal %d\n", sig);
>>> +
>>> +    stop_logging = true;
>>> +}
>>> +
>>> +static void pull_leftover_data(void)
>>> +{
>>> +    unsigned int bytes_read = 0;
>>> +    int ret;
>>> +
>>> +    do {
>>> +        /* Read the logs from relay buffer */
>>> +        ret = read(relay_fd, read_buffer, SUBBUF_SIZE);
>>> +        if (!ret)
>>> +            break;
>>> +
>>> +        igt_assert_f(ret > 0, "failed to read from the guc log 
>>> file\n");
>>> +        igt_assert_f(ret == SUBBUF_SIZE, "invalid read from relay
>>> file\n");
>>> +
>>> +        bytes_read += ret;
>>> +
>>> +        if (outfile_fd >= 0) {
>>> +            ret = write(outfile_fd, read_buffer, SUBBUF_SIZE);
>>> +            igt_assert_f(ret == SUBBUF_SIZE, "couldn't dump the logs
>>> in a file\n");
>>> +            total_bytes_written += ret;
>>> +        }
>>> +    } while(1);
>>> +
>>> +    igt_debug("%u bytes flushed\n", bytes_read);
>>> +}
>>> +
>>> +static int num_filled_bufs(void)
>>> +{
>>> +    return (produced - consumed);
>>> +}
>>> +
>>> +static void pull_data(void)
>>> +{
>>> +    char *ptr;
>>> +    int ret;
>>> +
>>> +    pthread_mutex_lock(&mutex);
>>> +    while (num_filled_bufs() >= num_buffers) {
>>> +        igt_debug("overflow, will wait, produced %u, consumed %u\n",
>>> produced, consumed);
>>> +        /* Stall the main thread in case of overflow, as there are no
>>> +         * buffers available to store the new logs, otherwise there
>>> +         * could be corruption if both threads work on the same 
>>> buffer.
>>> +         */
>>> +        pthread_cond_wait(&overflow_cond, &mutex);
>>> +    };
>>> +    pthread_mutex_unlock(&mutex);
>>> +
>>> +    ptr = read_buffer + (produced % num_buffers) * SUBBUF_SIZE;
>>> +
>>> +    /* Read the logs from relay buffer */
>>> +    ret = read(relay_fd, ptr, SUBBUF_SIZE);
>>> +    igt_assert_f(ret >= 0, "failed to read from the guc log file\n");
>>> +    igt_assert_f(!ret || ret == SUBBUF_SIZE, "invalid read from relay
>>> file\n");
>>> +
>>> +    if (ret) {
>>> +        pthread_mutex_lock(&mutex);
>>> +        produced++;
>>> +        pthread_cond_signal(&underflow_cond);
>>> +        pthread_mutex_unlock(&mutex);
>>> +    } else {
>>> +        /* Occasionally (very rare) read from the relay file 
>>> returns no
>>> +         * data, albeit the polling done prior to read call indicated
>>> +         * availability of data.
>>> +         */
>>> +        igt_debug("no data read from the relay file\n");
>>
>> This worries me. Relayfs kernel side is quite simple which would suggest
>> there is maybe a bug in the logger? And it is very important that we
>> trust logger correctly capturing data. Because people will spend a lot
>> of time debugging issues from the logs and it would be bad to have to
>> chase irregularities caused by a problem in the logger.
>>
> Actually I see this issue rarely and only after logging some GBs of 
> data. And most likely it won't have any bearing on the correctness of 
> captured data, the missed snapshot shall be pulled in the next 
> iteration of capture loop. Also no-overwrite mode of relay is being used.
>
> As per my hunch, it can't be ruled out that this is not a relayfs issue.
>
>> Could you please try and identify why exactly is this happening?
> Will try to investigate both from relayfs side as well as logger side, 
> but it would take some time.
> This may not be considered a blocker.
>

As you reported off list, after some deep investigation there seems to 
be no potential for lost packets. "Empty wakeup" is just a consequence 
of unsynchronized nature of relayfs.

In the light of that I am happy with the logger, so:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2016-10-12 13:32 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-06 10:43 [PATCH] tools/intel_guc_logger: Utility for capturing GuC firmware logs in a file akash.goel
2016-09-06 13:17 ` Tvrtko Ursulin
2016-09-06 15:33   ` Goel, Akash
2016-09-06 15:52     ` Tvrtko Ursulin
2016-09-07  8:10       ` Goel, Akash
2016-09-07  8:44         ` Chris Wilson
2016-09-07  9:37           ` Tvrtko Ursulin
2016-09-07  9:55             ` Chris Wilson
2016-09-07 15:27               ` [PATCH v2] " akash.goel
2016-09-07 15:39                 ` Tvrtko Ursulin
2016-09-08 10:44                   ` [PATCH v3] " akash.goel
2016-10-10 10:59                     ` [PATCH v4] " akash.goel
2016-10-10 13:52                       ` Tvrtko Ursulin
2016-10-10 14:31                         ` Goel, Akash
2016-10-12 13:32                           ` Tvrtko Ursulin [this message]
2016-10-25  9:01                             ` Petri Latvala
2016-09-07 10:45             ` [PATCH] " Goel, Akash

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0fcf3442-1697-e83a-55c1-f57c19fe9c22@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=akash.goel@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.