* [PATCH RFC v2 0/3] Documentation: Debugging guide
@ 2024-09-24 8:45 Sebastian Fricke
2024-09-24 8:45 ` [PATCH RFC v2 1/3] docs: media: Create separate documentation folder for media Sebastian Fricke
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: Sebastian Fricke @ 2024-09-24 8:45 UTC (permalink / raw)
To: Jonathan Corbet
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne, Sebastian Fricke
The RFC contains:
- a general debugging guide split into debugging for driver developers and
debugging from userspace
- a new summary page for all media related documentation. This is inspired by
other subsystems, which first of all allows a user to find the subsystem
under the subsystems page and secondly eases general navigation through the
documentation that is sprinkled onto multiple places.
- a guide on how to debug code in the media subsystem, which points to the
parts of the general documentation and adds own routines.
WHY do we need this?
--------------------
For anyone without years of experience in the Linux kernel, knowing which tool
to use or even which tools are available is not as straightforward as some
senior developers might perceive.
We realized that there is a general need for a kind of "start page", that
allows especially beginners to get up-to-speed with the codebase and the
documentation. The documentation in particular is currently quite hard to navigate
as you mostly have to know what you are searching for to find it.
WHAT do we cover?
-----------------
The document is structured into two sections:
1. A problem-focused approach: This means, a developer facing an issue matching
one of the given examples, will find suggestions for how to approach that
problem (e.g. which tool to use) in this section
2. A tool-focused approach: This sections highlights the available tools, with
comparisions between the tools if sensible. The goal of this work is
**duplicate as little as possible** from the existing documentation and
instead provide a rough overview that provides:
- A link to the actual documentation
- A minimal example for how it can be used (from a media perspective,
if the usage isn't absolutely trivial like printk)
- A rational for why it should be used
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: laurent.pinchart@ideasonboard.com
Cc: hverkuil-cisco@xs4all.nl
Cc: mauro.chehab@linux.intel.com
Cc: kernel@collabora.com
Cc: bob.beckett@collabora.com
Cc: nicolas.dufresne@collabora.com
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
---
Changes in v2:
- Split the media debugging guide into a general and a media specific guide,
which contains mostly references to the general guide and a few media
specific aspects.
- Fill out TBD sections
- Add device coredump section
---
Sebastian Fricke (3):
docs: media: Create separate documentation folder for media
docs: Add guides section for debugging
docs: media: Debugging guide for the media subsystem
.../driver_development_debugging_guide.rst | 193 +++++++++++++++
Documentation/debugging/index.rst | 66 +++++
.../debugging/userspace_debugging_guide.rst | 269 +++++++++++++++++++++
Documentation/index.rst | 2 +
Documentation/media/guides/debugging_issues.rst | 174 +++++++++++++
Documentation/media/guides/index.rst | 11 +
Documentation/media/index.rst | 20 ++
Documentation/subsystem-apis.rst | 1 +
8 files changed, 736 insertions(+)
---
base-commit: 68a72104cbcf38ad16500216e213fa4eb21c4be2
change-id: 20240529-b4-media_docs_improve-79ea2d480483
Best regards,
--
Sebastian Fricke <sebastian.fricke@collabora.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH RFC v2 1/3] docs: media: Create separate documentation folder for media
2024-09-24 8:45 [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
@ 2024-09-24 8:45 ` Sebastian Fricke
2024-09-24 8:45 ` [PATCH RFC v2 2/3] docs: Add guides section for debugging Sebastian Fricke
` (3 subsequent siblings)
4 siblings, 0 replies; 18+ messages in thread
From: Sebastian Fricke @ 2024-09-24 8:45 UTC (permalink / raw)
To: Jonathan Corbet
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne, Sebastian Fricke
The media subsystem currently only maintains a set of Kernel API
documentations, with the intent of adding guides for the subsystem a
separate location is required. Create an empty folder with an index and
embed it into the subsystem listing.
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
---
Documentation/media/index.rst | 19 +++++++++++++++++++
Documentation/subsystem-apis.rst | 1 +
2 files changed, 20 insertions(+)
diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst
new file mode 100644
index 000000000000..d056a9e99dca
--- /dev/null
+++ b/Documentation/media/index.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================
+Media Subsystem Documentation
+=============================
+
+.. toctree::
+ :maxdepth: 2
+
+ ../userspace-api/media/index
+ ../driver-api/media/index.rst
+ ../admin-guide/media/index
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
diff --git a/Documentation/subsystem-apis.rst b/Documentation/subsystem-apis.rst
index 74af50d2ef7f..5a1d90fd1af6 100644
--- a/Documentation/subsystem-apis.rst
+++ b/Documentation/subsystem-apis.rst
@@ -33,6 +33,7 @@ Human interfaces
input/index
hid/index
sound/index
+ media/index
gpu/index
fb/index
leds/index
--
2.25.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH RFC v2 2/3] docs: Add guides section for debugging
2024-09-24 8:45 [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
2024-09-24 8:45 ` [PATCH RFC v2 1/3] docs: media: Create separate documentation folder for media Sebastian Fricke
@ 2024-09-24 8:45 ` Sebastian Fricke
2024-09-25 8:01 ` Bagas Sanjaya
` (2 more replies)
2024-09-24 8:45 ` [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem Sebastian Fricke
` (2 subsequent siblings)
4 siblings, 3 replies; 18+ messages in thread
From: Sebastian Fricke @ 2024-09-24 8:45 UTC (permalink / raw)
To: Jonathan Corbet
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne, Sebastian Fricke
This idea was formed after noticing that new developers experience
certain difficulty to navigate within the multitude of different
debugging options in the Kernel and while there often is good
documentation for the tools, the developer has to know first that they
exist and where to find them.
Add a general debugging section to the Kernel documentation, as an
easily locatable entry point to other documentation and as a general
guideline for the topic.
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
---
.../driver_development_debugging_guide.rst | 193 +++++++++++++++
Documentation/debugging/index.rst | 66 +++++
.../debugging/userspace_debugging_guide.rst | 269 +++++++++++++++++++++
Documentation/index.rst | 2 +
4 files changed, 530 insertions(+)
diff --git a/Documentation/debugging/driver_development_debugging_guide.rst b/Documentation/debugging/driver_development_debugging_guide.rst
new file mode 100644
index 000000000000..c750f63ac1d3
--- /dev/null
+++ b/Documentation/debugging/driver_development_debugging_guide.rst
@@ -0,0 +1,193 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+========================================
+Debugging advice for driver development
+========================================
+
+This document serves as a general starting point and lookup for debugging device
+drivers.
+While this guide focuses on debugging that requires re-compiling the
+module/kernel, the `userspace-debugging-guide <userspace_debugging_guide.html>`__
+will guide you through tools like dynamic debug, ftrace and other tools useful
+for debugging issues and behavior.
+For general debugging advice, see `general-debugging-guide <index.html>`__.
+
+.. contents::
+ :depth: 3
+
+Available tools
+===============
+
+Printk & friends
+----------------
+
+These are derivatives of printf() with varying destinations and support for being dynamically turned on or off, or lack thereof.
+
+.. _printk:
+
+**Simple printk**
+~~~~~~~~~~~~~~~~~
+
+The classic, can be used to great effect for quick and dirty development
+of new modules or to extract arbitrary necessary data for troubleshooting.
+
+Prerequisite: :code:`CONFIG_PRINTK` (usually enabled by default)
+
+**Pros**:
+
+- No need to learn anything, simple to use
+- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../core-api/printk-formats.html>`__), visibility in the log)
+- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor)
+
+**Cons**:
+
+- Requires rebuilding the kernel/module
+- Can cause delays in the execution of the code (which can cause issues to be not reproducible)
+
+`Full documentation <../core-api/printk-basics.html>`__
+
+.. _trace_printk:
+
+**Trace_printk**
+~~~~~~~~~~~~~~~~
+
+Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE` & :code:`#include <linux/ftrace.h>`
+
+It is a tiny bit less comfortable to use than `printk`_, because you will have
+to read the messages from the trace file (See: `Reading the ftrace log
+<userspace_debugging_guide.html#read-the-ftrace-log>`_ instead of from the
+kernel log, but very useful when printk adds unwanted delays into the code
+execution, causing issues to be flaky or hidden.)
+
+If the processing of this still causes timing issues then you can try `trace_puts()`.
+
+`Full Documentation <../driver-api/basics.html#c.trace_printk>`__
+
+**dev_dbg**
+~~~~~~~~~~~
+
+Print statement, which can be target by `dynamic debug
+<userspace_debugging_guide.html#dynamic-debug>`__, that contains additional
+information about the device used within the context.
+
+**When is it appropriate to leave a debug print in the code?**
+
+Permanent debug statements have to be useful for a developer to troubleshoot
+driver misbehavior. Judging that is a bit more of an art than a science, but
+some guidelines are in the `Coding style guide
+<../process/coding-style.html#printing-kernel-messages>`__.
+
+**Custom printk**
+~~~~~~~~~~~~~~~~~
+
+Example:
+::
+
+ #define core_dbg(fmt, arg...) do { \
+ if (core_debug) \
+ printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
+ } while (0)
+
+**When should you do this?**
+
+It is better to just use a `pr_debug()`, which can later be turned on/off with
+dynamic debug. Additionally, a lot of drivers activate these prints via a
+variable like `core_debug` set by a module parameter. However, Module
+parameters `are not recommended anymore
+<https://lkml.org/lkml/2024/3/27/163>`_.
+
+Ftrace
+------
+
+**Creating custom Ftrace tracepoint**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here is a basic description of `how to implement new tracepoints <../trace/tracepoints.html#usage>`__.
+
+`Full event tracing documentation <../trace/events.html>`__
+
+`Full Ftrace documentation <../trace/ftrace.html>`__
+
+DebugFS
+-------
+
+Prerequisite: :code:`CONFIG_DEBUG_FS` & :code:`#include <linux/debugfs.h>`
+
+DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files.
+With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver.
+Possible use-cases among others:
+
+- Store register values
+- Keep track of variables
+- Store errors
+- Store settings
+- Toggle a setting like debug on/off
+- Error injection
+
+This is especially useful, when the size of a data dump would be hard to digest as
+part of the general kernel log (for example when dumping raw bitstream data) or
+when you are not interested in all the values all the time, but with the
+possibility to inspect them.
+
+The general idea is:
+
+- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`)
+- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`)
+
+ - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all)
+ - any update of `my_variable` will update the value in the file
+
+- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`)
+
+`Full documentation <../filesystems/debugfs.html>`__
+
+.. _error_checking:
+
+KASAN, UBSAN, lockdep and other error checkers
+----------------------------------------------
+
+KASAN (Kernel Address Sanitizer)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KASAN is a dynamic memory error detector that helps to find use-after-free and
+out-of-bounds bugs. It uses compile-time instrumentation to check every memory
+access.
+
+`Full documentation <../dev-tools/kasan.html>`__
+
+UBSAN (Undefined Behavior Sanitizer)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+UBSAN relies on compiler instrumentation and runtime checks to detect undefined
+behavior. It is designed to find a variety of issues, including signed integer overflow,
+array index out of bounds, and more.
+
+`Full documentation <../dev-tools/ubsan.html>`__
+
+lockdep (Lock Dependency Validator)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+lockdep is a runtime lock dependency validator that detects potential deadlocks
+and other locking-related issues in the kernel.
+It tracks lock acquisitions and releases, building a dependency graph that is
+analyzed for potential deadlocks.
+lockdep is especially useful for validating the correctness of lock ordering in
+the kernel.
+
+device coredump
+---------------
+
+Prerequisite: :code:`#include <linux/devcoredump.h>`
+
+Provides infrastructure through which the driver can provide arbitrary data to
+userland. It is most often used in conjunction with udev or similar userland
+infrastructure to listen for the kernel uevents, which indicates the dump is
+ready. Udev then usually has rules to copy that file somewhere for long-term
+storage and analysis as by default the data for the dump is automatically
+cleaned up after 5 minutes.
+That data is then analyzed with driver-specific tools or GDB.
+
+You can find an example implementation at: :code:`drivers/media/platform/qcom/venus/core.c`
+
+**Copyright** |copy| 2024 : Collabora
diff --git a/Documentation/debugging/index.rst b/Documentation/debugging/index.rst
new file mode 100644
index 000000000000..7bdad2fa09e1
--- /dev/null
+++ b/Documentation/debugging/index.rst
@@ -0,0 +1,66 @@
+
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+====================================================
+General debugging advice for Linux Kernel developers
+====================================================
+
+.. toctree::
+ :maxdepth: 1
+
+ driver_development_debugging_guide
+ userspace_debugging_guide
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
+
+General debugging advice
+========================
+
+Depending on the issue, a different set of tools is available to track down the
+problem or even to realize whether there is one in the first place.
+
+As a first step you have to figure out what kind of issue you want to debug.
+Depending on the answer, your methodology and choice of tools may vary.
+
+Do I need to debug with limited access?
+---------------------------------------
+
+Do you have limited access to the machine or are you unable to stop the running execution?
+
+In this case your debugging capability depends on built-in debugging support of
+provided distro kernel.
+The `userspace debugging guide <userspace_debugging_guide.html>`__ provides a
+brief overview over range of possible debugging tools in that situation. You
+can check the capability of your kernel, in most cases, by looking into config
+file within the /boot folder.
+
+Do I have root access to the system?
+------------------------------------
+
+Are you easily able to replace the module in question or to install a new kernel?
+
+In that case your range of available tools is a lot bigger, you can find the
+tools `here <driver_development_debugging_guide.html>`__.
+
+Is timing a factor?
+-------------------
+
+It is important to understand if the problem you want to debug manifests itself
+consistently (i.e. given a set of inputs you always get the same, incorrect
+output), or inconsistently. If it manifests itself inconsistently, some timing
+factor might be at play. If inserting delays into the code does change the
+behavior, then quite likely timing is a factor.
+
+When timing does alter the outcome of the code execution using a simple `printk
+<driver_development_debugging_guide.html#printk>`_ for debugging purposes won't
+work, a similar alternative is to use `trace_printk
+<driver_development_debugging_guide.html#trace-printk>`_, which logs the debug
+messages to the trace file instead of the kernel log.
+
+**Copyright** |copy| 2024 : Collabora
diff --git a/Documentation/debugging/userspace_debugging_guide.rst b/Documentation/debugging/userspace_debugging_guide.rst
new file mode 100644
index 000000000000..4d269a9ef913
--- /dev/null
+++ b/Documentation/debugging/userspace_debugging_guide.rst
@@ -0,0 +1,269 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+==========================
+Userspace debugging advice
+==========================
+
+A brief overview of common tools to debug the Linux Kernel from userspace.
+For debugging advice aimed at driver developer go `here <driver_development_debugging_guide.html>`__.
+For general debugging advice, see `general-debugging-guide <index.html>`__.
+
+.. contents::
+ :depth: 3
+
+Available tools
+===============
+
+Dynamic debug
+-------------
+
+Mechanism to filter what ends up in the kernel log by dis-/en-abling log
+messages.
+
+Prerequisite: `CONFIG_DYNAMIC_DEBUG`
+
+.. _valid_dyndbg_prints:
+
+Dynamic debug is only able to target:
+
+- `pr_debug()`
+- `dev_dbg()`
+- `print_hex_dump_debug()`
+- `print_hex_dump_bytes()`
+
+Therefore the usability of this tool is quite limited in the media subsystem,
+because, as of now, there is no uniform rule for adding debug prints to the codebase,
+resulting in a variety of ways these prints are implemented.
+
+Also, note that most debug statements are implemented as a variation of
+`dprintk`, which have to be activated via a parameter in respective module,
+dynamic debug is unable to do that step for you.
+
+Here is one example, that enables all available `pr_debug()`'s within the file:
+::
+
+ $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
+ $ ddcmd '-p; file v4l2-h264.c +p'
+ $ grep =p /proc/dynamic_debug/control
+ drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
+ drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
+
+**When should you use this over** `Ftrace`_ **?**
+
+- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development
+- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays
+- When you care more about receiving specific log messages than tracing the pattern of how a function is called
+
+`Full documentation <../admin-guide/dynamic-debug-howto.html>`__
+
+Ftrace
+------
+
+Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE`
+
+Trace whenever the a file is opened:
+::
+
+ $ cd /sys/kernel/tracing
+ $ echo function > /sys/kernel/tracing/current_tracer
+ $ echo do_filep_open > set_ftrace_filter
+ $ echo 1 > tracing_on
+ $ cat trace
+ find-4624 [005] ...1. 580781.888166: do_filp_open <-do_sys_openat2
+ find-4624 [005] ...1. 580781.888237: do_filp_open <-do_sys_openat2
+ find-4624 [005] ...1. 580781.888361: do_filp_open <-do_sys_openat2
+
+.. _event_tracing:
+
+Activate a ftrace event on top of that:
+::
+
+ $ echo 1 > events/kmem/kfree/enable
+ find-5351 [005] ...1. 678288.910143: do_filp_open <-do_sys_openat2
+ find-5351 [005] ..... 678288.910185: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
+ find-5351 [005] ...1. 678288.910218: do_filp_open <-do_sys_openat2
+ find-5351 [005] ..... 678288.910260: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
+ find-5351 [005] ...1. 678288.910293: do_filp_open <-do_sys_openat2
+ find-5351 [005] ..... 678288.910345: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
+ find-5351 [005] ..... 678288.910389: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000007ba73e40
+ find-5351 [005] ..... 678288.910390: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000009e4850bc
+ find-5351 [005] ..... 678288.910391: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000004156f20f
+ find-5351 [005] ..... 678288.910393: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=00000000c7207e20
+ find-5351 [005] ..... 678288.910394: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000037c31d76
+ find-5351 [005] ..... 678288.910395: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000001922677
+
+.. _read_ftrace_log:
+
+**Reading the ftrace log**
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `trace` file can be read just like any other file (`cat`, `tail`, `head`,
+`vim`, etc.), the size of the file is limited by the `buffer_size_kb` (``echo
+1000 > buffer_size_kb``). The `trace_pipe` will behave similar to the `trace`
+file, but whenever you read from the file the content is consumed.
+
+**Kernelshark**
+~~~~~~~~~~~~~~~
+
+A GUI interface to visualize the traces as a graph and list view from the
+output of the `trace-cmd
+<https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/>`__ application.
+
+`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_
+
+`Full Ftrace documentation <../trace/ftrace.html>`__
+
+Perf & alternatives
+-------------------
+
+The tools mentioned above provide ways to inspect kernel code, results, variable values, etc.
+Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue.
+
+.. _performance:
+
+**Why should you do a performance analysis?**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A performance analysis is a good first step when among other reasons:
+
+- you cannot define the issue
+- you do not know where it occurs
+- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel
+
+.. _linux-tools:
+
+**How to do a simple analysis with linux tools?**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For the start of a performance analysis, you can start with the usual tools like:
+
+- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*)
+- `mpstat -P ALL` (*look at the load distribution among CPUs*)
+- `iostat -x` (*observe input and output devices utilization and performance*)
+- `vmstat` (*overview of memory usage on the system*)
+- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*)
+- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*)
+
+These should help to figure out restrict the areas to look at sufficiently.
+
+**Diving deeper with perf**
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The **perf** tool provides a series of metrics and events to further dial down on issues.
+
+Prerequisite: build or install perf on your system
+
+Gather statistics data for finding all files starting with `gcc` in `/usr`
+::
+
+ # perf stat -d find /usr -name 'gcc*' | wc -l
+
+ Performance counter stats for 'find /usr -name gcc*':
+
+ 1277.81 msec task-clock # 0.997 CPUs utilized
+ 9 context-switches # 7.043 /sec
+ 1 cpu-migrations # 0.783 /sec
+ 704 page-faults # 550.943 /sec
+ 766548897 cycles # 0.600 GHz (97.15%)
+ 798285467 instructions # 1.04 insn per cycle (97.15%)
+ 57582731 branches # 45.064 M/sec (2.85%)
+ 3842573 branch-misses # 6.67% of all branches (97.15%)
+ 281616097 L1-dcache-loads # 220.390 M/sec (97.15%)
+ 4220975 L1-dcache-load-misses # 1.50% of all L1-dcache accesses (97.15%)
+ <not supported> LLC-loads
+ <not supported> LLC-load-misses
+
+ 1.281746009 seconds time elapsed
+
+ 0.508796000 seconds user
+ 0.773209000 seconds sys
+
+
+ 52
+
+The availability of events and metrics depends on the system you are running.
+
+`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__
+
+**Perfetto**
+~~~~~~~~~~~~
+
+A set of tools to measure and analyze how well applications and systems perform.
+You can use it to:
+
+* identify bottlenecks
+* optimize code
+* make software run faster and more efficiently.
+
+**What is the difference between perfetto and perf?**
+
+* perf is tool as part of and specialized for the Linux Kernel and has CLI user
+ interface.
+* perfetto cross-platform performance analysis stack, has extended
+ functionality into userspace and provides a WEB user interface.
+
+`Full documentation <https://perfetto.dev/docs/>`__
+
+.. _kernel_panic_analysis_tools:
+
+Kernel panic analysis tools
+---------------------------
+
+ To analyse the crash dump please use `Kdump` & `Kexec`.
+
+ `Full documentation <../admin-guide/kdump/kdump.html>`__
+
+ In order to find the corresponding line in the code you can use `faddr2line
+ <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note
+ that you need to enable `CONFIG_DEBUG_INFO` for that to work.
+
+ An alternative to using `faddr2line` is the use of `objdump` (and it's
+ derivatives for the different platforms like `aarch64-linux-gnu-objdump`),
+ take this line as an example:
+
+ `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`.
+
+ We can find the corresponding line of code by executing:
+ ::
+
+ aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40
+ 0000000000000ac8 <rkvdec_device_run>:
+ ac8: d503201f nop
+ acc: d503201f nop
+ {
+ ad0: d503233f paciasp
+ ad4: a9bd7bfd stp x29, x30, [sp, #-48]!
+ ad8: 910003fd mov x29, sp
+ adc: a90153f3 stp x19, x20, [sp, #16]
+ ae0: a9025bf5 stp x21, x22, [sp, #32]
+ const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
+ ae4: f9411814 ldr x20, [x0, #560]
+ struct rkvdec_dev *rkvdec = ctx->dev;
+ ae8: f9418015 ldr x21, [x0, #768]
+ if (WARN_ON(!desc))
+ aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec>
+ ret = pm_runtime_resume_and_get(rkvdec->dev);
+ af0: f943d2b6 ldr x22, [x21, #1952]
+ ret = __pm_runtime_resume(dev, RPM_GET_PUT);
+ af4: aa0003f3 mov x19, x0
+ af8: 52800081 mov w1, #0x4 // #4
+ afc: aa1603e0 mov x0, x22
+ b00: 94000000 bl 0 <__pm_runtime_resume>
+ if (ret < 0) {
+ b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4>
+ dev_warn(rkvdec->dev, "Not good\n");
+ b08: f943d2a0 ldr x0, [x21, #1952]
+ b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8>
+ b10: 91000021 add x1, x1, #0x0
+ b14: 94000000 bl 0 <_dev_warn>
+ *bad = 1;
+ b18: d2800001 mov x1, #0x0 // #0
+ ...
+
+
+ To find the matching line we just have to add `0x50` (from
+ `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8
+ <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`.
+
+**Copyright** |copy| 2024 : Collabora
diff --git a/Documentation/index.rst b/Documentation/index.rst
index f9f525f4c0dd..eb8de7ba8e41 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -57,6 +57,7 @@ Various other manuals with useful information for all kernel developers.
Testing guide <dev-tools/testing-overview>
Hacking guide <kernel-hacking/index>
Tracing <trace/index>
+ Debugging <debugging/index>
Fault injection <fault-injection/index>
Livepatching <livepatch/index>
Rust <rust/index>
@@ -76,6 +77,7 @@ developers seeking information on the kernel's user-space APIs.
Build system <kbuild/index>
Reporting issues <admin-guide/reporting-issues.rst>
Userspace tools <tools/index>
+ Userspace debugging tools <debugging/userspace_debugging_guide.rst>
Userspace API <userspace-api/index>
See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_,
--
2.25.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem
2024-09-24 8:45 [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
2024-09-24 8:45 ` [PATCH RFC v2 1/3] docs: media: Create separate documentation folder for media Sebastian Fricke
2024-09-24 8:45 ` [PATCH RFC v2 2/3] docs: Add guides section for debugging Sebastian Fricke
@ 2024-09-24 8:45 ` Sebastian Fricke
2024-10-03 22:32 ` Steve Cho
` (2 more replies)
2024-10-22 15:23 ` [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
2024-10-24 22:01 ` Jonathan Corbet
4 siblings, 3 replies; 18+ messages in thread
From: Sebastian Fricke @ 2024-09-24 8:45 UTC (permalink / raw)
To: Jonathan Corbet
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne, Sebastian Fricke
Create a guides section for all documentation material, that isn't
strictly related to a specific piece of code.
Provide a guide for developers on how to debug code with a focus on the
media subsystem. This document aims to provide a rough overview over the
possibilities and a rational to help choosing the right tool for the
given circumstances.
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
---
Documentation/media/guides/debugging_issues.rst | 174 ++++++++++++++++++++++++
Documentation/media/guides/index.rst | 11 ++
Documentation/media/index.rst | 1 +
3 files changed, 186 insertions(+)
diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst
new file mode 100644
index 000000000000..5f37801dd4ba
--- /dev/null
+++ b/Documentation/media/guides/debugging_issues.rst
@@ -0,0 +1,174 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+============================================
+Debugging and tracing in the media subsystem
+============================================
+
+This document serves as a starting point and lookup for debugging device
+drivers in the media subsystem.
+
+.. contents::
+ :depth: 3
+
+General debugging advice
+========================
+
+For general advice see the `general-debugging-guide <../../debugging/index.html>`__.
+
+Available tools
+===============
+
+dev_debug module parameter
+--------------------------
+
+For a general overview please see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
+
+Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background.
+::
+
+ # cat /sys/class/video4linux/video3/name
+ rkvdec
+ # echo 0xff > /sys/class/video4linux/video3/dev_debug
+ # dmesg -wH
+ [...] videodev: v4l2_open: video3: open (0)
+ [ +0.000036] video3: VIDIOC_QUERYCAP: driver=rkvdec, card=rkvdec, bus=platform:rkvdec, version=0x00060900, capabilities=0x84204000, device_caps=0x04204000
+
+`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__
+
+dev_dbg / v4l2_dbg
+------------------
+
+- Difference between both?
+
+ - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug
+ - dev_dbg can be targeted by dynamic debug
+ - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log
+
+Dynamic debug
+-------------
+
+For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
+
+Here is one example, that enables all available `pr_debug()`'s within the file:
+::
+
+ $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
+ $ ddcmd '-p; file v4l2-h264.c +p'
+ $ grep =p /proc/dynamic_debug/control
+ drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
+ drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
+
+Ftrace
+------
+
+For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
+
+Trace whenever the `rkvdec_try_ctrl` function is called
+::
+
+ $ cd /sys/kernel/tracing
+ $ echo function > /sys/kernel/tracing/current_tracer
+ $ echo rkvdec_try_ctrl > set_ftrace_filter
+ $ echo 1 > tracing_on
+ $ cat trace
+ h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster
+ h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster
+
+Find out from where the calls originate
+::
+
+ $ echo 1 > options/func_stack_trace
+ h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster
+ h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace>
+ => rkvdec_try_ctrl
+ => try_or_set_cluster
+ => try_set_ext_ctrls_common
+ => try_set_ext_ctrls
+ => v4l2_s_ext_ctrls
+ => v4l_s_ext_ctrls
+ ...
+ h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster
+ h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace>
+ => rkvdec_try_ctrl
+ => try_or_set_cluster
+ => v4l2_ctrl_request_setup
+ => rkvdec_run_preamble
+ => rkvdec_h264_run
+ => rkvdec_device_run
+ ...
+
+Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`)
+::
+
+ echo function_graph > current_tracer
+ echo rkvdec_h264_run > set_graph_function
+ echo 4 > max_graph_depth
+ echo do_interrupt_handler mutex_* > set_graph_notrace
+ echo 1 > options/funcgraph-retval
+ ...
+ 4) | rkvdec_h264_run [rockchip_vdec]() {
+ 4) | v4l2_ctrl_find [videodev]() {
+ ...
+ 4) | rkvdec_run_preamble [rockchip_vdec]() {
+ 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */
+ ...
+ 4) | v4l2_ctrl_request_setup [videodev]() {
+ 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */
+ 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */
+ ...
+ 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */
+ 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */
+ 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */
+ ...
+ 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() {
+ ...
+ 4) | rkvdec_run_postamble [rockchip_vdec]() {
+ ...
+ 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */
+
+DebugFS
+-------
+
+For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
+
+Perf & alternatives
+-------------------
+
+For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
+
+Example for media devices:
+
+Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__)
+::
+
+ perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1
+ ...
+ Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v':
+
+ 7794.23 msec task-clock:u # 0.697 CPUs utilized
+ 0 context-switches:u # 0.000 /sec
+ 0 cpu-migrations:u # 0.000 /sec
+ 11901 page-faults:u # 1.527 K/sec
+ 882671556 cycles:u # 0.113 GHz (95.79%)
+ 711708695 instructions:u # 0.81 insn per cycle (95.79%)
+ 10581935 branches:u # 1.358 M/sec (15.13%)
+ 6871144 branch-misses:u # 64.93% of all branches (95.79%)
+ 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%)
+ 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%)
+ <not supported> LLC-loads:u
+ <not supported> LLC-load-misses:u
+
+ 11.180830431 seconds time elapsed
+
+ 1.502318000 seconds user
+ 6.377221000 seconds sys
+
+The availability of events and metrics depends on the system you are running.
+
+Error checking & panic analysis
+-------------------------------
+
+For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
+
+**Copyright** |copy| 2024 : Collabora
diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst
new file mode 100644
index 000000000000..0008966c0862
--- /dev/null
+++ b/Documentation/media/guides/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Media Guides
+============
+
+.. toctree::
+ :caption: Table of Contents
+ :maxdepth: 1
+
+ debugging_issues
diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst
index d056a9e99dca..5461876fc401 100644
--- a/Documentation/media/index.rst
+++ b/Documentation/media/index.rst
@@ -7,6 +7,7 @@ Media Subsystem Documentation
.. toctree::
:maxdepth: 2
+ guides/index
../userspace-api/media/index
../driver-api/media/index.rst
../admin-guide/media/index
--
2.25.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 2/3] docs: Add guides section for debugging
2024-09-24 8:45 ` [PATCH RFC v2 2/3] docs: Add guides section for debugging Sebastian Fricke
@ 2024-09-25 8:01 ` Bagas Sanjaya
2024-10-03 23:06 ` Steve Cho
2024-10-24 22:20 ` Jonathan Corbet
2 siblings, 0 replies; 18+ messages in thread
From: Bagas Sanjaya @ 2024-09-25 8:01 UTC (permalink / raw)
To: Sebastian Fricke, Jonathan Corbet
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne
[-- Attachment #1: Type: text/plain, Size: 3017 bytes --]
On Tue, Sep 24, 2024 at 10:45:58AM +0200, Sebastian Fricke wrote:
> +This document serves as a general starting point and lookup for debugging device
> +drivers.
> +While this guide focuses on debugging that requires re-compiling the
> +module/kernel, the `userspace-debugging-guide <userspace_debugging_guide.html>`__
> +will guide you through tools like dynamic debug, ftrace and other tools useful
> +for debugging issues and behavior.
> +For general debugging advice, see `general-debugging-guide <index.html>`__.
You can use :doc: syntax for linking to other docs with custom anchor text,
like:
```
:doc:`userspace debugging guide <userspace_debugging_guide>`
```
(note the file suffix omission).
Better yet, specify the full doc path (e.g.
`Documentation/debugging/userspace_debugging_guide.rst`) and Sphinx will
generate the anchor text with target doc's title.
> +====================================================
> +General debugging advice for Linux Kernel developers
> +====================================================
> +
> +.. toctree::
> + :maxdepth: 1
> +
> + driver_development_debugging_guide
> + userspace_debugging_guide
> +
> +.. only:: subproject and html
> +
> + Indices
> + =======
> +
> + * :ref:`genindex`
> +
> +General debugging advice
> +========================
Please split general debugging advice into its own doc (e.g. at
general-advice.rst). Most other docs have index.rst only for toctree
listing.
> +**When should you use this over** `Ftrace`_ **?**
> +
> +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development
Sphinx complains about stray underscores:
Documentation/debugging/userspace_debugging_guide.rst:54: WARNING: Mismatch: both interpreted text role prefix and reference suffix.
I have to trim them:
---- >8 ----
diff --git a/Documentation/debugging/userspace_debugging_guide.rst b/Documentation/debugging/userspace_debugging_guide.rst
index 4d269a9ef9..eac459e30f 100644
--- a/Documentation/debugging/userspace_debugging_guide.rst
+++ b/Documentation/debugging/userspace_debugging_guide.rst
@@ -51,7 +51,7 @@ Here is one example, that enables all available `pr_debug()`'s within the file:
**When should you use this over** `Ftrace`_ **?**
-- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development
+- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints>` or when you have added multiple pr_debug() statements during development
- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays
- When you care more about receiving specific log messages than tracing the pattern of how a function is called
Thanks.
--
An old man doll... just what I always wanted! - Clara
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem
2024-09-24 8:45 ` [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem Sebastian Fricke
@ 2024-10-03 22:32 ` Steve Cho
2024-10-14 17:42 ` Daniel Almeida
2024-10-23 11:43 ` Hans Verkuil
2 siblings, 0 replies; 18+ messages in thread
From: Steve Cho @ 2024-10-03 22:32 UTC (permalink / raw)
To: Sebastian Fricke
Cc: Jonathan Corbet, linux-doc, linux-kernel, linux-media,
laurent.pinchart, hverkuil-cisco, mauro.chehab, kernel,
bob.beckett, nicolas.dufresne
Thank you Sebastian. I found this overview to be helpful.
Information about different tools are scattered around, so sometimes
not easy to find.
On Tue, Sep 24, 2024 at 1:47 AM Sebastian Fricke
<sebastian.fricke@collabora.com> wrote:
>
> Create a guides section for all documentation material, that isn't
> strictly related to a specific piece of code.
>
> Provide a guide for developers on how to debug code with a focus on the
> media subsystem. This document aims to provide a rough overview over the
> possibilities and a rational to help choosing the right tool for the
> given circumstances.
>
> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> ---
> Documentation/media/guides/debugging_issues.rst | 174 ++++++++++++++++++++++++
> Documentation/media/guides/index.rst | 11 ++
> Documentation/media/index.rst | 1 +
> 3 files changed, 186 insertions(+)
>
> diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst
> new file mode 100644
> index 000000000000..5f37801dd4ba
> --- /dev/null
> +++ b/Documentation/media/guides/debugging_issues.rst
> @@ -0,0 +1,174 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +============================================
> +Debugging and tracing in the media subsystem
> +============================================
> +
> +This document serves as a starting point and lookup for debugging device
> +drivers in the media subsystem.
Also can be used for debugging from user space as you mentioned in the
cover letter.
> +
> +.. contents::
> + :depth: 3
> +
> +General debugging advice
> +========================
> +
> +For general advice see the `general-debugging-guide <../../debugging/index.html>`__.
> +
> +Available tools
> +===============
> +
> +dev_debug module parameter
> +--------------------------
> +
> +For a general overview please see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> +
> +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background.
> +::
> +
> + # cat /sys/class/video4linux/video3/name
> + rkvdec
> + # echo 0xff > /sys/class/video4linux/video3/dev_debug
> + # dmesg -wH
> + [...] videodev: v4l2_open: video3: open (0)
> + [ +0.000036] video3: VIDIOC_QUERYCAP: driver=rkvdec, card=rkvdec, bus=platform:rkvdec, version=0x00060900, capabilities=0x84204000, device_caps=0x04204000
mentioning limitations or known issues for tools would be helpful.
Yes, I saw you did it for some of tools.
e.g. I find dmesg to be helpful to scan through different ioctl calls
happening,
but it doesn't necessarily provide details about detailed setups for
each ioctl call.
> +
> +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__
> +
> +dev_dbg / v4l2_dbg
> +------------------
> +
> +- Difference between both?
> +
> + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug
> + - dev_dbg can be targeted by dynamic debug
> + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log
> +
> +Dynamic debug
> +-------------
> +
> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> +
> +Here is one example, that enables all available `pr_debug()`'s within the file:
> +::
> +
> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
> + $ ddcmd '-p; file v4l2-h264.c +p'
> + $ grep =p /proc/dynamic_debug/control
> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
As you know, dynamic debug was not consistently working for our team
for multiple reasons.
Mentioning few known issues or FAQ would be helpful.
> +
> +Ftrace
> +------
> +
> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
general advice link is here, but shouldn't we at least provide 1
sentence summary for each tool?
Also, brief explanation about the below examples would be helpful too.
Sometimes people can get lost what you are trying to highlight here.
> +
> +Trace whenever the `rkvdec_try_ctrl` function is called
> +::
> +
> + $ cd /sys/kernel/tracing
> + $ echo function > /sys/kernel/tracing/current_tracer
> + $ echo rkvdec_try_ctrl > set_ftrace_filter
> + $ echo 1 > tracing_on
> + $ cat trace
> + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster
> +
> +Find out from where the calls originate
> +::
> +
> + $ echo 1 > options/func_stack_trace
> + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace>
> + => rkvdec_try_ctrl
> + => try_or_set_cluster
> + => try_set_ext_ctrls_common
> + => try_set_ext_ctrls
> + => v4l2_s_ext_ctrls
> + => v4l_s_ext_ctrls
> + ...
> + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace>
> + => rkvdec_try_ctrl
> + => try_or_set_cluster
> + => v4l2_ctrl_request_setup
> + => rkvdec_run_preamble
> + => rkvdec_h264_run
> + => rkvdec_device_run
> + ...
> +
> +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`)
> +::
> +
> + echo function_graph > current_tracer
> + echo rkvdec_h264_run > set_graph_function
> + echo 4 > max_graph_depth
> + echo do_interrupt_handler mutex_* > set_graph_notrace
> + echo 1 > options/funcgraph-retval
> + ...
> + 4) | rkvdec_h264_run [rockchip_vdec]() {
> + 4) | v4l2_ctrl_find [videodev]() {
> + ...
> + 4) | rkvdec_run_preamble [rockchip_vdec]() {
> + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */
> + ...
> + 4) | v4l2_ctrl_request_setup [videodev]() {
> + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */
> + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */
> + ...
> + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */
> + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */
> + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */
> + ...
> + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() {
> + ...
> + 4) | rkvdec_run_postamble [rockchip_vdec]() {
> + ...
> + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */
> +
> +DebugFS
> +-------
> +
> +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> +
> +Perf & alternatives
> +-------------------
> +
> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> +
> +Example for media devices:
> +
> +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__)
> +::
> +
> + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1
Some people might have no idea what this cmd is trying to do. :)
> + ...
> + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v':
> +
> + 7794.23 msec task-clock:u # 0.697 CPUs utilized
> + 0 context-switches:u # 0.000 /sec
> + 0 cpu-migrations:u # 0.000 /sec
> + 11901 page-faults:u # 1.527 K/sec
> + 882671556 cycles:u # 0.113 GHz (95.79%)
> + 711708695 instructions:u # 0.81 insn per cycle (95.79%)
> + 10581935 branches:u # 1.358 M/sec (15.13%)
> + 6871144 branch-misses:u # 64.93% of all branches (95.79%)
> + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%)
> + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%)
> + <not supported> LLC-loads:u
> + <not supported> LLC-load-misses:u
> +
> + 11.180830431 seconds time elapsed
> +
> + 1.502318000 seconds user
> + 6.377221000 seconds sys
> +
> +The availability of events and metrics depends on the system you are running.
> +
> +Error checking & panic analysis
> +-------------------------------
> +
> +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> +
> +**Copyright** |copy| 2024 : Collabora
> diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst
> new file mode 100644
> index 000000000000..0008966c0862
> --- /dev/null
> +++ b/Documentation/media/guides/index.rst
> @@ -0,0 +1,11 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============
> +Media Guides
> +============
> +
> +.. toctree::
> + :caption: Table of Contents
> + :maxdepth: 1
> +
> + debugging_issues
> diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst
> index d056a9e99dca..5461876fc401 100644
> --- a/Documentation/media/index.rst
> +++ b/Documentation/media/index.rst
> @@ -7,6 +7,7 @@ Media Subsystem Documentation
> .. toctree::
> :maxdepth: 2
>
> + guides/index
> ../userspace-api/media/index
> ../driver-api/media/index.rst
> ../admin-guide/media/index
>
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 2/3] docs: Add guides section for debugging
2024-09-24 8:45 ` [PATCH RFC v2 2/3] docs: Add guides section for debugging Sebastian Fricke
2024-09-25 8:01 ` Bagas Sanjaya
@ 2024-10-03 23:06 ` Steve Cho
2024-10-04 10:12 ` Sebastian Fricke
2024-10-24 22:20 ` Jonathan Corbet
2 siblings, 1 reply; 18+ messages in thread
From: Steve Cho @ 2024-10-03 23:06 UTC (permalink / raw)
To: Sebastian Fricke
Cc: Jonathan Corbet, linux-doc, linux-kernel, linux-media,
laurent.pinchart, hverkuil-cisco, mauro.chehab, kernel,
bob.beckett, nicolas.dufresne
Few minor comments.
On Tue, Sep 24, 2024 at 1:47 AM Sebastian Fricke
<sebastian.fricke@collabora.com> wrote:
>
> This idea was formed after noticing that new developers experience
> certain difficulty to navigate within the multitude of different
> debugging options in the Kernel and while there often is good
> documentation for the tools, the developer has to know first that they
> exist and where to find them.
> Add a general debugging section to the Kernel documentation, as an
> easily locatable entry point to other documentation and as a general
> guideline for the topic.
>
> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> ---
> .../driver_development_debugging_guide.rst | 193 +++++++++++++++
> Documentation/debugging/index.rst | 66 +++++
> .../debugging/userspace_debugging_guide.rst | 269 +++++++++++++++++++++
> Documentation/index.rst | 2 +
> 4 files changed, 530 insertions(+)
>
> diff --git a/Documentation/debugging/driver_development_debugging_guide.rst b/Documentation/debugging/driver_development_debugging_guide.rst
> new file mode 100644
> index 000000000000..c750f63ac1d3
> --- /dev/null
> +++ b/Documentation/debugging/driver_development_debugging_guide.rst
> @@ -0,0 +1,193 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +========================================
> +Debugging advice for driver development
> +========================================
> +
> +This document serves as a general starting point and lookup for debugging device
> +drivers.
> +While this guide focuses on debugging that requires re-compiling the
> +module/kernel, the `userspace-debugging-guide <userspace_debugging_guide.html>`__
> +will guide you through tools like dynamic debug, ftrace and other tools useful
> +for debugging issues and behavior.
> +For general debugging advice, see `general-debugging-guide <index.html>`__.
> +
> +.. contents::
> + :depth: 3
> +
> +Available tools
> +===============
> +
> +Printk & friends
> +----------------
> +
> +These are derivatives of printf() with varying destinations and support for being dynamically turned on or off, or lack thereof.
> +
> +.. _printk:
> +
> +**Simple printk**
> +~~~~~~~~~~~~~~~~~
> +
> +The classic, can be used to great effect for quick and dirty development
> +of new modules or to extract arbitrary necessary data for troubleshooting.
> +
> +Prerequisite: :code:`CONFIG_PRINTK` (usually enabled by default)
> +
> +**Pros**:
> +
> +- No need to learn anything, simple to use
> +- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../core-api/printk-formats.html>`__), visibility in the log)
> +- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor)
I experienced this too before, but isn't it generalizing too much here?
I mean it is not easy to observe timing issue with printk.
> +
> +**Cons**:
> +
> +- Requires rebuilding the kernel/module
> +- Can cause delays in the execution of the code (which can cause issues to be not reproducible)
> +
> +`Full documentation <../core-api/printk-basics.html>`__
> +
> +.. _trace_printk:
> +
> +**Trace_printk**
> +~~~~~~~~~~~~~~~~
> +
> +Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE` & :code:`#include <linux/ftrace.h>`
> +
> +It is a tiny bit less comfortable to use than `printk`_, because you will have
> +to read the messages from the trace file (See: `Reading the ftrace log
> +<userspace_debugging_guide.html#read-the-ftrace-log>`_ instead of from the
> +kernel log, but very useful when printk adds unwanted delays into the code
> +execution, causing issues to be flaky or hidden.)
> +
> +If the processing of this still causes timing issues then you can try `trace_puts()`.
> +
> +`Full Documentation <../driver-api/basics.html#c.trace_printk>`__
> +
> +**dev_dbg**
> +~~~~~~~~~~~
> +
> +Print statement, which can be target by `dynamic debug
nit: s/target/targeted
> +<userspace_debugging_guide.html#dynamic-debug>`__, that contains additional
> +information about the device used within the context.
> +
> +**When is it appropriate to leave a debug print in the code?**
> +
> +Permanent debug statements have to be useful for a developer to troubleshoot
> +driver misbehavior. Judging that is a bit more of an art than a science, but
> +some guidelines are in the `Coding style guide
> +<../process/coding-style.html#printing-kernel-messages>`__.
> +
> +**Custom printk**
> +~~~~~~~~~~~~~~~~~
> +
> +Example:
> +::
> +
> + #define core_dbg(fmt, arg...) do { \
> + if (core_debug) \
> + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
> + } while (0)
> +
> +**When should you do this?**
> +
> +It is better to just use a `pr_debug()`, which can later be turned on/off with
> +dynamic debug. Additionally, a lot of drivers activate these prints via a
> +variable like `core_debug` set by a module parameter. However, Module
> +parameters `are not recommended anymore
> +<https://lkml.org/lkml/2024/3/27/163>`_.
> +
> +Ftrace
> +------
> +
> +**Creating custom Ftrace tracepoint**
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Here is a basic description of `how to implement new tracepoints <../trace/tracepoints.html#usage>`__.
> +
> +`Full event tracing documentation <../trace/events.html>`__
> +
> +`Full Ftrace documentation <../trace/ftrace.html>`__
> +
> +DebugFS
> +-------
> +
> +Prerequisite: :code:`CONFIG_DEBUG_FS` & :code:`#include <linux/debugfs.h>`
> +
> +DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files.
> +With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver.
> +Possible use-cases among others:
> +
> +- Store register values
> +- Keep track of variables
> +- Store errors
> +- Store settings
> +- Toggle a setting like debug on/off
> +- Error injection
> +
> +This is especially useful, when the size of a data dump would be hard to digest as
> +part of the general kernel log (for example when dumping raw bitstream data) or
> +when you are not interested in all the values all the time, but with the
> +possibility to inspect them.
> +
> +The general idea is:
> +
> +- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`)
> +- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`)
> +
> + - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all)
> + - any update of `my_variable` will update the value in the file
> +
> +- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`)
> +
> +`Full documentation <../filesystems/debugfs.html>`__
> +
> +.. _error_checking:
> +
> +KASAN, UBSAN, lockdep and other error checkers
> +----------------------------------------------
> +
> +KASAN (Kernel Address Sanitizer)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +KASAN is a dynamic memory error detector that helps to find use-after-free and
> +out-of-bounds bugs. It uses compile-time instrumentation to check every memory
> +access.
> +
> +`Full documentation <../dev-tools/kasan.html>`__
> +
> +UBSAN (Undefined Behavior Sanitizer)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +UBSAN relies on compiler instrumentation and runtime checks to detect undefined
> +behavior. It is designed to find a variety of issues, including signed integer overflow,
> +array index out of bounds, and more.
> +
> +`Full documentation <../dev-tools/ubsan.html>`__
> +
> +lockdep (Lock Dependency Validator)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +lockdep is a runtime lock dependency validator that detects potential deadlocks
> +and other locking-related issues in the kernel.
> +It tracks lock acquisitions and releases, building a dependency graph that is
> +analyzed for potential deadlocks.
> +lockdep is especially useful for validating the correctness of lock ordering in
> +the kernel.
> +
> +device coredump
> +---------------
> +
> +Prerequisite: :code:`#include <linux/devcoredump.h>`
> +
> +Provides infrastructure through which the driver can provide arbitrary data to
> +userland. It is most often used in conjunction with udev or similar userland
> +infrastructure to listen for the kernel uevents, which indicates the dump is
> +ready. Udev then usually has rules to copy that file somewhere for long-term
> +storage and analysis as by default the data for the dump is automatically
nit: didn't understand. maybe intended s/as/and/ ?
> +cleaned up after 5 minutes.
> +That data is then analyzed with driver-specific tools or GDB.
> +
> +You can find an example implementation at: :code:`drivers/media/platform/qcom/venus/core.c`
> +
> +**Copyright** |copy| 2024 : Collabora
> diff --git a/Documentation/debugging/index.rst b/Documentation/debugging/index.rst
> new file mode 100644
> index 000000000000..7bdad2fa09e1
> --- /dev/null
> +++ b/Documentation/debugging/index.rst
> @@ -0,0 +1,66 @@
> +
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +====================================================
> +General debugging advice for Linux Kernel developers
> +====================================================
> +
> +.. toctree::
> + :maxdepth: 1
> +
> + driver_development_debugging_guide
> + userspace_debugging_guide
> +
> +.. only:: subproject and html
> +
> + Indices
> + =======
> +
> + * :ref:`genindex`
> +
> +General debugging advice
> +========================
> +
> +Depending on the issue, a different set of tools is available to track down the
> +problem or even to realize whether there is one in the first place.
> +
> +As a first step you have to figure out what kind of issue you want to debug.
> +Depending on the answer, your methodology and choice of tools may vary.
> +
> +Do I need to debug with limited access?
> +---------------------------------------
> +
> +Do you have limited access to the machine or are you unable to stop the running execution?
> +
> +In this case your debugging capability depends on built-in debugging support of
> +provided distro kernel.
> +The `userspace debugging guide <userspace_debugging_guide.html>`__ provides a
> +brief overview over range of possible debugging tools in that situation. You
> +can check the capability of your kernel, in most cases, by looking into config
> +file within the /boot folder.
> +
> +Do I have root access to the system?
> +------------------------------------
> +
> +Are you easily able to replace the module in question or to install a new kernel?
> +
> +In that case your range of available tools is a lot bigger, you can find the
> +tools `here <driver_development_debugging_guide.html>`__.
> +
> +Is timing a factor?
> +-------------------
> +
> +It is important to understand if the problem you want to debug manifests itself
> +consistently (i.e. given a set of inputs you always get the same, incorrect
> +output), or inconsistently. If it manifests itself inconsistently, some timing
> +factor might be at play. If inserting delays into the code does change the
> +behavior, then quite likely timing is a factor.
> +
> +When timing does alter the outcome of the code execution using a simple `printk
> +<driver_development_debugging_guide.html#printk>`_ for debugging purposes won't
> +work, a similar alternative is to use `trace_printk
> +<driver_development_debugging_guide.html#trace-printk>`_, which logs the debug
> +messages to the trace file instead of the kernel log.
> +
> +**Copyright** |copy| 2024 : Collabora
> diff --git a/Documentation/debugging/userspace_debugging_guide.rst b/Documentation/debugging/userspace_debugging_guide.rst
> new file mode 100644
> index 000000000000..4d269a9ef913
> --- /dev/null
> +++ b/Documentation/debugging/userspace_debugging_guide.rst
> @@ -0,0 +1,269 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +==========================
> +Userspace debugging advice
> +==========================
> +
> +A brief overview of common tools to debug the Linux Kernel from userspace.
> +For debugging advice aimed at driver developer go `here <driver_development_debugging_guide.html>`__.
> +For general debugging advice, see `general-debugging-guide <index.html>`__.
> +
> +.. contents::
> + :depth: 3
> +
> +Available tools
> +===============
> +
> +Dynamic debug
> +-------------
> +
> +Mechanism to filter what ends up in the kernel log by dis-/en-abling log
> +messages.
> +
> +Prerequisite: `CONFIG_DYNAMIC_DEBUG`
> +
> +.. _valid_dyndbg_prints:
> +
> +Dynamic debug is only able to target:
> +
> +- `pr_debug()`
> +- `dev_dbg()`
> +- `print_hex_dump_debug()`
> +- `print_hex_dump_bytes()`
> +
> +Therefore the usability of this tool is quite limited in the media subsystem,
> +because, as of now, there is no uniform rule for adding debug prints to the codebase,
> +resulting in a variety of ways these prints are implemented.
Shouldn't this explanation and some examples below go to the media
specific document instead of here?
Some parts in this patch are tricky to decide where to go, but
suggesting to review once again.
> +
> +Also, note that most debug statements are implemented as a variation of
> +`dprintk`, which have to be activated via a parameter in respective module,
> +dynamic debug is unable to do that step for you.
> +
> +Here is one example, that enables all available `pr_debug()`'s within the file:
> +::
> +
> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
> + $ ddcmd '-p; file v4l2-h264.c +p'
> + $ grep =p /proc/dynamic_debug/control
> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
> +
> +**When should you use this over** `Ftrace`_ **?**
> +
> +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development
> +- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays
> +- When you care more about receiving specific log messages than tracing the pattern of how a function is called
> +
> +`Full documentation <../admin-guide/dynamic-debug-howto.html>`__
> +
> +Ftrace
> +------
> +
> +Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE`
> +
> +Trace whenever the a file is opened:
> +::
> +
> + $ cd /sys/kernel/tracing
> + $ echo function > /sys/kernel/tracing/current_tracer
> + $ echo do_filep_open > set_ftrace_filter
> + $ echo 1 > tracing_on
> + $ cat trace
> + find-4624 [005] ...1. 580781.888166: do_filp_open <-do_sys_openat2
> + find-4624 [005] ...1. 580781.888237: do_filp_open <-do_sys_openat2
> + find-4624 [005] ...1. 580781.888361: do_filp_open <-do_sys_openat2
> +
> +.. _event_tracing:
> +
> +Activate a ftrace event on top of that:
> +::
> +
> + $ echo 1 > events/kmem/kfree/enable
> + find-5351 [005] ...1. 678288.910143: do_filp_open <-do_sys_openat2
> + find-5351 [005] ..... 678288.910185: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
> + find-5351 [005] ...1. 678288.910218: do_filp_open <-do_sys_openat2
> + find-5351 [005] ..... 678288.910260: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
> + find-5351 [005] ...1. 678288.910293: do_filp_open <-do_sys_openat2
> + find-5351 [005] ..... 678288.910345: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
> + find-5351 [005] ..... 678288.910389: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000007ba73e40
> + find-5351 [005] ..... 678288.910390: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000009e4850bc
> + find-5351 [005] ..... 678288.910391: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000004156f20f
> + find-5351 [005] ..... 678288.910393: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=00000000c7207e20
> + find-5351 [005] ..... 678288.910394: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000037c31d76
> + find-5351 [005] ..... 678288.910395: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000001922677
> +
> +.. _read_ftrace_log:
> +
> +**Reading the ftrace log**
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The `trace` file can be read just like any other file (`cat`, `tail`, `head`,
> +`vim`, etc.), the size of the file is limited by the `buffer_size_kb` (``echo
> +1000 > buffer_size_kb``). The `trace_pipe` will behave similar to the `trace`
> +file, but whenever you read from the file the content is consumed.
> +
> +**Kernelshark**
> +~~~~~~~~~~~~~~~
> +
> +A GUI interface to visualize the traces as a graph and list view from the
> +output of the `trace-cmd
> +<https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/>`__ application.
> +
> +`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_
> +
> +`Full Ftrace documentation <../trace/ftrace.html>`__
> +
> +Perf & alternatives
> +-------------------
> +
> +The tools mentioned above provide ways to inspect kernel code, results, variable values, etc.
> +Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue.
nit: add comma. better readability.
s/and for those cases /and for those cases, /
> +
> +.. _performance:
> +
> +**Why should you do a performance analysis?**
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +A performance analysis is a good first step when among other reasons:
> +
> +- you cannot define the issue
> +- you do not know where it occurs
> +- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel
> +
> +.. _linux-tools:
> +
> +**How to do a simple analysis with linux tools?**
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For the start of a performance analysis, you can start with the usual tools like:
> +
> +- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*)
> +- `mpstat -P ALL` (*look at the load distribution among CPUs*)
> +- `iostat -x` (*observe input and output devices utilization and performance*)
> +- `vmstat` (*overview of memory usage on the system*)
> +- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*)
> +- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*)
> +
> +These should help to figure out restrict the areas to look at sufficiently.
nit: s/figure out restrict/narrow down/
> +
> +**Diving deeper with perf**
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The **perf** tool provides a series of metrics and events to further dial down on issues.
> +
> +Prerequisite: build or install perf on your system
> +
> +Gather statistics data for finding all files starting with `gcc` in `/usr`
> +::
> +
> + # perf stat -d find /usr -name 'gcc*' | wc -l
> +
> + Performance counter stats for 'find /usr -name gcc*':
> +
> + 1277.81 msec task-clock # 0.997 CPUs utilized
> + 9 context-switches # 7.043 /sec
> + 1 cpu-migrations # 0.783 /sec
> + 704 page-faults # 550.943 /sec
> + 766548897 cycles # 0.600 GHz (97.15%)
> + 798285467 instructions # 1.04 insn per cycle (97.15%)
> + 57582731 branches # 45.064 M/sec (2.85%)
> + 3842573 branch-misses # 6.67% of all branches (97.15%)
> + 281616097 L1-dcache-loads # 220.390 M/sec (97.15%)
> + 4220975 L1-dcache-load-misses # 1.50% of all L1-dcache accesses (97.15%)
> + <not supported> LLC-loads
> + <not supported> LLC-load-misses
> +
> + 1.281746009 seconds time elapsed
> +
> + 0.508796000 seconds user
> + 0.773209000 seconds sys
> +
> +
> + 52
> +
> +The availability of events and metrics depends on the system you are running.
> +
> +`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__
> +
> +**Perfetto**
> +~~~~~~~~~~~~
> +
> +A set of tools to measure and analyze how well applications and systems perform.
> +You can use it to:
> +
> +* identify bottlenecks
> +* optimize code
> +* make software run faster and more efficiently.
> +
> +**What is the difference between perfetto and perf?**
> +
> +* perf is tool as part of and specialized for the Linux Kernel and has CLI user
> + interface.
> +* perfetto cross-platform performance analysis stack, has extended
> + functionality into userspace and provides a WEB user interface.
> +
> +`Full documentation <https://perfetto.dev/docs/>`__
> +
> +.. _kernel_panic_analysis_tools:
> +
> +Kernel panic analysis tools
> +---------------------------
> +
> + To analyse the crash dump please use `Kdump` & `Kexec`.
> +
> + `Full documentation <../admin-guide/kdump/kdump.html>`__
> +
> + In order to find the corresponding line in the code you can use `faddr2line
> + <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note
> + that you need to enable `CONFIG_DEBUG_INFO` for that to work.
> +
> + An alternative to using `faddr2line` is the use of `objdump` (and it's
> + derivatives for the different platforms like `aarch64-linux-gnu-objdump`),
> + take this line as an example:
> +
> + `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`.
> +
> + We can find the corresponding line of code by executing:
> + ::
> +
> + aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40
> + 0000000000000ac8 <rkvdec_device_run>:
> + ac8: d503201f nop
> + acc: d503201f nop
> + {
> + ad0: d503233f paciasp
> + ad4: a9bd7bfd stp x29, x30, [sp, #-48]!
> + ad8: 910003fd mov x29, sp
> + adc: a90153f3 stp x19, x20, [sp, #16]
> + ae0: a9025bf5 stp x21, x22, [sp, #32]
> + const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
> + ae4: f9411814 ldr x20, [x0, #560]
> + struct rkvdec_dev *rkvdec = ctx->dev;
> + ae8: f9418015 ldr x21, [x0, #768]
> + if (WARN_ON(!desc))
> + aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec>
> + ret = pm_runtime_resume_and_get(rkvdec->dev);
> + af0: f943d2b6 ldr x22, [x21, #1952]
> + ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> + af4: aa0003f3 mov x19, x0
> + af8: 52800081 mov w1, #0x4 // #4
> + afc: aa1603e0 mov x0, x22
> + b00: 94000000 bl 0 <__pm_runtime_resume>
> + if (ret < 0) {
> + b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4>
> + dev_warn(rkvdec->dev, "Not good\n");
> + b08: f943d2a0 ldr x0, [x21, #1952]
> + b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8>
> + b10: 91000021 add x1, x1, #0x0
> + b14: 94000000 bl 0 <_dev_warn>
> + *bad = 1;
> + b18: d2800001 mov x1, #0x0 // #0
> + ...
> +
> +
> + To find the matching line we just have to add `0x50` (from
> + `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8
> + <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`.
I didn't understand this part. Can you explain?
> +
> +**Copyright** |copy| 2024 : Collabora
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index f9f525f4c0dd..eb8de7ba8e41 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -57,6 +57,7 @@ Various other manuals with useful information for all kernel developers.
> Testing guide <dev-tools/testing-overview>
> Hacking guide <kernel-hacking/index>
> Tracing <trace/index>
> + Debugging <debugging/index>
> Fault injection <fault-injection/index>
> Livepatching <livepatch/index>
> Rust <rust/index>
> @@ -76,6 +77,7 @@ developers seeking information on the kernel's user-space APIs.
> Build system <kbuild/index>
> Reporting issues <admin-guide/reporting-issues.rst>
> Userspace tools <tools/index>
> + Userspace debugging tools <debugging/userspace_debugging_guide.rst>
> Userspace API <userspace-api/index>
>
> See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_,
>
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 2/3] docs: Add guides section for debugging
2024-10-03 23:06 ` Steve Cho
@ 2024-10-04 10:12 ` Sebastian Fricke
2024-10-04 23:30 ` Steve Cho
0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Fricke @ 2024-10-04 10:12 UTC (permalink / raw)
To: Steve Cho
Cc: Jonathan Corbet, linux-doc, linux-kernel, linux-media,
laurent.pinchart, hverkuil-cisco, mauro.chehab, kernel,
bob.beckett, nicolas.dufresne
Hey Steve,
On 03.10.2024 16:06, Steve Cho wrote:
>Few minor comments.
>
>On Tue, Sep 24, 2024 at 1:47 AM Sebastian Fricke
><sebastian.fricke@collabora.com> wrote:
>>
>> This idea was formed after noticing that new developers experience
>> certain difficulty to navigate within the multitude of different
>> debugging options in the Kernel and while there often is good
>> documentation for the tools, the developer has to know first that they
>> exist and where to find them.
>> Add a general debugging section to the Kernel documentation, as an
>> easily locatable entry point to other documentation and as a general
>> guideline for the topic.
>>
>> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
>> ---
>> .../driver_development_debugging_guide.rst | 193 +++++++++++++++
>> Documentation/debugging/index.rst | 66 +++++
>> .../debugging/userspace_debugging_guide.rst | 269 +++++++++++++++++++++
>> Documentation/index.rst | 2 +
>> 4 files changed, 530 insertions(+)
>>
>> diff --git a/Documentation/debugging/driver_development_debugging_guide.rst b/Documentation/debugging/driver_development_debugging_guide.rst
>> new file mode 100644
>> index 000000000000..c750f63ac1d3
>> --- /dev/null
>> +++ b/Documentation/debugging/driver_development_debugging_guide.rst
>> @@ -0,0 +1,193 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +.. include:: <isonum.txt>
>> +
>> +========================================
>> +Debugging advice for driver development
>> +========================================
>> +
>> +This document serves as a general starting point and lookup for debugging device
>> +drivers.
>> +While this guide focuses on debugging that requires re-compiling the
>> +module/kernel, the `userspace-debugging-guide <userspace_debugging_guide.html>`__
>> +will guide you through tools like dynamic debug, ftrace and other tools useful
>> +for debugging issues and behavior.
>> +For general debugging advice, see `general-debugging-guide <index.html>`__.
>> +
>> +.. contents::
>> + :depth: 3
>> +
>> +Available tools
>> +===============
>> +
>> +Printk & friends
>> +----------------
>> +
>> +These are derivatives of printf() with varying destinations and support for being dynamically turned on or off, or lack thereof.
>> +
>> +.. _printk:
>> +
>> +**Simple printk**
>> +~~~~~~~~~~~~~~~~~
>> +
>> +The classic, can be used to great effect for quick and dirty development
>> +of new modules or to extract arbitrary necessary data for troubleshooting.
>> +
>> +Prerequisite: :code:`CONFIG_PRINTK` (usually enabled by default)
>> +
>> +**Pros**:
>> +
>> +- No need to learn anything, simple to use
>> +- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../core-api/printk-formats.html>`__), visibility in the log)
>> +- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor)
>I experienced this too before, but isn't it generalizing too much here?
>I mean it is not easy to observe timing issue with printk.
True it is not very easy to do this deterministically, but I thought it
might be helpful as an idea, in case an issue is flaky. A few print
statements might cause enough delay to make the issue more reproducible,
that was at least my experience in the past. Maybe I could word it
better or this potential positive aspect can be derived from the
negative aspect below.
>
>> +
>> +**Cons**:
>> +
>> +- Requires rebuilding the kernel/module
>> +- Can cause delays in the execution of the code (which can cause issues to be not reproducible)
>> +
>> +`Full documentation <../core-api/printk-basics.html>`__
>> +
>> +.. _trace_printk:
>> +
>> +**Trace_printk**
>> +~~~~~~~~~~~~~~~~
>> +
>> +Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE` & :code:`#include <linux/ftrace.h>`
>> +
>> +It is a tiny bit less comfortable to use than `printk`_, because you will have
>> +to read the messages from the trace file (See: `Reading the ftrace log
>> +<userspace_debugging_guide.html#read-the-ftrace-log>`_ instead of from the
>> +kernel log, but very useful when printk adds unwanted delays into the code
>> +execution, causing issues to be flaky or hidden.)
>> +
>> +If the processing of this still causes timing issues then you can try `trace_puts()`.
>> +
>> +`Full Documentation <../driver-api/basics.html#c.trace_printk>`__
>> +
>> +**dev_dbg**
>> +~~~~~~~~~~~
>> +
>> +Print statement, which can be target by `dynamic debug
>nit: s/target/targeted
>
>> +<userspace_debugging_guide.html#dynamic-debug>`__, that contains additional
>> +information about the device used within the context.
>> +
>> +**When is it appropriate to leave a debug print in the code?**
>> +
>> +Permanent debug statements have to be useful for a developer to troubleshoot
>> +driver misbehavior. Judging that is a bit more of an art than a science, but
>> +some guidelines are in the `Coding style guide
>> +<../process/coding-style.html#printing-kernel-messages>`__.
>> +
>> +**Custom printk**
>> +~~~~~~~~~~~~~~~~~
>> +
>> +Example:
>> +::
>> +
>> + #define core_dbg(fmt, arg...) do { \
>> + if (core_debug) \
>> + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
>> + } while (0)
>> +
>> +**When should you do this?**
>> +
>> +It is better to just use a `pr_debug()`, which can later be turned on/off with
>> +dynamic debug. Additionally, a lot of drivers activate these prints via a
>> +variable like `core_debug` set by a module parameter. However, Module
>> +parameters `are not recommended anymore
>> +<https://lkml.org/lkml/2024/3/27/163>`_.
>> +
>> +Ftrace
>> +------
>> +
>> +**Creating custom Ftrace tracepoint**
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +Here is a basic description of `how to implement new tracepoints <../trace/tracepoints.html#usage>`__.
>> +
>> +`Full event tracing documentation <../trace/events.html>`__
>> +
>> +`Full Ftrace documentation <../trace/ftrace.html>`__
>> +
>> +DebugFS
>> +-------
>> +
>> +Prerequisite: :code:`CONFIG_DEBUG_FS` & :code:`#include <linux/debugfs.h>`
>> +
>> +DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files.
>> +With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver.
>> +Possible use-cases among others:
>> +
>> +- Store register values
>> +- Keep track of variables
>> +- Store errors
>> +- Store settings
>> +- Toggle a setting like debug on/off
>> +- Error injection
>> +
>> +This is especially useful, when the size of a data dump would be hard to digest as
>> +part of the general kernel log (for example when dumping raw bitstream data) or
>> +when you are not interested in all the values all the time, but with the
>> +possibility to inspect them.
>> +
>> +The general idea is:
>> +
>> +- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`)
>> +- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`)
>> +
>> + - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all)
>> + - any update of `my_variable` will update the value in the file
>> +
>> +- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`)
>> +
>> +`Full documentation <../filesystems/debugfs.html>`__
>> +
>> +.. _error_checking:
>> +
>> +KASAN, UBSAN, lockdep and other error checkers
>> +----------------------------------------------
>> +
>> +KASAN (Kernel Address Sanitizer)
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +KASAN is a dynamic memory error detector that helps to find use-after-free and
>> +out-of-bounds bugs. It uses compile-time instrumentation to check every memory
>> +access.
>> +
>> +`Full documentation <../dev-tools/kasan.html>`__
>> +
>> +UBSAN (Undefined Behavior Sanitizer)
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +UBSAN relies on compiler instrumentation and runtime checks to detect undefined
>> +behavior. It is designed to find a variety of issues, including signed integer overflow,
>> +array index out of bounds, and more.
>> +
>> +`Full documentation <../dev-tools/ubsan.html>`__
>> +
>> +lockdep (Lock Dependency Validator)
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +lockdep is a runtime lock dependency validator that detects potential deadlocks
>> +and other locking-related issues in the kernel.
>> +It tracks lock acquisitions and releases, building a dependency graph that is
>> +analyzed for potential deadlocks.
>> +lockdep is especially useful for validating the correctness of lock ordering in
>> +the kernel.
>> +
>> +device coredump
>> +---------------
>> +
>> +Prerequisite: :code:`#include <linux/devcoredump.h>`
>> +
>> +Provides infrastructure through which the driver can provide arbitrary data to
>> +userland. It is most often used in conjunction with udev or similar userland
>> +infrastructure to listen for the kernel uevents, which indicates the dump is
>> +ready. Udev then usually has rules to copy that file somewhere for long-term
>> +storage and analysis as by default the data for the dump is automatically
>nit: didn't understand. maybe intended s/as/and/ ?
What I wanted to communicate here is that, by default dumps are cleaned
up automatically, if you want to store the files for later usage you
need to specifiy udev rules, which copy the dumps to a different
location.
I will try to clarify this for the next version
>
>> +cleaned up after 5 minutes.
>> +That data is then analyzed with driver-specific tools or GDB.
>> +
>> +You can find an example implementation at: :code:`drivers/media/platform/qcom/venus/core.c`
>> +
>> +**Copyright** |copy| 2024 : Collabora
>> diff --git a/Documentation/debugging/index.rst b/Documentation/debugging/index.rst
>> new file mode 100644
>> index 000000000000..7bdad2fa09e1
>> --- /dev/null
>> +++ b/Documentation/debugging/index.rst
>> @@ -0,0 +1,66 @@
>> +
>> +.. SPDX-License-Identifier: GPL-2.0
>> +.. include:: <isonum.txt>
>> +
>> +====================================================
>> +General debugging advice for Linux Kernel developers
>> +====================================================
>> +
>> +.. toctree::
>> + :maxdepth: 1
>> +
>> + driver_development_debugging_guide
>> + userspace_debugging_guide
>> +
>> +.. only:: subproject and html
>> +
>> + Indices
>> + =======
>> +
>> + * :ref:`genindex`
>> +
>> +General debugging advice
>> +========================
>> +
>> +Depending on the issue, a different set of tools is available to track down the
>> +problem or even to realize whether there is one in the first place.
>> +
>> +As a first step you have to figure out what kind of issue you want to debug.
>> +Depending on the answer, your methodology and choice of tools may vary.
>> +
>> +Do I need to debug with limited access?
>> +---------------------------------------
>> +
>> +Do you have limited access to the machine or are you unable to stop the running execution?
>> +
>> +In this case your debugging capability depends on built-in debugging support of
>> +provided distro kernel.
>> +The `userspace debugging guide <userspace_debugging_guide.html>`__ provides a
>> +brief overview over range of possible debugging tools in that situation. You
>> +can check the capability of your kernel, in most cases, by looking into config
>> +file within the /boot folder.
>> +
>> +Do I have root access to the system?
>> +------------------------------------
>> +
>> +Are you easily able to replace the module in question or to install a new kernel?
>> +
>> +In that case your range of available tools is a lot bigger, you can find the
>> +tools `here <driver_development_debugging_guide.html>`__.
>> +
>> +Is timing a factor?
>> +-------------------
>> +
>> +It is important to understand if the problem you want to debug manifests itself
>> +consistently (i.e. given a set of inputs you always get the same, incorrect
>> +output), or inconsistently. If it manifests itself inconsistently, some timing
>> +factor might be at play. If inserting delays into the code does change the
>> +behavior, then quite likely timing is a factor.
>> +
>> +When timing does alter the outcome of the code execution using a simple `printk
>> +<driver_development_debugging_guide.html#printk>`_ for debugging purposes won't
>> +work, a similar alternative is to use `trace_printk
>> +<driver_development_debugging_guide.html#trace-printk>`_, which logs the debug
>> +messages to the trace file instead of the kernel log.
>> +
>> +**Copyright** |copy| 2024 : Collabora
>> diff --git a/Documentation/debugging/userspace_debugging_guide.rst b/Documentation/debugging/userspace_debugging_guide.rst
>> new file mode 100644
>> index 000000000000..4d269a9ef913
>> --- /dev/null
>> +++ b/Documentation/debugging/userspace_debugging_guide.rst
>> @@ -0,0 +1,269 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +.. include:: <isonum.txt>
>> +
>> +==========================
>> +Userspace debugging advice
>> +==========================
>> +
>> +A brief overview of common tools to debug the Linux Kernel from userspace.
>> +For debugging advice aimed at driver developer go `here <driver_development_debugging_guide.html>`__.
>> +For general debugging advice, see `general-debugging-guide <index.html>`__.
>> +
>> +.. contents::
>> + :depth: 3
>> +
>> +Available tools
>> +===============
>> +
>> +Dynamic debug
>> +-------------
>> +
>> +Mechanism to filter what ends up in the kernel log by dis-/en-abling log
>> +messages.
>> +
>> +Prerequisite: `CONFIG_DYNAMIC_DEBUG`
>> +
>> +.. _valid_dyndbg_prints:
>> +
>> +Dynamic debug is only able to target:
>> +
>> +- `pr_debug()`
>> +- `dev_dbg()`
>> +- `print_hex_dump_debug()`
>> +- `print_hex_dump_bytes()`
>> +
>> +Therefore the usability of this tool is quite limited in the media subsystem,
>> +because, as of now, there is no uniform rule for adding debug prints to the codebase,
>> +resulting in a variety of ways these prints are implemented.
>Shouldn't this explanation and some examples below go to the media
>specific document instead of here?
Oh yes you are right that has to go to the media guide, I am still a bit
unsure about the example because I haven't found a good general example
so far.
>Some parts in this patch are tricky to decide where to go, but
>suggesting to review once again.
Yes, thanks for highlighting.
>
>> +
>> +Also, note that most debug statements are implemented as a variation of
>> +`dprintk`, which have to be activated via a parameter in respective module,
>> +dynamic debug is unable to do that step for you.
>> +
>> +Here is one example, that enables all available `pr_debug()`'s within the file:
>> +::
>> +
>> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
>> + $ ddcmd '-p; file v4l2-h264.c +p'
>> + $ grep =p /proc/dynamic_debug/control
>> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
>> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
>> +
>> +**When should you use this over** `Ftrace`_ **?**
>> +
>> +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development
>> +- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays
>> +- When you care more about receiving specific log messages than tracing the pattern of how a function is called
>> +
>> +`Full documentation <../admin-guide/dynamic-debug-howto.html>`__
>> +
>> +Ftrace
>> +------
>> +
>> +Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE`
>> +
>> +Trace whenever the a file is opened:
>> +::
>> +
>> + $ cd /sys/kernel/tracing
>> + $ echo function > /sys/kernel/tracing/current_tracer
>> + $ echo do_filep_open > set_ftrace_filter
>> + $ echo 1 > tracing_on
>> + $ cat trace
>> + find-4624 [005] ...1. 580781.888166: do_filp_open <-do_sys_openat2
>> + find-4624 [005] ...1. 580781.888237: do_filp_open <-do_sys_openat2
>> + find-4624 [005] ...1. 580781.888361: do_filp_open <-do_sys_openat2
>> +
>> +.. _event_tracing:
>> +
>> +Activate a ftrace event on top of that:
>> +::
>> +
>> + $ echo 1 > events/kmem/kfree/enable
>> + find-5351 [005] ...1. 678288.910143: do_filp_open <-do_sys_openat2
>> + find-5351 [005] ..... 678288.910185: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
>> + find-5351 [005] ...1. 678288.910218: do_filp_open <-do_sys_openat2
>> + find-5351 [005] ..... 678288.910260: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
>> + find-5351 [005] ...1. 678288.910293: do_filp_open <-do_sys_openat2
>> + find-5351 [005] ..... 678288.910345: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
>> + find-5351 [005] ..... 678288.910389: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000007ba73e40
>> + find-5351 [005] ..... 678288.910390: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000009e4850bc
>> + find-5351 [005] ..... 678288.910391: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000004156f20f
>> + find-5351 [005] ..... 678288.910393: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=00000000c7207e20
>> + find-5351 [005] ..... 678288.910394: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000037c31d76
>> + find-5351 [005] ..... 678288.910395: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000001922677
>> +
>> +.. _read_ftrace_log:
>> +
>> +**Reading the ftrace log**
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +The `trace` file can be read just like any other file (`cat`, `tail`, `head`,
>> +`vim`, etc.), the size of the file is limited by the `buffer_size_kb` (``echo
>> +1000 > buffer_size_kb``). The `trace_pipe` will behave similar to the `trace`
>> +file, but whenever you read from the file the content is consumed.
>> +
>> +**Kernelshark**
>> +~~~~~~~~~~~~~~~
>> +
>> +A GUI interface to visualize the traces as a graph and list view from the
>> +output of the `trace-cmd
>> +<https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/>`__ application.
>> +
>> +`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_
>> +
>> +`Full Ftrace documentation <../trace/ftrace.html>`__
>> +
>> +Perf & alternatives
>> +-------------------
>> +
>> +The tools mentioned above provide ways to inspect kernel code, results, variable values, etc.
>> +Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue.
>nit: add comma. better readability.
>s/and for those cases /and for those cases, /
ack
>
>> +
>> +.. _performance:
>> +
>> +**Why should you do a performance analysis?**
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +A performance analysis is a good first step when among other reasons:
>> +
>> +- you cannot define the issue
>> +- you do not know where it occurs
>> +- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel
>> +
>> +.. _linux-tools:
>> +
>> +**How to do a simple analysis with linux tools?**
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +For the start of a performance analysis, you can start with the usual tools like:
>> +
>> +- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*)
>> +- `mpstat -P ALL` (*look at the load distribution among CPUs*)
>> +- `iostat -x` (*observe input and output devices utilization and performance*)
>> +- `vmstat` (*overview of memory usage on the system*)
>> +- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*)
>> +- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*)
>> +
>> +These should help to figure out restrict the areas to look at sufficiently.
>nit: s/figure out restrict/narrow down/
ack
>
>> +
>> +**Diving deeper with perf**
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +The **perf** tool provides a series of metrics and events to further dial down on issues.
>> +
>> +Prerequisite: build or install perf on your system
>> +
>> +Gather statistics data for finding all files starting with `gcc` in `/usr`
>> +::
>> +
>> + # perf stat -d find /usr -name 'gcc*' | wc -l
>> +
>> + Performance counter stats for 'find /usr -name gcc*':
>> +
>> + 1277.81 msec task-clock # 0.997 CPUs utilized
>> + 9 context-switches # 7.043 /sec
>> + 1 cpu-migrations # 0.783 /sec
>> + 704 page-faults # 550.943 /sec
>> + 766548897 cycles # 0.600 GHz (97.15%)
>> + 798285467 instructions # 1.04 insn per cycle (97.15%)
>> + 57582731 branches # 45.064 M/sec (2.85%)
>> + 3842573 branch-misses # 6.67% of all branches (97.15%)
>> + 281616097 L1-dcache-loads # 220.390 M/sec (97.15%)
>> + 4220975 L1-dcache-load-misses # 1.50% of all L1-dcache accesses (97.15%)
>> + <not supported> LLC-loads
>> + <not supported> LLC-load-misses
>> +
>> + 1.281746009 seconds time elapsed
>> +
>> + 0.508796000 seconds user
>> + 0.773209000 seconds sys
>> +
>> +
>> + 52
>> +
>> +The availability of events and metrics depends on the system you are running.
>> +
>> +`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__
>> +
>> +**Perfetto**
>> +~~~~~~~~~~~~
>> +
>> +A set of tools to measure and analyze how well applications and systems perform.
>> +You can use it to:
>> +
>> +* identify bottlenecks
>> +* optimize code
>> +* make software run faster and more efficiently.
>> +
>> +**What is the difference between perfetto and perf?**
>> +
>> +* perf is tool as part of and specialized for the Linux Kernel and has CLI user
>> + interface.
>> +* perfetto cross-platform performance analysis stack, has extended
>> + functionality into userspace and provides a WEB user interface.
>> +
>> +`Full documentation <https://perfetto.dev/docs/>`__
>> +
>> +.. _kernel_panic_analysis_tools:
>> +
>> +Kernel panic analysis tools
>> +---------------------------
>> +
>> + To analyse the crash dump please use `Kdump` & `Kexec`.
>> +
>> + `Full documentation <../admin-guide/kdump/kdump.html>`__
>> +
>> + In order to find the corresponding line in the code you can use `faddr2line
>> + <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note
>> + that you need to enable `CONFIG_DEBUG_INFO` for that to work.
>> +
>> + An alternative to using `faddr2line` is the use of `objdump` (and it's
>> + derivatives for the different platforms like `aarch64-linux-gnu-objdump`),
>> + take this line as an example:
>> +
>> + `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`.
>> +
>> + We can find the corresponding line of code by executing:
>> + ::
>> +
>> + aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40
>> + 0000000000000ac8 <rkvdec_device_run>:
>> + ac8: d503201f nop
>> + acc: d503201f nop
>> + {
>> + ad0: d503233f paciasp
>> + ad4: a9bd7bfd stp x29, x30, [sp, #-48]!
>> + ad8: 910003fd mov x29, sp
>> + adc: a90153f3 stp x19, x20, [sp, #16]
>> + ae0: a9025bf5 stp x21, x22, [sp, #32]
>> + const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
>> + ae4: f9411814 ldr x20, [x0, #560]
>> + struct rkvdec_dev *rkvdec = ctx->dev;
>> + ae8: f9418015 ldr x21, [x0, #768]
>> + if (WARN_ON(!desc))
>> + aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec>
>> + ret = pm_runtime_resume_and_get(rkvdec->dev);
>> + af0: f943d2b6 ldr x22, [x21, #1952]
>> + ret = __pm_runtime_resume(dev, RPM_GET_PUT);
>> + af4: aa0003f3 mov x19, x0
>> + af8: 52800081 mov w1, #0x4 // #4
>> + afc: aa1603e0 mov x0, x22
>> + b00: 94000000 bl 0 <__pm_runtime_resume>
>> + if (ret < 0) {
>> + b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4>
>> + dev_warn(rkvdec->dev, "Not good\n");
>> + b08: f943d2a0 ldr x0, [x21, #1952]
>> + b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8>
>> + b10: 91000021 add x1, x1, #0x0
>> + b14: 94000000 bl 0 <_dev_warn>
>> + *bad = 1;
>> + b18: d2800001 mov x1, #0x0 // #0
>> + ...
>> +
>> +
>> + To find the matching line we just have to add `0x50` (from
>> + `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8
>> + <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`.
>I didn't understand this part. Can you explain?
I try to explain in that line how to interpret the output above.
Meaning, in this line from the crash dump:
`[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`
I can take the 0x50 as offset, which I have to add to the base address
of the corresponding function, which I find in this line:
0000000000000ac8 <rkvdec_device_run>:
The result of 0xac8 + 0x50 = 0xb18
And when I search for that address within the function I get the
following line:
*bad = 1;
b18: d2800001 mov x1, #0x0 // #0
Which is the offending line, I will try to clarify this in the next
version.
Regards,
Sebastian
>
>> +
>> +**Copyright** |copy| 2024 : Collabora
>> diff --git a/Documentation/index.rst b/Documentation/index.rst
>> index f9f525f4c0dd..eb8de7ba8e41 100644
>> --- a/Documentation/index.rst
>> +++ b/Documentation/index.rst
>> @@ -57,6 +57,7 @@ Various other manuals with useful information for all kernel developers.
>> Testing guide <dev-tools/testing-overview>
>> Hacking guide <kernel-hacking/index>
>> Tracing <trace/index>
>> + Debugging <debugging/index>
>> Fault injection <fault-injection/index>
>> Livepatching <livepatch/index>
>> Rust <rust/index>
>> @@ -76,6 +77,7 @@ developers seeking information on the kernel's user-space APIs.
>> Build system <kbuild/index>
>> Reporting issues <admin-guide/reporting-issues.rst>
>> Userspace tools <tools/index>
>> + Userspace debugging tools <debugging/userspace_debugging_guide.rst>
>> Userspace API <userspace-api/index>
>>
>> See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_,
>>
>> --
>> 2.25.1
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 2/3] docs: Add guides section for debugging
2024-10-04 10:12 ` Sebastian Fricke
@ 2024-10-04 23:30 ` Steve Cho
0 siblings, 0 replies; 18+ messages in thread
From: Steve Cho @ 2024-10-04 23:30 UTC (permalink / raw)
To: Sebastian Fricke
Cc: Jonathan Corbet, linux-doc, linux-kernel, linux-media,
laurent.pinchart, hverkuil-cisco, mauro.chehab, kernel,
bob.beckett, nicolas.dufresne
Thank you Sebastian.
On Fri, Oct 4, 2024 at 3:13 AM Sebastian Fricke
<sebastian.fricke@collabora.com> wrote:
>
> Hey Steve,
>
> On 03.10.2024 16:06, Steve Cho wrote:
> >Few minor comments.
> >
> >On Tue, Sep 24, 2024 at 1:47 AM Sebastian Fricke
> ><sebastian.fricke@collabora.com> wrote:
> >>
> >> This idea was formed after noticing that new developers experience
> >> certain difficulty to navigate within the multitude of different
> >> debugging options in the Kernel and while there often is good
> >> documentation for the tools, the developer has to know first that they
> >> exist and where to find them.
> >> Add a general debugging section to the Kernel documentation, as an
> >> easily locatable entry point to other documentation and as a general
> >> guideline for the topic.
> >>
> >> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> >> ---
> >> .../driver_development_debugging_guide.rst | 193 +++++++++++++++
> >> Documentation/debugging/index.rst | 66 +++++
> >> .../debugging/userspace_debugging_guide.rst | 269 +++++++++++++++++++++
> >> Documentation/index.rst | 2 +
> >> 4 files changed, 530 insertions(+)
> >>
> >> diff --git a/Documentation/debugging/driver_development_debugging_guide.rst b/Documentation/debugging/driver_development_debugging_guide.rst
> >> new file mode 100644
> >> index 000000000000..c750f63ac1d3
> >> --- /dev/null
> >> +++ b/Documentation/debugging/driver_development_debugging_guide.rst
> >> @@ -0,0 +1,193 @@
> >> +.. SPDX-License-Identifier: GPL-2.0
> >> +.. include:: <isonum.txt>
> >> +
> >> +========================================
> >> +Debugging advice for driver development
> >> +========================================
> >> +
> >> +This document serves as a general starting point and lookup for debugging device
> >> +drivers.
> >> +While this guide focuses on debugging that requires re-compiling the
> >> +module/kernel, the `userspace-debugging-guide <userspace_debugging_guide.html>`__
> >> +will guide you through tools like dynamic debug, ftrace and other tools useful
> >> +for debugging issues and behavior.
> >> +For general debugging advice, see `general-debugging-guide <index.html>`__.
> >> +
> >> +.. contents::
> >> + :depth: 3
> >> +
> >> +Available tools
> >> +===============
> >> +
> >> +Printk & friends
> >> +----------------
> >> +
> >> +These are derivatives of printf() with varying destinations and support for being dynamically turned on or off, or lack thereof.
> >> +
> >> +.. _printk:
> >> +
> >> +**Simple printk**
> >> +~~~~~~~~~~~~~~~~~
> >> +
> >> +The classic, can be used to great effect for quick and dirty development
> >> +of new modules or to extract arbitrary necessary data for troubleshooting.
> >> +
> >> +Prerequisite: :code:`CONFIG_PRINTK` (usually enabled by default)
> >> +
> >> +**Pros**:
> >> +
> >> +- No need to learn anything, simple to use
> >> +- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../core-api/printk-formats.html>`__), visibility in the log)
> >> +- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor)
> >I experienced this too before, but isn't it generalizing too much here?
> >I mean it is not easy to observe timing issue with printk.
>
> True it is not very easy to do this deterministically, but I thought it
> might be helpful as an idea, in case an issue is flaky. A few print
> statements might cause enough delay to make the issue more reproducible,
> that was at least my experience in the past. Maybe I could word it
> better or this potential positive aspect can be derived from the
> negative aspect below.
Ack.
> >
> >> +
> >> +**Cons**:
> >> +
> >> +- Requires rebuilding the kernel/module
> >> +- Can cause delays in the execution of the code (which can cause issues to be not reproducible)
> >> +
> >> +`Full documentation <../core-api/printk-basics.html>`__
> >> +
> >> +.. _trace_printk:
> >> +
> >> +**Trace_printk**
> >> +~~~~~~~~~~~~~~~~
> >> +
> >> +Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE` & :code:`#include <linux/ftrace.h>`
> >> +
> >> +It is a tiny bit less comfortable to use than `printk`_, because you will have
> >> +to read the messages from the trace file (See: `Reading the ftrace log
> >> +<userspace_debugging_guide.html#read-the-ftrace-log>`_ instead of from the
> >> +kernel log, but very useful when printk adds unwanted delays into the code
> >> +execution, causing issues to be flaky or hidden.)
> >> +
> >> +If the processing of this still causes timing issues then you can try `trace_puts()`.
> >> +
> >> +`Full Documentation <../driver-api/basics.html#c.trace_printk>`__
> >> +
> >> +**dev_dbg**
> >> +~~~~~~~~~~~
> >> +
> >> +Print statement, which can be target by `dynamic debug
> >nit: s/target/targeted
> >
> >> +<userspace_debugging_guide.html#dynamic-debug>`__, that contains additional
> >> +information about the device used within the context.
> >> +
> >> +**When is it appropriate to leave a debug print in the code?**
> >> +
> >> +Permanent debug statements have to be useful for a developer to troubleshoot
> >> +driver misbehavior. Judging that is a bit more of an art than a science, but
> >> +some guidelines are in the `Coding style guide
> >> +<../process/coding-style.html#printing-kernel-messages>`__.
> >> +
> >> +**Custom printk**
> >> +~~~~~~~~~~~~~~~~~
> >> +
> >> +Example:
> >> +::
> >> +
> >> + #define core_dbg(fmt, arg...) do { \
> >> + if (core_debug) \
> >> + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
> >> + } while (0)
> >> +
> >> +**When should you do this?**
> >> +
> >> +It is better to just use a `pr_debug()`, which can later be turned on/off with
> >> +dynamic debug. Additionally, a lot of drivers activate these prints via a
> >> +variable like `core_debug` set by a module parameter. However, Module
> >> +parameters `are not recommended anymore
> >> +<https://lkml.org/lkml/2024/3/27/163>`_.
> >> +
> >> +Ftrace
> >> +------
> >> +
> >> +**Creating custom Ftrace tracepoint**
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +Here is a basic description of `how to implement new tracepoints <../trace/tracepoints.html#usage>`__.
> >> +
> >> +`Full event tracing documentation <../trace/events.html>`__
> >> +
> >> +`Full Ftrace documentation <../trace/ftrace.html>`__
> >> +
> >> +DebugFS
> >> +-------
> >> +
> >> +Prerequisite: :code:`CONFIG_DEBUG_FS` & :code:`#include <linux/debugfs.h>`
> >> +
> >> +DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files.
> >> +With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver.
> >> +Possible use-cases among others:
> >> +
> >> +- Store register values
> >> +- Keep track of variables
> >> +- Store errors
> >> +- Store settings
> >> +- Toggle a setting like debug on/off
> >> +- Error injection
> >> +
> >> +This is especially useful, when the size of a data dump would be hard to digest as
> >> +part of the general kernel log (for example when dumping raw bitstream data) or
> >> +when you are not interested in all the values all the time, but with the
> >> +possibility to inspect them.
> >> +
> >> +The general idea is:
> >> +
> >> +- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`)
> >> +- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`)
> >> +
> >> + - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all)
> >> + - any update of `my_variable` will update the value in the file
> >> +
> >> +- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`)
> >> +
> >> +`Full documentation <../filesystems/debugfs.html>`__
> >> +
> >> +.. _error_checking:
> >> +
> >> +KASAN, UBSAN, lockdep and other error checkers
> >> +----------------------------------------------
> >> +
> >> +KASAN (Kernel Address Sanitizer)
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +KASAN is a dynamic memory error detector that helps to find use-after-free and
> >> +out-of-bounds bugs. It uses compile-time instrumentation to check every memory
> >> +access.
> >> +
> >> +`Full documentation <../dev-tools/kasan.html>`__
> >> +
> >> +UBSAN (Undefined Behavior Sanitizer)
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +UBSAN relies on compiler instrumentation and runtime checks to detect undefined
> >> +behavior. It is designed to find a variety of issues, including signed integer overflow,
> >> +array index out of bounds, and more.
> >> +
> >> +`Full documentation <../dev-tools/ubsan.html>`__
> >> +
> >> +lockdep (Lock Dependency Validator)
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +lockdep is a runtime lock dependency validator that detects potential deadlocks
> >> +and other locking-related issues in the kernel.
> >> +It tracks lock acquisitions and releases, building a dependency graph that is
> >> +analyzed for potential deadlocks.
> >> +lockdep is especially useful for validating the correctness of lock ordering in
> >> +the kernel.
> >> +
> >> +device coredump
> >> +---------------
> >> +
> >> +Prerequisite: :code:`#include <linux/devcoredump.h>`
> >> +
> >> +Provides infrastructure through which the driver can provide arbitrary data to
> >> +userland. It is most often used in conjunction with udev or similar userland
> >> +infrastructure to listen for the kernel uevents, which indicates the dump is
> >> +ready. Udev then usually has rules to copy that file somewhere for long-term
> >> +storage and analysis as by default the data for the dump is automatically
> >nit: didn't understand. maybe intended s/as/and/ ?
>
> What I wanted to communicate here is that, by default dumps are cleaned
> up automatically, if you want to store the files for later usage you
> need to specifiy udev rules, which copy the dumps to a different
> location.
>
> I will try to clarify this for the next version
Got it. It's just that wording was not clear to me here.
>
> >
> >> +cleaned up after 5 minutes.
> >> +That data is then analyzed with driver-specific tools or GDB.
> >> +
> >> +You can find an example implementation at: :code:`drivers/media/platform/qcom/venus/core.c`
> >> +
> >> +**Copyright** |copy| 2024 : Collabora
> >> diff --git a/Documentation/debugging/index.rst b/Documentation/debugging/index.rst
> >> new file mode 100644
> >> index 000000000000..7bdad2fa09e1
> >> --- /dev/null
> >> +++ b/Documentation/debugging/index.rst
> >> @@ -0,0 +1,66 @@
> >> +
> >> +.. SPDX-License-Identifier: GPL-2.0
> >> +.. include:: <isonum.txt>
> >> +
> >> +====================================================
> >> +General debugging advice for Linux Kernel developers
> >> +====================================================
> >> +
> >> +.. toctree::
> >> + :maxdepth: 1
> >> +
> >> + driver_development_debugging_guide
> >> + userspace_debugging_guide
> >> +
> >> +.. only:: subproject and html
> >> +
> >> + Indices
> >> + =======
> >> +
> >> + * :ref:`genindex`
> >> +
> >> +General debugging advice
> >> +========================
> >> +
> >> +Depending on the issue, a different set of tools is available to track down the
> >> +problem or even to realize whether there is one in the first place.
> >> +
> >> +As a first step you have to figure out what kind of issue you want to debug.
> >> +Depending on the answer, your methodology and choice of tools may vary.
> >> +
> >> +Do I need to debug with limited access?
> >> +---------------------------------------
> >> +
> >> +Do you have limited access to the machine or are you unable to stop the running execution?
> >> +
> >> +In this case your debugging capability depends on built-in debugging support of
> >> +provided distro kernel.
> >> +The `userspace debugging guide <userspace_debugging_guide.html>`__ provides a
> >> +brief overview over range of possible debugging tools in that situation. You
> >> +can check the capability of your kernel, in most cases, by looking into config
> >> +file within the /boot folder.
> >> +
> >> +Do I have root access to the system?
> >> +------------------------------------
> >> +
> >> +Are you easily able to replace the module in question or to install a new kernel?
> >> +
> >> +In that case your range of available tools is a lot bigger, you can find the
> >> +tools `here <driver_development_debugging_guide.html>`__.
> >> +
> >> +Is timing a factor?
> >> +-------------------
> >> +
> >> +It is important to understand if the problem you want to debug manifests itself
> >> +consistently (i.e. given a set of inputs you always get the same, incorrect
> >> +output), or inconsistently. If it manifests itself inconsistently, some timing
> >> +factor might be at play. If inserting delays into the code does change the
> >> +behavior, then quite likely timing is a factor.
> >> +
> >> +When timing does alter the outcome of the code execution using a simple `printk
> >> +<driver_development_debugging_guide.html#printk>`_ for debugging purposes won't
> >> +work, a similar alternative is to use `trace_printk
> >> +<driver_development_debugging_guide.html#trace-printk>`_, which logs the debug
> >> +messages to the trace file instead of the kernel log.
> >> +
> >> +**Copyright** |copy| 2024 : Collabora
> >> diff --git a/Documentation/debugging/userspace_debugging_guide.rst b/Documentation/debugging/userspace_debugging_guide.rst
> >> new file mode 100644
> >> index 000000000000..4d269a9ef913
> >> --- /dev/null
> >> +++ b/Documentation/debugging/userspace_debugging_guide.rst
> >> @@ -0,0 +1,269 @@
> >> +.. SPDX-License-Identifier: GPL-2.0
> >> +.. include:: <isonum.txt>
> >> +
> >> +==========================
> >> +Userspace debugging advice
> >> +==========================
> >> +
> >> +A brief overview of common tools to debug the Linux Kernel from userspace.
> >> +For debugging advice aimed at driver developer go `here <driver_development_debugging_guide.html>`__.
> >> +For general debugging advice, see `general-debugging-guide <index.html>`__.
> >> +
> >> +.. contents::
> >> + :depth: 3
> >> +
> >> +Available tools
> >> +===============
> >> +
> >> +Dynamic debug
> >> +-------------
> >> +
> >> +Mechanism to filter what ends up in the kernel log by dis-/en-abling log
> >> +messages.
> >> +
> >> +Prerequisite: `CONFIG_DYNAMIC_DEBUG`
> >> +
> >> +.. _valid_dyndbg_prints:
> >> +
> >> +Dynamic debug is only able to target:
> >> +
> >> +- `pr_debug()`
> >> +- `dev_dbg()`
> >> +- `print_hex_dump_debug()`
> >> +- `print_hex_dump_bytes()`
> >> +
> >> +Therefore the usability of this tool is quite limited in the media subsystem,
> >> +because, as of now, there is no uniform rule for adding debug prints to the codebase,
> >> +resulting in a variety of ways these prints are implemented.
> >Shouldn't this explanation and some examples below go to the media
> >specific document instead of here?
>
> Oh yes you are right that has to go to the media guide, I am still a bit
> unsure about the example because I haven't found a good general example
> so far.
>
> >Some parts in this patch are tricky to decide where to go, but
> >suggesting to review once again.
>
> Yes, thanks for highlighting.
>
> >
> >> +
> >> +Also, note that most debug statements are implemented as a variation of
> >> +`dprintk`, which have to be activated via a parameter in respective module,
> >> +dynamic debug is unable to do that step for you.
> >> +
> >> +Here is one example, that enables all available `pr_debug()`'s within the file:
> >> +::
> >> +
> >> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
> >> + $ ddcmd '-p; file v4l2-h264.c +p'
> >> + $ grep =p /proc/dynamic_debug/control
> >> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
> >> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
> >> +
> >> +**When should you use this over** `Ftrace`_ **?**
> >> +
> >> +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development
> >> +- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays
> >> +- When you care more about receiving specific log messages than tracing the pattern of how a function is called
> >> +
> >> +`Full documentation <../admin-guide/dynamic-debug-howto.html>`__
> >> +
> >> +Ftrace
> >> +------
> >> +
> >> +Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE`
> >> +
> >> +Trace whenever the a file is opened:
> >> +::
> >> +
> >> + $ cd /sys/kernel/tracing
> >> + $ echo function > /sys/kernel/tracing/current_tracer
> >> + $ echo do_filep_open > set_ftrace_filter
> >> + $ echo 1 > tracing_on
> >> + $ cat trace
> >> + find-4624 [005] ...1. 580781.888166: do_filp_open <-do_sys_openat2
> >> + find-4624 [005] ...1. 580781.888237: do_filp_open <-do_sys_openat2
> >> + find-4624 [005] ...1. 580781.888361: do_filp_open <-do_sys_openat2
> >> +
> >> +.. _event_tracing:
> >> +
> >> +Activate a ftrace event on top of that:
> >> +::
> >> +
> >> + $ echo 1 > events/kmem/kfree/enable
> >> + find-5351 [005] ...1. 678288.910143: do_filp_open <-do_sys_openat2
> >> + find-5351 [005] ..... 678288.910185: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
> >> + find-5351 [005] ...1. 678288.910218: do_filp_open <-do_sys_openat2
> >> + find-5351 [005] ..... 678288.910260: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
> >> + find-5351 [005] ...1. 678288.910293: do_filp_open <-do_sys_openat2
> >> + find-5351 [005] ..... 678288.910345: kfree: call_site=fscrypt_fname_free_buffer+0x28/0x48 ptr=0000000000000000
> >> + find-5351 [005] ..... 678288.910389: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000007ba73e40
> >> + find-5351 [005] ..... 678288.910390: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000009e4850bc
> >> + find-5351 [005] ..... 678288.910391: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=000000004156f20f
> >> + find-5351 [005] ..... 678288.910393: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=00000000c7207e20
> >> + find-5351 [005] ..... 678288.910394: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000037c31d76
> >> + find-5351 [005] ..... 678288.910395: kfree: call_site=free_rb_tree_fname+0x54/0x88 ptr=0000000001922677
> >> +
> >> +.. _read_ftrace_log:
> >> +
> >> +**Reading the ftrace log**
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +The `trace` file can be read just like any other file (`cat`, `tail`, `head`,
> >> +`vim`, etc.), the size of the file is limited by the `buffer_size_kb` (``echo
> >> +1000 > buffer_size_kb``). The `trace_pipe` will behave similar to the `trace`
> >> +file, but whenever you read from the file the content is consumed.
> >> +
> >> +**Kernelshark**
> >> +~~~~~~~~~~~~~~~
> >> +
> >> +A GUI interface to visualize the traces as a graph and list view from the
> >> +output of the `trace-cmd
> >> +<https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/>`__ application.
> >> +
> >> +`Kernelshark documentation <https://kernelshark.org/Documentation.html>`_
> >> +
> >> +`Full Ftrace documentation <../trace/ftrace.html>`__
> >> +
> >> +Perf & alternatives
> >> +-------------------
> >> +
> >> +The tools mentioned above provide ways to inspect kernel code, results, variable values, etc.
> >> +Sometimes you have to find out first where to look and for those cases a box of performance tracking tools can help you to frame the issue.
> >nit: add comma. better readability.
> >s/and for those cases /and for those cases, /
>
> ack
>
> >
> >> +
> >> +.. _performance:
> >> +
> >> +**Why should you do a performance analysis?**
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +A performance analysis is a good first step when among other reasons:
> >> +
> >> +- you cannot define the issue
> >> +- you do not know where it occurs
> >> +- the running system should not be interrupted or it is a remote system, where you cannot install a new module/kernel
> >> +
> >> +.. _linux-tools:
> >> +
> >> +**How to do a simple analysis with linux tools?**
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +For the start of a performance analysis, you can start with the usual tools like:
> >> +
> >> +- `top` / `htop` / `atop` (*get an overview of the system load, see spikes on specific processes*)
> >> +- `mpstat -P ALL` (*look at the load distribution among CPUs*)
> >> +- `iostat -x` (*observe input and output devices utilization and performance*)
> >> +- `vmstat` (*overview of memory usage on the system*)
> >> +- `pidstat` (*similar to* `vmstat` *but per process, to dial it down to the target*)
> >> +- `strace -tp $PID` (*once you know the process, you can figure out how it communicates with the Kernel*)
> >> +
> >> +These should help to figure out restrict the areas to look at sufficiently.
> >nit: s/figure out restrict/narrow down/
>
> ack
>
> >
> >> +
> >> +**Diving deeper with perf**
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +The **perf** tool provides a series of metrics and events to further dial down on issues.
> >> +
> >> +Prerequisite: build or install perf on your system
> >> +
> >> +Gather statistics data for finding all files starting with `gcc` in `/usr`
> >> +::
> >> +
> >> + # perf stat -d find /usr -name 'gcc*' | wc -l
> >> +
> >> + Performance counter stats for 'find /usr -name gcc*':
> >> +
> >> + 1277.81 msec task-clock # 0.997 CPUs utilized
> >> + 9 context-switches # 7.043 /sec
> >> + 1 cpu-migrations # 0.783 /sec
> >> + 704 page-faults # 550.943 /sec
> >> + 766548897 cycles # 0.600 GHz (97.15%)
> >> + 798285467 instructions # 1.04 insn per cycle (97.15%)
> >> + 57582731 branches # 45.064 M/sec (2.85%)
> >> + 3842573 branch-misses # 6.67% of all branches (97.15%)
> >> + 281616097 L1-dcache-loads # 220.390 M/sec (97.15%)
> >> + 4220975 L1-dcache-load-misses # 1.50% of all L1-dcache accesses (97.15%)
> >> + <not supported> LLC-loads
> >> + <not supported> LLC-load-misses
> >> +
> >> + 1.281746009 seconds time elapsed
> >> +
> >> + 0.508796000 seconds user
> >> + 0.773209000 seconds sys
> >> +
> >> +
> >> + 52
> >> +
> >> +The availability of events and metrics depends on the system you are running.
> >> +
> >> +`Full documentation <https://perf.wiki.kernel.org/index.php/Main_Page>`__
> >> +
> >> +**Perfetto**
> >> +~~~~~~~~~~~~
> >> +
> >> +A set of tools to measure and analyze how well applications and systems perform.
> >> +You can use it to:
> >> +
> >> +* identify bottlenecks
> >> +* optimize code
> >> +* make software run faster and more efficiently.
> >> +
> >> +**What is the difference between perfetto and perf?**
> >> +
> >> +* perf is tool as part of and specialized for the Linux Kernel and has CLI user
> >> + interface.
> >> +* perfetto cross-platform performance analysis stack, has extended
> >> + functionality into userspace and provides a WEB user interface.
> >> +
> >> +`Full documentation <https://perfetto.dev/docs/>`__
> >> +
> >> +.. _kernel_panic_analysis_tools:
> >> +
> >> +Kernel panic analysis tools
> >> +---------------------------
> >> +
> >> + To analyse the crash dump please use `Kdump` & `Kexec`.
> >> +
> >> + `Full documentation <../admin-guide/kdump/kdump.html>`__
> >> +
> >> + In order to find the corresponding line in the code you can use `faddr2line
> >> + <https://elixir.bootlin.com/linux/latest/source/scripts/faddr2line>`__, note
> >> + that you need to enable `CONFIG_DEBUG_INFO` for that to work.
> >> +
> >> + An alternative to using `faddr2line` is the use of `objdump` (and it's
> >> + derivatives for the different platforms like `aarch64-linux-gnu-objdump`),
> >> + take this line as an example:
> >> +
> >> + `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`.
> >> +
> >> + We can find the corresponding line of code by executing:
> >> + ::
> >> +
> >> + aarch64-linux-gnu-objdump -dS drivers/staging/media/rkvdec/rockchip-vdec.ko | grep rkvdec_device_run\>: -A 40
> >> + 0000000000000ac8 <rkvdec_device_run>:
> >> + ac8: d503201f nop
> >> + acc: d503201f nop
> >> + {
> >> + ad0: d503233f paciasp
> >> + ad4: a9bd7bfd stp x29, x30, [sp, #-48]!
> >> + ad8: 910003fd mov x29, sp
> >> + adc: a90153f3 stp x19, x20, [sp, #16]
> >> + ae0: a9025bf5 stp x21, x22, [sp, #32]
> >> + const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
> >> + ae4: f9411814 ldr x20, [x0, #560]
> >> + struct rkvdec_dev *rkvdec = ctx->dev;
> >> + ae8: f9418015 ldr x21, [x0, #768]
> >> + if (WARN_ON(!desc))
> >> + aec: b4000654 cbz x20, bb4 <rkvdec_device_run+0xec>
> >> + ret = pm_runtime_resume_and_get(rkvdec->dev);
> >> + af0: f943d2b6 ldr x22, [x21, #1952]
> >> + ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> >> + af4: aa0003f3 mov x19, x0
> >> + af8: 52800081 mov w1, #0x4 // #4
> >> + afc: aa1603e0 mov x0, x22
> >> + b00: 94000000 bl 0 <__pm_runtime_resume>
> >> + if (ret < 0) {
> >> + b04: 37f80340 tbnz w0, #31, b6c <rkvdec_device_run+0xa4>
> >> + dev_warn(rkvdec->dev, "Not good\n");
> >> + b08: f943d2a0 ldr x0, [x21, #1952]
> >> + b0c: 90000001 adrp x1, 0 <rkvdec_try_ctrl-0x8>
> >> + b10: 91000021 add x1, x1, #0x0
> >> + b14: 94000000 bl 0 <_dev_warn>
> >> + *bad = 1;
> >> + b18: d2800001 mov x1, #0x0 // #0
> >> + ...
> >> +
> >> +
> >> + To find the matching line we just have to add `0x50` (from
> >> + `rkvdec_device_run+0x50`) to `0xac8` (from `0000000000000ac8
> >> + <rkvdec_device_run>:`), which yields `0xb18` corresponding with `*bad = 1`.
> >I didn't understand this part. Can you explain?
>
> I try to explain in that line how to interpret the output above.
> Meaning, in this line from the crash dump:
>
> `[ +0.000240] rkvdec_device_run+0x50/0x138 [rockchip_vdec]`
>
> I can take the 0x50 as offset, which I have to add to the base address
> of the corresponding function, which I find in this line:
>
> 0000000000000ac8 <rkvdec_device_run>:
>
> The result of 0xac8 + 0x50 = 0xb18
> And when I search for that address within the function I get the
> following line:
>
> *bad = 1;
> b18: d2800001 mov x1, #0x0 // #0
>
> Which is the offending line, I will try to clarify this in the next
> version.
Got it. Thank you for the clarification.
>
> Regards,
> Sebastian
>
> >
> >> +
> >> +**Copyright** |copy| 2024 : Collabora
> >> diff --git a/Documentation/index.rst b/Documentation/index.rst
> >> index f9f525f4c0dd..eb8de7ba8e41 100644
> >> --- a/Documentation/index.rst
> >> +++ b/Documentation/index.rst
> >> @@ -57,6 +57,7 @@ Various other manuals with useful information for all kernel developers.
> >> Testing guide <dev-tools/testing-overview>
> >> Hacking guide <kernel-hacking/index>
> >> Tracing <trace/index>
> >> + Debugging <debugging/index>
> >> Fault injection <fault-injection/index>
> >> Livepatching <livepatch/index>
> >> Rust <rust/index>
> >> @@ -76,6 +77,7 @@ developers seeking information on the kernel's user-space APIs.
> >> Build system <kbuild/index>
> >> Reporting issues <admin-guide/reporting-issues.rst>
> >> Userspace tools <tools/index>
> >> + Userspace debugging tools <debugging/userspace_debugging_guide.rst>
> >> Userspace API <userspace-api/index>
> >>
> >> See also: the `Linux man pages <https://www.kernel.org/doc/man-pages/>`_,
> >>
> >> --
> >> 2.25.1
> >>
> >
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem
2024-09-24 8:45 ` [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem Sebastian Fricke
2024-10-03 22:32 ` Steve Cho
@ 2024-10-14 17:42 ` Daniel Almeida
2024-10-23 11:43 ` Hans Verkuil
2 siblings, 0 replies; 18+ messages in thread
From: Daniel Almeida @ 2024-10-14 17:42 UTC (permalink / raw)
To: Sebastian Fricke
Cc: Jonathan Corbet, linux-doc, linux-kernel, linux-media,
laurent.pinchart, hverkuil-cisco, mauro.chehab, kernel,
bob.beckett, nicolas.dufresne
Hi Sebastian,
> On 24 Sep 2024, at 05:45, Sebastian Fricke <sebastian.fricke@collabora.com> wrote:
>
> Create a guides section for all documentation material, that isn't
> strictly related to a specific piece of code.
>
> Provide a guide for developers on how to debug code with a focus on the
> media subsystem. This document aims to provide a rough overview over the
> possibilities and a rational to help choosing the right tool for the
> given circumstances.
>
> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> ---
> Documentation/media/guides/debugging_issues.rst | 174 ++++++++++++++++++++++++
> Documentation/media/guides/index.rst | 11 ++
> Documentation/media/index.rst | 1 +
> 3 files changed, 186 insertions(+)
>
> diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst
> new file mode 100644
> index 000000000000..5f37801dd4ba
> --- /dev/null
> +++ b/Documentation/media/guides/debugging_issues.rst
> @@ -0,0 +1,174 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +============================================
> +Debugging and tracing in the media subsystem
> +============================================
> +
> +This document serves as a starting point and lookup for debugging device
> +drivers in the media subsystem.
> +
> +.. contents::
> + :depth: 3
> +
> +General debugging advice
> +========================
> +
> +For general advice see the `general-debugging-guide <../../debugging/index.html>`__.
> +
> +Available tools
> +===============
> +
> +dev_debug module parameter
> +--------------------------
> +
> +For a general overview please see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> +
> +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background.
> +::
> +
> + # cat /sys/class/video4linux/video3/name
> + rkvdec
> + # echo 0xff > /sys/class/video4linux/video3/dev_debug
> + # dmesg -wH
> + [...] videodev: v4l2_open: video3: open (0)
> + [ +0.000036] video3: VIDIOC_QUERYCAP: driver=rkvdec, card=rkvdec, bus=platform:rkvdec, version=0x00060900, capabilities=0x84204000, device_caps=0x04204000
> +
> +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__
> +
> +dev_dbg / v4l2_dbg
> +------------------
> +
> +- Difference between both?
> +
> + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug
> + - dev_dbg can be targeted by dynamic debug
> + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log
> +
> +Dynamic debug
> +-------------
> +
> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> +
> +Here is one example, that enables all available `pr_debug()`'s within the file:
> +::
> +
> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
> + $ ddcmd '-p; file v4l2-h264.c +p'
> + $ grep =p /proc/dynamic_debug/control
> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
> +
> +Ftrace
> +------
> +
> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> +
> +Trace whenever the `rkvdec_try_ctrl` function is called
> +::
> +
> + $ cd /sys/kernel/tracing
> + $ echo function > /sys/kernel/tracing/current_tracer
> + $ echo rkvdec_try_ctrl > set_ftrace_filter
> + $ echo 1 > tracing_on
> + $ cat trace
> + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster
> +
> +Find out from where the calls originate
> +::
> +
> + $ echo 1 > options/func_stack_trace
> + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace>
> + => rkvdec_try_ctrl
> + => try_or_set_cluster
> + => try_set_ext_ctrls_common
> + => try_set_ext_ctrls
> + => v4l2_s_ext_ctrls
> + => v4l_s_ext_ctrls
> + ...
> + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace>
> + => rkvdec_try_ctrl
> + => try_or_set_cluster
> + => v4l2_ctrl_request_setup
> + => rkvdec_run_preamble
> + => rkvdec_h264_run
> + => rkvdec_device_run
> + ...
> +
> +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`)
> +::
> +
> + echo function_graph > current_tracer
> + echo rkvdec_h264_run > set_graph_function
> + echo 4 > max_graph_depth
> + echo do_interrupt_handler mutex_* > set_graph_notrace
> + echo 1 > options/funcgraph-retval
> + ...
> + 4) | rkvdec_h264_run [rockchip_vdec]() {
> + 4) | v4l2_ctrl_find [videodev]() {
> + ...
> + 4) | rkvdec_run_preamble [rockchip_vdec]() {
> + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */
> + ...
> + 4) | v4l2_ctrl_request_setup [videodev]() {
> + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */
> + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */
> + ...
> + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */
> + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */
> + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */
> + ...
> + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() {
> + ...
> + 4) | rkvdec_run_postamble [rockchip_vdec]() {
> + ...
> + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */
Maybe have a look at retsnoop?
To me, a very frustrating and very common issue is having to figure out where
exactly an error code was generated. Ftrace helps a great deal, but I feel that
retsnoop just takes it a step further. On top of that, you can retrace the
execution on a statement level.
That is, with the right options, it can tell you exactly what statements
executed, which then lets you get very precise without any extra prints.
See [0]. There’s a talk on KR2024 about it too [1].
[0]: https://github.com/anakryiko/retsnoop
[1]: https://www.youtube.com/watch?v=NvTBrx6EdF8
— Daniel
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 0/3] Documentation: Debugging guide
2024-09-24 8:45 [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
` (2 preceding siblings ...)
2024-09-24 8:45 ` [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem Sebastian Fricke
@ 2024-10-22 15:23 ` Sebastian Fricke
2024-10-23 6:57 ` Bagas Sanjaya
2024-10-24 22:01 ` Jonathan Corbet
4 siblings, 1 reply; 18+ messages in thread
From: Sebastian Fricke @ 2024-10-22 15:23 UTC (permalink / raw)
To: Jonathan Corbet
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne, Greg Kroah-Hartman
(CC: Grep KH)
Hello,
On 24.09.2024 10:45, Sebastian Fricke wrote:
>The RFC contains:
>- a general debugging guide split into debugging for driver developers and
> debugging from userspace
>- a new summary page for all media related documentation. This is inspired by
> other subsystems, which first of all allows a user to find the subsystem
> under the subsystems page and secondly eases general navigation through the
> documentation that is sprinkled onto multiple places.
>- a guide on how to debug code in the media subsystem, which points to the
> parts of the general documentation and adds own routines.
I wanted to give this a little push, so far I have received a lot of
good feedback but none from the core and documentation folks. What do
you think about this?
Regards,
Sebastian
>
>WHY do we need this?
>--------------------
>
>For anyone without years of experience in the Linux kernel, knowing which tool
>to use or even which tools are available is not as straightforward as some
>senior developers might perceive.
>We realized that there is a general need for a kind of "start page", that
>allows especially beginners to get up-to-speed with the codebase and the
>documentation. The documentation in particular is currently quite hard to navigate
>as you mostly have to know what you are searching for to find it.
>
>WHAT do we cover?
>-----------------
>
>The document is structured into two sections:
>
>1. A problem-focused approach: This means, a developer facing an issue matching
>one of the given examples, will find suggestions for how to approach that
>problem (e.g. which tool to use) in this section
>2. A tool-focused approach: This sections highlights the available tools, with
>comparisions between the tools if sensible. The goal of this work is
>**duplicate as little as possible** from the existing documentation and
>instead provide a rough overview that provides:
> - A link to the actual documentation
> - A minimal example for how it can be used (from a media perspective,
> if the usage isn't absolutely trivial like printk)
> - A rational for why it should be used
>
>To: Jonathan Corbet <corbet@lwn.net>
>Cc: linux-doc@vger.kernel.org
>Cc: linux-kernel@vger.kernel.org
>Cc: linux-media@vger.kernel.org
>Cc: laurent.pinchart@ideasonboard.com
>Cc: hverkuil-cisco@xs4all.nl
>Cc: mauro.chehab@linux.intel.com
>Cc: kernel@collabora.com
>Cc: bob.beckett@collabora.com
>Cc: nicolas.dufresne@collabora.com
>Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
>
>---
>Changes in v2:
>- Split the media debugging guide into a general and a media specific guide,
> which contains mostly references to the general guide and a few media
> specific aspects.
>- Fill out TBD sections
>- Add device coredump section
>
>---
>Sebastian Fricke (3):
> docs: media: Create separate documentation folder for media
> docs: Add guides section for debugging
> docs: media: Debugging guide for the media subsystem
>
> .../driver_development_debugging_guide.rst | 193 +++++++++++++++
> Documentation/debugging/index.rst | 66 +++++
> .../debugging/userspace_debugging_guide.rst | 269 +++++++++++++++++++++
> Documentation/index.rst | 2 +
> Documentation/media/guides/debugging_issues.rst | 174 +++++++++++++
> Documentation/media/guides/index.rst | 11 +
> Documentation/media/index.rst | 20 ++
> Documentation/subsystem-apis.rst | 1 +
> 8 files changed, 736 insertions(+)
>---
>base-commit: 68a72104cbcf38ad16500216e213fa4eb21c4be2
>change-id: 20240529-b4-media_docs_improve-79ea2d480483
>
>Best regards,
>--
>Sebastian Fricke <sebastian.fricke@collabora.com>
>
Sebastian Fricke
Consultant Software Engineer
Collabora Ltd
Platinum Building, St John's Innovation Park, Cambridge CB4 0DS, UK
Registered in England & Wales no 5513718.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 0/3] Documentation: Debugging guide
2024-10-22 15:23 ` [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
@ 2024-10-23 6:57 ` Bagas Sanjaya
0 siblings, 0 replies; 18+ messages in thread
From: Bagas Sanjaya @ 2024-10-23 6:57 UTC (permalink / raw)
To: Sebastian Fricke, Jonathan Corbet
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne, Greg Kroah-Hartman
[-- Attachment #1: Type: text/plain, Size: 1006 bytes --]
On Tue, Oct 22, 2024 at 05:23:16PM +0200, Sebastian Fricke wrote:
> On 24.09.2024 10:45, Sebastian Fricke wrote:
> > The RFC contains:
> > - a general debugging guide split into debugging for driver developers and
> > debugging from userspace
> > - a new summary page for all media related documentation. This is inspired by
> > other subsystems, which first of all allows a user to find the subsystem
> > under the subsystems page and secondly eases general navigation through the
> > documentation that is sprinkled onto multiple places.
> > - a guide on how to debug code in the media subsystem, which points to the
> > parts of the general documentation and adds own routines.
>
> I wanted to give this a little push, so far I have received a lot of
> good feedback but none from the core and documentation folks. What do
> you think about this?
Address all reviews then reroll (maybe as non-RFC series).
Thanks.
--
An old man doll... just what I always wanted! - Clara
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem
2024-09-24 8:45 ` [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem Sebastian Fricke
2024-10-03 22:32 ` Steve Cho
2024-10-14 17:42 ` Daniel Almeida
@ 2024-10-23 11:43 ` Hans Verkuil
2024-10-23 12:29 ` Laurent Pinchart
2 siblings, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2024-10-23 11:43 UTC (permalink / raw)
To: Sebastian Fricke, Jonathan Corbet
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
mauro.chehab, kernel, bob.beckett, nicolas.dufresne
On 24/09/2024 10:45, Sebastian Fricke wrote:
> Create a guides section for all documentation material, that isn't
> strictly related to a specific piece of code.
>
> Provide a guide for developers on how to debug code with a focus on the
> media subsystem. This document aims to provide a rough overview over the
> possibilities and a rational to help choosing the right tool for the
> given circumstances.
>
> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> ---
> Documentation/media/guides/debugging_issues.rst | 174 ++++++++++++++++++++++++
> Documentation/media/guides/index.rst | 11 ++
> Documentation/media/index.rst | 1 +
> 3 files changed, 186 insertions(+)
>
> diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst
> new file mode 100644
> index 000000000000..5f37801dd4ba
> --- /dev/null
> +++ b/Documentation/media/guides/debugging_issues.rst
> @@ -0,0 +1,174 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +============================================
> +Debugging and tracing in the media subsystem
> +============================================
> +
> +This document serves as a starting point and lookup for debugging device
> +drivers in the media subsystem.
> +
> +.. contents::
> + :depth: 3
> +
> +General debugging advice
> +========================
> +
> +For general advice see the `general-debugging-guide <../../debugging/index.html>`__.
> +
> +Available tools
> +===============
> +
> +dev_debug module parameter
> +--------------------------
> +
> +For a general overview please see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> +
> +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background.
> +::
> +
> + # cat /sys/class/video4linux/video3/name
> + rkvdec
> + # echo 0xff > /sys/class/video4linux/video3/dev_debug
> + # dmesg -wH
> + [...] videodev: v4l2_open: video3: open (0)
> + [ +0.000036] video3: VIDIOC_QUERYCAP: driver=rkvdec, card=rkvdec, bus=platform:rkvdec, version=0x00060900, capabilities=0x84204000, device_caps=0x04204000
> +
> +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__
> +
> +dev_dbg / v4l2_dbg
> +------------------
> +
> +- Difference between both?
> +
> + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug
> + - dev_dbg can be targeted by dynamic debug
> + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log
> +
> +Dynamic debug
> +-------------
> +
> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> +
> +Here is one example, that enables all available `pr_debug()`'s within the file:
> +::
> +
> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
> + $ ddcmd '-p; file v4l2-h264.c +p'
> + $ grep =p /proc/dynamic_debug/control
> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
> +
> +Ftrace
> +------
> +
> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> +
> +Trace whenever the `rkvdec_try_ctrl` function is called
> +::
> +
> + $ cd /sys/kernel/tracing
> + $ echo function > /sys/kernel/tracing/current_tracer
> + $ echo rkvdec_try_ctrl > set_ftrace_filter
> + $ echo 1 > tracing_on
> + $ cat trace
> + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster
> +
> +Find out from where the calls originate
> +::
> +
> + $ echo 1 > options/func_stack_trace
> + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace>
> + => rkvdec_try_ctrl
> + => try_or_set_cluster
> + => try_set_ext_ctrls_common
> + => try_set_ext_ctrls
> + => v4l2_s_ext_ctrls
> + => v4l_s_ext_ctrls
> + ...
> + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster
> + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace>
> + => rkvdec_try_ctrl
> + => try_or_set_cluster
> + => v4l2_ctrl_request_setup
> + => rkvdec_run_preamble
> + => rkvdec_h264_run
> + => rkvdec_device_run
> + ...
> +
> +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`)
> +::
> +
> + echo function_graph > current_tracer
> + echo rkvdec_h264_run > set_graph_function
> + echo 4 > max_graph_depth
> + echo do_interrupt_handler mutex_* > set_graph_notrace
> + echo 1 > options/funcgraph-retval
> + ...
> + 4) | rkvdec_h264_run [rockchip_vdec]() {
> + 4) | v4l2_ctrl_find [videodev]() {
> + ...
> + 4) | rkvdec_run_preamble [rockchip_vdec]() {
> + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */
> + ...
> + 4) | v4l2_ctrl_request_setup [videodev]() {
> + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */
> + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */
> + ...
> + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */
> + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */
> + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */
> + ...
> + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() {
> + ...
> + 4) | rkvdec_run_postamble [rockchip_vdec]() {
> + ...
> + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */
> +
> +DebugFS
> +-------
> +
> +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> +
> +Perf & alternatives
> +-------------------
> +
> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> +
> +Example for media devices:
> +
> +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__)
> +::
> +
> + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1
> + ...
> + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v':
> +
> + 7794.23 msec task-clock:u # 0.697 CPUs utilized
> + 0 context-switches:u # 0.000 /sec
> + 0 cpu-migrations:u # 0.000 /sec
> + 11901 page-faults:u # 1.527 K/sec
> + 882671556 cycles:u # 0.113 GHz (95.79%)
> + 711708695 instructions:u # 0.81 insn per cycle (95.79%)
> + 10581935 branches:u # 1.358 M/sec (15.13%)
> + 6871144 branch-misses:u # 64.93% of all branches (95.79%)
> + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%)
> + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%)
> + <not supported> LLC-loads:u
> + <not supported> LLC-load-misses:u
> +
> + 11.180830431 seconds time elapsed
> +
> + 1.502318000 seconds user
> + 6.377221000 seconds sys
> +
> +The availability of events and metrics depends on the system you are running.
> +
> +Error checking & panic analysis
> +-------------------------------
> +
> +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> +
> +**Copyright** |copy| 2024 : Collabora
I would add a few more:
- Implementing vidioc_log_status in the driver: this can log the current status to the kernel log.
It's called by v4l2-ctl --log-status. Very useful for debugging problems with receiving video
(TV/S-Video/HDMI/etc) since the video signal is external (so unpredictable). Less useful with
camera sensor inputs since you have control over what the camera sensor does.
- Run v4l2-compliance to verify the driver. To see the detailed media topology (and check it) use:
v4l2-compliance -M /dev/mediaX --verbose
You can also run a full compliance check for all devices referenced in the media topology by
running v4l2-compliance -m /dev/mediaX
Regards,
Hans
> diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst
> new file mode 100644
> index 000000000000..0008966c0862
> --- /dev/null
> +++ b/Documentation/media/guides/index.rst
> @@ -0,0 +1,11 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============
> +Media Guides
> +============
> +
> +.. toctree::
> + :caption: Table of Contents
> + :maxdepth: 1
> +
> + debugging_issues
> diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst
> index d056a9e99dca..5461876fc401 100644
> --- a/Documentation/media/index.rst
> +++ b/Documentation/media/index.rst
> @@ -7,6 +7,7 @@ Media Subsystem Documentation
> .. toctree::
> :maxdepth: 2
>
> + guides/index
> ../userspace-api/media/index
> ../driver-api/media/index.rst
> ../admin-guide/media/index
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem
2024-10-23 11:43 ` Hans Verkuil
@ 2024-10-23 12:29 ` Laurent Pinchart
2024-10-23 12:37 ` Hans Verkuil
0 siblings, 1 reply; 18+ messages in thread
From: Laurent Pinchart @ 2024-10-23 12:29 UTC (permalink / raw)
To: Hans Verkuil
Cc: Sebastian Fricke, Jonathan Corbet, linux-doc, linux-kernel,
linux-media, mauro.chehab, kernel, bob.beckett, nicolas.dufresne
On Wed, Oct 23, 2024 at 01:43:34PM +0200, Hans Verkuil wrote:
> On 24/09/2024 10:45, Sebastian Fricke wrote:
> > Create a guides section for all documentation material, that isn't
> > strictly related to a specific piece of code.
> >
> > Provide a guide for developers on how to debug code with a focus on the
> > media subsystem. This document aims to provide a rough overview over the
> > possibilities and a rational to help choosing the right tool for the
> > given circumstances.
> >
> > Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> > ---
> > Documentation/media/guides/debugging_issues.rst | 174 ++++++++++++++++++++++++
> > Documentation/media/guides/index.rst | 11 ++
> > Documentation/media/index.rst | 1 +
> > 3 files changed, 186 insertions(+)
> >
> > diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst
> > new file mode 100644
> > index 000000000000..5f37801dd4ba
> > --- /dev/null
> > +++ b/Documentation/media/guides/debugging_issues.rst
> > @@ -0,0 +1,174 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. include:: <isonum.txt>
> > +
> > +============================================
> > +Debugging and tracing in the media subsystem
> > +============================================
> > +
> > +This document serves as a starting point and lookup for debugging device
> > +drivers in the media subsystem.
> > +
> > +.. contents::
> > + :depth: 3
> > +
> > +General debugging advice
> > +========================
> > +
> > +For general advice see the `general-debugging-guide <../../debugging/index.html>`__.
> > +
> > +Available tools
> > +===============
> > +
> > +dev_debug module parameter
> > +--------------------------
> > +
> > +For a general overview please see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> > +
> > +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background.
> > +::
> > +
> > + # cat /sys/class/video4linux/video3/name
> > + rkvdec
> > + # echo 0xff > /sys/class/video4linux/video3/dev_debug
> > + # dmesg -wH
> > + [...] videodev: v4l2_open: video3: open (0)
> > + [ +0.000036] video3: VIDIOC_QUERYCAP: driver=rkvdec, card=rkvdec, bus=platform:rkvdec, version=0x00060900, capabilities=0x84204000, device_caps=0x04204000
> > +
> > +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__
> > +
> > +dev_dbg / v4l2_dbg
> > +------------------
> > +
> > +- Difference between both?
> > +
> > + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug
> > + - dev_dbg can be targeted by dynamic debug
> > + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log
> > +
> > +Dynamic debug
> > +-------------
> > +
> > +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> > +
> > +Here is one example, that enables all available `pr_debug()`'s within the file:
> > +::
> > +
> > + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
> > + $ ddcmd '-p; file v4l2-h264.c +p'
> > + $ grep =p /proc/dynamic_debug/control
> > + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
> > + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
> > +
> > +Ftrace
> > +------
> > +
> > +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> > +
> > +Trace whenever the `rkvdec_try_ctrl` function is called
> > +::
> > +
> > + $ cd /sys/kernel/tracing
> > + $ echo function > /sys/kernel/tracing/current_tracer
> > + $ echo rkvdec_try_ctrl > set_ftrace_filter
> > + $ echo 1 > tracing_on
> > + $ cat trace
> > + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster
> > + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster
> > +
> > +Find out from where the calls originate
> > +::
> > +
> > + $ echo 1 > options/func_stack_trace
> > + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster
> > + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace>
> > + => rkvdec_try_ctrl
> > + => try_or_set_cluster
> > + => try_set_ext_ctrls_common
> > + => try_set_ext_ctrls
> > + => v4l2_s_ext_ctrls
> > + => v4l_s_ext_ctrls
> > + ...
> > + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster
> > + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace>
> > + => rkvdec_try_ctrl
> > + => try_or_set_cluster
> > + => v4l2_ctrl_request_setup
> > + => rkvdec_run_preamble
> > + => rkvdec_h264_run
> > + => rkvdec_device_run
> > + ...
> > +
> > +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`)
> > +::
> > +
> > + echo function_graph > current_tracer
> > + echo rkvdec_h264_run > set_graph_function
> > + echo 4 > max_graph_depth
> > + echo do_interrupt_handler mutex_* > set_graph_notrace
> > + echo 1 > options/funcgraph-retval
> > + ...
> > + 4) | rkvdec_h264_run [rockchip_vdec]() {
> > + 4) | v4l2_ctrl_find [videodev]() {
> > + ...
> > + 4) | rkvdec_run_preamble [rockchip_vdec]() {
> > + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */
> > + ...
> > + 4) | v4l2_ctrl_request_setup [videodev]() {
> > + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */
> > + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */
> > + ...
> > + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */
> > + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */
> > + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */
> > + ...
> > + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() {
> > + ...
> > + 4) | rkvdec_run_postamble [rockchip_vdec]() {
> > + ...
> > + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */
> > +
> > +DebugFS
> > +-------
> > +
> > +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> > +
> > +Perf & alternatives
> > +-------------------
> > +
> > +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> > +
> > +Example for media devices:
> > +
> > +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__)
> > +::
> > +
> > + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1
> > + ...
> > + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v':
> > +
> > + 7794.23 msec task-clock:u # 0.697 CPUs utilized
> > + 0 context-switches:u # 0.000 /sec
> > + 0 cpu-migrations:u # 0.000 /sec
> > + 11901 page-faults:u # 1.527 K/sec
> > + 882671556 cycles:u # 0.113 GHz (95.79%)
> > + 711708695 instructions:u # 0.81 insn per cycle (95.79%)
> > + 10581935 branches:u # 1.358 M/sec (15.13%)
> > + 6871144 branch-misses:u # 64.93% of all branches (95.79%)
> > + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%)
> > + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%)
> > + <not supported> LLC-loads:u
> > + <not supported> LLC-load-misses:u
> > +
> > + 11.180830431 seconds time elapsed
> > +
> > + 1.502318000 seconds user
> > + 6.377221000 seconds sys
> > +
> > +The availability of events and metrics depends on the system you are running.
> > +
> > +Error checking & panic analysis
> > +-------------------------------
> > +
> > +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> > +
> > +**Copyright** |copy| 2024 : Collabora
>
> I would add a few more:
>
> - Implementing vidioc_log_status in the driver: this can log the current status to the kernel log.
> It's called by v4l2-ctl --log-status. Very useful for debugging problems with receiving video
> (TV/S-Video/HDMI/etc) since the video signal is external (so unpredictable). Less useful with
> camera sensor inputs since you have control over what the camera sensor does.
To avoid unnecessary complexity in drivers, should we encourage
implementing log_status for receivers but discourage it for camera
sensors ? I haven't seen many people attempting to do so, but I have
pushed back against the debug read/write register ops in sensor drivers.
> - Run v4l2-compliance to verify the driver. To see the detailed media topology (and check it) use:
> v4l2-compliance -M /dev/mediaX --verbose
> You can also run a full compliance check for all devices referenced in the media topology by
> running v4l2-compliance -m /dev/mediaX
>
> > diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst
> > new file mode 100644
> > index 000000000000..0008966c0862
> > --- /dev/null
> > +++ b/Documentation/media/guides/index.rst
> > @@ -0,0 +1,11 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +============
> > +Media Guides
> > +============
> > +
> > +.. toctree::
> > + :caption: Table of Contents
> > + :maxdepth: 1
> > +
> > + debugging_issues
> > diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst
> > index d056a9e99dca..5461876fc401 100644
> > --- a/Documentation/media/index.rst
> > +++ b/Documentation/media/index.rst
> > @@ -7,6 +7,7 @@ Media Subsystem Documentation
> > .. toctree::
> > :maxdepth: 2
> >
> > + guides/index
> > ../userspace-api/media/index
> > ../driver-api/media/index.rst
> > ../admin-guide/media/index
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem
2024-10-23 12:29 ` Laurent Pinchart
@ 2024-10-23 12:37 ` Hans Verkuil
2024-10-23 13:04 ` Dave Stevenson
0 siblings, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2024-10-23 12:37 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Sebastian Fricke, Jonathan Corbet, linux-doc, linux-kernel,
linux-media, mauro.chehab, kernel, bob.beckett, nicolas.dufresne
On 23/10/2024 14:29, Laurent Pinchart wrote:
> On Wed, Oct 23, 2024 at 01:43:34PM +0200, Hans Verkuil wrote:
>> On 24/09/2024 10:45, Sebastian Fricke wrote:
>>> Create a guides section for all documentation material, that isn't
>>> strictly related to a specific piece of code.
>>>
>>> Provide a guide for developers on how to debug code with a focus on the
>>> media subsystem. This document aims to provide a rough overview over the
>>> possibilities and a rational to help choosing the right tool for the
>>> given circumstances.
>>>
>>> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
>>> ---
>>> Documentation/media/guides/debugging_issues.rst | 174 ++++++++++++++++++++++++
>>> Documentation/media/guides/index.rst | 11 ++
>>> Documentation/media/index.rst | 1 +
>>> 3 files changed, 186 insertions(+)
>>>
>>> diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst
>>> new file mode 100644
>>> index 000000000000..5f37801dd4ba
>>> --- /dev/null
>>> +++ b/Documentation/media/guides/debugging_issues.rst
>>> @@ -0,0 +1,174 @@
>>> +.. SPDX-License-Identifier: GPL-2.0
>>> +.. include:: <isonum.txt>
>>> +
>>> +============================================
>>> +Debugging and tracing in the media subsystem
>>> +============================================
>>> +
>>> +This document serves as a starting point and lookup for debugging device
>>> +drivers in the media subsystem.
>>> +
>>> +.. contents::
>>> + :depth: 3
>>> +
>>> +General debugging advice
>>> +========================
>>> +
>>> +For general advice see the `general-debugging-guide <../../debugging/index.html>`__.
>>> +
>>> +Available tools
>>> +===============
>>> +
>>> +dev_debug module parameter
>>> +--------------------------
>>> +
>>> +For a general overview please see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
>>> +
>>> +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background.
>>> +::
>>> +
>>> + # cat /sys/class/video4linux/video3/name
>>> + rkvdec
>>> + # echo 0xff > /sys/class/video4linux/video3/dev_debug
>>> + # dmesg -wH
>>> + [...] videodev: v4l2_open: video3: open (0)
>>> + [ +0.000036] video3: VIDIOC_QUERYCAP: driver=rkvdec, card=rkvdec, bus=platform:rkvdec, version=0x00060900, capabilities=0x84204000, device_caps=0x04204000
>>> +
>>> +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__
>>> +
>>> +dev_dbg / v4l2_dbg
>>> +------------------
>>> +
>>> +- Difference between both?
>>> +
>>> + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug
>>> + - dev_dbg can be targeted by dynamic debug
>>> + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log
>>> +
>>> +Dynamic debug
>>> +-------------
>>> +
>>> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
>>> +
>>> +Here is one example, that enables all available `pr_debug()`'s within the file:
>>> +::
>>> +
>>> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
>>> + $ ddcmd '-p; file v4l2-h264.c +p'
>>> + $ grep =p /proc/dynamic_debug/control
>>> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
>>> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
>>> +
>>> +Ftrace
>>> +------
>>> +
>>> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
>>> +
>>> +Trace whenever the `rkvdec_try_ctrl` function is called
>>> +::
>>> +
>>> + $ cd /sys/kernel/tracing
>>> + $ echo function > /sys/kernel/tracing/current_tracer
>>> + $ echo rkvdec_try_ctrl > set_ftrace_filter
>>> + $ echo 1 > tracing_on
>>> + $ cat trace
>>> + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster
>>> + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster
>>> +
>>> +Find out from where the calls originate
>>> +::
>>> +
>>> + $ echo 1 > options/func_stack_trace
>>> + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster
>>> + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace>
>>> + => rkvdec_try_ctrl
>>> + => try_or_set_cluster
>>> + => try_set_ext_ctrls_common
>>> + => try_set_ext_ctrls
>>> + => v4l2_s_ext_ctrls
>>> + => v4l_s_ext_ctrls
>>> + ...
>>> + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster
>>> + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace>
>>> + => rkvdec_try_ctrl
>>> + => try_or_set_cluster
>>> + => v4l2_ctrl_request_setup
>>> + => rkvdec_run_preamble
>>> + => rkvdec_h264_run
>>> + => rkvdec_device_run
>>> + ...
>>> +
>>> +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`)
>>> +::
>>> +
>>> + echo function_graph > current_tracer
>>> + echo rkvdec_h264_run > set_graph_function
>>> + echo 4 > max_graph_depth
>>> + echo do_interrupt_handler mutex_* > set_graph_notrace
>>> + echo 1 > options/funcgraph-retval
>>> + ...
>>> + 4) | rkvdec_h264_run [rockchip_vdec]() {
>>> + 4) | v4l2_ctrl_find [videodev]() {
>>> + ...
>>> + 4) | rkvdec_run_preamble [rockchip_vdec]() {
>>> + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */
>>> + ...
>>> + 4) | v4l2_ctrl_request_setup [videodev]() {
>>> + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */
>>> + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */
>>> + ...
>>> + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */
>>> + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */
>>> + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */
>>> + ...
>>> + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() {
>>> + ...
>>> + 4) | rkvdec_run_postamble [rockchip_vdec]() {
>>> + ...
>>> + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */
>>> +
>>> +DebugFS
>>> +-------
>>> +
>>> +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
>>> +
>>> +Perf & alternatives
>>> +-------------------
>>> +
>>> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
>>> +
>>> +Example for media devices:
>>> +
>>> +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__)
>>> +::
>>> +
>>> + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1
>>> + ...
>>> + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v':
>>> +
>>> + 7794.23 msec task-clock:u # 0.697 CPUs utilized
>>> + 0 context-switches:u # 0.000 /sec
>>> + 0 cpu-migrations:u # 0.000 /sec
>>> + 11901 page-faults:u # 1.527 K/sec
>>> + 882671556 cycles:u # 0.113 GHz (95.79%)
>>> + 711708695 instructions:u # 0.81 insn per cycle (95.79%)
>>> + 10581935 branches:u # 1.358 M/sec (15.13%)
>>> + 6871144 branch-misses:u # 64.93% of all branches (95.79%)
>>> + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%)
>>> + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%)
>>> + <not supported> LLC-loads:u
>>> + <not supported> LLC-load-misses:u
>>> +
>>> + 11.180830431 seconds time elapsed
>>> +
>>> + 1.502318000 seconds user
>>> + 6.377221000 seconds sys
>>> +
>>> +The availability of events and metrics depends on the system you are running.
>>> +
>>> +Error checking & panic analysis
>>> +-------------------------------
>>> +
>>> +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
>>> +
>>> +**Copyright** |copy| 2024 : Collabora
>>
>> I would add a few more:
>>
>> - Implementing vidioc_log_status in the driver: this can log the current status to the kernel log.
>> It's called by v4l2-ctl --log-status. Very useful for debugging problems with receiving video
>> (TV/S-Video/HDMI/etc) since the video signal is external (so unpredictable). Less useful with
>> camera sensor inputs since you have control over what the camera sensor does.
>
> To avoid unnecessary complexity in drivers, should we encourage
> implementing log_status for receivers but discourage it for camera
> sensors ? I haven't seen many people attempting to do so, but I have
> pushed back against the debug read/write register ops in sensor drivers.
I'm fine with that.
Logging the current status is particularly useful if you have no control over
what you receive, so anything can happen. But for camera sensors it is typically
not needed.
The register debug ops I haven't used in a very long time, and I wonder if it
shouldn't be deprecated.
Regards,
Hans
>
>> - Run v4l2-compliance to verify the driver. To see the detailed media topology (and check it) use:
>> v4l2-compliance -M /dev/mediaX --verbose
>> You can also run a full compliance check for all devices referenced in the media topology by
>> running v4l2-compliance -m /dev/mediaX
>>
>>> diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst
>>> new file mode 100644
>>> index 000000000000..0008966c0862
>>> --- /dev/null
>>> +++ b/Documentation/media/guides/index.rst
>>> @@ -0,0 +1,11 @@
>>> +.. SPDX-License-Identifier: GPL-2.0
>>> +
>>> +============
>>> +Media Guides
>>> +============
>>> +
>>> +.. toctree::
>>> + :caption: Table of Contents
>>> + :maxdepth: 1
>>> +
>>> + debugging_issues
>>> diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst
>>> index d056a9e99dca..5461876fc401 100644
>>> --- a/Documentation/media/index.rst
>>> +++ b/Documentation/media/index.rst
>>> @@ -7,6 +7,7 @@ Media Subsystem Documentation
>>> .. toctree::
>>> :maxdepth: 2
>>>
>>> + guides/index
>>> ../userspace-api/media/index
>>> ../driver-api/media/index.rst
>>> ../admin-guide/media/index
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem
2024-10-23 12:37 ` Hans Verkuil
@ 2024-10-23 13:04 ` Dave Stevenson
0 siblings, 0 replies; 18+ messages in thread
From: Dave Stevenson @ 2024-10-23 13:04 UTC (permalink / raw)
To: Hans Verkuil
Cc: Laurent Pinchart, Sebastian Fricke, Jonathan Corbet, linux-doc,
linux-kernel, linux-media, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne
On Wed, 23 Oct 2024 at 13:37, Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> On 23/10/2024 14:29, Laurent Pinchart wrote:
> > On Wed, Oct 23, 2024 at 01:43:34PM +0200, Hans Verkuil wrote:
> >> On 24/09/2024 10:45, Sebastian Fricke wrote:
> >>> Create a guides section for all documentation material, that isn't
> >>> strictly related to a specific piece of code.
> >>>
> >>> Provide a guide for developers on how to debug code with a focus on the
> >>> media subsystem. This document aims to provide a rough overview over the
> >>> possibilities and a rational to help choosing the right tool for the
> >>> given circumstances.
> >>>
> >>> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> >>> ---
> >>> Documentation/media/guides/debugging_issues.rst | 174 ++++++++++++++++++++++++
> >>> Documentation/media/guides/index.rst | 11 ++
> >>> Documentation/media/index.rst | 1 +
> >>> 3 files changed, 186 insertions(+)
> >>>
> >>> diff --git a/Documentation/media/guides/debugging_issues.rst b/Documentation/media/guides/debugging_issues.rst
> >>> new file mode 100644
> >>> index 000000000000..5f37801dd4ba
> >>> --- /dev/null
> >>> +++ b/Documentation/media/guides/debugging_issues.rst
> >>> @@ -0,0 +1,174 @@
> >>> +.. SPDX-License-Identifier: GPL-2.0
> >>> +.. include:: <isonum.txt>
> >>> +
> >>> +============================================
> >>> +Debugging and tracing in the media subsystem
> >>> +============================================
> >>> +
> >>> +This document serves as a starting point and lookup for debugging device
> >>> +drivers in the media subsystem.
> >>> +
> >>> +.. contents::
> >>> + :depth: 3
> >>> +
> >>> +General debugging advice
> >>> +========================
> >>> +
> >>> +For general advice see the `general-debugging-guide <../../debugging/index.html>`__.
> >>> +
> >>> +Available tools
> >>> +===============
> >>> +
> >>> +dev_debug module parameter
> >>> +--------------------------
> >>> +
> >>> +For a general overview please see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> >>> +
> >>> +Every video device provides a `dev_debug` parameter, which allows to get further insights into the IOCTLs in the background.
> >>> +::
> >>> +
> >>> + # cat /sys/class/video4linux/video3/name
> >>> + rkvdec
> >>> + # echo 0xff > /sys/class/video4linux/video3/dev_debug
> >>> + # dmesg -wH
> >>> + [...] videodev: v4l2_open: video3: open (0)
> >>> + [ +0.000036] video3: VIDIOC_QUERYCAP: driver=rkvdec, card=rkvdec, bus=platform:rkvdec, version=0x00060900, capabilities=0x84204000, device_caps=0x04204000
> >>> +
> >>> +`Full documentation <../../driver-api/media/v4l2-dev.html#video-device-debugging>`__
> >>> +
> >>> +dev_dbg / v4l2_dbg
> >>> +------------------
> >>> +
> >>> +- Difference between both?
> >>> +
> >>> + - v4l2_dbg utilizes v4l2_printk under the hood, which further uses printk directly, thus it cannot be targeted by dynamic debug
> >>> + - dev_dbg can be targeted by dynamic debug
> >>> + - v4l2_dbg has a more specific prefix format for the media subsystem, while dev_dbg only highlights the driver name and the location of the log
> >>> +
> >>> +Dynamic debug
> >>> +-------------
> >>> +
> >>> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> >>> +
> >>> +Here is one example, that enables all available `pr_debug()`'s within the file:
> >>> +::
> >>> +
> >>> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
> >>> + $ ddcmd '-p; file v4l2-h264.c +p'
> >>> + $ grep =p /proc/dynamic_debug/control
> >>> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
> >>> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
> >>> +
> >>> +Ftrace
> >>> +------
> >>> +
> >>> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> >>> +
> >>> +Trace whenever the `rkvdec_try_ctrl` function is called
> >>> +::
> >>> +
> >>> + $ cd /sys/kernel/tracing
> >>> + $ echo function > /sys/kernel/tracing/current_tracer
> >>> + $ echo rkvdec_try_ctrl > set_ftrace_filter
> >>> + $ echo 1 > tracing_on
> >>> + $ cat trace
> >>> + h264parse0:sink-6359 [001] ...1. 172714.547523: rkvdec_try_ctrl <-try_or_set_cluster
> >>> + h264parse0:sink-6359 [005] ...1. 172714.567386: rkvdec_try_ctrl <-try_or_set_cluster
> >>> +
> >>> +Find out from where the calls originate
> >>> +::
> >>> +
> >>> + $ echo 1 > options/func_stack_trace
> >>> + h264parse0:sink-6715 [002] ..... 172837.967762: rkvdec_try_ctrl <-try_or_set_cluster
> >>> + h264parse0:sink-6715 [002] ..... 172837.967773: <stack trace>
> >>> + => rkvdec_try_ctrl
> >>> + => try_or_set_cluster
> >>> + => try_set_ext_ctrls_common
> >>> + => try_set_ext_ctrls
> >>> + => v4l2_s_ext_ctrls
> >>> + => v4l_s_ext_ctrls
> >>> + ...
> >>> + h264parse0:sink-6715 [004] ..... 172837.985747: rkvdec_try_ctrl <-try_or_set_cluster
> >>> + h264parse0:sink-6715 [004] ..... 172837.985750: <stack trace>
> >>> + => rkvdec_try_ctrl
> >>> + => try_or_set_cluster
> >>> + => v4l2_ctrl_request_setup
> >>> + => rkvdec_run_preamble
> >>> + => rkvdec_h264_run
> >>> + => rkvdec_device_run
> >>> + ...
> >>> +
> >>> +Trace the children of a function call and show the return values (requires config `FUNCTION_GRAPH_RETVAL`)
> >>> +::
> >>> +
> >>> + echo function_graph > current_tracer
> >>> + echo rkvdec_h264_run > set_graph_function
> >>> + echo 4 > max_graph_depth
> >>> + echo do_interrupt_handler mutex_* > set_graph_notrace
> >>> + echo 1 > options/funcgraph-retval
> >>> + ...
> >>> + 4) | rkvdec_h264_run [rockchip_vdec]() {
> >>> + 4) | v4l2_ctrl_find [videodev]() {
> >>> + ...
> >>> + 4) | rkvdec_run_preamble [rockchip_vdec]() {
> >>> + 4) 4.666 us | v4l2_m2m_next_buf [v4l2_mem2mem](); /* = 0xffff000005782000 */
> >>> + ...
> >>> + 4) | v4l2_ctrl_request_setup [videodev]() {
> >>> + 4) 4.667 us | media_request_object_find [mc](); /* = 0xffff000005e3aa98 */
> >>> + 4) 1.750 us | find_ref [videodev](); /* = 0xffff00000833b2a0 */
> >>> + ...
> >>> + 4) 1.750 us | v4l2_m2m_buf_copy_metadata [v4l2_mem2mem](); /* = 0x0 */
> >>> + 4) ! 114.333 us | } /* rkvdec_run_preamble [rockchip_vdec] = 0x0 */
> >>> + 4) 2.334 us | v4l2_h264_init_reflist_builder [v4l2_h264](); /* = 0x3e */
> >>> + ...
> >>> + 4) | v4l2_h264_build_b_ref_lists [v4l2_h264]() {
> >>> + ...
> >>> + 4) | rkvdec_run_postamble [rockchip_vdec]() {
> >>> + ...
> >>> + 4) ! 444.208 us | } /* rkvdec_h264_run [rockchip_vdec] = 0x0 */
> >>> +
> >>> +DebugFS
> >>> +-------
> >>> +
> >>> +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> >>> +
> >>> +Perf & alternatives
> >>> +-------------------
> >>> +
> >>> +For general advice see the `userspace-debugging-guide <../../debugging/userspace_debugging_guide.html>`__.
> >>> +
> >>> +Example for media devices:
> >>> +
> >>> +Gather statistics data for a decoding job: (This example is on a RK3399 SoC with the rkvdec codec driver using the `fluster test suite <https://github.com/fluendo/fluster>`__)
> >>> +::
> >>> +
> >>> + perf stat -d python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1
> >>> + ...
> >>> + Performance counter stats for 'python3 fluster.py run -d GStreamer-H.264-V4L2SL-Gst1.0 -ts JVT-AVC_V1 -tv AUD_MW_E -j1 -v':
> >>> +
> >>> + 7794.23 msec task-clock:u # 0.697 CPUs utilized
> >>> + 0 context-switches:u # 0.000 /sec
> >>> + 0 cpu-migrations:u # 0.000 /sec
> >>> + 11901 page-faults:u # 1.527 K/sec
> >>> + 882671556 cycles:u # 0.113 GHz (95.79%)
> >>> + 711708695 instructions:u # 0.81 insn per cycle (95.79%)
> >>> + 10581935 branches:u # 1.358 M/sec (15.13%)
> >>> + 6871144 branch-misses:u # 64.93% of all branches (95.79%)
> >>> + 281716547 L1-dcache-loads:u # 36.144 M/sec (95.79%)
> >>> + 9019581 L1-dcache-load-misses:u # 3.20% of all L1-dcache accesses (95.79%)
> >>> + <not supported> LLC-loads:u
> >>> + <not supported> LLC-load-misses:u
> >>> +
> >>> + 11.180830431 seconds time elapsed
> >>> +
> >>> + 1.502318000 seconds user
> >>> + 6.377221000 seconds sys
> >>> +
> >>> +The availability of events and metrics depends on the system you are running.
> >>> +
> >>> +Error checking & panic analysis
> >>> +-------------------------------
> >>> +
> >>> +For general advice see the `driver-development-debugging-guide <../../debugging/driver_development_debugging_guide.html>`__.
> >>> +
> >>> +**Copyright** |copy| 2024 : Collabora
> >>
> >> I would add a few more:
> >>
> >> - Implementing vidioc_log_status in the driver: this can log the current status to the kernel log.
> >> It's called by v4l2-ctl --log-status. Very useful for debugging problems with receiving video
> >> (TV/S-Video/HDMI/etc) since the video signal is external (so unpredictable). Less useful with
> >> camera sensor inputs since you have control over what the camera sensor does.
> >
> > To avoid unnecessary complexity in drivers, should we encourage
> > implementing log_status for receivers but discourage it for camera
> > sensors ? I haven't seen many people attempting to do so, but I have
> > pushed back against the debug read/write register ops in sensor drivers.
>
> I'm fine with that.
>
> Logging the current status is particularly useful if you have no control over
> what you receive, so anything can happen. But for camera sensors it is typically
> not needed.
>
> The register debug ops I haven't used in a very long time, and I wonder if it
> shouldn't be deprecated.
For most things I'll agree that you can use "i2ctransfer -f" to send
I2C commands independent of the driver, so implementing it has limited
gain.
However with adv7180 (and potentially others) the driver caches a page
register for accessing different pages of registers.
From userspace you can't easily know which page is currently being
used, so can't change it without potentially messing up the driver's
next access. That rather limits debug options through other routes, so
for that particular device I would be sad to see the ops go. I have a
patch that I need to send which adds support for the register debug
ops to adv7180.
Dave
> Regards,
>
> Hans
>
> >
> >> - Run v4l2-compliance to verify the driver. To see the detailed media topology (and check it) use:
> >> v4l2-compliance -M /dev/mediaX --verbose
> >> You can also run a full compliance check for all devices referenced in the media topology by
> >> running v4l2-compliance -m /dev/mediaX
> >>
> >>> diff --git a/Documentation/media/guides/index.rst b/Documentation/media/guides/index.rst
> >>> new file mode 100644
> >>> index 000000000000..0008966c0862
> >>> --- /dev/null
> >>> +++ b/Documentation/media/guides/index.rst
> >>> @@ -0,0 +1,11 @@
> >>> +.. SPDX-License-Identifier: GPL-2.0
> >>> +
> >>> +============
> >>> +Media Guides
> >>> +============
> >>> +
> >>> +.. toctree::
> >>> + :caption: Table of Contents
> >>> + :maxdepth: 1
> >>> +
> >>> + debugging_issues
> >>> diff --git a/Documentation/media/index.rst b/Documentation/media/index.rst
> >>> index d056a9e99dca..5461876fc401 100644
> >>> --- a/Documentation/media/index.rst
> >>> +++ b/Documentation/media/index.rst
> >>> @@ -7,6 +7,7 @@ Media Subsystem Documentation
> >>> .. toctree::
> >>> :maxdepth: 2
> >>>
> >>> + guides/index
> >>> ../userspace-api/media/index
> >>> ../driver-api/media/index.rst
> >>> ../admin-guide/media/index
> >
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 0/3] Documentation: Debugging guide
2024-09-24 8:45 [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
` (3 preceding siblings ...)
2024-10-22 15:23 ` [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
@ 2024-10-24 22:01 ` Jonathan Corbet
4 siblings, 0 replies; 18+ messages in thread
From: Jonathan Corbet @ 2024-10-24 22:01 UTC (permalink / raw)
To: Sebastian Fricke
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne, Sebastian Fricke
Sebastian Fricke <sebastian.fricke@collabora.com> writes:
> The RFC contains:
> - a general debugging guide split into debugging for driver developers and
> debugging from userspace
> - a new summary page for all media related documentation. This is inspired by
> other subsystems, which first of all allows a user to find the subsystem
> under the subsystems page and secondly eases general navigation through the
> documentation that is sprinkled onto multiple places.
> - a guide on how to debug code in the media subsystem, which points to the
> parts of the general documentation and adds own routines.
I've just begun to take a look at this, apologies for taking so long.
Overall:
- I have been trying to reduce the number of top-level directories under
Documentation/, and this adds two more. Can we avoid that? Let's
start in that direction by putting your debugging guide inside
Documentation/process, please.
- If we *must* create a separate "media" directory, please make it
devices/media. My plan is to move most of the device-specific
documentation under Documentation/devices, making it match the source
layout; I just haven't summoned up the energy to start the slog of
actually doing it.
But it would be nice to avoid that altogether here. If we create
Documentation/process/debugging/, it should be able to hold both your
general and media-specific guides, and perhaps other
subsystem-specific guides could eventually land there as well.
- Please adhere to the 80-column limit for written text. It really does
make a difference for people reading it.
Thanks,
jon
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC v2 2/3] docs: Add guides section for debugging
2024-09-24 8:45 ` [PATCH RFC v2 2/3] docs: Add guides section for debugging Sebastian Fricke
2024-09-25 8:01 ` Bagas Sanjaya
2024-10-03 23:06 ` Steve Cho
@ 2024-10-24 22:20 ` Jonathan Corbet
2 siblings, 0 replies; 18+ messages in thread
From: Jonathan Corbet @ 2024-10-24 22:20 UTC (permalink / raw)
To: Sebastian Fricke
Cc: linux-doc, linux-kernel, linux-media, laurent.pinchart,
hverkuil-cisco, mauro.chehab, kernel, bob.beckett,
nicolas.dufresne, Sebastian Fricke
Sebastian Fricke <sebastian.fricke@collabora.com> writes:
Mostly low-level comments here; the organizational comments I made
before also apply, of course.
> This idea was formed after noticing that new developers experience
> certain difficulty to navigate within the multitude of different
> debugging options in the Kernel and while there often is good
> documentation for the tools, the developer has to know first that they
> exist and where to find them.
> Add a general debugging section to the Kernel documentation, as an
> easily locatable entry point to other documentation and as a general
> guideline for the topic.
>
> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
> ---
> .../driver_development_debugging_guide.rst | 193 +++++++++++++++
> Documentation/debugging/index.rst | 66 +++++
> .../debugging/userspace_debugging_guide.rst | 269 +++++++++++++++++++++
> Documentation/index.rst | 2 +
> 4 files changed, 530 insertions(+)
>
> diff --git a/Documentation/debugging/driver_development_debugging_guide.rst b/Documentation/debugging/driver_development_debugging_guide.rst
> new file mode 100644
> index 000000000000..c750f63ac1d3
> --- /dev/null
> +++ b/Documentation/debugging/driver_development_debugging_guide.rst
> @@ -0,0 +1,193 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
I hadn't realized how many people have gone nuts with that file. I
wouldn't be surprised to learn that it slows the build measurably at
this point, all so that we can say |copy|... Maybe I'll go through and
replace all that with a simple © someday.
> +========================================
> +Debugging advice for driver development
> +========================================
> +
> +This document serves as a general starting point and lookup for debugging device
> +drivers.
> +While this guide focuses on debugging that requires re-compiling the
> +module/kernel, the `userspace-debugging-guide <userspace_debugging_guide.html>`__
You can refer to another file just by its name; say
Documentation/process/debugging/userspace_debugging_guide.rst and the
right things will happen.
People reading the text files aren't looking for HTML files, so we
really don't want to make internal links that way.
> +will guide you through tools like dynamic debug, ftrace and other tools useful
> +for debugging issues and behavior.
> +For general debugging advice, see `general-debugging-guide <index.html>`__.
> +
> +.. contents::
> + :depth: 3
> +
> +Available tools
> +===============
> +
> +Printk & friends
> +----------------
> +
> +These are derivatives of printf() with varying destinations and support for being dynamically turned on or off, or lack thereof.
> +
> +.. _printk:
Do you really need this label? That's a pretty general term to put into
a global namespace.
> +**Simple printk**
> +~~~~~~~~~~~~~~~~~
Why the ** markup? It's already a section head
> +The classic, can be used to great effect for quick and dirty development
> +of new modules or to extract arbitrary necessary data for troubleshooting.
> +
> +Prerequisite: :code:`CONFIG_PRINTK` (usually enabled by default)
No need to clutter things with :code:; just say `CONFIG_PRINTK` (or even
leave out the literal markup entirely).
> +**Pros**:
> +
> +- No need to learn anything, simple to use
> +- Easy to modify exactly to your needs (formatting of the data (See: `format specifiers <../core-api/printk-formats.html>`__), visibility in the log)
> +- Can cause delays in the execution of the code (beneficial to confirm whether timing is a factor)
> +
> +**Cons**:
> +
> +- Requires rebuilding the kernel/module
> +- Can cause delays in the execution of the code (which can cause issues to be not reproducible)
> +
> +`Full documentation <../core-api/printk-basics.html>`__
Documentation/core-api/printk-basics.rst
> +.. _trace_printk:
> +
> +**Trace_printk**
> +~~~~~~~~~~~~~~~~
> +
> +Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE` & :code:`#include <linux/ftrace.h>`
> +
> +It is a tiny bit less comfortable to use than `printk`_, because you will have
> +to read the messages from the trace file (See: `Reading the ftrace log
> +<userspace_debugging_guide.html#read-the-ftrace-log>`_ instead of from the
> +kernel log, but very useful when printk adds unwanted delays into the code
Say printk() with parentheses, and the build process will recognize it
and make an automatic link to the documentation; that would be good to
do throughout.
> +execution, causing issues to be flaky or hidden.)
> +
> +If the processing of this still causes timing issues then you can try `trace_puts()`.
...but do it without `literal` markup, let the build system do it.
> +`Full Documentation <../driver-api/basics.html#c.trace_printk>`__
> +
> +**dev_dbg**
> +~~~~~~~~~~~
> +
> +Print statement, which can be target by `dynamic debug
> +<userspace_debugging_guide.html#dynamic-debug>`__, that contains additional
> +information about the device used within the context.
> +
> +**When is it appropriate to leave a debug print in the code?**
> +
> +Permanent debug statements have to be useful for a developer to troubleshoot
> +driver misbehavior. Judging that is a bit more of an art than a science, but
> +some guidelines are in the `Coding style guide
> +<../process/coding-style.html#printing-kernel-messages>`__.
> +
> +**Custom printk**
> +~~~~~~~~~~~~~~~~~
> +
> +Example:
> +::
This can be just "Example::" - good to fix throughout.
> + #define core_dbg(fmt, arg...) do { \
> + if (core_debug) \
> + printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
> + } while (0)
> +
> +**When should you do this?**
> +
> +It is better to just use a `pr_debug()`, which can later be turned on/off with
> +dynamic debug. Additionally, a lot of drivers activate these prints via a
> +variable like `core_debug` set by a module parameter. However, Module
> +parameters `are not recommended anymore
> +<https://lkml.org/lkml/2024/3/27/163>`_.
Use lore links, please
> +Ftrace
> +------
> +
> +**Creating custom Ftrace tracepoint**
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Here is a basic description of `how to implement new tracepoints <../trace/tracepoints.html#usage>`__.
> +
> +`Full event tracing documentation <../trace/events.html>`__
> +
> +`Full Ftrace documentation <../trace/ftrace.html>`__
> +
> +DebugFS
> +-------
> +
> +Prerequisite: :code:`CONFIG_DEBUG_FS` & :code:`#include <linux/debugfs.h>`
> +
> +DebugFS differs from the other approaches of debugging, as it doesn't write messages to the kernel log nor add traces to the code. Instead it allows the developer to handle a set of files.
> +With these files you can either store values of variables or make register/memory dumps or you can make these files writable and modify values/settings in the driver.
> +Possible use-cases among others:
> +
> +- Store register values
> +- Keep track of variables
> +- Store errors
> +- Store settings
> +- Toggle a setting like debug on/off
> +- Error injection
> +
> +This is especially useful, when the size of a data dump would be hard to digest as
> +part of the general kernel log (for example when dumping raw bitstream data) or
> +when you are not interested in all the values all the time, but with the
> +possibility to inspect them.
> +
> +The general idea is:
> +
> +- Create a directory during probe (`struct dentry *parent = debugfs_create_dir("my_driver", NULL);`)
> +- Create a file (`debugfs_create_u32("my_value", 444, parent, &my_variable);`)
> +
> + - In this example the file is found in `/sys/kernel/debug/my_driver/my_value` (with read permissions for user/group/all)
> + - any update of `my_variable` will update the value in the file
> +
> +- Clean up the folder when removing the device (`debugfs_remove_recursive(parent);`)
> +
> +`Full documentation <../filesystems/debugfs.html>`__
> +
> +.. _error_checking:
> +
> +KASAN, UBSAN, lockdep and other error checkers
> +----------------------------------------------
> +
> +KASAN (Kernel Address Sanitizer)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +KASAN is a dynamic memory error detector that helps to find use-after-free and
> +out-of-bounds bugs. It uses compile-time instrumentation to check every memory
> +access.
> +
> +`Full documentation <../dev-tools/kasan.html>`__
> +
> +UBSAN (Undefined Behavior Sanitizer)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +UBSAN relies on compiler instrumentation and runtime checks to detect undefined
> +behavior. It is designed to find a variety of issues, including signed integer overflow,
> +array index out of bounds, and more.
> +
> +`Full documentation <../dev-tools/ubsan.html>`__
> +
> +lockdep (Lock Dependency Validator)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +lockdep is a runtime lock dependency validator that detects potential deadlocks
> +and other locking-related issues in the kernel.
> +It tracks lock acquisitions and releases, building a dependency graph that is
> +analyzed for potential deadlocks.
> +lockdep is especially useful for validating the correctness of lock ordering in
> +the kernel.
> +
> +device coredump
> +---------------
> +
> +Prerequisite: :code:`#include <linux/devcoredump.h>`
> +
> +Provides infrastructure through which the driver can provide arbitrary data to
> +userland. It is most often used in conjunction with udev or similar userland
> +infrastructure to listen for the kernel uevents, which indicates the dump is
> +ready. Udev then usually has rules to copy that file somewhere for long-term
> +storage and analysis as by default the data for the dump is automatically
> +cleaned up after 5 minutes.
> +That data is then analyzed with driver-specific tools or GDB.
> +
> +You can find an example implementation at: :code:`drivers/media/platform/qcom/venus/core.c`
> +
> +**Copyright** |copy| 2024 : Collabora
> diff --git a/Documentation/debugging/index.rst b/Documentation/debugging/index.rst
> new file mode 100644
> index 000000000000..7bdad2fa09e1
> --- /dev/null
> +++ b/Documentation/debugging/index.rst
> @@ -0,0 +1,66 @@
> +
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +====================================================
> +General debugging advice for Linux Kernel developers
> +====================================================
> +
> +.. toctree::
> + :maxdepth: 1
> +
> + driver_development_debugging_guide
> + userspace_debugging_guide
> +
> +.. only:: subproject and html
> +
> + Indices
> + =======
> +
> + * :ref:`genindex`
> +
> +General debugging advice
> +========================
> +
> +Depending on the issue, a different set of tools is available to track down the
> +problem or even to realize whether there is one in the first place.
> +
> +As a first step you have to figure out what kind of issue you want to debug.
> +Depending on the answer, your methodology and choice of tools may vary.
> +
> +Do I need to debug with limited access?
> +---------------------------------------
> +
> +Do you have limited access to the machine or are you unable to stop the running execution?
> +
> +In this case your debugging capability depends on built-in debugging support of
> +provided distro kernel.
> +The `userspace debugging guide <userspace_debugging_guide.html>`__ provides a
> +brief overview over range of possible debugging tools in that situation. You
> +can check the capability of your kernel, in most cases, by looking into config
> +file within the /boot folder.
> +
> +Do I have root access to the system?
> +------------------------------------
> +
> +Are you easily able to replace the module in question or to install a new kernel?
> +
> +In that case your range of available tools is a lot bigger, you can find the
> +tools `here <driver_development_debugging_guide.html>`__.
> +
> +Is timing a factor?
> +-------------------
> +
> +It is important to understand if the problem you want to debug manifests itself
> +consistently (i.e. given a set of inputs you always get the same, incorrect
> +output), or inconsistently. If it manifests itself inconsistently, some timing
> +factor might be at play. If inserting delays into the code does change the
> +behavior, then quite likely timing is a factor.
> +
> +When timing does alter the outcome of the code execution using a simple `printk
> +<driver_development_debugging_guide.html#printk>`_ for debugging purposes won't
> +work, a similar alternative is to use `trace_printk
> +<driver_development_debugging_guide.html#trace-printk>`_, which logs the debug
> +messages to the trace file instead of the kernel log.
> +
> +**Copyright** |copy| 2024 : Collabora
> diff --git a/Documentation/debugging/userspace_debugging_guide.rst b/Documentation/debugging/userspace_debugging_guide.rst
> new file mode 100644
> index 000000000000..4d269a9ef913
> --- /dev/null
> +++ b/Documentation/debugging/userspace_debugging_guide.rst
> @@ -0,0 +1,269 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
> +
> +==========================
> +Userspace debugging advice
> +==========================
> +
> +A brief overview of common tools to debug the Linux Kernel from userspace.
> +For debugging advice aimed at driver developer go `here <driver_development_debugging_guide.html>`__.
> +For general debugging advice, see `general-debugging-guide <index.html>`__.
> +
> +.. contents::
> + :depth: 3
> +
> +Available tools
> +===============
> +
> +Dynamic debug
> +-------------
> +
> +Mechanism to filter what ends up in the kernel log by dis-/en-abling log
> +messages.
> +
> +Prerequisite: `CONFIG_DYNAMIC_DEBUG`
> +
> +.. _valid_dyndbg_prints:
> +
> +Dynamic debug is only able to target:
> +
> +- `pr_debug()`
> +- `dev_dbg()`
> +- `print_hex_dump_debug()`
> +- `print_hex_dump_bytes()`
> +
> +Therefore the usability of this tool is quite limited in the media subsystem,
> +because, as of now, there is no uniform rule for adding debug prints to the codebase,
> +resulting in a variety of ways these prints are implemented.
This is supposed to be the general user-space debugging guide, so the
"media" reference doesn't quite fit.
> +Also, note that most debug statements are implemented as a variation of
> +`dprintk`, which have to be activated via a parameter in respective module,
> +dynamic debug is unable to do that step for you.
> +
> +Here is one example, that enables all available `pr_debug()`'s within the file:
> +::
> +
> + $ alias ddcmd='echo $* > /proc/dynamic_debug/control'
> + $ ddcmd '-p; file v4l2-h264.c +p'
> + $ grep =p /proc/dynamic_debug/control
> + drivers/media/v4l2-core/v4l2-h264.c:372 [v4l2_h264]print_ref_list_b =p "ref_pic_list_b%u (cur_poc %u%c) %s"
> + drivers/media/v4l2-core/v4l2-h264.c:333 [v4l2_h264]print_ref_list_p =p "ref_pic_list_p (cur_poc %u%c) %s\n"
> +
> +**When should you use this over** `Ftrace`_ **?**
> +
> +- When the code contains one of the :ref:`valid print statements <valid_dyndbg_prints_>`_ or when you have added multiple pr_debug() statements during development
> +- When timing is not an issue, meaning if multiple `pr_debug()` statements in the code won't cause delays
> +- When you care more about receiving specific log messages than tracing the pattern of how a function is called
> +
> +`Full documentation <../admin-guide/dynamic-debug-howto.html>`__
> +
> +Ftrace
> +------
> +
> +Prerequisite: :code:`CONFIG_DYNAMIC_FTRACE`
> +
> +Trace whenever the a file is opened:
> +::
> +
> + $ cd /sys/kernel/tracing
> + $ echo function > /sys/kernel/tracing/current_tracer
> + $ echo do_filep_open > set_ftrace_filter
> + $ echo 1 > tracing_on
> + $ cat trace
> + find-4624 [005] ...1. 580781.888166: do_filp_open <-do_sys_openat2
> + find-4624 [005] ...1. 580781.888237: do_filp_open <-do_sys_openat2
> + find-4624 [005] ...1. 580781.888361: do_filp_open <-do_sys_openat2
> +
> +.. _event_tracing:
It seems we're kind of duplicating some of the ftrace documentation at
this point? Duplicate docs have the same problems as duplicate code -
they get out of sync and are best avoided.
Also, *please* resist the temptation to add labels everywhere.
Stopping here; I think this should be enough to get started with.
Thanks for working on this, I think it could turn into useful material
to have.
jon
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-10-24 22:20 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-24 8:45 [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
2024-09-24 8:45 ` [PATCH RFC v2 1/3] docs: media: Create separate documentation folder for media Sebastian Fricke
2024-09-24 8:45 ` [PATCH RFC v2 2/3] docs: Add guides section for debugging Sebastian Fricke
2024-09-25 8:01 ` Bagas Sanjaya
2024-10-03 23:06 ` Steve Cho
2024-10-04 10:12 ` Sebastian Fricke
2024-10-04 23:30 ` Steve Cho
2024-10-24 22:20 ` Jonathan Corbet
2024-09-24 8:45 ` [PATCH RFC v2 3/3] docs: media: Debugging guide for the media subsystem Sebastian Fricke
2024-10-03 22:32 ` Steve Cho
2024-10-14 17:42 ` Daniel Almeida
2024-10-23 11:43 ` Hans Verkuil
2024-10-23 12:29 ` Laurent Pinchart
2024-10-23 12:37 ` Hans Verkuil
2024-10-23 13:04 ` Dave Stevenson
2024-10-22 15:23 ` [PATCH RFC v2 0/3] Documentation: Debugging guide Sebastian Fricke
2024-10-23 6:57 ` Bagas Sanjaya
2024-10-24 22:01 ` Jonathan Corbet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).