From: Xunlei Pang <xpang-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Dave Young <dyoung-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	xlpang-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: Harald Hoyer <harald-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	initramfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Pratyush Anand <panand-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC PATCH 1/2] add 99memdebug-ko dracut module
Date: Thu, 3 Nov 2016 19:52:56 +0800
Message-ID: <581B2518.6060503@redhat.com>
In-Reply-To: <20161103083826.GA15431-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>

On 2016/11/03 at 16:38, Dave Young wrote:
> On 11/03/16 at 03:28pm, Xunlei Pang wrote:
> [snip]
>>> For large trace data (tested on rhel7, where the filter doesn't work and huge trace data is produced),
>>> the time consumption is huge, I am afraid minutes, since I once suspected the script was stuck in a
>>> dead loop when parsing "tracing/trace" directly. The same happens after turning off tracing_on and
>>> trying again.
>> Although I don't know why, after I replaced the following script
>> 1)
>> while read pid cpu flags ts function
>> do
>>     ... ...
>> done < "$TRACE_BASE/tracing/trace"
>>
>> with
>>
>> 2)
>> cat "$TRACE_BASE/tracing/trace" | while read pid cpu flags ts function
>> do
>>     ... ...
>> done
>>
>> 2) was no longer time-consuming, just like parsing a copy of the file with 1) ...
> Maybe 1) reads the sysfs file many times, while 2) reads it only once and
> then parses the data through a pipe, which is quicker.
>
> It should be fine if 2) is acceptable, but if the data is very large it
> may be worth using an external program like awk, which will be faster.
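
To illustrate the awk suggestion above, here is a minimal single-pass sketch. It assumes the default ftrace text layout (the "task-pid" token first, the event name followed by a colon) and that mm_page_alloc lines carry an "order=N" field. The function name sum_module_pages is hypothetical and is not part of the patch below.

```shell
# Hypothetical helper: read ftrace text on stdin and print the pages
# allocated by each task between its module_load and the next module_put.
sum_module_pages() {
    awk '
    /^#/ { next }                        # skip the header comment lines
    {
        task = $1                        # e.g. "modprobe-123"
        evt = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^(module_load|module_put|mm_page_alloc):$/) {
                evt = $i; pos = i; break
            }
        }
        if (evt == "") next

        if (evt == "module_load:") {     # start tracking this task
            cur[task] = $(pos + 1)
            pages[$(pos + 1)] = 0
            next
        }
        if (!(task in cur)) next         # task is not loading a module
        mod = cur[task]

        if (evt == "module_put:") {      # loading finished, report and untrack
            printf "%d pages consumed by \"%s\"\n", pages[mod], mod
            delete cur[task]
            next
        }
        for (j = pos + 1; j <= NF; j++) {
            if ($j ~ /^order=/) {        # an order-N allocation is 2^N pages
                split($j, kv, "=")
                pages[mod] += 2 ^ kv[2]
            }
        }
    }'
}
```

It would be fed the trace file directly, for example: sum_module_pages < "$trace_base/tracing/trace". The whole file is read once by a single process, so no per-line shell work is involved.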

Hi Dave,

What do you think of the following approach?

============== [PATCH 1/2] ================
---
 modules.d/99base/memdebug-ko.sh | 119 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100755 modules.d/99base/memdebug-ko.sh

diff --git a/modules.d/99base/memdebug-ko.sh b/modules.d/99base/memdebug-ko.sh
new file mode 100755
index 0000000..2839966
--- /dev/null
+++ b/modules.d/99base/memdebug-ko.sh
@@ -0,0 +1,119 @@
+# Try to find out kernel modules with large total memory allocation during loading.
+# Large slab allocations fall through to the buddy allocator, thus tracing "mm_page_alloc"
+# alone should be enough for the purpose.
+
+# "/sys/kernel/tracing" takes priority if it exists.
+get_trace_base() {
+    # Trace access through debugfs is obsolete if "/sys/kernel/tracing" is available.
+    if [[ -d "/sys/kernel/tracing" ]]; then
+        echo "/sys/kernel"
+    else
+        echo "/sys/kernel/debug"
+    fi
+}
+
+is_trace_data_prepared() {
+    local trace_base
+
+    trace_base=$(get_trace_base)
+    # old debugfs interface case.
+    if ! [[ -d "$trace_base/tracing" ]]; then
+        mount none -t debugfs "$trace_base"
+    # new tracefs interface case.
+    elif ! [[ -f "$trace_base/tracing/trace" ]]; then
+        mount none -t tracefs "$trace_base/tracing"
+    fi
+
+    if ! [[ -f "$trace_base/tracing/trace" ]]; then
+        echo "WARN: Mount trace failed for kernel module memory analyzing."
+        return 1
+    fi
+
+    MATCH_EVENTS="module:module_put module:module_load kmem:mm_page_alloc"
+    SET_EVENTS=$(echo $(cat $trace_base/tracing/set_event))
+    # Check if trace was properly setup, prepare it if not.
+    if [[ $(cat $trace_base/tracing/tracing_on) != 1 ]] || \
+        [[ "$SET_EVENTS" != "$MATCH_EVENTS" ]]; then
+        # Set our trace events.
+        echo $MATCH_EVENTS > $trace_base/tracing/set_event
+
+        # There are three kinds of known applications for module loading:
+        # "systemd-udevd", "modprobe" and "insmod".
+        # Set them to the mm_page_alloc event filter.
+        # NOTE: Some kernels may not support this filter format; in that case
+        #       the write simply fails, which is harmless.
+        page_alloc_filter="comm == systemd-udevd || comm == modprobe || comm == insmod"
+        echo $page_alloc_filter > $trace_base/tracing/events/kmem/mm_page_alloc/filter
+
+        # Set the number of saved comm-pid mappings if supported.
+        if [[ -f "$trace_base/tracing/saved_cmdlines_size" ]]; then
+            # Thanks to filters, 4096 is big enough (and also well supported).
+            echo 4096 > $trace_base/tracing/saved_cmdlines_size
+        fi
+
+        # Enable and clear trace data for the first time.
+        echo 1 > $trace_base/tracing/tracing_on
+        echo > $trace_base/tracing/trace
+        echo "Prepare trace success."
+        return 1
+    fi
+
+    return 0
+}
+
+parse_trace_data() {
+    local module_name
+    # Indexed by task pid.
+    local -A current_module
+    # Indexed by module name.
+    local -A module_loaded
+    local -A nr_alloc_pages
+
+    cat "$(get_trace_base)/tracing/trace" | while read pid cpu flags ts function
+    do
+        # Skip comment lines
+        if [[ $pid = "#" ]]; then
+            continue
+        fi
+
+        if [[ $function = module_load* ]]; then
+            # One module is being loaded, save the task pid for tracking.
+            module_name=${function#*: }
+            # Strip everything after the first whitespace; module flags may follow.
+            module_name=${module_name%% *}
+            # Mark current_module to track the task.
+            current_module[$pid]="$module_name"
+            [[ ${module_loaded[$module_name]} ]] && echo "WARN: \"$module_name\" was loaded multiple times!"
+            unset module_loaded[$module_name]
+            nr_alloc_pages[$module_name]=0
+            continue
+        fi
+
+        if ! [[ ${current_module[$pid]} ]]; then
+            continue
+        fi
+
+        # Once we get here, the task is being tracked(is loading a module).
+        # Get the module name.
+        module_name=${current_module[$pid]}
+
+        if [[ $function = module_put* ]]; then
+            # Mark the module as loaded when the first module_put event happens after module_load.
+            echo "${nr_alloc_pages[$module_name]} pages consumed by \"$module_name\""
+            module_loaded[$module_name]=1
+            # Module loading finished, so untrack the task.
+            unset current_module[$pid]
+            continue
+        fi
+
+        if [[ $function = mm_page_alloc* ]]; then
+            order=$(echo $function | sed -e 's/.*order=\([0-9]*\) .*/\1/')
+            nr_alloc_pages[$module_name]=$((${nr_alloc_pages[$module_name]}+$((2 ** $order))))
+        fi
+    done
+}
+
+if is_trace_data_prepared ; then
+    echo "showkomem - memory consumption of loading kernel modules(the larger, the more precise)"
+    parse_trace_data
+fi
-- 
1.8.3.1
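
A note on the mm_page_alloc branch above: the $(echo ... | sed ...) substitution forks processes for every allocation event, which may matter on large traces. A minimal sketch of the same extraction using only parameter expansion follows. The helper name pages_from_event is hypothetical, and the sketch assumes the event text contains an "order=N" token delimited by spaces or the end of the line.

```shell
# Hypothetical helper: print the number of pages for one mm_page_alloc
# event string, without spawning sed. Note that "order" is a global here,
# since plain POSIX sh has no "local".
pages_from_event() {
    order=${1##*order=}     # drop everything up to and including "order="
    order=${order%% *}      # drop everything after the first space
    case $order in
        ''|*[!0-9]*) return 1 ;;   # no numeric order found
    esac
    echo $((1 << order))    # an order-N allocation is 2^N pages
}
```

For an event string carrying "order=3" this prints 8; a string without an "order=" token makes the helper return non-zero and print nothing.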

============== [PATCH 2/2] ================
---
 modules.d/98dracut-systemd/dracut-cmdline.sh     |  2 +-
 modules.d/98dracut-systemd/dracut-mount.sh       |  2 +-
 modules.d/98dracut-systemd/dracut-pre-mount.sh   |  2 +-
 modules.d/98dracut-systemd/dracut-pre-pivot.sh   |  2 +-
 modules.d/98dracut-systemd/dracut-pre-trigger.sh |  2 +-
 modules.d/98dracut-systemd/dracut-pre-udev.sh    |  2 +-
 modules.d/99base/dracut-lib.sh                   |  5 ++++-
 modules.d/99base/init.sh                         | 10 +++++-----
 modules.d/99base/module-setup.sh                 |  1 +
 9 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/modules.d/98dracut-systemd/dracut-cmdline.sh b/modules.d/98dracut-systemd/dracut-cmdline.sh
index 6c6ee02..bff9435 100755
--- a/modules.d/98dracut-systemd/dracut-cmdline.sh
+++ b/modules.d/98dracut-systemd/dracut-cmdline.sh
@@ -42,7 +42,7 @@ export root
 export rflags
 export fstype
 
-make_trace_mem "hook cmdline" '1+:mem' '1+:iomem' '3+:slab'
+make_trace_mem "hook cmdline" '1+:mem' '1+:iomem' '3+:slab' '4+:komem'
 # run scriptlets to parse the command line
 getarg 'rd.break=cmdline' -d 'rdbreak=cmdline' && emergency_shell -n cmdline "Break before cmdline"
 source_hook cmdline
diff --git a/modules.d/98dracut-systemd/dracut-mount.sh b/modules.d/98dracut-systemd/dracut-mount.sh
index c4febfe..89ebc31 100755
--- a/modules.d/98dracut-systemd/dracut-mount.sh
+++ b/modules.d/98dracut-systemd/dracut-mount.sh
@@ -7,7 +7,7 @@ type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh
 
 source_conf /etc/conf.d
 
-make_trace_mem "hook mount" '1:shortmem' '2+:mem' '3+:slab'
+make_trace_mem "hook mount" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 
 getarg 'rd.break=mount' -d 'rdbreak=mount' && emergency_shell -n mount "Break mount"
 # mount scripts actually try to mount the root filesystem, and may
diff --git a/modules.d/98dracut-systemd/dracut-pre-mount.sh b/modules.d/98dracut-systemd/dracut-pre-mount.sh
index ae51128..a3b9d29 100755
--- a/modules.d/98dracut-systemd/dracut-pre-mount.sh
+++ b/modules.d/98dracut-systemd/dracut-pre-mount.sh
@@ -8,7 +8,7 @@ type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh
 
 source_conf /etc/conf.d
 
-make_trace_mem "hook pre-mount" '1:shortmem' '2+:mem' '3+:slab'
+make_trace_mem "hook pre-mount" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 # pre pivot scripts are sourced just before we doing cleanup and switch over
 # to the new root.
 getarg 'rd.break=pre-mount' 'rdbreak=pre-mount' && emergency_shell -n pre-mount "Break pre-mount"
diff --git a/modules.d/98dracut-systemd/dracut-pre-pivot.sh b/modules.d/98dracut-systemd/dracut-pre-pivot.sh
index cc70e3c..dfd328c 100755
--- a/modules.d/98dracut-systemd/dracut-pre-pivot.sh
+++ b/modules.d/98dracut-systemd/dracut-pre-pivot.sh
@@ -8,7 +8,7 @@ type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh
 
 source_conf /etc/conf.d
 
-make_trace_mem "hook pre-pivot" '1:shortmem' '2+:mem' '3+:slab'
+make_trace_mem "hook pre-pivot" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 # pre pivot scripts are sourced just before we doing cleanup and switch over
 # to the new root.
 getarg 'rd.break=pre-pivot' 'rdbreak=pre-pivot' && emergency_shell -n pre-pivot "Break pre-pivot"
diff --git a/modules.d/98dracut-systemd/dracut-pre-trigger.sh b/modules.d/98dracut-systemd/dracut-pre-trigger.sh
index ac1ec36..7cd821e 100755
--- a/modules.d/98dracut-systemd/dracut-pre-trigger.sh
+++ b/modules.d/98dracut-systemd/dracut-pre-trigger.sh
@@ -8,7 +8,7 @@ type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh
 
 source_conf /etc/conf.d
 
-make_trace_mem "hook pre-trigger" "1:shortmem" "2+:mem" "3+:slab"
+make_trace_mem "hook pre-trigger" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 
 source_hook pre-trigger
 
diff --git a/modules.d/98dracut-systemd/dracut-pre-udev.sh b/modules.d/98dracut-systemd/dracut-pre-udev.sh
index ca13048..17268a1 100755
--- a/modules.d/98dracut-systemd/dracut-pre-udev.sh
+++ b/modules.d/98dracut-systemd/dracut-pre-udev.sh
@@ -7,7 +7,7 @@ type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh
 
 source_conf /etc/conf.d
 
-make_trace_mem "hook pre-udev" '1:shortmem' '2+:mem' '3+:slab'
+make_trace_mem "hook pre-udev" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 # pre pivot scripts are sourced just before we doing cleanup and switch over
 # to the new root.
 getarg 'rd.break=pre-udev' 'rdbreak=pre-udev' && emergency_shell -n pre-udev "Break pre-udev"
diff --git a/modules.d/99base/dracut-lib.sh b/modules.d/99base/dracut-lib.sh
index 060b3fe..833ed5f 100755
--- a/modules.d/99base/dracut-lib.sh
+++ b/modules.d/99base/dracut-lib.sh
@@ -1206,7 +1206,7 @@ are_lists_eq() {
 
 setmemdebug() {
     if [ -z "$DEBUG_MEM_LEVEL" ]; then
-        export DEBUG_MEM_LEVEL=$(getargnum 0 0 3 rd.memdebug)
+        export DEBUG_MEM_LEVEL=$(getargnum 0 0 4 rd.memdebug)
     fi
 }
 
@@ -1296,6 +1296,9 @@ show_memstats()
         iomem)
             cat /proc/iomem
             ;;
+        komem)
+            showkomem
+            ;;
     esac
 }
 
diff --git a/modules.d/99base/init.sh b/modules.d/99base/init.sh
index a563393..f0195d8 100755
--- a/modules.d/99base/init.sh
+++ b/modules.d/99base/init.sh
@@ -131,7 +131,7 @@ if ! getargbool 1 'rd.hostonly'; then
 fi
 
 # run scriptlets to parse the command line
-make_trace_mem "hook cmdline" '1+:mem' '1+:iomem' '3+:slab'
+make_trace_mem "hook cmdline" '1+:mem' '1+:iomem' '3+:slab' '4+:komem'
 getarg 'rd.break=cmdline' -d 'rdbreak=cmdline' && emergency_shell -n cmdline "Break before cmdline"
 source_hook cmdline
 
@@ -141,7 +141,7 @@ source_hook cmdline
 export root rflags fstype netroot NEWROOT
 
 # pre-udev scripts run before udev starts, and are run only once.
-make_trace_mem "hook pre-udev" '1:shortmem' '2+:mem' '3+:slab'
+make_trace_mem "hook pre-udev" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 getarg 'rd.break=pre-udev' -d 'rdbreak=pre-udev' && emergency_shell -n pre-udev "Break before pre-udev"
 source_hook pre-udev
 
@@ -160,7 +160,7 @@ fi
 
 udevproperty "hookdir=$hookdir"
 
-make_trace_mem "hook pre-trigger" '1:shortmem' '2+:mem' '3+:slab'
+make_trace_mem "hook pre-trigger" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 getarg 'rd.break=pre-trigger' -d 'rdbreak=pre-trigger' && emergency_shell -n pre-trigger "Break before pre-trigger"
 source_hook pre-trigger
 
@@ -230,7 +230,7 @@ unset RDRETRY
 
 # pre-mount happens before we try to mount the root filesystem,
 # and happens once.
-make_trace_mem "hook pre-mount" '1:shortmem' '2+:mem' '3+:slab'
+make_trace_mem "hook pre-mount" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 getarg 'rd.break=pre-mount' -d 'rdbreak=pre-mount' && emergency_shell -n pre-mount "Break pre-mount"
 source_hook pre-mount
 
@@ -266,7 +266,7 @@ done
 
 # pre pivot scripts are sourced just before we doing cleanup and switch over
 # to the new root.
-make_trace_mem "hook pre-pivot" '1:shortmem' '2+:mem' '3+:slab'
+make_trace_mem "hook pre-pivot" '1:shortmem' '2+:mem' '3+:slab' '4+:komem'
 getarg 'rd.break=pre-pivot' -d 'rdbreak=pre-pivot' && emergency_shell -n pre-pivot "Break pre-pivot"
 source_hook pre-pivot
 
diff --git a/modules.d/99base/module-setup.sh b/modules.d/99base/module-setup.sh
index b03772e..13019f0 100755
--- a/modules.d/99base/module-setup.sh
+++ b/modules.d/99base/module-setup.sh
@@ -35,6 +35,7 @@ install() {
     inst_script "$moddir/initqueue.sh" "/sbin/initqueue"
     inst_script "$moddir/loginit.sh" "/sbin/loginit"
     inst_script "$moddir/rdsosreport.sh" "/sbin/rdsosreport"
+    inst_script "$moddir/memdebug-ko.sh" "/sbin/showkomem"
 
     [ -e "${initdir}/lib" ] || mkdir -m 0755 -p ${initdir}/lib
     mkdir -m 0755 -p ${initdir}/lib/dracut
-- 
1.8.3.1
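
For readers unfamiliar with the level specs passed to make_trace_mem above, here is a small sketch of how a spec such as '4+:komem' relates to rd.memdebug. It assumes that "N+" means "level N or higher" and a bare "N" means "exactly level N". The function level_matches is hypothetical and is not the actual dracut-lib.sh code.

```shell
# Hypothetical helper: succeed if a "LEVEL[+]:trace" spec applies to the
# current debug level given as the second argument.
level_matches() {
    spec=${1%%:*}           # "4+" from "4+:komem"
    cur=$2
    case $spec in
        *+) [ "$cur" -ge "${spec%+}" ] ;;   # "N+": level N or higher
        *)  [ "$cur" -eq "$spec" ] ;;       # "N": exactly level N
    esac
}
```

Under these assumptions, level_matches '4+:komem' 4 succeeds while level_matches '4+:komem' 3 fails, which is why setmemdebug has to accept 4 as the new maximum for rd.memdebug.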

