From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sedat Dilek Subject: Re: [next-20130118] Analyzing a call-trace reproducible with pm_test/freezer [ X86|RCU|TTY|EXT4FS|JBD2|PM related? ] Date: Sat, 19 Jan 2013 21:48:33 +0100 Message-ID: References: Reply-To: sedat.dilek@gmail.com Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Cc: LKML , Linux PM List , linux-serial@vger.kernel.org, linux-fsdevel , x86@kernel.org, "Rafael J. Wysocki" , Ilya Zykov , Greg Kroah-Hartman , Jiri Slaby , "Theodore Ts'o" , Thomas Gleixner , Paul McKenney , Paul McKenney , Jan Kara , Stephen Rothwell To: linux-next Return-path: In-Reply-To: Sender: linux-next-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Sat, Jan 19, 2013 at 7:28 PM, Sedat Dilek wrote: > Hi, > > I am still stepping in the dark how track this problem down. > If you have useful (debug) help, you are welcome. > > Regards, > - Sedat - > > ##### ANALYZING CALL-TRACES SEEN IN LINUX-NEXT-20130118 ##### > > ##### INTRO > > ### MY LINUX-SYSTEM > > I am running here Ubuntu/precise AMD64 in a WUBI environment. > The system was formatted with EXT4-FS. > > ### MY LINUX-KERNEL > > My own built Linux-Next (next-20130118) kernel includes some additional patches, > especially a TTY-NEXT fix from Ilya Zykov. > > See also attached kernel-config file, dmesg and lspci outputs. > > ### SOME USEFUL KERNEL DEBUG OPTIONS > > CONFIG_PM_DEBUG=y > CONFIG_EXT4_DEBUG=y > CONFIG_JBD2_DEBUG=y > > > ##### [A] ANALYZING LAST REBOOT > > ### [A.1] MESSAGES > > # Messages #1: DBUS related? > nm-dispatcher.action: Could not get the system bus. Make sure the > message bus daemon is running! Message: Failed to connect to socket > /var/run/dbus/system_bus_socket: No such file or directory > > # Messages #2: RCU_SCHED related? > INFO: rcu_sched detected stalls on CPUs/tasks: { 2 3} (detected by 1, > t=15004 jiffies, g=70928, c=70927, q=15) > INFO: task kworker/0:1:37 blocked for more than 120 seconds. "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > ### [A.2] ANALYZING DBUS PROBLEM > > # ll /var/run/dbus/system_bus_socket* > srwxrwxrwx 1 root root 0 Jan 19 17:08 /var/run/dbus/system_bus_socket= > > # ll /var/run/dbus/ > insgesamt 4 > drwxr-xr-x 2 messagebus messagebus 80 Jan 19 17:08 ./ > drwxr-xr-x 22 root root 800 Jan 19 17:11 ../ > -rw-r--r-- 1 root root 4 Jan 19 17:08 pid > srwxrwxrwx 1 root root 0 Jan 19 17:08 system_bus_socket= > > # cat /var/run/dbus/pid > 862 > > # ps axu | grep -v grep | grep 862 > 102 862 0.2 0.0 25492 2332 ? Ss 17:08 0:01 > dbus-daemon --system --fork --activation=upstart > > # pidof dbus-daemon > 1921 862 > > ### [A.3] ANALYZING RCU_SCHED PROBLEM > > # cat /proc/sys/kernel/hung_task_timeout_secs > 120 > > > ##### [B] TESTCASE DESCRIPTION > > ### [B.1] TESTCASE #1: Do a "normal" suspend and resume > > Select "suspend" (DE: "Bereitschaft") from system's menue within Ubuntu/unity. > > ### [B.2] TESTCASE #2: Run pm_test/freezer (PREREQ: CONFIG_PM_DEBUG=y) > > HELP: > > # cat /sys/power/pm_test > [none] core processors platform devices freezer > > # echo "freezer" > /sys/power/pm_test > # cat /sys/power/pm_test > none core processors platform devices [freezer] > > # echo mem > /sys/power/state && sleep 1 <--- LOOP until you see a call-trace > > ### [B.3] TESTCASE #3: Change "hung_task_timeout_secs" before running > pm_test/freezer > > # cat /proc/sys/kernel/hung_task_timeout_secs > 120 > > # echo 0 > /proc/sys/kernel/hung_task_timeout_secs > # cat /proc/sys/kernel/hung_task_timeout_secs > 0 > > > ##### [C] THE CALL-TRACE > > ### [C.1] TRY TO FORCE THE CALL-TRACE > > The call-trace is not ***always*** reproducible! > Even with "0" for "hung_task_timeout_secs" it is seen (see testcase #3)! > > The call-trace ***always*** happened when the output "-su: echo: write > error: Device or resource busy" was seen! > Unsure, how to intpret it! > What means "-su"? > Write-error to which device? > Which resource is busy? > > # LC_ALL=C echo mem > /sys/power/state && sleep 1 > -su: echo: write error: Device or resource busy <--- THIS OUTPUT! > > [ /var/log/kern.log ] > > n 19 18:00:02 fambox kernel: [ 3065.559184] PM: Syncing filesystems ... done. > Jan 19 18:00:02 fambox kernel: [ 3065.768458] PM: Preparing system for mem sleep > Jan 19 18:00:14 fambox kernel: [ 3072.733178] Freezing user space > processes ... (elapsed 0.01 seconds) done. > Jan 19 18:00:14 fambox kernel: [ 3072.749284] Freezing remaining > freezable tasks ... (elapsed 0.01 seconds) done. > Jan 19 18:00:14 fambox kernel: [ 3072.765274] suspend debug: Waiting > for 5 seconds. > Jan 19 18:00:14 fambox kernel: [ 3077.722222] PM: Finishing wakeup. > Jan 19 18:00:14 fambox kernel: [ 3077.722224] Restarting tasks ... done. > Jan 19 18:00:14 fambox kernel: [ 3077.726719] video LNXVIDEO:00: > Restoring backlight state > Jan 19 18:00:18 fambox kernel: [ 3081.596512] PM: Syncing filesystems ... done. > Jan 19 18:00:18 fambox kernel: [ 3081.750235] PM: Preparing system for mem sleep > Jan 19 18:00:45 fambox kernel: [ 3087.738204] Freezing user space > processes ... (elapsed 0.04 seconds) done. > Jan 19 18:00:45 fambox kernel: [ 3087.786660] Freezing remaining > freezable tasks ... > Jan 19 18:00:45 fambox kernel: [ 3107.788540] Freezing of tasks failed > after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0): > Jan 19 18:00:45 fambox kernel: [ 3107.788664] jbd2/loop0-8 D > ffffffff8180d780 0 299 2 0x00000000 > Jan 19 18:00:45 fambox kernel: [ 3107.788673] ffff88011822db68 > 0000000000000046 ffff88011822db08 ffffffff81044c29 > Jan 19 18:00:45 fambox kernel: [ 3107.788682] ffff880117c6dc80 > ffff88011822dfd8 ffff88011822dfd8 ffff88011822dfd8 > Jan 19 18:00:45 fambox kernel: [ 3107.788689] ffffffff81c15440 > ffff880117c6dc80 ffff88011822db68 ffff88011fa14738 > Jan 19 18:00:45 fambox kernel: [ 3107.788696] Call Trace: > Jan 19 18:00:45 fambox kernel: [ 3107.788715] [] ? > default_spin_lock_flags+0x9/0x10 > Jan 19 18:00:45 fambox kernel: [ 3107.788726] [] ? > __wait_on_buffer+0x30/0x30 > Jan 19 18:00:45 fambox kernel: [ 3107.788735] [] > schedule+0x29/0x70 > Jan 19 18:00:45 fambox kernel: [ 3107.788741] [] > io_schedule+0x8f/0xd0 > Jan 19 18:00:45 fambox kernel: [ 3107.788748] [] > sleep_on_buffer+0xe/0x20 > Jan 19 18:00:45 fambox kernel: [ 3107.788758] [] > __wait_on_bit+0x5f/0x90 > Jan 19 18:00:45 fambox kernel: [ 3107.788765] [] ? > submit_bh+0x121/0x1e0 > Jan 19 18:00:45 fambox kernel: [ 3107.788771] [] ? > __wait_on_buffer+0x30/0x30 > Jan 19 18:00:45 fambox kernel: [ 3107.788779] [] > out_of_line_wait_on_bit+0x7c/0x90 > Jan 19 18:00:45 fambox kernel: [ 3107.788787] [] ? > autoremove_wake_function+0x40/0x40 > Jan 19 18:00:45 fambox kernel: [ 3107.788794] [] > __wait_on_buffer+0x2e/0x30 > Jan 19 18:00:45 fambox kernel: [ 3107.788805] [] > jbd2_journal_commit_transaction+0x18cc/0x1d60 > Jan 19 18:00:45 fambox kernel: [ 3107.788814] [] ? > _raw_spin_lock_irqsave+0x2e/0x40 > Jan 19 18:00:45 fambox kernel: [ 3107.788823] [] ? > try_to_del_timer_sync+0x4f/0x70 > Jan 19 18:00:45 fambox kernel: [ 3107.788830] [] > kjournald2+0xd6/0x3e0 > Jan 19 18:00:45 fambox kernel: [ 3107.788835] [] ? > add_wait_queue+0x60/0x60 > Jan 19 18:00:45 fambox kernel: [ 3107.788841] [] ? > commit_timeout+0x10/0x10 > Jan 19 18:00:45 fambox kernel: [ 3107.788846] [] > kthread+0xc0/0xd0 > Jan 19 18:00:45 fambox kernel: [ 3107.788852] [] ? > flush_kthread_worker+0xb0/0xb0 > Jan 19 18:00:45 fambox kernel: [ 3107.788859] [] > ret_from_fork+0x7c/0xb0 > Jan 19 18:00:45 fambox kernel: [ 3107.788865] [] ? > flush_kthread_worker+0xb0/0xb0 > Jan 19 18:00:45 fambox kernel: [ 3107.788929] > Jan 19 18:00:45 fambox kernel: [ 3107.788932] Restarting kernel > threads ... done. > Jan 19 18:00:45 fambox kernel: [ 3107.789086] Restarting tasks ... done. > Jan 19 18:00:45 fambox kernel: [ 3107.802117] video LNXVIDEO:00: > Restoring backlight state > > ### [C.2] CHECKING DBUS AND RCU_SCHED STATUS POST-MORTEM > > # ll /var/run/dbus/ > insgesamt 4 > drwxr-xr-x 2 messagebus messagebus 80 Jan 19 17:08 ./ > drwxr-xr-x 22 root root 800 Jan 19 17:11 ../ > -rw-r--r-- 1 root root 4 Jan 19 17:08 pid > srwxrwxrwx 1 root root 0 Jan 19 17:08 system_bus_socket= > > # cat /var/run/dbus/pid > 862 > > root@fambox:~# ps axu | grep -v grep | grep 862 > 102 862 0.0 0.0 25492 2332 ? Ss 17:08 0:01 > dbus-daemon --system --fork --activation=upstart > wearefam 1931 0.0 0.5 862752 20116 ? Sl 17:09 0:02 > /usr/lib/gnome-settings-daemon/gnome-settings-daemon > > # pidof dbus-daemon > 1921 862 > > # cat /proc/sys/kernel/hung_task_timeout_secs > 0 > > > ##### [D] INTERIM SUMMARY > > 1. The issue is reproducible! > 2. The issue ***can*** be forced by running pm_test/freezer! > > Again, the root cause is not very clear to me! > > Recent TTY-NEXT changes? > Problems in EXT4-FS/JBD2? > Problems in X86/RCU? > > Ask the experts :-)! > > > -dileks // 19-Jan-2013 I have a applied another pending patch ("tty: Add driver unthrottle in ioctl(...,TCFLSH,..).") from tty-next. Now, the call-trace looks like this: [ 433.527986] PM: Syncing filesystems ... done. [ 433.843761] PM: Preparing system for mem sleep [ 436.306002] Freezing user space processes ... [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0): [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004 [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8 0000000300000001 [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8 ffff88007b9dffd8 [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28 ffff880118077800 [ 456.305076] Call Trace: [ 456.305085] [] schedule+0x29/0x70 [ 456.305089] [] jbd2_log_wait_commit+0xcd/0x1a0 [ 456.305094] [] ? add_wait_queue+0x60/0x60 [ 456.305098] [] ext4_sync_file+0x205/0x380 [ 456.305103] [] do_fsync+0x5d/0x90 [ 456.305108] [] ? sys_write+0x6b/0xa0 [ 456.305111] [] sys_fsync+0x10/0x20 [ 456.305114] [] system_call_fastpath+0x1a/0x1f [ 456.305122] [ 456.305124] Restarting tasks ... done. [ 456.315056] video LNXVIDEO:00: Restoring backlight state JBD2/EX4FS problem? - Sedat - [1] http://git.kernel.org/?p=linux/kernel/git/gregkh/tty.git;a=patch;h=a1bf9584429d61b7096f93ae09325e1ba538e9e8