From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <499BF9BB.3090700@domain.hid> Date: Wed, 18 Feb 2009 13:06:19 +0100 From: Jan Kiszka MIME-Version: 1.0 References: <499B36C5.3060104@domain.hid> <499B3F35.2080604@domain.hid> <499B5136.2030107@domain.hid> <499B55CF.3020005@domain.hid> <499BBF88.9000207@domain.hid> <499BE2E7.4090704@domain.hid> In-Reply-To: <499BE2E7.4090704@domain.hid> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-help] BUG fs/buffer.c with Linux 2.6.26,27 List-Id: Help regarding installation and common use of Xenomai List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Gilles Chanteperdrix Cc: xenomai@xenomai.org Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Gilles Chanteperdrix wrote: >>>> Jan Kiszka wrote: >>>>> Roman Pisl wrote: >>>>>> Hello, >>>>>> I'm porting our application to Xenomai and I found a bug with Linux >>>>>> 2.6.26 and 2.6.27 (and also .19) and Xenomai 2.4.6.1. This happens when >>>>>> the application is running and there is hard disk activity. It happens >>>>>> also when latency test is running for a while. Linux 2.6.16 with Xenomai >>>>>> 2.4.6.1 works without problem. So this is solution for now. >>>>>> >>>>>> Tested on x86 - thinkpad laptop and pentium 4 desktop. >>>>>> >>>>>> I'm not a kernel hacker but surely try to help with fixing. I'd like to >>>>>> see this fixed. >>>>> /me, too. But could you first of all check if the problem persists even >>>>> with latest adeos-ipipe-2.6.27.13-x86-2.2-05.patch [1]? >>>>> >>>>> If it does: Go back to 2.6.26 (in order to have the ipipe tracer). >>>>> Before the oops can occur, extend /proc/ipipe/trace/back_trace_points >>>>> to, say, one thousand ("echo 1000 > /proc/..."). Then post the resulting >>>>> trace. It may include some hints about when Linux IRQs were last enabled >>>>> and when they became stuck at disabled. Also, please attach your .config. >>>> Other traces show that it happens in __ipipe_unstall_iret_root. >>> Here is the other trace. >>> >>> >>> / # /etc/ini/ # /etc/init./ # /etc/init.d/rc/ # /etc/init.d/rcS >>> model : 9 >>> Geode GX1 Found >>> EXT3-fs warning: maximal mount count reached, running e2fsck is recommended >>> EXT3 FS on hdc1, internal journal >>> kjournald starting. Commit interval 5 seconds >>> EXT3-fs warning: maximal mount count reached, running e2fsck is recommended >>> EXT3 FS on hda1, internal journal >>> EXT3-fs: recovery complete. >>> EXT3-fs: mounted filesystem with ordered data mode. >>> modprobe: module ohci-hcd not found >>> modprobe: failed to load module ohci-hcd >>> modprobe: module sg not found >>> modprobe: failed to load module sg >>> modprobe: module usb-storage not found >>> modprobe: failed to load module usb-storage >>> I-pipe: Domain Xenomai registered. >>> Xenomai: hal/i386 started. >>> Xenomai: real-time nucleus v2.4.6 (Bamboo) loaded. >>> Xenomai: starting native API services. >>> Generic RTC Driver v1.07 >>> eepro100.c:v1.09j-t 9/29/99 Donald Becker >>> eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin and others >>> eepro100 0000:00:0b.0: found PCI INT A -> IRQ 11 >>> eepro100 0000:00:0b.0: IRQ routing conflict: have IRQ 10, want IRQ 11 >>> eth0: OEM i82557/i82558 10/100 Ethernet, 00:0b:ab:12:0d:69, IRQ 10. >>> Board assembly 727095-004, Physical connectors present: RJ45 >>> Primary interface chip i82555 PHY #1. >>> General self-test: passed. >>> Serial sub-system self-test: passed. >>> Internal registers self-test: passed. >>> ROM checksum self-test: passed (0xd0a6c714). >>> st_dacsnd: module license 'Proprietary' taints kernel. >>> ticks are 125000 >>> st audio device registered v3 >>> / # NET: Registered protocol family 1 >>> NET: Registered protocol family 10 >>> ADDRCONF(NETDEV_UP): eth0: link is not ready >>> >>> / # ./ sh ./xe/ # sh ./xenomai_trace.sh >>> / # cd xo/ # cd xoft/ >>> /xoft # ./xe/xoft # ./xenoxoft >>> stAudio >>> initializing xenoxoft instance >>> Xenomai 1.40c Jan 30 2009 12:27:55 >>> xenoxoft initialized! >>> maxTreatmentFractions = 10 >>> maxTreatmentMinutes = 170 >>> maxCalibrationMinutes = 170 >>> maxDays = 35 >>> xoft input running >>> ------------[ cut here ]------------ >>> kernel BUG at fs/buffer.c:1289! >>> invalid opcode: 0000 [#1] PREEMPT >>> Modules linked in: ipv6 unix st_dacsnd(P) soundcore eepro100 mii genrtc xeno_native xeno_nucleus vfat fat >>> >>> Pid: 913, comm: rtlogd Tainted: P (2.6.27.11-no3dnow-serial-xenomai #19) >>> EIP: 0060:[] EFLAGS: 00010046 CPU: 0 >>> EIP is at __find_get_block+0x2d/0x198 >>> EAX: 00000000 EBX: 00017e83 ECX: 00000400 EDX: 00060003 >>> ESI: 00000400 EDI: 00060003 EBP: cfb79d38 ESP: cfb79d04 >>> DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 >>> Process rtlogd (pid: 913, ti=cfb78000 task=cf998700 task.ti=cfb78000)<0> >>> I-pipe domain Linux >>> Stack: cf4101a0 00000000 00003ce8 00000082 c04ec200 00017e83 cf953600 cfeffe00 >>> cfb79d5c c010b4eb 00017e83 cf953600 00000400 cfb79d5c c017a3d8 00060003 >>> cf4101a0 00000000 cf951200 00017e83 cf953600 cfeffe00 cfb79d98 c0197094 >>> Call Trace: >>> [] ? mcount+0x1f/0x23 >>> [] ? __getblk+0x1b/0x1d8 >>> [] ? __ext3_get_inode_loc+0xce/0x282 >>> [] ? ext3_get_inode_loc+0x8/0x1b >>> [] ? ext3_get_inode_loc+0x19/0x1b >>> [] ? ext3_reserve_inode_write+0x25/0x6d >>> [] ? ext3_mark_inode_dirty+0x2a/0x41 >>> [] ? ext3_dirty_inode+0x57/0x6b >>> [] ? __mark_inode_dirty+0x2b/0x14d >>> [] ? file_update_time+0x9e/0xc6 >>> [] ? __generic_file_aio_write_nolock+0x2f9/0x42c >>> [] ? ipipe_check_context+0x9/0x94 >>> [] ? generic_file_aio_write+0x5c/0xb7 >>> [] ? ext3_file_write+0x1f/0x8e >>> [] ? do_sync_write+0xbf/0xfd >>> [] ? losyscall_event+0xe/0x158 [xeno_nucleus] >>> [] ? autoremove_wake_function+0x0/0x35 >>> [] ? mcount+0x1f/0x23 >>> [] ? do_sync_write+0x0/0xfd >>> [] ? vfs_write+0x91/0x113 >>> [] ? sys_write+0x40/0x65 >>> [] ? syscall_call+0x7/0xb >>> ======================= >>> Code: e5 57 56 53 83 ec 28 e8 99 12 f9 ff 89 45 cc 89 d7 89 ce 0f ba 25 0c e6 31 c0 00 19 c0 83 f8 01 19 c0 25 00 02 00 00 85 c0 75 04 <0f> 0b eb fe b8 01 00 00 00 e8 e8 5d f9 ff 31 c9 8b 1c 8d f4 4b >>> EIP: [] __find_get_block+0x2d/0x198 SS:ESP 0068:cfb79d04 >>> I-pipe tracer log (10000 points): >>> | #func 0 ipipe_trace_panic_freeze+0x9 (oops_enter+0xd) >>> | #func -1 oops_enter+0x8 (oops_begin+0xe) >>> | #func -3 oops_begin+0x9 (die+0x19) >>> | #func -5 die+0xe (do_trap+0x8e) >>> | #func -8 ipipe_check_context+0x9 (sub_preempt_count+0x15) >>> | #func -10 sub_preempt_count+0x9 (search_module_extables+0x64) >>> | #func -13 search_extable+0x9 (search_module_extables+0x3f) >>> | #func -16 search_extable+0x9 (search_module_extables+0x3f) >>> | #func -23 add_preempt_count+0x8 (search_module_extables+0x17) >>> | #func -24 search_module_extables+0xb (search_exception_tables+0x27) >>> | #func -28 search_extable+0x9 (search_exception_tables+0x1c) >>> | #func -30 search_exception_tables+0x9 (fixup_exception+0x13) >>> | #func -32 fixup_exception+0x9 (do_trap+0x6f) >>> | #func -34 do_trap+0xe (do_invalid_op+0x71) >>> | #func -37 ipipe_check_context+0x9 (sub_preempt_count+0x15) >>> | #func -38 sub_preempt_count+0x9 (__atomic_notifier_call_chain+0x38) >>> | #func -40 notifier_call_chain+0xe (__atomic_notifier_call_chain+0x2c) >>> | #func -43 add_preempt_count+0x8 (__atomic_notifier_call_chain+0x1b) >>> | #func -44 __atomic_notifier_call_chain+0xb (atomic_notifier_call_chain+0x11) >>> | #func -46 atomic_notifier_call_chain+0x8 (notify_die+0x32) >>> | #func -48 notify_die+0xb (do_invalid_op+0x48) >>> | #func -51 do_invalid_op+0xd (__ipipe_handle_exception+0x128) >>> | #func -57 __ipipe_handle_exception+0xe (error_code+0x6f) >>> | #func -62 __find_get_block+0xe (__getblk+0x1b) >>> | #func -64 __getblk+0xe (__ext3_get_inode_loc+0xce) >>> | #func -71 ext3_get_group_desc+0xe (__ext3_get_inode_loc+0x67) >>> | #func -76 __ext3_get_inode_loc+0xe (ext3_get_inode_loc+0x19) >>> | #func -78 ext3_get_inode_loc+0x8 (ext3_reserve_inode_write+0x25) >>> | #func -79 ext3_reserve_inode_write+0xe (ext3_mark_inode_dirty+0x2a) >>> | #func -81 ipipe_check_context+0x9 (ext3_mark_inode_dirty+0x1c) >>> | #func -82 ext3_mark_inode_dirty+0xe (ext3_dirty_inode+0x57) >>> | #func -85 ipipe_check_context+0x9 (sub_preempt_count+0x15) >>> | #func -86 sub_preempt_count+0x9 (start_this_handle+0x33c) >>> | #func -88 ipipe_check_context+0x9 (sub_preempt_count+0x15) >>> | #func -90 sub_preempt_count+0x9 (start_this_handle+0x320) >>> | #func -92 __log_space_left+0x8 (start_this_handle+0x2cd) >>> | #func -95 add_preempt_count+0x8 (start_this_handle+0x230) >>> | #func -98 add_preempt_count+0x8 (start_this_handle+0x93) >>> | #func -101 start_this_handle+0xe (journal_start+0x86) >>> | #func -104 __ipipe_restore_root+0x8 (kmem_cache_alloc+0x91) >>> | #func -110 ipipe_check_context+0x9 (kmem_cache_alloc+0x23) >>> | #func -111 kmem_cache_alloc+0xe (journal_start+0x53) >>> | #func -115 journal_start+0xe (ext3_journal_start_sb+0x45) >>> | #func -118 ext3_journal_start_sb+0x8 (ext3_dirty_inode+0x28) >>> | #func -120 ext3_dirty_inode+0xb (__mark_inode_dirty+0x2b) >>> | #func -124 __mark_inode_dirty+0xe (file_update_time+0x9e) >>> | #func -128 timespec_trunc+0xe (current_fs_time+0x1b) >>> | #func -131 current_kernel_time+0xa (current_fs_time+0x10) >>> | #func -133 current_fs_time+0x9 (file_update_time+0x3a) >>> | #func -135 ipipe_check_context+0x9 (sub_preempt_count+0x15) >>> | #func -137 sub_preempt_count+0x9 (mnt_want_write+0x88) >>> | #func -138 ipipe_check_context+0x9 (sub_preempt_count+0x15) >>> | #func -140 sub_preempt_count+0x9 (mnt_want_write+0x6c) >>> | #func -143 __mnt_is_readonly+0x8 (mnt_want_write+0x2c) >>> | #func -145 add_preempt_count+0x8 (mnt_want_write+0x20) >>> | #func -146 add_preempt_count+0x8 (mnt_want_write+0x16) >>> | #func -148 mnt_want_write+0xa (file_update_time+0x27) >>> | #func -150 file_update_time+0xa (__generic_file_aio_write_nolock+0x2f9) >>> | #func -153 cap_inode_need_killpriv+0x8 (file_remove_suid+0x20) >>> | #func -156 should_remove_suid+0x9 (file_remove_suid+0x17) >>> | #func -158 file_remove_suid+0xd (__generic_file_aio_write_nolock+0x2e6) >>> | #func -161 ipipe_check_context+0x9 (sub_preempt_count+0x15) >>> | #func -163 sub_preempt_count+0x9 (__generic_file_aio_write_nolock+0x141) >>> | #func -164 add_preempt_count+0x8 (__generic_file_aio_write_nolock+0x12b) >>> | #func -172 generic_segment_checks+0xe (__generic_file_aio_write_nolock+0x3c) >>> | #func -175 __generic_file_aio_write_nolock+0xe (generic_file_aio_write+0x5c) >>> | #func -179 ipipe_check_context+0x9 (mutex_lock+0x18) >>> | #func -180 mutex_lock+0xc (generic_file_aio_write+0x4b) >>> | #func -184 generic_file_aio_write+0xe (ext3_file_write+0x1f) >>> | #func -187 ext3_file_write+0xb (do_sync_write+0xbf) >>> | #func -193 do_sync_write+0x11 (vfs_write+0x91) >>> | #func -200 rw_verify_area+0xe (vfs_write+0x6f) >>> | #func -205 vfs_write+0xe (sys_write+0x40) >>> | #func -208 ipipe_check_context+0x9 (sub_preempt_count+0x15) >>> | #func -209 sub_preempt_count+0x9 (fget_light+0x8f) >>> | #func -214 add_preempt_count+0x8 (fget_light+0x46) >>> | #func -217 fget_light+0xb (sys_write+0x1d) >>> | #func -219 sys_write+0xd (syscall_call+0x7) >>> | #func -223 losyscall_event+0xe [xeno_nucleus] (__ipipe_dispatch_event+0xbc) >>> | +*func -228 hisyscall_event+0xe [xeno_nucleus] (__ipipe_dispatch_event+0xbc) >>> | #func -233 __ipipe_dispatch_event+0xe (__ipipe_syscall_root+0x74) >>> | #func -239 __ipipe_syscall_root+0xa (system_call+0x29) >>> | #end 0x8000000d -503 __ipipe_unstall_iret_root+0x60 (restore_nocheck_notrace+0x0) >>> | #begin 0x80000000 -506 __ipipe_unstall_iret_root+0x1e (restore_nocheck_notrace+0x0) >>> #func -507 __ipipe_unstall_iret_root+0x9 (restore_nocheck_notrace+0x0) >> Mmmh, something may have messed up the argument (register file) that >> __ipipe_unstall_iret_root sees so that regs.flags & X86_EFLAGS_IF >> evaluates to 0. > > Yes, as far as I understand the code, the flags have been changed in > ipipe_syscall_root by the call to fixup_if. The trouble is that the task > migrated right before the execution of the syscall, so the call to > ipipe_syscall_root is farther in the trace, and I can not seem to find it. > I first thought I found it @-3170, but that's a different task. However, let's assumed we entered __ipipe_syscall_root with root domain stalled. If we then return from __ipipe_dispatch_event with 0 (=> forward this syscall to Linux), we would not call __fixup_if again so that the stalled state is kept. Is this a valid scenario for the given task, or would this be broken already? At least it looks like the path taken here Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux