* [Xenomai] "inconsistent lock state" on boot-up
@ 2014-11-09 10:07 Stoidner, Christoph
2014-11-09 15:53 ` Gilles Chanteperdrix
0 siblings, 1 reply; 47+ messages in thread
From: Stoidner, Christoph @ 2014-11-09 10:07 UTC (permalink / raw)
To: xenomai@xenomai.org
Hi all,
I am using linux 3.10.32 and ipipe-core-3.10.32-arm-4.patch on a Freescale i.MX28.
When booting the kernel the message "inconsistent lock state" is given (see below). Does anyone have an idea why this happens? With kernel 3.10.18 and the corresponding ipipe patch it is the same. With linux 3.4.6 and ipipe 3.4.6-arm-4 the message does not appear.
I am very interested to understand whether this message could lead to any problems, since I have unpredictable crashes of my xenomai-based application program (e.g. with "segmentation fault" or "scheduling while atomic" messages).
Here is the kernel output of 3.10.32 and ipipe-core-3.10.32-arm-4.patch:
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 3.10.32-ipipe (stch@Kubuntu-Default) (gcc version 4.6.2 (arvero ARM tools 2013-08-05) ) #3 PREEMPT Thu Nov 6 14:54:08 CET 2014
[ 0.000000] CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
[ 0.000000] CPU: VIVT data cache, VIVT instruction cache
[ 0.000000] Machine: Freescale MXS (Device Tree), model: Viessmann Vitocom 100
[ 0.000000] Memory policy: ECC disabled, Data cache writeback
[ 0.000000] On node 0 totalpages: 32768
[ 0.000000] free_area_init_node: node 0, pgdat c06fa4e8, node_mem_map c0ca5000
[ 0.000000] Normal zone: 256 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 32768 pages, LIFO batch:7
[ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512
[ 0.000000] Kernel command line: fec.macaddr=0x00,0xD0,0x93,0x2A,0xCC,0x40 ip=192.168.200.20:192.168.200.125:192.168.1.254:255.255.255.0::eth0:off root=/dev/nfs rw nfsroot=192.168.200.125:/srv/nfs/rtos-linux-rootfs,v3,tcp rw console=ttyAMA0,115200
[ 0.000000] PID hash table entries: 512 (order: -1, 2048 bytes)
[ 0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
[ 0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
[ 0.000000] Memory: 128MB = 128MB total
[ 0.000000] Memory: 116928k/116928k available, 14144k reserved, 0K highmem
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB)
[ 0.000000] fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)
[ 0.000000] vmalloc : 0xc8800000 - 0xff000000 ( 872 MB)
[ 0.000000] lowmem : 0xc0000000 - 0xc8000000 ( 128 MB)
[ 0.000000] modules : 0xbf000000 - 0xc0000000 ( 16 MB)
[ 0.000000] .text : 0xc0008000 - 0xc067dce4 (6616 kB)
[ 0.000000] .init : 0xc067e000 - 0xc06b4588 ( 218 kB)
[ 0.000000] .data : 0xc06b6000 - 0xc06fed30 ( 292 kB)
[ 0.000000] .bss : 0xc06fed30 - 0xc0c9ff80 (5765 kB)
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] NR_IRQS:16 nr_irqs:16 16
[ 0.000000] of_irq_init: children remain, but no parents
[ 0.000000] I-pipe, 24.000 MHz clocksource
[ 0.000000] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956ms
[ 0.000000] I-pipe: ARM926EJ-S detected, disabling wfi instruction in idle loop
[ 0.000000] Interrupt pipeline (release #4)
[ 0.000000] Console: colour dummy device 80x30
[ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.000000] ... MAX_LOCK_DEPTH: 48
[ 0.000000] ... MAX_LOCKDEP_KEYS: 8191
[ 0.000000] ... CLASSHASH_SIZE: 4096
[ 0.000000] ... MAX_LOCKDEP_ENTRIES: 16384
[ 0.000000] ... MAX_LOCKDEP_CHAINS: 32768
[ 0.000000] ... CHAINHASH_SIZE: 16384
[ 0.000000] memory used by lock dependency info: 3695 kB
[ 0.000000] per task-struct memory footprint: 1152 bytes
[ 0.002748] Calibrating delay loop... 226.09 BogoMIPS (lpj=1130496)
[ 0.071047] pid_max: default: 32768 minimum: 301
[ 0.071902] Mount-cache hash table entries: 512
[ 0.079998] CPU: Testing write buffer coherency: ok
[ 0.083826] Setting up static identity map for 0xc04acc20 - 0xc04acc78
[ 0.091221]
[ 0.091279] =================================
[ 0.091308] [ INFO: inconsistent lock state ]
[ 0.091344] 3.10.32-ipipe #3 Not tainted
[ 0.091366] ---------------------------------
[ 0.091392] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 0.091427] kthreadd/9 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 0.091452] (std_spinlock_raw(&rq->lock)){?.....}, at: [<c0049608>] try_to_wake_up+0x80/0x12c
[ 0.091572] {IN-HARDIRQ-W} state was registered at:
[ 0.091599] [<c005eb40>] __lock_acquire+0xac8/0x1aec
[ 0.091657] [<c006017c>] lock_acquire+0xc0/0x184
[ 0.091702] [<c04abf18>] _raw_spin_lock+0x40/0x50
[ 0.091764] [<c004a1a8>] scheduler_tick+0x20/0xdc
[ 0.091815] [<c002c4ec>] update_process_times+0x58/0x68
[ 0.091867] [<c0058e8c>] tick_handle_periodic+0x18/0x8c
[ 0.091909] [<c03ae6f4>] mxs_timer_interrupt+0x34/0x40
[ 0.091960] [<c006ccc8>] handle_irq_event_percpu+0x68/0x314
[ 0.092013] [<c006cfb0>] handle_irq_event+0x3c/0x5c
[ 0.092056] [<c006fb78>] handle_level_irq+0x6c/0xcc
[ 0.092106] [<c006c5b4>] generic_handle_irq+0x20/0x30
[ 0.092148] [<c000f9a0>] handle_IRQ+0x30/0x84
[ 0.092203] [<c0077c14>] __ipipe_do_sync_stage+0x2b0/0x2fc
[ 0.092259] [<c00085bc>] __ipipe_grab_irq+0x2c/0x70
[ 0.092301] [<c000e624>] __irq_svc+0x44/0x70
[ 0.092342] [<c06a8f54>] calibrate_delay+0x360/0x4e8
[ 0.092401] [<c067e95c>] start_kernel+0x25c/0x2f0
[ 0.092457] [<40008040>] 0x40008040
[ 0.092498] irq event stamp: 6
[ 0.092523] hardirqs last enabled at (5): [<c0060d84>] debug_check_no_locks_freed+0xd8/0x170
[ 0.092576] hardirqs last disabled at (6): [<c04ac000>] _raw_spin_lock_irqsave+0x34/0x80
[ 0.092636] softirqs last enabled at (0): [<c001b454>] copy_process+0x2a4/0x107c
[ 0.092701] softirqs last disabled at (0): [< (null)>] (null)
[ 0.092733]
[ 0.092733] other info that might help us debug this:
[ 0.092761] Possible unsafe locking scenario:
[ 0.092761]
[ 0.092786] CPU0
[ 0.092804] ----
[ 0.092821] lock(std_spinlock_raw(&rq->lock));
[ 0.092858] <Interrupt>
[ 0.092876] lock(std_spinlock_raw(&rq->lock));
[ 0.092913]
[ 0.092913] *** DEADLOCK ***
[ 0.092913]
[ 0.092950] 3 locks held by kthreadd/9:
[ 0.092969] #0: (&x->wait){+.....}, at: [<c00489c0>] complete+0x1c/0x5c
[ 0.093069] #1: (std_spinlock_raw(&p->pi_lock)){+.....}, at: [<c00495a8>] try_to_wake_up+0x20/0x12c
[ 0.093165] #2: (std_spinlock_raw(&rq->lock)){?.....}, at: [<c0049608>] try_to_wake_up+0x80/0x12c
[ 0.093260]
[ 0.093260] stack backtrace:
[ 0.093306] CPU: 0 PID: 9 Comm: kthreadd Not tainted 3.10.32-ipipe #3
[ 0.093386] [<c0013ed8>] (unwind_backtrace+0x0/0xf0) from [<c0011c10>] (show_stack+0x10/0x14)
[ 0.093457] [<c0011c10>] (show_stack+0x10/0x14) from [<c04a50a0>] (print_usage_bug.part.28+0x218/0x280)
[ 0.093523] [<c04a50a0>] (print_usage_bug.part.28+0x218/0x280) from [<c005df38>] (mark_lock+0x528/0x668)
[ 0.093585] [<c005df38>] (mark_lock+0x528/0x668) from [<c0060a38>] (mark_held_locks+0x9c/0x120)
[ 0.093646] [<c0060a38>] (mark_held_locks+0x9c/0x120) from [<c0060b70>] (trace_hardirqs_on_caller+0xb4/0x1e0)
[ 0.093707] [<c0060b70>] (trace_hardirqs_on_caller+0xb4/0x1e0) from [<c000e654>] (__ipipe_fast_svc_irq_exit+0x4/0x10)
[ 0.093771] [<c000e654>] (__ipipe_fast_svc_irq_exit+0x4/0x10) from [<c0049028>] (update_rq_clock+0x48/0x58)
[ 0.093832] [<c0049028>] (update_rq_clock+0x48/0x58) from [<c00490fc>] (enqueue_task+0x18/0x70)
[ 0.093893] [<c00490fc>] (enqueue_task+0x18/0x70) from [<c0049618>] (try_to_wake_up+0x90/0x12c)
[ 0.093952] [<c0049618>] (try_to_wake_up+0x90/0x12c) from [<c0046e3c>] (__wake_up_common+0x54/0x94)
[ 0.094012] [<c0046e3c>] (__wake_up_common+0x54/0x94) from [<c00489ec>] (complete+0x48/0x5c)
[ 0.094071] [<c00489ec>] (complete+0x48/0x5c) from [<c003ef78>] (kthread+0x7c/0xb0)
[ 0.094132] [<c003ef78>] (kthread+0x7c/0xb0) from [<c000eb34>] (ret_from_fork+0x18/0x24)
[ 0.097934] devtmpfs: initialized
[ 0.102894] pinctrl core: initialized pinctrl subsystem
[ 0.104625] regulator-dummy: no parameters
[ 0.105613] NET: Registered protocol family 16
[ 0.107080] DMA: preallocated 256 KiB pool for atomic coherent allocations
[ 0.122886] gpiochip_add: registered GPIOs 0 to 31 on device: gpio.0
[ 0.125709] gpiochip_add: registered GPIOs 32 to 63 on device: gpio.1
[ 0.128234] gpiochip_add: registered GPIOs 64 to 95 on device: gpio.2
[ 0.130826] gpiochip_add: registered GPIOs 96 to 127 on device: gpio.3
[ 0.133570] gpiochip_add: registered GPIOs 128 to 159 on device: gpio.4
[ 0.150654] Serial: AMBA PL011 UART driver
[ 0.151544] 80074000.serial: ttyAMA0 at MMIO 0x80074000 (irq = 225) is a PL011 rev2
[ 0.898747] console [ttyAMA0] enabled
[ 0.930718] bio: create slab <bio-0> at 0
[ 0.943158] mxs-dma 80004000.dma-apbh: initialized
[ 0.954720] mxs-dma 80024000.dma-apbx: initialized
[ 0.960462] of_get_named_gpio_flags exited with status 124
[ 0.966563] vddio-sd0: 3300 mV
[ 0.970520] of_get_named_gpio_flags: can't parse gpios property
[ 0.977004] 3P3V: 3300 mV
[ 0.980476] of_get_named_gpio_flags exited with status 125
[ 0.986546] fec-3v3: 3300 mV
[ 0.990375] of_get_named_gpio_flags exited with status 122
[ 0.996481] usb0_vbus: 5000 mV
[ 1.001710] SCSI subsystem initialized
[ 1.006680] usbcore: registered new interface driver usbfs
[ 1.012713] usbcore: registered new interface driver hub
[ 1.018681] usbcore: registered new device driver usb
[ 1.025394] pps_core: LinuxPPS API ver. 1 registered
[ 1.030404] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 1.039859] PTP clock support registered
[ 1.045411] Advanced Linux Sound Architecture Driver Initialized.
[ 1.054090] Switching to clocksource ipipe_tsc
[ 1.221815] NET: Registered protocol family 2
[ 1.228393] TCP established hash table entries: 1024 (order: 1, 8192 bytes)
[ 1.236257] TCP bind hash table entries: 1024 (order: 3, 36864 bytes)
[ 1.243306] TCP: Hash tables configured (established 1024 bind 1024)
[ 1.250050] TCP: reno registered
[ 1.253378] UDP hash table entries: 256 (order: 2, 20480 bytes)
[ 1.259726] UDP-Lite hash table entries: 256 (order: 2, 20480 bytes)
[ 1.267635] NET: Registered protocol family 1
[ 1.273650] RPC: Registered named UNIX socket transport module.
[ 1.279881] RPC: Registered udp transport module.
[ 1.284637] RPC: Registered tcp transport module.
[ 1.289495] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 1.296613] NetWinder Floating Point Emulator V0.97 (double precision)
[ 1.307586] I-pipe: head domain Xenomai registered.
[ 1.312699] Xenomai: hal/arm started.
[ 1.316959] Xenomai: scheduling class idle registered.
[ 1.322159] Xenomai: scheduling class rt registered.
[ 1.349552] Xenomai: real-time nucleus v2.6.3 (Lies and Truths) loaded.
[ 1.356216] Xenomai: debug mode enabled.
[ 1.361259] Xenomai: starting @CHIP-RTOS services.
[ 1.366196] Xenomai: starting native API services.
[ 1.371211] Xenomai: starting POSIX services.
[ 1.376029] Xenomai: starting RTDM services.
[ 1.436475] NFS: Registering the id_resolver key type
[ 1.442015] Key type id_resolver registered
[ 1.446266] Key type id_legacy registered
[ 1.450607] jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
[ 1.459635] msgmni has been set to 228
[ 1.470890] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)
[ 1.478351] io scheduler noop registered (default)
[ 1.486256] of_dma_request_slave_channel: dma-names property missing or empty
[ 1.493691] uart-pl011 80074000.serial: no DMA platform data
[ 1.500737] 80072000.serial: ttyAPP4 at MMIO 0x80072000 (irq = 224) is a 80072000.serial
[ 1.510672] mxs-auart 80072000.serial: Found APPUART 3.1.0
[ 1.548255] of_get_named_gpio_flags exited with status 17
[ 1.668496] libphy: fec_enet_mii_bus: probed
[ 1.675807] usbcore: registered new interface driver asix
[ 1.681869] usbcore: registered new interface driver ax88179_178a
[ 1.688302] usbcore: registered new interface driver cdc_ether
[ 1.694882] usbcore: registered new interface driver smsc95xx
[ 1.701232] usbcore: registered new interface driver net1080
[ 1.707224] usbcore: registered new interface driver cdc_subset
[ 1.713634] usbcore: registered new interface driver zaurus
[ 1.720035] usbcore: registered new interface driver cdc_ncm
[ 1.725754] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 1.732963] usbcore: registered new interface driver usb-storage
[ 1.742273] ci_hdrc ci_hdrc.0: doesn't support gadget
[ 1.747423] ci_hdrc ci_hdrc.0: EHCI Host Controller
[ 1.752764] ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1
[ 1.778903] ci_hdrc ci_hdrc.0: USB 2.0 started, EHCI 1.00
[ 1.787744] hub 1-0:1.0: USB hub found
[ 1.791825] hub 1-0:1.0: 1 port detected
[ 1.798368] mousedev: PS/2 mouse device common for all mice
[ 1.809289] stmp3xxx-rtc 80056000.rtc: rtc core: registered 80056000.rtc as rtc0
[ 1.817578] i2c /dev entries driver
[ 1.823272] stmp3xxx_rtc_wdt stmp3xxx_rtc_wdt: initialized watchdog with heartbeat 19s
[ 1.833634] of_get_named_gpio_flags exited with status 76
[ 1.878880] mxs-mmc 80012000.ssp: initialized
[ 1.887802] usbcore: registered new interface driver usbhid
[ 1.893631] usbhid: USB HID core driver
[ 1.902393] mxs-lradc 80050000.lradc: Touchscreen not enabled.
[ 1.925086] TCP: cubic registered
[ 1.928491] NET: Registered protocol family 17
[ 1.934010] Key type dns_resolver registered
[ 1.941379] registered taskstats version 1
[ 1.951463] stmp3xxx-rtc 80056000.rtc: setting system clock to 1970-01-01 00:21:15 UTC (1275)
[ 1.973991] mmc0: BKOPS_EN bit is not set
[ 1.991482] mmc0: new high speed MMC card at address 0001
[ 1.998963] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[ 2.011564] mmcblk0: mmc0:0001 SEM02G 1.82 GiB
[ 2.016804] mmcblk0boot0: mmc0:0001 SEM02G partition 1 1.00 MiB
[ 2.024103] mmcblk0boot1: mmc0:0001 SEM02G partition 2 1.00 MiB
[ 2.035524] mmcblk0: p1 p2 p3 p4 < p5 p6 p7 >
[ 2.053456] mmcblk0boot1: unknown partition table
[ 2.062682] mmcblk0boot0: unknown partition table
[ 3.989508] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
[ 4.019719] IP-Config: Gateway not on directly connected network
[ 4.025792] ALSA device list:
[ 4.028962] No soundcards found.
[ 4.066526] VFS: Mounted root (nfs filesystem) on device 0:11.
[ 4.074463] devtmpfs: mounted
[ 4.078876] Freeing unused kernel memory: 216K (c067e000 - c06b4000)
Thanks in advance,
Christoph
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-09 10:07 [Xenomai] "inconsistent lock state" on boot-up Stoidner, Christoph
@ 2014-11-09 15:53 ` Gilles Chanteperdrix
2014-11-10 9:08 ` Stoidner, Christoph
0 siblings, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-09 15:53 UTC (permalink / raw)
To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Sun, Nov 09, 2014 at 10:07:54AM +0000, Stoidner, Christoph wrote:
> Hi all,
>
> I am using linux 3.10.32 and ipipe-core-3.10.32-arm-4.patch on a
> Freescale i.MX28.
>
> When booting the kernel the message "inconsistent lock state" is
> given (see below). Does anyone have an idea why this happens? With
> kernel 3.10.18 and the corresponding ipipe patch it is the same.
> With linux 3.4.6 and ipipe 3.4.6-arm-4 the message does not appear.

Do you have the same message with exactly the same kernel
configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled?

> I am very interested to understand whether this message could lead
> to any problems, since I have unpredictable crashes of my
> xenomai-based application program (e.g. with "segmentation fault"
> or "scheduling while atomic" messages).

Do you have FCSE enabled? If yes, did you try disabling it? Same
with unlocked context switch.

--
Gilles.

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-09 15:53 ` Gilles Chanteperdrix
@ 2014-11-10 9:08 ` Stoidner, Christoph
2014-11-10 12:33 ` Stoidner, Christoph
2014-11-10 12:43 ` Gilles Chanteperdrix
0 siblings, 2 replies; 47+ messages in thread
From: Stoidner, Christoph @ 2014-11-10 9:08 UTC (permalink / raw)
To: xenomai@xenomai.org

Hi Gilles,

> Do you have the same message with exactly the same kernel
> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled?

When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not
appear on boot-up.

> Do you have FCSE enabled? If yes, did you try disabling it? Same
> with unlocked context switch.

FCSE is already completely disabled.

Do you have an idea how to overcome the problem?

Regards,
Christoph

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 9:08 ` Stoidner, Christoph
@ 2014-11-10 12:33 ` Stoidner, Christoph
2014-11-10 12:44 ` Gilles Chanteperdrix
2014-11-10 12:43 ` Gilles Chanteperdrix
1 sibling, 1 reply; 47+ messages in thread
From: Stoidner, Christoph @ 2014-11-10 12:33 UTC (permalink / raw)
To: xenomai@xenomai.org

Hi again,

now I have disabled CONFIG_XENOMAI but left CONFIG_IPIPE enabled. The
error message on boot-up is still given.

Since no one else has reported that problem I assume it happens only on
a specific architecture (for me: i.MX28 => ARMv5). Is there something
ARMv5 architecture specific in ipipe's locking mechanism?

Regards,
Christoph

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 12:33 ` Stoidner, Christoph
@ 2014-11-10 12:44 ` Gilles Chanteperdrix
0 siblings, 0 replies; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 12:44 UTC (permalink / raw)
To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Mon, Nov 10, 2014 at 12:33:40PM +0000, Stoidner, Christoph wrote:
>
> Hi again,
>
> now I have disabled CONFIG_XENOMAI but left CONFIG_IPIPE enabled. The
> error message on boot-up is still given.
>
> Since no one else has reported that problem I assume it happens only
> on a specific architecture (for me: i.MX28 => ARMv5). Is there
> something ARMv5 architecture specific in ipipe's locking mechanism?

This error message probably happens to anyone enabling lockdep with
I-pipe on ARM. You are not the first to report it.

--
Gilles.

^ permalink raw reply [flat|nested] 47+ messages in thread
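For readers new to this kind of report: the line `inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage` in the boot log says that the `rq->lock` class was once taken from hardirq context and is now seen held while lockdep believes hardirqs are enabled. The rule can be pictured with a minimal sketch. It is a toy model with hypothetical names, tracking far fewer states than real lockdep does:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of lockdep's hardirq usage bits for ONE lock class.
 * Hypothetical names; real lockdep also tracks softirq and read states. */
struct lock_class_model {
	bool used_in_hardirq;        /* corresponds to {IN-HARDIRQ-W} */
	bool held_hardirqs_enabled;  /* corresponds to {HARDIRQ-ON-W} */
};

/* Record one observation of the lock being held.
 * Returns true once the two usages have both been seen, which is the
 * situation reported as "inconsistent lock state". */
static bool model_mark_usage(struct lock_class_model *c,
			     bool in_hardirq, bool hardirqs_enabled)
{
	assert(c);
	if (in_hardirq)
		c->used_in_hardirq = true;
	else if (hardirqs_enabled)
		c->held_hardirqs_enabled = true;
	return c->used_in_hardirq && c->held_hardirqs_enabled;
}
```

In the log above, the first usage comes from `scheduler_tick()` in the timer interrupt, and the second is recorded when `trace_hardirqs_on_caller()` runs while `try_to_wake_up()` still holds the lock. Whether the second observation reflects the real interrupt state is exactly what the thread goes on to question.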
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 9:08 ` Stoidner, Christoph
2014-11-10 12:33 ` Stoidner, Christoph
@ 2014-11-10 12:43 ` Gilles Chanteperdrix
2014-11-10 14:52 ` Jan Kiszka
2014-11-11 17:33 ` Stoidner, Christoph
1 sibling, 2 replies; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 12:43 UTC (permalink / raw)
To: Stoidner, Christoph; +Cc: xenomai@xenomai.org

On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote:
>
> Hi Gilles,
>
> > Do you have the same message with exactly the same kernel
> > configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled?
>
> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not
> appear on boot-up.
>
> > Do you have FCSE enabled? If yes, did you try disabling it? Same
> > with unlocked context switch.
>
> FCSE is already completely disabled.
>
> Do you have an idea how to overcome the problem?

I am not sure the lockdep message really is a problem. lockdep could
be confused by the fact that the hardware interrupts are not off
when running the I-pipe, or because we are missing some bit in the
I-pipe arm specific code to get it looking at the virtual mask
instead of the hardware mask.

As for the scheduling while atomic and random segmentation fault,
you should use the I-pipe tracer, configure it with enough back
trace points, something like 1000 or 10000, and trigger a trace
freeze in the kernel code when the problem happens.

Also, the "scheduling while atomic" may happen if you call some
Linux service which reschedules from primary mode. You can try
enabling I-pipe debugging, and in fact all Xenomai debugging, to try
and catch such mistakes. This is especially important if you are
running a custom skin.

--
Gilles.

^ permalink raw reply [flat|nested] 47+ messages in thread
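The distinction Gilles draws between the virtual mask and the hardware mask can be pictured with a small model. This is a sketch only, not I-pipe code: every struct and function name below is made up for illustration, and it assumes nothing beyond what the message states, namely that hardware interrupts stay on while Linux believes it has masked them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. None of these names exist in the I-pipe patch. */
struct cpu_model {
	bool hw_irqs_masked;  /* hardware mask (the CPSR I bit on ARM) */
	bool root_stalled;    /* virtual mask seen by the Linux (root) domain */
};

/* Under a pipeline, Linux "disabling interrupts" only sets the virtual mask. */
static void model_linux_irq_disable(struct cpu_model *cpu)
{
	assert(cpu);
	cpu->root_stalled = true;
}

static void model_linux_irq_enable(struct cpu_model *cpu)
{
	assert(cpu);
	cpu->root_stalled = false;
}

/* The answer IRQ-state tracing should give for Linux code. */
static bool model_irqs_disabled_virtual(const struct cpu_model *cpu)
{
	return cpu->root_stalled;
}

/* The answer a check of the hardware mask alone would give. */
static bool model_irqs_disabled_hardware(const struct cpu_model *cpu)
{
	return cpu->hw_irqs_masked;
}
```

After `model_linux_irq_disable()`, the virtual view reports "disabled" while the hardware view still reports "enabled". A tracing hook fed from the hardware view would therefore record a state that Linux code never actually ran in, which is one way a validator such as lockdep could end up with a misleading report.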
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 12:43 ` Gilles Chanteperdrix
@ 2014-11-10 14:52 ` Jan Kiszka
2014-11-10 15:56 ` Gilles Chanteperdrix
2014-11-11 17:33 ` Stoidner, Christoph
1 sibling, 1 reply; 47+ messages in thread
From: Jan Kiszka @ 2014-11-10 14:52 UTC (permalink / raw)
To: Gilles Chanteperdrix, Stoidner, Christoph; +Cc: xenomai@xenomai.org

On 2014-11-10 13:43, Gilles Chanteperdrix wrote:
> I am not sure the lockdep message really is a problem. lockdep could
> be confused by the fact that the hardware interrupts are not off
> when running the I-pipe, or because we are missing some bit in the
> I-pipe arm specific code to get it looking at the virtual mask
> instead of the hardware mask.
>
> As for the scheduling while atomic and random segmentation fault,
> you should use the I-pipe tracer, configure it with enough back
> trace points, something like 1000 or 10000, and trigger a trace
> freeze in the kernel code when the problem happens.
>
> Also, the "scheduling while atomic" may happen if you call some
> Linux service which reschedules from primary mode. You can try
> enabling I-pipe debugging, and in fact all Xenomai debugging, to try
> and catch such mistakes. This is especially important if you are
> running a custom skin.

"Scheduling while atomic" may have the same cause that makes lockdep
stumble: some I-pipe changes mess up the IRQ state tracing of Linux. I
just started to look into this issue again. We tried earlier but got
distracted.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 14:52 ` Jan Kiszka
@ 2014-11-10 15:56 ` Gilles Chanteperdrix
2014-11-10 18:29 ` Jan Kiszka
0 siblings, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 15:56 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote:
> "Scheduling while atomic" may have the same cause that makes lockdep
> stumble: some I-pipe changes mess up the IRQ state tracing of Linux. I
> just started to look into this issue again. We tried earlier but got
> distracted.

I doubt that very much. Though I never run with lockdep, I sometimes
run with CONFIG_PREEMPT, and never saw this message. From what I can
see, the "scheduling while atomic" message is based on the
preempt_count only and does not use irqs_disabled() (which by the
way is known to work with I-pipe on ARM as well, so, if something is
broken, that should be something more obscure).

--
Gilles.

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 15:56 ` Gilles Chanteperdrix
@ 2014-11-10 18:29 ` Jan Kiszka
2014-11-10 19:46 ` Gilles Chanteperdrix
0 siblings, 1 reply; 47+ messages in thread
From: Jan Kiszka @ 2014-11-10 18:29 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2014-11-10 16:56, Gilles Chanteperdrix wrote:
> I doubt that very much. Though I never run with lockdep, I sometimes
> run with CONFIG_PREEMPT, and never saw this message. From what I can
> see, the "scheduling while atomic" message is based on the
> preempt_count only and does not use irqs_disabled() (which by the
> way is known to work with I-pipe on ARM as well, so, if something is
> broken, that should be something more obscure).

Let's see. I think I've identified one wrong path:

diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index d32f8bd..ab911f8 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -198,7 +198,10 @@
 #ifdef CONFIG_TRACE_IRQFLAGS
 	@ The parent context IRQs must have been enabled to get here in
 	@ the first place, so there's no point checking the PSR I bit.
-	bl	trace_hardirqs_on
+	tst	\rpsr, #PSR_I_BIT
+	bleq	trace_hardirqs_off
+	tst	\rpsr, #PSR_I_BIT
+	blne	trace_hardirqs_on
 #endif
 	.else
 	@ IRQs off again before pulling preserved data off the stack

This is probably not a fix, but with that change applied, the warning
is gone. Now the question is what to really test for when returning
here. I suppose we want the pipeline state of root here - should I use
__ipipe_check_root_interruptible?

For reference, here is a trace that relates to a lockdep report:

 |  #func  -155  __save_stack_trace+0x14 (save_stack_trace+0x30)
 |  #func  -157  save_stack_trace+0x10 (save_trace+0x3c)
:|  #func  -159  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
:|  #func  -160  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x48)
:|  #func  -161  __ipipe_exit_irq+0x10 (__ipipe_grab_irq+0x48)
:|  #func  -164  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x1f0)
:|  #func  -167  irq_gc_mask_disable_reg+0x10 (omap_mask_ack_irq+0x18)
:|  #func  -168  omap_mask_ack_irq+0x10 (__ipipe_ack_level_irq+0x30)
:|  #func  -169  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x6c)
:|  #func  -171  irq_to_desc+0x10 (__ipipe_dispatch_irq+0xc8)
:|  #func  -174  irq_to_desc+0x10 (__ipipe_dispatch_irq+0xb8)
:|  #func  -175  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x40)
:|  #func  -177  __ipipe_grab_irq+0x10 (omap3_intc_handle_irq+0x94)
:|  #func  -179  irq_find_mapping+0x14 (omap3_intc_handle_irq+0x88)
:|  #func  -180  omap3_intc_handle_irq+0x10 (__irq_svc+0x44)
:   #func  -184  update_curr.constprop.48+0x14 (dequeue_task_fair+0x30)
:   #func  -184  dequeue_task_fair+0x10 (dequeue_task+0x38)
:   #func  -186  update_rq_clock.part.71+0x10 (dequeue_task+0x4c)
:   #func  -187  dequeue_task+0x14 (deactivate_task+0x38)
:   #func  -187  deactivate_task+0x10 (__schedule+0x2b4)
:   #func  -188  do_raw_spin_lock+0x14 (_raw_spin_lock_irq+0x7c)
    +func  -190  _raw_spin_lock_irq+0x14 (__schedule+0x84)
    +func  -190  ipipe_root_only+0x10 (__schedule+0x5c)
 |  #func  -191  ipipe_root_only+0x10 (ipipe_unstall_root+0x1c)
    #func  -192  ipipe_unstall_root+0x10 (rcu_sched_qs+0xa0)
    +func  -193  rcu_sched_qs+0x10 (__schedule+0x48)
    +func  -194  __schedule+0x14 (schedule+0x40)
    +func  -195  schedule+0x10 (smpboot_thread_fn+0x108)

The ":" at the beginning stands for !current->hardirqs_enabled.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply related [flat|nested] 47+ messages in thread
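The branch logic of the patch in this message can be restated in C to make the conditions explicit. This is a reading aid, not kernel code: the enum and function names are hypothetical, and only the value of `PSR_I_BIT` (0x80, the ARM IRQ mask bit) is taken from the architecture. `tst` sets the Z flag when the tested bit is clear, so `bleq` is taken when the saved PSR has IRQs unmasked and `blne` when they are masked. The `tst` is presumably repeated because the first call may clobber the condition flags.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_I_BIT 0x00000080u  /* ARM PSR bit 7: IRQs masked when set */

/* Which tracing hook the exit path ends up calling (hypothetical enum). */
enum trace_call { TRACE_HARDIRQS_OFF, TRACE_HARDIRQS_ON };

/* Before the patch: an unconditional "bl trace_hardirqs_on". */
static enum trace_call model_original_exit(uint32_t rpsr)
{
	(void)rpsr;
	return TRACE_HARDIRQS_ON;
}

/* After the patch:
 *   tst rpsr, #PSR_I_BIT ; bleq trace_hardirqs_off   (I bit clear)
 *   tst rpsr, #PSR_I_BIT ; blne trace_hardirqs_on    (I bit set)
 */
static enum trace_call model_patched_exit(uint32_t rpsr)
{
	if ((rpsr & PSR_I_BIT) == 0)
		return TRACE_HARDIRQS_OFF;
	return TRACE_HARDIRQS_ON;
}
```

Read literally, a saved PSR with the I bit clear now leads to `trace_hardirqs_off`. If the premise in the original code comment holds (the parent context always had hardware IRQs enabled), the patched path would never report "on" at this point, which fits Jan's own caveat that the change is probably not the real fix and that the root domain's pipeline state is the thing to test.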
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 18:29 ` Jan Kiszka @ 2014-11-10 19:46 ` Gilles Chanteperdrix 2014-11-10 19:51 ` Gilles Chanteperdrix 2014-11-10 19:55 ` Jan Kiszka 0 siblings, 2 replies; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-10 19:46 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote: > On 2014-11-10 16:56, Gilles Chanteperdrix wrote: > > On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote: > >> On 2014-11-10 13:43, Gilles Chanteperdrix wrote: > >>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: > >>>> > >>>> Hi Gilles, > >>>> > >>>>> Do you have the same message with exactly the same kernel > >>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? > >>>> > >>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not > >>>> appear on boot-up. > >>>> > >>>>> Do you have FCSE enabled? If yes, did you try disabling it? same > >>>>> with unlocked context switch. > >>>> > >>>> FCSE is already disabled at all. > >>>> > >>>> Do you have an idea how to overcome the problem? > >>> > >>> I am not sure the lockdep message really is a problem. lockdep could > >>> be confused by the fact that the hardware interrupts are not off > >>> when running the I-pipe, or because we are missing some bit in the > >>> I-pipe arm specific code to get it looking at the virtual mask > >>> instead of the hardware mask. > >>> > >>> As for the scheduling while atomic and random segmentation fault, > >>> you should use the I-pipe tracer, configure it with enough back > >>> trace points, something like 1000 or 10000, and trigger a trace > >>> freeze in the kernell code when the problem happens. 
> >>> > >>> Also, for the "scheduling while atomic", it may happen if you call > >>> some Linux service which reschedules from primary mode, you can try > >>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try > >>> and catch such mistakes. This is especially important if you are > >>> running a custom skin. > >> > >> "Scheduling while atomic" may have the same reason why lockdep stumbles: > >> some changes of I-pipe messe up with IRQ state tracing of Linux. I just > >> started to look into this issue again. We tried earlier but got distracted. > > > > I doubt that very much. Though I never run with lockdep, I sometimes > > run with CONFIG_PREEMPT, and never saw this message. From what I can > > see, the "scheduling while atomic" message is based on the > > preempt_count only and does not use irqs_disabled() (which by the > > way is known to work with I-pipe on ARM as well, so, if something is > > broken, that should be something more obscure). > > Let's see. I think I've identified one wrong path: > > diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S > index d32f8bd..ab911f8 100644 > --- a/arch/arm/kernel/entry-header.S > +++ b/arch/arm/kernel/entry-header.S > @@ -198,7 +198,10 @@ > #ifdef CONFIG_TRACE_IRQFLAGS > @ The parent context IRQs must have been enabled to get here in > @ the first place, so there's no point checking the PSR I bit. > - bl trace_hardirqs_on > + tst \rpsr, #PSR_I_BIT > + bleq trace_hardirqs_off > + tst \rpsr, #PSR_I_BIT > + blne trace_hardirqs_on > #endif > .else > @ IRQs off again before pulling preserved data off the stack > > This is probably no fix, but a with that change applied, the warning is > gone. Now the question is what to really test for when returning here. I > suppose we want the pipeline state of root here - should I > __ipipe_check_root_interruptible? 
This does not make sense; read the comment above that change: an
interrupt cannot be taken, and so svc_entry cannot be entered, with
interrupts off. Besides, this is mainline code, so it would be a
problem for mainline too. We are necessarily returning to a place
where hardware irqs were on.

To me, the problem is rather that we enter
trace_hardirqs_on/trace_hardirqs_off while in the Xenomai domain.
We can try to fix that, but it will make entry.S hell to maintain.
I would rather exit early in trace_hardirqs_on/trace_hardirqs_off
if the current domain is not root.

--
Gilles.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 19:46 ` Gilles Chanteperdrix @ 2014-11-10 19:51 ` Gilles Chanteperdrix 2014-11-10 19:55 ` Jan Kiszka 1 sibling, 0 replies; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-10 19:51 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On Mon, Nov 10, 2014 at 08:46:06PM +0100, Gilles Chanteperdrix wrote: > On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote: > > On 2014-11-10 16:56, Gilles Chanteperdrix wrote: > > > On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote: > > >> On 2014-11-10 13:43, Gilles Chanteperdrix wrote: > > >>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: > > >>>> > > >>>> Hi Gilles, > > >>>> > > >>>>> Do you have the same message with exactly the same kernel > > >>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? > > >>>> > > >>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not > > >>>> appear on boot-up. > > >>>> > > >>>>> Do you have FCSE enabled? If yes, did you try disabling it? same > > >>>>> with unlocked context switch. > > >>>> > > >>>> FCSE is already disabled at all. > > >>>> > > >>>> Do you have an idea how to overcome the problem? > > >>> > > >>> I am not sure the lockdep message really is a problem. lockdep could > > >>> be confused by the fact that the hardware interrupts are not off > > >>> when running the I-pipe, or because we are missing some bit in the > > >>> I-pipe arm specific code to get it looking at the virtual mask > > >>> instead of the hardware mask. > > >>> > > >>> As for the scheduling while atomic and random segmentation fault, > > >>> you should use the I-pipe tracer, configure it with enough back > > >>> trace points, something like 1000 or 10000, and trigger a trace > > >>> freeze in the kernell code when the problem happens. 
> > >>> > > >>> Also, for the "scheduling while atomic", it may happen if you call > > >>> some Linux service which reschedules from primary mode, you can try > > >>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try > > >>> and catch such mistakes. This is especially important if you are > > >>> running a custom skin. > > >> > > >> "Scheduling while atomic" may have the same reason why lockdep stumbles: > > >> some changes of I-pipe messe up with IRQ state tracing of Linux. I just > > >> started to look into this issue again. We tried earlier but got distracted. > > > > > > I doubt that very much. Though I never run with lockdep, I sometimes > > > run with CONFIG_PREEMPT, and never saw this message. From what I can > > > see, the "scheduling while atomic" message is based on the > > > preempt_count only and does not use irqs_disabled() (which by the > > > way is known to work with I-pipe on ARM as well, so, if something is > > > broken, that should be something more obscure). > > > > Let's see. I think I've identified one wrong path: > > > > diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S > > index d32f8bd..ab911f8 100644 > > --- a/arch/arm/kernel/entry-header.S > > +++ b/arch/arm/kernel/entry-header.S > > @@ -198,7 +198,10 @@ > > #ifdef CONFIG_TRACE_IRQFLAGS > > @ The parent context IRQs must have been enabled to get here in > > @ the first place, so there's no point checking the PSR I bit. > > - bl trace_hardirqs_on > > + tst \rpsr, #PSR_I_BIT > > + bleq trace_hardirqs_off > > + tst \rpsr, #PSR_I_BIT > > + blne trace_hardirqs_on > > #endif > > .else > > @ IRQs off again before pulling preserved data off the stack > > > > This is probably no fix, but a with that change applied, the warning is > > gone. Now the question is what to really test for when returning here. I > > suppose we want the pipeline state of root here - should I > > __ipipe_check_root_interruptible? 
>
> This does not make sense, read the comment above that change: there
> is no way an interrupt can be taken, and so entering svc_entry, with
> interrupts off. Besides this is mainline code, so it would be a
> problem for mainline too. We are necessarily returning to a place
> where hardware irqs were on.
>
> To me the problem is rather that we enter
> trace_hardirqs_on/trace_hardirqs_off when in the xenomai domain.
> We can try and fix that, but this will result in a hell of entry.S
> to maintain. I would rather exit early in
> trace_hardirqs_on/trace_hardirqs_off if current domain is not root.

The whole code in kernel/trace/trace_irqsoff.c clearly cannot be
called from the real-time domain: it uses local_irq_save,
raw_smp_processor_id and preempt_count. So, really, the only sane fix is

	if (!ipipe_root_p)
		return;

at the beginning of trace_hardirqs_on/trace_hardirqs_off.

--
Gilles.
^ permalink raw reply [flat|nested] 47+ messages in thread
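The early-exit guard proposed in this message can be tried out in user space. What follows is only a minimal sketch under stated assumptions, not the actual I-pipe patch: `ipipe_root_p` is modelled as a plain variable (in the kernel it is a predicate supplied by the pipeline), `hardirqs_enabled` stands in for the per-task flag that lockdep keeps, and `demo()` is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. In an I-pipe kernel, ipipe_root_p is provided by
 * the pipeline; here it is a plain variable (an assumption made so that
 * the sketch compiles outside the kernel).
 */
bool ipipe_root_p = true;

/* Stand-in for the flag lockdep maintains (current->hardirqs_enabled). */
int hardirqs_enabled = 1;

void trace_hardirqs_on(void)
{
	if (!ipipe_root_p)
		return;		/* not in the root domain: leave Linux state alone */
	hardirqs_enabled = 1;
}

void trace_hardirqs_off(void)
{
	if (!ipipe_root_p)
		return;		/* same guard on the "off" side */
	hardirqs_enabled = 0;
}

/* Hypothetical helper: exercises both domains, returns 0 on success. */
int demo(void)
{
	ipipe_root_p = false;		/* pretend we run over the real-time domain */
	trace_hardirqs_off();
	assert(hardirqs_enabled == 1);	/* Linux-side state was not touched */

	ipipe_root_p = true;		/* back in the root (Linux) domain */
	trace_hardirqs_off();
	assert(hardirqs_enabled == 0);
	trace_hardirqs_on();
	assert(hardirqs_enabled == 1);
	return 0;
}
```

With the guard in place, calls made outside the root domain become no-ops, so the Linux-side bookkeeping only ever changes on behalf of Linux. Whether that alone makes lockdep consistent is exactly what the rest of the thread disputes.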
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 19:46 ` Gilles Chanteperdrix 2014-11-10 19:51 ` Gilles Chanteperdrix @ 2014-11-10 19:55 ` Jan Kiszka 2014-11-10 20:00 ` Gilles Chanteperdrix 1 sibling, 1 reply; 47+ messages in thread From: Jan Kiszka @ 2014-11-10 19:55 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2014-11-10 20:46, Gilles Chanteperdrix wrote: > On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote: >> On 2014-11-10 16:56, Gilles Chanteperdrix wrote: >>> On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote: >>>> On 2014-11-10 13:43, Gilles Chanteperdrix wrote: >>>>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: >>>>>> >>>>>> Hi Gilles, >>>>>> >>>>>>> Do you have the same message with exactly the same kernel >>>>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? >>>>>> >>>>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not >>>>>> appear on boot-up. >>>>>> >>>>>>> Do you have FCSE enabled? If yes, did you try disabling it? same >>>>>>> with unlocked context switch. >>>>>> >>>>>> FCSE is already disabled at all. >>>>>> >>>>>> Do you have an idea how to overcome the problem? >>>>> >>>>> I am not sure the lockdep message really is a problem. lockdep could >>>>> be confused by the fact that the hardware interrupts are not off >>>>> when running the I-pipe, or because we are missing some bit in the >>>>> I-pipe arm specific code to get it looking at the virtual mask >>>>> instead of the hardware mask. >>>>> >>>>> As for the scheduling while atomic and random segmentation fault, >>>>> you should use the I-pipe tracer, configure it with enough back >>>>> trace points, something like 1000 or 10000, and trigger a trace >>>>> freeze in the kernell code when the problem happens. 
>>>>> >>>>> Also, for the "scheduling while atomic", it may happen if you call >>>>> some Linux service which reschedules from primary mode, you can try >>>>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try >>>>> and catch such mistakes. This is especially important if you are >>>>> running a custom skin. >>>> >>>> "Scheduling while atomic" may have the same reason why lockdep stumbles: >>>> some changes of I-pipe messe up with IRQ state tracing of Linux. I just >>>> started to look into this issue again. We tried earlier but got distracted. >>> >>> I doubt that very much. Though I never run with lockdep, I sometimes >>> run with CONFIG_PREEMPT, and never saw this message. From what I can >>> see, the "scheduling while atomic" message is based on the >>> preempt_count only and does not use irqs_disabled() (which by the >>> way is known to work with I-pipe on ARM as well, so, if something is >>> broken, that should be something more obscure). >> >> Let's see. I think I've identified one wrong path: >> >> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S >> index d32f8bd..ab911f8 100644 >> --- a/arch/arm/kernel/entry-header.S >> +++ b/arch/arm/kernel/entry-header.S >> @@ -198,7 +198,10 @@ >> #ifdef CONFIG_TRACE_IRQFLAGS >> @ The parent context IRQs must have been enabled to get here in >> @ the first place, so there's no point checking the PSR I bit. >> - bl trace_hardirqs_on >> + tst \rpsr, #PSR_I_BIT >> + bleq trace_hardirqs_off >> + tst \rpsr, #PSR_I_BIT >> + blne trace_hardirqs_on >> #endif >> .else >> @ IRQs off again before pulling preserved data off the stack >> >> This is probably no fix, but a with that change applied, the warning is >> gone. Now the question is what to really test for when returning here. I >> suppose we want the pipeline state of root here - should I >> __ipipe_check_root_interruptible? 
>
> This does not make sense, read the comment above that change: there
> is no way an interrupt can be taken, and so entering svc_entry, with
> interrupts off. Besides this is mainline code, so it would be a
> problem for mainline too. We are necessarily returning to a place
> where hardware irqs were on.

Did you also look at the trace I posted?

Jan
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 19:55 ` Jan Kiszka @ 2014-11-10 20:00 ` Gilles Chanteperdrix 2014-11-10 20:02 ` Jan Kiszka 0 siblings, 1 reply; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-10 20:00 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On Mon, Nov 10, 2014 at 08:55:26PM +0100, Jan Kiszka wrote: > On 2014-11-10 20:46, Gilles Chanteperdrix wrote: > > On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote: > >> On 2014-11-10 16:56, Gilles Chanteperdrix wrote: > >>> On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote: > >>>> On 2014-11-10 13:43, Gilles Chanteperdrix wrote: > >>>>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: > >>>>>> > >>>>>> Hi Gilles, > >>>>>> > >>>>>>> Do you have the same message with exactly the same kernel > >>>>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? > >>>>>> > >>>>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not > >>>>>> appear on boot-up. > >>>>>> > >>>>>>> Do you have FCSE enabled? If yes, did you try disabling it? same > >>>>>>> with unlocked context switch. > >>>>>> > >>>>>> FCSE is already disabled at all. > >>>>>> > >>>>>> Do you have an idea how to overcome the problem? > >>>>> > >>>>> I am not sure the lockdep message really is a problem. lockdep could > >>>>> be confused by the fact that the hardware interrupts are not off > >>>>> when running the I-pipe, or because we are missing some bit in the > >>>>> I-pipe arm specific code to get it looking at the virtual mask > >>>>> instead of the hardware mask. > >>>>> > >>>>> As for the scheduling while atomic and random segmentation fault, > >>>>> you should use the I-pipe tracer, configure it with enough back > >>>>> trace points, something like 1000 or 10000, and trigger a trace > >>>>> freeze in the kernell code when the problem happens. 
> >>>>> > >>>>> Also, for the "scheduling while atomic", it may happen if you call > >>>>> some Linux service which reschedules from primary mode, you can try > >>>>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try > >>>>> and catch such mistakes. This is especially important if you are > >>>>> running a custom skin. > >>>> > >>>> "Scheduling while atomic" may have the same reason why lockdep stumbles: > >>>> some changes of I-pipe messe up with IRQ state tracing of Linux. I just > >>>> started to look into this issue again. We tried earlier but got distracted. > >>> > >>> I doubt that very much. Though I never run with lockdep, I sometimes > >>> run with CONFIG_PREEMPT, and never saw this message. From what I can > >>> see, the "scheduling while atomic" message is based on the > >>> preempt_count only and does not use irqs_disabled() (which by the > >>> way is known to work with I-pipe on ARM as well, so, if something is > >>> broken, that should be something more obscure). > >> > >> Let's see. I think I've identified one wrong path: > >> > >> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S > >> index d32f8bd..ab911f8 100644 > >> --- a/arch/arm/kernel/entry-header.S > >> +++ b/arch/arm/kernel/entry-header.S > >> @@ -198,7 +198,10 @@ > >> #ifdef CONFIG_TRACE_IRQFLAGS > >> @ The parent context IRQs must have been enabled to get here in > >> @ the first place, so there's no point checking the PSR I bit. > >> - bl trace_hardirqs_on > >> + tst \rpsr, #PSR_I_BIT > >> + bleq trace_hardirqs_off > >> + tst \rpsr, #PSR_I_BIT > >> + blne trace_hardirqs_on > >> #endif > >> .else > >> @ IRQs off again before pulling preserved data off the stack > >> > >> This is probably no fix, but a with that change applied, the warning is > >> gone. Now the question is what to really test for when returning here. I > >> suppose we want the pipeline state of root here - should I > >> __ipipe_check_root_interruptible? 
> >
> > This does not make sense, read the comment above that change: there
> > is no way an interrupt can be taken, and so entering svc_entry, with
> > interrupts off. Besides this is mainline code, so it would be a
> > problem for mainline too. We are necessarily returning to a place
> > where hardware irqs were on.
>
> Did you also look at the trace I posted?

Yes, but I did not see what I was supposed to see. The only thing I
see is that these trace functions should never have been called from
the RT domain in the first place.

Note that the reason this trace_irqs stuff is not working well may be
that parts of it are commented out under CONFIG_IPIPE (see
asm_trace_hardirqs_on_cond, asm_trace_hardirqs_off).

--
Gilles.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 20:00 ` Gilles Chanteperdrix @ 2014-11-10 20:02 ` Jan Kiszka 2014-11-10 20:06 ` Gilles Chanteperdrix 0 siblings, 1 reply; 47+ messages in thread From: Jan Kiszka @ 2014-11-10 20:02 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2014-11-10 21:00, Gilles Chanteperdrix wrote: > On Mon, Nov 10, 2014 at 08:55:26PM +0100, Jan Kiszka wrote: >> On 2014-11-10 20:46, Gilles Chanteperdrix wrote: >>> On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote: >>>> On 2014-11-10 16:56, Gilles Chanteperdrix wrote: >>>>> On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote: >>>>>> On 2014-11-10 13:43, Gilles Chanteperdrix wrote: >>>>>>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: >>>>>>>> >>>>>>>> Hi Gilles, >>>>>>>> >>>>>>>>> Do you have the same message with exactly the same kernel >>>>>>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? >>>>>>>> >>>>>>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not >>>>>>>> appear on boot-up. >>>>>>>> >>>>>>>>> Do you have FCSE enabled? If yes, did you try disabling it? same >>>>>>>>> with unlocked context switch. >>>>>>>> >>>>>>>> FCSE is already disabled at all. >>>>>>>> >>>>>>>> Do you have an idea how to overcome the problem? >>>>>>> >>>>>>> I am not sure the lockdep message really is a problem. lockdep could >>>>>>> be confused by the fact that the hardware interrupts are not off >>>>>>> when running the I-pipe, or because we are missing some bit in the >>>>>>> I-pipe arm specific code to get it looking at the virtual mask >>>>>>> instead of the hardware mask. >>>>>>> >>>>>>> As for the scheduling while atomic and random segmentation fault, >>>>>>> you should use the I-pipe tracer, configure it with enough back >>>>>>> trace points, something like 1000 or 10000, and trigger a trace >>>>>>> freeze in the kernell code when the problem happens. 
>>>>>>> >>>>>>> Also, for the "scheduling while atomic", it may happen if you call >>>>>>> some Linux service which reschedules from primary mode, you can try >>>>>>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try >>>>>>> and catch such mistakes. This is especially important if you are >>>>>>> running a custom skin. >>>>>> >>>>>> "Scheduling while atomic" may have the same reason why lockdep stumbles: >>>>>> some changes of I-pipe messe up with IRQ state tracing of Linux. I just >>>>>> started to look into this issue again. We tried earlier but got distracted. >>>>> >>>>> I doubt that very much. Though I never run with lockdep, I sometimes >>>>> run with CONFIG_PREEMPT, and never saw this message. From what I can >>>>> see, the "scheduling while atomic" message is based on the >>>>> preempt_count only and does not use irqs_disabled() (which by the >>>>> way is known to work with I-pipe on ARM as well, so, if something is >>>>> broken, that should be something more obscure). >>>> >>>> Let's see. I think I've identified one wrong path: >>>> >>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S >>>> index d32f8bd..ab911f8 100644 >>>> --- a/arch/arm/kernel/entry-header.S >>>> +++ b/arch/arm/kernel/entry-header.S >>>> @@ -198,7 +198,10 @@ >>>> #ifdef CONFIG_TRACE_IRQFLAGS >>>> @ The parent context IRQs must have been enabled to get here in >>>> @ the first place, so there's no point checking the PSR I bit. >>>> - bl trace_hardirqs_on >>>> + tst \rpsr, #PSR_I_BIT >>>> + bleq trace_hardirqs_off >>>> + tst \rpsr, #PSR_I_BIT >>>> + blne trace_hardirqs_on >>>> #endif >>>> .else >>>> @ IRQs off again before pulling preserved data off the stack >>>> >>>> This is probably no fix, but a with that change applied, the warning is >>>> gone. Now the question is what to really test for when returning here. I >>>> suppose we want the pipeline state of root here - should I >>>> __ipipe_check_root_interruptible? 
>>>
>>> This does not make sense, read the comment above that change: there
>>> is no way an interrupt can be taken, and so entering svc_entry, with
>>> interrupts off. Besides this is mainline code, so it would be a
>>> problem for mainline too. We are necessarily returning to a place
>>> where hardware irqs were on.
>>
>> Did you also look at the trace I posted?
>
> Yes, but I did not see what I am supposed to see. The only thing I
> see is that these trace functions should never have been called from
> rt domain in the first place.
>

There is no RT domain in the trace, only an inconsistent Linux trace
state after return from IRQ.

> Note that the fact that this trace_irqs stuff is not working well
> may be the fact that part of them are commented with CONFIG_IPIPE
> (see asm_trace_hardirqs_on_cond, asm_trace_hardirqs_off)

No, that doesn't solve all issues. Even with my hack (which may not
address all cases properly) plus a revert of that commit, there are
still inconsistencies.

Jan
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 20:02 ` Jan Kiszka @ 2014-11-10 20:06 ` Gilles Chanteperdrix 2014-11-10 20:10 ` Jan Kiszka 0 siblings, 1 reply; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-10 20:06 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On Mon, Nov 10, 2014 at 09:02:58PM +0100, Jan Kiszka wrote: > On 2014-11-10 21:00, Gilles Chanteperdrix wrote: > > On Mon, Nov 10, 2014 at 08:55:26PM +0100, Jan Kiszka wrote: > >> On 2014-11-10 20:46, Gilles Chanteperdrix wrote: > >>> On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote: > >>>> On 2014-11-10 16:56, Gilles Chanteperdrix wrote: > >>>>> On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote: > >>>>>> On 2014-11-10 13:43, Gilles Chanteperdrix wrote: > >>>>>>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: > >>>>>>>> > >>>>>>>> Hi Gilles, > >>>>>>>> > >>>>>>>>> Do you have the same message with exactly the same kernel > >>>>>>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? > >>>>>>>> > >>>>>>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not > >>>>>>>> appear on boot-up. > >>>>>>>> > >>>>>>>>> Do you have FCSE enabled? If yes, did you try disabling it? same > >>>>>>>>> with unlocked context switch. > >>>>>>>> > >>>>>>>> FCSE is already disabled at all. > >>>>>>>> > >>>>>>>> Do you have an idea how to overcome the problem? > >>>>>>> > >>>>>>> I am not sure the lockdep message really is a problem. lockdep could > >>>>>>> be confused by the fact that the hardware interrupts are not off > >>>>>>> when running the I-pipe, or because we are missing some bit in the > >>>>>>> I-pipe arm specific code to get it looking at the virtual mask > >>>>>>> instead of the hardware mask. 
> >>>>>>> > >>>>>>> As for the scheduling while atomic and random segmentation fault, > >>>>>>> you should use the I-pipe tracer, configure it with enough back > >>>>>>> trace points, something like 1000 or 10000, and trigger a trace > >>>>>>> freeze in the kernell code when the problem happens. > >>>>>>> > >>>>>>> Also, for the "scheduling while atomic", it may happen if you call > >>>>>>> some Linux service which reschedules from primary mode, you can try > >>>>>>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try > >>>>>>> and catch such mistakes. This is especially important if you are > >>>>>>> running a custom skin. > >>>>>> > >>>>>> "Scheduling while atomic" may have the same reason why lockdep stumbles: > >>>>>> some changes of I-pipe messe up with IRQ state tracing of Linux. I just > >>>>>> started to look into this issue again. We tried earlier but got distracted. > >>>>> > >>>>> I doubt that very much. Though I never run with lockdep, I sometimes > >>>>> run with CONFIG_PREEMPT, and never saw this message. From what I can > >>>>> see, the "scheduling while atomic" message is based on the > >>>>> preempt_count only and does not use irqs_disabled() (which by the > >>>>> way is known to work with I-pipe on ARM as well, so, if something is > >>>>> broken, that should be something more obscure). > >>>> > >>>> Let's see. I think I've identified one wrong path: > >>>> > >>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S > >>>> index d32f8bd..ab911f8 100644 > >>>> --- a/arch/arm/kernel/entry-header.S > >>>> +++ b/arch/arm/kernel/entry-header.S > >>>> @@ -198,7 +198,10 @@ > >>>> #ifdef CONFIG_TRACE_IRQFLAGS > >>>> @ The parent context IRQs must have been enabled to get here in > >>>> @ the first place, so there's no point checking the PSR I bit. 
> >>>> -	bl	trace_hardirqs_on
> >>>> +	tst	\rpsr, #PSR_I_BIT
> >>>> +	bleq	trace_hardirqs_off
> >>>> +	tst	\rpsr, #PSR_I_BIT
> >>>> +	blne	trace_hardirqs_on
> >>>>  #endif
> >>>>  	.else
> >>>>  	@ IRQs off again before pulling preserved data off the stack
> >>>>
> >>>> This is probably no fix, but with that change applied, the warning is
> >>>> gone. Now the question is what to really test for when returning here. I
> >>>> suppose we want the pipeline state of root here - should I
> >>>> __ipipe_check_root_interruptible?
> >>>
> >>> This does not make sense, read the comment above that change: there
> >>> is no way an interrupt can be taken, and so entering svc_entry, with
> >>> interrupts off. Besides this is mainline code, so it would be a
> >>> problem for mainline too. We are necessarily returning to a place
> >>> where hardware irqs were on.
> >>
> >> Did you also look at the trace I posted?
> >
> > Yes, but I did not see what I am supposed to see. The only thing I
> > see is that these trace functions should never have been called from
> > rt domain in the first place.
> >
>
> There is no RT domain in the trace, only an inconsistent Linux trace
> state after return from IRQ.

What can I say: when returning from IRQ, you are necessarily
returning to a point where irqs are ON, as the comment says, and it
makes perfect sense. So your "fix" should be a nop. So, something
else is broken.

>
> > Note that the fact that this trace_irqs stuff is not working well
> > may be the fact that part of them are commented with CONFIG_IPIPE
> > (see asm_trace_hardirqs_on_cond, asm_trace_hardirqs_off)
>
> No, that doesn't solve all issues. Even with my hack (which may not
> address all cases properly) plus the reversion of that commit, there are
> still inconsistencies.

You cannot revert that commit, otherwise you will end up calling
trace_hardirqs_on/trace_hardirqs_off from the RT domain, which, I
repeat, cannot work.

--
Gilles.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:06 ` Gilles Chanteperdrix
@ 2014-11-10 20:10 ` Jan Kiszka
2014-11-10 20:14 ` Gilles Chanteperdrix
0 siblings, 1 reply; 47+ messages in thread
From: Jan Kiszka @ 2014-11-10 20:10 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2014-11-10 21:06, Gilles Chanteperdrix wrote:
> On Mon, Nov 10, 2014 at 09:02:58PM +0100, Jan Kiszka wrote:
>> On 2014-11-10 21:00, Gilles Chanteperdrix wrote:
>>
>> There is no RT domain in the trace, only an inconsistent Linux trace
>> state after return from IRQ.
>
> What can I say, when returning from IRQ, you are necessarily
> returning to a point where irqs are ON, as the comment says, and it
> makes perfect sense. So your "fix" should be a nop. So, something
> else is broken.

The test for selecting trace_hardirqs_off/on is wrong, that's why I
was asking for a better check. Also, if that path can be taken by RT
domains as well, calling trace_hardirqs_off/on was always wrong, and we
additionally need to check for the caller's domain.

>>> Note that the fact that this trace_irqs stuff is not working well
>>> may be the fact that part of them are commented with CONFIG_IPIPE
>>> (see asm_trace_hardirqs_on_cond, asm_trace_hardirqs_off)
>>
>> No, that doesn't solve all issues. Even with my hack (which may not
>> address all cases properly) plus the reversion of that commit, there are
>> still inconsistencies.
>
> You can not reverse that commit, otherwise you will end-up calling
> trace_hardirqs_on/trace_hardirqs_off from RT domain, which, I
> repeat, can not work.

I can help to understand if that is sufficient to resolve the tracing
breakage - it isn't, there are more paths missing or wrongly instrumented.

Jan

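For readers following the argument about the entry-header.S hunk posted earlier in this thread, here is a minimal user-space sketch in C of how its two tst/bl pairs select a trace call. It is only a model under stated assumptions: the enum, the recording hooks and the svc_exit_* function names are hypothetical stand-ins, not kernel code. What it relies on is ARM semantics: "tst" sets the Z flag when the tested bit is clear, so "bleq" is taken when the I bit is clear (IRQs enabled in the saved PSR) and "blne" when it is set.

```c
#include <assert.h>

#define PSR_I_BIT 0x00000080u	/* ARM PSR bit 7: set means IRQs are masked */

/* Hypothetical stand-ins for the kernel hooks: they only record the call. */
enum trace_call { TRACE_NONE, TRACE_ON, TRACE_OFF };
static enum trace_call last_call = TRACE_NONE;

static void trace_hardirqs_on(void)  { last_call = TRACE_ON; }
static void trace_hardirqs_off(void) { last_call = TRACE_OFF; }

/* Unpatched exit path: unconditional "bl trace_hardirqs_on". */
enum trace_call svc_exit_unpatched(unsigned int rpsr)
{
	(void)rpsr;
	last_call = TRACE_NONE;
	trace_hardirqs_on();
	return last_call;
}

/*
 * Exit path with the hunk applied. "tst rpsr, #PSR_I_BIT" sets Z when
 * (rpsr & PSR_I_BIT) == 0, so "bleq" runs when the I bit is clear and
 * "blne" when it is set. The test is repeated, as in the hunk, because
 * the first call may clobber the flags.
 */
enum trace_call svc_exit_with_hunk(unsigned int rpsr)
{
	last_call = TRACE_NONE;
	if ((rpsr & PSR_I_BIT) == 0)	/* bleq trace_hardirqs_off */
		trace_hardirqs_off();
	if ((rpsr & PSR_I_BIT) != 0)	/* blne trace_hardirqs_on */
		trace_hardirqs_on();
	return last_call;
}

/* Self-check of the model; 0x13 is SVC mode with the I bit clear. */
void check_hunk_model(void)
{
	assert(svc_exit_unpatched(0x13u) == TRACE_ON);
	assert(svc_exit_with_hunk(0x13u) == TRACE_OFF);
	assert(svc_exit_with_hunk(0x13u | PSR_I_BIT) == TRACE_ON);
}
```

If the saved PSR on this path always has the I bit clear, as the comment in the hunk argues, then in this model the modified code reports "off" where the unpatched code reports "on". That would be consistent with both observations made above (the warning disappears, yet a correct test should be a nop), but it is an inference from the instruction semantics, not something confirmed in the thread.
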
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:10 ` Jan Kiszka
@ 2014-11-10 20:14 ` Gilles Chanteperdrix
2014-11-10 20:17 ` Jan Kiszka
0 siblings, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 20:14 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Mon, Nov 10, 2014 at 09:10:31PM +0100, Jan Kiszka wrote:
> On 2014-11-10 21:06, Gilles Chanteperdrix wrote:
> > You can not reverse that commit, otherwise you will end-up calling
> > trace_hardirqs_on/trace_hardirqs_off from RT domain, which, I
> > repeat, can not work.
>
> I can help to understand if that is sufficient to resolve the tracing
> breakage - it isn't, there are more paths missing or wrongly instrumented.

My idea of all this is that CONFIG_TRACE_IRQFLAGS should depend on
!IPIPE, since the I-pipe tracer provides the same functionality. And
is not broken.

--
Gilles.

* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:14 ` Gilles Chanteperdrix
@ 2014-11-10 20:17 ` Jan Kiszka
2014-11-10 20:18 ` Gilles Chanteperdrix
2014-11-10 20:23 ` Gilles Chanteperdrix
0 siblings, 2 replies; 47+ messages in thread
From: Jan Kiszka @ 2014-11-10 20:17 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2014-11-10 21:14, Gilles Chanteperdrix wrote:
> My idea of all this is that CONFIG_TRACE_IRQFLAGS should depend on
> !IPIPE, since the I-pipe tracer provides the same functionality. And
> is not broken.

No, the I-pipe trace does not provide a Linux lock dependency checker,
nor does it support might_sleep and such. If you have Linux drivers
which depend on Xenomai directly or indirectly, you cannot validate them
anymore. That's why we support this on x86.

Jan

* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:17 ` Jan Kiszka
@ 2014-11-10 20:18 ` Gilles Chanteperdrix
2014-11-10 20:22 ` Jan Kiszka
2014-11-10 20:23 ` Gilles Chanteperdrix
1 sibling, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 20:18 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Mon, Nov 10, 2014 at 09:17:18PM +0100, Jan Kiszka wrote:
> That's why we support this on x86.

But at what cost?

--
Gilles.

* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:18 ` Gilles Chanteperdrix
@ 2014-11-10 20:22 ` Jan Kiszka
0 siblings, 0 replies; 47+ messages in thread
From: Jan Kiszka @ 2014-11-10 20:22 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2014-11-10 21:18, Gilles Chanteperdrix wrote:
> On Mon, Nov 10, 2014 at 09:17:18PM +0100, Jan Kiszka wrote:
>> That's why we support this on x86.
>
> But at what cost?

The last times I touched that arch: none. It's apparently mature and
doesn't break on updates. Not sure if Philippe stumbled over something
in the past years, though.

Jan

* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:17 ` Jan Kiszka
2014-11-10 20:18 ` Gilles Chanteperdrix
@ 2014-11-10 20:23 ` Gilles Chanteperdrix
2014-11-10 20:28 ` Jan Kiszka
1 sibling, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 20:23 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Mon, Nov 10, 2014 at 09:17:18PM +0100, Jan Kiszka wrote:
> On 2014-11-10 21:14, Gilles Chanteperdrix wrote:
> > My idea of all this is that CONFIG_TRACE_IRQFLAGS should depend on
> > !IPIPE, since the I-pipe tracer provides the same functionality. And
> > is not broken.
>
> No, the I-pipe trace does not provide a Linux lock dependency checker,
> nor does it support might_sleep and such. If you have Linux drivers
> which depend on Xenomai directly or indirectly, you cannot validate them
> anymore. That's why we support this on x86.

Since the I-pipe is already keeping track of irq state with
CONFIG_IPIPE_TRACE_IRQSOFF, can we not use that information instead
of trying to use this trace_hardirqs stuff, which looks
irremediably broken to me?

--
Gilles.

* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 20:23 ` Gilles Chanteperdrix @ 2014-11-10 20:28 ` Jan Kiszka 2014-11-10 20:37 ` Gilles Chanteperdrix 0 siblings, 1 reply; 47+ messages in thread From: Jan Kiszka @ 2014-11-10 20:28 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2014-11-10 21:23, Gilles Chanteperdrix wrote: > On Mon, Nov 10, 2014 at 09:17:18PM +0100, Jan Kiszka wrote: >> On 2014-11-10 21:14, Gilles Chanteperdrix wrote: >>> On Mon, Nov 10, 2014 at 09:10:31PM +0100, Jan Kiszka wrote: >>>> On 2014-11-10 21:06, Gilles Chanteperdrix wrote: >>>>> On Mon, Nov 10, 2014 at 09:02:58PM +0100, Jan Kiszka wrote: >>>>>> On 2014-11-10 21:00, Gilles Chanteperdrix wrote: >>>>>>> On Mon, Nov 10, 2014 at 08:55:26PM +0100, Jan Kiszka wrote: >>>>>>>> On 2014-11-10 20:46, Gilles Chanteperdrix wrote: >>>>>>>>> On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote: >>>>>>>>>> On 2014-11-10 16:56, Gilles Chanteperdrix wrote: >>>>>>>>>>> On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote: >>>>>>>>>>>> On 2014-11-10 13:43, Gilles Chanteperdrix wrote: >>>>>>>>>>>>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Gilles, >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Do you have the same message with exactly the same kernel >>>>>>>>>>>>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? >>>>>>>>>>>>>> >>>>>>>>>>>>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not >>>>>>>>>>>>>> appear on boot-up. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Do you have FCSE enabled? If yes, did you try disabling it? same >>>>>>>>>>>>>>> with unlocked context switch. >>>>>>>>>>>>>> >>>>>>>>>>>>>> FCSE is already disabled at all. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Do you have an idea how to overcome the problem? >>>>>>>>>>>>> >>>>>>>>>>>>> I am not sure the lockdep message really is a problem. 
lockdep could >>>>>>>>>>>>> be confused by the fact that the hardware interrupts are not off >>>>>>>>>>>>> when running the I-pipe, or because we are missing some bit in the >>>>>>>>>>>>> I-pipe arm specific code to get it looking at the virtual mask >>>>>>>>>>>>> instead of the hardware mask. >>>>>>>>>>>>> >>>>>>>>>>>>> As for the scheduling while atomic and random segmentation fault, >>>>>>>>>>>>> you should use the I-pipe tracer, configure it with enough back >>>>>>>>>>>>> trace points, something like 1000 or 10000, and trigger a trace >>>>>>>>>>>>> freeze in the kernell code when the problem happens. >>>>>>>>>>>>> >>>>>>>>>>>>> Also, for the "scheduling while atomic", it may happen if you call >>>>>>>>>>>>> some Linux service which reschedules from primary mode, you can try >>>>>>>>>>>>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try >>>>>>>>>>>>> and catch such mistakes. This is especially important if you are >>>>>>>>>>>>> running a custom skin. >>>>>>>>>>>> >>>>>>>>>>>> "Scheduling while atomic" may have the same reason why lockdep stumbles: >>>>>>>>>>>> some changes of I-pipe messe up with IRQ state tracing of Linux. I just >>>>>>>>>>>> started to look into this issue again. We tried earlier but got distracted. >>>>>>>>>>> >>>>>>>>>>> I doubt that very much. Though I never run with lockdep, I sometimes >>>>>>>>>>> run with CONFIG_PREEMPT, and never saw this message. From what I can >>>>>>>>>>> see, the "scheduling while atomic" message is based on the >>>>>>>>>>> preempt_count only and does not use irqs_disabled() (which by the >>>>>>>>>>> way is known to work with I-pipe on ARM as well, so, if something is >>>>>>>>>>> broken, that should be something more obscure). >>>>>>>>>> >>>>>>>>>> Let's see. 
I think I've identified one wrong path: >>>>>>>>>> >>>>>>>>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S >>>>>>>>>> index d32f8bd..ab911f8 100644 >>>>>>>>>> --- a/arch/arm/kernel/entry-header.S >>>>>>>>>> +++ b/arch/arm/kernel/entry-header.S >>>>>>>>>> @@ -198,7 +198,10 @@ >>>>>>>>>> #ifdef CONFIG_TRACE_IRQFLAGS >>>>>>>>>> @ The parent context IRQs must have been enabled to get here in >>>>>>>>>> @ the first place, so there's no point checking the PSR I bit. >>>>>>>>>> - bl trace_hardirqs_on >>>>>>>>>> + tst \rpsr, #PSR_I_BIT >>>>>>>>>> + bleq trace_hardirqs_off >>>>>>>>>> + tst \rpsr, #PSR_I_BIT >>>>>>>>>> + blne trace_hardirqs_on >>>>>>>>>> #endif >>>>>>>>>> .else >>>>>>>>>> @ IRQs off again before pulling preserved data off the stack >>>>>>>>>> >>>>>>>>>> This is probably no fix, but a with that change applied, the warning is >>>>>>>>>> gone. Now the question is what to really test for when returning here. I >>>>>>>>>> suppose we want the pipeline state of root here - should I >>>>>>>>>> __ipipe_check_root_interruptible? >>>>>>>>> >>>>>>>>> This does not make sense, read the comment above that change: there >>>>>>>>> is no way an interrupt can be taken, and so entering svc_entry, with >>>>>>>>> interrupts off. Besides this is mainline code, so it would be a >>>>>>>>> problem for mainline too. We are necessarily returning to a place >>>>>>>>> where hardware irqs were on. >>>>>>>> >>>>>>>> Did you also look at the trace I posted? >>>>>>> >>>>>>> Yes, but I did not see what I am supposed to see. The only thing I >>>>>>> see is that these trace functions should never have been called from >>>>>>> rt domain in the first place. >>>>>>> >>>>>> >>>>>> There is no RT domain in the trace, only an inconsistent Linux trace >>>>>> state after return from IRQ. >>>>> >>>>> What can I say, when returning from IRQ, you are necessarily >>>>> returning to a point where irqs are ON, as the comment says, and it >>>>> makes perfect sense. 
So your "fix" should be a nop. So, something >>>>> else is broken. >>>> >>>> The test is for selecting trace_hardirqs_off/on is wrong, that's why I >>>> was asking for a better check. Also, if that path can be taken by RT >>>> domains as well, calling trace_hardirqs_off/on was always wrong, and we >>>> additionally need to check for the caller's domain. >>>> >>>>> >>>>>> >>>>>>> Note that the fact that this trace_irqs stuff is not working well >>>>>>> may be the fact that part of them are commented with CONFIG_IPIPE >>>>>>> (see asm_trace_hardirqs_on_cond, asm_trace_hardirqs_off) >>>>>> >>>>>> No, that doesn't solve all issues. Even with my hack (which may not >>>>>> address all cases properly) plus the reversion of that commit, there are >>>>>> still inconsistencies. >>>>> >>>>> You can not reverse that commit, otherwise you will end-up calling >>>>> trace_hardirqs_on/trace_hardirqs_off from RT domain, which, I >>>>> repeat, can not work. >>>> >>>> I can help to understand if that is sufficient to resolve the tracing >>>> breakage - it isn't, there are more paths missing or wrongly instrumented. >>> >>> My idea of all this is that CONFIG_TRACE_IRQFLAGS should depend on >>> !IPIPE, since the I-pipe tracer provides the same functionality. And >>> is not broken. >> >> No, the I-pipe trace does not provide a Linux lock dependency checker, >> nor does it support might_sleep and such. If you have Linux drivers >> which depend on Xenomai directly or indirectly, you cannot validate them >> anymore. That's why we support this on x86. > > Since the I-pipe is already keeping track of irq state with > CONFIG_IPIPE_TRACE_IRQSOFF, can we not use that information instead > of trying and using this trace_hardirqs stuff which looks > irremediably broken to me? The former reflects the hw state, the latter traces the Linux state - from Linux POV. This is fixable. 
We just need to call the tracing functions where Linux
would call it or where we replaced some Linux call with an I-pipe
specific path and avoid calling it when the domain != root. Identifying
those spots is tricky.

Jan
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:28 ` Jan Kiszka
@ 2014-11-10 20:37 ` Gilles Chanteperdrix
2014-11-10 20:42 ` Jan Kiszka
0 siblings, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 20:37 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org
On Mon, Nov 10, 2014 at 09:28:50PM +0100, Jan Kiszka wrote:
> On 2014-11-10 21:23, Gilles Chanteperdrix wrote:
> > Since the I-pipe is already keeping track of irq state with
> > CONFIG_IPIPE_TRACE_IRQSOFF, can we not use that information instead
> > of trying and using this trace_hardirqs stuff which looks
> > irremediably broken to me?
>
> The former reflects the hw state, the latter traces the Linux state -
> from Linux POV.

The I-pipe tracer keeps track of the root domain stall bit as well.

> This is fixable. We just need to call the tracing functions where Linux
> would call it or where we replaced some Linux call with an I-pipe
> specific path and avoid calling it when the domain != root. Identifying
> those spots is tricky.

If we take the example of an irq, we probably want not to call
trace_hardirqs_on/trace_hardirqs_off anywhere, and just rely on the
root domain stall bit.

-- 
Gilles.
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:37 ` Gilles Chanteperdrix
@ 2014-11-10 20:42 ` Jan Kiszka
2014-11-10 20:55 ` Gilles Chanteperdrix
0 siblings, 1 reply; 47+ messages in thread
From: Jan Kiszka @ 2014-11-10 20:42 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org
On 2014-11-10 21:37, Gilles Chanteperdrix wrote:
> On Mon, Nov 10, 2014 at 09:28:50PM +0100, Jan Kiszka wrote:
>> The former reflects the hw state, the latter traces the Linux state -
>> from Linux POV.
>
> The I-pipe tracer keeps track of the root domain stall bit as well.
>
>> This is fixable. We just need to call the tracing functions where Linux
>> would call it or where we replaced some Linux call with an I-pipe
>> specific path and avoid calling it when the domain != root. Identifying
>> those spots is tricky.
>
> If we take the example of an irq, we probably want not to call
> trace_hardirqs_on/trace_hardirqs_off anywhere, and just rely on the
> root domain stall bit.

Linux tracks the IRQ state separately from the (now virtualized) real
state - to validate the consistency independently of some spurious hard
irq enable/disable. And it tracks per task, not per CPU. It will be more
messy to fake this than to fix it, I'm quite sure.

Jan
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 20:42 ` Jan Kiszka
@ 2014-11-10 20:55 ` Gilles Chanteperdrix
2014-11-10 21:58 ` Gilles Chanteperdrix
0 siblings, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-10 20:55 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org
On Mon, Nov 10, 2014 at 09:42:22PM +0100, Jan Kiszka wrote:
> On 2014-11-10 21:37, Gilles Chanteperdrix wrote:
> > If we take the example of an irq, we probably want not to call
> > trace_hardirqs_on/trace_hardirqs_off anywhere, and just rely on the
> > root domain stall bit.
>
> Linux tracks the IRQ state separately from the (now virtualized) real
> state - to validate the consistency independently of some spurious hard
> irq enable/disable. And it tracks per task, not per CPU. It will be more
> messy to fake this than to fix it, I'm quite sure.

If we take the example of irq_svc (the example you patched), we have 4
cases:
1- entry over root, exit over root
2- entry over root, exit over non root
3- entry over non root, exit over non root
4- entry over non root, exit over root

For all these cases, currently, trace_hardirqs_off is called on entry,
and trace_hardirqs_on is called on exit.

Case 1: trace_hardirqs_off on entry may be right, but may in fact be
useless if the root domain is already stalled; trace_hardirqs_on on exit
may be wrong if the root domain is stalled (so it is right only if
trace_hardirqs_off was not a nop on entry).

Case 2: trace_hardirqs_off on entry: same as case 1; trace_hardirqs_on
on exit is always wrong: we are now running in the RT domain and should
not touch the root domain irq mask.

Case 3: wrong on entry, wrong on exit.

Case 4: trace_hardirqs_off on entry is wrong: we are not running in the
root domain. trace_hardirqs_on on exit may be right, if the root domain
was not stalled when we took the interrupt that put us in the RT domain.

-- 
Gilles.
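The four cases above can be read as a small decision table. The C sketch
below is not I-pipe or kernel code; it only models one possible reading of
the analysis, under the assumption that a trace call is legitimate only
when the code runs over the root domain and the root domain was not
stalled when the interrupt was taken. The names enum domain,
want_trace_off_on_entry and want_trace_on_on_exit are hypothetical and
exist only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the two kinds of pipeline domain discussed above. */
enum domain {
    DOMAIN_ROOT,     /* the Linux (root) domain */
    DOMAIN_NONROOT   /* the real-time domain */
};

/*
 * Would calling trace_hardirqs_off on IRQ entry be meaningful?
 * Assumption: only when the IRQ is taken over the root domain and the
 * root domain was not already stalled (otherwise the call is redundant,
 * or it touches Linux state from the wrong domain).
 */
bool want_trace_off_on_entry(enum domain entry, bool root_stalled)
{
    return entry == DOMAIN_ROOT && !root_stalled;
}

/*
 * Would calling trace_hardirqs_on on IRQ exit be meaningful?
 * Assumption: only when the exit happens over the root domain and the
 * root domain was not stalled when the interrupt was taken.
 */
bool want_trace_on_on_exit(enum domain exit_domain, bool root_stalled)
{
    return exit_domain == DOMAIN_ROOT && !root_stalled;
}
```

Under these assumptions, case 3 never touches the Linux IRQ trace state,
and case 1 with a stalled root domain skips both calls. Whether this is
the right policy for the real entry code is exactly what the thread is
debating.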
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 20:55 ` Gilles Chanteperdrix @ 2014-11-10 21:58 ` Gilles Chanteperdrix 2014-11-12 17:27 ` Gilles Chanteperdrix 0 siblings, 1 reply; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-10 21:58 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On Mon, Nov 10, 2014 at 09:55:12PM +0100, Gilles Chanteperdrix wrote: > On Mon, Nov 10, 2014 at 09:42:22PM +0100, Jan Kiszka wrote: > > On 2014-11-10 21:37, Gilles Chanteperdrix wrote: > > > On Mon, Nov 10, 2014 at 09:28:50PM +0100, Jan Kiszka wrote: > > >> On 2014-11-10 21:23, Gilles Chanteperdrix wrote: > > >>> On Mon, Nov 10, 2014 at 09:17:18PM +0100, Jan Kiszka wrote: > > >>>> On 2014-11-10 21:14, Gilles Chanteperdrix wrote: > > >>>>> On Mon, Nov 10, 2014 at 09:10:31PM +0100, Jan Kiszka wrote: > > >>>>>> On 2014-11-10 21:06, Gilles Chanteperdrix wrote: > > >>>>>>> On Mon, Nov 10, 2014 at 09:02:58PM +0100, Jan Kiszka wrote: > > >>>>>>>> On 2014-11-10 21:00, Gilles Chanteperdrix wrote: > > >>>>>>>>> On Mon, Nov 10, 2014 at 08:55:26PM +0100, Jan Kiszka wrote: > > >>>>>>>>>> On 2014-11-10 20:46, Gilles Chanteperdrix wrote: > > >>>>>>>>>>> On Mon, Nov 10, 2014 at 07:29:58PM +0100, Jan Kiszka wrote: > > >>>>>>>>>>>> On 2014-11-10 16:56, Gilles Chanteperdrix wrote: > > >>>>>>>>>>>>> On Mon, Nov 10, 2014 at 03:52:41PM +0100, Jan Kiszka wrote: > > >>>>>>>>>>>>>> On 2014-11-10 13:43, Gilles Chanteperdrix wrote: > > >>>>>>>>>>>>>>> On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Hi Gilles, > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Do you have the same message with exactly the same kernel > > >>>>>>>>>>>>>>>>> configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not > > >>>>>>>>>>>>>>>> appear on boot-up. 
> > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Do you have FCSE enabled? If yes, did you try disabling it? same > > >>>>>>>>>>>>>>>>> with unlocked context switch. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> FCSE is already disabled at all. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Do you have an idea how to overcome the problem? > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> I am not sure the lockdep message really is a problem. lockdep could > > >>>>>>>>>>>>>>> be confused by the fact that the hardware interrupts are not off > > >>>>>>>>>>>>>>> when running the I-pipe, or because we are missing some bit in the > > >>>>>>>>>>>>>>> I-pipe arm specific code to get it looking at the virtual mask > > >>>>>>>>>>>>>>> instead of the hardware mask. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> As for the scheduling while atomic and random segmentation fault, > > >>>>>>>>>>>>>>> you should use the I-pipe tracer, configure it with enough back > > >>>>>>>>>>>>>>> trace points, something like 1000 or 10000, and trigger a trace > > >>>>>>>>>>>>>>> freeze in the kernell code when the problem happens. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Also, for the "scheduling while atomic", it may happen if you call > > >>>>>>>>>>>>>>> some Linux service which reschedules from primary mode, you can try > > >>>>>>>>>>>>>>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try > > >>>>>>>>>>>>>>> and catch such mistakes. This is especially important if you are > > >>>>>>>>>>>>>>> running a custom skin. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> "Scheduling while atomic" may have the same reason why lockdep stumbles: > > >>>>>>>>>>>>>> some changes of I-pipe messe up with IRQ state tracing of Linux. I just > > >>>>>>>>>>>>>> started to look into this issue again. We tried earlier but got distracted. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I doubt that very much. Though I never run with lockdep, I sometimes > > >>>>>>>>>>>>> run with CONFIG_PREEMPT, and never saw this message. 
From what I can > > >>>>>>>>>>>>> see, the "scheduling while atomic" message is based on the > > >>>>>>>>>>>>> preempt_count only and does not use irqs_disabled() (which by the > > >>>>>>>>>>>>> way is known to work with I-pipe on ARM as well, so, if something is > > >>>>>>>>>>>>> broken, that should be something more obscure). > > >>>>>>>>>>>> > > >>>>>>>>>>>> Let's see. I think I've identified one wrong path: > > >>>>>>>>>>>> > > >>>>>>>>>>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S > > >>>>>>>>>>>> index d32f8bd..ab911f8 100644 > > >>>>>>>>>>>> --- a/arch/arm/kernel/entry-header.S > > >>>>>>>>>>>> +++ b/arch/arm/kernel/entry-header.S > > >>>>>>>>>>>> @@ -198,7 +198,10 @@ > > >>>>>>>>>>>> #ifdef CONFIG_TRACE_IRQFLAGS > > >>>>>>>>>>>> @ The parent context IRQs must have been enabled to get here in > > >>>>>>>>>>>> @ the first place, so there's no point checking the PSR I bit. > > >>>>>>>>>>>> - bl trace_hardirqs_on > > >>>>>>>>>>>> + tst \rpsr, #PSR_I_BIT > > >>>>>>>>>>>> + bleq trace_hardirqs_off > > >>>>>>>>>>>> + tst \rpsr, #PSR_I_BIT > > >>>>>>>>>>>> + blne trace_hardirqs_on > > >>>>>>>>>>>> #endif > > >>>>>>>>>>>> .else > > >>>>>>>>>>>> @ IRQs off again before pulling preserved data off the stack > > >>>>>>>>>>>> > > >>>>>>>>>>>> This is probably no fix, but a with that change applied, the warning is > > >>>>>>>>>>>> gone. Now the question is what to really test for when returning here. I > > >>>>>>>>>>>> suppose we want the pipeline state of root here - should I > > >>>>>>>>>>>> __ipipe_check_root_interruptible? > > >>>>>>>>>>> > > >>>>>>>>>>> This does not make sense, read the comment above that change: there > > >>>>>>>>>>> is no way an interrupt can be taken, and so entering svc_entry, with > > >>>>>>>>>>> interrupts off. Besides this is mainline code, so it would be a > > >>>>>>>>>>> problem for mainline too. We are necessarily returning to a place > > >>>>>>>>>>> where hardware irqs were on. 
> > >>>>>>>>>> > > >>>>>>>>>> Did you also look at the trace I posted? > > >>>>>>>>> > > >>>>>>>>> Yes, but I did not see what I am supposed to see. The only thing I > > >>>>>>>>> see is that these trace functions should never have been called from > > >>>>>>>>> rt domain in the first place. > > >>>>>>>>> > > >>>>>>>> > > >>>>>>>> There is no RT domain in the trace, only an inconsistent Linux trace > > >>>>>>>> state after return from IRQ. > > >>>>>>> > > >>>>>>> What can I say, when returning from IRQ, you are necessarily > > >>>>>>> returning to a point where irqs are ON, as the comment says, and it > > >>>>>>> makes perfect sense. So your "fix" should be a nop. So, something > > >>>>>>> else is broken. > > >>>>>> > > >>>>>> The test is for selecting trace_hardirqs_off/on is wrong, that's why I > > >>>>>> was asking for a better check. Also, if that path can be taken by RT > > >>>>>> domains as well, calling trace_hardirqs_off/on was always wrong, and we > > >>>>>> additionally need to check for the caller's domain. > > >>>>>> > > >>>>>>> > > >>>>>>>> > > >>>>>>>>> Note that the fact that this trace_irqs stuff is not working well > > >>>>>>>>> may be the fact that part of them are commented with CONFIG_IPIPE > > >>>>>>>>> (see asm_trace_hardirqs_on_cond, asm_trace_hardirqs_off) > > >>>>>>>> > > >>>>>>>> No, that doesn't solve all issues. Even with my hack (which may not > > >>>>>>>> address all cases properly) plus the reversion of that commit, there are > > >>>>>>>> still inconsistencies. > > >>>>>>> > > >>>>>>> You can not reverse that commit, otherwise you will end-up calling > > >>>>>>> trace_hardirqs_on/trace_hardirqs_off from RT domain, which, I > > >>>>>>> repeat, can not work. > > >>>>>> > > >>>>>> I can help to understand if that is sufficient to resolve the tracing > > >>>>>> breakage - it isn't, there are more paths missing or wrongly instrumented. 
> > >>>>> > > >>>>> My idea of all this is that CONFIG_TRACE_IRQFLAGS should depend on > > >>>>> !IPIPE, since the I-pipe tracer provides the same functionality. And > > >>>>> is not broken. > > >>>> > > >>>> No, the I-pipe trace does not provide a Linux lock dependency checker, > > >>>> nor does it support might_sleep and such. If you have Linux drivers > > >>>> which depend on Xenomai directly or indirectly, you cannot validate them > > >>>> anymore. That's why we support this on x86. > > >>> > > >>> Since the I-pipe is already keeping track of irq state with > > >>> CONFIG_IPIPE_TRACE_IRQSOFF, can we not use that information instead > > >>> of trying and using this trace_hardirqs stuff which looks > > >>> irremediably broken to me? > > >> > > >> The former reflects the hw state, the latter traces the Linux state - > > >> from Linux POV. > > > > > > The I-pipe tracer keeps track of the root domain stall bit as well. > > > > > >> > > >> This is fixable. We just need to call the tracing functions where Linux > > >> would call it or where we replaced some Linux call with an I-pipe > > >> specific path and avoid calling it when the domain != root. Identifying > > >> those spots is tricky. > > > > > > If we take the example of an irq, we probably want not to call > > > trace_hardirqs_on/trace_hardirqs_off anywhere, and just rely on the > > > root domain stall bit. > > > > Linux tracks the IRQ state separately from the (now virtualized) real > > state - to validate the consistency independently of some spurious hard > > irq enable/disable. And it tracks per task, not per CPU. It will be more > > messy to fake this than to fix it, I'm quite sure. > > If we take the example of irq_svc (the example you patched). We have > 4 cases: > > 1- entry over root, exit over root > 2- entry over root, exit over non root > 3- entry over non root, exit over non root > 4- entry over non root, exit over root Sorry, it does not work like that. Only case 1 and 3 make sense. 
Case 3 is easy, we do not need to call the trace_hardirqs functions.
For case 1, I guess the trace_hardirqs_on at the end must be
replaced with a test of the root domain stall bit, and call
trace_hardirqs_on only if we return to a non-stalled root.

--
Gilles.

^ permalink raw reply [flat|nested] 47+ messages in thread
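The case analysis above (only "root to root" and "non root to non root" are meaningful, and the root case depends on the stall bit) can be modelled in a few lines of plain C. This is only a user-space sketch of the decision, not I-pipe code: the `enum domain`, `struct irq_exit_ctx` and `should_trace_hardirqs_on()` names are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for I-pipe state; not the real kernel API. */
enum domain { DOMAIN_ROOT, DOMAIN_HEAD };

struct irq_exit_ctx {
	enum domain interrupted; /* domain that was running when the IRQ hit */
	bool root_stalled;       /* root domain stall bit (virtual IRQ mask) */
};

/*
 * Decide whether the IRQ exit path should report "hardirqs on" to the
 * Linux IRQ-state tracer.
 */
static bool should_trace_hardirqs_on(const struct irq_exit_ctx *ctx)
{
	/* Case 3: entry and exit over a non-root domain, Linux tracing
	 * is not involved at all. */
	if (ctx->interrupted != DOMAIN_ROOT)
		return false;

	/* Case 1: entry and exit over root, report "on" only when we
	 * return to a root domain that is not stalled. */
	return !ctx->root_stalled;
}
```

The point of the sketch is that the hardware PSR I bit never appears in the decision: from the Linux point of view, the virtual (stall bit) state is what counts.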
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-10 21:58 ` Gilles Chanteperdrix
@ 2014-11-12 17:27 ` Gilles Chanteperdrix
2014-11-17 16:48 ` Jan Kiszka
0 siblings, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-12 17:27 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Mon, Nov 10, 2014 at 10:58:46PM +0100, Gilles Chanteperdrix wrote:
> On Mon, Nov 10, 2014 at 09:55:12PM +0100, Gilles Chanteperdrix wrote:
> > If we take the example of irq_svc (the example you patched). We have
> > 4 cases:
> >
> > 1- entry over root, exit over root
> > 2- entry over root, exit over non root
> > 3- entry over non root, exit over non root
> > 4- entry over non root, exit over root
>
> Sorry, it does not work like that. Only case 1 and 3 make sense.
> Case 3 is easy, we do not need to call the trace_hardirqs functions.
> For case 1, I guess the trace_hardirqs_on at the end must be
> replaced with a test of the root domain stall bit, and call
> trace_hardirqs_on only if we return to a non-stalled root.

We do not need trace_hardirqs_on and trace_hardirqs_off for the
particular case of IRQs: they are already handled by
__ipipe_do_sync_stage.

--
Gilles.

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-12 17:27 ` Gilles Chanteperdrix
@ 2014-11-17 16:48 ` Jan Kiszka
2014-11-17 16:59 ` Gilles Chanteperdrix
0 siblings, 1 reply; 47+ messages in thread
From: Jan Kiszka @ 2014-11-17 16:48 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2014-11-12 18:27, Gilles Chanteperdrix wrote:
> We do not need trace_hardirqs_on and trace_hardirqs_off for the
> particular case of IRQs: they are already handled by
> __ipipe_do_sync_stage.

That was the key: Simply disabling the instrumentations in the
CONFIG_IPIPE case removes all lock state inconsistencies, at least this far:

diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index d32f8bd..d8e0b2c 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -195,7 +195,7 @@
 #ifdef CONFIG_IPIPE_DEBUG_INTERNAL
 	bl	__ipipe_bugon_irqs_enabled
 #endif
-#ifdef CONFIG_TRACE_IRQFLAGS
+#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE)
 	@ The parent context IRQs must have been enabled to get here in
 	@ the first place, so there's no point checking the PSR I bit.
 	bl	trace_hardirqs_on
@@ -203,7 +203,7 @@
 	.else
 	@ IRQs off again before pulling preserved data off the stack
 	disable_irq_notrace
-#ifdef CONFIG_TRACE_IRQFLAGS
+#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE)
 	tst	\rpsr, #PSR_I_BIT
 	bleq	trace_hardirqs_on
 	tst	\rpsr, #PSR_I_BIT

Will send a patch.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 16:48 ` Jan Kiszka
@ 2014-11-17 16:59 ` Gilles Chanteperdrix
2014-11-17 17:11 ` Jan Kiszka
0 siblings, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-17 16:59 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Mon, Nov 17, 2014 at 05:48:00PM +0100, Jan Kiszka wrote:
> On 2014-11-12 18:27, Gilles Chanteperdrix wrote:
> > We do not need trace_hardirqs_on and trace_hardirqs_off for the
> > particular case of IRQs: they are already handled by
> > __ipipe_do_sync_stage.
>
> That was the key: Simply disabling the instrumentations in the
> CONFIG_IPIPE case removes all lock state inconsistencies, at least this far:
>
> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
> index d32f8bd..d8e0b2c 100644
> --- a/arch/arm/kernel/entry-header.S
> +++ b/arch/arm/kernel/entry-header.S
> @@ -195,7 +195,7 @@
>  #ifdef CONFIG_IPIPE_DEBUG_INTERNAL
>  	bl	__ipipe_bugon_irqs_enabled
>  #endif
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE)
>  	@ The parent context IRQs must have been enabled to get here in
>  	@ the first place, so there's no point checking the PSR I bit.
>  	bl	trace_hardirqs_on
> @@ -203,7 +203,7 @@
>  	.else
>  	@ IRQs off again before pulling preserved data off the stack
>  	disable_irq_notrace
> -#ifdef CONFIG_TRACE_IRQFLAGS
> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE)
>  	tst	\rpsr, #PSR_I_BIT
>  	bleq	trace_hardirqs_on
>  	tst	\rpsr, #PSR_I_BIT
>
> Will send a patch.

Will this work for other paths in entry.S, such as exceptions or
syscalls?

--
Gilles.

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 16:59 ` Gilles Chanteperdrix
@ 2014-11-17 17:11 ` Jan Kiszka
2014-11-17 17:33 ` Gilles Chanteperdrix
0 siblings, 1 reply; 47+ messages in thread
From: Jan Kiszka @ 2014-11-17 17:11 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 2014-11-17 17:59, Gilles Chanteperdrix wrote:
> On Mon, Nov 17, 2014 at 05:48:00PM +0100, Jan Kiszka wrote:
>> On 2014-11-12 18:27, Gilles Chanteperdrix wrote:
>>> We do not need trace_hardirqs_on and trace_hardirqs_off for the
>>> particular case of IRQs: they are already handled by
>>> __ipipe_do_sync_stage.
>>
>> That was the key: Simply disabling the instrumentations in the
>> CONFIG_IPIPE case removes all lock state inconsistencies, at least this far:
>>
>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
>> index d32f8bd..d8e0b2c 100644
>> --- a/arch/arm/kernel/entry-header.S
>> +++ b/arch/arm/kernel/entry-header.S
>> @@ -195,7 +195,7 @@
>>  #ifdef CONFIG_IPIPE_DEBUG_INTERNAL
>>  	bl	__ipipe_bugon_irqs_enabled
>>  #endif
>> -#ifdef CONFIG_TRACE_IRQFLAGS
>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE)
>>  	@ The parent context IRQs must have been enabled to get here in
>>  	@ the first place, so there's no point checking the PSR I bit.
>>  	bl	trace_hardirqs_on
>> @@ -203,7 +203,7 @@
>>  	.else
>>  	@ IRQs off again before pulling preserved data off the stack
>>  	disable_irq_notrace
>> -#ifdef CONFIG_TRACE_IRQFLAGS
>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE)
>>  	tst	\rpsr, #PSR_I_BIT
>>  	bleq	trace_hardirqs_on
>>  	tst	\rpsr, #PSR_I_BIT
>>
>> Will send a patch.
>
> Will this work for other paths in entry.S, such as exceptions or
> syscalls?

Do they all come along that code? Then we need to differentiate, likely
via a separate macro parameter.

Just noticed that there is also svc_enter, and that should be handled in
the same way. And it's likely also shared across the board.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 17:11 ` Jan Kiszka
@ 2014-11-17 17:33 ` Gilles Chanteperdrix
2014-11-17 19:07 ` Jan Kiszka
0 siblings, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-17 17:33 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai@xenomai.org

On Mon, Nov 17, 2014 at 06:11:44PM +0100, Jan Kiszka wrote:
> On 2014-11-17 17:59, Gilles Chanteperdrix wrote:
> > On Mon, Nov 17, 2014 at 05:48:00PM +0100, Jan Kiszka wrote:
> >> On 2014-11-12 18:27, Gilles Chanteperdrix wrote:
> >>> We do not need trace_hardirqs_on and trace_hardirqs_off for the
> >>> particular case of IRQs: they are already handled by
> >>> __ipipe_do_sync_stage.
> >>
> >> That was the key: Simply disabling the instrumentations in the
> >> CONFIG_IPIPE case removes all lock state inconsistencies, at least this far:
> >>
> >> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
> >> index d32f8bd..d8e0b2c 100644
> >> --- a/arch/arm/kernel/entry-header.S
> >> +++ b/arch/arm/kernel/entry-header.S
> >> @@ -195,7 +195,7 @@
> >>  #ifdef CONFIG_IPIPE_DEBUG_INTERNAL
> >>  	bl	__ipipe_bugon_irqs_enabled
> >>  #endif
> >> -#ifdef CONFIG_TRACE_IRQFLAGS
> >> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE)
> >>  	@ The parent context IRQs must have been enabled to get here in
> >>  	@ the first place, so there's no point checking the PSR I bit.
> >>  	bl	trace_hardirqs_on
> >> @@ -203,7 +203,7 @@
> >>  	.else
> >>  	@ IRQs off again before pulling preserved data off the stack
> >>  	disable_irq_notrace
> >> -#ifdef CONFIG_TRACE_IRQFLAGS
> >> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE)
> >>  	tst	\rpsr, #PSR_I_BIT
> >>  	bleq	trace_hardirqs_on
> >>  	tst	\rpsr, #PSR_I_BIT
> >>
> >> Will send a patch.
> >
> > Will this work for other paths in entry.S, such as exceptions or
> > syscalls?
>
> Do they all come along that code? Then we need to differentiate, likely
> via a separate macro parameter.
>
> Just noticed that there is also svc_enter, and that should be handled in
> the same way. And it's likely also shared across the board.

There are 4 macros:
svc_enter
svc_exit
when entering/exiting svc mode (whether from irq, data abort,
prefetch abort), that means reentering the
irq/exception path when already in kernel mode

usr_enter
usr_exit
when entering/exiting usr mode (whether from irq, data abort,
prefetch abort, or syscall), which is entered from user mode.

All these paths call trace_hardirqs_on/trace_hardirqs_off.
I have not checked the details on the how and when and if, but since
you are the one working on this, I suggest you do.

If there is a need to call the real
trace_hardirqs_on/trace_hardirqs_off in some cases, I would very
much prefer replacing the bl trace_hardirqs* with a bl
__ipipe_trace_hardirqs*, sorting out the details in C, in
arch/arm/kernel/ipipe.c, than doing this in assembly files with
complicated #if conditions, or retrieval of the current domain
in assembly.

--
Gilles.

^ permalink raw reply [flat|nested] 47+ messages in thread
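The suggestion above (have the assembly branch to a C wrapper that decides whether the real tracing function runs) can be illustrated with a small user-space mock. Everything here is an assumption made for illustration: `mock_running_over_root` stands in for the I-pipe's "current domain is root" test, `mock_trace_hardirqs_on/off` stand in for the real tracer, and the `sketch_ipipe_trace_hardirqs_*` names are hypothetical, loosely following the `__ipipe_trace_hardirqs*` naming proposed in the message. It is not the code that ended up in arch/arm/kernel/ipipe.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock state: in a kernel these would come from the I-pipe and lockdep. */
static bool mock_running_over_root = true; /* stand-in for "domain == root" */
static int mock_hardirqs_enabled = 0;      /* the tracer's view of IRQ state */

/* Mock tracer: records the last state reported to it. */
static void mock_trace_hardirqs_on(void)  { mock_hardirqs_enabled = 1; }
static void mock_trace_hardirqs_off(void) { mock_hardirqs_enabled = 0; }

/*
 * Hypothetical wrappers that assembly would call instead of the tracer.
 * The domain check lives in C, so the entry code needs no extra
 * #if conditions and no domain lookup in assembly.
 */
void sketch_ipipe_trace_hardirqs_on(void)
{
	if (!mock_running_over_root)
		return; /* non-root domain: leave the Linux tracer state alone */
	mock_trace_hardirqs_on();
}

void sketch_ipipe_trace_hardirqs_off(void)
{
	if (!mock_running_over_root)
		return; /* non-root domain: leave the Linux tracer state alone */
	mock_trace_hardirqs_off();
}
```

With this shape, a call made while a non-root domain is running leaves the tracer's recorded state untouched, which is the property the thread is after.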
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-17 17:33 ` Gilles Chanteperdrix @ 2014-11-17 19:07 ` Jan Kiszka 2014-11-17 19:24 ` Gilles Chanteperdrix 0 siblings, 1 reply; 47+ messages in thread From: Jan Kiszka @ 2014-11-17 19:07 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2014-11-17 18:33, Gilles Chanteperdrix wrote: > On Mon, Nov 17, 2014 at 06:11:44PM +0100, Jan Kiszka wrote: >> On 2014-11-17 17:59, Gilles Chanteperdrix wrote: >>> On Mon, Nov 17, 2014 at 05:48:00PM +0100, Jan Kiszka wrote: >>>> On 2014-11-12 18:27, Gilles Chanteperdrix wrote: >>>>> We do not need trace_hardirqs_on and trace_hardirqs_off for the >>>>> particular case of IRQs: they are already handled by >>>>> __ipipe_do_sync_stage. >>>> >>>> That was the key: Simply disabling the instrumentations in the >>>> CONFIG_IPIPE removes all lock state inconsistencies, at least this far: >>>> >>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S >>>> index d32f8bd..d8e0b2c 100644 >>>> --- a/arch/arm/kernel/entry-header.S >>>> +++ b/arch/arm/kernel/entry-header.S >>>> @@ -195,7 +195,7 @@ >>>> #ifdef CONFIG_IPIPE_DEBUG_INTERNAL >>>> bl __ipipe_bugon_irqs_enabled >>>> #endif >>>> -#ifdef CONFIG_TRACE_IRQFLAGS >>>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) >>>> @ The parent context IRQs must have been enabled to get here in >>>> @ the first place, so there's no point checking the PSR I bit. >>>> bl trace_hardirqs_on >>>> @@ -203,7 +203,7 @@ >>>> .else >>>> @ IRQs off again before pulling preserved data off the stack >>>> disable_irq_notrace >>>> -#ifdef CONFIG_TRACE_IRQFLAGS >>>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) >>>> tst \rpsr, #PSR_I_BIT >>>> bleq trace_hardirqs_on >>>> tst \rpsr, #PSR_I_BIT >>>> >>>> Will send a patch. >>> >>> Will this work for other paths in entry.S, such as exceptions or >>> syscalls? >> >> Do they all come along that code? 
Then we need to differentiate, likely >> via a separate macro parameter. >> >> Just noticed that there is also svc_enter, and that should be handled in >> the same way. And it's likely also shared across the board. > > There are 4 macros: > svc_enter > svc_exit > when entering/exiting svc mode (whether from irq, data abort, > prefetch abort), that means reentering the > irq/exception path when already in kerne-mode > > usr_enter > usr_exit > when entering/exiting usr mode (whether from irq, data abort, > prefetch abort, or syscall), which is entered from user mode. > > All these paths call trace_hardirqs_on/trace_hardirqs_off > I have not checked the details on the how and when and if, but since > you are the one working on this, I suggest you do. > > If there is a need to call the real > trace_hardirqs_on/trace_hardirqs_off in some cases, I would very > much prefer replacing the bl trace_hard_irqs* with a bl > __ipipe_trace_hardirqs* sorting out the details in C, in > arch/arm/kernel/ipipe.c, than doing this in assembly files with > complicated #if conditions, or retrieval of the current domain > in assembly. 
> OK, here is another proposal: filter out tracing in kernel IRQ exit path (that is required as we may have interrupted Linux with virtual IRQs off), but otherwise rely on domain filtering in the respective tracing functions: diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index 102adcb..e285269 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -130,7 +130,7 @@ #endif .macro asm_trace_hardirqs_off -#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) +#if defined(CONFIG_TRACE_IRQFLAGS) stmdb sp!, {r0-r3, ip, lr} bl trace_hardirqs_off ldmia sp!, {r0-r3, ip, lr} @@ -138,7 +138,7 @@ .endm .macro asm_trace_hardirqs_on_cond, cond -#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) +#if defined(CONFIG_TRACE_IRQFLAGS) /* * actually the registers should be pushed and pop'd conditionally, but * after bl the flags are certainly clobbered diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S index d32f8bd..cf2772a 100644 --- a/arch/arm/kernel/entry-header.S +++ b/arch/arm/kernel/entry-header.S @@ -195,7 +195,7 @@ #ifdef CONFIG_IPIPE_DEBUG_INTERNAL bl __ipipe_bugon_irqs_enabled #endif -#ifdef CONFIG_TRACE_IRQFLAGS +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) @ The parent context IRQs must have been enabled to get here in @ the first place, so there's no point checking the PSR I bit. bl trace_hardirqs_on @@ -285,7 +285,7 @@ #ifdef CONFIG_IPIPE_DEBUG_INTERNAL bl __ipipe_bugon_irqs_enabled #endif -#ifdef CONFIG_TRACE_IRQFLAGS +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) @ The parent context IRQs must have been enabled to get here in @ the first place, so there's no point checking the PSR I bit. 
bl trace_hardirqs_on diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index e24bb30..2e9043b 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -2559,6 +2559,9 @@ static void __trace_hardirqs_on_caller(unsigned long ip) void trace_hardirqs_on_caller(unsigned long ip) { + if (!ipipe_root_p) + return; + time_hardirqs_on(CALLER_ADDR0, ip); if (unlikely(!debug_locks || current->lockdep_recursion)) @@ -2690,8 +2693,12 @@ void trace_softirqs_on(unsigned long ip) */ void trace_softirqs_off(unsigned long ip) { - struct task_struct *curr = current; + struct task_struct *curr; + + if (!ipipe_root_p) + return; + curr = current; if (unlikely(!debug_locks || current->lockdep_recursion)) return; diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index 2aefbee..c3ec43f 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -486,28 +486,28 @@ inline void print_irqtrace_events(struct task_struct *curr) */ void trace_hardirqs_on(void) { - if (!preempt_trace() && irq_trace()) + if (ipipe_root_p && !preempt_trace() && irq_trace()) stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1); } EXPORT_SYMBOL(trace_hardirqs_on); void trace_hardirqs_off(void) { - if (!preempt_trace() && irq_trace()) + if (ipipe_root_p && !preempt_trace() && irq_trace()) start_critical_timing(CALLER_ADDR0, CALLER_ADDR1); } EXPORT_SYMBOL(trace_hardirqs_off); void trace_hardirqs_on_caller(unsigned long caller_addr) { - if (!preempt_trace() && irq_trace()) + if (ipipe_root_p && !preempt_trace() && irq_trace()) stop_critical_timing(CALLER_ADDR0, caller_addr); } EXPORT_SYMBOL(trace_hardirqs_on_caller); void trace_hardirqs_off_caller(unsigned long caller_addr) { - if (!preempt_trace() && irq_trace()) + if (ipipe_root_p && !preempt_trace() && irq_trace()) start_critical_timing(CALLER_ADDR0, caller_addr); } EXPORT_SYMBOL(trace_hardirqs_off_caller); This works for ARM so far, need to revalidate x86, but it should work based on the concept. 
Comments? Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply related [flat|nested] 47+ messages in thread
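A note on the proposal above: the domain filtering it relies on can be sketched in user space. This is only a minimal model under stated assumptions, not the kernel code itself: model_root_p is a hypothetical stand-in for ipipe_root_p, and the model_* functions and counters are invented for illustration. It shows that, with the early return in place, events raised while the head domain is current never reach the tracer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ipipe_root_p: true while the root (Linux)
 * domain is current, false while the head domain is running. */
static bool model_root_p = true;

/* What the tracing code believes about the hard IRQ state. */
static bool model_traced_irqs_on = true;

/* Number of events that actually reached the tracer. */
static unsigned int model_traced_events;

void model_set_root(bool root)
{
	model_root_p = root;
}

/* Same shape as the patched tracing functions: bail out early when
 * the root domain is not current. */
void model_trace_hardirqs_off(void)
{
	if (!model_root_p)
		return;		/* head domain: event filtered out */
	model_traced_irqs_on = false;
	model_traced_events++;
}

void model_trace_hardirqs_on(void)
{
	if (!model_root_p)
		return;		/* head domain: event filtered out */
	model_traced_irqs_on = true;
	model_traced_events++;
}

bool model_irqs_traced_on(void)
{
	return model_traced_irqs_on;
}

unsigned int model_event_count(void)
{
	return model_traced_events;
}

/* Usage: an off/on pair over the head domain is invisible to the
 * tracer, the same pair over the root domain is recorded. */
void model_demo(void)
{
	model_traced_events = 0;
	model_traced_irqs_on = true;

	model_set_root(false);
	model_trace_hardirqs_off();
	model_trace_hardirqs_on();
	assert(model_event_count() == 0);

	model_set_root(true);
	model_trace_hardirqs_off();
	assert(!model_irqs_traced_on());
	model_trace_hardirqs_on();
	assert(model_event_count() == 2);
}
```

The model says nothing about paths where a task changes domain between the "off" and the "on" event.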
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-17 19:07 ` Jan Kiszka @ 2014-11-17 19:24 ` Gilles Chanteperdrix 2014-11-18 6:19 ` Jan Kiszka 0 siblings, 1 reply; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-17 19:24 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On Mon, Nov 17, 2014 at 08:07:39PM +0100, Jan Kiszka wrote: > On 2014-11-17 18:33, Gilles Chanteperdrix wrote: > > On Mon, Nov 17, 2014 at 06:11:44PM +0100, Jan Kiszka wrote: > >> On 2014-11-17 17:59, Gilles Chanteperdrix wrote: > >>> On Mon, Nov 17, 2014 at 05:48:00PM +0100, Jan Kiszka wrote: > >>>> On 2014-11-12 18:27, Gilles Chanteperdrix wrote: > >>>>> We do not need trace_hardirqs_on and trace_hardirqs_off for the > >>>>> particular case of IRQs: they are already handled by > >>>>> __ipipe_do_sync_stage. > >>>> > >>>> That was the key: Simply disabling the instrumentations in the > >>>> CONFIG_IPIPE removes all lock state inconsistencies, at least this far: > >>>> > >>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S > >>>> index d32f8bd..d8e0b2c 100644 > >>>> --- a/arch/arm/kernel/entry-header.S > >>>> +++ b/arch/arm/kernel/entry-header.S > >>>> @@ -195,7 +195,7 @@ > >>>> #ifdef CONFIG_IPIPE_DEBUG_INTERNAL > >>>> bl __ipipe_bugon_irqs_enabled > >>>> #endif > >>>> -#ifdef CONFIG_TRACE_IRQFLAGS > >>>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) > >>>> @ The parent context IRQs must have been enabled to get here in > >>>> @ the first place, so there's no point checking the PSR I bit. > >>>> bl trace_hardirqs_on > >>>> @@ -203,7 +203,7 @@ > >>>> .else > >>>> @ IRQs off again before pulling preserved data off the stack > >>>> disable_irq_notrace > >>>> -#ifdef CONFIG_TRACE_IRQFLAGS > >>>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) > >>>> tst \rpsr, #PSR_I_BIT > >>>> bleq trace_hardirqs_on > >>>> tst \rpsr, #PSR_I_BIT > >>>> > >>>> Will send a patch. 
> >>> > >>> Will this work for other paths in entry.S, such as exceptions or > >>> syscalls? > >> > >> Do they all come along that code? Then we need to differentiate, likely > >> via a separate macro parameter. > >> > >> Just noticed that there is also svc_enter, and that should be handled in > >> the same way. And it's likely also shared across the board. > > > > There are 4 macros: > > svc_enter > > svc_exit > > when entering/exiting svc mode (whether from irq, data abort, > > prefetch abort), that means reentering the > > irq/exception path when already in kerne-mode > > > > usr_enter > > usr_exit > > when entering/exiting usr mode (whether from irq, data abort, > > prefetch abort, or syscall), which is entered from user mode. > > > > All these paths call trace_hardirqs_on/trace_hardirqs_off > > I have not checked the details on the how and when and if, but since > > you are the one working on this, I suggest you do. > > > > If there is a need to call the real > > trace_hardirqs_on/trace_hardirqs_off in some cases, I would very > > much prefer replacing the bl trace_hard_irqs* with a bl > > __ipipe_trace_hardirqs* sorting out the details in C, in > > arch/arm/kernel/ipipe.c, than doing this in assembly files with > > complicated #if conditions, or retrieval of the current domain > > in assembly. > > > > OK, here is another proposal: filter out tracing in kernel IRQ exit path > (that is required as we may have interrupted Linux with virtual IRQs > off), but otherwise rely on domain filtering in the respective tracing > functions: The only case where it does not work, is for asymmetric things, namely syscalls, and exceptions (page faults) because you can enter a syscall or exception in secondary mode (so trace_hardirqs_on gets called) and leave in primary mode, in which case you will reenter root with the kernel considering that hardirqs are off whereas they may not be. 
Listen, you have to stop trying and testing patches and then just saying "it works, so take my patch"; this will not work. I absolutely require that you enumerate, for each case, what the code does and why it works. I will not accept a patch that was quickly tested and appeared to work. I consider this a corner case, and not really useful, so I would very much prefer to make it depend on !IPIPE. However, you seem to want to have it working; that is fine by me, but in that case do the work well, so that we do not get users complaining that it does not work in corner cases. -- Gilles. ^ permalink raw reply [flat|nested] 47+ messages in thread
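The asymmetric case described in this message can be illustrated with a small user-space model. It is a sketch under assumptions, not a description of the real ARM entry paths: the struct, the function names and the exact order of the on/off events are hypothetical. The only property taken from the thread is that tracing events are dropped when the root domain is not current. When the "off" event is recorded in root and the matching "on" event is dropped over the head domain, the tracer ends up believing IRQs are off while they are not.

```c
#include <assert.h>
#include <stdbool.h>

enum model_domain { MODEL_ROOT, MODEL_HEAD };

struct irq_model {
	enum model_domain domain;	/* domain the task currently runs in */
	bool actual_irqs_on;		/* state the root domain really has */
	bool traced_irqs_on;		/* state the tracer believes in */
};

/* Both helpers always change the actual state, but only update the
 * traced state over the root domain (the filtering from the patch). */
void model_irqs_off(struct irq_model *m)
{
	assert(m);
	m->actual_irqs_on = false;
	if (m->domain == MODEL_ROOT)
		m->traced_irqs_on = false;
}

void model_irqs_on(struct irq_model *m)
{
	assert(m);
	m->actual_irqs_on = true;
	if (m->domain == MODEL_ROOT)
		m->traced_irqs_on = true;
}

bool model_consistent(const struct irq_model *m)
{
	assert(m);
	return m->actual_irqs_on == m->traced_irqs_on;
}

/* Symmetric path: entry and exit both run over the root domain. */
bool model_symmetric_path(void)
{
	struct irq_model m = { MODEL_ROOT, true, true };

	model_irqs_off(&m);
	model_irqs_on(&m);
	return model_consistent(&m);
}

/* Asymmetric path: entry in secondary mode (root), exit in primary
 * mode (head), then a later return to root. */
bool model_asymmetric_path(void)
{
	struct irq_model m = { MODEL_ROOT, true, true };

	model_irqs_off(&m);		/* recorded: root is current */
	m.domain = MODEL_HEAD;		/* task migrates to primary mode */
	model_irqs_on(&m);		/* dropped: head is current */
	m.domain = MODEL_ROOT;		/* back in root with a stale view */
	return model_consistent(&m);
}
```

In this model the symmetric path stays consistent and the asymmetric one does not, which is why an enumeration of each entry/exit path is being asked for.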
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-17 19:24 ` Gilles Chanteperdrix @ 2014-11-18 6:19 ` Jan Kiszka 2014-11-18 6:28 ` Gilles Chanteperdrix 0 siblings, 1 reply; 47+ messages in thread From: Jan Kiszka @ 2014-11-18 6:19 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2014-11-17 20:24, Gilles Chanteperdrix wrote: > On Mon, Nov 17, 2014 at 08:07:39PM +0100, Jan Kiszka wrote: >> On 2014-11-17 18:33, Gilles Chanteperdrix wrote: >>> On Mon, Nov 17, 2014 at 06:11:44PM +0100, Jan Kiszka wrote: >>>> On 2014-11-17 17:59, Gilles Chanteperdrix wrote: >>>>> On Mon, Nov 17, 2014 at 05:48:00PM +0100, Jan Kiszka wrote: >>>>>> On 2014-11-12 18:27, Gilles Chanteperdrix wrote: >>>>>>> We do not need trace_hardirqs_on and trace_hardirqs_off for the >>>>>>> particular case of IRQs: they are already handled by >>>>>>> __ipipe_do_sync_stage. >>>>>> >>>>>> That was the key: Simply disabling the instrumentations in the >>>>>> CONFIG_IPIPE removes all lock state inconsistencies, at least this far: >>>>>> >>>>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S >>>>>> index d32f8bd..d8e0b2c 100644 >>>>>> --- a/arch/arm/kernel/entry-header.S >>>>>> +++ b/arch/arm/kernel/entry-header.S >>>>>> @@ -195,7 +195,7 @@ >>>>>> #ifdef CONFIG_IPIPE_DEBUG_INTERNAL >>>>>> bl __ipipe_bugon_irqs_enabled >>>>>> #endif >>>>>> -#ifdef CONFIG_TRACE_IRQFLAGS >>>>>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) >>>>>> @ The parent context IRQs must have been enabled to get here in >>>>>> @ the first place, so there's no point checking the PSR I bit. 
>>>>>> bl trace_hardirqs_on >>>>>> @@ -203,7 +203,7 @@ >>>>>> .else >>>>>> @ IRQs off again before pulling preserved data off the stack >>>>>> disable_irq_notrace >>>>>> -#ifdef CONFIG_TRACE_IRQFLAGS >>>>>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) >>>>>> tst \rpsr, #PSR_I_BIT >>>>>> bleq trace_hardirqs_on >>>>>> tst \rpsr, #PSR_I_BIT >>>>>> >>>>>> Will send a patch. >>>>> >>>>> Will this work for other paths in entry.S, such as exceptions or >>>>> syscalls? >>>> >>>> Do they all come along that code? Then we need to differentiate, likely >>>> via a separate macro parameter. >>>> >>>> Just noticed that there is also svc_enter, and that should be handled in >>>> the same way. And it's likely also shared across the board. >>> >>> There are 4 macros: >>> svc_enter >>> svc_exit >>> when entering/exiting svc mode (whether from irq, data abort, >>> prefetch abort), that means reentering the >>> irq/exception path when already in kerne-mode >>> >>> usr_enter >>> usr_exit >>> when entering/exiting usr mode (whether from irq, data abort, >>> prefetch abort, or syscall), which is entered from user mode. >>> >>> All these paths call trace_hardirqs_on/trace_hardirqs_off >>> I have not checked the details on the how and when and if, but since >>> you are the one working on this, I suggest you do. >>> >>> If there is a need to call the real >>> trace_hardirqs_on/trace_hardirqs_off in some cases, I would very >>> much prefer replacing the bl trace_hard_irqs* with a bl >>> __ipipe_trace_hardirqs* sorting out the details in C, in >>> arch/arm/kernel/ipipe.c, than doing this in assembly files with >>> complicated #if conditions, or retrieval of the current domain >>> in assembly. 
>>> >> >> OK, here is another proposal: filter out tracing in kernel IRQ exit path >> (that is required as we may have interrupted Linux with virtual IRQs >> off), but otherwise rely on domain filtering in the respective tracing >> functions: > > The only case where it does not work, is for asymmetric things, > namely syscalls, and exceptions (page faults) because you > can enter a syscall or exception in secondary mode (so > trace_hardirqs_on gets called) and leave in primary mode, in which > case you will reenter root with the kernel considering that hardirqs > are off whereas they may not be. > > Listen, you have stop trying and testing patches and just say "it > works, so, take my patch", this will not work. I absolutely require > of you that you enumerate for each case, what the code does, and why > it works. I will not accept a patch that was quickly tested and > appeared to work. I consider that stuff a corner case, and not > really useful, so, I would very much prefer get it to depend on > !IPIPE. However, you seem to want and have it working, that is fine > by me, but in that case do the work well, so that we do not get > users complaining that it does not work in corner cases. The current changes already return lockdep to usable state, which is an improvement. Plus they remove remaining risks to call the tracing functions over the head domain, another improvement over the existing code, for all archs. That this may not catch all migration corner cases yet shouldn't be your worries - if you don't care about lockdep at all. However, I will propose properly described and signed-off patches for merge once testing and analysis provide the required confidence in the approach. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-18 6:19 ` Jan Kiszka @ 2014-11-18 6:28 ` Gilles Chanteperdrix 0 siblings, 0 replies; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-18 6:28 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On Tue, Nov 18, 2014 at 07:19:17AM +0100, Jan Kiszka wrote: > On 2014-11-17 20:24, Gilles Chanteperdrix wrote: > > On Mon, Nov 17, 2014 at 08:07:39PM +0100, Jan Kiszka wrote: > >> On 2014-11-17 18:33, Gilles Chanteperdrix wrote: > >>> On Mon, Nov 17, 2014 at 06:11:44PM +0100, Jan Kiszka wrote: > >>>> On 2014-11-17 17:59, Gilles Chanteperdrix wrote: > >>>>> On Mon, Nov 17, 2014 at 05:48:00PM +0100, Jan Kiszka wrote: > >>>>>> On 2014-11-12 18:27, Gilles Chanteperdrix wrote: > >>>>>>> We do not need trace_hardirqs_on and trace_hardirqs_off for the > >>>>>>> particular case of IRQs: they are already handled by > >>>>>>> __ipipe_do_sync_stage. > >>>>>> > >>>>>> That was the key: Simply disabling the instrumentations in the > >>>>>> CONFIG_IPIPE removes all lock state inconsistencies, at least this far: > >>>>>> > >>>>>> diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S > >>>>>> index d32f8bd..d8e0b2c 100644 > >>>>>> --- a/arch/arm/kernel/entry-header.S > >>>>>> +++ b/arch/arm/kernel/entry-header.S > >>>>>> @@ -195,7 +195,7 @@ > >>>>>> #ifdef CONFIG_IPIPE_DEBUG_INTERNAL > >>>>>> bl __ipipe_bugon_irqs_enabled > >>>>>> #endif > >>>>>> -#ifdef CONFIG_TRACE_IRQFLAGS > >>>>>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) > >>>>>> @ The parent context IRQs must have been enabled to get here in > >>>>>> @ the first place, so there's no point checking the PSR I bit. 
> >>>>>> bl trace_hardirqs_on > >>>>>> @@ -203,7 +203,7 @@ > >>>>>> .else > >>>>>> @ IRQs off again before pulling preserved data off the stack > >>>>>> disable_irq_notrace > >>>>>> -#ifdef CONFIG_TRACE_IRQFLAGS > >>>>>> +#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_IPIPE) > >>>>>> tst \rpsr, #PSR_I_BIT > >>>>>> bleq trace_hardirqs_on > >>>>>> tst \rpsr, #PSR_I_BIT > >>>>>> > >>>>>> Will send a patch. > >>>>> > >>>>> Will this work for other paths in entry.S, such as exceptions or > >>>>> syscalls? > >>>> > >>>> Do they all come along that code? Then we need to differentiate, likely > >>>> via a separate macro parameter. > >>>> > >>>> Just noticed that there is also svc_enter, and that should be handled in > >>>> the same way. And it's likely also shared across the board. > >>> > >>> There are 4 macros: > >>> svc_enter > >>> svc_exit > >>> when entering/exiting svc mode (whether from irq, data abort, > >>> prefetch abort), that means reentering the > >>> irq/exception path when already in kerne-mode > >>> > >>> usr_enter > >>> usr_exit > >>> when entering/exiting usr mode (whether from irq, data abort, > >>> prefetch abort, or syscall), which is entered from user mode. > >>> > >>> All these paths call trace_hardirqs_on/trace_hardirqs_off > >>> I have not checked the details on the how and when and if, but since > >>> you are the one working on this, I suggest you do. > >>> > >>> If there is a need to call the real > >>> trace_hardirqs_on/trace_hardirqs_off in some cases, I would very > >>> much prefer replacing the bl trace_hard_irqs* with a bl > >>> __ipipe_trace_hardirqs* sorting out the details in C, in > >>> arch/arm/kernel/ipipe.c, than doing this in assembly files with > >>> complicated #if conditions, or retrieval of the current domain > >>> in assembly. 
> >>> > >> > >> OK, here is another proposal: filter out tracing in kernel IRQ exit path > >> (that is required as we may have interrupted Linux with virtual IRQs > >> off), but otherwise rely on domain filtering in the respective tracing > >> functions: > > > > The only case where it does not work, is for asymmetric things, > > namely syscalls, and exceptions (page faults) because you > > can enter a syscall or exception in secondary mode (so > > trace_hardirqs_on gets called) and leave in primary mode, in which > > case you will reenter root with the kernel considering that hardirqs > > are off whereas they may not be. > > > > Listen, you have stop trying and testing patches and just say "it > > works, so, take my patch", this will not work. I absolutely require > > of you that you enumerate for each case, what the code does, and why > > it works. I will not accept a patch that was quickly tested and > > appeared to work. I consider that stuff a corner case, and not > > really useful, so, I would very much prefer get it to depend on > > !IPIPE. However, you seem to want and have it working, that is fine > > by me, but in that case do the work well, so that we do not get > > users complaining that it does not work in corner cases. > > The current changes already return lockdep to usable state, which is an > improvement. If this causes the kernel to hang and crash in some cases, no this is not an improvement over "depends on !IPIPE", which is what I will merge unless you provide me with something better. See, you are not competing with the current state of the I-pipe, you are competing with "depends on !IPIPE". > Plus they remove remaining risks to call the tracing > functions over the head domain, another improvement over the existing > code, for all archs. That this may not catch all migration corner cases > yet shouldn't be your worries - if you don't care about lockdep at all. 
As soon as I merge your patch, I am the one maintaining it; this means I will start enabling LOCKDEP in my "full debug" configuration when validating the I-pipe patch, and I do not want it to crash. Besides, I am the one answering user requests about the I-pipe patch for the ARM architecture, and I do not want to have to deal with users reporting weird crashes with LOCKDEP turned on. So this will become my worry, and no, I do not want any case left uncovered by your patch. Besides, there are not that many paths in entry.S, so asking for a thorough analysis is not asking for something tedious or complicated. > > However, I will propose properly described and signed-off patches for > merge once testing and analysis provide the required confidence in the > approach. Validation by testing is required, but by no means as valuable as a good design. -- Gilles. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-10 12:43 ` Gilles Chanteperdrix 2014-11-10 14:52 ` Jan Kiszka @ 2014-11-11 17:33 ` Stoidner, Christoph 2014-11-11 17:46 ` Gilles Chanteperdrix 1 sibling, 1 reply; 47+ messages in thread From: Stoidner, Christoph @ 2014-11-11 17:33 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org Hi Gilles, > Also, for the "scheduling while atomic", it may happen if you call > some Linux service which reschedules from primary mode, you can try > enabling I-pipe debugging, and in fact all Xenomai debugging, to try > and catch such mistakes. This is especially important if you are > running a custom skin. you are completely right, we have implemented our own skin. Using the debugging functionality mentioned above we have identified a function-call that leads to exact the behaviour as described in your post. Hopefully solving that issue solves our application crash. Thanks for your fast and helpful support! Regards, Christoph ________________________________________ Von: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> Gesendet: Montag, 10. November 2014 13:43 An: Stoidner, Christoph Cc: xenomai@xenomai.org Betreff: Re: [Xenomai] "inconsistent lock state" on boot-up On Mon, Nov 10, 2014 at 09:08:47AM +0000, Stoidner, Christoph wrote: > > Hi Gilles, > > > Do you have the same message with exactly the same kernel > > configuration, only with CONFIG_XENOMAI and CONFIG_IPIPE disabled? > > When CONFIG_XENOMAI and CONFIG_IPIPE are disabled the message does not > appear on boot-up. > > > Do you have FCSE enabled? If yes, did you try disabling it? same > > with unlocked context switch. > > FCSE is already disabled at all. > > Do you have an idea how to overcome the problem? I am not sure the lockdep message really is a problem. 
lockdep could be confused by the fact that the hardware interrupts are not off when running the I-pipe, or because we are missing some bit in the I-pipe ARM-specific code to get it to look at the virtual mask instead of the hardware mask. As for the "scheduling while atomic" and the random segmentation faults, you should use the I-pipe tracer, configure it with enough back trace points, something like 1000 or 10000, and trigger a trace freeze in the kernel code when the problem happens. Also, for the "scheduling while atomic", it may happen if you call some Linux service which reschedules from primary mode; you can try enabling I-pipe debugging, and in fact all Xenomai debugging, to try and catch such mistakes. This is especially important if you are running a custom skin. -- Gilles. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-11 17:33 ` Stoidner, Christoph @ 2014-11-11 17:46 ` Gilles Chanteperdrix 2014-11-11 18:04 ` Philippe Gerum 2014-11-17 10:01 ` Stoidner, Christoph 0 siblings, 2 replies; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-11 17:46 UTC (permalink / raw) To: Stoidner, Christoph; +Cc: xenomai@xenomai.org On Tue, Nov 11, 2014 at 05:33:55PM +0000, Stoidner, Christoph wrote: > > Hi Gilles, > > > Also, for the "scheduling while atomic", it may happen if you call > > some Linux service which reschedules from primary mode, you can try > > enabling I-pipe debugging, and in fact all Xenomai debugging, to try > > and catch such mistakes. This is especially important if you are > > running a custom skin. > > you are completely right, we have implemented our own skin. Using > the debugging functionality mentioned above we have identified a > function-call that leads to exact the behaviour as described in your post. > > Hopefully solving that issue solves our application crash. > > Thanks for your fast and helpful support! You are welcome. As a side note, if you have followed the discussion, CONFIG_TRACE_IRQFLAGS is broken on ARM, Jan hopes to be able to fix it, but for the time being, you should disable it, or you risk to create other issues. -- Gilles. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-11 17:46 ` Gilles Chanteperdrix @ 2014-11-11 18:04 ` Philippe Gerum 2014-11-17 10:01 ` Stoidner, Christoph 1 sibling, 0 replies; 47+ messages in thread From: Philippe Gerum @ 2014-11-11 18:04 UTC (permalink / raw) To: Gilles Chanteperdrix, Stoidner, Christoph; +Cc: xenomai@xenomai.org On 11/11/2014 06:46 PM, Gilles Chanteperdrix wrote: > On Tue, Nov 11, 2014 at 05:33:55PM +0000, Stoidner, Christoph wrote: >> >> Hi Gilles, >> >>> Also, for the "scheduling while atomic", it may happen if you call >>> some Linux service which reschedules from primary mode, you can try >>> enabling I-pipe debugging, and in fact all Xenomai debugging, to try >>> and catch such mistakes. This is especially important if you are >>> running a custom skin. >> >> you are completely right, we have implemented our own skin. Using >> the debugging functionality mentioned above we have identified a >> function-call that leads to exact the behaviour as described in your post. >> >> Hopefully solving that issue solves our application crash. >> >> Thanks for your fast and helpful support! > > You are welcome. As a side note, if you have followed the > discussion, CONFIG_TRACE_IRQFLAGS is broken on ARM, Jan hopes to be > able to fix it, but for the time being, you should disable it, or > you risk to create other issues. > CONFIG_TRACE_IRQFLAGS is currently broken on several architectures with IPIPE enabled. ppc64 certainly is, ppc32 likely, blackfin maybe. I did not check x86 while upgrading to 3.16 yet. -- Philippe. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-11 17:46 ` Gilles Chanteperdrix 2014-11-11 18:04 ` Philippe Gerum @ 2014-11-17 10:01 ` Stoidner, Christoph 2014-11-17 10:22 ` Gilles Chanteperdrix 2014-11-17 11:49 ` Philippe Gerum 1 sibling, 2 replies; 47+ messages in thread From: Stoidner, Christoph @ 2014-11-17 10:01 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org Hi, > you are completely right, we have implemented our own skin. Using > the debugging functionality mentioned above we have identified a > function-call that leads to exact the behaviour as described in your post. > > Hopefully solving that issue solves our application crash. Now the "scheduling while atomic" problem no longer occurs within API calls of our own skin. However, after some run time (about 5 minutes or more) it seems to appear in the gatekeeper thread (see below). What is not clear to me is the ipipe_raise_irq() call in the backtrace below. I could not identify any corresponding call from within gatekeeper_thread(). Am I overlooking something? What I also do not understand is the timestamp of the kernel messages. As mentioned, the messages appear about 5 minutes after kernel start. However, the messages' timestamps are from about 40 seconds after boot. Could it be that the messages are delayed? Or is the timestamp wrong? NOTE: The config option CONFIG_TRACE_IRQFLAGS is disabled. 
[ 40.670362] BUG: scheduling while atomic: gatekeeper/0/22/0x00010001 [ 40.670393] CPU: 0 PID: 22 Comm: gatekeeper/0 Not tainted 3.10.18-rt14-arvero-rev01-ipipe #2 [ 40.670480] [<c00130a4>] (unwind_backtrace+0x0/0xf0) from [<c0011484>] (show_stack+0x10/0x14) [ 40.670538] [<c0011484>] (show_stack+0x10/0x14) from [<c048329c>] (__schedule_bug+0x3c/0x54) [ 40.670581] [<c048329c>] (__schedule_bug+0x3c/0x54) from [<c04887b8>] (__schedule+0x35c/0x4f0) [ 40.670611] [<c04887b8>] (__schedule+0x35c/0x4f0) from [<c0488980>] (schedule+0x34/0xa0) [ 40.670646] [<c0488980>] (schedule+0x34/0xa0) from [<c0489628>] (rt_spin_lock_slowlock+0x14c/0x308) [ 40.670684] [<c0489628>] (rt_spin_lock_slowlock+0x14c/0x308) from [<c002ad24>] (__lock_task_sighand+0x40/0x6c) [ 40.670716] [<c002ad24>] (__lock_task_sighand+0x40/0x6c) from [<c002ad74>] (do_send_sig_info+0x24/0x64) [ 40.670748] [<c002ad74>] (do_send_sig_info+0x24/0x64) from [<c00aaf88>] (lostage_handler+0xec/0x11c) [ 40.670778] [<c00aaf88>] (lostage_handler+0xec/0x11c) from [<c006dfa4>] (rthal_apc_handler+0x4c/0x60) [ 40.670810] [<c006dfa4>] (rthal_apc_handler+0x4c/0x60) from [<c00611d8>] (__ipipe_do_sync_stage+0x1f8/0x288) [ 40.670844] [<c00611d8>] (__ipipe_do_sync_stage+0x1f8/0x288) from [<c0014224>] (ipipe_raise_irq+0x18/0x20) [ 40.670875] [<c0014224>] (ipipe_raise_irq+0x18/0x20) from [<c00aac84>] (gatekeeper_thread+0x150/0x368) [ 40.670918] [<c00aac84>] (gatekeeper_thread+0x150/0x368) from [<c00383f8>] (kthread+0x9c/0xa4) [ 40.670962] [<c00383f8>] (kthread+0x9c/0xa4) from [<c000ea20>] (ret_from_fork+0x18/0x38) [ 40.671179] ------------[ cut here ]------------ [ 40.671233] WARNING: at kernel/softirq.c:748 irq_exit+0x118/0x138() [ 40.671260] CPU: 0 PID: 22 Comm: gatekeeper/0 Tainted: G W 3.10.18-rt14-arvero-rev01-ipipe #2 [ 40.671317] [<c00130a4>] (unwind_backtrace+0x0/0xf0) from [<c0011484>] (show_stack+0x10/0x14) [ 40.671372] [<c0011484>] (show_stack+0x10/0x14) from [<c001b7bc>] (warn_slowpath_common+0x48/0x64) [ 40.671416] 
[<c001b7bc>] (warn_slowpath_common+0x48/0x64) from [<c001b8a0>] (warn_slowpath_null+0x1c/0x24) [ 40.671453] [<c001b8a0>] (warn_slowpath_null+0x1c/0x24) from [<c0023414>] (irq_exit+0x118/0x138) [ 40.671487] [<c0023414>] (irq_exit+0x118/0x138) from [<c00611dc>] (__ipipe_do_sync_stage+0x1fc/0x288) [ 40.671518] [<c00611dc>] (__ipipe_do_sync_stage+0x1fc/0x288) from [<c0014224>] (ipipe_raise_irq+0x18/0x20) [ 40.671548] [<c0014224>] (ipipe_raise_irq+0x18/0x20) from [<c00aac84>] (gatekeeper_thread+0x150/0x368) [ 40.671586] [<c00aac84>] (gatekeeper_thread+0x150/0x368) from [<c00383f8>] (kthread+0x9c/0xa4) [ 40.671620] [<c00383f8>] (kthread+0x9c/0xa4) from [<c000ea20>] (ret_from_fork+0x18/0x38) [ 40.671632] ---[ end trace 0000000000000002 ]--- Thanks and advance Christoph ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-17 10:01 ` Stoidner, Christoph @ 2014-11-17 10:22 ` Gilles Chanteperdrix 2014-11-17 11:13 ` Stoidner, Christoph 2014-11-17 11:49 ` Philippe Gerum 1 sibling, 1 reply; 47+ messages in thread From: Gilles Chanteperdrix @ 2014-11-17 10:22 UTC (permalink / raw) To: Stoidner, Christoph; +Cc: xenomai@xenomai.org On Mon, Nov 17, 2014 at 10:01:40AM +0000, Stoidner, Christoph wrote: > > Hi, > > > you are completely right, we have implemented our own skin. Using > > the debugging functionality mentioned above we have identified a > > function-call that leads to exact the behaviour as described in your post. > > > > Hopefully solving that issue solves our application crash. > > Now the problem "scheduling while atomic" does not occur anymore > within API calls of our own skin. However, after some run-time > (about 5 minutes or more) it seems to appear in gatekeeper-thread > (see below). > > What is not clear to me is the ipipe_raise_irq() call in backtrace > below. I could not identify any according call from within > gatekeeper_thread(). Do I overlook something? > > What I also do not understand is the timestamp of kernel message. > As mentioned the messages do appear about 5 minutes after kernel > start. However the message's timestamp are from about 40 seconds > after boot? Could it happen that messages are delayed? Or is the > timestamp wrong? This would probably mean an issue with the tsc emulation. Have you tried running the "tsc" program from the Xenomai regression testsuite with the -w option? I remember that the imx28 tsc emulation is a bit weird: the hardware sometimes returns wrong values, and the support answer was "read it twice, until you get twice the same value". But I never found this really satisfactory: what if reading it twice returns the same wrong value twice? The tsc test should show whether the tsc wrapping is working fine. 
You can try to run it several times, or even in parallel with your tests, to see whether it detects any problem. -- Gilles. ^ permalink raw reply [flat|nested] 47+ messages in thread
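The "read it twice" workaround mentioned in this message, and the objection to it, can be sketched as follows. This is a minimal illustration, not the i.MX28 timer code: read_counter_twice(), the counter_read_fn callback, the retry bound and the scripted_counter test source are all hypothetical names introduced here, and the sketch assumes a counter that ticks slowly compared with the time taken by two back-to-back reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback reading the raw hardware counter. */
typedef uint32_t (*counter_read_fn)(void *ctx);

/*
 * Read the counter until two consecutive reads agree, giving up after
 * max_retries extra reads. Returns 0 and stores the value in *out on
 * success, -1 if no two consecutive reads matched.
 */
int read_counter_twice(counter_read_fn read_fn, void *ctx,
		       unsigned int max_retries, uint32_t *out)
{
	uint32_t prev, cur;
	unsigned int i;

	assert(read_fn != NULL && out != NULL);

	prev = read_fn(ctx);
	for (i = 0; i < max_retries; i++) {
		cur = read_fn(ctx);
		if (cur == prev) {
			*out = cur;
			return 0;
		}
		prev = cur;
	}
	return -1;
}

/* Scripted counter source, used only to exercise the helper. */
struct scripted_counter {
	const uint32_t *values;
	size_t count;
	size_t pos;
};

/* Returns the scripted values in order and repeats the last one. */
uint32_t scripted_read(void *ctx)
{
	struct scripted_counter *s = ctx;
	uint32_t v;

	assert(s != NULL && s->count > 0);
	v = s->values[s->pos];
	if (s->pos + 1 < s->count)
		s->pos++;
	return v;
}
```

A single glitch between two good reads is filtered out by this loop, but a source that returns the same wrong value twice in a row is accepted as valid, which is exactly the weakness pointed out above.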
* Re: [Xenomai] "inconsistent lock state" on boot-up 2014-11-17 10:22 ` Gilles Chanteperdrix @ 2014-11-17 11:13 ` Stoidner, Christoph 2014-11-17 11:30 ` Philippe Gerum 0 siblings, 1 reply; 47+ messages in thread From: Stoidner, Christoph @ 2014-11-17 11:13 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org >> >> Now the problem "scheduling while atomic" does not occur anymore >> within API calls of our own skin. However, after some run-time >> (about 5 minutes or more) it seems to appear in gatekeeper-thread >> (see below). >> >> What is not clear to me is the ipipe_raise_irq() call in backtrace >> below. I could not identify any according call from within >> gatekeeper_thread(). Do I overlook something? >> >> What I also do not understand is the timestamp of kernel message. >> As mentioned the messages do appear about 5 minutes after kernel >> start. However the message's timestamp are from about 40 seconds >> after boot? Could it happen that messages are delayed? Or is the >> timestamp wrong? > > This would probably mean an issue with the tsc emulation, have you > tried running the "tsc" program, from xenomai regression testsuite > with the -w option ? I remember than the imx28 tsc emulation is a > bit weird, the hardware sometimes returns wrong values, and the > support answer was "read it twice, until you get twice the same > value". But I never found this really satisfactory: what if reading > it twice returns the same wrong value twice ? > > The tsc test should see if the tsc wrapping is doing fine. You can > try to run it several time, or even in parallel to your tests, to > see if it does not detect any problem. There are some other kernel message whose's timestamp seems to be correct. E.g. 
when creating a semaphore (as below): [ 17.336237] Xenomai: registered exported object @CGI (semaphores) [ 17.344122] Xenomai: registered exported object LOG (msgx) I would expect these messages at program start, which would also match the timestamps shown. However, these messages are also printed late, after 5 minutes of run time, at exactly the same time as the "scheduling while atomic" message. So I now assume that the timestamps are valid but the messages are displayed with a delay. However, I feel this has nothing to do with my main problem: the program crash. So maybe I should open a new thread for "delayed kernel message or wrong time stamp". Back to topic: Do you have any idea, based on the backtrace, why "scheduling while atomic" is thrown by gatekeeper_thread()? Or do you know where ipipe_raise_irq() is called from the gatekeeper thread, and whether that would be legal/expected? Regards, Christoph ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 11:13 ` Stoidner, Christoph
@ 2014-11-17 11:30 ` Philippe Gerum
2014-11-17 13:16 ` Gilles Chanteperdrix
0 siblings, 1 reply; 47+ messages in thread
From: Philippe Gerum @ 2014-11-17 11:30 UTC (permalink / raw)
To: Stoidner, Christoph, Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 11/17/2014 12:13 PM, Stoidner, Christoph wrote:
>
>>>
>>> Now the problem "scheduling while atomic" does not occur anymore
>>> within API calls of our own skin. However, after some run-time
>>> (about 5 minutes or more) it seems to appear in gatekeeper-thread
>>> (see below).
>>>
>>> What is not clear to me is the ipipe_raise_irq() call in the backtrace
>>> below. I could not identify any corresponding call from within
>>> gatekeeper_thread(). Am I overlooking something?
>>>
>>> What I also do not understand is the timestamp of the kernel messages.
>>> As mentioned, the messages appear about 5 minutes after kernel
>>> start. However, the messages' timestamps are from about 40 seconds
>>> after boot. Could it happen that messages are delayed? Or is the
>>> timestamp wrong?
>>
>> This would probably mean an issue with the tsc emulation. Have you
>> tried running the "tsc" program from the xenomai regression testsuite
>> with the -w option? I remember that the imx28 tsc emulation is a
>> bit weird: the hardware sometimes returns wrong values, and the
>> support answer was "read it twice, until you get twice the same
>> value". But I never found this really satisfactory: what if reading
>> it twice returns the same wrong value twice?
>>
>> The tsc test should see if the tsc wrapping is doing fine. You can
>> try to run it several times, or even in parallel to your tests, to
>> see if it does not detect any problem.
>
> There are some other kernel messages whose timestamps seem to be correct, e.g. when creating a semaphore (as below):
>
> [ 17.336237] Xenomai: registered exported object @CGI (semaphores)
> [ 17.344122] Xenomai: registered exported object LOG (msgx)
>
> I would expect these messages at program start, which would also match the timestamps shown. However, these messages are also printed late, after 5 minutes of run time, at exactly the same time as "scheduling while atomic" is shown. So now I am assuming the timestamps are valid but the messages are displayed with a delay. However, I feel this has nothing to do with my main problem: the program crash. So maybe I should open a new thread for "delayed kernel message or wrong time stamp".
>
> Back to topic: Do you have any idea why "scheduling while atomic" is thrown by gatekeeper_thread(), based on the backtrace? Or do you know where ipipe_raise_irq() is called from the gatekeeper thread, and whether that would be legal/expected?
>

You seem to be running a preempt-rt patched kernel, but the Xenomai core
acts as if it was built for a regular preemption kernel. This virq is
triggered by some code in the Xenomai rescheduling when the caller runs
in secondary mode, which the gatekeeper always does. This code is
correct; the way it is handled by the APC code in Xenomai due to this
apparent build mismatch is not.

--
Philippe.
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 11:30 ` Philippe Gerum
@ 2014-11-17 13:16 ` Gilles Chanteperdrix
0 siblings, 0 replies; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-17 13:16 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai@xenomai.org

On Mon, Nov 17, 2014 at 12:30:28PM +0100, Philippe Gerum wrote:
> On 11/17/2014 12:13 PM, Stoidner, Christoph wrote:
> >
> >>>
> >>> Now the problem "scheduling while atomic" does not occur anymore
> >>> within API calls of our own skin. However, after some run-time
> >>> (about 5 minutes or more) it seems to appear in gatekeeper-thread
> >>> (see below).
> >>>
> >>> What is not clear to me is the ipipe_raise_irq() call in the backtrace
> >>> below. I could not identify any corresponding call from within
> >>> gatekeeper_thread(). Am I overlooking something?
> >>>
> >>> What I also do not understand is the timestamp of the kernel messages.
> >>> As mentioned, the messages appear about 5 minutes after kernel
> >>> start. However, the messages' timestamps are from about 40 seconds
> >>> after boot. Could it happen that messages are delayed? Or is the
> >>> timestamp wrong?
> >>
> >> This would probably mean an issue with the tsc emulation. Have you
> >> tried running the "tsc" program from the xenomai regression testsuite
> >> with the -w option? I remember that the imx28 tsc emulation is a
> >> bit weird: the hardware sometimes returns wrong values, and the
> >> support answer was "read it twice, until you get twice the same
> >> value". But I never found this really satisfactory: what if reading
> >> it twice returns the same wrong value twice?
> >>
> >> The tsc test should see if the tsc wrapping is doing fine. You can
> >> try to run it several times, or even in parallel to your tests, to
> >> see if it does not detect any problem.
> >
> > There are some other kernel messages whose timestamps seem to be correct, e.g. when creating a semaphore (as below):
> >
> > [ 17.336237] Xenomai: registered exported object @CGI (semaphores)
> > [ 17.344122] Xenomai: registered exported object LOG (msgx)
> >
> > I would expect these messages at program start, which would also match the timestamps shown. However, these messages are also printed late, after 5 minutes of run time, at exactly the same time as "scheduling while atomic" is shown. So now I am assuming the timestamps are valid but the messages are displayed with a delay. However, I feel this has nothing to do with my main problem: the program crash. So maybe I should open a new thread for "delayed kernel message or wrong time stamp".
> >
> > Back to topic: Do you have any idea why "scheduling while atomic" is thrown by gatekeeper_thread(), based on the backtrace? Or do you know where ipipe_raise_irq() is called from the gatekeeper thread, and whether that would be legal/expected?
> >
>
> You seem to be running a preempt-rt patched kernel, but the Xenomai core
> acts as if it was built for a regular preemption kernel. This virq is
> triggered by some code in the Xenomai rescheduling when the caller runs
> in secondary mode, which the gatekeeper always does. This code is
> correct; the way it is handled by the APC code in Xenomai due to this
> apparent build mismatch is not.

Note that if you are looking for low latencies on imx28 (ok, not
bounded, but much lower on average), you would probably be better
off using the FCSE extension; it also improves Linux latency. If the
guaranteed mode has too many restrictions for your use case, use the
best-effort mode, which still reduces the latency on average.

For instance, on the at91sam9263 I use to test Xenomai on armv5,
enabling the FCSE in best-effort mode divides hackbench run time by
2, for the same hackbench arguments.

http://sisyphus.hd.free.fr/~gilles/pub/fcse/hackbench-fcse-v4.png

--
Gilles.
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 10:01 ` Stoidner, Christoph
2014-11-17 10:22 ` Gilles Chanteperdrix
@ 2014-11-17 11:49 ` Philippe Gerum
2014-11-17 11:51 ` Philippe Gerum
2014-11-17 13:10 ` Gilles Chanteperdrix
1 sibling, 2 replies; 47+ messages in thread
From: Philippe Gerum @ 2014-11-17 11:49 UTC (permalink / raw)
To: Stoidner, Christoph, Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 11/17/2014 11:01 AM, Stoidner, Christoph wrote:
>
> Hi,
>
>> you are completely right, we have implemented our own skin. Using
>> the debugging functionality mentioned above we have identified a
>> function-call that leads to exactly the behaviour described in your post.
>>
>> Hopefully solving that issue solves our application crash.
>
> Now the problem "scheduling while atomic" does not occur anymore within API calls of our own skin. However, after some run-time (about 5 minutes or more) it seems to appear in gatekeeper-thread (see below).
>
> What is not clear to me is the ipipe_raise_irq() call in the backtrace below. I could not identify any corresponding call from within gatekeeper_thread(). Am I overlooking something?
>

You could match the closest routine to the calling PC value (c00aac84)
using addr2line on your kernel image.

e.g. arm-linux-gnueabihf-addr2line -e vmlinux -a c00aac84

--
Philippe.
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 11:49 ` Philippe Gerum
@ 2014-11-17 11:51 ` Philippe Gerum
2014-11-17 13:10 ` Gilles Chanteperdrix
1 sibling, 0 replies; 47+ messages in thread
From: Philippe Gerum @ 2014-11-17 11:51 UTC (permalink / raw)
To: Stoidner, Christoph, Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 11/17/2014 12:49 PM, Philippe Gerum wrote:
> On 11/17/2014 11:01 AM, Stoidner, Christoph wrote:
>>
>> Hi,
>>
>>> you are completely right, we have implemented our own skin. Using
>>> the debugging functionality mentioned above we have identified a
>>> function-call that leads to exactly the behaviour described in your post.
>>>
>>> Hopefully solving that issue solves our application crash.
>>
>> Now the problem "scheduling while atomic" does not occur anymore within API calls of our own skin. However, after some run-time (about 5 minutes or more) it seems to appear in gatekeeper-thread (see below).
>>
>> What is not clear to me is the ipipe_raise_irq() call in the backtrace below. I could not identify any corresponding call from within gatekeeper_thread(). Am I overlooking something?
>>
>
> You could match the closest routine to the calling PC value (c00aac84)
> using addr2line on your kernel image.
>
> e.g. arm-linux-gnueabihf-addr2line -e vmlinux -a c00aac84
>

You may need CONFIG_DEBUG_INFO enabled.

--
Philippe.
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 11:49 ` Philippe Gerum
2014-11-17 11:51 ` Philippe Gerum
@ 2014-11-17 13:10 ` Gilles Chanteperdrix
2014-11-17 13:33 ` Philippe Gerum
1 sibling, 1 reply; 47+ messages in thread
From: Gilles Chanteperdrix @ 2014-11-17 13:10 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai@xenomai.org

On Mon, Nov 17, 2014 at 12:49:01PM +0100, Philippe Gerum wrote:
> On 11/17/2014 11:01 AM, Stoidner, Christoph wrote:
> >
> > Hi,
> >
> >> you are completely right, we have implemented our own skin. Using
> >> the debugging functionality mentioned above we have identified a
> >> function-call that leads to exactly the behaviour described in your post.
> >>
> >> Hopefully solving that issue solves our application crash.
> >
> > Now the problem "scheduling while atomic" does not occur anymore within API calls of our own skin. However, after some run-time (about 5 minutes or more) it seems to appear in gatekeeper-thread (see below).
> >
> > What is not clear to me is the ipipe_raise_irq() call in the backtrace below. I could not identify any corresponding call from within gatekeeper_thread(). Am I overlooking something?
> >
>
> You could match the closest routine to the calling PC value (c00aac84)
> using addr2line on your kernel image.
>
> e.g. arm-linux-gnueabihf-addr2line -e vmlinux -a c00aac84

That would rather be:
arm-none-linux-gnueabi-addr2line

imx28 is an armv5 (who said that armv4 and armv5 were no longer in
circulation?).

:-)
just nitpicking.

If you have the multiarch binutils version installed, the default
addr2line should work as well.

--
Gilles.
* Re: [Xenomai] "inconsistent lock state" on boot-up
2014-11-17 13:10 ` Gilles Chanteperdrix
@ 2014-11-17 13:33 ` Philippe Gerum
0 siblings, 0 replies; 47+ messages in thread
From: Philippe Gerum @ 2014-11-17 13:33 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org

On 11/17/2014 02:10 PM, Gilles Chanteperdrix wrote:
> On Mon, Nov 17, 2014 at 12:49:01PM +0100, Philippe Gerum wrote:
>> On 11/17/2014 11:01 AM, Stoidner, Christoph wrote:
>>>
>>> Hi,
>>>
>>>> you are completely right, we have implemented our own skin. Using
>>>> the debugging functionality mentioned above we have identified a
>>>> function-call that leads to exactly the behaviour described in your post.
>>>>
>>>> Hopefully solving that issue solves our application crash.
>>>
>>> Now the problem "scheduling while atomic" does not occur anymore within API calls of our own skin. However, after some run-time (about 5 minutes or more) it seems to appear in gatekeeper-thread (see below).
>>>
>>> What is not clear to me is the ipipe_raise_irq() call in the backtrace below. I could not identify any corresponding call from within gatekeeper_thread(). Am I overlooking something?
>>>
>>
>> You could match the closest routine to the calling PC value (c00aac84)
>> using addr2line on your kernel image.
>>
>> e.g. arm-linux-gnueabihf-addr2line -e vmlinux -a c00aac84
>
> That would rather be:
> arm-none-linux-gnueabi-addr2line
>
> imx28 is an armv5 (who said that armv4 and armv5 were no longer in
> circulation?).
>
> :-)
> just nitpicking.
>
> If you have the multiarch binutils version installed, the default
> addr2line should work as well.
>

"e.g." precisely stands for this, "to be replaced by the command that
fits". I'm not building for v4/5 these days.

--
Philippe.
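The addr2line suggestion from this sub-thread can be wrapped in a small helper so that the toolchain prefix is a parameter rather than hard-coded. This is only a sketch: the helper names `addr2line_command` and `resolve` are hypothetical, the default prefix is the one Gilles mentions for armv5, and the `-f` option (print the enclosing function name) is an addition that does not appear in the original command. As noted above, the kernel image may need to be built with CONFIG_DEBUG_INFO for the output to be useful.

```python
import subprocess
from typing import List


def addr2line_command(address: str,
                      vmlinux: str = "vmlinux",
                      cross_prefix: str = "arm-none-linux-gnueabi-") -> List[str]:
    """Build the addr2line invocation for one kernel text address.

    cross_prefix is whatever fits the installed toolchain; pass "" to use
    a plain (e.g. multiarch) addr2line.
    """
    int(address, 16)  # raises ValueError if the address is not hexadecimal
    # -e: image to inspect, -a: echo the address, -f: show the function name
    return [cross_prefix + "addr2line", "-e", vmlinux, "-a", "-f", address]


def resolve(address: str,
            vmlinux: str = "vmlinux",
            cross_prefix: str = "arm-none-linux-gnueabi-") -> str:
    """Run addr2line and return its raw output (requires the tool on PATH)."""
    cmd = addr2line_command(address, vmlinux, cross_prefix)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # PC value taken from the backtrace discussed in this thread.
    print(" ".join(addr2line_command("c00aac84")))
```

Calling `resolve("c00aac84", cross_prefix="")` would use the default addr2line instead, which matches the multiarch binutils case.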