public inbox for linux-kernel@vger.kernel.org
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Steven Rostedt <srostedt@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Jens Axboe <jens.axboe@oracle.com>, Ingo Molnar <mingo@elte.hu>
Subject: [patch 13/71] tracing: Fix too large stack usage in do_one_initcall()
Date: Fri, 04 Sep 2009 17:13:48 -0700
Message-ID: <20090905001449.445826398@mini.kroah.org>
In-Reply-To: <20090905001824.GA18171@kroah.com>

[-- Attachment #1: tracing-fix-too-large-stack-usage-in-do_one_initcall.patch --]
[-- Type: text/plain, Size: 7966 bytes --]

2.6.30-stable review patch.  If anyone has any objections, please let us know.

------------------
From: Ingo Molnar <mingo@elte.hu>

commit 4a683bf94b8a10e2bb0da07aec3ac0a55e5de61f upstream.

One of my testboxes triggered this nasty stack overflow crash
during SCSI probing:

[    5.874004] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.875004] device: 'sda': device_add
[    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
[    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
[    5.878004] *pde = 00000000
[    5.878004] Thread overran stack, or stack corrupted
[    5.878004] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[    5.878004] last sysfs file:
[    5.878004]
[    5.878004] Pid: 1, comm: swapper Not tainted (2.6.31-rc6-tip-01272-g9919e28-dirty #5685)
[    5.878004] EIP: 0060:[<b1008321>] EFLAGS: 00010083 CPU: 0
[    5.878004] EIP is at print_context_stack+0x81/0x110
[    5.878004] EAX: cf8a3000 EBX: cf8a3fe4 ECX: 00000049 EDX: 00000000
[    5.878004] ESI: b1cfce84 EDI: 00000000 EBP: cf8a3018 ESP: cf8a2ff4
[    5.878004]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    5.878004] Process swapper (pid: 1, ti=cf8a2000 task=cf8a8000 task.ti=cf8a3000)
[    5.878004] Stack:
[    5.878004]  b1004867 fffff000 cf8a3ffc
[    5.878004] Call Trace:
[    5.878004]  [<b1004867>] ? kernel_thread_helper+0x7/0x10
[    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
[    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
[    5.878004] *pde = 00000000
[    5.878004] Thread overran stack, or stack corrupted
[    5.878004] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC

The oops did not reveal anything more about the real stack
contents, and the system got into an infinite loop of
recursive page faults.

So I booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
parameter. The box did not crash (the timings/conditions probably
changed just enough to avoid triggering the catastrophic crash),
but the /debug/tracing/stack_trace file was rather revealing:

        Depth    Size   Location    (72 entries)
        -----    ----   --------
  0)     3704      52   __change_page_attr+0xb8/0x290
  1)     3652      24   __change_page_attr_set_clr+0x43/0x90
  2)     3628      60   kernel_map_pages+0x108/0x120
  3)     3568      40   prep_new_page+0x7d/0x130
  4)     3528      84   get_page_from_freelist+0x106/0x420
  5)     3444     116   __alloc_pages_nodemask+0xd7/0x550
  6)     3328      36   allocate_slab+0xb1/0x100
  7)     3292      36   new_slab+0x1c/0x160
  8)     3256      36   __slab_alloc+0x133/0x2b0
  9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
 10)     3216     108   create_object+0x28/0x250
 11)     3108      40   kmemleak_alloc+0x81/0xc0
 12)     3068      24   kmem_cache_alloc+0x162/0x1d0
 13)     3044      52   scsi_pool_alloc_command+0x29/0x70
 14)     2992      20   scsi_host_alloc_command+0x22/0x70
 15)     2972      24   __scsi_get_command+0x1b/0x90
 16)     2948      28   scsi_get_command+0x35/0x90
 17)     2920      24   scsi_setup_blk_pc_cmnd+0xd4/0x100
 18)     2896     128   sd_prep_fn+0x332/0xa70
 19)     2768      36   blk_peek_request+0xe7/0x1d0
 20)     2732      56   scsi_request_fn+0x54/0x520
 21)     2676      12   __generic_unplug_device+0x2b/0x40
 22)     2664      24   blk_execute_rq_nowait+0x59/0x80
 23)     2640     172   blk_execute_rq+0x6b/0xb0
 24)     2468      32   scsi_execute+0xe0/0x140
 25)     2436      64   scsi_execute_req+0x152/0x160
 26)     2372      60   scsi_vpd_inquiry+0x6c/0x90
 27)     2312      44   scsi_get_vpd_page+0x112/0x160
 28)     2268      52   sd_revalidate_disk+0x1df/0x320
 29)     2216      92   rescan_partitions+0x98/0x330
 30)     2124      52   __blkdev_get+0x309/0x350
 31)     2072       8   blkdev_get+0xf/0x20
 32)     2064      44   register_disk+0xff/0x120
 33)     2020      36   add_disk+0x6e/0xb0
 34)     1984      44   sd_probe_async+0xfb/0x1d0
 35)     1940      44   __async_schedule+0xf4/0x1b0
 36)     1896       8   async_schedule+0x12/0x20
 37)     1888      60   sd_probe+0x305/0x360
 38)     1828      44   really_probe+0x63/0x170
 39)     1784      36   driver_probe_device+0x5d/0x60
 40)     1748      16   __device_attach+0x49/0x50
 41)     1732      32   bus_for_each_drv+0x5b/0x80
 42)     1700      24   device_attach+0x6b/0x70
 43)     1676      16   bus_attach_device+0x47/0x60
 44)     1660      76   device_add+0x33d/0x400
 45)     1584      52   scsi_sysfs_add_sdev+0x6a/0x2c0
 46)     1532     108   scsi_add_lun+0x44b/0x460
 47)     1424     116   scsi_probe_and_add_lun+0x182/0x4e0
 48)     1308      36   __scsi_add_device+0xd9/0xe0
 49)     1272      44   ata_scsi_scan_host+0x10b/0x190
 50)     1228      24   async_port_probe+0x96/0xd0
 51)     1204      44   __async_schedule+0xf4/0x1b0
 52)     1160       8   async_schedule+0x12/0x20
 53)     1152      48   ata_host_register+0x171/0x1d0
 54)     1104      60   ata_pci_sff_activate_host+0xf3/0x230
 55)     1044      44   ata_pci_sff_init_one+0xea/0x100
 56)     1000      48   amd_init_one+0xb2/0x190
 57)      952       8   local_pci_probe+0x13/0x20
 58)      944      32   pci_device_probe+0x68/0x90
 59)      912      44   really_probe+0x63/0x170
 60)      868      36   driver_probe_device+0x5d/0x60
 61)      832      20   __driver_attach+0x89/0xa0
 62)      812      32   bus_for_each_dev+0x5b/0x80
 63)      780      12   driver_attach+0x1e/0x20
 64)      768      72   bus_add_driver+0x14b/0x2d0
 65)      696      36   driver_register+0x6e/0x150
 66)      660      20   __pci_register_driver+0x53/0xc0
 67)      640       8   amd_init+0x14/0x16
 68)      632     572   do_one_initcall+0x2b/0x1d0
 69)       60      12   do_basic_setup+0x56/0x6a
 70)       48      20   kernel_init+0x84/0xce
 71)       28      28   kernel_thread_helper+0x7/0x10

There are a lot of fat functions in that stack trace, but
the largest of all is do_one_initcall(), at 572 bytes. This
is because its boot trace entry variables (msgbuf, call and
ret) live on the stack.

Fixing this is relatively easy: initcalls are fundamentally
serialized, so we can move the local variables to file scope.

Note that this large stack footprint had already been present
for a couple of months; what pushed my system over the edge
was the addition of kmemleak to the call chain:

  6)     3328      36   allocate_slab+0xb1/0x100
  7)     3292      36   new_slab+0x1c/0x160
  8)     3256      36   __slab_alloc+0x133/0x2b0
  9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
 10)     3216     108   create_object+0x28/0x250
 11)     3108      40   kmemleak_alloc+0x81/0xc0
 12)     3068      24   kmem_cache_alloc+0x162/0x1d0
 13)     3044      52   scsi_pool_alloc_command+0x29/0x70

This pushes the total to ~3800 bytes; only a tiny bit more
was needed to corrupt the thread_info that lives on the
kernel stack.

The fix reduces the stack footprint of do_one_initcall()
from 572 bytes to 28 bytes.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 init/main.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/init/main.c
+++ b/init/main.c
@@ -702,13 +702,14 @@ asmlinkage void __init start_kernel(void
 int initcall_debug;
 core_param(initcall_debug, initcall_debug, bool, 0644);
 
+static char msgbuf[64];
+static struct boot_trace_call call;
+static struct boot_trace_ret ret;
+
 int do_one_initcall(initcall_t fn)
 {
 	int count = preempt_count();
 	ktime_t calltime, delta, rettime;
-	char msgbuf[64];
-	struct boot_trace_call call;
-	struct boot_trace_ret ret;
 
 	if (initcall_debug) {
 		call.caller = task_pid_nr(current);
